Count distinct number of customers per fiscal year and display all dates in query result

DB-Fiddle

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    order_date DATE,
    customerID VARCHAR(255)
);

INSERT INTO customers
(order_date, customerID
)
VALUES 
('2020-01-15', 'Customer_01'),
('2020-02-03', 'Customer_01'),
('2020-02-15', 'Customer_01'),
('2020-03-18', 'Customer_01'),
('2020-03-20', 'Customer_01'),
('2020-04-22', 'Customer_01'),
('2021-01-19', 'Customer_01'),

('2020-01-25', 'Customer_02'),
('2020-02-26', 'Customer_02'),
('2020-11-23', 'Customer_02'),
('2021-01-17', 'Customer_02'),
('2021-02-20', 'Customer_02');

Expected Result:

order_date   |      quantity
             |    (fiscal year)
-------------|----------------------------------------------------
2020-01-15   |           1   --> Customer_01 appears the first time between 2019-03 and 2020-02
2020-01-25   |           1   --> Customer_02 appears the first time between 2019-03 and 2020-02
2020-02-03   |           0   
2020-02-15   |           0
2020-02-26   |           0
2020-03-18   |           1   --> Customer_01 appears the first time between 2020-03 and 2021-02
2020-03-20   |           0
2020-04-22   |           0
2020-11-23   |           1   --> Customer_02 appears the first time between 2020-03 and 2021-02
2021-01-17   |           0
2021-01-19   |           0
2021-02-20   |           0

In the result above I want to list all order dates and count the distinct customers per fiscal year, so that each customer is counted only on their first order date within a fiscal year.
The fiscal year starts two months after the calendar year and therefore runs from March to February
(e.g. from 2020-03 until 2021-02).

For example, Customer_01 first appears within the fiscal year 2020-03 until 2021-02 on 2020-03-18.
Therefore, this order_date is assigned 1.
If the customer appears again within the same fiscal year, those later order dates are assigned 0.
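
The fiscal-year rule described above can be checked in isolation before it is used in a window function. The following is a minimal sketch against the customers table from this question; the column alias fiscal_year_start is only an illustrative name. Shifting each date back two months places every March-to-February period inside a single calendar year, so the year of the shifted date identifies the fiscal year.

```sql
-- Sketch: show which fiscal year each order date falls into.
-- fiscal_year_start is a hypothetical alias, not part of the original schema.
SELECT
    order_date,
    customerID,
    -- e.g. 2020-02-26 shifts to 2019-12-26 (fiscal year starting 2019-03),
    --      2020-03-18 shifts to 2020-01-18 (fiscal year starting 2020-03)
    DATE_PART('year', order_date - INTERVAL '2 month') AS fiscal_year_start
FROM customers
ORDER BY order_date;
```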


With reference to this question, I was able to achieve the expected result in MariaDB, as you can see in the DB-Fiddle.

However, I now want to get the same result using PostgreSQL.
So far I have modified the query to this:

SELECT
order_date,
SUM(rn = 1) AS quantity
FROM 

  (SELECT 
  order_date, 
  row_number() over(PARTITION BY DATE_PART('year', (order_date - INTERVAL '2 month')::date), customerID ORDER BY order_date) rn
  FROM customers
  ) t
  
GROUP BY 1;

However, I now get the error function sum(boolean) does not exist for the SUM(rn = 1) part.
What is the equivalent syntax for SUM(rn = 1) in PostgreSQL to achieve the expected result?

Answer

After further investigation I came up with the following solution:

DB-Fiddle

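SELECT
  order_date,
  -- Only a customer's first order in a fiscal year (rolling_count = 1) counts
  (CASE WHEN t.rolling_count > 1 THEN 0 ELSE t.rolling_count END) AS quantity
FROM
  (SELECT
     order_date,
     -- Shifting the date back two months maps March..February onto one calendar year
     (row_number() over(PARTITION BY DATE_PART('year', (order_date - INTERVAL '2 month')::date), customerID ORDER BY order_date)) AS rolling_count
   FROM customers
   ORDER BY 1
  ) t
GROUP BY 1,2
ORDER BY 1;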

For comparison, here is the MariaDB version of the query:

SELECT
order_date,
(CASE WHEN t.rolling_count > 1 THEN 0 ELSE t.rolling_count END) AS quantity
FROM 

  (SELECT 
  order_date, 
  (row_number() over(PARTITION BY YEAR(order_date - INTERVAL 2 MONTH), customerID ORDER BY order_date)) AS rolling_count
  FROM customers
  ORDER BY 1
  ) t
  
GROUP BY 1
ORDER BY 1;
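
Both queries above replace the aggregate with a CASE expression. A more direct PostgreSQL counterpart to MariaDB's SUM(rn = 1) might look like the sketch below. The error arises because PostgreSQL does not implicitly convert a boolean to a number, so the condition either has to be cast to an integer or moved into a FILTER clause. This is one possible rewrite of the query from the question, not necessarily the only or best one.

```sql
-- Sketch: count rows where rn = 1 per order date.
SELECT
    order_date,
    COUNT(*) FILTER (WHERE rn = 1) AS quantity
    -- alternative with an explicit cast: SUM((rn = 1)::int) AS quantity
FROM (
    SELECT
        order_date,
        -- number each customer's orders within a fiscal year (March to February)
        ROW_NUMBER() OVER (
            PARTITION BY DATE_PART('year', order_date - INTERVAL '2 month'), customerID
            ORDER BY order_date
        ) AS rn
    FROM customers
) t
GROUP BY order_date
ORDER BY order_date;
```

Unlike the CASE-based version, this form sums across customers, so if two customers place their first order of a fiscal year on the same date, that date would show 2 rather than 1.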
User contributions licensed under: CC BY-SA