CREATE TABLE customers (
id SERIAL PRIMARY KEY,
order_date DATE,
customerID VARCHAR(255)
);
INSERT INTO customers
(order_date, customerID
)
VALUES
('2020-01-15', 'Customer_01'),
('2020-02-03', 'Customer_01'),
('2020-02-15', 'Customer_01'),
('2020-03-18', 'Customer_01'),
('2020-03-20', 'Customer_01'),
('2020-04-22', 'Customer_01'),
('2021-01-19', 'Customer_01'),
('2020-01-25', 'Customer_02'),
('2020-02-26', 'Customer_02'),
('2020-11-23', 'Customer_02'),
('2021-01-17', 'Customer_02'),
('2021-02-20', 'Customer_02');
Expected Result:
order_date | quantity
| (fiscal year)
-------------|----------------------------------------------------
2020-01-15 | 1 --> Customer_01 appears the first time between 2019-03 and 2020-02
2020-01-25 | 1 --> Customer_02 appears the first time between 2019-03 and 2020-02
2020-02-03 | 0
2020-02-15 | 0
2020-02-26 | 0
2020-03-18 | 1 --> Customer_01 appears the first time between 2020-03 and 2021-02
2020-03-20 | 0
2020-04-22 | 0
2020-11-23 | 1 --> Customer_02 appears the first time between 2020-03 and 2021-02
2021-01-17 | 0
2021-01-19 | 0
2021-02-20 | 0
In the above result I want to list all order dates and count the number of customers distinct per fiscal year.
The fiscal year starts two months after the calender year and therefore goes from March to February.
(e.g. from 2020-03 til 2021-02).
For example Customer_01 appears the first time on 2020-03-18 within the fiscal year 2020-03 til 2021-02.
Therefore, this order_date gets assigned 1 to it.
If the customer appears again within the fiscal year the next order_date will get assigned 0 to it.
With reference to this question in MariaDB I was able to achieve the expected result as you can see in the DB-Fiddle.
However, now I want to get the same results using postgresSQL.
Therefore, I have modified the query so far to this:
SELECT
order_date,
SUM(rn = 1) AS quantity
FROM
(SELECT
order_date,
row_number() over(PARTITION BY DATE_PART('year', (order_date - INTERVAL '2 month')::date), customerID ORDER BY order_date) rn
FROM customers
) t
GROUP BY 1;
However, now I am getting an error function sum(boolean) does not exist on the SUM(rn = 1) part.
What is the equivalnt syntax for the SUM(rn = 1) in postgresSQL to achieve the expected result?
Advertisement
Answer
After further investigation I came up with the following solution:
SELECT
order_date,
(CASE WHEN t.rolling_count > 1 THEN 0 ELSE t.rolling_count END) AS quantity
FROM
(SELECT
order_date,
(row_number() over(PARTITION BY DATE_PART('year', (order_date - INTERVAL '2 month')::date), customerID ORDER BY order_date)) AS rolling_count
FROM customers
ORDER BY 1
) t
GROUP BY 1,2
ORDER BY 1;
As comparison here is the MariaDB of the query:
SELECT order_date, (CASE WHEN t.rolling_count > 1 THEN 0 ELSE t.rolling_count END) AS quantity FROM (SELECT order_date, (row_number() over(PARTITION BY YEAR(order_date - INTERVAL 2 MONTH), customerID ORDER BY order_date)) AS rolling_count FROM customers ORDER BY 1 ) t GROUP BY 1 ORDER BY 1;