CREATE TABLE customers ( id SERIAL PRIMARY KEY, order_date DATE, customerID VARCHAR(255) ); INSERT INTO customers (order_date, customerID ) VALUES ('2020-01-15', 'Customer_01'), ('2020-02-03', 'Customer_01'), ('2020-02-15', 'Customer_01'), ('2020-03-18', 'Customer_01'), ('2020-03-20', 'Customer_01'), ('2020-04-22', 'Customer_01'), ('2021-01-19', 'Customer_01'), ('2020-01-25', 'Customer_02'), ('2020-02-26', 'Customer_02'), ('2020-11-23', 'Customer_02'), ('2021-01-17', 'Customer_02'), ('2021-02-20', 'Customer_02');
Expected Result:
order_date | quantity | (fiscal year) -------------|---------------------------------------------------- 2020-01-15 | 1 --> Customer_01 appears the first time between 2019-03 and 2020-02 2020-01-25 | 1 --> Customer_02 appears the first time between 2019-03 and 2020-02 2020-02-03 | 0 2020-02-15 | 0 2020-02-26 | 0 2020-03-18 | 1 --> Customer_01 appears the first time between 2020-03 and 2021-02 2020-03-20 | 0 2020-04-22 | 0 2020-11-23 | 1 --> Customer_02 appears the first time between 2020-03 and 2021-02 2021-01-17 | 0 2021-01-19 | 0 2021-02-20 | 0
In the above result I want to list all order dates
and count the number of customers
distinct per fiscal year.
The fiscal year
starts two months after the calender year and therefore goes from March
to February
.
(e.g. from 2020-03
til 2021-02
).
For example Customer_01
appears the first time on 2020-03-18
within the fiscal year 2020-03
til 2021-02
.
Therefore, this order_date
gets assigned 1
to it.
If the customer appears again within the fiscal year the next order_date
will get assigned 0
to it.
With reference to this question in MariaDB
I was able to achieve the expected result as you can see in the DB-Fiddle.
However, now I want to get the same results using postgresSQL
.
Therefore, I have modified the query so far to this:
SELECT order_date, SUM(rn = 1) AS quantity FROM (SELECT order_date, row_number() over(PARTITION BY DATE_PART('year', (order_date - INTERVAL '2 month')::date), customerID ORDER BY order_date) rn FROM customers ) t GROUP BY 1;
However, now I am getting an error function sum(boolean) does not exist
on the SUM(rn = 1)
part.
What is the equivalnt syntax for the SUM(rn = 1)
in postgresSQL
to achieve the expected result?
Advertisement
Answer
After further investigation I came up with the following solution:
SELECT order_date, (CASE WHEN t.rolling_count > 1 THEN 0 ELSE t.rolling_count END) AS quantity FROM (SELECT order_date, (row_number() over(PARTITION BY DATE_PART('year', (order_date - INTERVAL '2 month')::date), customerID ORDER BY order_date)) AS rolling_count FROM customers ORDER BY 1 ) t GROUP BY 1,2 ORDER BY 1;
As comparison here is the MariaDB of the query:
SELECT order_date, (CASE WHEN t.rolling_count > 1 THEN 0 ELSE t.rolling_count END) AS quantity FROM (SELECT order_date, (row_number() over(PARTITION BY YEAR(order_date - INTERVAL 2 MONTH), customerID ORDER BY order_date)) AS rolling_count FROM customers ORDER BY 1 ) t GROUP BY 1 ORDER BY 1;