Percentage of tardiness and first date for defaults in PostgreSQL

I have a table that records, for each debt payment, the due date and the date it was actually paid:

CREATE TABLE my_table 
(
    the_debt_id varchar(6) NOT NULL, 
    the_debt_paid timestamp NOT NULL, 
    the_debt_due date NOT NULL
);

INSERT INTO my_table
VALUES ('LMUS01', '2019-05-02 09:00:01', '2019-05-02'), 
       ('LMUS01', '2019-06-03 10:45:12', '2019-06-02'), 
       ('LMUS01', '2019-07-01 15:39:58', '2019-07-02'), 
       ('LMUS02', '2019-05-03 19:43:44', '2019-05-07'), 
       ('LMUS02', '2019-06-07 08:37:05', '2019-06-07');

 

What I want is to aggregate this data per the_debt_id and get: payments (the number of payments per debt), tardiness (the number of payments where the paid date is later than the due date), first_due_date (the earliest due date per debt), and percentage (the share of payments that were late). This table should give the idea:

the_debt_id    payments    tardiness    first_due_date    percentage
LMUS01         3           1            2019-05-02        0.33
LMUS02         2           0            2019-05-07        0

 

This is what I have tried so far:

WITH t1 AS (
    SELECT the_debt_id, the_debt_due, the_debt_paid,
           CASE
               WHEN the_debt_paid::date > the_debt_due THEN 1
               ELSE 0
           END AS tardiness
    FROM my_table
),
t2 AS (
    SELECT the_debt_id,
           sum(tardiness) AS tardiness,
           count(the_debt_id) AS payments,
           first_value(the_debt_due)
    FROM t1
    GROUP BY the_debt_id
),
t3 AS (
    SELECT *,
           tardiness/payments::float AS percentage
    FROM t2
)
SELECT * FROM t3

 

I get an error saying that first_value needs an OVER clause. I take that to mean I need a partition, but I'm not sure how to combine GROUP BY and PARTITION BY. Any help will be greatly appreciated.

Answer

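Plain aggregation seems appropriate here. first_value() is a window function, which is why it demands an OVER clause; with GROUP BY, min(the_debt_due) returns the earliest due date directly: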

select the_debt_id,
       count(*) as payments,
       count(*) filter (where the_debt_paid::date > the_debt_due) as num_tardy,
       min(the_debt_due) as first_due_date,
       avg( (the_debt_paid::date > the_debt_due)::int ) as tardy_ratio
from my_table t
group by the_debt_id;
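
Since avg() over an integer returns numeric in PostgreSQL, the ratio for LMUS01 comes back as 0.3333... rather than the 0.33 shown in the question. A minimal sketch of the same aggregation with the ratio rounded, assuming the column aliases from the question's desired output and PostgreSQL 9.4 or later for the FILTER clause:

```sql
-- Sketch: same aggregation as above, with aliases taken from the question's
-- desired output and the late-payment ratio rounded to two decimals.
SELECT the_debt_id,
       count(*) AS payments,
       -- counts only the rows where the payment came after the due date
       count(*) FILTER (WHERE the_debt_paid::date > the_debt_due) AS tardiness,
       min(the_debt_due) AS first_due_date,
       -- boolean -> int gives 1 for late, 0 otherwise; avg() of int is numeric,
       -- so round(numeric, int) can be applied without an extra cast
       round(avg((the_debt_paid::date > the_debt_due)::int), 2) AS percentage
FROM my_table
GROUP BY the_debt_id
ORDER BY the_debt_id;
```

With the sample rows from the question this should give 0.33 for LMUS01 and 0.00 for LMUS02.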

 

Here is a db<>fiddle.
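
As for combining GROUP BY and PARTITION BY: the two do not have to be mixed. If first_value() is preferred, one option is to write every aggregate as a window function over the same partition and let DISTINCT collapse the repeated rows. A sketch under that assumption (the window name w is only illustrative):

```sql
-- Sketch: window-function variant, no GROUP BY.
-- Every row of a debt carries the same window results, so DISTINCT
-- reduces the output to one row per the_debt_id.
SELECT DISTINCT
       the_debt_id,
       count(*) OVER w AS payments,
       sum((the_debt_paid::date > the_debt_due)::int) OVER w AS tardiness,
       -- first_value() needs an ordering to define which row is "first"
       first_value(the_debt_due) OVER (PARTITION BY the_debt_id
                                       ORDER BY the_debt_due) AS first_due_date,
       avg((the_debt_paid::date > the_debt_due)::int) OVER w AS percentage
FROM my_table
WINDOW w AS (PARTITION BY the_debt_id)
ORDER BY the_debt_id;
```

This is usually more work for the planner than the GROUP BY version, so I would only reach for it when a window function is needed for some other reason.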
