My dataset is about sales, each line corresponds to an invoice. It is possible to have 2 registers in the same day for the same customer, if he had bought twice in that day.
As you can see in the image below, the blue square shows us that customer 355122 (id_cliente = customer_id
) bought twice (275831N
and 275826N
invoice’s id) in the same day (2020-12-19
) (penult_data = second-last
date). This query is meant to be a support table, to left join the main table and bring those results.
First of all, i’ve created a row number just over customer_id
(blue arrow, aux
), so that I could just join with aux = 2
(that should be the second-last register), but in cases that the customer bought twice that day, the second-last invoice is not the second-last date he bought. I need the second-last DATE. He can buy 1,2,3,4,5 times a day, so I cannot assume a correct aux
number to filter.
Then, for some reason, I also created an aux2
, it’s a row number over customer and date, but it really didn’t help. I needed something that would repeat the index for the same date, so that index = 2
would be the second-last date.
I cannot use group by
because i’m retrieving the salesman id (penult_vend
), the store id (penult_empe
), and so on from the second-last date
This is the output of part of the query I’m using (as I said, the support table to left join the main table). I’m filtering to this customer’s id.
Does somebody knows any function or method to make this work?
I’m using google big query.
Thanks
Advertisement
Answer
Assuming the column penult_data
has only date information without time of the day, you can find the second to last “date” and then the last “invoice” on that date by using the DENSE_RANK()
and ROW_NUMBER()
functions:
dense_rank() over(partition by id_cliente order by penult_data desc) as rnd, row_number() over(partition by id_cliente, penult_data order by penult_nf desc) as rnf,
Then, you can use the filtering condition:
where rnd = 2 and rnf = 1