How to get the second row by date and by id, without group by (not sure about row number)

My dataset is about sales, each line corresponds to an invoice. It is possible to have 2 registers in the same day for the same customer, if he had bought twice in that day.

As you can see in the image below, the blue square shows us that customer 355122 (id_cliente = customer_id) bought twice (275831N and 275826N invoice’s id) in the same day (2020-12-19) (penult_data = second-last date). This query is meant to be a support table, to left join the main table and bring those results.

First of all, i’ve created a row number just over customer_id (blue arrow, aux), so that I could just join with aux = 2 (that should be the second-last register), but in cases that the customer bought twice that day, the second-last invoice is not the second-last date he bought. I need the second-last DATE. He can buy 1,2,3,4,5 times a day, so I cannot assume a correct aux number to filter.

Then, for some reason, I also created an aux2, it’s a row number over customer and date, but it really didn’t help. I needed something that would repeat the index for the same date, so that index = 2 would be the second-last date.

I cannot use group by because i’m retrieving the salesman id (penult_vend), the store id (penult_empe), and so on from the second-last date

This is the output of part of the query I’m using (as I said, the support table to left join the main table). I’m filtering to this customer’s id.

Does somebody knows any function or method to make this work?

I’m using google big query.

Thanks

Answer

Assuming the column penult_data has only date information without time of the day, you can find the second to last “date” and then the last “invoice” on that date by using the DENSE_RANK() and ROW_NUMBER() functions:

dense_rank() over(partition by id_cliente 
                  order by penult_data desc) as rnd,
row_number() over(partition by id_cliente, penult_data 
                  order by penult_nf desc) as rnf,

​x
 
dense_rank() over(partition by id_cliente                   order by penult_data desc) as rnd,row_number() over(partition by id_cliente, penult_data                   order by penult_nf desc) as rnf,​

Then, you can use the filtering condition:

where rnd = 2 and rnf = 1

 
where rnd = 2 and rnf = 1​

Advertisement

Answer