Skip to content
Advertisement

Numbering duplicate rows in bigquery

I’m trying to number rows in which column a and column b is a match and when they occur again regardless of the date.

The dataset looks like this and I want the row_number column:

rider driver date row_number
jim joe 2020-10-20 1
jim joe 2020-10-28 2
jim jack 2020-10-29 1
jim joe 2020-10-31 3
john jane 2020-10-29 1
john jane 2020-10-31 2

I thought about using the function <row_number over partition by> but I don’t think it worked.. Help would be appreciated! Thanks!

Advertisement

Answer

Row_number() is actually for what you are looking for.

WITH data as (

SELECT "jim" as rider, "joe" as driver, "2020-10-20" as date UNION ALL
SELECT "jim", "joe", "2020-10-28" UNION ALL
SELECT "jim", "jack" , "2020-10-29" UNION ALL
SELECT "jim", "joe","2020-10-31" UNION ALL
SELECT "john", "jane", "2020-10-29"  UNION ALL
SELECT "john", "jane", "2020-10-31" )

SELECT 
  rider, driver, date, 
  ROW_NUMBER() OVER (partition by rider, driver) as row_number
FROM data
ORDER by rider, driver
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement