Skip to content
Advertisement

Find sum of data from the latest date in bigquery sql

I have a table like so:

    ID   | Date        | Language
    ---------------------------------
    A    | 2013-04-10  | EN
    A    | 2013-04-11  | EN
    A    | 2013-05-12  | SN
    B    | 2013-04-01  | SN
    B    | 2013-05-28  | SN
    .... (and many more dates for other ID)

I want the query to pick up the LATEST Date for each ID, pick the Language for that code, and so on for all the records, and sum it up from all backlog data to LATEST Date. So for the data above, the result should be 1 + 1 = 2 for EN and 1 for SN language, for ID = A and so goes for the other IDs. I’ve found almost same question on S/O but in LINQ query (here), not sure how to do in standard sql.

This is what I’ve been trying (up only to sum all data without putting WHERE clause for latest date):

    SELECT 
        ID, 
        Date,
        SUM(CASE WHEN Language = 'EN' THEN 1 ELSE 0 END) AS Sum_EN, #count all language from latest date of each ID
        SUM(CASE WHEN Language = 'SN' THEN 1 ELSE 0 END) AS Sum_SN,
    FROM t 
    #WHERE Date from latest date to all backlog data 
    GROUP BY ID, Date

  

Sample output:

    ID   |   Date      | Sum_EN | Sum_SN
    --------------------------------------
    A    | 2013-05-12  | 2      |   1
    B    | 2013-05-28  | 0      |   2
    .... (and many more dates for other ID)

Note: I comment WHERE clause in query because not sure how to select for latest date for each ID

Advertisement

Answer

If you would like max date for each Id, then you can add max date and it will give your expected output.

 SELECT 
        ID, 
        max(Date) Date,
        SUM(CASE WHEN Language = 'EN' THEN 1 ELSE 0 END) AS Sum_EN, #count all language from latest date of each ID
        SUM(CASE WHEN Language = 'SN' THEN 1 ELSE 0 END) AS Sum_SN,
    FROM t 
    #WHERE Date from latest date to all backlog data 
    GROUP BY ID 
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement