Skip to content
Advertisement

Finding the overall data from specific date using bigquery

I am creating a query using bigquery that will calculate the overall number of data within 5 days from a specific date for each ID. So far I managed to get the result where the result return the data for each day (day 1 until 5), which means there are 5 dates in the result. However, what I want is actually to return only 1 date which is the last date so it will be like that 1 date showing the overall data for the past 5 days from that specific date for each of ID.

Sample data:

enter image description here

Sample code:

  SELECT
   ID,
   Date,
    SUM(CASE WHEN Language = 'EN' THEN 1 ELSE 0 END) AS lang_EN,
    SUM(CASE WHEN Language = 'SN' THEN 1 ELSE 0 END) AS lang_SN,
     FROM (
      SELECT 
      DATE(Timestamp) as Date ,
       ID, 
      CASE 
     WHEN Language in ('EN', 'English') THEN 'EN' 
     WHEN Language in ('MY', 'Malay')  THEN 'MY' ELSE Language 
     END AS Language,  
           FROM t
           WHERE Smooch_User_ID IS NOT NULL AND DATE(Timestamp) between '2020-01-01' and '2020-01-31'
           GROUP BY   ID, Language, DATE(Timestamp) 
         )
          GROUP BY  ID,Date

Sample output:

So far, the query return the data for each date from day 1 to 5. How to make it so it returns the overall data from date 1 to 5 and also return column date only for the day 5’s date like below (assuming day 1 is 2020-01-01 and day 5 is 2020-01-05).
enter image description here

Advertisement

Answer

I think one approach could be declaring variables, if you’re using script, so that you could extract/tag your output against end_date in your interval. Eg.

    declare end_dt date default "2020-07-05"; -- this can be used as placeholder and replaced at run-time based on execution
    declare start_dt date default DATE_SUB(end_dt, INTERVAL 5 DAY);
    
    select 
      end_dt as ts_date, 
      id, 
      sum(CASE WHEN lang = 'EN' THEN 1 ELSE 0 END) AS lang_EN, 
      sum(CASE WHEN lang = 'SN' THEN 1 ELSE 0 END) AS lang_SN 
    from(
      -- sample data
      select cast("2020-07-01 01:01:25.550 UTC" as timestamp) as ts, "A" as id, "EN" as lang
      union all
      select cast("2020-07-01 01:01:25.550 UTC" as timestamp) as ts, "A" as id, "EN" as lang
      union all
      select cast("2020-07-01 01:01:25.550 UTC" as timestamp) as ts, "A" as id, "SN" as lang
      union all
      select cast("2020-07-02 01:01:25.550 UTC" as timestamp) as ts, "B" as id, "EN" as lang
      union all
      select cast("2020-07-02 01:01:25.550 UTC" as timestamp) as ts, "B" as id, "EN" as lang
      union all
      select cast("2020-07-02 01:01:25.550 UTC" as timestamp) as ts, "B" as id, "SN" as lang
      union all
      select cast("2020-07-02 01:01:25.550 UTC" as timestamp) as ts, "B" as id, "SN" as lang
      union all
      select cast("2020-07-03 01:01:25.550 UTC" as timestamp) as ts, "C" as id, "SN" as lang
      union all
      select cast("2020-07-03 01:01:25.550 UTC" as timestamp) as ts, "C" as id, "EN" as lang
      union all
      select cast("2020-07-03 01:01:25.550 UTC" as timestamp) as ts, "D" as id, "SN" as lang
      union all
      select cast("2020-07-04 01:01:25.550 UTC" as timestamp) as ts, "D" as id, "EN" as lang
    ) where date(ts) between start_dt and end_dt
    group by 1,2

And output:

Row     ts_date     id  lang_EN     lang_SN      
1       2020-07-05  A   2           1    
2       2020-07-05  B   2           2    
3       2020-07-05  C   1           1    
4       2020-07-05  D   1           1
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement