I am creating a query using bigquery that will calculate the overall number of data within 5 days from a specific date for each ID. So far I managed to get the result where the result return the data for each day (day 1 until 5), which means there are 5 dates in the result. However, what I want is actually to return only 1 date which is the last date so it will be like that 1 date showing the overall data for the past 5 days from that specific date for each of ID.
Sample data:
Sample code:
SELECT ID, Date, SUM(CASE WHEN Language = 'EN' THEN 1 ELSE 0 END) AS lang_EN, SUM(CASE WHEN Language = 'SN' THEN 1 ELSE 0 END) AS lang_SN, FROM ( SELECT DATE(Timestamp) as Date , ID, CASE WHEN Language in ('EN', 'English') THEN 'EN' WHEN Language in ('MY', 'Malay') THEN 'MY' ELSE Language END AS Language, FROM t WHERE Smooch_User_ID IS NOT NULL AND DATE(Timestamp) between '2020-01-01' and '2020-01-31' GROUP BY ID, Language, DATE(Timestamp) ) GROUP BY ID,Date
Sample output:
So far, the query return the data for each date from day 1 to 5. How to make it so it returns the overall data from date 1 to 5 and also return column date only for the day 5’s date like below (assuming day 1 is 2020-01-01 and day 5 is 2020-01-05).
Advertisement
Answer
I think one approach could be declaring variables, if you’re using script, so that you could extract/tag your output against end_date in your interval. Eg.
declare end_dt date default "2020-07-05"; -- this can be used as placeholder and replaced at run-time based on execution declare start_dt date default DATE_SUB(end_dt, INTERVAL 5 DAY); select end_dt as ts_date, id, sum(CASE WHEN lang = 'EN' THEN 1 ELSE 0 END) AS lang_EN, sum(CASE WHEN lang = 'SN' THEN 1 ELSE 0 END) AS lang_SN from( -- sample data select cast("2020-07-01 01:01:25.550 UTC" as timestamp) as ts, "A" as id, "EN" as lang union all select cast("2020-07-01 01:01:25.550 UTC" as timestamp) as ts, "A" as id, "EN" as lang union all select cast("2020-07-01 01:01:25.550 UTC" as timestamp) as ts, "A" as id, "SN" as lang union all select cast("2020-07-02 01:01:25.550 UTC" as timestamp) as ts, "B" as id, "EN" as lang union all select cast("2020-07-02 01:01:25.550 UTC" as timestamp) as ts, "B" as id, "EN" as lang union all select cast("2020-07-02 01:01:25.550 UTC" as timestamp) as ts, "B" as id, "SN" as lang union all select cast("2020-07-02 01:01:25.550 UTC" as timestamp) as ts, "B" as id, "SN" as lang union all select cast("2020-07-03 01:01:25.550 UTC" as timestamp) as ts, "C" as id, "SN" as lang union all select cast("2020-07-03 01:01:25.550 UTC" as timestamp) as ts, "C" as id, "EN" as lang union all select cast("2020-07-03 01:01:25.550 UTC" as timestamp) as ts, "D" as id, "SN" as lang union all select cast("2020-07-04 01:01:25.550 UTC" as timestamp) as ts, "D" as id, "EN" as lang ) where date(ts) between start_dt and end_dt group by 1,2
And output:
Row ts_date id lang_EN lang_SN 1 2020-07-05 A 2 1 2 2020-07-05 B 2 2 3 2020-07-05 C 1 1 4 2020-07-05 D 1 1