I am creating a query using bigquery that will calculate the overall number of data within 5 days from a specific date for each ID. So far I managed to get the result where the result return the data for each day (day 1 until 5), which means there are 5 dates in the result. However, what I want is actually to return only 1 date which is the last date so it will be like that 1 date showing the overall data for the past 5 days from that specific date for each of ID.
Sample data:
Sample code:
SELECT
ID,
Date,
SUM(CASE WHEN Language = 'EN' THEN 1 ELSE 0 END) AS lang_EN,
SUM(CASE WHEN Language = 'SN' THEN 1 ELSE 0 END) AS lang_SN,
FROM (
SELECT
DATE(Timestamp) as Date ,
ID,
CASE
WHEN Language in ('EN', 'English') THEN 'EN'
WHEN Language in ('MY', 'Malay') THEN 'MY' ELSE Language
END AS Language,
FROM t
WHERE Smooch_User_ID IS NOT NULL AND DATE(Timestamp) between '2020-01-01' and '2020-01-31'
GROUP BY ID, Language, DATE(Timestamp)
)
GROUP BY ID,Date
Sample output:
So far, the query return the data for each date from day 1 to 5. How to make it so it returns the overall data from date 1 to 5 and also return column date only for the day 5’s date like below (assuming day 1 is 2020-01-01 and day 5 is 2020-01-05).
Advertisement
Answer
I think one approach could be declaring variables, if you’re using script, so that you could extract/tag your output against end_date in your interval. Eg.
declare end_dt date default "2020-07-05"; -- this can be used as placeholder and replaced at run-time based on execution
declare start_dt date default DATE_SUB(end_dt, INTERVAL 5 DAY);
select
end_dt as ts_date,
id,
sum(CASE WHEN lang = 'EN' THEN 1 ELSE 0 END) AS lang_EN,
sum(CASE WHEN lang = 'SN' THEN 1 ELSE 0 END) AS lang_SN
from(
-- sample data
select cast("2020-07-01 01:01:25.550 UTC" as timestamp) as ts, "A" as id, "EN" as lang
union all
select cast("2020-07-01 01:01:25.550 UTC" as timestamp) as ts, "A" as id, "EN" as lang
union all
select cast("2020-07-01 01:01:25.550 UTC" as timestamp) as ts, "A" as id, "SN" as lang
union all
select cast("2020-07-02 01:01:25.550 UTC" as timestamp) as ts, "B" as id, "EN" as lang
union all
select cast("2020-07-02 01:01:25.550 UTC" as timestamp) as ts, "B" as id, "EN" as lang
union all
select cast("2020-07-02 01:01:25.550 UTC" as timestamp) as ts, "B" as id, "SN" as lang
union all
select cast("2020-07-02 01:01:25.550 UTC" as timestamp) as ts, "B" as id, "SN" as lang
union all
select cast("2020-07-03 01:01:25.550 UTC" as timestamp) as ts, "C" as id, "SN" as lang
union all
select cast("2020-07-03 01:01:25.550 UTC" as timestamp) as ts, "C" as id, "EN" as lang
union all
select cast("2020-07-03 01:01:25.550 UTC" as timestamp) as ts, "D" as id, "SN" as lang
union all
select cast("2020-07-04 01:01:25.550 UTC" as timestamp) as ts, "D" as id, "EN" as lang
) where date(ts) between start_dt and end_dt
group by 1,2
And output:
Row ts_date id lang_EN lang_SN
1 2020-07-05 A 2 1
2 2020-07-05 B 2 2
3 2020-07-05 C 1 1
4 2020-07-05 D 1 1