Skip to content
Advertisement

Aggregrate the variable from timestamp on bigQuery

I am planning to calculate the most frequency part_of_day for each of the user. In this case, firstly, I encoded timestamp with part_of_day, then aggregrate with the most frequency part_of_day. I use the ARRAY_AGG to calculate the mode (). However, I’m not sure how to deal with timestamp with the ARRAY_AGG, because there is error, so my code structure might be wrong

SELECT  User_ID, time,
ARRAY_AGG(Time ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day,

case
  when  time BETWEEN '04:00:00' AND '12:00:00' 
  then  "morning"
  when  time < '04:00:00' OR time > '20:00:00' 
  then  "night" 
end AS part_of_day 

FROM (
      SELECT User_ID, 
        TIME_TRUNC(TIME(Request_Timestamp), SECOND) AS Time
        COUNT(*) AS cnt
    

Error received:

Syntax error: Expected ")" but got identifier "COUNT" at [19:9]

Advertisement

Answer

Even though you did not share any sample data, I was able to identify some issues within your code.

I have used some sample data I created based in the formats and functions you used in your code to keep consistency. Below is the code, without any errors:

WITH data AS (
SELECT 98 as User_ID,DATETIME "2008-12-25 05:30:00.000000" AS Request_Timestamp, "something!" AS channel UNION ALL
SELECT 99 as User_ID,DATETIME "2008-12-25 22:30:00.000000" AS Request_Timestamp, "something!" AS channel
)
SELECT  User_ID, time,
ARRAY_AGG(Time ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day1,

case
  when  time BETWEEN '04:00:00' AND '12:00:00' 
  then  "morning"
  when  time < '04:00:00' OR time > '20:00:00' 
  then  "night" 
end AS part_of_day 

FROM (
      SELECT User_ID,
        TIME_TRUNC(TIME(Request_Timestamp), SECOND) AS time,
        COUNT(*) AS cnt
      FROM data
   GROUP BY User_ID, Channel, Request_Timestamp
   #order by Request_Timestamp

    )
    GROUP BY User_ID, Time;

First, notice that I have changed the column’s name in your ARRAY_AGG() method, it had to be done because it would cause the error “Duplicate column name”. Second, after your TIME_TRUNC() function, it was missing a comma so you could select COUNT(*). Then, within your GROUP BY, you needed to group Request_Timestamp as well because it wasn’t aggregated nor grouped. Lastly, in your last GROUP BY, you needed to aggregate or group time. Thus, after theses corrections, your code will execute without any errors.

Note: the Syntax error: Expected ")" but got identifier "COUNT" at [19:9] error you experienced is due to the missing comma. The others would be shown after correcting this one.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement