Skip to content
Advertisement

Optimization of BigQuery aggregated data between two dates

I’m trying to compare the result of one query (example: number of rows), between the beginning of the current month and the current date, compared to what happened the previous month.

Example: Today is 25/01, so I’d like to know the number of rows created between the 01/01 and 25/01, vs the previous month (same interval) 01/12 and 25/12.

I’d like to retrieve it in one row, so that I can return the value of the current month, and the string : up/down depending on whether there have been more or less rows, compared to the previous month.

I’ve managed to get it working this way, but it looks too hacky and I’d like to know if there is a better approach (apart from retrieving two rows and process the result).

SELECT MAX(total_current) as current, IF(MAX(total_current) > MAX(total_previous), 'up', 'down') as status, 'Number of Items'
FROM
(SELECT INTEGER(count(*)) as total_current, INTEGER(0) as total_previous
FROM [mybucket.mytable] 
WHERE mydate BETWEEN TIMESTAMP(STRFTIME_UTC_USEC(CURRENT_TIMESTAMP(), "%Y-%m-01")) and CURRENT_TIMESTAMP()),
(SELECT INTEGER(count(*)) as total_previous, INTEGER(0) as total_current
FROM [mybucket.mytable] 
WHERE mydate 
BETWEEN DATE_ADD(TIMESTAMP(STRFTIME_UTC_USEC(CURRENT_TIMESTAMP(), "%Y-%m-01")), -1, 'MONTH') 
AND DATE_ADD(CURRENT_TIMESTAMP(), -1, 'MONTH'))

Does it make sense, or is absolutely wrong? If so, how could I improve it, or it’s just that this kind of things aren’t supposed to be done in a query.

Advertisement

Answer

I’ve managed to optimise/simplify the query as follows:

SELECT TOP(LEFT(DATE(mydate), 7), 2) as month, count(*) as total
FROM [mybucket.mytable]
WHERE DAY(mydate) BETWEEN 1 and DAY(CURRENT_TIMESTAMP())
AND LEFT(DATE(mydate), 7) >= LEFT(DATE(DATE_ADD(CURRENT_TIMESTAMP(), -1, 'MONTH')), 7);

However, I still would like to get only one row with the result of the current month and up/down compared to the previous month.

For example if the query returns this:

1   2013-12 48946    
2   2014-01 40497

In that case I’d like to get the following row: 1 40497 'down' (because previous month’s value was higher).
Is there any way to do it? Thanks

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement