In AWS Timestream I am trying to get the average heart rate for the first month since we have received heart rate samples for a specific user and the average for the last week. I’m having trouble with the query to get the first month part. When I try to use MIN(time)
in the where clause I get the error: WHERE clause cannot contain aggregations, window functions or grouping operations.
SELECT * FROM "DATABASE"."TABLE" WHERE measure_name = 'heart_rate' AND time < min(time) + 30
If I add it as a column and try to query on the column, I get the error: Column ‘first_sample_time’ does not exist
SELECT MIN(time) AS first_sample_time FROM "DATABASE"."TABLE" WHERE measure_name = 'heart_rate' AND time > first_sample_time
Also if I try to add to MIN(time)
I get the error: line 1:18: ‘+’ cannot be applied to timestamp, integer
SELECT MIN(time) + 30 AS first_sample_time FROM "DATABASE"."TABLE"
Here is what I finally came up with but I’m wondering if there is a better way to do it?
WITH first_month AS ( SELECT Min(time) AS creation_date, From_milliseconds( To_milliseconds( Min(time) ) + 2628000000 ) AS end_of_first_month, USER FROM "DATABASE"."TABLE" WHERE USER = 'xxx' AND measure_name = 'heart_rate' GROUP BY USER ), first_month_avg AS ( SELECT Avg(hm.measure_value :: DOUBLE) AS first_month_average, fm.USER FROM "DATABASE"."TABLE" hm JOIN first_month fm ON hm.USER = fm.USER WHERE measure_name = 'heart_rate' AND hm.time BETWEEN fm.creation_date AND fm.end_of_first_month GROUP BY fm.USER ), last_week_avg AS ( SELECT Avg(measure_value :: DOUBLE) AS last_week_average, USER FROM "DATABASE"."TABLE" WHERE measure_name = 'heart_rate' AND time > ago(14d) AND USER = 'xxx' GROUP BY USER ) SELECT lwa.last_week_average, fma.first_month_average, lwa.USER FROM first_month_avg fma JOIN last_week_avg lwa ON fma.USER = lwa.USER
Is there a better or more efficient way to do this?
Advertisement
Answer
I can see you’ve run into a few challenges along the way to your solution, and hopefully I can clear these up for you and also propose a cleaner way of reaching your solution.
Filtering on aggregates As you’ve experienced first hand, SQL doesn’t allow aggregates in the where statement, and you also cannot filter on new columns you’ve created in the select statement, such as aggregates or case statements, as those columns/results are not present in the table you’re querying.
Fortunately there are ways around this, such as:
Making your main query a subquery, and then filtering on the result of that query, like below
Select * from (select *,count(that_good_stuff) as total_good_stuff from tasty_table group by 1,2,3) where total_good_stuff > 69
This works because the aggregate column (count) is no longer an aggregate at the time it’s called in the where statement, it’s in the result of the subquery.
- Having clause
If a subquery isn’t your cup of tea, you can use the having clause straight after your group by statement, which acts like a where statement except exclusively for handling aggregates.
This is better than resorting to a subquery in most cases, as it’s more readable and I believe more efficient.
select *,count(that_good_stuff) as total_good_stuff from tasty_table group by 1,2,3 having total_good_stuff > 69
Finally, window statements are fantastic…they’ve really helped condense many queries I’ve made in the past by removing the need for subqueries/ctes. If you could share some example raw data (remove any pii of course) I’d be happy to share an example for your use case.
Nevertheless, hope this helps! Tom