I’m using the Bixi public dataset found at https://www.bixi.com/en/open-data and have been asked to find “the average number of trips a day for each year-month combination in the dataset”. Here’s an example of the table from which I’m querying:
id | start_date | start_station_code | end_date | end_station_code | duration_sec | is_member |
---|---|---|---|---|---|---|
85843 | 2016-04-15 00:00:00 | 6315 | 2016-04-15 00:05:00 | 6315 | 349 | 1 |
85844 | 2016-04-15 17:16:00 | 6315 | 2016-04-15 17:37:00 | 6315 | 1293 | 0 |
The query that I used to successfully answer the question was as follows, but how can I simplify it?:
SELECT daily_trips_2016.avg_daily_trips AS avg_daily_trips_2016, daily_trips_2017.avg_daily_trips AS avg_daily_trips_2017, daily_trips_2016.month FROM ( SELECT -- This sub-query determines the average number of daily trips per month for the year 2016 ROUND(COUNT(*) / (COUNT(DISTINCT DAY(start_date))), 0) AS avg_daily_trips, MONTH(start_date) AS month FROM trips WHERE YEAR(start_date) = 2016 GROUP BY month) AS daily_trips_2016 LEFT JOIN ( SELECT -- This sub-query determines the average number of daily trips per month for the year 2017 ROUND(COUNT(*) / (COUNT(DISTINCT DAY(start_date))), 0) AS avg_daily_trips, MONTH(start_date) AS month FROM trips WHERE YEAR(start_date) = 2017 GROUP BY month) AS daily_trips_2017 ON daily_trips_2016.month = daily_trips_2017.month; -- Since both sub-queries share the "month" column, the JOIN happens on this column -- and we can see the daily averages side by side for each year
Advertisement
Answer
Use conditional aggregation:
SELECT ROUND(SUM(YEAR(start_date) = 2016) / COUNT(DISTINCT CASE WHEN YEAR(start_date) = 2016 THEN DAY(start_date) END), 0) AS avg_daily_trips_2016, ROUND(SUM(YEAR(start_date) = 2017) / COUNT(DISTINCT CASE WHEN YEAR(start_date) = 2017 THEN DAY(start_date) END), 0) AS avg_daily_trips_2017, MONTH(start_date) AS month FROM trips WHERE YEAR(start_date) IN (2016, 2017) GROUP BY month;