With MySQL, how to calculate a MEDIAN with Left Join records?

Question

Given the following two tables, I'd like to know how to calculate the MEDIAN rating for reviews on a weekly basis. reviews (id, user_id, completed_at) reviews.completed_at -- lets us know the user ...

Accepted Answer

The problem is not clear. I&#8217;ll use reviews.completed_at for the date because reviews_areas.created_at contains just a time.We need to join reviews for the date with reviews_areas for the rating.To avoid the same week on different years from overlapping, we use yearweek to turn dates into the year + week.To get the median we need to find the middle row (or rows if there&#8217;s an even number) for each week. There&#8217;s many ways to do this. I&#8217;ll crib from Justin Grant&#8217;s clever answer. We count the row_number() ordered ascending and descending. Where they overlap +/- 1 are the median rows. Then we average them.1 2 3 4 5 66 5 4 3 2 1    ^^^    median rowsFirst, we get the row numbers by week.select    yearweek(completed_at) as week,    rating,    row_number() over(        partition by yearweek(completed_at)        order by rating asc, id asc    ) as row_asc,    row_number() over(        partition by yearweek(completed_at)        order by rating desc, id desc    ) as row_descfrom reviews_areas rajoin reviews r on r.id = ra.review_idThe row numbers are ordered by order by rating asc, id asc. The id is a secondary sort to disambiguate rows with the same rating.+--------+--------+---------+----------+| week   | rating | row_asc | row_desc |+--------+--------+---------+----------+| 201928 |     10 |       4 |        1 || 201928 |      9 |       3 |        2 || 201928 |      5 |       2 |        3 || 201928 |      0 |       1 |        4 || 201929 |      5 |       2 |        1 || 201929 |      1 |       1 |        2 || 201930 |      5 |       1 |        1 |+--------+--------+---------+----------+Then we use that as a common table expression to average the middle rows of each week. A subquery works just as well.with rating_weeks as (    select        yearweek(completed_at) as week,        rating,        row_number() over(            partition by yearweek(completed_at)            order by rating asc, id asc        ) as row_asc,        row_number() over(            partition by yearweek(completed_at)            order by rating desc, id desc        ) as row_desc    from reviews_areas ra    join reviews r on r.id = ra.review_id)select    week,    -- Take the average of the possibly 2 median rows    avg(rating)from rating_weekswhere    -- Find the rows which overlap +/- 1. These are the median rows.    row_asc in (row_desc, row_desc - 1, row_desc + 1)group by weekorder by week+--------+-------------+| week   | avg(rating) |+--------+-------------+| 201928 |      7.0000 || 201929 |      3.0000 || 201930 |      5.0000 |+--------+-------------+

Advertisement

Answer