I have an MVP DB fiddle: https://www.db-fiddle.com/f/cUn1Lo2xhbTAUwwV5q9wKV/2
I am trying to get the number of unique shift_id
s in the table on any date using window functions.
I tried to use COUNT(DISTINCT(shift_id))
but that is not supported on MySQL 8 with window functions at the moment.
Just in case the fiddle goes down. Here is the test schema:
CREATE TABLE `scores` ( `id` bigint unsigned NOT NULL AUTO_INCREMENT, `shift_id` int unsigned NOT NULL, `employee_name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL, `score` double(8,2) unsigned NOT NULL, `created_at` timestamp NOT NULL, PRIMARY KEY (`id`) ); INSERT INTO scores(shift_id, employee_name, score, created_at) VALUES (1, "John", 6.72, "2020-04-01 00:00:00"), (1, "Bob", 15.71, "2020-04-01 00:00:00"), (1, "Bob", 54.02, "2020-04-01 00:00:00"), (1, "John", 23.55, "2020-04-01 00:00:00"), (2, "John", 9.13, "2020-04-02 00:00:00"), (2, "Bob", 44.76, "2020-04-02 00:00:00"), (2, "Bob", 33.40, "2020-04-02 00:00:00"), (2, "James", 20, "2020-04-02 00:00:00"), (3, "John", 20, "2020-04-02 00:00:00"), (3, "Bob", 20, "2020-04-02 08:00:00"), (3, "Bob", 30, "2020-04-02 08:00:00"), (3, "James", 10, "2020-04-02 08:00:00")
And my query which has two attempted methods using what I saw on this post: Count distinct in window functions
SELECT ANY_VALUE(employee_name) AS `employee_name`, DATE(created_at) AS `shift_date`, COUNT(*) OVER (PARTITION BY ANY_VALUE(created_at), ANY_VALUE(shift_id)) AS `shifts_on_day_1`, ( dense_rank() over (partition by ANY_VALUE(created_at) order by ANY_VALUE(shift_id) asc) + dense_rank() over (partition by ANY_VALUE(created_at) order by ANY_VALUE(shift_id) desc) - 1 ) as `shifts_on_day_2` FROM scores GROUP BY employee_name, DATE(created_at);
The expected result would be any row with the date of 2020-04-01 would have a shifts_on_day
of 1 and the rows with the date of 2nd April would have shifts_on_day
at 2.
I have considered using a correlated subquery but that is a performance nightmare with millions of rows in the table and thousands being returned in the query.
Update: I think the necessity for window functions is that there is already a group by in the query. All the data is needed in one query with the end goal bring to get the average_score of each employees on a specific day. To get that total score for each employee I can just COUNT(*)
. But then I need to divide that by the total shifts in the day to get the average.
Update
The end result is to be able to get the total number of rows per employee per date in the table divided by the total number of shits that occurred on that date – that will provide the average row count in that date per employee.
Expected result is hence:
name | shift_date | avrg ------+------------+----- Bob | 2020-04-01 | 2 2 / 1 = 2 ; two rows for Bob, one shift_id (1) that day Bob | 2020-04-02 | 2 4 / 2 = 2 ; four rows for Bob, two shift_ids (2,3) that day James | 2020-04-02 | 1 2 / 2 = 1 ; two rows for James, two shift_ids (2,3) that day John | 2020-04-01 | 2 2 / 1 = 2 ; two rows for John, one shift_id (1) that day John | 2020-04-02 | 1 2 / 2 = 1 ; two rows for John, two shift_ids (2,3) that day
Advertisement
Answer
“All rows per date and employee” and “distinct count of IDs per date” are two complete different aggregations; you cannot do one aggregation and somehow retrieve the other aggregation from the elsewise aggregated rows. This rules window functions on the aggregation result out.
You need two separate aggregations instead. For instance:
with empdays as ( select employee_name, date(created_at) as shift_date, count(*) as total from scores group by employee_name, date(created_at) ) , days as ( select date(created_at) as shift_date, count(distinct shift_id) as total from scores group by date(created_at) ) select ed.employee_name, shift_date, ed.total / d.total as average from empdays ed join days d using (shift_date) order by ed.employee_name, shift_date;