Skip to content
Advertisement

Getting count of distinct column with window functions in MySQL 8

I have an MVP DB fiddle: https://www.db-fiddle.com/f/cUn1Lo2xhbTAUwwV5q9wKV/2

I am trying to get the number of unique shift_ids in the table on any date using window functions.

I tried to use COUNT(DISTINCT(shift_id)) but that is not supported on MySQL 8 with window functions at the moment.

Just in case the fiddle goes down. Here is the test schema:

CREATE TABLE `scores` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `shift_id` int unsigned NOT NULL,
  `employee_name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `score` double(8,2) unsigned NOT NULL,
  `created_at` timestamp NOT NULL,
  PRIMARY KEY (`id`)
);

INSERT INTO scores(shift_id, employee_name, score, created_at) VALUES
(1, "John", 6.72, "2020-04-01 00:00:00"),
(1, "Bob", 15.71, "2020-04-01 00:00:00"),
(1, "Bob", 54.02, "2020-04-01 00:00:00"),
(1, "John", 23.55, "2020-04-01 00:00:00"),

(2, "John", 9.13, "2020-04-02 00:00:00"),
(2, "Bob", 44.76, "2020-04-02 00:00:00"),
(2, "Bob", 33.40, "2020-04-02 00:00:00"),
(2, "James", 20, "2020-04-02 00:00:00"),

(3, "John", 20, "2020-04-02 00:00:00"),
(3, "Bob", 20, "2020-04-02 08:00:00"),
(3, "Bob", 30, "2020-04-02 08:00:00"),
(3, "James", 10, "2020-04-02 08:00:00")

And my query which has two attempted methods using what I saw on this post: Count distinct in window functions

SELECT
    ANY_VALUE(employee_name) AS `employee_name`,
    DATE(created_at) AS `shift_date`,
    COUNT(*) OVER (PARTITION BY ANY_VALUE(created_at), ANY_VALUE(shift_id)) AS `shifts_on_day_1`,

    (
        dense_rank() over (partition by ANY_VALUE(created_at) order by ANY_VALUE(shift_id) asc) +
        dense_rank() over (partition by ANY_VALUE(created_at) order by ANY_VALUE(shift_id) desc) - 1
    ) as `shifts_on_day_2`

FROM scores
    GROUP BY employee_name, DATE(created_at);

The expected result would be any row with the date of 2020-04-01 would have a shifts_on_day of 1 and the rows with the date of 2nd April would have shifts_on_day at 2.

I have considered using a correlated subquery but that is a performance nightmare with millions of rows in the table and thousands being returned in the query.

Update: I think the necessity for window functions is that there is already a group by in the query. All the data is needed in one query with the end goal bring to get the average_score of each employees on a specific day. To get that total score for each employee I can just COUNT(*). But then I need to divide that by the total shifts in the day to get the average.

Update

The end result is to be able to get the total number of rows per employee per date in the table divided by the total number of shits that occurred on that date – that will provide the average row count in that date per employee.

Expected result is hence:

name  | shift_date | avrg
------+------------+-----
Bob   | 2020-04-01 | 2     2 / 1 = 2 ; two rows for Bob, one shift_id (1) that day
Bob   | 2020-04-02 | 2     4 / 2 = 2 ; four rows for Bob, two shift_ids (2,3) that day
James | 2020-04-02 | 1     2 / 2 = 1 ; two rows for James, two shift_ids (2,3) that day
John  | 2020-04-01 | 2     2 / 1 = 2 ; two rows for John, one shift_id (1) that day
John  | 2020-04-02 | 1     2 / 2 = 1 ; two rows for John, two shift_ids (2,3) that day

Advertisement

Answer

“All rows per date and employee” and “distinct count of IDs per date” are two complete different aggregations; you cannot do one aggregation and somehow retrieve the other aggregation from the elsewise aggregated rows. This rules window functions on the aggregation result out.

You need two separate aggregations instead. For instance:

with empdays as
(
  select employee_name, date(created_at) as shift_date, count(*) as total
  from scores
  group by employee_name, date(created_at)
)
, days as 
(
  select date(created_at) as shift_date, count(distinct shift_id) as total
  from scores
  group by date(created_at)
)
select ed.employee_name, shift_date, ed.total / d.total as average
from empdays ed
join days d using (shift_date)
order by ed.employee_name, shift_date;

Demo: https://www.db-fiddle.com/f/qjqbibriXtos6Hsi5qcwi6/0

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement