Skip to content
Advertisement

Why isn’t GROUP_BY returning the desired records?

Given a table of:

invitations (id, created_at, type, email)

Given the following query:

SELECT DISTINCT email_address
FROM invitations
WHERE type = 'email'
AND date(created_at) in (curdate() - interval 3 day, 
                           curdate() - interval 8 day,
                           curdate() - interval 13 day,
                           curdate() - interval 21 day,
                           curdate() - interval 34 day,
                           curdate() - interval 50 day);

Given my query above, I get several hundred records. What I would like is to get the records grouped by email, meaning a unique email should exist at most once.

I therefore added:

GROUP BY email_address

This results in only 2 records when I run the query… Am I using GROUP BY incorrectly?

Advertisement

Answer

There’s no point doing SELECT DISTINCT email ... GROUP BY email – use either the DISTINCT or the GROUP BY to achieve the same result

If you got two email addresses out as a result, then the set of records selected only contained two unique email addresses

GROUP BY is used to aggregate data where some combination of columns is the same. Any other columns outside this combination must be contained within an agregation function like AVG, MAX, SUM etc

An example:

SELECT city, ethnicity, MAX(age), AVG(salary)
FROM people
GROUP BY city, ethnicity

It will present a unique set of city/ethnicity pairs, together with the oldest age and the average pay for the group

DISTINCT is essentially less useful than a GROUP BY, as it doesn’t let you run aggregation functions on other columns; it jsut returns a unique set of records across all selected columns. I consider it a convenience equivalent:

SELECT DISTINCT city, ethnicity
FROM people

SELECT city, ethnicity
FROM people
GROUP BY city, ethnicity --more to type
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement