Given a table of:
invitations (id, created_at, type, email)
Given the following query:
SELECT DISTINCT email_address FROM invitations WHERE type = 'email' AND date(created_at) in (curdate() - interval 3 day, curdate() - interval 8 day, curdate() - interval 13 day, curdate() - interval 21 day, curdate() - interval 34 day, curdate() - interval 50 day);
Given my query above, I get several hundred records. What I would like is to get the records grouped by email, meaning a unique email should exist at most once.
I therefore added:
GROUP BY email_address
This results in only 2 records when I run the query… Am I using GROUP BY
incorrectly?
Advertisement
Answer
There’s no point doing SELECT DISTINCT email ... GROUP BY email
– use either the DISTINCT or the GROUP BY to achieve the same result
If you got two email addresses out as a result, then the set of records selected only contained two unique email addresses
GROUP BY is used to aggregate data where some combination of columns is the same. Any other columns outside this combination must be contained within an agregation function like AVG, MAX, SUM etc
An example:
SELECT city, ethnicity, MAX(age), AVG(salary) FROM people GROUP BY city, ethnicity
It will present a unique set of city/ethnicity pairs, together with the oldest age and the average pay for the group
DISTINCT is essentially less useful than a GROUP BY, as it doesn’t let you run aggregation functions on other columns; it jsut returns a unique set of records across all selected columns. I consider it a convenience equivalent:
SELECT DISTINCT city, ethnicity FROM people SELECT city, ethnicity FROM people GROUP BY city, ethnicity --more to type