Question
Web_events
table containid
,…,channel
,account_id
accounts
table containid
, …,sales_rep_id
sales_reps
table containsid
,name
Given the above tables, write an SQL query to determine the number of times a particular channel
was used in the web_events
table for each name in sales_reps
. Your final table should have three columns – the name of the sales_reps
, the channel
, and the number of occurrences. Order your table with the highest number of occurrences first.
Answer
SELECT s.name, w.channel, COUNT(*) num_events FROM accounts a JOIN web_events w ON a.id = w.account_id JOIN sales_reps s ON s.id = a.sales_rep_id GROUP BY s.name, w.channel ORDER BY num_events DESC;
The COUNT(*)
is confusing to me. I don’t get how SQL figure out thatCOUNT(*)
is COUNT(w.channel)
. Can anyone clarify?
Advertisement
Answer
I don’t get how SQL figure out that COUNT(*) is COUNT(w.channel)
COUNT()
is an aggregation function that counts the number of rows that match a condition. In fact, COUNT(<expression>)
in general (or COUNT(column)
in particular) counts the the number of rows where the expression (or column) is not NULL
.
In general, the following do exactly the same thing:
COUNT(*)
COUNT(1)
COUNT(<primary key used on inner join>)
In general, I prefer COUNT(*)
because that is the SQL standard for this. I can accept COUNT(1)
as a recognition that COUNT(*)
is just feature bloat. However, I see no reason to use the third version, because it just requires excess typing.
More than that, I find that new users often get confused between these two constructs:
COUNT(w.channel)
COUNT(DISTINCT w.channel)
People learning SQL often think the first really does the second. For this reason, I recommend sticking with the simpler ways of counting rows. Then use COUNT(DISTINCT)
when you really want to incur the overhead to count unique values (COUNT(DISTINCT)
is more expensive than COUNT()
).