Question
Web_eventstable containid,…,channel,account_idaccountstable containid, …,sales_rep_idsales_repstable containsid,name
Given the above tables, write an SQL query to determine the number of times a particular channel was used in the web_events table for each name in sales_reps. Your final table should have three columns – the name of the sales_reps, the channel, and the number of occurrences. Order your table with the highest number of occurrences first.
Answer
SELECT s.name, w.channel, COUNT(*) num_events FROM accounts a JOIN web_events w ON a.id = w.account_id JOIN sales_reps s ON s.id = a.sales_rep_id GROUP BY s.name, w.channel ORDER BY num_events DESC;
The COUNT(*) is confusing to me. I don’t get how SQL figure out thatCOUNT(*) is COUNT(w.channel). Can anyone clarify?
Advertisement
Answer
I don’t get how SQL figure out that COUNT(*) is COUNT(w.channel)
COUNT() is an aggregation function that counts the number of rows that match a condition. In fact, COUNT(<expression>) in general (or COUNT(column) in particular) counts the the number of rows where the expression (or column) is not NULL.
In general, the following do exactly the same thing:
COUNT(*)COUNT(1)COUNT(<primary key used on inner join>)
In general, I prefer COUNT(*) because that is the SQL standard for this. I can accept COUNT(1) as a recognition that COUNT(*) is just feature bloat. However, I see no reason to use the third version, because it just requires excess typing.
More than that, I find that new users often get confused between these two constructs:
COUNT(w.channel)COUNT(DISTINCT w.channel)
People learning SQL often think the first really does the second. For this reason, I recommend sticking with the simpler ways of counting rows. Then use COUNT(DISTINCT) when you really want to incur the overhead to count unique values (COUNT(DISTINCT) is more expensive than COUNT()).