I came across this question and was trying to understand how the answer given there is actually transforming the data.
Input Table
+---------+-------+---------------+ | user_id | State | Subscriptions | +---------+-------+---------------+ | 1 | LA | 4 | | 2 | LA | 4 | | 3 | LA | 12 | | 4 | LA | 12 | | 5 | LA | 8 | | 6 | LA | 3 | | 7 | NY | 14 | | 8 | NY | 15 | | 9 | NY | 3 | | 10 | NY | 2 | | 11 | NY | 4 | | 12 | NY | 12 | | 13 | OH | 6 | | 14 | OH | 8 | | 15 | OH | 2 | | 16 | OH | 3 | +---------+-------+---------------+
Output Table
+--------------------+----+----+----+ | Subscription_Range | LA | NY | OH | +--------------------+----+----+----+ | 1 to 4 | 3 | 3 | 2 | | 5 to 11 | 1 | 0 | 2 | | 12 to 15 | 2 | 3 | 0 | +--------------------+----+----+----+
Answer given by Gordon Linoff:
select (case when subscriptions <= 4 then '1 to 4'
when subscriptions <= 11 then '5 to 11'
when subscriptions <= 15 then '12 to 15'
end) as subscription_range,
sum(case when state = 'LA' then 1 else 0 end) as LA,
sum(case when state = 'NY' then 1 else 0 end) as NY,
sum(case when state = 'OH' then 1 else 0 end) as OH
from t
group by (case when subscriptions <= 4 then '1 to 4'
when subscriptions <= 11 then '5 to 11'
when subscriptions <= 15 then '12 to 15'
end)
order by min(subscriptions);
I wanted to understand fundamentally how this query executes.
For example:
- When the first row is selected will the
Subscriptionscolumn be checked first? (Since it is the first that is checked with acasein the query.) - After checking it turns out that it should be assigned
1 to 4. Now what next? - Will the state column be checked? It turns out that it is
LAbut I am not getting how the further execution is happening. I am trying to imagine the table forming before aggregation.
Does SQL operate in a row-wise fashion? As in, each and every row is picked up from the database and the corresponding part of the query is applied to each column? (Like case is applied to the Subscriptions column in this case.)
Advertisement
Answer
When evaluating the expressions within the select clause prior to aggregation, you can imagine that the following table would be obtained for the given dataset:
+--------------------+-----+----+----+ | subscription_range | LA | NY | OH | +--------------------+-----+----+----+ | '1 to 4' | 1 | 0 | 0 | | '1 to 4' | 1 | 0 | 0 | | '12 to 15' | 1 | 0 | 0 | | '12 to 15' | 1 | 0 | 0 | | '5 to 11' | 1 | 0 | 0 | | '1 to 4' | 1 | 0 | 0 | | '12 to 15' | 0 | 1 | 0 | | '12 to 15' | 0 | 1 | 0 | | '1 to 4' | 0 | 1 | 0 | | '1 to 4' | 0 | 1 | 0 | | '1 to 4' | 0 | 1 | 0 | | '12 to 15' | 0 | 1 | 0 | | '5 to 11' | 0 | 0 | 1 | | '5 to 11' | 0 | 0 | 1 | | '1 to 4' | 0 | 0 | 1 | | '1 to 4' | 0 | 0 | 1 | +--------------------+-----+----+----+
Here, for each row in the dataset, the first case statement yields a string, and the subsequent case statements yield either a 1 or 0 depending on whether the state column validates the test expression.
When aggregated, the query calculates the same set of subscription_range values and groups by this data such that each subscription_range is distinct.
The remaining numerical data in the subsequent columns is then summed over each group by the sum expressions enclosing each case statement, yielding:
+--------------------+----+----+----+ | subscription_range | LA | NY | OH | +--------------------+----+----+----+ | 1 to 4 | 3 | 3 | 2 | | 5 to 11 | 1 | 0 | 2 | | 12 to 15 | 2 | 3 | 0 | +--------------------+----+----+----+