Skip to content
Advertisement

How does a SQL query transform the data?

I came across this question and was trying to understand how the answer given there is actually transforming the data.

Input Table

+---------+-------+---------------+
| user_id | State | Subscriptions |
+---------+-------+---------------+
|       1 | LA    |             4 |
|       2 | LA    |             4 |
|       3 | LA    |            12 |
|       4 | LA    |            12 |
|       5 | LA    |             8 |
|       6 | LA    |             3 |
|       7 | NY    |            14 |
|       8 | NY    |            15 |
|       9 | NY    |             3 |
|      10 | NY    |             2 |
|      11 | NY    |             4 |
|      12 | NY    |            12 |
|      13 | OH    |             6 |
|      14 | OH    |             8 |
|      15 | OH    |             2 |
|      16 | OH    |             3 |
+---------+-------+---------------+

Output Table

+--------------------+----+----+----+
| Subscription_Range | LA | NY | OH |
+--------------------+----+----+----+
| 1 to 4             |  3 |  3 |  2 |
| 5 to 11            |  1 |  0 |  2 |
| 12 to 15           |  2 |  3 |  0 |
+--------------------+----+----+----+

Answer given by Gordon Linoff:

 select (case when subscriptions <= 4 then '1 to 4'
                 when subscriptions <= 11 then '5 to 11'
                 when subscriptions <= 15 then '12 to 15'
            end) as subscription_range,
           sum(case when state = 'LA' then 1 else 0 end) as LA,
           sum(case when state = 'NY' then 1 else 0 end) as NY,
           sum(case when state = 'OH' then 1 else 0 end) as OH
    from t
    group by (case when subscriptions <= 4 then '1 to 4'
                   when subscriptions <= 11 then '5 to 11'
                   when subscriptions <= 15 then '12 to 15'
              end)
    order by min(subscriptions);

I wanted to understand fundamentally how this query executes.

For example:

  1. When the first row is selected will the Subscriptions column be checked first? (Since it is the first that is checked with a case in the query.)
  2. After checking it turns out that it should be assigned 1 to 4. Now what next?
  3. Will the state column be checked? It turns out that it is LA but I am not getting how the further execution is happening. I am trying to imagine the table forming before aggregation.

Does SQL operate in a row-wise fashion? As in, each and every row is picked up from the database and the corresponding part of the query is applied to each column? (Like case is applied to the Subscriptions column in this case.)

Advertisement

Answer

When evaluating the expressions within the select clause prior to aggregation, you can imagine that the following table would be obtained for the given dataset:

+--------------------+-----+----+----+
| subscription_range | LA  | NY | OH |
+--------------------+-----+----+----+
| '1 to 4'           |   1 |  0 |  0 |
| '1 to 4'           |   1 |  0 |  0 |
| '12 to 15'         |   1 |  0 |  0 |
| '12 to 15'         |   1 |  0 |  0 |
| '5 to 11'          |   1 |  0 |  0 |
| '1 to 4'           |   1 |  0 |  0 |
| '12 to 15'         |   0 |  1 |  0 |
| '12 to 15'         |   0 |  1 |  0 |
| '1 to 4'           |   0 |  1 |  0 |
| '1 to 4'           |   0 |  1 |  0 |
| '1 to 4'           |   0 |  1 |  0 |
| '12 to 15'         |   0 |  1 |  0 |
| '5 to 11'          |   0 |  0 |  1 |
| '5 to 11'          |   0 |  0 |  1 |
| '1 to 4'           |   0 |  0 |  1 |
| '1 to 4'           |   0 |  0 |  1 |
+--------------------+-----+----+----+

Here, for each row in the dataset, the first case statement yields a string, and the subsequent case statements yield either a 1 or 0 depending on whether the state column validates the test expression.

When aggregated, the query calculates the same set of subscription_range values and groups by this data such that each subscription_range is distinct.

The remaining numerical data in the subsequent columns is then summed over each group by the sum expressions enclosing each case statement, yielding:

+--------------------+----+----+----+
| subscription_range | LA | NY | OH |
+--------------------+----+----+----+
| 1 to 4             |  3 |  3 |  2 |
| 5 to 11            |  1 |  0 |  2 |
| 12 to 15           |  2 |  3 |  0 |
+--------------------+----+----+----+
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement