Skip to content
Advertisement

Group by + sum for different thresholds (preferably in a single query)

Let’s say I have a table with 3 columns:

Column A: Client_ID (not unique, can appear in multiple rows)

Column B: Price paid for a product

Column C: Product (type)

+---+----+------+---+
|   | A  |  B   | C |
+---+----+------+---+
| 1 | 5  | 1200 | 1 | 
| 2 | 12 | 10   | 1 | 
| 3 | 12 | 15   | 1 | 
| 4 | 12 | 1750 | 2 | 
| 5 | 12 | 1250 | 2 | 
| 6 | 18 | 10   | 1 | 
| 7 | 18 | 1500 | 1 | 
| 8 | 18 | 1050 | 2 |
| 9 | 18 | 1050 | 2 |
+---+----+------+---+

I want to get 6 columns:

Client_ID | Product | Number_of_transactions | Sum_of_transactions | Number_of_transactions_over_1000 | Sum_of_transactions_over_1000

i.e

| Client_ID | Product | Nb_tr | Sum_tr | Nb_tr_1000 | Sum_tr_1000 |
+-----------+---------+-------+--------+------------+-------------+
|         5 |       1 |     1 |   1200 |          1 |        1200 |
|         5 |       2 |     0 |      0 |          0 |           0 |
|        12 |       1 |     2 |     25 |          0 |           0 |
|        12 |       2 |     2 |   3000 |          2 |        3000 |
|        18 |       1 |     2 |   1510 |          1 |        1500 |
|        18 |       2 |     2 |   2100 |          2 |        2100 |
+-----------+---------+-------+--------+------------+-------------+

With Client_ID x Product unique

I think the first part can be achieved with a simple GROUP BY

(SELECT ...
SUM(B) as Sum_of_transactions, COUNT(B) as Number_of_transactions 
...
GROUP BY A, C)

However, how can I get Number_of_transactions_over_1000 & Sum_of_transactions_over_1000 also? I know that I can obtain them with a separate query (by adding where B > 1000) and I can left join the query for Number_of_transactions & Sum_of_transactions with the query for Number_of_transactions_over_1000 & Sum_of_transactions_over_1000 afterwards, but can it be done in a single query (with a single SELECT instructions)?

Advertisement

Answer

Note: This will not extrapolate product_type, but gives correct results. Your input data table is named as data_p in following query, change accordingly. Also see if SAS supports CASE WHEN operation.[Because of no live SAS DB with me, i could not check it on SAS but this SQL is generic enough.]

SELECT
  clientId,
  p_type,
  SUM(price) as sum_of_tr,
  COUNT(*) as num_of_tr,
  SUM(CASE WHEN price>1000 THEN price ELSE 0 END) as sum_of_tr_over_1000,
  SUM(CASE WHEN price>1000 THEN 1     ELSE 0 END) as num_of_tr_over_1000 
FROM
  data_p
GROUP BY
  clientId, p_type
ORDER BY
 clientId, p_type
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement