Skip to content
Advertisement

Counting identical records in a SQL table with respect to a column

Problem Description

I am looking for a SQL query that can create table 2 using table 1. I’ve thought of way to do this but it’s not very eloquent and would prefer something simpler if possible.

TABLE 1:

ID Col_1 Col_2 Col_3 Col_4
01 A Q R U
02 A Q R P
03 A Q R U
04 A Q R P
05 A Q T U
06 A Q T P
07 A Q T U
08 A Q T P
09 A Q R U
10 B Q R P
11 B Q R U
12 B Q R P
13 B Q T U
14 C Q T P
15 C Q T U
16 C Q T U
17 C Q R P
18 C Q R U
19 C Q R P
20 C Q R U

TABLE 2:

ID Col_1 Col_2 Col_3 Col_4
01 9 9 5 3
02 9 9 5 2
03 9 9 5 3
04 9 9 5 2
05 9 9 4 2
06 9 9 4 2
07 9 9 4 2
08 9 9 4 2
09 9 9 5 3
10 4 4 3 2
11 4 4 3 1
12 4 4 3 2
13 4 4 1 1
14 7 7 3 1
15 7 7 3 2
16 7 7 3 2
17 7 7 4 2
18 7 7 4 2
19 7 7 4 2
20 7 7 4 2

The numbers in Table 2 are generated as follows:

In Table 2’s Col_1, the number is the count of all IDs that have the same Col_1 value.
In Table 2’s Col_2, the number is the count of all IDs that have the same Col_1 AND Col_2 value.
In Table 2’s Col_3, the number is the count of all IDs that have the same Col_1 AND Col_2 AND Col_3 value.
In Table 2’s Col_4, the number is the count of all IDs that have the same Col_1 AND Col_2 AND Col_3 AND Col_4 value.

Example:
In Table 1, ID01’s Col_1 value is ‘A’. ‘A’ matches with 02, 03, 04, 05, 06, 07, 08, and 09 (8 other IDs) so the number is 9.
ID01’s Col_2 value is ‘Q’. ‘A’ in Col_1 AND ‘Q’ in Col_2 matches with 02, 03, 04, 05, 06, 07, 08, and 09 (8 other IDs) so the number is 9.
ID01’s Col_3 value is ‘R’. ‘A’ in Col_1 AND ‘Q’ in Col_2 AND ‘R’ in Col_3 matches with 02, 03, 04, and 09 (4 other IDs) so the number is 5.
ID01’s Col_4 value is ‘U’. ‘A’ in Col_1 AND ‘Q’ in Col_2 AND ‘R’ in Col_3 AND ‘U’ in Col_4 matches with 03 and 09 (2 other IDs) so the number is 3.

Essentially, we are trying to find the total number of identical IDs in Table 1 – with respect to the Col of interest.

What I’ve tried

I figured out a way to get the numbers in Table 2, with short SQL queries:

-- Somehow use in Table 2 
SELECT 
    col_1,
    col_2,
    col_3,
    col_4,
    COUNT() AS ct
FROM sample_table
GROUP BY 
    col_1,
    col_2,
    col_3,
    col_4
ORDER BY ct DESC;

this returns

Col_1 Col_2 Col_3 Col_4 ct
A Q R U 3
A Q R P 2
A Q T P 2
A Q T U 2
B Q R P 2
C Q R P 2
C Q R U 2
C Q T U 2
B Q R U 1
B Q T U 1
C Q T P 1

which are the correct numbers (ct) for Col_4 of Table 2. I would just need to match the columns in this output and Table 1. I can also GROUP BYfewer Columns to get the correct numbers for Col_3, Col_2, and 1. e.g.

SELECT 
    col_1,
    col_2,
    col_3,
    COUNT() AS ct
FROM samptbl
GROUP BY 
    col_1,
    col_2,
    col_3
ORDER BY ct DESC;
Col_1 Col_2 Col_3 ct
A Q R 5
A Q T 4
C Q R 4
B Q R 3
C Q T 3
B Q T 1

I’m doing this in SQLite right now but I can use any other language if needed (PostgreSQL, Python). I would really like this to be 100% in a SQL if possible. The data sets that I’m working with are large, so good efficiency is desired.

Advertisement

Answer

You can do it with window function COUNT():

SELECT ID,
       COUNT(*) OVER (PARTITION BY Col_1) Col_1,
       COUNT(*) OVER (PARTITION BY Col_1, Col_2) Col_2,
       COUNT(*) OVER (PARTITION BY Col_1, Col_2, Col_3) Col_3,
       COUNT(*) OVER (PARTITION BY Col_1, Col_2, Col_3, Col_4) Col_4
FROM table1
ORDER BY ID

Without any changes it works for Postgresql also.
See the demo.

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement