Problem Description

I am looking for a SQL query that can create table 2 using table 1. I’ve thought of way to do this but it’s not very eloquent and would prefer something simpler if possible.

TABLE 1:

ID	Col_1	Col_2	Col_3	Col_4
01	A	Q	R	U
02	A	Q	R	P
03	A	Q	R	U
04	A	Q	R	P
05	A	Q	T	U
06	A	Q	T	P
07	A	Q	T	U
08	A	Q	T	P
09	A	Q	R	U
10	B	Q	R	P
11	B	Q	R	U
12	B	Q	R	P
13	B	Q	T	U
14	C	Q	T	P
15	C	Q	T	U
16	C	Q	T	U
17	C	Q	R	P
18	C	Q	R	U
19	C	Q	R	P
20	C	Q	R	U

TABLE 2:

ID	Col_1	Col_2	Col_3	Col_4
01	9	9	5	3
02	9	9	5	2
03	9	9	5	3
04	9	9	5	2
05	9	9	4	2
06	9	9	4	2
07	9	9	4	2
08	9	9	4	2
09	9	9	5	3
10	4	4	3	2
11	4	4	3	1
12	4	4	3	2
13	4	4	1	1
14	7	7	3	1
15	7	7	3	2
16	7	7	3	2
17	7	7	4	2
18	7	7	4	2
19	7	7	4	2
20	7	7	4	2

The numbers in Table 2 are generated as follows:

In Table 2’s Col_1, the number is the count of all IDs that have the same Col_1 value.
In Table 2’s Col_2, the number is the count of all IDs that have the same Col_1 AND Col_2 value.
In Table 2’s Col_3, the number is the count of all IDs that have the same Col_1 AND Col_2 AND Col_3 value.
In Table 2’s Col_4, the number is the count of all IDs that have the same Col_1 AND Col_2 AND Col_3 AND Col_4 value.

Example:
In Table 1, ID01’s Col_1 value is ‘A’. ‘A’ matches with 02, 03, 04, 05, 06, 07, 08, and 09 (8 other IDs) so the number is 9.
ID01’s Col_2 value is ‘Q’. ‘A’ in Col_1 AND ‘Q’ in Col_2 matches with 02, 03, 04, 05, 06, 07, 08, and 09 (8 other IDs) so the number is 9.
ID01’s Col_3 value is ‘R’. ‘A’ in Col_1 AND ‘Q’ in Col_2 AND ‘R’ in Col_3 matches with 02, 03, 04, and 09 (4 other IDs) so the number is 5.
ID01’s Col_4 value is ‘U’. ‘A’ in Col_1 AND ‘Q’ in Col_2 AND ‘R’ in Col_3 AND ‘U’ in Col_4 matches with 03 and 09 (2 other IDs) so the number is 3.

Essentially, we are trying to find the total number of identical IDs in Table 1 – with respect to the Col of interest.

What I’ve tried

I figured out a way to get the numbers in Table 2, with short SQL queries:

-- Somehow use in Table 2 
SELECT 
    col_1,
    col_2,
    col_3,
    col_4,
    COUNT() AS ct
FROM sample_table
GROUP BY 
    col_1,
    col_2,
    col_3,
    col_4
ORDER BY ct DESC;

this returns

Col_1	Col_2	Col_3	Col_4	ct
A	Q	R	U	3
A	Q	R	P	2
A	Q	T	P	2
A	Q	T	U	2
B	Q	R	P	2
C	Q	R	P	2
C	Q	R	U	2
C	Q	T	U	2
B	Q	R	U	1
B	Q	T	U	1
C	Q	T	P	1

which are the correct numbers (ct) for Col_4 of Table 2. I would just need to match the columns in this output and Table 1. I can also GROUP BYfewer Columns to get the correct numbers for Col_3, Col_2, and 1. e.g.

SELECT 
    col_1,
    col_2,
    col_3,
    COUNT() AS ct
FROM samptbl
GROUP BY 
    col_1,
    col_2,
    col_3
ORDER BY ct DESC;

Col_1	Col_2	Col_3	ct
A	Q	R	5
A	Q	T	4
C	Q	R	4
B	Q	R	3
C	Q	T	3
B	Q	T	1

I’m doing this in SQLite right now but I can use any other language if needed (PostgreSQL, Python). I would really like this to be 100% in a SQL if possible. The data sets that I’m working with are large, so good efficiency is desired.

Answer

You can do it with window function COUNT():

SELECT ID,
       COUNT(*) OVER (PARTITION BY Col_1) Col_1,
       COUNT(*) OVER (PARTITION BY Col_1, Col_2) Col_2,
       COUNT(*) OVER (PARTITION BY Col_1, Col_2, Col_3) Col_3,
       COUNT(*) OVER (PARTITION BY Col_1, Col_2, Col_3, Col_4) Col_4
FROM table1
ORDER BY ID

Without any changes it works for Postgresql also.
See the demo.

Counting identical records in a SQL table with respect to a column

Problem Description

What I’ve tried

Advertisement

Answer

ID	Col_1	Col_2	Col_3	Col_4
01	A	Q	R	U
02	A	Q	R	P
03	A	Q	R	U
04	A	Q	R	P
05	A	Q	T	U
06	A	Q	T	P
07	A	Q	T	U
08	A	Q	T	P
09	A	Q	R	U
10	B	Q	R	P
11	B	Q	R	U
12	B	Q	R	P
13	B	Q	T	U
14	C	Q	T	P
15	C	Q	T	U
16	C	Q	T	U
17	C	Q	R	P
18	C	Q	R	U
19	C	Q	R	P
20	C	Q	R	U

ID	Col_1	Col_2	Col_3	Col_4
01	9	9	5	3
02	9	9	5	2
03	9	9	5	3
04	9	9	5	2
05	9	9	4	2
06	9	9	4	2
07	9	9	4	2
08	9	9	4	2
09	9	9	5	3
10	4	4	3	2
11	4	4	3	1
12	4	4	3	2
13	4	4	1	1
14	7	7	3	1
15	7	7	3	2
16	7	7	3	2
17	7	7	4	2
18	7	7	4	2
19	7	7	4	2
20	7	7	4	2

ID	Col_1	Col_2	Col_3	Col_4
01	A	Q	R	U
02	A	Q	R	P
03	A	Q	R	U
04	A	Q	R	P
05	A	Q	T	U
06	A	Q	T	P
07	A	Q	T	U
08	A	Q	T	P
09	A	Q	R	U
10	B	Q	R	P
11	B	Q	R	U
12	B	Q	R	P
13	B	Q	T	U
14	C	Q	T	P
15	C	Q	T	U
16	C	Q	T	U
17	C	Q	R	P
18	C	Q	R	U
19	C	Q	R	P
20	C	Q	R	U

ID	Col_1	Col_2	Col_3	Col_4
01	9	9	5	3
02	9	9	5	2
03	9	9	5	3
04	9	9	5	2
05	9	9	4	2
06	9	9	4	2
07	9	9	4	2
08	9	9	4	2
09	9	9	5	3
10	4	4	3	2
11	4	4	3	1
12	4	4	3	2
13	4	4	1	1
14	7	7	3	1
15	7	7	3	2
16	7	7	3	2
17	7	7	4	2
18	7	7	4	2
19	7	7	4	2
20	7	7	4	2

ID	Col_1	Col_2	Col_3	Col_4
01	A	Q	R	U
02	A	Q	R	P
03	A	Q	R	U
04	A	Q	R	P
05	A	Q	T	U
06	A	Q	T	P
07	A	Q	T	U
08	A	Q	T	P
09	A	Q	R	U
10	B	Q	R	P
11	B	Q	R	U
12	B	Q	R	P
13	B	Q	T	U
14	C	Q	T	P
15	C	Q	T	U
16	C	Q	T	U
17	C	Q	R	P
18	C	Q	R	U
19	C	Q	R	P
20	C	Q	R	U

ID	Col_1	Col_2	Col_3	Col_4
01	9	9	5	3
02	9	9	5	2
03	9	9	5	3
04	9	9	5	2
05	9	9	4	2
06	9	9	4	2
07	9	9	4	2
08	9	9	4	2
09	9	9	5	3
10	4	4	3	2
11	4	4	3	1
12	4	4	3	2
13	4	4	1	1
14	7	7	3	1
15	7	7	3	2
16	7	7	3	2
17	7	7	4	2
18	7	7	4	2
19	7	7	4	2
20	7	7	4	2