
Is there a way to count the number of unique values across multiple columns in SQL?

I would like to count the number of unique values for each tx_id. Here is part of the raw data:

The result should look like this:

From the raw data, you can see that there are two different tx_id values, and I use tx_id to identify each group. For instance, all rows with tx_id = '149362' belong to the same group.

Within columns pa3 and pa4, the values fall into 2 different groups, identified by their first 4 characters: “V16F” and “V15S”. I also need to count the number of different words in each group. For example, column pa3 contains V16F2021117016, V15S2021144019 and V16F2021117017, while column pa4 contains only V15S2021145018.

Therefore, the count is 2 for group “V16F” and 2 for group “V15S”. Note that the count does not depend on which column (pa3 or pa4) a value comes from; it depends on the last 4 characters. For example, V16F2021117016 and V16F2021117017 belong to the same group, “V16F”, but are different words because their last 4 characters are ‘7016’ and ‘7017’ respectively.
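As a minimal sketch of the counting rule itself (plain Python, not the asker's SQL), the code below pools pa3 and pa4, groups values by their first 4 characters, and counts distinct last-4-character suffixes per tx_id. The rows are hypothetical: they reuse the values quoted in the question, but how those values were spread across rows in the real table is an assumption.

```python
from collections import defaultdict

# Hypothetical rows (tx_id, pa3, pa4) built from the values quoted in the
# question; None stands for an empty pa4 cell.
ROWS = [
    ("149362", "V16F2021117016", "V15S2021145018"),
    ("149362", "V15S2021144019", "V15S2021145018"),
    ("149362", "V16F2021117017", None),
]


def count_unique(rows):
    """Map (tx_id, first 4 chars) to the number of distinct last-4-char suffixes."""
    suffixes = defaultdict(set)
    for tx_id, pa3, pa4 in rows:
        # pa3 and pa4 are treated as a single pool of values.
        for value in (pa3, pa4):
            if value:
                suffixes[(tx_id, value[:4])].add(value[-4:])
    return {key: len(found) for key, found in suffixes.items()}


print(count_unique(ROWS))  # {('149362', 'V16F'): 2, ('149362', 'V15S'): 2}
```

The set per group is what makes V15S2021145018 count once even though it appears in pa4 on two rows.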

However, I have not found a solution yet; below is the SQL I have tried so far. I hope someone can help.

Here is the wrong output:


Answer

The simplest way is to use UNION ALL to stack all pa3 and pa4 values into 1 column and then aggregate:
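A minimal sketch of the UNION ALL approach, run through Python's built-in sqlite3 module so it executes as-is. The table name `tx`, the helper `run`, and the sample rows are hypothetical. SQLite has no `LEFT`/`RIGHT` functions, so `substr(pa, 1, 4)` and `substr(pa, -4)` stand in for `LEFT(pa, 4)` and `RIGHT(pa, 4)` in dialects such as MySQL.

```python
import sqlite3

# Hypothetical sample rows (tx_id, pa3, pa4); None is an empty cell.
ROWS = [
    ("149362", "V16F2021117016", "V15S2021145018"),
    ("149362", "V15S2021144019", "V15S2021145018"),
    ("149362", "V16F2021117017", None),
]

# Stack pa3 and pa4 into one column `pa`, then group by tx_id and prefix,
# counting distinct last-4-character suffixes.
QUERY = """
SELECT tx_id,
       substr(pa, 1, 4)               AS grp,
       COUNT(DISTINCT substr(pa, -4)) AS cnt
FROM (
    SELECT tx_id, pa3 AS pa FROM tx
    UNION ALL
    SELECT tx_id, pa4 FROM tx
) AS t
WHERE pa IS NOT NULL
GROUP BY tx_id, substr(pa, 1, 4)
ORDER BY tx_id, grp
"""


def run(query):
    """Load ROWS into an in-memory table `tx` and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tx (tx_id TEXT, pa3 TEXT, pa4 TEXT)")
    conn.executemany("INSERT INTO tx VALUES (?, ?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(QUERY))  # [('149362', 'V15S', 2), ('149362', 'V16F', 2)]
```

Grouping on the prefix expression rather than hard-coding 'V16F' and 'V15S' means new prefixes show up as extra rows without changing the query.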

Or use UNION, which removes duplicate rows, so DISTINCT is no longer needed:
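A sketch of the UNION variant, again executed with sqlite3 against a hypothetical table `tx` filled by a hypothetical helper `run`. Each branch emits (tx_id, prefix, suffix), so UNION drops repeated suffixes and a plain COUNT over a CASE expression is enough. The pivoted shape (one column per group) and the hard-coded group names 'V16F' and 'V15S' are assumptions about the desired result.

```python
import sqlite3

# Hypothetical sample rows (tx_id, pa3, pa4); None is an empty cell.
ROWS = [
    ("149362", "V16F2021117016", "V15S2021145018"),
    ("149362", "V15S2021144019", "V15S2021145018"),
    ("149362", "V16F2021117017", None),
]

# UNION (not UNION ALL) removes duplicate (tx_id, grp, suffix) rows, so each
# distinct word is counted once; rows from empty cells have grp = NULL and
# match neither CASE branch.
QUERY = """
SELECT tx_id,
       COUNT(CASE WHEN grp = 'V16F' THEN 1 END) AS V16F,
       COUNT(CASE WHEN grp = 'V15S' THEN 1 END) AS V15S
FROM (
    SELECT tx_id, substr(pa3, 1, 4) AS grp, substr(pa3, -4) AS suffix FROM tx
    UNION
    SELECT tx_id, substr(pa4, 1, 4), substr(pa4, -4) FROM tx
) AS t
GROUP BY tx_id
"""


def run(query):
    """Load ROWS into an in-memory table `tx` and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tx (tx_id TEXT, pa3 TEXT, pa4 TEXT)")
    conn.executemany("INSERT INTO tx VALUES (?, ?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(QUERY))  # [('149362', 2, 2)]
```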

Which can be further simplified to:
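The simplified query from the answer is not reproduced here. One plausible simplification, assuming a dialect where a comparison evaluates to 1 or 0 (MySQL and SQLite both behave this way), replaces a `COUNT(CASE WHEN ... THEN 1 END)` pattern with `SUM(condition)`. The table `tx`, the helper `run`, and the rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows (tx_id, pa3, pa4); None is an empty cell.
ROWS = [
    ("149362", "V16F2021117016", "V15S2021145018"),
    ("149362", "V15S2021144019", "V15S2021145018"),
    ("149362", "V16F2021117017", None),
]

# grp = 'V16F' yields 1 or 0 (NULL for empty cells, which SUM ignores),
# so summing it counts the deduplicated words in that group.
QUERY = """
SELECT tx_id,
       SUM(grp = 'V16F') AS V16F,
       SUM(grp = 'V15S') AS V15S
FROM (
    SELECT tx_id, substr(pa3, 1, 4) AS grp, substr(pa3, -4) AS suffix FROM tx
    UNION
    SELECT tx_id, substr(pa4, 1, 4), substr(pa4, -4) FROM tx
) AS t
GROUP BY tx_id
"""


def run(query):
    """Load ROWS into an in-memory table `tx` and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tx (tx_id TEXT, pa3 TEXT, pa4 TEXT)")
    conn.executemany("INSERT INTO tx VALUES (?, ?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(QUERY))  # [('149362', 2, 2)]
```

In dialects without boolean-to-integer conversion (for example PostgreSQL or SQL Server) the CASE form is the portable choice.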

Another way is to use conditional aggregation directly on the table, with more complicated logic that works for this sample data:
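A sketch of conditional aggregation without any UNION, run with sqlite3 on a hypothetical table `tx` through a hypothetical helper `run`. Each group adds the distinct suffix counts from pa3 and pa4 separately, so a suffix that appeared in both columns for the same tx_id and group would be counted twice; that limitation is why this style only works for data shaped like the sample.

```python
import sqlite3

# Hypothetical sample rows (tx_id, pa3, pa4); None is an empty cell.
ROWS = [
    ("149362", "V16F2021117016", "V15S2021145018"),
    ("149362", "V15S2021144019", "V15S2021145018"),
    ("149362", "V16F2021117017", None),
]

# Count distinct suffixes per column and add them. Assumes no suffix is
# shared between pa3 and pa4 within a group, otherwise it is double-counted.
QUERY = """
SELECT tx_id,
       COUNT(DISTINCT CASE WHEN pa3 LIKE 'V16F%' THEN substr(pa3, -4) END)
     + COUNT(DISTINCT CASE WHEN pa4 LIKE 'V16F%' THEN substr(pa4, -4) END) AS V16F,
       COUNT(DISTINCT CASE WHEN pa3 LIKE 'V15S%' THEN substr(pa3, -4) END)
     + COUNT(DISTINCT CASE WHEN pa4 LIKE 'V15S%' THEN substr(pa4, -4) END) AS V15S
FROM tx
GROUP BY tx_id
"""


def run(query):
    """Load ROWS into an in-memory table `tx` and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tx (tx_id TEXT, pa3 TEXT, pa4 TEXT)")
    conn.executemany("INSERT INTO tx VALUES (?, ?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(QUERY))  # [('149362', 2, 2)]
```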

See the demo.

User contributions licensed under: CC BY-SA