Skip to content
Advertisement

Count string occurances within a list column – Snowflake/SQL

I have a table with a column that contains a list of strings like below:

EXAMPLE:

STRING                                                                 User_ID    [...]
"[""null"",""personal"",""Other""]"                                    2122213    .... 
"[""Other"",""to_dos_and_thing""]"                                     2132214    ....  
"[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]"         4342323    ....

QUESTION:

I want to be able to get a count of the amount of times each unique string appears (strings are seperable within the strings column by commas) but only know how to do the following:

SELECT u.STRING, count(u.USERID) as cnt
FROM table u
group by  u.STRING
order by cnt desc;

However the above method doesn’t work as it only counts the number of user ids that use a specific grouping of strings.

The ideal output using the example above would like this!

DESIRED OUTPUT:

STRING                     COUNT_Instances                                                             
"null"                     1223
"personal"                 543
"Other"                    324                  
"to_dos_and_thing"         221                                
"getting_things_done"      146
"Work!!!!!"                22 

Advertisement

Answer

Based on your description, here is my sample table:

create table u (user_id number, string varchar);

insert into u values
(2122213, '"[""null"",""personal"",""Other""]"'),
(2132214, '"[""Other"",""to_dos_and_thing""]"'),
(2132215, '"[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]"' );

I used SPLIT_TO_TABLE to split each string as a row, and then REGEXP_SUBSTR to clean the data. So here’s the query and output:

select REGEXP_SUBSTR( s.VALUE, '""(.*)""', 1, 1, 'i', 1 ) extracted, count(*) from u,
lateral SPLIT_TO_TABLE( string  , ',' ) s
GROUP BY extracted
order by count(*) DESC;


+---------------------+----------+
|      EXTRACTED      | COUNT(*) |
+---------------------+----------+
| Other               |        2 |
| null                |        1 |
| personal            |        1 |
| to_dos_and_thing    |        1 |
| getting_things_done |        1 |
| TO_dos_and_thing    |        1 |
| Work!!!!!           |        1 |
+---------------------+----------+

SPLIT_TO_TABLE https://docs.snowflake.com/en/sql-reference/functions/split_to_table.html REGEXP_SUBSTR https://docs.snowflake.com/en/sql-reference/functions/regexp_substr.html

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement