I have a table with a column that contains a list of strings like below:
EXAMPLE:
STRING User_ID [...] "[""null"",""personal"",""Other""]" 2122213 .... "[""Other"",""to_dos_and_thing""]" 2132214 .... "[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]" 4342323 ....
QUESTION:
I want to be able to get a count of the amount of times each unique string appears (strings are seperable within the strings column by commas) but only know how to do the following:
SELECT u.STRING, count(u.USERID) as cnt FROM table u group by u.STRING order by cnt desc;
However the above method doesn’t work as it only counts the number of user ids that use a specific grouping of strings.
The ideal output using the example above would like this!
DESIRED OUTPUT:
STRING COUNT_Instances "null" 1223 "personal" 543 "Other" 324 "to_dos_and_thing" 221 "getting_things_done" 146 "Work!!!!!" 22
Advertisement
Answer
Based on your description, here is my sample table:
create table u (user_id number, string varchar); insert into u values (2122213, '"[""null"",""personal"",""Other""]"'), (2132214, '"[""Other"",""to_dos_and_thing""]"'), (2132215, '"[""getting_things_done"",""TO_dos_and_thing"",""Work!!!!!""]"' );
I used SPLIT_TO_TABLE to split each string as a row, and then REGEXP_SUBSTR to clean the data. So here’s the query and output:
select REGEXP_SUBSTR( s.VALUE, '""(.*)""', 1, 1, 'i', 1 ) extracted, count(*) from u, lateral SPLIT_TO_TABLE( string , ',' ) s GROUP BY extracted order by count(*) DESC; +---------------------+----------+ | EXTRACTED | COUNT(*) | +---------------------+----------+ | Other | 2 | | null | 1 | | personal | 1 | | to_dos_and_thing | 1 | | getting_things_done | 1 | | TO_dos_and_thing | 1 | | Work!!!!! | 1 | +---------------------+----------+
SPLIT_TO_TABLE https://docs.snowflake.com/en/sql-reference/functions/split_to_table.html REGEXP_SUBSTR https://docs.snowflake.com/en/sql-reference/functions/regexp_substr.html