I have a table with two columns, user ids and strings. I want to add a third column that counts the number of strings within the second column that start with the entire string value in any given row. There is only one row per user.
The goal is to get the following table structure:
Here the count is equal to the number of rows in the string column that start with the given value in that row. I’ve tried using count(string like string + ‘%’) in combination with over(partition by string) but it doesn’t seem to work how I had hoped.
Any help is appreciated (btw I’m using SQL on Redshift).
Advertisement
Answer
The fastest way in Redshift is probably to use window functions. However, it requires a lot of verbosity — because you need a separate column for each string length:
select t.*, (case when string = string_1 then count(*) over (partition by string_1) when string = string_2 then count(*) over (partition by string_2) when string = string_3 then count(*) over (partition by string_3) end) from (select t.*, left(string, 1) as string_1, left(string, 2) as string_2, left(string, 3) as string_3 from t ) t;
Hmmm . . . The subquery is not needed:
select t.*, (case len(string) when 1 then count(*) over (partition by left(string, 1)) when 2 then count(*) over (partition by left(string, 2)) when 3 then count(*) over (partition by left(string, 3)) end) from t;