I have the following challenge: I have a table called hashtags_users_grouped
which has the following structure:
hashtag_id | user_id | count 123 1 1 245 1 3 123 2 5
In each row, we find values that tell me when a certain user mentioned a certain hashtag and how many times he did it. In this example, user 1 mentioned hashtag 123 one time and 245 three times, while user 2 only mentioned hashtag 123 five times.
I want to do a query that would give me the following output:
user | hashtag_123 | hashtag_245 1 1 3 2 5 0
In other words, the same information as the first table, but with a column per hashtag, to know the number of times a user mentioned each hashtag.
It would be easy to do this using a recursive method (like using a Pyspark data frame and iterating over each hashtag), but I am looking forward to achieving it in a single query. Do you know any way to do this?
EDIT: User #Larnu said I should use PIVOT
. How would you write a query using it? I tried but didn’t receive the expected results
Advertisement
Answer
Given the column naming you specify and the unknown list of hashtag values, I see you resorting to dynamic SQL.
However, here is a simple PIVOT example based on the data you shared that you may run in SSMS:
DECLARE @hashtags_users_grouped table ( hashtag_id int, [user_id] int, [count] int ); INSERT INTO @hashtags_users_grouped VALUES ( 123, 1, 1 ), ( 245, 1, 3 ), ( 123, 2, 5 ); SELECT [user_id], ISNULL( [123], 0 ) AS hashtag_123, ISNULL( [245], 0 ) AS hashtag_245 FROM @hashtags_users_grouped AS hug PIVOT ( MAX ( [count] ) FOR hashtag_id IN ( [123], [245] ) ) piv;
RETURNS
+---------+-------------+-------------+ | user_id | hashtag_123 | hashtag_245 | +---------+-------------+-------------+ | 1 | 1 | 3 | | 2 | 5 | 0 | +---------+-------------+-------------+
In a PIVOT
, your column values (e.g. 123, 245 ) get transposed into column headers, hence the FOR hashtag_id IN ( [123], [245] )
part. To do this without dynamic SQL you would have to list a “FOR [column]” for every possible hashtag_id value. Given the unknown size of this list, it would quickly become unmanageable trying to maintain the above code when a new value is introduced. So, dynamic SQL to the rescue.
Depending on your version of SQL Server, this is how I might approach it using Dynamic SQL:
DECLARE @headers varchar(MAX), @in varchar(MAX), @pivot varchar(MAX); SELECT @headers = STRING_AGG ( FORMATMESSAGE( 'ISNULL( [%i], 0 ) AS [hashtag_%i]', hashtag_id, hashtag_id ), ', ' ), @in = STRING_AGG( FORMATMESSAGE( '[%i]', hashtag_id ), ', ' ) FROM ( SELECT DISTINCT hashtag_id FROM hashtags_users_grouped ) AS x; SET @pivot = FORMATMESSAGE( 'SELECT [user_id], %s FROM hashtags_users_grouped PIVOT ( MAX( [count] ) FOR hashtag_id IN ( %s ) ) AS piv;' , @headers , @in ); PRINT @headers; PRINT @in; PRINT @pivot;
PRINT @headers;
ISNULL( [123], 0 ) AS [hashtag_123], ISNULL( [245], 0 ) AS [hashtag_245]
PRINT @in;
[123], [245]
PRINT @pivot;
SELECT [user_id], ISNULL( [123], 0 ) AS [hashtag_123], ISNULL( [245], 0 ) AS [hashtag_245] FROM hashtags_users_grouped PIVOT ( MAX( [count] ) FOR hashtag_id IN ( [123], [245] ) ) AS piv;
And finally, to execute the dynamic SQL:
EXEC ( @pivot );
Note: The dynamic SQL example does not reference the table variable.