I have a PostgreSQL database with the following table:
CREATE TABLE dummy ( created_at TIMESTAMPTZ, tweet TEXT, label INT);
The following data is inserted:
INSERT INTO dummy VALUES ('2020-12-18 00:00:00+00', 'foo foo squared', 1);
INSERT INTO dummy VALUES ('2020-12-18 00:00:00+00', 'foo fox squared', 1);
INSERT INTO dummy VALUES ('2020-12-18 00:00:00+00', 'foo bar', 3);
INSERT INTO dummy VALUES ('2020-12-18 00:00:00+00', 'raven bar', 2);
INSERT INTO dummy VALUES ('2020-12-17 00:00:00+00', 'bar standard', 1);
INSERT INTO dummy VALUES ('2020-12-17 00:00:00+00', 'bar none', 2);
INSERT INTO dummy VALUES ('2020-12-17 00:00:00+00', 'bar', 1);
I have the following query that fetches the frequency of each word in the table and ranks them, grouping by day:
select *
from (
    select
        date_trunc('day', created_at) as created_day,
        word,
        count(*) as cnt,
        rank() over (partition by date_trunc('day', created_at) order by count(*) desc) rn
    from dummy d
    cross join lateral regexp_split_to_table(
        trim(regexp_replace(tweet, '\y(rt|co|https|amp|none)\y', '', 'g')),
        '\s+'
    ) w(word)
    group by created_day, word
) t
where (created_day > CURRENT_DATE - INTERVAL '10 days')
  and word IS NOT NULL
order by created_day DESC, rn;
Result:
      created_day       |   word   | cnt | rn
------------------------+----------+-----+----
 2020-12-18 00:00:00+00 | foo      |   4 |  1
 2020-12-18 00:00:00+00 | bar      |   2 |  2
 2020-12-18 00:00:00+00 | squared  |   2 |  2
 2020-12-18 00:00:00+00 | raven    |   1 |  4
 2020-12-18 00:00:00+00 | fox      |   1 |  4
 2020-12-17 00:00:00+00 | bar      |   3 |  1
 2020-12-17 00:00:00+00 | standard |   1 |  2
I would like to group by the label so that the new query shows the word rankings filtered by label like so:
      created_day       |   word   | cnt | rn | label
------------------------+----------+-----+----+-------
 2020-12-18 00:00:00+00 | foo      |   3 |  1 |     1
 2020-12-18 00:00:00+00 | squared  |   2 |  2 |     1
 2020-12-18 00:00:00+00 | fox      |   1 |  3 |     1
 2020-12-17 00:00:00+00 | bar      |   2 |  1 |     1
 2020-12-17 00:00:00+00 | standard |   1 |  2 |     1
I've tried adding a condition on label to the outer WHERE clause:
where (created_day > CURRENT_DATE - INTERVAL '10 days') and word IS NOT NULL and label = 1 order by created_day DESC, rn;
But I get the error: column "label" does not exist.
How would I go about fixing this?
Answer
Your new result shows that you want to count per day, word, and label, and to rank the words within each day and label. This means you just have to select label in the subquery and add it to your GROUP BY and PARTITION BY clauses (and probably to ORDER BY, too).
select *
from (
    select
        date_trunc('day', created_at) as created_day,
        word,
        label,
        count(*) as cnt,
        rank() over (partition by date_trunc('day', created_at), label order by count(*) desc) as rn
    from dummy d
    cross join lateral regexp_split_to_table(
        trim(regexp_replace(tweet, '\y(rt|co|https|amp|none)\y', '', 'g')),
        '\s+'
    ) w(word)
    group by created_day, word, label
) t
where (created_day > current_date - interval '10 days')
  and word is not null
order by created_day desc, label, rn;
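If you only want the rows for a single label, as in your expected output, a condition on label should now work in the outer WHERE clause. The original error most likely arose because the derived table t did not select label, so the outer query could not see it. A minimal sketch, assuming the dummy table and sample data from the question (the value 1 is only an example):

```sql
-- Sketch: the query above, restricted to a single label.
-- Assumes the dummy table from the question; label = 1 is an example value.
select *
from (
    select
        date_trunc('day', created_at) as created_day,
        word,
        label,
        count(*) as cnt,
        rank() over (
            partition by date_trunc('day', created_at), label
            order by count(*) desc
        ) as rn
    from dummy d
    cross join lateral regexp_split_to_table(
        trim(regexp_replace(tweet, '\y(rt|co|https|amp|none)\y', '', 'g')),
        '\s+'
    ) w(word)
    group by created_day, word, label
) t
where created_day > current_date - interval '10 days'
  and word is not null
  and label = 1   -- visible here because the subquery now selects label
order by created_day desc, rn;
```

Because rank() is partitioned by label, filtering in the outer query leaves the ranks unchanged. Moving the condition into the subquery (a WHERE label = 1 before the GROUP BY) should return the same rows and avoids aggregating labels that are discarded anyway.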