I have the following table:
CREATE TABLE my_table (
    the_visitor_id        varchar(5) NOT NULL,
    the_visitor_visit     timestamp  NOT NULL,
    the_visitor_returning text
);

INSERT INTO my_table VALUES
    ('VIS01', '2019-05-02 09:00:00', 'YES'),
    ('VIS01', '2019-05-04 12:00:00', NULL),
    ('VIS01', '2019-05-05 18:00:00', NULL),
    ('VIS02', '2019-05-06 18:30:00', NULL),
    ('VIS02', '2019-05-15 12:00:00', NULL),
    ('VIS03', '2019-06-30 18:00:00', 'YES'),
    ('VIS04', '2019-06-30 18:00:00', 'NULL');
I would like to filter out every the_visitor_id that has only one observation (record). In this case that means VIS03 and VIS04, so I should end up with only the rows for VIS01 and VIS02. I tried this:
SELECT DISTINCT ON (the_visitor_id)
       the_visitor_id,
       the_visitor_visit,
       the_visitor_returning
FROM my_table
The expected result should be:
the_visitor_id  the_visitor_visit    the_visitor_returning
VIS01           2019-05-02 09:00:00  YES
VIS01           2019-05-04 12:00:00
VIS01           2019-05-05 18:00:00
VIS02           2019-05-06 18:30:00
VIS02           2019-05-15 12:00:00
But I guess that something like a rank is needed. Any help will be greatly appreciated.
Answer
There are probably other ways of doing this, but one approach is to create a CTE containing only the the_visitor_id values that have more than one row, then join that CTE back to your table. If my_table is large, an index on the_visitor_id should improve performance.
WITH cte AS (
    SELECT the_visitor_id
    FROM my_table
    GROUP BY the_visitor_id
    HAVING count(*) > 1
)
SELECT my_table.*
FROM my_table
INNER JOIN cte
    ON cte.the_visitor_id = my_table.the_visitor_id
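Since the question mentions that something like a rank might be needed, here is a sketch of one alternative that uses a window function instead of a join. It counts the rows per the_visitor_id next to each row and keeps only those where the count is greater than 1. The alias names t, counted, and visit_count are made up for illustration, and the ORDER BY is only there to make the output match the expected result.

```sql
-- Alternative sketch: count rows per visitor with a window function,
-- then keep only visitors that have more than one row.
SELECT the_visitor_id,
       the_visitor_visit,
       the_visitor_returning
FROM (
    SELECT t.*,
           count(*) OVER (PARTITION BY the_visitor_id) AS visit_count  -- hypothetical alias
    FROM my_table AS t
) AS counted
WHERE visit_count > 1
ORDER BY the_visitor_id, the_visitor_visit;
```

The window function has to sit in a subquery because it cannot be referenced directly in the WHERE clause of the same query level. Whether this performs better or worse than the CTE join likely depends on the table size and available indexes, so it is worth comparing both with EXPLAIN ANALYZE.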