I have the following table:
CREATE TABLE my_table (
    the_visitor_id        varchar(5) NOT NULL,
    the_visitor_visit     timestamp  NOT NULL,
    the_visitor_returning text
);
INSERT INTO my_table
VALUES ('VIS01', '2019-05-02 09:00:00', 'YES'),
       ('VIS01', '2019-05-04 12:00:00', NULL),
       ('VIS01', '2019-05-05 18:00:00', NULL),
       ('VIS02', '2019-05-06 18:30:00', NULL),
       ('VIS02', '2019-05-15 12:00:00', NULL),
       ('VIS03', '2019-06-30 18:00:00', 'YES'),
       ('VIS04', '2019-06-30 18:00:00', NULL);
I would like to filter out every the_visitor_id that has only one observation (row). In this case those are VIS03 and VIS04, so I should end up with only VIS01 and VIS02. I tried this:
SELECT DISTINCT ON(the_visitor_id) the_visitor_id,
the_visitor_visit, the_visitor_returning
FROM my_table
The expected result should be:
the_visitor_id | the_visitor_visit   | the_visitor_returning
---------------+---------------------+----------------------
VIS01          | 2019-05-02 09:00:00 | YES
VIS01          | 2019-05-04 12:00:00 |
VIS01          | 2019-05-05 18:00:00 |
VIS02          | 2019-05-06 18:30:00 |
VIS02          | 2019-05-15 12:00:00 |
But I guess that something like a rank is needed. Any help will be greatly appreciated.
Answer
There are probably other ways of doing this, but if you create a CTE (a named derived table) containing only the visitor_ids that have more than one row, you can then join it back to your table. If my_table is large, an index on the_visitor_id should improve performance.
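WITH cte AS (
    -- visitors that appear in more than one row
    SELECT the_visitor_id
    FROM my_table
    GROUP BY the_visitor_id
    HAVING count(*) > 1
)
SELECT my_table.*
FROM my_table
INNER JOIN cte ON cte.the_visitor_id = my_table.the_visitor_id
ORDER BY my_table.the_visitor_id, my_table.the_visitor_visit;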
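Since the question mentions something like a rank, a window function is one of those other ways. The sketch below assumes PostgreSQL (as the DISTINCT ON in the attempt suggests); the aliases counted and visits_per_visitor are illustrative names, not anything from the original schema. It counts each visitor's rows with count(*) OVER (PARTITION BY ...) and keeps only rows whose count is greater than 1. The count is computed in a subquery because a window function cannot be referenced directly in WHERE.

```sql
-- Alternative sketch: attach each visitor's row count to every row,
-- then keep only the rows whose visitor has more than one row.
SELECT the_visitor_id, the_visitor_visit, the_visitor_returning
FROM (
    SELECT t.*,
           -- number of rows sharing this row's the_visitor_id
           count(*) OVER (PARTITION BY t.the_visitor_id) AS visits_per_visitor
    FROM my_table AS t
) AS counted
WHERE visits_per_visitor > 1
ORDER BY the_visitor_id, the_visitor_visit;
```

On the sample data this returns the three VIS01 rows and the two VIS02 rows. It reads my_table in a single pass instead of twice, but whether it is actually faster than the CTE join depends on table size and indexes; EXPLAIN ANALYZE is the reliable way to compare them.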