Find IDs with only one observation in PostgreSQL

I have the following table:

CREATE TABLE my_table (
    the_visitor_id varchar(5) NOT NULL, 
    the_visitor_visit timestamp NOT NULL,
    the_visitor_returning text
);

INSERT INTO my_table
VALUES ('VIS01', '2019-05-02 09:00:00','YES' ), 
       ('VIS01', '2019-05-04 12:00:00',NULL ), 
       ('VIS01', '2019-05-05 18:00:00',NULL ), 
       ('VIS02', '2019-05-06 18:30:00',NULL), 
       ('VIS02', '2019-05-15 12:00:00',NULL), 
       ('VIS03', '2019-06-30 18:00:00','YES'),
       ('VIS04', '2019-06-30 18:00:00','NULL');

 

I would like to filter out every the_visitor_id that has only one observation (record). In this case that means dropping VIS03 and VIS04, so I should end up with only the rows for VIS01 and VIS02. I tried this:

SELECT DISTINCT ON(the_visitor_id) the_visitor_id, 
the_visitor_visit, the_visitor_returning 
FROM my_table

 

The expected result should be:

the_visitor_id   the_visitor_visit    the_visitor_returning
VIS01            2019-05-02 09:00:00  YES
VIS01            2019-05-04 12:00:00    
VIS01            2019-05-05 18:00:00    
VIS02            2019-05-06 18:30:00    
VIS02            2019-05-15 12:00:00

 

But I guess that something like a rank is needed. Any help will be greatly appreciated.

Answer

There are probably other ways of doing this, but you can create a CTE containing only the the_visitor_id values that have more than one row, then join it back to your table. If my_table is large, an index on the_visitor_id would likely improve performance.

WITH cte
AS (
    SELECT the_visitor_id
    FROM my_table
    GROUP BY the_visitor_id
    HAVING count(*) > 1
    )
SELECT my_table.*
FROM my_table
INNER JOIN cte ON cte.the_visitor_id = my_table.the_visitor_id
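
Since the question mentions "something like a rank", here is a sketch of an alternative that uses a window function instead of a join. It counts the rows per the_visitor_id with count(*) OVER (PARTITION BY ...) and keeps only the visitors whose count is greater than 1. The alias visit_count, the subquery name counted, and the index name are illustrative assumptions, not part of the original schema.

```sql
-- Alternative sketch: attach a per-visitor row count to every row,
-- then keep only the rows of visitors that appear more than once.
SELECT the_visitor_id, the_visitor_visit, the_visitor_returning
FROM (
    SELECT t.*,
           count(*) OVER (PARTITION BY the_visitor_id) AS visit_count  -- illustrative alias
    FROM my_table AS t
) AS counted
WHERE visit_count > 1
ORDER BY the_visitor_id, the_visitor_visit;

-- Optional: an index on the id column (the index name is hypothetical).
CREATE INDEX my_table_visitor_id_idx ON my_table (the_visitor_id);
```

With the sample data from the question, this should return the three VIS01 rows and the two VIS02 rows. Whether it performs better than the CTE join depends on the table size and the query plan, so it is worth comparing both with EXPLAIN ANALYZE.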

 

Answer