
Find IDs with only one observation in PostgreSQL

I have the following table:

CREATE TABLE my_table (
    the_visitor_id varchar(5) NOT NULL, 
    the_visitor_visit timestamp NOT NULL,
    the_visitor_returning text
);

INSERT INTO my_table
VALUES ('VIS01', '2019-05-02 09:00:00','YES' ), 
       ('VIS01', '2019-05-04 12:00:00',NULL ), 
       ('VIS01', '2019-05-05 18:00:00',NULL ), 
       ('VIS02', '2019-05-06 18:30:00',NULL), 
       ('VIS02', '2019-05-15 12:00:00',NULL), 
       ('VIS03', '2019-06-30 18:00:00','YES'),
       ('VIS04', '2019-06-30 18:00:00',NULL);

I would like to filter out every the_visitor_id that has only one observation (record). In this case those are VIS03 and VIS04, so I should end up with only VIS01 and VIS02. I tried this:

SELECT DISTINCT ON(the_visitor_id) the_visitor_id, 
the_visitor_visit, the_visitor_returning 
FROM my_table

The expected result should be:

the_visitor_id   the_visitor_visit    the_visitor_returning
VIS01            2019-05-02 09:00:00  YES
VIS01            2019-05-04 12:00:00    
VIS01            2019-05-05 18:00:00    
VIS02            2019-05-06 18:30:00    
VIS02            2019-05-15 12:00:00       

But I guess that something like a rank is needed. Any help will be greatly appreciated.

Answer

There are probably other ways of doing this, but one option is to create a CTE (a derived table) containing only the visitor IDs that have more than one row, then join it back to your table. If my_table is large, an index on the_visitor_id should help performance.

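WITH cte AS (
    -- visitor IDs that appear in more than one row
    SELECT the_visitor_id
    FROM my_table
    GROUP BY the_visitor_id
    HAVING count(*) > 1
)
SELECT my_table.*
FROM my_table
INNER JOIN cte ON cte.the_visitor_id = my_table.the_visitor_id
ORDER BY my_table.the_visitor_id, my_table.the_visitor_visit;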
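
Since the question mentions "something like a rank", here is one possible alternative using a window function instead of a join. It is only a sketch: the alias visits_per_visitor and the index name my_table_visitor_idx are made up for illustration, and whether the index actually helps depends on your data and query plan.

```sql
-- Optional: hypothetical index name; may help on a large table.
CREATE INDEX IF NOT EXISTS my_table_visitor_idx
    ON my_table (the_visitor_id, the_visitor_visit);

-- Count the rows per visitor with a window function, then keep only
-- the visitors that have more than one row.
SELECT the_visitor_id, the_visitor_visit, the_visitor_returning
FROM (
    SELECT t.*,
           count(*) OVER (PARTITION BY the_visitor_id) AS visits_per_visitor
    FROM my_table AS t
) AS counted
WHERE visits_per_visitor > 1
ORDER BY the_visitor_id, the_visitor_visit;
```

The window count has to be computed in a subquery because PostgreSQL does not allow window functions directly in a WHERE clause. With the sample data this should return the same VIS01 and VIS02 rows as the CTE version.
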
User contributions licensed under: CC BY-SA