Skip to content
Advertisement

PostgreSQL join table on json property and get oldest first from result with nulls first

I got 2 tables; domains and events. Im trying to create a query that returns a list of distinct domains that is ordered by the oldest events (for that domain) with nulls first and distinct domain.

Basically this query will do the job:

SELECT * FROM domains left join events on events.attributes ->> 'domain' = domains.domain AND events.name = 'event1' WHERE parent is null ORDER BY domain, moment asc nulls first;

But the output is not distinct on ‘domain’. When using ‘distinct on’, it gives the wrong output where the timestamp (moment) is not the latest one for that domain in the events table:

SELECT distinct on (domain) domain,moment FROM domains left join events on events.attributes ->> 'domain' = domains.domain AND events.name = 'event1' WHERE parent is null ORDER BY domain, moment asc nulls first;

Nor does this seem to work:

SELECT * FROM (SELECT DISTINCT on (domain) domain,moment,parent FROM "domains" left join events on events.attributes ->> 'domain' = domains.domain AND events.name = 'event1') AS domains ORDER BY moment asc nulls first;

The following code simulates the database and the query that i need to have distinct on without having distinct modify the order of the rows:

(Oh, and as you will see, the parent needs to be null so it only selects top domains. But thats a simple ‘where’.)

create table domains (domain text, parent text);
        insert into domains (domain, parent) values ('stackoverflow.com', null);
        insert into domains (domain, parent) values ('test.stackoverflow.com', 'stackoverflow.com');
        insert into domains (domain, parent) values ('github.com', null);
        insert into domains (domain, parent) values ('example.com', null);
        insert into domains (domain, parent) values ('google.com', null);
        
create table events (name text, attributes jsonb, moment timestamp with time zone);
        insert into events (name, attributes, moment) values('event1', '{"domain": "example.com"}', '2011-01-01');
        insert into events (name, attributes, moment) values('event1', '{"domain": "github.com"}', '2012-01-01');
        insert into events (name, attributes, moment) values('event1', '{"domain": "stackoverflow.com"}', '2013-01-01');
        insert into events (name, attributes, moment) values('event1', '{"domain": "example.com"}', '2014-01-01');
        insert into events (name, attributes, moment) values('event1', '{"domain": "stackoverflow.com"}', '2015-01-01');

SELECT * FROM domains left join events on events.attributes ->> 'domain' = domains.domain AND events.name = 'event1' WHERE parent is null ORDER BY domain, moment asc nulls first;

How do I get this to work?

Advertisement

Answer

Have you considered using a row_number window function? Something like

SELECT *
FROM
(
SELECT *, row_number() OVER (PARTITION BY events.attributes->>'domain' ORDER BY moment ASC) rn FROM domains left join events on events.attributes ->> 'domain' = domains.domain AND events.name = 'event1' WHERE parent is null
) a
WHERE rn = 1
ORDER BY domain, moment asc nulls first

The inner query uses row_number() OVER (PARTITION BY events.attributes->>'domain' ORDER BY moment ASC) rn to create a field that, for each events.attributes->>'domain' grouping, numbers things in order of events.moment. The outer query simply limits to the first one for each grouping and does the final ordering.

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement