
Ideal Postgres Index For Json Data With Integer Timestamp

I have millions of records in this table using Amazon Aurora Postgres 10.7:

create table "somedb"."sometable"
(
    id varchar(4096) not null constraint "sometable_pkey" primary key,
    tag varchar(255) not null,
    jsondata jsonb not null
);

An example jsondata value:

{"id": "abc", "ts": 1580879910, "data": "my stuff"}

I have queries like this one that take dozens of seconds:

SELECT jsonData->'data'
FROM somedb.sometable
WHERE (jsonData->>'ts' >= '1576000473')
ORDER BY jsonData->>'ts' ASC LIMIT 100 OFFSET 50000;

I’m trying to improve performance here. These are the indexes I have tried, but at best I get an index scan in the query plan.

create index "sometable_ts"
on "somedb"."sometable" ((jsondata -> 'ts'::text));

create index "sometable_ts-int" 
on "somedb"."sometable" using btree (((jsondata ->> 'ts')::integer));

I adjusted my queries as well to use ORDER BY (jsonData->>'ts')::integer, but that did not help.
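
For reference, the adjusted query looked roughly like this (the same query as above, with the cast added to match the sometable_ts-int index expression):

SELECT jsonData->'data'
FROM somedb.sometable
WHERE (jsonData->>'ts')::integer >= 1576000473
ORDER BY (jsonData->>'ts')::integer ASC LIMIT 100 OFFSET 50000;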

Best plan:

Limit  (cost=613080.18..613149.46 rows=100 width=356) (actual time=24934.492..24937.344 rows=100 loops=1)
    ->  Index Scan using "sometable_ts-int" on "sometable"  (cost=0.43..3891408.61 rows=5616736 width=356) (actual time=0.068..24889.459 rows=885000 loops=1)
        Index Cond: (((jsondata ->> 'ts'::text))::integer >= 1576000473)
Planning time: 0.145 ms
Execution time: 24937.381 ms

Can anyone recommend a way to adjust the indexes or queries for these to become faster? Thanks!


Answer

Using a large OFFSET like this will always perform badly: the database has to fetch and sort all of the skipped rows before it can discard them.

You should use keyset pagination:

Create this index (the column order has to match the ORDER BY of the queries below):

CREATE INDEX ON somedb.sometable ((jsonData->>'ts'), id);

Then, to paginate, your first query is:

SELECT jsonData->'data'
FROM somedb.sometable
WHERE jsonData->>'ts' >= '1576000473'
ORDER BY jsonData->>'ts', id
LIMIT 100;

Remember the jsonData->>'ts' and id values from the last row of the result as last_ts and last_id.
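
To have those values available, it is probably easiest to also select the two sort keys in each page query, for example (a sketch of the first-page query with the sort keys added):

SELECT jsonData->'data', jsonData->>'ts' AS last_ts, id AS last_id
FROM somedb.sometable
WHERE jsonData->>'ts' >= '1576000473'
ORDER BY jsonData->>'ts', id
LIMIT 100;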

Your next page is found with

SELECT jsonData->'data'
FROM somedb.sometable
WHERE (jsonData->>'ts', id) > (last_ts, last_id)
ORDER BY jsonData->>'ts', id
LIMIT 100;

Keep going like this, and retrieving the 500th page will be as fast as retrieving the first.
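
For example, if the last row of a page had jsonData->>'ts' = '1580879910' and id = 'abc' (the values from the sample row in the question), the next page would be fetched with:

SELECT jsonData->'data'
FROM somedb.sometable
WHERE (jsonData->>'ts', id) > ('1580879910', 'abc')
ORDER BY jsonData->>'ts', id
LIMIT 100;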
