Skip to content
Advertisement

Speed up query on JSONB object field Postgres / indexing a JSONB field

I am trying to speed up queries on an object field in Postgres.

The table I am searching has the following structure:

  • page_id: integer
  • lang: varchar(2)
  • images: jsonb

The images JSONB field contains objects like this:

    {
      "-1": {
        "ns": 6,
        "known": "",
        "title": "File:Architrave nuraghe.jpg",
        "missing": "",
        "contentmodel": "wikitext",
        "pagelanguage": "it",
        "pagelanguagedir": "ltr",
        "pagelanguagehtmlcode": "it"
      },
      "-2": {
        "ns": 6,
        "known": "",
        "title": "File:Commons-logo.svg",
        "missing": "",
        "contentmodel": "wikitext",
        "pagelanguage": "it",
        "pagelanguagedir": "ltr",
        "pagelanguagehtmlcode": "it"
      },
    }

I need to retrieve all pages that use a certain file – which I do like so:

select * from (
       select lang, page_id, img.b::jsonb->>'title' as file
       from (
            select *
        from pages where jsonb_typeof(images_jsonb) ='object') a,
             jsonb_each(images_jsonb) as img(a, b)
        ) as q
where file = 'File:Vigoleno castello2.jpg';

It works ok but it’s dead slow – query plan is like this:

                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..199113.78 rows=3998 width=39)
   ->  Seq Scan on pages  (cost=0.00..193066.80 rows=3998 width=39)
         Filter: (jsonb_typeof(images_jsonb) = 'object'::text)
   ->  Function Scan on jsonb_each img  (cost=0.00..1.50 rows=1 width=32)
         Filter: ((b ->> 'title'::text) = 'File:Vigoleno castello2.jpg'::text)
(5 rows)

I am thinking that indexing on the object field, and I’ve read up a bit on indexing on object, but can’t find something about indexing on fields of fields.

I have thought about normalizing into another table as an alternative strategy, but I’d like to avoid that (keeping things in sync etc is a bit of a burden).

Any ideas?

Advertisement

Answer

You can’t do this with the normal jsonb index methods, as they don’t let you index into object values without specifying the keys leading to them.

You can use a helper function which converts your JSONB to an array of values.

create function jsonb_to_array(jsonb) returns text[] immutable language sql as $$ 
    select array_agg(value->>'title') from jsonb_each ($1)
$$;

create index on pages using gin (jsonb_to_array(images)) 
    where jsonb_typeof(images) ='object'

select * from pages where 
    jsonb_typeof(images_jsonb) ='object' and
    jsonb_to_array(images_jsonb) && ARRAY['File:Vigoleno castello2.jpg']

Note that I didn’t preserve the unnesting in your query, because I don’t know to what extent you wanted the unnested results, versus just doing the unnesting to provide a way to get at the correct row. You might need to filter on ‘File:Vigoleno castello2.jpg’ twice, once to get the correct row via the index, and once more to get the correct element from within the row.

There are many other variations on this theme. You could use the helper function to return a JSONB with an array of titles, rather than a text[]. Or you could make it return a JSONB of an array of objects, rather than an object of objects. If you did that, you could use @> to query into that array of objects.

create function jsonb_to_array2(jsonb) returns jsonb immutable language sql as $$ 
        select jsonb_agg(value||jsonb_build_object('key',key)) from jsonb_each ($1)
$$;

create index on pages using gin (jsonb_to_array2(images_jsonb)) 
    where jsonb_typeof(images_jsonb) ='object'

select * from pages where 
    jsonb_typeof(images_jsonb) ='object' and
    jsonb_to_array2(images_jsonb) @> '[{"title":"File:Architrave nuraghe.jpg"}]';
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement