I have a JSON structure with two arrays saved in a JSONB column. A bit simplified it looks like this
{ "prop1": "abc", "prop2": "xyz", "items": [ { "itemId": "123", "price": "10.00" }, { "itemId": "124", "price": "9.00" }, { "itemId": "125", "price": "8.00" } ], "groups": [ { "groupId": "A", "discount": "20", "discountId": "1" }, { "groupId": "B", "discount": "30", "discountId": "2" }, { "groupId": "B", "discount": "20", "discountId": "3" }, { "groupId": "C", "discount": "40", "discountId": "4" } ] }
Schema:
CREATE TABLE campaign ( id TEXT PRIMARY KEY, data JSONB );
Since each row (data column) can be fairly large, I’m trying to filter out matching item objects and group objects from the items
and groups
arrays.
My current query is this
SELECT * FROM campaign WHERE (data -> 'items' @> '[{"productId": "123"}]') OR (data -> 'groups' @> '[{"groupId": "B"}]')
which returns rows containing either the matching group or the matching item. However, depending on the row, the data
column can be a fairly large JSON object (there may be hundreds of objects in items
and tens in groups
and I’ve omitted several keys/properties for brevity in this example) which is affecting query performance (I’ve added GIN indexes on the items
and groups
arrays, so missing indices is not why it’s slow).
How can I filter out the items
and groups
arrays to only contain matching elements?
Given this matching row
{ "prop1": "abc", "prop2": "xyz", "items": [ { "itemId": "123", "price": "10.00" }, { "itemId": "124", "price": "9.00" }, { "itemId": "125", "price": "8.00" } ], "groups": [ { "groupId": "A", "discount": "20", "discountId": "1" }, { "groupId": "B", "discount": "30", "discountId": "2" }, { "groupId": "B", "discount": "20", "discountId": "3" }, { "groupId": "C", "discount": "40", "discountId": "4" } ] }
I’d like the result to be something like this (the matching item/group could be in different columns from the rest of the data
column – doesn’t have to be returned in a single JSON object with two arrays like this, but I would prefer it if doesn’t affect performance or lead to a really hairy query):
{ "prop1": "abc", "prop2": "xyz", "items": [ { "itemId": "123", "price": "10.00" } ], "groups": [ { "groupId": "B" "discount": "20", "discountId": "3" } ] }
What I’ve managed to do so far is unwrap and match an object in the items
array using this query, which removes the ‘items’ array from the data
column and filters out the matching item
object to a separate column, but I’m struggling to join this with matches in the groups
array.
SELECT data - 'items', o.obj FROM campaign c CROSS JOIN LATERAL jsonb_array_elements(c.data #> '{items}') o(obj) WHERE o.obj ->> 'productId' = '124'
How can I filter both arrays in one query?
Bonus question: For the groups
array I also want to return the object with the lowest discount
value if possible. Or else the result would need to be an array of matching group objects instead of a single matching group
.
Related questions: How to filter jsonb array elements and How to join jsonb array elements in Postgres?
Advertisement
Answer
If your postgres version is 12 or more, you can use the jsonpath language and functions. The query below returns the expected result with the subset of items and groups which match the given criteria. Then you can adapt this query within a sql function so that the search criteria is an input parameter.
SELECT jsonb_set(jsonb_set( data , '{items}' , jsonb_path_query_array(data, '$.items[*] ? (@.itemId == "123" && @.price == "10.00")')) , '{groups}' , jsonb_path_query_array(data, '$.groups[*] ? (@.groupId == "B" && @.discount == "20" && @.discountId == "3")')) FROM (SELECT '{ "prop1": "abc", "prop2": "xyz", "items": [ { "itemId": "123", "price": "10.00" }, { "itemId": "124", "price": "9.00" }, { "itemId": "125", "price": "8.00" } ], "groups": [ { "groupId": "A", "discount": "20", "discountId": "1" }, { "groupId": "B", "discount": "30", "discountId": "2" }, { "groupId": "B", "discount": "20", "discountId": "3" }, { "groupId": "C", "discount": "40", "discountId": "4" } ] }' :: jsonb) AS d(data) WHERE jsonb_path_exists(data, '$.items[*] ? (@.itemId == "123" && @.price == "10.00")') AND jsonb_path_exists(data, '$.groups[*] ? (@.groupId == "B" && @.discount == "20" && @.discountId == "3")')