Skip to content
Advertisement

Redshift get Json Keys

I have a table field with a value like this:

{
    "0fc8a2a1-e334-43b8-9311-ce46da9cd32c": {
        "alert": "345",
        "channel": "ios_push",
        "name": "Variant 1"
    },
    "4344d89b-7f0d-4453-b2c5-d0d4a39d7d25": {
        "channel": "ios_push",
        "name": "Control Group",
        "type": "control"
    }
}

I want to know if there is some way to get the “0fc8a2a1-e334-43b8-9311-ce46da9cd32c” and “4344d89b-7f0d-4453-b2c5-d0d4a39d7d25” values.

Advertisement

Answer

Redshift is not good with JSON, and especially not good with arbitrary JSON keys (as @GMB mentioned). Also it’s not good with nested data structures.

So actually, you have 2 problems:

  1. extacting the json keys. I see 2 options here:
  • Use python UDF
  • Use regex
  1. unnesting an array of keys into a table. There’s a trick to unnest data into rows (see CROSS JOIN with seq table in queries belowbelow) – described in this SO answer.

1. Solution with python UDF

you can implement json parsing in python and register it as a user defined function https://docs.aws.amazon.com/redshift/latest/dg/udf-python-language-support.html

function:

create or replace function f_py_json_keys (a varchar(65535))
    returns varchar(65535)
    stable
as $$
    import json
    return ",".join(json.loads(a).keys())
$$ language plpythonu;

query:

with input(json) as (
    select '{
    "0fc8a2a1-e334-43b8-9311-ce46da9cd32c": {
        "alert": "345",
        "channel": "ios_push",
        "name": "Variant 1"
    },
    "4344d89b-7f0d-4453-b2c5-d0d4a39d7d25": {
        "channel": "ios_push",
        "name": "Control Group",
        "type": "control"
    }
}'::varchar
), seq(idx) as (
    select 1 UNION ALL
    select 2 UNION ALL
    select 3 UNION ALL
    select 4 UNION ALL
    select 5
), input_with_occurences as (
    select f_py_json_keys(json) as keys,
           regexp_count(keys, ',') + 1 as number_of_occurrences
    from input
)
select
    split_part(keys, ',', idx) as id
from input_with_occurences cross join seq
where idx <= number_of_occurrences

2. Solution with REGEX magic

Redshift has some regex functions. That’s a working example that would do the job for the payload you specified:

with input(json) as (
    select '{
    "0fc8a2a1-e334-43b8-9311-ce46da9cd32c": {
        "alert": "345",
        "channel": "ios_push",
        "name": "Variant 1"
    },
    "4344d89b-7f0d-4453-b2c5-d0d4a39d7d25": {
        "channel": "ios_push",
        "name": "Control Group",
        "type": "control"
    }
}'::varchar
), seq(idx) as (
    select 1 UNION ALL
    select 2 UNION ALL
    select 3 UNION ALL
    select 4 UNION ALL
    select 5
), input_with_occurences as (
    select *,
           regexp_count(json,
                        '\{?\"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\":\s\{[\w\s\":,]+\}') as number_of_occurrences
    from input
)
select
       REGEXP_SUBSTR(json, '\{?\"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\":\s\{[\w\s\":,]+\}', 1, idx, 'e') as id
       from input_with_occurences cross join seq
        where idx <= number_of_occurrences

result looks like:

+------------------------------------+
|id                                  |
+------------------------------------+
|0fc8a2a1-e334-43b8-9311-ce46da9cd32c|
|4344d89b-7f0d-4453-b2c5-d0d4a39d7d25|
+------------------------------------+
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement