Skip to content
Advertisement

Eliminate double quotes from a json field and deal with characters in PostgreSQL

Some time ago I needed help with a postgresql database with a jsonb field to convert numerical values surrounded by double quotes into standard numerical values and this solution worked very well: Eliminate double quotes from a json field in a selective manner in PostgreSQL

I had a field named “test_json” which is composed of for example these 2 rows:

{"test_name": "Full", "parameter1": "4.219", "parameter2": 4.4137}
{"test_name": "Full", "parameter1": "3.758", "parameter2": 4.159}

And the problem was to have these 2 lines after correction:

{"test_name": "Full", "parameter1": 4.219, "parameter2": 4.4137}
{"test_name": "Full", "parameter1": 3.758, "parameter2": 4.159}

So the proposed solution is really great…

However I have now cases where the database has been fed with a random character instead of a number in this case this conversion does not work. Example of bad datas:

{"test_name": "Full", "parameter1": "4.219", "parameter2": 4.4137}
{"test_name": "Full", "parameter1": "}", "parameter2": 4.159}

Or

{"test_name": "Full", "parameter1": "u", "parameter2": 4.4137}
{"test_name": "Full", "parameter1": "3.758", "parameter2": 4.159}

But it is hardly the same character so I struggle to find and replace them automatically…

Does anyone have an idea for a postgresql function in a json field to test if the value of the field “parameter1” can be converted to numerical and, if not, replace it with a missing value within this field? Like for example after final correction with old first line to “parameter1”: “u”

{"test_name": "Full,"parameter2": 4.4137}
{"test_name": "Full", "parameter1": 3.758, "parameter2": 4.159}

Thanks a lot!!

Advertisement

Answer

step-by-step demo:db<>fiddle

Deconstruct the JSON element, filter all numeric-like texts and reconstruct the resulting elements:

SELECT
    id,
    jsonb_object_agg(key, value::numeric)    -- 3
       ||  jsonb_build_object('test_name', test_json -> 'test_name') as test_json  -- 4
FROM
    t_test,
    jsonb_each_text(test_json) as objects    -- 1
WHERE value ~ '^[0-9.]+$'                   -- 2
GROUP BY id, test_json
  1. Deconstruction: All attributes to one row
  2. Filter all values that look like numbers
  3. Reaggregate the JSON object
  4. Add the “test_name” attribute manually, because this attribute was removed by the filter before.

The UPDATE statement:

UPDATE t_test t
SET test_json = s.test_json
FROM (
    -- <query above>
) s
WHERE t.id = s.id;

Edit: 2nd Version using regexp_replace():

  1. Step: Eliminate all non-numeric occurrences text-based using a RegExp:

demo:db<>fiddle

SELECT 
    regexp_replace(
        test_json::text, 
        '"((parameter1|parameter2)": "(?:(?![0-9.]+).)+?)"', '', 'g'
    )
FROM
    t_test

This uses the “negative lookbehind” solution from this answer: https://stackoverflow.com/a/977294/3984221

  1. Step: Eliminate the " characters with your former approach.
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement