Some time ago I needed help with a postgresql database with a jsonb field to convert numerical values surrounded by double quotes into standard numerical values and this solution worked very well: Eliminate double quotes from a json field in a selective manner in PostgreSQL
I had a field named “test_json” which is composed of for example these 2 rows:
{"test_name": "Full", "parameter1": "4.219", "parameter2": 4.4137}
{"test_name": "Full", "parameter1": "3.758", "parameter2": 4.159}
And the problem was to have these 2 lines after correction:
{"test_name": "Full", "parameter1": 4.219, "parameter2": 4.4137}
{"test_name": "Full", "parameter1": 3.758, "parameter2": 4.159}
So the proposed solution is really great…
However I have now cases where the database has been fed with a random character instead of a number in this case this conversion does not work. Example of bad datas:
{"test_name": "Full", "parameter1": "4.219", "parameter2": 4.4137}
{"test_name": "Full", "parameter1": "}", "parameter2": 4.159}
Or
{"test_name": "Full", "parameter1": "u", "parameter2": 4.4137}
{"test_name": "Full", "parameter1": "3.758", "parameter2": 4.159}
But it is hardly the same character so I struggle to find and replace them automatically…
Does anyone have an idea for a postgresql function in a json field to test if the value of the field “parameter1” can be converted to numerical and, if not, replace it with a missing value within this field? Like for example after final correction with old first line to “parameter1”: “u”
{"test_name": "Full,"parameter2": 4.4137}
{"test_name": "Full", "parameter1": 3.758, "parameter2": 4.159}
Thanks a lot!!
Advertisement
Answer
Deconstruct the JSON element, filter all numeric-like texts and reconstruct the resulting elements:
SELECT
id,
jsonb_object_agg(key, value::numeric) -- 3
|| jsonb_build_object('test_name', test_json -> 'test_name') as test_json -- 4
FROM
t_test,
jsonb_each_text(test_json) as objects -- 1
WHERE value ~ '^[0-9.]+$' -- 2
GROUP BY id, test_json
- Deconstruction: All attributes to one row
- Filter all values that look like numbers
- Reaggregate the JSON object
- Add the “test_name” attribute manually, because this attribute was removed by the filter before.
The UPDATE
statement:
UPDATE t_test t
SET test_json = s.test_json
FROM (
-- <query above>
) s
WHERE t.id = s.id;
Edit: 2nd Version using regexp_replace()
:
- Step: Eliminate all non-numeric occurrences text-based using a RegExp:
SELECT
regexp_replace(
test_json::text,
'"((parameter1|parameter2)": "(?:(?![0-9.]+).)+?)"', '', 'g'
)
FROM
t_test
This uses the “negative lookbehind” solution from this answer: https://stackoverflow.com/a/977294/3984221
- Step: Eliminate the
"
characters with your former approach.