Some time ago I needed help with a postgresql database with a jsonb field to convert numerical values surrounded by double quotes into standard numerical values and this solution worked very well: Eliminate double quotes from a json field in a selective manner in PostgreSQL
I had a field named “test_json” which is composed of for example these 2 rows:
{"test_name": "Full", "parameter1": "4.219", "parameter2": 4.4137} {"test_name": "Full", "parameter1": "3.758", "parameter2": 4.159}
And the problem was to have these 2 lines after correction:
{"test_name": "Full", "parameter1": 4.219, "parameter2": 4.4137} {"test_name": "Full", "parameter1": 3.758, "parameter2": 4.159}
So the proposed solution is really great…
However I have now cases where the database has been fed with a random character instead of a number in this case this conversion does not work. Example of bad datas:
{"test_name": "Full", "parameter1": "4.219", "parameter2": 4.4137} {"test_name": "Full", "parameter1": "}", "parameter2": 4.159}
Or
{"test_name": "Full", "parameter1": "u", "parameter2": 4.4137} {"test_name": "Full", "parameter1": "3.758", "parameter2": 4.159}
But it is hardly the same character so I struggle to find and replace them automatically…
Does anyone have an idea for a postgresql function in a json field to test if the value of the field “parameter1” can be converted to numerical and, if not, replace it with a missing value within this field? Like for example after final correction with old first line to “parameter1”: “u”
{"test_name": "Full,"parameter2": 4.4137} {"test_name": "Full", "parameter1": 3.758, "parameter2": 4.159}
Thanks a lot!!
Advertisement
Answer
Deconstruct the JSON element, filter all numeric-like texts and reconstruct the resulting elements:
SELECT id, jsonb_object_agg(key, value::numeric) -- 3 || jsonb_build_object('test_name', test_json -> 'test_name') as test_json -- 4 FROM t_test, jsonb_each_text(test_json) as objects -- 1 WHERE value ~ '^[0-9.]+$' -- 2 GROUP BY id, test_json
- Deconstruction: All attributes to one row
- Filter all values that look like numbers
- Reaggregate the JSON object
- Add the “test_name” attribute manually, because this attribute was removed by the filter before.
The UPDATE
statement:
UPDATE t_test t SET test_json = s.test_json FROM ( -- <query above> ) s WHERE t.id = s.id;
Edit: 2nd Version using regexp_replace()
:
- Step: Eliminate all non-numeric occurrences text-based using a RegExp:
SELECT regexp_replace( test_json::text, '"((parameter1|parameter2)": "(?:(?![0-9.]+).)+?)"', '', 'g' ) FROM t_test
This uses the “negative lookbehind” solution from this answer: https://stackoverflow.com/a/977294/3984221
- Step: Eliminate the
"
characters with your former approach.