We have content and country tables. Country is pretty simple: country_name column defined as string: Albania, Belgium, China, Denmark etc…
Content is a table with half a million of rows with various data with countries column defined as array text[]. Each value there has a number of countries concatenated like: {“denmark,finland,france,germany,ireland,gb,italy,netherlands,poland,russia,spain,sweden,australia,brazil,canada,china,india,indonesia,japan,malaysia,vietnam,mexico,”south korea”,thailand,usa,singapore,uae”}
The update from internal team is for a thousand of records and we are not sure if countries are all spelled correctly. So the task is to reconcile against the country_name in country table.
I am doing replace(replace(country_array::text,'{',''),'}','') as country_text
and think about doing UNPIVOT to check each column against country table.
Is there any other easier way to make sure countries array in Content table has valid country names from country table?
Thank you
Advertisement
Answer
You can unnest()
each array to a set of rows, and ensure that all values occur in the country
table. The following query gives you the array elements that are missing in the reference table:
select * from content c cross join lateral unnest(c.countries) as t(country_name) left join country y on y.country_name = t.country_name where y.country_name is null
country table:
id | country_name -: | :----------- 1 | albania 2 | denmark
content table:
id | countries -: | :---------------- 1 | {albania,denmark} 1 | {albania,france}
query results:
id | countries | country_name -: | :--------------- | :----------- 1 | {albania,france} | france