Skip to content
Advertisement

PostgreSQL parse countries in array against the countries table

We have content and country tables. Country is pretty simple: country_name column defined as string: Albania, Belgium, China, Denmark etc…

Content is a table with half a million of rows with various data with countries column defined as array text[]. Each value there has a number of countries concatenated like: {“denmark,finland,france,germany,ireland,gb,italy,netherlands,poland,russia,spain,sweden,australia,brazil,canada,china,india,indonesia,japan,malaysia,vietnam,mexico,”south korea”,thailand,usa,singapore,uae”}

The update from internal team is for a thousand of records and we are not sure if countries are all spelled correctly. So the task is to reconcile against the country_name in country table.

I am doing replace(replace(country_array::text,'{',''),'}','') as country_text and think about doing UNPIVOT to check each column against country table. Is there any other easier way to make sure countries array in Content table has valid country names from country table?

Thank you

Advertisement

Answer

You can unnest() each array to a set of rows, and ensure that all values occur in the country table. The following query gives you the array elements that are missing in the reference table:

select *
from 
    content c
    cross join lateral unnest(c.countries) as t(country_name)
    left join country y on y.country_name = t.country_name
where y.country_name is null

Demo on DB Fiddle

country table:

id | country_name
-: | :-----------
 1 | albania     
 2 | denmark     

content table:

id | countries        
-: | :----------------
 1 | {albania,denmark}
 1 | {albania,france} 

query results:

id | countries        | country_name
-: | :--------------- | :-----------
 1 | {albania,france} | france      
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement