Skip to content
Advertisement

SQL Rows to column but NOT transpose

We have a table regarding identification where data is as such:

Customer Number Identification
101 passport-us
101 Licence
102 passport-uk
102 Licence
102 birth-cert
103 Licence-learner

Thus one customer may have submitted multiple identification others in varying number/combination.

Result we are after is :

Customer Number Passport Licence Birth Certificate
101 passport-us licence
102 passport-uk licence birth-cert
103 licence-learner

So basically taking the exact text and placing in the subsequent column.

What we know is:

  • there are only three identification type we are interested in (if there are others we dont catch)
  • they will contain within it the keyword (passport,birth, etc)
  • there is no given order to how customer number is coming in the table, there is a separate primary key else where in the bigger table.
  • we are aiming the query to run in hive

I was thinking along the lines of self join three times with REGEXP_EXTRACT perhaps.

Thoughts?

Advertisement

Answer

You can do it using pure sql like this:

select
  CustomerNumber,
  max(case when Identification like 'passport%' then Identification else null end) as Passport,
  max(case when Identification like 'Licence%' then Identification else null end) as Licence,
  max(case when Identification ='birth-cert' then Identification else null end) as BirthCertificate
from identification
group by CustomerNumber
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement