I’ve looked for an answer here to no avail; wondering if this problem is best suited outside SQL environment but thought I’d see how it could be solved.
I’m trying to look at web journeys and as a result need my data to be in the following format:
ID DATE EVENT 1 01/01/20 "Landing Page" 1 01/01/20 "purchase page" 2...etc
At present the data is in the following format:
ID DATE EVENT 1 01/01/20 "Landing page","purchase page" 2... etc
I essentially want to break out the event field by comma and create a new row for each comma separated event with all other fields duplicated. My SQL isn’t great, have tried numerous temp tables, unions and split_parts to break out the field to some success but cannot get it into its own row.
Advertisement
Answer
Indeed it works as @GMB suggested.
Only that the function StringTokenizerDelim()
is not in the default search path.
You’ll have to qualify it explicitly with the v_txtindex
schema in which it resides:
WITH -- your input indata(id,dt,ev) AS ( SELECT 1, DATE '2020-01-01','landing page,purchase page' UNION ALL SELECT 2, DATE '2020-01-02','landingpage,browse-article page,purchase page' ) , tokens AS ( SELECT id , dt , v_txtindex.StringTokenizerDelim(ev,',') OVER(PARTITION BY id,dt) FROM indata ) SELECT id , dt , words AS event FROM tokens; -- out id | dt | event -- out ----+------------+--------------------- -- out 1 | 2020-01-01 | landing page -- out 1 | 2020-01-01 | purchase page -- out 2 | 2020-01-02 | landingpage -- out 2 | 2020-01-02 | browse-article page -- out 2 | 2020-01-02 | purchase page