My data is in a SQL table in the following format (I have about 20 different answers per user_id):
USER_ID ANSWER -------------------------- 1 CAR 1 10-20 miles 1 SALES 2 TRAIN 2 0-10 miles 2 TEACHER
I would like to create a view in PostgreSQL where all the data is shown according to User ID
USER_ID ANSWER1 ANSWER2 ANSWER3 ----------------------------------------------- 1 CAR 10-20 miles SALES 2 TRAIN 0-10 miles TEACHER
Thank you!
Advertisement
Answer
It is great example of not relational data (the semantic depends on row number), although the data are saved in relational database. You cannot to make relational data from not relational inside relational database. There is not any mechanism how to do this transformation safely. Theoretically, if physical order of your data is same like in your picture, then you can write queries (but you should not to use ORDER BY
clause, because it can change order, because your data doesn’t hold necessary data for correct ordering):
You need function:
CREATE OR REPLACE FUNCTION public.x_transformed() RETURNS TABLE(user_id integer, rowno integer, answer character varying) LANGUAGE plpgsql AS $function$ declare r record; prev_user_id int; begin for r in select * from x loop if user_id is distinct from r.user_id then rowno := 1; else rowno := rowno + 1; end if; user_id = r.user_id; answer = r.answer; return next; end loop; end; $function$
This function can add missing order information to relation. It is pretty ugly solution, but your input data are in very unhappy format, and there is not possible any clean solution (based on relational algebra):
postgres=# select * from x_transformed(); ┌─────────┬───────┬─────────────┐ │ user_id │ rowno │ answer │ ╞═════════╪═══════╪═════════════╡ │ 1 │ 1 │ CAR │ │ 1 │ 2 │ 10-20 miles │ │ 1 │ 3 │ SALES │ │ 2 │ 1 │ TRAIN │ │ 2 │ 2 │ 0-10 miles │ │ 2 │ 3 │ TEACHER │ └─────────┴───────┴─────────────┘ (6 rows)
Now, the transformation to the requested format is easy (search pivoting):
select user_id, max(answer) filter (where rowno = 1) as answer1, max(answer) filter (where rowno = 2) as answer2, max(answer) filter (where rowno = 3) as answer3 from x_transformed() group by user_id; ┌─────────┬─────────┬─────────────┬─────────┐ │ user_id │ answer1 │ answer2 │ answer3 │ ╞═════════╪═════════╪═════════════╪═════════╡ │ 2 │ TRAIN │ 0-10 miles │ TEACHER │ │ 1 │ CAR │ 10-20 miles │ SALES │ └─────────┴─────────┴─────────────┴─────────┘ (2 rows)
The main problem is in format of your input data. It can be good enough for file and application processing, but are terrible bad for storing (and processing) in relation database. The relation (or the table) is heap (not file).
With knowledge of Postgres’s internals, you can ensure expected order although you use some relational operations. You can use internal implicit column ctid
. ctid
is an unique address of row. This will not work on other databases, and it will not work on older Postgres releases:
select user_id, max(answer) filter (where rowno = 1) as answer1, max(answer) filter (where rowno = 2) as answer2, max(answer) filter (where rowno = 3) as answer3 from (select user_id, answer, row_number() over (partition by user_id order by ctid) as rowno from x) s group by user_id order by user_id; ┌─────────┬─────────┬─────────────┬─────────┐ │ user_id │ answer1 │ answer2 │ answer3 │ ╞═════════╪═════════╪═════════════╪═════════╡ │ 1 │ CAR │ 10-20 miles │ SALES │ │ 2 │ TRAIN │ 0-10 miles │ TEACHER │ └─────────┴─────────┴─────────────┴─────────┘ (2 rows)