I am having trouble querying some data. The table I am trying to pull the data from is a LOG table, where I would like to see changes in the values next to each other (example below)
Table:
+-----------+----+-------------+----------+------------+ | UNIQUE_ID | ID | NAME | CITY | DATE | +-----------+----+-------------+----------+------------+ | xa220 | 1 | John Smith | Berlin | 2020.05.01 | | xa195 | 1 | John Smith | Berlin | 2020.03.01 | | xa111 | 1 | John Smith | München | 2020.01.01 | | xa106 | 2 | James Brown | Atlanta | 2018.04.04 | | xa100 | 2 | James Brown | Boston | 2017.12.10 | | xa76 | 3 | Emily Wolf | Shanghai | 2016.11.03 | | xa20 | 3 | Emily Wolf | Shanghai | 2016.07.03 | | xa15 | 3 | Emily Wolf | Tokyo | 2014.02.22 | | xa12 | 3 | Emily Wolf | null | 2014.02.22 | +-----------+----+-------------+----------+------------+
Desired outcome:
+----+-------------+----------+---------------+ | ID | NAME | CITY | PREVIOUS_CITY | +----+-------------+----------+---------------+ | 1 | John Smith | Berlin | München | | 2 | James Brown | Atlanta | Boston | | 3 | Emily Wolf | Shanghai | Tokyo | | 3 | Emily Wolf | Tokyo | null | +----+-------------+----------+---------------+
I have been trying to use FIRST and LAST values, however, cannot get the desired outcome.
select distinct id, name, city, first_value(city) over (partition by id order by city) as previous_city from test
Any help is appreciated! Thank you!
Advertisement
Answer
Use the LAG
function to get the city for previous date and display only the rows where current city and the result of lag are different:
WITH cte AS ( SELECT t.*, LAG(CITY, 1, CITY) OVER (PARTITION BY ID ORDER BY "DATE") LAG_CITY FROM yourTable t ) SELECT ID, NAME, CITY, LAG_CITY AS PREVIOUS_CITY FROM cte WHERE CITY <> LAG_CITY OR CITY IS NULL AND LAG_CITY IS NOT NULL OR CITY IS NOT NULL AND LAG_CITY IS NULL ORDER BY ID, "DATE" DESC;
Some comments on how LAG
is being used and its values checked are warranted. We use the three parameter version of LAG
here. The second parameter means the number of records to look back, which in this case is 1 (the default). The third parameter means the default value to use should a given record per ID
partition be the first. In this case, we use the default as the same CITY
value. This means that the first record would never appear in the result set.
For the WHERE
clause above, a matching record is one for which the city and lag city are different, or for where one of the two be NULL
and the other not NULL
. This is the logic needed to treat a NULL
city and some not NULL
city value as being different.