I have an audit table that is structured as follows:
id customer field status timestamp 1 37399262 phone successful 2020-10-08 20:16:39 2 37399262 website failed 2020-10-08 20:16:39 3 37399262 website failed 2020-10-09 21:25:22 4 37399262 website successful 2020-10-10 09:08:35
This table is tracking whether we were able to successfully source a particular data point for a customer.
Sometimes we are able to find the data on the first attempt, and you know this because there was no entry with a status of failed, the first occurrence of a particular field was tagged successful (e.g. phone
)
Other times, we fail to find the data point on the first n searches but eventually are able to find it. You know this because are one or more entries with a status of failed, then later a status of successful (e.g. website
).
I am having trouble writing a MySQL query that would only select the fields where we previously failed to find the data point but then were able to find it.
Ideally, this query would produce an output of:
customer field success_id success_timestamp last_fail_id last_fail_timestamp 37399262 website 4 2020-10-10 09:08:35 3 2020-10-09 21:25:22
I couldn’t see any similar questions, although it’s tricky to describe what I am looking for. Any help would be appreciated!
Advertisement
Answer
For MySql 8.0+ you can do it with LAG() window function:
select customer, field, id success_id, timestamp success_timestamp, last_fail_id, last_fail_timestamp from ( select *, lag(status) over (partition by customer, field order by timestamp) prev_status, lag(id) over (partition by customer, field order by timestamp) last_fail_id, lag(timestamp) over (partition by customer, field order by timestamp) last_fail_timestamp from tablename ) t where status = 'successful' and prev_status = 'failed'
For previous versions of MySql, assuming that the column id
is increasing for any new attempt:
select t.customer, t.field, t.id success_id, t.timestamp success_timestamp, g.last_fail_id, g.last_fail_timestamp from tablename t inner join ( select customer, field, max(case when status = 'failed' then timestamp end) last_fail_timestamp, max(case when status = 'failed' then id end) last_fail_id from tablename group by customer, field having last_fail_timestamp is not null ) g on g.customer = t.customer and g.field = t.field where t.status = 'successful'
See the demo.
Results:
> customer | field | success_id | success_timestamp | last_fail_id | last_fail_timestamp > -------: | :------ | ---------: | :------------------ | -----------: | :------------------ > 37399262 | website | 4 | 2020-10-10 09:08:35 | 3 | 2020-10-09 21:25:22