I have a query that returns data in the following sample:
SELECT timestamp, atm_id FROM TRANSACTIONS ORDER BY TIMESTAMP ASC;
Output
TIMESTAMP | ATM_ID | -------------------- 2010-01-01 | EP02 | 2010-01-01 | EP02 | 2010-01-28 | EP02 | 2010-02-07 | EP02 | 2010-02-09 | EP11 | 2010-03-19 | EP11 | 2010-03-19 | EP02 | 2010-04-03 | EP05 | 2010-04-30 | EP02 |
I know how to group by ATM_ID and put the count in-front of each
SELECT ATM_ID, COUNT(*) CNT FROM TRANSACTIONS GROUP BY ATM_ID;
Based on the sample data above, this will yield something like
ATM_ID | CNT --------------- EP02 | 6 EP11 | 2 EP05 | 1
However, I am interested in grouping on a different level. If a certain ATM_ID is duplicated in consecutive rows, the number of rows having the same ATM_ID in sequence should be included in the output, even if the same ATM_ID appears later after a different ATM_ID
Desired Output
ATM_ID | CNT --------------- EP02 | 4 --Four rows of ATM_ID EP02 EP11 | 2 --Followed by 2 rows of ATM_ID EP11 EP02 | 1 --Followed by 1 row of ATM_ID EP02 EP05 | 1 --Followed by 1 row of ATM_ID EP05 EP02 | 1 --Followed by 1 row of ATM_ID EP02
Ignore the comments on the right, these are just for clarifications, not part of the output. Is that possible?
PS: The answer below by Syed Aladeen gives the output with the correct count, but with the wrong order. I create an SQL fiddle for convenience:
SQL Fiddle
Advertisement
Answer
-- Oracle 12c+: pattern matching with s(dt, atm_id) as ( select to_date('2010-01-01', 'yyyy-mm-dd'), 'EP02' from dual union all select to_date('2010-01-01', 'yyyy-mm-dd'), 'EP02' from dual union all select to_date('2010-01-28', 'yyyy-mm-dd'), 'EP02' from dual union all select to_date('2010-02-07', 'yyyy-mm-dd'), 'EP02' from dual union all select to_date('2010-02-09', 'yyyy-mm-dd'), 'EP11' from dual union all select to_date('2010-03-19', 'yyyy-mm-dd'), 'EP11' from dual union all select to_date('2010-03-19', 'yyyy-mm-dd'), 'EP02' from dual union all select to_date('2010-04-03', 'yyyy-mm-dd'), 'EP05' from dual union all select to_date('2010-04-30', 'yyyy-mm-dd'), 'EP02' from dual) select * from s match_recognize ( order by dt measures v.atm_id as atm_id, count(v.atm_id) as cnt, first(dt) as min_dt, last (dt) as max_dt pattern (v+) define v as v.atm_id = first(atm_id) ); ATM_ CNT MIN_DT MAX_DT ---- ---------- ------------------- ------------------- EP02 4 2010-01-01 00:00:00 2010-02-07 00:00:00 EP11 2 2010-02-09 00:00:00 2010-03-19 00:00:00 EP02 1 2010-03-19 00:00:00 2010-03-19 00:00:00 EP05 1 2010-04-03 00:00:00 2010-04-03 00:00:00 EP02 1 2010-04-30 00:00:00 2010-04-30 00:00:00 Elapsed: 00:00:00.01 -- Oracle 8i+: window sort + window buffer + group by [+ order by] with s(dt, atm_id) as ( select to_date('2010-01-01', 'yyyy-mm-dd'), 'EP02' from dual union all select to_date('2010-01-01', 'yyyy-mm-dd'), 'EP02' from dual union all select to_date('2010-01-28', 'yyyy-mm-dd'), 'EP02' from dual union all select to_date('2010-02-07', 'yyyy-mm-dd'), 'EP02' from dual union all select to_date('2010-02-09', 'yyyy-mm-dd'), 'EP11' from dual union all select to_date('2010-03-19', 'yyyy-mm-dd'), 'EP11' from dual union all select to_date('2010-03-19', 'yyyy-mm-dd'), 'EP02' from dual union all select to_date('2010-04-03', 'yyyy-mm-dd'), 'EP05' from dual union all select to_date('2010-04-30', 'yyyy-mm-dd'), 'EP02' from dual) select atm_id, count(*) cnt, min(dt) min_dt, max(dt) as max_dt from (select dt, atm_id, count(lg) over (order by dt) ct, lg from (select dt, atm_id, decode(atm_id, lag(atm_id) over (order by dt), null, 1) lg from s ) ) group by ct, atm_id order by min_dt; ATM_ CNT MIN_DT MAX_DT ---- ---------- ------------------- ------------------- EP02 4 2010-01-01 00:00:00 2010-02-07 00:00:00 EP11 1 2010-02-09 00:00:00 2010-02-09 00:00:00 EP02 1 2010-03-19 00:00:00 2010-03-19 00:00:00 EP11 1 2010-03-19 00:00:00 2010-03-19 00:00:00 EP05 1 2010-04-03 00:00:00 2010-04-03 00:00:00 EP02 1 2010-04-30 00:00:00 2010-04-30 00:00:00 6 rows selected.