I have a query that returns data in the following sample:
x
SELECT timestamp, atm_id FROM TRANSACTIONS ORDER BY TIMESTAMP ASC;
Output
TIMESTAMP | ATM_ID |
--------------------
2010-01-01 | EP02 |
2010-01-01 | EP02 |
2010-01-28 | EP02 |
2010-02-07 | EP02 |
2010-02-09 | EP11 |
2010-03-19 | EP11 |
2010-03-19 | EP02 |
2010-04-03 | EP05 |
2010-04-30 | EP02 |
I know how to group by ATM_ID and put the count in-front of each
SELECT
ATM_ID,
COUNT(*) CNT
FROM
TRANSACTIONS
GROUP BY
ATM_ID;
Based on the sample data above, this will yield something like
ATM_ID | CNT
---------------
EP02 | 6
EP11 | 2
EP05 | 1
However, I am interested in grouping on a different level. If a certain ATM_ID is duplicated in consecutive rows, the number of rows having the same ATM_ID in sequence should be included in the output, even if the same ATM_ID appears later after a different ATM_ID
Desired Output
ATM_ID | CNT
---------------
EP02 | 4 --Four rows of ATM_ID EP02
EP11 | 2 --Followed by 2 rows of ATM_ID EP11
EP02 | 1 --Followed by 1 row of ATM_ID EP02
EP05 | 1 --Followed by 1 row of ATM_ID EP05
EP02 | 1 --Followed by 1 row of ATM_ID EP02
Ignore the comments on the right, these are just for clarifications, not part of the output. Is that possible?
PS: The answer below by Syed Aladeen gives the output with the correct count, but with the wrong order. I create an SQL fiddle for convenience:
SQL Fiddle
Advertisement
Answer
-- Oracle 12c+: pattern matching
with s(dt, atm_id) as (
select to_date('2010-01-01', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-01-01', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-01-28', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-02-07', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-02-09', 'yyyy-mm-dd'), 'EP11' from dual union all
select to_date('2010-03-19', 'yyyy-mm-dd'), 'EP11' from dual union all
select to_date('2010-03-19', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-04-03', 'yyyy-mm-dd'), 'EP05' from dual union all
select to_date('2010-04-30', 'yyyy-mm-dd'), 'EP02' from dual)
select *
from s
match_recognize (
order by dt
measures v.atm_id as atm_id,
count(v.atm_id) as cnt,
first(dt) as min_dt,
last (dt) as max_dt
pattern (v+)
define v as v.atm_id = first(atm_id)
);
ATM_ CNT MIN_DT MAX_DT
---- ---------- ------------------- -------------------
EP02 4 2010-01-01 00:00:00 2010-02-07 00:00:00
EP11 2 2010-02-09 00:00:00 2010-03-19 00:00:00
EP02 1 2010-03-19 00:00:00 2010-03-19 00:00:00
EP05 1 2010-04-03 00:00:00 2010-04-03 00:00:00
EP02 1 2010-04-30 00:00:00 2010-04-30 00:00:00
Elapsed: 00:00:00.01
-- Oracle 8i+: window sort + window buffer + group by [+ order by]
with s(dt, atm_id) as (
select to_date('2010-01-01', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-01-01', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-01-28', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-02-07', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-02-09', 'yyyy-mm-dd'), 'EP11' from dual union all
select to_date('2010-03-19', 'yyyy-mm-dd'), 'EP11' from dual union all
select to_date('2010-03-19', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-04-03', 'yyyy-mm-dd'), 'EP05' from dual union all
select to_date('2010-04-30', 'yyyy-mm-dd'), 'EP02' from dual)
select atm_id, count(*) cnt, min(dt) min_dt, max(dt) as max_dt
from
(select dt, atm_id, count(lg) over (order by dt) ct, lg
from
(select dt, atm_id, decode(atm_id, lag(atm_id) over (order by dt), null, 1) lg
from s
)
)
group by ct, atm_id
order by min_dt;
ATM_ CNT MIN_DT MAX_DT
---- ---------- ------------------- -------------------
EP02 4 2010-01-01 00:00:00 2010-02-07 00:00:00
EP11 1 2010-02-09 00:00:00 2010-02-09 00:00:00
EP02 1 2010-03-19 00:00:00 2010-03-19 00:00:00
EP11 1 2010-03-19 00:00:00 2010-03-19 00:00:00
EP05 1 2010-04-03 00:00:00 2010-04-03 00:00:00
EP02 1 2010-04-30 00:00:00 2010-04-30 00:00:00
6 rows selected.