Skip to content
Advertisement

How to aggregate count of rows having the same value in a sequence?

I have a query that returns data in the following sample:

SELECT timestamp, atm_id FROM TRANSACTIONS ORDER BY TIMESTAMP ASC;

Output

TIMESTAMP  | ATM_ID | 
--------------------
2010-01-01 | EP02   |
2010-01-01 | EP02   |
2010-01-28 | EP02   |
2010-02-07 | EP02   |
2010-02-09 | EP11   |
2010-03-19 | EP11   |
2010-03-19 | EP02   |
2010-04-03 | EP05   |
2010-04-30 | EP02   |

I know how to group by ATM_ID and put the count in-front of each

SELECT
   ATM_ID,
   COUNT(*) CNT
FROM
   TRANSACTIONS
GROUP BY
   ATM_ID;

Based on the sample data above, this will yield something like

 ATM_ID |  CNT  
---------------
 EP02   |  6
 EP11   |  2
 EP05   |  1

However, I am interested in grouping on a different level. If a certain ATM_ID is duplicated in consecutive rows, the number of rows having the same ATM_ID in sequence should be included in the output, even if the same ATM_ID appears later after a different ATM_ID

Desired Output

 ATM_ID |  CNT  
---------------
 EP02   |  4         --Four rows of ATM_ID EP02
 EP11   |  2         --Followed by 2 rows of ATM_ID EP11
 EP02   |  1         --Followed by 1 row of ATM_ID EP02
 EP05   |  1         --Followed by 1 row of ATM_ID EP05
 EP02   |  1         --Followed by 1 row of ATM_ID EP02

Ignore the comments on the right, these are just for clarifications, not part of the output. Is that possible?

PS: The answer below by Syed Aladeen gives the output with the correct count, but with the wrong order. I create an SQL fiddle for convenience:

SQL Fiddle

Advertisement

Answer

-- Oracle 12c+: pattern matching
with s(dt, atm_id) as (
select to_date('2010-01-01', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-01-01', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-01-28', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-02-07', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-02-09', 'yyyy-mm-dd'), 'EP11' from dual union all
select to_date('2010-03-19', 'yyyy-mm-dd'), 'EP11' from dual union all
select to_date('2010-03-19', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-04-03', 'yyyy-mm-dd'), 'EP05' from dual union all
select to_date('2010-04-30', 'yyyy-mm-dd'), 'EP02' from dual)
select *
from s
match_recognize (
order by dt
measures v.atm_id as atm_id,
count(v.atm_id) as cnt,
first(dt) as min_dt,
last (dt) as max_dt
pattern (v+)
define v as v.atm_id = first(atm_id)
);

ATM_        CNT MIN_DT              MAX_DT
---- ---------- ------------------- -------------------
EP02          4 2010-01-01 00:00:00 2010-02-07 00:00:00
EP11          2 2010-02-09 00:00:00 2010-03-19 00:00:00
EP02          1 2010-03-19 00:00:00 2010-03-19 00:00:00
EP05          1 2010-04-03 00:00:00 2010-04-03 00:00:00
EP02          1 2010-04-30 00:00:00 2010-04-30 00:00:00

Elapsed: 00:00:00.01

-- Oracle 8i+: window sort + window buffer + group by [+ order by]
with s(dt, atm_id) as (
select to_date('2010-01-01', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-01-01', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-01-28', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-02-07', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-02-09', 'yyyy-mm-dd'), 'EP11' from dual union all
select to_date('2010-03-19', 'yyyy-mm-dd'), 'EP11' from dual union all
select to_date('2010-03-19', 'yyyy-mm-dd'), 'EP02' from dual union all
select to_date('2010-04-03', 'yyyy-mm-dd'), 'EP05' from dual union all
select to_date('2010-04-30', 'yyyy-mm-dd'), 'EP02' from dual)
select atm_id, count(*) cnt, min(dt) min_dt, max(dt) as max_dt
from
  (select dt, atm_id, count(lg) over (order by dt) ct, lg
    from
     (select dt, atm_id, decode(atm_id, lag(atm_id) over (order by dt), null, 1) lg
      from s
     )
  )
group by ct, atm_id
order by min_dt;

ATM_        CNT MIN_DT              MAX_DT
---- ---------- ------------------- -------------------
EP02          4 2010-01-01 00:00:00 2010-02-07 00:00:00
EP11          1 2010-02-09 00:00:00 2010-02-09 00:00:00
EP02          1 2010-03-19 00:00:00 2010-03-19 00:00:00
EP11          1 2010-03-19 00:00:00 2010-03-19 00:00:00
EP05          1 2010-04-03 00:00:00 2010-04-03 00:00:00
EP02          1 2010-04-30 00:00:00 2010-04-30 00:00:00

6 rows selected.
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement