the situation (a real life situation) is:
I have a table with “log” records, for simplicity let’s assume there are only columns NAME, TIME, STATE (in reality there are few more). There are cca 1 Milion rows in which there are 100 Names (computers) and cca 10 states (productive, idle, automated, maintenance…) This can be interpreted as: *At t0 computer1 was idle, At t1 computer1 was productive… and so on.
I ask your help with replacing IDLE states with their preceding state, if there are less then 5 consecutive idles. Every other state resets the “idle counter”, if there are more then 5 idles, keep the 6th and subsequent.
This is best described on an example, so for the below gray lets assume:
- column
STATE
is written inline - only one name
- states described by letter
ABCD...
,I
is the idle - data ordered in time, left is oldest
STATE column I HAVE: AABBCIICAABAIIIIIIIIIAAAIIIBIIAACC... STATE column I NEED: AABBCCCCAABAAAAAAIIIIAAAAAABBBAACC...
You can easilly see, that less then 5 Idle states were “ran” over by the previous state.
It was decided, that these short Idle periods are false positively evaluated and should be smoothened out (longer periods delayed, like in the code example). I have cca 24 1M record tables, that I need to “recalculate” to this new format. I’m not looking for a simple select, I need to perform an update, so the table stays in this new format and I can work with it in the future the same way as untill now.
I’m not a complete sql beginner, so far suspect that the way is by using OVER clause with PARTITION and PRECEDING and FOLLOWING, but this is the level I got stuck.
If needed, I may use an approach by doing *CREATE TABLE AS SELECT … * and dropping the old table. I am also pretty fluent in python, but I don’t see a way in fetching, processing and running so many updates back efectively.
Thank you very much for your advice.
Advertisement
Answer
Will this work for you (I used a counter instead of timestamps):
create table tsovf (id varchar2(10), rn number, cond varchar2(1)); insert into tsovf select 'comp1', rownum, substr('AABBCIICAABAIIIIIIIIIAAAIIIBIIAACC',rownum,1) from dual connect by level<length('AABBCIICAABAIIIIIIIIIAAAIIIBIIAACC'); insert into tsovf select 'comp2', rownum, substr('AABBCIICAABAIIIIIIIIIAAAIIIBIIAACC',rownum,1) from dual connect by level<length('AABBCIICAABAIIIIIIIIIAAAIIIBIIAACC'); with fq as (select id, rn, cond, lag( cond,1) over (partition by id order by rn) prv1, lag( cond,2) over (partition by id order by rn) prv2, lag( cond,3) over (partition by id order by rn) prv3, lag( cond,4) over (partition by id order by rn) prv4, lag( cond,5) over (partition by id order by rn) prv5 from tsovf), sq as ( select id,rn, case when cond='I' and prv1!='I' then prv1 when cond='I' and prv2!='I' then prv2 when cond='I' and prv3!='I' then prv3 when cond='I' and prv4!='I' then prv4 when cond='I' and prv5!='I' then prv5 else cond end cond from fq) select * from sq order by id,rn;
The output is:
comp1 1 A comp1 2 A comp1 3 B comp1 4 B comp1 5 C comp1 6 C comp1 7 C comp1 8 C comp1 9 A comp1 10 A comp1 11 B comp1 12 A comp1 13 A comp1 14 A comp1 15 A comp1 16 A comp1 17 A comp1 18 I comp1 19 I comp1 20 I comp1 21 I comp1 22 A comp1 23 A comp1 24 A comp1 25 A comp1 26 A comp1 27 A comp1 28 B comp1 29 B comp1 30 B comp1 31 A comp1 32 A comp1 33 C comp2 1 A comp2 2 A comp2 3 B comp2 4 B comp2 5 C comp2 6 C comp2 7 C comp2 8 C comp2 9 A comp2 10 A comp2 11 B comp2 12 A comp2 13 A comp2 14 A comp2 15 A comp2 16 A comp2 17 A comp2 18 I comp2 19 I comp2 20 I comp2 21 I comp2 22 A comp2 23 A comp2 24 A comp2 25 A comp2 26 A comp2 27 A comp2 28 B comp2 29 B comp2 30 B comp2 31 A comp2 32 A comp2 33 C