Skip to content
Advertisement

Transform table to smoothen records in one column

the situation (a real life situation) is:

I have a table with “log” records, for simplicity let’s assume there are only columns NAME, TIME, STATE (in reality there are few more). There are cca 1 Milion rows in which there are 100 Names (computers) and cca 10 states (productive, idle, automated, maintenance…) This can be interpreted as: *At t0 computer1 was idle, At t1 computer1 was productive… and so on.

I ask your help with replacing IDLE states with their preceding state, if there are less then 5 consecutive idles. Every other state resets the “idle counter”, if there are more then 5 idles, keep the 6th and subsequent.

This is best described on an example, so for the below gray lets assume:

  • column STATE is written inline
  • only one name
  • states described by letter ABCD..., I is the idle
  • data ordered in time, left is oldest
STATE column I HAVE: AABBCIICAABAIIIIIIIIIAAAIIIBIIAACC...
STATE column I NEED: AABBCCCCAABAAAAAAIIIIAAAAAABBBAACC...

You can easilly see, that less then 5 Idle states were “ran” over by the previous state.

It was decided, that these short Idle periods are false positively evaluated and should be smoothened out (longer periods delayed, like in the code example). I have cca 24 1M record tables, that I need to “recalculate” to this new format. I’m not looking for a simple select, I need to perform an update, so the table stays in this new format and I can work with it in the future the same way as untill now.

I’m not a complete sql beginner, so far suspect that the way is by using OVER clause with PARTITION and PRECEDING and FOLLOWING, but this is the level I got stuck.

If needed, I may use an approach by doing *CREATE TABLE AS SELECT … * and dropping the old table. I am also pretty fluent in python, but I don’t see a way in fetching, processing and running so many updates back efectively.

Thank you very much for your advice.

Advertisement

Answer

Will this work for you (I used a counter instead of timestamps):

    create table tsovf (id varchar2(10), rn number, cond varchar2(1));

    insert into tsovf 
    select 'comp1', rownum, substr('AABBCIICAABAIIIIIIIIIAAAIIIBIIAACC',rownum,1)
    from dual
    connect by level<length('AABBCIICAABAIIIIIIIIIAAAIIIBIIAACC');

    insert into tsovf 
    select 'comp2', rownum, substr('AABBCIICAABAIIIIIIIIIAAAIIIBIIAACC',rownum,1)
    from dual
    connect by level<length('AABBCIICAABAIIIIIIIIIAAAIIIBIIAACC');

    with fq as (select id, rn, cond, 
    lag( cond,1) over (partition by id order by rn) prv1,
    lag( cond,2) over (partition by id order by rn) prv2,
    lag( cond,3) over (partition by id order by rn) prv3,
    lag( cond,4) over (partition by id order by rn) prv4,
    lag( cond,5) over (partition by id order by rn) prv5
    from tsovf),
    sq as (
    select id,rn, 
    case 
    when cond='I' and prv1!='I' then prv1
    when cond='I' and prv2!='I' then prv2
    when cond='I' and prv3!='I' then prv3
    when cond='I' and prv4!='I' then prv4
    when cond='I' and prv5!='I' then prv5
    else cond
    end cond
    from fq)
    select * 
    from sq
    order by id,rn;

The output is:

    comp1   1   A
    comp1   2   A
    comp1   3   B
    comp1   4   B
    comp1   5   C
    comp1   6   C
    comp1   7   C
    comp1   8   C
    comp1   9   A
    comp1   10  A
    comp1   11  B
    comp1   12  A
    comp1   13  A
    comp1   14  A
    comp1   15  A
    comp1   16  A
    comp1   17  A
    comp1   18  I
    comp1   19  I
    comp1   20  I
    comp1   21  I
    comp1   22  A
    comp1   23  A
    comp1   24  A
    comp1   25  A
    comp1   26  A
    comp1   27  A
    comp1   28  B
    comp1   29  B
    comp1   30  B
    comp1   31  A
    comp1   32  A
    comp1   33  C
    comp2   1   A
    comp2   2   A
    comp2   3   B
    comp2   4   B
    comp2   5   C
    comp2   6   C
    comp2   7   C
    comp2   8   C
    comp2   9   A
    comp2   10  A
    comp2   11  B
    comp2   12  A
    comp2   13  A
    comp2   14  A
    comp2   15  A
    comp2   16  A
    comp2   17  A
    comp2   18  I
    comp2   19  I
    comp2   20  I
    comp2   21  I
    comp2   22  A
    comp2   23  A
    comp2   24  A
    comp2   25  A
    comp2   26  A
    comp2   27  A
    comp2   28  B
    comp2   29  B
    comp2   30  B
    comp2   31  A
    comp2   32  A
    comp2   33  C
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement