Skip to content
Advertisement

extract “spells” from observed dates in PostgreSQL

I have a table like this

| winner | year |
|--------|------|
| alice  | 1999 |
| bob    | 2000 |
| bob    | 2001 |
| alice  | 2003 |
| alice  | 2005 |
| alice  | 2007 |

I’d like to infer non-overlapping spells, i.e. a table that looks like.

| winner | start| end  |
|--------|------|------|
| alice  | 1999 | 1999 |
| bob    | 2000 | 2001 |
| alice  | 2003 | 2007 |

In python, I would do something like

winners = [(1999, 'alice'),
           (2000, 'bob'),
           (2001, 'bob'),
           (2003, 'alice'),
           (2005, 'alice'),
           (2007, 'alice')]
winners.sort()

spells = []
spell = {}
for year, name in winners:
    if spell:
        if name == spell['name']:
            spell['end'] = year
        else:
            spells.append(spell)
            spell = {'name': name, 'start': year, 'end': year}
    else:
        spell = {'name': name, 'start': year, 'end': year}

But am not sure how to achieve the same result in SQL.

Advertisement

Answer

This is a gaps and islands problem. The simplest way is to use row_number():

select name, min(year), max(year)
from (select t.*, row_number() over (partition by name order by year) as seqnum
      from t
     ) t
group by name, year - seqnum
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement