Skip to content
Advertisement

Python/Excel/SQL column dates to exploded rows

I have an Excel spreadsheet table like this where it shows how many units were sold each week for 2019 and 2020 (columns) and I am trying to put it into an access database formatted like the second table:

| model | 2019 - Wk 1 U's | 2019 - Wk 2 U's | 2019 - Wk 3 U's | ... | 2020 - Wk 52 U's |
---------------------------------------------------------------------------------------
 abc            50                45                65                    20
 def            72                93                57                    68

and I want to get it like this

| model | year | month | week | units | 
---------------------------------------
 abc      2019    1       1      50
 abc      2019    1       2      45
 abc      2019    1       3      65
 def      2019    1       5      72
 def      2019    1       6      93
 def      2019    1       7      57
 ...      ....    ..      ..     .. 
 abc      2020    12      52     20
 def      2020    12      52     68

What’s the best way to do this in Python (pandas) or VBA/Excel formula (or SQL if need be)?

Advertisement

Answer

Just need to be systematic

  1. row index set_index("model")
  2. name column series df.columns.set_names("weekencoded", inplace=True)
  3. then you are ready to stack()
  4. rename column to units, reset_index() so encoded string is available as columns
  5. parse out the values you want from encoded string
  6. I’ve left it for you to convert a week number to a month number
data = """| model | 2019 - Wk 1 U's | 2019 - Wk 2 U's | 2019 - Wk 3 U's | ... | 2020 - Wk 52 U's |
---------------------------------------------------------------------------------------
 abc            50                45                65                    20
 def            72                93                57                    68"""

df = pd.DataFrame([[c for c in l.split(" ") if c!=""] for l in data.split("n")[2:]],
            columns=[c.strip() for c in data.split("n")[0].split("|") if c!="" and "..." not in c])

df.set_index("model", inplace=True)
df.columns.set_names("weekencoded", inplace=True)
df = df.stack().to_frame().rename(columns={0:"units"}).reset_index().assign(
    year=lambda dfa: dfa.apply(lambda s: re.match("20[1,2][0-9]", s["weekencoded"])[0], axis=1),
    week=lambda dfa: dfa.apply(lambda s: re.findall(r"(^.*Wk )([0-9]+)( U.*$)", s["weekencoded"])[0][1], axis=1),
)

output

model       weekencoded units  year week
  abc   2019 - Wk 1 U's    50  2019    1
  abc   2019 - Wk 2 U's    45  2019    2
  abc   2019 - Wk 3 U's    65  2019    3
  abc  2020 - Wk 52 U's    20  2020   52
  def   2019 - Wk 1 U's    72  2019    1
  def   2019 - Wk 2 U's    93  2019    2
  def   2019 - Wk 3 U's    57  2019    3
  def  2020 - Wk 52 U's    68  2020   52
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement