I have an Excel spreadsheet table like this where it shows how many units were sold each week for 2019 and 2020 (columns) and I am trying to put it into an access database formatted like the second table:
x
| model | 2019 - Wk 1 U's | 2019 - Wk 2 U's | 2019 - Wk 3 U's | ... | 2020 - Wk 52 U's |
---------------------------------------------------------------------------------------
abc 50 45 65 20
def 72 93 57 68
and I want to get it like this
| model | year | month | week | units |
---------------------------------------
abc 2019 1 1 50
abc 2019 1 2 45
abc 2019 1 3 65
def 2019 1 5 72
def 2019 1 6 93
def 2019 1 7 57
. .. .. ..
abc 2020 12 52 20
def 2020 12 52 68
What’s the best way to do this in Python (pandas) or VBA/Excel formula (or SQL if need be)?
Advertisement
Answer
Just need to be systematic
- row index
set_index("model")
- name column series
df.columns.set_names("weekencoded", inplace=True)
- then you are ready to
stack()
- rename column to units,
reset_index()
so encoded string is available as columns - parse out the values you want from encoded string
- I’ve left it for you to convert a week number to a month number
data = """| model | 2019 - Wk 1 U's | 2019 - Wk 2 U's | 2019 - Wk 3 U's | ... | 2020 - Wk 52 U's |
---------------------------------------------------------------------------------------
abc 50 45 65 20
def 72 93 57 68"""
df = pd.DataFrame([[c for c in l.split(" ") if c!=""] for l in data.split("n")[2:]],
columns=[c.strip() for c in data.split("n")[0].split("|") if c!="" and "..." not in c])
df.set_index("model", inplace=True)
df.columns.set_names("weekencoded", inplace=True)
df = df.stack().to_frame().rename(columns={0:"units"}).reset_index().assign(
year=lambda dfa: dfa.apply(lambda s: re.match("20[1,2][0-9]", s["weekencoded"])[0], axis=1),
week=lambda dfa: dfa.apply(lambda s: re.findall(r"(^.*Wk )([0-9]+)( U.*$)", s["weekencoded"])[0][1], axis=1),
)
output
model weekencoded units year week
abc 2019 - Wk 1 U's 50 2019 1
abc 2019 - Wk 2 U's 45 2019 2
abc 2019 - Wk 3 U's 65 2019 3
abc 2020 - Wk 52 U's 20 2020 52
def 2019 - Wk 1 U's 72 2019 1
def 2019 - Wk 2 U's 93 2019 2
def 2019 - Wk 3 U's 57 2019 3
def 2020 - Wk 52 U's 68 2020 52