Calculate time difference in pandas based on a column's values

I have a dataframe with ID, Name (either 'Start' or 'End') and Time columns:

[image: input dataframe]

The desired result is one row per ID with the time difference between its Start and End rows:

[image: expected result]

I tried a simple groupby and subtraction, but it does not work:

df[df['Name'] == 'Start'].groupby('ID')['Time']-
df[df['Name'] == 'End'].groupby('ID')['Time']

How can this be done in pandas? Thanks!

Answer

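One possible solution is to join the table with itself, matching each Start row to the End row that has the same ID (the key column is named 'id' here; use 'ID' if that is how your dataframe spells it):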

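# Split the rows into start events and end events
df_start = df[df['Name'] == 'Start']
df_end = df[df['Name'] == 'End']
# Self-join on the key so each row pairs a Start with its End
df_merge = df_start.merge(df_end, on='id', suffixes=('_start', '_end'))
# Subtracting two datetime columns yields a Timedelta column
# (Time must already be datetime64, e.g. via pd.to_datetime)
df_merge['diff'] = df_merge['Time_end'] - df_merge['Time_start']
print(df_merge.to_string())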

Output:

   id Name_start          Time_start Name_end            Time_end            diff
0   1      Start 2017-11-02 12:00:14      End 2017-11-07 22:45:13 5 days 10:44:59
1   2      Start 2018-01-28 06:53:09      End 2018-02-05 13:31:14 8 days 06:38:05
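
As an alternative to the self-join, the same result can probably be reached by reshaping the data so that Start and End become columns. The sketch below is not part of the original answer. It rebuilds a small dataframe from the values in the output above and assumes each id has exactly one Start row and one End row. The variable name wide is only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the output shown above (an assumption:
# the real dataframe may hold more rows or use 'ID' as the column name).
df = pd.DataFrame({
    'id': [1, 1, 2, 2],
    'Name': ['Start', 'End', 'Start', 'End'],
    'Time': pd.to_datetime([
        '2017-11-02 12:00:14', '2017-11-07 22:45:13',
        '2018-01-28 06:53:09', '2018-02-05 13:31:14',
    ]),
})

# Reshape so each id is one row with a 'Start' and an 'End' column.
# pivot() raises ValueError if an (id, Name) pair occurs more than once.
wide = df.pivot(index='id', columns='Name', values='Time')

# Subtracting two datetime columns gives a Timedelta column
wide['diff'] = wide['End'] - wide['Start']
print(wide)
```

If an id can have several Start/End pairs, pivot will fail on the duplicates, and the merge shown in the answer would produce every Start/End combination for that id, so the pairs would need an extra key (for example a per-id counter) before either approach applies.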
User contributions licensed under: CC BY-SA