I have a dataframe like this:
The desired result is one row per ID with the time difference between its Start and End rows, like this:
I tried simple grouping and subtraction, but it does not work:
df[df['Name'] == 'Start'].groupby('ID')['Time']- df[df['Name'] == 'End'].groupby('ID')['Time']
How can this be done in pandas? Thanks!
Answer
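A possible solution is to join the table with itself, like this:
    # Split the rows into Start events and End events
    df_start = df[df['Name'] == 'Start']
    df_end = df[df['Name'] == 'End']
    # Pair each Start with the End that has the same id.
    # The key is named 'id' here; the question's code uses 'ID', so use whichever name your dataframe has.
    df_merge = df_start.merge(df_end, on='id', suffixes=('_start', '_end'))
    # Subtracting two datetime columns gives a Timedelta column
    df_merge['diff'] = df_merge['Time_end'] - df_merge['Time_start']
    print(df_merge.to_string())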
Output:
       id Name_start          Time_start Name_end            Time_end            diff
    0   1      Start 2017-11-02 12:00:14      End 2017-11-07 22:45:13 5 days 10:44:59
    1   2      Start 2018-01-28 06:53:09      End 2018-02-05 13:31:14 8 days 06:38:05
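For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample dataframe is rebuilt from the values in the output above (the original input was not shown, so its exact layout and the column name `id` are assumptions). It also shows a `pivot` variant as an alternative, which should give the same differences as long as each id has exactly one Start and one End row; with duplicates, `pivot` raises an error.

```python
import pandas as pd

# Sample data reconstructed from the printed output; the real input may differ.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "Name": ["Start", "End", "Start", "End"],
    "Time": pd.to_datetime([
        "2017-11-02 12:00:14", "2017-11-07 22:45:13",
        "2018-01-28 06:53:09", "2018-02-05 13:31:14",
    ]),
})

# Self-join: pair each Start row with the End row that has the same id.
df_start = df[df["Name"] == "Start"]
df_end = df[df["Name"] == "End"]
df_merge = df_start.merge(df_end, on="id", suffixes=("_start", "_end"))
df_merge["diff"] = df_merge["Time_end"] - df_merge["Time_start"]
print(df_merge.to_string())

# Alternative sketch: reshape to one row per id, with Start and End as columns.
# Assumes exactly one Start and one End per id.
wide = df.pivot(index="id", columns="Name", values="Time")
wide["diff"] = wide["End"] - wide["Start"]
print(wide.to_string())
```

The merge version keeps every Start/End pairing if an id has several of either, while the pivot version is shorter but stricter about the input shape.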