
Selective summation of columns in a pandas dataframe

The COVID-19 tracking project (API described here) provides data on many aspects of the pandemic. Each row of the JSON is one day’s data for one state. As many people know, the pandemic is hitting different states differently — New York and its neighbors hardest first, with other states being hit later. Here is a subset of the data:

date,state,positive,negative
20200505,AK,371,22321
20200505,CA,56212,723690
20200505,NY,321192,707707
20200505,WY,596,10319
20200504,AK,370,21353
20200504,CA,54937,692937
20200504,NY,318953,688357
20200504,WY,586,9868
20200503,AK,368,21210
20200503,CA,53616,662135
20200503,NY,316415,669496
20200503,WY,579,9640
20200502,AK,365,21034
20200502,CA,52197,634606
20200502,NY,312977,646094
20200502,WY,566,9463

To get the entire data set I am doing this:

import pandas as pd
all_states = pd.read_json("https://covidtracking.com/api/v1/states/daily.json")

I would like to be able to summarize the data by adding up the values for one column, but only for certain states; and then adding up the same column, for the states not included before. I was able to do this, for instance:

not_NY = all_states[all_states['state'] != 'NY'].groupby(['date'], as_index = False).hospitalizedCurrently.sum()

This creates a new dataframe from all_states, grouped by date, and summing for all the states that are not “NY”. What I want to do, though, is exclude multiple states with something like a “not in” function (this doesn’t work):

not_tristate = all_states[all_states['state'] not in ['NY','NJ','CT']].groupby(['date'], as_index = False).hospitalizedCurrently.sum()

Is there a way to do that? An alternate approach I tried is to create a new dataframe as a pivot table, with one row per date, one column per state, like this:

pivot_states = all_states.pivot_table(index = 'date', columns = 'state', values = 'hospitalizedCurrently', aggfunc='sum')

but this still leaves me with the problem of creating new columns that sum only some of the state columns. In SQL, I would solve the problem like this:

SELECT all_states.Date AS [Date], Sum(IIf([all_states]![state] In ("NY","NJ","CT"),[all_states]![hospitalizedCurrently],0)) AS tristate, Sum(IIf([all_states]![state] Not In ("NY","NJ","CT"),[all_states]![hospitalizedCurrently],0)) AS not_tristate
FROM all_states
GROUP BY all_states.Date
ORDER BY all_states.Date;

The end result I am looking for is like this (using the sample data above and summing on the ‘positive’ column, with ‘NY’ standing in for ‘tristate’):

date,not_tristate,tristate,total
20200502,53128,312977,366105
20200503,54563,316415,370978
20200504,55893,318953,374846
20200505,57179,321192,378371

Any help would be welcome.


Answer

To get the expected output, you can group by the date together with an np.where label that marks whether each state isin the list you want, sum the positive column, then unstack and assign to add the total column:

import numpy as np

# label each row 'tristate' or 'not_tristate', group by date + label,
# spread the labels into columns, and add a row total
df_f = (all_states
        .groupby(['date',
                  np.where(all_states['state'].isin(["NY","NJ","CT"]),
                           'tristate', 'not_tristate')])
        ['positive'].sum()
        .unstack()
        .assign(total=lambda x: x.sum(axis=1)))

print (df_f)
          not_tristate  tristate   total
date                                    
20200502         53128    312977  366105
20200503         54563    316415  370978
20200504         55893    318953  374846
20200505         57179    321192  378371
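As an aside, the “not in” filter from the question maps directly to boolean indexing: Series.isin builds a membership mask and ~ negates it. A minimal, self-contained sketch using a few of the sample rows from the question:

```python
import io
import pandas as pd

# A subset of the sample data from the question
csv = """date,state,positive
20200504,AK,370
20200504,NY,318953
20200504,WY,586
20200505,AK,371
20200505,NY,321192
20200505,WY,596"""
all_states = pd.read_csv(io.StringIO(csv))

# ~isin() is the pandas spelling of SQL's "NOT IN"
mask = ~all_states['state'].isin(['NY', 'NJ', 'CT'])
not_tristate = (all_states[mask]
                .groupby('date', as_index=False)['positive'].sum())
print(not_tristate)
#        date  positive
# 0  20200504       956
# 1  20200505       967
```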

Or, with pivot_table, you get a similar result:

print(all_states
      .assign(state=np.where(all_states['state'].isin(["NY","NJ","CT"]),
                             'tristate', 'not_tristate'))
      .pivot_table(index='date', columns='state', values='positive',
                   aggfunc='sum', margins=True))
state     not_tristate  tristate      All
date                                     
20200502         53128    312977   366105
20200503         54563    316415   370978
20200504         55893    318953   374846
20200505         57179    321192   378371
All             220763   1269537  1490300
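A closer line-by-line translation of the SQL conditional sums is also possible: build the two IIf-style columns with assign and np.where, then group and sum. A sketch, assuming the same sample rows as above:

```python
import io
import numpy as np
import pandas as pd

# A subset of the sample data from the question
csv = """date,state,positive
20200504,AK,370
20200504,NY,318953
20200504,WY,586
20200505,AK,371
20200505,NY,321192
20200505,WY,596"""
all_states = pd.read_csv(io.StringIO(csv))

tri = ['NY', 'NJ', 'CT']
out = (all_states
       # IIf(state In (...), positive, 0) and its negation, as two columns
       .assign(tristate=np.where(all_states['state'].isin(tri),
                                 all_states['positive'], 0),
               not_tristate=np.where(~all_states['state'].isin(tri),
                                     all_states['positive'], 0))
       .groupby('date', as_index=False)[['not_tristate', 'tristate']].sum())
print(out)
#        date  not_tristate  tristate
# 0  20200504           956    318953
# 1  20200505           967    321192
```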
User contributions licensed under: CC BY-SA