I have pandas dataframe like this :
created_at | lat | long | hex_ID | |
---|---|---|---|---|
0 | 2020-10-13 15:12:18.682905 | 28.690628 | 77.323285 | 883da1ab0bfffff |
1 | 2020-10-12 22:49:05.886170 | 28.755408 | 77.112289 | 883da18e87fffff |
2 | 2020-10-13 15:24:17.692375 | 28.690571 | 77.323335 | 883da1ab0bfffff |
3 | 2020-10-12 23:21:13.700226 | 28.589922 | 77.082738 | 883da112a1fffff |
4 | 2020-10-13 15:43:58.887592 | 28.649227 | 77.339063 | 883da1a941fffff |
and I want to convert it like this
created_at | hex_id | count | |
---|---|---|---|
0 | 2020-10-28 22:00:00 | 883da11185fffff | 4 |
1 | 2020-09-09 10:00:00 | 883da111ebfffff | 2 |
2 | 2020-12-02 20:00:00 | 883da10769fffff | 2 |
3 | 2020-10-16 07:00:00 | 883da111c3fffff | 1 |
4 | 2020-12-13 11:00:00 | 883da11747fffff | 4 |
As of now i am taking the dataframe dumping it into postgres and running the below query and then exporting data and at last importing back to my notebook .
Query :
SELECT created_at('hour', timestamp),count(lat),hex_id FROM public."ML_Data" group by created_at('hour', timestamp),hex_id
I was wondering if I could directly do it in the notebook file
Advertisement
Answer
just use groupy in df.
# 2020-10-13 15:12:18.682905 -> 2020-10-13 15:00:00 df['created_at_n'] = df['created_at'].astype(str).str.split(':').str[0] + ':00:00' df.groupby(['created_at_n', 'hex_id'])['lat'].count()