
Is there a way to run posqresql queries in a pandas dataframe?

I have a pandas DataFrame like this:

   created_at                  lat        long       hex_ID
0  2020-10-13 15:12:18.682905  28.690628  77.323285  883da1ab0bfffff
1  2020-10-12 22:49:05.886170  28.755408  77.112289  883da18e87fffff
2  2020-10-13 15:24:17.692375  28.690571  77.323335  883da1ab0bfffff
3  2020-10-12 23:21:13.700226  28.589922  77.082738  883da112a1fffff
4  2020-10-13 15:43:58.887592  28.649227  77.339063  883da1a941fffff

and I want to aggregate it into hourly counts like this:

   created_at           hex_id           count
0  2020-10-28 22:00:00  883da11185fffff  4
1  2020-09-09 10:00:00  883da111ebfffff  2
2  2020-12-02 20:00:00  883da10769fffff  2
3  2020-10-16 07:00:00  883da111c3fffff  1
4  2020-12-13 11:00:00  883da11747fffff  4

As of now, I am dumping the DataFrame into Postgres, running the query below, exporting the result, and finally importing it back into my notebook.

Query :

SELECT date_trunc('hour', created_at), hex_id, count(lat)
FROM public."ML_Data"
GROUP BY date_trunc('hour', created_at), hex_id

I was wondering if I could directly do it in the notebook file
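If you really want to keep using SQL inside the notebook, one option is to load the frame into an in-memory SQLite database and query it with `pandas.read_sql_query`. A minimal sketch with sample data (SQLite has no `date_trunc`, so `strftime` is used to truncate to the hour; the table name `ml_data` is arbitrary):

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({
    'created_at': pd.to_datetime(['2020-10-13 15:12:18.682905',
                                  '2020-10-13 15:24:17.692375',
                                  '2020-10-12 22:49:05.886170']),
    'lat': [28.690628, 28.690571, 28.755408],
    'long': [77.323285, 77.323335, 77.112289],
    'hex_ID': ['883da1ab0bfffff', '883da1ab0bfffff', '883da18e87fffff'],
})

# Copy the DataFrame into an in-memory SQLite table
con = sqlite3.connect(':memory:')
df.to_sql('ml_data', con, index=False)

# Truncate timestamps to the hour and count rows per (hour, hex) group
result = pd.read_sql_query("""
    SELECT strftime('%Y-%m-%d %H:00:00', created_at) AS created_at,
           hex_ID AS hex_id,
           count(lat) AS "count"
    FROM ml_data
    GROUP BY 1, 2
""", con)
print(result)
```

This avoids the round trip through a real Postgres server, at the cost of SQLite's more limited date functions.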


Answer

Just use groupby on the DataFrame:

# Truncate to the hour: 2020-10-13 15:12:18.682905 -> 2020-10-13 15:00:00
df['created_at_n'] = df['created_at'].astype(str).str.split(':').str[0] + ':00:00'
# Note: the column in your frame is named hex_ID, not hex_id
df.groupby(['created_at_n', 'hex_ID'])['lat'].count().reset_index(name='count')
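If `created_at` is already a datetime column, you can skip the string manipulation and truncate with `dt.floor` instead, which keeps the datetime dtype. A sketch with sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'created_at': pd.to_datetime(['2020-10-13 15:12:18.682905',
                                  '2020-10-13 15:24:17.692375',
                                  '2020-10-12 22:49:05.886170']),
    'hex_ID': ['883da1ab0bfffff', '883da1ab0bfffff', '883da18e87fffff'],
    'lat': [28.690628, 28.690571, 28.755408],
})

# Floor each timestamp to the hour, then count rows per (hour, hex) group
out = (df.assign(created_at=df['created_at'].dt.floor('h'))
         .groupby(['created_at', 'hex_ID'])
         .size()
         .reset_index(name='count'))
print(out)
```

`size()` counts all rows in each group, which matches `count(lat)` as long as `lat` has no missing values.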