
Tag: pandas

Python problem with adding table to Database with sql

Hi, I’m trying to create a function that will take a table and insert (add) it into the database. My code: this is the old code, from before I created a function: Answer: You seem to have put table_name in quotation marks in your to_sql call. Also, you seem to be tackling the issue of data model creation and maintenance, which already
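The quoting issue mentioned in the answer can be illustrated with a minimal sketch. The function name add_table, the in-memory SQLite database, and the sample data are all hypothetical and not taken from the original question; only DataFrame.to_sql comes from the excerpt.

```python
import sqlite3

import pandas as pd


def add_table(df, table_name, conn):
    """Write df to the database under the name held in table_name (hypothetical helper)."""
    # Pass the variable itself. Writing "table_name" in quotes would
    # create a table that is literally called table_name.
    df.to_sql(table_name, conn, if_exists="replace", index=False)


# Assumed setup: an in-memory SQLite database and made-up sample data.
conn = sqlite3.connect(":memory:")
people = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})

add_table(people, "people", conn)
stored = pd.read_sql("SELECT * FROM people", conn)
```

Here if_exists="replace" is a choice made for the sketch; "append" or "fail" may suit a real schema better.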

Pandas dataframe combine unique row values

I have a dataframe like the following with over 90000 rows. As you can see, some origin and destination values repeat; for example, there are multiple rows where origin=101011001, destination=101011002. My goal is to group the repeating origin and destination values and sum the people column, so the dataframe looks like this: I’ve tried jsondf.groupby(['origin', 'destination']).sum() which gives me
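One common way to get one row per origin/destination pair with the people column summed is sketched below. The column names and the two ID values come from the excerpt; the remaining rows and all people counts are made up for illustration, and this is not necessarily the accepted answer.

```python
import pandas as pd

# Made-up sample in the shape described in the question.
jsondf = pd.DataFrame({
    "origin": [101011001, 101011001, 101011001, 101011002],
    "destination": [101011002, 101011002, 101011003, 101011001],
    "people": [3, 4, 1, 2],
})

# as_index=False keeps origin and destination as ordinary columns
# instead of moving them into the index of the result.
summed = jsondf.groupby(["origin", "destination"], as_index=False)["people"].sum()
```

Selecting ["people"] before calling sum() restricts the aggregation to that column, which matters when the real dataframe has other numeric columns.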

Selective summation of columns in a pandas dataframe

The COVID-19 tracking project (API described here) provides data on many aspects of the pandemic. Each row of the JSON is one day’s data for one state. As many people know, the pandemic is hitting different states differently: New York and its neighbors hardest first, with other states being hit later. Here is a subset of the data: To

SQL & Pandas Efficiency [closed]

This question is opinion-based and was closed 2 years ago; it is not currently accepting answers. Quick question: what is the rule of thumb when deciding where to begin manipulating data? Should I do it when I

Python Pandas non equal join

I have a table. Needed output: So I need to make a non-equal join, equivalent to the SQL query: or, as an SQL query: The problem in pandas is that a non-equal self-join cannot be done with merge, and I can’t find another way. Answer: We can solve this in pandas in a smarter way by using groupby with agg and joining the strings.

Calculate TimeDiff in Pandas based on a column values

I have a dataframe like this: The desired result is to get aggregated IDs with time diffs between Start and End, like this: I tried simple groupings and diffs, but they do not work: How can this task be done in pandas? Thanks! Answer: A possible solution is to join the table on itself like this: Output:
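The self-join idea from the answer could look roughly like the sketch below. The original data is not shown in the excerpt, so the layout is an assumption: a hypothetical Status column holding "Start" or "End" and a hypothetical Time column, with one Start and one End row per ID.

```python
import pandas as pd

# Assumed layout: one "Start" and one "End" row per ID (made-up data).
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Status": ["Start", "End", "Start", "End"],
    "Time": pd.to_datetime([
        "2020-01-01 10:00", "2020-01-01 10:30",
        "2020-01-01 11:00", "2020-01-01 11:45",
    ]),
})

# Split the table into its Start and End rows, then join it on itself by ID.
starts = df[df["Status"] == "Start"]
ends = df[df["Status"] == "End"]
joined = starts.merge(ends, on="ID", suffixes=("_start", "_end"))

# The difference of two datetime columns is a timedelta column.
joined["TimeDiff"] = joined["Time_end"] - joined["Time_start"]
result = joined[["ID", "TimeDiff"]]
```

If an ID can have several Start/End pairs, the merge would pair every Start with every End for that ID, so an extra key (for example a per-ID counter) would be needed.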
