Skip to content
Advertisement

How to implement SQL Row_number in Python Pandas?

I am trying to number my dataframe records using SQL “Row_number over” function available in SQL but it results in error as shown in the image. Please note that I don’t wish to number records using Pandas function.

Here is the code

df1.head()

output of df1.head statement

date    beef    veal    pork    lamb_and_mutton broilers    other_chicken   turkey
0   1944-01-01 00:00:00.000000  751.0   85.0    1280.0  89.0    NaN NaN NaN
1   1944-02-01 00:00:00.000000  713.0   77.0    1169.0  72.0    NaN NaN NaN
2   1944-03-01 00:00:00.000000  741.0   90.0    1128.0  75.0    NaN NaN NaN
3   1944-04-01 00:00:00.000000  650.0   89.0    978.0   66.0    NaN NaN NaN
4   1944-05-01 00:00:00.000000  681.0   106.0   1029.0  78.0    NaN NaN NaN

p = """SELECT 
  ROW_NUMBER() OVER(ORDER BY date ASC) AS Row#,
  beef,veal
FROM df1"""

df1 = pysqldf(p)

Once I execute this statement it throws an error

This code is from Python 3 version. Normal SQL queries work but looks like this row_number function isn’t available/supported by Python. Can you please help me with this? I receive an operational error

Advertisement

Answer

The problem is pretty simple and you might have figured it out already. The # breaks the whole thing as that is an unrecognized token.

If you leave that out, your code should work.

from pandasql import sqldf

q1='select beef, veal, ROW_NUMBER() OVER (ORDER BY date ASC) as RN FROM df1'

df_new=sqldf(q1)

Also it is a good practice to name your headers differently from the basic syntax. Date and row can be a functions in SQL, so you better go with ‘RN’ for the row column and ‘date_’ or ‘date_of_purchase’ for the date.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement