I am trying to number my dataframe records using SQL “Row_number over” function available in SQL but it results in error as shown in the image. Please note that I don’t wish to number records using Pandas function.
Here is the code
df1.head()
output of df1.head statement
date beef veal pork lamb_and_mutton broilers other_chicken turkey 0 1944-01-01 00:00:00.000000 751.0 85.0 1280.0 89.0 NaN NaN NaN 1 1944-02-01 00:00:00.000000 713.0 77.0 1169.0 72.0 NaN NaN NaN 2 1944-03-01 00:00:00.000000 741.0 90.0 1128.0 75.0 NaN NaN NaN 3 1944-04-01 00:00:00.000000 650.0 89.0 978.0 66.0 NaN NaN NaN 4 1944-05-01 00:00:00.000000 681.0 106.0 1029.0 78.0 NaN NaN NaN
p = """SELECT ROW_NUMBER() OVER(ORDER BY date ASC) AS Row#, beef,veal FROM df1""" df1 = pysqldf(p)
Once I execute this statement it throws an error
This code is from Python 3 version. Normal SQL queries work but looks like this row_number function isn’t available/supported by Python. Can you please help me with this? I receive an operational error
Advertisement
Answer
The problem is pretty simple and you might have figured it out already. The # breaks the whole thing as that is an unrecognized token.
If you leave that out, your code should work.
from pandasql import sqldf q1='select beef, veal, ROW_NUMBER() OVER (ORDER BY date ASC) as RN FROM df1' df_new=sqldf(q1)
Also it is a good practice to name your headers differently from the basic syntax. Date and row can be a functions in SQL, so you better go with ‘RN’ for the row column and ‘date_’ or ‘date_of_purchase’ for the date.