Using date range where clause in pandas SQL query returns an empty dataframe

I have a table in a database containing millions of rows pertaining to transactions spanning across 10+ years. Since it obviously is a waste to import them all I am trying to import a subset of data limited to particular month ranges. When I try to use the below code as a test to the connection and import top 1000 rows it works fine, but then when I specify a date range in the where clause it returns an empty data frame.

I would really appreciate any help I can get to correct this. Thanks in advance

import pyodbc
import pandas as pd
conn = pyodbc.connect('Driver={SQL Server};'
'Server=NAME;'
'Database=DBNAME;'
'Trusted_Connection=yes;')

tquery = """SELECT TOP (1000) * FROM [SALES Transactions_V];"""
df = pd.read_sql_query(tquery, conn)
df.dtypes

Output:

DW_Id int64
Company object
Campaign Initiative object
Closing Entry bool
Department Code object
Description object
Document No object
Document Type int64
Entry No int64
Expense Type object
GL Account No object
Incremental Field datetime64[ns]
Posting Date datetime64[ns]
Strategic Initiative object
Vendor No object
Vendor Name object
Amount float64
GBP Amount float64
Actual per CWT object
DW_Batch int64
DW_SourceCode object
DW_TimeStamp datetime64[ns]
dtype: object

df.head()

DW_Id Company Campaign Initiative Closing Entry Department Code Description Document No Document Type Entry No Expense Type ... Posting Date Strategic Initiative Vendor No Vendor Name Amount GBP Amount Actual per CWT DW_Batch DW_SourceCode DW_TimeStamp
0 1 ABC Co.,LLC None False AGDATA INC. PMJ10000 1 1 None ... 2007-02-27 None None None -125.25 0.0 None 13726 Nav 2020-05-11 08:50:37.437
1 2 ABC Co.,LLC None False AGDATA INC. PMJ10000 1 2 None ... 2007-02-27 None AGD01 AGDATA, INC. 125.25 0.0 None 13726 Nav 2020-05-11 08:50:37.437
2 3 ABC Co.,LLC None False AGDATA INC. PMJ10000 1 3 None ... 2007-02-27 None AGD01 AGDATA, INC. 125.25 0.0 None 13726 Nav 2020-05-11 08:50:37.437

however when I use the below code to filter for date range between 04-01-2020 AND 04-30-2020 it gives me an empty dataframe

df1 = pd.read_sql_query('SELECT * FROM [SALES Transactions_V] WHERE [Posting Date] BETWEEN ''2020-04-01'' AND ''2020-04-30'';', conn)

df1.dtypes

DW_Id                   object
Company                 object
Campaign Initiative     object
Closing Entry           object
Department Code         object
Description             object
Document No             object
Document Type           object
Entry No                object
Expense Type            object
GL Account No           object
Incremental Field       object
Posting Date            object
Strategic Initiative    object
Vendor No               object
Vendor Name             object
Amount                  object
GBP Amount              object
Actual per CWT          object
DW_Batch                object
DW_SourceCode           object
DW_TimeStamp            object
dtype: object

I believe the date range where clause is what is throwing this off but I am unable to find a solution to fix this and would really appreciate any input. Thanks!

Answer

Consider parameterization, the industry best practice, when passing values to an SQL query and is supported with pyobbc and pandas.read_sql_query. Doing so you avoid the need to escape quotes and concatenate or interpolate literal values or variables.

sql = '''SELECT * FROM [SALES Transactions_V] 
         WHERE [Posting Date] BETWEEN ? AND ?;
      '''

df1 = pd.read_sql_query(sql, conn, params=['2020-04-01', '2020-04-30'])

Or by date parts:

sql = '''SELECT * FROM [SALES Transactions_V] 
         WHERE YEAR([Posting Date]) = ? 
           AND MONTH([Posting Date]) = ?;
      '''

df1 = pd.read_sql_query(sql, conn, params=[2020, 4])

Advertisement

Answer