Skip to content
Advertisement

Using date range where clause in pandas SQL query returns an empty dataframe

I have a table in a database containing millions of rows pertaining to transactions spanning across 10+ years. Since it obviously is a waste to import them all I am trying to import a subset of data limited to particular month ranges. When I try to use the below code as a test to the connection and import top 1000 rows it works fine, but then when I specify a date range in the where clause it returns an empty data frame.

I would really appreciate any help I can get to correct this. Thanks in advance

import pyodbc
import pandas as pd
conn = pyodbc.connect('Driver={SQL Server};'
'Server=NAME;'
'Database=DBNAME;'
'Trusted_Connection=yes;')

tquery = """SELECT TOP (1000) * FROM [SALES Transactions_V];"""
df = pd.read_sql_query(tquery, conn)
df.dtypes

Output:

DW_Id                            int64
Company                         object
Campaign Initiative             object
Closing Entry                     bool
Department Code                 object
Description                     object
Document No                     object
Document Type                    int64
Entry No                         int64
Expense Type                    object
GL Account No                   object
Incremental Field       datetime64[ns]
Posting Date            datetime64[ns]
Strategic Initiative            object
Vendor No                       object
Vendor Name                     object
Amount                         float64
GBP Amount                     float64
Actual per CWT                  object
DW_Batch                         int64
DW_SourceCode                   object
DW_TimeStamp            datetime64[ns]
dtype: object

df.head()

DW_Id   Company Campaign Initiative Closing Entry   Department Code Description Document No Document Type   Entry No    Expense Type    ... Posting Date    Strategic Initiative    Vendor No   Vendor Name Amount  GBP Amount  Actual per CWT  DW_Batch    DW_SourceCode   DW_TimeStamp
0   1   ABC Co.,LLC None    False       AGDATA INC. PMJ10000    1   1   None    ... 2007-02-27  None    None    None    -125.25 0.0 None    13726   Nav 2020-05-11 08:50:37.437
1   2   ABC Co.,LLC None    False       AGDATA INC. PMJ10000    1   2   None    ... 2007-02-27  None    AGD01   AGDATA, INC.    125.25  0.0 None    13726   Nav 2020-05-11 08:50:37.437
2   3   ABC Co.,LLC None    False       AGDATA INC. PMJ10000    1   3   None    ... 2007-02-27  None    AGD01   AGDATA, INC.    125.25  0.0 None    13726   Nav 2020-05-11 08:50:37.437

however when I use the below code to filter for date range between 04-01-2020 AND 04-30-2020 it gives me an empty dataframe

df1 = pd.read_sql_query('SELECT * FROM [SALES Transactions_V] WHERE [Posting Date] BETWEEN ''2020-04-01'' AND ''2020-04-30'';', conn)

df1.dtypes

DW_Id                   object
Company                 object
Campaign Initiative     object
Closing Entry           object
Department Code         object
Description             object
Document No             object
Document Type           object
Entry No                object
Expense Type            object
GL Account No           object
Incremental Field       object
Posting Date            object
Strategic Initiative    object
Vendor No               object
Vendor Name             object
Amount                  object
GBP Amount              object
Actual per CWT          object
DW_Batch                object
DW_SourceCode           object
DW_TimeStamp            object
dtype: object

I believe the date range where clause is what is throwing this off but I am unable to find a solution to fix this and would really appreciate any input. Thanks!

Advertisement

Answer

Consider parameterization, the industry best practice, when passing values to an SQL query and is supported with pyobbc and pandas.read_sql_query. Doing so you avoid the need to escape quotes and concatenate or interpolate literal values or variables.

sql = '''SELECT * FROM [SALES Transactions_V] 
         WHERE [Posting Date] BETWEEN ? AND ?;
      '''

df1 = pd.read_sql_query(sql, conn, params=['2020-04-01', '2020-04-30'])

Or by date parts:

sql = '''SELECT * FROM [SALES Transactions_V] 
         WHERE YEAR([Posting Date]) = ? 
           AND MONTH([Posting Date]) = ?;
      '''

df1 = pd.read_sql_query(sql, conn, params=[2020, 4])
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement