I have a table in a database containing millions of rows pertaining to transactions spanning across 10+ years. Since it obviously is a waste to import them all I am trying to import a subset of data limited to particular month ranges. When I try to use the below code as a test to the connection and import top 1000 rows it works fine, but then when I specify a date range in the where clause it returns an empty data frame.
I would really appreciate any help I can get to correct this. Thanks in advance
import pyodbc import pandas as pd conn = pyodbc.connect('Driver={SQL Server};' 'Server=NAME;' 'Database=DBNAME;' 'Trusted_Connection=yes;') tquery = """SELECT TOP (1000) * FROM [SALES Transactions_V];""" df = pd.read_sql_query(tquery, conn) df.dtypes
Output:
DW_Id int64 Company object Campaign Initiative object Closing Entry bool Department Code object Description object Document No object Document Type int64 Entry No int64 Expense Type object GL Account No object Incremental Field datetime64[ns] Posting Date datetime64[ns] Strategic Initiative object Vendor No object Vendor Name object Amount float64 GBP Amount float64 Actual per CWT object DW_Batch int64 DW_SourceCode object DW_TimeStamp datetime64[ns] dtype: object df.head() DW_Id Company Campaign Initiative Closing Entry Department Code Description Document No Document Type Entry No Expense Type ... Posting Date Strategic Initiative Vendor No Vendor Name Amount GBP Amount Actual per CWT DW_Batch DW_SourceCode DW_TimeStamp 0 1 ABC Co.,LLC None False AGDATA INC. PMJ10000 1 1 None ... 2007-02-27 None None None -125.25 0.0 None 13726 Nav 2020-05-11 08:50:37.437 1 2 ABC Co.,LLC None False AGDATA INC. PMJ10000 1 2 None ... 2007-02-27 None AGD01 AGDATA, INC. 125.25 0.0 None 13726 Nav 2020-05-11 08:50:37.437 2 3 ABC Co.,LLC None False AGDATA INC. PMJ10000 1 3 None ... 2007-02-27 None AGD01 AGDATA, INC. 125.25 0.0 None 13726 Nav 2020-05-11 08:50:37.437
however when I use the below code to filter for date range between 04-01-2020 AND 04-30-2020 it gives me an empty dataframe
df1 = pd.read_sql_query('SELECT * FROM [SALES Transactions_V] WHERE [Posting Date] BETWEEN ''2020-04-01'' AND ''2020-04-30'';', conn) df1.dtypes DW_Id object Company object Campaign Initiative object Closing Entry object Department Code object Description object Document No object Document Type object Entry No object Expense Type object GL Account No object Incremental Field object Posting Date object Strategic Initiative object Vendor No object Vendor Name object Amount object GBP Amount object Actual per CWT object DW_Batch object DW_SourceCode object DW_TimeStamp object dtype: object
I believe the date range where clause is what is throwing this off but I am unable to find a solution to fix this and would really appreciate any input. Thanks!
Advertisement
Answer
Consider parameterization, the industry best practice, when passing values to an SQL query and is supported with pyobbc
and pandas.read_sql_query
. Doing so you avoid the need to escape quotes and concatenate or interpolate literal values or variables.
sql = '''SELECT * FROM [SALES Transactions_V] WHERE [Posting Date] BETWEEN ? AND ?; ''' df1 = pd.read_sql_query(sql, conn, params=['2020-04-01', '2020-04-30'])
Or by date parts:
sql = '''SELECT * FROM [SALES Transactions_V] WHERE YEAR([Posting Date]) = ? AND MONTH([Posting Date]) = ?; ''' df1 = pd.read_sql_query(sql, conn, params=[2020, 4])