I’m trying to write a program in python that uses a query in SQL to collect data and make a regression model. When I try to actually create the model, however, it gives me this error.
import pyodbc import pandas import statsmodels.api as sm import numpy as np server = 'ludsampledb.database.windows.net' database = 'SampleDB' username = 'sampleadmin' password = '+U9Ly9/p' driver = '{ODBC Driver 17 for SQL Server}' table = 'GooglePlayStore' conn = pyodbc.connect('DRIVER='+driver+';SERVER='+server+';PORT=1433;DATABASE='+database+';UID='+username+';PWD='+ password) sql1 = "SELECT ISNULL((CASE WHEN ISNUMERIC(Rating) = 1 THEN CONVERT(float, Rating) ELSE 0 END), 0) AS 'Rating', ISNULL((CASE WHEN ISNUMERIC(Reviews) = 1 THEN CONVERT(float, Reviews) ELSE 0 END), 0) AS 'Reviews', ISNULL((CASE WHEN ISNUMERIC(SUBSTRING(Size, 0, LEN(Size))) = 1 THEN CONVERT(float, SUBSTRING(Size, 0, LEN(Size))) ELSE 0 END), 0) AS 'Size', ISNULL((CASE WHEN ISNUMERIC(REPLACE(Price, '$', '')) = 1 THEN CONVERT(float, REPLACE(Price, '$', '')) ELSE 0 END), 0) AS 'Price', ISNULL((CASE WHEN ISNUMERIC(REPLACE(SUBSTRING(Installs, 0, LEN(Installs)), ',', '')) = 1 THEN CONVERT(float, REPLACE(SUBSTRING(Installs, 0, LEN(Installs)), ',', '')) ELSE 0 END), 0) AS 'Installs' FROM GooglePlayStore" data = pandas.read_sql(sql1,conn) x = np.array([data["Rating"], data["Size"], data["Installs"], data["Price"]]).reshape(-1, 1) x = sm.add_constant(x) print(x.shape) y = np.array([data['Reviews']]).reshape(-1, 1) print(y.shape) fit = sm.OLS(y, x).fit() #This is where the error is occurring
I’m pretty sure that I know what is going wrong, but I have no idea how to fix it. I’ve tried several things already, but none so far have worked.
Advertisement
Answer
I figured out what was going wrong. It was a little bit about how I had written my x and y values, and I had to get rid of my constant variable:
sql1 = "SELECT ISNULL((CASE WHEN ISNUMERIC(Rating) = 1 THEN CONVERT(float, Rating) ELSE 0 END), 0) AS 'Rating', ISNULL((CASE WHEN ISNUMERIC(Reviews) = 1 THEN CONVERT(float, Reviews) ELSE 0 END), 0) AS 'Reviews', ISNULL((CASE WHEN ISNUMERIC(SUBSTRING(Size, 0, LEN(Size))) = 1 THEN CONVERT(float, SUBSTRING(Size, 0, LEN(Size))) ELSE 0 END), 0) AS 'Size', ROUND(ISNULL((CASE WHEN ISNUMERIC(REPLACE(Price, '$', '')) = 1 THEN CONVERT(float, REPLACE(Price, '$', '')) ELSE 0 END), 0), 0) AS 'Price', (ISNULL((CASE WHEN ISNUMERIC(REPLACE(SUBSTRING(Installs, 0, LEN(Installs)), ',', '')) = 1 THEN CONVERT(float, REPLACE(SUBSTRING(Installs, 0, LEN(Installs)), ',', '')) ELSE 0 END), 0) / 10) AS 'Installs' FROM GooglePlayStore" data = pandas.read_sql(sql1,conn) x = data.reindex(["Price", "Installs", "Size", "Rating"], axis="columns") y = np.array([data['Reviews']]).reshape(-1, 1) fit = sm.OLS(y, x).fit()