Skip to content
Advertisement

How to fix the error: ValueError: endog and exog matrices are different sizes

I’m trying to write a program in python that uses a query in SQL to collect data and make a regression model. When I try to actually create the model, however, it gives me this error.

import pyodbc
import pandas
import statsmodels.api as sm
import numpy as np

server = 'ludsampledb.database.windows.net'
database = 'SampleDB'
username = 'sampleadmin'
password = '+U9Ly9/p'   
driver = '{ODBC Driver 17 for SQL Server}'
table = 'GooglePlayStore'

conn = pyodbc.connect('DRIVER='+driver+';SERVER='+server+';PORT=1433;DATABASE='+database+';UID='+username+';PWD='+ password)

sql1 = "SELECT ISNULL((CASE WHEN ISNUMERIC(Rating) = 1 THEN CONVERT(float, Rating) ELSE 0 END), 0) AS 'Rating', ISNULL((CASE WHEN ISNUMERIC(Reviews) = 1 THEN CONVERT(float, Reviews) ELSE 0 END), 0) AS 'Reviews', ISNULL((CASE WHEN ISNUMERIC(SUBSTRING(Size, 0, LEN(Size))) = 1 THEN CONVERT(float, SUBSTRING(Size, 0, LEN(Size))) ELSE 0 END), 0) AS 'Size', ISNULL((CASE WHEN ISNUMERIC(REPLACE(Price, '$', '')) = 1 THEN CONVERT(float, REPLACE(Price, '$', '')) ELSE 0 END), 0) AS 'Price', ISNULL((CASE WHEN ISNUMERIC(REPLACE(SUBSTRING(Installs, 0, LEN(Installs)), ',', '')) = 1 THEN CONVERT(float, REPLACE(SUBSTRING(Installs, 0, LEN(Installs)), ',', '')) ELSE 0 END), 0) AS 'Installs' FROM GooglePlayStore"

data = pandas.read_sql(sql1,conn)

x = np.array([data["Rating"], data["Size"], data["Installs"], data["Price"]]).reshape(-1, 1)
x = sm.add_constant(x)
print(x.shape)
y = np.array([data['Reviews']]).reshape(-1, 1)
print(y.shape)

fit = sm.OLS(y, x).fit() #This is where the error is occurring

I’m pretty sure that I know what is going wrong, but I have no idea how to fix it. I’ve tried several things already, but none so far have worked.

Advertisement

Answer

I figured out what was going wrong. It was a little bit about how I had written my x and y values, and I had to get rid of my constant variable:

sql1 = "SELECT ISNULL((CASE WHEN ISNUMERIC(Rating) = 1 THEN CONVERT(float, Rating) ELSE 0 END), 0) AS 'Rating', ISNULL((CASE WHEN ISNUMERIC(Reviews) = 1 THEN CONVERT(float, Reviews) ELSE 0 END), 0) AS 'Reviews', ISNULL((CASE WHEN ISNUMERIC(SUBSTRING(Size, 0, LEN(Size))) = 1 THEN CONVERT(float, SUBSTRING(Size, 0, LEN(Size))) ELSE 0 END), 0) AS 'Size', ROUND(ISNULL((CASE WHEN ISNUMERIC(REPLACE(Price, '$', '')) = 1 THEN CONVERT(float, REPLACE(Price, '$', '')) ELSE 0 END), 0), 0) AS 'Price', (ISNULL((CASE WHEN ISNUMERIC(REPLACE(SUBSTRING(Installs, 0, LEN(Installs)), ',', '')) = 1 THEN CONVERT(float, REPLACE(SUBSTRING(Installs, 0, LEN(Installs)), ',', '')) ELSE 0 END), 0) / 10) AS 'Installs' FROM GooglePlayStore"
data = pandas.read_sql(sql1,conn)

x = data.reindex(["Price", "Installs", "Size", "Rating"], axis="columns")
y = np.array([data['Reviews']]).reshape(-1, 1)

fit = sm.OLS(y, x).fit()
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement