Writing dataframe via sql query (pyodbc): pyodbc.E…

I’d like to parse a dataframe to two pre-define columns in an sql table. The schema in sql is:

abc(varchar(255))
def(varchar(255))

With a dataframe like so:

df = pd.DataFrame(
    [
        [False, False],
        [True, True],

    ],
    columns=["ABC", "DEF"],
)

And the sql query is like so:

with conn.cursor() as cursor:
    string = "INSERT INTO {0}.{1}(abc, def) VALUES (?,?)".format(db, table)
    cursor.execute(string, (df["ABC"]), (df["DEF"]))
    cursor.commit()

So that the query (string) looks like so:

'INSERT INTO my_table(abc, def) VALUES (?,?)'

This creates the following error message:

pyodbc.Error: ('HY004', '[HY004] [Cloudera][ODBC] (11320) SQL type not supported. (11320) (SQLBindParameter)')

So I try using a direct query (not via Python) in the Impala editor, on the following:

'INSERT INTO my_table(abc, def) VALUES ('Hey','Hi');'

And produces this error message:

AnalysisException: Possible loss of precision for target table 'my_table'. Expression ''hey'' (type: `STRING) would need to be cast to VARCHAR(255) for column 'abc'`

How come I cannot even insert into my table simple strings, like “Hi”? Is my schema set up correctly or perhaps something else?

Answer

STRING type in Impala has a size limit of 2GB. VARCHAR‘s length is whatever you define it to be, but not more than 64KB. Thus there is a potential of data loss if you implicitly convert one into another.

By default, literals are treated as type STRING. So, in order to insert a literal into VARCHAR field you need to CAST it appropriately.

INSERT INTO my_table(abc, def) VALUES (CAST('Hey' AS VARCHAR(255)),CAST('Hi' AS VARCHAR(255)));

Writing dataframe via sql query (pyodbc): pyodbc.Error: (‘HY004’, ‘[HY004])

Advertisement

Answer