I am trying to store some tables I create in my code in an RDS instance using psycopg2. The script runs without issue and I can see the table being stored correctly in the DB. However, if I try to retrieve the query, I only see the columns, but no data:
import pandas as pd import psycopg2 test=pd.DataFrame({'A':[1,1],'B':[2,2]}) #connect is a function to connect to the RDS instance connection= connect() cursor=connection.cursor() query='CREATE TABLE test (A varchar NOT NULL,B varchar NOT NULL);' cursor.execute(query) connection.commit() cursor.close() connection.close()
This script runs without issues and, printing out file_check
from the following script:
connection=connect() # check if file already exists in SQL sql = """ SELECT "table_name","column_name", "data_type", "table_schema" FROM INFORMATION_SCHEMA.COLUMNS WHERE "table_schema" = 'public' ORDER BY table_name """ file_check=pd.read_sql(sql, con=connection) connection.close()
I get:
table_name column_name data_type table_schema 0 test a character varying public 1 test b character varying public
which looks good.
Running the following however:
read='select * from public.test' df=pd.read_sql(read,con=connection)
returns:
Empty DataFrame Columns: [a, b] Index: []
Anybody have any idea why this is happening? I cannot seem to get around this
Advertisement
Answer
I was able to solve this: As it was pointed out by @AKX, I was only creating the table structure, but I was not filling in the table.
I now import import psycopg2.extras
as well and, after this:
query='CREATE TABLE test (A varchar NOT NULL,B varchar NOT NULL);' cursor.execute(query)
I add something like:
update_query='INSERT INTO test(A, B) VALUES(%s,%s) ON CONFLICT DO NOTHING' psycopg2.extras.execute_batch(cursor, update_query, test.values) cursor.close() connection.close()
My table is now correctly filled after checking with pd.read_sql