I am trying to store some tables I create in my code in an RDS instance using psycopg2. The script runs without issue and I can see the table being stored correctly in the DB. However, if I try to retrieve the query, I only see the columns, but no data:
import pandas as pd
import psycopg2
test=pd.DataFrame({'A':[1,1],'B':[2,2]})
#connect is a function to connect to the RDS instance
connection= connect()
cursor=connection.cursor()
query='CREATE TABLE test (A varchar NOT NULL,B varchar NOT NULL);'
cursor.execute(query)
connection.commit()
cursor.close()
connection.close()
This script runs without issues and, printing out file_check
from the following script:
connection=connect()
# check if file already exists in SQL
sql = """
SELECT "table_name","column_name", "data_type", "table_schema"
FROM INFORMATION_SCHEMA.COLUMNS
WHERE "table_schema" = 'public'
ORDER BY table_name
"""
file_check=pd.read_sql(sql, con=connection)
connection.close()
I get:
table_name column_name data_type table_schema
0 test a character varying public
1 test b character varying public
which looks good.
Running the following however:
read='select * from public.test'
df=pd.read_sql(read,con=connection)
returns:
Empty DataFrame
Columns: [a, b]
Index: []
Anybody have any idea why this is happening? I cannot seem to get around this
Advertisement
Answer
I was able to solve this: As it was pointed out by @AKX, I was only creating the table structure, but I was not filling in the table.
I now import import psycopg2.extras
as well and, after this:
query='CREATE TABLE test (A varchar NOT NULL,B varchar NOT NULL);'
cursor.execute(query)
I add something like:
update_query='INSERT INTO test(A, B) VALUES(%s,%s) ON CONFLICT DO NOTHING'
psycopg2.extras.execute_batch(cursor, update_query, test.values)
cursor.close()
connection.close()
My table is now correctly filled after checking with pd.read_sql