Skip to content
Advertisement

DuckDB – efficiently insert pandas dataframe to table with sequence

CREATE TABLE temp (
    id UINTEGER,
    name VARCHAR,
    age UINTEGER
);
CREATE SEQUENCE serial START 1;

Insertion with series works just fine:

INSERT INTO temp VALUES(nextval('serial'), 'John', 13)

How I can use the sequence with pandas dataframe?

data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)
     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13

con.execute("INSERT INTO temp SELECT * FROM df")
RuntimeError: Binder Error: table temp has 3 columns but 2 values were supplied

I don’t want to iterate item by item. The goal is to efficiently insert 1000s of items from python to DB. I’m ok to change pandas to something else.

Advertisement

Answer

Can’t you have nextval(‘serial’) as part of your select query when reading the df?

e.g.,

con.execute("INSERT INTO temp SELECT nextval('serial'), Name, Age FROM df")
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement