I am trying to convert a set of relational .csv files to a db with sqlite3:
import pandas as pd import os import csv import sqlite3 dir = 'db_folder/' dir2 = 'test/1/' # Clean data for file in os.listdir(dir): if file.endswith(".csv"): filename = os.path.splitext(os.path.basename(file))[0] with open(f'{dir}{file}', encoding='utf8', errors='ignore') as f: columns = list(filter(None, csv.DictReader(f).fieldnames)) df = pd.read_csv(f, low_memory=False) clean = df.dropna(1, how='all') path = r'./TEST/1/' clean.to_csv(f'{path}{filename}.csv', header=columns, index=False) # Create tables and insert data conn = sqlite3.connect('test/mydatabase.db') cursor = conn.cursor() for file in os.listdir(dir2): filename = os.path.splitext(os.path.basename(file))[0] with open(f'{dir2}{file}', encoding='utf8', errors='ignore') as f: columns = csv.DictReader(f).fieldnames cursor.execute('DROP TABLE IF EXISTS "{table}"'.format(table=filename)) cursor.execute('CREATE TABLE "{table}" ( {columns} )'.format( table=filename, columns=','.join('"{}"'.format(column) for column in columns))) dr = csv.DictReader(f) cursor.executemany('INSERT INTO "{table}" VALUES ( {values} )'.format( table=filename, values=','.join('?' for column in columns)), (list(map(row.get, columns)) for row in dr)) conn.commit() cursor.close() conn.close()
This runs without error and produces a database with the correct tables which have the correct columns. The rows, however, are all NULL for some reason.
I have tried debugging the insertion line to see what is going on. I stepped into the executemany()
function. Inside the DictReader variable was a fieldnames list ['2', '383', '0.5019', '2003-08-12']
which is indeed one of the rows that belong in a table. I cannot figure out why it is being inserted into my database as NULL.
Advertisement
Answer
(list(map(row.get, columns)) for row in dr))
produces a generator object, but executemany
takes a sequence of sequences. You instead need [list(map(row.get, dr.fieldnames)) for row in dr]
, []
instead of ()
.