Skip to content
Advertisement

CSV to SQL, but all values are NULL

I am trying to convert a set of relational .csv files to a db with sqlite3:

import pandas as pd
import os
import csv
import sqlite3

dir = 'db_folder/'
dir2 = 'test/1/'

# Clean data
for file in os.listdir(dir):
    if file.endswith(".csv"):
        filename = os.path.splitext(os.path.basename(file))[0]
        with open(f'{dir}{file}', encoding='utf8', errors='ignore') as f:
            columns = list(filter(None, csv.DictReader(f).fieldnames))
            df = pd.read_csv(f, low_memory=False)
            clean = df.dropna(1, how='all')
            path = r'./TEST/1/'
            clean.to_csv(f'{path}{filename}.csv', header=columns, index=False)


# Create tables and insert data
conn = sqlite3.connect('test/mydatabase.db')
cursor = conn.cursor()

for file in os.listdir(dir2):
    filename = os.path.splitext(os.path.basename(file))[0]
    with open(f'{dir2}{file}', encoding='utf8', errors='ignore') as f:
        columns = csv.DictReader(f).fieldnames
        cursor.execute('DROP TABLE IF EXISTS "{table}"'.format(table=filename))
        cursor.execute('CREATE TABLE "{table}" ( {columns} )'.format(
                        table=filename, columns=','.join('"{}"'.format(column) for column in columns)))

        dr = csv.DictReader(f)
        cursor.executemany('INSERT INTO "{table}" VALUES ( {values} )'.format(
                            table=filename, values=','.join('?' for column in columns)),
                            (list(map(row.get, columns)) for row in dr))
    conn.commit()
cursor.close()
conn.close()

This runs without error and produces a database with the correct tables which have the correct columns. The rows, however, are all NULL for some reason.

I have tried debugging the insertion line to see what is going on. I stepped into the executemany() function. Inside the DictReader variable was a fieldnames list ['2', '383', '0.5019', '2003-08-12'] which is indeed one of the rows that belong in a table. I cannot figure out why it is being inserted into my database as NULL.

Advertisement

Answer

(list(map(row.get, columns)) for row in dr)) produces a generator object, but executemany takes a sequence of sequences. You instead need [list(map(row.get, dr.fieldnames)) for row in dr], [] instead of ().

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement