I’m creating a web application and I encountered a problem with importing data to a table in a postgress database. I have excel with id_b and id_cat(books id and categories id) books have several categories and categories can be assigned to many books, excel looks like this: excel data
It has 30 000 records.
I have a problem how to import it into the database(Postgres). The table for this data has two columns: id_b and id_cat. I wanted to export this data to csv in this way, each book has to be assigned a category identifier (e.g., book with identifier 1 should appear 9 times because it has 9 categories assigned to it and so on)- but I can’t do it easily. It should looks like this: correct data Does anyone know any way to get data in this form?
Advertisement
Answer
Your excel sheet format has a large number of columns, which also depends on the number of categories, and SQL isn’t well adapted to that.
The simplest option would be to:
Export your excel data as CSV.
Use a python script to read it using the csv module and output COPY-friendly tab-delimited format.
Load this into the database (or INSERT directly from python script).
Something like that…
import csv
with open('bookcat.csv') as csvfile:
reader = csv.reader(csvfile)
for row in reader:
if row:
id = row[0].strip()
categories = row[1:]
for cat in categories:
cat = cat.strip()
if cat:
print("%st%s" % (id, cat))
csv output version:
import csv
with open('bookcat.csv') as csvfile, open("out.csv","w") as outfile:
reader = csv.reader(csvfile)
writer = csv.writer(outfile)
for row in reader:
if row:
id = row[0].strip()
categories = row[1:]
for cat in categories:
cat = cat.strip()
if cat:
writer.writerow((id, cat))
If you need a specific csv format, check the docs of csv module.