I’m creating a web application and I encountered a problem with importing data to a table in a postgress database. I have excel with id_b and id_cat(books id and categories id) books have several categories and categories can be assigned to many books, excel looks like this: excel data
It has 30 000 records.
I have a problem how to import it into the database(Postgres). The table for this data has two columns: id_b and id_cat. I wanted to export this data to csv in this way, each book has to be assigned a category identifier (e.g., book with identifier 1 should appear 9 times because it has 9 categories assigned to it and so on)- but I can’t do it easily. It should looks like this: correct data Does anyone know any way to get data in this form?
Advertisement
Answer
Your excel sheet format has a large number of columns, which also depends on the number of categories, and SQL isn’t well adapted to that.
The simplest option would be to:
Export your excel data as CSV.
Use a python script to read it using the csv module and output COPY-friendly tab-delimited format.
Load this into the database (or INSERT directly from python script).
Something like that…
import csv with open('bookcat.csv') as csvfile: reader = csv.reader(csvfile) for row in reader: if row: id = row[0].strip() categories = row[1:] for cat in categories: cat = cat.strip() if cat: print("%st%s" % (id, cat))
csv output version:
import csv with open('bookcat.csv') as csvfile, open("out.csv","w") as outfile: reader = csv.reader(csvfile) writer = csv.writer(outfile) for row in reader: if row: id = row[0].strip() categories = row[1:] for cat in categories: cat = cat.strip() if cat: writer.writerow((id, cat))
If you need a specific csv format, check the docs of csv module.