I have a problem like this: I need to optimize the application, with db (postgreSQL), the table looks like this:
CREATE TABLE voter_count( id SERIAL, name VARCHAR NOT NULL, birthDate DATE NOT NULL, count INT NOT NULL, PRIMARY KEY(id), UNIQUE (name, birthDate))
I have more than a thousand such voters, and I need to put all of them in the database, but among them there are several duplicates that could vote several times (from 2 to infinity), and I need to, when meeting such a duplicate, increase the count field for an existing one (for a voter with the same name and birthdate). Previously, I just checked whether there is such a voter in the table or not, and if there is, then find it and increase the count.
But the program worked for too long, and I tried to do it through MULTI INSERT and use ON CONFLICT DO UPDATE to increase count, but I get an error, then I asked a question on stackoverflow, and I was offered to do a lot of INSERTs, through a loop, but in PostgreSQL.
INSERT INTO voter_count(name, birthdate, count) VALUES ('Ivan', '1998-08-05', 1), ('Sergey', '1998-08-29', 1), ('Ivan', '1998-08-05', 1) ON CONFLICT (name, birthdate) DO UPDATE SET count = (voter_count.count + 1)
Question: how to do INSERT in a loop through PostgreSQL.
Advertisement
Answer
Probably the best option is to insert before all the data in a table without primary key, for instance:
CREATE TABLE voter_count_with_duplicates( name VARCHAR NOT NULL, birthDate DATE NOT NULL)
and then insert the data with a single statement:
INSERT INTO voter_count (name, birthDate, count) SELECT name, birthDate, COUNT(*) FROM voter_count_with_duplicates GROUP BY name, birthDate
Note that if you have the data in a structured text file (for instance a CSV file), you can insert all the data into voter_count_with_duplicates
with a single COPY
statement.
If you have to insert (a lot of) new data with the table already populated, there are several possibilities. One is to use the solution in the comment. Another one is to perform an an update and an insert:
WITH present_tuples AS (SELECT name, birthDate, COUNT(*) AS num_of_new_votes FROM voter_count_with_duplicates d JOIN voter_count c ON v.name = d.name and v.birthDate = d.birthDate GROUP BY name, birthDate) UPDATE voter_count SET count = count + num_of_new_votes FROM present_tuples WHERE present_tuples.name = voter_count.name AND present_tuples.birthDate = voter_count.birthDate; WITH new_tuples AS (SELECT name, birthDate, COUNT(*) AS votes FROM voter_count_with_duplicates d WHERE NOT EXISTS SELECT * FROM voter_count c WHERE v.name = d.name and v.birthDate = d.birthDate GROUP BY name, birthDate) INSERT INTO voter_count (name, birthDate, count) SELECT name, birthDate, votes FROM new_tuples;