I need to load a large number of csv files in to a PostgreSQL database. I have a table source_files
which contains the file paths and a flag which indicates whether a file has already been loaded for all the csv files I need to load.
I have written the following code which loads the first file correctly but then throws the error:
ERROR: cursor "curs" does not exist
Why am I getting this error and how can I fix it?
DO $$ DECLARE file_record record; curs CURSOR FOR SELECT id, file_path FROM source_files WHERE added_to_db=FALSE ORDER BY id; BEGIN OPEN curs; LOOP -- Get next entry in source file which contains name of csv to load FETCH curs INTO file_record; exit WHEN NOT found; BEGIN -- As we need to add a column to the data after loading csv but before inserting -- into final table we use a temporary table mytemp DROP TABLE mytemp; CREATE TABLE mytemp ( dataA numeric, dataB numeric ); -- Load csv file EXECUTE FORMAT('COPY mytemp FROM ''%s'' DELIMITER '','' CSV HEADER;', file_record.file_path); -- Add Column specifying what source file the data is from ALTER TABLE mytemp ADD COLUMN source_id int; UPDATE mytemp SET source_id=file_record.id; -- Add the data to the destination table INSERT INTO data_table( dataA, dataB, source_id ) SELECT mytemp.dataA, mytemp.dataB mytemp.source_id FROM mytemp -- Set a flag to indicate that the current file in source_files has been loaded UPDATE source_files SET added_to_db=TRUE WHERE CURRENT OF curs; COMMIT; END; END LOOP; CLOSE curs; END $$;
Advertisement
Answer
There big problem with your code is the COMMIT
. You can use COMMIT
in a DO
statement, but the cursor is closed as soon as the transaction ends.
In SQL you can create a cursor WITH HOLD
that remains valid after the transaction has ended, but that is not available in PL/pgSQL.
I suggest removing the COMMIT
.
Another error in your code is your use of the format
function, which exposes you to SQL injection. Instead of
FORMAT('COPY mytemp FROM ''%s'' DELIMITER '','' CSV HEADER;', file_record.file_path);
use
FORMAT('COPY mytemp FROM %L DELIMITER '','' CSV HEADER;', file_record.file_path);
You can make the code much simpler by using an implicit cursor in the loop:
FOR file_record IN SELECT id, file_path FROM source_files WHERE added_to_db=FALSE ORDER BY id LOOP ... END LOOP;
That saves you declaring the cursor and the EXIT WHEN
. The OPEN
and CLOSE
statements are unnecessary anyway.