I need to load a large number of csv files in to a PostgreSQL database. I have a table source_files which contains the file paths and a flag which indicates whether a file has already been loaded for all the csv files I need to load.
I have written the following code which loads the first file correctly but then throws the error:
ERROR: cursor "curs" does not exist
Why am I getting this error and how can I fix it?
DO $$
DECLARE
file_record record;
curs CURSOR
FOR SELECT id, file_path
FROM source_files
WHERE added_to_db=FALSE
ORDER BY id;
BEGIN
OPEN curs;
LOOP
-- Get next entry in source file which contains name of csv to load
FETCH curs INTO file_record;
exit WHEN NOT found;
BEGIN
-- As we need to add a column to the data after loading csv but before inserting
-- into final table we use a temporary table mytemp
DROP TABLE mytemp;
CREATE TABLE mytemp
(
dataA numeric,
dataB numeric
);
-- Load csv file
EXECUTE FORMAT('COPY mytemp
FROM ''%s''
DELIMITER '',''
CSV HEADER;', file_record.file_path);
-- Add Column specifying what source file the data is from
ALTER TABLE mytemp
ADD COLUMN source_id int;
UPDATE mytemp
SET source_id=file_record.id;
-- Add the data to the destination table
INSERT INTO data_table(
dataA,
dataB,
source_id
)
SELECT
mytemp.dataA,
mytemp.dataB
mytemp.source_id
FROM
mytemp
-- Set a flag to indicate that the current file in source_files has been loaded
UPDATE source_files
SET added_to_db=TRUE WHERE CURRENT OF curs;
COMMIT;
END;
END LOOP;
CLOSE curs;
END $$;
Advertisement
Answer
There big problem with your code is the COMMIT. You can use COMMIT in a DO statement, but the cursor is closed as soon as the transaction ends.
In SQL you can create a cursor WITH HOLD that remains valid after the transaction has ended, but that is not available in PL/pgSQL.
I suggest removing the COMMIT.
Another error in your code is your use of the format function, which exposes you to SQL injection. Instead of
FORMAT('COPY mytemp
FROM ''%s''
DELIMITER '',''
CSV HEADER;', file_record.file_path);
use
FORMAT('COPY mytemp
FROM %L
DELIMITER '',''
CSV HEADER;', file_record.file_path);
You can make the code much simpler by using an implicit cursor in the loop:
FOR file_record IN SELECT id, file_path FROM source_files WHERE added_to_db=FALSE ORDER BY id LOOP ... END LOOP;
That saves you declaring the cursor and the EXIT WHEN. The OPEN and CLOSE statements are unnecessary anyway.