I need to load a large number of csv files in to a PostgreSQL database. I have a table source_files
which contains the file paths and a flag which indicates whether a file has already been loaded for all the csv files I need to load.
I have written the following code which loads the first file correctly but then throws the error:
ERROR: cursor "curs" does not exist
Why am I getting this error and how can I fix it?
DO $$
DECLARE
file_record record;
curs CURSOR
FOR SELECT id, file_path
FROM source_files
WHERE added_to_db=FALSE
ORDER BY id;
BEGIN
OPEN curs;
LOOP
-- Get next entry in source file which contains name of csv to load
FETCH curs INTO file_record;
exit WHEN NOT found;
BEGIN
-- As we need to add a column to the data after loading csv but before inserting
-- into final table we use a temporary table mytemp
DROP TABLE mytemp;
CREATE TABLE mytemp
(
dataA numeric,
dataB numeric
);
-- Load csv file
EXECUTE FORMAT('COPY mytemp
FROM ''%s''
DELIMITER '',''
CSV HEADER;', file_record.file_path);
-- Add Column specifying what source file the data is from
ALTER TABLE mytemp
ADD COLUMN source_id int;
UPDATE mytemp
SET source_id=file_record.id;
-- Add the data to the destination table
INSERT INTO data_table(
dataA,
dataB,
source_id
)
SELECT
mytemp.dataA,
mytemp.dataB
mytemp.source_id
FROM
mytemp
-- Set a flag to indicate that the current file in source_files has been loaded
UPDATE source_files
SET added_to_db=TRUE WHERE CURRENT OF curs;
COMMIT;
END;
END LOOP;
CLOSE curs;
END $$;
Advertisement
Answer
There big problem with your code is the COMMIT
. You can use COMMIT
in a DO
statement, but the cursor is closed as soon as the transaction ends.
In SQL you can create a cursor WITH HOLD
that remains valid after the transaction has ended, but that is not available in PL/pgSQL.
I suggest removing the COMMIT
.
Another error in your code is your use of the format
function, which exposes you to SQL injection. Instead of
FORMAT('COPY mytemp
FROM ''%s''
DELIMITER '',''
CSV HEADER;', file_record.file_path);
use
FORMAT('COPY mytemp
FROM %L
DELIMITER '',''
CSV HEADER;', file_record.file_path);
You can make the code much simpler by using an implicit cursor in the loop:
FOR file_record IN
SELECT id, file_path
FROM source_files
WHERE added_to_db=FALSE
ORDER BY id
LOOP
END LOOP;
That saves you declaring the cursor and the EXIT WHEN
. The OPEN
and CLOSE
statements are unnecessary anyway.