I have a series of CSV files, in the format:
"Alice","Bob","A",123.46,"4" "Charlie","Dana","B",987.6543,"9" ...
I want to produce a table like this:
id file_id mch c1 c2 c3 c4 c5 --- ------- ----- --------- ------ --- -------- --- 1 462 FALSE 'Alice' 'Bob' 'A' 123.46 '4' 2 462 FALSE 'Charlie' 'Dana' 'B' 987.6543 '9' ... 462 FALSE ... ... ... ... ... ... ... ... ... ... ... ... ...
where
id
is the primary key, is arbitrary, and should be automatically generated in order of insertion. It’s specified as such.file_id
is constant per file that I want to insert to this table, though varies between files. It is known before I try to bulk-add the file.mch
is always false, at time of insertionc1
throughc5
are generic column titles, with types known in advance.
Currently I’m using the following SQL command to bulk-insert each of my CSV files:
COPY pos(c1,c2,c3,c4,c5) FROM 'C:/Users/.../filename.csv' WITH (FORMAT CSV, HEADER FALSE, ENCODING 'UTF8')
which works for filling the primary key id. This produces a table without the columns file_id
or mch
.
But I can’t figure out how to properly populate the other two columns (file_id
and mch
) within the same statement, without doing a whole other UPDATE
statement.
Advertisement
Answer
Assuming this table definition:
CREATE TABLE pos ( id serial PRIMARY KEY , file_id int , mch bool , c1 text , c2 text , c3 text , c4 numeric , c5 text );
Table columns not specified in the
COPY FROM
column list will receive their default values.
You already see that effect for id
, where the default is assigned. Remaining columns to deal with: mch
and file_id
:
mch
is always false, at time of insertion
Make it so, permanently:
ALTER TABLE pos ALTER mc SET DEFAULT false;
Necessary privileges: You must own the table or be a superuser to use ALTER TABLE
.
file_id
is constant per file that I want to insert to this table, though varies between files. It is known before I try to bulk-add the file.
Set the column default before running COPY
. You can do that inside the transaction if you don’t want other inserts to have the same default. (DDL commands are fully transactional in Postgres.) But that locks the table for the rest of the transaction (ACCESS EXCLUSIVE
lock).
Or you run a separate command (in its own transaction) before COPY
. Then the default is possibly effective for concurrent transactions. But you only need a very brief lock on the table.
-- BEGIN; -- optional transaction wrapper ALTER TABLE pos ALTER file_id SET DEFAULT 462; COPY pos(c1,c2,c3,c4,c5) FROM 'C:/Users/.../filename.csv' WITH (FORMAT CSV, HEADER FALSE, ENCODING 'UTF8'); -- unchanged! -- ALTER TABLE pos ALTER file_id DROP DEFAULT; -- optional -- COMMIT; -- if you BEGIN; also COMMIT;
You may or may not want to reset the column default for file_id
after COPY
. If you are going to run the next COPY
with a new default right after, you might just set that new default …
Or you may want to record the old column default and reset it. See: