Skip to content
Advertisement

Use COPY to read a csv into database, while adding constant columns

I have a series of CSV files, in the format:

"Alice","Bob","A",123.46,"4"
"Charlie","Dana","B",987.6543,"9"
...

I want to produce a table like this:

id   file_id  mch    c1         c2      c3   c4        c5
---  -------  -----  ---------  ------  ---  --------  ---
1    462      FALSE  'Alice'    'Bob'   'A'  123.46    '4'
2    462      FALSE  'Charlie'  'Dana'  'B'  987.6543  '9'
...  462      FALSE  ...        ...     ...  ...       ...
...  ...      ...    ...        ...     ...  ...       ...

where

  • id is the primary key, is arbitrary, and should be automatically generated in order of insertion. It’s specified as such.
  • file_id is constant per file that I want to insert to this table, though varies between files. It is known before I try to bulk-add the file.
  • mch is always false, at time of insertion
  • c1 through c5 are generic column titles, with types known in advance.

Currently I’m using the following SQL command to bulk-insert each of my CSV files:

COPY pos(c1,c2,c3,c4,c5) 
FROM 'C:/Users/.../filename.csv' 
WITH (FORMAT CSV, HEADER FALSE, ENCODING 'UTF8')

which works for filling the primary key id. This produces a table without the columns file_id or mch.

But I can’t figure out how to properly populate the other two columns (file_id and mch) within the same statement, without doing a whole other UPDATE statement.

Advertisement

Answer

Assuming this table definition:

CREATE TABLE pos (
  id        serial PRIMARY KEY
, file_id   int
, mch       bool
, c1        text
, c2        text
, c3        text
, c4        numeric
, c5        text
);

The manual on COPY:

Table columns not specified in the COPY FROM column list will receive their default values.

You already see that effect for id, where the default is assigned. Remaining columns to deal with: mch and file_id:

  • mch is always false, at time of insertion

Make it so, permanently:

ALTER TABLE pos ALTER mc SET DEFAULT false;

Necessary privileges: You must own the table or be a superuser to use ALTER TABLE.

  • file_id is constant per file that I want to insert to this table, though varies between files. It is known before I try to bulk-add the file.

Set the column default before running COPY. You can do that inside the transaction if you don’t want other inserts to have the same default. (DDL commands are fully transactional in Postgres.) But that locks the table for the rest of the transaction (ACCESS EXCLUSIVE lock).

Or you run a separate command (in its own transaction) before COPY. Then the default is possibly effective for concurrent transactions. But you only need a very brief lock on the table.

-- BEGIN;  -- optional transaction wrapper
ALTER TABLE pos ALTER file_id SET DEFAULT 462;

COPY pos(c1,c2,c3,c4,c5) 
FROM 'C:/Users/.../filename.csv' 
WITH (FORMAT CSV, HEADER FALSE, ENCODING 'UTF8');  -- unchanged!

-- ALTER TABLE pos ALTER file_id DROP DEFAULT;  -- optional
-- COMMIT;  -- if you BEGIN; also COMMIT;

You may or may not want to reset the column default for file_id after COPY. If you are going to run the next COPY with a new default right after, you might just set that new default …

Or you may want to record the old column default and reset it. See:

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement