
Postgres Dump Selected Rows

I have a database with a central object that most of the data depends on: many reference tables, one-to-many joins and many-to-many joins. I need to back up all rows based on a property in the central table and be able to restore those rows.

For backup purposes I have added ON DELETE CASCADE to the relevant dependent tables, so I can wipe all rows with a simple

DELETE FROM main_table WHERE attr=1
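
For reference, such a cascading constraint can be declared like this (table_1 and main_table_id are just the illustrative names used further down):

ALTER TABLE table_1
    ADD CONSTRAINT table_1_main_table_id_fkey
    FOREIGN KEY (main_table_id) REFERENCES main_table (id)
    ON DELETE CASCADE;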

But I need to be able to restore all the rows and references associated with them to an earlier point in time. Points:

  • pg_dump does not allow SQL-statement-based dumps, so I cannot dump only the rows matching a WHERE clause
  • COPY does not generate SQL, only raw data

Therefore this is the solution I propose, and I hate it. For each table and attribute value:

CREATE TABLE tmp_main_table AS SELECT * FROM main_table WHERE attr=val;
CREATE TABLE tmp_table_1 AS SELECT * FROM table_1 WHERE main_table_id IN (SELECT id FROM main_table WHERE attr=val);
...

Then pg_dump -t tmp_main_table -t tmp_table_1 ...

Restoring will then require reinserting the rows into those tables. I have full control over the database, so I can be sure that reference table ids remain the same, but it all seems very brittle. Is there a better way?
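
The reinsert step would then look roughly like this (a sketch only; it assumes the dumped tmp_* tables have been restored first, and val stands for the attribute value being rolled back):

BEGIN;
-- remove the current rows for this attribute value; the cascades take
-- care of the dependent tables
DELETE FROM main_table WHERE attr = val;
-- copy the snapshotted rows back, keeping the original ids
INSERT INTO main_table SELECT * FROM tmp_main_table;
INSERT INTO table_1 SELECT * FROM tmp_table_1;
COMMIT;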

EDIT: More Details

We run a number of commerce sites whose customers update their inventory via an API. It often happens that a customer's API calls are erroneous and they don't notice for days. In that case they call us and ask us to restore their inventory to a previous state, so we need to take periodic snapshots of the data per customer for quick restore. Here is a simple sample DB to illustrate. In the example below I need to back up just the rows in product for a given customer_id so they can be restored at a later date; when I restore, the other rows must be left alone and only the rows belonging to that particular customer_id restored:

CREATE TABLE customer (
    id SERIAL PRIMARY KEY,
    name TEXT
);

CREATE TABLE color (
    id SERIAL PRIMARY KEY,
    name TEXT
);

CREATE TABLE flag (
    id SERIAL PRIMARY KEY,
    name TEXT
);

CREATE TABLE product (
    id SERIAL PRIMARY KEY,
    name TEXT,
    color_id INTEGER, -- foreign key to color(id)
    customer_id INTEGER -- foreign key to customer(id)
);

CREATE TABLE product_flag_join (
    id SERIAL PRIMARY KEY,
    product_id INTEGER, -- foreign key to product(id)
    flag_id INTEGER -- foreign key to flag(id)
);
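
For completeness, the foreign keys hinted at in the comments, with the cascading behaviour mentioned above on the dependent join table, would be declared roughly like this (the exact constraint definitions are illustrative):

ALTER TABLE product
    ADD FOREIGN KEY (color_id) REFERENCES color (id),
    ADD FOREIGN KEY (customer_id) REFERENCES customer (id);

ALTER TABLE product_flag_join
    ADD FOREIGN KEY (product_id) REFERENCES product (id) ON DELETE CASCADE,
    ADD FOREIGN KEY (flag_id) REFERENCES flag (id);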


Answer

Using table partitioning seems a good solution. It requires quite a lot of preparation work but gives great comfort in backing up and restoring data. Each customer should have their own partitions in all the tables that contain customer-specific data, so you can back up the entire database with one command (preferably in a binary format) and restore only selected tables. You can also consider creating the customer partitions in separate schemas and restoring a whole schema when needed.
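
For the sample schema above, a minimal sketch of list-partitioning product by customer_id could look like this (declarative partitioning, PostgreSQL 11 or later; the partition name and customer id 42 are just examples, and note that the primary key of a partitioned table must include the partition key, so the id-only key changes):

CREATE TABLE product (
    id SERIAL,
    name TEXT,
    color_id INTEGER,
    customer_id INTEGER NOT NULL,
    PRIMARY KEY (id, customer_id) -- partition key must be part of the key
) PARTITION BY LIST (customer_id);

-- one partition per customer, e.g. for customer_id = 42
CREATE TABLE product_customer_42 PARTITION OF product FOR VALUES IN (42);

A per-customer snapshot is then just pg_dump -Fc -t product_customer_42, and pg_restore can later pull that single table out of a full binary backup.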

If for some reason the above is impossible, I would consider the following scheme:

  • install a rescue Postgres server on a local machine
  • perform daily binary backups of the entire database
  • restore the backup from the required day to the rescue server when needed
  • generate an SQL dump by querying the rescue server (see the sketch after this list)
  • restore the SQL dump to the main server
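
The SQL dump in the fourth step can be generated with a query along these lines (a sketch for the sample product table; format() with %L quotes each value as a literal):

-- run against the rescue server: emit one INSERT statement per row
-- belonging to the customer being restored (customer_id = 42 here)
SELECT format(
    'INSERT INTO product (id, name, color_id, customer_id) VALUES (%L, %L, %L, %L);',
    id, name, color_id, customer_id)
FROM product
WHERE customer_id = 42;

The resulting statements can then be replayed on the main server inside a transaction, after deleting that customer's current rows so the cascades clean up the join table.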