I have a database that has a central object that most of the data depends on; many reference table, one to many joins and many to many joins. I need to back up all rows based on a property in the central table and be able to restore these rows.
For backup purposes. I have added
ON DELETE CASCADE to the relevant dependent tables. So I can wipe all rows by simply
DELETE FROM main_table WHERE attr=1
But I need to be able to restore all the rows and references associated with them to an earlier point in time. Points:
pg_dumpdoes not allow sql statement based dumps
COPYdoes not generate SQL
Therefore this is the solution I propose and I hate it. For each table, attiribute value:
CREATE TABLE tmp_main_table AS SELECT * FROM main_table WHERE attr=val; CREATE TABLE tmp_table_1 AS SELECT * FROM table_1 WHERE main_table_id IN (SELECT id FROM main_Table WHERE attr=val); ...
pg_dump -t tmp_main_table -t tmp_table_1 ...
Restoring will then require reinserting the rows into those tables. I have full control over the database so I can be sure that reference table ides remain the same, but it all seems very brittle. Is there a better way?
EDIT: More Details
We run a bunch of commerce sites with customers updating their inventory via API. However, it often happens that the customers’ API calls are erroneous and they don’t notice it for days. In that case, they will call us and ask to restore their inventory to a previous state. We need to be able to take periodic snap shots of data by customer for quick restore. Here is a simple sample DB to illustrate, in the below example I need to back up just the rows in
customer_id for restore at a later date and when I restore I want the other rows left alone and only the rows belonging to that particular
CREATE TABLE customer ( id SERIAL PRIMARY KEY, name TEXT ); CREATE TABLE color ( id SERIAL PRIMARY KEY, name TEXT ); CREATE TABLE flag ( id SERIAL PRIMARY KEY, name TEXT ); CREATE TABLE product ( id SERIAL PRIMARY KEY, name TEXT, color_id INTEGER, -- foreign key to color(id) customer_id INTEGER -- foreign key to customer(id) ); CREATE TABLE product_flag_join ( id SERIAL PRIMARY KEY, product_id INTEGER, -- foreign key to product(id) flag_id -- foreign key to flag(id) );
Using table partitioning seems a good solution. It requires quite a lot of preparation work but gives great comfort in backing up and restoring data. Each customer should have his own partitions in all the tables that contain data specific for a customer, so you can backup the entire database with one command (preferably in a binary format), and restore only selected tables. You can also consider creating the client partitions in separate schemas and restore a whole schema when needed.
If for some reason the above is impossible I would think of the following schema:
- install a rescue Postgres server on a local machine
- perform daily binary backups of the entire database
- restore the backup from the required day to the rescue server when needed
- generate an SQL dump by querying the rescue server
- restore the SQL dump to the main server