[WIP] Implement "pg_restore --data-only --clean" as a way to skip WAL

Started by Dimitrios Apostolou9 months ago7 messages
1 attachment(s)

Hello list,

I implemented --clean support for --data-only, in order to avoid logging
to the WAL while populating the database. The attached patch issues a
TRUNCATE before COPY on each worker process, and provides a significant
speed advantage if the cluster is configure with wal_level=minimal.

It also provides a safer way to load the database, as avoiding WAL logging
also avoids potential and painful ENOSPACE on the WAL partition as I
experienced in [1]/messages/by-id/076464ad-3d70-dd25-9e8f-e84f27decfba@gmx.net. In other words it makes things much better for my use
case.

[1]: /messages/by-id/076464ad-3d70-dd25-9e8f-e84f27decfba@gmx.net

But it has some rough edges. I would appreciate guidance and feedback.

* When the table-to-be-TRUNCATEd is referenced as foreign key from other
table, the whole transaction fails with:

ERROR: cannot truncate a table referenced in a foreign key constraint

1. As a first step, when TRUNCATE fails I want to try a DELETE FROM
instead, which has more chances of succeeding, and continuing with
the COPY. How to detect the failure of ahprintf("TRUNCATE") and do
the alternative without failing the whole transaction?

2. Why doesn't --disable-triggers help?
To test this, I have manually issued

ALTER TABLE x DISABLE TRIGGER ALL

to every table and issued manual TRUNCATE still fails. Shouldn't
postgres skip the referential integrity checks?

3. In my tests, all my tables start empty since I have just created the
schema. Then pg_restore --data-only --clean first populates
the /referencing/ tables, which is allowed because of disabled
triggers, and then it tries to load the /referenced/ table.

At this point the referential integrity is already broken. Getting an
error when TRUNCATing the empty /referenced/ table doesn't make
sense.

What do you think?

Thank you in advance,
Dimitris

Attachments:

v1-0001-pg_restore-clean-data-only.patchtext/x-patch; name=v1-0001-pg_restore-clean-data-only.patchDownload
From ee28b0b9c1d3b78f318f259f4907785db98f31e6 Mon Sep 17 00:00:00 2001
From: Dimitrios Apostolou <jimis@qt.io>
Date: Sat, 12 Apr 2025 01:59:45 +0200
Subject: [PATCH v1] pg_restore --clean --data-only

In parallel restore, it issues a TRUNCATE before COPYing the data into
the tables, within a transaction.

As a result it avoid logging to the WAL, when combined with
wal_level=minimal.
---
 doc/src/sgml/ref/pg_restore.sgml     | 10 ++++++++++
 src/bin/pg_dump/pg_backup.h          |  1 +
 src/bin/pg_dump/pg_backup_archiver.c |  3 ++-
 src/bin/pg_dump/pg_restore.c         |  6 +++---
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 2e3ba802581..697ca25d70a 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -112,18 +112,28 @@ PostgreSQL documentation
       <listitem>
        <para>
         Before restoring database objects, issue commands
         to <command>DROP</command> all the objects that will be restored.
         This option is useful for overwriting an existing database.
         If any of the objects do not exist in the destination database,
         ignorable error messages will be reported,
         unless <option>--if-exists</option> is also specified.
        </para>
+       <para>
+         In combination with <option>--data-only</option> a TRUNCATE will be
+         attempted instead of DROP, before COPYing the data.  So if you want
+         to overwrite an existing database without re-writing the schema, then
+         issue <option>--data-only --clean</option>.  Together
+         with <option>--parallel</option> it is a high performance way to load
+         the tables, as it avoids logging to the WAL (if the server is
+         configured with <option>wal_level=minimal</option>).  Warning:
+         foreign key constraints might cause table truncation to fail.
+       </para>
       </listitem>
      </varlistentry>
 
      <varlistentry>
       <term><option>-C</option></term>
       <term><option>--create</option></term>
       <listitem>
        <para>
         Create the database before restoring into it.
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index fbf5f1c515e..642795f0223 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -97,18 +97,19 @@ typedef struct _restoreOptions
 	int			noOwner;		/* Don't try to match original object owner */
 	int			noTableAm;		/* Don't issue table-AM-related commands */
 	int			noTablespace;	/* Don't issue tablespace-related commands */
 	int			disable_triggers;	/* disable triggers during data-only
 									 * restore */
 	int			use_setsessauth;	/* Use SET SESSION AUTHORIZATION commands
 									 * instead of OWNER TO */
 	char	   *superuser;		/* Username to use as superuser */
 	char	   *use_role;		/* Issue SET ROLE to this */
+	int			clean;
 	int			dropSchema;
 	int			disable_dollar_quoting;
 	int			dump_inserts;	/* 0 = COPY, otherwise rows per INSERT */
 	int			column_inserts;
 	int			if_exists;
 	int			no_comments;	/* Skip comments */
 	int			no_publications;	/* Skip publication entries */
 	int			no_security_labels; /* Skip security label entries */
 	int			no_subscriptions;	/* Skip subscription entries */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index f1ffed038b0..b11c84b2d7e 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -976,19 +976,20 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
 					 * to that from using single_txn mode in non-parallel
 					 * restores.
 					 *
 					 * We mustn't do this for load-via-partition-root cases
 					 * because some data might get moved across partition
 					 * boundaries, risking deadlock and/or loss of previously
 					 * loaded data.  (We assume that all partitions of a
 					 * partitioned table will be treated the same way.)
 					 */
-					use_truncate = is_parallel && te->created &&
+					use_truncate = is_parallel &&
+						(te->created || (ropt->dataOnly && ropt->clean)) &&
 						!is_load_via_partition_root(te);
 
 					if (use_truncate)
 					{
 						/*
 						 * Parallel restore is always talking directly to a
 						 * server, so no need to see if we should issue BEGIN.
 						 */
 						StartTransaction(&AH->public);
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index df119591cca..8aa6f3db57d 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -158,19 +158,19 @@ main(int argc, char **argv)
 	while ((c = getopt_long(argc, argv, "acCd:ef:F:h:I:j:lL:n:N:Op:P:RsS:t:T:U:vwWx1",
 							cmdopts, NULL)) != -1)
 	{
 		switch (c)
 		{
 			case 'a':			/* Dump data only */
 				opts->dataOnly = 1;
 				break;
 			case 'c':			/* clean (i.e., drop) schema prior to create */
-				opts->dropSchema = 1;
+				opts->clean = 1;
 				break;
 			case 'C':
 				opts->createDB = 1;
 				break;
 			case 'd':
 				opts->cparams.dbname = pg_strdup(optarg);
 				break;
 			case 'e':
 				opts->exit_on_error = true;
@@ -337,20 +337,20 @@ main(int argc, char **argv)
 			pg_log_error_hint("Try \"%s --help\" for more information.", progname);
 			exit_nicely(1);
 		}
 		opts->useDB = 1;
 	}
 
 	if (opts->dataOnly && opts->schemaOnly)
 		pg_fatal("options -s/--schema-only and -a/--data-only cannot be used together");
 
-	if (opts->dataOnly && opts->dropSchema)
-		pg_fatal("options -c/--clean and -a/--data-only cannot be used together");
+	if (opts->clean && !opts->dataOnly)
+		opts->dropSchema = 1;
 
 	if (opts->single_txn && opts->txn_size > 0)
 		pg_fatal("options -1/--single-transaction and --transaction-size cannot be used together");
 
 	/*
 	 * -C is not compatible with -1, because we can't create a database inside
 	 * a transaction block.
 	 */
 	if (opts->createDB && opts->single_txn)
-- 
2.49.0

In reply to: Dimitrios Apostolou (#1)
1 attachment(s)
[WIP PATCH v2] Implement "pg_restore --data-only --clean" as a way to skip WAL

On Mon, 14 Apr 2025, Dimitrios Apostolou wrote:

Hello list,

I implemented --clean support for --data-only, in order to avoid logging to
the WAL while populating the database. The attached patch issues a TRUNCATE
before COPY on each worker process, and provides a significant speed
advantage if the cluster is configure with wal_level=minimal.

It also provides a safer way to load the database, as avoiding WAL logging
also avoids potential and painful ENOSPACE on the WAL partition as I
experienced in [1]. In other words it makes things much better for my use
case.

[1] /messages/by-id/076464ad-3d70-dd25-9e8f-e84f27decfba@gmx.net

Rebased and attached v2 of the patch.

It needed some adjustments for the new flags --with-schema and
--with-data.

I have used this patch several times to pg_restore terabytes of tables
without logging through the WAL, and it performs great.

But it has some rough edges. I would appreciate guidance and feedback.

The rough edges remain: TRUNCATE fails if there are foreign keys. So if
you try pg_restore --data-only --clean to a table referenced via foreign
keys, the patch will not work, as mentioned below.

* When the table-to-be-TRUNCATEd is referenced as foreign key from other
table, the whole transaction fails with:

ERROR: cannot truncate a table referenced in a foreign key constraint

1. As a first step, when TRUNCATE fails I want to try a DELETE FROM
instead, which has more chances of succeeding, and continuing with
the COPY. How to detect the failure of ahprintf("TRUNCATE") and do
the alternative without failing the whole transaction?

2. Why doesn't --disable-triggers help?
To test this, I have manually issued

ALTER TABLE x DISABLE TRIGGER ALL

to every table and issued manual TRUNCATE still fails. Shouldn't
postgres skip the referential integrity checks?

3. In my tests, all my tables start empty since I have just created the
schema. Then pg_restore --data-only --clean first populates
the /referencing/ tables, which is allowed because of disabled
triggers, and then it tries to load the /referenced/ table.

At this point the referential integrity is already broken. Getting an
error when TRUNCATing the empty /referenced/ table doesn't make
sense.

So is there a way to turn off the referential checks for a TRUNCATE?
Do you have any other feedback for this patch?

Thanks,
Dimitris

Attachments:

v2-0001-pg_restore-clean-data-only.patchtext/x-patch; name=v2-0001-pg_restore-clean-data-only.patchDownload
From 1493b0bfff3b048c47878a14dff409763763d12f Mon Sep 17 00:00:00 2001
From: Dimitrios Apostolou <jimis@qt.io>
Date: Sat, 12 Apr 2025 01:59:45 +0200
Subject: [PATCH v2] pg_restore --clean --data-only

In parallel restore, it issues a TRUNCATE before COPYing the data into
the tables, within a transaction.

As a result it avoid logging to the WAL, when combined with
wal_level=minimal.
---
 doc/src/sgml/ref/pg_restore.sgml     | 10 ++++++++++
 src/bin/pg_dump/pg_backup.h          |  1 +
 src/bin/pg_dump/pg_backup_archiver.c | 13 +++++++++++--
 src/bin/pg_dump/pg_restore.c         |  8 ++++----
 4 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 2295df62d03..36537857145 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -122,16 +122,26 @@ PostgreSQL documentation
        <para>
         Before restoring database objects, issue commands
         to <command>DROP</command> all the objects that will be restored.
         This option is useful for overwriting an existing database.
         If any of the objects do not exist in the destination database,
         ignorable error messages will be reported,
         unless <option>--if-exists</option> is also specified.
        </para>
+       <para>
+         In combination with <option>--data-only</option> a TRUNCATE will be
+         attempted instead of DROP, before COPYing the data.  So if you want
+         to overwrite an existing database without re-writing the schema, then
+         issue <option>--data-only --clean</option>.  Together
+         with <option>--parallel</option> it is a high performance way to load
+         the tables, as it avoids logging to the WAL (if the server is
+         configured with <option>wal_level=minimal</option>).  Warning:
+         foreign key constraints might cause table truncation to fail.
+       </para>
       </listitem>
      </varlistentry>
 
      <varlistentry>
       <term><option>-C</option></term>
       <term><option>--create</option></term>
       <listitem>
        <para>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index af0007fb6d2..9428b362c92 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -100,16 +100,17 @@ typedef struct _restoreOptions
 	int			noTableAm;		/* Don't issue table-AM-related commands */
 	int			noTablespace;	/* Don't issue tablespace-related commands */
 	int			disable_triggers;	/* disable triggers during data-only
 									 * restore */
 	int			use_setsessauth;	/* Use SET SESSION AUTHORIZATION commands
 									 * instead of OWNER TO */
 	char	   *superuser;		/* Username to use as superuser */
 	char	   *use_role;		/* Issue SET ROLE to this */
+	int			clean;
 	int			dropSchema;
 	int			disable_dollar_quoting;
 	int			dump_inserts;	/* 0 = COPY, otherwise rows per INSERT */
 	int			column_inserts;
 	int			if_exists;
 	int			no_comments;	/* Skip comments */
 	int			no_policies;	/* Skip row security policies */
 	int			no_publications;	/* Skip publication entries */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 197c1295d93..d48c6d11e7c 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -979,32 +979,41 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
 					pg_log_info("processing data for table \"%s.%s\"",
 								te->namespace, te->tag);
 
 					/*
 					 * In parallel restore, if we created the table earlier in
 					 * this run (so that we know it is empty) and we are not
 					 * restoring a load-via-partition-root data item then we
 					 * wrap the COPY in a transaction and precede it with a
-					 * TRUNCATE.  If wal_level is set to minimal this prevents
+					 * TRUNCATE.
+					 *
+					 * Likewise if the table was pre-existing and the data is
+					 * being restored with --clean.
+					 *
+					 * If wal_level is set to minimal this prevents
 					 * WAL-logging the COPY.  This obtains a speedup similar
 					 * to that from using single_txn mode in non-parallel
 					 * restores.
 					 *
 					 * We mustn't do this for load-via-partition-root cases
 					 * because some data might get moved across partition
 					 * boundaries, risking deadlock and/or loss of previously
 					 * loaded data.  (We assume that all partitions of a
 					 * partitioned table will be treated the same way.)
 					 */
-					use_truncate = is_parallel && te->created &&
+					use_truncate = is_parallel &&
+						(te->created || (ropt->dumpData && ropt->clean)) &&
 						!is_load_via_partition_root(te);
 
 					if (use_truncate)
 					{
+						pg_log_debug("BEGIN transaction and TRUNCATE table \"%s.%s\"",
+									 te->namespace, te->tag);
+
 						/*
 						 * Parallel restore is always talking directly to a
 						 * server, so no need to see if we should issue BEGIN.
 						 */
 						StartTransaction(&AH->public);
 
 						/*
 						 * Issue TRUNCATE with ONLY so that child tables are
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index f2182e91825..d08d3979c41 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -207,17 +207,17 @@ main(int argc, char **argv)
 							cmdopts, NULL)) != -1)
 	{
 		switch (c)
 		{
 			case 'a':			/* Dump data only */
 				data_only = true;
 				break;
 			case 'c':			/* clean (i.e., drop) schema prior to create */
-				opts->dropSchema = 1;
+				opts->clean = 1;
 				break;
 			case 'C':
 				opts->createDB = 1;
 				break;
 			case 'd':
 				opts->cparams.dbname = pg_strdup(optarg);
 				break;
 			case 'e':
@@ -420,19 +420,16 @@ main(int argc, char **argv)
 	/* reject conflicting "with-" and "no-" options */
 	if (with_data && no_data)
 		pg_fatal("options --with-data and --no-data cannot be used together");
 	if (with_schema && no_schema)
 		pg_fatal("options --with-schema and --no-schema cannot be used together");
 	if (with_statistics && no_statistics)
 		pg_fatal("options --with-statistics and --no-statistics cannot be used together");
 
-	if (data_only && opts->dropSchema)
-		pg_fatal("options -c/--clean and -a/--data-only cannot be used together");
-
 	if (opts->single_txn && opts->txn_size > 0)
 		pg_fatal("options -1/--single-transaction and --transaction-size cannot be used together");
 
 	/*
 	 * -C is not compatible with -1, because we can't create a database inside
 	 * a transaction block.
 	 */
 	if (opts->createDB && opts->single_txn)
@@ -451,16 +448,19 @@ main(int argc, char **argv)
 	 */
 	opts->dumpData = ((opts->dumpData && !schema_only && !statistics_only) ||
 					  (data_only || with_data)) && !no_data;
 	opts->dumpSchema = ((opts->dumpSchema && !data_only && !statistics_only) ||
 						(schema_only || with_schema)) && !no_schema;
 	opts->dumpStatistics = ((opts->dumpStatistics && !schema_only && !data_only) ||
 							(statistics_only || with_statistics)) && !no_statistics;
 
+	if (opts->clean && opts->dumpSchema)
+		opts->dropSchema = 1;
+
 	opts->disable_triggers = disable_triggers;
 	opts->enable_row_security = enable_row_security;
 	opts->noDataForFailedTables = no_data_for_failed_tables;
 	opts->noTableAm = outputNoTableAm;
 	opts->noTablespace = outputNoTablespaces;
 	opts->use_setsessauth = use_setsessauth;
 	opts->no_comments = no_comments;
 	opts->no_policies = no_policies;
-- 
2.49.0

In reply to: Dimitrios Apostolou (#2)
Re: [WIP PATCH v2] Implement "pg_restore --data-only --clean" as a way to skip WAL

I wonder about the following in pg_restore.c.
Right now my implementation covers only parallel restore.
In the case of non-parallel restore, I want to make the behaviour
similar, i.e. each worker to issue a TRUNCATE before COPY starts.
But then the StartTransaction() doesn't make sense, as everything might
already be in a transaction because of --single-transaction.
Should I completely skip StartTransaction() if !is_parallel?

-                                       use_truncate = is_parallel && te->created &&
+                                       use_truncate = is_parallel &&
+                                               (te->created || (ropt->dumpData && ropt->clean)) &&
                                                 !is_load_via_partition_root(te);
                                         if (use_truncate)
                                         {
+                                               pg_log_debug("BEGIN transaction and TRUNCATE table \"%s.%s\"",
+                                                                        te->namespace, te->tag);
+
                                                 /*
                                                  * Parallel restore is always talking directly to a
                                                  * server, so no need to see if we should issue BEGIN.
                                                  */
                                                 StartTransaction(&AH->public);

/*
* Issue TRUNCATE with ONLY so that child tables are
* not wiped.
*/
ahprintf(AH, "TRUNCATE TABLE ONLY %s;\n\n",
fmtQualifiedId(te->namespace, te->tag));

#4Greg Sabino Mullane
htamfids@gmail.com
In reply to: Dimitrios Apostolou (#2)
Re: [WIP PATCH v2] Implement "pg_restore --data-only --clean" as a way to skip WAL

I think the overall idea is sound. But we need a better solution for the
truncate fk failure. Can we introspect somehow and do a truncate or do a
delete as necessary? I don't like the idea of simply ignoring the
constraint, or of throwing an error.

--
Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support

In reply to: Dimitrios Apostolou (#3)
[WIP PATCH v3] Implement "pg_restore --data-only --clean" as a way to skip WAL

Attaching v3 of the patch, together with a new test file that tests
previously untested flags of pg_restore.

Added to July's commitfest:
https://commitfest.postgresql.org/patch/5821/

On Thu, 12 Jun 2025, Dimitrios Apostolou wrote:

I wonder about the following in pg_restore.c.
Right now my implementation covers only parallel restore.
In the case of non-parallel restore, I want to make the behaviour similar,
i.e. each worker to issue a TRUNCATE before COPY starts.
But then the StartTransaction() doesn't make sense, as everything might
already be in a transaction because of --single-transaction.
Should I completely skip StartTransaction() if !is_parallel?

In the end, I followed the same code path for both parallel and
non-parallel pg_restore. It works even with --single-transaction, where
apparently there are several nested transactions.

The tests pass, so I assume it's OK to start a subtransaction for each
table restoration inside the global single-transaction.

Dimitris

In reply to: Dimitrios Apostolou (#5)
2 attachment(s)
Re: [WIP PATCH v3] Implement "pg_restore --data-only --clean" as a way to skip WAL

Attaching files now...

On Mon, 16 Jun 2025, Dimitrios Apostolou wrote:

Show quoted text

Attaching v3 of the patch, together with a new test file that tests
previously untested flags of pg_restore.

Added to July's commitfest:
https://commitfest.postgresql.org/patch/5821/

On Thu, 12 Jun 2025, Dimitrios Apostolou wrote:

I wonder about the following in pg_restore.c.
Right now my implementation covers only parallel restore.
In the case of non-parallel restore, I want to make the behaviour similar,
i.e. each worker to issue a TRUNCATE before COPY starts.
But then the StartTransaction() doesn't make sense, as everything might
already be in a transaction because of --single-transaction.
Should I completely skip StartTransaction() if !is_parallel?

In the end, I followed the same code path for both parallel and non-parallel
pg_restore. It works even with --single-transaction, where apparently there
are several nested transactions.

The tests pass, so I assume it's OK to start a subtransaction for each table
restoration inside the global single-transaction.

Dimitris

Attachments:

v3-0001-pg_restore-clean-data-only.patchtext/x-patch; name=v3-0001-pg_restore-clean-data-only.patchDownload
From 90341ba3742ebf5679766a48799c7fb3975c6e3a Mon Sep 17 00:00:00 2001
From: Dimitrios Apostolou <jimis@qt.io>
Date: Sat, 12 Apr 2025 01:59:45 +0200
Subject: [PATCH v3 1/2] pg_restore --clean --data-only

Issues a TRUNCATE before COPYing the data into the tables, within a
transaction.

As a result it avoids logging to the WAL, when combined with
wal_level=minimal.
---
 doc/src/sgml/ref/pg_restore.sgml     | 15 +++++++++++++++
 src/bin/pg_dump/pg_backup.h          |  1 +
 src/bin/pg_dump/pg_backup_archiver.c | 12 ++++++++++--
 src/bin/pg_dump/pg_restore.c         | 15 +++++++++++----
 src/bin/pg_dump/t/001_basic.pl       |  6 ------
 5 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 8c88b07dcc8..86aacab588d 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -121,18 +121,33 @@ PostgreSQL documentation
       <listitem>
        <para>
         Before restoring database objects, issue commands
         to <command>DROP</command> all the objects that will be restored.
         This option is useful for overwriting an existing database.
         If any of the objects do not exist in the destination database,
         ignorable error messages will be reported,
         unless <option>--if-exists</option> is also specified.
        </para>
+       <para>
+         If the schema is not restored but the data is restored (for example
+         using <option>--data-only</option>
+         or <option>--section=data</option>), then instead of DROP, a TRUNCATE
+         will be attempted before COPYing the data.  So if you want to
+         overwrite an existing database without re-defining the schema, then
+         issue <option>--data-only --clean</option>.
+       </para>
+       <para>
+         Together with <option>--parallel</option> it is a high performance
+         way to load the tables, as it avoids logging to the WAL (if the
+         server is configured with <option>wal_level=minimal</option>).
+         Warning: foreign key constraints might cause table truncation to
+         fail.
+       </para>
       </listitem>
      </varlistentry>
 
      <varlistentry>
       <term><option>-C</option></term>
       <term><option>--create</option></term>
       <listitem>
        <para>
         Create the database before restoring into it.
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index af0007fb6d2..9428b362c92 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -99,18 +99,19 @@ typedef struct _restoreOptions
 	int			noOwner;		/* Don't try to match original object owner */
 	int			noTableAm;		/* Don't issue table-AM-related commands */
 	int			noTablespace;	/* Don't issue tablespace-related commands */
 	int			disable_triggers;	/* disable triggers during data-only
 									 * restore */
 	int			use_setsessauth;	/* Use SET SESSION AUTHORIZATION commands
 									 * instead of OWNER TO */
 	char	   *superuser;		/* Username to use as superuser */
 	char	   *use_role;		/* Issue SET ROLE to this */
+	int			clean;
 	int			dropSchema;
 	int			disable_dollar_quoting;
 	int			dump_inserts;	/* 0 = COPY, otherwise rows per INSERT */
 	int			column_inserts;
 	int			if_exists;
 	int			no_comments;	/* Skip comments */
 	int			no_policies;	/* Skip row security policies */
 	int			no_publications;	/* Skip publication entries */
 	int			no_security_labels; /* Skip security label entries */
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 197c1295d93..157ce5b82fc 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -978,34 +978,42 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
 
 					pg_log_info("processing data for table \"%s.%s\"",
 								te->namespace, te->tag);
 
 					/*
 					 * In parallel restore, if we created the table earlier in
 					 * this run (so that we know it is empty) and we are not
 					 * restoring a load-via-partition-root data item then we
 					 * wrap the COPY in a transaction and precede it with a
-					 * TRUNCATE.  If wal_level is set to minimal this prevents
+					 * TRUNCATE.
+					 *
+					 * Likewise if the table was pre-existing and the data is
+					 * being restored with --clean.
+					 *
+					 * If wal_level is set to minimal this prevents
 					 * WAL-logging the COPY.  This obtains a speedup similar
 					 * to that from using single_txn mode in non-parallel
 					 * restores.
 					 *
 					 * We mustn't do this for load-via-partition-root cases
 					 * because some data might get moved across partition
 					 * boundaries, risking deadlock and/or loss of previously
 					 * loaded data.  (We assume that all partitions of a
 					 * partitioned table will be treated the same way.)
 					 */
-					use_truncate = is_parallel && te->created &&
+					use_truncate = ((is_parallel && te->created) || ropt->clean) &&
 						!is_load_via_partition_root(te);
 
 					if (use_truncate)
 					{
+						pg_log_debug("BEGIN transaction and TRUNCATE table \"%s.%s\"",
+									 te->namespace, te->tag);
+
 						/*
 						 * Parallel restore is always talking directly to a
 						 * server, so no need to see if we should issue BEGIN.
 						 */
 						StartTransaction(&AH->public);
 
 						/*
 						 * Issue TRUNCATE with ONLY so that child tables are
 						 * not wiped.
diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index c4b6214d618..002e62d13e0 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -206,19 +206,19 @@ main(int argc, char **argv)
 	while ((c = getopt_long(argc, argv, "acCd:ef:F:gh:I:j:lL:n:N:Op:P:RsS:t:T:U:vwWx1",
 							cmdopts, NULL)) != -1)
 	{
 		switch (c)
 		{
 			case 'a':			/* Dump data only */
 				data_only = true;
 				break;
 			case 'c':			/* clean (i.e., drop) schema prior to create */
-				opts->dropSchema = 1;
+				opts->clean = 1;
 				break;
 			case 'C':
 				opts->createDB = 1;
 				break;
 			case 'd':
 				opts->cparams.dbname = pg_strdup(optarg);
 				break;
 			case 'e':
 				opts->exit_on_error = true;
@@ -419,21 +419,18 @@ main(int argc, char **argv)
 
 	/* reject conflicting "with-" and "no-" options */
 	if (with_data && no_data)
 		pg_fatal("options --with-data and --no-data cannot be used together");
 	if (with_schema && no_schema)
 		pg_fatal("options --with-schema and --no-schema cannot be used together");
 	if (with_statistics && no_statistics)
 		pg_fatal("options --with-statistics and --no-statistics cannot be used together");
 
-	if (data_only && opts->dropSchema)
-		pg_fatal("options -c/--clean and -a/--data-only cannot be used together");
-
 	if (opts->single_txn && opts->txn_size > 0)
 		pg_fatal("options -1/--single-transaction and --transaction-size cannot be used together");
 
 	/*
 	 * -C is not compatible with -1, because we can't create a database inside
 	 * a transaction block.
 	 */
 	if (opts->createDB && opts->single_txn)
 		pg_fatal("options -C/--create and -1/--single-transaction cannot be used together");
@@ -450,18 +447,28 @@ main(int argc, char **argv)
 	 * caused an error in one of the checks above.
 	 */
 	opts->dumpData = ((opts->dumpData && !schema_only && !statistics_only) ||
 					  (data_only || with_data)) && !no_data;
 	opts->dumpSchema = ((opts->dumpSchema && !data_only && !statistics_only) ||
 						(schema_only || with_schema)) && !no_schema;
 	opts->dumpStatistics = ((opts->dumpStatistics && !schema_only && !data_only) ||
 							(statistics_only || with_statistics)) && !no_statistics;
 
+	/*
+	 * If --clean has been issued and the SQL schema is being restored, then
+	 * clear the schema first and recreate it. No need to clear data.
+	 */
+	if (opts->clean && opts->dumpSchema)
+	{
+		opts->clean = 0;
+		opts->dropSchema = 1;
+	}
+
 	opts->disable_triggers = disable_triggers;
 	opts->enable_row_security = enable_row_security;
 	opts->noDataForFailedTables = no_data_for_failed_tables;
 	opts->noTableAm = outputNoTableAm;
 	opts->noTablespace = outputNoTablespaces;
 	opts->use_setsessauth = use_setsessauth;
 	opts->no_comments = no_comments;
 	opts->no_policies = no_policies;
 	opts->no_publications = no_publications;
diff --git a/src/bin/pg_dump/t/001_basic.pl b/src/bin/pg_dump/t/001_basic.pl
index 84ca25e17d6..d4e6748be87 100644
--- a/src/bin/pg_dump/t/001_basic.pl
+++ b/src/bin/pg_dump/t/001_basic.pl
@@ -95,24 +95,18 @@ command_fails_like(
 	[ 'pg_restore', '-d', 'xxx', '-f', 'xxx' ],
 	qr/\Qpg_restore: error: options -d\/--dbname and -f\/--file cannot be used together\E/,
 	'pg_restore: options -d/--dbname and -f/--file cannot be used together');
 
 command_fails_like(
 	[ 'pg_dump', '-c', '-a' ],
 	qr/\Qpg_dump: error: options -c\/--clean and -a\/--data-only cannot be used together\E/,
 	'pg_dump: options -c/--clean and -a/--data-only cannot be used together');
 
-command_fails_like(
-	[ 'pg_restore', '-c', '-a', '-f -' ],
-	qr/\Qpg_restore: error: options -c\/--clean and -a\/--data-only cannot be used together\E/,
-	'pg_restore: options -c/--clean and -a/--data-only cannot be used together'
-);
-
 command_fails_like(
 	[ 'pg_dump', '--if-exists' ],
 	qr/\Qpg_dump: error: option --if-exists requires option -c\/--clean\E/,
 	'pg_dump: option --if-exists requires option -c/--clean');
 
 command_fails_like(
 	[ 'pg_dump', '-j3' ],
 	qr/\Qpg_dump: error: parallel backup only supported by the directory format\E/,
 	'pg_dump: parallel backup only supported by the directory format');
-- 
2.49.0

v3-0002-Add-new-test-file-with-pg_restore-test-cases.patchtext/x-patch; name=v3-0002-Add-new-test-file-with-pg_restore-test-cases.patchDownload
From 159cce30910425914648fde430deeda5be7a41e3 Mon Sep 17 00:00:00 2001
From: Dimitrios Apostolou <jimis@qt.io>
Date: Mon, 16 Jun 2025 02:36:21 +0200
Subject: [PATCH v3 2/2] Add new test file with pg_restore test cases

Test restoring from custom format dumps, with and without offsets in the
table of contents - the latter happens when pg_dump writes to stdout.

Compare restored database contents to the original database.

Test parallel restore -j2 as well as sequential.

Test --single-transaction.

Test --data-only and --clean --data-only.
---
 src/bin/pg_dump/t/007_pg_restore.pl | 150 ++++++++++++++++++++++++++++
 1 file changed, 150 insertions(+)
 create mode 100644 src/bin/pg_dump/t/007_pg_restore.pl

diff --git a/src/bin/pg_dump/t/007_pg_restore.pl b/src/bin/pg_dump/t/007_pg_restore.pl
new file mode 100644
index 00000000000..402d97fad81
--- /dev/null
+++ b/src/bin/pg_dump/t/007_pg_restore.pl
@@ -0,0 +1,150 @@
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $tempdir = PostgreSQL::Test::Utils::tempdir;
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+my $port = $node->port;
+
+
+# TODO test that --section=post --clean, does not clean any tables.
+
+# Create database
+$node->safe_psql('postgres','
+			CREATE DATABASE db1;
+			\c db1
+			CREATE TABLE t1 (
+				i  integer
+			);
+			INSERT INTO t1 VALUES (1), (2), (3), (4);
+			CREATE TABLE t2 (
+				t  text
+			);
+			INSERT INTO t2 VALUES (\'a\'), (\'bb\'), (\'ccc\');');
+
+# Function to compare two databases of the above kind.
+sub compare_db_contents
+{
+	my ($db1, $db2, $should_match) = @_;
+	$should_match = 1
+	  unless defined $should_match;
+
+	my $query = "
+		SELECT * FROM t1 ORDER BY i;
+		SELECT * FROM t2 ORDER BY t;
+	";
+	my $result1 = $node->safe_psql($db1, $query);
+	my $result2 = $node->safe_psql($db2, $query);
+
+	if ($should_match)
+	{
+		is($result2, $result1, "The database contents should match");
+	} else {
+		isnt($result2, $result1, "The database contents should NOT match");
+	}
+}
+
+sub test_pg_restore
+{
+	my $dump_file = shift;
+	my $file_basename = File::Basename::basename($dump_file);
+
+	my @cmd = ( 'pg_restore', '--dbname', 'db1_restored' );
+	my $cmd_s = "pg_restore";
+
+	# Optionally this function takes a hash as last parameter.
+	if ($_[0])
+	{
+		my (%hash_args) = @_;
+		shift;
+		my @extra_args = $hash_args{'extra_args'};
+		if (@extra_args)
+		{
+			@cmd   = (@cmd, @extra_args);
+			$cmd_s = "$cmd_s @extra_args";
+		}
+	}
+
+	$node->safe_psql(
+		'postgres',
+		'DROP DATABASE IF EXISTS db1_restored;');
+	ok(1, "clean up");
+
+	# Restore into a new database
+	$node->safe_psql('postgres',
+					 'CREATE DATABASE db1_restored;');
+	$node->command_ok([@cmd,
+					   $dump_file],
+					  "$cmd_s  $file_basename");
+
+	# Verify restored db matches the dumped one
+	compare_db_contents('db1', 'db1_restored');
+
+	# Restore again with --data-only.
+	# Now the rows should be duplicate, the databases shouldn't match.
+	$node->command_ok([@cmd, '--data-only',
+					   $dump_file],
+					  "$cmd_s --data-only  $file_basename");
+	compare_db_contents('db1', 'db1_restored', 0);
+
+	# Restore again with --data-only --clean.
+	# The database contents should match.
+	$node->command_ok([@cmd, '--clean', '--data-only',
+					   $dump_file],
+					  "$cmd_s --clean --data-only  $file_basename");
+	compare_db_contents('db1', 'db1_restored');
+
+	# Restore from stdin.
+	my $stderr;
+	my $result = $node->run_log([@cmd, '--clean', '--data-only'],
+								('<'  => $dump_file,
+								 '2>' => \$stderr));
+	if (grep {/^-j/} @cmd)
+	{
+		ok(!$result, "should fail:  $cmd_s --clean --data-only  < $file_basename");
+		chomp($stderr);
+		like($stderr,
+		   '/parallel restore from standard input is not supported$/',
+		   "stderr: parallel restore from standard input is not supported");
+	}
+	else
+	{
+		ok($result, "$cmd_s --clean --data-only  < $file_basename");
+		compare_db_contents('db1', 'db1_restored');
+	}
+}
+
+
+# Basic dump
+my $d1 = "$tempdir/dump_file";
+$node->command_ok(['pg_dump', '--format=custom',
+				   '--file', $d1, 'db1'],
+				  'pg_dump --format=custom --file dump_file');
+# Dump also to stdout, as the TOC doesn't contain offsets.
+my $d2 = "$tempdir/dump_file_stdout";
+my $result = $node->run_log(['pg_dump', '--format=custom', 'db1'],
+							('>' => $d2));
+ok($result, "pg_dump --format=custom      > dump_file_stdout");
+
+
+# Run all pg_restore testcases against each archive.
+test_pg_restore($d1);
+test_pg_restore($d2);
+test_pg_restore($d1, ('extra_args' => ('-j2')));
+test_pg_restore($d2, ('extra_args' => ('-j2')));
+test_pg_restore($d1, ('extra_args' => ('--single-transaction')));
+test_pg_restore($d2, ('extra_args' => ('--single-transaction')));
+
+
+$node->stop('fast');
+
+done_testing();
-- 
2.49.0

In reply to: Greg Sabino Mullane (#4)
Re: [WIP PATCH v2] Implement "pg_restore --data-only --clean" as a way to skip WAL

On Friday 2025-06-13 03:42, Greg Sabino Mullane wrote:

I think the overall idea is sound. But we need a better solution for the truncate fk failure. Can we introspect somehow and do a truncate or do a delete as necessary? I don't like the idea of simply ignoring the constraint, or of throwing an error.

Sorry I haven't answered, but I don't really know how to catch the error
of TRUNCATE in the C code and do a DELETE instead. I was hoping someone
would chime in and suggest an approach.

Furthermore, silently replacing the TRUNCATE with DELETE will negate all
the performance benefits we are aiming for with this patch (avoiding WAL
logging).

In pg_restore AFAICT the suggested way to handle similar FK errors is to
add another option: --disable-triggers. This works for other operations
besides TRUNCATE, and one can break referential integrity in order to
speed up loading of data.

As such, I think the proper solution would be to change TRUNCATE
implementation in the backend, in order to ignore FK constraints when
the triggers are off. How correct do you think this is? And any ideas on
how to approach the implementation?

Thanks,
Dimitris