PATCH: pg_dump to support "on conflict do update"
Hi hackers,
Here's the patch (against the latest master) that will make pg_dump support
"on conflict do update" .
I've used this patch on v16 for our company's CI (on Github Actions), and
it works perfectly fine.
Users would be able to use it like this:
./src/bin/pg_dump/pg_dump $DATABASE_URL \
--table=some_random_table \
--data-only \
* --on-conflict-target-columns url,payload_checksum \
--on-conflict-update-clause='last_used_at=EXCLUDED.last_used_at' \*
--inserts \
--rows-per-insert=10 \
--no-sync \
--file=/tmp/test.dump
There are 3 caveats:
1. The "on conflict do update" would apply to every table. In my opinion,
this is fine. It's the user's choice if they want to apply it to one or all
tables. We could make the options more powerful (i.e. support multi-tables)
but it would add a lot of complexity.
2. -on-conflict-target-columns should have accepted a list of strings
instead. I'm working on it but I'd like an early review of the overall
patch first.
3. I can't figure out how to add a test for pg_dump. Any pointer would be
appreciated here.
Please help me review this patch as it's my first time submitting a patch
to Postgres.
Thank you!
Tanin
Attachments:
0001-pg_dump-to-support-on-conflict-update.patchapplication/octet-stream; name=0001-pg_dump-to-support-on-conflict-update.patchDownload
From 096f95f79bcb74a3b4996161fc16e5dea3abce7f Mon Sep 17 00:00:00 2001
From: tanin <@tanin>
Date: Fri, 2 May 2025 22:56:51 -0700
Subject: [PATCH] pg_dump to support 'on conflict update'
---
doc/src/sgml/ref/pg_dump.sgml | 26 ++++++++++++++++++++++++++
src/bin/pg_dump/pg_dump.c | 24 ++++++++++++++++++++++++
2 files changed, 50 insertions(+)
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index c10bca63e55..b9f4bedc812 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1234,6 +1234,32 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--on-conflict-target-columns</option></term>
+ <listitem>
+ <para>
+ Add <literal>ON CONFLICT (target-columns) DO UPDATE ...</literal> to
+ <command>INSERT</command> commands.
+ This option must be used with <option>--on-conflict-update-clause</option>, and with
+ <option>--inserts</option>, <option>--column-inserts</option> or <option>--rows-per-insert</option>.
+ The option cannot be used with <option>--on-conflict-do-nothing</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--on-conflict-update-clause</option></term>
+ <listitem>
+ <para>
+ Add <literal>ON CONFLICT ... DO UPDATE update-clause</literal> to
+ <command>INSERT</command> commands.
+ This option must be used with <option>--on-conflict-target-columns</option>, and with
+ <option>--inserts</option>, <option>--column-inserts</option> or <option>--rows-per-insert</option>.
+ The option cannot be used with <option>--on-conflict-do-nothing</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--quote-all-identifiers</option></term>
<listitem>
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e2e7975b34e..1efcc491d58 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -183,6 +183,9 @@ static SimpleOidList extension_include_oids = {NULL, NULL};
static SimpleStringList extension_exclude_patterns = {NULL, NULL};
static SimpleOidList extension_exclude_oids = {NULL, NULL};
+static const char *on_conflict_target_columns = NULL;
+static const char *on_conflict_update_clause = NULL;
+
static const CatalogId nilCatalogId = {0, 0};
/* override for standard extra_float_digits setting */
@@ -521,6 +524,8 @@ main(int argc, char **argv)
{"with-schema", no_argument, NULL, 23},
{"with-statistics", no_argument, NULL, 24},
{"on-conflict-do-nothing", no_argument, &dopt.do_nothing, 1},
+ {"on-conflict-target-columns", required_argument, NULL, 25},
+ {"on-conflict-update-clause", required_argument, NULL, 26},
{"rows-per-insert", required_argument, NULL, 10},
{"include-foreign-data", required_argument, NULL, 11},
{"table-and-children", required_argument, NULL, 12},
@@ -796,6 +801,13 @@ main(int argc, char **argv)
case 24:
with_statistics = true;
+
+ case 25:
+ on_conflict_target_columns = pg_strdup(optarg);
+ break;
+
+ case 26:
+ on_conflict_update_clause = pg_strdup(optarg);
break;
default:
@@ -883,6 +895,12 @@ main(int argc, char **argv)
if (dopt.do_nothing && dopt.dump_inserts == 0)
pg_fatal("option --on-conflict-do-nothing requires option --inserts, --rows-per-insert, or --column-inserts");
+ if (dopt.do_nothing && (on_conflict_target_columns != NULL || on_conflict_update_clause != NULL))
+ pg_fatal("option --on-conflict-do-nothing cannot be used with --on-conflict-target-columns and --on-conflict-do-update-clause");
+
+ if ((on_conflict_target_columns != NULL) ^ (on_conflict_update_clause != NULL))
+ pg_fatal("option --on-conflict-target-columns and --on-conflict-update-clause must be provided together.");
+
/* Identify archive format to emit */
archiveFormat = parseArchiveFormat(format, &archiveMode);
@@ -1308,6 +1326,8 @@ help(const char *progname)
printf(_(" --no-toast-compression do not dump TOAST compression methods\n"));
printf(_(" --no-unlogged-table-data do not dump unlogged table data\n"));
printf(_(" --on-conflict-do-nothing add ON CONFLICT DO NOTHING to INSERT commands\n"));
+ printf(_(" --on-conflict-target-columns add ON CONFLICT (target-columns) DO UPDATE ... to INSERT commands. A comma-separated list of columns\n"));
+ printf(_(" --on-conflict-update-clause add the clause ON CONFLICT ... DO UPDATE (clause) to INSERT commands\n"));
printf(_(" --quote-all-identifiers quote all identifiers, even if not key words\n"));
printf(_(" --rows-per-insert=NROWS number of rows per INSERT; implies --inserts\n"));
printf(_(" --section=SECTION dump named section (pre-data, data, or post-data)\n"));
@@ -2681,6 +2701,8 @@ dumpTableData_insert(Archive *fout, const void *dcontext)
{
if (dopt->do_nothing)
archputs(" ON CONFLICT DO NOTHING;\n", fout);
+ else if (on_conflict_target_columns != NULL && on_conflict_update_clause != NULL)
+ archprintf(fout, " ON CONFLICT (%s) DO UPDATE SET %s;\n", on_conflict_target_columns, on_conflict_update_clause);
else
archputs(";\n", fout);
/* Reset the row counter */
@@ -2701,6 +2723,8 @@ dumpTableData_insert(Archive *fout, const void *dcontext)
{
if (dopt->do_nothing)
archputs(" ON CONFLICT DO NOTHING;\n", fout);
+ else if (on_conflict_target_columns != NULL && on_conflict_update_clause != NULL)
+ archprintf(fout, " ON CONFLICT (%s) DO UPDATE SET %s;\n", on_conflict_target_columns, on_conflict_update_clause);
else
archputs(";\n", fout);
}
--
2.39.5 (Apple Git-154)
On Sat, 2025-05-03 at 22:47 -0700, Tanin Na Nakorn wrote:
Here's the patch (against the latest master) that will make pg_dump support "on conflict do update" .
I've used this patch on v16 for our company's CI (on Github Actions), and it works perfectly fine.
Users would be able to use it like this:
./src/bin/pg_dump/pg_dump $DATABASE_URL \
--table=some_random_table \
--data-only \
--on-conflict-target-columns url,payload_checksum \
--on-conflict-update-clause='last_used_at=EXCLUDED.last_used_at' \
--inserts \
--rows-per-insert=10 \
--no-sync \
--file=/tmp/test.dumpThere are 3 caveats:
1. The "on conflict do update" would apply to every table. In my opinion, this is fine.
I don't think that is fine. I think it would make the feature unusable for most cases.
At the very least, there would have to be a way to specify which tables are affected.
Yours,
Laurenz Albe
Laurenz Albe <laurenz.albe@cybertec.at> writes:
On Sat, 2025-05-03 at 22:47 -0700, Tanin Na Nakorn wrote:
Here's the patch (against the latest master) that will make pg_dump support "on conflict do update" .
There are 3 caveats:
1. The "on conflict do update" would apply to every table. In my opinion, this is fine.
I don't think that is fine. I think it would make the feature unusable for most cases.
At the very least, there would have to be a way to specify which tables are affected.
Yeah. I kind of feel that this entire idea is misguided. pg_dump is
not an ETL tool, and bolting ETL-ish features onto it one at a time
seems destined to end in a mess. But it's particularly awful that
the proposed switch design would apply to all tables. That pretty
much makes it useless except in a dump that selects only one table.
It's also useless except in a --data-only dump, since if we create
the target table then we know perfectly well that it's empty to
start with. So at this point you barely need pg_dump at all,
as opposed to some other tool that does a light syntactic
transformation on the result of COPY.
I think it could be interesting to try to build something that
*is* an ETL tool and is meant for cases like partial data loads.
But pg_dump is serving more than enough masters already. Let's
not add this to its plate.
regards, tom lane