PATCH: pg_dump to support "on conflict do update"
Hi hackers,
Here's the patch (against the latest master) that will make pg_dump support
"on conflict do update".
I've used this patch on v16 for our company's CI (on GitHub Actions), and
it works fine there.
Users would be able to use it like this:
./src/bin/pg_dump/pg_dump $DATABASE_URL \
--table=some_random_table \
--data-only \
--on-conflict-target-columns url,payload_checksum \
--on-conflict-update-clause='last_used_at=EXCLUDED.last_used_at' \
--inserts \
--rows-per-insert=10 \
--no-sync \
--file=/tmp/test.dump
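For illustration, the two new options would presumably map onto statements of the following shape. This is a minimal Python sketch and not the patch's code: the helper names `quote_literal` and `render_upsert` are hypothetical, the three-column layout of `some_random_table` is assumed from the option values above, identifiers are emitted unquoted, and the exact formatting pg_dump would produce is not shown in this thread.

```python
def quote_literal(value):
    """Render a Python value as a SQL literal.

    Assumes standard_conforming_strings = on, so only single quotes
    need doubling inside string literals.
    """
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def render_upsert(table, columns, rows, conflict_columns, update_clause):
    """Build one multi-row INSERT ... ON CONFLICT ... DO UPDATE statement."""
    values = ",\n".join(
        "\t(" + ", ".join(quote_literal(v) for v in row) + ")" for row in rows
    )
    return (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES\n"
        f"{values}\n"
        f"ON CONFLICT ({', '.join(conflict_columns)}) "
        f"DO UPDATE SET {update_clause};"
    )


# Hypothetical row data; only the column names come from the options above.
statement = render_upsert(
    table="some_random_table",
    columns=["url", "payload_checksum", "last_used_at"],
    rows=[("https://example.com/a", "abc123", "2025-05-01 00:00:00")],
    conflict_columns=["url", "payload_checksum"],
    update_clause="last_used_at=EXCLUDED.last_used_at",
)
print(statement)
```

The conflict target and the update clause are appended once per statement, which is why a single pair of options cannot describe different keys for different tables.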
There are 3 caveats:
1. The "on conflict do update" would apply to every table. In my opinion,
this is fine. It's the user's choice if they want to apply it to one or all
tables. We could make the options more powerful (i.e. support multi-tables)
but it would add a lot of complexity.
2. --on-conflict-target-columns should accept a list of strings
instead. I'm working on it but I'd like an early review of the overall
patch first.
3. I can't figure out how to add a test for pg_dump. Any pointer would be
appreciated here.
Please help me review this patch as it's my first time submitting a patch
to Postgres.
Thank you!
Tanin
Attachments:
0001-pg_dump-to-support-on-conflict-update.patch (application/octet-stream, +50 -1)
On Sat, 2025-05-03 at 22:47 -0700, Tanin Na Nakorn wrote:
> Here's the patch (against the latest master) that will make pg_dump support "on conflict do update".
> I've used this patch on v16 for our company's CI (on GitHub Actions), and it works perfectly fine.
> Users would be able to use it like this:
> ./src/bin/pg_dump/pg_dump $DATABASE_URL \
> --table=some_random_table \
> --data-only \
> --on-conflict-target-columns url,payload_checksum \
> --on-conflict-update-clause='last_used_at=EXCLUDED.last_used_at' \
> --inserts \
> --rows-per-insert=10 \
> --no-sync \
> --file=/tmp/test.dump
> There are 3 caveats:
> 1. The "on conflict do update" would apply to every table. In my opinion, this is fine.
I don't think that is fine. I think it would make the feature unusable for most cases.
At the very least, there would have to be a way to specify which tables are affected.
Yours,
Laurenz Albe
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> On Sat, 2025-05-03 at 22:47 -0700, Tanin Na Nakorn wrote:
>> Here's the patch (against the latest master) that will make pg_dump support "on conflict do update".
>> There are 3 caveats:
>> 1. The "on conflict do update" would apply to every table. In my opinion, this is fine.
> I don't think that is fine. I think it would make the feature unusable for most cases.
> At the very least, there would have to be a way to specify which tables are affected.
Yeah. I kind of feel that this entire idea is misguided. pg_dump is
not an ETL tool, and bolting ETL-ish features onto it one at a time
seems destined to end in a mess. But it's particularly awful that
the proposed switch design would apply to all tables. That pretty
much makes it useless except in a dump that selects only one table.
It's also useless except in a --data-only dump, since if we create
the target table then we know perfectly well that it's empty to
start with. So at this point you barely need pg_dump at all,
as opposed to some other tool that does a light syntactic
transformation on the result of COPY.
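As a rough illustration of such a "light syntactic transformation", the sketch below turns lines in COPY text format into one INSERT ... ON CONFLICT statement. It is a minimal Python sketch under stated assumptions and not an existing tool: `parse_copy_line`, `sql_literal`, and `copy_to_upsert` are hypothetical names, every value is emitted as a quoted string literal (relying on the server to coerce it to the column type), `standard_conforming_strings` is assumed to be on, and the octal and hex escapes that COPY input accepts are not handled.

```python
import re

# Single-character backslash escapes used by COPY text format.
_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}


def parse_copy_line(line):
    """Split one COPY text-format line into fields; \\N becomes None."""
    fields = []
    for raw in line.rstrip("\n").split("\t"):
        if raw == "\\N":
            fields.append(None)
        else:
            # Any other backslashed character stands for itself.
            fields.append(
                re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(1)), raw)
            )
    return fields


def sql_literal(value):
    """Quote a field as a string literal, or emit NULL."""
    if value is None:
        return "NULL"
    return "'" + value.replace("'", "''") + "'"


def copy_to_upsert(table, columns, copy_lines, conflict_columns, update_clause):
    """Turn COPY text-format lines into one INSERT ... ON CONFLICT statement."""
    rows = []
    for line in copy_lines:
        if line.rstrip("\n") == "\\.":  # end-of-data marker
            break
        rows.append("(" + ", ".join(sql_literal(v) for v in parse_copy_line(line)) + ")")
    if not rows:
        return ""
    return (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES\n"
        + ",\n".join(rows)
        + f"\nON CONFLICT ({', '.join(conflict_columns)}) DO UPDATE SET {update_clause};"
    )


# Hypothetical sample data in COPY text format (tab-separated, \N for NULL).
sample = [
    "https://example.com/a\tabc123\t2025-05-01 00:00:00\n",
    "https://example.com/b\t\\N\t2025-05-02 00:00:00\n",
    "\\.\n",
]
sql = copy_to_upsert(
    "some_random_table",
    ["url", "payload_checksum", "last_used_at"],
    sample,
    ["url", "payload_checksum"],
    "last_used_at=EXCLUDED.last_used_at",
)
print(sql)
```

Because the conflict target and update clause are ordinary arguments here, a wrapper script could pass different values per table, which is the flexibility the proposed pg_dump switches lack.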
I think it could be interesting to try to build something that
*is* an ETL tool and is meant for cases like partial data loads.
But pg_dump is serving more than enough masters already. Let's
not add this to its plate.
regards, tom lane