COPY ON_CONFLICT TABLE; save duplicated record to another table.

Started by jian he2 months ago10 messageshackers
Jump to latest
#1jian he
jian.universality@gmail.com

Hi,
This is for v20.

Reference: https://web.archive.org/web/20240328094030/https://riggs.business/blog/f/postgresql-todo-2023
COPY enhancement:
Detect duplicate rows and redirect them to a separate table without
aborting the load.

While reviewing this TODO, I quickly noticed this idea closely aligns with
https://commitfest.postgresql.org/patch/4817.

Both ideas share common elements: allowing a user-specified table,
validating its metadata, and storing rows in it.
Based on that, I spent some time working on the implementation.

Proposed syntax:

COPY FROM (ON_CONFLICT TABLE, CONFLICT_TABLE conflict_tbl);

The CONFLICT_TABLE requires exactly four columns: COPY target table, COPY
filename, the line number of the duplicate, and the duplicate record itself.

This structure is fixed, a pre-defined data type is unnecessary. Validation is
based solely on the column data types (pg_attribute.atttypid) rather than their
names (pg_attribute.attname). The expected types are OID, TEXT, INT8, and TEXT,
respectively.

This uses INSERT ON CONFLICT infrastructure under the hood.

Demo:

CREATE TABLE t_copy_tbl(a int, b int, c text);
CREATE TABLE err_tbl1(copy_tbl oid, filename text, lineno bigint, line text);
CREATE UNIQUE INDEX ON t_copy_tbl (c);
COPY t_copy_tbl(b,a, c) FROM STDIN (DELIMITER ',', ON_CONFLICT TABLE,
CONFLICT_TABLE err_tbl1, log_verbosity verbose);
4,17,aaaaaa
6,11,aaaaaa
11,1,xxxxxxxx
12,1,xxxxxxxx
13,1,xxxxxxxx
\.

table err_tbl1 ;
copy_tbl | filename | lineno | line
----------+----------+--------+---------------
18231 | STDIN | 2 | 6,11,aaaaaa
18231 | STDIN | 4 | 12,1,xxxxxxxx
18231 | STDIN | 5 | 13,1,xxxxxxxx
(3 rows)

(I need to double-check the exclusion unique constraint)

Comments are welcome!

--
jian
https://www.enterprisedb.com/

Attachments:

v1-0001-COPY-ON_CONFLICT-TABLE.patchtext/x-patch; charset=US-ASCII; name=v1-0001-COPY-ON_CONFLICT-TABLE.patchDownload+863-16
#2Jim Jones
jim.jones@uni-muenster.de
In reply to: jian he (#1)
Re: COPY ON_CONFLICT TABLE; save duplicated record to another table.

Hi Jian

On 25/04/2026 06:12, jian he wrote:

Comments are welcome!

Thanks for the patch!

A few comments:

== double defGet in function name ==

defGetdefGetCopyOnConflictChoice should probably be
defGetCopyOnConflictChoice

== identical error message in ProcessCopyOptions() ==

The same error message is raised for conflict_tbl_specified and
!conflict_tbl_specified

ereport(ERROR,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY %s requires %s option", "CONFLICT_TABLE", "ON_CONFLICT"));

== unused enum in CopyOnErrorChoice ==

The enum COPY_ON_ERROR_TABLE is introduced, but is never used anywhere.

== unconditional ExecOpenIndices() ==

ExecOpenIndices(resultRelInfo, true);

I'm not very familiar with this part of the code, but it looks like that
this change would affect other COPY FROM operations. If I'm mistaken, a
comment would add some value here. Or perhaps something like:

ExecOpenIndices(resultRelInfo, cstate->opts.on_conflict != ONCONFLICT_NONE);

== ON_CONFLICT TABLE not rejected in COPY TO ==

CONFLICT_TABLE is silently ignored, even if the table does not exist:

postgres=# COPY t TO '/dev/null' (ON_CONFLICT TABLE, CONFLICT_TABLE
table_does_not_exist);
COPY 1

I guess adding a is_from to defGetdefGetCopyOnConflictChoice() is the
way to go.

== redundant condition in CopyFrom() ==

The second condition seems unnecessary, as the previous if already tests
for cstate->opts.on_conflict == ONCONFLICT_NONE:

else if (resultRelInfo->ri_NumIndices > 0 &&
cstate->opts.on_conflict != ONCONFLICT_NONE)

== typos ==
regular realtion > regular relation
vertification > verification
resouces > resources
unqiue > unique

== unnecessary pnstrdup (?) ==

newvalues[i] = CStringGetTextDatum(pnstrdup(cstate->line_buf.data,
cstate->line_buf.len));

Is the duplication really necessary? Wouldn't it suffice to use
cstring_to_text_with_len() instead? Something like:

newvalues[i] =
PointerGetDatum(cstring_to_text_with_len(cstate->line_buf.data,
cstate->line_buf.len));

or even

newvalues[i] = CStringGetTextDatum(cstate->line_buf.data)

I'll check the documentation after we get more feedback on the syntax.

Thanks!
Best, Jim

#3Zsolt Parragi
zsolt.parragi@percona.com
In reply to: jian he (#1)
Re: COPY ON_CONFLICT TABLE; save duplicated record to another table.

Hello!

I tried the patch and found a few issues.

1. Two of them are null pointer dereference crashes, one with
partitioned tables:

CREATE TABLE part_t (a int PRIMARY KEY, b text) PARTITION BY RANGE (a);
CREATE TABLE part_t_p1 PARTITION OF part_t FOR VALUES FROM (0) TO (1000);
CREATE TABLE conflict_log (
rel oid,
file_name text,
line_no bigint,
raw_line text
);

INSERT INTO part_t VALUES (1, 'pre-existing');
COPY part_t (a, b) FROM stdin WITH (on_conflict 'table',
conflict_table 'conflict_log');
2 row-two
1 dup
3 row-three
\.

2. And another with repeateable reads:

CREATE TABLE t_rr (a int PRIMARY KEY, b text);
CREATE TABLE conflict_log (rel oid, fname text, ln bigint, raw text);
INSERT INTO t_rr VALUES (1, 'pre-committed');

BEGIN ISOLATION LEVEL REPEATABLE READ;
COPY t_rr FROM stdin WITH (on_conflict 'table', conflict_table 'conflict_log');
1 dup-row
\.

3. There's also a possible data loss scenario, reports 3 copied 0 actual:

CREATE TABLE conf_log (
relname oid,
fname text,
lineno bigint,
rawline text
);

CREATE TABLE no_idx_tgt (id int, payload text);

CREATE FUNCTION noop_trig() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
RETURN NEW;
END;
$$;

CREATE TRIGGER noop_before BEFORE INSERT ON no_idx_tgt
FOR EACH ROW EXECUTE FUNCTION noop_trig();

COPY no_idx_tgt (id, payload) FROM STDIN
WITH (ON_CONFLICT TABLE, CONFLICT_TABLE conf_log);
1 alpha
2 beta
3 gamma
\.

SELECT 'A: no_idx_tgt count' AS scenario, count(*) AS rows FROM no_idx_tgt;
SELECT 'A: conf_log count' AS scenario, count(*) AS rows FROM conf_log;
SELECT * FROM no_idx_tgt ORDER BY id;

4. Shouldn't the following error out?

CREATE TABLE t (a int PRIMARY KEY, b text);
COPY t TO '/dev/null' (ON_CONFLICT TABLE, CONFLICT_TABLE no_such_table);

#4jian he
jian.universality@gmail.com
In reply to: Zsolt Parragi (#3)
Re: COPY ON_CONFLICT TABLE; save duplicated record to another table.

Hi.

The attached patch should address most, if not all, of the issues you
both raised.

As explained in [1]/messages/by-id/CACJufxH_NbPuA+O5YR7xP4xDZ+iHkO2VFkddhrhBz+4-EUTp7w@mail.gmail.com, we can export ExecInsert to let it perform the
main insertion work.
To allow ExecInsert to handle the remaining tasks, we need to carefuly manage
the lifecycle of constructed CopyFromStateData->ModifyTableContext (including
ModifyTableContext->EState): populate it, use it, and then release it.

Since ExecInsert already contains the necessary infrastructure for INSERT ON
CONFLICT DO NOTHING/SELECT, exporting it avoids duplicating that logic in
src/backend/commands/copyfrom.c (which is what v1 of the patch did).

[1]: /messages/by-id/CACJufxH_NbPuA+O5YR7xP4xDZ+iHkO2VFkddhrhBz+4-EUTp7w@mail.gmail.com

The exclusion unique constraint issue is still not resolved.... but,
overall v2 is better than v1, IMHO.

--
jian
https://www.enterprisedb.com/

Attachments:

v2-0002-COPY-ON_CONFLICT-TABLE.patchtext/x-patch; charset=US-ASCII; name=v2-0002-COPY-ON_CONFLICT-TABLE.patchDownload+1025-15
v2-0001-export-ExecInsert.patchtext/x-patch; charset=US-ASCII; name=v2-0001-export-ExecInsert.patchDownload+46-40
#5Jim Jones
jim.jones@uni-muenster.de
In reply to: jian he (#4)
Re: COPY ON_CONFLICT TABLE; save duplicated record to another table.

Hi Jian

On 11/05/2026 05:13, jian he wrote:

The attached patch should address most, if not all, of the issues you
both raised.

Thanks for the update. All my points were addressed.

As explained in [1], we can export ExecInsert to let it perform the
main insertion work.
To allow ExecInsert to handle the remaining tasks, we need to carefuly manage
the lifecycle of constructed CopyFromStateData->ModifyTableContext (including
ModifyTableContext->EState): populate it, use it, and then release it.

Since ExecInsert already contains the necessary infrastructure for INSERT ON
CONFLICT DO NOTHING/SELECT, exporting it avoids duplicating that logic in
src/backend/commands/copyfrom.c (which is what v1 of the patch did).

[1]: /messages/by-id/CACJufxH_NbPuA+O5YR7xP4xDZ+iHkO2VFkddhrhBz+4-
EUTp7w@mail.gmail.com

The exclusion unique constraint issue is still not resolved.... but,
overall v2 is better than v1, IMHO.

One other thing I just noticed in BINARY mode. I believe we're missing a
check in ProcessCopyOptions() with ON_CONFLICT TABLE to show a proper
error message, e.g.

/* Check on_conflict */
if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_conflict !=
ONCONFLICT_NONE)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "ON_CONFLICT TABLE")));

postgres=# COPY t FROM STDIN (FORMAT binary, ON_CONFLICT TABLE,
CONFLICT_TABLE ctbl);
ERROR: cannot specify ON_CONFLICT TABLE in BINARY mode

Right now the error is rather vague:

ERROR: COPY file signature not recognized

What do you think?

Thanks!

Best, Jim

#6Zsolt Parragi
zsolt.parragi@percona.com
In reply to: Jim Jones (#5)
Re: COPY ON_CONFLICT TABLE; save duplicated record to another table.

Hello!

One other thing I just noticed in BINARY mode. I believe we're missing a
check in ProcessCopyOptions() with ON_CONFLICT TABLE to show a proper
error message, e.g.

It is possible to crash the current patch with binary mode, that check
is definitely needed.

+ MakeTransitionCaptureState(cstate->conflictRel->trigdesc,
+    RelationGetRelid(cstate->conflictRel),
+    CMD_INSERT);

Shouldn't this update mtstate->mt_transition_capture?

+ if (cstate->opts.on_conflict == ONCONFLICT_TABLE)
...
+ if (conflict_mstate->fireBSTriggers)
+ {
+ ExecBSInsertTriggers(conflict_mstate->ps.state,
conflict_mstate->rootResultRelInfo);
+
+ conflict_mstate->fireBSTriggers = false;
+ }
+

and

+ if (cstate->num_conflicts > 0)
+ {
...
+ /* Execute AFTER STATEMENT insertion triggers */
+ ExecASInsertTriggers(cstate->mtcontext->estate,
+ on_conflict_mtstate->rootResultRelInfo,
+ on_conflict_mtstate->mt_transition_capture);

* Doesn't statements typically fire triggers unconditionally? INSERT
ON CONFLICT DO NOTHING; fires BS+AS triggers even if it doesn't insert
any rows.
* Isn't firing a before statement trigger after some before/after row
triggers were already fired (for non conflicting rows) strange?

+ if (valid_col_count != 4)
+ errdetail_msg = _("The conflict_table is incomplete; exactly four
columns are required.");

if valid_col_count > 4, is it still incomplete, shouldn't the error
message change in that case?

#7jian he
jian.universality@gmail.com
In reply to: Zsolt Parragi (#6)
Re: COPY ON_CONFLICT TABLE; save duplicated record to another table.

On Tue, May 12, 2026 at 4:40 AM Zsolt Parragi <zsolt.parragi@percona.com> wrote:

Hello!

One other thing I just noticed in BINARY mode. I believe we're missing a
check in ProcessCopyOptions() with ON_CONFLICT TABLE to show a proper
error message, e.g.

It is possible to crash the current patch with binary mode, that check
is definitely needed.

binary mode lacks the concept of a line number or the whole line string.
Since cstate->line_buf is null in binary mode, it will segfault in
```CStringGetTextDatum(cstate->line_buf.data);```

Supporting ON_CONFLICT in binary mode is not trivial.
Since ON_ERROR IGNORE also cannot be used in binary mode, not
supporting ON_CONFLICT in binary mode should be fine, IMHO.

+ MakeTransitionCaptureState(cstate->conflictRel->trigdesc,
+    RelationGetRelid(cstate->conflictRel),
+    CMD_INSERT);

Shouldn't this update mtstate->mt_transition_capture?

Attached v3 fix this issue.

+ if (cstate->opts.on_conflict == ONCONFLICT_TABLE)
...
+ if (conflict_mstate->fireBSTriggers)
+ {
+ ExecBSInsertTriggers(conflict_mstate->ps.state,
conflict_mstate->rootResultRelInfo);
+
+ conflict_mstate->fireBSTriggers = false;
+ }
+

and

+ if (cstate->num_conflicts > 0)
+ {
...
+ /* Execute AFTER STATEMENT insertion triggers */
+ ExecASInsertTriggers(cstate->mtcontext->estate,
+ on_conflict_mtstate->rootResultRelInfo,
+ on_conflict_mtstate->mt_transition_capture);

* Doesn't statements typically fire triggers unconditionally? INSERT
ON CONFLICT DO NOTHING; fires BS+AS triggers even if it doesn't insert
any rows.
* Isn't firing a before statement trigger after some before/after row
triggers were already fired (for non conflicting rows) strange?

Ok. I changed to
Statement-level triggers on the CONFLICT_TABLE are fired
unconditionally, regardless of whether an error occurred or not.
Each row inserted into the CONFLICT_TABLE will fire both the BEFORE
INSERT FOR EACH ROW and AFTER INSERT FOR EACH ROW triggers.

+ if (valid_col_count != 4)
+ errdetail_msg = _("The conflict_table is incomplete; exactly four
columns are required.");

if valid_col_count > 4, is it still incomplete, shouldn't the error
message change in that case?

With v3, whether there are more or fewer columns on the
conflict_table, the error message is now the same:

+ERROR:  cannot use relation "err_tbl1" for COPY on_conflict error saving
+DETAIL:  The conflict_table should only have four columns
+HINT:  The conflict_table must contain exactly four columns with data
types, in order: OID, TEXT, BIGINT, TEXT

Regression tests for permission checks have also been added.

--
jian
https://www.enterprisedb.com/

Attachments:

v3-0001-export-ExecInsert.patchtext/x-patch; charset=US-ASCII; name=v3-0001-export-ExecInsert.patchDownload+46-40
v3-0002-COPY-ON_CONFLICT-TABLE.patchtext/x-patch; charset=US-ASCII; name=v3-0002-COPY-ON_CONFLICT-TABLE.patchDownload+1111-15
#8Jim Jones
jim.jones@uni-muenster.de
In reply to: jian he (#7)
Re: COPY ON_CONFLICT TABLE; save duplicated record to another table.

Hi Jian

On 12/05/2026 10:15, jian he wrote:

Attached v3 fix this issue.

== closing relation too early ==

In noticed that in BeginCopyFrom() you open cstate->conflictRel and
close it after the CopyFromConflictTableCheck() call

cstate->conflictRel = table_open(conflictRelid, NoLock);
CopyFromConflictTableCheck(cstate);
table_close(cstate->conflictRel, NoLock);

But if we take a look at DoCopy(), the function CopyFrom() is called
immediately after a BeginCopyFrom() call, which now references to a
closed relation in cstate->conflictRel. I guess we should close the
relation in EndCopyFrom().

== redundant if blocks ==

These two if (cstate->opts.on_conflict == ONCONFLICT_TABLE) could be merged:

if (cstate->opts.on_conflict == ONCONFLICT_TABLE)
{
node->onConflictAction = ONCONFLICT_NOTHING;
node->canSetTag = true;
}

Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);

if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
Assert(cstate->escontext);

if (cstate->opts.on_conflict == ONCONFLICT_TABLE)
conflictslot = ExecInitExtraTupleSlot(estate,
RelationGetDescr(cstate->conflictRel),
&TTSOpsVirtual);

== comments ==
ON_CONFLCT > ON_CONFLICT
unqiue > unique

IIUC, it should be here (copy.h) "on conflict" instead of "on error", right?

char *on_conflictRel; /* on error, save error info to the table,
* table name */

Thanks!
Best, Jim

#9Zsolt Parragi
zsolt.parragi@percona.com
In reply to: Jim Jones (#8)
Re: COPY ON_CONFLICT TABLE; save duplicated record to another table.

I also noticed the early relation close mentioned by Jim, which can crash the patch.

+      This uses the same mechanism as <link linkend="sql-on-conflict"><command>INSERT ... ON CONFLICT</command></link>.
+      However, exclusion constraints are not supported; only <literal>NOT DEFERRABLE</literal>
+      unique constraints are checked for violations.

EXCLUDE USING gist (... WITH =, ... WITH &&) seems to work fine? Except that the message mentions unique constraint violation.

I also checked the same trigger behaviors as in the other thread[1], especially before triggers on the conflict table, and this patch behaves similarly, it silently drops rows.
I think this could also use some more visibility/documentation about that.

1: /messages/by-id/CAN4CZFPoohFvQTSE0wC+wcrfYiZOxFmUdOq0+9TCVR6Hk8n6iw@mail.gmail.com

#10jian he
jian.universality@gmail.com
In reply to: Zsolt Parragi (#9)
Re: COPY ON_CONFLICT TABLE; save duplicated record to another table.

On Fri, May 15, 2026 at 7:56 PM Zsolt Parragi <zsolt.parragi@percona.com> wrote:

I also noticed the early relation close mentioned by Jim, which can
crash the patch.

fixed.

+      This uses the same mechanism as <link
linkend="sql-on-conflict"><command>INSERT ... ON
CONFLICT</command></link>.
+      However, exclusion constraints are not supported; only
<literal>NOT DEFERRABLE</literal>
+      unique constraints are checked for violations.

EXCLUDE USING gist (... WITH =, ... WITH &&) seems to work fine?
Except that the message mentions unique constraint violation.

I double-checked ExecCheckIndexConstraints, ExecInsertIndexTuples and
added some dummy regression tests to confirm
that INSERT ON CONFLICT DO NOTHING works fine with exclusion constraints.

I also checked the same trigger behaviors as in the other thread[1],
especially before triggers on the conflict table, and this patch
behaves similarly, it silently drops rows.
I think this could also use some more visibility/documentation about that.

1: /messages/by-id/CAN4CZFPoohFvQTSE0wC+wcrfYiZOxFmUdOq0+9TCVR6Hk8n6iw@mail.gmail.com

With the attached v4, row-level and statement-level triggers are now
fired for every insertion to conflict_table

In v3, there was a performance regression when a table don't have any unique or
exclusion constraint, but ON_CONFLICT was still specified as 'TABLE'.
I have attached an SQL test script demonstrating this.
With v4, this regression is now very very minimal for COPY operations where
ON_CONFLICT is set to 'TABLE' on a target table without any unique or exclusion
constraints.

I also polished the documentation.

Comments from Jim Jones also addressed.

--
jian
https://www.enterprisedb.com/

Attachments:

copy_on_conflict_no_unique_idx_test.sqlapplication/sql; name=copy_on_conflict_no_unique_idx_test.sqlDownload
v4-0001-export-ExecInsert.patchtext/x-patch; charset=US-ASCII; name=v4-0001-export-ExecInsert.patchDownload+46-40
v4-0002-COPY-ON_CONFLICT-TABLE.patchtext/x-patch; charset=US-ASCII; name=v4-0002-COPY-ON_CONFLICT-TABLE.patchDownload+1214-20