COPY ON_CONFLICT TABLE; save duplicated record to another table.
Hi,
This is for v20.
Reference: https://web.archive.org/web/20240328094030/https://riggs.business/blog/f/postgresql-todo-2023
COPY enhancement:
Detect duplicate rows and redirect them to a separate table without
aborting the load.
While reviewing this TODO, I quickly noticed this idea closely aligns with
https://commitfest.postgresql.org/patch/4817.
Both ideas share common elements: allowing a user-specified table,
validating its metadata, and storing rows in it.
Based on that, I spent some time working on the implementation.
Proposed syntax:
COPY FROM (ON_CONFLICT TABLE, CONFLICT_TABLE conflict_tbl);
The CONFLICT_TABLE requires exactly four columns: COPY target table, COPY
filename, the line number of the duplicate, and the duplicate record itself.
This structure is fixed, a pre-defined data type is unnecessary. Validation is
based solely on the column data types (pg_attribute.atttypid) rather than their
names (pg_attribute.attname). The expected types are OID, TEXT, INT8, and TEXT,
respectively.
This uses INSERT ON CONFLICT infrastructure under the hood.
Demo:
CREATE TABLE t_copy_tbl(a int, b int, c text);
CREATE TABLE err_tbl1(copy_tbl oid, filename text, lineno bigint, line text);
CREATE UNIQUE INDEX ON t_copy_tbl (c);
COPY t_copy_tbl(b,a, c) FROM STDIN (DELIMITER ',', ON_CONFLICT TABLE,
CONFLICT_TABLE err_tbl1, log_verbosity verbose);
4,17,aaaaaa
6,11,aaaaaa
11,1,xxxxxxxx
12,1,xxxxxxxx
13,1,xxxxxxxx
\.
table err_tbl1 ;
copy_tbl | filename | lineno | line
----------+----------+--------+---------------
18231 | STDIN | 2 | 6,11,aaaaaa
18231 | STDIN | 4 | 12,1,xxxxxxxx
18231 | STDIN | 5 | 13,1,xxxxxxxx
(3 rows)
(I need to double-check the exclusion unique constraint)
Comments are welcome!
Attachments:
v1-0001-COPY-ON_CONFLICT-TABLE.patchtext/x-patch; charset=US-ASCII; name=v1-0001-COPY-ON_CONFLICT-TABLE.patchDownload+863-16
Hi Jian
On 25/04/2026 06:12, jian he wrote:
Comments are welcome!
Thanks for the patch!
A few comments:
== double defGet in function name ==
defGetdefGetCopyOnConflictChoice should probably be
defGetCopyOnConflictChoice
== identical error message in ProcessCopyOptions() ==
The same error message is raised for conflict_tbl_specified and
!conflict_tbl_specified
ereport(ERROR,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY %s requires %s option", "CONFLICT_TABLE", "ON_CONFLICT"));
== unused enum in CopyOnErrorChoice ==
The enum COPY_ON_ERROR_TABLE is introduced, but is never used anywhere.
== unconditional ExecOpenIndices() ==
ExecOpenIndices(resultRelInfo, true);
I'm not very familiar with this part of the code, but it looks like that
this change would affect other COPY FROM operations. If I'm mistaken, a
comment would add some value here. Or perhaps something like:
ExecOpenIndices(resultRelInfo, cstate->opts.on_conflict != ONCONFLICT_NONE);
== ON_CONFLICT TABLE not rejected in COPY TO ==
CONFLICT_TABLE is silently ignored, even if the table does not exist:
postgres=# COPY t TO '/dev/null' (ON_CONFLICT TABLE, CONFLICT_TABLE
table_does_not_exist);
COPY 1
I guess adding a is_from to defGetdefGetCopyOnConflictChoice() is the
way to go.
== redundant condition in CopyFrom() ==
The second condition seems unnecessary, as the previous if already tests
for cstate->opts.on_conflict == ONCONFLICT_NONE:
else if (resultRelInfo->ri_NumIndices > 0 &&
cstate->opts.on_conflict != ONCONFLICT_NONE)
== typos ==
regular realtion > regular relation
vertification > verification
resouces > resources
unqiue > unique
== unnecessary pnstrdup (?) ==
newvalues[i] = CStringGetTextDatum(pnstrdup(cstate->line_buf.data,
cstate->line_buf.len));
Is the duplication really necessary? Wouldn't it suffice to use
cstring_to_text_with_len() instead? Something like:
newvalues[i] =
PointerGetDatum(cstring_to_text_with_len(cstate->line_buf.data,
cstate->line_buf.len));
or even
newvalues[i] = CStringGetTextDatum(cstate->line_buf.data)
I'll check the documentation after we get more feedback on the syntax.
Thanks!
Best, Jim
Hello!
I tried the patch and found a few issues.
1. Two of them are null pointer dereference crashes, one with
partitioned tables:
CREATE TABLE part_t (a int PRIMARY KEY, b text) PARTITION BY RANGE (a);
CREATE TABLE part_t_p1 PARTITION OF part_t FOR VALUES FROM (0) TO (1000);
CREATE TABLE conflict_log (
rel oid,
file_name text,
line_no bigint,
raw_line text
);
INSERT INTO part_t VALUES (1, 'pre-existing');
COPY part_t (a, b) FROM stdin WITH (on_conflict 'table',
conflict_table 'conflict_log');
2 row-two
1 dup
3 row-three
\.
2. And another with repeateable reads:
CREATE TABLE t_rr (a int PRIMARY KEY, b text);
CREATE TABLE conflict_log (rel oid, fname text, ln bigint, raw text);
INSERT INTO t_rr VALUES (1, 'pre-committed');
BEGIN ISOLATION LEVEL REPEATABLE READ;
COPY t_rr FROM stdin WITH (on_conflict 'table', conflict_table 'conflict_log');
1 dup-row
\.
3. There's also a possible data loss scenario, reports 3 copied 0 actual:
CREATE TABLE conf_log (
relname oid,
fname text,
lineno bigint,
rawline text
);
CREATE TABLE no_idx_tgt (id int, payload text);
CREATE FUNCTION noop_trig() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
RETURN NEW;
END;
$$;
CREATE TRIGGER noop_before BEFORE INSERT ON no_idx_tgt
FOR EACH ROW EXECUTE FUNCTION noop_trig();
COPY no_idx_tgt (id, payload) FROM STDIN
WITH (ON_CONFLICT TABLE, CONFLICT_TABLE conf_log);
1 alpha
2 beta
3 gamma
\.
SELECT 'A: no_idx_tgt count' AS scenario, count(*) AS rows FROM no_idx_tgt;
SELECT 'A: conf_log count' AS scenario, count(*) AS rows FROM conf_log;
SELECT * FROM no_idx_tgt ORDER BY id;
4. Shouldn't the following error out?
CREATE TABLE t (a int PRIMARY KEY, b text);
COPY t TO '/dev/null' (ON_CONFLICT TABLE, CONFLICT_TABLE no_such_table);
Hi.
The attached patch should address most, if not all, of the issues you
both raised.
As explained in [1]/messages/by-id/CACJufxH_NbPuA+O5YR7xP4xDZ+iHkO2VFkddhrhBz+4-EUTp7w@mail.gmail.com, we can export ExecInsert to let it perform the
main insertion work.
To allow ExecInsert to handle the remaining tasks, we need to carefuly manage
the lifecycle of constructed CopyFromStateData->ModifyTableContext (including
ModifyTableContext->EState): populate it, use it, and then release it.
Since ExecInsert already contains the necessary infrastructure for INSERT ON
CONFLICT DO NOTHING/SELECT, exporting it avoids duplicating that logic in
src/backend/commands/copyfrom.c (which is what v1 of the patch did).
[1]: /messages/by-id/CACJufxH_NbPuA+O5YR7xP4xDZ+iHkO2VFkddhrhBz+4-EUTp7w@mail.gmail.com
The exclusion unique constraint issue is still not resolved.... but,
overall v2 is better than v1, IMHO.
Hi Jian
On 11/05/2026 05:13, jian he wrote:
The attached patch should address most, if not all, of the issues you
both raised.
Thanks for the update. All my points were addressed.
As explained in [1], we can export ExecInsert to let it perform the
main insertion work.
To allow ExecInsert to handle the remaining tasks, we need to carefuly manage
the lifecycle of constructed CopyFromStateData->ModifyTableContext (including
ModifyTableContext->EState): populate it, use it, and then release it.Since ExecInsert already contains the necessary infrastructure for INSERT ON
CONFLICT DO NOTHING/SELECT, exporting it avoids duplicating that logic in
src/backend/commands/copyfrom.c (which is what v1 of the patch did).[1]: /messages/by-id/CACJufxH_NbPuA+O5YR7xP4xDZ+iHkO2VFkddhrhBz+4-
EUTp7w@mail.gmail.comThe exclusion unique constraint issue is still not resolved.... but,
overall v2 is better than v1, IMHO.
One other thing I just noticed in BINARY mode. I believe we're missing a
check in ProcessCopyOptions() with ON_CONFLICT TABLE to show a proper
error message, e.g.
/* Check on_conflict */
if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_conflict !=
ONCONFLICT_NONE)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "ON_CONFLICT TABLE")));
postgres=# COPY t FROM STDIN (FORMAT binary, ON_CONFLICT TABLE,
CONFLICT_TABLE ctbl);
ERROR: cannot specify ON_CONFLICT TABLE in BINARY mode
Right now the error is rather vague:
ERROR: COPY file signature not recognized
What do you think?
Thanks!
Best, Jim
Hello!
One other thing I just noticed in BINARY mode. I believe we're missing a
check in ProcessCopyOptions() with ON_CONFLICT TABLE to show a proper
error message, e.g.
It is possible to crash the current patch with binary mode, that check
is definitely needed.
+ MakeTransitionCaptureState(cstate->conflictRel->trigdesc,
+ RelationGetRelid(cstate->conflictRel),
+ CMD_INSERT);
Shouldn't this update mtstate->mt_transition_capture?
+ if (cstate->opts.on_conflict == ONCONFLICT_TABLE)
...
+ if (conflict_mstate->fireBSTriggers)
+ {
+ ExecBSInsertTriggers(conflict_mstate->ps.state,
conflict_mstate->rootResultRelInfo);
+
+ conflict_mstate->fireBSTriggers = false;
+ }
+
and
+ if (cstate->num_conflicts > 0)
+ {
...
+ /* Execute AFTER STATEMENT insertion triggers */
+ ExecASInsertTriggers(cstate->mtcontext->estate,
+ on_conflict_mtstate->rootResultRelInfo,
+ on_conflict_mtstate->mt_transition_capture);
* Doesn't statements typically fire triggers unconditionally? INSERT
ON CONFLICT DO NOTHING; fires BS+AS triggers even if it doesn't insert
any rows.
* Isn't firing a before statement trigger after some before/after row
triggers were already fired (for non conflicting rows) strange?
+ if (valid_col_count != 4)
+ errdetail_msg = _("The conflict_table is incomplete; exactly four
columns are required.");
if valid_col_count > 4, is it still incomplete, shouldn't the error
message change in that case?
On Tue, May 12, 2026 at 4:40 AM Zsolt Parragi <zsolt.parragi@percona.com> wrote:
Hello!
One other thing I just noticed in BINARY mode. I believe we're missing a
check in ProcessCopyOptions() with ON_CONFLICT TABLE to show a proper
error message, e.g.It is possible to crash the current patch with binary mode, that check
is definitely needed.
binary mode lacks the concept of a line number or the whole line string.
Since cstate->line_buf is null in binary mode, it will segfault in
```CStringGetTextDatum(cstate->line_buf.data);```
Supporting ON_CONFLICT in binary mode is not trivial.
Since ON_ERROR IGNORE also cannot be used in binary mode, not
supporting ON_CONFLICT in binary mode should be fine, IMHO.
+ MakeTransitionCaptureState(cstate->conflictRel->trigdesc, + RelationGetRelid(cstate->conflictRel), + CMD_INSERT);Shouldn't this update mtstate->mt_transition_capture?
Attached v3 fix this issue.
+ if (cstate->opts.on_conflict == ONCONFLICT_TABLE) ... + if (conflict_mstate->fireBSTriggers) + { + ExecBSInsertTriggers(conflict_mstate->ps.state, conflict_mstate->rootResultRelInfo); + + conflict_mstate->fireBSTriggers = false; + } +and
+ if (cstate->num_conflicts > 0) + { ... + /* Execute AFTER STATEMENT insertion triggers */ + ExecASInsertTriggers(cstate->mtcontext->estate, + on_conflict_mtstate->rootResultRelInfo, + on_conflict_mtstate->mt_transition_capture);* Doesn't statements typically fire triggers unconditionally? INSERT
ON CONFLICT DO NOTHING; fires BS+AS triggers even if it doesn't insert
any rows.
* Isn't firing a before statement trigger after some before/after row
triggers were already fired (for non conflicting rows) strange?
Ok. I changed to
Statement-level triggers on the CONFLICT_TABLE are fired
unconditionally, regardless of whether an error occurred or not.
Each row inserted into the CONFLICT_TABLE will fire both the BEFORE
INSERT FOR EACH ROW and AFTER INSERT FOR EACH ROW triggers.
+ if (valid_col_count != 4) + errdetail_msg = _("The conflict_table is incomplete; exactly four columns are required.");if valid_col_count > 4, is it still incomplete, shouldn't the error
message change in that case?
With v3, whether there are more or fewer columns on the
conflict_table, the error message is now the same:
+ERROR: cannot use relation "err_tbl1" for COPY on_conflict error saving
+DETAIL: The conflict_table should only have four columns
+HINT: The conflict_table must contain exactly four columns with data
types, in order: OID, TEXT, BIGINT, TEXT
Regression tests for permission checks have also been added.
Hi Jian
On 12/05/2026 10:15, jian he wrote:
Attached v3 fix this issue.
== closing relation too early ==
In noticed that in BeginCopyFrom() you open cstate->conflictRel and
close it after the CopyFromConflictTableCheck() call
cstate->conflictRel = table_open(conflictRelid, NoLock);
CopyFromConflictTableCheck(cstate);
table_close(cstate->conflictRel, NoLock);
But if we take a look at DoCopy(), the function CopyFrom() is called
immediately after a BeginCopyFrom() call, which now references to a
closed relation in cstate->conflictRel. I guess we should close the
relation in EndCopyFrom().
== redundant if blocks ==
These two if (cstate->opts.on_conflict == ONCONFLICT_TABLE) could be merged:
if (cstate->opts.on_conflict == ONCONFLICT_TABLE)
{
node->onConflictAction = ONCONFLICT_NOTHING;
node->canSetTag = true;
}
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
Assert(cstate->escontext);
if (cstate->opts.on_conflict == ONCONFLICT_TABLE)
conflictslot = ExecInitExtraTupleSlot(estate,
RelationGetDescr(cstate->conflictRel),
&TTSOpsVirtual);
== comments ==
ON_CONFLCT > ON_CONFLICT
unqiue > unique
IIUC, it should be here (copy.h) "on conflict" instead of "on error", right?
char *on_conflictRel; /* on error, save error info to the table,
* table name */
Thanks!
Best, Jim
I also noticed the early relation close mentioned by Jim, which can crash the patch.
+ This uses the same mechanism as <link linkend="sql-on-conflict"><command>INSERT ... ON CONFLICT</command></link>.
+ However, exclusion constraints are not supported; only <literal>NOT DEFERRABLE</literal>
+ unique constraints are checked for violations.
EXCLUDE USING gist (... WITH =, ... WITH &&) seems to work fine? Except that the message mentions unique constraint violation.
I also checked the same trigger behaviors as in the other thread[1], especially before triggers on the conflict table, and this patch behaves similarly, it silently drops rows.
I think this could also use some more visibility/documentation about that.
1: /messages/by-id/CAN4CZFPoohFvQTSE0wC+wcrfYiZOxFmUdOq0+9TCVR6Hk8n6iw@mail.gmail.com
On Fri, May 15, 2026 at 7:56 PM Zsolt Parragi <zsolt.parragi@percona.com> wrote:
I also noticed the early relation close mentioned by Jim, which can
crash the patch.
fixed.
+ This uses the same mechanism as <link linkend="sql-on-conflict"><command>INSERT ... ON CONFLICT</command></link>. + However, exclusion constraints are not supported; only <literal>NOT DEFERRABLE</literal> + unique constraints are checked for violations.EXCLUDE USING gist (... WITH =, ... WITH &&) seems to work fine?
Except that the message mentions unique constraint violation.
I double-checked ExecCheckIndexConstraints, ExecInsertIndexTuples and
added some dummy regression tests to confirm
that INSERT ON CONFLICT DO NOTHING works fine with exclusion constraints.
I also checked the same trigger behaviors as in the other thread[1],
especially before triggers on the conflict table, and this patch
behaves similarly, it silently drops rows.
I think this could also use some more visibility/documentation about that.1: /messages/by-id/CAN4CZFPoohFvQTSE0wC+wcrfYiZOxFmUdOq0+9TCVR6Hk8n6iw@mail.gmail.com
With the attached v4, row-level and statement-level triggers are now
fired for every insertion to conflict_table
In v3, there was a performance regression when a table don't have any unique or
exclusion constraint, but ON_CONFLICT was still specified as 'TABLE'.
I have attached an SQL test script demonstrating this.
With v4, this regression is now very very minimal for COPY operations where
ON_CONFLICT is set to 'TABLE' on a target table without any unique or exclusion
constraints.
I also polished the documentation.
Comments from Jim Jones also addressed.