Conflict handling for COPY FROM
Hellow hackers,
A few commitfest ago there was same effort to add errors handling to COPY
FROM[1]. /messages/by-id/7179F2FD-49CE-4093-AE14-1B26C5DFB0DA@gmail.com and i see there that we already have infrastructure for supporting
handling of unique violation or exclusion constraint violation error and I
think it is independently useful too. Attached is a patch to do that.
In order to prevent extreme condition the patch also add a new GUC variable
called copy_max_error_limit that control the amount of error to swallow
before start to error and new failed record file options for copy to write
a failed record so the user can examine it.
With the new option COPY FROM can be specified like:
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | PROGRAM 'command' | STDIN }[ON CONFLICT IGNORE
failed_record_filename] [ [ WITH ] ( option [, ...] ) ]
[1]: . /messages/by-id/7179F2FD-49CE-4093-AE14-1B26C5DFB0DA@gmail.com
/messages/by-id/7179F2FD-49CE-4093-AE14-1B26C5DFB0DA@gmail.com
Comment?
Regards
Surafel
Attachments:
conflict-handling-onCopy-from-v1.patchtext/x-patch; charset=US-ASCII; name=conflict-handling-onCopy-from-v1.patchDownload+216-16
Hi Surafel,
Andrew and I began reviewing your patch. It applied cleanly and seems to mostly have the functionality you describe. We did have some comments/questions.
1. It sounded like you added the copy_max_error_limit GUC as part of this patch to allow users to specify how many errors they want to swallow with this new functionality. The GUC didn't seem to be defined and we saw no mention of it in the code. My guess is this might be good to address one of the concerns mentioned in the initial email thread of this generating too many transaction IDs so it would probably be good to have it.
2. I was curious why you only have support for skipping errors on UNIQUE and EXCLUSION constraints and not other kinds of constraints? I'm not sure how difficult it would be to add support for those, but it seems they could also be useful.
3. We think the wording "ON CONFLICT IGNORE" may not be the clearest description of what this is doing since it is writing the failed rows to a file for a user to process later, but they are not being ignored. We considered things like STASH or LOG as alternatives to IGNORE. Andrew may have some other suggestions for wording.
4. We also noticed this has no tests and thought it would be good to add some to ensure this functionality works how you intend it and continues to work. We started running some SQL to validate this, but haven't gotten the chance to put it into a clean test yet. We can send you what we have so far, or we are also willing to put a little time in to turn it into tests ourselves that we could contribute to this patch.
Thanks for writing this patch!
Karen
Thanks for looking at it
1. It sounded like you added the copy_max_error_limit GUC as part of this patch to allow users to specify how many errors they want to swallow with this new functionality. The GUC didn't seem to be defined and we saw no mention of it in the code. My guess is this might be good to address one of the concerns mentioned in the initial email thread of this generating too many transaction IDs so it would probably be good to have it.
By mistake I write it copy_max_error_limit here but in the patch it is copy_maximum_error_limit with a default value of 100 sorry for mismatch
2. I was curious why you only have support for skipping errors on UNIQUE and EXCLUSION constraints and not other kinds of constraints? I'm not sure how difficult it would be to add support for those, but it seems they could also be useful.
I see it now that most of formatting error can be handle safely I will attache the patch for it this week
3. We think the wording "ON CONFLICT IGNORE" may not be the clearest description of what this is doing since it is writing the failed rows to a file for a user to process later, but they are not being ignored. We considered things like STASH or LOG as alternatives to IGNORE. Andrew may have some other suggestions for wording.
I agree.I will change it to ON CONFLICT LOG if we can’t find better naming
4. We also noticed this has no tests and thought it would be good to add some to ensure this functionality works how you intend it and continues to work. We started running some SQL to validate this, but haven't gotten the chance to put it into a clean test yet. We can send you what we have so far, or we are also willing to put a little time in to turn it into tests ourselves that we could contribute to this patch.
okay
On Sat, Aug 4, 2018 at 9:10 AM Surafel Temesgen <surafel3000@gmail.com>
wrote:
In order to prevent extreme condition the patch also add a new GUC
variable called copy_max_error_limit that control the amount of error to
swallow before start to error and new failed record file options for copy
to write a failed record so the user can examine it.
Why should this be a GUC rather than a COPY option?
In fact, try doing all of this by adding more options to COPY rather than
new syntax.
COPY ... WITH (ignore_conflicts 1000, ignore_logfile '...')
It kind of stinks to use a log file written by the server as the
dup-reporting channel though. That would have to be superuser-only.
...Robert
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hello,
The attached patch add error handling for
Extra data
missing data
invalid oid
null oid and
row count mismatch
And the record that field on the above case write to the file with appended
error message in it and in case of unique violation or exclusion constraint
violation error the failed record write as it is because the case of the
error can not be identified specifically
The new syntax became :
COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';
Regards
Surafel
Attachments:
conflict-handling-onCopy-from-v2.patchtext/x-patch; charset=US-ASCII; name=conflict-handling-onCopy-from-v2.patchDownload+223-15
On Thu, Aug 23, 2018 at 4:16 PM Surafel Temesgen <surafel3000@gmail.com> wrote:
The attached patch add error handling for
Extra datamissing data
invalid oid
null oid and
row count mismatch
Hi,
Unfortunately, the patch conflict-handling-onCopy-from-v2.patch has some
conflicts now, could you rebase it? I'm moving it to the next CF as "Waiting on
Author". Also I would appreciate if someone from the reviewers (Karen
Huddleston ?) could post a full patch review.
On Aug 20, 2018, at 5:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Aug 4, 2018 at 9:10 AM Surafel Temesgen <surafel3000@gmail.com> wrote:
In order to prevent extreme condition the patch also add a new GUC variable called copy_max_error_limit that control the amount of error to swallow before start to error and new failed record file options for copy to write a failed record so the user can examine it.Why should this be a GUC rather than a COPY option?
In fact, try doing all of this by adding more options to COPY rather than new syntax.
COPY ... WITH (ignore_conflicts 1000, ignore_logfile '...')
It kind of stinks to use a log file written by the server as the dup-reporting channel though. That would have to be superuser-only.
Perhaps a better option would be to allow the user to specify a name for a cursor, and have COPY do the moral equivalent of DECLARE name? Calling a function for each bad row would be another option.
On Thu, Nov 29, 2018 at 3:15 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
Unfortunately, the patch conflict-handling-onCopy-from-v2.patch has some
conflicts now, could you rebase it?
Thank you for informing, attach is rebased patch against current master
Regards
Surafel
Attachments:
conflict-handling-onCopy-from-v3.patchtext/x-patch; charset=US-ASCII; name=conflict-handling-onCopy-from-v3.patchDownload+238-16
On Wed, Dec 19, 2018 at 02:48:14PM +0300, Surafel Temesgen wrote:
Thank you for informing, attach is rebased patch against current
master
copy.c conflicts on HEAD, please rebase. I am moving the patch to
next CF, waiting on author.
--
Michael
Hi,
On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:
COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';
This doesn't seem to address Robert's point that a log file requires to
be super user only, which seems to restrict the feature more than
necessary?
- Andres
On 2/16/19 12:24 AM, Andres Freund wrote:
Hi,
On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:
COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';
This doesn't seem to address Robert's point that a log file requires to
be super user only, which seems to restrict the feature more than
necessary?
I liked Jim Nasby's idea of having it call a function rather than
writing to a log file.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Feb 4, 2019 at 9:06 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Dec 19, 2018 at 02:48:14PM +0300, Surafel Temesgen wrote:
Thank you for informing, attach is rebased patch against current
mastercopy.c conflicts on HEAD, please rebase. I am moving the patch to
next CF, waiting on author.
--
Thank you, here is a rebased patch against current master
regards
Surafel
Attachments:
conflict-handling-onCopy-from-v4.patchtext/x-patch; charset=US-ASCII; name=conflict-handling-onCopy-from-v4.patchDownload+238-16
On Sat, Feb 16, 2019 at 8:24 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:
COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';
This doesn't seem to address Robert's point that a log file requires to
be super user only, which seems to restrict the feature more than
necessary?- Andres
I think having write permission on specified directory is enough.
we use out put file name in COPY TO similarly.
regards
Surafel
On February 19, 2019 3:05:37 AM PST, Surafel Temesgen <surafel3000@gmail.com> wrote:
On Sat, Feb 16, 2019 at 8:24 AM Andres Freund <andres@anarazel.de>
wrote:Hi,
On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:
COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';
This doesn't seem to address Robert's point that a log file requires
to
be super user only, which seems to restrict the feature more than
necessary?- Andres
I think having write permission on specified directory is enough.
we use out put file name in COPY TO similarly.
Err, what? Again, that requires super user permissions (in contrast to copy from/to stdin/out). Backends run as the user postgres runs under - it will always have write permissions to at least the entire data directory. I think not addressing this just about guarantees the feature will be rejected.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <andres@anarazel.de> wrote:
Err, what? Again, that requires super user permissions (in contrast to
copy from/to stdin/out). Backends run as the user postgres runs under
okay i see it now and modified the patch similarly
regards
Surafel
Attachments:
conflict-handling-onCopy-from-v5.patchtext/x-patch; charset=US-ASCII; name=conflict-handling-onCopy-from-v5.patchDownload+243-16
On 2/20/19 8:01 AM, Surafel Temesgen wrote:
On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <andres@anarazel.de
<mailto:andres@anarazel.de>> wrote:Err, what? Again, that requires super user permissions (in
contrast to copy from/to stdin/out). Backends run as the user
postgres runs under
okay i see it now and modified the patch similarly
Why log to a file at all? We do have, you know, a database handy, where
we might more usefully log errors. You could usefully log the offending
row as an array of text, possibly.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
On 2/20/19 8:01 AM, Surafel Temesgen wrote:
On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <andres@anarazel.de
<mailto:andres@anarazel.de>> wrote:Err, what? Again, that requires super user permissions (in
contrast to copy from/to stdin/out). Backends run as the user
postgres runs under
okay i see it now and modified the patch similarlyWhy log to a file at all? We do have, you know, a database handy, where
we might more usefully log errors. You could usefully log the offending
row as an array of text, possibly.
Or even just return it as a row. CopyBoth is relatively widely supported these days.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Hi Surafel,
On 2/20/19 8:03 PM, Andres Freund wrote:
On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
Why log to a file at all? We do have, you know, a database handy, where
we might more usefully log errors. You could usefully log the offending
row as an array of text, possibly.Or even just return it as a row. CopyBoth is relatively widely supported these days.
This patch no longer applies so marked Waiting on Author.
Also, it appears that you have some comments from Andrew and Andres that
you should reply to.
Regards,
--
-David
david@pgmasters.net
Hi,
On 2019-03-25 12:50:13 +0400, David Steele wrote:
On 2/20/19 8:03 PM, Andres Freund wrote:
On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
Why log to a file at all? We do have, you know, a database handy, where
we might more usefully log errors. You could usefully log the offending
row as an array of text, possibly.Or even just return it as a row. CopyBoth is relatively widely supported these days.
This patch no longer applies so marked Waiting on Author.
Also, it appears that you have some comments from Andrew and Andres that you
should reply to.
As nothing has happened the last weeks, I've now marked this as
returned with feedback.
- Andres
On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <andres@anarazel.de> wrote:
On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <
andrew.dunstan@2ndquadrant.com> wrote:On 2/20/19 8:01 AM, Surafel Temesgen wrote:
On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <andres@anarazel.de
<mailto:andres@anarazel.de>> wrote:Err, what? Again, that requires super user permissions (in
contrast to copy from/to stdin/out). Backends run as the user
postgres runs underokay i see it now and modified the patch similarly
Why log to a file at all? We do have, you know, a database handy, where
we might more usefully log errors. You could usefully log the offending
row as an array of text, possibly.Or even just return it as a row. CopyBoth is relatively widely supported
these days.
hello,
i think generating warning about it also sufficiently meet its propose of
notifying user about skipped record with existing logging facility
and we use it for similar propose in other place too. The different
i see is the number of warning that can be generated
In addition to the above change in the attached patch i also change
the syntax to ERROR LIMIT because it is no longer only skip
unique and exclusion constrain violation
regards
Surafel