Conflict handling for COPY FROM

Started by Surafel Temesgenover 7 years ago54 messageshackers
Jump to latest
#1Surafel Temesgen
surafel3000@gmail.com

Hellow hackers,

A few commitfest ago there was same effort to add errors handling to COPY
FROM[1]. /messages/by-id/7179F2FD-49CE-4093-AE14-1B26C5DFB0DA@gmail.com and i see there that we already have infrastructure for supporting
handling of unique violation or exclusion constraint violation error and I
think it is independently useful too. Attached is a patch to do that.

In order to prevent extreme condition the patch also add a new GUC variable
called copy_max_error_limit that control the amount of error to swallow
before start to error and new failed record file options for copy to write
a failed record so the user can examine it.

With the new option COPY FROM can be specified like:

COPY table_name [ ( column_name [, ...] ) ]

FROM { 'filename' | PROGRAM 'command' | STDIN }[ON CONFLICT IGNORE
failed_record_filename] [ [ WITH ] ( option [, ...] ) ]

[1]: . /messages/by-id/7179F2FD-49CE-4093-AE14-1B26C5DFB0DA@gmail.com
/messages/by-id/7179F2FD-49CE-4093-AE14-1B26C5DFB0DA@gmail.com

Comment?

Regards

Surafel

Attachments:

conflict-handling-onCopy-from-v1.patchtext/x-patch; charset=US-ASCII; name=conflict-handling-onCopy-from-v1.patchDownload+216-16
#2Karen Huddleston
khuddleston@pivotal.io
In reply to: Surafel Temesgen (#1)
Re: Conflict handling for COPY FROM

Hi Surafel,

Andrew and I began reviewing your patch. It applied cleanly and seems to mostly have the functionality you describe. We did have some comments/questions.

1. It sounded like you added the copy_max_error_limit GUC as part of this patch to allow users to specify how many errors they want to swallow with this new functionality. The GUC didn't seem to be defined and we saw no mention of it in the code. My guess is this might be good to address one of the concerns mentioned in the initial email thread of this generating too many transaction IDs so it would probably be good to have it.
2. I was curious why you only have support for skipping errors on UNIQUE and EXCLUSION constraints and not other kinds of constraints? I'm not sure how difficult it would be to add support for those, but it seems they could also be useful.
3. We think the wording "ON CONFLICT IGNORE" may not be the clearest description of what this is doing since it is writing the failed rows to a file for a user to process later, but they are not being ignored. We considered things like STASH or LOG as alternatives to IGNORE. Andrew may have some other suggestions for wording.
4. We also noticed this has no tests and thought it would be good to add some to ensure this functionality works how you intend it and continues to work. We started running some SQL to validate this, but haven't gotten the chance to put it into a clean test yet. We can send you what we have so far, or we are also willing to put a little time in to turn it into tests ourselves that we could contribute to this patch.

Thanks for writing this patch!
Karen

#3Surafel Temesgen
surafel3000@gmail.com
In reply to: Karen Huddleston (#2)
Re: Conflict handling for COPY FROM

Thanks for looking at it
1. It sounded like you added the copy_max_error_limit GUC as part of this patch to allow users to specify how many errors they want to swallow with this new functionality. The GUC didn't seem to be defined and we saw no mention of it in the code. My guess is this might be good to address one of the concerns mentioned in the initial email thread of this generating too many transaction IDs so it would probably be good to have it.
By mistake I write it copy_max_error_limit here but in the patch it is copy_maximum_error_limit with a default value of 100 sorry for mismatch
2. I was curious why you only have support for skipping errors on UNIQUE and EXCLUSION constraints and not other kinds of constraints? I'm not sure how difficult it would be to add support for those, but it seems they could also be useful.
I see it now that most of formatting error can be handle safely I will attache the patch for it this week
3. We think the wording "ON CONFLICT IGNORE" may not be the clearest description of what this is doing since it is writing the failed rows to a file for a user to process later, but they are not being ignored. We considered things like STASH or LOG as alternatives to IGNORE. Andrew may have some other suggestions for wording.
I agree.I will change it to ON CONFLICT LOG if we can’t find better naming
4. We also noticed this has no tests and thought it would be good to add some to ensure this functionality works how you intend it and continues to work. We started running some SQL to validate this, but haven't gotten the chance to put it into a clean test yet. We can send you what we have so far, or we are also willing to put a little time in to turn it into tests ourselves that we could contribute to this patch.
okay

#4Robert Haas
robertmhaas@gmail.com
In reply to: Surafel Temesgen (#1)
Re: Conflict handling for COPY FROM

On Sat, Aug 4, 2018 at 9:10 AM Surafel Temesgen <surafel3000@gmail.com>
wrote:

In order to prevent extreme condition the patch also add a new GUC
variable called copy_max_error_limit that control the amount of error to
swallow before start to error and new failed record file options for copy
to write a failed record so the user can examine it.

Why should this be a GUC rather than a COPY option?

In fact, try doing all of this by adding more options to COPY rather than
new syntax.

COPY ... WITH (ignore_conflicts 1000, ignore_logfile '...')

It kind of stinks to use a log file written by the server as the
dup-reporting channel though. That would have to be superuser-only.

...Robert
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Surafel Temesgen
surafel3000@gmail.com
In reply to: Robert Haas (#4)
Re: Conflict handling for COPY FROM

Hello,

The attached patch add error handling for
Extra data

missing data

invalid oid

null oid and

row count mismatch

And the record that field on the above case write to the file with appended
error message in it and in case of unique violation or exclusion constraint
violation error the failed record write as it is because the case of the
error can not be identified specifically

The new syntax became :

COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';

Regards

Surafel

Attachments:

conflict-handling-onCopy-from-v2.patchtext/x-patch; charset=US-ASCII; name=conflict-handling-onCopy-from-v2.patchDownload+223-15
#6Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Surafel Temesgen (#5)
Re: Conflict handling for COPY FROM

On Thu, Aug 23, 2018 at 4:16 PM Surafel Temesgen <surafel3000@gmail.com> wrote:

The attached patch add error handling for
Extra data

missing data

invalid oid

null oid and

row count mismatch

Hi,

Unfortunately, the patch conflict-handling-onCopy-from-v2.patch has some
conflicts now, could you rebase it? I'm moving it to the next CF as "Waiting on
Author". Also I would appreciate if someone from the reviewers (Karen
Huddleston ?) could post a full patch review.

#7Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#4)
Re: Conflict handling for COPY FROM

On Aug 20, 2018, at 5:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Aug 4, 2018 at 9:10 AM Surafel Temesgen <surafel3000@gmail.com> wrote:
In order to prevent extreme condition the patch also add a new GUC variable called copy_max_error_limit that control the amount of error to swallow before start to error and new failed record file options for copy to write a failed record so the user can examine it.

Why should this be a GUC rather than a COPY option?

In fact, try doing all of this by adding more options to COPY rather than new syntax.

COPY ... WITH (ignore_conflicts 1000, ignore_logfile '...')

It kind of stinks to use a log file written by the server as the dup-reporting channel though. That would have to be superuser-only.

Perhaps a better option would be to allow the user to specify a name for a cursor, and have COPY do the moral equivalent of DECLARE name? Calling a function for each bad row would be another option.

#8Surafel Temesgen
surafel3000@gmail.com
In reply to: Dmitry Dolgov (#6)
Re: Conflict handling for COPY FROM

On Thu, Nov 29, 2018 at 3:15 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Unfortunately, the patch conflict-handling-onCopy-from-v2.patch has some
conflicts now, could you rebase it?

Thank you for informing, attach is rebased patch against current master

Regards

Surafel

Attachments:

conflict-handling-onCopy-from-v3.patchtext/x-patch; charset=US-ASCII; name=conflict-handling-onCopy-from-v3.patchDownload+238-16
#9Michael Paquier
michael@paquier.xyz
In reply to: Surafel Temesgen (#8)
Re: Conflict handling for COPY FROM

On Wed, Dec 19, 2018 at 02:48:14PM +0300, Surafel Temesgen wrote:

Thank you for informing, attach is rebased patch against current
master

copy.c conflicts on HEAD, please rebase. I am moving the patch to
next CF, waiting on author.
--
Michael

#10Andres Freund
andres@anarazel.de
In reply to: Surafel Temesgen (#5)
Re: Conflict handling for COPY FROM

Hi,

On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:

COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';

This doesn't seem to address Robert's point that a log file requires to
be super user only, which seems to restrict the feature more than
necessary?

- Andres

#11Andrew Dunstan
andrew@dunslane.net
In reply to: Andres Freund (#10)
Re: Conflict handling for COPY FROM

On 2/16/19 12:24 AM, Andres Freund wrote:

Hi,

On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:

COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';

This doesn't seem to address Robert's point that a log file requires to
be super user only, which seems to restrict the feature more than
necessary?

I liked Jim Nasby's idea of having it call a function rather than
writing to a log file.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12Surafel Temesgen
surafel3000@gmail.com
In reply to: Michael Paquier (#9)
Re: Conflict handling for COPY FROM

On Mon, Feb 4, 2019 at 9:06 AM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Dec 19, 2018 at 02:48:14PM +0300, Surafel Temesgen wrote:

Thank you for informing, attach is rebased patch against current
master

copy.c conflicts on HEAD, please rebase. I am moving the patch to
next CF, waiting on author.
--

Thank you, here is a rebased patch against current master

regards
Surafel

Attachments:

conflict-handling-onCopy-from-v4.patchtext/x-patch; charset=US-ASCII; name=conflict-handling-onCopy-from-v4.patchDownload+238-16
#13Surafel Temesgen
surafel3000@gmail.com
In reply to: Andres Freund (#10)
Re: Conflict handling for COPY FROM

On Sat, Feb 16, 2019 at 8:24 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:

COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';

This doesn't seem to address Robert's point that a log file requires to
be super user only, which seems to restrict the feature more than
necessary?

- Andres

I think having write permission on specified directory is enough.
we use out put file name in COPY TO similarly.

regards
Surafel

#14Andres Freund
andres@anarazel.de
In reply to: Surafel Temesgen (#13)
Re: Conflict handling for COPY FROM

On February 19, 2019 3:05:37 AM PST, Surafel Temesgen <surafel3000@gmail.com> wrote:

On Sat, Feb 16, 2019 at 8:24 AM Andres Freund <andres@anarazel.de>
wrote:

Hi,

On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:

COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';

This doesn't seem to address Robert's point that a log file requires

to

be super user only, which seems to restrict the feature more than
necessary?

- Andres

I think having write permission on specified directory is enough.
we use out put file name in COPY TO similarly.

Err, what? Again, that requires super user permissions (in contrast to copy from/to stdin/out). Backends run as the user postgres runs under - it will always have write permissions to at least the entire data directory. I think not addressing this just about guarantees the feature will be rejected.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#15Surafel Temesgen
surafel3000@gmail.com
In reply to: Andres Freund (#14)
Re: Conflict handling for COPY FROM

On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <andres@anarazel.de> wrote:

Err, what? Again, that requires super user permissions (in contrast to
copy from/to stdin/out). Backends run as the user postgres runs under

okay i see it now and modified the patch similarly

regards
Surafel

Attachments:

conflict-handling-onCopy-from-v5.patchtext/x-patch; charset=US-ASCII; name=conflict-handling-onCopy-from-v5.patchDownload+243-16
#16Andrew Dunstan
andrew@dunslane.net
In reply to: Surafel Temesgen (#15)
Re: Conflict handling for COPY FROM

On 2/20/19 8:01 AM, Surafel Temesgen wrote:

On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <andres@anarazel.de
<mailto:andres@anarazel.de>> wrote:

Err, what? Again, that requires super user permissions (in
contrast to copy from/to stdin/out). Backends run as the user
postgres runs under

 
okay i see it now and modified the patch similarly 

Why log to a file at all? We do have, you know, a database handy, where
we might more usefully log errors. You could usefully log the offending
row as an array of text, possibly.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#17Andres Freund
andres@anarazel.de
In reply to: Andrew Dunstan (#16)
Re: Conflict handling for COPY FROM

On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

On 2/20/19 8:01 AM, Surafel Temesgen wrote:

On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <andres@anarazel.de
<mailto:andres@anarazel.de>> wrote:

Err, what? Again, that requires super user permissions (in
contrast to copy from/to stdin/out). Backends run as the user
postgres runs under

 
okay i see it now and modified the patch similarly 

Why log to a file at all? We do have, you know, a database handy, where
we might more usefully log errors. You could usefully log the offending
row as an array of text, possibly.

Or even just return it as a row. CopyBoth is relatively widely supported these days.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#18David Steele
david@pgmasters.net
In reply to: Andres Freund (#17)
Re: Re: Conflict handling for COPY FROM

Hi Surafel,

On 2/20/19 8:03 PM, Andres Freund wrote:

On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

Why log to a file at all? We do have, you know, a database handy, where
we might more usefully log errors. You could usefully log the offending
row as an array of text, possibly.

Or even just return it as a row. CopyBoth is relatively widely supported these days.

This patch no longer applies so marked Waiting on Author.

Also, it appears that you have some comments from Andrew and Andres that
you should reply to.

Regards,
--
-David
david@pgmasters.net

#19Andres Freund
andres@anarazel.de
In reply to: David Steele (#18)
Re: Conflict handling for COPY FROM

Hi,

On 2019-03-25 12:50:13 +0400, David Steele wrote:

On 2/20/19 8:03 PM, Andres Freund wrote:

On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

Why log to a file at all? We do have, you know, a database handy, where
we might more usefully log errors. You could usefully log the offending
row as an array of text, possibly.

Or even just return it as a row. CopyBoth is relatively widely supported these days.

This patch no longer applies so marked Waiting on Author.

Also, it appears that you have some comments from Andrew and Andres that you
should reply to.

As nothing has happened the last weeks, I've now marked this as
returned with feedback.

- Andres

#20Surafel Temesgen
surafel3000@gmail.com
In reply to: Andres Freund (#17)
Re: Conflict handling for COPY FROM

On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <andres@anarazel.de> wrote:

On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <
andrew.dunstan@2ndquadrant.com> wrote:

On 2/20/19 8:01 AM, Surafel Temesgen wrote:

On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <andres@anarazel.de
<mailto:andres@anarazel.de>> wrote:

Err, what? Again, that requires super user permissions (in
contrast to copy from/to stdin/out). Backends run as the user
postgres runs under

okay i see it now and modified the patch similarly

Why log to a file at all? We do have, you know, a database handy, where
we might more usefully log errors. You could usefully log the offending
row as an array of text, possibly.

Or even just return it as a row. CopyBoth is relatively widely supported
these days.

hello,
i think generating warning about it also sufficiently meet its propose of
notifying user about skipped record with existing logging facility
and we use it for similar propose in other place too. The different
i see is the number of warning that can be generated

In addition to the above change in the attached patch i also change
the syntax to ERROR LIMIT because it is no longer only skip
unique and exclusion constrain violation
regards
Surafel

Attachments:

conflict-handling-onCopy-from-v6.patchtext/x-patch; charset=US-ASCII; name=conflict-handling-onCopy-from-v6.patchDownload+196-19
#21Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Surafel Temesgen (#20)
#22Alexey Kondratov
a.kondratov@postgrespro.ru
In reply to: Alvaro Herrera (#21)
#23Surafel Temesgen
surafel3000@gmail.com
In reply to: Alexey Kondratov (#22)
#24Anthony Nowocien
anowocien@gmail.com
In reply to: Surafel Temesgen (#23)
#25Thomas Munro
thomas.munro@gmail.com
In reply to: Surafel Temesgen (#20)
#26Surafel Temesgen
surafel3000@gmail.com
In reply to: Alexey Kondratov (#22)
#27Thomas Munro
thomas.munro@gmail.com
In reply to: Surafel Temesgen (#26)
#28Anthony Nowocien
anowocien@gmail.com
In reply to: Thomas Munro (#27)
#29Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Surafel Temesgen (#26)
#30Surafel Temesgen
surafel3000@gmail.com
In reply to: Alvaro Herrera (#29)
#31Alexey Kondratov
a.kondratov@postgrespro.ru
In reply to: Surafel Temesgen (#30)
#32Surafel Temesgen
surafel3000@gmail.com
In reply to: Alexey Kondratov (#31)
#33Alexey Kondratov
a.kondratov@postgrespro.ru
In reply to: Surafel Temesgen (#32)
#34Surafel Temesgen
surafel3000@gmail.com
In reply to: Alexey Kondratov (#33)
#35Alexey Kondratov
a.kondratov@postgrespro.ru
In reply to: Surafel Temesgen (#34)
#36Surafel Temesgen
surafel3000@gmail.com
In reply to: Alexey Kondratov (#35)
#37asaba.takanori@fujitsu.com
asaba.takanori@fujitsu.com
In reply to: Surafel Temesgen (#36)
#38Surafel Temesgen
surafel3000@gmail.com
In reply to: asaba.takanori@fujitsu.com (#37)
#39Surafel Temesgen
surafel3000@gmail.com
In reply to: asaba.takanori@fujitsu.com (#37)
#40Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Surafel Temesgen (#39)
#41Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tatsuo Ishii (#40)
#42Surafel Temesgen
surafel3000@gmail.com
In reply to: Tatsuo Ishii (#41)
#43Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Surafel Temesgen (#42)
#44Surafel Temesgen
surafel3000@gmail.com
In reply to: Tatsuo Ishii (#43)
#45asaba.takanori@fujitsu.com
asaba.takanori@fujitsu.com
In reply to: Surafel Temesgen (#39)
#46Surafel Temesgen
surafel3000@gmail.com
In reply to: asaba.takanori@fujitsu.com (#45)
#47Alexey Kondratov
a.kondratov@postgrespro.ru
In reply to: Surafel Temesgen (#46)
#48asaba.takanori@fujitsu.com
asaba.takanori@fujitsu.com
In reply to: Surafel Temesgen (#46)
#49Surafel Temesgen
surafel3000@gmail.com
In reply to: asaba.takanori@fujitsu.com (#48)
#50asaba.takanori@fujitsu.com
asaba.takanori@fujitsu.com
In reply to: Surafel Temesgen (#49)
#51Tom Lane
tgl@sss.pgh.pa.us
In reply to: Surafel Temesgen (#49)
#52Surafel Temesgen
surafel3000@gmail.com
In reply to: Tom Lane (#51)
#53David Steele
david@pgmasters.net
In reply to: Surafel Temesgen (#52)
#54David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#51)