BUG #2221: Bad delimiters allowed in COPY ... TO

Started by David Fetterabout 20 years ago20 messagesbugs
Jump to latest
#1David Fetter
david@fetter.org

The following bug has been logged online:

Bug reference: 2221
Logged by: David Fetter
Email address: david@fetter.org
PostgreSQL version: all
Operating system: all
Description: Bad delimiters allowed in COPY ... TO
Details:

This came up while I was testing my pg_dump "specify DELIMITER AS and/or
NULL AS" patch.

Below is a repro using newline. A similar problem happens when you specify
the delimiter as a backslash. Should COPY simply not allow these
delimiters? What about carriage return? Just in general, what bytes other
than null ought COPY to reject out of hand?

Also, what regression tests do we want to put in in order to ensure that
COPY ... TO generates output that COPY ... FROM can understand?

CREATE TABLE pqxxevents (
"year" integer,
event text
);

COPY pqxxevents ("year", event) FROM stdin DELIMITER AS '^M';
2010^MOdyssey Two
2038^Mtime_t overflow
1971^Mjtv
1981^MC:\\>
1997^MAsian crisis
1999^M\N
1978^Mbloody\t\tcold
1989^MOde an die Freiheit
2001^MNew millennium
2001^M'911' WTC attack
2001^MA Space Odyssey
2002^Mlibpqxx
3001^MFinal Odyssey
\.

#2David Fetter
david@fetter.org
In reply to: David Fetter (#1)
Re: BUG #2221: Bad delimiters allowed in COPY ... TO

On Sun, Jan 29, 2006 at 09:50:08PM +0000, David Fetter wrote:

The following bug has been logged online:

Bug reference: 2221
Logged by: David Fetter
Email address: david@fetter.org
PostgreSQL version: all
Operating system: all
Description: Bad delimiters allowed in COPY ... TO
Details:

This came up while I was testing my pg_dump "specify DELIMITER AS and/or
NULL AS" patch.

Folks,

Please pardon the self-followup. I believe that this patch fixes the
problem in COPY.

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 415 235 3778

Remember to vote!

Attachments:

forbid_badchars_copy.difftext/plain; charset=us-asciiDownload+6-0
#3David Fetter
david@fetter.org
In reply to: David Fetter (#2)
Re: [PATCHES] BUG #2221: Bad delimiters allowed in COPY ... TO

On Sun, Jan 29, 2006 at 04:41:43PM -0800, David Fetter wrote:

On Sun, Jan 29, 2006 at 09:50:08PM +0000, David Fetter wrote:

The following bug has been logged online:

Bug reference: 2221
Logged by: David Fetter
Email address: david@fetter.org
PostgreSQL version: all
Operating system: all
Description: Bad delimiters allowed in COPY ... TO
Details:

This came up while I was testing my pg_dump "specify DELIMITER AS and/or
NULL AS" patch.

Folks,

Please pardon the self-followup. I believe that this patch fixes
the problem in COPY.

Another followup, this time with the comment done right.

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 415 235 3778

Remember to vote!

Attachments:

forbid_badchars_copy.difftext/plain; charset=us-asciiDownload+6-0
#4Neil Conway
neilc@samurai.com
In reply to: David Fetter (#3)
Re: [PATCHES] BUG #2221: Bad delimiters allowed in COPY ...

On Sun, 2006-01-29 at 17:03 -0800, David Fetter wrote:

Another followup, this time with the comment done right.

+       /* Disallow the forbidden_delimiter strings */
+       if (strcspn(cstate->delim, BADCHARS) != 1)
+               elog(ERROR, "COPY delimiter cannot be %#02x",
+                        *cstate->delim);
+ 

The comment is still wrong: referencing "forbidden_delimiter" makes it
sound like there is something named forbidden_delimiter, but there is
not (at least in the patch as submitted).

The patch should also use ereport rather than elog, because this error
message might reasonably be encountered by the user.

-Neil

#5David Fetter
david@fetter.org
In reply to: Neil Conway (#4)
Re: [PATCHES] BUG #2221: Bad delimiters allowed in COPY ...

On Sun, Jan 29, 2006 at 10:20:47PM -0500, Neil Conway wrote:

On Sun, 2006-01-29 at 17:03 -0800, David Fetter wrote:

Another followup, this time with the comment done right.

+       /* Disallow the forbidden_delimiter strings */
+       if (strcspn(cstate->delim, BADCHARS) != 1)
+               elog(ERROR, "COPY delimiter cannot be %#02x",
+                        *cstate->delim);
+ 

The comment is still wrong: referencing "forbidden_delimiter" makes
it sound like there is something named forbidden_delimiter, but
there is not (at least in the patch as submitted).

The patch should also use ereport rather than elog, because this
error message might reasonably be encountered by the user.

Patch with BADCHARS attached :)

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 415 235 3778

Remember to vote!

Attachments:

forbid_badchars_copy.difftext/plain; charset=us-asciiDownload+8-0
#6Andrew Dunstan
andrew@dunslane.net
In reply to: David Fetter (#5)
Re: [PATCHES] BUG #2221: Bad delimiters allowed in COPY ...

David Fetter wrote:

+ 	/* Disallow BADCHARS characters */
+ 	if (strcspn(cstate->delim, BADCHARS) != 1)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("COPY delimiter cannot be \"%#02x\"",
+ 						*cstate->delim)));
+ 

Is ERRCODE_FEATURE_NOT_SUPPORTED the right errcode? This isn't a
missing feature; we are performing a sanity check here. We can
reasonably expect never to support CR, LF or \ as the text delimiter.
Maybe ERRCODE_INVALID_PARAMETER_VALUE ? Or maybe we need a new one.

Also, I would probably make the format %#.02x so the result would look
like 0x0d (for a CR).

(I bet David never thought there would so much fuss over a handful of
lines of code)

cheers

andrew

#7David Fetter
david@fetter.org
In reply to: Andrew Dunstan (#6)
Re: [PATCHES] BUG #2221: Bad delimiters allowed in COPY ...

On Mon, Jan 30, 2006 at 08:21:34AM -0500, Andrew Dunstan wrote:

David Fetter wrote:

+ 	/* Disallow BADCHARS characters */
+ 	if (strcspn(cstate->delim, BADCHARS) != 1)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("COPY delimiter cannot be \"%#02x\"",
+ 						*cstate->delim)));
+ 

Is ERRCODE_FEATURE_NOT_SUPPORTED the right errcode? This isn't a
missing feature; we are performing a sanity check here. We can
reasonably expect never to support CR, LF or \ as the text
delimiter.

I guess that depends on whether we ever plan to allow people to set
the output record separator to something other than CR?LF.

Maybe ERRCODE_INVALID_PARAMETER_VALUE ? Or maybe we need a new one.

Also, I would probably make the format %#.02x so the result would
look like 0x0d (for a CR).

(I bet David never thought there would so much fuss over a handful
of lines of code)

Actually, I'm happy to see it's getting QA. COPY is something that
has Consequences��� if anything goes wrong with it, so I'd rather do
best efforts up front to get it right. :)

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 415 235 3778

Remember to vote!

#8Bruce Momjian
bruce@momjian.us
In reply to: David Fetter (#7)
Re: [PATCHES] BUG #2221: Bad delimiters allowed in COPY ...

Is ERRCODE_FEATURE_NOT_SUPPORTED the right errcode? This isn't a
missing feature; we are performing a sanity check here. We can
reasonably expect never to support CR, LF or \ as the text
delimiter.

I guess that depends on whether we ever plan to allow people to set
the output record separator to something other than CR?LF.

Maybe ERRCODE_INVALID_PARAMETER_VALUE ? Or maybe we need a new one.

Agreed. Right now it is invalid and there are no plans to support other
values for end-of-line. I will make the change when the patch is
applied.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#9Bruce Momjian
bruce@momjian.us
In reply to: David Fetter (#5)
Re: [PATCHES] BUG #2221: Bad delimiters allowed in COPY ...

Uh, couldn't the delimiter be a backslash in CVS mode?

+ #define BADCHARS "\r\n\\"

Also, should we disable DELIMITER and NULL from sharing characters?

---------------------------------------------------------------------------

David Fetter wrote:

On Sun, Jan 29, 2006 at 10:20:47PM -0500, Neil Conway wrote:

On Sun, 2006-01-29 at 17:03 -0800, David Fetter wrote:

Another followup, this time with the comment done right.

+       /* Disallow the forbidden_delimiter strings */
+       if (strcspn(cstate->delim, BADCHARS) != 1)
+               elog(ERROR, "COPY delimiter cannot be %#02x",
+                        *cstate->delim);
+ 

The comment is still wrong: referencing "forbidden_delimiter" makes
it sound like there is something named forbidden_delimiter, but
there is not (at least in the patch as submitted).

The patch should also use ereport rather than elog, because this
error message might reasonably be encountered by the user.

Patch with BADCHARS attached :)

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 415 235 3778

Remember to vote!

[ Attachment, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#10David Fetter
david@fetter.org
In reply to: Bruce Momjian (#9)
Re: [PATCHES] BUG #2221: Bad delimiters allowed in COPY ...

On Tue, Jan 31, 2006 at 08:03:41PM -0500, Bruce Momjian wrote:

Uh, couldn't the delimiter be a backslash in CVS mode?

I don't think so. Folks?

Anyhow, if there are different sets, I could do something like:

#define BADCHARS "\r\n\\"
#define BADCHARS_CSV "\r\n"

and then check for csv_mode, etc.

+ #define BADCHARS "\r\n\\"

Also, should we disable DELIMITER and NULL from sharing characters?

That's on about line 916, post-patch:

/* Don't allow the delimiter to appear in the null string. */
if (strchr(cstate->null_print, cstate->delim[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY delimiter must not appear in the NULL specification")));

I suppose that a different error code might be The Right Thing��� here.

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 415 235 3778

Remember to vote!

#11David Fetter
david@fetter.org
In reply to: David Fetter (#10)
Re: [PATCHES] BUG #2221: Bad delimiters allowed in COPY ...

On Tue, Jan 31, 2006 at 09:50:26PM -0600, Andrew Dunstan wrote:

David Fetter said:

On Tue, Jan 31, 2006 at 08:03:41PM -0500, Bruce Momjian wrote:

Uh, couldn't the delimiter be a backslash in CVS mode?

I don't think so. Folks?

Using backslash as a delimiter in CSV would be odd, to say the least. As an
escape char it is occasionally used, but not as a delimiter in my
experience. Maybe we should apply the "be liberal in what you accept" rule,
but I think this would be stretching it.

<aol>So do I.</aol>

Also, should we disable DELIMITER and NULL from sharing characters?

That's on about line 916, post-patch:

/* Don't allow the delimiter to appear in the null string. */
if (strchr(cstate->null_print, cstate->delim[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY delimiter must not appear in the NULL
specification")));

I suppose that a different error code might be The Right Thing���
here.

ERRCODE_WHAT WERE_YOU_THINKING ?

That's an excellent candidate, or maybe ERRCODE_INVALID_PARAMETER_VALUE.
My vote is for ERRCODE_D00D_WTF ;)

Maybe we need an error code for mutually incompatible param values.

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 415 235 3778

Remember to vote!

#12Andrew Dunstan
andrew@dunslane.net
In reply to: David Fetter (#10)
Re: [PATCHES] BUG #2221: Bad delimiters allowed in COPY ...

David Fetter said:

On Tue, Jan 31, 2006 at 08:03:41PM -0500, Bruce Momjian wrote:

Uh, couldn't the delimiter be a backslash in CVS mode?

I don't think so. Folks?

Using backslash as a delimiter in CSV would be odd, to say the least. As an
escape char it is occasionally used, but not as a delimiter in my
experience. Maybe we should apply the "be liberal in what you accept" rule,
but I think this would be stretching it.

Anyhow, if there are different sets, I could do something like:

#define BADCHARS "\r\n\\"
#define BADCHARS_CSV "\r\n"

and then check for csv_mode, etc.

+ #define BADCHARS "\r\n\\"

Also, should we disable DELIMITER and NULL from sharing characters?

That's on about line 916, post-patch:

/* Don't allow the delimiter to appear in the null string. */
if (strchr(cstate->null_print, cstate->delim[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY delimiter must not appear in the NULL
specification")));

I suppose that a different error code might be The Right Thing™ here.

ERRCODE_WHAT WERE_YOU_THINKING ?

cheers

andrew

#13Bruce Momjian
bruce@momjian.us
In reply to: David Fetter (#11)
Re: [BUGS] BUG #2221: Bad delimiters allowed in COPY ...

On Tue, Jan 31, 2006 at 09:50:26PM -0600, Andrew Dunstan wrote:

Also, should we disable DELIMITER and NULL from sharing characters?

That's on about line 916, post-patch:

/* Don't allow the delimiter to appear in the null string. */
if (strchr(cstate->null_print, cstate->delim[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY delimiter must not appear in the NULL
specification")));

I suppose that a different error code might be The Right Thing���
here.

ERRCODE_WHAT WERE_YOU_THINKING ?

That's an excellent candidate, or maybe ERRCODE_INVALID_PARAMETER_VALUE.
My vote is for ERRCODE_D00D_WTF ;)

Maybe we need an error code for mutually incompatible param values.

Attached is a patch that errors for \r and \n in delimiter and null. I
kept the ERRCODE_FEATURE_NOT_SUPPORTED error code because that is what
all the other error tests use in the copy code in that area. I did
nothing with backslash.

FYI, David, my email reader is having problems reading your emails
because of this line:

Content-Type: text/plain; charset=iso_8859_1

My understanding is this should be iso-8859-1, with dashes.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Attachments:

/pgpatches/copytext/plainDownload+13-0
#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#13)
Re: [BUGS] BUG #2221: Bad delimiters allowed in COPY ...

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Attached is a patch that errors for \r and \n in delimiter and null. I
kept the ERRCODE_FEATURE_NOT_SUPPORTED error code because that is what
all the other error tests use in the copy code in that area.

I'd go with INVALID_PARAMETER_VALUE, I think. ISTM that
FEATURE_NOT_SUPPORTED is appropriate for places where we might someday
support the case the error is rejecting. For instance the error just
above your patch is for a multi-character delimiter string. That isn't
completely senseless, it's just not implemented. But we're not ever
going to allow a delimiter setting that conflicts with end-of-line,
and I don't foresee allowing some other value for end-of-line ;-)
... so this check isn't going to be removed someday.

regards, tom lane

#15David Fetter
david@fetter.org
In reply to: Tom Lane (#14)
Re: [BUGS] BUG #2221: Bad delimiters allowed in COPY ...

On Wed, Feb 01, 2006 at 01:16:08AM -0500, Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Attached is a patch that errors for \r and \n in delimiter and
null. I kept the ERRCODE_FEATURE_NOT_SUPPORTED error code because
that is what all the other error tests use in the copy code in
that area.

I'd go with INVALID_PARAMETER_VALUE, I think. ISTM that
FEATURE_NOT_SUPPORTED is appropriate for places where we might
someday support the case the error is rejecting. For instance the
error just above your patch is for a multi-character delimiter
string. That isn't completely senseless, it's just not implemented.
But we're not ever going to allow a delimiter setting that conflicts
with end-of-line, and I don't foresee allowing some other value for
end-of-line ;-) ... so this check isn't going to be removed someday.

I don't know why you're saying that the EOL character will never be
changeable. Other DBs (yes, I know that's not an argument for doing
this, but please bear with me) let you set the "field separator" aka
our DELIMITER and "record separator" aka our newline (or CRLF, in some
cases. Oy!).

Anyhow, Bruce's patch still allows backslash as a delimiter, which can
cause *all* kinds of fun if not disallowed.

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 415 235 3778

Remember to vote!

#16Andrew Dunstan
andrew@dunslane.net
In reply to: David Fetter (#15)
Re: [BUGS] BUG #2221: Bad delimiters allowed in COPY ...

David Fetter wrote:

Anyhow, Bruce's patch still allows backslash as a delimiter, which can
cause *all* kinds of fun if not disallowed.

Yes, I'm puzzled by that. I can't really see any case for allowing it.
And if we do, it should only be allowed in CSV mode, where its use will
be merely perverse rather than broken.

cheers

andrew

#17Bruce Momjian
bruce@momjian.us
In reply to: David Fetter (#15)
Re: [BUGS] BUG #2221: Bad delimiters allowed in COPY ...

David Fetter wrote:

On Wed, Feb 01, 2006 at 01:16:08AM -0500, Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Attached is a patch that errors for \r and \n in delimiter and
null. I kept the ERRCODE_FEATURE_NOT_SUPPORTED error code because
that is what all the other error tests use in the copy code in
that area.

I'd go with INVALID_PARAMETER_VALUE, I think. ISTM that
FEATURE_NOT_SUPPORTED is appropriate for places where we might
someday support the case the error is rejecting. For instance the
error just above your patch is for a multi-character delimiter
string. That isn't completely senseless, it's just not implemented.
But we're not ever going to allow a delimiter setting that conflicts
with end-of-line, and I don't foresee allowing some other value for
end-of-line ;-) ... so this check isn't going to be removed someday.

I don't know why you're saying that the EOL character will never be
changeable. Other DBs (yes, I know that's not an argument for doing
this, but please bear with me) let you set the "field separator" aka
our DELIMITER and "record separator" aka our newline (or CRLF, in some
cases. Oy!).

Anyhow, Bruce's patch still allows backslash as a delimiter, which can
cause *all* kinds of fun if not disallowed.

OK, updated patch, which disallows backslash as a delimiter, and updated
error return code.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Attachments:

/pgpatches/copytext/plainDownload+19-0
#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#17)
Re: [BUGS] BUG #2221: Bad delimiters allowed in COPY ...

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Attached is a patch that errors for \r and \n in delimiter and
null.

I am not convinced that this is a bug. Can you prove that there is no
use-case for asking COPY to emit data in this style? Sure, COPY itself
couldn't read it, but people sometimes feed COPY output to other
programs...

regards, tom lane

#19Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#18)
Re: [BUGS] BUG #2221: Bad delimiters allowed in COPY ...

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Attached is a patch that errors for \r and \n in delimiter and
null.

I am not convinced that this is a bug. Can you prove that there is no
use-case for asking COPY to emit data in this style? Sure, COPY itself
couldn't read it, but people sometimes feed COPY output to other
programs...

FYI, Tom, old email from Feb 1, 2006, from your server:

Received: from sss.pgh.pa.us (root@sss.pgh.pa.us [66.207.139.130])
by momjian.us (8.11.6/8.11.6) with ESMTP id l14HX7l21306
for <pgman@candle.pha.pa.us>; Sun, 4 Feb 2007 12:33:08 -0500 (EST)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
by sss.pgh.pa.us (8.13.6/8.13.6) with ESMTP id l14HWWoj020066;
Sun, 4 Feb 2007 12:32:32 -0500 (EST)
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: David Fetter <david@fetter.org>, Andrew Dunstan <andrew@dunslane.net>,
neilc@samurai.com, PostgreSQL-patches <pgsql-patches@postgresql.org>
Subject: Re: [PATCHES] [BUGS] BUG #2221: Bad delimiters allowed in COPY ...
In-Reply-To: <200602011410.k11EApH14839@candle.pha.pa.us>
References: <200602011410.k11EApH14839@candle.pha.pa.us>
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
message dated "Wed, 01 Feb 2006 09:10:51 -0500"
----------------

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#19)
Re: [BUGS] BUG #2221: Bad delimiters allowed in COPY ...

Bruce Momjian <bruce@momjian.us> writes:

FYI, Tom, old email from Feb 1, 2006, from your server:

Wups. Sorry about that --- somehow I confused that with the current
thread about copy delimiters. Should have looked harder at the date.

regards, tom lane