[PATCH] COPY command's data format option allows only lowercase csv, text or binary

Started by Bharath Rupireddyover 5 years ago7 messages
#1Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
1 attachment(s)

Hi,

COPY command's FORMAT option allows only all lowercase csv, text or
binary, this is true because strcmp is being used while parsing these
values.

It would be nice if the uppercase or combination of lower and upper
case format options such as CSV, TEXT, BINARY, Csv, Text, Binary so
on. is also allowed.

To achieve this pg_strcasecmp() is used instead of strcmp.

Attached is a patch having above changes.

Request the community to review the patch, if it makes sense.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v1-0001-COPY-command-s-data-format-option-allows-only-low.patchapplication/octet-stream; name=v1-0001-COPY-command-s-data-format-option-allows-only-low.patchDownload
From f03a91f0787d1ed053e64ea885fbacaed9804a73 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Wed, 24 Jun 2020 15:14:43 +0530
Subject: [PATCH v1] COPY command's data format option allows only lowercase
 csv/text/binary

COPY command's FORMAT option allows only all lowercase csv, text
or binary, this is true because strcmp is being used while parsing
these values. Use pg_strcasecmp() instead of strcmp().
---
 src/backend/commands/copy.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b1fd6d4cc..2416e76cfc 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1133,11 +1133,11 @@ ProcessCopyOptions(ParseState *pstate,
 						 errmsg("conflicting or redundant options"),
 						 parser_errposition(pstate, defel->location)));
 			format_specified = true;
-			if (strcmp(fmt, "text") == 0)
+			if (pg_strcasecmp(fmt, "text") == 0)
 				 /* default format */ ;
-			else if (strcmp(fmt, "csv") == 0)
+			else if (pg_strcasecmp(fmt, "csv") == 0)
 				cstate->csv_mode = true;
-			else if (strcmp(fmt, "binary") == 0)
+			else if (pg_strcasecmp(fmt, "binary") == 0)
 				cstate->binary = true;
 			else
 				ereport(ERROR,
-- 
2.25.1

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bharath Rupireddy (#1)
Re: [PATCH] COPY command's data format option allows only lowercase csv, text or binary

Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:

COPY command's FORMAT option allows only all lowercase csv, text or
binary, this is true because strcmp is being used while parsing these
values.

This is nonsense, actually:

regression=# create table foo (f1 int);
CREATE TABLE
regression=# copy foo from stdin (format CSV);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.

As that shows, there's already a round of lowercasing done by the parser.
The only way that strcasecmp in copy.c would be useful is if you wanted to
accept things like
copy foo from stdin (format "CSV");
I don't find that to be a terribly good idea. The normal implication
of quoting is that it *prevents* case folding, so why should that
happen anyway?

More generally, though, why would we want to change this policy only
here? I believe we're reasonably consistent about letting the parser
do any required down-casing and then just checking keyword matches
with strcmp.

regards, tom lane

#3Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#2)
Re: [PATCH] COPY command's data format option allows only lowercase csv, text or binary

On Wed, Jun 24, 2020 at 10:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

More generally, though, why would we want to change this policy only
here? I believe we're reasonably consistent about letting the parser
do any required down-casing and then just checking keyword matches
with strcmp.

I've had the feeling in the past that our use of pg_strcasecmp() was a
bit random. Looking through the output of 'git grep pg_strcasecmp', it
seems like we don't typically don't use it on option names, but
sometimes we use it on option values. For instance, DefineCollation()
uses pg_strcasecmp() on the collproviderstr, and DefineType() uses it
on storageEl; and also, not to be overlooked, defGetBoolean() uses it
when matching true/false/on/off, which probably affects a lot of
places. On the other hand, ExplainQuery() matches the format using
plain old strcmp(), despite indirectly using pg_strcasecmp() for the
Boolean parameters. So I guess I'm not really convinced that there is
all that much consistency here.

Mind you, I'm not sure whether or not anything really needs to be
changed, or exactly what ought to be done. I'm just making the
observation that it might not be as consistent as you may think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#3)
Re: [PATCH] COPY command's data format option allows only lowercase csv, text or binary

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Jun 24, 2020 at 10:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

More generally, though, why would we want to change this policy only
here? I believe we're reasonably consistent about letting the parser
do any required down-casing and then just checking keyword matches
with strcmp.

... Mind you, I'm not sure whether or not anything really needs to be
changed, or exactly what ought to be done. I'm just making the
observation that it might not be as consistent as you may think.

Yeah, I'm sure there are a few inconsistencies. We previously made a
pass to get rid of pg_strcasecmp for anything that had been through
the parser's downcasing (commit fb8697b31) but I wouldn't be surprised
if that missed a few cases, or if new ones have snuck in. Anyway,
"don't use pg_strcasecmp unnecessarily" was definitely the agreed-to
policy as of Jan 2018.

My vague recollection is that there are a few exceptions (defGetBoolean
may well be one of them) where pg_strcasecmp still seemed necessary
because the input might not have come through the parser in some usages.

regards, tom lane

#5Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#4)
Re: [PATCH] COPY command's data format option allows only lowercase csv, text or binary

On Wed, Jun 24, 2020 at 12:55:22PM -0400, Tom Lane wrote:

Yeah, I'm sure there are a few inconsistencies. We previously made a
pass to get rid of pg_strcasecmp for anything that had been through
the parser's downcasing (commit fb8697b31) but I wouldn't be surprised
if that missed a few cases, or if new ones have snuck in. Anyway,
"don't use pg_strcasecmp unnecessarily" was definitely the agreed-to
policy as of Jan 2018.

0d8c9c1 has introduced some in parse_basebackup_options() for the
new manifest option, and fe30e7e for AlterType(), no?

My vague recollection is that there are a few exceptions (defGetBoolean
may well be one of them) where pg_strcasecmp still seemed necessary
because the input might not have come through the parser in some usages.

Yep, there were a couple of exceptions. What was done at this time
was a case-by-case lookup to check what came only from the parser.
--
Michael

#6Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Tom Lane (#2)
Re: [PATCH] COPY command's data format option allows only lowercase csv, text or binary

As that shows, there's already a round of lowercasing done by the parser.
The only way that strcasecmp in copy.c would be useful is if you wanted to
accept things like
copy foo from stdin (format "CSV");
I don't find that to be a terribly good idea. The normal implication
of quoting is that it *prevents* case folding, so why should that
happen anyway?

I was able to know that the parser does the lowercasing for other
parts of the query,
what I missed in my understanding is that about the proper usage of quoting.

Thanks for letting me know this point.

I agree with the above understanding to not change that behavior.

Please ignore this patch.

Thank you all for your responses.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#7Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#5)
Re: [PATCH] COPY command's data format option allows only lowercase csv, text or binary

On Thu, Jun 25, 2020 at 11:07:33AM +0900, Michael Paquier wrote:

0d8c9c1 has introduced some in parse_basebackup_options() for the
new manifest option, and fe30e7e for AlterType(), no?

Please forget this one. I had a moment of brain fade. Those have
been added for the option values, and on the option names we use
directly strcmp(), so I am not actually seeing a code path on HEAD
where we use pg_strcasecmp for something coming only from the parser.
--
Michael