Refactoring identifier checks to consistently use strcmp

Started by Daniel Gustafssonabout 9 years ago41 messageshackers

daniel@yesql.se

about 9 years ago

Testing DefElem options is done with both strcmp() and pg_strcasecmp() a bit
mixed. Since the option defnames are all lowercased, either via IDENT, keyword
rules or “by hand” with makeString(), using strcmp() is safe (and assumed to be
so in quite a lot of places).

While it’s not incorrect per se to use pg_strcasecmp(), it has the potential to
hide a DefElem created with a mixed-case defname where it in other places is
expected to be in lowercase, which may lead to subtle bugs.

The attached patch refactors to use strcmp() consistently for option processing
in the command code as a pre-emptive belts+suspenders move against such subtle
bugs and to make the code more consistent. Also reorders a few checks to have
all in the same “format” and removes a comment related to the above.

Tested with randomizing case on options in make check (not included in patch).

cheers ./daniel

Alvaro Herrera

alvherre@2ndquadrant.com

about 9 years ago

In reply to: Daniel Gustafsson (#1)

Re: Refactoring identifier checks to consistently use strcmp

Daniel Gustafsson wrote:

Testing DefElem options is done with both strcmp() and pg_strcasecmp() a bit
mixed. Since the option defnames are all lowercased, either via IDENT, keyword
rules or “by hand” with makeString(), using strcmp() is safe (and assumed to be
so in quite a lot of places).

While it’s not incorrect per se to use pg_strcasecmp(), it has the potential to
hide a DefElem created with a mixed-case defname where it in other places is
expected to be in lowercase, which may lead to subtle bugs.

The attached patch refactors to use strcmp() consistently for option processing
in the command code as a pre-emptive belts+suspenders move against such subtle
bugs and to make the code more consistent. Also reorders a few checks to have
all in the same “format” and removes a comment related to the above.

Tested with randomizing case on options in make check (not included in patch).

Does it work correctly in the Turkish locale?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Daniel Gustafsson

daniel@yesql.se

about 9 years ago

In reply to: Alvaro Herrera (#2)

Re: Refactoring identifier checks to consistently use strcmp

On 04 Apr 2017, at 05:52, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Daniel Gustafsson wrote:

Testing DefElem options is done with both strcmp() and pg_strcasecmp() a bit
mixed. Since the option defnames are all lowercased, either via IDENT, keyword
rules or “by hand” with makeString(), using strcmp() is safe (and assumed to be
so in quite a lot of places).

While it’s not incorrect per se to use pg_strcasecmp(), it has the potential to
hide a DefElem created with a mixed-case defname where it in other places is
expected to be in lowercase, which may lead to subtle bugs.

The attached patch refactors to use strcmp() consistently for option processing
in the command code as a pre-emptive belts+suspenders move against such subtle
bugs and to make the code more consistent. Also reorders a few checks to have
all in the same “format” and removes a comment related to the above.

Tested with randomizing case on options in make check (not included in patch).

Does it work correctly in the Turkish locale?

Yes, running make check with random case defnames under tr_TR works fine.

cheers ./daniel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 8 years ago

In reply to: Daniel Gustafsson (#3)

Re: Refactoring identifier checks to consistently use strcmp

On 04/04/2017 10:13 AM, Daniel Gustafsson wrote:

On 04 Apr 2017, at 05:52, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Daniel Gustafsson wrote:

Testing DefElem options is done with both strcmp() and pg_strcasecmp() a bit
mixed. Since the option defnames are all lowercased, either via IDENT, keyword
rules or “by hand” with makeString(), using strcmp() is safe (and assumed to be
so in quite a lot of places).

While it’s not incorrect per se to use pg_strcasecmp(), it has the potential to
hide a DefElem created with a mixed-case defname where it in other places is
expected to be in lowercase, which may lead to subtle bugs.

The attached patch refactors to use strcmp() consistently for option processing
in the command code as a pre-emptive belts+suspenders move against such subtle
bugs and to make the code more consistent. Also reorders a few checks to have
all in the same “format” and removes a comment related to the above.

Tested with randomizing case on options in make check (not included in patch).

Does it work correctly in the Turkish locale?

Yes, running make check with random case defnames under tr_TR works fine.

This no longer works:

postgres=# CREATE TEXT SEARCH DICTIONARY public.simple_dict (
TEMPLATE = pg_catalog.simple,
"STOPWORds" = english
);
ERROR: unrecognized simple dictionary parameter: "STOPWORds"

In hindsight, perhaps we should always have been more strict about that
to begin wtih, but let's not break backwards-compatibility without a
better reason. I didn't thoroughly check all of the cases here, to see
if there are more like this.

It'd be nice to have some kind of a rule on when pg_strcasecmp should be
used and when strcmp() is enough. Currently, by looking at the code, I
can't tell.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Heikki Linnakangas (#4)

Re: Refactoring identifier checks to consistently use strcmp

Heikki Linnakangas <hlinnaka@iki.fi> writes:

This no longer works:

postgres=# CREATE TEXT SEARCH DICTIONARY public.simple_dict (
TEMPLATE = pg_catalog.simple,
"STOPWORds" = english
);
ERROR: unrecognized simple dictionary parameter: "STOPWORds"

In hindsight, perhaps we should always have been more strict about that
to begin wtih, but let's not break backwards-compatibility without a
better reason. I didn't thoroughly check all of the cases here, to see
if there are more like this.

You have a point, but I'm not sure that this is such a bad compatibility
break as to be a reason not to change things to be more consistent.

It'd be nice to have some kind of a rule on when pg_strcasecmp should be
used and when strcmp() is enough. Currently, by looking at the code, I
can't tell.

My thought is that if we are looking at words that have been through the
parser, then it should *always* be plain strcmp; we should expect that
the parser already did the appropriate case-folding. If the user
prevented case-folding by double-quoting, I don't have a lot of sympathy
for any complaints about it. Generally speaking, what we're dealing with
here is things that are logically keywords but we did not wish to make
them real parser keywords. But in SQL, once you quote a keyword, it's
not a keyword at all anymore. So I think the argument that quoting
"stopwords" should be legal at all in this context is pretty weak,
and the argument that quoting a weirdly-cased version of it should
work is even weaker.

pg_strcasecmp would be appropriate, perhaps, if we're dealing with stuff
that somehow came in without going through the parser.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Tom Lane (#5)

Re: Refactoring identifier checks to consistently use strcmp

On Wed, Aug 16, 2017 at 11:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You have a point, but I'm not sure that this is such a bad compatibility
break as to be a reason not to change things to be more consistent.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Daniel Gustafsson

daniel@yesql.se

over 8 years ago

In reply to: Tom Lane (#5)

Re: Refactoring identifier checks to consistently use strcmp

On 16 Aug 2017, at 17:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

This no longer works:

postgres=# CREATE TEXT SEARCH DICTIONARY public.simple_dict (
TEMPLATE = pg_catalog.simple,
"STOPWORds" = english
);
ERROR: unrecognized simple dictionary parameter: "STOPWORds"

In hindsight, perhaps we should always have been more strict about that
to begin wtih, but let's not break backwards-compatibility without a
better reason. I didn't thoroughly check all of the cases here, to see
if there are more like this.

You have a point, but I'm not sure that this is such a bad compatibility
break as to be a reason not to change things to be more consistent.

I agree with this, but I admittedly have no idea how common the above case
would be in the wild.

It'd be nice to have some kind of a rule on when pg_strcasecmp should be
used and when strcmp() is enough. Currently, by looking at the code, I
can't tell.

My thought is that if we are looking at words that have been through the
parser, then it should *always* be plain strcmp; we should expect that
the parser already did the appropriate case-folding.

pg_strcasecmp would be appropriate, perhaps, if we're dealing with stuff
that somehow came in without going through the parser.

In that case it would be up to the consumer of the data to handle required
case-folding for the expected input, so pg_strcasecmp or strcmp depending on
situation.

cheers ./daniel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Daniel Gustafsson

daniel@yesql.se

over 8 years ago

In reply to: Daniel Gustafsson (#7)

Re: Refactoring identifier checks to consistently use strcmp

On 17 Aug 2017, at 11:08, Daniel Gustafsson <daniel@yesql.se> wrote:

On 16 Aug 2017, at 17:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

This no longer works:

postgres=# CREATE TEXT SEARCH DICTIONARY public.simple_dict (
TEMPLATE = pg_catalog.simple,
"STOPWORds" = english
);
ERROR: unrecognized simple dictionary parameter: "STOPWORds"

In hindsight, perhaps we should always have been more strict about that
to begin wtih, but let's not break backwards-compatibility without a
better reason. I didn't thoroughly check all of the cases here, to see
if there are more like this.

You have a point, but I'm not sure that this is such a bad compatibility
break as to be a reason not to change things to be more consistent.

I agree with this, but I admittedly have no idea how common the above case
would be in the wild.

It'd be nice to have some kind of a rule on when pg_strcasecmp should be
used and when strcmp() is enough. Currently, by looking at the code, I
can't tell.

My thought is that if we are looking at words that have been through the
parser, then it should *always* be plain strcmp; we should expect that
the parser already did the appropriate case-folding.

+1

pg_strcasecmp would be appropriate, perhaps, if we're dealing with stuff
that somehow came in without going through the parser.

In that case it would be up to the consumer of the data to handle required
case-folding for the expected input, so pg_strcasecmp or strcmp depending on
situation.

This patch has been marked “Waiting on Author”, but I’m not sure what the
concensus of this thread came to with regards to quoted keywords and backwards
compatibility. There seems to be a 2-1 vote for allowing a break, and forcing
all keywords out of the parser to be casefolded. Any other opinions?

cheers ./daniel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Michael Paquier

michael@paquier.xyz

over 8 years ago

In reply to: Daniel Gustafsson (#8)

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

On Tue, Sep 5, 2017 at 5:34 PM, Daniel Gustafsson <daniel@yesql.se> wrote:

On 17 Aug 2017, at 11:08, Daniel Gustafsson <daniel@yesql.se> wrote:

On 16 Aug 2017, at 17:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
My thought is that if we are looking at words that have been through the
parser, then it should *always* be plain strcmp; we should expect that
the parser already did the appropriate case-folding.

+1

pg_strcasecmp would be appropriate, perhaps, if we're dealing with stuff
that somehow came in without going through the parser.

In that case it would be up to the consumer of the data to handle required
case-folding for the expected input, so pg_strcasecmp or strcmp depending on
situation.

This patch has been marked “Waiting on Author”, but I’m not sure what the
concensus of this thread came to with regards to quoted keywords and backwards
compatibility. There seems to be a 2-1 vote for allowing a break, and forcing
all keywords out of the parser to be casefolded. Any other opinions?

This patch impacts the DDL grammar of aggregates, operators,
collations, text search, views, etc. Still I agree with the purpose of
this thread that it would be nice to get a more consistent behavior
even if it breaks some queries, so +1 for the argument with the
post-parser comparison which should use strcmp.

The patch needs a rebase, and there are a couple of places that need
an extra lookup I think:
$ git grep defname -- *.c | grep strcasecmp | wc -l
39

Switching to "waiting on author" for now.
Thanks,
--
Michael

#10

Daniel Gustafsson

daniel@yesql.se

over 8 years ago

In reply to: Michael Paquier (#9)

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

On 17 Nov 2017, at 03:31, Michael Paquier <michael.paquier@gmail.com> wrote:

On Tue, Sep 5, 2017 at 5:34 PM, Daniel Gustafsson <daniel@yesql.se> wrote:

On 17 Aug 2017, at 11:08, Daniel Gustafsson <daniel@yesql.se> wrote:

On 16 Aug 2017, at 17:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
My thought is that if we are looking at words that have been through the
parser, then it should *always* be plain strcmp; we should expect that
the parser already did the appropriate case-folding.

+1

pg_strcasecmp would be appropriate, perhaps, if we're dealing with stuff
that somehow came in without going through the parser.

In that case it would be up to the consumer of the data to handle required
case-folding for the expected input, so pg_strcasecmp or strcmp depending on
situation.

This patch has been marked “Waiting on Author”, but I’m not sure what the
concensus of this thread came to with regards to quoted keywords and backwards
compatibility. There seems to be a 2-1 vote for allowing a break, and forcing
all keywords out of the parser to be casefolded. Any other opinions?

This patch impacts the DDL grammar of aggregates, operators,
collations, text search, views, etc. Still I agree with the purpose of
this thread that it would be nice to get a more consistent behavior
even if it breaks some queries, so +1 for the argument with the
post-parser comparison which should use strcmp.

Thanks for reviewing!

The patch needs a rebase, and there are a couple of places that need
an extra lookup I think:
$ git grep defname -- *.c | grep strcasecmp | wc -l
39

Rebased and handled a few more places which I had either missed in the last
round, or that had been added in the meantime. “PARALLEL” in aggregatecmds.c
is intentionally using pg_strcasecmp() due to the old-style syntax which is
still supported. AFAICS this covers all relevant codepaths from the 39 above.

cheers ./daniel

#11

Michael Paquier

michael@paquier.xyz

over 8 years ago

In reply to: Daniel Gustafsson (#10)

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

On Tue, Nov 28, 2017 at 12:11 AM, Daniel Gustafsson <daniel@yesql.se> wrote:

The patch needs a rebase, and there are a couple of places that need
an extra lookup I think:
$ git grep defname -- *.c | grep strcasecmp | wc -l
39

Rebased and handled a few more places which I had either missed in the last
round, or that had been added in the meantime. “PARALLEL” in aggregatecmds.c
is intentionally using pg_strcasecmp() due to the old-style syntax which is
still supported.

This meritates a comment. Code readers may get confused.

AFAICS this covers all relevant codepaths from the 39 above.

I was just looking at the tsearch code which uses pg_strcmpcase, and
those are defined with makeDefElem() so you should switch to strcmp in
this case as well, no? If I patch the code myself I would get an error
when double-quoting, making those command more consistent with the
rest of what you are patching here:
create extension unaccent;
alter text search dictionary unaccent (Rules = 'unaccent'); -- ok
alter text search dictionary unaccent (RuLes = 'unaccent'); -- ok
alter text search dictionary unaccent ("Rules" = 'unaccent'); -- error
--
Michael

#12

Michael Paquier

michael@paquier.xyz

over 8 years ago

In reply to: Michael Paquier (#11)

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

On Tue, Nov 28, 2017 at 10:07 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I was just looking at the tsearch code which uses pg_strcmpcase, and
those are defined with makeDefElem() so you should switch to strcmp in
this case as well, no? If I patch the code myself I would get an error
when double-quoting, making those command more consistent with the
rest of what you are patching here:
create extension unaccent;
alter text search dictionary unaccent (Rules = 'unaccent'); -- ok
alter text search dictionary unaccent (RuLes = 'unaccent'); -- ok
alter text search dictionary unaccent ("Rules" = 'unaccent'); -- error

Daniel, I am waiting for your input on this one, and you did not have
much time to send an update. So I am moving this patch to next CF.
--
Michael

#13

Daniel Gustafsson

daniel@yesql.se

over 8 years ago

In reply to: Michael Paquier (#12)

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

On 29 Nov 2017, at 04:29, Michael Paquier <michael.paquier@gmail.com> wrote:

On Tue, Nov 28, 2017 at 10:07 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I was just looking at the tsearch code which uses pg_strcmpcase, and
those are defined with makeDefElem() so you should switch to strcmp in
this case as well, no? If I patch the code myself I would get an error
when double-quoting, making those command more consistent with the
rest of what you are patching here:
create extension unaccent;
alter text search dictionary unaccent (Rules = 'unaccent'); -- ok
alter text search dictionary unaccent (RuLes = 'unaccent'); -- ok
alter text search dictionary unaccent ("Rules" = 'unaccent'); -- error

Daniel, I am waiting for your input on this one, and you did not have
much time to send an update.

Sorry for the slow updates, I managed to catch a nasty fever which turned my
brain closer to a mushroom soup than I’d like.

So I am moving this patch to next CF.

Good move, thanks!

cheers ./daniel

#14

Michael Paquier

michael@paquier.xyz

over 8 years ago

In reply to: Daniel Gustafsson (#13)

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

On Thu, Nov 30, 2017 at 7:40 AM, Daniel Gustafsson <daniel@yesql.se> wrote:

On 29 Nov 2017, at 04:29, Michael Paquier <michael.paquier@gmail.com> wrote:

On Tue, Nov 28, 2017 at 10:07 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I was just looking at the tsearch code which uses pg_strcmpcase, and
those are defined with makeDefElem() so you should switch to strcmp in
this case as well, no? If I patch the code myself I would get an error
when double-quoting, making those command more consistent with the
rest of what you are patching here:
create extension unaccent;
alter text search dictionary unaccent (Rules = 'unaccent'); -- ok
alter text search dictionary unaccent (RuLes = 'unaccent'); -- ok
alter text search dictionary unaccent ("Rules" = 'unaccent'); -- error

Daniel, I am waiting for your input on this one, and you did not have
much time to send an update.

Sorry for the slow updates, I managed to catch a nasty fever which turned my
brain closer to a mushroom soup than I’d like.

No problems. Take care.
--
Michael

#15

Daniel Gustafsson

daniel@yesql.se

over 8 years ago

In reply to: Michael Paquier (#11)

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

On 28 Nov 2017, at 02:07, Michael Paquier <michael.paquier@gmail.com> wrote:

On Tue, Nov 28, 2017 at 12:11 AM, Daniel Gustafsson <daniel@yesql.se> wrote:

The patch needs a rebase, and there are a couple of places that need
an extra lookup I think:
$ git grep defname -- *.c | grep strcasecmp | wc -l
39

Rebased and handled a few more places which I had either missed in the last
round, or that had been added in the meantime. “PARALLEL” in aggregatecmds.c
is intentionally using pg_strcasecmp() due to the old-style syntax which is
still supported.

This meritates a comment. Code readers may get confused.

Good point, added. I also sent a separate doc patch for this to -docs the
other day.

AFAICS this covers all relevant codepaths from the 39 above.

I was just looking at the tsearch code which uses pg_strcmpcase, and
those are defined with makeDefElem() so you should switch to strcmp in
this case as well, no? If I patch the code myself I would get an error
when double-quoting, making those command more consistent with the
rest of what you are patching here:
create extension unaccent;
alter text search dictionary unaccent (Rules = 'unaccent'); -- ok
alter text search dictionary unaccent (RuLes = 'unaccent'); -- ok
alter text search dictionary unaccent ("Rules" = 'unaccent'); — error

For reasons unknown to me I had avoided poking in contrib/. Attached patch
handles the additional defname comparisons in contrib that are applicable.

The remainder of the pg_strcasecmp() calls in the text search code are
operating on a defelem list created in deserialize_deflist() rather than in the
parser, so I opted for keeping that as is rather than casefolding in the list
generation.

cheers ./daniel

#16

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Daniel Gustafsson (#15)

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

On Thu, Nov 30, 2017 at 6:40 AM, Daniel Gustafsson <daniel@yesql.se> wrote:

For reasons unknown to me I had avoided poking in contrib/. Attached patch
handles the additional defname comparisons in contrib that are applicable.

I am having a bit of trouble understanding why the first hunk in
DefineAggregate() is taking PARALLEL as a special case. The
documentation gives three separate synopses for CREATE AGGREGATE, but
parallel appears in all of them, and there are other options that do
too, so the comment doesn't really help me understand why it's special
as compared to other, similar options.

I think the changes in DefineView and ATExecSetRelOptions is wrong,
because transformRelOptions() is still using pg_strcasecmp. With the
patch:

rhaas=# create view v(x) with ("Check_option"="local") as select 1;
CREATE VIEW
rhaas=# create view v(x) with ("check_option"="local") as select 1;
ERROR: WITH CHECK OPTION is supported only on automatically updatable views
HINT: Views that do not select from a single table or view are not
automatically updatable.

Oops.

Here are, for the record, examples of SQL that will be generate errors
or warnings with the patch, but not presently, with a note about which
function got changed at the C level to affect the behavior.

transformRelOptions/interpretOidsOption: create table a () with ("OiDs"=true);
DefineAggregate, second hunk: CREATE AGGREGATE avg (float8) (sfunc =
float8_accum, stype = float8[], finalfunc = float8_avg, initcond =
'{0,0,0}', parallel = 'sAfe');
DefineCollation: CREATE COLLATION stunk ("LoCaLeS" = "C");
compute_attributes_with_style: create function one() returns int as
$$select 1$$ language sql with ("isStrict" = 'true');
DefineOperator: create operator @ (procedure = pg_catalog.int4eq,
leftarg = int4, "RIGHTARG" = int4);
DefineType: create type awesome (input = int4in, "OuTpUt" = int4out);
validateWithCheckOption: create table t(a int, b text, unique(a));
create view x with (check_option = 'loCal') as select * from t;

I have not yet managed to figure out what the impact of the contrib
changes, or the text search changes in core, is. This is partly a
lack of subject matter expertise, but the fact that the functions
being modified in contrib have a grand total of 0 lines of comments
between them does not help. That's not this patch's fault, nor this
patch's job to fix, but it is a barrier to understanding. I think it
would be nice to have a complete list of examples of what syntax this
patch is affecting.

I am actually pretty dubious that we want to do this. I found one bug
already (see above), and I don't think there's any guarantee that it's
the only one, because it's pretty hard to be sure that none of the
remaining calls to pg_strcasecmp are being applied to any of these
values in some other part of the code. I'm not sure that the backward
compatibility issue is a huge deal, but from my point of view this
carries a significant risk of introducing new bugs, might annoy users
who spell any of these keywords in all caps with surrounding quotation
marks, and really has no clear benefit that I can see. The
originally-offered justification is that making this consistent would
help us avoid subtle bugs, but it seems at least as likely to CREATE
subtle bugs, and the status quo is AFAICT harming nobody.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17

Michael Paquier

michael@paquier.xyz

over 8 years ago

In reply to: Robert Haas (#16)

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

On Fri, Dec 1, 2017 at 4:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I think the changes in DefineView and ATExecSetRelOptions is wrong,
because transformRelOptions() is still using pg_strcasecmp. With the
patch:

rhaas=# create view v(x) with ("Check_option"="local") as select 1;
CREATE VIEW
rhaas=# create view v(x) with ("check_option"="local") as select 1;
ERROR: WITH CHECK OPTION is supported only on automatically updatable views
HINT: Views that do not select from a single table or view are not
automatically updatable.

Oops.

I am getting the feeling that there could be other issues than this
one... If this patch ever gets integrated, I think that this should
actually be shaped as two patches:
- One patch with the set of DDL queries taking advantage of the
current grammar with pg_strcasecmp.
- A second patch that does the switch from pg_strcasecmp to strcmp,
and allows checking which paths of patch 1 are getting changed.
Patch 1 is the hard part to figure out all the possible patterns that
could be changed.

I am actually pretty dubious that we want to do this. I found one bug
already (see above), and I don't think there's any guarantee that it's
the only one, because it's pretty hard to be sure that none of the
remaining calls to pg_strcasecmp are being applied to any of these
values in some other part of the code. I'm not sure that the backward
compatibility issue is a huge deal, but from my point of view this
carries a significant risk of introducing new bugs, might annoy users
who spell any of these keywords in all caps with surrounding quotation
marks, and really has no clear benefit that I can see. The
originally-offered justification is that making this consistent would
help us avoid subtle bugs, but it seems at least as likely to CREATE
subtle bugs, and the status quo is AFAICT harming nobody.

Changing opinion here ;)
Yes, I agree that the risk of bugs may not be worth the compatibility
effort at the end. I still see value in this patch for long-term
purposes by making the code more consistent though.
--
Michael

#18

Stephen Frost

sfrost@snowman.net

over 8 years ago

In reply to: Michael Paquier (#17)

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

Greetings Michael, Daniel, all,

* Michael Paquier (michael.paquier@gmail.com) wrote:

On Fri, Dec 1, 2017 at 4:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I think the changes in DefineView and ATExecSetRelOptions is wrong,
because transformRelOptions() is still using pg_strcasecmp. With the
patch:

rhaas=# create view v(x) with ("Check_option"="local") as select 1;
CREATE VIEW
rhaas=# create view v(x) with ("check_option"="local") as select 1;
ERROR: WITH CHECK OPTION is supported only on automatically updatable views
HINT: Views that do not select from a single table or view are not
automatically updatable.

Oops.

I am getting the feeling that there could be other issues than this
one... If this patch ever gets integrated, I think that this should
actually be shaped as two patches:
- One patch with the set of DDL queries taking advantage of the
current grammar with pg_strcasecmp.
- A second patch that does the switch from pg_strcasecmp to strcmp,
and allows checking which paths of patch 1 are getting changed.
Patch 1 is the hard part to figure out all the possible patterns that
could be changed.

I am actually pretty dubious that we want to do this. I found one bug
already (see above), and I don't think there's any guarantee that it's
the only one, because it's pretty hard to be sure that none of the
remaining calls to pg_strcasecmp are being applied to any of these
values in some other part of the code. I'm not sure that the backward
compatibility issue is a huge deal, but from my point of view this
carries a significant risk of introducing new bugs, might annoy users
who spell any of these keywords in all caps with surrounding quotation
marks, and really has no clear benefit that I can see. The
originally-offered justification is that making this consistent would
help us avoid subtle bugs, but it seems at least as likely to CREATE
subtle bugs, and the status quo is AFAICT harming nobody.

Changing opinion here ;)
Yes, I agree that the risk of bugs may not be worth the compatibility
effort at the end. I still see value in this patch for long-term
purposes by making the code more consistent though.

Looks like this discussion has progressed to where this patch should
really be marked as Rejected. Does anyone else want to voice an opinion
regarding it, or perhaps Daniel could post a rebuttal to the concerns
raised here?

Thinking through it, for my own 2c on this, I tend to agree with Tom
that, really, we should be using strcmp() for anything coming out of the
parser and that the backward-compatibility break from that is
acceptable. I also understand Robert's concern that there may be other
bugs hiding and I wonder if there might be a way to check for them,
though no great ideas spring to mind offhand. Would be great to hear
your thoughts, Daniel, so leaving this in Waiting on Author for now.

Thanks!

Stephen

#19

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Stephen Frost (#18)

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

Stephen Frost <sfrost@snowman.net> writes:

Thinking through it, for my own 2c on this, I tend to agree with Tom
that, really, we should be using strcmp() for anything coming out of the
parser and that the backward-compatibility break from that is
acceptable. I also understand Robert's concern that there may be other
bugs hiding and I wonder if there might be a way to check for them,
though no great ideas spring to mind offhand. Would be great to hear
your thoughts, Daniel, so leaving this in Waiting on Author for now.

FWIW, I don't especially agree with Robert's position, because I think
he is ignoring the argument that it's a bug that some things are
case-sensitive and other seemingly similar things are not.

It's definitely concerning that the submitted patch introduced a new bug,
but we have seldom taken the position that bugs in an initial submission
are sufficient grounds for rejecting the entire concept.

ISTM that if this patch gets rid of a large fraction of the uses of
pg_strcasecmp in the backend, which is what I expect it should, then
it might not be out of reach to go through all the surviving ones to
make sure they are not processing strings that originate in the parser.

regards, tom lane

#20

Michael Paquier

michael@paquier.xyz

over 8 years ago

In reply to: Tom Lane (#19)

Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

On Sun, Jan 7, 2018 at 9:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

ISTM that if this patch gets rid of a large fraction of the uses of
pg_strcasecmp in the backend, which is what I expect it should, then
it might not be out of reach to go through all the surviving ones to
make sure they are not processing strings that originate in the parser.

Yeah, that's why I think that it is important for this patch to
provide as well tests to make sure that all the code paths are working
as they should with this patch.
--
Michael

#21

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Tom Lane (#19)

#22

Daniel Gustafsson

daniel@yesql.se

over 8 years ago

In reply to: Robert Haas (#21)

#23

Daniel Gustafsson