Allow subfield references without parentheses

Started by Peter Eisentraut, about 1 year ago, 4 messages

peter_e@gmx.net

about 1 year ago

This patch allows subfield references in column references without
parentheses, subject to certain conditions. This implements (hopes to,
anyway) the rules from the SQL standard (since SQL99).

This has been requested a number of times over the years. [0] is a
recent discussion that has mentioned it.

Specifically, identifier chains of three or more items now have an
additional possible interpretation.

Before:

A.B.C: schema A, table B, column or function C
A.B.C.D: database A, schema B, table C, column or function D

Now additionally:

A.B.C: correlation A, column B, field C; like (A.B).C
A.B.C.D: correlation A, column B, field C, field D; like (A.B).C.D

Also, identifier chains longer than four items now have an analogous
interpretation. They had no possible interpretation before.

(Note that single identifiers and two-part identifiers are not affected
at all.)

The "correlation A" above must be an explicit alias, not just a table name.

If both possible interpretations apply, then an error is raised. (A
workaround is to change the alias used in the query.) Such errors
should be very rare in practice.
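
To illustrate the rule for three-part names, here is a toy model in Python. It is only a sketch and not the patch's code: the function name resolve_three_part, the dictionaries, and the returned labels are all made up for illustration, and it ignores functions, search paths, and longer chains.

```python
class AmbiguousReference(Exception):
    """Raised when both interpretations of A.B.C apply."""


def resolve_three_part(a, b, c, tables, aliases):
    """Decide how the identifier chain a.b.c would be interpreted.

    tables:  {(schema, table): set of column names}            (hypothetical)
    aliases: {explicit alias: {column: set of field names}}    (hypothetical)
    Only composite-typed columns appear in the inner alias dict.
    """
    matches = []

    # Traditional interpretation: schema A, table B, column C.
    if c in tables.get((a, b), set()):
        matches.append("schema.table.column")

    # Additional interpretation: correlation A, column B, field C,
    # i.e. the same as (A.B).C. A must be an explicit alias.
    if c in aliases.get(a, {}).get(b, set()):
        matches.append("alias.column.field")

    if len(matches) > 1:
        # Both apply: raise an error instead of silently picking one.
        raise AmbiguousReference(f"{a}.{b}.{c} is ambiguous")
    if not matches:
        raise LookupError(f"no interpretation found for {a}.{b}.{c}")
    return matches[0]


if __name__ == "__main__":
    tables = {("s", "t"): {"c"}}
    aliases = {"x": {"addr": {"city"}}}
    print(resolve_three_part("x", "addr", "city", tables, aliases))
    print(resolve_three_part("s", "t", "c", tables, aliases))
```

In this model a conflict only arises when an alias happens to have the same name as a schema and the column and field names line up as well, which is why renaming the alias resolves it.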

In [0] there was some light discussion about other possible behaviors in
case of conflicts. In any case, with this patch it's possible to
experiment with different possible behaviors, by just replacing the
conditional that errors by another action. I also studied ruleutils.c a
bit to see if there are any tweaks needed to support this. So far it
seems okay. I'm sure we can come up with some pathological cases, but
so far I haven't done anything about it.

I left a couple of TODO notes in the patch such as where documentation
should be updated, and I didn't do anything about SQL and PL/pgSQL
parameters so far. Also, I tried to weave the additional code into
transformColumnRef() in a way that doesn't move much existing code
around, but eventually this should probably be reorganized a bit to
reduce duplication.

Another thing to think about would be the exact phrasing of any error
messages. Right now, transformColumnRef() assumes that a given
identifier chain can only have one possible interpretation and if it
doesn't find the thing the error says "didn't find the thing". But now
if there are multiple possible interpretations, it should probably say
something more like "didn't find this and also not that" or "didn't find
anything that matches that" or some other variant. I mean, what it does
now isn't bad, but given the amount of attention we have put into the
fine-tuning of these specific errors in the past, some additional
changes might be desired.

[0]: /messages/by-id/CAFiTN-uiwaogH-dbz-ARpUUQM+RQKdU2qmPh1WzM6gEyS8PVRA@mail.gmail.com

Tom Lane

tgl@sss.pgh.pa.us

about 1 year ago

In reply to: Peter Eisentraut (#1)

Re: Allow subfield references without parentheses

Peter Eisentraut <peter@eisentraut.org> writes:

This patch allows subfield references in column references without
parentheses, subject to certain condition. This implements (hopes to,
anyway) the rules from the SQL standard (since SQL99).
This has been requested a number of times over the years. [0] is a
recent discussion that has mentioned it.

The obvious concern about this is introduction of ambiguity where
there was none before.

If both possible interpretations apply, then an error is raised. (A
workaround is to change the alias used in the query.) Such errors
should be very rare in practice.

Not sure if it's rare or not, but I agree with raising an error rather
than silently choosing one alternative. We won't find out if it's
problematic unless we throw an error.

... I also studied ruleutils.c a
bit to see if there are any tweaks needed to support this. So far it
seems okay. I'm sure we can come up with some pathological cases, but
so far I haven't done anything about it.

I assume that what will happen is that ruleutils will continue to emit
our traditional notation with the extra parentheses. I think we need
to leave it like that, so as not to create a compatibility booby-trap
for loading dumps into older PG versions.

regards, tom lane

Kirill Reshke

reshkekirill@gmail.com

about 1 year ago

In reply to: Tom Lane (#2)

Re: Allow subfield references without parentheses

On Thu, 12 Dec 2024, 21:45 Tom Lane, <tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

This patch allows subfield references in column references without
parentheses, subject to certain condition. This implements (hopes to,
anyway) the rules from the SQL standard (since SQL99).
This has been requested a number of times over the years. [0] is a
recent discussion that has mentioned it.

The obvious concern about this is introduction of ambiguity where
there was none before.

IMHO SQL standard compatibility is a more compelling argument here.


Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 1 year ago

In reply to: Peter Eisentraut (#1)

Re: Allow subfield references without parentheses

On Thu, Dec 12, 2024 at 5:54 PM Peter Eisentraut <peter@eisentraut.org> wrote:

This patch allows subfield references in column references without
parentheses, subject to certain condition. This implements (hopes to,
anyway) the rules from the SQL standard (since SQL99).

This has been requested a number of times over the years. [0] is a
recent discussion that has mentioned it.

Specifically, identifier chains of three or more items now have an
additional possible interpretation.

Before:

A.B.C: schema A, table B, column or function C
A.B.C.D: database A, schema B, table C, column or function D

Now additionally:

A.B.C: correlation A, column B, field C; like (A.B).C
A.B.C.D: correlation A, column B, field C, field D; like (A.B).C.D

Also, identifier chains longer than four items now have an analogous
interpretation. They had no possible interpretation before.

(Note that single identifiers and two-part identifiers are not affected
at all.)

The "correlation A" above must be an explicit alias, not just a table name.

If both possible interpretations apply, then an error is raised. (A
workaround is to change the alias used in the query.) Such errors
should be very rare in practice.

A naive question: instead of performing correlation checks in
transformColumnRef(), could we use transformIndirection() after suitably
constructing an A_Indirection node? That way we would also cover all the
indirection cases like A.B[i].C. This would also address some
differences between the current checks and the checks performed in
transformIndirection(); e.g., the checks in the patch use ISCOMPLEX(),
whereas the checks in
transformIndirection()->ParseFuncOrColumn()->ParseComplexProjection()
check for COMPOSITE types.

In [0] there was some light discussion about other possible behaviors in
case of conflicts. In any case, with this patch it's possible to
experiment with different possible behaviors, by just replacing the
conditional that errors by another action. I also studied ruleutils.c a
bit to see if there are any tweaks needed to support this. So far it
seems okay. I'm sure we can come up with some pathological cases, but
so far I haven't done anything about it.

The original view definition did not use indirection but the one that
will be dumped and restored will use indirection. That is not a
correctness issue and there may be other places where we might be
already modifying view definitions this way.

--
Best Wishes,
Ashutosh Bapat
