Allow subfield references without parentheses
This patch allows subfield references in column references without
parentheses, subject to certain conditions. This implements (hopes to,
anyway) the rules from the SQL standard (since SQL99).
This has been requested a number of times over the years. [0] is a
recent discussion that has mentioned it.
Specifically, identifier chains of three or more items now have an
additional possible interpretation.
Before:
A.B.C: schema A, table B, column or function C
A.B.C.D: database A, schema B, table C, column or function D
Now additionally:
A.B.C: correlation A, column B, field C; like (A.B).C
A.B.C.D: correlation A, column B, field C, field D; like (A.B).C.D
Also, identifier chains longer than four items now have an analogous
interpretation. They had no possible interpretation before.
(Note that single identifiers and two-part identifiers are not affected
at all.)
The "correlation A" above must be an explicit alias, not just a table name.
If both possible interpretations apply, then an error is raised. (A
workaround is to change the alias used in the query.) Such errors
should be very rare in practice.
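To make the rules above concrete, here is a minimal C sketch of the decision for a single identifier chain. It is not the patch's code. The `ChainLookup` struct, its flags, and `classify_chain()` are hypothetical names, and the boolean flags stand in for the catalog and range-table lookups that transformColumnRef() actually performs (refnameNamespaceItem(), scanNSItemForColumn(), and the type check in type_is_subfieldable()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the parser found for a chain A.B.C... */
typedef struct ChainLookup
{
    size_t nfields;             /* number of dotted items, >= 1 */
    bool   ends_with_star;      /* last item is "*" */
    bool   qualified_match;     /* [catalog.]schema.table.column/function found */
    bool   alias_is_explicit;   /* A is an explicit alias, e.g. FROM t AS A */
    bool   column_is_composite; /* column B of A has a composite or record type */
} ChainLookup;

typedef enum ChainResult
{
    CHAIN_UNAFFECTED,   /* 1 or 2 items: existing rules apply unchanged */
    CHAIN_QUALIFIED,    /* schema A, table B, column C (or 4-item variant) */
    CHAIN_SUBFIELD,     /* correlation A, column B, fields C, D, ... */
    CHAIN_AMBIGUOUS,    /* both readings apply: raise an error */
    CHAIN_NOT_FOUND     /* neither reading applies */
} ChainResult;

static ChainResult
classify_chain(const ChainLookup *c)
{
    bool qualified;
    bool subfield;

    assert(c->nfields >= 1);

    /* Single identifiers and two-part identifiers are not affected. */
    if (c->nfields < 3)
        return CHAIN_UNAFFECTED;

    /* The qualified reading only exists for 3- and 4-item chains. */
    qualified = c->nfields <= 4 && c->qualified_match;

    /* The subfield reading needs an explicit alias, a composite column,
     * and no trailing star. */
    subfield = !c->ends_with_star && c->alias_is_explicit &&
               c->column_is_composite;

    if (qualified && subfield)
        return CHAIN_AMBIGUOUS;
    if (qualified)
        return CHAIN_QUALIFIED;
    if (subfield)
        return CHAIN_SUBFIELD;
    return CHAIN_NOT_FOUND;
}
```

For example, qq.q.c1 from the regression test (explicit alias qq, composite column q) classifies as CHAIN_SUBFIELD, while quadtable.q.c1.i with no alias has no subfield reading and, with no matching catalog/schema/table, ends up as CHAIN_NOT_FOUND.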
In [0] there was some light discussion about other possible behaviors in
case of conflicts. In any case, with this patch it's possible to
experiment with different behaviors by just replacing the conditional
that raises the error with another action. I also studied ruleutils.c a
bit to see if any tweaks are needed to support this. So far it
seems okay. I'm sure we can come up with some pathological cases, but
so far I haven't done anything about them.
I left a couple of TODO notes in the patch, for example where the
documentation should be updated, and I haven't done anything about SQL
and PL/pgSQL parameters yet. Also, I tried to weave the additional code
into transformColumnRef() in a way that doesn't move much existing code
around, but eventually this should probably be reorganized a bit to
reduce duplication.
Another thing to think about is the exact phrasing of the error
messages. Right now, transformColumnRef() assumes that a given
identifier chain has only one possible interpretation, and if it
doesn't find the thing, the error says "didn't find the thing". But now
that there can be multiple possible interpretations, it should probably
say something more like "didn't find this and also not that" or "didn't
find anything that matches that", or some other variant. What it does
now isn't bad, but given the amount of attention we have put into
fine-tuning these specific errors in the past, some additional
changes might be desired.
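One hypothetical way to phrase a "didn't find this and also not that" message for a three-item chain is sketched below. Both the wording and the helper name `format_no_match_message()` are illustrative assumptions only; this is not what the patch or PostgreSQL currently reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: describe both failed readings of a chain a.b.c.
 * Returns what snprintf() returns (the untruncated length).
 */
static int
format_no_match_message(char *buf, size_t buflen,
                        const char *a, const char *b, const char *c)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen,
                    "column reference \"%s.%s.%s\" matches neither "
                    "column \"%s\" of table \"%s.%s\" "
                    "nor field \"%s\" of column \"%s\" of \"%s\"",
                    a, b, c,    /* the chain as written */
                    c, a, b,    /* reading 1: schema a, table b, column c */
                    c, b, a);   /* reading 2: alias a, column b, field c */
}
```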
[0]: /messages/by-id/CAFiTN-uiwaogH-dbz-ARpUUQM+RQKdU2qmPh1WzM6gEyS8PVRA@mail.gmail.com
Attachments:
v0-0001-Allow-subfield-references-without-parentheses.patch (text/plain; charset=UTF-8)
From 9fea0127a51d1504658b7bab4d9ac505d6135ce3 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 12 Dec 2024 08:21:51 +0100
Subject: [PATCH v0] Allow subfield references without parentheses
This allows subfield references in column references without
parentheses, subject to certain conditions. This implements the rules
from the SQL standard (since SQL99).
Specifically, identifier chains of three or more items now have an
additional possible interpretation.
Before:
A.B.C: schema A, table B, column or function C
A.B.C.D: database A, schema B, table C, column or function D
Now additionally:
A.B.C: correlation A, column B, field C; like (A.B).C
A.B.C.D: correlation A, column B, field C, field D; like (A.B).C.D
Also, identifier chains longer than four items now have an analogous
interpretation. They had no possible interpretation before.
The "correlation A" above must be an explicit alias, not just a table
name.
If both possible interpretations apply, then an error is raised. (A
workaround is to change the alias used in the query.) Such errors
should be very rare in practice.
---
doc/src/sgml/syntax.sgml | 2 +
src/backend/executor/functions.c | 3 +
src/backend/parser/parse_expr.c | 216 +++++++++++++++++++++----
src/test/regress/expected/rowtypes.out | 19 +++
src/test/regress/sql/rowtypes.sql | 6 +
5 files changed, 218 insertions(+), 28 deletions(-)
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 916189a7d68..37225c84758 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1441,6 +1441,8 @@ <title>Field Selection</title>
<primary>field selection</primary>
</indexterm>
+ <!-- TODO -->
+
<para>
If an expression yields a value of a composite type (row type), then a
specific field of the row can be extracted by writing
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 3b2f78b2197..1e17951f45f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -302,6 +302,9 @@ sql_fn_post_column_ref(ParseState *pstate, ColumnRef *cref, Node *var)
* (the first possibility takes precedence)
* A.B.C A = function name, B = record-typed parameter name,
* C = field name
+ * A.B.C.D...
+ * A = function name, B = record-typed parameter name,
+ * C, D, etc. = field names TODO
* A.* Whole-row reference to composite parameter A.
* A.B.* Same, with A = function name, B = parameter name
*
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index c2806297aa4..872b7d6be98 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -499,6 +499,12 @@ transformIndirection(ParseState *pstate, A_Indirection *ind)
return result;
}
+static bool
+type_is_subfieldable(Oid typeid)
+{
+ return ISCOMPLEX(typeid) || typeid == RECORDOID;
+}
+
/*
* Transform a ColumnRef.
*
@@ -507,18 +513,19 @@ transformIndirection(ParseState *pstate, A_Indirection *ind)
static Node *
transformColumnRef(ParseState *pstate, ColumnRef *cref)
{
- Node *node = NULL;
+ Node *node = NULL, *node2 = NULL;
char *nspname = NULL;
char *relname = NULL;
char *colname = NULL;
- ParseNamespaceItem *nsitem;
- int levels_up;
+ ParseNamespaceItem *nsitem, *nsitem2;
+ int levels_up, levels_up2;
+ ColumnRef *indcref = NULL;
+ List *indirection = NULL;
enum
{
CRERR_NO_COLUMN,
CRERR_NO_RTE,
- CRERR_WRONG_DB,
- CRERR_TOO_MANY
+ CRERR_AMBIGUOUS,
} crerr = CRERR_NO_COLUMN;
const char *err;
@@ -617,8 +624,12 @@ transformColumnRef(ParseState *pstate, ColumnRef *cref)
* if no luck, try to resolve as unqualified table name (A.*).
* A.B A is an unqualified table name; B is either a
* column or function name (trying column name first).
- * A.B.C schema A, table B, col or func name C.
- * A.B.C.D catalog A, schema B, table C, col or func D.
+ * A.B.C schema A, table B, col or func name C; or
+ * correlation A, column B, field C.
+ * A.B.C.D catalog A, schema B, table C, col or func D; or
+ * correlation A, column B, fields C, D.
+ * A.B.C.D.E...
+ * correlation A, column B, fields C, D, E, etc.
* A.* A is an unqualified table name; means whole-row value.
* A.B.* whole-row value of table B in schema A.
* A.B.C.* whole-row value of table C in schema B in catalog A.
@@ -724,7 +735,45 @@ transformColumnRef(ParseState *pstate, ColumnRef *cref)
nsitem = refnameNamespaceItem(pstate, nspname, relname,
cref->location,
&levels_up);
- if (nsitem == NULL)
+
+ /*
+ * Also look it up as a subfieldable column reference, but
+ * then it can't end with a star
+ */
+ if (!IsA(field3, A_Star))
+ nsitem2 = refnameNamespaceItem(pstate, NULL, strVal(field1),
+ cref->location,
+ &levels_up2);
+ else
+ nsitem2 = NULL;
+
+ /* must be an explicit alias */
+ if (nsitem2 && nsitem2->p_rte->alias == NULL)
+ nsitem2 = NULL;
+
+ if (nsitem2)
+ node2 = scanNSItemForColumn(pstate, nsitem2, levels_up2, strVal(field2), cref->location);
+
+ /*
+ * If we found a potential subfield reference, check that the
+ * type is subfieldable, else forget it.
+ */
+ if (node2)
+ {
+ if (type_is_subfieldable(castNode(Var, node2)->vartype))
+ {
+ indcref = copyObject(cref);
+ indcref->fields = list_truncate(indcref->fields, 2);
+ indirection = list_copy_tail(cref->fields, 2);
+ }
+ else
+ {
+ nsitem2 = NULL;
+ node2 = NULL;
+ }
+ }
+
+ if (nsitem == NULL && nsitem2 == NULL)
{
crerr = CRERR_NO_RTE;
break;
@@ -740,9 +789,12 @@ transformColumnRef(ParseState *pstate, ColumnRef *cref)
colname = strVal(field3);
+ if (nsitem)
+ {
/* Try to identify as a column of the nsitem */
node = scanNSItemForColumn(pstate, nsitem, levels_up, colname,
cref->location);
+
if (node == NULL)
{
/* Try it as a function call on the whole row */
@@ -756,6 +808,13 @@ transformColumnRef(ParseState *pstate, ColumnRef *cref)
false,
cref->location);
}
+ }
+
+ if (node != NULL && node2 != NULL)
+ {
+ crerr = CRERR_AMBIGUOUS;
+ break;
+ }
break;
}
case 4:
@@ -770,20 +829,52 @@ transformColumnRef(ParseState *pstate, ColumnRef *cref)
nspname = strVal(field2);
relname = strVal(field3);
+ /* Locate the referenced nsitem (only eligible if database name matches) */
+ if (strcmp(catname, get_database_name(MyDatabaseId)) == 0)
+ nsitem = refnameNamespaceItem(pstate, nspname, relname,
+ cref->location,
+ &levels_up);
+ else
+ nsitem = NULL;
+
+ /*
+ * Also look it up as a subfieldable column reference, but
+ * then it can't end with a star
+ */
+ if (!IsA(field4, A_Star))
+ nsitem2 = refnameNamespaceItem(pstate, NULL, strVal(field1),
+ cref->location,
+ &levels_up2);
+ else
+ nsitem2 = NULL;
+
+ /* must be an explicit alias */
+ if (nsitem2 && nsitem2->p_rte->alias == NULL)
+ nsitem2 = NULL;
+
+ if (nsitem2)
+ node2 = scanNSItemForColumn(pstate, nsitem2, levels_up2, strVal(field2), cref->location);
+
/*
- * We check the catalog name and then ignore it.
+ * If we found a potential subfield reference, check that the
+ * type is subfieldable, else forget it.
*/
- if (strcmp(catname, get_database_name(MyDatabaseId)) != 0)
+ if (node2)
{
- crerr = CRERR_WRONG_DB;
- break;
+ if (type_is_subfieldable(castNode(Var, node2)->vartype))
+ {
+ indcref = copyObject(cref);
+ indcref->fields = list_truncate(indcref->fields, 2);
+ indirection = list_copy_tail(cref->fields, 2);
+ }
+ else
+ {
+ nsitem2 = NULL;
+ node2 = NULL;
+ }
}
- /* Locate the referenced nsitem */
- nsitem = refnameNamespaceItem(pstate, nspname, relname,
- cref->location,
- &levels_up);
- if (nsitem == NULL)
+ if (nsitem == NULL && nsitem2 == NULL)
{
crerr = CRERR_NO_RTE;
break;
@@ -799,6 +890,8 @@ transformColumnRef(ParseState *pstate, ColumnRef *cref)
colname = strVal(field4);
+ if (nsitem)
+ {
/* Try to identify as a column of the nsitem */
node = scanNSItemForColumn(pstate, nsitem, levels_up, colname,
cref->location);
@@ -815,11 +908,85 @@ transformColumnRef(ParseState *pstate, ColumnRef *cref)
false,
cref->location);
}
+ }
+
+ if (node != NULL && node2 != NULL)
+ {
+ crerr = CRERR_AMBIGUOUS;
+ break;
+ }
break;
}
default:
- crerr = CRERR_TOO_MANY; /* too many dotted names */
- break;
+ {
+ Node *field1 = (Node *) linitial(cref->fields);
+ Node *field2 = (Node *) lsecond(cref->fields);
+ Node *fieldl = (Node *) llast(cref->fields);
+
+ /* XXX for error reporting below */
+ relname = strVal(field1);
+
+ /*
+ * Look it up as a subfieldable column reference, but then it
+ * can't end with a star
+ */
+ if (!IsA(fieldl, A_Star))
+ nsitem2 = refnameNamespaceItem(pstate, NULL, strVal(field1),
+ cref->location,
+ &levels_up2);
+ else
+ nsitem2 = NULL;
+
+ /* must be an explicit alias */
+ if (nsitem2 && nsitem2->p_rte->alias == NULL)
+ nsitem2 = NULL;
+
+ if (nsitem2)
+ node2 = scanNSItemForColumn(pstate, nsitem2, levels_up2, strVal(field2), cref->location);
+
+ /*
+ * If we found a potential subfield reference, check that the
+ * type is subfieldable, else forget it.
+ */
+ if (node2)
+ {
+ if (type_is_subfieldable(castNode(Var, node2)->vartype))
+ {
+ indcref = copyObject(cref);
+ indcref->fields = list_truncate(indcref->fields, 2);
+ indirection = list_copy_tail(cref->fields, 2);
+ }
+ else
+ {
+ nsitem2 = NULL;
+ node2 = NULL;
+ }
+ }
+
+ if (nsitem2 == NULL)
+ {
+ crerr = CRERR_NO_RTE;
+ break;
+ }
+ break;
+ }
+ }
+
+ /*
+ * If we decided it's a subfield reference, convert it to an indirection
+ * (as if you had written "(A.B).C.D" instead of "A.B.C.D"). (Note that
+ * the subfield references detected above always come from an identifier
+ * chain of length >= 3, but the indirections we are building here have a
+ * column reference of length 2, and so there won't be any endless
+ * recursion.)
+ */
+ if (node2)
+ {
+ A_Indirection *a = makeNode(A_Indirection);
+
+ a->arg = (Node *) indcref;
+ a->indirection = indirection;
+ node = transformIndirection(pstate, a);
}
/*
@@ -860,17 +1027,10 @@ transformColumnRef(ParseState *pstate, ColumnRef *cref)
errorMissingRTE(pstate, makeRangeVar(nspname, relname,
cref->location));
break;
- case CRERR_WRONG_DB:
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cross-database references are not implemented: %s",
- NameListToString(cref->fields)),
- parser_errposition(pstate, cref->location)));
- break;
- case CRERR_TOO_MANY:
+ case CRERR_AMBIGUOUS:
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("improper qualified name (too many dotted names): %s",
+ errmsg("ambiguous identifier chain: %s",
NameListToString(cref->fields)),
parser_errposition(pstate, cref->location)));
break;
diff --git a/src/test/regress/expected/rowtypes.out b/src/test/regress/expected/rowtypes.out
index 9168979a620..885338ac0d1 100644
--- a/src/test/regress/expected/rowtypes.out
+++ b/src/test/regress/expected/rowtypes.out
@@ -100,6 +100,7 @@ SELECT * FROM pg_input_error_info('(1,1e400)', 'complex');
"1e400" is out of range for type double precision | | | 22003
(1 row)
+-- test identifier chain syntax for column and field references
create temp table quadtable(f1 int, q quad);
insert into quadtable values (1, ((3.3,4.4),(5.5,6.6)));
insert into quadtable values (2, ((null,4.4),(5.5,6.6)));
@@ -121,6 +122,24 @@ select f1, (q).c1, (qq.q).c1.i from quadtable qq;
2 | (,4.4) | 4.4
(2 rows)
+select f1, qq.q.c1 from quadtable qq;
+ f1 | c1
+----+-----------
+ 1 | (3.3,4.4)
+ 2 | (,4.4)
+(2 rows)
+
+select f1, qq.q.c1.i from quadtable qq;
+ f1 | i
+----+-----
+ 1 | 4.4
+ 2 | 4.4
+(2 rows)
+
+select f1, quadtable.q.c1.i from quadtable; -- fails, works only with explicit alias
+ERROR: missing FROM-clause entry for table "c1"
+LINE 1: select f1, quadtable.q.c1.i from quadtable;
+ ^
create temp table people (fn fullname, bd date);
insert into people values ('(Joe,Blow)', '1984-01-10');
select * from people;
diff --git a/src/test/regress/sql/rowtypes.sql b/src/test/regress/sql/rowtypes.sql
index 174b062144a..145c8a640fb 100644
--- a/src/test/regress/sql/rowtypes.sql
+++ b/src/test/regress/sql/rowtypes.sql
@@ -38,6 +38,7 @@
SELECT * FROM pg_input_error_info('(1,zed)', 'complex');
SELECT * FROM pg_input_error_info('(1,1e400)', 'complex');
+-- test identifier chain syntax for column and field references
create temp table quadtable(f1 int, q quad);
insert into quadtable values (1, ((3.3,4.4),(5.5,6.6)));
@@ -49,6 +50,11 @@
select f1, (q).c1, (qq.q).c1.i from quadtable qq;
+select f1, qq.q.c1 from quadtable qq;
+select f1, qq.q.c1.i from quadtable qq;
+
+select f1, quadtable.q.c1.i from quadtable; -- fails, works only with explicit alias
+
create temp table people (fn fullname, bd date);
insert into people values ('(Joe,Blow)', '1984-01-10');
base-commit: bd10ec529796a13670645e6acd640c6f290df020
--
2.47.1
Peter Eisentraut <peter@eisentraut.org> writes:
> This patch allows subfield references in column references without
> parentheses, subject to certain condition. This implements (hopes to,
> anyway) the rules from the SQL standard (since SQL99).
> This has been requested a number of times over the years. [0] is a
> recent discussion that has mentioned it.
The obvious concern about this is introduction of ambiguity where
there was none before.
> If both possible interpretations apply, then an error is raised. (A
> workaround is to change the alias used in the query.) Such errors
> should be very rare in practice.
Not sure if it's rare or not, but I agree with raising an error rather
than silently choosing one alternative. We won't find out if it's
problematic unless we throw an error.
> ... I also studied ruleutils.c a
> bit to see if there are any tweaks needed to support this. So far it
> seems okay. I'm sure we can come up with some pathological cases, but
> so far I haven't done anything about it.
I assume that what will happen is that ruleutils will continue to emit
our traditional notation with the extra parentheses. I think we need
to leave it like that, so as not to create a compatibility booby-trap
for loading dumps into older PG versions.
regards, tom lane
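The traditional notation Tom refers to matches the form the patch itself uses internally. The comment in transformColumnRef() describes the conversion "as if you had written "(A.B).C.D"": the first two chain items become the column reference (list_truncate(..., 2)) and the rest become the indirection (list_copy_tail(..., 2)). The C sketch below only renders a chain in that parenthesized spelling. `format_parenthesized_chain()` is a hypothetical helper, not how ruleutils.c is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: render the identifier chain parts[0..n-1] (n >= 3)
 * as "(A.B).C.D...". Like snprintf(), it returns the length the full
 * string needs, truncating (but always NUL-terminating) if buflen is short.
 */
static int
format_parenthesized_chain(char *buf, size_t buflen,
                           const char *const *parts, size_t n)
{
    int len;

    assert(buf != NULL && buflen > 0 && n >= 3);

    /* First two items: the column reference, e.g. "(qq.q)". */
    len = snprintf(buf, buflen, "(%s.%s)", parts[0], parts[1]);
    if (len < 0)
        return len;

    /* Remaining items: one field selection each, e.g. ".c1", ".i". */
    for (size_t i = 2; i < n; i++)
    {
        size_t used = (size_t) len < buflen ? (size_t) len : buflen - 1;
        int more = snprintf(buf + used, buflen - used, ".%s", parts[i]);

        if (more < 0)
            return more;
        len += more;
    }
    return len;
}
```

With parts {"qq", "q", "c1", "i"}, taken from the regression test query qq.q.c1.i, this produces (qq.q).c1.i, which is the spelling older releases already accept.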
On Thu, 12 Dec 2024, 21:45 Tom Lane, <tgl@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter@eisentraut.org> writes:
>> This patch allows subfield references in column references without
>> parentheses, subject to certain condition. This implements (hopes to,
>> anyway) the rules from the SQL standard (since SQL99).
>> This has been requested a number of times over the years. [0] is a
>> recent discussion that has mentioned it.
> The obvious concern about this is introduction of ambiguity where
> there was none before.
IMHO SQL standard compatibility is a more compelling argument here.
On Thu, Dec 12, 2024 at 5:54 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> This patch allows subfield references in column references without
> parentheses, subject to certain condition. This implements (hopes to,
> anyway) the rules from the SQL standard (since SQL99).
> This has been requested a number of times over the years. [0] is a
> recent discussion that has mentioned it.
> Specifically, identifier chains of three or more items now have an
> additional possible interpretation.
> Before:
> A.B.C: schema A, table B, column or function C
> A.B.C.D: database A, schema B, table C, column or function D
> Now additionally:
> A.B.C: correlation A, column B, field C; like (A.B).C
> A.B.C.D: correlation A, column B, field C, field D; like (A.B).C.D
> Also, identifier chains longer than four items now have an analogous
> interpretation. They had no possible interpretation before.
> (Note that single identifiers and two-part identifiers are not affected
> at all.)
> The "correlation A" above must be an explicit alias, not just a table name.
> If both possible interpretations apply, then an error is raised. (A
> workaround is to change the alias used in the query.) Such errors
> should be very rare in practice.
A naive question: instead of performing the correlation checks in
transformColumnRef(), could we use transformIndirection() after suitably
constructing an A_Indirection node? That way we would also cover
indirection cases like A.B[i].C. It would also address a difference
between the checks in the patch and the checks performed in
transformIndirection(): the patch uses ISCOMPLEX(), whereas
transformIndirection()->ParseFuncOrColumn()->ParseComplexProjection()
checks for composite types.
> In [0] there was some light discussion about other possible behaviors in
> case of conflicts. In any case, with this patch it's possible to
> experiment with different possible behaviors, by just replacing the
> conditional that errors by another action. I also studied ruleutils.c a
> bit to see if there are any tweaks needed to support this. So far it
> seems okay. I'm sure we can come up with some pathological cases, but
> so far I haven't done anything about it.
I found a minor inconvenience:
#create view idchain as select f1, qq.q.c1 from qtable qq;
CREATE VIEW
#\d+ idchain
                           View "public.idchain"
 Column |  Type   | Collation | Nullable | Default | Storage  | Description
--------+---------+-----------+----------+---------+----------+-------------
 f1     | integer |           |          |         | plain    |
 c1     | complex |           |          |         | extended |
View definition:
 SELECT f1,
    (q).c1 AS c1
   FROM qtable qq;
The original view definition did not use the parenthesized indirection
syntax, but the definition that will be dumped and restored does. That
is not a correctness issue, and there may be other places where we
already modify view definitions this way.
--
Best Wishes,
Ashutosh Bapat