Weirdness (bug?) with aggregates and subqueries

Started by Laurenz Albeover 2 years ago3 messagesgeneral
Jump to latest
#1Laurenz Albe
laurenz.albe@cybertec.at

I found this in a blog (https://buttondown.email/jaffray/archive/sql-scoping-is-surprisingly-subtle-and-semantic/):

CREATE TABLE aa (a INT);
INSERT INTO aa VALUES (1), (2), (3);
CREATE TABLE xx (x INT);
INSERT INTO xx VALUES (10), (20), (30);

SELECT (SELECT sum(a) FROM xx LIMIT 1) FROM aa;

sum
═════
6
(1 row)

Huh? Shouldn't that return three rows, just like

SELECT (SELECT sum(14) FROM xx LIMIT 1) FROM aa;

sum
═════
42
42
42
(3 rows)

Looking at the plan of the weird query, the aggregate seems to be in the wrong place:

EXPLAIN (COSTS OFF) SELECT (SELECT sum(a) FROM xx LIMIT 1) FROM aa;

QUERY PLAN
══════════════════════════════
Aggregate
-> Seq Scan on aa
SubPlan 1
-> Limit
-> Seq Scan on xx
(5 rows)

And this gives an error:

SELECT a, (SELECT sum(a) FROM xx LIMIT 1) FROM aa;
ERROR: column "aa.a" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT a, (SELECT sum(a) FROM xx LIMIT 1) FROM aa;
^

I think that the optimizer is going astray here...

But perhaps I am missing something obvious.

Yours,
Laurenz Albe

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Laurenz Albe (#1)
Re: Weirdness (bug?) with aggregates and subqueries

Laurenz Albe <laurenz.albe@cybertec.at> writes:

I found this in a blog (https://buttondown.email/jaffray/archive/sql-scoping-is-surprisingly-subtle-and-semantic/):
CREATE TABLE aa (a INT);
INSERT INTO aa VALUES (1), (2), (3);
CREATE TABLE xx (x INT);
INSERT INTO xx VALUES (10), (20), (30);

SELECT (SELECT sum(a) FROM xx LIMIT 1) FROM aa;

sum
═════
6
(1 row)

Huh? Shouldn't that return three rows, just like

No. The aggregate function is semantically of the closest query level
that contributes a Var to its argument, so it's evaluated at the "FROM
aa" level, causing that level to become an aggregated query that
returns just one row. Then it acts like an outer reference as far
as the sub-select is concerned. This is documented at the end of
Section 4.2.7 in our manual,

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES

Thank the SQL spec for that weirdness.

regards, tom lane

#3Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Tom Lane (#2)
Re: Weirdness (bug?) with aggregates and subqueries

On Wed, 2023-11-08 at 16:36 -0500, Tom Lane wrote:

Laurenz Albe <laurenz.albe@cybertec.at> writes:

I found this in a blog (https://buttondown.email/jaffray/archive/sql-scoping-is-surprisingly-subtle-and-semantic/):
CREATE TABLE aa (a INT);
INSERT INTO aa VALUES (1), (2), (3);
CREATE TABLE xx (x INT);
INSERT INTO xx VALUES (10), (20), (30);

SELECT (SELECT sum(a) FROM xx LIMIT 1) FROM aa;

sum
═════
6
(1 row)

Huh? Shouldn't that return three rows, just like

No. The aggregate function is semantically of the closest query level
that contributes a Var to its argument, so it's evaluated at the "FROM
aa" level, causing that level to become an aggregated query that
returns just one row. Then it acts like an outer reference as far
as the sub-select is concerned. This is documented at the end of
Section 4.2.7 in our manual,

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES

Thank the SQL spec for that weirdness.

Thanks for the explanation. Seems like another instance of the standard
committee smoking the wrong stuff.

Yours,
Laurenz Albe