BUG #18430: syntax error when using aggregate function in where clause of subquery

Started by PG Bug reporting formabout 2 years ago6 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18430
Logged by: Eric Atkin
Email address: eatkin@certusllc.us
PostgreSQL version: 16.2
Operating system: Arch Linux
Description:

CREATE TABLE driver (
id INTEGER PRIMARY KEY,
city TEXT
);

CREATE TABLE delivery (
id INTEGER PRIMARY KEY,
driver_id INTEGER REFERENCES driver(id)
);

SELECT
city,
(SELECT count(*) FROM delivery WHERE driver_id IN array_agg(driver.id))
AS deliveries
FROM driver
GROUP BY city
;

This produces:

ERROR: syntax error at or near "array_agg"
LINE 3: ...(SELECT count(*) FROM delivery WHERE driver_id IN array_agg(...

I assume this is because aggregate functions are not normally allowed in a
where clause. However, my reading of the last paragraph of Section 4.2.7 of
the version 16 docs (quoted below) leads me to believe there should be an
exception to that constraint when a subquery occurs in an outer query select
list and refers to one of its variables. Is it possible there is a bug in
the parser causing it to not be aware of this exception?

Perhaps a join of delivery would be a simpler way to write this query in
this trivial example, but I think the subquery approach is more reasonable
in the real world case I've derived it from. In any case, I think postgres
should behave as documented even if I can only poorly demonstrate the
issue.

Thank you,
Eric Atkin

4.2.7 Aggregate Expressions

...

When an aggregate expression appears in a subquery (see Section 4.2.11 and
Section 9.23), the aggregate is normally evaluated over the rows of the
subquery. But an exception occurs if the aggregate's arguments (and
filter_clause if any) contain only outer-level variables: the aggregate then
belongs to the nearest such outer level, and is evaluated over the rows of
that query. The aggregate expression as a whole is then an outer reference
for the subquery it appears in, and acts as a constant over any one
evaluation of that subquery. The restriction about appearing only in the
result list or HAVING clause applies with respect to the query level that
the aggregate belongs to.

#2Laurenz Albe
laurenz.albe@cybertec.at
In reply to: PG Bug reporting form (#1)
Re: BUG #18430: syntax error when using aggregate function in where clause of subquery

On Fri, 2024-04-12 at 17:14 +0000, PG Bug reporting form wrote:

SELECT
    city,
    (SELECT count(*) FROM delivery WHERE driver_id IN array_agg(driver.id))
AS deliveries
FROM driver
GROUP BY city
;

This produces:

ERROR:  syntax error at or near "array_agg"
LINE 3: ...(SELECT count(*) FROM delivery WHERE driver_id IN array_agg(...

This not a bug, but bad syntax.

Write

SELECT city,
(SELECT count(*)
FROM delivery
WHERE driver_id = ANY (drivers)) AS deliveries
FROM (SELECT city,
array_agg(driver.id) AS drivers
FROM driver
GROUP BY city) AS q;

Yours,
Laurenz Albe

#3Eric Atkin
eatkin@certusllc.us
In reply to: Laurenz Albe (#2)
Re: BUG #18430: syntax error when using aggregate function in where clause of subquery

"The aggregate expression as a whole is then an outer reference
for the subquery it appears in, and acts as a constant over any one
evaluation of that subquery."

It seems the aggregate function call should be evaluated at the outer layer
(where it would not be bad syntax) and then is a constant for the inner
query where clause where an array would be allowed.

Thank you for your working example. There is often more than one way to
write any particular query. My version and yours seem to mean the same
thing, but one is arguably more readable. At the very least, is this not a
documentation bug?

On Fri, Apr 12, 2024 at 11:45 AM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

Show quoted text

On Fri, 2024-04-12 at 17:14 +0000, PG Bug reporting form wrote:

SELECT
city,
(SELECT count(*) FROM delivery WHERE driver_id IN array_agg(

driver.id))

AS deliveries
FROM driver
GROUP BY city
;

This produces:

ERROR: syntax error at or near "array_agg"
LINE 3: ...(SELECT count(*) FROM delivery WHERE driver_id IN

array_agg(...

This not a bug, but bad syntax.

Write

SELECT city,
(SELECT count(*)
FROM delivery
WHERE driver_id = ANY (drivers)) AS deliveries
FROM (SELECT city,
array_agg(driver.id) AS drivers
FROM driver
GROUP BY city) AS q;

Yours,
Laurenz Albe

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #18430: syntax error when using aggregate function in where clause of subquery

On Fri, Apr 12, 2024 at 10:32 AM PG Bug reporting form <
noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 18430
Logged by: Eric Atkin
Email address: eatkin@certusllc.us
PostgreSQL version: 16.2
Operating system: Arch Linux
Description:

(SELECT count(*) FROM delivery WHERE driver_id IN array_agg(driver.id
))

This doesn't seem to have anything to do with scoping of inner and outer
queries. Where did you come to believe that:

scalar IN array

is a valid operator/expression.

i.e., your query fails for the same reason this one does:

select 1 in array[1,2,3];

David J.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Eric Atkin (#3)
Re: BUG #18430: syntax error when using aggregate function in where clause of subquery

Eric Atkin <eatkin@certusllc.us> writes:

It seems the aggregate function call should be evaluated at the outer layer
(where it would not be bad syntax) and then is a constant for the inner
query where clause where an array would be allowed.

Correct, but the problem is not with the array_agg call, it's with
your use of IN. In the first place, IN requires parens around its
righthand side. But that only gets us past "syntax error":

regression=# SELECT
city,
(SELECT count(*) FROM delivery WHERE driver_id IN (array_agg(driver.id)))
AS deliveries
FROM driver
GROUP BY city
;
ERROR: operator does not exist: integer = integer[]
LINE 3: ... (SELECT count(*) FROM delivery WHERE driver_id IN (array_...
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.

That's because what's inside the parens is supposed to be either a
sub-select or a list of things directly comparable to the LHS.
To do what you're after, you need to use the "scalar = ANY(array)"
construct:

regression=# SELECT
city,
(SELECT count(*) FROM delivery WHERE driver_id = any (array_agg(driver.id)))
AS deliveries
FROM driver
GROUP BY city
;
city | deliveries
------+------------
(0 rows)

IN and =ANY are more-or-less equivalent when the RHS is a sub-select,
but not for an array RHS.

There's nothing particularly wrong with Laurenz's construction,
but it's not necessary to split it up like that.

regards, tom lane

#6Eric Atkin
eatkin@certusllc.us
In reply to: Tom Lane (#5)
Re: BUG #18430: syntax error when using aggregate function in where clause of subquery

Perfect. I had tried the `= ANY` construct but got the same syntax error
and mistakenly assumed it was a problem with the parser rather than my
misreading of the `ANY (array)` docs (9.24.3).
Thank you both for your help.

On Fri, Apr 12, 2024 at 1:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Eric Atkin <eatkin@certusllc.us> writes:

It seems the aggregate function call should be evaluated at the outer

layer

(where it would not be bad syntax) and then is a constant for the inner
query where clause where an array would be allowed.

Correct, but the problem is not with the array_agg call, it's with
your use of IN. In the first place, IN requires parens around its
righthand side. But that only gets us past "syntax error":

regression=# SELECT
city,
(SELECT count(*) FROM delivery WHERE driver_id IN (array_agg(driver.id
)))
AS deliveries
FROM driver
GROUP BY city
;
ERROR: operator does not exist: integer = integer[]
LINE 3: ... (SELECT count(*) FROM delivery WHERE driver_id IN (array_...
^
HINT: No operator matches the given name and argument types. You might
need to add explicit type casts.

That's because what's inside the parens is supposed to be either a
sub-select or a list of things directly comparable to the LHS.
To do what you're after, you need to use the "scalar = ANY(array)"
construct:

regression=# SELECT
city,
(SELECT count(*) FROM delivery WHERE driver_id = any (array_agg(
driver.id)))
AS deliveries
FROM driver
GROUP BY city
;
city | deliveries
------+------------
(0 rows)

IN and =ANY are more-or-less equivalent when the RHS is a sub-select,
but not for an array RHS.

There's nothing particularly wrong with Laurenz's construction,
but it's not necessary to split it up like that.

regards, tom lane