BUG #16900: SELECT MAX subquery not allowed in WHERE clause when using WITH RECURSIVE

Started by PG Bug reporting formabout 5 years ago4 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 16900
Logged by: Daniel Farlow
Email address: dan.farlow@gmail.com
PostgreSQL version: 13.2
Operating system: macOS Mojave 10.14.6
Description:

Exact sequence to reproduce problem:

CREATE TABLE IF NOT EXISTS
Tasks (task_id int, subtasks_count int);

TRUNCATE TABLE Tasks;

INSERT INTO
Tasks (task_id, subtasks_count)
VALUES
('1', '3'),
('2', '2'),
('3', '4');

CREATE TABLE IF NOT EXISTS
Executed (task_id int, subtask_id int);

TRUNCATE TABLE Executed;

INSERT INTO
Executed (task_id, subtask_id)
VALUES
('1', '2'),
('3', '1'),
('3', '2'),
('3', '3'),
('3', '4');

WITH RECURSIVE possible_tasks_subtasks AS (
SELECT
task_id, subtasks_count as max_subtask_count, 1 AS subtask_id
FROM
Tasks
UNION ALL
SELECT
task_id, max_subtask_count, subtask_id + 1
FROM
possible_tasks_subtasks
-- using SELECT MAX in the WHERE clause below is what causes the error
WHERE
subtask_id < (SELECT MAX(max_subtask_count) FROM Tasks))
SELECT
P.task_id, P.subtask_id
FROM
possible_tasks_subtasks P
LEFT JOIN
Executed E ON P.task_id = E.task_id AND P.subtask_id = E.subtask_id
WHERE
E.task_id IS NULL OR E.subtask_id IS NULL;

Output received: ERROR: aggregate functions are not allowed in WHERE

Output expected (result set in any order):

+---------+---------------+
| task_id | subtask_id |
+---------+---------------+
| 1 | 1 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
+---------+---------------+

Platform information: PostgreSQL 13.2 on x86_64-apple-darwin18.7.0, compiled
by Apple clang version 11.0.0 (clang-1100.0.33.17), 64-bit

Comments: As highlighted in the docs on aggregate functions (ref:
https://www.postgresql.org/docs/current/tutorial-agg.html), the aggregate
max cannot be used directly in the WHERE clause but a query can be restated
so as to get the desired result using a subquery; hence, something like the
example in the docs

SELECT city FROM weather WHERE temp_lo = max(temp_lo);

can be transformed to

SELECT city FROM weather
WHERE temp_lo = (SELECT max(temp_lo) FROM weather);

I tried this same approach in my WITH query, as stated above, but received
"ERROR: aggregate functions are not allowed in WHERE". I looked at the docs
on WITH queries (ref:
https://www.postgresql.org/docs/current/queries-with.html) but did not see
any mention of aggregate function restriction.

Two other things possibly worth mentioning that may be relevant:
1. The problematic query executes as expected when using MySQL version
8.0.23. (This seems to be relevant since a rival database has implemented
this ability.)
2. I can modify my PostgreSQL query to work as expected by modifying

WHERE
subtask_id < (SELECT MAX(max_subtask_count) FROM Tasks)

to be

WHERE
subtask_id < (SELECT max_subtask_count FROM Tasks ORDER BY
max_subtask_count DESC LIMIT 1)

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #16900: SELECT MAX subquery not allowed in WHERE clause when using WITH RECURSIVE

PG Bug reporting form <noreply@postgresql.org> writes:

CREATE TABLE IF NOT EXISTS
Tasks (task_id int, subtasks_count int);
...

UNION ALL
SELECT
task_id, max_subtask_count, subtask_id + 1
FROM
possible_tasks_subtasks
-- using SELECT MAX in the WHERE clause below is what causes the error
WHERE
subtask_id < (SELECT MAX(max_subtask_count) FROM Tasks))

max_subtask_count is not a field of Tasks; perhaps you meant to
write MAX(subtasks_count)?

As the query is written, the aggregate is over a field of
possible_tasks_subtasks, making it illegal in WHERE, just
as the error says. (From the point of view of the "SELECT FROM Tasks"
subquery, it's a constant outer reference, not an aggregate of
that subquery. This is per SQL spec.)

regards, tom lane

#3Pantelis Theodosiou
ypercube@gmail.com
In reply to: Tom Lane (#2)
Re: BUG #16900: SELECT MAX subquery not allowed in WHERE clause when using WITH RECURSIVE

On Fri, Feb 26, 2021 at 4:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

PG Bug reporting form <noreply@postgresql.org> writes:

CREATE TABLE IF NOT EXISTS
Tasks (task_id int, subtasks_count int);
...

UNION ALL
SELECT
task_id, max_subtask_count, subtask_id + 1
FROM
possible_tasks_subtasks
-- using SELECT MAX in the WHERE clause below is what causes the error
WHERE
subtask_id < (SELECT MAX(max_subtask_count) FROM Tasks))

max_subtask_count is not a field of Tasks; perhaps you meant to
write MAX(subtasks_count)?

As the query is written, the aggregate is over a field of
possible_tasks_subtasks, making it illegal in WHERE, just
as the error says. (From the point of view of the "SELECT FROM Tasks"
subquery, it's a constant outer reference, not an aggregate of
that subquery. This is per SQL spec.)

regards, tom lane

Tom explains the reasons for the fail.
We can't be sure of what is needed but perhaps it's this:

WHERE
subtask_id < max_subtask_count

which gives you the wanted output.

Best regards
Pantelis

Show quoted text
#4Pantelis Theodosiou
ypercube@gmail.com
In reply to: Pantelis Theodosiou (#3)
Re: BUG #16900: SELECT MAX subquery not allowed in WHERE clause when using WITH RECURSIVE

On Fri, Feb 26, 2021 at 9:20 PM Pantelis Theodosiou <ypercube@gmail.com>
wrote:

On Fri, Feb 26, 2021 at 4:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

PG Bug reporting form <noreply@postgresql.org> writes:
...

WHERE
subtask_id < (SELECT MAX(max_subtask_count) FROM Tasks))

max_subtask_count is not a field of Tasks; perhaps you meant to
write MAX(subtasks_count)?

As the query is written, the aggregate is over a field of
possible_tasks_subtasks, making it illegal in WHERE, just
as the error says. (From the point of view of the "SELECT FROM Tasks"
subquery, it's a constant outer reference, not an aggregate of
that subquery. This is per SQL spec.)

regards, tom lane

Tom explains the reasons for the fail.
We can't be sure of what is needed but perhaps it's this:

WHERE
subtask_id < max_subtask_count

which gives you the wanted output.

And if I understand correctly the desired query, it can be expressed

without recursive CTEs, using generate_series():

SELECT
T.task_id, G.subtask_id
FROM
Tasks AS T
CROSS JOIN LATERAL
generate_series(1, subtasks_count) AS G (subtask_id)
LEFT JOIN
Executed E
ON T.task_id = E.task_id AND G.subtask_id = E.subtask_id
WHERE
E.task_id IS NULL ;

Show quoted text

Best regards
Pantelis