BUG #18904: INTERSECT with an impossible where should eliminate both from the query plan

Started by PG Bug reporting form12 months ago7 messagesbugs

noreply@postgresql.org

12 months ago

The following bug has been logged on the website:

Bug reference: 18904
Logged by: jinhui lai
Email address: jh.lai@qq.com
PostgreSQL version: 17.4
Operating system: ubuntu
Description:

Dear Postgres Developer,
If you have a query of the form:
Q1 INTERSECT Q2 ... INTERSECT Qn,
In such cases, you know that query Qn always returns an empty set(e.g., a
query with WHERE 1=2), then the entire intersection will always be empty.
I think that such queries should be eliminated during optimization, as they
will always return an empty set and should never consume execution time.
Best regards,
1. How to repeat?
docker pull postgres:latest
docker run --name postgres -e POSTGRES_PASSWORD=1213 -d -p 5432:5432
postgres
docker exec -it postgres psql -U postgres -c "CREATE DATABASE testdb;"
docker exec -it postgres pgbench -U postgres -i -s 10 testdb
docker exec -it postgres psql -U postgres
psql (17.4 (Debian 17.4-1.pgdg120+2))
Type "help" for help.
\timing on
SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers WHERE 1=2;
aid | bid | abalance | filler | tid | bid | tbalance | filler
-----+-----+----------+--------+-----+-----+----------+--------
(0 rows)
Time: 0.499 ms
SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers
INTERSECT
SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers WHERE 1=2;
aid | bid | abalance | filler | tid | bid | tbalance | filler
-----+-----+----------+--------+-----+-----+----------+--------
(0 rows)
Time: 16058.769 ms (00:16.059)
EXPLAIN SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers
INTERSECT
SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers WHERE 1=2;
QUERY PLAN
---------------------------------------------------------------------------------------------------
HashSetOp Intersect (cost=0.00..4776396.28 rows=1 width=708)
-> Append (cost=0.00..2776396.25 rows=100000001 width=708)
-> Subquery Scan on "*SELECT* 2" (cost=0.00..0.00 rows=1
width=453)
-> Result (cost=0.00..0.00 rows=0 width=449)
One-Time Filter: false
-> Subquery Scan on "*SELECT* 1" (cost=0.00..2276396.25
rows=100000000 width=453)
-> Nested Loop (cost=0.00..1276396.25 rows=100000000
width=449)
-> Seq Scan on pgbench_accounts (cost=0.00..26394.00
rows=1000000 width=97)
-> Materialize (cost=0.00..2.50 rows=100 width=352)
-> Seq Scan on pgbench_tellers (cost=0.00..2.00
rows=100 width=352)
JIT:
Functions: 9
Options: Inlining true, Optimization true, Expressions true, Deforming
true
(13 rows)
2. What do i expect to see?
SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers
INTERSECT
SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers WHERE 1=2;
aid | bid | abalance | filler | tid | bid | tbalance | filler
-----+-----+----------+--------+-----+-----+----------+--------
(0 rows)
Time: 0.499 ms (few time)
3. What do i see instead
SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers
INTERSECT
SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers WHERE 1=2;
aid | bid | abalance | filler | tid | bid | tbalance | filler
-----+-----+----------+--------+-----+-----+----------+--------
(0 rows)
Time: 16058.769 ms (00:16.059)

David G. Johnston

david.g.johnston@gmail.com

12 months ago

In reply to: PG Bug reporting form (#1)

On Sunday, April 27, 2025, PG Bug reporting form <noreply@postgresql.org>
wrote:

The following bug has been logged on the website:

Bug reference: 18904
Logged by: jinhui lai
Email address: jh.lai@qq.com
PostgreSQL version: 17.4
Operating system: ubuntu
Description:

Dear Postgres Developer,
If you have a query of the form:
Q1 INTERSECT Q2 ... INTERSECT Qn,
In such cases, you know that query Qn always returns an empty set(e.g., a
query with WHERE 1=2), then the entire intersection will always be empty.
I think that such queries should be eliminated during optimization, as they
will always return an empty set and should never consume execution time.

These failure to optimize requests are not bugs and are better discussed on
the -general list where some sense of demand can be ascertained.

There is little desire to evaluate where clause expressions in the manner
you propose, and while executing the subcomponents in most-restrictive to
least-restrictive would be nice - stopping should any of them return no
rows - it’s seems like quite a niche situation to spend time on.

David J.

Tom Lane

tgl@sss.pgh.pa.us

12 months ago

In reply to: David G. Johnston (#2)

Re: BUG #18904: INTERSECT with an impossible where should eliminate both from the query plan

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Sunday, April 27, 2025, PG Bug reporting form <noreply@postgresql.org>
wrote:

If you have a query of the form:
Q1 INTERSECT Q2 ... INTERSECT Qn,
In such cases, you know that query Qn always returns an empty set(e.g., a
query with WHERE 1=2), then the entire intersection will always be empty.
I think that such queries should be eliminated during optimization, as they
will always return an empty set and should never consume execution time.

These failure to optimize requests are not bugs and are better discussed on
the -general list where some sense of demand can be ascertained.

Indeed. To get something like this in, you have to demonstrate that
the required developer effort and planner runtime will be repaid by
successfully optimizing a nontrivial fraction of real-world cases.
I'm not really convinced about that in most of these cases.

In this specific case, it seems like it would not take very many extra
cycles for plan_set_operations and its subroutines to notice that an
input relation is "dummy" (proven empty) and then simplify the set
operation accordingly. But it might still not be worth doing, if it
complicates that already-complicated code a lot. You'd have to be
careful about ALL vs. DISTINCT for instance.

Anyway, as David said, this is not a bug. If it's something you
really want to see happen, try writing a patch yourself.

regards, tom lane

David Rowley

dgrowleyml@gmail.com

12 months ago

In reply to: Tom Lane (#3)

Re: BUG #18904: INTERSECT with an impossible where should eliminate both from the query plan

On Mon, 28 Apr 2025 at 06:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Anyway, as David said, this is not a bug. If it's something you
really want to see happen, try writing a patch yourself.

This is already somewhat better in v18 thanks to [1]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=276279295.
generate_nonunion_paths() already has some code to put the smallest
child on the left, and the recent changes in nodeSetOp.c means very
little work will be done when the left side is empty.

David

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=276279295

Jinhui Lai

jh.lai@qq.com

12 months ago

In reply to: David Rowley (#4)

Re: BUG #18904: INTERSECT with an impossible where should eliminateboth from the query plan

Dear PostgreSQL Developer,

My point is that this kind of query, which is always logically empty, should not be executed at all. Executing it not only wastes time, but more concerningly, it can lead to incorrect results. I have discovered bugs in MySQL, MariaDB, and TiDB where such queries, which should have returned empty result sets, instead produced incorrect non-empty results due to being actually executed. Since I’ve only recently started working with PostgreSQL in the past two days, I haven’t yet tested whether it exhibits similar bugs. If I do find any such cases, I will provide concrete examples. Therefore, my additional point is that avoiding the execution of such queries can fundamentally eliminate these types of logical bugs at their source.

Best regrerds,
原始邮件

发件人：David Rowley <dgrowleyml@gmail.com>
发件时间：2025年4月29日 19:57
收件人：Tom Lane <tgl@sss.pgh.pa.us>
抄送：David G. Johnston <david.g.johnston@gmail.com>, jh.lai@qq.com <jh.lai@qq.com>, pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
主题：Re: BUG #18904: INTERSECT with an impossible where should eliminateboth from the query plan

On Mon, 28 Apr 2025 at 06:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Anyway, as David said, this is not a bug.  If it's something you
> really want to see happen, try writing a patch yourself.

This is already somewhat better in v18 thanks to [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=276279295.
generate_nonunion_paths() already has some code to put the smallest
child on the left, and the recent changes in nodeSetOp.c means very
little work will be done when the left side is empty.

David

[1]:  https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=276279295

David Rowley

dgrowleyml@gmail.com

12 months ago

In reply to: Jinhui Lai (#5)

Re: BUG #18904: INTERSECT with an impossible where should eliminateboth from the query plan

On Wed, 30 Apr 2025 at 00:13, Jinhui Lai <jh.lai@qq.com> wrote:

My point is that this kind of query, which is always logically empty,
should not be executed at all. Executing it not only wastes time, but more
concerningly, it can lead to incorrect results. I have discovered bugs in
MySQL, MariaDB, and TiDB where such queries, which should have returned
empty result sets, instead produced incorrect non-empty results due to
being actually executed. Since I’ve only recently started working with
PostgreSQL in the past two days, I haven’t yet tested whether it exhibits
similar bugs. If I do find any such cases, I will provide concrete
examples. Therefore, my additional point is that avoiding the execution of
such queries can fundamentally eliminate these types of logical bugs at
their source.

I appreciate you pointing these things out. There are certainly other
places in our codebase where we are better at optimising things when
finding a dummy relation. e.g INNER JOINs. Your part about "it can lead to
incorrect results", you've shown us no evidence of that. If you have some,
please demonstrate the incorrect results.

All we're telling you here is that you've shown us no evidence that there's
an actual bug here. Per Tom's email in [1]/messages/by-id/435085.1745776956@sss.pgh.pa.us, he doesn't seem particularly
against making improvements in this area and I'm not either, but not part
of a bug fix. Any changes for this would be for PostgreSQL 19 at the
earliest.

If you're keen to work on the patch, have a look at is_dummy_rel() in
joinrels.c. Likely that'll need to be modified to handle SubqueryScanPaths.
The bulk of the code changes after that will go
into generate_nonunion_paths(). Take note about Tom's warning about
INTERSECT ALL vs INTERSECT. If you do come up with a patch, post it to the
pgsql-hackers mailing list and explain what you'd like to do. See the
guidelines in [2]https://wiki.postgresql.org/wiki/Submitting_a_Patch.

David

[1]: /messages/by-id/435085.1745776956@sss.pgh.pa.us
[2]: https://wiki.postgresql.org/wiki/Submitting_a_Patch

Francisco Olarte

folarte@peoplecall.com

12 months ago

In reply to: Jinhui Lai (#5)

Re: BUG #18904: INTERSECT with an impossible where should eliminateboth from the query plan

On Tue, 29 Apr 2025 at 18:13, Jinhui Lai <jh.lai@qq.com> wrote:

My point is that this kind of query, which is always logically empty,
should not be executed at all. Executing it not only wastes time, but more
concerningly, it can lead to incorrect results. I have discovered bugs in
MySQL, MariaDB, and TiDB where such queries, which should have returned
empty result sets, instead produced incorrect non-empty results due to
being actually executed.

So, you fear unfound bugs.

Since I’ve only recently started working with PostgreSQL in the past two

days, I haven’t yet tested whether it exhibits similar bugs. If I do find
any such cases, I will provide concrete examples. Therefore, my additional
point is that avoiding the execution of such queries can fundamentally
eliminate these types of logical bugs at their source.

If you find bugs the team is normally very good in finding those, and any
bug you find can affect not only "logically empty" but some other queries,
so it is better to fix the real bug. In fact you stop a testing tool,
sending some logically empty queries and asserting the empty result.

OTOH adding a "logically empty prover" adds complexity, maintenance effort
and bug surface to the system. You could end up with incorrectly empty
queries due to a bug in the "prover" classifying a query as empty where it
was not. And it could, and I believe it would, result in overall decreased
performance for most of the user base.

Francisco Olarte.