WHERE column = X AND column = Y will always be zero matching rows

Started by [Quipsy] Markus Kargover 2 years ago8 messagesbugs
Jump to latest

Hello PostgreSQL Developers,

I am using PostgreSQL 15.3 (Debian 15.3-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit.

(In the following X and Y are literals; X <> Y)

I noticed is that...

EXPLAIN SELECT column FROM Table WHERE column = X AND column = Y

...says that PostgreSQL actually wants to perform a Scan!

I wonder why squandering any resources into a Scan here, as it is pretty obvious that the result is guaranteed to be always, under any conditions, and will always be: zero matching rows - at least in a universe where a single value cannot be X AND Y at the same time.

This can be seen as a chance to add an optimization ("col = X AND col = Y always is zero matching rows), or in case such an optimization already exists in PostgreSQL, as a bug.

BTW the same happens for column IS NULL AND column = Y.

Regards,
-Markus

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: [Quipsy] Markus Karg (#1)
Re: WHERE column = X AND column = Y will always be zero matching rows

On Thursday, August 3, 2023, [Quipsy] Markus Karg <karg@quipsy.de> wrote:

EXPLAIN SELECT column FROM Table WHERE column = X AND column = Y

…says that PostgreSQL actually wants to perform a Scan!

I wonder why squandering any resources into a Scan here

Mostly because no one wants to volunteer to write code to deal with poorly
written queries, nor, more technically, expend compute resources looking
for situations that should not happen.

David J.

#3Laurenz Albe
laurenz.albe@cybertec.at
In reply to: [Quipsy] Markus Karg (#1)
Re: WHERE column = X AND column = Y will always be zero matching rows

On Thu, 2023-08-03 at 14:20 +0000, [Quipsy] Markus Karg wrote:

I am using PostgreSQL 15.3 (Debian 15.3-1.pgdg120+1) on aarch64-unknown-linux-gnu,
compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit.
 
(In the following X and Y are literals; X <> Y)
 
I noticed is that…
 
EXPLAIN SELECT column FROM Table WHERE column = X AND column = Y
 
…says that PostgreSQL actually wants to perform a Scan!

This is not a bug.

I wonder why squandering any resources into a Scan here, as it is pretty obvious that the
result is guaranteed to be always, under any conditions, and will always be: zero matching
rows – at least in a universe where a single value cannot be X AND Y at the same time.

Such an optimization, while certainly possible, would not be free, and each
query with more than one WHERE condition would have to pay the price.
On the other hand, only ill-written queries would benefit.

Since PostgreSQL only caches execution plans in special cases, anything that
slows down the optimizer should improve enough statements that it is a net win.
I don't see that here (but that is of course a matter of opinion).

Yours,
Laurenz Albe

In reply to: David G. Johnston (#2)
AW: WHERE column = X AND column = Y will always be zero matching rows

David,

thank you for your opinion, so it is a chance for future optimization.

Whether or not this „should not happen“ is a different discussion. 😉

-Markus

Von: David G. Johnston <david.g.johnston@gmail.com>
Gesendet: Donnerstag, 3. August 2023 16:48
An: [Quipsy] Markus Karg <karg@quipsy.de>
Cc: pgsql-bugs@lists.postgresql.org
Betreff: Re: WHERE column = X AND column = Y will always be zero matching rows

On Thursday, August 3, 2023, [Quipsy] Markus Karg <karg@quipsy.de<mailto:karg@quipsy.de>> wrote:
EXPLAIN SELECT column FROM Table WHERE column = X AND column = Y

…says that PostgreSQL actually wants to perform a Scan!

I wonder why squandering any resources into a Scan here

Mostly because no one wants to volunteer to write code to deal with poorly written queries, nor, more technically, expend compute resources looking for situations that should not happen.

David J.

#5David G. Johnston
david.g.johnston@gmail.com
In reply to: [Quipsy] Markus Karg (#4)
Re: WHERE column = X AND column = Y will always be zero matching rows

On Thu, Aug 3, 2023, 07:53 [Quipsy] Markus Karg <karg@quipsy.de> wrote:

David,

thank you for your opinion, so it is a chance for future optimization.

Whether or not this „should not happen“ is a different discussion. 😉

Given that this has come up a number of times over the decades with the
same response the practical reality of an optimization are basically zero.
You want to patch your own fork then sure, it is technically possible.

David J.

Show quoted text
#6Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: [Quipsy] Markus Karg (#1)
Re: WHERE column = X AND column = Y will always be zero matching rows

On Thu, Aug 3, 2023 at 9:21 AM [Quipsy] Markus Karg <karg@quipsy.de> wrote:

Hello PostgreSQL Developers,

I am using PostgreSQL 15.3 (Debian 15.3-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit.

(In the following X and Y are literals; X <> Y)
I noticed is that…

EXPLAIN SELECT column FROM Table WHERE column = X AND column = Y

…says that PostgreSQL actually wants to perform a Scan!

Actually no... you're original complain, never execute the scan
(because of the filter is recognized as false)
the one, with "col is null and col = Y" on the other side do execute the scan

postgres=# create table t1 (id int);
CREATE TABLE
postgres=# insert into t1 select generate_series(1, 100000);
INSERT 0 100000

postgres=# explain select * from t1 where id =5 and id =6;
QUERY PLAN
-----------------------------------------------------------
Result (cost=0.00..1855.06 rows=1 width=4)
One-Time Filter: false
-> Seq Scan on t1 (cost=0.00..1855.06 rows=1 width=4)
Filter: (id = 5)
(4 filas)

postgres=# explain analyze select * from t1 where id =5 and id =6;
QUERY PLAN
---------------------------------------------------------------------------------------
Result (cost=0.00..1693.00 rows=1 width=4) (actual time=0.005..0.007
rows=0 loops=1)
One-Time Filter: false
-> Seq Scan on t1 (cost=0.00..1693.00 rows=1 width=4) (never executed)
Filter: (id = 5)
Planning Time: 0.200 ms
Execution Time: 0.056 ms
(6 filas)

--
Jaime Casanova
SYSTEMGUARDS

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: [Quipsy] Markus Karg (#1)
Re: WHERE column = X AND column = Y will always be zero matching rows

"[Quipsy] Markus Karg" <karg@quipsy.de> writes:

(In the following X and Y are literals; X <> Y)
I noticed is that...
EXPLAIN SELECT column FROM Table WHERE column = X AND column = Y
...says that PostgreSQL actually wants to perform a Scan!

You should learn to read plans before complaining about them.

What you actually get is something like

regression=# explain analyze select * from tenk1 where hundred = 42 and hundred = 99;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Result (cost=5.04..224.95 rows=1 width=244) (actual time=0.001..0.002 rows=0 loops=1)
One-Time Filter: false
-> Bitmap Heap Scan on tenk1 (cost=5.04..224.95 rows=1 width=244) (never executed)
Recheck Cond: (hundred = 42)
-> Bitmap Index Scan on tenk1_hundred (cost=0.00..5.04 rows=100 width=0) (never executed)
Index Cond: (hundred = 42)
Planning Time: 0.148 ms
Execution Time: 0.046 ms
(8 rows)

See the "never executed" markings? The one-time filter prevents
anything below the Result from actually doing anything.

From memory, the reason it's like this is that it's the most
straightforward extension of cases involving pseudoconstant
conditions, that is parameters not known at plan time.
If you had something like "WHERE var = param1 AND var = const1"
then you might get something involving a "param1 = const1"
one-time filter, and below that a plan to retrieve rows with
"var = const1", which would only be run when the run-time-supplied
parameter happens to match the constant.

David's statement that we don't want to write code for such cases
isn't completely true, but it is true that we don't want to expend
code on producing a pretty-looking plan for them.

BTW the same happens for column IS NULL AND column = Y.

That indeed is a case we haven't bothered with, and probably
never will. The multiple-constants-equated-to-the-same-thing
situation is dealt with by the equivalence class machinery,
which has many other purposes but happens to be able to detect
that situation cheaply. But IS NULL is not an equivalence
condition (as we define those, anyway). Likewise, we won't
reduce something like "var > C1 AND var < C1" to constant-false;
there's just not enough win probability to justify spending
planner cycles looking for such cases.

regards, tom lane

In reply to: Tom Lane (#7)
AW: WHERE column = X AND column = Y will always be zero matching rows

Tom,

thanks a lot for your kind explanation, this was very interesting to read!

In fact, I have missed the "never executed" marks.

BTW, I did not "complain". I just noticed something and was unclear if this is a bug or not, and I clearly told so in the opening of my posting. It was definitively not my intention to bother or annoy anybody, actually. Sorry to have troubled so many in this mailing list.

-Markus

-----Ursprüngliche Nachricht-----
Von: Tom Lane <tgl@sss.pgh.pa.us>
Gesendet: Donnerstag, 3. August 2023 19:59
An: [Quipsy] Markus Karg <karg@quipsy.de>
Cc: pgsql-bugs@lists.postgresql.org
Betreff: Re: WHERE column = X AND column = Y will always be zero matching rows

"[Quipsy] Markus Karg" <karg@quipsy.de> writes:

(In the following X and Y are literals; X <> Y) I noticed is that...
EXPLAIN SELECT column FROM Table WHERE column = X AND column = Y
...says that PostgreSQL actually wants to perform a Scan!

You should learn to read plans before complaining about them.

What you actually get is something like

regression=# explain analyze select * from tenk1 where hundred = 42 and hundred = 99;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Result (cost=5.04..224.95 rows=1 width=244) (actual time=0.001..0.002 rows=0 loops=1)
One-Time Filter: false
-> Bitmap Heap Scan on tenk1 (cost=5.04..224.95 rows=1 width=244) (never executed)
Recheck Cond: (hundred = 42)
-> Bitmap Index Scan on tenk1_hundred (cost=0.00..5.04 rows=100 width=0) (never executed)
Index Cond: (hundred = 42) Planning Time: 0.148 ms Execution Time: 0.046 ms
(8 rows)

See the "never executed" markings? The one-time filter prevents anything below the Result from actually doing anything.

From memory, the reason it's like this is that it's the most straightforward extension of cases involving pseudoconstant conditions, that is parameters not known at plan time.
If you had something like "WHERE var = param1 AND var = const1"
then you might get something involving a "param1 = const1"
one-time filter, and below that a plan to retrieve rows with "var = const1", which would only be run when the run-time-supplied parameter happens to match the constant.

David's statement that we don't want to write code for such cases isn't completely true, but it is true that we don't want to expend code on producing a pretty-looking plan for them.

BTW the same happens for column IS NULL AND column = Y.

That indeed is a case we haven't bothered with, and probably never will. The multiple-constants-equated-to-the-same-thing
situation is dealt with by the equivalence class machinery, which has many other purposes but happens to be able to detect that situation cheaply. But IS NULL is not an equivalence condition (as we define those, anyway). Likewise, we won't reduce something like "var > C1 AND var < C1" to constant-false; there's just not enough win probability to justify spending planner cycles looking for such cases.

regards, tom lane