BUG #17623: WHERE should be evaluated after FROM clause when operators may throw

Started by PG Bug reporting formover 3 years ago5 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 17623
Logged by: Stamatis Zampetakis
Email address: zabetak@gmail.com
PostgreSQL version: 14.5
Operating system: Debian 10.2.1-6
Description:

Steps to reproduce:
CREATE TABLE emp (empno INT, name VARCHAR, deptno INT);
INSERT INTO emp VALUES (0, 'Alex', 0);
INSERT INTO emp VALUES (10, 'Bob', 1);

CREATE TABLE dept (deptno INT);
INSERT INTO dept VALUES (1);

SELECT e.name
FROM emp e
INNER JOIN dept d ON e.deptno = d.deptno
WHERE (10 / e.empno) = 1

Actual output:
ERROR: division by zero

Expected output:
Bob

The error is caused since the filter condition in the WHERE clause is
evaluated before the join. Filter push-down is a very common and powerful
optimization but when there are operators in the WHERE clause that may throw
(such as division, cast, etc) this optimization is unsafe.

The SQL standard (Section 7.4 general rule 1) mandates that WHERE should be
applied to the result of FROM so in the case above pushing filters below the
join seems to violate the standard.

Citing the standard:
"If all optional clauses are omitted, then the result of the <table
expression> is the same as the result of the
<from clause>. Otherwise, each specified clause is applied to the result of
the previously specified clause
and the result of the <table expression> is the result of the application of
the last specified clause."

One of the optional clauses mentioned in the previous paragraph is the
<where clause>. There seems to be a clearly defined order between the <from
clause>, which includes inner joins, and the <where clause>.

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #17623: WHERE should be evaluated after FROM clause when operators may throw

On Wed, Sep 28, 2022 at 5:29 AM PG Bug reporting form <
noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 17623
Logged by: Stamatis Zampetakis
Email address: zabetak@gmail.com
PostgreSQL version: 14.5
Operating system: Debian 10.2.1-6
Description:

Steps to reproduce:
CREATE TABLE emp (empno INT, name VARCHAR, deptno INT);
INSERT INTO emp VALUES (0, 'Alex', 0);
INSERT INTO emp VALUES (10, 'Bob', 1);

CREATE TABLE dept (deptno INT);
INSERT INTO dept VALUES (1);

SELECT e.name
FROM emp e
INNER JOIN dept d ON e.deptno = d.deptno
WHERE (10 / e.empno) = 1

Actual output:
ERROR: division by zero

Expected output:
Bob

The error is caused since the filter condition in the WHERE clause is
evaluated before the join. Filter push-down is a very common and powerful
optimization but when there are operators in the WHERE clause that may
throw
(such as division, cast, etc) this optimization is unsafe.

The SQL standard (Section 7.4 general rule 1) mandates that WHERE should be
applied to the result of FROM so in the case above pushing filters below
the
join seems to violate the standard.

The failure to document such a deviation from the standard can be
considered a bug but not the deviation itself. That is intentional. In
terms of trade-offs the current behavior seems reasonable. You'd need a
real example motivating a desire to make a change that will likely add
complexity and cost to every query most of which work just fine with
relevant clauses pushed down to restrict the volume of data that needs to
be joined.

David J.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#2)
Re: BUG #17623: WHERE should be evaluated after FROM clause when operators may throw

"David G. Johnston" <david.g.johnston@gmail.com> writes:

The SQL standard (Section 7.4 general rule 1) mandates that WHERE should be
applied to the result of FROM so in the case above pushing filters below
the join seems to violate the standard.

The failure to document such a deviation from the standard can be
considered a bug but not the deviation itself. That is intentional. In
terms of trade-offs the current behavior seems reasonable.

Enforcing such a restriction would be absolutely disastrous for the
performance of a large number of real-world queries. I doubt that
any other RDBMS takes that spec wording literally either.

I'd suggest something like

CASE WHEN e.empno = 0 THEN NULL ELSE 10 / e.empno END

if you need to program around the lack of guarantees about the
order of evaluation. A WITH clause can also be used as an
optimization fence when you need one.

BTW, this *is* documented, see for example

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-EXPRESS-EVAL

regards, tom lane

#4Stamatis Zampetakis
zabetak@gmail.com
In reply to: Tom Lane (#3)
Re: BUG #17623: WHERE should be evaluated after FROM clause when operators may throw

I agree that brutally disabling filter pushdown is out of question.
However, it may not be so bad in terms of performance if done selectively
only for operators that may raise an error.

If for instance you have a big conjunction and only one conjunct contains a
cast you can push everything below the join and keep only the cast on the
top.

There may be other ways to retain performance while being inline with
standard SQL semantics; I haven't given too much though about it.

I didn't find a relevant discussion in the archives so I thought it was
worth putting this down for people who may bump into this in the future.

I don't have a strong opinion if it is worth trying to fix this or not
cause I don't really know much about Postgres internals.

In any case, it may be good to document the current behavior somewhere.

Note that the workaround with CASE WHEN can address this simplistic example
but it can easily get very complicated if you want to perform a cast based
on the result of a complex join; every join condition has to become part of
the WHEN clause.

There are probably other workarounds such as wrapping the join in a sub
query, putting the risky computation in project, and the filtering in a
outer query but these syntactic rewritings may become meaningless in the
eyes of the query optimizer. I don't know what Postgres will do in this
case but I know other optimizers that will happily pushdown the filter all
the way down and on top of the scan.

All that to say that the SQL semantics that say that WHERE is applied on
the result of FROM are quite well-known so when exceptions come into play
things get tricky.

Apologies for the long message and many thanks for taking the time to look
into this.

On Wed, Sep 28, 2022, 4:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

"David G. Johnston" <david.g.johnston@gmail.com> writes:

The SQL standard (Section 7.4 general rule 1) mandates that WHERE

should be

applied to the result of FROM so in the case above pushing filters below
the join seems to violate the standard.

The failure to document such a deviation from the standard can be
considered a bug but not the deviation itself. That is intentional. In
terms of trade-offs the current behavior seems reasonable.

Enforcing such a restriction would be absolutely disastrous for the
performance of a large number of real-world queries. I doubt that
any other RDBMS takes that spec wording literally either.

I'd suggest something like

CASE WHEN e.empno = 0 THEN NULL ELSE 10 / e.empno END

if you need to program around the lack of guarantees about the
order of evaluation. A WITH clause can also be used as an
optimization fence when you need one.

BTW, this *is* documented, see for example

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-EXPRESS-EVAL

regards, tom lane

#5Stamatis Zampetakis
zabetak@gmail.com
In reply to: Stamatis Zampetakis (#4)
Re: BUG #17623: WHERE should be evaluated after FROM clause when operators may throw

Just for the sake of completeness, (in case it was not clear in my previous
message), I would like to highlight that this problem is different from the
evaluation order of predicates in the WHERE clause, which is implementation
specific. Due to that, the documentation below (shared by Tom) does not
apply here.

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-EXPRESS-EVAL

If we want to document this deviation from the standard it should be done
in the page below:

https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-WHERE

Best,
Stamatis

On Wed, Sep 28, 2022 at 5:44 PM Stamatis Zampetakis <zabetak@gmail.com>
wrote:

Show quoted text

I agree that brutally disabling filter pushdown is out of question.
However, it may not be so bad in terms of performance if done selectively
only for operators that may raise an error.

If for instance you have a big conjunction and only one conjunct contains
a cast you can push everything below the join and keep only the cast on the
top.

There may be other ways to retain performance while being inline with
standard SQL semantics; I haven't given too much though about it.

I didn't find a relevant discussion in the archives so I thought it was
worth putting this down for people who may bump into this in the future.

I don't have a strong opinion if it is worth trying to fix this or not
cause I don't really know much about Postgres internals.

In any case, it may be good to document the current behavior somewhere.

Note that the workaround with CASE WHEN can address this simplistic
example but it can easily get very complicated if you want to perform a
cast based on the result of a complex join; every join condition has to
become part of the WHEN clause.

There are probably other workarounds such as wrapping the join in a sub
query, putting the risky computation in project, and the filtering in a
outer query but these syntactic rewritings may become meaningless in the
eyes of the query optimizer. I don't know what Postgres will do in this
case but I know other optimizers that will happily pushdown the filter all
the way down and on top of the scan.

All that to say that the SQL semantics that say that WHERE is applied on
the result of FROM are quite well-known so when exceptions come into play
things get tricky.

Apologies for the long message and many thanks for taking the time to look
into this.

On Wed, Sep 28, 2022, 4:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

The SQL standard (Section 7.4 general rule 1) mandates that WHERE

should be

applied to the result of FROM so in the case above pushing filters

below

the join seems to violate the standard.

The failure to document such a deviation from the standard can be
considered a bug but not the deviation itself. That is intentional. In
terms of trade-offs the current behavior seems reasonable.

Enforcing such a restriction would be absolutely disastrous for the
performance of a large number of real-world queries. I doubt that
any other RDBMS takes that spec wording literally either.

I'd suggest something like

CASE WHEN e.empno = 0 THEN NULL ELSE 10 / e.empno END

if you need to program around the lack of guarantees about the
order of evaluation. A WITH clause can also be used as an
optimization fence when you need one.

BTW, this *is* documented, see for example

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-EXPRESS-EVAL

regards, tom lane