BUG #16368: Incorrect function inlining in the presence of a window function

Started by PG Bug reporting formalmost 6 years ago8 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 16368
Logged by: Elvis Pranskevichus
Email address: elprans@gmail.com
PostgreSQL version: 12.2
Operating system: Gentoo Linux
Description:

Consider the following function:

CREATE OR REPLACE FUNCTION intfmt(input text, fmt text)
RETURNS bigint
LANGUAGE sql
AS $$
SELECT
CASE WHEN fmt IS NULL
THEN input::bigint
ELSE to_number(input, fmt)::bigint
END;
$$;

And the following query:

SELECT
intfmt('123,456', q.fmt) AS "out"
FROM
(SELECT
("v" = first_value("v") OVER ()),
'999,999' AS "fmt"
FROM
(SELECT 1 AS "v") AS "q2"
) AS "q";

The expected result is the integer 123456, but the query fails with:

ERROR: invalid input syntax for type bigint: "123,456"
CONTEXT: SQL function "intfmt" during inlining

Which means that somehow during inlining of "intfmt" Postgres incorrectly
takes the first branch in the `CASE` expression. This only happens in the
presence of the "first_value" window call in the nested query.

Elvis

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #16368: Incorrect function inlining in the presence of a window function

On Wed, Apr 15, 2020 at 11:07 AM PG Bug reporting form <
noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 16368
Logged by: Elvis Pranskevichus
Email address: elprans@gmail.com
PostgreSQL version: 12.2
Operating system: Gentoo Linux
Description:

Consider the following function:

CREATE OR REPLACE FUNCTION intfmt(input text, fmt text)
[...]
SELECT
CASE WHEN fmt IS NULL
THEN input::bigint
ELSE to_number(input, fmt)::bigint
END;
[...]
SELECT
[...]

intfmt('123,456', q.fmt) AS "out"

The expected result is the integer 123456, but the query fails with:

ERROR: invalid input syntax for type bigint: "123,456"
CONTEXT: SQL function "intfmt" during inlining

Which means that somehow during inlining of "intfmt" Postgres incorrectly
takes the first branch in the `CASE` expression.

During inlining the case expression becomes:

CASE WHEN q.fmt IS NULL
THEN '123,456'::bigint
ELSE to_number('123,456', q.fmt)
END;

It doesn't "take" a branch - it turns variables into constants and, as
written, some of those constants are invalid for the types they are being
assigned to.

This only happens in the
presence of the "first_value" window call in the nested query.

The ability to optimize, and how, depends on the whole query.

I don't actually know whether this is a bug or just an expected downside to
using inline-able functions and case statements to avoid malformed data
parsing.

Writing the function in pl/pgsql prevents the inlining and stabilizes the
query.

David J.

#3Elvis Pranskevichus
elprans@gmail.com
In reply to: David G. Johnston (#2)
Re: BUG #16368: Incorrect function inlining in the presence of a window function

On Wednesday, April 15, 2020 11:32:54 A.M. PDT David G. Johnston wrote:

During inlining the case expression becomes:

CASE WHEN q.fmt IS NULL
THEN '123,456'::bigint
ELSE to_number('123,456', q.fmt)
END;

It doesn't "take" a branch - it turns variables into constants and, as
written, some of those constants are invalid for the types they are
being assigned to.

This only happens in the
presence of the "first_value" window call in the nested query.

The ability to optimize, and how, depends on the whole query.

I don't actually know whether this is a bug or just an expected
downside to using inline-able functions and case statements to avoid
malformed data parsing.

Writing the function in pl/pgsql prevents the inlining and stabilizes
the query.

IMO, an optimization that leads to wrong query results is unquestionably
a bug. And "use pl/pgsql" (which is much slower) is arguably not the
best answer here.

inline-able functions and case statements to avoid
malformed data parsing.

Consider any other case where an error is guarded by a "CASE WHEN", such
as division by zero. I think the use of "CASE WHEN" should disqualify
the function from being inlined, or, maybe, constant folding should be
disabled in the branches of "CASE WHEN" when inlining and when the
optimizer is unable to reason about the "CASE" condition.

Thoughts?

Elvis

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: Elvis Pranskevichus (#3)
Re: BUG #16368: Incorrect function inlining in the presence of a window function

On Wed, Apr 15, 2020 at 1:08 PM Elvis Pranskevichus <elprans@gmail.com>
wrote:

On Wednesday, April 15, 2020 11:32:54 A.M. PDT David G. Johnston wrote:

During inlining the case expression becomes:

CASE WHEN q.fmt IS NULL
THEN '123,456'::bigint
ELSE to_number('123,456', q.fmt)
END;

It doesn't "take" a branch - it turns variables into constants and, as
written, some of those constants are invalid for the types they are
being assigned to.

This only happens in the
presence of the "first_value" window call in the nested query.

The ability to optimize, and how, depends on the whole query.

I don't actually know whether this is a bug or just an expected
downside to using inline-able functions and case statements to avoid
malformed data parsing.

Writing the function in pl/pgsql prevents the inlining and stabilizes
the query.

IMO, an optimization that leads to wrong query results is unquestionably
a bug. And "use pl/pgsql" (which is much slower) is arguably not the
best answer here.

Maybe not, but if you want something that works it is a solution.

inline-able functions and case statements to avoid
malformed data parsing.

Consider any other case where an error is guarded by a "CASE WHEN", such
as division by zero.

There aren't all that many and throwing out a perfectly good optimization
for boundary cases that will error, as opposed to returning but including
invalid results, isn't that desirable either.

I think the use of "CASE WHEN" should disqualify

the function from being inlined, or, maybe, constant folding should be
disabled in the branches of "CASE WHEN" when inlining and when the
optimizer is unable to reason about the "CASE" condition.

Thoughts?

Outside my wheelhouse as to what is practical. I suspect we'll get a
better answer on that front in due time.
The world isn't perfect and I sure cannot write a patch to change this
behavior and for my money its something I could live with if it was decided
that this is the best option at this time.

David J.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#4)
Re: BUG #16368: Incorrect function inlining in the presence of a window function

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Wed, Apr 15, 2020 at 1:08 PM Elvis Pranskevichus <elprans@gmail.com>
wrote:

Consider any other case where an error is guarded by a "CASE WHEN", such
as division by zero.

There aren't all that many and throwing out a perfectly good optimization
for boundary cases that will error, as opposed to returning but including
invalid results, isn't that desirable either.

In point of fact, there are many ways in which CASE and related constructs
fail to guarantee evaluation order, as noted in
https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-EXPRESS-EVAL

The particular case mentioned there seems to be about the same as here:
constant-folding happens even in CASE arms that will never be reached at
runtime. Problems could happen earlier than that, too. It's not hard to
imagine cases that wouldn't execute "as desired" unless we didn't even do
parse analysis (e.g, subexpression type determination) on a CASE arm until
it's reached at runtime. But we're not going to make that sort of thing
happen; it's just too much contortion and inefficiency for too little
benefit. In particular, people for whom the current implementation works
OK would be quite unhappy with the performance impact of de-optimizing
CASE that way.

regards, tom lane

#6Elvis Pranskevichus
elprans@gmail.com
In reply to: Tom Lane (#5)
Re: BUG #16368: Incorrect function inlining in the presence of a window function

On Wednesday, April 15, 2020 5:14:05 P.M. PDT Tom Lane wrote:

In point of fact, there are many ways in which CASE and related
constructs fail to guarantee evaluation order, as noted in
https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-EX
PRESS-EVAL

The particular case mentioned there seems to be about the same as
here: constant-folding happens even in CASE arms that will never be
reached at runtime.

Yes, but function arguments aren't constants are they? At least the
documentation makes no effort to mention that.

would be quite unhappy with the performance impact of de-optimizing
CASE that way.

I'm not arguing for the general de-optimization for CASE, just for not
treating arguments of inlined functions as constants in the CASE
statement. For arguments of a prepared statement this optimization
makes even less sense.

Elvis

#7Elvis Pranskevichus
elprans@gmail.com
In reply to: Tom Lane (#5)
Re: BUG #16368: Incorrect function inlining in the presence of a window function

The particular case mentioned there seems to be about the same as
here: constant-folding happens even in CASE arms that will never be
reached at runtime.

Since the actual issue here is false positive exceptions, why can't we
treat errors raised during const eval as a signal to cancel the constant
folding, i.e. ignore them and keep the failing expression unoptimized?

Elvis

#8Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Elvis Pranskevichus (#6)
Re: BUG #16368: Incorrect function inlining in the presence of a window function

At Wed, 15 Apr 2020 18:17:30 -0700, Elvis Pranskevichus <elprans@gmail.com> wrote in

On Wednesday, April 15, 2020 5:14:05 P.M. PDT Tom Lane wrote:

In point of fact, there are many ways in which CASE and related
constructs fail to guarantee evaluation order, as noted in
https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-EX
PRESS-EVAL

The particular case mentioned there seems to be about the same as
here: constant-folding happens even in CASE arms that will never be
reached at runtime.

Yes, but function arguments aren't constants are they? At least the
documentation makes no effort to mention that.

would be quite unhappy with the performance impact of de-optimizing
CASE that way.

I'm not arguing for the general de-optimization for CASE, just for not
treating arguments of inlined functions as constants in the CASE
statement. For arguments of a prepared statement this optimization
makes even less sense.

Perhaps there's seems to be no nice way to detect the case where an
arm in a case expression cannot even be executed although the
expression is seemingly constant-foldable. (Note that all arms of the
case has a chance to be reached since the case-expression is NOT a
constant at the parse time.) Isn't there any workaround usable for
you? I'm not sure what is your real issue, but it seems to me
unnatural that the first argument of intfmt is a literal string.

For example, the following query works.

SELECT
intfmt(q.data, q.fmt) AS "out"
FROM
(SELECT
("v" = first_value("v") OVER ()),
'123,456' AS data, '999,999' AS fmt
FROM
(SELECT 1 AS "v") AS "q2"
) AS "q";

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center