BUG #17637: case-when branches taken even if they dont match, raising errors

Started by PG Bug reporting formover 3 years ago6 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 17637
Logged by: Facundo Etchezar
Email address: hctf90@gmail.com
PostgreSQL version: 14.5
Operating system: Debian 11.3.0-3
Description:

Hi! I've come across some weird behavior. I'm inserting a row into a table,
and I parse/cast a text column in one way or the other depending on the
result of a join with a case-when expression. The issue is that for some
reason it seems the branches that aren't taken are evaluated anyway, which
in turn raises a cast error. Here is a tiny repro where I make two tables
and then try to insert in the test table a float8 or bool value based on the
result of the joined tmap table.

This tiny repro below raises the error SQL Error [22P02]: ERROR: invalid
input syntax for type boolean: "123.4" Like it's trying to parse the text
column as bool even if it shouldn't reach that part of the case-when.

drop table if exists test;
drop table if exists tmap;
create table test(
id int8,
vf float8,
vb bool
);
create table tmap(
id int8,
mapped_to int8
);
insert into tmap values(1, 1);
insert into tmap values(2, 2);
insert into test
with tmp as (select 1::int8 id, '123.4'::text v)
select t.id,
case m.mapped_to when 1 then v::float8 else null end,
case m.mapped_to when 2 then v::bool else null end
from tmp t
join tmap m on m.id = t.id;

Weirdly enough if you forego the join, doing this below, it works fine:

drop table if exists test;
create table test(
id int8,
vf float8,
vb bool
);
insert into test
with tmp as (select 1::int8 id, '123.4'::text v)
select t.id,
case t.id when 1 then v::float8 else null end,
case t.id when 2 then v::bool else null end
from tmp t;

This should result in the same behavior yet it works fine, without throwing
an error.

I've also tested this in https://www.db-fiddle.com with versions 15 beta,
13, 12. All with the same error. Versions 11 and 10 seem to work fine.

Thank you for your time.

#2Richard Guo
guofenglinux@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #17637: case-when branches taken even if they dont match, raising errors

On Thu, Oct 13, 2022 at 6:08 PM PG Bug reporting form <
noreply@postgresql.org> wrote:

drop table if exists test;
drop table if exists tmap;
create table test(
id int8,
vf float8,
vb bool
);
create table tmap(
id int8,
mapped_to int8
);
insert into tmap values(1, 1);
insert into tmap values(2, 2);
insert into test
with tmp as (select 1::int8 id, '123.4'::text v)
select t.id,
case m.mapped_to when 1 then v::float8 else null end,
case m.mapped_to when 2 then v::bool else null end
from tmp t
join tmap m on m.id = t.id;

I think this has something to do with the CTE used here. In
preprocess_expression, we do not know the value of m.mapped_to, so we
cannot tell the test condition is constant FALSE. Thus we need go on
processing the result. But thanks to the CTE, we know t.v is const
'123.4'::text, and we want to convert it to boolean, which triggers the
error.

I'm not sure about this being a bug.

Thanks
Richard

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#2)
Re: BUG #17637: case-when branches taken even if they dont match, raising errors

Richard Guo <guofenglinux@gmail.com> writes:

On Thu, Oct 13, 2022 at 6:08 PM PG Bug reporting form <
noreply@postgresql.org> wrote:

with tmp as (select 1::int8 id, '123.4'::text v)
select t.id,
case m.mapped_to when 1 then v::float8 else null end,
case m.mapped_to when 2 then v::bool else null end
from tmp t
join tmap m on m.id = t.id;

I'm not sure about this being a bug.

I'm not buying it either. The point is that the constants from the
WITH clause get pulled up into the outer query, whereupon you have

case m.mapped_to when 2 then '123.4'::text::bool else null end

and then we apply constant-folding which tries to perform the bool
conversion. (There are some folding rules whereby if a WHEN condition
reduces to constant true or constant false, we drop all the
therefore-unreachable THEN/ELSE arms without folding them --- but
that doesn't help here since m.mapped_to isn't a constant.)

I'm not especially eager to lobotomize the const-folding rules
in order to make toy examples like this one work. I don't think
it's representative of real queries; but we *would* be penalizing
real queries if we didn't perform such folding.

I believe you could dodge the issue in this particular case
by marking the WITH query as MATERIALIZED, which'll serve as
an optimization fence to prevent the constants from being
hoisted into the outer query.

regards, tom lane

#4Facundo Etchezar
hctf90@gmail.com
In reply to: Tom Lane (#3)
Re: BUG #17637: case-when branches taken even if they dont match, raising errors

I see. So is this behavior expected? The two snippets should work the same
when you look at them but one errors out and the other doesn't. I'm
thinking either both should work or both should error out.

I had this issue on an insert from a NiFi instance that received JSON data
(field sensor measurements from edge devices) and inserted it into a
PostgreSQL instance. The snippet is a short "toy example" simply because I
managed to make an easy reproduction case for your convenience to test it
easily, that's all.

That being said, using the materialized CTE makes it work just fine, so
I'll have this in mind if it happens again. Thank you.

On Thu, Oct 13, 2022 at 11:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Richard Guo <guofenglinux@gmail.com> writes:

On Thu, Oct 13, 2022 at 6:08 PM PG Bug reporting form <
noreply@postgresql.org> wrote:

with tmp as (select 1::int8 id, '123.4'::text v)
select t.id,
case m.mapped_to when 1 then v::float8 else null end,
case m.mapped_to when 2 then v::bool else null end
from tmp t
join tmap m on m.id = t.id;

I'm not sure about this being a bug.

I'm not buying it either. The point is that the constants from the
WITH clause get pulled up into the outer query, whereupon you have

case m.mapped_to when 2 then '123.4'::text::bool else null end

and then we apply constant-folding which tries to perform the bool
conversion. (There are some folding rules whereby if a WHEN condition
reduces to constant true or constant false, we drop all the
therefore-unreachable THEN/ELSE arms without folding them --- but
that doesn't help here since m.mapped_to isn't a constant.)

I'm not especially eager to lobotomize the const-folding rules
in order to make toy examples like this one work. I don't think
it's representative of real queries; but we *would* be penalizing
real queries if we didn't perform such folding.

I believe you could dodge the issue in this particular case
by marking the WITH query as MATERIALIZED, which'll serve as
an optimization fence to prevent the constants from being
hoisted into the outer query.

regards, tom lane

#5David G. Johnston
david.g.johnston@gmail.com
In reply to: Facundo Etchezar (#4)
Re: BUG #17637: case-when branches taken even if they dont match, raising errors

On Fri, Oct 14, 2022 at 12:23 AM Facundo Etchezar <hctf90@gmail.com> wrote:

I see. So is this behavior expected? The two snippets should work the same
when you look at them but one errors out and the other doesn't. I'm
thinking either both should work or both should error out.

While reasonable, this particular dynamic falls into a grey area. SQL is a
strongly typed language and having a column of data that is conditionally
in different data formats is not really compatible with that design. The
CASE expression does allow for handling of this typically but that only
works during execution - and in this case the problematic optimization is
happening during parsing. More people write structurally correct
inefficient queries than non-structurally correct ones and so the parsing
time optimization stays as-is. As noted, you have a way to prohibit the
optimization from revealing the design problem with your query.

David J.

#6Facundo Etchezar
hctf90@gmail.com
In reply to: David G. Johnston (#5)
Re: BUG #17637: case-when branches taken even if they dont match, raising errors

SQL is a strongly typed language and having a column of data that is
conditionally in different data formats is not really compatible with that
design.

I completely understand the argument of not touching the optimizer for
these cases, especially since I have a way to make it work which Tom
provided. But this is really not what is happening in the query.

I'm doing the equivalent of this piece of pseudo-code:

text v = '1234';
int8 id_type = 1;
bool bool_col;
int8 int_col;
if(id_type == 1) {
int_col = int8.parse(v);
} else {
int_col = null;
}
if(id_type == 2) {
bool_col = bool.parse(v);
} else {
bool_col = null;
}

And this query is executing both conditionals as if the condition is true
in both cases which is impossible, id_type can't be both values at the same
time. There is no dynamic typing whatsoever, it's all strongly typed, it's
just a very normal conditional operation.

Although this matter is way beyond the initial issue, which I consider
resolved since the 'materialized' keyword lets me make my original insert
just fine.

On Fri, Oct 14, 2022 at 10:28 AM David G. Johnston <
david.g.johnston@gmail.com> wrote:

Show quoted text

On Fri, Oct 14, 2022 at 12:23 AM Facundo Etchezar <hctf90@gmail.com>
wrote:

I see. So is this behavior expected? The two snippets should work the
same when you look at them but one errors out and the other doesn't. I'm
thinking either both should work or both should error out.

While reasonable, this particular dynamic falls into a grey area. SQL is
a strongly typed language and having a column of data that is conditionally
in different data formats is not really compatible with that design. The
CASE expression does allow for handling of this typically but that only
works during execution - and in this case the problematic optimization is
happening during parsing. More people write structurally correct
inefficient queries than non-structurally correct ones and so the parsing
time optimization stays as-is. As noted, you have a way to prohibit the
optimization from revealing the design problem with your query.

David J.