BUG #18594: CASE WHEN ELSE failing to return the expected output when the same colum is used in WHEN and ELSE

Started by PG Bug reporting formover 1 year ago14 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18594
Logged by: Francisco Javier Ossandon
Email address: fco.j.ossandon@gmail.com
PostgreSQL version: 15.5
Operating system: Linux
Description:

Dear developers:
I have been using Postgres for some years now, and I just found what looks
like a bug, or at least I did not see anything in the documentation that
could explain it.

* version():
PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712
(Red Hat 7.3.1-12), 64-bit

The main issue is that "CASE WHEN THEN ELSE" is giving an expected result in
what seems like simple logic. I have a series of WHENs with conditions over
the value of a column and an ELSE where I return the same column value if
the CASEs above don't apply, and the query always returns the value of the
ELSE condition like if the WHENs above would not exist. The following query
exemplifies this, where the first CASE column gives an unexpected result
when using the same column in the ELSE, while the other CASE columns provide
the expected result when removing the ELSE or using a different column in
the ELSE or a constant string:
```
SELECT oid,
typname,
typcategory,
CASE
WHEN typcategory = 'N' THEN 'Numeric types'
WHEN typcategory = 'S' THEN 'String types'
WHEN typcategory = 'E' THEN 'Enum types'
ELSE typcategory
END AS test_case_fails,
CASE
WHEN typcategory = 'N' THEN 'Numeric types'
WHEN typcategory = 'S' THEN 'String types'
WHEN typcategory = 'E' THEN 'Enum types'
END AS test_case_works_no_else,
CASE
WHEN typcategory = 'N' THEN 'Numeric types'
WHEN typcategory = 'S' THEN 'String types'
WHEN typcategory = 'E' THEN 'Enum types'
ELSE typname
END AS test_case_works_other_col,
CASE
WHEN typcategory = 'N' THEN 'Numeric types'
WHEN typcategory = 'S' THEN 'String types'
WHEN typcategory = 'E' THEN 'Enum types'
ELSE 'ELSE'
END AS test_case_works_constant
FROM pg_type
ORDER BY oid
LIMIT 10
;
```
This returns the following:
```
oid|typname
|typcategory|test_case_fails|test_case_works_no_else|test_case_works_other_col|test_case_works_constant|
---+----------+-----------+---------------+-----------------------+-------------------------+------------------------+
16|bool |B |B | |bool
|ELSE |
17|bytea |U |U | |bytea
|ELSE |
18|char |Z |Z | |char
|ELSE |
19|name |S |S |String types |String
types |String types |
20|int8 |N |N |Numeric types |Numeric
types |Numeric types |
21|int2 |N |N |Numeric types |Numeric
types |Numeric types |
22|int2vector|A |A |
|int2vector |ELSE |
23|int4 |N |N |Numeric types |Numeric
types |Numeric types |
24|regproc |N |N |Numeric types |Numeric
types |Numeric types |
25|text |S |S |String types |String
types |String types |
```
It looks weird, so I'm reporting this to you for review.

Best regards,

#2Andrew Dunstan
andrew@dunslane.net
In reply to: PG Bug reporting form (#1)
Re: BUG #18594: CASE WHEN ELSE failing to return the expected output when the same colum is used in WHEN and ELSE

On 2024-08-28 We 12:17 PM, PG Bug reporting form wrote:

The following bug has been logged on the website:

Bug reference: 18594
Logged by: Francisco Javier Ossandon
Email address: fco.j.ossandon@gmail.com
PostgreSQL version: 15.5
Operating system: Linux
Description:

Dear developers:
I have been using Postgres for some years now, and I just found what looks
like a bug, or at least I did not see anything in the documentation that
could explain it.

* version():
PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712
(Red Hat 7.3.1-12), 64-bit

The main issue is that "CASE WHEN THEN ELSE" is giving an expected result in
what seems like simple logic. I have a series of WHENs with conditions over
the value of a column and an ELSE where I return the same column value if
the CASEs above don't apply, and the query always returns the value of the
ELSE condition like if the WHENs above would not exist. The following query
exemplifies this, where the first CASE column gives an unexpected result
when using the same column in the ELSE, while the other CASE columns provide
the expected result when removing the ELSE or using a different column in
the ELSE or a constant string:
```
SELECT oid,
typname,
typcategory,
CASE
WHEN typcategory = 'N' THEN 'Numeric types'
WHEN typcategory = 'S' THEN 'String types'
WHEN typcategory = 'E' THEN 'Enum types'
ELSE typcategory
END AS test_case_fails,
CASE
WHEN typcategory = 'N' THEN 'Numeric types'
WHEN typcategory = 'S' THEN 'String types'
WHEN typcategory = 'E' THEN 'Enum types'
END AS test_case_works_no_else,
CASE
WHEN typcategory = 'N' THEN 'Numeric types'
WHEN typcategory = 'S' THEN 'String types'
WHEN typcategory = 'E' THEN 'Enum types'
ELSE typname
END AS test_case_works_other_col,
CASE
WHEN typcategory = 'N' THEN 'Numeric types'
WHEN typcategory = 'S' THEN 'String types'
WHEN typcategory = 'E' THEN 'Enum types'
ELSE 'ELSE'
END AS test_case_works_constant
FROM pg_type
ORDER BY oid
LIMIT 10
;
```
This returns the following:
```
oid|typname
|typcategory|test_case_fails|test_case_works_no_else|test_case_works_other_col|test_case_works_constant|
---+----------+-----------+---------------+-----------------------+-------------------------+------------------------+
16|bool |B |B | |bool
|ELSE |
17|bytea |U |U | |bytea
|ELSE |
18|char |Z |Z | |char
|ELSE |
19|name |S |S |String types |String
types |String types |
20|int8 |N |N |Numeric types |Numeric
types |Numeric types |
21|int2 |N |N |Numeric types |Numeric
types |Numeric types |
22|int2vector|A |A |
|int2vector |ELSE |
23|int4 |N |N |Numeric types |Numeric
types |Numeric types |
24|regproc |N |N |Numeric types |Numeric
types |Numeric types |
25|text |S |S |String types |String
types |String types |
```
It looks weird, so I'm reporting this to you for review.

The bug is in the query, not in Postgres. Try casting typcategory to
text in your failing case to see the difference. As it is the literals
are being cast to "char" because that's what typcategory is.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#3Francisco J. Ossandón
fco.j.ossandon@gmail.com
In reply to: Andrew Dunstan (#2)
Re: BUG #18594: CASE WHEN ELSE failing to return the expected output when the same colum is used in WHEN and ELSE

Hello Andrew,
I thought about that before writing and tried casting to TEXT in the WHEN
comparison bits but it still failed.
CASE
WHEN typcategory::TEXT = 'N' THEN 'Numeric types'
WHEN typcategory::TEXT = 'S' THEN 'String types'
WHEN typcategory::TEXT = 'E' THEN 'Enum types'
ELSE typcategory
END AS test_case_fails,

After your answer, I tried again but this time adding the casting to the
ELSE too, and then it worked:
CASE
WHEN typcategory::TEXT = 'N' THEN 'Numeric types'
WHEN typcategory::TEXT = 'S' THEN 'String types'
WHEN typcategory::TEXT = 'E' THEN 'Enum types'
ELSE typcategory::TEXT

So is the ELSE column hijacking the data type of the whole expression?

Cheers,

Francisco

El mié, 28 ago 2024 a las 13:57, Andrew Dunstan (<andrew@dunslane.net>)
escribió:

On 2024-08-28 We 12:17 PM, PG Bug reporting form wrote:

The following bug has been logged on the website:

Bug reference: 18594
Logged by: Francisco Javier Ossandon
Email address: fco.j.ossandon@gmail.com
PostgreSQL version: 15.5
Operating system: Linux
Description:

Dear developers:
I have been using Postgres for some years now, and I just found what

looks

like a bug, or at least I did not see anything in the documentation that
could explain it.

* version():
PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1

20180712

(Red Hat 7.3.1-12), 64-bit

The main issue is that "CASE WHEN THEN ELSE" is giving an expected

result in

what seems like simple logic. I have a series of WHENs with conditions

over

the value of a column and an ELSE where I return the same column value if
the CASEs above don't apply, and the query always returns the value of

the

ELSE condition like if the WHENs above would not exist. The following

query

exemplifies this, where the first CASE column gives an unexpected result
when using the same column in the ELSE, while the other CASE columns

provide

the expected result when removing the ELSE or using a different column in
the ELSE or a constant string:
```
SELECT oid,
typname,
typcategory,
CASE
WHEN typcategory = 'N' THEN 'Numeric types'
WHEN typcategory = 'S' THEN 'String types'
WHEN typcategory = 'E' THEN 'Enum types'
ELSE typcategory
END AS test_case_fails,
CASE
WHEN typcategory = 'N' THEN 'Numeric types'
WHEN typcategory = 'S' THEN 'String types'
WHEN typcategory = 'E' THEN 'Enum types'
END AS test_case_works_no_else,
CASE
WHEN typcategory = 'N' THEN 'Numeric types'
WHEN typcategory = 'S' THEN 'String types'
WHEN typcategory = 'E' THEN 'Enum types'
ELSE typname
END AS test_case_works_other_col,
CASE
WHEN typcategory = 'N' THEN 'Numeric types'
WHEN typcategory = 'S' THEN 'String types'
WHEN typcategory = 'E' THEN 'Enum types'
ELSE 'ELSE'
END AS test_case_works_constant
FROM pg_type
ORDER BY oid
LIMIT 10
;
```
This returns the following:
```
oid|typname

|typcategory|test_case_fails|test_case_works_no_else|test_case_works_other_col|test_case_works_constant|

---+----------+-----------+---------------+-----------------------+-------------------------+------------------------+

16|bool |B |B | |bool
|ELSE |
17|bytea |U |U | |bytea
|ELSE |
18|char |Z |Z | |char
|ELSE |
19|name |S |S |String types

|String

types |String types |
20|int8 |N |N |Numeric types

|Numeric

types |Numeric types |
21|int2 |N |N |Numeric types

|Numeric

types |Numeric types |
22|int2vector|A |A |
|int2vector |ELSE |
23|int4 |N |N |Numeric types

|Numeric

types |Numeric types |
24|regproc |N |N |Numeric types

|Numeric

types |Numeric types |
25|text |S |S |String types

|String

types |String types |
```
It looks weird, so I'm reporting this to you for review.

The bug is in the query, not in Postgres. Try casting typcategory to
text in your failing case to see the difference. As it is the literals
are being cast to "char" because that's what typcategory is.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

--
Francisco J. Ossandon
Bioinformatician
Ph.D. in Biotechnology

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Francisco J. Ossandón (#3)
Re: BUG #18594: CASE WHEN ELSE failing to return the expected output when the same colum is used in WHEN and ELSE

=?UTF-8?Q?Francisco_J=2E_Ossand=C3=B3n?= <fco.j.ossandon@gmail.com> writes:

So is the ELSE column hijacking the data type of the whole expression?

It's the only CASE result that is supplying a definite type at all.
But see

https://www.postgresql.org/docs/current/typeconv-union-case.html

particularly the footnote to the bit about "Select the first
non-unknown input type as the candidate type, then consider
each other non-unknown input type, left to right."

The WHEN clauses have exactly nothing to do with the result type
of the CASE: it's the THEN and ELSE clauses that supply the result.

regards, tom lane

#5Francisco J. Ossandón
fco.j.ossandon@gmail.com
In reply to: Tom Lane (#4)
Re: BUG #18594: CASE WHEN ELSE failing to return the expected output when the same colum is used in WHEN and ELSE

Hello Tom,
Thanks for the explanation and the link to the documentation. I understand
now what happened.
So it was a mistake on my side.
Thanks again for the patience and replies.

Best regards,

Francisco

El mié, 28 ago 2024 a las 20:11, Tom Lane (<tgl@sss.pgh.pa.us>) escribió:

=?UTF-8?Q?Francisco_J=2E_Ossand=C3=B3n?= <fco.j.ossandon@gmail.com>
writes:

So is the ELSE column hijacking the data type of the whole expression?

It's the only CASE result that is supplying a definite type at all.
But see

https://www.postgresql.org/docs/current/typeconv-union-case.html

particularly the footnote to the bit about "Select the first
non-unknown input type as the candidate type, then consider
each other non-unknown input type, left to right."

The WHEN clauses have exactly nothing to do with the result type
of the CASE: it's the THEN and ELSE clauses that supply the result.

regards, tom lane

--
Francisco J. Ossandon
Bioinformatician
Ph.D. in Biotechnology

#6Chris BSomething
xpusostomos@gmail.com
In reply to: Tom Lane (#4)
Re: BUG #18594: CASE WHEN ELSE failing to return the expected output when the same colum is used in WHEN and ELSE

In the 6 rules for type resolution, rule 5 says "select the first
non-unknown input type as candidate, and consider if other non-unknown
types can be implicitly converted." ... should there not be a new rule,
something like considering if the unknown types can be converted, then
notice that a long string can't be converted to a char, because it won't
fit? And then fail with an interesting error?

On Thu, 29 Aug 2024 at 08:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

=?UTF-8?Q?Francisco_J=2E_Ossand=C3=B3n?= <fco.j.ossandon@gmail.com>
writes:

So is the ELSE column hijacking the data type of the whole expression?

It's the only CASE result that is supplying a definite type at all.
But see

https://www.postgresql.org/docs/current/typeconv-union-case.html

particularly the footnote to the bit about "Select the first
non-unknown input type as the candidate type, then consider
each other non-unknown input type, left to right."

The WHEN clauses have exactly nothing to do with the result type
of the CASE: it's the THEN and ELSE clauses that supply the result.

regards, tom lane

#7David G. Johnston
david.g.johnston@gmail.com
In reply to: Chris BSomething (#6)
Re: BUG #18594: CASE WHEN ELSE failing to return the expected output when the same colum is used in WHEN and ELSE

On Wed, Jan 22, 2025 at 3:00 AM Chris BSomething <xpusostomos@gmail.com>
wrote:

In the 6 rules for type resolution, rule 5 says "select the first
non-unknown input type as candidate, and consider if other non-unknown
types can be implicitly converted." ... should there not be a new rule,
something like considering if the unknown types can be converted, then
notice that a long string can't be converted to a char, because it won't
fit? And then fail with an interesting error?

Do you have an example demonstrating this new rule? I'm doubting this
resolution logic goes into that much depth regarding the semantics of each
type. It is implied that even if an implicit cast exists from type A to
type B that actual values of type A may fail during the casting process to
make them type B. That would be the interesting error you'd end up seeing
- but from the type casting/input system, not the type resolution one.

David J.

#8Chris BSomething
xpusostomos@gmail.com
In reply to: David G. Johnston (#7)
Re: BUG #18594: CASE WHEN ELSE failing to return the expected output when the same colum is used in WHEN and ELSE

On Tue, Feb 18, 2025, 6:31 AM David G. Johnston <david.g.johnston@gmail.com>
wrote:

On Wed, Jan 22, 2025 at 3:00 AM Chris BSomething <xpusostomos@gmail.com>
wrote:

In the 6 rules for type resolution, rule 5 says "select the first
non-unknown input type as candidate, and consider if other non-unknown
types can be implicitly converted." ... should there not be a new rule,
something like considering if the unknown types can be converted, then
notice that a long string can't be converted to a char, because it won't
fit? And then fail with an interesting error?

Do you have an example demonstrating this new rule? I'm doubting this
resolution logic goes into that much depth regarding the semantics of each
type. It is implied that even if an implicit cast exists from type A to
type B that actual values of type A may fail during the casting process to
make them type B. That would be the interesting error you'd end up seeing
- but from the type casting/input system, not the type resolution one.

David J.

That sounds like a lot of words to justify an error that could be caught at
parse time, that is actually never caught at all. It seems to me that
whatever type is inferred from a long constant string, it won't fit into a
char... At least not without casting it. And therefore an error should come
out.

Show quoted text
#9David G. Johnston
david.g.johnston@gmail.com
In reply to: Chris BSomething (#8)
Re: BUG #18594: CASE WHEN ELSE failing to return the expected output when the same colum is used in WHEN and ELSE

On Mon, Feb 17, 2025 at 3:49 PM Chris BSomething <xpusostomos@gmail.com>
wrote:

That sounds like a lot of words to justify an error that could be caught
at parse time, that is actually never caught at all. It seems to me that
whatever type is inferred from a long constant string, it won't fit into a
char... At least not without casting it. And therefore an error should come
out.

I thought you were trying to convince people to modify the documentation?
This reads more like wanting to hack the parser to teach it forbidden
knowledge.

In any case I think this email report is closed and burying any discussion
about this material here just makes it harder for people to find.

David J.

#10Chris BSomething
xpusostomos@gmail.com
In reply to: David G. Johnston (#9)
Re: BUG #18594: CASE WHEN ELSE failing to return the expected output when the same colum is used in WHEN and ELSE

On Tue, Feb 18, 2025, 7:13 AM David G. Johnston <david.g.johnston@gmail.com>
wrote:

On Mon, Feb 17, 2025 at 3:49 PM Chris BSomething <xpusostomos@gmail.com>
wrote:

That sounds like a lot of words to justify an error that could be caught
at parse time, that is actually never caught at all. It seems to me that
whatever type is inferred from a long constant string, it won't fit into a
char... At least not without casting it. And therefore an error should come
out.

I thought you were trying to convince people to modify the documentation?
This reads more like wanting to hack the parser to teach it forbidden
knowledge.

Hack? Forbidden knowledge? What an odd comment.

Show quoted text
#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#7)
Re: BUG #18594: CASE WHEN ELSE failing to return the expected output when the same colum is used in WHEN and ELSE

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Wed, Jan 22, 2025 at 3:00 AM Chris BSomething <xpusostomos@gmail.com>
wrote:

... should there not be a new rule,
something like considering if the unknown types can be converted, then
notice that a long string can't be converted to a char, because it won't
fit? And then fail with an interesting error?

Do you have an example demonstrating this new rule? I'm doubting this
resolution logic goes into that much depth regarding the semantics of each
type. It is implied that even if an implicit cast exists from type A to
type B that actual values of type A may fail during the casting process to
make them type B. That would be the interesting error you'd end up seeing
- but from the type casting/input system, not the type resolution one.

Yeah. We document somewhere that the contents of a literal string are
*not* considered while deciding what type it should be taken as; thus,
the possibility of a casting failure is not part of the rules.

I think such a rule would be pretty dangerous, even if we could
implement it easily. As an example, if you entered something you
meant as a timestamp, but you fat-fingered the punctuation or
something, it'd likely fall back to being considered just "text",
potentially silently changing the semantics of the expression. We had
a lot of trouble with that sort of behavior back when the system had a
lot of implicit casts to text. We got rid of (most of) those in 8.3
or thereabouts, and it made the semantics generally a lot safer.

regards, tom lane

#12Chris BSomething
xpusostomos@gmail.com
In reply to: Tom Lane (#11)
Re: BUG #18594: CASE WHEN ELSE failing to return the expected output when the same colum is used in WHEN and ELSE

On Tue, 18 Feb 2025 at 10:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah. We document somewhere that the contents of a literal string are
*not* considered while deciding what type it should be taken as; thus,
the possibility of a casting failure is not part of the rules.

The length of a literal is not strictly speaking its "contents". One could
for example, have its type as varchar(x) where x is its length. And then
not have implicit typecasts from varchar(x) to varchar( < x).

I think such a rule would be pretty dangerous, even if we could
implement it easily. As an example, if you entered something you
meant as a timestamp, but you fat-fingered the punctuation or
something, it'd likely fall back to being considered just "text",

I don't get that, I thought timestamps have to be preceded by "TIMESTAMP".

potentially silently changing the semantics of the expression. We had

a lot of trouble with that sort of behavior back when the system had a
lot of implicit casts to text. We got rid of (most of) those in 8.3
or thereabouts, and it made the semantics generally a lot safer.

So... you solved that, it sounds like.

And yet this super dangerous behavior which could sit dormant and
undiscovered in a view in a large computer system, doesn't bother anyone?
OK, whatever.

Chris

#13David G. Johnston
david.g.johnston@gmail.com
In reply to: Chris BSomething (#12)
Re: BUG #18594: CASE WHEN ELSE failing to return the expected output when the same colum is used in WHEN and ELSE

On Wednesday, February 19, 2025, Chris BSomething <xpusostomos@gmail.com>
wrote:

And yet this super dangerous behavior which could sit dormant and
undiscovered in a view in a large computer system, doesn't bother anyone?
OK, whatever.

There a quite a few of those in SQL…testing queries is a requirement. And
being explicit with types is strongly suggested.

To be clear, the complaint is that we allow input functions such as
varchar(n) and char to truncate their inputs as opposed to erroring out if
the input length is too long. Right? If not I’m still confused as to what
exact behavior you believe is dangerous.

David J.

#14Vik Fearing
vik@postgresfriends.org
In reply to: Chris BSomething (#12)
Re: BUG #18594: CASE WHEN ELSE failing to return the expected output when the same colum is used in WHEN and ELSE

On 19/02/2025 13:50, Chris BSomething wrote:

On Tue, 18 Feb 2025 at 10:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think such a rule would be pretty dangerous, even if we could
implement it easily.  As an example, if you entered something you
meant as a timestamp, but you fat-fingered the punctuation or
something, it'd likely fall back to being considered just "text",

I don't get that, I thought timestamps have to be preceded by "TIMESTAMP".

The SQL standard requires you to do that, but PostgreSQL is a bit more
lax about it (and in my humble opinion, thankfully so). If you did
prefix your literals with their type, you wouldn't be having these
problems in the first place. The problem stems from postgres treating
your unmarked literal as "unknown" and then trying to figure out what
the heck it is.

--

Vik Fearing