BUG #18051: char(N) and varchar(N) behave opposite to the documentation

Started by PG Bug reporting formover 2 years ago11 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18051
Logged by: Nicolas Gouteux
Email address: nicolas.gouteux@sonarsource.com
PostgreSQL version: 13.7
Operating system: Linux
Description:

Hi

Since SQL inception, char(N) dataype is supposed to add padding blanks up to
N characters, whereas varchar(N) is supposed to truncate extra trailing
white spaces.

This is confirmed by the documentation:
If the string to be stored is shorter than the declared length, values of
type character will be space-padded
https://www.postgresql.org/docs/13/datatype-character.html

However, the following snippet exhibits the exact opposite behavior:

create table if not exists ngx_char (
charcol char(10) not null,
varcharcol varchar(10) not null
);
truncate table ngx_char;
insert into ngx_char (charcol, varcharcol) values ('A', 'A');
insert into ngx_char (charcol, varcharcol) values ('A ', 'A ');
select charcol || 'B', varcharcol || 'C' from ngx_char;
select * from ngx_char where charcol = varcharcol;

Output:

?column?,?column?
AB,AC
AB,A C

This is very strange! Is it me?

Thanks

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation

On Wednesday, August 9, 2023, PG Bug reporting form <noreply@postgresql.org>
wrote:

The following bug has been logged on the website:

Bug reference: 18051
Logged by: Nicolas Gouteux
Email address: nicolas.gouteux@sonarsource.com
PostgreSQL version: 13.7
Operating system: Linux
Description:

create table if not exists ngx_char (
charcol char(10) not null,
varcharcol varchar(10) not null
);
truncate table ngx_char;
insert into ngx_char (charcol, varcharcol) values ('A', 'A');
insert into ngx_char (charcol, varcharcol) values ('A ', 'A ');
select charcol || 'B', varcharcol || 'C' from ngx_char;
select * from ngx_char where charcol = varcharcol;

Output:

?column?,?column?
AB,AC
AB,A C

This is very strange! Is it me?

This is working as designed and documented. The concatenation to text
removes the insignificant white space in char. There is no manipulation of
content for varchar.

Just don’t use char. There is no good reason to deal with its implicit
behaviors.

David J.

#3Nicolas Gouteux
nicolas.gouteux@sonarsource.com
In reply to: David G. Johnston (#2)
Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation

Hi

Sorry, but I still do not get it:
- char type adds padding up to its length
- select length(charcol), char_length(charcoal)
both yield 1
- how can the length of a char(10) be different than 10 since is right
padded?
- I am pretty sure that Oracle/MS-SQL/Sybase would say that length is 10...

Nicolas Gouteux | Sonar

https://sonarsource.com
Are you using SonarLint <https://www.sonarlint.org&gt; in your IDE?

On Wed, 9 Aug 2023 at 14:57, David G. Johnston <david.g.johnston@gmail.com>
wrote:

Show quoted text

On Wednesday, August 9, 2023, PG Bug reporting form <
noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 18051
Logged by: Nicolas Gouteux
Email address: nicolas.gouteux@sonarsource.com
PostgreSQL version: 13.7
Operating system: Linux
Description:

create table if not exists ngx_char (
charcol char(10) not null,
varcharcol varchar(10) not null
);
truncate table ngx_char;
insert into ngx_char (charcol, varcharcol) values ('A', 'A');
insert into ngx_char (charcol, varcharcol) values ('A ', 'A ');
select charcol || 'B', varcharcol || 'C' from ngx_char;
select * from ngx_char where charcol = varcharcol;

Output:

?column?,?column?
AB,AC
AB,A C

This is very strange! Is it me?

This is working as designed and documented. The concatenation to text
removes the insignificant white space in char. There is no manipulation of
content for varchar.

Just don’t use char. There is no good reason to deal with its implicit
behaviors.

David J.

#4Nicolas Gouteux
nicolas.gouteux@sonarsource.com
In reply to: Nicolas Gouteux (#3)
Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation

Just tried that on Oracle, and of course, the length is 10... (see pic
attached)
[image: image.png]

Nicolas Gouteux | Sonar

https://sonarsource.com
Are you using SonarLint <https://www.sonarlint.org&gt; in your IDE?

On Wed, 9 Aug 2023 at 16:04, Nicolas Gouteux <
nicolas.gouteux@sonarsource.com> wrote:

Show quoted text

Hi

Sorry, but I still do not get it:
- char type adds padding up to its length
- select length(charcol), char_length(charcoal)
both yield 1
- how can the length of a char(10) be different than 10 since is right
padded?
- I am pretty sure that Oracle/MS-SQL/Sybase would say that length is 10...

Nicolas Gouteux | Sonar

https://sonarsource.com
Are you using SonarLint <https://www.sonarlint.org&gt; in your IDE?

On Wed, 9 Aug 2023 at 14:57, David G. Johnston <david.g.johnston@gmail.com>
wrote:

On Wednesday, August 9, 2023, PG Bug reporting form <
noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 18051
Logged by: Nicolas Gouteux
Email address: nicolas.gouteux@sonarsource.com
PostgreSQL version: 13.7
Operating system: Linux
Description:

create table if not exists ngx_char (
charcol char(10) not null,
varcharcol varchar(10) not null
);
truncate table ngx_char;
insert into ngx_char (charcol, varcharcol) values ('A', 'A');
insert into ngx_char (charcol, varcharcol) values ('A ', 'A ');
select charcol || 'B', varcharcol || 'C' from ngx_char;
select * from ngx_char where charcol = varcharcol;

Output:

?column?,?column?
AB,AC
AB,A C

This is very strange! Is it me?

This is working as designed and documented. The concatenation to text
removes the insignificant white space in char. There is no manipulation of
content for varchar.

Just don’t use char. There is no good reason to deal with its implicit
behaviors.

David J.

Attachments:

image.pngimage/png; name=image.pngDownload
#5David Rowley
dgrowleyml@gmail.com
In reply to: Nicolas Gouteux (#3)
Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation

On Thu, 10 Aug 2023 at 02:05, Nicolas Gouteux
<nicolas.gouteux@sonarsource.com> wrote:

Sorry, but I still do not get it:

I think the key is this sentence from the documentation:

"Trailing spaces are removed when converting a character value to one
of the other string types."

Your concatenation example from earlier required that the type be
implicitly cast to another type as we have no char-to-char
concatenation operator, per:

postgres=# select oprname,oprleft::regtype,oprright::regtype,oprcode
from pg_operator where oprname = '||';
oprname | oprleft | oprright | oprcode
---------+--------------------+--------------------+-----------------
|| | anycompatiblearray | anycompatible | array_append
|| | anycompatible | anycompatiblearray | array_prepend
|| | anycompatiblearray | anycompatiblearray | array_cat
|| | text | text | textcat
|| | bit varying | bit varying | bitcat
|| | bytea | bytea | byteacat
|| | text | anynonarray | textanycat
|| | anynonarray | text | anytextcat
|| | tsvector | tsvector | tsvector_concat
|| | tsquery | tsquery | tsquery_or
|| | jsonb | jsonb | jsonb_concat
(11 rows)

Concatenating two char(10)s would just use textcat(), so the above
line from the doc applies since type conversion is required.

- char type adds padding up to its length
- select length(charcol), char_length(charcoal)
both yield 1

There is a length function (bpcharlen) for char, per:

postgres=# select proargtypes[0]::Regtype,prosrc from pg_proc where
proname = 'length';
proargtypes | prosrc
-------------+--------------------
text | textlen
character | bpcharlen
lseg | lseg_length
path | path_length
bytea | length_in_encoding
bit | bitlength
bytea | byteaoctetlen
tsvector | tsvector_length
(8 rows)

However, that function does:

/* get number of bytes, ignoring trailing spaces */
len = bcTruelen(arg);

and that wasn't by accident, per:

commit f27976c85b1fb9002727cce65b9f9567e158f754
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun Feb 1 06:27:48 2004 +0000

Make length() disregard trailing spaces in char(n) values, per discussion
some time ago and recent patch from Gavin Sherry. Update documentation
to point out that trailing spaces are insignificant in char(n).

unfortunately, we didn't link to discussions in commit messages back
then. It might be worth you searching the archives shortly before
that date to see if anything interesting comes up.

As mentioned by David, there's not much call for using char(N) in
PostgreSQL. I don't know the history, but I always imagined char(N)
existence was owed to improved internal optimizations in RDBMS
implementations that might have been possible if the tuples were
fixed-sized. We have no such advantages in PostgreSQL as even a
char(N) will be stored as a variable length field. In theory, we could
take some advantage in that as tuple deformation becomes less
efficient for columns that come after a variable length field due to
the offset into the tuple not being fixed, however, because we store
char(N)s as variable length, we can't take advantage of that and it's
too late as changing it would change the binary format of the type.

David

#6David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#5)
Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation

On Thu, 10 Aug 2023 at 02:37, David Rowley <dgrowleyml@gmail.com> wrote:

unfortunately, we didn't link to discussions in commit messages back
then. It might be worth you searching the archives shortly before
that date to see if anything interesting comes up.

I got curious:

/messages/by-id/13994.1075617087@sss.pgh.pa.us

David

#7Nicolas Gouteux
nicolas.gouteux@sonarsource.com
In reply to: David Rowley (#5)
Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation

Hi David

Thanks a lot for the explanation.

Don't get me wrong, I almost never used the char datatype for my almost 40
years of DB programming.

Char is a reminiscence from the time when VAX and IBM machines had a single
mantra: fixed length data
The fact that white spaces are insignificant is all good. Nobody would like
to pad these days.
I hold for proof that client code is full of ltrim() rtrim() all the time...

I was just trying to show more junior programmers the difference between
the 'assumed' padded char and the more modern 'trimming' varchar
So imagine my surprise when both length() and concatenation() yielded the
exactly opposite behavior than expected!

I come from Sybase & Oracle, and I can tell you data length(char_type)
always return the length of the field.

I don't have any issue with this discrepancy with other vendors, but I
believe it's good to know (and maybe advertise in the docs?

Thanks again for taking the time to respond

Nicolas Gouteux | Sonar

https://sonarsource.com
Are you using SonarLint <https://www.sonarlint.org&gt; in your IDE?

On Wed, 9 Aug 2023 at 16:38, David Rowley <dgrowleyml@gmail.com> wrote:

Show quoted text

On Thu, 10 Aug 2023 at 02:05, Nicolas Gouteux
<nicolas.gouteux@sonarsource.com> wrote:

Sorry, but I still do not get it:

I think the key is this sentence from the documentation:

"Trailing spaces are removed when converting a character value to one
of the other string types."

Your concatenation example from earlier required that the type be
implicitly cast to another type as we have no char-to-char
concatenation operator, per:

postgres=# select oprname,oprleft::regtype,oprright::regtype,oprcode
from pg_operator where oprname = '||';
oprname | oprleft | oprright | oprcode
---------+--------------------+--------------------+-----------------
|| | anycompatiblearray | anycompatible | array_append
|| | anycompatible | anycompatiblearray | array_prepend
|| | anycompatiblearray | anycompatiblearray | array_cat
|| | text | text | textcat
|| | bit varying | bit varying | bitcat
|| | bytea | bytea | byteacat
|| | text | anynonarray | textanycat
|| | anynonarray | text | anytextcat
|| | tsvector | tsvector | tsvector_concat
|| | tsquery | tsquery | tsquery_or
|| | jsonb | jsonb | jsonb_concat
(11 rows)

Concatenating two char(10)s would just use textcat(), so the above
line from the doc applies since type conversion is required.

- char type adds padding up to its length
- select length(charcol), char_length(charcoal)
both yield 1

There is a length function (bpcharlen) for char, per:

postgres=# select proargtypes[0]::Regtype,prosrc from pg_proc where
proname = 'length';
proargtypes | prosrc
-------------+--------------------
text | textlen
character | bpcharlen
lseg | lseg_length
path | path_length
bytea | length_in_encoding
bit | bitlength
bytea | byteaoctetlen
tsvector | tsvector_length
(8 rows)

However, that function does:

/* get number of bytes, ignoring trailing spaces */
len = bcTruelen(arg);

and that wasn't by accident, per:

commit f27976c85b1fb9002727cce65b9f9567e158f754
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun Feb 1 06:27:48 2004 +0000

Make length() disregard trailing spaces in char(n) values, per
discussion
some time ago and recent patch from Gavin Sherry. Update documentation
to point out that trailing spaces are insignificant in char(n).

unfortunately, we didn't link to discussions in commit messages back
then. It might be worth you searching the archives shortly before
that date to see if anything interesting comes up.

As mentioned by David, there's not much call for using char(N) in
PostgreSQL. I don't know the history, but I always imagined char(N)
existence was owed to improved internal optimizations in RDBMS
implementations that might have been possible if the tuples were
fixed-sized. We have no such advantages in PostgreSQL as even a
char(N) will be stored as a variable length field. In theory, we could
take some advantage in that as tuple deformation becomes less
efficient for columns that come after a variable length field due to
the offset into the tuple not being fixed, however, because we store
char(N)s as variable length, we can't take advantage of that and it's
too late as changing it would change the binary format of the type.

David

#8Nicolas Gouteux
nicolas.gouteux@sonarsource.com
In reply to: David Rowley (#6)
Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation

Thanks! Very interesting
At least I am not (yet) totally crazy!
Cheers

Nicolas Gouteux | Sonar

https://sonarsource.com
Are you using SonarLint <https://www.sonarlint.org&gt; in your IDE?

On Wed, 9 Aug 2023 at 16:43, David Rowley <dgrowleyml@gmail.com> wrote:

Show quoted text

On Thu, 10 Aug 2023 at 02:37, David Rowley <dgrowleyml@gmail.com> wrote:

unfortunately, we didn't link to discussions in commit messages back
then. It might be worth you searching the archives shortly before
that date to see if anything interesting comes up.

I got curious:

/messages/by-id/13994.1075617087@sss.pgh.pa.us

David

#9Félix GERZAGUET
felix.gerzaguet@gmail.com
In reply to: Nicolas Gouteux (#7)
Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation

Hello Nicolas,

On Wed, Aug 9, 2023 at 4:47 PM Nicolas Gouteux <
nicolas.gouteux@sonarsource.com> wrote:

I was just trying to show more junior programmers the difference between
the 'assumed' padded char and the more modern 'trimming' varchar
So imagine my surprise when both length() and concatenation() yielded the
exactly opposite behavior than expected!

For such pedagogic purpose you can use:

select length(cast('A' as char(4))::bytea)

But you then have to explain the ::bytea trick :-)

Best Regards,

#10David Rowley
dgrowleyml@gmail.com
In reply to: Nicolas Gouteux (#7)
Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation

On Thu, 10 Aug 2023 at 02:47, Nicolas Gouteux
<nicolas.gouteux@sonarsource.com> wrote:

I don't have any issue with this discrepancy with other vendors, but I believe it's good to know (and maybe advertise in the docs?

Maybe it's worth noting it down in [1]https://www.postgresql.org/docs/current/functions-string.html in char_length and length.

Looking at [2]/messages/by-id/Pine.LNX.4.58.0401271806250.22203@linuxworld.com.au, it does not look like they were able to glean much
guidance from the SQL standard on this. It's late here, but it seems
to me that if it was left as it was, then the user could have had a
choice by using length(rtrim(col)), but if we strip them out and the
user wants to get the full padded width, it's much harder to do maybe
with pg_column_size() and some insider knowledge on when we use 1-byte
headers and when we use 4-byte headers.

Anyway, 2004 was a long time ago. I can't imagine we could possibly
make such a change today to put it back. We might even struggle if
the SQL standard was more clear on it (I've not looked again to check
if there've been improvements from what was found in 2004).

David

[1]: https://www.postgresql.org/docs/current/functions-string.html
[2]: /messages/by-id/Pine.LNX.4.58.0401271806250.22203@linuxworld.com.au

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#10)
Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation

David Rowley <dgrowleyml@gmail.com> writes:

Anyway, 2004 was a long time ago. I can't imagine we could possibly
make such a change today to put it back.

Yeah. IMV, char(N) is a legacy type with legacy behaviors, and
we shouldn't change those behaviors for fear of breaking legacy
applications that might expect them. If you don't like the way it
works, don't use char(N).

BTW, as far as the question of better optimization of fixed-width
fields goes, we couldn't do that anyway with char(N) except in the
ever-more-minority case of single-byte database encoding. That's
because N is counted in characters not bytes (as is quite clear
from the SQL standard, even if their opinion about trailing spaces
is less clear). I think that's a primary reason why nobody has
bothered to pursue such an optimization, and in turn that's why
char(N) is now such a backwater.

regards, tom lane