Make length(char(n)) return 'true' length

Started by Gavin Sherryalmost 22 years ago7 messages
#1Gavin Sherry
swm@linuxworld.com.au
1 attachment(s)

The attached patch changes the existing behaviour of length(char(n)).
Currently, this is what happens:

template1=# select length('blah'::char(10));
length
--------
10
(1 row)

With this patch:

template1=# select length('blah'::char(10));
length
--------
4
(1 row)

This behaviour was proposed by Tom back in November last year. (I have
tried to handle multibyte correctly but probably haven't -- hence my email
hackers instead of patches).

The spec doesn't give us any insight (as far as I can tell) as to how we
should do it length(char(n)), but the above seems consistent with other
parts of the code (eg, comparison functions, concatenation).

SQL200X has these choice paragraphs for those who are interested:

<length expression> returns the length of a given character string,
as an exact numeric value, in characters or octets according to the
choice of function.

And:

the result is the number of explicit or implicit
<char length units> in <char length expression>, counted in
accordance with the definition of those units in the relevant
normatively referenced document.

I have no idea what the 'normatively referenced document' is, but grep-ing
through all of SQL200X, 99 and 92 didn't reveal anything too interesting.

Gavin

Attachments:

bpcharlen.patchtext/plain; charset=US-ASCII; name=bpcharlen.patchDownload
Index: src/backend/utils/adt/varchar.c
===================================================================
RCS file: /usr/local/cvsroot/pgsql-server/src/backend/utils/adt/varchar.c,v
retrieving revision 1.103
diff -2 -c -r1.103 varchar.c
*** src/backend/utils/adt/varchar.c	29 Nov 2003 19:51:59 -0000	1.103
--- src/backend/utils/adt/varchar.c	27 Jan 2004 06:18:33 -0000
***************
*** 511,522 ****
  {
  	BpChar	   *arg = PG_GETARG_BPCHAR_P(0);
  
  	/* optimization for single byte encoding */
  	if (pg_database_encoding_max_length() <= 1)
! 		PG_RETURN_INT32(VARSIZE(arg) - VARHDRSZ);
  
! 	PG_RETURN_INT32(
! 			  pg_mbstrlen_with_len(VARDATA(arg), VARSIZE(arg) - VARHDRSZ)
! 		);
  }
  
--- 511,530 ----
  {
  	BpChar	   *arg = PG_GETARG_BPCHAR_P(0);
+ 	int			len;
+ 	char	   *str;
  
+ 	len = bcTruelen(arg);
+ 	
  	/* optimization for single byte encoding */
  	if (pg_database_encoding_max_length() <= 1)
! 		PG_RETURN_INT32(len);
  
! 	str = palloc(len);
! 	StrNCpy(str,VARDATA(arg),len);
! 
! 	len = pg_mbstrlen_with_len(str, len);
! 	pfree(str);
! 
! 	PG_RETURN_INT32(len);
  }
  
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gavin Sherry (#1)
Re: Make length(char(n)) return 'true' length

Gavin Sherry <swm@linuxworld.com.au> writes:

The attached patch changes the existing behaviour of length(char(n)).

Applied, with some simplifications (there wasn't any particular need to
make a temporary copy of the string).

I also added some documentation stating that trailing spaces are
semantically insignificant in char(n), which I believe is now generally
true (have we missed any places?). There didn't seem to be any place to
specifically mention length() in this connection.

Interestingly, our regression tests did not detect this change of
behavior.

regards, tom lane

#3Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Gavin Sherry (#1)
Re: Make length(char(n)) return 'true' length

Looks good to me but I will get some other eyse on it before I apply it.

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

---------------------------------------------------------------------------

Gavin Sherry wrote:

The attached patch changes the existing behaviour of length(char(n)).
Currently, this is what happens:

template1=# select length('blah'::char(10));
length
--------
10
(1 row)

With this patch:

template1=# select length('blah'::char(10));
length
--------
4
(1 row)

This behaviour was proposed by Tom back in November last year. (I have
tried to handle multibyte correctly but probably haven't -- hence my email
hackers instead of patches).

The spec doesn't give us any insight (as far as I can tell) as to how we
should do it length(char(n)), but the above seems consistent with other
parts of the code (eg, comparison functions, concatenation).

SQL200X has these choice paragraphs for those who are interested:

<length expression> returns the length of a given character string,
as an exact numeric value, in characters or octets according to the
choice of function.

And:

the result is the number of explicit or implicit
<char length units> in <char length expression>, counted in
accordance with the definition of those units in the relevant
normatively referenced document.

I have no idea what the 'normatively referenced document' is, but grep-ing
through all of SQL200X, 99 and 92 didn't reveal anything too interesting.

Gavin

Content-Description:

[ Attachment, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#4Gavin Sherry
swm@linuxworld.com.au
In reply to: Bruce Momjian (#3)
Re: Make length(char(n)) return 'true' length

I believe Tom applied this while you were away.

Gavin

On Thu, 12 Feb 2004, Bruce Momjian wrote:

Show quoted text

Looks good to me but I will get some other eyse on it before I apply it.

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

---------------------------------------------------------------------------

Gavin Sherry wrote:

The attached patch changes the existing behaviour of length(char(n)).
Currently, this is what happens:

template1=# select length('blah'::char(10));
length
--------
10
(1 row)

With this patch:

template1=# select length('blah'::char(10));
length
--------
4
(1 row)

This behaviour was proposed by Tom back in November last year. (I have
tried to handle multibyte correctly but probably haven't -- hence my email
hackers instead of patches).

The spec doesn't give us any insight (as far as I can tell) as to how we
should do it length(char(n)), but the above seems consistent with other
parts of the code (eg, comparison functions, concatenation).

SQL200X has these choice paragraphs for those who are interested:

<length expression> returns the length of a given character string,
as an exact numeric value, in characters or octets according to the
choice of function.

And:

the result is the number of explicit or implicit
<char length units> in <char length expression>, counted in
accordance with the definition of those units in the relevant
normatively referenced document.

I have no idea what the 'normatively referenced document' is, but grep-ing
through all of SQL200X, 99 and 92 didn't reveal anything too interesting.

Gavin

Content-Description:

[ Attachment, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

--
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 359-1001
+  If your life is a hard drive,     |  13 Roberts Road
+  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#5Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Gavin Sherry (#4)
Re: Make length(char(n)) return 'true' length

Gavin Sherry wrote:

I believe Tom applied this while you were away.

Oh, sorry, I see it now:

test=> select length('blah'::char(10));
length
--------
4
(1 row)

I did test this before placing it the queue, but I now realize I have
been testing regression installs, not command-line installs that I invoke
via psql manually.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#6Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Gavin Sherry (#1)
Re: Make length(char(n)) return 'true' length

Already applied. Thanks.

---------------------------------------------------------------------------

Gavin Sherry wrote:

The attached patch changes the existing behaviour of length(char(n)).
Currently, this is what happens:

template1=# select length('blah'::char(10));
length
--------
10
(1 row)

With this patch:

template1=# select length('blah'::char(10));
length
--------
4
(1 row)

This behaviour was proposed by Tom back in November last year. (I have
tried to handle multibyte correctly but probably haven't -- hence my email
hackers instead of patches).

The spec doesn't give us any insight (as far as I can tell) as to how we
should do it length(char(n)), but the above seems consistent with other
parts of the code (eg, comparison functions, concatenation).

SQL200X has these choice paragraphs for those who are interested:

<length expression> returns the length of a given character string,
as an exact numeric value, in characters or octets according to the
choice of function.

And:

the result is the number of explicit or implicit
<char length units> in <char length expression>, counted in
accordance with the definition of those units in the relevant
normatively referenced document.

I have no idea what the 'normatively referenced document' is, but grep-ing
through all of SQL200X, 99 and 92 didn't reveal anything too interesting.

Gavin

Content-Description:

[ Attachment, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#3)
Re: Make length(char(n)) return 'true' length

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Looks good to me but I will get some other eyse on it before I apply it.

It's in already ...

regards, tom lane