Substring auto trim

Started by Charles O'Farrellover 16 years ago8 messagesbugs
Jump to latest
#1Charles O'Farrell
charleso@gmail.com

Hi guys,

I'm not sure whether this a really dumb question, but I'm curious as to what
might be the problem.

We have a column 'foo' which is of type character (not varying).

select substr(foo, 1, 10) from bar

The result of this query are values whose trailing spaces have been trimmed
automatically. This causes incorrect results when comparing to a value that
may contain trailing spaces.

select * from bar where substr(foo, 1, 4) = 'AB '

I should mention that we normally run Oracle and DB2 (and have done for many
years), but I have been pushing for Postgres as an alternative.
Fortunately this is all handled through Hibernate, and so for now I have
wrapped the substr command in rpad which seems to do the trick.

Any light you can shed on this issue would be much appreciated.

Cheers,

Charles O'Farrell

PostgreSQL 8.4.2 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu
4.4.1-4ubuntu8) 4.4.1, 32-bit

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Charles O'Farrell (#1)
Re: Substring auto trim

Hello

2010/1/13 Charles O'Farrell <charleso@gmail.com>:

Hi guys,

I'm not sure whether this a really dumb question, but I'm curious as to what
might be the problem.

We have a column 'foo' which is of type character (not varying).

select substr(foo, 1, 10) from bar

The result of this query are values whose trailing spaces have been trimmed
automatically. This causes incorrect results when comparing to a value that
may contain trailing spaces.

select * from bar where substr(foo, 1, 4) = 'AB  '

You have to write C function substr for type "any" :( Because "char"
and char(n) are two different types, and you cannot to write function
for char(n)

I should mention that we normally run Oracle and DB2 (and have done for many
years), but I have been pushing for Postgres as an alternative.
Fortunately this is all handled through Hibernate, and so for now I have
wrapped the substr command in rpad which seems to do the trick.

Any light you can shed on this issue would be much appreciated.

Function substr has first parameter of type "text". When pg call this
function, then it does conversion from char(x) to text.

Regards
Pavel Stehule

Show quoted text

Cheers,

Charles O'Farrell

PostgreSQL 8.4.2 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu
4.4.1-4ubuntu8) 4.4.1, 32-bit

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#2)
Re: Substring auto trim

2010/1/13 Pavel Stehule <pavel.stehule@gmail.com>:

Hello

2010/1/13 Charles O'Farrell <charleso@gmail.com>:

Hi guys,

I'm not sure whether this a really dumb question, but I'm curious as to what
might be the problem.

We have a column 'foo' which is of type character (not varying).

select substr(foo, 1, 10) from bar

The result of this query are values whose trailing spaces have been trimmed
automatically. This causes incorrect results when comparing to a value that
may contain trailing spaces.

select * from bar where substr(foo, 1, 4) = 'AB  '

You have to write C function substr for type "any" :( Because "char"
and char(n) are two different types, and you cannot to write function
for char(n)

I should mention that we normally run Oracle and DB2 (and have done for many
years), but I have been pushing for Postgres as an alternative.
Fortunately this is all handled through Hibernate, and so for now I have
wrapped the substr command in rpad which seems to do the trick.

Any light you can shed on this issue would be much appreciated.

I thing, so there is workaround,

create or replace function substr(character, int, int) returns character as $$
select substr($1::cstring::text,$2,$3)
$$ language sql;

postgres=# create table f(a character(5));
CREATE TABLE
postgres=# insert into f values('a'),('ab'),('abc');
INSERT 0 3
postgres=# select * from f;
a
-------
a
ab
abc
(3 rows)

postgres=# select * from f where substr(a,1,3) = 'a ';
a
-------
a
(1 row)

postgres=# select * from f where substr(a,1,3) = 'ab ';
a
-------
ab
(1 row)

Regards
Pavel Stehule

Show quoted text

Function substr has first parameter of type "text". When pg call this
function, then it does conversion from char(x) to text.

Regards
Pavel Stehule

Cheers,

Charles O'Farrell

PostgreSQL 8.4.2 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu
4.4.1-4ubuntu8) 4.4.1, 32-bit

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Charles O'Farrell (#1)
Re: Substring auto trim

"Charles O'Farrell" <charleso@gmail.com> writes:

We have a column 'foo' which is of type character (not varying).

select substr(foo, 1, 10) from bar

The result of this query are values whose trailing spaces have been trimmed
automatically. This causes incorrect results when comparing to a value that
may contain trailing spaces.

What's the data type of the value being compared to? I get, for instance,

postgres=# select substr('ab '::char(4), 1, 4) = 'ab '::char(4);
?column?
----------
t
(1 row)

The actual value coming out of the substr() is indeed just 'ab',
but that ought to be considered equal to 'ab ' anyway in char(n)
semantics.

Postgres considers that trailing blanks in a char(n) value are
semantically insignificant, so it strips them when converting to a type
where they would be significant (ie, text or varchar). What's happening
in this scenario is that substr() is defined to take and return text,
so the stripping happens before substr ever sees it.

As Pavel noted, you could possibly work around this particular case by
defining a variant of substr() that takes and returns char(n), but on
the whole I'd strongly advise switching over to varchar/text if
possible. The semantics of char(n) are so weird/braindamaged that
it's best avoided.

BTW, if you do want to use the workaround, this seems sufficient:

create function substr(char,int,int) returns char
strict immutable language internal as 'text_substr' ;

It's the same C code, you're just avoiding the coercion on input.

regards, tom lane

#5Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#4)
Re: Substring auto trim

Tom Lane <tgl@sss.pgh.pa.us> wrote:

What's the data type of the value being compared to? I get, for
instance,

postgres=# select substr('ab '::char(4), 1, 4) = 'ab '::char(4);
?column?
----------
t
(1 row)

This looks like another situation where we're running into trouble
because of non-standard behavior when people might be expecting
something consistent with other products and the explicit language
in the standard.

Quoting from section 5.3 of "WG3:HBA-003 H2-2003-305 August, 2003
(ISO-ANSI Working Draft) Foundation (SQL/Foundation)":

| 13) The declared type of a <character string literal> is
| fixed-length character string. The length of a <character
| string literal> is the number of <character representation>s
| that it contains. Each <quote symbol> contained in <character
| string literal> represents a single <quote> in both the value
| and the length of the <character string literal>. The two
| <quote>s contained in a <quote symbol> shall not be separated
| by any <separator>.
|
| NOTE 72 * <character string literal>s are allowed to be
| zero-length strings (i.e., to contain no characters) even
| though it is not permitted to declare a <data type> that is
| CHARACTER with <length> 0 (zero).

Based on that, the cast of the literals to char(4) in your example
should not be needed. I don't know if there's any reasonable fix
or if this should be handled with a doc change or FAQ entry.

-Kevin

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#5)
Re: Substring auto trim

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

What's the data type of the value being compared to? I get, for
instance,

postgres=# select substr('ab '::char(4), 1, 4) = 'ab '::char(4);

This looks like another situation where we're running into trouble
because of non-standard behavior when people might be expecting
something consistent with other products and the explicit language
in the standard.

If we were to change that so that 'ab ' were implicitly typed as
char(4), then we'd start getting bug reports from people complaining
that "select 'ab' = 'ab '" yields true. I remain of the opinion that
char(n) is so hopelessly brain-damaged that we should be very careful
to NOT bring it into our mainstream behavior.

regards, tom lane

#7Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#6)
Re: Substring auto trim

Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

This looks like another situation where we're running into
trouble because of non-standard behavior when people might be
expecting something consistent with other products and the
explicit language in the standard.

If we were to change that so that 'ab ' were implicitly typed as
char(4), then we'd start getting bug reports from people
complaining that "select 'ab' = 'ab '" yields true. I remain of
the opinion that char(n) is so hopelessly brain-damaged that we
should be very careful to NOT bring it into our mainstream
behavior.

I'm inclined to agree with you, but it does present a barrier to
those migrating. Are there any "migration considerations" documents
where we should mention this? Standards compliance notes in the
docs? Some form of this question seems to be asked frequently....

-Kevin

#8Charles O'Farrell
charleso@gmail.com
In reply to: Kevin Grittner (#7)
Re: Substring auto trim

On Thu, Jan 14, 2010 at 3:21 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov

wrote:

I'm inclined to agree with you, but it does present a barrier to
those migrating. Are there any "migration considerations" documents
where we should mention this? Standards compliance notes in the
docs? Some form of this question seems to be asked frequently....

Many thanks for the quick and detailed responses. Looks like we'll have to
stick with that work around for now.

Part of the problem for us re: varchars is that we are using Cobol where
trailing spaces are significant and litter all our data and queries.

Thanks again.

Charles