Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] Oracle buysInnobase)
Try this query in Oracle, SQL*Server, DB/2, Informix, etc.:
connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a ' as
varchar(30));
?column?
----------
(0 rows)
I see how you can interpret the SQL Standard to make the above response
a correct one. But is it the response that you would like?
Suppose, for instance, that you have an employee table in your database.
You have another table called benefits. Perhaps it is even in another
database. Conceivably even in a database other than PostgreSQL.
Anyway, you want to match information between the two systems so you
join on some redundant columns like a.First_Name = b.fname AND
a.Last_name = b.lname AND a.Middle_Initial = b.mi AND a.City = b.city
AND a.Street_name = b.street
If the columns are not the same length (or one is fixed length and the
other variable), then 'Danniel' won't match 'Danniel' unless you trim
it. If you trim it, then the indexes go out the window. If the indexes
go out the window, then we table scan.
I don't like that behavior. Perhaps others who know more than me can
say why not blank padding comparisons is a good idea.
Clearly, one could argue that having redundant data is bad and that
every attribute in a database intended to match should be exactly the
same type. But there are lots of database systems badly designed. And
of well designed systems, it is not uncommon to have more than one
database in your organization, and a need to perform federated joins as
well because of it.
-----Original Message-----
From: Tino Wildenhain [mailto:tino@wildenhain.de]
Sent: Wednesday, October 19, 2005 1:05 PM
To: Marc G. Fournier
Cc: Dann Corbit; Richard_D_Levine@raytheon.com; pgsql-
hackers@postgresql.org; pgsql-general@postgresql.org
Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle
buysInnobase)Am Mittwoch, den 19.10.2005, 16:29 -0300 schrieb Marc G. Fournier:
I'm CC'ng this over to -hackers ... Tom? Comments?
On Wed, 19 Oct 2005, Dann Corbit wrote:
Yes, clearly that is the wrong result according to the SQL
standard.
Here is a SQL*Server query:
select 1 where 'a' = 'a ' AND 'a' = 'a ' AND 'a ' = 'a 'It returns (correctly): 1
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
owner@postgresql.org] On Behalf Of Marc G. Fournier
Sent: Wednesday, October 19, 2005 11:41 AM
To: Richard_D_Levine@raytheon.com
Cc: pgsql-general@postgresql.org
Subject: Re: [pgsql-advocacy] [GENERAL] Oracle buys InnobaseOn Wed, 19 Oct 2005, Richard_D_Levine@raytheon.com wrote:
I was referring to trailing blanks, but did not explicitly say
it,
though showed it in the examples. I am pretty sure that the SQL
standard says that trailing whitespace is insignificant in
string
Show quoted text
comparison.
Then we are broken too :)
# select 'a ' = 'a ';
?column?
----------
f
(1 row)experiment=# SELECT 'a '::char = 'a '::char;
?column?
----------
t
Dann Corbit wrote:
Try this query in Oracle, SQL*Server, DB/2, Informix, etc.:
connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a ' as
varchar(30));
?column?
----------
(0 rows)I see how you can interpret the SQL Standard to make the above response
a correct one. But is it the response that you would like?
When the compared datatypes are VARCHAR: YES
When the compared datatypes are fixed length like CHAR: NO
Suppose, for instance, that you have an employee table in your database.
You have another table called benefits. Perhaps it is even in another
database. Conceivably even in a database other than PostgreSQL.Anyway, you want to match information between the two systems so you
join on some redundant columns like a.First_Name = b.fname AND
a.Last_name = b.lname AND a.Middle_Initial = b.mi AND a.City = b.city
AND a.Street_name = b.streetIf the columns are not the same length (or one is fixed length and the
other variable), then 'Danniel' won't match 'Danniel' unless you trim
it. If you trim it, then the indexes go out the window. If the indexes
go out the window, then we table scan.
If you want to compare datatypes that are different, and you don't have
an index that accomodates, that's the price you pay for comparing
different data types.
I don't like that behavior. Perhaps others who know more than me can
say why not blank padding comparisons is a good idea.Clearly, one could argue that having redundant data is bad and that
every attribute in a database intended to match should be exactly the
same type. But there are lots of database systems badly designed. And
of well designed systems, it is not uncommon to have more than one
database in your organization, and a need to perform federated joins as
well because of it.-----Original Message-----
From: Tino Wildenhain [mailto:tino@wildenhain.de]
Sent: Wednesday, October 19, 2005 1:05 PM
To: Marc G. Fournier
Cc: Dann Corbit; Richard_D_Levine@raytheon.com; pgsql-
hackers@postgresql.org; pgsql-general@postgresql.org
Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle
buysInnobase)Am Mittwoch, den 19.10.2005, 16:29 -0300 schrieb Marc G. Fournier:
I'm CC'ng this over to -hackers ... Tom? Comments?
On Wed, 19 Oct 2005, Dann Corbit wrote:
Yes, clearly that is the wrong result according to the SQL
standard.
Here is a SQL*Server query:
select 1 where 'a' = 'a ' AND 'a' = 'a ' AND 'a ' = 'a 'It returns (correctly): 1
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
owner@postgresql.org] On Behalf Of Marc G. Fournier
Sent: Wednesday, October 19, 2005 11:41 AM
To: Richard_D_Levine@raytheon.com
Cc: pgsql-general@postgresql.org
Subject: Re: [pgsql-advocacy] [GENERAL] Oracle buys InnobaseOn Wed, 19 Oct 2005, Richard_D_Levine@raytheon.com wrote:
I was referring to trailing blanks, but did not explicitly say
it,
though showed it in the examples. I am pretty sure that the SQL
standard says that trailing whitespace is insignificant instring
comparison.
Then we are broken too :)
# select 'a ' = 'a ';
?column?
----------
f
(1 row)experiment=# SELECT 'a '::char = 'a '::char;
?column?
----------
t---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
--
Terry Fielder
terry@greatgulfhomes.com
Associate Director Software Development and Deployment
Great Gulf Homes / Ashton Woods Homes
Fax: (416) 441-9085
-----Original Message-----
From: Terry Fielder [mailto:terry@ashtonwoodshomes.com]
Sent: Wednesday, October 19, 2005 2:05 PM
To: Dann Corbit
Cc: Tino Wildenhain; Marc G. Fournier; Richard_D_Levine@raytheon.com;
pgsql-hackers@postgresql.org; pgsql-general@postgresql.org
Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle
buysInnobase)Dann Corbit wrote:
Try this query in Oracle, SQL*Server, DB/2, Informix, etc.:
connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a '
as
varchar(30));
?column?
----------
(0 rows)I see how you can interpret the SQL Standard to make the above
response
a correct one. But is it the response that you would like?
When the compared datatypes are VARCHAR: YES
What is the value of doing that?
I can see plenty of harm and absolutely no return. We are talking about
blank padding before comparison. Do you really want 'Danniel '
considered distinct from 'Danniel ' in a comparison? In real life,
what does that buy you?
And even if people think it is a good idea, it seems to be at odds with
the way that I read the standard (though I am easily confused by simple
enough language at times).
It could well be that I am jaded from years of doing it the wrong way (I
expect two character strings with all leading non-blanks in agreement to
compare equal).
Perhaps this is old hat to the long-timers around here and there is a
good explanation as to why varchar should have non-blank padding when
comparisons are performed. Can someone point me to documentation that
explains it?
[snip]
Import Notes
Resolved by subject fallback
On Wed, 19 Oct 2005, Dann Corbit wrote:
-----Original Message-----
From: Terry Fielder [mailto:terry@ashtonwoodshomes.com]
Sent: Wednesday, October 19, 2005 2:05 PM
To: Dann Corbit
Cc: Tino Wildenhain; Marc G. Fournier; Richard_D_Levine@raytheon.com;
pgsql-hackers@postgresql.org; pgsql-general@postgresql.org
Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle
buysInnobase)Dann Corbit wrote:
Try this query in Oracle, SQL*Server, DB/2, Informix, etc.:
connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a '
as
varchar(30));
?column?
----------
(0 rows)I see how you can interpret the SQL Standard to make the above
response
a correct one. But is it the response that you would like?
When the compared datatypes are VARCHAR: YES
What is the value of doing that?
I can see plenty of harm and absolutely no return. We are talking about
blank padding before comparison. Do you really want 'Danniel '
considered distinct from 'Danniel ' in a comparison? In real life,
what does that buy you?
It buys you the ability to store things where trailing spaces are
signficant (for example passwords) within the existing limitations of not
having a full set of the collation behavior.
On Wed, Oct 19, 2005 at 02:05:20PM -0700, Dann Corbit wrote:
When the compared datatypes are VARCHAR: YES
What is the value of doing that?
I can see plenty of harm and absolutely no return. We are talking about
blank padding before comparison. Do you really want 'Danniel '
considered distinct from 'Danniel ' in a comparison? In real life,
what does that buy you?
Well, looking from the point of view of using indexes, indexes can only
really match on things that are equal. Which means the system is going
to have to trim them anyway. I'm of the opinion that strings are
strings and spaces are no different from other characters.
That bit of the standard quoted earlier, if you read the PAD character
that is different from any other character as being the NUL character,
then 'a<space><nul>' is clearly different from 'a<space><space>'. This
whacky space behaviour is something I associate with the char(N) and is
the main reason I never use it.
Perhaps this is old hat to the long-timers around here and there is a
good explanation as to why varchar should have non-blank padding when
comparisons are performed. Can someone point me to documentation that
explains it?
The way I understood it:
char(N) is blank padding
varchar(N) is not
If you make varchar(n) do blank padding, then what's the difference
between the two types? You may as well get rid of one...
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.
-----Original Message-----
From: Martijn van Oosterhout [mailto:kleptog@svana.org]
Sent: Wednesday, October 19, 2005 2:46 PM
To: Dann Corbit
Cc: Terry Fielder; Tino Wildenhain; Marc G. Fournier;
Richard_D_Levine@raytheon.com; pgsql-hackers@postgresql.org; pgsql-
general@postgresql.org
Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle
buysInnobase)On Wed, Oct 19, 2005 at 02:05:20PM -0700, Dann Corbit wrote:
When the compared datatypes are VARCHAR: YES
What is the value of doing that?
I can see plenty of harm and absolutely no return. We are talking
about
blank padding before comparison. Do you really want 'Danniel '
considered distinct from 'Danniel ' in a comparison? In real life,
what does that buy you?Well, looking from the point of view of using indexes, indexes can
only
really match on things that are equal. Which means the system is going
to have to trim them anyway. I'm of the opinion that strings are
strings and spaces are no different from other characters.That bit of the standard quoted earlier, if you read the PAD character
that is different from any other character as being the NUL character,
then 'a<space><nul>' is clearly different from 'a<space><space>'. This
whacky space behaviour is something I associate with the char(N) and
is
the main reason I never use it.
Perhaps this is old hat to the long-timers around here and there is
a
good explanation as to why varchar should have non-blank padding
when
comparisons are performed. Can someone point me to documentation
that
explains it?
The way I understood it:
char(N) is blank padding
varchar(N) is notIf you make varchar(n) do blank padding, then what's the difference
between the two types?
Storage. The blank padding is only for comparison purposes.
You may as well get rid of one...
All the other database systems seem to handle it in the way that I
expect.
Which is not to say that it is the right way or that it agrees with the
standard.
But it is how it appears to me, so far.
Import Notes
Resolved by subject fallback
Dann Corbit wrote:
I can see plenty of harm and absolutely no return. We are talking about
blank padding before comparison. Do you really want 'Danniel '
considered distinct from 'Danniel ' in a comparison? In real life,
what does that buy you?
100% YES!
If two values are the same, then any function on those two values should
return the same result. Otherwise what does "equals" mean? At the very
least length() is broken by your argument.
Here it's CHAR that's broken IMHO - spawn of some punch-card spawned
data processing rules of the 70s.
--
Richard Huxton
Archonet Ltd
Richard Huxton wrote:
Dann Corbit wrote:
I can see plenty of harm and absolutely no return. We are talking about
blank padding before comparison. Do you really want 'Danniel '
considered distinct from 'Danniel ' in a comparison? In real life,
what does that buy you?100% YES!
If two values are the same, then any function on those two values should
return the same result. Otherwise what does "equals" mean? At the very
least length() is broken by your argument.
I agree completely. I would much rather be precise than intuitive. And,
I have done applications where the padding length was important,
especially when working with remote batch processing in the credit
industries and the like. Writing queries to create and process
fixed-width batch files is much easier if you can rely on these kinds of
behaviors.
Show quoted text
Here it's CHAR that's broken IMHO - spawn of some punch-card spawned
data processing rules of the 70s.
--
Richard Huxton
Archonet Ltd---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster