Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] Oracle buysInnobase)

Started by Dann Corbitover 20 years ago34 messageshackersgeneral
Jump to latest
#1Dann Corbit
DCorbit@connx.com
hackersgeneral

-----Original Message-----
From: Stephan Szabo [mailto:sszabo@megazone.bigpanda.com]
Sent: Wednesday, October 19, 2005 2:34 PM
To: Dann Corbit
Cc: Terry Fielder; Tino Wildenhain; Marc G. Fournier;
Richard_D_Levine@raytheon.com; pgsql-hackers@postgresql.org; pgsql-
general@postgresql.org
Subject: Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy]

[GENERAL]

Oracle buysInnobase)

On Wed, 19 Oct 2005, Dann Corbit wrote:

-----Original Message-----
From: Terry Fielder [mailto:terry@ashtonwoodshomes.com]
Sent: Wednesday, October 19, 2005 2:05 PM
To: Dann Corbit
Cc: Tino Wildenhain; Marc G. Fournier;

Richard_D_Levine@raytheon.com;

pgsql-hackers@postgresql.org; pgsql-general@postgresql.org
Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL]

Oracle

buysInnobase)

Dann Corbit wrote:

Try this query in Oracle, SQL*Server, DB/2, Informix, etc.:

connxdatasync=# select 1 where cast('a' as varchar(30)) =

cast('a '

as

varchar(30));
?column?
----------
(0 rows)

I see how you can interpret the SQL Standard to make the above

response

a correct one. But is it the response that you would like?

When the compared datatypes are VARCHAR: YES

What is the value of doing that?

I can see plenty of harm and absolutely no return. We are talking

about

blank padding before comparison. Do you really want 'Danniel '
considered distinct from 'Danniel ' in a comparison? In real life,
what does that buy you?

It buys you the ability to store things where trailing spaces are
signficant (for example passwords) within the existing limitations of

not

having a full set of the collation behavior.

I suppose that there may be some value in having 'J&^% ' be different
from 'J&^% '.

I would expect to insert a password with trailing blanks to be the same
as inserting a password with no trailing blanks.

I think that whatever is done ought to be whatever the standard says.
If I misinterpret the standard and PostgreSQL is doing it right, then
that is fine. It is just that PostgreSQL is very counter-intuitive
compared to other database systems that I have used in this one
particular area. When I read the standard, it looked to me like
PostgreSQL was not performing correctly. It is not unlikely that I read
it wrong.

#2Josh Berkus
josh@agliodbs.com
In reply to: Dann Corbit (#1)
hackersgeneral
Re: [HACKERS] 'a' == 'a '

Dann,

I think that whatever is done ought to be whatever the standard says.
If I misinterpret the standard and PostgreSQL is doing it right, then
that is fine.  It is just that PostgreSQL is very counter-intuitive
compared to other database systems that I have used in this one
particular area.  When I read the standard, it looked to me like
PostgreSQL was not performing correctly.  It is not unlikely that I read
it wrong.

AFAIT, the standard says "implementation-specific". So we're standard.

The main cost for comparing trimmed values is performance; factoring an
rtrim into every comparison will add significant overhead to the already
CPU-locked process of, for example, creating indexes. We're looking for
ways to make the comparison operators lighter-weight, not heavier.

My general perspective on this is that if trailing blanks are a significant
hazard for your application, then trim them on data input. That requires
a *lot* less peformance overhead than doing it every time you compare
something.

Changing the behaviour would break backwards compatibility for some users.
For that matter, I've been subscribed to 8 PostgreSQL mailing lists since
1999, and this is the first time I can recall someone complaining about
this comparison behavior. So it's obviously not a widespread issue.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

#3Dann Corbit
DCorbit@connx.com
In reply to: Josh Berkus (#2)
hackersgeneral
Re: [HACKERS] 'a' == 'a '

If there is a significant performance benefit to not expanding text columns in comparison operations, then it seems it should be OK.

I probably read the standard wrong, but it seems to me that varchar, char, and bpchar columns should all behave the same (e.g. if you do not expand with <blank> or the PAD character (whatever that is) then all char type columns should behave the same. I guess that there could be different default collations for different column types though (that is clearly allowed in the standard). Perhaps it just needs to be documented in such a way that even a blockhead like me can comprehend it easily.

Show quoted text

-----Original Message-----
From: Josh Berkus [mailto:josh@agliodbs.com]
Sent: Wednesday, October 19, 2005 5:06 PM
To: pgsql-hackers@postgresql.org
Cc: Dann Corbit; Stephan Szabo; Terry Fielder; Tino Wildenhain; Marc G.
Fournier; Richard_D_Levine@raytheon.com; pgsql-general@postgresql.org
Subject: Re: [HACKERS] 'a' == 'a '

Dann,

I think that whatever is done ought to be whatever the standard says.
If I misinterpret the standard and PostgreSQL is doing it right, then
that is fine.  It is just that PostgreSQL is very counter-intuitive
compared to other database systems that I have used in this one
particular area.  When I read the standard, it looked to me like
PostgreSQL was not performing correctly.  It is not unlikely that I read
it wrong.

AFAIT, the standard says "implementation-specific". So we're standard.

The main cost for comparing trimmed values is performance; factoring an
rtrim into every comparison will add significant overhead to the already
CPU-locked process of, for example, creating indexes. We're looking for
ways to make the comparison operators lighter-weight, not heavier.

My general perspective on this is that if trailing blanks are a
significant
hazard for your application, then trim them on data input. That requires
a *lot* less peformance overhead than doing it every time you compare
something.

Changing the behaviour would break backwards compatibility for some users.
For that matter, I've been subscribed to 8 PostgreSQL mailing lists since
1999, and this is the first time I can recall someone complaining about
this comparison behavior. So it's obviously not a widespread issue.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

#4Chris Travers
chris@travelamericas.com
In reply to: Josh Berkus (#2)
hackersgeneral
Re: [HACKERS] 'a' == 'a '

Josh Berkus wrote:

Dann,

I think that whatever is done ought to be whatever the standard says.
If I misinterpret the standard and PostgreSQL is doing it right, then
that is fine. It is just that PostgreSQL is very counter-intuitive
compared to other database systems that I have used in this one
particular area. When I read the standard, it looked to me like
PostgreSQL was not performing correctly. It is not unlikely that I read
it wrong.

AFAIT, the standard says "implementation-specific". So we're standard.

The main cost for comparing trimmed values is performance; factoring an
rtrim into every comparison will add significant overhead to the already
CPU-locked process of, for example, creating indexes. We're looking for
ways to make the comparison operators lighter-weight, not heavier.

If I understand the spec correctly, it seems to indicate that this is
specific to the locale/character set. Assuming that the standard
doesn't have anything to do with any character sets, it should be
possible to make this available for those who want it as an initdb
option. Whether or not this is important enough to offer or not is
another matter.

Personally my questions are:

1) How many people have been bitten by this badly?
2) How many people have been bitten by joins that depend on padding?

Personally, unlike case folding, this seems to be an area where a bit of
documentation (i.e. all collation sets have are assumed to have the NO
PAD option in the SQL standard) would be sufficient to answer to
questions of standards-compliance.

My general perspective on this is that if trailing blanks are a significant
hazard for your application, then trim them on data input. That requires
a *lot* less peformance overhead than doing it every time you compare
something.

In general I agree. But I am not willing to jump to the conclusion that
it will never be warranted to add this as an initdb option. I am more
interested in what cases people see where this would be required. But I
agree that the bar is much higher than it is in many other cases.

Best Wishes,
Chris Travers
Metatron Technology Consulting

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Travers (#4)
hackersgeneral
Re: [HACKERS] 'a' == 'a '

Chris Travers <chris@travelamericas.com> writes:

If I understand the spec correctly, it seems to indicate that this is
specific to the locale/character set.

The spec associates padding behavior with collations, which per spec are
separate from the datatypes --- that is, you should be able to able to
specify a collation for each string-type table column (whether char(N)
or varchar(N)) and even for each literal string constant. We do not
currently have that capability, and accordingly fall back to binding
PAD SPACE behavior to char(N) and NO PAD behavior to varchar(N).

AFAICS this choice is allowed by the spec since the default collation is
implementation-defined.

regards, tom lane

#6Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#5)
hackersgeneral
Re: [HACKERS] 'a' == 'a '

Tom Lane <tgl@sss.pgh.pa.us> writes:

Chris Travers <chris@travelamericas.com> writes:

If I understand the spec correctly, it seems to indicate that this is
specific to the locale/character set.

The spec associates padding behavior with collations, which per spec are
separate from the datatypes --- that is, you should be able to able to
specify a collation for each string-type table column (whether char(N)
or varchar(N)) and even for each literal string constant. We do not
currently have that capability, and accordingly fall back to binding
PAD SPACE behavior to char(N) and NO PAD behavior to varchar(N).

AFAICS this choice is allowed by the spec since the default collation is
implementation-defined.

Does it even make sense for char(N) to not be space padded? I had the
impression char(N) was always N characters long, not more or less. I can't
picture any other character being used for padding, then you would need a more
flexible rtrim function.

And I can understand the collation order determining whether 'a' and 'a '
compare equal. But surely if you store 'a' in a varchar(N) you have to get 'a'
back out, not some other string! Does the spec really allow varchar to
actually be padded and not just compare ignoring trailing space?

(I can't believe anyone really wants varchar to be space padded. Space padding
always seemed like a legacy feature for databases with fixed record length
data types. Why would anyone want a string data type that can't represent all
strings?)

--
greg

#7Jan Wieck
JanWieck@Yahoo.com
In reply to: Bruce Momjian (#6)
hackersgeneral
Re: [HACKERS] 'a' == 'a '

On 10/20/2005 2:17 AM, Greg Stark wrote:

(I can't believe anyone really wants varchar to be space padded. Space padding
always seemed like a legacy feature for databases with fixed record length
data types. Why would anyone want a string data type that can't represent all
strings?)

They must have buried that "bow to COBOL" so deep in the code that they
had no choice but to abuse their power and stuff this cruft into the
standard.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#8Andrew Kelly
akelly@corisweb.org
In reply to: Jan Wieck (#7)
hackersgeneral
Re: [HACKERS] 'a' == 'a '

On Thu, 2005-10-20 at 08:14 -0400, Jan Wieck wrote:

On 10/20/2005 2:17 AM, Greg Stark wrote:

(I can't believe anyone really wants varchar to be space padded. Space padding
always seemed like a legacy feature for databases with fixed record length
data types. Why would anyone want a string data type that can't represent all
strings?)

They must have buried that "bow to COBOL" so deep in the code that they
had no choice but to abuse their power and stuff this cruft into the
standard.

MOVE SPACES TO 0101

You just gotta love it

Andy

#9Lincoln Yeoh
lyeoh@pop.jaring.my
In reply to: Dann Corbit (#3)
hackersgeneral
Re: [HACKERS] 'a' == 'a '

At 05:33 PM 10/19/2005 -0700, Dann Corbit wrote:

If there is a significant performance benefit to not expanding text
columns in comparison operations, then it seems it should be OK.

I probably read the standard wrong, but it seems to me that varchar, char,
and bpchar columns should all behave the same (e.g. if you do not expand
with <blank> or the PAD character (whatever that is) then all char type
columns should behave the same. I guess that there could be different
default collations for different column

I am not a DB guru. BUT IMO they should NOT behave the same.

Varchars should NOT be padded.

For the very reason when you select text out of varchar fields the result
is not padded.

If I insert a string with a single trailing space into a varchar, I _want_
that single trailing space to still be there when I retrieve it, and not
followed by more spaces. Otherwise I will have to pick a different database ;).

So similarly, I would expect that varchars 'a ' and 'a ' when compared
should be different.

However, in the case of _chars_ which are padded, then 'a ' should be
padded so that it can be compared with 'a '.

Otherwise there will be no reason to do equality comparisons of char(5)
fields with char(8) fields - they can NEVER be the same :).

But would that mean that when one does equality comparisons of varchars
with chars, one would probably want padding? Or only varchars of the same
length as the char would have a chance of matching?

Hmm.. I think I better leave this one to the DB gurus :). But I really
don't ever want 'a ' to be the same as 'a ' for varchars.

Link.

#10Dann Corbit
DCorbit@connx.com
In reply to: Lincoln Yeoh (#9)
hackersgeneral
Re: [HACKERS] 'a' == 'a '

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
owner@postgresql.org] On Behalf Of Greg Stark
Sent: Wednesday, October 19, 2005 11:17 PM
To: Tom Lane
Cc: Chris Travers; josh@agliodbs.com; pgsql-hackers@postgresql.org;

Dann

Corbit; Stephan Szabo; Terry Fielder; Tino Wildenhain; Marc G.

Fournier;

Richard_D_Levine@raytheon.com; pgsql-general@postgresql.org
Subject: Re: [GENERAL] [HACKERS] 'a' == 'a '

Tom Lane <tgl@sss.pgh.pa.us> writes:

Chris Travers <chris@travelamericas.com> writes:

If I understand the spec correctly, it seems to indicate that this

is

specific to the locale/character set.

The spec associates padding behavior with collations, which per spec

are

separate from the datatypes --- that is, you should be able to able

to

specify a collation for each string-type table column (whether

char(N)

or varchar(N)) and even for each literal string constant. We do not
currently have that capability, and accordingly fall back to binding
PAD SPACE behavior to char(N) and NO PAD behavior to varchar(N).

AFAICS this choice is allowed by the spec since the default

collation is

implementation-defined.

Does it even make sense for char(N) to not be space padded? I had the
impression char(N) was always N characters long, not more or less. I

can't

picture any other character being used for padding, then you would

need a

more
flexible rtrim function.

And I can understand the collation order determining whether 'a' and

'a '

compare equal. But surely if you store 'a' in a varchar(N) you have to

get

'a'
back out, not some other string! Does the spec really allow varchar to
actually be padded and not just compare ignoring trailing space?

(I can't believe anyone really wants varchar to be space padded. Space
padding
always seemed like a legacy feature for databases with fixed record

length

data types. Why would anyone want a string data type that can't

represent

all
strings?)

Let me make something clear:
When we are talking about padding here it is only in the context of a
comparison operator and NOT having anything to do with storage.

Given two strings of different in a comparison, most database systems
(by default) will blank pad the shorter string so that they are the same
length before performing the comparison.

Hence, you will see that 'Danniel' = 'Danniel ' is true in most cases.

Now, this really does not have any connection with storage or varchar or
bpchar or char or text or anything like that.

It is only the action to be taken when a comparison operation is
performed.

#11Chris Travers
chris@travelamericas.com
In reply to: Dann Corbit (#10)
hackersgeneral
Re: [HACKERS] 'a' == 'a '

Dann Corbit wrote:

Let me make something clear:
When we are talking about padding here it is only in the context of a
comparison operator and NOT having anything to do with storage.

IIrc, varchar and bpchar are stored in a similar way, but are presented
differently when retrieved. I.e. storage is separate from presentation
in this case. I.e. the padding in bpchar occurs when it is presented
and stripped when it is stored.

Again, I am happy "solving" this simply by documenting it since any
questions of interpretation and implimentation of the standard would be
answered. So far what I (and I am sure others) have not heard is a
strong case for changing the behavior, given that it is in line with a
reasonable interpretation of the standards.

Given two strings of different in a comparison, most database systems
(by default) will blank pad the shorter string so that they are the same
length before performing the comparison.

Understood, but what gain do you have in a case like this that might
justify the effort that would go into making it, say, an initdb option?
How often does this behavior cause problems?

Best Wishes,
Chris Travers
Metatron Technology Consulting

#12Dann Corbit
DCorbit@connx.com
In reply to: Chris Travers (#11)
hackersgeneral
Re: [HACKERS] 'a' == 'a '

-----Original Message-----
From: Chris Travers [mailto:chris@travelamericas.com]
Sent: Thursday, October 20, 2005 11:53 AM
To: Dann Corbit
Cc: Greg Stark; Tom Lane; Chris Travers; josh@agliodbs.com; pgsql-
hackers@postgresql.org; Stephan Szabo; Terry Fielder; Tino Wildenhain;
Marc G. Fournier; Richard_D_Levine@raytheon.com; pgsql-
general@postgresql.org
Subject: Re: [GENERAL] [HACKERS] 'a' == 'a '

Dann Corbit wrote:

Let me make something clear:
When we are talking about padding here it is only in the context of a
comparison operator and NOT having anything to do with storage.

IIrc, varchar and bpchar are stored in a similar way, but are

presented

differently when retrieved. I.e. storage is separate from

presentation

in this case. I.e. the padding in bpchar occurs when it is presented
and stripped when it is stored.

Again, I am happy "solving" this simply by documenting it since any
questions of interpretation and implimentation of the standard would

be

answered. So far what I (and I am sure others) have not heard is a
strong case for changing the behavior, given that it is in line with a
reasonable interpretation of the standards.

I believe that this is a reasonable response. In order to comply with
the standard, bpchar and varchar would have to be stored with different
default collating sequences (which is fine with me). If (indeed) that
is the case, the only action needed would be to document the collating
sequences used.

Given two strings of different in a comparison, most database systems
(by default) will blank pad the shorter string so that they are the

same

length before performing the comparison.

Understood, but what gain do you have in a case like this that might
justify the effort that would go into making it, say, an initdb

option?

How often does this behavior cause problems?

I do not even know if it is a good idea. I was just pointing out that
the behavior of PostgreSQL is different from all the big database
vendors in this area and according to my reading of the standard, the
behavior was not compliant.

As to how often it causes a problem, I can't say. It has caused me
puzzlement on a few occasions, but no end of the world disasters.

#13Richard D Levine
Richard_D_Levine@raytheon.com
In reply to: Chris Travers (#11)
hackersgeneral
Re: [HACKERS] 'a' == 'a '

I will happily reiterate that I am the troll who started this mess by
whining about how *Oracle* handles this. Tom's explanation that CHAR is
has a PAD collation and VARCHAR has a NO PAD collation have restored my
faith that there is goodness in the world. My whining was out of
ignorance. I wouldn't change the proper way PostgreSQL works. Documenting
it is good. I will use this new found knowledge from now on in my database
designs.

Cheers,

Rick

Chris Travers <chris@travelamericas.com> wrote on 10/20/2005 01:52:36 PM:

Show quoted text

Dann Corbit wrote:

Let me make something clear:
When we are talking about padding here it is only in the context of a
comparison operator and NOT having anything to do with storage.

IIrc, varchar and bpchar are stored in a similar way, but are presented
differently when retrieved. I.e. storage is separate from presentation
in this case. I.e. the padding in bpchar occurs when it is presented
and stripped when it is stored.

Again, I am happy "solving" this simply by documenting it since any
questions of interpretation and implimentation of the standard would be
answered. So far what I (and I am sure others) have not heard is a
strong case for changing the behavior, given that it is in line with a
reasonable interpretation of the standards.

Given two strings of different in a comparison, most database systems
(by default) will blank pad the shorter string so that they are the same
length before performing the comparison.

Understood, but what gain do you have in a case like this that might
justify the effort that would go into making it, say, an initdb option?
How often does this behavior cause problems?

Best Wishes,
Chris Travers
Metatron Technology Consulting

#14John D. Burger
john@mitre.org
In reply to: Dann Corbit (#10)
hackersgeneral
Re: [HACKERS] 'a' == 'a '

[Removed all the non-list addresses]

Dann Corbit wrote:

Let me make something clear:
When we are talking about padding here it is only in the context of a
comparison operator and NOT having anything to do with storage.

Given two strings of different in a comparison, most database systems
(by default) will blank pad the shorter string so that they are the
same
length before performing the comparison.

Hence, you will see that 'Danniel' = 'Danniel ' is true in most cases.

Now, this really does not have any connection with storage or varchar
or
bpchar or char or text or anything like that.

Is this really true??? My understanding of the spec was that this was
=exactly= the difference between char(N) and varchar(N) - the former is
padded to length N when you store it, or at least the DB has to act as
if this is the case. Can someone quote the appropriate chapter and
verse?

Thanks.

- John D. Burger
MITRE

#15Dann Corbit
DCorbit@connx.com
In reply to: John D. Burger (#14)
hackersgeneral
Re: [HACKERS] 'a' == 'a '

Look back in the stack and you will find that I have quoted chapter and
verse (see the attached html file in a previous email that I sent).

This is in relation to the comparison operator.

-----Original Message-----
From: John D. Burger [mailto:john@mitre.org]
Sent: Thursday, October 20, 2005 12:53 PM
To: Dann Corbit
Cc: <pgsql-hackers@postgresql.org>; pgsql-general General
Subject: Re: [GENERAL] [HACKERS] 'a' == 'a '

[Removed all the non-list addresses]

Dann Corbit wrote:

Let me make something clear:
When we are talking about padding here it is only in the context of

a

comparison operator and NOT having anything to do with storage.

Given two strings of different in a comparison, most database

systems

(by default) will blank pad the shorter string so that they are the
same
length before performing the comparison.

Hence, you will see that 'Danniel' = 'Danniel ' is true in most

cases.

Now, this really does not have any connection with storage or

varchar

or
bpchar or char or text or anything like that.

Is this really true??? My understanding of the spec was that this was
=exactly= the difference between char(N) and varchar(N) - the former

is

padded to length N when you store it, or at least the DB has to act as
if this is the case. Can someone quote the appropriate chapter and
verse?

Again, the operations that I was talking about literally have nothing to
do with the internal data type of the string, as long as it is
fundamentally a character type. I was referring to what happens when
two strings are involved in a comparison operation. The standard spells
it out very clearly. Whether something is blank padded or not during a
comparison has nothing whatever to do with the type of the string and
everything to do with the collating sequence.

Show quoted text

Thanks.

- John D. Burger
MITRE

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Travers (#11)
hackersgeneral
Re: [HACKERS] 'a' == 'a '

Chris Travers <chris@travelamericas.com> writes:

IIrc, varchar and bpchar are stored in a similar way, but are presented
differently when retrieved. I.e. storage is separate from presentation
in this case. I.e. the padding in bpchar occurs when it is presented
and stripped when it is stored.

This is not so, although I've occasionally wondered whether we shouldn't
try to make it so. Certainly we don't rely on char(N) to be physically
fixed-width (and can't, at least not with variable-width character
encodings) so there can be no performance advantage to actually storing
the insignificant spaces. The hard part would be in figuring out how
the output routine could know how many spaces to add back.

regards, tom lane

#17Richard D Levine
Richard_D_Levine@raytheon.com
In reply to: Tom Lane (#16)
hackersgeneral
Re: [HACKERS] 'a' == 'a '

Tom Lane <tgl@sss.pgh.pa.us> wrote on 10/20/2005 03:11:23 PM:
<snip>

The hard part would be in figuring out how
the output routine could know how many spaces to add back.

The length is in the metadata for the column, or am I being dense?

Show quoted text

regards, tom lane

#18Dann Corbit
DCorbit@connx.com
In reply to: Richard D Levine (#17)
hackers
Re: [GENERAL] 'a' == 'a '

-----Original Message-----
From: Richard_D_Levine@raytheon.com

[mailto:Richard_D_Levine@raytheon.com]

Sent: Thursday, October 20, 2005 2:12 PM
To: Tom Lane
Cc: Chris Travers; Dann Corbit; Greg Stark; josh@agliodbs.com; pgsql-
general@postgresql.org; pgsql-hackers@postgresql.org; Marc G.

Fournier;

Stephan Szabo; Terry Fielder; Tino Wildenhain
Subject: Re: [GENERAL] [HACKERS] 'a' == 'a '

Tom Lane <tgl@sss.pgh.pa.us> wrote on 10/20/2005 03:11:23 PM:
<snip>

The hard part would be in figuring out how
the output routine could know how many spaces to add back.

The length is in the metadata for the column, or am I being dense?

I guess that what Tom is saying is that it would be nice to store
everything as VARCHAR. But with (for instance) BPCHAR, the returned
string is blank padded. So if you store
'Danniel'
in BPCHAR(20), you will get back:
'Danniel '
But if you store
'Danniel'
In VARCHAR(20)
You will get back exactly what you put in.

I guess that additional ambiguity arises if you add additional spaces to
the end. Many database systems solve this by trimming the characters
from the end of the string upon storage and the returned string will not
have any trailing blanks. I am not arguing pro nor con this way of
doing things.

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard D Levine (#17)
hackersgeneral
Re: [HACKERS] 'a' == 'a '

Richard_D_Levine@raytheon.com writes:

Tom Lane <tgl@sss.pgh.pa.us> wrote on 10/20/2005 03:11:23 PM:

The hard part would be in figuring out how
the output routine could know how many spaces to add back.

The length is in the metadata for the column, or am I being dense?

The output routine hasn't got access to that metadata. In general,
in fact, a random Datum can't be linked to any column since it may have
been generated on-the-fly.

It'd be easy if we stuck the column width into the stored representation
of bpchar, but this would defeat one of the purposes, ie, to make the
stored representations of text and bpchar alike.

regards, tom lane

#20Richard D Levine
Richard_D_Levine@raytheon.com
In reply to: Dann Corbit (#18)
hackers
Re: [GENERAL] 'a' == 'a '

"Dann Corbit" <DCorbit@connx.com> wrote on 10/20/2005 04:24:26 PM:

-----Original Message-----
From: Richard_D_Levine@raytheon.com

[mailto:Richard_D_Levine@raytheon.com]

Sent: Thursday, October 20, 2005 2:12 PM
To: Tom Lane
Cc: Chris Travers; Dann Corbit; Greg Stark; josh@agliodbs.com; pgsql-
general@postgresql.org; pgsql-hackers@postgresql.org; Marc G.

Fournier;

Stephan Szabo; Terry Fielder; Tino Wildenhain
Subject: Re: [GENERAL] [HACKERS] 'a' == 'a '

Tom Lane <tgl@sss.pgh.pa.us> wrote on 10/20/2005 03:11:23 PM:
<snip>

The hard part would be in figuring out how
the output routine could know how many spaces to add back.

The length is in the metadata for the column, or am I being dense?

I guess that what Tom is saying is that it would be nice to store
everything as VARCHAR.

<snip>

I get that part, but he asked how to determine the proper output length
based on the PAD semantics, and I was saying to just pad whatever is stored
to the length available in the column metadata for those collations that
require padding.

I guess that additional ambiguity arises if you add additional spaces to
the end. Many database systems solve this by trimming the characters
from the end of the string upon storage and the returned string will not
have any trailing blanks. I am not arguing pro nor con this way of
doing things.

For PAD correlations, Oracle stores the blanks. Interbase does not. Going
from Interbase to Oracle I switched from CHAR to VARCHAR2 because of this,
shot my self in the foot by not understanding PAD, whined about it, and you
know the rest.

As to how it is output, I know the language interface has an effect.
Embedded SQL in C will put the null terminator in different places for
different databases and different switch settings particular to each
database.

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dann Corbit (#18)
hackers
#22Dann Corbit
DCorbit@connx.com
In reply to: Tom Lane (#21)
hackers
#23Andrew Dunstan
andrew@dunslane.net
In reply to: Dann Corbit (#22)
hackers
#24Wilkin, Kurt
Kurt.Wilkin@fnzc.co.nz
In reply to: Andrew Dunstan (#23)
hackers
#25Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Wilkin, Kurt (#24)
hackers
#26Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#21)
hackers
#27Dann Corbit
DCorbit@connx.com
In reply to: Bruce Momjian (#26)
hackers
#28Bruce Momjian
bruce@momjian.us
In reply to: Dann Corbit (#27)
hackers
#29Dann Corbit
DCorbit@connx.com
In reply to: Bruce Momjian (#28)
hackers
#30Bruce Momjian
bruce@momjian.us
In reply to: Dann Corbit (#29)
hackers
#31Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#30)
hackers
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#31)
hackers
#33Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#32)
hackers
#34Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#33)
hackers