varchar/char size

Started by Bruce Momjianabout 28 years ago16 messages
#1Bruce Momjian
maillist@candle.pha.pa.us

Does someone want to remind me why we allocate the full size for char()
and varchar(), when we really can just allocate the size of the given
string?

I relize char() has to be padded, but why varchar()?

In my experience, char() is full size as defined by create, and
varchar() is the the size of the actual data in the field, like text,
but with a pre-defined limit.

--
Bruce Momjian
maillist@candle.pha.pa.us

#2The Hermit Hacker
scrappy@hub.org
In reply to: Bruce Momjian (#1)
Re: [HACKERS] varchar/char size

On Wed, 7 Jan 1998, Bruce Momjian wrote:

In my experience, char() is full size as defined by create, and
varchar() is the the size of the actual data in the field, like text,
but with a pre-defined limit.

Can you remind me what the difference is between text and varchar? Why
would you use varchar over text?

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#3Bruce Momjian
maillist@candle.pha.pa.us
In reply to: The Hermit Hacker (#2)
Re: [HACKERS] varchar/char size

On Wed, 7 Jan 1998, Bruce Momjian wrote:

In my experience, char() is full size as defined by create, and
varchar() is the the size of the actual data in the field, like text,
but with a pre-defined limit.

Can you remind me what the difference is between text and varchar? Why
would you use varchar over text?

Only because SQL people are used to varchar, and not text, and sometimes
people want to have a maximum size if they are displaying this data in a
form that is only of limited size.

--
Bruce Momjian
maillist@candle.pha.pa.us

#4Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#1)
Re: [HACKERS] varchar/char size

Does someone want to remind me why we allocate the full size for char()
and varchar(), when we really can just allocate the size of the given
string?
I relize char() has to be padded, but why varchar()?

In my experience, char() is full size as defined by create, and
varchar() is the the size of the actual data in the field, like text,
but with a pre-defined limit.

Well, in many relational databases access can be optimized by having
fixed-length tuple storage structures. Also, it allows re-use of deleted
space in storage pages. It may be that neither of these points have any
bearing on Postgres, and never will, but unless that clearly the case then
I would be inclined to keep the storage scheme as it is currently.

- Tom

#5Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#4)
Re: [HACKERS] varchar/char size

Does someone want to remind me why we allocate the full size for char()
and varchar(), when we really can just allocate the size of the given
string?
I relize char() has to be padded, but why varchar()?

In my experience, char() is full size as defined by create, and
varchar() is the the size of the actual data in the field, like text,
but with a pre-defined limit.

Well, in many relational databases access can be optimized by having
fixed-length tuple storage structures. Also, it allows re-use of deleted
space in storage pages. It may be that neither of these points have any
bearing on Postgres, and never will, but unless that clearly the case then
I would be inclined to keep the storage scheme as it is currently.

With Ingres and Informix char() is fixed size, while varchar() is
VARiable size.

--
Bruce Momjian
maillist@candle.pha.pa.us

#6Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#5)
Re: [HACKERS] varchar/char size

Bruce Momjian wrote:

Does someone want to remind me why we allocate the full size for char()
and varchar(), when we really can just allocate the size of the given
string?
I relize char() has to be padded, but why varchar()?

In my experience, char() is full size as defined by create, and
varchar() is the the size of the actual data in the field, like text,
but with a pre-defined limit.

Well, in many relational databases access can be optimized by having
fixed-length tuple storage structures. Also, it allows re-use of deleted
space in storage pages. It may be that neither of these points have any
bearing on Postgres, and never will, but unless that clearly the case then
I would be inclined to keep the storage scheme as it is currently.

With Ingres and Informix char() is fixed size, while varchar() is
VARiable size.

Go for it. Let me know if I can help with testing or anything...

- Tom

#7Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#6)
Re: [HACKERS] varchar/char size

Bruce Momjian wrote:

Does someone want to remind me why we allocate the full size for char()
and varchar(), when we really can just allocate the size of the given
string?
I relize char() has to be padded, but why varchar()?

In my experience, char() is full size as defined by create, and
varchar() is the the size of the actual data in the field, like text,
but with a pre-defined limit.

Well, in many relational databases access can be optimized by having
fixed-length tuple storage structures. Also, it allows re-use of deleted
space in storage pages. It may be that neither of these points have any
bearing on Postgres, and never will, but unless that clearly the case then
I would be inclined to keep the storage scheme as it is currently.

With Ingres and Informix char() is fixed size, while varchar() is
VARiable size.

Go for it. Let me know if I can help with testing or anything...

I know we have text, and that it is better than the others, but if our
varchar() were both variable sized storage, and you could place a max on
it, it would be useful for certain applications.

--
Bruce Momjian
maillist@candle.pha.pa.us

#8Zeugswetter Andreas DBT
Andreas.Zeugswetter@telecom.at
In reply to: Bruce Momjian (#7)
Re: [HACKERS] varchar/char size

Bruce wrote:

On Wed, 7 Jan 1998, Bruce Momjian wrote:

In my experience, char() is full size as defined by create, and
varchar() is the the size of the actual data in the field, like

text,

but with a pre-defined limit.

Can you remind me what the difference is between text and

varchar? Why

would you use varchar over text?

Only because SQL people are used to varchar, and not text, and

sometimes

people want to have a maximum size if they are displaying this data in

a

form that is only of limited size

1. Thanks for the very nice change !
2. now the difference:
- varchar must fit directly into tuple
- varchar enforces a supplied max length (as in varchar(256))
- text has no size limit (2Gb in Informix) (therefore should be
pointer to LOB iff >= max row size)
- max size of varchar is limited by max row size (32k)
- max size can be used to align btree index (advantage ?
(Informix does it))
- therefore varchar better performance than text for small texts
(implementation specific)
- index for text ? (is btree useful for avg. 50k html pages ? I
don't think so.)

Andreas

#9Mattias Kregert
matti@algonet.se
In reply to: Bruce Momjian (#1)
Re: [HACKERS] varchar/char size

Bruce Momjian wrote:

Does someone want to remind me why we allocate the full size for char()
and varchar(), when we really can just allocate the size of the given
string?

I relize char() has to be padded, but why varchar()?

In my experience, char() is full size as defined by create, and
varchar() is the the size of the actual data in the field, like text,
but with a pre-defined limit.

Is CHAR padded on disk? Of course it should be padded for
representation, but for storage, couldn't it be stored just like
TEXT or VARCHAR? Before storing, it could be trimmed, and when
read from storage, it could be padded with spaces on the right.

Btw, why is VARCHAR not simply an alias for TEXT, with maxlen added?
Shouldn't these types be the same internally, but with maxlen checked
for VARCHAR in the parser and maxlen set to "infinite"(-1?) for TEXT?
Or perhaps CHAR could be put into the same type also?

If we have a type called VARTEXT(int maxLen, bool doPaddingProcessing):

VARCHAR(10) becomes VARTEXT(10, false) // 10 chars, no padding
TEXT becomes VARTEXT(0, false) // infinite length, no padding
CHAR(10) becomes VARTEXT(10, true) // 10 chars, padded

Would not this be easier to handle than three different types? This
type stuff would be handled in the parser. There would be only one
storage function, which could do any kind of coding to make the VARTEXT
take as little space as possible on disk.
Perhaps it would (in some cases) be good to have the possibility to
specify compression of the text. That could be another bool attribute
to VARTEXT, used by "COMPRESSED VARCHAR()" or "COMPRESSED TEXT" so that
people can squeeze the maximum out of their disk space.

A related question: Is it possible to store tuples over more than one
block? Would it be possible to split a big TEXT into multiple blocks?

/* m */

#10Noname
darrenk@insightdist.com
In reply to: Mattias Kregert (#9)
Re: [HACKERS] varchar/char size

A related question: Is it possible to store tuples over more than one
block? Would it be possible to split a big TEXT into multiple blocks?

Possible, but would cut the access speed to (1 / # blocks), no?

There is a var in the tuple header, t_chain, 6.2.1 that has since been
removed for 6.3. I think its original purpose was with time-travel,
_but_, if we go with a ROWID instead of an oid in the future, this could
be put back in the header and would be the actual address of the next
block in the chain.

Oracle has this concept of chained rows. It is how they implement all
of their LONG* types and also handle rows of normal types that are
larger than the block size.

darrenk

#11Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#1)
Re: [HACKERS] varchar/char size

Does someone want to remind me why we allocate the full size for char()
and varchar(), when we really can just allocate the size of the given
string?

I relize char() has to be padded, but why varchar()?

In my experience, char() is full size as defined by create, and
varchar() is the the size of the actual data in the field, like text,
but with a pre-defined limit.

Is CHAR padded on disk? Of course it should be padded for
representation, but for storage, couldn't it be stored just like
TEXT or VARCHAR? Before storing, it could be trimmed, and when
read from storage, it could be padded with spaces on the right.

My CA/Ingres Admin manual points out that there is a tradeoff between
compressing tuples to save disk storage and the extra processing work
required to uncompress for use. They suggest that the only case where you
would consider compressing on disk is when your system is very I/O bound,
and you have CPU to burn.

The default for Ingres is to not compress anything, but you can specify
compression on a table-by-table basis.

btw, char() is a bit trickier to handle correctly if you do compress it on
disk, since trailing blanks must be handled correctly all the way through.
For example, you would want 'hi' = 'hi ' to be true, which is not a
requirement for varchar().

- Tom

#12Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Mattias Kregert (#9)
Re: [HACKERS] varchar/char size

Bruce Momjian wrote:

Does someone want to remind me why we allocate the full size for char()
and varchar(), when we really can just allocate the size of the given
string?

I relize char() has to be padded, but why varchar()?

In my experience, char() is full size as defined by create, and
varchar() is the the size of the actual data in the field, like text,
but with a pre-defined limit.

Is CHAR padded on disk? Of course it should be padded for
representation, but for storage, couldn't it be stored just like
TEXT or VARCHAR? Before storing, it could be trimmed, and when
read from storage, it could be padded with spaces on the right.

Well, traditionally, CHAR() is fixed length, and VARCHAR() is variable.
This is how Ingres and Informix handle it.

There is very little difference in the types because internally they are
handled the same. The only difference is when we need to specify a max
length, we do that with those types.

Btw, why is VARCHAR not simply an alias for TEXT, with maxlen added?
Shouldn't these types be the same internally, but with maxlen checked
for VARCHAR in the parser and maxlen set to "infinite"(-1?) for TEXT?
Or perhaps CHAR could be put into the same type also?

Right now we do some of the special processing using the OID of VARCHAR
and BPCHAR, which is char(). We would have to generalize the length
idea for each type, which is not hard to do.

If we have a type called VARTEXT(int maxLen, bool doPaddingProcessing):

VARCHAR(10) becomes VARTEXT(10, false) // 10 chars, no padding
TEXT becomes VARTEXT(0, false) // infinite length, no padding
CHAR(10) becomes VARTEXT(10, true) // 10 chars, padded

Would not this be easier to handle than three different types? This
type stuff would be handled in the parser. There would be only one
storage function, which could do any kind of coding to make the VARTEXT
take as little space as possible on disk.
Perhaps it would (in some cases) be good to have the possibility to
specify compression of the text. That could be another bool attribute
to VARTEXT, used by "COMPRESSED VARCHAR()" or "COMPRESSED TEXT" so that
people can squeeze the maximum out of their disk space.

A related question: Is it possible to store tuples over more than one
block? Would it be possible to split a big TEXT into multiple blocks?

I don't know why it is not possible, but I suppose it goes to the
internal workings of PostgreSQL and how rows are added and modified.

--
Bruce Momjian
maillist@candle.pha.pa.us

#13The Hermit Hacker
scrappy@hub.org
In reply to: Bruce Momjian (#12)
Re: [HACKERS] varchar/char size

On Fri, 9 Jan 1998, Bruce Momjian wrote:

Is CHAR padded on disk? Of course it should be padded for
representation, but for storage, couldn't it be stored just like
TEXT or VARCHAR? Before storing, it could be trimmed, and when
read from storage, it could be padded with spaces on the right.

Well, traditionally, CHAR() is fixed length, and VARCHAR() is variable.
This is how Ingres and Informix handle it.

But how do we store this to the file system? If I setup a table
with a char(20), and one of the records has a value of "a", does it then
write 1 byte to the file system, or does it write 1 byte ("a") + 19 bytes
("")?

If the second, is there a reason why, as far as writing to the
file system is concerned, char() can't be treated like varchar()? I'd
imagine you could save one helluva lot of "disk space" by doing that, no?

Then again, thinkiing of it that way, I may as well just use
varchar() instead, right?

See, this is what *really* gets me lost...I use text for
everything, since I really haven't got a clue as to *why* I'd want to use
either char() or varchar() instead...

Now, from what I *think* I recall you stating, char() and
varchar() are more for backwards compatibility? Compatibility with other
SQL engines? If so...as long as we have a type char(), does our backend
representation have to be any different between char() and text?

#14Bruce Momjian
maillist@candle.pha.pa.us
In reply to: The Hermit Hacker (#13)
Re: [HACKERS] varchar/char size

On Fri, 9 Jan 1998, Bruce Momjian wrote:

Is CHAR padded on disk? Of course it should be padded for
representation, but for storage, couldn't it be stored just like
TEXT or VARCHAR? Before storing, it could be trimmed, and when
read from storage, it could be padded with spaces on the right.

Well, traditionally, CHAR() is fixed length, and VARCHAR() is variable.
This is how Ingres and Informix handle it.

But how do we store this to the file system? If I setup a table
with a char(20), and one of the records has a value of "a", does it then
write 1 byte to the file system, or does it write 1 byte ("a") + 19 bytes
("")?

20+VARHDRSZ bytes for char(20), 1+VARHDRSZ for varchar(20)

If the second, is there a reason why, as far as writing to the
file system is concerned, char() can't be treated like varchar()? I'd
imagine you could save one helluva lot of "disk space" by doing that, no?

But then you have variable length records where char(x) forces a fixed
length. Currently, the code treats all varlena structures as variable,
so we readly don't take advantage of this, but we may some day.

Then again, thinkiing of it that way, I may as well just use
varchar() instead, right?

Yep.

See, this is what *really* gets me lost...I use text for
everything, since I really haven't got a clue as to *why* I'd want to use
either char() or varchar() instead...

Now, from what I *think* I recall you stating, char() and
varchar() are more for backwards compatibility? Compatibility with other
SQL engines? If so...as long as we have a type char(), does our backend
representation have to be any different between char() and text?

We need the fixed length trim cabability of char(), and I think we need
the padding of char() too.

--
Bruce Momjian
maillist@candle.pha.pa.us

#15Mattias Kregert
matti@algonet.se
In reply to: Noname (#10)
Re: [HACKERS] Storing rows bigger than one block

Darren King wrote:

A related question: Is it possible to store tuples over more than one
block? Would it be possible to split a big TEXT into multiple blocks?

Possible, but would cut the access speed to (1 / # blocks), no?

For "big" (multile blocks) rows, maybe. Consecutive blocks should be
buffered by the disk or the os, so I don't think the difference would
be big, or even noticeable.

There is a var in the tuple header, t_chain, 6.2.1 that has since been
removed for 6.3. I think its original purpose was with time-travel,
_but_, if we go with a ROWID instead of an oid in the future, this could
be put back in the header and would be the actual address of the next
block in the chain.

Oracle has this concept of chained rows. It is how they implement all
of their LONG* types and also handle rows of normal types that are
larger than the block size.

Yes! I can't see why PostgreSQL should not be able to store rows bigger
than one block? I have seen people referring to this limitation every
now and then, but I don't understand why it has to be that way?
Is this something fundamental to PostgreSQL?

/* m */

#16Vadim B. Mikheev
vadim@sable.krasnoyarsk.su
In reply to: Noname (#10)
Re: [HACKERS] Storing rows bigger than one block

Mattias Kregert wrote:

Darren King wrote:

There is a var in the tuple header, t_chain, 6.2.1 that has since been
removed for 6.3. I think its original purpose was with time-travel,
_but_, if we go with a ROWID instead of an oid in the future, this could
be put back in the header and would be the actual address of the next
block in the chain.

No, this is not for time-travel. Look at implementation guide.

Oracle has this concept of chained rows. It is how they implement all
of their LONG* types and also handle rows of normal types that are
larger than the block size.

Yes! I can't see why PostgreSQL should not be able to store rows bigger
than one block? I have seen people referring to this limitation every
now and then, but I don't understand why it has to be that way?
Is this something fundamental to PostgreSQL?

^^^^^^^^^^^
It seems that answeer is "No". Just - not implemented feature.
Personally, I would like multi-representation feature more than that.
And easy to implement.

Vadim