varchar(), text,char() overhead

Started by Bruce Momjian almost 28 years ago, 8 messages
#1 Bruce Momjian
maillist@candle.pha.pa.us

Do people want the overhead of char(), varchar(), and text to be reduced
from 4 bytes to 2 bytes? We store the length in this overhead, but
since we have a size limit on tuple size, we can't have a field over 8k
in size anyway. Even if we up that to 32k for 6.3, we still only use 2
bytes.

I have added it to the TODO list. Most of the code already supports it
by using VARSIZE and VARDATA macros. Once the structure size changes,
the macros change too. The only issue is places where they take the
first four bytes of the variable-length type and cast it to an int32,
which will not work in this case. We have to change this so it uses the
macros too.
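
To illustrate the point about the macros, here is a rough sketch and not the
actual PostgreSQL definitions. It assumes a hypothetical 2-byte header, and
the names varlena_sketch, VARSIZE_SKETCH, VARDATA_SKETCH, make_text,
payload_len and fragile_size are invented for this example. Code that goes
through the macros keeps working when the header type changes; code that
copies the first four bytes into an int32 picks up payload bytes as well.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 2-byte length header (the proposal); today it is 4 bytes. */
typedef uint16_t varlen_t;
#define VARHDRSZ_SKETCH sizeof(varlen_t)

/* Hypothetical stand-in for the real variable-length struct. */
struct varlena_sketch {
    varlen_t vl_len;   /* total size in bytes, header included */
    char     vl_dat[]; /* payload follows the header */
};

/* Callers use these and never look at the header type directly. */
#define VARSIZE_SKETCH(p) (((const struct varlena_sketch *) (p))->vl_len)
#define VARDATA_SKETCH(p) (((struct varlena_sketch *) (p))->vl_dat)

/* Build a value from a C string; returns NULL if allocation fails. */
struct varlena_sketch *make_text(const char *s)
{
    size_t n = strlen(s);
    assert(n + VARHDRSZ_SKETCH <= UINT16_MAX); /* must fit in 2 bytes */
    struct varlena_sketch *v = malloc(VARHDRSZ_SKETCH + n);
    if (v == NULL)
        return NULL;
    v->vl_len = (varlen_t) (VARHDRSZ_SKETCH + n);
    memcpy(VARDATA_SKETCH(v), s, n);
    return v;
}

/* Macro-based access: unaffected by the width of the header. */
size_t payload_len(const struct varlena_sketch *v)
{
    return VARSIZE_SKETCH(v) - VARHDRSZ_SKETCH;
}

/* The fragile pattern: treat the first four bytes as an int32 length.
 * With a 2-byte header this also reads two payload bytes. The value
 * must be at least four bytes long in total for this read to be valid. */
int32_t fragile_size(const void *p)
{
    int32_t len;
    memcpy(&len, p, sizeof len);
    return len;
}
```

For a value built from "abc", payload_len() reports 3, while fragile_size()
returns a number that mixes the length with the first two characters.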

--
Bruce Momjian
maillist@candle.pha.pa.us

#2 Noname
darrenk@insightdist.com
In reply to: Bruce Momjian (#1)
Re: [HACKERS] varchar(), text,char() overhead

> Do people want the overhead of char(), varchar(), and text to be reduced
> from 4 bytes to 2 bytes? We store the length in this overhead, but
> since we have a size limit on tuple size, we can't have a field over 8k
> in size anyway. Even if we up that to 32k for 6.3, we still only use 2
> bytes.
>
> I have added it to the TODO list. Most of the code already supports it
> by using VARSIZE and VARDATA macros. Once the structure size changes,
> the macros change too. The only issue is places where they take the
> first four bytes of the variable-length type and cast it to an int32,
> which will not work in this case. We have to change this so it uses the
> macros too.

Would be a nice space-saver if you have tables with many small text fields.

Dig out that old message of mine concerning block size and check out item #4.

Excerpted below if you've finally deleted it... :) :)

Date: Wed, 29 Jan 1997 13:38:10 -0500
From: aixssd!darrenk (Darren King)
Subject: [HACKERS] Max size of data types and tuples.
...
4. Since only 13 bits are needed for storing the size of these
textual fields in a tuple, could PostgreSql use a 16-bit int to
store it? Currently, the size is padded to four bytes in the
tuple and this eats space if you have many textual fields.
Without further digging, I'm assuming that the size is double-word
aligned so that the actual text starts on a double-word boundary.
...

darrenk

#3 Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Noname (#2)
Re: [HACKERS] varchar(), text,char() overhead

> macros too.
>
> Would be a nice space-saver if you have tables with many small text fields.
>
> Dig out that old message of mine concerning block size and check out item #4.
>
> Excerpted below if you've finally deleted it... :) :)
>
> Date: Wed, 29 Jan 1997 13:38:10 -0500
> From: aixssd!darrenk (Darren King)
> Subject: [HACKERS] Max size of data types and tuples.
> ...
> 4. Since only 13 bits are needed for storing the size of these
> textual fields in a tuple, could PostgreSql use a 16-bit int to
> store it? Currently, the size is padded to four bytes in the
> tuple and this eats space if you have many textual fields.
> Without further digging, I'm assuming that the size is double-word
> aligned so that the actual text starts on a double-word boundary.
> ...

I had forgotten about your mention of this. I am running some tests
now, and things look promising. However, if we go to 64k or 128k
tuples, we would be in trouble. (We can do 64k tuples by changing the
'special variable' length value from -1 to 0.)

--
Bruce Momjian
maillist@candle.pha.pa.us

#4 Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Bruce Momjian (#3)
Re: [HACKERS] varchar(), text,char() overhead

> I had forgotten about your mention of this. I am running some tests
> now, and things look promising. However, if we go to 64k or 128k
> tuples, we would be in trouble. (We can do 64k tuples by changing the
> 'special variable' length value from -1 to 0.)

I am not going to make any changes to the variable length overhead for
char(), varchar(), and text at this time. It is too close to beta. I
will keep the item on the TODO list, and we can hash it out later.

--
Bruce Momjian
maillist@candle.pha.pa.us

#5 Vadim B. Mikheev
vadim@sable.krasnoyarsk.su
In reply to: Bruce Momjian (#3)
Re: [HACKERS] varchar(), text,char() overhead

Bruce Momjian wrote:

> macros too.
>
> Would be a nice space-saver if you have tables with many small text fields.
>
> Dig out that old message of mine concerning block size and check out item #4.
>
> Excerpted below if you've finally deleted it... :) :)
>
> Date: Wed, 29 Jan 1997 13:38:10 -0500
> From: aixssd!darrenk (Darren King)
> Subject: [HACKERS] Max size of data types and tuples.
> ...
> 4. Since only 13 bits are needed for storing the size of these
> textual fields in a tuple, could PostgreSql use a 16-bit int to
> store it? Currently, the size is padded to four bytes in the
> tuple and this eats space if you have many textual fields.
> Without further digging, I'm assuming that the size is double-word
> aligned so that the actual text starts on a double-word boundary.
> ...
>
> I had forgotten about your mention of this. I am running some tests
> now, and things look promising. However, if we go to 64k or 128k
> tuples, we would be in trouble. (We can do 64k tuples by changing the

^^^^^^^^^^^^^^^^^^^^^^
Also, a multi-representation feature would allow up to 2 GB in varlena fields.

> 'special variable' length value from -1 to 0.)

Yes, that is one way.

Vadim

#6 Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Vadim B. Mikheev (#5)
Re: [HACKERS] varchar(), text,char() overhead

> I had forgotten about your mention of this. I am running some tests
> now, and things look promising. However, if we go to 64k or 128k
> tuples, we would be in trouble. (We can do 64k tuples by changing the
>
> ^^^^^^^^^^^^^^^^^^^^^^
> Also, a multi-representation feature would allow up to 2 GB in varlena fields.

What is the multi-representation feature? Large objects?

--
Bruce Momjian
maillist@candle.pha.pa.us

#7 Vadim B. Mikheev
vadim@sable.krasnoyarsk.su
In reply to: Bruce Momjian (#6)
Re: [HACKERS] varchar(), text,char() overhead

Bruce Momjian wrote:

> I had forgotten about your mention of this. I am running some tests
> now, and things look promising. However, if we go to 64k or 128k
> tuples, we would be in trouble. (We can do 64k tuples by changing the
>
> ^^^^^^^^^^^^^^^^^^^^^^
> Also, a multi-representation feature would allow up to 2 GB in varlena fields.
>
> What is the multi-representation feature? Large objects?

Yes. The server could store varlena fields in LOs when a field, or the
tuple as a whole, is too big to be stored in relation blocks.
This would allow tuples much longer than data blocks.
It can also help performance in some cases: if big varlenas are not used
in WHERE, they need not be read from disk for each tuple, and if an UPDATE
does not change out-stored varlenas, they need not be stored twice.

We could use vl_len < 0 for out-stored varlenas: vl_len = -1000
could mean that the size of the data is 1000 bytes, the data is stored
in an LO, and the LO's id (oid?) is in vl_dat. It seems easy to implement
(without optimizing access to the data).

Vadim

#8 Noname
darrenk@insightdist.com
In reply to: Vadim B. Mikheev (#7)
Re: [HACKERS] varchar(), text,char() overhead

> I had forgotten about your mention of this. I am running some tests
> now, and things look promising. However, if we go to 64k or 128k
> tuples, we would be in trouble. (We can do 64k tuples by changing the
> 'special variable' length value from -1 to 0.)
>
> I am not going to make any changes to the variable length overhead for
> char(), varchar(), and text at this time. It is too close to beta. I
> will keep the item on the TODO list, and we can hash it out later.

I've been slowed this week...totalled my car Sat. nite (actually some
other bonehead did it for me), so I've been a touch busy with insurance
agents, etc...but I _will_ have this in by the beta date.

Tuples will only go up to 32k since there are only 15 bits available to
point to items on the page. Unless we want to expand that structure... I
tested bit-field alignment on AIX and it seems to favor 4-byte boundaries.

For now the three bit fields total 32 bits, but expand those and the size
is padded up to 64. I have no idea how gcc or other compilers align bit
fields. I've been working without trying to expand this structure size, so
I've stuck to 32k (15 bits) as the limit.
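
A small sketch for anyone who wants to check the padding on their own
compiler. The struct and field names are invented for this example, and the
15/2/15 split is an assumption (only the 32-bit total and the 15-bit offset
come from the numbers above). Bit-field layout is implementation-defined in
C, so the sizes should be measured rather than assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item pointer: three bit fields totalling 32 bits. */
struct item_id_sketch {
    unsigned offset : 15; /* byte offset of the item on the page */
    unsigned flags  : 2;
    unsigned length : 15; /* byte length of the item */
};

/* Same fields widened to 16 bits each: 34 bits in total. */
struct item_id_wide_sketch {
    unsigned offset : 16;
    unsigned flags  : 2;
    unsigned length : 16;
};

/* Largest value an unsigned field of the given width can hold. */
unsigned max_for_bits(unsigned bits)
{
    assert(bits > 0 && bits < 32);
    return (1u << bits) - 1u;
}

/* Sizes as laid out by the compiler in use. */
size_t narrow_size(void) { return sizeof(struct item_id_sketch); }
size_t wide_size(void)   { return sizeof(struct item_id_wide_sketch); }
```

max_for_bits(13) and max_for_bits(15) give the largest offsets for 8k and
32k pages; comparing narrow_size() with wide_size() shows how much a given
compiler pads once the fields no longer fit in 32 bits.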

I have so far...

1. Synced all references to the BLCKSZ define.
2. Made a "-k" option to postgres and created a global "BlockSize" variable.
3. Fixed the places where BLCKSZ was used in variable declarations to use
the BlockSize global.

To do...

1. Should the block size of a database be written to a file like the version?
And then be read in when postmaster starts and passed to each backend? This
would limit all of the databases in one PG_DATA directory to the same block
size. Couldn't do it on a "per database" basis since the template is only
created once by initdb.

2. Should the limit of the char fields be based on the block size? Been trying
to get this to work. Creates fields just fine, but backend seems to be passing
the fields back padded to the full size.
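
For to-do item 1, a rough sketch of keeping the block size in a small text
file next to the version file. The function names and the validity rule (a
power of two no larger than 32k) are assumptions made for this example, not
existing code.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed rule for this sketch: a power of two, at most 32k. */
int block_size_ok(long n)
{
    return n > 0 && n <= 32768 && (n & (n - 1)) == 0;
}

/* Write the block size as one line of text. Returns 0 on success, -1 on error. */
int write_block_size(const char *path, long n)
{
    assert(path != NULL);
    if (!block_size_ok(n))
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%ld\n", n) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read it back. Returns the block size, or -1 if the file is missing,
 * malformed, or holds a value that fails block_size_ok(). */
long read_block_size(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long n = -1;
    int got = fscanf(f, "%ld", &n);
    fclose(f);
    if (got != 1 || !block_size_ok(n))
        return -1;
    return n;
}
```

The postmaster could call something like read_block_size() once at startup
and pass the result to each backend, which matches the one-size-per-PG_DATA
limitation described in item 1.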

Is it possible for the back and front end to have a tuple split across packets
or does everything have to be in one 8k packet? Rather than have the interfaces
needing to know about differing sizes, could tuples-spanning-packets be added
to the libpq protocol somehow? Or would this have a bottleneck effect?

darrenk