About varlena2

Started by Qingqing Zhou about 20 years ago (3 messages)
#1 Qingqing Zhou
zhouqq@cs.toronto.edu

Proposal: reduce the size of varlena2.vl_len to int16. This has been
mentioned before, but is there any show-stopper preventing us from doing
that, or is somebody already working on it?

Sorry to repeat myself, but char types would benefit from this. Many
applications are ported from DB2, Oracle, or SQL Server, whose maximum
char lengths are:

Max char length (bytes)
DB2          32672
SQL Server    8000
Oracle        4000

All of the above fit in a varlena2, since an int16 length can represent
values up to 32767. To support bigger char types, we could follow the
traditional "long varchar" approach. Or we could introduce new data
types such as "short varchar", so that existing PostgreSQL applications
stay compatible.
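To make the size argument concrete, here is a minimal C sketch of what an int16-headed varlena2 could look like. The names (`varlena2_sketch`, `VARLENA2_HDRSZ`, `VARLENA2_MAX_DATA`, `fits_varlena2`) are hypothetical, and the sketch assumes that, as with the existing varlena, the length word counts its own bytes.

```c
#include <stdint.h>

/* Hypothetical varlena2 layout: a 2-byte length word followed by data.
 * Assumption: like the existing varlena, vl_len includes the header. */
typedef struct {
    int16_t vl_len;
    char    vl_dat[];   /* C99 flexible array member */
} varlena2_sketch;

/* Header size, and the largest payload an int16 length word can describe. */
#define VARLENA2_HDRSZ    ((int32_t) sizeof(int16_t))
#define VARLENA2_MAX_DATA ((int32_t) INT16_MAX - VARLENA2_HDRSZ)

/* Returns 1 if a value with data_len payload bytes fits in a varlena2. */
static int fits_varlena2(int32_t data_len)
{
    return data_len >= 0 && data_len <= VARLENA2_MAX_DATA;
}
```

Under these assumptions the payload limit is 32765 bytes, so all three maximums listed above (32672, 8000, 4000) fit, and each value saves two bytes compared with a 4-byte length word.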

Regards,
Qingqing

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Qingqing Zhou (#1)
Re: About varlena2

Qingqing Zhou <zhouqq@cs.toronto.edu> writes:

> To reduce size of varlen2.vl_len to int16. This has been mentioned before,
> but is there any show-stopper reasoning preventing us from doing that or
> somebody has been working on it?
>
> Sorry, just to repeat myself. Char types will benefit from that.

I have considerably less than zero interest in creating variant char
types with an int16 header. The proposal that was on the table was
to use this for numeric and inet types, where it could be done without
introducing any user-visible semantics changes.

regards, tom lane

#3 ITAGAKI Takahiro
itagaki.takahiro@lab.ntt.co.jp
In reply to: Qingqing Zhou (#1)
Re: About varlena2

Qingqing Zhou <zhouqq@cs.toronto.edu> wrote:

> To reduce size of varlen2.vl_len to int16. This has been mentioned before,
> but is there any show-stopper reasoning preventing us from doing that or
> somebody has been working on it?

Hi, I'm rewriting the patch that I proposed before.
(http://archives.postgresql.org/pgsql-hackers/2005-09/msg00421.php)
This is another way to reduce the size of variable length types,
using variable length headers.

I'm sure there are pros and cons to this approach.
Pros:
- Optimized for short values (length <= 127),
  where the header takes only one byte.
- It can still represent long data.
Cons:
- More complexity, and more operations to extract the length and the data.
- Needs more work to support TOAST.

To support TOAST, I am considering the following representations.
If TOAST is not needed, it might be good to use only A and B.

  | Representation             | Size  | Mode                |
--+----------------------------+-------+---------------------+
A | 0******* + data            | 1 + n | length <= 127       |
B | 10****** + 1 byte + data   | 2 + n | length <= 16K - 1   |
C | 110----- + 4 bytes + data  | 5 + n | length <= 4G - 1    |
D | 1110---- + 6 bytes + data  | 7 + n | Compressed          |
E | 11110--- + 12 bytes        | 13    | External            |
F | 11111--- + 16 bytes        | 17    | External+Compressed |
('*' bits hold the length, '-' bits are unused; n is the data length in bytes.)
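The header layout in the table above can be sketched in C roughly as follows. This is only an illustration, not the patch itself: the names (`vh_mode`, `vh_classify`, `vh_write_len`, `vh_read_len`) are hypothetical, the multi-byte length fields are assumed to be stored big-endian, and only the uncompressed inline forms A, B and C are encoded; D, E and F are merely recognized by their prefix bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the six rows (A-F) of the table above. */
typedef enum { VH_A, VH_B, VH_C, VH_D, VH_E, VH_F } vh_mode;

/* Classify a header by the prefix bits of its first byte. */
static vh_mode vh_classify(const uint8_t *p)
{
    uint8_t b = p[0];

    if ((b & 0x80) == 0x00) return VH_A;   /* 0*******  */
    if ((b & 0xC0) == 0x80) return VH_B;   /* 10******  */
    if ((b & 0xE0) == 0xC0) return VH_C;   /* 110-----  */
    if ((b & 0xF0) == 0xE0) return VH_D;   /* 1110----  */
    if ((b & 0xF8) == 0xF0) return VH_E;   /* 11110---  */
    return VH_F;                           /* 11111---  */
}

/* Write the length header for an inline, uncompressed datum of n data
 * bytes (forms A, B, C). Returns the header size in bytes (1, 2 or 5).
 * Assumption: multi-byte length fields are big-endian. */
static size_t vh_write_len(uint8_t *p, uint32_t n)
{
    if (n <= 127) {                        /* A: 0 + 7-bit length */
        p[0] = (uint8_t) n;
        return 1;
    }
    if (n <= 16383) {                      /* B: 10 + 14-bit length */
        p[0] = (uint8_t) (0x80 | (n >> 8));
        p[1] = (uint8_t) (n & 0xFF);
        return 2;
    }
    p[0] = 0xC0;                           /* C: 110 + 32-bit length */
    p[1] = (uint8_t) (n >> 24);
    p[2] = (uint8_t) (n >> 16);
    p[3] = (uint8_t) (n >> 8);
    p[4] = (uint8_t) n;
    return 5;
}

/* Read back the data length of an A, B or C header into *n.
 * Returns the header size, or 0 for the TOAST forms D, E and F. */
static size_t vh_read_len(const uint8_t *p, uint32_t *n)
{
    switch (vh_classify(p)) {
    case VH_A:
        *n = p[0];
        return 1;
    case VH_B:
        *n = ((uint32_t) (p[0] & 0x3F) << 8) | p[1];
        return 2;
    case VH_C:
        *n = ((uint32_t) p[1] << 24) | ((uint32_t) p[2] << 16) |
             ((uint32_t) p[3] << 8) | p[4];
        return 5;
    default:
        return 0;
    }
}
```

Decoding costs one or two mask comparisons on the first byte before the length is known, which is the extra per-access work listed under "Cons" above, traded for a one-byte header on short values.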

Comments welcome,
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories