Compression (was Re: [HACKERS] varchar/char size)
My CA/Ingres Admin manual points out that there is a tradeoff between
compressing tuples to save disk storage and the extra processing work
required to uncompress for use. They suggest that the only case where you
would consider compressing on disk is when your system is very I/O bound,
and you have CPU to burn.

The default for Ingres is to not compress anything, but you can specify
compression on a table-by-table basis.

btw, char() is a bit trickier to handle correctly if you do compress it on
disk, since trailing blanks must be handled correctly all the way through.
For example, you would want 'hi' = 'hi ' to be true, which is not a
requirement for varchar().

- Tom
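(A minimal sketch of that blank-insensitive comparison, in C; the helper
names here are just for illustration, not the backend's actual code:)

    #include <stdbool.h>
    #include <string.h>

    /* Length of s with trailing blanks ignored. */
    static size_t true_len(const char *s, size_t len)
    {
        while (len > 0 && s[len - 1] == ' ')
            len--;
        return len;
    }

    /* char(n)-style equality: 'hi' and 'hi ' compare equal because
     * trailing blanks are not significant.  varchar() comparison has
     * no such requirement and may treat the two as different. */
    bool bpchar_eq(const char *a, size_t alen, const char *b, size_t blen)
    {
        alen = true_len(a, alen);
        blen = true_len(b, blen);
        return alen == blen && memcmp(a, b, alen) == 0;
    }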
Anybody thought about real gzip-style compression? There's a specialised
RDBMS called Iditis (written specifically for one task) which, like
PostgreSQL, stores data at the file level and uses a gzip-based library
to access the files. I gather this is transparent to the software. Has
anyone thought of anything equivalent for PG/SQL?
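(Purely as illustration of what such transparent file-level access might
look like, assuming zlib rather than whatever library Iditis actually
uses: the gz* calls behave like ordinary file I/O, with compression done
entirely inside the library:)

    #include <stdio.h>
    #include <zlib.h>

    /* Write and re-read a "table file" through zlib's gzip layer. */
    int main(void)
    {
        char buf[256];

        gzFile out = gzopen("table.dat.gz", "wb");
        if (out == NULL)
            return 1;
        gzwrite(out, "some tuple data\n", 16);
        gzclose(out);

        gzFile in = gzopen("table.dat.gz", "rb");
        if (in == NULL)
            return 1;
        int n = gzread(in, buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            fputs(buf, stdout);
        }
        gzclose(in);
        return 0;
    }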
To be honest I haven't looked into how Iditis does it (it's a commercial
program and I don't have the source). I don't actually see how this
could be done for small writes of data - how does it build the lookup
tables for the compression? However, it might be worth considering for
use with the text field type.
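(One way to picture it for a text field, sketched with zlib's one-shot
calls and made-up buffer sizes: compress each value independently. It
also shows why small writes gain little, since the compressor rebuilds
its tables from scratch for every value:)

    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    int main(void)
    {
        const char *text = "a text field value long enough to repeat, "
                           "repeat, repeat, repeat, repeat, repeat";
        uLong srclen = (uLong) strlen(text);

        /* Compress the single value in one shot. */
        Bytef cbuf[1024];
        uLongf clen = sizeof(cbuf);
        if (compress2(cbuf, &clen, (const Bytef *) text, srclen,
                      Z_BEST_COMPRESSION) != Z_OK)
            return 1;

        /* Decompress it again for use. */
        Bytef dbuf[1024];
        uLongf dlen = sizeof(dbuf);
        if (uncompress(dbuf, &dlen, cbuf, clen) != Z_OK)
            return 1;

        printf("original %lu bytes, compressed %lu bytes\n",
               (unsigned long) srclen, (unsigned long) clen);
        return 0;
    }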
Andrew
----------------------------------------------------------------------------
Dr. Andrew C.R. Martin University College London
EMAIL: (Work) martin@biochem.ucl.ac.uk (Home) andrew@stagleys.demon.co.uk
URL: http://www.biochem.ucl.ac.uk/~martin
Tel: (Work) +44(0)171 419 3890 (Home) +44(0)1372 275775