Compressing table images

Started by Brian Hurt over 19 years ago · 4 messages
#1 Brian Hurt
bhurt@janestcapital.com

My apologies if this subject has already been hashed to death, or if
this is the wrong list, but I was wondering if people had seen this paper:
http://www.cwi.nl/htbin/ins1/publications?request=intabstract&key=ZuHeNeBo:ICDE:06

Basically it describes a compression algorithm for tables of a
database. The huge advantage of doing this is that it reduced the disk
traffic by (approximately) a factor of four, at the cost of more CPU
utilization.

Any thoughts or comments?

Brian

#2 Joshua D. Drake
jd@commandprompt.com
In reply to: Brian Hurt (#1)
Re: Compressing table images

Brian Hurt wrote:

> My apologies if this subject has already been hashed to death, or if
> this is the wrong list, but I was wondering if people had seen this paper:
> http://www.cwi.nl/htbin/ins1/publications?request=intabstract&key=ZuHeNeBo:ICDE:06
>
> Basically it describes a compression algorithm for tables of a
> database. The huge advantage of doing this is that it reduced the disk
> traffic by (approximately) a factor of four- at the cost of more CPU
> utilization.
> Any thoughts or comments?

I don't know if that is the algorithm we use but PostgreSQL will
compress its data within the table.
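
For reference, the in-table compression in question is the TOAST machinery
(pglz applied to large variable-length attributes), and the one knob that is
already exposed is the per-column storage strategy. A minimal sketch, with the
table and column names invented purely for illustration:

    -- throwaway table just to show the knob (names invented)
    CREATE TABLE docs (id integer PRIMARY KEY, body text);

    -- compress and allow out-of-line storage (the default for most varlena types)
    ALTER TABLE docs ALTER COLUMN body SET STORAGE EXTENDED;

    -- store out of line but never compress
    ALTER TABLE docs ALTER COLUMN body SET STORAGE EXTERNAL;

    -- prefer compression while keeping the value inline
    ALTER TABLE docs ALTER COLUMN body SET STORAGE MAIN;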

Joshua D. Drake


--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

#3 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Joshua D. Drake (#2)
Re: Compressing table images

Joshua D. Drake wrote:

> Brian Hurt wrote:
>
> > My apologies if this subject has already been hashed to death, or if
> > this is the wrong list, but I was wondering if people had seen this paper:
> > http://www.cwi.nl/htbin/ins1/publications?request=intabstract&key=ZuHeNeBo:ICDE:06
> >
> > Basically it describes a compression algorithm for tables of a
> > database. The huge advantage of doing this is that it reduced the disk
> > traffic by (approximately) a factor of four- at the cost of more CPU
> > utilization.
> > Any thoughts or comments?
>
> I don't know if that is the algorithm we use but PostgreSQL will
> compress its data within the table.

But only in certain very specific cases. And we compress on a
per-attribute basis. Compressing at the page level is pretty much out
of the question; but compressing at the tuple level I think is doable.
How much benefit that brings is another matter. I think we still have
more use for our limited manpower elsewhere.
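
The per-attribute setting is recorded in pg_attribute.attstorage, so it is
easy to inspect; a small sketch, assuming a hypothetical table named docs:

    -- attstorage: p = plain, m = main, e = external, x = extended
    SELECT attname, attstorage
      FROM pg_attribute
     WHERE attrelid = 'docs'::regclass
       AND attnum > 0
       AND NOT attisdropped;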

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#4 Jim C. Nasby
jnasby@pervasive.com
In reply to: Alvaro Herrera (#3)
Re: Compressing table images

On Thu, May 11, 2006 at 05:05:26PM -0400, Alvaro Herrera wrote:

> Joshua D. Drake wrote:
>
> > Brian Hurt wrote:
> >
> > > My apologies if this subject has already been hashed to death, or if
> > > this is the wrong list, but I was wondering if people had seen this paper:
> > > http://www.cwi.nl/htbin/ins1/publications?request=intabstract&key=ZuHeNeBo:ICDE:06
> > >
> > > Basically it describes a compression algorithm for tables of a
> > > database. The huge advantage of doing this is that it reduced the disk
> > > traffic by (approximately) a factor of four- at the cost of more CPU
> > > utilization.
> > > Any thoughts or comments?
> >
> > I don't know if that is the algorithm we use but PostgreSQL will
> > compress its data within the table.
>
> But only in certain very specific cases. And we compress on a
> per-attribute basis. Compressing at the page level is pretty much out
> of the question; but compressing at the tuple level I think is doable.
> How much benefit that brings is another matter. I think we still have
> more use for our limited manpower elsewhere.

Except that I think it would be highly useful to allow users to change
the limits used for both toasting and compressing on a per-table and/or
per-field basis. For example, if you have a varchar(1500) in a table
it's unlikely to ever be large enough to trigger toasting, but if that
field is rarely updated it could be a big win to store it toasted. Of
course you can always create a 'side table' (vertical partitioning), but
all of that framework already exists in the database; we just don't
provide the required knobs. I suspect it wouldn't be that hard to expose
those knobs. In fact, if we could agree on syntax, this is probably a
beginner TODO.
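
As a rough sketch of the 'side table' workaround (table and column names
invented for illustration), doing the vertical partitioning by hand looks
like this:

    CREATE TABLE invoices (
        invoice_id integer PRIMARY KEY,
        created_at timestamptz NOT NULL
    );

    -- the wide, rarely-updated column lives in its own table
    CREATE TABLE invoice_notes (
        invoice_id integer PRIMARY KEY REFERENCES invoices,
        notes      varchar(1500)
    );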

ISTR having this discussion on one of the lists recently, but I can't
find it, and don't see anything in the TODO. Basically, I think we'd
want knobs that say: if this field is over X size, compress it. If it's
over Y size, store it externally. Per-table and per-cluster (ie: GUC)
knobs for that would be damn handy as well.
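
Purely as a strawman, and not syntax that exists in PostgreSQL today, the
per-column knobs could look something like this (option names invented):

    -- HYPOTHETICAL: compress_threshold / external_threshold are not real options
    ALTER TABLE invoice_notes
        ALTER COLUMN notes
        SET (compress_threshold = 512,     -- compress values larger than X bytes
             external_threshold = 2048);   -- store values larger than Y bytes out of line

A matching pair of GUCs could then supply the per-cluster defaults.
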
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461