Compression

Started by Yang Zhang | almost 15 years ago | 13 messages | general
#1 Yang Zhang
yanghatespam@gmail.com

Is there any effort to add compression into PG, a la MySQL's
row_format=compressed or HBase's LZO block compression?

#2 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Yang Zhang (#1)
Re: Compression

On Thursday, April 14, 2011 4:01:54 pm Yang Zhang wrote:

Is there any effort to add compression into PG, a la MySQL's
row_format=compressed or HBase's LZO block compression?

TOAST?
http://www.postgresql.org/docs/9.0/interactive/storage-toast.html
--
Adrian Klaver
adrian.klaver@gmail.com

#3 Craig Ringer
craig@2ndquadrant.com
In reply to: Yang Zhang (#1)
Re: Compression

On 15/04/2011 7:01 AM, Yang Zhang wrote:

Is there any effort to add compression into PG, a la MySQL's
row_format=compressed or HBase's LZO block compression?

There's no row compression, but as mentioned by others there is
out-of-line compression of large values using TOAST.

Row compression would be interesting, but I can't imagine it not having
been investigated already.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

#4 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Craig Ringer (#3)
Re: Compression

On Thursday, April 14, 2011 4:50:44 pm Craig Ringer wrote:

On 15/04/2011 7:01 AM, Yang Zhang wrote:

Is there any effort to add compression into PG, a la MySQL's
row_format=compressed or HBase's LZO block compression?

There's no row compression, but as mentioned by others there is
out-of-line compression of large values using TOAST.

I could be misunderstanding but I thought compression happened in the row as
well. From the docs:

"EXTENDED allows both compression and out-of-line storage. This is the default
for most TOAST-able data types. Compression will be attempted first, then
out-of-line storage if the row is still too big."

Row compression would be interesting, but I can't imagine it not having
been investigated already.

--
Adrian Klaver
adrian.klaver@gmail.com

#5 Yang Zhang
yanghatespam@gmail.com
In reply to: Adrian Klaver (#4)
Re: Compression

On Thu, Apr 14, 2011 at 5:07 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:

On Thursday, April 14, 2011 4:50:44 pm Craig Ringer wrote:

On 15/04/2011 7:01 AM, Yang Zhang wrote:

Is there any effort to add compression into PG, a la MySQL's
row_format=compressed or HBase's LZO block compression?

There's no row compression, but as mentioned by others there is
out-of-line compression of large values using TOAST.

I could be misunderstanding but I thought compression happened in the row as
well. From the docs:

"EXTENDED allows both compression and out-of-line storage. This is the
default for most TOAST-able data types. Compression will be attempted first,
then out-of-line storage if the row is still too big."

Row compression would be interesting, but I can't imagine it not having
been investigated already.

--
Adrian Klaver
adrian.klaver@gmail.com

Already know about TOAST. I could've been clearer, but that's not the
same as the block-/page-level compression I was referring to.

--
Yang Zhang
http://yz.mit.edu/

#6 mark
dvlhntr@gmail.com
In reply to: Yang Zhang (#5)
Re: Compression

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
owner@postgresql.org] On Behalf Of Yang Zhang
Sent: Thursday, April 14, 2011 6:51 PM
To: Adrian Klaver
Cc: pgsql-general@postgresql.org; Craig Ringer
Subject: Re: [GENERAL] Compression

Already know about TOAST. I could've been clearer, but that's not the
same as the block-/page-level compression I was referring to.

There is a (closed source) PG fork that has row- (or column-) oriented storage
that can have compression applied to it, if you are willing to give up
updates and deletes on the table, that is.

I haven't seen a lot of people talking about wanting that in the Postgres
core, though.

-M

--
Yang Zhang
http://yz.mit.edu/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Yang Zhang (#5)
Re: Compression

On Thursday, April 14, 2011 5:51:21 pm Yang Zhang wrote:

adrian.klaver@gmail.com

Already know about TOAST. I could've been clearer, but that's not the
same as the block-/page-level compression I was referring to.

I am obviously missing something. The TOAST mechanism is designed to keep tuple
data below the default 8KB page size. In fact it kicks in at a lower level than
that:

"The TOAST code is triggered only when a row value to be stored in a table is
wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code will
compress and/or move field values out-of-line until the row value is shorter than
TOAST_TUPLE_TARGET bytes (also normally 2 kB) or no more gains can be had.
During an UPDATE operation, values of unchanged fields are normally preserved
as-is; so an UPDATE of a row with out-of-line values incurs no TOAST costs if
none of the out-of-line values change."

Granted, not all data types are TOASTable. Are you looking for something more
aggressive than that?

--
Adrian Klaver
adrian.klaver@gmail.com
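
The docs passage quoted in #7 can be read as a small decision procedure. The following is only a rough sketch of that reading, not PostgreSQL's actual code: `toast_row` is a hypothetical helper, zlib stands in for PostgreSQL's own compressor, 2048 is a rounded stand-in for the "normally 2 kB" figures, and the 18-byte pointer size is an assumption made for illustration.

```python
import os
import zlib

# Rounded stand-ins for the "normally 2 kB" values quoted from the docs.
TOAST_TUPLE_THRESHOLD = 2048
TOAST_TUPLE_TARGET = 2048
POINTER_SIZE = 18  # assumption: bytes left in the row for an out-of-line value


def toast_row(fields: list[bytes]) -> list[tuple[str, int]]:
    """Return (state, bytes kept in the row) for each field.

    Hypothetical model of the quoted behaviour: do nothing unless the row is
    wider than the threshold, then compress, then move values out of line,
    stopping as soon as the row is at or below the target.
    """
    stored = [("plain", len(f)) for f in fields]

    def row_size() -> int:
        return sum(size for _, size in stored)

    if row_size() <= TOAST_TUPLE_THRESHOLD:
        return stored  # TOAST is not triggered at all

    # Work on the largest fields first.
    order = sorted(range(len(fields)), key=lambda i: len(fields[i]), reverse=True)

    # Pass 1: try compression, keeping it only when it actually saves space.
    for i in order:
        if row_size() <= TOAST_TUPLE_TARGET:
            break
        compressed = len(zlib.compress(fields[i]))
        if compressed < len(fields[i]):
            stored[i] = ("compressed", compressed)

    # Pass 2: move values out of line, leaving only a pointer in the row.
    for i in order:
        if row_size() <= TOAST_TUPLE_TARGET:
            break
        if stored[i][1] > POINTER_SIZE:
            stored[i] = ("out-of-line", POINTER_SIZE)

    return stored


if __name__ == "__main__":
    print(toast_row([b"a" * 10, b"b" * 10]))       # small row: untouched
    print(toast_row([b"a" * 5000, b"x" * 10]))     # compressible wide value
    print(toast_row([os.urandom(5000), b"x" * 10]))  # incompressible wide value
```

Note that everything here operates on individual field values; nothing in this procedure compresses a page or block as a unit, which is the feature the original question was about.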

#8 Yang Zhang
yanghatespam@gmail.com
In reply to: Adrian Klaver (#7)
Re: Compression

On Thu, Apr 14, 2011 at 7:42 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:

On Thursday, April 14, 2011 5:51:21 pm Yang Zhang wrote:

adrian.klaver@gmail.com

Already know about TOAST. I could've been clearer, but that's not the
same as the block-/page-level compression I was referring to.

I am obviously missing something. The TOAST mechanism is designed to keep
tuple data below the default 8KB page size. In fact it kicks in at a lower
level than that:

"The TOAST code is triggered only when a row value to be stored in a table
is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code
will compress and/or move field values out-of-line until the row value is
shorter than TOAST_TUPLE_TARGET bytes (also normally 2 kB) or no more gains
can be had. During an UPDATE operation, values of unchanged fields are
normally preserved as-is; so an UPDATE of a row with out-of-line values
incurs no TOAST costs if none of the out-of-line values change."

Granted, not all data types are TOASTable. Are you looking for something more
aggressive than that?

Yes.

http://blog.oskarsson.nu/2009/03/hadoop-feat-lzo-save-disk-space-and.html

http://wiki.apache.org/hadoop/UsingLzoCompression

http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-internals-algorithms.html

--

Adrian Klaver

adrian.klaver@gmail.com

--
Yang Zhang
http://yz.mit.edu/

#9 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Yang Zhang (#8)
Re: Compression

On Thursday, April 14, 2011 7:46:34 pm Yang Zhang wrote:

On Thu, Apr 14, 2011 at 7:42 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:

Granted, not all data types are TOASTable. Are you looking for something
more aggressive than that?

Yes.

http://blog.oskarsson.nu/2009/03/hadoop-feat-lzo-save-disk-space-and.html

http://wiki.apache.org/hadoop/UsingLzoCompression

http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-internals-algorithms.html

I can see that as another use case for SQL/MED in 9.1+.

--
Adrian Klaver
adrian.klaver@gmail.com

#10 Yang Zhang
yanghatespam@gmail.com
In reply to: mark (#6)
Re: Compression

On Thu, Apr 14, 2011 at 6:46 PM, mark <dvlhntr@gmail.com> wrote:

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
owner@postgresql.org] On Behalf Of Yang Zhang
Sent: Thursday, April 14, 2011 6:51 PM
To: Adrian Klaver
Cc: pgsql-general@postgresql.org; Craig Ringer
Subject: Re: [GENERAL] Compression

Already know about TOAST.  I could've been clearer, but that's not the
same as the block-/page-level compression I was referring to.

There is a (closed source) PG fork that has row- (or column-) oriented storage
that can have compression applied to it, if you are willing to give up
updates and deletes on the table, that is.

Greenplum and Aster?

We *are* mainly doing analytical (non-updating/deleting) processing.
But it's not a critical pain point - we're mainly interested in FOSS
for now.

I haven't seen a lot of people talking about wanting that in the Postgres
core, though.

-M

--
Yang Zhang
http://yz.mit.edu/

--
Yang Zhang
http://yz.mit.edu/

#11 Craig Ringer
craig@2ndquadrant.com
In reply to: Adrian Klaver (#4)
Re: Compression

On 15/04/2011 8:07 AM, Adrian Klaver wrote:

"EXTENDED allows both compression and out-of-line storage. This is the
default for most TOAST-able data types. Compression will be attempted
first, then out-of-line storage if the row is still too big."

Good point. I was unclear; thanks for pointing it out.

What I was trying to say is that there's no whole-row compression, i.e.
compression of the whole tuple except for minimal headers. A value in a
field may be compressed, but you can't (say) compress a 100-column row
of integers in Pg, because the individual fields don't support compression.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/
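
The distinction drawn in #11 (per-field compression versus compressing a whole row or page) can be illustrated with a small experiment. This is only a sketch under stated assumptions: zlib stands in for whatever compressor a database would use, and the 100-integer row and its values are made up for the illustration.

```python
import struct
import zlib

# Hypothetical row: 100 four-byte integers with small, repetitive values.
values = [i % 10 for i in range(100)]
fields = [struct.pack("<i", v) for v in values]
raw_size = sum(len(f) for f in fields)

# Per-field compression: each 4-byte value is compressed on its own, and the
# compressed form is kept only if it is smaller than the original (never, here,
# because the compressor's framing overhead exceeds 4 bytes).
per_field_size = sum(min(len(f), len(zlib.compress(f))) for f in fields)

# Whole-row compression: the fields are concatenated and compressed as a unit,
# so redundancy across columns becomes visible to the compressor.
whole_row_size = len(zlib.compress(b"".join(fields)))

print("raw bytes:        ", raw_size)
print("per-field bytes:  ", per_field_size)
print("whole-row bytes:  ", whole_row_size)
```

Per-field compression gains nothing on such a row, while compressing the row as a whole does, which is the kind of saving block-/page-level compression is after.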

#12 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Craig Ringer (#11)
Re: Compression

On Thursday, April 14, 2011 9:37:10 pm Craig Ringer wrote:

On 15/04/2011 8:07 AM, Adrian Klaver wrote:

"EXTENDED allows both compression and out-of-line storage. This is the
default for most TOAST-able data types. Compression will be attempted
first, then out-of-line storage if the row is still too big."

Good point. I was unclear; thanks for pointing it out.

What I was trying to say is that there's no whole-row compression, i.e.
compression of the whole tuple except for minimal headers. A value in a
field may be compressed, but you can't (say) compress a 100-column row
of integers in Pg, because the individual fields don't support compression.

Got it now, thanks.
--
Adrian Klaver
adrian.klaver@gmail.com

#13 rtshadow
przemek@hadapt.com
In reply to: mark (#6)
Re: Compression

Where do I find more information about the PG fork you mentioned?
