Are there plans to add data compression feature to postgresql?
On Sun, Oct 26, 2008 at 9:54 AM, 小波 顾 <guxiaobo1982@hotmail.com> wrote:
Are there plans to add data compression feature to postgresql?
There already is data compression in postgresql.
Scott-
Straight from the Postgres docs:
The zlib compression library will be used by default. If you don't want to
use it then you must specify the --without-zlib option for configure. Using
this option disables support for compressed archives in pg_dump and
pg_restore.

Martin
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
2008/10/26 Martin Gainty <mgainty@hotmail.com>:
Scott-
Straight from the Postgres docs:
The zlib compression library will be used by default. If you don't want to
use it then you must specify the --without-zlib option for configure. Using
this option disables support for compressed archives in pg_dump and
pg_restore.
I was thinking more along the lines of the automatic compression of
text types over 4k or so when they are moved out of line and into
toast tables.
The original question was a little vague though, wasn't it?
Note that most data stored in the TOAST table is compressed,
i.e. a text type with length greater than around 2K will be stored in the
TOAST table. By default, data in the TOAST table is compressed; this can
be overridden.
However, I expect that compression will reduce the performance of certain
queries.
http://www.postgresql.org/docs/8.3/interactive/storage-toast.html
Out of interest, in what context did you want compression?
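(As an aside: TOAST's built-in compressor is pglz, an LZ-family scheme, not zlib; the sketch below uses Python's zlib purely to illustrate why a redundant text value past the ~2K threshold shrinks so much. The sample data is made up.)

```python
import zlib

# A highly redundant ~4 KB text value, the kind TOAST would move
# out of line and compress.
value = ("status=OK user=guest action=view page=/index.html " * 80).encode()

compressed = zlib.compress(value)
assert len(value) == 4000
assert len(compressed) < len(value) // 10    # redundant text compresses well
assert zlib.decompress(compressed) == value  # and round-trips losslessly
```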
Ron Mayer <rm_pg@cheapcomplexdevices.com>
Sent by: pgsql-general-owner@postgresql.org
27/10/2008 07:34
To: 小波 顾 <guxiaobo1982@hotmail.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] Are there plans to add data compression feature to postgresql?
You might want to try using a file system (ZFS, NTFS) that
does compression, depending on what you're trying to compress.
Sorry for following up so late. Actually, I mean compression features like those in other commercial RDBMSs, such as DB2 9.5 or SQL Server 2008. In those databases, all data types in all tables can be compressed. The following are two features we think are very useful:
1. Integers of types that take 8 bytes are stored in only 4 or 2 bytes when the values are small enough.
2. If two values share the same text or pattern, only one copy is stored, and that copy is also compressed with traditional compression methods.
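(A toy sketch of those two features in Python, showing the idea in the abstract, not how DB2 or SQL Server actually implement it; all names and data are illustrative.)

```python
def encode_int(n: int) -> bytes:
    """Store a non-negative 8-byte integer in the fewest whole bytes."""
    return n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")

def decode_int(b: bytes) -> int:
    return int.from_bytes(b, "big")

# Feature 1: small values shrink from 8 bytes to 1-2 bytes.
assert len(encode_int(200)) == 1
assert len(encode_int(65_000)) == 2
assert decode_int(encode_int(65_000)) == 65_000

# Feature 2: duplicate values are stored once; rows keep only small ids.
rows = ["alpha", "beta", "alpha", "alpha", "beta"]
dictionary = list(dict.fromkeys(rows))     # each distinct value stored once
ids = [dictionary.index(v) for v in rows]  # rows reference the dictionary
assert dictionary == ["alpha", "beta"]
assert ids == [0, 1, 0, 0, 1]
```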
2008/10/29 小波 顾 <guxiaobo1982@hotmail.com>
1. Integers of types that take 8 bytes are stored in only 4 or 2 bytes
when the values are small enough.
So what actually happens if I have a table with a few million values that fit
in 2 bytes, but all of a sudden I am going to add another column with
something that requires 8 bytes? An update on all columns? I am actually even
against varchars in my databases, so something like that sounds at least
creepy.
--
GJ
Data Compression
The new data compression feature in SQL Server 2008 reduces the size of tables, indexes or a subset of their partitions by storing fixed-length data types in variable length storage format and by reducing the redundant data. The space savings achieved depends on the schema and the data distribution. Based on our testing with various data warehouse databases, we have seen a reduction in the size of real user databases up to 87% (a 7 to 1 compression ratio) but more commonly you should expect a reduction in the range of 50-70% (a compression ratio between roughly 2 to 1 and 3 to 1).
SQL Server provides two types of compression as follows:
· ROW compression enables storing fixed length types in variable length storage format. So for example, if you have a column of data type BIGINT which takes 8 bytes of storage in fixed format, when compressed it takes a variable number of bytes—anywhere from 0 bytes to up to 8 bytes. Since column values are stored as variable length, an additional 4‑bit length code is stored for each field within the row. Additionally, zero and NULL values don’t take any storage except for the 4‑bit code.
· PAGE compression is built on top of ROW compression. It minimizes storage of redundant data on the page by storing commonly occurring byte patterns on the page once and then referencing these values for respective columns. The byte pattern recognition is type-independent. Under PAGE compression, SQL Server optimizes space on a page using two techniques.
The first technique is column prefix. In this case, the system looks for a common byte pattern as a prefix for all values of a specific column across rows on the page. This process is repeated for all the columns in the table or index. The column prefix values that are computed are stored as an anchor record on the page and the data or index rows refer to the anchor record for the common prefix, if available, for each column.
The second technique is page level dictionary. This dictionary stores common values across columns and rows and stores them in a dictionary. The columns are then modified to refer to the dictionary entry.
Compression comes with additional CPU cost. This overhead is paid when you query or execute DML operations on compressed data. The relative CPU overhead with ROW is less than for PAGE, but PAGE compression can provide better compression.

Since there are many kinds of workloads and data patterns, SQL Server exposes compression granularity at a partition level. You can choose to compress the whole table or index or a subset of partitions. For example, in a DW workload, if CPU is the dominant cost in your workload but you want to save some disk space, you may want to enable PAGE compression on partitions that are not accessed frequently while not compressing the current partition(s) that are accessed and manipulated more frequently. This reduces the total CPU cost, at a small increase in disk space requirements. If I/O cost is dominant for your workload, or you need to reduce disk space costs, compressing all data using PAGE compression may be the best choice.

Compression can give many-fold speedups if it causes your working set of frequently touched pages to be cached in the main memory buffer pool, when it does not otherwise fit in memory. Preliminary performance results on one large-scale internal DW query performance benchmark used to test SQL Server 2008 show a 58% disk savings, an average 15% reduction in query runtime, and an average 20% increase in CPU cost. Some queries speeded up by a factor of up to seven. Your results depend on your workload, database, and hardware.
The commands to compress data are exposed as options in CREATE/ALTER DDL statements and support both ONLINE and OFFLINE mode. Additionally, a stored procedure is provided to help you estimate the space savings prior to actual compression.
Backup Compression
Backup compression helps you to save in multiple ways.
By reducing the size of your SQL backups, you save significantly on disk media for your SQL backups. While all compression results depend on the nature of the data being compressed, results of 50% are not uncommon, and greater compression is possible. This enables you to use less storage for keeping your backups online, or to keep more cycles of backups online using the same storage.
Backup compression also saves you time. Traditional SQL backups are almost entirely limited by I/O performance. By reducing the I/O load of the backup process, we actually speed up both backups and restores.
Of course, nothing is entirely free, and this reduction in space and time comes at the expense of using CPU cycles. The good news here is that the savings in I/O time offsets the increased use of CPU time, and you can control how much CPU is used by your backups at the expense of the rest of your workload by taking advantage of the Resource Governor.
URL:http://msdn.microsoft.com/en-us/library/cc278097.aspx
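(The column-prefix and page-dictionary techniques described in the excerpt above can be sketched in a few lines of Python. This is a toy model of the idea only, not SQL Server's actual byte-level implementation, and the sample data is made up.)

```python
import os

def common_prefix(values):
    """Longest prefix shared by all values of a column (column-prefix step)."""
    return os.path.commonprefix(values)

# Column prefix: the shared prefix is stored once as an "anchor";
# rows keep only the differing tails.
column = ["2008-10-26", "2008-10-27", "2008-10-29"]
anchor = common_prefix(column)
suffixes = [v[len(anchor):] for v in column]
assert anchor == "2008-10-2"
assert suffixes == ["6", "7", "9"]

# Page-level dictionary: values repeated across rows and columns are
# stored once; cells are rewritten as references into the dictionary.
page = [("red", "large"), ("red", "small"), ("blue", "large")]
dictionary = sorted({v for row in page for v in row})
encoded = [tuple(dictionary.index(v) for v in row) for row in page]
assert dictionary == ["blue", "large", "red", "small"]
assert encoded == [(2, 1), (2, 3), (0, 1)]
```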
小波 顾 <guxiaobo1982@hotmail.com> writes:
[ snip a lot of marketing for SQL Server ]
I think the part of this you need to pay attention to is
Of course, nothing is entirely free, and this reduction in space and
time comes at the expense of using CPU cycles.
We already have the portions of this behavior that seem to me to be
likely to be worthwhile (such as NULL elimination and compression of
large field values). Shaving a couple bytes from a bigint doesn't
strike me as interesting.
(Note: you could build a user-defined type that involved a one-byte
length indicator followed by however many bytes of the bigint you
needed. So someone who thought this might be worthwhile could do it
for themselves. I don't see it being a win, though.)
regards, tom lane
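(Tom's suggested encoding — a one-byte length indicator followed by only the bytes the bigint needs — looks roughly like this. A Python sketch for non-negative values only; a real user-defined type would be written in C against the Postgres internals.)

```python
def pack_bigint(n: int) -> bytes:
    """One-byte length header, then only the bytes the value needs."""
    body = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
    return bytes([len(body)]) + body

def unpack_bigint(data: bytes) -> int:
    length = data[0]
    return int.from_bytes(data[1:1 + length], "big")

assert len(pack_bigint(42)) == 2       # 1 length byte + 1 data byte
assert len(pack_bigint(2**62)) == 9    # no win for genuinely big values
assert unpack_bigint(pack_bigint(123456789)) == 123456789
```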
I can imagine my big stats tables, with 300-400M rows, all bigints, that
mostly require that sort of length. Gain: none; hassle: 100%.
小波 顾 wrote:
Data Compression MSSQL 2008 technotes ..... Your results depend on
your workload, database, and hardware....
Sounds cool, but I wonder what the real-world results are.
For I/O-bound systems, lots of pluses,
but for CPU-bound workloads it would suck.
On Wed, Oct 29, 2008 at 10:09 AM, 小波 顾 <guxiaobo1982@hotmail.com> wrote:
Data Compression
The new data compression feature in SQL Server 2008 reduces the size of
tables, indexes or a subset of their partitions by storing fixed-length data
types in variable length storage format and by reducing the redundant data.
The space savings achieved depends on the schema and the data distribution.
Based on our testing with various data warehouse databases, we have seen a
reduction in the size of real user databases up to 87% (a 7 to 1 compression
ratio) but more commonly you should expect a reduction in the range of
50-70% (a compression ratio between roughly 2 to 1 and 3 to 1).
I'm sure this makes for a nice brochure or PowerPoint presentation,
but in the real world I can't imagine putting that much effort into it
when compressed file systems seem the place to be doing this.
Tom Lane wrote:
小波 顾 <guxiaobo1982@hotmail.com> writes:
[ snip a lot of marketing for SQL Server ]
I think the part of this you need to pay attention to is
Of course, nothing is entirely free, and this reduction in space and
time comes at the expense of using CPU cycles.

We already have the portions of this behavior that seem to me to be
likely to be worthwhile (such as NULL elimination and compression of
large field values). Shaving a couple bytes from a bigint doesn't
strike me as interesting.
Think about it on a fact table for a warehouse. A few bytes per bigint
multiplied by several billions/trillions of bigints (not an exaggeration
in a DW) and you're talking some significant storage saving on the main
storage hog in a DW. Not to mention the performance _improvements_ you
can get, even with some CPU overhead for dynamic decompression, if the
planner/optimiser understands how to work with the compression index/map
to perform things like range/partition elimination etc. Admittedly this
depends heavily on the storage mechanics and optimisation techniques of
the DB, but there is value to be had there ... IBM is seeing typical
storage savings in the 40-60% range, mostly based on boring,
bog-standard int, char and varchar data.
The IDUG (so DB2 users themselves, not IBM's marketing) had a
competition to see what was happening in the real world, take a look if
interested: http://www.idug.org/wps/portal/idug/compressionchallenge
Other big benefits come with XML ... but that is even more dependent on
the starting point. Oracle and SQL Server will see big benefits in
compression with this, because their XML technology is so
mind-bogglingly broken in the first place.
So there's certainly utility in this kind of feature ... but whether it
rates above some of the other great stuff in the PostgreSQL pipeline is
questionable.
Ciao
Fuzzy
:-)
------------------------------------------------
Dazed and confused about technology for 20 years
http://fuzzydata.wordpress.com/
Grant Allen wrote:
...warehouse...DB2...IBM is seeing typical
storage savings in the 40-60% range
Sounds about the same as what compressing file systems claim:
http://opensolaris.org/os/community/zfs/whatis/
"ZFS provides built-in compression. In addition to
reducing space usage by 2-3x, compression also reduces
the amount of I/O by 2-3x. For this reason, enabling
compression actually makes some workloads go faster."
I do note that Netezza got a lot of PR around their
compression release; claiming it doubled performance.
Wonder if they added that at the file system or higher
in the DB.
Ron Mayer wrote:
Grant Allen wrote:
...warehouse...DB2...IBM is seeing typical storage savings in the
40-60% range
Sounds about the same as what compressing file systems claim:
http://opensolaris.org/os/community/zfs/whatis/
"ZFS provides built-in compression. In addition to
reducing space usage by 2-3x, compression also reduces
the amount of I/O by 2-3x. For this reason, enabling
compression actually makes some workloads go faster."

I do note that Netezza got a lot of PR around their
compression release; claiming it doubled performance.
Wonder if they added that at the file system or higher
in the DB.
I just so happen to have access to a Netezza system :-) I'll see if I
can find out.
One other thing I forgot to mention: Compression by the DB trumps
filesystem compression in one very important area - shared_buffers! (or
buffer_cache, bufferpool or whatever your favourite DB calls its working
memory for caching data). Because the data stays compressed in the
block/page when cached by the database in one of its buffers, you get
more bang for your memory buck in many circumstances! Just another angle
to contemplate :-)
Ciao
Fuzzy
:-)
------------------------------------------------
Dazed and confused about technology for 20 years
http://fuzzydata.wordpress.com/
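(A back-of-the-envelope illustration of Grant's buffer-pool point, with made-up numbers: an 8 GB cache and the 40-60% savings range mentioned above, taken here as roughly a 2.5x ratio.)

```python
# Hypothetical figures, chosen only to illustrate the argument.
shared_buffers_gb = 8
compression_ratio = 2.5

# If pages stay compressed in the buffer pool, the same memory holds
# compression_ratio times as much logical data.
effective_cache_gb = shared_buffers_gb * compression_ratio
assert effective_cache_gb == 20.0
```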
On Oct 29, 2008, at 9:50 PM, Grant Allen wrote:
One other thing I forgot to mention: Compression by the DB trumps
filesystem compression in one very important area - shared_buffers!
(or buffer_cache, bufferpool or whatever your favourite DB calls its
working memory for caching data). Because the data stays compressed
in the block/page when cached by the database in one of its buffers,
you get more bang for your memory buck in many circumstances! Just
another angle to contemplate :-)
The additional latency added by decompression is reasonably small
compared with traditional disk access time. It's rather large compared
to memory access time.
Cheers,
Steve
Steve Atkins wrote:
On Oct 29, 2008, at 9:50 PM, Grant Allen wrote:
One other thing I forgot to mention: Compression by the DB trumps
filesystem compression in one very important area - shared_buffers!
(or buffer_cache, bufferpool or whatever your favourite DB calls its
working memory for caching data). Because the data stays compressed
in the block/page when cached by the database in one of its buffers,
you get more bang for your memory buck in many circumstances! Just
another angle to contemplate :-)

The additional latency added by decompression is reasonably small
compared with traditional disk access time. It's rather large compared
to memory access time.
The one place where compression is an immediate benefit is the wire. It
is easy to forget that one of our number one bottlenecks (even at
gigabit) is the amount of data we are pushing over the wire.
Joshua D. Drake
Cheers,
Steve
On Oct 29, 2008, at 10:43 PM, Joshua D. Drake wrote:
Steve Atkins wrote:
On Oct 29, 2008, at 9:50 PM, Grant Allen wrote:
One other thing I forgot to mention: Compression by the DB trumps
filesystem compression in one very important area -
shared_buffers! (or buffer_cache, bufferpool or whatever your
favourite DB calls its working memory for caching data). Because
the data stays compressed in the block/page when cached by the
database in one of its buffers, you get more bang for your memory
buck in many circumstances! Just another angle to contemplate :-)

The additional latency added by decompression is reasonably small
compared with traditional disk access time. It's rather large
compared to memory access time.

The one place where compression is an immediate benefit is the wire.
It is easy to forget that one of our number one bottlenecks (even at
gigabit) is the amount of data we are pushing over the wire.
Wouldn't "ssl_ciphers=NULL-MD5" or somesuch give zlib compression over
the wire?
Cheers,
Steve
Steve Atkins wrote:
The one place where Compression is an immediate benefit is the wire.
It is easy to forget that one of our number one bottlenecks (even at
gigabit) is the amount of data we are pushing over the wire.

Wouldn't "ssl_ciphers=NULL-MD5" or somesuch give zlib compression over
the wire?
I don't think so.
Joshua D. Drake
Cheers,
Steve