Is there a plan for LZMA compression in pg_dump?

Started by Stanislav Lacko almost 17 years ago (21 messages)
#1 Stanislav Lacko
lacko@spacesystems.sk

Hi.

Is LZMA compression for pg_dump backups on the TODO list, or otherwise
planned?

Thanks

Stano
--
------------------------------------------------------------------------

Space Systems

*Mgr. Stano LACKO*

mobil: +421 908 175 753

fax.: +421 2 555 72 676

e-mail: lacko@spacesystems.sk <mailto:lacko@spacesystems.sk>

*Space Systems, s.r.o.*

Zámocká 30

811 01 Bratislava

www.spacesystems.sk <http://www.spacesystems.sk/>

#2 Bruce Momjian
bruce@momjian.us
In reply to: Stanislav Lacko (#1)
Re: Is there a plan for LZMA compression in pg_dump?

Stanislav Lacko wrote:

Hi.

Is LZMA compression for pg_dump backups on the TODO list, or otherwise
planned?

Nope, never heard anything about it.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#3 Dann Corbit
DCorbit@connx.com
In reply to: Bruce Momjian (#2)
Re: Is there a plan for LZMA compression in pg_dump?

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Bruce Momjian
Sent: Wednesday, February 04, 2009 3:28 PM
To: Stanislav Lacko
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Is there a plan for LZMA compression in pg_dump?

Stanislav Lacko wrote:

Hi.

Is LZMA compression for pg_dump backups on the TODO list, or otherwise
planned?

Nope, never heard anything about it.

In case the PG group does become interested in adding compression
algorithms to PostgreSQL (it seems they could be useful in many
different areas), the 7z format seems to be excellent in a number of
ways.

Here is an interesting benchmark showing the 7z format winning a large
area of the "optimal compressors" performance graph:
http://users.elis.ugent.be/~wheirman/compression/

The LZMA SDK has been placed in the public domain:
http://www.7-zip.org/sdk.html

Unfortunately, LZOP (which wins the top half of the "optimal compressors"
graph, where compression and decompression speed matter more than the
amount of compression) does not have a liberal license.
http://www.lzop.org/

#4 Andrew Chernow
ac@esilo.com
In reply to: Dann Corbit (#3)
Re: Is there a plan for LZMA compression in pg_dump?

Dann Corbit wrote:

The LZMA SDK has been placed in the public domain:
http://www.7-zip.org/sdk.html

I played with this but found the SDK extremely confusing and flat-out horrible.
One personal dislike was the unnecessary use of C++, although it was the
horrible API that turned me off. I'm not even sure I ever got a test program
working.

LZO (http://www.oberhumer.com/opensource/lzo/) is a great algorithm with an easy
API and many variants; my favorite is LZO1X-1(15). It's known for its
compression and decompression speeds ... it's blazing fast. zlib typically
gets 5-8% more compression.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/

#5 daveg
daveg@sonic.net
In reply to: Andrew Chernow (#4)
Re: Is there a plan for LZMA compression in pg_dump?

On Wed, Feb 04, 2009 at 10:23:17PM -0500, Andrew Chernow wrote:

Dann Corbit wrote:

The LZMA SDK has been placed in the public domain:
http://www.7-zip.org/sdk.html

I played with this but found the SDK extremely confusing and flat-out
horrible. One personal dislike was the unnecessary use of C++, although it
was the horrible API that turned me off. I'm not even sure I ever got a
test program working.

LZO (http://www.oberhumer.com/opensource/lzo/) is a great algorithm with an
easy API and many variants; my favorite is LZO1X-1(15). It's known for its
compression and decompression speeds ... it's blazing fast. zlib typically
gets 5-8% more compression.

LZO rocks. I wonder whether the LZO developer would consider a license
exception so that PostgreSQL could use it. What would we need?

-dg

--
David Gould daveg@sonic.net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.

#6 Andrew Dunstan
andrew@dunslane.net
In reply to: daveg (#5)
Re: Is there a plan for LZMA compression in pg_dump?

daveg wrote:

LZO rocks. I wonder whether the LZO developer would consider a license
exception so that PostgreSQL could use it. What would we need?

Probably a BSD license, or a clean-room implementation that we could
license under BSD.

cheers

andrew

#7 Bruce Momjian
bruce@momjian.us
In reply to: daveg (#5)
Re: Is there a plan for LZMA compression in pg_dump?

daveg wrote:

LZO rocks. I wonder whether the LZO developer would consider a license
exception so that PostgreSQL could use it. What would we need?

The chance of us using anything but zlib is near zero, so please do
not pursue this; this discussion comes up much too often.


#8 daveg
daveg@sonic.net
In reply to: Bruce Momjian (#7)
Re: Is there a plan for LZMA compression in pg_dump?

On Sat, Feb 07, 2009 at 02:47:05PM -0500, Bruce Momjian wrote:

The chance of us using anything but zlib is near zero, so please do
not pursue this; this discussion comes up much too often.

That this comes up "much too often" suggests that there is more than
near-zero interest. Why can only one compression library be considered?
We use multiple readline implementations, for better or worse.

I think the context here is pg_dump only, and in that context a faster
compression library makes a lot of sense. I'd be happy to prepare a patch
if the license issue can be accommodated. Hence my question: what sort of
license accommodation would we need to be able to use this library?

-dg


#9 Grzegorz Jaskiewicz
gj@pointblue.com.pl
In reply to: daveg (#8)
Re: Is there a plan for LZMA compression in pg_dump?

On 7 Feb 2009, at 21:08, daveg wrote:

That this comes up "much too often" suggests that there is more than
near-zero interest. Why can only one compression library be considered?
We use multiple readline implementations, for better or worse.

I don't see anything wrong with using standard Unix pipes... and doing
it in a truly Unix, scalable way!

#10 Robert Haas
robertmhaas@gmail.com
In reply to: daveg (#8)
Re: Is there a plan for LZMA compression in pg_dump?

That this comes up "much too often" suggests that there is more than
near-zero interest. Why can only one compression library be considered?
We use multiple readline implementations, for better or worse.

I think the context here is pg_dump only, and in that context a faster
compression library makes a lot of sense. I'd be happy to prepare a patch
if the license issue can be accommodated. Hence my question: what sort of
license accommodation would we need to be able to use this library?

Based on previous discussions, I suspect that the answer here is
"complete relicensing as BSD". I think pursuing any sort of licensing
exception is completely futile as there will still be restrictions
that will be unacceptable to many in the community.

But if someone had an actual BSD-licensed compression library that was
better than what we have now, I'm not sure why Bruce (or anyone)
should be opposed to incorporating it. It's just that all of the
proposals that come up for this sort of thing aren't that.

...Robert

#11 Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#10)
Re: Is there a plan for LZMA compression in pg_dump?

Robert Haas wrote:

Based on previous discussions, I suspect that the answer here is
"complete relicensing as BSD". I think pursuing any sort of licensing
exception is completely futile as there will still be restrictions
that will be unacceptable to many in the community.

But if someone had an actual BSD-licensed compression library that was
better than what we have now, I'm not sure why Bruce (or anyone)
should be opposed to incorporating it. It's just that all of the
proposals that come up for this sort of thing aren't that.

You can bet I would oppose it. It is not efficient for us to support
every compression-of-the-month project that comes along. If something
were BSD-licensed, well tested, and clearly superior, we might consider it,
but I have seen nothing like that in 10 years and I doubt I will see
anything in the next 5. I am thinking we need to add this to the
"Features we do not want" section of our todo list.


#12 Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#11)
Re: Is there a plan for LZMA compression in pg_dump?

On Feb 7, 2009, at 4:53 PM, Bruce Momjian <bruce@momjian.us> wrote:

You can bet I would oppose it. It is not efficient for us to support
every compression-of-the-month project that comes along. If something
were BSD-licensed, well tested, and clearly superior, we might consider it,
but I

Well, that's pretty much what I said.

have seen nothing like that in 10 years and I doubt I will see
anything in the next 5. I am thinking

I am doubtful too.

we need to add this to the
"Features we do not want" section of our todo list.

"Proprietary compression algorithms, even with PostgreSQL-specific
license exceptions"?

...Robert

#13 Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#12)
Re: Is there a plan for LZMA compression in pg_dump?

Robert Haas wrote:

have seen nothing like that in 10 years and I doubt I will see
anything in the next 5. I am thinking

I am doubtful too.

we need to add this to the
"Features we do not want" section of our todo list.

"Proprietary compression algorithms, even with PostgreSQL-specific
license exceptions"?

Yep. Does it make sense to make our license more complex to get 1%
better compression in certain cases? Probably not. Also consider the
code maintenance, patents, larger tarball, bugs, etc.


#14 David Fetter
david@fetter.org
In reply to: Robert Haas (#12)
Re: Is there a plan for LZMA compression in pg_dump?

On Sat, Feb 07, 2009 at 08:49:29PM -0500, Robert Haas wrote:

On Feb 7, 2009, at 4:53 PM, Bruce Momjian <bruce@momjian.us> wrote:

we need to add this to the "Features we do not want" section of our
todo list.

"Proprietary compression algorithms, even with PostgreSQL-specific
license exceptions"?

Considering that the entire project ships with a BSD license, which
very specifically allows use of all of it or any tiny part of it with
(skipping some legalese) two restrictions: mention PGDG in the
copyright list, and don't sue us no matter what happens, any
"PostgreSQL-specific license exceptions" are equivalent to "that
algorithm is no longer proprietary" because any project could simply
use PostgreSQL's version and have done.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#15 daveg
daveg@sonic.net
In reply to: Robert Haas (#12)
Re: Is there a plan for LZMA compression in pg_dump?

On Sat, Feb 07, 2009 at 08:49:29PM -0500, Robert Haas wrote:

"Proprietary compression algorithms, even with PostgreSQL-specific
license exceptions"?

To be fair, lzo is GPL, which is a stretch to call proprietary.

-dg


#16 Martijn van Oosterhout
kleptog@svana.org
In reply to: David Fetter (#14)
Re: Is there a plan for LZMA compression in pg_dump?

On Sat, Feb 07, 2009 at 08:31:23PM -0800, David Fetter wrote:

Considering that the entire project ships with a BSD license, which
very specifically allows use of all of it or any tiny part of it with
(skipping some legalese) two restrictions: mention PGDG in the
copyright list, and don't sue us no matter what happens, any
"PostgreSQL-specific license exceptions" are equivalent to "that
algorithm is no longer proprietary" because any project could simply
use PostgreSQL's version and have done.

Why don't we just add a --use-compress-program option to pg_dump, just
like tar's? Then people can use their "compression algorithm of the
week" and we don't need to care about the license or anything.

It's not like the case of TOAST, where compression actually needs to be
built in. Tar doesn't have any compression built in, yet you don't see
many uncompressed tar files...
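Martijn's tar-style option can be sketched in Python to show the shape of the idea. This is a hypothetical helper, not a pg_dump feature: `compress_via_program` and `ZLIB_CHILD` are names invented for illustration, the whole dump is assumed to fit in memory, and a Python one-liner stands in for whatever "compression algorithm of the week" a user would name.

```python
import subprocess
import sys
import zlib
from typing import List


def compress_via_program(data: bytes, argv: List[str]) -> bytes:
    """Pipe `data` through an external compressor and return what it writes.

    `argv` is the program plus its arguments, in the spirit of tar's
    --use-compress-program (the pg_dump option itself is hypothetical).
    """
    result = subprocess.run(argv, input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout


# Stand-in "compression program": a Python child process that zlib-compresses
# its stdin to stdout, so the sketch needs no extra binaries installed.
ZLIB_CHILD = [
    sys.executable,
    "-c",
    "import sys, zlib; "
    "sys.stdout.buffer.write(zlib.compress(sys.stdin.buffer.read()))",
]

if __name__ == "__main__":
    dump = b"COPY t (id) FROM stdin;\n" + b"".join(b"%d\n" % i for i in range(1000))
    packed = compress_via_program(dump, ZLIB_CHILD)
    assert zlib.decompress(packed) == dump
    print(len(dump), "->", len(packed), "bytes")
```

A real implementation would stream instead of buffering, and the external program would only ever see the archive as one opaque byte stream, which is the limitation Andrew Dunstan raises later in the thread.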

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/


Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#17 Greg Stark
greg.stark@enterprisedb.com
In reply to: Robert Haas (#12)
Re: Is there a plan for LZMA compression in pg_dump?

On 8 Feb 2009, at 02:49, Robert Haas <robertmhaas@gmail.com> wrote:

On Feb 7, 2009, at 4:53 PM, Bruce Momjian <bruce@momjian.us> wrote:

we need to add this to the
"Features we do not want" section of our todo list.

"Proprietary compression algorithms, even with PostgreSQL-specific
license exceptions"?

Now that I would agree with. We would have to explain that we're
BSD-licensed *because* we want people to be able to reuse our code
outside Postgres, including in commercial projects.

#18 Andrew Chernow
ac@esilo.com
In reply to: Martijn van Oosterhout (#16)
Re: Is there a plan for LZMA compression in pg_dump?

Why don't we just add a --use-compress-program option to pg_dump, just
like tar's? Then people can use their "compression algorithm of the
week" and we don't need to care about the license or anything.

Can't this be done already?

pg_dump -Z 0 | compression_binary >mydump

If -Z is unspecified, I think it won't compress? Maybe you can just drop the -Z.
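Any program that reads stdin and writes stdout can take the place of `compression_binary` in the pipeline. As a minimal sketch, assuming Python 3 and a plain-format dump, here is a hypothetical `xz_filter.py` that streams its input through the standard-library `lzma` module in fixed-size chunks, so memory use does not grow with the size of the dump.

```python
import lzma
import sys
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # read 64 KiB at a time so memory use stays bounded


def xz_filter(src: BinaryIO, dst: BinaryIO, preset: int = 6) -> None:
    """Stream everything from `src` into an .xz container written to `dst`."""
    compressor = lzma.LZMACompressor(format=lzma.FORMAT_XZ, preset=preset)
    while True:
        chunk = src.read(CHUNK_SIZE)
        if not chunk:
            break
        dst.write(compressor.compress(chunk))
    dst.write(compressor.flush())  # emit any buffered data plus the stream footer


def main() -> None:
    # Filter mode: plain-format dump on stdin, compressed bytes on stdout.
    xz_filter(sys.stdin.buffer, sys.stdout.buffer)
    sys.stdout.buffer.flush()
```

Saved as `xz_filter.py`, it could be used as `pg_dump mydb | python3 -c 'import xz_filter; xz_filter.main()' > mydump.sql.xz`, where `mydb` and the file names are placeholders. The stock `xz` program does the same job; the sketch only shows how little a filter needs.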


#19 Andrew Dunstan
andrew@dunslane.net
In reply to: Martijn van Oosterhout (#16)
Re: Is there a plan for LZMA compression in pg_dump?

Martijn van Oosterhout wrote:

Why don't we just add a --use-compress-program option to pg_dump, just
like tar's? Then people can use their "compression algorithm of the
week" and we don't need to care about the license or anything.

It's not like the case of TOAST, where compression actually needs to be
built in. Tar doesn't have any compression built in, yet you don't see
many uncompressed tar files...

tar compresses/decompresses the whole archive via a single pipe. pg_dump
compresses individual data members. If the compression isn't built in, it
will make life much more difficult, and will probably make parallel restore,
as well as some other operations, well-nigh impossible.
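The difference Andrew describes can be illustrated with a toy archive. This sketch is not pg_dump's custom-format layout: `pack_members`, `read_member`, and the in-memory offset index are hypothetical, and zlib is used simply because it is already pg_dump's compressor. Each member is compressed as its own complete stream, so any one of them can be decompressed without reading the others.

```python
import zlib
from typing import Dict, Tuple

Index = Dict[str, Tuple[int, int]]  # member name -> (offset, compressed length)


def pack_members(members: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Compress every member on its own and record where each one lands."""
    archive = bytearray()
    index: Index = {}
    for name, payload in members.items():
        blob = zlib.compress(payload)
        index[name] = (len(archive), len(blob))
        archive += blob
    return bytes(archive), index


def read_member(archive: bytes, index: Index, name: str) -> bytes:
    """Decompress one member; the bytes of the other members are never read."""
    offset, length = index[name]
    return zlib.decompress(archive[offset:offset + length])
```

For example, after `archive, index = pack_members({"public.orders": orders_rows, "public.customers": customer_rows})` (placeholder table data), one restore worker could call `read_member(archive, index, "public.customers")` while another handles `public.orders`. With a single compressor over the whole file, as with tar piped through gzip, there is no such entry point; reaching a member means decompressing everything before it.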

cheers

andrew

#20 Peter Eisentraut
peter_e@gmx.net
In reply to: daveg (#8)
Re: Is there a plan for LZMA compression in pg_dump?

daveg wrote:

I think the context here is pg_dump only, and in that context a faster
compression library makes a lot of sense. I'd be happy to prepare a patch
if the license issue can be accommodated.

Some kind of performance data (space and time) would be required to
support any change in this area.

Notice that the thread originally called for lzma support, which is
completely at the opposite end of the spectrum of compression algorithms
in terms of space and time, compared to lzo. So it's not really clear
what the requirements are in the first place.
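As a starting point for the space-and-time data Peter asks for, a small harness like the following could be run over a real dump file. It is a rough sketch, assuming Python 3.7+: the codec list and levels are illustrative choices, LZO is omitted because it is not in the Python standard library, and the synthetic sample in `__main__` is no substitute for real data.

```python
import lzma
import time
import zlib
from typing import Callable, Dict, List, Tuple

Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]

# zlib at its fastest, default and maximum levels, plus LZMA at the xz default.
CODECS: Dict[str, Codec] = {
    "zlib-1": (lambda d: zlib.compress(d, 1), zlib.decompress),
    "zlib-6": (lambda d: zlib.compress(d, 6), zlib.decompress),
    "zlib-9": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "lzma-6": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}


def benchmark(data: bytes) -> List[dict]:
    """Return size ratio and wall-clock times for each codec on `data`."""
    rows = []
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        restored = decompress(packed)
        t2 = time.perf_counter()
        if restored != data:
            raise RuntimeError(f"{name} did not round-trip")
        rows.append({
            "codec": name,
            "ratio": len(data) / len(packed),
            "compress_s": t1 - t0,
            "decompress_s": t2 - t1,
        })
    return rows


if __name__ == "__main__":
    # Synthetic tab-separated rows; an actual pg_dump output file is a better sample.
    sample = b"".join(b"%d\tcustomer_%d\t2009-02-07\n" % (i, i % 97)
                      for i in range(50000))
    for row in benchmark(sample):
        print(f"{row['codec']:8} ratio={row['ratio']:6.2f} "
              f"compress={row['compress_s'] * 1000:8.1f} ms "
              f"decompress={row['decompress_s'] * 1000:8.1f} ms")
```

Single runs are noisy; repeating each measurement and taking the median would give steadier numbers.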

#21 Andrew Chernow
ac@esilo.com
In reply to: Peter Eisentraut (#20)
Re: Is there a plan for LZMA compression in pg_dump?

Peter Eisentraut wrote:

Notice that the thread originally called for lzma support, which is
completely at the opposite end of the spectrum of compression algorithms
in terms of space and time, compared to lzo. So it's not really clear
what the requirements are in the first place.

Instead of trying to figure out the needs/wants of a DBA and building a
general-purpose solution, it might be better to figure out how to make the
compression choice user-driven. Maybe the requirement should be to make this
the user's decision; piping the output to the compressor of choice seems to
be the simplest approach.

There are cases where the highest compression is desired even if it takes
forever, and cases where just the opposite is true. I'm not sure why this has
to be built in or why it must use zlib, other than that this is the current
method.
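The user-driven choice Andrew describes could also live inside the tool as a small lookup table. A hedged sketch: the choice names `none`, `fast`, and `small`, and the codec behind each, are hypothetical; zlib and LZMA appear only because Python ships both. Unknown names fail loudly rather than silently falling back to a default.

```python
import lzma
import zlib
from typing import Callable, Dict, Tuple

Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]

# Hypothetical user-facing names: the user, not the tool, picks the trade-off.
CHOICES: Dict[str, Codec] = {
    "none": (lambda d: d, lambda d: d),                                 # no compression
    "fast": (lambda d: zlib.compress(d, 1), zlib.decompress),           # speed first
    "small": (lambda d: lzma.compress(d, preset=6), lzma.decompress),   # size first
}


def select_codec(choice: str) -> Codec:
    """Look up the user's choice, rejecting names that are not in the table."""
    try:
        return CHOICES[choice]
    except KeyError:
        raise ValueError(
            f"unknown compression {choice!r}; expected one of {sorted(CHOICES)}"
        ) from None
```

Adding another codec is one more table entry, although, as Bruce points out, each entry is also code to maintain.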
