Possible TODO item: copy to/from pipe

Started by Tom Lanealmost 20 years ago36 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

After re-reading what I just wrote to Andreas about how compression of
COPY data would be better done outside the backend than inside, it
struck me that we are missing a feature that's fairly common in Unix
programs. Perhaps COPY ought to have the ability to pipe its output
to a shell command, or read input from a shell command. Maybe something
like

COPY mytable TO '| gzip >/home/tgl/mytable.dump.gz';

(I'm not wedded to the above syntax, it's just an off-the-cuff thought.)

Of course psql would need the same capability, since the server-side
copy would still be restricted to superusers.

You can accomplish COPY piping now through psql, but it's a bit awkward:

psql -c "COPY mytable TO stdout" mydb | gzip ...

Thoughts? Is this worth doing, or is the psql -c approach good enough?

regards, tom lane

#2Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#1)
Re: Possible TODO item: copy to/from pipe

Tom Lane wrote:

You can accomplish COPY piping now through psql, but it's a bit awkward:

psql -c "COPY mytable TO stdout" mydb | gzip ...

Thoughts? Is this worth doing, or is the psql -c approach good enough?

I think it's good enough. And there is also

pg_dump -F c -t bigtable -f bigtable.dump

cheers

andrew

#3David Fetter
david@fetter.org
In reply to: Tom Lane (#1)
Re: Possible TODO item: copy to/from pipe

On Wed, May 31, 2006 at 11:03:14AM -0400, Tom Lane wrote:

After re-reading what I just wrote to Andreas about how compression
of COPY data would be better done outside the backend than inside,
it struck me that we are missing a feature that's fairly common in
Unix programs. Perhaps COPY ought to have the ability to pipe its
output to a shell command, or read input from a shell command.
Maybe something like

COPY mytable TO '| gzip >/home/tgl/mytable.dump.gz';

That's a great syntax :)

Similarly,

COPY mytable FROM 'create_sample_data --table mytable --rows 10000000 |';

would be cool.

(I'm not wedded to the above syntax, it's just an off-the-cuff
thought.)

It will be familiar to Perl users, for better or worse. Come to that,
should the prefixes > and >> also mean their corresponding shell
things?

Of course psql would need the same capability, since the server-side
copy would still be restricted to superusers.

Roight.

You can accomplish COPY piping now through psql, but it's a bit awkward:

psql -c "COPY mytable TO stdout" mydb | gzip ...

Thoughts? Is this worth doing, or is the psql -c approach good enough?

I think it's worth doing :)

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!

#4Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Tom Lane (#1)
Re: Possible TODO item: copy to/from pipe

Tom Lane wrote:

After re-reading what I just wrote to Andreas about how compression of
COPY data would be better done outside the backend than inside, it
struck me that we are missing a feature that's fairly common in Unix
programs. Perhaps COPY ought to have the ability to pipe its output
to a shell command, or read input from a shell command. Maybe something
like

COPY mytable TO '| gzip >/home/tgl/mytable.dump.gz';

(I'm not wedded to the above syntax, it's just an off-the-cuff thought.)

Of course psql would need the same capability, since the server-side
copy would still be restricted to superusers.

Won't help too much, until gzip's output is piped back too, so a
replacement for COPY .. TO STDOUT COMPRESSED would be
COPY ... TO '| /bin/gzip |' STDOUT, to enable clients to receive the
reduced stuff. But clients should be agnostic of server side installed
tools, and probably not be able to address them directly. Sounds like a
potential security issue.

Regards,
Andreas

#5Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Andreas Pflug (#4)
Re: Possible TODO item: copy to/from pipe

Andreas Pflug wrote:

Won't help too much, until gzip's output is piped back too, so a
replacement for COPY .. TO STDOUT COMPRESSED would be
COPY ... TO '| /bin/gzip |' STDOUT, to enable clients to receive the
reduced stuff.

Forgot to mention:
COPY COMPRESSED was also meant to introduce a portable format that's
efficient for both text and binary data. Relying on some external XYZzip
version seems not too portable to me.

Regards,
Andreas

#6Dave Page
dpage@pgadmin.org
In reply to: Andreas Pflug (#5)
Re: Possible TODO item: copy to/from pipe

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Andreas Pflug
Sent: 31 May 2006 16:41
Cc: Tom Lane; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Possible TODO item: copy to/from pipe

Andreas Pflug wrote:

Won't help too much, until gzip's output is piped back too, so a
replacement for COPY .. TO STDOUT COMPRESSED would be
COPY ... TO '| /bin/gzip |' STDOUT, to enable clients to

receive the

reduced stuff.

Forgot to mention:
COPY COMPRESSED was also meant to introduce a portable format that's
efficient for both text and binary data. Relying on some
external XYZzip
version seems not too portable to me.

It does have that advantage. Gzip and others are not particularly
Windows friendly for example.

Regards, Dave.

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Pflug (#5)
Re: Possible TODO item: copy to/from pipe

Andreas Pflug <pgadmin@pse-consulting.de> writes:

Forgot to mention:
COPY COMPRESSED was also meant to introduce a portable format that's
efficient for both text and binary data. Relying on some external XYZzip
version seems not too portable to me.

I dislike putting this into the backend precisely because it's trying to
impose a one-size-fits-all compression solution. Someone might wish to
use bzip2 instead of gzip, for instance, or tweak the compression level
options of gzip. It's trivial for the user to do that if the
compression program is separate, not trivial at all if it's wired into
COPY. Also, a pipe feature would have uses unrelated to compression,
such as on-the-fly analysis or generation of data.

regards, tom lane

#8Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Dave Page (#6)
Re: Possible TODO item: copy to/from pipe

Dave Page wrote:

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Andreas Pflug
Sent: 31 May 2006 16:41
Cc: Tom Lane; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Possible TODO item: copy to/from pipe

Andreas Pflug wrote:

Won't help too much, until gzip's output is piped back too, so a
replacement for COPY .. TO STDOUT COMPRESSED would be
COPY ... TO '| /bin/gzip |' STDOUT, to enable clients to

receive the

reduced stuff.

Forgot to mention:
COPY COMPRESSED was also meant to introduce a portable format that's
efficient for both text and binary data. Relying on some
external XYZzip
version seems not too portable to me.

It does have that advantage. Gzip and others are not particularly
Windows friendly for example.

... as most windows programs are pipe agnostic.

Regards,
Andreas

#9Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#7)
Re: Possible TODO item: copy to/from pipe

I dislike putting this into the backend precisely because it's trying to
impose a one-size-fits-all compression solution. Someone might wish to
use bzip2 instead of gzip, for instance, or tweak the compression level
options of gzip. It's trivial for the user to do that if the
compression program is separate, not trivial at all if it's wired into
COPY. Also, a pipe feature would have uses unrelated to compression,
such as on-the-fly analysis or generation of data.

It seems that it would be better to have the options within pg_dump
which would give the most flexibility.

Sincerely,

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

#10Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Joshua D. Drake (#9)
Re: Possible TODO item: copy to/from pipe

Joshua D. Drake wrote:

I dislike putting this into the backend precisely because it's trying to
impose a one-size-fits-all compression solution. Someone might wish to
use bzip2 instead of gzip, for instance, or tweak the compression level
options of gzip. It's trivial for the user to do that if the
compression program is separate, not trivial at all if it's wired into
COPY. Also, a pipe feature would have uses unrelated to compression,
such as on-the-fly analysis or generation of data.

It seems that it would be better to have the options within pg_dump
which would give the most flexibility.

What about all other client tools?

My COPY WITH COMPRESSION is not the same as taking a copy file and
zipping it; it creates a copy file with BinarySignature that has
compressed bytes in the data part, thus it can be handled by any client
app that can stream binary copy files from/to the server.

Regards,
Andreas

#11Magnus Hagander
magnus@hagander.net
In reply to: Andreas Pflug (#10)
Re: Possible TODO item: copy to/from pipe

Won't help too much, until gzip's output is piped back too, so a
replacement for COPY .. TO STDOUT COMPRESSED would be

COPY ... TO '|

/bin/gzip |' STDOUT, to enable clients to

receive the

reduced stuff.

Forgot to mention:
COPY COMPRESSED was also meant to introduce a portable

format that's

efficient for both text and binary data. Relying on some external
XYZzip version seems not too portable to me.

It does have that advantage. Gzip and others are not particularly
Windows friendly for example.

... as most windows programs are pipe agnostic.

For the record, gzip on win32 works perfectly fine both as a separate
program and running in a pipe. No problem at all. The only issue is that
it's not available by default. (And possible issues with programs
launching it that don't know how to deal with windows style directory
naming)

//Magnus

#12Chris Browne
cbbrowne@acm.org
In reply to: Dave Page (#6)
Re: Possible TODO item: copy to/from pipe

pgadmin@pse-consulting.de (Andreas Pflug) writes:

Dave Page wrote:

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Andreas
Pflug
Sent: 31 May 2006 16:41
Cc: Tom Lane; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Possible TODO item: copy to/from pipe

Andreas Pflug wrote:

Won't help too much, until gzip's output is piped back too, so a
replacement for COPY .. TO STDOUT COMPRESSED would be
COPY ... TO '| /bin/gzip |' STDOUT, to enable clients to

receive the

reduced stuff.

Forgot to mention:
COPY COMPRESSED was also meant to introduce a portable format
that's efficient for both text and binary data. Relying on some
external XYZzip version seems not too portable to me.

It does have that advantage. Gzip and others are not particularly
Windows friendly for example.

... as most windows programs are pipe agnostic.

Shall we make PostgreSQL less powerful because of that?
--
"cbbrowne","@","cbbrowne.com"
http://cbbrowne.com/info/advocacy.html
"Love is like a snowmobile flying over the frozen tundra that suddenly
flips, pinning you underneath. At night, the ice weasels come."
-- Matt Groening

#13Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Chris Browne (#12)
Re: Possible TODO item: copy to/from pipe

Chris Browne wrote:

pgadmin@pse-consulting.de (Andreas Pflug) writes:

Dave Page wrote:

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Andreas
Pflug
Sent: 31 May 2006 16:41
Cc: Tom Lane; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Possible TODO item: copy to/from pipe

Andreas Pflug wrote:

Won't help too much, until gzip's output is piped back too, so a
replacement for COPY .. TO STDOUT COMPRESSED would be
COPY ... TO '| /bin/gzip |' STDOUT, to enable clients to

receive the

reduced stuff.

Forgot to mention:
COPY COMPRESSED was also meant to introduce a portable format
that's efficient for both text and binary data. Relying on some
external XYZzip version seems not too portable to me.

It does have that advantage. Gzip and others are not particularly
Windows friendly for example.

... as most windows programs are pipe agnostic.

Shall we make PostgreSQL less powerful because of that?

I never said that. We shall seek solutions that run painless on most
popular platforms are useful to users.
I wonder if we'd be able to ship gzip with the windows installer, to
insure proper integration.

Regards,
Andreas

#14Andrew Dunstan
andrew@dunslane.net
In reply to: Andreas Pflug (#13)
Re: Possible TODO item: copy to/from pipe

Andreas Pflug wrote:

Chris Browne wrote:

pgadmin@pse-consulting.de (Andreas Pflug) writes:

Dave Page wrote:

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Andreas
Pflug
Sent: 31 May 2006 16:41
Cc: Tom Lane; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Possible TODO item: copy to/from pipe

Andreas Pflug wrote:

Won't help too much, until gzip's output is piped back too, so a
replacement for COPY .. TO STDOUT COMPRESSED would be
COPY ... TO '| /bin/gzip |' STDOUT, to enable clients to

receive the

reduced stuff.

Forgot to mention:
COPY COMPRESSED was also meant to introduce a portable format
that's efficient for both text and binary data. Relying on some
external XYZzip version seems not too portable to me.

It does have that advantage. Gzip and others are not particularly
Windows friendly for example.

... as most windows programs are pipe agnostic.

Shall we make PostgreSQL less powerful because of that?

I never said that. We shall seek solutions that run painless on most
popular platforms are useful to users.
I wonder if we'd be able to ship gzip with the windows installer, to
insure proper integration.

I wish somebody would explain to me the compelling use case for this. We
do not have to build every possible capability into Postgres. As Tony
Hoare said, "Inside every large problem is a small problem struggling to
get out."

This just seems like creeping featurism to me.

cheers

andrew

#15David Fetter
david@fetter.org
In reply to: Andrew Dunstan (#14)
Re: Possible TODO item: copy to/from pipe

On Wed, May 31, 2006 at 02:46:29PM -0400, Andrew Dunstan wrote:

I wish somebody would explain to me the compelling use case for
this.

As with "in-place upgrades,"[1]A feature people seem to think we don't need, although convincing cases have been made for it. -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter the compelling use case is being short
on disk space. For somebody with a multi-TB (or whatever figure
sounds big this week) PostgreSQL database, it may be impossible to get
space for twice or more that. Giving people the option to stream
COPYs through a pipe would alleviate a lot of pain.

Cheers,
D

[1]: A feature people seem to think we don't need, although convincing cases have been made for it. -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter
cases have been made for it.
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!

#16Andrew Dunstan
andrew@dunslane.net
In reply to: David Fetter (#15)
Re: Possible TODO item: copy to/from pipe

David Fetter wrote:

On Wed, May 31, 2006 at 02:46:29PM -0400, Andrew Dunstan wrote:

I wish somebody would explain to me the compelling use case for
this.

As with "in-place upgrades,"[1] the compelling use case is being short
on disk space. For somebody with a multi-TB (or whatever figure
sounds big this week) PostgreSQL database, it may be impossible to get
space for twice or more that. Giving people the option to stream
COPYs through a pipe would alleviate a lot of pain.

Cheers,
D

[1] A feature people seem to think we don't need, although convincing
cases have been made for it.

But why is that hugely better than piping psql output to gzip?

The Unix philosophy is to use small chains of tools rather than put
everything into one big tool.

Thus, this is quite unlike inplace upgrade, which I agree would be great
(and where I would far rather see people spend their efforts), because
unlike for this "feature" there is no viable alternative.

cheers

andrew

#17Dave Page
dpage@pgadmin.org
In reply to: Magnus Hagander (#11)
Re: Possible TODO item: copy to/from pipe

On 31/5/06 18:28, "Magnus Hagander" <mha@sollentuna.net> wrote:

Won't help too much, until gzip's output is piped back too, so a
replacement for COPY .. TO STDOUT COMPRESSED would be

COPY ... TO '|

/bin/gzip |' STDOUT, to enable clients to

receive the

reduced stuff.

Forgot to mention:
COPY COMPRESSED was also meant to introduce a portable

format that's

efficient for both text and binary data. Relying on some external
XYZzip version seems not too portable to me.

It does have that advantage. Gzip and others are not particularly
Windows friendly for example.

... as most windows programs are pipe agnostic.

For the record, gzip on win32 works perfectly fine both as a separate
program and running in a pipe. No problem at all. The only issue is that
it's not available by default. (And possible issues with programs
launching it that don't know how to deal with windows style directory
naming)

Exactly my point; how many production Windows servers do you have with gzip
anywhere near them? Andreas' point about pipes is also valid though - it's
simply not the norm on Windows as I found when we were porting Slony
(more.exe barfs at >8MB being pipe in).

Regards, Dave.

#18Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andrew Dunstan (#16)
Re: Possible TODO item: copy to/from pipe

Andrew Dunstan wrote:

David Fetter wrote:

As with "in-place upgrades,"[1] the compelling use case is being short
on disk space. For somebody with a multi-TB (or whatever figure
sounds big this week) PostgreSQL database, it may be impossible to get
space for twice or more that. Giving people the option to stream
COPYs through a pipe would alleviate a lot of pain.

But why is that hugely better than piping psql output to gzip?

psql output has already travelled over the network.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#19Dave Page
dpage@pgadmin.org
In reply to: Andreas Pflug (#13)
Re: Possible TODO item: copy to/from pipe

On 31/5/06 19:13, "Andreas Pflug" <pgadmin@pse-consulting.de> wrote:

I wonder if we'd be able to ship gzip with the windows installer, to
insure proper integration.

'Fraid not. It's GPL'd.

Regards, Dave.

#20Steve Atkins
steve@blighty.com
In reply to: Dave Page (#19)
Re: Possible TODO item: copy to/from pipe

On May 31, 2006, at 12:58 PM, Dave Page wrote:

On 31/5/06 19:13, "Andreas Pflug" <pgadmin@pse-consulting.de> wrote:

I wonder if we'd be able to ship gzip with the windows installer, to
insure proper integration.

'Fraid not. It's GPL'd.

Well, one implementation of it is. zlib is new-bsd-ish, though, and
includes
minigzip, which should be just fine for use in a pipe on windows.

(Not that that's an argument one way or the other as to whether this
is something we should do).

Cheers,
Steve

#21Magnus Hagander
magnus@hagander.net
In reply to: Steve Atkins (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Pflug (#10)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dave Page (#17)
#24Andrew Dunstan
andrew@dunslane.net
In reply to: Alvaro Herrera (#18)
#25Dave Page
dpage@pgadmin.org
In reply to: Tom Lane (#23)
#26Martijn van Oosterhout
kleptog@svana.org
In reply to: Steve Atkins (#20)
#27Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Dave Page (#25)
#28Magnus Hagander
magnus@hagander.net
In reply to: Andreas Pflug (#27)
#29Joshua D. Drake
jd@commandprompt.com
In reply to: Magnus Hagander (#28)
#30Martijn van Oosterhout
kleptog@svana.org
In reply to: Magnus Hagander (#28)
#31Dawid Kuroczko
qnex42@gmail.com
In reply to: Tom Lane (#1)
#32Mark Woodward
pgsql@mohawksoft.com
In reply to: Tom Lane (#1)
#33Bruce Momjian
bruce@momjian.us
In reply to: Martijn van Oosterhout (#26)
#34Bruce Momjian
bruce@momjian.us
In reply to: Andreas Pflug (#4)
#35Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Bruce Momjian (#34)
#36Bruce Momjian
bruce@momjian.us
In reply to: Andreas Pflug (#35)