COPY (query) TO file

Started by Mark Woodwardalmost 20 years ago35 messageshackers
Jump to latest
#1Mark Woodward
pgsql@mohawksoft.com

Tom had posted a question about file compression with copy. I thought
about it, and I want to through this out and see if anyone things it is a
good idea.

Currently, the COPY command only copies a table, what if it could operate
with a query, as:

COPY (select * from mytable where foo='bar') as BAR TO stdout

I have no idea if it is doable, but I can see uses for replication

psql -h source mydb -c "COPY (select * from mytable where ID > xxxxx) as
mytable TO STDOUT" | psql -h target mydb -c "COPY mytable FROM stdin"

#2Tino Wildenhain
tino@wildenhain.de
In reply to: Mark Woodward (#1)
Re: COPY (query) TO file

Mark Woodward wrote:

Tom had posted a question about file compression with copy. I thought
about it, and I want to through this out and see if anyone things it is a
good idea.

Currently, the COPY command only copies a table, what if it could operate
with a query, as:

COPY (select * from mytable where foo='bar') as BAR TO stdout

I have no idea if it is doable, but I can see uses for replication

I doubt it be really usefull (apart from maybe saving some work
coding a client app) but did you actually test it with

create table as select ...; followed by a copy of that table
if it really is faster then just the usual select & fetch?

Regards
Tino

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Mark Woodward (#1)
Re: COPY (query) TO file

Mark Woodward wrote:

Tom had posted a question about file compression with copy. I thought
about it, and I want to through this out and see if anyone things it is a
good idea.

Currently, the COPY command only copies a table, what if it could operate
with a query, as:

COPY (select * from mytable where foo='bar') as BAR TO stdout

Isn't this already being worked on? The TODO list says:

Allow COPY to output from views
Another idea would be to allow actual SELECT statements in a COPY.

Personally I strongly favor the second option as being more flexible
than the first.

cheers

andrew

#4Mark Woodward
pgsql@mohawksoft.com
In reply to: Tino Wildenhain (#2)
Re: COPY (query) TO file

Mark Woodward wrote:

Tom had posted a question about file compression with copy. I thought
about it, and I want to through this out and see if anyone things it is
a
good idea.

Currently, the COPY command only copies a table, what if it could
operate
with a query, as:

COPY (select * from mytable where foo='bar') as BAR TO stdout

I have no idea if it is doable, but I can see uses for replication

I doubt it be really usefull (apart from maybe saving some work
coding a client app) but did you actually test it with

create table as select ...; followed by a copy of that table
if it really is faster then just the usual select & fetch?

Why "create table?"

The idea is that you would have one or more redundent databases and use
the COPY TO/FROM to keep them up to date.

#5Tino Wildenhain
tino@wildenhain.de
In reply to: Mark Woodward (#4)
Re: COPY (query) TO file

Mark Woodward wrote:

Mark Woodward wrote:

Tom had posted a question about file compression with copy. I thought
about it, and I want to through this out and see if anyone things it is
a
good idea.

Currently, the COPY command only copies a table, what if it could
operate
with a query, as:

COPY (select * from mytable where foo='bar') as BAR TO stdout

I have no idea if it is doable, but I can see uses for replication

I doubt it be really usefull (apart from maybe saving some work
coding a client app) but did you actually test it with

create table as select ...; followed by a copy of that table
if it really is faster then just the usual select & fetch?

Why "create table?"

Just to simulate and time the proposal.
SELECT ... already works over the network and if COPY from a
select (which would basically work like yet another wire
protocol) isnt significantly faster, why bother?

The idea is that you would have one or more redundent databases and use
the COPY TO/FROM to keep them up to date.

Well, if you have databases you would have regular tables - and
can use copy as it is now :-)

Regards
Tino

#6Mark Woodward
pgsql@mohawksoft.com
In reply to: Tino Wildenhain (#5)
Re: COPY (query) TO file

Mark Woodward wrote:

Mark Woodward wrote:

Tom had posted a question about file compression with copy. I thought
about it, and I want to through this out and see if anyone things it
is
a
good idea.

Currently, the COPY command only copies a table, what if it could
operate
with a query, as:

COPY (select * from mytable where foo='bar') as BAR TO stdout

I have no idea if it is doable, but I can see uses for replication

I doubt it be really usefull (apart from maybe saving some work
coding a client app) but did you actually test it with

create table as select ...; followed by a copy of that table
if it really is faster then just the usual select & fetch?

Why "create table?"

Just to simulate and time the proposal.
SELECT ... already works over the network and if COPY from a
select (which would basically work like yet another wire
protocol) isnt significantly faster, why bother?

Because the format of COPY is a common transmiter/receiver for PostgreSQL,
like this:

pg_dump -t mytable | psql -h target -c "COPY mytable FROM STDIN"

With a more selective copy, you can use pretty much this mechanism to
limit a copy to a sumset of the records in a table.

The idea is that you would have one or more redundent databases and use
the COPY TO/FROM to keep them up to date.

Well, if you have databases you would have regular tables - and
can use copy as it is now :-)

But COPY copies all the records, not some of the records.

Show quoted text

Regards
Tino

#7Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Andrew Dunstan (#3)
Re: COPY (query) TO file

Andrew Dunstan wrote:

Mark Woodward wrote:

Tom had posted a question about file compression with copy. I thought
about it, and I want to through this out and see if anyone things it is a
good idea.

Currently, the COPY command only copies a table, what if it could operate
with a query, as:

COPY (select * from mytable where foo='bar') as BAR TO stdout

Isn't this already being worked on? The TODO list says:

Allow COPY to output from views
Another idea would be to allow actual SELECT statements in a COPY.

Personally I strongly favor the second option as being more flexible
than the first.

I second that - allowing arbitrary SELECT statements as a COPY source
seems much more powerful and flexible than just supporting COPY FROM VIEW.

Stefan

#8Neil Conway
neilc@samurai.com
In reply to: Andrew Dunstan (#3)
Re: COPY (query) TO file

On Fri, 2006-06-02 at 09:56 -0400, Andrew Dunstan wrote:

Allow COPY to output from views

FYI, there is a patch for this floating around -- I believe it was
posted to -patches a few months back.

Another idea would be to allow actual SELECT statements in a COPY.

Personally I strongly favor the second option as being more flexible
than the first.

+1.

-Neil

#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Neil Conway (#8)
Re: COPY (query) TO file

Neil Conway wrote:

On Fri, 2006-06-02 at 09:56 -0400, Andrew Dunstan wrote:

Allow COPY to output from views

FYI, there is a patch for this floating around -- I believe it was
posted to -patches a few months back.

I have it. The pieces of it than I can use to implement the idea below,
I'll try to rescue. I don't think it's that much though.

Another idea would be to allow actual SELECT statements in a COPY.

Personally I strongly favor the second option as being more flexible
than the first.

+1.

Sounds like a plan.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#10Tino Wildenhain
tino@wildenhain.de
In reply to: Mark Woodward (#6)
Re: COPY (query) TO file

Mark Woodward wrote:
...

create table as select ...; followed by a copy of that table
if it really is faster then just the usual select & fetch?

Why "create table?"

Just to simulate and time the proposal.
SELECT ... already works over the network and if COPY from a
select (which would basically work like yet another wire
protocol) isnt significantly faster, why bother?

Because the format of COPY is a common transmiter/receiver for PostgreSQL,
like this:

pg_dump -t mytable | psql -h target -c "COPY mytable FROM STDIN"

With a more selective copy, you can use pretty much this mechanism to
limit a copy to a sumset of the records in a table.

Ok, but why not just implement this into pg_dump or psql?
Why bother the backend with that functionality?

For example if you copy numbers, int4 (4 bytes)
gets expanded to up to 10 bytes. Of course
can get the same wire load if you use to_char()
with regular select.

The idea is that you would have one or more redundent databases and use
the COPY TO/FROM to keep them up to date.

Well, if you have databases you would have regular tables - and
can use copy as it is now :-)

But COPY copies all the records, not some of the records.

yes, that would be the point in having them "up to date"
and not "partially maybe something up to date" ;)

COPY is fine for import of data, but for export
I think it should be implemented in the frontend.

Regards
Tino

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tino Wildenhain (#10)
Re: COPY (query) TO file

Tino Wildenhain <tino@wildenhain.de> writes:

Ok, but why not just implement this into pg_dump or psql?
Why bother the backend with that functionality?

You're not seriously suggesting we reimplement evaluation of WHERE clauses
on the client side, are you?

regards, tom lane

#12Tino Wildenhain
tino@wildenhain.de
In reply to: Tom Lane (#11)
Re: COPY (query) TO file

Tom Lane wrote:

Tino Wildenhain <tino@wildenhain.de> writes:

Ok, but why not just implement this into pg_dump or psql?
Why bother the backend with that functionality?

You're not seriously suggesting we reimplement evaluation of WHERE clauses
on the client side, are you?

no, did I? But what is wrong with something like:

\COPY 'SELECT foo,bar,baz FROM footable WHERE baz=5 ORDER BY foo' TO
file|stdout

which would just run the query (in the backend of course) and
format the output just like copy would...

I mean, ok, when its in the backend, its fine too (beside the
data expansion if you dont implement compression...)
but I thougt implementing in the frontend would be easier...

Regards
Tino

#13Mark Woodward
pgsql@mohawksoft.com
In reply to: Tino Wildenhain (#10)
Re: COPY (query) TO file

Mark Woodward wrote:
...

create table as select ...; followed by a copy of that table
if it really is faster then just the usual select & fetch?

Why "create table?"

Just to simulate and time the proposal.
SELECT ... already works over the network and if COPY from a
select (which would basically work like yet another wire
protocol) isnt significantly faster, why bother?

Because the format of COPY is a common transmiter/receiver for
PostgreSQL,
like this:

pg_dump -t mytable | psql -h target -c "COPY mytable FROM STDIN"

With a more selective copy, you can use pretty much this mechanism to
limit a copy to a sumset of the records in a table.

Ok, but why not just implement this into pg_dump or psql?
Why bother the backend with that functionality?

Because "COPY" runs on the back-end, not the front end, and the front end
may not even be in the same city as the backend. When you issue a "COPY"
the file it reads or writes local to the backend. True, the examples I
gave may not show how that is important, but consider this:

psql -h remote masterdb -c "COPY (select * from mytable where ID <
xxlastxx) as mytable TO '/replicate_backup/mytable-060602.pgc'"

This runs completely in the background and can serve as a running backup.

#14Tino Wildenhain
tino@wildenhain.de
In reply to: Mark Woodward (#13)
Re: COPY (query) TO file

Mark Woodward wrote:
...

pg_dump -t mytable | psql -h target -c "COPY mytable FROM STDIN"

With a more selective copy, you can use pretty much this mechanism to
limit a copy to a sumset of the records in a table.

Ok, but why not just implement this into pg_dump or psql?
Why bother the backend with that functionality?

Because "COPY" runs on the back-end, not the front end, and the front end
may not even be in the same city as the backend. When you issue a "COPY"
the file it reads or writes local to the backend. True, the examples I
gave may not show how that is important, but consider this:

We were talking about COPY to stdout :-) Copy to file is another
issue :-) Copy to (server fs) file has so many limitations I dont see
wide use for it. (Of course there are usecases)

psql -h remote masterdb -c "COPY (select * from mytable where ID <
xxlastxx) as mytable TO '/replicate_backup/mytable-060602.pgc'"

This runs completely in the background and can serve as a running backup.

And you are sure it would be much faster then a server local running
psql just dumping the result of a query?
(And you could more easy avoid raceconditions in contrast to several
remote clients trying to trigger your above backup )

But what do I know... I was just asking :-)

Regards
Tino

#15David Fetter
david@fetter.org
In reply to: Andrew Dunstan (#3)
Re: COPY (query) TO file

On Fri, Jun 02, 2006 at 09:56:07AM -0400, Andrew Dunstan wrote:

Mark Woodward wrote:

Tom had posted a question about file compression with copy. I thought
about it, and I want to through this out and see if anyone things it is a
good idea.

Currently, the COPY command only copies a table, what if it could operate
with a query, as:

COPY (select * from mytable where foo='bar') as BAR TO stdout

Isn't this already being worked on? The TODO list says:

Allow COPY to output from views
Another idea would be to allow actual SELECT statements in a COPY.

Personally I strongly favor the second option as being more flexible
than the first.

+1 :)

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!

#16Bruce Momjian
bruce@momjian.us
In reply to: Tino Wildenhain (#12)
Re: COPY (query) TO file

Tino Wildenhain <tino@wildenhain.de> writes:

Tom Lane wrote:

Tino Wildenhain <tino@wildenhain.de> writes:

Ok, but why not just implement this into pg_dump or psql?
Why bother the backend with that functionality?

You're not seriously suggesting we reimplement evaluation of WHERE clauses
on the client side, are you?

No, he's suggesting the client implement COPY formatting after fetching a
regular result set.

Of course this runs into the same problem other clients have dealing with
large result sets. libpq doesn't want to let the client deal with partial
results so you have to buffer up the entire result set in memory.

I was also vaguely pondering whether all the DDL commands could be generalized
to receive or send COPY formatted data for repeated execution. It would be
neat to be able to prepare an UPDATE with placeholders and stream data in COPY
format as parameters to the UPDATE to execute it thousands or millions of
times without any protocol overhead or network pipeline stalls.

--
greg

#17Mark Woodward
pgsql@mohawksoft.com
In reply to: Tino Wildenhain (#14)
Re: COPY (query) TO file

Mark Woodward wrote:
...

pg_dump -t mytable | psql -h target -c "COPY mytable FROM STDIN"

With a more selective copy, you can use pretty much this mechanism to
limit a copy to a sumset of the records in a table.

Ok, but why not just implement this into pg_dump or psql?
Why bother the backend with that functionality?

Because "COPY" runs on the back-end, not the front end, and the front
end
may not even be in the same city as the backend. When you issue a "COPY"
the file it reads or writes local to the backend. True, the examples I
gave may not show how that is important, but consider this:

We were talking about COPY to stdout :-) Copy to file is another
issue :-) Copy to (server fs) file has so many limitations I dont see
wide use for it. (Of course there are usecases)

"wide use for" is not always the same as "useful." Sometimes "useful" is
something not easily doable in other ways or completes a feature set.

psql -h remote masterdb -c "COPY (select * from mytable where ID <
xxlastxx) as mytable TO '/replicate_backup/mytable-060602.pgc'"

This runs completely in the background and can serve as a running
backup.

And you are sure it would be much faster then a server local running
psql just dumping the result of a query?

No I can't be sure of that at all, but .... The COPY command has a
specific use that is understood and an operation that is separate from the
normal query mechanism.

(And you could more easy avoid raceconditions in contrast to several
remote clients trying to trigger your above backup )

Again, the examples may not have been precise in presenting "why," the
focus was mostly "what" so it could be discussed. As a generic feature it
has many potential uses. Trying to debate and defend a specific use limits
the potential scope of the feature.

Why have COPY anyway? Why not just use "SELECT * FROM TABLE?"

#18Joshua D. Drake
jd@commandprompt.com
In reply to: Stefan Kaltenbrunner (#7)
Re: COPY (query) TO file

Allow COPY to output from views
Another idea would be to allow actual SELECT statements in a COPY.

Personally I strongly favor the second option as being more flexible
than the first.

I second that - allowing arbitrary SELECT statements as a COPY source
seems much more powerful and flexible than just supporting COPY FROM VIEW.

Not to be a sour apple or anything but I don't see how any of this is
needed in the backend since we can easily use Psql to do it, or pretty
much any other tool.

Joshua D. Drake

Stefan

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

#19Mark Woodward
pgsql@mohawksoft.com
In reply to: Joshua D. Drake (#18)
Re: COPY (query) TO file

Allow COPY to output from views
Another idea would be to allow actual SELECT statements in a COPY.

Personally I strongly favor the second option as being more flexible
than the first.

I second that - allowing arbitrary SELECT statements as a COPY source
seems much more powerful and flexible than just supporting COPY FROM
VIEW.

Not to be a sour apple or anything but I don't see how any of this is
needed in the backend since we can easily use Psql to do it, or pretty
much any other tool.

There is an important difference between a capability in the backend vs
one synthesized in the frontend.

#20Joshua D. Drake
jd@commandprompt.com
In reply to: Mark Woodward (#19)
Re: COPY (query) TO file

Not to be a sour apple or anything but I don't see how any of this is
needed in the backend since we can easily use Psql to do it, or pretty
much any other tool.

There is an important difference between a capability in the backend vs
one synthesized in the frontend.

And that would be? The suspense is killing me :)

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

#21Joshua D. Drake
jd@commandprompt.com
In reply to: Mark Woodward (#19)
#22Joshua D. Drake
jd@commandprompt.com
In reply to: Mark Woodward (#19)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tino Wildenhain (#12)
#24PFC
lists@peufeu.com
In reply to: Bruce Momjian (#16)
#25Bruce Momjian
bruce@momjian.us
In reply to: PFC (#24)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#25)
#27PFC
lists@peufeu.com
In reply to: Bruce Momjian (#25)
#28Tino Wildenhain
tino@wildenhain.de
In reply to: Mark Woodward (#17)
#29Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#26)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#29)
#31Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#26)
#32Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#30)
#33Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#32)
#34Harald Fuchs
hf0406x@protecting.net
In reply to: Mark Woodward (#1)
#35Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Harald Fuchs (#34)