Performance testing of COPY (SELECT) TO
Hi,
we have a large export here, I made an in-house benchmark
between Informix, plain PostgreSQL-8.1.4 and
8.2devel+COPY(SELECT) using the same data and query.
Find the results below for the two PostgreSQL versions.
With PostgreSQL 8.1.4, I used this:
begin;
select ... into temp myquery1;
copy myquery1 to stdout csv delimiter '|';
rollback;
With 8.2devel, I simple used
copy (select ...) to stdout csv delimiter '|';
# of clients: 1* 3** 10**
PostgreSQL 1:33 10:58 55:46
PostgreSQL 8.2 1:19 4:55 18:28
* - average of 4 runs, the first was with cold caches after reboot
** - 1 run, average of cliens' runtimes
Performance between 8.1.4 and 8.2devel is interesting:
1 client: 15%
3 clients: 55.2%
10 clients: 66.9%
The same machine was used for testing.
Best regards,
Zolt�n B�sz�rm�nyi
=?iso-8859-2?Q?B=F6sz=F6rm=E9nyi_Zolt=E1n?= <zboszor@dunaweb.hu> writes:
With PostgreSQL 8.1.4, I used this:
begin;
select ... into temp myquery1;
copy myquery1 to stdout csv delimiter '|';
rollback;
The performance of this would doubtless vary a lot with the temp_buffers
setting. Did you try different values?
It'd also be interesting to time the same way (with a temp table) in
devel. I don't remember whether we did any performance work on the
COPY CSV data path in this cycle, or whether that was all present in
8.1. In any case it'd be worth proving that the COPY SELECT patch isn't
degrading performance of the copy-a-relation case.
regards, tom lane
B�sz�rm�nyi Zolt�n <zboszor@dunaweb.hu> writes:
With PostgreSQL 8.1.4, I used this:
begin;
select ... into temp myquery1;
copy myquery1 to stdout csv delimiter '|';
rollback;The performance of this would doubtless vary a lot with the temp_buffers
setting. Did you try different values?
Yes, I did, but now checked back with 8.2CVS.
The previously quoted result was achieved with
temp_buffers = 1000 on both 8.1.4 and 8.2CVS.
On 8.2CVS with temp_buffers = 4096, the 10 client case kills
the machine with swapping, but the 3 client runtime with
COPY(SELECT) went down to 2:41. The SELECT INTO TEMP
case went down to 3:36.
It'd also be interesting to time the same way (with a temp table) in
devel. I don't remember whether we did any performance work on the
COPY CSV data path in this cycle, or whether that was all present in
8.1. In any case it'd be worth proving that the COPY SELECT patch isn't
degrading performance of the copy-a-relation case.
I will report back with that, say on Monday.
In the meantime, I documented the COPY (SELECT) case
and modified parser/analyze.c and tcop/utility.c so neither of them
calls anything from under another directory. I think it's cleaner now.
Also, I tried to implement more closely what printtup() does.
Please, review.
Best regards,
Zolt�n B�sz�rm�nyi
Attachments:
It'd also be interesting to time the same way (with a temp table) in
devel. I don't remember whether we did any performance work on the
COPY CSV data path in this cycle, or whether that was all present in
8.1. In any case it'd be worth proving that the COPY SELECT patch isn't
degrading performance of the copy-a-relation case.
In the export, there is a largish table, that has both many columns and rows.
With COPY(SELECT) patch applied:
time psql -c "copy (select * from table) to 'file'" dbx
COPY 886046
real 0m13.253s
user 0m0.000s
sys 0m0.000s
time psql -c "copy table to 'file'" dbx
COPY 886046
real 0m13.234s
user 0m0.000s
sys 0m0.000s
time psql -c "copy table to stdout" dbx >file
real 0m15.155s
user 0m0.540s
sys 0m0.450s
time psql -c "copy (select * from table) to stdout" dbx >file
real 0m15.079s
user 0m0.540s
sys 0m0.590s
Surprisingly, without the COPY(SELECT) patch it's slower,
it's the lowest from five runs, e.g. with warm caches:
time psql -c "copy table to 'file'" dbx
real 0m20.464s
user 0m0.000s
sys 0m0.010s
time psql -c "copy table to stdout" dbx >file
real 0m25.753s
user 0m0.570s
sys 0m0.460s
With the original settings, temp_buffers = 1000 on 8.2CVS,
the one client case looks like this: first run 1:44, second run 1:12,
third run 1:04. It seems it's a bit faster both on startup and on
subsequent runs.
Best regards,
Zolt�n B�sz�rm�nyi
Hi,
B�sz�rm�nyi Zolt�n �rta:
B�sz�rm�nyi Zolt�n <zboszor@dunaweb.hu> writes:
With PostgreSQL 8.1.4, I used this:
begin;
select ... into temp myquery1;
copy myquery1 to stdout csv delimiter '|';
rollback;The performance of this would doubtless vary a lot with the temp_buffers
setting. Did you try different values?Yes, I did, but now checked back with 8.2CVS.
The previously quoted result was achieved with
temp_buffers = 1000 on both 8.1.4 and 8.2CVS.
On 8.2CVS with temp_buffers = 4096, the 10 client case kills
the machine with swapping, but the 3 client runtime with
COPY(SELECT) went down to 2:41. The SELECT INTO TEMP
case went down to 3:36.It'd also be interesting to time the same way (with a temp table) in
devel. I don't remember whether we did any performance work on the
COPY CSV data path in this cycle, or whether that was all present in
8.1. In any case it'd be worth proving that the COPY SELECT patch isn't
degrading performance of the copy-a-relation case.I will report back with that, say on Monday.
It seems my previous mail hasn't reached
the hackers list, I answer here.
In the export, there is a largish table,
that has both many columns and rows.
With COPY(SELECT) patch applied:
time psql -c "copy (select * from table) to 'file'" dbx
COPY 886046
real 0m13.253s
user 0m0.000s
sys 0m0.000s
time psql -c "copy table to 'file'" dbx
COPY 886046
real 0m13.234s
user 0m0.000s
sys 0m0.000s
time psql -c "copy table to stdout" dbx >file
real 0m15.155s
user 0m0.540s
sys 0m0.450s
time psql -c "copy (select * from table) to stdout" dbx >file
real 0m15.079s
user 0m0.540s
sys 0m0.590s
Surprisingly, without the COPY(SELECT) patch it's slower,
this is the lowest from five runs, e.g. with warm caches:
time psql -c "copy table to 'file'" dbx
real 0m20.464s
user 0m0.000s
sys 0m0.010s
time psql -c "copy table to stdout" dbx >file
real 0m25.753s
user 0m0.570s
sys 0m0.460s
With the original settings, temp_buffers = 1000 on 8.2CVS,
the export runtime with one client looks like this:
first run 1:44, second run 1:12, third run 1:04.
It seems it's a bit faster both on startup and on
subsequent runs.
Best regards,
Zolt�n B�sz�rm�nyi
Your patch has been added to the PostgreSQL unapplied patches list at:
http://momjian.postgresql.org/cgi-bin/pgpatches
It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.
---------------------------------------------------------------------------
B���sz���rm���nyi Zolt���n wrote:
B?sz?rm?nyi Zolt?n <zboszor@dunaweb.hu> writes:
With PostgreSQL 8.1.4, I used this:
begin;
select ... into temp myquery1;
copy myquery1 to stdout csv delimiter '|';
rollback;The performance of this would doubtless vary a lot with the temp_buffers
setting. Did you try different values?Yes, I did, but now checked back with 8.2CVS.
The previously quoted result was achieved with
temp_buffers = 1000 on both 8.1.4 and 8.2CVS.
On 8.2CVS with temp_buffers = 4096, the 10 client case kills
the machine with swapping, but the 3 client runtime with
COPY(SELECT) went down to 2:41. The SELECT INTO TEMP
case went down to 3:36.It'd also be interesting to time the same way (with a temp table) in
devel. I don't remember whether we did any performance work on the
COPY CSV data path in this cycle, or whether that was all present in
8.1. In any case it'd be worth proving that the COPY SELECT patch isn't
degrading performance of the copy-a-relation case.I will report back with that, say on Monday.
In the meantime, I documented the COPY (SELECT) case
and modified parser/analyze.c and tcop/utility.c so neither of them
calls anything from under another directory. I think it's cleaner now.
Also, I tried to implement more closely what printtup() does.
Please, review.Best regards,
Zolt?n B?sz?rm?nyi
[ Attachment, skipping... ]
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian �rta:
Your patch has been added to the PostgreSQL unapplied patches list at:
http://momjian.postgresql.org/cgi-bin/pgpatches
It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.
Thanks. Would you please add this instead?
psql built-in \copy (select ...) now also work.
Best regards,
Zolt�n B�sz�rm�nyi
Attachments:
pgsql-copyselect-8.patch.gzapplication/x-tar; name=pgsql-copyselect-8.patch.gzDownload
Removed Cc: to pgsql-hackers.
Zolt�n,
Zoltan Boszormenyi wrote:
Your patch has been added to the PostgreSQL unapplied patches list at:
http://momjian.postgresql.org/cgi-bin/pgpatches
It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.Thanks. Would you please add this instead?
psql built-in \copy (select ...) now also work.
Please check this one out. I took the version you posted here and
changed the stuff in the parser that I didn't like, and removed the ugly
"SELECT * FROM" stuff that was bothering me. I also removed the
transformCopyStmt stuff as it seems unnecessary to me. I did all that
stuff in a cleaner way (IMO).
I also cleaned up the grammar -- basically added a separate case from
the regular COPY. I took the opportunity to remove the
backwards-compatible options from there. I didn't check that stuff very
much but it should continue to work ...
I noticed that this works:
alvherre=# copy (values (1, 'uno'), (2, 'dos'), (3, 'tr;es'), (4, NULL)) to stdout with delimiter ';' null 'NUL' csv quote as '"';
1;uno
2;dos
3;"tr;es"
4;NUL
which is nice.
With this patch, the COPY view FROM stdout path now throws an error --
in your version it worked (because of that "COPY * FROM" stuff), and
from previous discussion it seems reasonable to behave differently for
views than for plain tables (i.e. it's reasonable that we fail for
views).
I also broke the check for a FOR UPDATE clause. Not sure where but it
must be easy to fix :-) I'd do it myself but I'm heading to bed right
now.
I also wanted to check these hunks in your patch, which I didn't like
very much:
-ERROR: column "a" of relation "test" does not exist
+ERROR: column "a" does not exist
but didn't got around to it.
I also noticed that the new copyselect regression test is not added to
the serial schedule.
I'll repost a reworked version at some point, if no one beats me to it.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Attachments:
copyview-9.patchtext/plain; charset=us-asciiDownload+705-455
Alvaro Herrera <alvherre@commandprompt.com> writes:
... I'd do it myself but I'm heading to bed right now.
...
I'll repost a reworked version at some point, if no one beats me to it.
I was planning to start looking at this patch tomorrow (unless Gavin
produces a new bitmap-index patch by then). I'll work from this one
unless somebody produces a better version meanwhile.
regards, tom lane
Hi,
what's the problem with COPY view TO, other than you don't like it? :-)
That was the beginning and is used in production
according to the original authors.
I also broke the check for a FOR UPDATE clause. Not sure where but it
must be easy to fix :-) I'd do it myself but I'm heading to bed right
now.
Fixed.
I also wanted to check these hunks in your patch, which I didn't like
very much:-ERROR: column "a" of relation "test" does not exist +ERROR: column "a" does not exist
It was because of too much code sharing. I fixed it by passing
the relation name to CopyGetAttnums() in the relation case,
so the other regression tests aren't bothered now.
The docs and the regression test is modified according to your version.
Best regards,
Zolt�n B�sz�rm�nyi
Attachments:
Import Notes
Resolved by subject fallback
B�sz�rm�nyi Zolt�n wrote:
Hi,
what's the problem with COPY view TO, other than you don't like it? :-)
The problem is that it required a ugly piece of code. Not supporting it
means we can keep the code nice. The previous discussion led to this
conclusion anyway so I don't know why we are debating it again.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote:
B�sz�rm�nyi Zolt�n wrote:
Hi,
what's the problem with COPY view TO, other than you don't like it? :-)
The problem is that it required a ugly piece of code. Not supporting it
means we can keep the code nice. The previous discussion led to this
conclusion anyway so I don't know why we are debating it again.
What is so ugly about it? I haven't looked at the code, but I am curious
to know.
I also don't recall the consensus being quite so clear cut. I guess
there is a case for saying that if it's not allowed then you know that
"COPY relname TO" is going to be fast. But, code aesthetics aside, the
reasons for disallowing it seem a bit thin, to me.
cheers
andrew
Andrew Dunstan �rta:
Alvaro Herrera wrote:
B�sz�rm�nyi Zolt�n wrote:
Hi,
what's the problem with COPY view TO, other than you don't like it? :-)
The problem is that it required a ugly piece of code. Not supporting it
means we can keep the code nice. The previous discussion led to this
conclusion anyway so I don't know why we are debating it again.What is so ugly about it? I haven't looked at the code, but I am
curious to know.I also don't recall the consensus being quite so clear cut. I guess
there is a case for saying that if it's not allowed then you know that
"COPY relname TO" is going to be fast. But, code aesthetics aside, the
reasons for disallowing it seem a bit thin, to me.cheers
andrew
I would say the timing difference between
"COPY table TO" and "COPY (SELECT * FROM table) TO"
was noise, so it's not even faster.
And an updatable VIEW *may* allow COPY view FROM...
Best regards,
Zolt�n B�sz�rm�nyi
Zoltan Boszormenyi wrote:
Andrew Dunstan �rta:
Alvaro Herrera wrote:
B�sz�rm�nyi Zolt�n wrote:
what's the problem with COPY view TO, other than you don't like it? :-)
The problem is that it required a ugly piece of code. Not supporting it
means we can keep the code nice. The previous discussion led to this
conclusion anyway so I don't know why we are debating it again.What is so ugly about it? I haven't looked at the code, but I am
curious to know.
It used a "SELECT * FROM %s" string that was passed back to the parser.
I also don't recall the consensus being quite so clear cut. I guess
there is a case for saying that if it's not allowed then you know that
"COPY relname TO" is going to be fast. But, code aesthetics aside, the
reasons for disallowing it seem a bit thin, to me.I would say the timing difference between
"COPY table TO" and "COPY (SELECT * FROM table) TO"
was noise, so it's not even faster.
Remember that we were talking about supporting views, not tables. And
if a view uses a slow query then you are in immediate danger of having a
slow COPY. This may not be a problem but it needs to be discussed and
an agreement must be reached before introducing such a change (and not
during feature freeze).
And an updatable VIEW *may* allow COPY view FROM...
May I remind you that we've been in feature freeze for four weeks
already? Now it's *not* the time to be drooling over cool features that
would be nice to have.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Remember that we were talking about supporting views, not tables. And
if a view uses a slow query then you are in immediate danger of having a
slow COPY. This may not be a problem but it needs to be discussed and
an agreement must be reached before introducing such a change (and not
during feature freeze).
this will definitely be the case - however, this is not what it was made
for. it has not been made to be fast but it has been made to fulfill
some other task. the reason why this has been implemented is: consider a
large scale database containing hundreds of gigs of data. in our special
case we have to export in a flexible way. the data which has to be
exported comes from multiple tables (between 3 and 7 depending on the
data we are looking at in this project. the export has to be performed
in a flexible way and it needs certain parameters. defining tmp tables
and store the data in there is simply not "nice" at all. in most cases
exports want to transform data on the fly - speed is not as important as
flexibility here.
so in my view the speed argument does not matter. if somebody passes a
stupid query to copy he will get stupid runtimes - just like on ordinary
sql. however, we can use COPY's capabilities to format / escape data to
make exports more flexible. so basically it is a win.
best regards,
hans
--
Cybertec Geschwinde & Sch�nig GmbH
Sch�ngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at
Hans-Juergen Schoenig wrote:
Remember that we were talking about supporting views, not tables. And
if a view uses a slow query then you are in immediate danger of having a
slow COPY. This may not be a problem but it needs to be discussed and
an agreement must be reached before introducing such a change (and not
during feature freeze).this will definitely be the case - however, this is not what it was made
for. it has not been made to be fast but it has been made to fulfill
some other task. the reason why this has been implemented is: consider a
large scale database containing hundreds of gigs of data. in our special
case we have to export in a flexible way. the data which has to be
exported comes from multiple tables (between 3 and 7 depending on the
data we are looking at in this project. the export has to be performed
in a flexible way and it needs certain parameters. defining tmp tables
and store the data in there is simply not "nice" at all. in most cases
exports want to transform data on the fly - speed is not as important as
flexibility here.
My question is, if we allow this:
copy (select * from view) to stdout;
(or to a file, whatever), is it enough for you? Or would you insist on
also having
copy view to stdout;
?
We can, and the posted patch does, support the first form, but not the
second. In fact I deliberately removed support for the second form for
Zolt�n's patch because it uglifies the surrounding code.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera �rta:
Zoltan Boszormenyi wrote:
Andrew Dunstan �rta:
Alvaro Herrera wrote:
B�sz�rm�nyi Zolt�n wrote:
what's the problem with COPY view TO, other than you don't like it? :-)
The problem is that it required a ugly piece of code. Not supporting it
means we can keep the code nice. The previous discussion led to this
conclusion anyway so I don't know why we are debating it again.What is so ugly about it? I haven't looked at the code, but I am
curious to know.It used a "SELECT * FROM %s" string that was passed back to the parser.
I also don't recall the consensus being quite so clear cut. I guess
there is a case for saying that if it's not allowed then you know that
"COPY relname TO" is going to be fast. But, code aesthetics aside, the
reasons for disallowing it seem a bit thin, to me.I would say the timing difference between
"COPY table TO" and "COPY (SELECT * FROM table) TO"
was noise, so it's not even faster.Remember that we were talking about supporting views, not tables. And
if a view uses a slow query then you are in immediate danger of having a
slow COPY. This may not be a problem but it needs to be discussed and
an agreement must be reached before introducing such a change (and not
during feature freeze).
COPY relname TO meant tables _and_ views to me.
My previous tsting showed no difference between
COPY table TO and COPY (SELECT * FROM table) TO.
Similarly a slow query defined in the view should show
no difference between COPY view TO and
COPY (SELECT * FROM view) TO.
And remember, Bruce put the original COPY view TO
patch into the unapplied queue, without the SELECT
feature.
Rewriting COPY view TO internally to
COPY (SELECT * FROM view) TO is very
straightforward, even if you think it's ugly.
BTW, why is it ugly if I call raw_parser()
from under src/backend/parser/*.c ?
It is on a query distinct to the query the parser
is currently running. Or is it the recursion
that bothers you? It's not a possible infinite
recursion.
And an updatable VIEW *may* allow COPY view FROM...
May I remind you that we've been in feature freeze for four weeks
already? Now it's *not* the time to be drooling over cool features that
would be nice to have
Noted. However, as the COPY view TO is
a straight internal rewrite, a COPY view FROM
could also be. Even if it's a long term development.
I wasn't proposing delaying beta.
Best regards,
Zolt�n B�sz�rm�nyi
Alvaro Herrera <alvherre@commandprompt.com> writes:
My question is, if we allow this:
copy (select * from view) to stdout;
(or to a file, whatever), is it enough for you? Or would you insist on
also having
copy view to stdout;
?
We can, and the posted patch does, support the first form, but not the
second. In fact I deliberately removed support for the second form for
Zolt�n's patch because it uglifies the surrounding code.
Personally, I have no moral objection to supporting the second form
as a special case of the general COPY-from-select feature, but if it
can't be done without uglifying the code then I'd agree with dropping
it. I guess the question is whether the uglification is intrinsic or
just a result of being descended from a poor original implementation.
The feature-freeze argument seems not relevant, given that the code
we had on the feature-freeze date did both things.
Has this patch settled to the point where I can review it, or is it
still in motion?
regards, tom lane
Alvaro Herrera wrote:
Hans-Juergen Schoenig wrote:
Remember that we were talking about supporting views, not tables. And
if a view uses a slow query then you are in immediate danger of having a
slow COPY. This may not be a problem but it needs to be discussed and
an agreement must be reached before introducing such a change (and not
during feature freeze).this will definitely be the case - however, this is not what it was made
for. it has not been made to be fast but it has been made to fulfill
some other task. the reason why this has been implemented is: consider a
large scale database containing hundreds of gigs of data. in our special
case we have to export in a flexible way. the data which has to be
exported comes from multiple tables (between 3 and 7 depending on the
data we are looking at in this project. the export has to be performed
in a flexible way and it needs certain parameters. defining tmp tables
and store the data in there is simply not "nice" at all. in most cases
exports want to transform data on the fly - speed is not as important as
flexibility here.My question is, if we allow this:
copy (select * from view) to stdout;
(or to a file, whatever), is it enough for you? Or would you insist on
also havingcopy view to stdout;
?
i would say that "copy view to stdout" is just some syntactic sugar (to
me at least). the important thing is that we add the flexibility of
SELECT to it. a view is nothing else than a rule on SELECT anyway. to be
honest i never thought about views when creating this copy idea.
however, i think it is not bad to have it because i have seen a couple
of times already that tables turn into views when new features are added
to an existing data structure . if we support copy on views this means
that exports can stay as they are even if the data structure is changed
in that way.
however, if people think that views are not needed that way it is still
a good solution as views are not the basic reason why this new
functionality is a good thing to have.
many thanks,
hans
--
Cybertec Geschwinde & Sch�nig GmbH
Sch�ngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at
Zoltan Boszormenyi wrote:
Alvaro Herrera �rta:
Remember that we were talking about supporting views, not tables. And
if a view uses a slow query then you are in immediate danger of having a
slow COPY. This may not be a problem but it needs to be discussed and
an agreement must be reached before introducing such a change (and not
during feature freeze).COPY relname TO meant tables _and_ views to me.
My previous tsting showed no difference between
COPY table TO and COPY (SELECT * FROM table) TO.
Similarly a slow query defined in the view should show
no difference between COPY view TO and
COPY (SELECT * FROM view) TO.
The difference is that we are giving a very clear distinction between a
table and a view. If we don't support the view in the direct COPY, but
instead insist that it be passed via a SELECT query, then the user will
be aware that it may be slow.
"relname" at this point may mean anything -- are you supporting
sequences and toast tables as well?
And remember, Bruce put the original COPY view TO
patch into the unapplied queue, without the SELECT
feature.
All sort of junk enters that queue so that's not an argument. (Not
meant to insult Bruce -- I'm just saying that he doesn't filter stuff.
We've had patches rejected from the queue before plenty of times.)
Rewriting COPY view TO internally to
COPY (SELECT * FROM view) TO is very
straightforward, even if you think it's ugly.
BTW, why is it ugly if I call raw_parser()
from under src/backend/parser/*.c ?
It is on a query distinct to the query the parser
is currently running. Or is it the recursion
that bothers you? It's not a possible infinite
recursion.
It's ugly because you are forcing the system to parse something that
was already parsed.
On the other hand I don't see why you are arguing in favor of a useless
feature whose coding is dubious; you can have _the same thing_ with nice
code and no discussion.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support