COPY with no WAL, in certain circumstances

Started by Simon Riggsover 19 years ago39 messageshackers
Jump to latest
#1Simon Riggs
simon@2ndQuadrant.com

http://archives.postgresql.org/pgsql-hackers/2006-10/msg01172.php

As discussed on -hackers, its possible to avoid writing any WAL at all
for COPY in these circumstances:

BEGIN;
CREATE TABLE foo..
COPY foo...
COMMIT;

BEGIN;
TRUNCATE foo..
COPY foo...
COMMIT;

The enclosed patch implements this, as discussed. There is no user
interface to enable/disable, just as with CTAS and CREATE INDEX; no
docs, just code comments.

This plays nicely with the --single-transaction option in psql to allow
fast restores/upgrades.

YMMV but disk bound COPY will benefit greatly from this patch, some
tests showing 100% gain. COPY is still *very* CPU intensive, so some
tests have shown negligible benefit, fyi, but that isn't the typical
case.

Applies cleanly to CVS HEAD, passes make check.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

Attachments:

copy_nowal.v1.patchtext/x-patch; charset=UTF-8; name=copy_nowal.v1.patchDownload+187-58
#2Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#1)
Re: COPY with no WAL, in certain circumstances

FYI, I am going need to add documentation in the COPY manual page or no
one will know about this performance enhancement.

---------------------------------------------------------------------------

Simon Riggs wrote:

http://archives.postgresql.org/pgsql-hackers/2006-10/msg01172.php

As discussed on -hackers, its possible to avoid writing any WAL at all
for COPY in these circumstances:

BEGIN;
CREATE TABLE foo..
COPY foo...
COMMIT;

BEGIN;
TRUNCATE foo..
COPY foo...
COMMIT;

The enclosed patch implements this, as discussed. There is no user
interface to enable/disable, just as with CTAS and CREATE INDEX; no
docs, just code comments.

This plays nicely with the --single-transaction option in psql to allow
fast restores/upgrades.

YMMV but disk bound COPY will benefit greatly from this patch, some
tests showing 100% gain. COPY is still *very* CPU intensive, so some
tests have shown negligible benefit, fyi, but that isn't the typical
case.

Applies cleanly to CVS HEAD, passes make check.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

[ Attachment, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#3Joshua D. Drake
jd@commandprompt.com
In reply to: Bruce Momjian (#2)
Re: COPY with no WAL, in certain circumstances

On Sat, 2007-01-06 at 11:05 -0500, Bruce Momjian wrote:

FYI, I am going need to add documentation in the COPY manual page or no
one will know about this performance enhancement.

I have some questions:

As discussed on -hackers, its possible to avoid writing any WAL at all
for COPY in these circumstances:

BEGIN;
CREATE TABLE foo..
COPY foo...
COMMIT;

What if I do this?

BEGIN;
CREATE TABLE foo...
INSERT INTO foo VALUES ('1');
COPY foo...

COMMIT;

?

E.g., what are the boundaries of ignoring the WAL?

Joshua D. Drake

BEGIN;
TRUNCATE foo..
COPY foo...
COMMIT;

The enclosed patch implements this, as discussed. There is no user
interface to enable/disable, just as with CTAS and CREATE INDEX; no
docs, just code comments.

This plays nicely with the --single-transaction option in psql to allow
fast restores/upgrades.

YMMV but disk bound COPY will benefit greatly from this patch, some
tests showing 100% gain. COPY is still *very* CPU intensive, so some
tests have shown negligible benefit, fyi, but that isn't the typical
case.

Applies cleanly to CVS HEAD, passes make check.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

[ Attachment, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

#4Bruce Momjian
bruce@momjian.us
In reply to: Joshua D. Drake (#3)
Re: COPY with no WAL, in certain circumstances

Joshua D. Drake wrote:

On Sat, 2007-01-06 at 11:05 -0500, Bruce Momjian wrote:

FYI, I am going need to add documentation in the COPY manual page or no
one will know about this performance enhancement.

I have some questions:

As discussed on -hackers, its possible to avoid writing any WAL at all
for COPY in these circumstances:

BEGIN;
CREATE TABLE foo..
COPY foo...
COMMIT;

What if I do this?

BEGIN;
CREATE TABLE foo...
INSERT INTO foo VALUES ('1');
COPY foo...

COMMIT;

On ABORT, the entire table disappears, as well as the INSERT, so I don't
see any problem. I assume the INSERT is WAL logged.

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#2)
Re: COPY with no WAL, in certain circumstances

Bruce Momjian <bruce@momjian.us> writes:

FYI, I am going need to add documentation in the COPY manual page or no
one will know about this performance enhancement.

I don't think it belongs in COPY. What would make more sense is another
item under the "populating a database" performance tips, suggesting that
wrapping the restore into a single transaction is a good idea. We don't
really want to be documenting this separately under COPY, CREATE INDEX,
and everywhere else that might eventually optimize the case.

Come to think of it, that page also fails to suggest that PITR logging
shouldn't be on during bulk load.

regards, tom lane

#6Joshua D. Drake
jd@commandprompt.com
In reply to: Bruce Momjian (#4)
Re: COPY with no WAL, in certain circumstances

BEGIN;
CREATE TABLE foo...
INSERT INTO foo VALUES ('1');
COPY foo...

COMMIT;

On ABORT, the entire table disappears, as well as the INSERT, so I don't
see any problem. I assume the INSERT is WAL logged.

No I don't see any problems, I am just trying to understand the
boundaries. E.g., is there some weird limitation where if I have any
values in the table before the copy (like the example above) that copy
will go through WAL.

Or in other words, does this patch mean that all COPY execution that is
within a transaction will ignore WAL?

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

In reply to: Simon Riggs (#1)
Re: COPY with no WAL, in certain circumstances

Simon Riggs wrote:

As discussed on -hackers, its possible to avoid writing any WAL at all
for COPY in these circumstances:

Cool.

The enclosed patch implements this, as discussed. There is no user
interface to enable/disable, just as with CTAS and CREATE INDEX; no
docs, just code comments.

IMHO, this deserves an GUC parameter (use_wal_in_copy?). Because a lot
of people use COPY because it's faster than INSERT but expects that it
will be in WAL. The default would be use_wal_in_copy = true.

--
Euler Taveira de Oliveira
http://www.timbira.com/

#8Joshua D. Drake
jd@commandprompt.com
In reply to: Euler Taveira de Oliveira (#7)
Re: COPY with no WAL, in certain circumstances

On Sat, 2007-01-06 at 16:41 -0200, Euler Taveira de Oliveira wrote:

Simon Riggs wrote:

As discussed on -hackers, its possible to avoid writing any WAL at all
for COPY in these circumstances:

Cool.

The enclosed patch implements this, as discussed. There is no user
interface to enable/disable, just as with CTAS and CREATE INDEX; no
docs, just code comments.

IMHO, this deserves an GUC parameter (use_wal_in_copy?). Because a lot
of people use COPY because it's faster than INSERT but expects that it
will be in WAL. The default would be use_wal_in_copy = true.

That I don't think makes sense. A copy is an all or nothing option, if a
copy fails in the middle the whole thing is rolled back.

Sincerely,

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Euler Taveira de Oliveira (#7)
Re: COPY with no WAL, in certain circumstances

Euler Taveira de Oliveira <euler@timbira.com> writes:

Simon Riggs wrote:

The enclosed patch implements this, as discussed. There is no user
interface to enable/disable, just as with CTAS and CREATE INDEX; no
docs, just code comments.

IMHO, this deserves an GUC parameter (use_wal_in_copy?).

Why? The whole point is that it's automatic and transparent.

regards, tom lane

#10Bruce Momjian
bruce@momjian.us
In reply to: Joshua D. Drake (#6)
Re: COPY with no WAL, in certain circumstances

Joshua D. Drake wrote:

BEGIN;
CREATE TABLE foo...
INSERT INTO foo VALUES ('1');
COPY foo...

COMMIT;

On ABORT, the entire table disappears, as well as the INSERT, so I don't
see any problem. I assume the INSERT is WAL logged.

No I don't see any problems, I am just trying to understand the
boundaries. E.g., is there some weird limitation where if I have any
values in the table before the copy (like the example above) that copy
will go through WAL.

Or in other words, does this patch mean that all COPY execution that is
within a transaction will ignore WAL?

Yes, because it is possible to do in all cases.

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#11Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#10)
Re: COPY with no WAL, in certain circumstances

On Sat, 2007-01-06 at 15:24 -0500, Bruce Momjian wrote:

Joshua D. Drake wrote:

BEGIN;
CREATE TABLE foo...
INSERT INTO foo VALUES ('1');
COPY foo...

COMMIT;

On ABORT, the entire table disappears, as well as the INSERT, so I don't
see any problem. I assume the INSERT is WAL logged.

No I don't see any problems, I am just trying to understand the
boundaries. E.g., is there some weird limitation where if I have any
values in the table before the copy (like the example above) that copy
will go through WAL.

Or in other words, does this patch mean that all COPY execution that is
within a transaction will ignore WAL?

Yes, because it is possible to do in all cases.

Very happy to add documentation where Tom suggested.

Reason for no documentation was that CREATE INDEX and CREATE TABLE AS
SELECT already use this optimisation, but to my knowledge neither was/is
documented on those command pages.

The rule is: if the relfilenode for a table is new in this transaction
(and therefore the whole things will be dropped at end-of-transaction)
then *all* COPY commands are able to avoid writing WAL safely, if:
- PITR is not enabled
- there is no active portal (which could have been opened on an earlier
commandid and could therefore see data prior to the switch to the new
relfilenode). In those cases, *not* using WAL causes no problems at all,
so sleep well without it.

So all of these work as shown

BEGIN;
COPY foo... --uses WAL
TRUNCATE foo...
COPY foo.. --no WAL
COPY foo.. --no WAL
COMMIT;

BEGIN;
CREATE TABLE foo...
INSERT INTO foo --uses WAL
COPY foo.. --no WAL
INSERT INTO foo --uses WAL
COPY foo.. --no WAL
INSERT INTO foo --uses WAL
COPY foo... --no WAL
COMMIT;

BEGIN;
CREATE TABLE foo... AS SELECT
--no WAL
INSERT INTO foo --uses WAL
COPY foo.. --no WAL
COMMIT;

BEGIN;
DECLARE CURSOR cursor
CREATE TABLE foo...
COPY foo.. --uses WAL because active portal
COPY foo.. --uses WAL because active portal
CLOSE cursor
COPY foo.. --no WAL
COPY foo.. --no WAL
COMMIT;

psql --single-transaction -f mydb.pgdump

Come to think of it, I should be able to use
pg_current_xlog_insert_location() to come up with a test case.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#12Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#11)
Re: COPY with no WAL, in certain circumstances

Simon Riggs wrote:

Or in other words, does this patch mean that all COPY execution that is
within a transaction will ignore WAL?

Yes, because it is possible to do in all cases.

Very happy to add documentation where Tom suggested.

Reason for no documentation was that CREATE INDEX and CREATE TABLE AS
SELECT already use this optimisation, but to my knowledge neither was/is
documented on those command pages.

I wasn't aware those used the optimization. Seems they all should be
documented somewhere.

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In reply to: Joshua D. Drake (#8)
Re: COPY with no WAL, in certain circumstances

Joshua D. Drake wrote:

IMHO, this deserves an GUC parameter (use_wal_in_copy?). Because a lot
of people use COPY because it's faster than INSERT but expects that it
will be in WAL. The default would be use_wal_in_copy = true.

That I don't think makes sense. A copy is an all or nothing option, if a
copy fails in the middle the whole thing is rolled back.

I was worried about PITR, but Simon answers my question: PITR enables so
uses WAL.

--
Euler Taveira de Oliveira
http://www.timbira.com/

#14Robert Treat
xzilla@users.sourceforge.net
In reply to: Bruce Momjian (#12)
Re: COPY with no WAL, in certain circumstances

On Saturday 06 January 2007 16:40, Bruce Momjian wrote:

Simon Riggs wrote:

Or in other words, does this patch mean that all COPY execution that
is within a transaction will ignore WAL?

Yes, because it is possible to do in all cases.

Very happy to add documentation where Tom suggested.

Reason for no documentation was that CREATE INDEX and CREATE TABLE AS
SELECT already use this optimisation, but to my knowledge neither was/is
documented on those command pages.

I wasn't aware those used the optimization. Seems they all should be
documented somewhere.

Might I suggest somewhere under chapter 27, with something akin to what we
have for documenting lock levels and the different operations that use them.
We document the reasons you want to avoid WAL and various operations in the
database that do this automagically.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

#15Robert Treat
xzilla@users.sourceforge.net
In reply to: Simon Riggs (#11)
Re: COPY with no WAL, in certain circumstances

On Saturday 06 January 2007 16:36, Simon Riggs wrote:

The rule is: if the relfilenode for a table is new in this transaction
(and therefore the whole things will be dropped at end-of-transaction)
then *all* COPY commands are able to avoid writing WAL safely, if:
- PITR is not enabled
- there is no active portal (which could have been opened on an earlier
commandid and could therefore see data prior to the switch to the new
relfilenode). In those cases, *not* using WAL causes no problems at all,
so sleep well without it.

<snip>

BEGIN;
CREATE TABLE foo...
INSERT INTO foo --uses WAL
COPY foo.. --no WAL
INSERT INTO foo --uses WAL
COPY foo.. --no WAL
INSERT INTO foo --uses WAL
COPY foo... --no WAL
COMMIT;

Is there some technical reason that the INSERT statements need to use WAL in
these scenarios? ISTM that in the above scenario there are no cases where
the INSERT statements are any more recoverable than the COPY statements.
While there might not be much gain from bypassing WAL on a single insert, in
bunches, or more importantly when doing INSERT INTO foo SELECT *, it could be
a nice improvement as well. Am I overlooking something?

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#11)
Re: COPY with no WAL, in certain circumstances

"Simon Riggs" <simon@2ndquadrant.com> writes:

The rule is: if the relfilenode for a table is new in this transaction
(and therefore the whole things will be dropped at end-of-transaction)
then *all* COPY commands are able to avoid writing WAL safely, if:
- PITR is not enabled
- there is no active portal (which could have been opened on an earlier
commandid and could therefore see data prior to the switch to the new
relfilenode). In those cases, *not* using WAL causes no problems at all,
so sleep well without it.

Uh ... what in the world has an active portal got to do with it?
I think you've confused snapshot considerations with crash recovery.

regards, tom lane

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#12)
Re: COPY with no WAL, in certain circumstances

Bruce Momjian <bruce@momjian.us> writes:

Simon Riggs wrote:

Reason for no documentation was that CREATE INDEX and CREATE TABLE AS
SELECT already use this optimisation, but to my knowledge neither was/is
documented on those command pages.

I wasn't aware those used the optimization. Seems they all should be
documented somewhere.

We don't document every single optimization in the system ... if we did,
the docs would be as big as the source code and equally unreadable by
non-programmers. I think it's a much better idea just to mention it one
place and not try to enumerate exactly which commands have the optimization.

regards, tom lane

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Treat (#15)
Re: COPY with no WAL, in certain circumstances

Robert Treat <xzilla@users.sourceforge.net> writes:

On Saturday 06 January 2007 16:36, Simon Riggs wrote:
<snip>

BEGIN;
CREATE TABLE foo...
INSERT INTO foo --uses WAL
COPY foo.. --no WAL
INSERT INTO foo --uses WAL
COPY foo.. --no WAL
INSERT INTO foo --uses WAL
COPY foo... --no WAL
COMMIT;

Is there some technical reason that the INSERT statements need to use WAL in
these scenarios?

First, there's enough other overhead to an INSERT that you'd not save
much percentagewise. Second, not using WAL doesn't come for free: the
cost is having to fsync the whole table afterwards. So it really only
makes sense for commands that one can expect are writing pretty much
all of the table. I could easily see it being a net loss for individual
INSERTs.

regards, tom lane

#19Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#18)
Re: COPY with no WAL, in certain circumstances

Is there some technical reason that the INSERT statements need to use WAL in
these scenarios?

First, there's enough other overhead to an INSERT that you'd not save
much percentagewise. Second, not using WAL doesn't come for free: the
cost is having to fsync the whole table afterwards. So it really only
makes sense for commands that one can expect are writing pretty much
all of the table. I could easily see it being a net loss for individual
INSERTs.

What about multi value inserts? Just curious.

Joshua D. Drake

regards, tom lane

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua D. Drake (#19)
Re: COPY with no WAL, in certain circumstances

"Joshua D. Drake" <jd@commandprompt.com> writes:

cost is having to fsync the whole table afterwards. So it really only
makes sense for commands that one can expect are writing pretty much
all of the table. I could easily see it being a net loss for individual
INSERTs.

What about multi value inserts? Just curious.

I wouldn't want the system to assume that a multi-VALUES insert is
writing most of the table. Would you? The thing is reasonable for
inserting maybe a few hundred or few thousand rows at most, and that's
still small in comparison to typical tables.

regards, tom lane

#21Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#20)
#22Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#18)
#23Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#16)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#23)
#25Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#24)
#26Martijn van Oosterhout
kleptog@svana.org
In reply to: Simon Riggs (#25)
#27Simon Riggs
simon@2ndQuadrant.com
In reply to: Martijn van Oosterhout (#26)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#25)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#28)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#27)
#31Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#28)
#32Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#29)
#33Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#1)
#34Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#29)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#34)
#36Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#35)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#36)
#38Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#37)
#39Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#17)