Is full_page_writes=off safe in conjunction with PITR?

Started by Tom Lane, over 19 years ago, 36 messages
#1Tom Lane
tgl@sss.pgh.pa.us

While thinking about the patch I just made to allow full_page_writes to
be turned off again, it struck me that this patch only fixes the problem
for post-crash XLOG replay. There is still a hazard if the variable is
turned off in a PITR master system. The reason is that while a base
backup is being taken, the backup-taker might read an inconsistent state
of a page and include that in the backup. This is not a problem if
full_page_writes is ON --- it's logically equivalent to a torn page
write and will be fixed on the slave by XLOG replay. But it *is* a
problem if full_page_writes is OFF, for exactly the same reason that
torn page writes are a problem.

I think we had originally argued that there was no problem anyway
because the kernel should cause the page write to appear atomic to other
processes (since we issue it in a single write() command). But that's
only true if the backup-taker reads in units that are multiples of
BLCKSZ. If the backup-taker reads, say, 4K at a time then it's
certainly possible that it gets a later version of the second half of a
page than it got of the first half. I don't know about you, but I sure
don't feel comfortable making assumptions at that level about the
behavior of tar or cpio.

I fear we still have to disable full_page_writes (force it ON) if
XLogArchivingActive is on. Comments?

regards, tom lane
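
The torn-read hazard described above can be sketched with a toy model. This is only an illustration, not PostgreSQL or kernel code: it assumes the page write is atomic with respect to any single read, and the helper name `read_in_chunks` is made up for the example.

```python
BLCKSZ = 8192  # PostgreSQL's default page size


def read_in_chunks(storage, chunk_size, write_between=None):
    """Read one page from `storage` chunk by chunk.

    `write_between`, if given, is a (chunk_index, new_page) pair: the whole
    page is overwritten just before the reader fetches that chunk, which
    models a concurrent single write() of the full page.
    """
    out = bytearray()
    for index, offset in enumerate(range(0, BLCKSZ, chunk_size)):
        if write_between is not None and write_between[0] == index:
            storage[:] = write_between[1]
        out += storage[offset:offset + chunk_size]
    return bytes(out)


old_page = b"A" * BLCKSZ
new_page = b"B" * BLCKSZ

# A reader using 4K chunks: the page is rewritten between its two reads,
# so it sees the old first half and the new second half.
torn = read_in_chunks(bytearray(old_page), 4096, write_between=(1, new_page))

# A reader using one BLCKSZ-sized read has no second chunk for the write
# to slip in front of, so in this model it sees a consistent page.
whole = read_in_chunks(bytearray(old_page), BLCKSZ, write_between=(1, new_page))
```

In this model only the 4K reader ends up with a mixed page. Whether a real kernel gives even the BLCKSZ-sized reader that guarantee is exactly what is questioned later in the thread.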

#2Hannu Krosing
hannu@skype.net
In reply to: Tom Lane (#1)
Re: Is full_page_writes=off safe in conjunction with

On a fine day, Fri, 2006-04-14 at 16:40, Tom Lane wrote:

I think we had originally argued that there was no problem anyway
because the kernel should cause the page write to appear atomic to other
processes (since we issue it in a single write() command). But that's
only true if the backup-taker reads in units that are multiples of
BLCKSZ. If the backup-taker reads, say, 4K at a time then it's
certainly possible that it gets a later version of the second half of a
page than it got of the first half. I don't know about you, but I sure
don't feel comfortable making assumptions at that level about the
behavior of tar or cpio.

I fear we still have to disable full_page_writes (force it ON) if
XLogArchivingActive is on. Comments?

Why not just tell the backup-taker to take backups using 8K pages ?

---------------
Hannu

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#2)
Re: Is full_page_writes=off safe in conjunction with PITR?

Hannu Krosing <hannu@skype.net> writes:

On a fine day, Fri, 2006-04-14 at 16:40, Tom Lane wrote:

If the backup-taker reads, say, 4K at a time then it's
certainly possible that it gets a later version of the second half of a
page than it got of the first half. I don't know about you, but I sure
don't feel comfortable making assumptions at that level about the
behavior of tar or cpio.

I fear we still have to disable full_page_writes (force it ON) if
XLogArchivingActive is on. Comments?

Why not just tell the backup-taker to take backups using 8K pages ?

How? (No, I don't think tar's blocksize options control this
necessarily --- those indicate the blocking factor on the *tape*.
And not everyone uses tar anyway.)

Even if this would work for all popular backup programs, it seems
far too fragile: the consequence of forgetting the switch would be
silent data corruption, which you might not notice until the slave
had been in live operation for some time.

regards, tom lane

#4Noname
markir@paradise.net.nz
In reply to: Tom Lane (#1)
Re: Is full_page_writes=off safe in conjunction with PITR?

Quoting Tom Lane <tgl@sss.pgh.pa.us>:

I fear we still have to disable full_page_writes (force it ON) if
XLogArchivingActive is on. Comments?

Yeah - if you are enabling PITR, then you care about safety and integrity, so it
makes sense (well, to me anyway).

Cheers

Mark

#5Florian Weimer
fw@deneb.enyo.de
In reply to: Tom Lane (#1)
Re: Is full_page_writes=off safe in conjunction with PITR?

* Tom Lane:

I think we had originally argued that there was no problem anyway
because the kernel should cause the page write to appear atomic to other
processes (since we issue it in a single write() command).

I doubt Linux makes any such guarantees. See this recent thread on
linux-kernel: <http://marc.theaimsgroup.com/?t=114489284200003>

#6Hannu Krosing
hannu@skype.net
In reply to: Tom Lane (#3)
Re: Is full_page_writes=off safe in conjunction with

On a fine day, Fri, 2006-04-14 at 17:31, Tom Lane wrote:

Hannu Krosing <hannu@skype.net> writes:

On a fine day, Fri, 2006-04-14 at 16:40, Tom Lane wrote:

If the backup-taker reads, say, 4K at a time then it's
certainly possible that it gets a later version of the second half of a
page than it got of the first half. I don't know about you, but I sure
don't feel comfortable making assumptions at that level about the
behavior of tar or cpio.

I fear we still have to disable full_page_writes (force it ON) if
XLogArchivingActive is on. Comments?

Why not just tell the backup-taker to take backups using 8K pages ?

How?

Use find + dd, or whatever. I just don't want it to be made universally
unavailable just because some users *might* use a file/disk-level
backup solution which is incompatible.

(No, I don't think tar's blocksize options control this
necessarily --- those indicate the blocking factor on the *tape*.
And not everyone uses tar anyway.)

If I'm desperate enough to get the 2x reduction of WAL writes, I may
even write my own backup solution.

Even if this would work for all popular backup programs, it seems
far too fragile: the consequence of forgetting the switch would be
silent data corruption, which you might not notice until the slave
had been in live operation for some time.

We may declare only one solution to be supported by us with
XLogArchivingActive, say a gnu tar modified to read in Nx8K blocks
( pg_tar :p ).

I guess that even if we can control what operating system does, it is
still possible to get a torn page using some SAN solution, where you can
freeze the image for backup independent of OS.

----------------
Hannu

#7Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#3)
Re: Is full_page_writes=off safe in conjunction with PITR?

Tom Lane wrote:

Hannu Krosing <hannu@skype.net> writes:

On a fine day, Fri, 2006-04-14 at 16:40, Tom Lane wrote:

If the backup-taker reads, say, 4K at a time then it's
certainly possible that it gets a later version of the second half of a
page than it got of the first half. I don't know about you, but I sure
don't feel comfortable making assumptions at that level about the
behavior of tar or cpio.

I fear we still have to disable full_page_writes (force it ON) if
XLogArchivingActive is on. Comments?

Why not just tell the backup-taker to take backups using 8K pages ?

How? (No, I don't think tar's blocksize options control this
necessarily --- those indicate the blocking factor on the *tape*.
And not everyone uses tar anyway.)

Even if this would work for all popular backup programs, it seems
far too fragile: the consequence of forgetting the switch would be
silent data corruption, which you might not notice until the slave
had been in live operation for some time.

Yea, it is a problem. Even a 10k read is going to read 2k into the next
page.

I am thinking we should throw an error on pg_start_backup() and
pg_stop_backup if full_page_writes is off. Seems archive_command and
full_page_writes can still be used if we are not in the process of doing
a file system backup.

In fact, could we have pg_start_backup() turn on full_page_writes and
have pg_stop_backup() turn it off again, if postgresql.conf has it off?

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
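
The arithmetic behind the "10k read" remark can be sketched as follows. The helper `page_coverage` is a hypothetical name for this illustration; it only computes how many bytes of each 8K page a single read touches, assuming the default BLCKSZ of 8192.

```python
BLCKSZ = 8192  # assumed page size (PostgreSQL's default)


def page_coverage(offset, length, blcksz=BLCKSZ):
    """Map each page touched by a read to the number of bytes read from it."""
    coverage = {}
    end = offset + length
    page = offset // blcksz
    while page * blcksz < end:
        # Clip the read to the boundaries of the current page.
        start_in_page = max(offset, page * blcksz)
        end_in_page = min(end, (page + 1) * blcksz)
        coverage[page] = end_in_page - start_in_page
        page += 1
    return coverage


# A 10k read starting at offset 0 takes all of page 0 and 2k of page 1.
first_10k_read = page_coverage(0, 10 * 1024)
```

Any page with fewer than BLCKSZ bytes covered by one read is split across two reads, and so is exposed to a write landing in between.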

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#7)
Re: Is full_page_writes=off safe in conjunction with PITR?

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I am thinking we should throw an error on pg_start_backup() and
pg_stop_backup if full_page_writes is off.

No, we'll just change the test in xlog.c so that fullPageWrites is
ignored if XLogArchivingActive.

Seems archive_command and
full_page_writes can still be used if we are not in the process of doing
a file system backup.

Think harder: we are only safe if the first write to a given page after
it's mis-copied by the archiver is a full page write. The requirement
therefore continues after pg_stop_backup. Unless you want to add
infrastructure to keep track for *every page* in the DB of whether it's
been fully written since the last backup?

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#6)
Re: Is full_page_writes=off safe in conjunction with PITR?

Hannu Krosing <hannu@skype.net> writes:

If I'm desperate enough to get the 2x reduction of WAL writes, I may
even write my own backup solution.

Given Florian's concern, sounds like you might have to write your own
kernel too. In which case, generating a variant build of Postgres
that allows full_page_writes to be disabled is certainly not beyond
your powers. But for the ordinary mortal DBA, I think this combination
is just too unsafe to even consider.

regards, tom lane

#10Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#8)
Re: Is full_page_writes=off safe in conjunction with PITR?

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I am thinking we should throw an error on pg_start_backup() and
pg_stop_backup if full_page_writes is off.

No, we'll just change the test in xlog.c so that fullPageWrites is
ignored if XLogArchivingActive.

We should probably throw a LOG message too.

Seems archive_command and
full_page_writes can still be used if we are not in the process of doing
a file system backup.

Think harder: we are only safe if the first write to a given page after
it's mis-copied by the archiver is a full page write. The requirement
therefore continues after pg_stop_backup. Unless you want to add
infrastructure to keep track for *every page* in the DB of whether it's
been fully written since the last backup?

Ah, yea.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com


#11Hannu Krosing
hannu@skype.net
In reply to: Tom Lane (#9)
Re: Is full_page_writes=off safe in conjunction with

On a fine day, Sat, 2006-04-15 at 11:49, Tom Lane wrote:

Hannu Krosing <hannu@skype.net> writes:

If I'm desperate enough to get the 2x reduction of WAL writes, I may
even write my own backup solution.

Given Florian's concern, sounds like you might have to write your own
kernel too. In which case, generating a variant build of Postgres
that allows full_page_writes to be disabled is certainly not beyond
your powers. But for the ordinary mortal DBA, I think this combination
is just too unsafe to even consider.

I guess that writing our own pg_tar, which cooperates with postgres
backends to get full pages, is still in the realm of possible things,
even on kernels which don't guarantee atomic visibility of write() calls.

But until such is included in the distribution it is a good idea indeed
to disable full_page_writes=off when doing PITR.

--------------
Hannu

#12Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Hannu Krosing (#11)
Re: Is full_page_writes=off safe in conjunction with

Hannu Krosing wrote:

On a fine day, Sat, 2006-04-15 at 11:49, Tom Lane wrote:

Hannu Krosing <hannu@skype.net> writes:

If I'm desperate enough to get the 2x reduction of WAL writes, I may
even write my own backup solution.

Given Florian's concern, sounds like you might have to write your own
kernel too. In which case, generating a variant build of Postgres
that allows full_page_writes to be disabled is certainly not beyond
your powers. But for the ordinary mortal DBA, I think this combination
is just too unsafe to even consider.

I guess that writing our own pg_tar, which cooperates with postgres
backends to get full pages, is still in the realm of possible things,
even on kernels which don't guarantee atomic visibility of write() calls.

But until such is included in the distribution it is a good idea indeed
to disable full_page_writes=off when doing PITR.

The cost/benefit of that seems very discouraging. Most backup
applications allow for a block size to be specified, so it isn't
unreasonable to assume that people who really want PITR and
full_page_writes can easily set the block size to 8k. However, I don't
think we are going to allow that to be configured --- you would have to
hack up our backend code to allow the combination.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com


#13Marko Kreen
markokr@gmail.com
In reply to: Bruce Momjian (#12)
Re: Is full_page_writes=off safe in conjunction with

On 4/16/06, Bruce Momjian <pgman@candle.pha.pa.us> wrote:

Hannu Krosing wrote:

I guess that writing our own pg_tar, which cooperates with postgres
backends to get full pages, is still in the realm of possible things,
even on kernels which don't guarantee atomic visibility of write() calls.

But until such is included in the distribution it is a good idea indeed
to disable full_page_writes=off when doing PITR.

The cost/benefit of that seems very discouraging. Most backup
applications allow for a block size to be specified, so it isn't
unreasonable to assume that people who really want PITR and
full_page_writes can easily set the block size to 8k. However, I don't
think we are going to allow that to be configured --- you would have to
hack up our backend code to allow the combination.

The problem is that they let you configure the _target_ block size,
not the read block size. I did some tests with strace:

* GNU cpio version 2.5

Only the output block size can be changed; the input block size is
512 bytes. Maybe it uses the device's block size?

* tar (GNU tar) 1.15.1

The '-b' and '--record-size' options also change the input block
size, but to produce an 8192-byte output block the first read is
7680 bytes, to make room for the tar header. The remaining reads are
indeed 8192 bytes, but by then they are no longer aligned to page
boundaries, so that doesn't help us.

* cp (coreutils) 5.2.1

Fixed block size of 4096 bytes.

* rsync version 2.6.5

It has no way to change the input block size, but it seems to read
in 32k blocks, or the whole file if its length is < 32k.

So we should probably document that rsync is the only working solution.

--
marko
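
For comparison with the strace results above, here is a minimal sketch of a copy loop that issues exactly one read() per 8K block, in the spirit of the "find + dd" suggestion earlier in the thread. The function name `copy_in_blocks` is hypothetical, and the sketch assumes BLCKSZ is 8192. It only controls the size and alignment of the reads; it does not make the kernel treat a concurrent page write as atomic.

```python
import os

BLCKSZ = 8192  # assumed page size (PostgreSQL's default)


def copy_in_blocks(src_path, dst_path, block_size=BLCKSZ):
    """Copy a file issuing one read() per block; return the sizes read."""
    sizes = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        with open(dst_path, "wb") as dst:
            while True:
                # os.read maps to a single read() call of at most block_size.
                block = os.read(src, block_size)
                if not block:
                    break
                sizes.append(len(block))
                dst.write(block)
    finally:
        os.close(src)
    return sizes
```

The returned list makes it easy to check that every read was a whole block, which is the property the tools tested above could not be configured to provide.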

#14Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#8)
Re: Is full_page_writes=off safe in conjunction with

On Sat, 2006-04-15 at 11:45 -0400, Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I am thinking we should throw an error on pg_start_backup() and
pg_stop_backup if full_page_writes is off.

No, we'll just change the test in xlog.c so that fullPageWrites is
ignored if XLogArchivingActive.

I can see the danger of which you speak, but does it necessarily apply
to all forms of backup?

Seems archive_command and
full_page_writes can still be used if we are not in the process of doing
a file system backup.

Think harder: we are only safe if the first write to a given page after
it's mis-copied by the archiver is a full page write. The requirement
therefore continues after pg_stop_backup. Unless you want to add
infrastructure to keep track for *every page* in the DB of whether it's
been fully written since the last backup?

It seems that we should write an API to allow a backup device to ask for
blocks from the database.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com/

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#13)
Re: Is full_page_writes=off safe in conjunction with

"Marko Kreen" <markokr@gmail.com> writes:

So we should probably document that rsync is the only working solution.

No, we're just turning off the variable. One experiment on one version
of rsync doesn't prove it's "safe", even if there weren't the kernel-
behavior issue to consider.

regards, tom lane

#16Martijn van Oosterhout
kleptog@svana.org
In reply to: Hannu Krosing (#6)
Re: Is full_page_writes=off safe in conjunction with

On Sat, Apr 15, 2006 at 01:31:58PM +0300, Hannu Krosing wrote:

(No, I don't think tar's blocksize options control this
necessarily --- those indicate the blocking factor on the *tape*.
And not everyone uses tar anyway.)

If I'm desperate enough to get the 2x reduction of WAL writes, I may
even write my own backup solution.

I must be missing something obvious, but why don't we compress the
xlogs? They appear to be quite compressible (>75%) with standard gzip...
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

#17Hannu Krosing
hannu@skype.net
In reply to: Tom Lane (#15)
Re: Is full_page_writes=off safe in conjunction with

On a fine day, Sun, 2006-04-16 at 11:31, Tom Lane wrote:

"Marko Kreen" <markokr@gmail.com> writes:

So we should probably document that rsync is the only working solution.

No, we're just turning off the variable. One experiment on one version
of rsync doesn't prove it's "safe", even if there weren't the kernel-
behavior issue to consider.

But if we do need to consider the kernel-level behaviour mentioned, then
the whole PITR thing becomes an impossibility. Consider the case when we
get a torn page during the initial copy with tar/cpio/rsync/whatever,
and no WAL record updates it.

In that case we will just have a torn page in backup with no way to fix
it.

-------------
Hannu

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#17)
Re: Is full_page_writes=off safe in conjunction with

Hannu Krosing <hannu@skype.net> writes:

But if we do need to consider the kernel-level behaviour mentioned, then
the whole PITR thing becomes an impossibility. Consider the case when we
get a torn page during the initial copy with tar/cpio/rsync/whatever,
and no WAL record updates it.

The only way the backup program could read a torn page is if the
database is writing that page concurrently, in which case there must
be a WAL record for the action.

This was all thought through carefully when the PITR mechanism was
designed, and it is solid -- as long as we are doing full-page writes.
Unfortunately, certain people forced that feature into 8.1 without
adequate review of the system's assumptions ...

regards, tom lane

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#14)
Re: Is full_page_writes=off safe in conjunction with

Simon Riggs <simon@2ndquadrant.com> writes:

On Sat, 2006-04-15 at 11:45 -0400, Tom Lane wrote:

No, we'll just change the test in xlog.c so that fullPageWrites is
ignored if XLogArchivingActive.

I can see the danger of which you speak, but does it necessarily apply
to all forms of backup?

No, but the problem is we're not sure which forms are safe; it appears
to depend on poorly-documented details of behavior of both the kernel
and the backup program --- details that might well vary from one version
to the next even of the "same" program. Given the variety of platforms
PG runs on, I can't see us expending the effort to try to monitor which
combinations it might be safe to not use full_page_writes with.

It seems that we should write an API to allow a backup device to ask for
blocks from the database.

I don't think we have the manpower or interest to develop and maintain
our own backup tool --- or tools, actually, as you'd at least want a tar
replacement and an rsync replacement. Oracle might be able to afford
to throw programmers at that sort of thing, but where are you going to
get volunteers for tasks as mind-numbing as maintaining a PG-specific
tar replacement?

regards, tom lane

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#16)
Re: Is full_page_writes=off safe in conjunction with

Martijn van Oosterhout <kleptog@svana.org> writes:

I must be missing something obvious, but why don't we compress the
xlogs? They appear to be quite compressible (>75%) with standard gzip...

Might be worth experimenting with, but I'm a bit dubious. We've seen
several tests showing that XLogInsert's calculation of a CRC for each
WAL record is a bottleneck (that's why we backed off from 64-bit CRC
to 32-bit recently). I'd think that any nontrivial compression
algorithm would be vastly slower than CRC ...

regards, tom lane
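
The CRC-versus-compression comparison could be tried along these lines. This is a rough sketch only: it times Python's zlib bindings, not XLogInsert or PostgreSQL's own CRC code, and the 8 kB sample payload is an assumed stand-in for WAL data rather than a real WAL record.

```python
import time
import zlib


def time_crc_and_compress(payload, repeats=200):
    """Return (crc_seconds, compress_seconds) for `repeats` passes over payload."""
    start = time.perf_counter()
    for _ in range(repeats):
        zlib.crc32(payload)
    crc_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(repeats):
        zlib.compress(payload)
    compress_seconds = time.perf_counter() - start
    return crc_seconds, compress_seconds


# Hypothetical stand-in for WAL data: 8 kB of mildly repetitive bytes.
sample = bytes(range(256)) * 32
crc_seconds, compress_seconds = time_crc_and_compress(sample)
```

The two timings depend heavily on the machine, the payload, and the compression level, so any conclusion should come from measurements on real WAL segments.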

#21Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Bruce Momjian (#7)
Re: Is full_page_writes=off safe in conjunction with PITR?

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I am thinking we should throw an error on pg_start_backup() and
pg_stop_backup if full_page_writes is off.

No, we'll just change the test in xlog.c so that fullPageWrites is
ignored if XLogArchivingActive.

Seems archive_command and
full_page_writes can still be used if we are not in the process of doing
a file system backup.

Think harder: we are only safe if the first write to a given page after
it's mis-copied by the archiver is a full page write. The requirement
therefore continues after pg_stop_backup. Unless you want to add
infrastructure to keep track for *every page* in the DB of whether it's
been fully written since the last backup?

I am confused. Since we checkpoint during pg_start_backup(), isn't any
write to a file while the tar backup is going on going to be a full page
write? And once we pg_stop_backup(), do we need full page writes?

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com


#22Jim C. Nasby
jnasby@pervasive.com
In reply to: Tom Lane (#19)
Re: Is full_page_writes=off safe in conjunction with

On Sun, Apr 16, 2006 at 04:44:50PM -0400, Tom Lane wrote:

It seems that we should write an API to allow a backup device to ask for
blocks from the database.

I don't think we have the manpower or interest to develop and maintain
our own backup tool --- or tools, actually, as you'd at least want a tar
replacement and an rsync replacement. Oracle might be able to afford
to throw programmers at that sort of thing, but where are you going to
get volunteers for tasks as mind-numbing as maintaining a PG-specific
tar replacement?

Why would it have to replicate the functionality of tar or rsync? AFAICT
we'd only need the ability to produce something that could be consumed
by either a postgres backend or some other utility of our own creation.
I also think it'd be fine to forgo the rsync capabilities, at least in
an initial version.

Come to think of it, someone not too long ago was proposing an API to
allow a 'PITR slave' to subscribe to a master for WAL segments/changes;
it seems logical to me for that API to also provide the ability to send
relation data as well.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#21)
Re: Is full_page_writes=off safe in conjunction with PITR?

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Think harder: we are only safe if the first write to a given page after
it's mis-copied by the archiver is a full page write. The requirement
therefore continues after pg_stop_backup. Unless you want to add
infrastructure to keep track for *every page* in the DB of whether it's
been fully written since the last backup?

I am confused. Since we checkpoint during pg_start_backup(), isn't any
write to a file while the tar backup is going on going to be a full page
write? And once we pg_stop_backup(), do we need full page writes?

Hm. The case I was concerned about was where a page is never written
to while the backup occurs (thus not triggering any full-page WAL
entry), and then the first post-backup write is partial. However, if
the backup is guaranteed to have captured a non-torn copy of such a page
then there shouldn't be any problem. So if we consider the initial
checkpoint to be a *required part* of pg_start_backup (right now it is
not) then maybe we can get away with this. It needs more eyeballs on it
though ... after having been burnt once by full_page_writes, I'm pretty
shy ...

regards, tom lane

#24Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#23)
Re: Is full_page_writes=off safe in conjunction with PITR?

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Think harder: we are only safe if the first write to a given page after
it's mis-copied by the archiver is a full page write. The requirement
therefore continues after pg_stop_backup. Unless you want to add
infrastructure to keep track for *every page* in the DB of whether it's
been fully written since the last backup?

I am confused. Since we checkpoint during pg_start_backup(), isn't any
write to a file while the tar backup is going on going to be a full page
write? And once we pg_stop_backup(), do we need full page writes?

Hm. The case I was concerned about was where a page is never written
to while the backup occurs (thus not triggering any full-page WAL
entry), and then the first post-backup write is partial. However, if
the backup is guaranteed to have captured a non-torn copy of such a page
then there shouldn't be any problem. So if we consider the initial
checkpoint to be a *required part* of pg_start_backup (right now it is
not) then maybe we can get away with this. It needs more eyeballs on it
though ... after having been burnt once by full_page_writes, I'm pretty
shy ...

Right. The comment in pg_start_backup() has to be updated:

    /*
     * Force a CHECKPOINT. This is not strictly necessary, but it seems like
     * a good idea to minimize the amount of past WAL needed to use the
     * backup. Also, this guarantees that two successive backup runs will
     * have different checkpoint positions and hence different history file
     * names, even if nothing happened in between.
     */
    RequestCheckpoint(true, false);

This is a much simpler fix than people talking about writing their own
backup programs.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com


#25Joshua D. Drake
jd@commandprompt.com
In reply to: Jim C. Nasby (#22)
Re: Is full_page_writes=off safe in conjunction with

Come to think of it, someone not too long ago was proposing an API to
allow a 'PITR slave' to subscribe to a master for WAL segments/changes;
it seems logical to me for that API to also provide the ability to send
relation data as well.

Is that what replication is for?

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#24)
Re: Is full_page_writes=off safe in conjunction with PITR?

Bruce Momjian <pgman@candle.pha.pa.us> writes:

This is a much simpler fix than people talking about writing their own
backup programs.

Well, it's still not exactly trivial. The hack that was being proposed
involved having the admin manually do

full_page_writes = ON (ie, edit config file and SIGHUP)
pg_start_backup
take backup dump
pg_stop_backup
full_page_writes = OFF (ie, edit config file and SIGHUP)

with some additions to pg_start_backup/pg_stop_backup to complain if
full_page_writes isn't ON. Aside from being a PITA, this isn't at
all secure, first for the obvious reason that we're only checking
full_page_writes at start/stop and not whether it was on for the whole
interval, and second because SIGHUP is asynchronous. Backends respond
to the signal when they feel like it (in practice, upon starting a new
interactive command) and so it'd be quite possible for a long-running
query to still be executing with full_page_writes off long after the
pg_start_backup has occurred.

If we were to do this, I'd want some more-bulletproof mechanism for
forcing full_page_writes on during the backup. We could probably
keep a "backup in progress" flag in shared memory, and examine that
along with the GUC variable before deciding to omit a full-page write.

I seem to recall that there were previous proposals for such a flag,
which I resisted because I didn't want any macroscopic user-visible
change in behavior during a backup. But forcing full-page WAL writes
is something I could live with as a "backup mode" behavior.

regards, tom lane
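
The "backup in progress" idea can be sketched as a small decision function. This is an illustration of the logic described above, not the code in xlog.c: the function and parameter names are hypothetical, and it assumes the usual rule that a full-page image is only a candidate for the first change to a page after a checkpoint.

```python
def must_log_full_page(full_page_writes, backup_in_progress,
                       first_change_since_checkpoint):
    """Decide whether a WAL record should carry a full-page image.

    full_page_writes: the GUC setting as seen by this backend.
    backup_in_progress: a flag every backend reads from shared memory, so it
        takes effect for all of them at once (unlike a SIGHUP reload).
    first_change_since_checkpoint: True if the page has not been modified
        since the last checkpoint.
    """
    if not first_change_since_checkpoint:
        # A full-page image was already logged for this checkpoint cycle,
        # or was skipped under the rule below.
        return False
    # The shared flag overrides full_page_writes = off while a backup runs.
    return full_page_writes or backup_in_progress
```

With full_page_writes off, the function only returns True while the backup flag is set, which is the "backup mode" behavior proposed above.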

#27Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#26)
Re: Is full_page_writes=off safe in conjunction with PITR?

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

This is a much simpler fix than people talking about writing their own
backup programs.

Well, it's still not exactly trivial. The hack that was being proposed
involved having the admin manually do

full_page_writes = ON (ie, edit config file and SIGHUP)
pg_start_backup
take backup dump
pg_stop_backup
full_page_writes = OFF (ie, edit config file and SIGHUP)

with some additions to pg_start_backup/pg_stop_backup to complain if
full_page_writes isn't ON. Aside from being a PITA, this isn't at
all secure, first for the obvious reason that we're only checking
full_page_writes at start/stop and not whether it was on for the whole
interval, and second because SIGHUP is asynchronous. Backends respond
to the signal when they feel like it (in practice, upon starting a new
interactive command) and so it'd be quite possible for a long-running
query to still be executing with full_page_writes off long after the
pg_start_backup has occurred.

If we were to do this, I'd want some more-bulletproof mechanism for
forcing full_page_writes on during the backup. We could probably
keep a "backup in progress" flag in shared memory, and examine that
along with the GUC variable before deciding to omit a full-page write.

I seem to recall that there were previous proposals for such a flag,
which I resisted because I didn't want any macroscopic user-visible
change in behavior during a backup. But forcing full-page WAL writes
is something I could live with as a "backup mode" behavior.

Yes, good point. The setting has to be seen by all backends at the same
time, so yea, a shared memory variable seems required.

The manual method is clearly a loser.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com


#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#27)
Re: Is full_page_writes=off safe in conjunction with PITR?

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

If we were to do this, I'd want some more-bulletproof mechanism for
forcing full_page_writes on during the backup. We could probably
keep a "backup in progress" flag in shared memory, and examine that
along with the GUC variable before deciding to omit a full-page write.

Yes, good point. The setting has to be seen by all backends at the same
time, so yea, a shared memory variable seems required.

I've applied a patch for this. On reflection, the CHECKPOINT during
pg_start_backup was actually necessary for torn-page safety even without
full_page_writes off. The reason is that the torn-page risk occurs when
we write a page from shared memory, not when we modify it in memory.
Without a CHECKPOINT, a page modified just before pg_start_backup could
be dumped during the backup and then be saved in a torn state, even
though no WAL record for it is emitted anytime during the backup
procedure. So that comment's been wrong all along.

regards, tom lane

#29Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#28)
Re: Is full_page_writes=off safe in conjunction with PITR?

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

If we were to do this, I'd want some more-bulletproof mechanism for
forcing full_page_writes on during the backup. We could probably
keep a "backup in progress" flag in shared memory, and examine that
along with the GUC variable before deciding to omit a full-page write.

Yes, good point. The setting has to be seen by all backends at the same
time, so yea, a shared memory variable seems required.

I've applied a patch for this. On reflection, the CHECKPOINT during
pg_start_backup was actually necessary for torn-page safety even without
full_page_writes off. The reason is that the torn-page risk occurs when
we write a page from shared memory, not when we modify it in memory.
Without a CHECKPOINT, a page modified just before pg_start_backup could
be dumped during the backup and then be saved in a torn state, even
though no WAL record for it is emitted anytime during the backup
procedure. So that comment's been wrong all along.


Great, yea, a checkpoint syncs up the dirty buffers with the file system,
and it is true we need that to happen before the backup begins.

The idea of creating functions to mark start/stop of backup has clearly
been a win here.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#30Jim C. Nasby
jnasby@pervasive.com
In reply to: Tom Lane (#28)
Re: Is full_page_writes=off safe in conjunction with PITR?

On Mon, Apr 17, 2006 at 03:00:58PM -0400, Tom Lane wrote:

I've applied a patch for this. On reflection, the CHECKPOINT during
pg_start_backup was actually necessary for torn-page safety even without
full_page_writes off. The reason is that the torn-page risk occurs when
we write a page from shared memory, not when we modify it in memory.
Without a CHECKPOINT, a page modified just before pg_start_backup could
be dumped during the backup and then be saved in a torn state, even
though no WAL record for it is emitted anytime during the backup
procedure. So that comment's been wrong all along.

Are you going to back-patch this? If I understand correctly current
behavior could mean people using PITR may have invalid backups. In the
meantime, perhaps we should send an email to -announce recommending that
folks issue a CHECKPOINT; after pg_start_backup and before initiating
the filesystem copy.
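
The suggested interim procedure could be scripted roughly as follows. This is only a sketch in Python that builds the command sequence without running it. The label, data directory and destination path are hypothetical, psql connection options are omitted, and tar stands in for whatever copy tool is actually used.

```python
from typing import List


def backup_steps(label: str, data_dir: str, dest: str) -> List[List[str]]:
    """Build the command sequence for a base backup with an explicit CHECKPOINT.

    The label is interpolated into SQL as-is, so it must not contain quotes.
    """
    return [
        ["psql", "-c", f"SELECT pg_start_backup('{label}');"],
        # Extra step suggested above: flush dirty buffers before copying any files.
        ["psql", "-c", "CHECKPOINT;"],
        ["tar", "-cf", dest, "-C", data_dir, "."],
        ["psql", "-c", "SELECT pg_stop_backup();"],
    ]


# Hypothetical label and paths, shown only to print the order of operations.
for step in backup_steps("nightly", "/var/lib/pgsql/data", "/backup/base.tar"):
    print(" ".join(step))
```

The ordering is the point: the CHECKPOINT has to come after pg_start_backup and before the first file is copied.
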
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#31Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Jim C. Nasby (#30)
Re: Is full_page_writes=off safe in conjunction with PITR?

Jim C. Nasby wrote:

On Mon, Apr 17, 2006 at 03:00:58PM -0400, Tom Lane wrote:

I've applied a patch for this. On reflection, the CHECKPOINT during
pg_start_backup was actually necessary for torn-page safety even without
full_page_writes off. The reason is that the torn-page risk occurs when
we write a page from shared memory, not when we modify it in memory.
Without a CHECKPOINT, a page modified just before pg_start_backup could
be dumped during the backup and then be saved in a torn state, even
though no WAL record for it is emitted anytime during the backup
procedure. So that comment's been wrong all along.

Are you going to back-patch this? If I understand correctly current
behavior could mean people using PITR may have invalid backups. In the
meantime, perhaps we should send an email to -announce recommending that
folks issue a CHECKPOINT; after pg_start_backup and before initiating
the filesystem copy.

We are disabling the full_page_writes option (forcing it on) for 8.1.4, so they should be fine.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#32Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Bruce Momjian (#31)
Re: Is full_page_writes=off safe in conjunction with PITR?

Bruce Momjian wrote:

Jim C. Nasby wrote:

On Mon, Apr 17, 2006 at 03:00:58PM -0400, Tom Lane wrote:

I've applied a patch for this. On reflection, the CHECKPOINT during
pg_start_backup was actually necessary for torn-page safety even without
full_page_writes off. The reason is that the torn-page risk occurs when
we write a page from shared memory, not when we modify it in memory.
Without a CHECKPOINT, a page modified just before pg_start_backup could
be dumped during the backup and then be saved in a torn state, even
though no WAL record for it is emitted anytime during the backup
procedure. So that comment's been wrong all along.

Are you going to back-patch this? If I understand correctly current
behavior could mean people using PITR may have invalid backups. In the
meantime, perhaps we should send an email to -announce recommending that
folks issue a CHECKPOINT; after pg_start_backup and before initiating
the filesystem copy.

We are disabling the full_page_writes option (forcing it on) for 8.1.4, so they should be fine.

Just to clarify, 8.1.4 will remove the ability to turn off
full_page_writes, but 8.2 will allow that control again, and it can be
used with PITR because we will automatically turn full_page_writes on
during a file system backup.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#33Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#19)
Re: Is full_page_writes=off safe in conjunction with

On Sun, 2006-04-16 at 16:44 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

It seems that we should write an API to allow a backup device to ask for
blocks from the database.

I don't think we have the manpower or interest to develop and maintain
our own backup tool --- or tools, actually, as you'd at least want a tar
replacement and an rsync replacement. Oracle might be able to afford
to throw programmers at that sort of thing, but where are you going to
get volunteers for tasks as mind-numbing as maintaining a PG-specific
tar replacement?

Agreed. The only reason to do that would be to combine it with an
incremental backup solution also, so that some positive benefit also
came from the work.

I think an easier answer must be to make pg_start_backup() throw a
checkpoint, then hold any database writes until pg_stop_backup() is
called. (In the case of full_page_writes = off and fsync = on only).
That way all the data is fsynced to disk and the physical backup is
guaranteed to see whole blocks always, as we need it to.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com/

#34Hannu Krosing
hannu@skype.net
In reply to: Bruce Momjian (#31)
Re: Is full_page_writes=off safe in conjunction with

On one fine day, Mon, 2006-04-17 at 17:14, Bruce Momjian wrote:

Jim C. Nasby wrote:

Are you going to back-patch this? If I understand correctly current
behavior could mean people using PITR may have invalid backups. In the
meantime, perhaps we should send an email to -announce recommending that
folks issue a CHECKPOINT; after pg_start_backup and before initiating
the filesystem copy.

We are disabling the full_page_writes option (forcing it on) for 8.1.4, so they should be fine.

Except that people currently using full_page_writes=off on 8.1 may see a sudden
drop in performance after upgrading.

Do you have an estimate of how big the impact is?

-----------
Hannu

#35Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Hannu Krosing (#34)
Re: Is full_page_writes=off safe in conjunction with

Hannu Krosing wrote:

On one fine day, Mon, 2006-04-17 at 17:14, Bruce Momjian wrote:

Jim C. Nasby wrote:

Are you going to back-patch this? If I understand correctly current
behavior could mean people using PITR may have invalid backups. In the
meantime, perhaps we should send an email to -announce recommending that
folks issue a CHECKPOINT; after pg_start_backup and before initiating
the filesystem copy.

We are disabling the full_page_writes option (forcing it on) for 8.1.4, so they should be fine.

Except that people currently using full_page_writes=off on 8.1 may see a sudden
drop in performance after upgrading.

Yea, but if it can cause corruption, we have no choice. It will be
mentioned in the release notes.

Do you have an estimate of how big the impact is?

Nope.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#36Joshua D. Drake
jd@commandprompt.com
In reply to: Bruce Momjian (#35)
Re: Is full_page_writes=off safe in conjunction with

On Tue, 2006-04-18 at 08:44 -0400, Bruce Momjian wrote:

Hannu Krosing wrote:

On one fine day, Mon, 2006-04-17 at 17:14, Bruce Momjian wrote:

Jim C. Nasby wrote:

Are you going to back-patch this? If I understand correctly current
behavior could mean people using PITR may have invalid backups. In the
meantime, perhaps we should send an email to -announce recommending that
folks issue a CHECKPOINT; after pg_start_backup and before initiating
the filesystem copy.

We are disabling the full_page_writes option (forcing it on) for 8.1.4, so they should be fine.

Except that people currently using full_page_writes=off on 8.1 may see a sudden
drop in performance after upgrading.

Yea, but if it can cause corruption, we have no choice. It will be
mentioned in the release notes.

Perhaps we should make it more visible than that? The postgresql.org
website has said "PostgreSQL 8.1 released" ever since it was... perhaps it is
time to make it say:

PostgreSQL 8.1.4 Critical Patch released?

Joshua D. Drake

Do you have an estimate of how big the impact is?

Nope.

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/