Interaction of PITR backups and Bulk operations avoiding WAL

Started by Simon Riggs, almost 19 years ago, 20 messages
#1 Simon Riggs
simon@2ndquadrant.com

Reviewing earlier threads, I realised that there was a potential
bug/loophole in PITR backups in conjunction with avoiding WAL for bulk
operations. This would be rare, but should be fixed.
http://archives.postgresql.org/pgsql-hackers/2006-05/msg01113.php

Say you issue COPY, CREATE INDEX etc..
pg_start_backup()
pg_stop_backup()
...then bulk operation ends.
This will result in a base backup that does not contain the data written
during the bulk operation and the changes aren't in WAL either.

I propose to fix this by making two new calls
    bool RequestBulkCommandUseNoWAL(void)
    void ResetBulkCommandUseNoWAL(void)

so we would use it like this
    use_wal = RequestBulkCommandUseNoWAL()
and then at end of operation
    if (!use_wal)
        ResetBulkCommandUseNoWAL();

The routine would record a flag in the shmem ControlFile data that would
prevent pg_start_backup() from executing while a bulk operation
was in progress. It would also prevent a bulk operation from using no
WAL while a backup was in progress, as is already the case, since the
backup can only take place while archiving is enabled.
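
As a rough illustration of the interlock described above, here is a minimal,
single-process C sketch. It is not PostgreSQL code: the SketchControl struct,
its fields, and SketchStartBackup()/SketchStopBackup() are hypothetical
stand-ins, and only the two Request/Reset function names come from the
proposal. It assumes the return value means "caller must use WAL" (which is how
the usage snippet reads), and it uses a counter rather than a single flag so
that several bulk operations could overlap. A real implementation would need
locking around the shared state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared-memory control data (an assumption,
 * not PostgreSQL's real ControlFile layout). */
typedef struct SketchControl
{
    bool archiving_active;   /* stand-in for XLogArchivingActive() */
    bool backup_in_progress; /* true between start and stop of a backup */
    int  nowal_bulk_ops;     /* bulk operations currently skipping WAL */
} SketchControl;

static SketchControl ctl = {false, false, 0};

/* Returns true if the caller must write WAL, false if it may skip WAL.
 * When it returns false, the caller is registered as a no-WAL bulk op. */
bool RequestBulkCommandUseNoWAL(void)
{
    if (ctl.archiving_active || ctl.backup_in_progress)
        return true;            /* request denied: WAL is required */
    ctl.nowal_bulk_ops++;
    return false;
}

/* Called at the end of a bulk operation that was allowed to skip WAL. */
void ResetBulkCommandUseNoWAL(void)
{
    assert(ctl.nowal_bulk_ops > 0);
    ctl.nowal_bulk_ops--;
}

/* Hypothetical backup entry point: refuses to start while any no-WAL
 * bulk operation is still running. Returns true if the backup started. */
bool SketchStartBackup(void)
{
    if (ctl.nowal_bulk_ops > 0)
        return false;
    ctl.backup_in_progress = true;
    return true;
}

void SketchStopBackup(void)
{
    ctl.backup_in_progress = false;
}
```

In this sketch a bulk command would call RequestBulkCommandUseNoWAL() once at
start, skip WAL only if it returns false, and call ResetBulkCommandUseNoWAL()
at the end in that case.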

A new entry point pg_start_backup(text, bool) would allow the user to
specify whether to wait for bulk ops to finish, or not. The old entry
point would always wait, to ensure safety in all cases.

Thoughts?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#1)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

"Simon Riggs" <simon@2ndquadrant.com> writes:

> Say you issue COPY, CREATE INDEX etc..
> pg_start_backup()
> pg_stop_backup()
> ...then bulk operation ends.
> This will result in a base backup that does not contain the data written
> during the bulk operation and the changes aren't in WAL either.

Uh, no. The state of XLogArchivingActive() isn't affected by that.

It strikes me that allowing archive_command to be changed on the fly
might not be such a good idea though, or at least it shouldn't be
possible to flip it from empty to nonempty during live operation.

regards, tom lane

#3 Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

On Fri, 2007-03-09 at 11:15 -0500, Tom Lane wrote:

> "Simon Riggs" <simon@2ndquadrant.com> writes:
>
> > Say you issue COPY, CREATE INDEX etc..
> > pg_start_backup()
> > pg_stop_backup()
> > ...then bulk operation ends.
> > This will result in a base backup that does not contain the data written
> > during the bulk operation and the changes aren't in WAL either.
>
> Uh, no. The state of XLogArchivingActive() isn't affected by that.

Sorry, the error case should have been:

Say you issue COPY, CREATE INDEX etc..
set archive_command
pg_ctl reload
pg_start_backup()
pg_stop_backup()
...then bulk operation ends.
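
The window can be illustrated with a toy C model (a sketch only; every name in
it is made up for illustration, and it tracks page counts rather than real
files). It assumes the bulk operation decides once, at start, whether to write
WAL, so turning archiving on later does not change that decision, and anything
it writes after the base backup completes ends up in neither the backup nor
WAL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the server state; none of these names are
 * PostgreSQL's. */
typedef struct
{
    bool archiving_active;     /* true once archive_command is non-empty */
    bool backup_taken;         /* true after start/stop backup completed */
    int  unrecoverable_pages;  /* pages in neither the backup nor WAL */
} SketchServer;

typedef struct
{
    bool use_wal;  /* decided once, when the bulk operation starts */
} SketchBulkOp;

static SketchBulkOp sketch_bulk_begin(const SketchServer *s)
{
    SketchBulkOp op = { s->archiving_active };
    return op;
}

static void sketch_bulk_write(SketchServer *s, const SketchBulkOp *op, int pages)
{
    /* Pages written without WAL after the backup finished are lost to PITR. */
    if (!op->use_wal && s->backup_taken)
        s->unrecoverable_pages += pages;
}

/* Replays the corrected error case and returns the number of pages that
 * could not be recovered from base backup plus WAL. */
int sketch_run_scenario(bool archiving_at_start)
{
    SketchServer s = { archiving_at_start, false, 0 };
    SketchBulkOp op = sketch_bulk_begin(&s);  /* COPY, CREATE INDEX ... starts */

    sketch_bulk_write(&s, &op, 10);
    s.archiving_active = true;                /* set archive_command; pg_ctl reload */
    s.backup_taken = true;                    /* pg_start_backup(); pg_stop_backup() */
    sketch_bulk_write(&s, &op, 5);            /* ...then bulk operation ends */
    return s.unrecoverable_pages;
}
```

With archiving off when the bulk operation starts, the model reports lost
pages; with archiving already on at start, the operation logs to WAL and
nothing is lost.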

> It strikes me that allowing archive_command to be changed on the fly
> might not be such a good idea though, or at least it shouldn't be
> possible to flip it from empty to nonempty during live operation.

As long as we allow it to be turned on/off during normal operation then
there is a current window of error.

I'd rather fix it the proposed way than force a restart. ISTM wrong to
have an availability feature cause downtime.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#3)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

"Simon Riggs" <simon@2ndquadrant.com> writes:

> On Fri, 2007-03-09 at 11:15 -0500, Tom Lane wrote:
>
> > It strikes me that allowing archive_command to be changed on the fly
> > might not be such a good idea though, or at least it shouldn't be
> > possible to flip it from empty to nonempty during live operation.
>
> I'd rather fix it the proposed way than force a restart. ISTM wrong to
> have an availability feature cause downtime.

I don't think that people are very likely to need to turn archiving on
and off on-the-fly. Your proposed solution introduces a great deal of
complexity (and risk of future bugs-of-omission, to say nothing of race
conditions) to solve a non-problem. We have better things to be doing
with our development time.

regards, tom lane

#5 Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#4)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

On Fri, 2007-03-09 at 11:47 -0500, Tom Lane wrote:

> I don't think that people are very likely to need to turn archiving on
> and off on-the-fly. Your proposed solution introduces a great deal of
> complexity (and risk of future bugs-of-omission, to say nothing of race
> conditions) to solve a non-problem. We have better things to be doing
> with our development time.

It's certainly a quicker fix. Unless others object, I'll set
archive_command to only be changeable at server startup.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#6 Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Tom Lane (#4)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

Tom Lane wrote:

> I don't think that people are very likely to need to turn archiving on
> and off on-the-fly. Your proposed solution introduces a great deal of
> complexity (and risk of future bugs-of-omission, to say nothing of race
> conditions) to solve a non-problem. We have better things to be doing
> with our development time.

So how do you do a file-based backup without permanent archiving? If
pg_start_backup turned on archiving temporarily and forced archiving of
all WAL files that contain open transactions, this would be
possible. This is what's requested for sites where PITR isn't needed,
just filesystem-level backup. Currently, this can be mimicked to some
extent by turning on archiving on-the-fly and hoping that all xactions
are in the WAL archive when pg_start_backup is issued (Simon's mail shows
how this will fail).

Regards,
Andreas

#7 Csaba Nagy
nagy@ecircle-ag.com
In reply to: Tom Lane (#4)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

On Fri, 2007-03-09 at 17:47, Tom Lane wrote:

> I don't think that people are very likely to need to turn archiving on
> and off on-the-fly.

We did occasionally need to turn archiving on on-the-fly. It happened
that I started up a new DB machine before the log archive was
available, so I had to wait with configuring that, but the
machine went on-line before the archive machine was ready... and then
later I had to switch on archiving. It was very convenient that I could
do it without a restart.

It's true that this has been a rare occasion; more often you just need to
change the archive command (e.g. to archive to a different location if
the archive repository goes down).

It's somewhat moot for us as we changed to use Slony (which is a heavy
beast but once it works it's great).

Cheers,
Csaba.

#8 Jim C. Nasby
jim@nasby.net
In reply to: Simon Riggs (#5)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

On Fri, Mar 09, 2007 at 04:57:18PM +0000, Simon Riggs wrote:

> It's certainly a quicker fix. Unless others object, I'll set
> archive_command to only be changeable at server startup.

I think the docs should also explain why it's server-start only, since
if someone wanted to they could circumvent the behavior by having
archive_command call a shell script that changes its behavior.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim C. Nasby (#8)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

"Jim C. Nasby" <jim@nasby.net> writes:

> On Fri, Mar 09, 2007 at 04:57:18PM +0000, Simon Riggs wrote:
>
> > It's certainly a quicker fix. Unless others object, I'll set
> > archive_command to only be changeable at server startup.
>
> I think the docs should also explain why it's server-start only, since
> if someone wanted to they could circumvent the behavior by having
> archive_command call a shell script that changes its behavior.

Um, what's the problem with that? The concern was about whether PG
would produce consistent WAL output, not whether the archive_command
actually needed to do anything.

regards, tom lane

#10 Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#5)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

Where is this patch?

---------------------------------------------------------------------------

Simon Riggs wrote:

> It's certainly a quicker fix. Unless others object, I'll set
> archive_command to only be changeable at server startup.

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#11 Simon Riggs
simon@2ndquadrant.com
In reply to: Bruce Momjian (#10)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

On Mon, 2007-04-02 at 19:09 -0400, Bruce Momjian wrote:

> Where is this patch?

see Hackers thread: "Minor changes to Recovery related code", Mar 30

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#12 Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#5)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

Simon Riggs wrote:

> It's certainly a quicker fix. Unless others object, I'll set
> archive_command to only be changeable at server startup.

I _still_ have no patch for this.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#13 Simon Riggs
simon@2ndquadrant.com
In reply to: Bruce Momjian (#12)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

On Wed, 2007-04-04 at 22:05 -0400, Bruce Momjian wrote:

> I _still_ have no patch for this.

Bruce,

As I've mentioned, there is another thread where the discussion
continued, which you should refer to.

The subject of this thread is a potential bug that has existed since 8.1
and that I recently picked up on. Knowing that fixing bugs was OK after
freeze, I thought it best to finish the features I was working on first.
Even so, before freeze I requested some additional time to work on some
related minor items.

So there's nothing overdue now and in any case it will be finished shortly.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#14 Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#13)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

Simon Riggs wrote:

> So there's nothing overdue now and in any case it will be finished shortly.

OK, I was unaware you were still working on this item.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#15 Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#5)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

Simon Riggs wrote:

> It's certainly a quicker fix. Unless others object, I'll set
> archive_command to only be changeable at server startup.

Simon, has this patch been submitted?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#16 Simon Riggs
simon@2ndquadrant.com
In reply to: Bruce Momjian (#15)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

On Thu, 2007-04-26 at 18:51 -0400, Bruce Momjian wrote:

> Simon, has this patch been submitted?

Yes

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#17 Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#16)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

Simon Riggs wrote:

> > Simon, has this patch been submitted?
>
> Yes

Uh, do you see it in the patch queue now? If not, what was the subject
line?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#18 Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#5)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Simon Riggs wrote:

> It's certainly a quicker fix. Unless others object, I'll set
> archive_command to only be changeable at server startup.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#19 Jim C. Nasby
decibel@decibel.org
In reply to: Bruce Momjian (#18)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

Simon intended to commit this per
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01761.php
(actually, there was a change in what was being done). I suspect this
item isn't valid any longer.

On Tue, May 15, 2007 at 07:30:58PM -0400, Bruce Momjian wrote:

> This has been saved for the 8.4 release:
>
> http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

--
Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#20 Bruce Momjian
bruce@momjian.us
In reply to: Jim C. Nasby (#19)
Re: Interaction of PITR backups and Bulk operations avoiding WAL

OK, removed.

---------------------------------------------------------------------------

Jim C. Nasby wrote:

> Simon intended to commit this per
> http://archives.postgresql.org/pgsql-hackers/2007-03/msg01761.php
> (actually, there was a change in what was being done). I suspect this
> item isn't valid any longer.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +