Problem with PITR recovery

Started by Bruce Momjian over 20 years ago, 62 messages
#1Bruce Momjian
pgman@candle.pha.pa.us

I had a problem using PITR recovery just now. If I do:

SELECT pg_start_backup('label');
do my tar
SELECT pg_stop_backup();

and stop the server, delete /data, then recover from the tar, delete
files in pg_xlog, then set recovery.conf to restore, it fails, I think
because no actual pg_xlog file was archived since the tar.

The problem is that we don't archive the partially written xlog file,
and in this case that xlog file contains the information needed to make
the tar file consistent.

Is this a known problem? Do we document this? If so, I can't find it.

I am concerned about folks cleaning out their archive directory after
the pg_stop_backup() not realizing they need that last xlog file to make
the tar valid.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#1)
Re: Problem with PITR recovery

Bruce Momjian <pgman@candle.pha.pa.us> writes:

The problem is that we don't archive the partially written xlog file,
and in this case that xlog file contains the information needed to make
the tar file consistent.

Is this a known problem? Do we document this? If so, I can't find it.

Yes, and yes. You did not follow the procedure:

http://www.postgresql.org/docs/8.0/static/backup-online.html#BACKUP-PITR-RECOVERY

In particular, step 2 says:

: ... you need at the least to copy the contents of the pg_xlog
: subdirectory of the cluster data directory, as it may contain logs which
: were not archived before the system went down.

Possibly this needs to be highlighted a little better.

regards, tom lane

#3Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#2)
1 attachment(s)
Re: Problem with PITR recovery

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

The problem is that we don't archive the partially written xlog file,
and in this case that xlog file contains the information needed to make
the tar file consistent.

Is this a known problem? Do we document this? If so, I can't find it.

Yes, and yes. You did not follow the procedure:

http://www.postgresql.org/docs/8.0/static/backup-online.html#BACKUP-PITR-RECOVERY

In particular, step 2 says:

: ... you need at the least to copy the contents of the pg_xlog
: subdirectory of the cluster data directory, as it may contain logs which
: were not archived before the system went down.

Possibly this needs to be highlighted a little better.

I figured that part of the goal of PITR was that you could recover from
just the tar backup and archived WAL files --- using the pg_xlog
contents is nice, but not something we can require.

I understood the last missing WAL log would cause missing information,
but not that it would make the tar backup unusable.

It would be nice if we could force a new WAL file on pg_stop_backup()
and archive the WAL file needed to match the tar file. How hard would
that be?

I see in the docs:

To make use of this backup, you will need to keep around all the WAL
segment files generated at or after the starting time of the backup. To
aid you in doing this, the pg_stop_backup function creates a backup
history file that is immediately stored into the WAL archive area. This
file is named after the first WAL segment file that you need to have to
make use of the backup. For example, if the starting WAL file is
0000000100001234000055CD the backup history file will be named something
like 0000000100001234000055CD.007C9330.backup. (The second part of this
file name stands for an exact position within the WAL file, and can
ordinarily be ignored.) Once you have safely archived the backup dump
file, you can delete all archived WAL segments with names numerically
preceding this one.
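The naming scheme described in that paragraph can be illustrated with a small parser. This is only a sketch: the function and class names are hypothetical, and it assumes the 24-hex-digit segment name splits into three 8-digit fields (timeline, log file, segment), which the quoted text does not spell out.

```python
import re
from typing import NamedTuple


class BackupHistoryName(NamedTuple):
    segment: str   # name of the first WAL segment the backup needs
    timeline: int  # first 8 hex digits of the segment name (assumed meaning)
    log_id: int    # middle 8 hex digits (assumed meaning)
    seg_no: int    # last 8 hex digits (assumed meaning)
    offset: int    # position within the WAL file (the ".007C9330" part)


# <24 hex digits>.<8 hex digits>.backup
_PATTERN = re.compile(
    r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})\.([0-9A-F]{8})\.backup$"
)


def parse_backup_history_name(name: str) -> BackupHistoryName:
    """Split a backup history file name into its parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a backup history file name: {name!r}")
    tli, log, seg, off = match.groups()
    return BackupHistoryName(
        segment=tli + log + seg,
        timeline=int(tli, 16),
        log_id=int(log, 16),
        seg_no=int(seg, 16),
        offset=int(off, 16),
    )


info = parse_backup_history_name("0000000100001234000055CD.007C9330.backup")
print(info.segment)  # the starting WAL segment file name
```

The `segment` field is the part that matters for deciding which archived files to keep; the offset can ordinarily be ignored, as the docs say.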

I am not clear on what the "backup dump file" is? I assume it means
0000000100001234000055CD. It is called "WAL segment file" above. I
will rename that phrase to match the above terminology. Patch attached
and applied.


Attachments:

/bjm/diff (text/plain)
Index: doc/src/sgml/backup.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/backup.sgml,v
retrieving revision 2.60
diff -c -c -r2.60 backup.sgml
*** doc/src/sgml/backup.sgml	23 Mar 2005 19:38:53 -0000	2.60
--- doc/src/sgml/backup.sgml	17 Apr 2005 03:04:35 -0000
***************
*** 733,740 ****
      the backup history file will be named something like
      <literal>0000000100001234000055CD.007C9330.backup</>.  (The second part of
      this file name stands for an exact position within the WAL file, and can
!     ordinarily be ignored.)  Once you have safely archived the backup dump
!     file, you can delete all archived WAL segments with names numerically
      preceding this one.  The backup history file is just a small text file.
      It contains the label string you gave to <function>pg_start_backup</>, as
      well as the starting and ending times of the backup.  If you used the
--- 733,740 ----
      the backup history file will be named something like
      <literal>0000000100001234000055CD.007C9330.backup</>.  (The second part of
      this file name stands for an exact position within the WAL file, and can
!     ordinarily be ignored.)  Once you have safely archived this WAL
!     segment file, you can delete all archived WAL segments with names numerically
      preceding this one.  The backup history file is just a small text file.
      It contains the label string you gave to <function>pg_start_backup</>, as
      well as the starting and ending times of the backup.  If you used the
#4Ragnar Hafstað
gnari@simnet.is
In reply to: Bruce Momjian (#3)
Re: Problem with PITR recovery

On Sat, 2005-04-16 at 23:06 -0400, Bruce Momjian wrote:
[about backup procedure with PITR documentation]

I see in the docs:

To make use of this backup, you will need to keep around all the WAL
segment files generated at or after the starting time of the backup. To
aid you in doing this, the pg_stop_backup function creates a backup
history file that is immediately stored into the WAL archive area. This
file is named after the first WAL segment file that you need to have to
make use of the backup. For example, if the starting WAL file is
0000000100001234000055CD the backup history file will be named something
like 0000000100001234000055CD.007C9330.backup. (The second part of this
file name stands for an exact position within the WAL file, and can
ordinarily be ignored.) Once you have safely archived the backup dump
file, you can delete all archived WAL segments with names numerically
preceding this one.

I am not clear on what the "backup dump file" is? I assume it means
0000000100001234000055CD. It is called "WAL segment file" above. I
will rename that phrase to match the above terminology. Patch attached
and applied.

Doesn't it refer to the backup file itself (the tar file of the data
directory) ?
You do not want to start deleting WAL segments until that one is safely
archived.

gnari

#5Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Ragnar Hafstað (#4)
Re: Problem with PITR recovery

Ragnar Hafstað wrote:

On Sat, 2005-04-16 at 23:06 -0400, Bruce Momjian wrote:
[about backup procedure with PITR documentation]

I see in the docs:

To make use of this backup, you will need to keep around all the WAL
segment files generated at or after the starting time of the backup. To
aid you in doing this, the pg_stop_backup function creates a backup
history file that is immediately stored into the WAL archive area. This
file is named after the first WAL segment file that you need to have to
make use of the backup. For example, if the starting WAL file is
0000000100001234000055CD the backup history file will be named something
like 0000000100001234000055CD.007C9330.backup. (The second part of this
file name stands for an exact position within the WAL file, and can
ordinarily be ignored.) Once you have safely archived the backup dump
file, you can delete all archived WAL segments with names numerically
preceding this one.

I am not clear on what the "backup dump file" is? I assume it means
0000000100001234000055CD. It is called "WAL segment file" above. I
will rename that phrase to match the above terminology. Patch attached
and applied.

Doesn't it refer to the backup file itself (the tar file of the data
directory) ?

No. That is what I thought it meant on first reading, but looking
closer it is referring to the numbered file, and the tar file has no
specific number.

You do not want to start deleting WAL segments until that one is safely
archived.

Right, but the point of the paragraph is that you need the WAL file that
goes with the backup history file number.

#6Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Bruce Momjian (#3)
Re: Problem with PITR recovery

Bruce Momjian wrote:

I figured that part of the goal of PITR was that you could recover from
just the tar backup and archived WAL files --- using the pg_xlog
contents is nice, but not something we can require.

I understood the last missing WAL log would cause missing information,
but not that it would make the tar backup unusable.

It would be nice if we could force a new WAL file on pg_stop_backup()
and archive the WAL file needed to match the tar file. How hard would
that be?

I see in the docs:

To make use of this backup, you will need to keep around all the WAL
segment files generated at or after the starting time of the backup. To
aid you in doing this, the pg_stop_backup function creates a backup
history file that is immediately stored into the WAL archive area. This
file is named after the first WAL segment file that you need to have to
make use of the backup. For example, if the starting WAL file is
0000000100001234000055CD the backup history file will be named something
like 0000000100001234000055CD.007C9330.backup. (The second part of this
file name stands for an exact position within the WAL file, and can
ordinarily be ignored.) Once you have safely archived the backup dump
file, you can delete all archived WAL segments with names numerically
preceding this one.

I am not clear on what the "backup dump file" is? I assume it means
0000000100001234000055CD. It is called "WAL segment file" above. I
will rename that phrase to match the above terminology. Patch attached
and applied.

I found that the docs mentioned above are inaccurate because they state
you only need the WAL segment used at the start of the file system
backup, while you really need all the WAL segments used _during_ the
backup before you can safely delete the older WAL segments. Here is
updated text I have applied to HEAD and 8.0.X:

Once you have safely archived the WAL segment files used during the file
system backup (as specified in the backup history file), you can delete
all archived WAL segments with names numerically less. Keep in mind that
only completed WAL segment files are archived, so there will be delay
between running pg_stop_backup and the archiving of all WAL segment
files needed to make the file system backup consistent.
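The pruning rule in that text can be sketched as follows. The function names are hypothetical, and the start and stop segment names are assumed to have been read by hand from the backup history file. The sketch only checks that the start and stop segments are present in the archive; it does not verify every segment in between.

```python
from typing import Iterable, List

_HEX = set("0123456789ABCDEF")


def is_wal_segment(name: str) -> bool:
    """True for plain 24-hex-digit WAL segment names (not *.backup files)."""
    return len(name) == 24 and set(name) <= _HEX


def prunable_segments(
    archived: Iterable[str], start_segment: str, stop_segment: str
) -> List[str]:
    """Return archived WAL segments that sort before start_segment.

    Nothing is reported as prunable until both the first and the last
    segment used during the file system backup are in the archive.
    """
    names = set(archived)
    if start_segment not in names or stop_segment not in names:
        return []  # the new backup is not yet usable, so keep everything
    # Equal-length uppercase hex strings compare the same way as numbers.
    return sorted(n for n in names if is_wal_segment(n) and n < start_segment)


archive = [
    "000000010000000000000007",
    "000000010000000000000008",
    "000000010000000000000009",
    "000000010000000000000009.00000020.backup",
]
# Stop segment ...0A has not been archived yet, so nothing may be deleted.
print(prunable_segments(archive, "000000010000000000000009",
                        "00000001000000000000000A"))
```

Returning an empty list until the stop segment shows up is the point of the example: deleting earlier would also destroy the files needed to recover from the previous base backup.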

#7Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Bruce Momjian (#6)
Re: Problem with PITR recovery

pgman wrote:

I figured that part of the goal of PITR was that you could recover from
just the tar backup and archived WAL files --- using the pg_xlog
contents is nice, but not something we can require.

I understood the last missing WAL log would cause missing information,
but not that it would make the tar backup unusable.

It would be nice if we could force a new WAL file on pg_stop_backup()
and archive the WAL file needed to match the tar file. How hard would
that be?

Added to TODO:

* Force archiving of partially-full WAL files when pg_stop_backup() is
called or the server is stopped

#8Jeff Davis
jdavis-pgsql@empires.org
In reply to: Bruce Momjian (#6)
Re: Problem with PITR recovery

I could still use a little clarification. It seems sort of like there is
an extra step, like:

(1) start archiving
(2) pg_start_backup()
(3) copy PGDATA directory with tar
(4) pg_stop_backup()
(5) ??

And the text you have at
http://candle.pha.pa.us/main/writings/pgsql/sgml/backup-online.html

says: "To make use of this backup, you will need to keep around all the
WAL segment files generated during and after the file system backup.".

How long after? Wouldn't you be keeping the WAL segments afterward
anyway by archiving?

I've tested and been able to recover using PITR before, but I'd like a
little clarification on the steps to make absolutely sure that the base
backup I have is viable.

Can you sort of run through the failure case again, and how to prevent
it?

Regards,
Jeff Davis


On Sun, 2005-04-17 at 21:38 -0400, Bruce Momjian wrote:

Bruce Momjian wrote:

I figured that part of the goal of PITR was that you could recover from
just the tar backup and archived WAL files --- using the pg_xlog
contents is nice, but not something we can require.

I understood the last missing WAL log would cause missing information,
but not that it would make the tar backup unusable.

It would be nice if we could force a new WAL file on pg_stop_backup()
and archive the WAL file needed to match the tar file. How hard would
that be?

I see in the docs:

To make use of this backup, you will need to keep around all the WAL
segment files generated at or after the starting time of the backup. To
aid you in doing this, the pg_stop_backup function creates a backup
history file that is immediately stored into the WAL archive area. This
file is named after the first WAL segment file that you need to have to
make use of the backup. For example, if the starting WAL file is
0000000100001234000055CD the backup history file will be named something
like 0000000100001234000055CD.007C9330.backup. (The second part of this
file name stands for an exact position within the WAL file, and can
ordinarily be ignored.) Once you have safely archived the backup dump
file, you can delete all archived WAL segments with names numerically
preceding this one.

I am not clear on what the "backup dump file" is? I assume it means
0000000100001234000055CD. It is called "WAL segment file" above. I
will rename that phrase to match the above terminology. Patch attached
and applied.

I found that the docs mentioned above are inaccurate because they state
you only need the WAL segment used at the start of the file system
backup, while you really need all the WAL segments used _during_ the
backup before you can safely delete the older WAL segments. Here is
updated text I have applied to HEAD and 8.0.X:

Once you have safely archived the WAL segment files used during the file
system backup (as specified in the backup history file), you can delete
all archived WAL segments with names numerically less. Keep in mind that
only completed WAL segment files are archived, so there will be delay
between running pg_stop_backup and the archiving of all WAL segment
files needed to make the file system backup consistent.

#9Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Jeff Davis (#8)
Re: Problem with PITR recovery

Jeff Davis wrote:

I could still use a little clarification. It seems sort of like there is
an extra step, like:

(1) start archiving
(2) pg_start_backup()
(3) copy PGDATA directory with tar
(4) pg_stop_backup()
(5) ??

And the text you have at
http://candle.pha.pa.us/main/writings/pgsql/sgml/backup-online.html

says: "To make use of this backup, you will need to keep around all the
WAL segment files generated during and after the file system backup.".

How long after? Wouldn't you be keeping the WAL segments afterward
anyway by archiving?

I've tested and been able to recover using PITR before, but I'd like a
little clarification on the steps to make absolutely sure that the base
backup I have is viable.

Can you sort of run through the failure case again, and how to prevent
it?

The failure case in the original docs is that you do your
pg_stop_backup(), and then delete all the WAL files before the *.backup
file that was just created. However, you do not have a valid tar backup
until you have archived all the WAL files used from the *.backup WAL
file up to the WAL file that was active at pg_stop_backup(), which is
mentioned in the *.backup file. If you went and deleted your old WAL
files anyway, without waiting for those other WAL files to be archived,
and your disk drive crashed, you wouldn't have a tar backup you could
use, and you had deleted the old WAL files you would have needed to
recover your previous tar backup.

Is there something in the current wording that needs clarification?

#10Jeff Davis
jdavis-pgsql@empires.org
In reply to: Bruce Momjian (#9)
Re: Problem with PITR recovery

On Mon, 2005-04-18 at 00:20 -0400, Bruce Momjian wrote:

Jeff Davis wrote:

Can you sort of run through the failure case again, and how to prevent
it?

The failure case in the original docs is that you do your
pg_stop_backup(), and then delete all the WAL files before the *.backup
file that was just created. However, you do not have a valid tar backup
until you have archived all the WAL files used from the *.backup WAL
file up to the WAL file that was active at pg_stop_backup(), which is
mentioned in the *.backup file. If you went and deleted your old WAL
files anyway, without waiting for those other WAL files to be archived,
and your disk drive crashed, you wouldn't have a tar backup you could
use, and you had deleted the old WAL files you would have needed to
recover your previous tar backup.

Is there something in the current wording that needs clarification?

So, as I understand it: everything works great as long as everything has
been archived up to and including the WAL file that was active when you
did pg_stop_backup(). However, if you do pg_stop_backup() and
immediately delete PGDATA (before any WAL files are archived), the
backup may fail.

I think, to clear it up a little, you might add a step 5 before saying
"If this returns successfully, you're done.", so that people know for
sure that they get a good base backup. It actually seems like something
that maybe pg_stop_backup() should do in the future.

It's a little unclear how you tell which WAL segment was active during
pg_stop_backup(), but that shouldn't be a practical concern since you
can just manually archive them all.

Maybe step 5 could be something like:
(5) Make a copy of all WAL segments above XXXX.backup and store with the
base backup. When it's time to recover, if those WAL segments were not
properly archived, you need to have them available.

(probably needs rewording)

Regards,
Jeff Davis

#11Oleg Bartunov
oleg@sai.msu.su
In reply to: Bruce Momjian (#9)
Re: Problem with PITR recovery

On Mon, 18 Apr 2005, Bruce Momjian wrote:

Jeff Davis wrote:

I could still use a little clarification. It seems sort of like there is
an extra step, like:

(1) start archiving
(2) pg_start_backup()
(3) copy PGDATA directory with tar
(4) pg_stop_backup()
(5) ??

And the text you have at
http://candle.pha.pa.us/main/writings/pgsql/sgml/backup-online.html

says: "To make use of this backup, you will need to keep around all the
WAL segment files generated during and after the file system backup.".

How long after? Wouldn't you be keeping the WAL segments afterward
anyway by archiving?

I've tested and been able to recover using PITR before, but I'd like a
little clarification on the steps to make absolutely sure that the base
backup I have is viable.

Can you sort of run through the failure case again, and how to prevent
it?

The failure case in the original docs is that you do your
pg_stop_backup(), and then delete all the WAL files before the *.backup
file that was just created. However, you do not have a valid tar backup
until you have archived all the WAL files used from the *.backup WAL
file up to the WAL file that was active at pg_stop_backup(), which is
mentioned in the *.backup file. If you went and deleted your old WAL
files anyway, without waiting for those other WAL files to be archived,
and your disk drive crashed, you wouldn't have a tar backup you could
use, and you had deleted the old WAL files you would have needed to
recover your previous tar backup.

Is there something in the current wording that needs clarification?

I'd say it's very uncool :) It's not what we all expected from PITR.
I recall now that Simon mentioned this and has it in his TODO.
The other thing I don't understand is what the problem is with
generating a WAL file on demand. Probably the TODO should say
something about this.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#12Rob Butler
crodster2k@yahoo.com
In reply to: Oleg Bartunov (#11)
Re: Problem with PITR recovery

I'd say it's very uncool :) It's not what we all expected from PITR.
I recall now that Simon mentioned this and has it in his TODO.
The other thing I don't understand is what the problem is with
generating a WAL file on demand. Probably the TODO should say
something about this.

This would definitely be a good feature to have. What
I would prefer is:

1) have the pitr stop command write out and close the
WAL that it is currently using.

2) have another stored proc which can be invoked at
any time that will write out and close the WAL that is
currently in use when that command is executed.

3) have a feature in postgres that will automatically
write out and close the WAL if the server hasn't had
any activity in XX minutes, or hasn't closed a WAL
file in XX minutes.

The reason for this is "the Friday night" scenario.

Let's say you have your WALs FTP'd to a remote server
off-site. Friday at 4:50 PM Postgres starts a new
WAL, and everyone goes home for the weekend at 5pm.
No activity occurs on the database all weekend long,
so the new WAL never fills and is never closed. If
something should happen during the weekend, and the
disks are ruined on the PG DB server, the last WAL is
never sent to the remote off-site server. The last
transactions of the day are lost, even though they
could have taken place days ago. With feature 3, you
can guarantee that the oldest WAL is XX minutes old,
so at least you have all the transactions within the
last XX minutes.

Of course feature #3 also needs to have some smarts to
it, so it doesn't create a bunch of completely empty
WALs every time the timer runs out. It should only
write and close the WAL if there is actually some new
data in it.
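The decision rule proposed for feature 3 might look roughly like the sketch below. It is an illustration of the idea only, not PostgreSQL code: the `WalState` record, its fields, and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WalState:
    last_switch: float  # time (seconds) at which the current WAL file was started
    bytes_written: int  # new data written to the current WAL file so far


def should_force_switch(state: WalState, now: float, timeout_seconds: float) -> bool:
    """Decide whether the current WAL file should be closed and archived.

    The file is closed only if the timer has expired AND it holds new data,
    so an idle server does not produce a stream of empty WAL files.
    """
    if state.bytes_written == 0:
        return False  # nothing new to protect, so skip the switch
    return now - state.last_switch >= timeout_seconds


# "Friday night": a WAL file is started at 4:50 PM, gets a little data,
# and then the server sits idle. With a 10-minute timeout the file is
# closed once, and the empty file that follows is left alone.
friday = WalState(last_switch=0.0, bytes_written=4096)
print(should_force_switch(friday, now=600.0, timeout_seconds=600.0))
```

With this rule the last transactions of the day reach the off-site archive within the timeout, while a fully idle weekend generates no further files.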

Later
Rob


#13Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Jeff Davis (#10)
Re: Problem with PITR recovery

Jeff Davis wrote:

On Mon, 2005-04-18 at 00:20 -0400, Bruce Momjian wrote:

Jeff Davis wrote:

Can you sort of run through the failure case again, and how to prevent
it?

The failure case in the original docs is that you do your
pg_stop_backup(), and then delete all the WAL files before the *.backup
file that was just created. However, you do not have a valid tar backup
until you have archived all the WAL files used from the *.backup WAL
file up to the WAL file that was active at pg_stop_backup(), which is
mentioned in the *.backup file. If you went and deleted your old WAL
files anyway, without waiting for those other WAL files to be archived,
and your disk drive crashed, you wouldn't have a tar backup you could
use, and you had deleted the old WAL files you would have needed to
recover your previous tar backup.

Is there something in the current wording that needs clarification?

So, as I understand it: everything works great as long as everything has
been archived up to and including the WAL file that was active when you
did pg_stop_backup(). However, if you do pg_stop_backup() and
immediately delete PGDATA (before any WAL files are archived), the
backup may fail.

Right, and that is the issue that wasn't documented before, and I was
even unclear about it myself when testing initially.

I think, to clear it up a little, you might add a step 5 before saying
"If this returns successfully, you're done.", so that people know for

I see your point. New text is:

4 Again connect to the database as a superuser, and issue the command

SELECT pg_stop_backup();

This should return successfully.

5 Once the WAL segment files used during the backup are archived as
part of normal database activity, you are done.
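Step 5 amounts to waiting until the last WAL segment used by the backup shows up in the archive. A minimal sketch of such a wait, assuming the archive is a local directory and that the segment name was taken from the backup history file (the function name and parameters are hypothetical):

```python
import os
import time


def wait_for_archived(
    archive_dir: str, segment: str, timeout: float = 60.0, poll: float = 1.0
) -> bool:
    """Poll archive_dir until `segment` exists or `timeout` seconds pass.

    Returns True if the segment was found, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    path = os.path.join(archive_dir, segment)
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

On a quiet server this can wait a long time, because the segment is archived only once it fills up. That is the gap the TODO item is meant to close.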

sure that they get a good base backup. It actually seems like something
that maybe pg_stop_backup() should do in the future.

Yes, I added that to the TODO list:

* Force archiving of partially-full WAL files when pg_stop_backup() is
called or the server is stopped

Doing this will allow administrators to know more easily when the
archive contains all the files needed for point-in-time recovery.

It's a little unclear how you tell which WAL segment was active during
pg_stop_backup(), but that shouldn't be a practical concern since you
can just manually archive them all.

We do have this sentence:

Once you have safely archived the WAL segment files used during the file
system backup (as specified in the backup history file), you can delete
all archived WAL segments with names numerically less.

The information is actually in the *.backup file. I think that is the
only way to know.

And you can't manually copy the WAL files to the archive because they
aren't full and the recommended archive_command will fail if those files
are already in the archive. You could copy them off somewhere else, I
suppose.
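The "no overwrite" behavior can be sketched like this. It is a stand-in for whatever command is configured as archive_command, not the recommended command itself, and the function name is hypothetical.

```python
import os
import shutil


def archive_segment(src_path: str, archive_dir: str) -> bool:
    """Copy a WAL segment into archive_dir, refusing to overwrite.

    Returns True on success and False if a file with the same name is
    already in the archive.
    """
    dest = os.path.join(archive_dir, os.path.basename(src_path))
    try:
        # Mode "xb" creates the file exclusively and fails if it exists.
        with open(src_path, "rb") as src, open(dest, "xb") as dst:
            shutil.copyfileobj(src, dst)
    except FileExistsError:
        return False
    return True
```

If a partially filled segment is copied into the archive by hand, the later attempt to archive the completed segment under the same name is refused, which is why the partial copies have to go somewhere else.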

Maybe step 5 could be something like:
(5) Make a copy of all WAL segments above XXXX.backup and store with the
base backup. When it's time to recover, if those WAL segments were not
properly archived, you need to have them available.

Again, that doesn't work because of the "no overwrite" behavior of the
archive_command.

#14Oleg Bartunov
oleg@sai.msu.su
In reply to: Rob Butler (#12)
Re: Problem with PITR recovery

On Mon, 18 Apr 2005, Rob Butler wrote:

I'd say it's very uncool :) It's not what we all expected from PITR.
I recall now that Simon mentioned this and has it in his TODO.
The other thing I don't understand is what the problem is with
generating a WAL file on demand. Probably the TODO should say
something about this.

This would definitely be a good feature to have. What
I would prefer is:

1) have the pitr stop command write out and close the
WAL that it is currently using.

2) have another stored proc which can be invoked at
any time that will write out and close the WAL that is
currently in use when that command is executed.

3) have a feature in postgres that will automatically
write out and close the WAL if the server hasn't had
any activity in XX minutes, or hasn't closed a WAL
file in XX minutes.

The reason for this is "the Friday night" scenario.

This is exactly what I'm worried about! It is a very typical
scenario. I hope the PITR improvement can be made in the
8.0.X development cycle.

Let's say you have your WALs FTP'd to a remote server
off-site. Friday at 4:50 PM Postgres starts a new
WAL, and everyone goes home for the weekend at 5pm.
No activity occurs on the database all weekend long,
so the new WAL never fills and is never closed. If
something should happen during the weekend, and the
disks are ruined on the PG DB server, the last WAL is
never sent to the remote off-site server. The last
transactions of the day are lost, even though they
could have taken place days ago. With feature 3, you
can guarantee that the oldest WAL is XX minutes old,
so at least you have all the transactions within the
last XX minutes.

Of course feature #3 also needs to have some smarts to
it, so it doesn't create a bunch of completely empty
WALs every time the timer runs out. It should only
write and close the WAL if there is actually some new
data in it.

Later
Rob


Regards,
Oleg

#15Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Oleg Bartunov (#11)
Re: Problem with PITR recovery

Oleg Bartunov wrote:

Is there something in the current wording that needs clarification?

I'd say it's very uncool :) It's not what we all expected from PITR.
I recall now that Simon mentioned this and has it in his TODO.
The other thing I don't understand is what the problem is with
generating a WAL file on demand. Probably the TODO should say
something about this.

Yes, we have TODO items for that and I added another one yesterday.

#16Simon Riggs
simon@2ndquadrant.com
In reply to: Bruce Momjian (#15)
Re: Re: Problem with PITR recovery

Rob Butler <crodster2k@yahoo.com> wrote on 18.04.2005, 15:05:20:

I'd say it's very uncool :) It's not what we all expected from PITR.
I recall now that Simon mentioned this and has it in his TODO.
The other thing I don't understand is what the problem is with
generating a WAL file on demand. Probably the TODO should say
something about this.

This would definitely be a good feature to have. What
I would prefer is:

1) have the pitr stop command write out and close the
WAL that it is currently using.

2) have another stored proc which can be invoked at
any time that will write out and close the WAL that is
currently in use when that command is executed.

3) have a feature in postgres that will automatically
write out and close the WAL if the server hasn't had
any activity in XX minutes, or hasn't closed a WAL
file in XX minutes.

Yes, I have been working on a design.

1) is required to make PITR better for low transaction rate users.

3) is required to allow standby replication

2) is a standard feature on other DBMS, but I'd have to consider that as
optional.

Anyway, I'll post more in a few hours on this.

Best Regards, Simon Riggs

#17Greg Stark
gsstark@mit.edu
In reply to: Bruce Momjian (#13)
Re: Problem with PITR recovery

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I see your point. New text is:

4 Again connect to the database as a superuser, and issue the command

SELECT pg_stop_backup();

This should return successfully.

5 Once the WAL segment files used during the backup are archived as
part of normal database activity, you are done.

sure that they get a good base backup. It actually seems like something
that maybe pg_stop_backup() should do in the future.

Yes, I added that to the TODO list:

* Force archiving of partially-full WAL files when pg_stop_backup() is
called or the server is stopped

You could even make pg_stop_backup() hang until that's complete.

--
greg

#18Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Rob Butler (#12)
Re: Problem with PITR recovery

OK, I updated the two current TODO items:

* Allow point-in-time recovery to archive partially filled write-ahead
logs

Currently only full WAL files are archived. This means that the most
recent transactions aren't available for recovery in case of a disk
failure. This could be triggered by a user command or a timer.

* Automatically force archiving of partially-filled WAL files when
pg_stop_backup() is called or the server is stopped

Doing this will allow administrators to know more easily when the
archive contains all the files needed for point-in-time recovery.

Is this OK?

---------------------------------------------------------------------------

Rob Butler wrote:

I'd say it's very not cool :) It's not what we all expected from PITR.
I recall now that Simon mentioned this and has it in his TODO.
Another thing I don't understand: what's the problem with generating a WAL
file on demand? Probably the TODO should say something about this.

This would definitely be a good feature to have. What
I would prefer is:

1) have the pitr stop command write out and close the
WAL that it is currently using.

2) have another stored proc which can be invoked at
any time that will write out and close the WAL that is
currently in use when that command is executed.

3) have a feature in postgres that will automatically
write out and close the WAL if the server hasn't had
any activity in XX minutes, or hasn't closed a WAL
file in XX minutes.

The reason for this is "the Friday night" scenario.

Let's say you have your WALs FTP'd to a remote server
off-site. Friday at 4:50 PM Postgres starts a new
WAL, and everyone goes home for the weekend at 5pm.
No activity occurs on the database all weekend long,
so the new WAL never fills and is never closed. If
something should happen during the weekend, and the
disks are ruined on the PG DB server, the last WAL is
never sent to the remote off-site server. The last
transactions of the day are lost, even though they
could have taken place days ago. With feature 3, you
can guarantee that the unarchived WAL is at most XX
minutes old, so you lose at most the transactions from
the last XX minutes.

Of course feature #3 also needs to have some smarts to
it, so it doesn't create a bunch of completely empty
WALs every time the timer runs out. It should only
write and close the WAL if there is actually some new
data in it.
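
The "only if there is new data" rule for feature 3 can be sketched as a small predicate. This is an illustration in Python, not PostgreSQL code; the function name and all four parameters are hypothetical.

```python
def should_switch(now, last_switch_time, bytes_written_since_switch, timeout_seconds):
    """Return True when the timer has expired AND the current WAL holds new data.

    All arguments are hypothetical inputs a timer-driven mechanism would track:
    times are in seconds, bytes_written_since_switch counts WAL written since
    the last file was closed.
    """
    if bytes_written_since_switch == 0:
        # Nothing new was written, so closing the file would only
        # produce an empty WAL for the archive.
        return False
    return now - last_switch_time >= timeout_seconds
```

With a check like this, an idle weekend produces at most one forced switch after Friday's last transaction, rather than one empty file per timer period.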

Later
Rob


#19Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Oleg Bartunov (#14)
Re: Problem with PITR recovery

Oleg Bartunov wrote:

On Mon, 18 Apr 2005, Rob Butler wrote:

I'd say it's very not cool :) It's not what we all expected from PITR.
I recall now that Simon mentioned this and has it in his TODO.
Another thing I don't understand: what's the problem with generating a WAL
file on demand? Probably the TODO should say something about this.

This would definitely be a good feature to have. What
I would prefer is:

1) have the pitr stop command write out and close the
WAL that it is currently using.

2) have another stored proc which can be invoked at
any time that will write out and close the WAL that is
currently in use when that command is executed.

3) have a feature in postgres that will automatically
write out and close the WAL if the server hasn't had
any activity in XX minutes, or hasn't closed a WAL
file in XX minutes.

The reason for this is "the Friday night" scenario.

This is exactly what I'm worried about! A very typical
scenario. I hope the PITR improvement can be done in
the 8.0.X development cycle.

Yes, I described this exact scenario during a talk I gave on Saturday.
I think the only way to do this for 8.0.X now is to run a cron job that
just copies pg_xlog off to another location every so often.

Of course, there is the risk that your cron copy will fail in the
middle, leaving the WAL file corrupt. You would have to copy to a
temporary directory, then once that succeeds, move the files to overlay
the previous copies.
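
A minimal sketch of that copy-then-move idea, in Python. The function name and both directory arguments are hypothetical, and the sketch only ensures that a copy interrupted midway never replaces a previous good copy; it does not address whether a file copied while the server is still writing to it is usable on its own.

```python
import os
import shutil
import tempfile


def snapshot_wal_dir(src_dir, dest_dir):
    """Copy every regular file in src_dir into dest_dir without ever
    exposing a half-written copy under its final name."""
    os.makedirs(dest_dir, exist_ok=True)
    # Stage inside dest_dir so the final move stays on one filesystem,
    # which is what makes os.replace() an atomic rename.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=dest_dir)
    copied = []
    try:
        for name in sorted(os.listdir(src_dir)):
            src = os.path.join(src_dir, name)
            if not os.path.isfile(src):
                continue  # skip subdirectories such as archive_status
            shutil.copy2(src, os.path.join(staging, name))
            copied.append(name)
        # Every copy succeeded: move the files over the previous copies.
        for name in copied:
            os.replace(os.path.join(staging, name), os.path.join(dest_dir, name))
    finally:
        # Remove the staging directory whether or not the copy succeeded.
        shutil.rmtree(staging, ignore_errors=True)
    return copied
```

If any copy raises, the staging directory is discarded and the files already in dest_dir are left untouched.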

#20Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Greg Stark (#17)
Re: Problem with PITR recovery

Greg Stark wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I see your point. New text is:

4 Again connect to the database as a superuser, and issue the command

SELECT pg_stop_backup();

This should return successfully.

5 Once the WAL segment files used during the backup are archived as
part of normal database activity, you are done.

sure that they get a good base backup. It actually seems like something
that maybe pg_stop_backup() should do in the future.

Yes, I added that to the TODO list:

* Force archiving of partially-full WAL files when pg_stop_backup() is
called or the server is stopped

You could even make pg_stop_backup() hang until that's complete.

You mean don't force the archive copy but just have pg_stop_backup()
hang until the files fill? Yea, we could do that, but there is no way
to know how long the hang might take.

#21Greg Stark
gsstark@mit.edu
In reply to: Bruce Momjian (#20)
Re: Problem with PITR recovery

Bruce Momjian <pgman@candle.pha.pa.us> writes:

You mean don't force the archive copy but just have pg_stop_backup()
hang until the files fill? Yea, we could do that, but there is no way
to know how long the hang might take.

Actually I meant both.

--
greg

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#5)
Re: Problem with PITR recovery

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Ragnar Hafstað wrote:

On Sat, 2005-04-16 at 23:06 -0400, Bruce Momjian wrote:

I am not clear on what the "backup dump file" is? I assume it means
0000000100001234000055CD. It is called "WAL segment file" above. I
will rename that phrase to match the above terminology. Patch attached
and applied.

Doesn't it refer to the backup file itself (the tar file of the data
directory) ?

No. That is what I thought it meant on first reading, but looking
closer it is referring to the numbered file, and the tar file has no
specific number.

Yes, that is exactly what it meant, and your patch has destroyed the
meaning.

regards, tom lane

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#18)
Re: Problem with PITR recovery

Bruce Momjian <pgman@candle.pha.pa.us> writes:

OK, I updated the two current TODO items:
* Automatically force archiving of partially-filled WAL files when
pg_stop_backup() is called or the server is stopped

Is this OK?

Archive on stop is right out. The common reason for a stop is that the
system is being shut down, and we don't have time to archive a WAL file
before init will kill -9 us.

regards, tom lane

#24Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#23)
Re: Problem with PITR recovery

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

OK, I updated the two current TODO items:
* Automatically force archiving of partially-filled WAL files when
pg_stop_backup() is called or the server is stopped

Is this OK?

Archive on stop is right out. The common reason for a stop is that the
system is being shut down, and we don't have time to archive a WAL file
before init will kill -9 us.

Ah, good point. Can we do it for 'smart' shutdown mode, which is the
default? I see server stop scripts using 'fast' where we would not do
the WAL archive.

#25Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#22)
Re: Problem with PITR recovery

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Ragnar Hafstað wrote:

On Sat, 2005-04-16 at 23:06 -0400, Bruce Momjian wrote:

I am not clear on what the "backup dump file" is? I assume it means
0000000100001234000055CD. It is called "WAL segment file" above. I
will rename that phrase to match the above terminology. Patch attached
and applied.

Doesn't it refer to the backup file itself (the tar file of the data
directory) ?

No. That is what I thought it meant on first reading, but looking
closer it is referring to the numbered file, and the tar file has no
specific number.

Yes, that is exactly what it meant, and your patch has destroyed the
meaning.

The sentence was:

Once you have safely archived the backup dump file, you can delete all
archived WAL segments with names numerically preceding this one.

so you were saying:

Once you have safely archived the file system backup, you can delete all
archived WAL segments with names numerically preceding this one.

I guess I didn't see the connection between the file system backup and
the WAL files, when in fact you need the WAL files that go with the file
system backup to do the recovery. Do you have new suggested text?

The current text version is in CVS.

#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#25)
Re: Problem with PITR recovery

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I guess I didn't see the connection between the file system backup and
the WAL files, when in fact you need the WAL files that go with the file
system backup to do the recovery. Do you have new suggested text?

I think it probably needs to mention *both* the tar dump and the WAL
segment file(s). I can take a whack at it if you like.

regards, tom lane

#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#24)
Re: Problem with PITR recovery

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

Archive on stop is right out. The common reason for a stop is that the
system is being shut down, and we don't have time to archive a WAL file
before init will kill -9 us.

Ah, good point. Can we do it for 'smart' shutdown mode, which is the
default? I see server stop scripts using 'fast' where we would not do
the WAL archive.

[ thinks about it... ] Yeah, that seems doable, since 'smart' mode by
definition isn't making any promises about getting out of town quick.

However, would it really be all that helpful to do that? I'm not sure
I trust a backup methodology that depends on having shut down the server
in "the right way".

It seems reasonable to me to have pg_stop_backup() close the current WAL
segment, and also to have some time-limit-driven mechanism for doing so.
What's the use-case for doing it on postmaster stop, though?

regards, tom lane

#28Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#26)
Re: Problem with PITR recovery

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I guess I didn't see the connection between the file system backup and
the WAL files, when in fact you need the WAL files that go with the file
system backup to do the recovery. Do you have new suggested text?

I think it probably needs to mention *both* the tar dump and the WAL
segment file(s). I can take a whack at it if you like.

I modified the sentence to say:

Once you have safely archived the file system backup and the WAL segment
files used during the backup (as specified in the backup history file),
you can delete all archived WAL segments with names numerically less.

Feel free to whack it a second time.

#29Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#27)
Re: Problem with PITR recovery

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

Archive on stop is right out. The common reason for a stop is that the
system is being shut down, and we don't have time to archive a WAL file
before init will kill -9 us.

Ah, good point. Can we do it for 'smart' shutdown mode, which is the
default? I see server stop scripts using 'fast' where we would not do
the WAL archive.

[ thinks about it... ] Yeah, that seems doable, since 'smart' mode by
definition isn't making any promises about getting out of town quick.

However, would it really be all that helpful to do that? I'm not sure
I trust a backup methodology that depends on having shut down the server
in "the right way".

It seems reasonable to me to have pg_stop_backup() close the current WAL
segment, and also to have some time-limit-driven mechanism for doing so.
What's the use-case for doing it on postmaster stop, though?

I am thinking someone runs a tar backup at night, shuts down the server
the next day, and goes to recover to a new machine. Wouldn't they think
the shutdown server had flushed all its archive logs? I would.

I guess I would expect some kind of sanity in how the logs are kept.
Our current "keep the last one active" is a pretty strange user
interface and I think a shutdown server should give a reasonable API, and
I think that includes flushing logs. In fact, considering we would have
a timer, you could argue that a shut-down server could be down for a very
long time and flushing the archive logs would make sense.

#30Simon Riggs
simon@2ndquadrant.com
In reply to: Noname (#16)
Re: Problem with PITR recovery

On Mon, 2005-04-18 at 16:44 +0200, simon@2ndquadrant.com wrote:

Rob Butler <crodster2k@yahoo.com> wrote on 18.04.2005, 15:05:20:

I'd say it's very not cool :) It's not what we all expected from PITR.
I recall now that Simon mentioned this and has it in his TODO.
Another thing I don't understand: what's the problem with generating a WAL
file on demand? Probably the TODO should say something about this.

This would definitely be a good feature to have. What
I would prefer is:

1) have the pitr stop command write out and close the
WAL that it is currently using.

2) have another stored proc which can be invoked at
any time that will write out and close the WAL that is
currently in use when that command is executed.

3) have a feature in postgres that will automatically
write out and close the WAL if the server hasn't had
any activity in XX minutes, or hasn't closed a WAL
file in XX minutes.

Yes, I have been working on a design.

1) is required to make PITR better for low transaction rate users.

3) is required to allow standby replication

2) is a standard feature on other DBMS, but I'd have to consider that as
optional.

My plan would be to write a special xlog record for xlog switching. This
would be a special processing instruction, rather than a data/redo
instruction. This would be implemented as another xlog info value on
the xlog_redo resource manager function, XLOG_FILE_SWITCH. (xlog_redo
would simply set a variable to be used elsewhere.)

When written, the xlog switch instruction (XLogInsert) would switch to a
new xlog, just as if a file had been filled, causing the old file to be
immediately archived. On WAL replay, ReadRecord would read the
instruction, then react by moving to the next file, as if it had
naturally reached EOF.
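
The replay side of this proposal can be illustrated with a toy model. The sketch below is Python, not PostgreSQL code: each WAL file is modeled as a plain list of records, the marker value reuses the XLOG_FILE_SWITCH name from the proposal, and the `replay` function is hypothetical.

```python
XLOG_FILE_SWITCH = "XLOG_FILE_SWITCH"  # marker name taken from the proposal


def replay(segments):
    """Yield data records in order, honoring switch markers.

    `segments` is a list of lists; each inner list stands in for one WAL
    file. Anything after a switch marker in a file is treated as unused
    space and is never replayed.
    """
    for segment in segments:
        for record in segment:
            if record == XLOG_FILE_SWITCH:
                break  # behave as if EOF was reached; go to the next file
            yield record
```

For example, `list(replay([["a", "b", XLOG_FILE_SWITCH, "stale"], ["c"]]))` yields the records a, b and c, and the leftover entry after the marker is skipped.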

The wal file could be truncated after the log switch record, though I'd
want to make sure that didn't cause other problems. That is additional
functionality that I would add later when the above all works...

That would be initiated through a single function pg_walfile_switch()
which would be called from
1) pg_stop_backup()
2) by user command
3) at a specified timeout within archiver (already built in)

A shutdown checkpoint would also have the same effect as an
XLOG_FILE_SWITCH instruction, so that the archiver would be able to copy
away the file. Otherwise, we'd have a problem as to which order to write
the messages in at shutdown time. (Not happy about that bit, so
suggestions welcome...)

I'd suggest this as a backpatch for 8.0.x, when completed. I'll commit
to doing this in time for 8.1, possibly sooner.

Comments?

Best Regards, Simon Riggs

#31Simon Riggs
simon@2ndquadrant.com
In reply to: Bruce Momjian (#28)
Re: Problem with PITR recovery

On Mon, 2005-04-18 at 13:41 -0400, Bruce Momjian wrote:

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I guess I didn't see the connection between the file system backup and
the WAL files, when in fact you need the WAL files that go with the file
system backup to do the recovery. Do you have new suggested text?

I think it probably needs to mention *both* the tar dump and the WAL
segment file(s). I can take a whack at it if you like.

I modified the sentence to say:

Once you have safely archived the file system backup and the WAL segment
files used during the backup (as specified in the backup history file),
you can delete all archived WAL segments with names numerically less.

Feel free to whack it a second time.

whack...

...you can delete all archived WAL segments with names numerically
less.

but I'm not sure it's best practice to delete them at that point. I
would recommend that users keep at least the last 3 backups. So, I'd
prefer the wording

...all archived WAL segments with names numerically less will no longer
be needed as part of that backup set. You may delete them at that point,
though you should consider keeping more than one backup set to be
absolutely certain that you can recover your data.

Best Regards, Simon Riggs

#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#30)
Re: Problem with PITR recovery

Simon Riggs <simon@2ndquadrant.com> writes:

The wal file could be truncated after the log switch record, though I'd
want to make sure that didn't cause other problems.

Which it would: that would break WAL file recycling.

That would be initiated through a single function pg_walfile_switch()
which would be called from
1) pg_stop_backup()
2) by user command
3) at a specified timeout within archiver (already built in)

I would really, really, like NOT to have a user command for this.
(If pg_stop_backup does it, that already provides an out for anyone
who thinks they need to invoke it manually.)

A shutdown checkpoint would also have the same effect as an
XLOG_FILE_SWITCH instruction, so that the archiver would be able to copy
away the file.

The archiver is stopped before we do the shutdown, no?

I'd suggest this as a backpatch for 8.0.x, when completed.

Not a chance --- it's a new feature, not a bug fix, and has substantial
risk of breaking things.

regards, tom lane

#33Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#32)
Re: Problem with PITR recovery

On Mon, 2005-04-18 at 19:21 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

The wal file could be truncated after the log switch record, though I'd
want to make sure that didn't cause other problems.

Which it would: that would break WAL file recycling.

Yeh, there's just too many references to the file length for comfort.

That would be initiated through a single function pg_walfile_switch()
which would be called from
1) pg_stop_backup()
2) by user command
3) at a specified timeout within archiver (already built in)

I would really, really, like NOT to have a user command for this.
(If pg_stop_backup does it, that already provides an out for anyone
who thinks they need to invoke it manually.)

Actually, me too. Never saw the need for the Oracle command myself.

A shutdown checkpoint would also have the same effect as an
XLOG_FILE_SWITCH instruction, so that the archiver would be able to copy
away the file.

The archiver is stopped before we do the shutdown, no?

Currently, the bgwriter issues the Shutdown checkpoint and the archiver
is always stopped after the bgwriter has issued the checkpoint and quit.
It should be possible to send archiver a signal to attempt any remaining
archiving before shutdown.

Of course, this behaviour would only be initiated when
XLogArchivingActive() is true, since it makes no sense otherwise.

I'd suggest this as a backpatch for 8.0.x, when completed.

Not a chance --- it's a new feature, not a bug fix, and has substantial
risk of breaking things.

No problem for me personally; I only request it, according to users'
wishes.

Best Regards, Simon Riggs

#34Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#32)
Re: Problem with PITR recovery

Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

The wal file could be truncated after the log switch record, though I'd
want to make sure that didn't cause other problems.

Which it would: that would break WAL file recycling.

Good point. I don't see non-full WAL archiving as a problem for the
backup or shutdown, but I do see an issue with doing archives every X
seconds. If someone sets that really low (and someone will) we could
easily fill the disk. However, rather than do it ourselves, maybe we
should make it visible to administrators so they know exactly what is
happening and can undo it in case they need to recover, something like:

archive_command = 'gzip <%p >%f'

so the compression is done in a way that is visible to the
administrator.
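
A compressing archive step could also live in a small script. The following Python sketch is an illustration only: the archive directory, the function name and the ".partial" suffix are assumptions. It relies on the documented archive_command conventions that %p is replaced by the path of the file to archive, %f by the file name only, and that the command must return zero only on success.

```python
import gzip
import os
import shutil

ARCHIVE_DIR = "/mnt/server/archivedir"  # hypothetical archive location


def archive_segment(wal_path, wal_name, archive_dir=ARCHIVE_DIR):
    """Compress one WAL segment into archive_dir; return 0 on success.

    wal_path corresponds to %p and wal_name to %f.
    """
    target = os.path.join(archive_dir, wal_name + ".gz")
    if os.path.exists(target):
        return 1  # refuse to overwrite an existing archive file
    partial = target + ".partial"
    with open(wal_path, "rb") as src, gzip.open(partial, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.replace(partial, target)  # publish only a complete file
    return 0
```

A wrapper that calls `archive_segment(sys.argv[1], sys.argv[2])` and exits with its return value could then be named in archive_command with `%p %f` as arguments. The matching restore step would have to decompress the file again, which keeps the compression visible to the administrator.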

#35Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Simon Riggs (#31)
Re: Problem with PITR recovery

Simon Riggs wrote:

On Mon, 2005-04-18 at 13:41 -0400, Bruce Momjian wrote:

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I guess I didn't see the connection between the file system backup and
the WAL files, when in fact you need the WAL files that go with the file
system backup to do the recovery. Do you have new suggested text?

I think it probably needs to mention *both* the tar dump and the WAL
segment file(s). I can take a whack at it if you like.

I modified the sentence to say:

Once you have safely archived the file system backup and the WAL segment
files used during the backup (as specified in the backup history file),
you can delete all archived WAL segments with names numerically less.

Feel free to whack it a second time.

whack...

...you can delete all archived WAL segments with names numerically
less.

but I'm not sure it's best practice to delete them at that point. I
would recommend that users keep at least the last 3 backups. So, I'd
prefer the wording

...all archived WAL segments with names numerically less will no longer
be needed as part of that backup set. You may delete them at that point,
though you should consider keeping more than one backup set to be
absolutely certain that you can recover your data.

OK, new wording:

Once you have safely archived the file system backup and the WAL segment
files used during the backup (as specified in the backup history file),
all archived WAL segments with names numerically less are no longer
needed to recover the file system backup and may be deleted. However,
you should consider keeping several backup sets to be absolutely certain
that you can recover your data. Keep in mind that only completed WAL
segment files are archived, so there will be a delay between running
<function>pg_stop_backup</> and the archiving of all WAL segment files
needed to make the file system backup consistent.
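
The "names numerically less" rule can be sketched in Python. This is an illustration under stated assumptions: the function name is hypothetical, and it assumes the backup history file is named after the first WAL segment the backup needs (for example 0000000100001234000055CD.007C9330.backup). Segment names are 24 uppercase hex digits, so plain string comparison orders them numerically; the first 8 digits are the timeline ID, and the sketch only considers segments on the same timeline as the backup.

```python
import re

SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def removable_segments(archived_names, backup_history_name):
    """Return archived WAL segment names that precede the backup's first segment."""
    first_needed = backup_history_name.split(".", 1)[0]
    if not SEGMENT_RE.match(first_needed):
        raise ValueError("unexpected backup history file name")
    timeline = first_needed[:8]
    return sorted(
        name
        for name in archived_names
        # Only plain segment files on the same timeline are candidates;
        # history files and other names are never returned.
        if SEGMENT_RE.match(name)
        and name[:8] == timeline
        and name < first_needed
    )
```

The function only reports candidates. Whether to delete them is a separate decision, given the advice above about keeping several backup sets.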

#36Oleg Bartunov
oleg@sai.msu.su
In reply to: Simon Riggs (#31)
Re: Problem with PITR recovery

On Mon, 18 Apr 2005, Simon Riggs wrote:

On Mon, 2005-04-18 at 13:41 -0400, Bruce Momjian wrote:

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I guess I didn't see the connection between the file system backup and
the WAL files, when in fact you need the WAL files that go with the file
system backup to do the recovery. Do you have new suggested text?

I think it probably needs to mention *both* the tar dump and the WAL
segment file(s). I can take a whack at it if you like.

I modified the sentence to say:

Once you have safely archived the file system backup and the WAL segment
files used during the backup (as specified in the backup history file),
you can delete all archived WAL segments with names numerically less.

Feel free to whack it a second time.

whack...

...you can delete all archived WAL segments with names numerically
less.

but I'm not sure it's best practice to delete them at that point. I
would recommend that users keep at least the last 3 backups. So, I'd
prefer the wording

...all archived WAL segments with names numerically less will no longer
be needed as part of that backup set. You may delete them at that point,
though you should consider keeping more than one backup set to be
absolutely certain that you can recover your data.

I see that the clear and deterministic procedure of online backup that I
imagined earlier has become fuzzy and blurred :) This is obviously not
suitable even for my notebook.

Best Regards, Simon Riggs

Regards,
Oleg

#37Oleg Bartunov
oleg@sai.msu.su
In reply to: Simon Riggs (#33)
Re: Problem with PITR recovery

On Tue, 19 Apr 2005, Simon Riggs wrote:

I'd suggest this as a backpatch for 8.0.x, when completed.

Not a chance --- it's a new feature, not a bug fix, and has substantial
risk of breaking things.

No problem for me personally; I only request it, according to users'
wishes.

Users want a deterministic procedure for online backup. Well, it should be
at least clearly documented and explained.

Best Regards, Simon Riggs

Regards,
Oleg

#38Simon Riggs
simon@2ndquadrant.com
In reply to: Bruce Momjian (#34)
Re: Problem with PITR recovery

On Mon, 2005-04-18 at 21:25 -0400, Bruce Momjian wrote:

Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

The wal file could be truncated after the log switch record, though I'd
want to make sure that didn't cause other problems.

Which it would: that would break WAL file recycling.

Good point. I don't see non-full WAL archiving as a problem for the
backup or shutdown, but I do see an issue with doing archives every X
seconds. If someone sets that really low (and someone will) we could
easily fill the disk.

The disk would only fill if the archiver doesn't keep up with
transmitting xlog files to the archive. The archive can fill up if it is
not correctly sized, even now. Switching log files every N seconds would
at least give a very predictable archive sizing calculation which should
actually work against users sizing their archives poorly.

However, rather than do it ourselves, maybe we
should make it visible to administrators so they know exactly what is
happening and can undo it in case they need to recover, something like:

archive_command = 'gzip <%p >%f'

so the compression is done in a way that is visible to the
administrator.

As long as we tell them there's more than one way to do it. Many tape
drives offer hardware compression, for example, so there would be no
gain in doing this twice.

Best Regards, Simon Riggs

#39Simon Riggs
simon@2ndquadrant.com
In reply to: Oleg Bartunov (#36)
Re: Problem with PITR recovery

On Tue, 2005-04-19 at 08:55 +0400, Oleg Bartunov wrote:

On Mon, 18 Apr 2005, Simon Riggs wrote:

but I'm not sure it's best practice to delete them at that point. I
would recommend that users keep at least the last 3 backups. So, I'd
prefer the wording

...all archived WAL segments with names numerically less will no longer
be needed as part of that backup set. You may delete them at that point,
though you should consider keeping more than one backup set to be
absolutely certain that you can recover your data.

I see that the clear and deterministic procedure of online backup that I
imagined earlier has become fuzzy and blurred :)

The process is involved and requires strictly observed administration
procedures, just as it does with other database systems. Each of them
has difficulties that need to be surmounted and require much thought to
implement. If PostgreSQL is the first DBMS on which you have attempted
to implement transactional archive recovery then you will definitely
find it hard, just as most Oracle and SQLServer DBAs don't understand
how their log recovery systems work either.

This is obviously not suited even
for my notebook.

That's a pretty silly comment, Oleg.

Since most laptops require portability as the main objective and that
usually requires or at least must frequently expect disconnection from
networks and other peripheral devices such as tape units, then no, the
PITR design isn't suitable in general for laptop use. If you use your
notebook as a production system with online archiving then PITR is
suitable.

PITR was designed to offer data protection for major production systems.
My experience was that these sites would have a reasonable stream of
transactions coming through, making the time between log file switches
somewhat predictable and usually every few minutes. The use case of a
very low transaction rate system was not considered fully since it was
felt that people in that situation would be less bothered to protect
their data with a rigorous backup procedure, leaving the issue we have
been discussing.

If you want recoverability, use PITR. If you choose not to use PITR,
that's fine. If you'd like to help make it better, that's fine too.

Best Regards, Simon Riggs

#40Oleg Bartunov
oleg@sai.msu.su
In reply to: Simon Riggs (#39)
Re: Problem with PITR recovery

On Tue, 19 Apr 2005, Simon Riggs wrote:

On Tue, 2005-04-19 at 08:55 +0400, Oleg Bartunov wrote:

On Mon, 18 Apr 2005, Simon Riggs wrote:

but I'm not sure it's best practice to delete them at that point. I
would recommend that users keep at least the last 3 backups. So, I'd
prefer the wording

...all archived WAL segments with names numerically less will no longer
be needed as part of that backup set. You may delete them at that point,
though you should consider keeping more than one backup set to be
absolutely certain that you can recover your data.

I see that clear and deterministic procedure of online backup as I imagined
earlier becomes fuzzy and blurred :)

The process is involved and requires strictly observed administration
procedures, just as it does with other database systems. Each of them
has difficulties that need to be surmounted and require much thought to
implement. If PostgreSQL is the first DBMS on which you have attempted
to implement transactional archive recovery then you will definitely
find it hard, just as most Oracle and SQLServer DBAs don't understand
how their log recovery systems work either.

This is not an argument! It's a shame that we still don't understand whether
we really have reliable online backup or just hype with a lot of restrictions
and cautions. I'm not an experienced Oracle DBA, but I don't want to be a blind
user. I have read the seminal papers about recovery and I thought I understood
how it should work in our system. I want to be 110% sure before claiming we're
ready to recommend it to our clients. I'm sure there are many experienced
DBAs who also don't understand what we have right now, especially after
this thread.

This is obviously not suited even
for my notebook.

That's a pretty silly comment, Oleg.

Don't be silly, Simon. It was just my reaction!

Since most laptops require portability as the main objective and that
usually requires or at least must frequently expect disconnection from
networks and other peripheral devices such as tape units, then no, the
PITR design isn't suitable in general for laptop use. If you use your
notebook as a production system with online archiving then PITR is
suitable.

PITR was designed to offer data protection for major production systems.
My experience was that these sites would have a reasonable stream of
transactions coming through, making the time between log file switches
somewhat predictable and usually every few minutes. The use case of a
very low transaction rate system was not considered fully since it was
felt that people in that situation would be less bothered to protect
their data with a rigorous backup procedure, leaving the issue we have
been discussing.

If you want recoverability, use PITR. If you choose not to use PITR,
that's fine. If you'd like to help make it better, that's fine too.

These sentences are not fair, Simon. I understand your point, but I want
PostgreSQL to be applicable not just to major production systems.
You forget that before the production stage there is a lot of development and
testing. I don't want anything exotic, and I'm a bit surprised
by your reaction. I don't want to think about how difficult backup is in
Oracle and the other major DBMSs you're so experienced with! I'm a PostgreSQL
user, PostgreSQL is a rather transparent system, and I'd like to have an
understandable recovery process. Now I see all the limitations and cautions and
am waiting for improvements. Nobody is attacking you. I'm a bit disappointed, but
this is what we have.

Best Regards, Simon Riggs

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#41Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Simon Riggs (#38)
Re: Problem with PITR recovery

Simon Riggs wrote:

On Mon, 2005-04-18 at 21:25 -0400, Bruce Momjian wrote:

Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

The wal file could be truncated after the log switch record, though I'd
want to make sure that didn't cause other problems.

Which it would: that would break WAL file recycling.

Good point. I don't see non-full WAL archiving as a problem for the
backup or shutdown, but I do see an issue with doing archives every X
seconds. If someone sets that really low (and someone will) we could
easily fill the disk.

The disk would only fill if the archiver doesn't keep up with
transmitting xlog files to the archive. The archive can fill up if it is
not correctly sized, even now. Switching log files every N seconds would
at least give a very predictable archive sizing calculation which should
actually work against users sizing their archives poorly.

I was thinking of the archiver filling because of lots of almost-empty
16mb files. If you archive every five seconds, it is 11 Gigs/hour,
which is not too bad, I guess, but I would bet compression would save
space and I/O load too.

However, rather than do it ourselves, maybe we
should make it visible to administrators so they know exactly what is
happening and can undo it in case they need to recover, something like:

archive_command = 'gzip <%p >%f'

so the compression is done in a way that is visible to the
administrator.

As long as we tell them there's more than one way to do it. Many tape
drives offer hardware compression, for example, so there would be no
gain in doing this twice.

Good point. I am thinking 'gzip --fast' would be the best option for
copies to another file system. I see about 0.6 seconds to compress a
16mb WAL file here and I get 16x compression.
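
The compression claim is easy to sanity-check without a live server. The sketch below is not a measurement from this thread: it builds a stand-in 16 MB segment whose used part is incompressible and whose unused tail is zero-filled, then compresses it at zlib level 1, which is roughly comparable to 'gzip --fast'. The zero-filled tail is an assumption; a recycled segment may still hold stale data and compress far less well.

```python
import random
import zlib

SEGMENT_SIZE = 16 * 1024 * 1024  # 16 MB WAL segment, as discussed in the thread


def make_mostly_empty_segment(used_bytes: int, seed: int = 0) -> bytes:
    """Build a stand-in segment: incompressible data followed by zero padding."""
    rng = random.Random(seed)
    used = bytes(rng.getrandbits(8) for _ in range(used_bytes))
    return used + bytes(SEGMENT_SIZE - used_bytes)


def compression_ratio(segment: bytes, level: int = 1) -> float:
    """Ratio of original to compressed size at the given zlib level."""
    return len(segment) / len(zlib.compress(segment, level))


# Hypothetical case: only the first 64 kB of the segment were written.
segment = make_mostly_empty_segment(used_bytes=64 * 1024)
ratio = compression_ratio(segment)
print(f"{ratio:.0f}x smaller")
```

The ratio depends almost entirely on how much of the segment holds real data, so numbers from a busy system will differ.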

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#41)
Re: Problem with PITR recovery

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I was thinking of the archiver filling because of lots of almost-empty
16mb files. If you archive every five seconds, it is 11 Gigs/hour,
which is not too bad, I guess, but I would bet compression would save
space and I/O load too.

If you wanted to archive every few seconds, it would be worth cutting
the size of the segment files. At the moment I believe the segment
size is a pg_config_manual.h configuration item. Not sure if it would
be practical to make it run-time configurable, but in any case doing that
would help a lot for people who want short archive cycles.

But really, if that is the concern, I'd think you'd want Slony or some
other near-real-time replication mechanism. PITR is designed for people
for whom some-small-number-of-minutes is close enough.
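
The "11 Gigs/hour" figure and the effect of a smaller segment size can be reproduced with a little arithmetic. This is a minimal sketch that assumes one full-size, uncompressed segment is archived at every switch; the 4 MB and 1 MB sizes are hypothetical values chosen for comparison, not settings proposed in this thread.

```python
def archive_mb_per_hour(segment_mb: float, switch_interval_s: float) -> float:
    """Uncompressed archive volume if one full-size segment is shipped per switch."""
    switches_per_hour = 3600 / switch_interval_s
    return segment_mb * switches_per_hour


for segment_mb in (16, 4, 1):
    mb = archive_mb_per_hour(segment_mb, switch_interval_s=5)
    print(f"{segment_mb:>2} MB segments every 5 s: {mb:,.0f} MB/hour ({mb / 1024:.2f} GB/hour)")
```

Archive volume scales linearly with segment size at a fixed switch interval, which is why a smaller segment helps short archive cycles.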

regards, tom lane

#43Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Bruce Momjian (#41)
Re: Problem with PITR recovery

On Tue, Apr 19, 2005 at 11:05:32AM -0400, Bruce Momjian wrote:

Simon Riggs wrote:

The disk would only fill if the archiver doesn't keep up with
transmitting xlog files to the archive. The archive can fill up if it is
not correctly sized, even now. Switching log files every N seconds would
at least give a very predictable archive sizing calculation which should
actually work against users sizing their archives poorly.

I was thinking of the archiver filling because of lots of almost-empty
16mb files. If you archive every five seconds, it is 11 Gigs/hour,
which is not too bad, I guess, but I would bet compression would save
space and I/O load too.

I suggested back then that some command to replace an archive could be
provided. So some people could use rsync to update the older version of
the XLog file to the new state. Non-rsync enabled people could use a
temporary file to copy the new file, and then rename to the original
XLog name, substituting the older version. And as a third way, maybe we
can come up with a sort-of-xdelta that would only update the yet-unused
portion of the old xlog file to the new content. (Maybe this could be
made to work with tape.)
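
The "temporary file, then rename" variant could look roughly like the sketch below. It is only an illustration of the idea, not an existing PostgreSQL facility: the function name and directory layout are hypothetical, and it assumes the archive is an ordinary directory on a POSIX filesystem (so it does not apply to tape).

```python
import os
import shutil
import tempfile


def replace_archived_segment(src_path: str, archive_dir: str) -> str:
    """Copy src_path into archive_dir under its own name, replacing any older copy.

    The data is written to a temporary file in the same directory first, so a
    reader of the archive sees either the old copy or the new one, never a
    half-written file.
    """
    final_path = os.path.join(archive_dir, os.path.basename(src_path))
    fd, tmp_path = tempfile.mkstemp(dir=archive_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp, open(src_path, "rb") as src:
            shutil.copyfileobj(src, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before renaming
        os.replace(tmp_path, final_path)  # atomic on POSIX within one filesystem
    except BaseException:
        # Clean up the temporary file if the copy or rename failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

Creating the temporary file inside the archive directory keeps the rename on one filesystem, which is what makes the replacement atomic.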

Everyone here said that there was no need for such a thing because it
would complicate matters.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)

#44Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Alvaro Herrera (#43)
Re: Problem with PITR recovery

Alvaro Herrera wrote:

On Tue, Apr 19, 2005 at 11:05:32AM -0400, Bruce Momjian wrote:

Simon Riggs wrote:

The disk would only fill if the archiver doesn't keep up with
transmitting xlog files to the archive. The archive can fill up if it is
not correctly sized, even now. Switching log files every N seconds would
at least give a very predictable archive sizing calculation which should
actually work against users sizing their archives poorly.

I was thinking of the archiver filling because of lots of almost-empty
16mb files. If you archive every five seconds, it is 11 Gigs/hour,
which is not too bad, I guess, but I would bet compression would save
space and I/O load too.

I suggested back then that some command to replace an archive could be
provided. So some people could use rsync to update the older version of
the XLog file to the new state. Non-rsync enabled people could use a
temporary file to copy the new file, and then rename to the original
XLog name, substituting the older version. And as a third way, maybe we
can come up with a sort-of-xdelta that would only update the yet-unused
portion of the old xlog file to the new content. (Maybe this could be
made to work with tape.)

Everyone here said that there was no need for such a thing because it
would complicate matters.

I do think we are going to need to go in that direction. I think the
problem is that we didn't have enough time to come up with a clear
solution to this problem so we delayed it for 8.1.

I agree the idea of overwriting is a nice idea and works for everything
but a tape drive, so it has to be optional in some way.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#45Jeff Davis
jdavis-pgsql@empires.org
In reply to: Oleg Bartunov (#40)
Re: Problem with PITR recovery

On Tue, 2005-04-19 at 15:23 +0400, Oleg Bartunov wrote:

This is not an argument! It's a shame that we still don't understand whether
we really have reliable online backup or just hype with a lot of restrictions
and cautions. I'm not an experienced Oracle DBA, but I don't want to be a blind
user. I have read the seminal papers about recovery and I thought I understood
how it should work in our system. I want to be 110% sure before claiming we're
ready to recommend it to our clients. I'm sure there are many experienced
DBAs who also don't understand what we have right now, especially after
this thread.

Unless I misunderstand something, I think you're overreacting a bit. The
failure case is that the machine on which the database resides vaporizes
after you've done "pg_stop_backup()" but before the archiver archives
the WAL segments used during the backup procedure.

In practice, there are many reasons why that is not a major problem. For
example, PITR base backups are often going to be taken when the archiver
is already archiving WAL segments, and you already have a previous,
working base backup. You'd still be able to use that old base backup and
the newly archived WAL segments.

In general, it's just not realistic that you take a machine from having
no backups of any kind to running mission-critical transactions and
depending solely on the PITR backup, and then watch the server vaporize,
all in less time than it takes to archive a few WAL segments.

In almost all cases, the loss in data would be comparable to the loss
experienced by not having the last few WAL segments shipped, and PITR
never made a promise of keeping the transactions that never got
archived.

PITR works, and the developers are:
(1) Improving the current docs to make it absolutely clear how to make
100% assured backups.
(2) Making PITR easier to administer, probably for 8.1.
(3) Adding features to PITR, probably for 8.1.

If what I said above is incorrect, please correct me, because that means
that I'm one of the lost DBAs that Oleg is talking about.

Regards,
Jeff Davis

#46Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Jeff Davis (#45)
Re: Problem with PITR recovery

Jeff Davis wrote:

Unless I misunderstand something, I think you're overreacting a bit. The
failure case is that the machine on which the database resides vaporizes
after you've done "pg_stop_backup()" but before the archiver archives
the WAL segments used during the backup procedure.

In practice, there are many reasons why that is not a major problem. For
example, PITR base backups are often going to be taken when the archiver
is already archiving WAL segments, and you already have a previous,
working base backup. You'd still be able to use that old base backup and
the newly archived WAL segments.

In general, it's just not realistic that you take a machine from having
no backups of any kind to running mission-critical transactions and
depending solely on the PITR backup, and then watch the server vaporize,
all in less time than it takes to archive a few WAL segments.

In almost all cases, the loss in data would be comparable to the loss
experienced by not having the last few WAL segments shipped, and PITR
never made a promise of keeping the transactions that never got
archived.

PITR works, and the developers are:
(1) Improving the current docs to make it absolutely clear how to make
100% assured backups.
(2) Making PITR easier to administer, probably for 8.1.
(3) Adding features to PITR, probably for 8.1.

You are right. The problem we really had was that the documentation
didn't mention the restrictions, and it said you could remove the old
archived WAL files once you did pg_stop_backup(). That has been
corrected and the new documentation will be in 8.0.3. I will mention
the PITR documentation clarification in the release notes for 8.0.3.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#47Oleg Bartunov
oleg@sai.msu.su
In reply to: Jeff Davis (#45)
Re: Problem with PITR recovery

On Tue, 19 Apr 2005, Jeff Davis wrote:

Unless I misunderstand something, I think you're overreacting a bit. The

You're right. It's all emotions :)

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#48Klaus Naumann
lists@distinctmind.de
In reply to: Simon Riggs (#33)
Re: Problem with PITR recovery

Hi Simon,

Actually, me too. Never saw the need for the Oracle command myself.

It actually has. If you want to move your redo logs to a new disk, you
create a new redo log file and then issue an ALTER SYSTEM SWITCH LOGFILE;
to switch to the new logfile. Then you can remove the "old" one
(speaking just of one file for simplification).
Waiting on that event could take ages.

Strictly speaking, this doesn't concern postgresql (yet). But if, in the
future, we support user defined (= changing these parameters while the
db is running) redo log locations, sizes and count, we need a function
to switch the logfile manually. Which I think the pg_stop_backup()
hack is not suitable for.

#49Andrew Rawnsley
ronz@ravensfield.com
In reply to: Klaus Naumann (#48)
Re: Problem with PITR recovery

It is also recommended when creating new standby control files, when Oracle
can't automatically expand the data file capacity on a standby like it does
with a live database. Nothing like seeing the 'Didn't restore XXXX from
sufficiently old backup' message when Oracle is confused (which seems to be
most of the time) about what transactions have been applied where.

This, of course, doesn't matter for postgresql. Thank the gods....

On Apr 20, 2005, at 3:28 AM, Klaus Naumann wrote:

Hi Simon,

Actually, me too. Never saw the need for the Oracle command myself.

It actually has. If you want to move your redo logs to a new disk, you
create a new redo log file and then issue an ALTER SYSTEM SWITCH LOGFILE;
to switch to the new logfile. Then you can remove the "old" one
(speaking just of one file for simplification).
Waiting on that event could take ages.

Strictly speaking, this doesn't concern postgresql (yet). But if, in the
future, we support user defined (= changing these parameters while the
db is running) redo log locations, sizes and count, we need a function
to switch the logfile manually. Which I think the pg_stop_backup()
hack is not suitable for.

____________________________

Andrew Rawnsley
Chief Technology Officer
Investor Analytics, LLC
(740) 587-0114
http://www.investoranalytics.com

#50Simon Riggs
simon@2ndquadrant.com
In reply to: Klaus Naumann (#48)
Re: Problem with PITR recovery

On Wed, 2005-04-20 at 09:28 +0200, Klaus Naumann wrote:

Actually, me too. Never saw the need for the Oracle command myself.

It actually has. If you want to move your redo logs to a new disk, you
create a new redo log file and then issue an ALTER SYSTEM SWITCH LOGFILE;
to switch to the new logfile. Then you can remove the "old" one
(speaking just of one file for simplification).
Waiting on that event could take ages.

Strictly speaking, this doesn't concern postgresql (yet). But if, in the
future, we support user defined (= changing these parameters while the
db is running) redo log locations, sizes and count, we need a function
to switch the logfile manually. Which I think the pg_stop_backup()
hack is not suitable for.

Thanks Klaus - I never tried that online.

We're some way away from functionality for online redo location
migration, I agree. Sounds like we'd still be able to do the log switch
as part of that.

Best Regards, Simon Riggs

#51Simon Riggs
simon@2ndquadrant.com
In reply to: Simon Riggs (#30)
Re: Problem with PITR recovery

On Mon, 2005-04-18 at 23:20 +0100, Simon Riggs wrote:

My plan would be to write a special xlog record for xlog switching. This
would be a special processing instruction, rather than a data/redo
instruction. This would be implemented as another xlog info value on
the xlog_redo resource manager function, XLOG_FILE_SWITCH. (xlog_redo
would simply set a variable to be used elsewhere.)

When written, the xlog switch instruction (XLogInsert) would switch to a
new xlog, just as if a file had been filled, causing it to be
immediately archived.

This has been mostly implemented and posted to PATCHES, though I have a
later patch also. There are some points still to discuss.

Setting the pointer seems to work, but there are 3 pointers, each
protected by a separate lock. All of those are designed to be taken and
held independently.

My understanding is that the correct locking order would be:

WALInsertLock
WALWriteLock
info_lck

XLogInsert uses info_lck first, but then checks everything again once it
acquires WALInsertLock. To switch files, we must ensure that nobody can
insert xlrecs with a record pointer higher than the log switch record.
This is different from checkpoints, where a checkpoint record can
actually occur before records which are logically after it; that must
never happen with a log switch else we'd miss them entirely on wal
replay.

Next, from XLogInsert with WALInsertLock held, we wait to acquire
WALWriteLock, since an I/O might be in progress currently. When we have
this, we then issue an XLogWrite, during which we update the record
pointer, which then is propagated through to info_lck.

AFAICS this is the only case of unconditionally acquiring all 3 locks.

Do we agree that this is the correct lock sequence, and if it is, do we
think that this leaves open the chance of deadlock at any stage?
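
The acquisition order listed above can be illustrated with a toy model. In this sketch Python threading locks stand in for the three locks named in the message; the real ones are PostgreSQL-internal primitives, and the helper and function names here are hypothetical. It shows only the ordering discipline (every caller takes the locks in one global order, so two callers can never wait on each other in a cycle). It does not settle whether holding all three at once is acceptable for concurrency, which is the open question.

```python
import threading
from contextlib import ExitStack

# Stand-ins for the three locks named above, in the proposed acquisition order.
LOCK_ORDER = ("WALInsertLock", "WALWriteLock", "info_lck")
locks = {name: threading.Lock() for name in LOCK_ORDER}


def acquire_in_order(*names: str) -> ExitStack:
    """Acquire the requested locks in the global order, whatever order was asked for."""
    stack = ExitStack()
    for name in sorted(set(names), key=LOCK_ORDER.index):
        stack.enter_context(locks[name])
    return stack


def switch_log_file(state: dict) -> None:
    """Toy log switch: hold all three locks while moving to the next segment."""
    with acquire_in_order("info_lck", "WALWriteLock", "WALInsertLock"):
        state["segment"] += 1


state = {"segment": 7}
workers = [threading.Thread(target=switch_log_file, args=(state,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(state["segment"])
```

Leaving the `with` block releases the locks in reverse order of acquisition.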

A shutdown checkpoint would also have the same effect as an
XLOG_FILE_SWITCH instruction, so that the archiver would be able to copy
away the file. Otherwise, we'd have a problem as to which order to write
the messages in at shutdown time. (Not happy about that bit, so
suggestions welcome...)

Treating shutdown checkpoint markers as xlog switches is possible but
gives problems since archive_command is a SUSET variable. On replay we
wouldn't necessarily know whether a shutdown checkpoint was treated as
an xlog switch when it was written, so we'd need to attempt to switch
and look beyond the checkpoint marker, just in case. That makes me
uncomfortable.

Hmmm...

Best Regards, Simon Riggs

#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#51)
Re: Problem with PITR recovery

Simon Riggs <simon@2ndquadrant.com> writes:

AFAICS this is the only case of unconditionally acquiring all 3 locks.

You just lost me ... I think the above is certainly a bad idea from a
concurrency standpoint, and very possibly a deadlock risk.

In any case you are thinking about it the wrong way. It is not
LogwrtResult you want to advance, it is the Insert variables that define
what the current WAL buffer page is.

ISTM the correct approach involves having a special case in XLogInsert:
just after inserting an end-of-file record, forcibly advance to the next
buffer, and set it up to be the first page of the next segment rather
than the next page in sequence. (This is likely best handled as an
extra call to AdvanceXLInsertBuffer that invokes some special-case code
in AdvanceXLInsertBuffer.) You normally only need the WALInsertLock to
do this. After that's complete you can release the insert lock, and
then other operations can proceed while you do an XLogFlush to force out
the remaining dirty WAL buffers for the old segment. Then you're done.
(I think I'd put the XLogFlush in the pg_stop_backup code, not in
XLogInsert proper.)
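
The difference between a normal buffer advance and the forced advance described above comes down to where the insert position lands. The following is a toy model only: positions are plain byte offsets rather than real WAL pointers, the 8192-byte page size is an assumption, and the function names are hypothetical rather than taken from the PostgreSQL source.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # bytes per WAL segment (16 MB in the thread)
PAGE_SIZE = 8192                 # assumed WAL page size for this toy model


def next_page_in_sequence(insert_pos: int) -> int:
    """Normal case: continue at the start of the following page."""
    return (insert_pos // PAGE_SIZE + 1) * PAGE_SIZE


def first_page_of_next_segment(insert_pos: int) -> int:
    """Forced switch: skip the unused tail and start at the next segment boundary."""
    return (insert_pos // SEGMENT_SIZE + 1) * SEGMENT_SIZE


pos = 3 * SEGMENT_SIZE + 5 * PAGE_SIZE + 100   # somewhere inside segment 3
print(next_page_in_sequence(pos) - pos)        # bytes skipped on a normal page advance
print(first_page_of_next_segment(pos) - pos)   # bytes skipped on a forced segment switch
```

In this model everything after the end-of-file record in the old segment is left unused, which is why the old segment can be handed to the archiver at once.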

regards, tom lane

#53Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#51)
Re: Problem with PITR recovery

Simon Riggs <simon@2ndquadrant.com> writes:

Treating shutdown checkpoint markers as xlog switches is possible but
gives problems since archive_command is a SUSET variable. On replay we
wouldn't necessarily know whether a shutdown checkpoint was treated as
an xlog switch when it was written, so we'd need to attempt to switch
and look beyond the checkpoint marker, just in case. That makes me
uncomfortable.

[ Forgot to respond to this part... ]

I think the only safe way to handle that would be to define a shutdown
checkpoint record as being effectively an end-of-file record ALWAYS,
whether archiving or not. This would be rather a problem for initdb,
which would go through a new XLOG segment for each of its multiple
calls to a standalone backend --- on the other hand, it's not real
clear why we couldn't fold initdb down to one bootstrap run and one
plain standalone backend run, which'd cut that problem down to the
point of tolerability.

However, this still begs the question of why we are bothering.
I disagree with the goal in this particular case anyhow: I do not
think it's necessary, safe, nor sane for a shutdown to try to archive
the last XLOG segment. Even if we fixed the xlog mechanism to end the
file there, I really have a problem with the idea that the archiver
should try to start a fresh archiving cycle at shutdown.

regards, tom lane

#54Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#53)
Re: Problem with PITR recovery

On Wed, 2005-04-20 at 15:59 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

Treating shutdown checkpoint markers as xlog switches is possible but
gives problems since archive_command is a SUSET variable. On replay we
wouldn't necessarily know whether a shutdown checkpoint was treated as
an xlog switch when it was written, so we'd need to attempt to switch
and look beyond the checkpoint marker, just in case. That makes me
uncomfortable.

[ Forgot to respond to this part... ]

I think the only safe way to handle that would be to define a shutdown
checkpoint record as being effectively an end-of-file record ALWAYS,
whether archiving or not. This would be rather a problem for initdb,
which would go through a new XLOG segment for each of its multiple
calls to a standalone backend --- on the other hand, it's not real
clear why we couldn't fold initdb down to one bootstrap run and one
plain standalone backend run, which'd cut that problem down to the
point of tolerability.

However, this still begs the question of why we are bothering.

That's a big question :-)

I disagree with the goal in this particular case anyhow: I do not
think it's necessary, safe, nor sane for a shutdown to try to archive
the last XLOG segment. Even if we fixed the xlog mechanism to end the
file there, I really have a problem with the idea that the archiver
should try to start a fresh archiving cycle at shutdown.

Right now, I'm happy to leave that part anyhow...

Best Regards, Simon Riggs

#55Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#52)
Re: Problem with PITR recovery

On Wed, 2005-04-20 at 15:51 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

AFAICS this is the only case of unconditionally acquiring all 3 locks.

You just lost me ... I think the above is certainly a bad idea from a
concurrency standpoint, and very possibly a deadlock risk.

'twas my fear too.

In any case you are thinking about it the wrong way. It is not
LogwrtResult you want to advance, it is the Insert variables that define
what the current WAL buffer page is.

Yes OK, so that way I don't need the 3 locks. Good.

ISTM the correct approach involves having a special case in XLogInsert:
just after inserting an end-of-file record, forcibly advance to the next
buffer, and set it up to be the first page of the next segment rather
than the next page in sequence. (This is likely best handled as an
extra call to AdvanceXLInsertBuffer that invokes some special-case code
in AdvanceXLInsertBuffer.) You normally only need the WALInsertLock to
do this. After that's complete you can release the insert lock, and
then other operations can proceed while you do an XLogFlush to force out
the remaining dirty WAL buffers for the old segment. Then you're done.

Good. That's roughly what I'm attempting now, just advancing the
wrong pointer and struggling with (and worried by) the 3 lock problem.

(I think I'd put the XLogFlush in the pg_stop_backup code, not in
XLogInsert proper.)

That seems like the way it's done elsewhere.

Best Regards, Simon Riggs

#56Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#53)
Re: Problem with PITR recovery

Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

Treating shutdown checkpoint markers as xlog switches is possible but
gives problems since archive_command is a SUSET variable. On replay we
wouldn't necessarily know whether a shutdown checkpoint was treated as
an xlog switch when it was written, so we'd need to attempt to switch
and look beyond the checkpoint marker, just in case. That makes me
uncomfortable.

However, this still begs the question of why we are bothering.
I disagree with the goal in this particular case anyhow: I do not
think it's necessary, safe, nor sane for a shutdown to try to archive
the last XLOG segment. Even if we fixed the xlog mechanism to end the
file there, I really have a problem with the idea that the archiver
should try to start a fresh archiving cycle at shutdown.

Doing the archive at server shutdown eliminates one of the "must
document" items, so the system behaves more predictably than if it does
not. It is not required --- it is a usability issue.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#57Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#56)
Re: Problem with PITR recovery

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

However, this still begs the question of why we are bothering.
I disagree with the goal in this particular case anyhow: I do not
think it's necessary, safe, nor sane for a shutdown to try to archive
the last XLOG segment. Even if we fixed the xlog mechanism to end the
file there, I really have a problem with the idea that the archiver
should try to start a fresh archiving cycle at shutdown.

Doing the archive at server shutdown eliminates one of the "must
document" items, so the system behaves more predictably than if it does
not. It is not required --- it is a usability issue.

No, it just replaces a documentation issue with a reliability issue.
We'd have to consider what to say about the prospect that the archiver
is unable to archive that last segment, is kill -9'd by init at some
critical point in the process, etc etc. I think it's just a bad idea
to promise people that shutting down the postmaster will have any such
effect.

regards, tom lane

#58Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#57)
Re: Problem with PITR recovery

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

However, this still begs the question of why we are bothering.
I disagree with the goal in this particular case anyhow: I do not
think it's necessary, safe, nor sane for a shutdown to try to archive
the last XLOG segment. Even if we fixed the xlog mechanism to end the
file there, I really have a problem with the idea that the archiver
should try to start a fresh archiving cycle at shutdown.

Doing the archive at server shutdown eliminates one of the "must
document" items, so the system behaves more predictably than if it does
not. It is not required --- it is a usability issue.

No, it just replaces a documentation issue with a reliability issue.
We'd have to consider what to say about the prospect that the archiver
is unable to archive that last segment, is kill -9'd by init at some
critical point in the process, etc etc. I think it's just a bad idea
to promise people that shutting down the postmaster will have any such
effect.

OK, makes sense. Could we give them a command to archive it before they
shut down? That would make sense.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#59Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#58)
Re: Problem with PITR recovery

OK, makes sense. Could we give them a command to archive it before they
shut down? That would make sense.

Not if the idea is to be certain you got everything ... I think what we
have to do is document a manual procedure for archiving the last XLOG
file.

But really my question is "what's the use case for this?" ISTM that
on-line backups are what PITR users want, not something involving
shutting down the postmaster --- and the changes Simon is already making
will be enough to handle those cases.

regards, tom lane
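
For illustration, the manual procedure mentioned above could be as small
as copying whatever is left in pg_xlog to a safe place once the postmaster
has stopped, which is what step 2 of the 8.0 recovery procedure asks for.
A minimal sketch in Python, assuming the server is already shut down; the
function name and both directory arguments are hypothetical and not part
of PostgreSQL:

```python
import shutil
from pathlib import Path


def save_pg_xlog_contents(xlog_dir, save_dir):
    """Copy the regular files in xlog_dir into save_dir.

    Assumption: the postmaster is stopped, so no file is still being
    written. Files already present in save_dir are left untouched.
    Returns the sorted list of file names that were copied.
    """
    xlog_dir = Path(xlog_dir)
    save_dir = Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for entry in sorted(xlog_dir.iterdir()):
        if not entry.is_file():
            continue  # skip subdirectories such as archive_status
        target = save_dir / entry.name
        if target.exists():
            continue  # never overwrite a file that was saved earlier
        shutil.copy2(entry, target)  # copy2 also preserves timestamps
        copied.append(entry.name)
    return copied
```

The saved files would then be put back into pg_xlog after the base backup
is restored, so that recovery can replay the segments that never reached
the archive.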

#60Michael Paesold
mpaesold@gmx.at
In reply to: Bruce Momjian (#58)
Re: Problem with PITR recovery

Tom Lane wrote:

Bruce Momjian wrote:

OK, makes sense. Could we give them a command to archive it before they
shut down? That would make sense.

Not if the idea is to be certain you got everything ... I think what we
have to do is document a manual procedure for archiving the last XLOG
file.

What Bruce would want is a way to "stop new transactions, archive and
shutdown", which would do this atomically. Then we could have another
shutdown switch for pg_ctl.

But yea, documentation for a manual procedure would be ok, too, just not
as user friendly.

Best Regards,
Michael Paesold

#61Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Michael Paesold (#60)
Re: Problem with PITR recovery

Michael Paesold wrote:

Tom Lane wrote:

Bruce Momjian wrote:

OK, makes sense. Could we give them a command to archive it before they
shut down? That would make sense.

Not if the idea is to be certain you got everything ... I think what we
have to do is document a manual procedure for archiving the last XLOG
file.

What Bruce would want is a way to "stop new transactions, archive and
shutdown", which would do this atomically. Then we could have another
shutdown switch for pg_ctl.

Yea, probably a separate switch, or an additional switch to pg_ctl would
be best, but then we have to add to pg_ctl.

But yea, documentation for a manual procedure would be ok, too, just not
as user friendly.

Right. I just hate the 'do this, do that' instructions to PITR. When
they get too long/complex, I get worried.

I used to use Informix's ontape, which was a bad user interface because
the admin had to be sure it was always running. Anyway, when you
control-C'ed the process, it would flush out any partially written WAL
file and you knew you had everything.

I am thinking a special pg_ctl flag, and disabling -W for that so you
have to wait for the success message. Of course, we then have to
document the use of the pg_ctl flag.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#62Simon Riggs
simon@2ndquadrant.com
In reply to: Bruce Momjian (#61)
Re: Problem with PITR recovery

On Thu, 2005-04-21 at 08:57 -0400, Bruce Momjian wrote:

Michael Paesold wrote:

Tom Lane wrote:

Bruce Momjian wrote:

OK, makes sense. Could we give them a command to archive it before they
shut down? That would make sense.

Not if the idea is to be certain you got everything ... I think what we
have to do is document a manual procedure for archiving the last XLOG
file.

What Bruce would want is a way to "stop new transactions, archive and
shutdown", which would do this atomically. Then we could have another
shutdown switch for pg_ctl.

Yea, probably a separate switch, or an additional switch to pg_ctl would
be best, but then we have to add to pg_ctl.

But yea, documentation for a manual procedure would be ok, too, just not
as user friendly.

Right. I just hate the 'do this, do that' instructions to PITR. When
they get too long/complex, I get worried.

I am thinking a special pg_ctl flag, and disabling -W for that so you
have to wait for the success message. Of course, we then have to
document the use of the pg_ctl flag.

I'll write the log switch, you decide when/how to invoke it.

My head hurts.

Best Regards, Simon Riggs