Re: problems using pg_start_backup/pg_stop_backup and pg_basebackup at same time
Thanks for the response Magnus.
I've replied inline below.
On Mon, Jun 20, 2016 at 4:17 AM, Magnus Hagander <magnus@hagander.net>
wrote:
On Mon, Jun 13, 2016 at 6:59 PM, Alex Malek <magicagent@gmail.com> wrote:
I am experiencing two problems
1) pg_stop_backup hangs until pg_basebackup finishes
2) WAL contains references to invalid pages(I suspect when 1. is fixed I won't experience 2. )
Full description below:
postgresql: 9.3.13
My situation is I create a writable report database every day using
pg_start_backup() / rsync / pg_stop_backup()
(whole process takes 1-2 hours, the rsync actually copies the data dir of
a slave/warm spare DB)Taking what is called an exclusive backup, which is what pg_start_backup()
does, is not supported off a standby node. Only pg_basebackup is supported
against a standby node.Can you reproduce your errors if you make this backup from the master?
pg_start_backup() is run on the master.
The rsync copies the data dir of the standby b/c it is local.
I have tried rsyncing from the master w/ the same results.
Also once a week I create a database for backup/archive purposes using
pg_basebackup
(whole process takes about 13 hours)These two processes used to be able to coincide until recently.
Recent changes include a major debian upgrade and a minor version of
postgres upgrade from 9.3.10.Now when the report sync occurs during the pg_basebackup (w/
--xlog-method=stream option) the pg_stop_backup() hangs until
the the pg_basebackup has completed (4 hours later).Do you also have an archive_command, and what is it? Normally,
pg_stop_backup() blocks on the archive command - so perhaps you have ended
up with a dependency between that one and the base backup command somehow?
The archive command is:
archive_command = 'rsync -a %p postgres@archivedb
:/storage/postgres/9.3/main-archive/pg_xlog/%f'
I thought the archive command was working properly while pg_stop_backup was
hanging based on the timestamps of the files, but upon closer
inspection using stat I see that the files actually were not copied over
until pg_stop_backup stopped hanging.
Actually it appears that the archive_command is not executed the entire
time pg_basebackup is running.
pg_basebackup is started about 5 hours before pg_start_backup is executed.
Based on stat it looks like the wal files get written locally but do not
get rsynced until pg_basebackup is done.
A labeled WAL backup file is created when the pg_stop_backup() is first
executed and another
is created when the pg_basebackup completes.While the pg_stop_backup() hangs the following appears in the logs:
2016-06-11 07:50:45 EDT: WARNING: pg_stop_backup still waiting for all
required WAL segments to be archived (7680 seconds elapsed)
2016-06-11 07:50:45 EDT: HINT: Check that your archive_command is
executing properly. pg_stop_backup can be canceled safely, but the
database backup will not be usable without all the WAL segments.until eventually:
2016-06-11 12:59:07 EDT:LOG: duration: 26190082.497 ms statement:
SELECT pg_stop_backup()The archive command appears to work as WAL files are being archived
successfully with timestamps that occur while pg_stop_backup is hanging.-rw------- 1 postgres postgres 314 Jun 11 12:56
000000030000109200000066.00000028.backup
-rw------- 1 postgres postgres 16M Jun 11 12:56 000000030000109300000029
-rw------- 1 postgres postgres 16M Jun 11 12:54 000000030000109300000028
-rw------- 1 postgres postgres 16M Jun 11 12:52 000000030000109300000027
<snip>
-rw------- 1 postgres postgres 16M Jun 11 06:22 0000000300001092000000F1
-rw------- 1 postgres postgres 335 Jun 11 05:42
0000000300001092000000D4.00000028.backup
-rw------- 1 postgres postgres 16M Jun 11 05:42 0000000300001092000000F0Content of 0000000300001092000000D4.00000028.backup:
START WAL LOCATION: 1092/D4000028 (file 0000000300001092000000D4)
STOP WAL LOCATION: 1092/F008EA10 (file 0000000300001092000000F0)
CHECKPOINT LOCATION: 1092/D4004E08
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2016-06-11 04:51:04 EDT
LABEL: reporting 2016-06-11 04:51:03.892804-04
STOP TIME: 2016-06-11 05:42:37 EDTThis file shows that the backup took about an hour, doesn't it? So it
doesn't look like it was waiting?Or are you saying the wait happens *after* the .backup file has been
dropped in the archives?
The .backup file gets written locally in a timely manner but doesn't get
dropped into the archive until after the
pg_stop_backup() stops hanging.
So it appears that pg_basebackup ( perhaps --xlog-method=stream ? ) is
causing the archive_command to pause which in turn
causes pg_stop_backup() to hang?
Thanks.
Alex
Should I file a bug?
Does anyone know of a workaround?
I could try running pg_basebackup w/ --xlog-method=fetch and see if that
helps but I do like knowing that the base backup will have all needed WAL
files regardless of WAL retention policy on master.
Thanks.
Alex
Recap of what I *think* is going on:
1) pg_basebackup w/ --xlog-method=stream is started
2) archive_command is not executed while pg_basebackup() is running
3) while pg_basebackup() is running pg_start_backup() is executed
4) Later pg_stop_backup() is executed.
5) pg_stop_backup() hangs related to the archive_command not running
6) hours later when pg_basebackup finishes, back logged archive commands
get executed then pg_stop_backup() finishes
On Mon, Jun 20, 2016 at 10:15 AM, Alex Malek <magicagent@gmail.com> wrote:
Show quoted text
Thanks for the response Magnus.
I've replied inline below.On Mon, Jun 20, 2016 at 4:17 AM, Magnus Hagander <magnus@hagander.net>
wrote:On Mon, Jun 13, 2016 at 6:59 PM, Alex Malek <magicagent@gmail.com> wrote:
I am experiencing two problems
1) pg_stop_backup hangs until pg_basebackup finishes
2) WAL contains references to invalid pages(I suspect when 1. is fixed I won't experience 2. )
Full description below:
postgresql: 9.3.13
My situation is I create a writable report database every day using
pg_start_backup() / rsync / pg_stop_backup()
(whole process takes 1-2 hours, the rsync actually copies the data dir
of a slave/warm spare DB)Taking what is called an exclusive backup, which is what
pg_start_backup() does, is not supported off a standby node. Only
pg_basebackup is supported against a standby node.Can you reproduce your errors if you make this backup from the master?
pg_start_backup() is run on the master.
The rsync copies the data dir of the standby b/c it is local.
I have tried rsyncing from the master w/ the same results.Also once a week I create a database for backup/archive purposes using
pg_basebackup
(whole process takes about 13 hours)These two processes used to be able to coincide until recently.
Recent changes include a major debian upgrade and a minor version of
postgres upgrade from 9.3.10.Now when the report sync occurs during the pg_basebackup (w/
--xlog-method=stream option) the pg_stop_backup() hangs until
the the pg_basebackup has completed (4 hours later).Do you also have an archive_command, and what is it? Normally,
pg_stop_backup() blocks on the archive command - so perhaps you have ended
up with a dependency between that one and the base backup command somehow?The archive command is:
archive_command = 'rsync -a %p postgres@archivedb
:/storage/postgres/9.3/main-archive/pg_xlog/%f'I thought the archive command was working properly while pg_stop_backup
was hanging based on the timestamps of the files, but upon closer
inspection using stat I see that the files actually were not copied over
until pg_stop_backup stopped hanging.Actually it appears that the archive_command is not executed the entire
time pg_basebackup is running.
pg_basebackup is started about 5 hours before pg_start_backup is executed.
Based on stat it looks like the wal files get written locally but do not
get rsynced until pg_basebackup is done.A labeled WAL backup file is created when the pg_stop_backup() is first
executed and another
is created when the pg_basebackup completes.While the pg_stop_backup() hangs the following appears in the logs:
2016-06-11 07:50:45 EDT: WARNING: pg_stop_backup still waiting for all
required WAL segments to be archived (7680 seconds elapsed)
2016-06-11 07:50:45 EDT: HINT: Check that your archive_command is
executing properly. pg_stop_backup can be canceled safely, but the
database backup will not be usable without all the WAL segments.until eventually:
2016-06-11 12:59:07 EDT:LOG: duration: 26190082.497 ms statement:
SELECT pg_stop_backup()The archive command appears to work as WAL files are being archived
successfully with timestamps that occur while pg_stop_backup is hanging.-rw------- 1 postgres postgres 314 Jun 11 12:56
000000030000109200000066.00000028.backup
-rw------- 1 postgres postgres 16M Jun 11 12:56 000000030000109300000029
-rw------- 1 postgres postgres 16M Jun 11 12:54 000000030000109300000028
-rw------- 1 postgres postgres 16M Jun 11 12:52 000000030000109300000027
<snip>
-rw------- 1 postgres postgres 16M Jun 11 06:22 0000000300001092000000F1
-rw------- 1 postgres postgres 335 Jun 11 05:42
0000000300001092000000D4.00000028.backup
-rw------- 1 postgres postgres 16M Jun 11 05:42 0000000300001092000000F0Content of 0000000300001092000000D4.00000028.backup:
START WAL LOCATION: 1092/D4000028 (file 0000000300001092000000D4)
STOP WAL LOCATION: 1092/F008EA10 (file 0000000300001092000000F0)
CHECKPOINT LOCATION: 1092/D4004E08
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2016-06-11 04:51:04 EDT
LABEL: reporting 2016-06-11 04:51:03.892804-04
STOP TIME: 2016-06-11 05:42:37 EDTThis file shows that the backup took about an hour, doesn't it? So it
doesn't look like it was waiting?Or are you saying the wait happens *after* the .backup file has been
dropped in the archives?The .backup file gets written locally in a timely manner but doesn't get
dropped into the archive until after the
pg_stop_backup() stops hanging.So it appears that pg_basebackup ( perhaps --xlog-method=stream ? ) is
causing the archive_command to pause which in turn
causes pg_stop_backup() to hang?Thanks.
Alex
On Mon, Jun 27, 2016 at 8:27 PM, Alex Malek <magicagent@gmail.com> wrote:
Should I file a bug?
Oh, sorry. I did not spot that this was posted to pgsql-admin, which has a
lot fewer readers. I've moved it over to -hackers where more people will
see it.
(And also, please don't top-post on these lists, as it makes the discussion
much harder to follow)
Does anyone know of a workaround?
I could try running pg_basebackup w/ --xlog-method=fetch and see if that
helps but I do like knowing that the base backup will have all needed WAL
files regardless of WAL retention policy on master.Thanks.
AlexRecap of what I *think* is going on:
1) pg_basebackup w/ --xlog-method=stream is started
2) archive_command is not executed while pg_basebackup() is running
There is no reason why archive_command should not run during a
pg_basebackup run. Do you get anything at all in the logfile indicating
that this would be the case? If you look at the ps output, does the
archiver process indicate what it's doing, and does it have a sub-process
for your rsync command?
Is there any chance that for example the network or other part of the
system gets saturated and the archive_command is simply runniung too slow
to keep up?
3) while pg_basebackup() is running pg_start_backup() is executed
4) Later pg_stop_backup() is executed.
5) pg_stop_backup() hangs related to the archive_command not running
6) hours later when pg_basebackup finishes, back logged archive commands
get executed then pg_stop_backup() finishes
This would be consistent with (2), but there is no reason why (2) would
happen because of pg_basebackup unless there's an actual bug there.
//Magnus
On Mon, Jun 20, 2016 at 10:15 AM, Alex Malek <magicagent@gmail.com> wrote:
Thanks for the response Magnus.
I've replied inline below.On Mon, Jun 20, 2016 at 4:17 AM, Magnus Hagander <magnus@hagander.net>
wrote:On Mon, Jun 13, 2016 at 6:59 PM, Alex Malek <magicagent@gmail.com>
wrote:I am experiencing two problems
1) pg_stop_backup hangs until pg_basebackup finishes
2) WAL contains references to invalid pages(I suspect when 1. is fixed I won't experience 2. )
Full description below:
postgresql: 9.3.13
My situation is I create a writable report database every day using
pg_start_backup() / rsync / pg_stop_backup()
(whole process takes 1-2 hours, the rsync actually copies the data dir
of a slave/warm spare DB)Taking what is called an exclusive backup, which is what
pg_start_backup() does, is not supported off a standby node. Only
pg_basebackup is supported against a standby node.Can you reproduce your errors if you make this backup from the master?
pg_start_backup() is run on the master.
The rsync copies the data dir of the standby b/c it is local.
I have tried rsyncing from the master w/ the same results.Also once a week I create a database for backup/archive purposes using
pg_basebackup
(whole process takes about 13 hours)These two processes used to be able to coincide until recently.
Recent changes include a major debian upgrade and a minor version of
postgres upgrade from 9.3.10.Now when the report sync occurs during the pg_basebackup (w/
--xlog-method=stream option) the pg_stop_backup() hangs until
the the pg_basebackup has completed (4 hours later).Do you also have an archive_command, and what is it? Normally,
pg_stop_backup() blocks on the archive command - so perhaps you have ended
up with a dependency between that one and the base backup command somehow?The archive command is:
archive_command = 'rsync -a %p postgres@archivedb
:/storage/postgres/9.3/main-archive/pg_xlog/%f'I thought the archive command was working properly while pg_stop_backup
was hanging based on the timestamps of the files, but upon closer
inspection using stat I see that the files actually were not copied over
until pg_stop_backup stopped hanging.Actually it appears that the archive_command is not executed the entire
time pg_basebackup is running.
pg_basebackup is started about 5 hours before pg_start_backup is executed.
Based on stat it looks like the wal files get written locally but do not
get rsynced until pg_basebackup is done.A labeled WAL backup file is created when the pg_stop_backup() is first
executed and another
is created when the pg_basebackup completes.While the pg_stop_backup() hangs the following appears in the logs:
2016-06-11 07:50:45 EDT: WARNING: pg_stop_backup still waiting for all
required WAL segments to be archived (7680 seconds elapsed)
2016-06-11 07:50:45 EDT: HINT: Check that your archive_command is
executing properly. pg_stop_backup can be canceled safely, but the
database backup will not be usable without all the WAL segments.until eventually:
2016-06-11 12:59:07 EDT:LOG: duration: 26190082.497 ms statement:
SELECT pg_stop_backup()The archive command appears to work as WAL files are being archived
successfully with timestamps that occur while pg_stop_backup is hanging.-rw------- 1 postgres postgres 314 Jun 11 12:56
000000030000109200000066.00000028.backup
-rw------- 1 postgres postgres 16M Jun 11 12:56 000000030000109300000029
-rw------- 1 postgres postgres 16M Jun 11 12:54 000000030000109300000028
-rw------- 1 postgres postgres 16M Jun 11 12:52 000000030000109300000027
<snip>
-rw------- 1 postgres postgres 16M Jun 11 06:22 0000000300001092000000F1
-rw------- 1 postgres postgres 335 Jun 11 05:42
0000000300001092000000D4.00000028.backup
-rw------- 1 postgres postgres 16M Jun 11 05:42 0000000300001092000000F0Content of 0000000300001092000000D4.00000028.backup:
START WAL LOCATION: 1092/D4000028 (file 0000000300001092000000D4)
STOP WAL LOCATION: 1092/F008EA10 (file 0000000300001092000000F0)
CHECKPOINT LOCATION: 1092/D4004E08
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2016-06-11 04:51:04 EDT
LABEL: reporting 2016-06-11 04:51:03.892804-04
STOP TIME: 2016-06-11 05:42:37 EDTThis file shows that the backup took about an hour, doesn't it? So it
doesn't look like it was waiting?Or are you saying the wait happens *after* the .backup file has been
dropped in the archives?The .backup file gets written locally in a timely manner but doesn't get
dropped into the archive until after the
pg_stop_backup() stops hanging.So it appears that pg_basebackup ( perhaps --xlog-method=stream ? ) is
causing the archive_command to pause which in turn
causes pg_stop_backup() to hang?Thanks.
Alex
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Fri, Jul 1, 2016 at 5:00 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Mon, Jun 27, 2016 at 8:27 PM, Alex Malek <magicagent@gmail.com> wrote:
Should I file a bug?
Oh, sorry. I did not spot that this was posted to pgsql-admin, which has a
lot fewer readers. I've moved it over to -hackers where more people will see
it.(And also, please don't top-post on these lists, as it makes the discussion
much harder to follow)Does anyone know of a workaround?
I could try running pg_basebackup w/ --xlog-method=fetch and see if that
helps but I do like knowing that the base backup will have all needed WAL
files regardless of WAL retention policy on master.Thanks.
AlexRecap of what I *think* is going on:
1) pg_basebackup w/ --xlog-method=stream is started
2) archive_command is not executed while pg_basebackup() is runningThere is no reason why archive_command should not run during a pg_basebackup
run. Do you get anything at all in the logfile indicating that this would be
the case? If you look at the ps output, does the archiver process indicate
what it's doing, and does it have a sub-process for your rsync command?Is there any chance that for example the network or other part of the system
gets saturated and the archive_command is simply runniung too slow to keep
up?3) while pg_basebackup() is running pg_start_backup() is executed
4) Later pg_stop_backup() is executed.
5) pg_stop_backup() hangs related to the archive_command not running
6) hours later when pg_basebackup finishes, back logged archive commands
get executed then pg_stop_backup() finishesThis would be consistent with (2), but there is no reason why (2) would
happen because of pg_basebackup unless there's an actual bug there.
I think there is one dependency of archiver (2) to finish which is
that corresponding .ready file should be generated. Can it be
possible, that due to some reason the .ready file is not generated
corresponding to stoppoint determined by pg_stop_backup()? Is it
possible to verify if the required .ready files are present?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers