PITR potentially broken in 9.2

Started by Jeff Janes over 13 years ago, 61 messages, lists: hackers, bugs

jeff.janes@gmail.com

over 13 years ago


Doing PITR in 9.2.1, the system claims that it reached a consistent
recovery state immediately after redo starts.
This leads to various mysterious failures, when it should instead
throw a "requested recovery stop point is before consistent recovery
point" error.
(If you are unlucky, I think it might even silently start up in a
corrupt state.)

This seems to have been introduced in:
commit 8366c7803ec3d0591cf2d1226fea1fee947d56c3
Author: Simon Riggs <simon@2ndQuadrant.com>
Date: Wed Jan 25 18:02:04 2012 +0000

Allow pg_basebackup from standby node with safety checking.
Base backup follows recommended procedure, plus goes to great
lengths to ensure that partial page writes are avoided.

Jun Ishizuka and Fujii Masao, with minor modifications

the backup file:

START WAL LOCATION: 1/CD89E48 (file 00000001000000010000000C)
STOP WAL LOCATION: 1/1AFA11A0 (file 00000001000000010000001A)
CHECKPOINT LOCATION: 1/188D8120
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2012-11-27 09:40:13 PST
LABEL: label
STOP TIME: 2012-11-27 09:42:10 PST

(The file 00000001000000010000000C was archived at 9:37.)

recovery.conf:
restore_command = 'cp /tmp/archivedir/%f %p'
recovery_target_time = '2012-11-27 09:38:00 PST'

Log file:

22110 2012-11-27 09:49:15.220 PST LOG: database system was
interrupted; last known up at 2012-11-27 09:40:13 PST
22110 2012-11-27 09:49:15.235 PST LOG: starting point-in-time
recovery to 2012-11-27 09:38:00-08
22110 2012-11-27 09:49:15.271 PST LOG: restored log file
"000000010000000100000018" from archive
22110 2012-11-27 09:49:15.367 PST LOG: restored log file
"00000001000000010000000C" from archive
22110 2012-11-27 09:49:15.372 PST LOG: redo starts at 1/CD89E48
22110 2012-11-27 09:49:15.374 PST LOG: consistent recovery state
reached at 1/CD8B7F0
22110 2012-11-27 09:49:15.490 PST LOG: restored log file
"00000001000000010000000D" from archive
22110 2012-11-27 09:49:15.775 PST LOG: restored log file
"00000001000000010000000E" from archive
22110 2012-11-27 09:49:16.078 PST LOG: restored log file
"00000001000000010000000F" from archive
22110 2012-11-27 09:49:16.345 PST LOG: restored log file
"000000010000000100000010" from archive
22110 2012-11-27 09:49:16.533 PST LOG: recovery stopping before
commit of transaction 951967, time 2012-11-27 09:38:00.000689-08
22110 2012-11-27 09:49:16.533 PST LOG: redo done at 1/10F41900
22110 2012-11-27 09:49:16.533 PST LOG: last completed transaction
was at log time 2012-11-27 09:37:59.998496-08
22110 2012-11-27 09:49:16.537 PST LOG: selected new timeline ID: 2
22110 2012-11-27 09:49:16.584 PST LOG: archive recovery complete
22113 2012-11-27 09:49:16.599 PST LOG: checkpoint starting:
end-of-recovery immediate wait
22113 2012-11-27 09:49:17.815 PST LOG: checkpoint complete: wrote
8336 buffers (12.7%); 0 transaction log file(s) added, 0 removed, 0
recycled; write=0.097 s, sync=1.115 s, total=1.230 s; sync files=14,
longest=0.578 s, average=0.079 s
22110 2012-11-27 09:49:17.929 PST FATAL: could not access status of
transaction 1014015
22110 2012-11-27 09:49:17.929 PST DETAIL: Could not read from file
"pg_clog/0000" at offset 245760: Success.
22109 2012-11-27 09:49:17.932 PST LOG: startup process (PID 22110)
exited with exit code 1
22109 2012-11-27 09:49:17.932 PST LOG: terminating any other active
server processes

Cheers,

Jeff

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Noah Misch

noah@leadboat.com

over 13 years ago

In reply to: Jeff Janes (#1)


Re: PITR potentially broken in 9.2

On Tue, Nov 27, 2012 at 10:08:12AM -0800, Jeff Janes wrote:

Doing PITR in 9.2.1, the system claims that it reached a consistent
recovery state immediately after redo starts.
This leads to various mysterious failures, when it should instead
throw a "requested recovery stop point is before consistent recovery
point" error.
(If you are unlucky, I think it might even silently start up in a
corrupt state.)

I observed a similar problem with 9.2. Despite a restore_command that failed
every time, startup from a hot backup completed. At the time, I suspected a
mistake on my part.


Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Noah Misch (#2)


Re: PITR potentially broken in 9.2

On 28.11.2012 06:27, Noah Misch wrote:

On Tue, Nov 27, 2012 at 10:08:12AM -0800, Jeff Janes wrote:

Doing PITR in 9.2.1, the system claims that it reached a consistent
recovery state immediately after redo starts.
This leads to various mysterious failures, when it should instead
throw a "requested recovery stop point is before consistent recovery
point" error.
(If you are unlucky, I think it might even silently start up in a
corrupt state.)

I observed a similar problem with 9.2. Despite a restore_command that failed
every time, startup from a hot backup completed. At the time, I suspected a
mistake on my part.

I believe this was caused by this little typo/thinko:

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6763,7 +6763,7 @@ StartupXLOG(void)
 				/* Pop the error context stack */
 				error_context_stack = errcontext.previous;
 
-				if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint) &&
+				if (!XLogRecPtrIsInvalid(ControlFile->backupEndPoint) &&
 					XLByteLE(ControlFile->backupEndPoint, EndRecPtr))
 				{
 					/*

Normally, backupEndPoint is invalid, and we rely on seeing an
end-of-backup WAL record to mark the location. backupEndPoint is only
set when restoring from a backup that was taken from a standby, but
thanks to the above, recovery incorrectly treats that as end-of-backup.
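
To illustrate the thinko, here is a minimal C sketch. It is not the actual xlog.c code: it assumes a plain 64-bit integer in place of XLogRecPtr (with 0 standing for "invalid"), a hypothetical ControlSketch struct that holds only the two fields discussed, and made-up function names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a flat 64-bit value stands in for XLogRecPtr; 0 means "invalid". */
typedef uint64_t Lsn;

/* Hypothetical, trimmed-down view of the two control-file fields in question. */
typedef struct
{
    Lsn backup_start_point; /* valid while recovering from a base backup */
    Lsn backup_end_point;   /* valid only for a backup taken from a standby */
} ControlSketch;

bool
lsn_is_invalid(Lsn p)
{
    return p == 0;
}

/* Pre-fix shape of the test: validity is checked on the wrong field. */
bool
backup_end_reached_buggy(const ControlSketch *cf, Lsn end_rec_ptr)
{
    assert(cf != NULL);
    /* An invalid (zero) end point compares <= any replay position. */
    return !lsn_is_invalid(cf->backup_start_point) &&
           cf->backup_end_point <= end_rec_ptr;
}

/* Post-fix shape of the test: validity is checked on backup_end_point. */
bool
backup_end_reached_fixed(const ControlSketch *cf, Lsn end_rec_ptr)
{
    assert(cf != NULL);
    return !lsn_is_invalid(cf->backup_end_point) &&
           cf->backup_end_point <= end_rec_ptr;
}
```

With backup_end_point left invalid, as for a backup taken from a master, the first function reports end-of-backup at any replay position, which would be consistent with the premature "consistent recovery state reached" line in the reported log. The second function keeps returning false, so the end-of-backup WAL record remains the only signal in that case.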

Fixed, thanks for the report!

- Heikki


Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Jeff Janes (#1)


Re: PITR potentially broken in 9.2

On 2012-11-27 10:08:12 -0800, Jeff Janes wrote:

Doing PITR in 9.2.1, the system claims that it reached a consistent
recovery state immediately after redo starts.
This leads to various mysterious failures, when it should instead
throw a "requested recovery stop point is before consistent recovery
point" error.
(If you are unlucky, I think it might even silently start up in a
corrupt state.)

This seems to have been introduced in:
commit 8366c7803ec3d0591cf2d1226fea1fee947d56c3
Author: Simon Riggs <simon@2ndQuadrant.com>
Date: Wed Jan 25 18:02:04 2012 +0000

Allow pg_basebackup from standby node with safety checking.
Base backup follows recommended procedure, plus goes to great
lengths to ensure that partial page writes are avoided.

Jun Ishizuka and Fujii Masao, with minor modifications

the backup file:

START WAL LOCATION: 1/CD89E48 (file 00000001000000010000000C)
STOP WAL LOCATION: 1/1AFA11A0 (file 00000001000000010000001A)
CHECKPOINT LOCATION: 1/188D8120
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2012-11-27 09:40:13 PST
LABEL: label
STOP TIME: 2012-11-27 09:42:10 PST

(The file 00000001000000010000000C was archived at 9:37.)

recovery.conf:
restore_command = 'cp /tmp/archivedir/%f %p'
recovery_target_time = '2012-11-27 09:38:00 PST'

Log file:

22110 2012-11-27 09:49:15.220 PST LOG: database system was
interrupted; last known up at 2012-11-27 09:40:13 PST
22110 2012-11-27 09:49:15.235 PST LOG: starting point-in-time
recovery to 2012-11-27 09:38:00-08
22110 2012-11-27 09:49:15.271 PST LOG: restored log file
"000000010000000100000018" from archive
22110 2012-11-27 09:49:15.367 PST LOG: restored log file
"00000001000000010000000C" from archive
22110 2012-11-27 09:49:15.372 PST LOG: redo starts at 1/CD89E48

Hm. Are you sure it's actually reading your backup file? It's hard to say
without DEBUG1 output, but I would tentatively say it's not reading it at
all, because the "redo starts at ..." message indicates it's not using
the checkpoint location from the backup file.

Can you reproduce the issue? If so, can you give an exact guide? If not,
do you still have the datadir et al. from above?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Andres Freund (#4)


Re: PITR potentially broken in 9.2

On 28.11.2012 15:26, Andres Freund wrote:

Hm. Are you sure it's actually reading your backup file? It's hard to say
without DEBUG1 output, but I would tentatively say it's not reading it at
all, because the "redo starts at ..." message indicates it's not using
the checkpoint location from the backup file.

By backup file, you mean the backup history file? Since 9.0, recovery
does not read the backup history file, it's for informational/debugging
purposes only. All the information recovery needs is in the
backup_label, and an end-of-backup WAL record marks the location where
pg_stop_backup() was called, i.e. how far the WAL must be replayed for
the backup to be consistent.

Can you reproduce the issue? If so, can you give an exact guide? If not,
do you still have the datadir et al. from above?

I just committed a fix for this, but if you can, it would still be nice
if you could double-check that it now really works.

- Heikki


Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Heikki Linnakangas (#5)


Re: PITR potentially broken in 9.2

On 2012-11-28 15:37:38 +0200, Heikki Linnakangas wrote:

On 28.11.2012 15:26, Andres Freund wrote:

Hm. Are you sure it's actually reading your backup file? It's hard to say
without DEBUG1 output, but I would tentatively say it's not reading it at
all, because the "redo starts at ..." message indicates it's not using
the checkpoint location from the backup file.

By backup file, you mean the backup history file? Since 9.0, recovery does
not read the backup history file, it's for informational/debugging purposes
only. All the information recovery needs is in the backup_label, and an
end-of-backup WAL record marks the location where pg_stop_backup() was
called, ie. how far the WAL must be replayed for the backup to be
consistent.

I mean the label read by read_backup_label(). Jeff's mail indicated it
had CHECKPOINT_LOCATION at 1/188D8120 but redo started at 1/CD89E48.

Can you reproduce the issue? If so, can you give an exact guide? If not,

do you still have the datadir et al. from above?

I just committed a fix for this, but if you can, it would still be nice if
you could double-check that it now really works.

Yuck. Too bad that that got in.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Andres Freund (#6)


Re: PITR potentially broken in 9.2

On 28.11.2012 15:47, Andres Freund wrote:

I mean the label read by read_backup_label(). Jeff's mail indicated it
had CHECKPOINT_LOCATION at 1/188D8120 but redo started at 1/CD89E48.

That's correct. The checkpoint was at 1/188D8120, but its redo pointer
was earlier, at 1/CD89E48, so that's where redo had to start.
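
The ordering of the two locations is easy to misread because the offsets are printed without leading zeros. A minimal C sketch of the comparison follows; parse_lsn is a hypothetical helper, not a PostgreSQL function, and it assumes the "hi/lo" hexadecimal notation used in the log and backup file above, folded into one 64-bit value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse a WAL location printed as "hi/lo" in hex
 * into a single 64-bit value. Returns 0 on success, -1 on malformed input.
 */
int
parse_lsn(const char *text, uint64_t *out)
{
    unsigned int hi;
    unsigned int lo;

    assert(text != NULL && out != NULL);
    if (sscanf(text, "%X/%X", &hi, &lo) != 2)
        return -1;
    /* The high part selects the log id, the low part is the byte offset. */
    *out = ((uint64_t) hi << 32) | lo;
    return 0;
}
```

Parsed this way, 1/CD89E48 has the offset 0x0CD89E48 and 1/188D8120 has the offset 0x188D8120, so the redo pointer sorts before the checkpoint record even though "CD" looks larger than "18" at first glance.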

- Heikki


Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Heikki Linnakangas (#7)


Re: PITR potentially broken in 9.2

On 2012-11-28 16:34:55 +0200, Heikki Linnakangas wrote:

On 28.11.2012 15:47, Andres Freund wrote:

I mean the label read by read_backup_label(). Jeff's mail indicated it
had CHECKPOINT_LOCATION at 1/188D8120 but redo started at 1/CD89E48.

That's correct. The checkpoint was at 1/188D8120, but its redo pointer was
earlier, at 1/CD89E48, so that's where redo had to start.

Heh. I just compared 18 with CD and didn't notice the different lengths
of the two...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Heikki Linnakangas (#3)


Re: PITR potentially broken in 9.2

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

On 28.11.2012 06:27, Noah Misch wrote:

I observed a similar problem with 9.2. Despite a restore_command that failed
every time, startup from a hot backup completed. At the time, I suspected a
mistake on my part.

I believe this was caused by this little typo/thinko:

Is this related at all to the problem discussed over at
http://archives.postgresql.org/pgsql-general/2012-11/msg00709.php
? The conclusion-so-far in that thread seems to be that an error
ought to be thrown for recovery_target_time earlier than the
backup stop time, but one is not getting thrown.

regards, tom lane


#10

Jeff Janes

jeff.janes@gmail.com

over 13 years ago

In reply to: Heikki Linnakangas (#5)


Re: PITR potentially broken in 9.2

On Wed, Nov 28, 2012 at 5:37 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

On 28.11.2012 15:26, Andres Freund wrote:

Can you reproduce the issue? If so, can you give an exact guide? If not,
do you still have the datadir et al. from above?

Yes, it is reliable enough to be used for "git bisect"

rm /tmp/archivedir/0000000*
initdb
## edit postgresql.conf to set up archiving etc. and set checkpoint_segments to 60
pg_ctl -D /tmp/data -l /tmp/data/logfile_master start -w
createdb
pgbench -i -s 10
pgbench -T 36000000 &
sleep 120
psql -c "SELECT pg_start_backup('label');"
cp -rp /tmp/data/ /tmp/data_slave
sleep 120
psql -c "SELECT pg_stop_backup();"
rm /tmp/data_slave/pg_xlog/0*
rm /tmp/data_slave/postmaster.*
rm /tmp/data_slave/logfile_master
cp src/backend/access/transam/recovery.conf.sample /tmp/data_slave/recovery.conf
## edit /tmp/data_slave/recovery.conf to set up restore command and stop point.
cp -rpi /tmp/data_slave /tmp/data_slave2
pg_ctl -D /tmp/data_slave2/ start -o "--port=9876"

At some point, kill the pgbench:
pg_ctl -D /tmp/data stop -m fast

I run the master with fsync off, otherwise it takes too long to
accumulate archived log files.
The checkpoint associated with pg_start_backup takes ~2.5 minutes, so
pick a time that is 1.25 minutes before the time reported in the
backup history or backup_label file for the PITR end time.

I copy data_slave to data_slave2 so that I can try different things
without having to restart the whole process from the beginning.

I just committed a fix for this, but if you can, it would still be nice if
you could double-check that it now really works.

Thanks. In REL9_2_STABLE, it now correctly gives the "requested
recovery stop point is before consistent recovery point" error.

Also if the recovery is started with hot_standby=on and with no
recovery_target_time, in patched REL9_2_STABLE the database becomes
"ready to accept read only connections" at the appropriate time, once
the end-of-backup WAL has been replayed. In 9.2.0 and 9.2.1, it
instead opened for read only connections at the point that the
end-of-checkpoint record (the checkpoint associated with the
pg_start_backup) has replayed, which I think is too early.
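
The expected behaviour described above can be sketched in C as two small predicates. This is an illustration under stated assumptions, not server code: WAL locations are flat 64-bit values, the STOP WAL LOCATION from the backup file is treated as an approximation of the end-of-backup point, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum
{
    OUTCOME_CONSISTENT,              /* recovery may finish normally */
    OUTCOME_STOP_BEFORE_CONSISTENCY  /* expect the "requested recovery stop
                                      * point is before consistent recovery
                                      * point" error */
} RecoveryOutcome;

/* Classify a recovery stop position against the end-of-backup position. */
RecoveryOutcome
classify_stop_point(uint64_t redo_done_lsn, uint64_t backup_end_lsn)
{
    /* Assumption of this sketch: 0 is never a real end-of-backup position. */
    assert(backup_end_lsn != 0);
    return redo_done_lsn >= backup_end_lsn
        ? OUTCOME_CONSISTENT
        : OUTCOME_STOP_BEFORE_CONSISTENCY;
}

/* Read-only connections should open only once end-of-backup is replayed. */
bool
hot_standby_may_open(uint64_t replayed_lsn, uint64_t backup_end_lsn)
{
    assert(backup_end_lsn != 0);
    return replayed_lsn >= backup_end_lsn;
}
```

Plugging in the values from the original report, "redo done at 1/10F41900" lies before the STOP WAL LOCATION of 1/1AFA11A0, so this sketch classifies that run as stopping before consistency, which is the error the patched branch now reports.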

Cheers,

Jeff


#11

Jeff Janes

jeff.janes@gmail.com

over 13 years ago

In reply to: Tom Lane (#9)


Re: PITR potentially broken in 9.2

On Wed, Nov 28, 2012 at 7:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

On 28.11.2012 06:27, Noah Misch wrote:

I observed a similar problem with 9.2. Despite a restore_command that failed
every time, startup from a hot backup completed. At the time, I suspected a
mistake on my part.

I believe this was caused by this little typo/thinko:

Is this related at all to the problem discussed over at
http://archives.postgresql.org/pgsql-general/2012-11/msg00709.php
? The conclusion-so-far in that thread seems to be that an error
ought to be thrown for recovery_target_time earlier than the
backup stop time, but one is not getting thrown.

It is not directly related. That thread was about 9.1.6.

In the newly fixed 9_2_STABLE, that problem still shows up the same as
it does in 9.1.6.

(In 9.2.1, recovery sometimes blows up before that particular problem
could be detected, which is what led me here in the first place--that
is the extent of the relationship AFAIK)

To see this one, follow the instructions in my previous email, but set
recovery_target_time to a time just after the end of the
pg_start_backup checkpoint, rather than just before it, and turn on
hot_standby

Cheers,

Jeff


#12

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Jeff Janes (#11)


Re: PITR potentially broken in 9.2

Jeff Janes <jeff.janes@gmail.com> writes:

On Wed, Nov 28, 2012 at 7:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Is this related at all to the problem discussed over at
http://archives.postgresql.org/pgsql-general/2012-11/msg00709.php
? The conclusion-so-far in that thread seems to be that an error
ought to be thrown for recovery_target_time earlier than the
backup stop time, but one is not getting thrown.

It is not directly related. That thread was about 9.1.6.

In the newly fixed 9_2_STABLE, that problem still shows up the same as
it does in 9.1.6.

(In 9.2.1, recovery sometimes blows up before that particular problem
could be detected, which is what led me here in the first place--that
is the extent of the relationship AFAIK)

To see this one, follow the instructions in my previous email, but set
recovery_target_time to a time just after the end of the
pg_start_backup checkpoint, rather than just before it, and turn on
hot_standby

I tried to reproduce this as per your directions, and see no problem in
HEAD. Recovery advances to the specified stop time, hot standby mode
wakes up, and it pauses waiting for me to do pg_xlog_replay_resume().
But I can connect to the standby and do that. So I'm unsure what the
problem is.

regards, tom lane


#13

Jeff Janes

jeff.janes@gmail.com

over 13 years ago

In reply to: Tom Lane (#12)


Re: PITR potentially broken in 9.2

On Sat, Dec 1, 2012 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

On Wed, Nov 28, 2012 at 7:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Is this related at all to the problem discussed over at
http://archives.postgresql.org/pgsql-general/2012-11/msg00709.php
? The conclusion-so-far in that thread seems to be that an error
ought to be thrown for recovery_target_time earlier than the
backup stop time, but one is not getting thrown.

It is not directly related. That thread was about 9.1.6.

In the newly fixed 9_2_STABLE, that problem still shows up the same as
it does in 9.1.6.

(In 9.2.1, recovery sometimes blows up before that particular problem
could be detected, which is what led me here in the first place--that
is the extent of the relationship AFAIK)

To see this one, follow the instructions in my previous email, but set
recovery_target_time to a time just after the end of the
pg_start_backup checkpoint, rather than just before it, and turn on
hot_standby

I tried to reproduce this as per your directions, and see no problem in
HEAD. Recovery advances to the specified stop time, hot standby mode
wakes up,

Hot standby should only wake up once recovery has proceeded beyond the
pg_stop_backup() point.

If the specified stop point is after pg_start_backup() finishes, but
before pg_stop_backup(), then hot standby should not start up (and
with the newest HEAD, in my hands, it does not). Are you sure you set
the stop time to just after pg_start_backup, not to just after
pg_stop_backup?

Cheers,

Jeff


#14

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Jeff Janes (#13)


Re: PITR potentially broken in 9.2

Jeff Janes <jeff.janes@gmail.com> writes:

On Sat, Dec 1, 2012 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

In the newly fixed 9_2_STABLE, that problem still shows up the same as
it does in 9.1.6.

I tried to reproduce this as per your directions, and see no problem in
HEAD. Recovery advances to the specified stop time, hot standby mode
wakes up,

Hot standby should only wake up once recovery has proceeded beyond the
pg_stop_backup() point.

If the specified stop point is after pg_start_backup() finishes, but
before pg_stop_backup(), then hot standby should not start up (and
with the newest HEAD, in my hands, it does not). Are you sure you set
the stop time to just after pg_start_backup, not to just after
pg_stop_backup?

I'm confused. Are you now saying that this problem only exists in
9.1.x? I tested current HEAD because you indicated the problem was
still there.

regards, tom lane


#15

Jeff Janes

jeff.janes@gmail.com

over 13 years ago

In reply to: Tom Lane (#14)


Re: PITR potentially broken in 9.2

On Sat, Dec 1, 2012 at 1:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

On Sat, Dec 1, 2012 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

In the newly fixed 9_2_STABLE, that problem still shows up the same as
it does in 9.1.6.

I tried to reproduce this as per your directions, and see no problem in
HEAD. Recovery advances to the specified stop time, hot standby mode
wakes up,

Hot standby should only wake up once recovery has proceeded beyond the
pg_stop_backup() point.

If the specified stop point is after pg_start_backup() finishes, but
before pg_stop_backup(), then hot standby should not start up (and
with the newest HEAD, in my hands, it does not). Are you sure you set
the stop time to just after pg_start_backup, not to just after
pg_stop_backup?

I'm confused. Are you now saying that this problem only exists in
9.1.x? I tested current HEAD because you indicated the problem was
still there.

No, I'm saying the problem exists both in 9.1.x and in hypothetical
9.2.2 and in hypothetical 9.3, but not in 9.2.[01] because in those it
is masked by that other problem which has just been fixed.

I'll try it again in b1346822f3048ede254647f3a46 just to be sure, but
I'm already fairly sure.

Cheers,

Jeff


#16

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Jeff Janes (#15)


Re: PITR potentially broken in 9.2

Jeff Janes <jeff.janes@gmail.com> writes:

On Sat, Dec 1, 2012 at 1:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm confused. Are you now saying that this problem only exists in
9.1.x? I tested current HEAD because you indicated the problem was
still there.

No, I'm saying the problem exists both in 9.1.x and in hypothetical
9.2.2 and in hypothetical 9.3, but not in 9.2.[01] because in those it
is masked by that other problem which has just been fixed.

I'm still confused. I've now tried this in both HEAD and 9.1 branch
tip, and I do not see any misbehavior. If I set recovery_target_time to
before the pg_stop_backup time, I get "FATAL: requested recovery stop
point is before consistent recovery point" which is what I expect; and
if I set it to after the pg_stop_backup time, it starts up as expected.
So if there's a remaining unfixed bug here, I don't understand what
that is.

regards, tom lane


#17

Jeff Janes

jeff.janes@gmail.com

over 13 years ago

In reply to: Tom Lane (#16)


Re: PITR potentially broken in 9.2

On Sun, Dec 2, 2012 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

On Sat, Dec 1, 2012 at 1:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm confused. Are you now saying that this problem only exists in
9.1.x? I tested current HEAD because you indicated the problem was
still there.

No, I'm saying the problem exists both in 9.1.x and in hypothetical
9.2.2 and in hypothetical 9.3, but not in 9.2.[01] because in those it
is masked by that other problem which has just been fixed.

I'm still confused. I've now tried this in both HEAD and 9.1 branch
tip, and I do not see any misbehavior. If I set recovery_target_time to
before the pg_stop_backup time, I get "FATAL: requested recovery stop
point is before consistent recovery point" which is what I expect; and
if I set it to after the pg_stop_backup time, it starts up as expected.
So if there's a remaining unfixed bug here, I don't understand what
that is.

I've reproduced it again using the just-tagged 9.2.2, and uploaded a
135MB tarball of the /tmp/data_slave2 and /tmp/archivedir to google
drive. The data directory contains the recovery.conf which is set to
end recovery between the two critical time points.

https://docs.google.com/open?id=0Bzqrh1SO9FcES181YXRVdU5NSlk

This is the command line I use to start recovery, and the resulting log output.

https://docs.google.com/open?id=0Bzqrh1SO9FcEaTQ2QXhFdDZYaUE

I can't connect to the standby to execute pg_xlog_replay_resume() because:
FATAL: the database system is starting up

Cheers,

Jeff


#18

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Jeff Janes (#17)


Re: PITR potentially broken in 9.2

Jeff Janes <jeff.janes@gmail.com> writes:

I've reproduced it again using the just-tagged 9.2.2, and uploaded a
135MB tarball of the /tmp/data_slave2 and /tmp/archivedir to google
drive. The data directory contains the recovery.conf which is set to
end recovery between the two critical time points.

Hmmm ... I can reproduce this with current 9.2 branch tip. However,
more or less by accident I first tried it with a 9.2-branch postmaster
from a couple weeks ago, and it works as expected with that: the log
output looks like

LOG: restored log file "00000001000000000000001B" from archive
LOG: restored log file "00000001000000000000001C" from archive
LOG: restored log file "00000001000000000000001D" from archive
LOG: database system is ready to accept read only connections
LOG: recovery stopping before commit of transaction 305610, time 2012-12-02 15:08:54.000131-08
LOG: recovery has paused
HINT: Execute pg_xlog_replay_resume() to continue.

and I can connect and do the pg_xlog_replay_resume() thing.
Note the "ready to accept read only connections" line, which
does not show up with branch tip.

So apparently this is something we broke since Nov 18. Don't know what
yet --- any thoughts? Also, I am still not seeing what the connection
is to the original report against 9.1.6.

regards, tom lane


#19

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Tom Lane (#18)


Re: PITR potentially broken in 9.2

I wrote:

So apparently this is something we broke since Nov 18. Don't know what
yet --- any thoughts?

Further experimentation shows that reverting commit
ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work. So there's
something wrong/incomplete about that fix.

This is a bit urgent since we now have to consider whether to withdraw
9.2.2 and issue a hasty 9.2.3. Do we have a regression here since
9.2.1, and if so how bad is it?

regards, tom lane


#20

Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Tom Lane (#19)


Re: PITR potentially broken in 9.2

On 2012-12-04 19:35:48 -0500, Tom Lane wrote:

I wrote:

So apparently this is something we broke since Nov 18. Don't know what
yet --- any thoughts?

Further experimentation shows that reverting commit
ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work. So there's
something wrong/incomplete about that fix.

ISTM that the code should check ControlFile->backupEndRequired, not just
check for an invalid backupEndPoint. I haven't looked into the specific
issue though.

This is a bit urgent since we now have to consider whether to withdraw
9.2.2 and issue a hasty 9.2.3. Do we have a regression here since
9.2.1, and if so how bad is it?

Not sure.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#21

Jeff Janes

jeff.janes@gmail.com

over 13 years ago

In reply to: Tom Lane (#18)


#22

Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Tom Lane (#18)


#23

Jeff Janes

jeff.janes@gmail.com

over 13 years ago

In reply to: Tom Lane (#19)


#24

Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Jeff Janes (#21)


#25

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Andres Freund (#24)


#26

Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Tom Lane (#25)


#27

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Andres Freund (#26)


#28

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Tom Lane (#19)


#29

Tatsuo Ishii

t-ishii@sra.co.jp

over 13 years ago

In reply to: Tom Lane (#27)


#30

Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Tatsuo Ishii (#29)


#31

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Tom Lane (#25)


#32

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Andres Freund (#30)


#33

Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Simon Riggs (#31)


#34

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Simon Riggs (#31)


#35

Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Simon Riggs (#34)


#36

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Andres Freund (#33)


#37

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Andres Freund (#33)


#38

Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Tom Lane (#37)


#39

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Andres Freund (#38)


#40

Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Tom Lane (#39)


#41

Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Andres Freund (#40)


#42

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Tom Lane (#39)


#43

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Simon Riggs (#42)


#44

Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Simon Riggs (#43)


#45

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Andres Freund (#44)


#46

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Tom Lane (#45)


#47

Jeff Janes

jeff.janes@gmail.com

over 13 years ago

In reply to: Tom Lane (#39)


#48

Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Tom Lane (#46)


#49

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Jeff Janes (#47)


#50

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Tom Lane (#46)


#51

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Simon Riggs (#50)


#52

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Tom Lane (#51)


#53

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Simon Riggs (#52)


#54

Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Tom Lane (#51)


#55

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Tom Lane (#51)


#56

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Robert Haas (#55)


#57

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Andres Freund (#54)


#58

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Tom Lane (#56)


#59

Andres Freund

andres@anarazel.de

over 13 years ago

In reply to: Tom Lane (#57)


#60

Jeff Janes

jeff.janes@gmail.com

over 13 years ago

In reply to: Tom Lane (#49)


#61

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Andres Freund (#59)
