Failing start-up archive recovery at Standby mode in PG9.2.4

Started by KONDO Mitsumasa over 12 years ago · 24 messages
#1 KONDO Mitsumasa
kondo.mitsumasa@lab.ntt.co.jp
3 attachment(s)

Hi,

I found a problem where start-up archive recovery fails in Standby mode with PG9.2.4 streaming replication.
I tested the same scenario on PG9.2.3, but the problem does not occur there...

We searched the PostgreSQL git repository for the change that introduced this problem.
It may have been introduced by this commit:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ee8b95e8c807f31220f31bcff445d1dcb20c28b4

The attached script (replication_test.sh) reproduces this problem.
To use it, set PGPATH appropriately and install pgbench.
A summary of what the script does is as follows:

1. Create the Master and Standby instances.
2. Insert a lot of records into the Master using pgbench, then stop the Master with an immediate shutdown.
3. Start the Master and the Standby.
4. The Standby fails start-up archive recovery in ValidXLOGHeader, xlog.c:4395.
5. The Standby thinks archive recovery is finished. This is a mistake!
6. The Standby connects to the Master for streaming replication, but it fails.

cp: cannot stat `../arc/000000030000000000000013': そのようなファイルやディレクトリはありません
[Standby] 2013-04-22 01:27:25 EDTLOG: 00000: restored log file "000000020000000000000013" from archive
[Standby] 2013-04-22 01:27:25 EDTLOCATION: RestoreArchivedFile, xlog.c:3273
[Standby] 2013-04-22 01:27:25 EDTDEBUG: 00000: got WAL segment from archive
[Standby] 2013-04-22 01:27:25 EDTLOCATION: XLogFileReadAnyTLI, xlog.c:2930
[Standby] 2013-04-22 01:27:25 EDTDEBUG: 00000: compacted fsync request queue from 4096 entries to 2 entries
[Standby] 2013-04-22 01:27:25 EDTCONTEXT: xlog redo multi-insert (init): rel 1663/12870/24588; blk 41575; 61 tuples
[Standby] 2013-04-22 01:27:25 EDTLOCATION: CompactCheckpointerRequestQueue, checkpointer.c:1288
[Standby] 2013-04-22 01:27:25 EDTDEBUG: 00000: checkpoint sync: number=1 file=base/12870/24588 time=822.092 msec
[Standby] 2013-04-22 01:27:25 EDTLOCATION: mdsync, md.c:1161
[Standby] 2013-04-22 01:27:25 EDTDEBUG: 00000: checkpoint sync: number=2 file=base/12870/24588_fsm time=0.908 msec
[Standby] 2013-04-22 01:27:25 EDTLOCATION: mdsync, md.c:1161
[Standby] 2013-04-22 01:27:25 EDTLOG: 00000: restartpoint complete: wrote 6 buffers (0.1%); 0 transaction log file(s) added, 3 removed, 4 recycled; write=0.437 s, sync=0.823 s, total=1.274 s; sync files=2, longest=0.822 s, average=0.411 s
[Standby] 2013-04-22 01:27:25 EDTLOCATION: LogCheckpointEnd, xlog.c:7893
[Standby] 2013-04-22 01:27:25 EDTLOG: 00000: recovery restart point at 0/F0FF8D0
[Standby] 2013-04-22 01:27:25 EDTDETAIL: last completed transaction was at log time 2013-04-22 01:26:32.439575-04
[Standby] 2013-04-22 01:27:25 EDTLOCATION: CreateRestartPoint, xlog.c:8601
[Standby] 2013-04-22 01:27:25 EDTLOG: 00000: restartpoint starting: xlog
[Standby] 2013-04-22 01:27:25 EDTLOCATION: LogCheckpointStart, xlog.c:7821
cp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません

[* This point is the incorrect behavior]> [Standby] 2013-04-22 01:27:25 EDTLOG: 00000: unexpected pageaddr 0/6000000 in log file 0, segment 20, offset 0

[Standby] 2013-04-22 01:27:25 EDTLOCATION: ValidXLOGHeader, xlog.c:4395
cp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません
cp: cannot stat `../arc/00000004.history': そのようなファイルやディレクトリはありません
[Master] 2013-04-22 01:27:25 EDTDEBUG: 00000: received replication command: IDENTIFY_SYSTEM
[Master] 2013-04-22 01:27:25 EDTLOCATION: HandleReplicationCommand, walsender.c:449
[Master] 2013-04-22 01:27:25 EDTDEBUG: 00000: received replication command: START_REPLICATION 0/14000000
[Master] 2013-04-22 01:27:25 EDTLOCATION: HandleReplicationCommand, walsender.c:449
[Master] 2013-04-22 01:27:25 EDTFATAL: 58P01: requested WAL segment 000000030000000000000014 has already been removed
[Master] 2013-04-22 01:27:25 EDTLOCATION: XLogRead, walsender.c:1023
[Standby] 2013-04-22 01:27:25 EDTLOG: 00000: streaming replication successfully connected to primary
[Standby] 2013-04-22 01:27:25 EDTLOCATION: libpqrcv_connect, libpqwalreceiver.c:171
[Standby] 2013-04-22 01:27:25 EDTFATAL: XX000: could not receive data from WAL stream: FATAL: requested WAL segment 000000030000000000000014 has already been removed

[Standby] 2013-04-22 01:27:25 EDTLOCATION: libpqrcv_receive, libpqwalreceiver.c:389

[mitsu-ko@localhost postgresql-9.2.4]$ bincp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません
[Standby] 2013-04-22 01:27:30 EDTDEBUG: 00000: unexpected pageaddr 0/6000000 in log file 0, segment 20, offset 0
[Standby] 2013-04-22 01:27:30 EDTLOCATION: ValidXLOGHeader, xlog.c:4395
cp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません
cp: cannot stat `../arc/00000004.history': そのようなファイルやディレクトリはありません
[Master] 2013-04-22 01:27:30 EDTDEBUG: 00000: received replication command: IDENTIFY_SYSTEM
[Master] 2013-04-22 01:27:30 EDTLOCATION: HandleReplicationCommand, walsender.c:449
[Master] 2013-04-22 01:27:30 EDTDEBUG: 00000: received replication command: START_REPLICATION 0/14000000
[Master] 2013-04-22 01:27:30 EDTLOCATION: HandleReplicationCommand, walsender.c:449
[Master] 2013-04-22 01:27:30 EDTFATAL: 58P01: requested WAL segment 000000030000000000000014 has already been removed
[Master] 2013-04-22 01:27:30 EDTLOCATION: XLogRead, walsender.c:1023
[Standby] 2013-04-22 01:27:30 EDTLOG: 00000: streaming replication successfully connected to primary
[Standby] 2013-04-22 01:27:30 EDTLOCATION: libpqrcv_connect, libpqwalreceiver.c:171
[Standby] 2013-04-22 01:27:30 EDTFATAL: XX000: could not receive data from WAL stream: FATAL: requested WAL segment 000000030000000000000014 has already been removed

[Standby] 2013-04-22 01:27:30 EDTLOCATION: libpqrcv_receive, libpqwalreceiver.c:389
/pg_ctl -D Standby cp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません
[Standby] 2013-04-22 01:27:35 EDTDEBUG: 00000: unexpected pageaddr 0/6000000 in log file 0, segment 20, offset 0
[Standby] 2013-04-22 01:27:35 EDTLOCATION: ValidXLOGHeader, xlog.c:4395
cp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません
cp: cannot stat `../arc/00000004.history': そのようなファイルやディレクトリはありません
[Master] 2013-04-22 01:27:35 EDTDEBUG: 00000: received replication command: IDENTIFY_SYSTEM
[Master] 2013-04-22 01:27:35 EDTLOCATION: HandleReplicationCommand, walsender.c:449
[Master] 2013-04-22 01:27:35 EDTDEBUG: 00000: received replication command: START_REPLICATION 0/14000000
[Master] 2013-04-22 01:27:35 EDTLOCATION: HandleReplicationCommand, walsender.c:449
[Master] 2013-04-22 01:27:35 EDTFATAL: 58P01: requested WAL segment 000000030000000000000014 has already been removed

My colleagues and I tried to fix this problem,
but the WAL control and validation code is too difficult and complicated for us...

The problem occurs at xlog.c:10692:

10692 if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false))
10693 goto next_record_is_invalid;

The problem occurs in this check in ValidXLOGHeader():

4390 if (!XLByteEQ(hdr->xlp_pageaddr, recaddr))
4391 {
4392 ereport(emode_for_corrupt_record(emode, recaddr),
4393 (errmsg("unexpected pageaddr %X/%X in log file %u, segment %u, offset %u",
4394 hdr->xlp_pageaddr.xlogid, hdr->xlp_pageaddr.xrecoff,
4395 readId, readSeg, readOff)));
4396 return false;
4397 }

We think hdr->xlp_pageaddr.xrecoff has a wrong value. This is a very rare case!
We have not been able to analyze why this value is incorrect!
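One way to see what that check is comparing is to look at the raw bytes of the first page header of a suspect segment. The sketch below is only illustrative: it assumes the 9.2 on-disk layout (xlp_magic u16, xlp_info u16, xlp_tli u32, then xlp_pageaddr as xlogid u32 plus xrecoff u32) and a little-endian machine, and it fabricates a header whose pageaddr is the 0/6000000 seen in the log rather than reading a real segment.

```shell
# Fabricate the first 16 bytes of a WAL page whose xlp_pageaddr is
# 0/6000000 (tli=3, xlogid=0, xrecoff=0x06000000), little-endian,
# with magic/info left as zeros since we only care about the pageaddr.
seg=$(mktemp)
printf '\000\000\000\000\003\000\000\000\000\000\000\000\000\000\000\006' > "$seg"

# xlogid and xrecoff live at byte offsets 8..15 of the page header;
# dump them as two 32-bit words, which correspond to the "%X/%X"
# pair printed in the "unexpected pageaddr" message.
od -A n -j 8 -N 8 -t x4 "$seg"
```

On a recycled-but-stale segment, the second word would show an old xrecoff such as 06000000 even though the file name says segment 0x14.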

Please help us fix this problem!

Best regards

--
NTT OSS Center
Mitsumasa KONDO

Attachments:

replication_test.sh (text/plain; charset=Shift_JIS)
pg9.2.4.log (text/plain; charset=Shift_JIS)
pg9.2.3.log (text/plain; charset=Shift_JIS)
#2 Andres Freund
andres@2ndquadrant.com
In reply to: KONDO Mitsumasa (#1)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

Hello,

On 2013-04-24 17:43:39 +0900, KONDO Mitsumasa wrote:

Hi,

I found a problem where start-up archive recovery fails in Standby mode with PG9.2.4 streaming replication.
I tested the same scenario on PG9.2.3, but the problem does not occur there...

cp: cannot stat `../arc/000000030000000000000013': そのようなファイルやディレクトリはありません
[Standby] 2013-04-22 01:27:25 EDTLOG: 00000: restored log file "000000020000000000000013" from archive

I can't read the error message here, but this looks suspicious. The
recovery command seems to be returning success although it prints such
an error message?

cp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません

[* This point is illegal work]> [Standby] 2013-04-22 01:27:25 EDTLOG: 00000: unexpected pageaddr 0/6000000 in log file 0, segment 20, offset 0

[Standby] 2013-04-22 01:27:25 EDTLOCATION: ValidXLOGHeader, xlog.c:4395
cp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません
cp: cannot stat `../arc/00000004.history': そのようなファイルやディレクトリはありません
[Master] 2013-04-22 01:27:25 EDTDEBUG: 00000: received replication command: IDENTIFY_SYSTEM
[Master] 2013-04-22 01:27:25 EDTLOCATION: HandleReplicationCommand, walsender.c:449
[Master] 2013-04-22 01:27:25 EDTDEBUG: 00000: received replication command: START_REPLICATION 0/14000000
[Master] 2013-04-22 01:27:25 EDTLOCATION: HandleReplicationCommand, walsender.c:449
[Master] 2013-04-22 01:27:25 EDTFATAL: 58P01: requested WAL segment 000000030000000000000014 has already been removed
[Master] 2013-04-22 01:27:25 EDTLOCATION: XLogRead, walsender.c:1023
[Standby] 2013-04-22 01:27:25 EDTLOG: 00000: streaming replication successfully connected to primary
[Standby] 2013-04-22 01:27:25 EDTLOCATION: libpqrcv_connect, libpqwalreceiver.c:171
[Standby] 2013-04-22 01:27:25 EDTFATAL: XX000: could not receive data from WAL stream: FATAL: requested WAL segment 000000030000000000000014 has already been removed

[Standby] 2013-04-22 01:27:25 EDTLOCATION: libpqrcv_receive, libpqwalreceiver.c:389

[mitsu-ko@localhost postgresql-9.2.4]$ bincp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません
[Standby] 2013-04-22 01:27:30 EDTDEBUG: 00000: unexpected pageaddr 0/6000000 in log file 0, segment 20, offset 0
[Standby] 2013-04-22 01:27:30 EDTLOCATION: ValidXLOGHeader, xlog.c:4395
cp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません
cp: cannot stat `../arc/00000004.history': そのようなファイルやディレクトリはありません
[Master] 2013-04-22 01:27:30 EDTDEBUG: 00000: received replication command: IDENTIFY_SYSTEM
[Master] 2013-04-22 01:27:30 EDTLOCATION: HandleReplicationCommand, walsender.c:449
[Master] 2013-04-22 01:27:30 EDTDEBUG: 00000: received replication command: START_REPLICATION 0/14000000
[Master] 2013-04-22 01:27:30 EDTLOCATION: HandleReplicationCommand, walsender.c:449
[Master] 2013-04-22 01:27:30 EDTFATAL: 58P01: requested WAL segment 000000030000000000000014 has already been removed
[Master] 2013-04-22 01:27:30 EDTLOCATION: XLogRead, walsender.c:1023
[Standby] 2013-04-22 01:27:30 EDTLOG: 00000: streaming replication successfully connected to primary
[Standby] 2013-04-22 01:27:30 EDTLOCATION: libpqrcv_connect, libpqwalreceiver.c:171
[Standby] 2013-04-22 01:27:30 EDTFATAL: XX000: could not receive data from WAL stream: FATAL: requested WAL segment 000000030000000000000014 has already been removed

And this seems to be the second problem. You probably need to configure
a higher wal_keep_segments on the primary, or you need to fix your
restore_command.

My colleagues and I tried to fix this problem,
but the WAL control and validation code is too difficult and complicated for us...

The problem occurs at xlog.c:10692:

10692 if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false))
10693 goto next_record_is_invalid;

The problem occurs in this check in ValidXLOGHeader():

4390 if (!XLByteEQ(hdr->xlp_pageaddr, recaddr))
4391 {
4392 ereport(emode_for_corrupt_record(emode, recaddr),
4393 (errmsg("unexpected pageaddr %X/%X in log file %u, segment %u, offset %u",
4394 hdr->xlp_pageaddr.xlogid, hdr->xlp_pageaddr.xrecoff,
4395 readId, readSeg, readOff)));
4396 return false;
4397 }

We think hdr->xlp_pageaddr.xrecoff has a wrong value. This is a very rare case!
We have not been able to analyze why this value is incorrect!

I think the test is just noticing that we don't have valid WAL - we are
probably reading old preallocated WAL files from before starting the
node as a standby. Normally it will get the correct data from the
primary after that (you can see the START_REPLICATION commands in the
log), but that failed.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Kyotaro HORIGUCHI
kyota.horiguchi@gmail.com
In reply to: Andres Freund (#2)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

Hello,

cp: cannot stat `../arc/000000030000000000000013': そのようなファイルやディレクトリはありません
[Standby] 2013-04-22 01:27:25 EDTLOG: 00000: restored log file "000000020000000000000013" from archive

I can't read the error message here, but this looks suspicious.

The error message is "No such file or directory" in Japanese :-).
Perhaps restore_command is 'cp ../arc/%f %p'.
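If that guess is right, the failure is reported only through cp's exit status, which is exactly what the server keys off. A slightly more defensive wrapper (hypothetical; the archive directory is parameterized here instead of hard-coding '../arc') also avoids ever leaving a half-copied segment where the server expects a whole one:

```shell
# Hypothetical restore_command helper; ARC stands in for the '../arc'
# archive directory seen in the thread.  PostgreSQL would invoke it as
# restore_command = 'restore_wal.sh %f %p' (name is made up).
ARC=${ARC:-../arc}

restore_wal() {
    f=$1; p=$2
    # A non-zero exit tells recovery the segment is not (yet) archived.
    [ -f "$ARC/$f" ] || return 1
    # Copy via a temp name so an interrupted copy never leaves a
    # truncated file at the destination path.
    cp "$ARC/$f" "$p.in-progress" && mv "$p.in-progress" "$p"
}
```

So `restore_wal 000000020000000000000013 pg_xlog/RECOVERYXLOG` succeeds only when the archived copy exists and was copied in full.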

The recovery command seems to be returning success although it prints such
an error message?

It seems that, on 9.2.4, RestoreArchivedFile returns expected xlog
filename even if the restore command ends up in failure. And the
caller XLogFileOpen successfully ets and returns fd for the filename
if the file exists since before the restoration attempt. It does not
matter when the existing file and the file expected to be restored are
identical in content.

If the segment includes the timeline increment point, something might
happen, I suppose..

regards,

--
Kyotaro Horiguchi


#4 Kyotaro HORIGUCHI
kyota.horiguchi@gmail.com
In reply to: Kyotaro HORIGUCHI (#3)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

Sorry,

caller XLogFileOpen successfully ets and returns fd for the filename

The caller is XLogFileRead in this case.

# and 'ets' is gets, of course.

regards,
--
Kyotaro Horiguchi


#5 Andres Freund
andres@2ndquadrant.com
In reply to: Kyotaro HORIGUCHI (#3)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

On 2013-04-24 19:16:12 +0900, Kyotaro HORIGUCHI wrote:

Hello,

cp: cannot stat `../arc/000000030000000000000013': そのようなファイルやディレクトリはありません
[Standby] 2013-04-22 01:27:25 EDTLOG: 00000: restored log file "000000020000000000000013" from archive

I can't read the error message here, but this looks suspicious.

The error message is "No such file or directory" in Japanese :-).
Perhaps restore_command is 'cp ../arc/%f %p'.

The recovery command seems to be returning success although it prints such
an error message?

It seems that, on 9.2.4, RestoreArchivedFile returns expected xlog
filename even if the restore command ends up in failure. And the
caller XLogFileOpen successfully ets and returns fd for the filename
if the file exists since before the restoration attempt. It does not
matter when the existing file and the file expected to be restored are
identical in content.

But that's not what's happening here; AFAICS the "restored log file ..."
message is only printed if the return code is 0.

And the return value from RestoreArchivedFile() would be different in
the case you describe which would make XLogFileRead() fail, so it should
result in different errors.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#6 Kyotaro HORIGUCHI
kyota.horiguchi@gmail.com
In reply to: Andres Freund (#5)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

Oops,

But that's not what's happening here; AFAICS the "restored log file ..."
message is only printed if the return code is 0.

You're right. 'cp <nonexistent> <somewhere>' exits with the status
code 1 (or 256?).

The quoted log lines simply show that the segment for tli=3 did not exist
and that the one for tli=2 was finally retrieved. That is quite normal and
irrelevant to the trouble mentioned. Sorry for the confusion.

Unfortunately, the attached script didn't reproduce the situation for
me. But in the attached log,

[Standby] 2013-04-22 01:27:25 EDTFATAL: XX000: could not receive data from WAL stream: FATAL: requested WAL segment 000000030000000000000014 has already been removed
cp: cannot stat `../arc/000000030000000000000014': No such file or directory.
[Standby] 2013-04-22 01:27:30 EDTDEBUG: 00000: unexpected pageaddr 0/6000000 in log file 0, segment 20, offset 0

this seems to show that the standby received a negative ack for its
request for the 20th segment and, 5 seconds later, tried to get it from
the archive and somehow thought it had succeeded, but the header belongs
to the 6th segment.

regards,

--
Kyotaro Horiguchi


#7 Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#6)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

I had a closer look at it and came up with a hypothesis... umm, or a
scenario.

==

Just after a restartpoint, too-old xlog files are recycled, but their
page headers still carry the old content, specifically xlogid and xrecoff.

Also, if the master's LSN is at the head of a new segment file, the
file for that segment may not have been created yet, and the standby's
WAL send request for that LSN may fail with "requested WAL segment %s
has already been removed", even though the segment is actually "not
yet created"...

So the standby receives a nack for the request and looks in the
archive, but the file is legitimately not there. However, a file for
that segment is certainly found in the pg_xlog directory, so
XLogFileRead grabs the file with the stale content and...

this seems to show that the standby received a negative ack for its
request for the 20th segment and, 5 seconds later, tried to get it from
the archive and somehow thought it had succeeded, but the header belongs
to the 6th segment.

--
Kyotaro Horiguchi
NTT Open Source Software Center


#8 KONDO Mitsumasa
kondo.mitsumasa@lab.ntt.co.jp
In reply to: Andres Freund (#2)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

Hi,

I found a problem where start-up archive recovery fails in Standby mode with PG9.2.4 streaming replication.
I tested the same scenario on PG9.2.3, but the problem does not occur there...

cp: cannot stat `../arc/000000030000000000000013': そのようなファイルやディレクトリはありません
[Standby] 2013-04-22 01:27:25 EDTLOG: 00000: restored log file "000000020000000000000013" from archive

I can't read the error message here, but this looks suspicious. The
recovery command seems to be returning success although it prints such
an error message?

Sorry... That suspicious message is written in Japanese.
It means "No such file or directory". That is because the starting WAL file of timeline 3 is 00000003000000000000002A.

cp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません

[* This point is the incorrect behavior]> [Standby] 2013-04-22 01:27:25 EDTLOG: 00000: unexpected pageaddr 0/6000000 in log file 0, segment 20, offset 0

[Standby] 2013-04-22 01:27:25 EDTLOCATION: ValidXLOGHeader, xlog.c:4395
cp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません
cp: cannot stat `../arc/00000004.history': そのようなファイルやディレクトリはありません
[Master] 2013-04-22 01:27:25 EDTDEBUG: 00000: received replication command: IDENTIFY_SYSTEM
[Master] 2013-04-22 01:27:25 EDTLOCATION: HandleReplicationCommand, walsender.c:449
[Master] 2013-04-22 01:27:25 EDTDEBUG: 00000: received replication command: START_REPLICATION 0/14000000
[Master] 2013-04-22 01:27:25 EDTLOCATION: HandleReplicationCommand, walsender.c:449
[Master] 2013-04-22 01:27:25 EDTFATAL: 58P01: requested WAL segment 000000030000000000000014 has already been removed
[Master] 2013-04-22 01:27:25 EDTLOCATION: XLogRead, walsender.c:1023
[Standby] 2013-04-22 01:27:25 EDTLOG: 00000: streaming replication successfully connected to primary
[Standby] 2013-04-22 01:27:25 EDTLOCATION: libpqrcv_connect, libpqwalreceiver.c:171
[Standby] 2013-04-22 01:27:25 EDTFATAL: XX000: could not receive data from WAL stream: FATAL: requested WAL segment 000000030000000000000014 has already been removed

[Standby] 2013-04-22 01:27:25 EDTLOCATION: libpqrcv_receive, libpqwalreceiver.c:389

[mitsu-ko@localhost postgresql-9.2.4]$ bincp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません
[Standby] 2013-04-22 01:27:30 EDTDEBUG: 00000: unexpected pageaddr 0/6000000 in log file 0, segment 20, offset 0
[Standby] 2013-04-22 01:27:30 EDTLOCATION: ValidXLOGHeader, xlog.c:4395
cp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません
cp: cannot stat `../arc/00000004.history': そのようなファイルやディレクトリはありません
[Master] 2013-04-22 01:27:30 EDTDEBUG: 00000: received replication command: IDENTIFY_SYSTEM
[Master] 2013-04-22 01:27:30 EDTLOCATION: HandleReplicationCommand, walsender.c:449
[Master] 2013-04-22 01:27:30 EDTDEBUG: 00000: received replication command: START_REPLICATION 0/14000000
[Master] 2013-04-22 01:27:30 EDTLOCATION: HandleReplicationCommand, walsender.c:449
[Master] 2013-04-22 01:27:30 EDTFATAL: 58P01: requested WAL segment 000000030000000000000014 has already been removed
[Master] 2013-04-22 01:27:30 EDTLOCATION: XLogRead, walsender.c:1023
[Standby] 2013-04-22 01:27:30 EDTLOG: 00000: streaming replication successfully connected to primary
[Standby] 2013-04-22 01:27:30 EDTLOCATION: libpqrcv_connect, libpqwalreceiver.c:171
[Standby] 2013-04-22 01:27:30 EDTFATAL: XX000: could not receive data from WAL stream: FATAL: requested WAL segment 000000030000000000000014 has already been removed

And this seems to be the second problem. You probably need to configure
a higher wal_keep_segments on the primary, or you need to fix your
restore_command.

In this situation, I am trying to advance the timelineID via archive recovery, not only to recover WAL files.
If archive recovery worked correctly on the Standby, the recovery end point would be 000000020000000000000029 on timelineID 2.
Please see the details in pg9.2.4.log; the state of the archived files can be seen there.

My colleagues and I tried to fix this problem,
but the WAL control and validation code is too difficult and complicated for us...

The problem occurs at xlog.c:10692:

10692 if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false))
10693 goto next_record_is_invalid;

The problem occurs in this check in ValidXLOGHeader():

4390 if (!XLByteEQ(hdr->xlp_pageaddr, recaddr))
4391 {
4392 ereport(emode_for_corrupt_record(emode, recaddr),
4393 (errmsg("unexpected pageaddr %X/%X in log file %u, segment %u, offset %u",
4394 hdr->xlp_pageaddr.xlogid, hdr->xlp_pageaddr.xrecoff,
4395 readId, readSeg, readOff)));
4396 return false;
4397 }

We think hdr->xlp_pageaddr.xrecoff has a wrong value. This is a very rare case!
We have not been able to analyze why this value is incorrect!

I think the test is just noticing that we don't have valid WAL - we are
probably reading old preallocated WAL files from before starting the
node as a standby. Normally it will get the correct data from the
primary after that (you can see the START_REPLICATION commands in the
log), but that failed.

I hope so! I will continue trying to analyze this problem...

Best regards

--
NTT Open Source Software Center
Mitsumasa KONDO


#9 Amit Langote
amitlangote09@gmail.com
In reply to: Kyotaro HORIGUCHI (#7)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

I made a similar observation when I managed to reproduce this.
I tried to find out why the restartpoint causes the recycled segment to be
named after timeline 3, but have not been able to determine that.

When I looked at the source, I found that XLogFileReadAnyTLI, the
function that returns a segment file for reading an XLog page, iterates
over a list expectedTLIs, which starts with 3 in such a case (that is,
when this error happens). XLogFileReadAnyTLI checks for the segment in
both the archive and pg_xlog. So even if such a segment (with timeline
3) is not in the archive, it may be in pg_xlog due to recycling, as we
have observed. Such a recycled segment may then be returned by
XLogFileReadAnyTLI as though it were the next segment to recover from,
resulting in the "unexpected pageaddr ..." error.

I could not understand how, in the case where this error happens, the
expectedTLIs list has 3 at its head (for example, when
XLogFileReadAnyTLI used it, as we observed), whereas at other times
(when this error does not happen) it has 2 at its head until all the
segments of timeline 2 have been recovered from. Also, how does the
recycled segment get timeline 3 when 2 is expected in this case?

Is this the right way to look at the problem and its possible fix?

I have tried going through the source regarding this but have not been
able to determine where this could accidentally happen, partly because I
do not understand the recovery process (and its code) very well. I will
post if I find anything useful.

regards,
Amit Langote



#10 Kyotaro HORIGUCHI
kyota.horiguchi@gmail.com
In reply to: Amit Langote (#9)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

Hmm. I think I have caught the tail of the problem...

The script, with a small change, reproduced the situation for me.

The latest standby uses 3 as its TLI, following the history file
0..3.history which it could get from the archive. So the WAL files
recycled on this standby will have TLI=3.
However, even though the LSN of a segment recycled on the standby falls
on TLI=2 on the master, the standby makes its first request for each
such segment with that LSN but TLI=3, because the standby runs with
recoveryTargetTLI=3. The master reasonably doesn't have that file, and
the standby finally finds the wrong WAL file in its pg_xlog directory
before a second request with TLI=2 would be made.

In conclusion, the standby should name recycled WAL segments using the
same TLI the LSN has on the master. Or it should never recycle WAL
files...
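The naming scheme behind this scenario is easy to check. In 9.2, a segment file name is just three zero-padded hex fields (timeline, xlogid, segment number), so the recycled file and the file actually holding the data differ only in the leading TLI field. A small sketch:

```shell
# Pre-9.3 WAL segment naming: TTTTTTTTXXXXXXXXSSSSSSSS, i.e. timeline,
# xlogid, and segment number within the xlogid, each as 8 hex digits.
wal_name() {
    printf '%08X%08X%08X\n' "$1" "$2" "$3"   # tli, log, seg
}

wal_name 3 0 20   # the name the standby requests/recycles: 000000030000000000000014
wal_name 2 0 20   # the file that actually holds the data:  000000020000000000000014
```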

regards,

--
Kyotaro Horiguchi


#11 Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Kyotaro HORIGUCHI (#10)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

On 25.04.2013 17:55, Kyotaro HORIGUCHI wrote:

Hmm. I think I have caught the tail of the problem...

The script, with a small change, reproduced the situation for me.

Can you share the modified script, please?

The latest standby uses 3 as its TLI, following the history file
0..3.history which it could get from the archive. So the WAL files
recycled on this standby will have TLI=3.
However, even though the LSN of a segment recycled on the standby falls
on TLI=2 on the master, the standby makes its first request for each
such segment with that LSN but TLI=3, because the standby runs with
recoveryTargetTLI=3. The master reasonably doesn't have that file, and
the standby finally finds the wrong WAL file in its pg_xlog directory
before a second request with TLI=2 would be made.

I'm not sure I understand what the problem is, though. When the standby
opens the bogus, recycled, file in pg_xlog, it will notice that the
header is incorrect, and retry reading the file from the archive.

In conclusion, the standby should name recycled WAL segments using the
same TLI the LSN has on the master. Or it should never recycle WAL
files...

Perhaps, but it should nevertheless not get confused by recycled segments.

- Heikki


#12 Kyotaro HORIGUCHI
kyota.horiguchi@gmail.com
In reply to: Kyotaro HORIGUCHI (#10)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

I forgot to add:

In conclusion, the standby should name recycled WAL segments using the
same TLI the LSN has on the master. Or it should never recycle WAL
files.

Or the standby should make the request with the correct TLI in the
first place, by consulting the timeline history.
Or the standby should retry with smaller TLIs after it gets a broken
segment.

regards,

--
Kyotaro Horiguchi


#13 Kyotaro HORIGUCHI
kyota.horiguchi@gmail.com
In reply to: Heikki Linnakangas (#11)
1 attachment(s)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

Can you share the modified script, please?

Please find the attached files:
test.sh : the test script. The most significant change is the load;
I used simple inserts instead of pgbench.
It might need some adjustment for other environments,
as is usual with my scripts.
xlog.c.diff : additional log output I thought would be useful for diagnosis.

I'm not sure I understand what the problem is, though. When the standby
opens the bogus, recycled, file in pg_xlog, it will notice that the header
is incorrect, and retry reading the file from the archive.

It looks at the corrupted header only once. It continues to retry
without looking there again after that.

Perhaps, but it should nevertheless not get confused by recycled segments.

regards, and good night.
--
Kyotaro Horiguchi

Attachments:

promtest_20130426.tar.gz (application/x-gzip)
#14 Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Kyotaro HORIGUCHI (#13)
1 attachment(s)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

On 25.04.2013 18:56, Kyotaro HORIGUCHI wrote:

Can you share the modified script, please?

Please find the attached files:
test.sh : the test script. The most significant change is the load;
I used simple inserts instead of pgbench.
It might need some adjustment for other environments,
as is usual with my scripts.
xlog.c.diff : additional log output I thought would be useful for diagnosis.

Ok, thanks, I see what's going on now. The problem is that once
XLogFileRead() finds a file with tli X, it immediately sets curFileTLI =
X. XLogFileReadAnyTLI() never tries to read files with tli < curFileTLI.
So, if recovery finds a file with the right filename, e.g
000000030000000000000008, it never tries to open
000000020000000000000008 anymore, even if the contents of
000000030000000000000008 later turn out to be bogus.

One idea to fix this is to not set curFileTLI, until the page header on
the just-opened file has been verified. Another idea is to change the
check in XLogFileReadAnyTLI() that currently forbids curFileTLI from
moving backwards. We could allow curFileTLI to move backwards, as long
as the tli is >= ThisTimeLineID (ThisTimeLineID is the current timeline
we're recovering records from).
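The second idea can be sketched as a toy model (plain shell, not the real C code): scan the expected timelines newest-first and skip only those older than the timeline currently being recovered, rather than everything older than the last file opened.

```shell
# Toy model of the timeline scan in XLogFileReadAnyTLI (illustrative
# only, not the actual implementation).  Timelines are tried
# newest-first; under the proposed rule a candidate is skipped only if
# it is older than this_tli, the timeline whose records are currently
# being replayed.  So after opening a bogus recycled tli-3 file,
# recovery can still fall back to the genuine tli-2 file.
candidate_tlis() {
    this_tli=$1; shift
    for tli in "$@"; do
        if [ "$tli" -ge "$this_tli" ]; then
            echo "$tli"
        fi
    done
}

candidate_tlis 2 3 2 1   # tries 3, then 2; timeline 1 is never consulted
```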

Attached is a patch for the 2nd approach. With the patch, the test
script works for me. Thoughts?

PS. This wasn't caused by the 9.2.4 change to do crash recovery before
entering archive recovery. The test script fails in the same way with
9.2.3 as well.

- Heikki

Attachments:

dont-get-confused-by-recycled-segments-1.patch (text/x-diff)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 690077c..4ca75d7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -251,8 +251,7 @@ static bool recoveryStopAfter;
  * curFileTLI: the TLI appearing in the name of the current input WAL file.
  * (This is not necessarily the same as ThisTimeLineID, because we could
  * be scanning data that was copied from an ancestor timeline when the current
- * file was created.)  During a sequential scan we do not allow this value
- * to decrease.
+ * file was created.)
  */
 static TimeLineID recoveryTargetTLI;
 static bool recoveryTargetIsLatest = false;
@@ -657,7 +656,7 @@ static bool InstallXLogFileSegment(uint32 *log, uint32 *seg, char *tmppath,
 static int XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
 			 int source, bool notexistOk);
 static int XLogFileReadAnyTLI(uint32 log, uint32 seg, int emode,
-				   int sources);
+				   int sources, bool randAccess);
 static bool XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt,
 			 bool randAccess);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
@@ -2899,7 +2898,8 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
  * This version searches for the segment with any TLI listed in expectedTLIs.
  */
 static int
-XLogFileReadAnyTLI(uint32 log, uint32 seg, int emode, int sources)
+XLogFileReadAnyTLI(uint32 log, uint32 seg, int emode, int sources,
+				   bool randAccess)
 {
 	char		path[MAXPGPATH];
 	ListCell   *cell;
@@ -2909,17 +2909,16 @@ XLogFileReadAnyTLI(uint32 log, uint32 seg, int emode, int sources)
 	 * Loop looking for a suitable timeline ID: we might need to read any of
 	 * the timelines listed in expectedTLIs.
 	 *
-	 * We expect curFileTLI on entry to be the TLI of the preceding file in
-	 * sequence, or 0 if there was no predecessor.	We do not allow curFileTLI
-	 * to go backwards; this prevents us from picking up the wrong file when a
-	 * parent timeline extends to higher segment numbers than the child we
-	 * want to read.
+	 * During a sequential read, do not check TLIs smaller than the timeline
+	 * we're currently recovering (ThisTimeLineID); this prevents us from
+	 * picking up the wrong file when a parent timeline extends to higher
+	 * segment numbers than the child we want to read.
 	 */
 	foreach(cell, expectedTLIs)
 	{
 		TimeLineID	tli = (TimeLineID) lfirst_int(cell);
 
-		if (tli < curFileTLI)
+		if (!randAccess && tli < ThisTimeLineID)
 			break;				/* don't bother looking at too-old TLIs */
 
 		if (sources & XLOG_FROM_ARCHIVE)
@@ -10510,9 +10509,6 @@ retry:
 						close(readFile);
 						readFile = -1;
 					}
-					/* Reset curFileTLI if random fetch. */
-					if (randAccess)
-						curFileTLI = 0;
 
 					/*
 					 * Try to restore the file from archive, or read an
@@ -10573,7 +10569,7 @@ retry:
 					/* Don't try to read from a source that just failed */
 					sources &= ~failedSources;
 					readFile = XLogFileReadAnyTLI(readId, readSeg, DEBUG2,
-												  sources);
+												  sources, randAccess);
 					switched_segment = true;
 					if (readFile >= 0)
 						break;
@@ -10607,16 +10603,12 @@ retry:
 			{
 				int			sources;
 
-				/* Reset curFileTLI if random fetch. */
-				if (randAccess)
-					curFileTLI = 0;
-
 				sources = XLOG_FROM_PG_XLOG;
 				if (InArchiveRecovery)
 					sources |= XLOG_FROM_ARCHIVE;
 
 				readFile = XLogFileReadAnyTLI(readId, readSeg, emode,
-											  sources);
+											  sources, randAccess);
 				switched_segment = true;
 				if (readFile < 0)
 					return false;
#15Kyotaro HORIGUCHI
kyota.horiguchi@gmail.com
In reply to: Heikki Linnakangas (#14)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

Thank you for the patch.

The test script finishes successfully with it, and the patch looks reasonable
at a glance.

On Fri, Apr 26, 2013 at 4:34 AM, Heikki Linnakangas <hlinnakangas@vmware.com>
wrote:

One idea to fix this is to not set curFileTLI, until the page header on the
just-opened file has been verified. Another idea is to change the check in
XLogFileReadAnyTLI() that currently forbids curFileTLI from moving
backwards. We could allow curFileTLI to move backwards, as long as the
tli is >= ThisTimeLineID (ThisTimeLineID is the current timeline we're
recovering records from).

Attached is a patch for the 2nd approach. With the patch, the test script
works for me. Thoughts?

I am a bit uncertain whether it is necessary to reset curFileTLI on every
random read. At a glance, random access also occurs when reading
checkpoint-related records.
Also, I don't have a clear picture of the distinction between lastSegmentTLI
and curFileTLI after the patch is applied, although I need to look closer at
them to understand.

PS. This wasn't caused by the 9.2.4 change to do crash recovery before
entering archive recovery. The test script fails in the same way with 9.2.3
as well.

--
Kyotaro Horiguchi


#17Amit Langote
amitlangote09@gmail.com
In reply to: Kyotaro HORIGUCHI (#15)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

How would code after applying this patch behave if a recycled segment gets
renamed using the newest timeline (say 3) while we are still recovering from
a lower timeline (say 2)? In that case, since XLogFileReadAnyTLI returns
that recycled segment as the next segment to recover from, we get the error.
And since XLogFileReadAnyTLI iterates over expectedTLIs (whose head seems to
be recoveryTargetTLI at all times, is that right?), it will return that
wrong (recycled) segment in the first iteration itself.

The code for renaming recycled segments remains unaffected by this
patch, right? That is, the code which assigns a timelineID to newly created /
recycled segments is not changed.

(a snippet from CreateRestartPoint() )

/*
* Update ThisTimeLineID to the recovery target timeline, so that
* we install any recycled segments on the correct timeline.
*/

ThisTimeLineID = GetRecoveryTargetTLI(); <<<<<

RemoveOldXlogFiles(_logId, _logSeg, endptr);

/*
* Make more log segments if needed. (Do this after recycling old log
* segments, since that may supply some of the needed files.)
*/
PreallocXlogFiles(endptr);

So, would we still get a similar behavior (wherein the wrongly named
recycled segments are produced) and cause the error to happen again? Do we
need to look here?

What do you think?

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Failing-start-up-archive-recovery-at-Standby-mode-in-PG9-2-4-tp5753110p5753353.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Kyotaro HORIGUCHI (#15)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

On 26.04.2013 07:02, Kyotaro HORIGUCHI wrote:

I am a bit uncertain whether it is necessary to reset curFileTLI on every
random read. At a glance, random access also occurs when reading
checkpoint-related records.

I didn't understand that.

Also, I don't have a clear picture of the distinction between lastSegmentTLI
and curFileTLI after the patch is applied.

In short, lastSegmentTLI is the TLI in the first page of the last opened
segment, while curFileTLI is the TLI in the filename of the last opened
segment. Usually they're the same, but in a segment that contains a
timeline switch, they differ. For example, if you perform PITR to a
point in the middle of segment 000000010000000000000062, you end up with
two segments:

000000010000000000000062
000000020000000000000062

The first halves of those files are identical, but the latter contains a
time-line changing checkpoint record in the middle, and from that point
on the contents are different. When we open file
000000020000000000000062, lastSegmentTLI is 1, because the first half of
that segment contains WAL from timeline 1, but curFileTLI is 2, because
that's the TLI in the filename.

- Heikki


#19Amit Langote
amitlangote09@gmail.com
In reply to: Heikki Linnakangas (#14)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

What would happen if a recycled segment gets assigned a newer timeline than
the one we are currently recovering from? That is what happens in the
reported erroneous behavior, causing XLogFileReadAnyTLI() to return the
bogus segment and hence the error. Since expectedTLIs contains a newer
timeline at its head, if such a wrongly recycled segment (that is, one
bearing a newer timeline than curFileTLI) exists in pg_xlog,
XLogFileReadAnyTLI() would return that segment's fd, causing the error in
question. On the next retry, the same thing would happen and prevent
standby startup from proceeding any further.

So, should we also look at the segment recycling code to see how it names
the recycled segments (what timelineID does it use) ? For example, in
CreateRestartPoint() where segments are recycled?



#20Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Amit Langote (#17)
1 attachment(s)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

On 26.04.2013 07:47, Amit Langote wrote:

How would code after applying this patch behave if a recycled segment gets
renamed using the newest timeline (say 3) while we are still recovering from
a lower timeline (say 2)? In that case, since XLogFileReadAnyTLI returns
that recycled segment as the next segment to recover from, we get the error.
And since XLogFileReadAnyTLI iterates over expectedTLIs (whose head seems to
be recoveryTargetTLI at all times, is that right?), it will return that
wrong (recycled segment) in the first iteration itself.

As long as the right segment is present in the archive, that's OK. Even
if a recycled segment with higher TLI is in pg_xlog, with the patch
we'll still read the segment with lower TLI from the archive. But there
is a corner-case where a recycled segment with a higher TLI masks a
segment with lower TLI in pg_xlog. For example, if you try to recover by
copying all the required WAL files directly into pg_xlog, without using
restore_command, you could run into problems.

So yeah, I think you're right and we need to rethink the recycling. The
first question is, do we have to recycle WAL segments during recovery at
all? It's pointless when we're restoring from archive with
restore_command; the recycled files will just get replaced with files
from the archive. It does help when walreceiver is active, but I wonder
how significant it is in practice.

I guess the safest, smallest change is to use a lower TLI when
installing the recycled files. So, instead of using the current recovery
target timeline, use the ID of the timeline we're currently recovering.
That way the recycled segments will never have a higher TLI than other
files that recovery will try to replay. See attached patch.

- Heikki

Attachments:

dont-get-confused-by-recycled-segments-2.patch (text/x-diff)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 30d877b..cfd8a34 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8459,10 +8459,15 @@ CreateRestartPoint(int flags)
 		PrevLogSeg(_logId, _logSeg);
 
 		/*
-		 * Update ThisTimeLineID to the recovery target timeline, so that
-		 * we install any recycled segments on the correct timeline.
+		 * Update ThisTimeLineID to the timeline we're currently recovering,
+		 * so that we install any recycled segments on the correct timeline.
+		 * (This might be higher than the TLI of the restartpoint we just
+		 * made, if a timeline switch was replayed while we were performing
+		 * the restartpoint.)
 		 */
-		ThisTimeLineID = GetRecoveryTargetTLI();
+		SpinLockAcquire(&xlogctl->info_lck);
+		ThisTimeLineID = XLogCtl->lastCheckPoint.ThisTimeLineID;
+		SpinLockRelease(&xlogctl->info_lck);
 
 		RemoveOldXlogFiles(_logId, _logSeg, endptr);
 
#21KONDO Mitsumasa
kondo.mitsumasa@lab.ntt.co.jp
In reply to: Heikki Linnakangas (#14)
1 attachment(s)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

Hi,

I discovered the cause of the problem. I think that Horiguchi's discovery is a separate problem...
The problem is in CreateRestartPoint. In recovery mode, PG should not write WAL,
because PG does not know the latest WAL file location.
But in this problem case, PG (the standby) writes a WAL file at a restartpoint during archive recovery.
In recovery mode, I think that a restartpoint should only update MinRecoveryPoint.

Here is the standby's pg_xlog directory after the problem occurred.

[mitsu-ko@localhost postgresql-9.2.4-c]$ ls Standby/pg_xlog/
000000020000000000000003 000000020000000000000007 00000002000000000000000B 00000003.history
000000020000000000000004 000000020000000000000008 00000002000000000000000C 00000003000000000000000E
000000020000000000000005 000000020000000000000009 00000002000000000000000D 00000003000000000000000F
000000020000000000000006 00000002000000000000000A 00000002000000000000000E archive_status

The problem case is here:

[Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: attempting to remove WAL segments older than log file 000000030000000000000002
[Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles, xlog.c:3568
[Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: recycled transaction log file "000000010000000000000002"
[Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles, xlog.c:3607
[Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: recycled transaction log file "000000020000000000000002"
[Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles, xlog.c:3607
[Standby] 2013-04-26 04:26:44 EDT LOG: 00000: restartpoint complete: wrote 9 buffers (0.2%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=0.601 s, sync=1.178 s, total=1.781 s; sync files=3, longest=1.176 s, average=0.392 s
[Standby] 2013-04-26 04:26:44 EDT LOCATION: LogCheckpointEnd, xlog.c:7893
[Standby] 2013-04-26 04:26:44 EDT LOG: 00000: recovery restart point at 0/90FE448
[Standby] 2013-04-26 04:26:44 EDT DETAIL: last completed transaction was at log time 2013-04-26 04:25:53.203725-04
[Standby] 2013-04-26 04:26:44 EDT LOCATION: CreateRestartPoint, xlog.c:8601
[Standby] 2013-04-26 04:26:44 EDT LOG: 00000: restartpoint starting: xlog
[Standby] 2013-04-26 04:26:44 EDT LOCATION: LogCheckpointStart, xlog.c:7821
cp: cannot stat `../arc/00000003000000000000000F': No such file or directory
[Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: could not restore file "00000003000000000000000F" from archive: return code 256
[Standby] 2013-04-26 04:26:44 EDT LOCATION: RestoreArchivedFile, xlog.c:3323
[Standby] 2013-04-26 04:26:44 EDT LOG: 00000: unexpected pageaddr 0/2000000 in log file 0, segment 15, offset 0
[Standby] 2013-04-26 04:26:44 EDT LOCATION: ValidXLOGHeader, xlog.c:4395
cp: cannot stat `../arc/00000003000000000000000F': No such file or directory
[Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: could not restore file "00000003000000000000000F" from archive: return code 256

During recovery, PG normally searches for the WAL file via archive recovery first.
If the proper WAL file is not there (the archive command fails), it next searches
the pg_xlog directory. Normally, a file with the proper name does not exist in pg_xlog.
But in this case pg_xlog contains a file with the proper name (whose contents are
wrong and invalid), so recovery fails and the standby is broken.

So I fixed CreateRestartPoint at the branch that decides whether to update only
MinRecoveryPoint. It seems to fix this problem, but the attached patch is crude.

Best Regard,
--
NTT Open Source Software Center
Mitsumasa KONDO

Attachments:

fix-ArchiveRecovery-in-StandbyMode.patch (text/x-diff)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5452ae1..ae4bcd8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7371,7 +7371,8 @@ CreateRestartPoint(int flags)
         * restartpoint. It's assumed that flushing the buffers will do that as a
         * side-effect.
         */
-       if (XLogRecPtrIsInvalid(lastCheckPointRecPtr) ||
+       if (( ControlFile->state == DB_IN_ARCHIVE_RECOVERY && RecoveryInProgress()) ||
+               XLogRecPtrIsInvalid(lastCheckPointRecPtr) ||
                lastCheckPoint.redo <= ControlFile->checkPointCopy.redo)
        {
                ereport(DEBUG2,
#22Mitsumasa KONDO
kondo.mitsumasa@gmail.com
In reply to: KONDO Mitsumasa (#21)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

Let me explain this problem in more detail.

This problem occurs because a restartpoint creates a bogus WAL file during
archive recovery. But I cannot tell why the bogus WAL file was created
in CreateRestartPoint(). My attached patch is really crude…

In the problem case, the first check in XLogFileReadAnyTLI() does not get an
fd, because the proper WAL file does not exist in the archive directory.

XLogFileReadAnyTLI()

if (sources & XLOG_FROM_ARCHIVE)
{
fd = XLogFileRead(log, seg, emode, tli, XLOG_FROM_ARCHIVE, true);
if (fd != -1)
{
elog(DEBUG1, "got WAL segment from archive");
return fd;
}
}

Next, it searches for the WAL file in pg_xlog. The bogus WAL file is there,
and its fd is returned.

XLogFileReadAnyTLI()

if (sources & XLOG_FROM_PG_XLOG)
{
fd = XLogFileRead(log, seg, emode, tli, XLOG_FROM_PG_XLOG, true);
if (fd != -1)
return fd;
}

The returned fd becomes the readFile value. Of course readFile is >= 0, so we
break out of the for-loop.

XLogPageRead

readFile = XLogFileReadAnyTLI(readId, readSeg, DEBUG2,
sources);
switched_segment = true;
if (readFile >= 0)
break;

Next is the point where the problem manifests: the bogus WAL file is read, and an error occurs.

XLogPageRead

if (lseek(readFile, (off_t) readOff, SEEK_SET) < 0)
{
ereport(emode_for_corrupt_record(emode, *RecPtr),
(errcode_for_file_access(),
errmsg("could not seek in log file %u, segment %u to offset %u: %m",
readId, readSeg, readOff)));
goto next_record_is_invalid;
}
if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
{
ereport(emode_for_corrupt_record(emode, *RecPtr),
(errcode_for_file_access(),
errmsg("could not read from log file %u, segment %u, offset %u: %m",
readId, readSeg, readOff)));
goto next_record_is_invalid;
}
if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false))
goto next_record_is_invalid;

I think that Horiguchi's discovery point comes after this point.
We must fix CreateRestartPoint() so that it does not create bogus WAL files.

Best regards,

--
Mitsumasa KONDO

#23Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: KONDO Mitsumasa (#21)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

On 26.04.2013 11:51, KONDO Mitsumasa wrote:

Hi,

I discovered the cause of the problem. I think that Horiguchi's discovery is
a separate problem...
The problem is in CreateRestartPoint. In recovery mode, PG should not write
WAL, because PG does not know the latest WAL file location.
But in this problem case, PG (the standby) writes a WAL file at a
restartpoint during archive recovery.
In recovery mode, I think that a restartpoint should only update
MinRecoveryPoint.

Here is the standby's pg_xlog directory after the problem occurred.

[mitsu-ko@localhost postgresql-9.2.4-c]$ ls Standby/pg_xlog/
000000020000000000000003 000000020000000000000007 00000002000000000000000B 00000003.history
000000020000000000000004 000000020000000000000008 00000002000000000000000C 00000003000000000000000E
000000020000000000000005 000000020000000000000009 00000002000000000000000D 00000003000000000000000F
000000020000000000000006 00000002000000000000000A 00000002000000000000000E archive_status

The problem case is here:

[Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: attempting to remove WAL segments older than log file 000000030000000000000002
[Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles, xlog.c:3568
[Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: recycled transaction log file "000000010000000000000002"
[Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles, xlog.c:3607
[Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: recycled transaction log file "000000020000000000000002"
[Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles, xlog.c:3607
[Standby] 2013-04-26 04:26:44 EDT LOG: 00000: restartpoint complete: wrote 9 buffers (0.2%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=0.601 s, sync=1.178 s, total=1.781 s; sync files=3, longest=1.176 s, average=0.392 s
[Standby] 2013-04-26 04:26:44 EDT LOCATION: LogCheckpointEnd, xlog.c:7893
[Standby] 2013-04-26 04:26:44 EDT LOG: 00000: recovery restart point at 0/90FE448
[Standby] 2013-04-26 04:26:44 EDT DETAIL: last completed transaction was at log time 2013-04-26 04:25:53.203725-04
[Standby] 2013-04-26 04:26:44 EDT LOCATION: CreateRestartPoint, xlog.c:8601
[Standby] 2013-04-26 04:26:44 EDT LOG: 00000: restartpoint starting: xlog
[Standby] 2013-04-26 04:26:44 EDT LOCATION: LogCheckpointStart, xlog.c:7821
cp: cannot stat `../arc/00000003000000000000000F': No such file or directory
[Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: could not restore file "00000003000000000000000F" from archive: return code 256
[Standby] 2013-04-26 04:26:44 EDT LOCATION: RestoreArchivedFile, xlog.c:3323
[Standby] 2013-04-26 04:26:44 EDT LOG: 00000: unexpected pageaddr 0/2000000 in log file 0, segment 15, offset 0
[Standby] 2013-04-26 04:26:44 EDT LOCATION: ValidXLOGHeader, xlog.c:4395
cp: cannot stat `../arc/00000003000000000000000F': No such file or directory
[Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: could not restore file "00000003000000000000000F" from archive: return code 256

During recovery, PG normally searches for the WAL file via archive recovery
first. If the proper WAL file is not there (the archive command fails), it
next searches the pg_xlog directory. Normally, a file with the proper name
does not exist in pg_xlog. But in this case pg_xlog contains a file with the
proper name (whose contents are wrong and invalid), so recovery fails and
the standby is broken.

So I fixed CreateRestartPoint at the branch that decides whether to update
only MinRecoveryPoint. It seems to fix this problem, but the attached patch
is crude.

I didn't understand this. I committed a fix for the issue where recycled
WAL files get in the way of recovery, but I'm not sure if you're
describing the same issue or something else. But before we dig any
deeper into this, can you verify if you're still seeing any issues on
9.3beta1?

- Heikki


#24KONDO Mitsumasa
kondo.mitsumasa@lab.ntt.co.jp
In reply to: Heikki Linnakangas (#23)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4

(2013/05/07 22:40), Heikki Linnakangas wrote:

On 26.04.2013 11:51, KONDO Mitsumasa wrote:

So I fixed CreateRestartPoint at the branch that decides whether to update
only MinRecoveryPoint.
It seems to fix this problem, but the attached patch is crude.

I didn't understand this. I committed a fix for the issue where recycled WAL
files get in the way of recovery, but I'm not sure if you're describing the same
issue or something else. But before we dig any deeper into this, can you verify
if you're still seeing any issues on 9.3beta1?

Yes, my fix was very crude and fixed the wrong spot. I've looked at your
patch; your fix is right. I also posted about another problem where promotion
and PITR fail. It might be a different problem, but I could not reproduce it...
Please wait for more! I'll analyze and test 9.2.4 with your patch, and
9.3beta1.

Best regards,
--
NTT Open Source Software Center
Mitsumasa KONDO
