help with error "unexpected pageaddr"

Started by Scot Kreienkamp | over 15 years ago | 5 messages | general
#1Scot Kreienkamp
SKreien@la-z-boy.com

Hey everyone,

We have a PG 8.3.7 server that is doing WAL log shipping to 2 other
servers that are remote mirrors. This has been working well for almost
two years. Last night we did some massive data and structure changes to
one of our databases. Since then I get these errors on the two mirrors:

2010-09-15 08:35:05 EDT: LOG: restored log file
"0000000100000301000000D9" from archive

2010-09-15 08:35:27 EDT: LOG: restored log file
"0000000100000301000000DA" from archive

2010-09-15 08:35:40 EDT: LOG: restored log file
"0000000100000301000000DB" from archive

2010-09-15 08:35:40 EDT: LOG: unexpected pageaddr 301/47000000 in log
file 769, segment 219, offset 0

2010-09-15 08:35:40 EDT: LOG: redo done at 301/DA370780

2010-09-15 08:35:40 EDT: LOG: last completed transaction was at log
time 2010-09-15 08:30:01.24936-04

2010-09-15 08:35:40 EDT: LOG: restored log file
"0000000100000301000000DA" from archive

2010-09-15 08:36:26 EDT: LOG: selected new timeline ID: 2

2010-09-15 08:37:11 EDT: LOG: archive recovery complete

I've taken two separate file level backups and tried to restart the
mirrors, and every time on both servers I get a similar error message.
I seem to recall reading that it may have something to do with
corruption in the timeline, which is why it's jumping to a new timeline
ID.

1. Can anyone tell me what this means?

2. Is there some corruption in the database?

3. If so, is there an easy way to fix it?

Also, one additional question. I don't have a 00001.history file which
makes the PITRTools complain constantly. Is there any way to regenerate
this file?

Any help would be much appreciated. I'm rather worried that I've got
corruption, and not having the mirrors running puts us at risk for data
loss.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Scot Kreienkamp (#1)
Re: help with error "unexpected pageaddr"

"Scot Kreienkamp" <SKreien@la-z-boy.com> writes:

We have a PG 8.3.7 server that is doing WAL log shipping to 2 other
servers that are remote mirrors. This has been working well for almost
two years. Last night we did some massive data and structure changes to
one of our databases. Since then I get these errors on the two mirrors:

2010-09-15 08:35:05 EDT: LOG: restored log file
"0000000100000301000000D9" from archive

2010-09-15 08:35:27 EDT: LOG: restored log file
"0000000100000301000000DA" from archive

2010-09-15 08:35:40 EDT: LOG: restored log file
"0000000100000301000000DB" from archive

2010-09-15 08:35:40 EDT: LOG: unexpected pageaddr 301/47000000 in log
file 769, segment 219, offset 0

This appears to indicate that you archived the wrong contents of log
file 0000000100000301000000DB. If you don't still have the correct
contents on the master, I think the only way to recover is to take a
fresh base backup so you can make the slaves roll forward from a point
later than this log segment. There's no reason to suppose that there's
data corruption on the master, just bad data in the WAL archive.
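
As an illustration of how the numbers in that log line relate to the file name, here is a minimal Python sketch. It assumes the default 16 MB WAL segment size and the 24-hex-digit file naming shown in the logs above; the helper names are made up for this example. A page address that points at a much older segment than the file name would be consistent with a recycled segment whose old contents were archived before being overwritten, though the sketch itself only does the arithmetic.

```python
# Sketch: relate a WAL segment file name to the "unexpected pageaddr" message.
# Assumes the default 16 MB segment size; helper names are illustrative only.

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default segment size (assumption)


def parse_segment_name(name):
    """Split a 24-hex-digit WAL file name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def parse_pageaddr(addr):
    """Split 'LOG/OFFSET' (hex) into (log, segment, offset within segment)."""
    log_hex, off_hex = addr.split("/")
    offset = int(off_hex, 16)
    return int(log_hex, 16), offset // WAL_SEGMENT_SIZE, offset % WAL_SEGMENT_SIZE


def describe_mismatch(segment_name, pageaddr):
    """Return a one-line summary comparing file name and page header address."""
    _, file_log, file_seg = parse_segment_name(segment_name)
    page_log, page_seg, _ = parse_pageaddr(pageaddr)
    if (file_log, file_seg) == (page_log, page_seg):
        return "page address matches the file name"
    return "file is log %d segment %d, but page claims log %d segment %d" % (
        file_log, file_seg, page_log, page_seg)


if __name__ == "__main__":
    print(describe_mismatch("0000000100000301000000DB", "301/47000000"))
```

The file name decodes to log file 769, segment 219, which is exactly what the server reports, while the page header inside the file carries an address belonging to segment 71 of the same log file.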

You'd probably be well advised to look closely at your WAL archiving
script to see if it has any race conditions that might be triggered by
very fast generation of WAL.
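
One way an archiving step can be made less fragile is sketched below in Python. This is not the poster's setup and not a PostgreSQL-provided tool; the function name, the `.tmp` suffix, and the script name in the usage comment are all assumptions for illustration. The idea is to refuse to overwrite a segment that is already archived and to publish the copy with a rename, so nothing reading the archive sees a half-written file.

```python
import os
import shutil
import sys


def archive_segment(src_path, dest_dir):
    """Copy one WAL file into dest_dir; return True on success.

    Refuses to overwrite an existing file and publishes the copy with a
    rename, so a reader never sees a half-written segment.
    """
    final_path = os.path.join(dest_dir, os.path.basename(src_path))
    if os.path.exists(final_path):
        return False  # never overwrite an already-archived segment
    tmp_path = final_path + ".tmp"  # hypothetical temp-name convention
    with open(src_path, "rb") as src, open(tmp_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
        dst.flush()
        os.fsync(dst.fileno())  # push the data to disk before publishing
    os.rename(tmp_path, final_path)  # atomic on POSIX within one filesystem
    return True


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: archive_command = 'python archive_wal.py %p /archive'
    sys.exit(0 if archive_segment(sys.argv[1], sys.argv[2]) else 1)
```

A non-zero exit status tells the server the archive attempt failed, so it will retry rather than assume the segment is safe. The existence check and the rename are still two separate steps, so this sketch narrows the window for a race but does not remove it when several archivers write to the same directory.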

Also, one additional question. I don't have a 00001.history file which
makes the PITRTools complain constantly. Is there any way to regenerate
this file?

Just ignore that, it's cosmetic (the file isn't supposed to exist).

regards, tom lane

#3Scot Kreienkamp
SKreien@la-z-boy.com
In reply to: Tom Lane (#2)
Re: help with error "unexpected pageaddr"


Tom,

I tried to take a new base backup about 45 minutes ago. The master has
rolled forward a number of WAL files since I last tried, but it still
fails.

LOG: restored log file "0000000100000301000000FE" from archive
LOG: restored log file "000000010000030200000000" from archive
LOG: restored log file "000000010000030200000001" from archive
LOG: restored log file "000000010000030200000002" from archive
LOG: restored log file "000000010000030200000003" from archive
LOG: unexpected pageaddr 301/50000000 in log file 770, segment 3,
offset 0
LOG: redo done at 302/2BCE828
LOG: last completed transaction was at log time 2010-09-15
15:07:01.040854-04
LOG: restored log file "000000010000030200000002" from archive
LOG: selected new timeline ID: 2

My entire WAL archiving setup is four cp %p %f commands. It's so short
that I don't even have a script; the commands sit directly in the
archive_command setting in postgresql.conf.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Scot Kreienkamp (#3)
Re: help with error "unexpected pageaddr"

"Scot Kreienkamp" <SKreien@la-z-boy.com> writes:

I tried to take a new base backup about 45 minutes ago. The master has
rolled forward a number of WAL files since I last tried, but it still
fails.

LOG: restored log file "0000000100000301000000FE" from archive
LOG: restored log file "000000010000030200000000" from archive
LOG: restored log file "000000010000030200000001" from archive
LOG: restored log file "000000010000030200000002" from archive
LOG: restored log file "000000010000030200000003" from archive
LOG: unexpected pageaddr 301/50000000 in log file 770, segment 3,
offset 0

Hmmm ... is it possible that your WAL archive contains log files
numbered higher than where your master is?
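
A quick way to check for that is sketched below in Python, assuming the archive is a flat directory of 24-hex-digit segment files and that the master's current segment name is known (on servers of this era it can typically be obtained with `pg_xlogfile_name(pg_current_xlog_location())`). The function name is hypothetical. Because segment names on one timeline are fixed-width uppercase hex, plain string comparison orders them correctly.

```python
import os
import re
import sys

# WAL segment files are named with exactly 24 uppercase hex digits.
SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def segments_ahead_of(archive_files, current_segment):
    """Return archived segments that sort after current_segment.

    Only names on the same timeline (first 8 hex digits) are considered;
    anything that is not a plain segment file (e.g. *.history) is skipped.
    """
    timeline = current_segment[:8]
    return sorted(
        name for name in archive_files
        if SEGMENT_RE.match(name)
        and name[:8] == timeline
        and name > current_segment
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_archive.py /archive 000000010000030200000003
    for stale in segments_ahead_of(os.listdir(sys.argv[1]), sys.argv[2]):
        print(stale)
```

Any name this prints is a file in the archive that the master has not yet reached, which would make it a candidate for leftover or stale content.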

regards, tom lane

#5Scot Kreienkamp
SKreien@la-z-boy.com
In reply to: Tom Lane (#4)
Re: help with error "unexpected pageaddr"

It shouldn't have; the only thing we did to the server was restart it and
run our database queries. Clearing out all the WAL files from pg_xlog
along with taking a new base backup did fix it, though.

Thanks for the help Tom!

Scot Kreienkamp
skreien@la-z-boy.com