Postgresql 9.3.4 Streaming Replication Standby invalid Page block

Started by Burgess, Freddie | almost 12 years ago | 5 messages | bugs
#1 Burgess, Freddie
FBurgess@Radiantblue.com

PostgreSQL version: 9.3.4
Operating system: rhel 6.4 linux
Action: stream replication Master/Slave
Description:

These are the last entries in the PostgreSQL log file before the standby crashed. The primary seems unaffected.

LOG: restored log file "0000000100001127000000cc" from archive
FATAL: invalid page in block 464698 of relation pg_tblspc/16435/PG_9.3_201306121/16444/125127698
CONTEXT: xlog redo vacuum: rel 16435/16444/125127698; blk 512019, lastBlockVacuumed 0
LOG: startup process (PID 27797) exited with exit code 1
LOG: terminating any other active server processes

We restarted the database and the process of restoring the log files has continued beyond this point, but is our standby server corrupted?

thanks
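
For readers who want to find the reported block on disk, the arithmetic can be sketched as follows. This assumes the default build-time values of 8192 bytes per block and 131072 blocks per 1 GB segment file; a build with different sizes would give different results. The function name `locate_block` and the regular expression are illustrative and not part of PostgreSQL.

```python
import re

# Assumptions: default PostgreSQL build-time settings.
BLCKSZ = 8192          # bytes per page
RELSEG_SIZE = 131072   # pages per segment file (1 GB / 8 kB)

# Matches the FATAL line quoted in the report above.
FATAL_RE = re.compile(
    r"invalid page in block (?P<block>\d+) of relation (?P<path>\S+)"
)


def locate_block(log_line):
    """Map an 'invalid page' log line to a segment file and byte offset."""
    match = FATAL_RE.search(log_line)
    if match is None:
        raise ValueError("not an 'invalid page' log line")
    block = int(match.group("block"))
    path = match.group("path")
    # Segment 0 has no suffix; later segments are named <path>.1, <path>.2, ...
    segment, block_in_segment = divmod(block, RELSEG_SIZE)
    segment_file = path if segment == 0 else f"{path}.{segment}"
    return {
        "block": block,
        "segment_file": segment_file,
        "byte_offset": block_in_segment * BLCKSZ,
    }


line = ("FATAL: invalid page in block 464698 of relation "
        "pg_tblspc/16435/PG_9.3_201306121/16444/125127698")
info = locate_block(line)
print(info["segment_file"], info["byte_offset"])
```

Under those assumptions, block 464698 falls in the fourth segment file (suffix `.3`), which is where a tool such as `dd` or a hex viewer would have to look, relative to the data directory.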

#2 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Burgess, Freddie (#1)
Re: Postgresql 9.3.4 Streaming Replication Standby invalid Page block

On 07/02/2014 02:03 AM, Burgess, Freddie wrote:
> FATAL: invalid page in block 464698 of relation pg_tblspc/16435/PG_9.3_201306121/16444/125127698
> CONTEXT: xlog redo vacuum: rel 16435/16444/125127698; blk 512019, lastBlockVacuumed 0
>
> We restarted the database and the process of restoring the log files has continued beyond this point, but is our standby server corrupted?

Sounds exactly like this bug:

/messages/by-id/CAL_0b1s4QCkFy_55kk_8XWcJPs7wsgVWf8vn4=jXe6V4R7Hxmg@mail.gmail.com

but that was fixed in 9.3.3 already. Are you sure you're running 9.3.4
in the standby too?

- Heikki

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3 Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#2)
Re: Postgresql 9.3.4 Streaming Replication Standby invalid Page block

On 2014-07-02 14:02:27 +0300, Heikki Linnakangas wrote:
> On 07/02/2014 02:03 AM, Burgess, Freddie wrote:
> > FATAL: invalid page in block 464698 of relation pg_tblspc/16435/PG_9.3_201306121/16444/125127698
> > CONTEXT: xlog redo vacuum: rel 16435/16444/125127698; blk 512019, lastBlockVacuumed 0

Do you run with data checksums enabled?

> Sounds exactly like this bug:
>
> /messages/by-id/CAL_0b1s4QCkFy_55kk_8XWcJPs7wsgVWf8vn4=jXe6V4R7Hxmg@mail.gmail.com
>
> but that was fixed in 9.3.3 already. Are you sure you're running 9.3.4 in
> the standby too?

Hm - that bug was about uninitialized pages, not invalid ones. I don't
immediately see why it'd be legal to have an invalid page (as in
!PageIsVerified()) anywhere? At least not after reaching consistency.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
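
To make the distinction between an uninitialized page and an invalid one concrete, here is a rough Python sketch of the header sanity test that, as far as I understand the 9.3 sources, PageIsVerified() applies when data checksums are off. The 24-byte header layout, the flag mask 0x0007, the 8-byte alignment, and the little-endian byte order are assumptions for a default x86_64 build, and the function name `page_header_looks_sane` is hypothetical. It is not a substitute for the server's own check.

```python
import struct

# Assumptions: default 8 kB pages, 8-byte MAXALIGN, little-endian x86_64.
BLCKSZ = 8192
MAXALIGN = 8
PD_VALID_FLAG_BITS = 0x0007

# pd_lsn (2 x uint32), pd_checksum, pd_flags, pd_lower, pd_upper,
# pd_special, pd_pagesize_version (6 x uint16), pd_prune_xid (uint32).
HEADER = struct.Struct("<IIHHHHHHI")


def page_header_looks_sane(page):
    """Return True if the page is all zeroes or has a plausible header."""
    if len(page) != BLCKSZ:
        raise ValueError("expected exactly one page of BLCKSZ bytes")
    (_lsn_hi, _lsn_lo, _checksum, flags, lower, upper, special,
     _pagesize_version, _prune_xid) = HEADER.unpack_from(page)
    if upper != 0:  # pd_upper == 0 marks a "new" (uninitialized) page
        if ((flags & ~PD_VALID_FLAG_BITS) == 0
                and lower <= upper <= special <= BLCKSZ
                and special % MAXALIGN == 0):
            return True
    # An all-zero page is uninitialized, which is accepted as valid.
    return page == bytes(BLCKSZ)


empty_page = HEADER.pack(0, 0, 0, 0, 24, BLCKSZ, BLCKSZ, 0x2004, 0)
empty_page += bytes(BLCKSZ - HEADER.size)
zero_page = bytes(BLCKSZ)
garbage_page = b"\xff" * BLCKSZ

print(page_header_looks_sane(empty_page))
print(page_header_looks_sane(zero_page))
print(page_header_looks_sane(garbage_page))
```

In this sketch an all-zero page passes, which matches the "uninitialized" case, while a page whose header fields contradict each other fails, which is the "invalid" case the FATAL message reports.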


#4 Burgess, Freddie
FBurgess@Radiantblue.com
In reply to: Andres Freund (#3)
Re: Postgresql 9.3.4 Streaming Replication Standby invalid Page block

show data_checksums;
data_checksums
----------------
off

tabsdb=# select version();
version
----------------------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4). 64-bit

The output is the same on both master and standby.

The standby replayed all of the outstanding WAL overnight and has now caught up with the primary database; streaming replication is running fine.

The relation "pg_tblspc/16435/PG_9.3_201306121/16444/125127698" belongs to a partition tablespace holding data from the year 2007. I verified that the row counts match between master and standby for the tables that reside in that tablespace.

Is there anything else I can do to verify the consistency on the standby?

thanks

________________________________________
From: Andres Freund [andres@2ndquadrant.com]
Sent: Wednesday, July 02, 2014 7:09 AM
To: Heikki Linnakangas
Cc: Burgess, Freddie; "PostgreSQL Bugs ‎[pgsql-bugs@postgresql.org]‎"
Subject: Re: [BUGS] Postgresql 9.3.4 Streaming Replication Standby invalid Page block


#5 Burgess, Freddie
FBurgess@Radiantblue.com
In reply to: Burgess, Freddie (#4)
Re: Postgresql 9.3.4 Streaming Replication Standby invalid Page block

Today we have the same error in the logs, but now the standby server will not restart at all. This error refers to a static partition holding historical data from 2006, so the problem has to be related to autovacuum.

FATAL: invalid page in block 420538 of relation pg_tblspc/16434/PG_9.3_201306121/16444/125127662
CONTEXT: xlog redo vacuum: rel 16434/16444/125127662; blk 582590, lastBlockVacuumed 0
LOG: startup process (PID 14307) exited with exit code 1
LOG: terminating any other active server processes

Are there any solutions?

thanks
________________________________________
From: pgsql-bugs-owner@postgresql.org [pgsql-bugs-owner@postgresql.org] on behalf of Burgess, Freddie [FBurgess@Radiantblue.com]
Sent: Wednesday, July 02, 2014 4:04 PM
To: Andres Freund; Heikki Linnakangas
Cc: "PostgreSQL Bugs ‎[pgsql-bugs@postgresql.org]‎"
Subject: Re: [BUGS] Postgresql 9.3.4 Streaming Replication Standby invalid Page block
