BUG #7562: could not read block 0 in file "base/16385/16585": read only 0 of 8192 bytes
The following bug has been logged on the website:
Bug reference: 7562
Logged by: Mayank Mittal
Email address: mayank.mittal.1982@hotmail.com
PostgreSQL version: 9.1.5
Operating system: Debian Linux 6.0
Description:
We are using a two-node setup of PostgreSQL 9.1.5 in which one node is the master and
the other is a slave kept in sync with the master via streaming replication.
The design is such that in case of master node failure the slave
node has to take over the master role. I'm controlling this behaviour using Corosync
and Heartbeat.
My application requires heavy database updates. Upon fail-over
I've noticed that database indexes got corrupted.
I'm not sure why this is happening. I was reading the release notes of 9.1.3
and found that a similar issue was already fixed there, but we are still facing the same problem.
Here is a snapshot of installed postgresql packages:
mayank@server:~$ dpkg -l | grep postgres
ii  postgresql-9.1            9.1.5-1~bpo60+1   object-relational SQL database, version 9.1 server
ii  postgresql-client-9.1     9.1.5-1~bpo60+1   front-end programs for PostgreSQL 9.1
ii  postgresql-client-common  130~bpo60+1       manager for multiple PostgreSQL client versions
ii  postgresql-common         130~bpo60+1       PostgreSQL database-cluster manager
ii  postgresql-contrib        9.1+130~bpo60+2   additional facilities for PostgreSQL (supported version)
ii  postgresql-contrib-9.1    9.1.5-1~bpo60+1   additional facilities for PostgreSQL
Regards,
Mayank Mittal
Date: Thu, 20 Sep 2012 16:15:11 +0000
Subject: [BUGS] BUG #7562: could not read block 0 in file "base/16385/16585": read only 0 of 8192 bytes
To: pgsql-bugs@postgresql.org
From: mayank.mittal.1982@hotmail.com
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
mayank.mittal.1982@hotmail.com writes:
My application requires heavy database updates. Upon fail-over
I've noticed that database indexes got corrupted.
Hmm. There is a fix for a slave-side-index-corruption problem in 9.1.6,
which is due to be announced Monday. I am not certain whether this is
the same thing though; that bug is low-probability as far as we can
tell (it would only happen if the master had been in the middle of an
index page split or page deletion at the instant of failover). Anyway
the first thing to find out is whether 9.1.6 fixes it.
regards, tom lane
Hello Tom,
Thanks for the information. But the problem is that it is occurring quite frequently in my case.
Regards,
Mayank Mittal
On Thursday, September 20, 2012 07:15:17 PM Tom Lane wrote:
mayank.mittal.1982@hotmail.com writes:
My application requires heavy database updates. Upon
fail-over I've noticed that database indexes got corrupted.
What kind of indexes are you using? Hash indexes by any chance?
Since you say downthread that the failures are frequent, could you provide a bit more
detail about your setup (including configuration, initial setup, etc.) and the
logs from both machines?
Hmm. There is a fix for a slave-side-index-corruption problem in 9.1.6,
which is due to be announced Monday. I am not certain whether this is
the same thing though; that bug is low-probability as far as we can
tell (it would only happen if the master had been in the middle of an
index page split or page deletion at the instant of failover). Anyway
the first thing to find out is whether 9.1.6 fixes it.
I think the likelihood of that bug causing the index file to be zero bytes
- at least that's what I read from $subject - is really, really small:
The index would need to be created (setting a proper BM_PERMANENT flag on the
meta page), evicted from the buffer cache and thus written to the filesystem,
the root page would need to split causing the meta page to be rewritten (this
time without a proper BM_PERMANENT) in very quick succession, followed by an
OS/HW failure losing the data already in the OS cache.
So, unless I am missing something, I don't see how that can happen.
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
On Thursday, September 20, 2012 07:15:17 PM Tom Lane wrote:
Hmm. There is a fix for a slave-side-index-corruption problem in 9.1.6,
which is due to be announced Monday. I am not certain whether this is
the same thing though; that bug is low-probability as far as we can
tell (it would only happen if the master had been in the middle of an
index page split or page deletion at the instant of failover). Anyway
the first thing to find out is whether 9.1.6 fixes it.
I think the likelihood of that bug causing the index file to be zero bytes
- at least that's what I read from $subject - is really, really small:
Sure, but what about the heap? The case I was speculating about was
that the heap had been truncated, but because of the corruption problem,
the index still had heap pointers in it. We don't know what file 16585
is supposed to be.
Your point about hash indexes is definitely worth asking though...
that would square with the reported symptoms.
regards, tom lane
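Since the open question here is what file 16585 actually is, the following is a minimal sketch of how the error text could be taken apart and turned into catalog lookups. It assumes the path has the plain form base/<database OID>/<relfilenode> with no segment or fork suffix, and the function names are purely illustrative. The two queries use the standard pg_database and pg_class catalogs; the second has to be run while connected to the database identified by the first.

```python
import re

# Pattern for the read error quoted in this thread's subject line.
# Assumption: the path is base/<db oid>/<relfilenode> without a ".N"
# segment suffix or a fork suffix such as "_fsm".
ERROR_RE = re.compile(
    r'could not read block (?P<block>\d+) in file '
    r'"base/(?P<db_oid>\d+)/(?P<relfilenode>\d+)": '
    r'read only (?P<got>\d+) of (?P<want>\d+) bytes'
)


def parse_read_error(message):
    """Return the numeric fields of the error as a dict, or None if no match."""
    match = ERROR_RE.search(message)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def lookup_queries(fields):
    """Build catalog queries (as text) that identify the database and relation."""
    return [
        "SELECT datname FROM pg_database WHERE oid = %d;" % fields["db_oid"],
        # Run this one while connected to the database found above.
        # relkind 'r' is an ordinary table, 'i' an index, 't' a TOAST table.
        "SELECT relname, relkind FROM pg_class WHERE relfilenode = %d;"
        % fields["relfilenode"],
    ]


if __name__ == "__main__":
    msg = ('ERROR: could not read block 0 in file "base/16385/16585": '
           'read only 0 of 8192 bytes')
    for query in lookup_queries(parse_read_error(msg)):
        print(query)
```

The relkind column in the result would tell whether the file belongs to a heap or to an index, which is the distinction being speculated about above.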
On Thursday, September 20, 2012 11:38:52 PM Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On Thursday, September 20, 2012 07:15:17 PM Tom Lane wrote:
Hmm. There is a fix for a slave-side-index-corruption problem in 9.1.6,
which is due to be announced Monday. I am not certain whether this is
the same thing though; that bug is low-probability as far as we can
tell (it would only happen if the master had been in the middle of an
index page split or page deletion at the instant of failover). Anyway
the first thing to find out is whether 9.1.6 fixes it.
I think the likelihood of that bug causing the index file to be zero
bytes - at least that's what I read from $subject - is really, really small:
Sure, but what about the heap? The case I was speculating about was
that the heap had been truncated, but because of the corruption problem,
the index still had heap pointers in it. We don't know what file 16585
is supposed to be.
Hm. Interesting thought.
*think*
Wouldn't the truncation have created a completely new index relation?
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
On Thursday, September 20, 2012 11:38:52 PM Tom Lane wrote:
Sure, but what about the heap? The case I was speculating about was
that the heap had been truncated, but because of the corruption problem,
the index still had heap pointers in it. We don't know what file 16585
is supposed to be.
Wouldn't the truncation have created a completely new index relation?
If it were an actual TRUNCATE, yeah. But it could be a case of VACUUM
truncating a now-empty table to zero blocks.
But nothing like this would explain the OP's report that corruption is
completely reproducible for him. So I like your theory about hash index
use better. We really oughta get some WAL support in there.
regards, tom lane
Hello Andres,
I didn't specify a hash index type explicitly. I'm relying on the default, which is B-Tree.
Here is the basic configuration of my system:
Operating System: Debian Linux 6.0
Type: 64-bit
File system type: ext4
RAM: 4G
Also, I didn't understand where to find the BM_PERMANENT flag setting.
Here are the steps for the initial setup:
1. Server 1 is running in master mode.
2. When server 2 comes up, our Resource Agent initiates a dump of the master node (using the pg_basebackup command shown below) and copies it to the data folder of the slave node.
3. Once it is copied completely, we create a recovery.conf file on the slave node and start PostgreSQL.
4. In case of master failure, the RA creates a trigger file on the slave to promote it to master.
I'm using the following command to take the dump of the master:
pg_basebackup -U postgres -h <master_node_ip> -P -x -D <backup_location>
Regards,
Mayank Mittal
Barco Electronics System Ltd.
Mob. +91 9873437922
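The contents of the recovery.conf written in step 3 are not shown in the thread. As a rough sketch only, a 9.1-style streaming standby usually needs at least the three settings below; the host, user and trigger-file path are placeholders rather than values from this report, and the helper name is hypothetical.

```python
def render_recovery_conf(master_host, replication_user, trigger_file):
    """Return recovery.conf text for a 9.1-style streaming-replication standby.

    Assumption: none of the values contains a single quote, so no escaping
    is done here.
    """
    lines = [
        # Keep replaying WAL instead of finishing recovery and opening up.
        "standby_mode = 'on'",
        # libpq connection string used to stream WAL from the master.
        "primary_conninfo = 'host=%s user=%s'" % (master_host, replication_user),
        # The standby is promoted once this file appears (step 4 above).
        "trigger_file = '%s'" % trigger_file,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All three arguments are placeholders, not taken from the thread.
    print(render_recovery_conf("192.0.2.10", "postgres", "/tmp/promote_standby"), end="")
```

Comparing the actual file against a minimal one like this can help rule out a setup problem before looking for a server bug.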
Attachments:
postgresql.conf (text/plain)
--On 20. September 2012 18:18:12 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:
If it were an actual TRUNCATE, yeah. But it could be a case of VACUUM
truncating a now-empty table to zero blocks.
But nothing like this would explain the OP's report that corruption is
completely reproducible for him. So I like your theory about hash index
use better. We really oughta get some WAL support in there.
We had a similar issue at a customer site. The server was shut down to
update it from 9.1.4 to 9.1.5; after starting it again, the log was
immediately cluttered with
ERROR: could not read block 251 in file "base/6447890/7843708": read only 0 of 8192 bytes
The index was a primary key on a table with mostly INSERTs (only a few
hundred DELETEs; autovacuum hadn't even bothered to vacuum it yet, and there was no
manual VACUUM). According to the customer, no DDL takes place on
this specific table. The kernel didn't show any errors.
--
Thanks
Bernd
On Friday, September 21, 2012 10:18:39 AM Bernd Helmle wrote:
--On 20. September 2012 18:18:12 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:
If it were an actual TRUNCATE, yeah. But it could be a case of VACUUM
truncating a now-empty table to zero blocks.
But nothing like this would explain the OP's report that corruption is
completely reproducible for him. So I like your theory about hash index
use better. We really oughta get some WAL support in there.
We had a similar issue at a customer site. The server was shut down for
updating it from 9.1.4 to 9.1.5, after starting it again the log was
immediately cluttered with
How was it shut down? -m fast or -m immediate?
ERROR: could not read block 251 in file "base/6447890/7843708": read only
0 of 8192 bytes
So, not block 0. How many blocks does the new index contain?
Mayank:
Do you always see the error in block 0?
The index was a primary key on a table with mostly INSERTs (only a few
hundred DELETEs, autovacuum didn't even bother to vacuum it yet and no
manual VACUUM). According to the customer, no DDL action takes place on
this specific table. The kernel didn't show any errors.
OK, this is getting weird. Bernd confirmed on IRC some minutes ago that the
table is older than the last checkpoint...
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
No. Most of the time I've seen it in block 0, but 2-3 times it was with other blocks as well.
Regards,
Mayank Mittal
Barco Electronics System Ltd.
Mob. +91 9873437922
--On 21. September 2012 10:25:50 +0200 Andres Freund
<andres@2ndquadrant.com> wrote:
We had a similar issue at a customer site. The server was shut down for
updating it from 9.1.4 to 9.1.5, after starting it again the log was
immediately cluttered with
How was it shut down? -m fast or -m immediate?
-m fast
ERROR: could not read block 251 in file "base/6447890/7843708": read
only 0 of 8192 bytes
So, not block 0. How many blocks does the new index contain?
255 blocks according to its current size.
--
Thanks
Bernd
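The two numbers in this exchange (a failed read of block 251, and a current size of 255 blocks) can be compared with simple arithmetic. The sketch below assumes the 8192-byte block size stated in the error message and a relation that fits in a single file segment; the helper names are illustrative.

```python
BLOCK_SIZE = 8192  # bytes per block, as stated in the error message


def block_offset(block, block_size=BLOCK_SIZE):
    """Byte offset at which the given block starts within the file."""
    return block * block_size


def block_within_file(block, file_size, block_size=BLOCK_SIZE):
    """True if the whole block lies inside a file of file_size bytes."""
    return block_offset(block, block_size) + block_size <= file_size


if __name__ == "__main__":
    size_now = 255 * BLOCK_SIZE               # 255 blocks, per the report above
    print(block_offset(251))                  # 2056192
    print(size_now)                           # 2088960
    print(block_within_file(251, size_now))   # True
```

Block 251 lies inside a 255-block file, so a read that returned 0 bytes at that offset suggests the file was shorter when the error was logged than it is at its current size. That is only one possible reading of the numbers, not a diagnosis.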
On Friday, September 21, 2012 01:37:38 PM Mayank Mittal wrote:
As discussed with Andres on IRC, I tried to reproduce the issue with some
debug logging enabled. In order to reproduce it, I fixed my already broken system
(index corrupted) by running REINDEX DATABASE <database_name>. Once done, I
performed the failover and now I'm getting the following error:
[org.postgresql.util.PSQLException: ERROR: missing chunk number 0
for toast value 33972 in pg_toast_16582]
Unfortunately I don't think it's really a valid approach to start from an
already corrupted database when doing this :( There might already be lingering
corruption causing the problem.
Have you seen the missing chunk error before? Did you reproduce the issue from
a corrupted database as well before?
Greetings,
Andres
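To see which table the TOAST error refers to, the number in the TOAST table name can be used: a TOAST table is normally named pg_toast_<OID of the table that owns it>. A small sketch under that assumption follows, with hypothetical helper names; the generated query relies on the standard regclass cast.

```python
import re

TOAST_RE = re.compile(
    r"missing chunk number (?P<chunk>\d+) for toast value (?P<value>\d+) "
    r"in pg_toast_(?P<owner_oid>\d+)"
)


def parse_toast_error(message):
    """Return chunk number, toast value and owning-table OID, or None."""
    # Collapse whitespace first so a message wrapped across lines still matches.
    match = TOAST_RE.search(" ".join(message.split()))
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def owner_lookup_query(fields):
    """Query text that resolves the owning table's OID to its name."""
    return "SELECT %d::regclass;" % fields["owner_oid"]


if __name__ == "__main__":
    msg = ("ERROR: missing chunk number 0\n"
           "for toast value 33972 in pg_toast_16582")
    print(owner_lookup_query(parse_toast_error(msg)))
```

Run in the affected database, the query names the table whose out-of-line values are stored in pg_toast_16582.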
No, this is the first time I've seen this issue. In the past I have also reindexed the tables and it worked well.
BTW, I'm now resetting the database to start afresh.
Regards,
Mayank Mittal
Barco Electronics System Ltd.
Mob. +91 9873437922