Postgres 9.3 read block error went into recovery mode

Started by Shuwn Yuan Tee, over 12 years ago, 5 messages, general
#1 Shuwn Yuan Tee
shuwnyuan@binary.com

We recently experienced a crash on our Postgres production server. Here's our
server environment:

- Postgres 9.3
- in OpenVZ container
- total memory: 64GB

Here's the error snippet from postgres log:

ERROR: could not read block 356121 in file "base/33134/33598.2": Bad
address
LOG: server process (PID 21119) was terminated by signal 7: Bus error

WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted; last known up at 2013-12-03 08:47:06
UTC
LOG: database system was not properly shut down; automatic recovery in
progress
UTC FATAL: the database system is in recovery mode

LOG: checkpoint complete: wrote 10499 buffers (0.7%); 0 transaction log
file(s) added, 0 removed, 4 recycled; write=0.215 s, sync=11.405 s,
total=11.631
FATAL: the database system is in recovery mode
LOG: database system is ready to accept connections

Can anyone suggest whether this is a critical error? Does it indicate any
data corruption in Postgres?

We think it is unlikely to be related, but this is what we did a few
hours before the crash:

(1) Tried to improve query performance by tweaking these settings:
a) shared_buffers: 8GB -> 16GB
b) effective_cache_size: 16GB -> 32GB
c) random_page_cost: 4 -> 2
d) restarted Postgres

(2) Seeing no obvious improvement in performance, changed the settings in (1)
back to their previous values and restarted

Thanks if anyone has any insight.

regards,
shuwn yuan

#2 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Shuwn Yuan Tee (#1)
Re: Postgres 9.3 read block error went into recovery mode

Shuwn Yuan Tee wrote:

> We recently experienced a crash on our Postgres production server. Here's our server environment:
>
> - Postgres 9.3
> - in OpenVZ container
> - total memory: 64GB
>
> Here's the error snippet from postgres log:
>
> ERROR: could not read block 356121 in file "base/33134/33598.2": Bad address
> LOG: server process (PID 21119) was terminated by signal 7: Bus error

[...]

> Can anyone suggest whether this is a critical error? Does it indicate any data corruption in Postgres?

Yes, this is a critical error.

Unless my math is off, a PostgreSQL disk file should not contain more
than 131072 blocks (1GB / 8KB), so something is whacky there.

But I find the second entry just as alarming.

I am no hardware guy, but I believe that a bus error would indicate a
hardware problem.

Is there a chance that you can perform a thorough hardware check
on the machine?

Make sure that you have a good backup from before this happened.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Laurenz Albe (#2)
Re: Postgres 9.3 read block error went into recovery mode

Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> Shuwn Yuan Tee wrote:
>> We recently experienced a crash on our Postgres production server.
>> Here's our server environment:
>>
>> - in OpenVZ container
>>
>> ERROR:  could not read block 356121 in file "base/33134/33598.2": Bad address
>> LOG:  server process (PID 21119) was terminated by signal 7: Bus error
>
> Unless my math is off, a PostgreSQL disk file should not contain
> more than 131072 blocks (1GB / 8KB), so something is whacky
> there.

Not at all; the block number is the logical block number within the
relation; it determines both the segment to read from (in this case
".2") and the offset into that segment.  That all looks fine.

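To make the arithmetic above concrete, here is a minimal sketch of how a logical block number maps to a segment file and an offset. It assumes the default build settings (8 KB blocks, 1 GB segments); the constant and function names are illustrative only and are not taken from the PostgreSQL source.

```python
# Assumed defaults: 8 KB blocks and 1 GB segment files.
BLOCK_SIZE = 8192             # bytes per block
BLOCKS_PER_SEGMENT = 131072   # 1 GB / 8 KB


def locate_block(block_number):
    """Map a relation-wide block number to (segment, block in segment, byte offset)."""
    segment, block_in_segment = divmod(block_number, BLOCKS_PER_SEGMENT)
    return segment, block_in_segment, block_in_segment * BLOCK_SIZE


# Block 356121 from the log falls in segment 2, i.e. the file ending in ".2".
print(locate_block(356121))  # (2, 93977, 769859584)
```

Under those assumptions the block number in the error message is consistent with the ".2" file name.
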
> I am no hardware guy, but I believe that a bus error would
> indicate a hardware problem.

Or a VM problem.  Personally I have never seen this except in a VM,
and the cause always turned out to be a VM bug.  Be sure you are
up-to-date on bug fixes for the software.
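
As a small aid for reading the two log lines, the sketch below looks up the symbolic names behind "signal 7" and "Bad address". It uses only the Python standard library; the helper names are illustrative. On Linux x86, signal 7 is SIGBUS, and "Bad address" is the usual message text for EFAULT, which read() reports when the destination buffer is not accessible to the process. That points at memory mapping rather than at the file contents, though it does not by itself rule out other damage.

```python
import errno
import os
import signal


def signal_name(number):
    """Return the symbolic name of a signal number on this platform."""
    return signal.Signals(number).name


def errno_name(code):
    """Return the symbolic name of an errno value."""
    return errno.errorcode[code]


if __name__ == "__main__":
    # Signal numbering is platform dependent; 7 is SIGBUS on Linux x86.
    print(signal_name(7))
    # The message text comes from the C library's strerror().
    print(errno_name(errno.EFAULT), "-", os.strerror(errno.EFAULT))
```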

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#4 Merlin Moncure
mmoncure@gmail.com
In reply to: Shuwn Yuan Tee (#1)
Re: Postgres 9.3 read block error went into recovery mode

On Tue, Dec 3, 2013 at 4:32 AM, Shuwn Yuan Tee <shuwnyuan@binary.com> wrote:

[...]

This seems to be a bug in OpenVZ. It appears not to like high shared_buffers
settings.

merlin


#5 Shuwn Yuan Tee
shuwnyuan@binary.com
In reply to: Kevin Grittner (#3)
Re: Postgres 9.3 read block error went into recovery mode

Thanks everyone for the replies. I would conclude that it is an OpenVZ problem;
we will probably run some further checks just to make sure there is no data
corruption.

Many thanks again :)
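
One possible form for such a further check, sketched here as an assumption and not as something described in the thread, is to run a full pg_dump and discard the output. That forces every table block to be read, so an unreadable block would surface as an error. It does not read indexes, so it is only a partial check. The database name "mydb" and the helper names are hypothetical.

```python
import subprocess


def build_dump_check_command(dbname):
    """Build a pg_dump command that reads all table data and discards the output."""
    return ["pg_dump", "--file=/dev/null", dbname]


def run_dump_check(dbname):
    """Run the dump; return (succeeded, stderr text) so read errors can be inspected."""
    result = subprocess.run(
        build_dump_check_command(dbname),
        stderr=subprocess.PIPE,
        text=True,
    )
    return result.returncode == 0, result.stderr


if __name__ == "__main__":
    ok, errors = run_dump_check("mydb")  # "mydb" is a placeholder database name
    print("dump completed" if ok else "dump failed:\n" + errors)
```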
