Standby Server Bus 7 error

Started by Fabio Milanoover 11 years ago9 messagesgeneral
Jump to latest
#1Fabio Milano
fmilano@crdsgroup.com

Hi,

Any assistance in interpreting the logs is much appreciated.

Replication server crashes. Below is snippet from log.

2014-07-22 23:36:23 EDT LOG: started streaming WAL from pr
imary at 12/B0000000 on timeline 1
2014-07-22 23:43:12 EDT FATAL: could not receive data from
WAL stream: server closed the connection unexpectedly
This probably means the server terminated a
bnormally
before or while processing the request.

cp: cannot stat '/var/lib/postgresql/archive/00000001000000
12000000D0': No such file or directory
2014-07-22 23:43:12 EDT LOG: unexpected pageaddr 12/8D0000
00 in log segment 0000000100000012000000D0, offset 0
2014-07-22 23:43:28 EDT LOG: restored log file "0000000100
000012000000D0" from archive
cp: cannot stat '/var/lib/postgresql/archive/00000001000000
12000000D1': No such file or directory
2014-07-22 23:43:28 EDT LOG: unexpected pageaddr 12/A50000
00 in log segment 0000000100000012000000D1, offset 0
2014-07-22 23:43:28 EDT LOG: started streaming WAL from primary at 12/D1000000 on timeline 1
2014-07-22 23:53:13 EDT FATAL: could not receive data from WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

cp: cannot stat '/var/lib/postgresql/archive/0000000100000012000000F1': No such file or directory
2014-07-22 23:53:13 EDT LOG: unexpected pageaddr 12/AA000000 in log segment 0000000100000012000000F1, offset 0
2014-07-22 23:53:13 EDT LOG: started streaming WAL from primary at 12/F1000000 on timeline 1

Upon trying a /etc/init.d/postgresql restart

2014-07-23 14:47:55 EDT LOG: restored log file "0000000 00001B0000001D" from archive
2014-07-23 14:47:55 EDT LOG: consistent recovery state ched at 1B/1DFC64C0
2014-07-23 14:47:55 EDT LOG: database system is ready t ccept read only connections
2014-07-23 14:47:55 EDT LOG: restored log file "0000000 00001B0000001E" from archive
2014-07-23 14:47:56 EDT LOG: startup process (PID 730) terminated by signal 7: Bus error
2014-07-23 14:47:56 EDT LOG: terminating any other acti server processes

Recovery.conf

restore_command = 'cp /var/lib/postgresql/archive/%f %p'
standby_mode = 'on'
primary_conninfo = 'host=[ipaddress] port=5432 user=sherweb_standby_server sslmode=require'
archive_cleanup_command = '/usr/lib/postgresql/9.3/bin/pg_archivecleanup /var/lib/postgresql/9.3/archive/ %r'

#2Michael Paquier
michael@paquier.xyz
In reply to: Fabio Milano (#1)
Re: Standby Server Bus 7 error

On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano@crdsgroup.com> wrote:

Any assistance in interpreting the logs is much appreciated.
Replication server crashes. Below is snippet from log.
2014-07-22 23:36:23 EDT LOG: started streaming WAL from pr
imary at 12/B0000000 on timeline 1
2014-07-22 23:43:12 EDT FATAL: could not receive data from
WAL stream: server closed the connection unexpectedly
This probably means the server terminated a
bnormally
before or while processing the request.

The replication stream has finished. Did you root server crash in some way?

cp: cannot stat '/var/lib/postgresql/archive/00000001000000
12000000D0': No such file or directory
2014-07-22 23:43:12 EDT LOG: unexpected pageaddr 12/8D0000
00 in log segment 0000000100000012000000D0, offset 0

This means that server expected new WAL, but found instead an old
record, something that happens when a WAL file is recycled and is not
yet overwritten with new data. Did your archives get corrupted in some
way by a crash of master?

2014-07-22 23:43:28 EDT LOG: restored log file "0000000100
000012000000D0" from archive
cp: cannot stat '/var/lib/postgresql/archive/00000001000000
12000000D1': No such file or directory
2014-07-22 23:43:28 EDT LOG: unexpected pageaddr 12/A50000
00 in log segment 0000000100000012000000D1, offset 0

Same here.

Upon trying a /etc/init.d/postgresql restart
2014-07-23 14:47:55 EDT LOG: restored log file "0000000 00001B0000001D"
from archive
2014-07-23 14:47:55 EDT LOG: consistent recovery state ched at
1B/1DFC64C0
2014-07-23 14:47:55 EDT LOG: database system is ready t ccept read only
connections
2014-07-23 14:47:55 EDT LOG: restored log file "0000000 00001B0000001E"
from archive
2014-07-23 14:47:56 EDT LOG: startup process (PID 730) terminated by
signal 7: Bus error

This may find its root to a hardware problem...

What is the version of 9.3 you are using btw?
--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Michael Paquier (#2)
Re: Standby Server Bus 7 error

Michael Paquier <michael.paquier@gmail.com> wrote:

On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano@crdsgroup.com>

2014-07-23 14:47:56 EDT LOG:  startup process (PID 730)    terminated by signal 7: Bus error

This may find its root to a hardware problem...

... or a VM bug, which is the most frequent cause I have seen.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Fabio Milano
fmilano@crdsgroup.com
In reply to: Michael Paquier (#2)
Re: Standby Server Bus 7 error

-----Original Message-----
From: Michael Paquier [mailto:michael.paquier@gmail.com]
Sent: Wednesday, July 23, 2014 10:25 PM
To: Fabio Milano
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Standby Server Bus 7 error

On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano@crdsgroup.com> wrote:

Any assistance in interpreting the logs is much appreciated.
Replication server crashes. Below is snippet from log.
2014-07-22 23:36:23 EDT LOG: started streaming WAL from pr imary at
12/B0000000 on timeline 1
2014-07-22 23:43:12 EDT FATAL: could not receive data from WAL
stream: server closed the connection unexpectedly
This probably means the server terminated a bnormally
before or while processing the request.

The replication stream has finished. Did you root server crash in some way?

cp: cannot stat '/var/lib/postgresql/archive/00000001000000
12000000D0': No such file or directory
2014-07-22 23:43:12 EDT LOG: unexpected pageaddr 12/8D0000
00 in log segment 0000000100000012000000D0, offset 0

This means that server expected new WAL, but found instead an old record, something that happens when a WAL file is recycled and is not yet overwritten with new data. Did your archives get corrupted in some way by a crash of master?

2014-07-22 23:43:28 EDT LOG: restored log file "0000000100
000012000000D0" from archive
cp: cannot stat '/var/lib/postgresql/archive/00000001000000
12000000D1': No such file or directory
2014-07-22 23:43:28 EDT LOG: unexpected pageaddr 12/A50000
00 in log segment 0000000100000012000000D1, offset 0

Same here.

Upon trying a /etc/init.d/postgresql restart
2014-07-23 14:47:55 EDT LOG: restored log file "0000000 00001B0000001D"
from archive
2014-07-23 14:47:55 EDT LOG: consistent recovery state ched at
1B/1DFC64C0
2014-07-23 14:47:55 EDT LOG: database system is ready t ccept read only
connections
2014-07-23 14:47:55 EDT LOG: restored log file "0000000 00001B0000001E"
from archive
2014-07-23 14:47:56 EDT LOG: startup process (PID 730) terminated by
signal 7: Bus error

This may find its root to a hardware problem...

What is the version of 9.3 you are using btw?
--
Michael

Thank you Michael,

No report of crashing on the Master.

This means that server expected new WAL, but found instead an old record, something that happens when a WAL file is recycled and is not yet overwritten with new data.

Could this be caused by something happening by the developers on the Master?

The Archive command on the master is:
'rsync -az %p postgres@[ipaddress]:archive/%f'

This may find its root to a hardware problem...

The standby server is a VPS. Previously 9.0.13 replication was working .
9.0.13 was removed. OS upgraded to Ubuntu 14.04 and Postgres 9.3.4 installed
Master new install OS Ubuntu 14.04 and new postgres 9.3.4 install

Version is:

psql (PostgreSQL) 9.3.4

Performed a manual resync yesterday however another crash this morning.

Standby - /etc/init.d/postgresql stop
Master - psql -c "select pg_start_backup('label', true)"
Master - rsync -avc --exclude postmaster.pid --exclude postgresql.pid /var/lib/postgresql/9.3/main/* [ip address]:/var/lib/postgresql/9.3/main/
Master - psql -c "SELECT pg_stop_backup()"
Standby - /etc/init.d/postgresql start

Another crash.

2014-07-24 09:58:31 EDT LOG: restored log file "000000010000003300000096" from archive
cp: cannot stat '/var/lib/postgresql/archive/000000010000003300000097': No such file or directory
2014-07-24 09:58:31 EDT LOG: unexpected pageaddr 33/5C000000 in log segment 000000010000003300000097, offset 0
2014-07-24 09:58:32 EDT LOG: started streaming WAL from primary at 33/97000000 on timeline 1
2014-07-24 10:03:50 EDT FATAL: could not receive data from WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

cp: cannot stat '/var/lib/postgresql/archive/0000000100000033000000B7': No such file or directory
2014-07-24 10:03:50 EDT LOG: unexpected pageaddr 33/6A000000 in log segment 0000000100000033000000B7, offset 0
2014-07-24 10:04:06 EDT LOG: restored log file "0000000100000033000000B7" from archive
cp: cannot stat '/var/lib/postgresql/archive/0000000100000033000000B8': No such file or directory
2014-07-24 10:04:06 EDT LOG: unexpected pageaddr 33/76000000 in log segment 0000000100000033000000B8, offset 0
2014-07-24 10:04:06 EDT LOG: started streaming WAL from primary at 33/B8000000 on timeline 1
2014-07-24 10:06:24 EDT LOG: startup process (PID 18905) was terminated by signal 7: Bus error
2014-07-24 10:06:24 EDT LOG: terminating any other active server processes

/etc/init.d/postgresql restart

2014-07-24 15:19:55 EDT LOG: restored log file "000000010000003400000009" from archive
2014-07-24 15:19:57 EDT LOG: startup process (PID 22458) was terminated by signal 7: Bus error
2014-07-24 15:19:57 EDT LOG: terminating any other active server processes

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Fabio Milano
fmilano@crdsgroup.com
In reply to: Fabio Milano (#4)
Re: Standby Server Bus 7 error

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Fabio Milano
Sent: Thursday, July 24, 2014 12:40 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Standby Server Bus 7 error

-----Original Message-----
From: Michael Paquier [mailto:michael.paquier@gmail.com]
Sent: Wednesday, July 23, 2014 10:25 PM
To: Fabio Milano
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Standby Server Bus 7 error

On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano@crdsgroup.com> wrote:

Any assistance in interpreting the logs is much appreciated.
Replication server crashes. Below is snippet from log.
2014-07-22 23:36:23 EDT LOG: started streaming WAL from pr imary at
12/B0000000 on timeline 1
2014-07-22 23:43:12 EDT FATAL: could not receive data from WAL
stream: server closed the connection unexpectedly
This probably means the server terminated a bnormally
before or while processing the request.

The replication stream has finished. Did you root server crash in some way?

cp: cannot stat '/var/lib/postgresql/archive/00000001000000
12000000D0': No such file or directory
2014-07-22 23:43:12 EDT LOG: unexpected pageaddr 12/8D0000
00 in log segment 0000000100000012000000D0, offset 0

This means that server expected new WAL, but found instead an old record, something that happens when a WAL file is recycled and is not yet overwritten with new data. Did your archives get corrupted in some way by a crash of master?

2014-07-22 23:43:28 EDT LOG: restored log file "0000000100
000012000000D0" from archive
cp: cannot stat '/var/lib/postgresql/archive/00000001000000
12000000D1': No such file or directory
2014-07-22 23:43:28 EDT LOG: unexpected pageaddr 12/A50000
00 in log segment 0000000100000012000000D1, offset 0

Same here.

Upon trying a /etc/init.d/postgresql restart
2014-07-23 14:47:55 EDT LOG: restored log file "0000000 00001B0000001D"
from archive
2014-07-23 14:47:55 EDT LOG: consistent recovery state ched at
1B/1DFC64C0
2014-07-23 14:47:55 EDT LOG: database system is ready t ccept read only
connections
2014-07-23 14:47:55 EDT LOG: restored log file "0000000 00001B0000001E"
from archive
2014-07-23 14:47:56 EDT LOG: startup process (PID 730) terminated by
signal 7: Bus error

This may find its root to a hardware problem...

What is the version of 9.3 you are using btw?
--
Michael

Thank you Michael,

No report of crashing on the Master.

This means that server expected new WAL, but found instead an old record, something that happens when a WAL file is recycled and is not yet overwritten with new data.

Could this be caused by something happening by the developers on the Master?

The Archive command on the master is:
'rsync -az %p postgres@[ipaddress]:archive/%f'

This may find its root to a hardware problem...

The standby server is a VPS. Previously 9.0.13 replication was working .
9.0.13 was removed. OS upgraded to Ubuntu 14.04 and Postgres 9.3.4 installed Master new install OS Ubuntu 14.04 and new postgres 9.3.4 install

Version is:

psql (PostgreSQL) 9.3.4

Performed a manual resync yesterday however another crash this morning.

Standby - /etc/init.d/postgresql stop
Master - psql -c "select pg_start_backup('label', true)"
Master - rsync -avc --exclude postmaster.pid --exclude postgresql.pid /var/lib/postgresql/9.3/main/* [ip address]:/var/lib/postgresql/9.3/main/
Master - psql -c "SELECT pg_stop_backup()"
Standby - /etc/init.d/postgresql start

Another crash.

2014-07-24 09:58:31 EDT LOG: restored log file "000000010000003300000096" from archive
cp: cannot stat '/var/lib/postgresql/archive/000000010000003300000097': No such file or directory
2014-07-24 09:58:31 EDT LOG: unexpected pageaddr 33/5C000000 in log segment 000000010000003300000097, offset 0
2014-07-24 09:58:32 EDT LOG: started streaming WAL from primary at 33/97000000 on timeline 1
2014-07-24 10:03:50 EDT FATAL: could not receive data from WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

cp: cannot stat '/var/lib/postgresql/archive/0000000100000033000000B7': No such file or directory
2014-07-24 10:03:50 EDT LOG: unexpected pageaddr 33/6A000000 in log segment 0000000100000033000000B7, offset 0
2014-07-24 10:04:06 EDT LOG: restored log file "0000000100000033000000B7" from archive
cp: cannot stat '/var/lib/postgresql/archive/0000000100000033000000B8': No such file or directory
2014-07-24 10:04:06 EDT LOG: unexpected pageaddr 33/76000000 in log segment 0000000100000033000000B8, offset 0
2014-07-24 10:04:06 EDT LOG: started streaming WAL from primary at 33/B8000000 on timeline 1
2014-07-24 10:06:24 EDT LOG: startup process (PID 18905) was terminated by signal 7: Bus error
2014-07-24 10:06:24 EDT LOG: terminating any other active server processes

/etc/init.d/postgresql restart

2014-07-24 15:19:55 EDT LOG: restored log file "000000010000003400000009" from archive
2014-07-24 15:19:57 EDT LOG: startup process (PID 22458) was terminated by signal 7: Bus error
2014-07-24 15:19:57 EDT LOG: terminating any other active server processes

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Confirmed with Developers:

We haven't done any rollbacks, and haven't seen any crashes.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Merlin Moncure
mmoncure@gmail.com
In reply to: Kevin Grittner (#3)
Re: Standby Server Bus 7 error

On Thu, Jul 24, 2014 at 9:48 AM, Kevin Grittner <kgrittn@ymail.com> wrote:

Michael Paquier <michael.paquier@gmail.com> wrote:

On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano@crdsgroup.com>

2014-07-23 14:47:56 EDT LOG: startup process (PID 730) terminated by signal 7: Bus error

This may find its root to a hardware problem...

... or a VM bug, which is the most frequent cause I have seen.

yeah, in particular, with OpenVZ. Is this server virtualized?

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Fabio Milano
fmilano@crdsgroup.com
In reply to: Merlin Moncure (#6)
Re: Standby Server Bus 7 error

Yes.

The Standby/Slave server is a VPS. Virtuozzo.

-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com]
Sent: Monday, July 28, 2014 9:47 AM
To: Kevin Grittner
Cc: Michael Paquier; Fabio Milano; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Standby Server Bus 7 error

On Thu, Jul 24, 2014 at 9:48 AM, Kevin Grittner <kgrittn@ymail.com> wrote:

Michael Paquier <michael.paquier@gmail.com> wrote:

On Thu, Jul 24, 2014 at 12:53 PM, Fabio Milano <fmilano@crdsgroup.com>

2014-07-23 14:47:56 EDT LOG: startup process (PID 730) terminated by signal 7: Bus error

This may find its root to a hardware problem...

... or a VM bug, which is the most frequent cause I have seen.

yeah, in particular, with OpenVZ. Is this server virtualized?

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Merlin Moncure
mmoncure@gmail.com
In reply to: Fabio Milano (#7)
Re: Standby Server Bus 7 error

On Mon, Jul 28, 2014 at 11:50 AM, Fabio Milano <fmilano@crdsgroup.com> wrote:

Yes.

The Standby/Slave server is a VPS. Virtuozzo.

Virtuozzo is based on OpenVZ. So it's time to look for OpenVZ bus
errors and what people did to get around them. In at least one case
reducing shared buffers fixed the problem.

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Fabio Milano
fmilano@crdsgroup.com
In reply to: Merlin Moncure (#8)
Re: Standby Server Bus 7 error

Thanks you!

-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com]
Sent: Monday, July 28, 2014 9:54 AM
To: Fabio Milano
Cc: Kevin Grittner; Michael Paquier; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Standby Server Bus 7 error

On Mon, Jul 28, 2014 at 11:50 AM, Fabio Milano <fmilano@crdsgroup.com> wrote:

Yes.

The Standby/Slave server is a VPS. Virtuozzo.

Virtuozzo is based on OpenVZ. So it's time to look for OpenVZ bus errors and what people did to get around them. In at least one case reducing shared buffers fixed the problem.

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general