How to continue streaming replication after this error?

Started by Torsten Förtsch, about 12 years ago, 5 messages, general
#1Torsten Förtsch
torsten.foertsch@gmx.net

Hi,

one of our streaming replicas died with

2014-02-21 05:17:10 UTC PANIC: heap2_redo: unknown op code 32
2014-02-21 05:17:10 UTC CONTEXT: xlog redo UNKNOWN
2014-02-21 05:17:11 UTC LOG: startup process (PID 1060) was terminated
by signal 6: Aborted
2014-02-21 05:17:11 UTC LOG: terminating any other active server processes
2014-02-21 05:17:11 UTC WARNING: terminating connection because of
crash of another server process
2014-02-21 05:17:11 UTC DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2014-02-21 05:17:11 UTC HINT: In a moment you should be able to
reconnect to the database and repeat your command.

Now, if I try to restart it, I get this:

The PostgreSQL server failed to start. Please check the log output:
2014-02-21 07:42:53 UTC LOG: database system was interrupted while in
recovery at log time 2014-02-21 05:02:45 UTC
2014-02-21 07:42:53 UTC HINT: If this has occurred more than once some
data might be corrupted and you might need to choose an earlier recovery
target.
2014-02-21 07:42:53 UTC LOG: incomplete startup packet
2014-02-21 07:42:53 UTC LOG: entering standby mode
2014-02-21 07:42:53 UTC LOG: redo starts at 11C/B2211778
2014-02-21 07:42:53 UTC FATAL: the database system is starting up
2014-02-21 07:42:54 UTC LOG: consistent recovery state reached at
11C/B4234108
2014-02-21 07:42:54 UTC LOG: database system is ready to accept read
only connections
2014-02-21 07:42:54 UTC PANIC: heap2_redo: unknown op code 32
2014-02-21 07:42:54 UTC CONTEXT: xlog redo UNKNOWN
2014-02-21 07:42:54 UTC LOG: startup process (PID 38187) was terminated
by signal 6: Aborted
2014-02-21 07:42:54 UTC LOG: terminating any other active server processes

This is 9.3.2. What is the recommended way to continue replication? Or do I
need to start from a fresh base backup?

Thanks,
Torsten

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Torsten Förtsch
torsten.foertsch@gmx.net
In reply to: Torsten Förtsch (#1)
Re: How to continue streaming replication after this error?

On 21/02/14 09:17, Torsten Förtsch wrote:

one of our streaming replicas died with

2014-02-21 05:17:10 UTC PANIC: heap2_redo: unknown op code 32
2014-02-21 05:17:10 UTC CONTEXT: xlog redo UNKNOWN
2014-02-21 05:17:11 UTC LOG: startup process (PID 1060) was terminated
by signal 6: Aborted
2014-02-21 05:17:11 UTC LOG: terminating any other active server processes
2014-02-21 05:17:11 UTC WARNING: terminating connection because of
crash of another server process
2014-02-21 05:17:11 UTC DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2014-02-21 05:17:11 UTC HINT: In a moment you should be able to
reconnect to the database and repeat your command.

Any idea what that means?

I have got a second replica dying with the same symptoms.

Thanks,
Torsten


#3Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Torsten Förtsch (#2)
Re: How to continue streaming replication after this error?

On Sat, Feb 22, 2014 at 1:21 PM, Torsten Förtsch
<torsten.foertsch@gmx.net> wrote:

On 21/02/14 09:17, Torsten Förtsch wrote:

one of our streaming replicas died with

2014-02-21 05:17:10 UTC PANIC: heap2_redo: unknown op code 32
2014-02-21 05:17:10 UTC CONTEXT: xlog redo UNKNOWN
2014-02-21 05:17:11 UTC LOG: startup process (PID 1060) was terminated
by signal 6: Aborted
2014-02-21 05:17:11 UTC LOG: terminating any other active server processes
2014-02-21 05:17:11 UTC WARNING: terminating connection because of
crash of another server process
2014-02-21 05:17:11 UTC DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2014-02-21 05:17:11 UTC HINT: In a moment you should be able to
reconnect to the database and repeat your command.

Any idea what that means?

I have got a second replica dying with the same symptoms.

The Xlog record seems to be corrupted. The op code 32
represents XLOG_HEAP2_FREEZE_PAGE, and the code exists to handle it,
so I don't know why the system is not able to recognize the op code. Can you
try running pg_xlogdump on the corrupted WAL file?
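
For readers following along, here is a minimal sketch of how the number in the PANIC message relates to the record type named above. The two mask values are assumptions recalled from PostgreSQL's WAL headers and are not stated in this thread; the only mapping taken from the thread is 32 (0x20) to XLOG_HEAP2_FREEZE_PAGE. The function name is hypothetical.

```python
# Assumed mask values (not stated in this thread):
XLR_INFO_MASK = 0x0F      # low bits of xl_info, assumed reserved for the WAL machinery
XLOG_HEAP_OPMASK = 0x70   # bits assumed to select the heap/heap2 operation

# Only the mapping mentioned in the thread is listed here.
KNOWN_HEAP2_OPS = {0x20: "XLOG_HEAP2_FREEZE_PAGE"}


def decode_heap2_info(xl_info):
    """Return (op_code, name) for a heap2 record's info byte."""
    rmgr_info = xl_info & ~XLR_INFO_MASK & 0xFF  # drop the low 4 bits
    op = rmgr_info & XLOG_HEAP_OPMASK            # keep only the operation bits
    return op, KNOWN_HEAP2_OPS.get(op, "unknown")


if __name__ == "__main__":
    op, name = decode_heap2_info(32)  # the value from the PANIC line
    print(hex(op), name)
```

A server binary that has no entry for a given op code would fall into the "unknown" branch, which is the situation the PANIC line describes.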

Keep the data folder for problem investigation. As it seems to be some kind of
corruption, you need to take a fresh base backup to continue.
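
To point pg_xlogdump at the right file, one first needs the WAL segment name that contains a given log position, such as the "redo starts at 11C/B2211778" line in the original report. The sketch below assumes the default 16 MB segment size and timeline 1; neither is stated in this thread, and the helper name is hypothetical.

```python
# Assumption: default WAL segment size of 16 MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_segment_name(lsn, timeline=1):
    """Map an LSN such as '11C/B2211778' to a 24-character segment file name.

    The timeline default of 1 is an assumption; check the standby's
    pg_xlog directory for the actual value.
    """
    hi_str, lo_str = lsn.split("/")
    position = (int(hi_str, 16) << 32) | int(lo_str, 16)  # 64-bit byte position
    segno = position // WAL_SEGMENT_SIZE                  # overall segment number
    segs_per_id = 0x100000000 // WAL_SEGMENT_SIZE         # segments per 4 GB "log id"
    return "%08X%08X%08X" % (timeline, segno // segs_per_id, segno % segs_per_id)


if __name__ == "__main__":
    print(wal_segment_name("11C/B2211778"))
```

The resulting name can then be passed to pg_xlogdump as the starting segment, adjusted for the real timeline.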

Regards,
Hari Babu
Fujitsu Australia

#4Torsten Förtsch
torsten.foertsch@gmx.net
In reply to: Torsten Förtsch (#2)
Re: How to continue streaming replication after this error?

On 22/02/14 03:21, Torsten Förtsch wrote:

2014-02-21 05:17:10 UTC PANIC: heap2_redo: unknown op code 32
2014-02-21 05:17:10 UTC CONTEXT: xlog redo UNKNOWN
2014-02-21 05:17:11 UTC LOG: startup process (PID 1060) was terminated
by signal 6: Aborted
2014-02-21 05:17:11 UTC LOG: terminating any other active server processes
2014-02-21 05:17:11 UTC WARNING: terminating connection because of
crash of another server process
2014-02-21 05:17:11 UTC DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2014-02-21 05:17:11 UTC HINT: In a moment you should be able to
reconnect to the database and repeat your command.

Any idea what that means?

Updating the replica to 9.3.3 cured it. The master was already on 9.3.3.

Torsten


#5Michael Paquier
michael@paquier.xyz
In reply to: Torsten Förtsch (#4)
Re: How to continue streaming replication after this error?

On Mon, Feb 24, 2014 at 12:23 PM, Torsten Förtsch
<torsten.foertsch@gmx.net> wrote:

On 22/02/14 03:21, Torsten Förtsch wrote:

Any idea what that means?

Updating the replica to 9.3.3 cured it. The master was already on 9.3.3.

9.3.3 has introduced some new configuration parameters. So you actually
need to update a slave before the master, or replication breaks.
--
Michael
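
As a small illustration of the upgrade-order advice in this thread, the sketch below flags a standby that runs an older minor release than its master, which was the situation here (master on 9.3.3, replica on 9.3.2). The function names are hypothetical, and the version strings are assumed to be plain dotted numbers such as those reported by the server_version setting.

```python
def parse_version(version):
    """Turn a dotted version string like '9.3.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def standby_is_behind(master_version, standby_version):
    """True if the standby runs an older release than the master."""
    return parse_version(standby_version) < parse_version(master_version)


if __name__ == "__main__":
    # Versions from this thread: master 9.3.3, replica 9.3.2.
    if standby_is_behind("9.3.3", "9.3.2"):
        print("Standby is older than master: upgrade the standby first.")
```

Comparing tuples rather than raw strings avoids mistakes such as "9.3.10" sorting before "9.3.2".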