How to continue streaming replication after this error?
Hi,
one of our streaming replicas died with
2014-02-21 05:17:10 UTC PANIC: heap2_redo: unknown op code 32
2014-02-21 05:17:10 UTC CONTEXT: xlog redo UNKNOWN
2014-02-21 05:17:11 UTC LOG: startup process (PID 1060) was terminated by signal 6: Aborted
2014-02-21 05:17:11 UTC LOG: terminating any other active server processes
2014-02-21 05:17:11 UTC WARNING: terminating connection because of crash of another server process
2014-02-21 05:17:11 UTC DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2014-02-21 05:17:11 UTC HINT: In a moment you should be able to reconnect to the database and repeat your command.
Now, if I try to restart it, I get this:
The PostgreSQL server failed to start. Please check the log output:
2014-02-21 07:42:53 UTC LOG: database system was interrupted while in recovery at log time 2014-02-21 05:02:45 UTC
2014-02-21 07:42:53 UTC HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2014-02-21 07:42:53 UTC LOG: incomplete startup packet
2014-02-21 07:42:53 UTC LOG: entering standby mode
2014-02-21 07:42:53 UTC LOG: redo starts at 11C/B2211778
2014-02-21 07:42:53 UTC FATAL: the database system is starting up
2014-02-21 07:42:54 UTC LOG: consistent recovery state reached at 11C/B4234108
2014-02-21 07:42:54 UTC LOG: database system is ready to accept read only connections
2014-02-21 07:42:54 UTC PANIC: heap2_redo: unknown op code 32
2014-02-21 07:42:54 UTC CONTEXT: xlog redo UNKNOWN
2014-02-21 07:42:54 UTC LOG: startup process (PID 38187) was terminated by signal 6: Aborted
2014-02-21 07:42:54 UTC LOG: terminating any other active server processes
This is 9.3.2. What is the proper way to continue replication? Or do I
need to start from a fresh base backup?
Thanks,
Torsten
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 21/02/14 09:17, Torsten Förtsch wrote:
one of our streaming replicas died with
2014-02-21 05:17:10 UTC PANIC: heap2_redo: unknown op code 32
[...]
Any idea what that means?
I have got a second replica dying with the same symptoms.
Thanks,
Torsten
On Sat, Feb 22, 2014 at 1:21 PM, Torsten Förtsch <torsten.foertsch@gmx.net> wrote:
On 21/02/14 09:17, Torsten Förtsch wrote:
one of our streaming replicas died with
2014-02-21 05:17:10 UTC PANIC: heap2_redo: unknown op code 32
[...]
Any idea what that means?
I have got a second replica dying with the same symptoms.
The XLOG record seems to be corrupted. Op code 32 represents
XLOG_HEAP2_FREEZE_PAGE, and the code to handle it exists, so I don't know
why the system is not able to recognize the op code. Can you try running
pg_xlogdump on the corrupted WAL file?
Keep the data directory for problem investigation. As this seems to be some
kind of corruption, you need to take a fresh base backup to continue.
Regards,
Hari Babu
Fujitsu Australia
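Running pg_xlogdump, as Hari suggests, requires knowing which WAL segment file holds the failing record. The following Python sketch (not part of the original thread) maps an LSN from the log, such as the redo start point 11C/B2211778, to a segment file name. It assumes the default 16 MB segment size and timeline 1; both are assumptions to verify on the actual cluster (for example with pg_controldata or by listing pg_xlog).

```python
# Sketch: map an LSN such as "11C/B2211778" to the WAL segment file name
# that contains it, so that file can be passed to pg_xlogdump.
# ASSUMPTIONS (not from the thread): default 16 MB segments, timeline 1.

WAL_SEG_SIZE = 16 * 1024 * 1024                     # assumed XLOG_SEG_SIZE
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE   # 256 with 16 MB segments


def parse_lsn(text: str) -> int:
    """Convert the 'XXXXXXXX/YYYYYYYY' form used in log messages to an int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(lsn_text: str, timeline: int = 1) -> str:
    """Return the 24-hex-digit name: timeline, log id, segment within log id."""
    segno = parse_lsn(lsn_text) // WAL_SEG_SIZE
    log_id = segno // SEGMENTS_PER_XLOGID
    seg = segno % SEGMENTS_PER_XLOGID
    return f"{timeline:08X}{log_id:08X}{seg:08X}"


if __name__ == "__main__":
    # LSNs taken from the restart log earlier in this thread.
    print(wal_file_name("11C/B2211778"))  # 000000010000011C000000B2
    print(wal_file_name("11C/B4234108"))  # 000000010000011C000000B4
```

If the cluster has switched timelines or was built with a non-default segment size, the `timeline` argument and `WAL_SEG_SIZE` must be adjusted accordingly.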
On 22/02/14 03:21, Torsten Förtsch wrote:
2014-02-21 05:17:10 UTC PANIC: heap2_redo: unknown op code 32
[...]
Any idea what that means?
Updating the replica to 9.3.3 cured it. The master was already on 9.3.3.
Torsten
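The fix is consistent with Hari's reading of op code 32: 32 is 0x20, which a 9.3.3 server decodes as XLOG_HEAP2_FREEZE_PAGE, while a 9.3.2 server apparently has no record type in that slot. A minimal Python sketch of the heap2 redo dispatch illustrates the difference. The opcode tables are assumed from the 9.3-branch heapam_xlog.h and are not stated in the thread, so check them against the source of the exact release you run.

```python
# Sketch of why a 9.3.2 standby PANICs on op code 32 while 9.3.3 replays it.
# ASSUMPTION: constants mirror the 9.3-branch heapam_xlog.h; verify them
# against the source of the release you actually run.

XLOG_HEAP_OPMASK = 0x70  # bits selecting the operation; 0x80 is a flag bit

HEAP2_OPS_9_3_2 = {
    0x00: "XLOG_HEAP2_FREEZE",
    0x10: "XLOG_HEAP2_CLEAN",
    # 0x20 assumed unused in 9.3.2
    0x30: "XLOG_HEAP2_CLEANUP_INFO",
    0x40: "XLOG_HEAP2_VISIBLE",
    0x50: "XLOG_HEAP2_MULTI_INSERT",
    0x60: "XLOG_HEAP2_LOCK_UPDATED",
    0x70: "XLOG_HEAP2_NEW_CID",
}

# 9.3.3 is assumed to reuse the free 0x20 slot for the new freeze record.
HEAP2_OPS_9_3_3 = {**HEAP2_OPS_9_3_2, 0x20: "XLOG_HEAP2_FREEZE_PAGE"}


def heap2_redo(info: int, known_ops: dict) -> str:
    """Mimic the dispatch: mask the op bits, fail on anything unknown."""
    op = info & XLOG_HEAP_OPMASK
    if op not in known_ops:
        raise RuntimeError(f"heap2_redo: unknown op code {info}")
    return known_ops[op]


if __name__ == "__main__":
    print(heap2_redo(32, HEAP2_OPS_9_3_3))  # XLOG_HEAP2_FREEZE_PAGE
    try:
        heap2_redo(32, HEAP2_OPS_9_3_2)
    except RuntimeError as err:
        print("9.3.2:", err)  # 9.3.2: heap2_redo: unknown op code 32
```

The real server raises a PANIC rather than an exception, which is why the startup process aborts with signal 6 in the logs above.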
On Mon, Feb 24, 2014 at 12:23 PM, Torsten Förtsch <torsten.foertsch@gmx.net> wrote:
On 22/02/14 03:21, Torsten Förtsch wrote:
Any idea what that means?
Updating the replica to 9.3.3 cured it. The master was already on 9.3.3.
9.3.3 has introduced a new WAL record type (XLOG_HEAP2_FREEZE_PAGE) that
older 9.3 releases cannot replay. So you need to update the slaves before
the master, or replication breaks.
--
Michael
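Michael's rule can be turned into a simple pre-flight check for rolling minor-version updates. This Python sketch is illustrative only: the node names (replica1, replica2, master) are hypothetical, version strings are assumed to be pre-10 three-part versions compared as server_version_num-style integers (9.3.2 becomes 90302), and it encodes just the standbys-before-primary ordering from this thread rather than a complete upgrade procedure.

```python
# Sketch of the ordering rule from this thread: a standby older than the
# primary may receive WAL records it cannot replay, so standbys are updated
# first and the primary last. Node names are HYPOTHETICAL examples.

from typing import Dict, List


def version_num(version: str) -> int:
    """Turn a pre-10 version string like '9.3.3' into 90303."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major * 10000 + minor * 100 + patch


def unsafe_standbys(primary: str, standbys: Dict[str, str]) -> List[str]:
    """Return standbys running an older release than the primary."""
    p = version_num(primary)
    return sorted(name for name, v in standbys.items() if version_num(v) < p)


def upgrade_order(primary_name: str, standby_names: List[str]) -> List[str]:
    """Standbys first (in a stable order), primary last."""
    return sorted(standby_names) + [primary_name]


if __name__ == "__main__":
    standbys = {"replica1": "9.3.2", "replica2": "9.3.2"}
    print(unsafe_standbys("9.3.3", standbys))       # ['replica1', 'replica2']
    print(upgrade_order("master", list(standbys)))  # ['replica1', 'replica2', 'master']
```

A non-empty result from `unsafe_standbys` corresponds to the situation in this thread: a 9.3.2 replica following a 9.3.3 master.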