FATAL: SMgrRelation hashtable corrupted

Started by Daulat Ram, almost 7 years ago · 5 messages · general
#1 Daulat Ram
Daulat.Ram@exponential.com

Hello team

I need your help on this issue.

My Postgres 11.2 container is not starting due to the error message below. It is in a streaming replication environment.

2019-05-17 06:41:08.989 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2019-05-17 06:41:09.093 UTC [11] LOG: database system was interrupted while in recovery at 2019-05-17 06:40:24 UTC
2019-05-17 06:41:09.093 UTC [11] HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
2019-05-17 06:41:11.260 UTC [12] FATAL: the database system is starting up
2019-05-17 06:41:11.673 UTC [13] FATAL: the database system is starting up
2019-05-17 06:41:12.209 UTC [14] FATAL: the database system is starting up
2019-05-17 06:41:12.427 UTC [15] FATAL: the database system is starting up
2019-05-17 06:41:15.425 UTC [16] FATAL: the database system is starting up
2019-05-17 06:41:15.680 UTC [17] FATAL: the database system is starting up
2019-05-17 06:41:16.059 UTC [18] FATAL: the database system is starting up
2019-05-17 06:41:16.263 UTC [19] FATAL: the database system is starting up
2019-05-17 06:41:16.624 UTC [20] FATAL: the database system is starting up
2019-05-17 06:41:17.471 UTC [21] FATAL: the database system is starting up
2019-05-17 06:41:18.739 UTC [22] FATAL: the database system is starting up
2019-05-17 06:41:19.877 UTC [11] LOG: database system was not properly shut down; automatic recovery in progress
2019-05-17 06:41:19.887 UTC [11] LOG: redo starts at 5E/170349E8
2019-05-17 06:41:19.954 UTC [11] FATAL: SMgrRelation hashtable corrupted
2019-05-17 06:41:19.954 UTC [11] CONTEXT: WAL redo at 5E/17061648 for Transaction/COMMIT: 2019-05-17 06:39:46.902988+00; rels: base/59265/105367 base/59265/105349 base/59265/105365 base/59265/105362 base/59265/105360 base/59265/105349 base/59265/105358 base/59265/105355; inval msgs: catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 relcache 105365 relcache 105367 relcache 105367 relcache 105293 relcache 105411 relcache 105411 relcache 105365 relcache 105293 relcache 105358 relcache 105360 relcache 105360 relcache 105285 relcache 105413 relcache 105413 relcache 105358 relcache 105285
2019-05-17 06:41:19.955 UTC [1] LOG: startup process (PID 11) exited with exit code 1
2019-05-17 06:41:19.955 UTC [1] LOG: aborting startup due to startup process failure
2019-05-17 06:41:19.961 UTC [1] LOG: database system is shut down

Regards,
Daulat
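
One detail that is easy to miss in the CONTEXT line above is that base/59265/105349 appears twice in the "rels:" list, which is the set of relation files the commit record asks to drop. Whether that duplicate is what trips the check is not confirmed in this thread. The following is a minimal Python sketch for pulling the LSN and any repeated relation paths out of such a line. The function names are made up for illustration, and the "inval msgs" part of the line is shortened.

```python
import re
from collections import Counter

# CONTEXT line from the startup log, shortened after "inval msgs:".
CONTEXT = (
    "WAL redo at 5E/17061648 for Transaction/COMMIT: "
    "2019-05-17 06:39:46.902988+00; rels: base/59265/105367 "
    "base/59265/105349 base/59265/105365 base/59265/105362 "
    "base/59265/105360 base/59265/105349 base/59265/105358 "
    "base/59265/105355; inval msgs: ..."
)


def parse_commit_context(line):
    """Return (lsn, relation paths) from a COMMIT redo CONTEXT line."""
    lsn = re.search(r"WAL redo at ([0-9A-F]+/[0-9A-F]+)", line).group(1)
    rels_part = re.search(r"rels: ([^;]*);", line).group(1)
    return lsn, rels_part.split()


def duplicated(paths):
    """Return the paths that occur more than once, in first-seen order."""
    counts = Counter(paths)
    return [p for p in dict.fromkeys(paths) if counts[p] > 1]


lsn, rels = parse_commit_context(CONTEXT)
print(lsn)               # 5E/17061648
print(duplicated(rels))  # ['base/59265/105349']
```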

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daulat Ram (#1)
Re: FATAL: SMgrRelation hashtable corrupted

Daulat Ram <Daulat.Ram@exponential.com> writes:

My Postgres 11.2 container is not starting due to the error message below. It is in a streaming replication environment.

2019-05-17 06:41:19.954 UTC [11] FATAL: SMgrRelation hashtable corrupted

Yes, this is probably the same issue reported in

/messages/by-id/15672-b9fa7db32698269f@postgresql.org

/messages/by-id/15684-4ef33de3271cf929@postgresql.org

The good news is that the underlying ALTER TABLE bug is fixed in 11.3.
The bad news is that your database is probably toast anyway --- an update
won't undo the catalog corruption that is causing the WAL replay crash.
I hope you have a recent backup to restore from.

regards, tom lane

#3 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#2)
Re: FATAL: SMgrRelation hashtable corrupted

Hi,

On 2019-05-17 09:30:05 -0400, Tom Lane wrote:

The good news is that the underlying ALTER TABLE bug is fixed in 11.3.
The bad news is that your database is probably toast anyway --- an update
won't undo the catalog corruption that is causing the WAL replay crash.
I hope you have a recent backup to restore from.

Should there not be a backup, couldn't one weaken the error checks during
replay a bit (locally) to allow replay to progress? The indexes will be
toast, but it ought to allow recovering the table data completely.

Greetings,

Andres Freund

#4 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: FATAL: SMgrRelation hashtable corrupted

On 2019-May-17, Tom Lane wrote:

The good news is that the underlying ALTER TABLE bug is fixed in 11.3.
The bad news is that your database is probably toast anyway --- an update
won't undo the catalog corruption that is causing the WAL replay crash.
I hope you have a recent backup to restore from.

Hmm, shouldn't it be possible to do a PITR restore to the point just
before the problem record, ie. 5E/17061648?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
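
For readers who want to try the PITR idea above: in PostgreSQL 11 the recovery target settings go in recovery.conf inside a restored base backup. The sketch below is a small Python helper that renders such a file. It is an illustration only, under stated assumptions: the helper name and the archive directory /mnt/wal_archive are hypothetical, and the cp-based restore_command assumes a plain file archive. The parameter names themselves (restore_command, recovery_target_lsn, recovery_target_inclusive, recovery_target_action) are standard PostgreSQL 11 settings.

```python
def recovery_conf_before_lsn(lsn, archive_dir):
    """Render recovery.conf text that stops replay just before `lsn`."""
    settings = [
        # Fetches archived WAL; the server expands %f and %p itself.
        ("restore_command", f'cp {archive_dir}/%f "%p"'),
        ("recovery_target_lsn", lsn),
        # 'false' stops just before the target record, not just after it.
        ("recovery_target_inclusive", "false"),
        # Pause at the target so the state can be checked before promoting.
        ("recovery_target_action", "pause"),
    ]
    return "".join(f"{name} = '{value}'\n" for name, value in settings)


# Hypothetical archive location; the LSN is the failing record from the log.
print(recovery_conf_before_lsn("5E/17061648", "/mnt/wal_archive"), end="")
```

This only helps when a base backup older than the problem record exists, together with the WAL needed to roll it forward.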

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#4)
Re: FATAL: SMgrRelation hashtable corrupted

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2019-May-17, Tom Lane wrote:

The good news is that the underlying ALTER TABLE bug is fixed in 11.3.
The bad news is that your database is probably toast anyway --- an update
won't undo the catalog corruption that is causing the WAL replay crash.
I hope you have a recent backup to restore from.

Hmm, shouldn't it be possible to do a PITR restore to the point just
before the problem record, ie. 5E/17061648?

If he's got the necessary WAL archives :-(

A dump and restore would be advisable afterwards in any case, since
the catalog corruption would still be there.

regards, tom lane
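
On the question of whether the necessary WAL archives exist, it can help to know which segment file holds a given LSN. The Python sketch below derives the file name using the standard WAL naming scheme. Timeline 1 and the default 16 MB segment size are assumptions, since the thread states neither, and the function name is made up for illustration.

```python
def wal_segment_name(lsn, timeline=1, segment_size=16 * 1024 * 1024):
    """Return the name of the WAL segment file containing `lsn`."""
    high, low = (int(part, 16) for part in lsn.split("/"))
    position = (high << 32) | low            # LSN as a 64-bit byte position
    segment_number = position // segment_size
    segments_per_id = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_id,
        segment_number % segments_per_id,
    )


# Both LSNs are taken from the startup log in the first message.
print(wal_segment_name("5E/170349E8"))  # redo start
print(wal_segment_name("5E/17061648"))  # failing COMMIT record
```

Under those assumptions both LSNs fall in segment 000000010000005E00000017. A PITR run would need every segment from the start of the base backup up to that one.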