Resetting undo/redo logs

Started by Edmon Begoli almost 14 years ago | 3 messages | hackers

ebegoli@gmail.com

almost 14 years ago

I have an issue on Greenplum, which is an MPP hybrid built from
Postgres 8.2, and the issue I am seeing comes 100% from pg code.

One of the Greenplum segments went down and cannot recover because of
"PANIC XX000 invalid redo/undo record in shutdown checkpoint
(xlog.c:6576)".

I am posting this question here because most casual users of
Postgres/Greenplum are telling me that the database is hosed, but I think
that with pg_resetxlog
(http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html) and some
data loss I could at least "hack" the database into coming back up.

What I am asking for here is help calculating the reset values: where to
find the most recent valid ones and how, *specifically*, to calculate the
reset ones.

Please advise,
Edmon

2012-06-12 13:16:18.614912
EDT p14611 th802662304 0 seg-1 LOG 0 mirror transition,
primary address(port) 'boxgp10a(41001)' mirror address(port)
'boxgp02a(51001)' mirroring role 'primary role' mirroring state
'change tracking' segment state 'not initialized' process name(pid)
'filerep main process(14611)' filerep state 'not initialized'
0 cdbfilerep.c 3371
2012-06-12 13:16:18.617047
EDT p14612 th802662304 0 seg-1 LOG 0 CHANGETRACKING:
ChangeTracking_RetrieveIsTransitionToInsync() found
insync_transition_completed:'false' full
resync:'false' 0 cdbresynchronizechangetracking.c 2522
2012-06-12 13:16:18.617113
EDT p14612 th802662304 0 seg-1 LOG 0 CHANGETRACKING:
ChangeTracking_RetrieveIsTransitionToResync() found
resync_transition_completed:'false' full
resync:'false' 0 cdbresynchronizechangetracking.c 2559
2012-06-12 13:16:18.746870
EDT p14612 th802662304 0 seg-1 LOG 0 searching for last
checkpoint location for creating the initial resynchronize
changetracking 0 xlog.c 10836
2012-06-12 13:16:18.747318
EDT p14612 th802662304 0 seg-1 LOG 0 record with zero
length at 14/48000070 0 xlog.c 4182
2012-06-12 13:16:18.747491
EDT p14612 th802662304 0 seg-1 LOG 0 scanned through 1
initial xlog records since last checkpoint for writing into the
resynchronize change log 0 cdbresynchronizechangetracking.c 206
2012-06-12 13:16:18.750830
EDT p14624 th802662304 0 seg-1 LOG 0 database system was
shut down at 2012-06-12 11:00:13 EDT 0 xlog.c 6326
2012-06-12 13:16:18.750987
EDT p14624 th802662304 0 seg-1 LOG 0 checkpoint record is
at 14/48000020 0 xlog.c 6425
2012-06-12 13:16:18.751016
EDT p14624 th802662304 0 seg-1 LOG 0 redo record is at
14/48000020; undo record is at 14/42AC2118; shutdown
TRUE 0 xlog.c 6534
2012-06-12 13:16:18.751041
EDT p14624 th802662304 0 seg-1 LOG 0 next transaction ID:
0/4553423; next OID: 241771 0 xlog.c 6538
2012-06-12 13:16:18.751065
EDT p14624 th802662304 0 seg-1 LOG 0 next MultiXactId: 271;
next MultiXactOffset: 549 0 xlog.c 6541
2012-06-12 13:16:18.796637
EDT p14624 th802662304 0 seg-1 PANIC XX000 invalid
redo/undo record in shutdown checkpoint
(xlog.c:6576) 0 xlog.c 6576 "Stack trace:
1 0xa59f75 postgres errstart + 0x595
2 0x50f7ac postgres StartupXLOG + 0x1b8c
3 0x51778d postgres StartupProcessMain + 0x2fd
4 0x590746 postgres AuxiliaryProcessMain + 0x796
5 0x85fe54 postgres <symbol not found> + 0x85fe54
6 0x86003a postgres StartMasterOrPrimaryPostmasterProcesses + 0x3a
7 0x86ffaf postgres doRequestedPrimaryMirrorModeTransitions + 0xd9f
8 0x86bc4a postgres PostmasterMain + 0x1f8a
9 0x772bda postgres main + 0x4da
10 0x2af72ebc7994 libc.so.6 __libc_start_main + 0xf4
11 0x47bf49 postgres <symbol not found> + 0x47bf49
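
The pointers reported in the log above can be compared directly. The sketch below is illustrative only: it assumes the pre-9.3 "xlogid/xrecoff" notation, in which both halves are hexadecimal, and `parse_lsn` is a hypothetical helper, not something from Postgres or Greenplum. It shows that the redo pointer equals the checkpoint location while the undo pointer lies behind it. One plausible reading, not confirmed anywhere in this thread, is that the startup check at xlog.c:6576 rejects a shutdown checkpoint whose redo or undo pointer precedes the checkpoint record.

```python
def parse_lsn(text):
    """Convert an 'xlogid/xrecoff' hex pair into one comparable integer.

    Hypothetical helper: the high half is the log file ID, the low half
    is the byte offset within it, so integer order matches WAL order.
    """
    xlogid, xrecoff = text.split("/")
    return (int(xlogid, 16) << 32) | int(xrecoff, 16)


# Values copied from the log lines above.
checkpoint = parse_lsn("14/48000020")
redo = parse_lsn("14/48000020")
undo = parse_lsn("14/42AC2118")

print("redo equals checkpoint location:", redo == checkpoint)
print("undo is behind checkpoint location:", undo < checkpoint)
# Both pointers share log file ID 0x14, so the difference is a byte distance.
print("undo lags checkpoint by", checkpoint - undo, "bytes")
```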

Tom Lane

tgl@sss.pgh.pa.us

almost 14 years ago

In reply to: Edmon Begoli (#1)

Re: Resetting undo/redo logs

Edmon Begoli <ebegoli@gmail.com> writes:

One of the Greenplum segments went down and cannot recover because of
"PANIC XX000 invalid redo/undo record in shutdown checkpoint
(xlog.c:6576)".

I am posting this question here because most casual users of
Postgres/Greenplum are telling me that the database is hosed, but I think
that with pg_resetxlog
(http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html) and some
data loss I could at least "hack" the database into coming back up.

What I am asking for here is help calculating the reset values: where to
find the most recent valid ones and how, *specifically*, to calculate the
reset ones.

pg_controldata should give you useful starting points. I don't think we
can offer any more help than what is on the pg_resetxlog reference page
as to what to do with them. (Though you might try reading the more
recent releases' versions of that page to see if anything's been
clarified.)

regards, tom lane
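
For the "how to calculate" part of the question, the pg_resetxlog reference page gives a rule of thumb for the counters: take the numerically largest hexadecimal file name in the relevant directory, add one, and multiply by a fixed factor. The sketch below is a minimal illustration of that arithmetic for stock PostgreSQL; whether the same multipliers apply unchanged to Greenplum is an assumption. The helper names and the directory listings are hypothetical, chosen only so the example runs without a data directory.

```python
import string

XIDS_PER_CLOG_FILE = 1048576        # multiplier the reference page gives for -x
ENTRIES_PER_MULTIXACT_FILE = 65536  # multiplier given for -m and -O (pre-9.3 pages)


def largest_hex_name(names):
    """Return the numerically largest hexadecimal file name as an integer."""
    hex_names = [n for n in names if n and all(c in string.hexdigits for c in n)]
    if not hex_names:
        raise ValueError("no hexadecimal segment file names found")
    return max(int(n, 16) for n in hex_names)


def safe_value(names, per_file):
    """Largest file name, plus one, times the per-file multiplier."""
    return (largest_hex_name(names) + 1) * per_file


# Hypothetical listings; on a real system these would come from os.listdir()
# on pg_clog, pg_multixact/offsets and pg_multixact/members.
clog_files = ["0000", "0001", "0002", "0003", "0004"]
offsets_files = ["0000"]
members_files = ["0000"]

print("-x", safe_value(clog_files, XIDS_PER_CLOG_FILE))
print("-m", safe_value(offsets_files, ENTRIES_PER_MULTIXACT_FILE))
print("-O", safe_value(members_files, ENTRIES_PER_MULTIXACT_FILE))
```

The results are only candidate values. They should be cross-checked against what pg_controldata reports, and pg_resetxlog's -n option can be used to display the values it would write without modifying anything.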

Edmon Begoli

ebegoli@gmail.com

almost 14 years ago

In reply to: Tom Lane (#2)

Re: Resetting undo/redo logs

Thanks. I was going down this route, so just your confirmation that
this is the right path is helpful.

Edmon

On Thu, Jun 21, 2012 at 11:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

pg_controldata should give you useful starting points. I don't think we
can offer any more help than what is on the pg_resetxlog reference page
as to what to do with them. (Though you might try reading the more
recent releases' versions of that page to see if anything's been
clarified.)

regards, tom lane