Notice and shared memory corruption
I get the following on untuned Linux (Red Hat 6.2) using the stock 7.0.2
RPMs:
NOTICE: RegisterSharedInvalid: SI buffer overflow
NOTICE: InvalidateSharedInvalid: cache state reset
Actually I get many of them ;(
I'm running a script that does a bunch of mixed INSERTS, UPDATES,
DELETES and SELECTS.
After getting that, I'm unable to vacuum the database until I reboot the OS.
Where/how should I start looking (or is this a known problem)?
Are there any simple workarounds to stop it from happening?
-----------
Hannu
Hannu Krosing <hannu@tm.ee> writes:
I get the following on untuned Linux (Red Hat 6.2) using the stock 7.0.2
RPMs:
NOTICE: RegisterSharedInvalid: SI buffer overflow
NOTICE: InvalidateSharedInvalid: cache state reset
Actually I get many of them ;(
AFAIK, these are just noise in 7.0. The only reason you see them is
we haven't got round to removing the messages or downgrading them to
elog(DEBUG).
I'm running a script that does a bunch of mixed INSERTS, UPDATES,
DELETES and SELECTS.
I'll bet you also have some backends sitting idle with open
transactions? The combination of idle and active backends is what
usually provokes SI overruns.
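To make the overrun concrete, here is a toy C model of a shared-invalidation (SI) queue. It uses a fixed-size ring of messages and keeps one read position per backend. This is an illustrative sketch, not the code in PostgreSQL's sinval modules. The names `si_queue`, `si_insert` and `si_read`, the tiny queue size and the two-backend limit are all hypothetical. The model assumes that an idle backend does not read the queue while an active one keeps writing to it, which is one way to read the explanation above. Under that assumption the writer eventually finds no free slot. It then discards the backlog and tells every backend to reset its cache.

```c
#include <stdbool.h>
#include <stddef.h>

#define SI_QUEUE_SIZE 4   /* hypothetical, deliberately tiny */
#define MAX_BACKENDS  2   /* hypothetical */

/* Toy shared-invalidation queue: a ring buffer with one reader per backend. */
struct si_queue {
    int    msgs[SI_QUEUE_SIZE];          /* invalidation messages */
    size_t max_msg;                      /* number of messages ever inserted */
    size_t next_msg[MAX_BACKENDS];       /* next message each backend will read */
    bool   reset_pending[MAX_BACKENDS];  /* backend must flush its whole cache */
};

/* Position of the oldest message that some backend has not read yet. */
static size_t si_min_read_pos(const struct si_queue *q)
{
    size_t min = q->next_msg[0];
    for (int b = 1; b < MAX_BACKENDS; b++)
        if (q->next_msg[b] < min)
            min = q->next_msg[b];
    return min;
}

/*
 * Append a message. Returns false on overflow: the slowest reader is a full
 * queue behind, so instead of waiting for it we drop the backlog and mark
 * every backend for a full cache reset. The new message is not queued,
 * because a reset invalidates everything anyway.
 */
bool si_insert(struct si_queue *q, int msg)
{
    if (q->max_msg - si_min_read_pos(q) >= SI_QUEUE_SIZE) {
        for (int b = 0; b < MAX_BACKENDS; b++) {
            q->reset_pending[b] = true;
            q->next_msg[b] = q->max_msg;   /* backlog discarded */
        }
        return false;
    }
    q->msgs[q->max_msg % SI_QUEUE_SIZE] = msg;
    q->max_msg++;
    return true;
}

/*
 * Called when a backend gets around to processing invalidations.
 * Returns the number of messages consumed, or -1 if the backend
 * must throw away its entire cache instead.
 */
int si_read(struct si_queue *q, int backend)
{
    if (q->reset_pending[backend]) {
        q->reset_pending[backend] = false;
        return -1;
    }
    int n = (int)(q->max_msg - q->next_msg[backend]);
    /* A real reader would invalidate the cache entry named by each message. */
    q->next_msg[backend] = q->max_msg;
    return n;
}
```

Suppose backend 0 reads after every insert and backend 1 never reads. The first four inserts then succeed and the fifth reports overflow. After that, both backends see -1 from `si_read`. The design trades a stalled writer for a full cache rebuild in every backend. That appears to be what the "cache state reset" NOTICE reports.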
After getting that, I'm unable to vacuum the database until I reboot the OS.
Define your terms more carefully, please. What do you mean by
"unable to vacuum" --- what happens *exactly*? In any case,
surely it doesn't take an OS reboot to recover. I might believe
you need to restart the postmaster...
regards, tom lane
Tom Lane wrote:
Hannu Krosing <hannu@tm.ee> writes:
I get the following on untuned Linux (Red Hat 6.2) using the stock 7.0.2
RPMs:
NOTICE: RegisterSharedInvalid: SI buffer overflow
NOTICE: InvalidateSharedInvalid: cache state reset
Actually I get many of them ;(
AFAIK, these are just noise in 7.0. The only reason you see them is
we haven't got round to removing the messages or downgrading them to
elog(DEBUG).
I'm running a script that does a bunch of mixed INSERTS, UPDATES,
DELETES and SELECTS.
I'll bet you also have some backends sitting idle with open
transactions? The combination of idle and active backends is what
usually provokes SI overruns.
After getting that, I'm unable to vacuum the database until I reboot the OS.
Define your terms more carefully, please. What do you mean by
"unable to vacuum" --- what happens *exactly*?
NOTICE: FlushRelationBuffers(access_right, 2009): block 1944 is
referenced (private 0, global 2)
FATAL 1: VACUUM (vc_repair_frag): FlushRelationBuffers returned -2
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
In any case,
surely it doesn't take an OS reboot to recover. I might believe
you need to restart the postmaster...
On one machine a simple restart worked.
Maybe I really have to restart it by running
killall -9 /usr/bin/postgres (instead of doing
/etc/rc.d/init.d/postgresql restart).
I was quite sure that just restarting it did not help, but maybe
it did not really restart and only claimed to.
On the other machine I still get:
amphora2=# vacuum;
NOTICE: FlushRelationBuffers(item, 30): block 2 is referenced (private
0, global 1)
FATAL 1: VACUUM (vc_repair_frag): FlushRelationBuffers returned -2
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
even after stopping the postmaster (and checking that it had stopped).
I was able to vacuum only after restarting the whole machine...
OTOH, it _may_ be that someone started another backend right after the
restart and did something, but must this be a FATAL error?
-----------
Hannu
Hannu Krosing <hannu@tm.ee> writes:
Define your terms more carefully, please. What do you mean by
"unable to vacuum" --- what happens *exactly*?
NOTICE: FlushRelationBuffers(access_right, 2009): block 1944 is
referenced (private 0, global 2)
FATAL 1: VACUUM (vc_repair_frag): FlushRelationBuffers returned -2
Oh, that's interesting. This error indicates that some prior
transaction neglected to release a reference count on a shared buffer.
We have seen sporadic reports of this problem in 7.0, but so far no
one has come up with a reproducible example. If you can boil down
your script to something that reproducibly causes the problem then
that'd be a great help in tracking it down.
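The leak Tom describes can be pictured with a minimal model of a shared buffer pool. In this model every pin increments both a per-backend ("private") and a pool-wide ("global") reference count, mirroring the two numbers in the NOTICE. The function names, the pool size and everything about the return convention beyond the -2 seen in the log are assumptions for illustration. None of it is PostgreSQL's bufmgr API. A backend that pins a block and never unpins it leaves the global count above zero, so a later flush of that relation has to refuse.

```c
#include <stdio.h>

#define NBUFFERS  8   /* hypothetical pool size */
#define NBACKENDS 2   /* hypothetical */

struct buffer_desc {
    int rel;              /* owning relation id, or -1 if the slot is free */
    int block;            /* block number within that relation */
    int global_refcount;  /* pins held by all backends combined */
};

static struct buffer_desc pool[NBUFFERS];
static int private_refcount[NBACKENDS][NBUFFERS];  /* pins held by each backend */

void pool_init(void)
{
    for (int i = 0; i < NBUFFERS; i++) {
        pool[i].rel = -1;
        pool[i].block = 0;
        pool[i].global_refcount = 0;
        for (int b = 0; b < NBACKENDS; b++)
            private_refcount[b][i] = 0;
    }
}

/* Pin (rel, block) for a backend; returns the buffer index, or -1 if the pool is full. */
int pin_buffer(int backend, int rel, int block)
{
    int slot = -1;
    for (int i = 0; i < NBUFFERS; i++) {
        if (pool[i].rel == rel && pool[i].block == block) {
            slot = i;                 /* block already cached: reuse it */
            break;
        }
        if (slot < 0 && pool[i].rel == -1)
            slot = i;                 /* remember the first free slot */
    }
    if (slot < 0)
        return -1;
    pool[slot].rel = rel;
    pool[slot].block = block;
    pool[slot].global_refcount++;
    private_refcount[backend][slot]++;
    return slot;
}

void unpin_buffer(int backend, int buf)
{
    private_refcount[backend][buf]--;
    pool[buf].global_refcount--;
}

/*
 * Drop every buffer of `rel`. Returns 0 on success, or -2 if some block
 * is still pinned by any backend.
 */
int flush_relation_buffers(int backend, int rel)
{
    for (int i = 0; i < NBUFFERS; i++) {
        if (pool[i].rel == rel && pool[i].global_refcount != 0) {
            printf("NOTICE: block %d is referenced (private %d, global %d)\n",
                   pool[i].block, private_refcount[backend][i],
                   pool[i].global_refcount);
            return -2;
        }
    }
    for (int i = 0; i < NBUFFERS; i++)
        if (pool[i].rel == rel)
            pool[i].rel = -1;   /* a real pool would write dirty pages first */
    return 0;
}
```

Suppose backend 1 pins block 2 of relation 30 and never unpins it, and backend 0 then calls `flush_relation_buffers(0, 30)`. The flush prints "block 2 is referenced (private 0, global 1)" and returns -2, the same shape as the second report in this thread.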
If you have clients that sometimes disconnect in the middle of a
transaction, it might help to apply the attached patch.
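The attached patch replaces the scattered proc_exit(0) calls in the backend's main loop with jumps to two labels, normalexit and errorexit. As a result, a lost connection passes through AbortOutOfAnyTransaction() and ProcReleaseLocks() before the process exits. The toy C program below sketches that single-exit pattern. The struct, the one-letter command protocol and the function names are hypothetical stand-ins, and "aborting" is modelled as simply zeroing counters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-backend state standing in for the real backend's resources. */
struct backend {
    int  open_transactions;  /* transactions begun but not yet ended */
    int  held_locks;         /* locks taken by those transactions */
    bool connection_open;
};

/* Stand-in for AbortOutOfAnyTransaction(): roll back and drop its locks. */
static void abort_transaction(struct backend *b)
{
    b->open_transactions = 0;
    b->held_locks = 0;
}

/* Stand-in for ProcReleaseLocks(). */
static void release_locks(struct backend *b)
{
    b->held_locks = 0;
}

/*
 * Hypothetical one-letter protocol:
 *   'B' begin a transaction (takes a lock)
 *   'C' commit (drops it)
 *   'E' an error occurs: the current transaction is aborted
 *   'X' client asks to terminate
 * Reaching the end of the string models EOF (client disconnected).
 */
void backend_main(struct backend *b, const char *commands, bool exit_after_abort)
{
    for (size_t i = 0;; i++) {
        switch (commands[i]) {
        case 'B':
            b->open_transactions++;
            b->held_locks++;
            break;
        case 'C':
            if (b->open_transactions > 0) {
                b->open_transactions--;
                b->held_locks--;
            }
            break;
        case 'E':
            abort_transaction(b);
            if (exit_after_abort)
                goto errorexit;   /* transaction already aborted above */
            break;
        case 'X':
        case '\0':
            goto normalexit;      /* no early exit: share the cleanup below */
        default:
            break;                /* this toy ignores unknown commands */
        }
    }

normalexit:
    abort_transaction(b);         /* the client may have vanished mid-transaction */

errorexit:
    b->connection_open = false;   /* plays the role of pq_close() */
    release_locks(b);             /* "Just to be sure..." */
}
```

Consider `backend_main(&b, "BB", false)`, which models a client that disconnects with two transactions open. It still ends with no open transactions and no locks held. Placing the cleanup at labels rather than before each exit call keeps every exit path in step with the others.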
Maybe I really have to restart it by running
killall -9 /usr/bin/postgres (instead of doing
/etc/rc.d/init.d/postgresql restart).
Restarting the postmaster should clear the problem (by releasing and
reinitializing shared memory). I dunno where you got the idea that
kill -9 was a recommended way of shutting down the system, but I sure
wouldn't recommend it. A plain kill on the postmaster ought to do it
(see the pg_ctl script in release 7.0.*).
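The reason a plain kill is preferable to kill -9 comes down to POSIX signal semantics. A kill with no option sends SIGTERM, which a process may catch and use to run its shutdown code. For a server, that code might release shared memory. kill -9 sends SIGKILL, which cannot be caught, so no cleanup runs at all. The generic sketch below is not the postmaster's actual signal handling, and the handler and flag names are hypothetical. It installs a SIGTERM handler and shows that the same call is rejected for SIGKILL.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Set by the handler; a server's main loop would poll it and shut down. */
static volatile sig_atomic_t shutdown_requested = 0;

static void request_shutdown(int signo)
{
    (void)signo;
    shutdown_requested = 1;   /* keep handler work async-signal-safe */
}

/*
 * Install request_shutdown() for signo. Returns 0 on success, or -1 with
 * errno set; POSIX specifies EINVAL for SIGKILL, which cannot be caught.
 */
int install_shutdown_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = request_shutdown;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

After `install_shutdown_handler(SIGTERM)`, a `raise(SIGTERM)` sets `shutdown_requested`. In a real server, the main loop would then detach and remove its shared memory before exiting. `install_shutdown_handler(SIGKILL)` returns -1, so a process has no way to run such code when it is killed with -9.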
regards, tom lane
*** src/backend/tcop/postgres.c.orig Sat May 20 22:23:30 2000
--- src/backend/tcop/postgres.c Wed Aug 30 16:47:51 2000
***************
*** 1459,1465 ****
* Initialize the deferred trigger manager
*/
if (DeferredTriggerInit() != 0)
! proc_exit(0);
SetProcessingMode(NormalProcessing);
--- 1459,1465 ----
* Initialize the deferred trigger manager
*/
if (DeferredTriggerInit() != 0)
! goto normalexit;
SetProcessingMode(NormalProcessing);
***************
*** 1479,1490 ****
TPRINTF(TRACE_VERBOSE, "AbortCurrentTransaction");
AbortCurrentTransaction();
! InError = false;
if (ExitAfterAbort)
! {
! ProcReleaseLocks(); /* Just to be sure... */
! proc_exit(0);
! }
}
Warn_restart_ready = true; /* we can now handle elog(ERROR) */
--- 1479,1489 ----
TPRINTF(TRACE_VERBOSE, "AbortCurrentTransaction");
AbortCurrentTransaction();
!
if (ExitAfterAbort)
! goto errorexit;
!
! InError = false;
}
Warn_restart_ready = true; /* we can now handle elog(ERROR) */
***************
*** 1553,1560 ****
if (HandleFunctionRequest() == EOF)
{
/* lost frontend connection during F message input */
! pq_close();
! proc_exit(0);
}
break;
--- 1552,1558 ----
if (HandleFunctionRequest() == EOF)
{
/* lost frontend connection during F message input */
! goto normalexit;
}
break;
***************
*** 1608,1618 ****
*/
case 'X':
case EOF:
! if (!IsUnderPostmaster)
! ShutdownXLOG();
! pq_close();
! proc_exit(0);
! break;
default:
elog(ERROR, "unknown frontend message was received");
--- 1606,1612 ----
*/
case 'X':
case EOF:
! goto normalexit;
default:
elog(ERROR, "unknown frontend message was received");
***************
*** 1642,1651 ****
if (IsUnderPostmaster)
NullCommand(Remote);
}
! } /* infinite for-loop */
! proc_exit(0); /* shouldn't get here... */
! return 1;
}
#ifndef HAVE_GETRUSAGE
--- 1636,1655 ----
if (IsUnderPostmaster)
NullCommand(Remote);
}
! } /* end of main loop */
!
! normalexit:
! ExitAfterAbort = true; /* ensure we will exit if elog during abort */
! AbortOutOfAnyTransaction();
! if (!IsUnderPostmaster)
! ShutdownXLOG();
!
! errorexit:
! pq_close();
! ProcReleaseLocks(); /* Just to be sure... */
! proc_exit(0);
! return 1; /* keep compiler quiet */
}
#ifndef HAVE_GETRUSAGE