VACUUM causes violent postmaster death

Started by Dan Moschuk about 25 years ago, 8 messages
#1 Dan Moschuk
dan@freebsd.org

Server process (pid 13361) exited with status 26 at Fri Nov 3 17:49:44 2000
Terminating any active server processes...
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.

This happens fairly regularly. I assume exit code 26 is used to dictate
that a specific error has occurred.

The database is a decent size (~3M records) with about 4 indexes.

-Dan
--
Man is a rational animal who always loses his temper when he is called
upon to act in accordance with the dictates of reason.
-- Oscar Wilde

#2 Alfred Perlstein
bright@wintelcom.net
In reply to: Dan Moschuk (#1)
Re: VACUUM causes violent postmaster death

* Dan Moschuk <dan@freebsd.org> [001103 14:55] wrote:

> Server process (pid 13361) exited with status 26 at Fri Nov 3 17:49:44 2000
> Terminating any active server processes...
> NOTICE: Message from PostgreSQL backend:
> The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
> I have rolled back the current transaction and am going to terminate your database system connection and exit.
> Please reconnect to the database system and repeat your query.
>
> This happens fairly regularly. I assume exit code 26 is used to dictate
> that a specific error has occurred.
>
> The database is a decent size (~3M records) with about 4 indexes.

What version of postgresql? Tom Lane recently fixed some severe problems
with vacuum and heavily used databases, the fix should be in the latest
7.0.2-patches/7.0.3 release.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dan Moschuk (#1)
Re: VACUUM causes violent postmaster death

Dan Moschuk <dan@freebsd.org> writes:

> Server process (pid 13361) exited with status 26 at Fri Nov 3 17:49:44 2000

What's signal 26 on your system? (Look in /usr/include/signal.h or
/usr/include/signum.h or /usr/include/sys/signal.h)

regards, tom lane
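
Tom's question treats the "status 26" in the log as a raw wait() status rather than an exit code. The following is a minimal sketch of that distinction, assuming the postmaster logs the unmodified status word it gets back from wait(); the helper name `describe_wait_status` is made up for illustration and is not part of PostgreSQL.

```python
import os


def describe_wait_status(status: int) -> str:
    """Interpret a raw wait() status word using the standard POSIX macros."""
    if os.WIFSIGNALED(status):
        # The low bits of the status hold the number of the terminating signal.
        return f"killed by signal {os.WTERMSIG(status)}"
    if os.WIFEXITED(status):
        # A normal exit() code is stored in the next byte up.
        return f"exited with code {os.WEXITSTATUS(status)}"
    return f"unrecognized status {status}"


# A raw status of 26 is a signal death, not exit(26).
print(describe_wait_status(26))
# exit(26) would appear as 26 shifted into the exit-code byte.
print(describe_wait_status(26 << 8))
```

On the usual Unix encoding, a small raw status such as 26 decodes as death by signal 26, which is why the next step is to look the number up in the signal header.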

#4 Dan Moschuk
dan@freebsd.org
In reply to: Tom Lane (#3)
Re: VACUUM causes violent postmaster death

| > Server process (pid 13361) exited with status 26 at Fri Nov 3 17:49:44 2000
|
| What's signal 26 on your system? (Look in /usr/include/signal.h or
| /usr/include/signum.h or /usr/include/sys/signal.h)

dan@spirit:/home/dan grep 26 /usr/include/sys/signal.h
#define SIGVTALRM 26 /* virtual time alarm */

Cheers,
-Dan
--
Man is a rational animal who always loses his temper when he is called
upon to act in accordance with the dictates of reason.
-- Oscar Wilde
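
As an alternative to grepping the header, the same lookup can be sketched with Python's standard `signal` module. Signal numbers differ between platforms, so the result is only meaningful when run on the machine in question; `signal_name` is a hypothetical helper used for illustration.

```python
import signal


def signal_name(number: int) -> str:
    """Return the symbolic name for a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a named signal on this system.
        return f"unknown signal {number}"


# Look up the number from the log, and the reverse mapping for SIGVTALRM.
print(signal_name(26))
print(signal.SIGVTALRM.value)
```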

#5 Dan Moschuk
dan@freebsd.org
In reply to: Alfred Perlstein (#2)
Re: VACUUM causes violent postmaster death

| > This happens fairly regularly. I assume exit code 26 is used to dictate
| > that a specific error has occurred.
| >
| > The database is a decent size (~3M records) with about 4 indexes.
|
| What version of postgresql? Tom Lane recently fixed some severe problems
| with vacuum and heavily used databases, the fix should be in the latest
| 7.0.2-patches/7.0.3 release.

It's 7.0.2-patches from about two or three weeks ago.

-Dan
--
Man is a rational animal who always loses his temper when he is called
upon to act in accordance with the dictates of reason.
-- Oscar Wilde

#6 Alfred Perlstein
bright@wintelcom.net
In reply to: Dan Moschuk (#5)
Re: VACUUM causes violent postmaster death

* Dan Moschuk <dan@freebsd.org> [001103 15:32] wrote:

> | > This happens fairly regularly. I assume exit code 26 is used to dictate
> | > that a specific error has occurred.
> | >
> | > The database is a decent size (~3M records) with about 4 indexes.
> |
> | What version of postgresql? Tom Lane recently fixed some severe problems
> | with vacuum and heavily used databases, the fix should be in the latest
> | 7.0.2-patches/7.0.3 release.
>
> It's 7.0.2-patches from about two or three weeks ago.

Make sure pgsql/src/backend/commands/vacuum.c is at:

revision 1.148.2.1
date: 2000/09/19 21:01:04; author: tgl; state: Exp; lines: +37 -19
Back-patch fix to ensure that VACUUM always calls FlushRelationBuffers.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dan Moschuk (#4)
Re: VACUUM causes violent postmaster death

Dan Moschuk <dan@freebsd.org> writes:

> | > Server process (pid 13361) exited with status 26 at Fri Nov 3 17:49:44 2000
> |
> | What's signal 26 on your system?
>
> #define SIGVTALRM 26 /* virtual time alarm */

Well, that sure shouldn't be happening. You aren't perhaps running it
under a ulimit setting that limits total process CPU time, are you?

regards, tom lane
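
The ulimit question can be checked with `ulimit -t` in the shell that starts the postmaster, since backends inherit their limits from it. The sketch below reads the same limit from inside a process with Python's standard `resource` module; `cpu_limit_report` is a hypothetical helper, and it only reports the limit of the process it runs in. As far as I know, POSIX systems normally deliver SIGXCPU rather than SIGVTALRM when the soft CPU limit is exceeded, so this check is best treated as a way to rule the limit in or out, not as proof either way.

```python
import resource


def cpu_limit_report() -> str:
    """Report the soft and hard CPU-time limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)

    def fmt(value: int) -> str:
        # RLIM_INFINITY means no limit is set.
        return "unlimited" if value == resource.RLIM_INFINITY else f"{value} s"

    return f"CPU time limit: soft={fmt(soft)}, hard={fmt(hard)}"


print(cpu_limit_report())
```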

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alfred Perlstein (#6)
Re: VACUUM causes violent postmaster death

I don't think Dan's problem is related to the recently found VACUUM
bugs. Killing a backend with SIGVTALRM suggests that something thinks
the backend's been running too long. ulimit is a likely suspect.
Another possibility is some sort of profiling mechanism gone haywire.
There's nothing in our source code that would invoke that signal, so
it's got to be some outside agency, I think.

regards, tom lane
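
One way an outside agency can produce this signal is a virtual interval timer (setitimer with ITIMER_VIRTUAL), which some profilers use. The sketch below shows the mechanism only: a child process arms such a timer, leaves SIGVTALRM at its default action, and is killed once it has burned enough user CPU time. Whether anything like this happened on Dan's machine is not known from the thread; the child script and the function name are illustrative assumptions.

```python
import signal
import subprocess
import sys

# Child program: arm a virtual (user CPU time) timer and spin.
# No handler is installed, so the default action for SIGVTALRM applies.
CHILD = """
import signal
signal.setitimer(signal.ITIMER_VIRTUAL, 0.05)
while True:
    pass
"""


def run_child_with_virtual_timer() -> int:
    """Run the child and return its return code as reported by subprocess."""
    proc = subprocess.run([sys.executable, "-c", CHILD], timeout=30)
    # subprocess reports death by signal N as a return code of -N.
    return proc.returncode


rc = run_child_with_virtual_timer()
print(rc, signal.Signals(-rc).name if rc < 0 else "exited normally")
```

From the parent's point of view the child simply dies with SIGVTALRM, which matches what the postmaster would see if a backend inherited an armed timer of this kind.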