Re: Fwd: 8.1beta2 vacuum analyze hanging on idle database

Started by Kevin Grittner over 20 years ago, 4 messages
#1Kevin Grittner
Kevin.Grittner@wicourts.gov

I see that my initial post never made it through to the list. I assume
this was some technical failure, so I'm adding it back for this reply.

It doesn't appear that we stopped the postmaster between incidents.
We have now done so.

The software we are running is a build from the beta2 release, with
no special options specified at ./configure time. Would you expect
such a build to include the debug info you wanted? We will include
--enable-debug in our next build, but I wondered because, while
showing our DBA manager the diagnostic steps, I ran gdb bt
against an idle connection and got:

(gdb) bt
#0 0x40197b46 in recv () from /lib/i686/libc.so.6
#1 0x0813485f in secure_read ()
#2 0x08138f7b in pq_recvbuf ()
#3 0x081393a9 in pq_getbyte ()
#4 0x08195565 in PostgresMain ()
#5 0x081716c5 in ServerLoop ()
#6 0x0817232e in PostmasterMain ()
#7 0x0813aad8 in main ()

Which seemed to show reasonable information, to my untrained eye.
That got me wondering whether the "(corrupt stack?)" note on the
previous backtrace might be something real. Both were run against
processes running the same copy of the backend software.

-Kevin

Tom Lane <tgl@sss.pgh.pa.us> 10/04/05 4:22 PM >>>

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

I can't hold the database in the problem state much longer -- if there
are any other diagnostic steps you'd like me to take before we clear
the problem, please let me know very soon.

Not at the moment ...

INFO: vacuuming "pg_catalog.pg_constraint"
INFO: index "pg_constraint_conname_nsp_index" now contains 35 row versions in 2 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "pg_constraint_conrelid_index" now contains 35 row versions in 2 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
[Hanging here for about 2 hours so far.]

Interesting that it seems to consistently be having a problem with a
pg_constraint index. Have you restarted the postmaster at any point
since this trouble began? If it were something like an unreleased
buffer pin, then it could persist indefinitely until postmaster restart.

(gdb) bt
#0 0x40198488 in semop () from /lib/i686/libc.so.6
#1 0x4a2c8cf8 in ?? ()
#2 0xbfffb2e0 in ?? ()
#3 0xbfffb308 in ?? ()
#4 0x0816a3d4 in PGSemaphoreLock ()
Previous frame inner to this frame (corrupt stack?)

This is fairly unhelpful :-(. The next stack frame down would have told
us something useful, but really we need to see the whole call stack.

It may be that you need to rebuild Postgres with --enable-debug in order
to get something gdb can work with.

regards, tom lane

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#1)

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

The software we are running is a build from the beta2 release, with
no special options specified at ./configure time. Would you expect
such a build to include the debug info you wanted?

No, you need configure --enable-debug, which is not the default.
For working with a beta release, --enable-cassert isn't a bad idea
either, though it is probably not relevant to your problem.

(gdb) bt
#0 0x40197b46 in recv () from /lib/i686/libc.so.6
#1 0x0813485f in secure_read ()
#2 0x08138f7b in pq_recvbuf ()
#3 0x081393a9 in pq_getbyte ()
#4 0x08195565 in PostgresMain ()
#5 0x081716c5 in ServerLoop ()
#6 0x0817232e in PostmasterMain ()
#7 0x0813aad8 in main ()

Which seemed to show reasonable information, to my untrained eye.

Yeah, that looks expected for a non-debug build. (Debug build would
show call parameters too, which is why it would be more helpful
even apart from the "(corrupt stack?)" problem.)

That got me wondering whether the "(corrupt stack?)" note on the
previous backtrace might be something real.

More likely, it's specific to particular places in the code that got
optimized in a way that gdb couldn't figure out.

regards, tom lane
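
For readers following along, here is a minimal sketch of how a backtrace like the ones above might be captured non-interactively. The thread does not say how gdb was invoked, so the `gdb -batch -ex bt -p PID` form and the helper name `backend_bt_command` are assumptions for illustration only. The helper prints the command instead of running it, so it can be reviewed before being pointed at a live backend.

```shell
# Hypothetical helper (not from the thread): print the gdb command that would
# attach to a running backend process and dump its call stack.
backend_bt_command() {
    pid="$1"
    # Refuse anything that is not a plain numeric PID.
    case "$pid" in
        ''|*[!0-9]*) echo "usage: backend_bt_command PID" >&2; return 1 ;;
    esac
    # -batch exits once the commands have run, -ex bt prints the backtrace,
    # -p attaches to the given process ID.
    printf 'gdb -batch -ex bt -p %s\n' "$pid"
}

# Example: show the command for a backend with PID 12345 (a made-up PID).
backend_bt_command 12345
```

On a build configured without --enable-debug the output would presumably still show only function names, as in the traces quoted in this thread.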

#3Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tom Lane (#2)

On Wed, Oct 05, 2005 at 02:27:32PM -0400, Tom Lane wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

The software we are running is a build from the beta2 release, with
no special options specified at ./configure time. Would you expect
such a build to include the debug info you wanted?

No, you need configure --enable-debug, which is not the default.
For working with a beta release, --enable-cassert isn't a bad idea
either, though it is probably not relevant to your problem.

Also, note that --enable-cassert will reduce performance somewhat, and
may make the bug go away.

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"En el principio del tiempo era el desencanto. Y era la desolación. Y era
grande el escándalo, y el destello de monitores y el crujir de teclas."
("Sean los Pájaros Pulentios", Daniel Correa)

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#3)

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On Wed, Oct 05, 2005 at 02:27:32PM -0400, Tom Lane wrote:

For working with a beta release, --enable-cassert isn't a bad idea
either, though it is probably not relevant to your problem.

Also, note that --enable-cassert will reduce performance somewhat, and
may make the bug go away.

True --- but there's also a chance it could expose the bug immediately.
It'd be worth trying both ways.

regards, tom lane
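
Trying it "both ways" amounts to two builds from the same source tree: one with only --enable-debug, and one that also adds --enable-cassert. A minimal sketch follows. The helper name `pg_configure_command` and its `debug`/`cassert` arguments are hypothetical; the two configure flags are the ones named in this thread. The helper only prints the configure line, so nothing is built until the command is run by hand.

```shell
# Hypothetical helper (not from the thread): print the configure invocation
# for one of the two build variants discussed above.
pg_configure_command() {
    variant="${1:-debug}"
    case "$variant" in
        # Debug symbols only: closest to the current build's behavior.
        debug)   flags="--enable-debug" ;;
        # Debug symbols plus assertion checks: slower, may hide or expose the bug.
        cassert) flags="--enable-debug --enable-cassert" ;;
        *) echo "unknown variant: $variant" >&2; return 1 ;;
    esac
    printf './configure %s\n' "$flags"
}

# Print both variants so they can be compared side by side.
pg_configure_command debug
pg_configure_command cassert
```

Keeping the two variants in separate install locations would make it possible to switch between them if the hang reproduces in one but not the other.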