SMP-PPC spinlocks in 7.2.4?

Started by Eric Soroos, about 23 years ago, 5 messages, general
#1 Eric Soroos
eric-psql@soroos.net

Hello.

Previously, I had some problems that appear to have been caused by the smp-ppc spinlock issue that was in the early 7.2 series.

The machine is a dual 800 MHz G4 running Mac OS X 10.1.5, using a libpq-based client through local domain sockets.

For the last few days, I've been dealing with a client who has drastically upped their usage of the database and in doing so is causing deadlocks. I was running 7.2 or 7.2.1 and have since upgraded to a locally compiled 7.2.4. I've also run a vacuum full on the databases.

Sometimes the clients have a ps ax status of async_notify, sometimes there's just a stack of selects and updates that get hung. (I'd estimate 6 deadlocks since Saturday). It seems to coincide with times of extra activity, such as when the databases are being backed up with pg_dump.
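One way to put numbers on the symptom above is to snapshot the `ps ax` output whenever things hang and count what each backend reports. The following is a minimal sketch, not a tested diagnostic tool. It assumes backend process titles look like `postgres: user database host activity`; the function names and that layout are assumptions for illustration, and the layout may differ on a given platform.

```python
import subprocess
from collections import Counter


def tally_backend_activity(ps_output: str) -> Counter:
    """Count backends by the activity shown in their process title.

    Assumes titles of the form 'postgres: user database host activity'.
    Titles with fewer than four fields after the marker (for example
    helper processes) are skipped.
    """
    counts = Counter()
    for line in ps_output.splitlines():
        _, marker, title = line.partition("postgres: ")
        if not marker:
            continue  # not a postgres process line
        fields = title.split(None, 3)  # user, database, host, activity
        if len(fields) < 4:
            continue  # no activity field present
        counts[fields[3].strip()] += 1
    return counts


def snapshot() -> Counter:
    """Run 'ps ax' once and tally the backend activities it shows."""
    result = subprocess.run(["ps", "ax"], capture_output=True, text=True)
    return tally_backend_activity(result.stdout)
```

Calling `snapshot()` from a cron job during a hang and logging the result would show whether the stuck backends are mostly in async_notify or mostly waiting selects and updates.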

I've also noticed the following in cron logs from nightly vacuums:

NOTICE: Rel pg_attribute: Uninitialized page 59 - fixing
NOTICE: Rel pg_attribute: Uninitialized page 60 - fixing
....

Is there anything I can do to debug this? I'm willing to give it a shot, but I'm also rapidly preparing a single proc linux/intel machine to take over db duties.

eric

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Eric Soroos (#1)
Re: SMP-PPC spinlocks in 7.2.4?

eric soroos <eric-psql@soroos.net> writes:

> For the last few days, I've been dealing with a client who has drastically upped their usage of the database and in doing so is causing deadlocks. I was running 7.2 or 7.2.1 and have since upgraded to a locally compiled 7.2.4. I've also run a vacuum full on the databases.

> Sometimes the clients have a ps ax status of async_notify, sometimes there's just a stack of selects and updates that get hung. (I'd estimate 6 deadlocks since Saturday). It seems to coincide with times of extra activity, such as when the databases are being backed up with pg_dump.

Hm. Do they use query-cancels at all? The reference to async_notify
makes me wonder if this is related to the recently-discovered
async_notify bug that could prevent fast-mode shutdowns. I'm not
certain how that might lead to an apparent deadlock, but a query cancel
arriving during async_notify would surely improve the odds of trouble.

If you don't mind running a slightly customized version, you might try
back-patching this fix:
http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/commands/async.c.diff?r1=1.91&r2=1.91.2.1
into 7.2.4 and see if that improves matters.

If it doesn't, I'd be interested to look into the matter, but I'd
probably need access to the machine to see what is going on.

> I've also noticed the following in cron logs from nightly vacuums

> NOTICE: Rel pg_attribute: Uninitialized page 59 - fixing
> NOTICE: Rel pg_attribute: Uninitialized page 60 - fixing

These are harmless.

> Is there anything I can do to debug this? I'm willing to give it a
> shot, but I'm also rapidly preparing a single proc linux/intel machine
> to take over db duties.

I think you're mistaken to be blaming the hardware...

regards, tom lane

#3 Eric Soroos
eric-psql@soroos.net
In reply to: Tom Lane (#2)
Re: SMP-PPC spinlocks in 7.2.4?

Tom,

> Hm. Do they use query-cancels at all? The reference to async_notify
> makes me wonder if this is related to the recently-discovered
> async_notify bug that could prevent fast-mode shutdowns. I'm not
> certain how that might lead to an apparent deadlock, but a query cancel
> arriving during async_notify would surely improve the odds of trouble.

Not that I know of, unless it's for cleanup of queries when quitting the app or other such abort type states.

> If you don't mind running a slightly customized version, you might try
> back-patching this fix:
> http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/commands/async.c.diff?r1=1.91&r2=1.91.2.1
> into 7.2.4 and see if that improves matters.

I'll give that a shot.

> If it doesn't, I'd be interested to look into the matter, but I'd
> probably need access to the machine to see what is going on.

That's probably possible, but there are some client confidentiality issues.

>> Is there anything I can do to debug this? I'm willing to give it a
>> shot, but I'm also rapidly preparing a single proc linux/intel machine
>> to take over db duties.

> I think you're mistaken to be blaming the hardware...

The linux box is a planned migration that's being accelerated because of this issue. It has more disk space, more memory, no app servers, and control of the kernel shared memory parameters.

eric

#4 Eric Soroos
eric-psql@soroos.net
In reply to: Eric Soroos (#3)
Re: SMP-PPC spinlocks in 7.2.4?

Tom,

>> If you don't mind running a slightly customized version, you might try
>> back-patching this fix:
>> http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/commands/async.c.diff?r1=1.91&r2=1.91.2.1
>> into 7.2.4 and see if that improves matters.

> I'll give that a shot.

It patched cleanly except for the version header. I've been running it for about 36 hours now with no problems. I'd say that I'm about 85% convinced that it made the difference, as I've also done some optimizations since then that reduce the database load by caching.

I'd say that this patch is a candidate for 7.2.5 if there's ever another 7.2 release.

Thanks for your help.

eric

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Eric Soroos (#4)
Re: SMP-PPC spinlocks in 7.2.4?

eric soroos <eric-psql@soroos.net> writes:

> I'd say that this patch is a candidate for 7.2.5 if there's ever
> another 7.2 release.

Yeah. I'm not sure that there will be another 7.2 release, but I'll pop
the patch into the 7.2 CVS branch while I'm thinking about it...

regards, tom lane