Apparent deadlock for simultaneous sequential scans

Started by PostgreSQL Bugs List almost 25 years ago | 4 messages | bugs
#1 PostgreSQL Bugs List
pgsql-bugs@postgresql.org

Robert Bruccoleri (bruc@stone.congen.com) reports a bug with a severity of 1
The lower the number the more severe it is.

Short Description
Apparent deadlock for simultaneous sequential scans

Long Description
On an SGI Origin 2000 with 32 CPUs, I'm running PostgreSQL 7.1beta4
using 32768 buffers. I have an application which does a two-table join
with a nested loop plan: one table is scanned sequentially, and
the other is index scanned for each hit.
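
For readers unfamiliar with the plan shape described above, here is a minimal Python sketch of a nested loop join whose outer side is scanned sequentially and whose inner side is probed through an index. The row data, the dict standing in for an index, and the function name are all hypothetical; this is an illustration of the access pattern, not the executor's code.

```python
# Toy data: the outer table is a list we walk in order (the "sequential scan");
# the inner table is reachable only through a dict keyed on the join column
# (a stand-in for the "index scan"). All names and rows are made up.
outer_rows = [(1, "a"), (2, "b"), (3, "c")]
inner_index = {1: ["x"], 3: ["y", "z"]}


def nested_loop_join(outer, index):
    """Yield one joined row per (outer row, matching inner row) pair."""
    for key, payload in outer:             # one pass over the outer table
        for match in index.get(key, []):   # one index probe per outer row
            yield key, payload, match


result = list(nested_loop_join(outer_rows, inner_index))
print(result)
```

Every outer row touches a heap page, and every hit triggers further buffer lookups on the inner side, so a plan of this shape asks the buffer manager for pages at a high rate.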

If I run this application by itself, performance is fine. The queries
take a few minutes to execute, which is reasonable given the number
of tuples that must be returned.

However, if more than one application is run at once, the performance
deteriorates drastically. Monitoring the backends using the SGI par
command (which monitors system calls) shows that all the affected
backends are running select in timeout mode. Running dbx on the
running backends reveals that all of them are waiting for the bufmgr
spinlock (BufMgrLock). Here is the traceback for all the backends:

0 __select(0x0, 0x0, 0x0, 0x0) ["select.s":17, 0xfa34e00]
1 _select(0x0, 0x0, 0x0, 0x0) ["selectSCI.c":30, 0xfa34e74]
2 s_lock_sleep(0x0, 0x0, 0x64000010, 0x100580b4) ["s_lock.c":90, 0x557d2c]
3 s_lock(0x64000010, 0x100580b4, 0x9c, 0x0) ["s_lock.c":113, 0x557db0]
4 SpinAcquire(0x0, 0x0, 0x0, 0x0) ["spin.c":156, 0x55ee74]
5 RelationGetBufferWithBuffer(0x103c5b78, 0x4, 0x171, 0x0) ["bufmgr.c":117, 0x551aa0]
6 heapgettup(0x103c5b78, 0x1040e81c, 0x1, 0x1040e850) ["heapam.c":411, 0x43f050]
7 heap_getnext(0x1040e800, 0x0, 0x0, 0x0) ["heapam.c":1072, 0x4416d4]
8 SeqNext(0x1040aa40, 0x0, 0x0, 0x0) ["nodeSeqscan.c":98, 0x4edc04]
9 ExecScan(0x1040aa40, 0x4edaf0, 0x0, 0x0) ["execScan.c":98, 0x4e2d64]
10 ExecSeqScan(0x1040aa40, 0x0, 0x0, 0x0) ["nodeSeqscan.c":137, 0x4edc74]
11 ExecProcNode(0x1040aa40, 0x1040aad0, 0x0, 0x0) ["execProcnode.c":285, 0x4df39c]
12 ExecNestLoop(0x1040aad0, 0x0, 0x0, 0x0) ["nodeNestloop.c":173, 0x4ed140]
13 ExecProcNode(0x1040aad0, 0x1040dee0, 0x0, 0x0) ["execProcnode.c":305, 0x4df40c]
14 ExecUnique(0x1040dee0, 0x0, 0x0, 0x0) ["nodeUnique.c":71, 0x4ef40c]
15 ExecProcNode(0x1040dee0, 0x1040dee0, 0x0, 0x0) ["execProcnode.c":333, 0x4df4b4]
16 ExecutePlan(0x1040e020, 0x1040dee0, 0x1, 0x0) ["execMain.c":965, 0x4dd3e0]
17 ExecutorRun(0x1040e000, 0x1040e020, 0x3, 0x0) ["execMain.c":199, 0x4dc06c]
18 ProcessQuery(0x103fa5e8, 0x1040dee0, 0x2, 0x0) ["pquery.c":305, 0x56f8d4]
19 pg_exec_query_string(0x103f9d98, 0x2, 0x1037e300, 0x0) ["postgres.c":810, 0x56d444]
20 PostgresMain(0x4, 0x7fff2560, 0xa, 0x7fff2eb4) ["postgres.c":1882, 0x56ef1c]
21 DoBackend(0x100ba550, 0x0, 0x0, 0x0) ["postmaster.c":2035, 0x540058]
22 BackendStartup(0x100ba550, 0x0, 0x0, 0x0) ["postmaster.c":1812, 0x53f8e4]
23 ServerLoop(0x0, 0x0, 0x0, 0x0) ["postmaster.c":967, 0x53dfa4]
24 PostmasterMain(0xa, 0x7fff2eb4, 0x0, 0x0) ["postmaster.c":666, 0x53d60c]
25 main(0xa, 0x7fff2eb4, 0x0, 0x0) ["main.c":142, 0x4fdbac]
26 __istart() ["crt1tinit.s":13, 0x4255f0]

It's not clear to me why the spinlock needs to be grabbed at the
beginning of RelationGetBufferWithBuffer, but that does seem to
be the problem.
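
The traceback shows s_lock calling s_lock_sleep, which ends up in select, which fits the observation that the backends sit in select in timeout mode. The following is a rough, single-threaded Python model of that try-then-sleep pattern. The class, the function, the retry limit, and the 10 ms delay are assumptions made for illustration; none of them are taken from s_lock.c.

```python
import time


class ToySpinLock:
    """Stand-in for a test-and-set lock (hypothetical, not PostgreSQL code)."""

    def __init__(self):
        self.held = False

    def test_and_set(self):
        """Mark the lock held and return whether it was already held."""
        # A real implementation does this atomically in hardware; this
        # single-threaded model does not need atomicity.
        was_held = self.held
        self.held = True
        return was_held

    def release(self):
        self.held = False


def acquire(lock, max_tries=5, delay_s=0.01, sleep=time.sleep):
    """Try the lock; on failure sleep briefly, then retry.

    Returns how many sleeps were needed. Raises TimeoutError after
    max_tries failed attempts.
    """
    for sleeps in range(max_tries):
        if not lock.test_and_set():
            return sleeps
        sleep(delay_s)  # the real backend sleeps via select() with a timeout
    raise TimeoutError("lock still held after %d tries" % max_tries)


# Demo: the lock is already held, and the holder lets go during the
# waiter's second sleep. A fake sleep keeps the run deterministic.
lock = ToySpinLock()
acquire(lock)  # first caller gets the lock immediately

naps = []


def fake_sleep(delay):
    naps.append(delay)
    if len(naps) == 2:
        lock.release()


sleeps_needed = acquire(lock, sleep=fake_sleep)
print(sleeps_needed, naps)
```

In this model a waiter makes no progress while it sleeps, so if many backends lose the race at once, most of their wall-clock time goes to the timeout rather than to useful work.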

If more information is required, please let me know.

I've compared the code for this file against PostgreSQL 7.1.1 and
this part is unchanged.

Sample Code

No file was uploaded with this report

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: PostgreSQL Bugs List (#1)
Re: Apparent deadlock for simultaneous sequential scans

pgsql-bugs@postgresql.org writes:

> Apparent deadlock for simultaneous sequential scans
>
> However, if more than one application is run at once, the performance
> deteriorates drastically.

So is it a deadlock, or a slowdown? How many is "more than one"?

regards, tom lane

#3 Robert E. Bruccoleri
bruc@stone.congenomics.com
In reply to: Tom Lane (#2)
Re: Apparent deadlock for simultaneous sequential scans

Dear Tom,

> pgsql-bugs@postgresql.org writes:
>
>> Apparent deadlock for simultaneous sequential scans
>>
>> However, if more than one application is run at once, the performance
>> deteriorates drastically.
>
> So is it a deadlock, or a slowdown? How many is "more than one"?

With two processors running the same query, it appears to be a
slowdown. When I look at the system calls, the backends were
executing about one read per second. With six processors running the
same query, it appeared to be a deadlock -- no I/O's were being issued
over the time that I watched.

"More than one" means two or more.

Thanks. --Bob

+----------------------------------+------------------------------------+
| Robert E. Bruccoleri, Ph.D.      | Phone: 609 737 6383                |
| President, Congenomics, Inc.     | Fax:   609 737 7528                |
| 114 W Franklin Ave, Suite K1,4,5 | email: bruc@acm.org                |
| P.O. Box 314                     | URL:   http://www.congen.com/~bruc |
| Pennington, NJ 08534             |                                    |
+----------------------------------+------------------------------------+

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert E. Bruccoleri (#3)
Re: Apparent deadlock for simultaneous sequential scans

bruc@stone.congenomics.com (Robert E. Bruccoleri) writes:

> With two processors running the same query, it appears to be a
> slowdown. When I look at the system calls, the backends were
> executing about one read per second. With six processors running the
> same query, it appeared to be a deadlock -- no I/O's were being issued
> over the time that I watched.

It's hard to believe there's an actual deadlock here. You might be
looking at pathological inefficiency of the spinlock implementation,
but still it seems that someone somewhere must be getting some work
done. Can you determine which backend actually has the spinlock?
What's it doing?

Given that you mentioned you had a large number of shared buffers,
it might be that a background checkpoint process running BufferSync()
is part of the problem. It looks like BufferSync acquires the spinlock
separately for each buffer it examines, which would be kinda nasty in
the presence of heavy contention. OTOH we shouldn't really care if
BufferSync is slow.

regards, tom lane
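
To put a number on the per-buffer locking pattern described for BufferSync, here is a small Python model that counts lock acquisitions for a pass over 32768 buffers, the figure from the original report. The classes, the function names, and the choice of which buffers are dirty are hypothetical. The single-hold variant is included only as a point of comparison; it is not something proposed in this thread, and it would hold the lock for the whole pass.

```python
N_BUFFERS = 32768  # shared buffer count quoted in the original report


class CountingLock:
    """Toy lock that only counts how often it is taken (not a real spinlock)."""

    def __init__(self):
        self.acquisitions = 0
        self.held = False

    def acquire(self):
        assert not self.held, "toy lock is not re-entrant"
        self.held = True
        self.acquisitions += 1

    def release(self):
        self.held = False


def sync_per_buffer(lock, dirty_flags):
    """Take and release the lock once for every buffer examined."""
    to_write = []
    for buf_id, dirty in enumerate(dirty_flags):
        lock.acquire()
        if dirty:
            to_write.append(buf_id)
        lock.release()
    return to_write


def sync_single_hold(lock, dirty_flags):
    """Comparison case: take the lock once for the whole pass."""
    lock.acquire()
    to_write = [buf_id for buf_id, dirty in enumerate(dirty_flags) if dirty]
    lock.release()
    return to_write


# Arbitrary assumption for the demo: every 100th buffer is dirty.
dirty_flags = [buf_id % 100 == 0 for buf_id in range(N_BUFFERS)]

per_buffer_lock = CountingLock()
single_hold_lock = CountingLock()
found_a = sync_per_buffer(per_buffer_lock, dirty_flags)
found_b = sync_single_hold(single_hold_lock, dirty_flags)

print(per_buffer_lock.acquisitions, single_hold_lock.acquisitions)
```

Both passes find the same dirty buffers. What differs is how many times the scanning process must compete for the lock with the query backends, and, in the other direction, how long each hold lasts.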