9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Hi,
I've been running some tests on pg 9.2beta1 and in particular a set
of queries like
create table _tmp0 as select * from (
select *, (select healpixid from idt_match as m where
m.transitid=o.transitid)
as x from idt_photoobservation as o offset 0
) as y where x%16=ZZZ order by x;
(where ZZZ is some integer, 0 <= ZZZ < 16)
With the following plan:
----------------------------------------------------------------------------------------------------------------------
Sort (cost=3228814504.96..3228815137.21 rows=252902 width=498)
Sort Key: y.x
-> Subquery Scan on y (cost=0.00..3228791809.25 rows=252902 width=498)
Filter: ((y.x % 16::bigint) = 0)
-> Limit (cost=0.00..3228033102.41 rows=50580456 width=490)
-> Seq Scan on idt_photoobservation o (cost=0.00..3228033102.41 rows=50580456 width=490)
SubPlan 1
-> Index Scan using idt_match_transitid_idx on idt_match m (cost=0.00..63.74 rows=1 width=8)
Index Cond: (transitid = o.transitid)
The schemas of the tables are:
e2es2=> \d idt_match
Table "public.idt_match"
Column | Type | Modifiers
-----------+----------+-----------
sourceid | bigint |
transitid | bigint |
healpixid | bigint |
flagsxm | smallint |
Indexes:
"idt_match_idx" btree (healpixid)
"idt_match_transitid_idx" btree (transitid)
Table "public.idt_photoobservation"
Column | Type | Modifiers
-----------+----------+-----------
transitid | bigint |
fluxbp0 | real |
....
more columns
....
Indexes:
"idt_photoobservation_idx" btree (transitid)
And I noticed that when I run queries like the one shown above in parallel
(in multiple connections, for ZZZ=0...8), the performance of each query
drops significantly (by a factor of 2), despite the fact that postgres is
mostly CPU bound during the execution of the query.
So I tried oprofile and strace on it, and this is what I saw.
The strace -c of each process shows something like this:
#########
Process 18660 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
76.25 0.001342 0 268987 semop
23.75 0.000418 0 61694 read
0.00 0.000000 0 138 lseek
0.00 0.000000 0 355 select
0.00 0.000000 0 3 kill
------ ----------- ----------- --------- --------- ----------------
100.00 0.001760 331177 total
#######
And oprofile shows this at the top:
-------------------------------------------------------------------------------
2863981 25.7117 ReleasePredicateLocks
2863981 100.000 ReleasePredicateLocks [self]
-------------------------------------------------------------------------------
1246629 11.1917 LocalBufferAlloc
1246629 100.000 LocalBufferAlloc [self]
-------------------------------------------------------------------------------
1135393 10.1931 CheckForSerializableConflictIn
1135393 100.000 CheckForSerializableConflictIn [self]
------------------------------------------------------------
So there is a lot of locking for some reason, and I was surprised to see
anything related to SSI in the profile at all, because I'm not running
serializable transactions (I was just running my queries from multiple
psql sessions).
Is there a reasonable explanation for what I'm seeing?
Thanks in advance,
Sergey
More info: there is no other concurrent activity on the machine.
shared_buffers is 10G. The machine has ~60G of RAM, 24 cores, and a proper
RAID array.
Another note: the oprofile and strace output were obtained during the
stage when the idt_photo.. table was being scanned and data was being stored
in pgsql_tmp (not during the last bit, when the data in pgsql_tmp were
actually being sorted).
*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
On Thu, May 24, 2012 at 8:24 AM, Sergey Koposov <koposov@ast.cam.ac.uk> wrote:
...
So there is a lot of locking for some reason, and I was surprised to see
anything related to SSI in the profile at all, because I'm not running
serializable transactions (I was just running my queries from multiple
psql sessions).
Are you sure? I looked at all the ReleasePredicateLocks calls and
they appear to be guarded by:
/* Nothing to do if this is not a serializable transaction */
if (MySerializableXact == InvalidSerializableXact)
return false;
What's the default isolation mode set to?
merlin
On Thu, 24 May 2012, Merlin Moncure wrote:
Are you sure? I looked at all the ReleasePredicateLocks calls and
they appear to be guarded by:
/* Nothing to do if this is not a serializable transaction */
if (MySerializableXact == InvalidSerializableXact)
return false;
What's the default isolation mode set to?
Sorry, it was indeed my mistake. I was pointing opreport at a binary
which had been recompiled while the old version was still running, so
I guess oprofile was picking up the wrong function names.
Here is the correct oprofile when multiple queries are running:
samples % symbol name
-------------------------------------------------------------------------------
952457 13.8715 LWLockAcquire
952457 100.000 LWLockAcquire [self]
-------------------------------------------------------------------------------
779077 11.3464 PinBuffer
779077 100.000 PinBuffer [self]
-------------------------------------------------------------------------------
759898 11.0671 s_lock
759898 100.000 s_lock [self]
-------------------------------------------------------------------------------
689753 6.7178 slot_deform_tuple
689753 100.000 slot_deform_tuple [self]
-------------------------------------------------------------------------------
526002 5.1230 UnpinBuffer
526002 100.000 UnpinBuffer [self]
When only one is running:
samples % symbol name
-------------------------------------------------------------------------------
163268 14.0343 slot_deform_tuple
163268 100.000 slot_deform_tuple [self]
-------------------------------------------------------------------------------
126018 10.8324 _bt_compare
126018 100.000 _bt_compare [self]
-------------------------------------------------------------------------------
113975 9.7972 ExecProject
113975 100.000 ExecProject [self]
-------------------------------------------------------------------------------
49760 4.2773 FunctionCall2Coll
49760 100.000 FunctionCall2Coll [self]
-------------------------------------------------------------------------------
49164 4.2261 LWLockAcquire
49164 100.000 LWLockAcquire [self]
-------------------------------------------------------------------------------
43526 3.7414 hash_search_with_hash_value
43526 100.000 hash_search_with_hash_value [self]
############
I guess there is nothing catastrophically wrong with that, but still I'm
very surprised that you get severe locking problems (a factor-of-two
slow-down) when running parallel read-only transactions.
Sergey
*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
On Thu, May 24, 2012 at 6:24 AM, Sergey Koposov <koposov@ast.cam.ac.uk> wrote:
Hi,
I've been running some tests on pg 9.2beta1 and in particular a set
of queries like
...
And I noticed that when I run queries like the one shown above in parallel
(in multiple connections, for ZZZ=0...8), the performance of each query
drops significantly (by a factor of 2), despite the fact that postgres is
mostly CPU bound during the execution of the query.
Is this a regression from previous versions (9.0, 9.1), or has the
slowdown always been there?
Cheers,
Jeff
On Thu, May 24, 2012 at 1:42 PM, Sergey Koposov <koposov@ast.cam.ac.uk> wrote:
I guess there is nothing catastrophically wrong with that, but still I'm
very surprised that you get severe locking problems (a factor-of-two slow-down)
when running parallel read-only transactions.
Me, too. How many concurrent connections are you running, and does
your working set exceed shared_buffers? Can you provide a
self-contained reproducible test case?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, 24 May 2012, Robert Haas wrote:
On Thu, May 24, 2012 at 1:42 PM, Sergey Koposov <koposov@ast.cam.ac.uk> wrote:
I guess there is nothing catastrophically wrong with that, but still I'm
very surprised that you get severe locking problems (a factor-of-two slow-down)
when running parallel read-only transactions.
Me, too. How many concurrent connections are you running, and does
your working set exceed shared_buffers? Can you provide a
self-contained reproducible test case?
The last tests I've been doing were with 8 connections.
And the working set is roughly 30Gig, which is ~3x shared_buffers
(but ~50% of RAM).
Regarding the test case, I'll try to see whether I can still observe the
same slowdown if I chop the main table down by a factor of a few, so that I
can put the data somewhere for download.
S
*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/
On Thu, May 24, 2012 at 2:19 PM, Sergey Koposov <koposov@ast.cam.ac.uk> wrote:
The last tests I've been doing were with 8 connections.
And the working set is roughly 30Gig, which is ~3x shared_buffers
(but ~50% of RAM).
Given that additional information, I would say these results are
expected. Unfortunately, our BufFreelistLock is a serious contention
point, and I think that's what you're hitting. See the graph here:
http://rhaas.blogspot.com/2012/03/performance-and-scalability-on-ibm.html
As you can see, raw performance isn't much worse with the larger data
sets, but scalability at high connection counts is severely degraded
once the working set no longer fits in shared_buffers.
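To make the contention point concrete, here is a stripped-down model of what
the buffer allocator does under that lock (illustrative C only, not the actual
PostgreSQL source; the real StrategyGetBuffer() in freelist.c also handles
access-strategy rings, the freelist itself, and buffer header spinlocks):

#include <pthread.h>

/* Toy model: one global lock guards the clock sweep, so every backend
 * that needs to allocate/evict a shared buffer serializes here. */
typedef struct
{
    int usage_count;   /* clock-sweep popularity counter */
    int refcount;      /* pins */
} ToyBufferDesc;

#define NBUFFERS 1024

static ToyBufferDesc buffers[NBUFFERS];
static int next_victim = 0;
static pthread_mutex_t freelist_lock = PTHREAD_MUTEX_INITIALIZER; /* plays the role of BufFreelistLock */

static ToyBufferDesc *
get_victim_buffer(void)
{
    ToyBufferDesc *buf;

    pthread_mutex_lock(&freelist_lock);    /* the single point of contention */
    for (;;)
    {
        buf = &buffers[next_victim];
        next_victim = (next_victim + 1) % NBUFFERS;
        if (buf->refcount == 0 && buf->usage_count == 0)
            break;                         /* found a victim page to evict */
        if (buf->usage_count > 0)
            buf->usage_count--;            /* age the buffer and keep sweeping */
    }
    pthread_mutex_unlock(&freelist_lock);
    return buf;
}

Once the working set exceeds shared_buffers, every backend has to pass through
that one lock for every page it faults in, which is why throughput degrades
with concurrency even on a read-mostly workload.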
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, May 24, 2012 at 1:43 PM, Robert Haas <robertmhaas@gmail.com> wrote:
...
Given that additional information, I would say these results are
expected. Unfortunately, our BufFreelistLock is a serious contention
point, and I think that's what you're hitting. See the graph here:
http://rhaas.blogspot.com/2012/03/performance-and-scalability-on-ibm.html
As you can see, raw performance isn't much worse with the larger data
sets, but scalability at high connection counts is severely degraded
once the working set no longer fits in shared_buffers.
Hm, wouldn't the BufFreelistLock issue be ameliorated if
StrategyGetBuffer could reserve multiple buffers so that you'd draw
down your local list and only then go back to the global pool? (easier
said than done obviously).
merlin
On Thu, May 24, 2012 at 2:24 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
Hm, wouldn't the BufFreelistLock issue be ameliorated if
StrategyGetBuffer could reserve multiple buffers so that you'd draw
down your local list and only then go back to the global pool? (easier
said than done obviously).
hm, looking at the code some more, it looks like the whole point of
the strategy system is to do that. ISTM bulk insert type queries
would be good candidates for a buffer strategy somehow?
merlin
On Thu, 24 May 2012, Robert Haas wrote:
As you can see, raw performance isn't much worse with the larger data
sets, but scalability at high connection counts is severely degraded
once the working set no longer fits in shared_buffers.
Actually the problem persists even when I trim the dataset size to fit
within shared_buffers.
Here is the dump (0.5 gig in size, tested with shared_buffers=10G,
work_mem=500Mb):
http://www.ast.cam.ac.uk/~koposov/files/dump.gz
And I attach the script.
For my toy dataset the performance of a single thread goes down
from ~6.4 seconds to 18 seconds (~3 times worse).
And actually, while running the script repeatedly on my main machine, for
some reason I saw some variation in how much slower the threaded execution
is than a single thread.
Now I see 25 seconds for the multi-threaded run vs. the same ~6 seconds for a
single thread.
The oprofile shows
782355 21.5269 s_lock
782355 100.000 s_lock [self]
-------------------------------------------------------------------------------
709801 19.5305 PinBuffer
709801 100.000 PinBuffer [self]
-------------------------------------------------------------------------------
326457 8.9826 LWLockAcquire
326457 100.000 LWLockAcquire [self]
-------------------------------------------------------------------------------
309437 8.5143 UnpinBuffer
309437 100.000 UnpinBuffer [self]
-------------------------------------------------------------------------------
252972 6.9606 ReadBuffer_common
252972 100.000 ReadBuffer_common [self]
-------------------------------------------------------------------------------
201558 5.5460 LockBuffer
201558 100.000 LockBuffer [self]
------------------------------------------------------------
It is interesting that on another machine with much smaller shared memory
(3G), less RAM (12G), a smaller number of CPUs, and PG 9.1 running, I was
getting consistently ~7.2 vs 4.5 sec (for multi vs single thread).
PS: Just in case it matters, the CPU on the main machine I'm testing on is a
Xeon(R) E7-4807 (the total number of real cores is 24).
*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/
On Thu, May 24, 2012 at 12:46 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
Hm, wouldn't the BufFreelistLock issue be ameliorated if
StrategyGetBuffer could reserve multiple buffers so that you'd draw
down your local list and only then go back to the global pool? (easier
said than done obviously).
...
hm, looking at the code some more, it looks like the whole point of
the strategy system is to do that.
I thought you were suggesting that StrategyGetBuffer would
pre-allocate multiple buffers to a backend under the cover of a single
BufFreelistLock. If that is what you were suggesting, that is not
what the strategy system is currently for. It is for locally reusing
buffers, not for gang-allocating them.
If a backend could somehow predict that the buffer it is about to read
in is likely going to be a cold buffer, perhaps it would make sense
for each backend to maintain a small ring of its own which it can
reuse for such cold buffers.
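Just to make that idea concrete, here is a purely hypothetical sketch (nothing
like this exists in the tree; the names and sizes are made up):

/* Hypothetical: a tiny per-backend ring of buffer ids reused for reads
 * the backend predicts to be cold, so it only has to visit the shared
 * allocator while the ring is being populated. */
#define COLD_RING_SIZE 16

static int cold_ring[COLD_RING_SIZE];  /* buffer ids owned by this backend */
static int cold_ring_len = 0;
static int cold_ring_next = 0;

/* Returns a buffer id to recycle for a cold read, or -1 meaning
 * "fall back to the shared allocator and remember the result". */
static int
get_cold_buffer(void)
{
    if (cold_ring_len == 0)
        return -1;
    cold_ring_next = (cold_ring_next + 1) % cold_ring_len;
    return cold_ring[cold_ring_next];
}

static void
remember_cold_buffer(int buf_id)
{
    if (cold_ring_len < COLD_RING_SIZE)
        cold_ring[cold_ring_len++] = buf_id;
}

This is essentially what a BufferAccessStrategy ring already does for the
operations that use one; the open question is predicting when to apply it.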
ISTM bulk insert type queries
would be good candidates for a buffer strategy somehow?
Probably. There is a code or README comment to that effect that I
stumbled upon just a couple of hours ago, but can't immediately re-find
it.
Cheers,
Jeff
On Thu, May 24, 2012 at 3:46 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
hm, looking at the code some more, it looks like the whole point of
the strategy system is to do that. ISTM bulk insert type queries
would be good candidates for a buffer strategy somehow?
Yep.
commit 85e2cedf985bfecaf43a18ca17433070f439fb0e
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu Nov 6 20:51:15 2008 +0000
Improve bulk-insert performance by keeping the current target buffer pinned
(but not locked, as that would risk deadlocks). Also, make it work in a small
ring of buffers to avoid having bulk inserts trash the whole buffer arena.
Robert Haas, after an idea of Simon Riggs'.
But that doesn't help here, since pgbench is only looking up one row per query.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, May 24, 2012 at 3:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:
...
But that doesn't help here, since pgbench is only looking up one row per query.
Wait -- the OP's gripe isn't regarding standard pgbench, but multiple
large concurrent 'insert into foo select ...' statements. I looked in the code
and it appears that the only operations using a bulk-insert strategy are
COPY, CREATE TABLE AS SELECT, and table rewrite operations. Concurrent
INSERT ... SELECT apparently doesn't get the benefit of a strategy and
should be fighting over the freelist once the pool is exhausted.
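For reference, the pattern those operations use looks roughly like this
(simplified from the 9.2-era sources; rel, mycid, options and get_next_tuple()
are just placeholders here):

/* Simplified sketch of the bulk-insert path used by COPY and CREATE TABLE
 * AS: a BulkInsertState bundles a BAS_BULKWRITE access strategy (a small
 * private ring of shared buffers) plus a pinned "current" target buffer,
 * so the loader mostly stays out of the shared freelist. */
BulkInsertState bistate = GetBulkInsertState();

while ((tuple = get_next_tuple()) != NULL)
    heap_insert(rel, tuple, mycid, options, bistate);

FreeBulkInsertState(bistate);

A plain INSERT ... SELECT goes through the executor's generic insert path with
a NULL BulkInsertState, so every new page it fills is allocated through the
regular shared allocator.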
We don't get to skip WAL of course, but we should be able to use a
bulk-insert strategy, especially if there were some way of predicting
that a large number of tuples were going to be inserted. I'm
wondering, though, whether contention on the freelist is in fact the OP's
problem.
merlin
On Thu, May 24, 2012 at 4:46 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
Wait -- the OP's gripe isn't regarding standard pgbench, but multiple
large concurrent 'insert into foo select ...' statements. I looked in the code
and it appears that the only operations using a bulk-insert strategy are
COPY, CREATE TABLE AS SELECT, and table rewrite operations. Concurrent
INSERT ... SELECT apparently doesn't get the benefit of a strategy and
should be fighting over the freelist once the pool is exhausted.
I think you are right.
We don't get to skip WAL of course, but we should be able to use a
bulk-insert strategy, especially if there were some way of predicting
that a large number of tuples were going to be inserted. I'm
wondering, though, whether contention on the freelist is in fact the OP's
problem.
Not sure. It might be some other LWLock, but it's hard to tell which
one from the information provided.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, May 24, 2012 at 4:46 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
We don't get to skip WAL of course, but we should be able to use a
bulk-insert strategy, especially if there were some way of predicting
that a large number of tuples were going to be inserted. I'm
wondering, though, whether contention on the freelist is in fact the OP's
problem.
Not sure. It might be some other LWLock, but it's hard to tell which
one from the information provided.
Yeah. It seems quite plausible that Robert's select-only benchmark might
be mainly tripping over the freelist lock, but I'm less convinced about
something that's doing INSERT/SELECT, and therefore is also doing a lot
of WAL activity, index insertions, etc. I'd want to see some
instrumentation results before assuming we know where the bottleneck is
there.
regards, tom lane
Hi,
On Thu, 24 May 2012, Robert Haas wrote:
Not sure. It might be some other LWLock, but it's hard to tell which
one from the information provided.
If you could tell me the best way to find out the info that you need,
then I could run it reasonably quickly.
S
*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/
On Thu, May 24, 2012 at 3:36 PM, Sergey Koposov <koposov@ast.cam.ac.uk> wrote:
Hi,
On Thu, 24 May 2012, Robert Haas wrote:
Not sure. It might be some other LWLock, but it's hard to tell which
one from the information provided.
If you could tell me the best way to find out the info that you need,
then I could run it reasonably quickly.
Add
#define LWLOCK_STATS
near the top of:
src/backend/storage/lmgr/lwlock.c
and recompile and run a reduced-size workload. When the processes
exit, they will dump a lot of data about LWLock usage to the logfile.
Generally the LWLock with the most blocks on it will be the main
culprit.
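Concretely, the change is just one line (the instrumentation it enables is
already in the file, guarded by #ifdef LWLOCK_STATS):

/* In src/backend/storage/lmgr/lwlock.c, after the existing #include lines.
 * Each backend will then print per-lock shacq/exacq/blk counters to the
 * server log when it exits. */
#define LWLOCK_STATS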
Cheers,
Jeff
On Thu, 24 May 2012, Jeff Janes wrote:
Add
#define LWLOCK_STATS
near the top of:
src/backend/storage/lmgr/lwlock.c
and recompile and run a reduced-size workload. When the processes
exit, they will dump a lot of data about LWLock usage to the logfile.
Generally the LWLock with the most blocks on it will be the main
culprit.
Here is the output from a multi-threaded run (8 threads, 22 seconds),
sorted by blk. Not sure whether that's of much use or not:
PID 7112 lwlock 48: shacq 1124394 exacq 1350 blk 1373
PID 7110 lwlock 48: shacq 1124460 exacq 1128 blk 1110
PID 7114 lwlock 48: shacq 1124502 exacq 1041 blk 976
PID 7111 lwlock 48: shacq 1124523 exacq 1009 blk 955
PID 7113 lwlock 48: shacq 1124383 exacq 868 blk 871
PID 7112 lwlock 44: shacq 1127148 exacq 1323 blk 838
PID 7110 lwlock 44: shacq 1127256 exacq 1132 blk 774
PID 7114 lwlock 44: shacq 1127418 exacq 1024 blk 702
PID 7113 lwlock 44: shacq 1127179 exacq 920 blk 665
PID 7111 lwlock 44: shacq 1127324 exacq 957 blk 651
PID 7109 lwlock 48: shacq 1124402 exacq 384 blk 602
PID 7108 lwlock 48: shacq 1125039 exacq 1592 blk 546
PID 7108 lwlock 44: shacq 1127902 exacq 1548 blk 511
PID 7109 lwlock 44: shacq 1127261 exacq 388 blk 466
PID 7114 lwlock 47: shacq 227986 exacq 929 blk 449
PID 7115 lwlock 44: shacq 1127495 exacq 633 blk 401
PID 7115 lwlock 48: shacq 1124666 exacq 559 blk 397
PID 7112 lwlock 47: shacq 227993 exacq 1248 blk 387
PID 7110 lwlock 47: shacq 228218 exacq 1082 blk 378
PID 7111 lwlock 47: shacq 228093 exacq 907 blk 321
PID 7114 lwlock 42: shacq 232591 exacq 935 blk 318
PID 7113 lwlock 47: shacq 228085 exacq 909 blk 306
PID 7112 lwlock 34: shacq 343247 exacq 1255 blk 271
PID 7110 lwlock 42: shacq 232599 exacq 1049 blk 262
PID 7111 lwlock 34: shacq 343398 exacq 926 blk 255
PID 7112 lwlock 42: shacq 232505 exacq 1262 blk 240
PID 7108 lwlock 42: shacq 233215 exacq 1539 blk 237
PID 7110 lwlock 43: shacq 438768 exacq 1026 blk 231
PID 7114 lwlock 43: shacq 438795 exacq 898 blk 230
PID 7113 lwlock 34: shacq 343283 exacq 832 blk 226
PID 7110 lwlock 34: shacq 343338 exacq 1074 blk 226
PID 7114 lwlock 0: shacq 0 exacq 14864 blk 219
PID 7112 lwlock 43: shacq 438691 exacq 1123 blk 215
PID 7111 lwlock 42: shacq 232645 exacq 885 blk 215
PID 7114 lwlock 34: shacq 343362 exacq 939 blk 214
PID 7111 lwlock 0: shacq 0 exacq 13638 blk 205
PID 7111 lwlock 43: shacq 438817 exacq 899 blk 200
PID 7114 lwlock 7: shacq 0 exacq 547 blk 193
PID 7113 lwlock 42: shacq 232600 exacq 860 blk 192
PID 7110 lwlock 0: shacq 0 exacq 16862 blk 191
PID 7114 lwlock 38: shacq 333266 exacq 977 blk 183
PID 7112 lwlock 38: shacq 333175 exacq 1246 blk 183
PID 7113 lwlock 7: shacq 0 exacq 548 blk 178
PID 7113 lwlock 43: shacq 438720 exacq 833 blk 178
PID 7115 lwlock 7: shacq 0 exacq 549 blk 177
PID 7108 lwlock 47: shacq 228682 exacq 1495 blk 177
PID 7108 lwlock 34: shacq 343982 exacq 1595 blk 177
PID 7112 lwlock 0: shacq 0 exacq 19538 blk 172
PID 7111 lwlock 7: shacq 0 exacq 549 blk 172
PID 7115 lwlock 17: shacq 0 exacq 9927 blk 167
PID 7109 lwlock 34: shacq 343410 exacq 348 blk 167
PID 7112 lwlock 7: shacq 0 exacq 548 blk 166
PID 7110 lwlock 38: shacq 333249 exacq 1013 blk 165
PID 7113 lwlock 38: shacq 333226 exacq 828 blk 163
PID 7110 lwlock 7: shacq 0 exacq 548 blk 162
PID 7109 lwlock 7: shacq 0 exacq 548 blk 161
PID 7109 lwlock 47: shacq 228097 exacq 386 blk 158
PID 7112 lwlock 45: shacq 130843 exacq 1201 blk 154
PID 7109 lwlock 43: shacq 438787 exacq 314 blk 153
PID 7111 lwlock 38: shacq 333306 exacq 882 blk 152
PID 7108 lwlock 43: shacq 439454 exacq 1614 blk 151
PID 7108 lwlock 17: shacq 0 exacq 23165 blk 147
PID 7108 lwlock 7: shacq 0 exacq 549 blk 145
PID 7113 lwlock 0: shacq 0 exacq 13394 blk 142
PID 7112 lwlock 46: shacq 132972 exacq 1196 blk 142
PID 7112 lwlock 36: shacq 204508 exacq 1180 blk 140
PID 7109 lwlock 42: shacq 232611 exacq 363 blk 140
PID 7115 lwlock 34: shacq 343582 exacq 600 blk 139
PID 7114 lwlock 45: shacq 130818 exacq 903 blk 139
PID 7115 lwlock 42: shacq 232846 exacq 517 blk 138
PID 7108 lwlock 38: shacq 333989 exacq 1704 blk 137
PID 7115 lwlock 43: shacq 438993 exacq 558 blk 126
PID 7115 lwlock 0: shacq 0 exacq 3161 blk 125
PID 7110 lwlock 40: shacq 129577 exacq 1021 blk 122
PID 7113 lwlock 45: shacq 130833 exacq 814 blk 120
PID 7111 lwlock 17: shacq 0 exacq 4607 blk 119
PID 7109 lwlock 0: shacq 0 exacq 5711 blk 119
PID 7110 lwlock 45: shacq 130865 exacq 1001 blk 117
PID 7109 lwlock 38: shacq 333340 exacq 322 blk 115
PID 7114 lwlock 46: shacq 132997 exacq 860 blk 112
PID 7113 lwlock 36: shacq 204566 exacq 833 blk 112
PID 7112 lwlock 40: shacq 129528 exacq 1234 blk 111
PID 7115 lwlock 47: shacq 228397 exacq 516 blk 109
PID 7114 lwlock 36: shacq 204634 exacq 930 blk 109
PID 7112 lwlock 41: shacq 133600 exacq 1136 blk 109
PID 7111 lwlock 36: shacq 204675 exacq 830 blk 108
PID 7108 lwlock 0: shacq 0 exacq 6504 blk 108
PID 7115 lwlock 38: shacq 333512 exacq 581 blk 106
PID 7110 lwlock 36: shacq 204546 exacq 1014 blk 106
PID 7113 lwlock 37: shacq 127040 exacq 790 blk 105
PID 7111 lwlock 45: shacq 130893 exacq 863 blk 103
PID 7114 lwlock 35: shacq 135342 exacq 879 blk 102
PID 7113 lwlock 40: shacq 129560 exacq 853 blk 101
PID 7110 lwlock 41: shacq 133697 exacq 1063 blk 101
PID 7112 lwlock 35: shacq 135257 exacq 1229 blk 98
PID 7110 lwlock 46: shacq 132990 exacq 1059 blk 98
PID 7112 lwlock 37: shacq 127029 exacq 1229 blk 97
PID 7114 lwlock 40: shacq 129597 exacq 913 blk 96
PID 7110 lwlock 17: shacq 0 exacq 2655 blk 96
PID 7109 lwlock 46: shacq 133014 exacq 326 blk 96
PID 7111 lwlock 40: shacq 129720 exacq 890 blk 95
PID 7111 lwlock 46: shacq 133057 exacq 876 blk 94
PID 7114 lwlock 33: shacq 127651 exacq 921 blk 92
PID 7112 lwlock 39: shacq 127288 exacq 1176 blk 91
PID 7111 lwlock 39: shacq 127438 exacq 891 blk 91
PID 7114 lwlock 37: shacq 127038 exacq 894 blk 89
PID 7113 lwlock 33: shacq 127602 exacq 803 blk 89
PID 7110 lwlock 37: shacq 127038 exacq 1085 blk 89
PID 7110 lwlock 35: shacq 135333 exacq 1016 blk 88
PID 7113 lwlock 46: shacq 132954 exacq 833 blk 87
PID 7111 lwlock 41: shacq 133778 exacq 821 blk 87
PID 7115 lwlock 36: shacq 204845 exacq 535 blk 86
PID 7112 lwlock 33: shacq 127551 exacq 1149 blk 86
PID 7109 lwlock 17: shacq 0 exacq 3463 blk 86
PID 7110 lwlock 910: shacq 91 exacq 91 blk 84
PID 7110 lwlock 39: shacq 127396 exacq 1068 blk 84
PID 7109 lwlock 36: shacq 204581 exacq 363 blk 83
PID 7113 lwlock 41: shacq 133648 exacq 765 blk 82
PID 7114 lwlock 41: shacq 133757 exacq 897 blk 79
PID 7114 lwlock 39: shacq 127429 exacq 923 blk 79
S
*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/
On Thu, May 24, 2012 at 6:26 PM, Sergey Koposov <koposov@ast.cam.ac.uk> wrote:
...
Here is the output from a multi-threaded run (8 threads, 22 seconds), sorted
by blk. Not sure whether that's of much use or not:
PID 7112 lwlock 48: shacq 1124394 exacq 1350 blk 1373
PID 7110 lwlock 48: shacq 1124460 exacq 1128 blk 1110
PID 7114 lwlock 48: shacq 1124502 exacq 1041 blk 976
PID 7111 lwlock 48: shacq 1124523 exacq 1009 blk 955
PID 7113 lwlock 48: shacq 1124383 exacq 868 blk 871
PID 7112 lwlock 44: shacq 1127148 exacq 1323 blk 838
...
These are all on the buffer partition locks. That makes sense... I was
wrong earlier: this case was in fact 'create table as', not 'insert
select', which rules out both the freelist lock and the WAL insert lock,
because create table as gets to use both a bulk-insert strategy and
WAL-avoiding logic (assuming the WAL level is not 'archive' or higher).
So, why aren't the lock partitions helping here?
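For context, the buffer-mapping partitioning in question looks like this
(from the 9.2-era headers, lightly paraphrased):

/* The shared buffer-mapping hash table is split into 16 partitions; a
 * page lookup takes the partition lock for its hash bucket in shared
 * mode (exclusive when the mapping changes).  Sixteen consecutive lock
 * ids dominating the stats above (33..48 here) is consistent with these
 * locks. */
#define NUM_BUFFER_PARTITIONS  16

#define BufTableHashPartition(hashcode) \
    ((hashcode) % NUM_BUFFER_PARTITIONS)
#define BufMappingPartitionLock(hashcode) \
    ((LWLockId) (FirstBufMappingLock + BufTableHashPartition(hashcode)))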
merlin
On Fri, 25 May 2012, Merlin Moncure wrote:
These are all on the buffer partition locks. That makes sense... I was
wrong earlier: this case was in fact 'create table as', not 'insert
select', which rules out both the freelist lock and the WAL insert lock,
because create table as gets to use both a bulk-insert strategy and
WAL-avoiding logic (assuming the WAL level is not 'archive' or higher).
So, why aren't the lock partitions helping here?
FYI the WAL level is the default, and here are the relevant changed params
from the .conf (autovacuum was disabled, but the data was vacuumed
manually):
max_connections = 100 # (change requires restart)
shared_buffers = 10GB # min 128kB
work_mem = 500MB # min 64kB
maintenance_work_mem = 500MB # min 1MB
synchronous_commit = off # synchronization level;
checkpoint_segments = 20 # in logfile segments, min 1, 16MB each
enable_hashjoin = off
enable_mergejoin = off
effective_cache_size = 10GB
autovacuum = off # Enable autovacuum subprocess? 'on'
Regards,
S
*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/