reducing the overhead of frequent table locks - now, with WIP patch

Started by Robert Haas, almost 15 years ago, 99 messages, pgsql-hackers
#1 Robert Haas
robertmhaas@gmail.com

I've now spent enough time working on this issue to be convinced
that the approach has merit, if we can work out the kinks. I'll start
with some performance numbers.

The case where the current system for taking table locks is really
hurting us is where we have a large number of backends attempting to
access a small number of relations. They all fight over the lock
manager lock on whichever partition (or partitions) that relation (or
those relations) fall in. Increasing the number of partitions doesn't
help, because they are all trying to access the same object, and that
object is only ever going to be in one partition. To exercise this
case, I chose the following benchmark: pgbench -n -S -T 300 -c 36 -j
36. I first tested this on my MacBook Pro, with scale factor 10 and
shared_buffers=400MB. Here are the results of alternating runs
without and with the patch:

tps = 23997.120971 (including connections establishing)
tps = 25003.186860 (including connections establishing)
tps = 23499.257892 (including connections establishing)
tps = 24435.793773 (including connections establishing)
tps = 23579.624360 (including connections establishing)
tps = 24791.974810 (including connections establishing)

As you can see, this works out to a bit more than a 4% improvement on
this two-core box. I also got access (thanks to Nate Boley) to a
24-core box and ran the same test with scale factor 100 and
shared_buffers=8GB. Here are the results of alternating runs without
and with the patch on that machine:

tps = 36291.996228 (including connections establishing)
tps = 129242.054578 (including connections establishing)
tps = 36704.393055 (including connections establishing)
tps = 128998.648106 (including connections establishing)
tps = 36531.208898 (including connections establishing)
tps = 131341.367344 (including connections establishing)

That's an improvement of about 3.5x. According to the vmstat output,
when running without the patch, the CPU state was about 40% idle.
With the patch, it dropped down to around 6%.
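The "only ever going to be in one partition" point above can be made concrete with a small sketch. This is simplified, hypothetical code (not PostgreSQL's actual hash or partitioning routines): because the partition is a pure function of the lock tag, raising the partition count never spreads out contention on a single hot table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A lock tag identifying one relation (simplified). */
typedef struct LockTag {
    uint32_t dboid;    /* database OID */
    uint32_t reloid;   /* relation OID */
} LockTag;

/* Stand-in for the lock manager's tag hash (FNV-1a over the tag bytes). */
uint32_t tag_hash(const LockTag *tag)
{
    uint32_t h = 2166136261u;
    const unsigned char *p = (const unsigned char *) tag;
    for (size_t i = 0; i < sizeof(LockTag); i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Which partition (and hence which partition LWLock) guards this tag. */
unsigned partition_for(const LockTag *tag, unsigned npartitions)
{
    return tag_hash(tag) % npartitions;
}

/* Every backend locking the same table computes the same partition,
 * whatever npartitions is; they all serialize on that one lock. */
int all_backends_collide(const LockTag *tag, unsigned npartitions,
                         int nbackends)
{
    unsigned p = partition_for(tag, npartitions);
    for (int b = 0; b < nbackends; b++)
        if (partition_for(tag, npartitions) != p)
            return 0;
    return 1;
}
```

With this, 36 backends hammering one table land on one partition lock whether there are 16 partitions or 1024, which is why the benchmark above stresses exactly this path.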

There are numerous problems with the code as it stands at this point.
It crashes if you try to use 2PC, which means the regression tests
fail; it probably does horrible things if you run out of shared
memory; pg_locks knows nothing about the new mechanism (arguably, we
could leave it that way: only locks that can't possibly be conflicting
with anything can be taken using this mechanism, but it would be nice
to fix, I think); and there are likely some other gotchas as well.
Still, the basic mechanism appears to work.

The code is attached, for anyone who may be curious. Known idiocies
are marked with "ZZZ". The design was discussed on the previous
thread ("reducing the overhead of frequent table locks"), q.v. There
are some comments in the patch as well, but more is likely needed.
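Since the design itself is only referenced here (see the earlier thread), a much-reduced, single-threaded sketch of the mechanism may help readers. All names are hypothetical and the real locking, fallback, and 2PC details are omitted: weak relation locks go into a small per-backend slot array guarded only by that backend's own lock, and a would-be strong locker first announces itself via a shared counter and then migrates conflicting slots into the shared lock table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FASTPATH_SLOTS 16
#define MAX_BACKENDS 8

typedef struct Backend {
    uint32_t fp_relid[FASTPATH_SLOTS];  /* 0 = slot free */
} Backend;

Backend backends[MAX_BACKENDS];
int strong_lock_counts;    /* shared: # of strong lockers about */
int shared_table_entries;  /* stand-in for the main lock table */

/* Weak lock (e.g. AccessShareLock): if no strong locker is around,
 * just record it in our own slot array; no shared lock traffic. */
bool lock_relation_weak(int backend, uint32_t relid)
{
    if (strong_lock_counts > 0)
        return false;              /* fall back to the main lock table */
    for (int i = 0; i < FASTPATH_SLOTS; i++)
        if (backends[backend].fp_relid[i] == 0) {
            backends[backend].fp_relid[i] = relid;
            return true;
        }
    return false;                  /* slots full: fall back */
}

/* Strong lock: announce ourselves, then sweep every backend's slots,
 * migrating conflicting entries into the shared table. */
void lock_relation_strong(uint32_t relid)
{
    strong_lock_counts++;          /* blocks new fast-path acquisitions */
    for (int b = 0; b < MAX_BACKENDS; b++)
        for (int i = 0; i < FASTPATH_SLOTS; i++)
            if (backends[b].fp_relid[i] == relid) {
                backends[b].fp_relid[i] = 0;
                shared_table_entries++;   /* migrated */
            }
}
```

The payoff is that the common case (many backends taking weak locks on the same tables) never touches a shared partition lock at all; the rare strong locker pays the cost of the sweep.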

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

fastlock-v1.patch (application/octet-stream), +779/-266
#2 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#1)
Re: reducing the overhead of frequent table locks - now, with WIP patch

Robert Haas <robertmhaas@gmail.com> wrote:

That's an improvement of about ~3.5x.

Outstanding!

I don't want to even peek at this until I've posted the two WIP SSI
patches (now both listed on the "Open Items" page), but will
definitely take a look after that.

-Kevin

#3 Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#2)
Re: reducing the overhead of frequent table locks - now, with WIP patch

On Fri, Jun 3, 2011 at 10:13 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

That's an improvement of about ~3.5x.

Outstanding!

I don't want to even peek at this until I've posted the two WIP SSI
patches (now both listed on the "Open Items" page), but will
definitely take a look after that.

Yeah, those SSI items are important to get nailed down RSN. But
thanks for your interest in this patch. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4 Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#1)
Re: reducing the overhead of frequent table locks - now, with WIP patch

On Fri, Jun 03, 2011 at 09:17:08AM -0400, Robert Haas wrote:

As you can see, this works out to a bit more than a 4% improvement on
this two-core box. I also got access (thanks to Nate Boley) to a
24-core box and ran the same test with scale factor 100 and
shared_buffers=8GB. Here are the results of alternating runs without
and with the patch on that machine:

tps = 36291.996228 (including connections establishing)
tps = 129242.054578 (including connections establishing)
tps = 36704.393055 (including connections establishing)
tps = 128998.648106 (including connections establishing)
tps = 36531.208898 (including connections establishing)
tps = 131341.367344 (including connections establishing)

Nice!

#5 Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#1)
Re: reducing the overhead of frequent table locks - now, with WIP patch

On Fri, Jun 3, 2011 at 2:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I've now spent enough time working on this issue now to be convinced
that the approach has merit, if we can work out the kinks.

Yes, the approach has merits and I'm sure we can work out the kinks.

As you can see, this works out to a bit more than a 4% improvement on
this two-core box.  I also got access (thanks to Nate Boley) to a
24-core box and ran the same test with scale factor 100 and
shared_buffers=8GB.  Here are the results of alternating runs without
and with the patch on that machine:

tps = 36291.996228 (including connections establishing)
tps = 129242.054578 (including connections establishing)
tps = 36704.393055 (including connections establishing)
tps = 128998.648106 (including connections establishing)
tps = 36531.208898 (including connections establishing)
tps = 131341.367344 (including connections establishing)

That's an improvement of about ~3.5x.  According to the vmstat output,
when running without the patch, the CPU state was about 40% idle.
With the patch, it dropped down to around 6%.

Congratulations. I believe that is realistic based upon my investigations.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#6 Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#5)
Re: reducing the overhead of frequent table locks - now, with WIP patch

On Sat, Jun 4, 2011 at 2:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

As you can see, this works out to a bit more than a 4% improvement on
this two-core box.  I also got access (thanks to Nate Boley) to a
24-core box and ran the same test with scale factor 100 and
shared_buffers=8GB.  Here are the results of alternating runs without
and with the patch on that machine:

tps = 36291.996228 (including connections establishing)
tps = 129242.054578 (including connections establishing)
tps = 36704.393055 (including connections establishing)
tps = 128998.648106 (including connections establishing)
tps = 36531.208898 (including connections establishing)
tps = 131341.367344 (including connections establishing)

That's an improvement of about ~3.5x.  According to the vmstat output,
when running without the patch, the CPU state was about 40% idle.
With the patch, it dropped down to around 6%.

Congratulations. I believe that is realistic based upon my investigations.

Tom,

You should look at this. It's good.

The approach looks sound to me. It's a fairly isolated patch and we
should be considering this for inclusion in 9.1, not wait another
year.

I will happily add that it's a completely different approach to the one
I'd been working on, and even more happily that it is so different from
the Oracle approach that we are definitely unencumbered by patent issues
here. Well done Robert, Noah.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#7 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#6)
Re: reducing the overhead of frequent table locks - now, with WIP patch

On 04.06.2011 18:01, Simon Riggs wrote:

It's a fairly isolated patch and we
should be considering this for inclusion in 9.1, not wait another
year.

-1

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#8 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#2)
Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs wrote:

we should be considering this for inclusion in 9.1, not wait
another year.

-1

I'm really happy that we're addressing the problems with scaling to
a large number of cores, and this patch sounds great. Adding a new
feature at this point in the release cycle would be horrible.
Frankly, from the tone of Robert's post, it probably wouldn't be
appropriate to include it in a release if it showed up in this
condition at the start of the last CF for that release.

The nice thing about annual releases is there's never one too far
away -- unless, of course, we hold a release up to squeeze in
"just one more" feature.

-Kevin

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#6)
Re: reducing the overhead of frequent table locks - now, with WIP patch

Simon Riggs <simon@2ndquadrant.com> writes:

The approach looks sound to me. It's a fairly isolated patch and we
should be considering this for inclusion in 9.1, not wait another
year.

That suggestion is completely insane. The patch is only WIP and full of
bugs, even according to its author. Even if it were solid, it is way
too late to be pushing such stuff into 9.1. We're trying to ship a
release, not find ways to cause it to slip more.

regards, tom lane

#10 Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Robert Haas (#1)
Re: reducing the overhead of frequent table locks - now, with WIP patch

On 06/03/2011 03:17 PM, Robert Haas wrote:
[...]

As you can see, this works out to a bit more than a 4% improvement on
this two-core box. I also got access (thanks to Nate Boley) to a
24-core box and ran the same test with scale factor 100 and
shared_buffers=8GB. Here are the results of alternating runs without
and with the patch on that machine:

tps = 36291.996228 (including connections establishing)
tps = 129242.054578 (including connections establishing)
tps = 36704.393055 (including connections establishing)
tps = 128998.648106 (including connections establishing)
tps = 36531.208898 (including connections establishing)
tps = 131341.367344 (including connections establishing)

That's an improvement of about ~3.5x. According to the vmstat output,
when running without the patch, the CPU state was about 40% idle.
With the patch, it dropped down to around 6%.

nice - but let's see on real hardware...

Testing this on a brand new E7-4850 4 Socket/10cores+HT Box - so 80
hardware threads:

first some numbers with -HEAD (-T 120; runtimes at lower -c counts have
fairly high variation in the results; the first number is the number of
connections/threads):

-j1: tps = 7928.965493 (including connections establishing)
-j8: tps = 53610.572347 (including connections establishing)
-j16: tps = 80835.446118 (including connections establishing)
-j32: tps = 75666.731883 (including connections establishing)
-j40: tps = 74628.568388 (including connections establishing)
-j64: tps = 68268.081973 (including connections establishing)
-c80: tps = 66704.216166 (including connections establishing)

postgresql is completely lock-limited in this test; anything beyond
around -j10 is basically not able to push the box below 80% IDLE(!)

and now with the patch applied:

-j1: tps = 7783.295587 (including connections establishing)
-j8: tps = 44361.661947 (including connections establishing)
-j16: tps = 92270.464541 (including connections establishing)
-j24: tps = 108259.524782 (including connections establishing)
-j32: tps = 183337.422612 (including connections establishing)
-j40: tps = 209616.052430 (including connections establishing)
-j48: tps = 229621.292382 (including connections establishing)
-j56: tps = 218690.391603 (including connections establishing)
-j64: tps = 188028.348501 (including connections establishing)
-j80: tps = 118814.741609 (including connections establishing)

so much better - but I still think there is some headroom left,
although pgbench itself is a CPU hog in these benchmarks, eating up to
10 cores in the worst case scenario - I will retest with sysbench, which
in the past showed more reasonable CPU usage for me.

and a profile (patched code) for the -j48 (aka fastest) case:

731535 11.8408 postgres s_lock
291878 4.7244 postgres LWLockAcquire
242373 3.9231 postgres AllocSetAlloc
239083 3.8698 postgres LWLockRelease
202341 3.2751 postgres SearchCatCache
190055 3.0763 postgres hash_search_with_hash_value
187148 3.0292 postgres base_yyparse
173265 2.8045 postgres GetSnapshotData
75700 1.2253 postgres core_yylex
74974 1.2135 postgres MemoryContextAllocZeroAligned
61404 0.9939 postgres _bt_compare
57529 0.9312 postgres MemoryContextAlloc

and one for the -j80 case (also patched):

485798 48.9667 postgres s_lock
60327 6.0808 postgres LWLockAcquire
57049 5.7503 postgres LWLockRelease
18357 1.8503 postgres hash_search_with_hash_value
17033 1.7169 postgres GetSnapshotData
14763 1.4881 postgres base_yyparse
14460 1.4575 postgres SearchCatCache
13975 1.4086 postgres AllocSetAlloc
6416 0.6467 postgres PinBuffer
5024 0.5064 postgres SIGetDataEntries
4704 0.4741 postgres core_yylex
4625 0.4662 postgres _bt_compare

Stefan

#11 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Stefan Kaltenbrunner (#10)
Re: reducing the overhead of frequent table locks - now, with WIP patch

On 05.06.2011 22:04, Stefan Kaltenbrunner wrote:

and one for the -j80 case(also patched).

485798 48.9667 postgres s_lock
60327 6.0808 postgres LWLockAcquire
57049 5.7503 postgres LWLockRelease
18357 1.8503 postgres hash_search_with_hash_value
17033 1.7169 postgres GetSnapshotData
14763 1.4881 postgres base_yyparse
14460 1.4575 postgres SearchCatCache
13975 1.4086 postgres AllocSetAlloc
6416 0.6467 postgres PinBuffer
5024 0.5064 postgres SIGetDataEntries
4704 0.4741 postgres core_yylex
4625 0.4662 postgres _bt_compare

Hmm, does that mean that it's spending 50% of the time spinning on a
spinlock? That's bad. It's one thing to be contended on a lock, and have
a lot of idle time because of that, but it's even worse to spend a lot
of time spinning because that CPU time won't be spent on doing more
useful work, even if there is some other process on the system that
could make use of that CPU time.

I like the overall improvement on the throughput, of course, but we have
to find a way to avoid the busy-wait.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#12 Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Heikki Linnakangas (#11)
Re: reducing the overhead of frequent table locks - now, with WIP patch

On 06/05/2011 09:12 PM, Heikki Linnakangas wrote:

On 05.06.2011 22:04, Stefan Kaltenbrunner wrote:

and one for the -j80 case(also patched).

485798 48.9667 postgres s_lock
60327 6.0808 postgres LWLockAcquire
57049 5.7503 postgres LWLockRelease
18357 1.8503 postgres hash_search_with_hash_value
17033 1.7169 postgres GetSnapshotData
14763 1.4881 postgres base_yyparse
14460 1.4575 postgres SearchCatCache
13975 1.4086 postgres AllocSetAlloc
6416 0.6467 postgres PinBuffer
5024 0.5064 postgres SIGetDataEntries
4704 0.4741 postgres core_yylex
4625 0.4662 postgres _bt_compare

Hmm, does that mean that it's spending 50% of the time spinning on a
spinlock? That's bad. It's one thing to be contended on a lock, and have
a lot of idle time because of that, but it's even worse to spend a lot
of time spinning because that CPU time won't be spent on doing more
useful work, even if there is some other process on the system that
could make use of that CPU time.

well yeah - we are broken right now with only being able to use ~20% of
CPU on a modern mid-range box, but using 80% CPU (or 4x like in the
above case) and only getting less than 2x the performance seems wrong as
well. I also wonder if we are still missing something fundamental -
because even with the current patch we are quite far away from linear
scaling and light-years from some of our competitors...

Stefan

#13 Robert Haas
robertmhaas@gmail.com
In reply to: Stefan Kaltenbrunner (#12)
Re: reducing the overhead of frequent table locks - now, with WIP patch

On Sun, Jun 5, 2011 at 4:01 PM, Stefan Kaltenbrunner
<stefan@kaltenbrunner.cc> wrote:

On 06/05/2011 09:12 PM, Heikki Linnakangas wrote:

On 05.06.2011 22:04, Stefan Kaltenbrunner wrote:

and one for the -j80 case(also patched).

485798   48.9667  postgres                 s_lock
60327     6.0808  postgres                 LWLockAcquire
57049     5.7503  postgres                 LWLockRelease
18357     1.8503  postgres                 hash_search_with_hash_value
17033     1.7169  postgres                 GetSnapshotData
14763     1.4881  postgres                 base_yyparse
14460     1.4575  postgres                 SearchCatCache
13975     1.4086  postgres                 AllocSetAlloc
6416      0.6467  postgres                 PinBuffer
5024      0.5064  postgres                 SIGetDataEntries
4704      0.4741  postgres                 core_yylex
4625      0.4662  postgres                 _bt_compare

Hmm, does that mean that it's spending 50% of the time spinning on a
spinlock? That's bad. It's one thing to be contended on a lock, and have
a lot of idle time because of that, but it's even worse to spend a lot
of time spinning because that CPU time won't be spent on doing more
useful work, even if there is some other process on the system that
could make use of that CPU time.

well yeah - we are broken right now with only being able to use ~20% of
CPU on a modern mid-range box, but using 80% CPU (or 4x like in the
above case) and only getting less than 2x the performance seems wrong as
well. I also wonder if we are still missing something fundamental -
because even with the current patch we are quite far away from linear
scaling and light-years from some of our competitors...

Could you compile with LWLOCK_STATS, rerun these tests, total up the
"blk" numbers by LWLockId, and post the results? (Actually, totalling
up the shacq and exacq numbers would be useful as well, if you
wouldn't mind.)

Unless I very much miss my guess, we're going to see zero contention
on the new structures introduced by this patch. Rather, I suspect
what we're going to find is that, with the hideous contention on one
particular lock manager partition lock removed, there's a more
spread-out contention problem, likely involving the lock manager
partition lock, the buffer mapping locks, and possibly other LWLocks
as well. The fact that the system is busy-waiting rather than just
not using the CPU at all probably means that the remaining contention
is more spread out than that which is removed by this patch. We don't
actually have everything pile up on a single LWLock (as happens in git
master), but we do spend a lot of time fighting cache lines away from
other CPUs. Or at any rate, that's my guess: we need some real
numbers to know for sure.
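The totalling requested above is mechanical; here is a sketch of the aggregation, assuming the simplified "lwlock N: shacq A exacq B blk C" line format that appears later in this thread (real LWLOCK_STATS output from each backend may carry extra prefixes, so this parser is illustrative, not definitive):

```c
#include <assert.h>
#include <stdio.h>

typedef struct LWStat { long id, shacq, exacq, blk; } LWStat;

LWStat lwstats[4096];   /* aggregated totals, one entry per LWLockId */
int nlwstats;

/* Feed one line of per-backend stats output; totals by lock id. */
void accumulate(const char *line)
{
    long id, shacq, exacq, blk;
    if (sscanf(line, "lwlock %ld: shacq %ld exacq %ld blk %ld",
               &id, &shacq, &exacq, &blk) != 4)
        return;                          /* not a stats line; skip it */
    for (int i = 0; i < nlwstats; i++)
        if (lwstats[i].id == id) {       /* seen before: add it in */
            lwstats[i].shacq += shacq;
            lwstats[i].exacq += exacq;
            lwstats[i].blk   += blk;
            return;
        }
    lwstats[nlwstats].id = id;           /* first sighting of this id */
    lwstats[nlwstats].shacq = shacq;
    lwstats[nlwstats].exacq = exacq;
    lwstats[nlwstats].blk = blk;
    nlwstats++;
}
```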

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14 Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#13)
Re: reducing the overhead of frequent table locks - now, with WIP patch

On Sun, Jun 5, 2011 at 5:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Could you compile with LWLOCK_STATS, rerun these tests, total up the
"blk" numbers by LWLockId, and post the results?  (Actually, totalling
up the shacq and exacq numbers would be useful as well, if you
wouldn't mind.)

I did this on the loaner 24-core box from Nate Boley and got the
following results. This is just the LWLocks that had blk>0.

lwlock 0: shacq 0 exacq 200625 blk 24044
lwlock 4: shacq 80101430 exacq 196 blk 28
lwlock 33: shacq 8333673 exacq 11977 blk 864
lwlock 34: shacq 7092293 exacq 11890 blk 803
lwlock 35: shacq 7893875 exacq 11909 blk 848
lwlock 36: shacq 7567514 exacq 11912 blk 830
lwlock 37: shacq 7427774 exacq 11930 blk 745
lwlock 38: shacq 7120108 exacq 11989 blk 853
lwlock 39: shacq 7584952 exacq 11982 blk 782
lwlock 40: shacq 7949867 exacq 12056 blk 821
lwlock 41: shacq 6612240 exacq 11929 blk 746
lwlock 42: shacq 47512112 exacq 11844 blk 4503
lwlock 43: shacq 7943511 exacq 11871 blk 878
lwlock 44: shacq 7534558 exacq 12033 blk 800
lwlock 45: shacq 7128256 exacq 12045 blk 856
lwlock 46: shacq 7575339 exacq 12015 blk 818
lwlock 47: shacq 6745173 exacq 12094 blk 806
lwlock 48: shacq 8410348 exacq 12104 blk 977
lwlock 49: shacq 0 exacq 5007594 blk 172533
lwlock 50: shacq 0 exacq 5011704 blk 172282
lwlock 51: shacq 0 exacq 5003356 blk 172802
lwlock 52: shacq 0 exacq 5009020 blk 174648
lwlock 53: shacq 0 exacq 5010808 blk 172080
lwlock 54: shacq 0 exacq 5004908 blk 169934
lwlock 55: shacq 0 exacq 5009324 blk 170281
lwlock 56: shacq 0 exacq 5005904 blk 171001
lwlock 57: shacq 0 exacq 5006984 blk 169942
lwlock 58: shacq 0 exacq 5000346 blk 170001
lwlock 59: shacq 0 exacq 5004884 blk 170484
lwlock 60: shacq 0 exacq 5006304 blk 171325
lwlock 61: shacq 0 exacq 5008421 blk 170866
lwlock 62: shacq 0 exacq 5008162 blk 170868
lwlock 63: shacq 0 exacq 5002238 blk 170291
lwlock 64: shacq 0 exacq 5005348 blk 169764
lwlock 307: shacq 0 exacq 2 blk 1
lwlock 315: shacq 0 exacq 3 blk 2
lwlock 337: shacq 0 exacq 4 blk 3
lwlock 345: shacq 0 exacq 2 blk 1
lwlock 349: shacq 0 exacq 2 blk 1
lwlock 231251: shacq 0 exacq 2 blk 1
lwlock 253831: shacq 0 exacq 2 blk 1

So basically, even with the patch, at 24 cores the lock manager locks
are still under tremendous pressure. But note that there's a big
difference between what's happening here and what's happening without
the patch. Here's without the patch:

lwlock 0: shacq 0 exacq 191613 blk 17591
lwlock 4: shacq 21543085 exacq 102 blk 20
lwlock 33: shacq 2237938 exacq 11976 blk 463
lwlock 34: shacq 1907344 exacq 11890 blk 458
lwlock 35: shacq 2125308 exacq 11908 blk 442
lwlock 36: shacq 2038220 exacq 11912 blk 430
lwlock 37: shacq 1998059 exacq 11927 blk 449
lwlock 38: shacq 1916179 exacq 11953 blk 409
lwlock 39: shacq 2042173 exacq 12019 blk 479
lwlock 40: shacq 2140002 exacq 12056 blk 448
lwlock 41: shacq 1776772 exacq 11928 blk 392
lwlock 42: shacq 12777368 exacq 11842 blk 2451
lwlock 43: shacq 2132240 exacq 11869 blk 478
lwlock 44: shacq 2026845 exacq 12031 blk 446
lwlock 45: shacq 1918618 exacq 12045 blk 449
lwlock 46: shacq 2038437 exacq 12011 blk 472
lwlock 47: shacq 1814660 exacq 12089 blk 401
lwlock 48: shacq 2261208 exacq 12105 blk 478
lwlock 49: shacq 0 exacq 1347524 blk 17020
lwlock 50: shacq 0 exacq 1350678 blk 16888
lwlock 51: shacq 0 exacq 1346260 blk 16744
lwlock 52: shacq 0 exacq 1348432 blk 16864
lwlock 53: shacq 0 exacq 22216779 blk 4914363
lwlock 54: shacq 0 exacq 22217309 blk 4525381
lwlock 55: shacq 0 exacq 1348406 blk 13438
lwlock 56: shacq 0 exacq 1345996 blk 13299
lwlock 57: shacq 0 exacq 1347890 blk 13654
lwlock 58: shacq 0 exacq 1343486 blk 13349
lwlock 59: shacq 0 exacq 1346198 blk 13471
lwlock 60: shacq 0 exacq 1346236 blk 13532
lwlock 61: shacq 0 exacq 1343688 blk 13547
lwlock 62: shacq 0 exacq 1350068 blk 13614
lwlock 63: shacq 0 exacq 1345302 blk 13420
lwlock 64: shacq 0 exacq 1348858 blk 13635
lwlock 321: shacq 0 exacq 2 blk 1
lwlock 329: shacq 0 exacq 4 blk 3
lwlock 337: shacq 0 exacq 6 blk 4
lwlock 347: shacq 0 exacq 5 blk 4
lwlock 357: shacq 0 exacq 3 blk 2
lwlock 363: shacq 0 exacq 3 blk 2
lwlock 369: shacq 0 exacq 4 blk 3
lwlock 379: shacq 0 exacq 2 blk 1
lwlock 383: shacq 0 exacq 2 blk 1
lwlock 445: shacq 0 exacq 2 blk 1
lwlock 449: shacq 0 exacq 2 blk 1
lwlock 451: shacq 0 exacq 2 blk 1
lwlock 1023: shacq 0 exacq 2 blk 1
lwlock 11401: shacq 0 exacq 2 blk 1
lwlock 115591: shacq 0 exacq 2 blk 1
lwlock 117177: shacq 0 exacq 2 blk 1
lwlock 362839: shacq 0 exacq 2 blk 1

In the unpatched case, two lock manager locks are getting beaten to
death, and the others all about equally contended. By eliminating the
portion of the lock manager contention that pertains specifically to
the two heavily trafficked locks, system throughput improves by about
3.5x - and, not surprisingly, traffic on the lock manager locks
increases by approximately the same multiple. Those locks now become
the contention bottleneck, with about 12x the blocking they had
pre-patch. I'm definitely interested in investigating what to do
about that, but I don't think it's this patch's problem to fix all of
our lock manager bottlenecks. Another thing to note is that
pre-patch, the two really badly contented LWLocks were blocking about
22% of the time; post-patch, all of the lock manager locks are
blocking about 3.4% of the time. That's certainly not great, but it's
progress.
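The blocking percentages quoted above come straight from the blk/exacq ratios in the two dumps; as a quick check:

```c
#include <assert.h>

/* Fraction of lock acquisitions that had to block. */
double blocking_fraction(long blk, long exacq)
{
    return (double) blk / (double) exacq;
}

/* Pre-patch, the two hot lock manager partitions (lwlocks 53 and 54):
 * 4914363 / 22216779 ~= 0.221 and 4525381 / 22217309 ~= 0.204.
 * Post-patch, a typical partition (lwlock 49):
 * 172533 / 5007594 ~= 0.034. */
```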

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15 Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#14)
Re: reducing the overhead of frequent table locks - now, with WIP patch

On Sun, Jun 5, 2011 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I'm definitely interested in investigating what to do
about that, but I don't think it's this patch's problem to fix all of
our lock manager bottlenecks.

I did some further investigation of this. It appears that more than
99% of the lock manager lwlock traffic that remains with this patch
applied has locktag_type == LOCKTAG_VIRTUALTRANSACTION. Every SELECT
statement runs in a separate transaction, and for each new transaction
we run VirtualXactLockTableInsert(), which takes a lock on the vxid of
that transaction, so that other processes can wait for it. That
requires acquiring and releasing a lock manager partition lock, and we
have to do the same thing a moment later at transaction end to dump
the lock.

A quick grep seems to indicate that the only places where we actually
make use of those VXID locks are in DefineIndex(), when CREATE INDEX
CONCURRENTLY is in use, and during Hot Standby, when max_standby_delay
expires. Considering that these are not commonplace events, it seems
tremendously wasteful to incur the overhead for every transaction. It
might be possible to make the lock entry spring into existence "on
demand" - i.e. if a backend wants to wait on a vxid entry, it creates
the LOCK and PROCLOCK objects for that vxid. That presents a few
synchronization challenges, plus we have to make sure that the
backend that's just been "given" a lock knows that it needs to release
it, but those seem like they might be manageable problems, especially
given the new infrastructure introduced by the current patch, which
already has to deal with some of those issues. I'll look into this
further.
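A reduced sketch of the "spring into existence on demand" idea described above (hypothetical names, not PostgreSQL code): the owner merely advertises its vxid in shared memory, only a waiter materializes a real lock-table entry, and a flag tells the owner it must clean that entry up at transaction end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct Proc {
    uint64_t vxid;               /* 0 = no transaction in progress */
    bool vxid_lock_in_table;     /* a waiter created a real lock entry */
} Proc;

int lock_table_entries;          /* stand-in for the shared lock table */

/* Transaction start: no lock-manager traffic at all. */
void start_xact(Proc *p, uint64_t vxid)
{
    p->vxid = vxid;
    p->vxid_lock_in_table = false;
}

/* Waiter path (rare: CREATE INDEX CONCURRENTLY, Hot Standby conflict):
 * if the vxid is still running, create the lock entry on the owner's
 * behalf. Returns true if there is nothing to wait for. */
bool wait_for_vxid(Proc *owner, uint64_t vxid)
{
    if (owner->vxid != vxid)
        return true;             /* already gone; nothing to wait for */
    if (!owner->vxid_lock_in_table) {
        lock_table_entries++;    /* materialize the ExclusiveLock entry */
        owner->vxid_lock_in_table = true;
    }
    return false;                /* caller now blocks on that entry */
}

/* Transaction end: touch the lock table only if a waiter forced us to. */
void end_xact(Proc *p)
{
    if (p->vxid_lock_in_table)
        lock_table_entries--;    /* release the materialized entry */
    p->vxid = 0;
    p->vxid_lock_in_table = false;
}
```

The synchronization hand-off (waiter racing against transaction end) is exactly the hard part the mail describes; this sketch ignores it.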

It's likely that if we lick this problem, the BufFreelistLock and
BufMappingLocks are going to be the next hot spot. Of course, we're
ignoring the ten-thousand pound gorilla in the corner, which is that
on write workloads we have a pretty bad contention problem with
WALInsertLock, which I fear will not be so easily addressed. But one
problem at a time, I guess.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#15)
Re: reducing the overhead of frequent table locks - now, with WIP patch

On 06.06.2011 07:12, Robert Haas wrote:

I did some further investigation of this. It appears that more than
99% of the lock manager lwlock traffic that remains with this patch
applied has locktag_type == LOCKTAG_VIRTUALTRANSACTION. Every SELECT
statement runs in a separate transaction, and for each new transaction
we run VirtualXactLockTableInsert(), which takes a lock on the vxid of
that transaction, so that other processes can wait for it. That
requires acquiring and releasing a lock manager partition lock, and we
have to do the same thing a moment later at transaction end to dump
the lock.

A quick grep seems to indicate that the only places where we actually
make use of those VXID locks are in DefineIndex(), when CREATE INDEX
CONCURRENTLY is in use, and during Hot Standby, when max_standby_delay
expires. Considering that these are not commonplace events, it seems
tremendously wasteful to incur the overhead for every transaction. It
might be possible to make the lock entry spring into existence "on
demand" - i.e. if a backend wants to wait on a vxid entry, it creates
the LOCK and PROCLOCK objects for that vxid. That presents a few
synchronization challenges, and plus we have to make sure that the
backend that's just been "given" a lock knows that it needs to release
it, but those seem like they might be manageable problems, especially
given the new infrastructure introduced by the current patch, which
already has to deal with some of those issues. I'll look into this
further.

Ah, I remember I saw that vxid lock pop up quite high in an oprofile
profile recently. I think it was the case of executing a lot of very
simple prepared queries. So it would be nice to address that, even from
a single CPU point of view.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#17 Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#9)
Re: reducing the overhead of frequent table locks - now, with WIP patch

On Sat, Jun 4, 2011 at 5:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

The approach looks sound to me. It's a fairly isolated patch and we
should be considering this for inclusion in 9.1, not wait another
year.

That suggestion is completely insane.  The patch is only WIP and full of
bugs, even according to its author.  Even if it were solid, it is way
too late to be pushing such stuff into 9.1.  We're trying to ship a
release, not find ways to cause it to slip more.

In 8.3, you implemented virtual transactionids days before we produced
a Release Candidate, against my recommendation.

At that time, I didn't start questioning your sanity. In fact we all
applauded that because it was a great performance gain.

The fact that you disagree with me does not make me insane. Inaction
on this point, resulting in a year's delay, will be considered to be a
gross waste by the majority of objective observers.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#18 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#17)
Re: reducing the overhead of frequent table locks - now, with WIP patch

On 06.06.2011 12:40, Simon Riggs wrote:

On Sat, Jun 4, 2011 at 5:55 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Simon Riggs<simon@2ndquadrant.com> writes:

The approach looks sound to me. It's a fairly isolated patch and we
should be considering this for inclusion in 9.1, not wait another
year.

That suggestion is completely insane. The patch is only WIP and full of
bugs, even according to its author. Even if it were solid, it is way
too late to be pushing such stuff into 9.1. We're trying to ship a
release, not find ways to cause it to slip more.

In 8.3, you implemented virtual transactionids days before we produced
a Release Candidate, against my recommendation.

FWIW, this bottleneck was not introduced by the introduction of virtual
transaction ids. Before that patch, we just took the lock on the real
transaction id instead.

The fact that you disagree with me does not make me insane.

You are not insane, even if your suggestion is.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#19 Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#16)
Re: reducing the overhead of frequent table locks - now, with WIP patch

On Mon, Jun 6, 2011 at 2:54 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Ah, I remember I saw that vxid lock pop up quite high in an oprofile profile
recently. I think it was the case of executing a lot of very simple prepared
queries. So it would be nice to address that, even from a single CPU point
of view.

It doesn't seem too hard to do, although I have to think about the
details. Even though the VXID locks involved are Exclusive locks,
they are actually very much like the "weak" locks that the current
patch accelerates, because the Exclusive lock is taken only by the
VXID owner, and it can therefore be safely assumed that the initial
lock acquisition won't block anything. Therefore, there's really no
need to touch the primary lock table at transaction start, and at
transaction end it need be touched only if someone's waiting. However, there's a
fly in the ointment: when someone tries to ShareLock a VXID, we need
to determine whether that VXID is still around and, if so, make an
Exclusive lock entry for it in the primary lock table. And, unlike
what I'm doing for strong relation locks, it's probably NOT acceptable
for that to acquire and release every per-backend LWLock, because
every place that waits for VXID locks waits for a list of locks in
sequence, so we could end up with O(n^2) behavior. Now, in theory
that's not a huge problem: the VXID includes the backend ID, so we
ought to be able to figure out which single per-backend LWLock is of
interest and just acquire/release that one. Unfortunately, it appears
that there's no easy way to go from a backend ID to a PGPROC. The
backend IDs are offsets into the "ProcState" array, so they give us a
pointer to the backend's sinval state, not its PGPROC. And while the
PGPROC has a pointer to the sinval info, there's no pointer in the
opposite direction. Even if there were, we'd probably need to hold
SInvalWriteLock in shared mode to follow it.
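The missing reverse mapping described above could, in principle, be a simple shared array indexed by backend ID. The sketch below is hypothetical, not PostgreSQL code: `MAX_BACKENDS`, `register_backend`, and `lookup_proc` are invented names standing in for `MaxBackends` and whatever registration hook (presumably where sinval assigns the backend ID) would maintain such a table. The point is only that a waiter could then go from the backend ID embedded in a VirtualTransactionId straight to the owner's PGPROC, and hence to its single per-backend LWLock, avoiding the O(n^2) sweep over all backends.

```c
#include <stddef.h>

/* Hypothetical sketch: a shared array mapping backend IDs to PGPROC
 * pointers, so a waiter can find the vxid owner's per-backend lock data
 * directly instead of following sinval state under SInvalWriteLock. */

#define MAX_BACKENDS 64          /* stand-in for MaxBackends */

typedef struct PGPROC PGPROC;    /* opaque here; the real struct is in proc.h */

static PGPROC *backend_id_to_proc[MAX_BACKENDS + 2];  /* backend IDs are 1-based */

/* Would be called where the backend's ID is assigned at startup. */
static void
register_backend(int backend_id, PGPROC *proc)
{
    if (backend_id >= 1 && backend_id <= MAX_BACKENDS)
        backend_id_to_proc[backend_id] = proc;
}

/* Waiter side: backend ID (from the vxid) -> PGPROC, or NULL if invalid. */
static PGPROC *
lookup_proc(int backend_id)
{
    if (backend_id < 1 || backend_id > MAX_BACKENDS)
        return NULL;
    return backend_id_to_proc[backend_id];
}
```

Any real version would also have to handle the slot being reused by a different backend between lookup and lock acquisition, which is part of what makes the synchronization grotty.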

That might not be the end of the world, since VXID locks are fairly
infrequently used, but it's certainly a little grotty. I do rather
wonder if we should be trying to reduce the number of separate places
where we list the running processes. We have arrays of PGPROC
structures, and then we have one set of pointers to PGPROCs in the
ProcArray, and then we have the ProcState structures for sinval. I
wonder if there's some way to rearrange all this to simplify the
bookkeeping.

BTW, how do you identify from oprofile that *vxid* locks were the
problem? I didn't think it could produce that level of detail.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#15)
Re: reducing the overhead of frequent table locks - now, with WIP patch

On 06.06.2011 07:12, Robert Haas wrote:

I did some further investigation of this. It appears that more than
99% of the lock manager lwlock traffic that remains with this patch
applied has locktag_type == LOCKTAG_VIRTUALTRANSACTION. Every SELECT
statement runs in a separate transaction, and for each new transaction
we run VirtualXactLockTableInsert(), which takes a lock on the vxid of
that transaction, so that other processes can wait for it. That
requires acquiring and releasing a lock manager partition lock, and we
have to do the same thing a moment later at transaction end to dump
the lock.
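For readers unfamiliar with the partitioned lock manager: each lock tag hashes to exactly one of the lock-manager partitions, so every acquisition and release of the same tag serializes on the same partition LWLock. The sketch below is a toy illustration, not PostgreSQL code; the real implementation hashes the full LOCKTAG with `hash_any()`, and `toy_hash` here is an invented stand-in. It just demonstrates why adding partitions can't help when one tag is hot: the mapping is deterministic.

```c
#include <stdint.h>

/* Toy illustration of lock-tag -> partition mapping.  All acquirers of
 * the same tag always land on the same partition lock, so a hot tag
 * (one relation, or one vxid per transaction) serializes on one LWLock
 * no matter how many partitions exist. */

#define NUM_LOCK_PARTITIONS 16   /* PostgreSQL's default is also 16 */

/* Stand-in for hashing a vxid lock tag (backendId, localTransactionId). */
static uint32_t
toy_hash(uint32_t backend_id, uint32_t local_xid)
{
    uint32_t h = backend_id * 2654435761u ^ local_xid * 40503u;
    h ^= h >> 16;
    return h;
}

static int
lock_hash_partition(uint32_t hashcode)
{
    return (int) (hashcode % NUM_LOCK_PARTITIONS);
}
```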

A quick grep seems to indicate that the only places where we actually
make use of those VXID locks are in DefineIndex(), when CREATE INDEX
CONCURRENTLY is in use, and during Hot Standby, when max_standby_delay
expires. Considering that these are not commonplace events, it seems
tremendously wasteful to incur the overhead for every transaction. It
might be possible to make the lock entry spring into existence "on
demand" - i.e. if a backend wants to wait on a vxid entry, it creates
the LOCK and PROCLOCK objects for that vxid. That presents a few
synchronization challenges, and plus we have to make sure that the
backend that's just been "given" a lock knows that it needs to release
it, but those seem like they might be manageable problems, especially
given the new infrastructure introduced by the current patch, which
already has to deal with some of those issues. I'll look into this
further.
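The "spring into existence on demand" idea quoted above might be sketched as a flag-based handoff: the waiter materializes the shared lock entry on the owner's behalf and marks the owner as now holding a lock it must release. Everything below is hypothetical, with invented names (`BackendLockState`, `vxid_lock_materialized`); a real version would need the per-backend LWLock to make the handoff race-free against the owner committing concurrently.

```c
#include <stdbool.h>

/* Hypothetical sketch of the on-demand vxid lock entry.  The common case
 * (no waiter) never touches the primary lock table at all. */

typedef struct
{
    bool vxid_lock_materialized;  /* set by a waiter, read by the owner */
} BackendLockState;

/* Waiter side: create LOCK/PROCLOCK entries for the owner's vxid in the
 * primary lock table, then tell the owner it has been "given" a lock. */
static void
waiter_materialize_vxid_lock(BackendLockState *owner)
{
    /* ... insert LOCK and PROCLOCK entries on the owner's behalf ... */
    owner->vxid_lock_materialized = true;
}

/* Owner side, at transaction end: release only if a waiter materialized
 * the entry; otherwise take the fast path.  Returns true if a real
 * lock-table release happened. */
static bool
owner_end_of_xact(BackendLockState *self)
{
    if (self->vxid_lock_materialized)
    {
        /* ... release the shared-memory lock entry, waking waiters ... */
        self->vxid_lock_materialized = false;
        return true;
    }
    return false;
}
```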

At the moment, the transaction with given vxid acquires an ExclusiveLock
on the vxid, and anyone who wants to wait for it to finish acquires a
ShareLock. If we simply reverse that, so that the transaction itself
takes ShareLock, and anyone wanting to wait on it takes an ExclusiveLock,
will this fastlock patch bust this bottleneck too?
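The intuition behind swapping the modes can be seen from the standard two-mode conflict matrix (a simplified sketch, not PostgreSQL's full eight-mode table): Share conflicts only with Exclusive, so if the vxid owner takes Share, its acquisition never conflicts in the common uncontended case, which is exactly the property the fast path exploits for weak relation locks. Waiters taking Exclusive would, however, also conflict with each other, unlike today's multiple ShareLock waiters.

```c
#include <stdbool.h>

/* Simplified two-mode conflict matrix.  Share/Share is the only
 * non-conflicting combination, so an owner taking Share behaves like a
 * "weak" lock: it blocks nothing unless an Exclusive waiter appears. */

typedef enum { SHARE, EXCLUSIVE } LockMode;

static bool
modes_conflict(LockMode a, LockMode b)
{
    return !(a == SHARE && b == SHARE);
}
```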

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#21Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#20)
#22Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#19)
#23Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#1)
#24Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#18)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#24)
#26Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#25)
#27Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#26)
#28Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#1)
#29Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Robert Haas (#25)
#30Dave Page
dpage@pgadmin.org
In reply to: Dimitri Fontaine (#29)
#31Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Dave Page (#30)
#32Stephen Frost
sfrost@snowman.net
In reply to: Dave Page (#30)
#33Dave Page
dpage@pgadmin.org
In reply to: Stefan Kaltenbrunner (#31)
#34Josh Berkus
josh@agliodbs.com
In reply to: Dimitri Fontaine (#29)
#35Andrew Dunstan
andrew@dunslane.net
In reply to: Dave Page (#30)
#36Dave Page
dpage@pgadmin.org
In reply to: Stephen Frost (#32)
#37Jignesh K. Shah
J.K.Shah@Sun.COM
In reply to: Josh Berkus (#28)
#38Chris Browne
cbbrowne@acm.org
In reply to: Simon Riggs (#27)
#39Robert Haas
robertmhaas@gmail.com
In reply to: Chris Browne (#38)
#40Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Stephen Frost (#32)
#41Simon Riggs
simon@2ndQuadrant.com
In reply to: Dave Page (#36)
#42Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dimitri Fontaine (#29)
#43Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#1)
#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dave Page (#36)
#45Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#43)
#46Stephen Frost
sfrost@snowman.net
In reply to: Simon Riggs (#41)
#47Jignesh K. Shah
J.K.Shah@Sun.COM
In reply to: Josh Berkus (#28)
#48Dave Page
dpage@pgadmin.org
In reply to: Tom Lane (#44)
#49Stephen Frost
sfrost@snowman.net
In reply to: Alvaro Herrera (#42)
#50Joshua D. Drake
jd@commandprompt.com
In reply to: Robert Haas (#45)
#51Simon Riggs
simon@2ndQuadrant.com
In reply to: Dave Page (#33)
#52Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#51)
#53Robert Haas
robertmhaas@gmail.com
In reply to: Joshua D. Drake (#50)
#54Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#51)
#55Josh Berkus
josh@agliodbs.com
In reply to: Dave Page (#48)
#56Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#53)
#57Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#53)
#58Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#55)
#59Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#57)
#60Thom Brown
thom@linux.com
In reply to: Tom Lane (#58)
#61Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#56)
#62Robert Haas
robertmhaas@gmail.com
In reply to: Thom Brown (#60)
#63Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#59)
#64Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#59)
#65Stephen Frost
sfrost@snowman.net
In reply to: Simon Riggs (#64)
#66Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Simon Riggs (#64)
#67Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#64)
#68Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#64)
#69Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#68)
#70Jignesh K. Shah
J.K.Shah@Sun.COM
In reply to: Jignesh K. Shah (#47)
#71Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#61)
#72Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#69)
#73Robert Haas
robertmhaas@gmail.com
In reply to: Jignesh K. Shah (#70)
#74Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#72)
#75Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#51)
#76Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#75)
#77Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#76)
#78Josh Berkus
josh@agliodbs.com
In reply to: Simon Riggs (#74)
#79Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#62)
#80Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#25)
#81Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#80)
#82Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#80)
#83Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Stephen Frost (#49)
#84Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#80)
#85Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#83)
#86Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#82)
#87Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#81)
#88Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#84)
#89Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#85)
#90Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#88)
#91Josh Berkus
josh@agliodbs.com
In reply to: Simon Riggs (#90)
#92Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#68)
#93Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#90)
#94Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#86)
#95Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#91)
#96Simon Riggs
simon@2ndQuadrant.com
In reply to: Josh Berkus (#91)
#97Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#96)
#98Dave Page
dpage@pgadmin.org
In reply to: Robert Haas (#97)
#99Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Dave Page (#98)