Patch: fix lock contention for HASHHDR.mutex

Started by Aleksander Alekseev · over 10 years ago · 79 messages · pgsql-hackers
#1Aleksander Alekseev
aleksander@timescale.com

Hello all,

Consider the following stack trace:

(gdb) bt
#0 0x00007f77c81fae87 in semop () syscall-template.S:81
#1 0x000000000063b721 in PGSemaphoreLock pg_sema.c:387
#2 0x0000000000692e1b in LWLockAcquire lwlock.c:1026
#3 0x000000000068d4c5 in LockAcquireExtended lock.c:881
#4 0x000000000068dcc1 in LockAcquire lock.c:672
#5 0x000000000068b7a8 in LockRelationOid lmgr.c:112
#6 0x0000000000501d18 in find_inheritance_children pg_inherits.c:120
#7 0x0000000000501d80 in find_all_inheritors pg_inherits.c:182
#8 0x000000000062db8d in expand_inherited_rtentry prepunion.c:1285
#9 expand_inherited_tables prepunion.c:1212
#10 0x0000000000622705 in subquery_planner planner.c:501
#11 0x0000000000622d31 in standard_planner planner.c:285
#12 0x000000000069ef0c in pg_plan_query postgres.c:809
#13 0x000000000069f004 in pg_plan_queries postgres.c:868
#14 0x00000000006a0fc2 in exec_simple_query postgres.c:1033
#15 PostgresMain postgres.c:4032
#16 0x0000000000467479 in BackendRun postmaster.c:4237
#17 BackendStartup postmaster.c:3913
#18 ServerLoop () postmaster.c:1684
#19 0x000000000064c828 in PostmasterMain postmaster.c:1292
#20 0x0000000000467f3e in main main.c:223

Turns out PostgreSQL can spend a lot of time waiting for a lock in this
particular place, especially if you are running PostgreSQL on a 60-core
server. This is obviously a pretty bad sign.

You can easily reproduce this issue on a regular Core i7 server. Use the
attached schema.sql file to create a database and run:

pgbench -j 8 -c 8 -f pgbench.sql -T 300 my_database 2>/dev/null &

While this example is running, connect to one of the PostgreSQL processes
with gdb and run bt/c from time to time. You will see that PostgreSQL
waits on this particular lock quite often.

The problem is that the code between LWLockAcquire (lock.c:881) and
LWLockRelease (lock.c:1020) can _sometimes_ run for up to 3-5 ms. Using
the good old gettimeofday-and-logging method I managed to find the bottleneck:

-- proclock = SetupLockInTable [lock.c:892]
`-- proclock = (PROCLOCK *) hash_search_with_hash_value [lock.c:1105]
`-- currBucket = get_hash_entry(hashp) [dynahash.c:985]
`-- SpinLockAcquire(&hctl->mutex) [dynahash.c:1187]

If my measurements are not wrong (I didn't place gettimeofday between
SpinLockAcquire/SpinLockRelease, etc.) we sometimes spend about 3 ms here
waiting for a spinlock, which doesn't seem right.
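
The method itself is nothing fancy; roughly, it can be sketched like this
(illustrative helper names, not the actual instrumentation code):

```c
#include <stdio.h>
#include <sys/time.h>

/* Illustrative sketch of the gettimeofday-and-logging method; these
 * helpers are not the actual instrumentation code. */
long
elapsed_usec(struct timeval start, struct timeval end)
{
	return (end.tv_sec - start.tv_sec) * 1000000L +
		(end.tv_usec - start.tv_usec);
}

/* Bracket the suspect section and log anything suspiciously slow. */
void
report_if_slow(const char *what, struct timeval start, struct timeval end)
{
	long		us = elapsed_usec(start, end);

	if (us > 1000)				/* log sections slower than 1 ms */
		fprintf(stderr, "%s took %ld us\n", what, us);
}
```

Wrapping successively narrower code regions this way is what leads to the
call chain above.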

I managed to fix this behaviour by modifying the choose_nelem_alloc
procedure in dynahash.c (see attached patch). The idea is to double the
number of items we allocate when there are no more free items in the hash
table. That way we need half as many allocations, which reduces contention.
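
The effect is easy to model; here is a toy illustration (not the dynahash
code itself) of why doubling the batch size halves the number of allocator
trips:

```c
/*
 * Toy model, not the actual dynahash code: entries are handed out from a
 * freelist, and the allocator is called only when the freelist is empty.
 * Doubling nelem_alloc halves the number of allocator calls (and hence
 * the number of times the mutex must be taken for allocation).
 */
int
allocs_needed(int entries_wanted, int nelem_alloc)
{
	int			allocs = 0;
	int			free_entries = 0;

	while (entries_wanted-- > 0)
	{
		if (free_entries == 0)
		{
			free_entries = nelem_alloc; /* one trip to the allocator */
			allocs++;
		}
		free_entries--;
	}
	return allocs;
}
```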

This patch doesn't cause any performance degradation according to
pgbench, and `make check` passes.

Best regards,
Aleksander

Attachments:

lock-contention.patch (text/x-patch, +3/-3)
schema.sql (application/sql)
pgbench.sql (application/sql)
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Aleksander Alekseev (#1)
Re: Patch: fix lock contention for HASHHDR.mutex

Aleksander Alekseev <a.alekseev@postgrespro.ru> writes:

Turns out PostgreSQL can spend a lot of time waiting for a lock in this
particular place, especially if you are running PostgreSQL on 60-core
server. Which obviously is a pretty bad sign.
...
I managed to fix this behaviour by modifying choose_nelem_alloc
procedure in dynahash.c (see attached patch).

TBH, this is just voodoo. I don't know why this change would have made
any impact on lock acquisition performance, and neither do you, and the
odds are good that it's pure chance that it changed anything. One likely
theory is that you managed to shift around memory allocations so that
something aligned on a cacheline boundary when it hadn't before. But, of
course, the next patch that changes allocations anywhere in shared memory
could change that back. There are lots of effects like this that appear
or disappear based on seemingly unrelated code changes when you're
measuring edge-case performance.

The patch is not necessarily bad in itself. As time goes by and machines
get bigger, it can make sense to allocate more memory at a time to reduce
memory management overhead. But arguing for it on the basis that it fixes
lock allocation behavior with 60 cores is just horsepucky. What you were
measuring there was steady-state hash table behavior, not the speed of the
allocate-some-more-memory code path.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Aleksander Alekseev
aleksander@timescale.com
In reply to: Tom Lane (#2)
Re: Patch: fix lock contention for HASHHDR.mutex

Hello, Tom

I see your point, but I would like to clarify a few things.

1. Do we consider described measurement method good enough to conclude
that sometimes PostgreSQL really spends 3 ms in a spinlock (like a RTT
between two Internet hosts in the same city)? If not, what method
should be used to approve or disapprove this?

2. If we agree that PostgreSQL does sometimes spend 3 ms in a spinlock
do we consider this a problem?

3. If we consider this a problem, what method is considered appropriate
to find a real reason of such behaviour so we could fix it?

Best regards,
Aleksander

#4Aleksander Alekseev
aleksander@timescale.com
In reply to: Aleksander Alekseev (#3)
Re: Patch: fix lock contention for HASHHDR.mutex

Oops. s/approve or disapprove/confirm or deny/

On Fri, 11 Dec 2015 19:14:41 +0300
Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote:

Hello, Tom

I see your point, but I would like to clarify a few things.

1. Do we consider described measurement method good enough to conclude
that sometimes PostgreSQL really spends 3 ms in a spinlock (like a RTT
between two Internet hosts in the same city)? If not, what method
should be used to approve or disapprove this?

2. If we agree that PostgreSQL does sometimes spend 3 ms in a spinlock
do we consider this a problem?

3. If we consider this a problem, what method is considered
appropriate to find a real reason of such behaviour so we could fix
it?

Best regards,
Aleksander

#5Simon Riggs
simon@2ndQuadrant.com
In reply to: Aleksander Alekseev (#3)
Re: Patch: fix lock contention for HASHHDR.mutex

On 11 December 2015 at 16:14, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:

I see your point, but I would like to clarify a few things.

1. Do we consider described measurement method good enough to conclude
that sometimes PostgreSQL really spends 3 ms in a spinlock (like a RTT
between two Internet hosts in the same city)? If not, what method
should be used to approve or disapprove this?

Timing things seems fine. You may have located an important issue.

2. If we agree that PostgreSQL does sometimes spend 3 ms in a spinlock
do we consider this a problem?

Probably, but then Tom didn't question 1 or 2. What he questioned was your
fix.

3. If we consider this a problem, what method is considered appropriate
to find a real reason of such behaviour so we could fix it?

The problem you identify is in only one place, yet your fix changes many
parts of the code. Why is that the best fix out of the many possible ones?
Why would such a change be acceptable?

I personally don't know the answers to those questions. It would be
wonderful if somebody else would structure our lives such that all we had
to do was find simple answers, but that isn't the way life is. You ask for
a chain of logical thought, but it is for you to create one, somehow:
patches are default-reject, not committer-explain-why-reject.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Aleksander Alekseev
aleksander@timescale.com
In reply to: Tom Lane (#2)
Re: Patch: fix lock contention for HASHHDR.mutex

Hello, Tom.

I was exploring this issue further and discovered something strange.

"PROCLOCK hash" and "LOCK hash" are hash tables in shared memory. All
memory for these tables is in fact pre-allocated. But for some reason
these two tables are created (lock.c:394) with init_size != max_size.
This causes a small overhead of calling the memory allocator after hash
table creation, plus additional locking/unlocking.

I checked all other hash tables created via ShmemInitHash. All of those
tables have init_size == max_size. I suggest creating "PROCLOCK hash"
and "LOCK hash" with init_size == max_size too (see attached patch).
Granted, this change doesn't cause any noticeable performance
improvement according to pgbench. Nevertheless it looks like the right
thing to do, and at least it doesn't make anything worse.

If this patch seems OK, the next logical step I believe would be to
remove the init_size parameter of ShmemInitHash, since in practice it
always equals max_size.

On Fri, 11 Dec 2015 10:30:30 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:


Aleksander Alekseev <a.alekseev@postgrespro.ru> writes:

Turns out PostgreSQL can spend a lot of time waiting for a lock in
this particular place, especially if you are running PostgreSQL on
60-core server. Which obviously is a pretty bad sign.
...
I managed to fix this behaviour by modifying choose_nelem_alloc
procedure in dynahash.c (see attached patch).

TBH, this is just voodoo. I don't know why this change would have
made any impact on lock acquisition performance, and neither do you,
and the odds are good that it's pure chance that it changed
anything. One likely theory is that you managed to shift around
memory allocations so that something aligned on a cacheline boundary
when it hadn't before. But, of course, the next patch that changes
allocations anywhere in shared memory could change that back. There
are lots of effects like this that appear or disappear based on
seemingly unrelated code changes when you're measuring edge-case
performance.

The patch is not necessarily bad in itself. As time goes by and
machines get bigger, it can make sense to allocate more memory at a
time to reduce memory management overhead. But arguing for it on the
basis that it fixes lock allocation behavior with 60 cores is just
horsepucky. What you were measuring there was steady-state hash
table behavior, not the speed of the allocate-some-more-memory code
path.

regards, tom lane

Attachments:

lock-contention-v2.patch (text/x-patch, +7/-10)
#7Andres Freund
andres@anarazel.de
In reply to: Aleksander Alekseev (#1)
Re: Patch: fix lock contention for HASHHDR.mutex

On 2015-12-11 17:00:01 +0300, Aleksander Alekseev wrote:

The problem is that code between LWLockAcquire (lock.c:881) and
LWLockRelease (lock.c:1020) can _sometimes_ run up to 3-5 ms. Using
old-good gettimeofday and logging method I managed to find a bottleneck:

-- proclock = SetupLockInTable [lock.c:892]
`-- proclock = (PROCLOCK *) hash_search_with_hash_value [lock.c:1105]
`-- currBucket = get_hash_entry(hashp) [dynahash.c:985]
`-- SpinLockAcquire(&hctl->mutex) [dynahash.c:1187]

If my measurements are not wrong (I didn't place gettimeofday between
SpinLockAcquire/SpinLockRelease, etc.) we sometimes spend about 3 ms here
waiting for a spinlock, which doesn't seem right.

Well, it's quite possible that your process was scheduled out, while
waiting for that spinlock. Which'd make 3ms pretty normal.

I'd consider using a LWLock instead of a spinlock here. I've seen this
contended in a bunch of situations, and the queued behaviour, combined
with directed wakeups on the OS level, ought to improve the worst case
behaviour measurably.

Greetings,

Andres Freund

#8Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#7)
Re: Patch: fix lock contention for HASHHDR.mutex

On Tue, Dec 15, 2015 at 7:25 AM, Andres Freund <andres@anarazel.de> wrote:

On 2015-12-11 17:00:01 +0300, Aleksander Alekseev wrote:

The problem is that code between LWLockAcquire (lock.c:881) and
LWLockRelease (lock.c:1020) can _sometimes_ run up to 3-5 ms. Using
old-good gettimeofday and logging method I managed to find a bottleneck:

-- proclock = SetupLockInTable [lock.c:892]
`-- proclock = (PROCLOCK *) hash_search_with_hash_value [lock.c:1105]
`-- currBucket = get_hash_entry(hashp) [dynahash.c:985]
`-- SpinLockAcquire(&hctl->mutex) [dynahash.c:1187]

If my measurements are not wrong (I didn't place gettimeofday between
SpinLockAcquire/SpinLockRelease, etc.) we sometimes spend about 3 ms here
waiting for a spinlock, which doesn't seem right.

Well, it's quite possible that your process was scheduled out, while
waiting for that spinlock. Which'd make 3ms pretty normal.

I'd consider using a LWLock instead of a spinlock here. I've seen this
contended in a bunch of situations, and the queued behaviour, combined
with directed wakeups on the OS level, ought to improve the worst case
behaviour measurably.

Amit had the idea a while back of trying to replace the HASHHDR mutex
with something based on atomic ops. It seems hard to avoid the
attendant A-B-A problems but maybe there's a way.
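
For illustration, the classic workaround for A-B-A is to compare-and-swap
a version counter together with the list head, so a stale snapshot fails
even if the same entry reappears. A minimal sketch, not from any actual
patch:

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Illustrative sketch of one classic A-B-A workaround: pack a version
 * counter next to the freelist head so that any intervening pop/push
 * bumps the counter and a stale compare-and-swap fails.  Not from any
 * actual PostgreSQL patch.
 */
typedef struct TaggedHead
{
	uint32_t	index;			/* "pointer" into an entry array */
	uint32_t	version;		/* bumped on every update */
} TaggedHead;

static _Atomic uint64_t freelist_head;

uint64_t
pack_head(TaggedHead h)
{
	return ((uint64_t) h.version << 32) | h.index;
}

void
set_head(uint32_t index, uint32_t version)
{
	TaggedHead	h = {index, version};

	atomic_store(&freelist_head, pack_head(h));
}

/* Returns 1 if the head was still (index, version) and got replaced. */
int
try_replace_head(uint32_t index, uint32_t version, uint32_t new_index)
{
	TaggedHead	oldh = {index, version};
	TaggedHead	newh = {new_index, version + 1};
	uint64_t	expected = pack_head(oldh);

	return atomic_compare_exchange_strong(&freelist_head, &expected,
										  pack_head(newh));
}
```

A CAS against a stale (index, version) pair fails even if that same index
has been pushed back onto the list in the meantime, which is exactly the
A-B-A hazard a bare pointer CAS would miss.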

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#8)
Re: Patch: fix lock contention for HASHHDR.mutex

On 2015-12-17 09:47:57 -0500, Robert Haas wrote:

On Tue, Dec 15, 2015 at 7:25 AM, Andres Freund <andres@anarazel.de> wrote:

I'd consider using a LWLock instead of a spinlock here. I've seen this
contended in a bunch of situations, and the queued behaviour, combined
with directed wakeups on the OS level, ought to improve the worst case
behaviour measurably.

Amit had the idea a while back of trying to replace the HASHHDR mutex
with something based on atomic ops. It seems hard to avoid the
attendant A-B-A problems but maybe there's a way.

I'd really like to see it being replaced by a queuing lock
(i.e. lwlock) before we go there. And then maybe partition the freelist,
and make nentries an atomic. Just doing those might already be good
enough and should be a lot easier.

Andres

#10Aleksander Alekseev
aleksander@timescale.com
In reply to: Andres Freund (#9)
Re: Patch: fix lock contention for HASHHDR.mutex

I'd really like to see it being replaced by a queuing lock
(i.e. lwlock) before we go there. And then maybe partition the
freelist, and make nentries an atomic.

I believe I just implemented something like this (see attachment). The
idea is to partition the PROCLOCK hash table manually into
NUM_LOCK_PARTITIONS smaller, non-partitioned hash tables. Since these
tables are non-partitioned, no spinlock is used and there is no lock
contention.

On 60-core server we gain 3.5-4 more TPS according to benchmark
described above. As I understand there is no performance degradation in
other cases (different CPU, traditional pgbench, etc).

If this patch seems to be OK I believe we could consider applying the
same change not only to the PROCLOCK hash table.
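
In outline, the sharding works like this (a toy model with an
illustrative partition count, not the patch itself):

```c
#include <stdint.h>

#define NUM_PARTS 16			/* illustrative, like NUM_LOCK_PARTITIONS */

/*
 * Toy model of the sharding idea: each partition owns an independent
 * freelist (and, in the real patch, its own hash table), so backends
 * whose entries hash to different partitions never contend on one mutex.
 */
static int	nfree[NUM_PARTS];

int
partition_for(uint32_t hashcode)
{
	return hashcode % NUM_PARTS;
}

/*
 * Take one entry from the partition the hashcode maps to.  Returns 1 on
 * success, 0 if that particular partition is exhausted -- note this can
 * happen while other partitions still have free entries, which is a
 * trade-off of unshared freelists.
 */
int
get_entry(uint32_t hashcode)
{
	int			part = partition_for(hashcode);

	if (nfree[part] == 0)
		return 0;
	nfree[part]--;
	return 1;
}
```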

Attachments:

shard-proclock-hash-table.patch (text/x-patch, +140/-82)
#11Robert Haas
robertmhaas@gmail.com
In reply to: Aleksander Alekseev (#10)
Re: Patch: fix lock contention for HASHHDR.mutex

On Thu, Dec 17, 2015 at 11:03 AM, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:

I'd really like to see it being replaced by a queuing lock
(i.e. lwlock) before we go there. And then maybe partition the
freelist, and make nentries an atomic.

I believe I just implemented something like this (see attachment). The
idea is to partition PROCLOCK hash table manually into
NUM_LOCK_PARTITIONS smaller and non-partitioned hash tables. Since these
tables are non-partitioned spinlock is not used and there is no lock
contention.

Oh, that's an interesting idea. I guess the problem is that if the
freelist is unshared, then users might get an error that the lock
table is full when some other partition still has elements remaining.

On 60-core server we gain 3.5-4 more TPS according to benchmark
described above. As I understand there is no performance degradation in
other cases (different CPU, traditional pgbench, etc).

3.5-4 more TPS, or 3.5 times more TPS? Can you share the actual numbers?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Aleksander Alekseev
aleksander@timescale.com
In reply to: Robert Haas (#11)
Re: Patch: fix lock contention for HASHHDR.mutex

Oh, that's an interesting idea. I guess the problem is that if the
freelist is unshared, then users might get an error that the lock
table is full when some other partition still has elements remaining.

True, but I don't believe it should be a real problem, assuming we have
a strong hash function. If a user gets such an error it means that the
database is under heavy load and there are not many free elements left
in the other tables either. This particular user didn't get lucky; some
other will. In any case the administrator should do something about it -
fight a DDoS attack, tune database parameters, etc.

3.5-4 more TPS, or 3.5 times more TPS? Can you share the actual
numbers?

Oops, 3.5-4 _times_ more TPS, i.e. 2206 TPS vs 546 TPS.

#13Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#9)
Re: Patch: fix lock contention for HASHHDR.mutex

On Thu, Dec 17, 2015 at 8:44 PM, Andres Freund <andres@anarazel.de> wrote:

On 2015-12-17 09:47:57 -0500, Robert Haas wrote:

On Tue, Dec 15, 2015 at 7:25 AM, Andres Freund <andres@anarazel.de>

wrote:

I'd consider using a LWLock instead of a spinlock here. I've seen this
contended in a bunch of situations, and the queued behaviour, combined
with directed wakeups on the OS level, ought to improve the worst case
behaviour measurably.

Amit had the idea a while back of trying to replace the HASHHDR mutex
with something based on atomic ops. It seems hard to avoid the
attendant A-B-A problems but maybe there's a way.

I'd really like to see it being replaced by a queuing lock
(i.e. lwlock) before we go there. And then maybe partition the freelist,
and make nentries an atomic. Just doing those might already be good
enough and should be a lot easier.

Makes sense to me, but I think we should also try the group leader idea
used for the ProcArrayLock optimisation, as during those tests I found
that it gives better results compared to partitioning.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#14Aleksander Alekseev
aleksander@timescale.com
In reply to: Aleksander Alekseev (#12)
Re: Patch: fix lock contention for HASHHDR.mutex

Oops, 3.5-4 _times_ more TPS, i.e. 2206 TPS vs 546 TPS.

In fact those numbers are for a similar but slightly different
benchmark (same schema without checks on child tables). Here are the
exact numbers for the benchmark described above.

Before:

$ pgbench -j 64 -c 64 -f pgbench.sql -T 30 my_database
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 64
number of threads: 64
duration: 30 s
number of transactions actually processed: 20433
latency average: 93.966 ms
tps = 679.698439 (including connections establishing)
tps = 680.353897 (excluding connections establishing)

$ pgbench -j 64 -c 64 -f pgbench.sql -T 30 my_database
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 64
number of threads: 64
duration: 30 s
number of transactions actually processed: 19111
latency average: 100.466 ms
tps = 635.763523 (including connections establishing)
tps = 636.112682 (excluding connections establishing)

$ pgbench -j 64 -c 64 -f pgbench.sql -T 30 my_database
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 64
number of threads: 64
duration: 30 s
number of transactions actually processed: 19218
latency average: 99.906 ms
tps = 639.506848 (including connections establishing)
tps = 639.838757 (excluding connections establishing)

After:

$ pgbench -j 64 -c 64 -f pgbench.sql -T 30 my_database
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 64
number of threads: 64
duration: 30 s
number of transactions actually processed: 95900
latency average: 20.021 ms
tps = 3194.142762 (including connections establishing)
tps = 3196.091843 (excluding connections establishing)

$ pgbench -j 64 -c 64 -f pgbench.sql -T 30 my_database
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 64
number of threads: 64
duration: 30 s
number of transactions actually processed: 96837
latency average: 19.827 ms
tps = 3225.822355 (including connections establishing)
tps = 3227.762847 (excluding connections establishing)

$ pgbench -j 64 -c 64 -f pgbench.sql -T 30 my_database
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 64
number of threads: 64
duration: 30 s
number of transactions actually processed: 96143
latency average: 19.970 ms
tps = 3202.637126 (including connections establishing)
tps = 3204.070466 (excluding connections establishing)

Ratio:

$ python
>>> min(3194.0, 3225.0, 3202.0) / max(679.0, 635.0, 639.0)
4.703976435935199

#15Amit Kapila
amit.kapila16@gmail.com
In reply to: Aleksander Alekseev (#10)
Re: Patch: fix lock contention for HASHHDR.mutex

On Thu, Dec 17, 2015 at 9:33 PM, Aleksander Alekseev <
a.alekseev@postgrespro.ru> wrote:

I'd really like to see it being replaced by a queuing lock
(i.e. lwlock) before we go there. And then maybe partition the
freelist, and make nentries an atomic.

I believe I just implemented something like this (see attachment). The
idea is to partition PROCLOCK hash table manually into
NUM_LOCK_PARTITIONS smaller and non-partitioned hash tables. Since these
tables are non-partitioned spinlock is not used and there is no lock
contention.

This idea can improve the situation with ProcLock hash table, but I
think IIUC what Andres is suggesting would reduce the contention
around dynahash freelist and can be helpful in many more situations
including BufMapping locks.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#16Aleksander Alekseev
aleksander@timescale.com
In reply to: Amit Kapila (#15)
Re: Patch: fix lock contention for HASHHDR.mutex

This idea can improve the situation with ProcLock hash table, but I
think IIUC what Andres is suggesting would reduce the contention
around dynahash freelist and can be helpful in many more situations
including BufMapping locks.

I agree. But as I understand it, the PostgreSQL community doesn't
generally approve big changes that affect the whole system, especially
if the original problem was only in one particular place. Therefore for
now I suggest only a small change. Naturally, if it is accepted, there is
no reason not to apply the same changes to BufMapping or even dynahash
itself, with corresponding PROCLOCK hash refactoring.

BTW could you (or anyone?) please help me find the thread regarding
BufMapping, or perhaps provide a benchmark? I would like to reproduce
this issue but I can't find anything relevant in the mailing list. It
also seems a good idea to compare the alternative approaches that were
mentioned (atomic ops, group leader). Are there any discussions,
benchmarks or patches regarding this topic?

Frankly, I have serious doubts regarding atomic ops, since they will most
likely create the same contention that a spinlock does. But perhaps
there is a patch that works differently from how I imagine it would.

#18Andres Freund
andres@anarazel.de
In reply to: Aleksander Alekseev (#14)
Re: Patch: fix lock contention for HASHHDR.mutex

On 2015-12-18 11:40:58 +0300, Aleksander Alekseev wrote:

$ pgbench -j 64 -c 64 -f pgbench.sql -T 30 my_database

What's in pgbench.sql?

Greetings,

Andres Freund

#19Aleksander Alekseev
aleksander@timescale.com
In reply to: Andres Freund (#18)
Re: Patch: fix lock contention for HASHHDR.mutex

What's in pgbench.sql?

It's from the first message of this thread:

/messages/by-id/20151211170001.78ded9d7@fujitsu

#20Amit Kapila
amit.kapila16@gmail.com
In reply to: Aleksander Alekseev (#16)
Re: Patch: fix lock contention for HASHHDR.mutex

On Fri, Dec 18, 2015 at 2:50 PM, Aleksander Alekseev <
a.alekseev@postgrespro.ru> wrote:

This idea can improve the situation with ProcLock hash table, but I
think IIUC what Andres is suggesting would reduce the contention
around dynahash freelist and can be helpful in many more situations
including BufMapping locks.

I agree. But as I understand PostgreSQL community doesn't generally
approve big changes that affects whole system. Especially if original
problem was only in one particular place. Therefore for now I suggest
only a small change. Naturally if it will be accepted there is no
reason not to apply same changes for BufMapping or even dynahash itself
with corresponding PROCLOCK hash refactoring.

BTW could you (or anyone?) please help me find this thread regarding
BufMapping or perhaps provide a benchmark?

You can find that in below thread:
/messages/by-id/CAA4eK1+U+GQDc2sio4adRk+ux6obFYRPxkY=CH5BkNaBToo84A@mail.gmail.com

This even contains the original idea and an initial patch for replacing
spinlocks with atomic ops. I mentioned the A-B-A problem a few mails
down in that thread and gave a link to a paper suggesting how it can be
solved. It needs more work, but it is doable.

I would like to reproduce
this issue but I can't find anything relevant in a mailing list. Also
it seems to be a good idea to compare alternative approaches that were
mentioned (atomics ops, group leader). Are there any discussions,
benchmarks or patches regarding this topic?

You can find the discussion and patch related to Group leader approach
in the thread:
/messages/by-id/CAA4eK1JbX4FzPHigNt0JSaz30a85BPJV+ewhk+wg_o-T6xufEA@mail.gmail.com
This patch is already committed.

Frankly I have serious doubts regarding atomics ops since they will more
likely create the same contention that a spinlock does. But perhaps
there is a patch that works not the way I think it could work.

I think it is difficult to say without implementing it. If we want to
evaluate multiple approaches, then what we can do here is: I can help
with writing a patch using LWLocks, you can evaluate the atomic ops
approach, and we already have your current patch; then we can see what
works out best.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#21Teodor Sigaev
teodor@sigaev.ru
In reply to: Robert Haas (#11)
#22Aleksander Alekseev
aleksander@timescale.com
In reply to: Teodor Sigaev (#21)
#23Robert Haas
robertmhaas@gmail.com
In reply to: Teodor Sigaev (#21)
#24Aleksander Alekseev
aleksander@timescale.com
In reply to: Robert Haas (#23)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Aleksander Alekseev (#24)
#26Aleksander Alekseev
aleksander@timescale.com
In reply to: Robert Haas (#25)
#27Aleksander Alekseev
aleksander@timescale.com
In reply to: Aleksander Alekseev (#26)
#28Aleksander Alekseev
aleksander@timescale.com
In reply to: Aleksander Alekseev (#27)
#29Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Aleksander Alekseev (#27)
#30Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#29)
#31Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#30)
#32Aleksander Alekseev
aleksander@timescale.com
In reply to: Alvaro Herrera (#31)
#33Oleg Bartunov
oleg@sai.msu.su
In reply to: Andres Freund (#30)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oleg Bartunov (#33)
#35Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#34)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#35)
#37Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#34)
#38Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#36)
#39Fabrízio de Royes Mello
fabriziomello@gmail.com
In reply to: Oleg Bartunov (#33)
#40Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Aleksander Alekseev (#28)
#41Aleksander Alekseev
aleksander@timescale.com
In reply to: Alvaro Herrera (#40)
#42Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#40)
#43Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#42)
#44Dilip Kumar
dilipbalaut@gmail.com
In reply to: Aleksander Alekseev (#41)
#45Anastasia Lubennikova
a.lubennikova@postgrespro.ru
In reply to: Aleksander Alekseev (#28)
#46Aleksander Alekseev
aleksander@timescale.com
In reply to: Dilip Kumar (#44)
#47Aleksander Alekseev
aleksander@timescale.com
In reply to: Anastasia Lubennikova (#45)
#48Dilip Kumar
dilipbalaut@gmail.com
In reply to: Aleksander Alekseev (#46)
#49Aleksander Alekseev
aleksander@timescale.com
In reply to: Dilip Kumar (#48)
#50Anastasia Lubennikova
a.lubennikova@postgrespro.ru
In reply to: Aleksander Alekseev (#47)
#51Aleksander Alekseev
aleksander@timescale.com
In reply to: Anastasia Lubennikova (#50)
#52Dilip Kumar
dilipbalaut@gmail.com
In reply to: Aleksander Alekseev (#49)
#53Robert Haas
robertmhaas@gmail.com
In reply to: Anastasia Lubennikova (#45)
#54Aleksander Alekseev
aleksander@timescale.com
In reply to: Robert Haas (#53)
#55Robert Haas
robertmhaas@gmail.com
In reply to: Aleksander Alekseev (#54)
#56Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#55)
#57Aleksander Alekseev
aleksander@timescale.com
In reply to: Alvaro Herrera (#56)
#58Robert Haas
robertmhaas@gmail.com
In reply to: Aleksander Alekseev (#57)
#59Aleksander Alekseev
aleksander@timescale.com
In reply to: Robert Haas (#58)
#60Robert Haas
robertmhaas@gmail.com
In reply to: Aleksander Alekseev (#59)
#61Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#60)
#62Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#61)
#63Aleksander Alekseev
aleksander@timescale.com
In reply to: Robert Haas (#61)
#64Amit Kapila
amit.kapila16@gmail.com
In reply to: Aleksander Alekseev (#63)
#65Aleksander Alekseev
aleksander@timescale.com
In reply to: Amit Kapila (#64)
#66Amit Kapila
amit.kapila16@gmail.com
In reply to: Aleksander Alekseev (#65)
#67Aleksander Alekseev
aleksander@timescale.com
In reply to: Amit Kapila (#66)
#68Amit Kapila
amit.kapila16@gmail.com
In reply to: Aleksander Alekseev (#67)
#69Robert Haas
robertmhaas@gmail.com
In reply to: Aleksander Alekseev (#65)
#70Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#69)
#71Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#70)
#72Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#71)
#73Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#72)
#74Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#73)
#75Aleksander Alekseev
aleksander@timescale.com
In reply to: Amit Kapila (#74)
#76Robert Haas
robertmhaas@gmail.com
In reply to: Aleksander Alekseev (#75)
#77Aleksander Alekseev
aleksander@timescale.com
In reply to: Robert Haas (#76)
#78Robert Haas
robertmhaas@gmail.com
In reply to: Aleksander Alekseev (#77)
#79Aleksander Alekseev
aleksander@timescale.com
In reply to: Robert Haas (#78)