Contention on LWLock buffer_content, due to SHARED lock(?)

Started by Jens-Wolfhard Schicke-Uffmann · about 6 years ago · 8 messages

Hi,

today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
cores) lock contention on a buffer content lock due to taking of a
SHARED lock (I think):

Three tables were involved, simplified case:

CREATE TABLE global_config (id BIGINT PRIMARY KEY);

CREATE TABLE b (
id BIGINT PRIMARY KEY,
config_id BIGINT REFERENCES global_config (id)
);

CREATE TABLE c (
id BIGINT PRIMARY KEY,
config_id BIGINT REFERENCES global_config (id)
);

(I suppose having both b + c doesn't make a difference, but
maybe it was relevant, so I'm including it.)

Heavy INSERT + UPDATE traffic on b + c (all trivial via id),
SELECTs on global_config (again by id).
As the name suggests, there were only very few rows in
global_config, specifically only one was referenced by all
INSERT + UPDATEs on b + c.
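For illustration, the statements looked roughly like this (values made up).
As far as I understand, each INSERT (and any UPDATE that changes config_id)
fires the FK check, which effectively runs a
SELECT ... FOR KEY SHARE against the referenced global_config row:

INSERT INTO global_config (id) VALUES (1);

-- the INSERTs (and FK-changing UPDATEs) row-lock the same
-- global_config row via the FK check trigger
INSERT INTO b (id, config_id) VALUES (42, 1);
UPDATE b SET config_id = 1 WHERE id = 42;

INSERT INTO c (id, config_id) VALUES (43, 1);
UPDATE c SET config_id = 1 WHERE id = 43;

-- plus plain lookups of the config row itself
SELECT * FROM global_config WHERE id = 1;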

Under lighter load, all three types of queries were taking <1ms (as
expected); as load grew, all three rose together to ~50ms average
execution time. The AWS RDS console showed waits on
LWLock:buffer_content as the main contribution to that time.
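(On a self-managed instance, roughly the same picture should be obtainable
from pg_stat_activity, which exposes wait events since 9.6:)

-- snapshot of what backends are currently waiting on
SELECT wait_event_type, wait_event, state, count(*)
FROM pg_stat_activity
GROUP BY 1, 2, 3
ORDER BY count(*) DESC;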

Checking the code, I concluded that I observed lock contention
on the lock taken at the beginning of heap_lock_tuple, where
an exclusive buffer content lock is held while recording the
SHARE lock into the tuple and the WAL and the multiXact. I don't know
the actual number, but potentially up to 7000 active
transactions were holding a SHARE lock on that row, which could have
performance implications while scanning for multiXact memberships.
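(If I manage to reproduce this somewhere where contrib modules can be
installed, the pgrowlocks extension should show the multiXact and its
member xids directly on the hot row:)

CREATE EXTENSION IF NOT EXISTS pgrowlocks;

-- per currently locked row: whether the locker is a multixact,
-- the member transaction ids and the lock modes held
SELECT locked_row, multi, xids, modes
FROM pgrowlocks('global_config');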

Semantically, all that lock traffic was superfluous, as the
global_config row's key was in no danger of being changed.

As this situation (some global, essentially static, entity is referenced
by a heavily written table) seems not uncommon, I wonder:

1. Does the above analysis sound about right?

2. If so, would it be worthwhile to develop a solution?
I was thinking along the lines of introducing a multiXact
representation of "everyone": Instead of meticulously recording every
locking + completing transaction in a multiXact, after a certain
number of transactions has accumulated in a single multiXact, it is
approximated as "everyone". If later a transaction finds that a SHARE
lock is held by "everyone", the tuple would need no further modification
(not sure if this could even be checked without taking an exclusive
buffer lock). The hard part would probably be to ensure that an
attempt to obtain an EXCLUSIVE lock would finally succeed against a
SHARE lock held by "everyone".

Regards,
Drahflow

#2 Jeff Janes
jeff.janes@gmail.com
In reply to: Jens-Wolfhard Schicke-Uffmann (#1)
Re: Contention on LWLock buffer_content, due to SHARED lock(?)

On Mon, Dec 9, 2019 at 5:10 PM Jens-Wolfhard Schicke-Uffmann <drahflow@gmx.de> wrote:

> Hi,
>
> today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
> cores) lock contention on a buffer content lock due to taking of a
> SHARED lock (I think):

What version of PostgreSQL are you using?

Cheers,

Jeff

#3 Andres Freund
andres@anarazel.de
In reply to: Jens-Wolfhard Schicke-Uffmann (#1)
Re: Contention on LWLock buffer_content, due to SHARED lock(?)

Hi,

On 2019-12-09 23:10:36 +0100, Jens-Wolfhard Schicke-Uffmann wrote:

> today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
> cores) lock contention on a buffer content lock due to taking of a
> SHARED lock (I think):
>
> Three tables were involved, simplified case:
>
> CREATE TABLE global_config (id BIGINT PRIMARY KEY);
>
> CREATE TABLE b (
> id BIGINT PRIMARY KEY,
> config_id BIGINT REFERENCES global_config (id)
> );
>
> CREATE TABLE c (
> id BIGINT PRIMARY KEY,
> config_id BIGINT REFERENCES global_config (id)
> );
>
> (I suppose having both b + c doesn't make a difference, but
> maybe it was relevant, so I'm including it.)
>
> Heavy INSERT + UPDATE traffic on b + c (all trivial via id),
> SELECTs on global_config (again by id).
> As the name suggests, there were only very few rows in
> global_config, specifically only one was referenced by all
> INSERT + UPDATEs on b + c.
>
> Under lighter load, all three types of queries were taking <1ms (as
> expected); as load grew, all three rose together to ~50ms average
> execution time. The AWS RDS console showed waits on
> LWLock:buffer_content as the main contribution to that time.
>
> Checking the code, I concluded that I observed lock contention
> on the lock taken at the beginning of heap_lock_tuple, where
> an exclusive buffer content lock is held while recording the
> SHARE lock into the tuple and the WAL and the multiXact. I don't know
> the actual number, but potentially up to 7000 active
> transactions were holding a SHARE lock on that row, which could have
> performance implications while scanning for multiXact memberships.

When you say "7000 active transactions" - do you mean to say that you
have set max_connections to something higher than that, and you actually
have that many concurrent transactions?

> Semantically, all that lock traffic was superfluous, as the
> global_config row's key was in no danger of being changed.

Well, postgres can't know that.

> As this situation (some global, essentially static, entity is referenced
> by a heavily written table) seems not uncommon, I wonder:
>
> 1. Does the above analysis sound about right?

Hard to know without additional data.

> 2. If so, would it be worthwhile to develop a solution?

Possible, but I'm not sure it's worth the complexity.

I'd definitely like to see a proper reproducer and profile for this,
before investigating further.

> I was thinking along the lines of introducing a multiXact
> representation of "everyone": Instead of meticulously recording every
> locking + completing transaction in a multiXact, after a certain
> number of transactions has accumulated in a single multiXact, it is
> approximated as "everyone". If later a transaction finds that a SHARE
> lock is held by "everyone", the tuple would need no further modification

I think the big problem with a strategy like this is that it's prone to
generate deadlocks that aren't present in the "original" scheduling.

> (not sure if this could even be checked without taking an exclusive
> buffer lock).

It should only require a share lock.

> The hard part would probably be to ensure that an
> attempt to obtain an EXCLUSIVE lock would finally succeed against a
> SHARE lock held by "everyone".

Note that this is a seriously complicated area of the code. It's very
easy to create new bugs that aren't easily testable. I think we'd need a
very convincing use-case for improvements around the problem you outline,
and a relatively simple solution, to counter stability concerns.

Greetings,

Andres Freund

#4 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#3)
Re: Contention on LWLock buffer_content, due to SHARED lock(?)

On 2019-Dec-10, Andres Freund wrote:

> > The hard part would probably be to ensure that an
> > attempt to obtain an EXCLUSIVE lock would finally succeed against a
> > SHARE lock held by "everyone".
>
> Note that this is a seriously complicated area of the code. It's very
> easy to create new bugs that aren't easily testable. I think we'd need a
> very convincing use-case for improvements around the problem you outline,
> and a relatively simple solution, to counter stability concerns.

I'd rather have the ability to mark a table READ ONLY (or similar).
Then any FK references can skip the row locks altogether. For the rare
cases where you need to modify the referenced table, have it marked READ
WRITE, and any row locks are registered normally from that point on,
until you set it back to READ ONLY again.
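Something along these lines, purely hypothetical syntax (no such command
exists today):

-- hypothetical DDL, sketching the idea only
ALTER TABLE global_config SET READ ONLY;
-- while marked read-only, FK checks against global_config could skip
-- row locks entirely
ALTER TABLE global_config SET READ WRITE;
-- from here on, row locks are registered normally again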

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5 Jens-Wolfhard Schicke-Uffmann
drahflow@gmx.de
In reply to: Andres Freund (#3)
Re: Contention on LWLock buffer_content, due to SHARED lock(?)

Hi,

On Tue, Dec 10, 2019 at 08:44:17AM -0800, Andres Freund wrote:

> > today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
> > cores) lock contention on a buffer content lock due to taking of a
> > SHARED lock (I think):
>
> When you say "7000 active transactions" - do you mean to say that you
> have set max_connections to something higher than that, and you actually
> have that many concurrent transactions?

Yes, max connections was 20000, active connections around 7000 at that
time. Unfortunately, I don't have actual numbers of connections in
transactions for that point in time. (We were trying to establish
maximum performance of a larger system.)
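(Next time I'll capture the actual numbers; something like the following
should do, I suppose, given that backend_type needs PostgreSQL 10+:)

SHOW max_connections;

-- how many client backends exist and how many are actually in a transaction
SELECT state, count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY count(*) DESC;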

> > Semantically, all that lock traffic was superfluous, as the
> > global_config row's key was in no danger of being changed.
>
> Well, postgres can't know that.

I am aware; it's just an argument for why it might be possible to
shove some optimization there.

> > 1. Does the above analysis sound about right?
>
> Hard to know without additional data.

What data would be worth recording next time? (Except number of
active transactions, obviously.)

> > 2. If so, would it be worthwhile to develop a solution?
>
> Possible, but I'm not sure it's worth the complexity.
>
> I'd definitely like to see a proper reproducer and profile for this,
> before investigating further.

I'll see if and when I can fit this into my client's project
schedule. It might be a while, but I'll get back to you when I have
a reproducer + profile data (from an up-to-date vanilla Postgres,
not 10.7 + AWS Aurora patches).
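(A first sketch of such a reproducer, against the simplified schema from
my original mail, might be a pgbench custom script along these lines;
client counts would of course need tuning:)

-- bench.sql, run e.g. as: pgbench -n -c 500 -j 32 -T 60 -f bench.sql
-- each transaction references the single shared global_config row,
-- so the FK check takes a KEY SHARE lock on it
\set id random(1, 100000000)
BEGIN;
INSERT INTO b (id, config_id) VALUES (:id, 1)
ON CONFLICT (id) DO UPDATE SET config_id = 1;
SELECT id FROM global_config WHERE id = 1;
END;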

> I think we'd need a very convincing use-case for improvements around the problem
> you outline.

Understood. I'll try to get an iron-clad profile of the problematic case
first.

Regards,
Drahflow

#6 Jens-Wolfhard Schicke-Uffmann
drahflow@gmx.de
In reply to: Alvaro Herrera (#4)
Re: Contention on LWLock buffer_content, due to SHARED lock(?)

Hi,

On Tue, Dec 10, 2019 at 03:07:05PM -0300, Alvaro Herrera wrote:

> I'd rather have the ability to mark a table READ ONLY (or similar).
> Then any FK references can skip the row locks altogether. For the rare
> cases where you need to modify the referenced table, have it marked READ
> WRITE, and any row locks are registered normally from that point on,
> until you set it back to READ ONLY again.

However, that would require changes to applications writing to the table
and a good understanding of performance characteristics by everyone
trying to get to that scale. (OTOH, there is certainly an argument to be
made that whoever hits this kind of problem had better have an idea of
postgres performance tuning anyway.)

More troubling (to me) is that I already know of another table in the
system which should be next-in-line for the same problem, but only on
some rows: It represents accounting entities, of which a very (nearly
static) few are payment processors and all others are customers. From
the application's perspective there's not too much difference between
those, but any customer row will typically only be share locked once,
whereas share locks on payment processor rows will be held by most of
the transactions currently active.

That use-case is not very uncommon, I think, so it might be worthwhile
to implement a solution which does not require all rows of a table to
share similar lock contention characteristics, or writability.

Regards,
Drahflow

#7 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jens-Wolfhard Schicke-Uffmann (#6)
Re: Contention on LWLock buffer_content, due to SHARED lock(?)

On 2019-Dec-10, Jens-Wolfhard Schicke-Uffmann wrote:

> More troubling (to me) is that I already know of another table in the
> system which should be next-in-line for the same problem, but only on
> some rows: It represents accounting entities, of which a very (nearly
> static) few are payment processors and all others are customers. From
> the application's perspective there's not too much difference between
> those, but any customer row will typically only be share locked once,
> whereas share locks on payment processor rows will be held by most of
> the transactions currently active.

Well, you could partition that table. This probably means you'll need
to improve the Postgres implementation of PKs on partitioned tables, though.
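Roughly like this, I suppose (names invented; note that the partition key
has to be part of the primary key, and any FK pointing at the table would
have to include it as well):

-- split the few hot "payment processor" rows from the many customer rows,
-- so the FK row-lock traffic concentrates on a small separate partition
CREATE TABLE accounting_entity (
id BIGINT NOT NULL,
kind TEXT NOT NULL,
PRIMARY KEY (id, kind)
) PARTITION BY LIST (kind);

CREATE TABLE accounting_entity_processors
PARTITION OF accounting_entity FOR VALUES IN ('payment_processor');
CREATE TABLE accounting_entity_customers
PARTITION OF accounting_entity FOR VALUES IN ('customer');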

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#8 Andres Freund
andres@anarazel.de
In reply to: Jens-Wolfhard Schicke-Uffmann (#5)
Re: Contention on LWLock buffer_content, due to SHARED lock(?)

Hi,

On 2019-12-10 22:44:17 +0100, Jens-Wolfhard Schicke-Uffmann wrote:

> On Tue, Dec 10, 2019 at 08:44:17AM -0800, Andres Freund wrote:
>
> > > today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
> > > cores) lock contention on a buffer content lock due to taking of a
> > > SHARED lock (I think):
> >
> > When you say "7000 active transactions" - do you mean to say that you
> > have set max_connections to something higher than that, and you actually
> > have that many concurrent transactions?
>
> Yes, max connections was 20000, active connections around 7000 at that
> time. Unfortunately, I don't have actual numbers of connections in
> transactions for that point in time. (We were trying to establish
> maximum performance of a larger system.)

I'd strongly recommend changing your architecture. There's *severe*
overhead in that many concurrent active connections (and some in such a
high max_connections setting). It's likely that you'd be much, much
better off by putting in a query pooler in front that limits active
transactions to a significantly smaller number. There's only so many CPU
cores, so at some point adding more concurrency just increases the
overall amount of work that needs to be done (due to the overhead of
managing concurrency and context switches).

> > > 1. Does the above analysis sound about right?
> >
> > Hard to know without additional data.
>
> What data would be worth recording next time? (Except number of
> active transactions, obviously.)

I think we'd need a CPU profile for starters. But that unfortunately
won't be possible on RDS...

Greetings,

Andres Freund