Understanding max_locks_per_transaction

Started by Craig McIlweeover 2 years ago4 messagesgeneral
Jump to latest
#1Craig McIlwee
craigm@vt.edu

We're using PostgreSQL 13.10, installed on CentOS 7 from PGDG RPMs.

Recently we've run into "out of shared memory" issues with a hint at
increasing max_locks_per_transaction. The problem is well described in the
PostgreSQL documentation and various blog posts found around the internet,
and the solution is straightforward - touch fewer objects per transaction
or increase the size of the lock table. The error occurs when joining
partitioned tables, and changing the query structure is something we'd like
to avoid, so we are going the route of increasing the size of the lock
table by increasing max_locks_per_transaction.

Many blog posts suggest against using an arbitrarily large value for
max_locks_per_transaction to avoid excess memory usage by the lock table.
Contrary to that is an email from Tom Lane [1]/messages/by-id/25925.1669420577@sss.pgh.pa.us indicating that a lock table
with several million slots wouldn't be so bad. A SO answer from Laurenz
Albe [2]https://stackoverflow.com/a/65084207/2934470 tells us that a lock entry consumes 168 bytes so a table with 1m
slots would consume somewhere around 160MB (assuming we are on the same
architecture, but if not then at least still probably within an order of
magnitude). I suppose I could set the max_locks_per_transaction value
really high, but, on the other hand, if I can make a pretty good estimate
about how many locks are actually needed then I can set a lower value and
let some of that RAM be used for other things like disk caching instead.
So here I am, trying to estimate the amount of locks that would actually be
needed but have some questions and am running into some behavior that I do
not expect.

Most discussions regarding the lock table say that the size of the lock
table determines how many locks can be held. The documentation for
max_locks_per_transaction [3]https://www.postgresql.org/docs/current/runtime-config-locks.html#GUC-MAX-LOCKS-PER-TRANSACTION reads slightly different though, and in
particular this phrases stands out to me:

no more than this many distinct objects can be locked at any one time

To me, that seems to be saying that multiple locks for the same object
(e.g. for a single table) would only consume a single lock table entry.
Finally on to my first question: Am I interpreting the documentation
correctly, that multiple locks for the same object only consume a single
lock table entry, or am I reading too much into this and the size of the
lock table really does dictate the total number of locks regardless of
whether those locks point to the same object? If my interpretation is
correct, then I can size the lock table to be slightly above the total
number of database objects and I should be safe.

I tried to test my understanding by creating two database connections and
in each connection issuing a query that touched a large number of tables
(enough to almost reach the size of the lock table). My theory was that if
multiple locks for the same object only consume a single lock table entry,
then multiple connections could each issue the same big query that used a
large number of locks and no issue would arise. For example, with the
default settings (max_connections = 100, max_prepared_statements = 0,
max_locks_per_transaction = 64) I should have a lock table with 6400 slots
so two connections that both touch 6000 database objects should be able to
run concurrently. That test succeeded, but led me to the next
observation...

To my surprise, during my testing I was able to issue queries that used
well above the maximum number of locks that I was expecting and those
queries completed without issue. Below are the relevant database settings
and the results of a statement that locks many objects.

archive=# show max_connections ;
max_connections
-----------------
100
(1 row)
archive=# show max_prepared_transactions ;
max_prepared_transactions
---------------------------
0
(1 row)
archive=# show max_locks_per_transaction ;
max_locks_per_transaction
---------------------------
64
(1 row)
archive=# begin;
BEGIN
archive=*# explain <large select statement that joins many partitions>
<explain output omitted for brevity>
archive=*# select count(*) as total, count(*) filter (where granted) as
granted from pg_locks;
total | granted
-------+---------
7568 | 7568
(1 row)
archive=*# select count(distinct relation) from pg_locks ;
count
-------
7576
(1 row)

According to the documentation, I should have a lock table with 6400
entries but somehow have been able to obtain 7576 locks. So my second and
last question: how is this possible - shouldn't I have received an "out of
shared memory" error since I exceeded 6400 distinct locked objects?

[1]: /messages/by-id/25925.1669420577@sss.pgh.pa.us
[2]: https://stackoverflow.com/a/65084207/2934470
[3]: https://www.postgresql.org/docs/current/runtime-config-locks.html#GUC-MAX-LOCKS-PER-TRANSACTION
https://www.postgresql.org/docs/current/runtime-config-locks.html#GUC-MAX-LOCKS-PER-TRANSACTION

Craig

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig McIlwee (#1)
Re: Understanding max_locks_per_transaction

Craig McIlwee <craigm@vt.edu> writes:

Most discussions regarding the lock table say that the size of the lock
table determines how many locks can be held. The documentation for
max_locks_per_transaction [3] reads slightly different though, and in
particular this phrases stands out to me:

no more than this many distinct objects can be locked at any one time

To me, that seems to be saying that multiple locks for the same object
(e.g. for a single table) would only consume a single lock table entry.
Finally on to my first question: Am I interpreting the documentation
correctly, that multiple locks for the same object only consume a single
lock table entry,

Yes ... however it's a good deal more complicated than that.

What actually happens under the hood is that we allocate enough shared
memory space for (MaxBackends + max_prepared_transactions) *
max_locks_per_transaction LOCK structs (which are the per-locked-object
entries) and twice that many PROCLOCK structs, which are
per-lock-per-holder information. The 2X multiplier assumes that on
average about two sessions will be holding/requesting locks on any
specific locked object.

Now, MaxBackends is more than max_connections, because it also
accounts for autovacuum workers, parallel workers, etc. So that's
one of the sources of the fuzzy limit you noticed. The other source
is that we allocate about 100K more shared memory space than we think
we need, and it's possible for the lock tables to expand into that
"slop" space. I've not checked the sizes of these structs lately,
but the slop space could surely accommodate several hundred more
locks than the initial estimate allows.

Certainly it's safe to raise max_prepared_transactions a good deal
on modern machines, but I'm not sure that you can reasonably get
to a place where there is a mathematical guarantee that you won't
run out of shared memory. Even if you know how many lockable
objects your installation has (which I bet you don't, or at least
the number isn't likely to hold still for long) it's pretty hard
to say exactly how many PROCLOCK entries you might need. And
bloating the lock table size by max_connections/2 or so to try
to brute-force that doesn't seem like a good plan.

I'd just raise max_prepared_transactions until you stop seeing
problems, and then maybe add a factor of two safety margin.

regards, tom lane

#3Craig McIlwee
craigm@vt.edu
In reply to: Tom Lane (#2)
Re: Understanding max_locks_per_transaction

On Mon, Oct 16, 2023 at 2:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Craig McIlwee <craigm@vt.edu> writes:

Most discussions regarding the lock table say that the size of the lock
table determines how many locks can be held. The documentation for
max_locks_per_transaction [3] reads slightly different though, and in
particular this phrases stands out to me:

no more than this many distinct objects can be locked at any one time

To me, that seems to be saying that multiple locks for the same object
(e.g. for a single table) would only consume a single lock table entry.
Finally on to my first question: Am I interpreting the documentation
correctly, that multiple locks for the same object only consume a single
lock table entry,

Yes ... however it's a good deal more complicated than that.

What actually happens under the hood is that we allocate enough shared
memory space for (MaxBackends + max_prepared_transactions) *
max_locks_per_transaction LOCK structs (which are the per-locked-object
entries) and twice that many PROCLOCK structs, which are
per-lock-per-holder information. The 2X multiplier assumes that on
average about two sessions will be holding/requesting locks on any
specific locked object.

I think that explains why I could successfully issue the same large
statement from 2 concurrent transactions but then encounter an error when
trying a third.

Now, MaxBackends is more than max_connections, because it also
accounts for autovacuum workers, parallel workers, etc. So that's
one of the sources of the fuzzy limit you noticed. The other source
is that we allocate about 100K more shared memory space than we think
we need, and it's possible for the lock tables to expand into that
"slop" space. I've not checked the sizes of these structs lately,
but the slop space could surely accommodate several hundred more
locks than the initial estimate allows.

OK, those 2 bits of fuzz look like they could increase the lock table size
by at least 1000 entries, maybe even 2000, which explains how I could get
~7500 locks without an error. 100K / 168 = 609 extra slots. Assuming
MaxBackends is 10 higher than max_connections (wild guess there, but maybe
it's even higher?), then that's another 640 slots.

Even if you know how many lockable objects your installation has (which I
bet you don't, or at least the number isn't likely to hold still for long)

Not exactly, but we have time based (monthly) partitioning and can make a
reasonable estimate on how many years an installation will run so our guess
wouldn't be too far off. My plan was to make an estimate then increase by
50% or maybe even 100% to account for new objects that could be added later.

it's pretty hard to say exactly how many PROCLOCK entries you might need.

Yeah, good point there, but I suppose our connection pool's relatively low
max connection limit could be used as an upper bound.

And bloating the lock table size by max_connections/2 or so to try
to brute-force that doesn't seem like a good plan.

I'm not following - where does max_connections/2 come from?

I'd just raise max_prepared_transactions until you stop seeing
problems, and then maybe add a factor of two safety margin.

That's what we've already done for the short term solution. It is somewhat
in conflict with your statement regarding the number of lockable objects
not holding still for long, though. As time goes on and our scheduled jobs
automatically create new monthly partitions, or as our schema evolves, we
may eventually hit the limits again. That's why we'd like to create some
formula that can estimate the max_locks_per_transaction value we should
configure (with the previously mentioned multiplier for safety / future
proofing). An alternative would be to precreate all partitions we
anticipate needing so we don't get surprises down the line, but then we
incur extra planning cost for tables that will stay empty for months or
even years.

Craig

#4Ron
ronljohnsonjr@gmail.com
In reply to: Craig McIlwee (#3)
Re: Understanding max_locks_per_transaction

On 10/16/23 14:31, Craig McIlwee wrote:

That's what we've already done for the short term solution.  It is
somewhat in conflict with your statement regarding the number of lockable
objects not holding still for long, though.  As time goes on and our
scheduled jobs automatically create new monthly partitions, or as our
schema evolves, we may eventually hit the limits again.  That's why we'd
like to create some formula that can estimate the
max_locks_per_transaction value we should configure (with the previously
mentioned multiplier for safety / future proofing).  An alternative would
be to precreate all partitions we anticipate needing so we don't get
surprises down the line, but then we incur extra planning cost for tables
that will stay empty for months or even years.

Just set max_locks_per_transaction to Something Big, and go on to other
business.  16384 worked for us, after slowly inching up towards 1000. Tom's
response let me not worry about setting "so big", and we haven't had any
problems since.

--
Born in Arizona, moved to Babylonia.