Out of Shared Memory: max_locks_per_transaction

Started by Eliot Gable · 5 messages · pgsql-general
#1Eliot Gable
egable+pgsql-general@gmail.com

I have a PGSQL 9.0.1 database which is on the back-end of an app I was
stress testing last night. This morning, when I try to run psql, I get:

psql: FATAL: out of shared memory
HINT: You might need to increase max_locks_per_transaction.

I believe something in the app is failing to release resources -- maybe a
3-way or 4-way deadlock between writing to tables inside the triggers in
PGSQL or a deadlock between multiple processes talking to the app and the
database or something leaking in the app itself which is causing locks to
not be freed.

How do I track down what is going on if I cannot even run psql to get into
the DB to run troubleshooting queries?

Thanks in advance for any suggestions.

--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our
children." ~David Brower

"I decided the words were too conservative for me. We're not borrowing from
our children, we're stealing from them--and it's not even considered to be
a crime." ~David Brower

"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live;
not live to eat.) ~Marcus Tullius Cicero

#2Merlin Moncure
mmoncure@gmail.com
In reply to: Eliot Gable (#1)
Re: Out of Shared Memory: max_locks_per_transaction

On Fri, Nov 9, 2012 at 8:43 AM, Eliot Gable
<egable+pgsql-general@gmail.com> wrote:

I have a PGSQL 9.0.1 database which is on the back-end of an app I was
stress testing last night. This morning, when I try to run psql, I get:

psql: FATAL: out of shared memory
HINT: You might need to increase max_locks_per_transaction.

I believe something in the app is failing to release resources -- maybe a
3-way or 4-way deadlock between writing to tables inside the triggers in
PGSQL or a deadlock between multiple processes talking to the app and the
database or something leaking in the app itself which is causing locks to
not be freed.

most likely possibility is that you have a transaction being left
open and accumulating locks. of course, you have to rule out the
possibility that you simply need to increase
max_locks_per_transaction: if you have a lot of tables, it might be
reasonable to have to extend this on a stock config.
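
for reference, if raising it does turn out to be the right fix, it's
a postgresql.conf change and takes a full server restart to take
effect. a sketch (the value here is purely illustrative, not a
recommendation):

```
# postgresql.conf -- illustrative value only.
# the shared lock table is sized at roughly
#   max_locks_per_transaction * (max_connections + max_prepared_transactions)
# so raising either setting grows it. requires a server restart.
max_locks_per_transaction = 128    # default is 64
```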

one thing that can cause this unfortunately is advisory locks eating
up exactly the amount of shared memory you have. that's another thing
to rule out.
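
once you can get a session in, a query along these lines shows
whether advisory locks are the culprit (pg_locks is the standard
system view; nothing here is specific to your setup):

```sql
-- count lock table entries by type; a large 'advisory' bucket
-- means advisory locks are eating the shared lock table
SELECT locktype, count(*)
FROM pg_locks
GROUP BY locktype
ORDER BY count(*) DESC;
```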

How do I track down what is going on if I cannot even run psql to get into
the DB to run troubleshooting queries?

it's a nasty problem. if shared memory is exhausted and stuck i think
the only practical thing to do is to restart the database or nuke the
attached clients. maybe try restarting the test, but keep an open
session *with an open transaction* that has previously queried both
pg_locks and pg_stat_activity. it's just a hunch, but perhaps this
might allow you to query said views and gather some details.
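
concretely, the idea is something like this in a spare psql session
opened before the test starts (querying the views once up front is
the point -- it's a hunch that a backend which has already touched
them can still read them later):

```sql
-- spare psql session, opened BEFORE shared memory is exhausted
BEGIN;
SELECT count(*) FROM pg_locks;          -- touch the view once
SELECT count(*) FROM pg_stat_activity;  -- same here
-- leave the transaction open; when the problem reproduces,
-- re-run the queries from this same session
```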

merlin

#3Eliot Gable
egable+pgsql-general@gmail.com
In reply to: Merlin Moncure (#2)
Re: Out of Shared Memory: max_locks_per_transaction

most likely possibility is that you have a transaction being left
open and accumulating locks. of course, you have to rule out the
possibility that you simply need to increase
max_locks_per_transaction: if you have a lot of tables, it might be
reasonable to have to extend this on a stock config.

We allow 500 connections to the DB with 64 locks per transaction = 32,000
locks. During the stress testing, I had roughly 40 simultaneous operations
going through the test application. The test application uses a separate
set of threads for accessing the database along with a shared connection
pool and a FIFO queue attached to each connection. Queries are mostly
insert, update, and delete, so they are batched into transactions in blocks
of 100 - 1000 queries per transaction. At the start of the transaction, a
stored procedure is called which acquires locks on 8 tables in a specific
order to prevent triggers on the associated tables from deadlocking with
each other and with other things accessing the database. In total, there
might be 15 tables (at most) touched by the batch of queries.
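
For what it's worth, that 32,000 figure matches the lock table sizing
formula from the docs (max_prepared_transactions is assumed to be 0
here, since it isn't mentioned above):

```sql
-- shared lock table capacity:
--   max_locks_per_transaction * (max_connections + max_prepared_transactions)
SELECT 64 * (500 + 0) AS lock_table_slots;  -- 32000
```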

Another process comes along and processes records which are being inserted
into the database. It pulls up to 10 records from a table, processes them,
and moves those records into a "processed" table. The processing of the
records is rather complex. To facilitate the processing, 6 temporary tables
are created during the processing of each record, and then dropped after
that record is completed. 8 additional tables are accessed in some way
during the processing of each record. Each call to the processing stored
procedure is run in its own transaction and handles only those 10 records
at a time. This is done to keep the length of the transaction short so it
does not block other activity in the database.

one thing that can cause this unfortunately is advisory locks eating
up exactly the amount of shared memory you have. that's another thing
to rule out.

How would I rule this out?

How do I track down what is going on if I cannot even run psql to get
into the DB to run troubleshooting queries?

it's a nasty problem. if shared memory is exhausted and stuck i think
the only practical thing to do is to restart the database or nuke the
attached clients. maybe try restarting the test, but keep an open
session *with an open transaction* that has previously queried both
pg_locks and pg_stat_activity. it's just a hunch, but perhaps this
might allow you to query said views and gather some details.

That is an interesting suggestion. I will definitely give that a try.

Is the pg_locks table the table I would query to see what is eating up
those 32,000 locks? Is there some other table or query I could run which
might provide additional information about those 32,000 locks and who /
what is using them?
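
In case it helps anyone else reading: joining pg_locks to
pg_stat_activity is the usual way to see who holds what. Something
along these lines (column names as of 9.0, where the activity view
still exposes procpid and current_query):

```sql
-- which backends hold the most lock table entries
-- (PostgreSQL 9.0: join on procpid; 9.2+ renamed it to pid)
SELECT a.procpid, a.current_query, count(*) AS locks_held
FROM pg_locks l
JOIN pg_stat_activity a ON a.procpid = l.pid
GROUP BY a.procpid, a.current_query
ORDER BY locks_held DESC
LIMIT 10;
```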

Thanks for the info and your suggestions!

#4Eliot Gable
egable+pgsql-general@gmail.com
In reply to: Eliot Gable (#3)
Re: Out of Shared Memory: max_locks_per_transaction

Another process comes along and processes records which are being inserted
into the database. It pulls up to 10 records from a table, processes them,
and moves those records into a "processed" table. The processing of the
records is rather complex. To facilitate the processing, 6 temporary tables
are created during the processing of each record, and then dropped after
that record is completed. 8 additional tables are accessed in some way
during the processing of each record. Each call to the processing stored
procedure is run in its own transaction and handles only those 10 records
at a time. This is done to keep the length of the transaction short so it
does not block other activity in the database.

one thing that can cause this unfortunately is advisory locks eating
up exactly the amount of shared memory you have. that's another thing
to rule out.

How would I rule this out?

It really was filling the locks table.

Using your suggestion, I managed to catch it in the process of the bad
behavior, before it exhausted all lock entries. After some sleuthing
through the resulting pg_locks output and my other code, I was able to
isolate and resolve the issue. Basically, there was a call going on which
tried to materialize a stats table based on thousands of records instead of
10 at a time. It was supposed to just be materializing the base rows in
that table, all zeroed out, not based on any of the records. However, it
does so using the same function which actually crunches numbers for the
records, and it was coded to try all records from start of day until the
function was run!

Thanks for the assist.

-Eliot

#5Merlin Moncure
mmoncure@gmail.com
In reply to: Eliot Gable (#4)
Re: Out of Shared Memory: max_locks_per_transaction

On Fri, Nov 9, 2012 at 2:50 PM, Eliot Gable
<egable+pgsql-general@gmail.com> wrote:

one thing that can cause this unfortunately is advisory locks eating
up exactly the amount of shared memory you have. that's another thing
to rule out.

How would I rule this out?

It really was filling the locks table.

Using your suggestion, I managed to catch it in the process of the bad
behavior, before it exhausted all lock entries. After some sleuthing through
the resulting pg_locks output and my other code, I was able to isolate and
resolve the issue. Basically, there was a call going on which tried to
materialize a stats table based on thousands of records instead of 10 at a
time. It was supposed to just be materializing the base rows in that table,
all zeroed out, not based on any of the records. However, it does so using
the same function which actually crunches numbers for the records, and it
was coded to try all records from start of day until the function was run!

awesome...glad I could help. in the case of advisory locks, to help
reduce the likelihood of things like this happening, it's always
better to use the recently added 'xact' flavor of the functions that
release the lock at 'end of transaction' when possible.
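
i.e. something along these lines -- the xact variant needs no
explicit unlock and can't be leaked past commit/rollback (the key 42
is just a placeholder):

```sql
BEGIN;
-- session-level: must be explicitly released; leaks if you forget
SELECT pg_advisory_lock(42);
SELECT pg_advisory_unlock(42);

-- transaction-level: released automatically at COMMIT/ROLLBACK
SELECT pg_advisory_xact_lock(42);
COMMIT;
```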

merlin