Avoiding deadlocks when performing bulk update and delete operations
Hi All,
We have a single table which does not have any foreign key references.
id_A (bigint)
id_B (bigint)
val_1 (varchar)
val_2 (varchar)
The primary key of the table is a composite of id_A and id_B.
Reads and writes of this table are highly concurrent and the table has
millions of rows. We have several stored procedures which do mass updates
and deletes. Those stored procedures are being called concurrently mainly
by triggers and application code.
The operations usually look like the following, where a single call can
match thousands of records to update or delete:
DELETE FROM table_name t
USING (
    SELECT id_A, id_B
    FROM table_name
    WHERE id_A = ANY(array_of_id_A)
      AND id_B = ANY(array_of_id_B)
    ORDER BY id_A, id_B
    FOR UPDATE
) del
WHERE t.id_A = del.id_A
  AND t.id_B = del.id_B;
UPDATE table_name t
SET val_1 = 'some value'
  , val_2 = 'some value'
FROM (
    SELECT id_A, id_B
    FROM table_name
    WHERE id_A = ANY(array_of_id_A)
      AND id_B = ANY(array_of_id_B)
    ORDER BY id_A, id_B
    FOR UPDATE
) upd
WHERE t.id_A = upd.id_A
  AND t.id_B = upd.id_B;
We are experiencing deadlocks, and all our attempts to perform these
operations with locks (row-level locks using SELECT ... FOR UPDATE as in
the queries above, and table-level locks) have not solved the deadlock
issues. (Note that we cannot in any way use ACCESS EXCLUSIVE locking on
this table because of the performance impact.)
Is there another way that we could try to solve these deadlock situations?
The reference manual says: "The best defense against deadlocks is
generally to avoid them by being certain that all applications using a
database acquire locks on multiple objects in a consistent order."
Is there a guaranteed way to do bulk update/delete operations in a
particular order so that we can ensure deadlocks won't occur? Or are there
any other tricks to avoid deadlocks in this situation?
Thank you in advance,
Sanjaya
On 24/11/14 16:51, Sanjaya Vithanagama wrote:
Hi All,
We have a single table which does not have any foreign key references.
<snip>
Is there a guaranteed way to do bulk update/delete operations in a
particular order so that we can ensure deadlocks won't occur? Or are
there any other tricks to avoid deadlocks in this situation?
Unless there is some sort of implied locking, or other nonsense like
different lock types, then always acquiring locks in the same order
should work - as far as I can tell.
For purely locking problems, and assuming that all the relevant tables
are locked:
If process p1 acquires locks in the order A, B, C,
and process p2 acquires locks in the order A, C, D,
then as soon as one process grabs A, the other process cannot grab A,
and therefore cannot reach its later locks - so deadlock is avoided.
Similarly:
p1 A, B, C, & D
p2 B & C
However, if p1 grabs A and then p2 grabs B, p1 will have to wait for p2
to finish before p1 continues - but still, neither is deadlocked. Even
if there is a p3 which locks B, at worst 2 processes will wait until the
lucky first process releases its locks.
You may still have problems if some resource is in contention, where 2
processes require the resource and grab it in several parts at different
times: both grab some, and then there is not enough left to completely
satisfy either. This is guesswork - I'm not sure what resources (if any)
would be a problem here.
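The consistent-ordering rule above can be sketched outside the database. A minimal Python illustration (not PostgreSQL-specific, and the ranking scheme is just an assumption for the sketch): give every lock a global rank and always acquire in rank order, even when callers ask for the locks in different orders.

```python
import threading

# Give every lock a global rank and ALWAYS acquire in rank order,
# regardless of the order a caller asks for them.
lock_a = threading.Lock()
lock_b = threading.Lock()
RANK = {id(lock_a): 0, id(lock_b): 1}

def acquire_in_order(*locks):
    """Acquire the given locks in their global rank order."""
    ordered = sorted(locks, key=lambda l: RANK[id(l)])
    for l in ordered:
        l.acquire()
    return ordered

results = []

def worker(name, *locks):
    held = acquire_in_order(*locks)
    try:
        results.append(name)
    finally:
        for l in reversed(held):
            l.release()

# p1 asks for (A, B) and p2 asks for (B, A) - the classic deadlock recipe.
# Because both actually acquire in rank order, neither can hold B while
# waiting for A, and both finish.
t1 = threading.Thread(target=worker, args=("p1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("p2", lock_b, lock_a))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(results))  # ['p1', 'p2']
```

The catch, as discussed later in this thread, is that inside a single UPDATE or DELETE statement the row lock order is up to the planner, not the application.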
Cheers,
Gavin
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Sanjaya Vithanagama
Sent: Sunday, November 23, 2014 10:52 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Avoiding deadlocks when performing bulk update and delete operations
<snip>
DELETE FROM table_name t
USING (
    SELECT id_A, id_B
    FROM table_name
    WHERE id_A = ANY(array_of_id_A)
      AND id_B = ANY(array_of_id_B)
    ORDER BY id_A, id_B
    FOR UPDATE
) del
WHERE t.id_A = del.id_A
  AND t.id_B = del.id_B;
<snip>
Maybe I'm missing something here, but it seems that you are making the problem worse by using:
DELETE … USING (SELECT … FOR UPDATE)…
Can’t you just do:
DELETE FROM table_name
WHERE id_A = ANY(array_of_id_A)
AND id_B = ANY(array_of_id_B);
?
Regards,
Igor Neyman
Is there a good way to individually list the DDL for all individual
database objects?
Running a database dump like:
pg_dump mydatabase > mydatabase-database.sql
produces one big file with all the DDL and DML to restore the database,
which is very convenient for most cases.
Using that I have sometimes cut-and-pasted out of it the DDL for
individual items that I want to work on, and then fed that back in with
psql mydatabase < newstuff.sql
What I would like, though, is a set of output files, one for each
DDL/DML item currently represented in the one big file.
I could maybe attempt some convoluted automated parsing of the big file
with, for example, Perl, but that seems, well, convoluted, error-prone,
and probably fragile.
The directory output format for pg_dump is similar to, but not really,
what I want (as far as I can see, the individual files it produces are
only the ones for reloading data, so correct me if I am wrong ... and
please show me how to do it right!)
I have played around with the custom format dump followed by pg_restore
and various options, but did not get what I wanted, at least not as
elegantly as I wanted.
What I have come up with is fairly simple in appearance, but the way it
works, by reading one line-at-a-time from the list file associated with
the dump file, and then running pg_restore with just that one line,
rinse and repeat for each piece of DDL/DML, also seems convoluted and
potentially fragile.
Something along the lines of (... if anyone thinks this a good idea, or
good starting point ...):
grep -v '^;' listfile | while read a b c n
do
    a=${a/;}                          # strip the ';' from the dump id
    echo "$a" > f                     # one-line list file selecting this entry
    pg_restore -L f -f "outputdir/$a" dumpfile
done
This, as it is, creates a set of files named according to the id number
that pg_dump uses to identify each element. Ideally, I would like the
files named after the schema+object it represents.
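For what it's worth, the schema+object name could be derived by parsing the list file itself rather than using the dump id. A sketch in Python; note the line format and the type list below are assumptions about `pg_restore -l` output, so check them against your own list file before relying on this:

```python
import re

# A data line of a `pg_restore -l` list file looks roughly like:
#   2398; 1259 16712 TABLE public my_table db_owner
# i.e. "<dumpId>; <catalogOid> <objectOid> <TYPE> <schema> <name> <owner>".
# That format (and the type list below) is an assumption made for this
# sketch - verify it against your own list file.
ENTRY = re.compile(r'^(\d+);\s+\d+\s+\d+\s+(.+)$')
TYPES = ("TABLE DATA", "TABLE", "VIEW", "FUNCTION", "SEQUENCE", "INDEX")

def entry_name(line):
    """Map a list-file line to (dump_id, 'schema.object'), or None for
    comment lines and entry types we don't recognize."""
    m = ENTRY.match(line)
    if not m:
        return None
    dump_id, rest = m.groups()
    for t in TYPES:
        if rest.startswith(t + " "):
            fields = rest[len(t):].split()
            if len(fields) >= 2:  # schema, name, owner...
                return dump_id, "%s.%s" % (fields[0], fields[1])
    return None

# Each hit could then drive one pg_restore call: write dump_id to a
# one-line list file and run something like
#   pg_restore -L that_file -f outputdir/<schema.object>.sql dumpfile
```

That keeps the output file names stable across dumps even when the internal ids change.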
Thanks for your help!
On 2014-11-24, at 10:14, Berend Tober <btober@broadstripe.net> wrote:
Is there a good way to individually list the DDL for all individual database objects?
<snip>
<snip>
This, as it is, creates a set of files named according to the id number
that pg_dump uses to identify each element. Ideally, I would like the
files named after the schema+object it represents.
Were you aware that pg_restore can restore to STDOUT, and output DDL for only a single named object from a custom dump file?
$ pg_restore --help
…
-f, --file=FILENAME output file name
…
-t, --table=NAME restore named table
…
Such that you could run:
$ pg_restore -f public.mytable.sql -t public.mytable whatever.pgdump
Unfortunately, this does not respect dependencies and you may have issues. The --disable-triggers option can alleviate some of these problems. YMMV.
Hope that helps!
François
Berend Tober wrote:
Is there a good way to individually list the DDL for all individual
database objects?
These threads might interest you:
/messages/by-id/AANLkTikLHA2x6U=q-t0j0YS78txHFmdtyxJfsrsRcLqN@mail.gmail.com
/messages/by-id/CAASwCXdKROy2UHLof4Zvxx00goSccqXkg_g4Yf9vOuK8aXWh0g@mail.gmail.com
--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
François Beausoleil wrote:
On 2014-11-24, at 10:14, Berend Tober <btober@broadstripe.net> wrote:
Is there a good way to individually list the DDL for all individual database objects?
<snip>
Were you aware that pg_restore can restore to STDOUT, and output DDL for only a single named object from a custom dump file?
Yes sir, thank you. That is what the script does.
But that little script seems fragilely dependent upon the format of the
list file, and runs (starts a new process for) pg_restore for every
individual line in the list file ... which seems like poor practice,
generally. Additionally, I'd like stability in the output file names,
rather than the (likely changing) internal, automatically assigned
numeric names.
And, btw, I want not just a single table.
The dependency issue is potentially a problem, but (as I believe) it
looks like the output of pg_restore in a list file is in the correct
order to process dependencies, so I could walk backwards or forwards
through that if changes to one object were dependent on, or caused
cascading effects in other objects.
You should probably look at the pg_extractor utility.
https://github.com/omniti-labs/pg_extractor
With it, you can dump individual or selected objects to separate
directories.
On Mon, Nov 24, 2014 at 11:00 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:
Berend Tober wrote:
Is there a good way to individually list the DDL for all individual
database objects?
These threads might interest you:
/messages/by-id/AANLkTikLHA2x6U=q-t0j0YS78txHFmdtyxJfsrsRcLqN@mail.gmail.com
/messages/by-id/CAASwCXdKROy2UHLof4Zvxx00goSccqXkg_g4Yf9vOuK8aXWh0g@mail.gmail.com
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
On 11/24/2014 08:12 AM, Berend Tober wrote:
François Beausoleil wrote:
On 2014-11-24, at 10:14, Berend Tober <btober@broadstripe.net> wrote:
Is there a good way to individually list the DDL for all individual
database objects?
<snip>
The dependency issue is potentially a problem, but (as I believe) it
looks like the output of pg_restore in a list file is in the correct
order to process dependencies, so I could walk backwards or forwards
through that if changes to one object were dependent on, or caused
cascading effects in other objects.
At this point I have to ask:
What is the problem you are trying to solve?
Your last paragraph seems to be reinventing the -Fc TOC, so I am not
sure where you are going with this?
--
Adrian Klaver
adrian.klaver@aklaver.com
Adrian Klaver wrote:
On 11/24/2014 08:12 AM, Berend Tober wrote:
<snip>
At this point I have to ask:
What is the problem you are trying to solve?
Your last paragraph seems to be reinventing the -Fc TOC, so I am not
sure where you are going with this?
On the small scale, I wanted to have an individual file listing the DDL
for each database object in case I need to modify the object - I could
start with the script that created it as a basis for modifications, for
views and functions, etc. On the larger scale, I was thinking I would
like to check all of the individual modules into revision control,
retaining fine-grained control, rather than committing the one big
comprehensive file.
Melvin Davidson wrote:
You should probably look at the pg_extractor utility.
https://github.com/omniti-labs/pg_extractor
With it, you can dump individual or selected objects to separate
directories.
That looks like what I'm looking for. (Note: I did search Google, but
apparently did not use the right search terms, because this tool really
is described as doing exactly what I am looking to do!)
Thanks!
On Mon, 24 Nov 2014 14:51:42 +1100
Sanjaya Vithanagama <svithanagama@gmail.com> wrote:
<snip>
Is there a guaranteed way to do bulk update/delete operations in a
particular order so that we can ensure deadlocks won't occur? Or are there
any other tricks to avoid deadlocks in this situation?
Lots of stuff to say about this ...
First off, Igor has a valid point that the subselects are not helping any
and may be making the situation slightly worse. I can't see any reason not
to simplify the queries as he suggested.
Secondly, a lot of your comments are too vague for me to understand what's
happening, so I'm going to ask a bunch of questions to clarify:
* How many UPDATE/INSERT queries are you running per second?
* How many in parallel on average?
* What's the typical execution time for an UPDATE/INSERT that might cause
this problem?
* How frequently do deadlocks occur?
* Are there other tables involved in the transactions ... i.e., have you
confirmed that these are the _only_ tables causing the deadlock?
Since you didn't include any log output, I'm fuzzy on some of those things
above ... but I'm assuming that you're unable to post specific details of
the precise problem.
I have a lot of suggestions, but instead of bombing you with all of them, I
think it would be better if you answered those questions, which will tell
me which suggestions are most likely to help.
--
Bill Moran
I need your help to succeed:
http://gamesbybill.com
On Tue, Nov 25, 2014 at 4:42 AM, Bill Moran <wmoran@potentialtech.com>
wrote:
<snip>
First off, Igor has a valid point that the subselects are not helping any
and may be making the situation slightly worse. I can't see any reason not
to simplify the queries as he suggested.
We used sub-selects in the delete queries with the hope that we could
lock all the rows to be deleted before the actual deletion happens
(so that another transaction won't grab the lock on a row which is
about to be deleted).
Secondly, a lot of your comments are too vague for me to understand what's
happening, so I'm going to ask a bunch of questions to clarify:
* How many UPDATE/INSERT queries are you running per second?
In peak hours it could be anywhere from 100-250. The problem occurs when
two triggers happen to update/delete the same rows at once.
* How many in parallel on average?
* What's the typical execution time for an UPDATE/INSERT that might cause
this problem?
The updates are the most problematic, with execution times in the
range of 5-50 seconds.
* How frequently do deadlocks occur?
We are seeing deadlocks about 2-3 times per day in the production server.
To reproduce the problem easily we've written a simple Java class with
multiple threads calling to the stored procedures running the above queries
inside a loop. This way we can easily recreate a scenario that happens in
the production.
* Are there other tables involved in the transactions ... i.e., have you
confirmed that these are the _only_ tables causing the deadlock?
Yes, there are no other tables involved in the queries, so we can eliminate
any deadlock issues related to foreign key references.
Since you didn't include any log output, I'm fuzzy on some of those things
above ... but I'm assuming that you're unable to post specific details of
the precise problem.
The log output looks like the following: (I have abstracted away the
function names and real queries, but this represents the actual issue)
ERROR: deadlock detected
DETAIL: Process 54624 waits for ShareLock on transaction 14164828; blocked
by process 54605.
Process 54605 waits for ShareLock on transaction 14164827; blocked by
process 54624.
Process 54624: SELECT 1 FROM proc_delete()
Process 54605: SELECT 1 FROM proc_update()
HINT: See server log for query details.
CONTEXT: SQL statement "UPDATE table_name t
SET val_1 = 'some value'
, val_2 = 'some value'
FROM (
SELECT id_A, id_B
FROM table_name
WHERE id_A = ANY(array_of_id_A)
AND id_B = ANY(array_of_id_B)
ORDER BY id_A, id_B
FOR UPDATE
) upd
WHERE t.id_A = upd.id_A
AND t.id_B = upd.id_B"
PL/pgSQL function proc_delete() line 22 at SQL statement
SQL statement "SELECT proc_delete()"
PL/pgSQL function calling_function() line 6 at PERFORM
STATEMENT: SELECT 1 FROM calling_function()
ERROR: current transaction is aborted, commands ignored until end of
transaction block
I have a lot of suggestions, but instead of bombing you with all of them, I
think it would be better if you answered those questions, which will tell
me which suggestions are most likely to help.
It'll be really good if we can get some ideas/alternative suggestions on
how to solve this one. It's been affecting our production servers for
weeks and we still haven't come across a concrete solution that fixes it.
Thank You.
--
Sanjaya
On Wed, 26 Nov 2014 10:41:56 +1100
Sanjaya Vithanagama <svithanagama@gmail.com> wrote:
* How frequently do deadlocks occur?
We are seeing deadlocks about 2-3 times per day in the production server.
To reproduce the problem easily we've written a simple Java class with
multiple threads calling to the stored procedures running the above queries
inside a loop. This way we can easily recreate a scenario that happens in
the production.
Don't overcomplicate your solution. Adjust your code to detect the deadlock
and replay the transaction when it happens. At 2-3 deadlocks per day, it's
difficult to justify any other solution (as any other solution would be
more time-consuming to implement, AND would interfere with performance).
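Replaying can be as simple as a retry loop around the transaction. A minimal Python sketch of the idea; `DeadlockDetected` here is a stand-in for whatever your driver raises for PostgreSQL's deadlock error (SQLSTATE 40P01), and the stub transaction just simulates a victim that succeeds on replay:

```python
import random
import time

class DeadlockDetected(Exception):
    """Stand-in for the driver's deadlock error (SQLSTATE 40P01)."""

def run_with_retry(txn, max_attempts=3):
    """Run a transaction callable, replaying it if it loses a deadlock."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DeadlockDetected:
            if attempt == max_attempts:
                raise
            # Brief randomized backoff so the two colliding callers
            # don't immediately collide again.
            time.sleep(random.uniform(0.01, 0.05) * attempt)

# Stub transaction: chosen as deadlock victim twice, then succeeds.
calls = {"n": 0}
def flaky_txn():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlockDetected()
    return "committed"

print(run_with_retry(flaky_txn))  # committed
```

In a real application the callable would open a fresh transaction and re-issue the same statements; the key point is that the retry lives in the caller, where the full context is available.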
I've worked with a number of write-heavy applications that experienced
deadlocks, some of them on the order of hundreds of deadlocks per day.
In some cases, you can adjust the queries to reduce the incidence of
deadlocks, or eliminate the possibility of deadlocks completely. The
situation that you describe is not one of those cases, as the planner
can choose to lock rows in whatever order it thinks it most efficient
and you don't have direct control over that.
The performance hit you'll take 2-3 times a day when a statement has to
be replayed due to deadlock will hardly be noticed (although a statement
that takes 50 seconds will cause raised eyebrows if it runs 2x) but that
will only happen 2-3 times a day, and the solution I'm proposing won't
have any performance impact on the other 13000000 queries per day that
don't deadlock.
2-3 deadlocks per day is normal operation for a heavily contended table,
in my experience.
--
Bill Moran
I need your help to succeed:
http://gamesbybill.com
On Wed, Nov 26, 2014 at 11:47 AM, Bill Moran <wmoran@potentialtech.com>
wrote:
<snip>
Don't overcomplicate your solution. Adjust your code to detect the deadlock
and replay the transaction when it happens. At 2-3 deadlocks per day, it's
difficult to justify any other solution (as any other solution would be
more time-consuming to implement, AND would interfere with performance).
When you say replay the transaction, I believe that is to catch the
exception inside the stored procedure? We've considered that option at one
stage, but the problem with that is we don't have enough context
information at the stored procedure where this deadlock occurs.
<snip>
2-3 deadlocks per day is normal operation for a heavily contended table,
in my experience.
Given that we have no control over how Postgres performs delete and update
operations, the only other possibility seems to be to partition this table
by id_A (so that the individual tables will never be deadlocked). But that
seems too extreme an option at this stage.
--
Sanjaya
On Thu, 27 Nov 2014 15:07:49 +1100
Sanjaya Vithanagama <svithanagama@gmail.com> wrote:
<snip>
When you say replay the transaction, I believe that is to catch the
exception inside the stored procedure? We've considered that option at one
stage, but the problem with that is we don't have enough context
information at the stored procedure where this deadlock occurs.
Why not catch it in the application calling the stored procedure?
I don't understand how you could not have enough context to run the command
you were just trying to run. Can you elaborate on what you mean by that?
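The replay approach described here can be sketched as a small retry wrapper in the application. PostgreSQL signals a deadlock with SQLSTATE 40P01 (deadlock_detected), and because the failed transaction has already been rolled back, it is safe to issue it again. The class and method names below are invented for illustration, and the simulated operation in main stands in for a real JDBC call:

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

public class DeadlockRetry {
    // PostgreSQL reports deadlocks with SQLSTATE 40P01 (deadlock_detected).
    static final String DEADLOCK_SQLSTATE = "40P01";

    // Run the operation; if it fails with a deadlock, replay it,
    // up to maxAttempts tries in total. Any other error propagates.
    static <T> T withDeadlockRetry(Callable<T> op, int maxAttempts) throws Exception {
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (SQLException e) {
                if (!DEADLOCK_SQLSTATE.equals(e.getSQLState())) throw e;
                last = e; // deadlock: the transaction was rolled back, safe to replay
            }
        }
        throw last; // still deadlocking after maxAttempts tries
    }

    public static void main(String[] args) throws Exception {
        // Simulated operation: deadlocks twice, then succeeds.
        int[] calls = {0};
        String result = withDeadlockRetry(() -> {
            if (++calls[0] < 3) throw new SQLException("deadlock detected", "40P01");
            return "done";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "done after 3 attempts"
    }
}
```

In real code the Callable would open a transaction, call the stored procedure through JDBC, and commit; a short randomized sleep between attempts also reduces the chance that the two colliding processes deadlock again on the replay.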
Given that we have no control over how Postgres performs delete and update
operations, the only other possibility seems to be to partition this table
by id_A (so that the individual tables will never be deadlocked). But that
seems too extreme an option at this stage.
That would be overcomplicating the solution, and almost certainly won't work
anyway. If you're getting deadlocks, it's because two processes are trying
to modify the same rows. Even if you partition, those same rows will be on
the same partition, so you'll still deadlock.
--
Bill Moran
I need your help to succeed:
http://gamesbybill.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Thu, 2014-11-27 at 06:49 -0500, Bill Moran wrote:
That would be overcomplicating the solution, and almost certainly won't work
anyway. If you're getting deadlocks, it's because two processes are trying
to modify the same rows. Even if you partition, those same rows will be on
the same partition, so you'll still deadlock.
I've been following this discussion for a few days.
In my experience, deadlocks are caused either by poor database design or
by the application.
As you are only talking about a single table, the problem has to be
with the application.
You mentioned that you are using Java, so can we assume there is a
class for the table with getters and setters?
The table should also have an action factory, where there ought to be
methods to handle updates and deletes.
Instead of passing id_A and id_B values via arrays, why not have a
method to which you pass an id_A value, an id_B value, a boolean to
indicate either update or delete, and the values for the update (or
nulls if deleting)?
Then you can start a transaction, lock the row for update, throw all
the exceptions you need, and gracefully handle any feedback to
your users.
If you cannot lock the row, just display an alert such as "Please try
again in five minutes".
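The database side of that per-pair method could look something like the
sketch below. It is only an illustration, assuming the table layout from the
start of the thread (table_name with id_A, id_B, val_1, val_2); the function
name modify_pair and its parameters are made up. SELECT ... FOR UPDATE NOWAIT
raises lock_not_available (SQLSTATE 55P03) instead of blocking, which the
caller can translate into exactly that "please try again" message:

```sql
-- Hypothetical helper: update or delete a single (id_A, id_B) pair.
-- Returns false when the row is locked by another transaction, so the
-- caller can tell the user to try again later instead of deadlocking.
CREATE OR REPLACE FUNCTION modify_pair(
    p_id_a bigint, p_id_b bigint,
    p_delete boolean, p_val_1 varchar, p_val_2 varchar
) RETURNS boolean AS $$
BEGIN
    -- NOWAIT raises lock_not_available instead of waiting for the lock.
    PERFORM 1 FROM table_name
     WHERE id_A = p_id_a AND id_B = p_id_b
       FOR UPDATE NOWAIT;

    IF p_delete THEN
        DELETE FROM table_name WHERE id_A = p_id_a AND id_B = p_id_b;
    ELSE
        UPDATE table_name
           SET val_1 = p_val_1, val_2 = p_val_2
         WHERE id_A = p_id_a AND id_B = p_id_b;
    END IF;
    RETURN true;
EXCEPTION WHEN lock_not_available THEN
    RETURN false;  -- row busy: surface "please try again" to the user
END;
$$ LANGUAGE plpgsql;
```

Because each call touches exactly one row and never waits on a held lock,
two concurrent callers cannot deadlock on this path, at the cost of one call
per pair instead of one bulk statement.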
HTH.
Rob