debugging intermittent slow updates under higher load
Hi All,
This is on postgres 9.4.16, same table as the last question I asked,
here's an abbreviated desc:
# \d alerts_alert
Table "public.alerts_alert"
     Column      |           Type           | Modifiers
-----------------+--------------------------+-----------
 tags            | jsonb                    | not null
 id              | character varying(86)    | not null
 ...
Indexes:
"alerts_alert_pkey" PRIMARY KEY, btree (id)
The table has around 1.5M rows and has seen around 121M inserts/updates
in total; the distribution of updates per row in alerts_alert is quite
uneven, ranging from a single insert up to one insert followed by 0.5M updates.
Under high load (200-300 inserts/updates per second) we see occasional
(~10 per hour) updates taking excessively long times (2-10s). These
updates are always of the form:
UPDATE "alerts_alert" SET ...bunch of fields... WHERE
"alerts_alert"."id" = '...sha1 hash...';
Here's a sample explain:
https://explain.depesz.com/s/Fjq8
What could be causing this? What could we do to debug? What config
changes could we make to alleviate this?
cheers,
Chris
Hello Chris,
One possible reason is that the row was already locked by another backend,
performing the same kind of update or something else.
Are these updates performed inside longer transactions?
Can they hit the same row from two clients at the same time?
Is there any other write or select-for-update/share load on the table?
Have you tried periodic logging of non-granted locks?
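For example, something like this in postgresql.conf will log every lock
wait that lasts longer than deadlock_timeout (the values here are
illustrative starting points, not recommendations):

```
log_lock_waits = on
deadlock_timeout = 1s               # lock waits longer than this get logged
log_min_duration_statement = 1000   # also log any statement taking over 1s
```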
Try querying pg_stat_activity and pg_locks (possibly joined, and maybe
repeatedly self-joined; search for examples) to find the backends that
wait on one another while competing to lock the same row or object.
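A starting point along those lines, adapted from the commonly circulated
lock-monitoring queries (run it while an update is stuck; the column
names are as of 9.4):

```sql
-- Pair each waiting lock request with the granted lock it is waiting behind.
SELECT waiting.pid       AS waiting_pid,
       waiting_act.query AS waiting_query,
       holding.pid       AS holding_pid,
       holding_act.query AS holding_query
FROM pg_locks waiting
JOIN pg_locks holding
  ON  holding.locktype      IS NOT DISTINCT FROM waiting.locktype
  AND holding.database      IS NOT DISTINCT FROM waiting.database
  AND holding.relation      IS NOT DISTINCT FROM waiting.relation
  AND holding.page          IS NOT DISTINCT FROM waiting.page
  AND holding.tuple         IS NOT DISTINCT FROM waiting.tuple
  AND holding.virtualxid    IS NOT DISTINCT FROM waiting.virtualxid
  AND holding.transactionid IS NOT DISTINCT FROM waiting.transactionid
  AND holding.pid <> waiting.pid
JOIN pg_stat_activity waiting_act ON waiting_act.pid = waiting.pid
JOIN pg_stat_activity holding_act ON holding_act.pid = holding.pid
WHERE NOT waiting.granted
  AND holding.granted;
```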
Best,
Alex
Also read about HOT (heap-only tuple) updates and the storage parameter
"fillfactor": if none of the updated fields are indexed and there is free
space in the block, the new row version can be placed in the same data
block instead of a new one.
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/
This parameter can be updated on a "per table" basis.
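A sketch of what that could look like (the 80 here is an arbitrary
illustrative value, not a recommendation):

```sql
-- Reserve 20% free space in each heap block for future HOT updates.
ALTER TABLE alerts_alert SET (fillfactor = 80);

-- Existing blocks only gain the free space once they are rewritten,
-- e.g. via VACUUM FULL (takes an exclusive lock) or a tool like pg_repack.
```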
On Wed, 5 Dec 2018 at 09:47, Rene Romero Benavides
<rene.romero.b@gmail.com> wrote:
> Also read about hot updates and the storage parameter "fillfactor", so
> data blocks can be recycled instead of creating new ones if the updated
> fields don't also update indexes.
On 05/12/2018 15:40, Alexey Bashtanov wrote:
> One possible reason is that the row was already locked by another backend,
> performing the same kind of update or something else.
> Are these updates performed inside longer transactions?
Nope, the transaction will just be updating one row at a time.
> Can they hit the same row from two clients at the same time?
I've looked for evidence of this but can't find any. Certainly nothing
running for 2-10s; queries against this table normally take a few hundred ms.
> Is there any other write or select-for-update/share load on the table?
Not that I'm aware of. How would I go about getting metrics on problems
like these?
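A generic way to check what write traffic the table actually receives is
the statistics collector; for example, a low HOT-update ratio here would
point back at the fillfactor/index discussion:

```sql
-- Cumulative write activity and HOT-update counts for the table.
SELECT n_tup_ins, n_tup_upd, n_tup_hot_upd, n_tup_del,
       n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'alerts_alert';
```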
> Have you tried periodic logging of non-granted locks?
> Try querying pg_stat_activity and pg_locks (possibly joined, and maybe
> repeatedly self-joined) to find the backends that wait on one another
> while competing to lock the same row or object.
Is there any existing tooling that does this? I'm loath to start hacking
something up when I'd hope others have done a better job already...
Chris
On 05/12/2018 15:47, Rene Romero Benavides wrote:
> Also read about hot updates and the storage parameter
> "fillfactor", so data blocks can be recycled instead of creating new
> ones if the updated fields don't also update indexes.
I have read about these, but I'd prefer not to be making
opportunistic/guessing changes on this.
How can I collect metrics/logging/etc evidence to confirm what the
problem actually is?
cheers,
Chris
> Is there any existing tooling that does this?
There must be some; google for queries involving pg_locks.
> I'm loath to start hacking something up when I'd hope others have done
> a better job already...
If you log all queries that take more than a second to complete, is your
update the only one logged, or does something else (the would-be blocker)
get logged along with it?
On 06/12/2018 11:00, Alexey Bashtanov wrote:
>> I'm loath to start hacking something up when I'd hope others have done
>> a better job already...
> If you log all queries that take more than a second to complete, is your
> update the only one logged, or does something else (the would-be blocker)
> get logged along with it?
Nope, the only ones logged are these updates.
Chris
Hi
On Thu, 6 Dec 2018 at 12:18, Chris Withers <chris@withers.org> wrote:
>> If you log all queries that take more than a second to complete, is your
>> update the only one logged, or does something else (the would-be blocker)
>> get logged along with it?
> Nope, the only ones logged are these updates.
Can you check the latency of your file system? Latency spikes can be
caused by an overloaded file system due to a wrong configuration of the
file system cache:
https://serverfault.com/questions/471070/linux-file-system-cache-move-data-from-dirty-to-writeback
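Along the lines of the linked answer, the dirty-cache thresholds can be
capped so writeback happens in small continuous batches rather than
multi-second bursts (the byte values below are illustrative, not
recommendations):

```
# /etc/sysctl.conf (or sysctl -w ...): cap the dirty page cache
vm.dirty_background_bytes = 268435456   # start background writeback at 256 MB
vm.dirty_bytes = 1073741824             # throttle writers at 1 GB of dirty data
```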
Regards
Pavel