debugging intermittent slow updates under higher load
Hi All,
This is on postgres 9.4.16, same table as the last question I asked,
here's an abbreviated desc:
# \d alerts_alert
Table "public.alerts_alert"
     Column      |           Type           | Modifiers
-----------------+--------------------------+-----------
 tags            | jsonb                    | not null
 id              | character varying(86)    | not null
 ...
Indexes:
"alerts_alert_pkey" PRIMARY KEY, btree (id)
The table has around 1.5M rows and has seen around 121M inserts/updates
in total; the distribution of updates per row in alerts_alert is quite
uneven, ranging from a single insert up to one insert followed by 0.5M updates.
Under high load (200-300 inserts/updates per second) we see occasional
(~10 per hour) updates taking excessively long times (2-10s). These
updates are always of the form:
UPDATE "alerts_alert" SET ...bunch of fields... WHERE
"alerts_alert"."id" = '...sha1 hash...';
Here's a sample explain:
https://explain.depesz.com/s/Fjq8
What could be causing this? What could we do to debug? What config
changes could we make to alleviate this?
cheers,
Chris
Hello Chris,
One possible reason is that the row was already locked by another backend,
performing the same kind of update or something else.
Are these updates performed inside longer transactions?
Can they hit the same row from two clients at the same time?
Is there any other write or select-for-update/share load on the table?
Have you tried periodic logging of non-granted locks?
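For example, something like this in postgresql.conf will log every lock
wait that lasts longer than deadlock_timeout (the values here are
illustrative starting points, not recommendations):

```
log_lock_waits = on
deadlock_timeout = 1s               # lock waits longer than this get logged
log_min_duration_statement = 1000   # also log any statement taking over 1s
```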
Try querying pg_stat_activity and pg_locks (possibly joined, and maybe
repeatedly self-joined; search for examples) to find the backends that
wait on one another while competing to lock the same row or object.
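A starting point along those lines, adapted from the commonly circulated
lock-monitoring queries (run it while an update is stuck; the column
names are as of 9.4):

```sql
-- Pair each waiting lock request with the granted lock it is waiting behind.
SELECT waiting.pid       AS waiting_pid,
       waiting_act.query AS waiting_query,
       holding.pid       AS holding_pid,
       holding_act.query AS holding_query
FROM pg_locks waiting
JOIN pg_locks holding
  ON  holding.locktype      IS NOT DISTINCT FROM waiting.locktype
  AND holding.database      IS NOT DISTINCT FROM waiting.database
  AND holding.relation      IS NOT DISTINCT FROM waiting.relation
  AND holding.page          IS NOT DISTINCT FROM waiting.page
  AND holding.tuple         IS NOT DISTINCT FROM waiting.tuple
  AND holding.virtualxid    IS NOT DISTINCT FROM waiting.virtualxid
  AND holding.transactionid IS NOT DISTINCT FROM waiting.transactionid
  AND holding.pid <> waiting.pid
JOIN pg_stat_activity waiting_act ON waiting_act.pid = waiting.pid
JOIN pg_stat_activity holding_act ON holding_act.pid = holding.pid
WHERE NOT waiting.granted
  AND holding.granted;
```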
Best,
Alex
Also read about HOT (heap-only tuple) updates and the storage parameter
"fillfactor": if none of the updated fields are indexed and there is free
space in the block, the new row version can be placed in the same data
block instead of a new one.
--
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/
This parameter can be updated on a "per table" basis.
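A sketch of what that could look like (the 80 here is an arbitrary
illustrative value, not a recommendation):

```sql
-- Reserve 20% free space in each heap block for future HOT updates.
ALTER TABLE alerts_alert SET (fillfactor = 80);

-- Existing blocks only gain the free space once they are rewritten,
-- e.g. via VACUUM FULL (takes an exclusive lock) or a tool like pg_repack.
```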
On Wed, 5 Dec 2018 at 09:47, Rene Romero Benavides
<rene.romero.b@gmail.com> wrote:
> Also read about hot updates and the storage parameter "fillfactor", so
> data blocks can be recycled instead of creating new ones if the updated
> fields don't also update indexes.
On 05/12/2018 15:40, Alexey Bashtanov wrote:
> One possible reason is that the row was already locked by another backend,
> performing the same kind of update or something else.
> Are these updates performed inside longer transactions?
Nope, the transaction will just be updating one row at a time.
> Can they hit the same row from two clients at the same time?
I've looked for evidence of this but can't find any. Certainly nothing
running for 2-10s; queries against this table normally take a few hundred ms.
> Is there any other write or select-for-update/share load on the table?
Not that I'm aware of. How would I go about getting metrics on problems
like these?
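A generic way to check what write traffic the table actually receives is
the statistics collector; for example, a low HOT-update ratio here would
point back at the fillfactor/index discussion:

```sql
-- Cumulative write activity and HOT-update counts for the table.
SELECT n_tup_ins, n_tup_upd, n_tup_hot_upd, n_tup_del,
       n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'alerts_alert';
```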
> Have you tried periodic logging of non-granted locks?
> Try querying pg_stat_activity and pg_locks (possibly joined, and maybe
> repeatedly self-joined) to find the backends that wait on one another
> while competing to lock the same row or object.
Is there any existing tooling that does this? I'm loath to start hacking
something up when I'd hope others have done a better job already...
Chris
On 05/12/2018 15:47, Rene Romero Benavides wrote:
> Also read about hot updates and the storage parameter
> "fillfactor", so data blocks can be recycled instead of creating new
> ones if the updated fields don't also update indexes.
I have read about these, but I'd prefer not to be making
opportunistic/guessing changes on this.
How can I collect metrics/logging/etc evidence to confirm what the
problem actually is?
cheers,
Chris
> Is there any existing tooling that does this?
There must be some; google for queries involving pg_locks.
> I'm loath to start hacking something up when I'd hope others have done
> a better job already...
If you log all queries that take more than a second to complete, is your
update the only one logged, or does something else (the would-be blocker)
get logged along with it?
On 06/12/2018 11:00, Alexey Bashtanov wrote:
>> I'm loath to start hacking something up when I'd hope others have done
>> a better job already...
> If you log all queries that take more than a second to complete, is your
> update the only one logged, or does something else (the would-be blocker)
> get logged along with it?
Nope, the only ones logged are these updates.
Chris
Hi
On Thu, 6 Dec 2018 at 12:18, Chris Withers <chris@withers.org> wrote:
>> If you log all queries that take more than a second to complete, is your
>> update the only one logged, or does something else (the would-be blocker)
>> get logged along with it?
> Nope, the only ones logged are these updates.
Can you check the latency of your file system? Latency spikes can be
caused by an overloaded file system due to a wrong configuration of the
file system cache:
https://serverfault.com/questions/471070/linux-file-system-cache-move-data-from-dirty-to-writeback
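Along the lines of the linked answer, the dirty-cache thresholds can be
capped so writeback happens in small continuous batches rather than
multi-second bursts (the byte values below are illustrative, not
recommendations):

```
# /etc/sysctl.conf (or sysctl -w ...): cap the dirty page cache
vm.dirty_background_bytes = 268435456   # start background writeback at 256 MB
vm.dirty_bytes = 1073741824             # throttle writers at 1 GB of dirty data
```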
Regards
Pavel