WAL insert delay settings
Bulk operations like CREATE INDEX, ALTER TABLE, or bulk loads can create
a lot of WAL. A lot of WAL at once can cause delays in replication.
For synchronous replication, this can make seemingly unrelated sessions
hang. But also for asynchronous replication, it will increase latency.
One idea to address this is to slow down WAL-generating maintenance
operations. This is similar to the vacuum delay. Where the vacuum
delay counts notional I/O cost before sleeping, here we would count how
much WAL has been generated and sleep after some amount.
I attach an example patch for this functionality. It introduces three
settings:
wal_insert_delay_enabled
wal_insert_delay
wal_insert_delay_size
When you turn on wal_insert_delay_enabled, the session will sleep for
wal_insert_delay after it has produced wal_insert_delay_size of
WAL data.
The idea is that you would tune wal_insert_delay and
wal_insert_delay_size to your required performance characteristics and
then turn on wal_insert_delay_enabled individually in maintenance jobs
or similar.
To test, for example, set up pgbench with synchronous replication and
run an unrelated large index build in a separate session. With the
settings, you can make it as fast or as slow as you want.
Tuning these settings, however, is, I fear, quite mysterious. You have
to experiment a lot to find settings that achieve the right balance.
So, some questions:
Is this useful?
Any other thoughts on how to configure this or do this?
Should we aim for a more general delay system, possibly including vacuum
delay and perhaps something else?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-wal_insert_delay-configuration-settings.patch (text/plain)
Hi,
On February 13, 2019 1:16:07 PM GMT+01:00, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
Bulk operations like CREATE INDEX, ALTER TABLE, or bulk loads can create
a lot of WAL. A lot of WAL at once can cause delays in replication.
For synchronous replication, this can make seemingly unrelated sessions
hang. But also for asynchronous replication, it will increase latency.
One idea to address this is to slow down WAL-generating maintenance
operations. This is similar to the vacuum delay. Where the vacuum
delay counts notional I/O cost before sleeping, here we would count how
much WAL has been generated and sleep after some amount.
I attach an example patch for this functionality. It introduces three
settings:
wal_insert_delay_enabled
wal_insert_delay
wal_insert_delay_size
When you turn on wal_insert_delay_enabled, then it will sleep for
wal_insert_delay after the session has produced wal_insert_delay_size of
WAL data.
The idea is that you would tune wal_insert_delay and
wal_insert_delay_size to your required performance characteristics and
then turn on wal_insert_delay_enabled individually in maintenance jobs
or similar.
To test, for example, set up pgbench with synchronous replication and
run an unrelated large index build in a separate session. With the
settings, you can make it as fast or as slow as you want.
Tuning these settings, however, is quite mysterious I fear. You have to
play around a lot to get settings that achieve the right balance.
So, some questions:
Is this useful?
Any other thoughts on how to configure this or do this?
Should we aim for a more general delay system, possibly including vacuum
delay and perhaps something else?
Interesting idea, not yet quite sure what to think. But I don't think the way you did it is acceptable - we can't just delay while holding buffer locks, in critical sections, while not interruptible.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Hi
13.02.2019 17:16, Peter Eisentraut пишет:
Bulk operations like CREATE INDEX, ALTER TABLE, or bulk loads can create
a lot of WAL. A lot of WAL at once can cause delays in replication.
For synchronous replication, this can make seemingly unrelated sessions
hang. But also for asynchronous replication, it will increase latency.
One idea to address this is to slow down WAL-generating maintenance
operations. This is similar to the vacuum delay. Where the vacuum
delay counts notional I/O cost before sleeping, here we would count how
much WAL has been generated and sleep after some amount.
I attach an example patch for this functionality. It introduces three
settings:
wal_insert_delay_enabled
wal_insert_delay
wal_insert_delay_size
When you turn on wal_insert_delay_enabled, then it will sleep for
wal_insert_delay after the session has produced wal_insert_delay_size of
WAL data.
The idea is that you would tune wal_insert_delay and
wal_insert_delay_size to your required performance characteristics and
then turn on wal_insert_delay_enabled individually in maintenance jobs
or similar.
To test, for example, set up pgbench with synchronous replication and
run an unrelated large index build in a separate session. With the
settings, you can make it as fast or as slow as you want.
Tuning these settings, however, is quite mysterious I fear. You have to
play around a lot to get settings that achieve the right balance.
So, some questions:
Is this useful?
Any other thoughts on how to configure this or do this?
Should we aim for a more general delay system, possibly including vacuum
delay and perhaps something else?
I think it's better to have more general cost-based settings which allow
controlling performance, something like what has already been done for
autovacuum.
For example, introduce a vacuum-like mechanism with the following
controls:
maintenance_cost_page_hit
maintenance_cost_page_miss
maintenance_cost_page_dirty
maintenance_cost_delay
maintenance_cost_limit
maintenance_cost_delay=0 (the default) means the feature is disabled,
but if a user wants to limit performance they can define these
parameters in a per-session or per-user manner. It can be especially
useful for limiting already-running sessions, such as a mass deletion or
pg_dump.
Of course, it's just an idea; I can't imagine how many things would have
to be touched in order to implement this.
Regards, Alexey Lesovsky
Andres Freund <andres@anarazel.de> writes:
On February 13, 2019 1:16:07 PM GMT+01:00, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
One idea to address this is to slow down WAL-generating maintenance
operations. This is similar to the vacuum delay. Where the vacuum
delay counts notional I/O cost before sleeping, here we would count how
much WAL has been generated and sleep after some amount.
Interesting idea, not yet quite sure what to think. But I don't think the way you did it is acceptable - we can't just delay while holding buffer locks, in critical sections, while not interruptible.
Yeah. Maybe it could be done in a less invasive way by just having the
WAL code keep a running sum of how much WAL this process has created,
and then letting the existing vacuum-delay infrastructure use that as
one of its how-much-IO-have-I-done inputs.
Not sure if that makes the tuning problem easier or harder, but
it seems reasonable on its face to count WAL emission as I/O.
regards, tom lane
Greetings,
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
Bulk operations like CREATE INDEX, ALTER TABLE, or bulk loads can create
a lot of WAL. A lot of WAL at once can cause delays in replication.
Agreed, though I think VACUUM should certainly be included in this.
I'm all for the idea though it seems like a different approach is needed
based on the down-thread discussion. Ultimately, having a way to have
these activities happen without causing issues for replicas is a great
idea and would definitely address a practical issue that a lot of people
run into.
Thanks!
Stephen
On 13/02/2019 13:18, Andres Freund wrote:
But I don't think the way you did it is acceptable - we can't just delay while holding buffer locks, in critical sections, while not interruptible.
The code I added to XLogInsertRecord() is not inside the critical section.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On February 13, 2019 4:39:21 PM GMT+01:00, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 13/02/2019 13:18, Andres Freund wrote:
But I don't think the way you did it is acceptable - we can't just
delay while holding buffer locks, in critical sections, while not
interruptible.
The code I added to XLogInsertRecord() is not inside the critical
section.
Most callers do xlog insertions inside crit sections though.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On 13.02.2019 19:57, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On February 13, 2019 1:16:07 PM GMT+01:00, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
One idea to address this is to slow down WAL-generating maintenance
operations. This is similar to the vacuum delay. Where the vacuum
delay counts notional I/O cost before sleeping, here we would count how
much WAL has been generated and sleep after some amount.
Interesting idea, not yet quite sure what to think. But I don't think the way you did it is acceptable - we can't just delay while holding buffer locks, in critical sections, while not interruptible.
Yeah. Maybe it could be done in a less invasive way by just having the
WAL code keep a running sum of how much WAL this process has created,
and then letting the existing vacuum-delay infrastructure use that as
one of its how-much-IO-have-I-done inputs.
Not sure if that makes the tuning problem easier or harder, but
it seems reasonable on its face to count WAL emission as I/O.
regards, tom lane
We could also add a 'soft' clause to DML queries, as an abstraction for
background query execution. It could contain a WAL write velocity limit
parameter (as Tom proposed), and perhaps some others.
--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
On Wed, Feb 13, 2019 at 4:16 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
Bulk operations like CREATE INDEX, ALTER TABLE, or bulk loads can create
a lot of WAL. A lot of WAL at once can cause delays in replication.
For synchronous replication, this can make seemingly unrelated sessions
hang. But also for asynchronous replication, it will increase latency.
I think that I suggested a feature like this early during my time at
Heroku, about 5 years ago. There would occasionally be cases where ops
would find it useful to throttle WAL writing using their own terrible
kludge (it involved sending SIGSTOP to the WAL writer).
I recall that this idea was not well received at the time. I still
think it's a good idea, though. Provided there is a safe way to get it
to work.
--
Peter Geoghegan
Hi,
On 2019-02-13 23:21:39 -0800, Peter Geoghegan wrote:
There would occasionally be cases where ops
would find it useful to throttle WAL writing using their own terrible
kludge (it involved sending SIGSTOP to the WAL writer).
That can't have been the workaround - either you'd interrupt it while
holding critical locks (in which case nobody could write WAL anymore),
or you'd just move all the writing to backends, no?
Greetings,
Andres Freund
On Thu, Feb 14, 2019 at 12:42 AM Andres Freund <andres@anarazel.de> wrote:
That can't have been the workaround - either you'd interrupt it while
holding critical locks (in which case nobody could write WAL anymore),
or you'd just move all the writing to backends, no?
I imagine that it held the critical locks briefly. I'm not endorsing
that approach, obviously, but apparently it more or less worked. It
was something that was used in rare cases, only when there was no
application-specific way to throttle writes, and only when the server
was in effect destabilized by writing out WAL too quickly.
--
Peter Geoghegan
On Thu, Feb 14, 2019 at 12:52 AM Peter Geoghegan <pg@bowt.ie> wrote:
I imagine that it held the critical locks briefly.
I didn't mention that the utility they used would send SIGSTOP and
SIGCONT in close succession. (Yeah, I know.)
--
Peter Geoghegan
On Thu, Feb 14, 2019 at 12:53 AM Peter Geoghegan <pg@bowt.ie> wrote:
I didn't mention that the utility they used would send SIGSTOP and
SIGCONT in close succession. (Yeah, I know.)
Actually, it SIGSTOP'd backends, not the WAL writer or background writer.
--
Peter Geoghegan
On 2/13/19 4:31 PM, Stephen Frost wrote:
Greetings,
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
Bulk operations like CREATE INDEX, ALTER TABLE, or bulk loads can create
a lot of WAL. A lot of WAL at once can cause delays in replication.
Agreed, though I think VACUUM should certainly be included in this.
Won't these two throttling criteria interact in undesirable and/or
unpredictable ways? With the regular vacuum throttling (based on
hit/miss/dirty) it's possible to compute rough read/write I/O limits.
But with the additional sleeps based on amount-of-WAL, we may sleep for
one of two reasons, so we may not reach either limit. No?
I'm all for the idea though it seems like a different approach is needed
based on the down-thread discussion. Ultimately, having a way to have
these activities happen without causing issues for replicas is a great
idea and would definitely address a practical issue that a lot of people
run into.
+1
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 2019-02-14 10:00:38 +0100, Tomas Vondra wrote:
On 2/13/19 4:31 PM, Stephen Frost wrote:
Greetings,
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
Bulk operations like CREATE INDEX, ALTER TABLE, or bulk loads can create
a lot of WAL. A lot of WAL at once can cause delays in replication.
Agreed, though I think VACUUM should certainly be included in this.
Won't these two throttling criteria interact in undesirable and/or
unpredictable way? With the regular vacuum throttling (based on
hit/miss/dirty) it's possible to compute rough read/write I/O limits.
But with the additional sleeps based on amount-of-WAL, we may sleep for
one of two reasons, so we may not reach either limit. No?
Well, it'd be max rates for either, if done right. I think we only
should start adding delays for WAL logging if we're exceeding the WAL
write rate. That's obviously more complicated than the stuff we do for
the current VACUUM throttling, but I can't see two such systems
interacting well. Also, the current logic just doesn't work well when
you consider IO actually taking time, and/or process scheduling effects
on busy systems.
Greetings,
Andres Freund
On 2/14/19 10:06 AM, Andres Freund wrote:
Hi,
On 2019-02-14 10:00:38 +0100, Tomas Vondra wrote:
On 2/13/19 4:31 PM, Stephen Frost wrote:
Greetings,
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
Bulk operations like CREATE INDEX, ALTER TABLE, or bulk loads can create
a lot of WAL. A lot of WAL at once can cause delays in replication.
Agreed, though I think VACUUM should certainly be included in this.
Won't these two throttling criteria interact in undesirable and/or
unpredictable way? With the regular vacuum throttling (based on
hit/miss/dirty) it's possible to compute rough read/write I/O limits.
But with the additional sleeps based on amount-of-WAL, we may sleep for
one of two reasons, so we may not reach either limit. No?
Well, it'd be max rates for either, if done right. I think we only
should start adding delays for WAL logging if we're exceeding the WAL
write rate.
Not really, I think. If you add additional sleep() calls somewhere, that
may affect the limits in vacuum, making it throttle before reaching the
derived throughput limits.
That's obviously more complicated than the stuff we do for
the current VACUUM throttling, but I can't see two such systems
interacting well. Also, the current logic just doesn't work well when
you consider IO actually taking time, and/or process scheduling effects
on busy systems.
True, but making it even less predictable is hardly an improvement.
cheers
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On February 14, 2019 10:31:57 AM GMT+01:00, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 2/14/19 10:06 AM, Andres Freund wrote:
Hi,
On 2019-02-14 10:00:38 +0100, Tomas Vondra wrote:
On 2/13/19 4:31 PM, Stephen Frost wrote:
Greetings,
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
Bulk operations like CREATE INDEX, ALTER TABLE, or bulk loads can create
a lot of WAL. A lot of WAL at once can cause delays in replication.
Agreed, though I think VACUUM should certainly be included in this.
Won't these two throttling criteria interact in undesirable and/or
unpredictable way? With the regular vacuum throttling (based on
hit/miss/dirty) it's possible to compute rough read/write I/O limits.
But with the additional sleeps based on amount-of-WAL, we may sleep for
one of two reasons, so we may not reach either limit. No?
Well, it'd be max rates for either, if done right. I think we only
should start adding delays for WAL logging if we're exceeding the WAL
write rate.
Not really, I think. If you add additional sleep() calls somewhere, that
may affect the limits in vacuum, making it throttle before reaching the
derived throughput limits.
I don't understand. Obviously, if you have two limits, the scarcer resource can limit full use of the other resource. That seems OK? The thing I think we need to be careful about is not to limit in a way (e.g. by adding sleeps even when below the limit) such that a WAL limit causes throttling of normal IO before the WAL limit is reached.
That's obviously more complicated than the stuff we do for
the current VACUUM throttling, but I can't see two such systems
interacting well. Also, the current logic just doesn't work well when
you consider IO actually taking time, and/or process scheduling effects
on busy systems.
True, but making it even less predictable is hardly an improvement.
I don't quite see the problem here. Could you expand?
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On 2/14/19 10:36 AM, Andres Freund wrote:
On February 14, 2019 10:31:57 AM GMT+01:00, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 2/14/19 10:06 AM, Andres Freund wrote:
Hi,
On 2019-02-14 10:00:38 +0100, Tomas Vondra wrote:
On 2/13/19 4:31 PM, Stephen Frost wrote:
Greetings,
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
Bulk operations like CREATE INDEX, ALTER TABLE, or bulk loads can create
a lot of WAL. A lot of WAL at once can cause delays in replication.
Agreed, though I think VACUUM should certainly be included in this.
Won't these two throttling criteria interact in undesirable and/or
unpredictable way? With the regular vacuum throttling (based on
hit/miss/dirty) it's possible to compute rough read/write I/O limits.
But with the additional sleeps based on amount-of-WAL, we may sleep for
one of two reasons, so we may not reach either limit. No?
Well, it'd be max rates for either, if done right. I think we only
should start adding delays for WAL logging if we're exceeding the WAL
write rate.
Not really, I think. If you add additional sleep() calls somewhere, that
may affect the limits in vacuum, making it throttle before reaching the
derived throughput limits.
I don't understand. Obviously, if you have two limits, the scarcer
resource can limit full use of the other resource. That seems OK? The
thing I think we need to be careful about is not to limit in a way
(e.g. by adding sleeps even when below the limit) such that a WAL limit
causes throttling of normal IO before the WAL limit is reached.
With the vacuum throttling, rough I/O throughput maximums can be
computed by counting the number of pages you can read/write between
sleeps. For example with the defaults (200 credits, 20ms sleeps, miss
cost 10 credits) this means 20 writes/round, with 50 rounds/second, so
8MB/s. But this is based on the assumption that the work between sleeps
takes almost no time - that's not perfect, but generally works.
But if you add extra sleep() calls somewhere (say because there's also
limit on WAL throughput), it will affect how fast VACUUM works in
general. Yet it'll continue with the cost-based throttling, but it will
never reach the limits. Say you do another 20ms sleep somewhere.
Suddenly it means it only does 25 rounds/second, and the actual write
limit drops to 4 MB/s.
That's obviously more complicated than the stuff we do for
the current VACUUM throttling, but I can't see two such systems
interacting well. Also, the current logic just doesn't work well when
you consider IO actually taking time, and/or process scheduling effects
on busy systems.
True, but making it even less predictable is hardly an improvement.
I don't quite see the problem here. Could you expand?
All I'm saying that you can now estimate how much reads/writes vacuum
does. With the extra sleeps (due to additional throttling mechanism) it
will be harder.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 14/02/2019 11:03, Tomas Vondra wrote:
But if you add extra sleep() calls somewhere (say because there's also
limit on WAL throughput), it will affect how fast VACUUM works in
general. Yet it'll continue with the cost-based throttling, but it will
never reach the limits. Say you do another 20ms sleep somewhere.
Suddenly it means it only does 25 rounds/second, and the actual write
limit drops to 4 MB/s.
I think at a first approximation, you probably don't want to add WAL
delays to vacuum jobs, since they are already slowed down, so the rate
of WAL they produce might not be your first problem. The problem is
more things like CREATE INDEX CONCURRENTLY that run at full speed.
That leads to an alternative idea of expanding the existing cost-based
vacuum delay system to other commands.
We could even enhance the cost system by taking WAL into account as an
additional factor.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 13/02/2019 16:40, Andres Freund wrote:
On February 13, 2019 4:39:21 PM GMT+01:00, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 13/02/2019 13:18, Andres Freund wrote:
But I don't think the way you did it is acceptable - we can't just
delay while holding buffer locks, in critical sections, while not
interruptible.
The code I added to XLogInsertRecord() is not inside the critical
section.
Most callers do xlog insertions inside crit sections though.
Is it a problem that pg_usleep(CommitDelay) is inside a critical section?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services