Restart pg_usleep when interrupted
Attachment protected by Amazon: [0001-Handle-Sleep-interrupts.patch]
Hi,
In the proposal by Bertrand [1] to implement vacuum cost delay tracking
in pg_stat_progress_vacuum, it was discovered that the vacuum cost delay
ends early on the leader process of a parallel vacuum due to parallel workers
reporting progress on index vacuuming, which was introduced in 17
with commit [2]. With that commit, every time a parallel worker
completes vacuuming an index, it sends a completion message to the leader.
The facility that allows a parallel worker to report progress to the leader was
introduced in commit [3].
In the case of the patch being proposed by Bertrand, the number of interrupts
will be much more frequent as parallel workers would send a message to the leader
to update the vacuum delay counters every vacuum_delay_point call.
Looking into this, the ideal solution appears to be to provide the ability for a
pg_usleep call to restart after being interrupted. Thanks to the use of nanosleep
as of commit [4], this should be trivial to do, as nanosleep reports the
remaining time, which can be used to restart the sleep. This idea was also
mentioned in [5].
I am attaching a POC patch that does the $SUBJECT. Rather than changing the
existing pg_usleep, the patch introduces a function, pg_usleep_handle_interrupt,
that takes in the sleep duration and a boolean flag to force handling of the
interrupt, or not.
This function can replace pg_usleep inside vacuum_delay_point, and could be
useful in other cases in which it’s important to handle interrupts.
For Windows, the "force" = "true" flag of the new function uses SleepEx, which,
from what I can tell based on the code comments, is a non-interruptible sleep.
Thoughts?
[1]: /messages/by-id/ZnbIIUQpFJKAJx2W@ip-10-97-1-34.eu-west-3.compute.internal
[2]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=46ebdfe164c61fbac961d1eb7f40e9a684289ae6
[3]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f1889729dd3ab0352dc0ccc2ffcc1b1901f8e39f
[4]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a948e49e2ef11815be0b211723bfc5b53b7f75a8
[5]: /messages/by-id/24848.1473607575@sss.pgh.pa.us
Regards,
Sami Imseih
Amazon Web Services (AWS)
Hi,
I think you need to find a way to disable this "Attachment protected
by Amazon" thing:
/messages/by-id/01000190606e3d2a-116ead16-84d2-4449-8d18-5053da66b1f4-000000@email.amazonses.com
We want patches to be in the pgsql-hackers archives, not temporarily
accessible via some link.
...Robert
I think you need to find a way to disable this "Attachment protected
by Amazon" thing:
Yes, I am looking into that. I only noticed after sending the email.
Sorry about that.
Sami
We want patches to be in the pgsql-hackers archives, not temporarily
accessible via some link.
...Robert
Moving to another email going forward.
Reattaching the patch.
Attachments:
0001-Handle-Sleep-interrupts.patch (application/octet-stream, +22/-4)
Sami Imseih <samimseih@gmail.com> writes:
Reattaching the patch.
I feel like this is fundamentally a wrong solution, for the reasons
cited in the comment for pg_usleep: long sleeps are a bad idea
because of the resulting uncertainty about whether we'll respond to
interrupts and such promptly. An example here is that if we get
a query cancel interrupt, we should probably not insist on finishing
out the current sleep before responding.
Therefore, rather than "improving" pg_usleep (and uglifying its API),
the right answer is to fix parallel vacuum leaders to not depend on
pg_usleep in the first place. A better idea might be to use
pg_sleep() or equivalent code.
regards, tom lane
On Sat, Jun 29, 2024 at 9:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Therefore, rather than "improving" pg_usleep (and uglifying its API),
the right answer is to fix parallel vacuum leaders to not depend on
pg_usleep in the first place. A better idea might be to use
pg_sleep() or equivalent code.
In case it's useful for someone looking into that, in earlier
discussions we figured out that it is possible to have high resolution
timeouts AND support latch multiplexing on Linux, FreeBSD, macOS:
/messages/by-id/CA+hUKG+hC9mFx8tEcBsyo7-cAfWgtbRy1eDizeFuff2K7T=4bA@mail.gmail.com
Thanks for the feedback!
On Jun 28, 2024, at 4:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Sami Imseih <samimseih@gmail.com> writes:
Reattaching the patch.
I feel like this is fundamentally a wrong solution, for the reasons
cited in the comment for pg_usleep: long sleeps are a bad idea
because of the resulting uncertainty about whether we'll respond to
interrupts and such promptly. An example here is that if we get
a query cancel interrupt, we should probably not insist on finishing
out the current sleep before responding.
The case which brought up this discussion is the pg_usleep that
is called within vacuum_delay_point being interrupted.
When I read the same code comment you cited, it sounded to me
that "long sleeps" are those measured in seconds or minutes. The longest
vacuum delay allowed is 100ms.
Therefore, rather than "improving" pg_usleep (and uglifying its API),
the right answer is to fix parallel vacuum leaders to not depend on
pg_usleep in the first place. A better idea might be to use
pg_sleep() or equivalent code.
Yes, that is a good idea to explore and it will not require introducing
an awkward new API. I will look into using something similar to
pg_sleep.
Regards,
Sami
Therefore, rather than "improving" pg_usleep (and uglifying its API),
the right answer is to fix parallel vacuum leaders to not depend on
pg_usleep in the first place. A better idea might be to use
pg_sleep() or equivalent code.Yes, that is a good idea to explore and it will not require introducing
an awkward new API. I will look into using something similar to
pg_sleep.
Looking through the history of the sleep in vacuum_delay_point, commit
720de00af49 replaced WaitLatch with pg_usleep to allow for microsecond
sleep precision [1].
Thomas has proposed a WaitLatchUs implementation in [2], but I have not
yet tried it.
So I see there are 2 possible options here to deal with the interrupt of a
parallel vacuum leader when a message is sent by a parallel vacuum worker.
Option 1/ something like my initial proposal which is
to create a function similar to pg_usleep that is able to deal with
interrupts in a sleep. This could be a function scoped only to vacuum.c,
so it can only be used for vacuum delay purposes.
——
Option 2/ to explore the WaitLatchUs implementation by
Thomas, which provides a latch-based sleep with microsecond precision.
It is worth mentioning that if we do end up using WaitLatch(Us) inside
vacuum_delay_point, it will need to set only WL_TIMEOUT and
WL_EXIT_ON_PM_DEATH.
i.e.
(void) WaitLatch(MyLatch, WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                 msec,
                 WAIT_EVENT_VACUUM_DELAY);
This way it is not interrupted by a WL_LATCH_SET when a message
is sent by a parallel worker.
——
Ultimately, I think option 2 may be worth a closer look, as it is a cleaner
and safer approach that can also detect postmaster death.
Thoughts?
[1]: /messages/by-id/CAAKRu_b-q0hXCBUCAATh0Z4Zi6UkiC0k2DFgoD3nC-r3SkR3tg@mail.gmail.com
[2]: /messages/by-id/CA+hUKGKVbJE59JkwnUj5XMY+-rzcTFciV9vVC7i=LUfWPds8Xw@mail.gmail.com
Hi,
On Fri, Jun 28, 2024 at 05:50:20PM -0500, Sami Imseih wrote:
Thanks for the feedback!
On Jun 28, 2024, at 4:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Sami Imseih <samimseih@gmail.com> writes:
Reattaching the patch.
I feel like this is fundamentally a wrong solution, for the reasons
cited in the comment for pg_usleep: long sleeps are a bad idea
because of the resulting uncertainty about whether we'll respond to
interrupts and such promptly. An example here is that if we get
a query cancel interrupt, we should probably not insist on finishing
out the current sleep before responding.
The case which brought up this discussion is the pg_usleep that
is called within vacuum_delay_point being interrupted.
When I read the same code comment you cited, it sounded to me
that "long sleeps" are those measured in seconds or minutes. The longest
vacuum delay allowed is 100ms.
I think that with the proposed patch the actual wait time can be "long".
Indeed, the time between the interruptions and restarts of the nanosleep() call
will lead to drift (as mentioned in the nanosleep() man page). So, with a large
number of interruptions, the actual wait time could be way longer than the
expected wait time.
To put numbers on it, I did a few tests with the patch (and with the v2 shared in [1]):
cost delay is 1ms and cost limit is 10.
With 50 indexes and 10 parallel workers I can see things like:
2024-07-02 08:22:23.789 UTC [2189616] LOG: expected 1.000000, actual 239.378368
2024-07-02 08:22:24.575 UTC [2189616] LOG: expected 0.100000, actual 224.331737
2024-07-02 08:22:25.363 UTC [2189616] LOG: expected 1.300000, actual 230.462793
2024-07-02 08:22:26.154 UTC [2189616] LOG: expected 1.000000, actual 225.980803
Means we waited more than the max allowed cost delay (100ms).
With 49 parallel workers, it's worse, as I can see things like:
2024-07-02 08:26:36.069 UTC [2189807] LOG: expected 1.000000, actual 1106.790136
2024-07-02 08:26:36.298 UTC [2189807] LOG: expected 1.000000, actual 218.148985
The first actual wait time is about 1 second (it has been interrupted about
16300 times during this second).
To avoid this drift, the nanosleep() man page suggests using clock_nanosleep()
with an absolute time value; that might be another idea to explore.
[1]: /messages/by-id/ZmaXmWDL829fzAVX@ip-10-97-1-34.eu-west-3.compute.internal
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
With 50 indexes and 10 parallel workers I can see things like:
2024-07-02 08:22:23.789 UTC [2189616] LOG: expected 1.000000, actual 239.378368
2024-07-02 08:22:24.575 UTC [2189616] LOG: expected 0.100000, actual 224.331737
2024-07-02 08:22:25.363 UTC [2189616] LOG: expected 1.300000, actual 230.462793
2024-07-02 08:22:26.154 UTC [2189616] LOG: expected 1.000000, actual 225.980803
Means we waited more than the max allowed cost delay (100ms).
With 49 parallel workers, it's worse, as I can see things like:
2024-07-02 08:26:36.069 UTC [2189807] LOG: expected 1.000000, actual 1106.790136
2024-07-02 08:26:36.298 UTC [2189807] LOG: expected 1.000000, actual 218.148985
The first actual wait time is about 1 second (it has been interrupted about
16300 times during this second).
To avoid this drift, the nanosleep() man page suggests using clock_nanosleep()
with an absolute time value, which might be another idea to explore.
[1]: /messages/by-id/ZmaXmWDL829fzAVX@ip-10-97-1-34.eu-west-3.compute.internal
I could not reproduce the same time drift you observed on my
machine, so I am guessing the time drift could be worse on certain
platforms than others.
I also looked into the WaitLatchUs patch proposed by Thomas in [1],
and since my system does not have epoll_pwait2(2) available, I could not
achieve sub-millisecond wait times.
A more portable approach could be to continue using nanosleep and add a check
to ensure that nanosleep exits whenever it goes past an absolute deadline. This
was suggested by Bertrand in an offline conversation. I am not yet fully
convinced of this idea, but I am posting the patch that implements it for
anyone interested in looking.
Since sub-millisecond sleep times are not guaranteed, as the vacuum_cost_delay
docs suggest (see below), an alternative idea is to use clock_nanosleep for the
vacuum delay when it's available and fall back to WaitLatch otherwise.
"While vacuum_cost_delay can be set to fractional-millisecond values,
such delays may not be measured accurately on older platforms"
[1]: /messages/by-id/CA+hUKGKVbJE59JkwnUj5XMY+-rzcTFciV9vVC7i=LUfWPds8Xw@mail.gmail.com
Regards,
Sami

Attachments:
0001-vaccum_delay-with-absolute-time-nanosleep.patch (application/octet-stream, +63/-12)
A more portable approach could be to continue using nanosleep and add a check
to ensure that nanosleep exits whenever it goes past an absolute deadline. This
was suggested by Bertrand in an offline conversation. I am not yet fully
convinced of this idea, but I am posting the patch that implements it for
anyone interested in looking.
oops, forgot to attach the patch. Here it is.
Attachments:
0001-vaccum_delay-with-absolute-time-nanosleep.patch (application/octet-stream, +63/-12)
Hi,
On Fri, Jul 05, 2024 at 11:49:45AM -0500, Sami Imseih wrote:
With 50 indexes and 10 parallel workers I can see things like:
2024-07-02 08:22:23.789 UTC [2189616] LOG: expected 1.000000, actual 239.378368
2024-07-02 08:22:24.575 UTC [2189616] LOG: expected 0.100000, actual 224.331737
2024-07-02 08:22:25.363 UTC [2189616] LOG: expected 1.300000, actual 230.462793
2024-07-02 08:22:26.154 UTC [2189616] LOG: expected 1.000000, actual 225.980803
Means we waited more than the max allowed cost delay (100ms).
With 49 parallel workers, it's worse, as I can see things like:
2024-07-02 08:26:36.069 UTC [2189807] LOG: expected 1.000000, actual 1106.790136
2024-07-02 08:26:36.298 UTC [2189807] LOG: expected 1.000000, actual 218.148985
The first actual wait time is about 1 second (it has been interrupted about
16300 times during this second).
To avoid this drift, the nanosleep() man page suggests using clock_nanosleep()
with an absolute time value, which might be another idea to explore.
[1]: /messages/by-id/ZmaXmWDL829fzAVX@ip-10-97-1-34.eu-west-3.compute.internal
A more portable approach which could be to continue using nanosleep and
add checks to ensure that nanosleep exists whenever
it goes past an absolute time. This was suggested by Bertrand in an offline
conversation. I am not yet fully convinced of this idea, but posting the patch
that implements this idea for anyone interested in looking.
Thanks!
I did a few tests with the patch and did not see any "large" drifts like the
ones observed above.
As for the patch, I have not thoroughly reviewed it (as it's still one option
among others being discussed):
+ struct timespec current;
+ float time_diff;
+
+ clock_gettime(PG_INSTR_CLOCK, &current);
+
+ time_diff = (absolute.tv_sec - current.tv_sec) + (absolute.tv_nsec - current.tv_nsec) / 1000000000.0;
I think it could be "simplified" by making use of instr_time instead of timespec
for current and absolute. Then I think it would be enough to compare their
ticks.
Since sub-millisecond sleep times are not guaranteed as suggested by
the vacuum_cost_delay docs ( see below ), an alternative idea
is to use clock_nanosleep for vacuum delay when it’s available, else
fallback to WaitLatch.
Wouldn't that increase even more the cases where sub-millisecond won't be
guaranteed?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
I did a few tests with the patch and did not see any "large" drifts like the
ones observed above.
Thanks for testing.
I think it could be "simplified" by making use of instr_time instead of timespec
for current and absolute. Then I think it would be enough to compare their
ticks.
Correct. I attached a v2 of this patch that uses instr_time to check the elapsed
time and break out of the loop. It needs some more benchmarking.
Since sub-millisecond sleep times are not guaranteed as suggested by
the vacuum_cost_delay docs ( see below ), an alternative idea
is to use clock_nanosleep for vacuum delay when it’s available, else
fallback to WaitLatch.
Wouldn't that increase even more the cases where sub-millisecond won't be
guaranteed?
Yes, nanosleep is going to provide the most coverage as it’s widely available.
Regards,
Sami

Attachments:
v2-0001-vaccum_delay-with-absolute-time-nanosleep.patch (application/octet-stream, +48/-2)
Correct. I attached a v2 of this patch that uses instr_time to check the elapsed
time and break out of the loop. It needs some more benchmarking.
My email client has an issue sending attachments it seems. Reattaching
Attachments:
v2-0001-vaccum_delay-with-absolute-time-nanosleep.patch (application/octet-stream, +48/-2)
+ /*
+ * We allow nanosleep to handle interrupts and retry with the remaining time.
+ * However, since nanosleep is susceptible to time drift when interrupted
+ * frequently, we add a safeguard to break out of the nanosleep whenever the
+ * total time of the sleep exceeds the requested sleep time. Using nanosleep
+ * is a more portable approach than clock_nanosleep.
+ */
I'm curious why we wouldn't just subtract "elapsed_time" from "delay" at
the bottom of the while loop to avoid needing this extra check. Also, I
think we need some commentary about why we want to retry after an interrupt
in this case.
--
nathan
I'm curious why we wouldn't just subtract "elapsed_time" from "delay" at
the bottom of the while loop to avoid needing this extra check.
Can you elaborate further? I am not sure how this will work, since delay is a
timespec and elapsed_time is an instr_time.
Also, in every iteration of the loop, the delay must be set to the remaining
time. The purpose of elapsed_time is to make sure that we don't surpass the
requested delay, as an additional safeguard.
Also, I
think we need some commentary about why we want to retry after an interrupt
in this case.
I will elaborate further in the comments for the next revision.
Regards,
Sami
On Thu, Jul 11, 2024 at 01:10:25PM -0500, Sami Imseih wrote:
I'm curious why we wouldn't just subtract "elapsed_time" from "delay" at
the bottom of the while loop to avoid needing this extra check.
Can you elaborate further? I am not sure how this will work, since delay is a
timespec and elapsed_time is an instr_time.
Also, in every iteration of the loop, the delay must be set to the remaining
time. The purpose of elapsed_time is to make sure that we don't surpass the
requested time delay as an additional safeguard.
I'm imagining something like this:
struct timespec delay;
TimestampTz end_time;

end_time = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), msec);

do
{
    long secs;
    int microsecs;

    TimestampDifference(GetCurrentTimestamp(), end_time,
                        &secs, &microsecs);
    delay.tv_sec = secs;
    delay.tv_nsec = microsecs * 1000;
} while (nanosleep(&delay, NULL) == -1 && errno == EINTR);
--
nathan
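A standalone rendering of this recompute-the-remainder idea, under stated assumptions: it uses a monotonic clock instead of GetCurrentTimestamp() (so winding the wall clock back cannot extend the sleep), the function name sleep_recompute_us is invented, and it also stops once the remaining time reaches zero rather than sleeping one extra time.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <time.h>

/*
 * Recompute the remaining time from a fixed monotonic deadline on every
 * iteration, and stop once the deadline has passed.  Illustrative
 * standalone sketch, not PostgreSQL code.
 */
static void
sleep_recompute_us(long microsec)
{
	struct timespec end, now, delay;

	clock_gettime(CLOCK_MONOTONIC, &end);
	end.tv_nsec += (microsec % 1000000L) * 1000;
	end.tv_sec += microsec / 1000000L + end.tv_nsec / 1000000000L;
	end.tv_nsec %= 1000000000L;

	for (;;)
	{
		clock_gettime(CLOCK_MONOTONIC, &now);
		delay.tv_sec = end.tv_sec - now.tv_sec;
		delay.tv_nsec = end.tv_nsec - now.tv_nsec;
		if (delay.tv_nsec < 0)
		{
			delay.tv_sec--;
			delay.tv_nsec += 1000000000L;
		}
		/* deadline reached (or passed): quit without sleeping again */
		if (delay.tv_sec < 0 || (delay.tv_sec == 0 && delay.tv_nsec == 0))
			break;
		if (nanosleep(&delay, NULL) == 0)
			break;				/* slept the full remaining interval */
		if (errno != EINTR)
			break;				/* unexpected failure; give up */
	}
}
```

Each restart still rounds up to the clock granularity, so under a storm of interrupts the total wait can exceed the deadline by small steps, but it can never keep growing unboundedly the way a pure restart-with-remainder loop can.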
Hi,
On Thu, Jul 11, 2024 at 10:15:41AM -0500, Sami Imseih wrote:
I did a few tests with the patch and did not see any "large" drifts like the
ones observed above.
Thanks for testing.
I think it could be "simplified" by making use of instr_time instead of timespec
for current and absolute. Then I think it would be enough to compare their
ticks.
Correct. I attached a v2 of this patch that uses instr_time to check the elapsed
time and break out of the loop. It needs some more benchmarking.
Thanks!
Outside of Nathan's comment:
1 ===
+ * However, since nanosleep is susceptible to time drift when interrupted
+ * frequently, we add a safeguard to break out of the nanosleep whenever the
I'm not sure that "nanosleep is susceptible to time drift when interrupted frequently"
is a correct wording.
What about?
"
However, since the time between frequent interruptions and restarts of the
nanosleep calls can substantially lead to drift in the time when the sleep
finally completes, we add...."
2 ===
+static void vacuum_sleep(double msec);
What about a more generic name that could be used outside of vacuum?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On 2024-Jul-11, Nathan Bossart wrote:
I'm imagining something like this:
struct timespec delay;
TimestampTz end_time;

end_time = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), msec);

do
{
    long secs;
    int microsecs;

    TimestampDifference(GetCurrentTimestamp(), end_time,
                        &secs, &microsecs);
    delay.tv_sec = secs;
    delay.tv_nsec = microsecs * 1000;
} while (nanosleep(&delay, NULL) == -1 && errno == EINTR);
This looks nicer. We could deal with clock drift easily (in case the
sysadmin winds the clock back) by testing that tv_sec+tv_nsec is not
higher than the initial time to sleep. I don't know how common this
situation is nowadays, but I remember debugging a system years ago where
autovacuum was sleeping for a very long time because of that. I can't
remember now if we did anything in the code to cope, or just told
sysadmins not to do that anymore :-)
FWIW my (Linux's) nanosleep() manpage contains this note:
If the interval specified in req is not an exact multiple of the granularity
underlying clock (see time(7)), then the interval will be rounded up to the
next multiple. Furthermore, after the sleep completes, there may still be a
delay before the CPU becomes free to once again execute the calling thread.
It's not clear to me what happens if the time to sleep is zero, so maybe
there should be a "if tv_sec == 0 && tv_nsec == 0 then break" statement
at the bottom of the loop, to quit without sleeping one more time than
needed.
For Windows, this [1] looks like an interesting and possibly relevant
read (though maybe SleepEx already does what we want to do here.)
[1]: https://randomascii.wordpress.com/2020/10/04/windows-timer-resolution-the-great-rule-change/
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today" (Mary Gardiner)
I'm imagining something like this:
struct timespec delay;
TimestampTz end_time;

end_time = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), msec);

do
{
    long secs;
    int microsecs;

    TimestampDifference(GetCurrentTimestamp(), end_time,
                        &secs, &microsecs);
    delay.tv_sec = secs;
    delay.tv_nsec = microsecs * 1000;
} while (nanosleep(&delay, NULL) == -1 && errno == EINTR);
I do agree that this is cleaner code, but I am not sure I like this.
1/ TimestampDifference has a dependency on gettimeofday,
while my proposal utilizes clock_gettime. There are old discussions
that did not reach a conclusion comparing both mechanisms.
My main conclusion from these hacker discussions [1], [2] and other
online discussions on the topic is that clock_gettime should replace
gettimeofday when possible. Precision is the main reason.
2/ It no longer uses the remaining time. I think the remaining time
is still required here. I did an unrealistic stress test which shows
the original proposal can handle frequent interruptions much better.
#1 in one session kicked off a vacuum
set vacuum_cost_delay = 10;
set vacuum_cost_limit = 1;
set client_min_messages = log;
update large_tbl set version = 1;
vacuum (verbose, parallel 4) large_tbl;
#2 in another session, ran a loop to continually
interrupt the vacuum leader. This was during the
“heap scan” phase of the vacuum.
PID=< pid of vacuum leader >
while :
do
kill -USR1 $PID
done
Using the proposed loop with the remainder, I noticed that
the actual time reported remains close to the requested
delay time.
LOG: 10.000000,10.013420
LOG: 10.000000,10.011188
LOG: 10.000000,10.010860
LOG: 10.000000,10.014839
LOG: 10.000000,10.004542
LOG: 10.000000,10.006035
LOG: 10.000000,10.012230
LOG: 10.000000,10.014535
LOG: 10.000000,10.009645
LOG: 10.000000,10.000817
LOG: 10.000000,10.002162
LOG: 10.000000,10.011721
LOG: 10.000000,10.011655
Using the approach mentioned by Nathan, there
are large differences between requested and actual time.
LOG: 10.000000,17.801778
LOG: 10.000000,12.795450
LOG: 10.000000,11.793723
LOG: 10.000000,11.796317
LOG: 10.000000,13.785993
LOG: 10.000000,11.803775
LOG: 10.000000,15.782767
LOG: 10.000000,31.783901
LOG: 10.000000,19.792440
LOG: 10.000000,21.795795
LOG: 10.000000,18.800412
LOG: 10.000000,16.782886
LOG: 10.000000,10.795197
LOG: 10.000000,14.793333
LOG: 10.000000,29.806556
LOG: 10.000000,18.810784
LOG: 10.000000,11.804956
LOG: 10.000000,24.809812
LOG: 10.000000,25.815600
LOG: 10.000000,22.809493
LOG: 10.000000,22.790908
LOG: 10.000000,19.699097
LOG: 10.000000,23.795613
LOG: 10.000000,24.797078
Let me know what you think.
[1]: /messages/by-id/31856.1400021891@sss.pgh.pa.us
[2]: /messages/by-id/E1cO7fR-0003y0-9E@gemulon.postgresql.org
Regards,
Sami