Restart pg_usleep when interrupted

Started by Sami Imseih · over 1 year ago · 73 messages
#1Sami Imseih
samimseih@gmail.com

Attachment protected by Amazon:

[0001-Handle-Sleep-interrupts.patch]
https://us-east-1.secure-attach.amazon.com/fcdc82ce-7887-4aa1-af9e-c1161a6b1d6f/bc81fa24-41de-48f9-a767-a6d15801754b

Amazon has replaced attachment with download link. Downloads will be available until July 28, 2024, 19:59 (UTC+00:00).

Hi,

In the proposal by Bertrand [1] to implement vacuum cost delay tracking
in pg_stat_progress_vacuum, it was discovered that the vacuum cost delay
ends early on the leader process of a parallel vacuum due to parallel workers
reporting progress on index vacuuming, which was introduced in 17
with commit [2]. With that commit, every time a parallel worker
completes vacuuming an index, it sends a completion message to the leader.

The facility that allows a parallel worker to report progress to the leader was
introduced in commit [3].

In the case of the patch being proposed by Bertrand, the number of interrupts
will be much more frequent as parallel workers would send a message to the leader
to update the vacuum delay counters every vacuum_delay_point call.

Looking into this, the ideal solution appears to be to provide the ability for a
pg_usleep call to restart after being interrupted. Thanks to the use of nanosleep
as of commit [4], this should be trivial to do, as nanosleep provides the
remaining time, which can be used to restart the sleep. This idea was also
mentioned in [5].

I am attaching a POC patch that does the $SUBJECT. Rather than changing the
existing pg_usleep, the patch introduces a function, pg_usleep_handle_interrupt,
that takes the sleep duration and a boolean flag indicating whether to force
handling of the interrupt.

This function can replace pg_usleep inside vacuum_delay_point, and could be
useful in other cases in which it’s important to handle interrupts.

For Windows, the “force” = “true” flag of the new function uses SleepEx, which,
from what I can tell based on the code comments, is a non-interruptible sleep.

Thoughts?

[1]: /messages/by-id/ZnbIIUQpFJKAJx2W@ip-10-97-1-34.eu-west-3.compute.internal
[2]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=46ebdfe164c61fbac961d1eb7f40e9a684289ae6
[3]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f1889729dd3ab0352dc0ccc2ffcc1b1901f8e39f
[4]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a948e49e2ef11815be0b211723bfc5b53b7f75a8
[5]: /messages/by-id/24848.1473607575@sss.pgh.pa.us

Regards,

Sami Imseih
Amazon Web Services (AWS)

#2Robert Haas
robertmhaas@gmail.com
In reply to: Sami Imseih (#1)
Re: Restart pg_usleep when interrupted

Hi,

I think you need to find a way to disable this "Attachment protected
by Amazon" thing:

/messages/by-id/01000190606e3d2a-116ead16-84d2-4449-8d18-5053da66b1f4-000000@email.amazonses.com

We want patches to be in the pgsql-hackers archives, not temporarily
accessible via some link.

...Robert

#3Sami Imseih
samimseih@gmail.com
In reply to: Robert Haas (#2)
Re: Restart pg_usleep when interrupted

I think you need to find a way to disable this "Attachment protected
by Amazon" thing:

Yes, I am looking into that. I only noticed after sending the email.

Sorry about that.

Sami

#4Sami Imseih
samimseih@gmail.com
In reply to: Robert Haas (#2)
Re: Restart pg_usleep when interrupted

We want patches to be in the pgsql-hackers archives, not temporarily
accessible via some link.

…Robert

Moving to another email going forward.

Reattaching the patch.

Attachments:

0001-Handle-Sleep-interrupts.patch (application/octet-stream; +22, -4)
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sami Imseih (#4)
Re: Restart pg_usleep when interrupted

Sami Imseih <samimseih@gmail.com> writes:

Reattaching the patch.

I feel like this is fundamentally a wrong solution, for the reasons
cited in the comment for pg_usleep: long sleeps are a bad idea
because of the resulting uncertainty about whether we'll respond to
interrupts and such promptly. An example here is that if we get
a query cancel interrupt, we should probably not insist on finishing
out the current sleep before responding.

Therefore, rather than "improving" pg_usleep (and uglifying its API),
the right answer is to fix parallel vacuum leaders to not depend on
pg_usleep in the first place. A better idea might be to use
pg_sleep() or equivalent code.

regards, tom lane

#6Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#5)
Re: Restart pg_usleep when interrupted

On Sat, Jun 29, 2024 at 9:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Therefore, rather than "improving" pg_usleep (and uglifying its API),
the right answer is to fix parallel vacuum leaders to not depend on
pg_usleep in the first place. A better idea might be to use
pg_sleep() or equivalent code.

In case it's useful for someone looking into that, in earlier
discussions we figured out that it is possible to have high resolution
timeouts AND support latch multiplexing on Linux, FreeBSD, macOS:

/messages/by-id/CA+hUKG+hC9mFx8tEcBsyo7-cAfWgtbRy1eDizeFuff2K7T=4bA@mail.gmail.com

#7Sami Imseih
samimseih@gmail.com
In reply to: Tom Lane (#5)
Re: Restart pg_usleep when interrupted

Thanks for the feedback!

On Jun 28, 2024, at 4:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Sami Imseih <samimseih@gmail.com> writes:

Reattaching the patch.

I feel like this is fundamentally a wrong solution, for the reasons
cited in the comment for pg_usleep: long sleeps are a bad idea
because of the resulting uncertainty about whether we'll respond to
interrupts and such promptly. An example here is that if we get
a query cancel interrupt, we should probably not insist on finishing
out the current sleep before responding.

The case that brought up this discussion is the pg_usleep call
within vacuum_delay_point being interrupted.

When I read the same code comment you cited, it sounded to me
that “long sleeps” are those that are in seconds or minutes. The longest
vacuum delay allowed is 100ms.

Therefore, rather than "improving" pg_usleep (and uglifying its API),
the right answer is to fix parallel vacuum leaders to not depend on
pg_usleep in the first place. A better idea might be to use
pg_sleep() or equivalent code.

Yes, that is a good idea to explore and it will not require introducing
an awkward new API. I will look into using something similar to
pg_sleep.

Regards,

Sami

#8Sami Imseih
samimseih@gmail.com
In reply to: Sami Imseih (#7)
Re: Restart pg_usleep when interrupted

Therefore, rather than "improving" pg_usleep (and uglifying its API),
the right answer is to fix parallel vacuum leaders to not depend on
pg_usleep in the first place. A better idea might be to use
pg_sleep() or equivalent code.

Yes, that is a good idea to explore and it will not require introducing
an awkward new API. I will look into using something similar to
pg_sleep.

Looking through the history of the sleep in vacuum_delay_point, commit
720de00af49 replaced WaitLatch with pg_usleep to allow for microsecond
sleep precision [1].

Thomas has proposed a WaitLatchUs implementation in [2], but I have not
yet tried it.

So I see there are 2 possible options here to deal with the interrupt of a
parallel vacuum leader when a message is sent by a parallel vacuum worker.

Option 1/ something like my initial proposal, which is to create a function
similar to pg_usleep that is able to deal with interrupts during a sleep.
This could be a function scoped only to vacuum.c, so it can only be used
for vacuum delay purposes.

——
Option 2/ explore the WaitLatchUs implementation by Thomas, which gives
a latch-based sleep with microsecond precision.

It is worth mentioning that if we do end up using WaitLatch(Us) inside
vacuum_delay_point, it will need to set only WL_TIMEOUT and
WL_EXIT_ON_PM_DEATH.

i.e.

(void) WaitLatch(MyLatch, WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                 msec,
                 WAIT_EVENT_VACUUM_DELAY);

This way it is not interrupted by a WL_LATCH_SET when a message
is sent by a parallel worker.

——

Ultimately, I think option 2 may be worth a closer look, as it is a cleaner
and safer approach and can also detect postmaster death.

Thoughts?

[1]: /messages/by-id/CAAKRu_b-q0hXCBUCAATh0Z4Zi6UkiC0k2DFgoD3nC-r3SkR3tg@mail.gmail.com
[2]: /messages/by-id/CA+hUKGKVbJE59JkwnUj5XMY+-rzcTFciV9vVC7i=LUfWPds8Xw@mail.gmail.com

#9Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Sami Imseih (#7)
Re: Restart pg_usleep when interrupted

Hi,

On Fri, Jun 28, 2024 at 05:50:20PM -0500, Sami Imseih wrote:

Thanks for the feedback!

On Jun 28, 2024, at 4:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Sami Imseih <samimseih@gmail.com> writes:

Reattaching the patch.

I feel like this is fundamentally a wrong solution, for the reasons
cited in the comment for pg_usleep: long sleeps are a bad idea
because of the resulting uncertainty about whether we'll respond to
interrupts and such promptly. An example here is that if we get
a query cancel interrupt, we should probably not insist on finishing
out the current sleep before responding.

The case that brought up this discussion is the pg_usleep call
within vacuum_delay_point being interrupted.

When I read the same code comment you cited, it sounded to me
that “long sleeps” are those that are in seconds or minutes. The longest
vacuum delay allowed is 100ms.

I think that with the proposed patch the actual wait time can be "long".

Indeed, the time between the interruptions and restarts of the nanosleep() call
will lead to drift (as mentioned in the nanosleep() man page). So, with a large
number of interruptions, the actual wait time could be way longer than the
expected wait time.

To put numbers on it, I did a few tests with the patch (and with v2 shared in [1]):

cost delay is 1ms and cost limit is 10.

With 50 indexes and 10 parallel workers I can see things like:

2024-07-02 08:22:23.789 UTC [2189616] LOG: expected 1.000000, actual 239.378368
2024-07-02 08:22:24.575 UTC [2189616] LOG: expected 0.100000, actual 224.331737
2024-07-02 08:22:25.363 UTC [2189616] LOG: expected 1.300000, actual 230.462793
2024-07-02 08:22:26.154 UTC [2189616] LOG: expected 1.000000, actual 225.980803

Means we waited more than the max allowed cost delay (100ms).

With 49 parallel workers, it's worse, as I can see things like:

2024-07-02 08:26:36.069 UTC [2189807] LOG: expected 1.000000, actual 1106.790136
2024-07-02 08:26:36.298 UTC [2189807] LOG: expected 1.000000, actual 218.148985

The first actual wait time is about 1 second (it has been interrupted about
16300 times during this second).

To avoid this drift, the nanosleep() man page suggests to use clock_nanosleep()
with an absolute time value, that might be another idea to explore.

[1]: /messages/by-id/ZmaXmWDL829fzAVX@ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#10Sami Imseih
samimseih@gmail.com
In reply to: Bertrand Drouvot (#9)
Re: Restart pg_usleep when interrupted

With 50 indexes and 10 parallel workers I can see things like:

2024-07-02 08:22:23.789 UTC [2189616] LOG: expected 1.000000, actual 239.378368
2024-07-02 08:22:24.575 UTC [2189616] LOG: expected 0.100000, actual 224.331737
2024-07-02 08:22:25.363 UTC [2189616] LOG: expected 1.300000, actual 230.462793
2024-07-02 08:22:26.154 UTC [2189616] LOG: expected 1.000000, actual 225.980803

Means we waited more than the max allowed cost delay (100ms).

With 49 parallel workers, it's worse, as I can see things like:

2024-07-02 08:26:36.069 UTC [2189807] LOG: expected 1.000000, actual 1106.790136
2024-07-02 08:26:36.298 UTC [2189807] LOG: expected 1.000000, actual 218.148985

The first actual wait time is about 1 second (it has been interrupted about
16300 times during this second).

To avoid this drift, the nanosleep() man page suggests to use clock_nanosleep()
with an absolute time value, that might be another idea to explore.

[1]: /messages/by-id/ZmaXmWDL829fzAVX@ip-10-97-1-34.eu-west-3.compute.internal

I could not reproduce the same time drift you observed on my
machine, so I am guessing the time drift could be worse on certain
platforms than others.

I also looked into the WaitLatchUs patch proposed by Thomas in [1],
and since my system does have epoll_pwait(2) available, I could not
achieve sub-millisecond wait times.

A more portable approach could be to continue using nanosleep and
add checks to ensure that nanosleep exits whenever
it goes past an absolute time. This was suggested by Bertrand in an offline
conversation. I am not yet fully convinced of this idea, but I am posting the
patch that implements it for anyone interested in looking.

Since sub-millisecond sleep times are not guaranteed, as suggested by
the vacuum_cost_delay docs (see below), an alternative idea
is to use clock_nanosleep for the vacuum delay when it's available, and
fall back to WaitLatch otherwise.

"While vacuum_cost_delay can be set to fractional-millisecond values,
such delays may not be measured accurately on older platforms”

[1]: /messages/by-id/CA+hUKGKVbJE59JkwnUj5XMY+-rzcTFciV9vVC7i=LUfWPds8Xw@mail.gmail.com

Regards,

Sami

Attachments:

0001-vaccum_delay-with-absolute-time-nanosleep.patch (application/octet-stream; +63, -12)
#11Sami Imseih
samimseih@gmail.com
In reply to: Sami Imseih (#10)
Re: Restart pg_usleep when interrupted

A more portable approach could be to continue using nanosleep and
add checks to ensure that nanosleep exits whenever
it goes past an absolute time. This was suggested by Bertrand in an offline
conversation. I am not yet fully convinced of this idea, but I am posting the
patch that implements it for anyone interested in looking.

oops, forgot to attach the patch. Here it is.

Attachments:

0001-vaccum_delay-with-absolute-time-nanosleep.patch (application/octet-stream; +63, -12)
#12Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Sami Imseih (#10)
Re: Restart pg_usleep when interrupted

Hi,

On Fri, Jul 05, 2024 at 11:49:45AM -0500, Sami Imseih wrote:

With 50 indexes and 10 parallel workers I can see things like:

2024-07-02 08:22:23.789 UTC [2189616] LOG: expected 1.000000, actual 239.378368
2024-07-02 08:22:24.575 UTC [2189616] LOG: expected 0.100000, actual 224.331737
2024-07-02 08:22:25.363 UTC [2189616] LOG: expected 1.300000, actual 230.462793
2024-07-02 08:22:26.154 UTC [2189616] LOG: expected 1.000000, actual 225.980803

Means we waited more than the max allowed cost delay (100ms).

With 49 parallel workers, it's worst as I can see things like:

2024-07-02 08:26:36.069 UTC [2189807] LOG: expected 1.000000, actual 1106.790136
2024-07-02 08:26:36.298 UTC [2189807] LOG: expected 1.000000, actual 218.148985

The first actual wait time is about 1 second (it has been interrupted about
16300 times during this second).

To avoid this drift, the nanosleep() man page suggests to use clock_nanosleep()
with an absolute time value, that might be another idea to explore.

[1]: /messages/by-id/ZmaXmWDL829fzAVX@ip-10-97-1-34.eu-west-3.compute.internal

A more portable approach could be to continue using nanosleep and
add checks to ensure that nanosleep exits whenever
it goes past an absolute time. This was suggested by Bertrand in an offline
conversation. I am not yet fully convinced of this idea, but I am posting the
patch that implements it for anyone interested in looking.

Thanks!

I did a few tests with the patch and did not see any "large" drifts like the
ones observed above.

As for the patch, I have not thoroughly reviewed it (as it's still one option
among others being discussed):

+           struct timespec current;
+           float time_diff;
+
+           clock_gettime(PG_INSTR_CLOCK, &current);
+
+           time_diff = (absolute.tv_sec - current.tv_sec) + (absolute.tv_nsec - current.tv_nsec) / 1000000000.0;

I think it could be "simplified" by making use of instr_time instead of timespec
for current and absolute. Then I think it would be enough to compare their
ticks.

Since sub-millisecond sleep times are not guaranteed as suggested by
the vacuum_cost_delay docs ( see below ), an alternative idea
is to use clock_nanosleep for vacuum delay when it’s available, else
fallback to WaitLatch.

Wouldn't that increase even more the cases where sub-millisecond won't be
guaranteed?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#13Sami Imseih
samimseih@gmail.com
In reply to: Bertrand Drouvot (#12)
Re: Restart pg_usleep when interrupted

I did a few tests with the patch and did not see any "large" drifts like the
ones observed above.

Thanks for testing.

I think it could be "simplified" by making use of instr_time instead of timespec
for current and absolute. Then I think it would be enough to compare their
ticks.

Correct. I attached a v2 of this patch that uses instr_time to check the elapsed
time and break out of the loop. It needs some more benchmarking.

Since sub-millisecond sleep times are not guaranteed as suggested by
the vacuum_cost_delay docs ( see below ), an alternative idea
is to use clock_nanosleep for vacuum delay when it’s available, else
fallback to WaitLatch.

Wouldn't that increase even more the cases where sub-millisecond won't be
guaranteed?

Yes, nanosleep is going to provide the most coverage as it’s widely available.

Regards,

Sami

Attachments:

v2-0001-vaccum_delay-with-absolute-time-nanosleep.patch (application/octet-stream; +48, -2)
#14Sami Imseih
samimseih@gmail.com
In reply to: Sami Imseih (#13)
Re: Restart pg_usleep when interrupted

Correct I attached a v2 of this patch that uses instr_time to check the elapsed
time and break out of the loop. It needs some more benchmarking.

My email client has an issue sending attachments it seems. Reattaching

Attachments:

v2-0001-vaccum_delay-with-absolute-time-nanosleep.patch (application/octet-stream; +48, -2)
#15Nathan Bossart
nathandbossart@gmail.com
In reply to: Sami Imseih (#14)
Re: Restart pg_usleep when interrupted
+		/*
+		 * We allow nanosleep to handle interrupts and retry with the remaining time.
+		 * However, since nanosleep is susceptible to time drift when interrupted
+		 * frequently, we add a safeguard to break out of the nanosleep whenever the
+		 * total time of the sleep exceeds the requested sleep time. Using nanosleep
+		 * is a more portable approach than clock_nanosleep.
+		 */

I'm curious why we wouldn't just subtract "elapsed_time" from "delay" at
the bottom of the while loop to avoid needing this extra check. Also, I
think we need some commentary about why we want to retry after an interrupt
in this case.

--
nathan

#16Sami Imseih
samimseih@gmail.com
In reply to: Nathan Bossart (#15)
Re: Restart pg_usleep when interrupted

I'm curious why we wouldn't just subtract "elapsed_time" from "delay" at
the bottom of the while loop to avoid needing this extra check.

Can you elaborate further? I am not sure how this will work, since delay is a
timespec and elapsed_time is an instr_time.

Also, in every iteration of the loop, the delay must be set to the remaining time. The
purpose of the elapsed_time is to make sure that we don't surpass the requested time
delay, as an additional safeguard.

Also, I
think we need some commentary about why we want to retry after an interrupt
in this case.

I will elaborate further in the comments for the next revision.

Regards,

Sami

#17Nathan Bossart
nathandbossart@gmail.com
In reply to: Sami Imseih (#16)
Re: Restart pg_usleep when interrupted

On Thu, Jul 11, 2024 at 01:10:25PM -0500, Sami Imseih wrote:

I'm curious why we wouldn't just subtract "elapsed_time" from "delay" at
the bottom of the while loop to avoid needing this extra check.

Can you elaborate further? I am not sure how this will work since delay is a timespec
and elapsed time is an instr_time.

Also, in every iteration of the loop, the delay must be set to the remaining time. The
purpose of the elapsed_time is to make sure that we don't surpass the requested time
delay, as an additional safeguard.

I'm imagining something like this:

struct timespec delay;
TimestampTz end_time;

end_time = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), msec);

do
{
    long    secs;
    int     microsecs;

    TimestampDifference(GetCurrentTimestamp(), end_time,
                        &secs, &microsecs);

    delay.tv_sec = secs;
    delay.tv_nsec = microsecs * 1000;

} while (nanosleep(&delay, NULL) == -1 && errno == EINTR);

--
nathan

#18Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Sami Imseih (#13)
Re: Restart pg_usleep when interrupted

Hi,

On Thu, Jul 11, 2024 at 10:15:41AM -0500, Sami Imseih wrote:

I did a few tests with the patch and did not see any "large" drifts like the
ones observed above.

Thanks for testing.

I think it could be "simplified" by making use of instr_time instead of timespec
for current and absolute. Then I think it would be enough to compare their
ticks.

Correct I attached a v2 of this patch that uses instr_time to check the elapsed
time and break out of the loop. It needs some more benchmarking.

Thanks!

Outside of Nathan's comment:

1 ===

+                * However, since nanosleep is susceptible to time drift when interrupted
+                * frequently, we add a safeguard to break out of the nanosleep whenever the

I'm not sure that "nanosleep is susceptible to time drift when interrupted frequently"
is a correct wording.

What about?

"
However, since the time between frequent interruptions and restarts of the
nanosleep calls can substantially lead to drift in the time when the sleep
finally completes, we add...."

2 ===

+static void vacuum_sleep(double msec);

What about a more generic name that could be used outside of vacuum?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#19Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#17)
Re: Restart pg_usleep when interrupted

On 2024-Jul-11, Nathan Bossart wrote:

I'm imagining something like this:

struct timespec delay;
TimestampTz end_time;

end_time = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), msec);

do
{
    long    secs;
    int     microsecs;

    TimestampDifference(GetCurrentTimestamp(), end_time,
                        &secs, &microsecs);

    delay.tv_sec = secs;
    delay.tv_nsec = microsecs * 1000;

} while (nanosleep(&delay, NULL) == -1 && errno == EINTR);

This looks nicer. We could deal with clock drift easily (in case the
sysadmin winds the clock back) by testing that tv_sec+tv_nsec is not
higher than the initial time to sleep. I don't know how common this
situation is nowadays, but I remember debugging a system years ago where
autovacuum was sleeping for a very long time because of that. I can't
remember now if we did anything in the code to cope, or just told
sysadmins not to do that anymore :-)

FWIW my (Linux's) nanosleep() manpage contains this note:

If the interval specified in req is not an exact multiple of the granularity
underlying clock (see time(7)), then the interval will be rounded up to the
next multiple. Furthermore, after the sleep completes, there may still be a
delay before the CPU becomes free to once again execute the calling thread.

It's not clear to me what happens if the time to sleep is zero, so maybe
there should be a "if tv_sec == 0 && tv_nsec == 0 then break" statement
at the bottom of the loop, to quit without sleeping one more time than
needed.

For Windows, this [1] looks like an interesting and possibly relevant
read (though maybe SleepEx already does what we want to do here.)

[1]: https://randomascii.wordpress.com/2020/10/04/windows-timer-resolution-the-great-rule-change/

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today" (Mary Gardiner)

#20Sami Imseih
samimseih@gmail.com
In reply to: Nathan Bossart (#17)
Re: Restart pg_usleep when interrupted

I'm imagining something like this:

struct timespec delay;
TimestampTz end_time;

end_time = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), msec);

do
{
    long    secs;
    int     microsecs;

    TimestampDifference(GetCurrentTimestamp(), end_time,
                        &secs, &microsecs);

    delay.tv_sec = secs;
    delay.tv_nsec = microsecs * 1000;

} while (nanosleep(&delay, NULL) == -1 && errno == EINTR);

I do agree that this is cleaner code, but I am not sure I like this.

1/ TimestampDifference has a dependency on gettimeofday,
while my proposal utilizes clock_gettime. There are old discussions
comparing both mechanisms that did not reach a conclusion.
My main conclusion from these hackers discussions [1], [2] and other
online discussions on the topic is that clock_gettime should replace
gettimeofday when possible. Precision is the main reason.

2/ It no longer uses the remaining time. I think the remaining time
is still required here. I did an unrealistic stress test which shows
the original proposal can handle frequent interruptions much better.

#1 in one session kicked off a vacuum

set vacuum_cost_delay = 10;
set vacuum_cost_limit = 1;
set client_min_messages = log;
update large_tbl set version = 1;
vacuum (verbose, parallel 4) large_tbl;

#2 in another session, ran a loop to continually
interrupt the vacuum leader. This was during the
“heap scan” phase of the vacuum.

PID=< pid of vacuum leader >
while :
do
    kill -USR1 $PID
done

Using the proposed loop with the remainder, I noticed that
the actual time reported remains close to the requested
delay time.

LOG: 10.000000,10.013420
LOG: 10.000000,10.011188
LOG: 10.000000,10.010860
LOG: 10.000000,10.014839
LOG: 10.000000,10.004542
LOG: 10.000000,10.006035
LOG: 10.000000,10.012230
LOG: 10.000000,10.014535
LOG: 10.000000,10.009645
LOG: 10.000000,10.000817
LOG: 10.000000,10.002162
LOG: 10.000000,10.011721
LOG: 10.000000,10.011655

Using the approach mentioned by Nathan, there
are large differences between requested and actual time.

LOG: 10.000000,17.801778
LOG: 10.000000,12.795450
LOG: 10.000000,11.793723
LOG: 10.000000,11.796317
LOG: 10.000000,13.785993
LOG: 10.000000,11.803775
LOG: 10.000000,15.782767
LOG: 10.000000,31.783901
LOG: 10.000000,19.792440
LOG: 10.000000,21.795795
LOG: 10.000000,18.800412
LOG: 10.000000,16.782886
LOG: 10.000000,10.795197
LOG: 10.000000,14.793333
LOG: 10.000000,29.806556
LOG: 10.000000,18.810784
LOG: 10.000000,11.804956
LOG: 10.000000,24.809812
LOG: 10.000000,25.815600
LOG: 10.000000,22.809493
LOG: 10.000000,22.790908
LOG: 10.000000,19.699097
LOG: 10.000000,23.795613
LOG: 10.000000,24.797078

Let me know what you think?

[1]: /messages/by-id/31856.1400021891@sss.pgh.pa.us
[2]: /messages/by-id/E1cO7fR-0003y0-9E@gemulon.postgresql.org

Regards,

Sami

#21Nathan Bossart
nathandbossart@gmail.com
In reply to: Sami Imseih (#20)
#22Sami Imseih
samimseih@gmail.com
In reply to: Nathan Bossart (#21)
#23Nathan Bossart
nathandbossart@gmail.com
In reply to: Sami Imseih (#22)
#24Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Nathan Bossart (#23)
#25Sami Imseih
samimseih@gmail.com
In reply to: Bertrand Drouvot (#24)
#26Sami Imseih
samimseih@gmail.com
In reply to: Sami Imseih (#25)
#27Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Sami Imseih (#25)
#28Sami Imseih
samimseih@gmail.com
In reply to: Bertrand Drouvot (#27)
#29Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Sami Imseih (#28)
#30Sami Imseih
samimseih@gmail.com
In reply to: Bertrand Drouvot (#29)
#31Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Sami Imseih (#30)
#32Sami Imseih
samimseih@gmail.com
In reply to: Bertrand Drouvot (#31)
#33Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Sami Imseih (#32)
#34Sami Imseih
samimseih@gmail.com
In reply to: Bertrand Drouvot (#33)
#35Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Sami Imseih (#34)
#36Nathan Bossart
nathandbossart@gmail.com
In reply to: Bertrand Drouvot (#33)
#37Sami Imseih
samimseih@gmail.com
In reply to: Nathan Bossart (#36)
#38Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Nathan Bossart (#36)
#39Nathan Bossart
nathandbossart@gmail.com
In reply to: Bertrand Drouvot (#38)
#40Nathan Bossart
nathandbossart@gmail.com
In reply to: Sami Imseih (#37)
#41Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#40)
#42Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Nathan Bossart (#41)
#43Sami Imseih
samimseih@gmail.com
In reply to: Bertrand Drouvot (#42)
#44Sami Imseih
samimseih@gmail.com
In reply to: Sami Imseih (#43)
#45Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Sami Imseih (#44)
#46Sami Imseih
samimseih@gmail.com
In reply to: Bertrand Drouvot (#45)
#47Sami Imseih
samimseih@gmail.com
In reply to: Sami Imseih (#46)
#48Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Sami Imseih (#47)
#49Nathan Bossart
nathandbossart@gmail.com
In reply to: Bertrand Drouvot (#48)
#50Sami Imseih
samimseih@gmail.com
In reply to: Nathan Bossart (#49)
#51Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Sami Imseih (#50)
#52Nathan Bossart
nathandbossart@gmail.com
In reply to: Heikki Linnakangas (#51)
#53Sami Imseih
samimseih@gmail.com
In reply to: Nathan Bossart (#52)
#54Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Nathan Bossart (#52)
#55Nathan Bossart
nathandbossart@gmail.com
In reply to: Sami Imseih (#53)
#56Sami Imseih
samimseih@gmail.com
In reply to: Nathan Bossart (#55)
#57Nathan Bossart
nathandbossart@gmail.com
In reply to: Sami Imseih (#56)
#58Sami Imseih
samimseih@gmail.com
In reply to: Nathan Bossart (#57)
#59Sami Imseih
samimseih@gmail.com
In reply to: Sami Imseih (#58)
#60Nathan Bossart
nathandbossart@gmail.com
In reply to: Sami Imseih (#58)
#61Sami Imseih
samimseih@gmail.com
In reply to: Nathan Bossart (#60)
#62Nathan Bossart
nathandbossart@gmail.com
In reply to: Sami Imseih (#61)
#63Sami Imseih
samimseih@gmail.com
In reply to: Nathan Bossart (#62)
#64Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Nathan Bossart (#62)
#65Nathan Bossart
nathandbossart@gmail.com
In reply to: Bertrand Drouvot (#64)
#66Sami Imseih
samimseih@gmail.com
In reply to: Nathan Bossart (#65)
#67Thomas Munro
thomas.munro@gmail.com
In reply to: Nathan Bossart (#62)
#68Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#67)
#69Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Nathan Bossart (#65)
#70Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Nathan Bossart (#60)
#71Sami Imseih
samimseih@gmail.com
In reply to: Bertrand Drouvot (#70)
#72Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Sami Imseih (#71)
#73Sami Imseih
samimseih@gmail.com
In reply to: Bertrand Drouvot (#72)