pg_test_timing: fix unit typo and widen diff type

Started by Chao Li16 days ago5 messageshackers
Jump to latest
#1Chao Li
li.evan.chao@gmail.com

Hi,

This morning, as part of my usual routine, I synced the master branch and read through the recent commits. While reading 82c0cb4e672, I noticed a mistake in an error message. The relevant code is like:
```
diff = INSTR_TIME_GET_NANOSEC(diff_time);

fprintf(stderr, _("Time warp: %d ms\n"), diff);
```

Here, “diff" is in nanoseconds, but the error message prints ms as the unit, which is incorrect.

To fix that, I think there are two possible options:

1. Use INSTR_TIME_GET_MILLISEC to get “diff"
2. Change “ms" to “ns" in the error message

After reading through the whole file, I think option 2 is the right fix. While doing that, I also noticed another issue.

“diff" is currently defined as int32. Although one might think that is enough for a time delta, I believe it should be int64 for two reasons:

* INSTR_TIME_GET_NANOSEC() explicitly returns int64:
```
#define INSTR_TIME_GET_NANOSEC(t) \
((int64) (t).ticks)
```

* The current code has a sanity check for backward clock drift:
```
/* Did time go backwards? */
if (unlikely(diff < 0))
{
fprintf(stderr, _("Detected clock going backwards in time.\n"));
fprintf(stderr, _("Time warp: %d ms\n"), diff);
exit(1);
}
```
Clock jumping forward is also possible, and a forward jump of about 2.14 seconds would overflow int32 when expressed in nanoseconds, making the value appear negative. In that case, the code could report a “backwards” clock jump when the actual jump was forwards, which would be misleading.

To make the patch easier to process, I split it into two parts: 0001 fixes the unit in the error message, and 0002 changes the type of diff. If this gets accepted, I would be happy to squash them into one commit.

I should also note that although I noticed this while reading 82c0cb4e672, I do not think this was an oversight of that commit. More likely, because clock drift backwards is rare, this issue has simply gone unnoticed for many years.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

v1-0001-pg_test_timing-fix-unit-in-backward-clock-warning.patchapplication/octet-stream; name=v1-0001-pg_test_timing-fix-unit-in-backward-clock-warning.patch; x-unix-mode=0644Download+1-2
v1-0002-pg_test_timing-use-int64-for-largest-observed-tim.patchapplication/octet-stream; name=v1-0002-pg_test_timing-use-int64-for-largest-observed-tim.patch; x-unix-mode=0644Download+5-6
#2Lukas Fittl
lukas@fittl.com
In reply to: Chao Li (#1)
Re: pg_test_timing: fix unit typo and widen diff type

Hi Chao,

On Wed, Apr 1, 2026 at 8:10 PM Chao Li <li.evan.chao@gmail.com> wrote:

Hi,

This morning, as part of my usual routine, I synced the master branch and read through the recent commits. While reading 82c0cb4e672, I noticed a mistake in an error message. The relevant code is like:
```
diff = INSTR_TIME_GET_NANOSEC(diff_time);

fprintf(stderr, _("Time warp: %d ms\n"), diff);
```

Here, “diff" is in nanoseconds, but the error message prints ms as the unit, which is incorrect.

Good catch!

It looks like the use of nanoseconds for "diff" got introduced last
year in 0b096e379e6f9bd49 (as you note later in the email, today's
commit didn't actually change that part), CCing Tom and Hannu as
authors of that earlier change.

That said, its a bit odd that we were using INSTR_TIME_GET_MICROSEC
there before that earlier commit, but called it "ms" (i.e.
milliseconds) in the error message.

To fix that, I think there are two possible options:

1. Use INSTR_TIME_GET_MILLISEC to get “diff"
2. Change “ms" to “ns" in the error message

After reading through the whole file, I think option 2 is the right fix. While doing that, I also noticed another issue.

“diff" is currently defined as int32. Although one might think that is enough for a time delta, I believe it should be int64 for two reasons:

* INSTR_TIME_GET_NANOSEC() explicitly returns int64:
```
#define INSTR_TIME_GET_NANOSEC(t) \
((int64) (t).ticks)
```

* The current code has a sanity check for backward clock drift:
```
/* Did time go backwards? */
if (unlikely(diff < 0))
{
fprintf(stderr, _("Detected clock going backwards in time.\n"));
fprintf(stderr, _("Time warp: %d ms\n"), diff);
exit(1);
}
```
Clock jumping forward is also possible, and a forward jump of about 2.14 seconds would overflow int32 when expressed in nanoseconds, making the value appear negative. In that case, the code could report a “backwards” clock jump when the actual jump was forwards, which would be misleading.

I agree it doesn't seem sound to use an int32 for storing the result
of INSTR_TIME_GET_NANOSEC. It looks like we may also need to adjust
the call to pg_leftmost_one_pos32 though if we actually accept that
large a "diff" value, as in your patch.

Maybe we should error out if the diff is larger than an int32, noting
a positive time drift?

Independently of that, its worth noting we could easily emit the diff
in a larger unit (micro or milliseconds) for easier interpretation, by
just calling INSTR_TIME_GET_MICROSEC / INSTR_TIME_GET_MILLISEC on the
"diff_time" again.

Thanks,
Lukas

--
Lukas Fittl

#3Chao Li
li.evan.chao@gmail.com
In reply to: Lukas Fittl (#2)
Re: pg_test_timing: fix unit typo and widen diff type

On Apr 2, 2026, at 12:17, Lukas Fittl <lukas@fittl.com> wrote:

Hi Chao,

On Wed, Apr 1, 2026 at 8:10 PM Chao Li <li.evan.chao@gmail.com> wrote:

Hi,

This morning, as part of my usual routine, I synced the master branch and read through the recent commits. While reading 82c0cb4e672, I noticed a mistake in an error message. The relevant code is like:
```
diff = INSTR_TIME_GET_NANOSEC(diff_time);

fprintf(stderr, _("Time warp: %d ms\n"), diff);
```

Here, “diff" is in nanoseconds, but the error message prints ms as the unit, which is incorrect.

Good catch!

It looks like the use of nanoseconds for "diff" got introduced last
year in 0b096e379e6f9bd49 (as you note later in the email, today's
commit didn't actually change that part), CCing Tom and Hannu as
authors of that earlier change.

That said, its a bit odd that we were using INSTR_TIME_GET_MICROSEC
there before that earlier commit, but called it "ms" (i.e.
milliseconds) in the error message.

To fix that, I think there are two possible options:

1. Use INSTR_TIME_GET_MILLISEC to get “diff"
2. Change “ms" to “ns" in the error message

After reading through the whole file, I think option 2 is the right fix. While doing that, I also noticed another issue.

“diff" is currently defined as int32. Although one might think that is enough for a time delta, I believe it should be int64 for two reasons:

* INSTR_TIME_GET_NANOSEC() explicitly returns int64:
```
#define INSTR_TIME_GET_NANOSEC(t) \
((int64) (t).ticks)
```

* The current code has a sanity check for backward clock drift:
```
/* Did time go backwards? */
if (unlikely(diff < 0))
{
fprintf(stderr, _("Detected clock going backwards in time.\n"));
fprintf(stderr, _("Time warp: %d ms\n"), diff);
exit(1);
}
```
Clock jumping forward is also possible, and a forward jump of about 2.14 seconds would overflow int32 when expressed in nanoseconds, making the value appear negative. In that case, the code could report a “backwards” clock jump when the actual jump was forwards, which would be misleading.

I agree it doesn't seem sound to use an int32 for storing the result
of INSTR_TIME_GET_NANOSEC. It looks like we may also need to adjust
the call to pg_leftmost_one_pos32 though if we actually accept that
large a "diff" value, as in your patch.

You are right. Changed to pg_leftmost_one_pos64 in v2.

Maybe we should error out if the diff is larger than an int32, noting
a positive time drift?

I agree we should warn/fail upon clock forwards drift. But I doubt int32 is too big (~2.14 seconds), I consider 1 second could be a too big threshold. Let’s wait for more voices on this.

Independently of that, its worth noting we could easily emit the diff
in a larger unit (micro or milliseconds) for easier interpretation, by
just calling INSTR_TIME_GET_MICROSEC / INSTR_TIME_GET_MILLISEC on the
"diff_time" again.

Given the error should rarely happen, I personally feel that might not be super helpful. Also, if the drift is just beyond the threshold, bumping to microsecond or millisecond might print just 0.

PFA v2 - updated 0002 for pg_leftmost_one_pos64.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

v2-0001-pg_test_timing-fix-unit-in-backward-clock-warning.patchapplication/octet-stream; name=v2-0001-pg_test_timing-fix-unit-in-backward-clock-warning.patch; x-unix-mode=0644Download+1-2
v2-0002-pg_test_timing-use-int64-for-largest-observed-tim.patchapplication/octet-stream; name=v2-0002-pg_test_timing-use-int64-for-largest-observed-tim.patch; x-unix-mode=0644Download+6-7
#4Chao Li
li.evan.chao@gmail.com
In reply to: Chao Li (#3)
Re: pg_test_timing: fix unit typo and widen diff type

On Apr 2, 2026, at 15:28, Chao Li <li.evan.chao@gmail.com> wrote:

On Apr 2, 2026, at 12:17, Lukas Fittl <lukas@fittl.com> wrote:

Hi Chao,

On Wed, Apr 1, 2026 at 8:10 PM Chao Li <li.evan.chao@gmail.com> wrote:

Hi,

This morning, as part of my usual routine, I synced the master branch and read through the recent commits. While reading 82c0cb4e672, I noticed a mistake in an error message. The relevant code is like:
```
diff = INSTR_TIME_GET_NANOSEC(diff_time);

fprintf(stderr, _("Time warp: %d ms\n"), diff);
```

Here, “diff" is in nanoseconds, but the error message prints ms as the unit, which is incorrect.

Good catch!

It looks like the use of nanoseconds for "diff" got introduced last
year in 0b096e379e6f9bd49 (as you note later in the email, today's
commit didn't actually change that part), CCing Tom and Hannu as
authors of that earlier change.

That said, its a bit odd that we were using INSTR_TIME_GET_MICROSEC
there before that earlier commit, but called it "ms" (i.e.
milliseconds) in the error message.

To fix that, I think there are two possible options:

1. Use INSTR_TIME_GET_MILLISEC to get “diff"
2. Change “ms" to “ns" in the error message

After reading through the whole file, I think option 2 is the right fix. While doing that, I also noticed another issue.

“diff" is currently defined as int32. Although one might think that is enough for a time delta, I believe it should be int64 for two reasons:

* INSTR_TIME_GET_NANOSEC() explicitly returns int64:
```
#define INSTR_TIME_GET_NANOSEC(t) \
((int64) (t).ticks)
```

* The current code has a sanity check for backward clock drift:
```
/* Did time go backwards? */
if (unlikely(diff < 0))
{
fprintf(stderr, _("Detected clock going backwards in time.\n"));
fprintf(stderr, _("Time warp: %d ms\n"), diff);
exit(1);
}
```
Clock jumping forward is also possible, and a forward jump of about 2.14 seconds would overflow int32 when expressed in nanoseconds, making the value appear negative. In that case, the code could report a “backwards” clock jump when the actual jump was forwards, which would be misleading.

I agree it doesn't seem sound to use an int32 for storing the result
of INSTR_TIME_GET_NANOSEC. It looks like we may also need to adjust
the call to pg_leftmost_one_pos32 though if we actually accept that
large a "diff" value, as in your patch.

You are right. Changed to pg_leftmost_one_pos64 in v2.

Maybe we should error out if the diff is larger than an int32, noting
a positive time drift?

I agree we should warn/fail upon clock forwards drift. But I doubt int32 is too big (~2.14 seconds), I consider 1 second could be a too big threshold. Let’s wait for more voices on this.

Independently of that, its worth noting we could easily emit the diff
in a larger unit (micro or milliseconds) for easier interpretation, by
just calling INSTR_TIME_GET_MICROSEC / INSTR_TIME_GET_MILLISEC on the
"diff_time" again.

Given the error should rarely happen, I personally feel that might not be super helpful. Also, if the drift is just beyond the threshold, bumping to microsecond or millisecond might print just 0.

PFA v2 - updated 0002 for pg_leftmost_one_pos64.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

<v2-0001-pg_test_timing-fix-unit-in-backward-clock-warning.patch><v2-0002-pg_test_timing-use-int64-for-largest-observed-tim.patch>

PFA v3: Fixed a CI failure.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachments:

v3-0001-pg_test_timing-fix-unit-in-backward-clock-warning.patchapplication/octet-stream; name=v3-0001-pg_test_timing-fix-unit-in-backward-clock-warning.patch; x-unix-mode=0644Download+1-2
v3-0002-pg_test_timing-use-int64-for-largest-observed-tim.patchapplication/octet-stream; name=v3-0002-pg_test_timing-use-int64-for-largest-observed-tim.patch; x-unix-mode=0644Download+7-8
#5Xiaopeng Wang
wxp_728@163.com
In reply to: Chao Li (#4)
Re:Re: pg_test_timing: fix unit typo and widen diff type

At 2026-04-08 12:06:07, "Chao Li" <li.evan.chao@gmail.com> wrote:

On Apr 2, 2026, at 15:28, Chao Li <li.evan.chao@gmail.com> wrote:

On Apr 2, 2026, at 12:17, Lukas Fittl <lukas@fittl.com> wrote:

Hi Chao,

On Wed, Apr 1, 2026 at 8:10 PM Chao Li <li.evan.chao@gmail.com> wrote:

Hi,

This morning, as part of my usual routine, I synced the master branch and read through the recent commits. While reading 82c0cb4e672, I noticed a mistake in an error message. The relevant code is like:
```
diff = INSTR_TIME_GET_NANOSEC(diff_time);

fprintf(stderr, _("Time warp: %d ms\n"), diff);
```

Here, “diff" is in nanoseconds, but the error message prints ms as the unit, which is incorrect.

Good catch!

It looks like the use of nanoseconds for "diff" got introduced last
year in 0b096e379e6f9bd49 (as you note later in the email, today's
commit didn't actually change that part), CCing Tom and Hannu as
authors of that earlier change.

That said, its a bit odd that we were using INSTR_TIME_GET_MICROSEC
there before that earlier commit, but called it "ms" (i.e.
milliseconds) in the error message.

To fix that, I think there are two possible options:

1. Use INSTR_TIME_GET_MILLISEC to get “diff"
2. Change “ms" to “ns" in the error message

After reading through the whole file, I think option 2 is the right fix. While doing that, I also noticed another issue.

“diff" is currently defined as int32. Although one might think that is enough for a time delta, I believe it should be int64 for two reasons:

* INSTR_TIME_GET_NANOSEC() explicitly returns int64:
```
#define INSTR_TIME_GET_NANOSEC(t) \
((int64) (t).ticks)
```

* The current code has a sanity check for backward clock drift:
```
/* Did time go backwards? */
if (unlikely(diff < 0))
{
fprintf(stderr, _("Detected clock going backwards in time.\n"));
fprintf(stderr, _("Time warp: %d ms\n"), diff);
exit(1);
}
```
Clock jumping forward is also possible, and a forward jump of about 2.14 seconds would overflow int32 when expressed in nanoseconds, making the value appear negative. In that case, the code could report a “backwards” clock jump when the actual jump was forwards, which would be misleading.

I agree it doesn't seem sound to use an int32 for storing the result
of INSTR_TIME_GET_NANOSEC. It looks like we may also need to adjust
the call to pg_leftmost_one_pos32 though if we actually accept that
large a "diff" value, as in your patch.

You are right. Changed to pg_leftmost_one_pos64 in v2.

Maybe we should error out if the diff is larger than an int32, noting
a positive time drift?

I agree we should warn/fail upon clock forwards drift. But I doubt int32 is too big (~2.14 seconds), I consider 1 second could be a too big threshold. Let’s wait for more voices on this.

Independently of that, its worth noting we could easily emit the diff
in a larger unit (micro or milliseconds) for easier interpretation, by
just calling INSTR_TIME_GET_MICROSEC / INSTR_TIME_GET_MILLISEC on the
"diff_time" again.

Given the error should rarely happen, I personally feel that might not be super helpful. Also, if the drift is just beyond the threshold, bumping to microsecond or millisecond might print just 0.

PFA v2 - updated 0002 for pg_leftmost_one_pos64.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

<v2-0001-pg_test_timing-fix-unit-in-backward-clock-warning.patch><v2-0002-pg_test_timing-use-int64-for-largest-observed-tim.patch>

PFA v3: Fixed a CI failure.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Hi Chao, Lukas,
Just finished reviewing the v3 patches - looks good to me. The pg_leftmost_one_pos64 fix is spot on, and using INT64_FORMAT for the error message is the right approach.
Nice catch on the unit mismatch and the int32 overflow issue. Both patches are ready to go from my perspective.
Cheers,
Xiaopeng Wang