What is a typical precision of gettimeofday()?
Over in the thread discussing the addition of UUIDv7 support [0]/messages/by-id/CAAhFRxitJv=yoGnXUgeLB_O+M7J2BJAmb5jqAT9gZ3bij3uLDA@mail.gmail.com, there
is some uncertainty about what timestamp precision one can expect from
gettimeofday().
UUIDv7 uses milliseconds since Unix epoch, but can optionally use up to
12 additional bits of timestamp precision (see [1]https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#section-6.2-5.6.1), but it can also
just use a counter instead of the extra precision. The current patch
uses the counter method "because of portability concerns" (source code
comment).
I feel that we don't actually have any information about this
portability concern. Does anyone know what precision we can expect from
gettimeofday()? Can we expect the full microsecond precision usually?
[0]: /messages/by-id/CAAhFRxitJv=yoGnXUgeLB_O+M7J2BJAmb5jqAT9gZ3bij3uLDA@mail.gmail.com
/messages/by-id/CAAhFRxitJv=yoGnXUgeLB_O+M7J2BJAmb5jqAT9gZ3bij3uLDA@mail.gmail.com
[1]: https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#section-6.2-5.6.1
https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#section-6.2-5.6.1
Hi,
cc: Andrey
Over in the thread discussing the addition of UUIDv7 support [0], there
is some uncertainty about what timestamp precision one can expect from
gettimeofday().UUIDv7 uses milliseconds since Unix epoch, but can optionally use up to
12 additional bits of timestamp precision (see [1]), but it can also
just use a counter instead of the extra precision. The current patch
uses the counter method "because of portability concerns" (source code
comment).I feel that we don't actually have any information about this
portability concern. Does anyone know what precision we can expect from
gettimeofday()? Can we expect the full microsecond precision usually?
Specifically in the UUIDv7 application the goal is to generate not
necessarily time-precise UUIDs but rather do our best to get *unique*
UUIDs. As I understand, this is the actual reason why the patch needs
counters.
As Linux man page puts it:
"""
The time returned by gettimeofday() is affected by discontinuous jumps
in the system time (e.g., if the system administrator manually
changes the system time).
"""
On top of that MacOS man page says:
"""
The resolution of the system clock is hardware dependent, and the time
may be updated continuously or in ``ticks.''
"""
On Windows our gettimeofday() implementation is a wrapper for
GetSystemTimePreciseAsFileTime(). The corresponding MSDN page [1]https://learn.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getsystemtimepreciseasfiletime is
somewhat laconic.
Considering the number of environments PostgreSQL can run in (OS +
hardware + virtualization technologies) and the fact that
hardware/software changes I doubt that it's realistic to expect any
particular guarantees from gettimeofday() in the general case.
--
Best regards,
Aleksander Alekseev
On 19.03.24 10:38, Aleksander Alekseev wrote:
Considering the number of environments PostgreSQL can run in (OS +
hardware + virtualization technologies) and the fact that
hardware/software changes I doubt that it's realistic to expect any
particular guarantees from gettimeofday() in the general case.
If we want to be robust without any guarantees from gettimeofday(), then
arguably gettimeofday() is not the right underlying function to use for
UUIDv7. I'm not arguing that, I think we can assume some reasonable
baseline for what gettimeofday() produces. But it would be good to get
some information about what that might be.
Btw., here is util-linux saying
/* Assume that the gettimeofday() has microsecond granularity */
https://github.com/util-linux/util-linux/blob/master/libuuid/src/gen_uuid.c#L232
On Wed, 20 Mar 2024 at 07:35, Peter Eisentraut <peter@eisentraut.org> wrote:
If we want to be robust without any guarantees from gettimeofday(), then
arguably gettimeofday() is not the right underlying function to use for
UUIDv7.
There's also clock_gettime which exposes its resolution using clock_getres
On 19 Mar 2024, at 13:28, Peter Eisentraut <peter@eisentraut.org> wrote:
I feel that we don't actually have any information about this portability concern. Does anyone know what precision we can expect from gettimeofday()? Can we expect the full microsecond precision usually?
At PGConf.dev Hannu Krossing draw attention to pg_test_timing module. I’ve tried this module(slightly modified to measure nanoseconds) on some systems, and everywhere I found ~100ns resolution (95% of ticks fall into 64ns and 128ns buckets).
I’ll add cc Hannu, and also pg_test_timing module authors Ants ang Greg. Maybe they can add some context.
Best regards, Andrey Borodin.
I plan to send patch to pg_test_timing in a day or two
the underlying time precision on modern linux seems to be
2 ns for some Intel CPUs
10 ns for Zen4
40 ns for ARM (Ampere)
---
Hannu
|
On Tue, Jun 18, 2024 at 7:48 AM Andrey M. Borodin <x4mmm@yandex-team.ru>
wrote:
Show quoted text
On 19 Mar 2024, at 13:28, Peter Eisentraut <peter@eisentraut.org> wrote:
I feel that we don't actually have any information about this
portability concern. Does anyone know what precision we can expect from
gettimeofday()? Can we expect the full microsecond precision usually?At PGConf.dev Hannu Krossing draw attention to pg_test_timing module. I’ve
tried this module(slightly modified to measure nanoseconds) on some
systems, and everywhere I found ~100ns resolution (95% of ticks fall into
64ns and 128ns buckets).I’ll add cc Hannu, and also pg_test_timing module authors Ants ang Greg.
Maybe they can add some context.Best regards, Andrey Borodin.
Here is the output of nanosecond-precision pg_test_timing on M1 Macbook Air
/work/postgres/src/bin/pg_test_timing % ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 21.54 ns
Histogram of timing durations:
<= ns % of total running % count
0 49.1655 49.1655 68481688
1 0.0000 49.1655 0
3 0.0000 49.1655 0
7 0.0000 49.1655 0
15 0.0000 49.1655 0
31 0.0000 49.1655 0
63 50.6890 99.8545 70603742
127 0.1432 99.9976 199411
255 0.0015 99.9991 2065
511 0.0001 99.9992 98
1023 0.0001 99.9993 140
2047 0.0002 99.9995 284
4095 0.0000 99.9995 50
8191 0.0000 99.9996 65
16383 0.0002 99.9997 240
32767 0.0001 99.9998 128
65535 0.0001 99.9999 97
131071 0.0000 99.9999 58
262143 0.0000 100.0000 44
524287 0.0000 100.0000 22
1048575 0.0000 100.0000 7
2097151 0.0000 100.0000 2
First 128 exact nanoseconds:
0 49.1655 49.1655 68481688
41 16.8964 66.0619 23534708
42 33.7926 99.8545 47069034
83 0.0835 99.9380 116362
84 0.0419 99.9799 58349
125 0.0177 99.9976 24700
As you see the 40 ns internal tick gets somehow blurred into
not-quite-40-ns timing step
On Linux / ARM Ampere where __builtin_readcyclecounter() works (it
compiles but crashes on Mac OS M1, I have not yet tested on Linux M1)
the tick is exactly 40 ns and I'd expect it to be the same on M1.
Show quoted text
On Tue, Jun 18, 2024 at 5:08 PM Hannu Krosing <hannuk@google.com> wrote:
I plan to send patch to pg_test_timing in a day or two
the underlying time precision on modern linux seems to be
2 ns for some Intel CPUs
10 ns for Zen4
40 ns for ARM (Ampere)---
Hannu|
On Tue, Jun 18, 2024 at 7:48 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
On 19 Mar 2024, at 13:28, Peter Eisentraut <peter@eisentraut.org> wrote:
I feel that we don't actually have any information about this portability concern. Does anyone know what precision we can expect from gettimeofday()? Can we expect the full microsecond precision usually?
At PGConf.dev Hannu Krossing draw attention to pg_test_timing module. I’ve tried this module(slightly modified to measure nanoseconds) on some systems, and everywhere I found ~100ns resolution (95% of ticks fall into 64ns and 128ns buckets).
I’ll add cc Hannu, and also pg_test_timing module authors Ants ang Greg. Maybe they can add some context.
Best regards, Andrey Borodin.
On 18.06.24 07:47, Andrey M. Borodin wrote:
On 19 Mar 2024, at 13:28, Peter Eisentraut <peter@eisentraut.org> wrote:
I feel that we don't actually have any information about this portability concern. Does anyone know what precision we can expect from gettimeofday()? Can we expect the full microsecond precision usually?
At PGConf.dev Hannu Krossing draw attention to pg_test_timing module. I’ve tried this module(slightly modified to measure nanoseconds) on some systems, and everywhere I found ~100ns resolution (95% of ticks fall into 64ns and 128ns buckets).
AFAICT, pg_test_timing doesn't use gettimeofday(), so this doesn't
really address the original question.
Peter Eisentraut <peter@eisentraut.org> writes:
AFAICT, pg_test_timing doesn't use gettimeofday(), so this doesn't
really address the original question.
It's not exactly hard to make it do so (see attached).
I tried this on several different machines, and my conclusion is that
gettimeofday() reports full microsecond precision on any platform
anybody is likely to be running PG on today. Even my one surviving
pet dinosaur, NetBSD 10 on PowerPC Mac (mamba), shows results like
this:
$ ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 901.41 ns
Histogram of timing durations:
< us % of total count
1 10.46074 348148
2 89.51495 2979181
4 0.00574 191
8 0.00430 143
16 0.00691 230
32 0.00376 125
64 0.00012 4
128 0.00303 101
256 0.00027 9
512 0.00009 3
1024 0.00009 3
I also modified pg_test_timing to measure nanoseconds not
microseconds (second patch attached), and got this:
$ ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 805.50 ns
Histogram of timing durations:
< ns % of total count
1 19.84234 739008
2 0.00000 0
4 0.00000 0
8 0.00000 0
16 0.00000 0
32 0.00000 0
64 0.00000 0
128 0.00000 0
256 0.00000 0
512 0.00000 0
1024 80.14013 2984739
2048 0.00078 29
4096 0.00658 245
8192 0.00290 108
16384 0.00252 94
32768 0.00250 93
65536 0.00016 6
131072 0.00185 69
262144 0.00008 3
524288 0.00008 3
1048576 0.00008 3
confirming that when the result changes it generally does so by 1usec.
Applying just the second patch, I find that clock_gettime on this
old hardware seems to be limited to 1us resolution, but on my more
modern machines (mac M1, x86_64) it can tick at 40ns or less.
Even a raspberry pi 4 shows
$ ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 69.12 ns
Histogram of timing durations:
< ns % of total count
1 0.00000 0
2 0.00000 0
4 0.00000 0
8 0.00000 0
16 0.00000 0
32 0.00000 0
64 37.59583 16317040
128 62.38568 27076131
256 0.01674 7265
512 0.00002 8
1024 0.00000 0
2048 0.00000 0
4096 0.00153 662
8192 0.00019 83
16384 0.00001 3
32768 0.00001 5
suggesting that the clock_gettime resolution is better than 64 ns.
So I concur with Hannu that it's time to adjust pg_test_timing to
resolve nanoseconds not microseconds. I gather he's created a
patch that does more than mine below, so I'll wait for that.
regards, tom lane
(resending to list and other CC:s )
Hi Tom
This is my current patch which also adds running % and optionally uses
faster way to count leading zeros, though I did not see a change from
that.
It also bucketizes first 128 ns to get better overview of exact behaviour.
We may want to put reporting this behind a flag
---
Hannu
Show quoted text
On Wed, Jun 19, 2024 at 6:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Eisentraut <peter@eisentraut.org> writes:
AFAICT, pg_test_timing doesn't use gettimeofday(), so this doesn't
really address the original question.It's not exactly hard to make it do so (see attached).
I tried this on several different machines, and my conclusion is that
gettimeofday() reports full microsecond precision on any platform
anybody is likely to be running PG on today. Even my one surviving
pet dinosaur, NetBSD 10 on PowerPC Mac (mamba), shows results like
this:$ ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 901.41 ns
Histogram of timing durations:
< us % of total count
1 10.46074 348148
2 89.51495 2979181
4 0.00574 191
8 0.00430 143
16 0.00691 230
32 0.00376 125
64 0.00012 4
128 0.00303 101
256 0.00027 9
512 0.00009 3
1024 0.00009 3I also modified pg_test_timing to measure nanoseconds not
microseconds (second patch attached), and got this:$ ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 805.50 ns
Histogram of timing durations:
< ns % of total count
1 19.84234 739008
2 0.00000 0
4 0.00000 0
8 0.00000 0
16 0.00000 0
32 0.00000 0
64 0.00000 0
128 0.00000 0
256 0.00000 0
512 0.00000 0
1024 80.14013 2984739
2048 0.00078 29
4096 0.00658 245
8192 0.00290 108
16384 0.00252 94
32768 0.00250 93
65536 0.00016 6
131072 0.00185 69
262144 0.00008 3
524288 0.00008 3
1048576 0.00008 3confirming that when the result changes it generally does so by 1usec.
Applying just the second patch, I find that clock_gettime on this
old hardware seems to be limited to 1us resolution, but on my more
modern machines (mac M1, x86_64) it can tick at 40ns or less.
Even a raspberry pi 4 shows$ ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 69.12 ns
Histogram of timing durations:
< ns % of total count
1 0.00000 0
2 0.00000 0
4 0.00000 0
8 0.00000 0
16 0.00000 0
32 0.00000 0
64 37.59583 16317040
128 62.38568 27076131
256 0.01674 7265
512 0.00002 8
1024 0.00000 0
2048 0.00000 0
4096 0.00153 662
8192 0.00019 83
16384 0.00001 3
32768 0.00001 5suggesting that the clock_gettime resolution is better than 64 ns.
So I concur with Hannu that it's time to adjust pg_test_timing to
resolve nanoseconds not microseconds. I gather he's created a
patch that does more than mine below, so I'll wait for that.regards, tom lane
Attachments:
pg_test_timing.nanoseconds.patchtext/x-patch; charset=US-ASCII; name=pg_test_timing.nanoseconds.patchDownload+43-14
I also have a variant that uses the low-level CPU cycle counter
directly (attached)
It currently only works on clang, as it is done using
__builtin_readcyclecounter() in order to support both x64 and ARM.
This one is there to understand the overhead of the calculations when
going from cycle counter to POSIX time struct
This works OK with Clang, but we should probably not integrate this
directly into the code as it has some interesting corner cases. For
example Apple's clang does compile __builtin_readcyclecounter() but
crashes with unknown instruction when trying to run it.
Therefore I have not integrated it into Makefile so if you want to use
it, just copy it into src/bin/pg_test_timing and run
cd src/bin/pgtest_timing
mv pg_test_timing.c pg_test_timing.c.backup
cp pg_test_cyclecounter.c pg_test_timing.c
make
mv pg_test_timing pg_test_cyclecounter
mv pg_test_timing.c.backup pg_test_timing.c
It gives output like
Testing timing overhead for 3 seconds.
Total 25000001 ticks in 1000000073 ns, 24999999.175000 ticks per ns
This CPU is running at 24999999 ticks / second, will run test for 74999997 ticks
loop_count 287130958Per loop time including overhead: 10.45 ns, min: 0
ticks (0.0 ns), same: 212854591
Total ticks in: 74999997, in: 3000000541 nr
Log2 histogram of timing durations:
< ticks ( < ns) % of total running % count
1 ( 40.0) 74.1315 74.1315 212854591
2 ( 80.0) 25.8655 99.9970 74267876
4 ( 160.0) 0.0000 99.9970 7
8 ( 320.0) 0.0000 99.9970 3
16 ( 640.0) 0.0000 99.9970 1
32 ( 1280.0) 0.0000 99.9971 27
64 ( 2560.0) 0.0012 99.9983 3439
128 ( 5120.0) 0.0016 99.9999 4683
256 ( 10240.0) 0.0001 100.0000 265
512 ( 20480.0) 0.0000 100.0000 37
1024 ( 40960.0) 0.0000 100.0000 23
2048 ( 81920.0) 0.0000 100.0000 6
First 64 ticks --
0 ( 0.0) 74.1315 74.1315 212854591
1 ( 40.0) 25.8655 99.9970 74267876
2 ( 80.0) 0.0000 99.9970 2
3 ( 120.0) 0.0000 99.9970 5
4 ( 160.0) 0.0000 99.9970 2
6 ( 240.0) 0.0000 99.9983 1
13 ( 520.0) 0.0000 100.0000 1
...
59 ( 2360.0) 0.0000 100.0000 140
60 ( 2400.0) 0.0001 100.0000 210
61 ( 2440.0) 0.0002 100.0000 497
62 ( 2480.0) 0.0002 100.0000 524
63 ( 2520.0) 0.0001 100.0000 391
If you run on some interesting hardware, please share the results.
If we her enough I will put them together in a spreadsheet and share
I also attach my lightning talk slides here
---
Hannu
Show quoted text
On Thu, Jun 20, 2024 at 12:41 PM Hannu Krosing <hannuk@google.com> wrote:
(resending to list and other CC:s )
Hi Tom
This is my current patch which also adds running % and optionally uses
faster way to count leading zeros, though I did not see a change from
that.It also bucketizes first 128 ns to get better overview of exact behaviour.
We may want to put reporting this behind a flag
---
HannuOn Wed, Jun 19, 2024 at 6:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Eisentraut <peter@eisentraut.org> writes:
AFAICT, pg_test_timing doesn't use gettimeofday(), so this doesn't
really address the original question.It's not exactly hard to make it do so (see attached).
I tried this on several different machines, and my conclusion is that
gettimeofday() reports full microsecond precision on any platform
anybody is likely to be running PG on today. Even my one surviving
pet dinosaur, NetBSD 10 on PowerPC Mac (mamba), shows results like
this:$ ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 901.41 ns
Histogram of timing durations:
< us % of total count
1 10.46074 348148
2 89.51495 2979181
4 0.00574 191
8 0.00430 143
16 0.00691 230
32 0.00376 125
64 0.00012 4
128 0.00303 101
256 0.00027 9
512 0.00009 3
1024 0.00009 3I also modified pg_test_timing to measure nanoseconds not
microseconds (second patch attached), and got this:$ ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 805.50 ns
Histogram of timing durations:
< ns % of total count
1 19.84234 739008
2 0.00000 0
4 0.00000 0
8 0.00000 0
16 0.00000 0
32 0.00000 0
64 0.00000 0
128 0.00000 0
256 0.00000 0
512 0.00000 0
1024 80.14013 2984739
2048 0.00078 29
4096 0.00658 245
8192 0.00290 108
16384 0.00252 94
32768 0.00250 93
65536 0.00016 6
131072 0.00185 69
262144 0.00008 3
524288 0.00008 3
1048576 0.00008 3confirming that when the result changes it generally does so by 1usec.
Applying just the second patch, I find that clock_gettime on this
old hardware seems to be limited to 1us resolution, but on my more
modern machines (mac M1, x86_64) it can tick at 40ns or less.
Even a raspberry pi 4 shows$ ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 69.12 ns
Histogram of timing durations:
< ns % of total count
1 0.00000 0
2 0.00000 0
4 0.00000 0
8 0.00000 0
16 0.00000 0
32 0.00000 0
64 37.59583 16317040
128 62.38568 27076131
256 0.01674 7265
512 0.00002 8
1024 0.00000 0
2048 0.00000 0
4096 0.00153 662
8192 0.00019 83
16384 0.00001 3
32768 0.00001 5suggesting that the clock_gettime resolution is better than 64 ns.
So I concur with Hannu that it's time to adjust pg_test_timing to
resolve nanoseconds not microseconds. I gather he's created a
patch that does more than mine below, so I'll wait for that.regards, tom lane
Another thing I changed in reporting was to report <= ns instead of < ns
This was inspired by not wanting to report "zero ns" as "< 1 ns" and
easiest was to change them all to <=
Show quoted text
On Thu, Jun 20, 2024 at 12:41 PM Hannu Krosing <hannuk@google.com> wrote:
(resending to list and other CC:s )
Hi Tom
This is my current patch which also adds running % and optionally uses
faster way to count leading zeros, though I did not see a change from
that.It also bucketizes first 128 ns to get better overview of exact behaviour.
We may want to put reporting this behind a flag
---
HannuOn Wed, Jun 19, 2024 at 6:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Eisentraut <peter@eisentraut.org> writes:
AFAICT, pg_test_timing doesn't use gettimeofday(), so this doesn't
really address the original question.It's not exactly hard to make it do so (see attached).
I tried this on several different machines, and my conclusion is that
gettimeofday() reports full microsecond precision on any platform
anybody is likely to be running PG on today. Even my one surviving
pet dinosaur, NetBSD 10 on PowerPC Mac (mamba), shows results like
this:$ ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 901.41 ns
Histogram of timing durations:
< us % of total count
1 10.46074 348148
2 89.51495 2979181
4 0.00574 191
8 0.00430 143
16 0.00691 230
32 0.00376 125
64 0.00012 4
128 0.00303 101
256 0.00027 9
512 0.00009 3
1024 0.00009 3I also modified pg_test_timing to measure nanoseconds not
microseconds (second patch attached), and got this:$ ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 805.50 ns
Histogram of timing durations:
< ns % of total count
1 19.84234 739008
2 0.00000 0
4 0.00000 0
8 0.00000 0
16 0.00000 0
32 0.00000 0
64 0.00000 0
128 0.00000 0
256 0.00000 0
512 0.00000 0
1024 80.14013 2984739
2048 0.00078 29
4096 0.00658 245
8192 0.00290 108
16384 0.00252 94
32768 0.00250 93
65536 0.00016 6
131072 0.00185 69
262144 0.00008 3
524288 0.00008 3
1048576 0.00008 3confirming that when the result changes it generally does so by 1usec.
Applying just the second patch, I find that clock_gettime on this
old hardware seems to be limited to 1us resolution, but on my more
modern machines (mac M1, x86_64) it can tick at 40ns or less.
Even a raspberry pi 4 shows$ ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 69.12 ns
Histogram of timing durations:
< ns % of total count
1 0.00000 0
2 0.00000 0
4 0.00000 0
8 0.00000 0
16 0.00000 0
32 0.00000 0
64 37.59583 16317040
128 62.38568 27076131
256 0.01674 7265
512 0.00002 8
1024 0.00000 0
2048 0.00000 0
4096 0.00153 662
8192 0.00019 83
16384 0.00001 3
32768 0.00001 5suggesting that the clock_gettime resolution is better than 64 ns.
So I concur with Hannu that it's time to adjust pg_test_timing to
resolve nanoseconds not microseconds. I gather he's created a
patch that does more than mine below, so I'll wait for that.regards, tom lane
Hannu Krosing <hannuk@google.com> writes:
This is my current patch which also adds running % and optionally uses
faster way to count leading zeros, though I did not see a change from
that.
I've not read the patch yet, but I did create a CF entry [1]https://commitfest.postgresql.org/48/5066/
to get some CI cycles on this. The cfbot complains [2]http://cfbot.cputube.org/highlights/all.html#5066 about
[19:24:31.951] pg_test_timing.c: In function ‘output’:
[19:24:31.951] pg_test_timing.c:229:11: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘int64’ {aka ‘long long int’} [-Werror=format=]
[19:24:31.951] 229 | printf("%*ld %*.4f %*.4f %*lld\n",
[19:24:31.951] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[19:24:31.951] 230 | Max(8, len1), i,
[19:24:31.951] | ~
[19:24:31.951] | |
[19:24:31.951] | int64 {aka long long int}
which seems a bit confused, but anyway you cannot assume that int64 is
a match for "%ld", or "%lld" either. What we generally do for this
elsewhere is to explicitly cast printf arguments to long long int.
Also there's this on Windows:
[19:23:48.231] ../src/bin/pg_test_timing/pg_test_timing.c(162): warning C4067: unexpected tokens following preprocessor directive - expected a newline
regards, tom lane
[1]: https://commitfest.postgresql.org/48/5066/
[2]: http://cfbot.cputube.org/highlights/all.html#5066
I wrote:
Hannu Krosing <hannuk@google.com> writes:
This is my current patch which also adds running % and optionally uses
faster way to count leading zeros, though I did not see a change from
that.
I've not read the patch yet, but I did create a CF entry [1]
to get some CI cycles on this. The cfbot complains [2] about
[ a couple of things ]
Here's a cleaned-up code patch addressing the cfbot complaints
and making the output logic a bit neater.
I think this is committable code-wise, but the documentation needs
work, if not indeed a complete rewrite. The examples are now
horribly out of date, and it seems that the "Clock Hardware and Timing
Accuracy" section is quite obsolete as well, since it suggests that
the best available accuracy is ~100ns.
TBH I'm inclined to rip most of the OS-specific and hardware-specific
information out of there, as it's not something we're likely to
maintain well even if we got it right for current reality.
regards, tom lane
Attachments:
v2-pg_test_timing-nanoseconds.patchtext/x-diff; charset=us-ascii; name=v2-pg_test_timing-nanoseconds.patchDownload+71-27
BTW, getting back to the original point of the thread: I duplicated
Hannu's result showing that on Apple M1 the clock tick seems to be
about 40ns. But look at what I got with the v2 patch on my main
workstation (full output attached):
$ ./pg_test_timing
...
Per loop time including overhead: 16.60 ns
...
Timing durations less than 128 ns:
ns % of total running % count
15 3.2738 3.2738 5914914
16 49.0772 52.3510 88668783
17 36.4662 88.8172 65884173
18 9.5639 98.3810 17279249
19 1.5746 99.9556 2844873
20 0.0416 99.9972 75125
21 0.0004 99.9976 757
...
It sure looks like this is exact-to-the-nanosecond results,
since the modal values match the overall per-loop timing,
and there are no zero measurements.
This is a Dell tower from 2021, running RHEL8 on an Intel Xeon W-2245.
Not exactly top-of-the-line stuff.
regards, tom lane
Attachments:
pg_test_timing.results.txttext/plain; charset=us-ascii; name=pg_test_timing.results.txtDownload
Hi Tom,
On various Intel CPUs I got either steps close to single nanosecond or
sometimes a little more on older ones
One specific CPU moved in in 2 tick increments while the ration to ns
was 2,1/1 or 2100 ticks per microsecond.
On Zen4 AMD the step seems to be 10 ns, even though the tick-to-ns
ratio is 2.6 / 1 , so reading ticks directly gives 26, 54, ...
Also, reading directly in ticks on M1 gave "loop time including
overhead: 2.13 ns" (attached code works on Clang, not sure about GCC)
I'll also take a look at the docs and try to propose something
Do we also need tests for this one ?
----
Hannu
Show quoted text
On Tue, Jul 2, 2024 at 7:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
BTW, getting back to the original point of the thread: I duplicated
Hannu's result showing that on Apple M1 the clock tick seems to be
about 40ns. But look at what I got with the v2 patch on my main
workstation (full output attached):$ ./pg_test_timing
...
Per loop time including overhead: 16.60 ns
...
Timing durations less than 128 ns:
ns % of total running % count
15 3.2738 3.2738 5914914
16 49.0772 52.3510 88668783
17 36.4662 88.8172 65884173
18 9.5639 98.3810 17279249
19 1.5746 99.9556 2844873
20 0.0416 99.9972 75125
21 0.0004 99.9976 757
...It sure looks like this is exact-to-the-nanosecond results,
since the modal values match the overall per-loop timing,
and there are no zero measurements.This is a Dell tower from 2021, running RHEL8 on an Intel Xeon W-2245.
Not exactly top-of-the-line stuff.regards, tom lane
Attachments:
pg_test_ticking.ctext/x-csrc; charset=US-ASCII; name=pg_test_ticking.cDownload
Also the step on M1 is slightly above 40ns (41.7ns) , but exactly 40
ns on Ampere Altra.
## M1 on MacBooc Air
Testing timing overhead for 3 seconds.
Total 24000177 ticks in 1000000056 ns, 24000175.655990 ticks per ns
This CPU is running at 24000175 ticks / second, will run test for 72000525 ticks
loop_count 1407639953Per loop time including overhead: 2.13 ns, min: 0
ticks (0.0 ns), same: 1335774969
Total ticks in: 72000525, in: 3000002260 nr
Log2(x+1) histogram of timing durations:
<= ticks ( <= ns) % of total running % count
0 ( 41.7) 94.8946 94.8946 1335774969
2 ( 83.3) 5.1051 99.9997 71861227
6 ( 166.7) 0.0001 99.9998 757
14 ( 333.3) 0.0000 99.9998 0
30 ( 666.7) 0.0002 99.9999 2193
62 ( 1333.3) 0.0000 100.0000 274
126 ( 2666.6) 0.0000 100.0000 446
254 ( 5333.3) 0.0000 100.0000 87
First 64 ticks --
0 ( 0.0) 94.8946 94.8946 1335774969
1 ( 41.7) 5.1032 99.9997 71834980
2 ( 83.3) 0.0019 99.9998 26247
3 ( 125.0) 0.0001 99.9998 757
15 ( 625.0) 0.0000 100.0000 1
## Ampere Altra
Testing timing overhead for 3 seconds.
Total 25000002 ticks in 1000000074 ns, 25000000.150000 ticks per ns
This CPU is running at 25000000 ticks / second, will run test for 75000000 ticks
loop_count 291630863Per loop time including overhead: 10.29 ns, min: 0
ticks (0.0 ns), same: 217288944
Total ticks in: 75000000, in: 3000000542 nr
Log2(x+1) histogram of timing durations:
<= ticks ( <= ns) % of total running % count
0 ( 40.0) 74.5082 74.5082 217288944
2 ( 80.0) 25.4886 99.9968 74332703
6 ( 160.0) 0.0000 99.9968 5
14 ( 320.0) 0.0000 99.9968 0
30 ( 640.0) 0.0000 99.9968 31
62 ( 1280.0) 0.0011 99.9979 3123
126 ( 2560.0) 0.0020 99.9999 5848
254 ( 5120.0) 0.0001 100.0000 149
510 ( 10240.0) 0.0000 100.0000 38
1022 ( 20480.0) 0.0000 100.0000 21
2046 ( 40960.0) 0.0000 100.0000 1
First 64 ticks --
0 ( 0.0) 74.5082 74.5082 217288944
1 ( 40.0) 25.4886 99.9968 74332699
2 ( 80.0) 0.0000 99.9968 4
3 ( 120.0) 0.0000 99.9968 1
4 ( 160.0) 0.0000 99.9968 3
Show quoted text
On Tue, Jul 2, 2024 at 7:31 PM Hannu Krosing <hannuk@google.com> wrote:
Hi Tom,
On various Intel CPUs I got either steps close to single nanosecond or
sometimes a little more on older onesOne specific CPU moved in in 2 tick increments while the ration to ns
was 2,1/1 or 2100 ticks per microsecond.On Zen4 AMD the step seems to be 10 ns, even though the tick-to-ns
ratio is 2.6 / 1 , so reading ticks directly gives 26, 54, ...Also, reading directly in ticks on M1 gave "loop time including
overhead: 2.13 ns" (attached code works on Clang, not sure about GCC)I'll also take a look at the docs and try to propose something
Do we also need tests for this one ?
----
HannuOn Tue, Jul 2, 2024 at 7:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
BTW, getting back to the original point of the thread: I duplicated
Hannu's result showing that on Apple M1 the clock tick seems to be
about 40ns. But look at what I got with the v2 patch on my main
workstation (full output attached):$ ./pg_test_timing
...
Per loop time including overhead: 16.60 ns
...
Timing durations less than 128 ns:
ns % of total running % count
15 3.2738 3.2738 5914914
16 49.0772 52.3510 88668783
17 36.4662 88.8172 65884173
18 9.5639 98.3810 17279249
19 1.5746 99.9556 2844873
20 0.0416 99.9972 75125
21 0.0004 99.9976 757
...It sure looks like this is exact-to-the-nanosecond results,
since the modal values match the overall per-loop timing,
and there are no zero measurements.This is a Dell tower from 2021, running RHEL8 on an Intel Xeon W-2245.
Not exactly top-of-the-line stuff.regards, tom lane
Hannu Krosing <hannuk@google.com> writes:
Also, reading directly in ticks on M1 gave "loop time including
overhead: 2.13 ns" (attached code works on Clang, not sure about GCC)
I don't think we should mess with that, given the portability
problems you mentioned upthread.
I'll also take a look at the docs and try to propose something
OK.
Do we also need tests for this one ?
Yeah, it was annoying me that we are eating the overhead of a TAP test
for pg_test_timing and yet it covers barely a third of the code [1]https://coverage.postgresql.org/src/bin/pg_test_timing/pg_test_timing.c.gcov.html.
We obviously can't expect any specific numbers out of a test, but I
was contemplating running "pg_test_timing -d 1" and just checking for
(a) zero exit code and (b) the expected header lines in the output.
regards, tom lane
[1]: https://coverage.postgresql.org/src/bin/pg_test_timing/pg_test_timing.c.gcov.html
On Tue, Jul 2, 2024 at 7:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Do we also need tests for this one ?
Yeah, it was annoying me that we are eating the overhead of a TAP test
for pg_test_timing and yet it covers barely a third of the code [1].
We obviously can't expect any specific numbers out of a test, but I
was contemplating running "pg_test_timing -d 1" and just checking for
(a) zero exit code and (b) the expected header lines in the output.
At least "does it run" tests should be there -
For example with the current toolchain on MacOS I was able to compile
__builtin_readcyclecounter(); but it crashed when the result was
executed.
The same code compiled *and run* fine on same laptop with Ubuntu 24.04
We might also want to have some testing about available speedups from
pg_bitmanip.h being used, but that could be tricky to test in an
universal way.
--
Hannu
Hannu Krosing <hannuk@google.com> writes:
At least "does it run" tests should be there -
For example with the current toolchain on MacOS I was able to compile
__builtin_readcyclecounter(); but it crashed when the result was
executed.
The same code compiled *and run* fine on same laptop with Ubuntu 24.04
We might also want to have some testing about available speedups from
pg_bitmanip.h being used, but that could be tricky to test in an
universal way.
Keep in mind that pg_test_timing is not just some random exercise in a
vacuum. The point of it IMV is to provide data about the performance
one can expect from the instr_time.h infrastructure, which bears on
what kind of resolution EXPLAIN ANALYZE and other features have. So
if we did want to depend on read_tsc() or __builtin_readcyclecounter()
or what-have-you, the way to go about it would be to change
instr_time.h to compile code that uses that. I would consider that
to be a separate patch from what we're doing to pg_test_timing here.
regards, tom lane