GNU/Hurd portability patches
Hi,
please find attached the current patches required to get master built
and the testsuites run on Debian's hurd-i386 port. I have not had the
time to test the hurd-amd64 port in the same fashion, but will do so
next.
As mentioned in this thread[1], one needs a fairly recent kernel for the
high-resolution timers in order to avoid regression test failures, and
the buildfarm client run does not like debug_parallel_query.
Michael
[1]: /messages/by-id/685a3ccd.170a0220.2d27e6.2232@mx.google.com
Attachments:
0001-Make-safeguard-against-incorrect-fd-flags-for-fsync-.patch (text/x-diff)
From 2cf70fa676bacf97c24fd0b80bab6fec06b9f2db Mon Sep 17 00:00:00 2001
From: Michael Banck <mbanck@debian.org>
Date: Tue, 10 Jun 2025 12:23:17 +0200
Subject: [PATCH 1/2] Make safeguard against incorrect fd flags for fsync more
portable.
The current code assumed that O_RDONLY is defined as 0, but this is not
universally the case. To fix this, mask the flags with O_ACCMODE and
assert that they are O_RDONLY for directories or not O_RDONLY for files.
Co-authored-by: Samuel Thibault
---
src/backend/storage/file/fd.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 0e8299dd556..cd1e661974a 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -411,14 +411,12 @@ pg_fsync(int fd)
{
int desc_flags = fcntl(fd, F_GETFL);
- /*
- * O_RDONLY is historically 0, so just make sure that for directories
- * no write flags are used.
- */
+ desc_flags &= O_ACCMODE;
+
if (S_ISDIR(st.st_mode))
- Assert((desc_flags & (O_RDWR | O_WRONLY)) == 0);
+ Assert(desc_flags == O_RDONLY);
else
- Assert((desc_flags & (O_RDWR | O_WRONLY)) != 0);
+ Assert(desc_flags != O_RDONLY);
}
errno = 0;
#endif
--
2.39.5
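For illustration, a minimal standalone sketch of the same idea (not the committed code; error handling is kept to a minimum): mask the fcntl() result with O_ACCMODE before comparing against O_RDONLY, instead of assuming that O_RDONLY is 0.

/* sketch: portable check of an fd's access mode before fsync */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

static void
report_access_mode(int fd)
{
    struct stat st;
    int         flags = fcntl(fd, F_GETFL);

    if (flags == -1 || fstat(fd, &st) != 0)
        return;

    flags &= O_ACCMODE;         /* keep only O_RDONLY/O_WRONLY/O_RDWR */

    if (S_ISDIR(st.st_mode))
        printf("directory fd %d is %s\n", fd,
               flags == O_RDONLY ? "read-only (expected)" : "writable (unexpected)");
    else
        printf("file fd %d is %s\n", fd,
               flags != O_RDONLY ? "writable (expected for fsync)" : "read-only (unexpected)");
}

int
main(void)
{
    report_access_mode(open(".", O_RDONLY));
    report_access_mode(open("/tmp/fsync_flags_demo", O_RDWR | O_CREAT, 0600));
    return 0;
}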
0002-Make-sure-IOV_MAX-is-defined.patch (text/x-diff)
From 1dec060f19bc3bea23ec3b2b97f93b0b60fdd63e Mon Sep 17 00:00:00 2001
From: Michael Banck <mbanck@debian.org>
Date: Tue, 10 Jun 2025 23:34:14 +0200
Subject: [PATCH 2/2] Make sure IOV_MAX is defined.
We stopped defining IOV_MAX on non-Windows systems since 75357ab9 under
the assumption that every non-Windows system defines it in limits.h
already. However, the GNU (Hurd) system does not define this limit.
As POSIX does not mandate that IOV_MAX be defined, define it as 16 if it is
undefined.
Co-authored-by: Christoph Berg
---
src/include/port/pg_iovec.h | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/src/include/port/pg_iovec.h b/src/include/port/pg_iovec.h
index df40c7208be..d29721bc7e4 100644
--- a/src/include/port/pg_iovec.h
+++ b/src/include/port/pg_iovec.h
@@ -21,9 +21,6 @@
#else
-/* POSIX requires at least 16 as a maximum iovcnt. */
-#define IOV_MAX 16
-
/* Define our own POSIX-compatible iovec struct. */
struct iovec
{
@@ -33,6 +30,11 @@ struct iovec
#endif
+/* POSIX requires at least 16 as a maximum iovcnt. */
+#ifndef IOV_MAX
+#define IOV_MAX 16
+#endif
+
/*
* Define a reasonable maximum that is safe to use on the stack in arrays of
* struct iovec and other small types. The operating system could limit us to
--
2.39.5
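For reference, the fallback pattern from the patch can be exercised standalone; a small sketch (the printed value is only for inspection, and 16 is the X/Open minimum mentioned in the commit message):

/* sketch: fall back to the X/Open minimum when <limits.h> omits IOV_MAX */
#include <limits.h>
#include <stdio.h>

#ifndef IOV_MAX
#define IOV_MAX 16
#endif

int
main(void)
{
    printf("IOV_MAX: %d\n", (int) IOV_MAX);
    return 0;
}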
Michael Banck <mbanck@gmx.net> writes:
please find attached the current patches required to get master built
and the testsuites run on Debian's hurd-i386 port. I have not had the
time to test the hurd-amd64 port in the same fashion, but will do so
next.
Pushed, after some fooling with the comments and commit messages.
Please go ahead and set up a Hurd buildfarm member, so that it
stays fixed.
regards, tom lane
Hi,
On Tue, Jul 01, 2025 at 12:41:50PM -0400, Tom Lane wrote:
Michael Banck <mbanck@gmx.net> writes:
please find attached the current patches required to get master built
and the testsuites run on Debian's hurd-i386 port. I have not had the
time to test the hurd-amd64 port in the same fashion, but will do so
next.
Pushed, after some fooling with the comments and commit messages.
Thanks! Also for back-patching them.
Regarding the comment,
| * If <limits.h> didn't define IOV_MAX, define our own. X/Open requires at
| * least 16. (GNU Hurd apparently feel that they're not bound by X/Open,
| * because they don't define this symbol at all.)
I personally don't care much about those missing limits on the Hurd, but
Thomas mentioned in
CA+hUKG+tqFVY7Fi=WBvZ6-UsATjcPNBDtphDm7YLjevm2kxSvw@mail.gmail.com (and
Samuel Thibault cited the same sentence to me now when I discussed the
commit with him) that POSIX said "A definition of one of the symbolic
constants in the following list shall be omitted from <limits.h> on
specific implementations where the corresponding value is equal to or
greater than the stated minimum, but is unspecified". So "requires at
least 16" might be a bit too strong here, AIUI.
Please go ahead and set up a Hurd buildfarm member, so that it
stays fixed.
Right, will look into this next.
Michael
Michael Banck <mbanck@gmx.net> writes:
On Tue, Jul 01, 2025 at 12:41:50PM -0400, Tom Lane wrote:
| * If <limits.h> didn't define IOV_MAX, define our own. X/Open requires at
| * least 16. (GNU Hurd apparently feel that they're not bound by X/Open,
| * because they don't define this symbol at all.)
I personally don't care much about those missing limits on the Hurd, but
Thomas mentioned in
CA+hUKG+tqFVY7Fi=WBvZ6-UsATjcPNBDtphDm7YLjevm2kxSvw@mail.gmail.com (and
Samuel Thibault cited the same sentence to me now when I discussed the
commit with him) that POSIX said "A definition of one of the symbolic
constants in the following list shall be omitted from <limits.h> on
specific implementations where the corresponding value is equal to or
greater than the stated minimum, but is unspecified". So "requires at
least 16" might be a bit too strong here, AIUI.
Oh, I missed that bit of the spec. I think "requires at least 16"
is correct anyway, but the parenthetical remark isn't really right.
Not sure if it's worth changing --- the end result is the same in
any case.
regards, tom lane
On Tue, Jul 01, 2025 at 10:01:37PM +0200, Michael Banck wrote:
On Tue, Jul 01, 2025 at 12:41:50PM -0400, Tom Lane wrote:
Pushed, after some fooling with the comments and commit messages.
Thanks! Also for back-patching them.
Catching up on this thread after-the-fact, specifically looking at
29213636e6cd as I did the original check tweaked here for O_RDONLY.
Agreed that a backpatch should be OK as done. The buildfarm looks OK
currently.
--
Michael
Hello hackers,
02.07.2025 02:53, Michael Paquier wrote:
Catching up on this thread after-the-fact, specifically looking at
29213636e6cd as I did the original check tweaked here for O_RDONLY.
Agreed that a backpatch should be OK as done. The buildfarm looks OK
currently.
Three months later we can see a number of failures produced by that
animal on several branches, e.g. [1]:
timed out after 3600 secs
---
# +++ regress install-check in src/test/modules/test_shm_mq +++
# using postmaster on /home/demo/build-farm-19.1/buildroot/tmp/buildfarm-GBEDDQ, port 5678
---- hang ----
I've spun up Debian Hurd locally, using [2], and reproduced this, just by
running: `make -s check -C src/test/modules/test_shm_mq/` in a loop.
It's not that easy to see a backtrace of the running processes on that OS,
but with some debug logging (attached), I can see that the test backend
process or shm_mq background worker just gets stuck shortly after poll().
One example:
echo "
log_min_messages = DEBUG5
log_line_prefix = '%m [%p:%l] %q%a '
log_connections = 'true'
log_disconnections = 'true'
log_statement = 'all'
autovacuum = off
" > /tmp/extra.config
for i in {1..100}; do echo "ITERATION $i"; TEMP_CONFIG=/tmp/extra.config \
timeout 60 make -s check -C src/test/modules/test_shm_mq/ || break; done; \
ps -A | grep postgres
...
ITERATION 9
ok 1 - test_shm_mq 14498 ms
ITERATION 10
ok 1 - test_shm_mq 34759 ms
ITERATION 11
# +++ regress check in src/test/modules/test_shm_mq +++
# initializing database system by copying initdb template
# using temp instance on port 58928 with PID 28848
make: *** [../../../../src/makefiles/pgxs.mk:451: check] Terminated
demo 28848 p0 So 0:00.03 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c
listen_addresses= -k /tmp/pg_regress-McfSYJ
demo 28849 - Ssfo 0:00.06 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c
listen_addresses= -k /tmp/pg_regress-McfSYJ
demo 28850 - Ssfo 0:00.00 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c
listen_addresses= -k /tmp/pg_regress-McfSYJ
demo 28851 - Ssfo 0:00.00 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c
listen_addresses= -k /tmp/pg_regress-McfSYJ
demo 28852 - Ssfo 0:00.00 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c
listen_addresses= -k /tmp/pg_regress-McfSYJ
demo 28853 - Ssfo 0:00.00 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c
listen_addresses= -k /tmp/pg_regress-McfSYJ
demo 28855 - Ssfo 0:00.00 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c
listen_addresses= -k /tmp/pg_regress-McfSYJ
demo 28856 - Ssfo 0:00.00 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c
listen_addresses= -k /tmp/pg_regress-McfSYJ
demo 28862 - Ssfo 0:01.13 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c
listen_addresses= -k /tmp/pg_regress-McfSYJ
demo 28866 - Ssfo 0:00.87 postgres -D /home/demo/postgresql/src/test/modules/test_shm_mq/tmp_check/data -F -c
listen_addresses= -k /tmp/pg_regress-McfSYJ
The process 28866 is stuck; it doesn't respond to SIGTERM.
src/test/modules/test_shm_mq/log/postmaster.log contains:
2025-09-21 05:48:00.749 BST [28862:23] pg_regress/test_shm_mq LOG: statement: SELECT test_shm_mq(100, (select
string_agg(chr(32+(random()*95)::int), '') from generate_series(1,(100+200*random())::int)), 10000, 1);
2025-09-21 05:48:00.750 BST [28848:59] DEBUG: postmaster received pmsignal signal
2025-09-21 05:48:00.750 BST [28848:60] DEBUG: registering background worker ""
2025-09-21 05:48:00.750 BST [28848:61] DEBUG: assigned pm child slot 240 for bgworker
2025-09-21 05:48:00.750 BST [28848:62] DEBUG: starting background worker process ""
2025-09-21 05:48:00.751 BST [28866:1] DEBUG: find_in_path: trying
"/home/demo/postgresql/tmp_install/usr/local/pgsql/lib/test_shm_mq"
2025-09-21 05:48:00.751 BST [28866:2] DEBUG: find_in_path: trying
"/home/demo/postgresql/tmp_install/usr/local/pgsql/lib/test_shm_mq.so"
...
!!!shm_mq_receive_bytes[28866]| before WaitLatch
!!!WaitEventSetWaitBlock POLL[28866]| rc: 1
^^^ the last message from that process ^^^
Other run:
2025-09-21 06:03:21.041 BST [29427:23] pg_regress/test_shm_mq LOG: statement: SELECT test_shm_mq(100, (select
string_agg(chr(32+(random()*95)::int), '') from generate_series(1,(100+200*random())::int)), 10000, 1);
2025-09-21 06:03:21.042 BST [29412:64] DEBUG: postmaster received pmsignal signal
2025-09-21 06:03:21.042 BST [29412:65] DEBUG: registering background worker ""
2025-09-21 06:03:21.042 BST [29412:66] DEBUG: assigned pm child slot 239 for bgworker
2025-09-21 06:03:21.042 BST [29412:67] DEBUG: starting background worker process ""
2025-09-21 06:03:21.043 BST [29431:1] DEBUG: find_in_path: trying
"/home/demo/postgresql/tmp_install/usr/local/pgsql/lib/test_shm_mq"
2025-09-21 06:03:21.043 BST [29431:2] DEBUG: find_in_path: trying
"/home/demo/postgresql/tmp_install/usr/local/pgsql/lib/test_shm_mq.so"
!!!WaitEventSetWaitBlock POLL[29431]| rc: 1
!!!WaitEventSetWaitBlock[29431] 1
!!!WaitEventSetWaitBlock[29431] 2
!!!WaitEventSetWaitBlock[29431] 3
!!!WaitEventSetWaitBlock[29431] 4
!!!WaitEventSetWaitBlock[29431] 5
^^^ the last message from the process 29431 ^^^
So it seems to me that Hurd is not mature enough yet to test Postgres.
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-19%2007%3A29%3A06
[2]: https://cdimage.debian.org/cdimage/ports/latest/hurd-amd64/debian-hurd-amd64-20250807.img.tar.xz
Best regards,
Alexander
Attachments:
waitventset-debugging.patch (text/x-patch)
diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index 2c79a649f46..bb8eff28581 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -1015,8 +1015,10 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
* at top of loop, because setting an already-set latch is much
* cheaper than setting one that has been reset.
*/
+fprintf(stderr, "!!!shm_mq_send_bytes[%d]| before WaitLatch\n", getpid());
(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
WAIT_EVENT_MESSAGE_QUEUE_SEND);
+fprintf(stderr, "!!!shm_mq_send_bytes[%d]| after WaitLatch\n", getpid());
/* Reset the latch so we don't spin. */
ResetLatch(MyLatch);
@@ -1161,8 +1163,10 @@ shm_mq_receive_bytes(shm_mq_handle *mqh, Size bytes_needed, bool nowait,
* loop, because setting an already-set latch is much cheaper than
* setting one that has been reset.
*/
+fprintf(stderr, "!!!shm_mq_receive_bytes[%d]| before WaitLatch\n", getpid());
(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
WAIT_EVENT_MESSAGE_QUEUE_RECEIVE);
+fprintf(stderr, "!!!shm_mq_receive_bytes[%d]| after WaitLatch\n", getpid());
/* Reset the latch so we don't spin. */
ResetLatch(MyLatch);
diff --git a/src/backend/storage/ipc/waiteventset.c b/src/backend/storage/ipc/waiteventset.c
index 7c0e66900f9..dcf7b2c111d 100644
--- a/src/backend/storage/ipc/waiteventset.c
+++ b/src/backend/storage/ipc/waiteventset.c
@@ -1497,6 +1497,18 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
/* timeout exceeded */
return -1;
}
+fprintf(stderr, "!!!WaitEventSetWaitBlock POLL[%d]| rc: %d\n", getpid(), rc);
+fprintf(stderr, "!!!WaitEventSetWaitBlock[%d] 1\n", getpid());
+fprintf(stderr, "!!!WaitEventSetWaitBlock[%d] 2\n", getpid());
+fprintf(stderr, "!!!WaitEventSetWaitBlock[%d] 3\n", getpid());
+fprintf(stderr, "!!!WaitEventSetWaitBlock[%d] 4\n", getpid());
+fprintf(stderr, "!!!WaitEventSetWaitBlock[%d] 5\n", getpid());
+fprintf(stderr, "!!!WaitEventSetWaitBlock[%d] 5\n", getpid());
+fprintf(stderr, "!!!WaitEventSetWaitBlock[%d] 6\n", getpid());
+fprintf(stderr, "!!!WaitEventSetWaitBlock[%d] 7\n", getpid());
+fprintf(stderr, "!!!WaitEventSetWaitBlock[%d] 8\n", getpid());
+fprintf(stderr, "!!!WaitEventSetWaitBlock[%d] 9\n", getpid());
+fprintf(stderr, "!!!WaitEventSetWaitBlock[%d] 10\n", getpid());
for (cur_event = set->events, cur_pollfd = set->pollfds;
cur_event < (set->events + set->nevents) &&
Hi,
On Sun, Sep 21, 2025 at 09:00:00AM +0300, Alexander Lakhin wrote:
02.07.2025 02:53, Michael Paquier wrote:
Catching up on this thread after-the-fact, specifically looking at
29213636e6cd as I did the original check tweaked here for O_RDONLY.
Agreed that a backpatch should be OK as done. The buildfarm looks OK
currently.
Three months later we can see a number of failures produced by that
animal on several branches, e.g. [1]:
timed out after 3600 secs
Right, I've noticed them as well of course, but did not have time to
take a closer look yet. This timeout in test_shm_mq happens on 32bit
hurd-i386 as well, btw.
It's not that easy to see a backtrace of the running processes on that OS,
but with some debug logging (attached), I can see that the test backend
process or shm_mq background worker just gets stuck shortly after poll().
Thanks for taking a deeper look.
So it seems to me that Hurd is not mature enough yet to test Postgres.
That is a bit harsh; this issue should be looked into, but I would not
say it is not mature enough to test Postgres.
Michael
On Sun, Sep 21, 2025 at 09:00:00AM +0300, Alexander Lakhin wrote:
So it seems to me that Hurd is not mature enough yet to test Postgres.
I'd say that it is likely not mature enough for production. In terms
of testing, that seems kind of OK. However, failures like the one you
are reporting here bring noise in the buildfarm, meaning that we would
perhaps tend to ignore reports that are in fact legit because we don't
really know what would be Hurd-related or Postgres-related. Or we
could get mixes of both.
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
On Sun, Sep 21, 2025 at 09:00:00AM +0300, Alexander Lakhin wrote:
So it seems to me that Hurd is not mature enough yet to test Postgres.
I'd say that it is likely not mature enough for production. In terms
of testing, that seems kind of OK. However, failures like the one you
are reporting here bring noise in the buildfarm, meaning that we would
perhaps tend to ignore reports that are in fact legit because we don't
really know what would be Hurd-related or Postgres-related. Or we
could get mixes of both.
Yeah, I think the tendency would be to write off any failures as
"Hurd teething pains" unless there are similar reports from other
animals. As long as the failure rate is pretty low, I'm okay
with that.
regards, tom lane
On Mon, Sep 22, 2025 at 07:02:25AM +0900, Michael Paquier wrote:
On Sun, Sep 21, 2025 at 09:00:00AM +0300, Alexander Lakhin wrote:
So it seems to me that Hurd is not mature enough yet to test Postgres.
I'd say that it is likely not mature enough for production. In terms
of testing, that seems kind of OK.
Ack.
However, failures like the one you are reporting here bring noise in
the buildfarm, meaning that we would perhaps tend to ignore reports
that are in fact legit because we don't really know what would be
Hurd-related or Postgres-related.
I will keep an eye on it.
There have been two (infrequent) failures in the isolation tests as
well, which I haven't had time to investigate further:
In PG15:
|test multiple-row-versions ... FAILED (test process exited with exit code 1) 145260 ms
This one is a segfault, it happened twice so far and only on REL_15_STABLE:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-18%2007%3A57%3A40
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-03%2007%3A36%3A45
|setup failed: server closed the connection unexpectedly
| This probably means the server terminated abnormally
| before or while processing the request.
|/hurd/crash: [...]/buildroot/REL_15_STABLE/inst/bin/postgres -D data-C(25142) crashed, signal {no:11, code:2, error:2}, exception {1, code:2, subcode:0}, PCs: {
| 0x1019b4d34 0x1028ee2a0 0x10249e1ac 0x1025da71f 0x102596611 0x10049fe63,
| 0x102497aec 0x1028e674e 0x1024ada52 0x1024aeb2a 0x1024a785b 0x10291f5b7 0x10291f
| 625 0x1024c7242 0x1024986a2 0x1024c72c0,
| 0x102497aec 0x102572ace
| }, killing task
In PG17:
|not ok 98 - stats 2100 ms
|diff -U3 buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/expected/stats_1.out buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/output_iso/results/stats.out
|--- buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/expected/stats_1.out 2025-09-15 22:06:24.000000000 +0100
|+++ buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/output_iso/results/stats.out 2025-09-15 22:23:05.000000000 +0100
|@@ -1445,7 +1445,7 @@
|
| name |pg_stat_get_function_calls|total_above_zero|self_above_zero
| --------------+--------------------------+----------------+---------------
|-test_stat_func| 1|t |t
|+test_stat_func| 1|f |f
| (1 row)
This one happened twice as well, and so far only on REL_17_STABLE:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-15%2021%3A06%3A17
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-13%2008%3A04%3A05
This might be due to the HPET timer not always being monotonic - there
has been an independent report via the Debian autobuilder and a GNU Mach
fix was committed last night, I'll check whether this can be
reproduced/confirmed-fixed with this change:
https://lists.gnu.org/archive/html/bug-hurd/2025-09/msg00020.html
https://salsa.debian.org/hurd-team/gnumach/-/commit/06079a8d212817ee0365f318bd90b67bf56bfb06
Michael
Hello Michael,
Thank you for paying attention to this!
Maybe I was wrong and we can at least categorize these failures -- I hope
their number is finite, but my point was that it's hardly possible to use
the information that fruitcrow gives us to improve Postgres.
22.09.2025 10:22, Michael Banck wrote:
On Mon, Sep 22, 2025 at 07:02:25AM +0900, Michael Paquier wrote:
However, failures like the one you are reporting here bring noise in
the buildfarm, meaning that we would perhaps tend to ignore reports
that are in fact legit because we don't really know what would be
Hurd-related or Postgres-related.
I will keep an eye on it.
There have been two (infrequent) failures in the isolation tests as
well, which I haven't had time to investigate further:
In PG17:
|not ok 98 - stats 2100 ms
|diff -U3 buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/expected/stats_1.out buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/output_iso/results/stats.out
|--- buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/expected/stats_1.out 2025-09-15 22:06:24.000000000 +0100
|+++ buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/output_iso/results/stats.out 2025-09-15 22:23:05.000000000 +0100
|@@ -1445,7 +1445,7 @@
|
| name |pg_stat_get_function_calls|total_above_zero|self_above_zero
| --------------+--------------------------+----------------+---------------
|-test_stat_func| 1|t |t
|+test_stat_func| 1|f |f
| (1 row)
This one happened twice as well, and so far only on REL_17_STABLE:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-15%2021%3A06%3A17
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-13%2008%3A04%3A05
This might be due to the HPET timer not always being monotonic - there
has been an independent report via the Debian autobuilder and a GNU Mach
fix was committed last night, I'll check whether this can be
reproduced/confirmed-fixed with this change:
https://lists.gnu.org/archive/html/bug-hurd/2025-09/msg00020.html
https://salsa.debian.org/hurd-team/gnumach/-/commit/06079a8d212817ee0365f318bd90b67bf56bfb06
I reproduced the issue locally and found that
/* total elapsed time in this function call */
INSTR_TIME_SET_CURRENT(total);
INSTR_TIME_SUBTRACT(total, fcu->start);
sometimes gives total.ticks = 0.
I tried the test program from [2] and got on my VM:
went backwards 0 out of 10000000 times
(three times)
But I've created my own test program (see attached), which shows:
for i in {1..1000}; do printf "ITERATION $i "; ./tt 100 || break; done
ITERATION 1 t1: 55873639081080, t2: 55873639084090, t2 - t1: 3010 (r: 4950)
ITERATION 2 t1: 55873641019440, t2: 55873641025700, t2 - t1: 6260 (r: 4950)
ITERATION 3 t1: 55873642794200, t2: 55873642797130, t2 - t1: 2930 (r: 4950)
...
ITERATION 23 t1: 55873675001590, t2: 55873675001590, t2 - t1: 0 (r: 4950)
I don't know how to test the patch committed, but if you can, that would
be nice.
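For illustration only (the real program is in the attachment), a minimal sketch in the same spirit could look like this, assuming CLOCK_MONOTONIC is available: take two consecutive clock readings with a trivial amount of work in between, and exit non-zero when the clock did not advance, which is what stops the shell loop above.

/* sketch of a clock-advance check; not the attached tt.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int
main(int argc, char **argv)
{
    int         n = (argc > 1) ? atoi(argv[1]) : 100;
    struct timespec ts1, ts2;
    long long   t1, t2, r = 0;

    clock_gettime(CLOCK_MONOTONIC, &ts1);
    for (int i = 0; i < n; i++)
        r += i;                 /* a tiny bit of work between the two readings */
    clock_gettime(CLOCK_MONOTONIC, &ts2);

    t1 = (long long) ts1.tv_sec * 1000000000LL + ts1.tv_nsec;
    t2 = (long long) ts2.tv_sec * 1000000000LL + ts2.tv_nsec;
    printf("t1: %lld, t2: %lld, t2 - t1: %lld (r: %lld)\n", t1, t2, t2 - t1, r);

    return (t2 > t1) ? 0 : 1;   /* non-zero exit when the clock did not advance */
}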
Best regards,
Alexander
Attachments:
On Mon, Sep 22, 2025 at 11:30:00PM +0300, Alexander Lakhin wrote:
I reproduced the issue locally and found that
/* total elapsed time in this function call */
INSTR_TIME_SET_CURRENT(total);
INSTR_TIME_SUBTRACT(total, fcu->start);
sometimes gives total.ticks = 0.
I tried the test program from [2] and got on my VM:
went backwards 0 out of 10000000 times
(three times)
But I've created my own test program (see attached), which shows:
for i in {1..1000}; do printf "ITERATION $i "; ./tt 100 || break; done
ITERATION 1 t1: 55873639081080, t2: 55873639084090, t2 - t1: 3010 (r: 4950)
ITERATION 2 t1: 55873641019440, t2: 55873641025700, t2 - t1: 6260 (r: 4950)
ITERATION 3 t1: 55873642794200, t2: 55873642797130, t2 - t1: 2930 (r: 4950)
...
ITERATION 23 t1: 55873675001590, t2: 55873675001590, t2 - t1: 0 (r: 4950)
I don't know how to test the patch committed, but if you can, that would
be nice.
We've had this exact same issue of a clock going backwards with one of
the netbsd animals on an older version not supported anymore by
upstream and that has been kicked out of the buildfarm, as far as I
recall. This has created some disturbance in the regression tests
causing EXPLAIN plan outputs we did not expect, in terms of extra
negative signs and the animal showing red periodically. So yes, this
random factor would be annoying in the buildfarm.
--
Michael
Hi,
On Mon, Sep 22, 2025 at 11:30:00PM +0300, Alexander Lakhin wrote:
Maybe I was wrong and we can at least categorize these failures -- I hope
their number is finite, but my point was that it's hardly possible to use
the information that fruitcrow gives us to improve Postgres.
Or, for that matter, to improve GNU Mach/Hurd...
22.09.2025 10:22, Michael Banck wrote:
There have been two (infrequent) failures in the isolation tests as
well, which I haven't had time to investigate further:
In PG17:
|not ok 98 - stats 2100 ms
|diff -U3 buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/expected/stats_1.out buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/output_iso/results/stats.out
|--- buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/expected/stats_1.out 2025-09-15 22:06:24.000000000 +0100
|+++ buildroot/REL_17_STABLE/pgsql.build/src/test/isolation/output_iso/results/stats.out 2025-09-15 22:23:05.000000000 +0100
|@@ -1445,7 +1445,7 @@
|
| name |pg_stat_get_function_calls|total_above_zero|self_above_zero
| --------------+--------------------------+----------------+---------------
|-test_stat_func| 1|t |t
|+test_stat_func| 1|f |f
| (1 row)
This one happened twice as well, and so far only on REL_17_STABLE:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-15%2021%3A06%3A17
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-13%2008%3A04%3A05
This might be due to the HPET timer not always being monotonic - there
has been an independent report via the Debian autobuilder and a GNU Mach
fix was committed last night, I'll check whether this can be
reproduced/confirmed-fixed with this change:
https://lists.gnu.org/archive/html/bug-hurd/2025-09/msg00020.html
https://salsa.debian.org/hurd-team/gnumach/-/commit/06079a8d212817ee0365f318bd90b67bf56bfb06
I reproduced the issue locally and found that
/* total elapsed time in this function call */
INSTR_TIME_SET_CURRENT(total);
INSTR_TIME_SUBTRACT(total, fcu->start);
sometimes gives total.ticks = 0.
I tried the test program from [2] and got on my VM:
went backwards 0 out of 10000000 times
(three times)
But I've created my own test program (see attached), which shows:
for i in {1..1000}; do printf "ITERATION $i "; ./tt 100 || break; done
ITERATION 1 t1: 55873639081080, t2: 55873639084090, t2 - t1: 3010 (r: 4950)
ITERATION 2 t1: 55873641019440, t2: 55873641025700, t2 - t1: 6260 (r: 4950)
ITERATION 3 t1: 55873642794200, t2: 55873642797130, t2 - t1: 2930 (r: 4950)
...
ITERATION 23 t1: 55873675001590, t2: 55873675001590, t2 - t1: 0 (r: 4950)
I don't know how to test the patch committed, but if you can, that would
be nice.
Thanks for the test. I ran the stats test with the GNU Mach patch and
while it seemed to help, it did error out eventually. However, your test
case is much better/faster and I also see 0 deltas after a few hundred
to a few thousand iterations.
I'll report that on their development list, looks like they have not
plugged all the holes yet..
Michael
Hi,
On Wed, Sep 24, 2025 at 08:31:27AM +0900, Michael Paquier wrote:
We've had this exact same issue of a clock going backwards with one of
the netbsd animals on an older version not supported anymore by
upstream and that has been kicked out of the buildfarm, as far as I
recall. This has created some disturbance in the regression tests
causing EXPLAIN plan outputs we did not expect, in terms of extra
negative signs and the animal showing red periodically.
This was the case initially on 32bit Hurd until I configured it to use
APIC (which is a requirement for HPET timers). So the clock is no longer
going backwards in an obvious way; and apart from the stats isolation
test, I have not seen issues in this area.
So yes, this random factor would be annoying in the buildfarm.
How much timer resolution do we require from the system? GNU Mach seems
to (at least try to) guarantee that the timer won't go backwards, but it
does not guarantee (currently) that two consecutive clock_gettime()
calls will return something different in all cases.
Michael
On Wed, Sep 24, 2025 at 09:41:19AM +0200, Michael Banck wrote:
On Wed, Sep 24, 2025 at 08:31:27AM +0900, Michael Paquier wrote:
So yes, this random factor would be annoying in the buildfarm.
How much timer resolution do we require from the system? GNU Mach seems
to (at least try to) guarantee that the timer won't go backwards, but it
does not guarantee (currently) that two consecutive clock_gettime()
calls will return something different in all cases.
This is the pg_test_timing output on my hurd-i386 VM with
pg_test_timing from HEAD:
Average loop time including overhead: 13866,64 ns
Histogram of timing durations:
<= ns % of total running % count
0 0,0510 0,0510 122
1 0,0000 0,0510 0
3 0,0000 0,0510 0
7 0,0000 0,0510 0
15 0,0000 0,0510 0
31 0,0000 0,0510 0
63 0,0000 0,0510 0
127 0,0000 0,0510 0
255 0,0000 0,0510 0
511 0,0000 0,0510 0
1023 0,0004 0,0514 1
2047 0,0000 0,0514 0
4095 98,9320 98,9834 236681
8191 0,8845 99,8679 2116
16383 0,0393 99,9072 94
32767 0,0343 99,9415 82
[...]
Observed timing durations up to 99,9900%:
ns % of total running % count
0 0,0510 0,0510 122
729 0,0004 0,0514 1
3519 0,0004 0,0518 1
3630 0,0130 0,0648 31
3640 0,1651 0,2299 395
3650 0,7449 0,9748 1782
3660 2,3395 3,3143 5597
Clearly those aren't very precise (running Debian 13 GNU/Linux on the
same host in the same qemu/kvm fashion, I get an average loop time
including overhead of around 30ns), but I assumed that the 122 0ns
entries would be the problem; however Hannu reported back in 2024 that
he saw something similar on his Macbook Air M1:
/messages/by-id/CAMT0RQSbzeJN+nPo_QXib-P62rgez=dJxoaTURcN1FYPoLpQPg@mail.gmail.com
|Per loop time including overhead: 21.54 ns
|Histogram of timing durations:
| <= ns % of total running % count
| 0 49.1655 49.1655 68481688
| 1 0.0000 49.1655 0
| 3 0.0000 49.1655 0
| 7 0.0000 49.1655 0
| 15 0.0000 49.1655 0
| 31 0.0000 49.1655 0
| 63 50.6890 99.8545 70603742
| 127 0.1432 99.9976 199411
| 255 0.0015 99.9991 2065
I wonder what is going on here; was that a fluke, or is that not related
to the stats isolation test failure after all? Anybody else tried the
updated pg_test_timing on Apple hardware and could possibly run the tt.c
test case from Alexander?
btw, the stats test failed in a similar way on hamerkop (Windows Server
2016) once, 35 days ago:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop&dt=2025-08-19%2013%3A56%3A17
Michael
Michael Banck <mbanck@gmx.net> writes:
How much timer resolution do we require from the system? GNU Mach seems
to (at least try to) guarantee that the timer won't go backwards, but it
does not guarantee (currently) that two consecutive clock_gettime()
calls will return something different in all cases.
I think it is reasonable to require the clock to not go backwards
during a test run, but it's not at all reasonable to require the
clock to advance by more than zero between two successive readings.
We used to encounter the no-advance case all the time, back when
machines had clock resolutions measured in milliseconds. It's
relatively rare now though, so it's possible that some test case
has crept in that expects that. But I'd call it a bug in the
test case if so.
It'd be interesting to see the output of a pg_test_timing run
from your Hurd machine.
regards, tom lane
Michael Banck <mbanck@gmx.net> writes:
This is the pg_test_timing output on my hurd-i386 VM with
pg_test_timing from HEAD:
Ask and ye shall receive ... sorry for not reading the whole thread.
I wonder what is going on here, was that a fluke or is that not related
to the stats isolation test failure after all? Anybody else tried the
updated pg_test_timing on Apple hardware and could possibly run the tt.c
test case from Alexander?
Yeah, I see zero-ns outputs on a couple different Apple M-series
machines. This is the output on the M4 Mini that runs the sifaka
and indri BF animals:
$ pg_test_timing
Testing timing overhead for 3 seconds.
Average loop time including overhead: 17.22 ns
Histogram of timing durations:
<= ns % of total running % count
0 58.8235 58.8235 102495613
1 0.0000 58.8235 0
3 0.0000 58.8235 0
7 0.0000 58.8235 0
15 0.0000 58.8235 0
31 0.0000 58.8235 0
63 41.1229 99.9464 71653499
127 0.0502 99.9966 87421
255 0.0026 99.9992 4522
511 0.0000 99.9992 56
1023 0.0001 99.9993 117
2047 0.0001 99.9994 164
4095 0.0003 99.9997 558
8191 0.0003 100.0000 501
16383 0.0000 100.0000 50
32767 0.0000 100.0000 17
65535 0.0000 100.0000 0
131071 0.0000 100.0000 1
Observed timing durations up to 99.9900%:
ns % of total running % count
0 58.8235 58.8235 102495613
41 13.7077 72.5313 23884717
42 27.4151 99.9464 47768782
83 0.0277 99.9741 48304
84 0.0140 99.9881 24425
125 0.0084 99.9966 14692
...
107083 0.0000 100.0000 1
Those animals are not showing failures, so we can't blame
"clock didn't advance" as a problem in itself. However,
the thing that jumps out at me from your results is that
the clock resolution seems to be only 3 to 4 us on Hurd:
Histogram of timing durations:
<= ns % of total running % count
0 0,0510 0,0510 122
1 0,0000 0,0510 0
3 0,0000 0,0510 0
7 0,0000 0,0510 0
15 0,0000 0,0510 0
31 0,0000 0,0510 0
63 0,0000 0,0510 0
127 0,0000 0,0510 0
255 0,0000 0,0510 0
511 0,0000 0,0510 0
1023 0,0004 0,0514 1
2047 0,0000 0,0514 0
4095 98,9320 98,9834 236681
8191 0,8845 99,8679 2116
It seems plausible that the execution time of the stats
test's function-under-test is so short that it sometimes
doesn't register as more than zero on a machine with poor
clock resolution. It looks like that test only calls the
test function once or twice before checking that it's
accumulated some runtime, and the test function is nothing
more than
CREATE FUNCTION test_stat_func() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN END;$$;
I'd call this a bug in that test TBH. It'd be saner to
make the function do something like pg_sleep for 1ms.
regards, tom lane
Hello Michael and Tom,
Thank you for spending time on this!
24.09.2025 13:45, Michael Banck wrote:
btw, the stats test failed in a similar way on hamerkop (Windows Server
2016) once, 35 days ago:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop&dt=2025-08-19%2013%3A56%3A17
Yes, and all such failures are counted at [1].
And we had discussed that (Windows-specific) anomaly too [2]. Probably
something has changed in that environment, so that we see no such failures
for the last month.
How much timer resolution do we require from the system? GNU Mach seems
to (at least try to) guarantee that the timer won't go backwards, but it
does not guarantee (currently) that two consecutive clock_gettime()
calls will return something different in all cases.
Regarding the lowest timer resolution, as I mentioned at [3], 32k_counter
gives only 0.030517 sec...
[1]: https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#subscription.sql_sporadically_fails_on_hamerkop_due_to_zero_time_difference
[2]: /messages/by-id/CANhcyEX4hH9POyTM3vh=58newEF0=qgK46xF5i-RDir2zAZ4og@mail.gmail.com
[3]: /messages/by-id/af1d252c-738f-46ab-99c6-a00e0d65aa04@gmail.com
Best regards,
Alexander
Alexander Lakhin <exclusion@gmail.com> writes:
24.09.2025 13:45, Michael Banck wrote:
How much timer resolution do we require from the system? GNU Mach seems
to (at least try to) guarantee that the timer won't go backwards, but it
does not guarantee (currently) that two consecutive clock_gettime()
calls will return something different in all cases.
Regarding the lowest timer resolution, as I mentioned at [3], 32k_counter
gives only 0.030517 sec...
We are currently doing a short pg_test_timing run in every BF run,
but with only a cursory regex-based sanity check on the output.
Since it's a TAP test, we could easily report the full output in
the TAP log without causing problems. I was already thinking about
doing that, and if there's some question about the minimum expected
timer resolution then it's really silly to not be capturing that
data.
I will go do that, and in a few days' time we should have enough
reports to see what we can realistically expect.
regards, tom lane
Hi,
On Wed, Sep 24, 2025 at 10:28:46AM -0400, Tom Lane wrote:
It seems plausible that the execution time of the stats
test's function-under-test is so short that it sometimes
doesn't register as more than zero on a machine with poor
clock resolution. It looks like that test only calls the
test function once or twice before checking that it's
accumulated some runtime, and the test function is nothing
more than
CREATE FUNCTION test_stat_func() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN END;$$;
I'd call this a bug in that test TBH. It'd be saner to
make the function do something like pg_sleep for 1ms.
I did that in the attached, so far my Hurd VM ran the stats test more
than 1000 times without a failure with it. I have the loop running till
10000, I'll report back tomorrow.
Michael
Attachments:
0001-Add-minimal-sleep-to-stats-isolation-test-functions.patch (text/x-diff)
From d6a6995ea53695319dde1488e8f7f4b1bf8c21a8 Mon Sep 17 00:00:00 2001
From: Michael Banck <michael.banck@credativ.de>
Date: Wed, 24 Sep 2025 17:43:20 +0200
Subject: [PATCH] Add minimal sleep to stats isolation test functions.
The functions test_stat_func() and test_stat_func2() had no function bodies.
This made it possible that on machines with relatively low timer resolution
the function would return before the clock advanced, making the test fail.
To avoid that, pg_sleep for 1ms during the function.
Per buildfarm members fruitcrow and hamerkop.
---
src/test/isolation/specs/stats.spec | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/test/isolation/specs/stats.spec b/src/test/isolation/specs/stats.spec
index 1d98ac785b8..5b765da7eff 100644
--- a/src/test/isolation/specs/stats.spec
+++ b/src/test/isolation/specs/stats.spec
@@ -6,10 +6,10 @@ setup
INSERT INTO test_stat_tab(key, value) VALUES('k0', 1);
INSERT INTO test_stat_oid(name, oid) VALUES('test_stat_tab', 'test_stat_tab'::regclass);
- CREATE FUNCTION test_stat_func() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN END;$$;
+ CREATE FUNCTION test_stat_func() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN PERFORM pg_sleep(0.001); END;$$;
INSERT INTO test_stat_oid(name, oid) VALUES('test_stat_func', 'test_stat_func'::regproc);
- CREATE FUNCTION test_stat_func2() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN END;$$;
+ CREATE FUNCTION test_stat_func2() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN PERFORM pg_sleep(0.001); END;$$;
INSERT INTO test_stat_oid(name, oid) VALUES('test_stat_func2', 'test_stat_func2'::regproc);
CREATE TABLE test_slru_stats(slru TEXT, stat TEXT, value INT);
--
2.39.5
24.09.2025 18:52, Michael Banck wrote:
On Wed, Sep 24, 2025 at 10:28:46AM -0400, Tom Lane wrote:
It seems plausible that the execution time of the stats
test's function-under-test is so short that it sometimes
doesn't register as more than zero on a machine with poor
clock resolution. It looks like that test only calls the
test function once or twice before checking that it's
accumulated some runtime, and the test function is nothing
more than
CREATE FUNCTION test_stat_func() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN END;$$;
I'd call this a bug in that test TBH. It'd be saner to
make the function do something like pg_sleep for 1ms.
I did that in the attached, so far my Hurd VM ran the stats test more
than 1000 times without a failure with it. I have the loop running till
10000, I'll report back tomorrow.
If the stats test could be fixed this way, I wonder how to deal with
regress/subscription.sql. When running:
TESTS="$(printf "subscription %.0s" `seq 1000`)" make -s check-tests
on the same Hurd VM, I'm observing:
...
ok 986 - subscription 53 ms
not ok 987 - subscription 53 ms
ok 988 - subscription 53 ms
...
# 4 of 1000 tests failed.
# The differences that caused some tests to fail can be viewed in the file
"/home/demo/postgresql/src/test/regress/regression.diffs".
$ cat "/home/demo/postgresql/src/test/regress/regression.diffs"
--- /home/demo/postgresql/src/test/regress/expected/subscription.out 2025-09-24 19:49:53.000000000 +0100
+++ /home/demo/postgresql/src/test/regress/results/subscription.out 2025-09-24 20:06:48.000000000 +0100
@@ -70,7 +70,7 @@
SELECT :'prev_stats_reset' < stats_reset FROM pg_stat_subscription_stats WHERE subname = 'regress_testsub';
?column?
----------
- t
+ f
(1 row)
-- fail - name already exists
diff -U3 /home/demo/postgresql/src/test/regress/expected/subscription.out
/home/demo/postgresql/src/test/regress/results/subscription.out
--- /home/demo/postgresql/src/test/regress/expected/subscription.out 2025-09-24 19:49:53.000000000 +0100
+++ /home/demo/postgresql/src/test/regress/results/subscription.out 2025-09-24 20:07:13.000000000 +0100
@@ -70,7 +70,7 @@
SELECT :'prev_stats_reset' < stats_reset FROM pg_stat_subscription_stats WHERE subname = 'regress_testsub';
?column?
----------
- t
+ f
(1 row)
-- fail - name already exists
diff -U3 /home/demo/postgresql/src/test/regress/expected/subscription.out
/home/demo/postgresql/src/test/regress/results/subscription.out
--- /home/demo/postgresql/src/test/regress/expected/subscription.out 2025-09-24 19:49:53.000000000 +0100
+++ /home/demo/postgresql/src/test/regress/results/subscription.out 2025-09-24 20:07:28.000000000 +0100
@@ -70,7 +70,7 @@
SELECT :'prev_stats_reset' < stats_reset FROM pg_stat_subscription_stats WHERE subname = 'regress_testsub';
?column?
----------
- t
+ f
(1 row)
-- fail - name already exists
diff -U3 /home/demo/postgresql/src/test/regress/expected/subscription.out
/home/demo/postgresql/src/test/regress/results/subscription.out
--- /home/demo/postgresql/src/test/regress/expected/subscription.out 2025-09-24 19:49:53.000000000 +0100
+++ /home/demo/postgresql/src/test/regress/results/subscription.out 2025-09-24 20:07:33.000000000 +0100
@@ -70,7 +70,7 @@
SELECT :'prev_stats_reset' < stats_reset FROM pg_stat_subscription_stats WHERE subname = 'regress_testsub';
?column?
----------
- t
+ f
(1 row)
-- fail - name already exists
Best regards,
Alexander
Hi,
On Wed, Sep 24, 2025 at 11:00:00PM +0300, Alexander Lakhin wrote:
24.09.2025 18:52, Michael Banck wrote:
I did that in the attached, so far my Hurd VM ran the stats test more
than 1000 times without a failure with it. I have the loop running till
10000, I'll report back tomorrow.
(it ran for 7000 iterations without fault so far)
If the stats test could be fixed this way, I wonder how to deal with
regress/subscription.sql. When running:
TESTS="$(printf "subscription %.0s" `seq 1000`)" make -s check-tests
on the same Hurd VM, I'm observing:
...
ok 986 - subscription 53 ms
not ok 987 - subscription 53 ms
ok 988 - subscription 53 ms
...
# 4 of 1000 tests failed.
I ran that five times now without a problem, both with and without the
Mach patch I mentioned earlier, and on 32 and 64 bit. Not sure what is
going on here.
# The differences that caused some tests to fail can be viewed in the file
"/home/demo/postgresql/src/test/regress/regression.diffs".$ cat "/home/demo/postgresql/src/test/regress/regression.diffs"
--- /home/demo/postgresql/src/test/regress/expected/subscription.out 2025-09-24 19:49:53.000000000 +0100 +++ /home/demo/postgresql/src/test/regress/results/subscription.out 2025-09-24 20:06:48.000000000 +0100 @@ -70,7 +70,7 @@ �SELECT :'prev_stats_reset' < stats_reset FROM pg_stat_subscription_stats WHERE subname = 'regress_testsub'; � ?column? �---------- - t + f �(1 row)
I saw those issues frequently on the initial 32bit Hurd VM I started to
run the buildfarm code on, before I switched it to HPET timers. Since
then, I don't think I saw that particular error again, but 4 out of 1000 is
not a lot of course.
Michael
Hello Michael,
25.09.2025 00:22, Michael Banck wrote:
I ran that five times now without a problem, both with and without the
Mach patch I mentioned earlier, and on 32 and 64 bit. Not sure what is
going on here.
Maybe you're running it against REL_15_STABLE, where this test case is
absent... (I tested that on REL_18_STABLE.) I don't know what can prevent
the test case from failing if the underlying defect is still here.
I saw those issues frequently on the initial 32bit Hurd VM I started to
run the buildfarm code on, before I switched it to HPET timers. Since
then, I don't think I saw that particular error again, but 4 out of 1000 is
not a lot of course.
There is also contrib/pg_stat_statements/entry_timestamp, which fails for
me when running in a loop:
for i in `seq 100`; do echo "ITERATION $i"; NO_TEMP_INSTALL=1 make -s check -C contrib/pg_stat_statements || break; done
on iterations 42, 60, 12, 5, 28:
ITERATION 28
...
ok 8 - wal 14 ms
not ok 9 - entry_timestamp 14 ms
ok 10 - privileges 16 ms
...
1..15
# 1 of 15 tests failed.
# The differences that caused some tests to fail can be viewed in the file
"/tst/postgresql/contrib/pg_stat_statements/regression.diffs".
$ cat contrib/pg_stat_statements/regression.diffs
diff -U3 /tst/postgresql/contrib/pg_stat_statements/expected/entry_timestamp.out
/tst/postgresql/contrib/pg_stat_statements/results/entry_timestamp.out
--- /tst/postgresql/contrib/pg_stat_statements/expected/entry_timestamp.out 2025-09-25 04:26:23.000000000 +0100
+++ /tst/postgresql/contrib/pg_stat_statements/results/entry_timestamp.out 2025-09-25 04:50:43.000000000 +0100
@@ -147,7 +147,7 @@
WHERE query LIKE '%STMTTS%';
total | minmax_exec_zero | minmax_ts_after_ref | stats_since_after_ref
-------+------------------+---------------------+-----------------------
- 2 | 1 | 2 | 0
+ 2 | 2 | 2 | 0
(1 row)
-- Cleanup
Best regards,
Alexander
Hi,
On Thu, Sep 25, 2025 at 08:00:00AM +0300, Alexander Lakhin wrote:
25.09.2025 00:22, Michael Banck wrote:
I ran that five times now without a problem, both with and without the
Mach patch I mentioned earlier, and on 32 and 64 bit. Not sure what is
going on here.Maybe you're running it against REL_15_STABLE, where this test case is
absent... (I tested that on REL_18_STABLE.) I don't know what can prevent
the test case from failing if the underlying defect is still here.
No, I ran it on HEAD, and REL_18_STABLE looks similar.
There is also contrib/pg_stat_statements/entry_timestamp, which fails for
me when running in a loop:
for i in `seq 100`; do echo "ITERATION $i"; NO_TEMP_INSTALL=1 make -s check -C contrib/pg_stat_statements || break; done
The two test cases above failed frequently for me before I switched to
HPET timers, but on amd64 (which you are running as I understand), those
are activated by default, so this should not be a problem.
What does pg_test_timing from HEAD report as "Average loop time
including overhead" for your VM?
Michael
Hi,
On Wed, Sep 24, 2025 at 05:52:02PM +0200, Michael Banck wrote:
I did that in the attached, so far my Hurd VM ran the stats test more
than 1000 times without a failure with it. I have the loop running till
10000, I'll report back tomorrow.
For the record, the stats test ran 10000 times without a failure with
that patch applied.
Michael
Michael Banck <mbanck@gmx.net> writes:
On Wed, Sep 24, 2025 at 05:52:02PM +0200, Michael Banck wrote:
I did that in the attached, so far my Hurd VM ran the stats test more
than 1000 times without a failure with it. I have the loop running till
10000, I'll report back tomorrow.
For the record, the stats test ran 10000 times without a failure with
that patch applied.
Okay. Elsewhere on the playing field, 32 buildfarm animals have now
reported in with full pg_test_timing output. (I'd thought we might
get more, but apparently there's still only a minority of animals
configured with --enable-tap-tests.) Here's the URLs for those runs,
along with my notes about what the results look like:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=alligator&dt=2025-09-25%2014%3A02%3A34
Ubuntu/x86_64: timing seems to be ns-precise
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=basilisk&dt=2025-09-25%2011%3A01%3A29
Alpine Linux/x86_64: looks to have 10ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=batta&dt=2025-09-25%2014%3A05%3A01
Debian/aarch64: looks to have 10ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=billbug&dt=2025-09-25%2003%3A00%3A03
Solaris/sparc: looks to have 15ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bumblebee&dt=2025-09-25%2015%3A00%3A02
CentOS/x86_64: looks to have 10ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bushmaster&dt=2025-09-25%2013%3A40%3A21
Debian/x86_64: looks to have 10ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=canebrake&dt=2025-09-24%2019%3A14%3A48
Debian/x86_64: looks to have 10ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=copperhead&dt=2025-09-24%2018%3A49%3A47
Debian/riscv64: looks to have 1000ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dikkop&dt=2025-09-24%2018%3A05%3A01
FreeBSD/arm64: possibly 20ns ticks, or very noisy 10ns
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gokiburi&dt=2025-09-25%2000%3A05%3A05
Debian/aarch64: looks to have 10ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2025-09-24%2022%3A39%3A25
Raspbian/armv7: looks to have 50ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi&dt=2025-09-24%2021%3A05%3A05
Debian/aarch64: looks to have 10ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hippopotamus&dt=2025-09-25%2013%3A35%3A48
openSUSE/x86_64: timing seems to be ns-precise
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=indri&dt=2025-09-25%2014%3A34%3A22
macOS/arm64: looks to have 40ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jay&dt=2025-09-24%2016%3A47%3A05
openSUSE/x86_64: timing seems to be ns-precise
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=loach&dt=2025-09-25%2014%3A53%3A05
FreeBSD/x86_64: odd behavior, but I bet it's really 10ns resolution
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2025-09-25%2013%3A35%3A50
macOS/x86_64: timing seems to be ns-precise
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2025-09-25%2003%3A39%3A01
NetBSD/macppc: looks to have 50ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=margay&dt=2025-09-25%2004%3A00%3A02
Solaris/sparc: looks to have 15ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=morepork&dt=2025-09-24%2017%3A04%3A38
OpenBSD/x86_64: horrible loop time (> 4us), but perhaps 15ns resolution underneath?
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mule&dt=2025-09-24%2019%3A30%3A01
Debian/x86_64: timing seems to be ns-precise
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pollock&dt=2025-09-25%2005%3A35%3A13
illumos/x86_64: timing seems to be ns-precise
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2025-09-25%2012%3A43%3A07
Amazon Linux/x86_64: timing seems to be ns-precise
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=schnauzer&dt=2025-09-24%2016%3A49%3A23
OpenBSD/x86_64: horrible loop time (> 4us), but perhaps 15ns resolution underneath?
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2025-09-24%2018%3A35%3A01
NetBSD/x86_64: horrible loop time (> 4us), but perhaps 15ns resolution underneath?
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sifaka&dt=2025-09-25%2013%3A56%3A16
macOS/arm64: looks to have 40ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skimmer&dt=2025-09-25%2014%3A20%3A54
CentOS/x86_64: looks to have 10ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snakefly&dt=2025-09-25%2013%3A32%3A08
Amazon Linux/aarch64: looks to have 10ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=taipan&dt=2025-09-25%2013%3A51%3A19
Debian/x86_64: looks to have 10ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=turaco&dt=2025-09-24%2018%3A15%3A03
Debian/armv7: 500ns loop time, but looks to have 15ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=urutu&dt=2025-09-25%2012%3A55%3A44
Debian/x86_64: looks to have 10ns ticks
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=widowbird&dt=2025-09-24%2016%3A54%3A03
Debian/aarch64: looks to have 20ns ticks?
So your Hurd setup seems to be in the same camp as some of the BSDen:
very slow to read the clock, but the underlying hardware resolution
is not bad, perhaps 10ns. Given that, I don't quite understand how
it's sometimes able to report the same reading twice, but nonetheless
that's what it's doing.
In any case, given these results, it's really hard to credit that
there are any platforms out there today that would fail to
distinguish clock readings more than a couple microseconds apart.
(This squares with conclusions we arrived at previously when
messing about with our clock code.) So I'm inclined to set the
delay in stats.spec to 10us not 1ms, just to not slow the test
more than we have to. Now, pg_sleep relies on WaitLatch which has a
sleep resolution of 1ms, so you might think there's little point in
asking for less. But experimentation shows that if you ask for 1ms
you actually get a 2ms delay --- something odd about the float
roundoff behavior in pg_sleep, seems like. I think we can set the
delay to less so that doesn't happen, and to be prepared in case
someday the sleep resolution gets better.
In short, your patch looks good and I'll go apply it with a slightly
smaller delay parameter.
regards, tom lane
Hi,
On Thu, Sep 25, 2025 at 12:52:33PM -0400, Tom Lane wrote:
In short, your patch looks good and I'll go apply it with a slightly
smaller delay parameter.
Great, thanks! I let my VM run for a few thousand iterations with that
timeout and so far there were no failures.
Michael
[Using this as a general GNU/Hurd problem thread]
An interesting fruitcrow failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-30%2007%3A28%3A50
TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File:
"pqsignal.c", Line: 91, PID: 25731
postgres(ExceptionalCondition+0x5a) [0x1006b1d0a]
postgres(+0x711cf2) [0x100711cf2]
/lib/x86_64-gnu/libc.so.0.3(+0x39fee) [0x102bdffee]
/lib/x86_64-gnu/libc.so.0.3(+0x39fdd) [0x102bdffdd]
2025-09-30 08:38:59.451 BST [24668:6] LOG: client backend (PID 25731)
was terminated by signal 6: Aborted
Our definition of NSIG is:
#ifdef PG_SIGNAL_COUNT /* Windows */
#define PG_NSIG (PG_SIGNAL_COUNT)
#elif defined(NSIG)
#define PG_NSIG (NSIG)
#else
#define PG_NSIG (64) /* XXX: wild guess */
#endif
Is NSIG defined? Where on the internet can we see the SIGXXX signal
numbers and the glibc source that is actually used on these systems?
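(A small probe compiled on the Hurd box itself would answer the first question; just a sketch, assuming glibc's <signal.h>:)

/* sketch: print NSIG and a few signal numbers on this platform */
#include <signal.h>
#include <stdio.h>

int
main(void)
{
#ifdef NSIG
    printf("NSIG = %d\n", (int) NSIG);
#else
    printf("NSIG is not defined\n");
#endif
    printf("SIGHUP=%d SIGINT=%d SIGABRT=%d SIGALRM=%d SIGTERM=%d\n",
           SIGHUP, SIGINT, SIGABRT, SIGALRM, SIGTERM);
    printf("SIGCHLD=%d SIGUSR1=%d SIGUSR2=%d\n",
           SIGCHLD, SIGUSR1, SIGUSR2);
    return 0;
}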
This has to be handling something installed by pqsignal(), so I guess
it's probably not the synchronous SIGABRT from abort() expected in
ExceptionCondition() (assuming that abort() is implemented as
raise(SIGABRT) in the traditional way, which might not be true), so
then I guess it must be an asynchronous signal, but which one?
Searching for that error in our archives brought up another platform
that saw the same assertion fail[1]. There it smelled a bit like an
uninitialised value somehow finishing up in there, maybe related to
valgrind, but I have no idea whether or how that relates to this
failure.
The main thing I learned while failing to find the values for those
symbols for myself was that it implements asynchronous signals in an
unorthodox way akin to Windows' SIGINT mechanism:
"The UNIX signalling mechanism is implemented for the GNU Hurd by
means of a separate signal thread that is part of every user-space
process. This makes handling of signals a separate thread of control.
GNU Mach itself has no idea what a signal is and kill is not a system
call (as it typically is in a UNIX system): it's implemented in
glibc." - glibc docs[2]https://www.gnu.org/software/hurd/glibc/signal.html
I haven't investigated the details or implications, but huh, I wonder
what that can break in our code... We're working on booting
asynchronous signals out of the code for various reasons so this might
already or at least soon be a non-issue, but still.
I've so far resisted the urge to spin up a Debian GNU/Hurd box to
figure any of that out for myself, but maybe someone has a clue...
[1]: /messages/by-id/Z8z6EaT89FL7UUBU@nathan
[2]: https://www.gnu.org/software/hurd/glibc/signal.html
Hello Thomas,
Thank you for your attention to that anomaly!
10.10.2025 05:59, Thomas Munro wrote:
[Using this as a general GNU/Hurd problem thread]
An interesting fruitcrow failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-30%2007%3A28%3A50
TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File:
"pqsignal.c", Line: 91, PID: 25731
I've added the following logging:
static void
wrapper_handler(SIGNAL_ARGS)
{
...
Assert(postgres_signal_arg > 0);
+fprintf(stderr, "!!!wrapper_handler[%d]| postgres_signal_arg: %d, PG_NSIG: %d\n", getpid(), postgres_signal_arg, PG_NSIG);
Assert(postgres_signal_arg < PG_NSIG);
and got the following during a successful `make check` run:
2025-10-11 10:55:13.091 BST postmaster[1909] LOG: starting PostgreSQL 19devel on x86_64-unknown-gnu0.9, compiled by gcc
(Debian 14.2.0-12) 14.2.0, 64-bit
2025-10-11 10:55:13.092 BST postmaster[1909] LOG: listening on Unix socket "/tmp/pg_regress-Tg7wMt/.s.PGSQL.58928"
2025-10-11 10:55:13.096 BST startup[1915] LOG: database system was shut down at 2025-10-11 10:55:10 BST
!!!wrapper_handler[1909]| postgres_signal_arg: 20, PG_NSIG: 33
2025-10-11 10:55:13.117 BST postmaster[1909] LOG: database system is ready to accept connections
!!!wrapper_handler[1910]| postgres_signal_arg: 16, PG_NSIG: 33
!!!wrapper_handler[1918]| postgres_signal_arg: 16, PG_NSIG: 33
...
$ grep -E -o '!!!.*postgres_signal_arg: [0-9]+' .../postmaster.log | grep -E -o '[0-9]+$' | sort | uniq
14
15
16
2
20
30
31
$ kill -l
1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL 5) SIGTRAP
6) SIGABRT 7) SIGEMT 8) SIGFPE 9) SIGKILL 10) SIGBUS
11) SIGSEGV 12) SIGSYS 13) SIGPIPE 14) SIGALRM 15) SIGTERM
16) SIGURG 17) SIGSTOP 18) SIGTSTP 19) SIGCONT 20) SIGCHLD
21) SIGTTIN 22) SIGTTOU 23) SIGIO 24) SIGXCPU 25) SIGXFSZ
26) SIGVTALRM 27) SIGPROF 28) SIGWINCH 29) SIGINFO 30) SIGUSR1
31) SIGUSR2 32) SIGLOST
Whilst from a failed run, I got:
...
!!!wrapper_handler[1988]| postgres_signal_arg: 30, PG_NSIG: 33
!!!wrapper_handler[1989]| postgres_signal_arg: 30, PG_NSIG: 33
!!!wrapper_handler[3284]| postgres_signal_arg: 14, PG_NSIG: 33
!!!wrapper_handler[3284]| postgres_signal_arg: 28476608, PG_NSIG: 33
TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 94, PID: 3284
!!!wrapper_handler[1980]| postgres_signal_arg: 30, PG_NSIG: 33
!!!wrapper_handler[3278]| postgres_signal_arg: 30, PG_NSIG: 33
!!!wrapper_handler[1980]| postgres_signal_arg: 30, PG_NSIG: 33
!!!wrapper_handler[3278]| postgres_signal_arg: 30, PG_NSIG: 33
postgres(ExceptionalCondition+0x5a) [0x1006af78a]
postgres(+0x70f59a) [0x10070f59a]
/lib/x86_64-gnu/libc.so.0.3(+0x39fee) [0x102b89fee]
/lib/x86_64-gnu/libc.so.0.3(+0x39fdd) [0x102b89fdd]
...
2025-10-11 12:41:53.905 BST postmaster[1980] LOG: client backend (PID 3284) was terminated by signal 6: Aborted
2025-10-11 12:41:53.905 BST postmaster[1980] DETAIL: Failed process was running: insert into prtx2 select 1 + i%30, i, i
from generate_series(1,500) i, generate_series(1,10) j;
Is NSIG defined? Where on the internet can we see the SIGXXX signal
numbers and the glibc source that is actually used on these systems?
This has to be handling something installed by pqsignal(), so I guess
it's probably not the synchronous SIGABRT from abort() expected in
ExceptionalCondition() (assuming that abort() is implemented as
raise(SIGABRT) in the traditional way, which might not be true), so
then I guess it must be an asynchronous signal, but which one?
Searching through /usr/include/ gives me:
/usr/include/signal.h:# define NSIG _NSIG
/usr/include/x86_64-gnu/bits/signum-generic.h:#define _NSIG (__SIGRTMAX + 1)
/usr/include/x86_64-gnu/bits/signum-arch.h:#define __SIGRTMAX __SIGRTMIN
/usr/include/x86_64-gnu/bits/signum-arch.h:#define __SIGRTMIN 32
I've so far resisted the urge to spin up a Debian GNU/Hurd box to
figure any of that out for myself, but maybe someone has a clue...
That's pretty wise; the most frustrating thing with the Hurd VM, which I
created as described above, is that it hangs during tests (only 1 out of
5 `make check` runs completes) and killing the hanging processes doesn't
restore its working state; I have to reboot it (and fsck finds FS errors
on each reboot) or even restore a copy of the VM's disk.
Best regards,
Alexander
On Sun, Oct 12, 2025 at 1:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
!!!wrapper_handler[1988]| postgres_signal_arg: 30, PG_NSIG: 33
!!!wrapper_handler[1989]| postgres_signal_arg: 30, PG_NSIG: 33
!!!wrapper_handler[3284]| postgres_signal_arg: 14, PG_NSIG: 33
!!!wrapper_handler[3284]| postgres_signal_arg: 28476608, PG_NSIG: 33
TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 94, PID: 3284
Hmm. We only install the handler for real signal numbers, and it
clearly managed to find the handler, so then how did it corrupt signo
before calling the function? I wonder if there could be concurrency bugs
reached by our perhaps unusually large amount of signaling (we have
found bugs in the signal implementations of several other OSes...).
This might be the code:
https://github.com/bminor/glibc/blob/master/hurd/hurdsig.c#L639
It appears to suspend the thread selected to handle the signal, mess
with its stack/context and then resume it, just like traditional
monokernels; it's just done in user space by code running in a helper
thread that communicates over Mach ports. So it looks like I
misunderstood that comment in the docs, it's not the handler itself
that runs in a different thread, unless I'm looking at the wrong code
(?).
Some random thoughts after skim-reading that and
glibc/sysdeps/mach/hurd/x86/trampoline.c:
* I wonder if setting up sigaltstack() and then using SA_ONSTACK in
pqsignal() would behave differently, though SysV AMD64 calling
conventions (used by the Hurd, IIUC) have the first argument in %rdi, not
the stack, so I don't really expect that to be relevant...
* I wonder about the special code paths for handlers that were already
running and happened to be in sigreturn(), or something like that,
which I didn't study at all, but it occurred to me that our pqsignal
will only block the signal itself while running a handler (since it
doesn't specify SA_NODEFER)... so what happens if you block all
signals while running each handler by changing
sigemptyset(&act.sa_mask) to sigfillset(&act.sa_mask)?
* I see special code paths for threads that were in (its notion of)
critical sections, which must be rare, but it looks like that just
leaves the signal pending, which seems reasonable
* I see special code paths for SIGIO and SIGURG that I didn't try to
understand, but I wonder what would happen if we s/SIGURG/SIGXCPU/
(I will hopefully soon be able to share a branch that would get rid of
almost all signals, and optionally use pipes or futexes or other
tricks instead, depending on build options, working on that...)
I've so far resisted the urge to spin up a Debian GNU/Hurd box to
figure any of that out for myself, but maybe someone has a clue...
That's pretty wise; the most frustrating thing with the Hurd VM, which I
created as described above, is that it hangs during tests (only 1 out of
5 `make check` runs completes) and killing the hanging processes doesn't
restore its working state; I have to reboot it (and fsck finds FS errors
on each reboot) or even restore a copy of the VM's disk.
Huh, so we're doing something unusual enough to de-stabilise some
fundamental service... Has any of this reached the Hurd mailing
lists?
Some more wild and uninformed guesses, while thinking about how to
narrow a bug report down: if the file system is inconsistent after
it's had plenty of time to finish writing disk blocks before you
rebooted it and needs fsck to fix, perhaps that means that ext2fs
(which I understand to be a user space process that manages the file
system[1]https://www.debian.org/ports/hurd/hurd-doc-server) has locked up? Of course it could easily be something
else, who knows, but that makes me wonder about the more exotic file
system operations we use. Looking at fruitcrow's configure output, I
see that it doesn't have fadvise or sync_file_range, but it does have
pwritev/preadv and posix_fallocate. They probably don't get much
exercise in other software... so maybe try telling PostgreSQL that we
don't have 'em and see what happens? It might also be related to our
vigorous renaming, truncating, fsyncing activities... It looks like
the only other plausible file system might be an NFS mount... does it
work any better?
Thinking of other maybe-slightly-unusual things in the signal
processing area that have been problematic in a couple of other OSes
(ie systems that added emulations of Linux system calls), I wondered
about epoll and signalfd, but it doesn't have those either, so it must
be using plain old poll() with the widely used self-pipe trick for
latches, and that doesn't seem likely to be new or buggy code.
Hi Alexander,
On Sat, Oct 11, 2025 at 03:00:00PM +0300, Alexander Lakhin wrote:
Thank you for your attention to that anomaly!
Sorry, I missed Thomas' initial mail on this somehow.
Whilst from a failed run, I got:
Any way to easily reproduce this? It happened only once on fruitcrow so
far.
I wonder whether this is a problem in the (relatively new) x86-64 port,
so I'd like to try to reproduce it on i386.
I've so far resisted the urge to spin up a Debian GNU/Hurd box to
figure any of that out for myself, but maybe someone has a clue...
That's pretty wise; the most frustrating thing with the Hurd VM, which I
created as described above, is that it hangs during tests (only 1 out of
5 `make check` runs completes) and killing the hanging processes doesn't
restore its working state; I have to reboot it (and fsck finds FS errors
on each reboot) or even restore a copy of the VM's disk.
I had to reboot fruitcrow last night because it had crashed, but that
was the first time in literally weeks. I tend to reboot it once a week,
but otherwise it ran pretty stable.
It took me a while to get there though before I applied for it to be a
buildfarm animal; here is what I did:
1. (buildfarm client specific): removed "HEAD => ['debug_parallel_query =
regress']," and set "MAX_CONNECTIONS => '3'," in build-farm.conf, to
reduce concurrency.
2. Gave the VM 4G of memory via KVM. Also set -M q35, but I guess
you are already doing that as it does not boot properly otherwise IME.
3. Removed swap (this is already the case for the x86-64 2025 Debian
image, but it was not the case for the earlier 2023 i386 image when I
started this project). Paging to disk has been problematic and prone to
issues (critical parts getting paged out accidentally), but this has been
fixed over the summer so in principle running a current gnumach/hurd
package combination from unstable should be fine again.
4. Removed tmpfs translators (so that the default-pager is not used
anywhere, in conjunction with not setting swap, see above), by setting
RAMLOCK=no and RAMTMP=no in /etc/default/tmpfs, as well as commenting
out 'mount_run mount_noupdate'/'mount_tmp mount_noupdate' in
/etc/init.d/mountall.sh and 'mount_run "$MNTMODE"' in
/etc/init.d/mountkernfs.sh (maybe there is a more minimal change, but
that is what I have right now).
5. Set the sync interval for the root file system to 5s in /etc/fstab
(this will not help with crashes, but likely make file system corruption
less bad, so recovering from them will be easier):
/dev/wd0s2 / ext2 sync=5 0 1
Hope that helps,
Michael
Hi Michael,
12.10.2025 11:31, Michael Banck wrote:
Any way to easily reproduce this? It happened only once on fruitcrow so
far.
I'd say it happens pretty often when `make check` doesn't hang (so it
takes an hour or two for me to reproduce).
Though now that you've mentioned MAX_CONNECTIONS => '3', I also tried:
EXTRA_REGRESS_OPTS="--max-connections=3" make -s check
and it passed 6 iterations for me. Iteration 7 failed with:
not ok 213 + partition_aggregate 1027 ms
--- /home/demo/postgresql/src/test/regress/expected/partition_aggregate.out 2025-10-11 10:04:36.000000000 +0100
+++ /home/demo/postgresql/src/test/regress/results/partition_aggregate.out 2025-10-12 13:02:05.000000000 +0100
@@ -1476,14 +1476,14 @@
(15 rows)
SELECT x, sum(y), avg(y), sum(x+y), count(*) FROM pagg_tab_para GROUP BY x HAVING avg(y) < 7 ORDER BY 1, 2, 3;
- x | sum | avg | sum | count
-----+------+--------------------+-------+-------
- 0 | 5000 | 5.0000000000000000 | 5000 | 1000
- 1 | 6000 | 6.0000000000000000 | 7000 | 1000
- 10 | 5000 | 5.0000000000000000 | 15000 | 1000
- 11 | 6000 | 6.0000000000000000 | 17000 | 1000
- 20 | 5000 | 5.0000000000000000 | 25000 | 1000
- 21 | 6000 | 6.0000000000000000 | 27000 | 1000
+ x | sum | avg | sum | count
+----+------+----------------------------+-------+-------
+ 0 | 5000 | 5.0000000000000000 | 5000 | 1000
+ 1 | 6000 | 6.0000000000000000 | 7000 | 1000
+ 10 | 5000 | 0.000000052757140846001326 | 15000 | 1000
+ 11 | 6000 | 6.0000000000000000 | 17000 | 1000
+ 20 | 5000 | 5.0000000000000000 | 25000 | 1000
+ 21 | 6000 | 6.0000000000000000 | 27000 | 1000
(6 rows)
Then another 6 iterations passed, the seventh one hung. Then 10 iterations
passed.
With EXTRA_REGRESS_OPTS="--max-connections=10" make -s check, I got:
2025-10-12 13:52:58.559 BST client backend[15475] pg_regress/constraints STATEMENT: ALTER TABLE notnull_tbl2 ALTER a
DROP NOT NULL;
!!!wrapper_handler[15479]| postgres_signal_arg: 30, PG_NSIG: 33
!!!wrapper_handler[15476]| postgres_signal_arg: 30, PG_NSIG: 33
!!!wrapper_handler[15476]| postgres_signal_arg: 28481392, PG_NSIG: 33
TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 94, PID: 15476
postgres(ExceptionalCondition+0x5a) [0x1006af78a]
postgres(+0x70f59a) [0x10070f59a]
/lib/x86_64-gnu/libc.so.0.3(+0x39fee) [0x102b89fee]
/lib/x86_64-gnu/libc.so.0.3(+0x39fdd) [0x102b89fdd]
on iteration 5.
So we can conclude that the issue with signals is better reproduced with
higher concurrency.
28481392 (0x1b29770) is pretty close to 28476608 (0x1b284c0), which I
showed before, so the numbers are apparently not random.
I had to reboot fruitcrow last night because it had crashed, but that
was the first time in literally weeks. I tend to reboot it once a week,
but otherwise it ran pretty stable.
Today I also tried to test my machine with stress-ng:
stress-ng -v --class os --sequential 20 --timeout 120s
It hung/crashed at the access, brk, close, enosys tests and never reached
the end... Some tests might pass after restart, some fail consistently...
For example:
Fatal glibc error: ../sysdeps/mach/hurd/mig-reply.c:73 (__mig_dealloc_reply_port): assertion failed: port == arg
stress-ng: info: [9395] stressor terminated with unexpected signal 6 'SIGABRT'
backtrace:
stress-ng-enosys [run](+0xace81) [0x1000ace81]
stress-ng-enosys [run](+0x927b6c) [0x100927b6c]
/lib/x86_64-gnu/libc.so.0.3(+0x39fee) [0x1029c8fee]
/lib/x86_64-gnu/libc.so.0.3(+0x21aec) [0x1029b0aec]
It took me a while to get there though before I applied for it to be a
buildfarm animal; here is what I did:
1. (buildfarm client specific): removed "HEAD => ['debug_parallel_query =
regress']," and set "MAX_CONNECTIONS => '3'," in build-farm.conf, to
reduce concurrency.
Thank you for the info! I didn't specify debug_parallel_query for
`make check`, but num_connections really makes the difference.
2. Gave it 4G of memory to the VM via KVM. Also set -M q35, but I guess
you are already doing that as it does not boot properly otherwise IME.
Mine has 4GB too.
3. Removed swap (this is already the case for the x86-64 2025 Debian
image, but it was not the case for the earlier 2023 i386 image when I
started this project). Paging to disk has been problematic and prone to
issues (critical parts getting paged out accidently), but this has been
fixed over the summer so in principle running a current gnumach/hurd
package combination from unstable should be fine again.
Yes, I have no swap enabled.
4. Removed tmpfs translators (so that the default-pager is not used
anywhere, in conjunction with not setting swap, see above), by setting
RAMLOCK=no and RAMTMP=no in /etc/default/tmpfs, as well as commenting
out 'mount_run mount_noupdate'/'mount_tmp mount_noupdate' in
/etc/init.d/mountall.sh and 'mount_run "$MNTMODE"' in
/etc/init.d/mountkernfs.sh (maybe there is a more minimal change, but
that is what I have right now).
I have RAMLOCK=no and RAMTMP=no in my /etc/default/tmpfs and can't see any
tmpfs mounts.
Thank you for your help!
Best regards,
Alexander
On Sun, Oct 12, 2025 at 04:00:00PM +0300, Alexander Lakhin wrote:
Hi Michael,
12.10.2025 11:31, Michael Banck wrote:
Any way to easily reproduce this? It happened only once on fruitcrow so
far.
I'd say it happens pretty often when `make check` doesn't hang (so it
takes an hour or two for me to reproduce).
Though now that you've mentioned MAX_CONNECTIONS => '3', I also tried:
EXTRA_REGRESS_OPTS="--max-connections=3" make -s check
and it passed 6 iterations for me. Iteration 7 failed with:
not ok 213 + partition_aggregate 1027 ms
[same partition_aggregate.out diff as quoted above: avg(y) for x = 10 came
out as 0.000000052757140846001326 instead of 5.0000000000000000]
euh.
Then another 6 iterations passed, the seventh one hung. Then 10 iterations
passed.
With EXTRA_REGRESS_OPTS="--max-connections=10" make -s check, I got:
2025-10-12 13:52:58.559 BST client backend[15475] pg_regress/constraints
STATEMENT:� ALTER TABLE notnull_tbl2 ALTER a DROP NOT NULL;
!!!wrapper_handler[15479]| postgres_signal_arg: 30, PG_NSIG: 33
!!!wrapper_handler[15476]| postgres_signal_arg: 30, PG_NSIG: 33
!!!wrapper_handler[15476]| postgres_signal_arg: 28481392, PG_NSIG: 33
TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 94, PID: 15476
postgres(ExceptionalCondition+0x5a) [0x1006af78a]
postgres(+0x70f59a) [0x10070f59a]
/lib/x86_64-gnu/libc.so.0.3(+0x39fee) [0x102b89fee]
/lib/x86_64-gnu/libc.so.0.3(+0x39fdd) [0x102b89fdd]
on iteration 5.
So we can conclude that the issue with signals is better reproduced with
higher concurrency.
28481392 (0x1b29770) is pretty close to 28476608 (0x1b284c0), which I
showed before, so the numbers are apparently not random.
Ok, so it seems to do that for different tests/statements.
I'll try to reproduce it here with the above --max-connections=10,
thanks.
Michael
Hi,
On Sun, Oct 12, 2025 at 03:20:07PM +0200, Michael Banck wrote:
I'll try to reproduce it here with the above --max-connections=10,
thanks.
That didn't go so well. It ran for over 100 iterations of make check
without a fault (using EXTRA_REGRESS_OPTS="--max-connections=10"), so
not sure what to do next.
The i386 VM crashed once or twice (without any obvious sign of what was
going on, not to mention a SIGABRT).
Michael
Hello Michael,
25.09.2025 08:00, Alexander Lakhin wrote:
I saw those issues frequently on the initial 32bit Hurd VM I started to
run the buildfarm code on, before I switched it to HPET timers. Since
then, I don't think I saw that particular error again, but 4 out of 1000 is
not a lot of course.
There is also contrib/pg_stat_statements/entry_timestamp, which fails for
me when running in a loop:
for i in `seq 100`; do echo "ITERATION $i"; NO_TEMP_INSTALL=1 make -s check -C contrib/pg_stat_statements || break; done
on iterations 42, 60, 12, 5, 28:
ITERATION 28
...
ok 8 - wal 14 ms
not ok 9 - entry_timestamp 14 ms
ok 10 - privileges 16 ms
...
1..15
# 1 of 15 tests failed.
One month later, fruitcrow has generated this failure too:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-10-25%2007%3A45%3A03
pgsql.build/contrib/pg_stat_statements/regression.diffs
diff -U3
/home/demo/client-code-REL_19_1/buildroot/HEAD/pgsql.build/contrib/pg_stat_statements/expected/entry_timestamp.out
/home/demo/client-code-REL_19_1/buildroot/HEAD/pgsql.build/contrib/pg_stat_statements/results/entry_timestamp.out
--- /home/demo/client-code-REL_19_1/buildroot/HEAD/pgsql.build/contrib/pg_stat_statements/expected/entry_timestamp.out
2025-10-25 08:45:03.000000000 +0100
+++ /home/demo/client-code-REL_19_1/buildroot/HEAD/pgsql.build/contrib/pg_stat_statements/results/entry_timestamp.out
2025-10-25 08:57:31.000000000 +0100
@@ -147,7 +147,7 @@
WHERE query LIKE '%STMTTS%';
total | minmax_exec_zero | minmax_ts_after_ref | stats_since_after_ref
-------+------------------+---------------------+-----------------------
- 2 | 1 | 2 | 0
+ 2 | 2 | 2 | 0
(1 row)
-- Cleanup
Thus, the "zero time difference" issue in general still exists.
Best regards,
Alexander
Hi,
On Tue, Oct 28, 2025 at 07:00:01AM +0200, Alexander Lakhin wrote:
One month later, fruitcrow has generated this failure too:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-10-25%2007%3A45%3A03
Thanks for noticing that, I was distracted with pgconf.eu last week...
It hit again today on v17:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-10-30%2011%3A04%3A28
pgsql.build/contrib/pg_stat_statements/regression.diffs
[same entry_timestamp.out diff as quoted above: minmax_exec_zero came out as 2
instead of the expected 1]
Thus, the "zero time difference" issue in general still exists.
I checked this; if I just run the following excerpt of
entry_timestamp.sql in a tight loop, I get a few (<10) occurrences out
of 10000 iterations where min/max plan time is 0 (or rather
minmax_plan_zero is non-zero):
SELECT pg_stat_statements_reset();
SET pg_stat_statements.track_planning = TRUE;
SELECT 1 AS "STMTTS1";
SELECT
count(*) as total,
count(*) FILTER (
WHERE min_plan_time + max_plan_time = 0
) as minmax_plan_zero
FROM pg_stat_statements
WHERE query LIKE '%STMTTS%';
On the assumption that this isn't a general bug, but just a timing issue
(planning 'SELECT 1' isn't complicated), I see two possibilities:
1. Ignore the plan times, and replace SELECT 1 with SELECT
pg_sleep(1e-6), similar to e849bd551. I guess this would reduce test
coverage, so it would likely not be great?
2. Make the query a bit more complicated so that the plan time is likely
to be non-negligible. I actually had to go quite a way to make it pretty
failsafe; the attached made it fail fewer than 5 times out of 50000
iterations, not sure whether that is acceptable or still considered
flaky?
Any other ideas?
Michael
Attachments:
0001-Make-pg_stat_statement-s-entry_timestamp-test-less-f.patchtext/x-diff; charset=us-asciiDownload
From e2b920d56b09cf438cd511c5a3dbe415586b68df Mon Sep 17 00:00:00 2001
From: Michael Banck <mbanck@gmx.net>
Date: Wed, 29 Oct 2025 23:10:30 +0100
Subject: [PATCH] Make pg_stat_statement's entry_timestamp test less flaky.
On systems with relatively low timer precision (such as buildfarm animal
fruitcrow), the combined min/max_plan_time and/or min/max_exec_time could be 0
as the executed query (SELECT 1 and similar) is very simple.
Change this by making the query more complex, including call to pg_sleep and a
join to a system table.
---
.../expected/entry_timestamp.out | 32 +++++++++----------
.../sql/entry_timestamp.sql | 6 ++--
2 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/entry_timestamp.out b/contrib/pg_stat_statements/expected/entry_timestamp.out
index a10c4be6bac..d8cb712cb4e 100644
--- a/contrib/pg_stat_statements/expected/entry_timestamp.out
+++ b/contrib/pg_stat_statements/expected/entry_timestamp.out
@@ -3,17 +3,17 @@
--
-- planning time is needed during tests
SET pg_stat_statements.track_planning = TRUE;
-SELECT 1 AS "STMTTS1";
- STMTTS1
----------
- 1
+SELECT pg_sleep(1e-6), 1 AS "STMTTS1" FROM pg_database GROUP BY "STMTTS1" ORDER BY "STMTTS1";
+ pg_sleep | STMTTS1
+----------+---------
+ | 1
(1 row)
SELECT now() AS ref_ts \gset
-SELECT 1,2 AS "STMTTS2";
- ?column? | STMTTS2
-----------+---------
- 1 | 2
+SELECT pg_sleep(1e-6), 1, 2 AS "STMTTS2" FROM pg_database GROUP BY "STMTTS2" ORDER BY "STMTTS2";
+ pg_sleep | ?column? | STMTTS2
+----------+----------+---------
+ | 1 | 2
(1 row)
SELECT stats_since >= :'ref_ts', count(*) FROM pg_stat_statements
@@ -80,10 +80,10 @@ query, minmax_stats_since = :'minmax_reset_ts' AS reset_ts_match
FROM pg_stat_statements
WHERE query LIKE '%STMTTS%'
ORDER BY query COLLATE "C";
- query | reset_ts_match
----------------------------+----------------
- SELECT $1 AS "STMTTS1" | t
- SELECT $1,$2 AS "STMTTS2" | f
+ query | reset_ts_match
+------------------------------------------------------------------------------------------------+----------------
+ SELECT pg_sleep($1), $2, 2 AS "STMTTS2" FROM pg_database GROUP BY "STMTTS2" ORDER BY "STMTTS2" | f
+ SELECT pg_sleep($1), 1 AS "STMTTS1" FROM pg_database GROUP BY "STMTTS1" ORDER BY "STMTTS1" | t
(2 rows)
-- check that minmax reset does not set stats_reset
@@ -123,10 +123,10 @@ WHERE query LIKE '%STMTTS%';
(1 row)
-- Execute first query once more to check stats update
-SELECT 1 AS "STMTTS1";
- STMTTS1
----------
- 1
+SELECT pg_sleep(1e-6), 1 AS "STMTTS1" FROM pg_database GROUP BY "STMTTS1" ORDER BY "STMTTS1";
+ pg_sleep | STMTTS1
+----------+---------
+ | 1
(1 row)
-- check
diff --git a/contrib/pg_stat_statements/sql/entry_timestamp.sql b/contrib/pg_stat_statements/sql/entry_timestamp.sql
index d6d3027ab4f..61f7f39b9dd 100644
--- a/contrib/pg_stat_statements/sql/entry_timestamp.sql
+++ b/contrib/pg_stat_statements/sql/entry_timestamp.sql
@@ -5,9 +5,9 @@
-- planning time is needed during tests
SET pg_stat_statements.track_planning = TRUE;
-SELECT 1 AS "STMTTS1";
+SELECT pg_sleep(1e-6), 1 AS "STMTTS1" FROM pg_database GROUP BY "STMTTS1" ORDER BY "STMTTS1";
SELECT now() AS ref_ts \gset
-SELECT 1,2 AS "STMTTS2";
+SELECT pg_sleep(1e-6), 1, 2 AS "STMTTS2" FROM pg_database GROUP BY "STMTTS2" ORDER BY "STMTTS2";
SELECT stats_since >= :'ref_ts', count(*) FROM pg_stat_statements
WHERE query LIKE '%STMTTS%'
GROUP BY stats_since >= :'ref_ts'
@@ -91,7 +91,7 @@ FROM pg_stat_statements
WHERE query LIKE '%STMTTS%';
-- Execute first query once more to check stats update
-SELECT 1 AS "STMTTS1";
+SELECT pg_sleep(1e-6), 1 AS "STMTTS1" FROM pg_database GROUP BY "STMTTS1" ORDER BY "STMTTS1";
-- check
-- we don't check planing times here to be independent of
--
2.39.5
Hello Thomas and Michael!
Sorry for the delay. I've finally completed a new round of experiments and
discovered the following:
12.10.2025 03:42, Thomas Munro wrote:
Hmm. We only install the handler for real signal numbers, and it
clearly managed to find the handler, so then how did it corrupt signo
before calling the function? I wonder if there could be concurrency bugs
reached by our perhaps unusually large amount of signaling (we have
found bugs in the signal implementations of several other OSes...).
This might be the code:
https://github.com/bminor/glibc/blob/master/hurd/hurdsig.c#L639
It appears to suspend the thread selected to handle the signal, mess
with its stack/context and then resume it, just like traditional
monokernels; it's just done in user space by code running in a helper
thread that communicates over Mach ports. So it looks like I
misunderstood that comment in the docs, it's not the handler itself
that runs in a different thread, unless I'm looking at the wrong code
(?).
Some random thoughts after skim-reading that and
glibc/sysdeps/mach/hurd/x86/trampoline.c:
* I wonder if setting up sigaltstack() and then using SA_ONSTACK in
pqsignal() would behave differently, though SysV AMD64 calling
conventions (used by the Hurd, IIUC) have the first argument in %rdi, not
the stack, so I don't really expect that to be relevant...
* I wonder about the special code paths for handlers that were already
running and happened to be in sigreturn(), or something like that,
which I didn't study at all, but it occurred to me that our pqsignal
will only block the signal itself while running a handler (since it
doesn't specify SA_NODEFER)... so what happens if you block all
signals while running each handler by changing
sigemptyset(&act.sa_mask) to sigfillset(&act.sa_mask)?
Thank you for the suggestion!
With this modification:
@@ -137,7 +140,7 @@ pqsignal(int signo, pqsigfunc func)
#if !(defined(WIN32) && defined(FRONTEND))
act.sa_handler = func;
- sigemptyset(&act.sa_mask);
+ sigfillset(&act.sa_mask);
act.sa_flags = SA_RESTART;
I got 100 iterations passed (12 of them hung) without that Assert
triggered.
* I see special code paths for SIGIO and SIGURG that I didn't try to
understand, but I wonder what would happen if we s/SIGURG/SIGXCPU/
With sed 's/SIGURG/SIGXCPU/' -i src/backend/storage/ipc/waiteventset.c, I
still got:
!!!wrapper_handler[8401]| postgres_signal_arg: 28565808, PG_NSIG: 33
TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 94, PID: 8401
...
2025-11-09 12:51:24.095 GMT postmaster[7282] LOG: client backend (PID 8401) was terminated by signal 6: Aborted
2025-11-09 12:51:24.095 GMT postmaster[7282] DETAIL: Failed process was running: UPDATE PKTABLE set ptest2=5 where
ptest2=2;
---
!!!wrapper_handler[21000]| postgres_signal_arg: 28545040, PG_NSIG: 33
TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 94, PID: 21000
...
2025-11-09 13:06:59.458 GMT postmaster[20669] LOG: client backend (PID 21000) was terminated by signal 6: Aborted
2025-11-09 13:06:59.458 GMT postmaster[20669] DETAIL: Failed process was running: UPDATE pvactst SET i = i WHERE i < 1000;
---
!!!wrapper_handler[21973]| postgres_signal_arg: 28562608, PG_NSIG: 33
TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 94, PID: 21973
2025-11-09 14:56:23.955 GMT postmaster[20665] LOG: client backend (PID 21973) was terminated by signal 6: Aborted
2025-11-09 14:56:23.955 GMT postmaster[20665] DETAIL: Failed process was running: INSERT INTO pagg_tab_m SELECT i % 30,
i % 40, i % 50 FROM generate_series(0, 2999) i;
The failure rate is approximately 1 per 30 runs.
Besides that Assert and the hangs, I also observed:
--- /home/demo/postgresql/src/test/regress/expected/xml.out 2025-10-11 10:04:43.000000000 +0100
+++ /home/demo/postgresql/src/test/regress/results/xml.out 2025-11-10 07:20:56.000000000 +0000
@@ -1788,10 +1788,14 @@
proargtypes text))
SELECT * FROM z
EXCEPT SELECT * FROM x;
- proname | proowner | procost | pronargs | proargnames | proargtypes
----------+----------+---------+----------+-------------+-------------
-(0 rows)
-
+ERROR: could not parse XML document
+DETAIL: line 1: Input is not proper UTF-8, indicate encoding !
+Bytes: 0x92 0x11 0x69 0x3C
+<data>X~R^Qi<proc><proname>pg_get_replication_slots</proname><proowner>10</proowne
+ ^
+line 1: PCDATA invalid Char value 17
+<data>X~R^Qi<proc><proname>pg_get_replication_slots</proname><proowner>10</proowne
+
TRAP: failed Assert("AllocBlockIsValid(block)"), File: "aset.c", Line: 1536, PID: 16354
...
2025-11-09 10:21:16.249 GMT postmaster[15242] LOG: client backend (PID 16354) was terminated by signal 6: Aborted
2025-11-09 10:21:16.249 GMT postmaster[15242] DETAIL: Failed process was running: CREATE INDEX i_bmtest_a ON bmscantest(a);
2025-11-09 10:21:16.249 GMT postmaster[15242] LOG: terminating any other active server processes
TRAP: failed Assert("npages == tbm->npages"), File: "tidbitmap.c", Line: 825, PID: 4641
...
2025-10-14 12:09:00.555 BST postmaster[3818] LOG: client backend (PID 4641) was terminated by signal 6: Aborted
2025-10-14 12:09:00.555 BST postmaster[3818] DETAIL: Failed process was running: select count(*) from tenk1, tenk2
where tenk1.hundred > 1 and tenk2.thousand=0;
--- /home/demo/postgresql/src/test/regress/expected/join_hash.out 2025-10-11 10:04:34.000000000 +0100
+++ /home/demo/postgresql/src/test/regress/results/join_hash.out 2025-10-14 11:30:16.000000000 +0100
@@ -485,20 +485,12 @@
(8 rows)
select count(*) from simple r join extremely_skewed s using (id);
- count
--------
- 20000
-(1 row)
-
+ERROR: could not read from temporary file: read only 411688 of 47854847 bytes
--- /home/demo/postgresql/src/test/regress/expected/bitmapops.out 2025-10-11 10:04:29.000000000 +0100
+++ /home/demo/postgresql/src/test/regress/results/bitmapops.out 2025-10-14 11:08:58.000000000 +0100
@@ -13,6 +13,10 @@
SELECT (r%53), (r%59),
'foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo'
FROM generate_series(1,70000) r;
CREATE INDEX i_bmtest_a ON bmscantest(a);
+ERROR: index row size 6736 exceeds btree version 4 maximum 2704 for index "i_bmtest_a"
30.10.2025 17:30, Michael Banck wrote:
I checked this; if I just run the following excerpt of
entry_timestamp.sql in a tight loop, I get a few (<10) occurrences out
of 10000 iterations where min/max plan time is 0 (or rather
minmax_plan_zero is non-zero):
SELECT pg_stat_statements_reset();
SET pg_stat_statements.track_planning = TRUE;
SELECT 1 AS "STMTTS1";
SELECT
count(*) as total,
count(*) FILTER (
WHERE min_plan_time + max_plan_time = 0
) as minmax_plan_zero
FROM pg_stat_statements
WHERE query LIKE '%STMTTS%';
On the assumption that this isn't a general bug, but just a timing issue
(planning 'SELECT 1' isn't complicated), I see two possibilities:
1. Ignore the plan times, and replace SELECT 1 with SELECT
pg_sleep(1e-6), similar to e849bd551. I guess this would reduce test
coverage, so it would likely not be great?
2. Make the query a bit more complicated so that the plan time is likely
to be non-negligible. I actually had to go quite a way to make it pretty
failsafe; the attached made it fail fewer than 5 times out of 50000
iterations, not sure whether that is acceptable or still considered
flaky?
What concerns me is that there is also subscription.sql, and maybe there
could be other test(s) that expect at least 1000ns (far from infinite) timer
resolution. Probably it would make sense to define which timer resolution
we consider acceptable for tests and then to check if Hurd can provide it.
Best regards,
Alexander
Hi Alexander,
On Mon, Nov 10, 2025 at 09:00:01PM +0200, Alexander Lakhin wrote:
Sorry for the delay. I've finally completed a new round of experiments and
discovered the following:
[...]
12.10.2025 03:42, Thomas Munro wrote:
* I wonder about the special code paths for handlers that were already
running and happened to be in sigreturn(), or something like that,
which I didn't study at all, but it occurred to me that our pqsignal
will only block the signal itself while running a handler (since it
doesn't specify SA_NODEFER)... so what happens if you block all
signals while running each handler by changing
sigemptyset(&act.sa_mask) to sigfillset(&act.sa_mask)?
Thank you for the suggestion!
With this modification:
@@ -137,7 +140,7 @@ pqsignal(int signo, pqsigfunc func)
#if !(defined(WIN32) && defined(FRONTEND))
act.sa_handler = func;
- sigemptyset(&act.sa_mask);
+ sigfillset(&act.sa_mask);
act.sa_flags = SA_RESTART;
I got 100 iterations passed (12 of them hung) without that Assert
triggered.
But those hangs were unrelated to the assert then, right?
* I see special code paths for SIGIO and SIGURG that I didn't try to
understand, but I wonder what would happen if we s/SIGURG/SIGXCPU/
With sed 's/SIGURG/SIGXCPU/' -i src/backend/storage/ipc/waiteventset.c, I
still got:
!!!wrapper_handler[8401]| postgres_signal_arg: 28565808, PG_NSIG: 33
TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 94, PID: 8401
...
2025-11-09 12:51:24.095 GMT postmaster[7282] LOG: client backend (PID 8401) was terminated by signal 6: Aborted
2025-11-09 12:51:24.095 GMT postmaster[7282] DETAIL: Failed process was
running: UPDATE PKTABLE set ptest2=5 where ptest2=2;
---
!!!wrapper_handler[21000]| postgres_signal_arg: 28545040, PG_NSIG: 33
TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 94, PID: 21000
...
2025-11-09 13:06:59.458 GMT postmaster[20669] LOG: client backend (PID 21000) was terminated by signal 6: Aborted
2025-11-09 13:06:59.458 GMT postmaster[20669] DETAIL: Failed process was running: UPDATE pvactst SET i = i WHERE i < 1000;
---
!!!wrapper_handler[21973]| postgres_signal_arg: 28562608, PG_NSIG: 33
TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 94, PID: 219732025-11-09 14:56:23.955 GMT postmaster[20665] LOG: client backend (PID 21973) was terminated by signal 6: Aborted
2025-11-09 14:56:23.955 GMT postmaster[20665] DETAIL: Failed process was
running: INSERT INTO pagg_tab_m SELECT i % 30, i % 40, i % 50 FROM
generate_series(0, 2999) i;
The failure rate is approximately 1 per 30 runs.
Is that the same failure rate you got before?
Michael
10.11.2025 21:33, Michael Banck wrote:
On Mon, Nov 10, 2025 at 09:00:01PM +0200, Alexander Lakhin wrote:
I got 100 iterations passed (12 of them hung) without that Assert
triggered.
But those hangs were unrelated to the assert then, right?
Yeah, I think there are several unrelated issues here.
The failure rate is approximately 1 per 30 runs.
Is that the same failure rate you got before?
Yes, AFAICS, s/SIGURG/SIGXCPU/ didn't affect the failure and its rate.
Best regards,
Alexander
On Tue, Nov 11, 2025 at 8:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
With this modification:
@@ -137,7 +140,7 @@ pqsignal(int signo, pqsigfunc func)
#if !(defined(WIN32) && defined(FRONTEND))
act.sa_handler = func;
- sigemptyset(&act.sa_mask);
+ sigfillset(&act.sa_mask);
act.sa_flags = SA_RESTART;
I got 100 iterations passed (12 of them hung) without that Assert
triggered.
Interesting. Perhaps a minimal program that installs a handler
assert(signo < 32) for both SIGUSR1 and SIGUSR2 might fail too, if
another program loops calling kill(the_other_one, rand() % 2 == 0 ?
SIGUSR1 : SIGUSR2), to support a bug report?
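Something along these lines, perhaps (a hypothetical standalone reproducer,
not PostgreSQL code; it installs handlers the way pqsignal does, i.e. without
blocking other signals, and then hammers the process with SIGUSR1/SIGUSR2):
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
static void
handler(int signo)
{
	/* a correct signal implementation can only pass the installed signals */
	assert(signo == SIGUSR1 || signo == SIGUSR2);
}
int
main(void)
{
	pid_t		child = fork();
	if (child == 0)
	{
		struct sigaction act = {0};
		act.sa_handler = handler;
		sigemptyset(&act.sa_mask);	/* as in pqsignal: no extra blocking */
		act.sa_flags = SA_RESTART;
		sigaction(SIGUSR1, &act, NULL);
		sigaction(SIGUSR2, &act, NULL);
		for (;;)
			pause();			/* handle signals until killed */
	}
	sleep(1);					/* crude: let the child install its handlers */
	for (long i = 0; i < 10 * 1000 * 1000; i++)
		kill(child, rand() % 2 == 0 ? SIGUSR1 : SIGUSR2);
	kill(child, SIGKILL);
	waitpid(child, NULL, 0);
	printf("done\n");
	return 0;
}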
[lots of weird errors in a wide range of code]
I can't make much sense of these failures, but are you saying that
these only happen without that sigfillset(&act.sa_mask) change, that
is, when the signal implementation is misbehaving? If so, I wonder if
the same bug in their signal handling might just be corrupting the
user stack sometimes even when the signal number assertion doesn't
trip.
On the assumption that this isn't a general bug, but just a timing issue
(planning 'SELECT 1' isn't complicated), I see two possibilities:
1. Ignore the plan times, and replace SELECT 1 with SELECT
pg_sleep(1e-6), similar to e849bd551. I guess this would reduce test
coverage, so it would likely not be great?
2. Make the query a bit more complicated so that the plan time is likely
to be non-negligible. I actually had to go quite a way to make it pretty
failsafe; the attached made it fail fewer than 5 times out of 50000
iterations, not sure whether that is acceptable or still considered
flaky?
Wait, we have tests that fail if the clock doesn't advance? Isn't
that just bogus?
What concerns me is that there is also subscription.sql, and maybe there
could be other test(s) that expect at least 1000ns (far from infinite) timer
resolution. Probably it would make sense to define which timer resolution
we consider acceptable for tests and then to check if Hurd can provide it.
Ah, I see, so that one is checking if the last reset time advanced to
check that something happened. That also has the theoretical problem
that CLOCK_REALTIME can go backwards sometimes, due to ntpd
adjustments or whatever. In the absence of a "reset_counter" column,
perhaps we could consider a kludge like x->reset_time =
Max(x->reset_time + 1ns, now), just to make sure the value always goes
up on reset, without having any noticeable effect on normal systems...
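As a sketch of that kludge (hypothetical names and a stand-in timestamp type;
the real stats code stores and compares these values differently):
#include <stdio.h>
typedef long long pg_time_ns;	/* stand-in for whatever timestamp type is used */
/* hypothetical helper sketching the "reset time always advances" idea */
static pg_time_ns
advance_reset_time(pg_time_ns *stored, pg_time_ns now)
{
	/* never return a value <= the previous one, even if the clock stalls or
	 * steps backwards; on normal systems the "+ 1" branch never wins */
	pg_time_ns	candidate = *stored + 1;
	*stored = (now > candidate) ? now : candidate;
	return *stored;
}
int
main(void)
{
	pg_time_ns	reset_ts = 0;
	printf("%lld\n", advance_reset_time(&reset_ts, 1000));	/* 1000 */
	printf("%lld\n", advance_reset_time(&reset_ts, 1000));	/* 1001: clock did not advance */
	return 0;
}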
On Tue, Nov 11, 2025 at 9:03 AM Thomas Munro <thomas.munro@gmail.com> wrote:
Interesting. Perhaps a minimal program that installs a handler
assert(signo < 32) for both SIGUSR1 and SIGUSR2 might fail too, if
Or tighter: assert(signo == SIGUSR1 || signo == SIGUSR2).
10.11.2025 22:03, Thomas Munro wrote:
On Tue, Nov 11, 2025 at 8:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
With this modification:
@@ -137,7 +140,7 @@ pqsignal(int signo, pqsigfunc func)
#if !(defined(WIN32) && defined(FRONTEND))
act.sa_handler = func;
- sigemptyset(&act.sa_mask);
+ sigfillset(&act.sa_mask);
act.sa_flags = SA_RESTART;
I got 100 iterations passed (12 of them hung) without that Assert
triggered.
Interesting. Perhaps a minimal program that installs a handler
assert(signo < 32) for both SIGUSR1 and SIGUSR2 might fail too, if
another program loops calling kill(the_other_one, rand() % 2 == 0 ?
SIGUSR1 : SIGUSR2), to support a bug report?
Yeah, thank you for the idea! I will try it in the coming days.
[lots of weird errors in a wide range of code]
I can't make much sense of these failures, but are you saying that
these only happen without that sigfillset(&act.sa_mask) change, that
is, when the signal implementation is misbehaving? If so, I wonder if
the same bug in their signal handling might just be corrupting the
user stack sometimes even when the signal number assertion doesn't
trip.
No, I think those failures are unrelated; I hit them just because I
executed `make check` many times and some of them definitely occurred
with the unmodified code. Now that I have a script that handles OS hangs
and restores the VM's disk automatically, I can run tests for hours and look
for one failure or another if it can be helpful.
On the assumption that this isn't a general bug, but just a timing issue
(planning 'SELECT 1' isn't complicated), I see two possibilities:
1. Ignore the plan times, and replace SELECT 1 with SELECT
pg_sleep(1e-6), similar to e849bd551. I guess this would reduce test
coverage, so it would likely not be great?
2. Make the query a bit more complicated so that the plan time is likely
to be non-negligible. I actually had to go quite a way to make it pretty
failsafe; the attached made it fail fewer than 5 times out of 50000
iterations, not sure whether that is acceptable or still considered
flaky?
Wait, we have tests that fail if the clock doesn't advance? Isn't
that just bogus?
Yeah, we have, this was discussed (and one test was hardened) upthread.
What concerns me is that there is also subscription.sql, and maybe there
could be other test(s) that expect at least 1000ns (far from infinite) timer
resolution. Probably it would make sense to define which timer resolution
we consider acceptable for tests and then to check if Hurd can provide it.
Ah, I see, so that one is checking if the last reset time advanced to
check that something happened. That also has the theoretical problem
that CLOCK_REALTIME can go backwards sometimes, due to ntpd
adjustments or whatever. In the absence of a "reset_counter" column,
perhaps we could consider a kludge like x->reset_time =
Max(x->reset_time + 1ns, now), just to make sure the value always goes
up on reset, without having any noticeable effect on normal systems...
AFAICS, those test cases use pg_clock_gettime_ns() with CLOCK_MONOTONIC
(if defined, and it's really defined on Hurd), so it should not matter in
this concrete case.
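For what it's worth, the underlying question (can two back-to-back
CLOCK_MONOTONIC readings be identical on this box?) is easy to measure
directly; a throwaway snippet, not PostgreSQL code:
#include <stdio.h>
#include <time.h>
int
main(void)
{
	struct timespec prev, cur;
	long		identical = 0;
	const long	iterations = 1000000;
	clock_gettime(CLOCK_MONOTONIC, &prev);
	for (long i = 0; i < iterations; i++)
	{
		clock_gettime(CLOCK_MONOTONIC, &cur);
		if (cur.tv_sec == prev.tv_sec && cur.tv_nsec == prev.tv_nsec)
			identical++;		/* the clock did not visibly advance */
		prev = cur;
	}
	printf("%ld of %ld consecutive readings were identical\n",
		   identical, iterations);
	return 0;
}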
Best regards,
Alexander
Hi,
On Tue, Nov 11, 2025 at 09:03:32AM +1300, Thomas Munro wrote:
Interesting. Perhaps a minimal program that installs a handler
assert(signo < 32) for both SIGUSR1 and SIGUSR2 might fail too, if
another program loops calling kill(the_other_one, rand() % 2 == 0 ?
SIGUSR1 : SIGUSR2), to support a bug report?
I tried that, and on 32bit (the 64bit vm is currently on a buildfarm
run) I got one crash (after many millions of kill()s) so far:
|test_signalhandler: ../sysdeps/mach/hurd/mig-reply.c:85:
|__mig_dealloc_reply_port: Unexpected error: (os/kern) invalid name.
For future reference, on the Mach console:
|login: task ./test_signalhandler(767) looked up a bogus port 23 for 3205, most probably a bug.
|task ./test_signalhandler(767) looked up a bogus port 23 for 3205, most probably a bug.
|task ./test_signalhandler(767) looked up a bogus port 23 for 3205, most probably a bug.
|task ./test_signalhandler(767) decreasing a bogus port 23 by 1, most probably a bug.
|/hurd/crash:./test_signalhandler(767) crashed, signal {no:6, code:0, error:0}, exception {0, code:0, subcode:0}, PCs: {
|0x10674ac 0x12f1136 0x10b618c 0x10b64a5 0x10b5611 0x1066953 0x1066870 0x1066918 0x1066396 0x12cec49,
|0x10674ac 0x12e3a4f 0x107d09b 0x107e311 0x107e60f 0x1302e67 0x13031ea 0x109715a 0x1068178 0x10682a4
|}, killing task.
Not sure whether that is the same issue, but I'll report it upstream
tomorrow once I've checked on 64bit as well.
Michael
On Tue, Nov 11, 2025 at 10:49 AM Michael Banck <mbanck@gmx.net> wrote:
|login: task ./test_signalhandler(767) looked up a bogus port 23 for 3205, most probably a bug.
. o O { An absurdly far-fetched thought while browsing glibc/hurd glue
code: if synchronous I/O is implemented as RPC on Mach ports, could
that mean that it's technically possible to submit now and consume
results later, for asynchronous I/O? Possibly too
private/undocumented anyway, and maybe they'll eventually do io_uring
or something... I idly wondered about driving I/O directly with ports
while studying the dismal implementation of POSIX AIO on macOS, which
also derives from CMU Mach, but NeXT/Apple jammed file systems down
into the unikernel part behind traditional syscalls, and it looks like
maybe only raw devices are accessible with ports. (I have dim
memories of learning C and assembler more than 30 years ago on a
Commodore Amiga whose microkernel nee Cambridge TRIPOS worked like
that... that cheap home computer could easily get both floppy drives
doing random I/O at once while computing other stuff, unlike Unix...)
}
Hi Thomas,
On Mon, Nov 17, 2025 at 03:59:30PM +1300, Thomas Munro wrote:
On Tue, Nov 11, 2025 at 10:49 AM Michael Banck <mbanck@gmx.net> wrote:
|login: task ./test_signalhandler(767) looked up a bogus port 23 for 3205, most probably a bug.
. o O { An absurdly far-fetched thought while browsing glibc/hurd glue
code: if synchronous I/O is implemented as RPC on Mach ports, could
that mean that it's technically possible to submit now and consume
results later, for asynchronous I/O? Possibly too
private/undocumented anyway, and maybe they'll eventually do io_uring
or something... I idly wondered about driving I/O directly with ports
while studying the dismal implementation of POSIX AIO on macOS, which
also derives from CMU Mach, but NeXT/Apple jammed file systems down
into the unikernel part behind traditional syscalls, and it looks like
maybe only raw devices are accessible with ports. (I have dim
memories of learning C and assembler more than 30 years ago on a
Commodore Amiga whose microkernel nee Cambridge TRIPOS worked like
that... that cheap home computer could easily get both floppy drives
doing random I/O at once while computing other stuff, unlike Unix...)
}
I have CC'd Samuel Thibault who is the currently active Hurd committer/
maintainer, I hope he can comment on that.
Michael
Hello,
On Mon, Nov 17, 2025 at 03:59:30PM +1300, Thomas Munro wrote:
On Tue, Nov 11, 2025 at 10:49 AM Michael Banck <mbanck@gmx.net> wrote:
|login: task ./test_signalhandler(767) looked up a bogus port 23 for 3205, most probably a bug.
. o O { An absurdly far-fetched thought while browsing glibc/hurd glue
code: if synchronous I/O is implemented as RPC on Mach ports, could
that mean that it's technically possible to submit now and consume
results later, for asynchronous I/O?
Yes, it is completely possible.
Possibly too private/undocumented anyway,
It's not really documented much, but it's completely public. One
can include <hurd/io_request.h> and call e.g. io_read_request(port,
reply_port, offset, amount). One then has to run a msgserver loop on the
reply_port to get the reply messages. An example can be seen in the hurd
source in trans/streamio.c, for e.g. device_open_request() calls.
I idly wondered about driving I/O directly with ports
while studying the dismal implementation of POSIX AIO on macOS, which
also derives from CMU Mach, but NeXT/Apple jammed file systems down
into the unikernel part behind traditional syscalls, and it looks like
maybe only raw devices are accessible with ports.
On MacOS, probably indeed. But GNU/Hurd POSIX files are represented as
Mach ports too, so the usual MIG asynchronous mechanism is available.
Samuel
On Tue, Nov 18, 2025 at 12:31 PM Samuel Thibault
<samuel.thibault@gnu.org> wrote:
On Mon, Nov 17, 2025 at 03:59:30PM +1300, Thomas Munro wrote:
. o O { An absurdly far-fetched thought while browsing glibc/hurd glue
code: if synchronous I/O is implemented as RPC on Mach ports, could
that mean that it's technically possible to submit now and consume
results later, for asynchronous I/O?
Yes, it is completely possible.
Neat!
Possibly too private/undocumented anyway,
It's not really documented much, but it's completely public. One
can include <hurd/io_request.h> and call e.g. io_read_request(port,
reply_port, offset, amount). One then has to run a msgserver loop on the
reply_port to get the reply messages. An example can be seen in the hurd
source in trans/streamio.c, for e.g. device_open_request() calls.
OK, to continue the thought experiment... someone could write a new
io_method=hurd, and it'd have to be more efficient than handing the
work off to I/O worker processes (what you get with the default
io_method=worker), since the worker process clearly has to do exactly
the same thing internally in a synchronous wrapper function anyway,
just with extra steps to reach it. At a guess, it could follow
io_method=io_uring's general design and have a reply port owned by
each backend (= process), and backends would almost always consume
replies from their own reply port. They'd need to be able to consume
from each other's reply port occasionally, but I assume that's
possible with an exclusive lock and a temporary transfer of receive
rights. Every process would have to receive duplicates of the full
set of ports after fork(), but at least that problem would go away in
an in-development multithreaded mode.
I doubt it'd be much good without readv/writev operations, though.
It looks like they aren't in io_request.defs yet? Does that also imply
that preadv() has to loop over the vectors sending tons of messages
and waiting for replies?
Standard POSIX AIO also lacks vectored I/O. It lacks many, many other
things one might want (though serious implementations in the old
commercial Unixen added unknown incompatible extensions negotiated
with database vendors, including reply ports), but scatter/gather
seems pretty fundamental for database buffer pool implementations:
we'd have to call aio_read()/aio_write() 16, 32 times when we could
just ask a helper process to call preadv() once (assuming it's really
one operation), to transfer a contiguous block range to/from
discontiguous buffers. Databases want to do that a lot. When
combined with direct I/O, that's actual IOPS out the window, but even
for buffered I/O it's a very high overhead for straight-line I/O. For
that reason we don't actually support pgaio implementations that don't
have readv/writev currently. When we tried it we had to inhibit I/O
combining at higher levels and it wasn't good.
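To make the scatter/gather point concrete, this is the shape of call a buffer
pool wants (purely illustrative; the file name, buffer count, and block size
are made up, and this is not PostgreSQL code):
#define _GNU_SOURCE				/* for preadv() on glibc */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>
#define NBUFS	16
#define BLCKSZ	8192
int
main(void)
{
	struct iovec iov[NBUFS];
	int			fd = open("datafile", O_RDONLY);	/* hypothetical file */
	if (fd < 0)
		return 1;
	for (int i = 0; i < NBUFS; i++)
	{
		iov[i].iov_base = malloc(BLCKSZ);	/* buffers scattered around memory */
		iov[i].iov_len = BLCKSZ;
	}
	/* one system call instead of 16 aio_read()s or 16 pread()s */
	ssize_t		n = preadv(fd, iov, NBUFS, 0);
	printf("read %zd bytes\n", n);
	close(fd);
	return 0;
}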
(And then to get more and more pie-in-the-sky: (1) O_DIRECT is highly
desirable for zero-copy DMA to/from a user space buffer pool, (2)
starting more than one I/O with a single context switch and likewise
for consuming replies, (3) registering/locking memory pages and
descriptors with a port so they don't have to be pinned/unpinned by
the I/O subsystem all the time. And then, if Hurd works the way I
think it might, (4) to avoid chains of pipe-like scheduling overheads
when starting a direct I/O and maybe also some already-cached buffered
I/O, you'd ideally want ports to have a "fast" send path that behaves
like the old Spring/Solaris doors, where the caller's thread would
yield directly to a thread in the receiving server, forming a chain:
database -> file system -> driver -> device that is sort of
synchronous and then returns control, like a kind of dual of a system
call that reaches through the chain of user space service, and
presumably the same sort of thing on the way back from the interrupt
handler on completion. Idea (4) might well be Hurd/Mach heresy for
all I know, being totally out of the loop on this stuff; or perhaps
you already have something like that...)
Hello,
Thomas Munro, on Tue, Nov 18, 2025 at 18:32:38 +1300, wrote:
On Tue, Nov 18, 2025 at 12:31 PM Samuel Thibault
Possibly too private/undocumented anyway,
It's not really documented much, but it's completely public. One
can include <hurd/io_request.h> and call e.g. io_read_request(port,
reply_port, offset, amount). One then has to run a msgserver loop on the
reply_port to get the reply messages. An example can be seen in the hurd
source in trans/streamio.c, for e.g. device_open_request() calls.
OK, to continue the thought experiment... someone could write a new
io_method=hurd, and it'd have to be more efficient than handing the
work off to I/O worker processes (what you get with the default
io_method=worker), since the worker process clearly has to do exactly
the same thing internally in a synchronous wrapper function anyway,
just with extra steps to reach it.
Yes, you'd avoid having to block a whole thread for just one request,
and instead just queue requests, and process replies.
They'd need to be able to consume from each other's reply port
occasionally, but I assume that's possible with an exclusive lock and
a temporary transfer of receive rights.
Yes, you can transfer ports between processes.
I doubt it'd be much good without readv/writev operations, though.
It looks like they aren't in io_request.defs yet?
They have not been defined so far indeed.
Does that also imply that preadv() has to loop over the vectors
sending tons of messages and waiting for replies?
Currently glibc's preadv performs copies.
(And then to get more and more pie-in-the-sky: (1) O_DIRECT is highly
desirable for zero-copy DMA to/from a user space buffer pool,
We don't currently have that defined.
(2) starting more than one I/O with a single context switch and likewise
for consuming replies,
That would be possible by introducing in gnumach a multi-message variant
of the mach_msg() system call.
(3) registering/locking memory pages and descriptors with a port so
they don't have to be pinned/unpinned by the I/O subsystem all the
time.
That could be introduced too indeed.
And then, if Hurd works the way I think it might, (4) to avoid chains
of pipe-like scheduling overheads when starting a direct I/O and
maybe also some already-cached buffered I/O, you'd ideally want ports
to have a "fast" send path that behaves like the old Spring/Solaris
doors, where the caller's thread would yield directly to a thread in
the receiving server,
That was proposed/experimented, called migrating threads:
https://www.gnu.org/software/hurd/open_issues/mach_migrating_threads
Samuel
On Thu, Nov 20, 2025 at 11:45 AM Samuel Thibault
<samuel.thibault@gnu.org> wrote:
Thomas Munro, on Tue, 18 Nov 2025 18:32:38 +1300, wrote:
Does that also imply that preadv() has to loop over the vectors
sending tons of messages and waiting for replies?
Currently glibc's preadv performs copies.
Even without O_DIRECT (= potential scatter/gather DMA to/from user
space), the kernel/server still seems like a better place to put a
scatter/gather-with-memcpy() loop, because it has to copy the data
in/out anyway, so the data is copied twice with the user space
implementation, and user space additionally has to allocate/free or
allocate/hold a temporary contiguous buffer. In PostgreSQL, the
maximum transfer size is 1MB and the default is 128kB, so it's not
peanuts with many processes/threads needing their own double-buffer.
I thought about that when we started using preadv/pwritev in
PostgreSQL, and I had to decide how to handle the few systems that
don't have it. So I chose to make it loop over pread()/pwrite() in
user space[1].
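Roughly, that fallback amounts to something like this (a simplified
sketch of the approach, not the exact code in pg_iovec.h):
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Simplified sketch of the user-space fallback: emulate preadv() by
 * looping over pread().  pg_pwritev() does the mirror image with
 * pwrite(). */
static ssize_t
fallback_preadv(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
	ssize_t		sum = 0;
	ssize_t		part;

	for (int i = 0; i < iovcnt; ++i)
	{
		part = pread(fd, iov[i].iov_base, iov[i].iov_len, offset);
		if (part < 0)
			return sum > 0 ? sum : -1;	/* report data already read, else error */
		sum += part;
		offset += part;
		if ((size_t) part < iov[i].iov_len)
			break;				/* short read: stop, as preadv() may */
	}
	return sum;
}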
(At the time, only Solaris and Windows lacked it, out of around a
dozen systems we test; in another nice example of ecosystem
interaction, the Solaris kernel team saw this, added the syscalls,
and proposed them to POSIX-next[2], so it's down to Windows. Windows
can avoid this too and perform DMA for O_DIRECT, if we make some
architectural changes...)
About that 128kB/1MB number: that's another bit of unfinished business
for Hurd systems. We currently assume that if you didn't define
IOV_MAX, then it must be 16, which gives 16 * 8kB = 128kB. I know
that the comment about the Hurd in the relevant file[1] is wrong:
IOV_MAX is not required to be defined by POSIX, and I know the GNU
philosophy is to avoid arbitrary limits. I doubt it matters much in
practice, especially without direct I/O (where larger scatter/gather
might scale non-linearly, creating a sweet spot that is higher than
that), but it'd be nice to improve that...
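Spelled out, the numbers above come from the following (these are the
defaults under discussion, not hard limits):
/* Illustration only: default 8kB blocks times the 16-vector fallback
 * assumed when limits.h does not define IOV_MAX. */
#define BLCKSZ				8192	/* default PostgreSQL block size */
#define FALLBACK_IOV_MAX	16		/* assumed when IOV_MAX is undefined */
#define MAX_COMBINED_XFER	(FALLBACK_IOV_MAX * BLCKSZ)	/* 131072 = 128kB */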
I see that fs.defs can do fsync and fdatasync (= omit_metadata), which
is good; we'd make use of those too (v18 only does asynchronous reads,
but v19 will hopefully add writes). More pie-in-the-sky ideas include
(1) an O_DSYNC that is converted to FUA, (2) one that doesn't block
concurrent non-overlapping writes (PostgreSQL currently serialises its
WAL (transaction log) writes, but hopefully in future will learn not
to do that), and (3) one that, if FUA isn't supported by the storage,
flushes the drive write cache, unless it somehow knows this isn't
necessary because of powered caches. That's what Linux does, anyway.
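For example, a hypothetical flush path in an io_method=hurd might
reduce to something like this; the file_sync() signature is quoted
from memory of fs.defs and should be double-checked:
#include <stdbool.h>
#include <hurd.h>
#include <hurd/fs.h>

/* Untested sketch: map a flush request onto the fs.defs operation
 * mentioned above.  file_sync(file, wait, omit_metadata) is believed to
 * be the RPC behind fsync()/fdatasync() on the Hurd; verify against
 * fs.defs before relying on it. */
static error_t
flush_file(file_t file, bool data_only)
{
	/* wait = 1 makes it synchronous; omit_metadata = 1 is fdatasync()-like */
	return file_sync(file, 1, data_only ? 1 : 0);
}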
(And then to get more and more pie-in-the-sky: (1) O_DIRECT is highly
desirable for zero-copy DMA to/from a user space buffer pool,
We don't currently have that defined.
. o O { Is anyone trying to put ext4 or xfs into a Hurd server? }
(2) starting more than one I/O with a single context switch and likewise
for consuming replies,
That would be possible by introducing in gnumach a multi-message variant
of the mach_msg() system call.
. o O { If I were designing a new mach_msgs() I'd also be tempted to
try to make it so that the messages don't have to be copied in
during the system call, but instead can be accessed directly by the
receiver, which probably means registering VM pages with the port,
preventing faulting, and mapping them into both sender and receiver
until the port is closed, which probably also means you want a
circular queue to deal with the fixed space. I'm basically describing
io_uring's submission and completion queues, reimagined as user space
port buffers. }
(3) registering/locking memory pages and descriptors with a port so
they don't have to be pinned/unpinned by the I/O subsystem all the
time.
That could be introduced too indeed.
If that is something the Hurd project is ever looking into, there is
an interesting special case for registered socket buffers: if you have
10,000 mostly idle sockets, and you have a recv() in progress on all
of them, then you don't really want to have to supply 10,000 user
space buffers to receive into, so it'd be nice to be able to register
a user space socket buffer pool of some smaller size and let the I/O
subsystem pick a free buffer when a packet arrives and tell you which
one in the reply message. This is a problem that people meet when
they move from readiness-based networking to asynchronous networking,
with high connection counts. PostgreSQL can't do asynchronous socket
I/O yet, but a couple of us have had semi-working prototypes... (That
architecture would prepare for more zero-copy/DMA-based networking and
hopefully even offloading TLS to kernel threads or fancy network
cards, while I'm doing a tour of vapourware...)
And then, if Hurd works the way I think it might, (4) to avoid chains
of pipe-like scheduling overheads when starting a direct I/O and
maybe also some already-cached buffered I/O, you'd ideally want ports
to have a "fast" send path that behaves like the old Spring/Solaris
doors, where the caller's thread would yield directly to a thread in
the receiving server,
That was proposed/experimented, called migrating threads:
https://www.gnu.org/software/hurd/open_issues/mach_migrating_threads
Interesting paper. I wonder if Apple's XPC is also somehow
short-circuiting Mach RPC like this; for example, I think XPC is
used for talking to services like DNS lookup (a driving motivation for
doors on Solaris IIRC), but all their newer OS stuff is closed source
so... *shrug* :-/
Anyway, this sounds like quite a fun OS research project. I suspect
there wouldn't be too many other complex programs that could take
advantage of Mach's asynchrony to the degree PostgreSQL could, even
today but certainly later as its new AIO system spreads to more
parts... If the existing stability problems were resolved and out of
the way first, and if readv/writev operations were added, then I would
be willing to prototype a PostgreSQL patch to try it. Famous last
words perhaps, but it doesn't sound very hard: a minimal POC for a new
PostgreSQL I/O method weighs in at 200-400 lines of code, mostly
setup and mapping our abstractions to system calls, *if* it has
the right basic operations and semantics with no hidden traps. All
the rest of the vapourware we've discussed might just be independent
optimisation work on the Hurd side after that, to make it perform?
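To be concrete about "setup and mapping": such a POC would basically
need a submit path and a completion path, vaguely shaped like the
placeholder below. Every name in it is made up for illustration; the
real AIO callback interface lives under src/backend/storage/aio/.
#include <sys/types.h>
#include <stddef.h>
#include <mach.h>

/* Placeholder skeleton only: all names here are hypothetical, sketching
 * the shape of a submit/complete pair for an imagined io_method=hurd. */
typedef struct HurdQueuedIO
{
	mach_port_t file_port;		/* Hurd io port for the target file */
	off_t		offset;
	size_t		nbytes;
	void	   *buffer;
} HurdQueuedIO;

/* Submit side: fire one io_*_request message per queued I/O, all against
 * this backend's reply port, without blocking for completion. */
static int
hurd_method_submit(HurdQueuedIO *ios, int nios, mach_port_t reply_port)
{
	for (int i = 0; i < nios; i++)
	{
		/* e.g. io_read_request(ios[i].file_port, reply_port,
		 *                      ios[i].offset, ios[i].nbytes); */
	}
	return nios;
}

/* Completion side: drain reply messages from the reply port with
 * mach_msg(MACH_RCV_MSG, ...) and mark the matching queued I/Os done. */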
[1]: https://github.com/postgres/postgres/blob/master/src/include/port/pg_iovec.h
[2]: https://austingroupbugs.net/view.php?id=1832
Hello,
Thomas Munro, on Thu, 20 Nov 2025 17:20:57 +1300, wrote:
On Thu, Nov 20, 2025 at 11:45 AM Samuel Thibault
<samuel.thibault@gnu.org> wrote:
Thomas Munro, on Tue, 18 Nov 2025 18:32:38 +1300, wrote:
Does that also imply that preadv() has to loop over the vectors
sending tons of messages and waiting for replies?
Currently glibc's preadv performs copies.
Even without O_DIRECT (= potential scatter/gather DMA to/from user
space), the kernel/server still seems like a better place to put a
scatter/gather-with-memcpy() loop,
I'm not saying the current behavior is optimized :)
(And then to get more and more pie-in-the-sky: (1) O_DIRECT is highly
desirable for zero-copy DMA to/from a user space buffer pool,
We don't currently have that defined.
. o O { Is anyone trying to put ext4 or xfs into a Hurd server? }
The idea would be to leverage the FreeBSD implementation through the
rump layer.
(2) starting more than one I/O with a single context switch and likewise
for consuming replies,That would be possible by introducing in gnumach a multi-message variant
of the mach_msg() system call.
. o O { If I were designing a new mach_msgs() I'd also be tempted to
try to make it so that the messages don't have to be copied in
during the system call, but instead can be accessed directly by the
receiver,
That's already what happens: for out-of-line data, Mach maps the
data pages in/out through the RPC.
Samuel
Hello hackers,
24.09.2025 09:31, Michael Banck wrote:
On Mon, Sep 22, 2025 at 11:30:00PM +0300, Alexander Lakhin wrote:
Maybe I was wrong and we can at least categorize these failures -- I hope
their number is finite, but my point was that it's hardly possible to use
the information that fruitcrow gives us, to improve Postgres.
Or, for that matter, to improve GNU Mach/Hurd...
Another three months later, we can see all the fruitcrow failures
counted (113 so far) at:
https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures
with the following main categories:
1) test_shm_mq times out on Hurd animal fruitcrow due to OS issue
2) pg_stat_statements/entry_timestamp.sql failed due to zero time diff on Hurd animal fruitcrow
3) multiple-row-versions.spec fails on Hurd animal due to OS issue
4) Miscellaneous tests fail on Hurd animal due to invalid signal received
(and a few assorted failures)
I think it doesn't make much sense to keep tracking/sorting all the
failures produced by fruitcrow, so I'm going to just filter them out until
a new Hurd release.
Best regards,
Alexander