occasional ECPG failures on dikkop (FreeBSD)
Hi,
about a month ago dikkop started reporting occasional failures in ECPG
tests. I'm not very familiar with ecpg, and I've been unable to figure
this out so far, so I wonder if others might know ...
The failures seem to happen in maybe ~5% of the runs, but only when the
tests are run through the buildfarm client. I've been unable to
reproduce the issue manually, even when using exactly the same options
etc.
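For a sense of how many manual runs a reproduction attempt needs at that rate, here is a minimal sketch. It assumes the failures are independent with a fixed ~5% probability per run, which is only a guess (the buildfarm environment clearly matters). Both function names are hypothetical, and `cmd` stands for whatever command runs the failing test.

```python
import math
import subprocess


def runs_needed(failure_rate, confidence=0.95):
    """Smallest n such that at least one failure shows up in n runs
    with the given confidence, assuming independent runs."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


def count_signal_exits(cmd, runs, signal_number=11):
    """Run cmd repeatedly and count runs killed by the given signal.

    On POSIX, subprocess reports death by signal N as returncode -N.
    """
    crashes = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode == -signal_number:
            crashes += 1
    return crashes
```

Under that assumption, `runs_needed(0.05)` gives 59, so a few dozen clean manual runs do not rule much out.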
Two failures from master:
* https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dikkop&dt=2026-04-07%2011%3A00%3A39
* https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dikkop&dt=2026-05-04%2010%3A00%3A10
However, it seems to affect older branches too, all the way back to
REL_14_STABLE.
The failures started to appear ~30 days ago, which aligns with the
machine being upgraded from FreeBSD 14.1 to 14.4. (It might have been
running 14.3, not sure.)
The failures look like this:
ok 64 - thread/prep 732 ms
not ok 65 - thread/alloc 65 ms
# (test process was terminated by signal 11: Segmentation fault)
ok 66 - thread/descriptor 136 ms
so the problem seems to be in thread/alloc. But the system log says this:
$ grep 'signal 11' /var/log/messages
Apr 1 21:53:30 generic kernel: pid 27622 (thread_implicit), jid 0, uid
1001: exited on signal 11 (core dumped)
Apr 1 23:54:19 generic kernel: pid 50594 (alloc), jid 0, uid 1001:
exited on signal 11 (core dumped)
Apr 2 20:19:57 generic kernel: pid 53415 (prep), jid 0, uid 1001:
exited on signal 11 (core dumped)
Apr 4 02:07:58 generic kernel: pid 48615 (prep), jid 0, uid 1001:
exited on signal 11 (core dumped)
Apr 7 12:58:20 generic kernel: pid 17092 (alloc), jid 0, uid 1001:
exited on signal 11 (core dumped)
Apr 9 13:21:47 generic kernel: pid 65784 (alloc), jid 0, uid 1001:
exited on signal 11 (core dumped)
Apr 10 18:20:17 generic kernel: pid 67540 (thread_implicit), jid 0, uid
1001: exited on signal 11 (core dumped)
Apr 22 16:29:29 generic kernel: pid 10941 (prep), jid 0, uid 1001:
exited on signal 11 (core dumped)
Apr 22 20:29:47 generic kernel: pid 32964 (thread_implicit), jid 0, uid
1001: exited on signal 11 (core dumped)
Apr 22 23:34:54 generic kernel: pid 43109 (prep), jid 0, uid 1001:
exited on signal 11 (core dumped)
Apr 29 19:24:49 generic kernel: pid 81996 (thread), jid 0, uid 1001:
exited on signal 11 (core dumped)
Apr 30 10:58:42 generic kernel: pid 65438 (prep), jid 0, uid 1001:
exited on signal 11 (core dumped)
May 3 22:15:57 generic kernel: pid 21640 (prep), jid 0, uid 1001:
exited on signal 11 (core dumped)
May 4 12:08:15 generic kernel: pid 98832 (alloc), jid 0, uid 1001:
exited on signal 11 (core dumped)
May 5 12:04:33 generic kernel: pid 65140 (prep), jid 0, uid 1001:
exited on signal 11 (core dumped)
May 5 13:05:45 generic kernel: pid 12122 (prep), jid 0, uid 1001:
exited on signal 11 (core dumped)
So there are plenty of segfaults in the other ecpg tests too, it seems.
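To see which tests crash most often, the log lines can be tallied per program. This is a rough sketch, not something that was run on dikkop. The function name is hypothetical, and the regular expression only assumes the kernel message layout visible in the lines above.

```python
import re
from collections import Counter

# Matches e.g. "pid 50594 (alloc), jid 0, uid 1001: exited on signal 11"
SIGNAL_EXIT = re.compile(
    r"pid (\d+) \(([^)]+)\), jid \d+, uid \d+:\s+exited on signal (\d+)"
)


def crashes_by_program(log_text, signal_number=11):
    """Count signal exits per program name in syslog-style text."""
    # Collapse whitespace first so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    counts = Counter()
    for _pid, program, sig in SIGNAL_EXIT.findall(flat):
        if int(sig) == signal_number:
            counts[program] += 1
    return counts


sample = """\
Apr 1 21:53:30 generic kernel: pid 27622 (thread_implicit), jid 0, uid
1001: exited on signal 11 (core dumped)
Apr 1 23:54:19 generic kernel: pid 50594 (alloc), jid 0, uid 1001:
exited on signal 11 (core dumped)
Apr 2 20:19:57 generic kernel: pid 53415 (prep), jid 0, uid 1001:
exited on signal 11 (core dumped)
Apr 4 02:07:58 generic kernel: pid 48615 (prep), jid 0, uid 1001:
exited on signal 11 (core dumped)
"""
print(crashes_by_program(sample).most_common())
```

Counting the 16 entries quoted above by hand gives prep 8, alloc 4, thread_implicit 3 and thread 1, so all four crashing programs are thread tests and prep is the most frequent.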
Sadly, I haven't found any core files. I'll try to look again after the
next failure.
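Since the kernel reports "(core dumped)", the cores were presumably written somewhere. With FreeBSD's default kern.corefile pattern (%N.core) they land in the crashing process's working directory, so if the build tree is cleaned up after the run they may disappear with it. A minimal sketch for scanning a tree for such files follows. The function name is hypothetical, and it assumes the default naming pattern is in effect.

```python
from pathlib import Path


def find_core_files(root, programs):
    """Return files under root named '<program>.core' for the given programs."""
    wanted = {f"{name}.core" for name in programs}
    return sorted(
        path
        for path in Path(root).rglob("*.core")
        if path.is_file() and path.name in wanted
    )
```

Calling it with the buildfarm working directory and `["alloc", "prep", "thread", "thread_implicit"]` right after a failed run, before any cleanup, would be the obvious use.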
Any ideas? I don't see similar failures on other machines.
regards
--
Tomas Vondra
Tomas Vondra <tomas@vondra.me> writes:
> about a month ago dikkop started reporting occasional failures in ECPG
> tests. I'm not very familiar with ecpg, and I've been unable to figure
> this out so far, so I wonder if others might know ...
There's some related discussion in the ECPGprepared_statement()
thread:
/messages/by-id/75460b3c-255d-47eb-b889-d99de38e6758@gmail.com
I have no idea whether the bug that Andrew just identified explains
dikkop's problem, though.
regards, tom lane
On 5/6/26 16:00, Tom Lane wrote:
> Tomas Vondra <tomas@vondra.me> writes:
>> about a month ago dikkop started reporting occasional failures in ECPG
>> tests. I'm not very familiar with ecpg, and I've been unable to figure
>> this out so far, so I wonder if others might know ...
> There's some related discussion in the ECPGprepared_statement()
> thread:
> /messages/by-id/75460b3c-255d-47eb-b889-d99de38e6758@gmail.com
> I have no idea whether the bug that Andrew just identified explains
> dikkop's problem, though.
> regards, tom lane
Ah, I haven't noticed the last couple of messages mentioning dikkop. Not
sure if it's the same issue - I guess we'll have to wait and see if the
failures go away once it gets fixed.
Still, it seems like a long-standing issue. It's curious dikkop started
failing only very recently, after the OS upgrade.
thanks
--
Tomas Vondra
Tomas Vondra <tomas@vondra.me> writes:
> Ah, I haven't noticed the last couple of messages mentioning dikkop. Not
> sure if it's the same issue - I guess we'll have to wait and see if the
> failures go away once it gets fixed.
> Still, it seems like a long-standing issue. It's curious dikkop started
> failing only very recently, after the OS upgrade.
Yeah. Given that these are threading tests, I'm suspecting some
change in thread scheduling behavior in this latest FBSD release,
which somehow made it easier to hit a pre-existing issue.
regards, tom lane