Regression tests fail on OpenBSD due to low semmns value

Started by Alexander Lakhin about 1 year ago, 17 messages

Alexander Lakhin

exclusion@gmail.com

about 1 year ago

Hello hackers,

A recent buildfarm timeout failure on sawshark [1] made me wonder what's
wrong with that animal: besides that failure, this animal (running on
OpenBSD 7.4) produced "too many clients" errors from time to time, e.g.,
[2].

I deployed OpenBSD 7.4 locally and reproduced "too many clients" and that
hang as well. It turned out that OpenBSD has semmns as low as 60 (see [4])
and as a consequence, initdb sets max_connections = 20 for the regression
test database. (This can be helpful sometimes, see e.g., [5].) At the same
time, parallel_schedule contains groups of 20 tests, for instance:
# parallel group (20 tests): select_into random delete select_having select_distinct_on case prepared_xacts namespace select_implicit union arrays portals transactions select_distinct subselect update join aggregates hash_index btree_index

Moreover, prepared_xacts performs "\c", which briefly adds one more
connection, as postmaster.log shows:
2024-12-16 06:18:20.290 EET [regression][1563560:91][client backend] [pg_regress/prepared_xacts] LOG: statement: rollback;
...
2024-12-16 06:18:20.290 EET [regression][1563561:2][client backend] [[unknown]] FATAL: sorry, too many clients already
...
2024-12-16 06:18:20.291 EET [regression][1563560:95][client backend] [pg_regress/prepared_xacts] LOG: disconnection: session time: 0:00:00.018 user=law database=regression host=[local]

Setting sysctl kern.seminfo.semmns=120 makes the issue go away on this OS;
on the other hand, "too many clients" failures can be reproduced on other
OSes with "max_connections=20" in TEMP_CONFIG.

As to the hang, it can easily be reproduced with a TEMP_CONFIG containing:
max_connections=2
superuser_reserved_connections=0

and parallel_schedule as simple as:
test: transactions prepared_xacts
test: transactions prepared_xacts

Running `TEMP_CONFIG=.../extra.config make -s check`, I can see:
# +++ regress check in src/test/regress +++
...
# parallel group (2 tests): prepared_xacts transactions
not ok 1 + transactions 56 ms
not ok 2 + prepared_xacts 21 ms
# (test process exited with exit code 2)
# parallel group (2 tests):
### the test is hanging here ###

with one backend waiting inside:
#0 0x000070c41ed2a007 in epoll_wait (epfd=6, events=0x629f1ce529e8, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 0x0000629f1410d64a in WaitEventSetWaitBlock (set=0x629f1ce52980, cur_timeout=-1, occurred_events=0x7ffd4c4ffed0, nevents=1) at latch.c:1564
#2 0x0000629f1410d534 in WaitEventSetWait (set=0x629f1ce52980, timeout=-1, occurred_events=0x7ffd4c4ffed0, nevents=1, wait_event_info=134217779) at latch.c:1510
#3 0x0000629f1410c764 in WaitLatch (latch=0x70c41b86bc24, wakeEvents=33, timeout=0, wait_event_info=134217779) at latch.c:538
#4 0x0000629f1413d032 in ProcWaitForSignal (wait_event_info=134217779) at proc.c:1893
#5 0x0000629f14132eb9 in GetSafeSnapshot (origSnapshot=0x629f147ad360 <CurrentSnapshotData>) at predicate.c:1579
#6 0x0000629f14133261 in GetSerializableTransactionSnapshot (snapshot=0x629f147ad360 <CurrentSnapshotData>) at predicate.c:1695
#7 0x0000629f143afafe in GetTransactionSnapshot () at snapmgr.c:253
#8 0x0000629f1414a7b8 in exec_simple_query (query_string=0x629f1ce580f0 "SELECT * FROM writetest;") at postgres.c:1172
...

So GetSafeSnapshot() waits indefinitely for possibleUnsafeConflicts to
become empty (for the other backend to remove itself from the list of possible
conflicts inside ReleasePredicateLocks()), but that never happens.

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sawshark&dt=2024-12-11%2012%3A20%3A05
[2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sawshark&dt=2024-07-22%2001%3A20%3A22
[3]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sawshark&dt=2024-11-25%2006%3A20%3A22
[4]: https://man.openbsd.org/options
[5]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=73c9f91a1

Best regards,
Alexander

Tom Lane

tgl@sss.pgh.pa.us

about 1 year ago

In reply to: Alexander Lakhin (#1)

Re: Regression tests fail on OpenBSD due to low semmns value

Alexander Lakhin <exclusion@gmail.com> writes:

I deployed OpenBSD 7.4 locally and reproduced "too many clients" and that
hang as well. It turned out that OpenBSD has semmns as low as 60 (see [4])
and as a consequence, initdb sets max_connections = 20 for the regression
test database. (This can be helpful sometimes, see e.g., [5].) At the same
time, parallel_schedule contains groups of 20 tests, for instance:

Yeah. That was more-or-less okay before we invented parallel query,
but now there needs to be some headroom. I've thought about adjusting
initdb to not allow max_connections less than 25 (can't remember if
I actually proposed that on-list though). The other way would be to
rearrange parallel_schedule to make the max group size less than 20,
but that seems like a lot of effort for little benefit.

FTR, NetBSD also has unreasonably tiny semaphore settings out of the
box. mamba's host is using

kern.ipc.semmni=100
kern.ipc.semmns=1000

and for that matter

kern.maxvnodes=60000
kern.maxproc=1000
kern.maxfiles=10000

...
So GetSafeSnapshot() waits indefinitely for possibleUnsafeConflicts to
become empty (for the other backend to remove itself from the list of possible
conflicts inside ReleasePredicateLocks()), but that never happens.

This seems like an actual bug?

regards, tom lane

Andrew Dunstan

andrew@dunslane.net

about 1 year ago

In reply to: Tom Lane (#2)

Re: Regression tests fail on OpenBSD due to low semmns value

On 2024-12-16 Mo 12:23 AM, Tom Lane wrote:

Alexander Lakhin <exclusion@gmail.com> writes:

I deployed OpenBSD 7.4 locally and reproduced "too many clients" and that
hang as well. It turned out that OpenBSD has semmns as low as 60 (see [4])
and as a consequence, initdb sets max_connections = 20 for the regression
test database. (This can be helpful sometimes, see e.g., [5].) At the same
time, parallel_schedule contains groups of 20 tests, for instance:

Yeah. That was more-or-less okay before we invented parallel query,
but now there needs to be some headroom. I've thought about adjusting
initdb to not allow max_connections less than 25 (can't remember if
I actually proposed that on-list though). The other way would be to
rearrange parallel_schedule to make the max group size less than 20,
but that seems like a lot of effort for little benefit.

25 seems perfectly reasonable, these days. The current minimum was set
nearly 7 years ago.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Tom Lane

tgl@sss.pgh.pa.us

about 1 year ago

In reply to: Andrew Dunstan (#3)

Re: Regression tests fail on OpenBSD due to low semmns value

Andrew Dunstan <andrew@dunslane.net> writes:

On 2024-12-16 Mo 12:23 AM, Tom Lane wrote:

Yeah. That was more-or-less okay before we invented parallel query,
but now there needs to be some headroom. I've thought about adjusting
initdb to not allow max_connections less than 25 (can't remember if
I actually proposed that on-list though). The other way would be to
rearrange parallel_schedule to make the max group size less than 20,
but that seems like a lot of effort for little benefit.

25 seems perfectly reasonable, these days. The current minimum was set
nearly 7 years ago.

I poked at this a bit on an OpenBSD installation. The out-of-the-box
value of kern.seminfo.semmns seems to be 60, as Alexander said.
It turns out that we can run under that with max_connections = 20,
but not any higher value, the reason being that the number of
semaphores we need is

MaxConnections +
autovacuum_max_workers + 1 +
max_worker_processes +
max_wal_senders +
NUM_AUXILIARY_PROCS

or 20 + 3 + 1 + 8 + 10 + 6 = 48. We allocate semaphores in groups
of SEMAS_PER_SET (16), plus one per group for identification purposes,
so with this many semaphores needed we create 3 sets of 17 semaphores
= 51 semaphores. One more requested semaphore would require a fourth
set, putting us up to 68 semaphores, which is more than OpenBSD's SEMMNS.
So we're already on the hairy edge here.
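To make the arithmetic above easy to re-check, here is a small Python sketch. The function names are hypothetical and the defaults are simply the numbers quoted in the formula above (not values read from a running server); it models only the formula and the round-up-to-whole-sets behaviour described in this message, not the actual sysv_sema.c code.

```python
# Illustrative model of the semaphore arithmetic described above.
# Function names are hypothetical; defaults are the numbers quoted in the text.

def semaphores_needed(max_connections,
                      autovacuum_max_workers=3,
                      max_worker_processes=8,
                      max_wal_senders=10,
                      num_auxiliary_procs=6):
    """Per-process semaphores: one for each slot in the formula above."""
    return (max_connections
            + autovacuum_max_workers + 1
            + max_worker_processes
            + max_wal_senders
            + num_auxiliary_procs)


def semaphores_allocated(nsemas, semas_per_set=16):
    """Kernel semaphores consumed when allocating in whole sets.

    The number of sets is rounded up, and each set carries one extra
    semaphore for identification, so each set costs semas_per_set + 1.
    """
    nsets = -(-nsemas // semas_per_set)  # ceiling division
    return nsets * (semas_per_set + 1)


needed = semaphores_needed(20)
print(needed, semaphores_allocated(needed))  # 48 51
print(semaphores_allocated(needed + 1))      # 68, above OpenBSD's SEMMNS of 60
```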

Now we could just blow this off and say that we can't run on OpenBSD
at all without an increase in kern.seminfo.semmns. But that seems a
little sad, because there are easy things we could do to make this
less tight:

* Why in the world is the default value of max_wal_senders 10?
I find it hard to believe that there are installations using
more than about 3, and even there you can bet they are changing
a lot of other parameters.

* There's no reason that SEMAS_PER_SET has to be a power of 2. The
commentary in sysv_sema.c says "It must be *less than* your kernel's
SEMMSL (max semaphores per set) parameter, which is often around 25".
If we made it, say, 19, then we could allocate 3 sets (really 20
semaphores each, 60 in all) and accommodate up to 57 processes without
having to increase kern.seminfo.semmns.
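Turning the question around, the following sketch computes how many per-process semaphores fit under a given SEMMNS for a given SEMAS_PER_SET. Same caveats as any back-of-the-envelope check: the helper is hypothetical plain arithmetic, not PostgreSQL code, and it assumes all sets are the same size and that nothing else on the machine holds SysV semaphores.

```python
def max_usable_semaphores(semmns, semas_per_set):
    """Largest number of per-process semaphores that fit under SEMMNS.

    Assumes every set has semas_per_set usable semaphores plus one extra,
    that all sets are the same size, and that no other program holds
    SysV semaphores. The SEMMSL (max per set) limit is not checked here.
    """
    full_sets = semmns // (semas_per_set + 1)
    return full_sets * semas_per_set


for per_set in (16, 19):
    print(per_set, max_usable_semaphores(60, per_set))
# 16 48
# 19 57
```

With SEMMNS = 60 the set count stays at 3 either way; only the number of usable slots per set grows, which is where the extra nine semaphores come from.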

In short then, I propose:

* Increase initdb's minimum probed max_connections to 25.

* Reduce default value of max_wal_senders to 3 (or maybe 5
if people think that's too drastic).

* Change sysv_sema.c's SEMAS_PER_SET to 19.

On a stock OpenBSD setup, I find that this actually lets
us set max_connections to 30, so that there's some headroom
for the inevitable future growth of the number of background
processes.
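One rough way to see how the three proposals interact is to model initdb's probe as "pick the largest candidate max_connections whose semaphores fit". This is only a sketch: both candidate lists are assumptions for illustration, and the real initdb tests each value by actually starting a server rather than doing this arithmetic. The background-slot defaults come from the formula above.

```python
# Hypothetical model of initdb's max_connections probe; not initdb's code.
# Assumed candidate lists, tried largest first.
CURRENT_CANDIDATES = (100, 50, 40, 30, 20)
PROPOSED_CANDIDATES = (100, 50, 40, 30, 25)


def background_slots(max_wal_senders=10, autovacuum_max_workers=3,
                     max_worker_processes=8, num_auxiliary_procs=6):
    """Semaphores needed besides max_connections (formula quoted above)."""
    return (autovacuum_max_workers + 1 + max_worker_processes
            + max_wal_senders + num_auxiliary_procs)


def probe_max_connections(semmns, semas_per_set, max_wal_senders, candidates):
    """Return the largest candidate that fits under SEMMNS, or None."""
    usable = (semmns // (semas_per_set + 1)) * semas_per_set
    reserved = background_slots(max_wal_senders=max_wal_senders)
    for n in candidates:
        if n + reserved <= usable:
            return n
    return None


print(probe_max_connections(60, 16, 10, CURRENT_CANDIDATES))   # 20
print(probe_max_connections(60, 16, 10, PROPOSED_CANDIDATES))  # None
print(probe_max_connections(60, 19, 10, PROPOSED_CANDIDATES))  # 25
print(probe_max_connections(60, 19, 3, PROPOSED_CANDIDATES))   # 30
```

Under these assumptions the model gives 20 for today's settings, 30 with all three proposed changes, and 25 if max_wal_senders stays at 10. The None case (no candidate fits) shows why raising the probe floor to 25 only works together with a larger SEMAS_PER_SET.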

Of course, none of this is going to save owners of *BSD
buildfarm animals from needing to increase the kernel
parameters, because the regression tests launch multiple
postmasters in places. But I think it's friendly to novice
PG users if they can launch one postmaster without that.

regards, tom lane

Andres Freund

andres@anarazel.de

about 1 year ago

In reply to: Tom Lane (#4)

Re: Regression tests fail on OpenBSD due to low semmns value

Hi,

On 2024-12-16 12:52:46 -0500, Tom Lane wrote:

or 20 + 3 + 1 + 8 + 10 + 6 = 48. We allocate semaphores in groups
of SEMAS_PER_SET (16), plus one for identification purposes,
so with this many semaphores needed we create 3 sets of 17 semaphores
= 51 semaphores. One more requested semaphore would put us up to 68
semaphores which is more than OpenBSD's SEMMNS. So we're already on
the hairy edge here.

Now we could just blow this off and say that we can't run on OpenBSD
at all without an increase in kern.seminfo.semmns.

Given the numbers of users (or even testers) on openbsd that seems like it
might be a reasonable answer... But, see below.

* Why in the world is the default value of max_wal_senders 10?
I find it hard to believe that there are installations using
more than about 3, and even there you can bet they are changing
a lot of other parameters.

I don't think it's that rare as logical replication also needs a walsender
slot... I think we're going to hurt far more users by lowering this than we'd
help.

But I think it might be sane to have initdb probe a lower max_wal_senders
alongside lower max_connections settings. It seems to make sense to have a
lower max_wal_senders setting on machines that don't have enough resources to
run with max_connections=100.

Greetings,

Andres Freund

Tom Lane

tgl@sss.pgh.pa.us

about 1 year ago

In reply to: Andres Freund (#5)

Re: Regression tests fail on OpenBSD due to low semmns value

Andres Freund <andres@anarazel.de> writes:

On 2024-12-16 12:52:46 -0500, Tom Lane wrote:

* Why in the world is the default value of max_wal_senders 10?
I find it hard to believe that there are installations using
more than about 3, and even there you can bet they are changing
a lot of other parameters.

I don't think it's that rare as logical replication also needs a walsender
slot... I think we're going to hurt far more users by lowering this than we'd
help.

Hm, okay. If we just twiddle SEMAS_PER_SET we can still have
max_connections = 25 with max_wal_senders = 10, so doing that
much seems free.

regards, tom lane

Thomas Munro

thomas.munro@gmail.com

about 1 year ago

In reply to: Alexander Lakhin (#1)

Re: Regression tests fail on OpenBSD due to low semmns value

On Mon, Dec 16, 2024 at 6:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:

It turned out that OpenBSD has semmns as low as 60 (see [4])

Whenever I run into this, or my Mac requires manual ipcrm to clean up
leaked SysV kernel junk, I rebase my patch for sema_kind = 'futex'.
Here it goes. It could be updated to support NetBSD I believe, but I
didn't try as its futex stuff came out later.

Then I remember why I didn't go anywhere with it. It triggers a
thought loop about flipping it all around: use futexes to implement
lwlocks directly in place, and get rid of semaphores completely, but
that involves a few rabbit holes and sub-projects. From memory:
classic r/w lock implementation on futexes is tricky but doable in the
portability constraints, futex fallback implementation even works
surprisingly well but has fun memory map sub-problems, actually lwlock
is not really a classic r/w lock as it has sprouted extra funky APIs
that lead the intrepid rabbit-holer to design an entirely different
new concurrency primitive that is really wanted for those users, a
couple of other places use raw semaphores directly namely procarray.c
and clog.c and if you stare at those for long you will be overwhelmed
with a desire to rewrite them, EOVERFLOW.

Tom Lane

tgl@sss.pgh.pa.us

about 1 year ago

In reply to: Thomas Munro (#7)

Re: Regression tests fail on OpenBSD due to low semmns value

Thomas Munro <thomas.munro@gmail.com> writes:

Whenever I run into this, or my Mac requires manual ipcrm to clean up
leaked SysV kernel junk, I rebase my patch for sema_kind = 'futex'.
Here it goes. It could be updated to support NetBSD I believe, but I
didn't try as its futex stuff came out later.

FWIW, I looked at a nearby NetBSD 10.0 machine. It has
/usr/include/sys/futex.h, which includes this enticing comment:

/*
* Definitions for the __futex(2) synchronization primitive.
*
* These definitions are intended to be ABI-compatible with the
* Linux futex(2) system call.
*/

However, the complete lack of any user-level documentation makes
me misdoubt the extent of their commitment to this :-(

I have the same concern about depending on undocumented macOS
APIs. Other than that, getting off of SysV semaphores would be
a nice thing to do.

regards, tom lane

Peter Eisentraut

peter_e@gmx.net

about 1 year ago

In reply to: Andres Freund (#5)

Re: Regression tests fail on OpenBSD due to low semmns value

On 16.12.24 19:19, Andres Freund wrote:

* Why in the world is the default value of max_wal_senders 10?
I find it hard to believe that there are installations using
more than about 3, and even there you can bet they are changing
a lot of other parameters.

I don't think it's that rare as logical replication also needs a walsender
slot... I think we're going to hurt far more users by lowering this than we'd
help.

Here is where this change was originally discussed:
/messages/by-id/CABUevEy4PR_EAvZEzsbF5s+V0eEvw7shJ2t-AUwbHOjT+yRb3A@mail.gmail.com

The low semaphore settings on some BSD systems were also mentioned
there. Did anything change such that it is triggering more issues now?

#10

Tom Lane

tgl@sss.pgh.pa.us

about 1 year ago

In reply to: Peter Eisentraut (#9)

Re: Regression tests fail on OpenBSD due to low semmns value

Peter Eisentraut <peter@eisentraut.org> writes:

* Why in the world is the default value of max_wal_senders 10?

Here is where this change was originally discussed:
/messages/by-id/CABUevEy4PR_EAvZEzsbF5s+V0eEvw7shJ2t-AUwbHOjT+yRb3A@mail.gmail.com

Hmm. There was not a lot in that thread about which specific nonzero
value of max_wal_senders to use, but I do see

After some testing and searching for documentation, it seems that at
least the BSD platforms have a very low default semmns setting
(apparently 60, which leads to max_connections=30).

The low semaphore settings on some BSD systems were also mentioned
there. Did anything change such that it is triggering more issues now?

Yeah, we have more background-process slots reserved by default now.
There are parallel worker slots that were not there in v10, and I think
another one or two random auxiliary processes. So we fail to reach
max_connections=30 now.

As things stand today, we can allocate exactly 20 max_connections
because there are 28 background-process slots if all other parameters
are left at default, and 48 usable semaphores is as many as we
can create under the OpenBSD/NetBSD default of SEMMNS=60. So we're
skating at the hairy edge of whether the parallel regression tests
work reliably, and the next time somebody invents a new kind of
auxiliary process, it will stop working altogether.

My proposal to increase SEMAS_PER_SET to 19 would provide us nine
more usable semaphores under the default *BSD configuration.
With the change to initdb to probe 25 not 20 for max_connections,
five of those would go into max_connections and we'd have four
spares for new background processes. Maybe by the time that runs
out, we'll have found a better alternative to SysV semaphores.

The only downside I can see is that the current setup is able
to coexist with some other service that uses a small number of
SysV semaphores, while with these changes that would not work
without raising the platform SEMMNS limit. Realistically though
you're going to want to raise the platform limit for any sort of
production usage of Postgres. I think this discussion is just
about whether "make; make check" will work out-of-the-box, which
I think is a good goal to have.

regards, tom lane

#11

Andres Freund

andres@anarazel.de

about 1 year ago

In reply to: Tom Lane (#10)

Re: Regression tests fail on OpenBSD due to low semmns value

Hi,

On 2024-12-18 11:23:23 -0500, Tom Lane wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

After some testing and searching for documentation, it seems that at
least the BSD platforms have a very low default semmns setting
(apparently 60, which leads to max_connections=30).

The low semaphore settings on some BSD systems were also mentioned
there. Did anything change such that it is triggering more issues now?

Yeah, we have more background-process slots reserved by default now.
There are parallel worker slots that were not there in v10, and I think
another one or two random auxiliary processes. So we fail to reach
max_connections=30 now.

As things stand today, we can allocate exactly 20 max_connections
because there are 28 background-process slots if all other parameters
are left at default, and 48 usable semaphores is as many as we
can create under the OpenBSD/NetBSD default of SEMMNS=60. So we're
skating at the hairy edge of whether the parallel regression tests
work reliably, and the next time somebody invents a new kind of
auxiliary process, it will stop working altogether.

My proposal to increase SEMAS_PER_SET to 19 would provide us nine
more usable semaphores under the default *BSD configuration.
With the change to initdb to probe 25 not 20 for max_connections,
five of those would go into max_connections and we'd have four
spares for new background processes. Maybe by the time that runs
out, we'll have found a better alternative to SysV semaphores.

The only downside I can see is that the current setup is able
to coexist with some other service that uses a small number of
SysV semaphores, while with these changes that would not work
without raising the platform SEMMNS limit. Realistically though
you're going to want to raise the platform limit for any sort of
production usage of Postgres. I think this discussion is just
about whether "make; make check" will work out-of-the-box, which
I think is a good goal to have.

Maybe we should consider switching those platforms to unnamed posix
semaphores?

There were some not so great performance numbers in the past:
* openbsd, 2021: /messages/by-id/3010886.1634950831@sss.pgh.pa.us
* netbsd, 2022: /messages/by-id/20220828013914.5hzc7kvcpum5h2yn@awork3.anarazel.de

But TBH, nobody uses openbsd and netbsd if performance matters even one
iota. And considering a bunch of postgres changes to deal with idiotic default
sysv limits doesn't feel like a sensible thing to do in 2024.

Greetings,

Andres Freund

#12

Tom Lane

tgl@sss.pgh.pa.us

about 1 year ago

In reply to: Andres Freund (#11)

Re: Regression tests fail on OpenBSD due to low semmns value

Andres Freund <andres@anarazel.de> writes:

Maybe we should consider switching those platforms to unnamed posix
semaphores?

I already looked into that. OpenBSD still doesn't have cross-process
posix semaphores, at least according to its man page. NetBSD does,
but they consume an FD per sema, which is actually worse because
the default max-open-files-per-process is none too large either.

But TBH, nobody uses openbsd and netbsd if performance matters even one
iota. And considering a bunch of postgres changes to deal with idiotic default
sysv limits doesn't feel like a sensible thing to do in 2024.

Yeah, I would not expend a lot of effort on this. But two one-line
changes doesn't seem unreasonable.

regards, tom lane

#13

Andres Freund

andres@anarazel.de

about 1 year ago

In reply to: Tom Lane (#12)

Re: Regression tests fail on OpenBSD due to low semmns value

Hi,

On 2024-12-18 12:00:48 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Maybe we should consider switching those platforms to unnamed posix
semaphores?

I already looked into that. OpenBSD still doesn't have cross-process
posix semaphores, at least according to its man page.

Ugh, I had missed that:

This implementation does not support shared semaphores, and reports this fact
by setting errno to EPERM. This is perhaps a stretch of the intention of
POSIX, but is compliant, with the caveat that sem_init() always reports a
permissions error when an attempt to create a shared semaphore is made.

That's such a stupid argument that I kinda just want to rip openbsd
support out of postgres :)

NetBSD does, but they consume an FD per sema, which is actually worse
because the default max-open-files-per-process is none too large either.

Doesn't seem that bad on netbsd 10. Via Bilal's netbsd CI patch, I get:
# sysctl proc.curproc.rlimit.descriptors
proc.curproc.rlimit.descriptors.soft = 1024
proc.curproc.rlimit.descriptors.hard = 3404

But TBH, nobody uses openbsd and netbsd if performance matters even one
iota. And considering a bunch of postgres changes to deal with idiotic default
sysv limits doesn't feel like a sensible thing to do in 2024.

Yeah, I would not expend a lot of effort on this. But two one-line
changes doesn't seem unreasonable.

Agreed for stuff like SEMAS_PER_SET. I just don't think it's a good idea to
invest in lowering our default semaphore requirements by lowering various
default process limits or such.

Greetings,

Andres Freund

#14

Tom Lane

tgl@sss.pgh.pa.us

about 1 year ago

In reply to: Andres Freund (#13)

Re: Regression tests fail on OpenBSD due to low semmns value

Andres Freund <andres@anarazel.de> writes:

On 2024-12-18 12:00:48 -0500, Tom Lane wrote:

NetBSD does, but they consume an FD per sema, which is actually worse
because the default max-open-files-per-process is none too large either.

Doesn't seem that bad on netbsd 10. Via Bilal's netbsd CI patch, I get:
# sysctl proc.curproc.rlimit.descriptors
proc.curproc.rlimit.descriptors.soft = 1024
proc.curproc.rlimit.descriptors.hard = 3404

Hmm, on mamba's host I see

proc.curproc.rlimit.descriptors.soft = 128
proc.curproc.rlimit.descriptors.hard = 1772

I had actually tried building with unnamed semas there a couple days
ago, and found that the postmaster failed to start. 21fb39cb0 should
have alleviated that (didn't test it yet). But we're still in a
very limited-resource regime. That, together with the old performance
tests you dredged up, makes me not want to switch sema types.
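To put rough numbers on the file-descriptor concern: taking "one FD per semaphore" at face value and reusing the 28 default background slots mentioned earlier in the thread, a default max_connections of 100 would need 128 semaphores, which already equals mamba's soft limit before counting any other descriptors. The sketch below is a hypothetical check (the other_fds allowance is a guess), using Python's standard resource module to read the current soft limit on a Unix system.

```python
import resource


def fits_in_fd_limit(num_semaphores, soft_limit, other_fds=3):
    """True if one FD per semaphore plus other_fds stays within soft_limit.

    other_fds is a placeholder guess (stdin/stdout/stderr); a real
    postmaster holds more descriptors than that.
    """
    return num_semaphores + other_fds <= soft_limit


def current_soft_nofile():
    """Soft RLIMIT_NOFILE for this process, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return None if soft == resource.RLIM_INFINITY else soft


needed = 100 + 28  # max_connections=100 plus the default background slots
print(fits_in_fd_limit(needed, 128))   # False: mamba's soft limit
print(fits_in_fd_limit(needed, 1024))  # True: the NetBSD CI soft limit
print("this machine:", current_soft_nofile())
```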

Yeah, I would not expend a lot of effort on this. But two one-line
changes doesn't seem unreasonable.

Agreed for stuff like SEMAS_PER_SET. I just don't think it's a good idea to
invest in lowering our default semaphore requirements by lowering various
default process limits or such.

Fair, seems like we're on the same page.

regards, tom lane

#15

Tom Lane

tgl@sss.pgh.pa.us

about 1 year ago

In reply to: Tom Lane (#14)

Re: Regression tests fail on OpenBSD due to low semmns value

BTW, I did a little bit of performance testing using current OpenBSD
(7.6), and it looks like they partially fixed the performance issues
I saw with their named POSIX semaphores back in 2021. "pgbench -S"
seems to show TPS rates right about on par with a SysV-sema build.
There is still a measurable hit in connection startup time, about
18.8ms versus 16.7ms according to "pgbench -S -C" (with
max_connections set to 100). But that's probably not something
you'd notice if you weren't looking for it. Postmaster start/stop
time is still awful with max_connections = 10000, but how many
people are likely to try that? (It's a couple of seconds at 1000,
so I detect a strong whiff of an O(N^2) issue in there somewhere.)

So maybe we should think about switching OpenBSD to named semas
by default. One good thing about that is we'd have some buildfarm
coverage for that code path --- right now there are no platforms
that use it.

We'd still want to make the other changes I mentioned for NetBSD's
sake, though.

regards, tom lane

#16

Alexander Lakhin

exclusion@gmail.com

about 1 year ago

In reply to: Tom Lane (#15)

Re: Regression tests fail on OpenBSD due to low semmns value

Hello Tom,

16.12.2024 07:23, Tom Lane wrote:

Alexander Lakhin <exclusion@gmail.com> writes:

...
So GetSafeSnapshot() waits indefinitely for possibleUnsafeConflicts to
become empty (for the other backend to remove itself from the list of possible
conflicts inside ReleasePredicateLocks()), but that never happens.

This seems like an actual bug?

I've reproduced this behavior with two reduced SQL scripts.
prepared_xacts.sql:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
CREATE TABLE pxtest4 (a int);
PREPARE TRANSACTION 'regress_sub2';
\c -
COMMIT PREPARED 'regress_sub2';
-- the script ends prematurely and doesn't reach COMMIT when \c fails due
-- to the "too many clients" error.

transactions.sql:
SELECT pg_sleep(1);
CREATE TABLE writetest (a int);

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY, DEFERRABLE; -- ok
SELECT * FROM writetest; -- ok
COMMIT;

and parallel_schedule:
test: transactions prepared_xacts

So the "transactions" backend just waits for the prepared transaction to
finish.
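The mechanism can be illustrated with a toy model. This is not PostgreSQL's predicate.c; it only mirrors the idea described in the thread, that GetSafeSnapshot() for a SERIALIZABLE READ ONLY DEFERRABLE transaction sleeps until its list of possibly-unsafe concurrent read/write transactions is empty. The class and method names are hypothetical. As described above, the prepared transaction stays in that list until someone finishes it, so if the session that would run COMMIT PREPARED dies on "\c", a waiter without a timeout never wakes up.

```python
import threading


class SafeSnapshotModel:
    """Toy model of a deferrable read-only serializable transaction.

    Hypothetical class: it only mirrors the idea that the waiter sleeps
    until its set of possibly-unsafe concurrent read/write transactions
    is empty. It is not PostgreSQL's predicate.c logic.
    """

    def __init__(self, conflicting_xacts):
        self._cond = threading.Condition()
        self._possible_unsafe = set(conflicting_xacts)

    def finish(self, xact):
        """A conflicting transaction commits or aborts and drops out."""
        with self._cond:
            self._possible_unsafe.discard(xact)
            self._cond.notify_all()

    def wait_for_safe_snapshot(self, timeout=None):
        """Return True once no conflicts remain, False if timeout expires.

        With timeout=None this blocks until finish() empties the set,
        which never happens if nobody runs COMMIT PREPARED.
        """
        with self._cond:
            return self._cond.wait_for(lambda: not self._possible_unsafe,
                                       timeout)


# The session that would run COMMIT PREPARED died on "\c", so nothing
# ever calls finish(); only the timeout lets this waiter return.
stuck = SafeSnapshotModel({"regress_sub2"})
print(stuck.wait_for_safe_snapshot(timeout=0.1))  # False

# If another session commits the prepared transaction, the waiter proceeds.
unblocked = SafeSnapshotModel({"regress_sub2"})
threading.Timer(0.05, unblocked.finish, args=("regress_sub2",)).start()
print(unblocked.wait_for_safe_snapshot(timeout=5))  # True
```

In the backtrace earlier in the thread the real wait has no timeout (timeout=-1 in epoll_wait), which is why the test run hung instead of failing.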

19.12.2024 01:06, Tom Lane wrote:

We'd still want to make the other changes I mentioned for NetBSD's
sake, though.

Thank you for fixing that shortcoming!

Best regards,
Alexander

#17

Tom Lane

tgl@sss.pgh.pa.us

about 1 year ago

In reply to: Alexander Lakhin (#16)

Re: Regression tests fail on OpenBSD due to low semmns value

Alexander Lakhin <exclusion@gmail.com> writes:

I've reproduced this behavior with two reduced SQL scripts.
prepared_xacts.sql:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
CREATE TABLE pxtest4 (a int);
PREPARE TRANSACTION 'regress_sub2';
\c -
COMMIT PREPARED 'regress_sub2';
-- the script ends prematurely and doesn't reach COMMIT when \c fails due
-- to the "too many clients" error.

Hmm, okay. Not really a bug, or at least I don't see much we could
do about it.

It does seem odd that a prepared transaction --- which, at least
in theory, we should know won't do anything more --- can block
other serializable transactions. Maybe that could be improved,
but it sounds like a research project not a bug fix.

regards, tom lane
