Why does PostgresNode.pm set such a low value of max_wal_senders?

Started by Tom Lane over 5 years ago; 9 messages
#1 Tom Lane
tgl@sss.pgh.pa.us

I noticed this recent buildfarm failure:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2020-09-29%2018%3A45%3A17

which boils down to

error running SQL: 'psql:<stdin>:1: ERROR: could not connect to the publisher: FATAL: number of requested standby connections exceeds max_wal_senders (currently 5)'
while running 'psql -XAtq -d port=62411 host=/tmp/cmXKiWUDs9 dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'ALTER SUBSCRIPTION sub2 REFRESH PUBLICATION' at /home/pgbf/buildroot/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm line 1546.

Digging in the postmaster log shows that indeed we were at the limit
of 5 wal senders. One was about to exit (else this test could never
succeed at all), but it had not done so fast enough to avoid this
failure.

Further digging in the buildfarm archives shows that "number of requested
standby connections exceeds max_wal_senders" seems rather common on our
slower buildfarm members; e.g., there are two such complaints in prairiedog's
latest successful HEAD build. Apparently, most of the time this is
masked by the automatic restart of logrep workers, but when a test script
explicitly executes a replication command, it will notice if that
attempt fails to connect.

So I wonder why PostgresNode.pm is doing

print $conf "max_wal_senders = 5\n";

Considering that our default these days is 10 senders, and that a
walsender slot doesn't really cost much, this seems unduly cheapskate.
I propose raising this to 10.

There might be some value in the fact that this situation is exercising
the automatic-reconnection behavior, but if so I'd like to find a more
consistent way of testing that.

regards, tom lane

#2 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#1)
Re: Why does PostgresNode.pm set such a low value of max_wal_senders?

On 2020-Sep-29, Tom Lane wrote:

> So I wonder why PostgresNode.pm is doing
>
> print $conf "max_wal_senders = 5\n";
>
> Considering that our default these days is 10 senders, and that a
> walsender slot doesn't really cost much, this seems unduly cheapskate.
> I propose raising this to 10.

I suggest removing that line. max_wal_senders used to default to 0
when PostgresNode was touched to have this line in commit 89ac7004dad;
the global default was raised in f6d6d2920d2c.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#2)
Re: Why does PostgresNode.pm set such a low value of max_wal_senders?

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

> On 2020-Sep-29, Tom Lane wrote:
>
>> So I wonder why PostgresNode.pm is doing
>> print $conf "max_wal_senders = 5\n";
>> Considering that our default these days is 10 senders, and that a
>> walsender slot doesn't really cost much, this seems unduly cheapskate.
>> I propose raising this to 10.
>
> I suggest removing that line. max_wal_senders used to default to 0
> when PostgresNode was touched to have this line in commit 89ac7004dad;
> the global default was raised in f6d6d2920d2c.

Hm. We could do so back to v10 where that came in, and there are no
src/test/subscription tests before v10, so that should be sufficient.
Sold.

regards, tom lane

#4 Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#3)
Re: Why does PostgresNode.pm set such a low value of max_wal_senders?

On Tue, Sep 29, 2020 at 07:04:22PM -0400, Tom Lane wrote:

> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>
>> I suggest removing that line. max_wal_senders used to default to 0
>> when PostgresNode was touched to have this line in commit 89ac7004dad;
>> the global default was raised in f6d6d2920d2c.
>
> Hm. We could do so back to v10 where that came in, and there are no
> src/test/subscription tests before v10, so that should be sufficient.
> Sold.

+1.
--
Michael

#5 Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#3)
Re: Why does PostgresNode.pm set such a low value of max_wal_senders?

On Tue, Sep 29, 2020 at 07:04:22PM -0400, Tom Lane wrote:

> Hm. We could do so back to v10 where that came in, and there are no
> src/test/subscription tests before v10, so that should be sufficient.
> Sold.

Since this stuff has been committed, thorntail has shown a very
interesting failure with only the TAP tests of pg_receivewal:
# Running: pg_receivewal --slot test --create-slot
pg_receivewal: error: could not connect to server: FATAL: number of
requested standby connections exceeds max_wal_senders (currently 0)
not ok 13 - creating a replication slot

This animal uses the following settings; however, they should have no
impact on how the nodes of the TAP tests are configured, since that is
independent:
UBSan; force_parallel_mode; wal_level=minimal

extra_config in the buildfarm conf file does not impact the nodes of
TAP tests, and PGHOST gets set to the Unix-domain socket path when initializing
PostgresNode.pm for all the nodes involved in a test, so pg_receivewal
should connect to the correct node. The only thing I can think of is
that the environment forces max_wal_senders to 0 in this build.
Noah, is this machine doing anything specific?
--
Michael

#6 Noah Misch
noah@leadboat.com
In reply to: Michael Paquier (#5)
Re: Why does PostgresNode.pm set such a low value of max_wal_senders?

On Tue, Sep 29, 2020 at 06:13:46PM -0400, Tom Lane wrote:

> So I wonder why PostgresNode.pm is doing
>
> print $conf "max_wal_senders = 5\n";
>
> Considering that our default these days is 10 senders, and that a
> walsender slot doesn't really cost much, this seems unduly cheapskate.
> I propose raising this to 10.

In favor of minimal values, we've had semaphore-starved buildfarm members in
the past. Perhaps those days are over, seeing that this commit has not yet
broken a buildfarm member in that particular way. Keeping max_wal_senders=10
seems fine.

On Thu, Oct 01, 2020 at 12:15:38PM +0900, Michael Paquier wrote:

> Since this stuff has been committed, thorntail has shown a very
> interesting failure with only the TAP tests of pg_receivewal:
> # Running: pg_receivewal --slot test --create-slot
> pg_receivewal: error: could not connect to server: FATAL: number of
> requested standby connections exceeds max_wal_senders (currently 0)
> not ok 13 - creating a replication slot
>
> This animal uses the following settings; however, they should have no
> impact on how the nodes of the TAP tests are configured, since that is
> independent:
> UBSan; force_parallel_mode; wal_level=minimal
>
> extra_config in the buildfarm conf file does not impact the nodes of
> TAP tests

No, PostgreSQL commit 54c2ecb changed that. I recommend an explicit
max_wal_senders=10 in PostgresNode, which makes it easy to test
wal_level=minimal:

printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
make check-world TEMP_CONFIG=/tmp/minimal.conf

thorntail is doing the equivalent, hence the failures.
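The failure hinges on the order of lines in each node's postgresql.conf: when a parameter appears more than once, PostgreSQL uses the last entry. The sketch below models that, assuming (as the recommendation above implies) that TEMP_CONFIG is appended before the settings PostgresNode writes for streaming-capable nodes. effective_setting is a hypothetical helper, and it ignores include directives, quoting, and postgresql.auto.conf.

```shell
#!/bin/sh
# Hypothetical helper: print the effective value of parameter $2 in the
# config file $1, using the last assignment in the file (PostgreSQL's
# rule for duplicates), or $3 as the built-in default if it is never set.
effective_setting() {
    value=$(sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p" "$1" | tail -n 1)
    if [ -n "$value" ]; then echo "$value"; else echo "$3"; fi
}

dir=$(mktemp -d)
# The TEMP_CONFIG shown above for testing wal_level=minimal.
printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >"$dir/minimal.conf"

# ASSUMPTION: TEMP_CONFIG first, then the streaming node's own settings.
# Without an explicit max_wal_senders line, TEMP_CONFIG's 0 survives.
{ cat "$dir/minimal.conf"; echo 'wal_level = replica'; } >"$dir/without.conf"
# With the explicit line restored, the later entry wins.
{ cat "$dir/minimal.conf"; echo 'wal_level = replica'; echo 'max_wal_senders = 10'; } >"$dir/with.conf"

effective_setting "$dir/without.conf" max_wal_senders 10   # prints 0
effective_setting "$dir/with.conf" max_wal_senders 10      # prints 10
rm -rf "$dir"
```

With wal_level = replica and max_wal_senders = 0 the server starts but rejects replication connections, which matches the "(currently 0)" error seen on thorntail.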

Perhaps wal_level=minimal should stop its pedantic call for max_wal_senders=0.
As long as the relevant error messages are clear, it would be fine for
wal_level=minimal to ignore max_wal_senders and size resources as though
max_wal_senders=0. That could be one less snag for end users. (It's not
worth changing solely to save a line in PostgresNode, though.)
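The two positions on wal_level=minimal can be contrasted in a small model. This is a hypothetical shell function, not PostgreSQL's startup code: "strict" mirrors today's refusal to start when max_wal_senders > 0 is combined with wal_level=minimal, and "lax" follows the suggestion of sizing resources as though max_wal_senders=0. The message texts are invented for illustration.

```shell
#!/bin/sh
# Hypothetical model, not PostgreSQL source. Prints the number of
# walsender slots to allocate for the given settings.
#   $1 = wal_level, $2 = max_wal_senders, $3 = policy (strict|lax)
# strict: refuse to start (exit status 1), as the server does today.
# lax:    start anyway, allocate no walsender slots, note it only at LOG level.
walsender_slots() {
    if [ "$1" = minimal ] && [ "$2" -gt 0 ]; then
        if [ "$3" = strict ]; then
            echo "refusing to start: max_wal_senders > 0 requires wal_level replica or logical" >&2
            return 1
        fi
        echo "LOG: wal_level=minimal, treating max_wal_senders=$2 as 0" >&2
        echo 0
        return 0
    fi
    echo "$2"
}

walsender_slots replica 10 strict    # prints 10
walsender_slots minimal 10 lax       # prints 0 (plus a LOG line on stderr)
walsender_slots minimal 10 strict || echo "startup refused"
```

The lax branch makes the trade-off concrete: startup succeeds, but the only trace of the misconfiguration is a LOG line that is easy to overlook, and replication connections are rejected later.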

#7 Michael Paquier
michael@paquier.xyz
In reply to: Noah Misch (#6)
Re: Why does PostgresNode.pm set such a low value of max_wal_senders?

On Wed, Sep 30, 2020 at 10:38:59PM -0700, Noah Misch wrote:

> In favor of minimal values, we've had semaphore-starved buildfarm members in
> the past. Perhaps those days are over, seeing that this commit has not yet
> broken a buildfarm member in that particular way. Keeping max_wal_senders=10
> seems fine.

Indeed, I am not spotting anything suspicious here.

> No, PostgreSQL commit 54c2ecb changed that. I recommend an explicit
> max_wal_senders=10 in PostgresNode, which makes it easy to test
> wal_level=minimal:
>
> printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
> make check-world TEMP_CONFIG=/tmp/minimal.conf
>
> thorntail is doing the equivalent, hence the failures.

Ah, thanks, I had missed this piece. So we really need to have a
value set in this module after all.
--
Michael

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#7)
Re: Why does PostgresNode.pm set such a low value of max_wal_senders?

Michael Paquier <michael@paquier.xyz> writes:

> On Wed, Sep 30, 2020 at 10:38:59PM -0700, Noah Misch wrote:
>
>> In favor of minimal values, we've had semaphore-starved buildfarm members in
>> the past. Perhaps those days are over, seeing that this commit has not yet
>> broken a buildfarm member in that particular way. Keeping max_wal_senders=10
>> seems fine.
>
> Indeed, I am not spotting anything suspicious here.

Yeah, so far so good. Note that PostgresNode.pm does attempt to cater for
semaphore-starved machines, by cutting max_connections as much as it can.
In practice the total semaphore usage of a subscription test is probably
still less than that of one postmaster with default max_connections.
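That estimate can be checked with back-of-the-envelope arithmetic. The sketch assumes the System V semaphore sizing given in the PostgreSQL 12/13 kernel-resources documentation (semaphores allocated in sets of 17, one set per 16 slots counted from max_connections, autovacuum_max_workers, max_wal_senders and max_worker_processes, plus 5), stock defaults of 100 connections, 3 autovacuum workers, 10 walsenders and 8 worker processes, and a test node whose max_connections has been cut to 10. The function name and the two-node scenario are illustrative assumptions.

```shell
#!/bin/sh
# ASSUMPTION: sizing formula from the PostgreSQL 12/13 docs,
#   sets = ceil((max_connections + autovacuum_max_workers
#                + max_wal_senders + max_worker_processes + 5) / 16)
#   semaphores = sets * 17
# Arguments: max_connections autovacuum_max_workers max_wal_senders max_worker_processes
semaphores() {
    slots=$(( $1 + $2 + $3 + $4 + 5 ))
    sets=$(( (slots + 15) / 16 ))    # integer ceiling of slots / 16
    echo $(( sets * 17 ))
}

semaphores 100 3 10 8                   # one default postmaster: prints 136
semaphores 10 3 5 8                     # test node, max_wal_senders = 5: prints 34
semaphores 10 3 10 8                    # test node, max_wal_senders = 10: prints 51
echo $(( 2 * $(semaphores 10 3 10 8) )) # publisher + subscriber: prints 102
```

Under these assumptions, raising max_wal_senders from 5 to 10 costs one extra 17-semaphore set per test node, and a two-node test still uses fewer semaphores than one default postmaster; a three-node test (153) would already exceed it.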

>> No, PostgreSQL commit 54c2ecb changed that. I recommend an explicit
>> max_wal_senders=10 in PostgresNode, which makes it easy to test
>> wal_level=minimal:
>
> Ah, thanks, I had missed this piece. So we really need to have a
> value set in this module after all.

Agreed, I'll go put it back.

On the other point, I think that we should continue to complain
about max_wal_senders > 0 with wal_level = minimal. If we reduce
that to a LOG message, which'd be the net effect of trying to be
laxer, people wouldn't see it and would then wonder why they can't
start replication.

regards, tom lane

#9 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Noah Misch (#6)
Re: Why does PostgresNode.pm set such a low value of max_wal_senders?

At Wed, 30 Sep 2020 22:38:59 -0700, Noah Misch <noah@leadboat.com> wrote in
noah> Perhaps wal_level=minimal should stop its pedantic call for max_wal_senders=0.
noah> As long as the relevant error messages are clear, it would be fine for
noah> wal_level=minimal to ignore max_wal_senders and size resources as though
noah> max_wal_senders=0. That could be one less snag for end users. (It's not
noah> worth changing solely to save a line in PostgresNode, though.)

At Thu, 01 Oct 2020 09:42:52 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in
tgl> On the other point, I think that we should continue to complain
tgl> about max_wal_senders > 0 with wal_level = minimal. If we reduce
tgl> that to a LOG message, which'd be the net effect of trying to be
tgl> laxer, people wouldn't see it and would then wonder why they can't
tgl> start replication.

FWIW, I'm on Noah's side.

One reason is that, if we implement the feature for changing relation
persistence in place for bulk-data loading, wal_level would get flipped
to minimal and then back to replica or logical. The restriction on
max_wal_senders is a pain in the ass in that case.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center