port conflicts when running tests concurrently on windows.
Hi,
With the meson patch applied the tests on windows run concurrently by
default. Unfortunately that fails semi-regularly. The reason for this
basically is that windows defaults to using TCP in tests, and that the
tap-test port determination is very racy:
# When selecting a port, we look for an unassigned TCP port number,
# even if we intend to use only Unix-domain sockets. This is clearly
# necessary on $use_tcp (Windows) configurations, and it seems like a
# good idea on Unixen as well.
$port = get_free_port();
...
=item get_free_port()Locate an unprivileged (high) TCP port that's not currently bound to
anything. This is used by C<new()>, and also by some test cases that need to
start other, non-Postgres servers.Ports assigned to existing PostgreSQL::Test::Cluster objects are automatically
excluded, even if those servers are not currently running.XXX A port available now may become unavailable by the time we start
the desired service.
I don't think there's an easy way to make this race-free. We'd need to teach
postmaster to use pre-opened socket or something like that.
An alternative to that would be to specify a base port number externally. In
the meson branch I already did that for the pg_regress style tests, since they
don't have the automatic port thing above. But for tap tests there's currently
no way to pass in a base-port that I can see.
Is it perhaps time to to use unix sockets on windows by default
(i.e. PG_TEST_USE_UNIX_SOCKETS), at least when on a new enough windows?
Greetings,
Andres Freund
Hi,
On 2021-12-08 14:45:50 -0800, Andres Freund wrote:
Is it perhaps time to to use unix sockets on windows by default
(i.e. PG_TEST_USE_UNIX_SOCKETS), at least when on a new enough windows?
On its own PG_TEST_USE_UNIX_SOCKETS doesn't work at all on windows - it fails
trying to use /tmp/ as a socket directory. Using PG_REGRESS_SOCK_DIR fixes
that for PG_REGRESS. But the tap tests don't look at that :(.
Greetings,
Andres Freund
Hi,
On 2021-12-08 16:36:14 -0800, Andres Freund wrote:
On 2021-12-08 14:45:50 -0800, Andres Freund wrote:
Is it perhaps time to to use unix sockets on windows by default
(i.e. PG_TEST_USE_UNIX_SOCKETS), at least when on a new enough windows?On its own PG_TEST_USE_UNIX_SOCKETS doesn't work at all on windows - it fails
trying to use /tmp/ as a socket directory. Using PG_REGRESS_SOCK_DIR fixes
that for PG_REGRESS. But the tap tests don't look at that :(.
The tap failures in turn are caused by the Cluster.pm choosing a socket
directory with backslashes. Those backslashes are then treated as an escape
character both by guc.c and libpq.
I think this can be addressed by something like
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 9467a199c8f..c2a8487bbab 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -119,7 +119,15 @@ INIT
$use_tcp = !$PostgreSQL::Test::Utils::use_unix_sockets;
$test_localhost = "127.0.0.1";
$last_host_assigned = 1;
- $test_pghost = $use_tcp ? $test_localhost : PostgreSQL::Test::Utils::tempdir_short;
+ if ($use_tcp)
+ {
+ $test_pghost = $test_localhost;
+ }
+ else
+ {
+ $test_pghost = PostgreSQL::Test::Utils::tempdir_short;
+ $test_pghost =~ s!\\!/!g if $PostgreSQL::Test::Utils::windows_os;
+ }
$ENV{PGHOST} = $test_pghost;
$ENV{PGDATABASE} = 'postgres';
I wonder if we need a host2unix() helper accompanying perl2host()? Seems nicer
than sprinkling s!\\!/!g if $PostgreSQL::Test::Utils::windows_os in a growing
number of places...
Greetings,
Andres Freund
Hi,
On 2021-12-08 17:03:07 -0800, Andres Freund wrote:
On 2021-12-08 16:36:14 -0800, Andres Freund wrote:
On 2021-12-08 14:45:50 -0800, Andres Freund wrote:
Is it perhaps time to to use unix sockets on windows by default
(i.e. PG_TEST_USE_UNIX_SOCKETS), at least when on a new enough windows?On its own PG_TEST_USE_UNIX_SOCKETS doesn't work at all on windows - it fails
trying to use /tmp/ as a socket directory. Using PG_REGRESS_SOCK_DIR fixes
that for PG_REGRESS. But the tap tests don't look at that :(.The tap failures in turn are caused by the Cluster.pm choosing a socket
directory with backslashes. Those backslashes are then treated as an escape
character both by guc.c and libpq.I think this can be addressed by something like
[...]
That indeed seems to pass [1]https://cirrus-ci.com/task/5055530235330560?logs=ssl_test#L5 where it previously failed [2]https://cirrus-ci.com/task/5524596901281792?logs=ssl_test#L5. There's another
failure that I haven't diagnosed yet, but it's independent of tcp vs unix sockets.
[1]: https://cirrus-ci.com/task/5055530235330560?logs=ssl_test#L5
[2]: https://cirrus-ci.com/task/5524596901281792?logs=ssl_test#L5
Greetings,
Andres Freund
On Thu, Dec 9, 2021 at 11:46 AM Andres Freund <andres@anarazel.de> wrote:
Is it perhaps time to to use unix sockets on windows by default
(i.e. PG_TEST_USE_UNIX_SOCKETS), at least when on a new enough windows?
Makes sense. As a data point, it looks like this feature is in all
supported releases of Windows. It arrived in 1803, already EOL'd, and
IIUC even a Windows Server 2016 "LTSC" system that's been disconnected
from the internet and refusing all updates reaches "mainstream EOL"
next month. (Not a Windows person myself, but I've been looking at
this stuff while contemplating various filesystem-related changes...
there it's a little murkier, you need a WSL1-era kernel *and* you need
to be running on top of local NTFS, which is harder for us to expect.)
https://docs.microsoft.com/en-us/lifecycle/products/windows-10-enterprise-and-education
https://docs.microsoft.com/en-us/windows-server/get-started/windows-server-release-info
https://en.wikipedia.org/wiki/List_of_Microsoft_Windows_versions
On 09.12.21 03:44, Thomas Munro wrote:
On Thu, Dec 9, 2021 at 11:46 AM Andres Freund<andres@anarazel.de> wrote:
Is it perhaps time to to use unix sockets on windows by default
(i.e. PG_TEST_USE_UNIX_SOCKETS), at least when on a new enough windows?
Makes sense to get this to work, at least as an option.
Makes sense. As a data point, it looks like this feature is in all
supported releases of Windows. It arrived in 1803, already EOL'd, and
IIUC even a Windows Server 2016 "LTSC" system that's been disconnected
from the internet and refusing all updates reaches "mainstream EOL"
next month.
I believe the "18" in "1803" refers to 2018. We have Windows buildfarm
members that mention 2016 and 2017 in their title. Would those be in
trouble?
On 12/9/21 08:35, Peter Eisentraut wrote:
On 09.12.21 03:44, Thomas Munro wrote:
On Thu, Dec 9, 2021 at 11:46 AM Andres Freund<andres@anarazel.de>
wrote:Is it perhaps time to to use unix sockets on windows by default
(i.e. PG_TEST_USE_UNIX_SOCKETS), at least when on a new enough windows?Makes sense to get this to work, at least as an option.
Makes sense. As a data point, it looks like this feature is in all
supported releases of Windows. It arrived in 1803, already EOL'd, and
IIUC even a Windows Server 2016 "LTSC" system that's been disconnected
from the internet and refusing all updates reaches "mainstream EOL"
next month.I believe the "18" in "1803" refers to 2018. We have Windows
buildfarm members that mention 2016 and 2017 in their title. Would
those be in trouble?
Probably not if they have been updated. I have Windows machines
substantially older that 2018 but now running versions dated later.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On 12/8/21 20:03, Andres Freund wrote:
I wonder if we need a host2unix() helper accompanying perl2host()? Seems nicer
than sprinkling s!\\!/!g if $PostgreSQL::Test::Utils::windows_os in a growing
number of places...
Probably a good idea. I would call it canonical_path or something like
that. / works quite happily as a path separator in almost all contexts
on Windows - there are a handful of command line programs that choke on
it - but sometimes you need to quote the path. WHen I recently provided
for cross version upgrade testing on MSVC builds I just quoted
everything. On Unix/Msys the shell just removes the quotes.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Hi,
On 2021-12-09 14:35:37 +0100, Peter Eisentraut wrote:
On 09.12.21 03:44, Thomas Munro wrote:
On Thu, Dec 9, 2021 at 11:46 AM Andres Freund<andres@anarazel.de> wrote:
Is it perhaps time to to use unix sockets on windows by default
(i.e. PG_TEST_USE_UNIX_SOCKETS), at least when on a new enough windows?Makes sense to get this to work, at least as an option.
With https://github.com/anarazel/postgres/commit/046203741803da863f6129739fd215f8a32ec357
all tests pass. pg_regress requires PG_REGRESS_SOCK_DIR because it checks for
TMPDIR, but windows only has TMP and TEMP.
Makes sense. As a data point, it looks like this feature is in all
supported releases of Windows. It arrived in 1803, already EOL'd, and
IIUC even a Windows Server 2016 "LTSC" system that's been disconnected
from the internet and refusing all updates reaches "mainstream EOL"
next month.I believe the "18" in "1803" refers to 2018. We have Windows buildfarm
members that mention 2016 and 2017 in their title. Would those be in
trouble?
Perhaps it could make sense to print the windows version somewhere as part of
a windows build? Perhaps in the buildfarm client? Seems like it could be
generally useful, outside of this specific issue.
The most minimal thing would be to just print cmd /c ver or
such. systeminfo.exe output could also be useful, but has a bit of runtime
(1.5s on my windows VM).
Greetings,
Andres Freund
On 09.12.21 19:41, Andres Freund wrote:
Withhttps://github.com/anarazel/postgres/commit/046203741803da863f6129739fd215f8a32ec357
all tests pass. pg_regress requires PG_REGRESS_SOCK_DIR because it checks for
TMPDIR, but windows only has TMP and TEMP.
This looks reasonable so far. The commit messages
8f3ec75de4060d86176ad4ac998eeb87a39748c2 and
1d53432ff940b789c2431ba476a2a6e2db3edf84 contain some notes about what I
thought at the time didn't work yet. In particular, the pg_upgrade
tests don't support the use of Unix sockets on Windows. (Those tests
have been rewritten since, so I don't know what the status is.) Also,
the comment in pg_regress.c at remove_temp() is still there. These
things should probably be addressed before we can consider making this
the default.
Hi,
On 2021-12-10 10:22:13 +0100, Peter Eisentraut wrote:
On 09.12.21 19:41, Andres Freund wrote:
Withhttps://github.com/anarazel/postgres/commit/046203741803da863f6129739fd215f8a32ec357
all tests pass. pg_regress requires PG_REGRESS_SOCK_DIR because it checks for
TMPDIR, but windows only has TMP and TEMP.This looks reasonable so far.
I pushed that part, since we clearly need something like them.
The commit messages 8f3ec75de4060d86176ad4ac998eeb87a39748c2 and
1d53432ff940b789c2431ba476a2a6e2db3edf84 contain some notes about what I
thought at the time didn't work yet.
In particular, the pg_upgrade tests
don't support the use of Unix sockets on Windows. (Those tests have been
rewritten since, so I don't know what the status is.)
ISTM we still use two different implementations of the pg_upgrade tests :(. I
recall there being some recent-ish work on moving it to be a tap test, but
apparently not yet committed.
It doesn't look like the vcregress.pl implementation respects
PG_TEST_USE_UNIX_SOCKETS right now.
pg_regress.c at remove_temp() is still there. These things should probably
be addressed before we can consider making this the default.
Hm, not immediately obvious what to do about this. Do you know if windows has
restrictions around the length of unix domain sockets? If not, I wonder if it
could be worth using the data directory as the socket path on windows instead
of the separate temp directory?
Greetings,
Andres Freund
On 13.12.21 20:33, Andres Freund wrote:
pg_regress.c at remove_temp() is still there. These things should probably
be addressed before we can consider making this the default.Hm, not immediately obvious what to do about this. Do you know if windows has
restrictions around the length of unix domain sockets? If not, I wonder if it
could be worth using the data directory as the socket path on windows instead
of the separate temp directory?
According to src/include/port/win32.h, it's the same 108 or so bytes
that everyone else uses. That might not be the kernel limit, however,
just the way the external API is defined.
After the reading the material again, I think that comment might be
overly cautious. The list of things not to do in a signal handler on
Windows is not significantly different than those for say Linux, yet we
do a lot of them anyway. I'm tempted to just ignore the advice and do
it anyway, while ignoring errors, which is what it already does.