Sporadic connection-setup-related test failures on Cygwin in v15-

Started by Alexander Lakhin over 1 year ago, 5 messages
#1 Alexander Lakhin
exclusion@gmail.com

Hello hackers,

A recent lorikeet (a Cygwin animal) failure [1] revealed one more
long-standing (see also [2], [3], [4]) issue related to Cygwin:
  SELECT dblink_connect('dtest1', connection_parameters());
- dblink_connect
-----------------
- OK
-(1 row)
-
+ERROR:  could not establish connection
+DETAIL:  could not connect to server: Connection refused

where inst/logfile contains:
2024-07-16 05:38:21.492 EDT [66963f67.7823:4] LOG:  could not accept new connection: Software caused connection abort
2024-07-16 05:38:21.492 EDT [66963f8c.79e5:170] pg_regress/dblink ERROR:  could not establish connection
2024-07-16 05:38:21.492 EDT [66963f8c.79e5:171] pg_regress/dblink DETAIL:  could not connect to server: Connection refused
        Is the server running locally and accepting
        connections on Unix domain socket "/home/andrew/bf/root/tmp/buildfarm-DK1yh4/.s.PGSQL.5838"?

I made a standalone reproducing script (assuming the dblink extension is
installed):
numclients=50
for ((i=1;i<=1000;i++)); do
echo "iteration $i"

for ((c=1;c<=numclients;c++)); do
cat << 'EOF' | /usr/local/pgsql/bin/psql >/dev/null 2>&1 &

SELECT 'dbname='|| current_database()||' port='||current_setting('port')
  AS connstr
\gset

SELECT * FROM dblink('service=no_service', 'SELECT 1') AS t(i int);

SELECT * FROM
dblink(:'connstr', 'SELECT 1') AS t1(i int),
dblink(:'connstr', 'SELECT 2') AS t2(i int),
dblink(:'connstr', 'SELECT 3') AS t3(i int),
dblink(:'connstr', 'SELECT 4') AS t4(i int),
dblink(:'connstr', 'SELECT 5') AS t5(i int);
EOF
done
wait

grep -A1 "Software caused connection abort" server.log && break;
done

which fails for me as below:
iteration 318
2024-07-24 04:19:46.511 PDT [29062:6][postmaster][:0] LOG:  could not accept new connection: Software caused connection abort
2024-07-24 04:19:46.512 PDT [25312:8][client backend][36/1996:0] ERROR:  could not establish connection

The important fact here is that this failure is not reproduced after
7389aad63 (in v16), so it seems that it's somehow related to signal
processing. Given that, I'm inclined to stop here, without digging deeper,
at least until there are plans to backport that fix or something...
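Since the failure stops reproducing once 7389aad63 is present, it can help to check which release branches actually contain that commit. This is a minimal sketch, assuming a local clone of the PostgreSQL git repository with the REL_*_STABLE branches fetched; the helper name branches_containing is hypothetical, while `git branch --all --contains ... --format=...` is standard git:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the REL_*_STABLE branches of a local
# PostgreSQL clone whose history includes the given commit.
# Assumes the current directory is a git checkout with those branches
# available locally or as remote-tracking branches.
branches_containing() {
    local commit=$1
    # --contains keeps only branches that include $commit;
    # --format prints bare ref names without the "* " current-branch marker.
    git branch --all --contains "$commit" --format='%(refname:short)' \
        | grep -E 'REL_[0-9]+_STABLE' \
        | sort -u
}
```

Inside a clone, `branches_containing 7389aad63` prints the stable branches that already have the change; per this thread, branches older than REL_16_STABLE should be absent, since the work was not back-patched.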

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2024-07-16%2009%3A18%3A31 (REL_13_STABLE)
[2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2022-07-21%2000%3A36%3A44 (REL_14_STABLE)
[3]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2023-07-06%2009%3A19%3A36 (REL_12_STABLE)
[4]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2022-02-12%2001%3A40%3A56 (REL_13_STABLE, postgres_fdw)

Best regards,
Alexander

#2 Thomas Munro
thomas.munro@gmail.com
In reply to: Alexander Lakhin (#1)
Re: Sporadic connection-setup-related test failures on Cygwin in v15-

On Thu, Jul 25, 2024 at 1:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:

> The important fact here is that this failure is not reproduced after
> 7389aad63 (in v16), so it seems that it's somehow related to signal
> processing. Given that, I'm inclined to stop here, without digging deeper,
> at least until there are plans to backport that fix or something...

+1. I'm not planning to back-patch that work. Perhaps lorikeet
could stop testing releases < 16? They don't work and it's not our
bug[1]. We decided not to drop Cygwin support[2], but I don't think
we're learning anything from investigating that noise in the
known-broken branches.

[1]: https://sourceware.org/legacy-ml/cygwin/2017-08/msg00048.html
[2]: /messages/by-id/5e6797e9-bc26-ced7-6c9c-59bca415598b@dunslane.net

#3 Alexander Lakhin
exclusion@gmail.com
In reply to: Thomas Munro (#2)
Re: Sporadic connection-setup-related test failures on Cygwin in v15-

24.07.2024 23:58, Thomas Munro wrote:

> +1. I'm not planning to back-patch that work. Perhaps lorikeet
> could stop testing releases < 16? They don't work and it's not our
> bug[1]. We decided not to drop Cygwin support[2], but I don't think
> we're learning anything from investigating that noise in the
> known-broken branches.

Yeah, it looks like lorikeet votes +[1] for your proposal.
(I suppose it failed due to the same signal processing issue, just another
way.)

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2024-07-24%2008%3A54%3A07

Best regards,
Alexander

#4 Andrew Dunstan
andrew@dunslane.net
In reply to: Thomas Munro (#2)
Re: Sporadic connection-setup-related test failures on Cygwin in v15-

On 2024-07-24 We 4:58 PM, Thomas Munro wrote:

> On Thu, Jul 25, 2024 at 1:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:

> > The important fact here is that this failure is not reproduced after
> > 7389aad63 (in v16), so it seems that it's somehow related to signal
> > processing. Given that, I'm inclined to stop here, without digging deeper,
> > at least until there are plans to backport that fix or something...

> +1. I'm not planning to back-patch that work. Perhaps lorikeet
> could stop testing releases < 16? They don't work and it's not our
> bug[1]. We decided not to drop Cygwin support[2], but I don't think
> we're learning anything from investigating that noise in the
> known-broken branches.

Sure, it can. I've made that change.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#5 Alexander Lakhin
exclusion@gmail.com
In reply to: Andrew Dunstan (#4)
Re: Sporadic connection-setup-related test failures on Cygwin in v15-

25.07.2024 19:25, Andrew Dunstan wrote:

> > +1. I'm not planning to back-patch that work. Perhaps lorikeet
> > could stop testing releases < 16? They don't work and it's not our
> > bug[1]. We decided not to drop Cygwin support[2], but I don't think
> > we're learning anything from investigating that noise in the
> > known-broken branches.

> Sure, it can. I've made that change.

Thank you, Andrew!

I've moved those issues to the "Fixed" category.

Best regards,
Alexander