fairywren hung in pg_basebackup tests

Started by Andrew Dunstanover 3 years ago20 messages
#1Andrew Dunstan
andrew@dunslane.net

fairywren (msys2 animal) is currently hung in the pg_basebackup tests.
Here's the bottom of the regress log. I don't have further info as yet,
but can dig is someone has a suggestion.

### Starting node "main"
# Running: pg_ctl -w -D
C:/tools/nmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/src/bin/pg_basebackup/tmp_check/t_010_pg_basebackup_main_data/pgdata
-l
C:/tools/nmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/src/bin/pg_basebackup/tmp_check/log/010_pg_basebackup_main.log
-o --cluster-name=main start
waiting for server to start.... done
server started
# Postmaster PID for node "main" is 5368
Junction created for C:\tools\nmsys64\tmp\pIBOsSp9se\tempdir <<===>>
C:\tools\nmsys64\home\pgrunner\bf\root\REL_15_STABLE\pgsql.build\src\bin\pg_basebackup\tmp_check\tmp_test_jVCb
# Taking pg_basebackup tarbackup2 from node "main"
# Running: pg_basebackup -D
C:/tools/nmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/src/bin/pg_basebackup/tmp_check/t_010_pg_basebackup_main_data/backup/tarbackup2
-h 127.0.0.1 -p 52897 --checkpoint fast --no-sync -Ft
# Backup finished
[13:11:33.592](0.978s) ok 95 - backup tar was created
[13:11:33.592](0.000s) ok 96 - WAL tar was created
[13:11:33.593](0.000s) not ok 97 - one tablespace tar was created
[13:11:33.593](0.000s)
[13:11:33.593](0.000s) #   Failed test 'one tablespace tar was created'
#   at t/010_pg_basebackup.pl line 352.
[13:11:33.593](0.000s) #          got: '0'
#     expected: '1'
# Checking port 52898
# Found port 52898
Name: replica
Data directory:
C:/tools/nmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/src/bin/pg_basebackup/tmp_check/t_010_pg_basebackup_replica_data/pgdata
Backup directory:
C:/tools/nmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/src/bin/pg_basebackup/tmp_check/t_010_pg_basebackup_replica_data/backup
Archive directory:
C:/tools/nmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/src/bin/pg_basebackup/tmp_check/t_010_pg_basebackup_replica_data/archives
Connection string: port=52898 host=127.0.0.1
Log file:
C:/tools/nmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/src/bin/pg_basebackup/tmp_check/log/010_pg_basebackup_replica.log
# Initializing node "replica" from backup "tarbackup2" of node "main"
# Running: C:/Windows/System32/tar xf
C:/tools/nmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/src/bin/pg_basebackup/tmp_check/t_010_pg_basebackup_main_data/backup/tarbackup2/base.tar
-C
C:/tools/nmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/src/bin/pg_basebackup/tmp_check/t_010_pg_basebackup_replica_data/pgdata
# Running: C:/Windows/System32/tar xf
C:/tools/nmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/src/bin/pg_basebackup/tmp_check/t_010_pg_basebackup_main_data/backup/tarbackup2/pg_wal.tar
-C
C:/tools/nmsys64/home/pgrunner/bf/root/REL_15_STABLE/pgsql.build/src/bin/pg_basebackup/tmp_check/t_010_pg_basebackup_replica_data/pgdata/pg_wal
Use of uninitialized value $_[2] in join or string at
C:\tools\nmsys64\home\pgrunner\bf\root\REL_15_STABLE\pgsql\src\test\perl/PostgreSQL/Test/Utils.pm
line 337.
# Running: C:/Windows/System32/tar xf  -C
C:\tools\nmsys64\home\pgrunner\bf\root\REL_15_STABLE\pgsql.build\src\bin\pg_basebackup\tmp_check\tmp_test_jVCb/tblspc1replica
Use of uninitialized value $_[2] in system at
C:\tools\nmsys64\home\pgrunner\bf\root\REL_15_STABLE\pgsql\src\test\perl/PostgreSQL/Test/Utils.pm
line 338.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#1)
Re: fairywren hung in pg_basebackup tests

Andrew Dunstan <andrew@dunslane.net> writes:

fairywren (msys2 animal) is currently hung in the pg_basebackup tests.
Here's the bottom of the regress log. I don't have further info as yet,
but can dig is someone has a suggestion.

Hm, what's with the "Use of uninitialized value" warnings?

regards, tom lane

#3Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#2)
Re: fairywren hung in pg_basebackup tests

On Sun, Jul 24, 2022 at 12:55:56PM -0400, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

fairywren (msys2 animal) is currently hung in the pg_basebackup tests.
Here's the bottom of the regress log. I don't have further info as yet,
but can dig is someone has a suggestion.

Hm, what's with the "Use of uninitialized value" warnings?

The warnings are sequelae of:

[13:11:33.593](0.000s) not ok 97 - one tablespace tar was created

From that, it follows that $tblspc_tars[0] is undef at:

PostgreSQL::Test::Utils::system_or_bail($tar, 'xf', $tblspc_tars[0],
'-C', $repTsDir);

# Running: C:/Windows/System32/tar xf� -C
C:\tools\nmsys64\home\pgrunner\bf\root\REL_15_STABLE\pgsql.build\src\bin\pg_basebackup\tmp_check\tmp_test_jVCb/tblspc1replica

I can confirm that Windows tar hangs when invoked that way. For preventing
the hang, the test file could die() or skip the tar-program-using section
after failing 'one tablespace tar was created'. That still leaves a question
about why pg_basebackup didn't make the tablespace tar file. I would check
010_pg_basebackup_main.log for clues about that.

#4Andrew Dunstan
andrew@dunslane.net
In reply to: Noah Misch (#3)
Re: fairywren hung in pg_basebackup tests

On 2022-07-24 Su 15:10, Noah Misch wrote:

On Sun, Jul 24, 2022 at 12:55:56PM -0400, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

fairywren (msys2 animal) is currently hung in the pg_basebackup tests.
Here's the bottom of the regress log. I don't have further info as yet,
but can dig is someone has a suggestion.

Hm, what's with the "Use of uninitialized value" warnings?

The warnings are sequelae of:

[13:11:33.593](0.000s) not ok 97 - one tablespace tar was created

From that, it follows that $tblspc_tars[0] is undef at:

PostgreSQL::Test::Utils::system_or_bail($tar, 'xf', $tblspc_tars[0],
'-C', $repTsDir);

# Running: C:/Windows/System32/tar xf  -C
C:\tools\nmsys64\home\pgrunner\bf\root\REL_15_STABLE\pgsql.build\src\bin\pg_basebackup\tmp_check\tmp_test_jVCb/tblspc1replica

Perhaps we should have a guard in system_or_bail() and/or system_log()
which bails if some element of @_ is undefined.

I can confirm that Windows tar hangs when invoked that way. For preventing
the hang, the test file could die() or skip the tar-program-using section
after failing 'one tablespace tar was created'. That still leaves a question
about why pg_basebackup didn't make the tablespace tar file. I would check
010_pg_basebackup_main.log for clues about that.

The same thing has happened again on HEAD. I don't see anything in that
file that gives any clue.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#4)
Re: fairywren hung in pg_basebackup tests

Andrew Dunstan <andrew@dunslane.net> writes:

On 2022-07-24 Su 15:10, Noah Misch wrote:

On Sun, Jul 24, 2022 at 12:55:56PM -0400, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

fairywren (msys2 animal) is currently hung in the pg_basebackup tests.
Here's the bottom of the regress log. I don't have further info as yet,
but can dig is someone has a suggestion.

Hm, what's with the "Use of uninitialized value" warnings?

The warnings are sequelae of:

[13:11:33.593](0.000s) not ok 97 - one tablespace tar was created

From that, it follows that $tblspc_tars[0] is undef at:
PostgreSQL::Test::Utils::system_or_bail($tar, 'xf', $tblspc_tars[0],
'-C', $repTsDir);

Right, so the "glob" failed to find anything. Seeing that this test
is new as of 534472375, which postdates fairywren's last successful
run, I'd guess that the "glob" needs adjustment for msys path names.

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#5)
Re: fairywren hung in pg_basebackup tests

I wrote:

Right, so the "glob" failed to find anything. Seeing that this test
is new as of 534472375, which postdates fairywren's last successful
run, I'd guess that the "glob" needs adjustment for msys path names.

Hmm ... an alternative theory is that the test is fine, and what
it's telling us is that get_dirent_type() is still wrong on msys.
Would that end in this symptom?

regards, tom lane

#7Noah Misch
noah@leadboat.com
In reply to: Andrew Dunstan (#4)
Re: fairywren hung in pg_basebackup tests

On Mon, Jul 25, 2022 at 09:44:21AM -0400, Andrew Dunstan wrote:

On 2022-07-24 Su 15:10, Noah Misch wrote:

On Sun, Jul 24, 2022 at 12:55:56PM -0400, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

fairywren (msys2 animal) is currently hung in the pg_basebackup tests.
Here's the bottom of the regress log. I don't have further info as yet,
but can dig is someone has a suggestion.

Hm, what's with the "Use of uninitialized value" warnings?

The warnings are sequelae of:

[13:11:33.593](0.000s) not ok 97 - one tablespace tar was created

From that, it follows that $tblspc_tars[0] is undef at:

PostgreSQL::Test::Utils::system_or_bail($tar, 'xf', $tblspc_tars[0],
'-C', $repTsDir);

# Running: C:/Windows/System32/tar xf� -C
C:\tools\nmsys64\home\pgrunner\bf\root\REL_15_STABLE\pgsql.build\src\bin\pg_basebackup\tmp_check\tmp_test_jVCb/tblspc1replica

Perhaps we should have a guard in system_or_bail() and/or system_log()
which bails if some element of @_ is undefined.

That would be reasonable. Also reasonable to impose some long timeout, maybe
10x or 100x PG_TEST_TIMEOUT_DEFAULT, on calls to those functions.

#8Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#6)
Re: fairywren hung in pg_basebackup tests

On Tue, Jul 26, 2022 at 3:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Right, so the "glob" failed to find anything. Seeing that this test
is new as of 534472375, which postdates fairywren's last successful
run, I'd guess that the "glob" needs adjustment for msys path names.

The test added by 534472375 is at the end, hundreds of lines later
than the one that appears to be failing.

Hmm ... an alternative theory is that the test is fine, and what
it's telling us is that get_dirent_type() is still wrong on msys.
Would that end in this symptom?

Hmm, possibly yes (if it sees a non-symlink, it'll skip it). If
someone can run the test on an msys system, perhaps they could put a
debugging elog() into the code modified by 9d3444dc to log d_name and
the d_type that is returned? I'm struggling to understand why msys
would change the answer though.

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#7)
Re: fairywren hung in pg_basebackup tests

Noah Misch <noah@leadboat.com> writes:

On Mon, Jul 25, 2022 at 09:44:21AM -0400, Andrew Dunstan wrote:

Perhaps we should have a guard in system_or_bail() and/or system_log()
which bails if some element of @_ is undefined.

+1, seeing how hard this is to diagnose.

That would be reasonable. Also reasonable to impose some long timeout, maybe
10x or 100x PG_TEST_TIMEOUT_DEFAULT, on calls to those functions.

Why would it need to be more than PG_TEST_TIMEOUT_DEFAULT?

regards, tom lane

#10Andrew Dunstan
andrew@dunslane.net
In reply to: Thomas Munro (#8)
Re: fairywren hung in pg_basebackup tests

On 2022-07-25 Mo 11:24, Thomas Munro wrote:

On Tue, Jul 26, 2022 at 3:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Right, so the "glob" failed to find anything. Seeing that this test
is new as of 534472375, which postdates fairywren's last successful
run, I'd guess that the "glob" needs adjustment for msys path names.

The test added by 534472375 is at the end, hundreds of lines later
than the one that appears to be failing.

Right.

Hmm ... an alternative theory is that the test is fine, and what
it's telling us is that get_dirent_type() is still wrong on msys.
Would that end in this symptom?

Hmm, possibly yes (if it sees a non-symlink, it'll skip it). If
someone can run the test on an msys system, perhaps they could put a
debugging elog() into the code modified by 9d3444dc to log d_name and
the d_type that is returned? I'm struggling to understand why msys
would change the answer though.

I have no idea either. The link exists and it is a junction. I'll see
about logging details.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#11Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#9)
Re: fairywren hung in pg_basebackup tests

On Mon, Jul 25, 2022 at 11:35:12AM -0400, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

On Mon, Jul 25, 2022 at 09:44:21AM -0400, Andrew Dunstan wrote:

Perhaps we should have a guard in system_or_bail() and/or system_log()
which bails if some element of @_ is undefined.

+1, seeing how hard this is to diagnose.

That would be reasonable. Also reasonable to impose some long timeout, maybe
10x or 100x PG_TEST_TIMEOUT_DEFAULT, on calls to those functions.

Why would it need to be more than PG_TEST_TIMEOUT_DEFAULT?

We run some long commands, like the parallel_schedule runs. Those currently
use plain system(), but they probably should have used system_log() from a
logging standpoint. If they had, PG_TEST_TIMEOUT_DEFAULT would have been too
short. One could argue that anything that slow should declare its intent to
be that slow, but that argument is getting into the territory of a policy
change rather than a backstop for clearly-unintended longevity.

#12Thomas Munro
thomas.munro@gmail.com
In reply to: Andrew Dunstan (#10)
Re: fairywren hung in pg_basebackup tests

On Tue, Jul 26, 2022 at 4:03 AM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2022-07-25 Mo 11:24, Thomas Munro wrote:

On Tue, Jul 26, 2022 at 3:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmm ... an alternative theory is that the test is fine, and what
it's telling us is that get_dirent_type() is still wrong on msys.
Would that end in this symptom?

Hmm, possibly yes (if it sees a non-symlink, it'll skip it). If
someone can run the test on an msys system, perhaps they could put a
debugging elog() into the code modified by 9d3444dc to log d_name and
the d_type that is returned? I'm struggling to understand why msys
would change the answer though.

I have no idea either. The link exists and it is a junction. I'll see
about logging details.

From the clues so far, it seems like pgwin32_is_junction(fullpath) was
returning true, but the following code in get_dirent_type(), which was
supposed to be equivalent, is not reached on MSYS (though it
apparently does on MSVC?):

+       if ((fd.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) != 0 &&
+               (fd.dwReserved0 == IO_REPARSE_TAG_MOUNT_POINT))
                d->ret.d_type = DT_LNK;

pgwin32_is_junction() uses GetFileAttributes() and tests (attr &
FILE_ATTRIBUTE_REPARSE_POINT) == FILE_ATTRIBUTE_REPARSE_POINT, which
is like the first condition but lacks the dwReserved0 part. What is
that part doing, and why would it be doing something different in MSVC
and MSYS builds? That code came from 87e6ed7c, recently I was just
trying to fix it by reordering the checks; oh, there was some
discussion about that field[1]/messages/by-id/CABUevEzURN=wC95JHvTKFJtEy0eY9rWO42yU=59-q8xSwm-Dug@mail.gmail.com.

One idea is that something about dwReserved0 or
IO_REPARSE_TAG_MOUNT_POINT is different in the open source replacement
system headers supplied by the MinGW project used by MSYS builds
(right?), compared to the "real" Windows SDK's headers used by MSVC
builds.

Or perhaps there is some other dumb mistake, or perhaps the reparse
point really is different, or ... I dunno, I'd probably shove a load
of log messages in there and see what's going on.

[1]: /messages/by-id/CABUevEzURN=wC95JHvTKFJtEy0eY9rWO42yU=59-q8xSwm-Dug@mail.gmail.com

#13Andrew Dunstan
andrew@dunslane.net
In reply to: Thomas Munro (#12)
Re: fairywren hung in pg_basebackup tests

On 2022-07-26 Tu 18:31, Thomas Munro wrote:

On Tue, Jul 26, 2022 at 4:03 AM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2022-07-25 Mo 11:24, Thomas Munro wrote:

On Tue, Jul 26, 2022 at 3:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmm ... an alternative theory is that the test is fine, and what
it's telling us is that get_dirent_type() is still wrong on msys.
Would that end in this symptom?

Hmm, possibly yes (if it sees a non-symlink, it'll skip it). If
someone can run the test on an msys system, perhaps they could put a
debugging elog() into the code modified by 9d3444dc to log d_name and
the d_type that is returned? I'm struggling to understand why msys
would change the answer though.

I have no idea either. The link exists and it is a junction. I'll see
about logging details.
From the clues so far, it seems like pgwin32_is_junction(fullpath) was

returning true, but the following code in get_dirent_type(), which was
supposed to be equivalent, is not reached on MSYS (though it
apparently does on MSVC?):

+       if ((fd.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) != 0 &&
+               (fd.dwReserved0 == IO_REPARSE_TAG_MOUNT_POINT))
d->ret.d_type = DT_LNK;

pgwin32_is_junction() uses GetFileAttributes() and tests (attr &
FILE_ATTRIBUTE_REPARSE_POINT) == FILE_ATTRIBUTE_REPARSE_POINT, which
is like the first condition but lacks the dwReserved0 part. What is
that part doing, and why would it be doing something different in MSVC
and MSYS builds? That code came from 87e6ed7c, recently I was just
trying to fix it by reordering the checks; oh, there was some
discussion about that field[1].

One idea is that something about dwReserved0 or
IO_REPARSE_TAG_MOUNT_POINT is different in the open source replacement
system headers supplied by the MinGW project used by MSYS builds
(right?), compared to the "real" Windows SDK's headers used by MSVC
builds.

Or perhaps there is some other dumb mistake, or perhaps the reparse
point really is different, or ... I dunno, I'd probably shove a load
of log messages in there and see what's going on.

[1] /messages/by-id/CABUevEzURN=wC95JHvTKFJtEy0eY9rWO42yU=59-q8xSwm-Dug@mail.gmail.com

dirent.c is not used on msys, only on MSVC. msys is apparently using
opendir and friends supplied by the system.

What it does if there's a junction I'll try to find out, but it appears
that 5344723755 was conceived under a misapprehension about the
behaviour of msys.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#14Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#13)
Re: fairywren hung in pg_basebackup tests

On 2022-07-27 We 09:47, Andrew Dunstan wrote:

On 2022-07-26 Tu 18:31, Thomas Munro wrote:

On Tue, Jul 26, 2022 at 4:03 AM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2022-07-25 Mo 11:24, Thomas Munro wrote:

On Tue, Jul 26, 2022 at 3:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmm ... an alternative theory is that the test is fine, and what
it's telling us is that get_dirent_type() is still wrong on msys.
Would that end in this symptom?

Hmm, possibly yes (if it sees a non-symlink, it'll skip it). If
someone can run the test on an msys system, perhaps they could put a
debugging elog() into the code modified by 9d3444dc to log d_name and
the d_type that is returned? I'm struggling to understand why msys
would change the answer though.

I have no idea either. The link exists and it is a junction. I'll see
about logging details.
From the clues so far, it seems like pgwin32_is_junction(fullpath) was

returning true, but the following code in get_dirent_type(), which was
supposed to be equivalent, is not reached on MSYS (though it
apparently does on MSVC?):

+       if ((fd.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) != 0 &&
+               (fd.dwReserved0 == IO_REPARSE_TAG_MOUNT_POINT))
d->ret.d_type = DT_LNK;

pgwin32_is_junction() uses GetFileAttributes() and tests (attr &
FILE_ATTRIBUTE_REPARSE_POINT) == FILE_ATTRIBUTE_REPARSE_POINT, which
is like the first condition but lacks the dwReserved0 part. What is
that part doing, and why would it be doing something different in MSVC
and MSYS builds? That code came from 87e6ed7c, recently I was just
trying to fix it by reordering the checks; oh, there was some
discussion about that field[1].

One idea is that something about dwReserved0 or
IO_REPARSE_TAG_MOUNT_POINT is different in the open source replacement
system headers supplied by the MinGW project used by MSYS builds
(right?), compared to the "real" Windows SDK's headers used by MSVC
builds.

Or perhaps there is some other dumb mistake, or perhaps the reparse
point really is different, or ... I dunno, I'd probably shove a load
of log messages in there and see what's going on.

[1] /messages/by-id/CABUevEzURN=wC95JHvTKFJtEy0eY9rWO42yU=59-q8xSwm-Dug@mail.gmail.com

dirent.c is not used on msys, only on MSVC. msys is apparently using
opendir and friends supplied by the system.

What it does if there's a junction I'll try to find out, but it appears
that 5344723755 was conceived under a misapprehension about the
behaviour of msys.

The msys dirent.h doesn't have a d_type field at all in a struct dirent.
I can see a number of ways of dealing with this, but the simplest seems
to be just to revert 5344723755, at least for msys, along with a comment
about why it's necessary.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#15Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Andrew Dunstan (#14)
Re: fairywren hung in pg_basebackup tests

On 2022-Jul-27, Andrew Dunstan wrote:

The msys dirent.h doesn't have a d_type field at all in a struct dirent.
I can see a number of ways of dealing with this, but the simplest seems
to be just to revert 5344723755, at least for msys, along with a comment
about why it's necessary.

Hmm, what other ways there are? I'm about to push a change that
duplicates the get_dirent_type call pattern and I was happy about not
having that #ifdef there. Not that it's critical, but ...

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#16Andrew Dunstan
andrew@dunslane.net
In reply to: Alvaro Herrera (#15)
Re: fairywren hung in pg_basebackup tests

On 2022-07-27 We 10:24, Alvaro Herrera wrote:

On 2022-Jul-27, Andrew Dunstan wrote:

The msys dirent.h doesn't have a d_type field at all in a struct dirent.
I can see a number of ways of dealing with this, but the simplest seems
to be just to revert 5344723755, at least for msys, along with a comment
about why it's necessary.

Hmm, what other ways there are? I'm about to push a change that
duplicates the get_dirent_type call pattern and I was happy about not
having that #ifdef there. Not that it's critical, but ...

The alternative I thought of would be to switch msys to using our
dirent.c. Probably not too hard, but certainly more work than reverting.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#16)
Re: fairywren hung in pg_basebackup tests

Andrew Dunstan <andrew@dunslane.net> writes:

The alternative I thought of would be to switch msys to using our
dirent.c. Probably not too hard, but certainly more work than reverting.

If you ask me, the shortest-path general-purpose fix is to insert

#if MSYS
if (pgwin32_is_junction(path))
return PGFILETYPE_DIR;
#endif

at the start of get_dirent_type. (I'm not sure how to spell the
#if test.) We could look at using dirent.c later, but I think
right now it's important to un-break the buildfarm ASAP.

regards, tom lane

#18Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#17)
Re: fairywren hung in pg_basebackup tests

On 2022-07-27 We 10:58, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

The alternative I thought of would be to switch msys to using our
dirent.c. Probably not too hard, but certainly more work than reverting.

If you ask me, the shortest-path general-purpose fix is to insert

#if MSYS
if (pgwin32_is_junction(path))
return PGFILETYPE_DIR;
#endif

at the start of get_dirent_type. (I'm not sure how to spell the
#if test.) We could look at using dirent.c later, but I think
right now it's important to un-break the buildfarm ASAP.

+1. I think you spell it:

#if defined(WIN32) && !defined(_MSC_VER)

(c.f. libpq-be.h)

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#19Thomas Munro
thomas.munro@gmail.com
In reply to: Andrew Dunstan (#18)
1 attachment(s)
Re: fairywren hung in pg_basebackup tests

On Thu, Jul 28, 2022 at 3:21 AM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2022-07-27 We 10:58, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

The alternative I thought of would be to switch msys to using our
dirent.c. Probably not too hard, but certainly more work than reverting.

Thanks for figuring this out Andrew. Previously I thought of MSYS as
a way to use configure+make+gcc/clang but pure Windows C APIs (using
MinGW's replacement Windows headers), but today I learned that
MSYS/MinGW also supplies a small amount of POSIX stuff, including
readdir() etc, so we don't use our own emulation in that case.

I suppose we could consider using own dirent.h/c with MinGW (and
seeing if there are other similar hazards), to reduce the number of
Windows/POSIX API combinations we have to fret about, but not today.

Another thought for the future is that lstat() + S_ISLNK() could
probably be made to fire for junction points on Windows (all build
variants), and then get_dirent_type()'s fallback code for DT_UNKNOWN
would have Just Worked (no extra system call required), and we could
also probably remove calls to pgwin32_is_junction() everywhere.

If you ask me, the shortest-path general-purpose fix is to insert

#if MSYS
if (pgwin32_is_junction(path))
return PGFILETYPE_DIR;
#endif

at the start of get_dirent_type. (I'm not sure how to spell the
#if test.) We could look at using dirent.c later, but I think
right now it's important to un-break the buildfarm ASAP.

+1. I think you spell it:

#if defined(WIN32) && !defined(_MSC_VER)

I thought about putting it at the top, but don't we really only need
to make an extra system call if MinGW's stat() told us it saw a
directory? And what if you asked to look through symlinks? I thought
about putting it near the S_ISLNK() test, which is the usual pattern,
but then what if MinGW decides to add d_type support one day? Those
thoughts led to the attached formulation. Untested. I'll go and try
to see if I can run this with Melih's proposed MSYS CI support...

Attachments:

0001-Fix-get_dirent_type-for-simlinks-on-MinGW-MSYS.patchtext/x-patch; charset=US-ASCII; name=0001-Fix-get_dirent_type-for-simlinks-on-MinGW-MSYS.patchDownload
From 33005aaad52a550ba028277dfb634d3d6fbfdd7f Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 28 Jul 2022 11:45:31 +1200
Subject: [PATCH] Fix get_dirent_type() for simlinks on MinGW/MSYS.

On Windows with MSVC, get_dirent_type() was recently made to return
DT_LNK for junction points by commit 9d3444dc, which fixed some
defective dirent.c code.

On Windows with Cygwin, get_dirent_type() already worked for symlinks,
as it does on POSIX systems, because Cygwin has its own fake symlinks
that behave like POSIX (on closer inspection, Cygwin's dirent has the
BSD d_type extension but it's probably always DT_UNKNOWN, so we fall
back to lstat(), which understands Cygwin symlinks with S_ISLNK()).

On Windows with MinGW/MSYS, we need extra code, because the MinGW
runtime has its own readdir() without d_type, and the lstat()-based
fallback has no knowledge of our convention for treating junctions as
symlinks.

Reported-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/b9ddf605-6b36-f90d-7c30-7b3e95c46276%40dunslane.net
---
 src/common/file_utils.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/src/common/file_utils.c b/src/common/file_utils.c
index 19d308ad1f..210b2c4dfd 100644
--- a/src/common/file_utils.c
+++ b/src/common/file_utils.c
@@ -465,5 +465,19 @@ get_dirent_type(const char *path,
 #endif
 	}
 
+#if defined(WIN32) && !defined(_MSC_VER)
+	/*
+	 * If we're on native Windows (not Cygwin, which has its own POSIX
+	 * symlinks), but not using the MSVC compiler, then we're using a readdir()
+	 * emulation provided by the compiler's runtime that has no concept of
+	 * symlinks.  It sees junction points as directories, so we need an extra
+	 * system call to recognize them as symlinks, following our convention.
+	 */
+	if (result == PGFILETYPE_DIR &&
+		!look_through_symlinks &&
+		pgwin32_is_junction(path))
+		result = PGFILETYPE_LNK;
+#endif
+
 	return result;
 }
-- 
2.36.1

#20Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#19)
Re: fairywren hung in pg_basebackup tests

On Thu, Jul 28, 2022 at 12:41 PM Thomas Munro <thomas.munro@gmail.com> wrote:

I thought about putting it at the top, but don't we really only need
to make an extra system call if MinGW's stat() told us it saw a
directory? And what if you asked to look through symlinks? I thought
about putting it near the S_ISLNK() test, which is the usual pattern,
but then what if MinGW decides to add d_type support one day? Those
thoughts led to the attached formulation. Untested. I'll go and try
to see if I can run this with Melih's proposed MSYS CI support...

Success. I'll push this, and then hopefully those BF animals can be
unwedged. (I see some unrelated warnings to look into.)

https://cirrus-ci.com/task/5253183533481984?logs=tests#L835