fairywren is generating bogus BASE_BACKUP commands
Thomas Munro pointed out this failure to me on fairywren:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2022-01-21%2020%3A10%3A22
He theorizes that I need some perl2host magic in there, which may well
be true. But I also noticed this:
# Running: pg_basebackup --no-sync -cfast --target
server:/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/tmp_test_Ag8r/backuponserver
-X none
pg_basebackup: error: could not initiate base backup: ERROR:
unrecognized target: "server;C"
"server" is a valid backup target, but "server;C" is not. And I think
this must be a bug on the client side, because the server logs the
generated query:
2022-01-21 20:53:11.618 UTC [8404:10] 010_pg_basebackup.pl LOG:
received replication command: BASE_BACKUP ( LABEL 'pg_basebackup base
backup', PROGRESS, CHECKPOINT 'fast', MANIFEST 'yes',
TABLESPACE_MAP, TARGET 'server;C', TARGET_DETAIL
'\\tools\\msys64\\home\\pgrunner\\bf\\root\\HEAD\\pgsql.build\\src\\bin\\pg_basebackup\\tmp_check\\tmp_test_Ag8r\\backuponserver')
So it's not that the server parses the query and introduces gibberish
-- the client has already introduced gibberish when constructing the
query. But the code to do that is pretty straightforward -- we just
call strchr to find the colon in the backup target, and then
pnstrdup() the part before the colon and use the latter part as-is. If
pnstrdup were failing to add a terminating \0 then this would be quite
plausible, but I think it shouldn't. Unless the operating sytem's
strnlen() is buggy? That seems like a stretch, so feel free to tell me
what obvious stupid thing I did here and am not seeing...
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
"server" is a valid backup target, but "server;C" is not. And I think
this must be a bug on the client side, because the server logs the
generated query:
2022-01-21 20:53:11.618 UTC [8404:10] 010_pg_basebackup.pl LOG:
received replication command: BASE_BACKUP ( LABEL 'pg_basebackup base
backup', PROGRESS, CHECKPOINT 'fast', MANIFEST 'yes',
TABLESPACE_MAP, TARGET 'server;C', TARGET_DETAIL
'\\tools\\msys64\\home\\pgrunner\\bf\\root\\HEAD\\pgsql.build\\src\\bin\\pg_basebackup\\tmp_check\\tmp_test_Ag8r\\backuponserver')
So it's not that the server parses the query and introduces gibberish
-- the client has already introduced gibberish when constructing the
query. But the code to do that is pretty straightforward -- we just
call strchr to find the colon in the backup target, and then
pnstrdup() the part before the colon and use the latter part as-is. If
pnstrdup were failing to add a terminating \0 then this would be quite
plausible, but I think it shouldn't. Unless the operating sytem's
strnlen() is buggy? That seems like a stretch, so feel free to tell me
what obvious stupid thing I did here and am not seeing...
I think the backup_target string was already corrupted that way when
pg_basebackup absorbed it from optarg. It's pretty hard to believe that
the strchr/pnstrdup stanza got it wrong. However, comparing the
TARGET_DETAIL to what the TAP test says it issued:
# Running: pg_basebackup --no-sync -cfast --target server:/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/tmp_test_Ag8r/backuponserver -X none
pg_basebackup: error: could not initiate base backup: ERROR: unrecognized target: "server;C"
it's absolutely clear that something decided to munge the target string.
Given that colon is reserved in Windows filename syntax, it's not
so surprising if it munged it wrong, or at least more aggressively
than you expected.
I kinda wonder if this notation for the target was well-chosen.
Keeping the file name strictly separate from the "type" keyword
might be a wiser move. Quite aside from Windows-isms, there
are going to be usages where this is hard to tell from a URL.
(If memory serves, double leading slash is significant to some
networked file systems.)
While we're on the subject of ill-chosen option syntax: "-cfast"
with non double dashes? Really? That's horribly ambiguous.
Most programs would parse something like that as five single-letter
options, and most users who know Unix would expect it to mean that.
regards, tom lane
On Sat, Jan 22, 2022 at 10:42 AM Robert Haas <robertmhaas@gmail.com> wrote:
# Running: pg_basebackup --no-sync -cfast --target
server:/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/tmp_test_Ag8r/backuponserver
-X none
pg_basebackup: error: could not initiate base backup: ERROR:
unrecognized target: "server;C""server" is a valid backup target, but "server;C" is not. And I think
this must be a bug on the client side, because the server logs the
generated query:
It looks a bit like msys perl could be recognising
"server:/home/pgrunner/..." and converting it to
"server;C:\tools\msys64\home\pgrunner\...". From a bit of light
googling I see that such conversions happen in msys perl's system()
unless you turn them off with MSYS_NO_PATHCONV, and then we'd have to
do it ourselves in the right places.
On Fri, Jan 21, 2022 at 5:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think the backup_target string was already corrupted that way when
pg_basebackup absorbed it from optarg. It's pretty hard to believe that
the strchr/pnstrdup stanza got it wrong. However, comparing the
TARGET_DETAIL to what the TAP test says it issued:# Running: pg_basebackup --no-sync -cfast --target server:/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/tmp_test_Ag8r/backuponserver -X none
pg_basebackup: error: could not initiate base backup: ERROR: unrecognized target: "server;C"it's absolutely clear that something decided to munge the target string.
Given that colon is reserved in Windows filename syntax, it's not
so surprising if it munged it wrong, or at least more aggressively
than you expected.
Nothing in the Perl code tells it that the particular argument in
question is a path rather than anything else, so it must be applying
some heuristic to decide whether to munge it. That's horrible.
I kinda wonder if this notation for the target was well-chosen.
Keeping the file name strictly separate from the "type" keyword
might be a wiser move. Quite aside from Windows-isms, there
are going to be usages where this is hard to tell from a URL.
(If memory serves, double leading slash is significant to some
networked file systems.)
Maybe. I think it's important that the notation is not ridiculously
verbose, and -t server --target-detail $PATH is a LOT more typing.
While we're on the subject of ill-chosen option syntax: "-cfast"
with non double dashes? Really? That's horribly ambiguous.
Most programs would parse something like that as five single-letter
options, and most users who know Unix would expect it to mean that.
I'm not sure whether you're complaining that we accept that syntax or
using it, but AFAIK I'm responsible for neither. I think the syntax
has been accepted since pg_basebackup was added in 2011, and Andres
added it to this test case earlier this week (with -cfast in the
subject line of the commit message). FWIW, though, I've been aware of
that syntax for a long time and never thought it was a problem. I
usually spell the option in exactly that way when I use it, and I'm
relatively sure that things I've given to customers would break if we
removed support for it. I don't know how we'd do that anyway, since
all that's happening here is a call to getopt_long().
--
Robert Haas
EDB: http://www.enterprisedb.com
On 1/21/22 17:10, Thomas Munro wrote:
On Sat, Jan 22, 2022 at 10:42 AM Robert Haas <robertmhaas@gmail.com> wrote:
# Running: pg_basebackup --no-sync -cfast --target
server:/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/tmp_test_Ag8r/backuponserver
-X none
pg_basebackup: error: could not initiate base backup: ERROR:
unrecognized target: "server;C""server" is a valid backup target, but "server;C" is not. And I think
this must be a bug on the client side, because the server logs the
generated query:It looks a bit like msys perl could be recognising
"server:/home/pgrunner/..." and converting it to
"server;C:\tools\msys64\home\pgrunner\...". From a bit of light
googling I see that such conversions happen in msys perl's system()
unless you turn them off with MSYS_NO_PATHCONV, and then we'd have to
do it ourselves in the right places.
c.f. src/bin/pg_verifybackup/t/003_corruption.pl which says:
my $source_ts_prefix = $source_ts_path;
$source_ts_prefix =~ s!(^[A-Z]:/[^/]*)/.*!$1!;
...
# See https://www.msys2.org/wiki/Porting/#filesystem-namespaces
local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix;
Probably in this case just setting it to 'server:' would do the trick.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Jan 21, 2022 at 5:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
While we're on the subject of ill-chosen option syntax: "-cfast"
with non double dashes? Really? That's horribly ambiguous.
I'm not sure whether you're complaining that we accept that syntax or
using it, but AFAIK I'm responsible for neither. I think the syntax
has been accepted since pg_basebackup was added in 2011, and Andres
added it to this test case earlier this week (with -cfast in the
subject line of the commit message).
pg_basebackup's help defines the syntax as
-c, --checkpoint=fast|spread
set fast or spread checkpointing
which I'd read as requiring a space (or possibly equal sign)
between "-c" and "fast". If it works as written in this test,
that's an accident of the particular getopt implementation,
and I'll bet it won't be too long before we come across
a getopt that doesn't like it.
regards, tom lane
Andrew Dunstan <andrew@dunslane.net> writes:
c.f. src/bin/pg_verifybackup/t/003_corruption.pl which says:
my $source_ts_prefix = $source_ts_path;
$source_ts_prefix =~ s!(^[A-Z]:/[^/]*)/.*!$1!;
...
# See https://www.msys2.org/wiki/Porting/#filesystem-namespaces
local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix;
Probably in this case just setting it to 'server:' would do the trick.
The point I was trying to make is that if we have to jump through
that sort of hoop in the test scripts, then real users are going
to have to jump through it as well, and they won't like that
(and we will get bug reports about it). It'd be better to design
the option syntax to avoid such requirements.
regards, tom lane
Hi,
On 2022-01-21 17:42:45 -0500, Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
c.f. src/bin/pg_verifybackup/t/003_corruption.pl which says:
��� my $source_ts_prefix = $source_ts_path;
��� $source_ts_prefix =~ s!(^[A-Z]:/[^/]*)/.*!$1!;
��� ...��� # See https://www.msys2.org/wiki/Porting/#filesystem-namespaces
��� local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix;Probably in this case just setting it to 'server:' would do the trick.
The point I was trying to make is that if we have to jump through
that sort of hoop in the test scripts, then real users are going
to have to jump through it as well, and they won't like that
(and we will get bug reports about it). It'd be better to design
the option syntax to avoid such requirements.
Normal users aren't going to invoke a "native" basebackup from inside msys. I
assume the translation happens because an "msys world" perl invokes
a "native" pg_basebackup via msys system(), right? If pg_basebackup instead is
"normally" invoked from a windows terminal, or anything else "native" windows,
the problem won't exist, no?
As we're building a "native" postgres in this case, none of our tools should
internally have such translations happening. So I don't think it'll be a huge
issue for users themselves?
Not that I think that there are all that many users of mingw built postgres on
windows... I think it's mostly msvc built postgres in that world?
Greetings,
Andres Freund
On Fri, Jan 21, 2022 at 5:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The point I was trying to make is that if we have to jump through
that sort of hoop in the test scripts, then real users are going
to have to jump through it as well, and they won't like that
(and we will get bug reports about it). It'd be better to design
the option syntax to avoid such requirements.
Well, as Andrew points out, pg_basebackup's -T foo=bar syntax causes
the same issue, so you'll need to redesign that, too. But even that is
not really a proper fix. The real root of this problem is that the
operating system's notion of a valid path differs from PostgreSQL's
notion of a valid path on this platform, and I imagine that fixing
that is a rather large project.
ISTM that you're basically just complaining about options syntax that
you don't like, but I think there's nothing particularly worse about
this syntax than lots of other things we type all the time. psql -v
VAR=VALUE -P OTHERKINDOFVAR=OTHERVALUE? curl -u USER:PASSWORD?
pg_basebackup -T OLD=NEW? perl -d[t]:MODULE=OPT,OPT? I mean, that last
one actually seems kinda horrible and if this were as bad as that I'd
say yeah, it should be redesigned. But I don't think it is. There's
plenty of precedent for bundling closely-related values into a single
command-line option, which is all I've done here.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
On 2022-01-21 17:26:32 -0500, Robert Haas wrote:
I think the syntax has been accepted since pg_basebackup was added in 2011,
and Andres added it to this test case earlier this week (with -cfast in the
subject line of the commit message).
The reason I used -cfast instead of -c fast or --checkpoint=fast is that the
way perltidy formats leads to very wide lines already, and making them even
longer seemed unattractive...
Given the -cfast syntax successfully passed tests on at least AIX, freebsd,
linux, macos, netbsd, openbsd, windows msvc, windows msys, I'm not too worried
about portability either.
Greetings,
Andres Freund
On Fri, Jan 21, 2022 at 5:35 PM Andrew Dunstan <andrew@dunslane.net> wrote:
# See https://www.msys2.org/wiki/Porting/#filesystem-namespaces
local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix;
Probably in this case just setting it to 'server:' would do the trick.
Oh, thanks for the tip. Do you want to push a commit that does that,
or ... should I do it?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, Jan 22, 2022 at 3:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jan 21, 2022 at 5:35 PM Andrew Dunstan <andrew@dunslane.net> wrote:
# See https://www.msys2.org/wiki/Porting/#filesystem-namespaces
local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix;
Probably in this case just setting it to 'server:' would do the trick.Oh, thanks for the tip. Do you want to push a commit that does that,
or ... should I do it?
Just a thought: Would it prevent the magic path translation and all
just work if the path were already in Windows form? So, if we did
just this change at the top:
-my $tempdir = PostgreSQL::Test::Utils::tempdir;
+my $tempdir = PostgreSQL::Test::Utils::perl2host(PostgreSQL::Test::Utils::tempdir);
On 1/21/22 18:04, Andres Freund wrote:
Hi,
On 2022-01-21 17:42:45 -0500, Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
c.f. src/bin/pg_verifybackup/t/003_corruption.pl which says:
my $source_ts_prefix = $source_ts_path;
$source_ts_prefix =~ s!(^[A-Z]:/[^/]*)/.*!$1!;
...
# See https://www.msys2.org/wiki/Porting/#filesystem-namespaces
local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix;
Probably in this case just setting it to 'server:' would do the trick.The point I was trying to make is that if we have to jump through
that sort of hoop in the test scripts, then real users are going
to have to jump through it as well, and they won't like that
(and we will get bug reports about it). It'd be better to design
the option syntax to avoid such requirements.Normal users aren't going to invoke a "native" basebackup from inside msys. I
assume the translation happens because an "msys world" perl invokes
a "native" pg_basebackup via msys system(), right? If pg_basebackup instead is
"normally" invoked from a windows terminal, or anything else "native" windows,
the problem won't exist, no?As we're building a "native" postgres in this case, none of our tools should
internally have such translations happening. So I don't think it'll be a huge
issue for users themselves?
All true. This is purely an issue for our testing regime, and not for
end users.
Not that I think that there are all that many users of mingw built postgres on
windows... I think it's mostly msvc built postgres in that world?
The vast majority use the installer which is built with MSVC. Very few
in my experience build their own.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On 1/21/22 22:43, Thomas Munro wrote:
On Sat, Jan 22, 2022 at 3:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jan 21, 2022 at 5:35 PM Andrew Dunstan <andrew@dunslane.net> wrote:
# See https://www.msys2.org/wiki/Porting/#filesystem-namespaces
local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix;
Probably in this case just setting it to 'server:' would do the trick.Oh, thanks for the tip. Do you want to push a commit that does that,
or ... should I do it?Just a thought: Would it prevent the magic path translation and all
just work if the path were already in Windows form? So, if we did
just this change at the top:-my $tempdir = PostgreSQL::Test::Utils::tempdir; +my $tempdir = PostgreSQL::Test::Utils::perl2host(PostgreSQL::Test::Utils::tempdir);
It's not as simple as that :-( But you're on the right track. My
suggestion above doesn't work.
The rule for paths is: when you're passing a path to an external program
that's not msys aware (typically, one of our build artefacts like psql
or pg_basebackup) it needs to be a native path. But when you're passing
it to a perl function (e.g. mkdir) or to an external program that's msys
aware it needs to be a virtual path, i.e. one not mangled by perl2host.
Some recent commits to this file especially have not obeyed this rule.
Here's a patch that does it consistently for the whole file. I have
tested it on a system very like fairywren, and the test passes.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Attachments:
basebackup-test-fix.patchtext/x-patch; charset=UTF-8; name=basebackup-test-fix.patchDownload+63-50
On Sun, Jan 23, 2022 at 12:20 PM Andrew Dunstan <andrew@dunslane.net> wrote:
It's not as simple as that :-( But you're on the right track. My
suggestion above doesn't work.The rule for paths is: when you're passing a path to an external program
that's not msys aware (typically, one of our build artefacts like psql
or pg_basebackup) it needs to be a native path. But when you're passing
it to a perl function (e.g. mkdir) or to an external program that's msys
aware it needs to be a virtual path, i.e. one not mangled by perl2host.Some recent commits to this file especially have not obeyed this rule.
Here's a patch that does it consistently for the whole file. I have
tested it on a system very like fairywren, and the test passes.
I can't understand how this would prevent server:/what/ever from
getting turned into server;c:\what\ever. But if it does, great!
Maybe we need to have a README in the tree somewhere that tries to
explain this. Or maybe we should make our build artifacts msys-aware,
if that's possible, so that this just works. Or maybe supporting msys
is not worth the trouble.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
Maybe we need to have a README in the tree somewhere that tries to
explain this. Or maybe we should make our build artifacts msys-aware,
if that's possible, so that this just works. Or maybe supporting msys
is not worth the trouble.
I've been wondering that last myself. Supporting Windows-native is
already a huge amount of work, which we put up with because there
are a lot of users. If msys is going to add another large chunk of
work, has it got enough users to justify that?
The recent argument that this behavior isn't user-visible doesn't do
anything to mollify me on that point; it appears to me to be tantamount
to a concession that no real users actually care about msys.
regards, tom lane
On 1/23/22 15:07, Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Maybe we need to have a README in the tree somewhere that tries to
explain this. Or maybe we should make our build artifacts msys-aware,
if that's possible, so that this just works. Or maybe supporting msys
is not worth the trouble.I've been wondering that last myself. Supporting Windows-native is
already a huge amount of work, which we put up with because there
are a lot of users. If msys is going to add another large chunk of
work, has it got enough users to justify that?The recent argument that this behavior isn't user-visible doesn't do
anything to mollify me on that point; it appears to me to be tantamount
to a concession that no real users actually care about msys.
Msys is a unix-like environment that is useful to build Postgres. It's
not intended as a general runtime environment. We therefore don't build
msys-aware Postgres. We use msys to build standalone Postgres binaries
that don't need or use any msys runtime. There is nothing in the least
bit new about this - that's the way it's been since day one of the
Windows port nearly 20 years ago.
Speaking as someone who (for my sins) regularly deals with problems on
Windows, I find msys much easier to deal with than VisualStudio,
probably because it's so much like what I use elsewhere. So I think
dropping msys support would be a serious mistake.
The most common issues we get are around this issue of virtualized paths
in the TAP tests. If people followed the rule I suggested upthread, 99%
of those problems would go away. I realize it's annoying - I've been
caught by it myself on more than one occasion. Maybe there's a way to
avoid it, but if there is I'm unaware of it. But I don't think it's in
any way a good reason to drop msys support.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes:
The most common issues we get are around this issue of virtualized paths
in the TAP tests. If people followed the rule I suggested upthread, 99%
of those problems would go away. I realize it's annoying - I've been
caught by it myself on more than one occasion. Maybe there's a way to
avoid it, but if there is I'm unaware of it. But I don't think it's in
any way a good reason to drop msys support.
Well, let's go back to Robert's other suggestion: some actual
documentation of these rules, in the source tree, might help
people to follow them. src/test/perl/README seems like an
appropriate place.
regards, tom lane
Hi,
On 2022-01-23 16:09:01 -0500, Andrew Dunstan wrote:
Msys is a unix-like environment that is useful to build Postgres. It's
not intended as a general runtime environment. We therefore don't build
msys-aware Postgres. We use msys to build standalone Postgres binaries
that don't need or use any msys runtime. There is nothing in the least
bit new about this - that's the way it's been since day one of the
Windows port nearly 20 years ago.Speaking as someone who (for my sins) regularly deals with problems on
Windows, I find msys much easier to deal with than VisualStudio,
probably because it's so much like what I use elsewhere. So I think
dropping msys support would be a serious mistake.
I agree that msys support is useful (although configure is *so* slow that I
find its usefullness reduced substantially).
The most common issues we get are around this issue of virtualized paths
in the TAP tests. If people followed the rule I suggested upthread, 99%
of those problems would go away. I realize it's annoying - I've been
caught by it myself on more than one occasion. Maybe there's a way to
avoid it, but if there is I'm unaware of it. But I don't think it's in
any way a good reason to drop msys support.
Needing to sprinkle perl2host and MSYS2_ARG_CONV_EXCL over a good number of
tests, getting weird errors when failing, etc IMO isn't a scalable approach,
for a platform that most of use never use.
Can't we solve this in a generic way? E.g. by insisting that the test run with
a native perl and normalizing the few virtual paths we get invoked with
centrally? Making the msys initial setup a bit more cumbersome would IMO be
an OK price to pay for making it maintainable / predictable from other
platforms, as long as we have some decent docs and decent error messages.
Greetings,
Andres Freund
On 1/23/22 16:31, Andres Freund wrote:
The most common issues we get are around this issue of virtualized paths
in the TAP tests. If people followed the rule I suggested upthread, 99%
of those problems would go away. I realize it's annoying - I've been
caught by it myself on more than one occasion. Maybe there's a way to
avoid it, but if there is I'm unaware of it. But I don't think it's in
any way a good reason to drop msys support.Needing to sprinkle perl2host and MSYS2_ARG_CONV_EXCL over a good number of
tests, getting weird errors when failing, etc IMO isn't a scalable approach,
for a platform that most of use never use.Can't we solve this in a generic way? E.g. by insisting that the test run with
a native perl and normalizing the few virtual paths we get invoked with
centrally? Making the msys initial setup a bit more cumbersome would IMO be
an OK price to pay for making it maintainable / predictable from other
platforms, as long as we have some decent docs and decent error messages.
Nice idea. I have a suspicion that it's going to be harder than you
think, but I'll be very happy to be proved wrong.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com