Solaris tar issues, or other reason why margay fails 010_pg_basebackup?

Started by Thomas Munro almost 2 years ago (8 messages, hackers)
#1 Thomas Munro
thomas.munro@gmail.com

Hi,

I noticed that margay (Solaris) has started running more of the tests
lately, but is failing in pg_basebackup/010_pg_basebackup. It runs
successfully on wrasse (in older branches, Solaris 11.3 is desupported
in 17/master), and also on pollock (illumos, forked from common
ancestor Solaris 10 while it was open source).

Hmm, wrasse is using "/opt/csw/bin/gtar xf ..." and pollock is using
"/usr/gnu/bin/tar xf ...", while margay is using "/usr/bin/tar xf
...". The tar command is indicating success (it's run by
system_or_bail and it's not bailing), but the replica doesn't want to
come up:

pg_ctl: directory
"/home/marcel/build-farm-15/buildroot/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/t_010_pg_basebackup_replica_data/pgdata"
is not a database cluster directory

So one idea would be that our tar format is incompatible with Sun tar
in some way that corrupts the output, or that there is still some
difference in the nesting of the directory structure it creates, or
something like that. I wonder if this is already common knowledge in
the repressed memories of this list, but I couldn't find anything
specific. If so, I'd be curious to know exactly why (in terms of
POSIX conformance etc., who is doing something wrong).
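One way to test the directory-nesting theory is to extract the same tarball once with /usr/bin/tar and once with GNU tar into two empty directories, then compare the trees. The Python sketch below is a hypothetical helper, not part of PostgreSQL's test suite, and the function names are illustrative. It assumes (my understanding, worth checking against pg_ctl's source) that pg_ctl prints "is not a database cluster directory" when it finds no PG_VERSION file directly inside the directory it was given.

```python
from pathlib import Path


def looks_like_cluster(datadir: str) -> bool:
    """Assumption: pg_ctl accepts a data directory only if PG_VERSION is in it."""
    return (Path(datadir) / "PG_VERSION").is_file()


def relative_tree(root: str) -> set[str]:
    """Every file and directory below root, as POSIX-style relative paths."""
    base = Path(root)
    return {p.relative_to(base).as_posix() for p in base.rglob("*")}


def compare_extractions(reference: str, candidate: str) -> dict[str, list[str]]:
    """List paths present in only one of two extracted trees.

    "missing" shows what the candidate extraction failed to produce;
    "unexpected" often reveals an extra or different directory level.
    """
    ref, cand = relative_tree(reference), relative_tree(candidate)
    return {
        "missing": sorted(ref - cand),
        "unexpected": sorted(cand - ref),
    }
```

For example, after extracting into hypothetical directories /tmp/gnu_x and /tmp/sun_x, calling `compare_extractions("/tmp/gnu_x", "/tmp/sun_x")` and `looks_like_cluster("/tmp/sun_x")` would show whether Sun tar dropped files or placed them one level deeper.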

#2 Marcel Hofstetter
hofstetter@jomasoft.ch
In reply to: Thomas Munro (#1)
Re: Solaris tar issues, or other reason why margay fails 010_pg_basebackup?

Hi

Is there a way to configure which tar to use?

GNU tar is available.

-bash-5.1$ ls -l /usr/gnu/bin/tar
-r-xr-xr-x 1 root bin 1226248 Jul 1 2022 /usr/gnu/bin/tar

Which tar file is used?
I could try to untar manually to see what happens.

Best regards,
Marcel

On 17.04.2024 at 06:21, Thomas Munro wrote:


#3 Thomas Munro
thomas.munro@gmail.com
In reply to: Marcel Hofstetter (#2)
Re: Solaris tar issues, or other reason why margay fails 010_pg_basebackup?

On Wed, Apr 17, 2024 at 7:17 PM Marcel Hofstetter
<hofstetter@jomasoft.ch> wrote:

Is there a way to configure which tar to use?

GNU tar is available.

-bash-5.1$ ls -l /usr/gnu/bin/tar
-r-xr-xr-x 1 root bin 1226248 Jul 1 2022 /usr/gnu/bin/tar

Cool. I guess you could fix the test either by setting
TAR=/usr/gnu/bin/tar or PATH=/usr/gnu/bin:$PATH.
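As far as I know, configure locates tar when it runs and the pg_basebackup TAP tests then take it from the TAR environment variable, so either setting should reach the test. The Python function below is a hypothetical illustration of that precedence (explicit TAR first, otherwise the first `tar` on PATH), not PostgreSQL's actual build logic.

```python
import os
import shutil
from typing import Mapping, Optional


def resolve_tar(env: Mapping[str, str]) -> Optional[str]:
    """Pick a tar binary: an explicit TAR setting wins, else search PATH.

    Hypothetical helper mirroring the two fixes suggested above.
    """
    explicit = env.get("TAR")
    if explicit:
        return explicit
    # shutil.which() checks PATH entries in order, so putting /usr/gnu/bin
    # first makes GNU tar shadow /usr/bin/tar.
    return shutil.which("tar", path=env.get("PATH", os.defpath))
```

Calling `resolve_tar(os.environ)` in the animal's environment would show which binary a PATH-based lookup ends up with.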

If we want to understand *why* it doesn't work, someone would need to
dig into that. It's possible that PostgreSQL is using some GNU
extension (if so, apparently the BSDs' tar is OK with it too, and I
guess AIX's and HP-UX's tar were too, in the recent past before we dropped
those OSes). I vaguely recall (maybe 20 years ago, time flies) that
Solaris tar wasn't able to extract some tarballs but I can't remember
why... I'm also happy to leave it at "Sun's tar doesn't work for us,
we don't know why" if you are.

#4 Marcel Hofstetter
hofstetter@jomasoft.ch
In reply to: Thomas Munro (#3)
Re: Solaris tar issues, or other reason why margay fails 010_pg_basebackup?

Hi Thomas

Using GNU tar makes pg_basebackup work.
It now fails at a later step.

Best regards,
Marcel

On 17.04.2024 at 10:52, Thomas Munro wrote:


#5 Thomas Munro
thomas.munro@gmail.com
In reply to: Marcel Hofstetter (#4)
Re: Solaris tar issues, or other reason why margay fails 010_pg_basebackup?

On Thu, Apr 18, 2024 at 1:40 AM Marcel Hofstetter
<hofstetter@jomasoft.ch> wrote:

Using GNU tar makes pg_basebackup work.

Thanks! I guess that'll remain a mystery.

It now fails at a later step.

Oh, this rings a bell:

[14:54:58] t/010_tab_completion.pl ..
Dubious, test returned 29 (wstat 7424, 0x1d00)
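The two numbers in that line say the same thing: 7424 is 0x1d00, a raw wait status whose low byte is 0 (no signal) and whose high byte, 0x1d, is the exit code 29. Since a Test::More script normally exits with the number of failed (or missing) subtests, this most likely means 29 failures rather than a crash. A small Python check of the arithmetic, assuming the traditional wait-status layout used on Linux and Solaris:

```python
import os


def describe_wait_status(wstat: int) -> str:
    """Decode a raw wait status like the "wstat" value prove reports."""
    if os.WIFEXITED(wstat):
        return f"exited with status {os.WEXITSTATUS(wstat)}"
    if os.WIFSIGNALED(wstat):
        return f"killed by signal {os.WTERMSIG(wstat)}"
    return "stopped or unknown"


# 7424 == 0x1d00: shifting off the low byte leaves the exit code.
assert 7424 == 0x1D00 and 7424 >> 8 == 29
print(describe_wait_status(7424))  # exited with status 29
```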

We had another thread[1] where we figured out that Solaris's termios
defaults include TABDLY=TAB3, meaning "expand tabs to spaces on
output", and that was upsetting our tab-completion test. Other Unixes
used to vary on this point too, but they all converged on not doing
that, except Solaris, apparently. Perhaps IPC::Run could fix that by
calling ->set_raw() on the pseudo-terminal, but I'm not very sure
about that.
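To make the termios point concrete, here is a Python sketch using the standard termios, pty and tty modules (not Perl's IO::Pty, which the test really uses). It reads the TABDLY bits of a pseudo-terminal's output flags, simulates a Solaris-style TAB3 default, and then shows two ways to stop tab expansion: resetting TABDLY to TAB0, or switching to raw mode, which clears OPOST and so disables output post-processing entirely. It assumes a platform whose termios module exposes TABDLY and TAB3; whether IO::Pty's set_raw() has the same effect is exactly the open question above.

```python
import os
import pty
import termios
import tty

OFLAG = 1  # index of c_oflag in the list returned by termios.tcgetattr()


def tab_mode(fd: int) -> int:
    """Return the TABDLY bits of the terminal's output flags."""
    return termios.tcgetattr(fd)[OFLAG] & termios.TABDLY


def set_tab_mode(fd: int, mode: int) -> None:
    """Replace the TABDLY bits (e.g. TAB0 or TAB3), leaving other flags alone."""
    attrs = termios.tcgetattr(fd)
    attrs[OFLAG] = (attrs[OFLAG] & ~termios.TABDLY) | mode
    termios.tcsetattr(fd, termios.TCSANOW, attrs)


def output_processing_enabled(fd: int) -> bool:
    """TABDLY only takes effect while OPOST (output post-processing) is set."""
    return bool(termios.tcgetattr(fd)[OFLAG] & termios.OPOST)


if __name__ == "__main__":
    master, slave = pty.openpty()
    try:
        set_tab_mode(slave, termios.TAB3)  # simulate the Solaris default
        print("expands tabs:", tab_mode(slave) == termios.TAB3)

        set_tab_mode(slave, termios.TAB0)  # option 1: stop expanding tabs
        print("expands tabs:", tab_mode(slave) == termios.TAB3)

        tty.setraw(slave)  # option 2: raw mode clears OPOST entirely
        print("post-processing:", output_processing_enabled(slave))
    finally:
        os.close(master)
        os.close(slave)
```

Resetting only TABDLY is the narrower change; raw mode also turns off other output processing such as newline translation, which a test harness may or may not want.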

This test suite is passing on pollock because it doesn't have IO::Pty
installed. Could you try uninstalling that perl package for now, so
we can see what breaks next?

[06:34:40] t/010_tab_completion.pl .. skipped: IO::Pty is needed to
run this test

[1]: /messages/by-id/MEYP282MB1669E2E11495A2DEAECE8736B6A7A@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#5)
Re: Solaris tar issues, or other reason why margay fails 010_pg_basebackup?

Thomas Munro <thomas.munro@gmail.com> writes:

This test suite is passing on pollock because it doesn't have IO::Pty
installed. Could you try uninstalling that perl package for now, so
we can see what breaks next?

If that's inconvenient for some reason, you could also skip the
tab-completion test by setting SKIP_READLINE_TESTS in the
animal's build_env options.

regards, tom lane

#7 Marcel Hofstetter
hofstetter@jomasoft.ch
In reply to: Tom Lane (#6)
Re: Solaris tar issues, or other reason why margay fails 010_pg_basebackup?

Thank you, Tom.

SKIP_READLINE_TESTS works. margay is now green again.

Best regards,
Marcel

On 17.04.2024 at 21:12, Tom Lane wrote:


#8 Thomas Munro
thomas.munro@gmail.com
In reply to: Marcel Hofstetter (#7)
Re: Solaris tar issues, or other reason why margay fails 010_pg_basebackup?

On Fri, Apr 19, 2024 at 12:57 AM Marcel Hofstetter
<hofstetter@jomasoft.ch> wrote:

SKIP_READLINE_TESTS works. margay is now green again.

Great! FTR there was a third thing revealed by margay since you
enabled the TAP tests: commit e2a23576.

I would guess that the best chance of getting the readline stuff to
actually work would be to interest someone who hacks on
IPC::Run-and-related-stuff (*cough* Noah *cough*) and who has Solaris
access to look at that... I would guess it needs a one-line fix
relating to raw/cooked behaviour, but as the proverbial mechanic said,
most of the fee is for knowing where to hit it...