IPC::Run::time[r|out] vs our TAP tests
TIL that IPC::Run::timer is not the same as IPC::Run::timeout.
With a timer object you have to check $timer->is_expired to see
if the timeout has elapsed, but with a timeout object you don't
because it will throw a Perl exception upon timing out, probably
killing your test program.
It appears that a good chunk of our TAP codebase has not read this
memo, because I see plenty of places that are checking is_expired
in the naive belief that they'll still have control after a timeout
has fired.
The particular thing that started me down this road was wondering
why we are getting no useful failure details from buildfarm member
tanager's struggles with the tab-completion test case added by commit
927332b95 [1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tanager&dt=2024-04-04%2016%3A56%3A14. Apparently it's not seeing a match to what it expects
so it eventually times out, but all we get in the log is
[03:03:42.595](0.002s) ok 82 - complete an interpolated psql variable name
[03:03:42.597](0.002s) ok 83 - \\r works
IPC::Run: timeout on timer #1 at /usr/share/perl5/IPC/Run.pm line 2944.
# Postmaster PID for node "main" is 17308
### Stopping node "main" using mode immediate
We would have learned something useful if control had returned to
pump_until, or even better 010_tab_completion.pl's check_completion,
but it doesn't.
A minimum fix that seems to make this work better is as attached,
but I feel like somebody ought to examine all the IPC::Run::timer
and IPC::Run::timeout calls and see which ones are mistaken.
It's a little scary to convert a timeout to a timer because of
the hazard that someplace that would need to be checking for
is_expired isn't.
Also, the debug printout code at the bottom of check_completion
is quite useless, because control can never reach it since
BackgroundPsql::query_until will "die" on failure. I think that
that code worked when written, and I'm suspicious that 664d75753
broke it, but I've not dug deeply into the history.
regards, tom lane
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tanager&dt=2024-04-04%2016%3A56%3A14
Attachments:
use-timer-not-timeout-in-BackgroundPsql.patchtext/x-diff; charset=us-ascii; name=use-timer-not-timeout-in-BackgroundPsql.patchDownload+2-1
On 4 Apr 2024, at 23:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
A minimum fix that seems to make this work better is as attached,
but I feel like somebody ought to examine all the IPC::Run::timer
and IPC::Run::timeout calls and see which ones are mistaken.
It's a little scary to convert a timeout to a timer because of
the hazard that someplace that would need to be checking for
is_expired isn't.
Skimming this and a few callsites it seems reasonable to use a timer instead of
a timeout, but careful study is needed to make sure we're not introducing
anything subtly wrong in the other direction.
Also, the debug printout code at the bottom of check_completion
is quite useless, because control can never reach it since
BackgroundPsql::query_until will "die" on failure. I think that
that code worked when written, and I'm suspicious that 664d75753
broke it, but I've not dug deeply into the history.
AFAICT, in the previous coding the interactive_psql object would use a timer or
timeout based on what the caller provided, and check_completion used a timer so
the debug logging probably worked as written.
--
Daniel Gustafsson
On Thu, Apr 04, 2024 at 05:24:05PM -0400, Tom Lane wrote:
The particular thing that started me down this road was wondering
why we are getting no useful failure details from buildfarm member
tanager's struggles with the tab-completion test case added by commit
927332b95 [1].
Also please note that tanager has been offline from around the 19th of
March to the 3rd of April, explaining the delay in reporting the
failure in this new psql test. I've switched it back online two days
ago.
Tom, would you like me to test your patch directly on the host? That
should be pretty quick, even if I've not yet checked if the failure is
reproducible with a manual build, outside the buildfarm scripts.
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
On Thu, Apr 04, 2024 at 05:24:05PM -0400, Tom Lane wrote:
The particular thing that started me down this road was wondering
why we are getting no useful failure details from buildfarm member
tanager's struggles with the tab-completion test case added by commit
927332b95 [1].
Tom, would you like me to test your patch directly on the host? That
should be pretty quick, even if I've not yet checked if the failure is
reproducible with a manual build, outside the buildfarm scripts.
If you have time, that'd be great. What I suspect is that that
machine's readline isn't regurgitating the string verbatim but is
doing something fancy with backspaces or other control characters.
But we need to see what it's actually emitting before there's
much hope of adjusting the expected-output regex.
regards, tom lane
On Thu, Apr 04, 2024 at 07:09:53PM -0400, Tom Lane wrote:
If you have time, that'd be great. What I suspect is that that
machine's readline isn't regurgitating the string verbatim but is
doing something fancy with backspaces or other control characters.
But we need to see what it's actually emitting before there's
much hope of adjusting the expected-output regex.
I have been able to reproduce the failure manually and your patch is
providing more information, indeed, as of:
[10:21:44.017](0.002s) ok 83 - \r works
[10:24:45.462](181.445s) # pump_until: timeout expired when searching
for "(?^::\{\?VERBOSITY} )" with stream: "\echo :{?VERB^G^Mpostgres=#
\echo :\{\?VERBOSITY\} "
psql query timed out at
/home/pgbuildfarm/git/postgres/src/bin/psql/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
line 281.
This stuff is actually kind of funny on this host, "\echo :{?VERB\t"
completes to something incorrect, as of:
postgres=# \echo :\{\?VERBOSITY\}
Attaching the log file, for reference. Now I can see that this uses
libedit at 3.1-20181209, which is far from recent. I'd be OK to just
remove libedit from the build to remove this noise, still I am
wondering if 927332b95e77 got what it was trying to achieve actually
right. Thoughts?
--
Michael
Attachments:
regress_log_010_tab_completiontext/plain; charset=us-asciiDownload
Michael Paquier <michael@paquier.xyz> writes:
On Thu, Apr 04, 2024 at 07:09:53PM -0400, Tom Lane wrote:
If you have time, that'd be great. What I suspect is that that
machine's readline isn't regurgitating the string verbatim but is
doing something fancy with backspaces or other control characters.
I have been able to reproduce the failure manually and your patch is
providing more information, indeed, as of:
[10:21:44.017](0.002s) ok 83 - \r works
[10:24:45.462](181.445s) # pump_until: timeout expired when searching
for "(?^::\{\?VERBOSITY} )" with stream: "\echo :{?VERB^G^Mpostgres=#
\echo :\{\?VERBOSITY\} "
psql query timed out at
/home/pgbuildfarm/git/postgres/src/bin/psql/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
line 281.
This stuff is actually kind of funny on this host, "\echo :{?VERB\t"
completes to something incorrect, as of:
postgres=# \echo :\{\?VERBOSITY\}
Just to be clear: you see the extra backslashes if you try this
tab-completion manually?
Attaching the log file, for reference. Now I can see that this uses
libedit at 3.1-20181209, which is far from recent. I'd be OK to just
remove libedit from the build to remove this noise, still I am
wondering if 927332b95e77 got what it was trying to achieve actually
right. Thoughts?
It kind of looks like a libedit bug, but maybe we should dig more
deeply. I felt itchy about 927332b95e77 removing '{' from the
WORD_BREAKS set, and wondered exactly how that would change readline's
behavior. But even if that somehow accounts for the extra backslash
before '{', it's not clear how it could lead to '?' and '}' also
getting backslashed.
regards, tom lane
On Thu, Apr 04, 2024 at 10:31:24PM -0400, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
This stuff is actually kind of funny on this host, "\echo :{?VERB\t"
completes to something incorrect, as of:
postgres=# \echo :\{\?VERBOSITY\}Just to be clear: you see the extra backslashes if you try this
tab-completion manually?
Yeah, I do, after completing "\echo :{?VERB" with this version of
libedit. I see that this completes with backslashes added before '{',
'}' and '?'. The test is telling the same.
Attaching the log file, for reference. Now I can see that this uses
libedit at 3.1-20181209, which is far from recent. I'd be OK to just
remove libedit from the build to remove this noise, still I am
wondering if 927332b95e77 got what it was trying to achieve actually
right. Thoughts?It kind of looks like a libedit bug, but maybe we should dig more
deeply. I felt itchy about 927332b95e77 removing '{' from the
WORD_BREAKS set, and wondered exactly how that would change readline's
behavior. But even if that somehow accounts for the extra backslash
before '{', it's not clear how it could lead to '?' and '}' also
getting backslashed.
I don't have a clear idea, either. I also feel uneasy about
927332b95e77 and its change of WORD_BREAKS, but this has the smell
of a bug from an outdated libedit version.
--
Michael
On 2024-04-05 04:37 +0200, Michael Paquier wrote:
On Thu, Apr 04, 2024 at 10:31:24PM -0400, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
This stuff is actually kind of funny on this host, "\echo :{?VERB\t"
completes to something incorrect, as of:
postgres=# \echo :\{\?VERBOSITY\}Just to be clear: you see the extra backslashes if you try this
tab-completion manually?Yeah, I do, after completing "\echo :{?VERB" with this version of
libedit. I see that this completes with backslashes added before '{',
'}' and '?'. The test is telling the same.Attaching the log file, for reference. Now I can see that this uses
libedit at 3.1-20181209, which is far from recent. I'd be OK to just
remove libedit from the build to remove this noise, still I am
wondering if 927332b95e77 got what it was trying to achieve actually
right. Thoughts?It kind of looks like a libedit bug, but maybe we should dig more
deeply. I felt itchy about 927332b95e77 removing '{' from the
WORD_BREAKS set, and wondered exactly how that would change readline's
behavior. But even if that somehow accounts for the extra backslash
before '{', it's not clear how it could lead to '?' and '}' also
getting backslashed.I don't have a clear idea, either. I also feel uneasy about
927332b95e77 and its change of WORD_BREAKS, but this has the smell
of a bug from an outdated libedit version.
It works with the latest libedit 20230828-3.1. Have to check the NetBSD
source to find out what changed since 20181209-3.1.
https://github.com/NetBSD/src/tree/trunk/lib/libedit
--
Erik
Erik Wienhold <ewie@ewie.name> writes:
It works with the latest libedit 20230828-3.1. Have to check the NetBSD
source to find out what changed since 20181209-3.1.
Yeah, the test is passing on mamba which is running the (just
officially released) NetBSD 10.0. I'm not sure whether 10.0
has the "latest" libedit or something a little further back.
sidewinder, with NetBSD 9.3, is happy as well. But 20181209
presumably belongs to NetBSD 8.x, which is theoretically still
in support, so maybe it's worth poking into.
regards, tom lane
On 2024-04-05 05:10 +0200, Tom Lane wrote:
Erik Wienhold <ewie@ewie.name> writes:
It works with the latest libedit 20230828-3.1. Have to check the NetBSD
source to find out what changed since 20181209-3.1.Yeah, the test is passing on mamba which is running the (just
officially released) NetBSD 10.0. I'm not sure whether 10.0
has the "latest" libedit or something a little further back.
sidewinder, with NetBSD 9.3, is happy as well. But 20181209
presumably belongs to NetBSD 8.x, which is theoretically still
in support, so maybe it's worth poking into.
Having a look right now. Change [1]https://github.com/NetBSD/src/commit/12863d4d7917df8a7ef5ad9dab6bb719018a22d1 looks like a good candidate which
is likely in 20221030-3.1.
I'm trying to build Postgres with that older libedit version but can't
figure out what options to pass to ./configure so that it picks
/usr/local/lib/libedit.so instead of /usr/lib/libedit.so. This didn't
work:
LDFLAGS='-L/usr/local/lib' ./configure --with-libedit-preferred
(My ld fu is not so great.)
[1]: https://github.com/NetBSD/src/commit/12863d4d7917df8a7ef5ad9dab6bb719018a22d1
--
Erik
Erik Wienhold <ewie@ewie.name> writes:
I'm trying to build Postgres with that older libedit version but can't
figure out what options to pass to ./configure so that it picks
/usr/local/lib/libedit.so instead of /usr/lib/libedit.so. This didn't
work:
You probably want configure --with-libs=/usr/local/lib,
and likely also --with-includes=/usr/local/include.
regards, tom lane
On 2024-04-05 05:37 +0200, Tom Lane wrote:
Erik Wienhold <ewie@ewie.name> writes:
I'm trying to build Postgres with that older libedit version but can't
figure out what options to pass to ./configure so that it picks
/usr/local/lib/libedit.so instead of /usr/lib/libedit.so. This didn't
work:You probably want configure --with-libs=/usr/local/lib,
and likely also --with-includes=/usr/local/include.
Thanks Tom. But I also have to run psql with:
LD_LIBRARY_PATH=/usr/local/lib:/usr/lib:/lib src/bin/psql/psql
Libedit 20191025-3.1 is the first version where ":{?VERB<tab>" works as
expected. The previous release 20190324-3.1 still produces the escaped
output that Michael found. That narrows down the changes to everything
between [1]https://github.com/NetBSD/src/commit/e09538bda2f805200d0f7ae09fb9b7f2f5ed75f2 (changed on 2019-03-24 but not included in 20190324-3.1) and
[2]: https://github.com/NetBSD/src/commit/de11d876419df3570c2418468613aebcebafe6ae
[1]: https://github.com/NetBSD/src/commit/e09538bda2f805200d0f7ae09fb9b7f2f5ed75f2
[2]: https://github.com/NetBSD/src/commit/de11d876419df3570c2418468613aebcebafe6ae
--
Erik
Erik Wienhold <ewie@ewie.name> writes:
Libedit 20191025-3.1 is the first version where ":{?VERB<tab>" works as
expected. The previous release 20190324-3.1 still produces the escaped
output that Michael found. That narrows down the changes to everything
between [1] (changed on 2019-03-24 but not included in 20190324-3.1) and
[2] (both inclusive).
Hm. I just installed NetBSD 8.2 in a VM, and it passes this test:
# +++ tap install-check in src/bin/psql +++
t/001_basic.pl ........... ok
t/010_tab_completion.pl .. ok
t/020_cancel.pl .......... ok
All tests successful.
So it seems like the bug does not exist in any currently-supported
NetBSD release. Debian has been known to ship obsolete libedit
versions, though.
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
Erik Wienhold <ewie@ewie.name> writes:
Libedit 20191025-3.1 is the first version where ":{?VERB<tab>" works as
expected. The previous release 20190324-3.1 still produces the escaped
output that Michael found. That narrows down the changes to everything
between [1] (changed on 2019-03-24 but not included in 20190324-3.1) and
[2] (both inclusive).Hm. I just installed NetBSD 8.2 in a VM, and it passes this test:
# +++ tap install-check in src/bin/psql +++
t/001_basic.pl ........... ok
t/010_tab_completion.pl .. ok
t/020_cancel.pl .......... ok
All tests successful.So it seems like the bug does not exist in any currently-supported
NetBSD release. Debian has been known to ship obsolete libedit
versions, though.
Both the current (bokworm/12) and previous (bullseye/11) versions of
Debian have new enough libedits to not be affected by this bug:
libedit | 3.1-20181209-1 | oldoldstable | source
libedit | 3.1-20191231-2 | oldstable | source
libedit | 3.1-20221030-2 | stable | source
libedit | 3.1-20230828-1 | testing | source
libedit | 3.1-20230828-1 | unstable | source
libedit | 3.1-20230828-1 | unstable-debug | source
But in bullseye they decided that OpenSSL is a system library as far as
the GPL is concerned, so are linking directly to readline.
And even before then their psql wrapper would LD_PRELOAD readline
instead if installed, so approximately nobody actually ever used psql
with libedit on Debian.
regards, tom lane
- ilmari
On 2024-04-04 Th 17:24, Tom Lane wrote:
TIL that IPC::Run::timer is not the same as IPC::Run::timeout.
With a timer object you have to check $timer->is_expired to see
if the timeout has elapsed, but with a timeout object you don't
because it will throw a Perl exception upon timing out, probably
killing your test program.It appears that a good chunk of our TAP codebase has not read this
memo, because I see plenty of places that are checking is_expired
in the naive belief that they'll still have control after a timeout
has fired.
I started having a look at these.
Here are the cases I found:
./src/bin/psql/t/010_tab_completion.pl: my $okay = ($out =~ $pattern
&& !$h->{timeout}->is_expired);
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: until
$self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: die "psql startup
timed out" if $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: die "psql query
timed out" if $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: die "psql query
timed out" if $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/Cluster.pm: # timeout, which
we'll handle by testing is_expired
./src/test/perl/PostgreSQL/Test/Cluster.pm: unless
$timeout->is_expired;
./src/test/perl/PostgreSQL/Test/Cluster.pm:timeout is the
IPC::Run::Timeout object whose is_expired method can be tested
./src/test/perl/PostgreSQL/Test/Cluster.pm: # timeout, which
we'll handle by testing is_expired
./src/test/perl/PostgreSQL/Test/Cluster.pm: unless
$timeout->is_expired;
./src/test/perl/PostgreSQL/Test/Utils.pm: if ($timeout->is_expired)
./src/test/recovery/t/021_row_visibility.pl: if
($psql_timeout->is_expired)
./src/test/recovery/t/032_relfilenode_reuse.pl: if
($psql_timeout->is_expired)
Those in Cluster.pm look correct - they are doing the run() in an eval
block and testing for the is_expired setting in an exception block. The
other cases look more suspect. I'll take a closer look.
cheers
andrew
--
Andrew Dunstan
EDB:https://www.enterprisedb.com
=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= <ilmari@ilmari.org> writes:
Tom Lane <tgl@sss.pgh.pa.us> writes:
So it seems like the bug does not exist in any currently-supported
NetBSD release. Debian has been known to ship obsolete libedit
versions, though.
Both the current (bokworm/12) and previous (bullseye/11) versions of
Debian have new enough libedits to not be affected by this bug:
...
But in bullseye they decided that OpenSSL is a system library as far as
the GPL is concerned, so are linking directly to readline.
And even before then their psql wrapper would LD_PRELOAD readline
instead if installed, so approximately nobody actually ever used psql
with libedit on Debian.
Based on this info, I'm disinclined to put work into trying to
make the case behave correctly with that old libedit version, or
even to lobotomize the test case enough so it would pass.
What I suggest Michael do with tanager is install the
OS-version-appropriate version of GNU readline, so that the animal
will test what ilmari describes as the actually common use-case.
(I see that what he did for the moment is add --without-readline.
Perhaps that's a decent long-term choice too, because I think we
have rather little coverage of that option except on Windows.)
regards, tom lane
On Fri, Apr 05, 2024 at 05:18:51PM -0400, Tom Lane wrote:
What I suggest Michael do with tanager is install the
OS-version-appropriate version of GNU readline, so that the animal
will test what ilmari describes as the actually common use-case.
Thanks for the investigations! It's clear that this version of
libedit is borked.
(I see that what he did for the moment is add --without-readline.
Perhaps that's a decent long-term choice too, because I think we
have rather little coverage of that option except on Windows.)
Yeah, my motivation did not go beyond that so you can blame me for
being lazy. ;)
Note also the --without-icu on this host that has little coverage, as
well.
--
Michael
On Fri, Apr 05, 2024 at 05:18:51PM -0400, Tom Lane wrote:
Based on this info, I'm disinclined to put work into trying to
make the case behave correctly with that old libedit version, or
even to lobotomize the test case enough so it would pass.
By the way, are you planning to do something like [1]/messages/by-id/1100715.1712265845@sss.pgh.pa.us -- Michael? I've not
looked in details at the callers of IPC::Run::timeout, still the extra
debug output would be nice.
[1]: /messages/by-id/1100715.1712265845@sss.pgh.pa.us -- Michael
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
By the way, are you planning to do something like [1]? I've not
looked in details at the callers of IPC::Run::timeout, still the extra
debug output would be nice.
It needs more review I think - I didn't check every call site to see
if anything would be broken. I believe Andrew has undertaken a
survey of all the timeout/timer calls, but if he doesn't produce
anything I might have a go at it after awhile.
regards, tom lane
On 2024-04-09 Tu 09:46, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
By the way, are you planning to do something like [1]? I've not
looked in details at the callers of IPC::Run::timeout, still the extra
debug output would be nice.It needs more review I think - I didn't check every call site to see
if anything would be broken. I believe Andrew has undertaken a
survey of all the timeout/timer calls, but if he doesn't produce
anything I might have a go at it after awhile.
What I looked at so far was the use of is_expired, but when you look
into that you see that you need to delve further, to where timeout/timer
objects are created and passed around. I'll take a closer look when I
have done some incremental json housekeeping.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com