IPC::Run::time[r|out] vs our TAP tests
TIL that IPC::Run::timer is not the same as IPC::Run::timeout.
With a timer object you have to check $timer->is_expired to see
if the timeout has elapsed, but with a timeout object you don't
because it will throw a Perl exception upon timing out, probably
killing your test program.
It appears that a good chunk of our TAP codebase has not read this
memo, because I see plenty of places that are checking is_expired
in the naive belief that they'll still have control after a timeout
has fired.
The particular thing that started me down this road was wondering
why we are getting no useful failure details from buildfarm member
tanager's struggles with the tab-completion test case added by commit
927332b95 [1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tanager&dt=2024-04-04%2016%3A56%3A14. Apparently it's not seeing a match to what it expects
so it eventually times out, but all we get in the log is
[03:03:42.595](0.002s) ok 82 - complete an interpolated psql variable name
[03:03:42.597](0.002s) ok 83 - \\r works
IPC::Run: timeout on timer #1 at /usr/share/perl5/IPC/Run.pm line 2944.
# Postmaster PID for node "main" is 17308
### Stopping node "main" using mode immediate
We would have learned something useful if control had returned to
pump_until, or even better 010_tab_completion.pl's check_completion,
but it doesn't.
A minimum fix that seems to make this work better is as attached,
but I feel like somebody ought to examine all the IPC::Run::timer
and IPC::Run::timeout calls and see which ones are mistaken.
It's a little scary to convert a timeout to a timer because of
the hazard that someplace that would need to be checking for
is_expired isn't.
Also, the debug printout code at the bottom of check_completion
is quite useless, because control can never reach it since
BackgroundPsql::query_until will "die" on failure. I think that
that code worked when written, and I'm suspicious that 664d75753
broke it, but I've not dug deeply into the history.
regards, tom lane
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tanager&dt=2024-04-04%2016%3A56%3A14
Attachments:
use-timer-not-timeout-in-BackgroundPsql.patchtext/x-diff; charset=us-ascii; name=use-timer-not-timeout-in-BackgroundPsql.patchDownload
diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
index 4091c311b8..72bb21e160 100644
--- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
+++ b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
@@ -96,7 +96,7 @@ sub new
"Forbidden caller of constructor: package: $package, file: $file:$line"
unless $package->isa('PostgreSQL::Test::Cluster');
- $psql->{timeout} = IPC::Run::timeout(
+ $psql->{timeout} = IPC::Run::timer(
defined($timeout)
? $timeout
: $PostgreSQL::Test::Utils::timeout_default);
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 42d5a50dc8..8fd0426d30 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -429,6 +429,7 @@ Pump until string is matched on the specified stream, or timeout occurs.
sub pump_until
{
my ($proc, $timeout, $stream, $until) = @_;
+ die "timeout should be a timer" if defined($timeout->exception);
$proc->pump_nb();
while (1)
{
On 4 Apr 2024, at 23:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
A minimum fix that seems to make this work better is as attached,
but I feel like somebody ought to examine all the IPC::Run::timer
and IPC::Run::timeout calls and see which ones are mistaken.
It's a little scary to convert a timeout to a timer because of
the hazard that someplace that would need to be checking for
is_expired isn't.
Skimming this and a few callsites it seems reasonable to use a timer instead of
a timeout, but careful study is needed to make sure we're not introducing
anything subtly wrong in the other direction.
Also, the debug printout code at the bottom of check_completion
is quite useless, because control can never reach it since
BackgroundPsql::query_until will "die" on failure. I think that
that code worked when written, and I'm suspicious that 664d75753
broke it, but I've not dug deeply into the history.
AFAICT, in the previous coding the interactive_psql object would use a timer or
timeout based on what the caller provided, and check_completion used a timer so
the debug logging probably worked as written.
--
Daniel Gustafsson
On Thu, Apr 04, 2024 at 05:24:05PM -0400, Tom Lane wrote:
The particular thing that started me down this road was wondering
why we are getting no useful failure details from buildfarm member
tanager's struggles with the tab-completion test case added by commit
927332b95 [1].
Also please note that tanager has been offline from around the 19th of
March to the 3rd of April, explaining the delay in reporting the
failure in this new psql test. I've switched it back online two days
ago.
Tom, would you like me to test your patch directly on the host? That
should be pretty quick, even if I've not yet checked if the failure is
reproducible with a manual build, outside the buildfarm scripts.
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
On Thu, Apr 04, 2024 at 05:24:05PM -0400, Tom Lane wrote:
The particular thing that started me down this road was wondering
why we are getting no useful failure details from buildfarm member
tanager's struggles with the tab-completion test case added by commit
927332b95 [1].
Tom, would you like me to test your patch directly on the host? That
should be pretty quick, even if I've not yet checked if the failure is
reproducible with a manual build, outside the buildfarm scripts.
If you have time, that'd be great. What I suspect is that that
machine's readline isn't regurgitating the string verbatim but is
doing something fancy with backspaces or other control characters.
But we need to see what it's actually emitting before there's
much hope of adjusting the expected-output regex.
regards, tom lane
On Thu, Apr 04, 2024 at 07:09:53PM -0400, Tom Lane wrote:
If you have time, that'd be great. What I suspect is that that
machine's readline isn't regurgitating the string verbatim but is
doing something fancy with backspaces or other control characters.
But we need to see what it's actually emitting before there's
much hope of adjusting the expected-output regex.
I have been able to reproduce the failure manually and your patch is
providing more information, indeed, as of:
[10:21:44.017](0.002s) ok 83 - \r works
[10:24:45.462](181.445s) # pump_until: timeout expired when searching
for "(?^::\{\?VERBOSITY} )" with stream: "\echo :{?VERB^G^Mpostgres=#
\echo :\{\?VERBOSITY\} "
psql query timed out at
/home/pgbuildfarm/git/postgres/src/bin/psql/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
line 281.
This stuff is actually kind of funny on this host, "\echo :{?VERB\t"
completes to something incorrect, as of:
postgres=# \echo :\{\?VERBOSITY\}
Attaching the log file, for reference. Now I can see that this uses
libedit at 3.1-20181209, which is far from recent. I'd be OK to just
remove libedit from the build to remove this noise, still I am
wondering if 927332b95e77 got what it was trying to achieve actually
right. Thoughts?
--
Michael
Attachments:
Michael Paquier <michael@paquier.xyz> writes:
On Thu, Apr 04, 2024 at 07:09:53PM -0400, Tom Lane wrote:
If you have time, that'd be great. What I suspect is that that
machine's readline isn't regurgitating the string verbatim but is
doing something fancy with backspaces or other control characters.
I have been able to reproduce the failure manually and your patch is
providing more information, indeed, as of:
[10:21:44.017](0.002s) ok 83 - \r works
[10:24:45.462](181.445s) # pump_until: timeout expired when searching
for "(?^::\{\?VERBOSITY} )" with stream: "\echo :{?VERB^G^Mpostgres=#
\echo :\{\?VERBOSITY\} "
psql query timed out at
/home/pgbuildfarm/git/postgres/src/bin/psql/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
line 281.
This stuff is actually kind of funny on this host, "\echo :{?VERB\t"
completes to something incorrect, as of:
postgres=# \echo :\{\?VERBOSITY\}
Just to be clear: you see the extra backslashes if you try this
tab-completion manually?
Attaching the log file, for reference. Now I can see that this uses
libedit at 3.1-20181209, which is far from recent. I'd be OK to just
remove libedit from the build to remove this noise, still I am
wondering if 927332b95e77 got what it was trying to achieve actually
right. Thoughts?
It kind of looks like a libedit bug, but maybe we should dig more
deeply. I felt itchy about 927332b95e77 removing '{' from the
WORD_BREAKS set, and wondered exactly how that would change readline's
behavior. But even if that somehow accounts for the extra backslash
before '{', it's not clear how it could lead to '?' and '}' also
getting backslashed.
regards, tom lane
On Thu, Apr 04, 2024 at 10:31:24PM -0400, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
This stuff is actually kind of funny on this host, "\echo :{?VERB\t"
completes to something incorrect, as of:
postgres=# \echo :\{\?VERBOSITY\}Just to be clear: you see the extra backslashes if you try this
tab-completion manually?
Yeah, I do, after completing "\echo :{?VERB" with this version of
libedit. I see that this completes with backslashes added before '{',
'}' and '?'. The test is telling the same.
Attaching the log file, for reference. Now I can see that this uses
libedit at 3.1-20181209, which is far from recent. I'd be OK to just
remove libedit from the build to remove this noise, still I am
wondering if 927332b95e77 got what it was trying to achieve actually
right. Thoughts?It kind of looks like a libedit bug, but maybe we should dig more
deeply. I felt itchy about 927332b95e77 removing '{' from the
WORD_BREAKS set, and wondered exactly how that would change readline's
behavior. But even if that somehow accounts for the extra backslash
before '{', it's not clear how it could lead to '?' and '}' also
getting backslashed.
I don't have a clear idea, either. I also feel uneasy about
927332b95e77 and its change of WORD_BREAKS, but this has the smell
of a bug from an outdated libedit version.
--
Michael
On 2024-04-05 04:37 +0200, Michael Paquier wrote:
On Thu, Apr 04, 2024 at 10:31:24PM -0400, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
This stuff is actually kind of funny on this host, "\echo :{?VERB\t"
completes to something incorrect, as of:
postgres=# \echo :\{\?VERBOSITY\}Just to be clear: you see the extra backslashes if you try this
tab-completion manually?Yeah, I do, after completing "\echo :{?VERB" with this version of
libedit. I see that this completes with backslashes added before '{',
'}' and '?'. The test is telling the same.Attaching the log file, for reference. Now I can see that this uses
libedit at 3.1-20181209, which is far from recent. I'd be OK to just
remove libedit from the build to remove this noise, still I am
wondering if 927332b95e77 got what it was trying to achieve actually
right. Thoughts?It kind of looks like a libedit bug, but maybe we should dig more
deeply. I felt itchy about 927332b95e77 removing '{' from the
WORD_BREAKS set, and wondered exactly how that would change readline's
behavior. But even if that somehow accounts for the extra backslash
before '{', it's not clear how it could lead to '?' and '}' also
getting backslashed.I don't have a clear idea, either. I also feel uneasy about
927332b95e77 and its change of WORD_BREAKS, but this has the smell
of a bug from an outdated libedit version.
It works with the latest libedit 20230828-3.1. Have to check the NetBSD
source to find out what changed since 20181209-3.1.
https://github.com/NetBSD/src/tree/trunk/lib/libedit
--
Erik
Erik Wienhold <ewie@ewie.name> writes:
It works with the latest libedit 20230828-3.1. Have to check the NetBSD
source to find out what changed since 20181209-3.1.
Yeah, the test is passing on mamba which is running the (just
officially released) NetBSD 10.0. I'm not sure whether 10.0
has the "latest" libedit or something a little further back.
sidewinder, with NetBSD 9.3, is happy as well. But 20181209
presumably belongs to NetBSD 8.x, which is theoretically still
in support, so maybe it's worth poking into.
regards, tom lane
On 2024-04-05 05:10 +0200, Tom Lane wrote:
Erik Wienhold <ewie@ewie.name> writes:
It works with the latest libedit 20230828-3.1. Have to check the NetBSD
source to find out what changed since 20181209-3.1.Yeah, the test is passing on mamba which is running the (just
officially released) NetBSD 10.0. I'm not sure whether 10.0
has the "latest" libedit or something a little further back.
sidewinder, with NetBSD 9.3, is happy as well. But 20181209
presumably belongs to NetBSD 8.x, which is theoretically still
in support, so maybe it's worth poking into.
Having a look right now. Change [1]https://github.com/NetBSD/src/commit/12863d4d7917df8a7ef5ad9dab6bb719018a22d1 looks like a good candidate which
is likely in 20221030-3.1.
I'm trying to build Postgres with that older libedit version but can't
figure out what options to pass to ./configure so that it picks
/usr/local/lib/libedit.so instead of /usr/lib/libedit.so. This didn't
work:
LDFLAGS='-L/usr/local/lib' ./configure --with-libedit-preferred
(My ld fu is not so great.)
[1]: https://github.com/NetBSD/src/commit/12863d4d7917df8a7ef5ad9dab6bb719018a22d1
--
Erik
Erik Wienhold <ewie@ewie.name> writes:
I'm trying to build Postgres with that older libedit version but can't
figure out what options to pass to ./configure so that it picks
/usr/local/lib/libedit.so instead of /usr/lib/libedit.so. This didn't
work:
You probably want configure --with-libs=/usr/local/lib,
and likely also --with-includes=/usr/local/include.
regards, tom lane
On 2024-04-05 05:37 +0200, Tom Lane wrote:
Erik Wienhold <ewie@ewie.name> writes:
I'm trying to build Postgres with that older libedit version but can't
figure out what options to pass to ./configure so that it picks
/usr/local/lib/libedit.so instead of /usr/lib/libedit.so. This didn't
work:You probably want configure --with-libs=/usr/local/lib,
and likely also --with-includes=/usr/local/include.
Thanks Tom. But I also have to run psql with:
LD_LIBRARY_PATH=/usr/local/lib:/usr/lib:/lib src/bin/psql/psql
Libedit 20191025-3.1 is the first version where ":{?VERB<tab>" works as
expected. The previous release 20190324-3.1 still produces the escaped
output that Michael found. That narrows down the changes to everything
between [1]https://github.com/NetBSD/src/commit/e09538bda2f805200d0f7ae09fb9b7f2f5ed75f2 (changed on 2019-03-24 but not included in 20190324-3.1) and
[2]: https://github.com/NetBSD/src/commit/de11d876419df3570c2418468613aebcebafe6ae
[1]: https://github.com/NetBSD/src/commit/e09538bda2f805200d0f7ae09fb9b7f2f5ed75f2
[2]: https://github.com/NetBSD/src/commit/de11d876419df3570c2418468613aebcebafe6ae
--
Erik
Erik Wienhold <ewie@ewie.name> writes:
Libedit 20191025-3.1 is the first version where ":{?VERB<tab>" works as
expected. The previous release 20190324-3.1 still produces the escaped
output that Michael found. That narrows down the changes to everything
between [1] (changed on 2019-03-24 but not included in 20190324-3.1) and
[2] (both inclusive).
Hm. I just installed NetBSD 8.2 in a VM, and it passes this test:
# +++ tap install-check in src/bin/psql +++
t/001_basic.pl ........... ok
t/010_tab_completion.pl .. ok
t/020_cancel.pl .......... ok
All tests successful.
So it seems like the bug does not exist in any currently-supported
NetBSD release. Debian has been known to ship obsolete libedit
versions, though.
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
Erik Wienhold <ewie@ewie.name> writes:
Libedit 20191025-3.1 is the first version where ":{?VERB<tab>" works as
expected. The previous release 20190324-3.1 still produces the escaped
output that Michael found. That narrows down the changes to everything
between [1] (changed on 2019-03-24 but not included in 20190324-3.1) and
[2] (both inclusive).Hm. I just installed NetBSD 8.2 in a VM, and it passes this test:
# +++ tap install-check in src/bin/psql +++
t/001_basic.pl ........... ok
t/010_tab_completion.pl .. ok
t/020_cancel.pl .......... ok
All tests successful.So it seems like the bug does not exist in any currently-supported
NetBSD release. Debian has been known to ship obsolete libedit
versions, though.
Both the current (bokworm/12) and previous (bullseye/11) versions of
Debian have new enough libedits to not be affected by this bug:
libedit | 3.1-20181209-1 | oldoldstable | source
libedit | 3.1-20191231-2 | oldstable | source
libedit | 3.1-20221030-2 | stable | source
libedit | 3.1-20230828-1 | testing | source
libedit | 3.1-20230828-1 | unstable | source
libedit | 3.1-20230828-1 | unstable-debug | source
But in bullseye they decided that OpenSSL is a system library as far as
the GPL is concerned, so are linking directly to readline.
And even before then their psql wrapper would LD_PRELOAD readline
instead if installed, so approximately nobody actually ever used psql
with libedit on Debian.
regards, tom lane
- ilmari
On 2024-04-04 Th 17:24, Tom Lane wrote:
TIL that IPC::Run::timer is not the same as IPC::Run::timeout.
With a timer object you have to check $timer->is_expired to see
if the timeout has elapsed, but with a timeout object you don't
because it will throw a Perl exception upon timing out, probably
killing your test program.It appears that a good chunk of our TAP codebase has not read this
memo, because I see plenty of places that are checking is_expired
in the naive belief that they'll still have control after a timeout
has fired.
I started having a look at these.
Here are the cases I found:
./src/bin/psql/t/010_tab_completion.pl: my $okay = ($out =~ $pattern
&& !$h->{timeout}->is_expired);
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: until
$self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: die "psql startup
timed out" if $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: die "psql query
timed out" if $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/BackgroundPsql.pm: die "psql query
timed out" if $self->{timeout}->is_expired;
./src/test/perl/PostgreSQL/Test/Cluster.pm: # timeout, which
we'll handle by testing is_expired
./src/test/perl/PostgreSQL/Test/Cluster.pm: unless
$timeout->is_expired;
./src/test/perl/PostgreSQL/Test/Cluster.pm:timeout is the
IPC::Run::Timeout object whose is_expired method can be tested
./src/test/perl/PostgreSQL/Test/Cluster.pm: # timeout, which
we'll handle by testing is_expired
./src/test/perl/PostgreSQL/Test/Cluster.pm: unless
$timeout->is_expired;
./src/test/perl/PostgreSQL/Test/Utils.pm: if ($timeout->is_expired)
./src/test/recovery/t/021_row_visibility.pl: if
($psql_timeout->is_expired)
./src/test/recovery/t/032_relfilenode_reuse.pl: if
($psql_timeout->is_expired)
Those in Cluster.pm look correct - they are doing the run() in an eval
block and testing for the is_expired setting in an exception block. The
other cases look more suspect. I'll take a closer look.
cheers
andrew
--
Andrew Dunstan
EDB:https://www.enterprisedb.com
=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= <ilmari@ilmari.org> writes:
Tom Lane <tgl@sss.pgh.pa.us> writes:
So it seems like the bug does not exist in any currently-supported
NetBSD release. Debian has been known to ship obsolete libedit
versions, though.
Both the current (bokworm/12) and previous (bullseye/11) versions of
Debian have new enough libedits to not be affected by this bug:
...
But in bullseye they decided that OpenSSL is a system library as far as
the GPL is concerned, so are linking directly to readline.
And even before then their psql wrapper would LD_PRELOAD readline
instead if installed, so approximately nobody actually ever used psql
with libedit on Debian.
Based on this info, I'm disinclined to put work into trying to
make the case behave correctly with that old libedit version, or
even to lobotomize the test case enough so it would pass.
What I suggest Michael do with tanager is install the
OS-version-appropriate version of GNU readline, so that the animal
will test what ilmari describes as the actually common use-case.
(I see that what he did for the moment is add --without-readline.
Perhaps that's a decent long-term choice too, because I think we
have rather little coverage of that option except on Windows.)
regards, tom lane
On Fri, Apr 05, 2024 at 05:18:51PM -0400, Tom Lane wrote:
What I suggest Michael do with tanager is install the
OS-version-appropriate version of GNU readline, so that the animal
will test what ilmari describes as the actually common use-case.
Thanks for the investigations! It's clear that this version of
libedit is borked.
(I see that what he did for the moment is add --without-readline.
Perhaps that's a decent long-term choice too, because I think we
have rather little coverage of that option except on Windows.)
Yeah, my motivation did not go beyond that so you can blame me for
being lazy. ;)
Note also the --without-icu on this host that has little coverage, as
well.
--
Michael
On Fri, Apr 05, 2024 at 05:18:51PM -0400, Tom Lane wrote:
Based on this info, I'm disinclined to put work into trying to
make the case behave correctly with that old libedit version, or
even to lobotomize the test case enough so it would pass.
By the way, are you planning to do something like [1]/messages/by-id/1100715.1712265845@sss.pgh.pa.us -- Michael? I've not
looked in details at the callers of IPC::Run::timeout, still the extra
debug output would be nice.
[1]: /messages/by-id/1100715.1712265845@sss.pgh.pa.us -- Michael
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
By the way, are you planning to do something like [1]? I've not
looked in details at the callers of IPC::Run::timeout, still the extra
debug output would be nice.
It needs more review I think - I didn't check every call site to see
if anything would be broken. I believe Andrew has undertaken a
survey of all the timeout/timer calls, but if he doesn't produce
anything I might have a go at it after awhile.
regards, tom lane
On 2024-04-09 Tu 09:46, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
By the way, are you planning to do something like [1]? I've not
looked in details at the callers of IPC::Run::timeout, still the extra
debug output would be nice.It needs more review I think - I didn't check every call site to see
if anything would be broken. I believe Andrew has undertaken a
survey of all the timeout/timer calls, but if he doesn't produce
anything I might have a go at it after awhile.
What I looked at so far was the use of is_expired, but when you look
into that you see that you need to delve further, to where timeout/timer
objects are created and passed around. I'll take a closer look when I
have done some incremental json housekeeping.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Thu, Apr 4, 2024 at 10:38 PM Michael Paquier <michael@paquier.xyz> wrote:
It kind of looks like a libedit bug, but maybe we should dig more
deeply. I felt itchy about 927332b95e77 removing '{' from the
WORD_BREAKS set, and wondered exactly how that would change readline's
behavior. But even if that somehow accounts for the extra backslash
before '{', it's not clear how it could lead to '?' and '}' also
getting backslashed.I don't have a clear idea, either. I also feel uneasy about
927332b95e77 and its change of WORD_BREAKS, but this has the smell
of a bug from an outdated libedit version.
I too felt uneasy about that commit, for the same reason. However,
there is a justification for the change in the commit message which is
not obviously wrong, namely that ":{?name} is the only psql syntax
using the '{' sign". And in fact, SQL basically doesn't use '{' for
anything, either. We do see { showing up inside of quoted strings, for
arrays or JSON, but I would guess that the word-break characters
aren't going to affect behavior within a quoted string. So it still
seems like it should be OK? Another thing that makes me think that my
unease may be unfounded is that the matching character '}' isn't in
WORD_BREAKS either, and I would have thought that if we needed one
we'd need both.
But does anyone else have a more specific reason for thinking that
this might be a problem?
--
Robert Haas
EDB: http://www.enterprisedb.com
On 4 Apr 2024, at 23:46, Daniel Gustafsson <daniel@yesql.se> wrote:
On 4 Apr 2024, at 23:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
A minimum fix that seems to make this work better is as attached,
but I feel like somebody ought to examine all the IPC::Run::timer
and IPC::Run::timeout calls and see which ones are mistaken.
It's a little scary to convert a timeout to a timer because of
the hazard that someplace that would need to be checking for
is_expired isn't.Skimming this and a few callsites it seems reasonable to use a timer instead of
a timeout, but careful study is needed to make sure we're not introducing
anything subtly wrong in the other direction.
Sharing a few preliminary results from looking at this, the attached passes
check-world but need more study/testing.
It seems wrong to me that we die() in query and query_until rather than giving
the caller the power to decide how to proceed. We have even documented that we
do just this:
"Dies on failure to invoke psql, or if psql fails to connect. Errors
occurring later are the caller's problem"
Turning the timeout into a timer and returning undef along with logging a test
failure in case of expiration seems a bit saner (maybe Andrew can suggest an
API which has a better Perl feel to it). Most callsites don't need any changes
to accommodate for this, the attached 0002 implements this timer change and
modify the few sites that need it, converting one to plain query() where the
added complexity of query_until isn't required.
A few other comments on related things that stood out while reviewing:
The tab completion test can use the API call for automatically restart the
timer to reduce the complexity of check_completion a hair. Done in 0001 (but
really not necessary).
Commit Af279ddd1c2 added this sequence to 040_standby_failover_slots_sync.pl in
the recovery tests:
$back_q->query_until(
qr/logical_slot_get_changes/, q(
\echo logical_slot_get_changes
SELECT pg_logical_slot_get_changes('test_slot', NULL, NULL);
));
... <other tests> ...
# Since there are no slots in standby_slot_names, the function
# pg_logical_slot_get_changes should now return, and the session can be
# stopped.
$back_q->quit;
There is no guarantee that pg_logical_slot_get_changes has returned when
reaching this point. This might still work as intended, but the comment is
slightly misleading IMO.
recovery/t/043_wal_replay_wait.pl calls pg_wal_replay_wait() since 06c418e163e
in a background session which it then skips terminating. Calling ->quit is
mandated by the API, in turn required by IPC::Run. Calling ->quit on the
process makes the test fail from the process having already exited, but we can
call ->finish directly like we do in test_misc/t/005_timeouts.pl. 0003 fixes
this.
--
Daniel Gustafsson
Attachments:
v1-0001-Configure-interactive-instance-to-restart-timer.patchapplication/octet-stream; name=v1-0001-Configure-interactive-instance-to-restart-timer.patch; x-unix-mode=0644Download
From 18d5801553beaebb7575dee9a7122c375b3f2e17 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Tue, 9 Apr 2024 17:58:25 +0200
Subject: [PATCH v1 1/3] Configure interactive instance to restart timer
Use the published API from BackgroundPsql for automatically restarting
the timer rather than manually reaching into it to achieve the same
thing.
---
src/bin/psql/t/010_tab_completion.pl | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl
index b45c39f0f5..54be55ade3 100644
--- a/src/bin/psql/t/010_tab_completion.pl
+++ b/src/bin/psql/t/010_tab_completion.pl
@@ -77,8 +77,10 @@ close $FH;
# for possible debugging purposes.
my $historyfile = "${PostgreSQL::Test::Utils::log_path}/010_psql_history.txt";
-# fire up an interactive psql session
+# fire up an interactive psql session and configure it such that each query
+# restarts the timer
my $h = $node->interactive_psql('postgres', history_file => $historyfile);
+$h->set_query_timer_restart();
# Simple test case: type something and see if psql responds as expected
sub check_completion
@@ -88,9 +90,6 @@ sub check_completion
# report test failures from caller location
local $Test::Builder::Level = $Test::Builder::Level + 1;
- # restart per-command timer
- $h->{timeout}->start($PostgreSQL::Test::Utils::timeout_default);
-
# send the data to be sent and wait for its result
my $out = $h->query_until($pattern, $send);
my $okay = ($out =~ $pattern && !$h->{timeout}->is_expired);
--
2.39.3 (Apple Git-146)
v1-0002-Report-test-failure-rather-than-aborting-in-case-.patchapplication/octet-stream; name=v1-0002-Report-test-failure-rather-than-aborting-in-case-.patch; x-unix-mode=0644Download
From 4e7187b824b1a1df3144775eceb40bd996a2f3df Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Tue, 9 Apr 2024 18:00:17 +0200
Subject: [PATCH v1 2/3] Report test failure rather than aborting in case of
timeout
When an querying an interactive, or background, psql process the call
would die() in case of a timeout since it used the IPC::Run timeout
construct and not a timer. This aborts tests prematurely which makes
debugging harder, and stops getting results for the remaining tests
in the suite. This converts to using a timer object instead which
will transfer control back to the caller regardless, returning undef
in case of a timeout along with logging a test failure. Affected
callsites are also updated to reflect this.
---
src/test/modules/test_misc/t/005_timeouts.pl | 6 +-----
src/test/perl/PostgreSQL/Test/BackgroundPsql.pm | 15 ++++++++++-----
src/test/recovery/t/031_recovery_conflict.pl | 4 +---
src/test/recovery/t/037_invalid_database.pl | 1 +
4 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/src/test/modules/test_misc/t/005_timeouts.pl b/src/test/modules/test_misc/t/005_timeouts.pl
index a792610231..8e7086402c 100644
--- a/src/test/modules/test_misc/t/005_timeouts.pl
+++ b/src/test/modules/test_misc/t/005_timeouts.pl
@@ -109,11 +109,7 @@ $node->safe_psql('postgres',
# We just initialize the GUC and wait. No transaction is required.
$psql_session = $node->background_psql('postgres');
-$psql_session->query_until(
- qr/starting_bg_psql/, q(
- \echo starting_bg_psql
- SET idle_session_timeout to '10ms';
-));
+$psql_session->query(q(SET idle_session_timeout to '10ms';));
# Wait until the backend enters the timeout injection point.
$node->wait_for_event('client backend', 'idle-session-timeout');
diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
index 4091c311b8..5df972a20d 100644
--- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
+++ b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
@@ -96,7 +96,7 @@ sub new
"Forbidden caller of constructor: package: $package, file: $file:$line"
unless $package->isa('PostgreSQL::Test::Cluster');
- $psql->{timeout} = IPC::Run::timeout(
+ $psql->{timeout} = IPC::Run::timer(
defined($timeout)
? $timeout
: $PostgreSQL::Test::Utils::timeout_default);
@@ -148,7 +148,8 @@ sub _wait_connect
=item $session->quit
Close the session and clean up resources. Each test run must be closed with
-C<quit>.
+C<quit>. Returns TRUE when the session was cleanly terminated, otherwise
+FALSE. A testfailure will be issued in case the session failed to finish.
=cut
@@ -158,7 +159,9 @@ sub quit
$self->{stdin} .= "\\q\n";
- return $self->{run}->finish;
+ my $ret = $self->{run}->finish;
+ ok($ret, 'Terminating interactive psql session');
+ return $ret;
}
=pod
@@ -219,7 +222,8 @@ sub query
pump_until($self->{run}, $self->{timeout}, \$self->{stdout}, qr/$banner/);
- die "psql query timed out" if $self->{timeout}->is_expired;
+ isnt($self->{timeout}->is_expired, 'psql query timed out');
+ return undef if $self->{timeout}->is_expired;
$output = $self->{stdout};
# remove banner again, our caller doesn't care
@@ -278,7 +282,8 @@ sub query_until
pump_until($self->{run}, $self->{timeout}, \$self->{stdout}, $until);
- die "psql query timed out" if $self->{timeout}->is_expired;
+ isnt($self->{timeout}->is_expired, 'psql query_until timed out');
+ return undef if $self->{timeout}->is_expired;
$ret = $self->{stdout};
diff --git a/src/test/recovery/t/031_recovery_conflict.pl b/src/test/recovery/t/031_recovery_conflict.pl
index d87efa823f..62936f52d2 100644
--- a/src/test/recovery/t/031_recovery_conflict.pl
+++ b/src/test/recovery/t/031_recovery_conflict.pl
@@ -253,9 +253,7 @@ $res = $psql_standby->query_until(
-- wait for lock held by prepared transaction
SELECT * FROM $table2;
]);
-ok(1,
- "$sect: cursor holding conflicting pin, also waiting for lock, established"
-);
+isnt($res, undef, "$sect: cursor holding conflicting pin, also waiting for lock, established");
# just to make sure we're waiting for lock already
ok( $node_standby->poll_query_until(
diff --git a/src/test/recovery/t/037_invalid_database.pl b/src/test/recovery/t/037_invalid_database.pl
index 32b7d8af57..256314d165 100644
--- a/src/test/recovery/t/037_invalid_database.pl
+++ b/src/test/recovery/t/037_invalid_database.pl
@@ -87,6 +87,7 @@ is($node->psql('postgres', 'DROP DATABASE regression_invalid'),
# dropping the database, making it a suitable point to wait.
my $bgpsql = $node->background_psql('postgres', on_error_stop => 0);
my $pid = $bgpsql->query('SELECT pg_backend_pid()');
+isnt($pid, undef, 'Get backend PID');
# create the database, prevent drop database via lock held by a 2PC transaction
ok( $bgpsql->query_safe(
--
2.39.3 (Apple Git-146)
v1-0003-Clean-up-BackgroundPsql-object-when-test-ends.patchapplication/octet-stream; name=v1-0003-Clean-up-BackgroundPsql-object-when-test-ends.patch; x-unix-mode=0644Download
From 648898f8da9b3e75c56eaabaef44c1e493217ab4 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Tue, 9 Apr 2024 18:05:19 +0200
Subject: [PATCH v1 3/3] Clean up BackgroundPsql object when test ends
The BackgroundPsql object must be cleaned up at the end of tests to
avoid leaks, ensure that finish() is called before exiting the suite
with done_testing().
---
src/test/recovery/t/043_wal_replay_wait.pl | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/test/recovery/t/043_wal_replay_wait.pl b/src/test/recovery/t/043_wal_replay_wait.pl
index bbd64aa67b..3a469ddf59 100644
--- a/src/test/recovery/t/043_wal_replay_wait.pl
+++ b/src/test/recovery/t/043_wal_replay_wait.pl
@@ -94,4 +94,5 @@ $node_standby->wait_for_log('recovery is not in progress', $log_offset);
$node_standby->stop;
$node_primary->stop;
+$psql_session->{run}->finish;
done_testing();
--
2.39.3 (Apple Git-146)
On Tue, Apr 9, 2024 at 6:18 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 4, 2024 at 10:38 PM Michael Paquier <michael@paquier.xyz> wrote:
It kind of looks like a libedit bug, but maybe we should dig more
deeply. I felt itchy about 927332b95e77 removing '{' from the
WORD_BREAKS set, and wondered exactly how that would change readline's
behavior. But even if that somehow accounts for the extra backslash
before '{', it's not clear how it could lead to '?' and '}' also
getting backslashed.I don't have a clear idea, either. I also feel uneasy about
927332b95e77 and its change of WORD_BREAKS, but this has the smell
of a bug from an outdated libedit version.I too felt uneasy about that commit, for the same reason. However,
there is a justification for the change in the commit message which is
not obviously wrong, namely that ":{?name} is the only psql syntax
using the '{' sign". And in fact, SQL basically doesn't use '{' for
anything, either. We do see { showing up inside of quoted strings, for
arrays or JSON, but I would guess that the word-break characters
aren't going to affect behavior within a quoted string. So it still
seems like it should be OK? Another thing that makes me think that my
unease may be unfounded is that the matching character '}' isn't in
WORD_BREAKS either, and I would have thought that if we needed one
we'd need both.
FWIW, the default value of rl_basic_word_break_characters [1] has '{'
but doesn't have '}'. The documentation says that this should "break
words for completion in Bash". But I failed to find an explanation
why this should be so for Bash. As you correctly get, my idea was
that our SQL isn't not heavily using '{' unlike Bash.
But does anyone else have a more specific reason for thinking that
this might be a problem?
I don't particularly object against reverting this commit, but I think
we should get to the bottom of this first. Otherwise there is no
warranty to not run into the same problem again.
------
Regards,
Alexander Korotkov
Alexander Korotkov <aekorotkov@gmail.com> writes:
On Tue, Apr 9, 2024 at 6:18 PM Robert Haas <robertmhaas@gmail.com> wrote:
I too felt uneasy about that commit, for the same reason. However,
there is a justification for the change in the commit message which is
not obviously wrong, namely that ":{?name} is the only psql syntax
using the '{' sign". And in fact, SQL basically doesn't use '{' for
anything, either.
True.
FWIW, the default value of rl_basic_word_break_characters [1] has '{'
but doesn't have '}'. The documentation says that this should "break
words for completion in Bash". But I failed to find an explanation
why this should be so for Bash. As you correctly get, my idea was
that our SQL isn't not heavily using '{' unlike Bash.
Yeah, there's no doubt that what we are currently using for
WORD_BREAKS has mostly been cargo-culted from Bash rather than having
any solid foundation in SQL syntax. It works all right for us today
because we don't really try to complete anything in general SQL
expression contexts, so as long as we have whitespace and parens in
there we're more or less fine.
I wonder a bit why comma isn't in there, though. As an example,
vacuum (full,fre<TAB>
fails to complete "freeze", though it works fine with a space
after the comma. I've not experimented, but it seems certain
that it'd behave better with comma in WORD_BREAKS. Whether
that'd pessimize other behaviors, I dunno.
The longer-range concern that I have is that if we ever want to
complete stuff within expressions, I think we're going to need
all the valid operator characters to be in WORD_BREAKS. And that
will be a problem for this patch, not because of '{' but because
of '?'. So I'd be happier if the parsing were done in a way that
did not rely on '{' and '?' being treated as word characters.
But I've not looked into how hard that'd be. In any case, it's
likely that expanding WORD_BREAKS like that would cause some other
problems that'd have to be fixed, so it's not very reasonable to
expect this patch to avoid a hypothetical future problem.
regards, tom lane
On 09/04/2024 20:10, Daniel Gustafsson wrote:
Turning the timeout into a timer and returning undef along with logging a test
failure in case of expiration seems a bit saner (maybe Andrew can suggest an
API which has a better Perl feel to it). Most callsites don't need any changes
to accommodate for this, the attached 0002 implements this timer change and
modify the few sites that need it, converting one to plain query() where the
added complexity of query_until isn't required.
+1. The patch looks good to me. I didn't comb through the tests to check
for bugs of omission, i.e. cases where the caller would need adjustments
because of this, I trust that you found them all.
=item $session->quit
Close the session and clean up resources. Each test run must be closed with
C<quit>. Returns TRUE when the session was cleanly terminated, otherwise
FALSE. A testfailure will be issued in case the session failed to finish.
What does "session failed to finish" mean? Does it mean the same as "not
cleanly terminated", i.e. a test failure is issued whenever this returns
FALSE?
typo: testfailure -> test failure
diff --git a/src/test/recovery/t/031_recovery_conflict.pl b/src/test/recovery/t/031_recovery_conflict.pl index d87efa823fd..62936f52d20 100644 --- a/src/test/recovery/t/031_recovery_conflict.pl +++ b/src/test/recovery/t/031_recovery_conflict.pl @@ -253,9 +253,7 @@ $res = $psql_standby->query_until( -- wait for lock held by prepared transaction SELECT * FROM $table2; ]); -ok(1, - "$sect: cursor holding conflicting pin, also waiting for lock, established" -); +isnt($res, undef, "$sect: cursor holding conflicting pin, also waiting for lock, established");# just to make sure we're waiting for lock already ok( $node_standby->poll_query_until( diff --git a/src/test/recovery/t/037_invalid_database.pl b/src/test/recovery/t/037_invalid_database.pl index 6d1c7117964..c8c20077f85 100644 --- a/src/test/recovery/t/037_invalid_database.pl +++ b/src/test/recovery/t/037_invalid_database.pl @@ -94,6 +94,7 @@ is($node->psql('postgres', 'DROP DATABASE regression_invalid'), my $cancel = $node->background_psql('postgres', on_error_stop => 1); my $bgpsql = $node->background_psql('postgres', on_error_stop => 0); my $pid = $bgpsql->query('SELECT pg_backend_pid()'); +isnt($pid, undef, 'Get backend PID');# create the database, prevent drop database via lock held by a 2PC transaction
ok( $bgpsql->query_safe(
I'm not sure I understand these changes. These can only fail if the
query() or query_until() function times out, right? In that case, the
query() or query_until() would also report a test failure, so these
additional checks after the call seem redundant. They don't do any harm
either, but I wonder why have them in these particular call sites and
not in other query() or query_until() calls.
The tab completion test can use the API call for automatically restart the
timer to reduce the complexity of check_completion a hair. Done in 0001 (but
really not necessary).
+1
Commit Af279ddd1c2 added this sequence to 040_standby_failover_slots_sync.pl in
the recovery tests:$back_q->query_until(
qr/logical_slot_get_changes/, q(
\echo logical_slot_get_changes
SELECT pg_logical_slot_get_changes('test_slot', NULL, NULL);
));... <other tests> ...
# Since there are no slots in standby_slot_names, the function
# pg_logical_slot_get_changes should now return, and the session can be
# stopped.
$back_q->quit;There is no guarantee that pg_logical_slot_get_changes has returned when
reaching this point. This might still work as intended, but the comment is
slightly misleading IMO.
Agreed, it would be good to actually check that it returns.
recovery/t/043_wal_replay_wait.pl calls pg_wal_replay_wait() since 06c418e163e
in a background session which it then skips terminating. Calling ->quit is
mandated by the API, in turn required by IPC::Run. Calling ->quit on the
process makes the test fail from the process having already exited, but we can
call ->finish directly like we do in test_misc/t/005_timeouts.pl. 0003 fixes
this.
Alexander included this fix in commit 3c5db1d6b016 already.
--
Heikki Linnakangas
Neon (https://neon.tech)
On 28 Oct 2024, at 11:56, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 09/04/2024 20:10, Daniel Gustafsson wrote:
Turning the timeout into a timer and returning undef along with logging a test
failure in case of expiration seems a bit saner (maybe Andrew can suggest an
API which has a better Perl feel to it). Most callsites don't need any changes
to accommodate for this, the attached 0002 implements this timer change and
modify the few sites that need it, converting one to plain query() where the
added complexity of query_until isn't required.+1. The patch looks good to me. I didn't comb through the tests to check for bugs of omission, i.e. cases where the caller would need adjustments because of this, I trust that you found them all.
Thanks for review!
=item $session->quit
Close the session and clean up resources. Each test run must be closed with
C<quit>. Returns TRUE when the session was cleanly terminated, otherwise
FALSE. A testfailure will be issued in case the session failed to finish.What does "session failed to finish" mean? Does it mean the same as "not cleanly terminated", i.e. a test failure is issued whenever this returns FALSE?
It was very literally referring to the finish() method. I've reworded the
comment to indicated that it throws a failure in case the process returns a
non-zero exit status to finish().
typo: testfailure -> test failure
Fixed.
my $pid = $bgpsql->query('SELECT pg_backend_pid()');
+isnt($pid, undef, 'Get backend PID');
# create the database, prevent drop database via lock held by a 2PC transaction
ok( $bgpsql->query_safe(I'm not sure I understand these changes. These can only fail if the query() or query_until() function times out, right? In that case, the query() or query_until() would also report a test failure, so these additional checks after the call seem redundant. They don't do any harm either, but I wonder why have them in these particular call sites and not in other query() or query_until() calls.
Fair point, they were added to provide additional detail in case of failure,
but they are to some degree overzealous and definitely not required.
Attached is a v2 with the above changes and 0003 dropped due to already being
implemented.
--
Daniel Gustafsson
Attachments:
v2-0001-Configure-interactive-instance-to-restart-timer.patchapplication/octet-stream; name=v2-0001-Configure-interactive-instance-to-restart-timer.patch; x-unix-mode=0644Download
From 4b6e77d6b3c1abb03976459491945fccf1c26c3d Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Thu, 31 Oct 2024 13:18:51 +0100
Subject: [PATCH v2 1/2] Configure interactive instance to restart timer
Use the published API from BackgroundPsql for automatically restarting
the timer rather than manually reaching into it to achieve the same
thing.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/1100715.1712265845@sss.pgh.pa.us
---
src/bin/psql/t/010_tab_completion.pl | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl
index fd645896c9..2cdba15dc7 100644
--- a/src/bin/psql/t/010_tab_completion.pl
+++ b/src/bin/psql/t/010_tab_completion.pl
@@ -77,8 +77,10 @@ close $FH;
# for possible debugging purposes.
my $historyfile = "${PostgreSQL::Test::Utils::log_path}/010_psql_history.txt";
-# fire up an interactive psql session
+# fire up an interactive psql session and configure it such that each query
+# restarts the timer
my $h = $node->interactive_psql('postgres', history_file => $historyfile);
+$h->set_query_timer_restart();
# Simple test case: type something and see if psql responds as expected
sub check_completion
@@ -88,9 +90,6 @@ sub check_completion
# report test failures from caller location
local $Test::Builder::Level = $Test::Builder::Level + 1;
- # restart per-command timer
- $h->{timeout}->start($PostgreSQL::Test::Utils::timeout_default);
-
# send the data to be sent and wait for its result
my $out = $h->query_until($pattern, $send);
my $okay = ($out =~ $pattern && !$h->{timeout}->is_expired);
--
2.39.3 (Apple Git-146)
v2-0002-Report-test-failure-rather-than-aborting-in-case-.patchapplication/octet-stream; name=v2-0002-Report-test-failure-rather-than-aborting-in-case-.patch; x-unix-mode=0644Download
From 7306c48710d3851d3344a8e1df11dc87ce9bd177 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Thu, 31 Oct 2024 13:23:10 +0100
Subject: [PATCH v2 2/2] Report test failure rather than aborting in case of
timeout
When an querying an interactive, or background, psql process the call
would die() in case of a timeout since it used the IPC::Run timeout
construct and not a timer. This aborts tests prematurely which makes
debugging harder, and stops getting results for the remaining tests
in the suite. This converts to using a timer object instead which
will transfer control back to the caller regardless, returning undef
in case of a timeout along with logging a test failure. Affected
callsites are also updated to reflect this.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/1100715.1712265845@sss.pgh.pa.us
---
src/test/modules/test_misc/t/005_timeouts.pl | 6 +-----
src/test/perl/PostgreSQL/Test/BackgroundPsql.pm | 16 +++++++++++-----
src/test/recovery/t/031_recovery_conflict.pl | 4 +---
3 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/src/test/modules/test_misc/t/005_timeouts.pl b/src/test/modules/test_misc/t/005_timeouts.pl
index d9b7219121..59a89b5029 100644
--- a/src/test/modules/test_misc/t/005_timeouts.pl
+++ b/src/test/modules/test_misc/t/005_timeouts.pl
@@ -118,11 +118,7 @@ $node->safe_psql('postgres',
# We just initialize the GUC and wait. No transaction is required.
$psql_session = $node->background_psql('postgres');
-$psql_session->query_until(
- qr/starting_bg_psql/, q(
- \echo starting_bg_psql
- SET idle_session_timeout to '10ms';
-));
+$psql_session->query(q(SET idle_session_timeout to '10ms';));
# Wait until the backend enters the timeout injection point.
$node->wait_for_event('client backend', 'idle-session-timeout');
diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
index 3c2aca1c5d..a603c5d252 100644
--- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
+++ b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
@@ -96,7 +96,7 @@ sub new
"Forbidden caller of constructor: package: $package, file: $file:$line"
unless $package->isa('PostgreSQL::Test::Cluster');
- $psql->{timeout} = IPC::Run::timeout(
+ $psql->{timeout} = IPC::Run::timer(
defined($timeout)
? $timeout
: $PostgreSQL::Test::Utils::timeout_default);
@@ -148,7 +148,9 @@ sub _wait_connect
=item $session->quit
Close the session and clean up resources. Each test run must be closed with
-C<quit>.
+C<quit>. Returns TRUE when the session was cleanly terminated, otherwise
+FALSE. A test failure will be issued in case the session exited with a non-
+zero exit status (the finish() method returns TRUE for 0 exit status).
=cut
@@ -158,7 +160,9 @@ sub quit
$self->{stdin} .= "\\q\n";
- return $self->{run}->finish;
+ my $ret = $self->{run}->finish;
+ ok($ret, 'Terminating interactive psql session');
+ return $ret;
}
=pod
@@ -219,7 +223,8 @@ sub query
pump_until($self->{run}, $self->{timeout}, \$self->{stdout}, qr/$banner/);
- die "psql query timed out" if $self->{timeout}->is_expired;
+ isnt($self->{timeout}->is_expired, 'psql query timed out');
+ return undef if $self->{timeout}->is_expired;
$output = $self->{stdout};
# remove banner again, our caller doesn't care
@@ -278,7 +283,8 @@ sub query_until
pump_until($self->{run}, $self->{timeout}, \$self->{stdout}, $until);
- die "psql query timed out" if $self->{timeout}->is_expired;
+ isnt($self->{timeout}->is_expired, 'psql query_until timed out');
+ return undef if $self->{timeout}->is_expired;
$ret = $self->{stdout};
diff --git a/src/test/recovery/t/031_recovery_conflict.pl b/src/test/recovery/t/031_recovery_conflict.pl
index d87efa823f..62936f52d2 100644
--- a/src/test/recovery/t/031_recovery_conflict.pl
+++ b/src/test/recovery/t/031_recovery_conflict.pl
@@ -253,9 +253,7 @@ $res = $psql_standby->query_until(
-- wait for lock held by prepared transaction
SELECT * FROM $table2;
]);
-ok(1,
- "$sect: cursor holding conflicting pin, also waiting for lock, established"
-);
+isnt($res, undef, "$sect: cursor holding conflicting pin, also waiting for lock, established");
# just to make sure we're waiting for lock already
ok( $node_standby->poll_query_until(
--
2.39.3 (Apple Git-146)
On 31/10/2024 14:27, Daniel Gustafsson wrote:
On 28 Oct 2024, at 11:56, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 09/04/2024 20:10, Daniel Gustafsson wrote:
=item $session->quit
Close the session and clean up resources. Each test run must be closed with
C<quit>. Returns TRUE when the session was cleanly terminated, otherwise
FALSE. A testfailure will be issued in case the session failed to finish.What does "session failed to finish" mean? Does it mean the same as "not cleanly terminated", i.e. a test failure is issued whenever this returns FALSE?
It was very literally referring to the finish() method. I've reworded the
comment to indicated that it throws a failure in case the process returns a
non-zero exit status to finish().
I see.
@@ -148,7 +148,9 @@ sub _wait_connect
=item $session->quitClose the session and clean up resources. Each test run must be closed with -C<quit>. +C<quit>. Returns TRUE when the session was cleanly terminated, otherwise +FALSE. A test failure will be issued in case the session exited with a non- +zero exit status (the finish() method returns TRUE for 0 exit status).
I still find that confusing. What finish() method? Yes, there's a
finish() method in IPC::Run, but that's BackgroundPsql's internal
affair, not exposed to the callers in any other way. And why do I care
what that finish() returns for 0 exit status? That's not visible to the
quit method's caller.
Perhaps sommething like this:
"Close the psql session and clean up resources. Each psql session must
be closed with C<quit> before the end of the test.
Returns TRUE if psql exited successfully (i.e. with zero exit code),
otherwise returns FALSE and reports a test failure. "
Would that be accurate?
--
Heikki Linnakangas
Neon (https://neon.tech)
On 2024-10-31 Th 6:18 PM, Heikki Linnakangas wrote:
On 31/10/2024 14:27, Daniel Gustafsson wrote:
On 28 Oct 2024, at 11:56, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 09/04/2024 20:10, Daniel Gustafsson wrote:
=item $session->quit
Close the session and clean up resources. Each test run must be
closed with
C<quit>. Returns TRUE when the session was cleanly terminated,
otherwise
FALSE. A testfailure will be issued in case the session failed to
finish.What does "session failed to finish" mean? Does it mean the same as
"not cleanly terminated", i.e. a test failure is issued whenever
this returns FALSE?It was very literally referring to the finish() method. I've
reworded the
comment to indicated that it throws a failure in case the process
returns a
non-zero exit status to finish().I see.
@@ -148,7 +148,9 @@ sub _wait_connect
=item $session->quitClose the session and clean up resources. Each test run must be closed with -C<quit>. +C<quit>. Returns TRUE when the session was cleanly terminated, otherwise +FALSE. A test failure will be issued in case the session exited with a non- +zero exit status (the finish() method returns TRUE for 0 exit status).I still find that confusing. What finish() method? Yes, there's a
finish() method in IPC::Run, but that's BackgroundPsql's internal
affair, not exposed to the callers in any other way. And why do I care
what that finish() returns for 0 exit status? That's not visible to
the quit method's caller.Perhaps sommething like this:
"Close the psql session and clean up resources. Each psql session must
be closed with C<quit> before the end of the test.
Returns TRUE if psql exited successfully (i.e. with zero exit code),
otherwise returns FALSE and reports a test failure. "Would that be accurate?
I would be OK with Heikki's version.
The patches have bitrotted slightly.
Also this is wrong, I think:
isnt($self->{timeout}->is_expired, 'psql query_until timed out');
I think it should be
ok(! $self->{timeout}->is_expired, 'psql query_until did not time
out');
isnt() normally takes 3 arguments, and the message you have seems
backwards.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On 19 Feb 2025, at 23:08, Andrew Dunstan <andrew@dunslane.net> wrote:
On 2024-10-31 Th 6:18 PM, Heikki Linnakangas wrote:
Thanks for review!
Perhaps sommething like this:
"Close the psql session and clean up resources. Each psql session must be closed with C<quit> before the end of the test.
Returns TRUE if psql exited successfully (i.e. with zero exit code), otherwise returns FALSE and reports a test failure."Would that be accurate?
I would be OK with Heikki's version.
Fixed.
The patches have bitrotted slightly.
The attached rebases over current HEAD and passes check-world locally for me.
Also this is wrong, I think:
isnt($self->{timeout}->is_expired, 'psql query_until timed out');
I think it should be
ok(! $self->{timeout}->is_expired, 'psql query_until did not time out');
Fixed.
--
Daniel Gustafsson
Attachments:
v3-0002-Report-test-failure-rather-than-aborting-in-case-.patchapplication/octet-stream; name=v3-0002-Report-test-failure-rather-than-aborting-in-case-.patch; x-unix-mode=0644Download
From e029ae348d49a267bdac17e64507016a62d4b805 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Thu, 20 Feb 2025 00:55:14 +0100
Subject: [PATCH v3 2/2] Report test failure rather than aborting in case of
timeout
When an querying an interactive, or background, psql process the
call would die() in case of a timeout since it used the IPC::Run
timeout construct and not a timer. This aborts tests prematurely
which makes debugging harder, and stops getting results for the
remaining tests in the suite. This converts to using a timer
object instead which will transfer control back to the caller
regardless, returning undef in case of a timeout along with
logging a test failure. Affected callsites are also updated to
reflect this.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/1100715.1712265845@sss.pgh.pa.us
---
src/test/modules/test_misc/t/005_timeouts.pl | 6 +-----
.../perl/PostgreSQL/Test/BackgroundPsql.pm | 19 +++++++++++++------
src/test/recovery/t/031_recovery_conflict.pl | 4 +---
3 files changed, 15 insertions(+), 14 deletions(-)
diff --git a/src/test/modules/test_misc/t/005_timeouts.pl b/src/test/modules/test_misc/t/005_timeouts.pl
index cdce2afd935..5478d3394d1 100644
--- a/src/test/modules/test_misc/t/005_timeouts.pl
+++ b/src/test/modules/test_misc/t/005_timeouts.pl
@@ -118,11 +118,7 @@ $node->safe_psql('postgres',
# We just initialize the GUC and wait. No transaction is required.
$psql_session = $node->background_psql('postgres');
-$psql_session->query_until(
- qr/starting_bg_psql/, q(
- \echo starting_bg_psql
- SET idle_session_timeout to '10ms';
-));
+$psql_session->query(q(SET idle_session_timeout to '10ms';));
# Wait until the backend enters the timeout injection point.
$node->wait_for_event('client backend', 'idle-session-timeout');
diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
index c611a61cf4e..d19a53455d1 100644
--- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
+++ b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
@@ -100,7 +100,7 @@ sub new
"Forbidden caller of constructor: package: $package, file: $file:$line"
unless $package->isa('PostgreSQL::Test::Cluster');
- $psql->{timeout} = IPC::Run::timeout(
+ $psql->{timeout} = IPC::Run::timer(
defined($timeout)
? $timeout
: $PostgreSQL::Test::Utils::timeout_default);
@@ -176,8 +176,10 @@ sub wait_connect
=item $session->quit
-Close the session and clean up resources. Each test run must be closed with
-C<quit>.
+Close the psql session and clean up resources. Each psql session must be
+closed with C<quit> before the end of the test. Returns TRUE if psql exited
+successfully (i.e. with zero exit code), otherwise returns FALSE and reports
+a test failure.
=cut
@@ -187,7 +189,9 @@ sub quit
$self->{stdin} .= "\\q\n";
- return $self->{run}->finish;
+ my $ret = $self->{run}->finish;
+ ok($ret, 'Terminating interactive psql session');
+ return $ret;
}
=pod
@@ -271,7 +275,9 @@ sub query
$self->{run}, $self->{timeout},
\$self->{stderr}, qr/$banner_match/);
- die "psql query timed out" if $self->{timeout}->is_expired;
+ ok(! $self->{timeout}->is_expired, 'psql query timed out');
+ return undef if $self->{timeout}->is_expired;
+ $output = $self->{stdout};
note "results query $query_cnt:\n",
explain {
@@ -339,7 +345,8 @@ sub query_until
pump_until($self->{run}, $self->{timeout}, \$self->{stdout}, $until);
- die "psql query timed out" if $self->{timeout}->is_expired;
+ ok(! $self->{timeout}->is_expired, 'psql query_until did not time out');
+ return undef if $self->{timeout}->is_expired;
$ret = $self->{stdout};
diff --git a/src/test/recovery/t/031_recovery_conflict.pl b/src/test/recovery/t/031_recovery_conflict.pl
index 028b0b5f0e1..99d08c4ecf6 100644
--- a/src/test/recovery/t/031_recovery_conflict.pl
+++ b/src/test/recovery/t/031_recovery_conflict.pl
@@ -253,9 +253,7 @@ $res = $psql_standby->query_until(
-- wait for lock held by prepared transaction
SELECT * FROM $table2;
]);
-ok(1,
- "$sect: cursor holding conflicting pin, also waiting for lock, established"
-);
+isnt($res, undef, "$sect: cursor holding conflicting pin, also waiting for lock, established");
# just to make sure we're waiting for lock already
ok( $node_standby->poll_query_until(
--
2.39.3 (Apple Git-146)
v3-0001-Configure-interactive-instance-to-restart-timer.patchapplication/octet-stream; name=v3-0001-Configure-interactive-instance-to-restart-timer.patch; x-unix-mode=0644Download
From dd92c112308c36b72866bc99cc75bf610ccdccc5 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Thu, 20 Feb 2025 00:54:34 +0100
Subject: [PATCH v3 1/2] Configure interactive instance to restart timer
Use the published API from BackgroundPsql for automatically restarting
the timer rather than manually reaching into it to achieve the same
thing.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/1100715.1712265845@sss.pgh.pa.us
---
src/bin/psql/t/010_tab_completion.pl | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl
index fa7427af8c5..c391c3b3be5 100644
--- a/src/bin/psql/t/010_tab_completion.pl
+++ b/src/bin/psql/t/010_tab_completion.pl
@@ -77,8 +77,10 @@ close $FH;
# for possible debugging purposes.
my $historyfile = "${PostgreSQL::Test::Utils::log_path}/010_psql_history.txt";
-# fire up an interactive psql session
+# fire up an interactive psql session and configure it such that each query
+# restarts the timer
my $h = $node->interactive_psql('postgres', history_file => $historyfile);
+$h->set_query_timer_restart();
# Simple test case: type something and see if psql responds as expected
sub check_completion
@@ -88,9 +90,6 @@ sub check_completion
# report test failures from caller location
local $Test::Builder::Level = $Test::Builder::Level + 1;
- # restart per-command timer
- $h->{timeout}->start($PostgreSQL::Test::Utils::timeout_default);
-
# send the data to be sent and wait for its result
my $out = $h->query_until($pattern, $send);
my $okay = ($out =~ $pattern && !$h->{timeout}->is_expired);
--
2.39.3 (Apple Git-146)
On 2025-02-19 We 6:56 PM, Daniel Gustafsson wrote:
On 19 Feb 2025, at 23:08, Andrew Dunstan<andrew@dunslane.net> wrote:
On 2024-10-31 Th 6:18 PM, Heikki Linnakangas wrote:Thanks for review!
Perhaps sommething like this:
"Close the psql session and clean up resources. Each psql session must be closed with C<quit> before the end of the test.
Returns TRUE if psql exited successfully (i.e. with zero exit code), otherwise returns FALSE and reports a test failure."Would that be accurate?
I would be OK with Heikki's version.
Fixed.
The patches have bitrotted slightly.
The attached rebases over current HEAD and passes check-world locally for me.
Also this is wrong, I think:
isnt($self->{timeout}->is_expired, 'psql query_until timed out');
I think it should be
ok(! $self->{timeout}->is_expired, 'psql query_until did not time out');
Fixed.
Actually, since ok() and friends return true iff the test succeeds,
instead of
+ ok(! $self->{timeout}->is_expired, 'psql query_until did not time
out');
+ return undef if $self->{timeout}->is_expired;
you can avoid doing the same test twice and say:
ok(! $self->{timeout}->is_expired, 'psql query_until did not time
out') || return undef;
although for slightly technical reasons perlcritic disapproves of
"return undef" and prefers that you just write a bare "return" so we
should also fix that.
Sorry for taking a second bite at the cherry.
cheers
andrew
--
Andrew Dunstan
EDB:https://www.enterprisedb.com
On 20 Feb 2025, at 14:06, Andrew Dunstan <andrew@dunslane.net> wrote:
Actually, since ok() and friends return true iff the test succeeds, instead of + ok(! $self->{timeout}->is_expired, 'psql query_until did not time out'); + return undef if $self->{timeout}->is_expired; you can avoid doing the same test twice and say: ok(! $self->{timeout}->is_expired, 'psql query_until did not time out') || return undef; although for slightly technical reasons perlcritic disapproves of "return undef" and prefers that you just write a bare "return" so we should also fix that. Sorry for taking a second bite at the cherry.
Not at all, I agree that this is an improvement so fixed in the attached along
with a fresh pgperltidy.
--
Daniel Gustafsson
Attachments:
v4-0002-Report-test-failure-rather-than-aborting-in-case-.patchapplication/octet-stream; name=v4-0002-Report-test-failure-rather-than-aborting-in-case-.patch; x-unix-mode=0644Download
From d135d15650a9eaee318949cfd5f4681c08b59b44 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Thu, 20 Feb 2025 00:55:14 +0100
Subject: [PATCH v4 2/2] Report test failure rather than aborting in case of
timeout
When an querying an interactive, or background, psql process the
call would die() in case of a timeout since it used the IPC::Run
timeout construct and not a timer. This aborts tests prematurely
which makes debugging harder, and stops getting results for the
remaining tests in the suite. This converts to using a timer
object instead which will transfer control back to the caller
regardless, returning undef in case of a timeout along with
logging a test failure. Affected callsites are also updated to
reflect this.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/1100715.1712265845@sss.pgh.pa.us
---
src/test/modules/test_misc/t/005_timeouts.pl | 6 +-----
.../perl/PostgreSQL/Test/BackgroundPsql.pm | 18 ++++++++++++------
src/test/recovery/t/031_recovery_conflict.pl | 2 +-
3 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/src/test/modules/test_misc/t/005_timeouts.pl b/src/test/modules/test_misc/t/005_timeouts.pl
index cdce2afd935..5478d3394d1 100644
--- a/src/test/modules/test_misc/t/005_timeouts.pl
+++ b/src/test/modules/test_misc/t/005_timeouts.pl
@@ -118,11 +118,7 @@ $node->safe_psql('postgres',
# We just initialize the GUC and wait. No transaction is required.
$psql_session = $node->background_psql('postgres');
-$psql_session->query_until(
- qr/starting_bg_psql/, q(
- \echo starting_bg_psql
- SET idle_session_timeout to '10ms';
-));
+$psql_session->query(q(SET idle_session_timeout to '10ms';));
# Wait until the backend enters the timeout injection point.
$node->wait_for_event('client backend', 'idle-session-timeout');
diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
index c611a61cf4e..3cdd0852bd7 100644
--- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
+++ b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
@@ -100,7 +100,7 @@ sub new
"Forbidden caller of constructor: package: $package, file: $file:$line"
unless $package->isa('PostgreSQL::Test::Cluster');
- $psql->{timeout} = IPC::Run::timeout(
+ $psql->{timeout} = IPC::Run::timer(
defined($timeout)
? $timeout
: $PostgreSQL::Test::Utils::timeout_default);
@@ -176,8 +176,10 @@ sub wait_connect
=item $session->quit
-Close the session and clean up resources. Each test run must be closed with
-C<quit>.
+Close the psql session and clean up resources. Each psql session must be
+closed with C<quit> before the end of the test. Returns TRUE if psql exited
+successfully (i.e. with zero exit code), otherwise returns FALSE and reports
+a test failure.
=cut
@@ -187,7 +189,9 @@ sub quit
$self->{stdin} .= "\\q\n";
- return $self->{run}->finish;
+ my $ret = $self->{run}->finish;
+ ok($ret, 'Terminating interactive psql session');
+ return $ret;
}
=pod
@@ -271,7 +275,8 @@ sub query
$self->{run}, $self->{timeout},
\$self->{stderr}, qr/$banner_match/);
- die "psql query timed out" if $self->{timeout}->is_expired;
+ ok(!$self->{timeout}->is_expired, 'psql query timed out') || return;
+ $output = $self->{stdout};
note "results query $query_cnt:\n",
explain {
@@ -339,7 +344,8 @@ sub query_until
pump_until($self->{run}, $self->{timeout}, \$self->{stdout}, $until);
- die "psql query timed out" if $self->{timeout}->is_expired;
+ ok(!$self->{timeout}->is_expired, 'psql query_until did not time out')
+ || return;
$ret = $self->{stdout};
diff --git a/src/test/recovery/t/031_recovery_conflict.pl b/src/test/recovery/t/031_recovery_conflict.pl
index 028b0b5f0e1..965537d1f97 100644
--- a/src/test/recovery/t/031_recovery_conflict.pl
+++ b/src/test/recovery/t/031_recovery_conflict.pl
@@ -253,7 +253,7 @@ $res = $psql_standby->query_until(
-- wait for lock held by prepared transaction
SELECT * FROM $table2;
]);
-ok(1,
+isnt($res, undef,
"$sect: cursor holding conflicting pin, also waiting for lock, established"
);
--
2.39.3 (Apple Git-146)
v4-0001-Configure-interactive-instance-to-restart-timer.patchapplication/octet-stream; name=v4-0001-Configure-interactive-instance-to-restart-timer.patch; x-unix-mode=0644Download
From dd92c112308c36b72866bc99cc75bf610ccdccc5 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Thu, 20 Feb 2025 00:54:34 +0100
Subject: [PATCH v4 1/2] Configure interactive instance to restart timer
Use the published API from BackgroundPsql for automatically restarting
the timer rather than manually reaching into it to achieve the same
thing.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/1100715.1712265845@sss.pgh.pa.us
---
src/bin/psql/t/010_tab_completion.pl | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl
index fa7427af8c5..c391c3b3be5 100644
--- a/src/bin/psql/t/010_tab_completion.pl
+++ b/src/bin/psql/t/010_tab_completion.pl
@@ -77,8 +77,10 @@ close $FH;
# for possible debugging purposes.
my $historyfile = "${PostgreSQL::Test::Utils::log_path}/010_psql_history.txt";
-# fire up an interactive psql session
+# fire up an interactive psql session and configure it such that each query
+# restarts the timer
my $h = $node->interactive_psql('postgres', history_file => $historyfile);
+$h->set_query_timer_restart();
# Simple test case: type something and see if psql responds as expected
sub check_completion
@@ -88,9 +90,6 @@ sub check_completion
# report test failures from caller location
local $Test::Builder::Level = $Test::Builder::Level + 1;
- # restart per-command timer
- $h->{timeout}->start($PostgreSQL::Test::Utils::timeout_default);
-
# send the data to be sent and wait for its result
my $out = $h->query_until($pattern, $send);
my $okay = ($out =~ $pattern && !$h->{timeout}->is_expired);
--
2.39.3 (Apple Git-146)