Improve error detections in TAP tests by spreading safe_psql

Started by Michael Paquierover 6 years ago6 messageshackers
Jump to latest
#1Michael Paquier
michael@paquier.xyz

Hi all,

This is a follow-up of the discussion which happened here after Tom
has committed fb57f40:
/messages/by-id/20190828012439.GA1965@paquier.xyz

I have reviewed the TAP tests, and we have much more spots where it
is better to use PostgresNode::safe_psql instead PostgresNode::psql so
as the test dies immediately if there is a failure on a query which
should never fail.

Attached are the spots I have found. Any thoughts or opinions?
Thanks,
--
Michael

Attachments:

tap-psql-safe-v1.patchtext/x-diff; charset=us-asciiDownload+45-44
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#1)
Re: Improve error detections in TAP tests by spreading safe_psql

Michael Paquier <michael@paquier.xyz> writes:

This is a follow-up of the discussion which happened here after Tom
has committed fb57f40:
/messages/by-id/20190828012439.GA1965@paquier.xyz
I have reviewed the TAP tests, and we have much more spots where it
is better to use PostgresNode::safe_psql instead PostgresNode::psql so
as the test dies immediately if there is a failure on a query which
should never fail.

After a brief review of node->psql calls in the TAP tests, I'm
of the opinion that what we should do is revert fb57f40 and instead
change PostgresNode::psql so that on_error_die defaults to 1, then
fix the *very* small number of callers that need it to be zero.
Otherwise we're just going to be fighting this same fire forevermore.
Moreover, that's going to lead to a much smaller patch.

Attached are the spots I have found. Any thoughts or opinions?

Well, for instance, you missed SSLServer.pm, in which every single
one of the psql calls is wrong.

If we go this route we'd have to back-patch the change, else it'd
be a serious hazard for test case back-patching. But the buildfarm
would, more or less by definition, find any oversights --- so that
doesn't scare me much.

regards, tom lane

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
Re: Improve error detections in TAP tests by spreading safe_psql

I wrote:

After a brief review of node->psql calls in the TAP tests, I'm
of the opinion that what we should do is revert fb57f40 and instead
change PostgresNode::psql so that on_error_die defaults to 1, then
fix the *very* small number of callers that need it to be zero.
Otherwise we're just going to be fighting this same fire forevermore.
Moreover, that's going to lead to a much smaller patch.

Hmm .. I experimented with doing it this way, as attached, and I guess
I have to take back the assertion that it'd be a much smaller patch.
I found 47 calls that'd need to be changed to set on_error_die to 0,
whereas your patch is touching 44 calls (though I think it missed some).
I still think this is a safer, less bug-prone way to proceed though.

The attached patch just follows a mechanical rule of "set on_error_die
to 0 in exactly those calls where the surrounding code verifies the
exit code is what it expects". That leads to a lot of code that
looks like, say,

 # Insert should work on standby
 is( $node_standby->psql(
 		'postgres',
-		qq{insert into testtab select generate_series(1,1000), 'foo';}),
+		qq{insert into testtab select generate_series(1,1000), 'foo';},
+		on_error_die => 0),
 	0,
 	'INSERT succeeds with truncated relation FSM');

I have to wonder if it isn't better to just drop the is() test
and let PostgresNode::psql issue the failure. The only thing
the is() is really buying us is the ability to label the test
with an ID string, which is helpful, but um ... couldn't that
just be a comment? Or we could think about adding another
optional parameter:

$node_standby->psql(
'postgres',
qq{insert into testtab select generate_series(1,1000), 'foo';},
test_name => 'INSERT succeeds with truncated relation FSM');

regards, tom lane

Attachments:

default-to-on-error-die-1.patchtext/x-diff; charset=us-ascii; name=default-to-on-error-die-1.patchDownload+161-81
#4Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#3)
Re: Improve error detections in TAP tests by spreading safe_psql

On Wed, Aug 28, 2019 at 12:19:08PM -0400, Tom Lane wrote:

The attached patch just follows a mechanical rule of "set on_error_die
to 0 in exactly those calls where the surrounding code verifies the
exit code is what it expects". That leads to a lot of code that
looks like, say,

# Insert should work on standby
is( $node_standby->psql(
'postgres',
-		qq{insert into testtab select generate_series(1,1000), 'foo';}),
+		qq{insert into testtab select generate_series(1,1000), 'foo';},
+		on_error_die => 0),
0,
'INSERT succeeds with truncated relation FSM');

I am fine with that approach. There is an argument for dropping
safe_psql then?

I have to wonder if it isn't better to just drop the is() test
and let PostgresNode::psql issue the failure. The only thing
the is() is really buying us is the ability to label the test
with an ID string, which is helpful, but um ... couldn't that
just be a comment?

I got the same thought as you on this point about why the is() is
actually necessary if we know that it would fail. An advantage of the
current code is that we are able to catch all errors in a given run at
once. If the test dies immediately, you cannot catch that and this
needs repetitive retries if there is no dependency between each step
of the test. An argument against back-patching the stuff changing the
default flag are tests which rely on the current behavior. This could
be annoying for some people doing advanced testing.
--
Michael

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#4)
Re: Improve error detections in TAP tests by spreading safe_psql

Michael Paquier <michael@paquier.xyz> writes:

On Wed, Aug 28, 2019 at 12:19:08PM -0400, Tom Lane wrote:

The attached patch just follows a mechanical rule of "set on_error_die
to 0 in exactly those calls where the surrounding code verifies the
exit code is what it expects".

I am fine with that approach. There is an argument for dropping
safe_psql then?

Well, it's useful if you just want the stdout back. But its name
is a bit misleading if the default behavior of psql is just as
safe. Not sure whether renaming it is worthwhile.

I have to wonder if it isn't better to just drop the is() test
and let PostgresNode::psql issue the failure.

I got the same thought as you on this point about why the is() is
actually necessary if we know that it would fail. An advantage of the
current code is that we are able to catch all errors in a given run at
once.

Yeah, but only if the test cases are independent, which I think is
mostly not true in our TAP scripts. Otherwise you're just looking
at cascading errors.

An argument against back-patching the stuff changing the
default flag are tests which rely on the current behavior. This could
be annoying for some people doing advanced testing.

Yup, the tradeoff is people possibly having test scripts outside
our tree that would break, versus the possibility that we'll back-patch
test changes that don't behave as expected in back branches. I don't
know if the former is true, but I'm afraid the latter is a certainty
if we don't back-patch.

regards, tom lane

#6Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#5)
Re: Improve error detections in TAP tests by spreading safe_psql

On Wed, Aug 28, 2019 at 09:44:58PM -0400, Tom Lane wrote:

Well, it's useful if you just want the stdout back. But its name
is a bit misleading if the default behavior of psql is just as
safe. Not sure whether renaming it is worthwhile.

It is not that complicated enough to capture stdout with
PostgresNode::psql either, so if we are making the default of one
(PostgresNode::psql) look like the other (PostgresNode::safe_psql),
then we lose no actual feature by dropping safe_psql.

Yeah, but only if the test cases are independent, which I think is
mostly not true in our TAP scripts. Otherwise you're just looking
at cascading errors.

Yep. We have a couple of cases though where things are quite
independent, like the 2PC suite divided into sequences of different
transactions, but mostly there are dependencies between one step and
the follow-up ones.

Yup, the tradeoff is people possibly having test scripts outside
our tree that would break, versus the possibility that we'll back-patch
test changes that don't behave as expected in back branches. I don't
know if the former is true, but I'm afraid the latter is a certainty
if we don't back-patch.

Exactly, that's why I am wondering how wide breakages in those
out-of-the-tree tests would go as we have PostgresNode since 9.6.
Facing this choice makes me uneasy, which is why I would tend to only
do incompatible things on HEAD. Another, safer, possibility would be
to introduce in back-branches an extra psql-like routine enforcing
errors which we use in our tests, to keep the former ones for
compatibility, and to drop the old ones on HEAD. It is never fun to
face sudden errors on a system when doing a minor upgrade.
--
Michael