Readd use of TAP subtests
Now that subtests in TAP are supported again, I want to correct the
great historical injustice of 7912f9b7dc9e2d3f6cd81892ef6aa797578e9f06
and put those subtests back.
Much more work like this is possible, of course. I just wanted to get
this out of the way since the code was already written and I've had this
on my list for, uh, 7 years.
Attachments:
0001-Readd-use-of-TAP-subtests.patch (text/plain, +87 -72)
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
Now that subtests in TAP are supported again, I want to correct the
great historical injustice of 7912f9b7dc9e2d3f6cd81892ef6aa797578e9f06
and put those subtests back.
The updated Test::More version requirement also gives us done_testing()
(added in 0.88), which saves us from the constant maintenance headache
of updating the test counts every time. Do you fancy switching the
tests you're modifying anyway to that?
- ilmari
On 8 Dec 2021, at 14:49, Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> wrote:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
Now that subtests in TAP are supported again, I want to correct the
great historical injustice of 7912f9b7dc9e2d3f6cd81892ef6aa797578e9f06
and put those subtests back.
The updated Test::More version requirement also gives us done_testing()
(added in 0.88), which saves us from the constant maintenance headache
of updating the test counts every time. Do you fancy switching the
tests you're modifying anyway to that?
We already call done_testing() in a number of tests, and have done so for a
number of years. src/test/ssl/t/002_scram.pl is one example.
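The done_testing() style under discussion can be sketched as follows (a minimal hypothetical test file, not taken from the tree): instead of declaring a fixed plan with `use Test::More tests => N;`, the script calls done_testing() at the end, which both sets the plan retroactively and asserts that the script ran to completion.

```perl
# Minimal sketch of the done_testing() style (hypothetical test file).
use strict;
use warnings;
use Test::More;

ok(1 + 1 == 2, 'arithmetic works');
is('foo', 'foo', 'string comparison works');

# No "tests => 2" needed up front; this emits "1..2" at the end and
# fails the run if the script exits before reaching this point.
done_testing();
```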
--
Daniel Gustafsson https://vmware.com/
On 12/8/21 08:26, Peter Eisentraut wrote:
Now that subtests in TAP are supported again, I want to correct the
great historical injustice of 7912f9b7dc9e2d3f6cd81892ef6aa797578e9f06
and put those subtests back.
Much more work like this is possible, of course. I just wanted to get
this out of the way since the code was already written and I've had
this on my list for, uh, 7 years.
+many as long as we cover all the cases in Cluster.pm and Utils.pm. I
suspect they have acquired some new multi-test subs in the intervening 7
years :-)
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Daniel Gustafsson <daniel@yesql.se> writes:
On 8 Dec 2021, at 14:49, Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> wrote:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
Now that subtests in TAP are supported again, I want to correct the
great historical injustice of 7912f9b7dc9e2d3f6cd81892ef6aa797578e9f06
and put those subtests back.
The updated Test::More version requirement also gives us done_testing()
(added in 0.88), which saves us from the constant maintenance headache
of updating the test counts every time. Do you fancy switching the
tests you're modifying anyway to that?
We already call done_testing() in a number of tests, and have done so for a
number of years. src/test/ssl/t/002_scram.pl is one example.
Reading the Test::More changelog more closely, it turns out that even
though we used to depend on version 0.87, that's effectively equivalent
to 0.88, because there was no stable 0.87 release, only 0.86 and
development releases 0.87_01 through _03.
Either way, I think we should be switching tests to done_testing()
whenever it would otherwise have to adjust the test count, to avoid
having to do that again and again and again going forward.
- ilmari
On 12/8/21 09:08, Dagfinn Ilmari Mannsåker wrote:
Either way, I think we should be switching tests to done_testing()
whenever it would otherwise have to adjust the test count, to avoid
having to do that again and again and again going forward.
I'm not so sure. I don't think it's necessarily a bad idea to have to
declare how many tests you're going to run. I appreciate it gets hard in
some cases, which is why we have now insisted on a Test::More version
that supports subtests. I suppose we could just take the attitude that
we're happy with however many tests it actually runs, and as long as
they all pass we're good. It just seems slightly sloppy.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes:
On 12/8/21 09:08, Dagfinn Ilmari Mannsåker wrote:
Either way, I think we should be switching tests to done_testing()
whenever it would otherwise have to adjust the test count, to avoid
having to do that again and again and again going forward.
I'm not so sure. I don't think it's necessarily a bad idea to have to
declare how many tests you're going to run. I appreciate it gets hard in
some cases, which is why we have now insisted on a Test::More version
that supports subtests. I suppose we could just take the attitude that
we're happy with however many tests it actually runs, and as long as
they all pass we're good. It just seems slightly sloppy.
The point of done_testing() is to additionally assert that the test
script ran to completion, so you don't get silent failures if something
should end up calling exit(0) prematurely (a non-zero exit status is
considered a failure by the test harness).
The only cases where an explicit plan adds value are when you're running
tests in a loop and care about the number of iterations, or have a
callback with a test inside that you want to make sure gets called. For
these, it's better to explicitly assert that the list you're iterating
over is of the right length, or increment a counter in the loop or
callback and assert that it has the expected value. This has the added
benefit of the failure coming from the relevant place and having a
helpful description, rather than a plan mismatch at the end which you
then have to hunt down the cause of.
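The alternative described above can be sketched like this (a hypothetical example, not code from the tree): assert the length of the list you iterate over, and count loop-body invocations explicitly, so a skipped iteration fails at the relevant place rather than as a plan mismatch at the end.

```perl
# Sketch: explicit counting instead of relying on a fixed plan.
use strict;
use warnings;
use Test::More;

my @cases = ([1, 2, 3], [4, 5, 9]);    # [a, b, expected sum]
is(scalar @cases, 2, 'expected number of test cases');

my $calls = 0;
for my $case (@cases) {
    my ($a, $b, $sum) = @$case;
    $calls++;
    is($a + $b, $sum, "$a + $b == $sum");
}
# Fails here, with a useful description, if an iteration was skipped.
is($calls, scalar @cases, 'loop body ran for every case');

done_testing();
```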
- ilmari
Andrew Dunstan <andrew@dunslane.net> writes:
On 12/8/21 09:08, Dagfinn Ilmari Mannsåker wrote:
Either way, I think we should be switching tests to done_testing()
whenever it would otherwise have to adjust the test count, to avoid
having to do that again and again and again going forward.
I'm not so sure. I don't think it's necessarily a bad idea to have to
declare how many tests you're going to run.
I think the main point is to make sure that the test script reached an
intended exit point, rather than dying early someplace. It's not apparent
to me why reaching a done_testing() call is a less reliable indicator of
that than executing some specific number of tests --- and I agree with
ilmari that maintaining the test count is a serious PITA.
regards, tom lane
Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> writes:
The only cases where an explicit plan adds value are when you're running
tests in a loop and care about the number of iterations, or have a
callback with a test inside that you want to make sure gets called. For
these, it's better to explicitly assert that the list you're iterating
over is of the right length, or increment a counter in the loop or
callback and assert that it has the expected value. This has the added
benefit of the failure coming from the relevant place and having a
helpful description, rather than a plan mismatch at the end which you
then have to hunt down the cause of.
Yeah. A different way of stating that is that the test count adds
security only if you re-derive its proper value from first principles
every time you modify the test script. I don't know about you guys,
but the only way I've ever adjusted those numbers is to put in whatever
the error message said was right. I don't see how that's adding
anything but make-work; it's certainly not doing much to help verify
the script's control flow.
A question that seems pretty relevant here is: what exactly is the
point of using the subtest feature, if we aren't especially interested
in its effect on the overall test count? I can see that it'd have
value when you wanted to use skip_all to control a subset of a test
run, but I'm not detecting where the value-added is in the cases in
Peter's proposed patch.
regards, tom lane
On 08.12.21 18:31, Tom Lane wrote:
A question that seems pretty relevant here is: what exactly is the
point of using the subtest feature, if we aren't especially interested
in its effect on the overall test count? I can see that it'd have
value when you wanted to use skip_all to control a subset of a test
run, but I'm not detecting where the value-added is in the cases in
Peter's proposed patch.
It's useful if you edit a test file and add (what would appear to be) N
tests and want to update the number.
But I'm also OK with the done_testing() style, if there are no drawbacks
to that.
Does that call into question why we raised the Test::More version to
begin with? Or were there other reasons?
On 8 Dec 2021, at 16:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think the main point is to make sure that the test script reached an
intended exit point, rather than dying early someplace. It's not apparent
to me why reaching a done_testing() call is a less reliable indicator of
that than executing some specific number of tests --- and I agree with
ilmari that maintaining the test count is a serious PITA.
FWIW I agree with this and am in favor of using done_testing().
--
Daniel Gustafsson https://vmware.com/
Now that we have switched everything to done_testing(), the subtests
feature isn't that relevant anymore, but it might still be useful to get
better output when running with PROVE_FLAGS=--verbose. Compare before:
t/001_basic.pl ..
1..8
ok 1 - vacuumlo --help exit code 0
ok 2 - vacuumlo --help goes to stdout
ok 3 - vacuumlo --help nothing to stderr
ok 4 - vacuumlo --version exit code 0
ok 5 - vacuumlo --version goes to stdout
ok 6 - vacuumlo --version nothing to stderr
ok 7 - vacuumlo with invalid option nonzero exit code
ok 8 - vacuumlo with invalid option prints error message
ok
All tests successful.
Files=1, Tests=8, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.08 cusr
0.05 csys = 0.15 CPU)
Result: PASS
After (with attached patch):
t/001_basic.pl ..
# Subtest: vacuumlo --help
ok 1 - exit code 0
ok 2 - goes to stdout
ok 3 - nothing to stderr
1..3
ok 1 - vacuumlo --help
# Subtest: vacuumlo --version
ok 1 - exit code 0
ok 2 - goes to stdout
ok 3 - nothing to stderr
1..3
ok 2 - vacuumlo --version
# Subtest: vacuumlo options handling
ok 1 - invalid option nonzero exit code
ok 2 - invalid option prints error message
1..2
ok 3 - vacuumlo options handling
1..3
ok
All tests successful.
Files=1, Tests=3, 0 wallclock secs ( 0.02 usr 0.01 sys + 0.11 cusr
0.07 csys = 0.21 CPU)
Result: PASS
I think for deeply nested tests and test routines defined in other
modules, this helps structure the test output more like the test source
code is laid out, so it makes following the tests and locating failing
test code easier.
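The grouped structure shown in the output above can be sketched with Test::More's subtest function (this is an illustrative reconstruction, not the actual patch; the real suite drives the checks through shared helpers in Cluster.pm/Utils.pm):

```perl
# Sketch: one subtest per program invocation, three checks inside each.
# The captured results here are stand-ins, not real command output.
use strict;
use warnings;
use Test::More;

subtest 'vacuumlo --help' => sub {
    my $exit   = 0;
    my $stdout = "usage: vacuumlo ...\n";
    my $stderr = '';
    is($exit, 0, 'exit code 0');
    like($stdout, qr/usage/, 'goes to stdout');
    is($stderr, '', 'nothing to stderr');
};

done_testing();
```

Each subtest counts as a single top-level TAP test, with its inner "1..3" plan indented underneath, which is what produces the nested output above.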
Attachments:
v2-0001-Readd-use-of-TAP-subtests.patch (text/plain, +43 -34)
On 24 Feb 2022, at 11:16, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
I think for deeply nested tests and test routines defined in other modules,
this helps structure the test output more like the test source code is laid
out, so it makes following the tests and locating failing test code easier.
I don't have any strong opinions on this, but if we go ahead with it I think
there should be a short note in src/test/perl/README about when subtests could
be considered.
--
Daniel Gustafsson https://vmware.com/
Hi,
On 2022-02-24 11:16:03 +0100, Peter Eisentraut wrote:
Now that we have switched everything to done_testing(), the subtests feature
isn't that relevant anymore, but it might still be useful to get better
output when running with PROVE_FLAGS=--verbose.
I've incidentally played with subtests yesterday, when porting
src/interfaces/libpq/test/regress.pl to a tap test. Unfortunately it seems
that subtests aren't actually specified in the tap format, and that different
libraries generate different output formats. The reason this matters somewhat
is that meson's testrunner can parse tap and give nicer progress / error
reports. But since subtests aren't in the spec it can't currently parse
them...
Open issue since 2015:
https://github.com/TestAnything/Specification/issues/2
The perl ecosystem is so moribund :(.
t/001_basic.pl ..
# Subtest: vacuumlo --help
ok 1 - exit code 0
ok 2 - goes to stdout
ok 3 - nothing to stderr
1..3
It's clearly better.
We can approximate some of it by using is_deeply() and comparing exit, stdout,
stderr at once. Particularly for helpers like program_help() that are used in
a lot of places.
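The is_deeply() approximation suggested above could look like this (a hedged sketch with stand-in captured values; the actual helper names and capture mechanism in the tree may differ):

```perl
# Sketch: one is_deeply() per program run, comparing exit status,
# stderr, and a stdout predicate in a single TAP test.
use strict;
use warnings;
use Test::More;

# Hypothetical captured results from running "vacuumlo --help".
my ($exit, $stdout, $stderr) = (0, "usage: vacuumlo ...\n", '');

is_deeply(
    [ $exit, $stderr, ($stdout =~ /usage/ ? 1 : 0) ],
    [ 0,     '',      1 ],
    'vacuumlo --help: exit code, stderr, and stdout all as expected');

done_testing();
```

This keeps the per-invocation checks grouped without relying on subtest output that some TAP parsers reject.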
Greetings,
Andres Freund
On 24.02.22 16:00, Andres Freund wrote:
I've incidentally played with subtests yesterday, when porting
src/interfaces/libpq/test/regress.pl to a tap test. Unfortunately it seems
that subtests aren't actually specified in the tap format, and that different
libraries generate different output formats. The reason this matters somewhat
is that meson's testrunner can parse tap and give nicer progress / error
reports. But since subtests aren't in the spec it can't currently parse
them...
Ok that's good to know. What exactly happens when it tries to parse
them? Does it not count them or does it fail somehow? The way the
output is structured
t/001_basic.pl ..
# Subtest: vacuumlo --help
ok 1 - exit code 0
ok 2 - goes to stdout
ok 3 - nothing to stderr
1..3
ok 1 - vacuumlo --help
it appears that it should be able to parse it nonetheless and should
just count the non-indented lines.
On 2/25/22 08:39, Peter Eisentraut wrote:
On 24.02.22 16:00, Andres Freund wrote:
I've incidentally played with subtests yesterdays, when porting
src/interfaces/libpq/test/regress.pl to a tap test. Unfortunately it
seems
that subtests aren't actually specified in the tap format, and that
different
libraries generate different output formats. The reason this matters
somewhat
is that meson's testrunner can parse tap and give nicer progress / error
reports. But since subtests aren't in the spec it can't currently parse
them...
Ok that's good to know. What exactly happens when it tries to parse
them? Does it not count them or does it fail somehow? The way the
output is structured
t/001_basic.pl ..
# Subtest: vacuumlo --help
ok 1 - exit code 0
ok 2 - goes to stdout
ok 3 - nothing to stderr
1..3
ok 1 - vacuumlo --help
it appears that it should be able to parse it nonetheless and should
just count the non-indented lines.
AIUI TAP consumers are supposed to ignore lines they don't understand.
The Node TAP setup produces output like this, so perl is hardly alone
here. See <https://node-tap.org/docs/api/subtests/>
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Hi,
On 2022-02-25 14:39:15 +0100, Peter Eisentraut wrote:
On 24.02.22 16:00, Andres Freund wrote:
I've incidentally played with subtests yesterday, when porting
src/interfaces/libpq/test/regress.pl to a tap test. Unfortunately it seems
that subtests aren't actually specified in the tap format, and that different
libraries generate different output formats. The reason this matters somewhat
is that meson's testrunner can parse tap and give nicer progress / error
reports. But since subtests aren't in the spec it can't currently parse
them...
Ok that's good to know. What exactly happens when it tries to parse them?
Does it not count them or does it fail somehow? The way the output is
structured
It says that it can't parse a line of the TAP output:
16:06:55 MALLOC_PERTURB_=156 /usr/bin/perl /tmp/meson-test/build/../subtest.pl
----------------------------------- output -----------------------------------
stdout:
# Subtest: a
ok 1 - a: a
ok 2 - a: b
1..2
ok 1 - a
1..1
stderr:
TAP parsing error: unexpected input at line 4
------------------------------------------------------------------------------
t/001_basic.pl ..
# Subtest: vacuumlo --help
ok 1 - exit code 0
ok 2 - goes to stdout
ok 3 - nothing to stderr
1..3
ok 1 - vacuumlo --help
it appears that it should be able to parse it nonetheless and should just
count the non-indented lines.
It looks like it's not ignoring indented lines...
Greetings,
Andres Freund
On Fri, 2022-02-25 at 09:43 -0500, Andrew Dunstan wrote:
AIUI TAP consumers are supposed to ignore lines they don't understand.
It's undefined behavior [1]:
Any output that is not a version, a plan, a test line, a YAML block,
a diagnostic or a bail out is incorrect. How a harness handles the
incorrect line is undefined. Test::Harness silently ignores incorrect
lines, but will become more stringent in the future. TAP::Harness
reports TAP syntax errors at the end of a test run.
--Jacob
[1]: https://testanything.org/tap-version-13-specification.html
On Fri, 2022-02-25 at 16:35 +0000, Jacob Champion wrote:
On Fri, 2022-02-25 at 09:43 -0500, Andrew Dunstan wrote:
AIUI TAP consumers are supposed to ignore lines they don't understand.
It's undefined behavior [1]:
And of course the minute I send this I notice that I've linked the v13
spec instead of the original... sorry. Assuming Perl isn't marking its
tests as version 13, you are correct:
Any output line that is not a version, a plan, a test line, a
diagnostic or a bail out is considered an “unknown” line. A TAP
parser is required to not consider an unknown line as an error but
may optionally choose to capture said line and hand it to the test
harness, which may have custom behavior attached. This is to allow
for forward compatability. Test::Harness silently ignores incorrect
lines, but will become more stringent in the future. TAP::Harness
reports TAP syntax errors at the end of a test run.
--Jacob
Hi,
On 2022-02-25 09:43:20 -0500, Andrew Dunstan wrote:
AIUI TAP consumers are supposed to ignore lines they don't understand.
Are they?
In http://testanything.org/tap-version-13-specification.html there's:
"Lines written to standard output matching /^(not )?ok\b/ must be interpreted
as test lines. [...]All other lines must not be considered test output."
That says that all other lines aren't "test output". But what does that mean?
It certainly doesn't mean they can just be ignored, because obviously
^(TAP version|#|1..|Bail out) isn't to be ignored.
And then there's:
"
Anything else
Any output that is not a version, a plan, a test line, a YAML block, a
diagnostic or a bail out is incorrect. How a harness handles the incorrect
line is undefined. Test::Harness silently ignores incorrect lines, but will
become more stringent in the future. TAP::Harness reports TAP syntax errors at
the end of a test run.
"
If I were to implement a TAP parser, this wouldn't make me ignore lines.
Contrasting to that:
http://testanything.org/tap-specification.html
"
Anything else
A TAP parser is required to not consider an unknown line as an error but may
optionally choose to capture said line and hand it to the test harness,
which may have custom behavior attached. This is to allow for forward
compatability. Test::Harness silently ignores incorrect lines, but will
become more stringent in the future. TAP::Harness reports TAP syntax errors
at the end of a test run.
"
I honestly don't know what to make of that. Parsers are supposed to ignore
unknown lines, Test::Harness silently ignores, but also "TAP::Harness reports
TAP syntax errors"? This seems like a self-contradictory mess.
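To make the ambiguity concrete, here is a toy classifier for the line categories the two specs enumerate (an illustrative sketch, not any harness's actual parser). Because the test-line rule anchors at the start of the line, indented subtest output falls straight into the "anything else" bucket, which is exactly where the specs disagree on behavior:

```perl
# Toy TAP line classifier following the categories named in the specs.
use strict;
use warnings;

sub classify_tap_line {
    my ($line) = @_;
    return 'version'    if $line =~ /^TAP version \d+/;
    return 'plan'       if $line =~ /^1\.\.\d+/;
    return 'test'       if $line =~ /^(not )?ok\b/;
    return 'diagnostic' if $line =~ /^#/;
    return 'bail'       if $line =~ /^Bail out!/;
    return 'unknown';   # indented subtest lines land here
}

print classify_tap_line('ok 1 - vacuumlo --help'), "\n";  # test
print classify_tap_line('    ok 1 - exit code 0'), "\n";  # unknown
```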
Greetings,
Andres Freund