New regression test time

Started by Josh Berkusalmost 13 years ago34 messageshackers
Jump to latest
#1Josh Berkus
josh@agliodbs.com

Hackers,

Per discussion on these tests, I ran "make check" against 9.4 head,
applied all of the regression tests other than DISCARD.

Time for 3 "make check" runs without new tests: 65.9s

Time for 3 "make check runs with new tests: 71.7s

So that's an increase of about 10% in test runtime (or 2 seconds per run
on my laptop), in order to greatly improve regression test coverage.
I'd say that splitting the tests is not warranted, and that we should go
ahead with these tests on their testing merits, not based on any extra
check time they might add.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Andres Freund
andres@anarazel.de
In reply to: Josh Berkus (#1)
Re: New regression test time

On 2013-06-28 14:01:23 -0700, Josh Berkus wrote:

Per discussion on these tests, I ran "make check" against 9.4 head,
applied all of the regression tests other than DISCARD.

Time for 3 "make check" runs without new tests: 65.9s

Time for 3 "make check runs with new tests: 71.7s

So that's an increase of about 10% in test runtime (or 2 seconds per run
on my laptop), in order to greatly improve regression test coverage.

How did you evaluate that coverage increased "greatly"? I am not
generally against these tests but I'd be surprised if the overall test
coverage improved noticeably by this. Which makes 10% runtime overhead
pretty hefty if the goal is to actually achieve a high coverage.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Josh Berkus
josh@agliodbs.com
In reply to: Josh Berkus (#1)
Re: New regression test time

How did you evaluate that coverage increased "greatly"? I am not
generally against these tests but I'd be surprised if the overall test
coverage improved noticeably by this. Which makes 10% runtime overhead
pretty hefty if the goal is to actually achieve a high coverage.

I was relying on Robins' numbers of coverage:

Patch to add more regression tests to ASYNC (/src/backend/commands/async).
Take code-coverage from 39% to 75%.

Please find attached a patch to take code-coverage of ALTER OPERATOR
FAMILY.. ADD / DROP (src/backend/commands/opclasscmds.c) from 50% to 87%.

Please find attached a patch to take code-coverage of LOCK TABLE (
src/backend/commands/lockcmds.c) from 57% to 84%.

... etc.

If we someday add so many tests that "make check" takes over a minute on
a modern laptop, then maybe it'll be worth talking about splitting the
test suite into "regular" and "extended". However, it would require 15
more patch sets the size of Robins' to get there, so I don't see that
it's worth the trouble any time soon.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Andres Freund
andres@anarazel.de
In reply to: Josh Berkus (#3)
Re: New regression test time

On 2013-06-28 14:46:10 -0700, Josh Berkus wrote:

How did you evaluate that coverage increased "greatly"? I am not
generally against these tests but I'd be surprised if the overall test
coverage improved noticeably by this. Which makes 10% runtime overhead
pretty hefty if the goal is to actually achieve a high coverage.

I was relying on Robins' numbers of coverage:

Those improvements rather likely end up being an improvement a good bit
less than one percent for the whole binary.

If we someday add so many tests that "make check" takes over a minute on
a modern laptop, then maybe it'll be worth talking about splitting the
test suite into "regular" and "extended". However, it would require 15
more patch sets the size of Robins' to get there, so I don't see that
it's worth the trouble any time soon.

Was it actually an assert enabled build that you tested?

We currently can run make check with stuff like CLOBBER_CACHE_ALWAYS
which finds bugs pretty regularly. If we achieve a high coverage we
quite possibly can't anymore, at least not regularly.
So I actually think having two modes makes sense. Then we could also
link stuff like isolationtester automatically into the longer running
mode...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Stephen Frost
sfrost@snowman.net
In reply to: Josh Berkus (#1)
Re: New regression test time

* Josh Berkus (josh@agliodbs.com) wrote:

So that's an increase of about 10% in test runtime (or 2 seconds per run
on my laptop), in order to greatly improve regression test coverage.
I'd say that splitting the tests is not warranted, and that we should go
ahead with these tests on their testing merits, not based on any extra
check time they might add.

For my 2c, +1 on this in general, in spite of the concerns. Covering
cases that we don't is valuable in general and if we get a bit more
coverage for a few more seconds, it's worth it.

Also, if someone wants to split the test up, then they need to provide a
patch which does so. I'm not against that, but I do not feel this
addition should be held up waiting for someone to implement such a
seperation- if anything, having the two things done independently would
probably be cleaner anyway.

Thanks,

Stephen

#6Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Andres Freund (#4)
Re: New regression test time

How did you evaluate that coverage increased "greatly"? I am not
generally against these tests but I'd be surprised if the overall test
coverage improved noticeably by this. Which makes 10% runtime overhead
pretty hefty if the goal is to actually achieve a high coverage.

I was relying on Robins' numbers of coverage:

Those improvements rather likely end up being an improvement a good bit
less than one percent for the whole binary.

Yes, but it is a valuable percent nevertheless.

As I understand it, the coverage is about the tested command logic. A lot
this logic is dedicated to check permissions (can you add an operator to
this database? ...) and to verify required conditions (is the function
proposed for operator has the right signature? does the operator overwrite
an existing one? ...).

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Josh Berkus (#1)
Re: New regression test time

Josh Berkus escribi�:

Hackers,

Per discussion on these tests, I ran "make check" against 9.4 head,
applied all of the regression tests other than DISCARD.

Time for 3 "make check" runs without new tests: 65.9s

Time for 3 "make check runs with new tests: 71.7s

So that's an increase of about 10% in test runtime (or 2 seconds per run
on my laptop),

I see two problems with this report:
1. it creates a new installation for each run,
2. it only uses the serial schedule.

I care more about the parallel schedule than the serial one, because
since it obviously runs in less time, then I can run it often and not
worry about how long it takes. On the other hand, the cost of the extra
initdb obviously means that the percentage is a bit lower than if you
were to compare test run time without considering the initdb step.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#7)
Re: New regression test time

On Jun 29, 2013, at 12:36 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I see two problems with this report:
1. it creates a new installation for each run,

But that's the normal way of running the tests anyway, isn't it?

2. it only uses the serial schedule.

make check uses the parallel schedule - did Josh indicate he did anything else?

...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Josh Berkus
josh@agliodbs.com
In reply to: Josh Berkus (#1)
Re: New regression test time

I see two problems with this report:
1. it creates a new installation for each run,

Yes, I'm running "make check"

2. it only uses the serial schedule.

Um, no:

parallel group (19 tests): limit prepare copy2 plancache xml returning
conversion rowtypes largeobject temp truncate polymorphism with
without_oid sequence domain rangefuncs alter_table plpgsql

Out of curiosity, I tried a serial run (MAX_CONNECTIONS=1), which took
about 39s (with patches).

I care more about the parallel schedule than the serial one, because
since it obviously runs in less time, then I can run it often and not
worry about how long it takes. On the other hand, the cost of the extra
initdb obviously means that the percentage is a bit lower than if you
were to compare test run time without considering the initdb step.

Possibly, but I know I run "make check" not "make installcheck" when I'm
testing new code. And the buildfarm, afaik, runs "make check". And,
for that matter, who the heck cares?

The important thing is that "make check" still runs in < 30 seconds on a
standard laptop (an ultraportable, even), so it's not holding up a
change-test-change-test cycle. If you were looking to prove something
else, then go for it.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Andrew Dunstan
andrew@dunslane.net
In reply to: Josh Berkus (#9)
Re: New regression test time

On 06/29/2013 03:57 PM, Josh Berkus wrote:

I see two problems with this report:
1. it creates a new installation for each run,

Yes, I'm running "make check"

2. it only uses the serial schedule.

Um, no:

parallel group (19 tests): limit prepare copy2 plancache xml returning
conversion rowtypes largeobject temp truncate polymorphism with
without_oid sequence domain rangefuncs alter_table plpgsql

Out of curiosity, I tried a serial run (MAX_CONNECTIONS=1), which took
about 39s (with patches).

I care more about the parallel schedule than the serial one, because
since it obviously runs in less time, then I can run it often and not
worry about how long it takes. On the other hand, the cost of the extra
initdb obviously means that the percentage is a bit lower than if you
were to compare test run time without considering the initdb step.

Possibly, but I know I run "make check" not "make installcheck" when I'm
testing new code. And the buildfarm, afaik, runs "make check". And,
for that matter, who the heck cares?

It runs both :-) We run "make check" early in the process to make sure
we can at least get that far. We run "make installcheck" later, among
other things to check that the tests work in different locales.

I think we need to have a better understanding of just what our standard
regression tests do.

AIUI: They do test feature use and errors that have cropped up in the
past that we need to beware of. They don't test every bug we've ever
had, nor do they exercise every piece of code.

Maybe there is a good case for these last two in a different set of tests.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Josh Berkus
josh@agliodbs.com
In reply to: Josh Berkus (#1)
Re: New regression test time

On 06/29/2013 02:14 PM, Andrew Dunstan wrote:

AIUI: They do test feature use and errors that have cropped up in the
past that we need to beware of. They don't test every bug we've ever
had, nor do they exercise every piece of code.

If we don't have a test for it, then we can break it in the future and
not know we've broken it until .0 is released. Is that really a
direction we're happy going in?

Maybe there is a good case for these last two in a different set of tests.

If we had a different set of tests, that would be a valid argument. But
we don't, so it's not. And nobody has offered to write a feature to
split our tests either.

I have to say, I'm really surprised at the level of resistance people on
this list are showing to the idea of increasing test coverage. I thought
that Postgres was all about reliability? For a project as mature as we
are, our test coverage is abysmal, and I think I'm starting to see why.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Dann Corbit
DCorbit@connx.com
In reply to: Josh Berkus (#11)
Re: New regression test time

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Josh Berkus
Sent: Saturday, June 29, 2013 3:00 PM
To: Andrew Dunstan
Cc: Alvaro Herrera; pgsql-hackers@postgresql.org; Robins Tharakan
Subject: Re: [HACKERS] New regression test time

On 06/29/2013 02:14 PM, Andrew Dunstan wrote:

AIUI: They do test feature use and errors that have cropped up in the
past that we need to beware of. They don't test every bug we've ever
had, nor do they exercise every piece of code.

If we don't have a test for it, then we can break it in the future and not know we've broken it until .0 is released. Is that really a direction we're happy going in?

Maybe there is a good case for these last two in a different set of tests.

If we had a different set of tests, that would be a valid argument. But we don't, so it's not. And nobody has offered to write a feature to split our tests either.

I have to say, I'm really surprised at the level of resistance people on this list are showing to the idea of increasing test coverage. I thought
that Postgres was all about reliability? For a project as mature as we
are, our test coverage is abysmal, and I think I'm starting to see why.

An ounce of prevention is worth a pound of cure.
The cost of a bug rises exponentially, starting at requirements gathering and ending at the customer.
Where I work, we have two computer rooms full of machines that run tests around the clock, 24x365.
Even so, a full regression takes well over a week because we perform hundreds of thousands of tests.
All choices of this kind are trade-offs. But in such situations, my motto is:
"Do whatever will make the customer prosper the most."
IMO-YMMV
<<

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Andrew Dunstan
andrew@dunslane.net
In reply to: Josh Berkus (#11)
Re: New regression test time

On 06/29/2013 05:59 PM, Josh Berkus wrote:

Maybe there is a good case for these last two in a different set of tests.

If we had a different set of tests, that would be a valid argument. But
we don't, so it's not. And nobody has offered to write a feature to
split our tests either.

I have to say, I'm really surprised at the level of resistance people on
this list are showing to the idea of increasing test coverage. I thought
that Postgres was all about reliability? For a project as mature as we
are, our test coverage is abysmal, and I think I'm starting to see why.

Dividing the tests into different sections is as simple as creating one
schedule file per section.

I'm not at all resistant to it. In fact, of someone wants to set up
separate sections and add new tests to the different sections I'll be
more than happy to provide buildfarm support for it. Obvious candidates
could include:

* code coverage
* bugs
* tests too big to run in everyday developer use

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Josh Berkus
josh@agliodbs.com
In reply to: Josh Berkus (#1)
Re: New regression test time

Dividing the tests into different sections is as simple as creating one
schedule file per section.

Oh? Huh. I'd thought it would be much more complicated. Well, by all
means, let's do it then.

I'm not personally convinced that the existing regression tests all
belong in the "default" section -- I think a lot of what we've chosen to
test or not test to date has been fairly arbitrary -- but we can tinker
with that later.

I'm not at all resistant to it. In fact, of someone wants to set up
separate sections and add new tests to the different sections I'll be
more than happy to provide buildfarm support for it. Obvious candidates
could include:

* code coverage
* bugs
* tests too big to run in everyday developer use

So we'd want codecoverage and codecoverage-parallel then, presumably,
for Robins tests? Is there really a reason to supply a non-parallel
version? Would we want to include the core tests in those?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Claudio Freire
klaussfreire@gmail.com
In reply to: Josh Berkus (#14)
Re: New regression test time

On Sat, Jun 29, 2013 at 7:58 PM, Josh Berkus <josh@agliodbs.com> wrote:

Dividing the tests into different sections is as simple as creating one
schedule file per section.

Oh? Huh. I'd thought it would be much more complicated. Well, by all
means, let's do it then.

I think I should point out, since it seems to have gone unnoticed,
that there's a thread with a patch about exactly this (splitting
tests), the "big test separation POC".

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Stephen Frost
sfrost@snowman.net
In reply to: Josh Berkus (#11)
Re: New regression test time

Josh,

* Josh Berkus (josh@agliodbs.com) wrote:

If we don't have a test for it, then we can break it in the future and
not know we've broken it until .0 is released. Is that really a
direction we're happy going in?

To be fair, AIUI anyway, certain companies have much larger regression
suites that they run new versions of PG against. I'm sure they don't
have the platform coverage that the buildfarm does, but I expect no
small number of build farm animals would fall over under the strain of
that regression suite (or at least, they'd have trouble keeping up with
the commit rate).

I'm definitely pro-more-tests when they add more code/feature coverage
and are nominal in terms of additional time taken during 'make check',
but I seriously doubt we're ever going to have anything close to
complete code coverage in such a suite of tests.

I have to say, I'm really surprised at the level of resistance people on
this list are showing to the idea of increasing test coverage. I thought
that Postgres was all about reliability? For a project as mature as we
are, our test coverage is abysmal, and I think I'm starting to see why.

I do think having these tests split into different groups would be
valuable. It might even be good to split them into groups based on what
code they exercise and then devs might be able to decide "I'm working in
this area right now, I want the tests that are applicable to that code".
Might be a bit too much effort though too, dunno. Would be really neat
if it was automated in some way. ;)

Thanks,

Stephen

#17Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Josh Berkus (#11)
Re: New regression test time

If we had a different set of tests, that would be a valid argument. But
we don't, so it's not. And nobody has offered to write a feature to
split our tests either.

I have done a POC. See:

https://commitfest.postgresql.org/action/patch_view?id=1170

What I have not done is to decide how to split tests in two buckets.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Amit Kapila
amit.kapila16@gmail.com
In reply to: Fabien COELHO (#17)
Re: New regression test time

On Sunday, June 30, 2013 11:37 AM Fabien COELHO wrote:

If we had a different set of tests, that would be a valid argument. But
we don't, so it's not. And nobody has offered to write a feature to
split our tests either.

I have done a POC. See:

https://commitfest.postgresql.org/action/patch_view?id=1170

I think it is better to submit for next commit fest which is at below link:

https://commitfest.postgresql.org/action/commitfest_view?id=19

With Regards,
Amit Kapila.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Amit Kapila (#18)
Re: New regression test time

https://commitfest.postgresql.org/action/patch_view?id=1170

I think it is better to submit for next commit fest which is at below link:

https://commitfest.postgresql.org/action/commitfest_view?id=19

I put it there as the discussion whether to accept or not Robins patches
because of their possible impact on non-regression test time is right now,
so it may make sense to look at it now, and it is a rather small patch.
Otherwise, next commit fest is fine.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Robins Tharakan
tharakan@gmail.com
In reply to: Amit Kapila (#18)
Re: New regression test time

On 30 June 2013 02:33, Amit kapila <amit.kapila@huawei.com> wrote:

On Sunday, June 30, 2013 11:37 AM Fabien COELHO wrote:

If we had a different set of tests, that would be a valid argument. But
we don't, so it's not. And nobody has offered to write a feature to
split our tests either.

I have done a POC. See:

https://commitfest.postgresql.org/action/patch_view?id=1170

I think it is better to submit for next commit fest which is at below link:

https://commitfest.postgresql.org/action/commitfest_view?id=19

Hi,

- There is a certain value in having separate tests, just that for the
big-tests to be any meaningful, if the buildfarm could run on a periodic
(daily?) basis and send some kind of automated bug-reports. Without an
automatic feedback, most may not inclined to run all tests before
submitting a patch and there'd be a big pile up near a release.

- For now, the new tests that I submit for review (for next CF) would be
for 'make check', until a 'make bigcheck' or whatever is up and running.

--
Robins Tharakan

#21Jeff Janes
jeff.janes@gmail.com
In reply to: Andrew Dunstan (#13)
#22Josh Berkus
josh@agliodbs.com
In reply to: Josh Berkus (#1)
#23Amit Kapila
amit.kapila16@gmail.com
In reply to: Josh Berkus (#22)
#24David Fetter
david@fetter.org
In reply to: Josh Berkus (#11)
#25Andres Freund
andres@anarazel.de
In reply to: David Fetter (#24)
#26Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#25)
#27Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#26)
#28Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Andrew Dunstan (#27)
#29Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#26)
#30Simon Riggs
simon@2ndQuadrant.com
In reply to: Noah Misch (#29)
#31Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#30)
#32Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#31)
#33Josh Berkus
josh@agliodbs.com
In reply to: Josh Berkus (#1)
#34Andrew Dunstan
andrew@dunslane.net
In reply to: Josh Berkus (#33)