performance-test farm

Started by Tomas Vondra, 22 messages
#1 Tomas Vondra
tv@fuzzy.cz

Hi everyone,

several members of this mailing list have recently mentioned that it'd
be really useful to have a performance-test farm, and that it might
improve the development process and make some changes easier.

I've briefly discussed this with another CSPUG member, who represents a
local company that has been using PostgreSQL for a long time (and that
supports CSPUG), and we've agreed to investigate this a bit further.

I do have a rough idea of what it might look like, but I've never built
a performance-testing farm for such a distributed project. So I'd like
to know what you would expect from such a beast. Especially:

1) Is there something that might serve as a model?

I've googled to search for an existing tool, but "performance-test
farm" gave me a lot of info about how to breed cows, pigs and goats
on a farm, which is not very useful in this case, I guess.

2) How would you use it? What procedure would you expect?

I mean this should produce regular performance tests of the current
sources (and publish them on some website), but the whole point is to
allow developers to do a performance test of their changes before
committing to the main tree.

How would you expect to deliver these changes to the farm? How would
you define the job? How would you expect to get the results? etc.

Just try to write down a list of steps.

3) Any other features expected?

If you notice any interesting feature, write it down and note
whether it's a 'must have' or a 'nice to have' feature.

I really can't promise anything right now - I have only a very rough
idea of how much time/effort/money this might take. So let's see what is
needed to build a 'minimal farm' and whether it's feasible with the
resources we can get.

regards
Tomas

#2 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tomas Vondra (#1)
Re: performance-test farm

Tomas Vondra <tv@fuzzy.cz> wrote:

1) Is there something that might serve as a model?

I've been assuming that we would use the PostgreSQL Buildfarm as a
model.

http://buildfarm.postgresql.org/

2) How would you use it? What procedure would you expect?

People who had suitable test environments could sign up to
periodically build and performance test using the predetermined test
suite, and report results back for a consolidated status display.
That would spot regressions.

It would be nice to have a feature where a proposed patch could be
included for a one-time build-and-benchmark run, so that ideas could
be tried before commit. It can be hard to anticipate all the
differences between Intel and AMD, Linux and Windows, 32 bit and 64
bit, etc.

3) Any other features expected?

Pretty graphs? :-)

-Kevin

#3 Tomas Vondra
tv@fuzzy.cz
In reply to: Kevin Grittner (#2)
Re: performance-test farm

On 11.5.2011 23:41, Kevin Grittner wrote:

Tomas Vondra <tv@fuzzy.cz> wrote:

1) Is there something that might serve as a model?

I've been assuming that we would use the PostgreSQL Buildfarm as a
model.

http://buildfarm.postgresql.org/

Yes, I was thinking about that too, but

1) A buildfarm used for regular building / unit testing IMHO may not
be the right place to do performance testing (not sure how isolated
the benchmarks can be etc.).

2) Not sure how open this might be for the developers (if they could
issue their own builds etc.).

3) If this should be part of the current buildfarm, then I'm afraid I
can't do much about it.

2) How would you use it? What procedure would you expect?

People who had suitable test environments could sign up to
periodically build and performance test using the predetermined test
suite, and report results back for a consolidated status display.
That would spot regressions.

So it would be a 'distributed farm'? Not sure if that's a good idea, as
to get reliable benchmark results you need a proper environment (not
influenced by other jobs, changes of hw etc.).

It would be nice to have a feature where a proposed patch could be
included for a one-time build-and-benchmark run, so that ideas could
be tried before commit. It can be hard to anticipate all the
differences between Intel and AMD, Linux and Windows, 32 bit and 64
bit, etc.

Yes, that's one of the main goals - to allow developers to benchmark
their patches under various workloads. I don't think we'll be able to
get all those configurations, though.

3) Any other features expected?

Pretty graphs? :-)

Sure. And it will be Web 2.0 ready ;-)

Tomas

#4 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tomas Vondra (#3)
Re: performance-test farm

Tomas Vondra <tv@fuzzy.cz> wrote:

On 11.5.2011 23:41, Kevin Grittner wrote:

Tomas Vondra <tv@fuzzy.cz> wrote:

1) Is there something that might serve as a model?

I've been assuming that we would use the PostgreSQL Buildfarm as
a model.

http://buildfarm.postgresql.org/

Yes, I was thinking about that too, but

1) A buildfarm used for regular building / unit testing IMHO may
not be the right place to do performance testing (not sure how
isolated the benchmarks can be etc.).

I'm not saying that we should use the existing buildfarm, or expect
current buildfarm machines to support this; just that the pattern of
people volunteering hardware in a similar way would be good.

2) Not sure how open this might be for the developers (if they
could issue their own builds etc.).

I haven't done it, but I understand that you can create a "local"
buildfarm instance which isn't reporting its results. Again,
something similar might be good.

3) If this should be part of the current buildfarm, then I'm
afraid I can't do much about it.

Not part of the current buildfarm; just using a similar overall
pattern. Others may have different ideas; I'm just speaking for
myself here about what seems like a good idea to me.

2) How would you use it? What procedure would you expect?

People who had suitable test environments could sign up to
periodically build and performance test using the predetermined
test suite, and report results back for a consolidated status
display. That would spot regressions.

So it would be a 'distributed farm'? Not sure if that's a good
idea, as to get reliable benchmark results you need a proper
environment (not influenced by other jobs, changes of hw etc.).

Yeah, accurate benchmarking is not easy. We would have to make sure
people understood that the machine should be dedicated to the
benchmark while it is running, which is not a requirement for the
buildfarm. Maybe provide some way to annotate HW or OS changes?
So if one machine goes to a new kernel and performance changes
radically, but other machines which didn't change their kernel
continue on a level graph, we'd know to suspect the kernel rather
than some change in PostgreSQL code.

-Kevin

#5 Tomas Vondra
tv@fuzzy.cz
In reply to: Kevin Grittner (#4)
Re: performance-test farm

On 12.5.2011 00:21, Kevin Grittner wrote:

Tomas Vondra <tv@fuzzy.cz> wrote:

On 11.5.2011 23:41, Kevin Grittner wrote:

Tomas Vondra <tv@fuzzy.cz> wrote:

1) Is there something that might serve as a model?

I've been assuming that we would use the PostgreSQL Buildfarm as
a model.

http://buildfarm.postgresql.org/

Yes, I was thinking about that too, but

1) A buildfarm used for regular building / unit testing IMHO may
not be the right place to do performance testing (not sure how
isolated the benchmarks can be etc.).

I'm not saying that we should use the existing buildfarm, or expect
current buildfarm machines to support this; just that the pattern of
people volunteering hardware in a similar way would be good.

Good point. Actually, I was not aware of how the buildfarm works; all I
knew was that there's something like that, because some of the hackers
occasionally mention a failed build on the mailing list.

So I guess this is a good opportunity to investigate it a bit ;-)

Anyway, I'm not sure this would give us the kind of environment we need
to do benchmarks ... but it's worth thinking about.

2) Not sure how open this might be for the developers (if they
could issue their own builds etc.).

I haven't done it, but I understand that you can create a "local"
buildfarm instance which isn't reporting its results. Again,
something similar might be good.

Well, yeah. So the developers would get a local 'copy' of all the
benchmarks / workloads and could run them?

3) If this should be part of the current buildfarm, then I'm
afraid I can't do much about it.

Not part of the current buildfarm; just using a similar overall
pattern. Others may have different ideas; I'm just speaking for
myself here about what seems like a good idea to me.

OK, got it.

2) How would you use it? What procedure would you expect?

People who had suitable test environments could sign up to
periodically build and performance test using the predetermined
test suite, and report results back for a consolidated status
display. That would spot regressions.

So it would be a 'distributed farm'? Not sure if that's a good
idea, as to get reliable benchmark results you need a proper
environment (not influenced by other jobs, changes of hw etc.).

Yeah, accurate benchmarking is not easy. We would have to make sure
people understood that the machine should be dedicated to the
benchmark while it is running, which is not a requirement for the
buildfarm. Maybe provide some way to annotate HW or OS changes?
So if one machine goes to a new kernel and performance changes
radically, but other machines which didn't change their kernel
continue on a level graph, we'd know to suspect the kernel rather
than some change in PostgreSQL code.

I guess we could run a script that collects all those important
parameters and then detects changes. Anyway, we still need some 'really
stable' machines that are not changed at all, to get a long-term baseline.

But I guess that could be done by running some dedicated machines ourselves.
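
To illustrate the kind of script I have in mind (just a sketch, assuming
a Linux member; the collected fields, the snapshot location and the
change check are made-up examples, not a proposal for the actual farm
code):

#!/usr/bin/env python
"""Collect a snapshot of the benchmark environment and report changes.

Illustration only: the fields gathered here (kernel, CPU model, total
RAM, PostgreSQL version) are examples, not a fixed list.
"""
import hashlib
import json
import os
import platform
import subprocess

SNAPSHOT_FILE = os.path.expanduser("~/.pgperf_env.json")  # hypothetical location

def collect_environment():
    env = {"kernel": platform.release(), "machine": platform.machine()}
    try:  # CPU model, Linux only
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("model name"):
                    env["cpu"] = line.split(":", 1)[1].strip()
                    break
    except OSError:
        env["cpu"] = "unknown"
    try:  # total RAM, Linux only
        with open("/proc/meminfo") as f:
            env["mem_total_kb"] = int(f.readline().split()[1])
    except OSError:
        env["mem_total_kb"] = 0
    try:  # version of the build under test
        env["pg_version"] = subprocess.check_output(
            ["pg_config", "--version"]).decode().strip()
    except (OSError, subprocess.CalledProcessError):
        env["pg_version"] = "unknown"
    return env

def environment_changed(env):
    """Compare against the snapshot from the previous run, then store it."""
    digest = hashlib.sha1(json.dumps(env, sort_keys=True).encode()).hexdigest()
    previous = None
    if os.path.exists(SNAPSHOT_FILE):
        with open(SNAPSHOT_FILE) as f:
            previous = json.load(f).get("digest")
    with open(SNAPSHOT_FILE, "w") as f:
        json.dump({"digest": digest, "environment": env}, f, indent=2)
    return previous is not None and previous != digest

if __name__ == "__main__":
    env = collect_environment()
    print("environment changed" if environment_changed(env) else "environment unchanged")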

regards
Tomas

#6 Andrew Dunstan
andrew@dunslane.net
In reply to: Kevin Grittner (#4)
Re: performance-test farm

On 05/11/2011 06:21 PM, Kevin Grittner wrote:

Tomas Vondra <tv@fuzzy.cz> wrote:

On 11.5.2011 23:41, Kevin Grittner wrote:

Tomas Vondra <tv@fuzzy.cz> wrote:

First up, you guys should be aware that Greg Smith at least is working
on this. Let's not duplicate effort.

1) Is there something that might serve as a model?

I've been assuming that we would use the PostgreSQL Buildfarm as
a model.

http://buildfarm.postgresql.org/

Yes, I was thinking about that too, but

1) A buildfarm used for regular building / unit testing IMHO may
not be the right place to do performance testing (not sure how
isolated the benchmarks can be etc.).

I'm not saying that we should use the existing buildfarm, or expect
current buildfarm machines to support this; just that the pattern of
people volunteering hardware in a similar way would be good.

Some buildfarm members might well be suitable for it.

I recently added support for running optional steps, and made the SCM
module totally generic. Soon I'm hoping to provide for more radical
extensibility by having addon modules, which will register themselves
with the framework and then have their tests run. I'm currently working
on an API for such modules. This was inspired by Mike Fowler's work on a
module to test JDBC builds, which his buildfarm member is currently
doing; see
<http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=piapiac&dt=2011-05-11%2000%3A00%3A02>
for example. Obvious candidate modules might be other client libraries
(e.g. perl DBD::Pg), non-committed patches, non-standard tests, and
performance testing.

2) Not sure how open this might be for the developers (if they
could issue their own builds etc.).

I haven't done it, but I understand that you can create a "local"
buildfarm instance which isn't reporting its results. Again,
something similar might be good.

You can certainly create a client that doesn't report its results (just
run it in --test mode). And you can create your own private server
(that's been done by at least two organizations I know of).

But to test your own stuff, what we really need is a module to run
non-committed patches, I think (see above).

The buildfarm client does have a mode (--from-source) that lets you
test your own stuff and doesn't report on it if you do, but I don't see
that it would be useful here.

3) If this should be part of the current buildfarm, then I'm
afraid I can't do much about it.

Sure you can. Contribute to the efforts mentioned above.

Not part of the current buildfarm; just using a similar overall
pattern. Others may have different ideas; I'm just speaking for
myself here about what seems like a good idea to me.

The buildfarm server is a pretty generic reporting framework. Sure we
can build another. But it seems a bit redundant.

2) How would you use it? What procedure would you expect?

People who had suitable test environments could sign up to
periodically build and performance test using the predetermined
test suite, and report results back for a consolidated status
display. That would spot regressions.

So it would be a 'distributed farm'? Not sure if that's a good
idea, as to get reliable benchmark results you need a proper
environment (not influenced by other jobs, changes of hw etc.).

You are not going to get a useful performance farm except in a
distributed way. We don't own any labs, nor have we any way of
assembling the dozens or hundreds of machines to represent the spectrum
of platforms that we want tested in one spot. Knowing that we have
suddenly caused a performance regression on, say, FreeBSD 8.1 running on
AMD64, is a critical requirement.

Yeah, accurate benchmarking is not easy. We would have to make sure
people understood that the machine should be dedicated to the
benchmark while it is running, which is not a requirement for the
buildfarm. Maybe provide some way to annotate HW or OS changes?
So if one machine goes to a new kernel and performance changes
radically, but other machines which didn't change their kernel
continue on a level graph, we'd know to suspect the kernel rather
than some change in PostgreSQL code.

Indeed, there are lots of moving pieces.

cheers

andrew

#7 Stephen Frost
sfrost@snowman.net
In reply to: Andrew Dunstan (#6)
Re: performance-test farm

* Andrew Dunstan (andrew@dunslane.net) wrote:

First up, you guys should be aware that Greg Smith at least is
working on this. Let's not duplicate effort.

Indeed. I'm also interested in making this happen and have worked with
Greg in the past on it. There's even some code out there that we
developed to add it on to the buildfarm, though that needs to be
reworked to fit with Andrew's latest changes (which are all good
changes).

We need a bit of hardware, but more, we need someone to clean up the
code, get it all integrated, and make it all work and report useful
information. My feeling is "if you build it, they will come" with
regard to the hardware/performance machines.

Thanks,

Stephen

#8 Josh Berkus
josh@agliodbs.com
In reply to: Stephen Frost (#7)
Re: performance-test farm

Guys,

There are two mutually exclusive problems to solve with a
performance-test farm.

The first problem is platform performance, which would be a matter of
expanding the buildfarm to include a small set of performance tests ...
probably ones based on previously known problems, plus some other simple
common operations. The goal here would be to test on as many different
machines as possible, rather than getting full coverage of performance.

The second would be to test the full range of PostgreSQL performance.
That is, to test every different thing we can reasonably benchmark on a
PostgreSQL server. This would have to be done on a few dedicated
full-time testing machines, because of the need to use the full hardware
resources. When done, this test would be like a full-blown TPC benchmark.

The above are fundamentally different tests.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#9 Stephen Frost
sfrost@snowman.net
In reply to: Josh Berkus (#8)
Re: performance-test farm

* Josh Berkus (josh@agliodbs.com) wrote:

The first problem is platform performance, which would be a matter of
expanding the buildfarm to include a small set of performance tests ...
probably ones based on previously known problems, plus some other simple
common operations. The goal here would be to test on as many different
machines as possible, rather than getting full coverage of performance.

imv, we should be trying to include the above in the regression tests,
presuming that they can be done in that structure and that they can be
done 'quickly'. (It shouldn't be hard to figure out if gettimeofday()
is really slow on one arch, for example, which I think is what you're
getting at here...)
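
(To illustrate the sort of 'quick' check I mean, here's a rough sketch -
in Python rather than C, so it only puts an upper bound on the clock-read
cost rather than measuring gettimeofday() itself; the loop count is
arbitrary:)

import time

def clock_read_overhead(n=1000000):
    """Estimate the per-call cost of reading the clock.

    Rough analogue only: time.time() wraps gettimeofday()/clock_gettime()
    but adds interpreter overhead, so treat the result as an upper bound,
    not a measurement of the syscall itself.
    """
    start = time.time()
    for _ in range(n):
        time.time()
    return (time.time() - start) / n

if __name__ == "__main__":
    print("approx. %.0f ns per clock read" % (clock_read_overhead() * 1e9))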

The second would be to test the full range of PostgreSQL performance.
That is, to test every different thing we can reasonably benchmark on a
PostgreSQL server. This would have to be done on a few dedicated
full-time testing machines, because of the need to use the full hardware
resources. When done, this test would be like a full-blown TPC benchmark.

Right, this is what I thought this discussion (and much of the other
recent commentary) was focused on. I don't see the first as needing an
independent 'farm'.

Thanks,

Stephen

#10 Greg Smith
greg@2ndquadrant.com
In reply to: Tomas Vondra (#5)
Re: performance-test farm

Tomas Vondra wrote:

Actually I was not aware of how the buildfarm works, all I
knew was there's something like that because some of the hackers mention
a failed build on the mailing list occasionally.

So I guess this is a good opportunity to investigate it a bit ;-)

Anyway I'm not sure this would give us the kind of environment we need
to do benchmarks ... but it's worth to think of.

The idea is that buildfarm systems that are known to have a) reasonable
hardware and b) no other concurrent work going on could also do
performance tests. The main benefit of this approach is it avoids
duplicating all of the system management and source code building work
needed for any sort of thing like this; just leverage the buildfarm
parts when they solve similar enough problems. Someone has actually
done all that already; source code was last sync'd to the build farm
master at the end of March: https://github.com/greg2ndQuadrant/client-code

By far the #1 thing needed to move this forward from where it's stuck at
now is someone willing to dig into the web application side of this.
We're collecting useful data. It now needs to be uploaded to the
server, saved, and then reports of what happened generated. Eventually
graphs of performance results over time will be straightforward to
generate. But the whole idea requires that someone else (not Andrew, who
has enough to do) sit down and figure out how to extend the web UI with
these new elements.

I guess we could run a script that collects all those important
parameters and then detect changes. Anyway we still need some 'really
stable' machines that are not changed at all, to get a long-term baseline.

I have several such scripts I use, and I know where two very serious ones
developed by others are, too. This part is not a problem. If the
changes are big enough to matter, they will show up as a difference on
the many possible "how is the server configured?" reports; we just need
to pick the most reasonable one. It's a small detail I'm not worried
about yet.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#9)
Re: performance-test farm

* Josh Berkus (josh@agliodbs.com) wrote:

The first problem is platform performance, which would be a matter of
expanding the buildfarm to include a small set of performance tests ...
probably ones based on previously known problems, plus some other simple
common operations. The goal here would be to test on as many different
machines as possible, rather than getting full coverage of performance.

I think it's a seriously *bad* idea to expect existing buildfarm members
to produce useful performance data. Very few of them are running on
dedicated machines, and some are deliberately configured with
performance-trashing options. (I think just about all of 'em use
--enable-cassert, but there are some with worse things...)

We can probably share a great deal of the existing buildfarm code and
infrastructure, but the actual members of the p-farm will need to be a
separate collection of machines running different builds.

Stephen Frost <sfrost@snowman.net> writes:

imv, we should be trying to include the above in the regression tests,
presuming that they can be done in that structure and that they can be
done 'quickly'.

There's no such thing as a useful performance test that runs quickly
enough to be sane to incorporate in our standard regression tests.

regards, tom lane

#12 Greg Smith
greg@2ndquadrant.com
In reply to: Tom Lane (#11)
Re: performance-test farm

Tom Lane wrote:

There's no such thing as a useful performance test that runs quickly
enough to be sane to incorporate in our standard regression tests.

To throw a hard number out: I can get a moderately useful performance
test on a SELECT-only workload from pgbench in about one minute. That's
the bare minimum; stepping up to 5 minutes is really necessary before
I'd want to draw any real conclusions.
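
(For reference, a run like that needs nothing fancy to drive it - a
minimal sketch, assuming pgbench is on the PATH and a scratch database
named "pgbench" was already initialized with pgbench -i at a suitable
scale; the client count and the output parsing are only examples:)

import re
import subprocess

def quick_select_only_tps(db="pgbench", clients=8, seconds=60):
    """Run a short SELECT-only pgbench test and return the reported tps.

    Sketch only: assumes the database has already been initialized
    with `pgbench -i` at an appropriate scale for this machine.
    """
    out = subprocess.check_output(
        ["pgbench",
         "-S",                 # built-in SELECT-only script
         "-c", str(clients),   # concurrent clients
         "-j", str(clients),   # worker threads
         "-T", str(seconds),   # run for a fixed time, not a fixed tx count
         db],
        stderr=subprocess.STDOUT).decode()
    match = re.search(r"tps = ([\d.]+)", out)
    return float(match.group(1)) if match else None

if __name__ == "__main__":
    print("tps:", quick_select_only_tps())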

More importantly, a large portion of the time I'd expect regression test
runs to be happening with debug/assert on. We've well established this
trashes pgbench performance. One of the uglier bits of code added to
add the "performance farm" feature to the buildfarm code was hacking in
a whole different set of build options for it.

Anyway, what I was envisioning here was that performance farm systems
would also execute the standard buildfarm tests, but not the other way
around. We don't want performance numbers from some platform that is
failing the basic tests. I would just expect that systems running the
performance tests would cycle through regression testing much less
often, as they might miss a commit because they were running a longer
test then.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

#13 Tomas Vondra
tv@fuzzy.cz
In reply to: Tom Lane (#11)
Re: performance-test farm

On 12.5.2011 16:36, Tom Lane wrote:

* Josh Berkus (josh@agliodbs.com) wrote:

The first problem is platform performance, which would be a matter of
expanding the buildfarm to include a small set of performance tests ...
probably ones based on previously known problems, plus some other simple
common operations. The goal here would be to test on as many different
machines as possible, rather than getting full coverage of performance.

I think it's a seriously *bad* idea to expect existing buildfarm members
to produce useful performance data. Very few of them are running on
dedicated machines, and some are deliberately configured with
performance-trashing options. (I think just about all of 'em use
--enable-cassert, but there are some with worse things...)

We can probably share a great deal of the existing buildfarm code and
infrastructure, but the actual members of the p-farm will need to be a
separate collection of machines running different builds.

Yes, I agree that to get reliable and useful performance data, we need a
well-defined environment (dedicated machines, proper settings), and this
probably is not possible with most of the current buildfarm members.
That's not a shortcoming of the buildfarm - it simply serves other
purposes, not performance testing.

But I believe using the buildfarm framework is a good idea, and maybe we
could even use some of the nodes (those that are properly set up).

Not sure if there should be two separate farms (one to test builds, one
to test performance), or if we could run one farm and enable
'performance-testing modules' only for some of the members.

regards
Tomas

#14 Tomas Vondra
tv@fuzzy.cz
In reply to: Greg Smith (#10)
Re: performance-test farm

On 12.5.2011 08:54, Greg Smith wrote:

Tomas Vondra wrote:

Actually I was not aware of how the buildfarm works, all I
knew was there's something like that because some of the hackers mention
a failed build on the mailing list occasionally.

So I guess this is a good opportunity to investigate it a bit ;-)

Anyway I'm not sure this would give us the kind of environment we need
to do benchmarks ... but it's worth to think of.

The idea is that buildfarm systems that are known to have a) reasonable
hardware and b) no other concurrent work going on could also do
performance tests. The main benefit of this approach is it avoids
duplicating all of the system management and source code building work
needed for any sort of thing like this; just leverage the buildfarm
parts when they solve similar enough problems. Someone has actually
done all that already; source code was last sync'd to the build farm
master at the end of March: https://github.com/greg2ndQuadrant/client-code

Yes, I think that using the existing buildfarm framework is a good idea.
Do you think we should integrate this into the current buildfarm
(although only the selected nodes would run these performance tests) or
that it should be a separate farm?

regards
Tomas

#15 Josh Berkus
josh@agliodbs.com
In reply to: Greg Smith (#12)
Re: performance-test farm

All,

BTW, if we want a kind of "performance unit test", Drizzle has a very
nice framework for this. And it's even already PostgreSQL-compatible.

I'm hunting for the link for it now ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#16 Lou Picciano
loupicciano@comcast.net
In reply to: Josh Berkus (#15)
Re: performance-test farm

Josh My Man! How are you?!!

Is this the one?: http://planetdrizzle.org/

Lou Picciano

----- Original Message -----
From: "Josh Berkus" <josh@agliodbs.com>
To: pgsql-hackers@postgresql.org
Sent: Thursday, May 12, 2011 8:11:57 PM
Subject: Re: [HACKERS] performance-test farm

All,

BTW, if we want a kind of "performance unit test", Drizzle has a very
nice framework for this. And it's even already PostgreSQL-compatible.

I'm hunting for the link for it now ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


#17 Josh Berkus
josh@agliodbs.com
In reply to: Lou Picciano (#16)
Re: performance-test farm

On 5/12/11 7:19 PM, Lou Picciano wrote:

Josh My Man! How are you?!!

Is this the one?: http://planetdrizzle.org/

Since that's their blog feed, here are some durable links:

Testing tool:
http://docs.drizzle.org/testing/dbqp.html

Random query generator:
https://launchpad.net/randgen

However, looking at those now I'm not seeing response time as part of
the test, which is of course critical for us. Also, their test results
are diff-based, which is (as we know all too well) fragile.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#18 Tomas Vondra
tv@fuzzy.cz
In reply to: Greg Smith (#10)
Re: performance-test farm

On 12.5.2011 08:54, Greg Smith wrote:

Tomas Vondra wrote:

Actually I was not aware of how the buildfarm works, all I
knew was there's something like that because some of the hackers mention
a failed build on the mailing list occasionally.

So I guess this is a good opportunity to investigate it a bit ;-)

Anyway I'm not sure this would give us the kind of environment we need
to do benchmarks ... but it's worth to think of.

The idea is that buildfarm systems that are known to have a) reasonable
hardware and b) no other concurrent work going on could also do
performance tests. The main benefit of this approach is it avoids
duplicating all of the system management and source code building work
needed for any sort of thing like this; just leverage the buildfarm
parts when they solve similar enough problems. Someone has actually
done all that already; source code was last sync'd to the build farm
master at the end of March: https://github.com/greg2ndQuadrant/client-code

By far the #1 thing needed to move this forward from where it's stuck at
now is someone willing to dig into the web application side of this.
We're collecting useful data. It now needs to be uploaded to the
server, saved, and then reports of what happened generated. Eventually
graphs of performance results over time will be straightforward to
generate. But the whole idea requires that someone else (not Andrew, who
has enough to do) sit down and figure out how to extend the web UI with
these new elements.

Hi all,

it seems CSPUG will get two refurbished servers at the end of this
month. We plan to put both of them into the buildfarm - one for regular
testing with Czech locales, and I'd like to use the other one for the
proposed performance testing.

I'm willing to put some time into this, but I'll need help with
preparing the 'action plan' (because you know - I live in the EU, and in
the EU everything is driven by action plans).

AFAIK what needs to be done is:

1) preparing the hw, OS etc. - ok
2) registering the machine as a buildfarm member - ok
3) modifying the buildfarm client-code to collect performance data

- What data should be collected prior to the benchmark?

a) info about the environment (to make sure it's safe)?
b) something else?

- What performance tests should be executed?

a) let's start with pgbench - select-only and regular
b) something else in the future? DSS/DWH workloads?
c) special tests (spinlocks, db that fits in RAM, ...)

4) modifying the buildfarm server-code to accept and display
performance data

- not really sure what needs to be done here

regards
Tomas

#19 Tomas Vondra
tv@fuzzy.cz
In reply to: Greg Smith (#10)
Re: performance-test farm

On 12.5.2011 08:54, Greg Smith wrote:

Tomas Vondra wrote:

The idea is that buildfarm systems that are known to have a) reasonable
hardware and b) no other concurrent work going on could also do
performance tests. The main benefit of this approach is it avoids
duplicating all of the system management and source code building work
needed for any sort of thing like this; just leverage the buildfarm
parts when they solve similar enough problems. Someone has actually
done all that already; source code was last sync'd to the build farm
master at the end of March: https://github.com/greg2ndQuadrant/client-code

By far the #1 thing needed to move this forward from where it's stuck at
now is someone willing to dig into the web application side of this.
We're collecting useful data. It now needs to be uploaded to the
server, saved, and then reports of what happened generated. Eventually
graphs of performance results over time will be straightforward to
generate. But the whole idea requires that someone else (not Andrew, who
has enough to do) sit down and figure out how to extend the web UI with
these new elements.

Hi,

I'd like to revive this thread. A few days ago we finally got our
buildfarm member working (it's called "magpie") - it's spending ~2h a
day chewing on the buildfarm tasks, so we can use the other 22h to do
some useful work.

I suppose most of you are busy with 9.2 features, but I'm not, so I'd
like to spend my time on this.

Now that I've had to set up the buildfarm member, I'm somewhat aware of
how the buildfarm works. I've checked the PGBuildFarm/server-code and
greg2ndQuadrant/client-code repositories, and while I certainly am not a
Perl whiz, I believe I can tweak them to handle the performance-related
results too.

What is the current state of this effort? Is there someone else working
on that? If not, I propose this (for starters):

* add a new page "Performance results" to the menu, with a list of
members that uploaded the performance-results

* for each member, there will be a list of tests along with a running
average for each test, the last test, and an indicator of whether it
improved, got worse or stayed the same

* for each member/test, a history of runs will be displayed, along
with a simple graph

I'm not quite sure how to define which members will run the performance
tests - I see two options:

* for each member, add a flag "run performance tests" so that we can
choose which members are supposed to be safe

OR

* run the tests on all members (if enabled in build-farm.conf) and
then decide which results are relevant based on data describing the
environment (collected when running the tests)

I'm also wondering whether:

* using the buildfarm infrastructure is the right thing to do, if it can
provide some 'advanced features' (see below)

* we should use the current buildfarm members (although maybe not all
of them)

* it can handle one member running the tests with different settings
(various shared_buffers/work_mem sizes, num of clients etc.) and
various hw configurations (for example magpie contains a regular
SATA drive as well as an SSD - would be nice to run two sets of
tests, one for the spinner, one for the SSD)

* this can handle 'pushing' a list of commits to test (instead of
just testing the HEAD) so that we can ask the members to run the
tests for particular commits in the past (I consider this to be
very handy feature)

regards
Tomas

#20 Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#19)
Re: performance-test farm

On Mon, Mar 5, 2012 at 5:20 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

The idea is that buildfarm systems that are known to have a) reasonable
hardware and b) no other concurrent work going on could also do
performance tests.  The main benefit of this approach is it avoids
duplicating all of the system management and source code building work
needed for any sort of thing like this; just leverage the buildfarm
parts when they solve similar enough problems.  Someone has actually
done all that already; source code was last sync'd to the build farm
master at the end of March:  https://github.com/greg2ndQuadrant/client-code

By far the #1 thing needed to move this forward from where it's stuck at
now is someone willing to dig into the web application side of this.
We're collecting useful data. It now needs to be uploaded to the
server, saved, and then reports of what happened generated. Eventually
graphs of performance results over time will be straightforward to
generate. But the whole idea requires that someone else (not Andrew, who
has enough to do) sit down and figure out how to extend the web UI with
these new elements.

Hi,

I'd like to revive this thread. A few days ago we have finally got our
buildfarm member working (it's called "magpie") - it's spending ~2h a
day chewing on the buildfarm tasks, so we can use the other 22h to do
some useful work.

I suppose most of you are busy with 9.2 features, but I'm not so I'd
like to spend my time on this.

Now that I had to set up the buildfarm member I'm somehow aware of how
the buildfarm works. I've checked the PGBuildFarm/server-code and
greg2ndQuadrant/client-code repositories and while I certainly am not a
perl whiz, I believe I can tweak it to handle the performance-related
result too.

What is the current state of this effort? Is there someone else working
on that? If not, I propose this (for starters):

 * add a new page "Performance results" to the menu, with a list of
   members that uploaded the performance-results

 * for each member, there will be a list of tests along with a running
   average for each test, last test and indicator if it improved, got
   worse or is the same

 * for each member/test, a history of runs will be displayed, along
   with a simple graph

I suspect that the second of these is not that useful; presumably
there will be many small, irrelevant variations. The graph is the key
thing.

I'm not quite sure how to define which members will run the performance
tests - I see two options:

 * for each member, add a flag "run performance tests" so that we can
   choose which members are supposed to be safe

 OR

 * run the tests on all members (if enabled in build-farm.conf) and
   then decide which results are relevant based on data describing the
   environment (collected when running the tests)

First option seems better to me, but I don't run any buildfarm critters, so...

I'm also wondering if

 * using the buildfarm infrastructure the right thing to do, if it can
   provide some 'advanced features' (see below)

 * we should use the current buildfarm members (although maybe not all
   of them)

The main advantage of using the buildfarm, AFAICS, is that it would
make it less work for people to participate in this new thing.

 * it can handle one member running the tests with different settings
   (various shared_buffer/work_mem sizes, num of clients etc.) and
   various hw configurations (for example magpie contains a regular
   SATA drive as well as an SSD - would be nice to run two sets of
   tests, one for the spinner, one for the SSD)

That would be nice.

 * this can handle 'pushing' a list of commits to test (instead of
   just testing the HEAD) so that we can ask the members to run the
   tests for particular commits in the past (I consider this to be
   very handy feature)

That would be nice, too.

I think it's great that you want to put some energy into this. It's
something we have been talking about for years, but if we're making
any progress at all, it's glacial.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21 Greg Smith
greg@2ndQuadrant.com
In reply to: Tomas Vondra (#19)
Re: performance-test farm

On 03/05/2012 05:20 PM, Tomas Vondra wrote:

What is the current state of this effort? Is there someone else working
on that? If not, I propose this (for starters):

* add a new page "Performance results" to the menu, with a list of
members that uploaded the performance-results

* for each member, there will be a list of tests along with a running
average for each test, last test and indicator if it improved, got
worse or is the same

* for each member/test, a history of runs will be displayed, along
with a simple graph

I am quite certain no one else is working on this.

The results are going to bounce around over time. "Last test" and
simple computations based on it are not going to be useful. A graph and
a way to drill down into the list of test results is what I had in mind.

Eventually we'll want to be able to flag bad trends for observation,
without having to look at the graph. That's really optional for now,
but here's how you could do that. If you compare a short moving average
to a longer one, you can find out when a general trend line has been
crossed upwards or downwards, even with some deviation in individual
samples. There's a stock trading technique using this property called
the moving average crossover; a good example is shown at
http://eresearch.fidelity.com/backtesting/viewstrategy?category=Trend%20Following&wealthScriptType=MovingAverageCrossover

To figure this out given a series of TPS value samples, let MA(N) be the
moving average of the last N samples. Now compute a value like this:

MA(5) - MA(20)

If that's positive, recent tests have improved performance from the
longer trend average. If it's negative, performance has dropped. We'll
have to tweak the constants in this--5 and 20 in this case--to make this
useful. It should be sensitive enough to go off after a major change,
while not complaining about tiny variations. I can tweak those
parameters easily once there's some sample data to check against. They
might need to be an optional part of the configuration file, since
appropriate values might have some sensitivity to how often the test runs.
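
(In code, the whole check is only a few lines - a sketch, with the 5/20
window sizes passed as defaults, since as noted they'd probably end up
in the configuration file:)

def moving_average(samples, n):
    """Simple moving average of the last n samples; None until enough data."""
    if len(samples) < n:
        return None
    return sum(samples[-n:]) / float(n)

def crossover_signal(tps_history, short=5, long_=20):
    """Return MA(short) - MA(long_) over a series of tps results.

    Positive: recent runs sit above the longer trend (improvement).
    Negative: recent runs sit below it (possible regression).
    None: not enough history yet.
    """
    short_ma = moving_average(tps_history, short)
    long_ma = moving_average(tps_history, long_)
    if short_ma is None or long_ma is None:
        return None
    return short_ma - long_ma

# A report page could then flag a member when the signal drops below,
# say, a few percent of the long-term average for that test.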

It's possible to keep a running weighted moving average without actually
remembering all of the history. The background writer works that way.
I don't think that will be helpful here though, because you need a chunk
of the history to draw a graph of it.

I'm not quite sure how to define which members will run the performance
tests - I see two options:

* for each member, add a flag "run performance tests" so that we can
choose which members are supposed to be safe

OR

* run the tests on all members (if enabled in build-farm.conf) and
then decide which results are relevant based on data describing the
environment (collected when running the tests)

I was thinking of only running this on nodes that have gone out of their
way to enable this, so something more like the first option you gave
here. Some buildfarm animals might cause a problem for their owners
should they suddenly start doing anything new that gobbles up a lot more
resources. It's important that the default--including what happens if
you add this feature to the code but don't change the config file--is to
not run any performance tests.

* it can handle one member running the tests with different settings
(various shared_buffer/work_mem sizes, num of clients etc.) and
various hw configurations (for example magpie contains a regular
SATA drive as well as an SSD - would be nice to run two sets of
tests, one for the spinner, one for the SSD)

* this can handle 'pushing' a list of commits to test (instead of
just testing the HEAD) so that we can ask the members to run the
tests for particular commits in the past (I consider this to be
very handy feature)

I would highly recommend against scope creep in these directions. The
goal here is not to test hardware or configuration changes. You've been
doing a lot of that recently, and this chunk of software is not going to
be a good way to automate such tests.

The initial goal of the performance farm is to find unexpected
regressions in the performance of the database code, running some simple
tests. It should handle the opposite too, proving improvements work out
as expected on multiple systems. The buildfarm structure is suitable
for that job.

If you want to simulate a more complicated test, one where things like
work_mem matter, the first step there is to pick a completely different
benchmark workload. You're not going to do it with simple pgbench calls.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

#22 Tomas Vondra
tv@fuzzy.cz
In reply to: Greg Smith (#21)
Re: performance-test farm

On 4.4.2012 05:35, Greg Smith wrote:

On 03/05/2012 05:20 PM, Tomas Vondra wrote:

What is the current state of this effort? Is there someone else working
on that? If not, I propose this (for starters):

* add a new page "Performance results" to the menu, with a list of
members that uploaded the performance-results

* for each member, there will be a list of tests along with a running
average for each test, last test and indicator if it improved, got
worse or is the same

* for each member/test, a history of runs will be displayed, along
with a simple graph

I am quite certain no one else is working on this.

The results are going to bounce around over time. "Last test" and
simple computations based on it are not going to be useful. A graph and
a way to drill down into the list of test results is what I had in mind.

Eventually we'll want to be able to flag bad trends for observation,
without having to look at the graph. That's really optional for now,
but here's how you could do that. If you compare a short moving average
to a longer one, you can find out when a general trend line has been
crossed upwards or downwards, even with some deviation to individual
samples. There's a stock trading technique using this property called
the moving average crossover; a good example is shown at
http://eresearch.fidelity.com/backtesting/viewstrategy?category=Trend%20Following&wealthScriptType=MovingAverageCrossover

Yes, exactly. I've written 'last test' but I actually meant something
like this, i.e. detecting a change in the trend over time. The moving
average crossover looks interesting, although there are other ways to
achieve a similar goal (e.g. correlating with a pattern - a step
function, for example).
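
Roughly what I mean by correlating with a step function - just a sketch,
with the window length and the fabricated sample series chosen
arbitrarily:

def step_scores(tps_history, half_window=10):
    """Slide a step pattern (-1,...,-1,+1,...,+1) over the tps series.

    The score at each position is simply mean(right half) - mean(left
    half) of the window, so a large |score| marks a point where the
    level of the series shifted. Window length and any threshold put
    on the scores would need tuning against real data.
    """
    w = half_window
    scores = []
    for i in range(w, len(tps_history) - w + 1):
        left = tps_history[i - w:i]
        right = tps_history[i:i + w]
        scores.append((i, sum(right) / float(w) - sum(left) / float(w)))
    return scores

if __name__ == "__main__":
    series = [1500.0] * 20 + [1350.0] * 20   # fabricated 10% regression
    pos, shift = min(step_scores(series), key=lambda s: s[1])
    print("biggest level shift: %.1f tps around sample %d" % (shift, pos))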

It's possible to keep a running weighted moving average without actually
remembering all of the history. The background writer works that way. I
don't think that will be helpful here though, because you need a chunk
of the history to draw a graph of it.

Keeping the history is not a big deal IMHO. And it gives us the freedom
to run a bit more complex analysis anytime later.

I'm not quite sure how to define which members will run the performance
tests - I see two options:

* for each member, add a flag "run performance tests" so that we can
choose which members are supposed to be safe

OR

* run the tests on all members (if enabled in build-farm.conf) and
then decide which results are relevant based on data describing the
environment (collected when running the tests)

I was thinking of only running this on nodes that have gone out of their
way to enable this, so something more like the first option you gave
here. Some buildfarm animals might cause a problem for their owners
should they suddenly start doing anything new that gobbles up a lot more
resources. It's important that any defaults--including what happens if
you add this feature to the code but don't change the config file--does
not run any performance tests.

Yes, good points. The default should be 'do not run performance tests', then.

* it can handle one member running the tests with different settings
(various shared_buffer/work_mem sizes, num of clients etc.) and
various hw configurations (for example magpie contains a regular
SATA drive as well as an SSD - would be nice to run two sets of
tests, one for the spinner, one for the SSD)

* this can handle 'pushing' a list of commits to test (instead of
just testing the HEAD) so that we can ask the members to run the
tests for particular commits in the past (I consider this to be
very handy feature)

I would highly recommend against scope creep in these directions. The
goal here is not to test hardware or configuration changes. You've been
doing a lot of that recently, and this chunk of software is not going to
be a good way to automate such tests.

The initial goal of the performance farm is to find unexpected
regressions in the performance of the database code, running some simple
tests. It should handle the opposite too, proving improvements work out
as expected on multiple systems. The buildfarm structure is suitable
for that job.

Testing hardware configuration changes was not the goal of the proposed
behavior. The goal was to test multiple (sane) PostgreSQL configs. There
are problems that might manifest themselves only under certain
conditions (e.g. very small/large shared buffers, spinners/SSDs etc.).

Those are exactly the 'unexpected regressions' you've mentioned.

If you want to simulate a more complicated test, one where things like
work_mem matter, the first step there is to pick a completely different
benchmark workload. You're not going to do it with simple pgbench calls.

Yes, but I do expect to prepare custom pgbench scripts in the future to
test such things. So I want to design the code so that this is possible
(either right now or in the future).

A simple workaround would be to create a 'virtual member' for each
member configuration, e.g.

magpie-hdd-1024-8 - magpie with HDD, 1024MB shared buffers and 8MB
work_mem

magpie-ssd-512-16 - magpie with SSD, 512MB shared buffers and 16MB
work_mem

and so on. But IMHO that's a bit of a dirty solution.

Tomas