pgbench progress report improvements
Here is a patch submission for reference to the next commitfest.
Improve pgbench measurements & progress report
- Use progress option both under init & bench.
Activate progress report by default, every 5 seconds.
When initializing, --quiet reverts to the old every 100,000 insertions
behavior...
- Measure transaction latency instead of computing it.
The previous computed figure does not make sense under throttling,
as sleep throttling time was included in the figures.
The latency and its standard deviation are printed out under progress
and in the final report.
- Take thread start time at the beginning of the thread.
Otherwise it includes pretty slow thread/fork system start times in
the measurements. May help with bug #8326.
- Reduce and compensate throttling underestimation bias.
This is a consequence of relying on an integer random generator.
It was about 0.5% with 1000 distinct values. Multiplier added to
compensate the 0.05% bias with 10000 distinct integer values.
- Updated documentation & help message.
Sample output under initialization with --progress=1
creating tables...
223000 of 1000000 tuples (22%) done (elapsed 1.01 s, remaining 3.52 s).
430200 of 1000000 tuples (43%) done (elapsed 2.03 s, remaining 2.68 s).
507300 of 1000000 tuples (50%) done (elapsed 3.44 s, remaining 3.34 s).
571400 of 1000000 tuples (57%) done (elapsed 4.05 s, remaining 3.03 s).
786800 of 1000000 tuples (78%) done (elapsed 5.03 s, remaining 1.36 s).
893200 of 1000000 tuples (89%) done (elapsed 6.02 s, remaining 0.72 s).
1000000 of 1000000 tuples (100%) done (elapsed 6.93 s, remaining 0.00 s).
...
Sample output under benchmarking with --progress=1
progress: 1.0 s, 116.7 tps, 8.528 +- 1.217 ms lat
progress: 2.0 s, 117.7 tps, 8.493 +- 1.165 ms lat
progress: 3.0 s, 115.5 tps, 8.654 +- 1.650 ms lat
progress: 4.0 s, 116.8 tps, 8.559 +- 1.365 ms lat
progress: 5.0 s, 111.7 tps, 8.947 +- 3.475 ms lat
progress: 6.0 s, 116.7 tps, 8.563 +- 1.387 ms lat
progress: 7.0 s, 116.6 tps, 8.572 +- 1.474 ms lat
...
# oops, things happen:
progress: 36.0 s, 10.9 tps, 91.864 +- 124.874 ms lat
progress: 37.0 s, 115.2 tps, 8.678 +- 1.792 ms lat
...
--
Fabien.
Attachments:
pgbench-measurements-v1.patchtext/x-diff; name=pgbench-measurements-v1.patchDownload+182-88
On Tue, 2013-08-06 at 10:47 +0200, Fabien wrote:
Here is a patch submission for reference to the next commitfest.
Please replace the tabs in the SGML files with spaces.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Here is a patch submission for reference to the next commitfest.
Please replace the tabs in the SGML files with spaces.
Attached a v2 with tabs replaced by spaces.
--
Fabien.
Attachments:
pgbench-measurements-v2.patchtext/x-diff; name=pgbench-measurements-v2.patchDownload+182-88
For others following along, Pavel Stehule reviewed this on a different thread:
/messages/by-id/CAFj8pRAVYUhv8x3fEEwaSGGvDcijOgVLZSAVd7rmWn9Mw8RANA@mail.gmail.com
On Tue, Aug 06, 2013 at 10:47:14AM +0200, Fabien wrote:
Improve pgbench measurements & progress report
These changes are loosely coupled; please separate them into several patch
files:
- Use progress option both under init & bench.
Activate progress report by default, every 5 seconds.
When initializing, --quiet reverts to the old every 100,000 insertions
behavior...
Patch (1): Change the default from --progress=0 to --progress=5
This has a disadvantage of causing two extra gettimeofday() calls per
transaction. That's not cheap on all platforms; a user comparing pgbench
results across versions will need to make a point of disabling this again to
make his results comparable. Given that threat and uncertainty about which
default would be more popular, I think we should drop this part.
Patch (2): Make --initialize mode respect --progress.
The definition you've chosen for --quiet makes that option contrary to its own
name: message-per-100k-tuples is typically more voluminous than your proposed
new default of message-per-5s. In fact, since --quiet currently switches from
message-per-100k-tuples to message-per-5s, your patch gives it the exact
opposite of its present effect.
During the 9.3 development cycle, we _twice_ made changes pertaining to
--initialize message frequency:
/messages/by-id/20120620.170427.347012755716399568.t-ishii@sraoss.co.jp
/messages/by-id/1346472039.18010.10.camel@vanquo.pezone.net
That gives me pause about working through yet another change; we keep burning
time on this minor issue.
- Measure transaction latency instead of computing it.
The previous computed figure does not make sense under throttling,
as sleep throttling time was included in the figures.
The latency and its standard deviation are printed out under progress
and in the final report.
Patch (3): Report the standard deviation of latency.
Seems potentially useful; see inline comments below.
In my limited testing, the interesting latency cases involved stray clusters
of transactions showing 10x-100x mean latency. If that's typical, I'd doubt
mean/stddev is a great analytical tool. But I have little reason to believe
that my experience here was representative.
Patch (4): Redefine "latency" as reported by pgbench and report "lag" more.
If we do this, should we also augment the --log output to contain the figures
necessary to reconstruct this calculation? I think that would mean adding
fields for the time when the first command of the transaction was sent.
- Take thread start time at the beginning of the thread.
Otherwise it includes pretty slow thread/fork system start times in
the measurements. May help with bug #8326.
Patch (5)
That theory is sound, but I would like at least one report reproducing that
behavior and finding that this change improved it.
- Reduce and compensate throttling underestimation bias.
This is a consequence of relying on an integer random generator.
It was about 0.5% with 1000 distinct values. Multiplier added to
compensate the 0.05% bias with 10000 distinct integer values.
Patch (6)
See inline comment below.
- Updated documentation & help message.
Include any relevant documentation and --help updates with the patch
necessitating them. If there are errors in the documentation today, put fixes
for those in a separate patch (7).
Additionally, this patch contains numerous whitespace-only changes. Do not
mix those into a patch that makes a functional change. Most of them would be
done or undone by the next pgindent run, so I suggest simply dropping them.
See the first entry here:
https://wiki.postgresql.org/wiki/Submitting_a_Patch#Reasons_your_patch_might_be_returned
Patches (5) and (6) are nicely small and free of UI questions. At a minimum,
let's try to get those done in this CommitFest. I propose dropping patches
(1) and (2). Patches (3) and (4) have open questions, but I think they have
good value potential.
Sample output under benchmarking with --progress=1
progress: 36.0 s, 10.9 tps, 91.864 +- 124.874 ms lat
progress: 37.0 s, 115.2 tps, 8.678 +- 1.792 ms lat
"x +- y" does not convey that y is a standard deviation. I suggest getting
the term "stddev" in there somehow, maybe like this:
progress: 37.0 s, 115.2 tps, latency avg=8.678 ms stddev=1.792
--- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c
- " -R, --rate=SPEC target rate in transactions per second\n" + " -R, --rate=NUM target rate in transactions per second\n"
This would fall under patch (7) if you feel it's needed.
@@ -928,14 +931,18 @@ top: * Use inverse transform sampling to randomly generate a delay, such * that the series of delays will approximate a Poisson distribution * centered on the throttle_delay time. - * - * 1000 implies a 6.9 (-log(1/1000)) to 0.0 (log 1.0) delay multiplier. + * + * 10000 implies a 9.2 (-log(1/10000)) to 0.0 (log 1) delay multiplier, + * and results in a 0.055 % target underestimation bias: + * + * SELECT 1.0/AVG(-LN(i/10000.0)) FROM generate_series(1,10000) AS i; + * = 1.000552717032611116335474 * * If transactions are too slow or a given wait is shorter than * a transaction, the next transaction will start right away. */ - int64 wait = (int64) - throttle_delay * -log(getrand(thread, 1, 1000)/1000.0); + int64 wait = (int64) (throttle_delay * + 1.00055271703 * -log(getrand(thread, 1, 10000)/10000.0));
This quantizes the random number, then dequantizes it, then corrects for the
resolution lost in the round trip. Shouldn't it just be:
int64 wait = (int64) (throttle_delay *
-log(1 - pg_erand48(thread->random_state)));
- if ((!use_quiet) && (j % 100000 == 0)) + if (!progress && (j % 100000 == 0))
Project style is "progress == 0" rather than "!progress".
@@ -2143,8 +2174,15 @@ printResults(int ttype, int normal_xacts, int nclients, * transaction. The measured lag may be caused by thread/client load, * the database load, or the Poisson throttling process. */ - printf("average rate limit schedule lag: %.3f ms (max %.3f ms)\n", - 0.001 * throttle_lag / normal_xacts, 0.001 * throttle_lag_max); + double latency = 0.001 * total_latencies / normal_xacts; + double sqlat = (double) total_sqlats / normal_xacts; + + printf("average latency under rate limit: %.3f +- %.3f ms\n" + "average rate limit lag: %.3f ms (max %.3f ms)\n", + latency, + 0.001 * sqrt(sqlat - 1000000*latency*latency), + 0.001 * throttle_lag / normal_xacts, + 0.001 * throttle_lag_max);
This output only happens when using --rate. That has made sense heretofore,
because in the absence of rate limiting, average latency can be trivially
calculated from tps. Since we will now track stddev as well, let's print
something like this even when --rate is not in use.
- float tps, total_run, latency; + double tps, total_run, latency, sqlat, stdev, lag;
Why these variable type changes?
nm
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello Noah,
Thanks a lot for all these comments!
I'm not planning to apply all of them directly, especially removing
features that I think really desirable. Please find a defense against some
of your suggestions. I wish to wait for some more feedback about these
arguments before spending time in heavy changes.
Improve pgbench measurements & progress report
These changes are loosely coupled; please separate them into several patch
files:
I thought about this. I submitted a bunch of very small pgbench patches to
the previous commit fest, and it resulted in a lot of duplicated tests by
reviewers, which made it a waste of their precious time.
ISTM that what you suggest will result in more reviewer time, which is the
critical resource of the commitfest.
Moreover, splitting the patch as you suggest, especially in 7 patches,
would result in conflicting or dependent patches, which creates yet more
issues for the reviews. There is *some* coupling, because the reports are
about the measurements performed, and I change both to try to have
something consistent in the end. I can separate the doc fix, but this is
just one line...
Basically, I tried to optimize reviewer time by a combined patch focussed
on pgbench measurements and their reporting.
But this does not prevent discussing features separately!
- Use progress option both under init & bench.
Activate progress report by default, every 5 seconds.
When initializing, --quiet reverts to the old every 100,000 insertions
behavior...Patch (1): Change the default from --progress=0 to --progress=5
This has a disadvantage of causing two extra gettimeofday() calls per
transaction. That's not cheap on all platforms; a user comparing
pgbench results across versions will need to make a point of disabling
this again to make his results comparable. Given that threat and
uncertainty about which default would be more popular, I think we should
drop this part.
The gettimeofday call time is very small compared to network and database
(disk!) accesses involved in a pgbench run. On my small laptop,
gettimeofday takes about 0.00000002 seconds (0.02 ᅵs -- micro seconds) per
call, which is typically under 1/1000 of the fastest local-socket
read-only one-table cache-hit prepared trivial transaction.
This is a little different when initializing, but I put a guard on
gettimeofday in that case.
If this is a blocker, I can put 0, but I really do not think it is
necessary because of the performance impact.
About the default behavior : it should be what is more useful.
I found that running pgbench to get significant results requires long
minutes or even hours because of warmup time. Running a command for 30
minutes without any feedback is annoying. The second point is that the
feedback would help user notice that the figures evolve slowly but
significantly, and realise that their too short runs are not significant
at all. So it seems to me that it should be the desirable behavior, and
--progress=0 is always possible anyway for the very performance
(over-)conscious tester.
Patch (2): Make --initialize mode respect --progress.
The definition you've chosen for --quiet makes that option contrary to its own
name: message-per-100k-tuples is typically more voluminous than your proposed
new default of message-per-5s. In fact, since --quiet currently switches from
message-per-100k-tuples to message-per-5s, your patch gives it the exact
opposite of its present effect.During the 9.3 development cycle, we _twice_ made changes pertaining to
--initialize message frequency:/messages/by-id/20120620.170427.347012755716399568.t-ishii@sraoss.co.jp
/messages/by-id/1346472039.18010.10.camel@vanquo.pezone.netThat gives me pause about working through yet another change; we keep burning
time on this minor issue.
I totally agree that this "quiet" is not convincing!
My actual opinion is that quiet should just mean quiet:-), i.e. no
progress report.
I tried to preserve the row-counting behavior because I thought that
someone would object otherwise, but I would be really in favor of dropping
the row-counting report behavior altogether and only keep the time
triggered report.
- Measure transaction latency instead of computing it.
Patch (3): Report the standard deviation of latency.
Seems potentially useful; see inline comments below.
In my limited testing, the interesting latency cases involved stray
clusters of transactions showing 10x-100x mean latency. If that's
typical, I'd doubt mean/stddev is a great analytical tool. But I have
little reason to believe that my experience here was representative.
The stddev measurement is sensitive to the cache hit ratio, so this is a
good indicator of whether the steady state is reached.
Indeed, mean is the most basic tool, and stddev the second most basic...
They are better than nothing, and the stddev does add value.
Patch (4): Redefine "latency" as reported by pgbench and report "lag" more.
If we do this, should we also augment the --log output to contain the figures
necessary to reconstruct this calculation? I think that would mean adding
fields for the time when the first command of the transaction was sent.
Yep, I thought about it, but decided against for the following reason: the
lag measure is only significant under throttling, which is not the typical
use case for pgbench. When there is no throttling, there is no lag, so
nothing to report. Also, having two formats would mean that existing tool
that analyse these logs would have to deal with both.
...
Patch (5)That theory is sound, but I would like at least one report reproducing that
behavior and finding that this change improved it.
It was really annoying when I was debugging pgbench, because I did not
know whether flukes I was seeing were due to this issue or elsewhere.
Having the start time taken when the test really starts help rule out this
as a potential issue for strange figures on short runs.
The bug report I mentionned was a pretty short test with a *lot* of
threads which took possibly several seconds to be started, but I cannot
run that on my small laptop.
- Updated documentation & help message.
Include any relevant documentation and --help updates with the patch
necessitating them. If there are errors in the documentation today, put fixes
for those in a separate patch (7).
I could do a separate patch for a one-liner doc update for homogeneity...
Additionally, this patch contains numerous whitespace-only changes.
Hmmm. This is emacs fault:-) I'll (try) to remove them.
Patches (5) and (6) are nicely small and free of UI questions. At a minimum,
let's try to get those done in this CommitFest. I propose dropping patches
(1) and (2).
I disagree: I thing that having --progress and --quiet work consistently
both under initialization and bench is valuable. The other point is
whether the default should be 0, and I defended why it should really not
be 0, see above.
Patches (3) and (4) have open questions, but I think they have
good value potential.
See, and evaluate, my arguments and answers.
Sample output under benchmarking with --progress=1
progress: 36.0 s, 10.9 tps, 91.864 +- 124.874 ms lat
progress: 37.0 s, 115.2 tps, 8.678 +- 1.792 ms lat"x +- y" does not convey that y is a standard deviation.
I'm refering to the ᅵ sign, but in ascii because I doubt that I can use a
UTF character:-) From http://en.wikipedia.org/wiki/Plus-minus_sign:
"In experimental sciences, the sign commonly indicates the confidence
interval or error in a measurement, often the standard deviation or
standard error."
ISTM that "pgbench" is experimental science.
I suggest getting the term "stddev" in there somehow, maybe like this:
progress: 37.0 s, 115.2 tps, latency avg=8.678 ms stddev=1.792
My issue is to try to keep the line width under control so as to avoid
line breaks in the terminal. Under throttling, there is also the time lag
appended to the line.
Moreover, using 'xxx=figure" breaks simple "cut" pipelining to extract the
figures, so I would prefer to stick to spaces.
Maybe:
progress: 36.0 s, 115.2 tps, lat avg 9.678 ms stddev 1.792, lag 0.143 ms
but I liked my "+-" approach:-)
- " -R, --rate=SPEC target rate in transactions per second\n" + " -R, --rate=NUM target rate in transactions per second\n"This would fall under patch (7) if you feel it's needed.
This is really to be homogeneous with other options. "needed" is a strong
word.
- int64 wait = (int64) - throttle_delay * -log(getrand(thread, 1, 1000)/1000.0); + int64 wait = (int64) (throttle_delay * + 1.00055271703 * -log(getrand(thread, 1, 10000)/10000.0));This quantizes the random number, then dequantizes it, then corrects for the
resolution lost in the round trip.
Yep!
Shouldn't it just be:
int64 wait = (int64) (throttle_delay *
-log(1 - pg_erand48(thread->random_state)));
It could. However with this approach the inserted delay would be pretty
much unbounded. The benefit of using 10000 is that the delay is bounded by
about 10 times the expected average rate, so although it is throttled by a
stochastic process, it stays quite active and steady. Basically using the
quantization seemed a good compromise to me...
- if ((!use_quiet) && (j % 100000 == 0)) + if (!progress && (j % 100000 == 0))Project style is "progress == 0" rather than "!progress".
I could. I used "!progress" because "progress == 0" is interpreted as "no
progress", and not "progress with value zero", which would mean a report
at every possible time.
@@ -2143,8 +2174,15 @@ printResults(int ttype, int normal_xacts, int nclients,
[...]This output only happens when using --rate. That has made sense
heretofore, because in the absence of rate limiting, average latency can
be trivially calculated from tps. Since we will now track stddev as
well, let's print something like this even when --rate is not in use.
It is also printed without --rate. There is a "if" above because there is
one report with "lag" (under --rate), and one without.
- float tps, total_run, latency; + double tps, total_run, latency, sqlat, stdev, lag;Why these variable type changes?
Because C computes stuff in double anyway (all maths functions such as
sqrt above are about double), and it was an oversight from my part to have
use float in the first place in the previous patch. Thus I fixed it.
--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Sep 21, 2013 at 4:55 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
I suggest getting the term "stddev" in there somehow, maybe like this:
progress: 37.0 s, 115.2 tps, latency avg=8.678 ms stddev=1.792
My issue is to try to keep the line width under control so as to avoid
line breaks in the terminal. Under throttling, there is also the time lag
appended to the line.Moreover, using 'xxx=figure" breaks simple "cut" pipelining to extract the
figures, so I would prefer to stick to spaces.Maybe:
progress: 36.0 s, 115.2 tps, lat avg 9.678 ms stddev 1.792, lag 0.143 ms
but I liked my "+-" approach:-)
100 +- 3 implies a range of 97 to 103 and no values are outside of that
range.
Moreover, using 'xxx=figure" breaks simple "cut" pipelining to extract the
figures, so I would prefer to stick to spaces.Maybe:
progress: 36.0 s, 115.2 tps, lat avg 9.678 ms stddev 1.792, lag 0.143 ms
but I liked my "+-" approach:-)
100 +- 3 implies a range of 97 to 103 and no values are outside of that
range.
This is one of the possible meaning, but ISTM that it is not exclusive.
Anyway here we do not have a symmetric distribution of run times.
Well, it seems I'm alone in liking my +-, so I'll backtrack on that.
--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Sep 21, 2013 at 10:55:54AM +0200, Fabien COELHO wrote:
Improve pgbench measurements & progress report
These changes are loosely coupled; please separate them into several patch
files:I thought about this. I submitted a bunch of very small pgbench patches
to the previous commit fest, and it resulted in a lot of duplicated tests
by reviewers, which made it a waste of their precious time.ISTM that what you suggest will result in more reviewer time, which is
the critical resource of the commitfest.Moreover, splitting the patch as you suggest, especially in 7 patches,
would result in conflicting or dependent patches, which creates yet more
issues for the reviews. There is *some* coupling, because the reports are
about the measurements performed, and I change both to try to have
something consistent in the end. I can separate the doc fix, but this is
just one line...
Post all the patch files to the same mailing list thread and have them
reviewed under a single CommitFest entry. A reviewer or committer preferring
less granularity can squash patches together effortlessly. The inverse task,
splitting a patch, is not so mechanical. Consequently, err on the side of
more, smaller patches; it almost always saves reviewer and committer time.
Basically, I tried to optimize reviewer time by a combined patch focussed
on pgbench measurements and their reporting.
Relatedness of changes is a continuum, and committers sometimes disagree on
how far to subdivide before commit. This patch is not a borderline case;
removing a bias from --rate throttle delays does not belong in the same commit
as enabling progress reports by default.
But this does not prevent discussing features separately!
Actually, it does hamper debate. With smaller patches, a person interested in
just one issue can read just that patch. With a monolithic patch, one must
take in the entire thing to fully comprehend any single issue.
Patch (1): Change the default from --progress=0 to --progress=5
This has a disadvantage of causing two extra gettimeofday() calls per
transaction. That's not cheap on all platforms; a user comparing
pgbench results across versions will need to make a point of disabling
this again to make his results comparable. Given that threat and
uncertainty about which default would be more popular, I think we
should drop this part.The gettimeofday call time is very small compared to network and database
(disk!) accesses involved in a pgbench run. On my small laptop,
gettimeofday takes about 0.00000002 seconds (0.02 µs -- micro seconds)
per call, which is typically under 1/1000 of the fastest local-socket
read-only one-table cache-hit prepared trivial transaction.
Many systems do have cheap gettimeofday(). Many don't. We recently added
contrib/pg_test_timing in response to the variability in this area.
I found that running pgbench to get significant results requires long
minutes or even hours because of warmup time. Running a command for 30
minutes without any feedback is annoying. The second point is that the
feedback would help user notice that the figures evolve slowly but
significantly, and realise that their too short runs are not significant
at all. So it seems to me that it should be the desirable behavior, and
--progress=0 is always possible anyway for the very performance
(over-)conscious tester.
Those benefits aren't compelling enough to counterbalance the risk of
gettimeofday() overhead affecting results. (Other opinions welcome.) For a
tool like pgbench that requires considerable skill to use effectively,
changing a default only helps slightly. It doesn't take much of a risk to
make us better off leaving the default unchanged.
Patch (2): Make --initialize mode respect --progress.
The definition you've chosen for --quiet makes that option contrary to its own
name: message-per-100k-tuples is typically more voluminous than your proposed
new default of message-per-5s. In fact, since --quiet currently switches from
message-per-100k-tuples to message-per-5s, your patch gives it the exact
opposite of its present effect.During the 9.3 development cycle, we _twice_ made changes pertaining to
--initialize message frequency:/messages/by-id/20120620.170427.347012755716399568.t-ishii@sraoss.co.jp
/messages/by-id/1346472039.18010.10.camel@vanquo.pezone.netThat gives me pause about working through yet another change; we keep burning
time on this minor issue.I totally agree that this "quiet" is not convincing!
My actual opinion is that quiet should just mean quiet:-), i.e. no
progress report.I tried to preserve the row-counting behavior because I thought that
someone would object otherwise, but I would be really in favor of
dropping the row-counting report behavior altogether and only keep the
time triggered report.
I would be fine with dropping the row-counting behavior. But why subject this
distant side issue to its third round of bikeshedding in 18 months?
Patch (4): Redefine "latency" as reported by pgbench and report "lag" more.
If we do this, should we also augment the --log output to contain the figures
necessary to reconstruct this calculation? I think that would mean adding
fields for the time when the first command of the transaction was sent.Yep, I thought about it, but decided against for the following reason:
the lag measure is only significant under throttling, which is not the
typical use case for pgbench. When there is no throttling, there is no
lag, so nothing to report. Also, having two formats would mean that
existing tool that analyse these logs would have to deal with both.
Fair enough.
- Take thread start time at the beginning of the thread.
Otherwise it includes pretty slow thread/fork system start times in
the measurements. May help with bug #8326.Patch (5)
That theory is sound, but I would like at least one report reproducing that
behavior and finding that this change improved it.It was really annoying when I was debugging pgbench, because I did not
know whether flukes I was seeing were due to this issue or elsewhere.
Having the start time taken when the test really starts help rule out
this as a potential issue for strange figures on short runs.The bug report I mentionned was a pretty short test with a *lot* of
threads which took possibly several seconds to be started, but I cannot
run that on my small laptop.
I can reproduce the report on low-spec machines, including a 1-GiB VM with "-c
100 -j 100" instead of "-c 800 -j 100". Your patch does not dramatically
change the spread between "including connections establishing" and "excluding
connections establishing". Thinking about it, that's unsurprising. The
thrust of this change is to not count client thread creation at all, as
opposed to counting it as part of connection establishment. Establishing
database connections is more expensive than creating threads, so it will
remain the dominant cost. Whether thread creation should be accounted
therewith seems arbitrary, so I'm inclined to leave it alone.
I'll post some further detail to the pgsql-bugs thread.
- Updated documentation & help message.
Include any relevant documentation and --help updates with the patch
necessitating them. If there are errors in the documentation today, put fixes
for those in a separate patch (7).I could do a separate patch for a one-liner doc update for homogeneity...
That's how we often do things. Thanks.
Additionally, this patch contains numerous whitespace-only changes.
Hmmm. This is emacs fault:-) I'll (try) to remove them.
Patch credit: "Emacs, with some tweaks by Fabien".
Sample output under benchmarking with --progress=1
progress: 36.0 s, 10.9 tps, 91.864 +- 124.874 ms lat
progress: 37.0 s, 115.2 tps, 8.678 +- 1.792 ms lat"x +- y" does not convey that y is a standard deviation.
I'm refering to the ± sign, but in ascii because I doubt that I can use a
UTF character:-) From http://en.wikipedia.org/wiki/Plus-minus_sign:"In experimental sciences, the sign commonly indicates the confidence
interval or error in a measurement, often the standard deviation or
standard error."ISTM that "pgbench" is experimental science.
Use of +-y as an abbreviation for σ=y seems defensible when the distribution
has less skew. The "91.864 +- 124.874" quoted above need not imply negative
latency measurements, yet a plain reading of +- would imply just that. (I
rarely read work in experimental science fields, so this objection could be
out of touch with widespread practice.)
I suggest getting the term "stddev" in there somehow, maybe like this:
progress: 37.0 s, 115.2 tps, latency avg=8.678 ms stddev=1.792
My issue is to try to keep the line width under control so as to avoid
line breaks in the terminal. Under throttling, there is also the time lag
appended to the line.Moreover, using 'xxx=figure" breaks simple "cut" pipelining to extract
the figures, so I would prefer to stick to spaces.Maybe:
progress: 36.0 s, 115.2 tps, lat avg 9.678 ms stddev 1.792, lag 0.143 ms
Works for me.
- int64 wait = (int64) - throttle_delay * -log(getrand(thread, 1, 1000)/1000.0); + int64 wait = (int64) (throttle_delay * + 1.00055271703 * -log(getrand(thread, 1, 10000)/10000.0));This quantizes the random number, then dequantizes it, then corrects for the
resolution lost in the round trip.Yep!
Shouldn't it just be:
int64 wait = (int64) (throttle_delay *
-log(1 - pg_erand48(thread->random_state)));It could. However with this approach the inserted delay would be pretty
much unbounded. The benefit of using 10000 is that the delay is bounded
by about 10 times the expected average rate, so although it is throttled
by a stochastic process, it stays quite active and steady. Basically
using the quantization seemed a good compromise to me...
Ah; that makes sense. Then I understand why you want to remove the bias, but
why do you also increase the upper bound?
- if ((!use_quiet) && (j % 100000 == 0)) + if (!progress && (j % 100000 == 0))Project style is "progress == 0" rather than "!progress".
I could. I used "!progress" because "progress == 0" is interpreted as "no
progress", and not "progress with value zero", which would mean a report
at every possible time.
That's a good argument for doing it the way you did it.
@@ -2143,8 +2174,15 @@ printResults(int ttype, int normal_xacts, int nclients,
[...]This output only happens when using --rate. That has made sense
heretofore, because in the absence of rate limiting, average latency
can be trivially calculated from tps. Since we will now track stddev as
well, let's print something like this even when --rate is not in use.It is also printed without --rate. There is a "if" above because there is
one report with "lag" (under --rate), and one without.
The code I quoted is for the final report in printResults(), and that only
shows latency mean/stddev when using --rate. The progress reporting in
threadRun() does have two variations as you describe.
- float tps, total_run, latency; + double tps, total_run, latency, sqlat, stdev, lag;Why these variable type changes?
Because C computes stuff in double anyway (all maths functions such as
sqrt above are about double), and it was an oversight from my part to
have use float in the first place in the previous patch. Thus I fixed it.
Okay.
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Dear Noah,
Thanks for your answers and remarks,
[...]
I'll split some part of the patch where there is no coupling, but I do
not want to submit conflicting patches.
Those benefits aren't compelling enough to counterbalance the risk of
gettimeofday() overhead affecting results. (Other opinions welcome.)
Yep. If I really leave gettimeofday out, I cannot measure the stddev, so
I'll end up with:
--rate => gettimeofday, mean (& stddev) measured, because they
cannot be derived otherwise.
no --rate => no (or less) gettimeofday, mean computed from total time and
no stddev report because it cannot be computed.
That is annoying, because I do want the standard deviation, and this mean
one "if"s complexity here and there.
ISTM that when running under a target time, the (hopefully very small)
overhead is only one gettimeofday call, because the other one is taken
anyway to check whether the thread should stop.
Or I can add a yet another option, say --stddev, to ask for standard
deviation, which will imply the additional gettimeofday call...
For a tool like pgbench that requires considerable skill to use
effectively, changing a default only helps slightly. It doesn't take
much of a risk to make us better off leaving the default unchanged.
I can put a 0 default... but even very experienced people will be bitten
over and over. Why should I care? ISTM that the default should be the
safe option, and experienced user can request "-quiet" if they want it.
[...]
I tried to preserve the row-counting behavior because I thought that
someone would object otherwise, but I would be really in favor of
dropping the row-counting report behavior altogether and only keep the
time triggered report.I would be fine with dropping the row-counting behavior. But why subject this
distant side issue to its third round of bikeshedding in 18 months?
I was not involved:-)
The 100000 behavior is the initial & old version, and only applies to
initialization. Someone found it too verbose when scaling, and I agree, so
made a quick patch which preserves the old behavior (someone must have
said: whatever, do not change the default!) but allowed to switch to a
less noisy version, hence the -quiet which is not really quiet. This would
be the defective result of a compromise:-)
If I follow your request not to change any default, I cannot merge cleanly
the -i & bench behaviors, as currenty -i does have a default progress
report and its own -quiet, and benchmarking does not.
The current behavior is inconsistent. I would prefer something consistent,
preferably always show a progress report, and -quiet hides it (fully), or
if you really insist no progress report, and --progress shows it, and the
-quiet option is removed.
- Take thread start time at the beginning of the thread.
That theory is sound, but I would like at least one report reproducing that
behavior and finding that this change improved it.[...] so I'm inclined to leave it alone.
I really spent *hours* debugging stupid measures on the previous round of
pgbench changes, when adding the throttling stuff. Having the start time
taken when the thread really starts is just sanity, and I needed that just
to rule out that it was the source of the "strange" measures.
-j 800 vs -j 100 : ITM that if I you create more threads, the time delay
incurred is cumulative, so the strangeness of the result should worsen.
800 threads ~ possibly 10 seconds of creation time, to be compared to a
few seconds of run time.
Shouldn't it just be:
int64 wait = (int64) (throttle_delay *
-log(1 - pg_erand48(thread->random_state)));[...]
Ah; that makes sense. Then I understand why you want to remove the bias, but
why do you also increase the upper bound?
Because the bias was significantly larger for 1000 (about 0.5%), so this
also contributed to reduce said bias, and 9.2 times the average target
seems as reasonnable a bound as 6.9.
It is also printed without --rate. There is a "if" above because there is
one report with "lag" (under --rate), and one without.The code I quoted is for the final report in printResults(), and that only
shows latency mean/stddev when using --rate. The progress reporting in
threadRun() does have two variations as you describe.
Indeed, I took it for the progress report. I'll check. It must be
consistent whether under --rate or not.
--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
It is also printed without --rate. There is a "if" above because there is
one report with "lag" (under --rate), and one without.The code I quoted is for the final report in printResults(), and that only
shows latency mean/stddev when using --rate. The progress reporting in
threadRun() does have two variations as you describe.Indeed, I took it for the progress report. I'll check. It must be consistent
whether under --rate or not.
I checked.
The current status is that stddev is currently only available under
--progress or --rate, and displayed in the final report if --rate.
I can add stddev in the final report *if* progress was on, but not
otherwise.
This means that under standard benchmarking (no --rate and without
progress if it is not enabled by default) stddev cannot be shown.
This is consistent with your requirement that gettimeofday calls must be
avoided, but results in this necessary inconsistency in reporting results.
The alternative is to always measure, but that would imply to always call
gettimeofday, and I understood that it is a nogo for you. I think that
your performance concerns are needless:
I confirm a 50,000,000 gettimeofday calls/s on my laptop. On a desktop
workstation I have 43 millions calls/s. The slowest I have found on an old
server is 3.3 millions calls/s, that is 0.3 ᅵs per call.
Even so, this is to be compared to the fastest possible transaction
latency I got which is about 140 ᅵs (one local prepared in-memory table
select on my laptop), so the overhead 1/450 of the minimum possible
transaction time, pretty negligeable in my opinion. For transaction times
which involve disk accesses, the latency is in ms and the overhead well
under 1/1000.
There are some measures about gettimeofday overhead reported in:
http://www.atl.external.lmco.com/projects/QoS/POSIX_html/index.html
The worst figure is for a Pentium 166 MHz (!), the overhead is 86.6 ᅵs.
However typical values are around or under 1 ᅵs, even for old Pentium II
350 MHz PCs, and this is less that 1% of our minimum possible transaction
time.
Note also that under option --report-latencies, the number of gettimeofday
calls is pretty large, and nobody complained.
So I'm of the opinion that we should not worry at all about the number of
gettimeofday calls in pgbench.
--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Split 1 of the initial submission.
pgbench: minor update of documentation & help message.
Use NUM in help message for homogeneity with other options. The target
*start* time of the transaction is set by the stochastic process which is
doing the throttling (--rate), not the end time.
--
Fabien.
Attachments:
pgbench-rate-doc.patchtext/x-diff; name=pgbench-rate-doc.patchDownload+2-2
Split 2 of the initial submission
pgbench: reduce and compensate throttling underestimation bias.
This is a consequence of relying on an integer random generator,
which allow to ensure that delays inserted stay reasonably in
range of the target average delay.
The bias was about 0.5% with 1000 distinct values. Multiplier added
to compensate the 0.05% bias with 10000 distinct integer values,
so there is no bias now.
--
Fabien.
Attachments:
pgbench-rate-bias.patchtext/x-diff; name=pgbench-rate-bias.patchDownload+7-3
Split 3 of the initial submission, which actually deal with data measured
and reported on stderr under various options.
This version currently takes into account many comments by Noah Mish. In
particular, the default "no report" behavior under benchmarking is not
changed, although I really think that it should. Also, the standard
deviation is only shown when available, which is not in under all
settings, because of concerns expressed about the cost of additional calls
to gettimeofday. ISTM that these concerned are misplaced in this
particular case.
Improve pgbench measurements & progress report
These changes are coupled because measures are changed, and their
reporting as well. Submitting separate patches for these interlinked
features would result in conflicting or dependent patches, so I wish to
avoid that. Also it would require more reviewer time.
- Use same progress and quiet options both under init & bench.
The reporting is every 5 seconds for initialization,
and currently no report for benchmarking.
I strongly suggest to change this default to 5 seconds for both,
if people do not veto any change of the default behavior...
The rational is that benchmarking is error prone as it must run
a long time to be significant because of warmup effects (esp on HDD,
less true on SSD). Seeing how the current performance evolve would
help pgbench users realise that. See down below a sample output.
The previous progress report under initialization, which printed a line
every 100,000 rows, is removed, as it is much to verbose for most
hardware, and pretty long for any significant scale.
- Measure transaction latency instead of computing it,
for --rate and --progress.
The previous computed figure does not make sense under throttling:
as sleep throttling time was included in the figures, the displayed
results were plain false.
The latency and its standard deviation are printed out under progress
and in the final report when available.
It could be made "always" available, but that would require to accept
additional gettimeofday calls. I do not think that there is a
performance issue here, but there is no concensus yet.
- Take thread start time at the beginning of the thread (!)
Otherwise it includes pretty slow thread/fork system start times in
the measurements. May help with bug #8326. This also helps with throttling,
as the start time is used to adjust the rate: if it is not the actual
start time, there is a bit of a rush at the beginning in order to
catch up. Taking the start time when the thread actually starts is
the sane thing to do to avoid surprises at possibly strange measures
on short runs.
Sample output under initialization with --progress=1
creating tables...
1126000 of 3000000 tuples (37%) done (elapsed 1.00 s, remaining 1.67 s).
2106000 of 3000000 tuples (70%) done (elapsed 2.00 s, remaining 0.85 s).
2824000 of 3000000 tuples (94%) done (elapsed 3.00 s, remaining 0.19 s).
3000000 of 3000000 tuples (100%) done (elapsed 3.19 s, remaining 0.00 s).
vacuum...
set primary keys...
done.
Sample output under benchmarking with --progress=1
progress: 1.0 s, 2626.1 tps, 0.374 stddev 0.597 ms lat
progress: 2.0 s, 2766.6 tps, 0.358 stddev 0.588 ms lat
progress: 3.0 s, 2567.4 tps, 0.385 stddev 0.665 ms lat
progress: 4.0 s, 3014.2 tps, 0.328 stddev 0.593 ms lat
progress: 5.0 s, 2959.3 tps, 0.334 stddev 0.553 ms lat
...
progress: 16.0 s, 5009.2 tps, 0.197 stddev 0.381 ms lat
...
progress: 24.0 s, 7051.2 tps, 0.139 stddev 0.284 ms lat
...
progress: 50.0 s, 6860.5 tps, 0.143 stddev 0.052 ms lat
...
The "warmup" is quite fast because the DB is on a SSD. In the beginning
the standard deviation is well over the average transaction time, but
when the steady state is reached (later) it is much below.
--
Fabien.
Attachments:
pgbench-measurements-v3.patchtext/x-diff; name=pgbench-measurements-v3.patchDownload+171-86
Fabien,
As long as you're hacking pgbench output, what about offering a JSON
option instead of the text output? I'm working on automating pgbench
performance testing, and having the output in a proper delimited format
would be really helpful.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Import Notes
Reply to msg id not found: WMc4abc45051f60c78cd616766cb0b6d0b99c8b635e50b746b71ae5301ebd35f4fabe29330e34111b8cb159e4f50bbe942@asav-1.01.com
Hello Josh,
As long as you're hacking pgbench output, what about offering a JSON
option instead of the text output? I'm working on automating pgbench
performance testing, and having the output in a proper delimited format
would be really helpful.
I'm more a "grep | cut | ..." person, but I do understand your point about
automation. My favorite hack is to load benchmark data into sqlite and
perform query to build data extract for gnuplot figures.
So I can put that in the stack of things to do when I have some time.
Due to the possibly repetitive table structure of the data, maybe CSV
would make sense as well. It is less extensible, but it is directly usable
by sqlite or pg.
--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 9/22/13 2:58 PM, Fabien wrote:
Split 3 of the initial submission, which actually deal with data
measured and reported on stderr under various options.
It seems this patch doesn't apply. Does it need the first two applied
first?
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Dear Peter,
Split 3 of the initial submission, which actually deal with data
measured and reported on stderr under various options.It seems this patch doesn't apply. Does it need the first two applied
first?
Oops. Indeed. The patch is fully independent of the two others. It was
generated with "git diff -b master", because I was told not to change
spacing, but then it must be applied with "patch -l -p1".
Please find attached a new version without this requirement.
postgresql> git branch test
postgresql> git checkout test
Switched to branch 'test'
postgresql> patch -p1 < ../pgbench-measurements-v4.patch
patching file contrib/pgbench/pgbench.c
patching file doc/src/sgml/pgbench.sgml
--
Fabien.
Attachments:
pgbench-measurements-v4.patchtext/x-diff; charset=us-ascii; name=pgbench-measurements-v4.patchDownload+171-86
On Sat, Sep 21, 2013 at 4:55 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
- Use progress option both under init & bench.
Activate progress report by default, every 5 seconds.
When initializing, --quiet reverts to the old every 100,000 insertions
behavior...Patch (1): Change the default from --progress=0 to --progress=5
This has a disadvantage of causing two extra gettimeofday() calls per
transaction. That's not cheap on all platforms; a user comparing pgbench
results across versions will need to make a point of disabling this again to
make his results comparable. Given that threat and uncertainty about which
default would be more popular, I think we should drop this part.The gettimeofday call time is very small compared to network and database
(disk!) accesses involved in a pgbench run. On my small laptop, gettimeofday
takes about 0.00000002 seconds (0.02 µs -- micro seconds) per call, which is
typically under 1/1000 of the fastest local-socket read-only one-table
cache-hit prepared trivial transaction.This is a little different when initializing, but I put a guard on
gettimeofday in that case.If this is a blocker, I can put 0, but I really do not think it is necessary
because of the performance impact.
I really don't think it's a good idea to change the default behavior.
We've had many discussions about how the overhead of gettimeofday()
can sometimes be surprisingly large; I don't think we should assume
it's guaranteed to be small in this case. Also, changing established
defaults will annoy users who like to present defaults; I don't see
any reason to assume that your preferences will be universal. In
doubtful cases we should favor leaving the behavior the way it's been
in previous releases, because backward-compatibility is very
desirable.
Patch (2): Make --initialize mode respect --progress.
The definition you've chosen for --quiet makes that option contrary to its
own
name: message-per-100k-tuples is typically more voluminous than your
proposed
new default of message-per-5s. In fact, since --quiet currently switches
from
message-per-100k-tuples to message-per-5s, your patch gives it the exact
opposite of its present effect.During the 9.3 development cycle, we _twice_ made changes pertaining to
--initialize message frequency:/messages/by-id/20120620.170427.347012755716399568.t-ishii@sraoss.co.jp
/messages/by-id/1346472039.18010.10.camel@vanquo.pezone.net
That gives me pause about working through yet another change; we keep
burning
time on this minor issue.I totally agree that this "quiet" is not convincing!
My actual opinion is that quiet should just mean quiet:-), i.e. no progress
report.I tried to preserve the row-counting behavior because I thought that someone
would object otherwise, but I would be really in favor of dropping the
row-counting report behavior altogether and only keep the time triggered
report.
-1 for changing this again. Frankly, I might have come down in a
different place if I had understood exactly how this was going to end
up being commited; it ended up being quite different from what I was
expecting. But I really think that relitigating this just so we can
break backward compatibility again one release later is not a good use
of anyone's time, or a good idea in general.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Sep 22, 2013 at 08:44:08PM +0200, Fabien COELHO wrote:
pgbench: minor update of documentation & help message.
Use NUM in help message for homogeneity with other options. The target
*start* time of the transaction is set by the stochastic process which is
doing the throttling (--rate), not the end time.
Committed.
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Sep 22, 2013 at 08:46:55PM +0200, Fabien wrote:
pgbench: reduce and compensate throttling underestimation bias.
This is a consequence of relying on an integer random generator,
which allow to ensure that delays inserted stay reasonably in
range of the target average delay.The bias was about 0.5% with 1000 distinct values. Multiplier added
to compensate the 0.05% bias with 10000 distinct integer values,
so there is no bias now.
--- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c @@ -929,13 +929,17 @@ top: * that the series of delays will approximate a Poisson distribution * centered on the throttle_delay time. * - * 1000 implies a 6.9 (-log(1/1000)) to 0.0 (log 1.0) delay multiplier. + * 10000 implies a 9.2 (-log(1/10000)) to 0.0 (log 1) delay multiplier, + * and results in a 0.055 % target underestimation bias: + * + * SELECT 1.0/AVG(-LN(i/10000.0)) FROM generate_series(1,10000) AS i; + * = 1.000552717032611116335474 * * If transactions are too slow or a given wait is shorter than * a transaction, the next transaction will start right away. */ - int64 wait = (int64) - throttle_delay * -log(getrand(thread, 1, 1000)/1000.0); + int64 wait = (int64) (throttle_delay * + 1.00055271703 * -log(getrand(thread, 1, 10000)/10000.0));thread->throttle_trigger += wait;
On Sun, Sep 22, 2013 at 10:08:54AM +0200, Fabien COELHO wrote:
Then I understand why you want to remove the bias, but
why do you also increase the upper bound?Because the bias was significantly larger for 1000 (about 0.5%), so this
also contributed to reduce said bias, and 9.2 times the average target
seems as reasonnable a bound as 6.9.
The magnitude of the bias is unimportant if known exactly, as you do here.
Smaller quanta do give you more possible distinct sleep durations, which is a
nice bonus. What I'm hearing is that you don't especially want to change the
upper bound, but changing it was an acceptable side effect. However, we can
meet all those goals simultaneously with simpler code, can we not?
int64 wait = (int64) (throttle_delay *
Min(7.0, -log(1 - pg_erand48(thread->random_state))));
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers