[PATCH] pgbench --throttle (submission 4)
Minor changes with respect to the previous submission, so as to avoid
running some stuff twice under some conditions. This is for reference to
the next commitfest.
--
Fabien.
Attachments:
pgbench-throttle.patch (text/x-diff; +96 -4)
Hi,
It would be better to submit updated versions of a patch on the email
thread dedicated to it rather than creating a new thread, so that people
can easily follow your progress.
Thanks,
--
Michael
Simpler version of 'pgbench --throttle', which handles throttling at the
beginning of the transaction instead of at the end.
This is for reference to the next commitfest.
--
Fabien.
Attachments:
pgbench-throttle.patch (text/x-diff; +83 -1)
New submission which puts the option help in alphabetical order, as
per Peter Eisentraut's commit f0ed3a8a99b052d2d5e0b6153a8907b90c486636.
This is for reference to the next commitfest.
--
Fabien.
Attachments:
pgbench-throttle.patch (text/x-diff; +83 -1)
New submission for the next commitfest.
This new version also reports the average lag time, i.e. the delay between
scheduled and actual transaction start times. This may help detect whether
things went smoothly, or whether at some point a delay was introduced
because of the load and some catching up was done afterwards.
Question 1: should it report the maximum lag encountered?
Question 2: the next step would be to have the current lag shown under
option --progress, but that would mean having a combined --throttle
--progress patch submission, or maybe dependencies between patches.
--
Fabien.
Attachments:
pgbench-throttle.patch (text/x-diff; +114 -5)
On 6/1/13 5:00 AM, Fabien COELHO wrote:
Question 1: should it report the maximum lag encountered?
I haven't found the lag measurement to be very useful yet, outside of
debugging the feature itself. Accordingly, I don't see a reason to add
even more statistics about it outside of testing the code. I am seeing
some weird lag problems right now that this will be useful for, though;
more on that in a few places below.
Question 2: the next step would be to have the current lag shown under
option --progress, but that would mean having a combined --throttle
--progress patch submission, or maybe dependencies between patches.
This is getting too far ahead. Let's get the throttle part nailed down
before introducing even more moving parts into this. I've attached an
updated patch that changes a few things around already. I'm not done
with this yet and it needs some more review before commit, but it's not
too far away from being ready.
This feature works quite well. On a system that will run at 25K TPS
without any limit, I did a run with 25 clients and a rate of 400/second,
aiming at 10,000 TPS, and that's what I got:
number of clients: 25
number of threads: 1
duration: 60 s
number of transactions actually processed: 599620
average transaction lag: 0.307 ms
tps = 9954.779317 (including connections establishing)
tps = 9964.947522 (excluding connections establishing)
I never thought of implementing the throttle like this before, but it
seems to work out well so far. Check out tps.png to see the smoothness
of the TPS curve (the graphs came out of pgbench-tools). There's a
little more play outside of the target than ideal for this case. Maybe
it's worth tightening the Poisson curve a bit around its center?
The main implementation issue I haven't looked into yet is why things
can get weird at the end of the run. See the latency.png graph attached
and you can see what I mean.
I didn't like the naming of this option or all of the ways you could
specify the delay. None of those really added anything, since you can
get every other behavior by specifying a non-integer TPS. And using the
word "throttle" inside the code is fine, but I didn't like exposing that
implementation detail more than necessary.
What I did instead was think of this as a transaction rate target, which
makes the help a whole lot simpler:
-R SPEC, --rate SPEC
target rate per client in transactions per second
Made the documentation easier to write too. I'm not quite done with
that yet, the docs wording in this updated patch could still be better.
I personally would like this better if --rate specified a *total* rate
across all clients. However, there are examples of both types of
settings in the program already, so there's no one precedent for which
is right here. -t is per-client and now -R is too; I'd prefer it to be
like -T instead. It's not that important though, and the code is
cleaner as it's written right now. Maybe this is better; I'm not sure.
I did some basic error handling checks on this and they seemed good, the
program rejects target rates of <=0.
On the topic of this weird latency spike issue, I did see that show up
in some of the results too. Here's one where I tried to specify a rate
higher than the system can actually handle, 80000 TPS total on a
SELECT-only test:
$ pgbench -S -T 30 -c 8 -j 4 -R10000tps pgbench
starting vacuum...end.
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 8
number of threads: 4
duration: 30 s
number of transactions actually processed: 761779
average transaction lag: 10298.380 ms
tps = 25392.312544 (including connections establishing)
tps = 25397.294583 (excluding connections establishing)
It was actually limited by the capabilities of the hardware, 25K TPS.
10298 ms of lag per transaction can't be right though.
Some general patch submission suggestions for you as a new contributor:
-When re-submitting something with improvements, it's a good idea to add
a version number to the patch so reviewers can tell them apart easily.
But there is no reason to change the subject line of the e-mail each
time. I followed that standard here. If you updated this again I would
name the file pgbench-throttle-v9.patch but keep the same e-mail subject.
-There were some extra carriage return characters in your last
submission. Wasn't a problem this time, but if you can get rid of those
that makes for a better patch.
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Hello Greg,
Thanks for this very detailed review and the suggestions!
I'll submit a new patch.
Question 1: should it report the maximum lag encountered?
I haven't found the lag measurement to be very useful yet, outside of
debugging the feature itself. Accordingly, I don't see a reason to add even
more statistics about it outside of testing the code. I am seeing some
weird lag problems right now that this will be useful for, though; more on
that in a few places below.
I'll explain below why this figure is really interesting to get, and why
it is not available as precisely anywhere else.
Question 2: the next step would be to have the current lag shown under
option --progress, but that would mean having a combined --throttle
--progress patch submission, or maybe dependencies between patches.
This is getting too far ahead.
Ok!
Let's get the throttle part nailed down before introducing even more
moving parts into this. I've attached an updated patch that changes a
few things around already. I'm not done with this yet and it needs some
more review before commit, but it's not too far away from being ready.
Ok. I'll submit a new version by the end of the week.
This feature works quite well. On a system that will run at 25K TPS without
any limit, I did a run with 25 clients and a rate of 400/second, aiming at
10,000 TPS, and that's what I got:
number of clients: 25
number of threads: 1
duration: 60 s
number of transactions actually processed: 599620
average transaction lag: 0.307 ms
tps = 9954.779317 (including connections establishing)
tps = 9964.947522 (excluding connections establishing)
I never thought of implementing the throttle like this before,
Stochastic processes are a little bit magic:-)
but it seems to work out well so far. Check out tps.png to see the
smoothness of the TPS curve (the graphs came out of pgbench-tools.
There's a little more play outside of the target than ideal for this
case. Maybe it's worth tightening the Poisson curve a bit around its
center?
The point of a Poisson distribution is to model random events of the kind
that are a little bit irregular, such as web requests or clients queuing
at a taxi stand. I cannot really change the formula, but if you want to
argue with Siméon Denis Poisson, his current address is the 19th section
of the "Père Lachaise" cemetery in Paris :-)
More seriously, the only parameter that can be changed is the "1000000.0",
which drives the granularity of the Poisson process. A smaller value would
mean a smaller potential multiplier, that is, how far from the average time
the schedule can go. This may come under "tightening", although it would
depart from a "perfect" process and possibly be a little less
"smooth"... for a given definition of "tight", "perfect" and "smooth" :-)
[...] What I did instead was think of this as a transaction rate target,
which makes the help a whole lot simpler:
-R SPEC, --rate SPEC
target rate per client in transactions per second
Ok, I'm fine with this name.
Made the documentation easier to write too. I'm not quite done with that
yet, the docs wording in this updated patch could still be better.
I'm not a native English speaker; any help is welcome here. I'll do my
best.
I personally would like this better if --rate specified a *total* rate across
all clients.
Ok, I can do that, with some reworking so that the stochastic process is
shared by all threads instead of being within each client. This means a
lock between threads to access some variables, which should not impact
the test much. Another option is to have a per-thread stochastic process.
However, there are examples of both types of settings in the
program already, so there's no one precedent for which is right here. -t is
per-client and now -R is too; I'd prefer it to be like -T instead. It's not
that important though, and the code is cleaner as it's written right now.
Maybe this is better; I'm not sure.
I like the idea of just one process instead of a per-client one. I did not
try that at the beginning because the implementation is less
straightforward.
On the topic of this weird latency spike issue, I did see that show up in
some of the results too.
Your example illustrates *exactly* why the lag measure was added.
The Poisson process generates an ideal event line (that is, irregularly
scheduled transaction start times targeting the expected tps) which
induces a varying load that the database tries to handle.
If a transaction cannot start right away, it is deferred with respect to
its scheduled start time. The measured lag reports exactly that: the
clients cannot keep up with the load. There may be some catching up
later, that is, the clients come back in line with the scheduled
transactions.
I need to put this measure here because the "scheduled time" is only
known to pgbench and not available elsewhere. The max would really be more
interesting than the mean, so as to catch that something was temporarily
amiss, even if things went back to nominal later.
Here's one where I tried to specify a rate higher
than the system can actually handle, 80000 TPS total on a SELECT-only test:
$ pgbench -S -T 30 -c 8 -j 4 -R10000tps pgbench
starting vacuum...end.
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 8
number of threads: 4
duration: 30 s
number of transactions actually processed: 761779
average transaction lag: 10298.380 ms
The interpretation is the following: as the database cannot handle the
load, transactions were processed on average 10 seconds behind their
scheduled start time. You had, on average, a 10 second latency to
answer "incoming" requests. Also, some transactions were implicitly not
even scheduled, so the situation is worse than that...
tps = 25392.312544 (including connections establishing)
tps = 25397.294583 (excluding connections establishing)
It was actually limited by the capabilities of the hardware, 25K TPS. 10298
ms of lag per transaction can't be right though.
Some general patch submission suggestions for you as a new contributor:
Hmmm, I did a few things such as "pgxs" back in 2004, so maybe "not very
active" is a better description than "new":-)
-When re-submitting something with improvements, it's a good idea to add a
version number to the patch so reviewers can tell them apart easily. But
there is no reason to change the subject line of the e-mail each time. I
followed that standard here. If you updated this again I would name the file
pgbench-throttle-v9.patch but keep the same e-mail subject.
Ok.
-There were some extra carriage return characters in your last submission.
Wasn't a problem this time, but if you can get rid of those that makes for a
better patch.
Ok.
--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Here is submission v9, based on your v8 version.
- the tps is global, with a mutex to share the global stochastic process
- there is an adaptation for the "fork" emulation
- I do not know whether this works with the Win32 pthread stuff
- reduced multiplier: ln(1000000) -> ln(1000)
- avg & max throttling lag are reported
There's a little more play outside of the target than ideal for this
case. Maybe it's worth tightening the Poisson curve a bit around its
center?
A stochastic process moves around the target value, but is not right on
it.
--
Fabien.
Attachments:
pgbench-throttle-v9.patch (text/x-diff; +139 -6)
On 6/10/13 6:02 PM, Fabien COELHO wrote:
- the tps is global, with a mutex to share the global stochastic process
- there is an adaptation for the "fork" emulation
- I do not know whether this works with the Win32 pthread stuff.
Instead of this complexity, can we just split the TPS input per client?
That's all I was thinking of here, not adding a new set of threading
issues. If 10000 TPS is requested and there are 10 clients, just set the
delay so that each of them targets 1000 TPS.
I'm guessing it's more accurate to have them all communicate as you've
done here, but it seems like a whole class of new bugs and potential
bottlenecks could come out of that. Whenever someone touches the
threading model for pgbench it usually gives a stack of build farm
headaches. Better to avoid those unless there's really a compelling
reason to go through that.
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
- the tps is global, with a mutex to share the global stochastic process
- there is an adaptation for the "fork" emulation
- I do not know whether this works with the Win32 pthread stuff.
Instead of this complexity,
Well, the mutex impact is very localized in the code. The complexity is
more linked to the three "thread" implementations intermixed there.
can we just split the TPS input per client?
Obviously it is possible. Note that it is more logical to do that per
thread. I did one shared stochastic process because it makes more sense to
have just one.
That's all I was thinking of here, not adding a new set of threading issues.
If 10000 TPS is requested and there's 10 clients, just set the delay so that
each of them targets 1000 TPS.
Ok, so I understand that a mutex is too much!
I'm guessing it's more accurate to have them all communicate as you've done
here, but it seems like a whole class of new bugs and potential bottlenecks
could come out of that.
I do not think that there is a performance or locking contention issue: it
is about taking a mutex around a section which performs one integer add and
two integer assignments, about 3 instructions, to be compared
with the task of performing database operations over the network. There
are several orders of magnitude between those tasks. It would take a
truly terrible mutex implementation to have any significant impact.
Whenever someone touches the threading model for pgbench it usually
gives a stack of build farm headaches. Better to avoid those unless
there's really a compelling reason to go through that.
I agree that the threading model in pgbench is a mess, mostly because of
the 3 concurrent implementations intermixed in the code. Getting rid of
the fork emulation and the win32 special handling, keeping only the
pthread implementation, which seems to be available even on Windows, would
be a relief. I'm not sure whether there is still a rationale for having
these 3 implementations, but it guarantees a maintenance mess :-(
I'll submit a version without the mutex, but ISTM that this one is
conceptually cleaner, although I'm not sure what happens on Windows.
--
Fabien.
Submission 10:
- per-thread throttling instead of a global process with a mutex;
this avoids the mutex, and the process is shared between the clients
of a given thread.
- ISTM that the "thread start time" should be initialized at the
beginning of threadRun instead of in the loop *before* thread creation,
otherwise the thread creation delays are incorporated into the
performance measure, but ISTM that the point of pgbench is not to
measure thread creation performance...
I've added a comment suggesting where it should be put instead,
first thing in threadRun.
--
Fabien.
Attachments:
pgbench-throttle-v10.patch (text/x-diff; +125 -4)
On 6/11/13 4:11 PM, Fabien COELHO wrote:
- ISTM that the "thread start time" should be initialized at the
beginning of threadRun instead of in the loop *before* thread creation,
otherwise the thread creation delays are incorporated into the
performance measure, but ISTM that the point of pgbench is not to
measure thread creation performance...
I noticed that, but it seemed a pretty minor issue. Did you look at the
giant latency spikes at the end of the test run I submitted the graph
for? I wanted to nail down what was causing those before worrying about
the startup timing.
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
- ISTM that the "thread start time" should be initialized at the
beginning of threadRun instead of in the loop *before* thread creation,
otherwise the thread creation delays are incorporated into the
performance measure, but ISTM that the point of pgbench is not to
measure thread creation performance...
I noticed that, but it seemed a pretty minor issue.
Not for me, because the "max lag" measured in my first version was really
the thread creation time, which is not very interesting.
Did you look at the giant latency spikes at the end of the test run I
submitted the graph for?
I've looked at the graph you sent. I must admit that I did not understand
exactly what is measured and where it is measured. Because of its position
at the end of the run, I thought of some disconnection-related effects
when the pgbench run is interrupted by a timeout signal, so some things
are done more slowly. Fine with me; we are stopping anyway, and out of the
steady state.
I wanted to nail down what was causing those before worrying about the
startup timing.
Well, the short answer is that I'm not worried by it, for the reason
explained above. I would be worried if it appeared anywhere other than
where the transactions are interrupted, the connections are closed and the
threads are stopped. I may be wrong :-)
--
Fabien.
Did you look at the giant latency spikes at the end of the test run I
submitted the graph for? I wanted to nail down what was causing those
before worrying about the startup timing.
If you are still worried: if you run the very same command without
throttling and measure the same latency, does the same thing happen at
the end? My guess is that it should be "yes". If it is "no", I'll try out
pgbench-tools.
--
Fabien.
On 6/12/13 3:19 AM, Fabien COELHO wrote:
If you are still worried: if you run the very same command without
throttling and measure the same latency, does the same thing happen at
the end? My guess is that it should be "yes". If it is "no", I'll try out
pgbench-tools.
It looks like it happens rarely for one client without the rate limit,
but that increases to every time for multiple clients with limiting in
place. pgbench-tools just graphs the output from the latency log.
Here's a setup that runs the test I'm doing:
$ createdb pgbench
$ pgbench -i -s 10 pgbench
$ pgbench -S -c 25 -T 30 -l pgbench && tail -n 40 pgbench_log*
Sometimes there are no slow entries, but I've seen this once so far:
0 21822 1801 0 1371217462 945264
1 21483 1796 0 1371217462 945300
8 20891 1931 0 1371217462 945335
14 20520 2084 0 1371217462 945374
15 20517 1991 0 1371217462 945410
16 20393 1928 0 1371217462 945444
17 20183 2000 0 1371217462 945479
18 20277 2209 0 1371217462 945514
23 20316 2114 0 1371217462 945549
22 20267 250128 0 1371217463 193656
The third column is the latency for that transaction. Notice how it's a
steady ~2000 us except for the very last transaction, which takes
250,128 us. That's the weird thing; these short SELECT statements
should never take that long. It suggests there's something weird
happening with how the client exits, probably that its latency number is
being computed after more work than it should be.
Here's what a rate-limited run looks like for me. Note that I'm still
using the version I re-submitted, since that's where I ran into this
issue; I haven't merged your changes to split the rate among each client
here--which means this is 400 TPS per client == 10000 TPS total:
$ pgbench -S -c 25 -T 30 -R 400 -l pgbench && tail -n 40 pgbench_log
7 12049 2070 0 1371217859 195994
22 12064 2228 0 1371217859 196115
18 11957 1570 0 1371217859 196243
23 12130 989 0 1371217859 196374
8 11922 1598 0 1371217859 196646
11 12229 4833 0 1371217859 196702
21 11981 1943 0 1371217859 196754
20 11930 1026 0 1371217859 196799
14 11990 13119 0 1371217859 208014
^^^ fast section
vvv delayed section
1 11982 91926 0 1371217859 287862
2 12033 116601 0 1371217859 308644
6 12195 115957 0 1371217859 308735
17 12130 114375 0 1371217859 308776
0 12026 115507 0 1371217859 308822
3 11948 118228 0 1371217859 308859
4 12061 113484 0 1371217859 308897
5 12110 113586 0 1371217859 308933
9 12032 117744 0 1371217859 308969
10 12045 114626 0 1371217859 308989
12 11953 113372 0 1371217859 309030
13 11883 114405 0 1371217859 309066
15 12018 116069 0 1371217859 309101
16 11890 115727 0 1371217859 309137
19 12140 114006 0 1371217859 309177
24 11884 115782 0 1371217859 309212
There's almost 90,000 usec of latency showing up between epoch time
1371217859.208014 and 1371217859.287862 here. What's weird about it is
that the longer the test runs, the larger the gap is. If collecting the
latency data itself caused the problem, that would make sense, so maybe
this is related to flushing that out to disk.
If you want to look just at the latency numbers without the other
columns in the way you can use:
cat pgbench_log.* | awk '{print $3}'
That is how I was evaluating the smoothness of the rate limit, by
graphing those latency values. pgbench-tools takes those and a derived
TPS number and plots them, which made it easier for me to spot this
weirdness. But I've already moved on to analyzing the raw latency data
instead; I can see the issue without the graph once I've duplicated the
conditions.
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
I don't have this resolved yet, but I think I've identified the cause.
Updating here mainly so Fabien doesn't duplicate my work trying to track
this down. I'm going to keep banging at this until it's resolved now
that I got this far.
Here's a slow transaction:
1371226017.568515 client 1 executing \set naccounts 100000 * :scale
1371226017.568537 client 1 throttling 6191 us
1371226017.747858 client 1 executing \setrandom aid 1 :naccounts
1371226017.747872 client 1 sending SELECT abalance FROM pgbench_accounts
WHERE aid = 268721;
1371226017.789816 client 1 receiving
That confirms it is getting stuck at the "throttling" step. It looks like
the code pauses there because it's trying to overload the "sleeping"
state that was already in pgbench, but handles it in a special way inside
doCustom(), and that doesn't always work.
The problem is that pgbench doesn't always stay inside doCustom when a
client sleeps. It exits there to poll for incoming messages from the
other clients, via select() on a shared socket. It's not safe to assume
doCustom will be running regularly; that's only true if clients keep
returning messages.
So as long as other clients keep banging on the shared socket, doCustom
is called regularly, and everything works as expected. But at the end
of the test run that happens less often, and that's when the problem
shows up.
pgbench already has a "\sleep" command, and the delay for it is handled
inside threadRun() instead. The pausing for the rate limit throttle needs
to operate in the same place. I have to redo a few things to confirm this
actually fixes the issue, as well as look at Fabien's later updates, since
I wandered off debugging. I'm sure it's in the area of code I'm poking at
now, though.
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
pgbench already has a "\sleep" command, and the way that delay is
handled happens inside threadRun() instead. The pausing of the rate
limit throttle needs to operate in the same place.
It does operate in the same place. The throttling is performed by
inserting a "sleep" as the first thing when starting a new transaction. So
if there is a weirdness, it should show up as well without throttling but
with a fixed \sleep instead?
--
Fabien.
Hello Greg,
I think that the weirdness really comes from the way transaction times
are measured, their interactions with throttling, and latent bugs in the
code.
One issue is that the throttling time was included in the measurement, but
not the first time, because "txn_begin" is not set at the beginning of
doCustom.
Also, the flag st->listen is set to 1 but *never* set back to 0...
sh> grep listen pgbench.c
int listen;
if (st->listen)
st->listen = 1;
st->listen = 1;
st->listen = 1;
st->listen = 1;
st->listen = 1;
st->listen = 1;
ISTM that I can fix the "weirdness" by inserting an ugly "goto top;", but
I would feel better about removing all gotos and reordering some
actions in doCustom in a more logical way. However, that would be a bigger
patch.
Please find attached 2 patches:
- the first is the full throttle patch, which ensures that
txn_begin is taken at a consistent point, after throttling,
which requires resetting "listen". There is an ugly goto.
I've also put times in a consistent format in the log,
"789.012345" instead of "789 12345".
- the second patch just shows the diff between v10 and the first one.
--
Fabien.
On 6/14/13 3:50 PM, Fabien COELHO wrote:
I think that the weirdness really comes from the way transactions times
are measured, their interactions with throttling, and latent bugs in the
code.
Measurement times, no; interactions with throttling, no. If it was
either of those I'd have finished this off days ago. Latent bugs,
possibly. We may discover there's nothing wrong with your code at the
end here, that it just makes hitting this bug more likely.
Unfortunately, today is the day *some* bug is popping up, and I want to
get it squashed before I'll be happy.
The lag is actually happening during a kernel call that isn't working as
expected. I'm not sure whether this bug was there all along when \sleep
was used, or if it's specific to the throttle sleep.
Also, flag st->listen is set to 1 but *never* set back to 0...
I noticed that st->listen was weird too, and it's on my short list of
suspicious things I haven't figured out yet.
I added a bunch more logging as pgbench steps through its work to track
down where it's stuck at. Until the end all transactions look like this:
1371238832.084783 client 10 throttle lag 2 us
1371238832.084783 client 10 executing \setrandom aid 1 :naccounts
1371238832.084803 client 10 sending SELECT abalance FROM
pgbench_accounts WHERE aid = 753099;
1371238832.084840 calling select
1371238832.086539 client 10 receiving
1371238832.086539 client 10 finished
All clients who hit lag spikes at the end are going through this
sequence instead:
1371238832.085912 client 13 throttle lag 790 us
1371238832.085912 client 13 executing \setrandom aid 1 :naccounts
1371238832.085931 client 13 sending SELECT abalance FROM
pgbench_accounts WHERE aid = 564894;
1371238832.086592 client 13 receiving
1371238832.086662 calling select
1371238832.235543 client 13 receiving
1371238832.235543 client 13 finished
Note the "calling select" here that wasn't in the normal length
transaction before it. The client is receiving something here, but
rather than it finishing the transaction it falls through and ends up at
the select() system call outside of doCustom. All of the clients that
are sleeping when the system slips into one of these long select() calls
are getting stuck behind it. I'm not 100% sure, but I think this only
happens when all remaining clients are sleeping.
Here's another one, it hits the receive that doesn't finish the
transaction earlier (1371238832.086587) but then falls into the same
select() call at 1371238832.086662:
1371238832.085884 client 12 throttle lag 799 us
1371238832.085884 client 12 executing \setrandom aid 1 :naccounts
1371238832.085903 client 12 sending SELECT abalance FROM
pgbench_accounts WHERE aid = 299080;
1371238832.086587 client 12 receiving
1371238832.086662 calling select
1371238832.231032 client 12 receiving
1371238832.231032 client 12 finished
Investigation is still going here...
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
I think that the weirdness really comes from the way transaction times
are measured, their interactions with throttling, and latent bugs in the
code.
Measurement times, no; interactions with throttling, no. If it was either
of those I'd have finished this off days ago. Latent bugs, possibly. We
may discover there's nothing wrong with your code at the end here,
To summarize my point: I think my v10 code does not take into account all
of the strangeness in doCustom, and I'm pretty sure that there is no point
in including throttle sleeps in the latency measures, which was more or
less the case. So it is somehow a "bug" which only shows up if you look at
the latency measures, but the tps are fine.
that it just makes hitting this bug more likely. Unfortunately today is
the day *some* bug is popping up, and I want to get it squashed before
I'll be happy.
The lag is actually happening during a kernel call that isn't working as
expected. I'm not sure whether this bug was there all along if \sleep was
used, or if it's specific to the throttle sleep.
The throttle sleep is inserted outside of the state machine. That is why
in the "test" patch I added a goto to ensure that it is always taken at
the right time, that is, when state==0 and before txn_begin is set, and
not possibly between other states when doCustom happens to be recalled
after a return.
I added a bunch more logging as pgbench steps through its work to track
down where it's stuck at. Until the end all transactions look like this:
1371238832.084783 client 10 throttle lag 2 us
1371238832.084783 client 10 executing \setrandom aid 1 :naccounts
1371238832.084803 client 10 sending SELECT abalance FROM pgbench_accounts
WHERE aid = 753099;
1371238832.084840 calling select
1371238832.086539 client 10 receiving
1371238832.086539 client 10 finished
All clients who hit lag spikes at the end are going through this sequence
instead:
1371238832.085912 client 13 throttle lag 790 us
1371238832.085912 client 13 executing \setrandom aid 1 :naccounts
1371238832.085931 client 13 sending SELECT abalance FROM pgbench_accounts
WHERE aid = 564894;
1371238832.086592 client 13 receiving
1371238832.086662 calling select
1371238832.235543 client 13 receiving
1371238832.235543 client 13 finished
Note the "calling select" here that wasn't in the normal length transaction
before it. The client is receiving something here, but rather than it
finishing the transaction it falls through and ends up at the select() system
call outside of doCustom. All of the clients that are sleeping when the
system slips into one of these long select() calls are getting stuck behind
it. I'm not 100% sure, but I think this only happens when all remaining
clients are sleeping.
Note: in both slow cases there is a "receiving" between "sending" and
"select". This suggests that the "goto top" at the very end of doCustom is
followed in one case but not the other.
ISTM that there is a timeout passed to select() which is computed based on
the current sleeping time of each client. I'm pretty sure that is not a
well-tested path...
--
Fabien.