gaussian distribution pgbench
Hello,
I have created a Gaussian distribution patch for pgbench that accesses records
with Gaussian frequency, and I am submitting it to this commit fest.
* Purpose of this patch
In real transaction workloads, clients rarely access all records with equal
probability. I think a Gaussian distribution describes most transaction access
patterns in general, and my patch approximates this access pattern.
Besides simulating a more typical access pattern, I think the patch is also
useful for developing new features such as more effective use and freeing of
shared_buffers, readahead optimization in the OS, and faster tuple-level
locking.
* Usage
It is easy to use: just pass -g with a standard deviation threshold parameter.
With a larger standard deviation threshold, pgbench concentrates its accesses
on a narrower set of records. The minimum standard deviation threshold is 2.
Here is an example run.
[mitsu-ko@localhost postgresql]$ bin/pgbench -g 10 -c 16 -j 8 -T 300
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
standard deviation threshold: 10.00000
access probability of top 20%, 10% and 5% records: 0.95450 0.68269 0.38292
query mode: simple
number of clients: 16
number of threads: 8
duration: 300 s
number of transactions actually processed: 566367
tps = 1887.821409 (including connections establishing)
tps = 1887.949390 (excluding connections establishing)
"access probability" shows the probability that an access falls within the top
N% of records in this benchmark.
The larger the standard deviation threshold parameter, the larger these
probabilities become.
The attached png files "gausian_2.png" and "gaussian_10.png" show the Gaussian
access pattern produced by my patch. "no_gaussian.png" was produced without the
-g option (normal). I think they show that my patch realizes a Gaussian access
pattern.
* Approach
It replaces the uniform random number generator with a Gaussian random number
generator based on the Box-Muller transform. The standard deviation threshold
parameter is then used to truncate the normal distribution and normalize it, so
that it maps onto the records. The mapping is linear, from a floating-point
value to an integer value.
* Other
I have also written other patches that make pgbench benchmark results more
accurate, and I will submit them to this commit fest. They are similar to the
checkpoint patch I submitted in the past. They work well, too!
Any questions?
Best regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
Attachments:
gaussian_pgbench_v0.patch (text/x-diff)
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index ad8e272..77f60ae 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -40,6 +40,7 @@
#include <ctype.h>
#include <math.h>
#include <signal.h>
+#include <limits.h>
#ifndef WIN32
#include <sys/time.h>
@@ -175,6 +176,8 @@ int progress_nclients = 0; /* number of clients for progress report */
bool is_connect; /* establish connection for each transaction */
bool is_latencies; /* report per-command latencies */
int main_pid; /* main process id used in log filename */
+bool use_gaussian = false; /* use gaussian distribution benchmark */
+double stdev_threshold = 5; /* standard deviation threshold */
char *pghost = "";
char *pgport = "";
@@ -360,6 +363,7 @@ usage(void)
" -D, --define=VARNAME=VALUE\n"
" define variable for use by custom script\n"
" -f, --file=FILENAME read transaction script from FILENAME\n"
+ " -g, --gaussian=NUM gaussian distribution benchmark with NUM standard deviation threshold\n"
" -j, --jobs=NUM number of threads (default: 1)\n"
" -l, --log write transaction times to log file\n"
" -M, --protocol=simple|extended|prepared\n"
@@ -471,7 +475,27 @@ getrand(TState *thread, int64 min, int64 max)
* protected by a mutex, and therefore a bottleneck on machines with many
* CPUs.
*/
- return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
+ double rand = pg_erand48(thread->random_state);
+ double rand1;
+ double rand2;
+ double stdev;
+
+ if(!use_gaussian)
+ return min + (int64) ((max - min + 1) * rand);
+
+ /* generate gaussian distribution */
+ rand1 = (rand * (LONG_MAX - 1.0) + 0.5) / LONG_MAX;
+ do
+ {
+ rand2 = pg_erand48(thread->random_state);
+ /* Box-Muller transform */
+ stdev = sqrt(-2.0 * log(rand1)) * sin(2.0 * M_PI * rand2);
+ }while( stdev < (-1.0 * stdev_threshold) || stdev > stdev_threshold);
+
+ /* normalization */
+ rand = (stdev + stdev_threshold) / (stdev_threshold * 2.0);
+
+ return min + (int64) (max - min + 1) * rand ;
}
/* call PQexec() and exit() on failure */
@@ -935,7 +959,7 @@ top:
* a transaction, the next transaction will start right away.
*/
int64 wait = (int64)
- throttle_delay * -log(getrand(thread, 1, 1000)/1000.0);
+ throttle_delay * -log((pg_erand48(thread->random_state) * (LONG_MAX - 1.0) + 0.5) / LONG_MAX);
thread->throttle_trigger += wait;
@@ -2119,6 +2143,15 @@ printResults(int ttype, int normal_xacts, int nclients,
printf("transaction type: %s\n", s);
printf("scaling factor: %d\n", scale);
+ if(use_gaussian)
+ {
+ printf("standard deviation threshold: %.5f\n", stdev_threshold);
+ printf("access probability of top 20%%, 10%% and 5%% records: %.5f %.5f %.5f\n",
+ (double) ((erf (stdev_threshold * 0.2 / sqrt(2.0))) / (erf (stdev_threshold / sqrt(2.0)))),
+ (double) ((erf (stdev_threshold * 0.1 / sqrt(2.0))) / (erf (stdev_threshold / sqrt(2.0)))),
+ (double) ((erf (stdev_threshold * 0.05 / sqrt(2.0))) / (erf (stdev_threshold / sqrt(2.0))))
+ );
+ }
printf("query mode: %s\n", QUERYMODE[querymode]);
printf("number of clients: %d\n", nclients);
printf("number of threads: %d\n", nthreads);
@@ -2208,6 +2241,7 @@ main(int argc, char **argv)
{"define", required_argument, NULL, 'D'},
{"file", required_argument, NULL, 'f'},
{"fillfactor", required_argument, NULL, 'F'},
+ {"gaussian", required_argument, NULL, 'g'},
{"host", required_argument, NULL, 'h'},
{"initialize", no_argument, NULL, 'i'},
{"jobs", required_argument, NULL, 'j'},
@@ -2301,7 +2335,7 @@ main(int argc, char **argv)
state = (CState *) pg_malloc(sizeof(CState));
memset(state, 0, sizeof(CState));
- while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:P:R:", long_options, &optindex)) != -1)
+ while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:g:D:F:M:P:R:", long_options, &optindex)) != -1)
{
switch (c)
{
@@ -2418,6 +2452,15 @@ main(int argc, char **argv)
if (process_file(filename) == false || *sql_files[num_files - 1] == NULL)
exit(1);
break;
+ case 'g':
+ use_gaussian = true;
+ stdev_threshold = atof(optarg);
+ if(stdev_threshold < 2)
+ {
+ fprintf(stderr, "gaussian option (-g) must be at least 2: %f\n", stdev_threshold);
+ exit(1);
+ }
+ break;
case 'D':
{
char *p;
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index 49a79b1..5f7cb1e 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -320,6 +320,18 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
</varlistentry>
<varlistentry>
+ <term><option>-g</option> <replaceable>standard deviation</></term>
+ <term><option>--gaussian</option><replaceable>standard deviation</></term>
+ <listitem>
+ <para>
+ Run the benchmark with a Gaussian access distribution. Requires a standard deviation threshold.
+ With a larger standard deviation threshold, pgbench concentrates its accesses
+ on a narrower set of records. The minimum standard deviation threshold is 2.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-j</option> <replaceable>threads</></term>
<term><option>--jobs=</option><replaceable>threads</></term>
<listitem>
KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp> wrote:
I create gaussian distribution pgbench patch that can access
records with gaussian frequency. And I submit this commit fest.
Thanks!
I have moved this to the Open CommitFest, though.
https://commitfest.postgresql.org/action/commitfest_view/open
You had accidentally added to the CF In Progress.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello Mitsumasa,
In the general transaction situation, clients access for all records equally is
hard to happen. I think gaussian distribution access patterns are most of
transaction patterns in general. My patch realizes nearly this access pattern.
That is great! I was just looking for something like that!
I have not looked at the patch yet, but from the plots you sent, it seems
that it is a gaussian distribution over the keys. However this pattern
induces stronger cache effects which are maybe not too realistic, because
neighboring keys in the middle are more likely to be chosen.
It seems to me that this is not desirable.
Have you considered adding a "randomization" layer, that is once you have
a key in [1 .. n] centered around n/2, then you perform a pseudo-random
transformation into the same domain so that key values are scattered over
the whole domain?
--
Fabien.
You had accidentally added to the CF In Progress.
Oh, I completely mistook the CF schedule :-)
Maybe Horiguchi-san is in the same situation...
However, because you moved it, I became the first submitter in the next CF.
Thank you for moving it :-)
--
Mitsumasa KONDO
However this pattern induces stronger cache effects which are maybe not
too realistic,
because neighboring keys in the middle are more likely to be chosen.
I think you are right. However, this is in effect a pseudo-benchmark, so I
think such a simple mechanism is also necessary.
Have you considered adding a "randomization" layer, that is once you have
a key in [1 .. n] centered around n/2, then you perform a pseudo-random
transformation into the same domain so that key values are scattered over
the whole domain?
Yes, I have also considered that. It could be realized by adding a linear
mapping array built with a random generator. However, the current erand48
algorithm is an old, low-accuracy algorithm, so I do not know whether it would
work well. If we implement this, we may need a more accurate random generator
algorithm, such as Mersenne Twister.
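The mapping array mentioned above could look roughly like the following. This is only a sketch under assumptions: it uses rand() for brevity (so it is limited to ranges no larger than RAND_MAX + 1 and has a slight modulo bias), and the names build_scatter_map and scatter_key are hypothetical. The idea is that a key drawn with Gaussian frequency is passed through a fixed random permutation, so that hot keys keep their frequency but are no longer neighbors.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Build a random permutation of 0 .. n-1 with a Fisher-Yates shuffle.
 * Returns a malloc'd array the caller must free, or NULL on allocation
 * failure.
 */
static int64_t *
build_scatter_map(int64_t n, unsigned int seed)
{
    int64_t *map;
    int64_t  i;

    assert(n > 0);

    map = malloc((size_t) n * sizeof(int64_t));
    if (map == NULL)
        return NULL;

    for (i = 0; i < n; i++)
        map[i] = i;

    srand(seed);
    for (i = n - 1; i > 0; i--)
    {
        /* rand() % (i + 1) is slightly biased; acceptable for a sketch. */
        int64_t j = (int64_t) (rand() % (i + 1));
        int64_t tmp = map[i];

        map[i] = map[j];
        map[j] = tmp;
    }
    return map;
}

/*
 * Translate a key in [min, max] into its scattered position in [min, max].
 * The map must have been built with n = max - min + 1.
 */
static int64_t
scatter_key(const int64_t *map, int64_t min, int64_t max, int64_t key)
{
    assert(key >= min && key <= max);
    return min + map[key - min];
}
```

Because the permutation is fixed for the whole run, each key keeps a stable access frequency, which is what separates this from simply drawing a second uniform random number.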
Regards,
--
Mitsumasa KONDO
On 9/20/13 2:42 AM, KONDO Mitsumasa wrote:
I create gaussian distribution pgbench patch that can access records with
gaussian frequency. And I submit this commit fest.
This patch no longer applies.
Sorry for my delayed reply.
I was on vacation last week, so I replied from Gmail.
However, my post to pgsql-hackers got stalled :-(
(2013/09/21 6:05), Kevin Grittner wrote:
You had accidentally added to the CF In Progress.
Oh, I completely mistook the CF schedule :-)
Maybe Horiguchi-san is in the same situation...
However, because you moved it, I became the first submitter in the next CF.
Thank you for moving it!
--
Mitsumasa KONDO
NTT Open Source Software Center
Sorry for my delayed reply.
I was on vacation last week, so I replied from Gmail.
However, my post to pgsql-hackers got stalled :-(
(2013/09/21 7:54), Fabien COELHO wrote:
However this pattern induces stronger cache effects which are maybe not too realistic,
because neighboring keys in the middle are more likely to be chosen.
I think you are right. However, this is in effect a pseudo-benchmark, so I
think such a simple mechanism is also necessary.
Have you considered adding a "randomization" layer, that is once you have a key in [1 .. n] centered around n/2, then you perform a pseudo-random transformation into the same domain so that key values are scattered over the whole domain?
Yes, I have also considered that. It could be realized by adding a linear
mapping array built with a random generator. However, the current erand48
algorithm is an old, low-accuracy algorithm, so I do not know whether it would
work well. If we implement this, we may need a more accurate random generator
algorithm, such as Mersenne Twister.
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
(2013/09/27 5:29), Peter Eisentraut wrote:
This patch no longer applies.
I will try to rework this patch for the next commit fest.
If you have any good ideas, please send them to me!
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
On 30.09.2013 07:12, KONDO Mitsumasa wrote:
(2013/09/27 5:29), Peter Eisentraut wrote:
This patch no longer applies.
I will try to create this patch in next commit fest.
If you have nice idea, please send me!
A few thoughts on this:
1. DBT-2 uses a non-uniform distribution. You can use that instead of
pgbench.
2. Do we really want to add everything and the kitchen sink to pgbench?
Every addition is small when considered alone, but we'll soon end with a
monster. So I'm inclined to reject this patch on those grounds.
3. That said, this could be handy. But it would be even more handy if
you could get Gaussian random numbers with \setrandom, so that you could
use this with custom scripts. And once you implement that, do we
actually need the -g flag anymore? If you want TPC-B transactions with
gaussian distribution, you can write a custom script to do that. The
documentation includes a full script that corresponds to the built-in
TPC-B script.
So what I'd actually like to see is \setgaussian, for use in custom scripts.
- Heikki
3. That said, this could be handy. But it would be even more handy if you
could get Gaussian random numbers with \setrandom, so that you could use this
with custom scripts. And once you implement that, do we actually need the -g
flag anymore? If you want TPC-B transactions with gaussian distribution, you
can write a custom script to do that. The documentation includes a full
script that corresponds to the built-in TPC-B script.
So what I'd actually like to see is \setgaussian, for use in custom scripts.
Indeed, great idea! That looks pretty elegant! It would be something like:
\setgauss var min max sigma
I'm not sure whether sigma should be relative to max-min, or absolute.
I would say relative is better...
A concern I raised is that what one should really want is a "pseudo
randomized" (discretized) gaussian, i.e. you want the probability of each
value along a gaussian distribution, *but* no direct frequency correlation
between neighbors. Otherwise, you may have unwanted/unrealistic positive
cache effects. Maybe this could be achieved by an independent built-in,
say either:
\randomize var min max [parameter ?]
\randomize var min max val [parameter]
This would take variable var, which must be in [min,max], and apply a
pseudo-random transformation whose result is also in [min,max].
From a probabilistic point of view, it seems to me that a randomized
(discretized) exponential would be more significant to model a server
load.
\setexp var min max lambda...
--
Fabien.
On Thu, Nov 21, 2013 at 9:13 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
So what I'd actually like to see is \setgaussian, for use in custom scripts.
+1. I'd really like to be able to run a benchmark with a Gaussian and
uniform distribution side-by-side for comparative purposes - we need
to know that we're not optimizing one at the expense of the other.
Sure, DBT-2 gets you a non-uniform distribution, but it has serious
baggage from it being a tool primarily intended for measuring the
relative performance of different database systems. pgbench would be
pretty worthless for measuring the relative strengths and weaknesses
of different database systems, but it is not bad at informing the
optimization efforts of hackers. pgbench is a defacto standard for
that kind of thing, so we should make it incrementally better for that
kind of thing. No standard industry benchmark is likely to replace it
for this purpose, because such optimizations require relatively narrow
focus.
Sometimes I want to maximally pessimize the number of FPIs generated.
Other times I do not. Getting a sense of how something affects a
variety of distributions would be very valuable, not least since
normal distributions abound in nature.
--
Peter Geoghegan
On 20/12/13 09:36, Peter Geoghegan wrote:
On Thu, Nov 21, 2013 at 9:13 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
So what I'd actually like to see is \setgaussian, for use in custom scripts.
+1. I'd really like to be able to run a benchmark with a Gaussian and
uniform distribution side-by-side for comparative purposes - we need
to know that we're not optimizing one at the expense of the other.
Sure, DBT-2 gets you a non-uniform distribution, but it has serious
baggage from it being a tool primarily intended for measuring the
relative performance of different database systems. pgbench would be
pretty worthless for measuring the relative strengths and weaknesses
of different database systems, but it is not bad at informing the
optimization efforts of hackers. pgbench is a defacto standard for
that kind of thing, so we should make it incrementally better for that
kind of thing. No standard industry benchmark is likely to replace it
for this purpose, because such optimizations require relatively narrow
focus.
Sometimes I want to maximally pessimize the number of FPIs generated.
Other times I do not. Getting a sense of how something affects a
variety of distributions would be very valuable, not least since
normal distributions abound in nature.
Curious, wouldn't the common usage pattern tend to favour a skewed
distribution, such as the Poisson Distribution (it has been over 40
years since I studied this area, so there may be better candidates).
Just that gut feeling & experience tends to make me think that the
"Normal" distribution may often not be the best for database access
simulation.
Cheers,
Gavin
On 12/19/13 5:52 PM, Gavin Flower wrote:
Curious, wouldn't the common usage pattern tend to favour a skewed
distribution, such as the Poisson Distribution (it has been over 40
years since I studied this area, so there may be better candidates).
Some people like database load testing with a "Pareto principle"
distribution, where 80% of the activity hammers 20% of the rows such
that locking becomes important. (That's one specific form of Pareto
distribution) The standard pgbench load indirectly gets you quite a bit
of that due to all the contention on the branches table. Targeting all
of that at a single table can be more realistic.
My last round of reviewing a pgbench change left me pretty worn out with
wanting to extend that code much further. Adding in some new
probability distributions would be fine though, that's a narrow change.
We shouldn't get too excited about pgbench remaining a great tool for
too much longer though. pgbench is fast approaching a wall nowadays,
where it's hard for any single client server to fully overload today's
larger server. You basically need a second large server to generate
load, whereas what people really want is a bunch of coordinated small
clients. (That sort of wall was in early versions too, it just got
pushed upward a lot by the multi-worker changes in 9.0 coming around the
same time desktop core counts really skyrocketed)
pgbench started as a clone of a now abandoned Java project called
JDBCBench. I've been seriously considering a move back toward that
direction lately. Nowadays spinning up ten machines to run load
generation is trivial. The idea of extending pgbench's C code to
support multiple clients running at the same time and collating all of
their results is not a project I'd be excited about. It should remain a
perfectly fine tool for PostgreSQL developers to find code hotspots, but
that's only so useful.
(At this point someone normally points out Tsung solved all of those
problems years ago if you'd only give it a chance. I think it's kind of
telling that work on sysbench is rewriting the whole thing so you can
use Lua for your test scripts.)
Hi,
I have revised my Gaussian pgbench patch as requested by the community.
* Changes
- Support for custom scripts.
- "\setgaussian" generates Gaussian-distributed random numbers.
- ex) \setgaussian [var] [min] [max] [stddev_threshold]
- We can use a mixture model across multiple custom scripts.
- Removed the short option "-g" and added long options ("--gaussian", "--gaussian-n", "--gaussian-s").
- Refactored the getrand() interface.
- getrand(TState *thread, int64 min, int64 max) -> getrand(TState *thread, int64 min, int64 max, DistType dist_type, double value1)
- This makes it easy to add other random distribution algorithms. Please see the
detailed design in the attached patch.
Fabien COELHO wrote:
From a probabilistic point of view, it seems to me that a randomized
(discretized) exponential would be more significant to model a server load.
\setexp var min max lambda...
I can generate an exponential distribution as follows. It is very easy.
double rand_exp( double lambda ){
return -log(Uniform(0,1))/lambda;
}
If the community wants this, I will add this function to my patch.
Gavin Flower wrote:
Curious, wouldn't the common usage pattern tend to favour a skewed distribution,
such as the Poisson Distribution (it has been over 40 years since I studied
this area, so there may be better candidates).
The difference between the Poisson distribution and the Gaussian distribution is
whether it is discrete or not.
My Gaussian algorithm first generates a continuous Gaussian distribution and then
projects it onto integer values, one per record, so the result is discrete.
Therefore, it will be almost similar to a Poisson distribution. With larger
standard deviations (above 10), it becomes a better approximation of a
Poisson distribution.
The attached sql files are custom scripts that use different distributions.
Together they realize a mixture-distribution benchmark. The attached graph shows
the result.
[example command]
$ pgbench -f file1.sql -f file2.sql
If you have any more comments, please send them to me.
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
Attachments:
gaussian_pgbench_v3.patch (text/x-diff)
*** a/contrib/pgbench/pgbench.c
--- b/contrib/pgbench/pgbench.c
***************
*** 40,45 ****
--- 40,46 ----
#include <ctype.h>
#include <math.h>
#include <signal.h>
+ #include <limits.h>
#ifndef WIN32
#include <sys/time.h>
***************
*** 176,181 **** int progress_nthreads = 0; /* number of threads for progress report */
--- 177,183 ----
bool is_connect; /* establish connection for each transaction */
bool is_latencies; /* report per-command latencies */
int main_pid; /* main process id used in log filename */
+ double stdev_threshold = 5; /* standard deviation threshold */
char *pghost = "";
char *pgport = "";
***************
*** 267,272 **** typedef enum QueryMode
--- 269,280 ----
NUM_QUERYMODE
} QueryMode;
+ typedef enum DistType
+ {
+ DIST_UNIFORM, /* normal random distribution */
+ DIST_GAUSSIAN /* gaussian random distribution */
+ } DistType;
+
static QueryMode querymode = QUERY_SIMPLE;
static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
***************
*** 338,343 **** static char *select_only = {
--- 346,392 ----
"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
};
+ /* --gaussian case */
+ static char *gaussian_tpc_b = {
+ "\\set nbranches " CppAsString2(nbranches) " * :scale\n"
+ "\\set ntellers " CppAsString2(ntellers) " * :scale\n"
+ "\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ "\\setgaussian aid 1 :naccounts :stdev_threshold\n"
+ "\\setrandom bid 1 :nbranches\n"
+ "\\setrandom tid 1 :ntellers\n"
+ "\\setrandom delta -5000 5000\n"
+ "BEGIN;\n"
+ "UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;\n"
+ "SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+ "UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;\n"
+ "UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;\n"
+ "INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);\n"
+ "END;\n"
+ };
+
+ /* --gaussian-n case */
+ static char *gaussian_simple_update = {
+ "\\set nbranches " CppAsString2(nbranches) " * :scale\n"
+ "\\set ntellers " CppAsString2(ntellers) " * :scale\n"
+ "\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ "\\setgaussian aid 1 :naccounts :stdev_threshold\n"
+ "\\setrandom bid 1 :nbranches\n"
+ "\\setrandom tid 1 :ntellers\n"
+ "\\setrandom delta -5000 5000\n"
+ "BEGIN;\n"
+ "UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;\n"
+ "SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+ "INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);\n"
+ "END;\n"
+ };
+
+ /* --gaussian-s case */
+ static char *gaussian_select_only = {
+ "\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ "\\setgaussian aid 1 :naccounts :stdev_threshold\n"
+ "SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+ };
+
/* Function prototypes */
static void setalarm(int seconds);
static void *threadRun(void *arg);
***************
*** 381,386 **** usage(void)
--- 430,438 ----
" -v, --vacuum-all vacuum all four standard tables before tests\n"
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
" --sampling-rate=NUM fraction of transactions to log (e.g. 0.01 for 1%%)\n"
+ " --gaussian=NUM gaussian tpc-b with NUM standard deviation threshold\n"
+ " --gaussian-n=NUM gaussian -N option benchmark with NUM standard deviation threshold\n"
+ " --gaussian-s=NUM gaussian -S option benchmark with NUM standard deviation threshold\n"
"\nCommon options:\n"
" -d, --debug print debugging output\n"
" -h, --host=HOSTNAME database server host or socket directory\n"
***************
*** 461,469 **** gotdigits:
return ((sign < 0) ? -result : result);
}
! /* random number generator: uniform distribution from min to max inclusive */
static int64
! getrand(TState *thread, int64 min, int64 max)
{
/*
* Odd coding is so that min and max have approximately the same chance of
--- 513,521 ----
return ((sign < 0) ? -result : result);
}
! /* random number generator: uniform or gaussian distribution from min to max inclusive */
static int64
! getrand(TState *thread, int64 min, int64 max, DistType dist_type, double value1)
{
/*
* Odd coding is so that min and max have approximately the same chance of
***************
*** 474,480 **** getrand(TState *thread, int64 min, int64 max)
* protected by a mutex, and therefore a bottleneck on machines with many
* CPUs.
*/
! return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
}
/* call PQexec() and exit() on failure */
--- 526,567 ----
* protected by a mutex, and therefore a bottleneck on machines with many
* CPUs.
*/
! double rand = pg_erand48(thread->random_state);
!
! switch(dist_type)
! {
! /* Generate uniform distribution. */
! case DIST_UNIFORM :
! break;
!
! /* Generate gaussian distribution. */
! case DIST_GAUSSIAN :
! {
! double rand1 = (rand * (LONG_MAX - 1.0) + 0.5) / LONG_MAX;
! double rand2;
! double stdev;
! double stdev_threshold = value1;
!
! /* redraw until the value falls within the user-specified threshold */
! do
! {
! rand2 = pg_erand48(thread->random_state);
! /* Box-Muller transform */
! stdev = sqrt(-2.0 * log(rand1)) * sin(2.0 * M_PI * rand2);
! } while ( stdev < (-1.0 * stdev_threshold) || stdev > stdev_threshold);
!
! /* normalization */
! rand = (stdev + value1) / (value1 * 2.0);
! break;
! }
!
! /* maybe bug.. */
! default :
! return 0;
! }
!
! /* return int64 random number within between min and max */
! return min + (int64) (max - min + 1) * rand ;
}
/* call PQexec() and exit() on failure */
***************
*** 942,948 **** top:
* a transaction, the next transaction will start right away.
*/
int64 wait = (int64) (throttle_delay *
! 1.00055271703 * -log(getrand(thread, 1, 10000)/10000.0));
thread->throttle_trigger += wait;
--- 1029,1035 ----
* a transaction, the next transaction will start right away.
*/
int64 wait = (int64) (throttle_delay *
! 1.00055271703 * - log(getrand(thread, 1, 10000, DIST_UNIFORM, 0)/10000.0));
thread->throttle_trigger += wait;
***************
*** 1179,1185 **** top:
if (commands[st->state] == NULL)
{
st->state = 0;
! st->use_file = (int) getrand(thread, 0, num_files - 1);
commands = sql_files[st->use_file];
st->is_throttled = false;
/*
--- 1266,1273 ----
if (commands[st->state] == NULL)
{
st->state = 0;
! st->use_file = (int) getrand(thread, 0, num_files - 1,
! DIST_UNIFORM, 0);
commands = sql_files[st->use_file];
st->is_throttled = false;
/*
***************
*** 1379,1387 **** top:
}
#ifdef DEBUG
! printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getrand(thread, min, max));
#endif
! snprintf(res, sizeof(res), INT64_FORMAT, getrand(thread, min, max));
if (!putVariable(st, argv[0], argv[1], res))
{
--- 1467,1475 ----
}
#ifdef DEBUG
! printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getrand(thread, min, max, DIST_UNIFORM, 0));
#endif
! snprintf(res, sizeof(res), INT64_FORMAT, getrand(thread, min, max, DIST_UNIFORM, 0));
if (!putVariable(st, argv[0], argv[1], res))
{
***************
*** 1391,1396 **** top:
--- 1479,1574 ----
st->listen = 1;
}
+ else if (pg_strcasecmp(argv[0], "setgaussian") == 0)
+ {
+ char *var;
+ char *endptr;
+ int64 min;
+ int64 max;
+ double stdev_threshold;
+ char res[64];
+
+ if (*argv[2] == ':')
+ {
+ if((var = getVariable(st, argv[2] + 1)) == NULL)
+ {
+ fprintf(stderr, "%s: undefined variable %s\n", argv[0], argv[2]);
+ st->ecnt++;
+ return true;
+ }
+ min = strtoint64(var);
+ }
+ else
+ min = strtoint64(argv[2]);
+ #ifdef NOT_USED
+ if (min < 0)
+ {
+ fprintf(stderr, "%s: invalid minimum number %d\n", argv[0], min);
+ st->ecnt++;
+ return;
+ }
+ #endif
+ if (*argv[3] == ':')
+ {
+ if((var = getVariable(st, argv[3] + 1)) == NULL)
+ {
+ fprintf(stderr, "%s: invalid maximum number %s\n", argv[0], argv[3]);
+ st->ecnt++;
+ return true;
+ }
+ max = strtoint64(var);
+ }
+ else
+ max = strtoint64(argv[3]);
+
+ /* check if min and max are appropriate value */
+ if(max < min)
+ {
+ fprintf(stderr, "%s: maximum is less than minimum\n", argv[0]);
+ st->ecnt++;
+ return true;
+ }
+
+ /* for not overflowing when generating random number */
+ if(max - min < 0 || (max - min) + 1 < 0)
+ {
+ fprintf(stderr, "%s: range too large\n", argv[0]);
+ st->ecnt++;
+ return true;
+ }
+
+ if(*argv[4] == ':')
+ {
+ if((var = getVariable(st, argv[4] + 1)) == NULL)
+ {
+ fprintf(stderr, "%s: invalid gaussian threshold number %s\n", argv[0], argv[4]);
+ st->ecnt++;
+ return true;
+ }
+ stdev_threshold = strtod(var, NULL);
+ }
+ else
+ stdev_threshold = strtod(argv[4], &endptr);
+
+ if (stdev_threshold < 2)
+ {
+ fprintf(stderr, "%s: gaussian threshold must be at least 2\n", argv[0]);
+ st->ecnt++;
+ return true;
+ }
+ #ifdef DEBUG
+ printf("min: " INT64_FORMAT " max: " INT64_FORMAT " random: " INT64_FORMAT "\n", min, max, getrand(thread, min, max, DIST_GAUSSIAN, stdev_threshold));
+ #endif
+ snprintf(res, sizeof(res), INT64_FORMAT, getrand(thread, min, max, DIST_GAUSSIAN, stdev_threshold));
+
+ if(!putVariable(st, argv[0], argv[1], res))
+ {
+ st->ecnt++;
+ return true;
+ }
+
+ st->listen = 1;
+ }
else if (pg_strcasecmp(argv[0], "set") == 0)
{
char *var;
***************
*** 1915,1920 **** process_commands(char *buf)
--- 2093,2110 ----
fprintf(stderr, "%s: extra argument \"%s\" ignored\n",
my_commands->argv[0], my_commands->argv[j]);
}
+ else if (pg_strcasecmp(my_commands->argv[0], "setgaussian") == 0)
+ {
+ if (my_commands->argc < 5)
+ {
+ fprintf(stderr, "%s: missing argument\n", my_commands->argv[0]);
+ exit(1);
+ }
+
+ for (j = 5; j < my_commands->argc; j++)
+ fprintf(stderr, "%s: extra argument \"%s\" ignored\n",
+ my_commands->argv[0], my_commands->argv[j]);
+ }
else if (pg_strcasecmp(my_commands->argv[0], "set") == 0)
{
if (my_commands->argc < 3)
***************
*** 2193,2203 **** printResults(int ttype, int normal_xacts, int nclients,
--- 2383,2411 ----
s = "Update only pgbench_accounts";
else if (ttype == 1)
s = "SELECT only";
+ else if (ttype == 4)
+ s = "Gaussian distributed TPC-B (sort of)";
+ else if (ttype == 5)
+ s = "Gaussian distributed update only pgbench_accounts";
+ else if (ttype == 6)
+ s = "Gaussian distributed SELECT only";
else
s = "Custom query";
printf("transaction type: %s\n", s);
printf("scaling factor: %d\n", scale);
+
+ /* gaussian distributed */
+ if(ttype == 4 || ttype == 5 || ttype == 6)
+ {
+ printf("standard deviation threshold: %.5f\n", stdev_threshold);
+ printf("access probability of top 20%%, 10%% and 5%% records: %.5f %.5f %.5f\n",
+ (double) ((erf (stdev_threshold * 0.2 / sqrt(2.0))) / (erf (stdev_threshold / sqrt(2.0)))),
+ (double) ((erf (stdev_threshold * 0.1 / sqrt(2.0))) / (erf (stdev_threshold / sqrt(2.0)))),
+ (double) ((erf (stdev_threshold * 0.05 / sqrt(2.0))) / (erf (stdev_threshold / sqrt(2.0))))
+ );
+ }
+
printf("query mode: %s\n", QUERYMODE[querymode]);
printf("number of clients: %d\n", nclients);
printf("number of threads: %d\n", nthreads);
***************
*** 2327,2332 **** main(int argc, char **argv)
--- 2535,2543 ----
{"unlogged-tables", no_argument, &unlogged_tables, 1},
{"sampling-rate", required_argument, NULL, 4},
{"aggregate-interval", required_argument, NULL, 5},
+ {"gaussian", required_argument, NULL, 6},
+ {"gaussian-n", required_argument, NULL, 7},
+ {"gaussian-s", required_argument, NULL, 8},
{"rate", required_argument, NULL, 'R'},
{NULL, 0, NULL, 0}
};
***************
*** 2606,2611 **** main(int argc, char **argv)
--- 2817,2849 ----
}
#endif
break;
+ case 6:
+ ttype = 4;
+ stdev_threshold = atof(optarg);
+ if(stdev_threshold < 2)
+ {
+ fprintf(stderr, "--gaussian=NUM must be at least 2: %f\n", stdev_threshold);
+ exit(1);
+ }
+ break;
+ case 7:
+ ttype = 5;
+ stdev_threshold = atof(optarg);
+ if(stdev_threshold < 2)
+ {
+ fprintf(stderr, "--gaussian-n=NUM must be at least 2: %f\n", stdev_threshold);
+ exit(1);
+ }
+ break;
+ case 8:
+ ttype = 6;
+ stdev_threshold = atof(optarg);
+ if(stdev_threshold < 2)
+ {
+ fprintf(stderr, "--gaussian-s=NUM must be at least 2: %f\n", stdev_threshold);
+ exit(1);
+ }
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
***************
*** 2803,2808 **** main(int argc, char **argv)
--- 3041,3057 ----
}
}
+ /* set :stdev_threshold variable */
+ if(getVariable(&state[0], "stdev_threshold") == NULL)
+ {
+ snprintf(val, sizeof(val), "%lf", stdev_threshold);
+ for (i = 0; i < nclients; i++)
+ {
+ if (!putVariable(&state[i], "startup", "stdev_threshold", val))
+ exit(1);
+ }
+ }
+
if (!is_no_vacuum)
{
fprintf(stderr, "starting vacuum...");
***************
*** 2841,2847 **** main(int argc, char **argv)
sql_files[0] = process_builtin(simple_update);
num_files = 1;
break;
!
default:
break;
}
--- 3090,3107 ----
sql_files[0] = process_builtin(simple_update);
num_files = 1;
break;
! case 4:
! sql_files[0] = process_builtin(gaussian_tpc_b);
! num_files = 1;
! break;
! case 5:
! sql_files[0] = process_builtin(gaussian_simple_update);
! num_files = 1;
! break;
! case 6:
! sql_files[0] = process_builtin(gaussian_select_only);
! num_files = 1;
! break;
default:
break;
}
***************
*** 3035,3041 **** threadRun(void *arg)
Command **commands = sql_files[st->use_file];
int prev_ecnt = st->ecnt;
! st->use_file = getrand(thread, 0, num_files - 1);
if (!doCustom(thread, st, &result->conn_time, logfile, &aggs))
remains--; /* I've aborted */
--- 3295,3302 ----
Command **commands = sql_files[st->use_file];
int prev_ecnt = st->ecnt;
! st->use_file = getrand(thread, 0, num_files - 1, DIST_UNIFORM, 0);
!
if (!doCustom(thread, st, &result->conn_time, logfile, &aggs))
remains--; /* I've aborted */
*** a/doc/src/sgml/pgbench.sgml
--- b/doc/src/sgml/pgbench.sgml
***************
*** 320,325 **** pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
--- 320,340 ----
</varlistentry>
<varlistentry>
+ <term><option>--gaussian=</option><replaceable>standard deviation threshold</></term>
+ <term><option>--gaussian-n=</option><replaceable>standard deviation threshold</></term>
+ <term><option>--gaussian-s=</option><replaceable>standard deviation threshold</></term>
+ <listitem>
+ <para>
+ Run the built-in scripts with a Gaussian (normal) distribution of
+ accessed rows instead of a uniform one. The argument is the standard
+ deviation threshold: the larger it is, the more the accesses concentrate
+ on a smaller set of rows. The minimum threshold is 2.
+ <option>--gaussian-n</option> and <option>--gaussian-s</option> are the
+ Gaussian counterparts of <option>-N</option> and <option>-S</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-j</option> <replaceable>threads</></term>
<term><option>--jobs=</option><replaceable>threads</></term>
<listitem>
***************
*** 770,775 **** pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
--- 785,812 ----
<varlistentry>
<term>
+ <literal>\setgaussian <replaceable>varname</> <replaceable>min</> <replaceable>max</> <replaceable>standard deviation threshold</></literal>
+ </term>
+
+ <listitem>
+ <para>
+ Sets variable <replaceable>varname</> to a random integer value
+ between the limits <replaceable>min</> and <replaceable>max</> inclusive,
+ drawn from a Gaussian distribution.
+ Each limit can be either an integer constant or a
+ <literal>:</><replaceable>variablename</> reference to a variable
+ having an integer value. The standard deviation threshold can likewise
+ be a constant or a variable reference, and must be at least 2.
+ </para>
+
+ <para>
+ Example:
+ <programlisting>
+ \setgaussian aid 1 :naccounts 5
+ </programlisting></para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>
<literal>\sleep <replaceable>number</> [ us | ms | s ]</literal>
</term>
[Attachment: graph.png (image/png)]