[PATCH] pgbench various mods for high volume testing
Hello. Attached is a patch that I created against REL9_2_4 for
contrib/pgbench. I am willing to re-work the patch for HEAD or another
version if you choose to accept the patch.
The patch adds a number of modifications to pgbench to facilitate
benchmarking with many client processes across many hosts: in my
testing, over 100,000 connections sending over 500,000 transactions
per second from over 500 pgbench processes on a dozen client hosts.
This effort was for an open source RDBMS that I have created which
speaks the PostgreSQL Frontend/Backend Protocol. I would like to get
approval to have this patch placed in the main branch for pgbench so
that I don't have to maintain a distinct patch. Even though I created
this patch to test a product which is not PostgreSQL, I hope that you
find the modifications to be useful for PostgreSQL testing, at least
at very high volumes.
That background out of the way, here are the additional features:
----------------------------------
--urandom: use /dev/urandom to provide seed values for randomness.
Without this, multiple pgbench processes are likely to generate the
same sequence of "random" numbers. This was noticeable in InfiniSQL
benchmarking because of the resulting extremely high rate of locked
records from having stored procedures invoked with identical parameter
values.
--per-second=NUM: report per-second throughput rate on stdout. NUM is
the quantity of transactions in each batch that gets counted. The
higher the value, the less frequently gettimeofday gets called.
gettimeofday invocation can become a limiting factor as throughput
increases, so minimizing it is beneficial. For example, with NUM of
100, time will be checked every 100 transactions, which will cause the
per-second output to be in multiples of 100. This enables fine-grained
(per second) analysis of transaction throughput.
-P PASSWORD: pass the db password on the command line. This is
necessary for InfiniSQL benchmarking because hundreds or more separate
pgbench processes can be launched, and InfiniSQL requires password
authentication. Having to manually enter all those passwords would
make benchmarking impossible.
-I: do not abort connection if transaction error is encountered.
InfiniSQL returns an error if records are locked, so pgbench was
patched to tolerate this. This is pending a fix, but until then,
pgbench needs to carry on. The specific error emitted from the server
is written to stderr for each occurrence. The total quantity of
transactions is not incremented if there's an error.
----------------------------
Thank you for your consideration. More background about how I used the
patch is at http://www.infinisql.org
If you find this patch to be useful, then I am willing to modify the
patch as necessary to get it accepted into the code base. I made sure
to create it as a '-c' patch and I haven't snuck in any rogue
whitespace. Apply it in the root of REL9_2_4 as: patch -p1 <
pgbench_persecond-v1.patch
Sincerely,
Mark Travis
Attachment: pgbench_persecond-v1.patch (application/octet-stream)
diff -rcpN original/contrib/pgbench/pgbench.c new/contrib/pgbench/pgbench.c
*** original/contrib/pgbench/pgbench.c 2013-04-01 11:20:36.000000000 -0700
--- new/contrib/pgbench/pgbench.c 2013-11-12 21:03:45.349693960 -0800
*************** int fillfactor = 100;
*** 125,130 ****
--- 125,168 ----
int unlogged_tables = 0;
/*
+ * do not close client if query error is encountered
+ */
+ int proceed_on_error = 0;
+
+ /*
+ * start second for per-second rate reporting
+ */
+ uint64 persecondstart;
+
+ /*
+ * number of seconds to keep tally
+ *
+ */
+ #define PERSECOND_NUMSECONDS 604800
+
+ /*
+ * per-second report table
+ * persecond[threadnum][second]
+ */
+ int **persecond;
+
+ /*
+ * size of transaction batches to report on per second
+ *
+ */
+ int persecondbatchsize;
+
+ /*
+ * per thread per second completed transactions tally
+ */
+ int *persecondtally;
+
+ /*
+ * use /dev/urandom for random seed state
+ */
+ int use_urandom = 0;
+
+ /*
* tablespace selection
*/
char *tablespace = NULL;
*************** char *pgoptions = NULL;
*** 150,155 ****
--- 188,194 ----
char *pgtty = NULL;
char *login = NULL;
char *dbName;
+ char *password;
volatile bool timer_exceeded = false; /* flag from signal handler */
*************** usage(const char *progname)
*** 361,366 ****
--- 400,406 ----
" -D VARNAME=VALUE\n"
" define variable for use by custom script\n"
" -f FILENAME read transaction script from FILENAME\n"
+ " -I do not abort connection if query error is encountered\n"
" -j NUM number of threads (default: 1)\n"
" -l write transaction times to log file\n"
" -M simple|extended|prepared\n"
*************** usage(const char *progname)
*** 373,383 ****
--- 413,433 ----
" -t NUM number of transactions each client runs (default: 10)\n"
" -T NUM duration of benchmark test in seconds\n"
" -v vacuum all four standard tables before tests\n"
+ " --per-second=NUM\n"
+ " report per second throughput rate per thread. NUM is the # of\n"
+ " statements in each batch to be added to the per second tally.\n"
+ " As NUM increases, the sampling rate to get the current\n"
+ " time decreases.\n"
+
+ " Note that one tally is made per statement in a multi-statement\n"
+ " transaction, including BEGIN and COMMIT.\n"
+ " --urandom use /dev/urandom for seeding random number generator\n"
"\nCommon options:\n"
" -d print debugging output\n"
" -h HOSTNAME database server host or socket directory\n"
" -p PORT database server port number\n"
" -U USERNAME connect as specified database user\n"
+ " -P PASSWORD send specified password\n"
" -V, --version output version information, then exit\n"
" -?, --help show this help, then exit\n"
"\n"
*************** static PGconn *
*** 421,427 ****
doConnect(void)
{
PGconn *conn;
- static char *password = NULL;
bool new_pass;
/*
--- 471,476 ----
*************** top:
*** 884,895 ****
{
case PGRES_COMMAND_OK:
case PGRES_TUPLES_OK:
break; /* OK */
default:
! fprintf(stderr, "Client %d aborted in state %d: %s",
! st->id, st->state, PQerrorMessage(st->con));
! PQclear(res);
! return clientDone(st, false);
}
PQclear(res);
discard_response(st);
--- 933,965 ----
{
case PGRES_COMMAND_OK:
case PGRES_TUPLES_OK:
+ if (persecondbatchsize)
+ {
+ if (!(++persecondtally[thread->tid] % persecondbatchsize))
+ {
+ instr_time now;
+ uint64 nowsec;
+ INSTR_TIME_SET_CURRENT(now);
+ nowsec = INSTR_TIME_GET_MICROSEC(now) / 1000000;
+ if (nowsec-persecondstart <= PERSECOND_NUMSECONDS-1)
+ {
+ persecond[thread->tid][nowsec-persecondstart] += persecondtally[thread->tid];
+ persecondtally[thread->tid] = 0;
+ }
+ }
+ }
break; /* OK */
default:
! if (!proceed_on_error)
! {
! fprintf(stderr, "Client %d aborted in state %d: %s", st->id, st->state, PQerrorMessage(st->con));
! PQclear(res);
! return clientDone(st, false);
! }
! else
! {
! fprintf(stderr, "Client %d proceeding after error in state %d: %s", st->id, st->state, PQerrorMessage(st->con));
! }
}
PQclear(res);
discard_response(st);
*************** printResults(int ttype, int normal_xacts
*** 1841,1846 ****
--- 1911,1948 ----
}
}
}
+
+ if (persecondbatchsize)
+ {
+ int lastsecond=-1;
+ int n;
+ int tnum;
+ printf("\nsecond");
+ for (tnum=0; tnum < nthreads; tnum++)
+ {
+ printf(",thread%i", tnum);
+ }
+ printf("\n");
+ for (tnum=0; tnum < nthreads; tnum++)
+ {
+ for (n=0; n < PERSECOND_NUMSECONDS; n++)
+ {
+ if (lastsecond < n && persecond[tnum][n])
+ {
+ lastsecond = n;
+ }
+ }
+ }
+ for (n=0; n < lastsecond; n++)
+ {
+ printf("%lu", persecondstart+n);
+ for (tnum=0; tnum < nthreads; tnum++)
+ {
+ printf(",%i", persecond[tnum][n]);
+ }
+ printf("\n");
+ }
+ }
}
*************** main(int argc, char **argv)
*** 1868,1875 ****
--- 1970,1980 ----
int total_xacts;
int i;
+ FILE *urandomfd;
static struct option long_options[] = {
+ {"urandom", no_argument, &use_urandom, 1},
+ {"per-second", required_argument, NULL, 4},
{"index-tablespace", required_argument, NULL, 3},
{"tablespace", required_argument, NULL, 2},
{"unlogged-tables", no_argument, &unlogged_tables, 1},
*************** main(int argc, char **argv)
*** 1919,1925 ****
state = (CState *) xmalloc(sizeof(CState));
memset(state, 0, sizeof(CState));
! while ((c = getopt_long(argc, argv, "ih:nvp:dSNc:j:Crs:t:T:U:lf:D:F:M:", long_options, &optindex)) != -1)
{
switch (c)
{
--- 2024,2030 ----
state = (CState *) xmalloc(sizeof(CState));
memset(state, 0, sizeof(CState));
! while ((c = getopt_long(argc, argv, "ih:nvp:dSNc:j:Crs:t:T:U:lf:D:F:M:P:I", long_options, &optindex)) != -1)
{
switch (c)
{
*************** main(int argc, char **argv)
*** 2071,2076 ****
--- 2176,2187 ----
exit(1);
}
break;
+ case 'P':
+ password = optarg;
+ break;
+ case 'I':
+ proceed_on_error = 1;
+ break;
case 0:
/* This covers long options which take no argument. */
break;
*************** main(int argc, char **argv)
*** 2080,2085 ****
--- 2191,2199 ----
case 3: /* index-tablespace */
index_tablespace = optarg;
break;
+ case 4: /* per-second */
+ persecondbatchsize = atoi(optarg);
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
*************** main(int argc, char **argv)
*** 2116,2132 ****
}
/*
! * is_latencies only works with multiple threads in thread-based
! * implementations, not fork-based ones, because it supposes that the
* parent can see changes made to the per-thread execution stats by child
* threads. It seems useful enough to accept despite this limitation, but
* perhaps we should FIXME someday (by passing the stats data back up
* through the parent-to-child pipes).
*/
#ifndef ENABLE_THREAD_SAFETY
! if (is_latencies && nthreads > 1)
{
! fprintf(stderr, "-r does not work with -j larger than 1 on this platform.\n");
exit(1);
}
#endif
--- 2230,2247 ----
}
/*
! * is_latencies and per second reporting only work
! * with multiple threads in thread-based
! * implementations, not fork-based ones, because they suppose that the
* parent can see changes made to the per-thread execution stats by child
* threads. It seems useful enough to accept despite this limitation, but
* perhaps we should FIXME someday (by passing the stats data back up
* through the parent-to-child pipes).
*/
#ifndef ENABLE_THREAD_SAFETY
! if ((is_latencies || persecondbatchsize) && nthreads > 1)
{
! fprintf(stderr, "-r and --per-second do not work with -j larger than 1 on this platform.\n");
exit(1);
}
#endif
*************** main(int argc, char **argv)
*** 2271,2279 ****
thread->tid = i;
thread->state = &state[nclients / nthreads * i];
thread->nstate = nclients / nthreads;
! thread->random_state[0] = random();
! thread->random_state[1] = random();
! thread->random_state[2] = random();
if (is_latencies)
{
--- 2386,2411 ----
thread->tid = i;
thread->state = &state[nclients / nthreads * i];
thread->nstate = nclients / nthreads;
! if (use_urandom==0)
! {
! thread->random_state[0] = random();
! thread->random_state[1] = random();
! thread->random_state[2] = random();
! }
! else
! {
! urandomfd = fopen("/dev/urandom", "r");
! if (urandomfd == NULL)
! {
! fprintf(stderr, "could not open /dev/urandom: %s\n", strerror(errno));
! exit(1);
! }
! if (fread(thread->random_state, sizeof(unsigned short), 3, urandomfd) != 3)
! {
! fprintf(stderr, "couldn't read from /dev/urandom: %s\n", strerror(errno));
! exit(1);
! }
! }
if (is_latencies)
{
*************** main(int argc, char **argv)
*** 2305,2310 ****
--- 2437,2469 ----
if (duration > 0)
setalarm(duration);
+ if (persecondbatchsize)
+ {
+ int n;
+ persecond = malloc(nthreads * sizeof(int *));
+ if (!persecond)
+ {
+ fprintf(stderr, "out of memory for persecond report\n");
+ exit(1);
+ }
+ for (n=0; n < nthreads; n++)
+ {
+ persecond[n] = (int *)calloc(PERSECOND_NUMSECONDS, sizeof(int));
+ if (!persecond[n])
+ {
+ fprintf(stderr, "out of memory for persecond report\n");
+ exit(1);
+ }
+ }
+ persecondtally = calloc(nthreads, sizeof(int));
+ if (!persecondtally)
+ {
+ fprintf(stderr, "out of memory for persecondtally\n");
+ exit(1);
+ }
+ persecondstart = INSTR_TIME_GET_MICROSEC(start_time) / 1000000;
+ }
+
/* start threads */
for (i = 0; i < nthreads; i++)
{
A non-authoritative answer from previous experience trying to improve
pgbench:
Hello. Attached is a patch that I created against REL9_2_4 for
contrib/pgbench. I am willing to re-work the patch for HEAD or another
version if you choose to accept the patch.
It rather works the other way around: "you submit a patch which gets
accepted or not, possibly after (too) heavy discussions". It is not "you
submit an idea, it gets accepted, and the patch you submit later is
applied". There is a commitfest for submitting patches, see
http://commitfest.postgresql.org.
Moreover, people do not like bundled multi-purpose patches, so at a minimum
the patch will have to be split per feature.
That background out of the way, here are the additional features:
--urandom: use /dev/urandom to provide seed values for randomness.
Without this, multiple pgbench processes are likely to generate the
same sequence of "random" numbers. This was noticeable in InfiniSQL
benchmarking because of the resulting extremely high rate of locked
records from having stored procedures invoked with identical parameter
values.
This looks unix/linux specific? I think that, if possible, the randomness
issue should be kept out of "pgbench"?
--per-second=NUM: report per-second throughput rate on stdout. NUM is
the quantity of transactions in each batch that gets counted. The
higher the value, the less frequently gettimeofday gets called.
gettimeofday invocation can become a limiting factor as throughput
increases, so minimizing it is beneficial. For example, with NUM of
100, time will be checked every 100 transactions, which will cause the
per-second output to be in multiples of 100. This enables fine-grained
(per second) analysis of transaction throughput.
See existing option --progress. I do not understand how a transaction may
not be counted. Do you mean measured?
My measurements of the cost of gettimeofday() calls show that, for actual
transactions involving disk read/write operations on a Linux system,
the impact is really small; this is also true for read-only accesses,
where it is small relative to the network traffic (sending/receiving the
transaction). But some people have expressed concerns about gettimeofday
costs in the past.
-P PASSWORD: pass the db password on the command line. This is
necessary for InfiniSQL benchmarking because hundreds or more separate
pgbench processes can be launched, and InfiniSQL requires password
authentication. Having to manually enter all those passwords would
make benchmarking impossible.
Hmmm... $HOME/.pgpass is your friend? Or consider an environment variable?
The idea is to avoid having a password in your shell history.
-I: do not abort connection if transaction error is encountered.
InfiniSQL returns an error if records are locked, so pgbench was
patched to tolerate this. This is pending a fix, but until then,
pgbench needs to carry on. The specific error emitted from the server
is written to stderr for each occurrence. The total quantity of
transactions is not incremented if there's an error.
No opinion about this one.
--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013-11-13 08:35:31 +0100, Fabien COELHO wrote:
Hello. Attached is a patch that I created against REL9_2_4 for
contrib/pgbench. I am willing to re-work the patch for HEAD or another
version if you choose to accept the patch.
It rather works the other way around: "you submit a patch which gets
accepted or not, possibly after (too) heavy discussions". It is not "you
submit an idea, and it gets accepted, and the patch you will submit is
applied later on". There is a commitfest to submit patches, see
http://commitfest.postgresql.org.
Well, you certainly can, are even encouraged to, ask for feedback about
a feature before spending significant time on it. So interest certainly
cannot be a guarantee for acceptance, but it certainly is helpful.
That background out of the way, here are the additional features:
--urandom: use /dev/urandom to provide seed values for randomness.
Without this, multiple pgbench processes are likely to generate the
same sequence of "random" numbers. This was noticeable in InfiniSQL
benchmarking because of the resulting extremely high rate of locked
records from having stored procedures invoked with identical parameter
values.
This looks unix/linux specific? I think that, if possible, the randomness
issue should be kept out of "pgbench"?
urandom is available on a couple of platforms, not just linux. I don't
see a big problem with making the current srandom() invocation more complex.
-I: do not abort connection if transaction error is encountered.
InfiniSQL returns an error if records are locked, so pgbench was
patched to tolerate this. This is pending a fix, but until then,
pgbench needs to carry on. The specific error emitted from the server
is written to stderr for each occurrence. The total quantity of
transactions is not incremented if there's an error.
I am not sure whether I like the implementation, not having looked at it,
but I certainly think this is a useful feature. I think the error rate
should be computed instead of errors just being disregarded, though.
It might also be worthwhile to add code to automatically retry
transactions that fail with an error indicating a transient problem (like
serialization failures).
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services