Add accurate option to pgbench

Started by Mitsumasa KONDO, about 12 years ago, 4 messages

kondo.mitsumasa@gmail.com

about 12 years ago

1 attachment(s)

Hi,

I have created a pgbench patch that adds an "accurate" option for benchmarking, and
submitted it to CF3.
It is a simple option for getting more accurate benchmark results and for avoiding
misleading results in pgbench.

The logic of this option is as follows.
1. Execute the CLUSTER command to sort records.
2. Execute CHECKPOINT to flush dirty buffers in shared_buffers.
3. Execute the sync command to flush dirty file caches in the OS.
4. Wait 10 seconds for the RAID cache to drain.
5. Execute CHECKPOINT again to reset the checkpoint_timeout and checkpoint_segments triggers.
6. Start the benchmark.

The sample output is as follows.

[mitsu-ko@vm-kondo pgbench]$ ./pgbench -a
starting cluster...end.
starting checkpoint...end.
starting sync all buffers and wait 10 seconds...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
accurate mode: on
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average: 0.000 ms
tps = 187.677120 (including connections establishing)
tps = 236.417798 (excluding connections establishing)

I hope it will become a recommended pgbench option in community development.
However, it might be overly cautious preparation before starting a benchmark.
Please give me your comments.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

Attachments:

pgbench_accurate_option_v0.patch (application/octet-stream)

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 816400f..6f11e25 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -360,6 +360,7 @@ usage(void)
 		   "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tables        create tables as unlogged tables\n"
 		   "\nBenchmarking options:\n"
+		   "  -a, --accurate           run pgbench in accurate mode\n"
 		   "  -c, --client=NUM         number of concurrent database clients (default: 1)\n"
 		   "  -C, --connect            establish new connection for each transaction\n"
 		   "  -D, --define=VARNAME=VALUE\n"
@@ -2252,7 +2253,8 @@ int
 main(int argc, char **argv)
 {
 	static struct option long_options[] = {
-		/* systematic long/short named options*/
+		/* systematic long/short named options */
+		{"accurate", no_argument, NULL, 'a'},
 		{"client", required_argument, NULL, 'c'},
 		{"connect", no_argument, NULL, 'C'},
 		{"debug", no_argument, NULL, 'd'},
@@ -2291,6 +2293,7 @@ main(int argc, char **argv)
 	int			nclients = 1;	/* default number of simulated clients */
 	int			nthreads = 1;	/* default number of threads */
 	int			is_init_mode = 0;		/* initialize mode? */
+	int			is_accurate_mode = 0; 		/* execute cluster and checkpoint before testing */
 	int			is_no_vacuum = 0;		/* no vacuum at all before testing? */
 	int			do_vacuum_accounts = 0; /* do vacuum accounts before testing? */
 	int			ttype = 0;		/* transaction type. 0: TPC-B, 1: SELECT only,
@@ -2354,10 +2357,13 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
-	while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:P:R:", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "aih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:P:R:", long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
+			case 'a':
+				is_accurate_mode++;
+				break;
 			case 'i':
 				is_init_mode++;
 				break;
@@ -2759,7 +2765,25 @@ main(int argc, char **argv)
 		}
 	}
 
-	if (!is_no_vacuum)
+	if (is_accurate_mode)
+	{
+		fprintf(stderr, "starting cluster...");
+		executeStatement(con, "cluster pgbench_accounts using pgbench_accounts_pkey");
+		executeStatement(con, "cluster pgbench_branches using pgbench_branches_pkey");
+		executeStatement(con, "cluster pgbench_tellers using pgbench_tellers_pkey");
+		executeStatement(con, "truncate pgbench_history");
+		fprintf(stderr, "end.\n");
+		fprintf(stderr, "starting checkpoint...");
+		executeStatement(con, "checkpoint");
+		fprintf(stderr, "end.\n");
+		fprintf(stderr, "starting sync all buffers and wait 10 seconds...");
+		sync();
+		/* wait 10 seconds until raid cache is empty */
+		pg_usleep(10 * 1000 * 1000);
+		executeStatement(con, "checkpoint");
+		fprintf(stderr, "end.\n");
+	}
+	else if (!is_no_vacuum)
 	{
 		fprintf(stderr, "starting vacuum...");
 		executeStatement(con, "vacuum pgbench_branches");
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index 8e1a05d..e4a949f 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -262,6 +262,18 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
 
     <variablelist>
 
+     <varlistentry>
+      <term><option>-a</option></term>
+      <term><option>--accurate</option></term>
+      <listitem>
+       <para>
+        Run pgbench in accurate mode, which executes CLUSTER, CHECKPOINT,
+        sync, and a second CHECKPOINT before starting the benchmark to get
+        more accurate benchmark results.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-c</option> <replaceable>clients</></term>
       <term><option>--client=</option><replaceable>clients</></term>

Robert Haas

robertmhaas@gmail.com

about 12 years ago

In reply to: Mitsumasa KONDO (#1)

Re: Add accurate option to pgbench

On Thu, Oct 31, 2013 at 6:36 AM, Mitsumasa KONDO
<kondo.mitsumasa@gmail.com> wrote:

> Hi,
>
> I have created a pgbench patch that adds an "accurate" option for benchmarking, and
> submitted it to CF3.
> It is a simple option for getting more accurate benchmark results and for avoiding
> misleading results in pgbench.
>
> The logic of this option is as follows.
> 1. Execute the CLUSTER command to sort records.
> 2. Execute CHECKPOINT to flush dirty buffers in shared_buffers.
> 3. Execute the sync command to flush dirty file caches in the OS.
> 4. Wait 10 seconds for the RAID cache to drain.
> 5. Execute CHECKPOINT again to reset the checkpoint_timeout and checkpoint_segments triggers.
> 6. Start the benchmark.

I have similar logic in some of my benchmarking scripts but I don't
see a compelling reason to include it in pgbench itself. You can
checkpoint, sync, and clear OS caches in your script before starting
the pgbench run. Requirements will vary from system to system; e.g.
some people might want to write to /proc/sys/vm/drop_caches, which is
both non-portable and not possible from within pgbench because it
requires additional privileges. More importantly, not everyone will
want to do it, and not everyone will want to write the same value.
Similarly, waiting 10 seconds for the RAID cache to drain is not
relevant for everyone, nor is it necessarily the right amount of time to
wait. We'll go nuts if we try to anticipate needs in this area in
pgbench; there will be many different right answers on individual
people's systems.
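
A minimal sketch of the kind of external driver described above, written in C to match the patch. It only builds the list of shell commands to run before the benchmark, so each step can be switched on or off per system. The names PrepConfig and build_prep_steps are hypothetical and exist nowhere in pgbench; psql, sync, sleep and pgbench are the usual command-line tools, and the drop_caches write is Linux-only and needs root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STEPS 8
#define STEP_LEN  128

/* Hypothetical per-system settings; none of these names exist in pgbench. */
typedef struct PrepConfig
{
	int			do_checkpoint;		/* issue CHECKPOINT through psql? */
	int			do_sync;			/* flush dirty OS file caches with sync? */
	int			drop_caches_value;	/* 0 = skip; otherwise written to drop_caches */
	int			settle_seconds;		/* 0 = skip; time to let a RAID cache drain */
} PrepConfig;

/*
 * Fill steps[] with the shell commands to run, in order, and return how
 * many were written.  The caller decides how to execute them, for example
 * by passing each one to system().
 */
static int
build_prep_steps(const PrepConfig *cfg, const char *dbname,
				 char steps[MAX_STEPS][STEP_LEN])
{
	int			n = 0;

	assert(cfg != NULL && dbname != NULL);

	if (cfg->do_checkpoint)
		snprintf(steps[n++], STEP_LEN, "psql -c \"CHECKPOINT\" %s", dbname);
	if (cfg->do_sync)
		snprintf(steps[n++], STEP_LEN, "sync");
	if (cfg->drop_caches_value > 0)
		snprintf(steps[n++], STEP_LEN,
				 "echo %d > /proc/sys/vm/drop_caches", cfg->drop_caches_value);
	if (cfg->settle_seconds > 0)
		snprintf(steps[n++], STEP_LEN, "sleep %d", cfg->settle_seconds);

	/* The benchmark itself always comes last. */
	snprintf(steps[n++], STEP_LEN, "pgbench %s", dbname);
	return n;
}
```

With do_checkpoint and do_sync set, drop_caches_value left at 0 and settle_seconds at 10, the plan for a database named bench is a CHECKPOINT through psql, sync, sleep 10, and then pgbench bench. A different machine can skip the sleep or choose another drop_caches value without touching pgbench.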

All of which is to say that I'm not in favor of accepting this patch.
As a side note, if we were going to accept it, I think --accurate
isn't a good name; there could be good reasons to want to run without
these behaviors, but who wouldn't want to be accurate?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Fabien COELHO

coelho@cri.ensmp.fr

about 12 years ago

In reply to: Robert Haas (#2)

Re: Add accurate option to pgbench

>> The logic of this option is as follows.
>> 1. Execute the CLUSTER command to sort records.
>> 2. Execute CHECKPOINT to flush dirty buffers in shared_buffers.
>> 3. Execute the sync command to flush dirty file caches in the OS.
>> 4. Wait 10 seconds for the RAID cache to drain.
>> 5. Execute CHECKPOINT again to reset the checkpoint_timeout and checkpoint_segments triggers.
>> 6. Start the benchmark.
>
> I have similar logic in some of my benchmarking scripts but I don't
> see a compelling reason to include it in pgbench itself.

I agree that this looks more like script material.

However, I think that part of this interesting checklist and discussion
could make it into a "caveat" section about reproducible performance
measurements in the pgbench documentation?

--
Fabien.


Tom Lane

tgl@sss.pgh.pa.us

about 12 years ago

In reply to: Fabien COELHO (#3)

Re: Add accurate option to pgbench

Fabien COELHO <coelho@cri.ensmp.fr> writes:

> However, I think that part of this interesting checklist and discussion
> could make it into a "caveat" section about reproducible performance
> measurements in the pgbench documentation?

+1. There's already a section of advice about how to get reproducible
numbers from pgbench --- we could certainly extend that to cover more
things.

BTW, even if we were going to put code for these things into pgbench,
driving them all off a single switch would be very bad design.
I see no reason to think that all and only these issues would be
appropriate to control for any particular user of pgbench.
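
To illustrate the design point about a single switch, here is a rough sketch of how each preparation step could get its own option with getopt_long, the same facility the patch already uses. The option names --pre-cluster, --pre-checkpoint, --pre-sync and --settle-time, as well as PrepOptions and parse_prep_options, are made up for this example; pgbench has no such switches.

```c
#include <assert.h>
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical settings, one field per preparation step. */
typedef struct PrepOptions
{
	int			pre_cluster;	/* run CLUSTER before the benchmark? */
	int			pre_checkpoint; /* run CHECKPOINT before the benchmark? */
	int			pre_sync;		/* call sync() before the benchmark? */
	int			settle_seconds; /* seconds to wait afterwards, 0 = none */
} PrepOptions;

/* Parse argv into *opts; returns 0 on success, -1 on an unknown option. */
static int
parse_prep_options(int argc, char **argv, PrepOptions *opts)
{
	static const struct option long_options[] = {
		{"pre-cluster", no_argument, NULL, 1},
		{"pre-checkpoint", no_argument, NULL, 2},
		{"pre-sync", no_argument, NULL, 3},
		{"settle-time", required_argument, NULL, 4},
		{NULL, 0, NULL, 0}
	};
	int			c;

	assert(opts != NULL);
	opts->pre_cluster = 0;
	opts->pre_checkpoint = 0;
	opts->pre_sync = 0;
	opts->settle_seconds = 0;

	optind = 1;					/* restart scanning so the function can be reused */
	while ((c = getopt_long(argc, argv, "", long_options, NULL)) != -1)
	{
		switch (c)
		{
			case 1:
				opts->pre_cluster = 1;
				break;
			case 2:
				opts->pre_checkpoint = 1;
				break;
			case 3:
				opts->pre_sync = 1;
				break;
			case 4:
				opts->settle_seconds = atoi(optarg);
				break;
			default:
				return -1;		/* getopt_long has already reported the error */
		}
	}
	return 0;
}
```

A user who only cares about checkpoint timing could then pass --pre-checkpoint alone, while someone with a battery-backed RAID controller might add --settle-time=30; neither has to accept the whole bundle.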

regards, tom lane
