pgbench --tuple-size option

Started by Fabien COELHO over 11 years ago, 11 messages
#1Fabien COELHO
coelho@cri.ensmp.fr
1 attachment(s)

After publishing some test results with pgbench on SSD with varying page
size, Josh Berkus pointed out that pgbench uses small 100-byte tuples,
and that results may be different with other tuple sizes.

This patch adds an option to change the default tuple size, so that this
can be tested easily.

--
Fabien.

Attachments:

pgbench-tupsize-1.patch (text/x-diff; charset=us-ascii)
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 2f7d80e..709022c 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -119,6 +119,10 @@ int			scale = 1;
  */
 int			fillfactor = 100;
 
+/* approximate number of bytes for rows
+ */
+int			tupsize = 100;
+
 /*
  * create foreign key constraints on the tables?
  */
@@ -359,6 +363,7 @@ usage(void)
 	"                           create indexes in the specified tablespace\n"
 	 "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tables        create tables as unlogged tables\n"
+		   "  --tuple-size=NUM         target tuple size (default: 100)\n"
 		   "\nBenchmarking options:\n"
 		   "  -c, --client=NUM         number of concurrent database clients (default: 1)\n"
 		   "  -C, --connect            establish new connection for each transaction\n"
@@ -1704,32 +1709,37 @@ init(bool is_no_vacuum)
 		const char *table;		/* table name */
 		const char *smcols;		/* column decls if accountIDs are 32 bits */
 		const char *bigcols;	/* column decls if accountIDs are 64 bits */
-		int			declare_fillfactor;
+		int			declare_fillfactor; /* whether to add a fillfactor */
+		int			tupsize_correction; /* tupsize correction */
 	};
 	static const struct ddlinfo DDLs[] = {
 		{
 			"pgbench_history",
-			"tid int,bid int,aid    int,delta int,mtime timestamp,filler char(22)",
-			"tid int,bid int,aid bigint,delta int,mtime timestamp,filler char(22)",
-			0
+			"tid int,bid int,aid    int,delta int,mtime timestamp,filler char",
+			"tid int,bid int,aid bigint,delta int,mtime timestamp,filler char",
+			0,
+			78
 		},
 		{
 			"pgbench_tellers",
-			"tid int not null,bid int,tbalance int,filler char(84)",
-			"tid int not null,bid int,tbalance int,filler char(84)",
-			1
+			"tid int not null,bid int,tbalance int,filler char",
+			"tid int not null,bid int,tbalance int,filler char",
+			1,
+			16
 		},
 		{
 			"pgbench_accounts",
-			"aid    int not null,bid int,abalance int,filler char(84)",
-			"aid bigint not null,bid int,abalance int,filler char(84)",
-			1
+			"aid    int not null,bid int,abalance int,filler char",
+			"aid bigint not null,bid int,abalance int,filler char",
+			1,
+			16
 		},
 		{
 			"pgbench_branches",
-			"bid int not null,bbalance int,filler char(88)",
-			"bid int not null,bbalance int,filler char(88)",
-			1
+			"bid int not null,bbalance int,filler char",
+			"bid int not null,bbalance int,filler char",
+			1,
+			12
 		}
 	};
 	static const char *const DDLINDEXes[] = {
@@ -1767,6 +1777,9 @@ init(bool is_no_vacuum)
 		char		buffer[256];
 		const struct ddlinfo *ddl = &DDLs[i];
 		const char *cols;
+		int 		ts = tupsize - ddl->tupsize_correction;
+
+		if (ts < 1) ts = 1;
 
 		/* Remove old table, if it exists. */
 		snprintf(buffer, sizeof(buffer), "drop table if exists %s", ddl->table);
@@ -1790,9 +1803,9 @@ init(bool is_no_vacuum)
 
 		cols = (scale >= SCALE_32BIT_THRESHOLD) ? ddl->bigcols : ddl->smcols;
 
-		snprintf(buffer, sizeof(buffer), "create%s table %s(%s)%s",
+		snprintf(buffer, sizeof(buffer), "create%s table %s(%s(%d))%s",
 				 unlogged_tables ? " unlogged" : "",
-				 ddl->table, cols, opts);
+				 ddl->table, cols, ts, opts);
 
 		executeStatement(con, buffer);
 	}
@@ -2504,6 +2517,7 @@ main(int argc, char **argv)
 		{"unlogged-tables", no_argument, &unlogged_tables, 1},
 		{"sampling-rate", required_argument, NULL, 4},
 		{"aggregate-interval", required_argument, NULL, 5},
+		{"tuple-size", required_argument, NULL, 6},
 		{"rate", required_argument, NULL, 'R'},
 		{NULL, 0, NULL, 0}
 	};
@@ -2811,6 +2825,15 @@ main(int argc, char **argv)
 				}
 #endif
 				break;
+			case 6:
+				initialization_option_set = true;
+				tupsize = atoi(optarg);
+				if (tupsize <= 0)
+				{
+					fprintf(stderr, "invalid tuple size: %d\n", tupsize);
+					exit(1);
+				}
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index 23bfa9e..e6210e7 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -515,6 +515,15 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
      </varlistentry>
 
      <varlistentry>
+      <term><option>--tuple-size=</option><replaceable>size</></term>
+      <listitem>
+       <para>
+        Set target size of tuples. Default is 100 bytes.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>-v</option></term>
       <term><option>--vacuum-all</option></term>
       <listitem>
#2Andres Freund
andres@2ndquadrant.com
In reply to: Fabien COELHO (#1)
Re: pgbench --tuple-size option

On 2014-08-15 11:46:52 +0200, Fabien COELHO wrote:

> After publishing some test results with pgbench on SSD with varying page
> size, Josh Berkus pointed out that pgbench uses small 100-byte tuples, and
> that results may be different with other tuple sizes.
>
> This patch adds an option to change the default tuple size, so that this can
> be tested easily.

I don't think it's beneficial to put this into pgbench. There really
isn't a relevant benefit over using a custom script here.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Andres Freund (#2)
Re: pgbench --tuple-size option

Hello Andres,

>> This patch adds an option to change the default tuple size, so that this can
>> be tested easily.
>
> I don't think it's beneficial to put this into pgbench. There really
> isn't a relevant benefit over using a custom script here.

The scripts to run are the standard ones. The difference is in the
*initialization* phase (-i), namely the filler attribute size. There is no
custom script for initialization in pgbench, so ISTM that this argument
does not apply here.
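
As a rough illustration of what the option changes at initialization time, here is a minimal sketch of the filler sizing done by the attached patch. The helper names `filler_width` and `build_create_table` are hypothetical (the patch does this inline in init()), and the "unlogged" and fillfactor parts of the real statement are left out for brevity.

```c
#include <stdio.h>

/*
 * Sketch of the patch's filler sizing: the requested tuple size minus a
 * per-table correction, clamped to 1 because char(0) is not a valid type.
 * Hypothetical helper; the patch computes this inline in init().
 */
static int
filler_width(int tupsize, int tupsize_correction)
{
    int ts = tupsize - tupsize_correction;

    return (ts < 1) ? 1 : ts;
}

/*
 * Append the width to the column list, which ends in "filler char",
 * in the same way as the patched snprintf call.
 */
static int
build_create_table(char *buf, size_t len, const char *table,
                   const char *cols, int width)
{
    return snprintf(buf, len, "create table %s(%s(%d))", table, cols, width);
}
```

With the default of 100 and the corrections from the patch (78, 16, 16 and 12), this yields char(22), char(84), char(84) and char(88), which are the widths that were hard-coded before.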

--
Fabien.


#4Andres Freund
andres@2ndquadrant.com
In reply to: Fabien COELHO (#3)
Re: pgbench --tuple-size option

On 2014-08-15 11:58:41 +0200, Fabien COELHO wrote:

> Hello Andres,
>
>>> This patch adds an option to change the default tuple size, so that this can
>>> be tested easily.
>>
>> I don't think it's beneficial to put this into pgbench. There really
>> isn't a relevant benefit over using a custom script here.
>
> The scripts to run are the standard ones. The difference is in the
> *initialization* phase (-i), namely the filler attribute size. There is no
> custom script for initialization in pgbench, so ISTM that this argument does
> not apply here.

The custom initialization is to run a manual ALTER after the
initialization.
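
For illustration, such a manual step might look like the statement built below. This is only a sketch: the thread does not say which ALTER is meant, so changing the type of the "filler" column is an assumption, and `build_alter_filler` is a hypothetical helper, not part of pgbench.

```c
#include <stdio.h>

/*
 * Build an ALTER statement that changes the declared width of the
 * "filler" column of one pgbench table (hypothetical helper; the
 * choice of statement is an assumption).
 */
static int
build_alter_filler(char *buf, size_t len, const char *table, int width)
{
    return snprintf(buf, len,
                    "alter table %s alter column filler type char(%d)",
                    table, width);
}
```

The resulting text would be run by hand (for example with psql) once "pgbench -i" has finished and before the benchmark run itself.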

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#5Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Andres Freund (#4)
Re: pgbench --tuple-size option

>>> I don't think it's beneficial to put this into pgbench. There really
>>> isn't a relevant benefit over using a custom script here.
>>
>> The scripts to run are the standard ones. The difference is in the
>> *initialization* phase (-i), namely the filler attribute size. There is no
>> custom script for initialization in pgbench, so ISTM that this argument does
>> not apply here.
>
> The custom initialization is to run a manual ALTER after the
> initialization.

Sure, it can be done this way.

I'm not sure about the implication of ALTER on the table storage, thus I
prefer all benchmarks to run exactly the same straightforward way in all
cases so as to avoid unwanted effects on what I'm trying to measure, which
is already noisy and unstable enough.

--
Fabien.


#6Andres Freund
andres@2ndquadrant.com
In reply to: Fabien COELHO (#5)
Re: pgbench --tuple-size option

On 2014-08-15 12:17:31 +0200, Fabien COELHO wrote:

>>>> I don't think it's beneficial to put this into pgbench. There really
>>>> isn't a relevant benefit over using a custom script here.
>>>
>>> The scripts to run are the standard ones. The difference is in the
>>> *initialization* phase (-i), namely the filler attribute size. There is no
>>> custom script for initialization in pgbench, so ISTM that this argument does
>>> not apply here.
>>
>> The custom initialization is to run a manual ALTER after the
>> initialization.
>
> Sure, it can be done this way.
>
> I'm not sure about the implication of ALTER on the table storage,

Should be fine in this case. But if that's what you're concerned about -
understandably - it seems to make more sense to split -i into two. One
to create the tables, and another to fill them. That'd allow doing
manual stuff in between.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#7Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Andres Freund (#6)
Re: pgbench --tuple-size option

>> I'm not sure about the implication of ALTER on the table storage,
>
> Should be fine in this case. But if that's what you're concerned about -
> understandably -

Indeed, my (long) experience with benchmarks is that it is much more
complicated than it looks if you want to really understand what you are
getting, and to get anything meaningful.

> it seems to make more sense to split -i into two. One to create the
> tables, and another to fill them. That'd allow doing manual stuff
> in between.

Hmmm. This would mean many more changes than the pretty trivial patch I
submitted: more options (2 parts init + compatibility with the previous
case), splitting the "init" function, having a dependency and new error
cases to check (you must have the tables to fill them), some options apply
to the first part while others apply to the second part, which would lead
in any case to significantly more complicated documentation... a lot of
trouble for my use case, which is to answer Josh's pertinent comments and
to be able to test the "tuple size" factor easily. Moreover, I would reject
it myself as too much trouble for a small benefit.

Feel free to reject the patch if you do not want it. I think that its
cost/benefit is reasonable (one small option, small code changes, some
benefit for people who want to measure performance in various cases).

--
Fabien.


#8Andres Freund
andres@2ndquadrant.com
In reply to: Fabien COELHO (#7)
Re: pgbench --tuple-size option

On 2014-08-15 13:33:20 +0200, Fabien COELHO wrote:

>> it seems to make more sense to split -i into two. One to create the
>> tables, and another to fill them. That'd allow doing manual stuff
>> in between.
>
> Hmmm. This would mean many more changes than the pretty trivial patch I
> submitted

FWIW, I find that patch really ugly. Adding the filler's width in a
printf, after the actual DDL declaration. Without so much as a
comment. Brr.

> : more options (2 parts init + compatibility with the previous
> case), splitting the "init" function, having a dependency and new error
> cases to check (you must have the tables to fill them), some options apply
> to the first part while others apply to the second part, which would lead
> in any case to significantly more complicated documentation... a lot of
> trouble for my use case, which is to answer Josh's pertinent comments and
> to be able to test the "tuple size" factor easily. Moreover, I would reject
> it myself as too much trouble for a small benefit.

Well, it's something more generic, because it allows you to do more...

> Feel free to reject the patch if you do not want it. I think that its
> cost/benefit is reasonable (one small option, small code changes, some
> benefit for people who want to measure performance in various cases).

I personally think this isn't worth the price. But I'm just one guy.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#9Fujii Masao
masao.fujii@gmail.com
In reply to: Andres Freund (#8)
Re: pgbench --tuple-size option

On Fri, Aug 15, 2014 at 8:36 PM, Andres Freund <andres@2ndquadrant.com> wrote:

> On 2014-08-15 13:33:20 +0200, Fabien COELHO wrote:
>
>>> it seems to make more sense to split -i into two. One to create the
>>> tables, and another to fill them. That'd allow doing manual stuff
>>> in between.
>>
>> Hmmm. This would mean many more changes than the pretty trivial patch I
>> submitted
>
> FWIW, I find that patch really ugly. Adding the filler's width in a
> printf, after the actual DDL declaration. Without so much as a
> comment. Brr.
>
>> : more options (2 parts init + compatibility with the previous
>> case), splitting the "init" function, having a dependency and new error
>> cases to check (you must have the tables to fill them), some options apply
>> to the first part while others apply to the second part, which would lead
>> in any case to significantly more complicated documentation... a lot of
>> trouble for my use case, which is to answer Josh's pertinent comments and
>> to be able to test the "tuple size" factor easily. Moreover, I would reject
>> it myself as too much trouble for a small benefit.
>
> Well, it's something more generic, because it allows you to do more...
>
>> Feel free to reject the patch if you do not want it. I think that its
>> cost/benefit is reasonable (one small option, small code changes, some
>> benefit for people who want to measure performance in various cases).
>
> I personally think this isn't worth the price. But I'm just one guy.

I also don't like this feature. The benefit of this option seems too small.
If we apply this, we might want to support other options, for example,
an option to change the data type of each column, an option to create a new
index using "minmax", an option to change the fillfactor of each table, etc.
There are countless such options, and I'm afraid that it would be really
hard to support so many of them.

Regards,

--
Fujii Masao


#10Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Andres Freund (#8)
Re: pgbench --tuple-size option

>> Hmmm. This would mean many more changes than the pretty trivial patch I
>> submitted
>
> FWIW, I find that patch really ugly. Adding the filler's width in a
> printf, after the actual DDL declaration. Without so much as a
> comment. Brr.

Indeed. I'm not too proud of that very point either:-) You are right that
it deserves at the minimum a clear comment. To put the varying size in the
DDL string means vsprintf and splitting the query building some more,
which I do not find desirable.
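
To make the trade-off concrete, here is a minimal sketch of one possible reading of "putting the varying size in the DDL string": the column list carries a %d placeholder that is expanded in a first pass, and the statement is assembled in a second pass. The helper name `build_create_table_2pass` and the 256-byte intermediate buffer are assumptions for illustration, not code from the patch.

```c
#include <stdio.h>

/*
 * Hypothetical alternative to the patch: the filler width appears as a
 * %d placeholder inside the column declarations, so the DDL string shows
 * where the size goes, at the cost of formatting in two steps.
 */
static int
build_create_table_2pass(char *buf, size_t len, const char *table,
                         const char *cols_template, int width)
{
    char cols[256];

    /* First pass: substitute the filler width into the column list. */
    snprintf(cols, sizeof(cols), cols_template, width);

    /* Second pass: assemble the statement around the expanded columns. */
    return snprintf(buf, len, "create table %s(%s)", table, cols);
}
```

The extra buffer and the extra formatting pass are the kind of additional query-building work that the message above describes as undesirable.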

[...]
> Well, it's something more generic, because it allows you to do more...

Apart from the fact that I do not need it (at least right now), and that it
is more work, my opinion is that it would be rejected. Not a strong incentive
to spend time in that direction.

--
Fabien.


#11Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Andres Freund (#6)
Re: pgbench --tuple-size option

>>> The custom initialization is to run a manual ALTER after the
>>> initialization.
>>
>> Sure, it can be done this way.
>>
>> I'm not sure about the implication of ALTER on the table storage,
>
> Should be fine in this case.

After some testing and laughing, my conclusion is "not fine at all". The
"filler" attributes in "pgbench" are by default "EXTENDED", which means
possibly compressed... As the default value is '', the compression,
when tried for large sizes, performs very well, and the performance is the
same as with a (declared) smaller tuple:-) Probably not the intention of
the benchmark designer. Conclusion: I need an ALTER TABLE anyway to change
the STORAGE. Or maybe pgbench should always do it anyway...
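
A minimal sketch of the statement this conclusion points to, assuming the goal is simply to stop the filler from being compressed. The helper `build_set_storage` is hypothetical, and the choice between PLAIN and EXTERNAL is left to the caller, since the message does not say which one was used.

```c
#include <stdio.h>

/*
 * Build an ALTER TABLE that changes the storage strategy of the
 * "filler" column. Neither "plain" nor "external" compresses the
 * value, unlike the default "extended" (hypothetical helper).
 */
static int
build_set_storage(char *buf, size_t len, const char *table,
                  const char *storage)
{
    return snprintf(buf, len,
                    "alter table %s alter column filler set storage %s",
                    table, storage);
}
```

SET STORAGE does not rewrite rows that are already stored, so it would presumably have to be applied before the tables are filled to have the intended effect.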

Conclusion 2: I've noted the submission as "rejected", as both you and
Fujii don't like it; although I found it useful, I can do without
it quite easily.

--
Fabien.
