pgbench --tuple-size option
After publishing some test results with pgbench on SSD with varying page
size, Josh Berkus pointed out that pgbench uses small 100-byte tuples,
and that results may be different with other tuple sizes.
This patch adds an option to change the default tuple size, so that this
can be tested easily.
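A minimal sketch of the sizing rule in the attached patch, for illustration only: each table carries a fixed correction, and the filler column is declared as char(tupsize - correction), never narrower than char(1). The correction values are copied from the patch; the names filler_width and tupsize_info are hypothetical and do not appear in pgbench.

```c
#include <assert.h>

/* Hypothetical pairing of a table with its correction. The numbers are
 * the ones in the patch: the default target (100) minus the width the
 * filler column has today. */
struct tupsize_info
{
	const char *table;
	int			correction;
};

static const struct tupsize_info corrections[] = {
	{"pgbench_history", 78},
	{"pgbench_tellers", 16},
	{"pgbench_accounts", 16},
	{"pgbench_branches", 12}
};

/* Width to declare for the filler column, clamped to at least 1. */
static int
filler_width(int tupsize, int correction)
{
	int			ts;

	assert(tupsize > 0);
	ts = tupsize - correction;
	return (ts < 1) ? 1 : ts;
}
```

With the default target of 100 this gives the current filler widths (22, 84, 84 and 88), so the default schema is unchanged.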
--
Fabien.
Attachments:
pgbench-tupsize-1.patch (text/x-diff; charset=us-ascii)
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 2f7d80e..709022c 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -119,6 +119,10 @@ int scale = 1;
*/
int fillfactor = 100;
+/* approximate number of bytes for rows
+ */
+int tupsize = 100;
+
/*
* create foreign key constraints on the tables?
*/
@@ -359,6 +363,7 @@ usage(void)
" create indexes in the specified tablespace\n"
" --tablespace=TABLESPACE create tables in the specified tablespace\n"
" --unlogged-tables create tables as unlogged tables\n"
+ " --tuple-size=NUM target tuple size (default: 100)\n"
"\nBenchmarking options:\n"
" -c, --client=NUM number of concurrent database clients (default: 1)\n"
" -C, --connect establish new connection for each transaction\n"
@@ -1704,32 +1709,37 @@ init(bool is_no_vacuum)
const char *table; /* table name */
const char *smcols; /* column decls if accountIDs are 32 bits */
const char *bigcols; /* column decls if accountIDs are 64 bits */
- int declare_fillfactor;
+ int declare_fillfactor; /* whether to add a fillfactor */
+ int tupsize_correction; /* tupsize correction */
};
static const struct ddlinfo DDLs[] = {
{
"pgbench_history",
- "tid int,bid int,aid int,delta int,mtime timestamp,filler char(22)",
- "tid int,bid int,aid bigint,delta int,mtime timestamp,filler char(22)",
- 0
+ "tid int,bid int,aid int,delta int,mtime timestamp,filler char",
+ "tid int,bid int,aid bigint,delta int,mtime timestamp,filler char",
+ 0,
+ 78
},
{
"pgbench_tellers",
- "tid int not null,bid int,tbalance int,filler char(84)",
- "tid int not null,bid int,tbalance int,filler char(84)",
- 1
+ "tid int not null,bid int,tbalance int,filler char",
+ "tid int not null,bid int,tbalance int,filler char",
+ 1,
+ 16
},
{
"pgbench_accounts",
- "aid int not null,bid int,abalance int,filler char(84)",
- "aid bigint not null,bid int,abalance int,filler char(84)",
- 1
+ "aid int not null,bid int,abalance int,filler char",
+ "aid bigint not null,bid int,abalance int,filler char",
+ 1,
+ 16
},
{
"pgbench_branches",
- "bid int not null,bbalance int,filler char(88)",
- "bid int not null,bbalance int,filler char(88)",
- 1
+ "bid int not null,bbalance int,filler char",
+ "bid int not null,bbalance int,filler char",
+ 1,
+ 12
}
};
static const char *const DDLINDEXes[] = {
@@ -1767,6 +1777,9 @@ init(bool is_no_vacuum)
char buffer[256];
const struct ddlinfo *ddl = &DDLs[i];
const char *cols;
+ int ts = tupsize - ddl->tupsize_correction;
+
+ if (ts < 1) ts = 1;
/* Remove old table, if it exists. */
snprintf(buffer, sizeof(buffer), "drop table if exists %s", ddl->table);
@@ -1790,9 +1803,9 @@ init(bool is_no_vacuum)
cols = (scale >= SCALE_32BIT_THRESHOLD) ? ddl->bigcols : ddl->smcols;
- snprintf(buffer, sizeof(buffer), "create%s table %s(%s)%s",
+ snprintf(buffer, sizeof(buffer), "create%s table %s(%s(%d))%s",
unlogged_tables ? " unlogged" : "",
- ddl->table, cols, opts);
+ ddl->table, cols, ts, opts);
executeStatement(con, buffer);
}
@@ -2504,6 +2517,7 @@ main(int argc, char **argv)
{"unlogged-tables", no_argument, &unlogged_tables, 1},
{"sampling-rate", required_argument, NULL, 4},
{"aggregate-interval", required_argument, NULL, 5},
+ {"tuple-size", required_argument, NULL, 6},
{"rate", required_argument, NULL, 'R'},
{NULL, 0, NULL, 0}
};
@@ -2811,6 +2825,15 @@ main(int argc, char **argv)
}
#endif
break;
+ case 6:
+ initialization_option_set = true;
+ tupsize = atoi(optarg);
+ if (tupsize <= 0)
+ {
+ fprintf(stderr, "invalid tuple size: %d\n", tupsize);
+ exit(1);
+ }
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index 23bfa9e..e6210e7 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -515,6 +515,15 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
</varlistentry>
<varlistentry>
+ <term><option>--tuple-size=</option><replaceable>size</></term>
+ <listitem>
+ <para>
+ Set target size of tuples. Default is 100 bytes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-v</option></term>
<term><option>--vacuum-all</option></term>
<listitem>
On 2014-08-15 11:46:52 +0200, Fabien COELHO wrote:
After publishing some test results with pgbench on SSD with varying page
size, Josh Berkus pointed out that pgbench uses small 100-byte tuples, and
that results may be different with other tuple sizes.
This patch adds an option to change the default tuple size, so that this can
be tested easily.
I don't think it's beneficial to put this into pgbench. There really
isn't a relevant benefit over using a custom script here.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello Andres,
This patch adds an option to change the default tuple size, so that this can
be tested easily.
I don't think it's beneficial to put this into pgbench. There really
isn't a relevant benefit over using a custom script here.
The scripts to run are the standard ones. The difference is in the
*initialization* phase (-i), namely the filler attribute size. There is no
custom script for initialization in pgbench, so ISTM that this argument
does not apply here.
--
Fabien.
On 2014-08-15 11:58:41 +0200, Fabien COELHO wrote:
Hello Andres,
This patch adds an option to change the default tuple size, so that this can
be tested easily.
I don't think it's beneficial to put this into pgbench. There really
isn't a relevant benefit over using a custom script here.
The scripts to run are the standard ones. The difference is in the
*initialization* phase (-i), namely the filler attribute size. There is no
custom script for initialization in pgbench, so ISTM that this argument does
not apply here.
The custom initialization is to run a manual ALTER after the
initialization.
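For illustration, the manual step could be a statement like the one built below, sent through psql or libpq once pgbench -i has finished. This is only a sketch: the helper name format_filler_alter is hypothetical, and the target width is whatever the test calls for.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "alter table T alter column filler type char(N)" into buf.
 * Returns what snprintf returns, so the caller can detect truncation
 * by comparing the result against bufsize.
 */
static int
format_filler_alter(char *buf, size_t bufsize, const char *table, int width)
{
	assert(buf != NULL && bufsize > 0);
	assert(table != NULL && strlen(table) > 0);
	assert(width >= 1);

	return snprintf(buf, bufsize,
					"alter table %s alter column filler type char(%d)",
					table, width);
}
```

All four pgbench tables have a filler column, so the same statement shape applies to each of them.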
Greetings,
Andres Freund
I don't think it's beneficial to put this into pgbench. There really
isn't a relevant benefit over using a custom script here.
The scripts to run are the standard ones. The difference is in the
*initialization* phase (-i), namely the filler attribute size. There is no
custom script for initialization in pgbench, so ISTM that this argument does
not apply here.
The custom initialization is to run a manual ALTER after the
initialization.
Sure, it can be done this way.
I'm not sure about the implication of ALTER on the table storage, thus I
prefer all benchmarks to run exactly the same straightforward way in all
cases so as to avoid unwanted effects on what I'm trying to measure, which
is already noisy and unstable enough.
--
Fabien.
On 2014-08-15 12:17:31 +0200, Fabien COELHO wrote:
I don't think it's beneficial to put this into pgbench. There really
isn't a relevant benefit over using a custom script here.
The scripts to run are the standard ones. The difference is in the
*initialization* phase (-i), namely the filler attribute size. There is no
custom script for initialization in pgbench, so ISTM that this argument does
not apply here.
The custom initialization is to run a manual ALTER after the
initialization.
Sure, it can be done this way.
I'm not sure about the implication of ALTER on the table storage,
Should be fine in this case. But if that's what you're concerned about -
understandably - it seems to make more sense to split -i into two. One
to create the tables, and another to fill them. That'd allow doing
manual stuff in between.
Greetings,
Andres Freund
I'm not sure about the implication of ALTER on the table storage,
Should be fine in this case. But if that's what you're concerned about -
understandably -
Indeed, my (long) experience with benchmarks is that it is much more
complicated than it looks if you want to really understand what you are
getting, and to get anything meaningful.
it seems to make more sense to split -i into two. One to create the
tables, and another to fill them. That'd allow doing manual stuff
in between.
Hmmm. This would mean many more changes than the pretty trivial patch I
submitted: more options (2 parts init + compatibility with the previous
case), splitting the "init" function, having a dependency and new error
cases to check (you must have the table to fill them), some options apply
to the first part while others apply to the second part, which would lead
in any case to significantly more complicated documentation... a lot of
trouble for my use case to answer Josh's pertinent comments, and to be able
to test the "tuple size" factor easily. Moreover, I would reject it myself
as too much trouble for a small benefit.
Feel free to reject the patch if you do not want it. I think that its
cost/benefit is reasonable (one small option, small code changes, some
benefit for people who want to measure performance in various cases).
--
Fabien.
On 2014-08-15 13:33:20 +0200, Fabien COELHO wrote:
it seems to make more sense to split -i into two. One to create the
tables, and another to fill them. That'd allow doing manual stuff
in between.
Hmmm. This would mean many more changes than the pretty trivial patch I
submitted
FWIW, I find that patch really ugly. Adding the filler's width in a
printf, after the actual DDL declaration. Without so much as a
comment. Brr.
: more options (2 parts init + compatibility with the previous
case), splitting the "init" function, having a dependency and new error
cases to check (you must have the table to fill them), some options apply to
the first part while others apply to the second part, which would lead in any
case to significantly more complicated documentation... a lot of trouble for
my use case to answer Josh's pertinent comments, and to be able to test the
"tuple size" factor easily. Moreover, I would reject it myself as too much
trouble for a small benefit.
Well, it's something more generic, because it allows you to do more...
Feel free to reject the patch if you do not want it. I think that its
cost/benefit is reasonable (one small option, small code changes, some
benefit for people who want to measure performance in various cases).
I personally think this isn't worth the price. But I'm just one guy.
Greetings,
Andres Freund
On Fri, Aug 15, 2014 at 8:36 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-08-15 13:33:20 +0200, Fabien COELHO wrote:
it seems to make more sense to split -i into two. One to create the
tables, and another to fill them. That'd allow doing manual stuff
in between.
Hmmm. This would mean many more changes than the pretty trivial patch I
submitted
FWIW, I find that patch really ugly. Adding the filler's width in a
printf, after the actual DDL declaration. Without so much as a
comment. Brr.
: more options (2 parts init + compatibility with the previous
case), splitting the "init" function, having a dependency and new error
cases to check (you must have the table to fill them), some options apply to
the first part while others apply to the second part, which would lead in any
case to significantly more complicated documentation... a lot of trouble for
my use case to answer Josh's pertinent comments, and to be able to test the
"tuple size" factor easily. Moreover, I would reject it myself as too much
trouble for a small benefit.
Well, it's something more generic, because it allows you to do more...
Feel free to reject the patch if you do not want it. I think that its
cost/benefit is reasonable (one small option, small code changes, some
benefit for people who want to measure performance in various cases).
I personally think this isn't worth the price. But I'm just one guy.
I also don't like this feature. The benefit of this option seems too small.
If we apply this, we might want to support other options, for example,
an option to change the data type of each column, an option to create a new
index using "minmax", an option to change the fillfactor of each table, etc.
There are countless such options, and I'm afraid that it's really hard to
support so many of them.
Regards,
--
Fujii Masao
Hmmm. This would mean many more changes than the pretty trivial patch I
submitted
FWIW, I find that patch really ugly. Adding the filler's width in a
printf, after the actual DDL declaration. Without so much as a
comment. Brr.
Indeed. I'm not too proud of that very point either:-) You are right that
it deserves at the minimum a clear comment. To put the varying size in the
DDL string means vsprintf and splitting the query building some more,
which I do not find desirable.
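To make the trade-off concrete, here is one possible reading of "putting the varying size in the DDL string", as a sketch only: the column list keeps a %d placeholder and the statement is built in two steps. It uses two snprintf calls rather than vsprintf, and the names accounts_cols and build_create_table are hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical column template: the filler width stays visible in the DDL. */
static const char accounts_cols[] =
	"aid int not null,bid int,abalance int,filler char(%d)";

/*
 * Build "create table T(cols)" into buf, expanding the width placeholder
 * first. Returns the statement length, or -1 if either step was truncated.
 */
static int
build_create_table(char *buf, size_t bufsize, const char *table,
				   const char *cols_template, int width)
{
	char		cols[256];
	int			n;

	assert(buf != NULL && table != NULL && strlen(cols_template) > 0);

	/* Step 1: expand the width placeholder in the column list. */
	n = snprintf(cols, sizeof(cols), cols_template, width);
	if (n < 0 || (size_t) n >= sizeof(cols))
		return -1;

	/* Step 2: build the statement from the expanded column list. */
	n = snprintf(buf, bufsize, "create table %s(%s)", table, cols);
	if (n < 0 || (size_t) n >= bufsize)
		return -1;
	return n;
}
```

The extra buffer and the second formatting pass are the "splitting the query building some more" cost; the gain is that the width placeholder sits next to the column it applies to.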
[...]
Well, it's something more generic, because it allows you to do more...
Apart from the fact that I do not need it (at least right now), and that it
is more work, my opinion is that it would be rejected. Not a strong
incentive to spend time in that direction.
--
Fabien.
The custom initialization is to run a manual ALTER after the
initialization.
Sure, it can be done this way.
I'm not sure about the implication of ALTER on the table storage,
Should be fine in this case.
After some testing and laughing, my conclusion is "not fine at all". The
"filler" attributes in "pgbench" are by default "EXTENDED", which means
possibly compressed... As the default value is '', the compression,
when tried for large sizes, performs very well, and the performance is the
same as with a (declared) smaller tuple:-) Probably not the intention of
the benchmark designer. Conclusion: I need an ALTER TABLE anyway to change
the STORAGE. Or maybe pgbench should always do it anyway...
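As a sketch of that extra step, the statement could be built as below. The helper name format_filler_storage is hypothetical; the storage mode is passed in because the choice is an assumption here: EXTERNAL keeps the value uncompressed but still allows out-of-line storage, while PLAIN disables both.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "alter table T alter column filler set storage MODE" into buf.
 * Returns what snprintf returns, so the caller can detect truncation.
 */
static int
format_filler_storage(char *buf, size_t bufsize, const char *table,
					  const char *mode)
{
	assert(buf != NULL && bufsize > 0);
	assert(table != NULL && strlen(table) > 0);
	assert(mode != NULL && strlen(mode) > 0);

	return snprintf(buf, bufsize,
					"alter table %s alter column filler set storage %s",
					table, mode);
}
```

Note that SET STORAGE only sets the strategy for rows written afterwards, so it does not by itself change rows that pgbench -i has already loaded.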
Conclusion 2: I've noted the submission as "rejected" as both you and
Fujii don't like it. Although I found it useful, I can do without
it quite easily.
--
Fabien.