pgbench - implement strict TPC-B benchmark

Started by Fabien COELHOalmost 7 years ago26 messages

coelho@cri.ensmp.fr

almost 7 years ago

1 attachment(s)

Hello devs,

The attached patch does $SUBJECT, as a showcase for recently added
features, including advanced expressions (CASE...), \if, \gset, ending SQL
commands at ";"...

There is also a small fix to the doc which describes the tpcb-like
implementation but gets one variable name wrong: balance -> delta.

--
Fabien.

Attachments:

pgbench-strict-tpcb-1.patchtext/x-diff; name=pgbench-strict-tpcb-1.patchDownload

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ee2501be55..29f4168e76 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -347,7 +347,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
         An optional integer weight after <literal>@</literal> allows to adjust the
         probability of drawing the script.  If not specified, it is set to 1.
         Available built-in scripts are: <literal>tpcb-like</literal>,
-        <literal>simple-update</literal> and <literal>select-only</literal>.
+        <literal>simple-update</literal>, <literal>select-only</literal> and
+        <literal>strict-tpcb</literal>.
         Unambiguous prefixes of built-in names are accepted.
         With special name <literal>list</literal>, show the list of built-in scripts
         and exit immediately.
@@ -843,7 +844,7 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
   <para>
    The default built-in transaction script (also invoked with <option>-b tpcb-like</option>)
    issues seven commands per transaction over randomly chosen <literal>aid</literal>,
-   <literal>tid</literal>, <literal>bid</literal> and <literal>balance</literal>.
+   <literal>tid</literal>, <literal>bid</literal> and <literal>delta</literal>.
    The scenario is inspired by the TPC-B benchmark, but is not actually TPC-B,
    hence the name.
   </para>
@@ -869,6 +870,14 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
    If you select the <literal>select-only</literal> built-in (also <option>-S</option>),
    only the <command>SELECT</command> is issued.
   </para>
+
+  <para>
+   If you select the <literal>strict-tpcb</literal> built-in, the SQL commands are similar
+   to <literal>tpcb-like</literal>, but the <literal>delta</literal> amount is 6 digits,
+   the account branch has a 15% probability to be in the same branch as the teller (unless
+   there is only one branch in which case it is always the same), and the final account
+   balance is actually extracted by the script.
+  </para>
  </refsect2>
 
  <refsect2>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 99529de52a..039dc32393 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -566,6 +566,47 @@ static const BuiltinScript builtin_script[] =
 		"<builtin: select only>",
 		"\\set aid random(1, " CppAsString2(naccounts) " * :scale)\n"
 		"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+	},
+	{
+		/*---
+		 * Standard conforming TPC-B benchmark script:
+		 *
+		 * The account branch has a 15% probability to be the same as the
+		 * teller branch.
+		 * The amount is in [-999999, 999999].
+		 * The final account balance is actually extracted by the driver.
+		 */
+		"strict-tpcb",
+		"<builtin: strict TPC-B>",
+		/* choose teller and compute its branch */
+		"\\set tid random(1, " CppAsString2(ntellers) " * :scale)\n"
+		"\\set btid 1 + (:tid - 1) / (" CppAsString2(ntellers) " / " CppAsString2(nbranches) ")\n"
+		/* choose account branch: 85% different from teller unless there is only one branch */
+		"\\if :scale = 1 or random(1, 100) > 85\n"
+		"\\set bid :btid\n"
+		"\\else\n"
+		"\\set bid random(1, " CppAsString2(nbranches) " * :scale - 1)\n"
+		"\\set bid :bid + case when :bid >= :btid then 1 else 0 end\n"
+		"\\endif\n"
+		/* choose account within branch */
+		"\\set lid random(1, 100000)\n"
+		"\\set aid (:bid - 1) * " CppAsString2(naccounts) "/" CppAsString2(nbranches) " + :lid\n"
+		"\\set delta random(-999999, 999999)\n"
+		/* it should probably be combined, but that breaks -M prepared */
+		"BEGIN;\n"
+		"INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)\n"
+		"  VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);\n"
+		"UPDATE pgbench_accounts\n"
+		"  SET abalance = abalance + :delta\n"
+		"  WHERE aid = :aid\n"
+		"  RETURNING abalance\\gset\n"
+		"UPDATE pgbench_tellers\n"
+		"  SET tbalance = tbalance + :delta\n"
+		"  WHERE tid = :tid;\n"
+		"UPDATE pgbench_branches\n"
+		"  SET bbalance = bbalance + :delta\n"
+		"  WHERE bid = :bid;\n"
+		"END;\n"
 	}
 };

Thomas Munro

thomas.munro@gmail.com

over 6 years ago

In reply to: Fabien COELHO (#1)

Re: pgbench - implement strict TPC-B benchmark

On Tue, Apr 9, 2019 at 3:58 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

The attached patch does $SUBJECT, as a showcase for recently added
features, including advanced expressions (CASE...), \if, \gset, ending SQL
commands at ";"...

Hi Fabien,

+ the account branch has a 15% probability to be in the same branch
as the teller (unless

I would say "... has a 15% probability of being in the same ...". The
same wording appears further down in the comment.

I see that the parameters you propose match the TPCB 2.0
description[1]http://www.tpc.org/tpcb/spec/tpcb_current.pdf, and the account balance was indeed supposed to be
returned to the driver. I wonder if "strict" is the right name here
though. "tpcb-like-2" at least leaves room for someone to propose yet
another variant, and still includes the "-like" disclaimer, which I
interpret as a way of making it clear that this benchmark and results
produced by it have no official TPC audited status.

There is also a small fix to the doc which describes the tpcb-like
implementation but gets one variable name wrong: balance -> delta.

Agreed. I committed that part. Thanks!

[1]: http://www.tpc.org/tpcb/spec/tpcb_current.pdf

--
Thomas Munro
https://enterprisedb.com

Fabien COELHO

coelho@cri.ensmp.fr

over 6 years ago

In reply to: Thomas Munro (#2)

1 attachment(s)

Re: pgbench - implement strict TPC-B benchmark

Hello Thomas,

Thanks for the feedback.

+ the account branch has a 15% probability to be in the same branch
as the teller (unless

I would say "... has a 15% probability of being in the same ...". The
same wording appears further down in the comment.

Fixed.

I see that the parameters you propose match the TPCB 2.0 description[1],
[...]

Nearly:-(

While re-re-re-re-reading the spec, it was 85%, i.e. people mostly go to
their local teller, I managed to get it wrong. Sigh. Fixed. Hopefully.

I've updated the script a little so that it is closer to spec. I've also
changed the script definition so that it still works as expected if
someone changes "nbranches" definition for some reason, even if this
is explicitely discourage by comments.

I wonder if "strict" is the right name here though. "tpcb-like-2" at
least leaves room for someone to propose yet another variant, and still
includes the "-like" disclaimer, which I interpret as a way of making it
clear that this benchmark and results produced by it have no official
TPC audited status.

Hmmm.

The -like suffix is really about the conformance of the script, not the
rest, but that should indeed be clearer. I've expanded the comment and doc
about this with a disclaimers, so that there is no ambiguity about what is
expected to conform, which is only the transaction script.

I have added a comment about the non conformance of the "int" type use for
balances in the initialization phase.

Also, on second thought, I've changed the name to "standard-tpcb", but I'm
unsure whether it is better than "script-tpcb". There is an insentive to
have a different prefix so that "-b t" would not complain of ambiguity
between "tpcb-like*", which would be a regression. This is why I did not
choose the simple "tcp". There may be a "standard-tpcb-2" anyway.

I have added a small test run in the TAP test.

On my TODO list is adding an initialization option to change the balance
type for conformance, by using NUMERIC or integer8.

While thinking about that, an unrelated thought occured to me: adding a
partitioned initialization variant would be nice to test the performance
impact of partitioning easily. I should have thought of that as soon as
partitioning was added. Added to my TODO list as well.

--
Fabien.

Attachments:

pgbench-strict-tpcb-2.patchtext/x-diff; name=pgbench-strict-tpcb-2.patchDownload

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 09af9f1a7d..6ff557b2b3 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -347,7 +347,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
         An optional integer weight after <literal>@</literal> allows to adjust the
         probability of drawing the script.  If not specified, it is set to 1.
         Available built-in scripts are: <literal>tpcb-like</literal>,
-        <literal>simple-update</literal> and <literal>select-only</literal>.
+        <literal>simple-update</literal>, <literal>select-only</literal> and
+        <literal>standard-tpcb</literal>.
         Unambiguous prefixes of built-in names are accepted.
         With special name <literal>list</literal>, show the list of built-in scripts
         and exit immediately.
@@ -869,6 +870,25 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
    If you select the <literal>select-only</literal> built-in (also <option>-S</option>),
    only the <command>SELECT</command> is issued.
   </para>
+
+  <para>
+   If you select the <literal>standard-tpcb</literal> built-in, the SQL commands are similar
+   to <literal>tpcb-like</literal>, but the <literal>delta</literal> amount is 6 digits,
+   the account branch has a 85% probability of being in the same branch as the teller (unless
+   there is only one branch in which case it is always the same), and the final account
+   balance is actually extracted by the script.
+  </para>
+
+  <note>
+   <para>
+    Disclaimer: with <literal>standard-tpcb</literal>, only the transaction script
+    is expected to conform to the "TPC Benchmark(tm) B revision 2.0" specification.
+    Other parts of the benchmark, such as type capabilities, initialization,
+    performance data collection, checks on final values, database configuration,
+    test duration... may or may not conform to the standard depending on the actual
+    run.
+   </para>
+  </note>
  </refsect2>
 
  <refsect2>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 8b84658ccd..4888beb129 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -193,8 +193,12 @@ int64		random_seed = -1;
  * end of configurable parameters
  *********************************************************************/
 
-#define nbranches	1			/* Makes little sense to change this.  Change
-								 * -s instead */
+/*
+ * It makes little sense to change "nbranches", use -s instead.
+ *
+ * Nevertheless, "ntellers" and "naccounts" must be divisible by "nbranches".
+ */
+#define nbranches	1
 #define ntellers	10
 #define naccounts	100000
 
@@ -566,6 +570,61 @@ static const BuiltinScript builtin_script[] =
 		"<builtin: select only>",
 		"\\set aid random(1, " CppAsString2(naccounts) " * :scale)\n"
 		"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+	},
+	{
+		/*---
+		 * Standard conforming TPC-B benchmark script as defined in
+		 * "TPC BENCHMARK (tm) B Standard Specification Revision 2.0"
+		 * from 7 June 1994, see http://www.tpc.org/tpcb/spec/tpcb_current.pdf
+		 *
+		 * The transaction profile is defined Section 1.2.
+		 *
+		 * Transaction inputs are defined in Section 5.3:
+		 *
+		 * - The account branch has a 85% probability of being in the same
+		 *   branch as the teller.
+		 * - The delta amount is in [-999999, 999999].
+		 *
+		 * By application of very explicit Section 1.3.2, the final account
+		 * balance is actually extracted by the driver with \gset.
+		 *
+		 * DISCLAIMER: only the transaction script is expected to conform to the
+		 * specification. Other parts (types capabilities, initialization,
+		 * performance data collection, database configuration, checks on final
+		 * values, ...) may or may not conform to the requirements of the
+		 * benchmark depending on the actual benchmark run.
+		 */
+		"standard-tpcb",
+		"<builtin: standard TPC-B>",
+		/* choose teller and compute its branch */
+		"\\set tid random(1, " CppAsString2(ntellers) " * :scale)\n"
+		"\\set btid 1 + (:tid - 1) / (" CppAsString2(ntellers) " / " CppAsString2(nbranches) ")\n"
+		/* choose account branch: 85% same as teller unless there is only one branch */
+		"\\if random(0, 99) < 85 or :scale * " CppAsString2(nbranches)" = 1\n"
+		"\\set bid :btid\n"
+		"\\else\n"
+		"\\set bid random(1, " CppAsString2(nbranches) " * :scale - 1)\n"
+		"\\set bid :bid + case when :bid >= :btid then 1 else 0 end\n"
+		"\\endif\n"
+		/* choose account within branch */
+		"\\set lid random(1, " CppAsString2(naccounts) "/" CppAsString2(nbranches) ")\n"
+		"\\set aid (:bid - 1) * " CppAsString2(naccounts) "/" CppAsString2(nbranches) " + :lid\n"
+		"\\set delta random(-999999, 999999)\n"
+		/* it should probably be combined, but that currently breaks -M prepared */
+		"BEGIN;\n"
+		"INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)\n"
+		"  VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);\n"
+		"UPDATE pgbench_accounts\n"
+		"  SET abalance = abalance + :delta\n"
+		"  WHERE aid = :aid\n"
+		"  RETURNING abalance\\gset\n"
+		"UPDATE pgbench_tellers\n"
+		"  SET tbalance = tbalance + :delta\n"
+		"  WHERE tid = :tid;\n"
+		"UPDATE pgbench_branches\n"
+		"  SET bbalance = bbalance + :delta\n"
+		"  WHERE bid = :bid;\n"
+		"END;\n"
 	}
 };
 
@@ -3616,6 +3675,11 @@ initCreateTables(PGconn *con)
 	 * would completely break comparability of pgbench results with prior
 	 * versions. Since pgbench has never pretended to be fully TPC-B compliant
 	 * anyway, we stick with the historical behavior.
+	 *
+	 * Note 2: also according to spec, balances must hold 10 decimal digits
+	 * plus sign.  The 4-bytes "int" type used below does not conform.  Given
+	 * the default settings and the random walk (aka Drunkard's Walk) theorem,
+	 * some balances may overflow on long runs.
 	 */
 	struct ddlinfo
 	{
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 3b097a91b2..db236ef5db 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -171,6 +171,22 @@ pgbench(
 	],
 	'pgbench select only');
 
+pgbench(
+	'-t 10 -c 3 -M prepared -b st -r',
+	0,
+	[
+		qr{builtin: standard TPC-B},
+		qr{clients: 3\b},
+		qr{threads: 1\b},
+		qr{processed: 30/30},
+		qr{mode: prepared},
+		qr{\\if random\(0, 99\) < 85}
+	],
+	[
+		qr{vacuum}
+	],
+	'pgbench standard tpc-b');
+
 # check if threads are supported
 my $nthreads = 2;

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Fabien COELHO (#3)

Re: pgbench - implement strict TPC-B benchmark

Fabien COELHO <coelho@cri.ensmp.fr> writes:

[ pgbench-strict-tpcb-2.patch ]

TBH, I think we should reject this patch. Nobody cares about TPC-B
anymore, and they care even less about differences between one
sort-of-TPC-B test and another sort-of-TPC-B test. (As the lack
of response on this thread shows.) We don't need this kind of
baggage in pgbench; it's got too many "features" already.

I'm also highly dubious about labeling this script "standard TPC-B",
when it resolves only some of the reasons why our traditional script
is not really TPC-B. That's treading on being false advertising.

regards, tom lane

Peter Geoghegan

pg@bowt.ie

over 6 years ago

In reply to: Tom Lane (#4)

Re: pgbench - implement strict TPC-B benchmark

On Tue, Jul 30, 2019 at 3:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

TBH, I think we should reject this patch. Nobody cares about TPC-B
anymore, and they care even less about differences between one
sort-of-TPC-B test and another sort-of-TPC-B test. (As the lack
of response on this thread shows.) We don't need this kind of
baggage in pgbench; it's got too many "features" already.

+1. TPC-B was officially made obsolete in 1995.

I'm also highly dubious about labeling this script "standard TPC-B",
when it resolves only some of the reasons why our traditional script
is not really TPC-B. That's treading on being false advertising.

IANAL, but it may not even be permissible to claim that we have
implemented "standard TPC-B".

--
Peter Geoghegan

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Peter Geoghegan (#5)

Re: pgbench - implement strict TPC-B benchmark

Peter Geoghegan <pg@bowt.ie> writes:

On Tue, Jul 30, 2019 at 3:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm also highly dubious about labeling this script "standard TPC-B",
when it resolves only some of the reasons why our traditional script
is not really TPC-B. That's treading on being false advertising.

IANAL, but it may not even be permissible to claim that we have
implemented "standard TPC-B".

Yeah, very likely you can't legally say that unless the TPC
has certified your test. (Our existing code and docs are quite
careful to call pgbench's version "TPC-like" or similar weasel
wording, and never claim that it is really TPC-B or even a close
approximation.)

regards, tom lane

Fabien COELHO

coelho@cri.ensmp.fr

over 6 years ago

In reply to: Tom Lane (#6)

Re: pgbench - implement strict TPC-B benchmark

Hello Tom,

I'm also highly dubious about labeling this script "standard TPC-B",
when it resolves only some of the reasons why our traditional script
is not really TPC-B. That's treading on being false advertising.

IANAL, but it may not even be permissible to claim that we have
implemented "standard TPC-B".

Yeah, very likely you can't legally say that unless the TPC
has certified your test. (Our existing code and docs are quite
careful to call pgbench's version "TPC-like" or similar weasel
wording, and never claim that it is really TPC-B or even a close
approximation.)

Hmmm.

I agree that nobody really cares about TPC-B per se. The point of this
patch is to provide a built-in example of recent and useful pgbench
features that match a real specification.

The "strict" only refers to the test script. It cannot match the whole
spec which addresses many other things, some of them more process than
tool: table creation and data types, performance data collection, database
configuration, pricing of hardware used in the tests, post-benchmark run
checks, auditing constraints, whatever…

[about pgbench] it's got too many "features" already.

I disagree with this judgement.

Although not all features are that useful, the accumulation of recent
additions (int/float/bool expressions, \if, \gset, non uniform prng, …)
makes it suitable for testing various realistic scenarii which could not
be tested before. As said above, my point was to have a builtin
illustration of available capabilities.

It did not occur to me that a scripts which implements "strictly" a
particular section of a 25-year obsolete benchmark could raise any legal
issue.

--
Fabien.

Jonah H. Harris

jonah.harris@gmail.com

over 6 years ago

In reply to: Fabien COELHO (#7)

Re: pgbench - implement strict TPC-B benchmark

On Wed, Jul 31, 2019 at 10:10 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Hello Tom,

I'm also highly dubious about labeling this script "standard TPC-B",
when it resolves only some of the reasons why our traditional script
is not really TPC-B. That's treading on being false advertising.

IANAL, but it may not even be permissible to claim that we have
implemented "standard TPC-B".

Yeah, very likely you can't legally say that unless the TPC
has certified your test. (Our existing code and docs are quite
careful to call pgbench's version "TPC-like" or similar weasel
wording, and never claim that it is really TPC-B or even a close
approximation.)

Hmmm.

I agree that nobody really cares about TPC-B per se. The point of this
patch is to provide a built-in example of recent and useful pgbench
features that match a real specification.

I agree with this. When I was at EnterpriseDB, while it wasn't audited, we
had to develop an actual TPC-B implementation because pgbench was too
different. pgbench itself isn't that useful as a benchmark tool, imo, but
if we have the ability to make it better (i.e. closer to an actual
benchmark kit), I think we should.

--
Jonah H. Harris

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Jonah H. Harris (#8)

Re: pgbench - implement strict TPC-B benchmark

"Jonah H. Harris" <jonah.harris@gmail.com> writes:

On Wed, Jul 31, 2019 at 10:10 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

I agree that nobody really cares about TPC-B per se. The point of this
patch is to provide a built-in example of recent and useful pgbench
features that match a real specification.

I agree with this. When I was at EnterpriseDB, while it wasn't audited, we
had to develop an actual TPC-B implementation because pgbench was too
different. pgbench itself isn't that useful as a benchmark tool, imo, but
if we have the ability to make it better (i.e. closer to an actual
benchmark kit), I think we should.

[ shrug... ] TBH, the proposed patch does not look to me like actual
benchmark kit; it looks like a toy. Nobody who was intent on making their
benchmark numbers look good would do a significant amount of work in a
slow, ad-hoc interpreted language. I also wonder to what extent the
numbers would reflect pgbench itself being the bottleneck. Which is
really the fundamental problem I've got with all the stuff that's been
crammed into pgbench of late --- the more computation you're doing there,
the less your results measure the server's capabilities rather than
pgbench's implementation details.

In any case, even if I were in love with the script itself, we cannot
commit something that claims to be "standard TPC-B". It needs weasel
wording that makes it clear that it isn't TPC-B, and then you have a
problem of user confusion about why we have both not-quite-TPC-B-1
and not-quite-TPC-B-2, and which one to use, or which one was used in
somebody else's tests.

I think if you want to show off what these pgbench features are good
for, it'd be better to find some other example that's not in the
midst of a legal minefield.

regards, tom lane

#10

Peter Geoghegan

pg@bowt.ie

over 6 years ago

In reply to: Tom Lane (#9)

Re: pgbench - implement strict TPC-B benchmark

On Wed, Jul 31, 2019 at 2:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I agree with this. When I was at EnterpriseDB, while it wasn't audited, we
had to develop an actual TPC-B implementation because pgbench was too
different. pgbench itself isn't that useful as a benchmark tool, imo, but
if we have the ability to make it better (i.e. closer to an actual
benchmark kit), I think we should.

[ shrug... ] TBH, the proposed patch does not look to me like actual
benchmark kit; it looks like a toy. Nobody who was intent on making their
benchmark numbers look good would do a significant amount of work in a
slow, ad-hoc interpreted language.

According to TPC themselves, "In contrast to TPC-A, TPC-B is not an
OLTP benchmark. Rather, TPC-B can be looked at as a database stress
test..." [1]http://www.tpc.org/tpcb/ -- Peter Geoghegan. Sounds like classic pgbench to me.

Not sure where that leaves this patch. What problem is it actually
trying to solve?

[1]: http://www.tpc.org/tpcb/ -- Peter Geoghegan
--
Peter Geoghegan

#11

Fabien COELHO

coelho@cri.ensmp.fr

over 6 years ago

In reply to: Tom Lane (#9)

Re: pgbench - implement strict TPC-B benchmark

Hello Tom,

[ shrug... ] TBH, the proposed patch does not look to me like actual
benchmark kit; it looks like a toy. Nobody who was intent on making their
benchmark numbers look good would do a significant amount of work in a
slow, ad-hoc interpreted language. I also wonder to what extent the
numbers would reflect pgbench itself being the bottleneck.

Which is really the fundamental problem I've got with all the stuff
that's been crammed into pgbench of late --- the more computation you're
doing there, the less your results measure the server's capabilities
rather than pgbench's implementation details.

That is a very good question. It is easy to measure the overhead, for
instance:

sh> time pgbench -r -T 30 -M prepared
...
latency average = 2.425 ms
tps = 412.394420 (including connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.000 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
0.022 BEGIN;
0.061 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.038 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.046 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.042 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.036 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
2.178 END;
real 0m30.080s, user 0m0.406s, sys 0m0.689s

The cost of pgbench interpreted part (\set) is under 1/1000. The full time
of the process itself counts for 1.4%, below the inevitable system time
which is 2.3%. Pgbench overheads are pretty small compared to postgres
connection and command execution, and system time. The above used a local
socket, if it were an actual remote network connection, the gap would be
larger. A profile run could collect more data, but that does not seem
necessary.

Some parts of Pgbench could be optimized, eg for expressions the large
switch could be avoided with precomputed function call, some static
analysis could infer some types and avoid calls to generic functions which
have to tests types, and so on. But franckly I do not think that this is
currently needed so I would not bother unless an actual issue is proven.

Also, pgbench overheads must be compared to an actual client application,
which deals with a database through some language (PHP, Python, JS, Java…)
the interpreter of which would be written in C/C++ just like pgbench, and
some library (ORM, DBI, JDBC…), possibly written in the initial language
and relying on libpq under the hood. Ok, there could be some JIT involved,
but it will not change that there are costs there too, and it would have
to do pretty much the same things that pgbench is doing, plus what the
application has to do with the data.

All in all, pgbench overheads are small compared to postgres processing
times and representative of a reasonably optimized client application.

In any case, even if I were in love with the script itself,

Love is probably not required for a feature demonstration:-)

we cannot commit something that claims to be "standard TPC-B".

Yep, I clearly underestimated this legal aspect.

It needs weasel wording that makes it clear that it isn't TPC-B, and
then you have a problem of user confusion about why we have both
not-quite-TPC-B-1 and not-quite-TPC-B-2, and which one to use, or which
one was used in somebody else's tests.

I agree that confusion is no good either.

I think if you want to show off what these pgbench features are good
for, it'd be better to find some other example that's not in the
midst of a legal minefield.

Yep, I got that.

To try to salvage my illustration idea: I could change the name to "demo",
i.e. quite far from "TPC-B", do some extensions to make it differ, eg use
a non-uniform random generator, and then explicitly say that it is a
vaguely inspired by "TPC-B" and intended as a demo script susceptible to
be updated to illustrate new features (eg if using a non-uniform generator
I'd really like to add a permutation layer if available some day).

This way, the "demo" real intention would be very clear.

--
Fabien.

#12

Fabien COELHO

coelho@cri.ensmp.fr

over 6 years ago

In reply to: Peter Geoghegan (#10)

Re: pgbench - implement strict TPC-B benchmark

According to TPC themselves, "In contrast to TPC-A, TPC-B is not an
OLTP benchmark. Rather, TPC-B can be looked at as a database stress
test..." [1]. Sounds like classic pgbench to me.

Not sure where that leaves this patch. What problem is it actually
trying to solve?

That there is no builtin illustration of pgbench script features that
allow more realistic benchmarking.

--
Fabien.

#13

Robert Haas

robertmhaas@gmail.com

over 6 years ago

In reply to: Fabien COELHO (#11)

Re: pgbench - implement strict TPC-B benchmark

On Thu, Aug 1, 2019 at 2:53 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

All in all, pgbench overheads are small compared to postgres processing
times and representative of a reasonably optimized client application.

It's pretty easy to devise tests where pgbench is client-limited --
just try running it with threads = clients/4, sometimes even
clients/2. So I don't buy the idea that this is true in general.

To try to salvage my illustration idea: I could change the name to "demo",
i.e. quite far from "TPC-B", do some extensions to make it differ, eg use
a non-uniform random generator, and then explicitly say that it is a
vaguely inspired by "TPC-B" and intended as a demo script susceptible to
be updated to illustrate new features (eg if using a non-uniform generator
I'd really like to add a permutation layer if available some day).

This way, the "demo" real intention would be very clear.

I do not like this idea at all; "demo" is about as generic a name as
imaginable. But I have another idea: how about including this script
in the documentation with some explanatory text that describes (a) the
ways in which it is more faithful to TPC-B than what the normal
pgbench thing and (b) the problems that it doesn't solve, as
enumerated by Fabien upthread:

"table creation and data types, performance data collection, database
configuration, pricing of hardware used in the tests, post-benchmark run
checks, auditing constraints, whatever…"

Perhaps that idea still won't attract any votes, but I throw it out
there for consideration.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14

Andres Freund

andres@anarazel.de

over 6 years ago

In reply to: Fabien COELHO (#11)

Re: pgbench - implement strict TPC-B benchmark

Hi,

On 2019-08-01 08:52:52 +0200, Fabien COELHO wrote:

sh> time pgbench -r -T 30 -M prepared
...
latency average = 2.425 ms
tps = 412.394420 (including connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.000 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
0.022 BEGIN;
0.061 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.038 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.046 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.042 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.036 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
2.178 END;
real 0m30.080s, user 0m0.406s, sys 0m0.689s

The cost of pgbench interpreted part (\set) is under 1/1000.

I don't put much credence in those numbers for pgbench commands - they
don't include significant parts of the overhead of command
execution. Even just the fact that you need to process more commands
through the pretty slow pgbench interpreter has significant costs.

Using pgbench -Mprepared -n -c 8 -j 8 -S pgbench_100 -T 10 -r -P1
e.g. shows pgbench to use 189% CPU in my 4/8 core/thread laptop. That's
a pretty significant share.

And before you argue that that's just about a read-only workload:
Servers with either synchronous_commit=off, or with extremely fast WAL
commit due to BBUs/NVMe, are quite common. So you can easily get into
scenarios where pgbench overhead is an issue for read/write workloads too.

With synchronous_commit=off, I e.g. see:

$ PGOPTIONS='-c synchronous_commit=off' /usr/bin/time pgbench -Mprepared -n -c 8 -j 8 pgbench_100 -T 10 -r
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: prepared
number of clients: 8
number of threads: 8
duration: 10 s
number of transactions actually processed: 179198
latency average = 0.447 ms
tps = 17892.824470 (including connections establishing)
tps = 17908.086839 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.000 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
0.042 BEGIN;
0.086 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.061 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.070 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.070 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.058 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.054 END;
6.65user 10.64system 0:10.02elapsed 172%CPU (0avgtext+0avgdata 4588maxresident)k
0inputs+0outputs (0major+367minor)pagefaults 0swaps

And the largest part of the overhead is in pgbench's interpreter loop:
+   12.35%  pgbench  pgbench                [.] threadRun
+    4.47%  pgbench  libpq.so.5.13          [.] pqParseInput3
+    3.54%  pgbench  pgbench                [.] dopr.constprop.0
+    3.30%  pgbench  pgbench                [.] fmtint
+    3.16%  pgbench  libc-2.28.so           [.] __strcmp_avx2
+    2.95%  pgbench  libc-2.28.so           [.] malloc
+    2.95%  pgbench  libpq.so.5.13          [.] PQsendQueryPrepared
+    2.15%  pgbench  libpq.so.5.13          [.] pqPutInt
+    1.93%  pgbench  pgbench                [.] getVariable
+    1.85%  pgbench  libc-2.28.so           [.] ppoll
+    1.85%  pgbench  libc-2.28.so           [.] __strlen_avx2
+    1.85%  pgbench  libpthread-2.28.so     [.] __libc_recv
+    1.66%  pgbench  libpq.so.5.13          [.] pqPutMsgStart
+    1.63%  pgbench  libpq.so.5.13          [.] pqGetInt

And that's the just the standard pgbench read/write case, without
additional script commands or anything.

The full time
of the process itself counts for 1.4%, below the inevitable system time
which is 2.3%. Pgbench overheads are pretty small compared to postgres
connection and command execution, and system time. The above used a local
socket, if it were an actual remote network connection, the gap would be
larger. A profile run could collect more data, but that does not seem
necessary.

Well, duh, that's because you're completely IO bound. You're doing
400tps. That's *nothing*. All you're measuring is how fast the WAL can
be fdatasync()ed to disk. Of *course* pgbench isn't a relevant overhead
in that case. I really don't understand how this can be an argument.

Also, pgbench overheads must be compared to an actual client application,
which deals with a database through some language (PHP, Python, JS, Java…)
the interpreter of which would be written in C/C++ just like pgbench, and
some library (ORM, DBI, JDBC…), possibly written in the initial language and
relying on libpq under the hood. Ok, there could be some JIT involved, but
it will not change that there are costs there too, and it would have to do
pretty much the same things that pgbench is doing, plus what the application
has to do with the data.

Uh, but those clients aren't all running on a single machine.

Greetings,

Andres Freund

#15

Fabien COELHO

coelho@cri.ensmp.fr

over 6 years ago

In reply to: Robert Haas (#13)

Re: pgbench - implement strict TPC-B benchmark

Hello Robert,

All in all, pgbench overheads are small compared to postgres processing
times and representative of a reasonably optimized client application.

It's pretty easy to devise tests where pgbench is client-limited --
just try running it with threads = clients/4, sometimes even
clients/2. So I don't buy the idea that this is true in general.

Ok, one thread cannot feed an N core server if enough client are executed
per thread and the server has few things to do.

The point I'm clumsily trying to make is that pgbench-specific overheads
are quite small: Any benchmark driver would have pretty much at least the
same costs, because you have the cpu cost of the tool itself, then the
library it uses, eg lib{pq,c}, then syscalls. Even if the first costs are
reduced to zero, you still have to deal with the database through the
system, and this part will be the same.

As the cost of pgbench itself in a reduced part of the total cpu costs of
running the bench client side, there is no extraordinary improvement to
expect by optimizing this part. This does not mean that pgbench
performance should not be improved, if possible and maintainable.

I'll develop a little more that point in an answer to Andres figures,
which are very interesting, by providing some more figures.

To try to salvage my illustration idea: I could change the name to "demo",
i.e. quite far from "TPC-B", do some extensions to make it differ, eg use
a non-uniform random generator, and then explicitly say that it is a
vaguely inspired by "TPC-B" and intended as a demo script susceptible to
be updated to illustrate new features (eg if using a non-uniform generator
I'd really like to add a permutation layer if available some day).

This way, the "demo" real intention would be very clear.

I do not like this idea at all; "demo" is about as generic a name as
imaginable.

What name would you suggest, if it were to be made available from pgbench
as a builtin, that avoids confusion with "tpcb-like"?

But I have another idea: how about including this script in the
documentation with some explanatory text that describes (a) the ways in
which it is more faithful to TPC-B than what the normal pgbench thing
and (b) the problems that it doesn't solve, as enumerated by Fabien
upthread:

We can put more examples in the documentation, ok.

One of the issue raised by Tom is that claiming faithfulness to TCP-B is
prone to legal issues. Franckly, I do not care about TPC-B, only that it
is a *real* benchmark, and that it allows to illustrate pgbench
capabilities.

Another point is confusion if there are two tpcb-like scripts provided.

So I'm fine with giving up any claim about faithfulness, especially as it
would allow the "demo" script to be more didactic and illustrate more
of pgbench capabilities.

"table creation and data types, performance data collection, database
configuration, pricing of hardware used in the tests, post-benchmark run
checks, auditing constraints, whatever…"

I already put such caveats in comments and in the documentation, but that
does not seem to be enough for Tom.

Perhaps that idea still won't attract any votes, but I throw it out
there for consideration.

I think that adding an illustration section could be fine, but ISTM that
it would still be appropriate to have the example executable. Moreover, I
think that your idea does not fixes the "we need not to make too much
claims about TPC-B to avoid potential legal issues".

--
Fabien.

#16

Fabien COELHO

coelho@cri.ensmp.fr

over 6 years ago

In reply to: Andres Freund (#14)

Re: pgbench - implement strict TPC-B benchmark

Hello Andres,

Thanks a lot for these feedbacks and comments.

Using pgbench -Mprepared -n -c 8 -j 8 -S pgbench_100 -T 10 -r -P1
e.g. shows pgbench to use 189% CPU in my 4/8 core/thread laptop. That's
a pretty significant share.

Fine, but what is the corresponding server load? 211%? 611%? And what
actual time is spent in pgbench itself, vs libpq and syscalls?

Figures and discussion below.

And before you argue that that's just about a read-only workload:

I'm fine with worth case scenarii:-) Let's do the worse for my 2 cores
running at 2.2 GHz laptop:

(0) we can run a really do nearly nothing script:

sh> cat nope.sql
\sleep 0
# do not sleep, so stay awake…

sh> time pgbench -f nope.sql -T 10 -r
latency average = 0.000 ms
tps = 12569499.226367 (excluding connections establishing) # 12.6M
statement latencies in milliseconds:
0.000 \sleep 0
real 0m10.072s, user 0m10.027s, sys 0m0.012s

Unsurprisingly pgbench is at about 100% cpu load, and the transaction cost
(transaction loop and stat collection) is 0.080 µs (1/12.6M) per script
execution (one client on one thread).

(1) a pgbench complex-commands-only script:

sh> cat set.sql
\set x random_exponential(1, :scale * 10, 2.5) + 2.1
\set y random(1, 9) + 17.1 * :x
\set z case when :x > 7 then 1.0 / ln(:y) else 2.0 / sqrt(:y) end

sh> time pgbench -f set.sql -T 10 -r
latency average = 0.001 ms
tps = 1304989.729560 (excluding connections establishing) # 1.3M
statement latencies in milliseconds:
0.000 \set x random_exponential(1, :scale * 10, 2.5) + 2.1
0.000 \set y random(1, 9) + 17.1 * :x
0.000 \set z case when :x > 7 then 1.0 / ln(:y) else 2.0 / sqrt(:y) end
real 0m10.038s, user 0m10.003s, sys 0m0.000s

Again pgbench load is near 100%, with only pgbench stuff (thread loop,
expression evaluation, variables, stat collection) costing about 0.766 µs
cpu per script execution. This is about 10 times the previous case, 90% of
pgbench cpu cost is in expressions and variables, not a surprise.

Probably this under-a-µs could be reduced… but what overall improvements
would it provide? An answer with the last test:

(2) a ridiculously small SQL query, tested through a local unix socket:

sh> cat empty.sql
;
# yep, an empty query!

sh> time pgbench -f empty.sql -T 10 -r
latency average = 0.016 ms
tps = 62206.501709 (excluding connections establishing) # 62.2K
statement latencies in milliseconds:
0.016 ;
real 0m10.038s, user 0m1.754s, sys 0m3.867s

We are adding minimal libpq and underlying system calls to pgbench
internal cpu costs in the most favorable (or worst:-) sql query with the
most favorable postgres connection.

Apparent load is about (1.754+3.867)/10.038 = 56%, so the cpu cost per
script is 0.56 / 62206.5 = 9 µs, over 100 times the cost of a do-nothing
script (0), and over 10 times the cost of a complex expression command
script (1).

Conclusion: pgbench-specific overheads are typically (much) below 10% of
the total client-side cpu cost of a transaction, while over 90% of the cpu
cost is spent in libpq and system, for the worst case do-nothing query.

A perfect bench driver which would have zero overheads would reduce the
cpu cost by at most 10%, because you still have to talk to the database.
through the system. If pgbench cost were divided by two, which would be a
reasonable achievement, the benchmark client cost would be reduced by 5%.

Wow?

I have already given some thought in the past to optimize "pgbench",
especially to avoid long switches (eg in expression evaluation) and maybe
to improve variable management, but as show above I would not expect a
gain worth the effort and assume that a patch would probably be justly
rejected, because for a realistic benchmark script these costs are already
much less than other inevitable libpq/syscall costs.

That does not mean that nothing needs to be done, but the situation is
currently quite good.

In conclusion, ISTM that current pgbench allows to saturate a postgres
server with a client significantly smaller than the server, which seems
like a reasonable benchmarking situation. Any other driver in any other
language would necessarily incur the same kind of costs.

[...] And the largest part of the overhead is in pgbench's interpreter
loop:

Indeed, the figures below are very interesting! Thanks for collecting
them.

+   12.35%  pgbench  pgbench                [.] threadRun
+    3.54%  pgbench  pgbench                [.] dopr.constprop.0
+    3.30%  pgbench  pgbench                [.] fmtint
+    1.93%  pgbench  pgbench                [.] getVariable

~ 21%, probably some inlining has been performed, because I would have
expected to see significant time in "advanceConnectionState".

+    2.95%  pgbench  libpq.so.5.13          [.] PQsendQueryPrepared
+    2.15%  pgbench  libpq.so.5.13          [.] pqPutInt
+    4.47%  pgbench  libpq.so.5.13          [.] pqParseInput3
+    1.66%  pgbench  libpq.so.5.13          [.] pqPutMsgStart
+    1.63%  pgbench  libpq.so.5.13          [.] pqGetInt

~ 13%

+    3.16%  pgbench  libc-2.28.so           [.] __strcmp_avx2
+    2.95%  pgbench  libc-2.28.so           [.] malloc
+    1.85%  pgbench  libc-2.28.so           [.] ppoll
+    1.85%  pgbench  libc-2.28.so           [.] __strlen_avx2
+    1.85%  pgbench  libpthread-2.28.so     [.] __libc_recv

~ 11%, str is a pain… Not sure who is calling though, pgbench or libpq.

This is basically 47% pgbench, 53% lib*, on the sample provided. I'm
unclear about where system time is measured.

And that's the just the standard pgbench read/write case, without
additional script commands or anything.

Well, duh, that's because you're completely IO bound. You're doing
400tps. That's *nothing*. All you're measuring is how fast the WAL can
be fdatasync()ed to disk. Of *course* pgbench isn't a relevant overhead
in that case. I really don't understand how this can be an argument.

Sure. My interest in running it was to show that the \set stuff was
ridiculous compared to processing an actual SQL query, but it does not
allow to analyze all overheads. I hope that the 3 above examples allow to
make my point more understandable.

Also, pgbench overheads must be compared to an actual client application,
which deals with a database through some language (PHP, Python, JS, Java…)
the interpreter of which would be written in C/C++ just like pgbench, and
some library (ORM, DBI, JDBC…), possibly written in the initial language and
relying on libpq under the hood. Ok, there could be some JIT involved, but
it will not change that there are costs there too, and it would have to do
pretty much the same things that pgbench is doing, plus what the application
has to do with the data.

Uh, but those clients aren't all running on a single machine.

Sure.

The cumulated power of the clients is probably much larger than the
postgres server itself, and ISTM that pgbench allows to simulate such
things with much smaller client-side requirements, and that any other tool
could not do much better.

--
Fabien.

#17

Andres Freund

andres@anarazel.de

over 6 years ago

In reply to: Fabien COELHO (#16)

Re: pgbench - implement strict TPC-B benchmark

Hi,

On 2019-08-02 10:34:24 +0200, Fabien COELHO wrote:

Hello Andres,

Thanks a lot for these feedbacks and comments.

Using pgbench -Mprepared -n -c 8 -j 8 -S pgbench_100 -T 10 -r -P1
e.g. shows pgbench to use 189% CPU in my 4/8 core/thread laptop. That's
a pretty significant share.

Fine, but what is the corresponding server load? 211%? 611%? And what actual
time is spent in pgbench itself, vs libpq and syscalls?

System wide pgbench, including libpq, is about 22% of the whole system.

As far as I can tell there's a number of things that are wrong:
- prepared statement names are recomputed for every query execution
- variable name lookup is done for every command, rather than once, when
parsing commands
- a lot of string->int->string type back and forths

Conclusion: pgbench-specific overheads are typically (much) below 10% of the
total client-side cpu cost of a transaction, while over 90% of the cpu cost
is spent in libpq and system, for the worst case do-nothing query.

I don't buy that that's the actual worst case, or even remotely close to
it. I e.g. see higher pgbench overhead for the *modify* case than for
the pgbench's readonly case. And that's because some of the meta
commands are slow, in particular everything related to variables. And
the modify case just has more variables.

+   12.35%  pgbench  pgbench                [.] threadRun
+    3.54%  pgbench  pgbench                [.] dopr.constprop.0
+    3.30%  pgbench  pgbench                [.] fmtint
+    1.93%  pgbench  pgbench                [.] getVariable
~ 21%, probably some inlining has been performed, because I would have
expected to see significant time in "advanceConnectionState".

Yea, there's plenty inlining. Note dopr() is string processing.

+    2.95%  pgbench  libpq.so.5.13          [.] PQsendQueryPrepared
+    2.15%  pgbench  libpq.so.5.13          [.] pqPutInt
+    4.47%  pgbench  libpq.so.5.13          [.] pqParseInput3
+    1.66%  pgbench  libpq.so.5.13          [.] pqPutMsgStart
+    1.63%  pgbench  libpq.so.5.13          [.] pqGetInt

~ 13%

A lot of that is really stupid. We need to improve
libpq. PqsendQueryGuts (attributed to PQsendQueryPrepared here), builds
the command in many separate pqPut* commands, which reside in another
translation unit, is pretty sad.

+    3.16%  pgbench  libc-2.28.so           [.] __strcmp_avx2
+    2.95%  pgbench  libc-2.28.so           [.] malloc
+    1.85%  pgbench  libc-2.28.so           [.] ppoll
+    1.85%  pgbench  libc-2.28.so           [.] __strlen_avx2
+    1.85%  pgbench  libpthread-2.28.so     [.] __libc_recv

~ 11%, str is a pain… Not sure who is calling though, pgbench or
libpq.

Both. Most of the strcmp is from getQueryParams()/getVariable(). The
dopr() is from pg_*printf, which is mostly attributable to
preparedStatementName() and getVariable().

This is basically 47% pgbench, 53% lib*, on the sample provided. I'm unclear
about where system time is measured.

It was excluded in this profile, both to reduce profiling costs, and to
focus on pgbench.

Greetings,

Andres Freund

#18

Fabien COELHO

coelho@cri.ensmp.fr

over 6 years ago

In reply to: Andres Freund (#17)

Re: pgbench - implement strict TPC-B benchmark

Hello Andres,

Using pgbench -Mprepared -n -c 8 -j 8 -S pgbench_100 -T 10 -r -P1
e.g. shows pgbench to use 189% CPU in my 4/8 core/thread laptop. That's
a pretty significant share.

Fine, but what is the corresponding server load? 211%? 611%? And what actual
time is spent in pgbench itself, vs libpq and syscalls?

System wide pgbench, including libpq, is about 22% of the whole system.

Hmmm. I guess that the consistency between 189% CPU on 4 cores/8 threads
and 22% overall load is that 189/800 = 23.6% ~ 22%.

Given the simplicity of the select-only transaction the stuff is CPU
bound, so postgres 8 server processes should saturate the 4 core CPU, and
pgbench & postgres are competing for CPU time. The overall load is
probably 100%, i.e. 22% pgbench vs 78% postgres (assuming system is
included), 78/22 = 3.5, i.e. pgbench on one core would saturate postgres
on 3.5 cores on a CPU bound load.

I'm not chocked by these results for near worst-case conditions (i.e. the
server side has very little to do).

It seems quite consistent with the really worst-case example I reported
(empty query, cannot do less). Looking at the same empty-sql-query load
through "htop", I have 95% postgres and 75% pgbench. This is not fully
consistent with "time" which reports 55% pgbench overall, over 2/3 of
which in system, under 1/3 pgbench which must be devided into pgbench
actual code and external libpq/lib* other stuff.

Yet again, pgbench code is not the issue from my point of view, because
time is spent mostly elsewhere and any other driver would have to do the
same.

As far as I can tell there's a number of things that are wrong:

Sure, I agree that things could be improved.

- prepared statement names are recomputed for every query execution

I'm not sure it is a bug issue, but it should be precomputed somewhere,
though.

- variable name lookup is done for every command, rather than once, when
parsing commands

Hmmm. The names of variables are not all known in advance, eg \gset.
Possibly it does not matter, because the name of actually used variables
is known. Each used variables could get a number so that using a variable
would be accessing an array at the corresponding index.

- a lot of string->int->string type back and forths

Yep, that is a pain, ISTM that strings are exchanged at the protocol
level, but this is libpq design, not pgbench.

As far as variable values are concerned, AFAICR conversion are performed
on demand only, and just once.

Overall, my point if that even if all pgbench-specific costs were wiped
out it would not change the final result (pgbench load) much because most
of the time is spent in libpq and system. Any other test driver would
incur the same cost.

Conclusion: pgbench-specific overheads are typically (much) below 10% of the
total client-side cpu cost of a transaction, while over 90% of the cpu cost
is spent in libpq and system, for the worst case do-nothing query.

I don't buy that that's the actual worst case, or even remotely close to
it.

Hmmm. I'm not sure I can do much worse than 3 complex expressions against
one empty sql query. Ok, I could put 27 complex expressions to reach
50-50, but the 3-to-1 complex-expression-to-empty-sql ratio already seems
ok for a realistic worst-case test script.

I e.g. see higher pgbench overhead for the *modify* case than for
the pgbench's readonly case. And that's because some of the meta
commands are slow, in particular everything related to variables. And
the modify case just has more variables.

Hmmm. WRT \set and expressions, the two main cost seems to be the large
switch and the variable management. Yet again, I still interpret the
figures I collected as these costs are small compared to libpq/system
overheads, and the overall result is below postgres own CPU costs (on a
per client basis).

+   12.35%  pgbench  pgbench                [.] threadRun
+    3.54%  pgbench  pgbench                [.] dopr.constprop.0
~ 21%, probably some inlining has been performed, because I would have
expected to see significant time in "advanceConnectionState".
Yea, there's plenty inlining. Note dopr() is string processing.

Which is a pain, no doubt about that. Some of it as been taken out of
pgbench already, eg comparing commands vs using an enum.

+    2.95%  pgbench  libpq.so.5.13          [.] PQsendQueryPrepared
+    2.15%  pgbench  libpq.so.5.13          [.] pqPutInt
+    4.47%  pgbench  libpq.so.5.13          [.] pqParseInput3
+    1.66%  pgbench  libpq.so.5.13          [.] pqPutMsgStart
+    1.63%  pgbench  libpq.so.5.13          [.] pqGetInt
~ 13%
A lot of that is really stupid. We need to improve libpq.
PqsendQueryGuts (attributed to PQsendQueryPrepared here), builds the
command in many separate pqPut* commands, which reside in another
translation unit, is pretty sad.

Indeed, I'm definitely convinced that libpq costs are high and should be
reduced where possible. Now, yet again, they are much smaller than the
time spent in the system to send and receive the data on a local socket,
so somehow they could be interpreted as good enough, even if not that
good.

+    3.16%  pgbench  libc-2.28.so           [.] __strcmp_avx2
+    2.95%  pgbench  libc-2.28.so           [.] malloc
+    1.85%  pgbench  libc-2.28.so           [.] ppoll
+    1.85%  pgbench  libc-2.28.so           [.] __strlen_avx2
+    1.85%  pgbench  libpthread-2.28.so     [.] __libc_recv
~ 11%, str is a pain… Not sure who is calling though, pgbench or
libpq.
Both. Most of the strcmp is from getQueryParams()/getVariable(). The
dopr() is from pg_*printf, which is mostly attributable to
preparedStatementName() and getVariable().

Hmmm. Franckly I can optimize pgbench code pretty easily, but I'm not sure
of maintainability, and as I said many times, about the real effect it
would have, because these cost are a minor part of the client side
benchmark part.

This is basically 47% pgbench, 53% lib*, on the sample provided. I'm unclear
about where system time is measured.

It was excluded in this profile, both to reduce profiling costs, and to
focus on pgbench.

Ok.

If we take my other figures and round up, for a running pgbench we have
1/6 actual pgbench, 1/6 libpq, 2/3 system.

If I get a factor of 10 speedup in actual pgbench (let us assume I'm that
good:-), then the overall gain is 1/6 - 1/6/10 = 15%. Although I can do
it, it would be some fun, but the code would get ugly (not too bad, but
nevertheless probably less maintainable, with a partial typing phase and
expression compilation, and my bet is that however good the patch would be
rejected).

Do you see an error in my evaluation of pgbench actual costs and its
contribution to the overall performance of running a benchmark?

If yes, which it is?

If not, do you think advisable to spend time improving the evaluator &
variable stuff and possibly other places for an overall 15% gain?

Also, what would be the likelyhood of such optimization patch to pass?

I could do a limited variable management improvement patch, eventually, I
have funny ideas to speedup the thing, some of which outlined above, some
others even more terrible.

--
Fabien.

#19

Fabien COELHO

coelho@cri.ensmp.fr

over 6 years ago

In reply to: Fabien COELHO (#18)

1 attachment(s)

Re: pgbench - implement strict TPC-B benchmark

Hello Andres,

If not, do you think advisable to spend time improving the evaluator &
variable stuff and possibly other places for an overall 15% gain?

Also, what would be the likelyhood of such optimization patch to pass?

I could do a limited variable management improvement patch, eventually, I
have funny ideas to speedup the thing, some of which outlined above, some
others even more terrible.

Attached is a quick PoC.

sh> cat set.sql
\set x 0
\set y :x
\set z :y
\set w :z
\set v :w
\set x :v
\set y :x
\set z :y
\set w :z
\set v :w
\set x :v
\set y :x
\set z :y
\set w :z
\set v :w
\set x1 :x
\set x2 :x
\set y1 :z
\set y0 :w
\set helloworld :x

Before the patch:

sh> ./pgbench -T 10 -f vars.sql
...
tps = 802966.183240 (excluding connections establishing) # 0.8M

After the patch:

sh> ./pgbench -T 10 -f vars.sql
...
tps = 2665382.878271 (excluding connections establishing) # 2.6M

Which is a (somehow disappointing) * 3.3 speedup. The impact on the 3
complex expressions tests is not measurable, though.

Probably variable management should be reworked more deeply to cleanup the
code.

Questions:
- how likely is such a patch to pass? (IMHO not likely)
- what is its impact to overall performance when actual queries
are performed (IMHO very small).

--
Fabien.

Attachments:

pgbench-symbols-1.patchtext/x-diff; name=pgbench-symbols-1.patchDownload

diff --git a/src/bin/pgbench/Makefile b/src/bin/pgbench/Makefile
index 25abd0a875..39bba12c23 100644
--- a/src/bin/pgbench/Makefile
+++ b/src/bin/pgbench/Makefile
@@ -7,7 +7,7 @@ subdir = src/bin/pgbench
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = pgbench.o exprparse.o $(WIN32RES)
+OBJS = pgbench.o exprparse.o symbol_table.o $(WIN32RES)
 
 override CPPFLAGS := -I. -I$(srcdir) -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index 17c9ec32c5..e5a15ae136 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -212,6 +212,7 @@ make_variable(char *varname)
 
 	expr->etype = ENODE_VARIABLE;
 	expr->u.variable.varname = varname;
+	expr->u.variable.varnum = getOrCreateSymbolId(symbol_table, varname);
 	return expr;
 }
 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 570cf3306a..14153ebc35 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -118,6 +118,7 @@ typedef int pthread_attr_t;
 
 static int	pthread_create(pthread_t *thread, pthread_attr_t *attr, void *(*start_routine) (void *), void *arg);
 static int	pthread_join(pthread_t th, void **thread_return);
+// TODO: mutex?
 #elif defined(ENABLE_THREAD_SAFETY)
 /* Use platform-dependent pthread capability */
 #include <pthread.h>
@@ -232,7 +233,9 @@ const char *progname;
 volatile bool timer_exceeded = false;	/* flag from signal handler */
 
 /*
- * Variable definitions.
+ * Variable contents.
+ *
+ * Variable names are managed in symbol_table with a number.
  *
  * If a variable only has a string value, "svalue" is that value, and value is
  * "not set".  If the value is known, "value" contains the value (in any
@@ -243,11 +246,14 @@ volatile bool timer_exceeded = false;	/* flag from signal handler */
  */
 typedef struct
 {
-	char	   *name;			/* variable's name */
-	char	   *svalue;			/* its value in string form, if known */
-	PgBenchValue value;			/* actual variable's value */
+	int			number;	/* variable symbol number */
+	char		   *svalue;	/* its value in string form, if known */
+	PgBenchValue	value;		/* actual variable's value */
 } Variable;
 
+#define undefined_variable_value(v)						\
+	v.svalue == NULL && v.value.type == PGBT_NO_VALUE
+
 #define MAX_SCRIPTS		128		/* max number of SQL scripts allowed */
 #define SHELL_COMMAND_SIZE	256 /* maximum size allowed for shell command */
 
@@ -395,7 +401,6 @@ typedef struct
 	/* client variables */
 	Variable   *variables;		/* array of variable definitions */
 	int			nvariables;		/* number of variables */
-	bool		vars_sorted;	/* are variables sorted by name? */
 
 	/* various times about current transaction */
 	int64		txn_scheduled;	/* scheduled start time of transaction (usec) */
@@ -493,6 +498,7 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
  * varprefix 	SQL commands terminated with \gset have this set
  *				to a non NULL value.  If nonempty, it's used to prefix the
  *				variable name that receives the value.
+ * set_varnum	variable symbol number for \set and \setshell.
  * expr			Parsed expression, if needed.
  * stats		Time spent in this command.
  */
@@ -505,6 +511,7 @@ typedef struct Command
 	int			argc;
 	char	   *argv[MAX_ARGS];
 	char	   *varprefix;
+	int			set_varnum;
 	PgBenchExpr *expr;
 	SimpleStats stats;
 } Command;
@@ -523,6 +530,10 @@ static int64 total_weight = 0;
 
 static int	debug = 0;			/* debug flag */
 
+/* also used by exprparse.y */
+#define INITIAL_SYMBOL_TABLE_SIZE 128
+SymbolTable symbol_table = NULL;
+
 /* Builtin test scripts */
 typedef struct BuiltinScript
 {
@@ -1215,39 +1226,16 @@ doConnect(void)
 	return conn;
 }
 
-/* qsort comparator for Variable array */
-static int
-compareVariableNames(const void *v1, const void *v2)
-{
-	return strcmp(((const Variable *) v1)->name,
-				  ((const Variable *) v2)->name);
-}
-
 /* Locate a variable by name; returns NULL if unknown */
 static Variable *
 lookupVariable(CState *st, char *name)
 {
-	Variable	key;
+	int		num = getSymbolId(symbol_table, name, true);
 
-	/* On some versions of Solaris, bsearch of zero items dumps core */
-	if (st->nvariables <= 0)
+	if (0 <= num && num < st->nvariables)
+		return & st->variables[num];
+	else
 		return NULL;
-
-	/* Sort if we have to */
-	if (!st->vars_sorted)
-	{
-		qsort((void *) st->variables, st->nvariables, sizeof(Variable),
-			  compareVariableNames);
-		st->vars_sorted = true;
-	}
-
-	/* Now we can search */
-	key.name = name;
-	return (Variable *) bsearch((void *) &key,
-								(void *) st->variables,
-								st->nvariables,
-								sizeof(Variable),
-								compareVariableNames);
 }
 
 /* Get the value of a variable, in string form; returns NULL if unknown */
@@ -1292,17 +1280,26 @@ makeVariableValue(Variable *var)
 	if (var->value.type != PGBT_NO_VALUE)
 		return true;			/* no work */
 
+	if (var->svalue == NULL || var->svalue[0] == '\0')
+		return false;
+
 	slen = strlen(var->svalue);
 
-	if (slen == 0)
-		/* what should it do on ""? */
-		return false;
+	/* try most probable first */
+	if (is_an_int(var->svalue))
+	{
+		/* if it looks like an int, it must be an int without overflow */
+		int64		iv;
 
-	if (pg_strcasecmp(var->svalue, "null") == 0)
+		if (!strtoint64(var->svalue, false, &iv))
+			return false;
+
+		setIntValue(&var->value, iv);
+	}
+	else if (pg_strcasecmp(var->svalue, "null") == 0)
 	{
 		setNullValue(&var->value);
 	}
-
 	/*
 	 * accept prefixes such as y, ye, n, no... but not for "o". 0/1 are
 	 * recognized later as an int, which is converted to bool if needed.
@@ -1320,16 +1317,6 @@ makeVariableValue(Variable *var)
 	{
 		setBoolValue(&var->value, false);
 	}
-	else if (is_an_int(var->svalue))
-	{
-		/* if it looks like an int, it must be an int without overflow */
-		int64		iv;
-
-		if (!strtoint64(var->svalue, false, &iv))
-			return false;
-
-		setIntValue(&var->value, iv);
-	}
 	else						/* type should be double */
 	{
 		double		dv;
@@ -1338,7 +1325,8 @@ makeVariableValue(Variable *var)
 		{
 			fprintf(stderr,
 					"malformed variable \"%s\" value: \"%s\"\n",
-					var->name, var->svalue);
+					var->number >= 0 ? getSymbolName(symbol_table, var->number) : "<null>",
+					var->svalue);
 			return false;
 		}
 		setDoubleValue(&var->value, dv);
@@ -1380,66 +1368,38 @@ valid_variable_name(const char *name)
 	return true;
 }
 
-/*
- * Lookup a variable by name, creating it if need be.
- * Caller is expected to assign a value to the variable.
- * Returns NULL on failure (bad name).
+/* Assign a string value to a variable, creating it if need be.
+ *
+ * Returns false on failure (bad name)
  */
-static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
-{
-	Variable   *var;
-
-	var = lookupVariable(st, name);
-	if (var == NULL)
-	{
-		Variable   *newvars;
-
-		/*
-		 * Check for the name only when declaring a new variable to avoid
-		 * overhead.
-		 */
-		if (!valid_variable_name(name))
-		{
-			fprintf(stderr, "%s: invalid variable name: \"%s\"\n",
-					context, name);
-			return NULL;
-		}
-
-		/* Create variable at the end of the array */
-		if (st->variables)
-			newvars = (Variable *) pg_realloc(st->variables,
-											  (st->nvariables + 1) * sizeof(Variable));
-		else
-			newvars = (Variable *) pg_malloc(sizeof(Variable));
-
-		st->variables = newvars;
-
-		var = &newvars[st->nvariables];
-
-		var->name = pg_strdup(name);
-		var->svalue = NULL;
-		/* caller is expected to initialize remaining fields */
-
-		st->nvariables++;
-		/* we don't re-sort the array till we have to */
-		st->vars_sorted = false;
-	}
-
-	return var;
-}
-
-/* Assign a string value to a variable, creating it if need be */
-/* Returns false on failure (bad name) */
 static bool
 putVariable(CState *st, const char *context, char *name, const char *value)
 {
+	int		num = getOrCreateSymbolId(symbol_table, name);
 	Variable   *var;
 	char	   *val;
 
-	var = lookupCreateVariable(st, context, name);
-	if (!var)
-		return false;
+	if (num < 0 || st->nvariables <= num)
+	{
+		fprintf(stderr, "%s: too many variables while adding \"%s\"\n",
+				context, name);
+		abort();
+	}
+
+	var = & st->variables[num];
+
+	if (var->number == -1)
+	{
+		/* first time the variable is encountered, do some checks */
+		if (!valid_variable_name(name))
+		{
+			fprintf(stderr, "%s: invalid variable name \"%s\"\n",
+					context, name);
+			return false;
+		}
+
+		var->number = num;
+	}
 
 	/* dup then free, in case value is pointing at this variable */
 	val = pg_strdup(value);
@@ -1452,35 +1412,62 @@ putVariable(CState *st, const char *context, char *name, const char *value)
 	return true;
 }
 
-/* Assign a value to a variable, creating it if need be */
-/* Returns false on failure (bad name) */
+/* shortcut version if variable number is known */
 static bool
-putVariableValue(CState *st, const char *context, char *name,
-				 const PgBenchValue *value)
+putSymbolValue(CState *st, const char *context, int var_num,
+			   const PgBenchValue *value)
 {
-	Variable   *var;
+	Variable	*var;
 
-	var = lookupCreateVariable(st, context, name);
-	if (!var)
+	if (var_num < 0 || var_num >= st->nvariables)
+	{
+		fprintf(stderr, "%s: unexpected variable number %d\n",
+				context, var_num);
 		return false;
+	}
+
+	var = & st->variables[var_num];
+
+	if (unlikely(var->number == -1))
+		var->number = var_num;
 
 	if (var->svalue)
+	{
 		free(var->svalue);
-	var->svalue = NULL;
+		var->svalue = NULL;
+	}
 	var->value = *value;
 
 	return true;
 }
 
-/* Assign an integer value to a variable, creating it if need be */
-/* Returns false on failure (bad name) */
+/* shortcut version with variable number */
 static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putSymbolInt(CState *st, const char *context, int var_num, int64 value)
 {
-	PgBenchValue val;
+	Variable	*var;
 
-	setIntValue(&val, value);
-	return putVariableValue(st, context, name, &val);
+	if (var_num < 0 || var_num >= st->nvariables)
+	{
+		fprintf(stderr, "%s: unexpected variable number %d\n",
+				context, var_num);
+		return false;
+	}
+
+	var = & st->variables[var_num];
+
+	if (unlikely(var->number == -1))
+		var->number = var_num;
+
+	if (var->svalue)
+	{
+		free(var->svalue);
+		var->svalue = NULL;
+	}
+
+	setIntValue(&var->value, value);
+
+	return true;
 }
 
 /*
@@ -2423,16 +2410,18 @@ evaluateExpr(CState *st, PgBenchExpr *expr, PgBenchValue *retval)
 		case ENODE_VARIABLE:
 			{
 				Variable   *var;
+				int			num = expr->u.variable.varnum;
 
-				if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
-				{
-					fprintf(stderr, "undefined variable \"%s\"\n",
-							expr->u.variable.varname);
-					return false;
-				}
+				Assert(0 <= num && num < st->nvariables);
+
+				var = & st->variables[num];
 
 				if (!makeVariableValue(var))
+				{
+					fprintf(stderr, "no value for variable \"%s\"\n",
+							expr->u.variable.varname);
 					return false;
+				}
 
 				*retval = var->value;
 				return true;
@@ -2490,7 +2479,7 @@ getMetaCommand(const char *cmd)
  * Return true if succeeded, or false on error.
  */
 static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(CState *st, int var_num, char **argv, int argc)
 {
 	char		command[SHELL_COMMAND_SIZE];
 	int			i,
@@ -2544,7 +2533,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
 	command[len] = '\0';
 
 	/* Fast path for non-assignment case */
-	if (variable == NULL)
+	if (var_num == -1)
 	{
 		if (system(command))
 		{
@@ -2584,7 +2573,8 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
 				argv[0], res);
 		return false;
 	}
-	if (!putVariableInt(st, "setshell", variable, retval))
+
+	if (var_num != -1 && !putSymbolInt(st, "setshell", var_num, retval))
 		return false;
 
 #ifdef DEBUG
@@ -3344,7 +3334,7 @@ executeMetaCommand(CState *st, instr_time *now)
 			return CSTATE_ABORTED;
 		}
 
-		if (!putVariableValue(st, argv[0], argv[1], &result))
+		if (!putSymbolValue(st, argv[0], command->set_varnum, &result))
 		{
 			commandFailed(st, "set", "assignment of meta-command failed");
 			return CSTATE_ABORTED;
@@ -3414,7 +3404,7 @@ executeMetaCommand(CState *st, instr_time *now)
 	}
 	else if (command->meta == META_SETSHELL)
 	{
-		if (!runShellCommand(st, argv[1], argv + 2, argc - 2))
+		if (!runShellCommand(st, command->set_varnum, argv + 2, argc - 2))
 		{
 			commandFailed(st, "setshell", "execution of meta-command failed");
 			return CSTATE_ABORTED;
@@ -3422,7 +3412,7 @@ executeMetaCommand(CState *st, instr_time *now)
 	}
 	else if (command->meta == META_SHELL)
 	{
-		if (!runShellCommand(st, NULL, argv + 1, argc - 1))
+		if (!runShellCommand(st, -1, argv + 1, argc - 1))
 		{
 			commandFailed(st, "shell", "execution of meta-command failed");
 			return CSTATE_ABORTED;
@@ -4280,6 +4270,12 @@ process_backslash_command(PsqlScanState sstate, const char *source)
 			offsets[j] = word_offset;
 			my_command->argv[j++] = pg_strdup(word_buf.data);
 			my_command->argc++;
+
+			if (!valid_variable_name(word_buf.data))
+				syntax_error(source, lineno, my_command->first_line, my_command->argv[0],
+							 "invalid variable name", word_buf.data, -1);
+
+			my_command->set_varnum = getOrCreateSymbolId(symbol_table, word_buf.data);
 		}
 
 		/* then for all parse the expression */
@@ -4377,6 +4373,12 @@ process_backslash_command(PsqlScanState sstate, const char *source)
 		if (my_command->argc < 3)
 			syntax_error(source, lineno, my_command->first_line, my_command->argv[0],
 						 "missing argument", NULL, -1);
+
+		if (!valid_variable_name(my_command->argv[1]))
+			syntax_error(source, lineno, my_command->first_line, my_command->argv[0],
+						 "invalid variable name", my_command->argv[1], -1);
+
+		my_command->set_varnum = getOrCreateSymbolId(symbol_table, my_command->argv[1]);
 	}
 	else if (my_command->meta == META_SHELL)
 	{
@@ -5154,6 +5156,8 @@ main(int argc, char **argv)
 
 	int			i;
 	int			nclients_dealt;
+	int			scale_symbol, client_id_symbol,
+					random_seed_symbol, default_seed_symbol;
 
 #ifdef HAVE_GETRLIMIT
 	struct rlimit rlim;
@@ -5190,6 +5194,21 @@ main(int argc, char **argv)
 		login = env;
 
 	state = (CState *) pg_malloc0(sizeof(CState));
+	state->nvariables = INITIAL_SYMBOL_TABLE_SIZE;
+	state->variables = pg_malloc(sizeof(Variable) * state->nvariables);
+	for (int i = 0; i < state->nvariables; i++)
+	{
+		state->variables[i].number = -1;
+		state->variables[i].svalue = NULL;
+		state->variables[i].value.type = PGBT_NO_VALUE;
+	}
+
+	/* symbol table */
+	symbol_table = newSymbolTable();
+	scale_symbol = getOrCreateSymbolId(symbol_table, "scale");
+	client_id_symbol = getOrCreateSymbolId(symbol_table, "client_id");
+	random_seed_symbol = getOrCreateSymbolId(symbol_table, "random_seed");
+	default_seed_symbol = getOrCreateSymbolId(symbol_table, "default_seed");
 
 	/* set random seed early, because it may be used while parsing scripts. */
 	if (!set_random_seed(getenv("PGBENCH_RANDOM_SEED")))
@@ -5649,6 +5668,8 @@ main(int argc, char **argv)
 		exit(1);
 	}
 
+	// dumpSymbolTable(stderr, symbol_table);
+
 	/*
 	 * save main process id in the global variable because process id will be
 	 * changed after fork.
@@ -5657,32 +5678,28 @@ main(int argc, char **argv)
 
 	if (nclients > 1)
 	{
+		int			nvars = numberOfSymbols(symbol_table) + INITIAL_SYMBOL_TABLE_SIZE;
+
 		state = (CState *) pg_realloc(state, sizeof(CState) * nclients);
 		memset(state + 1, 0, sizeof(CState) * (nclients - 1));
 
+		state[0].variables = pg_realloc(state[0].variables, sizeof(Variable) * nvars);
+		state[0].nvariables = nvars;
+
+		for (i = INITIAL_SYMBOL_TABLE_SIZE; i < nvars; i++)
+		{
+			state[0].variables[i].number = -1;
+			state[0].variables[i].svalue = NULL;
+			state[0].variables[i].value.type = PGBT_NO_VALUE;
+		}
+
 		/* copy any -D switch values to all clients */
 		for (i = 1; i < nclients; i++)
 		{
-			int			j;
-
 			state[i].id = i;
-			for (j = 0; j < state[0].nvariables; j++)
-			{
-				Variable   *var = &state[0].variables[j];
-
-				if (var->value.type != PGBT_NO_VALUE)
-				{
-					if (!putVariableValue(&state[i], "startup",
-										  var->name, &var->value))
-						exit(1);
-				}
-				else
-				{
-					if (!putVariable(&state[i], "startup",
-									 var->name, var->svalue))
-						exit(1);
-				}
-			}
+			state[i].variables = pg_malloc(sizeof(Variable) * nvars);
+			state[i].nvariables = nvars;
+			memcpy(state[i].variables, state[0].variables, sizeof(Variable) * nvars);
 		}
 	}
 
@@ -5754,43 +5771,41 @@ main(int argc, char **argv)
 	 * :scale variables normally get -s or database scale, but don't override
 	 * an explicit -D switch
 	 */
-	if (lookupVariable(&state[0], "scale") == NULL)
+	if (undefined_variable_value(state[0].variables[scale_symbol]))
 	{
 		for (i = 0; i < nclients; i++)
-		{
-			if (!putVariableInt(&state[i], "startup", "scale", scale))
+			if (!putSymbolInt(&state[i], "startup", scale_symbol, scale))
 				exit(1);
-		}
 	}
 
 	/*
 	 * Define a :client_id variable that is unique per connection. But don't
 	 * override an explicit -D switch.
 	 */
-	if (lookupVariable(&state[0], "client_id") == NULL)
+	if (undefined_variable_value(state[0].variables[client_id_symbol]))
 	{
 		for (i = 0; i < nclients; i++)
-			if (!putVariableInt(&state[i], "startup", "client_id", i))
+			if (!putSymbolInt(&state[i], "startup", client_id_symbol, i))
 				exit(1);
 	}
 
 	/* set default seed for hash functions */
-	if (lookupVariable(&state[0], "default_seed") == NULL)
+	if (undefined_variable_value(state[0].variables[default_seed_symbol]))
 	{
 		uint64		seed =
 		((uint64) pg_jrand48(base_random_sequence.xseed) & 0xFFFFFFFF) |
 		(((uint64) pg_jrand48(base_random_sequence.xseed) & 0xFFFFFFFF) << 32);
 
 		for (i = 0; i < nclients; i++)
-			if (!putVariableInt(&state[i], "startup", "default_seed", (int64) seed))
+			if (!putSymbolInt(&state[i], "startup", default_seed_symbol, (int64) seed))
 				exit(1);
 	}
 
 	/* set random seed unless overwritten */
-	if (lookupVariable(&state[0], "random_seed") == NULL)
+	if (undefined_variable_value(state[0].variables[random_seed_symbol]))
 	{
 		for (i = 0; i < nclients; i++)
-			if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+			if (!putSymbolInt(&state[i], "startup", random_seed_symbol, random_seed))
 				exit(1);
 	}
 
@@ -5930,6 +5945,8 @@ main(int argc, char **argv)
 	if (exit_code != 0)
 		fprintf(stderr, "Run was aborted; the above results are incomplete.\n");
 
+	freeSymbolTable(symbol_table);
+
 	return exit_code;
 }
 
diff --git a/src/bin/pgbench/pgbench.h b/src/bin/pgbench/pgbench.h
index c4a1e298e0..243f59a8ee 100644
--- a/src/bin/pgbench/pgbench.h
+++ b/src/bin/pgbench/pgbench.h
@@ -115,6 +115,7 @@ struct PgBenchExpr
 		struct
 		{
 			char	   *varname;
+			int			varnum;
 		}			variable;
 		struct
 		{
@@ -163,4 +164,21 @@ extern void syntax_error(const char *source, int lineno, const char *line,
 extern bool strtoint64(const char *str, bool errorOK, int64 *pi);
 extern bool strtodouble(const char *str, bool errorOK, double *pd);
 
+/*
+ * A symbol table is an opaque data structure
+ */
+struct SymbolTable_st;
+typedef struct SymbolTable_st *SymbolTable;
+
+SymbolTable newSymbolTable(void);
+void freeSymbolTable(SymbolTable st);
+void dumpSymbolTable(FILE *out, SymbolTable st);
+int numberOfSymbols(SymbolTable st);
+int getOrCreateSymbolId(SymbolTable st, const char *name);
+int getSymbolId(SymbolTable st, const char *name, bool check);
+char *getSymbolName(SymbolTable st, int number);
+int rmSymbolId(SymbolTable st, const char *name);
+
+extern SymbolTable symbol_table;
+
 #endif							/* PGBENCH_H */
diff --git a/src/bin/pgbench/symbol_table.c b/src/bin/pgbench/symbol_table.c
new file mode 100644
index 0000000000..e9229ea74b
--- /dev/null
+++ b/src/bin/pgbench/symbol_table.c
@@ -0,0 +1,317 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#if defined(ENABLE_THREAD_SAFETY)
+#include <pthread.h>
+#else
+#define pthread_mutex_t void *
+#define pthread_mutex_init(m, p) 0
+#define pthread_mutex_lock(m)
+#define pthread_mutex_unlock(m)
+#define pthread_mutex_destroy(m)
+#endif
+
+typedef enum { false, true } bool;
+
+struct SymbolTable_st;
+typedef struct SymbolTable_st *SymbolTable;
+SymbolTable newSymbolTable(void);
+void freeSymbolTable(SymbolTable st);
+void dumpSymbolTable(FILE *out, SymbolTable st);
+int numberOfSymbols(SymbolTable st);
+int getOrCreateSymbolId(SymbolTable st, const char *name);
+int getSymbolId(SymbolTable st, const char *name, bool check);
+char *getSymbolName(SymbolTable st, int number);
+int rmSymbolId(SymbolTable st, const char *name);
+
+/*----
+ *
+ * simplistic SymbolTable management.
+ *
+ * Implement a string -> int table based on a tree on bytes.
+ * The data structure makes more sense if many searches are expected.
+ *
+ * Each '\0'-terminated string is store in the tree at the first non-unique
+ * prefix, possibly including the final '\0' if the string is a prefix of
+ * another.
+ *
+ * Storage requirement n * len(s).
+ *
+ * Storing a symbol costs len(s).
+ *
+ * Finding a symbol costs len(s) if checked, len(unique_prefix(s)) if unchecked.
+ */
+
+typedef struct Symbol
+{
+	char   *name;
+	int		number;
+} Symbol;
+
+typedef struct SymbolTableTree
+{
+	bool				is_leaf;
+	union {
+		Symbol						symb;	/* is_leaf */
+		struct SymbolTableTree	   *tree;	/* ! ~ */
+	} u;
+} SymbolTableTree;
+
+struct SymbolTable_st
+{
+	int					nsymbols;		/* next symbol number */
+	int					stored;			/* current #symbols in st */
+	SymbolTableTree	   *root;			/* symbol tree root */
+	int					symbols_size;	/* size of following array */
+	Symbol			  **symbols;		/* array of existing symbols */
+	pthread_mutex_t		lock;
+};
+
+/* allocate a branching node of empty leaves */
+static SymbolTableTree * newSymbolTableTree(void)
+{
+	SymbolTableTree * tree = malloc(sizeof(SymbolTableTree) * 256);
+
+	for (int i = 0; i < 256; i++)
+	{
+		tree[i].is_leaf = true;
+		tree[i].u.symb.name = NULL;
+		tree[i].u.symb.number = -1;
+	}
+
+	return tree;
+}
+
+/* allocate a symbol table */
+SymbolTable newSymbolTable(void)
+{
+	SymbolTable st = malloc(sizeof(struct SymbolTable_st));
+
+	st->nsymbols = 0;
+	st->stored = 0;
+	st->root = malloc(sizeof(SymbolTableTree));
+	st->root->is_leaf = true;
+	st->root->u.symb.name = NULL;
+	st->root->u.symb.number = -1;
+	st->symbols_size = 16;
+	st->symbols = malloc(sizeof(Symbol *) * st->symbols_size);
+	if (pthread_mutex_init(&st->lock, NULL) != 0)
+		abort();
+
+	return st;
+}
+
+/* recursively free a symbol table tree */
+static void freeSymbolTableTree(SymbolTableTree * tree)
+{
+	if (tree->is_leaf)
+	{
+		if (tree->u.symb.name != NULL)
+			free(tree->u.symb.name);
+	}
+	else
+	{
+		for (int i = 0; i < 256; i++)
+			freeSymbolTableTree(& tree->u.tree[i]);
+		free(tree->u.tree);
+	}
+}
+
+/* free symbol table st */
+void freeSymbolTable(SymbolTable st)
+{
+	freeSymbolTableTree(st->root);
+	free(st->root);
+	free(st->symbols);
+	pthread_mutex_destroy(&st->lock);
+	free(st);
+}
+
+/* how many symbols have been seen */
+int numberOfSymbols(SymbolTable st)
+{
+	return st->nsymbols;
+}
+
+/* dump tree to out, in byte order */
+static void dumpSymbolTableTree(FILE * out, SymbolTableTree * tree)
+{
+	if (tree == NULL)
+		fprintf(out, "(null)");
+	else if (tree->is_leaf)
+	{
+		if (tree->u.symb.name != NULL || tree->u.symb.number != -1)
+			fprintf(out, "%s -> %d\n", tree->u.symb.name, tree->u.symb.number);
+	}
+	else
+		for (int i = 0; i < 256; i++)
+			dumpSymbolTableTree(out, & tree->u.tree[i]);
+}
+
+void dumpSymbolTable(FILE * out, SymbolTable st)
+{
+	fprintf(out, "SymbolTable dump: %d symbols (%d seen)\n",
+			st->stored, st->nsymbols);
+	dumpSymbolTableTree(out, st->root);
+}
+
+/* add new symbol to a node. NOT thread-safe. */
+static int addSymbol(SymbolTable st, SymbolTableTree *node, const char *name)
+{
+	int		number;
+
+	number = st->nsymbols++;
+	node->u.symb.name = strdup(name);
+	node->u.symb.number = number;
+
+	if (number >= st->symbols_size)
+	{
+		st->symbols_size *= 2;
+		st->symbols = realloc(st->symbols, sizeof(Symbol *) * st->symbols_size);
+	}
+
+	st->symbols[number] = & node->u.symb;
+	st->stored++;
+
+	return number;
+}
+
+/* get existing id or create a new one for name in st */
+int getOrCreateSymbolId(SymbolTable st, const char *name)
+{
+	SymbolTableTree	*node = st->root;
+	int i = 0;
+
+	/* get down to a leaf */
+	while (!node->is_leaf)
+		node = & node->u.tree[(unsigned char) name[i++]];
+
+	if (node->u.symb.name == NULL)
+	{
+		int		number;
+
+		/* empty leaf, let us use it */
+		pthread_mutex_lock(&st->lock);
+		number = addSymbol(st, node, name);
+		pthread_mutex_unlock(&st->lock);
+
+		return number;
+	}
+	else if ((i > 0 && strcmp(node->u.symb.name + i - 1, name + i - 1) == 0) ||
+			 strcmp(node->u.symb.name, name) == 0)
+		/* already occupied by same name */
+		return node->u.symb.number;
+	else /* it contains another name */
+	{
+		Symbol	prev;
+		int		number;
+
+		/* should lock be acquire before? or redo the descent? */
+		pthread_mutex_lock(&st->lock);
+		prev = node->u.symb;
+
+		/* expand tree from current node */
+		do
+		{
+			node->is_leaf = false;
+			node->u.tree = newSymbolTableTree();
+			node->u.tree[(unsigned char) prev.name[i]].u.symb = prev;
+			node = & node->u.tree[(unsigned char) name[i++]];
+		} while (node->u.symb.name != NULL);
+
+		number = addSymbol(st, node, name);
+
+		pthread_mutex_unlock(&st->lock);
+
+		return number;
+	}
+}
+
+/* return symbol number that exists or -1 */
+int getSymbolId(SymbolTable st, const char * name, bool check)
+{
+	SymbolTableTree	   *node = st->root;
+	int					i = 0;
+
+	while (!node->is_leaf)
+		node = & node->u.tree[(unsigned char) name[i++]];
+
+	// - 1 if tree included final '\0' character
+	if (node->u.symb.name != NULL &&
+		(!check ||
+		 (i>0 && strcmp(node->u.symb.name + i - 1, name + i - 1) == 0) ||
+		 strcmp(node->u.symb.name, name) == 0))
+		return node->u.symb.number;
+	else
+		return -1;
+}
+
+/* return symbol name if exists, or NULL */
+char *getSymbolName(SymbolTable st, int number)
+{
+	return 0 <= number && number < st->nsymbols ? st->symbols[number]->name : NULL;
+}
+
+/* remove name from st, return its number or -1 if not found */
+int rmSymbolId(SymbolTable st, const char * name)
+{
+	SymbolTableTree	   *node;
+	int					i = 0,
+						num;
+
+	pthread_mutex_lock(&st->lock);
+
+	node = st->root;
+	while (!node->is_leaf)
+		node = & node->u.tree[(unsigned char) name[i++]];
+
+	if (node->u.symb.name != NULL && strcmp(node->u.symb.name, name) == 0)
+	{
+		num = node->u.symb.number;
+
+		free(node->u.symb.name);
+		node->u.symb.name = NULL;
+		node->u.symb.number = -1;
+		st->stored--;
+		st->symbols[num] = NULL;
+	}
+	else
+		num = -1;
+
+	pthread_mutex_unlock(&st->lock);
+
+	return num;
+}
+
+#ifdef WITH_MAIN
+int main(void)
+{
+	char *names[27] = {
+		"calvin", "hobbes", "calvin", "susie", "moe", "rosalyn",
+		"v", "va", "var", "var1", "var2", "var3",
+		"var11", "var1", "var11", "hobbes", "moe", "v", "",
+		"é", "été", "ét", "étage",
+		"你好!", "你好", "你",
+		NULL
+	};
+
+	SymbolTable st = newSymbolTable();
+
+	for (int i = 0; names[i] != NULL; i++)
+	{
+		fprintf(stdout, "# %s (%ld)\n", names[i], strlen(names[i]));
+		int prev = getSymbolId(st, names[i], true);
+		int nc = getSymbolId(st, names[i], false);
+		int add = getOrCreateSymbolId(st, names[i]);
+		int post = getSymbolId(st, names[i], false);
+		int rm = rmSymbolId(st, "moe");
+		fprintf(stdout, "%d(%d) %d %d %d\n", prev, nc, add, post, rm);
+	}
+
+	dumpSymbolTable(stderr, st);
+	freeSymbolTable(st);
+
+	return 0;
+}
+#endif
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index b82d3f65c4..493f9ec883 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -622,14 +622,14 @@ SELECT LEAST(} . join(', ', (':i') x 256) . q{)}
 		[qr{unexpected function name}], q{\set i noSuchFunction()}
 	],
 	[
-		'set invalid variable name', 2,
+		'set invalid variable name', 1,
 		[qr{invalid variable name}], q{\set . 1}
 	],
 	[ 'set division by zero', 2, [qr{division by zero}], q{\set i 1/0} ],
 	[
 		'set undefined variable',
 		2,
-		[qr{undefined variable "nosuchvariable"}],
+		[qr{no value for variable "nosuchvariable"}],
 		q{\set i :nosuchvariable}
 	],
 	[ 'set unexpected char', 1, [qr{unexpected character .;.}], q{\set i ;} ],

#20

Andres Freund

andres@anarazel.de

over 6 years ago

In reply to: Fabien COELHO (#19)

Re: pgbench - implement strict TPC-B benchmark

Hi,

On 2019-08-05 17:38:23 +0200, Fabien COELHO wrote:

Which is a (somehow disappointing) * 3.3 speedup. The impact on the 3
complex expressions tests is not measurable, though.

I don't know why that could be disappointing. We put in much more work
for much smaller gains in other places.

Probably variable management should be reworked more deeply to cleanup the
code.

Agreed.

Questions:
- how likely is such a patch to pass? (IMHO not likely)

I don't see why? I didn't review the patch in any detail, but it didn't
look crazy in quick skim? Increasing how much load can be simulated
using pgbench, is something I personally find much more interesting than
adding capabilities that very few people will ever use.

FWIW, the areas I find current pgbench "most lacking" during development
work are:

1) Data load speed. The data creation is bottlenecked on fprintf in a
single process. The index builds are done serially. The vacuum could
be replaced by COPY FREEZE. For a lot of meaningful tests one needs
10-1000s of GB of testdata - creating that is pretty painful.

2) Lack of proper initialization integration for custom
scripts. I.e. have steps that are in the custom script that allow -i,
vacuum, etc to be part of the script, rather than separately
executable steps. --init-steps doesn't do anything for that.

3) pgbench overhead, although that's to a significant degree libpq's fault

4) Ability to cancel pgbench and get approximate results. That currently
works if the server kicks out the clients, but not when interrupting
pgbench - which is just plain weird. Obviously that doesn't matter
for "proper" benchmark runs, but often during development, it's
enough to run pgbench past some events (say the next checkpoint).

- what is its impact to overall performance when actual queries
are performed (IMHO very small).

Obviously not huge - I'd also not expect it to be unobservably small
either. Especially if somebody went and fixed some of the inefficiency
in libpq, but even without that. And even moreso, if somebody revived
the libpq batch work + the relevant pgbench patch, because that removes
a lot of the system/kernel overhead, due to the crazy number of context
switches (obviously not realistic for all workloads, but e.g. for plenty
java workloads, it is), but leaves the same number of variable accesses
etc.

Greetings,

Andres Freund

#21

Robert Haas

robertmhaas@gmail.com

over 6 years ago

In reply to: Fabien COELHO (#15)

Re: pgbench - implement strict TPC-B benchmark

On Fri, Aug 2, 2019 at 2:38 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Ok, one thread cannot feed an N core server if enough client are executed
per thread and the server has few things to do.

Right ... where N is, uh, TWO.

The point I'm clumsily trying to make is that pgbench-specific overheads
are quite small: Any benchmark driver would have pretty much at least the
same costs, because you have the cpu cost of the tool itself, then the
library it uses, eg lib{pq,c}, then syscalls. Even if the first costs are
reduced to zero, you still have to deal with the database through the
system, and this part will be the same.

I'm not convinced. Perhaps you're right; after all, it's not like
pgbench is doing any real work. On the other hand, I've repeatedly
been annoyed by how inefficient pgbench is, so I'm not totally
prepared to concede that any benchmark driver would have the same
costs, or that it's a reasonably well-optimized client application.
When I run the pgbench, I want to know how fast the server is, not how
fast pgbench is.

What name would you suggest, if it were to be made available from pgbench
as a builtin, that avoids confusion with "tpcb-like"?

I'm not in favor of adding it as a built-in. If we were going to do
it, I don't know that we could do better than tcpb-like-2, and I'm not
excited about that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#22

Fabien COELHO

coelho@cri.ensmp.fr

over 6 years ago

In reply to: Andres Freund (#20)

Re: pgbench - implement strict TPC-B benchmark

Hello Andres,

Which is a (somehow disappointing) * 3.3 speedup. The impact on the 3
complex expressions tests is not measurable, though.

I don't know why that could be disappointing. We put in much more work
for much smaller gains in other places.

Probably, but I thought I would have a better deal by eliminating most
string stuff from variables.

Questions:
- how likely is such a patch to pass? (IMHO not likely)

I don't see why? I didn't review the patch in any detail, but it didn't
look crazy in quick skim? Increasing how much load can be simulated
using pgbench, is something I personally find much more interesting than
adding capabilities that very few people will ever use.

Yep, but my point is that the bottleneck is mostly libpq/system, as I
tried to demonstrate with the few experiments I reported.

FWIW, the areas I find current pgbench "most lacking" during development
work are:

1) Data load speed. The data creation is bottlenecked on fprintf in a
single process.

snprintf actually, could be replaced.

I submitted a patch to add more control on initialization, including a
server-side loading feature, i.e. the client does not send data, the
server generates its own, see 'G':

https://commitfest.postgresql.org/24/2086/

However on my laptop it is slower than client-side loading on a local
socket. The client version is doing around 70 MB/s, the client load is
20-30%, postgres load is 85%, but I'm not sure I can hope for much more on
my SSD. On my laptop the bottleneck is postgres/disk, not fprintf.

The index builds are done serially. The vacuum could be replaced by COPY
FREEZE.

Well, it could be added?

For a lot of meaningful tests one needs 10-1000s of GB of testdata -
creating that is pretty painful.

Yep.

2) Lack of proper initialization integration for custom
scripts.

Hmmm…

You can always write a psql script for schema and possibly simplistic data
initialization?

However, generating meaningful pseudo-random data for an arbitrary schema
is a pain. I did an external tool for that a few years ago:

http://www.coelho.net/datafiller.html

but it is still a pain.

I.e. have steps that are in the custom script that allow -i, vacuum, etc
to be part of the script, rather than separately executable steps.
--init-steps doesn't do anything for that.

Sure. It just gives some control.

3) pgbench overhead, although that's to a significant degree libpq's fault

I'm afraid that is currently the case.

4) Ability to cancel pgbench and get approximate results. That currently
works if the server kicks out the clients, but not when interrupting
pgbench - which is just plain weird. Obviously that doesn't matter
for "proper" benchmark runs, but often during development, it's
enough to run pgbench past some events (say the next checkpoint).

Do you mean have a report anyway on "Ctrl-C"?

I usually do a -P 1 to see the progress, but making Ctrl-C work should be
reasonably easy.

- what is its impact to overall performance when actual queries
are performed (IMHO very small).

Obviously not huge - I'd also not expect it to be unobservably small
either.

Hmmm… Indeed, the 20 \set script runs at 2.6 M/s, that is 0.019 µs per
\set, and any discussion over the connection is at least 15 µs (for one
client on a local socket).

--
Fabien.

#23

Fabien COELHO

coelho@cri.ensmp.fr

over 6 years ago

In reply to: Robert Haas (#21)

Re: pgbench - implement strict TPC-B benchmark

Hello Robert,

Ok, one thread cannot feed an N core server if enough client are executed
per thread and the server has few things to do.

Right ... where N is, uh, TWO.

Yes, two indeed… For low-work cpu-bound load, given libpq & system
overheads, you cannot really hope for a better deal.

I think that the documentation could be clearer about thread/core
recommendations, i.e. how much ressources you should allocate to pgbench
so that the server is more likely to be the bottleneck, in the "Good
Practices" section.

The point I'm clumsily trying to make is that pgbench-specific overheads
are quite small: Any benchmark driver would have pretty much at least the
same costs, because you have the cpu cost of the tool itself, then the
library it uses, eg lib{pq,c}, then syscalls. Even if the first costs are
reduced to zero, you still have to deal with the database through the
system, and this part will be the same.

I'm not convinced. Perhaps you're right; after all, it's not like
pgbench is doing any real work.

Yep, pgbench is not doing much beyond going from one libpq call to the
next. It can be improved, but the overhead is already reasonably low.

On the other hand, I've repeatedly been annoyed by how inefficient
pgbench is, so I'm not totally prepared to concede that any benchmark
driver would have the same costs, or that it's a reasonably
well-optimized client application. When I run the pgbench, I want to
know how fast the server is, not how fast pgbench is.

Hmmm. You cannot fully isolate components: You can only basically learn
how fast the server is when accessed through a libpq connection, which is
quite reasonable because this is what a user can expect anyway.

--
Fabien.

#24

Dmitry Dolgov

9erthalion6@gmail.com

over 6 years ago

In reply to: Fabien COELHO (#23)

1 attachment(s)

Re: pgbench - implement strict TPC-B benchmark

On Mon, Aug 5, 2019 at 10:46 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

The index builds are done serially. The vacuum could be replaced by COPY
FREEZE.

Well, it could be added?

While doing benchmarking using different tools, including pgbench, I found it
useful as a temporary hack to add copy freeze and maintenance_work_mem options
(the last one not as an env variable, just as a set before, although not sure
if it's a best idea). Is it similar to what you were talking about?

Attachments:

v1-0001-pgbench-load-options.patchapplication/octet-stream; name=v1-0001-pgbench-load-options.patchDownload

From e26f0e6256f26bd50d8c35db25255105c32c9eee Mon Sep 17 00:00:00 2001
From: erthalion <9erthalion6@gmail.com>
Date: Wed, 7 Aug 2019 14:56:34 +0200
Subject: [PATCH v1] pgbench load options

Add options for doing "copy freeze" and set maintenance_work_mem
---
 src/bin/pgbench/pgbench.c | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 570cf3306a..43ec531362 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -231,6 +231,9 @@ const char *progname;
 
 volatile bool timer_exceeded = false;	/* flag from signal handler */
 
+int	maintenance_work_mem = 0; 	/* maintenance_work_mem for initialization */
+bool copy_freeze = false; 		/* do copy freeze */
+
 /*
  * Variable definitions.
  *
@@ -652,6 +655,8 @@ usage(void)
 		   "  --random-seed=SEED       set random seed (\"time\", \"rand\", integer)\n"
 		   "  --sampling-rate=NUM      fraction of transactions to log (e.g., 0.01 for 1%%)\n"
 		   "  --show-script=NAME       show builtin script code, then exit\n"
+		   "  --maintenance-work-mem   maintenance_work_mem in MB for init\n"
+		   "  --copy-freeze   		   perform copy freeze\n"
 		   "\nCommon options:\n"
 		   "  -d, --debug              print debugging output\n"
 		   "  -h, --host=HOSTNAME      database server host or socket directory\n"
@@ -690,6 +695,19 @@ is_an_int(const char *str)
 	return *ptr == '\0';
 }
 
+static const char *
+appendMemory(const char *sql)
+{
+	char *memory;
+
+	if (maintenance_work_mem == 0)
+		return sql;
+
+	memory = psprintf("set maintenance_work_mem to '%dMB'",
+					  maintenance_work_mem);
+
+	return psprintf("%s; %s;", memory, sql);
+}
 
 /*
  * strtoint64 -- convert a string to 64-bit integer
@@ -3705,6 +3723,8 @@ initGenerateData(PGconn *con)
 	double		elapsed_sec,
 				remaining_sec;
 	int			log_interval = 1;
+	const char 	*copy_sql = "copy pgbench_accounts from stdin";
+	const char 	*copy_freeze_sql = "copy pgbench_accounts from stdin freeze";
 
 	fprintf(stderr, "generating data...\n");
 
@@ -3749,7 +3769,7 @@ initGenerateData(PGconn *con)
 	/*
 	 * accounts is big enough to be worth using COPY and tracking runtime
 	 */
-	res = PQexec(con, "copy pgbench_accounts from stdin");
+	res = PQexec(con, appendMemory(copy_freeze ? copy_freeze_sql : copy_sql));
 	if (PQresultStatus(res) != PGRES_COPY_IN)
 	{
 		fprintf(stderr, "%s", PQerrorMessage(con));
@@ -3857,7 +3877,7 @@ initCreatePKeys(PGconn *con)
 	{
 		char		buffer[256];
 
-		strlcpy(buffer, DDLINDEXes[i], sizeof(buffer));
+		strlcpy(buffer, appendMemory(DDLINDEXes[i]), sizeof(buffer));
 
 		if (index_tablespace != NULL)
 		{
@@ -5126,6 +5146,8 @@ main(int argc, char **argv)
 		{"foreign-keys", no_argument, NULL, 8},
 		{"random-seed", required_argument, NULL, 9},
 		{"show-script", required_argument, NULL, 10},
+		{"maintenance-work-mem", required_argument, NULL, 11},
+		{"copy-freeze", no_argument, NULL, 12},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -5486,6 +5508,18 @@ main(int argc, char **argv)
 					exit(0);
 				}
 				break;
+			case 11:			/* maintenance-work-mem */
+				maintenance_work_mem = atoi(optarg);
+				if (maintenance_work_mem <= 0)
+				{
+					fprintf(stderr, "invalid maintenance_work_mem: \"%s\"\n",
+							optarg);
+					exit(1);
+				}
+				break;
+			case 12:			/* copy-freeze */
+				copy_freeze = true;
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
-- 
2.21.0

#25

Fabien COELHO

coelho@cri.ensmp.fr

over 6 years ago

In reply to: Dmitry Dolgov (#24)

Re: pgbench - implement strict TPC-B benchmark

Hello Dmitry,

Well, it could be added?

While doing benchmarking using different tools, including pgbench, I found it
useful as a temporary hack to add copy freeze and maintenance_work_mem options
(the last one not as an env variable, just as a set before, although not sure
if it's a best idea). Is it similar to what you were talking about?

About this patch:

Concerning the --maintenance... option, ISTM that there could rather be a
generic way to provide "set" settings, not a specific option for a
specific parameter with a specific unit. Moreover, ISTM that it only needs
to be set once on a connection, not per command. I'd suggest something
like:

--connection-initialization '...'

That would be issue when a connection is started, for any query, then the
effect would be achieved with:

pgbench --conn…-init… "SET maintenance_work_main TO '12MB'" ...

The --help does not say that the option expects a parameter.

Also, in you patch it is a initialization option, but the code does not
check for that.

Concerning the freeze option:

It is also a initialization-specific option that should be checked for
that.

The option does not make sense if

The alternative queries could be managed simply without intermediate
variables.

Pgbench documentation is not updated.

There are no tests.

This patch should be submitted in its own thread to help manage it in the
CF app.

--
Fabien.

#26

Dmitry Dolgov

9erthalion6@gmail.com

over 6 years ago

In reply to: Fabien COELHO (#25)

Re: pgbench - implement strict TPC-B benchmark

On Wed, Aug 28, 2019 at 7:37 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

While doing benchmarking using different tools, including pgbench, I found it
useful as a temporary hack to add copy freeze and maintenance_work_mem options
(the last one not as an env variable, just as a set before, although not sure
if it's a best idea). Is it similar to what you were talking about?

About this patch:

Concerning the --maintenance... option, ISTM that there could rather be a
generic way to provide "set" settings, not a specific option for a
specific parameter with a specific unit. Moreover, ISTM that it only needs
to be set once on a connection, not per command. I'd suggest something
like:

--connection-initialization '...'

That would be issue when a connection is started, for any query, then the
effect would be achieved with:

pgbench --conn…-init… "SET maintenance_work_main TO '12MB'" ...

The --help does not say that the option expects a parameter.

Also, in you patch it is a initialization option, but the code does not
check for that.

Concerning the freeze option:

It is also a initialization-specific option that should be checked for
that.

The option does not make sense if

The alternative queries could be managed simply without intermediate
variables.

Pgbench documentation is not updated.

There are no tests.

This patch should be submitted in its own thread to help manage it in the
CF app.

Thanks, that was a pretty deep answer for what was supposed to be just an
alignment question :) But sure, I can prepare a proper version and post it
separately.