pgbench: faster version of tpcb-like transaction
If all the data is in memory and you have a system with fast fsyncs (or are
running with fsync off, or unlogged tables, or synchronous_commit off),
then the big bottleneck in pgbench is the amount of back and forth between
the pgbench program and the backend. There are 7 commands per transaction.
It is easy to package 5 of those commands into a single PL/pgSQL function,
with the other two being implicit via the standard auto-commit behavior
when explicit transactions are not opened. The attached patch does that,
under the name tpcb-func. I first named it tpcb-like-func, but one builtin
name can't be a prefix of another, so that won't work.
It creates the function unconditionally during -i, because there is no way
to know whether the run will end up using it. I think this is OK.
PL/pgSQL is installed by default in all supported versions. If someone has
gone through the bother of uninstalling it, I don't see a need to
accommodate them here.
I get nearly a 3-fold speedup using the new transaction, from 9184
to 26383 TPS, on an 8-CPU machine using scale 50 and:
PGOPTIONS="-c synchronous_commit=off" pgbench -c32 -j32 -T60 -b tpcb-like
I think this should be committed as a built-in, not just a user-defined
transaction, because I would like to see it widely used. In fact, if it
weren't for historical consistency I would say it should be the default
transaction. Wanting to measure IPC overhead is a valid thing to do, but
certainly isn't the most common thing people want to do with pgbench. If a
user is limited by IO, it wouldn't matter which transaction they use, and
if they are not limited by IO then this transaction is more likely to be
the right one for them than the current default one is.
Also, as a user-defined transaction with -f, you have to go out of your way
to create the function (no "-i" support) and to make sure :scale gets set
correctly during runs (as it won't be automatically read from the
pgbench_branches table, you have to give it manually with -D).
Cheers,
Jeff
Attachments:
pgbench_function_v1.patch (application/octet-stream)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
new file mode 100644
index 03e1212..a51c014
*** a/doc/src/sgml/ref/pgbench.sgml
--- b/doc/src/sgml/ref/pgbench.sgml
*************** pgbench <optional> <replaceable>options<
*** 269,275 ****
Add the specified built-in script to the list of executed scripts.
An optional integer weight after <literal>@</> allows to adjust the
probability of drawing the script. If not specified, it is set to 1.
! Available built-in scripts are: <literal>tpcb-like</>,
<literal>simple-update</> and <literal>select-only</>.
Unambiguous prefixes of built-in names are accepted.
With special name <literal>list</>, show the list of built-in scripts
--- 269,275 ----
Add the specified built-in script to the list of executed scripts.
An optional integer weight after <literal>@</> allows to adjust the
probability of drawing the script. If not specified, it is set to 1.
! Available built-in scripts are: <literal>tpcb-like</>, <literal>tpcb-func</>,
<literal>simple-update</> and <literal>select-only</>.
Unambiguous prefixes of built-in names are accepted.
With special name <literal>list</>, show the list of built-in scripts
*************** pgbench <optional> <replaceable>options<
*** 726,731 ****
--- 726,737 ----
</orderedlist>
<para>
+ If you select the <literal>tpcb-func</> built-in,
+ the above steps are carried out by a single call to a <application>PL/pgSQL</> function,
+ reducing the overhead from inter-process-communication.
+ </para>
+
+ <para>
If you select the <literal>simple-update</> built-in (also <option>-N</>),
steps 4 and 5 aren't included in the transaction.
This will avoid update contention on these tables, but
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
new file mode 100644
index ae78c7b..00e82ef
*** a/src/bin/pgbench/pgbench.c
--- b/src/bin/pgbench/pgbench.c
*************** static const BuiltinScript builtin_scrip
*** 425,430 ****
--- 425,439 ----
"END;\n"
},
{
+ "tpcb-func",
+ "<builtin: TPC-B (sort of) in PL/pgSQL>",
+ "\\set aid random(1, " CppAsString2(naccounts) " * :scale)\n"
+ "\\set bid random(1, " CppAsString2(nbranches) " * :scale)\n"
+ "\\set tid random(1, " CppAsString2(ntellers) " * :scale)\n"
+ "\\set delta random(-5000, 5000)\n"
+ "select * from pgbench_transaction(:aid, :bid, :tid, :delta);\n"
+ },
+ {
"simple-update",
"<builtin: simple update>",
"\\set aid random(1, " CppAsString2(naccounts) " * :scale)\n"
*************** init(bool is_no_vacuum)
*** 2688,2693 ****
--- 2697,2716 ----
executeStatement(con, buffer);
}
+ executeStatement(con,
+ "create or replace function pgbench_transaction(arg_aid int, arg_bid int, arg_tid int, arg_delta int) returns int as $$"
+ "DECLARE\n"
+ "abal int;\n"
+ "BEGIN\n"
+ "UPDATE pgbench_accounts SET abalance = abalance + arg_delta WHERE aid = arg_aid;\n"
+ "SELECT abalance into abal FROM pgbench_accounts WHERE aid = arg_aid;\n"
+ "UPDATE pgbench_tellers SET tbalance = tbalance + arg_delta WHERE tid = arg_tid;\n"
+ "UPDATE pgbench_branches SET bbalance = bbalance + arg_delta WHERE bid = arg_bid;\n"
+ "INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (arg_tid, arg_bid, arg_aid, arg_delta, CURRENT_TIMESTAMP);\n"
+ "RETURN abal;\n"
+ "END;\n"
+ "$$ language plpgsql");
+
executeStatement(con, "begin");
for (i = 0; i < nbranches * scale; i++)
On Sat, Aug 26, 2017 at 3:53 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
I get nearly a 3-fold speedup using the new transaction, from 9184 to 26383
TPS, on an 8-CPU machine using scale 50 and:
PGOPTIONS="-c synchronous_commit=off" pgbench -c32 -j32 -T60 -b tpcb-like
What about with "-M prepared"? I think that most of us use that
setting already, especially with CPU-bound workloads.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Aug 26, 2017 at 4:28 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Sat, Aug 26, 2017 at 3:53 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
I get nearly a 3-fold speedup using the new transaction, from 9184 to 26383
TPS, on an 8-CPU machine using scale 50 and:
PGOPTIONS="-c synchronous_commit=off" pgbench -c32 -j32 -T60 -b tpcb-like
What about with "-M prepared"? I think that most of us use that
setting already, especially with CPU-bound workloads.
I still get a 2-fold improvement, from 13668 to 27036 TPS, when both
transactions are tested with -M prepared.
I am surprised; I usually haven't seen that much difference for the default
queries between prepared and unprepared, to the point that I got out of the
habit of testing with it. But back when I was systematically testing with and
without it, I did notice that it changed a lot depending on hardware
and concurrency. And of course from version to version different
bottlenecks come and go.
And thanks to Tom for letting me put -M at the end of the command line now.
Cheers,
Jeff
On Sat, Aug 26, 2017 at 4:59 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
I still get a 2-fold improvement, from 13668 to 27036 TPS, when both
transactions are tested with -M prepared.
I am surprised; I usually haven't seen that much difference for the default
queries between prepared and unprepared, to the point that I got out of the
habit of testing with it. But back when I was systematically testing with and
without it, I did notice that it changed a lot depending on hardware and
concurrency. And of course from version to version different bottlenecks
come and go.
I must admit that I had a similar unpleasant surprise at one point --
"-M prepared" seems to matter *a lot* these days. That's the default
that I'd change, if any.
--
Peter Geoghegan
Jeff Janes <jeff.janes@gmail.com> writes:
If all the data is in memory and you have a system with fast fsyncs (or are
running with fsync off, or unlogged tables, or synchronous_commit off),
then the big bottleneck in pgbench is the amount of back and forth between
the pgbench program and the backend. There are 7 commands per transaction.
Yeah ...
It is easy to package 5 of those commands into a single PL/pgSQL function,
with the other two being implicit via the standard auto-commit behavior
when explicit transactions are not opened. The attached patch does that,
under the name tpcb-func. I first named it tpcb-like-func, but one builtin
name can't be a prefix of another, so that won't work.
I dunno, it seems like this proposal involves jacking up the test case
and driving a completely different one underneath. There is no reason
to consider that you've improved the benchmark results --- you've just
substituted a different benchmark, one with no historical basis, and
not a lot of field justification either.
Wanting to measure IPC overhead is a valid thing to do, but
certainly isn't the most common thing people want to do with pgbench.
I think that's nonsense. Measuring how fast PG can do client interactions
is EXACTLY what this is about. Certainly, pushing SQL operations into
server-side functions is a great way to reduce network overhead, but it
has nothing to do with what we choose as a benchmark.
regards, tom lane
Hello,
If all the data is in memory and you have a system with fast fsyncs (or
are running with fsync off, or unlogged tables, or synchronous_commit
off), then the big bottleneck in pgbench is the amount of back and forth
between the pgbench program and the backend.
Sure. The throughput of a benchmark depends on a bottleneck, which may be
disk I/O, CPU, network, load... depending on the test conditions.
I tested quite a few variants for my PgDay Paris 2017 talk, including PL
functions, see https://wiki.postgresql.org/wiki/PgDay_Paris_2017.
--
Fabien.
Hello Tom,
I dunno, it seems like this proposal involves jacking up the test case
and driving a completely different one underneath. There is no reason
to consider that you've improved the benchmark results --- you've just
substituted a different benchmark, one with no historical basis, and
not a lot of field justification either.
ISTM that putting some SQL in a function and calling it is standard
practice in some classes of applications, although probably not the most
frequent.
Moreover, as far as the TPC-B benchmark is concerned, it looks like a
perfectly legitimate implementation of the benchmark: the transaction
profile (Section 1.2) is described as 4 inputs sent in and one result
returned. The fact that the SQL commands are sent one at a time by the
client to the server is a pgbench choice that I would not have made if I
wanted to show the best TPC-B numbers with Pg.
Nor does it mean that it is a bad idea to do so... For instance an ORM web
application might tend to generate simple unprepared CRUD queries and
interact a lot back and forth, and the default test is not too bad at
reflecting that particular kind of behavior.
Basically there are different kinds of application implementations, and
different tests could reflect those. So I am fine with providing more
benchmarking options to pgbench, and I intend to do so once its
capabilities are improved.
A caveat I have with Jeff's patch is that "tpcb-func" is a misnomer because
pgbench does NOT implement tpcb per spec, and it is my intention to
propose a variant which does implement the spec when possible. Now I think
that I'm also responsible for the prefix constraint on names...
--
Fabien.
About the patch:
I'm generally in favor of providing more options to pgbench, especially if
it can give optimization ideas to the performance conscious user.
I think that the name should be "tpcb-like-plfunc": the script does not
implement tpcb per spec, and such a function could be written in another
language with some performance benefit, or not.
Maybe that means relaxing the prefix condition to "take the first matching
name" when prefixes are used.
If you are reimplementing the transaction anyway, you could consider using
UPDATE RETURNING instead of SELECT to get the balance. On the other hand
the doc says that the "steps" are put in a PL function, so maybe it should
reflect the original script.
I'm surprised by:
"select * from pgbench_transaction(:aid, :bid, :tid, :delta);\n"
Why not simply:
"select pgbench_transaction(:aid, :bid, :tid, :delta);\n"
I would suggest using a more precise function name, in case other
functions are thought of. Maybe "pgbench_tpcb_like_plfunc".
I would suggest indenting the PL/pgSQL function better, putting keywords and
types in capitals, and explicitly adding the properties of the function (e.g.
STRICT, VOLATILE?).
There is a spurious space at the end of the executeStatement call line.
The patch potentially interacts with other patches in the long and
slow queue...
As usual with pgbench there are no regression tests.
--
Fabien.
Jeff Janes <jeff.janes@gmail.com> writes:
If all the data is in memory and you have a system with fast fsyncs (or are
running with fsync off, or unlogged tables, or synchronous_commit off),
then the big bottleneck in pgbench is the amount of back and forth between
the pgbench program and the backend. There are 7 commands per transaction.
Yeah ...
It is easy to package 5 of those commands into a single PL/pgSQL function,
with the other two being implicit via the standard auto-commit behavior
when explicit transactions are not opened. The attached patch does that,
under the name tpcb-func. I first named it tpcb-like-func, but one builtin
name can't be a prefix of another, so that won't work.
I dunno, it seems like this proposal involves jacking up the test case
and driving a completely different one underneath. There is no reason
to consider that you've improved the benchmark results --- you've just
substituted a different benchmark, one with no historical basis, and
not a lot of field justification either.
Wanting to measure IPC overhead is a valid thing to do, but
certainly isn't the most common thing people want to do with pgbench.
I think that's nonsense. Measuring how fast PG can do client interactions
is EXACTLY what this is about. Certainly, pushing SQL operations into
server-side functions is a great way to reduce network overhead, but it
has nothing to do with what we choose as a benchmark.
The current implementation of pgbench allows testing the behavior of
Pgpool-II (or any proxy-type middleware) on PostgreSQL clusters. For
example, Pgpool-II sends write queries to the master DB node and read
queries to standby nodes to distribute the load among DB nodes.
With the proposed implementation it is not possible to do that kind of
test anymore since everything is packed into a function.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Hi,
On 2017-08-28 08:05:11 +0900, Tatsuo Ishii wrote:
With the proposed implementation it is not possible to do that kind of
test anymore since everything is packed into a function.
Don't think anybody is proposing to remove the existing way to run
pgbench, so I'm not sure what your point is?
Greetings,
Andres Freund
Don't think anybody is proposing to remove the existing way to run
pgbench, so I'm not sure what your point is?
I know. I just wanted to point out that the proposal is not good for
cluster environment tests.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
On Sun, Aug 27, 2017 at 12:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Janes <jeff.janes@gmail.com> writes:
It is easy to package 5 of those commands into a single PL/pgSQL function,
with the other two being implicit via the standard auto-commit behavior
when explicit transactions are not opened. The attached patch does that,
under the name tpcb-func. I first named it tpcb-like-func, but one builtin
name can't be a prefix of another, so that won't work.
I dunno, it seems like this proposal involves jacking up the test case
and driving a completely different one underneath. There is no reason
to consider that you've improved the benchmark results --- you've just
substituted a different benchmark, one with no historical basis, and
not a lot of field justification either.
Performance comparison between major releases matters only if the same
set of tests is used.
Wanting to measure IPC overhead is a valid thing to do, but
certainly isn't the most common thing people want to do with pgbench.
I think that's nonsense. Measuring how fast PG can do client interactions
is EXACTLY what this is about. Certainly, pushing SQL operations into
server-side functions is a great way to reduce network overhead, but it
has nothing to do with what we choose as a benchmark.
This thread makes me think that it would be a good idea to add in the
documentation of pgbench a section that gives out a set of scripts
that can be used for emulating more patterns instead of having them in
the code. The proposed test has value if one would like to check whether,
when there is a lot of latency, it is better for an application to move
more things server-side than to follow the pattern of the existing tpcb
test of pgbench, but that's not the end of it.
--
Michael
On 27 Aug 2017, at 08:37, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
About the patch:
I'm generally in favor of providing more options to pgbench, especially if it can give optimization ideas to the performance conscious user.
I think that the name should be "tpcb-like-plfunc": the script does not implement tpcb per spec, and such a function could be written in another language with some performance benefit, or not.
Maybe that means relaxing the prefix condition to "take the first matching name" when prefixes are used.
If you are reimplementing the transaction anyway, you could consider using UPDATE RETURNING instead of SELECT to get the balance. On the other hand the doc says that the "steps" are put in a PL function, so maybe it should reflect the original script.
I'm surprised by:
"select * from pgbench_transaction(:aid, :bid, :tid, :delta);\n"
Why not simply:
"select pgbench_transaction(:aid, :bid, :tid, :delta);\n"
I would suggest using a more precise function name, in case other functions are thought of. Maybe "pgbench_tpcb_like_plfunc".
I would suggest indenting the PL/pgSQL function better, putting keywords and types in capitals, and explicitly adding the properties of the function (e.g. STRICT, VOLATILE?).
There is a spurious space at the end of the executeStatement call line.
The patch potentially interacts with other patches in the long and slow queue...
As usual with pgbench there are no regression tests.
This patch has been Waiting for author during the commitfest without updates,
moving to Returned with feedback.
cheers ./daniel