WIP Patch: Pgbench Serialization and deadlock errors
Hello, hackers!
This is the fourth version of the patch for pgbench. As with the previous
version, most of the changes were made thanks to the comments of Fabien
Coelho. Here I propose not to abort the client in pgbench in case of
various errors, including unsuccessful meta commands. As in the previous
version, transactions with serialization and deadlock failures are
rolled back and retried until they end successfully or their number of
tries reaches the maximum.
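As an illustration only (this is not the patch's C implementation; run_transaction and rollback are hypothetical stand-ins), the retry policy just described can be sketched like this:

```python
# Sketch of the retry policy described above: a transaction that fails with
# a serialization or deadlock error is rolled back and tried again, until
# it either succeeds or has used up max_tries attempts (--max-tries).

class SerializationFailure(Exception):  # corresponds to SQLSTATE 40001
    pass

class DeadlockDetected(Exception):      # corresponds to SQLSTATE 40P01
    pass

def run_with_retries(run_transaction, rollback, max_tries=1):
    """Run one transaction; return (succeeded, tries_used)."""
    for attempt in range(1, max_tries + 1):
        try:
            run_transaction()
            return True, attempt
        except (SerializationFailure, DeadlockDetected):
            rollback()  # undo the failed try before repeating it
    # the last try also failed: the transaction is reported as failed
    return False, max_tries
```

With max_tries = 1 (the default) a serialization or deadlock failure is never retried and simply counts as an error.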
In detail:
* A client's run is aborted only in case of a serious error, for
example, if the connection with the backend was lost. Otherwise, if the
execution of an SQL or meta command fails, the client's run continues
normally, except that the current transaction block is terminated if
possible.
* If a failed transaction block cannot be terminated in the current
script, its commands after the error, as well as the commands of the
following scripts, are processed as usual, so you can get a lot of
errors of type "in failed SQL transaction" (when the current SQL
transaction is aborted and commands are ignored until the end of the
transaction block). In such cases you can use the separate statistics
of these errors in all reports.
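For context, the retriable failures and the error "in failed SQL transaction" correspond to standard PostgreSQL SQLSTATE codes (40001 for serialization failures, 40P01 for deadlocks, 25P02 for "in failed SQL transaction"). A toy classifier, with function names of my own choosing, shows how the categories separate:

```python
# Classify a PostgreSQL SQLSTATE the way the patch distinguishes failures:
# serialization and deadlock failures are candidates for retrying, errors
# "in failed SQL transaction" (25P02) get their own statistics, and
# everything else is just another error.

RETRIABLE = {
    "40001": "serialization_failure",
    "40P01": "deadlock_detected",
}

def classify_failure(sqlstate):
    if sqlstate in RETRIABLE:
        return RETRIABLE[sqlstate]
    if sqlstate == "25P02":
        return "in_failed_sql_transaction"
    return "another_failure"

def can_retry(sqlstate):
    """Only serialization and deadlock failures may be retried."""
    return sqlstate in RETRIABLE
```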
The main differences from the previous version:
* support for compound commands (as I understand it, they are not
properly supported in the original version, since the execution status
is processed only for the first subcommand). Note that:
** not all compound commands that contain subcommands to start or end an
explicit transaction block can be retried. For example, you cannot use a
compound command to complete a failed transaction block before retrying.
Also, if there are any subcommands before the explicit transaction
block, the latter cannot be retried from the very beginning.
** if you want to know which subcommands of compound SQL commands
failed, use the debug output.
* retries are no longer divided by type in the reports (you can get this
information from the debug output).
* (because of all of the above) the new option --debug-fails. If you
want to distinguish between failures or errors by type, or you want to
know which subcommands of compound SQL commands failed, use the pgbench
debugging output created with the option --debug-fails or --debug. The
first option is recommended for this purpose because with the second
option the debugging output can be very large.
* tests for serialization/deadlock errors and retries use advisory locks
(as I wrote here [1], other simpler methods can sometimes fail); they
have also become lighter (28 simple tests instead of 57, 8 runs of
pgbench instead of 17).
Other smaller differences from the previous version:
* receive information from the server about whether we are in a
transaction block or not;
* naming fix (I would be very grateful if a native English speaker
commented on this): failures are called errors if they cannot be
retried;
* as usual, code cleanup; in particular, use states instead of bool
variables to complete a failed transaction block.
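The patch also extends the per-transaction log: each line is client_id transaction_no time script_no time_epoch time_us [schedule_lag] [retries], the retries field appears only when --max-tries is greater than 1, and the time field can read "failed" or "in_failed_tx" instead of a latency. A small illustrative parser (the function name is mine, not part of the patch):

```python
# Parse one line of the extended per-transaction log. With --max-tries > 1
# the last field is the number of retries, and the time field may hold the
# markers "failed", "in_failed_tx" or "skipped" instead of a latency.

def parse_log_line(line, with_retries=True):
    parts = line.split()
    record = {
        "client_id": int(parts[0]),
        "transaction_no": int(parts[1]),
        "time": parts[2] if parts[2] in ("failed", "in_failed_tx", "skipped")
                else int(parts[2]),
        "script_no": int(parts[3]),
        "time_epoch": int(parts[4]),
        "time_us": int(parts[5]),
    }
    if with_retries:
        record["retries"] = int(parts[-1])
    return record
```

For example, the line "0 0 failed 0 1499414498 84905 10" from the documentation's sample log records a transaction that still failed after all its retries.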
Patch is attached. Any suggestions are welcome!
[1]: /messages/by-id/e4c5e8cefa4a8e88f1273b0f1ee29e56@postgrespro.ru
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v4-0001-Pgbench-Errors-and-serialization-deadlock-retries.patch (text/x-diff)
From 3356215eddb32fb3d5417b34ecd8bca7e831ee0f Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Tue, 19 Dec 2017 16:10:14 +0300
Subject: [PATCH v4] Pgbench Errors and serialization/deadlock retries
A client's run is aborted only in case of a serious error, for example, if the
connection with the backend was lost. Otherwise, if the execution of an SQL or meta
command fails, the client's run continues normally, except that the current
transaction block terminates if possible.
Transactions with serialization or deadlock failures can be rolled back and
retried until they end successfully or their number of tries reaches the
maximum. You can set the maximum number of tries by using the
appropriate benchmarking option (--max-tries). The default value is 1.
If there are retries and/or errors, their statistics are printed in the
progress report, in the transaction / aggregation logs, and at the end with
the other results (overall and for each script). A transaction error is reported here only if the last try of
this transaction fails. Also retries and/or errors are printed per-command with
average latencies if you use the appropriate benchmarking option
(--report-per-command, -r) and the total number of retries and/or errors is not
zero.
If a failed transaction block cannot terminate in the current script, its
commands after the error as well as the commands of the following scripts are
processed as usual so you can get a lot of errors of type "in failed SQL
transaction" (when the current SQL transaction is aborted and commands ignored
until end of transaction block). In such cases you can use separate statistics
of these errors in all reports.
If you want to distinguish between failures or errors by type, or you want to
know which subcommands of compound SQL commands failed, use the pgbench
debugging output created with the option --debug-fails or --debug. The first
option is recommended for this purpose because with the second option the
debugging output can be very large.
---
doc/src/sgml/ref/pgbench.sgml | 349 ++++-
src/bin/pgbench/pgbench.c | 1530 +++++++++++++++++---
src/bin/pgbench/t/001_pgbench_with_server.pl | 6 +-
.../t/003_serialization_and_deadlock_fails.pl | 828 +++++++++++
4 files changed, 2492 insertions(+), 221 deletions(-)
create mode 100644 src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 4431fc3..899a70a 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -49,6 +49,7 @@
<screen>
transaction type: <builtin: TPC-B (sort of)>
+maximum number of transaction tries: 1
scaling factor: 10
query mode: simple
number of clients: 10
@@ -59,12 +60,13 @@ tps = 85.184871 (including connections establishing)
tps = 85.296346 (excluding connections establishing)
</screen>
- The first six lines report some of the most important parameter
+ The first seven lines report some of the most important parameter
settings. The next line reports the number of transactions completed
and intended (the latter being just the product of number of clients
and number of transactions per client); these will be equal unless the run
failed before completion. (In <option>-T</option> mode, only the actual
- number of transactions is printed.)
+ number of transactions is printed; see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information.)
The last two lines report the number of transactions per second,
figured with and without counting the time to start database sessions.
</para>
@@ -513,22 +515,49 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
Show progress report every <replaceable>sec</replaceable> seconds. The report
includes the time since the beginning of the run, the tps since the
last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</option>),
- the latency is computed with respect to the transaction scheduled
- start time, not the actual transaction beginning time, thus it also
- includes the average schedule lag time.
+ deviation since the last report. If any transactions ended with a
+ failed SQL or meta command since the last report, they are also reported
+ as failed. If any transactions ended with an error "in failed SQL
+ transaction block", they are reported separately as <literal>in failed
+ tx</literal> (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information). Under
+ throttling (<option>-R</option>), the latency is computed with respect
+ to the transaction scheduled start time, not the actual transaction
+ beginning time, thus it also includes the average schedule lag time. If
+ any transactions have been rolled back and retried after a
+ serialization/deadlock failure since the last report, the report
+ includes the number of such transactions and the sum of all retries. Use
+ the <option>--max-tries</option> option to enable transaction retries after
+ serialization/deadlock failures.
</para>
+ <note>
+ <para>
+ If a custom script contains multiple transactions, its run is reported
+ as failed if any of its transactions or meta commands failed. The run
+ of this script is reported as <literal>in failed tx</literal> if some
+ of its transactions ended with an error "in failed SQL transaction
+ block" and there were no other errors. The run of this script is
+ reported as retried if any of its transactions was retried after a
+ serialization/deadlock failure. The total sum of retries includes the
+ retries for all transactions in this script.
+ </para>
+ </note>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-r</option></term>
- <term><option>--report-latencies</option></term>
+ <term><option>--report-per-command</option></term>
<listitem>
<para>
- Report the average per-statement latency (execution time from the
- perspective of the client) of each command after the benchmark
- finishes. See below for details.
+ Report the following statistics for each command after the benchmark
+ finishes: the average per-statement latency (execution time from the
+ perspective of the client), the number of all errors, the number of
+ errors "in failed SQL transaction block", and the number of retries
+ after serialization or deadlock failures. The report displays the
+ columns with statistics on errors and retries only if the current
+ <application>pgbench</application> run has an error of the corresponding
+ type or retry, respectively. See below for details.
</para>
</listitem>
</varlistentry>
@@ -657,6 +686,17 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</varlistentry>
<varlistentry>
+ <term><option>--debug-fails</option></term>
+ <listitem>
+ <para>
+ Print debugging output only for errors and serialization/deadlock
+ failures. See <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--log-prefix=<replaceable>prefix</replaceable></option></term>
<listitem>
<para>
@@ -667,6 +707,43 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</varlistentry>
<varlistentry>
+ <term><option>--max-tries=<replaceable>number_of_tries</replaceable></option></term>
+ <listitem>
+ <para>
+ Set the maximum number of tries for transactions with
+ serialization/deadlock failures. The default is 1.
+ </para>
+ <note>
+ <para>
+ Be careful when repeating transactions with shell commands. Unlike the
+ results of SQL commands, the results of shell commands are not rolled
+ back, except for the variable value of the <command>\setshell</command>
+ command.
+ </para>
+ </note>
+ <note>
+ <para>
+ In pgbench you can use scripts in which the transaction blocks do not
+ end. Be careful in this case because transactions that span over more
+ than one script are not rolled back and will not be retried in case of
+ an error. In such cases, the script in which the error occurred is
+ reported as failed.
+ </para>
+ </note>
+ <note>
+ <para>
+ Not all compound commands that contain subcommands to start or end an
+ explicit transaction block can be retried. For example, you cannot use
+ a compound command to complete a failed transaction block before
+ retrying. Also if there are any subcommands before the explicit
+ transaction block, the latter cannot be retried from the very
+ beginning.
+ </para>
+ </note>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--progress-timestamp</option></term>
<listitem>
<para>
@@ -1256,7 +1333,7 @@ END;
The format of the log is:
<synopsis>
-<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -1277,8 +1354,28 @@ END;
When both <option>--rate</option> and <option>--latency-limit</option> are used,
the <replaceable>time</replaceable> for a skipped transaction will be reported as
<literal>skipped</literal>.
+ <replaceable>retries</replaceable> is the sum of all the retries after the
+ serialization or deadlock failures during the current script execution. It is
+ only present when the maximum number of tries for transactions is more than 1
+ (<option>--max-tries</option>). If the transaction ended with an error "in
+ failed SQL transaction", its <replaceable>time</replaceable> will be reported
+ as <literal>in_failed_tx</literal>. If the transaction ended with another
+ error, its <replaceable>time</replaceable> will be reported as
+ <literal>failed</literal> (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information).
</para>
+ <note>
+ <para>
+ If a custom script contains multiple transactions, some of which failed
+ with an error "in failed SQL transaction block", and there were no
+ other errors, its <replaceable>time</replaceable> will be reported as
+ <literal>in_failed_tx</literal>. If there were other errors, its
+ <replaceable>time</replaceable> will be reported as
+ <literal>failed</literal>.
+ </para>
+ </note>
+
<para>
Here is a snippet of a log file generated in a single-client run:
<screen>
@@ -1306,6 +1403,23 @@ END;
</para>
<para>
+ The following example shows a snippet of a log file with errors and retries,
+ with the maximum number of tries set to 10:
+<screen>
+3 0 47423 0 1499414498 34501 4
+3 1 8333 0 1499414498 42848 1
+3 2 8358 0 1499414498 51219 1
+4 0 72345 0 1499414498 59433 7
+1 3 41718 0 1499414498 67879 5
+1 4 8416 0 1499414498 76311 1
+3 3 33235 0 1499414498 84469 4
+0 0 failed 0 1499414498 84905 10
+2 0 failed 0 1499414498 86248 10
+3 4 8307 0 1499414498 92788 1
+</screen>
+ </para>
+
+ <para>
When running a long test on hardware that can handle a lot of transactions,
the log files can become very large. The <option>--sampling-rate</option> option
can be used to log only a random sample of transactions.
@@ -1320,7 +1434,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> <replaceable>failed_tx</replaceable> <replaceable>in_failed_tx</replaceable> <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried_tx</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -1334,7 +1448,13 @@ END;
transaction latencies within the interval,
<replaceable>min_latency</replaceable> is the minimum latency within the interval,
and
- <replaceable>max_latency</replaceable> is the maximum latency within the interval.
+ <replaceable>max_latency</replaceable> is the maximum latency within the interval,
+ <replaceable>failed_tx</replaceable> is the number of transactions that ended
+ with a failed SQL or meta command within the interval,
+ <replaceable>in_failed_tx</replaceable> is the number of transactions that
+ ended with an error "in failed SQL transaction block" (see
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information).
The next fields,
<replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
@@ -1342,21 +1462,43 @@ END;
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e. the difference between each transaction's
scheduled start time and the time it actually started.
- The very last field, <replaceable>skipped</replaceable>,
+ The next field, <replaceable>skipped</replaceable>,
is only present if the <option>--latency-limit</option> option is used, too.
It counts the number of transactions skipped because they would have
started too late.
+ The <replaceable>retried_tx</replaceable> and
+ <replaceable>retries</replaceable> fields are only present if the maximum
+ number of tries for transactions is more than 1
+ (<option>--max-tries</option>). They report the number of retried
+ transactions and the sum of all the retries after serialization or deadlock
+ failures within the interval.
Each transaction is counted in the interval when it was committed.
</para>
+ <note>
+ <para>
+ If a custom script contains multiple transactions and any of its
+ transactions or meta commands failed, its run is counted in
+ <replaceable>failed_tx</replaceable> only once. Again, if some of its
+ transactions ended with an error "in failed SQL transaction block" and there
+ were no other errors, its run is counted in
+ <replaceable>in_failed_tx</replaceable> only once. Similarly, if any of
+ these transactions was retried, the <replaceable>retried_tx</replaceable>
+ statistic is increased by one, regardless of the number of retried
+ transactions. However, the <replaceable>retries</replaceable> field includes
+ the sum of all retries after the serialization or deadlock failures for all
+ transactions in this script.
+ </para>
+ </note>
+
<para>
Here is some example output:
<screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0 0
+1345828503 7884 1979812 565806736 60 1479 0 0
+1345828505 7208 1979422 567277552 59 1391 0 0
+1345828507 7685 1980268 569784714 60 1398 0 0
+1345828509 7073 1979779 573489941 236 1411 0 0
</screen></para>
<para>
@@ -1368,13 +1510,60 @@ END;
</refsect2>
<refsect2>
- <title>Per-Statement Latencies</title>
+ <title>Per-Statement Report</title>
<para>
- With the <option>-r</option> option, <application>pgbench</application> collects
- the elapsed transaction time of each statement executed by every
- client. It then reports an average of those values, referred to
- as the latency for each statement, after the benchmark has finished.
+ With the <option>-r</option> option, <application>pgbench</application>
+ collects the following statistics for each statement:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>latency</literal> — elapsed transaction time for each
+ statement. <application>pgbench</application> reports an average value
+ of all successful runs of the statement.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of errors in this statement. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of errors "in failed SQL transaction" in this statement. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of retries after a serialization or a deadlock failure in
+ this statement.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ The report displays the columns with statistics on errors and retries only if
+ the current <application>pgbench</application> run has an error or retry,
+ respectively.
+ </para>
+
+ <note>
+ <para>
+ If a custom script contains multiple transactions, each failed or retried
+ transaction is counted as well as each failed meta command. Thus, the total
+ sum of per-command errors and/or retries can be greater than the number of
+ failed and/or retried transaction scripts, respectively.
+ </para>
+ </note>
+
+ <para>
+ All values are computed for each statement executed by every client and are
+ reported after the benchmark has finished.
</para>
<para>
@@ -1382,6 +1571,7 @@ END;
<screen>
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
+maximum number of transaction tries: 1
scaling factor: 1
query mode: simple
number of clients: 10
@@ -1405,10 +1595,49 @@ statement latencies in milliseconds:
0.371 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
1.212 END;
</screen>
+
+ Another example of output for the default script using serializable default
+ transaction isolation level (<command>PGOPTIONS='-c
+ default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: <builtin: TPC-B (sort of)>
+maximum number of transaction tries: 100
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+number of transactions per client: 1000
+number of transactions actually processed: 10000/10000
+number of errors: 7194 (71.940 %)
+latency average = 592.404 ms
+latency stddev = 1026.986 ms
+number of transactions retried: 8855 (88.550 %)
+number of retries: 759474
+tps = 20.287098 (including connections establishing)
+tps = 20.287277 (excluding connections establishing)
+statement latencies in milliseconds, errors and retries:
+ 0.003 0 0 \set aid random(1, 100000 * :scale)
+ 0.000 0 0 \set bid random(1, 1 * :scale)
+ 0.000 0 0 \set tid random(1, 10 * :scale)
+ 0.000 0 0 \set delta random(-5000, 5000)
+ 0.222 0 0 BEGIN;
+ 0.736 0 1 UPDATE pgbench_accounts
+ SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.502 0 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 0.966 7175 754739 UPDATE pgbench_tellers
+ SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 1.136 19 4725 UPDATE pgbench_branches
+ SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.881 0 0 INSERT INTO pgbench_history
+ (tid, bid, aid, delta, mtime)
+ VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 0.855 0 9 END;
+</screen>
</para>
<para>
- If multiple script files are specified, the averages are reported
+ If multiple script files are specified, all statistics are reported
separately for each script file.
</para>
@@ -1422,6 +1651,76 @@ statement latencies in milliseconds:
</para>
</refsect2>
+ <refsect2 id="errors-and-retries">
+ <title id="errors-and-retries-title">Errors and Serialization/Deadlock Retries</title>
+
+ <para>
+ A client's run is aborted only in case of a serious error, for example, if the
+ connection with the backend was lost. Otherwise, if the execution of an SQL or
+ meta command fails, the client's run continues normally, except that the
+ current transaction block terminates if possible. Transactions with
+ serialization or deadlock failures are rolled back and repeated until they
+ complete successfully or reach the maximum number of tries specified by the
+ <option>--max-tries</option> option. If the last transaction run fails, this
+ transaction will be reported as failed.
+ </para>
+
+ <note>
+ <para>
+ If a failed transaction block cannot terminate in the current script, its
+ commands after the error as well as the commands of the following scripts
+ are processed as usual so you can get a lot of errors of type "in failed SQL
+ transaction" (when the current SQL transaction is aborted and commands
+ ignored until end of transaction block). In such cases you can use separate
+ statistics of these errors in all reports.
+ </para>
+ </note>
+
+ <para>
+ In pgbench you can use scripts in which the transaction blocks do not end. Be
+ careful in this case because transactions that span over more than one script
+ are not rolled back and will not be retried in case of an error. In such
+ cases, the script in which the error occurred is reported as failed.
+ </para>
+
+ <para>
+ The latency of a successful transaction includes the entire time of
+ transaction execution with rollbacks and retries. The latency for failed
+ transactions and commands is not computed separately.
+ </para>
+
+ <para>
+ The main report contains the number of failed transactions if it is non-zero.
+ If the total number of transactions ended with an error "in failed SQL
+ transaction block" is non-zero, the main report also contains it. If the
+ total number of retried transactions is non-zero, the main report also
+ contains the statistics related to retries: the total number of retried
+ transactions and total number of retries (use option
+ <option>--max-tries</option> to make it possible). The per-statement report
+ inherits all columns from the main report.
+ </para>
+ <note>
+ <para>
+ If a custom script contains multiple transactions, its run is reported as
+ failed if any of its transactions or meta commands failed. The run of this
+ script is reported separately if some of its transactions ended with an error
+ "in failed SQL transaction block" and there were no other errors. The run of
+ this script is reported as retried if any of its transactions was retried
+ after a serialization/deadlock failure. The total sum of retries includes
+ the retries for all transactions in this script.
+ </para>
+ </note>
+
+ <para>
+ If you want to distinguish between failures or errors by type, or you want to
+ know which subcommands of compound SQL commands failed, use the
+ <application>pgbench</application> debugging output created with the option
+ <option>--debug-fails</option> or <option>--debug</option>. The first option
+ is recommended for this purpose because with the second option the debugging
+ output can be very large.
+ </para>
+ </refsect2>
+
<refsect2>
<title>Good Practices</title>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 7ce6f60..7e20c6b 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -58,6 +58,9 @@
#include "pgbench.h"
+#define ERRCODE_IN_FAILED_SQL_TRANSACTION "25P02"
+#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"
#define ERRCODE_UNDEFINED_TABLE "42P01"
/*
@@ -174,8 +177,13 @@ bool progress_timestamp = false; /* progress report with Unix time */
int nclients = 1; /* number of clients */
int nthreads = 1; /* number of threads */
bool is_connect; /* establish connection for each transaction */
-bool is_latencies; /* report per-command latencies */
+bool report_per_command = false; /* report per-command latencies, retries
+ * after the failures and errors
+ * (failures without retrying) */
int main_pid; /* main process id used in log filename */
+int max_tries = 1; /* maximum number of tries to run the
+ * transaction with serialization or deadlock
+ * failures */
char *pghost = "";
char *pgport = "";
@@ -232,11 +240,83 @@ typedef struct StatsData
int64 cnt; /* number of transactions, including skipped */
int64 skipped; /* number of transactions skipped under --rate
* and --latency-limit */
+ int64 retries;
+ int64 retried; /* number of transactions that were retried
+ * after a serialization or a deadlock
+ * failure */
+ int64 errors; /* number of transactions that were not retried
+ * after a serialization or a deadlock
+ * failure or had another error (including meta
+ * commands errors) */
+ int64 errors_in_failed_tx; /* number of transactions that failed with
+ * an error
+ * ERRCODE_IN_FAILED_SQL_TRANSACTION */
SimpleStats latency;
SimpleStats lag;
} StatsData;
/*
+ * Data structure for client variables.
+ */
+typedef struct Variables
+{
+ Variable *array; /* array of variable definitions */
+ int nvariables; /* number of variables */
+ bool vars_sorted; /* are variables sorted by name? */
+} Variables;
+
+/*
+ * Data structure for repeating a transaction from the beginning with the same
+ * parameters.
+ */
+typedef struct RetryState
+{
+ /*
+ * Command number in script; -1 if there were no transactions yet or we
+ * continue the transaction block from the previous scripts
+ */
+ int command;
+
+ /*
+ * The last non-empty subcommand that may be retried in this command.
+ * -1 if all subcommands may be retried.
+ */
+ int subcmd;
+
+ int retries;
+
+ unsigned short random_state[3]; /* random seed */
+ Variables variables; /* client variables */
+} RetryState;
+
+/*
+ * For the failures during script execution.
+ */
+typedef enum FailureStatus
+{
+ NO_FAILURE = 0,
+ META_COMMAND_FAILURE,
+ SERIALIZATION_FAILURE,
+ DEADLOCK_FAILURE,
+ IN_FAILED_SQL_TRANSACTION,
+ ANOTHER_FAILURE,
+ NUM_FAILURE_STATUS
+} FailureStatus;
+
+/*
+ * Data structure for the completion of a failed transaction block.
+ */
+typedef struct LastFailure
+{
+ int command; /* which command has failed */
+ int subcmd; /* which non-empty subcommand of a compound SQL
+ * command failed; not used for meta commands */
+ bool was_in_tx_block; /* were we in the transaction block before
+ * the failed command */
+ FailureStatus status; /* type of failure */
+} LastFailure;
+
+/*
* Connection state machine states.
*/
typedef enum
@@ -287,6 +367,35 @@ typedef enum
CSTATE_END_COMMAND,
/*
+ * States for transactions with failures or errors.
+ *
+ * First, report the failure in CSTATE_FAILURE.
+ *
+ * Then, if we need to end the failed transaction block, go to states
+ * 1) CSTATE_END_FAILED_TX_BLOCK_START
+ * 2) CSTATE_END_FAILED_TX_BLOCK_WAIT
+ * 3) CSTATE_END_FAILED_TX_BLOCK_END
+ * with the appropriate command.
+ *
+ * If our try to end this transaction block failed, report this in
+ * CSTATE_END_FAILED_TX_BLOCK_FAILURE and go ahead with next command.
+ *
+ * But if everything is OK, go to CSTATE_RETRY /
+ * CSTATE_END_FAILED_TX_BLOCK_RETRY. If we can repeat the failed
+ * transaction, set the same parameters for the transaction execution as in
+ * the previous tries. Otherwise, go to the next command after the failed
+ * transaction.
+ */
+ CSTATE_FAILURE,
+ CSTATE_RETRY,
+
+ CSTATE_END_FAILED_TX_BLOCK_START, /* same as CSTATE_START_COMMAND */
+ CSTATE_END_FAILED_TX_BLOCK_WAIT, /* same as CSTATE_WAIT_RESULT */
+ CSTATE_END_FAILED_TX_BLOCK_END, /* same as CSTATE_END_COMMAND */
+ CSTATE_END_FAILED_TX_BLOCK_FAILURE, /* same as CSTATE_FAILURE */
+ CSTATE_END_FAILED_TX_BLOCK_RETRY, /* same as CSTATE_RETRY */
+
+ /*
* CSTATE_END_TX performs end-of-transaction processing. Calculates
* latency, and logs the transaction. In --connect mode, closes the
* current connection. Chooses the next script to execute and starts over
@@ -311,14 +420,13 @@ typedef struct
PGconn *con; /* connection handle to DB */
int id; /* client No. */
ConnectionStateEnum state; /* state machine's current state. */
+ unsigned short random_state[3]; /* separate randomness for each client */
int use_file; /* index in sql_script for this client */
int command; /* command number in script */
/* client variables */
- Variable *variables; /* array of variable definitions */
- int nvariables; /* number of variables */
- bool vars_sorted; /* are variables sorted by name? */
+ Variables variables;
/* various times about current transaction */
int64 txn_scheduled; /* scheduled start time of transaction (usec) */
@@ -328,6 +436,29 @@ typedef struct
bool prepared[MAX_SCRIPTS]; /* whether client prepared the script */
+ /* for compound SQL commands */
+ int subcmd; /* the non-empty subcommand of the current SQL
+ * command whose query result we receive from
+ * the server; 0 if all subcommands are empty;
+ * not used for meta commands */
+
+ /*
+ * For processing errors and repeating transactions with serialization or
+ * deadlock failures:
+ */
+ bool in_tx_block; /* are we in a transaction block? */
+ bool was_in_tx_block; /* were we in the transaction block before
+ * executing the current command? */
+ LastFailure last_failure; /* the status of the failure before we try to
+ * end a failed transaction block */
+ RetryState retry_state;
+ int64 retries;
+ bool error; /* true if there was a serialization or a
+ * deadlock failure that was not retried, or
+ * any other error (including meta command
+ * errors) */
+ bool error_not_in_failed_tx; /* true if there was an error other than
+ * ERRCODE_IN_FAILED_SQL_TRANSACTION */
+
/* per client collected stats */
int64 cnt; /* client transaction count, for -t */
int ecnt; /* error count */
@@ -371,7 +502,6 @@ typedef struct
pthread_t thread; /* thread handle */
CState *state; /* array of CState */
int nstate; /* length of state[] */
- unsigned short random_state[3]; /* separate randomness for each thread */
int64 throttle_trigger; /* previous/next throttling (us) */
FILE *logfile; /* where to log, or NULL */
ZipfCache zipf_cache; /* for thread-safe zipfian random number
@@ -413,6 +543,18 @@ typedef enum QueryMode
static QueryMode querymode = QUERY_SIMPLE;
static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
+/*
+ * Data structure for non-empty subcommands of a compound SQL command.
+ */
+typedef struct Subcommand
+{
+ int initial_subcmd_num; /* subcommand number among all
+ * subcommands */
+ int tx_block_end; /* number of the nearest non-empty subcommand
+ * that completes the transaction block, or -1 */
+} Subcommand;
+
typedef struct
{
char *line; /* text of command line */
@@ -423,6 +565,28 @@ typedef struct
char *argv[MAX_ARGS]; /* command word list */
PgBenchExpr *expr; /* parsed expression, if needed */
SimpleStats stats; /* time spent in this command */
+ int64 retries;
+ int64 errors; /* number of failures that were not retried */
+ int64 errors_in_failed_tx; /* number of errors
+ * ERRCODE_IN_FAILED_SQL_TRANSACTION */
+
+ /*
+ * For processing errors and repeating transactions with serialization or
+ * deadlock failures:
+ */
+ int tx_block_end; /* number of the nearest command that completes
+ * the transaction block, or -1 */
+
+ /*
+ * For non-empty subcommands of the compound SQL command (not used for meta
+ * commands):
+ */
+ int num_subcmds; /* number of non-empty subcommands; 1 if the
+ * command is not compound */
+ Subcommand *subcmd;
+ int first_tx_block_begin; /* number of the first non-empty subcommand
+ * that syntactically starts a transaction
+ * block, or -1 */
} Command;
typedef struct ParsedScript
@@ -439,6 +603,7 @@ static int num_commands = 0; /* total number of Command structs */
static int64 total_weight = 0;
static int debug = 0; /* debug flag */
+static int debug_fails = 0; /* debug flag for failures and errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -486,6 +651,34 @@ static const BuiltinScript builtin_script[] =
}
};
+/*
+ * Data structure for the properties of various types of failures.
+ */
+typedef struct FailureStatusProperties
+{
+ const char *failure_status_str; /* NULL if not retriable */
+ const char *error_status_str;
+} FailureStatusProperties;
+
+static const FailureStatusProperties FAILURE_STATUS[] =
+{
+ {NULL, NULL}, /* NO_FAILURE */
+ {NULL, NULL}, /* META_COMMAND_FAILURE */
+ { /* SERIALIZATION_FAILURE */
+ "a serialization failure",
+ "a serialization error"
+ },
+ { /* DEADLOCK_FAILURE */
+ "a deadlock failure",
+ "a deadlock error"
+ },
+ { /* IN_FAILED_SQL_TRANSACTION */
+ NULL,
+ "an error \"in failed SQL transaction\""
+ },
+ {NULL, NULL} /* ANOTHER_FAILURE */
+};
+
/* Function prototypes */
static void setIntValue(PgBenchValue *pv, int64 ival);
@@ -547,15 +740,17 @@ usage(void)
" protocol for submitting queries (default: simple)\n"
" -n, --no-vacuum do not run VACUUM before tests\n"
" -P, --progress=NUM show thread progress report every NUM seconds\n"
- " -r, --report-latencies report average latency per command\n"
+ " -r, --report-per-command report latencies, errors and retries per command\n"
" -R, --rate=NUM target rate in transactions per second\n"
" -s, --scale=NUM report this scale factor in output\n"
" -t, --transactions=NUM number of transactions each client runs (default: 10)\n"
" -T, --time=NUM duration of benchmark test in seconds\n"
" -v, --vacuum-all vacuum all four standard tables before tests\n"
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
+ " --debug-fails print debugging output only for failures and errors\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
+ " --max-tries=NUM max number of tries to run transaction (default: 1)\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
"\nCommon options:\n"
@@ -667,7 +862,7 @@ gotdigits:
/* random number generator: uniform distribution from min to max inclusive */
static int64
-getrand(TState *thread, int64 min, int64 max)
+getrand(unsigned short random_state[3], int64 min, int64 max)
{
/*
* Odd coding is so that min and max have approximately the same chance of
@@ -678,7 +873,7 @@ getrand(TState *thread, int64 min, int64 max)
* protected by a mutex, and therefore a bottleneck on machines with many
* CPUs.
*/
- return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
+ return min + (int64) ((max - min + 1) * pg_erand48(random_state));
}
/*
@@ -687,7 +882,8 @@ getrand(TState *thread, int64 min, int64 max)
* value is exp(-parameter).
*/
static int64
-getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
+getExponentialRand(unsigned short random_state[3], int64 min, int64 max,
+ double parameter)
{
double cut,
uniform,
@@ -697,7 +893,7 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
Assert(parameter > 0.0);
cut = exp(-parameter);
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(random_state);
/*
* inner expression in (cut, 1] (if parameter > 0), rand in [0, 1)
@@ -710,7 +906,8 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
/* random number generator: gaussian distribution from min to max inclusive */
static int64
-getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
+getGaussianRand(unsigned short random_state[3], int64 min, int64 max,
+ double parameter)
{
double stdev;
double rand;
@@ -738,8 +935,8 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* are expected in (0, 1] (see
* http://en.wikipedia.org/wiki/Box_muller)
*/
- double rand1 = 1.0 - pg_erand48(thread->random_state);
- double rand2 = 1.0 - pg_erand48(thread->random_state);
+ double rand1 = 1.0 - pg_erand48(random_state);
+ double rand2 = 1.0 - pg_erand48(random_state);
/* Box-Muller basic form transform */
double var_sqrt = sqrt(-2.0 * log(rand1));
@@ -766,7 +963,7 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* will approximate a Poisson distribution centered on the given value.
*/
static int64
-getPoissonRand(TState *thread, int64 center)
+getPoissonRand(unsigned short random_state[3], int64 center)
{
/*
* Use inverse transform sampling to generate a value > 0, such that the
@@ -775,7 +972,7 @@ getPoissonRand(TState *thread, int64 center)
double uniform;
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(random_state);
return (int64) (-log(uniform) * ((double) center) + 0.5);
}
@@ -853,7 +1050,7 @@ zipfFindOrCreateCacheCell(ZipfCache * cache, int64 n, double s)
* Luc Devroye, p. 550-551, Springer 1986.
*/
static int64
-computeIterativeZipfian(TState *thread, int64 n, double s)
+computeIterativeZipfian(unsigned short random_state[3], int64 n, double s)
{
double b = pow(2.0, s - 1.0);
double x,
@@ -864,8 +1061,8 @@ computeIterativeZipfian(TState *thread, int64 n, double s)
while (true)
{
/* random variates */
- u = pg_erand48(thread->random_state);
- v = pg_erand48(thread->random_state);
+ u = pg_erand48(random_state);
+ v = pg_erand48(random_state);
x = floor(pow(u, -1.0 / (s - 1.0)));
@@ -883,10 +1080,11 @@ computeIterativeZipfian(TState *thread, int64 n, double s)
* Jim Gray et al, SIGMOD 1994
*/
static int64
-computeHarmonicZipfian(TState *thread, int64 n, double s)
+computeHarmonicZipfian(TState *thread, unsigned short random_state[3], int64 n,
+ double s)
{
ZipfCell *cell = zipfFindOrCreateCacheCell(&thread->zipf_cache, n, s);
- double uniform = pg_erand48(thread->random_state);
+ double uniform = pg_erand48(random_state);
double uz = uniform * cell->harmonicn;
if (uz < 1.0)
@@ -898,7 +1096,8 @@ computeHarmonicZipfian(TState *thread, int64 n, double s)
/* random number generator: zipfian distribution from min to max inclusive */
static int64
-getZipfianRand(TState *thread, int64 min, int64 max, double s)
+getZipfianRand(TState *thread, unsigned short random_state[3], int64 min,
+ int64 max, double s)
{
int64 n = max - min + 1;
@@ -907,8 +1106,8 @@ getZipfianRand(TState *thread, int64 min, int64 max, double s)
return min - 1 + ((s > 1)
- ? computeIterativeZipfian(thread, n, s)
- : computeHarmonicZipfian(thread, n, s));
+ ? computeIterativeZipfian(random_state, n, s)
+ : computeHarmonicZipfian(thread, random_state, n, s));
}
/*
@@ -960,6 +1159,10 @@ initStats(StatsData *sd, time_t start_time)
sd->start_time = start_time;
sd->cnt = 0;
sd->skipped = 0;
+ sd->retries = 0;
+ sd->retried = 0;
+ sd->errors = 0;
+ sd->errors_in_failed_tx = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -968,7 +1171,8 @@ initStats(StatsData *sd, time_t start_time)
* Accumulate one additional item into the given stats object.
*/
static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumStats(StatsData *stats, bool skipped, bool error, double lat, double lag,
+ bool error_not_in_failed_tx, int64 retries)
{
stats->cnt++;
@@ -977,6 +1181,11 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag)
/* no latency to record on skipped transactions */
stats->skipped++;
}
+ else if (error)
+ {
+ /* no latency to record on failed transactions */
+ stats->errors++;
+ }
else
{
addToSimpleStats(&stats->latency, lat);
@@ -985,6 +1194,13 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag)
if (throttle_delay)
addToSimpleStats(&stats->lag, lag);
}
+
+ if (error && !error_not_in_failed_tx)
+ stats->errors_in_failed_tx++;
+
+ stats->retries += retries;
+ if (retries > 0)
+ stats->retried++;
}
/* call PQexec() and exit() on failure */
@@ -1110,39 +1326,39 @@ compareVariableNames(const void *v1, const void *v2)
/* Locate a variable by name; returns NULL if unknown */
static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
{
Variable key;
/* On some versions of Solaris, bsearch of zero items dumps core */
- if (st->nvariables <= 0)
+ if (variables->nvariables <= 0)
return NULL;
/* Sort if we have to */
- if (!st->vars_sorted)
+ if (!variables->vars_sorted)
{
- qsort((void *) st->variables, st->nvariables, sizeof(Variable),
- compareVariableNames);
- st->vars_sorted = true;
+ qsort((void *) variables->array, variables->nvariables,
+ sizeof(Variable), compareVariableNames);
+ variables->vars_sorted = true;
}
/* Now we can search */
key.name = name;
return (Variable *) bsearch((void *) &key,
- (void *) st->variables,
- st->nvariables,
+ (void *) variables->array,
+ variables->nvariables,
sizeof(Variable),
compareVariableNames);
}
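The pattern above defers sorting: appends only clear a flag, and the first lookup pays for one qsort() before bsearch(). A reduced standalone sketch of that lazy-sort lookup (simplified Variable, illustrative names):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { const char *name; int value; } Var;

typedef struct
{
	Var  *array;
	int   n;
	int   sorted;	/* cleared whenever a variable is appended */
} Vars;

static int
cmp_var(const void *a, const void *b)
{
	return strcmp(((const Var *) a)->name, ((const Var *) b)->name);
}

/* Sort lazily on the first lookup, then binary-search by name. */
static Var *
lookup(Vars *vars, const char *name)
{
	Var key;

	if (vars->n <= 0)
		return NULL;
	if (!vars->sorted)
	{
		qsort(vars->array, vars->n, sizeof(Var), cmp_var);
		vars->sorted = 1;
	}
	key.name = name;
	return bsearch(&key, vars->array, vars->n, sizeof(Var), cmp_var);
}
```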
/* Get the value of a variable, in string form; returns NULL if unknown */
static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
{
Variable *var;
char stringform[64];
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
return NULL; /* not found */
@@ -1183,9 +1399,10 @@ makeVariableNumeric(Variable *var)
if (sscanf(var->value, "%lf%c", &dv, &xs) != 1)
{
- fprintf(stderr,
- "malformed variable \"%s\" value: \"%s\"\n",
- var->name, var->value);
+ if (debug_fails)
+ fprintf(stderr,
+ "malformed variable \"%s\" value: \"%s\"\n",
+ var->name, var->value);
return false;
}
setDoubleValue(&var->num_value, dv);
@@ -1234,11 +1451,12 @@ valid_variable_name(const char *name)
* Returns NULL on failure (bad name).
*/
static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
+lookupCreateVariable(Variables *variables, const char *context, char *name,
+ bool write_message)
{
Variable *var;
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
{
Variable *newvars;
@@ -1249,29 +1467,30 @@ lookupCreateVariable(CState *st, const char *context, char *name)
*/
if (!valid_variable_name(name))
{
- fprintf(stderr, "%s: invalid variable name: \"%s\"\n",
- context, name);
+ if (write_message || debug_fails)
+ fprintf(stderr, "%s: invalid variable name: \"%s\"\n",
+ context, name);
return NULL;
}
/* Create variable at the end of the array */
- if (st->variables)
- newvars = (Variable *) pg_realloc(st->variables,
- (st->nvariables + 1) * sizeof(Variable));
+ if (variables->array)
+ newvars = (Variable *) pg_realloc(variables->array,
+ (variables->nvariables + 1) * sizeof(Variable));
else
newvars = (Variable *) pg_malloc(sizeof(Variable));
- st->variables = newvars;
+ variables->array = newvars;
- var = &newvars[st->nvariables];
+ var = &newvars[variables->nvariables];
var->name = pg_strdup(name);
var->value = NULL;
/* caller is expected to initialize remaining fields */
- st->nvariables++;
+ variables->nvariables++;
/* we don't re-sort the array till we have to */
- st->vars_sorted = false;
+ variables->vars_sorted = false;
}
return var;
@@ -1280,12 +1499,13 @@ lookupCreateVariable(CState *st, const char *context, char *name)
/* Assign a string value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariable(CState *st, const char *context, char *name, const char *value)
+putVariable(Variables *variables, const char *context, char *name,
+ const char *value)
{
Variable *var;
char *val;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name, true);
if (!var)
return false;
@@ -1303,12 +1523,12 @@ putVariable(CState *st, const char *context, char *name, const char *value)
/* Assign a numeric value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableNumber(CState *st, const char *context, char *name,
- const PgBenchValue *value)
+putVariableNumber(Variables *variables, const char *context, char *name,
+ const PgBenchValue *value, bool write_message)
{
Variable *var;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name, write_message);
if (!var)
return false;
@@ -1324,12 +1544,13 @@ putVariableNumber(CState *st, const char *context, char *name,
/* Assign an integer value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putVariableInt(Variables *variables, const char *context, char *name,
+ int64 value, bool write_message)
{
PgBenchValue val;
setIntValue(&val, value);
- return putVariableNumber(st, context, name, &val);
+ return putVariableNumber(variables, context, name, &val, write_message);
}
/*
@@ -1384,7 +1605,7 @@ replaceVariable(char **sql, char *param, int len, char *value)
}
static char *
-assignVariables(CState *st, char *sql)
+assignVariables(Variables *variables, char *sql)
{
char *p,
*name,
@@ -1405,7 +1626,7 @@ assignVariables(CState *st, char *sql)
continue;
}
- val = getVariable(st, name);
+ val = getVariable(variables, name);
free(name);
if (val == NULL)
{
@@ -1420,12 +1641,13 @@ assignVariables(CState *st, char *sql)
}
static void
-getQueryParams(CState *st, const Command *command, const char **params)
+getQueryParams(Variables *variables, const Command *command,
+ const char **params)
{
int i;
for (i = 0; i < command->argc - 1; i++)
- params[i] = getVariable(st, command->argv[i + 1]);
+ params[i] = getVariable(variables, command->argv[i + 1]);
}
/* get a value as an int, tell if there is a problem */
@@ -1444,7 +1666,8 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
Assert(pval->type == PGBT_DOUBLE);
if (dval < PG_INT64_MIN || PG_INT64_MAX < dval)
{
- fprintf(stderr, "double to int overflow for %f\n", dval);
+ if (debug_fails)
+ fprintf(stderr, "double to int overflow for %f\n", dval);
return false;
}
*ival = (int64) dval;
@@ -1505,8 +1728,9 @@ evalFunc(TState *thread, CState *st,
if (l != NULL)
{
- fprintf(stderr,
- "too many function arguments, maximum is %d\n", MAX_FARGS);
+ if (debug_fails)
+ fprintf(stderr,
+ "too many function arguments, maximum is %d\n", MAX_FARGS);
return false;
}
@@ -1586,7 +1810,8 @@ evalFunc(TState *thread, CState *st,
case PGBENCH_MOD:
if (ri == 0)
{
- fprintf(stderr, "division by zero\n");
+ if (debug_fails)
+ fprintf(stderr, "division by zero\n");
return false;
}
/* special handling of -1 divisor */
@@ -1597,7 +1822,9 @@ evalFunc(TState *thread, CState *st,
/* overflow check (needed for INT64_MIN) */
if (li == PG_INT64_MIN)
{
- fprintf(stderr, "bigint out of range\n");
+ if (debug_fails)
+ fprintf(stderr,
+ "bigint out of range\n");
return false;
}
else
@@ -1783,20 +2010,22 @@ evalFunc(TState *thread, CState *st,
/* check random range */
if (imin > imax)
{
- fprintf(stderr, "empty range given to random\n");
+ if (debug_fails)
+ fprintf(stderr, "empty range given to random\n");
return false;
}
else if (imax - imin < 0 || (imax - imin) + 1 < 0)
{
/* prevent int overflows in random functions */
- fprintf(stderr, "random range is too large\n");
+ if (debug_fails)
+ fprintf(stderr, "random range is too large\n");
return false;
}
if (func == PGBENCH_RANDOM)
{
Assert(nargs == 2);
- setIntValue(retval, getrand(thread, imin, imax));
+ setIntValue(retval, getrand(st->random_state, imin, imax));
}
else /* gaussian & exponential */
{
@@ -1811,39 +2040,45 @@ evalFunc(TState *thread, CState *st,
{
if (param < MIN_GAUSSIAN_PARAM)
{
- fprintf(stderr,
- "gaussian parameter must be at least %f "
- "(not %f)\n", MIN_GAUSSIAN_PARAM, param);
+ if (debug_fails)
+ fprintf(stderr,
+ "gaussian parameter must be at least %f (not %f)\n",
+ MIN_GAUSSIAN_PARAM, param);
return false;
}
setIntValue(retval,
- getGaussianRand(thread, imin, imax, param));
+ getGaussianRand(st->random_state, imin,
+ imax, param));
}
else if (func == PGBENCH_RANDOM_ZIPFIAN)
{
if (param <= 0.0 || param == 1.0 || param > MAX_ZIPFIAN_PARAM)
{
- fprintf(stderr,
- "zipfian parameter must be in range (0, 1) U (1, %d]"
- " (got %f)\n", MAX_ZIPFIAN_PARAM, param);
+ if (debug_fails)
+ fprintf(stderr,
+ "zipfian parameter must be in range (0, 1) U (1, %d] (got %f)\n",
+ MAX_ZIPFIAN_PARAM, param);
return false;
}
setIntValue(retval,
- getZipfianRand(thread, imin, imax, param));
+ getZipfianRand(thread, st->random_state,
+ imin, imax, param));
}
else /* exponential */
{
if (param <= 0.0)
{
- fprintf(stderr,
- "exponential parameter must be greater than zero"
- " (got %f)\n", param);
+ if (debug_fails)
+ fprintf(stderr,
+ "exponential parameter must be greater than zero (got %f)\n",
+ param);
return false;
}
setIntValue(retval,
- getExponentialRand(thread, imin, imax, param));
+ getExponentialRand(st->random_state, imin,
+ imax, param));
}
}
@@ -1879,10 +2114,11 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
{
Variable *var;
- if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
+ if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
- fprintf(stderr, "undefined variable \"%s\"\n",
- expr->u.variable.varname);
+ if (debug_fails)
+ fprintf(stderr, "undefined variable \"%s\"\n",
+ expr->u.variable.varname);
return false;
}
@@ -1935,7 +2171,7 @@ getMetaCommand(const char *cmd)
* Return true if succeeded, or false on error.
*/
static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
char command[SHELL_COMMAND_SIZE];
int i,
@@ -1966,17 +2202,19 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
arg = argv[i] + 1; /* a string literal starting with colons */
}
- else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
+ else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
- fprintf(stderr, "%s: undefined variable \"%s\"\n",
- argv[0], argv[i]);
+ if (debug_fails)
+ fprintf(stderr, "%s: undefined variable \"%s\"\n",
+ argv[0], argv[i]);
return false;
}
arglen = strlen(arg);
if (len + arglen + (i > 0 ? 1 : 0) >= SHELL_COMMAND_SIZE - 1)
{
- fprintf(stderr, "%s: shell command is too long\n", argv[0]);
+ if (debug_fails)
+ fprintf(stderr, "%s: shell command is too long\n", argv[0]);
return false;
}
@@ -1993,7 +2231,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
if (system(command))
{
- if (!timer_exceeded)
+ if (!timer_exceeded && debug_fails)
fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
return false;
}
@@ -2003,19 +2241,21 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
/* Execute the command with pipe and read the standard output. */
if ((fp = popen(command, "r")) == NULL)
{
- fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
+ if (debug_fails)
+ fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
return false;
}
if (fgets(res, sizeof(res), fp) == NULL)
{
- if (!timer_exceeded)
+ if (!timer_exceeded && debug_fails)
fprintf(stderr, "%s: could not read result of shell command\n", argv[0]);
(void) pclose(fp);
return false;
}
if (pclose(fp) < 0)
{
- fprintf(stderr, "%s: could not close shell command\n", argv[0]);
+ if (debug_fails)
+ fprintf(stderr, "%s: could not close shell command\n", argv[0]);
return false;
}
@@ -2025,11 +2265,13 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
endptr++;
if (*res == '\0' || *endptr != '\0')
{
- fprintf(stderr, "%s: shell command must return an integer (not \"%s\")\n",
- argv[0], res);
+ if (debug_fails)
+ fprintf(stderr,
+ "%s: shell command must return an integer (not \"%s\")\n",
+ argv[0], res);
return false;
}
- if (!putVariableInt(st, "setshell", variable, retval))
+ if (!putVariableInt(variables, "setshell", variable, retval, false))
return false;
#ifdef DEBUG
@@ -2046,7 +2288,7 @@ preparedStatementName(char *buffer, int file, int state)
}
static void
-commandFailed(CState *st, const char *message)
+commandFailed(CState *st, char *message)
{
fprintf(stderr,
"client %d aborted in command %d of script %d; %s\n",
@@ -2055,7 +2297,7 @@ commandFailed(CState *st, const char *message)
/* return a script number with a weighted choice. */
static int
-chooseScript(TState *thread)
+chooseScript(CState *st)
{
int i = 0;
int64 w;
@@ -2063,7 +2305,7 @@ chooseScript(TState *thread)
if (num_scripts == 1)
return 0;
- w = getrand(thread, 0, total_weight - 1);
+ w = getrand(st->random_state, 0, total_weight - 1);
do
{
w -= sql_script[i++].weight;
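The weighted choice above draws w uniformly in [0, total_weight) and walks the weights until the running remainder goes negative; a standalone sketch of that walk (illustrative names, with an explicit bounds guard):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Weighted choice: given w in [0, sum(weights)), subtract weights until
 * the remainder goes negative; the index where that happens is picked
 * with probability weights[i] / sum(weights).
 */
static int
choose_weighted(int64_t w, const int64_t *weights, int n)
{
	int i = 0;

	do
	{
		w -= weights[i++];
	} while (w >= 0 && i < n);
	return i - 1;
}
```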
@@ -2083,7 +2325,7 @@ sendCommand(CState *st, Command *command)
char *sql;
sql = pg_strdup(command->argv[0]);
- sql = assignVariables(st, sql);
+ sql = assignVariables(&st->variables, sql);
if (debug)
fprintf(stderr, "client %d sending %s\n", st->id, sql);
@@ -2095,7 +2337,7 @@ sendCommand(CState *st, Command *command)
const char *sql = command->argv[0];
const char *params[MAX_ARGS];
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
if (debug)
fprintf(stderr, "client %d sending %s\n", st->id, sql);
@@ -2129,7 +2371,7 @@ sendCommand(CState *st, Command *command)
st->prepared[st->use_file] = true;
}
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
if (debug)
@@ -2157,17 +2399,18 @@ sendCommand(CState *st, Command *command)
* of delay, in microseconds. Returns true on success, false on error.
*/
static bool
-evaluateSleep(CState *st, int argc, char **argv, int *usecs)
+evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
char *var;
int usec;
if (*argv[1] == ':')
{
- if ((var = getVariable(st, argv[1] + 1)) == NULL)
+ if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
- fprintf(stderr, "%s: undefined variable \"%s\"\n",
- argv[0], argv[1]);
+ if (debug_fails)
+ fprintf(stderr, "%s: undefined variable \"%s\"\n",
+ argv[0], argv[1]);
return false;
}
usec = atoi(var);
@@ -2189,6 +2432,217 @@ evaluateSleep(CState *st, int argc, char **argv, int *usecs)
return true;
}
+/* make a deep copy of variables array */
+static void
+copyVariables(Variables *destination_vars, const Variables *source_vars)
+{
+ Variable *destination;
+ Variable *current_destination;
+ const Variable *source;
+ const Variable *current_source;
+ int nvariables;
+
+ if (!destination_vars || !source_vars)
+ return;
+
+ destination = destination_vars->array;
+ source = source_vars->array;
+ nvariables = source_vars->nvariables;
+
+ for (current_destination = destination;
+ current_destination - destination < destination_vars->nvariables;
+ ++current_destination)
+ {
+ pg_free(current_destination->name);
+ pg_free(current_destination->value);
+ }
+
+ destination_vars->array = pg_realloc(destination_vars->array,
+ sizeof(Variable) * nvariables);
+ destination = destination_vars->array;
+
+ for (current_source = source, current_destination = destination;
+ current_source - source < nvariables;
+ ++current_source, ++current_destination)
+ {
+ current_destination->name = pg_strdup(current_source->name);
+ if (current_source->value)
+ current_destination->value = pg_strdup(current_source->value);
+ else
+ current_destination->value = NULL;
+ current_destination->is_numeric = current_source->is_numeric;
+ current_destination->num_value = current_source->num_value;
+ }
+
+ destination_vars->nvariables = nvariables;
+ destination_vars->vars_sorted = source_vars->vars_sorted;
+}
+
+/*
+ * Returns true if the SQL command contains more than one non-empty subcommand.
+ */
+static bool
+is_compound(const Command *command)
+{
+ return (command &&
+ command->type == SQL_COMMAND &&
+ command->num_subcmds > 1);
+}
+
+/*
+ * Returns true if we can end a failed transaction block with the nearest
+ * appropriate command, and false otherwise.
+ */
+static bool
+canEndFailedTxBlock(CState *st)
+{
+ const Command *command = sql_script[st->use_file].commands[st->command];
+
+ /* check that this script did not end */
+ if (!command)
+ return false;
+
+ /* We cannot use the subcommands from the current command. */
+ if (command->type == SQL_COMMAND &&
+ command->subcmd[st->subcmd].tx_block_end > st->subcmd)
+ return false;
+
+ /* check if there's a transaction block end later in this script */
+ if (command->tx_block_end == -1)
+ return false;
+
+ return !is_compound(
+ sql_script[st->use_file].commands[command->tx_block_end]);
+}
+
+/*
+ * Returns the number of the first non-empty subcommand that completes the
+ * transaction block, or -1 otherwise.
+ */
+static int
+get_first_tx_block_end(const Command *command)
+{
+ if (!command || command->type != SQL_COMMAND)
+ return -1;
+
+ return command->subcmd->tx_block_end;
+}
+
+/*
+ * Returns true if the failure can be retried.
+ */
+static bool
+canRetry(CState *st, FailureStatus failure_status)
+{
+ const Command *command = sql_script[st->use_file].commands[st->command];
+
+ Assert(failure_status >= 0 && failure_status < NUM_FAILURE_STATUS);
+ Assert(st->command >= st->retry_state.command);
+
+ /* We cannot retry if we do not end the failed transaction block. */
+ if (st->state == CSTATE_END_FAILED_TX_BLOCK_START ||
+ st->state == CSTATE_END_FAILED_TX_BLOCK_WAIT ||
+ st->state == CSTATE_END_FAILED_TX_BLOCK_END ||
+ st->state == CSTATE_END_FAILED_TX_BLOCK_FAILURE)
+ return false;
+
+ /* We can only retry serialization or deadlock failures. */
+ if (!FAILURE_STATUS[failure_status].failure_status_str)
+ return false;
+
+ /*
+ * We cannot retry if we cannot end a failed transaction block by the
+ * nearest appropriate command.
+ */
+ if (st->in_tx_block && !canEndFailedTxBlock(st))
+ return false;
+
+ /*
+ * We cannot retry if we continue the failed transaction block from the
+ * previous script.
+ */
+ if (st->retry_state.command == -1)
+ return false;
+
+ /* Sometimes not all subcommands can be retried from the very beginning. */
+ if (st->retry_state.subcmd >= 0 &&
+ (st->command > st->retry_state.command ||
+ st->subcmd > st->retry_state.subcmd))
+ return false;
+
+ /* Sometimes not all failed subcommands can be retried. */
+ if (is_compound(command))
+ {
+ if (st->was_in_tx_block || command->first_tx_block_begin == 0)
+ {
+ /*
+ * We can retry all the subcommands until the end of the first
+ * explicit transaction block.
+ */
+ int first_tx_block_end = get_first_tx_block_end(command);
+
+ if (first_tx_block_end >= 0 &&
+ st->subcmd > first_tx_block_end)
+ return false;
+ }
+ else
+ {
+ /*
+ * We can retry all the subcommands before the first explicit
+ * transaction block.
+ */
+ if (command->first_tx_block_begin >= 0 &&
+ st->subcmd > command->first_tx_block_begin)
+ return false;
+ }
+ }
+
+ /*
+ * We cannot retry the failure if we have reached the maximum number of
+ * tries.
+ */
+ if (st->retry_state.retries + 1 >= max_tries)
+ return false;
+
+ return true;
+}
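Stripped of the transaction-block and subcommand constraints, the core of canRetry() is: only serialization and deadlock failures are retriable, and only while tries remain. A minimal sketch of that decision (illustrative names):

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
	NO_FAILURE,
	META_COMMAND_FAILURE,
	SERIALIZATION_FAILURE,
	DEADLOCK_FAILURE,
	IN_FAILED_SQL_TRANSACTION,
	ANOTHER_FAILURE
} FailureStatus;

/*
 * Core retry decision: serialization and deadlock failures may be
 * retried while the try count (retries already done plus the current
 * try) stays below max_tries; everything else is a hard error.
 */
static bool
retriable(FailureStatus status, int retries, int max_tries)
{
	if (status != SERIALIZATION_FAILURE && status != DEADLOCK_FAILURE)
		return false;
	return retries + 1 < max_tries;
}
```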
+
+/*
+ * Report that the client got a failure. Report it as an error if the failure
+ * cannot be retried.
+ *
+ * The message must be terminated with a newline character.
+ */
+static void
+print_failure(CState *st, FailureStatus failure_status, const char *message)
+{
+ const Command *command = sql_script[st->use_file].commands[st->command];
+ const char *status_str;
+
+ Assert(failure_status >= 0 && failure_status < NUM_FAILURE_STATUS);
+ Assert(message && *message);
+
+ if (canRetry(st, failure_status))
+ {
+ status_str = FAILURE_STATUS[failure_status].failure_status_str;
+ fprintf(stderr, "client %d got %s (try %d/%d) in command %d",
+ st->id, status_str ? status_str : "a failure",
+ st->retry_state.retries + 1, max_tries, st->command);
+ }
+ else
+ {
+ status_str = FAILURE_STATUS[failure_status].error_status_str;
+ fprintf(stderr, "client %d got %s in command %d",
+ st->id, status_str ? status_str : "an error", st->command);
+ }
+
+ if (is_compound(command))
+ fprintf(stderr, " (subcommand %d)",
+ command->subcmd[st->subcmd].initial_subcmd_num);
+
+ fprintf(stderr, " of script %d:\n%s", st->use_file, message);
+}
+
/*
* Advance the state machine of a connection, if possible.
*/
@@ -2200,6 +2654,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
instr_time now;
bool end_tx_processed = false;
int64 wait;
+ FailureStatus failure_status = NO_FAILURE;
/*
* gettimeofday() isn't free, so we get the current timestamp lazily the
@@ -2228,12 +2683,20 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
case CSTATE_CHOOSE_SCRIPT:
- st->use_file = chooseScript(thread);
+ st->use_file = chooseScript(st);
if (debug)
fprintf(stderr, "client %d executing script \"%s\"\n", st->id,
sql_script[st->use_file].desc);
+ /* reset transaction variables to default values */
+ st->retry_state.command = -1;
+ st->retries = 0;
+ st->error = false;
+ st->error_not_in_failed_tx = false;
+ st->last_failure.status = NO_FAILURE;
+ st->last_failure.command = -1;
+
if (throttle_delay > 0)
st->state = CSTATE_START_THROTTLE;
else
@@ -2255,7 +2718,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* away.
*/
Assert(throttle_delay > 0);
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(st->random_state, throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
@@ -2289,7 +2752,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
{
processXactStats(thread, st, &now, true, agg);
/* next rendez-vous */
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(st->random_state, throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
}
@@ -2376,6 +2839,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Send a command to server (or execute a meta-command)
*/
case CSTATE_START_COMMAND:
+ case CSTATE_END_FAILED_TX_BLOCK_START:
command = sql_script[st->use_file].commands[st->command];
/*
@@ -2388,11 +2852,67 @@ doCustom(TState *thread, CState *st, StatsData *agg)
break;
}
+ if (command->type == SQL_COMMAND &&
+ !st->in_tx_block &&
+ st->retry_state.command < st->command)
+ {
+ /*
+ * This is the first try to run the transaction that begins in
+ * the current command. Remember its parameters in case we
+ * have to repeat it later.
+ */
+ st->retry_state.command = st->command;
+
+ if (is_compound(command) &&
+ command->first_tx_block_begin >= 0)
+ {
+ if (command->first_tx_block_begin == 0)
+ {
+ /*
+ * We can retry only the first explicit transaction
+ * block.
+ */
+ st->retry_state.subcmd =
+ get_first_tx_block_end(command);
+ }
+ else
+ {
+ /*
+ * We can retry all the subcommands before the first
+ * explicit transaction block.
+ */
+ st->retry_state.subcmd =
+ command->first_tx_block_begin;
+ }
+
+ }
+ else
+ {
+ /*
+ * The command is not compound, or it contains no
+ * subcommands that run an explicit transaction block,
+ * so we can retry all of its subcommands.
+ */
+ st->retry_state.subcmd = -1;
+ }
+
+ st->retry_state.retries = 0;
+ memcpy(st->retry_state.random_state, st->random_state,
+ sizeof(unsigned short) * 3);
+ copyVariables(&st->retry_state.variables, &st->variables);
+ }
+
+ /*
+ * Remember this in case we have to retry a failure in the
+ * current command.
+ */
+ st->was_in_tx_block = st->in_tx_block;
+
/*
* Record statement start time if per-command latencies are
* requested
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -2406,6 +2926,12 @@ doCustom(TState *thread, CState *st, StatsData *agg)
commandFailed(st, "SQL command send failed");
st->state = CSTATE_ABORTED;
}
+
+ /* wait for the result of the first subcommand */
+ st->subcmd = 0;
+
+ if (st->state == CSTATE_END_FAILED_TX_BLOCK_START)
+ st->state = CSTATE_END_FAILED_TX_BLOCK_WAIT;
else
st->state = CSTATE_WAIT_RESULT;
}
@@ -2423,6 +2949,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
fprintf(stderr, "\n");
}
+ failure_status = NO_FAILURE;
+
if (command->meta == META_SLEEP)
{
/*
@@ -2434,10 +2962,14 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
int usec;
- if (!evaluateSleep(st, argc, argv, &usec))
+ if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
- commandFailed(st, "execution of meta-command 'sleep' failed");
- st->state = CSTATE_ABORTED;
+ failure_status = META_COMMAND_FAILURE;
+ if (debug_fails)
+ print_failure(st, failure_status,
+ "execution of meta-command 'sleep' failed\n");
+
+ st->state = CSTATE_END_COMMAND;
break;
}
@@ -2456,21 +2988,32 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (!evaluateExpr(thread, st, expr, &result))
{
- commandFailed(st, "evaluation of meta-command 'set' failed");
- st->state = CSTATE_ABORTED;
+ failure_status = META_COMMAND_FAILURE;
+ if (debug_fails)
+ print_failure(st, failure_status,
+ "evaluation of meta-command 'set' failed\n");
+
+ st->state = CSTATE_END_COMMAND;
break;
}
- if (!putVariableNumber(st, argv[0], argv[1], &result))
+ if (!putVariableNumber(&st->variables, argv[0],
+ argv[1], &result, false))
{
- commandFailed(st, "assignment of meta-command 'set' failed");
- st->state = CSTATE_ABORTED;
+ failure_status = META_COMMAND_FAILURE;
+ if (debug_fails)
+ print_failure(st, failure_status,
+ "assignment of meta-command 'set' failed\n");
+
+ st->state = CSTATE_END_COMMAND;
break;
}
}
else if (command->meta == META_SETSHELL)
{
- bool ret = runShellCommand(st, argv[1], argv + 2, argc - 2);
+ bool ret = runShellCommand(&st->variables,
+ argv[1], argv + 2,
+ argc - 2);
if (timer_exceeded) /* timeout */
{
@@ -2479,8 +3022,12 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (!ret) /* on error */
{
- commandFailed(st, "execution of meta-command 'setshell' failed");
- st->state = CSTATE_ABORTED;
+ failure_status = META_COMMAND_FAILURE;
+ if (debug_fails)
+ print_failure(st, failure_status,
+ "execution of meta-command 'setshell' failed\n");
+
+ st->state = CSTATE_END_COMMAND;
break;
}
else
@@ -2490,7 +3037,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (command->meta == META_SHELL)
{
- bool ret = runShellCommand(st, NULL, argv + 1, argc - 1);
+ bool ret = runShellCommand(&st->variables,
+ NULL, argv + 1,
+ argc - 1);
if (timer_exceeded) /* timeout */
{
@@ -2499,8 +3048,12 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (!ret) /* on error */
{
- commandFailed(st, "execution of meta-command 'shell' failed");
- st->state = CSTATE_ABORTED;
+ failure_status = META_COMMAND_FAILURE;
+ if (debug_fails)
+ print_failure(st, failure_status,
+ "execution of meta-command 'shell' failed\n");
+
+ st->state = CSTATE_END_COMMAND;
break;
}
else
@@ -2525,38 +3078,133 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Wait for the current SQL command to complete
*/
case CSTATE_WAIT_RESULT:
- command = sql_script[st->use_file].commands[st->command];
- if (debug)
- fprintf(stderr, "client %d receiving\n", st->id);
- if (!PQconsumeInput(st->con))
- { /* there's something wrong */
- commandFailed(st, "perhaps the backend died while processing");
- st->state = CSTATE_ABORTED;
- break;
- }
- if (PQisBusy(st->con))
- return; /* don't have the whole result yet */
-
- /*
- * Read and discard the query result;
- */
- res = PQgetResult(st->con);
- switch (PQresultStatus(res))
+ case CSTATE_END_FAILED_TX_BLOCK_WAIT:
{
- case PGRES_COMMAND_OK:
- case PGRES_TUPLES_OK:
- case PGRES_EMPTY_QUERY:
- /* OK */
- PQclear(res);
- discard_response(st);
- st->state = CSTATE_END_COMMAND;
- break;
- default:
- commandFailed(st, PQerrorMessage(st->con));
- PQclear(res);
+ char *sqlState;
+
+ command = sql_script[st->use_file].commands[st->command];
+
+ if (debug)
+ fprintf(stderr, "client %d receiving\n", st->id);
+ if (!PQconsumeInput(st->con))
+ { /* there's something wrong */
+ commandFailed(st, "perhaps the backend died while processing");
st->state = CSTATE_ABORTED;
break;
+ }
+ if (PQisBusy(st->con))
+ return; /* don't have the whole result yet */
+
+ /*
+ * Read and discard the query result.
+ */
+ res = PQgetResult(st->con);
+
+ sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+ failure_status = NO_FAILURE;
+ if (sqlState && *sqlState)
+ {
+ if (strcmp(sqlState,
+ ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
+ failure_status = SERIALIZATION_FAILURE;
+ else if (strcmp(sqlState,
+ ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
+ failure_status = DEADLOCK_FAILURE;
+ else if (strcmp(sqlState,
+ ERRCODE_IN_FAILED_SQL_TRANSACTION) == 0)
+ failure_status = IN_FAILED_SQL_TRANSACTION;
+ else
+ failure_status = ANOTHER_FAILURE;
+ }
+
+ switch (PQresultStatus(res))
+ {
+ case PGRES_COMMAND_OK:
+ case PGRES_TUPLES_OK:
+ case PGRES_EMPTY_QUERY:
+ /* OK */
+ if (command->type == SQL_COMMAND &&
+ st->subcmd < command->num_subcmds - 1)
+ {
+ /*
+ * Wait for the results for the other non-empty
+ * subcommands.
+ */
+ PQclear(res);
+ st->subcmd++;
+ break;
+ }
+
+ /*
+ * Otherwise finish processing the command (failures
+ * and errors will be processed later).
+ */
+ case PGRES_NONFATAL_ERROR:
+ case PGRES_FATAL_ERROR:
+ {
+ PGTransactionStatusType tx_status;
+ char buffer[256];
+
+ PQclear(res);
+ discard_response(st);
+
+ tx_status = PQtransactionStatus(st->con);
+ switch (tx_status)
+ {
+ case PQTRANS_IDLE:
+ st->in_tx_block = false;
+ break;
+ case PQTRANS_INTRANS:
+ case PQTRANS_INERROR:
+ st->in_tx_block = true;
+ break;
+ case PQTRANS_UNKNOWN:
+ /*
+ * PQTRANS_UNKNOWN is expected given a
+ * broken connection.
+ */
+ if (PQstatus(st->con) == CONNECTION_BAD)
+ { /* there's something wrong */
+ commandFailed(st,
+ "perhaps the backend died while processing");
+ st->state = CSTATE_ABORTED;
+ goto done;
+ }
+ case PQTRANS_ACTIVE:
+ default:
+ /*
+ * We cannot find out whether we are in
+ * a transaction block or not. This is an
+ * internal error that should never occur.
+ */
+ sprintf(buffer,
+ "unexpected transaction status %d",
+ tx_status);
+ commandFailed(st, buffer);
+ st->state = CSTATE_ABORTED;
+ goto done;
+ }
+
+ if (failure_status != NO_FAILURE && debug_fails)
+ print_failure(st, failure_status,
+ PQerrorMessage(st->con));
+
+ if (st->state ==
+ CSTATE_END_FAILED_TX_BLOCK_WAIT)
+ st->state = CSTATE_END_FAILED_TX_BLOCK_END;
+ else
+ st->state = CSTATE_END_COMMAND;
+ }
+ break;
+ default:
+ commandFailed(st, PQerrorMessage(st->con));
+ PQclear(res);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
}
+
+done:
break;
/*
@@ -2578,26 +3226,176 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* End of command: record stats and proceed to next command.
*/
case CSTATE_END_COMMAND:
+ case CSTATE_END_FAILED_TX_BLOCK_END:
+ command = sql_script[st->use_file].commands[st->command];
+
+ /* process a failure if we have it */
+ if (failure_status != NO_FAILURE)
+ {
+ if (st->state == CSTATE_END_FAILED_TX_BLOCK_END)
+ st->state = CSTATE_END_FAILED_TX_BLOCK_FAILURE;
+ else
+ st->state = CSTATE_FAILURE;
+ break;
+ }
/*
* command completed: accumulate per-command execution times
* in thread-local data structure, if per-command latencies
* are requested.
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
/* XXX could use a mutex here, but we choose not to */
- command = sql_script[st->use_file].commands[st->command];
addToSimpleStats(&command->stats,
INSTR_TIME_GET_DOUBLE(now) -
INSTR_TIME_GET_DOUBLE(st->stmt_begin));
}
- /* Go ahead with next command */
- st->command++;
+ if (st->state == CSTATE_END_FAILED_TX_BLOCK_END)
+ {
+ /*
+ * The failed transaction block has been completed. Retry
+ * the failed transaction if possible.
+ */
+ Assert(st->last_failure.command >= 0);
+
+ failure_status = st->last_failure.status;
+ st->command = st->last_failure.command;
+ st->subcmd = st->last_failure.subcmd;
+ st->was_in_tx_block = st->last_failure.was_in_tx_block;
+
+ st->state = CSTATE_END_FAILED_TX_BLOCK_RETRY;
+ }
+ else
+ {
+ /* Go ahead with next command */
+ st->command++;
+ st->state = CSTATE_START_COMMAND;
+ }
+
+ break;
+
+ /*
+ * Report the failure and end the failed transaction block if possible.
+ */
+ case CSTATE_FAILURE:
+ case CSTATE_END_FAILED_TX_BLOCK_FAILURE:
+ command = sql_script[st->use_file].commands[st->command];
+
+ if (canRetry(st, failure_status))
+ {
+ /*
+ * The failed transaction will be retried. So accumulate the
+ * retry for the command and for the current script
+ * execution.
+ */
+ st->retries++;
+ if (report_per_command)
+ command->retries++;
+ }
+ else
+ {
+ /*
+ * We will not be able to retry this failed transaction. So
+ * accumulate the error for the command and for the current
+ * script execution.
+ */
+ st->error = true;
+ if (failure_status != IN_FAILED_SQL_TRANSACTION)
+ st->error_not_in_failed_tx = true;
+
+ if (report_per_command)
+ {
+ command->errors++;
+ if (failure_status == IN_FAILED_SQL_TRANSACTION)
+ command->errors_in_failed_tx++;
+ }
+ }
+
+ if (st->state == CSTATE_END_FAILED_TX_BLOCK_FAILURE)
+ {
+ /* Go ahead with next command. */
+ st->command++;
+ st->state = CSTATE_START_COMMAND;
+ break;
+ }
+
+ /*
+ * Remember the current command and its failure status in case
+ * we try to end the failed transaction block.
+ */
+ st->last_failure.status = failure_status;
+ st->last_failure.command = st->command;
+ st->last_failure.subcmd = st->subcmd;
+ st->last_failure.was_in_tx_block = st->was_in_tx_block;
+
+ if (st->in_tx_block)
+ {
+ if (canEndFailedTxBlock(st))
+ {
+ /* Try to end the failed transaction block. */
+ st->command = command->tx_block_end;
+ st->state = CSTATE_END_FAILED_TX_BLOCK_START;
+ }
+ else
+ {
+ /*
+ * We cannot complete the failed transaction block, so go
+ * ahead with the next command.
+ */
+ st->command++;
+ st->state = CSTATE_START_COMMAND;
+ }
+ }
+ else
+ {
+ /* retry the failed transaction if possible */
+ st->state = CSTATE_RETRY;
+ }
+
+ break;
+
+ /*
+ * Retry the failed transaction if possible.
+ */
+ case CSTATE_RETRY:
+ case CSTATE_END_FAILED_TX_BLOCK_RETRY:
+ command = sql_script[st->use_file].commands[st->command];
+
+ if (canRetry(st, failure_status))
+ {
+ st->retry_state.retries++;
+ if (debug)
+ fprintf(stderr, "client %d repeats the failed transaction (try %d/%d)\n",
+ st->id,
+ st->retry_state.retries + 1,
+ max_tries);
+
+ st->command = st->retry_state.command;
+ memcpy(st->random_state, st->retry_state.random_state,
+ sizeof(unsigned short) * 3);
+ copyVariables(&st->variables, &st->retry_state.variables);
+ }
+ else
+ {
+ /*
+ * Go ahead with the next command after the failed transaction.
+ */
+ if (st->state == CSTATE_END_FAILED_TX_BLOCK_RETRY)
+ {
+ Assert(command->tx_block_end >= 0);
+ st->command = command->tx_block_end + 1;
+ }
+ else
+ {
+ st->command++;
+ }
+ }
+
st->state = CSTATE_START_COMMAND;
break;
@@ -2680,7 +3478,7 @@ doLog(TState *thread, CState *st,
* to the random sample.
*/
if (sample_rate != 0.0 &&
- pg_erand48(thread->random_state) > sample_rate)
+ pg_erand48(st->random_state) > sample_rate)
return;
/* should we aggregate the results or not? */
@@ -2696,13 +3494,15 @@ doLog(TState *thread, CState *st,
while (agg->start_time + agg_interval <= now)
{
/* print aggregated report to logfile */
- fprintf(logfile, "%ld " INT64_FORMAT " %.0f %.0f %.0f %.0f",
+ fprintf(logfile, "%ld " INT64_FORMAT " %.0f %.0f %.0f %.0f " INT64_FORMAT " " INT64_FORMAT,
(long) agg->start_time,
agg->cnt,
agg->latency.sum,
agg->latency.sum2,
agg->latency.min,
- agg->latency.max);
+ agg->latency.max,
+ agg->errors,
+ agg->errors_in_failed_tx);
if (throttle_delay)
{
fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -2713,6 +3513,10 @@ doLog(TState *thread, CState *st,
if (latency_limit)
fprintf(logfile, " " INT64_FORMAT, agg->skipped);
}
+ if (max_tries > 1)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->retried,
+ agg->retries);
fputc('\n', logfile);
/* reset data and move to next interval */
@@ -2720,7 +3524,8 @@ doLog(TState *thread, CState *st,
}
/* accumulate the current transaction */
- accumStats(agg, skipped, latency, lag);
+ accumStats(agg, skipped, st->error, latency, lag,
+ st->error_not_in_failed_tx, st->retries);
}
else
{
@@ -2732,12 +3537,25 @@ doLog(TState *thread, CState *st,
fprintf(logfile, "%d " INT64_FORMAT " skipped %d %ld %ld",
st->id, st->cnt, st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
+ else if (st->error)
+ {
+ if (st->error_not_in_failed_tx)
+ fprintf(logfile, "%d " INT64_FORMAT " failed %d %ld %ld",
+ st->id, st->cnt, st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+ else
+ fprintf(logfile, "%d " INT64_FORMAT " in_failed_tx %d %ld %ld",
+ st->id, st->cnt, st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+ }
else
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d %ld %ld",
st->id, st->cnt, latency, st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
if (throttle_delay)
fprintf(logfile, " %.0f", lag);
+ if (max_tries > 1)
+ fprintf(logfile, " " INT64_FORMAT, st->retries);
fputc('\n', logfile);
}
}
@@ -2757,7 +3575,7 @@ processXactStats(TState *thread, CState *st, instr_time *now,
bool thread_details = progress || throttle_delay || latency_limit,
detailed = thread_details || use_log || per_script_stats;
- if (detailed && !skipped)
+ if (detailed && !skipped && !st->error)
{
if (INSTR_TIME_IS_ZERO(*now))
INSTR_TIME_SET_CURRENT(*now);
@@ -2770,7 +3588,8 @@ processXactStats(TState *thread, CState *st, instr_time *now,
if (thread_details)
{
/* keep detailed thread stats */
- accumStats(&thread->stats, skipped, latency, lag);
+ accumStats(&thread->stats, skipped, st->error, latency, lag,
+ st->error_not_in_failed_tx, st->retries);
/* count transactions over the latency limit, if needed */
if (latency_limit && latency > latency_limit)
@@ -2778,8 +3597,9 @@ processXactStats(TState *thread, CState *st, instr_time *now,
}
else
{
- /* no detailed stats, just count */
- thread->stats.cnt++;
+ /* no detailed stats */
+ accumStats(&thread->stats, skipped, st->error, 0, 0,
+ st->error_not_in_failed_tx, st->retries);
}
/* client stat is just counting */
@@ -2790,7 +3610,8 @@ processXactStats(TState *thread, CState *st, instr_time *now,
/* XXX could use a mutex here, but we choose not to */
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ accumStats(&sql_script[st->use_file].stats, skipped, st->error,
+ latency, lag, st->error_not_in_failed_tx, st->retries);
}
@@ -3319,6 +4140,82 @@ syntax_error(const char *source, int lineno,
}
/*
+ * Returns a copy of the given text in which every run of whitespace is
+ * replaced by a single space.
+ *
+ * Returns a malloc'd string.
+ */
+static char *
+normalize_whitespaces(const char *text)
+{
+ const char *ptr = text;
+ char *buffer;
+ int length;
+
+ if (!text)
+ return NULL;
+
+ buffer = pg_malloc(strlen(text) + 1);
+ length = 0;
+
+ while (*ptr)
+ {
+ while (*ptr && !isspace((unsigned char) *ptr))
+ buffer[length++] = *(ptr++);
+ if (isspace((unsigned char) *ptr))
+ {
+ buffer[length++] = ' ';
+ while (isspace((unsigned char) *ptr))
+ ptr++;
+ }
+ }
+ buffer[length] = '\0';
+
+ return buffer;
+}
+
+/*
+ * Returns true if the given command syntactically starts a transaction block
+ * (we don't check here whether the previous transaction block has completed).
+ */
+static bool
+is_tx_block_begin(const char *command)
+{
+ if (!command)
+ return false;
+
+ return (pg_strncasecmp(command, "begin", 5) == 0 ||
+ pg_strncasecmp(command, "start", 5) == 0);
+}
+
+/*
+ * Returns true if the given command syntactically ends a transaction block (we
+ * don't check here whether the previous transaction block has completed).
+ */
+static bool
+is_tx_block_end(const char *command)
+{
+ if (!command)
+ return false;
+
+ if (pg_strncasecmp(command, "end", 3) == 0)
+ return true;
+
+ if (pg_strncasecmp(command, "commit", 6) == 0)
+ return pg_strncasecmp(command, "commit prepared", 15) != 0;
+
+ if (pg_strncasecmp(command, "rollback", 8) == 0)
+ return (pg_strncasecmp(command, "rollback prepared", 17) != 0 &&
+ pg_strncasecmp(command, "rollback to", 11) != 0);
+
+ if (pg_strncasecmp(command, "prepare transaction ", 20) == 0)
+ return (pg_strncasecmp(command, "prepare transaction (", 21) != 0 &&
+ pg_strncasecmp(command, "prepare transaction as ", 23) != 0);
+
+ return false;
+}
+
+/*
* Parse a SQL command; return a Command struct, or NULL if it's a comment
*
* On entry, psqlscan.l has collected the command into "buf", so we don't
@@ -3331,6 +4228,16 @@ process_sql_command(PQExpBuffer buf, const char *source)
Command *my_command;
char *p;
char *nlpos;
+ /* for the subcommands of the compound SQL command: */
+ PsqlScanState sstate = psql_scan_create(&pgbench_callbacks);
+ PQExpBufferData line_buf;
+ PsqlScanResult sr;
+ promptStatus_t prompt;
+ char *command_text;
+ int alloc_num = 1;
+ int num_all_subcmds = 0;
+ bool has_non_empty_subcmd = false;
+ int last_tx_block_end = -1;
/* Skip any leading whitespace, as well as "--" style comments */
p = buf->data;
@@ -3359,6 +4266,14 @@ process_sql_command(PQExpBuffer buf, const char *source)
my_command->type = SQL_COMMAND;
my_command->meta = META_NONE;
initSimpleStats(&my_command->stats);
+ my_command->tx_block_end = -1;
+ my_command->first_tx_block_begin = -1;
+
+ /* if the command is empty */
+ my_command->num_subcmds = 1;
+ my_command->subcmd = (Subcommand *) pg_malloc0(
+ sizeof(Subcommand) * alloc_num);
+ my_command->subcmd->tx_block_end = -1;
/*
* Install query text as the sole argv string. If we are using a
@@ -3381,6 +4296,89 @@ process_sql_command(PQExpBuffer buf, const char *source)
else
my_command->line = pg_strdup(p);
+ command_text = normalize_whitespaces(my_command->argv[0]);
+
+ /* get subcommands of a compound command */
+ psql_scan_setup(sstate, command_text, strlen(command_text), 0, true);
+ initPQExpBuffer(&line_buf);
+
+ for (;;)
+ {
+ const char *subcmd_text;
+ Subcommand *subcommand;
+
+ resetPQExpBuffer(&line_buf);
+ sr = psql_scan(sstate, &line_buf, &prompt);
+ subcmd_text = line_buf.data;
+ num_all_subcmds++;
+
+ /* process only non-empty subcommands */
+ if (subcmd_text && *subcmd_text != '\0' && *subcmd_text != ';')
+ {
+ /*
+ * Do not increment the counter if this is the first non-empty
+ * subcommand.
+ */
+ if (has_non_empty_subcmd)
+ my_command->num_subcmds++;
+ else
+ has_non_empty_subcmd = true;
+
+ subcommand = &my_command->subcmd[my_command->num_subcmds - 1];
+ subcommand->initial_subcmd_num = num_all_subcmds - 1;
+
+ /*
+ * Check if the subcommand syntactically starts the transaction
+ * block.
+ */
+ if (is_tx_block_begin(subcmd_text) &&
+ my_command->first_tx_block_begin == -1)
+ my_command->first_tx_block_begin = my_command->num_subcmds - 1;
+
+ /*
+ * Check if the subcommand syntactically completes the transaction
+ * block.
+ */
+ if (is_tx_block_end(subcmd_text))
+ {
+ int cur_index;
+
+ /*
+ * Remember this end for the earlier non-empty subcommands:
+ * if one of them fails, the block can be completed here.
+ */
+ for (cur_index = last_tx_block_end + 1;
+ cur_index < my_command->num_subcmds;
+ ++cur_index)
+ my_command->subcmd[cur_index].tx_block_end =
+ my_command->num_subcmds - 1;
+ last_tx_block_end = my_command->num_subcmds - 1;
+ }
+ else
+ {
+ subcommand->tx_block_end = -1;
+ }
+ }
+
+ /* Done if we reached EOF */
+ if (sr == PSCAN_INCOMPLETE || sr == PSCAN_EOL)
+ break;
+
+ if (my_command->num_subcmds >= alloc_num)
+ {
+ alloc_num = alloc_num * 2;
+ my_command->subcmd = (Subcommand *) pg_realloc(
+ my_command->subcmd,
+ sizeof(Subcommand) * alloc_num);
+ }
+ }
+
+ termPQExpBuffer(&line_buf);
+ psql_scan_finish(sstate);
+ psql_scan_destroy(sstate);
+
+ free(command_text);
+
return my_command;
}
@@ -3419,6 +4417,7 @@ process_backslash_command(PsqlScanState sstate, const char *source)
my_command->type = META_COMMAND;
my_command->argc = 0;
initSimpleStats(&my_command->stats);
+ my_command->tx_block_end = -1;
/* Save first word (command name) */
j = 0;
@@ -3561,6 +4560,7 @@ ParseScript(const char *script, const char *desc, int weight)
PQExpBufferData line_buf;
int alloc_num;
int index;
+ int last_tx_block_end = -1;
#define COMMANDS_ALLOC_NUM 128
alloc_num = COMMANDS_ALLOC_NUM;
@@ -3612,6 +4612,21 @@ ParseScript(const char *script, const char *desc, int weight)
ps.commands = (Command **)
pg_realloc(ps.commands, sizeof(Command *) * alloc_num);
}
+
+ if (get_first_tx_block_end(command) >= 0)
+ {
+ /*
+ * Remember this end for the earlier commands: if one of
+ * them fails, the transaction block can be completed here.
+ */
+ int cur_index;
+
+ for (cur_index = last_tx_block_end + 1;
+ cur_index < index;
+ cur_index++)
+ ps.commands[cur_index]->tx_block_end = index - 1;
+ last_tx_block_end = index - 1;
+ }
}
/* If we reached a backslash, process that */
@@ -3873,6 +4888,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
/* Report test parameters. */
printf("transaction type: %s\n",
num_scripts == 1 ? sql_script[0].desc : "multiple scripts");
+ printf("maximum number of transaction tries: %d\n", max_tries);
printf("scaling factor: %d\n", scale);
printf("query mode: %s\n", QUERYMODE[querymode]);
printf("number of clients: %d\n", nclients);
@@ -3903,6 +4919,15 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (total->cnt <= 0)
return;
+ if (total->errors > 0)
+ printf("number of errors: " INT64_FORMAT " (%.3f %%)\n",
+ total->errors, 100.0 * total->errors / total->cnt);
+
+ if (total->errors_in_failed_tx > 0)
+ printf("number of errors \"in failed SQL transaction\": " INT64_FORMAT " (%.3f %%)\n",
+ total->errors_in_failed_tx,
+ 100.0 * total->errors_in_failed_tx / total->cnt);
+
if (throttle_delay && latency_limit)
printf("number of transactions skipped: " INT64_FORMAT " (%.3f %%)\n",
total->skipped,
@@ -3934,11 +4959,19 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
0.001 * total->lag.sum / total->cnt, 0.001 * total->lag.max);
}
+ /* it can be non-zero only if max_tries is greater than one */
+ if (total->retried > 0)
+ {
+ printf("number of transactions retried: " INT64_FORMAT " (%.3f %%)\n",
+ total->retried, 100.0 * total->retried / total->cnt);
+ printf("number of retries: " INT64_FORMAT "\n", total->retries);
+ }
+
printf("tps = %f (including connections establishing)\n", tps_include);
printf("tps = %f (excluding connections establishing)\n", tps_exclude);
/* Report per-script/command statistics */
- if (per_script_stats || is_latencies)
+ if (per_script_stats || report_per_command)
{
int i;
@@ -3958,23 +4991,59 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
100.0 * sstats->cnt / total->cnt,
(sstats->cnt - sstats->skipped) / time_include);
+ if (total->errors > 0)
+ printf(" - number of errors: " INT64_FORMAT " (%.3f %%)\n",
+ sstats->errors,
+ 100.0 * sstats->errors / sstats->cnt);
+
+ if (total->errors_in_failed_tx > 0)
+ printf(" - number of errors \"in failed SQL transaction\": " INT64_FORMAT " (%.3f %%)\n",
+ sstats->errors_in_failed_tx,
+ 100.0 * sstats->errors_in_failed_tx / sstats->cnt);
+
if (throttle_delay && latency_limit && sstats->cnt > 0)
printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
sstats->skipped,
100.0 * sstats->skipped / sstats->cnt);
printSimpleStats(" - latency", &sstats->latency);
+
+ /* it can be non-zero only if max_tries is greater than one */
+ if (total->retried > 0)
+ {
+ printf(" - number of transactions retried: " INT64_FORMAT " (%.3f %%)\n",
+ sstats->retried,
+ 100.0 * sstats->retried / sstats->cnt);
+ printf(" - number of retries: " INT64_FORMAT "\n",
+ sstats->retries);
+ }
}
- /* Report per-command latencies */
- if (is_latencies)
+ /*
+ * Report per-command statistics: latencies, retries after failures,
+ * errors (failures without retrying).
+ */
+ if (report_per_command)
{
Command **commands;
if (per_script_stats)
- printf(" - statement latencies in milliseconds:\n");
+ printf(" - statement latencies in milliseconds");
else
- printf("statement latencies in milliseconds:\n");
+ printf("statement latencies in milliseconds");
+
+ if (total->errors > 0)
+ printf("%s errors",
+ ((total->errors_in_failed_tx == 0 &&
+ total->retried == 0) ?
+ " and" :
+ ","));
+ if (total->errors_in_failed_tx > 0)
+ printf("%s errors \"in failed SQL transaction\"",
+ total->retried == 0 ? " and" : ",");
+ if (total->retried > 0)
+ printf(" and retries");
+ printf(":\n");
for (commands = sql_script[i].commands;
*commands != NULL;
@@ -3982,10 +5051,19 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
{
SimpleStats *cstats = &(*commands)->stats;
- printf(" %11.3f %s\n",
+ printf(" %11.3f",
(cstats->count > 0) ?
- 1000.0 * cstats->sum / cstats->count : 0.0,
- (*commands)->line);
+ 1000.0 * cstats->sum / cstats->count : 0.0);
+ if (total->errors > 0)
+ printf(" %20" INT64_MODIFIER "d",
+ (*commands)->errors);
+ if (total->errors_in_failed_tx > 0)
+ printf(" %20" INT64_MODIFIER "d",
+ (*commands)->errors_in_failed_tx);
+ if (total->retried > 0)
+ printf(" %20" INT64_MODIFIER "d",
+ (*commands)->retries);
+ printf(" %s\n", (*commands)->line);
}
}
}
@@ -4016,7 +5094,7 @@ main(int argc, char **argv)
{"progress", required_argument, NULL, 'P'},
{"protocol", required_argument, NULL, 'M'},
{"quiet", no_argument, NULL, 'q'},
- {"report-latencies", no_argument, NULL, 'r'},
+ {"report-per-command", no_argument, NULL, 'r'},
{"rate", required_argument, NULL, 'R'},
{"scale", required_argument, NULL, 's'},
{"select-only", no_argument, NULL, 'S'},
@@ -4034,6 +5112,8 @@ main(int argc, char **argv)
{"progress-timestamp", no_argument, NULL, 6},
{"log-prefix", required_argument, NULL, 7},
{"foreign-keys", no_argument, NULL, 8},
+ {"debug-fails", no_argument, NULL, 9},
+ {"max-tries", required_argument, NULL, 10},
{NULL, 0, NULL, 0}
};
@@ -4101,6 +5181,8 @@ main(int argc, char **argv)
state = (CState *) pg_malloc(sizeof(CState));
memset(state, 0, sizeof(CState));
+ state->retry_state.command = -1;
+ state->last_failure.command = -1;
while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
{
@@ -4133,6 +5215,7 @@ main(int argc, char **argv)
break;
case 'd':
debug++;
+ debug_fails++;
break;
case 'c':
benchmarking_option_set = true;
@@ -4185,7 +5268,7 @@ main(int argc, char **argv)
break;
case 'r':
benchmarking_option_set = true;
- is_latencies = true;
+ report_per_command = true;
break;
case 's':
scale_given = true;
@@ -4266,7 +5349,7 @@ main(int argc, char **argv)
}
*p++ = '\0';
- if (!putVariable(&state[0], "option", optarg, p))
+ if (!putVariable(&state[0].variables, "option", optarg, p))
exit(1);
}
break;
@@ -4374,6 +5457,19 @@ main(int argc, char **argv)
initialization_option_set = true;
foreign_keys = true;
break;
+ case 9: /* debug-fails */
+ debug_fails++;
+ break;
+ case 10: /* max-tries */
+ benchmarking_option_set = true;
+ max_tries = atoi(optarg);
+ if (max_tries <= 0)
+ {
+ fprintf(stderr, "invalid number of maximum tries: \"%s\"\n",
+ optarg);
+ exit(1);
+ }
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
@@ -4554,32 +5650,49 @@ main(int argc, char **argv)
state = (CState *) pg_realloc(state, sizeof(CState) * nclients);
memset(state + 1, 0, sizeof(CState) * (nclients - 1));
- /* copy any -D switch values to all clients */
for (i = 1; i < nclients; i++)
{
int j;
state[i].id = i;
- for (j = 0; j < state[0].nvariables; j++)
+
+ /* copy any -D switch values to all clients */
+ for (j = 0; j < state[0].variables.nvariables; j++)
{
- Variable *var = &state[0].variables[j];
+ Variable *var = &state[0].variables.array[j];
if (var->is_numeric)
{
- if (!putVariableNumber(&state[i], "startup",
- var->name, &var->num_value))
+ if (!putVariableNumber(&state[i].variables, "startup",
+ var->name, &var->num_value, true))
exit(1);
}
else
{
- if (!putVariable(&state[i], "startup",
+ if (!putVariable(&state[i].variables, "startup",
var->name, var->value))
exit(1);
}
}
+
+ /* set the retry state and the state of the last failure */
+ state[i].retry_state.command = -1;
+ state[i].last_failure.command = -1;
}
}
+ /* set random seed */
+ INSTR_TIME_SET_CURRENT(start_time);
+ srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
+
+ /* set random states for clients */
+ for (i = 0; i < nclients; i++)
+ {
+ state[i].random_state[0] = random();
+ state[i].random_state[1] = random();
+ state[i].random_state[2] = random();
+ }
+
if (debug)
{
if (duration <= 0)
@@ -4641,11 +5754,12 @@ main(int argc, char **argv)
* :scale variables normally get -s or database scale, but don't override
* an explicit -D switch
*/
- if (lookupVariable(&state[0], "scale") == NULL)
+ if (lookupVariable(&state[0].variables, "scale") == NULL)
{
for (i = 0; i < nclients; i++)
{
- if (!putVariableInt(&state[i], "startup", "scale", scale))
+ if (!putVariableInt(&state[i].variables, "startup", "scale", scale,
+ true))
exit(1);
}
}
@@ -4654,11 +5768,12 @@ main(int argc, char **argv)
* Define a :client_id variable that is unique per connection. But don't
* override an explicit -D switch.
*/
- if (lookupVariable(&state[0], "client_id") == NULL)
+ if (lookupVariable(&state[0].variables, "client_id") == NULL)
{
for (i = 0; i < nclients; i++)
{
- if (!putVariableInt(&state[i], "startup", "client_id", i))
+ if (!putVariableInt(&state[i].variables, "startup", "client_id", i,
+ true))
exit(1);
}
}
@@ -4680,10 +5795,6 @@ main(int argc, char **argv)
}
PQfinish(con);
- /* set random seed */
- INSTR_TIME_SET_CURRENT(start_time);
- srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
-
/* set up thread data structures */
threads = (TState *) pg_malloc(sizeof(TState) * nthreads);
nclients_dealt = 0;
@@ -4696,9 +5807,6 @@ main(int argc, char **argv)
thread->state = &state[nclients_dealt];
thread->nstate =
(nclients - nclients_dealt + nthreads - i - 1) / (nthreads - i);
- thread->random_state[0] = random();
- thread->random_state[1] = random();
- thread->random_state[2] = random();
thread->logfile = NULL; /* filled in later */
thread->latency_late = 0;
thread->zipf_cache.nb_cells = 0;
@@ -4780,6 +5888,10 @@ main(int argc, char **argv)
mergeSimpleStats(&stats.lag, &thread->stats.lag);
stats.cnt += thread->stats.cnt;
stats.skipped += thread->stats.skipped;
+ stats.retries += thread->stats.retries;
+ stats.retried += thread->stats.retried;
+ stats.errors += thread->stats.errors;
+ stats.errors_in_failed_tx += thread->stats.errors_in_failed_tx;
latency_late += thread->latency_late;
INSTR_TIME_ADD(conn_total_time, thread->conn_time);
}
@@ -4919,7 +6031,8 @@ threadRun(void *arg)
if (min_usec > this_usec)
min_usec = this_usec;
}
- else if (st->state == CSTATE_WAIT_RESULT)
+ else if (st->state == CSTATE_WAIT_RESULT ||
+ st->state == CSTATE_END_FAILED_TX_BLOCK_WAIT)
{
/*
* waiting for result from server - nothing to do unless the
@@ -5023,7 +6136,8 @@ threadRun(void *arg)
{
CState *st = &state[i];
- if (st->state == CSTATE_WAIT_RESULT)
+ if (st->state == CSTATE_WAIT_RESULT ||
+ st->state == CSTATE_END_FAILED_TX_BLOCK_WAIT)
{
/* don't call doCustom unless data is available */
int sock = PQsocket(st->con);
@@ -5065,7 +6179,11 @@ threadRun(void *arg)
/* generate and show report */
StatsData cur;
int64 run = now - last_report,
- ntx;
+ ntx,
+ retries,
+ retried,
+ errors,
+ errors_in_failed_tx;
double tps,
total_run,
latency,
@@ -5092,6 +6210,11 @@ threadRun(void *arg)
mergeSimpleStats(&cur.lag, &thread[i].stats.lag);
cur.cnt += thread[i].stats.cnt;
cur.skipped += thread[i].stats.skipped;
+ cur.retries += thread[i].stats.retries;
+ cur.retried += thread[i].stats.retried;
+ cur.errors += thread[i].stats.errors;
+ cur.errors_in_failed_tx +=
+ thread[i].stats.errors_in_failed_tx;
}
/* we count only actually executed transactions */
@@ -5109,6 +6232,11 @@ threadRun(void *arg)
{
latency = sqlat = stdev = lag = 0;
}
+ retries = cur.retries - last.retries;
+ retried = cur.retried - last.retried;
+ errors = cur.errors - last.errors;
+ errors_in_failed_tx = cur.errors_in_failed_tx -
+ last.errors_in_failed_tx;
if (progress_timestamp)
{
@@ -5134,6 +6262,14 @@ threadRun(void *arg)
"progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
tbuf, tps, latency, stdev);
+ if (errors > 0)
+ {
+ fprintf(stderr, ", " INT64_FORMAT " failed", errors);
+ if (errors_in_failed_tx > 0)
+ fprintf(stderr, " (" INT64_FORMAT " in failed tx)",
+ errors_in_failed_tx);
+ }
+
if (throttle_delay)
{
fprintf(stderr, ", lag %.3f ms", lag);
@@ -5141,6 +6277,12 @@ threadRun(void *arg)
fprintf(stderr, ", " INT64_FORMAT " skipped",
cur.skipped - last.skipped);
}
+
+ /* it can be non-zero only if max_tries is greater than one */
+ if (retried > 0)
+ fprintf(stderr, ", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
+ retried, retries);
+
fprintf(stderr, "\n");
last = cur;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index e3cdf28..68e0828 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -415,9 +415,11 @@ for my $e (@errors)
my $n = '001_pgbench_error_' . $name;
$n =~ s/ /_/g;
pgbench(
- '-n -t 1 -Dfoo=bla -M prepared',
+ '-n -t 1 -Dfoo=bla -M prepared --debug-fails',
$status,
- [ $status ? qr{^$} : qr{processed: 0/1} ],
+ ($status ?
+ [ qr{^$} ] :
+ [ qr{processed: 1/1}, qr{number of errors: 1 \(100\.000 %\)} ]),
$re,
'pgbench script error: ' . $name,
{ $n => $script });
diff --git a/src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl b/src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl
new file mode 100644
index 0000000..c77add0
--- /dev/null
+++ b/src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl
@@ -0,0 +1,828 @@
+use strict;
+use warnings;
+
+use Config;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 28;
+
+use constant
+{
+ READ_COMMITTED => 0,
+ REPEATABLE_READ => 1,
+ SERIALIZABLE => 2,
+};
+
+my @isolation_level_shell = (
+ 'read\\ committed',
+ 'repeatable\\ read',
+ 'serializable');
+
+# The keys of advisory locks for testing deadlock failures:
+use constant
+{
+ TRANSACTION_BEGINS => 4,
+ DEADLOCK_1 => 5,
+ WAIT_PGBENCH_2 => 6,
+ DEADLOCK_2 => 7,
+ TRANSACTION_ENDS => 8,
+};
+
+# Test concurrent update of a table row.
+my $node = get_new_node('main');
+$node->init;
+$node->start;
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2), (2, 3);');
+
+my $script_serialization = $node->basedir . '/pgbench_script_serialization';
+append_to_file($script_serialization,
+ "\\set delta random(-5000, 5000)\n"
+ . "BEGIN\\;"
+ . "UPDATE xy SET y = y + :delta "
+ . "WHERE x = 1 AND pg_advisory_lock(0) IS NOT NULL;\n"
+ . "END;\n"
+ . "SELECT pg_advisory_unlock_all();");
+
+my $script_serialization_compound =
+ $node->basedir . '/pgbench_script_serialization_compound';
+append_to_file($script_serialization_compound,
+ "\\set delta random(-5000, 5000)\n"
+ . "BEGIN\\;"
+ . "UPDATE xy SET y = y + :delta "
+ . "WHERE x = 1 AND pg_advisory_lock(0) IS NOT NULL\\;"
+ . "END;\n"
+ . "SELECT pg_advisory_unlock_all();");
+
+my $script_deadlocks1 = $node->basedir . '/pgbench_script_deadlocks1';
+append_to_file($script_deadlocks1,
+ "BEGIN;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "SELECT pg_advisory_lock(" . TRANSACTION_BEGINS . ");\n"
+ . "SELECT pg_advisory_unlock(" . TRANSACTION_BEGINS . ");\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_1 . ");\n"
+ . "SELECT pg_advisory_lock(" . WAIT_PGBENCH_2 . ");\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_2 . ");\n"
+ . "END;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "SELECT pg_advisory_lock(" . TRANSACTION_ENDS . ");\n"
+ . "SELECT pg_advisory_unlock_all();");
+
+my $script_deadlocks2 = $node->basedir . '/pgbench_script_deadlocks2';
+append_to_file($script_deadlocks2,
+ "BEGIN;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "SELECT pg_advisory_lock(" . TRANSACTION_BEGINS . ");\n"
+ . "SELECT pg_advisory_unlock(" . TRANSACTION_BEGINS . ");\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_2 . ");\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_1 . ");\n"
+ . "END;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "SELECT pg_advisory_lock(" . TRANSACTION_ENDS . ");\n"
+ . "SELECT pg_advisory_unlock_all();");
+
+my $script_commit_failure = $node->basedir . '/pgbench_script_commit_failure';
+append_to_file($script_commit_failure,
+ "\\set delta random(-5000, 5000)\n"
+ . "BEGIN\\;"
+ . "UPDATE xy SET y = y + :delta WHERE x = 1\\;"
+ . "SELECT pg_advisory_lock(0)\\;"
+ . "END;\n"
+ . "SELECT pg_advisory_unlock_all();");
+
+sub test_pgbench_serialization_errors
+{
+ my $isolation_level = REPEATABLE_READ;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $err_pgbench);
+
+ # Open a psql session, run a parallel transaction and acquire an advisory
+ # lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql = "begin;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /BEGIN/;
+
+ $in_psql =
+ "update xy set y = y + 1 "
+ . "where x = 1 and pg_advisory_lock(0) is not null;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /UPDATE 1/;
+
+ # Start pgbench:
+ my @command = (
+ qw(pgbench --no-vacuum --transactions 1 --debug-fails --file),
+ $script_serialization);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$err_pgbench;
+
+ # Wait until pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select * from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = 0::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /1 row/);
+
+ # In psql, commit the transaction, release advisory locks and end the
+ # session:
+ $in_psql = "end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "select pg_advisory_unlock_all();\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+
+ like($out_pgbench,
+ qr{processed: 1/1},
+ "concurrent update: check processed transactions");
+
+ like($out_pgbench,
+ qr{number of errors: 1 \(100\.000 %\)},
+ "concurrent update: check errors");
+
+ my $pattern =
+ "client 0 got a serialization error in command 1 \\(subcommand 1\\) of script 0:\n"
+ . "ERROR: could not serialize access due to concurrent update\n";
+
+ like($err_pgbench,
+ qr{$pattern},
+ "concurrent update: check serialization error");
+}
+
+sub test_pgbench_serialization_errors_and_failures
+{
+ my ($script, $name, $error) = @_;
+
+ my $isolation_level = REPEATABLE_READ;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $err_pgbench);
+
+ # Open a psql session, run a parallel transaction and acquire an advisory
+ # lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql = "begin;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /BEGIN/;
+
+ $in_psql =
+ "update xy set y = y + 1 "
+ . "where x = 1 and pg_advisory_lock(0) is not null;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /UPDATE 1/;
+
+ # Start pgbench:
+ my @command = (
+ qw(pgbench --no-vacuum --transactions 1),
+ $error ? "--debug-fails" : "--debug",
+ qw(--max-tries 2 --file),
+ $script);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$err_pgbench;
+
+ # Wait until pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select * from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = 0::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /1 row/);
+
+ # In psql, commit the transaction, release advisory locks and end the
+ # session:
+ $in_psql = "end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "select pg_advisory_unlock_all();\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+
+ like($out_pgbench,
+ qr{processed: 1/1},
+ $name . ": check processed transactions");
+
+ if ($error)
+ {
+ like($out_pgbench,
+ qr{number of errors: 1 \(100\.000 %\)},
+ $name . ": check errors");
+
+ my $pattern =
+ "client 0 got a serialization error in command 1 \\(subcommand 1\\) of script 0:\n"
+ . "ERROR: could not serialize access due to concurrent update\n";
+
+ like($err_pgbench,
+ qr{$pattern},
+ $name . ": check serialization error");
+ }
+ else
+ {
+ like($out_pgbench,
+ qr{^((?!number of errors)(.|\n))*$},
+ "concurrent update with retrying: check errors");
+
+ my $pattern =
+ "client 0 sending BEGIN;UPDATE xy SET y = y \\+ (-?\\d+) "
+ . "WHERE x = 1 AND pg_advisory_lock\\(0\\) IS NOT NULL;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 got a serialization failure \\(try 1/2\\) in command 1 \\(subcommand 1\\) of script 0:\n"
+ . "ERROR: could not serialize access due to concurrent update\n"
+ . "client 0 sending END;\n"
+ . "\\g2+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 sending BEGIN;UPDATE xy SET y = y \\+ \\g1 "
+ . "WHERE x = 1 AND pg_advisory_lock\\(0\\) IS NOT NULL;";
+
+ like($err_pgbench,
+ qr{$pattern},
+ $name . ": check the retried transaction");
+ }
+}
+
+sub test_pgbench_deadlock_errors
+{
+ my $isolation_level = READ_COMMITTED;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h1, $in1, $out1, $err1);
+ my ($h2, $in2, $out2, $err2);
+
+ # Open a psql session and acquire an advisory lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql =
+ "select pg_advisory_lock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_lock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ # Run the first pgbench:
+ my @command1 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug --file),
+ $script_deadlocks1);
+ print "# Running: " . join(" ", @command1) . "\n";
+ $h1 = IPC::Run::start \@command1, \$in1, \$out1, \$err1;
+
+ # Wait until the first pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . WAIT_PGBENCH_2 . "_zero' "
+ . "else '" . WAIT_PGBENCH_2 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . WAIT_PGBENCH_2
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ WAIT_PGBENCH_2 ]}_not_zero/);
+
+ # Run the second pgbench:
+ my @command2 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug --file),
+ $script_deadlocks2);
+ print "# Running: " . join(" ", @command2) . "\n";
+ $h2 = IPC::Run::start \@command2, \$in2, \$out2, \$err2;
+
+ # Wait until the second pgbench tries to acquire the lock held by the first
+ # pgbench:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . DEADLOCK_1 . "_zero' "
+ . "else '" . DEADLOCK_1 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . DEADLOCK_1
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ DEADLOCK_1 ]}_not_zero/);
+
+ # In the psql session, release the lock that the first pgbench is waiting
+ # for and end the session:
+ $in_psql =
+ "select pg_advisory_unlock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_unlock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get results from all pgbenches:
+ $h1->pump() until length $out1;
+ $h1->finish();
+
+ $h2->pump() until length $out2;
+ $h2->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result1 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h1->full_results)[0]
+ : $h1->result(0);
+
+ my $result2 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h2->full_results)[0]
+ : $h2->result(0);
+
+ # Check all pgbench results
+ ok(!$result1, "@command1 exit code 0");
+ ok(!$result2, "@command2 exit code 0");
+
+ like($out1,
+ qr{processed: 1/1},
+ "concurrent deadlock update: pgbench 1: check processed transactions");
+ like($out2,
+ qr{processed: 1/1},
+ "concurrent deadlock update: pgbench 2: check processed transactions");
+
+ # The first or second pgbench should get a deadlock error
+ like($out1 . $out2,
+ qr{number of errors: 1 \(100\.000 %\)},
+ "concurrent deadlock update: check errors");
+
+ ok(
+ ($err1 =~ /client 0 got a deadlock error in command 6/ or
+ $err2 =~ /client 0 got a deadlock error in command 5/),
+ "concurrent deadlock update: check deadlock error");
+}
+
+sub test_pgbench_deadlock_failures
+{
+ my $isolation_level = READ_COMMITTED;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h1, $in1, $out1, $err1);
+ my ($h2, $in2, $out2, $err2);
+
+ # Open a psql session and acquire an advisory lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql =
+ "select pg_advisory_lock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_lock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ # Run the first pgbench:
+ my @command1 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug --max-tries 2 --file),
+ $script_deadlocks1);
+ print "# Running: " . join(" ", @command1) . "\n";
+ $h1 = IPC::Run::start \@command1, \$in1, \$out1, \$err1;
+
+ # Wait until the first pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . WAIT_PGBENCH_2 . "_zero' "
+ . "else '" . WAIT_PGBENCH_2 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . WAIT_PGBENCH_2
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ WAIT_PGBENCH_2 ]}_not_zero/);
+
+ # Run the second pgbench:
+ my @command2 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug --max-tries 2 --file),
+ $script_deadlocks2);
+ print "# Running: " . join(" ", @command2) . "\n";
+ $h2 = IPC::Run::start \@command2, \$in2, \$out2, \$err2;
+
+ # Wait until the second pgbench tries to acquire the lock held by the first
+ # pgbench:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . DEADLOCK_1 . "_zero' "
+ . "else '" . DEADLOCK_1 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . DEADLOCK_1
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ DEADLOCK_1 ]}_not_zero/);
+
+ # In the psql session, acquire the locks which pgbenches will wait for:
+ # - pgbench with a deadlock failure will wait for the lock after the start
+ # of its retried transaction
+ # - pgbench without a deadlock failure will wait for the lock after the end
+ # of its successful transaction
+
+ $in_psql =
+ "select pg_advisory_lock(" . TRANSACTION_BEGINS . ") "
+ . "as pg_advisory_lock_" . TRANSACTION_BEGINS . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ TRANSACTION_BEGINS ]}/;
+
+ $in_psql =
+ "select pg_advisory_lock(" . TRANSACTION_ENDS . ") "
+ . "as pg_advisory_lock_" . TRANSACTION_ENDS . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ TRANSACTION_ENDS ]}/;
+
+ # In the psql session, release the lock that the first pgbench is waiting
+ # for:
+ $in_psql =
+ "select pg_advisory_unlock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_unlock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ # Wait until pgbenches try to acquire the locks held by the psql session:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . TRANSACTION_BEGINS . "_zero' "
+ . "else '" . TRANSACTION_BEGINS . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . TRANSACTION_BEGINS
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ TRANSACTION_BEGINS ]}_not_zero/);
+
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . TRANSACTION_ENDS . "_zero' "
+ . "else '" . TRANSACTION_ENDS . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . TRANSACTION_ENDS
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ TRANSACTION_ENDS ]}_not_zero/);
+
+ # In the psql session, release advisory locks and end the session:
+ $in_psql = "select pg_advisory_unlock_all() as pg_advisory_unlock_all;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get results from all pgbenches:
+ $h1->pump() until length $out1;
+ $h1->finish();
+
+ $h2->pump() until length $out2;
+ $h2->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result1 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h1->full_results)[0]
+ : $h1->result(0);
+
+ my $result2 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h2->full_results)[0]
+ : $h2->result(0);
+
+ # Check all pgbench results
+ ok(!$result1, "@command1 exit code 0");
+ ok(!$result2, "@command2 exit code 0");
+
+ like($out1,
+ qr{processed: 1/1},
+ "concurrent deadlock update with retrying: pgbench 1: "
+ . "check processed transactions");
+ like($out2,
+ qr{processed: 1/1},
+ "concurrent deadlock update with retrying: pgbench 2: "
+ . "check processed transactions");
+
+ # The first or second pgbench should get a deadlock failure which is retried:
+ like($out1 . $out2,
+ qr{^((?!number of errors)(.|\n))*$},
+ "concurrent deadlock update with retrying: check errors");
+
+ my $pattern1 =
+ "client 0 sending SELECT pg_advisory_lock\\((\\d)\\);\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . WAIT_PGBENCH_2 . "\\);\n"
+ . "\\g2+"
+ . "client 0 sending SELECT pg_advisory_lock\\((\\d)\\);\n"
+ . "\\g2+"
+ . "client 0 got a deadlock failure \\(try 1/2\\) in command 6 of script 0:\n"
+ . "ERROR: deadlock detected\n"
+ . "((?!client 0)(.|\n))*"
+ . "client 0 sending END;\n"
+ . "\\g2+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g2+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g2+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_BEGINS . "\\);\n"
+ . "\\g2+"
+ . "client 0 sending SELECT pg_advisory_unlock\\(" . TRANSACTION_BEGINS . "\\);\n"
+ . "\\g2+"
+ . "client 0 sending SELECT pg_advisory_lock\\(\\g1\\);\n"
+ . "\\g2+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . WAIT_PGBENCH_2 . "\\);\n"
+ . "\\g2+"
+ . "client 0 sending SELECT pg_advisory_lock\\(\\g3\\);\n";
+
+ my $pattern2 =
+ "client 0 sending SELECT pg_advisory_lock\\((\\d)\\);\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 sending SELECT pg_advisory_lock\\((\\d)\\);\n"
+ . "\\g2+"
+ . "client 0 got a deadlock failure \\(try 1/2\\) in command 5 of script 0:\n"
+ . "ERROR: deadlock detected\n"
+ . "((?!client 0)(.|\n))*"
+ . "client 0 sending END;\n"
+ . "\\g2+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g2+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g2+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_BEGINS . "\\);\n"
+ . "\\g2+"
+ . "client 0 sending SELECT pg_advisory_unlock\\(" . TRANSACTION_BEGINS . "\\);\n"
+ . "\\g2+"
+ . "client 0 sending SELECT pg_advisory_lock\\(\\g1\\);\n"
+ . "\\g2+"
+ . "client 0 sending SELECT pg_advisory_lock\\(\\g3\\);\n";
+
+ ok(($err1 =~ /$pattern1/ or $err2 =~ /$pattern2/),
+ "concurrent deadlock update with retrying: "
+ . "check the retried transaction");
+}
+
+sub test_pgbench_commit_failure
+{
+ my $isolation_level = SERIALIZABLE;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $err_pgbench);
+
+ # Open a psql session and acquire an advisory lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql = "select pg_advisory_lock(0);\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock/;
+
+ # Start pgbench:
+ my @command = (
+ qw(pgbench --no-vacuum --transactions 1 --debug --max-tries 2 --file),
+ $script_commit_failure);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$err_pgbench;
+
+ # Wait until pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select case count(*) when 0 then 'zero' else 'not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = 0::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /not_zero/);
+
+ # In psql, run a parallel transaction, release advisory locks and end the
+ # session:
+
+ $in_psql = "begin\;update xy set y = y + 1 where x = 2\;end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "select pg_advisory_unlock_all();\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+
+ like($out_pgbench,
+ qr{processed: 1/1},
+ "commit failure: check processed transactions");
+
+ like($out_pgbench,
+ qr{^((?!number of errors)(.|\n))*$},
+ "commit failure: check errors");
+
+ my $pattern =
+ "client 0 sending BEGIN;"
+ . "UPDATE xy SET y = y \\+ (-?\\d+) WHERE x = 1;"
+ . "SELECT pg_advisory_lock\\(0\\);"
+ . "END;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 got a serialization failure \\(try 1/2\\) in command 1 \\(subcommand 3\\) of script 0:\n"
+ . "ERROR: could not serialize access due to read/write dependencies among transactions\n"
+ . "DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.\n"
+ . "((?!client 0)(.|\n))*"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 sending BEGIN;"
+ . "UPDATE xy SET y = y \\+ \\g1 WHERE x = 1;"
+ . "SELECT pg_advisory_lock\\(0\\);"
+ . "END;\n";
+
+ like($err_pgbench,
+ qr{$pattern},
+ "commit failure: "
+ . "check the completion of the failed transaction block");
+}
+
+test_pgbench_serialization_errors();
+
+test_pgbench_serialization_errors_and_failures(
+ $script_serialization,
+ "concurrent update with retrying",
+ 0);
+# not all compound commands can be retried
+test_pgbench_serialization_errors_and_failures(
+ $script_serialization_compound,
+ "concurrent update with a transaction block in a compound command",
+ 1);
+
+test_pgbench_deadlock_errors();
+test_pgbench_deadlock_failures();
+
+test_pgbench_commit_failure();
--
2.7.4
Hello Marina,
This is the fourth version of the patch for pgbench.
Consider adding the patch to the next commitfest?
--
Fabien.
On 19-12-2017 16:52, Fabien COELHO wrote:
Hello Marina,
This is the fourth version of the patch for pgbench.
Consider adding the patch to the next commitfest?
Hi! Yes, here it is: https://commitfest.postgresql.org/16/1420/
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Consider adding the patch to the next commitfest?
Hi! Yes, here it is: https://commitfest.postgresql.org/16/1420/
I think you may have to ask the cf app admin to remove the duplicate.
https://commitfest.postgresql.org/16/1419/
--
Fabien.
On 19-12-2017 17:11, Fabien COELHO wrote:
Consider adding the patch to the next commitfest?
Hi! Yes, here it is: https://commitfest.postgresql.org/16/1420/
I think you may have to ask the cf app admin to remove the duplicate.
Thanks, I'm trying to do this..
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 19-12-2017 17:22, Marina Polyakova wrote:
On 19-12-2017 17:11, Fabien COELHO wrote:
Consider adding the patch to the next commitfest?
Hi! Yes, here it is: https://commitfest.postgresql.org/16/1420/
I think you may have to ask the cf app admin to remove the duplicate.
Thanks, I'm trying to do this..
The second one was removed, so
https://commitfest.postgresql.org/16/1419/ is a working one.
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Marina,
Some comment about WIP pgbench error handling v4.
Patch applies, compiles, global & local "make check" are ok. doc compiles.
I'm generally okay with having such a feature, but I'd like it to be
*MUCH* simpler, otherwise it is going to make pgbench unmaintainable:-(
Also, ISTM that a new patch should address somehow (in the code or with an
explanation of why you disagree) all comments from the previous review,
which is not really the case here, as I have to repeat some of the
comments I made on v3. You should reply to the previous comment mail and
tag all comments with "ok", "no because this or that", "done", "postponed
because this or that...".
About the documentation:
Again, as already said in the previous review, please take comments into
account or justify why you do not do so. I do not think that this feature
should be given any pre-eminence: most of the time performance testing is
about all-is-well transactions which do not display any error. I do not claim
that it is not a useful feature; on the contrary, I do think that testing
under error conditions is a good capability, but I just insist that it is an
on-the-side feature and should not be put forward. As a consequence, the
"maximum number of transaction tries" should certainly not be the default
first output of a summary run.
I'm unclear about the example added in the documentation. There
are 71% errors, but 100% of transactions are reported as processed. If
there were errors, then it is not a success, so the transactions were not
processed? To me it looks inconsistent. Also, while testing, it seems that
failed transactions are counted in tps, which I think is not appropriate:
About the feature:
sh> PGOPTIONS='-c default_transaction_isolation=serializable' \
./pgbench -P 1 -T 3 -r -M prepared -j 2 -c 4
starting vacuum...end.
progress: 1.0 s, 10845.8 tps, lat 0.091 ms stddev 0.491, 10474 failed
# NOT 10845.8 TPS...
progress: 2.0 s, 10534.6 tps, lat 0.094 ms stddev 0.658, 10203 failed
progress: 3.0 s, 10643.4 tps, lat 0.096 ms stddev 0.568, 10290 failed
...
number of transactions actually processed: 32028 # NO!
number of errors: 30969 (96.694 %)
latency average = 2.833 ms
latency stddev = 1.508 ms
tps = 10666.720870 (including connections establishing) # NO
tps = 10683.034369 (excluding connections establishing) # NO
...
For me this is all wrong. I think that the tps report is about transactions
that succeeded, not mere attempts. I cannot say that a transaction which aborted
was "actually processed"... as it was not.
The order of reported elements is not logical:
maximum number of transaction tries: 100
scaling factor: 10
query mode: prepared
number of clients: 4
number of threads: 2
duration: 3 s
number of transactions actually processed: 967
number of errors: 152 (15.719 %)
latency average = 9.630 ms
latency stddev = 13.366 ms
number of transactions retried: 623 (64.426 %)
number of retries: 32272
I would suggest to group everything about error handling in one block,
eg something like:
scaling factor: 10
query mode: prepared
number of clients: 4
number of threads: 2
duration: 3 s
number of transactions actually processed: 967
number of errors: 152 (15.719 %)
number of transactions retried: 623 (64.426 %)
number of retries: 32272
maximum number of transaction tries: 100
latency average = 9.630 ms
latency stddev = 13.366 ms
Also, the percent character should be stuck to its number: 15.719% would make
the style more homogeneous (although there seem to be pre-existing
inhomogeneities).
I would replace "transaction tries/retried" by "tries/retried", everything
is about transactions in the report anyway.
Without reading the documentation, the overall report semantics is unclear,
especially given the absurd tps results I got with my first attempt,
as failing transactions are counted as "processed".
For the detailed report, ISTM that I already said in the previous review that
the 22 columns (?) indentation is much too large:
0.078 149 30886 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
When giving the number of retries, I'd like to also have the average number of
retries per attempted transaction, whether they succeeded or failed in the end.
About the code:
I'm at a loss with the 7 states added to the automaton, where I would have
hoped that only 2 (eg RETRY & FAIL, or even less) would be enough.
I would hope for something like "if (some error) state = ERROR",
then in "ERROR" do the stats, check whether it should be retried, and if
so state = START... and we are done.
I'm wondering whether the whole feature could be simplified by considering
that one script is one "transaction" (it is from the report point of view
at least), and that any retry is for the full script only, from its
beginning. That would remove the trying to guess at transactions begin or
end, avoid scanning manually for subcommands, and so on.
- Would it make sense?
- Would it be ok for your use case?
The proposed version of the code looks unmaintainable to me. There are
3 levels of nested "switch/case" with state changes at the deepest level.
I cannot even see it on my screen which is not wide enough.
There should be a typedef for "random_state", eg something like:
typedef struct { unsigned short data[3]; } RandomState;
Please keep "const" declarations, eg "commandFailed".
I think that choosing script should depend on the thread random state, not
the client random state, so that a run would generate the same pattern per
thread, independently of which client finishes first.
I'm sceptical of the "--debug-fails" option. ISTM that --debug is already
there and should just be reused. Also, you have changed some existing
unrelated error messages with this option, especially in evalFunc, which is
clearly not appropriate:
- fprintf(stderr, "random range is too large\n");
+ if (debug_fails)
+ fprintf(stderr, "random range is too large\n");
Please leave unrelated code as is.
I agree that function naming style is already a mess, but I think that
new functions you add should use a common style, eg "is_compound" vs
"canRetry".
Translating error strings to their enum should be put in a function.
I do not believe in the normalize_whitespace function: ISTM that it
would turn "SELECT LENGTH('   ');" into "SELECT LENGTH(' ');", which
is not desirable. I do not think that it should be needed.
I do not understand the "get subcommands of a compound command" strange
re-parsing phase. There should be comments explaining what you want to
achieve. I'm not sure about what happens if several begin/end appear
on the line. I'm not sure this whole thing should be done anyway.
About the tests:
The 828-line perl script seems over-complicated, with pgbench & psql
interacting together... Is all that really necessary? Isn't some (much)
simpler test possible, and wouldn't it be sufficient?
The "node" is started but never stopped.
For file contents, maybe the << 'EOF' here-document syntax would help instead
of using concatenated backslashed strings everywhere.
--
Fabien.
On 03-01-2018 19:28, Fabien COELHO wrote:
Hello Marina,
Hello!
Some comment about WIP pgbench error handling v4.
Many thanks!
Patch applies, compiles, global & local "make check" are ok. doc compiles.
I'm generally okay with having such a feature, but I'd like it to be
*MUCH* simpler, otherwise it is going to make pgbench unmaintainable:-(
I understand your concern about this, but what exactly do you want to
simplify (except for compound commands, the option --debug-fails and
processing the end of the failed transaction block, because you discuss
them later)?
Also, ISTM that a new patch should address somehow (in the code or
with explanation about why you disagree) all comments from the
previous review, which is not really the case here, as I have to
repeat some of the comments I did on v3. You should answer to the
previous comment mail and tag all comments with "ok", "no because this
or that", "done", "postponed because this or that...".
1) You can read my answer to your previous review [1]:
- if there's a short answer like "Ok!", "Thanks!", "Sorry, thanks",
"Thanks, I agree" or "Thanks, I'll fix it", it means "done";
- if there's an open question, it means that it is postponed because I
don't know exactly how to do it.
2) The second note of the --max-tries documentation. I wrote you a
suggestion in [1] and included it in the patch because you did not
answer me about it:
I do not understand the second note of the --max-tries documentation.
It seems to suggest that some script may not end their own transaction...
which should be an error in my opinion? Some explanations would be welcome.
As you told me here [1], "I disagree about exit in ParseScript if the
transaction block is not completed <...> and would break an existing
feature.". Maybe it would be better to say this:
In pgbench you can use scripts in which the transaction blocks do not
end. Be careful in this case because transactions that span over more
than one script are not rolled back and will not be retried in case of
an error. In such cases, the script in which the error occurred is
reported as failed.
?
3) About the tests. I wrote to you why they are so heavy in [1]:
The tap tests seem over-complicated and heavy with two pgbench runs in
parallel... I'm not sure we really want all that complexity for this
somehow small feature. Moreover pgbench can run several scripts, I'm not
sure why two pgbench would need to be invoked. Could something much
simpler and lighter be proposed instead to test the feature?
Firstly, two pgbench instances need to be invoked because we don't know
which of them will get a deadlock failure. Secondly, I tried much simpler
tests but all of them failed sometimes although everything was ok:
- tests in which pgbench runs 5 clients and 10 transactions per client
for a serialization/deadlock failure on any client (sometimes there are
no failures when it is expected that they will be);
- tests in which pgbench runs 30 clients and 400 transactions per client
for a serialization/deadlock failure on any client (sometimes there are
no failures when it is expected that they will be);
- tests in which the psql session starts concurrently and you use sleep
commands to wait for pgbench for 10 seconds (sometimes it does not work).
Only advisory locks help me not to get such errors in the tests :(
Also, as I wrote in [2], they became lighter (28 simple tests instead of 57,
8 runs of pgbench instead of 17).
About the documentation:
Again, as already said in the previous review, please take into account
comments or justify why you do not do so, I do not think that this feature
should be given any pre-eminence: most of the time performance testing is
about all-is-well transactions which do not display any error. I do not
claim that it is not a useful feature, on the contrary I do think that
testing under error conditions is a good capability, but I just insist that
it is an on-the-side feature that should not be put forward. As a
consequence, the "maximum number of transaction tries" should certainly not
be the default first output of a summary run.
I agree with you that there should be no pre-eminence for my feature.
Here are my reasons for the order in the reports (except for the
"maximum number of transaction tries", because you discuss it later):
1) Per-statement report: errors and retries are printed at the end.
2) Other reports (main report, per-script report, progress report,
transaction logging, aggregation logging):
- errors are printed before optional results because you don't need any
options for getting errors;
- retries are printed at the end because they need the option
--max-tries.
I'm unclear about the example added in the documentation. There
are 71% errors, but 100% of transactions are reported as processed. If
there were errors, then it is not a success, so the transactions were not
processed? To me it looks inconsistent. Also, while testing, it seems that
failed transactions are counted in tps, which I think is not appropriate:
About the feature:
sh> PGOPTIONS='-c default_transaction_isolation=serializable' \
./pgbench -P 1 -T 3 -r -M prepared -j 2 -c 4
starting vacuum...end.
progress: 1.0 s, 10845.8 tps, lat 0.091 ms stddev 0.491, 10474 failed
# NOT 10845.8 TPS...
progress: 2.0 s, 10534.6 tps, lat 0.094 ms stddev 0.658, 10203 failed
progress: 3.0 s, 10643.4 tps, lat 0.096 ms stddev 0.568, 10290 failed
...
number of transactions actually processed: 32028 # NO!
number of errors: 30969 (96.694 %)
latency average = 2.833 ms
latency stddev = 1.508 ms
tps = 10666.720870 (including connections establishing) # NO
tps = 10683.034369 (excluding connections establishing) # NO
...For me this is all wrong. I think that the tps report is about
transactions
that succeeded, not mere attempts. I cannot say that a transaction
which aborted
was "actually processed"... as it was not.
Thank you, I agree and I will fix it.
The order of reported elements is not logical:
maximum number of transaction tries: 100
scaling factor: 10
query mode: prepared
number of clients: 4
number of threads: 2
duration: 3 s
number of transactions actually processed: 967
number of errors: 152 (15.719 %)
latency average = 9.630 ms
latency stddev = 13.366 ms
number of transactions retried: 623 (64.426 %)
number of retries: 32272
I would suggest to group everything about error handling in one block,
eg something like:
scaling factor: 10
query mode: prepared
number of clients: 4
number of threads: 2
duration: 3 s
number of transactions actually processed: 967
number of errors: 152 (15.719 %)
number of transactions retried: 623 (64.426 %)
number of retries: 32272
maximum number of transaction tries: 100
latency average = 9.630 ms
latency stddev = 13.366 ms
Do you think it is ok if only the errors and the maximum number of
transaction tries are printed (because retries are printed only if they
are non-zero)? We can get something like this:
scaling factor: 10
query mode: prepared
number of clients: 4
number of threads: 2
duration: 3 s
number of transactions actually processed: 967
number of errors: 152 (15.719 %)
maximum number of transaction tries: 1 # default value
latency average = 9.630 ms
latency stddev = 13.366 ms
or this:
scaling factor: 10
query mode: prepared
number of clients: 4
number of threads: 2
duration: 3 s
number of transactions actually processed: 967
number of errors: 152 (15.719 %)
maximum number of transaction tries: 3 # not the default value
latency average = 9.630 ms
latency stddev = 13.366 ms
Also, percent character should be stuck to its number: 15.719% to have
the style more homogeneous (although there seems to be pre-existing
inhomogeneities).
I would replace "transaction tries/retried" by "tries/retried", everything
is about transactions in the report anyway.
Without reading the documentation, the overall report semantics is unclear,
especially given the absurd tps results I got with my first attempt,
as failing transactions are counted as "processed".
I agree and I will fix it.
For the detailed report, ISTM that I already said in the previous review
that the 22 columns (?) indentation is much too large:
0.078 149 30886 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
You did not answer me in [1] about this:
Moreover the 25 characters alignment is ugly, better use a much smaller
alignment.
The variables for the numbers of failures and retries are of type int64
since the variable for the total number of transactions has the same
type. That's why such a large alignment (as I understand it now, 20
characters are enough). Do you prefer floating alignments, depending on
the maximum number of failures/retries for any command in any script?
When giving the number of retries, I'd like to also have the average
number of retries per attempted transaction, whether they succeeded or
failed in the end.
Is this not the number of retries divided by the number of attempted
transactions (this is now printed in the main report)? Or do you mean
the average number of retried transactions per script run?
About the code:
I'm at a loss with the 7 states added to the automaton, where I would
have hoped that only 2 (eg RETRY & FAIL, or even less) would be enough.
Well, I will try to simplify it. As I can see now, there should be a
separate code path if we end a failed transaction block:
- if we try to end a failed transaction block and a failure occurs, this
failure cannot be retried in any case;
- if we cannot retry a failure and there was a failed transaction block,
we must go to the next command after it, and not to the next command
after the failure.
I would hope for something like "if (some error) state = ERROR",
then in "ERROR" do the stats, check whether it should be retried, and if
so state = START... and we are done.
We discussed this question in [3] and [4]:
I was wondering whether the RETRY & FAILURE states could/should be merged:
on RETRY:
-> count retry
-> actually retry if < max_tries (reset client state, jump to command)
-> else count failure and skip to end of script
The start and end of transaction detection seem expensive (malloc, ...)
and assume one statement per command (what about "BEGIN \; ... \; COMMIT;"),
which is not necessarily the case; this limitation should be documented.
ISTM that the space normalization should be avoided, and something
simpler/lighter should be devised? Possibly it should consider handling
SAVEPOINT.
I divided these states because if there's a failed transaction block you
should end it before retrying. It means to go to states
CSTATE_START_COMMAND -> CSTATE_WAIT_RESULT -> CSTATE_END_COMMAND with
the appropriate command. How do you propose not to go to these states?
Hmmm. Maybe I'm wrong. I'll think about it.
I'm wondering whether the whole feature could be simplified by
considering that one script is one "transaction" (it is from the
report point of view at least), and that any retry is for the full
script only, from its beginning. That would remove the trying to guess
at transactions begin or end, avoid scanning manually for subcommands,
and so on.
- Would it make sense?
- Would it be ok for your use case?
I'm not sure if this makes sense: if we retry the script from the very
beginning including successful transactions, there may be errors..
The proposed version of the code looks unmaintainable to me. There are
3 levels of nested "switch/case" with state changes at the deepest level.
I cannot even see it on my screen, which is not wide enough.
I understand your concern, although the 80-column limit breaks only for
long strings for printf functions. I will try to simplify this by
transferring the 2nd and the 3rd levels of nested "switch/case" into a
separate function.
There should be a typedef for "random_state", eg something like:
typedef struct { unsigned short data[3]; } RandomState;
Please keep "const" declarations, eg "commandFailed".
I think that choosing script should depend on the thread random state, not
the client random state, so that a run would generate the same pattern per
thread, independently of which client finishes first.
I'm sceptical of the "--debug-fails" option. ISTM that --debug is already
there and should just be reused.
Thanks, I agree with you and I'll fix this.
(Therefore I assume that both the thread state and the client state will
have a random state.)
How do you like the idea of creating several levels for debugging?
Also, you have changed some existing unrelated error messages with this
option, especially in evalFunc, which is clearly not appropriate:
- fprintf(stderr, "random range is too large\n");
+ if (debug_fails)
+ fprintf(stderr, "random range is too large\n");
Please leave unrelated code as is.
I suppose this code is related. If we do not change this, a program run
that does not use debugging options can receive many error messages
without aborting any clients. Also, it will be very difficult to get
reports of progress among all this.
I agree that function naming style is already a mess, but I think that
new functions you add should use a common style, eg "is_compound" vs
"canRetry".
Translating error strings to their enum should be put in a function.
Ok, I will fix it.
I do not believe in the normalize_whitespace function: ISTM that it
would turn "SELECT LENGTH('   ');" into "SELECT LENGTH(' ');", which
is not desirable. I do not think that it should be needed.
This function does not change the commands that are passed to the
server; this just simplifies the function is_tx_block_end and makes the
latter more understandable.
I do not understand the "get subcommands of a compound command" strange
re-parsing phase. There should be comments explaining what you want to
achieve.
Well, I will write these comments. My reasons are as follows:
1) I need to know the number of non-empty subcommands for processing the
command in CSTATE_WAIT_RESULT;
2) I need this parser to correctly distinguish subcommands in such cases
as "SELECT ';';";
3) for any non-empty subcommand I need to know its initial sequence
number, it is used to report errors/failures;
4) if the compound command contains only one non-empty subcommand (for
example, the compound command ';;END;;'), it can be retried as a usual
command;
5) retrying of failures in the compound command depends on its
subcommands which start or complete the transaction block.
I'm not sure about what happens if several begin/end appear on the line.
Maybe this section of the documentation [5] will help you?
I'm not sure this whole thing should be done anyway.
Well, I will simplify it so the compound commands will not be retried in
any case; but even now the parser is needed for reasons 1, 2 and 3 in
the list above ("My reasons are as follows: ...").
About the tests:
The 828-line perl script seems over-complicated, with pgbench & psql
interacting together... Is all that really necessary? Isn't some (much)
simpler test possible and sufficient?
I wrote to you about this at the beginning of this letter ("3) About the
tests. ...").
The "node" is started but never stopped.
Thank you, I will fix it.
For file contents, maybe the << 'EOF' here-document syntax would help
instead
of using concatenated backslashed strings everywhere.
Thanks, but I'm not sure that it is better to open file handlers for
printing in variables..
[1]: /messages/by-id/e4c5e8cefa4a8e88f1273b0f1ee29e56@postgrespro.ru
[2]: /messages/by-id/894ac29c18f12a4fd6f8c97c9123a152@postgrespro.ru
[3]: /messages/by-id/6ce8613f061001105654673506348c13@postgrespro.ru
[4]: /messages/by-id/alpine.DEB.2.20.1707131818200.20175@lancre
[5]: https://www.postgresql.org/docs/devel/static/protocol-flow.html#PROTOCOL-FLOW-MULTI-STATEMENT
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Marina,
A partial answer, to focus on how the feature may be simplified.
I apologise, as it seems that I overlooked one of your mails. Changing the
thread has not helped.
I'm wondering whether the whole feature could be simplified by
considering that one script is one "transaction" (it is from the
report point of view at least), and that any retry is for the full
script only, from its beginning. That would remove the trying to guess
at transactions begin or end, avoid scanning manually for subcommands,
and so on.
- Would it make sense?
- Would it be ok for your use case?
I'm not sure if this makes sense: if we retry the script from the very
beginning including successful transactions, there may be errors..
What I'm suggesting is to simplify the problem by adding some restrictions
on what kind of case is handled, so as to simplify the code a lot.
I'd start by stating (i.e. documenting) that the feature assumes that one
script is just *one* transaction.
Note that pgbench somehow already assumes that one script is one
transaction when it reports performance anyway.
If you want 2 transactions, then you have to put them in two scripts,
which looks fine with me. Different transactions are expected to be
independent, otherwise they should be merged into one transaction.
Under these restrictions, ISTM that a retry is something like:
case ABORTED:
if (we want to retry) {
// do necessary stats
// reset the initial state (random, vars, current command)
state = START_TX; // loop
}
else {
// count as failed...
state = FINISHED; // or done.
}
break;
Once this works, maybe it could go a step further by restarting at
savepoints. I'd put restrictions there to ease detecting a savepoint so
that it cannot occur in a compound command for instance. This would
probably require a new state. Fine.
A detail:
For file contents, maybe the << 'EOF' here-document syntax would help
instead of using concatenated backslashed strings everywhere.
Thanks, but I'm not sure that it is better to open file handlers for
printing in variables..
Perl here-document stuff is just a multiline string, no file is involved;
it is different from the shell.
--
Fabien.
On 12-01-2018 15:47, Fabien COELHO wrote:
Hello Marina,
A partial answer, to focus on how the feature may be simplified.
Thank you!
I apologise, as it seems that I overlooked one of your mails. Changing
the thread has not helped.
I'm wondering whether the whole feature could be simplified by
considering that one script is one "transaction" (it is from the
report point of view at least), and that any retry is for the full
script only, from its beginning. That would remove the trying to guess
at transactions begin or end, avoid scanning manually for subcommands,
and so on.
- Would it make sense?
- Would it be ok for your use case?
I'm not sure if this makes sense: if we retry the script from the very
beginning including successful transactions, there may be errors..
What I'm suggesting is to simplify the problem by adding some
restrictions on what kind of case is handled, so as to simplify the
code a lot.
I'd start by stating (i.e. documenting) that the feature assumes that
one script is just *one* transaction.
Note that pgbench somehow already assumes that one script is one
transaction when it reports performance anyway.
If you want 2 transactions, then you have to put them in two scripts,
which looks fine with me. Different transactions are expected to be
independent, otherwise they should be merged into one transaction.
Therefore if the script consists of several single statements (= several
transactions), you cannot retry them. For example, suppose the script
looks like this:
UPDATE xy1 SET x = 1 WHERE y = 1;
UPDATE xy2 SET x = 2 WHERE y = 2;
UPDATE xy3 SET x = 3 WHERE y = 3;
If this restriction is ok for you, I'll simplify the code :)
Under these restrictions, ISTM that a retry is something like:
case ABORTED:
if (we want to retry) {
// do necessary stats
// reset the initial state (random, vars, current command)
state = START_TX; // loop
}
else {
// count as failed...
state = FINISHED; // or done.
}
break;
If we successfully complete a failed transaction block and process its
end command in CSTATE_END_COMMAND, we may want to make a retry. So do
you think that in this case it is ok to go to CSTATE_ABORTED at the end
of CSTATE_END_COMMAND?..
Once this works, maybe it could go a step further by restarting at
savepoints. I'd put restrictions there to ease detecting a savepoint
so that it cannot occur in a compound command for instance. This would
probably require a new state. Fine.
We discussed the savepoints earlier in [1]:
- if there's a failure, what savepoint should we roll back to and start
the execution again from?...
Maybe go to the last one, and if it is not successful go to the previous
one, etc. Retrying the entire transaction may take less time..
Well, I do not know that. My 0.02 € is that if there was a savepoint then
this is naturally the restarting point of a transaction which has some
recoverable error.
A detail:
For file contents, maybe the << 'EOF' here-document syntax would help
instead of using concatenated backslashed strings everywhere.
Thanks, but I'm not sure that it is better to open file handlers for
printing in variables..
Perl here-document stuff is just a multiline string, no file is
involved; it is different from the shell.
Oh, now googling was successful, thanks)
[1]: /messages/by-id/alpine.DEB.2.20.1707141637300.7871@lancre
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Marina,
If you want 2 transactions, then you have to put them in two scripts,
which looks fine with me. Different transactions are expected to be
independent, otherwise they should be merged into one transaction.
Therefore if the script consists of several single statements (= several
transactions), you cannot retry them. For example, the script looked like
this:
UPDATE xy1 SET x = 1 WHERE y = 1;
UPDATE xy2 SET x = 2 WHERE y = 2;
UPDATE xy3 SET x = 3 WHERE y = 3;
If this restriction is ok for you, I'll simplify the code :)
Yes, that is what I'm suggesting. If you want to restart them, you can put
them in 3 scripts.
Under these restrictions, ISTM that a retry is something like:
case ABORTED:
  if (we want to retry) {
    // do necessary stats
    // reset the initial state (random, vars, current command)
    state = START_TX; // loop
  }
  else {
    // count as failed...
    state = FINISHED; // or done.
  }
  break;

If we successfully complete a failed transaction block and process its end
command in CSTATE_END_COMMAND, we may want to make a retry. So do you think
that in this case it is ok to go to CSTATE_ABORTED at the end of
CSTATE_END_COMMAND?..
Dunno.
I'm fine with having END_COMMAND skipping to START_TX if it can be done
easily and cleanly, esp without code duplication.
ISTM that ABORTED & FINISHED are currently exactly the same. That would
put a particular use to aborted. Also, there are many points where the
code may go to "aborted" state, so reusing it could help avoid duplicating
stuff on each abort decision.
Once this works, maybe it could go a step further by restarting at
savepoints. I'd put restrictions there to ease detecting a savepoint
so that it cannot occur in a compound command for instance. This would
probably require a new state. Fine.

We discussed the savepoints earlier in [1]:
Yep. I'm trying to suggest an incremental path with simple but yet quite
useful things first.
--
Fabien.
On 12-01-2018 18:13, Fabien COELHO wrote:
Hello Marina,
Hello, Fabien!
If you want 2 transactions, then you have to put them in two scripts,
which looks fine with me. Different transactions are expected to be
independent, otherwise they should be merged into one transaction.

Therefore if the script consists of several single statements (=
several transactions), you cannot retry them. For example, the script
looked like this:

UPDATE xy1 SET x = 1 WHERE y = 1;
UPDATE xy2 SET x = 2 WHERE y = 2;
UPDATE xy3 SET x = 3 WHERE y = 3;

If this restriction is ok for you, I'll simplify the code :)
Yes, that is what I'm suggesting. If you want to restart them, you can
put them in 3 scripts.
Okay, in the next patch I'll simplify the code.
Under these restrictions, ISTM that a retry is something like:
...

If we successfully complete a failed transaction block and process its
end command in CSTATE_END_COMMAND, we may want to make a retry. So do
you think that in this case it is ok to go to CSTATE_ABORTED at the
end of CSTATE_END_COMMAND?..

Dunno.
I'm fine with having END_COMMAND skipping to START_TX if it can be
done easily and cleanly, esp without code duplication.
If I understand you correctly, I'm not sure that we should skip the
statistics collection for the command that completes a failed
transaction block.. Moreover, maybe for this command we will have only
such statistics.
ISTM that ABORTED & FINISHED are currently exactly the same. That would
put a particular use to aborted. Also, there are many points where the
code may go to "aborted" state, so reusing it could help avoid
duplicating stuff on each abort decision.
Thanks, I agree with you.
Once this works, maybe it could go a step further by restarting at
savepoints. I'd put restrictions there to ease detecting a savepoint
so that it cannot occur in a compound command for instance. This would
probably require a new state. Fine.

We discussed the savepoints earlier in [1]:
Yep. I'm trying to suggest an incremental path with simple but yet
quite useful things first.
This question ("if there's a failure what savepoint we should rollback
to and start the execution again? ...") mostly concerns the basic idea
of how to maintain the savepoints in this feature, rather than the exact
architecture of the code, so we can discuss this now :)
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Marina,
On 1/12/18 12:01 PM, Marina Polyakova wrote:
Yep. I'm trying to suggest an incremental path with simple but yet
quite useful things first.

This question ("if there's a failure what savepoint we should rollback
to and start the execution again? ...") mostly concerns the basic idea
of how to maintain the savepoints in this feature, rather than the exact
architecture of the code, so we can discuss this now :)
This patch was marked Waiting on Author on Jan 8 and no new patch was
submitted before this commitfest.
I think we should mark this patch as Returned with Feedback.
Thanks,
--
-David
david@pgmasters.net
On 05-03-2018 18:21, David Steele wrote:
Hello Marina,
Hello, David!
On 1/12/18 12:01 PM, Marina Polyakova wrote:
...
This patch was marked Waiting on Author on Jan 8 and no new patch was
submitted before this commitfest.

I think we should mark this patch as Returned with Feedback.
I'm sorry, I was very busy with the patch to precalculate stable
functions.. I'm working on a new version of this patch for pgbench
unless, of course, it's too late for v11.
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi Marina,
On 3/6/18 4:45 AM, Marina Polyakova wrote:
On 05-03-2018 18:21, David Steele wrote:
Hello Marina,
Hello, David!
On 1/12/18 12:01 PM, Marina Polyakova wrote:
...
This patch was marked Waiting on Author on Jan 8 and no new patch was
submitted before this commitfest.

I think we should mark this patch as Returned with Feedback.
I'm sorry, I was very busy with the patch to precalculate stable
functions.. I'm working on a new version of this patch for pgbench
unless, of course, it's too late for v11.
Understood, but in fairness to other authors who got their patches
pushed for not providing a new patch before the CF, we should push this
one as well.
Since this is in Waiting on Author state I have marked it Returned with
Feedback. Please submit to the next CF when you have a new patch.
Regards,
--
-David
david@pgmasters.net
On 06-03-2018 18:02, David Steele wrote:
Hi Marina,
On 3/6/18 4:45 AM, Marina Polyakova wrote:
On 05-03-2018 18:21, David Steele wrote:
Hello Marina,
Hello, David!
On 1/12/18 12:01 PM, Marina Polyakova wrote:
...
This patch was marked Waiting on Author on Jan 8 and no new patch was
submitted before this commitfest.

I think we should mark this patch as Returned with Feedback.
I'm sorry, I was very busy with the patch to precalculate stable
functions.. I'm working on a new version of this patch for pgbench
unless, of course, it's too late for v11.

Understood, but in fairness to other authors who got their patches
pushed for not providing a new patch before the CF, we should push this
one as well.
Now I understand, thank you.
Since this is in Waiting on Author state I have marked it Returned with
Feedback. Please submit to the next CF when you have a new patch.
Thanks, I'll do it.
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company