WIP Patch: Pgbench Serialization and deadlock errors
Hello, hackers!
Currently pgbench can only test transactions at the Read Committed
isolation level, because client sessions are disconnected forever on
serialization failures. There have been some proposals and discussions
about this (see the message in [1] and the thread in [2]).
I suggest a patch where pgbench client sessions are not disconnected
because of serialization or deadlock failures, and these failures are
mentioned in the reports. In detail:
- a transaction with one of these failures continues to run normally,
but its result is rolled back;
- if there were such failures during script execution, this
"transaction" is marked appropriately in the logs;
- the numbers of "transactions" with these failures are printed in the
progress output, in the aggregation logs, and at the end with the other
results (in total and for each script).
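For reference, the failure detection keys off the SQLSTATE codes for
serialization failure (40001) and deadlock detected (40P01), which the
first patch defines in pgbench.c. A minimal sketch of that
classification, written without libpq for illustration (in the real
patch the code string would come from
PQresultErrorField(res, PG_DIAG_SQLSTATE)):

```c
#include <assert.h>
#include <string.h>

/* SQLSTATE codes, matching the defines added to pgbench.c */
#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
#define ERRCODE_T_R_DEADLOCK_DETECTED     "40P01"

/* Classify an error result by its SQLSTATE string. */
static int
is_serialization_failure(const char *sqlstate)
{
	return sqlstate != NULL &&
		strcmp(sqlstate, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0;
}

static int
is_deadlock_failure(const char *sqlstate)
{
	return sqlstate != NULL &&
		strcmp(sqlstate, ERRCODE_T_R_DEADLOCK_DETECTED) == 0;
}
```

Any other error (for example 42P01, undefined_table) is still treated
as before; only these two codes keep the client connected.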
Advanced options:
- mostly for testing the built-in scripts: you can set the default
transaction isolation level with the appropriate benchmarking option
(-I);
- for more detailed reports: to see per-statement serialization and
deadlock failures, use the appropriate benchmarking option
(--report-failures).
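The -I option maps a short abbreviation (RC, RR, S) to the SQL-level
isolation level name, which is then sent via SET SESSION CHARACTERISTICS
on each connection. A rough sketch of that lookup, mirroring the tables
the second patch adds to pgbench.c:

```c
#include <assert.h>
#include <string.h>

typedef enum
{
	READ_COMMITTED,
	REPEATABLE_READ,
	SERIALIZABLE,
	NUM_DEFAULT_ISOLATION_LEVEL
} DefaultIsolationLevel;

static const char *abbrevs[] = {"RC", "RR", "S"};
static const char *iso_sql[] = {
	"read committed",
	"repeatable read",
	"serializable"
};

/*
 * Return the SQL name for an -I abbreviation, or NULL if the
 * abbreviation is invalid (pgbench then reports an error and exits).
 */
static const char *
iso_level_sql(const char *optarg)
{
	int			i;

	for (i = 0; i < NUM_DEFAULT_ISOLATION_LEVEL; i++)
		if (strcmp(optarg, abbrevs[i]) == 0)
			return iso_sql[i];
	return NULL;
}
```

For example, iso_level_sql("RR") yields "repeatable read", which is
interpolated into "set session characteristics as transaction isolation
level %s".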
Also included: TAP tests for the new functionality and updated
documentation with new examples.
Patches are attached. Any suggestions are welcome!
P.S. Is this use case (do not retry transactions with serialization or
deadlock failures) the most interesting one, or should failed
transactions be retried (and how many times, if there seems to be no
hope of success)?
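If retrying were added, the usual shape is a bounded retry loop. A
hypothetical sketch of that alternative (run_transaction, TxStatus, and
max_tries are illustrative stand-ins, not part of these patches):

```c
#include <assert.h>

/* Hypothetical outcome of one attempt at a script's transaction */
typedef enum
{
	TX_OK,
	TX_SERIALIZATION_FAILURE,
	TX_DEADLOCK
} TxStatus;

/*
 * Run the transaction until it commits or max_tries attempts are
 * exhausted; serialization/deadlock failures are rolled back and
 * retried.  Returns 1 on commit, 0 if we gave up.
 */
static int
run_with_retries(TxStatus (*run_transaction) (void), int max_tries)
{
	int			tries;

	for (tries = 0; tries < max_tries; tries++)
	{
		if (run_transaction() == TX_OK)
			return 1;			/* committed */
		/* failure: the transaction was rolled back; try again */
	}
	return 0;					/* no hope of success within max_tries */
}

/* demo: fails twice with a serialization failure, then succeeds */
static int	demo_calls = 0;
static TxStatus
demo_tx(void)
{
	return (++demo_calls < 3) ? TX_SERIALIZATION_FAILURE : TX_OK;
}
```

The open question above is essentially what max_tries should be and
whether it should be user-configurable.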
[1]: /messages/by-id/4EC65830020000250004323F@gw.wicourts.gov
[2]: /messages/by-id/alpine.DEB.2.02.1305182259550.1473@localhost6.localdomain6
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v1-0002-Pgbench-Set-default-transaction-isolation-level.patch (text/x-diff)
From d27bdbb4e360b30ec0960634c13a6ba21c7a618d Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Fri, 9 Jun 2017 16:34:08 +0300
Subject: [PATCH v1 2/4] Pgbench Set default transaction isolation level
You can set the default transaction isolation level by the appropriate
benchmarking option (-I).
---
src/bin/pgbench/pgbench.c | 66 ++++++++++++++++++-
.../004_set_default_transaction_isolation_level.pl | 76 ++++++++++++++++++++++
2 files changed, 141 insertions(+), 1 deletion(-)
create mode 100644 src/bin/pgbench/t/004_set_default_transaction_isolation_level.pl
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index bbf444b..8c48793 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -456,6 +456,24 @@ static const BuiltinScript builtin_script[] =
}
};
+/* Default transaction isolation level */
+typedef enum DefaultIsolationLevel
+{
+ READ_COMMITTED,
+ REPEATABLE_READ,
+ SERIALIZABLE,
+ NUM_DEFAULT_ISOLATION_LEVEL
+} DefaultIsolationLevel;
+
+DefaultIsolationLevel default_isolation_level = READ_COMMITTED;
+
+static const char *DEFAULT_ISOLATION_LEVEL_ABBREVIATION[] = {"RC", "RR", "S"};
+static const char *DEFAULT_ISOLATION_LEVEL_SQL[] = {
+ "read committed",
+ "repeatable read",
+ "serializable"
+};
+
/* Function prototypes */
static void setIntValue(PgBenchValue *pv, int64 ival);
@@ -508,6 +526,8 @@ usage(void)
" -C, --connect establish new connection for each transaction\n"
" -D, --define=VARNAME=VALUE\n"
" define variable for use by custom script\n"
+ " -I, --default-isolation-level=RC|RR|S\n"
+ " default transaction isolation level (default: RC)\n"
" -j, --jobs=NUM number of threads (default: 1)\n"
" -l, --log write transaction times to log file\n"
" -L, --latency-limit=NUM count transactions lasting more than NUM ms as late\n"
@@ -2108,6 +2128,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (st->con == NULL)
{
instr_time start;
+ char buffer[256];
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -2124,6 +2145,16 @@ doCustom(TState *thread, CState *st, StatsData *agg)
/* Reset session-local state */
memset(st->prepared, 0, sizeof(st->prepared));
+
+ /* set default isolation level */
+ snprintf(buffer, sizeof(buffer),
+ "set session characteristics as transaction isolation level %s",
+ DEFAULT_ISOLATION_LEVEL_SQL[
+ default_isolation_level]);
+ executeStatement(st->con, buffer);
+ if (debug)
+ fprintf(stderr, "client %d execute command: %s\n",
+ st->id, buffer);
}
/*
@@ -3583,6 +3614,8 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
/* Report test parameters. */
printf("transaction type: %s\n",
num_scripts == 1 ? sql_script[0].desc : "multiple scripts");
+ printf("default transaction isolation level: %s\n",
+ DEFAULT_ISOLATION_LEVEL_SQL[default_isolation_level]);
printf("scaling factor: %d\n", scale);
printf("query mode: %s\n", QUERYMODE[querymode]);
printf("number of clients: %d\n", nclients);
@@ -3719,6 +3752,7 @@ main(int argc, char **argv)
{"fillfactor", required_argument, NULL, 'F'},
{"host", required_argument, NULL, 'h'},
{"initialize", no_argument, NULL, 'i'},
+ {"default-isolation-level", required_argument, NULL, 'I'},
{"jobs", required_argument, NULL, 'j'},
{"log", no_argument, NULL, 'l'},
{"latency-limit", required_argument, NULL, 'L'},
@@ -3811,7 +3845,7 @@ main(int argc, char **argv)
state = (CState *) pg_malloc(sizeof(CState));
memset(state, 0, sizeof(CState));
- while ((c = getopt_long(argc, argv, "ih:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
+ while ((c = getopt_long(argc, argv, "ih:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:I:", long_options, &optindex)) != -1)
{
char *script;
@@ -4051,6 +4085,25 @@ main(int argc, char **argv)
latency_limit = (int64) (limit_ms * 1000);
}
break;
+ case 'I':
+ {
+ benchmarking_option_set = true;
+
+ for (default_isolation_level = 0;
+ default_isolation_level < NUM_DEFAULT_ISOLATION_LEVEL;
+ default_isolation_level++)
+ if (strcmp(optarg,
+ DEFAULT_ISOLATION_LEVEL_ABBREVIATION[
+ default_isolation_level]) == 0)
+ break;
+ if (default_isolation_level >= NUM_DEFAULT_ISOLATION_LEVEL)
+ {
+ fprintf(stderr, "invalid default isolation level (-I): \"%s\"\n",
+ optarg);
+ exit(1);
+ }
+ }
+ break;
case 0:
/* This covers long options which take no argument. */
if (foreign_keys || unlogged_tables)
@@ -4522,8 +4575,19 @@ threadRun(void *arg)
/* make connections to the database */
for (i = 0; i < nstate; i++)
{
+ char buffer[256];
+
if ((state[i].con = doConnect()) == NULL)
goto done;
+
+ /* set default isolation level */
+ snprintf(buffer, sizeof(buffer),
+ "set session characteristics as transaction isolation level %s",
+ DEFAULT_ISOLATION_LEVEL_SQL[default_isolation_level]);
+ executeStatement(state[i].con, buffer);
+ if (debug)
+ fprintf(stderr, "client %d execute command: %s\n",
+ state[i].id, buffer);
}
}
diff --git a/src/bin/pgbench/t/004_set_default_transaction_isolation_level.pl b/src/bin/pgbench/t/004_set_default_transaction_isolation_level.pl
new file mode 100644
index 0000000..cb9f03b
--- /dev/null
+++ b/src/bin/pgbench/t/004_set_default_transaction_isolation_level.pl
@@ -0,0 +1,76 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 27;
+
+# Test concurrent update in table row with different default transaction
+# isolation levels.
+my $node = get_new_node('main');
+$node->init;
+$node->start;
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2);');
+
+my $script = $node->basedir . '/pgbench_script';
+append_to_file($script, "\\set delta random(-5000, 5000)\n");
+append_to_file($script, "UPDATE xy SET y = y + :delta WHERE x = 1;");
+
+# Test Read committed default transaction isolation level
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10
+ --default-isolation-level=RC --file), $script ],
+ qr{default transaction isolation level: read committed},
+ 'concurrent update: Read Committed: check default isolation level');
+
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10
+ --default-isolation-level=RC --file), $script ],
+ qr{processed: 50/50},
+ 'concurrent update: Read Committed: check processed transactions');
+
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10
+ --default-isolation-level=RC --file), $script ],
+ qr{serialization failures: 0 \(0\.000 %\)},
+ 'concurrent update: Read Committed: check serialization failures');
+
+# Test Repeatable read default transaction isolation level
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10
+ --default-isolation-level=RR --file), $script ],
+ qr{default transaction isolation level: repeatable read},
+ 'concurrent update: Repeatable Read: check default isolation level');
+
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10
+ --default-isolation-level=RR --file), $script ],
+ qr{processed: 50/50},
+ 'concurrent update: Repeatable Read: check processed transactions');
+
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10
+ --default-isolation-level=RR --file), $script ],
+ qr{serialization failures: [1-9]\d* \([1-9]\d*\.\d* %\)},
+ 'concurrent update: Repeatable Read: check serialization failures');
+
+# Test Serializable default transaction isolation level
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10
+ --default-isolation-level=S --file), $script ],
+ qr{default transaction isolation level: serializable},
+ 'concurrent update: Serializable: check default isolation level');
+
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10
+ --default-isolation-level=S --file), $script ],
+ qr{processed: 50/50},
+ 'concurrent update: Serializable: check processed transactions');
+
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10
+ --default-isolation-level=S --file), $script ],
+ qr{serialization failures: [1-9]\d* \([1-9]\d*\.\d* %\)},
+ 'concurrent update: Serializable: check serialization failures');
--
1.9.1
v1-0003-Pgbench-Report-per-statement-serialization-and-de.patch (text/x-diff)
From fd2727472745929e50ebf564ab77d27588483f7c Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Fri, 9 Jun 2017 17:41:11 +0300
Subject: [PATCH v1 3/4] Pgbench Report per-statement serialization and
deadlock failures
They are reported if you use the appropriate benchmarking option
(--report-failures). It can be combined with the average per-statement latencies
option (-r).
---
src/bin/pgbench/pgbench.c | 69 ++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 60 insertions(+), 9 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 8c48793..ec27ace 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -179,6 +179,8 @@ int nthreads = 1; /* number of threads */
bool is_connect; /* establish connection for each transaction */
bool is_latencies; /* report per-command latencies */
int main_pid; /* main process id used in log filename */
+bool report_failures = false; /* whether to report serialization and
+ * deadlock failures per command */
char *pghost = "";
char *pgport = "";
@@ -393,6 +395,10 @@ typedef struct
char *argv[MAX_ARGS]; /* command word list */
PgBenchExpr *expr; /* parsed expression, if needed */
SimpleStats stats; /* time spent in this command */
+ int64 serialization_failures; /* number of serialization failures in
+ * this command */
+ int64 deadlock_failures; /* number of deadlock failures in this
+ * command */
} Command;
typedef struct ParsedScript
@@ -543,6 +549,7 @@ usage(void)
" -v, --vacuum-all vacuum all four standard tables before tests\n"
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
+ " --report-failures report serialization and deadlock failures per command\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
@@ -2419,23 +2426,34 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* in thread-local data structure, if per-command latencies
* are requested.
*/
+ command = sql_script[st->use_file].commands[st->command];
+
if (is_latencies && !serialization_failure && !deadlock_failure)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
/* XXX could use a mutex here, but we choose not to */
- command = sql_script[st->use_file].commands[st->command];
addToSimpleStats(&command->stats,
INSTR_TIME_GET_DOUBLE(now) -
INSTR_TIME_GET_DOUBLE(st->stmt_begin));
}
- /* remember for transaction if there were failures */
+ /*
+ * Accumulate per-command serialization / deadlock failures
+ * count in thread-local data structure and remember for
+ * transaction if there were failures.
+ */
if (serialization_failure)
+ {
+ command->serialization_failures++;
st->serialization_failure = true;
+ }
if (deadlock_failure)
+ {
+ command->deadlock_failures++;
st->deadlock_failure = true;
+ }
/* Go ahead with next command */
st->command++;
@@ -3098,6 +3116,8 @@ process_sql_command(PQExpBuffer buf, const char *source)
my_command->type = SQL_COMMAND;
my_command->argc = 0;
initSimpleStats(&my_command->stats);
+ my_command->serialization_failures = 0;
+ my_command->deadlock_failures = 0;
/*
* If SQL command is multi-line, we only want to save the first line as
@@ -3167,6 +3187,8 @@ process_backslash_command(PsqlScanState sstate, const char *source)
my_command->type = META_COMMAND;
my_command->argc = 0;
initSimpleStats(&my_command->stats);
+ my_command->serialization_failures = 0;
+ my_command->deadlock_failures = 0;
/* Save first word (command name) */
j = 0;
@@ -3718,20 +3740,43 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (num_scripts > 1)
printSimpleStats(" - latency", &sql_script[i].stats.latency);
- /* Report per-command latencies */
- if (is_latencies)
+ /*
+ * Report per-command serialization / deadlock failures and
+ * latencies
+ */
+ if (report_failures || is_latencies)
{
Command **commands;
- printf(" - statement latencies in milliseconds:\n");
+ if (report_failures && is_latencies)
+ printf(" - statement serialization, deadlock failures and latencies in milliseconds:\n");
+ else if (report_failures)
+ printf(" - statement serialization and deadlock failures:\n");
+ else
+ printf(" - statement latencies in milliseconds:\n");
for (commands = sql_script[i].commands;
*commands != NULL;
commands++)
- printf(" %11.3f %s\n",
- 1000.0 * (*commands)->stats.sum /
- (*commands)->stats.count,
- (*commands)->line);
+ {
+ if (report_failures && is_latencies)
+ printf(" %25" INT64_MODIFIER "d %25" INT64_MODIFIER "d %11.3f %s\n",
+ (*commands)->serialization_failures,
+ (*commands)->deadlock_failures,
+ 1000.0 * (*commands)->stats.sum /
+ (*commands)->stats.count,
+ (*commands)->line);
+ else if (report_failures)
+ printf(" %25" INT64_MODIFIER "d %25" INT64_MODIFIER "d %s\n",
+ (*commands)->serialization_failures,
+ (*commands)->deadlock_failures,
+ (*commands)->line);
+ else
+ printf(" %11.3f %s\n",
+ 1000.0 * (*commands)->stats.sum /
+ (*commands)->stats.count,
+ (*commands)->line);
+ }
}
}
}
@@ -3779,6 +3824,7 @@ main(int argc, char **argv)
{"aggregate-interval", required_argument, NULL, 5},
{"progress-timestamp", no_argument, NULL, 6},
{"log-prefix", required_argument, NULL, 7},
+ {"report-failures", no_argument, NULL, 8},
{NULL, 0, NULL, 0}
};
@@ -4144,6 +4190,11 @@ main(int argc, char **argv)
benchmarking_option_set = true;
logfile_prefix = pg_strdup(optarg);
break;
+ case 8:
+ benchmarking_option_set = true;
+ per_script_stats = true;
+ report_failures = true;
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
--
1.9.1
v1-0004-Pgbench-Fix-documentation.patch (text/x-diff)
From ce28b9433d596adcfa3ddd860faaeb8de3315971 Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Tue, 13 Jun 2017 10:34:23 +0300
Subject: [PATCH v1 4/4] Pgbench Fix documentation
New:
- no fails on serialization and deadlock errors; report about them in logs, in
progress, in aggregation logs and in the end with other results (all and for
each script)
- benchmarking option to set default transaction isolation level (-I)
- benchmarking option to report per-statement serialization and deadlock
failures (--report-failures)
---
doc/src/sgml/ref/pgbench.sgml | 177 ++++++++++++++++++++++++++++++++++++------
1 file changed, 152 insertions(+), 25 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 5735c48..158d300 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -49,23 +49,30 @@
<screen>
transaction type: <builtin: TPC-B (sort of)>
+default transaction isolation level: read committed
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+number of transactions with serialization failures: 0 (0.000 %)
+number of transactions with deadlock failures: 0 (0.000 %)
tps = 85.184871 (including connections establishing)
tps = 85.296346 (excluding connections establishing)
</screen>
- The first six lines report some of the most important parameter
+ The first seven lines report some of the most important parameter
settings. The next line reports the number of transactions completed
and intended (the latter being just the product of number of clients
and number of transactions per client); these will be equal unless the run
failed before completion. (In <option>-T</> mode, only the actual
number of transactions is printed.)
- The last two lines report the number of transactions per second,
+ The next two lines report the number of
+ transactions with serialization and deadlock failures; unlike other errors
+ transactions with these ones don't fail and continue run normally, but
+ transaction result is rolled back.
+ And the last two lines report the number of transactions per second,
figured with and without counting the time to start database sessions.
</para>
@@ -338,6 +345,30 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
</varlistentry>
<varlistentry>
+ <term><option>-I</option> <replaceable>isolationlevel</></term>
+ <term><option>--default-isolation-level=</option><replaceable>isolationlevel</></term>
+ <listitem>
+ <para>
+ Set default transaction isolation level. Use next abbreviations for
+ <replaceable>isolationlevel</>:
+ <itemizedlist>
+ <listitem>
+ <para><literal>RC</>: set default transaction isolation level as Read Committed.</para>
+ </listitem>
+ <listitem>
+ <para><literal>RR</>: set default transaction isolation level as Repeatable Read.</para>
+ </listitem>
+ <listitem>
+ <para><literal>S</>: set default transaction isolation level as Serializable.</para>
+ </listitem>
+ </itemizedlist>
+ The default is Read Committed isolation level. (See
+ <xref linkend="transaction-iso"> for more information.)
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-j</option> <replaceable>threads</></term>
<term><option>--jobs=</option><replaceable>threads</></term>
<listitem>
@@ -434,12 +465,13 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
<listitem>
<para>
Show progress report every <replaceable>sec</> seconds. The report
- includes the time since the beginning of the run, the tps since the
- last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</>),
- the latency is computed with respect to the transaction scheduled
- start time, not the actual transaction beginning time, thus it also
- includes the average schedule lag time.
+ includes the time since the beginning of the run and next
+ characteristics since the last report: the tps, the number of
+ transactions with serialization and deadlock failures, and the
+ transaction latency average and standard deviation. Under throttling
+ (<option>-R</>), the latency is computed with respect to the transaction
+ scheduled start time, not the actual transaction beginning time, thus it
+ also includes the average schedule lag time.
</para>
</listitem>
</varlistentry>
@@ -451,7 +483,9 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
<para>
Report the average per-statement latency (execution time from the
perspective of the client) of each command after the benchmark
- finishes. See below for details.
+ finishes (with the number of serialization and deadlock failures for
+ each command if it is used with <option>--report-failures</> option).
+ See below for details.
</para>
</listitem>
</varlistentry>
@@ -496,6 +530,15 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
</para>
<para>
+ Transactions with serialization or deadlock failures (or with both
+ of them if used script contains several transactions; see
+ <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"> for more information) are
+ marked separately and their time is not reported as for skipped
+ transactions.
+ </para>
+
+ <para>
A high schedule lag time is an indication that the system cannot
process transactions at the specified rate, with the chosen number of
clients and threads. When the average transaction execution time is
@@ -593,6 +636,18 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
</varlistentry>
<varlistentry>
+ <term><option>--report-failures</option></term>
+ <listitem>
+ <para>
+ Report the number of serialization and deadlock failures for each
+ command after the benchmark finishes (with the average per-statement
+ latencies of each command if it is used with <option>-r</> option). See
+ below for details.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--sampling-rate=<replaceable>rate</></option></term>
<listitem>
<para>
@@ -693,8 +748,8 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
<refsect1>
<title>Notes</title>
- <refsect2>
- <title>What is the <quote>Transaction</> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts">
+ <title id="transactions-and-scripts-title">What is the <quote>Transaction</> Actually Performed in <application>pgbench</application>?</title>
<para>
<application>pgbench</> executes test scripts chosen randomly
@@ -1169,6 +1224,13 @@ END;
When both <option>--rate</> and <option>--latency-limit</> are used,
the <replaceable>time</> for a skipped transaction will be reported as
<literal>skipped</>.
+ If transaction has serialization / deadlock failure or them both (last thing
+ is possible if used script contains several transactions; see
+ <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"> for more information), its
+ <replaceable>time</> will be reported as <literal>serialization failure</> /
+ <literal>deadlock failure</> /
+ <literal>serialization and deadlock failures</> appropriately.
</para>
<para>
@@ -1198,6 +1260,20 @@ END;
</para>
<para>
+ Example with serialization, deadlock and both these failures:
+<screen>
+1 128 24968 0 1496759158 426984
+0 129 serialization failure 0 1496759158 427023
+3 129 serialization failure 0 1496759158 432662
+2 128 serialization failure 0 1496759158 432765
+0 130 deadlock failure 0 1496759159 460070
+1 129 serialization failure 0 1496759160 485188
+2 129 serialization and deadlock failures 0 1496759160 485339
+4 130 serialization failure 0 1496759160 485465
+</screen>
+ </para>
+
+ <para>
When running a long test on hardware that can handle a lot of transactions,
the log files can become very large. The <option>--sampling-rate</> option
can be used to log only a random sample of transactions.
@@ -1212,7 +1288,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</> <replaceable>num_transactions</> <replaceable>sum_latency</> <replaceable>sum_latency_2</> <replaceable>min_latency</> <replaceable>max_latency</> <optional> <replaceable>sum_lag</> <replaceable>sum_lag_2</> <replaceable>min_lag</> <replaceable>max_lag</> <optional> <replaceable>skipped</> </optional> </optional>
+<replaceable>interval_start</> <replaceable>num_transactions</> <replaceable>num_serialization_failures_transactions</> <replaceable>num_deadlock_failures_transactions</> <replaceable>sum_latency</> <replaceable>sum_latency_2</> <replaceable>min_latency</> <replaceable>max_latency</> <optional> <replaceable>sum_lag</> <replaceable>sum_lag_2</> <replaceable>min_lag</> <replaceable>max_lag</> <optional> <replaceable>skipped</> </optional> </optional>
</synopsis>
where
@@ -1220,6 +1296,9 @@ END;
epoch time stamp),
<replaceable>num_transactions</> is the number of transactions
within the interval,
+ <replaceable>num_serialization_failures_transactions</> and
+ <replaceable>num_deadlock_failures_transactions</> are the numbers of
+ transactions with appropriate failures within the interval,
<replaceable>sum_latency</replaceable> is the sum of the transaction
latencies within the interval,
<replaceable>sum_latency_2</replaceable> is the sum of squares of the
@@ -1244,11 +1323,11 @@ END;
<para>
Here is some example output:
<screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 0 0 1542744 483552416 61 2573
+1345828503 7884 0 0 1979812 565806736 60 1479
+1345828505 7208 0 0 1979422 567277552 59 1391
+1345828507 7685 0 0 1980268 569784714 60 1398
+1345828509 7073 0 0 1979779 573489941 236 1411
</screen></para>
<para>
@@ -1260,31 +1339,45 @@ END;
</refsect2>
<refsect2>
- <title>Per-Statement Latencies</title>
+ <title>Per-Statement Latencies and Failures</title>
<para>
- With the <option>-r</> option, <application>pgbench</> collects
- the elapsed transaction time of each statement executed by every
- client. It then reports an average of those values, referred to
- as the latency for each statement, after the benchmark has finished.
+ There're two options to get some per-statement characteristics: <option>-r</> and
+ <option>--report-failures</> (they can be combined together). All values
+ are computed for each statement executed by every client and are reported
+ after the benchmark has finished. With the <option>-r</> option,
+ <application>pgbench</> collects the elapsed transaction time of each
+ statement. It then reports an average of those values, referred to as the
+ latency for each statement. With the <option>--report-failures</> option,
+ <application>pgbench</> collects the number of serialization and deadlock
+ failures for each statement (notice that the total sum of per-command
+ failures of each type can be greater than the number of "transactions" with
+ these failures; see <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"> for more information).
</para>
<para>
- For the default script, the output will look similar to this:
+ For the default script if you use <option>-r</> option, the output will look
+ similar to this:
<screen>
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
+default transaction isolation level: read committed
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+number of transactions with serialization failures: 0 (0.000 %)
+number of transactions with deadlock failures: 0 (0.000 %)
latency average = 15.844 ms
latency stddev = 2.715 ms
tps = 618.764555 (including connections establishing)
tps = 622.977698 (excluding connections establishing)
script statistics:
+ - number of transactions with serialization failures: 0 (0.000%)
+ - number of transactions with deadlock failures: 0 (0.000%)
- statement latencies in milliseconds:
0.002 \set aid random(1, 100000 * :scale)
0.005 \set bid random(1, 1 * :scale)
@@ -1298,11 +1391,45 @@ script statistics:
0.371 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
1.212 END;
</screen>
+
+ Another example of output for the default script using serializable default
+ transaction isolation level (-I S) and <option>--report-failures</> option:
+<screen>
+starting vacuum...end.
+transaction type: <builtin: TPC-B (sort of)>
+default transaction isolation level: serializable
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+number of transactions per client: 1000
+number of transactions actually processed: 10000/10000
+number of transactions with serialization failures: 9284 (92.840 %)
+number of transactions with deadlock failures: 0 (0.000 %)
+latency average = 10.817 ms
+tps = 924.477920 (including connections establishing)
+tps = 928.159352 (excluding connections establishing)
+script statistics:
+ - number of transactions with serialization failures: 9284 (92.840%)
+ - number of transactions with deadlock failures: 0 (0.000%)
+ - statement serialization and deadlock failures:
+ 0 0 \set aid random(1, 100000 * :scale)
+ 0 0 \set bid random(1, 1 * :scale)
+ 0 0 \set tid random(1, 10 * :scale)
+ 0 0 \set delta random(-5000, 5000)
+ 0 0 BEGIN;
+ 0 0 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+ 0 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 7975 0 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 1305 0 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 4 0 END;
+</screen>
</para>
<para>
- If multiple script files are specified, the averages are reported
- separately for each script file.
+ If multiple script files are specified, the averages and the failures are
+ reported separately for each script file.
</para>
<para>
--
1.9.1
v1-0001-Pgbench-Serialization-and-deadlock-errors.patch (text/x-diff)
From c393e68c8da0c691fdccdfd4a584d035b18f982e Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Fri, 9 Jun 2017 14:42:30 +0300
Subject: [PATCH v1 1/4] Pgbench Serialization and deadlock errors
Now session is not disconnected because of serialization or deadlock errors.
If there were such errors during script execution this "transaction" is marked
appropriately in logs. Numbers of "transactions" with such errors are printed in
progress, in aggregation logs and in the end with other results (all and for
each script).
---
src/bin/pgbench/pgbench.c | 169 +++++++++++++++++++++-----
src/bin/pgbench/t/002_serialization_errors.pl | 75 ++++++++++++
src/bin/pgbench/t/003_deadlock_errors.pl | 93 ++++++++++++++
3 files changed, 308 insertions(+), 29 deletions(-)
create mode 100644 src/bin/pgbench/t/002_serialization_errors.pl
create mode 100644 src/bin/pgbench/t/003_deadlock_errors.pl
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ae36247..bbf444b 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -58,6 +58,9 @@
#include "pgbench.h"
+#define ERRCODE_IN_FAILED_SQL_TRANSACTION "25P02"
+#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"
#define ERRCODE_UNDEFINED_TABLE "42P01"
/*
@@ -232,6 +235,10 @@ typedef struct StatsData
int64 cnt; /* number of transactions */
int64 skipped; /* number of transactions skipped under --rate
* and --latency-limit */
+ int64 serialization_failures; /* number of transactions with
+ * serialization failures */
+ int64 deadlock_failures; /* number of transactions with deadlock
+ * failures */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -330,6 +337,10 @@ typedef struct
/* per client collected stats */
int64 cnt; /* transaction count */
+ bool serialization_failure; /* if there was serialization failure
+ * during script execution */
+ bool deadlock_failure; /* if there was deadlock failure during
+ * script execution */
int ecnt; /* error count */
} CState;
@@ -786,6 +797,8 @@ initStats(StatsData *sd, time_t start_time)
sd->start_time = start_time;
sd->cnt = 0;
sd->skipped = 0;
+ sd->serialization_failures = 0;
+ sd->deadlock_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -794,14 +807,20 @@ initStats(StatsData *sd, time_t start_time)
* Accumulate one additional item into the given stats object.
*/
static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumStats(StatsData *stats, bool skipped, bool serialization_failure,
+ bool deadlock_failure, double lat, double lag)
{
stats->cnt++;
- if (skipped)
+ if (skipped || serialization_failure || deadlock_failure)
{
- /* no latency to record on skipped transactions */
- stats->skipped++;
+ /* no latency to record on such transactions */
+ if (skipped)
+ stats->skipped++;
+ if (serialization_failure)
+ stats->serialization_failures++;
+ if (deadlock_failure)
+ stats->deadlock_failures++;
}
else
{
@@ -1962,6 +1981,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
instr_time now;
bool end_tx_processed = false;
int64 wait;
+ bool serialization_failure = false;
+ bool deadlock_failure = false;
+ bool in_failed_transaction = false;
+ ExecStatusType result_status;
+ char *sqlState;
/*
* gettimeofday() isn't free, so we get the current timestamp lazily the
@@ -2121,6 +2145,10 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->txn_scheduled = INSTR_TIME_GET_MICROSEC(now);
}
+ /* reset transaction variables to default values */
+ st->serialization_failure = false;
+ st->deadlock_failure = false;
+
/* Begin with the first command */
st->command = 0;
st->state = CSTATE_START_COMMAND;
@@ -2142,6 +2170,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
break;
}
+ /* reset command result variables to default values */
+ serialization_failure = false;
+ deadlock_failure = false;
+ in_failed_transaction = false;
+
/*
* Record statement start time if per-command latencies are
* requested
@@ -2299,21 +2332,34 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Read and discard the query result;
*/
res = PQgetResult(st->con);
- switch (PQresultStatus(res))
+ result_status = PQresultStatus(res);
+ sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+ if (sqlState) {
+ serialization_failure =
+ strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0;
+ deadlock_failure =
+ strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0;
+ in_failed_transaction =
+ strcmp(sqlState, ERRCODE_IN_FAILED_SQL_TRANSACTION) == 0;
+ }
+
+ if (result_status == PGRES_COMMAND_OK ||
+ result_status == PGRES_TUPLES_OK ||
+ result_status == PGRES_EMPTY_QUERY ||
+ serialization_failure ||
+ deadlock_failure ||
+ in_failed_transaction)
{
- case PGRES_COMMAND_OK:
- case PGRES_TUPLES_OK:
- case PGRES_EMPTY_QUERY:
- /* OK */
- PQclear(res);
- discard_response(st);
- st->state = CSTATE_END_COMMAND;
- break;
- default:
- commandFailed(st, PQerrorMessage(st->con));
- PQclear(res);
- st->state = CSTATE_ABORTED;
- break;
+ /* OK */
+ PQclear(res);
+ discard_response(st);
+ st->state = CSTATE_END_COMMAND;
+ }
+ else
+ {
+ commandFailed(st, PQerrorMessage(st->con));
+ PQclear(res);
+ st->state = CSTATE_ABORTED;
}
break;
@@ -2342,7 +2388,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* in thread-local data structure, if per-command latencies
* are requested.
*/
- if (is_latencies)
+ if (is_latencies && !serialization_failure && !deadlock_failure)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -2354,6 +2400,12 @@ doCustom(TState *thread, CState *st, StatsData *agg)
INSTR_TIME_GET_DOUBLE(st->stmt_begin));
}
+ /* remember for transaction if there were failures */
+ if (serialization_failure)
+ st->serialization_failure = true;
+ if (deadlock_failure)
+ st->deadlock_failure = true;
+
/* Go ahead with next command */
st->command++;
st->state = CSTATE_START_COMMAND;
@@ -2370,9 +2422,17 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
if (progress || throttle_delay || latency_limit ||
per_script_stats || use_log)
+ {
processXactStats(thread, st, &now, false, agg);
+ }
else
+ {
thread->stats.cnt++;
+ if (st->serialization_failure)
+ thread->stats.serialization_failures++;
+ if (st->deadlock_failure)
+ thread->stats.deadlock_failures++;
+ }
if (is_connect)
{
@@ -2462,9 +2522,11 @@ doLog(TState *thread, CState *st,
while (agg->start_time + agg_interval <= now)
{
/* print aggregated report to logfile */
- fprintf(logfile, "%ld " INT64_FORMAT " %.0f %.0f %.0f %.0f",
+ fprintf(logfile, "%ld " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT " %.0f %.0f %.0f %.0f",
(long) agg->start_time,
agg->cnt,
+ agg->serialization_failures,
+ agg->deadlock_failures,
agg->latency.sum,
agg->latency.sum2,
agg->latency.min,
@@ -2486,17 +2548,28 @@ doLog(TState *thread, CState *st,
}
/* accumulate the current transaction */
- accumStats(agg, skipped, latency, lag);
+ accumStats(agg, skipped, st->serialization_failure,
+ st->deadlock_failure, latency, lag);
}
else
{
/* no, print raw transactions */
struct timeval tv;
+ char transaction_label[256];
- gettimeofday(&tv, NULL);
if (skipped)
- fprintf(logfile, "%d " INT64_FORMAT " skipped %d %ld %ld",
- st->id, st->cnt, st->use_file,
+ snprintf(transaction_label, sizeof(transaction_label), "skipped");
+ else if (st->serialization_failure && st->deadlock_failure)
+ snprintf(transaction_label, sizeof(transaction_label),
+ "serialization and deadlock failures");
+ else if (st->serialization_failure || st->deadlock_failure)
+ snprintf(transaction_label, sizeof(transaction_label), "%s failure",
+ st->serialization_failure ? "serialization" : "deadlock");
+
+ gettimeofday(&tv, NULL);
+ if (skipped || st->serialization_failure || st->deadlock_failure)
+ fprintf(logfile, "%d " INT64_FORMAT " %s %d %ld %ld",
+ st->id, st->cnt, transaction_label, st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
else
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d %ld %ld",
@@ -2523,7 +2596,7 @@ processXactStats(TState *thread, CState *st, instr_time *now,
if ((!skipped) && INSTR_TIME_IS_ZERO(*now))
INSTR_TIME_SET_CURRENT(*now);
- if (!skipped)
+ if (!skipped && !st->serialization_failure && !st->deadlock_failure)
{
/* compute latency & lag */
latency = INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled;
@@ -2532,21 +2605,30 @@ processXactStats(TState *thread, CState *st, instr_time *now,
if (progress || throttle_delay || latency_limit)
{
- accumStats(&thread->stats, skipped, latency, lag);
+ accumStats(&thread->stats, skipped, st->serialization_failure,
+ st->deadlock_failure, latency, lag);
/* count transactions over the latency limit, if needed */
if (latency_limit && latency > latency_limit)
thread->latency_late++;
}
else
+ {
thread->stats.cnt++;
+ if (st->serialization_failure)
+ thread->stats.serialization_failures++;
+ if (st->deadlock_failure)
+ thread->stats.deadlock_failures++;
+ }
if (use_log)
doLog(thread, st, agg, skipped, latency, lag);
/* XXX could use a mutex here, but we choose not to */
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ accumStats(&sql_script[st->use_file].stats, skipped,
+ st->serialization_failure, st->deadlock_failure, latency,
+ lag);
}
@@ -3522,6 +3604,14 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (total->cnt <= 0)
return;
+ printf("number of transactions with serialization failures: " INT64_FORMAT " (%.3f %%)\n",
+ total->serialization_failures,
+ (100.0 * total->serialization_failures / total->cnt));
+
+ printf("number of transactions with deadlock failures: " INT64_FORMAT " (%.3f %%)\n",
+ total->deadlock_failures,
+ (100.0 * total->deadlock_failures / total->cnt));
+
if (throttle_delay && latency_limit)
printf("number of transactions skipped: " INT64_FORMAT " (%.3f %%)\n",
total->skipped,
@@ -3576,6 +3666,16 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
else
printf("script statistics:\n");
+ printf(" - number of transactions with serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+ sql_script[i].stats.serialization_failures,
+ (100.0 * sql_script[i].stats.serialization_failures /
+ sql_script[i].stats.cnt));
+
+ printf(" - number of transactions with deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+ sql_script[i].stats.deadlock_failures,
+ (100.0 * sql_script[i].stats.deadlock_failures /
+ sql_script[i].stats.cnt));
+
if (latency_limit)
printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
sql_script[i].stats.skipped,
@@ -4340,6 +4440,8 @@ main(int argc, char **argv)
mergeSimpleStats(&stats.lag, &thread->stats.lag);
stats.cnt += thread->stats.cnt;
stats.skipped += thread->stats.skipped;
+ stats.serialization_failures += thread->stats.serialization_failures;
+ stats.deadlock_failures += thread->stats.deadlock_failures;
latency_late += thread->latency_late;
INSTR_TIME_ADD(conn_total_time, thread->conn_time);
}
@@ -4639,6 +4741,9 @@ threadRun(void *arg)
mergeSimpleStats(&cur.lag, &thread[i].stats.lag);
cur.cnt += thread[i].stats.cnt;
cur.skipped += thread[i].stats.skipped;
+ cur.serialization_failures +=
+ thread[i].stats.serialization_failures;
+ cur.deadlock_failures += thread[i].stats.deadlock_failures;
}
total_run = (now - thread_start) / 1000000.0;
@@ -4669,8 +4774,14 @@ threadRun(void *arg)
snprintf(tbuf, sizeof(tbuf), "%.1f s", total_run);
fprintf(stderr,
- "progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
- tbuf, tps, latency, stdev);
+ "progress: %s, %.1f tps, " INT64_FORMAT " serialization failures transactions, " INT64_FORMAT " deadlock failures transactions, lat %.3f ms stddev %.3f",
+ tbuf,
+ tps,
+ (cur.serialization_failures -
+ last.serialization_failures),
+ (cur.deadlock_failures - last.deadlock_failures),
+ latency,
+ stdev);
if (throttle_delay)
{
diff --git a/src/bin/pgbench/t/002_serialization_errors.pl b/src/bin/pgbench/t/002_serialization_errors.pl
new file mode 100644
index 0000000..8d0d99f
--- /dev/null
+++ b/src/bin/pgbench/t/002_serialization_errors.pl
@@ -0,0 +1,75 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 18;
+
+# Test concurrent update in table row with different transaction isolation
+# levels.
+my $node = get_new_node('main');
+$node->init;
+$node->start;
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2);');
+
+# Test serialization errors on transactions with Read committed isolation level
+my $script_read_committed = $node->basedir . '/pgbench_script_read_committed';
+append_to_file($script_read_committed,
+ "\\set delta random(-5000, 5000)\n"
+ . "BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;\n"
+ . "UPDATE xy SET y = y + :delta WHERE x = 1;\n"
+ . "END;\n");
+
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10 --file),
+ $script_read_committed ],
+ qr{processed: 50/50},
+ 'concurrent update: Read Committed: check processed transactions');
+
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10 --file),
+ $script_read_committed ],
+ qr{serialization failures: 0 \(0\.000 %\)},
+ 'concurrent update: Read Committed: check serialization failures');
+
+# Test serialization errors on transactions with Repeatable read isolation level
+my $script_repeatable_read = $node->basedir . '/pgbench_script_repeatable_read';
+append_to_file($script_repeatable_read,
+ "\\set delta random(-5000, 5000)\n"
+ . "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;\n"
+ . "UPDATE xy SET y = y + :delta WHERE x = 1;\n"
+ . "END;\n");
+
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10 --file),
+ $script_repeatable_read ],
+ qr{processed: 50/50},
+ 'concurrent update: Repeatable Read: check processed transactions');
+
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10 --file),
+ $script_repeatable_read ],
+ qr{serialization failures: [1-9]\d* \([1-9]\d*\.\d* %\)},
+ 'concurrent update: Repeatable Read: check serialization failures');
+
+# Test serialization errors on transactions with Serializable isolation level
+my $script_serializable = $node->basedir . '/pgbench_script_serializable';
+append_to_file($script_serializable,
+ "\\set delta random(-5000, 5000)\n"
+ . "BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;\n"
+ . "UPDATE xy SET y = y + :delta WHERE x = 1;\n"
+ . "END;\n");
+
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10 --file),
+ $script_serializable ],
+ qr{processed: 50/50},
+ 'concurrent update: Serializable: check processed transactions');
+
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10 --file),
+ $script_serializable ],
+ qr{serialization failures: [1-9]\d* \([1-9]\d*\.\d* %\)},
+ 'concurrent update: Serializable: check serialization failures');
diff --git a/src/bin/pgbench/t/003_deadlock_errors.pl b/src/bin/pgbench/t/003_deadlock_errors.pl
new file mode 100644
index 0000000..791d456
--- /dev/null
+++ b/src/bin/pgbench/t/003_deadlock_errors.pl
@@ -0,0 +1,93 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 18;
+
+# Test concurrent deadlock updates in table with different transaction isolation
+# levels.
+my $node = get_new_node('main');
+$node->init;
+$node->start;
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2), (2, 3);');
+
+# Test deadlock errors on transactions with Read committed isolation level
+my $script_read_committed = $node->basedir . '/pgbench_script_read_committed';
+append_to_file($script_read_committed,
+ "\\set delta1 random(-5000, 5000)\n"
+ . "\\set delta2 random(-5000, 5000)\n"
+ . "BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;\n"
+ . "UPDATE xy SET y = y + :delta1 WHERE x = 1;\n"
+ . "UPDATE xy SET y = y + :delta2 WHERE x = 2;\n"
+ . "END;\n"
+ . "BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;\n"
+ . "UPDATE xy SET y = y + :delta2 WHERE x = 2;\n"
+ . "UPDATE xy SET y = y + :delta1 WHERE x = 1;\n"
+ . "END;\n");
+
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10 --file),
+ $script_read_committed ],
+ qr{processed: 50/50},
+ 'concurrent deadlock update: Read Committed: check processed transactions');
+
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10 --file),
+ $script_read_committed ],
+ qr{deadlock failures: [1-9]\d* \([1-9]\d*\.\d* %\)},
+ 'concurrent deadlock update: Read Committed: check deadlock failures');
+
+# Test deadlock errors on transactions with Repeatable read isolation level
+my $script_repeatable_read = $node->basedir . '/pgbench_script_repeatable_read';
+append_to_file($script_repeatable_read,
+ "\\set delta1 random(-5000, 5000)\n"
+ . "\\set delta2 random(-5000, 5000)\n"
+ . "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;\n"
+ . "UPDATE xy SET y = y + :delta1 WHERE x = 1;\n"
+ . "UPDATE xy SET y = y + :delta2 WHERE x = 2;\n"
+ . "END;\n"
+ . "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;\n"
+ . "UPDATE xy SET y = y + :delta2 WHERE x = 2;\n"
+ . "UPDATE xy SET y = y + :delta1 WHERE x = 1;\n"
+ . "END;\n");
+
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10 --file),
+ $script_repeatable_read ],
+ qr{processed: 50/50},
+ 'concurrent deadlock update: Repeatable Read: check processed transactions');
+
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10 --file),
+ $script_repeatable_read ],
+ qr{deadlock failures: [1-9]\d* \([1-9]\d*\.\d* %\)},
+ 'concurrent deadlock update: Repeatable Read: check deadlock failures');
+
+# Test deadlock errors on transactions with Serializable isolation level
+my $script_serializable = $node->basedir . '/pgbench_script_serializable';
+append_to_file($script_serializable,
+ "\\set delta1 random(-5000, 5000)\n"
+ . "\\set delta2 random(-5000, 5000)\n"
+ . "BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;\n"
+ . "UPDATE xy SET y = y + :delta1 WHERE x = 1;\n"
+ . "UPDATE xy SET y = y + :delta2 WHERE x = 2;\n"
+ . "END;\n"
+ . "BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;\n"
+ . "UPDATE xy SET y = y + :delta2 WHERE x = 2;\n"
+ . "UPDATE xy SET y = y + :delta1 WHERE x = 1;\n"
+ . "END;\n");
+
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10 --file),
+ $script_serializable ],
+ qr{processed: 50/50},
+ 'concurrent update: Serializable: check processed transactions');
+
+$node->command_like(
+ [ qw(pgbench --no-vacuum --client=5 --transactions=10 --file),
+ $script_serializable ],
+ qr{deadlock failures: [1-9]\d* \([1-9]\d*\.\d* %\)},
+ 'concurrent update: Serializable: check deadlock failures');
\ No newline at end of file
--
1.9.1
On Wed, Jun 14, 2017 at 4:48 AM, Marina Polyakova
<m.polyakova@postgrespro.ru> wrote:
Now in pgbench we can test only transactions with Read Committed isolation
level because client sessions are disconnected forever on serialization
failures. There were some proposals and discussions about it (see message
here [1] and thread here [2]).
I suggest a patch where pgbench client sessions are not disconnected because
of serialization or deadlock failures and these failures are mentioned in
reports. In details:
- transaction with one of these failures continue run normally, but its
result is rolled back;
- if there were these failures during script execution this "transaction" is
marked
appropriately in logs;
- numbers of "transactions" with these failures are printed in progress, in
aggregation logs and in the end with other results (all and for each
script);
Advanced options:
- mostly for testing built-in scripts: you can set the default transaction
isolation level by the appropriate benchmarking option (-I);
- for more detailed reports: to know per-statement serialization and
deadlock failures you can use the appropriate benchmarking option
(--report-failures).
Also: TAP tests for new functionality and changed documentation with new
examples.
Patches are attached. Any suggestions are welcome!
Sounds like a good idea. Please add to the next CommitFest and review
somebody else's patch in exchange for having your own patch reviewed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Sounds like a good idea.
Thank you!
Please add to the next CommitFest
Done: https://commitfest.postgresql.org/14/1170/
and review
somebody else's patch in exchange for having your own patch reviewed.
Of course, I remember that.
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi,
On 2017-06-14 11:48:25 +0300, Marina Polyakova wrote:
Now in pgbench we can test only transactions with Read Committed isolation
level because client sessions are disconnected forever on serialization
failures. There were some proposals and discussions about it (see message
here [1] and thread here [2]).
I suggest a patch where pgbench client sessions are not disconnected because
of serialization or deadlock failures and these failures are mentioned in
reports.
I think that's a good idea and sorely needed.
In details:
- if there were these failures during script execution this "transaction" is
marked
appropriately in logs;
- numbers of "transactions" with these failures are printed in progress, in
aggregation logs and in the end with other results (all and for each
script);
I guess that'll include a 'rolled-back %' or 'retried %' somewhere?
Advanced options:
- mostly for testing built-in scripts: you can set the default transaction
isolation level by the appropriate benchmarking option (-I);
I'm less convinced of the need for that; you can already set arbitrary
connection options with
PGOPTIONS='-c default_transaction_isolation=serializable' pgbench
P.S. Does this use case (do not retry transaction with serialization or
deadlock failure) is most interesting or failed transactions should be
retried (and how much times if there seems to be no hope of success...)?
I can't quite parse that sentence, could you restate?
- Andres
On Thu, Jun 15, 2017 at 2:16 PM, Andres Freund <andres@anarazel.de> wrote:
On 2017-06-14 11:48:25 +0300, Marina Polyakova wrote:
I suggest a patch where pgbench client sessions are not disconnected because
of serialization or deadlock failures and these failures are mentioned in
reports.
I think that's a good idea and sorely needed.
+1
P.S. Does this use case (do not retry transaction with serialization or
deadlock failure) is most interesting or failed transactions should be
retried (and how much times if there seems to be no hope of success...)?
I can't quite parse that sentence, could you restate?
The way I read it was that the most interesting solution would retry
a transaction from the beginning on a serialization failure or
deadlock failure. Most people who use serializable transactions (at
least in my experience) run though a framework that does that
automatically, regardless of what client code initiated the
transaction. These retries are generally hidden from the client
code -- it just looks like the transaction took a bit longer.
Sometimes people will have a limit on the number of retries. I
never used such a limit and never had a problem, because our
implementation of serializable transactions will not throw a
serialization failure error until one of the transactions involved
in causing it has successfully committed -- meaning that the retry
can only hit this again on a *new* set of transactions.
Essentially, the transaction should only count toward the TPS rate
when it eventually completes without a serialization failure.
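The retry discipline Kevin describes can be sketched in C, the implementation language of pgbench. The helper below is hypothetical (not part of the patch), but the SQLSTATE constants match the ones the patch defines; only serialization and deadlock failures are treated as retryable:

```c
#include <stdbool.h>
#include <string.h>

/* SQLSTATE codes, matching the defines in the patch. */
#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
#define ERRCODE_T_R_DEADLOCK_DETECTED     "40P01"

/*
 * Hypothetical helper: a retry framework of the kind described above
 * re-runs the whole transaction only for these two transient error
 * classes; any other SQLSTATE is a real failure.
 */
static bool
is_retryable_sqlstate(const char *sqlstate)
{
	return sqlstate != NULL &&
		(strcmp(sqlstate, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0 ||
		 strcmp(sqlstate, ERRCODE_T_R_DEADLOCK_DETECTED) == 0);
}
```

A retrying client would then loop: run the script, and on a retryable SQLSTATE issue ROLLBACK and start the attempt over, counting the transaction toward the TPS rate only once it commits.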
Marina, did I understand you correctly?
--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Kevin Grittner wrote:
On Thu, Jun 15, 2017 at 2:16 PM, Andres Freund <andres@anarazel.de> wrote:
On 2017-06-14 11:48:25 +0300, Marina Polyakova wrote:
P.S. Does this use case (do not retry transaction with serialization or
deadlock failure) is most interesting or failed transactions should be
retried (and how much times if there seems to be no hope of success...)?I can't quite parse that sentence, could you restate?
The way I read it was that the most interesting solution would retry
a transaction from the beginning on a serialization failure or
deadlock failure.
As far as I understand her proposal, it is exactly the opposite -- if a
transaction fails, it is discarded. And this P.S. note is asking
whether this is a good idea, or would we prefer that failing
transactions are retried.
I think it's pretty obvious that transactions that failed with
some serializability problem should be retried.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jun 16, 2017 at 9:18 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Kevin Grittner wrote:
On Thu, Jun 15, 2017 at 2:16 PM, Andres Freund <andres@anarazel.de> wrote:
On 2017-06-14 11:48:25 +0300, Marina Polyakova wrote:
P.S. Does this use case (do not retry transaction with serialization or
deadlock failure) is most interesting or failed transactions should be
retried (and how much times if there seems to be no hope of success...)?
I can't quite parse that sentence, could you restate?
The way I read it was that the most interesting solution would retry
a transaction from the beginning on a serialization failure or
deadlock failure.
As far as I understand her proposal, it is exactly the opposite -- if a
transaction fails, it is discarded. And this P.S. note is asking
whether this is a good idea, or would we prefer that failing
transactions are retried.
I think it's pretty obvious that transactions that failed with
some serializability problem should be retried.
+1 for retry with reporting of retry rates
--
Thomas Munro
http://www.enterprisedb.com
On Thu, Jun 15, 2017 at 4:18 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Kevin Grittner wrote:
As far as I understand her proposal, it is exactly the opposite -- if a
transaction fails, it is discarded. And this P.S. note is asking
whether this is a good idea, or would we prefer that failing
transactions are retried.
I think it's pretty obvious that transactions that failed with
some serializability problem should be retried.
Agreed all around.
--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Hi,
Hello!
I think that's a good idea and sorely needed.
Thanks, I'm very glad to hear it!
- if there were these failures during script execution this
"transaction" is
marked
appropriately in logs;
- numbers of "transactions" with these failures are printed in
progress, in
aggregation logs and in the end with other results (all and for each
script);
I guess that'll include a 'rolled-back %' or 'retried %' somewhere?
Not exactly, see documentation:
+ If transaction has serialization / deadlock failure or them both
(last thing
+ is possible if used script contains several transactions; see
+ <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"> for more information), its
+ <replaceable>time</> will be reported as <literal>serialization
failure</> /
+ <literal>deadlock failure</> /
+ <literal>serialization and deadlock failures</> appropriately.
+ Example with serialization, deadlock and both these failures:
+<screen>
+1 128 24968 0 1496759158 426984
+0 129 serialization failure 0 1496759158 427023
+3 129 serialization failure 0 1496759158 432662
+2 128 serialization failure 0 1496759158 432765
+0 130 deadlock failure 0 1496759159 460070
+1 129 serialization failure 0 1496759160 485188
+2 129 serialization and deadlock failures 0 1496759160 485339
+4 130 serialization failure 0 1496759160 485465
+</screen>
From the proposals in the following messages of this thread, I understand
that the most interesting case is to retry the failed transaction. Do you
think it's better to write, for example, 'rolled-back after % retries
(serialization failure)' or 'time (retried % times, serialization and
deadlock failures)'?
Advanced options:
- mostly for testing built-in scripts: you can set the default
transaction
isolation level by the appropriate benchmarking option (-I);
I'm less convinced of the need for that; you can already set arbitrary
connection options with
PGOPTIONS='-c default_transaction_isolation=serializable' pgbench
Oh, thanks, I forgot about it =[
P.S. Does this use case (do not retry transaction with serialization
or
deadlock failure) is most interesting or failed transactions should be
retried (and how much times if there seems to be no hope of
success...)?
I can't quite parse that sentence, could you restate?
Álvaro Herrera, later in this thread, understood my text correctly:
As far as I understand her proposal, it is exactly the opposite -- if a
transaction fails, it is discarded. And this P.S. note is asking
whether this is a good idea, or would we prefer that failing
transactions are retried.
With his explanation, has my text become clearer?
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
P.S. Does this use case (do not retry transaction with serialization
or
deadlock failure) is most interesting or failed transactions should
be
retried (and how much times if there seems to be no hope of
success...)?
I can't quite parse that sentence, could you restate?
The way I read it was that the most interesting solution would retry
a transaction from the beginning on a serialization failure or
deadlock failure. Most people who use serializable transactions (at
least in my experience) run though a framework that does that
automatically, regardless of what client code initiated the
transaction. These retries are generally hidden from the client
code -- it just looks like the transaction took a bit longer.
Sometimes people will have a limit on the number of retries. I
never used such a limit and never had a problem, because our
implementation of serializable transactions will not throw a
serialization failure error until one of the transactions involved
in causing it has successfully committed -- meaning that the retry
can only hit this again on a *new* set of transactions.
Essentially, the transaction should only count toward the TPS rate
when it eventually completes without a serialization failure.
Marina, did I understand you correctly?
Álvaro Herrera, in the next message of this thread, understood my text
correctly:
As far as I understand her proposal, it is exactly the opposite -- if a
transaction fails, it is discarded. And this P.S. note is asking
whether this is a good idea, or would we prefer that failing
transactions are retried.
And thank you very much for your explanation of how and why transactions
with failures should be retried! I'll try to implement all of it.
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
P.S. Does this use case (do not retry transaction with serialization or
deadlock failure) is most interesting or failed transactions should be
retried (and how much times if there seems to be no hope of success...)?
I can't quite parse that sentence, could you restate?
The way I read it was that the most interesting solution would retry
a transaction from the beginning on a serialization failure or
deadlock failure.As far as I understand her proposal, it is exactly the opposite -- if a
transaction fails, it is discarded. And this P.S. note is asking
whether this is a good idea, or would we prefer that failing
transactions are retried.
Yes, that is what I meant, thank you!
I think it's pretty obvious that transactions that failed with
some serializability problem should be retried.
Thanks for your vote :)
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Fri, Jun 16, 2017 at 5:31 AM, Marina Polyakova
<m.polyakova@postgrespro.ru> wrote:
And thank you very much for your explanation of how and why transactions
with failures should be retried! I'll try to implement all of it.
To be clear, part of "retrying from the beginning" means that if a
result from one statement is used to determine the content of (or
whether to run) a subsequent statement, that first statement must be
run in the new transaction and the results evaluated again to
determine what to use for the later statement. You can't simply
replay the statements that were run during the first try. For
examples that help get a feel for why that is, see:
https://wiki.postgresql.org/wiki/SSI
--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
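Kevin's point, that a retry must re-execute the transaction logic rather than replay recorded statements, can be illustrated with a toy sketch in plain Python. No real database is involved; `SerializationFailure`, `fetch_balance` and `run_transaction` are invented stand-ins, not anything from pgbench:

```python
class SerializationFailure(Exception):
    """Stand-in for SQLSTATE 40001 reported by the server."""

def fetch_balance(attempt):
    # Stands in for a SELECT whose result can change between attempts,
    # because other transactions may commit in the meantime.
    return 100 if attempt == 0 else 250

def run_transaction(attempt):
    balance = fetch_balance(attempt)
    # The decision depends on the freshly read value, so it must be
    # re-evaluated on every attempt; replaying the statements recorded
    # during the first try would act on a stale balance.
    if attempt == 0:
        raise SerializationFailure
    return "large-withdrawal" if balance > 200 else "small-withdrawal"

def with_retries(max_attempts=3):
    for attempt in range(max_attempts):
        try:
            return run_transaction(attempt)
        except SerializationFailure:
            continue   # roll back and rerun the whole script
    return "failed"
```

Here the second attempt takes the other branch because the re-read balance differs, which is exactly what statement replay would miss.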
> To be clear, part of "retrying from the beginning" means that if a
> result from one statement is used to determine the content (or
> whether to run) a subsequent statement, that first statement must be
> run in the new transaction and the results evaluated again to
> determine what to use for the later statement. You can't simply
> replay the statements that were run during the first try. For
> examples, to help get a feel of why that is, see:
>
> https://wiki.postgresql.org/wiki/SSI

Thank you again! :))

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Marina,
A few comments about the submitted patches.
I agree that improving the error handling ability of pgbench is a good
thing, although I'm not sure about the implications...
About the "retry" discussion: I agree that retry is the relevant option
from an application point of view.
ISTM that the retry logic should be implemented somehow in the
automaton, restarting the same script from the beginning.
As pointed out in the discussion, the same values/commands should be
executed, which suggests that random generated values should be the same
on the retry runs, so that for a simple script the same operations are
attempted. This means that the random generator state must be kept &
reinstated for a client on retries. Currently the random state is in the
thread, which is not convenient for this purpose, so it should be moved into
the client so that it can be saved at transaction start and reinstated on
retries.
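As a rough illustration of this idea (plain Python, not pgbench's C code; the `Client` class and its method names are invented), the per-client generator state can be snapshotted at transaction start and reinstated on retry:

```python
import random

class Client:
    """Toy per-client state; in pgbench the random state would move
    from the thread into such a per-client structure."""
    def __init__(self, seed):
        self.rng = random.Random(seed)
        self._saved = None

    def transaction_begin(self):
        # snapshot the generator state at transaction start
        self._saved = self.rng.getstate()

    def transaction_retry(self):
        # reinstate it so the retry draws the same random values
        self.rng.setstate(self._saved)

    def random_account(self):
        return self.rng.randint(1, 100000)

client = Client(seed=42)
client.transaction_begin()
first_try = client.random_account()
client.transaction_retry()           # e.g. after a serialization failure
second_try = client.random_account()
assert first_try == second_try       # the same operations are attempted
```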
The number of retries and maybe failures should be counted, maybe with
some adjustable maximum, as suggested.
About 0001:
In accumStats, just use one level of if; the two levels bring nothing.
In doLog, added columns should be at the end of the format. The number of
columns MUST NOT change when different issues arise, so that it works well
with cut/... unix commands, so inserting a sentence such as "serialization
and deadlock failures" is a bad idea.
threadRun: the point of the progress format is to fit on one not too wide
line on a terminal and to allow some simple automatic processing. Adding a
verbose sentence in the middle of it is not the way to go.
About tests: I do not understand why test 003 includes 2 transactions.
It would seem more logical to have two scripts.
About 0003:
I'm not sure that there should be a new option to report failures, the
information when relevant should be integrated in a clean format into the
existing reports... Maybe the "per command latency" report/option should
be renamed if it becomes more general.
About 0004:
The documentation must not be in a separate patch, but in the same patch
as the corresponding code.
--
Fabien.
Hello Marina,
Hello, Fabien!
A few comments about the submitted patches.
Thank you very much for them!
I agree that improving the error handling ability of pgbench is a good
thing, although I'm not sure about the implications...
Could you explain a little more exactly what implications you are
worried about?
About the "retry" discussion: I agree that retry is the relevant
option from an application point of view.
I'm glad to hear it!
ISTM that the retry logic should be implemented somehow in
the automaton, restarting the same script from the beginning.

If there are several transactions in this script, don't you think we
should restart only the failed transaction?
As pointed out in the discussion, the same values/commands should be
executed, which suggests that random generated values should be the
same on the retry runs, so that for a simple script the same
operations are attempted. This means that the random generator state
must be kept & reinstated for a client on retries. Currently the
random state is in the thread, which is not convenient for this
purpose, so it should be moved into the client so that it can be saved
at transaction start and reinstated on retries.
I think about it in the same way =)
The number of retries and maybe failures should be counted, maybe with
some adjustable maximum, as suggested.
If we fix the maximum number of attempts, the maximum number of failures
for one script execution is bounded above by
(number_of_transactions_in_script * maximum_number_of_attempts). Do you
think we should add a program option to limit this number further?
About 0001:
In accumStats, just use one level if, the two levels bring nothing.
Thanks, I agree =[
In doLog, added columns should be at the end of the format.
I inserted them earlier because these columns are not optional. Do
you think they should be optional?
The number
of column MUST NOT change when different issues arise, so that it
works well with cut/... unix commands, so inserting a sentence such as
"serialization and deadlock failures" is a bad idea.
Thanks, I agree again.
threadRun: the point of the progress format is to fit on one not too
wide line on a terminal and to allow some simple automatic processing.
Adding a verbose sentence in the middle of it is not the way to go.
I was thinking about it.. Thanks, I'll try to make it shorter.
About tests: I do not understand why test 003 includes 2 transactions.
It would seem more logical to have two scripts.
Ok!
About 0003:
I'm not sure that there should be an new option to report failures,
the information when relevant should be integrated in a clean format
into the existing reports... Maybe the "per command latency"
report/option should be renamed if it becomes more general.
I have tried not to change other parts of the program as much as possible.
But if you think that it will be more useful to change the option, I'll
do it.
About 0004:
The documentation must not be in a separate patch, but in the same
patch as the corresponding code.
Ok!
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Marina,
I agree that improving the error handling ability of pgbench is a good
thing, although I'm not sure about the implications...

Could you explain a little more exactly what implications you are worried
about?
The current error handling is either "close connection" or maybe in some
cases even "exit". If this is changed, then the client may continue
execution in some unforeseen state and behave unexpectedly. We'll see.
ISTM that the retry implementation should be implemented somehow in
the automaton, restarting the same script from the beginning.

If there are several transactions in this script, don't you think we
should restart only the failed transaction?
Only on some transaction failures, depending on their status. My point is
that the retry process must be implemented cleanly with a new state in the
client automaton. Exactly when the transition to this new state must be
taken is another issue.
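As a sketch of what such a new automaton state might look like (a toy Python state machine, not pgbench's actual CState enum; all names here are invented for illustration):

```python
from enum import Enum, auto

class State(Enum):
    START_TX = auto()
    EXECUTE = auto()
    RETRY = auto()      # the proposed new state
    DONE = auto()       # transaction finished (committed or given up)

def step(state, failed, attempts, max_attempts):
    """One transition of a toy client automaton."""
    if state is State.START_TX:
        return State.EXECUTE
    if state is State.EXECUTE:
        if not failed:
            return State.DONE
        # enter RETRY only while attempts remain; otherwise the failure
        # is counted and the transaction ends
        return State.RETRY if attempts < max_attempts else State.DONE
    if state is State.RETRY:
        return State.START_TX   # roll back and restart from the beginning
    return state
```

When exactly the EXECUTE-to-RETRY transition fires (i.e., which failure statuses qualify) is the separate issue Fabien mentions.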
The number of retries and maybe failures should be counted, maybe with
some adjustable maximum, as suggested.

If we fix the maximum number of attempts, the maximum number of failures
for one script execution is bounded above by
(number_of_transactions_in_script * maximum_number_of_attempts). Do you
think we should add a program option to limit this number further?
Probably not. I think that there should be a configurable maximum of
retries on a transaction, which may be 0 by default if we want to be
upward compatible with the current behavior, or maybe something else.
In doLog, added columns should be at the end of the format.

I inserted them earlier because these columns are not optional. Do you
think they should be optional?
I think that new non-optional columns should be at the end of the
existing non-optional columns so that existing scripts which may process
the output do not need to be updated.
I'm not sure that there should be a new option to report failures,
the information when relevant should be integrated in a clean format
into the existing reports... Maybe the "per command latency"
report/option should be renamed if it becomes more general.

I have tried not to change other parts of the program as much as possible.
But if you think that it will be more useful to change the option, I'll
do it.
I think that the option should change if its naming becomes less relevant,
which is to be determined. AFAICS, ISTM that new measures should be added
to the various existing reports unconditionally (i.e. without a new
option), so maybe no new option would be needed.
--
Fabien.
The current error handling is either "close connection" or maybe in
some cases even "exit". If this is changed, then the client may
continue execution in some unforeseen state and behave unexpectedly.
We'll see.
Thanks, now I understand this.
ISTM that the retry implementation should be implemented somehow in
the automaton, restarting the same script from the beginning.

If there are several transactions in this script - don't you think
that we should restart only the failed transaction?..

On some transaction failures based on their status. My point is that
the retry process must be implemented clearly with a new state in the
client automaton. Exactly when the transition to this new state must
be taken is another issue.
About it, I agree with you that it should be done in this way.
The number of retries and maybe failures should be counted, maybe with
some adjustable maximum, as suggested.

If we fix the maximum number of attempts the maximum number of
failures for one script execution will be bounded above
(number_of_transactions_in_script * maximum_number_of_attempts). Do
you think we should make the option in program to limit this number
much more?

Probably not. I think that there should be a configurable maximum of
retries on a transaction, which may be 0 by default if we want to be
upward compatible with the current behavior, or maybe something else.
I propose the option --max-attempts-number=NUM, where NUM cannot be less
than 1. I propose it because I think that, for example,
--max-attempts-number=100 is better than --max-retries-number=99. And
maybe it's better to set its default value to 1 too, because retrying
shell commands can produce new errors..
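The arithmetic behind counting attempts rather than retries can be sketched as follows (a hypothetical helper in plain Python; `max_attempts` mirrors the proposed --max-attempts-number semantics, where NUM >= 1 and 1 means no retries):

```python
def run_with_attempts(tries_needed, max_attempts):
    """Return (succeeded, attempts_used) for a transaction that would
    only succeed on its tries_needed-th attempt."""
    assert max_attempts >= 1          # NUM cannot be less than 1
    for attempt in range(1, max_attempts + 1):
        if attempt >= tries_needed:
            return True, attempt
    return False, max_attempts

# --max-attempts-number=1 keeps today's behavior: one try, no retries.
assert run_with_attempts(2, 1) == (False, 1)
# --max-attempts-number=100 allows up to 99 retries.
assert run_with_attempts(100, 100) == (True, 100)
```

Whether one counts attempts (starting at 1) or retries (starting at 0) is the naming question being debated; the two differ by exactly one.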
In doLog, added columns should be at the end of the format.

I have inserted it earlier because these columns are not optional. Do
you think they should be optional?

I think that new non-optional columns should be at the end of the
existing non-optional columns so that existing scripts which may
process the output do not need to be updated.
Thanks, I agree with you :)
I'm not sure that there should be a new option to report failures,
the information when relevant should be integrated in a clean format
into the existing reports... Maybe the "per command latency"
report/option should be renamed if it becomes more general.

I have tried not to change other parts of the program as much as
possible. But if you think that it will be more useful to change the
option, I'll do it.

I think that the option should change if its naming becomes less
relevant, which is to be determined. AFAICS, ISTM that new measures
should be added to the various existing reports unconditionally (i.e.
without a new option), so maybe no new option would be needed.
Thanks! I didn't think about it in this way..
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
The number of retries and maybe failures should be counted, maybe with
some adjustable maximum, as suggested.

If we fix the maximum number of attempts the maximum number of failures
for one script execution will be bounded above
(number_of_transactions_in_script * maximum_number_of_attempts). Do you
think we should make the option in program to limit this number much more?

Probably not. I think that there should be a configurable maximum of
retries on a transaction, which may be 0 by default if we want to be
upward compatible with the current behavior, or maybe something else.

I propose the option --max-attempts-number=NUM which NUM cannot be less than
1. I propose it because I think that, for example, --max-attempts-number=100
is better than --max-retries-number=99. And maybe it's better to set its
default value to 1 too because retrying of shell commands can produce new
errors..
Personally, I like counting retries because it also counts the number of
times the transaction actually failed for some reason. But this is a
marginal preference, and one can be switched to the other easily.
--
Fabien.
On Thu, Jun 15, 2017 at 10:16 PM, Andres Freund <andres@anarazel.de> wrote:
On 2017-06-14 11:48:25 +0300, Marina Polyakova wrote:
Advanced options:
- mostly for testing built-in scripts: you can set the default transaction
isolation level by the appropriate benchmarking option (-I);
I'm less convinced of the need of that, you can already set arbitrary
connection options with
PGOPTIONS='-c default_transaction_isolation=serializable' pgbench
Right, there is already a way to specify the default isolation level using
environment variables.
However, once we make pgbench work with various isolation levels, users may
want to run pgbench multiple times in a row with different isolation
levels. A command line option would be very convenient in this case.
In addition, the isolation level is a vital parameter for interpreting
benchmark results correctly. Often, graphs with pgbench results are titled
with the pgbench command line. Having the isolation level specified in the
command line would naturally fit into this titling scheme.
Of course, this is solely a usability question, and it's fair enough to live
without such a command line option. But I'm +1 for adding this option.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello everyone!
Here is the second version of my patch for pgbench. Now transactions
with serialization and deadlock failures are rolled back and retried
until they end successfully or their number of attempts reaches the maximum.
In details:
- You can set the maximum number of attempts by the appropriate
benchmarking option (--max-attempts-number). Its default value is 1
partly because retrying of shell commands can produce new errors.
- Statistics of attempts and failures are printed in the progress reports,
in the transaction/aggregation logs, and at the end with the other results
(overall and for each script). A transaction failure is reported here only
if the last retry of this transaction fails.
- Also, failures and the average number of transaction attempts are printed
per command along with average latencies if you use the appropriate
benchmarking option (--report-per-command, -r), which replaces the option
--report-latencies as I was advised here [1]. Average numbers of
transaction attempts are printed only for commands which start
transactions.
As usual: TAP tests for new functionality and changed documentation with
new examples.
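As an aside, the per-script attempts average and standard deviation that such a report prints could be aggregated roughly like this (a hypothetical sketch in plain Python, not the patch's actual C code):

```python
import math

def attempts_stats(attempts_per_transaction):
    """Mean and population standard deviation of per-transaction
    attempt counts."""
    n = len(attempts_per_transaction)
    mean = sum(attempts_per_transaction) / n
    var = sum((a - mean) ** 2 for a in attempts_per_transaction) / n
    return mean, math.sqrt(var)

# When every transaction commits on its first try, this yields the
# "attempts number average = 1.00 / stddev = 0.00" lines shown in the
# patched pgbench output.
assert attempts_stats([1, 1, 1, 1]) == (1.0, 0.0)
```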
Patch is attached. Any suggestions are welcome!
[1]: /messages/by-id/alpine.DEB.2.20.1707031321370.3419@lancre
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v2-0001-Pgbench-Retry-transactions-with-serialization-or-.patch (text/x-diff)
From 58f51cdc896af801bcd35e495406655ca03aa6ce Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Mon, 10 Jul 2017 13:33:41 +0300
Subject: [PATCH v2] Pgbench Retry transactions with serialization or deadlock
errors
Now transactions with serialization or deadlock failures can be rolled back and
retried again and again until they end successfully or their number of attempts
reaches maximum. You can set the maximum number of attempts by the appropriate
benchmarking option (--max-attempts-number). Its default value is 1. Statistics
of attempts and failures is printed in progress, in transaction / aggregation
logs and in the end with other results (all and for each script). The
transaction failure is reported here only if the last retry of this transaction
fails. Also failures and average numbers of transactions attempts are printed
per-command with average latencies if you use the appropriate benchmarking
option (--report-per-command, -r). Average numbers of transactions attempts are
printed only for commands which start transactions.
---
doc/src/sgml/ref/pgbench.sgml | 277 ++++++--
src/bin/pgbench/pgbench.c | 751 ++++++++++++++++++---
src/bin/pgbench/t/002_serialization_errors.pl | 121 ++++
src/bin/pgbench/t/003_deadlock_errors.pl | 130 ++++
src/bin/pgbench/t/004_retry_failed_transactions.pl | 280 ++++++++
5 files changed, 1421 insertions(+), 138 deletions(-)
create mode 100644 src/bin/pgbench/t/002_serialization_errors.pl
create mode 100644 src/bin/pgbench/t/003_deadlock_errors.pl
create mode 100644 src/bin/pgbench/t/004_retry_failed_transactions.pl
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 64b043b..dc1daa9 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -49,22 +49,34 @@
<screen>
transaction type: <builtin: TPC-B (sort of)>
+transaction maximum attempts number: 1
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+number of transactions with serialization failures: 0 (0.000 %)
+number of transactions with deadlock failures: 0 (0.000 %)
+attempts number average = 1.00
+attempts number stddev = 0.00
tps = 85.184871 (including connections establishing)
tps = 85.296346 (excluding connections establishing)
</screen>
- The first six lines report some of the most important parameter
+ The first seven lines report some of the most important parameter
settings. The next line reports the number of transactions completed
and intended (the latter being just the product of number of clients
and number of transactions per client); these will be equal unless the run
failed before completion. (In <option>-T</> mode, only the actual
number of transactions is printed.)
+ The next four lines report the number of transactions with serialization and
+ deadlock failures, and also the statistics of transactions attempts. With
+ such errors, transactions are rolled back and are repeated again and again
+ until they end sucessufully or their number of attempts reaches maximum (to
+ change this maximum see the appropriate benchmarking option
+ <option>--max-attempts-number</>). The transaction failure is reported here
+ only if the last retry of this transaction fails.
The last two lines report the number of transactions per second,
figured with and without counting the time to start database sessions.
</para>
@@ -434,24 +446,28 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
<listitem>
<para>
Show progress report every <replaceable>sec</> seconds. The report
- includes the time since the beginning of the run, the tps since the
- last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</>),
- the latency is computed with respect to the transaction scheduled
- start time, not the actual transaction beginning time, thus it also
- includes the average schedule lag time.
+ includes the time since the beginning of the run and the following
+ statistics since the last report: the tps, the transaction latency
+ average and standard deviation, the number of transactions with
+ serialization and deadlock failures, and the average number of
+ transactions attempts and its standard deviation. Under throttling
+ (<option>-R</>), the latency is computed with respect to the transaction
+ scheduled start time, not the actual transaction beginning time, thus it
+ also includes the average schedule lag time.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-r</option></term>
- <term><option>--report-latencies</option></term>
+ <term><option>--report-per-command</option></term>
<listitem>
<para>
- Report the average per-statement latency (execution time from the
- perspective of the client) of each command after the benchmark
- finishes. See below for details.
+ Report the following statistics for each command after the benchmark
+ finishes: the average per-statement latency (execution time from the
+ perspective of the client), the number of serialization and deadlock
+ failures, and the average number of transactions attempts (only for
+ commands that start transactions). See below for details.
</para>
</listitem>
</varlistentry>
@@ -496,6 +512,15 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
</para>
<para>
+ Transactions with serialization or deadlock failures (or with both
+ of them if used script contains several transactions; see
+ <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"> for more information) are
+ marked separately and their time is not reported as for skipped
+ transactions.
+ </para>
+
+ <para>
A high schedule lag time is an indication that the system cannot
process transactions at the specified rate, with the chosen number of
clients and threads. When the average transaction execution time is
@@ -590,6 +615,23 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
</varlistentry>
<varlistentry>
+ <term><option>--max-attempts-number=<replaceable>attempts_number</></option></term>
+ <listitem>
+ <para>
+ Set the maximum attempts number for transactions. Default is 1.
+ </para>
+ <note>
+ <para>
+ Be careful if you want to repeat transactions with shell commands
+ inside. Unlike sql commands the result of shell command is not rolled
+ back except for its variable value. If a shell command fails its
+ client is aborted without restarting.
+ </para>
+ </note>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--progress-timestamp</option></term>
<listitem>
<para>
@@ -693,8 +735,8 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
<refsect1>
<title>Notes</title>
- <refsect2>
- <title>What is the <quote>Transaction</> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts">
+ <title id="transactions-and-scripts-title">What is the <quote>Transaction</> Actually Performed in <application>pgbench</application>?</title>
<para>
<application>pgbench</> executes test scripts chosen randomly
@@ -1148,7 +1190,7 @@ END;
The format of the log is:
<synopsis>
-<replaceable>client_id</> <replaceable>transaction_no</> <replaceable>time</> <replaceable>script_no</> <replaceable>time_epoch</> <replaceable>time_us</> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</> <replaceable>transaction_no</> <replaceable>time</> <replaceable>script_no</> <replaceable>time_epoch</> <replaceable>time_us</> <replaceable>average_attempts_number</> <optional> <replaceable>schedule_lag</replaceable> </optional>
</synopsis>
where
@@ -1158,39 +1200,53 @@ END;
<replaceable>time</> is the total elapsed transaction time in microseconds,
<replaceable>script_no</> identifies which script file was used (useful when
multiple scripts were specified with <option>-f</> or <option>-b</>),
- and <replaceable>time_epoch</>/<replaceable>time_us</> are a
+ <replaceable>time_epoch</>/<replaceable>time_us</> are a
Unix-epoch time stamp and an offset
in microseconds (suitable for creating an ISO 8601
time stamp with fractional seconds) showing when
- the transaction completed.
+ the transaction completed,
+ and <replaceable>average_attempts_number</> is the average number of
+ transactions attempts during the current script execution.
The <replaceable>schedule_lag</> field is the difference between the
transaction's scheduled start time, and the time it actually started, in
microseconds. It is only present when the <option>--rate</> option is used.
When both <option>--rate</> and <option>--latency-limit</> are used,
the <replaceable>time</> for a skipped transaction will be reported as
<literal>skipped</>.
+ If a transaction has serialization and/or deadlock failures, its
+ <replaceable>time</> will be reported as <literal>serialization failure</>,
+ <literal>deadlock failure</>, or
+ <literal>serialization and deadlock failures</>, respectively.
</para>
+ <note>
+ <para>
+ Transactions can have both serialization and deadlock failures if the
+ used script contained several transactions. See
+ <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"> for more information.
+ </para>
+ </note>
<para>
Here is a snippet of a log file generated in a single-client run:
<screen>
-0 199 2241 0 1175850568 995598
-0 200 2465 0 1175850568 998079
-0 201 2513 0 1175850569 608
-0 202 2038 0 1175850569 2663
+0 199 2241 0 1175850568 995598 1
+0 200 2465 0 1175850568 998079 1
+0 201 2513 0 1175850569 608 1
+0 202 2038 0 1175850569 2663 1
</screen>
Another example with <literal>--rate=100</>
and <literal>--latency-limit=5</> (note the additional
<replaceable>schedule_lag</> column):
<screen>
-0 81 4621 0 1412881037 912698 3005
-0 82 6173 0 1412881037 914578 4304
-0 83 skipped 0 1412881037 914578 5217
-0 83 skipped 0 1412881037 914578 5099
-0 83 4722 0 1412881037 916203 3108
-0 84 4142 0 1412881037 918023 2333
-0 85 2465 0 1412881037 919759 740
+0 81 4621 0 1412881037 912698 1 3005
+0 82 6173 0 1412881037 914578 1 4304
+0 83 skipped 0 1412881037 914578 1 5217
+0 83 skipped 0 1412881037 914578 1 5099
+0 83 4722 0 1412881037 916203 1 3108
+0 84 4142 0 1412881037 918023 1 2333
+0 85 2465 0 1412881037 919759 1 740
</screen>
In this example, transaction 82 was late, because its latency (6.173 ms) was
over the 5 ms limit. The next two transactions were skipped, because they
@@ -1198,6 +1254,22 @@ END;
</para>
<para>
+ Example with serialization failures (the maximum number of attempts is 10):
+<screen>
+3 0 47423 0 1499414498 34501 4
+3 1 8333 0 1499414498 42848 1
+3 2 8358 0 1499414498 51219 1
+4 0 72345 0 1499414498 59433 7
+1 3 41718 0 1499414498 67879 5
+1 4 8416 0 1499414498 76311 1
+3 3 33235 0 1499414498 84469 4
+0 0 serialization_failure 0 1499414498 84905 10
+2 0 serialization_failure 0 1499414498 86248 10
+3 4 8307 0 1499414498 92788 1
+</screen>
+ </para>
+
+ <para>
When running a long test on hardware that can handle a lot of transactions,
the log files can become very large. The <option>--sampling-rate</> option
can be used to log only a random sample of transactions.
@@ -1212,7 +1284,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</> <replaceable>num_transactions</> <replaceable>sum_latency</> <replaceable>sum_latency_2</> <replaceable>min_latency</> <replaceable>max_latency</> <optional> <replaceable>sum_lag</> <replaceable>sum_lag_2</> <replaceable>min_lag</> <replaceable>max_lag</> <optional> <replaceable>skipped</> </optional> </optional>
+<replaceable>interval_start</> <replaceable>num_transactions</> <replaceable>sum_latency</> <replaceable>sum_latency_2</> <replaceable>min_latency</> <replaceable>max_latency</> <replaceable>num_serialization_failures_transactions</> <replaceable>num_deadlock_failures_transactions</> <replaceable>attempts_count</> <replaceable>attempts_sum</> <replaceable>attempts_sum2</> <replaceable>attempts_min</> <replaceable>attempts_max</> <optional> <replaceable>sum_lag</> <replaceable>sum_lag_2</> <replaceable>min_lag</> <replaceable>max_lag</> <optional> <replaceable>skipped</> </optional> </optional>
</synopsis>
where
@@ -1226,7 +1298,14 @@ END;
transaction latencies within the interval,
<replaceable>min_latency</> is the minimum latency within the interval,
and
- <replaceable>max_latency</> is the maximum latency within the interval.
+     <replaceable>max_latency</> is the maximum latency within the interval,
+     <replaceable>num_serialization_failures_transactions</> and
+     <replaceable>num_deadlock_failures_transactions</> are the numbers of
+     transactions with the corresponding failures within the interval, and
+     <replaceable>attempts_count</>, <replaceable>attempts_sum</>,
+     <replaceable>attempts_sum2</>, <replaceable>attempts_min</> and
+     <replaceable>attempts_max</> are the statistics of transaction attempts
+     within the interval.
The next fields,
<replaceable>sum_lag</>, <replaceable>sum_lag_2</>, <replaceable>min_lag</>,
and <replaceable>max_lag</>, are only present if the <option>--rate</>
@@ -1241,14 +1320,23 @@ END;
Each transaction is counted in the interval when it was committed.
</para>
+ <note>
+ <para>
+     The number of transaction attempts within the interval can be greater than
+     the number of transactions within this interval multiplied by the maximum
+     number of attempts.  See <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"> for more information.
+ </para>
+ </note>
+
<para>
Here is some example output:
<screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0 0 5601 5601 5601 1 1
+1345828503 7884 1979812 565806736 60 1479 0 0 7884 7884 7884 1 1
+1345828505 7208 1979422 567277552 59 1391 0 0 7208 7208 7208 1 1
+1345828507 7685 1980268 569784714 60 1398 0 0 7685 7685 7685 1 1
+1345828509 7073 1979779 573489941 236 1411 0 0 7073 7073 7073 1 1
</screen></para>
<para>
@@ -1260,13 +1348,41 @@ END;
</refsect2>
<refsect2>
- <title>Per-Statement Latencies</title>
+ <title>Per-Statement Report</title>
<para>
- With the <option>-r</> option, <application>pgbench</> collects
- the elapsed transaction time of each statement executed by every
- client. It then reports an average of those values, referred to
- as the latency for each statement, after the benchmark has finished.
+ With the <option>-r</> option, <application>pgbench</> collects the following
+ statistics for each statement:
+ <itemizedlist>
+ <listitem>
+ <para>
+ the elapsed transaction time of each statement; <application>pgbench</>
+ reports an average of those values, referred to as the latency for each
+ statement;
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ the number of serialization and deadlock failures;
+ </para>
+ <note>
+ <para>The total sum of per-command failures of each type can be greater
+ than the number of transactions with reported failures.
+ See <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"> for more information.
+ </para>
+ </note>
+ </listitem>
+ <listitem>
+ <para>
+       the average number of transaction attempts, reported for each command
+       that starts a transaction;
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ All values are computed for each statement executed by every client and are
+ reported after the benchmark has finished.
</para>
<para>
@@ -1274,35 +1390,90 @@ END;
<screen>
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
+transaction maximum attempts number: 1
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+number of transactions with serialization failures: 0 (0.000 %)
+number of transactions with deadlock failures: 0 (0.000 %)
+attempts number average = 1.00
+attempts number stddev = 0.00
latency average = 15.844 ms
latency stddev = 2.715 ms
tps = 618.764555 (including connections establishing)
tps = 622.977698 (excluding connections establishing)
script statistics:
- - statement latencies in milliseconds:
- 0.002 \set aid random(1, 100000 * :scale)
- 0.005 \set bid random(1, 1 * :scale)
- 0.002 \set tid random(1, 10 * :scale)
- 0.001 \set delta random(-5000, 5000)
- 0.326 BEGIN;
- 0.603 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
- 0.454 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
- 5.528 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
- 7.335 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
- 0.371 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
- 1.212 END;
+ - statement latencies in milliseconds, serialization & deadlock failures,
+           numbers of transaction attempts:
+ 0.002 0 0 - \set aid random(1, 100000 * :scale)
+ 0.005 0 0 - \set bid random(1, 1 * :scale)
+ 0.002 0 0 - \set tid random(1, 10 * :scale)
+ 0.001 0 0 - \set delta random(-5000, 5000)
+ 0.326 0 0 1.00 BEGIN;
+ 0.603 0 0 - UPDATE pgbench_accounts
+ SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.454 0 0 - SELECT abalance FROM pgbench_accounts
+ WHERE aid = :aid;
+ 5.528 0 0 - UPDATE pgbench_tellers
+ SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 7.335 0 0 - UPDATE pgbench_branches
+ SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.371 0 0 - INSERT INTO pgbench_history
+ (tid, bid, aid, delta, mtime)
+ VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.212 0 0 - END;
+</screen>
+
+   Another example of output for the default script using the serializable
+   default transaction isolation level (<command>PGOPTIONS='-c
+   default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: <builtin: TPC-B (sort of)>
+transaction maximum attempts number: 100
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+number of transactions per client: 1000
+number of transactions actually processed: 10000/10000
+number of transactions with serialization failures: 3599 (35.990 %)
+number of transactions with deadlock failures: 0 (0.000 %)
+attempts number average = 47.54
+attempts number stddev = 44.04
+latency average = 235.795 ms
+latency stddev = 408.854 ms
+tps = 26.694245 (including connections establishing)
+tps = 26.697308 (excluding connections establishing)
+script statistics:
+ - statement latencies in milliseconds, serialization & deadlock failures,
+           numbers of transaction attempts:
+ 0.003 0 0 - \set aid random(1, 100000 * :scale)
+ 0.001 0 0 - \set bid random(1, 1 * :scale)
+ 0.001 0 0 - \set tid random(1, 10 * :scale)
+ 0.000 0 0 - \set delta random(-5000, 5000)
+ 4.626 0 0 47.54 BEGIN;
+ 1.165 0 0 - UPDATE pgbench_accounts
+ SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.870 0 0 - SELECT abalance FROM pgbench_accounts
+ WHERE aid = :aid;
+ 1.060 456156 0 - UPDATE pgbench_tellers
+ SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 0.883 12826 0 - UPDATE pgbench_branches
+ SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 1.052 0 0 - INSERT INTO pgbench_history
+ (tid, bid, aid, delta, mtime)
+ VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 4.866 36 0 - END;
</screen>
</para>
<para>
- If multiple script files are specified, the averages are reported
- separately for each script file.
+ If multiple script files are specified, the averages and the failures are
+ reported separately for each script file.
</para>
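For reference, the failure percentages in the reports above are simply the per-type failed transaction counts over the total number of processed transactions:

```c
/* Percentage of failed transactions, as printed in the final report,
 * e.g. 3599 of 10000 gives 35.990 %. */
static double
failure_percent(long failed, long total)
{
	return (total == 0) ? 0.0 : 100.0 * (double) failed / (double) total;
}
```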
<para>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4d364a1..2e84d34 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -58,6 +58,8 @@
#include "pgbench.h"
+#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"
#define ERRCODE_UNDEFINED_TABLE "42P01"
/*
@@ -174,8 +176,12 @@ bool progress_timestamp = false; /* progress report with Unix time */
int nclients = 1; /* number of clients */
int nthreads = 1; /* number of threads */
bool is_connect; /* establish connection for each transaction */
-bool is_latencies; /* report per-command latencies */
+bool report_per_command = false; /* report per-command latencies,
+ * failures and attempts */
int main_pid; /* main process id used in log filename */
+int max_attempts_number = 1; /* maximum number of attempts to run the
+ * transaction with serialization or
+ * deadlock failures */
char *pghost = "";
char *pgport = "";
@@ -232,11 +238,35 @@ typedef struct StatsData
int64 cnt; /* number of transactions */
int64 skipped; /* number of transactions skipped under --rate
* and --latency-limit */
+ int64 serialization_failures; /* number of transactions with
+ * serialization failures */
+ int64 deadlock_failures; /* number of transactions with deadlock
+ * failures */
+ SimpleStats attempts;
SimpleStats latency;
SimpleStats lag;
} StatsData;
/*
+ * Data structure for repeating a transaction from the beginning with the same
+ * parameters.
+ */
+typedef struct LastBeginState
+{
+ int command; /* command number in script */
+ int attempts_number; /* how many times have we tried to run the
+ * transaction without serialization or
+ * deadlock failures */
+
+ unsigned short random_state[3]; /* random seed */
+
+ /* client variables */
+ Variable *variables; /* array of variable definitions */
+ int nvariables; /* number of variables */
+ bool vars_sorted; /* are variables sorted by name? */
+} LastBeginState;
+
+/*
* Connection state machine states.
*/
typedef enum
@@ -287,6 +317,20 @@ typedef enum
CSTATE_END_COMMAND,
/*
+ * States for transactions with serialization or deadlock failures.
+ *
+	 * First, report the failure in CSTATE_FAILURE.  Then, if the failed
+	 * transaction block should be ended, go through CSTATE_START_COMMAND ->
+	 * CSTATE_WAIT_RESULT -> CSTATE_END_COMMAND with the appropriate command.
+	 * After that, if the failed transaction can be repeated, go to
+	 * CSTATE_RETRY_FAILED_TRANSACTION to restore the same parameters for the
+	 * transaction execution as in the previous attempts.  Otherwise go to the
+	 * next command after the failed transaction.
+ */
+ CSTATE_FAILURE,
+ CSTATE_RETRY_FAILED_TRANSACTION,
+
+ /*
* CSTATE_END_TX performs end-of-transaction processing. Calculates
* latency, and logs the transaction. In --connect mode, closes the
* current connection. Chooses the next script to execute and starts over
@@ -311,6 +355,7 @@ typedef struct
PGconn *con; /* connection handle to DB */
int id; /* client No. */
ConnectionStateEnum state; /* state machine's current state. */
+ unsigned short random_state[3]; /* separate randomness for each client */
int use_file; /* index in sql_script for this client */
int command; /* command number in script */
@@ -328,6 +373,17 @@ typedef struct
bool prepared[MAX_SCRIPTS]; /* whether client prepared the script */
+ bool serialization_failure; /* if there was serialization failure
+ * during script execution */
+ bool deadlock_failure; /* if there was deadlock failure during
+ * script execution */
+
+ /* for repeating transactions with serialization or deadlock failures: */
+ LastBeginState *last_begin_state;
+ bool end_failed_transaction_block; /* are we ending the failed
+ * transaction block? */
+ SimpleStats attempts;
+
/* per client collected stats */
int64 cnt; /* transaction count */
int ecnt; /* error count */
@@ -342,7 +398,6 @@ typedef struct
pthread_t thread; /* thread handle */
CState *state; /* array of CState */
int nstate; /* length of state[] */
- unsigned short random_state[3]; /* separate randomness for each thread */
int64 throttle_trigger; /* previous/next throttling (us) */
FILE *logfile; /* where to log, or NULL */
@@ -382,6 +437,16 @@ typedef struct
char *argv[MAX_ARGS]; /* command word list */
PgBenchExpr *expr; /* parsed expression, if needed */
SimpleStats stats; /* time spent in this command */
+ int64 serialization_failures; /* number of serialization failures in
+ * this command */
+ int64 deadlock_failures; /* number of deadlock failures in this
+ * command */
+ SimpleStats attempts; /* is valid if command starts a transaction */
+
+ /* for repeating transactions with serialization and deadlock failures: */
+ bool is_transaction_begin; /* do we start a transaction? */
+	int			transaction_end;	/* number of the command that ends the
+									 * transaction starting at this command */
} Command;
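The new transaction_end field has to be filled in when the script is parsed; the patch's actual detection logic is outside this hunk, but a hypothetical sketch of pairing a transaction-starting command with the command that ends its block could look like:

```c
#include <string.h>

/* Hypothetical sketch: find the command that ends the transaction block
 * started at command 'begin', as the new Command.transaction_end records.
 * Returns -1 when the transaction has no explicit end. */
static int
find_transaction_end(const char *commands[], int ncommands, int begin)
{
	int			i;

	for (i = begin + 1; i < ncommands; i++)
	{
		if (strcmp(commands[i], "END;") == 0 ||
			strcmp(commands[i], "COMMIT;") == 0 ||
			strcmp(commands[i], "ROLLBACK;") == 0)
			return i;
	}
	return -1;
}
```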
typedef struct ParsedScript
@@ -504,7 +569,7 @@ usage(void)
" protocol for submitting queries (default: simple)\n"
" -n, --no-vacuum do not run VACUUM before tests\n"
" -P, --progress=NUM show thread progress report every NUM seconds\n"
- " -r, --report-latencies report average latency per command\n"
+ " -r, --report-per-command report latencies, failures and attempts per command\n"
" -R, --rate=NUM target rate in transactions per second\n"
" -s, --scale=NUM report this scale factor in output\n"
" -t, --transactions=NUM number of transactions each client runs (default: 10)\n"
@@ -513,6 +578,8 @@ usage(void)
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
+ " --max-attempts-number=NUM\n"
+ " max number of tries to run transaction (default: 1)\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
"\nCommon options:\n"
@@ -624,7 +691,7 @@ gotdigits:
/* random number generator: uniform distribution from min to max inclusive */
static int64
-getrand(TState *thread, int64 min, int64 max)
+getrand(CState *st, int64 min, int64 max)
{
/*
* Odd coding is so that min and max have approximately the same chance of
@@ -635,7 +702,7 @@ getrand(TState *thread, int64 min, int64 max)
* protected by a mutex, and therefore a bottleneck on machines with many
* CPUs.
*/
- return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
+ return min + (int64) ((max - min + 1) * pg_erand48(st->random_state));
}
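Moving random_state from TState to CState matters because a retried transaction must replay exactly the same random values as the failed attempt, which only works if the 48-bit state can be saved and restored per client. A minimal sketch of the idea, using a hand-rolled LCG with the erand48() constants standing in for pg_erand48():

```c
/* 48-bit LCG with the erand48() constants; a stand-in for pg_erand48(). */
static double
sketch_erand48(unsigned short xseed[3])
{
	unsigned long long x = ((unsigned long long) xseed[2] << 32) |
		((unsigned long long) xseed[1] << 16) | xseed[0];

	x = (x * 0x5deece66dULL + 0xbULL) & ((1ULL << 48) - 1);
	xseed[0] = (unsigned short) x;
	xseed[1] = (unsigned short) (x >> 16);
	xseed[2] = (unsigned short) (x >> 32);
	return (double) x / (double) (1ULL << 48);
}

/* uniform in [min, max], like pgbench's getrand() */
static long
sketch_getrand(unsigned short state[3], long min, long max)
{
	return min + (long) ((max - min + 1) * sketch_erand48(state));
}
```

Saving the three shorts before BEGIN and copying them back before a retry reproduces the whole sequence of random values for that transaction.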
/*
@@ -644,7 +711,7 @@ getrand(TState *thread, int64 min, int64 max)
* value is exp(-parameter).
*/
static int64
-getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
+getExponentialRand(CState *st, int64 min, int64 max, double parameter)
{
double cut,
uniform,
@@ -654,7 +721,7 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
Assert(parameter > 0.0);
cut = exp(-parameter);
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(st->random_state);
/*
* inner expression in (cut, 1] (if parameter > 0), rand in [0, 1)
@@ -667,7 +734,7 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
/* random number generator: gaussian distribution from min to max inclusive */
static int64
-getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
+getGaussianRand(CState *st, int64 min, int64 max, double parameter)
{
double stdev;
double rand;
@@ -695,8 +762,8 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* are expected in (0, 1] (see
* http://en.wikipedia.org/wiki/Box_muller)
*/
- double rand1 = 1.0 - pg_erand48(thread->random_state);
- double rand2 = 1.0 - pg_erand48(thread->random_state);
+ double rand1 = 1.0 - pg_erand48(st->random_state);
+ double rand2 = 1.0 - pg_erand48(st->random_state);
/* Box-Muller basic form transform */
double var_sqrt = sqrt(-2.0 * log(rand1));
@@ -723,7 +790,7 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* will approximate a Poisson distribution centered on the given value.
*/
static int64
-getPoissonRand(TState *thread, int64 center)
+getPoissonRand(CState *st, int64 center)
{
/*
* Use inverse transform sampling to generate a value > 0, such that the
@@ -732,7 +799,7 @@ getPoissonRand(TState *thread, int64 center)
double uniform;
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(st->random_state);
return (int64) (-log(uniform) * ((double) center) + 0.5);
}
@@ -786,24 +853,42 @@ initStats(StatsData *sd, time_t start_time)
sd->start_time = start_time;
sd->cnt = 0;
sd->skipped = 0;
+ sd->serialization_failures = 0;
+ sd->deadlock_failures = 0;
+ initSimpleStats(&sd->attempts);
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
/*
- * Accumulate one additional item into the given stats object.
+ * Accumulate the statistics that are updated regardless of whether the
+ * transaction failed or was skipped.
*/
static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumMainStats(StatsData *stats, bool skipped, bool serialization_failure,
+ bool deadlock_failure, SimpleStats *attempts)
{
stats->cnt++;
-
if (skipped)
- {
- /* no latency to record on skipped transactions */
stats->skipped++;
- }
- else
+ else if (serialization_failure)
+ stats->serialization_failures++;
+ else if (deadlock_failure)
+ stats->deadlock_failures++;
+ mergeSimpleStats(&stats->attempts, attempts);
+}
+
+/*
+ * Accumulate one additional item into the given stats object.
+ */
+static void
+accumStats(StatsData *stats, bool skipped, bool serialization_failure,
+ bool deadlock_failure, double lat, double lag, SimpleStats *attempts)
+{
+ accumMainStats(stats, skipped, serialization_failure, deadlock_failure,
+ attempts);
+
+ if (!skipped && !serialization_failure && !deadlock_failure)
{
addToSimpleStats(&stats->latency, lat);
@@ -1593,7 +1678,7 @@ evalFunc(TState *thread, CState *st,
if (func == PGBENCH_RANDOM)
{
Assert(nargs == 2);
- setIntValue(retval, getrand(thread, imin, imax));
+ setIntValue(retval, getrand(st, imin, imax));
}
else /* gaussian & exponential */
{
@@ -1615,7 +1700,7 @@ evalFunc(TState *thread, CState *st,
}
setIntValue(retval,
- getGaussianRand(thread, imin, imax, param));
+ getGaussianRand(st, imin, imax, param));
}
else /* exponential */
{
@@ -1628,7 +1713,7 @@ evalFunc(TState *thread, CState *st,
}
setIntValue(retval,
- getExponentialRand(thread, imin, imax, param));
+ getExponentialRand(st, imin, imax, param));
}
}
@@ -1817,7 +1902,7 @@ commandFailed(CState *st, char *message)
/* return a script number with a weighted choice. */
static int
-chooseScript(TState *thread)
+chooseScript(CState *st)
{
int i = 0;
int64 w;
@@ -1825,7 +1910,7 @@ chooseScript(TState *thread)
if (num_scripts == 1)
return 0;
- w = getrand(thread, 0, total_weight - 1);
+ w = getrand(st, 0, total_weight - 1);
do
{
w -= sql_script[i++].weight;
@@ -1951,6 +2036,48 @@ evaluateSleep(CState *st, int argc, char **argv, int *usecs)
return true;
}
+static void
+free_variables_pointers(Variable *variables, int nvariables)
+{
+ Variable *current;
+
+ for (current = variables; current - variables < nvariables; ++current)
+ {
+ pg_free(current->name);
+ current->name = NULL;
+
+ pg_free(current->value);
+ current->value = NULL;
+ }
+}
+
+/* return a deep copy of variables array */
+static Variable *
+copy_variables(Variable *destination, int destination_nvariables,
+ const Variable *source, int source_nvariables)
+{
+ Variable *current_destination;
+ const Variable *current_source;
+
+ free_variables_pointers(destination, destination_nvariables);
+ destination = pg_realloc(destination, sizeof(Variable) * source_nvariables);
+
+ for (current_source = source, current_destination = destination;
+ current_source - source < source_nvariables;
+ ++current_source, ++current_destination)
+ {
+ current_destination->name = pg_strdup(current_source->name);
+ if (current_source->value)
+ current_destination->value = pg_strdup(current_source->value);
+ else
+ current_destination->value = NULL;
+ current_destination->is_numeric = current_source->is_numeric;
+ current_destination->num_value = current_source->num_value;
+ }
+
+ return destination;
+}
+
/*
* Advance the state machine of a connection, if possible.
*/
@@ -1962,6 +2089,14 @@ doCustom(TState *thread, CState *st, StatsData *agg)
instr_time now;
bool end_tx_processed = false;
int64 wait;
+ bool serialization_failure = false;
+ bool deadlock_failure = false;
+ ExecStatusType result_status;
+ char *sqlState;
+ int last_begin_command;
+ Command *last_begin;
+ int attempts_number;
+ int transaction_end;
/*
* gettimeofday() isn't free, so we get the current timestamp lazily the
@@ -1990,7 +2125,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
case CSTATE_CHOOSE_SCRIPT:
- st->use_file = chooseScript(thread);
+ st->use_file = chooseScript(st);
if (debug)
fprintf(stderr, "client %d executing script \"%s\"\n", st->id,
@@ -2017,7 +2152,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* away.
*/
Assert(throttle_delay > 0);
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(st, throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
@@ -2049,7 +2184,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
{
processXactStats(thread, st, &now, true, agg);
/* next rendez-vous */
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(st, throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
}
@@ -2121,6 +2256,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->txn_scheduled = INSTR_TIME_GET_MICROSEC(now);
}
+ /* reset transaction variables to default values */
+ st->serialization_failure = false;
+ st->deadlock_failure = false;
+ initSimpleStats(&st->attempts);
+
/* Begin with the first command */
st->command = 0;
st->state = CSTATE_START_COMMAND;
@@ -2142,11 +2282,39 @@ doCustom(TState *thread, CState *st, StatsData *agg)
break;
}
+ /* reset command result variables to default values */
+ serialization_failure = false;
+ deadlock_failure = false;
+
+ if (command->is_transaction_begin && !st->last_begin_state)
+ {
+ /*
+					 * This is the first attempt to run the transaction that
+					 * begins at the current command.  Remember its parameters
+					 * in case we have to repeat it later.
+ */
+ st->last_begin_state = (LastBeginState *)
+ pg_malloc0(sizeof(LastBeginState));
+
+ st->last_begin_state->command = st->command;
+ st->last_begin_state->attempts_number = 1;
+ memcpy(st->last_begin_state->random_state, st->random_state,
+ sizeof(unsigned short) * 3);
+
+ st->last_begin_state->variables = copy_variables(
+ st->last_begin_state->variables,
+ st->last_begin_state->nvariables,
+ st->variables,
+ st->nvariables);
+ st->last_begin_state->nvariables = st->nvariables;
+ st->last_begin_state->vars_sorted = st->vars_sorted;
+ }
+
/*
* Record statement start time if per-command latencies are
* requested
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -2299,21 +2467,41 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Read and discard the query result;
*/
res = PQgetResult(st->con);
- switch (PQresultStatus(res))
+ result_status = PQresultStatus(res);
+ sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+				if (sqlState)
+				{
+ serialization_failure =
+ (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) ==
+ 0);
+ deadlock_failure =
+ strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0;
+
+ if (debug && (serialization_failure || deadlock_failure))
+ fprintf(stderr, "client %d got a %s failure (attempt %d/%d)\n",
+ st->id,
+ (serialization_failure ?
+ "serialization" :
+ "deadlock"),
+ st->last_begin_state->attempts_number,
+ max_attempts_number);
+ }
+
+ if (result_status == PGRES_COMMAND_OK ||
+ result_status == PGRES_TUPLES_OK ||
+ result_status == PGRES_EMPTY_QUERY ||
+ serialization_failure ||
+ deadlock_failure)
{
- case PGRES_COMMAND_OK:
- case PGRES_TUPLES_OK:
- case PGRES_EMPTY_QUERY:
- /* OK */
- PQclear(res);
- discard_response(st);
- st->state = CSTATE_END_COMMAND;
- break;
- default:
- commandFailed(st, PQerrorMessage(st->con));
- PQclear(res);
- st->state = CSTATE_ABORTED;
- break;
+ /* OK */
+ PQclear(res);
+ discard_response(st);
+ st->state = CSTATE_END_COMMAND;
+ }
+ else
+ {
+ commandFailed(st, PQerrorMessage(st->con));
+ PQclear(res);
+ st->state = CSTATE_ABORTED;
}
break;
@@ -2337,12 +2525,70 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
case CSTATE_END_COMMAND:
+ if (st->last_begin_state)
+ {
+ last_begin_command = st->last_begin_state->command;
+ last_begin =
+ sql_script[st->use_file].commands[last_begin_command];
+ transaction_end = last_begin->transaction_end;
+ attempts_number = st->last_begin_state->attempts_number;
+
+ if ((st->command == transaction_end) &&
+ ((!st->end_failed_transaction_block &&
+ !serialization_failure &&
+ !deadlock_failure) ||
+ attempts_number == max_attempts_number))
+ {
+ /*
+						 * It is the end of the transaction, and either:
+						 * 1) the transaction was successful;
+						 * 2) or the transaction failed and we will not be
+						 * able to repeat it.
+						 *
+						 * So let's record its number of attempts in the
+						 * per-command statistics and in the statistics for
+						 * the current script execution.  Also let's free its
+						 * begin state because we don't need it anymore.
+ */
+ if (debug)
+ {
+ char buffer[256];
+
+ if (serialization_failure ||
+ deadlock_failure ||
+ st->end_failed_transaction_block)
+ snprintf(buffer, sizeof(buffer), "failure");
+ else
+ snprintf(buffer, sizeof(buffer), "successful");
+
+ fprintf(stderr, "client %d ends transaction with %d attempts (%s)\n",
+ st->id, attempts_number, buffer);
+ }
+
+ addToSimpleStats(&last_begin->attempts,
+ attempts_number);
+ addToSimpleStats(&st->attempts, attempts_number);
+
+ free_variables_pointers(
+ st->last_begin_state->variables,
+ st->last_begin_state->nvariables);
+ pg_free(st->last_begin_state);
+ st->last_begin_state = NULL;
+ }
+ }
+
+ if (serialization_failure || deadlock_failure)
+ {
+ st->state = CSTATE_FAILURE;
+ break;
+ }
+
/*
* command completed: accumulate per-command execution times
* in thread-local data structure, if per-command latencies
* are requested.
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -2354,8 +2600,102 @@ doCustom(TState *thread, CState *st, StatsData *agg)
INSTR_TIME_GET_DOUBLE(st->stmt_begin));
}
- /* Go ahead with next command */
- st->command++;
+ if (st->end_failed_transaction_block &&
+ attempts_number < max_attempts_number)
+ {
+ st->command = last_begin_command;
+ st->state = CSTATE_RETRY_FAILED_TRANSACTION;
+ }
+ else
+ {
+ /* Go ahead with next command */
+ st->command++;
+ st->state = CSTATE_START_COMMAND;
+ }
+ st->end_failed_transaction_block = false;
+
+ break;
+
+ /*
+ * Report about failure and end the failed transaction block.
+ */
+ case CSTATE_FAILURE:
+
+ /*
+ * Accumulate per-command serialization / deadlock failures
+ * count in thread-local data structure.
+ */
+ if (serialization_failure)
+ command->serialization_failures++;
+ if (deadlock_failure)
+ command->deadlock_failures++;
+
+ if (attempts_number == max_attempts_number)
+ {
+ /*
+					 * We will not be able to repeat the failed transaction,
+					 * so let's record this failure in the statistics for the
+					 * current script execution.
+ */
+ if (serialization_failure)
+ st->serialization_failure = true;
+ else if (deadlock_failure)
+ st->deadlock_failure = true;
+ }
+
+ if (st->command != transaction_end)
+ {
+ /* end the failed transaction block */
+ st->command = transaction_end;
+ st->end_failed_transaction_block = true;
+ st->state = CSTATE_START_COMMAND;
+ }
+ else
+ {
+ /*
+					 * We are not in a transaction block, so let's try to
+					 * repeat the failed transaction or go ahead with the
+					 * next command.
+ */
+ if (attempts_number < max_attempts_number)
+ {
+ st->command = last_begin_command;
+ st->state = CSTATE_RETRY_FAILED_TRANSACTION;
+ }
+ else
+ {
+ st->command++;
+ st->state = CSTATE_START_COMMAND;
+ }
+ }
+ break;
+
+ /*
+ * Set the parameters to retry the failed transaction.
+ */
+ case CSTATE_RETRY_FAILED_TRANSACTION:
+
+ /*
+				 * We assume that the number of transaction attempts (which is
+				 * limited by max_attempts_number) was checked earlier.
+ */
+ if (debug)
+ fprintf(stderr, "client %d repeats the failed transaction (attempt %d/%d)\n",
+ st->id,
+ st->last_begin_state->attempts_number + 1,
+ max_attempts_number);
+
+ st->last_begin_state->attempts_number++;
+ memcpy(st->random_state, st->last_begin_state->random_state,
+ sizeof(unsigned short) * 3);
+
+ st->variables = copy_variables(
+ st->variables,
+ st->nvariables,
+ st->last_begin_state->variables,
+ st->last_begin_state->nvariables);
+ st->nvariables = st->last_begin_state->nvariables;
+ st->vars_sorted = st->last_begin_state->vars_sorted;
+
st->state = CSTATE_START_COMMAND;
break;
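Stripped of the state-machine plumbing, the decision made on a failure boils down to a bounded-retry check; a reduced sketch with hypothetical names:

```c
/* Reduced sketch of the choice made in CSTATE_FAILURE: retry the failed
 * transaction only while the attempt counter is below the value of
 * --max-attempts-number, otherwise move on to the next command. */
typedef enum
{
	GO_TO_NEXT_COMMAND,
	RETRY_TRANSACTION
} FailureAction;

static FailureAction
on_transaction_failure(int attempts_number, int max_attempts_number)
{
	return (attempts_number < max_attempts_number)
		? RETRY_TRANSACTION : GO_TO_NEXT_COMMAND;
}
```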
@@ -2372,7 +2712,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
per_script_stats || use_log)
processXactStats(thread, st, &now, false, agg);
else
- thread->stats.cnt++;
+ accumMainStats(&thread->stats, false,
+ st->serialization_failure,
+ st->deadlock_failure, &st->attempts);
if (is_connect)
{
@@ -2426,6 +2768,15 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
/*
+ * Return the average number of attempts, or zero if there is no statistics
+ * data (e.g. because all transactions were skipped).
+ */
+static double
+get_average_attempts(const SimpleStats *attempts)
+{
+ return (attempts->count == 0 ? 0 : attempts->sum / attempts->count);
+}
+
+/*
* Print log entry after completing one transaction.
*
* We print Unix-epoch timestamps in the log, so that entries can be
@@ -2446,7 +2797,7 @@ doLog(TState *thread, CState *st,
* to the random sample.
*/
if (sample_rate != 0.0 &&
- pg_erand48(thread->random_state) > sample_rate)
+ pg_erand48(st->random_state) > sample_rate)
return;
/* should we aggregate the results or not? */
@@ -2462,13 +2813,20 @@ doLog(TState *thread, CState *st,
while (agg->start_time + agg_interval <= now)
{
/* print aggregated report to logfile */
- fprintf(logfile, "%ld " INT64_FORMAT " %.0f %.0f %.0f %.0f",
+ fprintf(logfile, "%ld " INT64_FORMAT " %.0f %.0f %.0f %.0f " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT " %.0f %.0f %.0f %.0f",
(long) agg->start_time,
agg->cnt,
agg->latency.sum,
agg->latency.sum2,
agg->latency.min,
- agg->latency.max);
+ agg->latency.max,
+ agg->serialization_failures,
+ agg->deadlock_failures,
+ agg->attempts.count,
+ agg->attempts.sum,
+ agg->attempts.sum2,
+ agg->attempts.min,
+ agg->attempts.max);
if (throttle_delay)
{
fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -2486,22 +2844,34 @@ doLog(TState *thread, CState *st,
}
/* accumulate the current transaction */
- accumStats(agg, skipped, latency, lag);
+ accumStats(agg, skipped, st->serialization_failure,
+ st->deadlock_failure, latency, lag, &st->attempts);
}
else
{
/* no, print raw transactions */
struct timeval tv;
+ char transaction_label[256];
+ double attempts_avg = get_average_attempts(&st->attempts);
- gettimeofday(&tv, NULL);
if (skipped)
- fprintf(logfile, "%d " INT64_FORMAT " skipped %d %ld %ld",
- st->id, st->cnt, st->use_file,
- (long) tv.tv_sec, (long) tv.tv_usec);
+ snprintf(transaction_label, sizeof(transaction_label), "skipped");
+ else if (st->serialization_failure && st->deadlock_failure)
+ snprintf(transaction_label, sizeof(transaction_label),
+ "serialization_and_deadlock_failures");
+ else if (st->serialization_failure || st->deadlock_failure)
+ snprintf(transaction_label, sizeof(transaction_label), "%s_failure",
+ st->serialization_failure ? "serialization" : "deadlock");
+
+ gettimeofday(&tv, NULL);
+ if (skipped || st->serialization_failure || st->deadlock_failure)
+ fprintf(logfile, "%d " INT64_FORMAT " %s %d %ld %ld %.0f",
+ st->id, st->cnt, transaction_label, st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec, attempts_avg);
else
- fprintf(logfile, "%d " INT64_FORMAT " %.0f %d %ld %ld",
+ fprintf(logfile, "%d " INT64_FORMAT " %.0f %d %ld %ld %.0f",
st->id, st->cnt, latency, st->use_file,
- (long) tv.tv_sec, (long) tv.tv_usec);
+ (long) tv.tv_sec, (long) tv.tv_usec, attempts_avg);
if (throttle_delay)
fprintf(logfile, " %.0f", lag);
fputc('\n', logfile);
@@ -2523,7 +2893,7 @@ processXactStats(TState *thread, CState *st, instr_time *now,
if ((!skipped) && INSTR_TIME_IS_ZERO(*now))
INSTR_TIME_SET_CURRENT(*now);
- if (!skipped)
+ if (!skipped && !st->serialization_failure && !st->deadlock_failure)
{
/* compute latency & lag */
latency = INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled;
@@ -2532,21 +2902,25 @@ processXactStats(TState *thread, CState *st, instr_time *now,
if (progress || throttle_delay || latency_limit)
{
- accumStats(&thread->stats, skipped, latency, lag);
+ accumStats(&thread->stats, skipped, st->serialization_failure,
+ st->deadlock_failure, latency, lag, &st->attempts);
/* count transactions over the latency limit, if needed */
if (latency_limit && latency > latency_limit)
thread->latency_late++;
}
else
- thread->stats.cnt++;
+ accumMainStats(&thread->stats, skipped, st->serialization_failure,
+ st->deadlock_failure, &st->attempts);
if (use_log)
doLog(thread, st, agg, skipped, latency, lag);
/* XXX could use a mutex here, but we choose not to */
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ accumStats(&sql_script[st->use_file].stats, skipped,
+ st->serialization_failure, st->deadlock_failure, latency,
+ lag, &st->attempts);
}
@@ -2985,6 +3359,9 @@ process_sql_command(PQExpBuffer buf, const char *source)
my_command->type = SQL_COMMAND;
my_command->argc = 0;
initSimpleStats(&my_command->stats);
+ my_command->serialization_failures = 0;
+ my_command->deadlock_failures = 0;
+ initSimpleStats(&my_command->attempts);
/*
* If SQL command is multi-line, we only want to save the first line as
@@ -3054,6 +3431,9 @@ process_backslash_command(PsqlScanState sstate, const char *source)
my_command->type = META_COMMAND;
my_command->argc = 0;
initSimpleStats(&my_command->stats);
+ my_command->serialization_failures = 0;
+ my_command->deadlock_failures = 0;
+ initSimpleStats(&my_command->attempts);
/* Save first word (command name) */
j = 0;
@@ -3185,6 +3565,60 @@ process_backslash_command(PsqlScanState sstate, const char *source)
}
/*
+ * Returns the same command where all continuous blocks of whitespace are
+ * replaced by a single space character.
+ *
+ * Returns a malloc'd string.
+ */
+static char *
+normalize_whitespaces(const char *command)
+{
+ const char *ptr = command;
+ char *buffer = pg_malloc(strlen(command) + 1);
+ int length = 0;
+
+ while (*ptr)
+ {
+ while (*ptr && !isspace((unsigned char) *ptr))
+ buffer[length++] = *(ptr++);
+ if (isspace((unsigned char) *ptr))
+ {
+ buffer[length++] = ' ';
+ while (isspace((unsigned char) *ptr))
+ ptr++;
+ }
+ }
+ buffer[length] = '\0';
+
+ return buffer;
+}
+
+/*
+ * Returns true if the given command generally ends a transaction block (we don't
+ * check here if the last transaction block is already completed).
+ */
+static bool
+is_transaction_block_end(const char *command_text)
+{
+ bool result = false;
+ char *command = normalize_whitespaces(command_text);
+
+ if (pg_strncasecmp(command, "end", 3) == 0 ||
+ (pg_strncasecmp(command, "commit", 6) == 0 &&
+ pg_strncasecmp(command, "commit prepared", 15) != 0) ||
+ (pg_strncasecmp(command, "rollback", 8) == 0 &&
+ pg_strncasecmp(command, "rollback prepared", 17) != 0 &&
+ pg_strncasecmp(command, "rollback to", 11) != 0) ||
+ (pg_strncasecmp(command, "prepare transaction ", 20) == 0 &&
+ pg_strncasecmp(command, "prepare transaction (", 21) != 0 &&
+ pg_strncasecmp(command, "prepare transaction as ", 23) != 0))
+ result = true;
+
+ pg_free(command);
+ return result;
+}
+
+/*
* Parse a script (either the contents of a file, or a built-in script)
* and add it to the list of scripts.
*/
@@ -3196,6 +3630,8 @@ ParseScript(const char *script, const char *desc, int weight)
PQExpBufferData line_buf;
int alloc_num;
int index;
+ int last_transaction_block_begin = -1;
+ bool transaction_block_completed = true;
#define COMMANDS_ALLOC_NUM 128
alloc_num = COMMANDS_ALLOC_NUM;
@@ -3238,6 +3674,8 @@ ParseScript(const char *script, const char *desc, int weight)
command = process_sql_command(&line_buf, desc);
if (command)
{
+ char *command_text = command->argv[0];
+
ps.commands[index] = command;
index++;
@@ -3247,6 +3685,36 @@ ParseScript(const char *script, const char *desc, int weight)
ps.commands = (Command **)
pg_realloc(ps.commands, sizeof(Command *) * alloc_num);
}
+
+ /* check if this command begins a new transaction */
+ if (transaction_block_completed)
+ {
+ /*
+ * Each SQL command outside a transaction block either starts a
+ * new transaction block or is run as a separate transaction.
+ */
+ command->is_transaction_begin = true;
+
+ if (pg_strncasecmp(command_text, "begin", 5) == 0 ||
+ pg_strncasecmp(command_text, "start", 5) == 0)
+ {
+ last_transaction_block_begin = index - 1;
+ transaction_block_completed = false;
+ }
+ else
+ {
+ command->transaction_end = index - 1;
+ }
+ }
+
+ /* check if command ends the transaction block */
+ if (!transaction_block_completed &&
+ is_transaction_block_end(command_text))
+ {
+ ps.commands[last_transaction_block_begin]->transaction_end =
+ index - 1;
+ transaction_block_completed = true;
+ }
}
/* If we reached a backslash, process that */
@@ -3272,6 +3740,13 @@ ParseScript(const char *script, const char *desc, int weight)
break;
}
+ if (!transaction_block_completed)
+ {
+ fprintf(stderr, "script \"%s\": last transaction block is not completed\n",
+ desc);
+ exit(1);
+ }
+
ps.commands[index] = NULL;
addScript(ps);
@@ -3474,14 +3949,34 @@ addScript(ParsedScript script)
}
static void
-printSimpleStats(char *prefix, SimpleStats *ss)
+printSimpleStats(char *prefix, SimpleStats *ss, bool print_zeros, double factor,
+ unsigned int decimals_number, char *unit_of_measure)
{
- /* print NaN if no transactions where executed */
- double latency = ss->sum / ss->count;
- double stddev = sqrt(ss->sum2 / ss->count - latency * latency);
+ double average;
+ double stddev;
+ char buffer[256];
+
+ if (print_zeros && ss->count == 0)
+ {
+ average = 0;
+ stddev = 0;
+ }
+ else
+ {
+ /* print NaN if no transactions were executed */
+ average = ss->sum / ss->count;
+ stddev = sqrt(ss->sum2 / ss->count - average * average);
+ }
+
+ if (strlen(unit_of_measure) == 0)
+ snprintf(buffer, sizeof(buffer), "%s %%s = %%.%df\n",
+ prefix, decimals_number);
+ else
+ snprintf(buffer, sizeof(buffer), "%s %%s = %%.%df %s\n",
+ prefix, decimals_number, unit_of_measure);
- printf("%s average = %.3f ms\n", prefix, 0.001 * latency);
- printf("%s stddev = %.3f ms\n", prefix, 0.001 * stddev);
+ printf(buffer, "average", factor * average);
+ printf(buffer, "stddev", factor * stddev);
}
/* print out results */
@@ -3522,6 +4017,16 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (total->cnt <= 0)
return;
+ printf("number of transactions with serialization failures: " INT64_FORMAT " (%.3f %%)\n",
+ total->serialization_failures,
+ (100.0 * total->serialization_failures / total->cnt));
+
+ printf("number of transactions with deadlock failures: " INT64_FORMAT " (%.3f %%)\n",
+ total->deadlock_failures,
+ (100.0 * total->deadlock_failures / total->cnt));
+
+ printSimpleStats("attempts number", &total->attempts, true, 1, 2, "");
+
if (throttle_delay && latency_limit)
printf("number of transactions skipped: " INT64_FORMAT " (%.3f %%)\n",
total->skipped,
@@ -3533,7 +4038,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
100.0 * latency_late / (total->skipped + total->cnt));
if (throttle_delay || progress || latency_limit)
- printSimpleStats("latency", &total->latency);
+ printSimpleStats("latency", &total->latency, false, 0.001, 3, "ms");
else
{
/* no measurement, show average latency computed from run time */
@@ -3557,22 +4062,34 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printf("tps = %f (excluding connections establishing)\n", tps_exclude);
/* Report per-script/command statistics */
- if (per_script_stats || latency_limit || is_latencies)
+ if (per_script_stats || latency_limit || report_per_command)
{
int i;
for (i = 0; i < num_scripts; i++)
{
if (num_scripts > 1)
+ {
printf("SQL script %d: %s\n"
" - weight: %d (targets %.1f%% of total)\n"
- " - " INT64_FORMAT " transactions (%.1f%% of total, tps = %f)\n",
+ " - " INT64_FORMAT " transactions (%.1f%% of total, tps = %f)\n"
+ " - number of transactions with serialization failures: " INT64_FORMAT " (%.3f%%)\n"
+ " - number of transactions with deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
i + 1, sql_script[i].desc,
sql_script[i].weight,
100.0 * sql_script[i].weight / total_weight,
sql_script[i].stats.cnt,
100.0 * sql_script[i].stats.cnt / total->cnt,
- sql_script[i].stats.cnt / time_include);
+ sql_script[i].stats.cnt / time_include,
+ sql_script[i].stats.serialization_failures,
+ (100.0 * sql_script[i].stats.serialization_failures /
+ sql_script[i].stats.cnt),
+ sql_script[i].stats.deadlock_failures,
+ (100.0 * sql_script[i].stats.deadlock_failures /
+ sql_script[i].stats.cnt));
+ printSimpleStats(" - attempts number", &total->attempts, true,
+ 1, 2, "");
+ }
else
printf("script statistics:\n");
@@ -3583,22 +4100,39 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
(sql_script[i].stats.skipped + sql_script[i].stats.cnt));
if (num_scripts > 1)
- printSimpleStats(" - latency", &sql_script[i].stats.latency);
-
- /* Report per-command latencies */
- if (is_latencies)
+ printSimpleStats(" - latency", &sql_script[i].stats.latency,
+ false, 0.001, 3, "ms");
+
+ /*
+ * Report per-command statistics: latencies, serialization &
+ * deadlock failures.
+ */
+ if (report_per_command)
{
Command **commands;
- printf(" - statement latencies in milliseconds:\n");
+ printf(" - statement latencies in milliseconds, serialization & deadlock failures, numbers of transactions attempts:\n");
for (commands = sql_script[i].commands;
*commands != NULL;
commands++)
- printf(" %11.3f %s\n",
+ {
+ char buffer[256];
+
+ if ((*commands)->is_transaction_begin)
+ snprintf(buffer, sizeof(buffer), "%8.2f",
+ get_average_attempts(&(*commands)->attempts));
+ else
+ snprintf(buffer, sizeof(buffer), " - ");
+
+ printf(" %11.3f %25" INT64_MODIFIER "d %25" INT64_MODIFIER "d %s %s\n",
1000.0 * (*commands)->stats.sum /
(*commands)->stats.count,
+ (*commands)->serialization_failures,
+ (*commands)->deadlock_failures,
+ buffer,
(*commands)->line);
+ }
}
}
}
@@ -3627,7 +4161,7 @@ main(int argc, char **argv)
{"progress", required_argument, NULL, 'P'},
{"protocol", required_argument, NULL, 'M'},
{"quiet", no_argument, NULL, 'q'},
- {"report-latencies", no_argument, NULL, 'r'},
+ {"report-per-command", no_argument, NULL, 'r'},
{"rate", required_argument, NULL, 'R'},
{"scale", required_argument, NULL, 's'},
{"select-only", no_argument, NULL, 'S'},
@@ -3645,6 +4179,7 @@ main(int argc, char **argv)
{"aggregate-interval", required_argument, NULL, 5},
{"progress-timestamp", no_argument, NULL, 6},
{"log-prefix", required_argument, NULL, 7},
+ {"max-attempts-number", required_argument, NULL, 8},
{NULL, 0, NULL, 0}
};
@@ -3787,7 +4322,7 @@ main(int argc, char **argv)
case 'r':
benchmarking_option_set = true;
per_script_stats = true;
- is_latencies = true;
+ report_per_command = true;
break;
case 's':
scale_given = true;
@@ -3991,6 +4526,16 @@ main(int argc, char **argv)
benchmarking_option_set = true;
logfile_prefix = pg_strdup(optarg);
break;
+ case 8:
+ benchmarking_option_set = true;
+ max_attempts_number = atoi(optarg);
+ if (max_attempts_number <= 0)
+ {
+ fprintf(stderr, "invalid number of maximum attempts: \"%s\"\n",
+ optarg);
+ exit(1);
+ }
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
@@ -4259,9 +4804,6 @@ main(int argc, char **argv)
thread->state = &state[nclients_dealt];
thread->nstate =
(nclients - nclients_dealt + nthreads - i - 1) / (nthreads - i);
- thread->random_state[0] = random();
- thread->random_state[1] = random();
- thread->random_state[2] = random();
thread->logfile = NULL; /* filled in later */
thread->latency_late = 0;
initStats(&thread->stats, 0);
@@ -4340,6 +4882,9 @@ main(int argc, char **argv)
mergeSimpleStats(&stats.lag, &thread->stats.lag);
stats.cnt += thread->stats.cnt;
stats.skipped += thread->stats.skipped;
+ stats.serialization_failures += thread->stats.serialization_failures;
+ stats.deadlock_failures += thread->stats.deadlock_failures;
+ mergeSimpleStats(&stats.attempts, &thread->stats.attempts);
latency_late += thread->latency_late;
INSTR_TIME_ADD(conn_total_time, thread->conn_time);
}
@@ -4422,6 +4967,11 @@ threadRun(void *arg)
{
if ((state[i].con = doConnect()) == NULL)
goto done;
+
+ /* set random state */
+ state[i].random_state[0] = random();
+ state[i].random_state[1] = random();
+ state[i].random_state[2] = random();
}
}
@@ -4613,12 +5163,15 @@ threadRun(void *arg)
/* generate and show report */
StatsData cur;
int64 run = now - last_report;
+ int64 attempts_count;
double tps,
total_run,
latency,
sqlat,
lag,
- stdev;
+ latency_stdev,
+ attempts_average,
+ attempts_stddev;
char tbuf[64];
/*
@@ -4639,6 +5192,10 @@ threadRun(void *arg)
mergeSimpleStats(&cur.lag, &thread[i].stats.lag);
cur.cnt += thread[i].stats.cnt;
cur.skipped += thread[i].stats.skipped;
+ cur.serialization_failures +=
+ thread[i].stats.serialization_failures;
+ cur.deadlock_failures += thread[i].stats.deadlock_failures;
+ mergeSimpleStats(&cur.attempts, &thread[i].stats.attempts);
}
total_run = (now - thread_start) / 1000000.0;
@@ -4647,10 +5204,26 @@ threadRun(void *arg)
(cur.cnt - last.cnt);
sqlat = 1.0 * (cur.latency.sum2 - last.latency.sum2)
/ (cur.cnt - last.cnt);
- stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
+ latency_stdev = 0.001 *
+ sqrt(sqlat - 1000000.0 * latency * latency);
lag = 0.001 * (cur.lag.sum - last.lag.sum) /
(cur.cnt - last.cnt);
+ attempts_count = cur.attempts.count - last.attempts.count;
+ if (attempts_count == 0)
+ {
+ attempts_average = 0;
+ attempts_stddev = 0;
+ }
+ else
+ {
+ attempts_average = (cur.attempts.sum - last.attempts.sum) /
+ attempts_count;
+ attempts_stddev = sqrt(
+ (cur.attempts.sum2 - last.attempts.sum2) /
+ attempts_count - attempts_average * attempts_average);
+ }
+
if (progress_timestamp)
{
/*
@@ -4669,8 +5242,16 @@ threadRun(void *arg)
snprintf(tbuf, sizeof(tbuf), "%.1f s", total_run);
fprintf(stderr,
- "progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
- tbuf, tps, latency, stdev);
+ "progress: %s, %.1f tps, lat %.3f ms stddev %.3f, failed trx: " INT64_FORMAT " (serialization), " INT64_FORMAT " (deadlocks), attempts avg %.2f stddev %.2f",
+ tbuf,
+ tps,
+ latency,
+ latency_stdev,
+ (cur.serialization_failures -
+ last.serialization_failures),
+ (cur.deadlock_failures - last.deadlock_failures),
+ attempts_average,
+ attempts_stddev);
if (throttle_delay)
{
diff --git a/src/bin/pgbench/t/002_serialization_errors.pl b/src/bin/pgbench/t/002_serialization_errors.pl
new file mode 100644
index 0000000..3c89484
--- /dev/null
+++ b/src/bin/pgbench/t/002_serialization_errors.pl
@@ -0,0 +1,121 @@
+use strict;
+use warnings;
+
+use Config;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+use constant
+{
+ READ_COMMITTED => 0,
+ REPEATABLE_READ => 1,
+ SERIALIZABLE => 2,
+};
+
+my @isolation_level_sql = ('read committed', 'repeatable read', 'serializable');
+my @isolation_level_shell = (
+ 'read\\ committed',
+ 'repeatable\\ read',
+ 'serializable');
+
+# Test concurrent update in table row with different default transaction
+# isolation levels.
+my $node = get_new_node('main');
+$node->init;
+$node->start;
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2);');
+
+my $script = $node->basedir . '/pgbench_script';
+append_to_file($script,
+ "BEGIN;\n"
+ . "\\set delta random(-5000, 5000)\n"
+ . "UPDATE xy SET y = y + :delta WHERE x = 1;\n"
+ . "END;");
+
+sub test_pgbench
+{
+ my ($isolation_level) = @_;
+
+ my $isolation_level_sql = $isolation_level_sql[$isolation_level];
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $stderr);
+
+ # Open the psql session and run the parallel transaction:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql = "begin;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /BEGIN/;
+
+ $in_psql = "update xy set y = y + 1 where x = 1;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /UPDATE 1/;
+
+ # Start pgbench:
+ my @command = (qw(pgbench --no-vacuum --file), $script);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$stderr;
+
+ # Let pgbench run the update command in the transaction:
+ sleep 10;
+
+ # In psql, commit the transaction and end the session:
+ $in_psql = "end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+ is($stderr, '', "@command no stderr");
+
+ like($out_pgbench,
+ qr{processed: 10/10},
+ "concurrent update: $isolation_level_sql: check processed transactions");
+
+ my $regex =
+ ($isolation_level == READ_COMMITTED)
+ ? qr{serialization failures: 0 \(0\.000 %\)}
+ : qr{serialization failures: [1-9]\d* \([1-9]\d*\.\d* %\)};
+
+ like($out_pgbench,
+ $regex,
+ "concurrent update: $isolation_level_sql: check serialization failures");
+}
+
+test_pgbench(READ_COMMITTED);
+test_pgbench(REPEATABLE_READ);
+test_pgbench(SERIALIZABLE);
diff --git a/src/bin/pgbench/t/003_deadlock_errors.pl b/src/bin/pgbench/t/003_deadlock_errors.pl
new file mode 100644
index 0000000..8d92f78
--- /dev/null
+++ b/src/bin/pgbench/t/003_deadlock_errors.pl
@@ -0,0 +1,130 @@
+use strict;
+use warnings;
+
+use Config;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 21;
+
+use constant
+{
+ READ_COMMITTED => 0,
+ REPEATABLE_READ => 1,
+ SERIALIZABLE => 2,
+};
+
+my @isolation_level_sql = ('read committed', 'repeatable read', 'serializable');
+my @isolation_level_shell = (
+ 'read\\ committed',
+ 'repeatable\\ read',
+ 'serializable');
+
+# Test concurrent deadlock updates in table with different default transaction
+# isolation levels.
+my $node = get_new_node('main');
+$node->init;
+$node->start;
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2), (2, 3);');
+
+my $script1 = $node->basedir . '/pgbench_script1';
+append_to_file($script1,
+ "BEGIN;\n"
+ . "\\set delta1 random(-5000, 5000)\n"
+ . "\\set delta2 random(-5000, 5000)\n"
+ . "UPDATE xy SET y = y + :delta1 WHERE x = 1;\n"
+ . "SELECT pg_sleep(20);\n"
+ . "UPDATE xy SET y = y + :delta2 WHERE x = 2;\n"
+ . "END;");
+
+my $script2 = $node->basedir . '/pgbench_script2';
+append_to_file($script2,
+ "BEGIN;\n"
+ . "\\set delta1 random(-5000, 5000)\n"
+ . "\\set delta2 random(-5000, 5000)\n"
+ . "UPDATE xy SET y = y + :delta2 WHERE x = 2;\n"
+ . "UPDATE xy SET y = y + :delta1 WHERE x = 1;\n"
+ . "END;");
+
+sub test_pgbench
+{
+ my ($isolation_level) = @_;
+
+ my $isolation_level_sql = $isolation_level_sql[$isolation_level];
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h1, $in1, $out1, $err1);
+ my ($h2, $in2, $out2, $err2);
+
+ # Run first pgbench
+ my @command1 = (qw(pgbench --no-vacuum --transactions=1 --file), $script1);
+ print "# Running: " . join(" ", @command1) . "\n";
+ $h1 = IPC::Run::start \@command1, \$in1, \$out1, \$err1;
+
+ # Let pgbench run first update command in the transaction:
+ sleep 10;
+
+ # Run second pgbench
+ my @command2 = (qw(pgbench --no-vacuum --transactions=1 --file), $script2);
+ print "# Running: " . join(" ", @command2) . "\n";
+ $h2 = IPC::Run::start \@command2, \$in2, \$out2, \$err2;
+
+ # Get all pgbench results
+ $h1->pump() until length $out1;
+ $h1->finish();
+
+ $h2->pump() until length $out2;
+ $h2->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result1 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h1->full_results)[0]
+ : $h1->result(0);
+
+ my $result2 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h2->full_results)[0]
+ : $h2->result(0);
+
+ # Check all pgbench results
+ ok(!$result1, "@command1 exit code 0");
+ ok(!$result2, "@command2 exit code 0");
+
+ is($err1, '', "@command1 no stderr");
+ is($err2, '', "@command2 no stderr");
+
+ like($out1,
+ qr{processed: 1/1},
+ "concurrent deadlock update: "
+ . $isolation_level_sql
+ . ": pgbench 1: check processed transactions");
+ like($out2,
+ qr{processed: 1/1},
+ "concurrent deadlock update: "
+ . $isolation_level_sql
+ . ": pgbench 2: check processed transactions");
+
+ # First or second pgbench should get a deadlock error
+ like($out1 . $out2,
+ qr{deadlock failures: 1 \(100\.000 %\)},
+ "concurrent deadlock update: "
+ . $isolation_level_sql
+ . ": check deadlock failures");
+}
+
+test_pgbench(READ_COMMITTED);
+test_pgbench(REPEATABLE_READ);
+test_pgbench(SERIALIZABLE);
diff --git a/src/bin/pgbench/t/004_retry_failed_transactions.pl b/src/bin/pgbench/t/004_retry_failed_transactions.pl
new file mode 100644
index 0000000..01a9ab2
--- /dev/null
+++ b/src/bin/pgbench/t/004_retry_failed_transactions.pl
@@ -0,0 +1,280 @@
+use strict;
+use warnings;
+
+use Config;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 30;
+
+use constant
+{
+ READ_COMMITTED => 0,
+ REPEATABLE_READ => 1,
+ SERIALIZABLE => 2,
+};
+
+my @isolation_level_sql = ('read committed', 'repeatable read', 'serializable');
+my @isolation_level_shell = (
+ 'read\\ committed',
+ 'repeatable\\ read',
+ 'serializable');
+
+# Test concurrent update in table row with different default transaction
+# isolation levels.
+my $node = get_new_node('main');
+$node->init;
+$node->start;
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2), (2, 3);');
+
+my $script = $node->basedir . '/pgbench_script';
+append_to_file($script,
+ "BEGIN;\n"
+ . "\\set delta random(-5000, 5000)\n"
+ . "UPDATE xy SET y = y + :delta WHERE x = 1;\n"
+ . "END;");
+
+my $script1 = $node->basedir . '/pgbench_script1';
+append_to_file($script1,
+ "BEGIN;\n"
+ . "\\set delta1 random(-5000, 5000)\n"
+ . "\\set delta2 random(-5000, 5000)\n"
+ . "UPDATE xy SET y = y + :delta1 WHERE x = 1;\n"
+ . "SELECT pg_sleep(20);\n"
+ . "UPDATE xy SET y = y + :delta2 WHERE x = 2;\n"
+ . "END;");
+
+my $script2 = $node->basedir . '/pgbench_script2';
+append_to_file($script2,
+ "BEGIN;\n"
+ . "\\set delta1 random(-5000, 5000)\n"
+ . "\\set delta2 random(-5000, 5000)\n"
+ . "UPDATE xy SET y = y + :delta2 WHERE x = 2;\n"
+ . "UPDATE xy SET y = y + :delta1 WHERE x = 1;\n"
+ . "END;");
+
+sub test_pgbench_serialization_failures
+{
+ my ($isolation_level) = @_;
+
+ my $isolation_level_sql = $isolation_level_sql[$isolation_level];
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $stderr);
+
+ # Open the psql session and run the parallel transaction:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql ="begin;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /BEGIN/;
+
+ $in_psql = "update xy set y = y + 1 where x = 1;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /UPDATE 1/;
+
+ # Start pgbench:
+ my @command = (
+ qw(pgbench --no-vacuum --max-attempts-number 2 --debug --file),
+ $script);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$stderr;
+
+ # Let pgbench run the update command in the transaction:
+ sleep 10;
+
+ # In psql, commit the transaction and end the session:
+ $in_psql = "end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+
+ like($out_pgbench,
+ qr{processed: 10/10},
+ "concurrent update with retrying: "
+ . $isolation_level_sql
+ . ": check processed transactions");
+
+ like($out_pgbench,
+ qr{serialization failures: 0 \(0\.000 %\)},
+ "concurrent update with retrying: "
+ . $isolation_level_sql
+ . ": check serialization failures");
+
+ my $pattern =
+ "client 0 sending UPDATE xy SET y = y \\+ (-?\\d+) WHERE x = 1;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 got a serialization failure \\(attempt 1/2\\)\n"
+ . "client 0 sending END;\n"
+ . "\\g2+"
+ . "client 0 repeats the failed transaction \\(attempt 2/2\\)\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g2+"
+ . "client 0 executing \\\\set delta\n"
+ . "client 0 sending UPDATE xy SET y = y \\+ \\g1 WHERE x = 1;";
+
+ like($stderr,
+ qr{$pattern},
+ "concurrent update with retrying: "
+ . $isolation_level_sql
+ . ": check the retried transaction");
+}
+
+sub test_pgbench_deadlock_failures
+{
+ my ($isolation_level) = @_;
+
+ my $isolation_level_sql = $isolation_level_sql[$isolation_level];
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+
+ my ($h1, $in1, $out1, $err1);
+ my ($h2, $in2, $out2, $err2);
+
+ # Run first pgbench
+ my @command1 = (
+ qw(pgbench --no-vacuum --transactions=1 --max-attempts-number=2),
+ qw(--debug --file), $script1);
+ print "# Running: " . join(" ", @command1) . "\n";
+ $h1 = IPC::Run::start \@command1, \$in1, \$out1, \$err1;
+
+ # Let pgbench run first update command in the transaction:
+ sleep 10;
+
+ # Run second pgbench
+ my @command2 = (
+ qw(pgbench --no-vacuum --transactions=1 --max-attempts-number=2),
+ qw(--debug --file), $script2);
+ print "# Running: " . join(" ", @command2) . "\n";
+ $h2 = IPC::Run::start \@command2, \$in2, \$out2, \$err2;
+
+ # Get all pgbench results
+ $h1->pump() until length $out1;
+ $h1->finish();
+
+ $h2->pump() until length $out2;
+ $h2->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result1 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h1->full_results)[0]
+ : $h1->result(0);
+
+ my $result2 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h2->full_results)[0]
+ : $h2->result(0);
+
+ # Check all pgbench results
+ ok(!$result1, "@command1 exit code 0");
+ ok(!$result2, "@command2 exit code 0");
+
+ like($out1,
+ qr{processed: 1/1},
+ "concurrent deadlock update with retrying: "
+ . $isolation_level_sql
+ . ": pgbench 1: check processed transactions");
+ like($out2,
+ qr{processed: 1/1},
+ "concurrent deadlock update with retrying: "
+ . $isolation_level_sql
+ . ": pgbench 2: check processed transactions");
+
+ like($out1,
+ qr{deadlock failures: 0 \(0\.000 %\)},
+ "concurrent deadlock update with retrying: "
+ . $isolation_level_sql
+ . ": pgbench 1: check deadlock failures");
+ like($out2,
+ qr{deadlock failures: 0 \(0\.000 %\)},
+ "concurrent deadlock update with retrying: "
+ . $isolation_level_sql
+ . ": pgbench 2: check deadlock failures");
+
+ # First or second pgbench should get a deadlock error
+ like($err1 . $err2,
+ qr{client 0 got a deadlock failure},
+ "concurrent deadlock update with retrying: "
+ . $isolation_level_sql
+ . ": check deadlock failure in debug logs");
+
+ if ($isolation_level == READ_COMMITTED)
+ {
+ my $pattern =
+ "client 0 sending UPDATE xy SET y = y \\+ (-?\\d+) WHERE x = (\\d);\n"
+ . "(client 0 receiving\n)+"
+ . "(|client 0 sending SELECT pg_sleep\\(20\\);\n)"
+ . "\\g3*"
+ . "client 0 sending UPDATE xy SET y = y \\+ (-?\\d+) WHERE x = (\\d);\n"
+ . "\\g3+"
+ . "client 0 got a deadlock failure \\(attempt 1/2\\)\n"
+ . "client 0 sending END;\n"
+ . "\\g3+"
+ . "client 0 repeats the failed transaction \\(attempt 2/2\\)\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g3+"
+ . "client 0 executing \\\\set delta1\n"
+ . "client 0 executing \\\\set delta2\n"
+ . "client 0 sending UPDATE xy SET y = y \\+ \\g1 WHERE x = \\g2;\n"
+ . "\\g3+"
+ . "\\g4"
+ . "\\g3*"
+ . "client 0 sending UPDATE xy SET y = y \\+ \\g5 WHERE x = \\g6;\n";
+
+ like($err1 . $err2,
+ qr{$pattern},
+ "concurrent deadlock update with retrying: "
+ . $isolation_level_sql
+ . ": check the retried transaction");
+ }
+}
+
+test_pgbench_serialization_failures(REPEATABLE_READ);
+test_pgbench_serialization_failures(SERIALIZABLE);
+
+test_pgbench_deadlock_failures(READ_COMMITTED);
+test_pgbench_deadlock_failures(REPEATABLE_READ);
+test_pgbench_deadlock_failures(SERIALIZABLE);
--
1.9.1
Hello Marina,
Here is the second version of my patch for pgbench. Now transactions
with serialization and deadlock failures are rolled back and retried
until they either end successfully or reach the maximum number of attempts.
In details:
- You can set the maximum number of attempts with the appropriate
benchmarking option (--max-attempts-number). Its default value is 1,
partly because retrying shell commands can produce new errors.
- Statistics on attempts and failures are printed in the progress output,
in the transaction / aggregation logs, and at the end with the other
results (overall and for each script). A transaction failure is reported
here only if the last retry of that transaction fails.
- Failures and the average number of transaction attempts are also printed
per command, together with average latencies, if you use the appropriate
benchmarking option (--report-per-command, -r), which replaces the option
--report-latencies as I was advised here [1]. Average numbers of
transaction attempts are printed only for commands which start
transactions.
As usual: TAP tests for new functionality and changed documentation with
new examples.
Here is a round of comments on the current version of the patch:
* About the feature
There is a latent issue about what is a transaction. For pgbench a transaction is a full script execution.
For postgresql, it is a statement or a BEGIN/END block, several of which may appear in a script. From a retry
perspective, you may retry from a SAVEPOINT within a BEGIN/END block... I'm not sure how to make general sense
of all this, so this is just a comment without attached action for now.
As the default is not to retry, which is the upward compatible behavior, I think that the changes should not
change the current output much, bar counting the number of failures.
I would consider using "try/tries" instead of "attempt/attempts" as it is shorter. An English native speaker
opinion would be welcome on that point.
* About the code
ISTM that the code interacts significantly with various patches under review or ready for committers.
Not sure how to deal with that, there will be some rebasing work...
I'm fine with renaming "is_latencies" to "report_per_command", which is more logical & generic.
"max_attempt_number": I'm against typing fields again in their name, aka "hungarian naming". I'd suggest
"max_tries" or "max_attempts".
"SimpleStats attempts": I disagree with using this floating point oriented structure to count integers.
I would suggest "int64 tries" instead, which should be enough for the
purpose.
LastBeginState -> RetryState? I'm not sure why this state is a pointer in
CState. Putting the struct would avoid malloc/free cycles. Index "-1" may
be used to tell it is not set if necessary.
"CSTATE_RETRY_FAILED_TRANSACTION" -> "CSTATE_RETRY" is simpler and clear enough.
In CState and some code, a failure is a failure, maybe one boolean would
be enough. It need only be differentiated when counting, and you have
(deadlock_failure || serialization_failure) everywhere.
Some variables, such as "int attempt_number", should be in the client
structure, not in the client? Generally, try to use block variables if
possible to keep the states clearly disjoint. If there could be NO new
variable at the doCustom level that would be great, because that would
ensure that there is no machine state mixup hidden in these variables.
I'm wondering whether the RETRY & FAILURE states could/should be merged:
on RETRY:
-> count retry
-> actually retry if < max_tries (reset client state, jump to command)
-> else count failure and skip to end of script
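That merged decision could be sketched in C roughly as follows; note that RetryCounters, on_error and the NEXT_* names are illustrative, not identifiers from pgbench.c or the patch:

```c
#include <assert.h>

/* Illustrative sketch of the merged RETRY logic described above;
 * these names are hypothetical, not patch code. */
typedef enum
{
	NEXT_RETRY_TRANSACTION,		/* reset client state, jump back to start */
	NEXT_SKIP_TO_END			/* out of tries: skip to end of script */
} RetryDecision;

typedef struct RetryCounters
{
	long		retries;		/* errors that led to another try */
	long		failures;		/* scripts that ran out of tries */
} RetryCounters;

static RetryDecision
on_error(RetryCounters *c, int tries_done, int max_tries)
{
	if (tries_done < max_tries)
	{
		c->retries++;
		return NEXT_RETRY_TRANSACTION;
	}
	c->failures++;
	return NEXT_SKIP_TO_END;
}
```

With max_tries = 2, a first error retries and a second error counts as a failure, matching the one-state flow suggested above.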
The start and end of transaction detection seems expensive (malloc, ...)
and assumes one statement per command, which is not necessarily the case
(what about "BEGIN \; ... \; COMMIT;"?); this limitation should be
documented. ISTM that the space normalization should be avoided, and
something simpler/lighter should be devised. Possibly it should consider
handling SAVEPOINT.
I disagree about exit in ParseScript if the transaction block is not
completed, especially as it misses out on combined statements/queries
("BEGIN \; stuff... \; COMMIT") and would break an existing feature.
There are strange characters in comments, e.g. "??ontinuous".
Option "max-attempt-number" -> "max-tries"
I would put the client random state initialization with the state
initialization, not with the connection.
* About tracing
Progress is expected to be short, not detailed. Only add the number of
failures and retries if max retry is not 1.
* About reporting
I think that too much is reported. I advised to do that, but nevertheless
it is a little bit steep.
At least, it should not report the number of tries/attempts when the max
number is one. Simple counting should be reported for failures, not
floats...
I would suggest a more compact one-line report about failures:
"number of failures: 12 (0.001%, deadlock: 7, serialization: 5)"
* About the TAP tests
They are too expensive, with 3 initdb. I think that they should be
integrated in the existing tests, as a patch has been submitted to rework
the whole pgbench tap test infrastructure.
For now, at most one initdb and several small tests inside.
* About the documentation
I'm not sure that the feature needs pre-eminence in the documentation,
because most of the time there is no retry as none is needed and there is
no failure, so this is rather a special (although useful) case for people
playing with serializable and other advanced features.
Smaller updates, without dedicated examples, should be enough.
If a transaction is skipped, there were no tries, so the corresponding
number of attempts is 0, not one.
--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
LastBeginState -> RetryState? I'm not sure why this state is a pointer in
CState. Putting the struct would avoid malloc/free cycles. Index "-1" may be
used to tell it is not set if necessary.
Another detail I forgot about this point: there may be a memory leak on
variables copies, ISTM that the "variables" array is never freed.
I was not convinced by the overall memory management around variables to
begin with, and it is even less so with their new copy management. Maybe
having a clean "Variables" data structure could help improve the
situation.
--
Fabien.
Here is a round of comments on the current version of the patch:
Thank you very much again!
There is a latent issue about what is a transaction. For pgbench a
transaction is a full script execution.
For postgresql, it is a statement or a BEGIN/END block, several of
which may appear in a script. From a retry
perspective, you may retry from a SAVEPOINT within a BEGIN/END
block... I'm not sure how to make general sense
of all this, so this is just a comment without attached action for now.
Yes it is. That's why I wrote several notes about it in documentation
where there may be a misunderstanding:
+ Transactions with serialization or deadlock failures (or with both
+ of them if used script contains several transactions; see
+ <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"> for more information) are
+ marked separately and their time is not reported as for skipped
+ transactions.
+ <refsect2 id="transactions-and-scripts">
+ <title id="transactions-and-scripts-title">What is the <quote>Transaction</> Actually Performed in <application>pgbench</application>?</title>
+ If a transaction has serialization and/or deadlock failures, its
+ <replaceable>time</> will be reported as <literal>serialization failure</>,
+ <literal>deadlock failure</>, or
+ <literal>serialization and deadlock failures</>, respectively.
</para>
+ <note>
+ <para>
+ Transactions can have both serialization and deadlock failures if the
+ used script contained several transactions. See
+ <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"> for more information.
+ </para>
+ </note>
+ <note>
+ <para>
+ The number of transaction attempts within the interval can be greater than
+ the number of transactions within this interval multiplied by the maximum
+ number of attempts. See <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"> for more information.
+ </para>
+ </note>
+ <note>
+ <para>The total sum of per-command failures of each type can be greater
+ than the number of transactions with reported failures.
+ See <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"> for more information.
+ </para>
+ </note>
And I didn't make rollbacks to savepoints after the failure because they
cannot help for serialization failures at all: after a rollback to
savepoint a new attempt will always be unsuccessful.
I would consider using "try/tries" instead of "attempt/attempts" as it
is shorter. An English native speaker
opinion would be welcome on that point.
Thank you, I'll change it.
I'm fine with renaming "is_latencies" to "report_per_command", which
is more logical & generic.
Glad to hear it!
"max_attempt_number": I'm against typing fields again in their name,
aka "hungarian naming". I'd suggest
"max_tries" or "max_attempts".
Ok!
"SimpleStats attempts": I disagree with using this floating poiunt
oriented structures to count integers.
I would suggest "int64 tries" instead, which should be enough for the
purpose.
I'm not sure that it is enough. Firstly, there may be several
transactions in the script, so to count the average number of attempts
you should know the total number of run transactions. Secondly, I think
that the stddev of the number of attempts can be quite interesting and
often it is not close to zero.
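For illustration, the kind of running accumulator this implies can be sketched as below; RetryStats and its functions are hypothetical names, not pgbench's actual SimpleStats (which also tracks min and max), and stddev would be the square root of the variance shown here:

```c
/* Hypothetical accumulator for the number of tries per transaction,
 * mirroring the idea of pgbench's SimpleStats; names are illustrative. */
#include <assert.h>

typedef struct RetryStats
{
	long		count;			/* number of finished transactions */
	double		sum;			/* sum of tries */
	double		sum2;			/* sum of squared tries */
} RetryStats;

static void
retry_stats_add(RetryStats *s, double tries)
{
	s->count++;
	s->sum += tries;
	s->sum2 += tries * tries;
}

static double
retry_stats_mean(const RetryStats *s)
{
	return s->count > 0 ? s->sum / s->count : 0.0;
}

/* Population variance; stddev is sqrt() of this value. */
static double
retry_stats_variance(const RetryStats *s)
{
	double		mean = retry_stats_mean(s);

	if (s->count == 0)
		return 0.0;
	return s->sum2 / s->count - mean * mean;
}
```

This shows why a bare int64 is not enough if both the average and the spread of the attempt counts are to be reported.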
LastBeginState -> RetryState? I'm not sure why this state is a pointer
in CState. Putting the struct would avoid malloc/free cycles. Index
"-1" may be used to tell it is not set if necessary.
Thanks, I agree that it's better to do in this way.
"CSTATE_RETRY_FAILED_TRANSACTION" -> "CSTATE_RETRY" is simpler and
clear enough.
Ok!
In CState and some code, a failure is a failure, maybe one boolean
would be enough. It need only be differentiated when counting, and you
have (deadlock_failure || serialization_failure) everywhere.
I agree with you. I'll change it.
Some variables, such as "int attempt_number", should be in the client
structure, not in the client? Generally, try to use block variables if
possible to keep the state clearly disjoints. If there could be NO new
variable at the doCustom level that would be great, because that would
ensure that there is no machine state mixup hidden in these variables.
Do you mean the code cleanup for doCustom function? Because if I do so
there will be two code styles for state blocks and their variables in
this function..
I'm wondering whether the RETRY & FAILURE states could/should be merged:
on RETRY:
-> count retry
-> actually retry if < max_tries (reset client state, jump to
command)
-> else count failure and skip to end of script
The start and end of transaction detection seem expensive (malloc,
...) and assume a one statement per command (what about "BEGIN \; ...
\; COMMIT;", which is not necessarily the case, this limitation should
be documented. ISTM that the space normalization should be avoided,
and something simpler/lighter should be devised? Possibly it should
consider handling SAVEPOINT.
I divided these states because if there's a failed transaction block you
should end it before retrying. It means to go to states
CSTATE_START_COMMAND -> CSTATE_WAIT_RESULT -> CSTATE_END_COMMAND with
the appropriate command. How do you propose not to go to these states?
About malloc - I agree with you that it should be done without
malloc/free.
About savepoints - as I wrote earlier, I didn't make rollbacks to
savepoints after the failure, because they cannot help for serialization
failures at all: after a rollback to savepoint a new attempt will always
be unsuccessful.
I disagree about exit in ParseScript if the transaction block is not
completed, especially as it misses out on combined statements/queries
(BEGIN \; stuff... \; COMMIT") and would break an existing feature.
Thanks, I'll fix it for usual transaction blocks that don't end in the
scripts.
There are strange characters things in comments, eg "??ontinuous".
Oh, I'm sorry. I'll fix it too.
Option "max-attempt-number" -> "max-tries"
I would put the client random state initialization with the state
intialization, not with the connection.
* About tracing
Progress is expected to be short, not detailed. Only add the number of
failures and retries if max retry is not 1.
Ok!
* About reporting
I think that too much is reported. I advised to do that, but
nevertheless it is a little bit steep.
At least, it should not report the number of tries/attempts when the
max number is one.
Ok!
Simple counting should be reported for failures, not floats...
I would suggest a more compact one-line report about failures:
"number of failures: 12 (0.001%, deadlock: 7, serialization: 5)"
I think there may be a misunderstanding, because a script can contain
several transactions and get both failures.
* About the TAP tests
They are too expensive, with 3 initdb. I think that they should be
integrated in the existing tests, as a patch has been submitted to
rework the whole pgbench tap test infrastructure.
For now, at most one initdb and several small tests inside.
Ok!
* About the documentation
I'm not sure that the feature needs pre-eminence in the documentation,
because most of the time there is no retry as none is needed and there
is no failure, so this is rather a special (although useful) case for
people playing with serializable and other advanced features.
Smaller updates, without dedicated examples, should be enough.
Maybe there should be some examples to prepare people for what they can
see in the output of the program? Of course, now failures are special
cases because they disconnect their clients until the end of the program
and ruin all the results. I hope that if this patch is committed there
will be many more cases with retried failures.
If a transaction is skipped, there were no tries, so the corresponding
number of attempts is 0, not one.
Oh, I'm sorry, it is a typo in the documentation.
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Another detail I forgot about this point: there may be a memory leak
on variables copies, ISTM that the "variables" array is never freed.
I was not convinced by the overall memory management around variables
to begin with, and it is even less so with their new copy management.
Maybe having a clean "Variables" data structure could help improve the
situation.
Ok!
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello,
[...] I didn't make rollbacks to savepoints after the failure because
they cannot help for serialization failures at all: after rollback to
savepoint a new attempt will be always unsuccessful.
Not necessarily? It depends on where the locks triggering the issue are
set, if they are all set after the savepoint it could work on a second
attempt.
"SimpleStats attempts": I disagree with using this floating point
oriented structure to count integers. I would suggest "int64 tries"
instead, which should be enough for the purpose.
I'm not sure that it is enough. Firstly, there may be several transactions
in the script, so to count the average number of attempts you should know
the total number of run transactions. Secondly, I think that the stddev of
the number of attempts can be quite interesting and often it is not close
to zero.
I would prefer to have a real motivation to add this complexity in the
report and in the code. Without that, a simple int seems better for now.
It can be improved later if the need really arises.
Some variables, such as "int attempt_number", should be in the client
structure, not in the client? Generally, try to use block variables if
possible to keep the states clearly disjoint. If there could be NO new
variable at the doCustom level that would be great, because that would
ensure that there is no machine state mixup hidden in these variables.
Do you mean the code cleanup for the doCustom function? Because if I do
so there will be two code styles for state blocks and their variables in
this function.
I think that any variable shared between states is a recipe for bugs if
it is not reset properly, so they should be avoided. Maybe there are
already too many of them; then too bad, but that is not a reason to add
more. The status before the automaton was a nightmare.
I wondering whether the RETRY & FAILURE states could/should be merged:
I divided these states because if there's a failed transaction block you
should end it before retrying.
Hmmm. Maybe I'm wrong. I'll think about it.
I would suggest a more compact one-line report about failures:
"number of failures: 12 (0.001%, deadlock: 7, serialization: 5)"
I think, there may be a misunderstanding. Because script can contain several
transactions and get both failures.
I do not understand. Both failure numbers are on the compact line I
suggested.
--
Fabien.
I was not convinced by the overall memory management around variables
to begin with, and it is even less so with their new copy management.
Maybe having a clean "Variables" data structure could help improve the
situation.
Ok!
Note that there is something for psql (src/bin/psql/variable.c) which may
or may not be shared. It should be checked before recoding eventually the
same thing.
--
Fabien.
On 13-07-2017 19:32, Fabien COELHO wrote:
Hello,
Hi!
[...] I didn't make rollbacks to savepoints after the failure because
they cannot help for serialization failures at all: after rollback to
savepoint a new attempt will be always unsuccessful.
Not necessarily? It depends on where the locks triggering the issue
are set, if they are all set after the savepoint it could work on a
second attempt.
Don't you mean the deadlock failures, where a rollback to savepoint can
really help? And could you, please, give an example where a rollback to
savepoint can help to end its subtransaction successfully after a
serialization failure?
"SimpleStats attempts": I disagree with using this floating point
oriented structure to count integers. I would suggest "int64 tries"
instead, which should be enough for the purpose.
I'm not sure that it is enough. Firstly, there may be several transactions
in the script, so to count the average number of attempts you should know
the total number of run transactions. Secondly, I think that the stddev of
the number of attempts can be quite interesting and often it is not close
to zero.
I would prefer to have a real motivation to add this complexity in the
report and in the code. Without that, a simple int seems better for
now. It can be improved later if the need really arises.
Ok!
Some variables, such as "int attempt_number", should be in the client
structure, not in the client? Generally, try to use block variables if
possible to keep the states clearly disjoint. If there could be NO new
variable at the doCustom level that would be great, because that would
ensure that there is no machine state mixup hidden in these variables.
Do you mean the code cleanup for the doCustom function? Because if I do
so there will be two code styles for state blocks and their variables in
this function.
I think that any variable shared between states is a recipe for bugs
if it is not reset properly, so they should be avoided. Maybe there
are already too many of them; then too bad, but that is not a reason to
add more. The status before the automaton was a nightmare.
Ok!
I would suggest a more compact one-line report about failures:
"number of failures: 12 (0.001%, deadlock: 7, serialization: 5)"
I think there may be a misunderstanding, because a script can contain
several transactions and get both failures.
I do not understand. Both failure numbers are on the compact line I
suggested.
I mean that the sum of transactions with a serialization failure and
transactions with a deadlock failure can be greater than the total number
of transactions with failures. But if you think it's ok I'll change it
and write the appropriate note in the documentation.
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
I was not convinced by the overall memory management around variables
to begin with, and it is even less so with their new copy management.
Maybe having a clean "Variables" data structure could help improve the
situation.
Note that there is something for psql (src/bin/psql/variable.c) which
may or may not be shared. It should be checked before recoding
eventually the same thing.
Thank you very much for pointing out this file! As far as I checked, this
is a different structure: it is a simple list, while in pgbench we should
know whether the list is sorted and the number of elements in the list.
What do you think: is it a good idea to name the variables structure in
pgbench in the same way (VariableSpace), or should it be different to
avoid confusion (Variables, for example)?
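One possible shape for such a pgbench-side container is sketched below, with a sorted flag plus an element count as described; all names here are illustrative, not the patch's actual definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch of a "Variables" container for pgbench. */
typedef struct Variable
{
	char	   *name;
	char	   *value;
} Variable;

typedef struct Variables
{
	Variable   *vars;			/* array of variables */
	int			nvars;			/* number of elements in vars */
	bool		vars_sorted;	/* is vars currently sorted by name? */
} Variables;

/* Linear lookup; a real implementation could sort lazily when
 * vars_sorted is false and then binary-search. */
static Variable *
lookup_variable(Variables *vs, const char *name)
{
	int			i;

	for (i = 0; i < vs->nvars; i++)
		if (strcmp(vs->vars[i].name, name) == 0)
			return &vs->vars[i];
	return NULL;
}
```

Keeping the count and the sorted flag inside the struct is what distinguishes this from psql's plain linked list of variables.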
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Marina,
Not necessarily? It depends on where the locks triggering the issue
are set, if they are all set after the savepoint it could work on a
second attempt.
Don't you mean the deadlock failures where can really help rollback to
Yes, I mean deadlock failures can rollback to a savepoint and work on
a second attempt.
And could you, please, give an example where a rollback to savepoint can
help to end its subtransaction successfully after a serialization
failure?
I do not know whether this is possible with serialization failures.
It might be if the stuff before and after the savepoint are somehow
unrelated...
[...] I mean that the sum of transactions with serialization failure and
transactions with deadlock failure can be greater than the total number
of transactions with failures.
Hmmm. Ok.
A "failure" is a transaction (in the sense of pgbench) that could not
make it to the end, even after retries. If there is a rollback and then
a retry which works, it is not a failure.
Now deadlock or serialization errors, which trigger retries, are worth
counting as well, although they are not "failures". So my format proposal
was over optimistic, and the number of deadlocks and serializations
should rather be on a retry count line.
Maybe something like:
...
number of failures: 12 (0.004%)
number of retries: 64 (deadlocks: 29, serialization: 35)
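Assuming illustrative counter names (format_failure_report is a hypothetical helper, not patch code), those two lines could be produced with something like:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper producing the two report lines suggested above;
 * the counter names are illustrative, not the patch's variables. */
static int
format_failure_report(char *buf, size_t sz, long failures, long xacts,
					  long deadlock_retries, long serialization_retries)
{
	return snprintf(buf, sz,
					"number of failures: %ld (%.3f%%)\n"
					"number of retries: %ld (deadlocks: %ld, serialization: %ld)\n",
					failures,
					xacts > 0 ? 100.0 * failures / xacts : 0.0,
					deadlock_retries + serialization_retries,
					deadlock_retries, serialization_retries);
}
```

For example, 12 failures out of 300000 transactions with 29 deadlock and 35 serialization retries would render exactly the sample lines above.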
--
Fabien.
Note that there is something for psql (src/bin/psql/variable.c) which
may or may not be shared. It should be checked before recoding
eventually the same thing.
Thank you very much for pointing out this file! As far as I checked, this
is a different structure: it is a simple list, while in pgbench we should
know whether the list is sorted and the number of elements in the list.
What do you think: is it a good idea to name the variables structure in
pgbench in the same way (VariableSpace), or should it be different to
avoid confusion (Variables, for example)?
Given that the number of variables of a pgbench script is expected to be
pretty small, I'm not sure that the sorting stuff is worth the effort.
My suggestion is really to look at both implementations and to answer the
question "should pgbench share its variable implementation with psql?".
If the answer is yes, then the relevant part of the implementation should
be moved to fe_utils, and that's it.
If the answer is no, then implement something in pgbench directly.
--
Fabien.
Not necessarily? It depends on where the locks triggering the issue
are set, if they are all set after the savepoint it could work on a
second attempt.
Don't you mean the deadlock failures where can really help rollback to
Yes, I mean deadlock failures can rollback to a savepoint and work on
a second attempt.
And could you, please, give an example where a rollback to savepoint
can help to end its subtransaction successfully after a serialization
failure?
I do not know whether this is possible with serialization failures.
It might be if the stuff before and after the savepoint are somehow
unrelated...
If you mean, for example, the updates of different tables - a rollback
to savepoint doesn't help.
And I'm not sure that we should do all the stuff for savepoint
rollbacks because:
- as I see it now, it only makes sense for the deadlock failures;
- if there's a failure, which savepoint should we roll back to and
restart the execution from? Maybe go to the last one, and if that is not
successful go to the previous one, etc.
Retrying the entire transaction may take less time..
[...] I mean that the sum of transactions with serialization failure
and transactions with deadlock failure can be greater than the total
number of transactions with failures.
Hmmm. Ok.
A "failure" is a transaction (in the sense of pgbench) that could not
make it to the end, even after retries. If there is a rollback and then
a retry which works, it is not a failure.
Now deadlock or serialization errors, which trigger retries, are worth
counting as well, although they are not "failures". So my format
proposal was over optimistic, and the number of deadlocks and
serializations should rather be on a retry count line.
Maybe something like:
...
number of failures: 12 (0.004%)
number of retries: 64 (deadlocks: 29, serialization: 35)
Ok! How do you like the idea of using the same format (the total number
of transactions with failures and the number of retries for each failure
type) in other places (log, aggregation log, progress) if the values are
not "default" (= no failures and no retries)?
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Given that the number of variables of a pgbench script is expected to
be pretty small, I'm not sure that the sorting stuff is worth the
effort.
I think it is good insurance if there are many variables.
My suggestion is really to look at both implementations and to answer
the question "should pgbench share its variable implementation with
psql?".
If the answer is yes, then the relevant part of the implementation
should be moved to fe_utils, and that's it.
If the answer is no, then implement something in pgbench directly.
The structure of variables is different, the container structure of the
variables is different, so I think that the answer is no.
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
If the answer is no, then implement something in pgbench directly.
The structure of variables is different, the container structure of the
variables is different, so I think that the answer is no.
Ok, fine. My point was just to check before proceeding.
--
Fabien.
And I'm not sure that we should do all the stuff for savepoints rollbacks
because:
- as I see it now it only makes sense for the deadlock failures;
- if there's a failure what savepoint we should rollback to and start the
execution again?
ISTM that this is the point of having a savepoint in the first place: the
ability to restart the transaction at that point if something failed?
Maybe to go to the last one, if it is not successful go to the previous
one etc. Retrying the entire transaction may take less time..
Well, I do not know about that. My 0.02€ is that if there was a savepoint
then this is the natural restarting point of a transaction which has some
recoverable error.
Well, the short version may be to only do a full transaction retry and to
document that for now savepoints are not handled, and to let that for
future work if need arises.
Maybe something like:
...
number of failures: 12 (0.004%)
number of retries: 64 (deadlocks: 29, serialization: 35)
Ok! How do you like the idea of using the same format (the total number
of transactions with failures and the number of retries for each failure
type) in other places (log, aggregation log, progress) if the values are
not "default" (= no failures and no retries)?
For progress the output must be short and readable, and probably we do not
care about whether retries came from this or that, so I would let that
out.
For log and aggregated log possibly that would make more sense, but it
must stay easy to parse.
--
Fabien.
Ok, fine. My point was just to check before proceeding.
And I'm very grateful for that :)
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Well, the short version may be to only do a full transaction retry and
to document that for now savepoints are not handled, and to leave that
for future work if the need arises.
I agree with you.
For progress the output must be short and readable, and probably we do
not care about whether retries came from this or that, so I would let
that out.
For log and aggregated log possibly that would make more sense, but it
must stay easy to parse.
Ok!
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello again!
Here is the third version of the patch for pgbench, thanks to Fabien
Coelho's comments. As in the previous one, transactions with
serialization and deadlock failures are rolled back and retried until
they end successfully or their number of tries reaches the maximum.
Differences from the previous version:
* Some code cleanup :) In particular, the Variables structure for
managing client variables and only one new TAP tests file (as they were
recommended here [1] and here [2]).
* There's no error if the last transaction in the script is not
completed. But transactions started in previous scripts and/or not
ending in the current script are not rolled back and retried after a
failure. Such a script try is reported as failed because it contains a
failure that was not rolled back and retried.
* Usually the retries and/or failures are printed if they are not equal
to zero. In transaction/aggregation logs the failures are always
printed, and the retries are printed if max_tries is greater than 1.
This is done to keep the general format of the log consistent during
the execution of the program.
Patch is attached. Any suggestions are welcome!
[1]: /messages/by-id/alpine.DEB.2.20.1707121338090.12795@lancre
[2]: /messages/by-id/alpine.DEB.2.20.1707121142300.12795@lancre
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
- v3-0001-Pgbench-Retry-transactions-with-serialization-or-.patch (text/x-diff)
From 0ee37aaaa2e93b8d7017563d2f2f55357c39c08a Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Fri, 21 Jul 2017 17:57:58 +0300
Subject: [PATCH v3] Pgbench Retry transactions with serialization or deadlock
errors
Now transactions with serialization or deadlock failures can be rolled back and
retried until they end successfully or their number of tries reaches the
maximum. You can set the maximum number of tries by using the appropriate
benchmarking option (--max-tries). The default value is 1. If there are
retries and/or failures, their statistics are printed in the progress report,
in the transaction / aggregation logs and at the end with the other results
(overall and for each script). A transaction failure is reported here only if
the last try of this transaction fails. Retries and/or failures are also
printed per command with average latencies if you use the appropriate
benchmarking option (--report-per-command, -r) and the total number of retries
and/or failures is not zero.
Note that transactions started in previous scripts and/or not ended in the
current script are not rolled back and retried after a failure. Such a script
run is reported as failed because it contains a failure that was not rolled
back and retried.
---
doc/src/sgml/ref/pgbench.sgml | 240 +++++-
src/bin/pgbench/pgbench.c | 872 +++++++++++++++++----
.../t/002_serialization_and_deadlock_failures.pl | 459 +++++++++++
3 files changed, 1412 insertions(+), 159 deletions(-)
create mode 100644 src/bin/pgbench/t/002_serialization_and_deadlock_failures.pl
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 64b043b..3bbeec5 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -49,6 +49,7 @@
<screen>
transaction type: <builtin: TPC-B (sort of)>
+transaction maximum tries number: 1
scaling factor: 10
query mode: simple
number of clients: 10
@@ -59,7 +60,7 @@ tps = 85.184871 (including connections establishing)
tps = 85.296346 (excluding connections establishing)
</screen>
- The first six lines report some of the most important parameter
+ The first seven lines report some of the most important parameter
settings. The next line reports the number of transactions completed
and intended (the latter being just the product of number of clients
and number of transactions per client); these will be equal unless the run
@@ -436,22 +437,33 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
Show progress report every <replaceable>sec</> seconds. The report
includes the time since the beginning of the run, the tps since the
last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</>),
- the latency is computed with respect to the transaction scheduled
- start time, not the actual transaction beginning time, thus it also
- includes the average schedule lag time.
+ deviation since the last report. If since the last report there are
+ transactions that ended with serialization/deadlock failures they are
+ also reported here as failed (see
+ <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"> for more information). Under
+ throttling (<option>-R</>), the latency is computed with respect to the
+ transaction scheduled start time, not the actual transaction beginning
+ time, thus it also includes the average schedule lag time. If since the
+ last report there are transactions that have been rolled back and retried
+ after a serialization/deadlock failure, the report includes the
+ number of retries of all such transactions (use option
+ <option>--max-tries</> to make it possible).
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-r</option></term>
- <term><option>--report-latencies</option></term>
+ <term><option>--report-per-command</option></term>
<listitem>
<para>
- Report the average per-statement latency (execution time from the
- perspective of the client) of each command after the benchmark
- finishes. See below for details.
+ Report the following statistics for each command after the benchmark
+ finishes: the average per-statement latency (execution time from the
+ perspective of the client), the number of serialization failures and
+ retries, the number of deadlock failures and retries. Note that the
+ report contains failures only if the total number of failures for all
+ scripts is not zero; the same applies to retries. See below for details.
</para>
</listitem>
</varlistentry>
@@ -496,6 +508,15 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
</para>
<para>
+ Transactions with serialization or deadlock failures (or with both
+ of them if the used script contains several transactions; see
+ <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"> for more information) are
+ marked separately and, as for skipped transactions, their time is not
+ reported.
+ </para>
+
+ <para>
A high schedule lag time is an indication that the system cannot
process transactions at the specified rate, with the chosen number of
clients and threads. When the average transaction execution time is
@@ -590,6 +611,32 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
</varlistentry>
<varlistentry>
+ <term><option>--max-tries=<replaceable>tries_number</></option></term>
+ <listitem>
+ <para>
+ Set the maximum number of tries for transactions with
+ serialization/deadlock failures. Default is 1.
+ </para>
+ <note>
+ <para>
+ Be careful if you want to retry transactions with shell commands
+ inside. Unlike SQL commands, the results of shell commands are not
+ rolled back, except for the variable value of the
+ <command>\setshell</command> command. If a shell command fails, its
+ client is aborted without restarting.
+ </para>
+ </note>
+ <note>
+ <para>
+ Transactions started in previous scripts and/or not ended in the
+ current script are not rolled back and retried after a failure.
+ Such a script run is reported as failed.
+ </para>
+ </note>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--progress-timestamp</option></term>
<listitem>
<para>
@@ -693,8 +740,8 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
<refsect1>
<title>Notes</title>
- <refsect2>
- <title>What is the <quote>Transaction</> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts">
+ <title id="transactions-and-scripts-title">What is the <quote>Transaction</> Actually Performed in <application>pgbench</application>?</title>
<para>
<application>pgbench</> executes test scripts chosen randomly
@@ -1148,7 +1195,7 @@ END;
The format of the log is:
<synopsis>
-<replaceable>client_id</> <replaceable>transaction_no</> <replaceable>time</> <replaceable>script_no</> <replaceable>time_epoch</> <replaceable>time_us</> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</> <replaceable>transaction_no</> <replaceable>time</> <replaceable>script_no</> <replaceable>time_epoch</> <replaceable>time_us</> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>serialization_retries</replaceable> <replaceable>deadlock_retries</replaceable> </optional>
</synopsis>
where
@@ -1169,6 +1216,14 @@ END;
When both <option>--rate</> and <option>--latency-limit</> are used,
the <replaceable>time</> for a skipped transaction will be reported as
<literal>skipped</>.
+ <replaceable>serialization_retries</> and <replaceable>deadlock_retries</>
+ are the sums of all the retries after the corresponding failures during the
+ current script execution. They are only present when the maximum number of
+ tries for transactions is more than 1 (<option>--max-tries</>).
+ If the transaction ended with a serialization/deadlock failure, its
+ <replaceable>time</> will be reported as <literal>failed</> (see
+ <xref linkend="failures-and-retries" endterm="failures-and-retries-title">
+ for more information).
</para>
<para>
@@ -1198,6 +1253,22 @@ END;
</para>
<para>
+ Example with failures and retries (the maximum number of tries is 10):
+<screen>
+3 0 47423 0 1499414498 34501 4 0
+3 1 8333 0 1499414498 42848 1 0
+3 2 8358 0 1499414498 51219 1 0
+4 0 72345 0 1499414498 59433 7 0
+1 3 41718 0 1499414498 67879 5 0
+1 4 8416 0 1499414498 76311 1 0
+3 3 33235 0 1499414498 84469 4 0
+0 0 failed 0 1499414498 84905 10 0
+2 0 failed 0 1499414498 86248 10 0
+3 4 8307 0 1499414498 92788 1 0
+</screen>
+ </para>
+
+ <para>
When running a long test on hardware that can handle a lot of transactions,
the log files can become very large. The <option>--sampling-rate</> option
can be used to log only a random sample of transactions.
@@ -1212,7 +1283,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</> <replaceable>num_transactions</> <replaceable>sum_latency</> <replaceable>sum_latency_2</> <replaceable>min_latency</> <replaceable>max_latency</> <optional> <replaceable>sum_lag</> <replaceable>sum_lag_2</> <replaceable>min_lag</> <replaceable>max_lag</> <optional> <replaceable>skipped</> </optional> </optional>
+<replaceable>interval_start</> <replaceable>num_transactions</> <replaceable>sum_latency</> <replaceable>sum_latency_2</> <replaceable>min_latency</> <replaceable>max_latency</> <replaceable>failures</> <optional> <replaceable>sum_lag</> <replaceable>sum_lag_2</> <replaceable>min_lag</> <replaceable>max_lag</> <optional> <replaceable>skipped</> </optional> </optional> <optional> <replaceable>serialization_retries</> <replaceable>deadlock_retries</> </optional>
</synopsis>
where
@@ -1226,7 +1297,11 @@ END;
transaction latencies within the interval,
<replaceable>min_latency</> is the minimum latency within the interval,
and
- <replaceable>max_latency</> is the maximum latency within the interval.
+ <replaceable>max_latency</> is the maximum latency within the interval,
+ <replaceable>failures</> is the number of transactions ended with
+ serialization/deadlock failures within the interval (see
+ <xref linkend="failures-and-retries" endterm="failures-and-retries-title">
+ for more information).
The next fields,
<replaceable>sum_lag</>, <replaceable>sum_lag_2</>, <replaceable>min_lag</>,
and <replaceable>max_lag</>, are only present if the <option>--rate</>
@@ -1234,21 +1309,26 @@ END;
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e. the difference between each transaction's
scheduled start time and the time it actually started.
- The very last field, <replaceable>skipped</>,
+ The next field, <replaceable>skipped</>,
is only present if the <option>--latency-limit</> option is used, too.
It counts the number of transactions skipped because they would have
started too late.
+ The very last fields, <replaceable>serialization_retries</> and
+ <replaceable>deadlock_retries</>, are the sums of all the retries after the
+ corresponding failures within the interval. They are only present when the
+ maximum number of tries for transactions is more than 1
+ (<option>--max-tries</>).
Each transaction is counted in the interval when it was committed.
</para>
<para>
Here is some example output:
<screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0
+1345828503 7884 1979812 565806736 60 1479 0
+1345828505 7208 1979422 567277552 59 1391 0
+1345828507 7685 1980268 569784714 60 1398 0
+1345828509 7073 1979779 573489941 236 1411 0
</screen></para>
<para>
@@ -1260,13 +1340,51 @@ END;
</refsect2>
<refsect2>
- <title>Per-Statement Latencies</title>
+ <title>Per-Statement Report</title>
<para>
- With the <option>-r</> option, <application>pgbench</> collects
- the elapsed transaction time of each statement executed by every
- client. It then reports an average of those values, referred to
- as the latency for each statement, after the benchmark has finished.
+ With the <option>-r</> option, <application>pgbench</> collects the following
+ statistics for each statement:
+ <itemizedlist>
+ <listitem>
+ <para>
+ the elapsed transaction time of each statement; <application>pgbench</>
+ reports an average of those values, referred to as the latency for each
+ statement;
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ the number of serialization and deadlock failures that were not retried
+ (see <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"> for more information);
+ </para>
+ <note>
+ <para>The total sum of per-command failures can be greater than the
+ number of failed transactions. See
+ <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"> for more information.
+ </para>
+ </note>
+ </listitem>
+ <listitem>
+ <para>
+ the number of retries when there was a serialization/deadlock failure
+ in this command; they are reported as serialization/deadlock retries,
+ respectively.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <note>
+ <para>
+ The report contains failures only if the total number of failures for all
+ scripts is not zero; the same applies to retries.
+ </para>
+ </note>
+
+ All values are computed for each statement executed by every client and are
+ reported after the benchmark has finished.
</para>
<para>
@@ -1274,6 +1392,7 @@ END;
<screen>
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
+transaction maximum tries number: 1
scaling factor: 1
query mode: simple
number of clients: 10
@@ -1298,10 +1417,51 @@ script statistics:
0.371 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
1.212 END;
</screen>
+
+ Another example of output for the default script using serializable default
+ transaction isolation level (<command>PGOPTIONS='-c
+ default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: <builtin: TPC-B (sort of)>
+transaction maximum tries number: 100
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+number of transactions per client: 1000
+number of transactions actually processed: 10000/10000
+number of failures: 3493 (34.930 %)
+number of retries: 449743 (serialization: 449743, deadlocks: 0)
+latency average = 211.539 ms
+latency stddev = 354.318 ms
+tps = 29.310488 (including connections establishing)
+tps = 29.310885 (excluding connections establishing)
+script statistics:
+ - statement latencies in milliseconds, serialization failures and retries,
+ deadlock failures and retries:
+ 0.004 0 0 0 0 \set aid random(1, 100000 * :scale)
+ 0.001 0 0 0 0 \set bid random(1, 1 * :scale)
+ 0.001 0 0 0 0 \set tid random(1, 10 * :scale)
+ 0.001 0 0 0 0 \set delta random(-5000, 5000)
+ 0.452 0 0 0 0 BEGIN;
+ 1.080 0 1 0 0 UPDATE pgbench_accounts
+ SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.853 0 1 0 0 SELECT abalance FROM pgbench_accounts
+ WHERE aid = :aid;
+ 1.028 3455 436867 0 0 UPDATE pgbench_tellers
+ SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 0.860 38 12836 0 0 UPDATE pgbench_branches
+ SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 1.027 0 0 0 0 INSERT INTO pgbench_history
+ (tid, bid, aid, delta, mtime)
+ VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.147 0 38 0 0 END;
+</screen>
</para>
<para>
- If multiple script files are specified, the averages are reported
+ If multiple script files are specified, all statistics are reported
separately for each script file.
</para>
@@ -1315,6 +1475,34 @@ script statistics:
</para>
</refsect2>
+ <refsect2 id="failures-and-retries">
+ <title id="failures-and-retries-title">Serialization/Deadlock Failures and Retries</title>
+
+ <para>
+ Transactions with serialization or deadlock failures are rolled back and
+ retried until they end successfully or their number of tries reaches the
+ maximum (to change this maximum see the appropriate benchmarking
+ option <option>--max-tries</>). If the last try of a transaction fails, this
+ transaction is reported as failed. Note that transactions started in
+ previous scripts and/or not ended in the current script are not rolled
+ back and retried after a failure. Such a script run is reported as failed
+ because it contains a failure that was not rolled back and retried.
+ Latencies are not computed for failed transactions and commands. The
+ latency of a successful transaction includes the entire time of its
+ execution, including rollbacks and retries.
+ </para>
+
+ <para>
+ The main report contains the number of failed transactions if it is not zero
+ (see <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"> for more information). If the total
+ number of retries is not zero, the main report also contains it and the
+ number of retries after each kind of failure (use the option
+ <option>--max-tries</> to enable retries). The per-statement report
+ contains failures only if the main report does too; the same applies to
+ retries.
+ </para>
+ </refsect2>
+
<refsect2>
<title>Good Practices</title>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4d364a1..0a5a0d9 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -58,6 +58,9 @@
#include "pgbench.h"
+#define ERRCODE_IN_FAILED_SQL_TRANSACTION "25P02"
+#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"
#define ERRCODE_UNDEFINED_TABLE "42P01"
/*
@@ -174,8 +177,12 @@ bool progress_timestamp = false; /* progress report with Unix time */
int nclients = 1; /* number of clients */
int nthreads = 1; /* number of threads */
bool is_connect; /* establish connection for each transaction */
-bool is_latencies; /* report per-command latencies */
+bool report_per_command = false; /* report per-command latencies, retries
+ * and failures that were not retried */
int main_pid; /* main process id used in log filename */
+int max_tries = 1; /* maximum number of tries to run the
+ * transaction with serialization or deadlock
+ * failures */
char *pghost = "";
char *pgport = "";
@@ -223,6 +230,16 @@ typedef struct SimpleStats
} SimpleStats;
/*
+ * Data structure to hold retries after failures.
+ */
+typedef struct Retries
+{
+ int64 serialization; /* number of retries after serialization
+ * failures */
+ int64 deadlocks; /* number of retries after deadlock failures */
+} Retries;
+
+/*
* Data structure to hold various statistics: per-thread and per-script stats
* are maintained and merged together.
*/
@@ -232,11 +249,43 @@ typedef struct StatsData
int64 cnt; /* number of transactions */
int64 skipped; /* number of transactions skipped under --rate
* and --latency-limit */
+ Retries retries;
+ int64 failures; /* number of transactions that were not retried
+ * after a serialization or a deadlock
+ * failure */
SimpleStats latency;
SimpleStats lag;
} StatsData;
/*
+ * Data structure for client variables.
+ */
+typedef struct Variables
+{
+ Variable *array; /* array of variable definitions */
+ int nvariables; /* number of variables */
+ bool vars_sorted; /* are variables sorted by name? */
+} Variables;
+
+/*
+ * Data structure for repeating a transaction from the beginning with the same
+ * parameters.
+ */
+typedef struct RetryState
+{
+ /*
+ * Command number in script; -1 if there have been no transactions yet or
+ * we are continuing a transaction block from previous scripts
+ */
+ int command;
+
+ int retries;
+
+ unsigned short random_state[3]; /* random seed */
+ Variables variables; /* client variables */
+} RetryState;
+
+/*
* Connection state machine states.
*/
typedef enum
@@ -287,6 +336,19 @@ typedef enum
CSTATE_END_COMMAND,
/*
+ * States for transactions with serialization or deadlock failures.
+ *
+ * First, report the failure in CSTATE_FAILURE. Then, if we need to end the
+ * failed transaction block, go to states CSTATE_START_COMMAND ->
+ * CSTATE_WAIT_RESULT -> CSTATE_END_COMMAND with the appropriate command.
+ * After that, go to CSTATE_RETRY. If we can repeat the failed transaction,
+ * set the same parameters for the transaction execution as in the previous
+ * tries. Otherwise, go to the next command after the failed transaction.
+ */
+ CSTATE_FAILURE,
+ CSTATE_RETRY,
+
+ /*
* CSTATE_END_TX performs end-of-transaction processing. Calculates
* latency, and logs the transaction. In --connect mode, closes the
* current connection. Chooses the next script to execute and starts over
@@ -311,14 +373,13 @@ typedef struct
PGconn *con; /* connection handle to DB */
int id; /* client No. */
ConnectionStateEnum state; /* state machine's current state. */
+ unsigned short random_state[3]; /* separate randomness for each client */
int use_file; /* index in sql_script for this client */
int command; /* command number in script */
/* client variables */
- Variable *variables; /* array of variable definitions */
- int nvariables; /* number of variables */
- bool vars_sorted; /* are variables sorted by name? */
+ Variables variables;
/* various times about current transaction */
int64 txn_scheduled; /* scheduled start time of transaction (usec) */
@@ -328,6 +389,15 @@ typedef struct
bool prepared[MAX_SCRIPTS]; /* whether client prepared the script */
+ /* for repeating transactions with serialization or deadlock failures: */
+ bool in_transaction_block; /* are we in transaction block? */
+ bool end_failed_transaction_block; /* are we ending the failed
+ * transaction block? */
+ RetryState retry_state;
+ Retries retries;
+ bool failure; /* if there was a serialization or a deadlock
+ * failure without retrying */
+
/* per client collected stats */
int64 cnt; /* transaction count */
int ecnt; /* error count */
@@ -342,7 +412,6 @@ typedef struct
pthread_t thread; /* thread handle */
CState *state; /* array of CState */
int nstate; /* length of state[] */
- unsigned short random_state[3]; /* separate randomness for each thread */
int64 throttle_trigger; /* previous/next throttling (us) */
FILE *logfile; /* where to log, or NULL */
@@ -382,6 +451,17 @@ typedef struct
char *argv[MAX_ARGS]; /* command word list */
PgBenchExpr *expr; /* parsed expression, if needed */
SimpleStats stats; /* time spent in this command */
+ Retries retries;
+ int64 serialization_failures; /* number of serialization failures that
+ * were not retried */
+ int64 deadlock_failures; /* number of deadlock failures that were not
+ * retried */
+
+ /* for repeating transactions with serialization and deadlock failures: */
+ bool is_transaction_block_begin; /* whether the command syntactically
+ * starts a transaction block */
+ int transaction_block_end; /* nearest command number to complete
+ * the transaction block or -1 */
} Command;
typedef struct ParsedScript
@@ -445,6 +525,17 @@ static const BuiltinScript builtin_script[] =
}
};
+/*
+ * Failure statuses that can occur during script execution.
+ */
+typedef enum FailureStatus
+{
+ SERIALIZATION_FAILURE,
+ DEADLOCK_FAILURE,
+ IN_FAILED_TRANSACTION,
+ FAILURE_STATUS_ANOTHER /* another failure or no failure */
+} FailureStatus;
+
/* Function prototypes */
static void setIntValue(PgBenchValue *pv, int64 ival);
@@ -504,7 +595,7 @@ usage(void)
" protocol for submitting queries (default: simple)\n"
" -n, --no-vacuum do not run VACUUM before tests\n"
" -P, --progress=NUM show thread progress report every NUM seconds\n"
- " -r, --report-latencies report average latency per command\n"
+ " -r, --report-per-command report latencies, failures and retries per command\n"
" -R, --rate=NUM target rate in transactions per second\n"
" -s, --scale=NUM report this scale factor in output\n"
" -t, --transactions=NUM number of transactions each client runs (default: 10)\n"
@@ -513,6 +604,7 @@ usage(void)
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
+ " --max-tries=NUM max number of tries to run transaction (default: 1)\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
"\nCommon options:\n"
@@ -624,7 +716,7 @@ gotdigits:
/* random number generator: uniform distribution from min to max inclusive */
static int64
-getrand(TState *thread, int64 min, int64 max)
+getrand(CState *st, int64 min, int64 max)
{
/*
* Odd coding is so that min and max have approximately the same chance of
@@ -635,7 +727,7 @@ getrand(TState *thread, int64 min, int64 max)
* protected by a mutex, and therefore a bottleneck on machines with many
* CPUs.
*/
- return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
+ return min + (int64) ((max - min + 1) * pg_erand48(st->random_state));
}
/*
@@ -644,7 +736,7 @@ getrand(TState *thread, int64 min, int64 max)
* value is exp(-parameter).
*/
static int64
-getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
+getExponentialRand(CState *st, int64 min, int64 max, double parameter)
{
double cut,
uniform,
@@ -654,7 +746,7 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
Assert(parameter > 0.0);
cut = exp(-parameter);
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(st->random_state);
/*
* inner expression in (cut, 1] (if parameter > 0), rand in [0, 1)
@@ -667,7 +759,7 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
/* random number generator: gaussian distribution from min to max inclusive */
static int64
-getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
+getGaussianRand(CState *st, int64 min, int64 max, double parameter)
{
double stdev;
double rand;
@@ -695,8 +787,8 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* are expected in (0, 1] (see
* http://en.wikipedia.org/wiki/Box_muller)
*/
- double rand1 = 1.0 - pg_erand48(thread->random_state);
- double rand2 = 1.0 - pg_erand48(thread->random_state);
+ double rand1 = 1.0 - pg_erand48(st->random_state);
+ double rand2 = 1.0 - pg_erand48(st->random_state);
/* Box-Muller basic form transform */
double var_sqrt = sqrt(-2.0 * log(rand1));
@@ -723,7 +815,7 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* will approximate a Poisson distribution centered on the given value.
*/
static int64
-getPoissonRand(TState *thread, int64 center)
+getPoissonRand(CState *st, int64 center)
{
/*
* Use inverse transform sampling to generate a value > 0, such that the
@@ -732,7 +824,7 @@ getPoissonRand(TState *thread, int64 center)
double uniform;
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(st->random_state);
return (int64) (-log(uniform) * ((double) center) + 0.5);
}
@@ -777,6 +869,25 @@ mergeSimpleStats(SimpleStats *acc, SimpleStats *ss)
}
/*
+ * Initialize the given Retries struct to all zeroes
+ */
+static void
+initRetries(Retries *retries)
+{
+ memset(retries, 0, sizeof(Retries));
+}
+
+/*
+ * Merge two Retries objects
+ */
+static void
+mergeRetries(Retries *acc, Retries *retries)
+{
+ acc->serialization += retries->serialization;
+ acc->deadlocks += retries->deadlocks;
+}
+
+/*
* Initialize a StatsData struct to mostly zeroes, with its start time set to
* the given value.
*/
@@ -786,24 +897,37 @@ initStats(StatsData *sd, time_t start_time)
sd->start_time = start_time;
sd->cnt = 0;
sd->skipped = 0;
+ sd->failures = 0;
+ initRetries(&sd->retries);
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
/*
- * Accumulate one additional item into the given stats object.
+ * Accumulate statistics regardless of whether the transaction failed or
+ * was skipped.
*/
static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumMainStats(StatsData *stats, bool skipped, bool failure, Retries *retries)
{
stats->cnt++;
-
if (skipped)
- {
- /* no latency to record on skipped transactions */
stats->skipped++;
- }
- else
+ else if (failure)
+ stats->failures++;
+ mergeRetries(&stats->retries, retries);
+}
+
+/*
+ * Accumulate one additional item into the given stats object.
+ */
+static void
+accumStats(StatsData *stats, bool skipped, bool failure, double lat, double lag,
+ Retries *retries)
+{
+ accumMainStats(stats, skipped, failure, retries);
+
+ if (!skipped && !failure)
{
addToSimpleStats(&stats->latency, lat);
@@ -936,39 +1060,39 @@ compareVariableNames(const void *v1, const void *v2)
/* Locate a variable by name; returns NULL if unknown */
static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
{
Variable key;
/* On some versions of Solaris, bsearch of zero items dumps core */
- if (st->nvariables <= 0)
+ if (variables->nvariables <= 0)
return NULL;
/* Sort if we have to */
- if (!st->vars_sorted)
+ if (!variables->vars_sorted)
{
- qsort((void *) st->variables, st->nvariables, sizeof(Variable),
- compareVariableNames);
- st->vars_sorted = true;
+ qsort((void *) variables->array, variables->nvariables,
+ sizeof(Variable), compareVariableNames);
+ variables->vars_sorted = true;
}
/* Now we can search */
key.name = name;
return (Variable *) bsearch((void *) &key,
- (void *) st->variables,
- st->nvariables,
+ (void *) variables->array,
+ variables->nvariables,
sizeof(Variable),
compareVariableNames);
}
/* Get the value of a variable, in string form; returns NULL if unknown */
static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
{
Variable *var;
char stringform[64];
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
return NULL; /* not found */
@@ -1041,11 +1165,11 @@ isLegalVariableName(const char *name)
* Returns NULL on failure (bad name).
*/
static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
+lookupCreateVariable(Variables *variables, const char *context, char *name)
{
Variable *var;
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
{
Variable *newvars;
@@ -1062,23 +1186,24 @@ lookupCreateVariable(CState *st, const char *context, char *name)
}
/* Create variable at the end of the array */
- if (st->variables)
- newvars = (Variable *) pg_realloc(st->variables,
- (st->nvariables + 1) * sizeof(Variable));
+ if (variables->array)
+ newvars = (Variable *) pg_realloc(
+ variables->array,
+ (variables->nvariables + 1) * sizeof(Variable));
else
newvars = (Variable *) pg_malloc(sizeof(Variable));
- st->variables = newvars;
+ variables->array = newvars;
- var = &newvars[st->nvariables];
+ var = &newvars[variables->nvariables];
var->name = pg_strdup(name);
var->value = NULL;
/* caller is expected to initialize remaining fields */
- st->nvariables++;
+ variables->nvariables++;
/* we don't re-sort the array till we have to */
- st->vars_sorted = false;
+ variables->vars_sorted = false;
}
return var;
@@ -1087,12 +1212,13 @@ lookupCreateVariable(CState *st, const char *context, char *name)
/* Assign a string value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariable(CState *st, const char *context, char *name, const char *value)
+putVariable(Variables *variables, const char *context, char *name,
+ const char *value)
{
Variable *var;
char *val;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1110,12 +1236,12 @@ putVariable(CState *st, const char *context, char *name, const char *value)
/* Assign a numeric value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableNumber(CState *st, const char *context, char *name,
+putVariableNumber(Variables *variables, const char *context, char *name,
const PgBenchValue *value)
{
Variable *var;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1131,12 +1257,13 @@ putVariableNumber(CState *st, const char *context, char *name,
/* Assign an integer value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putVariableInt(Variables *variables, const char *context, char *name,
+ int64 value)
{
PgBenchValue val;
setIntValue(&val, value);
- return putVariableNumber(st, context, name, &val);
+ return putVariableNumber(variables, context, name, &val);
}
static char *
@@ -1181,7 +1308,7 @@ replaceVariable(char **sql, char *param, int len, char *value)
}
static char *
-assignVariables(CState *st, char *sql)
+assignVariables(Variables *variables, char *sql)
{
char *p,
*name,
@@ -1202,7 +1329,7 @@ assignVariables(CState *st, char *sql)
continue;
}
- val = getVariable(st, name);
+ val = getVariable(variables, name);
free(name);
if (val == NULL)
{
@@ -1217,12 +1344,13 @@ assignVariables(CState *st, char *sql)
}
static void
-getQueryParams(CState *st, const Command *command, const char **params)
+getQueryParams(Variables *variables, const Command *command,
+ const char **params)
{
int i;
for (i = 0; i < command->argc - 1; i++)
- params[i] = getVariable(st, command->argv[i + 1]);
+ params[i] = getVariable(variables, command->argv[i + 1]);
}
/* get a value as an int, tell if there is a problem */
@@ -1593,7 +1721,7 @@ evalFunc(TState *thread, CState *st,
if (func == PGBENCH_RANDOM)
{
Assert(nargs == 2);
- setIntValue(retval, getrand(thread, imin, imax));
+ setIntValue(retval, getrand(st, imin, imax));
}
else /* gaussian & exponential */
{
@@ -1615,7 +1743,7 @@ evalFunc(TState *thread, CState *st,
}
setIntValue(retval,
- getGaussianRand(thread, imin, imax, param));
+ getGaussianRand(st, imin, imax, param));
}
else /* exponential */
{
@@ -1628,7 +1756,7 @@ evalFunc(TState *thread, CState *st,
}
setIntValue(retval,
- getExponentialRand(thread, imin, imax, param));
+ getExponentialRand(st, imin, imax, param));
}
}
@@ -1664,7 +1792,7 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
{
Variable *var;
- if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
+ if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
fprintf(stderr, "undefined variable \"%s\"\n",
expr->u.variable.varname);
@@ -1697,7 +1825,7 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
* Return true if succeeded, or false on error.
*/
static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
char command[SHELL_COMMAND_SIZE];
int i,
@@ -1728,7 +1856,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
arg = argv[i] + 1; /* a string literal starting with colons */
}
- else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
+ else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
fprintf(stderr, "%s: undefined variable \"%s\"\n",
argv[0], argv[i]);
@@ -1791,7 +1919,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
argv[0], res);
return false;
}
- if (!putVariableInt(st, "setshell", variable, retval))
+ if (!putVariableInt(variables, "setshell", variable, retval))
return false;
#ifdef DEBUG
@@ -1817,7 +1945,7 @@ commandFailed(CState *st, char *message)
/* return a script number with a weighted choice. */
static int
-chooseScript(TState *thread)
+chooseScript(CState *st)
{
int i = 0;
int64 w;
@@ -1825,7 +1953,7 @@ chooseScript(TState *thread)
if (num_scripts == 1)
return 0;
- w = getrand(thread, 0, total_weight - 1);
+ w = getrand(st, 0, total_weight - 1);
do
{
w -= sql_script[i++].weight;
@@ -1845,7 +1973,7 @@ sendCommand(CState *st, Command *command)
char *sql;
sql = pg_strdup(command->argv[0]);
- sql = assignVariables(st, sql);
+ sql = assignVariables(&st->variables, sql);
if (debug)
fprintf(stderr, "client %d sending %s\n", st->id, sql);
@@ -1857,7 +1985,7 @@ sendCommand(CState *st, Command *command)
const char *sql = command->argv[0];
const char *params[MAX_ARGS];
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
if (debug)
fprintf(stderr, "client %d sending %s\n", st->id, sql);
@@ -1891,7 +2019,7 @@ sendCommand(CState *st, Command *command)
st->prepared[st->use_file] = true;
}
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
if (debug)
@@ -1919,14 +2047,14 @@ sendCommand(CState *st, Command *command)
* of delay, in microseconds. Returns true on success, false on error.
*/
static bool
-evaluateSleep(CState *st, int argc, char **argv, int *usecs)
+evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
char *var;
int usec;
if (*argv[1] == ':')
{
- if ((var = getVariable(st, argv[1] + 1)) == NULL)
+ if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
fprintf(stderr, "%s: undefined variable \"%s\"\n",
argv[0], argv[1]);
@@ -1951,6 +2079,67 @@ evaluateSleep(CState *st, int argc, char **argv, int *usecs)
return true;
}
+/* make a deep copy of the variables array */
+static void
+copyVariables(Variables *destination_vars, const Variables *source_vars)
+{
+ Variable *destination = destination_vars->array;
+ Variable *current_destination;
+ const Variable *source = source_vars->array;
+ const Variable *current_source;
+ int nvariables = source_vars->nvariables;
+
+ for (current_destination = destination;
+ current_destination - destination < destination_vars->nvariables;
+ ++current_destination)
+ {
+ pg_free(current_destination->name);
+ pg_free(current_destination->value);
+ }
+
+ destination_vars->array = pg_realloc(destination_vars->array,
+ sizeof(Variable) * nvariables);
+ destination = destination_vars->array;
+
+ for (current_source = source, current_destination = destination;
+ current_source - source < nvariables;
+ ++current_source, ++current_destination)
+ {
+ current_destination->name = pg_strdup(current_source->name);
+ if (current_source->value)
+ current_destination->value = pg_strdup(current_source->value);
+ else
+ current_destination->value = NULL;
+ current_destination->is_numeric = current_source->is_numeric;
+ current_destination->num_value = current_source->num_value;
+ }
+
+ destination_vars->nvariables = nvariables;
+ destination_vars->vars_sorted = source_vars->vars_sorted;
+}
+
+/*
+ * Returns true if there's a serialization/deadlock failure.
+ */
+static bool
+anyFailure(FailureStatus status)
+{
+ return status == SERIALIZATION_FAILURE || status == DEADLOCK_FAILURE;
+}
+
+/*
+ * Returns true if the failure can be retried.
+ */
+static bool
+canRetry(CState *st)
+{
+ Command *command = sql_script[st->use_file].commands[st->command];
+
+ return (!(st->in_transaction_block && command->transaction_block_end < 0) &&
+ st->retry_state.command >= 0 &&
+ st->retry_state.retries + 1 < max_tries);
+}
+
/*
* Advance the state machine of a connection, if possible.
*/
@@ -1962,6 +2151,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
instr_time now;
bool end_tx_processed = false;
int64 wait;
+ FailureStatus failure_status;
/*
* gettimeofday() isn't free, so we get the current timestamp lazily the
@@ -1990,7 +2180,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
case CSTATE_CHOOSE_SCRIPT:
- st->use_file = chooseScript(thread);
+ st->use_file = chooseScript(st);
if (debug)
fprintf(stderr, "client %d executing script \"%s\"\n", st->id,
@@ -2017,7 +2207,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* away.
*/
Assert(throttle_delay > 0);
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(st, throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
@@ -2049,7 +2239,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
{
processXactStats(thread, st, &now, true, agg);
/* next rendez-vous */
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(st, throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
}
@@ -2102,6 +2292,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
memset(st->prepared, 0, sizeof(st->prepared));
}
+ /* reset transaction variables to default values */
+ st->retry_state.command = -1;
+ initRetries(&st->retries);
+ st->failure = false;
+
/*
* Record transaction start time under logging, progress or
* throttling.
@@ -2143,10 +2338,38 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
/*
+ * It may be changed in CSTATE_WAIT_RESULT if there is a
+ * serialization/deadlock failure or if we continue a failed
+ * transaction block. It is set here because meta commands
+ * don't go through CSTATE_WAIT_RESULT.
+ */
+ failure_status = FAILURE_STATUS_ANOTHER;
+
+ if (command->type == SQL_COMMAND &&
+ !st->in_transaction_block &&
+ st->retry_state.command < st->command)
+ {
+ /*
+ * This is the first try of the transaction that begins at the
+ * current command. Remember its parameters in case we have to
+ * retry it later.
+ */
+ st->retry_state.command = st->command;
+ st->retry_state.retries = 0;
+ memcpy(st->retry_state.random_state, st->random_state,
+ sizeof(unsigned short) * 3);
+ copyVariables(&st->retry_state.variables, &st->variables);
+ }
+
+ if (command->is_transaction_block_begin &&
+ !st->in_transaction_block)
+ st->in_transaction_block = true;
+
+ /*
* Record statement start time if per-command latencies are
* requested
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -2192,7 +2415,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
int usec;
- if (!evaluateSleep(st, argc, argv, &usec))
+ if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
commandFailed(st, "execution of meta-command 'sleep' failed");
st->state = CSTATE_ABORTED;
@@ -2219,7 +2442,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
break;
}
- if (!putVariableNumber(st, argv[0], argv[1], &result))
+ if (!putVariableNumber(&st->variables, argv[0],
+ argv[1], &result))
{
commandFailed(st, "assignment of meta-command 'set' failed");
st->state = CSTATE_ABORTED;
@@ -2228,7 +2452,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (pg_strcasecmp(argv[0], "setshell") == 0)
{
- bool ret = runShellCommand(st, argv[1], argv + 2, argc - 2);
+ bool ret = runShellCommand(&st->variables,
+ argv[1], argv + 2,
+ argc - 2);
if (timer_exceeded) /* timeout */
{
@@ -2248,7 +2474,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (pg_strcasecmp(argv[0], "shell") == 0)
{
- bool ret = runShellCommand(st, NULL, argv + 1, argc - 1);
+ bool ret = runShellCommand(&st->variables,
+ NULL, argv + 1,
+ argc - 1);
if (timer_exceeded) /* timeout */
{
@@ -2283,37 +2511,81 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Wait for the current SQL command to complete
*/
case CSTATE_WAIT_RESULT:
- command = sql_script[st->use_file].commands[st->command];
- if (debug)
- fprintf(stderr, "client %d receiving\n", st->id);
- if (!PQconsumeInput(st->con))
- { /* there's something wrong */
- commandFailed(st, "perhaps the backend died while processing");
- st->state = CSTATE_ABORTED;
- break;
- }
- if (PQisBusy(st->con))
- return; /* don't have the whole result yet */
-
- /*
- * Read and discard the query result;
- */
- res = PQgetResult(st->con);
- switch (PQresultStatus(res))
{
- case PGRES_COMMAND_OK:
- case PGRES_TUPLES_OK:
- case PGRES_EMPTY_QUERY:
+ ExecStatusType result_status;
+ char *sqlState;
+
+ command = sql_script[st->use_file].commands[st->command];
+ if (debug)
+ fprintf(stderr, "client %d receiving\n", st->id);
+ if (!PQconsumeInput(st->con))
+ { /* there's something wrong */
+ commandFailed(st, "perhaps the backend died while processing");
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ if (PQisBusy(st->con))
+ return; /* don't have the whole result yet */
+
+ /*
+ * Read and discard the query result.
+ */
+ res = PQgetResult(st->con);
+ result_status = PQresultStatus(res);
+ sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+ failure_status = FAILURE_STATUS_ANOTHER;
+ if (sqlState)
+ {
+ if (strcmp(sqlState,
+ ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
+ failure_status = SERIALIZATION_FAILURE;
+ else if (strcmp(sqlState,
+ ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
+ failure_status = DEADLOCK_FAILURE;
+ else if (strcmp(sqlState,
+ ERRCODE_IN_FAILED_SQL_TRANSACTION) == 0)
+ failure_status = IN_FAILED_TRANSACTION;
+ }
+
+ if (debug)
+ {
+ if (anyFailure(failure_status))
+ fprintf(stderr, "client %d got a %s failure (try %d/%d)\n",
+ st->id,
+ (failure_status == SERIALIZATION_FAILURE ?
+ "serialization" :
+ "deadlock"),
+ st->retry_state.retries + 1,
+ max_tries);
+ else if (failure_status == IN_FAILED_TRANSACTION)
+ fprintf(stderr, "client %d is in a failed transaction\n",
+ st->id);
+ }
+
+ /*
+ * All is ok if one of the following conditions is
+ * satisfied:
+ * - there's no failure;
+ * - there is a serialization/deadlock failure (these
+ * failures will be processed later);
+ * - we continue the failed transaction block (move on to
+ * the next command).
+ */
+ if (result_status == PGRES_COMMAND_OK ||
+ result_status == PGRES_TUPLES_OK ||
+ result_status == PGRES_EMPTY_QUERY ||
+ failure_status != FAILURE_STATUS_ANOTHER)
+ {
/* OK */
PQclear(res);
discard_response(st);
st->state = CSTATE_END_COMMAND;
- break;
- default:
+ }
+ else
+ {
commandFailed(st, PQerrorMessage(st->con));
PQclear(res);
st->state = CSTATE_ABORTED;
- break;
+ }
}
break;
@@ -2337,12 +2609,20 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
case CSTATE_END_COMMAND:
+ /* process the serialization/deadlock failure if we have it */
+ if (anyFailure(failure_status))
+ {
+ st->state = CSTATE_FAILURE;
+ break;
+ }
+
/*
* command completed: accumulate per-command execution times
* in thread-local data structure, if per-command latencies
* are requested.
*/
- if (is_latencies)
+ if (report_per_command &&
+ failure_status != IN_FAILED_TRANSACTION)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -2354,8 +2634,135 @@ doCustom(TState *thread, CState *st, StatsData *agg)
INSTR_TIME_GET_DOUBLE(st->stmt_begin));
}
- /* Go ahead with next command */
- st->command++;
+ if (st->in_transaction_block &&
+ command->transaction_block_end == st->command)
+ st->in_transaction_block = false;
+
+ if (st->end_failed_transaction_block)
+ {
+ /*
+ * The failed transaction block has ended. Retry it if
+ * possible.
+ */
+ st->end_failed_transaction_block = false;
+ st->state = CSTATE_RETRY;
+ }
+ else
+ {
+ /* Go ahead with next command */
+ st->command++;
+ st->state = CSTATE_START_COMMAND;
+ }
+
+ break;
+
+ /*
+ * Report the failure and end the failed transaction block.
+ */
+ case CSTATE_FAILURE:
+
+ if (canRetry(st))
+ {
+ /*
+ * The failed transaction will be retried. So accumulate
+ * the retry for the command and for the current script
+ * execution.
+ */
+ if (failure_status == SERIALIZATION_FAILURE)
+ {
+ st->retries.serialization++;
+ if (report_per_command)
+ command->retries.serialization++;
+ }
+ else
+ {
+ st->retries.deadlocks++;
+ if (report_per_command)
+ command->retries.deadlocks++;
+ }
+ }
+ else
+ {
+ /*
+ * We will not be able to retry this failed transaction.
+ * So accumulate the failure for the command and for the
+ * current script execution.
+ */
+ st->failure = true;
+ if (report_per_command)
+ {
+ if (failure_status == SERIALIZATION_FAILURE)
+ command->serialization_failures++;
+ else
+ command->deadlock_failures++;
+ }
+ }
+
+ if (st->in_transaction_block)
+ {
+ if (command->transaction_block_end >= 0)
+ {
+ if (st->command == command->transaction_block_end)
+ {
+ /*
+ * The failed transaction block has ended. Retry
+ * it if possible.
+ */
+ st->in_transaction_block = false;
+ st->state = CSTATE_RETRY;
+ }
+ else
+ {
+ /* end the failed transaction block */
+ st->command = command->transaction_block_end;
+ st->end_failed_transaction_block = true;
+ st->state = CSTATE_START_COMMAND;
+ }
+ }
+ else
+ {
+ /*
+ * There is no transaction block end later in this
+ * script. We are inside the failed transaction block,
+ * so all subsequent commands would fail; end the
+ * current script execution.
+ */
+ st->state = CSTATE_END_TX;
+ }
+ }
+ else
+ {
+ /* retry the failed transaction if possible */
+ st->state = CSTATE_RETRY;
+ }
+
+ break;
+
+ /*
+ * Retry the failed transaction if possible.
+ */
+ case CSTATE_RETRY:
+
+ if (canRetry(st))
+ {
+ st->retry_state.retries++;
+ if (debug)
+ fprintf(stderr, "client %d repeats the failed transaction (try %d/%d)\n",
+ st->id,
+ st->retry_state.retries + 1,
+ max_tries);
+
+ st->command = st->retry_state.command;
+ memcpy(st->random_state, st->retry_state.random_state,
+ sizeof(unsigned short) * 3);
+ copyVariables(&st->variables, &st->retry_state.variables);
+ }
+ else
+ {
+ /* Go ahead with next command */
+ st->command++;
+ }
+
st->state = CSTATE_START_COMMAND;
break;
@@ -2372,7 +2779,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
per_script_stats || use_log)
processXactStats(thread, st, &now, false, agg);
else
- thread->stats.cnt++;
+ accumMainStats(&thread->stats, false, st->failure,
+ &st->retries);
if (is_connect)
{
@@ -2446,7 +2854,7 @@ doLog(TState *thread, CState *st,
* to the random sample.
*/
if (sample_rate != 0.0 &&
- pg_erand48(thread->random_state) > sample_rate)
+ pg_erand48(st->random_state) > sample_rate)
return;
/* should we aggregate the results or not? */
@@ -2462,13 +2870,14 @@ doLog(TState *thread, CState *st,
while (agg->start_time + agg_interval <= now)
{
/* print aggregated report to logfile */
- fprintf(logfile, "%ld " INT64_FORMAT " %.0f %.0f %.0f %.0f",
+ fprintf(logfile, "%ld " INT64_FORMAT " %.0f %.0f %.0f %.0f " INT64_FORMAT,
(long) agg->start_time,
agg->cnt,
agg->latency.sum,
agg->latency.sum2,
agg->latency.min,
- agg->latency.max);
+ agg->latency.max,
+ agg->failures);
if (throttle_delay)
{
fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -2479,6 +2888,10 @@ doLog(TState *thread, CState *st,
if (latency_limit)
fprintf(logfile, " " INT64_FORMAT, agg->skipped);
}
+ if (max_tries > 1)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->retries.serialization,
+ agg->retries.deadlocks);
fputc('\n', logfile);
/* reset data and move to next interval */
@@ -2486,7 +2899,7 @@ doLog(TState *thread, CState *st,
}
/* accumulate the current transaction */
- accumStats(agg, skipped, latency, lag);
+ accumStats(agg, skipped, st->failure, latency, lag, &st->retries);
}
else
{
@@ -2498,12 +2911,20 @@ doLog(TState *thread, CState *st,
fprintf(logfile, "%d " INT64_FORMAT " skipped %d %ld %ld",
st->id, st->cnt, st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
+ else if (st->failure)
+ fprintf(logfile, "%d " INT64_FORMAT " failed %d %ld %ld",
+ st->id, st->cnt, st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
else
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d %ld %ld",
st->id, st->cnt, latency, st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
if (throttle_delay)
fprintf(logfile, " %.0f", lag);
+ if (max_tries > 1)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ st->retries.serialization,
+ st->retries.deadlocks);
fputc('\n', logfile);
}
}
@@ -2523,7 +2944,7 @@ processXactStats(TState *thread, CState *st, instr_time *now,
if ((!skipped) && INSTR_TIME_IS_ZERO(*now))
INSTR_TIME_SET_CURRENT(*now);
- if (!skipped)
+ if (!skipped && !st->failure)
{
/* compute latency & lag */
latency = INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled;
@@ -2532,21 +2953,23 @@ processXactStats(TState *thread, CState *st, instr_time *now,
if (progress || throttle_delay || latency_limit)
{
- accumStats(&thread->stats, skipped, latency, lag);
+ accumStats(&thread->stats, skipped, st->failure, latency, lag,
+ &st->retries);
/* count transactions over the latency limit, if needed */
if (latency_limit && latency > latency_limit)
thread->latency_late++;
}
else
- thread->stats.cnt++;
+ accumMainStats(&thread->stats, skipped, st->failure, &st->retries);
if (use_log)
doLog(thread, st, agg, skipped, latency, lag);
/* XXX could use a mutex here, but we choose not to */
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ accumStats(&sql_script[st->use_file].stats, skipped, st->failure,
+ latency, lag, &st->retries);
}
@@ -2985,6 +3408,11 @@ process_sql_command(PQExpBuffer buf, const char *source)
my_command->type = SQL_COMMAND;
my_command->argc = 0;
initSimpleStats(&my_command->stats);
+ initRetries(&my_command->retries);
+ my_command->serialization_failures = 0;
+ my_command->deadlock_failures = 0;
+ my_command->is_transaction_block_begin = false;
+ my_command->transaction_block_end = -1;
/*
* If SQL command is multi-line, we only want to save the first line as
@@ -3054,6 +3482,11 @@ process_backslash_command(PsqlScanState sstate, const char *source)
my_command->type = META_COMMAND;
my_command->argc = 0;
initSimpleStats(&my_command->stats);
+ initRetries(&my_command->retries);
+ my_command->serialization_failures = 0;
+ my_command->deadlock_failures = 0;
+ my_command->is_transaction_block_begin = false;
+ my_command->transaction_block_end = -1;
/* Save first word (command name) */
j = 0;
@@ -3185,6 +3618,60 @@ process_backslash_command(PsqlScanState sstate, const char *source)
}
/*
+ * Returns a copy of the command in which every contiguous run of whitespace
+ * is replaced by a single space.
+ *
+ * Returns a malloc'd string.
+ */
+static char *
+normalize_whitespaces(const char *command)
+{
+ const char *ptr = command;
+ char *buffer = pg_malloc(strlen(command) + 1);
+ int length = 0;
+
+ while (*ptr)
+ {
+ while (*ptr && !isspace((unsigned char) *ptr))
+ buffer[length++] = *(ptr++);
+ if (isspace((unsigned char) *ptr))
+ {
+ buffer[length++] = ' ';
+ while (isspace((unsigned char) *ptr))
+ ptr++;
+ }
+ }
+ buffer[length] = '\0';
+
+ return buffer;
+}
+
+/*
+ * Returns true if the given command syntactically ends a transaction block
+ * (we don't check here whether a transaction block is actually open).
+ */
+static bool
+is_transaction_block_end(const char *command_text)
+{
+ bool result = false;
+ char *command = normalize_whitespaces(command_text);
+
+ if (pg_strncasecmp(command, "end", 3) == 0 ||
+ (pg_strncasecmp(command, "commit", 6) == 0 &&
+ pg_strncasecmp(command, "commit prepared", 15) != 0) ||
+ (pg_strncasecmp(command, "rollback", 8) == 0 &&
+ pg_strncasecmp(command, "rollback prepared", 17) != 0 &&
+ pg_strncasecmp(command, "rollback to", 11) != 0) ||
+ (pg_strncasecmp(command, "prepare transaction ", 20) == 0 &&
+ pg_strncasecmp(command, "prepare transaction (", 21) != 0 &&
+ pg_strncasecmp(command, "prepare transaction as ", 23) != 0))
+ result = true;
+
+ pg_free(command);
+ return result;
+}
+
+/*
* Parse a script (either the contents of a file, or a built-in script)
* and add it to the list of scripts.
*/
@@ -3196,6 +3683,7 @@ ParseScript(const char *script, const char *desc, int weight)
PQExpBufferData line_buf;
int alloc_num;
int index;
+ int last_transaction_block_end = -1;
#define COMMANDS_ALLOC_NUM 128
alloc_num = COMMANDS_ALLOC_NUM;
@@ -3238,6 +3726,9 @@ ParseScript(const char *script, const char *desc, int weight)
command = process_sql_command(&line_buf, desc);
if (command)
{
+ char *command_text = command->argv[0];
+ int cur_index;
+
ps.commands[index] = command;
index++;
@@ -3247,6 +3738,25 @@ ParseScript(const char *script, const char *desc, int weight)
ps.commands = (Command **)
pg_realloc(ps.commands, sizeof(Command *) * alloc_num);
}
+
+ /* check if the command syntactically starts a transaction block */
+ if (pg_strncasecmp(command_text, "begin", 5) == 0 ||
+ pg_strncasecmp(command_text, "start", 5) == 0)
+ command->is_transaction_block_begin = true;
+
+ /* check if the command syntactically ends a transaction block */
+ if (is_transaction_block_end(command_text))
+ {
+ /*
+ * Remember it in all earlier commands of this transaction
+ * block, so that a command failing mid-block knows where
+ * the block ends.
+ */
+ for (cur_index = last_transaction_block_end + 1;
+ cur_index < index;
+ cur_index++)
+ ps.commands[cur_index]->transaction_block_end = index - 1;
+ last_transaction_block_end = index - 1;
+ }
}
/* If we reached a backslash, process that */
@@ -3484,6 +3994,15 @@ printSimpleStats(char *prefix, SimpleStats *ss)
printf("%s stddev = %.3f ms\n", prefix, 0.001 * stddev);
}
+/*
+ * Return the sum of all retries.
+ */
+static int64
+getAllRetries(Retries *retries)
+{
+ return retries->serialization + retries->deadlocks;
+}
+
/* print out results */
static void
printResults(TState *threads, StatsData *total, instr_time total_time,
@@ -3492,6 +4011,8 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
double time_include,
tps_include,
tps_exclude;
+ int64 all_failures = total->failures;
+ int64 all_retries = getAllRetries(&total->retries);
time_include = INSTR_TIME_GET_DOUBLE(total_time);
tps_include = total->cnt / time_include;
@@ -3501,6 +4022,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
/* Report test parameters. */
printf("transaction type: %s\n",
num_scripts == 1 ? sql_script[0].desc : "multiple scripts");
+ printf("maximum number of transaction tries: %d\n", max_tries);
printf("scaling factor: %d\n", scale);
printf("query mode: %s\n", QUERYMODE[querymode]);
printf("number of clients: %d\n", nclients);
@@ -3522,6 +4044,16 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (total->cnt <= 0)
return;
+ if (all_failures > 0)
+ printf("number of failures: " INT64_FORMAT " (%.3f %%)\n",
+ all_failures, (100.0 * all_failures / total->cnt));
+
+ if (all_retries > 0)
+ printf("number of retries: " INT64_FORMAT " (serialization: " INT64_FORMAT ", deadlocks: " INT64_FORMAT ")\n",
+ all_retries,
+ total->retries.serialization,
+ total->retries.deadlocks);
+
if (throttle_delay && latency_limit)
printf("number of transactions skipped: " INT64_FORMAT " (%.3f %%)\n",
total->skipped,
@@ -3557,13 +4089,14 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printf("tps = %f (excluding connections establishing)\n", tps_exclude);
/* Report per-script/command statistics */
- if (per_script_stats || latency_limit || is_latencies)
+ if (per_script_stats || latency_limit || report_per_command)
{
int i;
for (i = 0; i < num_scripts; i++)
{
if (num_scripts > 1)
+ {
printf("SQL script %d: %s\n"
" - weight: %d (targets %.1f%% of total)\n"
" - " INT64_FORMAT " transactions (%.1f%% of total, tps = %f)\n",
@@ -3573,8 +4106,23 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
sql_script[i].stats.cnt,
100.0 * sql_script[i].stats.cnt / total->cnt,
sql_script[i].stats.cnt / time_include);
+
+ if (all_failures > 0)
+ printf(" - number of failures: " INT64_FORMAT " (%.3f %%)\n",
+ sql_script[i].stats.failures,
+ (100.0 * sql_script[i].stats.failures /
+ sql_script[i].stats.cnt));
+
+ if (all_retries > 0)
+ printf(" - number of retries: " INT64_FORMAT " (serialization: " INT64_FORMAT ", deadlocks: " INT64_FORMAT ")\n",
+ getAllRetries(&sql_script[i].stats.retries),
+ sql_script[i].stats.retries.serialization,
+ sql_script[i].stats.retries.deadlocks);
+ }
else
+ {
printf("script statistics:\n");
+ }
if (latency_limit)
printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
@@ -3585,20 +4133,45 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (num_scripts > 1)
printSimpleStats(" - latency", &sql_script[i].stats.latency);
- /* Report per-command latencies */
- if (is_latencies)
+ /*
+ * Report per-command statistics: latencies, retries after failures,
+ * failures without retrying.
+ */
+ if (report_per_command)
{
Command **commands;
- printf(" - statement latencies in milliseconds:\n");
+ printf(" - statement latencies in milliseconds");
+ if (all_failures > 0 && all_retries > 0)
+ printf(", serialization failures and retries, deadlock failures and retries");
+ else if (all_failures > 0 || all_retries > 0)
+ printf(", serialization and deadlock %s",
+ (all_failures > 0 ? "failures" : "retries"));
+ printf(":\n");
for (commands = sql_script[i].commands;
*commands != NULL;
commands++)
- printf(" %11.3f %s\n",
+ {
+ printf(" %11.3f",
1000.0 * (*commands)->stats.sum /
- (*commands)->stats.count,
- (*commands)->line);
+ (*commands)->stats.count);
+ if (all_failures > 0 && all_retries > 0)
+ printf(" %25" INT64_MODIFIER "d %25" INT64_MODIFIER "d %25" INT64_MODIFIER "d %25" INT64_MODIFIER "d",
+ (*commands)->serialization_failures,
+ (*commands)->retries.serialization,
+ (*commands)->deadlock_failures,
+ (*commands)->retries.deadlocks);
+ else if (all_failures > 0 || all_retries > 0)
+ printf(" %25" INT64_MODIFIER "d %25" INT64_MODIFIER "d",
+ (all_failures > 0 ?
+ (*commands)->serialization_failures :
+ (*commands)->retries.serialization),
+ (all_failures > 0 ?
+ (*commands)->deadlock_failures :
+ (*commands)->retries.deadlocks));
+ printf(" %s\n", (*commands)->line);
+ }
}
}
}
@@ -3627,7 +4200,7 @@ main(int argc, char **argv)
{"progress", required_argument, NULL, 'P'},
{"protocol", required_argument, NULL, 'M'},
{"quiet", no_argument, NULL, 'q'},
- {"report-latencies", no_argument, NULL, 'r'},
+ {"report-per-command", no_argument, NULL, 'r'},
{"rate", required_argument, NULL, 'R'},
{"scale", required_argument, NULL, 's'},
{"select-only", no_argument, NULL, 'S'},
@@ -3645,6 +4218,7 @@ main(int argc, char **argv)
{"aggregate-interval", required_argument, NULL, 5},
{"progress-timestamp", no_argument, NULL, 6},
{"log-prefix", required_argument, NULL, 7},
+ {"max-tries", required_argument, NULL, 8},
{NULL, 0, NULL, 0}
};
@@ -3710,6 +4284,7 @@ main(int argc, char **argv)
state = (CState *) pg_malloc(sizeof(CState));
memset(state, 0, sizeof(CState));
+ state->retry_state.command = -1;
while ((c = getopt_long(argc, argv, "ih:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
{
@@ -3787,7 +4362,7 @@ main(int argc, char **argv)
case 'r':
benchmarking_option_set = true;
per_script_stats = true;
- is_latencies = true;
+ report_per_command = true;
break;
case 's':
scale_given = true;
@@ -3881,7 +4456,7 @@ main(int argc, char **argv)
}
*p++ = '\0';
- if (!putVariable(&state[0], "option", optarg, p))
+ if (!putVariable(&state[0].variables, "option", optarg, p))
exit(1);
}
break;
@@ -3991,6 +4566,16 @@ main(int argc, char **argv)
benchmarking_option_set = true;
logfile_prefix = pg_strdup(optarg);
break;
+ case 8:
+ benchmarking_option_set = true;
+ max_tries = atoi(optarg);
+ if (max_tries <= 0)
+ {
+ fprintf(stderr, "invalid maximum number of tries: \"%s\"\n",
+ optarg);
+ exit(1);
+ }
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
@@ -4123,19 +4708,19 @@ main(int argc, char **argv)
int j;
state[i].id = i;
- for (j = 0; j < state[0].nvariables; j++)
+ for (j = 0; j < state[0].variables.nvariables; j++)
{
- Variable *var = &state[0].variables[j];
+ Variable *var = &state[0].variables.array[j];
if (var->is_numeric)
{
- if (!putVariableNumber(&state[i], "startup",
+ if (!putVariableNumber(&state[i].variables, "startup",
var->name, &var->num_value))
exit(1);
}
else
{
- if (!putVariable(&state[i], "startup",
+ if (!putVariable(&state[i].variables, "startup",
var->name, var->value))
exit(1);
}
@@ -4143,6 +4728,18 @@ main(int argc, char **argv)
}
}
+ /* set random seed */
+ INSTR_TIME_SET_CURRENT(start_time);
+ srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
+
+ /* set random states for clients */
+ for (i = 0; i < nclients; i++)
+ {
+ state[i].random_state[0] = random();
+ state[i].random_state[1] = random();
+ state[i].random_state[2] = random();
+ }
+
if (debug)
{
if (duration <= 0)
@@ -4204,11 +4801,11 @@ main(int argc, char **argv)
* :scale variables normally get -s or database scale, but don't override
* an explicit -D switch
*/
- if (lookupVariable(&state[0], "scale") == NULL)
+ if (lookupVariable(&state[0].variables, "scale") == NULL)
{
for (i = 0; i < nclients; i++)
{
- if (!putVariableInt(&state[i], "startup", "scale", scale))
+ if (!putVariableInt(&state[i].variables, "startup", "scale", scale))
exit(1);
}
}
@@ -4217,11 +4814,11 @@ main(int argc, char **argv)
* Define a :client_id variable that is unique per connection. But don't
* override an explicit -D switch.
*/
- if (lookupVariable(&state[0], "client_id") == NULL)
+ if (lookupVariable(&state[0].variables, "client_id") == NULL)
{
for (i = 0; i < nclients; i++)
{
- if (!putVariableInt(&state[i], "startup", "client_id", i))
+ if (!putVariableInt(&state[i].variables, "startup", "client_id", i))
exit(1);
}
}
@@ -4243,10 +4840,6 @@ main(int argc, char **argv)
}
PQfinish(con);
- /* set random seed */
- INSTR_TIME_SET_CURRENT(start_time);
- srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
-
/* set up thread data structures */
threads = (TState *) pg_malloc(sizeof(TState) * nthreads);
nclients_dealt = 0;
@@ -4259,9 +4852,6 @@ main(int argc, char **argv)
thread->state = &state[nclients_dealt];
thread->nstate =
(nclients - nclients_dealt + nthreads - i - 1) / (nthreads - i);
- thread->random_state[0] = random();
- thread->random_state[1] = random();
- thread->random_state[2] = random();
thread->logfile = NULL; /* filled in later */
thread->latency_late = 0;
initStats(&thread->stats, 0);
@@ -4340,6 +4930,8 @@ main(int argc, char **argv)
mergeSimpleStats(&stats.lag, &thread->stats.lag);
stats.cnt += thread->stats.cnt;
stats.skipped += thread->stats.skipped;
+ stats.failures += thread->stats.failures;
+ mergeRetries(&stats.retries, &thread->stats.retries);
latency_late += thread->latency_late;
INSTR_TIME_ADD(conn_total_time, thread->conn_time);
}
@@ -4613,6 +5205,8 @@ threadRun(void *arg)
/* generate and show report */
StatsData cur;
int64 run = now - last_report;
+ int64 failures,
+ retries;
double tps,
total_run,
latency,
@@ -4639,6 +5233,8 @@ threadRun(void *arg)
mergeSimpleStats(&cur.lag, &thread[i].stats.lag);
cur.cnt += thread[i].stats.cnt;
cur.skipped += thread[i].stats.skipped;
+ cur.failures += thread[i].stats.failures;
+ mergeRetries(&cur.retries, &thread[i].stats.retries);
}
total_run = (now - thread_start) / 1000000.0;
@@ -4650,6 +5246,9 @@ threadRun(void *arg)
stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
lag = 0.001 * (cur.lag.sum - last.lag.sum) /
(cur.cnt - last.cnt);
+ failures = cur.failures - last.failures;
+ retries = getAllRetries(&cur.retries) -
+ getAllRetries(&last.retries);
if (progress_timestamp)
{
@@ -4672,6 +5271,9 @@ threadRun(void *arg)
"progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
tbuf, tps, latency, stdev);
+ if (failures > 0)
+ fprintf(stderr, ", " INT64_FORMAT " failed", failures);
+
if (throttle_delay)
{
fprintf(stderr, ", lag %.3f ms", lag);
@@ -4679,6 +5281,10 @@ threadRun(void *arg)
fprintf(stderr, ", " INT64_FORMAT " skipped",
cur.skipped - last.skipped);
}
+
+ if (retries > 0)
+ fprintf(stderr, ", " INT64_FORMAT " retries", retries);
+
fprintf(stderr, "\n");
last = cur;
diff --git a/src/bin/pgbench/t/002_serialization_and_deadlock_failures.pl b/src/bin/pgbench/t/002_serialization_and_deadlock_failures.pl
new file mode 100644
index 0000000..4849aee
--- /dev/null
+++ b/src/bin/pgbench/t/002_serialization_and_deadlock_failures.pl
@@ -0,0 +1,459 @@
+use strict;
+use warnings;
+
+use Config;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 57;
+
+use constant
+{
+ READ_COMMITTED => 0,
+ REPEATABLE_READ => 1,
+ SERIALIZABLE => 2,
+};
+
+my @isolation_level_sql = ('read committed', 'repeatable read', 'serializable');
+my @isolation_level_shell = (
+ 'read\\ committed',
+ 'repeatable\\ read',
+ 'serializable');
+
+# Test concurrent update in table row with different default transaction
+# isolation levels.
+my $node = get_new_node('main');
+$node->init;
+$node->start;
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2), (2, 3);');
+
+my $script_serialization = $node->basedir . '/pgbench_script_serialization';
+append_to_file($script_serialization,
+ "BEGIN;\n"
+ . "\\set delta random(-5000, 5000)\n"
+ . "UPDATE xy SET y = y + :delta WHERE x = 1;\n"
+ . "END;");
+
+my $script_deadlocks1 = $node->basedir . '/pgbench_script_deadlocks1';
+append_to_file($script_deadlocks1,
+ "BEGIN;\n"
+ . "\\set delta1 random(-5000, 5000)\n"
+ . "\\set delta2 random(-5000, 5000)\n"
+ . "UPDATE xy SET y = y + :delta1 WHERE x = 1;\n"
+ . "SELECT pg_sleep(20);\n"
+ . "UPDATE xy SET y = y + :delta2 WHERE x = 2;\n"
+ . "END;");
+
+my $script_deadlocks2 = $node->basedir . '/pgbench_script_deadlocks2';
+append_to_file($script_deadlocks2,
+ "BEGIN;\n"
+ . "\\set delta1 random(-5000, 5000)\n"
+ . "\\set delta2 random(-5000, 5000)\n"
+ . "UPDATE xy SET y = y + :delta2 WHERE x = 2;\n"
+ . "UPDATE xy SET y = y + :delta1 WHERE x = 1;\n"
+ . "END;");
+
+sub test_pgbench_serialization_failures
+{
+ my ($isolation_level) = @_;
+
+ my $isolation_level_sql = $isolation_level_sql[$isolation_level];
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $err_pgbench);
+
+ # Open the psql session and run the parallel transaction:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql = "begin;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /BEGIN/;
+
+ $in_psql = "update xy set y = y + 1 where x = 1;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /UPDATE 1/;
+
+ # Start pgbench:
+ my @command = (
+ qw(pgbench --no-vacuum --debug --file),
+ $script_serialization);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$err_pgbench;
+
+ # Let pgbench run the update command in the transaction:
+ sleep 10;
+
+ # In psql, commit the transaction and end the session:
+ $in_psql = "end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+
+ like($out_pgbench,
+ qr{processed: 10/10},
+ "concurrent update: $isolation_level_sql: check processed transactions");
+
+ my $regex =
+ ($isolation_level == READ_COMMITTED)
+ ? qr{^((?!number of failures)(.|\n))*$}
+ : qr{number of failures: [1-9]\d* \([1-9]\d*\.\d* %\)};
+
+ like($out_pgbench,
+ $regex,
+ "concurrent update: $isolation_level_sql: check failures");
+
+ $regex =
+ ($isolation_level == READ_COMMITTED)
+ ? qr{^((?!client 0 got a serialization failure \(try 1/1\))(.|\n))*$}
+ : qr{client 0 got a serialization failure \(try 1/1\)};
+
+ like($err_pgbench,
+ $regex,
+ "concurrent update: $isolation_level_sql: check serialization failure");
+}
+
+sub test_pgbench_serialization_failures_retry
+{
+ my ($isolation_level) = @_;
+
+ my $isolation_level_sql = $isolation_level_sql[$isolation_level];
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $stderr);
+
+ # Open the psql session and run the parallel transaction:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql = "begin;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /BEGIN/;
+
+ $in_psql = "update xy set y = y + 1 where x = 1;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /UPDATE 1/;
+
+ # Start pgbench:
+ my @command = (
+ qw(pgbench --no-vacuum --debug --max-tries 2 --file),
+ $script_serialization);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$stderr;
+
+ # Let pgbench run the update command in the transaction:
+ sleep 10;
+
+ # In psql, commit the transaction and end the session:
+ $in_psql = "end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+
+ like($out_pgbench,
+ qr{processed: 10/10},
+ "concurrent update with retrying: "
+ . $isolation_level_sql
+ . ": check processed transactions");
+
+ like($out_pgbench,
+ qr{^((?!number of failures)(.|\n))*$},
+ "concurrent update with retrying: "
+ . $isolation_level_sql
+ . ": check failures");
+
+ my $pattern =
+ "client 0 sending UPDATE xy SET y = y \\+ (-?\\d+) WHERE x = 1;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 got a serialization failure \\(try 1/2\\)\n"
+ . "client 0 sending END;\n"
+ . "\\g2+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g2+"
+ . "client 0 executing \\\\set delta\n"
+ . "client 0 sending UPDATE xy SET y = y \\+ \\g1 WHERE x = 1;";
+
+ like($stderr,
+ qr{$pattern},
+ "concurrent update with retrying: "
+ . $isolation_level_sql
+ . ": check the retried transaction");
+}
+
+sub test_pgbench_deadlock_failures
+{
+ my ($isolation_level) = @_;
+
+ my $isolation_level_sql = $isolation_level_sql[$isolation_level];
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h1, $in1, $out1, $err1);
+ my ($h2, $in2, $out2, $err2);
+
+ # Run first pgbench
+ my @command1 = (
+ qw(pgbench --no-vacuum --debug --transactions 1 --file),
+ $script_deadlocks1);
+ print "# Running: " . join(" ", @command1) . "\n";
+ $h1 = IPC::Run::start \@command1, \$in1, \$out1, \$err1;
+
+ # Let pgbench run first update command in the transaction:
+ sleep 10;
+
+ # Run second pgbench
+ my @command2 = (
+ qw(pgbench --no-vacuum --debug --transactions 1 --file),
+ $script_deadlocks2);
+ print "# Running: " . join(" ", @command2) . "\n";
+ $h2 = IPC::Run::start \@command2, \$in2, \$out2, \$err2;
+
+ # Get all pgbench results
+ $h1->pump() until length $out1;
+ $h1->finish();
+
+ $h2->pump() until length $out2;
+ $h2->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result1 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h1->full_results)[0]
+ : $h1->result(0);
+
+ my $result2 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h2->full_results)[0]
+ : $h2->result(0);
+
+ # Check all pgbench results
+ ok(!$result1, "@command1 exit code 0");
+ ok(!$result2, "@command2 exit code 0");
+
+ like($out1,
+ qr{processed: 1/1},
+ "concurrent deadlock update: "
+ . $isolation_level_sql
+ . ": pgbench 1: check processed transactions");
+ like($out2,
+ qr{processed: 1/1},
+ "concurrent deadlock update: "
+ . $isolation_level_sql
+ . ": pgbench 2: check processed transactions");
+
+ # First or second pgbench should get a deadlock error
+ like($out1 . $out2,
+ qr{number of failures: 1 \(100\.000 %\)},
+ "concurrent deadlock update: "
+ . $isolation_level_sql
+ . ": check failures");
+
+ like($err1 . $err2,
+ qr{client 0 got a deadlock failure \(try 1/1\)},
+ "concurrent deadlock update: "
+ . $isolation_level_sql
+ . ": check deadlock failure");
+}
+
+sub test_pgbench_deadlock_failures_retry
+{
+ my ($isolation_level) = @_;
+
+ my $isolation_level_sql = $isolation_level_sql[$isolation_level];
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+
+ my ($h1, $in1, $out1, $err1);
+ my ($h2, $in2, $out2, $err2);
+
+ # Run first pgbench
+ my @command1 = (
+ qw(pgbench --no-vacuum --debug --transactions 1 --max-tries 2 --file),
+ $script_deadlocks1);
+ print "# Running: " . join(" ", @command1) . "\n";
+ $h1 = IPC::Run::start \@command1, \$in1, \$out1, \$err1;
+
+ # Let pgbench run first update command in the transaction:
+ sleep 10;
+
+ # Run second pgbench
+ my @command2 = (
+ qw(pgbench --no-vacuum --debug --transactions 1 --max-tries 2 --file),
+ $script_deadlocks2);
+ print "# Running: " . join(" ", @command2) . "\n";
+ $h2 = IPC::Run::start \@command2, \$in2, \$out2, \$err2;
+
+ # Get all pgbench results
+ $h1->pump() until length $out1;
+ $h1->finish();
+
+ $h2->pump() until length $out2;
+ $h2->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result1 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h1->full_results)[0]
+ : $h1->result(0);
+
+ my $result2 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h2->full_results)[0]
+ : $h2->result(0);
+
+ # Check all pgbench results
+ ok(!$result1, "@command1 exit code 0");
+ ok(!$result2, "@command2 exit code 0");
+
+ like($out1,
+ qr{processed: 1/1},
+ "concurrent deadlock update with retrying: "
+ . $isolation_level_sql
+ . ": pgbench 1: check processed transactions");
+ like($out2,
+ qr{processed: 1/1},
+ "concurrent deadlock update with retrying: "
+ . $isolation_level_sql
+ . ": pgbench 2: check processed transactions");
+
+ like($out1 . $out2,
+ qr{^((?!number of failures)(.|\n))*$},
+ "concurrent deadlock update with retrying: "
+ . $isolation_level_sql
+ . ": check failures");
+
+ # First or second pgbench should get a deadlock error
+ like($err1 . $err2,
+ qr{client 0 got a deadlock failure \(try 1/2\)},
+ "concurrent deadlock update with retrying: "
+ . $isolation_level_sql
+ . ": check deadlock failure");
+
+ if ($isolation_level == READ_COMMITTED)
+ {
+ my $pattern =
+ "client 0 sending UPDATE xy SET y = y \\+ (-?\\d+) WHERE x = (\\d);\n"
+ . "(client 0 receiving\n)+"
+ . "(|client 0 sending SELECT pg_sleep\\(20\\);\n)"
+ . "\\g3*"
+ . "client 0 sending UPDATE xy SET y = y \\+ (-?\\d+) WHERE x = (\\d);\n"
+ . "\\g3+"
+ . "client 0 got a deadlock failure \\(try 1/2\\)\n"
+ . "client 0 sending END;\n"
+ . "\\g3+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g3+"
+ . "client 0 executing \\\\set delta1\n"
+ . "client 0 executing \\\\set delta2\n"
+ . "client 0 sending UPDATE xy SET y = y \\+ \\g1 WHERE x = \\g2;\n"
+ . "\\g3+"
+ . "\\g4"
+ . "\\g3*"
+ . "client 0 sending UPDATE xy SET y = y \\+ \\g5 WHERE x = \\g6;\n";
+
+ like($err1 . $err2,
+ qr{$pattern},
+ "concurrent deadlock update with retrying: "
+ . $isolation_level_sql
+ . ": check the retried transaction");
+ }
+}
+
+test_pgbench_serialization_failures(READ_COMMITTED);
+test_pgbench_serialization_failures(REPEATABLE_READ);
+test_pgbench_serialization_failures(SERIALIZABLE);
+
+test_pgbench_serialization_failures_retry(REPEATABLE_READ);
+test_pgbench_serialization_failures_retry(SERIALIZABLE);
+
+test_pgbench_deadlock_failures(READ_COMMITTED);
+test_pgbench_deadlock_failures(REPEATABLE_READ);
+test_pgbench_deadlock_failures(SERIALIZABLE);
+
+test_pgbench_deadlock_failures_retry(READ_COMMITTED);
+test_pgbench_deadlock_failures_retry(REPEATABLE_READ);
+test_pgbench_deadlock_failures_retry(SERIALIZABLE);
--
1.9.1
Hi,
On 2017-07-21 19:32:02 +0300, Marina Polyakova wrote:
Here is the third version of the patch for pgbench thanks to Fabien Coelho
comments. As in the previous one, transactions with serialization and
deadlock failures are rolled back and retried until they end successfully or
their number of tries reaches maximum.
Just had a need for this feature, and took this to a short test
drive. So some comments:
- it'd be useful to display a retry percentage of all transactions,
similar to what's displayed for failed transactions.
- it appears that we now unconditionally do not disregard a connection
after a serialization / deadlock failure. Good. But that's useful far
beyond just deadlocks / serialization errors, and should probably be exposed.
- it'd be useful to also conveniently display the number of retried
transactions, rather than the total number of retries.
Nice feature!
- Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Aug 11, 2017 at 10:50 PM, Andres Freund <andres@anarazel.de> wrote:
On 2017-07-21 19:32:02 +0300, Marina Polyakova wrote:
Here is the third version of the patch for pgbench thanks to Fabien
Coelho
comments. As in the previous one, transactions with serialization and
deadlock failures are rolled back and retried until they end successfully or
their number of tries reaches maximum.
Just had a need for this feature, and took this to a short test
drive. So some comments:
- it'd be useful to display a retry percentage of all transactions,
similar to what's displayed for failed transactions.
- it appears that we now unconditionally do not disregard a connection
after a serialization / deadlock failure. Good. But that's useful far
beyond just deadlocks / serialization errors, and should probably be
exposed.
Yes, it would be nice not to disregard a connection after other errors
too. However, I'm not sure we should retry the *same* transaction on
errors beyond deadlock / serialization errors. For example, in the case of
a division-by-zero or unique-violation error it would be more natural to
give up on the current transaction and continue with the next one.
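To illustrate the distinction being drawn here, the retry decision could key off the error's SQLSTATE. A minimal sketch (the helper name and structure are illustrative, not from the patch; 40001 and 40P01 are PostgreSQL's SQLSTATE codes for serialization_failure and deadlock_detected):

```c
#include <string.h>

/*
 * Hypothetical helper: only transient concurrency errors are worth
 * retrying the same transaction for.  Other errors, such as a unique
 * violation (23505) or division by zero (22012), are permanent, so the
 * client should give up on the current transaction and move on.
 */
static int
failure_is_retryable(const char *sqlstate)
{
	if (sqlstate == NULL)
		return 0;
	return strcmp(sqlstate, "40001") == 0 ||	/* serialization_failure */
		strcmp(sqlstate, "40P01") == 0;			/* deadlock_detected */
}
```

With such a check, a serialization or deadlock failure triggers a rollback and retry, while any other error ends the current transaction and the client continues with the next one.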
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi,
Hello!
Just had a need for this feature, and took this to a short test
drive. So some comments:
- it'd be useful to display a retry percentage of all transactions,
similar to what's displayed for failed transactions.
- it'd be useful to also conveniently display the number of retried
transactions, rather than the total number of retries.
Ok!
- it appears that we now unconditionally do not disregard a connection
after a serialization / deadlock failure. Good. But that's useful far
beyond just deadlocks / serialization errors, and should probably be
exposed.
I agree that it would be useful. But how do you propose to print the
results if there are many types of errors? I'm afraid the progress
report could become very long, although it is expected to be rather
short [1]. The per-statement report can also be very long...
Nice feature!
Thanks and thank you for your comments :)
[1]: /messages/by-id/alpine.DEB.2.20.1707121142300.12795@lancre
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello,
Here is the third version of the patch for pgbench thanks to Fabien Coelho
comments. As in the previous one, transactions with serialization and
deadlock failures are rolled back and retried until they end successfully or
their number of tries reaches maximum.
Here is some partial review.
Patch applies cleanly.
It compiles with warnings, please fix them:
pgbench.c:2624:28: warning: ‘failure_status’ may be used uninitialized in this function
pgbench.c:2697:34: warning: ‘command’ may be used uninitialized in this function
I do not think that the error handling feature needs preeminence in the
final report, compared to scale, number of clients and so on. The number
of tries should be put further down.
I would spell "number of tries" instead of "tries number" which seems to
suggest that each try is attributed a number. "sql" -> "SQL".
For the per statement latency final report, I do not think it is worth
distinguishing the kind of retry at this level, because ISTM that
serialization & deadlocks are unlikely to appear simultaneously. I would
just report total failures and total tries on this report. We only have 2
errors now, but if more are added I'm pretty sure that we would not want
to have more columns... Moreover the 25 characters alignment is ugly,
better use a much smaller alignment.
I'm okay with having details shown in the "log to file" group report.
The documentation does not seem consistent. It discusses "the very last fields"
and seems to suggest that there are two, but the example trace below just
adds one field.
If you want a paragraph you should add <para>, skipping a line does not
work (around "All values are computed for ...").
I do not understand the second note of the --max-tries documentation.
It seems to suggest that some scripts may not end their own transactions...
which should be an error in my opinion? Some explanations would be welcome.
I'm not sure that "Retries" deserves a type of its own for two counters.
The "retries" in RetriesState may be redundant with these.
The failures are counted on simple counters while retries have a type,
this is not consistent. I suggest to just use simple counters everywhere.
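For illustration, the "simple counters everywhere" suggestion amounts to something like the following, where failures and retries are plain int64 fields in the per-thread stats rather than a dedicated Retries type (field and function names here are illustrative, not the patch's):

```c
#include <stdint.h>

/* Sketch of per-thread stats with plain counters for failures and retries. */
typedef struct StatsSketch
{
	int64_t		cnt;			/* number of transactions */
	int64_t		skipped;		/* skipped under --rate/--latency-limit */
	int64_t		serialization_failures;
	int64_t		deadlock_failures;
	int64_t		retries;		/* total retries over all transactions */
} StatsSketch;

/*
 * Merging per-thread stats then reduces to plain additions; adding
 * zero to zero when a counter was never incremented is harmless.
 */
static void
mergeStatsSketch(StatsSketch *acc, const StatsSketch *s)
{
	acc->cnt += s->cnt;
	acc->skipped += s->skipped;
	acc->serialization_failures += s->serialization_failures;
	acc->deadlock_failures += s->deadlock_failures;
	acc->retries += s->retries;
}
```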
I'm ok with having the detail report tell about failures & retries only
when some occurred.
typo: sucessufully -> successfully
If a native English speaker could provide an opinion on that, and more
generally review the whole documentation, it would be great.
I think that the rand functions should really take a random_state pointer
argument, not a Thread or Client.
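The shape of that refactoring might look as follows: the rand functions take a pointer to just the random state rather than a whole TState or CState. pgbench itself keeps an unsigned short[3] state for pg_erand48(); to keep this sketch self-contained, a small stand-in LCG using erand48()'s constants is used instead, and the helper mirrors what pgbench's getrand() computes:

```c
#include <stdint.h>

/* Per-client (or per-thread) random state, passed explicitly. */
typedef struct RandomState
{
	uint64_t	xseed;
} RandomState;

/* Stand-in for pg_erand48(): 48-bit LCG with erand48()'s constants. */
static double
rand01(RandomState *rs)
{
	rs->xseed = (rs->xseed * 0x5DEECE66DULL + 0xB) & ((1ULL << 48) - 1);
	return (double) rs->xseed / (double) (1ULL << 48);
}

/* Uniform integer in [min, max], as pgbench's getrand() computes. */
static int
getrand_sketch(RandomState *random_state, int min, int max)
{
	return min + (int) ((max - min + 1) * rand01(random_state));
}
```

Decoupling the state from the Thread/Client structs also makes it easy to move the state around, as the patch does when it gives each client its own random_state.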
I'm at odds that FailureStatus does not have a clean NO_FAILURE state,
and that it is merged with misc failures.
I'm not sure that initRetries, mergeRetries, getAllRetries really
deserve a function.
I do not think that there should be two accum functions. Just extend
the existing one; adding zero to zero is not a problem.
I guess that in the end pgbench & psql variables will have to be merged
if pgbench expression engine is to be used by psql as well, but this is
not linked to this patch.
The TAP tests seem over-complicated and heavy, with two pgbench runs in
parallel... I'm not sure we really want all that complexity for this
somewhat small feature. Moreover pgbench can run several scripts, so I'm not
sure why two pgbench invocations would be needed. Could something much
simpler and lighter be proposed instead to test the feature?
The added code does not conform to Pg C style. For instance, if brace
should be aligned to the if. Please conform the project style.
The is_transaction_block_end seems simplistic. ISTM that it would not
work with compound commands. It should be clearly documented somewhere.
Also find attached two scripts I used for some testing:
psql < dl_init.sql
pgbench -f dl_trans.sql -c 8 -T 10 -P 1
--
Fabien.
Hello,
Hi! I'm very sorry that I did not answer for so long, I was very busy with
the release of Postgres Pro 10 :(
Here is the third version of the patch for pgbench thanks to Fabien
Coelho comments. As in the previous one, transactions with
serialization and deadlock failures are rolled back and retried until
they end successfully or their number of tries reaches maximum.
Here is some partial review.
Thank you very much for it!
It compiles with warnings, please fix them:
pgbench.c:2624:28: warning: ‘failure_status’ may be used
uninitialized in this function
pgbench.c:2697:34: warning: ‘command’ may be used uninitialized in
this function
Ok!
I do not think that the error handling feature needs preeminence in the
final report, compare to scale, number of clients and so. The number
of tries should be put further on.
I added it here only because both this field and field "transaction
type" are transaction characteristics. I have some doubts where to add
it. On the one hand, the number of clients, the number of transactions
per client and the number of transactions actually processed form a good
logical block which I don't want to divide. On the other hand, the
number of clients and the number of transactions per client are
parameters, but the number of transactions actually processed is one of
the program results. Where, in your opinion, would it be better to add
the maximum number of transaction tries?
I would spell "number of tries" instead of "tries number" which seems
to
suggest that each try is attributed a number. "sql" -> "SQL".
Ok!
For the per statement latency final report, I do not think it is worth
distinguishing the kind of retry at this level, because ISTM that
serialization & deadlocks are unlikely to appear simultaneously. I
would just report total failures and total tries on this report. We
only have 2 errors now, but if more are added I'm pretty sure that we
would not want to have more columns...
Thanks, I agree with you.
Moreover the 25 characters
alignment is ugly, better use a much smaller alignment.
The variables for the numbers of failures and retries are of type int64
since the variable for the total number of transactions has the same
type. That's why the alignment is so large (as I understand it now, 20
characters would be enough). Would you prefer a floating alignment,
depending on the maximum number of failures/retries for any command in
any script?
I'm okay with having details shown in the "log to file" group report.
I think that the output format of the retries statistics should be the
same everywhere, so I would just like to output the total number of
retries here.
The documentation does not seem consistent. It discusses "the very last
fields"
and seem to suggest that there are two, but the example trace below
just
adds one field.
I'm sorry, I do not understand what you are talking about. I used the
commands and files from the end of your message ("psql <
dl_init.sql" and "pgbench -f dl_trans.sql -c 8 -T 10 -P 1"), and I got
this output from pgbench:
starting vacuum...ERROR: relation "pgbench_branches" does not exist
(ignoring this error and continuing anyway)
ERROR: relation "pgbench_tellers" does not exist
(ignoring this error and continuing anyway)
ERROR: relation "pgbench_history" does not exist
(ignoring this error and continuing anyway)
end.
progress: 1.0 s, 14.0 tps, lat 9.094 ms stddev 5.304
progress: 2.0 s, 25.0 tps, lat 284.934 ms stddev 450.692, 1 failed
progress: 3.0 s, 21.0 tps, lat 337.942 ms stddev 473.210, 1 failed
progress: 4.0 s, 11.0 tps, lat 459.041 ms stddev 499.908, 2 failed
progress: 5.0 s, 28.0 tps, lat 220.219 ms stddev 411.390, 2 failed
progress: 6.0 s, 5.0 tps, lat 402.695 ms stddev 492.526, 2 failed
progress: 7.0 s, 24.0 tps, lat 343.249 ms stddev 626.181, 2 failed
progress: 8.0 s, 14.0 tps, lat 505.396 ms stddev 501.836, 1 failed
progress: 9.0 s, 40.0 tps, lat 180.080 ms stddev 381.335, 1 failed
progress: 10.0 s, 1.0 tps, lat 0.000 ms stddev 0.000, 1 failed
transaction type: dl_trans.sql
transaction maximum tries number: 1
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 1
duration: 10 s
number of transactions actually processed: 191
number of failures: 14 (7.330 %)
latency average = 356.701 ms
latency stddev = 564.942 ms
tps = 18.735807 (including connections establishing)
tps = 18.744898 (excluding connections establishing)
As I understand it, in the documentation "the very last fields" refer to
the aggregation logging which is not used here. So what's the problem?
If you want a paragraph you should add <para>, skipping a line does not
work (around "All values are computed for ...").
Sorry, thanks =[
I do not understand the second note of the --max-tries documentation.
It seems to suggest that some script may not end their own
transaction...
which should be an error in my opinion? Some explanations would be
welcome.
As you told me here [1], "I disagree about exit in ParseScript if the
transaction block is not completed <...> and would break an existing
feature." Maybe it would be better to say this:
In pgbench you can use scripts in which the transaction blocks do not
end. Be careful in this case because transactions that span more than
one script are not rolled back and will not be retried in case of
an error. In such cases, the script in which the error occurred is
reported as failed.
?
I'm not sure that "Retries" deserves a type of its own for two
counters.
Ok!
The "retries" in RetriesState may be redundant with these.
The "retries" in RetriesState have a different goal: they count not all
the retries during the execution of the current script, but only the
retries for the current transaction.
The failures are counted on simple counters while retries have a type,
this is not consistent. I suggest to just use simple counters
everywhere.
Ok!
I'm ok with having the detail report tell about failures & retries only
when some occurred.
Ok!
typo: sucessufully -> successfully
Thanks! =[
If a native English speaker could provide an opinion on that, and more
generally review the whole documentation, it would be great.
I agree with you))
I think that the rand functions should really take a random_state
pointer
argument, not a Thread or Client.
Thanks, I agree.
I'm at odds that FailureStatus does not have a clean NO_FAILURE state,
and that it is merged with misc failures.
:) It is funny but for the code it really did not matter)
I'm not sure that initRetries, mergeRetries, getAllRetries really
deserve a function.
Ok!
I do not think that there should be two accum functions. Just extend
the existing one, and adding zero to zero is not a problem.
Ok!
I guess that in the end pgbench & psql variables will have to be merged
if pgbench expression engine is to be used by psql as well, but this is
not linked to this patch.
Ok!
The TAP tests seem over-complicated and heavy with two pgbench runs in
parallel... I'm not sure we really want all that complexity for this
somehow small feature. Moreover, pgbench can run several scripts, so I'm
not sure why two pgbench instances would need to be invoked. Could
something much simpler and lighter be proposed instead to test the
feature?
Firstly, two pgbench instances need to be invoked because we don't know
which of them will get a deadlock failure. Secondly, I tried much simpler
tests but all of them failed sometimes although everything was ok:
- tests in which pgbench runs 5 clients and 10 transactions per client
for a serialization/deadlock failure on any client (sometimes there are
no failures when it is expected that there will be)
- tests in which pgbench runs 30 clients and 400 transactions per client
for a serialization/deadlock failure on any client (sometimes there are
no failures when it is expected that there will be)
- tests in which the psql session starts concurrently and sleep commands
are used to wait for pgbench for 10 seconds (sometimes it does not work)
Only advisory locks help me not to get such errors in the tests :(
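For illustration, the kind of deterministic interlock that advisory locks make possible could be sketched like this (a sketch only; the lock ids are illustrative, and each half runs in a separate session):

```sql
-- Sketch: forcing a reproducible deadlock with advisory locks.

-- session 1:
BEGIN;
SELECT pg_advisory_xact_lock(1);
-- ... wait until session 2 holds lock 2, then:
SELECT pg_advisory_xact_lock(2);   -- blocks on session 2

-- session 2:
BEGIN;
SELECT pg_advisory_xact_lock(2);
SELECT pg_advisory_xact_lock(1);   -- blocks on session 1; the deadlock
                                   -- detector cancels one of the two
                                   -- sessions with "ERROR: deadlock detected"
```

Unlike timing-based approaches, the failing session here is fixed by the lock acquisition order, which is what makes the test stable.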
The added code does not conform to the PG C style. For instance, the if
brace should be aligned with the if. Please conform to the project style.
I'm sorry, thanks =[
The is_transaction_block_end seems simplistic. ISTM that it would not
work with compound commands. It should be clearly documented somewhere.
Thanks, I'll fix it.
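A compound command of the kind mentioned here might look like this (a hypothetical sketch; the statement is illustrative):

```sql
-- One pgbench command line carrying several statements: the transaction
-- ends in the middle of a single command, which a simplistic
-- end-of-transaction check on the command as a whole can misjudge.
BEGIN; UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = 1; COMMIT;
```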
Also find attached two scripts I used for some testing:
psql < dl_init.sql
pgbench -f dl_trans.sql -c 8 -T 10 -P 1
[1]: /messages/by-id/alpine.DEB.2.20.1707121142300.12795@lancre
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
I suggest a patch where pgbench client sessions are not disconnected because of
serialization or deadlock failures and these failures are mentioned in reports.
In details:
- a transaction with one of these failures continues to run normally, but
its result is rolled back;
- if there were these failures during script execution this "transaction" is marked
appropriately in logs;
- numbers of "transactions" with these failures are printed in progress, in
aggregation logs and in the end with other results (all and for each script);
Hm, I took a look at both threads about the patch and it seems to me now
it's overcomplicated. With recently committed enhancements of pgbench (\if,
\when) it becomes close to impossible to retry a transaction in case of
failure. So, the initial approach of just rolling back such a transaction
looks more attractive.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Hm, I took a look at both threads about the patch and it seems to me now
it's overcomplicated. With recently committed enhancements of pgbench
(\if, \when) it becomes close to impossible to retry a transaction in
case of failure. So, the initial approach of just rolling back such a
transaction looks more attractive.
Yep.
I think that the best approach for now is simply to reset (command zero,
random generator) and start over the whole script, without attempting to
be more intelligent. The limitations should be clearly documented (one
transaction per script), though. That would be a significant enhancement
already.
--
Fabien.
On 25-03-2018 15:23, Fabien COELHO wrote:
Hm, I took a look at both threads about the patch and it seems to me now
it's overcomplicated. With recently committed enhancements of pgbench
(\if, \when) it becomes close to impossible to retry a transaction in
case of failure. So, the initial approach of just rolling back such a
transaction looks more attractive.
Yep.
Many thanks to both of you! I'm working on a patch in this direction..
I think that the best approach for now is simply to reset (command
zero, random generator) and start over the whole script, without
attempting to be more intelligent. The limitations should be clearly
documented (one transaction per script), though. That would be a
significant enhancement already.
I'm not sure that we can always do this, because we can get new errors
until we finish the failed transaction block, and we need to destroy the
conditional stack..
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Marina,
Many thanks to both of you! I'm working on a patch in this direction..
I think that the best approach for now is simply to reset (command
zero, random generator) and start over the whole script, without
attempting to be more intelligent. The limitations should be clearly
documented (one transaction per script), though. That would be a
significant enhancement already.
I'm not sure that we can always do this, because we can get new errors
until we finish the failed transaction block, and we need to destroy the
conditional stack..
Sure. I'm suggesting, to simplify, that on failures the retry would
always restart from the beginning of the script by resetting everything,
indeed including the conditional stack, the random generator state, the
variable values, and so on.
This means somehow enforcing that one script is one transaction.
If the user does not do that, it would be their decision and the result
becomes unpredictable on errors (eg some sub-transactions could be
executed more than once).
Then if more is needed, that could be for another patch.
--
Fabien.
On 26-03-2018 18:53, Fabien COELHO wrote:
Hello Marina,
Hello!
Many thanks to both of you! I'm working on a patch in this direction..
I think that the best approach for now is simply to reset (command
zero, random generator) and start over the whole script, without
attempting to be more intelligent. The limitations should be clearly
documented (one transaction per script), though. That would be a
significant enhancement already.
I'm not sure that we can always do this, because we can get new errors
until we finish the failed transaction block, and we need to destroy the
conditional stack..
Sure. I'm suggesting, to simplify, that on failures the retry would
always restart from the beginning of the script by resetting everything,
indeed including the conditional stack, the random generator state, the
variable values, and so on.
This means somehow enforcing that one script is one transaction.
If the user does not do that, it would be their decision and the result
becomes unpredictable on errors (eg some sub-transactions could be
executed more than once).
Then if more is needed, that could be for another patch.
Here is the fifth version of the patch for pgbench (based on the commit
4b9094eb6e14dfdbed61278ea8e51cc846e43579) where I tried to implement
these ideas, thanks to your comments and those of Teodor Sigaev. Since
we may need to execute commands to complete a failed transaction block,
the script is now always executed completely. If there is a
serialization/deadlock failure which can be retried, the script is
executed again with the same random state and array of variables as
before its first run. Errors in meta commands, as well as all SQL errors,
do not cause the client to abort. The first failure in the current script
execution determines whether the script run will be retried or not, so
only such failures (which are retried) or errors (which are not retried)
are reported.
I tried to make fixes in accordance with your previous reviews ([1],
[2], [3]):
I'm unclear about the example added in the documentation. There are 71%
errors, but 100% of transactions are reported as processed. If there
were errors, then it is not a success, so the transactions were not
processed? To me it looks inconsistent. Also, while testing, it seems
that failed transactions are counted in tps, which I think is not
appropriate:
About the feature:
sh> PGOPTIONS='-c default_transaction_isolation=serializable' \
./pgbench -P 1 -T 3 -r -M prepared -j 2 -c 4
starting vacuum...end.
progress: 1.0 s, 10845.8 tps, lat 0.091 ms stddev 0.491, 10474 failed
# NOT 10845.8 TPS...
progress: 2.0 s, 10534.6 tps, lat 0.094 ms stddev 0.658, 10203 failed
progress: 3.0 s, 10643.4 tps, lat 0.096 ms stddev 0.568, 10290 failed
...
number of transactions actually processed: 32028 # NO!
number of errors: 30969 (96.694 %)
latency average = 2.833 ms
latency stddev = 1.508 ms
tps = 10666.720870 (including connections establishing) # NO
tps = 10683.034369 (excluding connections establishing) # NO
...
For me this is all wrong. I think that the tps report is about
transactions that succeeded, not mere attempts. I cannot say that a
transaction which aborted was "actually processed"... as it was not.
Fixed
The order of reported elements is not logical:
maximum number of transaction tries: 100
scaling factor: 10
query mode: prepared
number of clients: 4
number of threads: 2
duration: 3 s
number of transactions actually processed: 967
number of errors: 152 (15.719 %)
latency average = 9.630 ms
latency stddev = 13.366 ms
number of transactions retried: 623 (64.426 %)
number of retries: 32272
I would suggest to group everything about error handling in one block,
eg something like:
scaling factor: 10
query mode: prepared
number of clients: 4
number of threads: 2
duration: 3 s
number of transactions actually processed: 967
number of errors: 152 (15.719 %)
number of transactions retried: 623 (64.426 %)
number of retries: 32272
maximum number of transaction tries: 100
latency average = 9.630 ms
latency stddev = 13.366 ms
Fixed
Also, the percent character should be stuck to its number: 15.719% to
have the style more homogeneous (although there seem to be pre-existing
inhomogeneities).
I would replace "transaction tries/retried" by "tries/retried",
everything is about transactions in the report anyway.
Without reading the documentation, the overall report semantics is
unclear, especially given the absurd tps results I got with my first
attempt, as failing transactions are counted as "processed".
Fixed
About the code:
I'm at a loss with the 7 states added to the automaton, where I would
have hoped that only 2 (eg RETRY & FAIL, or even less) would be enough.
Fixed
I'm wondering whether the whole feature could be simplified by
considering that one script is one "transaction" (it is from the
report point of view at least), and that any retry is for the full
script only, from its beginning. That would remove the need to guess
at transaction begin or end, avoid scanning manually for subcommands,
and so on.
- Would it make sense?
- Would it be ok for your use case?
Fixed
The proposed version of the code looks unmaintainable to me. There are
3 levels of nested "switch/case" with state changes at the deepest
level.
I cannot even see it on my screen which is not wide enough.
Fixed
There should be a typedef for "random_state", eg something like:
typedef struct { unsigned short data[3]; } RandomState;
Please keep "const" declarations, eg "commandFailed".
I think that choosing a script should depend on the thread random state,
not the client random state, so that a run would generate the same
pattern per thread, independently of which client finishes first.
I'm sceptical of the "--debug-fails" option. ISTM that --debug is
already there and should just be reused.
Fixed
I agree that function naming style is already a mess, but I think that
new functions you add should use a common style, eg "is_compound" vs
"canRetry".
Fixed
Translating error strings to their enum should be put in a function.
Removed
I'm not sure this whole thing should be done anyway.
The processing of compound commands is removed.
The "node" is started but never stopped.
Fixed
For file contents, maybe the << 'EOF' here-document syntax would help
instead
of using concatenated backslashed strings everywhere.
I'm sorry, but I could not get it to work with regular expressions :(
I'd start by stating (i.e. documenting) that the feature assumes that
one script is just *one* transaction.
Note that pgbench somehow already assumes that one script is one
transaction when it reports performance anyway.
If you want 2 transactions, then you have to put them in two scripts,
which looks fine with me. Different transactions are expected to be
independent, otherwise they should be merged into one transaction.
Fixed
Under these restrictions, ISTM that a retry is something like:
case ABORTED:
if (we want to retry) {
// do necessary stats
// reset the initial state (random, vars, current command)
state = START_TX; // loop
}
else {
// count as failed...
state = FINISHED; // or done.
}
break;
...
I'm fine with having END_COMMAND skipping to START_TX if it can be done
easily and cleanly, esp without code duplication.
I did not want to add additional if-expressions to most of the code in
CSTATE_START_TX/CSTATE_END_TX/CSTATE_END_COMMAND, so CSTATE_FAILURE is
used instead of CSTATE_END_COMMAND in case of failure, and CSTATE_RETRY
is entered before CSTATE_END_TX if there was a failure during the
current script execution.
ISTM that ABORTED & FINISHED are currently exactly the same. That would
put a particular use to aborted. Also, there are many points where the
code may go to "aborted" state, so reusing it could help avoid
duplicating
stuff on each abort decision.
To end and roll back the failed transaction block, the script is always
executed completely, and after the failure the following script command
is executed..
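The "in failed SQL transaction" errors discussed in this thread come from a sequence of roughly this shape (a sketch; the statements are illustrative, the error texts are the standard server messages):

```sql
BEGIN;
UPDATE pgbench_tellers SET tbalance = tbalance + 1 WHERE tid = 1;
-- suppose this fails under serializable isolation:
-- ERROR: could not serialize access due to concurrent update
SELECT abalance FROM pgbench_accounts WHERE aid = 1;
-- ERROR: current transaction is aborted, commands ignored until end of
-- transaction block
END;  -- rolls back the failed block; the whole script can then be retried
```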
[1]: /messages/by-id/alpine.DEB.2.20.1801031720270.20034@lancre
[2]: /messages/by-id/alpine.DEB.2.20.1801121309300.10810@lancre
[3]: /messages/by-id/alpine.DEB.2.20.1801121607310.13422@lancre
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v5-0001-Pgbench-errors-and-serialization-deadlock-retries.patch (text/x-diff)
From c31d17661e38db3ae64906dfb7f6ed2c485cdb58 Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Tue, 27 Mar 2018 17:46:43 +0300
Subject: [PATCH v5] Pgbench errors and serialization/deadlock retries
The client's run is aborted only in case of a serious error, for example,
if the connection with the backend was lost. Otherwise, if the execution
of an SQL or meta command fails, the client's run continues normally until
the end of the current script execution (it is assumed that one
transaction script contains only one transaction).
Transactions with serialization or deadlock failures can be rolled back and
retried again and again until they end successfully or their number of tries
reaches the maximum. You can set the maximum number of tries by using the
appropriate benchmarking option (--max-tries). The default value is 1.
If there are retries and/or errors, their statistics are printed in the
progress, in the transaction / aggregation logs and at the end with other
results (all and for each script). A transaction error is reported here only
if the last try of this transaction fails. Also, retries and/or errors are
printed per command with average latencies if you use the appropriate
benchmarking option (--report-per-command, -r) and the total number of
retries and/or errors is not zero.
If a failed transaction block does not terminate in the current script, the
commands of the following scripts are processed as usual so you can get a lot of
errors of type "in failed SQL transaction" (when the current SQL transaction is
aborted and commands ignored until end of transaction block). In such cases you
can use separate statistics of these errors in all reports.
If you want to distinguish between failures or errors by type, use the pgbench
debugging output created with the option --debug and the debugging level
"fails" or "all". The first variant is recommended for this purpose because
in the second case the debugging output can be very large.
---
doc/src/sgml/ref/pgbench.sgml | 304 +++++-
src/bin/pgbench/pgbench.c | 1148 ++++++++++++++++----
src/bin/pgbench/t/001_pgbench_with_server.pl | 25 +-
src/bin/pgbench/t/002_pgbench_no_server.pl | 2 +-
.../t/003_serialization_and_deadlock_fails.pl | 815 ++++++++++++++
5 files changed, 2022 insertions(+), 272 deletions(-)
create mode 100644 src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 41d9030..e105470 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -55,16 +55,19 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
tps = 85.184871 (including connections establishing)
tps = 85.296346 (excluding connections establishing)
</screen>
- The first six lines report some of the most important parameter
- settings. The next line reports the number of transactions completed
- and intended (the latter being just the product of number of clients
+ The first six lines and the eighth line report some of the most important
+ parameter settings. The seventh line reports the number of transactions
+ completed and intended (the latter being just the product of number of clients
and number of transactions per client); these will be equal unless the run
- failed before completion. (In <option>-T</option> mode, only the actual
- number of transactions is printed.)
+ failed before completion or some SQL/meta command(s) failed. (In
+ <option>-T</option> mode, only the actual number of transactions is printed.)
+ (see <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information)
The last two lines report the number of transactions per second,
figured with and without counting the time to start database sessions.
</para>
@@ -380,11 +383,28 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</varlistentry>
<varlistentry>
- <term><option>-d</option></term>
- <term><option>--debug</option></term>
+ <term><option>-d</option> <replaceable>debug_level</replaceable></term>
+ <term><option>--debug=</option><replaceable>debug_level</replaceable></term>
<listitem>
<para>
- Print debugging output.
+ Print debugging output. You can use the following debugging levels:
+ <itemizedlist>
+ <listitem>
+ <para><literal>no</literal>: no debugging output (except built-in
+ function <function>debug</function>, see <xref
+ linkend="pgbench-functions"/>).</para>
+ </listitem>
+ <listitem>
+ <para><literal>fails</literal>: print only error messages and
+ failures (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information).</para>
+ </listitem>
+ <listitem>
+ <para><literal>all</literal>: print all debugging output
+ (throttling, executed/sent/received commands etc.).</para>
+ </listitem>
+ </itemizedlist>
+ The default is no debugging output.
</para>
</listitem>
</varlistentry>
@@ -513,22 +533,37 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
Show progress report every <replaceable>sec</replaceable> seconds. The report
includes the time since the beginning of the run, the tps since the
last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</option>),
- the latency is computed with respect to the transaction scheduled
- start time, not the actual transaction beginning time, thus it also
- includes the average schedule lag time.
+ deviation since the last report. If any transactions ended with a
+ failed SQL or meta command since the last report, they are also reported
+ as failed. If any transactions ended with an error "in failed SQL
+ transaction block", they are reported separately as <literal>in failed
+ tx</literal> (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information). Under
+ throttling (<option>-R</option>), the latency is computed with respect
+ to the transaction scheduled start time, not the actual transaction
+ beginning time, thus it also includes the average schedule lag time. If
+ any transactions have been rolled back and retried after a
+ serialization/deadlock failure since the last report, the report
+ includes the number of such transactions and the sum of all retries. Use
+ the <option>--max-tries</option> option to enable transaction retries after
+ serialization/deadlock failures.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-r</option></term>
- <term><option>--report-latencies</option></term>
+ <term><option>--report-per-command</option></term>
<listitem>
<para>
- Report the average per-statement latency (execution time from the
- perspective of the client) of each command after the benchmark
- finishes. See below for details.
+ Report the following statistics for each command after the benchmark
+ finishes: the average per-statement latency (execution time from the
+ perspective of the client), the number of all errors, the number of
+ errors "in failed SQL transaction block", and the number of retries
+ after serialization or deadlock failures. The report displays the
+ columns with statistics on errors and retries only if the current
+ <application>pgbench</application> run has an error of the corresponding
+ type or retry, respectively. See below for details.
</para>
</listitem>
</varlistentry>
@@ -667,6 +702,35 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</varlistentry>
<varlistentry>
+ <term><option>--max-tries=<replaceable>number_of_tries</replaceable></option></term>
+ <listitem>
+ <para>
+ Set the maximum number of tries for transactions with
+ serialization/deadlock failures. The default is 1.
+ </para>
+ <note>
+ <para>
+ In <application>pgbench</application> it is usually assumed that one
+ transaction script contains only one transaction (see <xref
+ linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"/> for more information). Be
+ careful when repeating scripts that contain multiple transactions: the
+ script is always retried completely, so the successful transactions can
+ be performed several times.
+ </para>
+ </note>
+ <note>
+ <para>
+ Be careful when repeating transactions with shell commands. Unlike the
+ results of SQL commands, the results of shell commands are not rolled
+ back, except for the variable value of the <command>\setshell</command>
+ command.
+ </para>
+ </note>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--progress-timestamp</option></term>
<listitem>
<para>
@@ -807,8 +871,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<refsect1>
<title>Notes</title>
- <refsect2>
- <title>What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts">
+ <title id="transactions-and-scripts-title">What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
<para>
<application>pgbench</application> executes test scripts chosen randomly
@@ -1583,7 +1647,7 @@ END;
The format of the log is:
<synopsis>
-<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -1604,6 +1668,15 @@ END;
When both <option>--rate</option> and <option>--latency-limit</option> are used,
the <replaceable>time</replaceable> for a skipped transaction will be reported as
<literal>skipped</literal>.
+ <replaceable>retries</replaceable> is the sum of all the retries after the
+ serialization or deadlock failures during the current script execution. It is
+ only present when the maximum number of tries for transactions is more than 1
+ (<option>--max-tries</option>). If the transaction ended with an error "in
+ failed SQL transaction", its <replaceable>time</replaceable> will be reported
+ as <literal>in_failed_tx</literal>. If the transaction ended with other
+ error, its <replaceable>time</replaceable> will be reported as
+ <literal>failed</literal> (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information).
</para>
<para>
@@ -1633,6 +1706,23 @@ END;
</para>
<para>
+ The following example shows a snippet of a log file with errors and retries,
+ with the maximum number of tries set to 10:
+<screen>
+3 0 47423 0 1499414498 34501 4
+3 1 8333 0 1499414498 42848 1
+3 2 8358 0 1499414498 51219 1
+4 0 72345 0 1499414498 59433 7
+1 3 41718 0 1499414498 67879 5
+1 4 8416 0 1499414498 76311 1
+3 3 33235 0 1499414498 84469 4
+0 0 failed 0 1499414498 84905 10
+2 0 failed 0 1499414498 86248 10
+3 4 8307 0 1499414498 92788 1
+</screen>
+ </para>
+
+ <para>
When running a long test on hardware that can handle a lot of transactions,
the log files can become very large. The <option>--sampling-rate</option> option
can be used to log only a random sample of transactions.
@@ -1647,7 +1737,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> <replaceable>failed_tx</replaceable> <replaceable>in_failed_tx</replaceable> <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried_tx</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -1661,7 +1751,13 @@ END;
transaction latencies within the interval,
<replaceable>min_latency</replaceable> is the minimum latency within the interval,
and
- <replaceable>max_latency</replaceable> is the maximum latency within the interval.
+ <replaceable>max_latency</replaceable> is the maximum latency within the interval,
+ <replaceable>failed_tx</replaceable> is the number of transactions that ended
+ with a failed SQL or meta command within the interval,
+ <replaceable>in_failed_tx</replaceable> is the number of transactions that
+ ended with an error "in failed SQL transaction block" (see
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information).
The next fields,
<replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
@@ -1669,21 +1765,27 @@ END;
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e. the difference between each transaction's
scheduled start time and the time it actually started.
- The very last field, <replaceable>skipped</replaceable>,
+ The next field, <replaceable>skipped</replaceable>,
is only present if the <option>--latency-limit</option> option is used, too.
It counts the number of transactions skipped because they would have
started too late.
+ The <replaceable>retried_tx</replaceable> and
+ <replaceable>retries</replaceable> fields are only present if the maximum
+ number of tries for transactions is more than 1
+ (<option>--max-tries</option>). They report the number of retried
+ transactions and the sum of all the retries after serialization or deadlock
+ failures within the interval.
Each transaction is counted in the interval when it was committed.
</para>
<para>
Here is some example output:
<screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0 0
+1345828503 7884 1979812 565806736 60 1479 0 0
+1345828505 7208 1979422 567277552 59 1391 0 0
+1345828507 7685 1980268 569784714 60 1398 0 0
+1345828509 7073 1979779 573489941 236 1411 0 0
</screen></para>
<para>
@@ -1695,15 +1797,54 @@ END;
</refsect2>
<refsect2>
- <title>Per-Statement Latencies</title>
+ <title>Per-Statement Report</title>
+
+ <para>
+ With the <option>-r</option> option, <application>pgbench</application>
+ collects the following statistics for each statement:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>latency</literal> — elapsed transaction time for each
+ statement. <application>pgbench</application> reports an average value
+ of all successful runs of the statement.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of errors in this statement. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of errors "in failed SQL transaction" in this statement. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of retries after a serialization or a deadlock failure in
+ this statement. See <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
<para>
- With the <option>-r</option> option, <application>pgbench</application> collects
- the elapsed transaction time of each statement executed by every
- client. It then reports an average of those values, referred to
- as the latency for each statement, after the benchmark has finished.
+ The report displays the columns with statistics on errors and retries only if
+ the current <application>pgbench</application> run has an error or retry,
+ respectively.
</para>
+ <para>
+ All values are computed for each statement executed by every client and are
+ reported after the benchmark has finished.
+ </para>
+
<para>
For the default script, the output will look similar to this:
<screen>
@@ -1715,6 +1856,7 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
latency average = 15.844 ms
latency stddev = 2.715 ms
tps = 618.764555 (including connections establishing)
@@ -1732,10 +1874,49 @@ statement latencies in milliseconds:
0.371 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
1.212 END;
</screen>
+
+ Here is another example of output for the default script, using the
+ serializable default transaction isolation level (<command>PGOPTIONS='-c
+ default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: <builtin: TPC-B (sort of)>
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+number of transactions per client: 1000
+number of transactions actually processed: 3988/10000
+number of errors: 6012 (60.120%)
+number of retried: 8113 (81.130%)
+number of retries: 655869
+maximum number of tries: 100
+latency average = 345.979 ms
+latency stddev = 637.964 ms
+tps = 8.203884 (including connections establishing)
+tps = 8.203969 (excluding connections establishing)
+statement latencies in milliseconds, errors and retries:
+ 0.003 0 0 \set aid random(1, 100000 * :scale)
+ 0.000 0 0 \set bid random(1, 1 * :scale)
+ 0.000 0 0 \set tid random(1, 10 * :scale)
+ 0.000 0 0 \set delta random(-5000, 5000)
+ 0.312 0 0 BEGIN;
+ 0.866 0 0 UPDATE pgbench_accounts
+ SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.698 0 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 0.965 5983 648829 UPDATE pgbench_tellers
+ SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 0.886 29 7029 UPDATE pgbench_branches
+ SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.960 0 0 INSERT INTO pgbench_history
+ (tid, bid, aid, delta, mtime)
+ VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.009 0 11 END;
+</screen>
</para>
<para>
- If multiple script files are specified, the averages are reported
+ If multiple script files are specified, all statistics are reported
separately for each script file.
</para>
@@ -1749,6 +1930,63 @@ statement latencies in milliseconds:
</para>
</refsect2>
+ <refsect2 id="errors-and-retries">
+ <title id="errors-and-retries-title">Errors and Serialization/Deadlock Retries</title>
+
+ <para>
+ A client's run is aborted only in case of a serious error, for example,
+ if the connection with the backend is lost. Otherwise, if the execution
+ of an SQL or meta command fails, the client's run continues normally
+ until the end of the current script execution (it is assumed that one
+ transaction script contains only one transaction; see
+ <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"/> for more information).
+ Transactions with serialization or deadlock failures are rolled back and
+ repeated until they complete successfully or reach the maximum number of
+ tries specified by the <option>--max-tries</option> option. If the last
+ transaction run fails, this transaction is reported as failed.
+ </para>
+
+ <note>
+ <para>
+ If a failed transaction block is not terminated in the current script, the
+ commands of the following scripts are processed as usual, so you can get
+ many errors of type "in failed SQL transaction" (the current SQL
+ transaction is aborted and its commands are ignored until the end of the
+ transaction block). In such cases separate statistics for these errors
+ are provided in all reports.
+ </para>
+ </note>
+
+ <para>
+ The latency of a successful transaction includes the entire time of
+ transaction execution, including rollbacks and retries. Latencies for
+ failed transactions and commands are not computed separately.
+ </para>
+
+ <para>
+ The main report contains the number of failed transactions if it is
+ non-zero. If the total number of transactions that ended with an "in
+ failed SQL transaction" error is non-zero, the main report also contains
+ it. If the total number of retried transactions is non-zero, the main
+ report also contains the statistics related to retries: the total number
+ of retried transactions and the total number of retries (use the
+ <option>--max-tries</option> option to allow retries). The
+ per-statement report inherits all columns from the main report. Note
+ that when a failure or error occurs, subsequent failures and errors in
+ the same script execution are not shown in the reports; a retry is only
+ reported for the first command where the failure occurred.
+ </para>
+
+ <para>
+ If you want to distinguish between failures or errors by type, use the
+ <application>pgbench</application> debugging output created with the
+ <option>--debug</option> option at debugging level
+ <literal>fails</literal> or <literal>all</literal>. The former is
+ recommended for this purpose because at the <literal>all</literal> level
+ the debugging output can be very large.
+ </para>
+ </refsect2>
+
<refsect2>
<title>Good Practices</title>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 8529e7d..c4e2436 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -59,6 +59,9 @@
#include "pgbench.h"
+#define ERRCODE_IN_FAILED_SQL_TRANSACTION "25P02"
+#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"
#define ERRCODE_UNDEFINED_TABLE "42P01"
/*
@@ -186,8 +189,13 @@ bool progress_timestamp = false; /* progress report with Unix time */
int nclients = 1; /* number of clients */
int nthreads = 1; /* number of threads */
bool is_connect; /* establish connection for each transaction */
-bool is_latencies; /* report per-command latencies */
+bool report_per_command = false; /* report per-command latencies, retries
+ * after failures, and errors (failures
+ * that were not retried) */
int main_pid; /* main process id used in log filename */
+int max_tries = 1; /* maximum number of tries to run the
+ * transaction with serialization or deadlock
+ * failures */
char *pghost = "";
char *pgport = "";
@@ -242,14 +250,66 @@ typedef struct SimpleStats
typedef struct StatsData
{
time_t start_time; /* interval start time, for aggregates */
- int64 cnt; /* number of transactions, including skipped */
+ int64 cnt; /* number of successful transactions, including
+ * skipped */
int64 skipped; /* number of transactions skipped under --rate
* and --latency-limit */
+ int64 retries;
+ int64 retried; /* number of transactions that were retried
+ * after a serialization or a deadlock
+ * failure */
+ int64 errors; /* number of transactions that were not
+ * retried after a serialization or a
+ * deadlock failure, or that had another
+ * error (including meta command errors) */
+ int64 errors_in_failed_tx; /* number of transactions that failed
+ * with the error
+ * ERRCODE_IN_FAILED_SQL_TRANSACTION */
SimpleStats latency;
SimpleStats lag;
} StatsData;
/*
+ * Data structure for client variables.
+ */
+typedef struct Variables
+{
+ Variable *array; /* array of variable definitions */
+ int nvariables; /* number of variables */
+ bool vars_sorted; /* are variables sorted by name? */
+} Variables;
+
+/*
+ * Data structure for thread/client random seed.
+ */
+typedef struct RandomState
+{
+ unsigned short data[3];
+} RandomState;
+
+/*
+ * Data structure for repeating a transaction from the beginning with the same
+ * parameters.
+ */
+typedef struct RetryState
+{
+ RandomState random_state; /* random seed */
+ Variables variables; /* client variables */
+} RetryState;
+
+/*
+ * Failure statuses encountered during script execution.
+ */
+typedef enum FailureStatus
+{
+ NO_FAILURE = 0,
+ SERIALIZATION_FAILURE,
+ DEADLOCK_FAILURE,
+ IN_FAILED_SQL_TRANSACTION,
+ ANOTHER_FAILURE
+} FailureStatus;
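For illustration, here is a minimal self-contained sketch of how the SQLSTATEs defined at the top of the file could be mapped onto these categories. <literal>classifySQLState</literal> is a hypothetical helper, not part of the patch; in the real client the SQLSTATE string would come from <literal>PQresultErrorField(res, PG_DIAG_SQLSTATE)</literal>.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): map an SQLSTATE string
 * returned by the server to the failure categories above, using the
 * ERRCODE_* values defined at the top of pgbench.c.
 */
typedef enum FailureStatus
{
	NO_FAILURE = 0,
	SERIALIZATION_FAILURE,		/* SQLSTATE 40001 */
	DEADLOCK_FAILURE,			/* SQLSTATE 40P01 */
	IN_FAILED_SQL_TRANSACTION,	/* SQLSTATE 25P02 */
	ANOTHER_FAILURE
} FailureStatus;

static FailureStatus
classifySQLState(const char *sqlstate)
{
	if (sqlstate == NULL)
		return NO_FAILURE;
	if (strcmp(sqlstate, "40001") == 0)
		return SERIALIZATION_FAILURE;
	if (strcmp(sqlstate, "40P01") == 0)
		return DEADLOCK_FAILURE;
	if (strcmp(sqlstate, "25P02") == 0)
		return IN_FAILED_SQL_TRANSACTION;
	return ANOTHER_FAILURE;
}
```

Only the first two categories are candidates for a retry; the others mark the transaction as failed.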
+
+/*
* Connection state machine states.
*/
typedef enum
@@ -304,6 +364,19 @@ typedef enum
CSTATE_END_COMMAND,
/*
+ * States for transactions with serialization or deadlock failures.
+ *
+ * First, report the failure in CSTATE_FAILURE. Then process the remaining
+ * commands of the failed transaction, if any, and go to CSTATE_RETRY. If
+ * the transaction can be re-executed from the very beginning, set the same
+ * parameters for the transaction execution as in the previous tries and
+ * process the first transaction command in CSTATE_START_COMMAND. Otherwise,
+ * go to CSTATE_END_TX to complete this transaction.
+ */
+ CSTATE_FAILURE,
+ CSTATE_RETRY,
+
+ /*
* CSTATE_END_TX performs end-of-transaction processing. Calculates
* latency, and logs the transaction. In --connect mode, closes the
* current connection. Chooses the next script to execute and starts over
@@ -329,14 +402,13 @@ typedef struct
int id; /* client No. */
ConnectionStateEnum state; /* state machine's current state. */
ConditionalStack cstack; /* enclosing conditionals state */
+ RandomState random_state; /* separate randomness for each client */
int use_file; /* index in sql_script for this client */
int command; /* command number in script */
/* client variables */
- Variable *variables; /* array of variable definitions */
- int nvariables; /* number of variables */
- bool vars_sorted; /* are variables sorted by name? */
+ Variables variables;
/* various times about current transaction */
int64 txn_scheduled; /* scheduled start time of transaction (usec) */
@@ -346,6 +418,16 @@ typedef struct
bool prepared[MAX_SCRIPTS]; /* whether client prepared the script */
+ /*
+ * For processing errors and repeating transactions with serialization or
+ * deadlock failures:
+ */
+ FailureStatus first_failure; /* the status of the first failure in the
+ * current transaction execution; NO_FAILURE
+ * if there were no failures or errors */
+ RetryState retry_state;
+ int retries; /* number of retries so far (less than max_tries) */
+
/* per client collected stats */
int64 cnt; /* client transaction count, for -t */
int ecnt; /* error count */
@@ -389,7 +471,7 @@ typedef struct
pthread_t thread; /* thread handle */
CState *state; /* array of CState */
int nstate; /* length of state[] */
- unsigned short random_state[3]; /* separate randomness for each thread */
+ RandomState random_state; /* separate randomness for each thread */
int64 throttle_trigger; /* previous/next throttling (us) */
FILE *logfile; /* where to log, or NULL */
ZipfCache zipf_cache; /* for thread-safe zipfian random number
@@ -445,6 +527,10 @@ typedef struct
char *argv[MAX_ARGS]; /* command word list */
PgBenchExpr *expr; /* parsed expression, if needed */
SimpleStats stats; /* time spent in this command */
+ int64 retries;
+ int64 errors; /* number of failures that were not retried */
+ int64 errors_in_failed_tx; /* number of errors
+ * ERRCODE_IN_FAILED_SQL_TRANSACTION */
} Command;
typedef struct ParsedScript
@@ -460,7 +546,17 @@ static int num_scripts; /* number of scripts in sql_script[] */
static int num_commands = 0; /* total number of Command structs */
static int64 total_weight = 0;
-static int debug = 0; /* debug flag */
+typedef enum Debuglevel
+{
+ NO_DEBUG = 0, /* no debugging output (except PGBENCH_DEBUG) */
+ DEBUG_FAILS, /* print only error messages and failures */
+ DEBUG_ALL, /* print all debugging output (throttling,
+ * executed/sent/received commands etc.) */
+ NUM_DEBUGLEVEL
+} Debuglevel;
+
+static Debuglevel debug_level = NO_DEBUG; /* debug flag */
+static const char *DEBUGLEVEL[] = {"no", "fails", "all"};
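A sketch of how the three level names could be parsed into the enum. <literal>parse_debug_level</literal> is a hypothetical helper; the patch's actual option handling is not shown in this excerpt.

```c
#include <string.h>

typedef enum Debuglevel
{
	NO_DEBUG = 0,				/* no debugging output */
	DEBUG_FAILS,				/* only error messages and failures */
	DEBUG_ALL					/* all debugging output */
} Debuglevel;

/*
 * Hypothetical parser for the --debug option values shown in the usage
 * text (no|fails|all). Returns 1 and sets *out on success, 0 otherwise.
 */
static int
parse_debug_level(const char *arg, Debuglevel *out)
{
	static const char *names[] = {"no", "fails", "all"};
	int			i;

	for (i = 0; i < 3; i++)
	{
		if (strcmp(arg, names[i]) == 0)
		{
			*out = (Debuglevel) i;
			return 1;
		}
	}
	return 0;					/* unrecognized level */
}
```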
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -508,6 +604,15 @@ static const BuiltinScript builtin_script[] =
}
};
+typedef enum FailStatus
+{
+ TX_FAILURE, /* the transaction will be re-executed from the
+ * very beginning */
+ IN_FAILED_TX, /* continue the failed transaction */
+ TX_ERROR, /* the transaction will be marked as failed */
+ CLIENT_ABORTED /* the client is aborted */
+} FailStatus;
+
/* Function prototypes */
static void setNullValue(PgBenchValue *pv);
@@ -572,7 +677,7 @@ usage(void)
" protocol for submitting queries (default: simple)\n"
" -n, --no-vacuum do not run VACUUM before tests\n"
" -P, --progress=NUM show thread progress report every NUM seconds\n"
- " -r, --report-latencies report average latency per command\n"
+ " -r, --report-per-command report latencies, errors and retries per command\n"
" -R, --rate=NUM target rate in transactions per second\n"
" -s, --scale=NUM report this scale factor in output\n"
" -t, --transactions=NUM number of transactions each client runs (default: 10)\n"
@@ -581,11 +686,12 @@ usage(void)
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
+ " --max-tries=NUM max number of tries to run transaction (default: 1)\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
"\nCommon options:\n"
- " -d, --debug print debugging output\n"
+ " -d, --debug=no|fails|all print debugging output (default: no)\n"
" -h, --host=HOSTNAME database server host or socket directory\n"
" -p, --port=PORT database server port number\n"
" -U, --username=USERNAME connect as specified database user\n"
@@ -693,7 +799,7 @@ gotdigits:
/* random number generator: uniform distribution from min to max inclusive */
static int64
-getrand(TState *thread, int64 min, int64 max)
+getrand(RandomState *random_state, int64 min, int64 max)
{
/*
* Odd coding is so that min and max have approximately the same chance of
@@ -704,7 +810,7 @@ getrand(TState *thread, int64 min, int64 max)
* protected by a mutex, and therefore a bottleneck on machines with many
* CPUs.
*/
- return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
+ return min + (int64) ((max - min + 1) * pg_erand48(random_state->data));
}
/*
@@ -713,7 +819,8 @@ getrand(TState *thread, int64 min, int64 max)
* value is exp(-parameter).
*/
static int64
-getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
+getExponentialRand(RandomState *random_state, int64 min, int64 max,
+ double parameter)
{
double cut,
uniform,
@@ -723,7 +830,7 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
Assert(parameter > 0.0);
cut = exp(-parameter);
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(random_state->data);
/*
* inner expression in (cut, 1] (if parameter > 0), rand in [0, 1)
@@ -736,7 +843,8 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
/* random number generator: gaussian distribution from min to max inclusive */
static int64
-getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
+getGaussianRand(RandomState *random_state, int64 min, int64 max,
+ double parameter)
{
double stdev;
double rand;
@@ -764,8 +872,8 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* are expected in (0, 1] (see
* http://en.wikipedia.org/wiki/Box_muller)
*/
- double rand1 = 1.0 - pg_erand48(thread->random_state);
- double rand2 = 1.0 - pg_erand48(thread->random_state);
+ double rand1 = 1.0 - pg_erand48(random_state->data);
+ double rand2 = 1.0 - pg_erand48(random_state->data);
/* Box-Muller basic form transform */
double var_sqrt = sqrt(-2.0 * log(rand1));
@@ -792,7 +900,7 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* will approximate a Poisson distribution centered on the given value.
*/
static int64
-getPoissonRand(TState *thread, int64 center)
+getPoissonRand(RandomState *random_state, int64 center)
{
/*
* Use inverse transform sampling to generate a value > 0, such that the
@@ -801,7 +909,7 @@ getPoissonRand(TState *thread, int64 center)
double uniform;
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(random_state->data);
return (int64) (-log(uniform) * ((double) center) + 0.5);
}
@@ -879,7 +987,7 @@ zipfFindOrCreateCacheCell(ZipfCache * cache, int64 n, double s)
* Luc Devroye, p. 550-551, Springer 1986.
*/
static int64
-computeIterativeZipfian(TState *thread, int64 n, double s)
+computeIterativeZipfian(RandomState *random_state, int64 n, double s)
{
double b = pow(2.0, s - 1.0);
double x,
@@ -890,8 +998,8 @@ computeIterativeZipfian(TState *thread, int64 n, double s)
while (true)
{
/* random variates */
- u = pg_erand48(thread->random_state);
- v = pg_erand48(thread->random_state);
+ u = pg_erand48(random_state->data);
+ v = pg_erand48(random_state->data);
x = floor(pow(u, -1.0 / (s - 1.0)));
@@ -909,10 +1017,11 @@ computeIterativeZipfian(TState *thread, int64 n, double s)
* Jim Gray et al, SIGMOD 1994
*/
static int64
-computeHarmonicZipfian(TState *thread, int64 n, double s)
+computeHarmonicZipfian(TState *thread, RandomState *random_state, int64 n,
+ double s)
{
ZipfCell *cell = zipfFindOrCreateCacheCell(&thread->zipf_cache, n, s);
- double uniform = pg_erand48(thread->random_state);
+ double uniform = pg_erand48(random_state->data);
double uz = uniform * cell->harmonicn;
if (uz < 1.0)
@@ -924,7 +1033,8 @@ computeHarmonicZipfian(TState *thread, int64 n, double s)
/* random number generator: zipfian distribution from min to max inclusive */
static int64
-getZipfianRand(TState *thread, int64 min, int64 max, double s)
+getZipfianRand(TState *thread, RandomState *random_state, int64 min,
+ int64 max, double s)
{
int64 n = max - min + 1;
@@ -933,8 +1043,8 @@ getZipfianRand(TState *thread, int64 min, int64 max, double s)
return min - 1 + ((s > 1)
- ? computeIterativeZipfian(thread, n, s)
- : computeHarmonicZipfian(thread, n, s));
+ ? computeIterativeZipfian(random_state, n, s)
+ : computeHarmonicZipfian(thread, random_state, n, s));
}
/*
@@ -1034,6 +1144,10 @@ initStats(StatsData *sd, time_t start_time)
sd->start_time = start_time;
sd->cnt = 0;
sd->skipped = 0;
+ sd->retries = 0;
+ sd->retried = 0;
+ sd->errors = 0;
+ sd->errors_in_failed_tx = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1042,8 +1156,24 @@ initStats(StatsData *sd, time_t start_time)
* Accumulate one additional item into the given stats object.
*/
static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumStats(StatsData *stats, bool skipped, double lat, double lag,
+ FailureStatus first_error, int64 retries)
{
+ stats->retries += retries;
+ if (retries > 0)
+ stats->retried++;
+
+ /* failed transactions are processed separately */
+ if (first_error != NO_FAILURE)
+ {
+ stats->errors++;
+
+ if (first_error == IN_FAILED_SQL_TRANSACTION)
+ stats->errors_in_failed_tx++;
+
+ return;
+ }
+
stats->cnt++;
if (skipped)
@@ -1061,6 +1191,14 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag)
}
}
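The accounting rules in <literal>accumStats()</literal> can be condensed into a pure function. A sketch follows, with the <literal>StatsData</literal> struct trimmed to the counters involved (field names follow the patch; the boolean parameters stand in for the <literal>FailureStatus</literal> checks).

```c
/* Trimmed-down StatsData with only the counters updated below. */
typedef struct StatsData
{
	long		cnt;			/* successful transactions */
	long		retries;		/* total number of retries */
	long		retried;		/* transactions that were retried */
	long		errors;			/* failed transactions */
	long		errors_in_failed_tx;	/* errors of type 25P02 */
} StatsData;

/*
 * Sketch of the accumulation rules: retries are counted even for failed
 * transactions, but a transaction with any recorded failure contributes
 * to the error counters instead of cnt.
 */
static void
accum_counts(StatsData *s, int failed, int in_failed_tx, long retries)
{
	s->retries += retries;
	if (retries > 0)
		s->retried++;
	if (failed)
	{
		s->errors++;
		if (in_failed_tx)
			s->errors_in_failed_tx++;
		return;					/* failed transactions don't add to cnt */
	}
	s->cnt++;
}
```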
+static void
+initRandomState(RandomState *random_state)
+{
+ random_state->data[0] = random();
+ random_state->data[1] = random();
+ random_state->data[2] = random();
+}
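Why the per-client <literal>RandomState</literal> matters for retries: each client keeps its own <literal>xsubi[3]</literal> seed for <literal>erand48()</literal>, so a retried transaction can be replayed with identical random draws simply by saving and restoring this small struct. A minimal sketch (not taken verbatim from the patch):

```c
#define _XOPEN_SOURCE 600		/* for erand48() */
#include <stdlib.h>

/* Same layout as the patch's RandomState. */
typedef struct RandomState
{
	unsigned short data[3];
} RandomState;

/* Draw a uniform double in [0, 1), advancing only this state's sequence. */
static double
next_uniform(RandomState *rs)
{
	return erand48(rs->data);
}
```

Since <literal>erand48()</literal> is a deterministic function of its 48-bit state, two states with equal seeds always produce equal sequences, which is exactly what <literal>RetryState</literal> relies on.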
+
/* call PQexec() and exit() on failure */
static void
executeStatement(PGconn *con, const char *sql)
@@ -1184,39 +1322,39 @@ compareVariableNames(const void *v1, const void *v2)
/* Locate a variable by name; returns NULL if unknown */
static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
{
Variable key;
/* On some versions of Solaris, bsearch of zero items dumps core */
- if (st->nvariables <= 0)
+ if (variables->nvariables <= 0)
return NULL;
/* Sort if we have to */
- if (!st->vars_sorted)
+ if (!variables->vars_sorted)
{
- qsort((void *) st->variables, st->nvariables, sizeof(Variable),
- compareVariableNames);
- st->vars_sorted = true;
+ qsort((void *) variables->array, variables->nvariables,
+ sizeof(Variable), compareVariableNames);
+ variables->vars_sorted = true;
}
/* Now we can search */
key.name = name;
return (Variable *) bsearch((void *) &key,
- (void *) st->variables,
- st->nvariables,
+ (void *) variables->array,
+ variables->nvariables,
sizeof(Variable),
compareVariableNames);
}
/* Get the value of a variable, in string form; returns NULL if unknown */
static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
{
Variable *var;
char stringform[64];
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
return NULL; /* not found */
@@ -1290,9 +1428,12 @@ makeVariableValue(Variable *var)
if (sscanf(var->svalue, "%lf%c", &dv, &xs) != 1)
{
- fprintf(stderr,
- "malformed variable \"%s\" value: \"%s\"\n",
- var->name, var->svalue);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "malformed variable \"%s\" value: \"%s\"\n",
+ var->name, var->svalue);
+ }
return false;
}
setDoubleValue(&var->value, dv);
@@ -1340,11 +1481,12 @@ valid_variable_name(const char *name)
* Returns NULL on failure (bad name).
*/
static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
+lookupCreateVariable(Variables *variables, const char *context, char *name,
+ bool aborted)
{
Variable *var;
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
{
Variable *newvars;
@@ -1355,29 +1497,32 @@ lookupCreateVariable(CState *st, const char *context, char *name)
*/
if (!valid_variable_name(name))
{
- fprintf(stderr, "%s: invalid variable name: \"%s\"\n",
- context, name);
+ if (aborted || debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr, "%s: invalid variable name: \"%s\"\n",
+ context, name);
+ }
return NULL;
}
/* Create variable at the end of the array */
- if (st->variables)
- newvars = (Variable *) pg_realloc(st->variables,
- (st->nvariables + 1) * sizeof(Variable));
+ if (variables->array)
+ newvars = (Variable *) pg_realloc(variables->array,
+ (variables->nvariables + 1) * sizeof(Variable));
else
newvars = (Variable *) pg_malloc(sizeof(Variable));
- st->variables = newvars;
+ variables->array = newvars;
- var = &newvars[st->nvariables];
+ var = &newvars[variables->nvariables];
var->name = pg_strdup(name);
var->svalue = NULL;
/* caller is expected to initialize remaining fields */
- st->nvariables++;
+ variables->nvariables++;
/* we don't re-sort the array till we have to */
- st->vars_sorted = false;
+ variables->vars_sorted = false;
}
return var;
@@ -1386,12 +1531,13 @@ lookupCreateVariable(CState *st, const char *context, char *name)
/* Assign a string value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariable(CState *st, const char *context, char *name, const char *value)
+putVariable(Variables *variables, const char *context, char *name,
+ const char *value)
{
Variable *var;
char *val;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name, true);
if (!var)
return false;
@@ -1409,12 +1555,12 @@ putVariable(CState *st, const char *context, char *name, const char *value)
/* Assign a value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableValue(CState *st, const char *context, char *name,
- const PgBenchValue *value)
+putVariableValue(Variables *variables, const char *context, char *name,
+ const PgBenchValue *value, bool aborted)
{
Variable *var;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name, aborted);
if (!var)
return false;
@@ -1429,12 +1575,13 @@ putVariableValue(CState *st, const char *context, char *name,
/* Assign an integer value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putVariableInt(Variables *variables, const char *context, char *name,
+ int64 value, bool aborted)
{
PgBenchValue val;
setIntValue(&val, value);
- return putVariableValue(st, context, name, &val);
+ return putVariableValue(variables, context, name, &val, aborted);
}
/*
@@ -1489,7 +1636,7 @@ replaceVariable(char **sql, char *param, int len, char *value)
}
static char *
-assignVariables(CState *st, char *sql)
+assignVariables(Variables *variables, char *sql)
{
char *p,
*name,
@@ -1510,7 +1657,7 @@ assignVariables(CState *st, char *sql)
continue;
}
- val = getVariable(st, name);
+ val = getVariable(variables, name);
free(name);
if (val == NULL)
{
@@ -1525,12 +1672,13 @@ assignVariables(CState *st, char *sql)
}
static void
-getQueryParams(CState *st, const Command *command, const char **params)
+getQueryParams(Variables *variables, const Command *command,
+ const char **params)
{
int i;
for (i = 0; i < command->argc - 1; i++)
- params[i] = getVariable(st, command->argv[i + 1]);
+ params[i] = getVariable(variables, command->argv[i + 1]);
}
static char *
@@ -1565,7 +1713,11 @@ coerceToBool(PgBenchValue *pval, bool *bval)
}
else /* NULL, INT or DOUBLE */
{
- fprintf(stderr, "cannot coerce %s to boolean\n", valueTypeName(pval));
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr, "cannot coerce %s to boolean\n",
+ valueTypeName(pval));
+ }
*bval = false; /* suppress uninitialized-variable warnings */
return false;
}
@@ -1610,7 +1762,8 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
if (dval < PG_INT64_MIN || PG_INT64_MAX < dval)
{
- fprintf(stderr, "double to int overflow for %f\n", dval);
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "double to int overflow for %f\n", dval);
return false;
}
*ival = (int64) dval;
@@ -1618,7 +1771,8 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
}
else /* BOOLEAN or NULL */
{
- fprintf(stderr, "cannot coerce %s to int\n", valueTypeName(pval));
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "cannot coerce %s to int\n", valueTypeName(pval));
return false;
}
}
@@ -1639,7 +1793,9 @@ coerceToDouble(PgBenchValue *pval, double *dval)
}
else /* BOOLEAN or NULL */
{
- fprintf(stderr, "cannot coerce %s to double\n", valueTypeName(pval));
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "cannot coerce %s to double\n",
+ valueTypeName(pval));
return false;
}
}
@@ -1817,8 +1973,11 @@ evalStandardFunc(TState *thread, CState *st,
if (l != NULL)
{
- fprintf(stderr,
- "too many function arguments, maximum is %d\n", MAX_FARGS);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "too many function arguments, maximum is %d\n", MAX_FARGS);
+ }
return false;
}
@@ -1941,7 +2100,8 @@ evalStandardFunc(TState *thread, CState *st,
case PGBENCH_MOD:
if (ri == 0)
{
- fprintf(stderr, "division by zero\n");
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "division by zero\n");
return false;
}
/* special handling of -1 divisor */
@@ -1952,7 +2112,11 @@ evalStandardFunc(TState *thread, CState *st,
/* overflow check (needed for INT64_MIN) */
if (li == PG_INT64_MIN)
{
- fprintf(stderr, "bigint out of range\n");
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "bigint out of range\n");
+ }
return false;
}
else
@@ -2187,20 +2351,22 @@ evalStandardFunc(TState *thread, CState *st,
/* check random range */
if (imin > imax)
{
- fprintf(stderr, "empty range given to random\n");
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "empty range given to random\n");
return false;
}
else if (imax - imin < 0 || (imax - imin) + 1 < 0)
{
/* prevent int overflows in random functions */
- fprintf(stderr, "random range is too large\n");
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "random range is too large\n");
return false;
}
if (func == PGBENCH_RANDOM)
{
Assert(nargs == 2);
- setIntValue(retval, getrand(thread, imin, imax));
+ setIntValue(retval, getrand(&st->random_state, imin, imax));
}
else /* gaussian & exponential */
{
@@ -2215,39 +2381,51 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param < MIN_GAUSSIAN_PARAM)
{
- fprintf(stderr,
- "gaussian parameter must be at least %f "
- "(not %f)\n", MIN_GAUSSIAN_PARAM, param);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "gaussian parameter must be at least %f (not %f)\n",
+ MIN_GAUSSIAN_PARAM, param);
+ }
return false;
}
setIntValue(retval,
- getGaussianRand(thread, imin, imax, param));
+ getGaussianRand(&st->random_state, imin,
+ imax, param));
}
else if (func == PGBENCH_RANDOM_ZIPFIAN)
{
if (param <= 0.0 || param == 1.0 || param > MAX_ZIPFIAN_PARAM)
{
- fprintf(stderr,
- "zipfian parameter must be in range (0, 1) U (1, %d]"
- " (got %f)\n", MAX_ZIPFIAN_PARAM, param);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "zipfian parameter must be in range (0, 1) U (1, %d] (got %f)\n",
+ MAX_ZIPFIAN_PARAM, param);
+ }
return false;
}
setIntValue(retval,
- getZipfianRand(thread, imin, imax, param));
+ getZipfianRand(thread, &st->random_state,
+ imin, imax, param));
}
else /* exponential */
{
if (param <= 0.0)
{
- fprintf(stderr,
- "exponential parameter must be greater than zero"
- " (got %f)\n", param);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "exponential parameter must be greater than zero (got %f)\n",
+ param);
+ }
return false;
}
setIntValue(retval,
- getExponentialRand(thread, imin, imax, param));
+ getExponentialRand(&st->random_state, imin,
+ imax, param));
}
}
@@ -2346,10 +2524,13 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
{
Variable *var;
- if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
+ if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
- fprintf(stderr, "undefined variable \"%s\"\n",
- expr->u.variable.varname);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr, "undefined variable \"%s\"\n",
+ expr->u.variable.varname);
+ }
return false;
}
@@ -2410,7 +2591,7 @@ getMetaCommand(const char *cmd)
* Return true if succeeded, or false on error.
*/
static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
char command[SHELL_COMMAND_SIZE];
int i,
@@ -2441,17 +2622,21 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
arg = argv[i] + 1; /* a string literal starting with colons */
}
- else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
+ else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
- fprintf(stderr, "%s: undefined variable \"%s\"\n",
- argv[0], argv[i]);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr, "%s: undefined variable \"%s\"\n",
+ argv[0], argv[i]);
+ }
return false;
}
arglen = strlen(arg);
if (len + arglen + (i > 0 ? 1 : 0) >= SHELL_COMMAND_SIZE - 1)
{
- fprintf(stderr, "%s: shell command is too long\n", argv[0]);
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "%s: shell command is too long\n", argv[0]);
return false;
}
@@ -2468,7 +2653,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
if (system(command))
{
- if (!timer_exceeded)
+ if (!timer_exceeded && debug_level >= DEBUG_FAILS)
fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
return false;
}
@@ -2478,19 +2663,21 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
/* Execute the command with pipe and read the standard output. */
if ((fp = popen(command, "r")) == NULL)
{
- fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
return false;
}
if (fgets(res, sizeof(res), fp) == NULL)
{
- if (!timer_exceeded)
+ if (!timer_exceeded && debug_level >= DEBUG_FAILS)
fprintf(stderr, "%s: could not read result of shell command\n", argv[0]);
(void) pclose(fp);
return false;
}
if (pclose(fp) < 0)
{
- fprintf(stderr, "%s: could not close shell command\n", argv[0]);
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "%s: could not close shell command\n", argv[0]);
return false;
}
@@ -2500,11 +2687,14 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
endptr++;
if (*res == '\0' || *endptr != '\0')
{
- fprintf(stderr, "%s: shell command must return an integer (not \"%s\")\n",
- argv[0], res);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr, "%s: shell command must return an integer (not \"%s\")\n",
+ argv[0], res);
+ }
return false;
}
- if (!putVariableInt(st, "setshell", variable, retval))
+ if (!putVariableInt(variables, "setshell", variable, retval, false))
return false;
#ifdef DEBUG
@@ -2521,11 +2711,46 @@ preparedStatementName(char *buffer, int file, int state)
}
static void
-commandFailed(CState *st, const char *cmd, const char *message)
+commandFailed(CState *st, const char *cmd, const char *message, FailStatus
+ fail_status)
{
- fprintf(stderr,
- "client %d aborted in command %d (%s) of script %d; %s\n",
- st->id, st->command, cmd, st->use_file, message);
+ switch (fail_status)
+ {
+ case TX_FAILURE:
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "client %d got a failure (try %d/%d) in command %d (%s) of script %d; %s\n",
+ st->id, st->retries + 1, max_tries, st->command, cmd,
+ st->use_file, message);
+ }
+ break;
+ case IN_FAILED_TX:
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "client %d continues a failed transaction in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file, message);
+ }
+ break;
+ case TX_ERROR:
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "client %d got an error in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file, message);
+ }
+ break;
+ case CLIENT_ABORTED:
+ fprintf(stderr,
+ "client %d aborted in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file, message);
+ break;
+ default:
+ /* internal error which should never occur */
+ fprintf(stderr, "unexpected fail status %d\n", fail_status);
+ exit(1);
+ }
}
/* return a script number with a weighted choice. */
@@ -2538,7 +2763,7 @@ chooseScript(TState *thread)
if (num_scripts == 1)
return 0;
- w = getrand(thread, 0, total_weight - 1);
+ w = getrand(&thread->random_state, 0, total_weight - 1);
do
{
w -= sql_script[i++].weight;
@@ -2558,9 +2783,9 @@ sendCommand(CState *st, Command *command)
char *sql;
sql = pg_strdup(command->argv[0]);
- sql = assignVariables(st, sql);
+ sql = assignVariables(&st->variables, sql);
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d sending %s\n", st->id, sql);
r = PQsendQuery(st->con, sql);
free(sql);
@@ -2570,9 +2795,9 @@ sendCommand(CState *st, Command *command)
const char *sql = command->argv[0];
const char *params[MAX_ARGS];
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d sending %s\n", st->id, sql);
r = PQsendQueryParams(st->con, sql, command->argc - 1,
NULL, params, NULL, NULL, 0);
@@ -2604,10 +2829,10 @@ sendCommand(CState *st, Command *command)
st->prepared[st->use_file] = true;
}
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d sending %s\n", st->id, name);
r = PQsendQueryPrepared(st->con, name, command->argc - 1,
params, NULL, NULL, 0);
@@ -2617,10 +2842,9 @@ sendCommand(CState *st, Command *command)
if (r == 0)
{
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d could not send %s\n",
st->id, command->argv[0]);
- st->ecnt++;
return false;
}
else
@@ -2632,17 +2856,20 @@ sendCommand(CState *st, Command *command)
* of delay, in microseconds. Returns true on success, false on error.
*/
static bool
-evaluateSleep(CState *st, int argc, char **argv, int *usecs)
+evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
char *var;
int usec;
if (*argv[1] == ':')
{
- if ((var = getVariable(st, argv[1] + 1)) == NULL)
+ if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
- fprintf(stderr, "%s: undefined variable \"%s\"\n",
- argv[0], argv[1]);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr, "%s: undefined variable \"%s\"\n",
+ argv[0], argv[1]);
+ }
return false;
}
usec = atoi(var);
@@ -2665,6 +2892,162 @@ evaluateSleep(CState *st, int argc, char **argv, int *usecs)
}
/*
+ * Get the total number of processed transactions, including skipped ones and
+ * errors.
+ */
+static int64
+getTotalCnt(const CState *st)
+{
+ return st->cnt + st->ecnt;
+}
+
+/*
+ * Copy a random state.
+ */
+static void
+copyRandomState(RandomState *destination, const RandomState *source)
+{
+ memcpy(destination->data, source->data, sizeof(unsigned short) * 3);
+}
+
+/*
+ * Make a deep copy of the variables array.
+ */
+static void
+copyVariables(Variables *destination_vars, const Variables *source_vars)
+{
+ Variable *destination;
+ Variable *current_destination;
+ const Variable *source;
+ const Variable *current_source;
+ int nvariables;
+
+ if (!destination_vars || !source_vars)
+ return;
+
+ destination = destination_vars->array;
+ source = source_vars->array;
+ nvariables = source_vars->nvariables;
+
+ for (current_destination = destination;
+ current_destination - destination < destination_vars->nvariables;
+ ++current_destination)
+ {
+ pg_free(current_destination->name);
+ pg_free(current_destination->svalue);
+ }
+
+ destination_vars->array = pg_realloc(destination_vars->array,
+ sizeof(Variable) * nvariables);
+ destination = destination_vars->array;
+
+ for (current_source = source, current_destination = destination;
+ current_source - source < nvariables;
+ ++current_source, ++current_destination)
+ {
+ current_destination->name = pg_strdup(current_source->name);
+ if (current_source->svalue)
+ current_destination->svalue = pg_strdup(current_source->svalue);
+ else
+ current_destination->svalue = NULL;
+ current_destination->value = current_source->value;
+ }
+
+ destination_vars->nvariables = nvariables;
+ destination_vars->vars_sorted = source_vars->vars_sorted;
+}
+
+/*
+ * Returns true if the failure can be retried.
+ */
+static bool
+canRetry(CState *st, FailureStatus failure_status)
+{
+ Assert(failure_status != NO_FAILURE);
+
+ /*
+ * All subsequent failures will be "retried" if the first failure of this
+ * transaction can be retried.
+ */
+ if (st->first_failure != NO_FAILURE)
+ failure_status = st->first_failure;
+
+ /* We can only retry serialization or deadlock failures. */
+ if (!(failure_status == SERIALIZATION_FAILURE ||
+ failure_status == DEADLOCK_FAILURE))
+ return false;
+
+ /*
+ * We cannot retry the failure if we have reached the maximum number of
+ * tries.
+ */
+ if (st->retries + 1 >= max_tries)
+ return false;
+
+ /* OK */
+ return true;
+}
+
+/*
+ * Return the transaction status: find out if there's a failure that can be
+ * retried, or there's an error that cannot be retried; or we continue an
+ * already failed transaction.
+ */
+static FailStatus
+getFailStatus(CState *st, FailureStatus failure_status)
+{
+ Assert(failure_status != NO_FAILURE);
+
+ if (st->first_failure == NO_FAILURE)
+ return canRetry(st, failure_status) ? TX_FAILURE : TX_ERROR;
+ else
+ return IN_FAILED_TX;
+}
+
+/*
+ * Process the conditional stack depending on the condition value; used for
+ * the meta commands \if and \elif.
+ */
+static void
+executeCondition(CState *st, bool condition)
+{
+ Command *command = sql_script[st->use_file].commands[st->command];
+
+ /* execute or not depending on evaluated condition */
+ if (command->meta == META_IF)
+ {
+ conditional_stack_push(st->cstack,
+ condition ? IFSTATE_TRUE : IFSTATE_FALSE);
+ }
+ else if (command->meta == META_ELIF)
+ {
+ /* we should get here only if the "elif" needed evaluation */
+ Assert(conditional_stack_peek(st->cstack) == IFSTATE_FALSE);
+ conditional_stack_poke(st->cstack,
+ condition ? IFSTATE_TRUE : IFSTATE_FALSE);
+ }
+}
+
+/*
+ * Get the failure status from the error code.
+ */
+static FailureStatus
+getFailureStatus(char *sqlState)
+{
+ if (sqlState)
+ {
+ if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
+ return SERIALIZATION_FAILURE;
+ else if (strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
+ return DEADLOCK_FAILURE;
+ else if (strcmp(sqlState, ERRCODE_IN_FAILED_SQL_TRANSACTION) == 0)
+ return IN_FAILED_SQL_TRANSACTION;
+ }
+
+ return ANOTHER_FAILURE;
+}
+
+/*
* Advance the state machine of a connection, if possible.
*/
static void
@@ -2675,6 +3058,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
instr_time now;
bool end_tx_processed = false;
int64 wait;
+ FailureStatus failure_status = NO_FAILURE;
/*
* gettimeofday() isn't free, so we get the current timestamp lazily the
@@ -2705,7 +3089,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->use_file = chooseScript(thread);
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d executing script \"%s\"\n", st->id,
sql_script[st->use_file].desc);
@@ -2715,6 +3099,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->state = CSTATE_START_TX;
/* check consistency */
Assert(conditional_stack_empty(st->cstack));
+
+ /* reset transaction variables to default values */
+ st->first_failure = NO_FAILURE;
+ st->retries = 0;
+
break;
/*
@@ -2732,7 +3121,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* away.
*/
Assert(throttle_delay > 0);
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(&thread->random_state, throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
@@ -2762,16 +3151,17 @@ doCustom(TState *thread, CState *st, StatsData *agg)
INSTR_TIME_SET_CURRENT(now);
now_us = INSTR_TIME_GET_MICROSEC(now);
while (thread->throttle_trigger < now_us - latency_limit &&
- (nxacts <= 0 || st->cnt < nxacts))
+ (nxacts <= 0 || getTotalCnt(st) < nxacts))
{
processXactStats(thread, st, &now, true, agg);
/* next rendez-vous */
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(&thread->random_state,
+ throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
}
/* stop client if -t exceeded */
- if (nxacts > 0 && st->cnt >= nxacts)
+ if (nxacts > 0 && getTotalCnt(st) >= nxacts)
{
st->state = CSTATE_FINISHED;
break;
@@ -2779,7 +3169,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
st->state = CSTATE_THROTTLE;
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d throttling " INT64_FORMAT " us\n",
st->id, wait);
break;
@@ -2826,6 +3216,14 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
/*
+ * This is the first try of this transaction. Remember its
+ * parameters in case we have to repeat it later.
+ */
+ copyRandomState(&st->retry_state.random_state,
+ &st->random_state);
+ copyVariables(&st->retry_state.variables, &st->variables);
+
+ /*
* Record transaction start time under logging, progress or
* throttling.
*/
@@ -2861,7 +3259,15 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
if (command == NULL)
{
- st->state = CSTATE_END_TX;
+ if (st->first_failure == NO_FAILURE)
+ {
+ st->state = CSTATE_END_TX;
+ }
+ else
+ {
+ /* check if we can retry the failure */
+ st->state = CSTATE_RETRY;
+ }
break;
}
@@ -2869,7 +3275,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Record statement start time if per-command latencies are
* requested
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -2880,7 +3286,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
{
if (!sendCommand(st, command))
{
- commandFailed(st, "SQL", "SQL command send failed");
+ commandFailed(st, "SQL", "SQL command send failed",
+ CLIENT_ABORTED);
st->state = CSTATE_ABORTED;
}
else
@@ -2892,7 +3299,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
i;
char **argv = command->argv;
- if (debug)
+ if (debug_level >= DEBUG_ALL)
{
fprintf(stderr, "client %d executing \\%s", st->id, argv[0]);
for (i = 1; i < argc; i++)
@@ -2900,6 +3307,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
fprintf(stderr, "\n");
}
+ /* will be changed if the meta command fails */
+ failure_status = NO_FAILURE;
+
if (command->meta == META_SLEEP)
{
/*
@@ -2911,10 +3321,12 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
int usec;
- if (!evaluateSleep(st, argc, argv, &usec))
+ if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
- commandFailed(st, "sleep", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ failure_status = ANOTHER_FAILURE;
+ commandFailed(st, "sleep", "execution of meta-command failed",
+ getFailStatus(st, failure_status));
+ st->state = CSTATE_FAILURE;
break;
}
@@ -2942,35 +3354,35 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (!evaluateExpr(thread, st, expr, &result))
{
- commandFailed(st, argv[0], "evaluation of meta-command failed");
- st->state = CSTATE_ABORTED;
+ failure_status = ANOTHER_FAILURE;
+ commandFailed(st, argv[0], "evaluation of meta-command failed",
+ getFailStatus(st, failure_status));
+
+ /*
+ * Do not ruin the following conditional commands,
+ * if any.
+ */
+ executeCondition(st, false);
+
+ st->state = CSTATE_FAILURE;
break;
}
if (command->meta == META_SET)
{
- if (!putVariableValue(st, argv[0], argv[1], &result))
+ if (!putVariableValue(&st->variables, argv[0],
+ argv[1], &result, false))
{
- commandFailed(st, "set", "assignment of meta-command failed");
- st->state = CSTATE_ABORTED;
+ failure_status = ANOTHER_FAILURE;
+ commandFailed(st, "set", "assignment of meta-command failed",
+ getFailStatus(st, failure_status));
+ st->state = CSTATE_FAILURE;
break;
}
}
else /* if and elif evaluated cases */
{
- bool cond = valueTruth(&result);
-
- /* execute or not depending on evaluated condition */
- if (command->meta == META_IF)
- {
- conditional_stack_push(st->cstack, cond ? IFSTATE_TRUE : IFSTATE_FALSE);
- }
- else /* elif */
- {
- /* we should get here only if the "elif" needed evaluation */
- Assert(conditional_stack_peek(st->cstack) == IFSTATE_FALSE);
- conditional_stack_poke(st->cstack, cond ? IFSTATE_TRUE : IFSTATE_FALSE);
- }
+ executeCondition(st, valueTruth(&result));
}
}
else if (command->meta == META_ELSE)
@@ -2999,7 +3411,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (command->meta == META_SETSHELL)
{
- bool ret = runShellCommand(st, argv[1], argv + 2, argc - 2);
+ bool ret = runShellCommand(&st->variables,
+ argv[1], argv + 2,
+ argc - 2);
if (timer_exceeded) /* timeout */
{
@@ -3008,8 +3422,10 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (!ret) /* on error */
{
- commandFailed(st, "setshell", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ failure_status = ANOTHER_FAILURE;
+ commandFailed(st, "setshell", "execution of meta-command failed",
+ getFailStatus(st, failure_status));
+ st->state = CSTATE_FAILURE;
break;
}
else
@@ -3019,7 +3435,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (command->meta == META_SHELL)
{
- bool ret = runShellCommand(st, NULL, argv + 1, argc - 1);
+ bool ret = runShellCommand(&st->variables, NULL,
+ argv + 1, argc - 1);
if (timer_exceeded) /* timeout */
{
@@ -3028,8 +3445,10 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (!ret) /* on error */
{
- commandFailed(st, "shell", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ failure_status = ANOTHER_FAILURE;
+ commandFailed(st, "shell", "execution of meta-command failed",
+ getFailStatus(st, failure_status));
+ st->state = CSTATE_FAILURE;
break;
}
else
@@ -3134,37 +3553,54 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Wait for the current SQL command to complete
*/
case CSTATE_WAIT_RESULT:
- command = sql_script[st->use_file].commands[st->command];
- if (debug)
- fprintf(stderr, "client %d receiving\n", st->id);
- if (!PQconsumeInput(st->con))
- { /* there's something wrong */
- commandFailed(st, "SQL", "perhaps the backend died while processing");
- st->state = CSTATE_ABORTED;
- break;
- }
- if (PQisBusy(st->con))
- return; /* don't have the whole result yet */
-
- /*
- * Read and discard the query result;
- */
- res = PQgetResult(st->con);
- switch (PQresultStatus(res))
{
- case PGRES_COMMAND_OK:
- case PGRES_TUPLES_OK:
- case PGRES_EMPTY_QUERY:
- /* OK */
- PQclear(res);
- discard_response(st);
- st->state = CSTATE_END_COMMAND;
- break;
- default:
- commandFailed(st, "SQL", PQerrorMessage(st->con));
- PQclear(res);
+ char *sqlState;
+
+ command = sql_script[st->use_file].commands[st->command];
+ if (debug_level >= DEBUG_ALL)
+ fprintf(stderr, "client %d receiving\n", st->id);
+ if (!PQconsumeInput(st->con))
+ { /* there's something wrong */
+ commandFailed(st, "SQL", "perhaps the backend died while processing",
+ CLIENT_ABORTED);
st->state = CSTATE_ABORTED;
break;
+ }
+ if (PQisBusy(st->con))
+ return; /* don't have the whole result yet */
+
+ /*
+ * Read and discard the query result;
+ */
+ res = PQgetResult(st->con);
+ sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+ switch (PQresultStatus(res))
+ {
+ case PGRES_COMMAND_OK:
+ case PGRES_TUPLES_OK:
+ case PGRES_EMPTY_QUERY:
+ /* OK */
+ PQclear(res);
+ discard_response(st);
+ failure_status = NO_FAILURE;
+ st->state = CSTATE_END_COMMAND;
+ break;
+ case PGRES_NONFATAL_ERROR:
+ case PGRES_FATAL_ERROR:
+ failure_status = getFailureStatus(sqlState);
+ commandFailed(st, "SQL", PQerrorMessage(st->con),
+ getFailStatus(st, failure_status));
+ PQclear(res);
+ discard_response(st);
+ st->state = CSTATE_FAILURE;
+ break;
+ default:
+ commandFailed(st, "SQL", PQerrorMessage(st->con),
+ CLIENT_ABORTED);
+ PQclear(res);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
}
break;
@@ -3193,7 +3629,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* in thread-local data structure, if per-command latencies
* are requested.
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -3212,6 +3648,90 @@ doCustom(TState *thread, CState *st, StatsData *agg)
break;
/*
+ * Report the failure/error and go ahead with the next command.
+ */
+ case CSTATE_FAILURE:
+ command = sql_script[st->use_file].commands[st->command];
+
+ Assert(failure_status != NO_FAILURE);
+
+ /*
+ * All subsequent failures of this transaction are "retried" or
+ * "failed" depending on whether its first failure can be
+ * retried. Therefore only the first failure/error is mentioned
+ * in the reports.
+ */
+ if (st->first_failure == NO_FAILURE)
+ {
+ st->first_failure = failure_status;
+
+ if (report_per_command)
+ {
+ if (canRetry(st, failure_status))
+ {
+ /*
+ * The failed transaction will be retried. So
+ * accumulate the retry for the command.
+ */
+ command->retries++;
+ }
+ else
+ {
+ /*
+ * We will not be able to retry this failed
+ * transaction. So accumulate the error for the
+ * command.
+ */
+ command->errors++;
+ if (failure_status == IN_FAILED_SQL_TRANSACTION)
+ command->errors_in_failed_tx++;
+ }
+ }
+ }
+
+ /* Go ahead with the next command, to be executed or skipped */
+ st->command++;
+ st->state = conditional_active(st->cstack) ?
+ CSTATE_START_COMMAND : CSTATE_SKIP_COMMAND;
+ break;
+
+ /*
+ * Retry the failed transaction if possible.
+ */
+ case CSTATE_RETRY:
+ if (canRetry(st, st->first_failure))
+ {
+ st->retries++;
+
+ if (debug_level >= DEBUG_ALL)
+ {
+ fprintf(stderr, "client %d repeats the failed transaction (try %d/%d)\n",
+ st->id,
+ st->retries + 1,
+ max_tries);
+ }
+
+ /*
+ * Reset the execution parameters to what they were at the
+ * beginning of the transaction.
+ */
+ copyRandomState(&st->random_state,
+ &st->retry_state.random_state);
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* Process the first transaction command */
+ st->command = 0;
+ st->first_failure = NO_FAILURE;
+ st->state = CSTATE_START_COMMAND;
+ }
+ else
+ {
+ /* End the failed transaction */
+ st->state = CSTATE_END_TX;
+ }
+ break;
+
+ /*
* End of transaction.
*/
case CSTATE_END_TX:
@@ -3232,7 +3752,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
INSTR_TIME_SET_ZERO(now);
}
- if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
+ if ((getTotalCnt(st) >= nxacts && duration <= 0) ||
+ timer_exceeded)
{
/* exit success */
st->state = CSTATE_FINISHED;
@@ -3292,7 +3813,7 @@ doLog(TState *thread, CState *st,
* to the random sample.
*/
if (sample_rate != 0.0 &&
- pg_erand48(thread->random_state) > sample_rate)
+ pg_erand48(thread->random_state.data) > sample_rate)
return;
/* should we aggregate the results or not? */
@@ -3308,13 +3829,15 @@ doLog(TState *thread, CState *st,
while (agg->start_time + agg_interval <= now)
{
/* print aggregated report to logfile */
- fprintf(logfile, "%ld " INT64_FORMAT " %.0f %.0f %.0f %.0f",
+ fprintf(logfile, "%ld " INT64_FORMAT " %.0f %.0f %.0f %.0f " INT64_FORMAT " " INT64_FORMAT,
(long) agg->start_time,
agg->cnt,
agg->latency.sum,
agg->latency.sum2,
agg->latency.min,
- agg->latency.max);
+ agg->latency.max,
+ agg->errors,
+ agg->errors_in_failed_tx);
if (throttle_delay)
{
fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -3325,6 +3848,10 @@ doLog(TState *thread, CState *st,
if (latency_limit)
fprintf(logfile, " " INT64_FORMAT, agg->skipped);
}
+ if (max_tries > 1)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->retried,
+ agg->retries);
fputc('\n', logfile);
/* reset data and move to next interval */
@@ -3332,7 +3859,7 @@ doLog(TState *thread, CState *st,
}
/* accumulate the current transaction */
- accumStats(agg, skipped, latency, lag);
+ accumStats(agg, skipped, latency, lag, st->first_failure, st->retries);
}
else
{
@@ -3342,14 +3869,25 @@ doLog(TState *thread, CState *st,
gettimeofday(&tv, NULL);
if (skipped)
fprintf(logfile, "%d " INT64_FORMAT " skipped %d %ld %ld",
- st->id, st->cnt, st->use_file,
+ st->id, getTotalCnt(st), st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
- else
+ else if (st->first_failure == NO_FAILURE)
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d %ld %ld",
- st->id, st->cnt, latency, st->use_file,
+ st->id, getTotalCnt(st), latency, st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+ else if (st->first_failure == IN_FAILED_SQL_TRANSACTION)
+ fprintf(logfile, "%d " INT64_FORMAT " in_failed_tx %d %ld %ld",
+ st->id, getTotalCnt(st), st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+ else
+ fprintf(logfile, "%d " INT64_FORMAT " failed %d %ld %ld",
+ st->id, getTotalCnt(st), st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
+
if (throttle_delay)
fprintf(logfile, " %.0f", lag);
+ if (max_tries > 1)
+ fprintf(logfile, " %d", st->retries);
fputc('\n', logfile);
}
}
@@ -3369,7 +3907,7 @@ processXactStats(TState *thread, CState *st, instr_time *now,
bool thread_details = progress || throttle_delay || latency_limit,
detailed = thread_details || use_log || per_script_stats;
- if (detailed && !skipped)
+ if (detailed && !skipped && st->first_failure == NO_FAILURE)
{
if (INSTR_TIME_IS_ZERO(*now))
INSTR_TIME_SET_CURRENT(*now);
@@ -3382,7 +3920,8 @@ processXactStats(TState *thread, CState *st, instr_time *now,
if (thread_details)
{
/* keep detailed thread stats */
- accumStats(&thread->stats, skipped, latency, lag);
+ accumStats(&thread->stats, skipped, latency, lag, st->first_failure,
+ st->retries);
/* count transactions over the latency limit, if needed */
if (latency_limit && latency > latency_limit)
@@ -3390,19 +3929,24 @@ processXactStats(TState *thread, CState *st, instr_time *now,
}
else
{
- /* no detailed stats, just count */
- thread->stats.cnt++;
+ /* no detailed stats */
+ accumStats(&thread->stats, skipped, 0, 0, st->first_failure,
+ st->retries);
}
/* client stat is just counting */
- st->cnt++;
+ if (st->first_failure == NO_FAILURE)
+ st->cnt++;
+ else
+ st->ecnt++;
if (use_log)
doLog(thread, st, agg, skipped, latency, lag);
/* XXX could use a mutex here, but we choose not to */
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+ st->first_failure, st->retries);
}
@@ -4535,7 +5079,8 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
double time_include,
tps_include,
tps_exclude;
- int64 ntx = total->cnt - total->skipped;
+ int64 ntx = total->cnt - total->skipped,
+ total_ntx = total->cnt + total->errors;
int i,
totalCacheOverflows = 0;
@@ -4556,8 +5101,8 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (duration <= 0)
{
printf("number of transactions per client: %d\n", nxacts);
- printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
- ntx, nxacts * nclients);
+ printf("number of transactions actually processed: " INT64_FORMAT "/" INT64_FORMAT "\n",
+ ntx, total_ntx);
}
else
{
@@ -4565,6 +5110,25 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printf("number of transactions actually processed: " INT64_FORMAT "\n",
ntx);
}
+
+ if (total->errors > 0)
+ printf("number of errors: " INT64_FORMAT " (%.3f%%)\n",
+ total->errors, 100.0 * total->errors / total_ntx);
+
+ if (total->errors_in_failed_tx > 0)
+ printf("number of errors \"in failed SQL transaction\": " INT64_FORMAT " (%.3f%%)\n",
+ total->errors_in_failed_tx,
+ 100.0 * total->errors_in_failed_tx / total_ntx);
+
+ /* it can be non-zero only if max_tries is greater than one */
+ if (total->retried > 0)
+ {
+ printf("number of retried transactions: " INT64_FORMAT " (%.3f%%)\n",
+ total->retried, 100.0 * total->retried / total_ntx);
+ printf("number of retries: " INT64_FORMAT "\n", total->retries);
+ }
+ printf("maximum number of tries: %d\n", max_tries);
+
/* Report zipfian cache overflow */
for (i = 0; i < nthreads; i++)
{
@@ -4614,7 +5178,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printf("tps = %f (excluding connections establishing)\n", tps_exclude);
/* Report per-script/command statistics */
- if (per_script_stats || is_latencies)
+ if (per_script_stats || report_per_command)
{
int i;
@@ -4623,6 +5187,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (per_script_stats)
{
StatsData *sstats = &sql_script[i].stats;
+ int64 script_total_ntx = sstats->cnt + sstats->errors;
printf("SQL script %d: %s\n"
" - weight: %d (targets %.1f%% of total)\n"
@@ -4631,9 +5196,30 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
sql_script[i].weight,
100.0 * sql_script[i].weight / total_weight,
sstats->cnt,
- 100.0 * sstats->cnt / total->cnt,
+ 100.0 * sstats->cnt / script_total_ntx,
(sstats->cnt - sstats->skipped) / time_include);
+ if (total->errors > 0)
+ printf(" - number of errors: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->errors,
+ 100.0 * sstats->errors / script_total_ntx);
+
+ if (total->errors_in_failed_tx > 0)
+ printf(" - number of errors \"in failed SQL transaction\": " INT64_FORMAT " (%.3f%%)\n",
+ sstats->errors_in_failed_tx,
+ (100.0 * sstats->errors_in_failed_tx /
+ script_total_ntx));
+
+ /* it can be non-zero only if max_tries is greater than one */
+ if (total->retried > 0)
+ {
+ printf(" - number of retried transactions: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->retried,
+ 100.0 * sstats->retried / script_total_ntx);
+ printf(" - number of retries: " INT64_FORMAT "\n",
+ sstats->retries);
+ }
+
if (throttle_delay && latency_limit && sstats->cnt > 0)
printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
sstats->skipped,
@@ -4642,15 +5228,33 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printSimpleStats(" - latency", &sstats->latency);
}
- /* Report per-command latencies */
- if (is_latencies)
+ /* Report per-command latencies and errors */
+ if (report_per_command)
{
Command **commands;
if (per_script_stats)
- printf(" - statement latencies in milliseconds:\n");
+ printf(" - statement latencies in milliseconds");
else
- printf("statement latencies in milliseconds:\n");
+ printf("statement latencies in milliseconds");
+
+ if (total->errors > 0)
+ {
+ printf("%s errors",
+ ((total->errors_in_failed_tx == 0 &&
+ total->retried == 0) ?
+ " and" : ","));
+ }
+ if (total->errors_in_failed_tx > 0)
+ {
+ printf("%s errors \"in failed SQL transaction\"",
+ total->retried == 0 ? " and" : ",");
+ }
+ if (total->retried > 0)
+ {
+ printf(" and retries");
+ }
+ printf(":\n");
for (commands = sql_script[i].commands;
*commands != NULL;
@@ -4658,10 +5262,25 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
{
SimpleStats *cstats = &(*commands)->stats;
- printf(" %11.3f %s\n",
+ printf(" %11.3f",
(cstats->count > 0) ?
- 1000.0 * cstats->sum / cstats->count : 0.0,
- (*commands)->line);
+ 1000.0 * cstats->sum / cstats->count : 0.0);
+ if (total->errors > 0)
+ {
+ printf(" %20" INT64_MODIFIER "d",
+ (*commands)->errors);
+ }
+ if (total->errors_in_failed_tx > 0)
+ {
+ printf(" %20" INT64_MODIFIER "d",
+ (*commands)->errors_in_failed_tx);
+ }
+ if (total->retried > 0)
+ {
+ printf(" %20" INT64_MODIFIER "d",
+ (*commands)->retries);
+ }
+ printf(" %s\n", (*commands)->line);
}
}
}
@@ -4720,7 +5339,7 @@ main(int argc, char **argv)
{"builtin", required_argument, NULL, 'b'},
{"client", required_argument, NULL, 'c'},
{"connect", no_argument, NULL, 'C'},
- {"debug", no_argument, NULL, 'd'},
+ {"debug", required_argument, NULL, 'd'},
{"define", required_argument, NULL, 'D'},
{"file", required_argument, NULL, 'f'},
{"fillfactor", required_argument, NULL, 'F'},
@@ -4735,7 +5354,7 @@ main(int argc, char **argv)
{"progress", required_argument, NULL, 'P'},
{"protocol", required_argument, NULL, 'M'},
{"quiet", no_argument, NULL, 'q'},
- {"report-latencies", no_argument, NULL, 'r'},
+ {"report-per-command", no_argument, NULL, 'r'},
{"rate", required_argument, NULL, 'R'},
{"scale", required_argument, NULL, 's'},
{"select-only", no_argument, NULL, 'S'},
@@ -4754,6 +5373,7 @@ main(int argc, char **argv)
{"log-prefix", required_argument, NULL, 7},
{"foreign-keys", no_argument, NULL, 8},
{"random-seed", required_argument, NULL, 9},
+ {"max-tries", required_argument, NULL, 10},
{NULL, 0, NULL, 0}
};
@@ -4825,7 +5445,7 @@ main(int argc, char **argv)
/* set random seed early, because it may be used while parsing scripts. */
set_random_seed(getenv("PGBENCH_RANDOM_SEED"), "PGBENCH_RANDOM_SEED environment variable");
- while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
+ while ((c = getopt_long(argc, argv, "iI:h:nvp:d:qb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
{
char *script;
@@ -4855,8 +5475,22 @@ main(int argc, char **argv)
pgport = pg_strdup(optarg);
break;
case 'd':
- debug++;
- break;
+ {
+ for (debug_level = 0;
+ debug_level < NUM_DEBUGLEVEL;
+ debug_level++)
+ {
+ if (strcmp(optarg, DEBUGLEVEl[debug_level]) == 0)
+ break;
+ }
+ if (debug_level >= NUM_DEBUGLEVEL)
+ {
+ fprintf(stderr, "invalid debug level (-d): \"%s\"\n",
+ optarg);
+ exit(1);
+ }
+ break;
+ }
case 'c':
benchmarking_option_set = true;
nclients = atoi(optarg);
@@ -4908,7 +5542,7 @@ main(int argc, char **argv)
break;
case 'r':
benchmarking_option_set = true;
- is_latencies = true;
+ report_per_command = true;
break;
case 's':
scale_given = true;
@@ -4989,7 +5623,7 @@ main(int argc, char **argv)
}
*p++ = '\0';
- if (!putVariable(&state[0], "option", optarg, p))
+ if (!putVariable(&state[0].variables, "option", optarg, p))
exit(1);
}
break;
@@ -5101,6 +5735,16 @@ main(int argc, char **argv)
benchmarking_option_set = true;
set_random_seed(optarg, "--random-seed option");
break;
+ case 10: /* max-tries */
+ benchmarking_option_set = true;
+ max_tries = atoi(optarg);
+ if (max_tries <= 0)
+ {
+ fprintf(stderr, "invalid maximum number of tries: \"%s\"\n",
+ optarg);
+ exit(1);
+ }
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
@@ -5287,19 +5931,19 @@ main(int argc, char **argv)
int j;
state[i].id = i;
- for (j = 0; j < state[0].nvariables; j++)
+ for (j = 0; j < state[0].variables.nvariables; j++)
{
- Variable *var = &state[0].variables[j];
+ Variable *var = &state[0].variables.array[j];
if (var->value.type != PGBT_NO_VALUE)
{
- if (!putVariableValue(&state[i], "startup",
- var->name, &var->value))
+ if (!putVariableValue(&state[i].variables, "startup",
+ var->name, &var->value, true))
exit(1);
}
else
{
- if (!putVariable(&state[i], "startup",
+ if (!putVariable(&state[i].variables, "startup",
var->name, var->svalue))
exit(1);
}
@@ -5311,9 +5955,10 @@ main(int argc, char **argv)
for (i = 0; i < nclients; i++)
{
state[i].cstack = conditional_stack_create();
+ initRandomState(&state[i].random_state);
}
- if (debug)
+ if (debug_level >= DEBUG_ALL)
{
if (duration <= 0)
printf("pghost: %s pgport: %s nclients: %d nxacts: %d dbName: %s\n",
@@ -5374,11 +6019,12 @@ main(int argc, char **argv)
* :scale variables normally get -s or database scale, but don't override
* an explicit -D switch
*/
- if (lookupVariable(&state[0], "scale") == NULL)
+ if (lookupVariable(&state[0].variables, "scale") == NULL)
{
for (i = 0; i < nclients; i++)
{
- if (!putVariableInt(&state[i], "startup", "scale", scale))
+ if (!putVariableInt(&state[i].variables, "startup", "scale", scale,
+ true))
exit(1);
}
}
@@ -5387,15 +6033,18 @@ main(int argc, char **argv)
* Define a :client_id variable that is unique per connection. But don't
* override an explicit -D switch.
*/
- if (lookupVariable(&state[0], "client_id") == NULL)
+ if (lookupVariable(&state[0].variables, "client_id") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "client_id", i))
+ {
+ if (!putVariableInt(&state[i].variables, "startup", "client_id", i,
+ true))
exit(1);
+ }
}
/* set default seed for hash functions */
- if (lookupVariable(&state[0], "default_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "default_seed") == NULL)
{
uint64 seed = ((uint64) (random() & 0xFFFF) << 48) |
((uint64) (random() & 0xFFFF) << 32) |
@@ -5403,15 +6052,17 @@ main(int argc, char **argv)
(uint64) (random() & 0xFFFF);
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "default_seed", (int64) seed))
+ if (!putVariableInt(&state[i].variables, "startup", "default_seed",
+ (int64) seed, true))
exit(1);
}
/* set random seed unless overwritten */
- if (lookupVariable(&state[0], "random_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "random_seed") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+ if (!putVariableInt(&state[i].variables, "startup", "random_seed",
+ random_seed, true))
exit(1);
}
@@ -5444,9 +6095,7 @@ main(int argc, char **argv)
thread->state = &state[nclients_dealt];
thread->nstate =
(nclients - nclients_dealt + nthreads - i - 1) / (nthreads - i);
- thread->random_state[0] = random();
- thread->random_state[1] = random();
- thread->random_state[2] = random();
+ initRandomState(&thread->random_state);
thread->logfile = NULL; /* filled in later */
thread->latency_late = 0;
thread->zipf_cache.nb_cells = 0;
@@ -5528,6 +6177,10 @@ main(int argc, char **argv)
mergeSimpleStats(&stats.lag, &thread->stats.lag);
stats.cnt += thread->stats.cnt;
stats.skipped += thread->stats.skipped;
+ stats.retries += thread->stats.retries;
+ stats.retried += thread->stats.retried;
+ stats.errors += thread->stats.errors;
+ stats.errors_in_failed_tx += thread->stats.errors_in_failed_tx;
latency_late += thread->latency_late;
INSTR_TIME_ADD(conn_total_time, thread->conn_time);
}
@@ -5812,7 +6465,11 @@ threadRun(void *arg)
/* generate and show report */
StatsData cur;
int64 run = now - last_report,
- ntx;
+ ntx,
+ retries,
+ retried,
+ errors,
+ errors_in_failed_tx;
double tps,
total_run,
latency,
@@ -5839,6 +6496,11 @@ threadRun(void *arg)
mergeSimpleStats(&cur.lag, &thread[i].stats.lag);
cur.cnt += thread[i].stats.cnt;
cur.skipped += thread[i].stats.skipped;
+ cur.retries += thread[i].stats.retries;
+ cur.retried += thread[i].stats.retried;
+ cur.errors += thread[i].stats.errors;
+ cur.errors_in_failed_tx +=
+ thread[i].stats.errors_in_failed_tx;
}
/* we count only actually executed transactions */
@@ -5856,6 +6518,11 @@ threadRun(void *arg)
{
latency = sqlat = stdev = lag = 0;
}
+ retries = cur.retries - last.retries;
+ retried = cur.retried - last.retried;
+ errors = cur.errors - last.errors;
+ errors_in_failed_tx = cur.errors_in_failed_tx -
+ last.errors_in_failed_tx;
if (progress_timestamp)
{
@@ -5881,6 +6548,14 @@ threadRun(void *arg)
"progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
tbuf, tps, latency, stdev);
+ if (errors > 0)
+ {
+ fprintf(stderr, ", " INT64_FORMAT " failed", errors);
+ if (errors_in_failed_tx > 0)
+ fprintf(stderr, " (" INT64_FORMAT " in failed tx)",
+ errors_in_failed_tx);
+ }
+
if (throttle_delay)
{
fprintf(stderr, ", lag %.3f ms", lag);
@@ -5888,6 +6563,11 @@ threadRun(void *arg)
fprintf(stderr, ", " INT64_FORMAT " skipped",
cur.skipped - last.skipped);
}
+
+ /* it can be non-zero only if max_tries is greater than one */
+ if (retried > 0)
+ fprintf(stderr, ", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
+ retried, retries);
fprintf(stderr, "\n");
last = cur;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 0929418..4ce7786 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -118,7 +118,8 @@ pgbench(
[ qr{builtin: TPC-B},
qr{clients: 2\b},
qr{processed: 10/10},
- qr{mode: simple} ],
+ qr{mode: simple},
+ qr{maximum number of tries: 1} ],
[qr{^$}],
'pgbench tpcb-like');
@@ -134,7 +135,7 @@ pgbench(
'pgbench simple update');
pgbench(
- '-t 100 -c 7 -M prepared -b se --debug',
+ '-t 100 -c 7 -M prepared -b se --debug all',
0,
[ qr{builtin: select only},
qr{clients: 7\b},
@@ -491,6 +492,10 @@ my @errors = (
\set i 0
SELECT LEAST(:i, :i, :i, :i, :i, :i, :i, :i, :i, :i, :i);
} ],
+ [ 'sql division by zero', 0, [qr{ERROR: division by zero}],
+ q{-- SQL division by zero
+ SELECT 1/0;
+} ],
# SHELL
[ 'shell bad command', 0,
@@ -621,6 +626,16 @@ SELECT LEAST(:i, :i, :i, :i, :i, :i, :i, :i, :i, :i, :i);
[ 'sleep unknown unit', 1,
[qr{unrecognized time unit}], q{\sleep 1 week} ],
+ # CONDITIONAL BLOCKS
+ [ 'if elif failed conditions', 0,
+ [qr{division by zero}],
+ q{-- failed conditions
+\if 1 / 0
+\elif 1 / 0
+\else
+\endif
+} ],
+
# MISC
[ 'misc invalid backslash command', 1,
[qr{invalid command .* "nosuchcommand"}], q{\nosuchcommand} ],
@@ -635,9 +650,11 @@ for my $e (@errors)
my $n = '001_pgbench_error_' . $name;
$n =~ s/ /_/g;
pgbench(
- '-n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX -M prepared',
+ '-n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX -M prepared -d fails',
$status,
- [ $status ? qr{^$} : qr{processed: 0/1} ],
+ ($status ?
+ [ qr{^$} ] :
+ [ qr{processed: 0/1}, qr{number of errors: 1 \(100.000%\)} ]),
$re,
'pgbench script error: ' . $name,
{ $n => $script });
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 682bc22..a0c227e 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -57,7 +57,7 @@ my @options = (
# name, options, stderr checks
[ 'bad option',
- '-h home -p 5432 -U calvin -d --bad-option',
+ '-h home -p 5432 -U calvin -d all --bad-option',
[ qr{(unrecognized|illegal) option}, qr{--help.*more information} ] ],
[ 'no file',
'-f no-such-file',
diff --git a/src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl b/src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl
new file mode 100644
index 0000000..da9d7da
--- /dev/null
+++ b/src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl
@@ -0,0 +1,815 @@
+use strict;
+use warnings;
+
+use Config;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 21;
+
+use constant
+{
+ READ_COMMITTED => 0,
+ REPEATABLE_READ => 1,
+ SERIALIZABLE => 2,
+};
+
+my @isolation_level_shell = (
+ 'read\\ committed',
+ 'repeatable\\ read',
+ 'serializable');
+
+# The keys of advisory locks for testing deadlock failures:
+use constant
+{
+ DEADLOCK_1 => 3,
+ WAIT_PGBENCH_2 => 4,
+ DEADLOCK_2 => 5,
+ TRANSACTION_ENDS_1 => 6,
+ TRANSACTION_ENDS_2 => 7,
+};
+
+# Test concurrent update of a table row.
+my $node = get_new_node('main');
+$node->init;
+$node->start;
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2), (2, 3);');
+
+my $script_serialization = $node->basedir . '/pgbench_script_serialization';
+append_to_file($script_serialization,
+ "\\set delta random(-5000, 5000)\n"
+ . "BEGIN;\n"
+ . "UPDATE xy SET y = y + :delta "
+ . "WHERE x = 1 AND pg_advisory_lock(0) IS NOT NULL;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "END;\n");
+
+my $script_deadlocks1 = $node->basedir . '/pgbench_script_deadlocks1';
+append_to_file($script_deadlocks1,
+ "BEGIN;\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_1 . ");\n"
+ . "SELECT pg_advisory_lock(" . WAIT_PGBENCH_2 . ");\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_2 . ");\n"
+ . "END;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "SELECT pg_advisory_lock(" . TRANSACTION_ENDS_1 . ");\n"
+ . "SELECT pg_advisory_unlock_all();");
+
+my $script_deadlocks2 = $node->basedir . '/pgbench_script_deadlocks2';
+append_to_file($script_deadlocks2,
+ "BEGIN;\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_2 . ");\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_1 . ");\n"
+ . "END;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "SELECT pg_advisory_lock(" . TRANSACTION_ENDS_2 . ");\n"
+ . "SELECT pg_advisory_unlock_all();");
+
+my $script_commit_failure = $node->basedir . '/pgbench_script_commit_failure';
+append_to_file($script_commit_failure,
+ "\\set delta random(-5000, 5000)\n"
+ . "BEGIN;\n"
+ . "UPDATE xy SET y = y + :delta WHERE x = 1;\n"
+ . "SELECT pg_advisory_lock(0);\n"
+ . "END;\n"
+ . "SELECT pg_advisory_unlock_all();");
+
+sub test_pgbench_serialization_errors
+{
+ my $isolation_level = REPEATABLE_READ;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $err_pgbench);
+
+ # Open a psql session, run a parallel transaction and acquire an advisory
+ # lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql = "begin;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /BEGIN/;
+
+ $in_psql =
+ "update xy set y = y + 1 "
+ . "where x = 1 and pg_advisory_lock(0) is not null;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /UPDATE 1/;
+
+ # Start pgbench:
+ my @command = (
+ qw(pgbench --no-vacuum --transactions 1 --debug fails --file),
+ $script_serialization);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$err_pgbench;
+
+ # Wait until pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select * from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = 0::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /1 row/);
+
+ # In psql, commit the transaction, release advisory locks and end the
+ # session:
+ $in_psql = "end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "select pg_advisory_unlock_all();\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+
+ like($out_pgbench,
+ qr{processed: 0/1},
+ "concurrent update: check processed transactions");
+
+ my $pattern =
+ "client 0 got an error in command 2 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update";
+
+ like($err_pgbench,
+ qr{$pattern},
+ "concurrent update: check serialization error");
+}
+
+sub test_pgbench_serialization_failures
+{
+ my $isolation_level = REPEATABLE_READ;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $err_pgbench);
+
+ # Open a psql session, run a parallel transaction and acquire an advisory
+ # lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql = "begin;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /BEGIN/;
+
+ $in_psql =
+ "update xy set y = y + 1 "
+ . "where x = 1 and pg_advisory_lock(0) is not null;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /UPDATE 1/;
+
+ # Start pgbench:
+ my @command = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --max-tries 2),
+ "--file",
+ $script_serialization);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$err_pgbench;
+
+ # Wait until pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select * from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = 0::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /1 row/);
+
+ # In psql, commit the transaction, release advisory locks and end the
+ # session:
+ $in_psql = "end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "select pg_advisory_unlock_all();\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+
+ like($out_pgbench,
+ qr{processed: 1/1},
+ "concurrent update with retrying: check processed transactions");
+
+ like($out_pgbench,
+ qr{^((?!number of errors)(.|\n))*$},
+ "concurrent update with retrying: check errors");
+
+ my $pattern =
+ "client 0 sending BEGIN;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 sending UPDATE xy SET y = y \\+ (-?\\d+) "
+ . "WHERE x = 1 AND pg_advisory_lock\\(0\\) IS NOT NULL;\n"
+ . "\\g1+"
+ . "client 0 got a failure \\(try 1/2\\) in command 2 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update\n\n"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 continues a failed transaction in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: current transaction is aborted, commands ignored until end of transaction block\n\n"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 executing \\\\set delta\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g1+"
+ . "client 0 sending UPDATE xy SET y = y \\+ \\g2 "
+ . "WHERE x = 1 AND pg_advisory_lock\\(0\\) IS NOT NULL;";
+
+ like($err_pgbench,
+ qr{$pattern},
+ "concurrent update with retrying: check the retried transaction");
+}
+
+sub test_pgbench_deadlock_errors
+{
+ my $isolation_level = READ_COMMITTED;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h1, $in1, $out1, $err1);
+ my ($h2, $in2, $out2, $err2);
+
+ # Open a psql session and acquire an advisory lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql =
+ "select pg_advisory_lock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_lock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ # Run the first pgbench:
+ my @command1 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --file),
+ $script_deadlocks1);
+ print "# Running: " . join(" ", @command1) . "\n";
+ $h1 = IPC::Run::start \@command1, \$in1, \$out1, \$err1;
+
+ # Wait until the first pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . WAIT_PGBENCH_2 . "_zero' "
+ . "else '" . WAIT_PGBENCH_2 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . WAIT_PGBENCH_2
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ WAIT_PGBENCH_2 ]}_not_zero/);
+
+ # Run the second pgbench:
+ my @command2 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --file),
+ $script_deadlocks2);
+ print "# Running: " . join(" ", @command2) . "\n";
+ $h2 = IPC::Run::start \@command2, \$in2, \$out2, \$err2;
+
+ # Wait until the second pgbench tries to acquire the lock held by the first
+ # pgbench:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . DEADLOCK_1 . "_zero' "
+ . "else '" . DEADLOCK_1 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . DEADLOCK_1
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ DEADLOCK_1 ]}_not_zero/);
+
+ # In the psql session, release the lock that the first pgbench is waiting
+ # for and end the session:
+ $in_psql =
+ "select pg_advisory_unlock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_unlock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get results from all pgbenches:
+ $h1->pump() until length $out1;
+ $h1->finish();
+
+ $h2->pump() until length $out2;
+ $h2->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result1 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h1->full_results)[0]
+ : $h1->result(0);
+
+ my $result2 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h2->full_results)[0]
+ : $h2->result(0);
+
+ # Check all pgbench results
+ ok(!$result1, "@command1 exit code 0");
+ ok(!$result2, "@command2 exit code 0");
+
+ # The first or second pgbench should get a deadlock error
+ ok(($out1 =~ /processed: 0\/1/ or $out2 =~ /processed: 0\/1/),
+ "concurrent deadlock update: check processed transactions");
+
+ ok(
+ ($err1 =~ /client 0 got an error in command 3 \(SQL\) of script 0; ERROR: deadlock detected/ or
+ $err2 =~ /client 0 got an error in command 2 \(SQL\) of script 0; ERROR: deadlock detected/),
+ "concurrent deadlock update: check deadlock error");
+}
+
+sub test_pgbench_deadlock_failures
+{
+ my $isolation_level = READ_COMMITTED;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h1, $in1, $out1, $err1);
+ my ($h2, $in2, $out2, $err2);
+
+ # Open a psql session and acquire an advisory lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql =
+ "select pg_advisory_lock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_lock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ # Run the first pgbench:
+ my @command1 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --max-tries 2),
+ "--file",
+ $script_deadlocks1);
+ print "# Running: " . join(" ", @command1) . "\n";
+ $h1 = IPC::Run::start \@command1, \$in1, \$out1, \$err1;
+
+ # Wait until the first pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . WAIT_PGBENCH_2 . "_zero' "
+ . "else '" . WAIT_PGBENCH_2 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . WAIT_PGBENCH_2
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ WAIT_PGBENCH_2 ]}_not_zero/);
+
+ # Run the second pgbench:
+ my @command2 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --max-tries 2),
+ "--file",
+ $script_deadlocks2);
+ print "# Running: " . join(" ", @command2) . "\n";
+ $h2 = IPC::Run::start \@command2, \$in2, \$out2, \$err2;
+
+ # Wait until the second pgbench tries to acquire the lock held by the first
+ # pgbench:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . DEADLOCK_1 . "_zero' "
+ . "else '" . DEADLOCK_1 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . DEADLOCK_1
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ DEADLOCK_1 ]}_not_zero/);
+
+ # In the psql session, acquire the locks that pgbenches will wait for:
+ $in_psql =
+ "select pg_advisory_lock(" . TRANSACTION_ENDS_1 . ") "
+ . "as pg_advisory_lock_" . TRANSACTION_ENDS_1 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ TRANSACTION_ENDS_1 ]}/;
+
+ $in_psql =
+ "select pg_advisory_lock(" . TRANSACTION_ENDS_2 . ") "
+ . "as pg_advisory_lock_" . TRANSACTION_ENDS_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ TRANSACTION_ENDS_2 ]}/;
+
+ # In the psql session, release the lock that the first pgbench is waiting
+ # for:
+ $in_psql =
+ "select pg_advisory_unlock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_unlock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ # Wait until pgbenches try to acquire the locks held by the psql session:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . TRANSACTION_ENDS_1 . "_zero' "
+ . "else '" . TRANSACTION_ENDS_1 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . TRANSACTION_ENDS_1
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ TRANSACTION_ENDS_1 ]}_not_zero/);
+
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . TRANSACTION_ENDS_2 . "_zero' "
+ . "else '" . TRANSACTION_ENDS_2 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . TRANSACTION_ENDS_2
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ TRANSACTION_ENDS_2 ]}_not_zero/);
+
+ # In the psql session, release advisory locks and end the session:
+ $in_psql = "select pg_advisory_unlock_all() as pg_advisory_unlock_all;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get results from all pgbenches:
+ $h1->pump() until length $out1;
+ $h1->finish();
+
+ $h2->pump() until length $out2;
+ $h2->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result1 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h1->full_results)[0]
+ : $h1->result(0);
+
+ my $result2 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h2->full_results)[0]
+ : $h2->result(0);
+
+ # Check all pgbench results
+ ok(!$result1, "@command1 exit code 0");
+ ok(!$result2, "@command2 exit code 0");
+
+ like($out1,
+ qr{processed: 1/1},
+ "concurrent deadlock update with retrying: pgbench 1: "
+ . "check processed transactions");
+ like($out2,
+ qr{processed: 1/1},
+ "concurrent deadlock update with retrying: pgbench 2: "
+ . "check processed transactions");
+
+ # The first or second pgbench should get a deadlock error that was retried:
+ like($out1 . $out2,
+ qr{^((?!number of errors)(.|\n))*$},
+ "concurrent deadlock update with retrying: check errors");
+
+ my $pattern1 =
+ "client 0 sending BEGIN;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . WAIT_PGBENCH_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 got a failure \\(try 1/2\\) in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\n"
+ . "((?!client 0)(.|\n))*"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . WAIT_PGBENCH_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+";
+
+ my $pattern2 =
+ "client 0 sending BEGIN;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 got a failure \\(try 1/2\\) in command 2 \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\n"
+ . "((?!client 0)(.|\n))*"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+";
+
+ ok(($err1 =~ /$pattern1/ or $err2 =~ /$pattern2/),
+ "concurrent deadlock update with retrying: "
+ . "check the retried transaction");
+}
+
+sub test_pgbench_commit_failure
+{
+ my $isolation_level = SERIALIZABLE;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $err_pgbench);
+
+ # Open a psql session and acquire an advisory lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql = "select pg_advisory_lock(0);\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock/;
+
+ # Start pgbench:
+ my @command = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --max-tries 2),
+ "--file",
+ $script_commit_failure);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$err_pgbench;
+
+ # Wait until pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select case count(*) when 0 then 'zero' else 'not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = 0::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /not_zero/);
+
+ # In psql, run a parallel transaction, release advisory locks and end the
+ # session:
+
+ $in_psql = "begin\;update xy set y = y + 1 where x = 2\;end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "select pg_advisory_unlock_all();\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+
+ like($out_pgbench,
+ qr{processed: 1/1},
+ "commit failure: check processed transactions");
+
+ like($out_pgbench,
+ qr{^((?!number of errors)(.|\n))*$},
+ "commit failure: check errors");
+
+ my $pattern =
+ "client 0 sending BEGIN;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 sending UPDATE xy SET y = y \\+ (-?\\d+) WHERE x = 1;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(0\\);\n"
+ . "\\g1+"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 got a failure \\(try 1/2\\) in command 4 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to read/write dependencies among transactions\n"
+ . "DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.\n"
+ . "((?!client 0)(.|\n))*"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 executing \\\\set delta\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g1+"
+ . "client 0 sending UPDATE xy SET y = y \\+ \\g2 WHERE x = 1;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(0\\);\n"
+ . "\\g1+"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+";
+
+ like($err_pgbench,
+ qr{$pattern},
+ "commit failure: "
+ . "check the completion of the failed transaction block");
+}
+
+test_pgbench_serialization_errors();
+test_pgbench_serialization_failures();
+
+test_pgbench_deadlock_errors();
+test_pgbench_deadlock_failures();
+
+test_pgbench_commit_failure();
+
+#done
+$node->stop;
--
2.7.4
The conception of the max-retry option seems strange to me. If the number of
retries reaches the max-retry limit, we just increment the counter of failed
transactions and try again (possibly with different random numbers). At the end
we should distinguish the number of errored transactions from failed ones, and
to find this difference the documentation suggests rerunning pgbench with
debugging on.
Maybe I didn't catch the idea, but it seems to me max-tries should be removed.
On a transaction serialization or deadlock error, pgbench should increment the
counter of failed transactions, reset the conditional stack, variables, etc.
(but not the random generator) and then start a new transaction from the first
line of the script.
Marina Polyakova wrote:
On 26-03-2018 18:53, Fabien COELHO wrote:
Hello Marina,
Hello!
Many thanks to both of you! I'm working on a patch in this direction..
I think that the best approach for now is simply to reset (command
zero, random generator) and start over the whole script, without
attempting to be more intelligent. The limitations should be clearly
documented (one transaction per script), though. That would be a
significant enhancement already.

I'm not sure that we can always do this, because we can get new errors until
we finish the failed transaction block, and we need to destroy the conditional
stack..

Sure. I'm suggesting, so as to simplify, that on failures the retry
would always restart from the beginning of the script by resetting
everything, indeed including the conditional stack, the random
generator state, the variable values, and so on.

This means enforcing somehow that one script is one transaction.
If the user does not do that, it would be their decision and the
result becomes unpredictable on errors (eg some sub-transactions could
be executed more than once). Then if more is needed, that could be for another patch.
Here is the fifth version of the patch for pgbench (based on the commit
4b9094eb6e14dfdbed61278ea8e51cc846e43579) where I tried to implement these
ideas, thanks to your comments and those of Teodor Sigaev. Since we may need to
execute commands to complete a failed transaction block, the script is now
always executed completely. If there is a serialization/deadlock failure which
can be retried, the script is executed again with the same random state and
array of variables as before its first run. Meta command errors, as well as all
SQL errors, do not cause the client to abort. The first failure in the
current script execution determines whether the script run will be retried or
not, so only such failures (they have a retry) or errors (they are not retried)
are reported.

I tried to make fixes in accordance with your previous reviews ([1], [2], [3]):
I'm unclear about the example added in the documentation. There
are 71% errors, but 100% of transactions are reported as processed. If
there were errors, then it is not a success, so the transactions were not
processed? To me it looks inconsistent. Also, while testing, it seems that
failed transactions are counted in tps, which I think is not appropriate:

About the feature:

 sh> PGOPTIONS='-c default_transaction_isolation=serializable' \
       ./pgbench -P 1 -T 3 -r -M prepared -j 2 -c 4
 starting vacuum...end.
 progress: 1.0 s, 10845.8 tps, lat 0.091 ms stddev 0.491, 10474 failed
 # NOT 10845.8 TPS...
 progress: 2.0 s, 10534.6 tps, lat 0.094 ms stddev 0.658, 10203 failed
 progress: 3.0 s, 10643.4 tps, lat 0.096 ms stddev 0.568, 10290 failed
 ...
 number of transactions actually processed: 32028 # NO!
 number of errors: 30969 (96.694 %)
 latency average = 2.833 ms
 latency stddev = 1.508 ms
 tps = 10666.720870 (including connections establishing) # NO
 tps = 10683.034369 (excluding connections establishing) # NO
 ...

For me this is all wrong. I think that the tps report is about
transactions that succeeded, not mere attempts. I cannot say that a
transaction which aborted was "actually processed"... as it was not.

Fixed
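Fabien's point above is as much arithmetic as semantics: the throughput figure should be derived from successful transactions only. A minimal sketch (illustrative, not pgbench's actual code) of what the corrected computation amounts to:

```c
/* Derive tps from successes only: failed attempts within the interval
 * must not inflate the reported throughput. */
static double
successful_tps(long attempted, long failed, double seconds)
{
	if (seconds <= 0.0)
		return 0.0;
	return (double) (attempted - failed) / seconds;
}
```

For the first progress line quoted above (about 10845 attempts with 10474 failures in one second), this would report roughly 372 tps rather than 10845.8.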
The order of reported elements is not logical:

 maximum number of transaction tries: 100
 scaling factor: 10
 query mode: prepared
 number of clients: 4
 number of threads: 2
 duration: 3 s
 number of transactions actually processed: 967
 number of errors: 152 (15.719 %)
 latency average = 9.630 ms
 latency stddev = 13.366 ms
 number of transactions retried: 623 (64.426 %)
 number of retries: 32272

I would suggest grouping everything about error handling in one block,
eg something like:

 scaling factor: 10
 query mode: prepared
 number of clients: 4
 number of threads: 2
 duration: 3 s
 number of transactions actually processed: 967
 number of errors: 152 (15.719 %)
 number of transactions retried: 623 (64.426 %)
 number of retries: 32272
 maximum number of transaction tries: 100
 latency average = 9.630 ms
 latency stddev = 13.366 ms

Fixed

Also, the percent character should be stuck to its number: 15.719% to make
the style more homogeneous (although there seem to be pre-existing
inhomogeneities).

I would replace "transaction tries/retried" by "tries/retried"; everything
is about transactions in the report anyway.

Without reading the documentation, the overall report semantics is unclear,
especially given the absurd tps results I got with my first attempt,
as failing transactions are counted as "processed".

Fixed
About the code:

I'm at a loss with the 7 states added to the automaton, where I would have
hoped that only 2 (eg RETRY & FAIL, or even fewer) would be enough.

Fixed

I'm wondering whether the whole feature could be simplified by
considering that one script is one "transaction" (it is from the
report point of view at least), and that any retry is for the full
script only, from its beginning. That would remove the trying to guess
at transaction begin or end, avoid scanning manually for subcommands,
and so on.
 - Would it make sense?
 - Would it be ok for your use case?

Fixed

The proposed version of the code looks unmaintainable to me. There are
3 levels of nested "switch/case" with state changes at the deepest level.
I cannot even see it on my screen, which is not wide enough.

Fixed
There should be a typedef for "random_state", eg something like:

  typedef struct { unsigned short data[3]; } RandomState;

Please keep "const" declarations, eg "commandFailed".

I think that choosing a script should depend on the thread random state, not
the client random state, so that a run would generate the same pattern per
thread, independently of which client finishes first.

I'm sceptical of the "--debug-fails" option. ISTM that --debug is already
there and should just be reused.

Fixed

I agree that function naming style is already a mess, but I think that
new functions you add should use a common style, eg "is_compound" vs
"canRetry".

Fixed

Translating error strings to their enum should be put in a function.

Removed

I'm not sure this whole thing should be done anyway.

The processing of compound commands is removed.

The "node" is started but never stopped.

Fixed

For file contents, maybe the << 'EOF' here-document syntax would help
instead of using concatenated backslashed strings everywhere.

I'm sorry, but I could not get it to work with regular expressions :(
I'd start by stating (i.e. documenting) that the feature assumes that one
script is just *one* transaction.

Note that pgbench somehow already assumes that one script is one
transaction when it reports performance anyway. If you want 2 transactions,
then you have to put them in two scripts, which looks fine with me.
Different transactions are expected to be independent, otherwise they
should be merged into one transaction.

Fixed

Under these restrictions, ISTM that a retry is something like:

   case ABORTED:
       if (we want to retry) {
           // do necessary stats
           // reset the initial state (random, vars, current command)
           state = START_TX; // loop
       }
       else {
           // count as failed...
           state = FINISHED; // or done.
       }
       break;

...

I'm fine with having END_COMMAND skip to START_TX if it can be done
easily and cleanly, esp without code duplication.

I did not want to add additional if-expressions to most of the code in
CSTATE_START_TX/CSTATE_END_TX/CSTATE_END_COMMAND, so CSTATE_FAILURE is used
instead of CSTATE_END_COMMAND in case of failure, and CSTATE_RETRY is entered
before CSTATE_END_TX if there was a failure during the current script
execution.

ISTM that ABORTED & FINISHED are currently exactly the same. That would
put a particular use to aborted. Also, there are many points where the
code may go to the "aborted" state, so reusing it could help avoid
duplicating stuff on each abort decision.

To end and roll back the failed transaction block, the script is always
executed completely, and after the failure the following script command is
executed.

[1]
/messages/by-id/alpine.DEB.2.20.1801031720270.20034@lancre
[2]
/messages/by-id/alpine.DEB.2.20.1801121309300.10810@lancre
[3]
/messages/by-id/alpine.DEB.2.20.1801121607310.13422@lancre
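The reset-and-replay idea that recurs in this exchange can be sketched as follows; this is a hypothetical reduction with illustrative names, not the patch's actual CState members:

```c
#include <string.h>

/* Hypothetical client state, reduced to the pieces the thread discusses
 * resetting on retry. */
typedef struct
{
	unsigned short random_state[3];	/* per-client PRNG state */
	int		command;				/* current command in the script */
	int		retries;				/* retries of the current transaction */
} Client;

/* Before the first try, remember the random state the script started with. */
static void
remember_start_state(const Client *st, unsigned short saved_random[3])
{
	memcpy(saved_random, st->random_state, sizeof(st->random_state));
}

/* On retry: restart from command zero with the same random state, so the
 * retried run replays the same "intended" transaction.  (The variables and
 * the conditional stack would be restored the same way.) */
static void
reset_for_retry(Client *st, const unsigned short saved_random[3])
{
	memcpy(st->random_state, saved_random, sizeof(st->random_state));
	st->command = 0;
	st->retries++;
}
```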
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
The conception of the max-retry option seems strange to me. If the number of
retries reaches the max-retry limit, we just increment the counter of failed
transactions and try again (possibly with different random numbers). At the
end we should distinguish the number of errored transactions from failed ones,
and to find this difference the documentation suggests rerunning pgbench with
debugging on.

Maybe I didn't catch the idea, but it seems to me max-tries should be
removed. On a transaction serialization or deadlock error, pgbench should
increment the counter of failed transactions, reset the conditional stack,
variables, etc. (but not the random generator) and then start a new
transaction from the first line of the script.
ISTM that the idea is that the client application should give up
at some point and report an error to the end user, kind of a "timeout" on
trying, and that max-retry would implement this logic of giving up: the
transaction which was intended, represented by a given initial random
generator state, could not be committed even after some iterations.
Maybe the max retry should rather be expressed in time than in number
of attempts, or both approaches could be implemented? But there is a logic
of retrying the same thing (try again what the client wanted) vs retrying
something different (another client need is served).
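Fabien's suggestion of a time budget alongside (or instead of) a try budget could look like the sketch below; the parameter names and the zero-means-unlimited convention are assumptions made for illustration, not anything from the patch:

```c
#include <stdbool.h>

/* Give up retrying after a fixed number of tries or after a time budget,
 * whichever limit is hit first; 0 means "no limit of that kind". */
static bool
can_retry(int tries_done, int max_tries, long elapsed_us, long max_time_us)
{
	if (max_tries > 0 && tries_done >= max_tries)
		return false;
	if (max_time_us > 0 && elapsed_us >= max_time_us)
		return false;
	return true;
}
```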
--
Fabien.
On 29-03-2018 22:39, Fabien COELHO wrote:
The conception of the max-retry option seems strange to me. If the number of
retries reaches the max-retry limit, we just increment the counter of
failed transactions and try again (possibly with different random
numbers).
Then the client starts another script, but by chance or by the number of
scripts it can be the same.
At the end we should distinguish the number of errored transactions from
failed ones, and to find this difference the documentation suggests
rerunning pgbench with debugging on.
If I understood you correctly, this difference is the total number of
retries and this is included in all reports.
Maybe I didn't catch the idea, but it seems to me max-tries should be
removed. On a transaction serialization or deadlock error, pgbench
should increment the counter of failed transactions, reset the conditional
stack, variables, etc. (but not the random generator) and then start a new
transaction from the first line of the script.
When I sent the first version of the patch there were only rollbacks,
and the idea to retry failed transactions was approved (see [1], [2],
[3]) ...
variables in case of errors too, and not only in case of retries (see
attached, it is based on the commit
3da7502cd00ddf8228c9a4a7e4a08725decff99c).
ISTM that the idea is that the client application should give
up at some point and report an error to the end user, kind of a
"timeout" on trying, and that max-retry would implement this logic of
giving up: the transaction which was intended, represented by a given
initial random generator state, could not be committed even after
some iterations.

Maybe the max retry should rather be expressed in time than in
number of attempts, or both approaches could be implemented? But there
is a logic of retrying the same thing (try again what the client wanted) vs
retrying something different (another client need is served).
I'm afraid that we will have a problem in debugging mode: should we
report a failure (which will be retried) or an error (which will not be
retried)? Because only after executing the following script commands (to
roll back this transaction block) will we know the time we spent on
the execution of the current script..
[1]: /messages/by-id/CACjxUsOfbn72EaH4i_OuzdY-0PUYfg1Y3o8G27tEA8fJOaPQEw@mail.gmail.com
/messages/by-id/CACjxUsOfbn72EaH4i_OuzdY-0PUYfg1Y3o8G27tEA8fJOaPQEw@mail.gmail.com
[2]: /messages/by-id/20170615211806.sfkpiy2acoavpovl@alvherre.pgsql
/messages/by-id/20170615211806.sfkpiy2acoavpovl@alvherre.pgsql
[3]: /messages/by-id/CAEepm=3TRTc9Fy=fdFThDa4STzPTR6w=RGfYEPikEkc-Lcd+Mw@mail.gmail.com
/messages/by-id/CAEepm=3TRTc9Fy=fdFThDa4STzPTR6w=RGfYEPikEkc-Lcd+Mw@mail.gmail.com
[4]: /messages/by-id/CACjxUsOQw=vYjPWZQ29GmgWU8ZKj336OGiNQX5Z2W-AcV12+Nw@mail.gmail.com
/messages/by-id/CACjxUsOQw=vYjPWZQ29GmgWU8ZKj336OGiNQX5Z2W-AcV12+Nw@mail.gmail.com
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v6-0001-Pgbench-errors-and-serialization-deadlock-retries.patch
From 25791eab71a10a462c8f0a0fb24d98452e069b9f Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Fri, 30 Mar 2018 15:08:48 +0300
Subject: [PATCH v6] Pgbench errors and serialization/deadlock retries
A client's run is aborted only in case of a serious error, for example, if the
connection with the backend was lost. Otherwise, if the execution of an SQL or
meta command fails, the client's run continues normally until the end of the
current script execution (it is assumed that one transaction script contains
only one transaction).
Transactions with serialization or deadlock failures can be rolled back and
retried again and again until they end successfully or their number of tries
reaches the maximum. You can set the maximum number of tries using the
appropriate benchmarking option (--max-tries). The default value is 1.

If there are retries and/or errors, their statistics are printed in the
progress reports, in the transaction/aggregation logs, and at the end with the
other results (in total and for each script). A transaction error is reported
here only if the last try of this transaction fails. Retries and/or errors are
also printed per command, with average latencies, if you use the appropriate
benchmarking option (--report-per-command, -r) and the total number of retries
and/or errors is not zero.
If a failed transaction block does not terminate in the current script, the
commands of the following scripts are processed as usual so you can get a lot of
errors of type "in failed SQL transaction" (when the current SQL transaction is
aborted and commands ignored until end of transaction block). In such cases you
can use separate statistics of these errors in all reports.
If you want to distinguish between failures or errors by type, use the pgbench
debugging output created with the option --debug and the debugging level
"fails" or "all". The first variant is recommended for this purpose because
in the second case the debugging output can be very large.
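The failure types this commit message distinguishes map onto SQLSTATE codes that a client can read with libpq's PQresultErrorField(res, PG_DIAG_SQLSTATE). A sketch of that classification follows; the enum and helper are hypothetical, not the patch's actual code:

```c
#include <string.h>

/* SQLSTATEs relevant here: 40001 = serialization_failure,
 * 40P01 = deadlock_detected, 25P02 = in_failed_sql_transaction. */
typedef enum
{
	FAILURE_NONE,			/* no error at all */
	FAILURE_SERIALIZATION,	/* retriable */
	FAILURE_DEADLOCK,		/* retriable */
	FAILURE_IN_FAILED_TX,	/* counted separately, not retried */
	FAILURE_OTHER			/* any other error, not retried */
} FailureKind;

static FailureKind
classify_sqlstate(const char *sqlstate)
{
	if (sqlstate == NULL)
		return FAILURE_NONE;
	if (strcmp(sqlstate, "40001") == 0)
		return FAILURE_SERIALIZATION;
	if (strcmp(sqlstate, "40P01") == 0)
		return FAILURE_DEADLOCK;
	if (strcmp(sqlstate, "25P02") == 0)
		return FAILURE_IN_FAILED_TX;
	return FAILURE_OTHER;
}
```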
---
doc/src/sgml/ref/pgbench.sgml | 306 +++++-
src/bin/pgbench/pgbench.c | 1157 ++++++++++++++++----
src/bin/pgbench/t/001_pgbench_with_server.pl | 39 +-
src/bin/pgbench/t/002_pgbench_no_server.pl | 2 +-
.../t/003_serialization_and_deadlock_fails.pl | 815 ++++++++++++++
5 files changed, 2047 insertions(+), 272 deletions(-)
create mode 100644 src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 41d9030..5262bd4 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -55,16 +55,19 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
tps = 85.184871 (including connections establishing)
tps = 85.296346 (excluding connections establishing)
</screen>
- The first six lines report some of the most important parameter
- settings. The next line reports the number of transactions completed
- and intended (the latter being just the product of number of clients
+ The first six lines and the eighth line report some of the most important
+ parameter settings. The seventh line reports the number of transactions
+ completed and intended (the latter being just the product of number of clients
and number of transactions per client); these will be equal unless the run
- failed before completion. (In <option>-T</option> mode, only the actual
- number of transactions is printed.)
+ failed before completion or some SQL/meta command(s) failed. (In
+ <option>-T</option> mode, only the actual number of transactions is printed.)
+ (see <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information)
The last two lines report the number of transactions per second,
figured with and without counting the time to start database sessions.
</para>
@@ -380,11 +383,28 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</varlistentry>
<varlistentry>
- <term><option>-d</option></term>
- <term><option>--debug</option></term>
+ <term><option>-d</option> <replaceable>debug_level</replaceable></term>
+ <term><option>--debug=</option><replaceable>debug_level</replaceable></term>
<listitem>
<para>
- Print debugging output.
+ Print debugging output. You can use the following debugging levels:
+ <itemizedlist>
+ <listitem>
+ <para><literal>no</literal>: no debugging output (except built-in
+ function <function>debug</function>, see <xref
+ linkend="pgbench-functions"/>).</para>
+ </listitem>
+ <listitem>
+ <para><literal>fails</literal>: print only error messages and
+ failures (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information).</para>
+ </listitem>
+ <listitem>
+ <para><literal>all</literal>: print all debugging output
+ (throttling, executed/sent/received commands etc.).</para>
+ </listitem>
+ </itemizedlist>
+ The default is no debugging output.
</para>
</listitem>
</varlistentry>
@@ -513,22 +533,37 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
Show progress report every <replaceable>sec</replaceable> seconds. The report
includes the time since the beginning of the run, the tps since the
last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</option>),
- the latency is computed with respect to the transaction scheduled
- start time, not the actual transaction beginning time, thus it also
- includes the average schedule lag time.
+ deviation since the last report. If any transactions ended with a
+ failed SQL or meta command since the last report, they are also reported
+ as failed. If any transactions ended with an error "in failed SQL
+ transaction block", they are reported separately as <literal>in failed
+ tx</literal> (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information). Under
+ throttling (<option>-R</option>), the latency is computed with respect
+ to the transaction scheduled start time, not the actual transaction
+ beginning time, thus it also includes the average schedule lag time. If
+ any transactions have been rolled back and retried after a
+ serialization/deadlock failure since the last report, the report
+ includes the number of such transactions and the sum of all retries. Use
+ the <option>--max-tries</option> option to enable transaction retries after
+ serialization/deadlock failures.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-r</option></term>
- <term><option>--report-latencies</option></term>
+ <term><option>--report-per-command</option></term>
<listitem>
<para>
- Report the average per-statement latency (execution time from the
- perspective of the client) of each command after the benchmark
- finishes. See below for details.
+ Report the following statistics for each command after the benchmark
+ finishes: the average per-statement latency (execution time from the
+ perspective of the client), the number of all errors, the number of
+ errors "in failed SQL transaction block", and the number of retries
+ after serialization or deadlock failures. The report displays the
+ columns with statistics on errors and retries only if the current
+ <application>pgbench</application> run has an error of the corresponding
+ type or retry, respectively. See below for details.
</para>
</listitem>
</varlistentry>
@@ -667,6 +702,35 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</varlistentry>
<varlistentry>
+ <term><option>--max-tries=<replaceable>number_of_tries</replaceable></option></term>
+ <listitem>
+ <para>
+ Set the maximum number of tries for transactions with
+ serialization/deadlock failures. The default is 1.
+ </para>
+ <note>
+ <para>
+ In <application>pgbench</application> it is usually assumed that one
+ transaction script contains only one transaction (see <xref
+ linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"/> for more information). Be
+ careful when repeating scripts that contain multiple transactions: the
+ script is always retried completely, so the successful transactions can
+ be performed several times.
+ </para>
+ </note>
+ <note>
+ <para>
+ Be careful when repeating transactions with shell commands. Unlike the
+ results of SQL commands, the results of shell commands are not rolled
+ back, except for the variable value of the <command>\setshell</command>
+ command.
+ </para>
+ </note>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--progress-timestamp</option></term>
<listitem>
<para>
@@ -807,8 +871,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<refsect1>
<title>Notes</title>
- <refsect2>
- <title>What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts">
+ <title id="transactions-and-scripts-title">What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
<para>
<application>pgbench</application> executes test scripts chosen randomly
@@ -1583,7 +1647,7 @@ END;
The format of the log is:
<synopsis>
-<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -1604,6 +1668,15 @@ END;
When both <option>--rate</option> and <option>--latency-limit</option> are used,
the <replaceable>time</replaceable> for a skipped transaction will be reported as
<literal>skipped</literal>.
+ <replaceable>retries</replaceable> is the sum of all the retries after the
+ serialization or deadlock failures during the current script execution. It is
+ only present when the maximum number of tries for transactions is more than 1
+ (<option>--max-tries</option>). If the transaction ended with an error "in
+ failed SQL transaction", its <replaceable>time</replaceable> will be reported
+ as <literal>in_failed_tx</literal>. If the transaction ended with another
+ error, its <replaceable>time</replaceable> will be reported as
+ <literal>failed</literal> (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information).
</para>
<para>
@@ -1633,6 +1706,23 @@ END;
</para>
<para>
+ The following example shows a snippet of a log file with errors and retries,
+ with the maximum number of tries set to 10:
+<screen>
+3 0 47423 0 1499414498 34501 4
+3 1 8333 0 1499414498 42848 1
+3 2 8358 0 1499414498 51219 1
+4 0 72345 0 1499414498 59433 7
+1 3 41718 0 1499414498 67879 5
+1 4 8416 0 1499414498 76311 1
+3 3 33235 0 1499414498 84469 4
+0 0 failed 0 1499414498 84905 10
+2 0 failed 0 1499414498 86248 10
+3 4 8307 0 1499414498 92788 1
+</screen>
+ </para>
+
+ <para>
When running a long test on hardware that can handle a lot of transactions,
the log files can become very large. The <option>--sampling-rate</option> option
can be used to log only a random sample of transactions.
@@ -1647,7 +1737,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> <replaceable>failed_tx</replaceable> <replaceable>in_failed_tx</replaceable> <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried_tx</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -1661,7 +1751,13 @@ END;
transaction latencies within the interval,
<replaceable>min_latency</replaceable> is the minimum latency within the interval,
and
- <replaceable>max_latency</replaceable> is the maximum latency within the interval.
+ <replaceable>max_latency</replaceable> is the maximum latency within the interval,
+ <replaceable>failed_tx</replaceable> is the number of transactions that ended
+ with a failed SQL or meta command within the interval,
+ <replaceable>in_failed_tx</replaceable> is the number of transactions that
+ ended with an error "in failed SQL transaction block" (see
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information).
The next fields,
<replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
@@ -1669,21 +1765,27 @@ END;
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e. the difference between each transaction's
scheduled start time and the time it actually started.
- The very last field, <replaceable>skipped</replaceable>,
+ The next field, <replaceable>skipped</replaceable>,
is only present if the <option>--latency-limit</option> option is used, too.
It counts the number of transactions skipped because they would have
started too late.
+ The <replaceable>retried_tx</replaceable> and
+ <replaceable>retries</replaceable> fields are only present if the maximum
+ number of tries for transactions is more than 1
+ (<option>--max-tries</option>). They report the number of retried
+ transactions and the sum of all the retries after serialization or deadlock
+ failures within the interval.
Each transaction is counted in the interval when it was committed.
</para>
<para>
Here is some example output:
<screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0 0
+1345828503 7884 1979812 565806736 60 1479 0 0
+1345828505 7208 1979422 567277552 59 1391 0 0
+1345828507 7685 1980268 569784714 60 1398 0 0
+1345828509 7073 1979779 573489941 236 1411 0 0
</screen></para>
<para>
@@ -1695,15 +1797,54 @@ END;
</refsect2>
<refsect2>
- <title>Per-Statement Latencies</title>
+ <title>Per-Statement Report</title>
+
+ <para>
+ With the <option>-r</option> option, <application>pgbench</application>
+ collects the following statistics for each statement:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>latency</literal> — elapsed transaction time for each
+ statement. <application>pgbench</application> reports an average value
+ of all successful runs of the statement.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of errors in this statement. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of errors "in failed SQL transaction" in this statement. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of retries after a serialization or a deadlock failure in
+ this statement. See <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
<para>
- With the <option>-r</option> option, <application>pgbench</application> collects
- the elapsed transaction time of each statement executed by every
- client. It then reports an average of those values, referred to
- as the latency for each statement, after the benchmark has finished.
+ The report displays the columns with error and retry statistics only if
+ the current <application>pgbench</application> run had at least one error
+ or retry, respectively.
</para>
+ <para>
+ All values are computed for each statement executed by every client and are
+ reported after the benchmark has finished.
+ </para>
+
<para>
For the default script, the output will look similar to this:
<screen>
@@ -1715,6 +1856,7 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
latency average = 15.844 ms
latency stddev = 2.715 ms
tps = 618.764555 (including connections establishing)
@@ -1732,10 +1874,49 @@ statement latencies in milliseconds:
0.371 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
1.212 END;
</screen>
+
+ Another example of output for the default script using the serializable
+ default transaction isolation level (<command>PGOPTIONS='-c
+ default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: <builtin: TPC-B (sort of)>
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+number of transactions per client: 1000
+number of transactions actually processed: 3988/10000
+number of errors: 6012 (60.120%)
+number of retried: 8113 (81.130%)
+number of retries: 655869
+maximum number of tries: 100
+latency average = 345.979 ms
+latency stddev = 637.964 ms
+tps = 8.203884 (including connections establishing)
+tps = 8.203969 (excluding connections establishing)
+statement latencies in milliseconds, errors and retries:
+ 0.003 0 0 \set aid random(1, 100000 * :scale)
+ 0.000 0 0 \set bid random(1, 1 * :scale)
+ 0.000 0 0 \set tid random(1, 10 * :scale)
+ 0.000 0 0 \set delta random(-5000, 5000)
+ 0.312 0 0 BEGIN;
+ 0.866 0 0 UPDATE pgbench_accounts
+ SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.698 0 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 0.965 5983 648829 UPDATE pgbench_tellers
+ SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 0.886 29 7029 UPDATE pgbench_branches
+ SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.960 0 0 INSERT INTO pgbench_history
+ (tid, bid, aid, delta, mtime)
+ VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.009 0 11 END;
+</screen>
</para>
<para>
- If multiple script files are specified, the averages are reported
+ If multiple script files are specified, all statistics are reported
separately for each script file.
</para>
@@ -1749,6 +1930,65 @@ statement latencies in milliseconds:
</para>
</refsect2>
+ <refsect2 id="errors-and-retries">
+ <title id="errors-and-retries-title">Errors and Serialization/Deadlock Retries</title>
+
+ <para>
+ A client's run is aborted only in case of a serious error, for example, if
+ the connection with the backend is lost. Otherwise, if the execution of an
+ SQL or meta command fails, the client's run continues normally until the
+ end of the current script execution (it is assumed that one transaction
+ script contains only one transaction; see <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"/> for more information).
+ Transactions with serialization or deadlock failures are rolled back and
+ repeated until they complete successfully or reach the maximum number of
+ tries specified by the <option>--max-tries</option> option. If the last
+ transaction run fails, this transaction will be reported as failed, and the
+ client variables will be set as they were before the first run of this
+ transaction.
+ </para>
+
+ <note>
+ <para>
+ If a failed transaction block does not terminate in the current script,
+ the commands of the following scripts are processed as usual, so you can
+ get many errors of type "in failed SQL transaction" (the current SQL
+ transaction is aborted and its commands are ignored until the end of the
+ transaction block). In such cases all reports provide separate statistics
+ for these errors.
+ </para>
+ </note>
+
+ <para>
+ The latency of a successful transaction includes the entire time of the
+ transaction's execution, including rollbacks and retries. Latencies of
+ failed transactions and commands are not computed separately.
+ </para>
+
+ <para>
+ The main report contains the number of failed transactions if it is
+ non-zero. If the total number of transactions that ended with an error
+ "in failed SQL transaction" is non-zero, the main report also contains
+ it. If the total number of retried transactions is non-zero, the main
+ report also contains the statistics related to retries: the total number
+ of retried transactions and the total number of retries (use the option
+ <option>--max-tries</option> to enable retries). The per-statement report
+ inherits all columns from the main report. Note that if a failure/error
+ occurs, the subsequent failures/errors in the current script execution are
+ not shown in the reports, and a retry is only reported for the first
+ command where the failure occurred during the current script execution.
+ </para>
+
+ <para>
+ If you want to distinguish failures or errors by type, use the
+ <application>pgbench</application> debugging output created with the
+ option <option>--debug</option> and the debugging level
+ <literal>fails</literal> or <literal>all</literal>. The first level is
+ recommended for this purpose because at the second level the debugging
+ output can be very large.
+ </para>
+ </refsect2>
+
<refsect2>
<title>Good Practices</title>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 8529e7d..fa82edd 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -59,6 +59,9 @@
#include "pgbench.h"
+#define ERRCODE_IN_FAILED_SQL_TRANSACTION "25P02"
+#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"
#define ERRCODE_UNDEFINED_TABLE "42P01"
/*
@@ -186,8 +189,13 @@ bool progress_timestamp = false; /* progress report with Unix time */
int nclients = 1; /* number of clients */
int nthreads = 1; /* number of threads */
bool is_connect; /* establish connection for each transaction */
-bool is_latencies; /* report per-command latencies */
+bool report_per_command = false; /* report per-command latencies, retries
+ * after failures, and errors (failures
+ * without retrying) */
int main_pid; /* main process id used in log filename */
+int max_tries = 1; /* maximum number of tries to run the
+ * transaction with serialization or deadlock
+ * failures */
char *pghost = "";
char *pgport = "";
@@ -242,14 +250,66 @@ typedef struct SimpleStats
typedef struct StatsData
{
time_t start_time; /* interval start time, for aggregates */
- int64 cnt; /* number of transactions, including skipped */
+ int64 cnt; /* number of successful transactions, including
+ * skipped */
int64 skipped; /* number of transactions skipped under --rate
* and --latency-limit */
+ int64 retries; /* number of retries after serialization or
+ * deadlock failures */
+ int64 retried; /* number of transactions that were retried
+ * after a serialization or a deadlock
+ * failure */
+ int64 errors; /* number of transactions that were not retried
+ * after a serialization or a deadlock
+ * failure or had another error (including meta
+ * commands errors) */
+ int64 errors_in_failed_tx; /* number of transactions that failed
+ * with the error
+ * ERRCODE_IN_FAILED_SQL_TRANSACTION */
SimpleStats latency;
SimpleStats lag;
} StatsData;
/*
+ * Data structure for client variables.
+ */
+typedef struct Variables
+{
+ Variable *array; /* array of variable definitions */
+ int nvariables; /* number of variables */
+ bool vars_sorted; /* are variables sorted by name? */
+} Variables;
+
+/*
+ * Data structure for thread/client random seed.
+ */
+typedef struct RandomState
+{
+ unsigned short data[3];
+} RandomState;
+
+/*
+ * Data structure for repeating a transaction from the beginning with the same
+ * parameters.
+ */
+typedef struct RetryState
+{
+ RandomState random_state; /* random seed */
+ Variables variables; /* client variables */
+} RetryState;
+
+/*
+ * Statuses of failures during script execution.
+ */
+typedef enum FailureStatus
+{
+ NO_FAILURE = 0,
+ SERIALIZATION_FAILURE,
+ DEADLOCK_FAILURE,
+ IN_FAILED_SQL_TRANSACTION,
+ ANOTHER_FAILURE
+} FailureStatus;
+
+/*
* Connection state machine states.
*/
typedef enum
@@ -304,6 +364,21 @@ typedef enum
CSTATE_END_COMMAND,
/*
+ * States for transactions with serialization or deadlock failures.
+ *
+ * First, report the failure in CSTATE_FAILURE. Then process other commands
+ * of the failed transaction if any and go to CSTATE_RETRY. If we can
+ * re-execute the transaction from the very beginning, set the same
+ * parameters for the transaction execution as in the previous tries and
+ * process the first transaction command in CSTATE_START_COMMAND. Otherwise,
+ * set the parameters for the transaction execution as they were before the
+ * first run of this transaction (except for a random state) and go to
+ * CSTATE_END_TX to complete this transaction.
+ */
+ CSTATE_FAILURE,
+ CSTATE_RETRY,
+
+ /*
* CSTATE_END_TX performs end-of-transaction processing. Calculates
* latency, and logs the transaction. In --connect mode, closes the
* current connection. Chooses the next script to execute and starts over
@@ -329,14 +404,13 @@ typedef struct
int id; /* client No. */
ConnectionStateEnum state; /* state machine's current state. */
ConditionalStack cstack; /* enclosing conditionals state */
+ RandomState random_state; /* separate randomness for each client */
int use_file; /* index in sql_script for this client */
int command; /* command number in script */
/* client variables */
- Variable *variables; /* array of variable definitions */
- int nvariables; /* number of variables */
- bool vars_sorted; /* are variables sorted by name? */
+ Variables variables;
/* various times about current transaction */
int64 txn_scheduled; /* scheduled start time of transaction (usec) */
@@ -346,6 +420,16 @@ typedef struct
bool prepared[MAX_SCRIPTS]; /* whether client prepared the script */
+ /*
+ * For processing errors and repeating transactions with serialization or
+ * deadlock failures:
+ */
+ FailureStatus first_failure; /* the status of the first failure in the
+ * current transaction execution; NO_FAILURE
+ * if there were no failures or errors */
+ RetryState retry_state;
+ int retries; /* number of retries so far (always less
+ * than max_tries) */
+
/* per client collected stats */
int64 cnt; /* client transaction count, for -t */
int ecnt; /* error count */
@@ -389,7 +473,7 @@ typedef struct
pthread_t thread; /* thread handle */
CState *state; /* array of CState */
int nstate; /* length of state[] */
- unsigned short random_state[3]; /* separate randomness for each thread */
+ RandomState random_state; /* separate randomness for each thread */
int64 throttle_trigger; /* previous/next throttling (us) */
FILE *logfile; /* where to log, or NULL */
ZipfCache zipf_cache; /* for thread-safe zipfian random number
@@ -445,6 +529,10 @@ typedef struct
char *argv[MAX_ARGS]; /* command word list */
PgBenchExpr *expr; /* parsed expression, if needed */
SimpleStats stats; /* time spent in this command */
+ int64 retries;
+ int64 errors; /* number of failures that were not retried */
+ int64 errors_in_failed_tx; /* number of errors
+ * ERRCODE_IN_FAILED_SQL_TRANSACTION */
} Command;
typedef struct ParsedScript
@@ -460,7 +548,17 @@ static int num_scripts; /* number of scripts in sql_script[] */
static int num_commands = 0; /* total number of Command structs */
static int64 total_weight = 0;
-static int debug = 0; /* debug flag */
+typedef enum Debuglevel
+{
+ NO_DEBUG = 0, /* no debugging output (except PGBENCH_DEBUG) */
+ DEBUG_FAILS, /* print only error messages and failures */
+ DEBUG_ALL, /* print all debugging output (throttling,
+ * executed/sent/received commands etc.) */
+ NUM_DEBUGLEVEL
+} Debuglevel;
+
+static Debuglevel debug_level = NO_DEBUG; /* debug flag */
+static const char *DEBUGLEVEl[] = {"no", "fails", "all"};
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -508,6 +606,15 @@ static const BuiltinScript builtin_script[] =
}
};
+typedef enum FailStatus
+{
+ TX_FAILURE, /* the transaction will be re-executed from the
+ * very beginning */
+ IN_FAILED_TX, /* continue the failed transaction */
+ TX_ERROR, /* the transaction will be marked as failed */
+ CLIENT_ABORTED /* the client is aborted */
+} FailStatus;
+
/* Function prototypes */
static void setNullValue(PgBenchValue *pv);
@@ -572,7 +679,7 @@ usage(void)
" protocol for submitting queries (default: simple)\n"
" -n, --no-vacuum do not run VACUUM before tests\n"
" -P, --progress=NUM show thread progress report every NUM seconds\n"
- " -r, --report-latencies report average latency per command\n"
+ " -r, --report-per-command report latencies, errors and retries per command\n"
" -R, --rate=NUM target rate in transactions per second\n"
" -s, --scale=NUM report this scale factor in output\n"
" -t, --transactions=NUM number of transactions each client runs (default: 10)\n"
@@ -581,11 +688,12 @@ usage(void)
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
+ " --max-tries=NUM max number of tries to run transaction (default: 1)\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
"\nCommon options:\n"
- " -d, --debug print debugging output\n"
+ " -d, --debug=no|fails|all print debugging output (default: no)\n"
" -h, --host=HOSTNAME database server host or socket directory\n"
" -p, --port=PORT database server port number\n"
" -U, --username=USERNAME connect as specified database user\n"
@@ -693,7 +801,7 @@ gotdigits:
/* random number generator: uniform distribution from min to max inclusive */
static int64
-getrand(TState *thread, int64 min, int64 max)
+getrand(RandomState *random_state, int64 min, int64 max)
{
/*
* Odd coding is so that min and max have approximately the same chance of
@@ -704,7 +812,7 @@ getrand(TState *thread, int64 min, int64 max)
* protected by a mutex, and therefore a bottleneck on machines with many
* CPUs.
*/
- return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
+ return min + (int64) ((max - min + 1) * pg_erand48(random_state->data));
}
/*
@@ -713,7 +821,8 @@ getrand(TState *thread, int64 min, int64 max)
* value is exp(-parameter).
*/
static int64
-getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
+getExponentialRand(RandomState *random_state, int64 min, int64 max,
+ double parameter)
{
double cut,
uniform,
@@ -723,7 +832,7 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
Assert(parameter > 0.0);
cut = exp(-parameter);
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(random_state->data);
/*
* inner expression in (cut, 1] (if parameter > 0), rand in [0, 1)
@@ -736,7 +845,8 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
/* random number generator: gaussian distribution from min to max inclusive */
static int64
-getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
+getGaussianRand(RandomState *random_state, int64 min, int64 max,
+ double parameter)
{
double stdev;
double rand;
@@ -764,8 +874,8 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* are expected in (0, 1] (see
* http://en.wikipedia.org/wiki/Box_muller)
*/
- double rand1 = 1.0 - pg_erand48(thread->random_state);
- double rand2 = 1.0 - pg_erand48(thread->random_state);
+ double rand1 = 1.0 - pg_erand48(random_state->data);
+ double rand2 = 1.0 - pg_erand48(random_state->data);
/* Box-Muller basic form transform */
double var_sqrt = sqrt(-2.0 * log(rand1));
@@ -792,7 +902,7 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* will approximate a Poisson distribution centered on the given value.
*/
static int64
-getPoissonRand(TState *thread, int64 center)
+getPoissonRand(RandomState *random_state, int64 center)
{
/*
* Use inverse transform sampling to generate a value > 0, such that the
@@ -801,7 +911,7 @@ getPoissonRand(TState *thread, int64 center)
double uniform;
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(random_state->data);
return (int64) (-log(uniform) * ((double) center) + 0.5);
}
@@ -879,7 +989,7 @@ zipfFindOrCreateCacheCell(ZipfCache * cache, int64 n, double s)
* Luc Devroye, p. 550-551, Springer 1986.
*/
static int64
-computeIterativeZipfian(TState *thread, int64 n, double s)
+computeIterativeZipfian(RandomState *random_state, int64 n, double s)
{
double b = pow(2.0, s - 1.0);
double x,
@@ -890,8 +1000,8 @@ computeIterativeZipfian(TState *thread, int64 n, double s)
while (true)
{
/* random variates */
- u = pg_erand48(thread->random_state);
- v = pg_erand48(thread->random_state);
+ u = pg_erand48(random_state->data);
+ v = pg_erand48(random_state->data);
x = floor(pow(u, -1.0 / (s - 1.0)));
@@ -909,10 +1019,11 @@ computeIterativeZipfian(TState *thread, int64 n, double s)
* Jim Gray et al, SIGMOD 1994
*/
static int64
-computeHarmonicZipfian(TState *thread, int64 n, double s)
+computeHarmonicZipfian(TState *thread, RandomState *random_state, int64 n,
+ double s)
{
ZipfCell *cell = zipfFindOrCreateCacheCell(&thread->zipf_cache, n, s);
- double uniform = pg_erand48(thread->random_state);
+ double uniform = pg_erand48(random_state->data);
double uz = uniform * cell->harmonicn;
if (uz < 1.0)
@@ -924,7 +1035,8 @@ computeHarmonicZipfian(TState *thread, int64 n, double s)
/* random number generator: zipfian distribution from min to max inclusive */
static int64
-getZipfianRand(TState *thread, int64 min, int64 max, double s)
+getZipfianRand(TState *thread, RandomState *random_state, int64 min,
+ int64 max, double s)
{
int64 n = max - min + 1;
@@ -933,8 +1045,8 @@ getZipfianRand(TState *thread, int64 min, int64 max, double s)
return min - 1 + ((s > 1)
- ? computeIterativeZipfian(thread, n, s)
- : computeHarmonicZipfian(thread, n, s));
+ ? computeIterativeZipfian(random_state, n, s)
+ : computeHarmonicZipfian(thread, random_state, n, s));
}
/*
@@ -1034,6 +1146,10 @@ initStats(StatsData *sd, time_t start_time)
sd->start_time = start_time;
sd->cnt = 0;
sd->skipped = 0;
+ sd->retries = 0;
+ sd->retried = 0;
+ sd->errors = 0;
+ sd->errors_in_failed_tx = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1042,8 +1158,24 @@ initStats(StatsData *sd, time_t start_time)
* Accumulate one additional item into the given stats object.
*/
static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumStats(StatsData *stats, bool skipped, double lat, double lag,
+ FailureStatus first_error, int64 retries)
{
+ stats->retries += retries;
+ if (retries > 0)
+ stats->retried++;
+
+ /* failed transactions are processed separately */
+ if (first_error != NO_FAILURE)
+ {
+ stats->errors++;
+
+ if (first_error == IN_FAILED_SQL_TRANSACTION)
+ stats->errors_in_failed_tx++;
+
+ return;
+ }
+
stats->cnt++;
if (skipped)
@@ -1061,6 +1193,14 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag)
}
}
+static void
+initRandomState(RandomState *random_state)
+{
+ random_state->data[0] = random();
+ random_state->data[1] = random();
+ random_state->data[2] = random();
+}
+
/* call PQexec() and exit() on failure */
static void
executeStatement(PGconn *con, const char *sql)
@@ -1184,39 +1324,39 @@ compareVariableNames(const void *v1, const void *v2)
/* Locate a variable by name; returns NULL if unknown */
static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
{
Variable key;
/* On some versions of Solaris, bsearch of zero items dumps core */
- if (st->nvariables <= 0)
+ if (variables->nvariables <= 0)
return NULL;
/* Sort if we have to */
- if (!st->vars_sorted)
+ if (!variables->vars_sorted)
{
- qsort((void *) st->variables, st->nvariables, sizeof(Variable),
- compareVariableNames);
- st->vars_sorted = true;
+ qsort((void *) variables->array, variables->nvariables,
+ sizeof(Variable), compareVariableNames);
+ variables->vars_sorted = true;
}
/* Now we can search */
key.name = name;
return (Variable *) bsearch((void *) &key,
- (void *) st->variables,
- st->nvariables,
+ (void *) variables->array,
+ variables->nvariables,
sizeof(Variable),
compareVariableNames);
}
/* Get the value of a variable, in string form; returns NULL if unknown */
static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
{
Variable *var;
char stringform[64];
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
return NULL; /* not found */
@@ -1290,9 +1430,12 @@ makeVariableValue(Variable *var)
if (sscanf(var->svalue, "%lf%c", &dv, &xs) != 1)
{
- fprintf(stderr,
- "malformed variable \"%s\" value: \"%s\"\n",
- var->name, var->svalue);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "malformed variable \"%s\" value: \"%s\"\n",
+ var->name, var->svalue);
+ }
return false;
}
setDoubleValue(&var->value, dv);
@@ -1340,11 +1483,12 @@ valid_variable_name(const char *name)
* Returns NULL on failure (bad name).
*/
static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
+lookupCreateVariable(Variables *variables, const char *context, char *name,
+ bool aborted)
{
Variable *var;
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
{
Variable *newvars;
@@ -1355,29 +1499,32 @@ lookupCreateVariable(CState *st, const char *context, char *name)
*/
if (!valid_variable_name(name))
{
- fprintf(stderr, "%s: invalid variable name: \"%s\"\n",
- context, name);
+ if (aborted || debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr, "%s: invalid variable name: \"%s\"\n",
+ context, name);
+ }
return NULL;
}
/* Create variable at the end of the array */
- if (st->variables)
- newvars = (Variable *) pg_realloc(st->variables,
- (st->nvariables + 1) * sizeof(Variable));
+ if (variables->array)
+ newvars = (Variable *) pg_realloc(variables->array,
+ (variables->nvariables + 1) * sizeof(Variable));
else
newvars = (Variable *) pg_malloc(sizeof(Variable));
- st->variables = newvars;
+ variables->array = newvars;
- var = &newvars[st->nvariables];
+ var = &newvars[variables->nvariables];
var->name = pg_strdup(name);
var->svalue = NULL;
/* caller is expected to initialize remaining fields */
- st->nvariables++;
+ variables->nvariables++;
/* we don't re-sort the array till we have to */
- st->vars_sorted = false;
+ variables->vars_sorted = false;
}
return var;
@@ -1386,12 +1533,13 @@ lookupCreateVariable(CState *st, const char *context, char *name)
/* Assign a string value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariable(CState *st, const char *context, char *name, const char *value)
+putVariable(Variables *variables, const char *context, char *name,
+ const char *value)
{
Variable *var;
char *val;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name, true);
if (!var)
return false;
@@ -1409,12 +1557,12 @@ putVariable(CState *st, const char *context, char *name, const char *value)
/* Assign a value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableValue(CState *st, const char *context, char *name,
- const PgBenchValue *value)
+putVariableValue(Variables *variables, const char *context, char *name,
+ const PgBenchValue *value, bool aborted)
{
Variable *var;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name, aborted);
if (!var)
return false;
@@ -1429,12 +1577,13 @@ putVariableValue(CState *st, const char *context, char *name,
/* Assign an integer value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putVariableInt(Variables *variables, const char *context, char *name,
+ int64 value, bool aborted)
{
PgBenchValue val;
setIntValue(&val, value);
- return putVariableValue(st, context, name, &val);
+ return putVariableValue(variables, context, name, &val, aborted);
}
/*
@@ -1489,7 +1638,7 @@ replaceVariable(char **sql, char *param, int len, char *value)
}
static char *
-assignVariables(CState *st, char *sql)
+assignVariables(Variables *variables, char *sql)
{
char *p,
*name,
@@ -1510,7 +1659,7 @@ assignVariables(CState *st, char *sql)
continue;
}
- val = getVariable(st, name);
+ val = getVariable(variables, name);
free(name);
if (val == NULL)
{
@@ -1525,12 +1674,13 @@ assignVariables(CState *st, char *sql)
}
static void
-getQueryParams(CState *st, const Command *command, const char **params)
+getQueryParams(Variables *variables, const Command *command,
+ const char **params)
{
int i;
for (i = 0; i < command->argc - 1; i++)
- params[i] = getVariable(st, command->argv[i + 1]);
+ params[i] = getVariable(variables, command->argv[i + 1]);
}
static char *
@@ -1565,7 +1715,11 @@ coerceToBool(PgBenchValue *pval, bool *bval)
}
else /* NULL, INT or DOUBLE */
{
- fprintf(stderr, "cannot coerce %s to boolean\n", valueTypeName(pval));
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr, "cannot coerce %s to boolean\n",
+ valueTypeName(pval));
+ }
*bval = false; /* suppress uninitialized-variable warnings */
return false;
}
@@ -1610,7 +1764,8 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
if (dval < PG_INT64_MIN || PG_INT64_MAX < dval)
{
- fprintf(stderr, "double to int overflow for %f\n", dval);
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "double to int overflow for %f\n", dval);
return false;
}
*ival = (int64) dval;
@@ -1618,7 +1773,8 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
}
else /* BOOLEAN or NULL */
{
- fprintf(stderr, "cannot coerce %s to int\n", valueTypeName(pval));
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "cannot coerce %s to int\n", valueTypeName(pval));
return false;
}
}
@@ -1639,7 +1795,9 @@ coerceToDouble(PgBenchValue *pval, double *dval)
}
else /* BOOLEAN or NULL */
{
- fprintf(stderr, "cannot coerce %s to double\n", valueTypeName(pval));
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "cannot coerce %s to double\n",
+ valueTypeName(pval));
return false;
}
}
@@ -1817,8 +1975,11 @@ evalStandardFunc(TState *thread, CState *st,
if (l != NULL)
{
- fprintf(stderr,
- "too many function arguments, maximum is %d\n", MAX_FARGS);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "too many function arguments, maximum is %d\n", MAX_FARGS);
+ }
return false;
}
@@ -1941,7 +2102,8 @@ evalStandardFunc(TState *thread, CState *st,
case PGBENCH_MOD:
if (ri == 0)
{
- fprintf(stderr, "division by zero\n");
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "division by zero\n");
return false;
}
/* special handling of -1 divisor */
@@ -1952,7 +2114,11 @@ evalStandardFunc(TState *thread, CState *st,
/* overflow check (needed for INT64_MIN) */
if (li == PG_INT64_MIN)
{
- fprintf(stderr, "bigint out of range\n");
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "bigint out of range\n");
+ }
return false;
}
else
@@ -2187,20 +2353,22 @@ evalStandardFunc(TState *thread, CState *st,
/* check random range */
if (imin > imax)
{
- fprintf(stderr, "empty range given to random\n");
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "empty range given to random\n");
return false;
}
else if (imax - imin < 0 || (imax - imin) + 1 < 0)
{
/* prevent int overflows in random functions */
- fprintf(stderr, "random range is too large\n");
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "random range is too large\n");
return false;
}
if (func == PGBENCH_RANDOM)
{
Assert(nargs == 2);
- setIntValue(retval, getrand(thread, imin, imax));
+ setIntValue(retval, getrand(&st->random_state, imin, imax));
}
else /* gaussian & exponential */
{
@@ -2215,39 +2383,51 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param < MIN_GAUSSIAN_PARAM)
{
- fprintf(stderr,
- "gaussian parameter must be at least %f "
- "(not %f)\n", MIN_GAUSSIAN_PARAM, param);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "gaussian parameter must be at least %f (not %f)\n",
+ MIN_GAUSSIAN_PARAM, param);
+ }
return false;
}
setIntValue(retval,
- getGaussianRand(thread, imin, imax, param));
+ getGaussianRand(&st->random_state, imin,
+ imax, param));
}
else if (func == PGBENCH_RANDOM_ZIPFIAN)
{
if (param <= 0.0 || param == 1.0 || param > MAX_ZIPFIAN_PARAM)
{
- fprintf(stderr,
- "zipfian parameter must be in range (0, 1) U (1, %d]"
- " (got %f)\n", MAX_ZIPFIAN_PARAM, param);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "zipfian parameter must be in range (0, 1) U (1, %d] (got %f)\n",
+ MAX_ZIPFIAN_PARAM, param);
+ }
return false;
}
setIntValue(retval,
- getZipfianRand(thread, imin, imax, param));
+ getZipfianRand(thread, &st->random_state,
+ imin, imax, param));
}
else /* exponential */
{
if (param <= 0.0)
{
- fprintf(stderr,
- "exponential parameter must be greater than zero"
- " (got %f)\n", param);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "exponential parameter must be greater than zero (got %f)\n",
+ param);
+ }
return false;
}
setIntValue(retval,
- getExponentialRand(thread, imin, imax, param));
+ getExponentialRand(&st->random_state, imin,
+ imax, param));
}
}
@@ -2346,10 +2526,13 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
{
Variable *var;
- if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
+ if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
- fprintf(stderr, "undefined variable \"%s\"\n",
- expr->u.variable.varname);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr, "undefined variable \"%s\"\n",
+ expr->u.variable.varname);
+ }
return false;
}
@@ -2410,7 +2593,7 @@ getMetaCommand(const char *cmd)
* Return true if succeeded, or false on error.
*/
static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
char command[SHELL_COMMAND_SIZE];
int i,
@@ -2441,17 +2624,21 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
arg = argv[i] + 1; /* a string literal starting with colons */
}
- else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
+ else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
- fprintf(stderr, "%s: undefined variable \"%s\"\n",
- argv[0], argv[i]);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr, "%s: undefined variable \"%s\"\n",
+ argv[0], argv[i]);
+ }
return false;
}
arglen = strlen(arg);
if (len + arglen + (i > 0 ? 1 : 0) >= SHELL_COMMAND_SIZE - 1)
{
- fprintf(stderr, "%s: shell command is too long\n", argv[0]);
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "%s: shell command is too long\n", argv[0]);
return false;
}
@@ -2468,7 +2655,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
if (system(command))
{
- if (!timer_exceeded)
+ if (!timer_exceeded && debug_level >= DEBUG_FAILS)
fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
return false;
}
@@ -2478,19 +2665,21 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
/* Execute the command with pipe and read the standard output. */
if ((fp = popen(command, "r")) == NULL)
{
- fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
return false;
}
if (fgets(res, sizeof(res), fp) == NULL)
{
- if (!timer_exceeded)
+ if (!timer_exceeded && debug_level >= DEBUG_FAILS)
fprintf(stderr, "%s: could not read result of shell command\n", argv[0]);
(void) pclose(fp);
return false;
}
if (pclose(fp) < 0)
{
- fprintf(stderr, "%s: could not close shell command\n", argv[0]);
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "%s: could not close shell command\n", argv[0]);
return false;
}
@@ -2500,11 +2689,14 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
endptr++;
if (*res == '\0' || *endptr != '\0')
{
- fprintf(stderr, "%s: shell command must return an integer (not \"%s\")\n",
- argv[0], res);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr, "%s: shell command must return an integer (not \"%s\")\n",
+ argv[0], res);
+ }
return false;
}
- if (!putVariableInt(st, "setshell", variable, retval))
+ if (!putVariableInt(variables, "setshell", variable, retval, false))
return false;
#ifdef DEBUG
@@ -2521,11 +2713,46 @@ preparedStatementName(char *buffer, int file, int state)
}
static void
-commandFailed(CState *st, const char *cmd, const char *message)
+commandFailed(CState *st, const char *cmd, const char *message, FailStatus
+ fail_status)
{
- fprintf(stderr,
- "client %d aborted in command %d (%s) of script %d; %s\n",
- st->id, st->command, cmd, st->use_file, message);
+ switch (fail_status)
+ {
+ case TX_FAILURE:
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "client %d got a failure (try %d/%d) in command %d (%s) of script %d; %s\n",
+ st->id, st->retries + 1, max_tries, st->command, cmd,
+ st->use_file, message);
+ }
+ break;
+ case IN_FAILED_TX:
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "client %d continues a failed transaction in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file, message);
+ }
+ break;
+ case TX_ERROR:
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "client %d got an error in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file, message);
+ }
+ break;
+ case CLIENT_ABORTED:
+ fprintf(stderr,
+ "client %d aborted in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file, message);
+ break;
+ default:
+ /* internal error which should never occur */
+ fprintf(stderr, "unexpected fail status %d\n", fail_status);
+ exit(1);
+ }
}
/* return a script number with a weighted choice. */
@@ -2538,7 +2765,7 @@ chooseScript(TState *thread)
if (num_scripts == 1)
return 0;
- w = getrand(thread, 0, total_weight - 1);
+ w = getrand(&thread->random_state, 0, total_weight - 1);
do
{
w -= sql_script[i++].weight;
@@ -2558,9 +2785,9 @@ sendCommand(CState *st, Command *command)
char *sql;
sql = pg_strdup(command->argv[0]);
- sql = assignVariables(st, sql);
+ sql = assignVariables(&st->variables, sql);
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d sending %s\n", st->id, sql);
r = PQsendQuery(st->con, sql);
free(sql);
@@ -2570,9 +2797,9 @@ sendCommand(CState *st, Command *command)
const char *sql = command->argv[0];
const char *params[MAX_ARGS];
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d sending %s\n", st->id, sql);
r = PQsendQueryParams(st->con, sql, command->argc - 1,
NULL, params, NULL, NULL, 0);
@@ -2604,10 +2831,10 @@ sendCommand(CState *st, Command *command)
st->prepared[st->use_file] = true;
}
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d sending %s\n", st->id, name);
r = PQsendQueryPrepared(st->con, name, command->argc - 1,
params, NULL, NULL, 0);
@@ -2617,10 +2844,9 @@ sendCommand(CState *st, Command *command)
if (r == 0)
{
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d could not send %s\n",
st->id, command->argv[0]);
- st->ecnt++;
return false;
}
else
@@ -2632,17 +2858,20 @@ sendCommand(CState *st, Command *command)
* of delay, in microseconds. Returns true on success, false on error.
*/
static bool
-evaluateSleep(CState *st, int argc, char **argv, int *usecs)
+evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
char *var;
int usec;
if (*argv[1] == ':')
{
- if ((var = getVariable(st, argv[1] + 1)) == NULL)
+ if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
- fprintf(stderr, "%s: undefined variable \"%s\"\n",
- argv[0], argv[1]);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr, "%s: undefined variable \"%s\"\n",
+ argv[0], argv[1]);
+ }
return false;
}
usec = atoi(var);
@@ -2665,6 +2894,162 @@ evaluateSleep(CState *st, int argc, char **argv, int *usecs)
}
/*
+ * Return the total number of processed transactions, including skipped
+ * transactions and errors.
+ */
+static int64
+getTotalCnt(const CState *st)
+{
+ return st->cnt + st->ecnt;
+}
+
+/*
+ * Make a copy of the given random state.
+ */
+static void
+copyRandomState(RandomState *destination, const RandomState *source)
+{
+ memcpy(destination->data, source->data, sizeof(unsigned short) * 3);
+}
+
+/*
+ * Make a deep copy of variables array.
+ */
+static void
+copyVariables(Variables *destination_vars, const Variables *source_vars)
+{
+ Variable *destination;
+ Variable *current_destination;
+ const Variable *source;
+ const Variable *current_source;
+ int nvariables;
+
+ if (!destination_vars || !source_vars)
+ return;
+
+ destination = destination_vars->array;
+ source = source_vars->array;
+ nvariables = source_vars->nvariables;
+
+ for (current_destination = destination;
+ current_destination - destination < destination_vars->nvariables;
+ ++current_destination)
+ {
+ pg_free(current_destination->name);
+ pg_free(current_destination->svalue);
+ }
+
+ destination_vars->array = pg_realloc(destination_vars->array,
+ sizeof(Variable) * nvariables);
+ destination = destination_vars->array;
+
+ for (current_source = source, current_destination = destination;
+ current_source - source < nvariables;
+ ++current_source, ++current_destination)
+ {
+ current_destination->name = pg_strdup(current_source->name);
+ if (current_source->svalue)
+ current_destination->svalue = pg_strdup(current_source->svalue);
+ else
+ current_destination->svalue = NULL;
+ current_destination->value = current_source->value;
+ }
+
+ destination_vars->nvariables = nvariables;
+ destination_vars->vars_sorted = source_vars->vars_sorted;
+}
+
+/*
+ * Returns true if the failure can be retried.
+ */
+static bool
+canRetry(CState *st, FailureStatus failure_status)
+{
+ Assert(failure_status != NO_FAILURE);
+
+ /*
+ * Decide on the basis of the transaction's first failure: all subsequent
+ * failures are handled in the same way as the first one.
+ */
+ if (st->first_failure != NO_FAILURE)
+ failure_status = st->first_failure;
+
+ /* We can only retry serialization or deadlock failures. */
+ if (!(failure_status == SERIALIZATION_FAILURE ||
+ failure_status == DEADLOCK_FAILURE))
+ return false;
+
+ /*
+ * We cannot retry the failure if we have reached the maximum number of
+ * tries.
+ */
+ if (st->retries + 1 >= max_tries)
+ return false;
+
+ /* OK */
+ return true;
+}
+
+/*
+ * Return the transaction status: find out whether there's a failure that can
+ * be retried, an error that cannot be retried, or whether we are continuing
+ * an already failed transaction.
+ */
+static FailStatus
+getFailStatus(CState *st, FailureStatus failure_status)
+{
+ Assert(failure_status != NO_FAILURE);
+
+ if (st->first_failure == NO_FAILURE)
+ return canRetry(st, failure_status) ? TX_FAILURE : TX_ERROR;
+ else
+ return IN_FAILED_TX;
+}
+
+/*
+ * Process the conditional stack depending on the condition value; used by the
+ * meta-commands \if and \elif.
+ */
+static void
+executeCondition(CState *st, bool condition)
+{
+ Command *command = sql_script[st->use_file].commands[st->command];
+
+ /* execute or not depending on evaluated condition */
+ if (command->meta == META_IF)
+ {
+ conditional_stack_push(st->cstack,
+ condition ? IFSTATE_TRUE : IFSTATE_FALSE);
+ }
+ else if (command->meta == META_ELIF)
+ {
+ /* we should get here only if the "elif" needed evaluation */
+ Assert(conditional_stack_peek(st->cstack) == IFSTATE_FALSE);
+ conditional_stack_poke(st->cstack,
+ condition ? IFSTATE_TRUE : IFSTATE_FALSE);
+ }
+}
+
+/*
+ * Get the failure status from the error code.
+ */
+static FailureStatus
+getFailureStatus(char *sqlState)
+{
+ if (sqlState)
+ {
+ if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
+ return SERIALIZATION_FAILURE;
+ else if (strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
+ return DEADLOCK_FAILURE;
+ else if (strcmp(sqlState, ERRCODE_IN_FAILED_SQL_TRANSACTION) == 0)
+ return IN_FAILED_SQL_TRANSACTION;
+ }
+
+ return ANOTHER_FAILURE;
+}
+
+/*
* Advance the state machine of a connection, if possible.
*/
static void
@@ -2675,6 +3060,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
instr_time now;
bool end_tx_processed = false;
int64 wait;
+ FailureStatus failure_status = NO_FAILURE;
/*
* gettimeofday() isn't free, so we get the current timestamp lazily the
@@ -2705,7 +3091,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->use_file = chooseScript(thread);
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d executing script \"%s\"\n", st->id,
sql_script[st->use_file].desc);
@@ -2715,6 +3101,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->state = CSTATE_START_TX;
/* check consistency */
Assert(conditional_stack_empty(st->cstack));
+
+ /* reset transaction variables to default values */
+ st->first_failure = NO_FAILURE;
+ st->retries = 0;
+
break;
/*
@@ -2732,7 +3123,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* away.
*/
Assert(throttle_delay > 0);
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(&thread->random_state, throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
@@ -2762,16 +3153,17 @@ doCustom(TState *thread, CState *st, StatsData *agg)
INSTR_TIME_SET_CURRENT(now);
now_us = INSTR_TIME_GET_MICROSEC(now);
while (thread->throttle_trigger < now_us - latency_limit &&
- (nxacts <= 0 || st->cnt < nxacts))
+ (nxacts <= 0 || getTotalCnt(st) < nxacts))
{
processXactStats(thread, st, &now, true, agg);
/* next rendez-vous */
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(&thread->random_state,
+ throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
}
/* stop client if -t exceeded */
- if (nxacts > 0 && st->cnt >= nxacts)
+ if (nxacts > 0 && getTotalCnt(st) >= nxacts)
{
st->state = CSTATE_FINISHED;
break;
@@ -2779,7 +3171,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
st->state = CSTATE_THROTTLE;
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d throttling " INT64_FORMAT " us\n",
st->id, wait);
break;
@@ -2826,6 +3218,15 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
/*
+ * This is the first try to run this transaction. Remember its
+ * parameters in case it fails and has to be repeated.
+ */
+ copyRandomState(&st->retry_state.random_state,
+ &st->random_state);
+ copyVariables(&st->retry_state.variables, &st->variables);
+
+ /*
* Record transaction start time under logging, progress or
* throttling.
*/
@@ -2861,7 +3262,15 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
if (command == NULL)
{
- st->state = CSTATE_END_TX;
+ if (st->first_failure == NO_FAILURE)
+ {
+ st->state = CSTATE_END_TX;
+ }
+ else
+ {
+ /* check if we can retry the failure */
+ st->state = CSTATE_RETRY;
+ }
break;
}
@@ -2869,7 +3278,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Record statement start time if per-command latencies are
* requested
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -2880,7 +3289,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
{
if (!sendCommand(st, command))
{
- commandFailed(st, "SQL", "SQL command send failed");
+ commandFailed(st, "SQL", "SQL command send failed",
+ CLIENT_ABORTED);
st->state = CSTATE_ABORTED;
}
else
@@ -2892,7 +3302,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
i;
char **argv = command->argv;
- if (debug)
+ if (debug_level >= DEBUG_ALL)
{
fprintf(stderr, "client %d executing \\%s", st->id, argv[0]);
for (i = 1; i < argc; i++)
@@ -2900,6 +3310,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
fprintf(stderr, "\n");
}
+ /* will be changed below if the meta command fails */
+ failure_status = NO_FAILURE;
+
if (command->meta == META_SLEEP)
{
/*
@@ -2911,10 +3324,12 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
int usec;
- if (!evaluateSleep(st, argc, argv, &usec))
+ if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
- commandFailed(st, "sleep", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ failure_status = ANOTHER_FAILURE;
+ commandFailed(st, "sleep", "execution of meta-command failed",
+ getFailStatus(st, failure_status));
+ st->state = CSTATE_FAILURE;
break;
}
@@ -2942,35 +3357,35 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (!evaluateExpr(thread, st, expr, &result))
{
- commandFailed(st, argv[0], "evaluation of meta-command failed");
- st->state = CSTATE_ABORTED;
+ failure_status = ANOTHER_FAILURE;
+ commandFailed(st, argv[0], "evaluation of meta-command failed",
+ getFailStatus(st, failure_status));
+
+ /*
+ * Do not ruin the following conditional commands,
+ * if any.
+ */
+ executeCondition(st, false);
+
+ st->state = CSTATE_FAILURE;
break;
}
if (command->meta == META_SET)
{
- if (!putVariableValue(st, argv[0], argv[1], &result))
+ if (!putVariableValue(&st->variables, argv[0],
+ argv[1], &result, false))
{
- commandFailed(st, "set", "assignment of meta-command failed");
- st->state = CSTATE_ABORTED;
+ failure_status = ANOTHER_FAILURE;
+ commandFailed(st, "set", "assignment of meta-command failed",
+ getFailStatus(st, failure_status));
+ st->state = CSTATE_FAILURE;
break;
}
}
else /* if and elif evaluated cases */
{
- bool cond = valueTruth(&result);
-
- /* execute or not depending on evaluated condition */
- if (command->meta == META_IF)
- {
- conditional_stack_push(st->cstack, cond ? IFSTATE_TRUE : IFSTATE_FALSE);
- }
- else /* elif */
- {
- /* we should get here only if the "elif" needed evaluation */
- Assert(conditional_stack_peek(st->cstack) == IFSTATE_FALSE);
- conditional_stack_poke(st->cstack, cond ? IFSTATE_TRUE : IFSTATE_FALSE);
- }
+ executeCondition(st, valueTruth(&result));
}
}
else if (command->meta == META_ELSE)
@@ -2999,7 +3414,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (command->meta == META_SETSHELL)
{
- bool ret = runShellCommand(st, argv[1], argv + 2, argc - 2);
+ bool ret = runShellCommand(&st->variables,
+ argv[1], argv + 2,
+ argc - 2);
if (timer_exceeded) /* timeout */
{
@@ -3008,8 +3425,10 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (!ret) /* on error */
{
- commandFailed(st, "setshell", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ failure_status = ANOTHER_FAILURE;
+ commandFailed(st, "setshell", "execution of meta-command failed",
+ getFailStatus(st, failure_status));
+ st->state = CSTATE_FAILURE;
break;
}
else
@@ -3019,7 +3438,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (command->meta == META_SHELL)
{
- bool ret = runShellCommand(st, NULL, argv + 1, argc - 1);
+ bool ret = runShellCommand(&st->variables, NULL,
+ argv + 1, argc - 1);
if (timer_exceeded) /* timeout */
{
@@ -3028,8 +3448,10 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (!ret) /* on error */
{
- commandFailed(st, "shell", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ failure_status = ANOTHER_FAILURE;
+ commandFailed(st, "shell", "execution of meta-command failed",
+ getFailStatus(st, failure_status));
+ st->state = CSTATE_FAILURE;
break;
}
else
@@ -3134,37 +3556,54 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Wait for the current SQL command to complete
*/
case CSTATE_WAIT_RESULT:
- command = sql_script[st->use_file].commands[st->command];
- if (debug)
- fprintf(stderr, "client %d receiving\n", st->id);
- if (!PQconsumeInput(st->con))
- { /* there's something wrong */
- commandFailed(st, "SQL", "perhaps the backend died while processing");
- st->state = CSTATE_ABORTED;
- break;
- }
- if (PQisBusy(st->con))
- return; /* don't have the whole result yet */
-
- /*
- * Read and discard the query result;
- */
- res = PQgetResult(st->con);
- switch (PQresultStatus(res))
{
- case PGRES_COMMAND_OK:
- case PGRES_TUPLES_OK:
- case PGRES_EMPTY_QUERY:
- /* OK */
- PQclear(res);
- discard_response(st);
- st->state = CSTATE_END_COMMAND;
- break;
- default:
- commandFailed(st, "SQL", PQerrorMessage(st->con));
- PQclear(res);
+ char *sqlState;
+
+ command = sql_script[st->use_file].commands[st->command];
+ if (debug_level >= DEBUG_ALL)
+ fprintf(stderr, "client %d receiving\n", st->id);
+ if (!PQconsumeInput(st->con))
+ { /* there's something wrong */
+ commandFailed(st, "SQL", "perhaps the backend died while processing",
+ CLIENT_ABORTED);
st->state = CSTATE_ABORTED;
break;
+ }
+ if (PQisBusy(st->con))
+ return; /* don't have the whole result yet */
+
+ /*
+ * Read and discard the query result;
+ */
+ res = PQgetResult(st->con);
+ sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+ switch (PQresultStatus(res))
+ {
+ case PGRES_COMMAND_OK:
+ case PGRES_TUPLES_OK:
+ case PGRES_EMPTY_QUERY:
+ /* OK */
+ PQclear(res);
+ discard_response(st);
+ failure_status = NO_FAILURE;
+ st->state = CSTATE_END_COMMAND;
+ break;
+ case PGRES_NONFATAL_ERROR:
+ case PGRES_FATAL_ERROR:
+ failure_status = getFailureStatus(sqlState);
+ commandFailed(st, "SQL", PQerrorMessage(st->con),
+ getFailStatus(st, failure_status));
+ PQclear(res);
+ discard_response(st);
+ st->state = CSTATE_FAILURE;
+ break;
+ default:
+ commandFailed(st, "SQL", PQerrorMessage(st->con),
+ CLIENT_ABORTED);
+ PQclear(res);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
}
break;
@@ -3193,7 +3632,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* in thread-local data structure, if per-command latencies
* are requested.
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -3212,6 +3651,96 @@ doCustom(TState *thread, CState *st, StatsData *agg)
break;
/*
+ * Report the failure/error and go ahead with next command.
+ */
+ case CSTATE_FAILURE:
+ command = sql_script[st->use_file].commands[st->command];
+
+ Assert(failure_status != NO_FAILURE);
+
+ /*
+ * All subsequent failures are handled in the same way as the
+ * first one: retried if the first failure can be retried, and
+ * failed otherwise. Therefore only the first failure/error of
+ * the transaction is mentioned in the reports.
+ */
+ if (st->first_failure == NO_FAILURE)
+ {
+ st->first_failure = failure_status;
+
+ if (report_per_command)
+ {
+ if (canRetry(st, failure_status))
+ {
+ /*
+ * The failed transaction will be retried, so count
+ * the retry for this command.
+ */
+ command->retries++;
+ }
+ else
+ {
+ /*
+ * This failed transaction cannot be retried, so
+ * count the error for this command.
+ */
+ command->errors++;
+ if (failure_status == IN_FAILED_SQL_TRANSACTION)
+ command->errors_in_failed_tx++;
+ }
+ }
+ }
+
+ /* Go ahead with next command, to be executed or skipped */
+ st->command++;
+ st->state = conditional_active(st->cstack) ?
+ CSTATE_START_COMMAND : CSTATE_SKIP_COMMAND;
+ break;
+
+ /*
+ * Retry the failed transaction if possible.
+ */
+ case CSTATE_RETRY:
+ if (canRetry(st, st->first_failure))
+ {
+ st->retries++;
+
+ if (debug_level >= DEBUG_ALL)
+ {
+ fprintf(stderr, "client %d repeats the failed transaction (try %d/%d)\n",
+ st->id,
+ st->retries + 1,
+ max_tries);
+ }
+
+ /*
+ * Restore the execution parameters to what they were at the
+ * beginning of the transaction.
+ */
+ copyRandomState(&st->random_state,
+ &st->retry_state.random_state);
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* Process the first transaction command */
+ st->command = 0;
+ st->first_failure = NO_FAILURE;
+ st->state = CSTATE_START_COMMAND;
+ }
+ else
+ {
+ /*
+ * Restore the execution parameters to what they were at the
+ * beginning of the transaction, except for the random state.
+ */
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* End the failed transaction */
+ st->state = CSTATE_END_TX;
+ }
+ break;
+
+ /*
* End of transaction.
*/
case CSTATE_END_TX:
@@ -3232,7 +3761,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
INSTR_TIME_SET_ZERO(now);
}
- if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
+ if ((getTotalCnt(st) >= nxacts && duration <= 0) ||
+ timer_exceeded)
{
/* exit success */
st->state = CSTATE_FINISHED;
@@ -3292,7 +3822,7 @@ doLog(TState *thread, CState *st,
* to the random sample.
*/
if (sample_rate != 0.0 &&
- pg_erand48(thread->random_state) > sample_rate)
+ pg_erand48(thread->random_state.data) > sample_rate)
return;
/* should we aggregate the results or not? */
@@ -3308,13 +3838,15 @@ doLog(TState *thread, CState *st,
while (agg->start_time + agg_interval <= now)
{
/* print aggregated report to logfile */
- fprintf(logfile, "%ld " INT64_FORMAT " %.0f %.0f %.0f %.0f",
+ fprintf(logfile, "%ld " INT64_FORMAT " %.0f %.0f %.0f %.0f " INT64_FORMAT " " INT64_FORMAT,
(long) agg->start_time,
agg->cnt,
agg->latency.sum,
agg->latency.sum2,
agg->latency.min,
- agg->latency.max);
+ agg->latency.max,
+ agg->errors,
+ agg->errors_in_failed_tx);
if (throttle_delay)
{
fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -3325,6 +3857,10 @@ doLog(TState *thread, CState *st,
if (latency_limit)
fprintf(logfile, " " INT64_FORMAT, agg->skipped);
}
+ if (max_tries > 1)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->retried,
+ agg->retries);
fputc('\n', logfile);
/* reset data and move to next interval */
@@ -3332,7 +3868,7 @@ doLog(TState *thread, CState *st,
}
/* accumulate the current transaction */
- accumStats(agg, skipped, latency, lag);
+ accumStats(agg, skipped, latency, lag, st->first_failure, st->retries);
}
else
{
@@ -3342,14 +3878,25 @@ doLog(TState *thread, CState *st,
gettimeofday(&tv, NULL);
if (skipped)
fprintf(logfile, "%d " INT64_FORMAT " skipped %d %ld %ld",
- st->id, st->cnt, st->use_file,
+ st->id, getTotalCnt(st), st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
- else
+ else if (st->first_failure == NO_FAILURE)
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d %ld %ld",
- st->id, st->cnt, latency, st->use_file,
+ st->id, getTotalCnt(st), latency, st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+ else if (st->first_failure == IN_FAILED_SQL_TRANSACTION)
+ fprintf(logfile, "%d " INT64_FORMAT " in_failed_tx %d %ld %ld",
+ st->id, getTotalCnt(st), st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+ else
+ fprintf(logfile, "%d " INT64_FORMAT " failed %d %ld %ld",
+ st->id, getTotalCnt(st), st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
+
if (throttle_delay)
fprintf(logfile, " %.0f", lag);
+ if (max_tries > 1)
+ fprintf(logfile, " %d", st->retries);
fputc('\n', logfile);
}
}
@@ -3369,7 +3916,7 @@ processXactStats(TState *thread, CState *st, instr_time *now,
bool thread_details = progress || throttle_delay || latency_limit,
detailed = thread_details || use_log || per_script_stats;
- if (detailed && !skipped)
+ if (detailed && !skipped && st->first_failure == NO_FAILURE)
{
if (INSTR_TIME_IS_ZERO(*now))
INSTR_TIME_SET_CURRENT(*now);
@@ -3382,7 +3929,8 @@ processXactStats(TState *thread, CState *st, instr_time *now,
if (thread_details)
{
/* keep detailed thread stats */
- accumStats(&thread->stats, skipped, latency, lag);
+ accumStats(&thread->stats, skipped, latency, lag, st->first_failure,
+ st->retries);
/* count transactions over the latency limit, if needed */
if (latency_limit && latency > latency_limit)
@@ -3390,19 +3938,24 @@ processXactStats(TState *thread, CState *st, instr_time *now,
}
else
{
- /* no detailed stats, just count */
- thread->stats.cnt++;
+ /* no detailed stats */
+ accumStats(&thread->stats, skipped, 0, 0, st->first_failure,
+ st->retries);
}
/* client stat is just counting */
- st->cnt++;
+ if (st->first_failure == NO_FAILURE)
+ st->cnt++;
+ else
+ st->ecnt++;
if (use_log)
doLog(thread, st, agg, skipped, latency, lag);
/* XXX could use a mutex here, but we choose not to */
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+ st->first_failure, st->retries);
}
@@ -4535,7 +5088,8 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
double time_include,
tps_include,
tps_exclude;
- int64 ntx = total->cnt - total->skipped;
+ int64 ntx = total->cnt - total->skipped,
+ total_ntx = total->cnt + total->errors;
int i,
totalCacheOverflows = 0;
@@ -4556,8 +5110,8 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (duration <= 0)
{
printf("number of transactions per client: %d\n", nxacts);
- printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
- ntx, nxacts * nclients);
+ printf("number of transactions actually processed: " INT64_FORMAT "/" INT64_FORMAT "\n",
+ ntx, total_ntx);
}
else
{
@@ -4565,6 +5119,25 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printf("number of transactions actually processed: " INT64_FORMAT "\n",
ntx);
}
+
+ if (total->errors > 0)
+ printf("number of errors: " INT64_FORMAT " (%.3f%%)\n",
+ total->errors, 100.0 * total->errors / total_ntx);
+
+ if (total->errors_in_failed_tx > 0)
+ printf("number of errors \"in failed SQL transaction\": " INT64_FORMAT " (%.3f%%)\n",
+ total->errors_in_failed_tx,
+ 100.0 * total->errors_in_failed_tx / total_ntx);
+
+ /* it can be non-zero only if max_tries is greater than one */
+ if (total->retried > 0)
+ {
+ printf("number of retried: " INT64_FORMAT " (%.3f%%)\n",
+ total->retried, 100.0 * total->retried / total_ntx);
+ printf("number of retries: " INT64_FORMAT "\n", total->retries);
+ }
+ printf("maximum number of tries: %d\n", max_tries);
+
/* Report zipfian cache overflow */
for (i = 0; i < nthreads; i++)
{
@@ -4614,7 +5187,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printf("tps = %f (excluding connections establishing)\n", tps_exclude);
/* Report per-script/command statistics */
- if (per_script_stats || is_latencies)
+ if (per_script_stats || report_per_command)
{
int i;
@@ -4623,6 +5196,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (per_script_stats)
{
StatsData *sstats = &sql_script[i].stats;
+ int64 script_total_ntx = sstats->cnt + sstats->errors;
printf("SQL script %d: %s\n"
" - weight: %d (targets %.1f%% of total)\n"
@@ -4631,9 +5205,30 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
sql_script[i].weight,
100.0 * sql_script[i].weight / total_weight,
sstats->cnt,
- 100.0 * sstats->cnt / total->cnt,
+ 100.0 * sstats->cnt / script_total_ntx,
(sstats->cnt - sstats->skipped) / time_include);
+ if (total->errors > 0)
+ printf(" - number of errors: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->errors,
+ 100.0 * sstats->errors / script_total_ntx);
+
+ if (total->errors_in_failed_tx > 0)
+ printf(" - number of errors \"in failed SQL transaction\": " INT64_FORMAT " (%.3f%%)\n",
+ sstats->errors_in_failed_tx,
+ (100.0 * sstats->errors_in_failed_tx /
+ script_total_ntx));
+
+ /* it can be non-zero only if max_tries is greater than one */
+ if (total->retried > 0)
+ {
+ printf(" - number of retried: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->retried,
+ 100.0 * sstats->retried / script_total_ntx);
+ printf(" - number of retries: " INT64_FORMAT "\n",
+ sstats->retries);
+ }
+
if (throttle_delay && latency_limit && sstats->cnt > 0)
printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
sstats->skipped,
@@ -4642,15 +5237,33 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printSimpleStats(" - latency", &sstats->latency);
}
- /* Report per-command latencies */
- if (is_latencies)
+ /* Report per-command latencies and errors */
+ if (report_per_command)
{
Command **commands;
if (per_script_stats)
- printf(" - statement latencies in milliseconds:\n");
+ printf(" - statement latencies in milliseconds");
else
- printf("statement latencies in milliseconds:\n");
+ printf("statement latencies in milliseconds");
+
+ if (total->errors > 0)
+ {
+ printf("%s errors",
+ ((total->errors_in_failed_tx == 0 &&
+ total->retried == 0) ?
+ " and" : ","));
+ }
+ if (total->errors_in_failed_tx > 0)
+ {
+ printf("%s errors \"in failed SQL transaction\"",
+ total->retried == 0 ? " and" : ",");
+ }
+ if (total->retried > 0)
+ {
+ printf(" and retries");
+ }
+ printf(":\n");
for (commands = sql_script[i].commands;
*commands != NULL;
@@ -4658,10 +5271,25 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
{
SimpleStats *cstats = &(*commands)->stats;
- printf(" %11.3f %s\n",
+ printf(" %11.3f",
(cstats->count > 0) ?
- 1000.0 * cstats->sum / cstats->count : 0.0,
- (*commands)->line);
+ 1000.0 * cstats->sum / cstats->count : 0.0);
+ if (total->errors > 0)
+ {
+ printf(" %20" INT64_MODIFIER "d",
+ (*commands)->errors);
+ }
+ if (total->errors_in_failed_tx > 0)
+ {
+ printf(" %20" INT64_MODIFIER "d",
+ (*commands)->errors_in_failed_tx);
+ }
+ if (total->retried > 0)
+ {
+ printf(" %20" INT64_MODIFIER "d",
+ (*commands)->retries);
+ }
+ printf(" %s\n", (*commands)->line);
}
}
}
@@ -4720,7 +5348,7 @@ main(int argc, char **argv)
{"builtin", required_argument, NULL, 'b'},
{"client", required_argument, NULL, 'c'},
{"connect", no_argument, NULL, 'C'},
- {"debug", no_argument, NULL, 'd'},
+ {"debug", required_argument, NULL, 'd'},
{"define", required_argument, NULL, 'D'},
{"file", required_argument, NULL, 'f'},
{"fillfactor", required_argument, NULL, 'F'},
@@ -4735,7 +5363,7 @@ main(int argc, char **argv)
{"progress", required_argument, NULL, 'P'},
{"protocol", required_argument, NULL, 'M'},
{"quiet", no_argument, NULL, 'q'},
- {"report-latencies", no_argument, NULL, 'r'},
+ {"report-per-command", no_argument, NULL, 'r'},
{"rate", required_argument, NULL, 'R'},
{"scale", required_argument, NULL, 's'},
{"select-only", no_argument, NULL, 'S'},
@@ -4754,6 +5382,7 @@ main(int argc, char **argv)
{"log-prefix", required_argument, NULL, 7},
{"foreign-keys", no_argument, NULL, 8},
{"random-seed", required_argument, NULL, 9},
+ {"max-tries", required_argument, NULL, 10},
{NULL, 0, NULL, 0}
};
@@ -4825,7 +5454,7 @@ main(int argc, char **argv)
/* set random seed early, because it may be used while parsing scripts. */
set_random_seed(getenv("PGBENCH_RANDOM_SEED"), "PGBENCH_RANDOM_SEED environment variable");
- while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
+ while ((c = getopt_long(argc, argv, "iI:h:nvp:d:qb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
{
char *script;
@@ -4855,8 +5484,22 @@ main(int argc, char **argv)
pgport = pg_strdup(optarg);
break;
case 'd':
- debug++;
- break;
+ {
+ for (debug_level = 0;
+ debug_level < NUM_DEBUGLEVEL;
+ debug_level++)
+ {
+ if (strcmp(optarg, DEBUGLEVEL[debug_level]) == 0)
+ break;
+ }
+ if (debug_level >= NUM_DEBUGLEVEL)
+ {
+ fprintf(stderr, "invalid debug level (-d): \"%s\"\n",
+ optarg);
+ exit(1);
+ }
+ break;
+ }
case 'c':
benchmarking_option_set = true;
nclients = atoi(optarg);
@@ -4908,7 +5551,7 @@ main(int argc, char **argv)
break;
case 'r':
benchmarking_option_set = true;
- is_latencies = true;
+ report_per_command = true;
break;
case 's':
scale_given = true;
@@ -4989,7 +5632,7 @@ main(int argc, char **argv)
}
*p++ = '\0';
- if (!putVariable(&state[0], "option", optarg, p))
+ if (!putVariable(&state[0].variables, "option", optarg, p))
exit(1);
}
break;
@@ -5101,6 +5744,16 @@ main(int argc, char **argv)
benchmarking_option_set = true;
set_random_seed(optarg, "--random-seed option");
break;
+ case 10: /* max-tries */
+ benchmarking_option_set = true;
+ max_tries = atoi(optarg);
+ if (max_tries <= 0)
+ {
+ fprintf(stderr, "invalid maximum number of tries: \"%s\"\n",
+ optarg);
+ exit(1);
+ }
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
@@ -5287,19 +5940,19 @@ main(int argc, char **argv)
int j;
state[i].id = i;
- for (j = 0; j < state[0].nvariables; j++)
+ for (j = 0; j < state[0].variables.nvariables; j++)
{
- Variable *var = &state[0].variables[j];
+ Variable *var = &state[0].variables.array[j];
if (var->value.type != PGBT_NO_VALUE)
{
- if (!putVariableValue(&state[i], "startup",
- var->name, &var->value))
+ if (!putVariableValue(&state[i].variables, "startup",
+ var->name, &var->value, true))
exit(1);
}
else
{
- if (!putVariable(&state[i], "startup",
+ if (!putVariable(&state[i].variables, "startup",
var->name, var->svalue))
exit(1);
}
@@ -5311,9 +5964,10 @@ main(int argc, char **argv)
for (i = 0; i < nclients; i++)
{
state[i].cstack = conditional_stack_create();
+ initRandomState(&state[i].random_state);
}
- if (debug)
+ if (debug_level >= DEBUG_ALL)
{
if (duration <= 0)
printf("pghost: %s pgport: %s nclients: %d nxacts: %d dbName: %s\n",
@@ -5374,11 +6028,12 @@ main(int argc, char **argv)
* :scale variables normally get -s or database scale, but don't override
* an explicit -D switch
*/
- if (lookupVariable(&state[0], "scale") == NULL)
+ if (lookupVariable(&state[0].variables, "scale") == NULL)
{
for (i = 0; i < nclients; i++)
{
- if (!putVariableInt(&state[i], "startup", "scale", scale))
+ if (!putVariableInt(&state[i].variables, "startup", "scale", scale,
+ true))
exit(1);
}
}
@@ -5387,15 +6042,18 @@ main(int argc, char **argv)
* Define a :client_id variable that is unique per connection. But don't
* override an explicit -D switch.
*/
- if (lookupVariable(&state[0], "client_id") == NULL)
+ if (lookupVariable(&state[0].variables, "client_id") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "client_id", i))
+ {
+ if (!putVariableInt(&state[i].variables, "startup", "client_id", i,
+ true))
exit(1);
+ }
}
/* set default seed for hash functions */
- if (lookupVariable(&state[0], "default_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "default_seed") == NULL)
{
uint64 seed = ((uint64) (random() & 0xFFFF) << 48) |
((uint64) (random() & 0xFFFF) << 32) |
@@ -5403,15 +6061,17 @@ main(int argc, char **argv)
(uint64) (random() & 0xFFFF);
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "default_seed", (int64) seed))
+ if (!putVariableInt(&state[i].variables, "startup", "default_seed",
+ (int64) seed, true))
exit(1);
}
/* set random seed unless overwritten */
- if (lookupVariable(&state[0], "random_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "random_seed") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+ if (!putVariableInt(&state[i].variables, "startup", "random_seed",
+ random_seed, true))
exit(1);
}
@@ -5444,9 +6104,7 @@ main(int argc, char **argv)
thread->state = &state[nclients_dealt];
thread->nstate =
(nclients - nclients_dealt + nthreads - i - 1) / (nthreads - i);
- thread->random_state[0] = random();
- thread->random_state[1] = random();
- thread->random_state[2] = random();
+ initRandomState(&thread->random_state);
thread->logfile = NULL; /* filled in later */
thread->latency_late = 0;
thread->zipf_cache.nb_cells = 0;
@@ -5528,6 +6186,10 @@ main(int argc, char **argv)
mergeSimpleStats(&stats.lag, &thread->stats.lag);
stats.cnt += thread->stats.cnt;
stats.skipped += thread->stats.skipped;
+ stats.retries += thread->stats.retries;
+ stats.retried += thread->stats.retried;
+ stats.errors += thread->stats.errors;
+ stats.errors_in_failed_tx += thread->stats.errors_in_failed_tx;
latency_late += thread->latency_late;
INSTR_TIME_ADD(conn_total_time, thread->conn_time);
}
@@ -5812,7 +6474,11 @@ threadRun(void *arg)
/* generate and show report */
StatsData cur;
int64 run = now - last_report,
- ntx;
+ ntx,
+ retries,
+ retried,
+ errors,
+ errors_in_failed_tx;
double tps,
total_run,
latency,
@@ -5839,6 +6505,11 @@ threadRun(void *arg)
mergeSimpleStats(&cur.lag, &thread[i].stats.lag);
cur.cnt += thread[i].stats.cnt;
cur.skipped += thread[i].stats.skipped;
+ cur.retries += thread[i].stats.retries;
+ cur.retried += thread[i].stats.retried;
+ cur.errors += thread[i].stats.errors;
+ cur.errors_in_failed_tx +=
+ thread[i].stats.errors_in_failed_tx;
}
/* we count only actually executed transactions */
@@ -5856,6 +6527,11 @@ threadRun(void *arg)
{
latency = sqlat = stdev = lag = 0;
}
+ retries = cur.retries - last.retries;
+ retried = cur.retried - last.retried;
+ errors = cur.errors - last.errors;
+ errors_in_failed_tx = cur.errors_in_failed_tx -
+ last.errors_in_failed_tx;
if (progress_timestamp)
{
@@ -5881,6 +6557,14 @@ threadRun(void *arg)
"progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
tbuf, tps, latency, stdev);
+ if (errors > 0)
+ {
+ fprintf(stderr, ", " INT64_FORMAT " failed", errors);
+ if (errors_in_failed_tx > 0)
+ fprintf(stderr, " (" INT64_FORMAT " in failed tx)",
+ errors_in_failed_tx);
+ }
+
if (throttle_delay)
{
fprintf(stderr, ", lag %.3f ms", lag);
@@ -5888,6 +6572,11 @@ threadRun(void *arg)
fprintf(stderr, ", " INT64_FORMAT " skipped",
cur.skipped - last.skipped);
}
+
+ /* it can be non-zero only if max_tries is greater than one */
+ if (retried > 0)
+ fprintf(stderr, ", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
+ retried, retries);
fprintf(stderr, "\n");
last = cur;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index be08b20..4d46496 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -118,7 +118,8 @@ pgbench(
[ qr{builtin: TPC-B},
qr{clients: 2\b},
qr{processed: 10/10},
- qr{mode: simple} ],
+ qr{mode: simple},
+ qr{maximum number of tries: 1} ],
[qr{^$}],
'pgbench tpcb-like');
@@ -134,7 +135,7 @@ pgbench(
'pgbench simple update');
pgbench(
- '-t 100 -c 7 -M prepared -b se --debug',
+ '-t 100 -c 7 -M prepared -b se --debug all',
0,
[ qr{builtin: select only},
qr{clients: 7\b},
@@ -491,6 +492,10 @@ my @errors = (
\set i 0
SELECT LEAST(:i, :i, :i, :i, :i, :i, :i, :i, :i, :i, :i);
} ],
+ [ 'sql division by zero', 0, [qr{ERROR: division by zero}],
+ q{-- SQL division by zero
+ SELECT 1 / 0;
+} ],
# SHELL
[ 'shell bad command', 0,
@@ -621,6 +626,16 @@ SELECT LEAST(:i, :i, :i, :i, :i, :i, :i, :i, :i, :i, :i);
[ 'sleep unknown unit', 1,
[qr{unrecognized time unit}], q{\sleep 1 week} ],
+ # CONDITIONAL BLOCKS
+ [ 'if elif failed conditions', 0,
+ [qr{division by zero}],
+ q{-- failed conditions
+\if 1 / 0
+\elif 1 / 0
+\else
+\endif
+} ],
+
# MISC
[ 'misc invalid backslash command', 1,
[qr{invalid command .* "nosuchcommand"}], q{\nosuchcommand} ],
@@ -635,14 +650,30 @@ for my $e (@errors)
my $n = '001_pgbench_error_' . $name;
$n =~ s/ /_/g;
pgbench(
- '-n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX -M prepared',
+ '-n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX -M prepared -d fails',
$status,
- [ $status ? qr{^$} : qr{processed: 0/1} ],
+ ($status ?
+ [ qr{^$} ] :
+ [ qr{processed: 0/1}, qr{number of errors: 1 \(100.000%\)} ]),
$re,
'pgbench script error: ' . $name,
{ $n => $script });
}
+# reset client variables in case of failure
+pgbench(
+ '-n -t 2 -d fails', 0,
+ [ qr{processed: 0/2}, qr{number of errors: 2 \(100.000%\)} ],
+ [ qr{(client 0 got an error in command 1 \(SQL\) of script 0; ERROR: syntax error at or near ":"(.|\n)*){2}} ],
+ 'pgbench reset client variables in case of failure',
+ { '001_pgbench_reset_client_variables' => q{
+BEGIN;
+-- select an unassigned variable
+SELECT :unassigned_var;
+\set unassigned_var 1
+END;
+} });
+
# zipfian cache array overflow
pgbench(
'-t 1', 0,
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 682bc22..a0c227e 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -57,7 +57,7 @@ my @options = (
# name, options, stderr checks
[ 'bad option',
- '-h home -p 5432 -U calvin -d --bad-option',
+ '-h home -p 5432 -U calvin -d all --bad-option',
[ qr{(unrecognized|illegal) option}, qr{--help.*more information} ] ],
[ 'no file',
'-f no-such-file',
diff --git a/src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl b/src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl
new file mode 100644
index 0000000..da9d7da
--- /dev/null
+++ b/src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl
@@ -0,0 +1,815 @@
+use strict;
+use warnings;
+
+use Config;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 21;
+
+use constant
+{
+ READ_COMMITTED => 0,
+ REPEATABLE_READ => 1,
+ SERIALIZABLE => 2,
+};
+
+my @isolation_level_shell = (
+ 'read\\ committed',
+ 'repeatable\\ read',
+ 'serializable');
+
+# The keys of advisory locks for testing deadlock failures:
+use constant
+{
+ DEADLOCK_1 => 3,
+ WAIT_PGBENCH_2 => 4,
+ DEADLOCK_2 => 5,
+ TRANSACTION_ENDS_1 => 6,
+ TRANSACTION_ENDS_2 => 7,
+};
+
+# Test concurrent update of a table row.
+my $node = get_new_node('main');
+$node->init;
+$node->start;
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2), (2, 3);');
+
+my $script_serialization = $node->basedir . '/pgbench_script_serialization';
+append_to_file($script_serialization,
+ "\\set delta random(-5000, 5000)\n"
+ . "BEGIN;\n"
+ . "UPDATE xy SET y = y + :delta "
+ . "WHERE x = 1 AND pg_advisory_lock(0) IS NOT NULL;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "END;\n");
+
+my $script_deadlocks1 = $node->basedir . '/pgbench_script_deadlocks1';
+append_to_file($script_deadlocks1,
+ "BEGIN;\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_1 . ");\n"
+ . "SELECT pg_advisory_lock(" . WAIT_PGBENCH_2 . ");\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_2 . ");\n"
+ . "END;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "SELECT pg_advisory_lock(" . TRANSACTION_ENDS_1 . ");\n"
+ . "SELECT pg_advisory_unlock_all();");
+
+my $script_deadlocks2 = $node->basedir . '/pgbench_script_deadlocks2';
+append_to_file($script_deadlocks2,
+ "BEGIN;\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_2 . ");\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_1 . ");\n"
+ . "END;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "SELECT pg_advisory_lock(" . TRANSACTION_ENDS_2 . ");\n"
+ . "SELECT pg_advisory_unlock_all();");
+
+my $script_commit_failure = $node->basedir . '/pgbench_script_commit_failure';
+append_to_file($script_commit_failure,
+ "\\set delta random(-5000, 5000)\n"
+ . "BEGIN;\n"
+ . "UPDATE xy SET y = y + :delta WHERE x = 1;\n"
+ . "SELECT pg_advisory_lock(0);\n"
+ . "END;\n"
+ . "SELECT pg_advisory_unlock_all();");
+
+sub test_pgbench_serialization_errors
+{
+ my $isolation_level = REPEATABLE_READ;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $err_pgbench);
+
+ # Open a psql session, run a parallel transaction and acquire an advisory
+ # lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql = "begin;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /BEGIN/;
+
+ $in_psql =
+ "update xy set y = y + 1 "
+ . "where x = 1 and pg_advisory_lock(0) is not null;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /UPDATE 1/;
+
+ # Start pgbench:
+ my @command = (
+ qw(pgbench --no-vacuum --transactions 1 --debug fails --file),
+ $script_serialization);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$err_pgbench;
+
+ # Wait until pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select * from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = 0::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /1 row/);
+
+ # In psql, commit the transaction, release advisory locks and end the
+ # session:
+ $in_psql = "end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "select pg_advisory_unlock_all();\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_results on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+
+ like($out_pgbench,
+ qr{processed: 0/1},
+ "concurrent update: check processed transactions");
+
+ my $pattern =
+ "client 0 got an error in command 2 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update";
+
+ like($err_pgbench,
+ qr{$pattern},
+ "concurrent update: check serialization error");
+}
+
+sub test_pgbench_serialization_failures
+{
+ my $isolation_level = REPEATABLE_READ;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $err_pgbench);
+
+ # Open a psql session, run a parallel transaction and acquire an advisory
+ # lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql = "begin;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /BEGIN/;
+
+ $in_psql =
+ "update xy set y = y + 1 "
+ . "where x = 1 and pg_advisory_lock(0) is not null;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /UPDATE 1/;
+
+ # Start pgbench:
+ my @command = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --max-tries 2),
+ "--file",
+ $script_serialization);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$err_pgbench;
+
+ # Wait until pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select * from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = 0::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /1 row/);
+
+ # In psql, commit the transaction, release advisory locks and end the
+ # session:
+ $in_psql = "end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "select pg_advisory_unlock_all();\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_results on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+
+ like($out_pgbench,
+ qr{processed: 1/1},
+ "concurrent update with retrying: check processed transactions");
+
+ like($out_pgbench,
+ qr{^((?!number of errors)(.|\n))*$},
+ "concurrent update with retrying: check errors");
+
+ my $pattern =
+ "client 0 sending BEGIN;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 sending UPDATE xy SET y = y \\+ (-?\\d+) "
+ . "WHERE x = 1 AND pg_advisory_lock\\(0\\) IS NOT NULL;\n"
+ . "\\g1+"
+ . "client 0 got a failure \\(try 1/2\\) in command 2 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update\n\n"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 continues a failed transaction in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: current transaction is aborted, commands ignored until end of transaction block\n\n"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 executing \\\\set delta\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g1+"
+ . "client 0 sending UPDATE xy SET y = y \\+ \\g2 "
+ . "WHERE x = 1 AND pg_advisory_lock\\(0\\) IS NOT NULL;";
+
+ like($err_pgbench,
+ qr{$pattern},
+ "concurrent update with retrying: check the retried transaction");
+}
+
+sub test_pgbench_deadlock_errors
+{
+ my $isolation_level = READ_COMMITTED;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h1, $in1, $out1, $err1);
+ my ($h2, $in2, $out2, $err2);
+
+ # Open a psql session and acquire an advisory lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql =
+ "select pg_advisory_lock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_lock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ # Run the first pgbench:
+ my @command1 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --file),
+ $script_deadlocks1);
+ print "# Running: " . join(" ", @command1) . "\n";
+ $h1 = IPC::Run::start \@command1, \$in1, \$out1, \$err1;
+
+ # Wait until the first pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . WAIT_PGBENCH_2 . "_zero' "
+ . "else '" . WAIT_PGBENCH_2 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . WAIT_PGBENCH_2
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ WAIT_PGBENCH_2 ]}_not_zero/);
+
+ # Run the second pgbench:
+ my @command2 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --file),
+ $script_deadlocks2);
+ print "# Running: " . join(" ", @command2) . "\n";
+ $h2 = IPC::Run::start \@command2, \$in2, \$out2, \$err2;
+
+ # Wait until the second pgbench tries to acquire the lock held by the first
+ # pgbench:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . DEADLOCK_1 . "_zero' "
+ . "else '" . DEADLOCK_1 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . DEADLOCK_1
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ DEADLOCK_1 ]}_not_zero/);
+
+ # In the psql session, release the lock that the first pgbench is waiting
+ # for and end the session:
+ $in_psql =
+ "select pg_advisory_unlock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_unlock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get results from all pgbenches:
+ $h1->pump() until length $out1;
+ $h1->finish();
+
+ $h2->pump() until length $out2;
+ $h2->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_results on Windows instead.
+ my $result1 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h1->full_results)[0]
+ : $h1->result(0);
+
+ my $result2 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h2->full_results)[0]
+ : $h2->result(0);
+
+ # Check all pgbench results
+ ok(!$result1, "@command1 exit code 0");
+ ok(!$result2, "@command2 exit code 0");
+
+ # The first or second pgbench should get a deadlock error
+ ok(($out1 =~ /processed: 0\/1/ or $out2 =~ /processed: 0\/1/),
+ "concurrent deadlock update: check processed transactions");
+
+ ok(
+ ($err1 =~ /client 0 got an error in command 3 \(SQL\) of script 0; ERROR: deadlock detected/ or
+ $err2 =~ /client 0 got an error in command 2 \(SQL\) of script 0; ERROR: deadlock detected/),
+ "concurrent deadlock update: check deadlock error");
+}
+
+sub test_pgbench_deadlock_failures
+{
+ my $isolation_level = READ_COMMITTED;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h1, $in1, $out1, $err1);
+ my ($h2, $in2, $out2, $err2);
+
+ # Open a psql session and acquire an advisory lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql =
+ "select pg_advisory_lock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_lock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ # Run the first pgbench:
+ my @command1 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --max-tries 2),
+ "--file",
+ $script_deadlocks1);
+ print "# Running: " . join(" ", @command1) . "\n";
+ $h1 = IPC::Run::start \@command1, \$in1, \$out1, \$err1;
+
+ # Wait until the first pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . WAIT_PGBENCH_2 . "_zero' "
+ . "else '" . WAIT_PGBENCH_2 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . WAIT_PGBENCH_2
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ WAIT_PGBENCH_2 ]}_not_zero/);
+
+ # Run the second pgbench:
+ my @command2 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --max-tries 2),
+ "--file",
+ $script_deadlocks2);
+ print "# Running: " . join(" ", @command2) . "\n";
+ $h2 = IPC::Run::start \@command2, \$in2, \$out2, \$err2;
+
+ # Wait until the second pgbench tries to acquire the lock held by the first
+ # pgbench:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . DEADLOCK_1 . "_zero' "
+ . "else '" . DEADLOCK_1 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . DEADLOCK_1
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ DEADLOCK_1 ]}_not_zero/);
+
+ # In the psql session, acquire the locks that pgbenches will wait for:
+ $in_psql =
+ "select pg_advisory_lock(" . TRANSACTION_ENDS_1 . ") "
+ . "as pg_advisory_lock_" . TRANSACTION_ENDS_1 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ TRANSACTION_ENDS_1 ]}/;
+
+ $in_psql =
+ "select pg_advisory_lock(" . TRANSACTION_ENDS_2 . ") "
+ . "as pg_advisory_lock_" . TRANSACTION_ENDS_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ TRANSACTION_ENDS_2 ]}/;
+
+ # In the psql session, release the lock that the first pgbench is waiting
+ # for:
+ $in_psql =
+ "select pg_advisory_unlock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_unlock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ # Wait until pgbenches try to acquire the locks held by the psql session:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . TRANSACTION_ENDS_1 . "_zero' "
+ . "else '" . TRANSACTION_ENDS_1 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . TRANSACTION_ENDS_1
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ TRANSACTION_ENDS_1 ]}_not_zero/);
+
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . TRANSACTION_ENDS_2 . "_zero' "
+ . "else '" . TRANSACTION_ENDS_2 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . TRANSACTION_ENDS_2
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ TRANSACTION_ENDS_2 ]}_not_zero/);
+
+ # In the psql session, release advisory locks and end the session:
+ $in_psql = "select pg_advisory_unlock_all() as pg_advisory_unlock_all;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get results from all pgbenches:
+ $h1->pump() until length $out1;
+ $h1->finish();
+
+ $h2->pump() until length $out2;
+ $h2->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_results on Windows instead.
+ my $result1 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h1->full_results)[0]
+ : $h1->result(0);
+
+ my $result2 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h2->full_results)[0]
+ : $h2->result(0);
+
+ # Check all pgbench results
+ ok(!$result1, "@command1 exit code 0");
+ ok(!$result2, "@command2 exit code 0");
+
+ like($out1,
+ qr{processed: 1/1},
+ "concurrent deadlock update with retrying: pgbench 1: "
+ . "check processed transactions");
+ like($out2,
+ qr{processed: 1/1},
+ "concurrent deadlock update with retrying: pgbench 2: "
+ . "check processed transactions");
+
+ # The first or second pgbench should get a deadlock error that was retried:
+ like($out1 . $out2,
+ qr{^((?!number of errors)(.|\n))*$},
+ "concurrent deadlock update with retrying: check errors");
+
+ my $pattern1 =
+ "client 0 sending BEGIN;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . WAIT_PGBENCH_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 got a failure \\(try 1/2\\) in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\n"
+ . "((?!client 0)(.|\n))*"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . WAIT_PGBENCH_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+";
+
+ my $pattern2 =
+ "client 0 sending BEGIN;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 got a failure \\(try 1/2\\) in command 2 \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\n"
+ . "((?!client 0)(.|\n))*"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+";
+
+ ok(($err1 =~ /$pattern1/ or $err2 =~ /$pattern2/),
+ "concurrent deadlock update with retrying: "
+ . "check the retried transaction");
+}
+
+sub test_pgbench_commit_failure
+{
+ my $isolation_level = SERIALIZABLE;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $err_pgbench);
+
+ # Open a psql session and acquire an advisory lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql = "select pg_advisory_lock(0);\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock/;
+
+ # Start pgbench:
+ my @command = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --max-tries 2),
+ "--file",
+ $script_commit_failure);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$err_pgbench;
+
+ # Wait until pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select case count(*) when 0 then 'zero' else 'not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = 0::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /not_zero/);
+
+ # In psql, run a parallel transaction, release advisory locks and end the
+ # session:
+
+ $in_psql = "begin\;update xy set y = y + 1 where x = 2\;end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "select pg_advisory_unlock_all();\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_results on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+
+ like($out_pgbench,
+ qr{processed: 1/1},
+ "commit failure: check processed transactions");
+
+ like($out_pgbench,
+ qr{^((?!number of errors)(.|\n))*$},
+ "commit failure: check errors");
+
+ my $pattern =
+ "client 0 sending BEGIN;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 sending UPDATE xy SET y = y \\+ (-?\\d+) WHERE x = 1;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(0\\);\n"
+ . "\\g1+"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 got a failure \\(try 1/2\\) in command 4 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to read/write dependencies among transactions\n"
+ . "DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.\n"
+ . "((?!client 0)(.|\n))*"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 executing \\\\set delta\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g1+"
+ . "client 0 sending UPDATE xy SET y = y \\+ \\g2 WHERE x = 1;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(0\\);\n"
+ . "\\g1+"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+";
+
+ like($err_pgbench,
+ qr{$pattern},
+ "commit failure: "
+ . "check the completion of the failed transaction block");
+}
+
+test_pgbench_serialization_errors();
+test_pgbench_serialization_failures();
+
+test_pgbench_deadlock_errors();
+test_pgbench_deadlock_failures();
+
+test_pgbench_commit_failure();
+
+#done
+$node->stop;
--
2.7.4
Hello, hackers!
Here is the seventh version of the patch for error handling and
retrying of transactions with serialization/deadlock failures in pgbench
(based on the commit a08dc711952081d63577fc182fcf955958f70add). I added
the option --max-tries-time, which is an implementation of Fabien Coelho's
proposal in [1]: a transaction with a serialization or deadlock failure
can be retried if the total time of all its tries is less than this
limit (in ms). This option can be combined with the option --max-tries.
But if neither of them is used, failed transactions are not retried at
all.
Also:
* Now when the first failure occurs in a transaction, it is always
reported as a failure, since only after the remaining commands of the
transaction are executed do we find out whether it can be retried or not.
Therefore the messages about retrying or ending the failed transaction
are added to the "fails" debugging level, so you can distinguish failures
(which are retried) from errors (which are not retried).
* Fixed the report of the latency average, because the total time
includes the time of both errors and successful transactions.
* Code cleanup (including tests).
[1]: /messages/by-id/alpine.DEB.2.20.1803292134380.16472@lancre
Maybe the max retry should rather be expressed in time rather than
number
of attempts, or both approach could be implemented?
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v7-0001-Pgbench-errors-and-serialization-deadlock-retries.patch (text/x-diff)
From 594edfbbcacc332172db491998117cd1e7781770 Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Wed, 4 Apr 2018 16:00:46 +0300
Subject: [PATCH v7] Pgbench errors and serialization/deadlock retries
A client's run is aborted only in case of a serious error, for example, if the
connection with the backend was lost. Otherwise, if the execution of an SQL or
meta command fails, the client's run continues normally until the end of the
current script execution (it is assumed that one transaction script contains
only one transaction).
Transactions with serialization or deadlock failures are rolled back and
repeated until they complete successfully or reach the maximum number of tries
(specified by the --max-tries option) or the maximum time of tries (specified
by the --max-tries-time option). These options can be combined; but if neither
of them is used, failed transactions are not retried at all. If the last
transaction run fails, this transaction will be reported as failed, and the
client variables will be set as they were before the first run of this
transaction.
If there are retries and/or errors, their statistics are printed in the
progress report, in the transaction/aggregation logs, and at the end with the
other results (overall and for each script). A transaction error is reported
here only if the last try of this transaction fails. Retries and/or errors are
also printed per command with average latencies if you use the appropriate
benchmarking option (--report-per-command, -r) and the total number of retries
and/or errors is not zero.
If a failed transaction block does not terminate in the current script, the
commands of the following scripts are processed as usual, so you can get many
errors of type "in failed SQL transaction" (when the current SQL transaction
is aborted and commands are ignored until the end of the transaction block).
In such cases you can use the separate statistics of these errors in all
reports.
If you want to distinguish between failures or errors by type (including which
limit for retries was violated and how far it was exceeded for the
serialization/deadlock errors), use the pgbench debugging output created with
the option --debug and with the debugging level "fails" or "all". The first
variant is recommended for this purpose because in the second case the
debugging output can be very large.
---
doc/src/sgml/ref/pgbench.sgml | 332 ++++-
src/bin/pgbench/pgbench.c | 1291 ++++++++++++++++----
src/bin/pgbench/t/001_pgbench_with_server.pl | 49 +-
src/bin/pgbench/t/002_pgbench_no_server.pl | 6 +-
.../t/003_serialization_and_deadlock_fails.pl | 739 +++++++++++
5 files changed, 2138 insertions(+), 279 deletions(-)
create mode 100644 src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 41d9030..6b691fd 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -55,16 +55,19 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
tps = 85.184871 (including connections establishing)
tps = 85.296346 (excluding connections establishing)
</screen>
- The first six lines report some of the most important parameter
- settings. The next line reports the number of transactions completed
- and intended (the latter being just the product of number of clients
+ The first six lines and the eighth line report some of the most important
+ parameter settings. The seventh line reports the number of transactions
+ completed and intended (the latter being just the product of number of clients
and number of transactions per client); these will be equal unless the run
- failed before completion. (In <option>-T</option> mode, only the actual
- number of transactions is printed.)
+ failed before completion or some SQL/meta command(s) failed (see
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information). (In <option>-T</option> mode, only the actual
+ number of transactions is printed.)
The last two lines report the number of transactions per second,
figured with and without counting the time to start database sessions.
</para>
@@ -380,11 +383,28 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</varlistentry>
<varlistentry>
- <term><option>-d</option></term>
- <term><option>--debug</option></term>
+ <term><option>-d</option> <replaceable>debug_level</replaceable></term>
+ <term><option>--debug=</option><replaceable>debug_level</replaceable></term>
<listitem>
<para>
- Print debugging output.
+ Print debugging output. You can use the following debugging levels:
+ <itemizedlist>
+ <listitem>
+ <para><literal>no</literal>: no debugging output (except built-in
+ function <function>debug</function>, see <xref
+ linkend="pgbench-functions"/>).</para>
+ </listitem>
+ <listitem>
+ <para><literal>fails</literal>: print only failure messages, errors
+ and retries (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information).</para>
+ </listitem>
+ <listitem>
+ <para><literal>all</literal>: print all debugging output
+ (throttling, executed/sent/received commands etc.).</para>
+ </listitem>
+ </itemizedlist>
+ The default is no debugging output.
</para>
</listitem>
</varlistentry>
@@ -513,22 +533,38 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
Show progress report every <replaceable>sec</replaceable> seconds. The report
includes the time since the beginning of the run, the tps since the
last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</option>),
- the latency is computed with respect to the transaction scheduled
- start time, not the actual transaction beginning time, thus it also
- includes the average schedule lag time.
+ deviation since the last report. If any transactions ended with a
+ failed SQL or meta command since the last report, they are also reported
+ as failed. If any transactions ended with an error "in failed SQL
+ transaction block", they are reported separately as <literal>in failed
+ tx</literal> (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information). Under
+ throttling (<option>-R</option>), the latency is computed with respect
+ to the transaction scheduled start time, not the actual transaction
+ beginning time, thus it also includes the average schedule lag time. If
+ any transactions have been rolled back and retried after a
+ serialization/deadlock failure since the last report, the report
+ includes the number of such transactions and the sum of all retries. Use
+ the options <option>--max-tries</option> and/or
+ <option>--max-tries-time</option> to enable transactions retries after
+ serialization/deadlock failures.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-r</option></term>
- <term><option>--report-latencies</option></term>
+ <term><option>--report-per-command</option></term>
<listitem>
<para>
- Report the average per-statement latency (execution time from the
- perspective of the client) of each command after the benchmark
- finishes. See below for details.
+ Report the following statistics for each command after the benchmark
+ finishes: the average per-statement latency (execution time from the
+ perspective of the client), the number of all errors, the number of
+ errors "in failed SQL transaction block", and the number of retries
+ after serialization or deadlock failures. The report displays the
+ columns with statistics on errors and retries only if the current
+ <application>pgbench</application> run has an error of the corresponding
+ type or retry, respectively. See below for details.
</para>
</listitem>
</varlistentry>
@@ -667,6 +703,42 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</varlistentry>
<varlistentry>
+ <term><option>--max-tries=<replaceable>number_of_tries</replaceable></option></term>
+ <listitem>
+ <para>
+ Set the maximum number of tries for transactions with
+ serialization/deadlock failures.
+ </para>
+ <para>
+ This option can be combined with the option
+ <option>--max-tries-time</option>. But if none of them are used, failed
+ transactions are not retried at all. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information about retrying failed transactions.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--max-tries-time=<replaceable>time_of_tries</replaceable></option></term>
+ <listitem>
+ <para>
+ Set the maximum time (in milliseconds) of tries for transactions with
+ serialization/deadlock failures. The transaction with serialization or
+ deadlock failure can be retried if the total time of all its tries is
+ less than <replaceable>time_of_tries</replaceable> ms.
+ </para>
+ <para>
+ This option can be combined with the option
+ <option>--max-tries</option>. But if none of them are used, failed
+ transactions are not retried at all. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information about retrying failed transactions.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--progress-timestamp</option></term>
<listitem>
<para>
@@ -807,8 +879,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<refsect1>
<title>Notes</title>
- <refsect2>
- <title>What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts">
+ <title id="transactions-and-scripts-title">What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
<para>
<application>pgbench</application> executes test scripts chosen randomly
@@ -1583,7 +1655,7 @@ END;
The format of the log is:
<synopsis>
-<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -1604,6 +1676,17 @@ END;
When both <option>--rate</option> and <option>--latency-limit</option> are used,
the <replaceable>time</replaceable> for a skipped transaction will be reported as
<literal>skipped</literal>.
+ <replaceable>retries</replaceable> is the sum of all the retries after the
+ serialization or deadlock failures during the current script execution. It is
+ only present when the maximum number of tries for transactions is more than 1
+ (<option>--max-tries</option>) and/or the maximum time of tries for
+ transactions is used (<option>--max-tries-time</option>). If the transaction
+ ended with an error "in failed SQL transaction", its
+ <replaceable>time</replaceable> will be reported as
+ <literal>in_failed_tx</literal>. If the transaction ended with another error,
+ its <replaceable>time</replaceable> will be reported as
+ <literal>failed</literal> (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information).
</para>
<para>
@@ -1633,6 +1716,24 @@ END;
</para>
<para>
+ The following example shows a snippet of a log file with errors and retries,
+ with the maximum number of tries set to 10 (note the additional
+ <replaceable>retries</replaceable> column):
+<screen>
+3 0 47423 0 1499414498 34501 4
+3 1 8333 0 1499414498 42848 1
+3 2 8358 0 1499414498 51219 1
+4 0 72345 0 1499414498 59433 7
+1 3 41718 0 1499414498 67879 5
+1 4 8416 0 1499414498 76311 1
+3 3 33235 0 1499414498 84469 4
+0 0 failed 0 1499414498 84905 10
+2 0 failed 0 1499414498 86248 10
+3 4 8307 0 1499414498 92788 1
+</screen>
+ </para>
+
+ <para>
When running a long test on hardware that can handle a lot of transactions,
the log files can become very large. The <option>--sampling-rate</option> option
can be used to log only a random sample of transactions.
@@ -1647,7 +1748,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> <replaceable>failed_tx</replaceable> <replaceable>in_failed_tx</replaceable> <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried_tx</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -1661,7 +1762,13 @@ END;
transaction latencies within the interval,
<replaceable>min_latency</replaceable> is the minimum latency within the interval,
and
- <replaceable>max_latency</replaceable> is the maximum latency within the interval.
+ <replaceable>max_latency</replaceable> is the maximum latency within the interval,
+ <replaceable>failed_tx</replaceable> is the number of transactions that ended
+ with a failed SQL or meta command within the interval,
+ <replaceable>in_failed_tx</replaceable> is the number of transactions that
+ ended with an error "in failed SQL transaction block" (see
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information).
The next fields,
<replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
@@ -1669,21 +1776,28 @@ END;
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e. the difference between each transaction's
scheduled start time and the time it actually started.
- The very last field, <replaceable>skipped</replaceable>,
+ The next field, <replaceable>skipped</replaceable>,
is only present if the <option>--latency-limit</option> option is used, too.
It counts the number of transactions skipped because they would have
started too late.
+ The <replaceable>retried_tx</replaceable> and
+ <replaceable>retries</replaceable> fields are only present if the maximum
+ number of tries for transactions is more than 1
+ (<option>--max-tries</option>) and/or the maximum time of tries for
+ transactions is used (<option>--max-tries-time</option>). They report the
+ number of retried transactions and the sum of all the retries after
+ serialization or deadlock failures within the interval.
Each transaction is counted in the interval when it was committed.
</para>
<para>
Here is some example output:
<screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0 0
+1345828503 7884 1979812 565806736 60 1479 0 0
+1345828505 7208 1979422 567277552 59 1391 0 0
+1345828507 7685 1980268 569784714 60 1398 0 0
+1345828509 7073 1979779 573489941 236 1411 0 0
</screen></para>
<para>
@@ -1695,16 +1809,55 @@ END;
</refsect2>
<refsect2>
- <title>Per-Statement Latencies</title>
+ <title>Per-Statement Report</title>
<para>
- With the <option>-r</option> option, <application>pgbench</application> collects
- the elapsed transaction time of each statement executed by every
- client. It then reports an average of those values, referred to
- as the latency for each statement, after the benchmark has finished.
+ With the <option>-r</option> option, <application>pgbench</application>
+ collects the following statistics for each statement:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>latency</literal> — elapsed transaction time for each
+ statement. <application>pgbench</application> reports an average value
+ of all successful runs of the statement.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of errors in this statement. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of errors "in failed SQL transaction" in this statement. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of retries after a serialization or a deadlock failure in
+ this statement. See <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </itemizedlist>
</para>
<para>
+ The report displays the columns with statistics on errors and retries only if
+ the current <application>pgbench</application> run has an error or retry,
+ respectively.
+ </para>
+
+ <para>
+ All values are computed for each statement executed by every client and are
+ reported after the benchmark has finished.
+ </para>
+
+ <para>
For the default script, the output will look similar to this:
<screen>
starting vacuum...end.
@@ -1715,6 +1868,7 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
latency average = 15.844 ms
latency stddev = 2.715 ms
tps = 618.764555 (including connections establishing)
@@ -1732,10 +1886,50 @@ statement latencies in milliseconds:
0.371 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
1.212 END;
</screen>
+
+ Another example of output for the default script using serializable default
+ transaction isolation level (<command>PGOPTIONS='-c
+ default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: <builtin: TPC-B (sort of)>
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+number of transactions per client: 1000
+number of transactions actually processed: 5293/10000
+number of errors: 4707 (47.070%)
+number of retried: 7164 (71.640%)
+number of retries: 255928
+maximum number of tries: 100
+maximum time of tries: 100.0 ms
+latency average = 34.817 ms
+latency stddev = 37.347 ms
+tps = 71.083700 (including connections establishing)
+tps = 71.088507 (excluding connections establishing)
+statement latencies in milliseconds, errors and retries:
+ 0.003 0 0 \set aid random(1, 100000 * :scale)
+ 0.000 0 0 \set bid random(1, 1 * :scale)
+ 0.000 0 0 \set tid random(1, 10 * :scale)
+ 0.000 0 0 \set delta random(-5000, 5000)
+ 0.186 0 0 BEGIN;
+ 0.337 0 0 UPDATE pgbench_accounts
+ SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.295 0 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 0.349 4168 247084 UPDATE pgbench_tellers
+ SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 0.277 539 8839 UPDATE pgbench_branches
+ SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.264 0 0 INSERT INTO pgbench_history
+ (tid, bid, aid, delta, mtime)
+ VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 0.444 0 5 END;
+</screen>
</para>
<para>
- If multiple script files are specified, the averages are reported
+ If multiple script files are specified, all statistics are reported
separately for each script file.
</para>
@@ -1749,6 +1943,78 @@ statement latencies in milliseconds:
</para>
</refsect2>
+ <refsect2 id="errors-and-retries">
+ <title id="errors-and-retries-title">Errors and Serialization/Deadlock Retries</title>
+
+ <para>
+ A client's run is aborted only in case of a serious error, for example, if
+ the connection with the backend was lost. Otherwise, if the execution of an
+ SQL or meta command fails, the client's run continues normally until the end of the
+ current script execution (it is assumed that one transaction script contains
+ only one transaction; see <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"/> for more information).
+ Transactions with serialization or deadlock failures are rolled back and
+ repeated until they complete successfully or reach the maximum number of
+ tries (specified by the <option>--max-tries</option> option) / the maximum
+ time of tries (specified by the <option>--max-tries-time</option> option). If
+ the last transaction run fails, this transaction will be reported as failed,
+ and the client variables will be set as they were before the first run of
+ this transaction.
+ </para>
+
+ <note>
+ <para>
+ Be careful when repeating scripts that contain multiple transactions: the
+ script is always retried completely, so the successful transactions can be
+ performed several times.
+ </para>
+ <para>
+ Be careful when repeating transactions with shell commands. Unlike the
+ results of SQL commands, the results of shell commands are not rolled back,
+ except for the variable value of the <command>\setshell</command> command.
+ </para>
+ <para>
+ If a failed transaction block does not terminate in the current script, the
+ commands of the following scripts are processed as usual so you can get a
+ lot of errors of type "in failed SQL transaction" (when the current SQL
+ transaction is aborted and commands are ignored until the end of the transaction block).
+ In such cases you can use separate statistics of these errors in all
+ reports.
+ </para>
+ </note>
+
+ <para>
+ The latency of a successful transaction includes the entire time of
+ transaction execution with rollbacks and retries. The latency for failed
+ transactions and commands is not computed separately.
+ </para>
+
+ <para>
+ The main report contains the number of failed transactions if it is non-zero.
+ If the total number of transactions ended with an error "in failed SQL
+ transaction block" is non-zero, the main report also contains it. If the
+ total number of retried transactions is non-zero, the main report also
+ contains the statistics related to retries: the total number of retried
+ transactions and total number of retries (use the options
+ <option>--max-tries</option> and/or <option>--max-tries-time</option> to make
+ it possible). The per-statement report inherits all columns from the main
+ report. Note that if a failure/error occurs, the following failures/errors in
+ the current script execution are not shown in the reports. The retry is only
+ reported for the first command where the failure occurred during the current
+ script execution.
+ </para>
+
+ <para>
+ If you want to distinguish between failures or errors by type (including
+ which limit for retries was violated and how far it was exceeded for the
+ serialization/deadlock errors), use the <application>pgbench</application>
+ debugging output created with the option <option>--debug</option> and with
+ the debugging level <literal>fails</literal> or <literal>all</literal>. The
+ first variant is recommended for this purpose because in the second case
+ the debugging output can be very large.
+ </para>
+ </refsect2>
+
<refsect2>
<title>Good Practices</title>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index fd18568..d35cc1d 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -59,6 +59,9 @@
#include "pgbench.h"
+#define ERRCODE_IN_FAILED_SQL_TRANSACTION "25P02"
+#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"
#define ERRCODE_UNDEFINED_TABLE "42P01"
/*
@@ -186,9 +189,26 @@ bool progress_timestamp = false; /* progress report with Unix time */
int nclients = 1; /* number of clients */
int nthreads = 1; /* number of threads */
bool is_connect; /* establish connection for each transaction */
-bool is_latencies; /* report per-command latencies */
+bool report_per_command = false; /* report per-command latencies, retries
+ * after the failures and errors
+ * (failures without retrying) */
int main_pid; /* main process id used in log filename */
+/*
+ * There are different kinds of limits for deciding that the current failed
+ * transaction can no longer be retried and should be reported as failed. They
+ * can be combined, and you must use at least one of them to retry failed
+ * transactions. By default, failed transactions are not retried at all.
+ */
+uint32 max_tries = 0; /* we cannot retry a failed transaction if its
+ * number of tries reaches this maximum; if its
+ * value is zero, it is not used */
+uint64 max_tries_time = 0; /* we cannot retry a failed transaction if we
+ * spent more time on it than indicated in this
+ * limit (in usec); if its value is zero, it is
+ * not used */
+
char *pghost = "";
char *pgport = "";
char *login = NULL;
@@ -242,14 +262,73 @@ typedef struct SimpleStats
typedef struct StatsData
{
time_t start_time; /* interval start time, for aggregates */
- int64 cnt; /* number of transactions, including skipped */
+ int64 cnt; /* number of successful transactions, including
+ * skipped */
int64 skipped; /* number of transactions skipped under --rate
* and --latency-limit */
+ int64 retries; /* number of retries after serialization or
+ * deadlock failures */
+ int64 retried; /* number of transactions that were retried
+ * after a serialization or a deadlock
+ * failure */
+ int64 errors; /* number of transactions that were not retried
+ * after a serialization or a deadlock
+ * failure or had another error (including
+ * meta-command errors) */
+ int64 errors_in_failed_tx; /* number of transactions that got the
+ * error
+ * ERRCODE_IN_FAILED_SQL_TRANSACTION */
SimpleStats latency;
SimpleStats lag;
} StatsData;
/*
+ * Data structure for client variables.
+ */
+typedef struct Variables
+{
+ Variable *array; /* array of variable definitions */
+ int nvariables; /* number of variables */
+ bool vars_sorted; /* are variables sorted by name? */
+} Variables;
+
+/*
+ * Data structure for thread/client random seed.
+ */
+typedef struct RandomState
+{
+ unsigned short data[3];
+} RandomState;
+
+/*
+ * Data structure for repeating a transaction from the beginning with the same
+ * parameters.
+ */
+typedef struct RetryState
+{
+ RandomState random_state; /* random seed */
+ Variables variables; /* client variables */
+} RetryState;
+
+/*
+ * For the failures during script execution.
+ */
+typedef enum FailureStatus
+{
+ NO_FAILURE = 0,
+ SERIALIZATION_FAILURE,
+ DEADLOCK_FAILURE,
+ IN_FAILED_SQL_TRANSACTION,
+ ANOTHER_FAILURE
+} FailureStatus;
+
+typedef struct Failure
+{
+ FailureStatus status; /* type of the failure */
+ int command; /* command number in script where the failure
+ * occurred */
+} Failure;
+
+/*
* Connection state machine states.
*/
typedef enum
@@ -304,6 +383,22 @@ typedef enum
CSTATE_END_COMMAND,
/*
+ * States for transactions with serialization or deadlock failures.
+ *
+ * First, remember the failure in CSTATE_FAILURE. Then process the remaining
+ * commands of the failed transaction, if any, and go to CSTATE_RETRY. If we
+ * can re-execute the transaction from the very beginning, report this as a
+ * failure, set the same parameters for the transaction execution as in the
+ * previous tries and process the first transaction command in
+ * CSTATE_START_COMMAND. Otherwise, report this as an error, set the
+ * parameters for the transaction execution as they were before the first
+ * run of this transaction (except for a random state) and go to
+ * CSTATE_END_TX to complete this transaction.
+ */
+ CSTATE_FAILURE,
+ CSTATE_RETRY,
+
+ /*
* CSTATE_END_TX performs end-of-transaction processing. Calculates
* latency, and logs the transaction. In --connect mode, closes the
* current connection. Chooses the next script to execute and starts over
@@ -329,14 +424,13 @@ typedef struct
int id; /* client No. */
ConnectionStateEnum state; /* state machine's current state. */
ConditionalStack cstack; /* enclosing conditionals state */
+ RandomState random_state; /* separate randomness for each client */
int use_file; /* index in sql_script for this client */
int command; /* command number in script */
/* client variables */
- Variable *variables; /* array of variable definitions */
- int nvariables; /* number of variables */
- bool vars_sorted; /* are variables sorted by name? */
+ Variables variables;
/* various times about current transaction */
int64 txn_scheduled; /* scheduled start time of transaction (usec) */
@@ -346,6 +440,18 @@ typedef struct
bool prepared[MAX_SCRIPTS]; /* whether client prepared the script */
+ /*
+ * For processing errors and repeating transactions with serialization or
+ * deadlock failures:
+ */
+ Failure first_failure; /* status and command number of the first
+ * failure in the current transaction execution;
+ * status NO_FAILURE if there were no failures
+ * or errors */
+ RetryState retry_state;
+ uint32 retries; /* how many times have we already retried the
+ * current transaction? */
+
/* per client collected stats */
int64 cnt; /* client transaction count, for -t */
int ecnt; /* error count */
@@ -389,7 +495,7 @@ typedef struct
pthread_t thread; /* thread handle */
CState *state; /* array of CState */
int nstate; /* length of state[] */
- unsigned short random_state[3]; /* separate randomness for each thread */
+ RandomState random_state; /* separate randomness for each thread */
int64 throttle_trigger; /* previous/next throttling (us) */
FILE *logfile; /* where to log, or NULL */
ZipfCache zipf_cache; /* for thread-safe zipfian random number
@@ -445,6 +551,10 @@ typedef struct
char *argv[MAX_ARGS]; /* command word list */
PgBenchExpr *expr; /* parsed expression, if needed */
SimpleStats stats; /* time spent in this command */
+ int64 retries; /* number of retries after a serialization or
+ * a deadlock failure in this command */
+ int64 errors; /* number of failures that were not retried */
+ int64 errors_in_failed_tx; /* number of errors
+ * ERRCODE_IN_FAILED_SQL_TRANSACTION */
} Command;
typedef struct ParsedScript
@@ -460,7 +570,18 @@ static int num_scripts; /* number of scripts in sql_script[] */
static int num_commands = 0; /* total number of Command structs */
static int64 total_weight = 0;
-static int debug = 0; /* debug flag */
+typedef enum Debuglevel
+{
+ NO_DEBUG = 0, /* no debugging output (except PGBENCH_DEBUG) */
+ DEBUG_FAILS, /* print only failure messages, errors and
+ * retries */
+ DEBUG_ALL, /* print all debugging output (throttling,
+ * executed/sent/received commands etc.) */
+ NUM_DEBUGLEVEL
+} Debuglevel;
+
+static Debuglevel debug_level = NO_DEBUG; /* debug flag */
+static const char *DEBUGLEVEl[] = {"no", "fails", "all"};
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -572,7 +693,7 @@ usage(void)
" protocol for submitting queries (default: simple)\n"
" -n, --no-vacuum do not run VACUUM before tests\n"
" -P, --progress=NUM show thread progress report every NUM seconds\n"
- " -r, --report-latencies report average latency per command\n"
+ " -r, --report-per-command report latencies, errors and retries per command\n"
" -R, --rate=NUM target rate in transactions per second\n"
" -s, --scale=NUM report this scale factor in output\n"
" -t, --transactions=NUM number of transactions each client runs (default: 10)\n"
@@ -581,11 +702,13 @@ usage(void)
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
+ " --max-tries=NUM max number of tries to run transaction\n"
+ " --max-tries-time=NUM max time (in ms) of tries to run transaction\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
"\nCommon options:\n"
- " -d, --debug print debugging output\n"
+ " -d, --debug=no|fails|all print debugging output (default: no)\n"
" -h, --host=HOSTNAME database server host or socket directory\n"
" -p, --port=PORT database server port number\n"
" -U, --username=USERNAME connect as specified database user\n"
@@ -693,7 +816,7 @@ gotdigits:
/* random number generator: uniform distribution from min to max inclusive */
static int64
-getrand(TState *thread, int64 min, int64 max)
+getrand(RandomState *random_state, int64 min, int64 max)
{
/*
* Odd coding is so that min and max have approximately the same chance of
@@ -704,7 +827,7 @@ getrand(TState *thread, int64 min, int64 max)
* protected by a mutex, and therefore a bottleneck on machines with many
* CPUs.
*/
- return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
+ return min + (int64) ((max - min + 1) * pg_erand48(random_state->data));
}
/*
@@ -713,7 +836,8 @@ getrand(TState *thread, int64 min, int64 max)
* value is exp(-parameter).
*/
static int64
-getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
+getExponentialRand(RandomState *random_state, int64 min, int64 max,
+ double parameter)
{
double cut,
uniform,
@@ -723,7 +847,7 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
Assert(parameter > 0.0);
cut = exp(-parameter);
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(random_state->data);
/*
* inner expression in (cut, 1] (if parameter > 0), rand in [0, 1)
@@ -736,7 +860,8 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
/* random number generator: gaussian distribution from min to max inclusive */
static int64
-getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
+getGaussianRand(RandomState *random_state, int64 min, int64 max,
+ double parameter)
{
double stdev;
double rand;
@@ -764,8 +889,8 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* are expected in (0, 1] (see
* http://en.wikipedia.org/wiki/Box_muller)
*/
- double rand1 = 1.0 - pg_erand48(thread->random_state);
- double rand2 = 1.0 - pg_erand48(thread->random_state);
+ double rand1 = 1.0 - pg_erand48(random_state->data);
+ double rand2 = 1.0 - pg_erand48(random_state->data);
/* Box-Muller basic form transform */
double var_sqrt = sqrt(-2.0 * log(rand1));
@@ -792,7 +917,7 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* will approximate a Poisson distribution centered on the given value.
*/
static int64
-getPoissonRand(TState *thread, int64 center)
+getPoissonRand(RandomState *random_state, int64 center)
{
/*
* Use inverse transform sampling to generate a value > 0, such that the
@@ -801,7 +926,7 @@ getPoissonRand(TState *thread, int64 center)
double uniform;
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(random_state->data);
return (int64) (-log(uniform) * ((double) center) + 0.5);
}
@@ -879,7 +1004,7 @@ zipfFindOrCreateCacheCell(ZipfCache * cache, int64 n, double s)
* Luc Devroye, p. 550-551, Springer 1986.
*/
static int64
-computeIterativeZipfian(TState *thread, int64 n, double s)
+computeIterativeZipfian(RandomState *random_state, int64 n, double s)
{
double b = pow(2.0, s - 1.0);
double x,
@@ -890,8 +1015,8 @@ computeIterativeZipfian(TState *thread, int64 n, double s)
while (true)
{
/* random variates */
- u = pg_erand48(thread->random_state);
- v = pg_erand48(thread->random_state);
+ u = pg_erand48(random_state->data);
+ v = pg_erand48(random_state->data);
x = floor(pow(u, -1.0 / (s - 1.0)));
@@ -909,10 +1034,11 @@ computeIterativeZipfian(TState *thread, int64 n, double s)
* Jim Gray et al, SIGMOD 1994
*/
static int64
-computeHarmonicZipfian(TState *thread, int64 n, double s)
+computeHarmonicZipfian(TState *thread, RandomState *random_state, int64 n,
+ double s)
{
ZipfCell *cell = zipfFindOrCreateCacheCell(&thread->zipf_cache, n, s);
- double uniform = pg_erand48(thread->random_state);
+ double uniform = pg_erand48(random_state->data);
double uz = uniform * cell->harmonicn;
if (uz < 1.0)
@@ -924,7 +1050,8 @@ computeHarmonicZipfian(TState *thread, int64 n, double s)
/* random number generator: zipfian distribution from min to max inclusive */
static int64
-getZipfianRand(TState *thread, int64 min, int64 max, double s)
+getZipfianRand(TState *thread, RandomState *random_state, int64 min,
+ int64 max, double s)
{
int64 n = max - min + 1;
@@ -933,8 +1060,8 @@ getZipfianRand(TState *thread, int64 min, int64 max, double s)
return min - 1 + ((s > 1)
- ? computeIterativeZipfian(thread, n, s)
- : computeHarmonicZipfian(thread, n, s));
+ ? computeIterativeZipfian(random_state, n, s)
+ : computeHarmonicZipfian(thread, random_state, n, s));
}
/*
@@ -1034,6 +1161,10 @@ initStats(StatsData *sd, time_t start_time)
sd->start_time = start_time;
sd->cnt = 0;
sd->skipped = 0;
+ sd->retries = 0;
+ sd->retried = 0;
+ sd->errors = 0;
+ sd->errors_in_failed_tx = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1042,8 +1173,30 @@ initStats(StatsData *sd, time_t start_time)
* Accumulate one additional item into the given stats object.
*/
static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumStats(StatsData *stats, bool skipped, double lat, double lag,
+ FailureStatus first_error, int64 retries)
{
+ /*
+ * Record the number of retries regardless of whether the transaction was
+ * successful or failed.
+ */
+ stats->retries += retries;
+ if (retries > 0)
+ stats->retried++;
+
+ /* Record the failed transaction */
+ if (first_error != NO_FAILURE)
+ {
+ stats->errors++;
+
+ if (first_error == IN_FAILED_SQL_TRANSACTION)
+ stats->errors_in_failed_tx++;
+
+ return;
+ }
+
+ /* Record the successful transaction */
+
stats->cnt++;
if (skipped)
@@ -1184,39 +1337,39 @@ compareVariableNames(const void *v1, const void *v2)
/* Locate a variable by name; returns NULL if unknown */
static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
{
Variable key;
/* On some versions of Solaris, bsearch of zero items dumps core */
- if (st->nvariables <= 0)
+ if (variables->nvariables <= 0)
return NULL;
/* Sort if we have to */
- if (!st->vars_sorted)
+ if (!variables->vars_sorted)
{
- qsort((void *) st->variables, st->nvariables, sizeof(Variable),
- compareVariableNames);
- st->vars_sorted = true;
+ qsort((void *) variables->array, variables->nvariables,
+ sizeof(Variable), compareVariableNames);
+ variables->vars_sorted = true;
}
/* Now we can search */
key.name = name;
return (Variable *) bsearch((void *) &key,
- (void *) st->variables,
- st->nvariables,
+ (void *) variables->array,
+ variables->nvariables,
sizeof(Variable),
compareVariableNames);
}
/* Get the value of a variable, in string form; returns NULL if unknown */
static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
{
Variable *var;
char stringform[64];
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
return NULL; /* not found */
@@ -1290,9 +1443,12 @@ makeVariableValue(Variable *var)
if (sscanf(var->svalue, "%lf%c", &dv, &xs) != 1)
{
- fprintf(stderr,
- "malformed variable \"%s\" value: \"%s\"\n",
- var->name, var->svalue);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "malformed variable \"%s\" value: \"%s\"\n",
+ var->name, var->svalue);
+ }
return false;
}
setDoubleValue(&var->value, dv);
@@ -1340,11 +1496,12 @@ valid_variable_name(const char *name)
* Returns NULL on failure (bad name).
*/
static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
+lookupCreateVariable(Variables *variables, const char *context, char *name,
+ bool aborted)
{
Variable *var;
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
{
Variable *newvars;
@@ -1355,29 +1512,32 @@ lookupCreateVariable(CState *st, const char *context, char *name)
*/
if (!valid_variable_name(name))
{
- fprintf(stderr, "%s: invalid variable name: \"%s\"\n",
- context, name);
+ if (aborted || debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr, "%s: invalid variable name: \"%s\"\n",
+ context, name);
+ }
return NULL;
}
/* Create variable at the end of the array */
- if (st->variables)
- newvars = (Variable *) pg_realloc(st->variables,
- (st->nvariables + 1) * sizeof(Variable));
+ if (variables->array)
+ newvars = (Variable *) pg_realloc(variables->array,
+ (variables->nvariables + 1) * sizeof(Variable));
else
newvars = (Variable *) pg_malloc(sizeof(Variable));
- st->variables = newvars;
+ variables->array = newvars;
- var = &newvars[st->nvariables];
+ var = &newvars[variables->nvariables];
var->name = pg_strdup(name);
var->svalue = NULL;
/* caller is expected to initialize remaining fields */
- st->nvariables++;
+ variables->nvariables++;
/* we don't re-sort the array till we have to */
- st->vars_sorted = false;
+ variables->vars_sorted = false;
}
return var;
@@ -1386,12 +1546,13 @@ lookupCreateVariable(CState *st, const char *context, char *name)
/* Assign a string value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariable(CState *st, const char *context, char *name, const char *value)
+putVariable(Variables *variables, const char *context, char *name,
+ const char *value)
{
Variable *var;
char *val;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name, true);
if (!var)
return false;
@@ -1409,12 +1570,12 @@ putVariable(CState *st, const char *context, char *name, const char *value)
/* Assign a value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableValue(CState *st, const char *context, char *name,
- const PgBenchValue *value)
+putVariableValue(Variables *variables, const char *context, char *name,
+ const PgBenchValue *value, bool aborted)
{
Variable *var;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name, aborted);
if (!var)
return false;
@@ -1429,12 +1590,13 @@ putVariableValue(CState *st, const char *context, char *name,
/* Assign an integer value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putVariableInt(Variables *variables, const char *context, char *name,
+ int64 value, bool aborted)
{
PgBenchValue val;
setIntValue(&val, value);
- return putVariableValue(st, context, name, &val);
+ return putVariableValue(variables, context, name, &val, aborted);
}
/*
@@ -1489,7 +1651,7 @@ replaceVariable(char **sql, char *param, int len, char *value)
}
static char *
-assignVariables(CState *st, char *sql)
+assignVariables(Variables *variables, char *sql)
{
char *p,
*name,
@@ -1510,7 +1672,7 @@ assignVariables(CState *st, char *sql)
continue;
}
- val = getVariable(st, name);
+ val = getVariable(variables, name);
free(name);
if (val == NULL)
{
@@ -1525,12 +1687,13 @@ assignVariables(CState *st, char *sql)
}
static void
-getQueryParams(CState *st, const Command *command, const char **params)
+getQueryParams(Variables *variables, const Command *command,
+ const char **params)
{
int i;
for (i = 0; i < command->argc - 1; i++)
- params[i] = getVariable(st, command->argv[i + 1]);
+ params[i] = getVariable(variables, command->argv[i + 1]);
}
static char *
@@ -1565,7 +1728,11 @@ coerceToBool(PgBenchValue *pval, bool *bval)
}
else /* NULL, INT or DOUBLE */
{
- fprintf(stderr, "cannot coerce %s to boolean\n", valueTypeName(pval));
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr, "cannot coerce %s to boolean\n",
+ valueTypeName(pval));
+ }
*bval = false; /* suppress uninitialized-variable warnings */
return false;
}
@@ -1610,7 +1777,8 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
if (dval < PG_INT64_MIN || PG_INT64_MAX < dval)
{
- fprintf(stderr, "double to int overflow for %f\n", dval);
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "double to int overflow for %f\n", dval);
return false;
}
*ival = (int64) dval;
@@ -1618,7 +1786,8 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
}
else /* BOOLEAN or NULL */
{
- fprintf(stderr, "cannot coerce %s to int\n", valueTypeName(pval));
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "cannot coerce %s to int\n", valueTypeName(pval));
return false;
}
}
@@ -1639,7 +1808,9 @@ coerceToDouble(PgBenchValue *pval, double *dval)
}
else /* BOOLEAN or NULL */
{
- fprintf(stderr, "cannot coerce %s to double\n", valueTypeName(pval));
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "cannot coerce %s to double\n",
+ valueTypeName(pval));
return false;
}
}
@@ -1817,8 +1988,11 @@ evalStandardFunc(TState *thread, CState *st,
if (l != NULL)
{
- fprintf(stderr,
- "too many function arguments, maximum is %d\n", MAX_FARGS);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "too many function arguments, maximum is %d\n", MAX_FARGS);
+ }
return false;
}
@@ -1941,7 +2115,8 @@ evalStandardFunc(TState *thread, CState *st,
case PGBENCH_MOD:
if (ri == 0)
{
- fprintf(stderr, "division by zero\n");
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "division by zero\n");
return false;
}
/* special handling of -1 divisor */
@@ -1952,7 +2127,11 @@ evalStandardFunc(TState *thread, CState *st,
/* overflow check (needed for INT64_MIN) */
if (li == PG_INT64_MIN)
{
- fprintf(stderr, "bigint out of range\n");
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "bigint out of range\n");
+ }
return false;
}
else
@@ -2187,20 +2366,22 @@ evalStandardFunc(TState *thread, CState *st,
/* check random range */
if (imin > imax)
{
- fprintf(stderr, "empty range given to random\n");
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "empty range given to random\n");
return false;
}
else if (imax - imin < 0 || (imax - imin) + 1 < 0)
{
/* prevent int overflows in random functions */
- fprintf(stderr, "random range is too large\n");
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "random range is too large\n");
return false;
}
if (func == PGBENCH_RANDOM)
{
Assert(nargs == 2);
- setIntValue(retval, getrand(thread, imin, imax));
+ setIntValue(retval, getrand(&st->random_state, imin, imax));
}
else /* gaussian & exponential */
{
@@ -2215,39 +2396,51 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param < MIN_GAUSSIAN_PARAM)
{
- fprintf(stderr,
- "gaussian parameter must be at least %f "
- "(not %f)\n", MIN_GAUSSIAN_PARAM, param);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "gaussian parameter must be at least %f (not %f)\n",
+ MIN_GAUSSIAN_PARAM, param);
+ }
return false;
}
setIntValue(retval,
- getGaussianRand(thread, imin, imax, param));
+ getGaussianRand(&st->random_state, imin,
+ imax, param));
}
else if (func == PGBENCH_RANDOM_ZIPFIAN)
{
if (param <= 0.0 || param == 1.0 || param > MAX_ZIPFIAN_PARAM)
{
- fprintf(stderr,
- "zipfian parameter must be in range (0, 1) U (1, %d]"
- " (got %f)\n", MAX_ZIPFIAN_PARAM, param);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "zipfian parameter must be in range (0, 1) U (1, %d] (got %f)\n",
+ MAX_ZIPFIAN_PARAM, param);
+ }
return false;
}
setIntValue(retval,
- getZipfianRand(thread, imin, imax, param));
+ getZipfianRand(thread, &st->random_state,
+ imin, imax, param));
}
else /* exponential */
{
if (param <= 0.0)
{
- fprintf(stderr,
- "exponential parameter must be greater than zero"
- " (got %f)\n", param);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "exponential parameter must be greater than zero (got %f)\n",
+ param);
+ }
return false;
}
setIntValue(retval,
- getExponentialRand(thread, imin, imax, param));
+ getExponentialRand(&st->random_state, imin,
+ imax, param));
}
}
@@ -2346,10 +2539,13 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
{
Variable *var;
- if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
+ if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
- fprintf(stderr, "undefined variable \"%s\"\n",
- expr->u.variable.varname);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr, "undefined variable \"%s\"\n",
+ expr->u.variable.varname);
+ }
return false;
}
@@ -2410,7 +2606,7 @@ getMetaCommand(const char *cmd)
* Return true if succeeded, or false on error.
*/
static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
char command[SHELL_COMMAND_SIZE];
int i,
@@ -2441,17 +2637,21 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
arg = argv[i] + 1; /* a string literal starting with colons */
}
- else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
+ else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
- fprintf(stderr, "%s: undefined variable \"%s\"\n",
- argv[0], argv[i]);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr, "%s: undefined variable \"%s\"\n",
+ argv[0], argv[i]);
+ }
return false;
}
arglen = strlen(arg);
if (len + arglen + (i > 0 ? 1 : 0) >= SHELL_COMMAND_SIZE - 1)
{
- fprintf(stderr, "%s: shell command is too long\n", argv[0]);
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "%s: shell command is too long\n", argv[0]);
return false;
}
@@ -2468,7 +2668,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
if (system(command))
{
- if (!timer_exceeded)
+ if (!timer_exceeded && debug_level >= DEBUG_FAILS)
fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
return false;
}
@@ -2478,19 +2678,21 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
/* Execute the command with pipe and read the standard output. */
if ((fp = popen(command, "r")) == NULL)
{
- fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
return false;
}
if (fgets(res, sizeof(res), fp) == NULL)
{
- if (!timer_exceeded)
+ if (!timer_exceeded && debug_level >= DEBUG_FAILS)
fprintf(stderr, "%s: could not read result of shell command\n", argv[0]);
(void) pclose(fp);
return false;
}
if (pclose(fp) < 0)
{
- fprintf(stderr, "%s: could not close shell command\n", argv[0]);
+ if (debug_level >= DEBUG_FAILS)
+ fprintf(stderr, "%s: could not close shell command\n", argv[0]);
return false;
}
@@ -2500,11 +2702,14 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
endptr++;
if (*res == '\0' || *endptr != '\0')
{
- fprintf(stderr, "%s: shell command must return an integer (not \"%s\")\n",
- argv[0], res);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr, "%s: shell command must return an integer (not \"%s\")\n",
+ argv[0], res);
+ }
return false;
}
- if (!putVariableInt(st, "setshell", variable, retval))
+ if (!putVariableInt(variables, "setshell", variable, retval, false))
return false;
#ifdef DEBUG
@@ -2521,11 +2726,45 @@ preparedStatementName(char *buffer, int file, int state)
}
static void
-commandFailed(CState *st, const char *cmd, const char *message)
+commandFailed(CState *st, const char *cmd, const char *message, bool aborted)
{
- fprintf(stderr,
- "client %d aborted in command %d (%s) of script %d; %s\n",
- st->id, st->command, cmd, st->use_file, message);
+ /*
+ * Always print an error message if the client is aborted...
+ */
+ if (aborted)
+ {
+ fprintf(stderr,
+ "client %d aborted in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file, message);
+ return;
+ }
+
+ /*
+ * ... otherwise print an error message only if there's at least the
+ * debugging mode for fails.
+ */
+ if (debug_level < DEBUG_FAILS)
+ return;
+
+ if (st->first_failure.status == NO_FAILURE)
+ {
+ /*
+ * This is the first failure during the execution of the current script.
+ */
+ fprintf(stderr,
+ "client %d got a failure in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file, message);
+ }
+ else
+ {
+ /*
+ * This is not the first failure during the execution of the current
+ * script.
+ */
+ fprintf(stderr,
+ "client %d continues a failed transaction in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file, message);
+ }
}
/* return a script number with a weighted choice. */
@@ -2538,7 +2777,7 @@ chooseScript(TState *thread)
if (num_scripts == 1)
return 0;
- w = getrand(thread, 0, total_weight - 1);
+ w = getrand(&thread->random_state, 0, total_weight - 1);
do
{
w -= sql_script[i++].weight;
@@ -2558,9 +2797,9 @@ sendCommand(CState *st, Command *command)
char *sql;
sql = pg_strdup(command->argv[0]);
- sql = assignVariables(st, sql);
+ sql = assignVariables(&st->variables, sql);
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d sending %s\n", st->id, sql);
r = PQsendQuery(st->con, sql);
free(sql);
@@ -2570,9 +2809,9 @@ sendCommand(CState *st, Command *command)
const char *sql = command->argv[0];
const char *params[MAX_ARGS];
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d sending %s\n", st->id, sql);
r = PQsendQueryParams(st->con, sql, command->argc - 1,
NULL, params, NULL, NULL, 0);
@@ -2604,10 +2843,10 @@ sendCommand(CState *st, Command *command)
st->prepared[st->use_file] = true;
}
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d sending %s\n", st->id, name);
r = PQsendQueryPrepared(st->con, name, command->argc - 1,
params, NULL, NULL, 0);
@@ -2617,10 +2856,9 @@ sendCommand(CState *st, Command *command)
if (r == 0)
{
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d could not send %s\n",
st->id, command->argv[0]);
- st->ecnt++;
return false;
}
else
@@ -2632,17 +2870,20 @@ sendCommand(CState *st, Command *command)
* of delay, in microseconds. Returns true on success, false on error.
*/
static bool
-evaluateSleep(CState *st, int argc, char **argv, int *usecs)
+evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
char *var;
int usec;
if (*argv[1] == ':')
{
- if ((var = getVariable(st, argv[1] + 1)) == NULL)
+ if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
- fprintf(stderr, "%s: undefined variable \"%s\"\n",
- argv[0], argv[1]);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr, "%s: undefined variable \"%s\"\n",
+ argv[0], argv[1]);
+ }
return false;
}
usec = atoi(var);
@@ -2665,6 +2906,169 @@ evaluateSleep(CState *st, int argc, char **argv, int *usecs)
}
/*
+ * Get the total number of processed transactions, including skipped ones
+ * and errors.
+ */
+static int64
+getTotalCnt(const CState *st)
+{
+ return st->cnt + st->ecnt;
+}
+
+/*
+ * Copy a random state.
+ */
+static void
+copyRandomState(RandomState *destination, const RandomState *source)
+{
+ memcpy(destination->data, source->data, sizeof(unsigned short) * 3);
+}
+
+/*
+ * Make a deep copy of an array of variables.
+ */
+static void
+copyVariables(Variables *destination_vars, const Variables *source_vars)
+{
+ Variable *destination;
+ Variable *current_destination;
+ const Variable *source;
+ const Variable *current_source;
+ int nvariables;
+
+ if (!destination_vars || !source_vars)
+ return;
+
+ destination = destination_vars->array;
+ source = source_vars->array;
+ nvariables = source_vars->nvariables;
+
+ for (current_destination = destination;
+ current_destination - destination < destination_vars->nvariables;
+ ++current_destination)
+ {
+ pg_free(current_destination->name);
+ pg_free(current_destination->svalue);
+ }
+
+ destination_vars->array = pg_realloc(destination_vars->array,
+ sizeof(Variable) * nvariables);
+ destination = destination_vars->array;
+
+ for (current_source = source, current_destination = destination;
+ current_source - source < nvariables;
+ ++current_source, ++current_destination)
+ {
+ current_destination->name = pg_strdup(current_source->name);
+ if (current_source->svalue)
+ current_destination->svalue = pg_strdup(current_source->svalue);
+ else
+ current_destination->svalue = NULL;
+ current_destination->value = current_source->value;
+ }
+
+ destination_vars->nvariables = nvariables;
+ destination_vars->vars_sorted = source_vars->vars_sorted;
+}
+
+/*
+ * Returns true if this type of failure can be retried.
+ */
+static bool
+canRetryFailure(FailureStatus failure_status)
+{
+ return (failure_status == SERIALIZATION_FAILURE ||
+ failure_status == DEADLOCK_FAILURE);
+}
+
+/*
+ * Returns true if the failure can be retried.
+ */
+static bool
+canRetry(CState *st, instr_time *now)
+{
+ FailureStatus failure_status = st->first_failure.status;
+
+ Assert(failure_status != NO_FAILURE);
+
+ /* We can only retry serialization or deadlock failures. */
+ if (!canRetryFailure(failure_status))
+ return false;
+
+ /*
+ * We must have at least one option to limit the retrying of failed
+ * transactions.
+ */
+ Assert(max_tries || max_tries_time);
+
+ /*
+ * We cannot retry the failure if we have reached the maximum number of
+ * tries.
+ */
+ if (max_tries && st->retries + 1 >= max_tries)
+ return false;
+
+ /*
+ * We cannot retry the failure if we spent too much time on this
+ * transaction.
+ */
+ if (max_tries_time)
+ {
+ if (INSTR_TIME_IS_ZERO(*now))
+ INSTR_TIME_SET_CURRENT(*now);
+
+ if (INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled >= max_tries_time)
+ return false;
+ }
+
+ /* OK */
+ return true;
+}
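(Aside, not part of the patch: the two limits enforced by canRetry() boil down to the following check. `can_retry` and the microsecond arguments are illustrative names only.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a failed transaction may be retried, given how many
 * retries were already made and how long the transaction has run.
 * A zero limit means "no limit of this kind", mirroring --max-tries
 * and --max-tries-time in the patch.
 */
static bool
can_retry(uint32_t retries, uint32_t max_tries,
		  uint64_t elapsed_us, uint64_t max_tries_us)
{
	/* the next try would exceed the maximum number of tries */
	if (max_tries && retries + 1 >= max_tries)
		return false;

	/* we have already spent too much time on this transaction */
	if (max_tries_us && elapsed_us >= max_tries_us)
		return false;

	return true;
}
```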
+
+/*
+ * Process the conditional stack depending on the condition value; used for
+ * the meta commands \if and \elif.
+ */
+static void
+executeCondition(CState *st, bool condition)
+{
+ Command *command = sql_script[st->use_file].commands[st->command];
+
+ /* execute or not depending on evaluated condition */
+ if (command->meta == META_IF)
+ {
+ conditional_stack_push(st->cstack,
+ condition ? IFSTATE_TRUE : IFSTATE_FALSE);
+ }
+ else if (command->meta == META_ELIF)
+ {
+ /* we should get here only if the "elif" needed evaluation */
+ Assert(conditional_stack_peek(st->cstack) == IFSTATE_FALSE);
+ conditional_stack_poke(st->cstack,
+ condition ? IFSTATE_TRUE : IFSTATE_FALSE);
+ }
+}
+
+/*
+ * Get the failure status from the error code.
+ */
+static FailureStatus
+getFailureStatus(char *sqlState)
+{
+ if (sqlState)
+ {
+ if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
+ return SERIALIZATION_FAILURE;
+ else if (strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
+ return DEADLOCK_FAILURE;
+ else if (strcmp(sqlState, ERRCODE_IN_FAILED_SQL_TRANSACTION) == 0)
+ return IN_FAILED_SQL_TRANSACTION;
+ }
+
+ return ANOTHER_FAILURE;
+}
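(Aside: the SQLSTATE values matched here are the standard PostgreSQL error codes 40001 (serialization_failure), 40P01 (deadlock_detected) and 25P02 (in_failed_sql_transaction). A standalone sketch of the classification, with illustrative names:)

```c
#include <assert.h>
#include <string.h>

/* SQLSTATE codes from the PostgreSQL errcodes list. */
#define SQLSTATE_SERIALIZATION_FAILURE	"40001"
#define SQLSTATE_DEADLOCK_DETECTED		"40P01"
#define SQLSTATE_IN_FAILED_SQL_TX		"25P02"

typedef enum
{
	SERIALIZATION_FAIL,
	DEADLOCK_FAIL,
	IN_FAILED_TX,
	OTHER_FAIL
} Failure;

/* Map a SQLSTATE string (possibly NULL) to a failure category. */
static Failure
classify_sqlstate(const char *sqlstate)
{
	if (sqlstate == NULL)
		return OTHER_FAIL;
	if (strcmp(sqlstate, SQLSTATE_SERIALIZATION_FAILURE) == 0)
		return SERIALIZATION_FAIL;
	if (strcmp(sqlstate, SQLSTATE_DEADLOCK_DETECTED) == 0)
		return DEADLOCK_FAIL;
	if (strcmp(sqlstate, SQLSTATE_IN_FAILED_SQL_TX) == 0)
		return IN_FAILED_TX;
	return OTHER_FAIL;
}
```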
+
+/*
* Advance the state machine of a connection, if possible.
*/
static void
@@ -2675,6 +3079,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
instr_time now;
bool end_tx_processed = false;
int64 wait;
+ FailureStatus failure_status = NO_FAILURE;
/*
* gettimeofday() isn't free, so we get the current timestamp lazily the
@@ -2705,7 +3110,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->use_file = chooseScript(thread);
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d executing script \"%s\"\n", st->id,
sql_script[st->use_file].desc);
@@ -2715,6 +3120,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->state = CSTATE_START_TX;
/* check consistency */
Assert(conditional_stack_empty(st->cstack));
+
+ /* reset transaction variables to default values */
+ st->first_failure.status = NO_FAILURE;
+ st->retries = 0;
+
break;
/*
@@ -2732,7 +3142,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* away.
*/
Assert(throttle_delay > 0);
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(&thread->random_state, throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
@@ -2762,16 +3172,17 @@ doCustom(TState *thread, CState *st, StatsData *agg)
INSTR_TIME_SET_CURRENT(now);
now_us = INSTR_TIME_GET_MICROSEC(now);
while (thread->throttle_trigger < now_us - latency_limit &&
- (nxacts <= 0 || st->cnt < nxacts))
+ (nxacts <= 0 || getTotalCnt(st) < nxacts))
{
processXactStats(thread, st, &now, true, agg);
/* next rendez-vous */
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(&thread->random_state,
+ throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
}
/* stop client if -t exceeded */
- if (nxacts > 0 && st->cnt >= nxacts)
+ if (nxacts > 0 && getTotalCnt(st) >= nxacts)
{
st->state = CSTATE_FINISHED;
break;
@@ -2779,7 +3190,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
st->state = CSTATE_THROTTLE;
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d throttling " INT64_FORMAT " us\n",
st->id, wait);
break;
@@ -2826,11 +3237,20 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
/*
- * Record transaction start time under logging, progress or
- * throttling.
+ * This is the first try to run this transaction. Remember its
+ * parameters in case it fails and we have to repeat it later.
+ */
+ copyRandomState(&st->retry_state.random_state,
+ &st->random_state);
+ copyVariables(&st->retry_state.variables, &st->variables);
+
+ /*
+ * Record transaction start time under logging, progress,
+ * throttling, or when a maximum time of tries is set.
*/
if (use_log || progress || throttle_delay || latency_limit ||
- per_script_stats)
+ per_script_stats || max_tries_time)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -2861,7 +3281,15 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
if (command == NULL)
{
- st->state = CSTATE_END_TX;
+ if (st->first_failure.status == NO_FAILURE)
+ {
+ st->state = CSTATE_END_TX;
+ }
+ else
+ {
+ /* check if we can retry the failure */
+ st->state = CSTATE_RETRY;
+ }
break;
}
@@ -2869,7 +3297,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Record statement start time if per-command latencies are
* requested
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -2880,7 +3308,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
{
if (!sendCommand(st, command))
{
- commandFailed(st, "SQL", "SQL command send failed");
+ commandFailed(st, "SQL", "SQL command send failed",
+ true);
st->state = CSTATE_ABORTED;
}
else
@@ -2892,7 +3321,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
i;
char **argv = command->argv;
- if (debug)
+ if (debug_level >= DEBUG_ALL)
{
fprintf(stderr, "client %d executing \\%s", st->id, argv[0]);
for (i = 1; i < argc; i++)
@@ -2900,6 +3329,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
fprintf(stderr, "\n");
}
+ /* will be changed below if the meta command fails */
+ failure_status = NO_FAILURE;
+
if (command->meta == META_SLEEP)
{
/*
@@ -2911,10 +3343,13 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
int usec;
- if (!evaluateSleep(st, argc, argv, &usec))
+ if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
- commandFailed(st, "sleep", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(st, "sleep",
+ "execution of meta-command failed",
+ false);
+ failure_status = ANOTHER_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
@@ -2942,35 +3377,37 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (!evaluateExpr(thread, st, expr, &result))
{
- commandFailed(st, argv[0], "evaluation of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(st, argv[0],
+ "evaluation of meta-command failed",
+ false);
+
+ /*
+ * Do not ruin the following conditional commands,
+ * if any.
+ */
+ executeCondition(st, false);
+
+ failure_status = ANOTHER_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
if (command->meta == META_SET)
{
- if (!putVariableValue(st, argv[0], argv[1], &result))
+ if (!putVariableValue(&st->variables, argv[0],
+ argv[1], &result, false))
{
- commandFailed(st, "set", "assignment of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(st, "set",
+ "assignment of meta-command failed",
+ false);
+ failure_status = ANOTHER_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
}
else /* if and elif evaluated cases */
{
- bool cond = valueTruth(&result);
-
- /* execute or not depending on evaluated condition */
- if (command->meta == META_IF)
- {
- conditional_stack_push(st->cstack, cond ? IFSTATE_TRUE : IFSTATE_FALSE);
- }
- else /* elif */
- {
- /* we should get here only if the "elif" needed evaluation */
- Assert(conditional_stack_peek(st->cstack) == IFSTATE_FALSE);
- conditional_stack_poke(st->cstack, cond ? IFSTATE_TRUE : IFSTATE_FALSE);
- }
+ executeCondition(st, valueTruth(&result));
}
}
else if (command->meta == META_ELSE)
@@ -2999,7 +3436,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (command->meta == META_SETSHELL)
{
- bool ret = runShellCommand(st, argv[1], argv + 2, argc - 2);
+ bool ret = runShellCommand(&st->variables,
+ argv[1], argv + 2,
+ argc - 2);
if (timer_exceeded) /* timeout */
{
@@ -3008,8 +3447,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (!ret) /* on error */
{
- commandFailed(st, "setshell", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(st, "setshell",
+ "execution of meta-command failed",
+ false);
+ failure_status = ANOTHER_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
else
@@ -3019,7 +3461,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (command->meta == META_SHELL)
{
- bool ret = runShellCommand(st, NULL, argv + 1, argc - 1);
+ bool ret = runShellCommand(&st->variables, NULL,
+ argv + 1, argc - 1);
if (timer_exceeded) /* timeout */
{
@@ -3028,8 +3471,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (!ret) /* on error */
{
- commandFailed(st, "shell", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(st, "shell",
+ "execution of meta-command failed",
+ false);
+ failure_status = ANOTHER_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
else
@@ -3134,37 +3580,55 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Wait for the current SQL command to complete
*/
case CSTATE_WAIT_RESULT:
- command = sql_script[st->use_file].commands[st->command];
- if (debug)
- fprintf(stderr, "client %d receiving\n", st->id);
- if (!PQconsumeInput(st->con))
- { /* there's something wrong */
- commandFailed(st, "SQL", "perhaps the backend died while processing");
- st->state = CSTATE_ABORTED;
- break;
- }
- if (PQisBusy(st->con))
- return; /* don't have the whole result yet */
-
- /*
- * Read and discard the query result;
- */
- res = PQgetResult(st->con);
- switch (PQresultStatus(res))
{
- case PGRES_COMMAND_OK:
- case PGRES_TUPLES_OK:
- case PGRES_EMPTY_QUERY:
- /* OK */
- PQclear(res);
- discard_response(st);
- st->state = CSTATE_END_COMMAND;
- break;
- default:
- commandFailed(st, "SQL", PQerrorMessage(st->con));
- PQclear(res);
+ char *sqlState;
+
+ command = sql_script[st->use_file].commands[st->command];
+ if (debug_level >= DEBUG_ALL)
+ fprintf(stderr, "client %d receiving\n", st->id);
+ if (!PQconsumeInput(st->con))
+ { /* there's something wrong */
+ commandFailed(st, "SQL",
+ "perhaps the backend died while processing",
+ true);
st->state = CSTATE_ABORTED;
break;
+ }
+ if (PQisBusy(st->con))
+ return; /* don't have the whole result yet */
+
+ /*
+ * Read and discard the query result.
+ */
+ res = PQgetResult(st->con);
+ sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+ switch (PQresultStatus(res))
+ {
+ case PGRES_COMMAND_OK:
+ case PGRES_TUPLES_OK:
+ case PGRES_EMPTY_QUERY:
+ /* OK */
+ PQclear(res);
+ discard_response(st);
+ failure_status = NO_FAILURE;
+ st->state = CSTATE_END_COMMAND;
+ break;
+ case PGRES_NONFATAL_ERROR:
+ case PGRES_FATAL_ERROR:
+ failure_status = getFailureStatus(sqlState);
+ commandFailed(st, "SQL", PQerrorMessage(st->con),
+ false);
+ PQclear(res);
+ discard_response(st);
+ st->state = CSTATE_FAILURE;
+ break;
+ default:
+ commandFailed(st, "SQL", PQerrorMessage(st->con),
+ true);
+ PQclear(res);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
}
break;
@@ -3193,7 +3657,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* in thread-local data structure, if per-command latencies
* are requested.
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -3212,6 +3676,139 @@ doCustom(TState *thread, CState *st, StatsData *agg)
break;
/*
+ * Remember the failure and go ahead with the next command.
+ */
+ case CSTATE_FAILURE:
+
+ Assert(failure_status != NO_FAILURE);
+
+ /*
+ * Whether the transaction is retried or reported as failed
+ * depends only on whether its first failure can be retried.
+ * Therefore remember only the first failure.
+ */
+ if (st->first_failure.status == NO_FAILURE)
+ {
+ st->first_failure.status = failure_status;
+ st->first_failure.command = st->command;
+ }
+
+ /* Go ahead with the next command, to be executed or skipped */
+ st->command++;
+ st->state = conditional_active(st->cstack) ?
+ CSTATE_START_COMMAND : CSTATE_SKIP_COMMAND;
+ break;
+
+ /*
+ * Retry the failed transaction if possible.
+ */
+ case CSTATE_RETRY:
+ {
+ double used_time = 0;
+
+ command = sql_script[st->use_file].commands[st->first_failure.command];
+
+ if (max_tries_time)
+ {
+ if (INSTR_TIME_IS_ZERO(now))
+ INSTR_TIME_SET_CURRENT(now);
+
+ used_time = (100.0 * (INSTR_TIME_GET_MICROSEC(now) -
+ st->txn_scheduled) / max_tries_time);
+ }
+
+ if (canRetry(st, &now))
+ {
+ /*
+ * The failed transaction will be retried, so count the
+ * retry.
+ */
+ st->retries++;
+ command->retries++;
+
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "client %d repeats the failed transaction (try %d",
+ st->id, st->retries + 1);
+
+ if (max_tries)
+ fprintf(stderr, "/%d", max_tries);
+
+ if (max_tries_time)
+ {
+ fprintf(stderr,
+ ", %.3f%% of the maximum time of tries was used",
+ used_time);
+ }
+
+ fprintf(stderr, ")\n");
+ }
+
+ /*
+ * Restore the execution parameters to their values at the
+ * beginning of the transaction.
+ */
+ copyRandomState(&st->random_state,
+ &st->retry_state.random_state);
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* Process the first transaction command */
+ st->command = 0;
+ st->first_failure.status = NO_FAILURE;
+ st->state = CSTATE_START_COMMAND;
+ }
+ else
+ {
+ /*
+ * We will not be able to retry this failed transaction, so
+ * count the error.
+ */
+ command->errors++;
+ if (st->first_failure.status == IN_FAILED_SQL_TRANSACTION)
+ command->errors_in_failed_tx++;
+
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "client %d ends the failed transaction (try %d",
+ st->id, st->retries + 1);
+
+ /*
+ * Report the actual number and/or time of
+ * tries. We do not need this information if this
+ * type of failure can never be retried.
+ */
+ if (canRetryFailure(st->first_failure.status))
+ {
+ if (max_tries)
+ fprintf(stderr, "/%d", max_tries);
+
+ if (max_tries_time)
+ {
+ fprintf(stderr,
+ ", %.3f%% of the maximum time of tries was used",
+ used_time);
+ }
+ }
+
+ fprintf(stderr, ")\n");
+ }
+
+ /*
+ * Restore the execution parameters to their values at the
+ * beginning of the transaction, except for the random
+ * state.
+ */
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* End the failed transaction */
+ st->state = CSTATE_END_TX;
+ }
+ }
+ break;
+
+ /*
* End of transaction.
*/
case CSTATE_END_TX:
@@ -3232,7 +3829,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
INSTR_TIME_SET_ZERO(now);
}
- if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
+ if ((getTotalCnt(st) >= nxacts && duration <= 0) ||
+ timer_exceeded)
{
/* exit success */
st->state = CSTATE_FINISHED;
@@ -3292,7 +3890,7 @@ doLog(TState *thread, CState *st,
* to the random sample.
*/
if (sample_rate != 0.0 &&
- pg_erand48(thread->random_state) > sample_rate)
+ pg_erand48(thread->random_state.data) > sample_rate)
return;
/* should we aggregate the results or not? */
@@ -3308,13 +3906,15 @@ doLog(TState *thread, CState *st,
while (agg->start_time + agg_interval <= now)
{
/* print aggregated report to logfile */
- fprintf(logfile, "%ld " INT64_FORMAT " %.0f %.0f %.0f %.0f",
+ fprintf(logfile, "%ld " INT64_FORMAT " %.0f %.0f %.0f %.0f " INT64_FORMAT " " INT64_FORMAT,
(long) agg->start_time,
agg->cnt,
agg->latency.sum,
agg->latency.sum2,
agg->latency.min,
- agg->latency.max);
+ agg->latency.max,
+ agg->errors,
+ agg->errors_in_failed_tx);
if (throttle_delay)
{
fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -3325,6 +3925,10 @@ doLog(TState *thread, CState *st,
if (latency_limit)
fprintf(logfile, " " INT64_FORMAT, agg->skipped);
}
+ if (max_tries > 1 || max_tries_time)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->retried,
+ agg->retries);
fputc('\n', logfile);
/* reset data and move to next interval */
@@ -3332,7 +3936,8 @@ doLog(TState *thread, CState *st,
}
/* accumulate the current transaction */
- accumStats(agg, skipped, latency, lag);
+ accumStats(agg, skipped, latency, lag, st->first_failure.status,
+ st->retries);
}
else
{
@@ -3342,14 +3947,25 @@ doLog(TState *thread, CState *st,
gettimeofday(&tv, NULL);
if (skipped)
fprintf(logfile, "%d " INT64_FORMAT " skipped %d %ld %ld",
- st->id, st->cnt, st->use_file,
+ st->id, getTotalCnt(st), st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
- else
+ else if (st->first_failure.status == NO_FAILURE)
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d %ld %ld",
- st->id, st->cnt, latency, st->use_file,
+ st->id, getTotalCnt(st), latency, st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+ else if (st->first_failure.status == IN_FAILED_SQL_TRANSACTION)
+ fprintf(logfile, "%d " INT64_FORMAT " in_failed_tx %d %ld %ld",
+ st->id, getTotalCnt(st), st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
+ else
+ fprintf(logfile, "%d " INT64_FORMAT " failed %d %ld %ld",
+ st->id, getTotalCnt(st), st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+
if (throttle_delay)
fprintf(logfile, " %.0f", lag);
+ if (max_tries > 1 || max_tries_time)
+ fprintf(logfile, " %d", st->retries);
fputc('\n', logfile);
}
}
@@ -3369,7 +3985,7 @@ processXactStats(TState *thread, CState *st, instr_time *now,
bool thread_details = progress || throttle_delay || latency_limit,
detailed = thread_details || use_log || per_script_stats;
- if (detailed && !skipped)
+ if (detailed && !skipped && st->first_failure.status == NO_FAILURE)
{
if (INSTR_TIME_IS_ZERO(*now))
INSTR_TIME_SET_CURRENT(*now);
@@ -3382,7 +3998,8 @@ processXactStats(TState *thread, CState *st, instr_time *now,
if (thread_details)
{
/* keep detailed thread stats */
- accumStats(&thread->stats, skipped, latency, lag);
+ accumStats(&thread->stats, skipped, latency, lag,
+ st->first_failure.status, st->retries);
/* count transactions over the latency limit, if needed */
if (latency_limit && latency > latency_limit)
@@ -3390,19 +4007,24 @@ processXactStats(TState *thread, CState *st, instr_time *now,
}
else
{
- /* no detailed stats, just count */
- thread->stats.cnt++;
+ /* no detailed stats */
+ accumStats(&thread->stats, skipped, 0, 0, st->first_failure.status,
+ st->retries);
}
/* client stat is just counting */
- st->cnt++;
+ if (st->first_failure.status == NO_FAILURE)
+ st->cnt++;
+ else
+ st->ecnt++;
if (use_log)
doLog(thread, st, agg, skipped, latency, lag);
/* XXX could use a mutex here, but we choose not to */
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+ st->first_failure.status, st->retries);
}
@@ -4535,7 +5157,8 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
double time_include,
tps_include,
tps_exclude;
- int64 ntx = total->cnt - total->skipped;
+ int64 ntx = total->cnt - total->skipped,
+ total_ntx = total->cnt + total->errors;
int i,
totalCacheOverflows = 0;
@@ -4556,8 +5179,8 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (duration <= 0)
{
printf("number of transactions per client: %d\n", nxacts);
- printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
- ntx, nxacts * nclients);
+ printf("number of transactions actually processed: " INT64_FORMAT "/" INT64_FORMAT "\n",
+ ntx, total_ntx);
}
else
{
@@ -4565,6 +5188,32 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printf("number of transactions actually processed: " INT64_FORMAT "\n",
ntx);
}
+
+ if (total->errors > 0)
+ printf("number of errors: " INT64_FORMAT " (%.3f%%)\n",
+ total->errors, 100.0 * total->errors / total_ntx);
+
+ if (total->errors_in_failed_tx > 0)
+ printf("number of errors \"in failed SQL transaction\": " INT64_FORMAT " (%.3f%%)\n",
+ total->errors_in_failed_tx,
+ 100.0 * total->errors_in_failed_tx / total_ntx);
+
+ /*
+ * The retried count can be non-zero only if max_tries is greater
+ * than one or max_tries_time is used.
+ */
+ if (total->retried > 0)
+ {
+ printf("number of retried: " INT64_FORMAT " (%.3f%%)\n",
+ total->retried, 100.0 * total->retried / total_ntx);
+ printf("number of retries: " INT64_FORMAT "\n", total->retries);
+ }
+
+ if (max_tries)
+ printf("maximum number of tries: %d\n", max_tries);
+ if (max_tries_time)
+ printf("maximum time of tries: %.1f ms\n", max_tries_time / 1000.0);
+
/* Report zipfian cache overflow */
for (i = 0; i < nthreads; i++)
{
@@ -4594,8 +5243,14 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
else
{
/* no measurement, show average latency computed from run time */
- printf("latency average = %.3f ms\n",
- 1000.0 * time_include * nclients / total->cnt);
+ printf("latency average = %.3f ms",
+ 1000.0 * time_include * nclients / total_ntx);
+
+ /* this statistic includes both successful and failed transactions */
+ if (total->errors > 0)
+ printf(" (including errors)");
+
+ printf("\n");
}
if (throttle_delay)
@@ -4614,7 +5269,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printf("tps = %f (excluding connections establishing)\n", tps_exclude);
/* Report per-script/command statistics */
- if (per_script_stats || is_latencies)
+ if (per_script_stats || report_per_command)
{
int i;
@@ -4623,6 +5278,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (per_script_stats)
{
StatsData *sstats = &sql_script[i].stats;
+ int64 script_total_ntx = sstats->cnt + sstats->errors;
printf("SQL script %d: %s\n"
" - weight: %d (targets %.1f%% of total)\n"
@@ -4631,9 +5287,33 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
sql_script[i].weight,
100.0 * sql_script[i].weight / total_weight,
sstats->cnt,
- 100.0 * sstats->cnt / total->cnt,
+ 100.0 * sstats->cnt / script_total_ntx,
(sstats->cnt - sstats->skipped) / time_include);
+ if (total->errors > 0)
+ printf(" - number of errors: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->errors,
+ 100.0 * sstats->errors / script_total_ntx);
+
+ if (total->errors_in_failed_tx > 0)
+ printf(" - number of errors \"in failed SQL transaction\": " INT64_FORMAT " (%.3f%%)\n",
+ sstats->errors_in_failed_tx,
+ (100.0 * sstats->errors_in_failed_tx /
+ script_total_ntx));
+
+ /*
+ * The retried count can be non-zero only if max_tries is greater
+ * than one or max_tries_time is used.
+ */
+ if (total->retried > 0)
+ {
+ printf(" - number of retried: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->retried,
+ 100.0 * sstats->retried / script_total_ntx);
+ printf(" - number of retries: " INT64_FORMAT "\n",
+ sstats->retries);
+ }
+
if (throttle_delay && latency_limit && sstats->cnt > 0)
printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
sstats->skipped,
@@ -4642,15 +5322,33 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printSimpleStats(" - latency", &sstats->latency);
}
- /* Report per-command latencies */
- if (is_latencies)
+ /* Report per-command latencies and errors */
+ if (report_per_command)
{
Command **commands;
if (per_script_stats)
- printf(" - statement latencies in milliseconds:\n");
+ printf(" - statement latencies in milliseconds");
else
- printf("statement latencies in milliseconds:\n");
+ printf("statement latencies in milliseconds");
+
+ if (total->errors > 0)
+ {
+ printf("%s errors",
+ ((total->errors_in_failed_tx == 0 &&
+ total->retried == 0) ?
+ " and" : ","));
+ }
+ if (total->errors_in_failed_tx > 0)
+ {
+ printf("%s errors \"in failed SQL transaction\"",
+ total->retried == 0 ? " and" : ",");
+ }
+ if (total->retried > 0)
+ {
+ printf(" and retries");
+ }
+ printf(":\n");
for (commands = sql_script[i].commands;
*commands != NULL;
@@ -4658,10 +5356,25 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
{
SimpleStats *cstats = &(*commands)->stats;
- printf(" %11.3f %s\n",
+ printf(" %11.3f",
(cstats->count > 0) ?
- 1000.0 * cstats->sum / cstats->count : 0.0,
- (*commands)->line);
+ 1000.0 * cstats->sum / cstats->count : 0.0);
+ if (total->errors > 0)
+ {
+ printf(" %20" INT64_MODIFIER "d",
+ (*commands)->errors);
+ }
+ if (total->errors_in_failed_tx > 0)
+ {
+ printf(" %20" INT64_MODIFIER "d",
+ (*commands)->errors_in_failed_tx);
+ }
+ if (total->retried > 0)
+ {
+ printf(" %20" INT64_MODIFIER "d",
+ (*commands)->retries);
+ }
+ printf(" %s\n", (*commands)->line);
}
}
}
@@ -4716,6 +5429,17 @@ set_random_seed(const char *seed)
return true;
}
+/*
+ * Initialize the random state of the client/thread.
+ */
+static void
+initRandomState(RandomState *random_state)
+{
+ random_state->data[0] = random();
+ random_state->data[1] = random();
+ random_state->data[2] = random();
+}
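(Aside: giving each client its own three-word erand48() state is what makes retries reproducible -- restoring a saved copy of the state replays the identical random sequence. A standalone sketch with illustrative names, assuming a POSIX erand48()/random():)

```c
#define _XOPEN_SOURCE 700		/* expose erand48()/random() on POSIX */

#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for the patch's RandomState. */
typedef struct
{
	unsigned short data[3];
} RState;

/* Seed a per-client erand48() state, as initRandomState() does. */
static void
init_rstate(RState *rs)
{
	rs->data[0] = (unsigned short) random();
	rs->data[1] = (unsigned short) random();
	rs->data[2] = (unsigned short) random();
}
```

Because erand48() advances the state array in place, a plain struct copy of `RState` is all that is needed to save and later replay a client's random stream.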
+
int
main(int argc, char **argv)
@@ -4725,7 +5449,7 @@ main(int argc, char **argv)
{"builtin", required_argument, NULL, 'b'},
{"client", required_argument, NULL, 'c'},
{"connect", no_argument, NULL, 'C'},
- {"debug", no_argument, NULL, 'd'},
+ {"debug", required_argument, NULL, 'd'},
{"define", required_argument, NULL, 'D'},
{"file", required_argument, NULL, 'f'},
{"fillfactor", required_argument, NULL, 'F'},
@@ -4740,7 +5464,7 @@ main(int argc, char **argv)
{"progress", required_argument, NULL, 'P'},
{"protocol", required_argument, NULL, 'M'},
{"quiet", no_argument, NULL, 'q'},
- {"report-latencies", no_argument, NULL, 'r'},
+ {"report-per-command", no_argument, NULL, 'r'},
{"rate", required_argument, NULL, 'R'},
{"scale", required_argument, NULL, 's'},
{"select-only", no_argument, NULL, 'S'},
@@ -4759,6 +5483,8 @@ main(int argc, char **argv)
{"log-prefix", required_argument, NULL, 7},
{"foreign-keys", no_argument, NULL, 8},
{"random-seed", required_argument, NULL, 9},
+ {"max-tries", required_argument, NULL, 10},
+ {"max-tries-time", required_argument, NULL, 11},
{NULL, 0, NULL, 0}
};
@@ -4834,7 +5560,7 @@ main(int argc, char **argv)
exit(1);
}
- while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
+ while ((c = getopt_long(argc, argv, "iI:h:nvp:d:qb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
{
char *script;
@@ -4864,8 +5590,22 @@ main(int argc, char **argv)
pgport = pg_strdup(optarg);
break;
case 'd':
- debug++;
- break;
+ {
+ for (debug_level = 0;
+ debug_level < NUM_DEBUGLEVEL;
+ debug_level++)
+ {
+ if (strcmp(optarg, DEBUGLEVEL[debug_level]) == 0)
+ break;
+ }
+ if (debug_level >= NUM_DEBUGLEVEL)
+ {
+ fprintf(stderr, "invalid debug level (-d): \"%s\"\n",
+ optarg);
+ exit(1);
+ }
+ break;
+ }
case 'c':
benchmarking_option_set = true;
nclients = atoi(optarg);
@@ -4917,7 +5657,7 @@ main(int argc, char **argv)
break;
case 'r':
benchmarking_option_set = true;
- is_latencies = true;
+ report_per_command = true;
break;
case 's':
scale_given = true;
@@ -4998,7 +5738,7 @@ main(int argc, char **argv)
}
*p++ = '\0';
- if (!putVariable(&state[0], "option", optarg, p))
+ if (!putVariable(&state[0].variables, "option", optarg, p))
exit(1);
}
break;
@@ -5114,6 +5854,34 @@ main(int argc, char **argv)
exit(1);
}
break;
+ case 10: /* max-tries */
+ {
+ int32 max_tries_arg = atoi(optarg);
+
+ if (max_tries_arg <= 0)
+ {
+ fprintf(stderr, "invalid maximum number of tries: \"%s\"\n",
+ optarg);
+ exit(1);
+ }
+ benchmarking_option_set = true;
+ max_tries = (uint32) max_tries_arg;
+ }
+ break;
+ case 11: /* max-tries-time */
+ {
+ double max_tries_time_ms = atof(optarg);
+
+ if (max_tries_time_ms <= 0.0)
+ {
+ fprintf(stderr, "invalid maximum time of tries: \"%s\"\n",
+ optarg);
+ exit(1);
+ }
+ benchmarking_option_set = true;
+ max_tries_time = (uint64) (max_tries_time_ms * 1000);
+ }
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
@@ -5283,6 +6051,10 @@ main(int argc, char **argv)
exit(1);
}
+ /* If necessary set the default tries limit */
+ if (!max_tries && !max_tries_time)
+ max_tries = 1;
+
/*
* save main process id in the global variable because process id will be
* changed after fork.
@@ -5300,19 +6072,19 @@ main(int argc, char **argv)
int j;
state[i].id = i;
- for (j = 0; j < state[0].nvariables; j++)
+ for (j = 0; j < state[0].variables.nvariables; j++)
{
- Variable *var = &state[0].variables[j];
+ Variable *var = &state[0].variables.array[j];
if (var->value.type != PGBT_NO_VALUE)
{
- if (!putVariableValue(&state[i], "startup",
- var->name, &var->value))
+ if (!putVariableValue(&state[i].variables, "startup",
+ var->name, &var->value, true))
exit(1);
}
else
{
- if (!putVariable(&state[i], "startup",
+ if (!putVariable(&state[i].variables, "startup",
var->name, var->svalue))
exit(1);
}
@@ -5324,9 +6096,10 @@ main(int argc, char **argv)
for (i = 0; i < nclients; i++)
{
state[i].cstack = conditional_stack_create();
+ initRandomState(&state[i].random_state);
}
- if (debug)
+ if (debug_level >= DEBUG_ALL)
{
if (duration <= 0)
printf("pghost: %s pgport: %s nclients: %d nxacts: %d dbName: %s\n",
@@ -5387,11 +6160,12 @@ main(int argc, char **argv)
* :scale variables normally get -s or database scale, but don't override
* an explicit -D switch
*/
- if (lookupVariable(&state[0], "scale") == NULL)
+ if (lookupVariable(&state[0].variables, "scale") == NULL)
{
for (i = 0; i < nclients; i++)
{
- if (!putVariableInt(&state[i], "startup", "scale", scale))
+ if (!putVariableInt(&state[i].variables, "startup", "scale", scale,
+ true))
exit(1);
}
}
@@ -5400,15 +6174,18 @@ main(int argc, char **argv)
* Define a :client_id variable that is unique per connection. But don't
* override an explicit -D switch.
*/
- if (lookupVariable(&state[0], "client_id") == NULL)
+ if (lookupVariable(&state[0].variables, "client_id") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "client_id", i))
+ {
+ if (!putVariableInt(&state[i].variables, "startup", "client_id", i,
+ true))
exit(1);
+ }
}
/* set default seed for hash functions */
- if (lookupVariable(&state[0], "default_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "default_seed") == NULL)
{
uint64 seed = ((uint64) (random() & 0xFFFF) << 48) |
((uint64) (random() & 0xFFFF) << 32) |
@@ -5416,15 +6193,17 @@ main(int argc, char **argv)
(uint64) (random() & 0xFFFF);
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "default_seed", (int64) seed))
+ if (!putVariableInt(&state[i].variables, "startup", "default_seed",
+ (int64) seed, true))
exit(1);
}
/* set random seed unless overwritten */
- if (lookupVariable(&state[0], "random_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "random_seed") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+ if (!putVariableInt(&state[i].variables, "startup", "random_seed",
+ random_seed, true))
exit(1);
}
@@ -5457,9 +6236,7 @@ main(int argc, char **argv)
thread->state = &state[nclients_dealt];
thread->nstate =
(nclients - nclients_dealt + nthreads - i - 1) / (nthreads - i);
- thread->random_state[0] = random();
- thread->random_state[1] = random();
- thread->random_state[2] = random();
+ initRandomState(&thread->random_state);
thread->logfile = NULL; /* filled in later */
thread->latency_late = 0;
thread->zipf_cache.nb_cells = 0;
@@ -5541,6 +6318,10 @@ main(int argc, char **argv)
mergeSimpleStats(&stats.lag, &thread->stats.lag);
stats.cnt += thread->stats.cnt;
stats.skipped += thread->stats.skipped;
+ stats.retries += thread->stats.retries;
+ stats.retried += thread->stats.retried;
+ stats.errors += thread->stats.errors;
+ stats.errors_in_failed_tx += thread->stats.errors_in_failed_tx;
latency_late += thread->latency_late;
INSTR_TIME_ADD(conn_total_time, thread->conn_time);
}
@@ -5825,7 +6606,11 @@ threadRun(void *arg)
/* generate and show report */
StatsData cur;
int64 run = now - last_report,
- ntx;
+ ntx,
+ retries,
+ retried,
+ errors,
+ errors_in_failed_tx;
double tps,
total_run,
latency,
@@ -5852,6 +6637,11 @@ threadRun(void *arg)
mergeSimpleStats(&cur.lag, &thread[i].stats.lag);
cur.cnt += thread[i].stats.cnt;
cur.skipped += thread[i].stats.skipped;
+ cur.retries += thread[i].stats.retries;
+ cur.retried += thread[i].stats.retried;
+ cur.errors += thread[i].stats.errors;
+ cur.errors_in_failed_tx +=
+ thread[i].stats.errors_in_failed_tx;
}
/* we count only actually executed transactions */
@@ -5869,6 +6659,11 @@ threadRun(void *arg)
{
latency = sqlat = stdev = lag = 0;
}
+ retries = cur.retries - last.retries;
+ retried = cur.retried - last.retried;
+ errors = cur.errors - last.errors;
+ errors_in_failed_tx = cur.errors_in_failed_tx -
+ last.errors_in_failed_tx;
if (progress_timestamp)
{
@@ -5894,6 +6689,14 @@ threadRun(void *arg)
"progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
tbuf, tps, latency, stdev);
+ if (errors > 0)
+ {
+ fprintf(stderr, ", " INT64_FORMAT " failed" , errors);
+ if (errors_in_failed_tx > 0)
+ fprintf(stderr, " (" INT64_FORMAT " in failed tx)",
+ errors_in_failed_tx);
+ }
+
if (throttle_delay)
{
fprintf(stderr, ", lag %.3f ms", lag);
@@ -5901,6 +6704,16 @@ threadRun(void *arg)
fprintf(stderr, ", " INT64_FORMAT " skipped",
cur.skipped - last.skipped);
}
+
+ /*
+ * It can be non-zero only if max_tries is greater than one or
+ * max_tries_time is used.
+ */
+ if (retried > 0)
+ {
+ fprintf(stderr, ", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
+ retried, retries);
+ }
fprintf(stderr, "\n");
last = cur;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index be08b20..96b3876 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -118,23 +118,28 @@ pgbench(
[ qr{builtin: TPC-B},
qr{clients: 2\b},
qr{processed: 10/10},
- qr{mode: simple} ],
+ qr{mode: simple},
+ qr{maximum number of tries: 1},
+ qr{^((?!maximum time of tries)(.|\n))*$} ],
[qr{^$}],
'pgbench tpcb-like');
pgbench(
-'--transactions=20 --client=5 -M extended --builtin=si -C --no-vacuum -s 1',
+'--transactions=20 --client=5 -M extended --builtin=si -C --no-vacuum -s 1'
+ . ' --max-tries-time 1', # no-op, just for testing
0,
[ qr{builtin: simple update},
qr{clients: 5\b},
qr{threads: 1\b},
qr{processed: 100/100},
- qr{mode: extended} ],
+ qr{mode: extended},
+ qr{maximum time of tries: 1},
+ qr{^((?!maximum number of tries)(.|\n))*$} ],
[qr{scale option ignored}],
'pgbench simple update');
pgbench(
- '-t 100 -c 7 -M prepared -b se --debug',
+ '-t 100 -c 7 -M prepared -b se --debug all',
0,
[ qr{builtin: select only},
qr{clients: 7\b},
@@ -491,6 +496,10 @@ my @errors = (
\set i 0
SELECT LEAST(:i, :i, :i, :i, :i, :i, :i, :i, :i, :i, :i);
} ],
+ [ 'sql division by zero', 0, [qr{ERROR: division by zero}],
+ q{-- SQL division by zero
+ SELECT 1 / 0;
+} ],
# SHELL
[ 'shell bad command', 0,
@@ -621,6 +630,16 @@ SELECT LEAST(:i, :i, :i, :i, :i, :i, :i, :i, :i, :i, :i);
[ 'sleep unknown unit', 1,
[qr{unrecognized time unit}], q{\sleep 1 week} ],
+ # CONDITIONAL BLOCKS
+ [ 'if elif failed conditions', 0,
+ [qr{division by zero}],
+ q{-- failed conditions
+\if 1 / 0
+\elif 1 / 0
+\else
+\endif
+} ],
+
# MISC
[ 'misc invalid backslash command', 1,
[qr{invalid command .* "nosuchcommand"}], q{\nosuchcommand} ],
@@ -635,14 +654,32 @@ for my $e (@errors)
my $n = '001_pgbench_error_' . $name;
$n =~ s/ /_/g;
pgbench(
- '-n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX -M prepared',
+ '-n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX -M prepared -d fails',
$status,
- [ $status ? qr{^$} : qr{processed: 0/1} ],
+ ($status ?
+ [ qr{^$} ] :
+ [ qr{processed: 0/1}, qr{number of errors: 1 \(100.000%\)},
+ qr{^((?!number of retried)(.|\n))*$} ]),
$re,
'pgbench script error: ' . $name,
{ $n => $script });
}
+# reset client variables in case of failure
+pgbench(
+ '-n -t 2 -d fails', 0,
+ [ qr{processed: 0/2}, qr{number of errors: 2 \(100.000%\)},
+ qr{^((?!number of retried)(.|\n))*$} ],
+ [ qr{(client 0 got a failure in command 1 \(SQL\) of script 0; ERROR: syntax error at or near ":"(.|\n)*){2}} ],
+ 'pgbench reset client variables in case of failure',
+ { '001_pgbench_reset_client_variables' => q{
+BEGIN;
+-- select an unassigned variable
+SELECT :unassigned_var;
+\set unassigned_var 1
+END;
+} });
+
# zipfian cache array overflow
pgbench(
'-t 1', 0,
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index af21f04..e6886a7 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -57,7 +57,7 @@ my @options = (
# name, options, stderr checks
[ 'bad option',
- '-h home -p 5432 -U calvin -d --bad-option',
+ '-h home -p 5432 -U calvin -d all --bad-option',
[ qr{(unrecognized|illegal) option}, qr{--help.*more information} ] ],
[ 'no file',
'-f no-such-file',
@@ -113,6 +113,10 @@ my @options = (
[ 'bad random seed', '--random-seed=one',
[qr{unrecognized random seed option "one": expecting an unsigned integer, "time" or "rand"},
qr{error while setting random seed from --random-seed option} ] ],
+ [ 'bad maximum number of tries', '--max-tries -10',
+ [qr{invalid number of maximum tries: "-10"} ] ],
+ [ 'bad maximum time of tries', '--max-tries-time -10',
+ [qr{invalid maximum time of tries: "-10"} ] ],
# loging sub-options
[ 'sampling => log', '--sampling-rate=0.01',
diff --git a/src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl b/src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl
new file mode 100644
index 0000000..5660ddd
--- /dev/null
+++ b/src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl
@@ -0,0 +1,739 @@
+use strict;
+use warnings;
+
+use Config;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 32;
+
+use constant
+{
+ READ_COMMITTED => 0,
+ REPEATABLE_READ => 1,
+ SERIALIZABLE => 2,
+};
+
+my @isolation_level_shell = (
+ 'read\\ committed',
+ 'repeatable\\ read',
+ 'serializable');
+
+# The keys of advisory locks for testing deadlock failures:
+use constant
+{
+ DEADLOCK_1 => 3,
+ WAIT_PGBENCH_2 => 4,
+ DEADLOCK_2 => 5,
+ TRANSACTION_ENDS_1 => 6,
+ TRANSACTION_ENDS_2 => 7,
+};
+
+# Test concurrent update in table row.
+my $node = get_new_node('main');
+$node->init;
+$node->start;
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2), (2, 3);');
+
+my $script_serialization = $node->basedir . '/pgbench_script_serialization';
+append_to_file($script_serialization,
+ "\\set delta random(-5000, 5000)\n"
+ . "BEGIN;\n"
+ . "SELECT pg_sleep(1);\n"
+ . "UPDATE xy SET y = y + :delta "
+ . "WHERE x = 1 AND pg_advisory_lock(0) IS NOT NULL;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "END;\n");
+
+my $script_deadlocks1 = $node->basedir . '/pgbench_script_deadlocks1';
+append_to_file($script_deadlocks1,
+ "BEGIN;\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_1 . ");\n"
+ . "SELECT pg_advisory_lock(" . WAIT_PGBENCH_2 . ");\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_2 . ");\n"
+ . "END;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "SELECT pg_advisory_lock(" . TRANSACTION_ENDS_1 . ");\n"
+ . "SELECT pg_advisory_unlock_all();");
+
+my $script_deadlocks2 = $node->basedir . '/pgbench_script_deadlocks2';
+append_to_file($script_deadlocks2,
+ "BEGIN;\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_2 . ");\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_1 . ");\n"
+ . "END;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "SELECT pg_advisory_lock(" . TRANSACTION_ENDS_2 . ");\n"
+ . "SELECT pg_advisory_unlock_all();");
+
+sub test_pgbench_serialization_errors
+{
+ my ($max_tries, $max_tries_time, $test_name) = @_;
+
+ my $isolation_level = REPEATABLE_READ;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $err_pgbench);
+
+ # Open a psql session, run a parallel transaction and acquire an advisory
+ # lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql = "begin;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /BEGIN/;
+
+ $in_psql =
+ "update xy set y = y + 1 "
+ . "where x = 1 and pg_advisory_lock(0) is not null;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /UPDATE 1/;
+
+ my $retry_options =
+ ($max_tries ? "--max-tries $max_tries" : "")
+ . ($max_tries_time ? "--max-tries-time $max_tries_time" : "");
+
+ # Start pgbench:
+ my @command = (
+ qw(pgbench --no-vacuum --transactions 2 --debug fails --file),
+ $script_serialization,
+ split /\s+/, $retry_options);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$err_pgbench;
+
+ # Wait until pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select * from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = 0::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /1 row/);
+
+ # In psql, commit the transaction, release advisory locks and end the
+ # session:
+ $in_psql = "end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "select pg_advisory_unlock_all();\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+
+ like($out_pgbench,
+ qr{processed: 1/2},
+ "$test_name: check processed transactions");
+
+ like($out_pgbench,
+ qr{number of errors: 1 \(50\.000%\)},
+ "$test_name: check errors");
+
+ like($out_pgbench,
+ qr{^((?!number of retried)(.|\n))*$},
+ "$test_name: check retried");
+
+ like($out_pgbench,
+ qr{latency average = \d+\.\d{3} ms \(including errors\)},
+ "$test_name: check latency average");
+
+ my $pattern =
+ "client 0 got a failure in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update";
+
+ like($err_pgbench,
+ qr{$pattern},
+ "$test_name: check serialization failure");
+}
+
+sub test_pgbench_serialization_failures
+{
+ my $isolation_level = REPEATABLE_READ;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $err_pgbench);
+
+ # Open a psql session, run a parallel transaction and acquire an advisory
+ # lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql = "begin;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /BEGIN/;
+
+ $in_psql =
+ "update xy set y = y + 1 "
+ . "where x = 1 and pg_advisory_lock(0) is not null;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /UPDATE 1/;
+
+ # Start pgbench:
+ my @command = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --max-tries 2),
+ "--file",
+ $script_serialization);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$err_pgbench;
+
+ # Wait until pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select * from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = 0::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /1 row/);
+
+ # In psql, commit the transaction, release advisory locks and end the
+ # session:
+ $in_psql = "end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "select pg_advisory_unlock_all();\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+
+ like($out_pgbench,
+ qr{processed: 1/1},
+ "concurrent update with retrying: check processed transactions");
+
+ like($out_pgbench,
+ qr{^((?!number of errors)(.|\n))*$},
+ "concurrent update with retrying: check errors");
+
+ like($out_pgbench,
+ qr{number of retried: 1 \(100\.000%\)},
+ "concurrent update with retrying: check retried");
+
+ like($out_pgbench,
+ qr{number of retries: 1},
+ "concurrent update with retrying: check retries");
+
+ like($out_pgbench,
+ qr{latency average = \d+\.\d{3} ms\n},
+ "concurrent update with retrying: check latency average");
+
+ my $pattern =
+ "client 0 sending UPDATE xy SET y = y \\+ (-?\\d+) "
+ . "WHERE x = 1 AND pg_advisory_lock\\(0\\) IS NOT NULL;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 got a failure in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update\n\n"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g2+"
+ . "client 0 continues a failed transaction in command 4 \\(SQL\\) of script 0; "
+ . "ERROR: current transaction is aborted, commands ignored until end of transaction block\n\n"
+ . "client 0 sending END;\n"
+ . "\\g2+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 executing \\\\set delta\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g2+"
+ . "client 0 sending SELECT pg_sleep\\(1\\);\n"
+ . "\\g2+"
+ . "client 0 sending UPDATE xy SET y = y \\+ \\g1 "
+ . "WHERE x = 1 AND pg_advisory_lock\\(0\\) IS NOT NULL;";
+
+ like($err_pgbench,
+ qr{$pattern},
+ "concurrent update with retrying: check the retried transaction");
+}
+
+sub test_pgbench_deadlock_errors
+{
+ my $isolation_level = READ_COMMITTED;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h1, $in1, $out1, $err1);
+ my ($h2, $in2, $out2, $err2);
+
+ # Open a psql session and acquire an advisory lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql =
+ "select pg_advisory_lock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_lock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ # Run the first pgbench:
+ my @command1 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug fails --file),
+ $script_deadlocks1);
+ print "# Running: " . join(" ", @command1) . "\n";
+ $h1 = IPC::Run::start \@command1, \$in1, \$out1, \$err1;
+
+ # Wait until the first pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . WAIT_PGBENCH_2 . "_zero' "
+ . "else '" . WAIT_PGBENCH_2 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . WAIT_PGBENCH_2
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ WAIT_PGBENCH_2 ]}_not_zero/);
+
+ # Run the second pgbench:
+ my @command2 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug fails --file),
+ $script_deadlocks2);
+ print "# Running: " . join(" ", @command2) . "\n";
+ $h2 = IPC::Run::start \@command2, \$in2, \$out2, \$err2;
+
+ # Wait until the second pgbench tries to acquire the lock held by the first
+ # pgbench:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . DEADLOCK_1 . "_zero' "
+ . "else '" . DEADLOCK_1 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . DEADLOCK_1
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ DEADLOCK_1 ]}_not_zero/);
+
+ # In the psql session, release the lock that the first pgbench is waiting
+ # for and end the session:
+ $in_psql =
+ "select pg_advisory_unlock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_unlock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get results from all pgbenches:
+ $h1->pump() until length $out1;
+ $h1->finish();
+
+ $h2->pump() until length $out2;
+ $h2->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result1 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h1->full_results)[0]
+ : $h1->result(0);
+
+ my $result2 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h2->full_results)[0]
+ : $h2->result(0);
+
+ # Check all pgbench results
+ ok(!$result1, "@command1 exit code 0");
+ ok(!$result2, "@command2 exit code 0");
+
+ # The first or second pgbench should get a deadlock error
+ ok((($out1 =~ /processed: 0\/1/ and $out2 =~ /processed: 1\/1/) or
+ ($out2 =~ /processed: 0\/1/ and $out1 =~ /processed: 1\/1/)),
+ "concurrent deadlock update: check processed transactions");
+
+ ok((($out1 =~ /number of errors: 1 \(100\.000%\)/ and
+ $out2 =~ /^((?!number of errors)(.|\n))*$/) or
+ ($out2 =~ /number of errors: 1 \(100\.000%\)/ and
+ $out1 =~ /^((?!number of errors)(.|\n))*$/)),
+ "concurrent deadlock update: check errors");
+
+ ok(($err1 =~ /client 0 got a failure in command 3 \(SQL\) of script 0; ERROR: deadlock detected/ or
+ $err2 =~ /client 0 got a failure in command 2 \(SQL\) of script 0; ERROR: deadlock detected/),
+ "concurrent deadlock update: check deadlock failure");
+
+ # Both pgbenches do not have retried transactions
+ like($out1 . $out2,
+ qr{^((?!number of retried)(.|\n))*$},
+ "concurrent deadlock update: check retried");
+}
+
+sub test_pgbench_deadlock_failures
+{
+ my $isolation_level = READ_COMMITTED;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h1, $in1, $out1, $err1);
+ my ($h2, $in2, $out2, $err2);
+
+ # Open a psql session and acquire an advisory lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql =
+ "select pg_advisory_lock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_lock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ # Run the first pgbench:
+ my @command1 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --max-tries 2),
+ "--file",
+ $script_deadlocks1);
+ print "# Running: " . join(" ", @command1) . "\n";
+ $h1 = IPC::Run::start \@command1, \$in1, \$out1, \$err1;
+
+ # Wait until the first pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . WAIT_PGBENCH_2 . "_zero' "
+ . "else '" . WAIT_PGBENCH_2 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . WAIT_PGBENCH_2
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ WAIT_PGBENCH_2 ]}_not_zero/);
+
+ # Run the second pgbench:
+ my @command2 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --max-tries 2),
+ "--file",
+ $script_deadlocks2);
+ print "# Running: " . join(" ", @command2) . "\n";
+ $h2 = IPC::Run::start \@command2, \$in2, \$out2, \$err2;
+
+ # Wait until the second pgbench tries to acquire the lock held by the first
+ # pgbench:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . DEADLOCK_1 . "_zero' "
+ . "else '" . DEADLOCK_1 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . DEADLOCK_1
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ DEADLOCK_1 ]}_not_zero/);
+
+ # In the psql session, acquire the locks that pgbenches will wait for:
+ $in_psql =
+ "select pg_advisory_lock(" . TRANSACTION_ENDS_1 . ") "
+ . "as pg_advisory_lock_" . TRANSACTION_ENDS_1 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ TRANSACTION_ENDS_1 ]}/;
+
+ $in_psql =
+ "select pg_advisory_lock(" . TRANSACTION_ENDS_2 . ") "
+ . "as pg_advisory_lock_" . TRANSACTION_ENDS_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ TRANSACTION_ENDS_2 ]}/;
+
+ # In the psql session, release the lock that the first pgbench is waiting
+ # for:
+ $in_psql =
+ "select pg_advisory_unlock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_unlock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ # Wait until pgbenches try to acquire the locks held by the psql session:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . TRANSACTION_ENDS_1 . "_zero' "
+ . "else '" . TRANSACTION_ENDS_1 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . TRANSACTION_ENDS_1
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ TRANSACTION_ENDS_1 ]}_not_zero/);
+
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . TRANSACTION_ENDS_2 . "_zero' "
+ . "else '" . TRANSACTION_ENDS_2 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . TRANSACTION_ENDS_2
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ TRANSACTION_ENDS_2 ]}_not_zero/);
+
+ # In the psql session, release advisory locks and end the session:
+ $in_psql = "select pg_advisory_unlock_all() as pg_advisory_unlock_all;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get results from all pgbenches:
+ $h1->pump() until length $out1;
+ $h1->finish();
+
+ $h2->pump() until length $out2;
+ $h2->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result1 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h1->full_results)[0]
+ : $h1->result(0);
+
+ my $result2 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h2->full_results)[0]
+ : $h2->result(0);
+
+ # Check all pgbench results
+ ok(!$result1, "@command1 exit code 0");
+ ok(!$result2, "@command2 exit code 0");
+
+ like($out1,
+ qr{processed: 1/1},
+ "concurrent deadlock update with retrying: pgbench 1: "
+ . "check processed transactions");
+ like($out2,
+ qr{processed: 1/1},
+ "concurrent deadlock update with retrying: pgbench 2: "
+ . "check processed transactions");
+
+ # The first or second pgbench should get a deadlock error which was retried:
+ like($out1 . $out2,
+ qr{^((?!number of errors)(.|\n))*$},
+ "concurrent deadlock update with retrying: check errors");
+
+ ok((($out1 =~ /number of retried: 1 \(100\.000%\)/ and
+ $out2 =~ /^((?!number of retried)(.|\n))*$/) or
+ ($out2 =~ /number of retried: 1 \(100\.000%\)/ and
+ $out1 =~ /^((?!number of retried)(.|\n))*$/)),
+ "concurrent deadlock update with retrying: check retries");
+
+ my $pattern1 =
+ "client 0 sending BEGIN;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . WAIT_PGBENCH_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 got a failure in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\n"
+ . "((?!client 0)(.|\n))*"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . WAIT_PGBENCH_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+";
+
+ my $pattern2 =
+ "client 0 sending BEGIN;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 got a failure in command 2 \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\n"
+ . "((?!client 0)(.|\n))*"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+";
+
+ ok(($err1 =~ /$pattern1/ or $err2 =~ /$pattern2/),
+ "concurrent deadlock update with retrying: "
+ . "check the retried transaction");
+}
+
+test_pgbench_serialization_errors(
+ 1, # --max-tries
+ 0, # --max-tries-time (will not be used)
+ "concurrent update");
+test_pgbench_serialization_errors(
+ 0, # --max-tries (will not be used)
+ 900, # --max-tries-time
+ "concurrent update with maximum time of tries");
+
+test_pgbench_serialization_failures();
+
+test_pgbench_deadlock_errors();
+test_pgbench_deadlock_failures();
+
+#done
+$node->stop;
--
2.7.4
On Wed, 04 Apr 2018 16:07:25 +0300
Marina Polyakova <m.polyakova@postgrespro.ru> wrote:
Hello, hackers!
Here is the seventh version of the patch for error handling and
retrying of transactions with serialization/deadlock failures in
pgbench (based on the commit
a08dc711952081d63577fc182fcf955958f70add). I added the option
--max-tries-time, which is an implementation of Fabien Coelho's
proposal in [1]: a transaction with a serialization or deadlock
failure can be retried if the total time of all its tries is less
than this limit (in ms). This option can be combined with the option
--max-tries, but if neither of them is used, failed transactions are
not retried at all.

Also:
* Now, when the first failure occurs in a transaction, it is always
reported as a failure, since only after the remaining commands of this
transaction are executed do we find out whether we can try again or not.
Therefore the messages about retrying or ending the failed transaction
were added to the "fails" debugging level so you can distinguish
failures (which are retried) from errors (which are not retried).
* Fixed the report of the latency average, because the total time
includes the time for both errors and successful transactions.
* Code cleanup (including tests).
[1] /messages/by-id/alpine.DEB.2.20.1803292134380.16472@lancre
Maybe the max retry should rather be expressed in time rather than
number of attempts, or both approaches could be implemented?
Hi, I did a little review of your patch. It seems to work as expected,
and documentation and tests are there. Still, I have a few comments.
There are a lot of checks like "if (debug_level >= DEBUG_FAILS)" with a
corresponding fprintf(stderr, ...). I think it's time to do it like in
the main code and wrap them with some function like log(level, msg).
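A minimal sketch of such a wrapper, assuming a level enum shaped like the patch's debug levels (the names `pgbench_log` and `DebugLevel`, and the boolean return value, are hypothetical, not from the patch):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical debug levels, mirroring the patch's DEBUG_FAILS idea */
typedef enum { DEBUG_NO, DEBUG_FAILS, DEBUG_ALL } DebugLevel;

static DebugLevel debug_level = DEBUG_FAILS;

/*
 * Print the message only if the configured debugging level is high enough;
 * returns true if the message was actually printed.
 */
static bool
pgbench_log(DebugLevel level, const char *fmt, ...)
{
    va_list ap;

    if (debug_level < level)
        return false;           /* message filtered out */
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    return true;
}
```

Filtering inside the wrapper keeps every call site to a single line and concentrates the level check in one place.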
In the CSTATE_RETRY state, used_time is used only for printing but is
calculated more often than needed.
In my opinion, Debuglevel should be renamed to DebugLevel, which looks
nicer; there is also DEBUGLEVEl (where the last letter is lower case),
which is very confusing.
I have checked overall functionality of this patch, but haven't checked
any special cases yet.
--
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Hi, I did a little review of your patch. It seems to work as expected,
and documentation and tests are there. Still, I have a few comments.
Hello! Thank you very much! I have attached the fixed version of the
patch (based on the commit 94c1f9ba11d1241a2b3b2be7177604b26b08bc3d).
Also, thanks to Fabien Coelho's comments outside of this thread, I
removed the option --max-tries-time; the option --latency-limit can be
used instead to limit the time of transaction tries.
There are a lot of checks like "if (debug_level >= DEBUG_FAILS)" with a
corresponding fprintf(stderr, ...). I think it's time to do it like in
the main code and wrap them with some function like log(level, msg).
I agree, fixed.
In the CSTATE_RETRY state, used_time is used only for printing but is
calculated more often than needed.
Sorry, fixed.
In my opinion, Debuglevel should be renamed to DebugLevel, which looks
nicer; there is also DEBUGLEVEl (where the last letter is lower case),
which is very confusing.
Sorry for these typos =[ Fixed.
I have checked overall functionality of this patch, but haven't checked
any special cases yet.
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v8-0001-Pgbench-errors-and-serialization-deadlock-retries.patch (text/x-diff)
From a743d5e9e562e727c8f2a9df7e0949148d2f4984 Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Fri, 6 Apr 2018 21:12:41 +0300
Subject: [PATCH v8] Pgbench errors and serialization/deadlock retries
A client's run is aborted only in case of a serious error, for example, when
the connection with the backend is lost. Otherwise, if the execution of an SQL
or meta command fails, the client's run continues normally until the end of the
current script execution (it is assumed that one transaction script contains
only one transaction).
Transactions with serialization or deadlock failures are rolled back and
repeated until they complete successfully or reach the maximum number of tries
(specified by the --max-tries option) or the maximum time of tries (specified
by the --latency-limit option). These options can be combined; but if neither
of them is used, failed transactions are not retried at all. If the last
transaction run fails, this transaction will be reported as failed, and the
client variables will be set as they were before the first run of this
transaction.
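The retry decision described above can be sketched as a pure predicate (the function name `can_retry` and the microsecond arguments are assumptions for illustration; in the patch this logic lives in the CSTATE_RETRY state handling):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a failed transaction may be retried.  A zero max_tries or
 * latency_limit_us means that limit is not used; if neither limit is set,
 * failed transactions are never retried, matching the patch's defaults.
 */
static bool
can_retry(uint32_t tries, uint32_t max_tries,
          int64_t now_us, int64_t txn_scheduled_us, int64_t latency_limit_us)
{
    bool limits_used = (max_tries > 0 || latency_limit_us > 0);

    if (!limits_used)
        return false;           /* retries are disabled */
    if (max_tries > 0 && tries >= max_tries)
        return false;           /* out of attempts */
    if (latency_limit_us > 0 && now_us - txn_scheduled_us >= latency_limit_us)
        return false;           /* out of time */
    return true;
}
```

Keeping the two limits independent lets either one (or both) end the retry loop, which is exactly how --max-tries and --latency-limit are described as combining.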
If there are retries and/or errors, their statistics are printed in the
progress reports, in the transaction/aggregation logs, and at the end together
with the other results (overall and for each script). A transaction error is
reported here only if the last try of this transaction fails. Retries and/or
errors are also printed per command with average latencies if you use the
appropriate benchmarking option (--report-per-command, -r) and the total
number of retries and/or errors is not zero.
If a failed transaction block does not terminate in the current script, the
commands of the following scripts are processed as usual, so you can get a lot
of errors of type "in failed SQL transaction" (when the current SQL transaction
is aborted and commands are ignored until the end of the transaction block). In
such cases you can use the separate statistics of these errors in all reports.
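To distinguish these error types, the client has to inspect the SQLSTATE of the failed command; with libpq it is available via PQresultErrorField(res, PG_DIAG_SQLSTATE). A sketch of such a classifier, using the error codes and FailureStatus names from the patch (the function `classify_sqlstate` itself is hypothetical):

```c
#include <assert.h>
#include <string.h>

/* SQLSTATEs the patch distinguishes (real PostgreSQL error codes) */
#define ERRCODE_IN_FAILED_SQL_TRANSACTION "25P02"
#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
#define ERRCODE_T_R_DEADLOCK_DETECTED     "40P01"

typedef enum
{
    NO_FAILURE,
    ANOTHER_FAILURE,            /* any failure not listed below */
    SERIALIZATION_FAILURE,
    DEADLOCK_FAILURE,
    IN_FAILED_SQL_TRANSACTION
} FailureStatus;

/* Map a five-character SQLSTATE onto the patch's failure categories. */
static FailureStatus
classify_sqlstate(const char *sqlstate)
{
    if (sqlstate == NULL)
        return ANOTHER_FAILURE;
    if (strcmp(sqlstate, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
        return SERIALIZATION_FAILURE;
    if (strcmp(sqlstate, ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
        return DEADLOCK_FAILURE;
    if (strcmp(sqlstate, ERRCODE_IN_FAILED_SQL_TRANSACTION) == 0)
        return IN_FAILED_SQL_TRANSACTION;
    return ANOTHER_FAILURE;
}
```

Only serialization and deadlock failures are candidates for retrying; "in failed SQL transaction" errors are counted separately, and everything else is reported as a plain failure.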
If you want to distinguish between failures or errors by type (including which
limit for retries was violated and how far it was exceeded for the
serialization/deadlock errors), use the pgbench debugging output created with
the option --debug and the debugging level "fails" or "all". The first variant
is recommended for this purpose because in the second case the debugging
output can be very large.
---
doc/src/sgml/ref/pgbench.sgml | 321 ++-
src/bin/pgbench/pgbench.c | 2100 +++++++++++++++-----
src/bin/pgbench/t/001_pgbench_with_server.pl | 41 +-
src/bin/pgbench/t/002_pgbench_no_server.pl | 4 +-
.../t/003_serialization_and_deadlock_fails.pl | 761 +++++++
5 files changed, 2715 insertions(+), 512 deletions(-)
create mode 100644 src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 41d9030..2756703 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -55,16 +55,19 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
tps = 85.184871 (including connections establishing)
tps = 85.296346 (excluding connections establishing)
</screen>
- The first six lines report some of the most important parameter
- settings. The next line reports the number of transactions completed
- and intended (the latter being just the product of number of clients
+ The first six lines and the eighth line report some of the most important
+ parameter settings. The seventh line reports the number of transactions
+ completed and intended (the latter being just the product of number of clients
and number of transactions per client); these will be equal unless the run
- failed before completion. (In <option>-T</option> mode, only the actual
- number of transactions is printed.)
+ failed before completion or some SQL/meta command(s) failed. (In
+ <option>-T</option> mode, only the actual number of transactions is printed.)
+ (see <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information)
The last two lines report the number of transactions per second,
figured with and without counting the time to start database sessions.
</para>
@@ -380,11 +383,28 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</varlistentry>
<varlistentry>
- <term><option>-d</option></term>
- <term><option>--debug</option></term>
+ <term><option>-d</option> <replaceable>debug_level</replaceable></term>
+ <term><option>--debug=</option><replaceable>debug_level</replaceable></term>
<listitem>
<para>
- Print debugging output.
+ Print debugging output. You can use the following debugging levels:
+ <itemizedlist>
+ <listitem>
+ <para><literal>no</literal>: no debugging output (except built-in
+ function <function>debug</function>, see <xref
+ linkend="pgbench-functions"/>).</para>
+ </listitem>
+ <listitem>
+ <para><literal>fails</literal>: print only failure messages, errors
+ and retries (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information).</para>
+ </listitem>
+ <listitem>
+ <para><literal>all</literal>: print all debugging output
+ (throttling, executed/sent/received commands etc.).</para>
+ </listitem>
+ </itemizedlist>
+ The default is no debugging output.
</para>
</listitem>
</varlistentry>
@@ -453,6 +473,16 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
at all. They are counted and reported separately as
<firstterm>skipped</firstterm>.
</para>
+ <para>
+ The transaction with serialization or deadlock failure can be retried if
+ the total time of all its tries is less than
+ <replaceable>limit</replaceable> ms. This option can be combined with
+ the option <option>--max-tries</option> which limits the total number of
+ transaction tries. But if none of them are used, failed transactions are
+ not retried at all. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information about retrying failed transactions.
+ </para>
</listitem>
</varlistentry>
@@ -513,22 +543,38 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
Show progress report every <replaceable>sec</replaceable> seconds. The report
includes the time since the beginning of the run, the tps since the
last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</option>),
- the latency is computed with respect to the transaction scheduled
- start time, not the actual transaction beginning time, thus it also
- includes the average schedule lag time.
+ deviation since the last report. If any transactions ended with a
+ failed SQL or meta command since the last report, they are also reported
+ as failed. If any transactions ended with an error "in failed SQL
+ transaction block", they are reported separately as <literal>in failed
+ tx</literal> (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information). Under
+ throttling (<option>-R</option>), the latency is computed with respect
+ to the transaction scheduled start time, not the actual transaction
+ beginning time, thus it also includes the average schedule lag time. If
+ any transactions have been rolled back and retried after a
+ serialization/deadlock failure since the last report, the report
+ includes the number of such transactions and the sum of all retries. Use
+ the options <option>--max-tries</option> and/or
+ <option>--latency-limit</option> to enable transactions retries after
+ serialization/deadlock failures.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-r</option></term>
- <term><option>--report-latencies</option></term>
+ <term><option>--report-per-command</option></term>
<listitem>
<para>
- Report the average per-statement latency (execution time from the
- perspective of the client) of each command after the benchmark
- finishes. See below for details.
+ Report the following statistics for each command after the benchmark
+ finishes: the average per-statement latency (execution time from the
+ perspective of the client), the number of all errors, the number of
+ errors "in failed SQL transaction block", and the number of retries
+ after serialization or deadlock failures. The report displays the
+ columns with statistics on errors and retries only if the current
+ <application>pgbench</application> run has an error of the corresponding
+ type or retry, respectively. See below for details.
</para>
</listitem>
</varlistentry>
@@ -667,6 +713,21 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</varlistentry>
<varlistentry>
+ <term><option>--max-tries=<replaceable>number_of_tries</replaceable></option></term>
+ <listitem>
+ <para>
+ Set the maximum number of tries for transactions with
+ serialization/deadlock failures. This option can be combined with the
+ option <option>--latency-limit</option> which limits the total time of
+ transaction tries. But if none of them are used, failed transactions are
+ not retried at all. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information about retrying failed transactions.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--progress-timestamp</option></term>
<listitem>
<para>
@@ -807,8 +868,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<refsect1>
<title>Notes</title>
- <refsect2>
- <title>What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts">
+ <title id="transactions-and-scripts-title">What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
<para>
<application>pgbench</application> executes test scripts chosen randomly
@@ -1583,7 +1644,7 @@ END;
The format of the log is:
<synopsis>
-<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -1604,6 +1665,17 @@ END;
When both <option>--rate</option> and <option>--latency-limit</option> are used,
the <replaceable>time</replaceable> for a skipped transaction will be reported as
<literal>skipped</literal>.
+ <replaceable>retries</replaceable> is the sum of all the retries after the
+ serialization or deadlock failures during the current script execution. It is
+ only present when the maximum number of tries for transactions is more than 1
+ (<option>--max-tries</option>) and/or the maximum time of tries for
+ transactions is used (<option>--latency-limit</option>). If the transaction
+ ended with an error "in failed SQL transaction", its
+ <replaceable>time</replaceable> will be reported as
+ <literal>in_failed_tx</literal>. If the transaction ended with another error,
+ its <replaceable>time</replaceable> will be reported as
+ <literal>failed</literal> (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information).
</para>
<para>
@@ -1633,6 +1705,24 @@ END;
</para>
<para>
+ The following example shows a snippet of a log file with errors and retries,
+ with the maximum number of tries set to 10 (note the additional
+ <replaceable>retries</replaceable> column):
+<screen>
+3 0 47423 0 1499414498 34501 4
+3 1 8333 0 1499414498 42848 1
+3 2 8358 0 1499414498 51219 1
+4 0 72345 0 1499414498 59433 7
+1 3 41718 0 1499414498 67879 5
+1 4 8416 0 1499414498 76311 1
+3 3 33235 0 1499414498 84469 4
+0 0 failed 0 1499414498 84905 10
+2 0 failed 0 1499414498 86248 10
+3 4 8307 0 1499414498 92788 1
+</screen>
+ </para>
+
+ <para>
When running a long test on hardware that can handle a lot of transactions,
the log files can become very large. The <option>--sampling-rate</option> option
can be used to log only a random sample of transactions.
@@ -1647,7 +1737,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> <replaceable>failed_tx</replaceable> <replaceable>in_failed_tx</replaceable> <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried_tx</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -1661,7 +1751,13 @@ END;
transaction latencies within the interval,
<replaceable>min_latency</replaceable> is the minimum latency within the interval,
and
- <replaceable>max_latency</replaceable> is the maximum latency within the interval.
+ <replaceable>max_latency</replaceable> is the maximum latency within the interval,
+ <replaceable>failed_tx</replaceable> is the number of transactions that ended
+ with a failed SQL or meta command within the interval,
+ <replaceable>in_failed_tx</replaceable> is the number of transactions that
+ ended with an error "in failed SQL transaction block" (see
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information).
The next fields,
<replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
@@ -1669,21 +1765,28 @@ END;
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e. the difference between each transaction's
scheduled start time and the time it actually started.
- The very last field, <replaceable>skipped</replaceable>,
+ The next field, <replaceable>skipped</replaceable>,
is only present if the <option>--latency-limit</option> option is used, too.
It counts the number of transactions skipped because they would have
started too late.
+ The <replaceable>retried_tx</replaceable> and
+ <replaceable>retries</replaceable> fields are only present if the maximum
+ number of tries for transactions is more than 1
+ (<option>--max-tries</option>) and/or the maximum time of tries for
+ transactions is used (<option>--latency-limit</option>). They report the
+ number of retried transactions and the sum of all the retries after
+ serialization or deadlock failures within the interval.
Each transaction is counted in the interval when it was committed.
</para>
<para>
Here is some example output:
<screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0 0
+1345828503 7884 1979812 565806736 60 1479 0 0
+1345828505 7208 1979422 567277552 59 1391 0 0
+1345828507 7685 1980268 569784714 60 1398 0 0
+1345828509 7073 1979779 573489941 236 1411 0 0
</screen></para>
<para>
@@ -1695,15 +1798,54 @@ END;
</refsect2>
<refsect2>
- <title>Per-Statement Latencies</title>
+ <title>Per-Statement Report</title>
+
+ <para>
+ With the <option>-r</option> option, <application>pgbench</application>
+ collects the following statistics for each statement:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>latency</literal> — elapsed transaction time for each
+ statement. <application>pgbench</application> reports an average value
+ of all successful runs of the statement.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of errors in this statement. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of errors "in failed SQL transaction" in this statement. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of retries after a serialization or a deadlock failure in
+ this statement. See <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
<para>
- With the <option>-r</option> option, <application>pgbench</application> collects
- the elapsed transaction time of each statement executed by every
- client. It then reports an average of those values, referred to
- as the latency for each statement, after the benchmark has finished.
+ The report displays the columns with statistics on errors and retries only if
+ the current <application>pgbench</application> run has an error or retry,
+ respectively.
</para>
+ <para>
+ All values are computed for each statement executed by every client and are
+ reported after the benchmark has finished.
+ </para>
+
<para>
For the default script, the output will look similar to this:
<screen>
@@ -1715,6 +1857,7 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
latency average = 15.844 ms
latency stddev = 2.715 ms
tps = 618.764555 (including connections establishing)
@@ -1732,10 +1875,50 @@ statement latencies in milliseconds:
0.371 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
1.212 END;
</screen>
+
+ Another example of output for the default script using serializable default
+ transaction isolation level (<command>PGOPTIONS='-c
+ default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: <builtin: TPC-B (sort of)>
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+number of transactions per client: 1000
+number of transactions actually processed: 4473/10000
+number of errors: 5527 (55.270%)
+number of retried: 7467 (74.670%)
+number of retries: 257244
+maximum number of tries: 100
+number of transactions above the 100.0 ms latency limit: 5766/10000 (57.660 %) (including errors)
+latency average = 41.169 ms
+latency stddev = 51.783 ms
+tps = 50.322494 (including connections establishing)
+tps = 50.324595 (excluding connections establishing)
+statement latencies in milliseconds, errors and retries:
+ 0.004 0 0 \set aid random(1, 100000 * :scale)
+ 0.000 0 0 \set bid random(1, 1 * :scale)
+ 0.000 0 0 \set tid random(1, 10 * :scale)
+ 0.000 0 0 \set delta random(-5000, 5000)
+ 0.213 0 0 BEGIN;
+ 0.393 0 0 UPDATE pgbench_accounts
+ SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.332 0 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 0.409 4971 250265 UPDATE pgbench_tellers
+ SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 0.311 556 6975 UPDATE pgbench_branches
+ SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.299 0 0 INSERT INTO pgbench_history
+ (tid, bid, aid, delta, mtime)
+ VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 0.520 0 4 END;
+</screen>
</para>
<para>
- If multiple script files are specified, the averages are reported
+ If multiple script files are specified, all statistics are reported
separately for each script file.
</para>
@@ -1749,6 +1932,78 @@ statement latencies in milliseconds:
</para>
</refsect2>
+ <refsect2 id="errors-and-retries">
+ <title id="errors-and-retries-title">Errors and Serialization/Deadlock Retries</title>
+
+ <para>
+ A client's run is aborted only in case of a serious error, for example, when
+ the connection with the backend is lost. Otherwise, if the execution of an
+ SQL or meta command fails, the client's run continues normally until the end of the
+ current script execution (it is assumed that one transaction script contains
+ only one transaction; see <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"/> for more information).
+ Transactions with serialization or deadlock failures are rolled back and
+ repeated until they complete successfully or reach the maximum number of
+ tries (specified by the <option>--max-tries</option> option) / the maximum
+ time of tries (specified by the <option>--latency-limit</option> option). If
+ the last transaction run fails, this transaction will be reported as failed,
+ and the client variables will be set as they were before the first run of
+ this transaction.
+ </para>
+
+ <note>
+ <para>
+ Be careful when repeating scripts that contain multiple transactions: the
+ script is always retried completely, so the successful transactions can be
+ performed several times.
+ </para>
+ <para>
+ Be careful when repeating transactions with shell commands. Unlike the
+ results of SQL commands, the results of shell commands are not rolled back,
+ except for the variable value of the <command>\setshell</command> command.
+ </para>
+ <para>
+ If a failed transaction block does not terminate in the current script, the
+ commands of the following scripts are processed as usual so you can get a
+ lot of errors of type "in failed SQL transaction" (when the current SQL
+ transaction is aborted and commands are ignored until the end of the transaction block).
+ In such cases you can use separate statistics of these errors in all
+ reports.
+ </para>
+ </note>
+
+ <para>
+ The latency of a successful transaction includes the entire time of
+ transaction execution with rollbacks and retries. The latency for failed
+ transactions and commands is not computed separately.
+ </para>
+
+ <para>
+ The main report contains the number of failed transactions if it is non-zero.
+ If the total number of transactions ended with an error "in failed SQL
+ transaction block" is non-zero, the main report also contains it. If the
+ total number of retried transactions is non-zero, the main report also
+ contains the statistics related to retries: the total number of retried
+ transactions and total number of retries (use the options
+ <option>--max-tries</option> and/or <option>--latency-limit</option> to make
+ it possible). The per-statement report inherits all columns from the main
+ report. Note that if a failure/error occurs, the following failures/errors in
+ the current script execution are not shown in the reports. The retry is only
+ reported for the first command where the failure occurred during the current
+ script execution.
+ </para>
+
+ <para>
+ If you want to distinguish between failures or errors by type (including
+ which limit for retries was violated and how far it was exceeded for the
+ serialization/deadlock errors), use the <application>pgbench</application>
+ debugging output created with the option <option>--debug</option> and with
+ the debugging level <literal>fails</literal> or <literal>all</literal>. The
+ first variant is recommended for this purpose because in the second case
+ the debugging output can be very large.
+ </para>
+ </refsect2>
+
<refsect2>
<title>Good Practices</title>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index fd18568..8073421 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -59,6 +59,9 @@
#include "pgbench.h"
+#define ERRCODE_IN_FAILED_SQL_TRANSACTION "25P02"
+#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"
#define ERRCODE_UNDEFINED_TABLE "42P01"
/*
@@ -186,9 +189,25 @@ bool progress_timestamp = false; /* progress report with Unix time */
int nclients = 1; /* number of clients */
int nthreads = 1; /* number of threads */
bool is_connect; /* establish connection for each transaction */
-bool is_latencies; /* report per-command latencies */
+bool report_per_command = false; /* report per-command latencies, retries
+ * after the failures and errors
+ * (failures without retrying) */
int main_pid; /* main process id used in log filename */
+/*
+ * There are different types of restrictions for deciding that the current failed
+ * transaction can no longer be retried and should be reported as failed:
+ * - max_tries can be used to limit the number of tries;
+ * - latency_limit can be used to limit the total time of tries.
+ *
+ * They can be combined together, and you need to use at least one of them to
+ * retry the failed transactions. By default, failed transactions are not
+ * retried at all.
+ */
+uint32 max_tries = 0; /* we cannot retry a failed transaction if its
+ * number of tries reaches this maximum; if its
+ * value is zero, it is not used */
+
char *pghost = "";
char *pgport = "";
char *login = NULL;
@@ -242,14 +261,74 @@ typedef struct SimpleStats
typedef struct StatsData
{
time_t start_time; /* interval start time, for aggregates */
- int64 cnt; /* number of transactions, including skipped */
+ int64 cnt; /* number of successful transactions, including
+ * skipped */
int64 skipped; /* number of transactions skipped under --rate
* and --latency-limit */
+ int64 retries;
+ int64 retried; /* number of transactions that were retried
+ * after a serialization or a deadlock
+ * failure */
+ int64 errors; /* number of transactions that were not retried
+ * after a serialization or a deadlock
+ * failure or had another error (including meta
+ * commands errors) */
+ int64 errors_in_failed_tx; /* number of transactions that failed in
+ * an error
+ * ERRCODE_IN_FAILED_SQL_TRANSACTION */
SimpleStats latency;
SimpleStats lag;
} StatsData;
/*
+ * Data structure for client variables.
+ */
+typedef struct Variables
+{
+ Variable *array; /* array of variable definitions */
+ int nvariables; /* number of variables */
+ bool vars_sorted; /* are variables sorted by name? */
+} Variables;
+
+/*
+ * Data structure for thread/client random seed.
+ */
+typedef struct RandomState
+{
+ unsigned short data[3];
+} RandomState;
+
+/*
+ * Data structure for repeating a transaction from the beginning with the same
+ * parameters.
+ */
+typedef struct RetryState
+{
+ RandomState random_state; /* random seed */
+ Variables variables; /* client variables */
+} RetryState;
+
+/*
+ * For the failures during script execution.
+ */
+typedef enum FailureStatus
+{
+ NO_FAILURE = 0,
+ ANOTHER_FAILURE, /* other failures that are not listed by
+ * themselves below */
+ SERIALIZATION_FAILURE,
+ DEADLOCK_FAILURE,
+ IN_FAILED_SQL_TRANSACTION
+} FailureStatus;
+
+typedef struct Failure
+{
+ FailureStatus status; /* type of the failure */
+ int command; /* command number in script where the failure
+ * occurred */
+} Failure;
+
+/*
* Connection state machine states.
*/
typedef enum
@@ -304,6 +383,22 @@ typedef enum
CSTATE_END_COMMAND,
/*
+ * States for transactions with serialization or deadlock failures.
+ *
+ * First, remember the failure in CSTATE_FAILURE. Then process other
+ * commands of the failed transaction if any and go to CSTATE_RETRY. If we
+ * can re-execute the transaction from the very beginning, report this as a
+ * failure, set the same parameters for the transaction execution as in the
+ * previous tries and process the first transaction command in
+ * CSTATE_START_COMMAND. Otherwise, report this as an error, set the
+ * parameters for the transaction execution as they were before the first
+ * run of this transaction (except for a random state) and go to
+ * CSTATE_END_TX to complete this transaction.
+ */
+ CSTATE_FAILURE,
+ CSTATE_RETRY,
+
+ /*
* CSTATE_END_TX performs end-of-transaction processing. Calculates
* latency, and logs the transaction. In --connect mode, closes the
* current connection. Chooses the next script to execute and starts over
@@ -329,14 +424,13 @@ typedef struct
int id; /* client No. */
ConnectionStateEnum state; /* state machine's current state. */
ConditionalStack cstack; /* enclosing conditionals state */
+ RandomState random_state; /* separate randomness for each client */
int use_file; /* index in sql_script for this client */
int command; /* command number in script */
/* client variables */
- Variable *variables; /* array of variable definitions */
- int nvariables; /* number of variables */
- bool vars_sorted; /* are variables sorted by name? */
+ Variables variables;
/* various times about current transaction */
int64 txn_scheduled; /* scheduled start time of transaction (usec) */
@@ -346,6 +440,18 @@ typedef struct
bool prepared[MAX_SCRIPTS]; /* whether client prepared the script */
+ /*
+ * For processing errors and retrying transactions with serialization or
+ * deadlock failures:
+ */
+ Failure first_failure; /* status and command number of the first
+ * failure in the current transaction execution;
+ * status NO_FAILURE if there were no failures
+ * or errors */
+ RetryState retry_state;
+ uint32 retries; /* how many times have we already retried the
+ * current transaction? */
+
/* per client collected stats */
int64 cnt; /* client transaction count, for -t */
int ecnt; /* error count */
@@ -380,6 +486,42 @@ typedef struct
ZipfCell cells[ZIPF_CACHE_SIZE];
} ZipfCache;
+typedef enum ErrorLevel
+{
+ /*
+ * To report throttling, executed/sent/received commands etc.
+ */
+ ELEVEL_DEBUG,
+
+ /*
+ * Normal failure of the SQL/meta command, or processing of the failed
+ * transaction (its end/retry).
+ */
+ ELEVEL_CLIENT_FAIL,
+
+ /*
+ * Something serious, e.g. the connection with the backend was lost;
+ * therefore abort the client.
+ */
+ ELEVEL_CLIENT_ABORTED,
+
+ /*
+ * To report the error/log messages of the main program and/or
+ * PGBENCH_DEBUG.
+ */
+ ELEVEL_MAIN,
+} ErrorLevel;
+
+/*
+ * Error state for error messages or other logging messages.
+ */
+typedef struct ErrorState
+{
+ bool in_process; /* are we processing the error now? */
+ char message[2048];
+ int message_length;
+} ErrorState;
+
/*
* Thread state
*/
@@ -389,17 +531,20 @@ typedef struct
pthread_t thread; /* thread handle */
CState *state; /* array of CState */
int nstate; /* length of state[] */
- unsigned short random_state[3]; /* separate randomness for each thread */
+ RandomState random_state; /* separate randomness for each thread */
int64 throttle_trigger; /* previous/next throttling (us) */
FILE *logfile; /* where to log, or NULL */
ZipfCache zipf_cache; /* for thread-safe zipfian random number
* generation */
+ ErrorState error_state; /* for thread-safe error messages or other
+ * logging messages */
/* per thread collected stats */
instr_time start_time; /* thread start time */
instr_time conn_time;
StatsData stats;
- int64 latency_late; /* executed but late transactions */
+ int64 latency_late; /* executed but late transactions (including
+ * errors) */
} TState;
#define INVALID_THREAD ((pthread_t) 0)
@@ -445,6 +590,10 @@ typedef struct
char *argv[MAX_ARGS]; /* command word list */
PgBenchExpr *expr; /* parsed expression, if needed */
SimpleStats stats; /* time spent in this command */
+ int64 retries;
+ int64 errors; /* number of failures that were not retried */
+ int64 errors_in_failed_tx; /* number of errors
+ * ERRCODE_IN_FAILED_SQL_TRANSACTION */
} Command;
typedef struct ParsedScript
@@ -460,7 +609,18 @@ static int num_scripts; /* number of scripts in sql_script[] */
static int num_commands = 0; /* total number of Command structs */
static int64 total_weight = 0;
-static int debug = 0; /* debug flag */
+typedef enum DebugLevel
+{
+ NO_DEBUG = 0, /* no debugging output (except PGBENCH_DEBUG) */
+ DEBUG_FAILS, /* print only failure messages, errors and
+ * retries */
+ DEBUG_ALL, /* print all debugging output (throttling,
+ * executed/sent/received commands etc.) */
+ NUM_DEBUGLEVEL
+} DebugLevel;
+
+static DebugLevel debug_level = NO_DEBUG; /* debug flag */
+static const char *DEBUGLEVEL[] = {"no", "fails", "all"};
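With `-d` now taking a level name instead of being a plain flag, option parsing has to map the string onto the enum. A minimal sketch of that mapping, assuming the patch's `DEBUGLEVEL`/`DebugLevel` definitions (the helper name `parse_debug_level` is illustrative, not part of the patch):

```c
#include <string.h>

typedef enum DebugLevel
{
	NO_DEBUG = 0,				/* no debugging output (except PGBENCH_DEBUG) */
	DEBUG_FAILS,				/* print only failure messages, errors and retries */
	DEBUG_ALL,					/* print all debugging output */
	NUM_DEBUGLEVEL
} DebugLevel;

static const char *DEBUGLEVEL[] = {"no", "fails", "all"};

/* Return the matching debug level, or -1 if the name is unknown. */
static int
parse_debug_level(const char *name)
{
	int			i;

	for (i = 0; i < NUM_DEBUGLEVEL; i++)
	{
		if (strcmp(name, DEBUGLEVEL[i]) == 0)
			return i;
	}
	return -1;
}
```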
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -508,8 +668,14 @@ static const BuiltinScript builtin_script[] =
}
};
+/*
+ * To handle error messages or other logging messages outside threads.
+ */
+static ErrorState main_error_state;
+
/* Function prototypes */
+static int64 strtoint64_thread(TState *thread, const char *str);
static void setNullValue(PgBenchValue *pv);
static void setBoolValue(PgBenchValue *pv, bool bval);
static void setIntValue(PgBenchValue *pv, int64 ival);
@@ -525,6 +691,42 @@ static void *threadRun(void *arg);
static void setalarm(int seconds);
static void finishCon(CState *st);
+/*
+ * To report error messages or other log messages, you can use the following
+ * constructs:
+ *
+ * elog(thread, ELEVEL_CLIENT_FAIL,
+ * "too many function arguments, maximum is %d\n", MAX_FARGS);
+ *
+ * Or:
+ *
+ * if (errstart(thread, ELEVEL_CLIENT_FAIL))
+ * {
+ * errmsg(thread,
+ * "too many function arguments, maximum is %d\n", MAX_FARGS);
+ * ... other errmsg() calls if needed ...
+ * errfinish(thread);
+ * }
+ *
+ * In fact, elog() calls errstart(), errmsg() and errfinish() in the same way,
+ * so it is just a wrapper for more convenient use.
+ *
+ * Pass a non-NULL thread for client commands and NULL otherwise. In the first
+ * case the thread's error state is used to accumulate the entire error
+ * message across the invocations of errmsg(), and the message is printed to
+ * stderr in errfinish() (it is assumed to be non-empty). Otherwise the main
+ * error state is used for the same purposes.
+ */
+static void errmsg_internal(TState *thread, const char *fmt,
+ va_list *args) pg_attribute_printf(2, 0);
+static void elog(TState *thread, ErrorLevel error_level,
+ const char *fmt,...) pg_attribute_printf(3, 4);
+static bool errstart(TState *thread, ErrorLevel error_level);
+static void errmsg(TState *thread,
+ const char *fmt,...) pg_attribute_printf(2, 3);
+static void errfinish(TState *thread);
+
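The contract described in the comment above can be sketched as follows. This is a simplified stand-in, not the patch's implementation: it operates on an `ErrorState *` directly (in the patch the state hangs off `TState` or `main_error_state`), and it skips the level filtering a real `errstart()` would do:

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef enum ErrorLevel
{
	ELEVEL_DEBUG, ELEVEL_CLIENT_FAIL, ELEVEL_CLIENT_ABORTED, ELEVEL_MAIN
} ErrorLevel;

typedef struct ErrorState
{
	bool		in_process;		/* are we processing the error now? */
	char		message[2048];
	int			message_length;
} ErrorState;

static ErrorState main_error_state;

static bool
errstart(ErrorState *es, ErrorLevel elevel)
{
	(void) elevel;				/* a real version would filter by debug level */
	es->in_process = true;
	es->message_length = 0;
	es->message[0] = '\0';
	return true;
}

static void
errmsg_va(ErrorState *es, const char *fmt, va_list args)
{
	int			avail = (int) sizeof(es->message) - es->message_length;
	int			n;

	if (avail <= 0)
		return;					/* buffer full: silently truncate */
	n = vsnprintf(es->message + es->message_length, avail, fmt, args);
	if (n > 0)
		es->message_length += (n < avail) ? n : avail - 1;
}

static void
errmsg(ErrorState *es, const char *fmt,...)
{
	va_list		args;

	va_start(args, fmt);
	errmsg_va(es, fmt, args);
	va_end(args);
}

static void
errfinish(ErrorState *es)
{
	fputs(es->message, stderr);	/* print the accumulated message at once */
	es->in_process = false;
}

/* elog() is just errstart() + errmsg() + errfinish() in one call */
static void
elog(ErrorState *es, ErrorLevel elevel, const char *fmt,...)
{
	va_list		args;

	if (!errstart(es, elevel))
		return;
	va_start(args, fmt);
	errmsg_va(es, fmt, args);
	va_end(args);
	errfinish(es);
}
```

Accumulating into a per-thread buffer and writing it out in one `fputs()` is what makes a multi-part message thread-safe: the parts cannot interleave with another thread's output on stderr.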
/* callback functions for our flex lexer */
static const PsqlScanCallbacks pgbench_callbacks = {
@@ -572,7 +774,7 @@ usage(void)
" protocol for submitting queries (default: simple)\n"
" -n, --no-vacuum do not run VACUUM before tests\n"
" -P, --progress=NUM show thread progress report every NUM seconds\n"
- " -r, --report-latencies report average latency per command\n"
+ " -r, --report-per-command report latencies, errors and retries per command\n"
" -R, --rate=NUM target rate in transactions per second\n"
" -s, --scale=NUM report this scale factor in output\n"
" -t, --transactions=NUM number of transactions each client runs (default: 10)\n"
@@ -581,11 +783,12 @@ usage(void)
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
+ " --max-tries=NUM max number of tries to run transaction\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
"\nCommon options:\n"
- " -d, --debug print debugging output\n"
+ " -d, --debug=no|fails|all print debugging output (default: no)\n"
" -h, --host=HOSTNAME database server host or socket directory\n"
" -p, --port=PORT database server port number\n"
" -U, --username=USERNAME connect as specified database user\n"
@@ -632,6 +835,12 @@ is_an_int(const char *str)
int64
strtoint64(const char *str)
{
+ return strtoint64_thread(NULL, str);
+}
+
+static int64
+strtoint64_thread(TState *thread, const char *str)
+{
const char *ptr = str;
int64 result = 0;
int sign = 1;
@@ -667,7 +876,10 @@ strtoint64(const char *str)
/* require at least one digit */
if (!isdigit((unsigned char) *ptr))
- fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+ {
+ elog(thread, ELEVEL_MAIN,
+ "invalid input syntax for integer: \"%s\"\n", str);
+ }
/* process digits */
while (*ptr && isdigit((unsigned char) *ptr))
@@ -675,7 +887,10 @@ strtoint64(const char *str)
int64 tmp = result * 10 + (*ptr++ - '0');
if ((tmp / 10) != result) /* overflow? */
- fprintf(stderr, "value \"%s\" is out of range for type bigint\n", str);
+ {
+ elog(thread, ELEVEL_MAIN,
+ "value \"%s\" is out of range for type bigint\n", str);
+ }
result = tmp;
}
@@ -686,14 +901,17 @@ gotdigits:
ptr++;
if (*ptr != '\0')
- fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+ {
+ elog(thread, ELEVEL_MAIN,
+ "invalid input syntax for integer: \"%s\"\n", str);
+ }
return ((sign < 0) ? -result : result);
}
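The overflow check in strtoint64() relies on a division round-trip: after `tmp = result * 10 + digit`, `tmp / 10 != result` exactly when the multiplication wrapped. A self-contained sketch of the same parsing loop (the helper name `parse_uint63` and the bool return convention are illustrative; it also does the arithmetic in unsigned types so the wraparound the check relies on is well-defined in C, whereas the patch applies the check to int64):

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Parse a non-negative decimal string into an int64.  Returns false on bad
 * syntax or overflow, using the same division round-trip overflow test as
 * strtoint64().
 */
static bool
parse_uint63(const char *str, int64_t *result)
{
	const char *ptr = str;
	uint64_t	val = 0;

	/* require at least one digit */
	if (!isdigit((unsigned char) *ptr))
		return false;

	while (*ptr && isdigit((unsigned char) *ptr))
	{
		uint64_t	tmp = val * 10 + (uint64_t) (*ptr++ - '0');

		if ((tmp / 10) != val)	/* overflow? */
			return false;
		val = tmp;
	}

	if (*ptr != '\0')
		return false;			/* trailing garbage */
	if (val > (uint64_t) INT64_MAX)
		return false;			/* does not fit in a signed 64-bit value */

	*result = (int64_t) val;
	return true;
}
```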
/* random number generator: uniform distribution from min to max inclusive */
static int64
-getrand(TState *thread, int64 min, int64 max)
+getrand(RandomState *random_state, int64 min, int64 max)
{
/*
* Odd coding is so that min and max have approximately the same chance of
@@ -704,7 +922,7 @@ getrand(TState *thread, int64 min, int64 max)
* protected by a mutex, and therefore a bottleneck on machines with many
* CPUs.
*/
- return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
+ return min + (int64) ((max - min + 1) * pg_erand48(random_state->data));
}
/*
@@ -713,7 +931,8 @@ getrand(TState *thread, int64 min, int64 max)
* value is exp(-parameter).
*/
static int64
-getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
+getExponentialRand(RandomState *random_state, int64 min, int64 max,
+ double parameter)
{
double cut,
uniform,
@@ -723,7 +942,7 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
Assert(parameter > 0.0);
cut = exp(-parameter);
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(random_state->data);
/*
* inner expression in (cut, 1] (if parameter > 0), rand in [0, 1)
@@ -736,7 +955,8 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
/* random number generator: gaussian distribution from min to max inclusive */
static int64
-getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
+getGaussianRand(RandomState *random_state, int64 min, int64 max,
+ double parameter)
{
double stdev;
double rand;
@@ -764,8 +984,8 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* are expected in (0, 1] (see
* http://en.wikipedia.org/wiki/Box_muller)
*/
- double rand1 = 1.0 - pg_erand48(thread->random_state);
- double rand2 = 1.0 - pg_erand48(thread->random_state);
+ double rand1 = 1.0 - pg_erand48(random_state->data);
+ double rand2 = 1.0 - pg_erand48(random_state->data);
/* Box-Muller basic form transform */
double var_sqrt = sqrt(-2.0 * log(rand1));
@@ -792,7 +1012,7 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* will approximate a Poisson distribution centered on the given value.
*/
static int64
-getPoissonRand(TState *thread, int64 center)
+getPoissonRand(RandomState *random_state, int64 center)
{
/*
* Use inverse transform sampling to generate a value > 0, such that the
@@ -801,7 +1021,7 @@ getPoissonRand(TState *thread, int64 center)
double uniform;
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(random_state->data);
return (int64) (-log(uniform) * ((double) center) + 0.5);
}
@@ -879,7 +1099,7 @@ zipfFindOrCreateCacheCell(ZipfCache * cache, int64 n, double s)
* Luc Devroye, p. 550-551, Springer 1986.
*/
static int64
-computeIterativeZipfian(TState *thread, int64 n, double s)
+computeIterativeZipfian(RandomState *random_state, int64 n, double s)
{
double b = pow(2.0, s - 1.0);
double x,
@@ -890,8 +1110,8 @@ computeIterativeZipfian(TState *thread, int64 n, double s)
while (true)
{
/* random variates */
- u = pg_erand48(thread->random_state);
- v = pg_erand48(thread->random_state);
+ u = pg_erand48(random_state->data);
+ v = pg_erand48(random_state->data);
x = floor(pow(u, -1.0 / (s - 1.0)));
@@ -909,10 +1129,11 @@ computeIterativeZipfian(TState *thread, int64 n, double s)
* Jim Gray et al, SIGMOD 1994
*/
static int64
-computeHarmonicZipfian(TState *thread, int64 n, double s)
+computeHarmonicZipfian(TState *thread, RandomState *random_state, int64 n,
+ double s)
{
ZipfCell *cell = zipfFindOrCreateCacheCell(&thread->zipf_cache, n, s);
- double uniform = pg_erand48(thread->random_state);
+ double uniform = pg_erand48(random_state->data);
double uz = uniform * cell->harmonicn;
if (uz < 1.0)
@@ -924,7 +1145,8 @@ computeHarmonicZipfian(TState *thread, int64 n, double s)
/* random number generator: zipfian distribution from min to max inclusive */
static int64
-getZipfianRand(TState *thread, int64 min, int64 max, double s)
+getZipfianRand(TState *thread, RandomState *random_state, int64 min,
+ int64 max, double s)
{
int64 n = max - min + 1;
@@ -933,8 +1155,8 @@ getZipfianRand(TState *thread, int64 min, int64 max, double s)
return min - 1 + ((s > 1)
- ? computeIterativeZipfian(thread, n, s)
- : computeHarmonicZipfian(thread, n, s));
+ ? computeIterativeZipfian(random_state, n, s)
+ : computeHarmonicZipfian(thread, random_state, n, s));
}
/*
@@ -1034,6 +1256,10 @@ initStats(StatsData *sd, time_t start_time)
sd->start_time = start_time;
sd->cnt = 0;
sd->skipped = 0;
+ sd->retries = 0;
+ sd->retried = 0;
+ sd->errors = 0;
+ sd->errors_in_failed_tx = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1042,8 +1268,30 @@ initStats(StatsData *sd, time_t start_time)
* Accumulate one additional item into the given stats object.
*/
static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumStats(StatsData *stats, bool skipped, double lat, double lag,
+ FailureStatus first_error, int64 retries)
{
+ /*
+ * Record the number of retries regardless of whether the transaction was
+ * successful or failed.
+ */
+ stats->retries += retries;
+ if (retries > 0)
+ stats->retried++;
+
+ /* Record the failed transaction */
+ if (first_error != NO_FAILURE)
+ {
+ stats->errors++;
+
+ if (first_error == IN_FAILED_SQL_TRANSACTION)
+ stats->errors_in_failed_tx++;
+
+ return;
+ }
+
+ /* Record the successful transaction */
+
stats->cnt++;
if (skipped)
@@ -1070,7 +1318,7 @@ executeStatement(PGconn *con, const char *sql)
res = PQexec(con, sql);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
{
- fprintf(stderr, "%s", PQerrorMessage(con));
+ elog(NULL, ELEVEL_MAIN, "%s", PQerrorMessage(con));
exit(1);
}
PQclear(res);
@@ -1085,15 +1333,19 @@ tryExecuteStatement(PGconn *con, const char *sql)
res = PQexec(con, sql);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
{
- fprintf(stderr, "%s", PQerrorMessage(con));
- fprintf(stderr, "(ignoring this error and continuing anyway)\n");
+ if (errstart(NULL, ELEVEL_MAIN))
+ {
+ errmsg(NULL, "%s", PQerrorMessage(con));
+ errmsg(NULL, "(ignoring this error and continuing anyway)\n");
+ errfinish(NULL);
+ }
}
PQclear(res);
}
/* set up a connection to the backend */
static PGconn *
-doConnect(void)
+doConnect(TState *thread)
{
PGconn *conn;
bool new_pass;
@@ -1132,8 +1384,8 @@ doConnect(void)
if (!conn)
{
- fprintf(stderr, "connection to database \"%s\" failed\n",
- dbName);
+ elog(thread, ELEVEL_MAIN,
+ "connection to database \"%s\" failed\n", dbName);
return NULL;
}
@@ -1151,8 +1403,9 @@ doConnect(void)
/* check to see that the backend connection was successfully made */
if (PQstatus(conn) == CONNECTION_BAD)
{
- fprintf(stderr, "connection to database \"%s\" failed:\n%s",
- dbName, PQerrorMessage(conn));
+ elog(thread, ELEVEL_MAIN,
+ "connection to database \"%s\" failed:\n%s",
+ dbName, PQerrorMessage(conn));
PQfinish(conn);
return NULL;
}
@@ -1184,39 +1437,39 @@ compareVariableNames(const void *v1, const void *v2)
/* Locate a variable by name; returns NULL if unknown */
static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
{
Variable key;
/* On some versions of Solaris, bsearch of zero items dumps core */
- if (st->nvariables <= 0)
+ if (variables->nvariables <= 0)
return NULL;
/* Sort if we have to */
- if (!st->vars_sorted)
+ if (!variables->vars_sorted)
{
- qsort((void *) st->variables, st->nvariables, sizeof(Variable),
- compareVariableNames);
- st->vars_sorted = true;
+ qsort((void *) variables->array, variables->nvariables,
+ sizeof(Variable), compareVariableNames);
+ variables->vars_sorted = true;
}
/* Now we can search */
key.name = name;
return (Variable *) bsearch((void *) &key,
- (void *) st->variables,
- st->nvariables,
+ (void *) variables->array,
+ variables->nvariables,
sizeof(Variable),
compareVariableNames);
}
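lookupVariable() keeps the array unsorted across insertions and sorts only when a lookup actually needs the order, then reuses the sorted array via bsearch() until the next insertion clears `vars_sorted`. A minimal self-contained sketch of that lazy-sort pattern (the struct and function names mirror the patch but the `Variable` fields are simplified):

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct Variable
{
	char	   *name;
	char	   *svalue;
} Variable;

typedef struct Variables
{
	Variable   *array;			/* array of variable definitions */
	int			nvariables;		/* number of variables */
	bool		vars_sorted;	/* are variables sorted by name? */
} Variables;

static int
compareVariableNames(const void *v1, const void *v2)
{
	return strcmp(((const Variable *) v1)->name,
				  ((const Variable *) v2)->name);
}

/* Locate a variable by name; returns NULL if unknown */
static Variable *
lookupVariable(Variables *variables, char *name)
{
	Variable	key;

	/* On some versions of Solaris, bsearch of zero items dumps core */
	if (variables->nvariables <= 0)
		return NULL;

	/* Sort lazily: only when a lookup needs the order */
	if (!variables->vars_sorted)
	{
		qsort(variables->array, variables->nvariables,
			  sizeof(Variable), compareVariableNames);
		variables->vars_sorted = true;
	}

	key.name = name;
	return (Variable *) bsearch(&key, variables->array,
								variables->nvariables,
								sizeof(Variable), compareVariableNames);
}
```

Deferring the qsort() keeps bulk variable creation O(n) and amortizes the sort across many lookups.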
/* Get the value of a variable, in string form; returns NULL if unknown */
static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
{
Variable *var;
char stringform[64];
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
return NULL; /* not found */
@@ -1244,7 +1497,7 @@ getVariable(CState *st, char *name)
/* Try to convert variable to a value; return false on failure */
static bool
-makeVariableValue(Variable *var)
+makeVariableValue(TState *thread, Variable *var)
{
size_t slen;
@@ -1281,7 +1534,7 @@ makeVariableValue(Variable *var)
}
else if (is_an_int(var->svalue))
{
- setIntValue(&var->value, strtoint64(var->svalue));
+ setIntValue(&var->value, strtoint64_thread(thread, var->svalue));
}
else /* type should be double */
{
@@ -1290,9 +1543,9 @@ makeVariableValue(Variable *var)
if (sscanf(var->svalue, "%lf%c", &dv, &xs) != 1)
{
- fprintf(stderr,
- "malformed variable \"%s\" value: \"%s\"\n",
- var->name, var->svalue);
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "malformed variable \"%s\" value: \"%s\"\n",
+ var->name, var->svalue);
return false;
}
setDoubleValue(&var->value, dv);
@@ -1338,13 +1591,16 @@ valid_variable_name(const char *name)
* Lookup a variable by name, creating it if need be.
* Caller is expected to assign a value to the variable.
* Returns NULL on failure (bad name).
+ *
+ * Pass a non-NULL thread for client commands and NULL otherwise.
*/
static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
+lookupCreateVariable(TState *thread, Variables *variables, const char *context,
+ char *name)
{
Variable *var;
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
{
Variable *newvars;
@@ -1355,29 +1611,34 @@ lookupCreateVariable(CState *st, const char *context, char *name)
*/
if (!valid_variable_name(name))
{
- fprintf(stderr, "%s: invalid variable name: \"%s\"\n",
- context, name);
+ /*
+ * About the error level used: when processing client commands this is a
+ * normal failure; otherwise it is not, and we exit the program.
+ */
+ elog(thread, thread ? ELEVEL_CLIENT_FAIL : ELEVEL_MAIN,
+ "%s: invalid variable name: \"%s\"\n",
+ context, name);
return NULL;
}
/* Create variable at the end of the array */
- if (st->variables)
- newvars = (Variable *) pg_realloc(st->variables,
- (st->nvariables + 1) * sizeof(Variable));
+ if (variables->array)
+ newvars = (Variable *) pg_realloc(variables->array,
+ (variables->nvariables + 1) * sizeof(Variable));
else
newvars = (Variable *) pg_malloc(sizeof(Variable));
- st->variables = newvars;
+ variables->array = newvars;
- var = &newvars[st->nvariables];
+ var = &newvars[variables->nvariables];
var->name = pg_strdup(name);
var->svalue = NULL;
/* caller is expected to initialize remaining fields */
- st->nvariables++;
+ variables->nvariables++;
/* we don't re-sort the array till we have to */
- st->vars_sorted = false;
+ variables->vars_sorted = false;
}
return var;
@@ -1386,12 +1647,13 @@ lookupCreateVariable(CState *st, const char *context, char *name)
/* Assign a string value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariable(CState *st, const char *context, char *name, const char *value)
+putVariable(Variables *variables, const char *context, char *name,
+ const char *value)
{
Variable *var;
char *val;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(NULL, variables, context, name);
if (!var)
return false;
@@ -1408,13 +1670,14 @@ putVariable(CState *st, const char *context, char *name, const char *value)
/* Assign a value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
+/* Pass a non-NULL thread for client commands and NULL otherwise */
static bool
-putVariableValue(CState *st, const char *context, char *name,
- const PgBenchValue *value)
+putVariableValue(TState *thread, Variables *variables, const char *context,
+ char *name, const PgBenchValue *value)
{
Variable *var;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(thread, variables, context, name);
if (!var)
return false;
@@ -1428,13 +1691,15 @@ putVariableValue(CState *st, const char *context, char *name,
/* Assign an integer value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
+/* Pass a non-NULL thread for client commands and NULL otherwise */
static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putVariableInt(TState *thread, Variables *variables, const char *context,
+ char *name, int64 value)
{
PgBenchValue val;
setIntValue(&val, value);
- return putVariableValue(st, context, name, &val);
+ return putVariableValue(thread, variables, context, name, &val);
}
/*
@@ -1489,7 +1754,7 @@ replaceVariable(char **sql, char *param, int len, char *value)
}
static char *
-assignVariables(CState *st, char *sql)
+assignVariables(Variables *variables, char *sql)
{
char *p,
*name,
@@ -1510,7 +1775,7 @@ assignVariables(CState *st, char *sql)
continue;
}
- val = getVariable(st, name);
+ val = getVariable(variables, name);
free(name);
if (val == NULL)
{
@@ -1525,12 +1790,13 @@ assignVariables(CState *st, char *sql)
}
static void
-getQueryParams(CState *st, const Command *command, const char **params)
+getQueryParams(Variables *variables, const Command *command,
+ const char **params)
{
int i;
for (i = 0; i < command->argc - 1; i++)
- params[i] = getVariable(st, command->argv[i + 1]);
+ params[i] = getVariable(variables, command->argv[i + 1]);
}
static char *
@@ -1556,7 +1822,7 @@ valueTypeName(PgBenchValue *pval)
/* get a value as a boolean, or tell if there is a problem */
static bool
-coerceToBool(PgBenchValue *pval, bool *bval)
+coerceToBool(TState *thread, PgBenchValue *pval, bool *bval)
{
if (pval->type == PGBT_BOOLEAN)
{
@@ -1565,7 +1831,8 @@ coerceToBool(PgBenchValue *pval, bool *bval)
}
else /* NULL, INT or DOUBLE */
{
- fprintf(stderr, "cannot coerce %s to boolean\n", valueTypeName(pval));
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "cannot coerce %s to boolean\n", valueTypeName(pval));
*bval = false; /* suppress uninitialized-variable warnings */
return false;
}
@@ -1597,7 +1864,7 @@ valueTruth(PgBenchValue *pval)
/* get a value as an int, tell if there is a problem */
static bool
-coerceToInt(PgBenchValue *pval, int64 *ival)
+coerceToInt(TState *thread, PgBenchValue *pval, int64 *ival)
{
if (pval->type == PGBT_INT)
{
@@ -1610,7 +1877,8 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
if (dval < PG_INT64_MIN || PG_INT64_MAX < dval)
{
- fprintf(stderr, "double to int overflow for %f\n", dval);
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "double to int overflow for %f\n", dval);
return false;
}
*ival = (int64) dval;
@@ -1618,14 +1886,15 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
}
else /* BOOLEAN or NULL */
{
- fprintf(stderr, "cannot coerce %s to int\n", valueTypeName(pval));
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "cannot coerce %s to int\n", valueTypeName(pval));
return false;
}
}
/* get a value as a double, or tell if there is a problem */
static bool
-coerceToDouble(PgBenchValue *pval, double *dval)
+coerceToDouble(TState *thread, PgBenchValue *pval, double *dval)
{
if (pval->type == PGBT_DOUBLE)
{
@@ -1639,7 +1908,8 @@ coerceToDouble(PgBenchValue *pval, double *dval)
}
else /* BOOLEAN or NULL */
{
- fprintf(stderr, "cannot coerce %s to double\n", valueTypeName(pval));
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "cannot coerce %s to double\n", valueTypeName(pval));
return false;
}
}
@@ -1707,7 +1977,7 @@ evalLazyFunc(TState *thread, CState *st,
return true;
}
- if (!coerceToBool(&a1, &ba1))
+ if (!coerceToBool(thread, &a1, &ba1))
return false;
if (!ba1)
@@ -1724,7 +1994,7 @@ evalLazyFunc(TState *thread, CState *st,
setNullValue(retval);
return true;
}
- else if (!coerceToBool(&a2, &ba2))
+ else if (!coerceToBool(thread, &a2, &ba2))
return false;
else
{
@@ -1742,7 +2012,7 @@ evalLazyFunc(TState *thread, CState *st,
return true;
}
- if (!coerceToBool(&a1, &ba1))
+ if (!coerceToBool(thread, &a1, &ba1))
return false;
if (ba1)
@@ -1759,7 +2029,7 @@ evalLazyFunc(TState *thread, CState *st,
setNullValue(retval);
return true;
}
- else if (!coerceToBool(&a2, &ba2))
+ else if (!coerceToBool(thread, &a2, &ba2))
return false;
else
{
@@ -1817,8 +2087,8 @@ evalStandardFunc(TState *thread, CState *st,
if (l != NULL)
{
- fprintf(stderr,
- "too many function arguments, maximum is %d\n", MAX_FARGS);
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "too many function arguments, maximum is %d\n", MAX_FARGS);
return false;
}
@@ -1855,8 +2125,8 @@ evalStandardFunc(TState *thread, CState *st,
double ld,
rd;
- if (!coerceToDouble(lval, &ld) ||
- !coerceToDouble(rval, &rd))
+ if (!coerceToDouble(thread, lval, &ld) ||
+ !coerceToDouble(thread, rval, &rd))
return false;
switch (func)
@@ -1903,8 +2173,8 @@ evalStandardFunc(TState *thread, CState *st,
int64 li,
ri;
- if (!coerceToInt(lval, &li) ||
- !coerceToInt(rval, &ri))
+ if (!coerceToInt(thread, lval, &li) ||
+ !coerceToInt(thread, rval, &ri))
return false;
switch (func)
@@ -1941,7 +2211,8 @@ evalStandardFunc(TState *thread, CState *st,
case PGBENCH_MOD:
if (ri == 0)
{
- fprintf(stderr, "division by zero\n");
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "division by zero\n");
return false;
}
/* special handling of -1 divisor */
@@ -1952,7 +2223,8 @@ evalStandardFunc(TState *thread, CState *st,
/* overflow check (needed for INT64_MIN) */
if (li == PG_INT64_MIN)
{
- fprintf(stderr, "bigint out of range\n");
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "bigint out of range\n");
return false;
}
else
@@ -1986,7 +2258,8 @@ evalStandardFunc(TState *thread, CState *st,
{
int64 li, ri;
- if (!coerceToInt(&vargs[0], &li) || !coerceToInt(&vargs[1], &ri))
+ if (!coerceToInt(thread, &vargs[0], &li) ||
+ !coerceToInt(thread, &vargs[1], &ri))
return false;
if (func == PGBENCH_BITAND)
@@ -2009,7 +2282,7 @@ evalStandardFunc(TState *thread, CState *st,
case PGBENCH_NOT:
{
bool b;
- if (!coerceToBool(&vargs[0], &b))
+ if (!coerceToBool(thread, &vargs[0], &b))
return false;
setBoolValue(retval, !b);
@@ -2051,19 +2324,24 @@ evalStandardFunc(TState *thread, CState *st,
Assert(nargs == 1);
- fprintf(stderr, "debug(script=%d,command=%d): ",
- st->use_file, st->command + 1);
-
- if (varg->type == PGBT_NULL)
- fprintf(stderr, "null\n");
- else if (varg->type == PGBT_BOOLEAN)
- fprintf(stderr, "boolean %s\n", varg->u.bval ? "true" : "false");
- else if (varg->type == PGBT_INT)
- fprintf(stderr, "int " INT64_FORMAT "\n", varg->u.ival);
- else if (varg->type == PGBT_DOUBLE)
- fprintf(stderr, "double %.*g\n", DBL_DIG, varg->u.dval);
- else /* internal error, unexpected type */
- Assert(0);
+ if (errstart(thread, ELEVEL_MAIN))
+ {
+ errmsg(thread, "debug(script=%d,command=%d): ",
+ st->use_file, st->command + 1);
+
+ if (varg->type == PGBT_NULL)
+ errmsg(thread, "null\n");
+ else if (varg->type == PGBT_BOOLEAN)
+ errmsg(thread, "boolean %s\n", varg->u.bval ? "true" : "false");
+ else if (varg->type == PGBT_INT)
+ errmsg(thread, "int " INT64_FORMAT "\n", varg->u.ival);
+ else if (varg->type == PGBT_DOUBLE)
+ errmsg(thread, "double %.*g\n", DBL_DIG, varg->u.dval);
+ else /* internal error, unexpected type */
+ Assert(0);
+
+ errfinish(thread);
+ }
*retval = *varg;
@@ -2080,7 +2358,7 @@ evalStandardFunc(TState *thread, CState *st,
Assert(nargs == 1);
- if (!coerceToDouble(&vargs[0], &dval))
+ if (!coerceToDouble(thread, &vargs[0], &dval))
return false;
if (func == PGBENCH_SQRT)
@@ -2102,7 +2380,7 @@ evalStandardFunc(TState *thread, CState *st,
Assert(nargs == 1);
- if (!coerceToInt(&vargs[0], &ival))
+ if (!coerceToInt(thread, &vargs[0], &ival))
return false;
setIntValue(retval, ival);
@@ -2132,13 +2410,13 @@ evalStandardFunc(TState *thread, CState *st,
{
double extremum;
- if (!coerceToDouble(&vargs[0], &extremum))
+ if (!coerceToDouble(thread, &vargs[0], &extremum))
return false;
for (i = 1; i < nargs; i++)
{
double dval;
- if (!coerceToDouble(&vargs[i], &dval))
+ if (!coerceToDouble(thread, &vargs[i], &dval))
return false;
if (func == PGBENCH_LEAST)
extremum = Min(extremum, dval);
@@ -2151,13 +2429,13 @@ evalStandardFunc(TState *thread, CState *st,
{
int64 extremum;
- if (!coerceToInt(&vargs[0], &extremum))
+ if (!coerceToInt(thread, &vargs[0], &extremum))
return false;
for (i = 1; i < nargs; i++)
{
int64 ival;
- if (!coerceToInt(&vargs[i], &ival))
+ if (!coerceToInt(thread, &vargs[i], &ival))
return false;
if (func == PGBENCH_LEAST)
extremum = Min(extremum, ival);
@@ -2180,27 +2458,29 @@ evalStandardFunc(TState *thread, CState *st,
Assert(nargs >= 2);
- if (!coerceToInt(&vargs[0], &imin) ||
- !coerceToInt(&vargs[1], &imax))
+ if (!coerceToInt(thread, &vargs[0], &imin) ||
+ !coerceToInt(thread, &vargs[1], &imax))
return false;
/* check random range */
if (imin > imax)
{
- fprintf(stderr, "empty range given to random\n");
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "empty range given to random\n");
return false;
}
else if (imax - imin < 0 || (imax - imin) + 1 < 0)
{
/* prevent int overflows in random functions */
- fprintf(stderr, "random range is too large\n");
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "random range is too large\n");
return false;
}
if (func == PGBENCH_RANDOM)
{
Assert(nargs == 2);
- setIntValue(retval, getrand(thread, imin, imax));
+ setIntValue(retval, getrand(&st->random_state, imin, imax));
}
else /* gaussian & exponential */
{
@@ -2208,46 +2488,49 @@ evalStandardFunc(TState *thread, CState *st,
Assert(nargs == 3);
- if (!coerceToDouble(&vargs[2], ¶m))
+ if (!coerceToDouble(thread, &vargs[2], ¶m))
return false;
if (func == PGBENCH_RANDOM_GAUSSIAN)
{
if (param < MIN_GAUSSIAN_PARAM)
{
- fprintf(stderr,
- "gaussian parameter must be at least %f "
- "(not %f)\n", MIN_GAUSSIAN_PARAM, param);
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "gaussian parameter must be at least %f "
+ "(not %f)\n", MIN_GAUSSIAN_PARAM, param);
return false;
}
setIntValue(retval,
- getGaussianRand(thread, imin, imax, param));
+ getGaussianRand(&st->random_state, imin,
+ imax, param));
}
else if (func == PGBENCH_RANDOM_ZIPFIAN)
{
if (param <= 0.0 || param == 1.0 || param > MAX_ZIPFIAN_PARAM)
{
- fprintf(stderr,
- "zipfian parameter must be in range (0, 1) U (1, %d]"
- " (got %f)\n", MAX_ZIPFIAN_PARAM, param);
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "zipfian parameter must be in range (0, 1) U (1, %d]"
+ " (got %f)\n", MAX_ZIPFIAN_PARAM, param);
return false;
}
setIntValue(retval,
- getZipfianRand(thread, imin, imax, param));
+ getZipfianRand(thread, &st->random_state,
+ imin, imax, param));
}
else /* exponential */
{
if (param <= 0.0)
{
- fprintf(stderr,
- "exponential parameter must be greater than zero"
- " (got %f)\n", param);
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "exponential parameter must be greater than zero"
+ " (got %f)\n", param);
return false;
}
setIntValue(retval,
- getExponentialRand(thread, imin, imax, param));
+ getExponentialRand(&st->random_state, imin,
+ imax, param));
}
}
@@ -2263,8 +2546,8 @@ evalStandardFunc(TState *thread, CState *st,
Assert(nargs == 2);
- if (!coerceToDouble(lval, &ld) ||
- !coerceToDouble(rval, &rd))
+ if (!coerceToDouble(thread, lval, &ld) ||
+ !coerceToDouble(thread, rval, &rd))
return false;
setDoubleValue(retval, pow(ld, rd));
@@ -2291,8 +2574,8 @@ evalStandardFunc(TState *thread, CState *st,
Assert(nargs == 2);
- if (!coerceToInt(&vargs[0], &val) ||
- !coerceToInt(&vargs[1], &seed))
+ if (!coerceToInt(thread, &vargs[0], &val) ||
+ !coerceToInt(thread, &vargs[1], &seed))
return false;
if (func == PGBENCH_HASH_MURMUR2)
@@ -2346,14 +2629,15 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
{
Variable *var;
- if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
+ if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
- fprintf(stderr, "undefined variable \"%s\"\n",
- expr->u.variable.varname);
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "undefined variable \"%s\"\n",
+ expr->u.variable.varname);
return false;
}
- if (!makeVariableValue(var))
+ if (!makeVariableValue(thread, var))
return false;
*retval = var->value;
@@ -2368,8 +2652,8 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
default:
/* internal error which should never occur */
- fprintf(stderr, "unexpected enode type in evaluation: %d\n",
- expr->etype);
+ elog(thread, ELEVEL_MAIN,
+ "unexpected enode type in evaluation: %d\n", expr->etype);
exit(1);
}
}
@@ -2410,7 +2694,8 @@ getMetaCommand(const char *cmd)
* Return true if succeeded, or false on error.
*/
static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(TState *thread, Variables *variables, char *variable,
+ char **argv, int argc)
{
char command[SHELL_COMMAND_SIZE];
int i,
@@ -2441,17 +2726,18 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
arg = argv[i] + 1; /* a string literal starting with colons */
}
- else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
+ else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
- fprintf(stderr, "%s: undefined variable \"%s\"\n",
- argv[0], argv[i]);
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "%s: undefined variable \"%s\"\n", argv[0], argv[i]);
return false;
}
arglen = strlen(arg);
if (len + arglen + (i > 0 ? 1 : 0) >= SHELL_COMMAND_SIZE - 1)
{
- fprintf(stderr, "%s: shell command is too long\n", argv[0]);
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "%s: shell command is too long\n", argv[0]);
return false;
}
@@ -2469,7 +2755,10 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
if (system(command))
{
if (!timer_exceeded)
- fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
+ {
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "%s: could not launch shell command\n", argv[0]);
+ }
return false;
}
return true;
@@ -2478,19 +2767,24 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
/* Execute the command with pipe and read the standard output. */
if ((fp = popen(command, "r")) == NULL)
{
- fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "%s: could not launch shell command\n", argv[0]);
return false;
}
if (fgets(res, sizeof(res), fp) == NULL)
{
if (!timer_exceeded)
- fprintf(stderr, "%s: could not read result of shell command\n", argv[0]);
+ {
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "%s: could not read result of shell command\n", argv[0]);
+ }
(void) pclose(fp);
return false;
}
if (pclose(fp) < 0)
{
- fprintf(stderr, "%s: could not close shell command\n", argv[0]);
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "%s: could not close shell command\n", argv[0]);
return false;
}
@@ -2500,11 +2794,12 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
endptr++;
if (*res == '\0' || *endptr != '\0')
{
- fprintf(stderr, "%s: shell command must return an integer (not \"%s\")\n",
- argv[0], res);
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "%s: shell command must return an integer (not \"%s\")\n",
+ argv[0], res);
return false;
}
- if (!putVariableInt(st, "setshell", variable, retval))
+ if (!putVariableInt(thread, variables, "setshell", variable, retval))
return false;
#ifdef DEBUG
@@ -2521,11 +2816,47 @@ preparedStatementName(char *buffer, int file, int state)
}
static void
-commandFailed(CState *st, const char *cmd, const char *message)
+commandFailed(TState *thread, CState *st, const char *cmd, const char *message,
+ ErrorLevel error_level)
{
- fprintf(stderr,
- "client %d aborted in command %d (%s) of script %d; %s\n",
- st->id, st->command, cmd, st->use_file, message);
+ switch (error_level)
+ {
+ case ELEVEL_CLIENT_FAIL:
+ if (st->first_failure.status == NO_FAILURE)
+ {
+ /*
+ * This is the first failure during the execution of the current
+ * script.
+ */
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "client %d got a failure in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file, message);
+ }
+ else
+ {
+ /*
+ * This is not the first failure during the execution of the
+ * current script.
+ */
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "client %d continues a failed transaction in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file, message);
+ }
+ break;
+ case ELEVEL_CLIENT_ABORTED:
+ elog(thread, ELEVEL_CLIENT_ABORTED,
+ "client %d aborted in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file, message);
+ break;
+ case ELEVEL_DEBUG:
+ case ELEVEL_MAIN:
+ default:
+ /* internal error which should never occur */
+ elog(thread, ELEVEL_MAIN,
+ "unexpected error level when the command failed: %d\n",
+ error_level);
+ exit(1);
+ }
}
/* return a script number with a weighted choice. */
@@ -2538,7 +2869,7 @@ chooseScript(TState *thread)
if (num_scripts == 1)
return 0;
- w = getrand(thread, 0, total_weight - 1);
+ w = getrand(&thread->random_state, 0, total_weight - 1);
do
{
w -= sql_script[i++].weight;
@@ -2549,7 +2880,7 @@ chooseScript(TState *thread)
/* Send a SQL command, using the chosen querymode */
static bool
-sendCommand(CState *st, Command *command)
+sendCommand(TState *thread, CState *st, Command *command)
{
int r;
@@ -2558,10 +2889,9 @@ sendCommand(CState *st, Command *command)
char *sql;
sql = pg_strdup(command->argv[0]);
- sql = assignVariables(st, sql);
+ sql = assignVariables(&st->variables, sql);
- if (debug)
- fprintf(stderr, "client %d sending %s\n", st->id, sql);
+ elog(thread, ELEVEL_DEBUG, "client %d sending %s\n", st->id, sql);
r = PQsendQuery(st->con, sql);
free(sql);
}
@@ -2570,10 +2900,9 @@ sendCommand(CState *st, Command *command)
const char *sql = command->argv[0];
const char *params[MAX_ARGS];
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
- if (debug)
- fprintf(stderr, "client %d sending %s\n", st->id, sql);
+ elog(thread, ELEVEL_DEBUG, "client %d sending %s\n", st->id, sql);
r = PQsendQueryParams(st->con, sql, command->argc - 1,
NULL, params, NULL, NULL, 0);
}
@@ -2598,17 +2927,16 @@ sendCommand(CState *st, Command *command)
res = PQprepare(st->con, name,
commands[j]->argv[0], commands[j]->argc - 1, NULL);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
- fprintf(stderr, "%s", PQerrorMessage(st->con));
+ elog(thread, ELEVEL_MAIN, "%s", PQerrorMessage(st->con));
PQclear(res);
}
st->prepared[st->use_file] = true;
}
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
- if (debug)
- fprintf(stderr, "client %d sending %s\n", st->id, name);
+ elog(thread, ELEVEL_DEBUG, "client %d sending %s\n", st->id, name);
r = PQsendQueryPrepared(st->con, name, command->argc - 1,
params, NULL, NULL, 0);
}
@@ -2617,10 +2945,8 @@ sendCommand(CState *st, Command *command)
if (r == 0)
{
- if (debug)
- fprintf(stderr, "client %d could not send %s\n",
- st->id, command->argv[0]);
- st->ecnt++;
+ elog(thread, ELEVEL_DEBUG,
+ "client %d could not send %s\n", st->id, command->argv[0]);
return false;
}
else
@@ -2632,17 +2958,18 @@ sendCommand(CState *st, Command *command)
* of delay, in microseconds. Returns true on success, false on error.
*/
static bool
-evaluateSleep(CState *st, int argc, char **argv, int *usecs)
+evaluateSleep(TState *thread, Variables *variables, int argc, char **argv,
+ int *usecs)
{
char *var;
int usec;
if (*argv[1] == ':')
{
- if ((var = getVariable(st, argv[1] + 1)) == NULL)
+ if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
- fprintf(stderr, "%s: undefined variable \"%s\"\n",
- argv[0], argv[1]);
+ elog(thread, ELEVEL_CLIENT_FAIL,
+ "%s: undefined variable \"%s\"\n", argv[0], argv[1]);
return false;
}
usec = atoi(var);
@@ -2665,6 +2992,186 @@ evaluateSleep(CState *st, int argc, char **argv, int *usecs)
}
/*
+ * Get the total number of processed transactions, including skipped and
+ * failed ones.
+ */
+static int64
+getTotalCnt(const CState *st)
+{
+ return st->cnt + st->ecnt;
+}
+
+/*
+ * Copy a random state (an array of three unsigned shorts).
+ */
+static void
+copyRandomState(RandomState *destination, const RandomState *source)
+{
+ memcpy(destination->data, source->data, sizeof(unsigned short) * 3);
+}
+
+/*
+ * Make a deep copy of variables array.
+ */
+static void
+copyVariables(Variables *destination_vars, const Variables *source_vars)
+{
+ Variable *destination;
+ Variable *current_destination;
+ const Variable *source;
+ const Variable *current_source;
+ int nvariables;
+
+ if (!destination_vars || !source_vars)
+ return;
+
+ destination = destination_vars->array;
+ source = source_vars->array;
+ nvariables = source_vars->nvariables;
+
+ for (current_destination = destination;
+ current_destination - destination < destination_vars->nvariables;
+ ++current_destination)
+ {
+ pg_free(current_destination->name);
+ pg_free(current_destination->svalue);
+ }
+
+ destination_vars->array = pg_realloc(destination_vars->array,
+ sizeof(Variable) * nvariables);
+ destination = destination_vars->array;
+
+ for (current_source = source, current_destination = destination;
+ current_source - source < nvariables;
+ ++current_source, ++current_destination)
+ {
+ current_destination->name = pg_strdup(current_source->name);
+ if (current_source->svalue)
+ current_destination->svalue = pg_strdup(current_source->svalue);
+ else
+ current_destination->svalue = NULL;
+ current_destination->value = current_source->value;
+ }
+
+ destination_vars->nvariables = nvariables;
+ destination_vars->vars_sorted = source_vars->vars_sorted;
+}
+
+/*
+ * Returns true if this type of failure can be retried.
+ */
+static bool
+canRetryFailure(FailureStatus failure_status)
+{
+ return (failure_status == SERIALIZATION_FAILURE ||
+ failure_status == DEADLOCK_FAILURE);
+}
+
+/*
+ * Returns true if the failure can be retried.
+ */
+static bool
+canRetry(CState *st, instr_time *now)
+{
+ FailureStatus failure_status = st->first_failure.status;
+
+ Assert(failure_status != NO_FAILURE);
+
+ /* We can only retry serialization or deadlock failures. */
+ if (!canRetryFailure(failure_status))
+ return false;
+
+ /*
+ * We must have at least one option to limit the retrying of failed
+ * transactions.
+ */
+ Assert(max_tries || latency_limit);
+
+ /*
+ * We cannot retry the failure if we have reached the maximum number of
+ * tries.
+ */
+ if (max_tries && st->retries + 1 >= max_tries)
+ return false;
+
+ /*
+ * We cannot retry the failure if we spent too much time on this
+ * transaction.
+ */
+ if (latency_limit)
+ {
+ if (INSTR_TIME_IS_ZERO(*now))
+ INSTR_TIME_SET_CURRENT(*now);
+
+ if (INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled >= latency_limit)
+ return false;
+ }
+
+ /* OK */
+ return true;
+}
+
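A minimal sketch of the two limits canRetry() checks (the stand-alone form and argument names are mine; pgbench keeps this state in CState and globals): a retry is allowed only while the try counter stays under the maximum number of tries and the elapsed time stays under the latency limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-alone sketch of the canRetry() limit checks; argument names are
 * illustrative. retries is the number of retries already performed, so
 * retries + 1 is the number of tries used so far. A zero limit means
 * "no limit of this kind". */
static bool
retry_allowed(int retries, int max_tries,
			  int64_t elapsed_us, int64_t latency_limit_us)
{
	if (max_tries && retries + 1 >= max_tries)
		return false;
	if (latency_limit_us && elapsed_us >= latency_limit_us)
		return false;
	return true;
}
```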
+/*
+ * Process the conditional stack depending on the condition value; is used for
+ * the meta commands \if and \elif.
+ */
+static void
+executeCondition(CState *st, bool condition)
+{
+ Command *command = sql_script[st->use_file].commands[st->command];
+
+ /* execute or not depending on evaluated condition */
+ if (command->meta == META_IF)
+ {
+ conditional_stack_push(st->cstack,
+ condition ? IFSTATE_TRUE : IFSTATE_FALSE);
+ }
+ else if (command->meta == META_ELIF)
+ {
+ /* we should get here only if the "elif" needed evaluation */
+ Assert(conditional_stack_peek(st->cstack) == IFSTATE_FALSE);
+ conditional_stack_poke(st->cstack,
+ condition ? IFSTATE_TRUE : IFSTATE_FALSE);
+ }
+}
+
+/*
+ * Get the failure status from the error code.
+ */
+static FailureStatus
+getFailureStatus(char *sqlState)
+{
+ if (sqlState)
+ {
+ if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
+ return SERIALIZATION_FAILURE;
+ else if (strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
+ return DEADLOCK_FAILURE;
+ else if (strcmp(sqlState, ERRCODE_IN_FAILED_SQL_TRANSACTION) == 0)
+ return IN_FAILED_SQL_TRANSACTION;
+ }
+
+ return ANOTHER_FAILURE;
+}
+
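The SQLSTATE values behind the ERRCODE_* macros used above are the standard PostgreSQL codes 40001 (serialization_failure), 40P01 (deadlock_detected) and 25P02 (in_failed_sql_transaction). A self-contained sketch of the same classification, with the string literals spelled out in place of the macros:

```c
#include <string.h>

/* Self-contained sketch of getFailureStatus() with the SQLSTATE literals
 * spelled out; in the patch they come from the ERRCODE_* macros. */
typedef enum
{
	NO_FAILURE,
	SERIALIZATION_FAILURE,
	DEADLOCK_FAILURE,
	IN_FAILED_SQL_TRANSACTION,
	ANOTHER_FAILURE
} FailureStatus;

static FailureStatus
classify_sqlstate(const char *sqlState)
{
	if (sqlState)
	{
		if (strcmp(sqlState, "40001") == 0)
			return SERIALIZATION_FAILURE;
		if (strcmp(sqlState, "40P01") == 0)
			return DEADLOCK_FAILURE;
		if (strcmp(sqlState, "25P02") == 0)
			return IN_FAILED_SQL_TRANSACTION;
	}
	return ANOTHER_FAILURE;
}
```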
+/*
+ * If the latency limit is used, return the latency of the current transaction
+ * as a percentage of the latency limit. Otherwise return zero.
+ */
+static double
+getLatencyUsed(CState *st, instr_time *now)
+{
+ if (!latency_limit)
+ return 0;
+
+ if (INSTR_TIME_IS_ZERO(*now))
+ INSTR_TIME_SET_CURRENT(*now);
+
+ return (100.0 * (INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled) /
+ latency_limit);
+}
+
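The arithmetic of getLatencyUsed() is simply the time elapsed since the transaction was scheduled, expressed as a percentage of the latency limit; a stand-alone sketch with invented names:

```c
/* Stand-alone sketch of the getLatencyUsed() arithmetic; a zero limit
 * means the latency limit option is not used and yields 0. */
static double
latency_used_percent(long long now_us, long long txn_scheduled_us,
					 long long latency_limit_us)
{
	if (!latency_limit_us)
		return 0.0;
	return 100.0 * (now_us - txn_scheduled_us) / latency_limit_us;
}
```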
+/*
* Advance the state machine of a connection, if possible.
*/
static void
@@ -2675,6 +3182,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
instr_time now;
bool end_tx_processed = false;
int64 wait;
+ FailureStatus failure_status = NO_FAILURE;
/*
* gettimeofday() isn't free, so we get the current timestamp lazily the
@@ -2705,9 +3213,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->use_file = chooseScript(thread);
- if (debug)
- fprintf(stderr, "client %d executing script \"%s\"\n", st->id,
- sql_script[st->use_file].desc);
+ elog(thread, ELEVEL_DEBUG,
+ "client %d executing script \"%s\"\n",
+ st->id, sql_script[st->use_file].desc);
if (throttle_delay > 0)
st->state = CSTATE_START_THROTTLE;
@@ -2715,6 +3223,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->state = CSTATE_START_TX;
/* check consistency */
Assert(conditional_stack_empty(st->cstack));
+
+ /* reset transaction variables to default values */
+ st->first_failure.status = NO_FAILURE;
+ st->retries = 0;
+
break;
/*
@@ -2732,7 +3245,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* away.
*/
Assert(throttle_delay > 0);
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(&thread->random_state, throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
@@ -2762,16 +3275,17 @@ doCustom(TState *thread, CState *st, StatsData *agg)
INSTR_TIME_SET_CURRENT(now);
now_us = INSTR_TIME_GET_MICROSEC(now);
while (thread->throttle_trigger < now_us - latency_limit &&
- (nxacts <= 0 || st->cnt < nxacts))
+ (nxacts <= 0 || getTotalCnt(st) < nxacts))
{
processXactStats(thread, st, &now, true, agg);
/* next rendez-vous */
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(&thread->random_state,
+ throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
}
/* stop client if -t exceeded */
- if (nxacts > 0 && st->cnt >= nxacts)
+ if (nxacts > 0 && getTotalCnt(st) >= nxacts)
{
st->state = CSTATE_FINISHED;
break;
@@ -2779,9 +3293,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
st->state = CSTATE_THROTTLE;
- if (debug)
- fprintf(stderr, "client %d throttling " INT64_FORMAT " us\n",
- st->id, wait);
+ elog(thread, ELEVEL_DEBUG,
+ "client %d throttling " INT64_FORMAT " us\n",
+ st->id, wait);
break;
/*
@@ -2811,10 +3325,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
start = now;
- if ((st->con = doConnect()) == NULL)
+ if ((st->con = doConnect(thread)) == NULL)
{
- fprintf(stderr, "client %d aborted while establishing connection\n",
- st->id);
+ elog(thread, ELEVEL_CLIENT_ABORTED,
+ "client %d aborted while establishing connection\n",
+ st->id);
st->state = CSTATE_ABORTED;
break;
}
@@ -2826,7 +3341,16 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
/*
- * Record transaction start time under logging, progress or
+ * This is the first try to run this transaction. Remember its
+ * parameters in case it fails and has to be retried.
+ */
+ copyRandomState(&st->retry_state.random_state,
+ &st->random_state);
+ copyVariables(&st->retry_state.variables, &st->variables);
+
+ /*
+ * Record transaction start time under logging, progress, or
* throttling.
*/
if (use_log || progress || throttle_delay || latency_limit ||
@@ -2861,7 +3385,15 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
if (command == NULL)
{
- st->state = CSTATE_END_TX;
+ if (st->first_failure.status == NO_FAILURE)
+ {
+ st->state = CSTATE_END_TX;
+ }
+ else
+ {
+ /* check if we can retry the failure */
+ st->state = CSTATE_RETRY;
+ }
break;
}
@@ -2869,7 +3401,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Record statement start time if per-command latencies are
* requested
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -2878,9 +3410,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (command->type == SQL_COMMAND)
{
- if (!sendCommand(st, command))
+ if (!sendCommand(thread, st, command))
{
- commandFailed(st, "SQL", "SQL command send failed");
+ commandFailed(thread, st, "SQL",
+ "SQL command send failed",
+ ELEVEL_CLIENT_ABORTED);
st->state = CSTATE_ABORTED;
}
else
@@ -2892,14 +3426,19 @@ doCustom(TState *thread, CState *st, StatsData *agg)
i;
char **argv = command->argv;
- if (debug)
+ if (errstart(thread, ELEVEL_DEBUG))
{
- fprintf(stderr, "client %d executing \\%s", st->id, argv[0]);
+ errmsg(thread, "client %d executing \\%s",
+ st->id, argv[0]);
for (i = 1; i < argc; i++)
- fprintf(stderr, " %s", argv[i]);
- fprintf(stderr, "\n");
+ errmsg(thread, " %s", argv[i]);
+ errmsg(thread, "\n");
+ errfinish(thread);
}
+ /* assume success; changed below if the meta command fails */
+ failure_status = NO_FAILURE;
+
if (command->meta == META_SLEEP)
{
/*
@@ -2911,10 +3450,14 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
int usec;
- if (!evaluateSleep(st, argc, argv, &usec))
+ if (!evaluateSleep(thread, &st->variables, argc, argv,
+ &usec))
{
- commandFailed(st, "sleep", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(thread, st, "sleep",
+ "execution of meta-command failed",
+ ELEVEL_CLIENT_FAIL);
+ failure_status = ANOTHER_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
@@ -2942,35 +3485,37 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (!evaluateExpr(thread, st, expr, &result))
{
- commandFailed(st, argv[0], "evaluation of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(thread, st, argv[0],
+ "evaluation of meta-command failed",
+ ELEVEL_CLIENT_FAIL);
+
+ /*
+ * Do not ruin the following conditional commands,
+ * if any.
+ */
+ executeCondition(st, false);
+
+ failure_status = ANOTHER_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
if (command->meta == META_SET)
{
- if (!putVariableValue(st, argv[0], argv[1], &result))
+ if (!putVariableValue(thread, &st->variables,
+ argv[0], argv[1], &result))
{
- commandFailed(st, "set", "assignment of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(thread, st, "set",
+ "assignment of meta-command failed",
+ ELEVEL_CLIENT_FAIL);
+ failure_status = ANOTHER_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
}
else /* if and elif evaluated cases */
{
- bool cond = valueTruth(&result);
-
- /* execute or not depending on evaluated condition */
- if (command->meta == META_IF)
- {
- conditional_stack_push(st->cstack, cond ? IFSTATE_TRUE : IFSTATE_FALSE);
- }
- else /* elif */
- {
- /* we should get here only if the "elif" needed evaluation */
- Assert(conditional_stack_peek(st->cstack) == IFSTATE_FALSE);
- conditional_stack_poke(st->cstack, cond ? IFSTATE_TRUE : IFSTATE_FALSE);
- }
+ executeCondition(st, valueTruth(&result));
}
}
else if (command->meta == META_ELSE)
@@ -2999,7 +3544,10 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (command->meta == META_SETSHELL)
{
- bool ret = runShellCommand(st, argv[1], argv + 2, argc - 2);
+ bool ret = runShellCommand(thread,
+ &st->variables,
+ argv[1], argv + 2,
+ argc - 2);
if (timer_exceeded) /* timeout */
{
@@ -3008,8 +3556,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (!ret) /* on error */
{
- commandFailed(st, "setshell", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(thread, st, "setshell",
+ "execution of meta-command failed",
+ ELEVEL_CLIENT_FAIL);
+ failure_status = ANOTHER_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
else
@@ -3019,7 +3570,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (command->meta == META_SHELL)
{
- bool ret = runShellCommand(st, NULL, argv + 1, argc - 1);
+ bool ret = runShellCommand(thread,
+ &st->variables, NULL,
+ argv + 1, argc - 1);
if (timer_exceeded) /* timeout */
{
@@ -3028,8 +3581,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (!ret) /* on error */
{
- commandFailed(st, "shell", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(thread, st, "shell",
+ "execution of meta-command failed",
+ ELEVEL_CLIENT_FAIL);
+ failure_status = ANOTHER_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
else
@@ -3134,37 +3690,56 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Wait for the current SQL command to complete
*/
case CSTATE_WAIT_RESULT:
- command = sql_script[st->use_file].commands[st->command];
- if (debug)
- fprintf(stderr, "client %d receiving\n", st->id);
- if (!PQconsumeInput(st->con))
- { /* there's something wrong */
- commandFailed(st, "SQL", "perhaps the backend died while processing");
- st->state = CSTATE_ABORTED;
- break;
- }
- if (PQisBusy(st->con))
- return; /* don't have the whole result yet */
-
- /*
- * Read and discard the query result;
- */
- res = PQgetResult(st->con);
- switch (PQresultStatus(res))
{
- case PGRES_COMMAND_OK:
- case PGRES_TUPLES_OK:
- case PGRES_EMPTY_QUERY:
- /* OK */
- PQclear(res);
- discard_response(st);
- st->state = CSTATE_END_COMMAND;
- break;
- default:
- commandFailed(st, "SQL", PQerrorMessage(st->con));
- PQclear(res);
+ char *sqlState;
+
+ command = sql_script[st->use_file].commands[st->command];
+ elog(thread, ELEVEL_DEBUG, "client %d receiving\n", st->id);
+ if (!PQconsumeInput(st->con))
+ { /* there's something wrong */
+ commandFailed(thread, st, "SQL",
+ "perhaps the backend died while processing",
+ ELEVEL_CLIENT_ABORTED);
st->state = CSTATE_ABORTED;
break;
+ }
+ if (PQisBusy(st->con))
+ return; /* don't have the whole result yet */
+
+ /*
+ * Read and discard the query result;
+ */
+ res = PQgetResult(st->con);
+ sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+ switch (PQresultStatus(res))
+ {
+ case PGRES_COMMAND_OK:
+ case PGRES_TUPLES_OK:
+ case PGRES_EMPTY_QUERY:
+ /* OK */
+ PQclear(res);
+ discard_response(st);
+ failure_status = NO_FAILURE;
+ st->state = CSTATE_END_COMMAND;
+ break;
+ case PGRES_NONFATAL_ERROR:
+ case PGRES_FATAL_ERROR:
+ failure_status = getFailureStatus(sqlState);
+ commandFailed(thread, st, "SQL",
+ PQerrorMessage(st->con),
+ ELEVEL_CLIENT_FAIL);
+ PQclear(res);
+ discard_response(st);
+ st->state = CSTATE_FAILURE;
+ break;
+ default:
+ commandFailed(thread, st, "SQL",
+ PQerrorMessage(st->con),
+ ELEVEL_CLIENT_ABORTED);
+ PQclear(res);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
}
break;
@@ -3193,7 +3768,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* in thread-local data structure, if per-command latencies
* are requested.
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -3212,6 +3787,132 @@ doCustom(TState *thread, CState *st, StatsData *agg)
break;
/*
+ * Remember the failure and go ahead with the next command.
+ */
+ case CSTATE_FAILURE:
+
+ Assert(failure_status != NO_FAILURE);
+
+ /*
+ * Whether this transaction is retried or reported as failed
+ * is decided by its first failure alone, so remember only
+ * the first failure.
+ */
+ if (st->first_failure.status == NO_FAILURE)
+ {
+ st->first_failure.status = failure_status;
+ st->first_failure.command = st->command;
+ }
+
+ /* Go ahead with the next command, to be executed or skipped */
+ st->command++;
+ st->state = conditional_active(st->cstack) ?
+ CSTATE_START_COMMAND : CSTATE_SKIP_COMMAND;
+ break;
+
+ /*
+ * Retry the failed transaction if possible.
+ */
+ case CSTATE_RETRY:
+ command = sql_script[st->use_file].commands[st->first_failure.command];
+
+ if (canRetry(st, &now))
+ {
+ /*
+ * The failed transaction will be retried, so count this
+ * retry.
+ */
+ st->retries++;
+ command->retries++;
+
+ /*
+ * Report this at the same level as failures, to indicate
+ * that the failed transaction will be retried.
+ */
+ if (errstart(thread, ELEVEL_CLIENT_FAIL))
+ {
+ errmsg(thread,
+ "client %d repeats the failed transaction (try %d",
+ st->id, st->retries + 1);
+ if (max_tries)
+ errmsg(thread, "/%d", max_tries);
+ if (latency_limit)
+ {
+ errmsg(thread,
+ ", %.3f%% of the maximum time of tries was used",
+ getLatencyUsed(st, &now));
+ }
+ errmsg(thread, ")\n");
+ errfinish(thread);
+ }
+
+ /*
+ * Reset the execution parameters as they were at the
+ * beginning of the transaction.
+ */
+ copyRandomState(&st->random_state,
+ &st->retry_state.random_state);
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* Process the first transaction command */
+ st->command = 0;
+ st->first_failure.status = NO_FAILURE;
+ st->state = CSTATE_START_COMMAND;
+ }
+ else
+ {
+ /*
+ * This failed transaction cannot be retried, so count it as
+ * an error.
+ */
+ command->errors++;
+ if (st->first_failure.status ==
+ IN_FAILED_SQL_TRANSACTION)
+ command->errors_in_failed_tx++;
+
+ /*
+ * Report this at the same level as failures, to indicate
+ * that the failed transaction will not be retried.
+ */
+ if (errstart(thread, ELEVEL_CLIENT_FAIL))
+ {
+ errmsg(thread,
+ "client %d ends the failed transaction (try %d",
+ st->id, st->retries + 1);
+
+ /*
+ * Report the actual number and/or time of tries. We
+ * do not need this information if this type of
+ * failure can never be retried.
+ */
+ if (canRetryFailure(st->first_failure.status))
+ {
+ if (max_tries)
+ errmsg(thread, "/%d", max_tries);
+ if (latency_limit)
+ {
+ errmsg(thread,
+ ", %.3f%% of the maximum time of tries was used",
+ getLatencyUsed(st, &now));
+ }
+ }
+ errmsg(thread, ")\n");
+ errfinish(thread);
+ }
+
+ /*
+ * Reset the execution parameters as they were at the
+ * beginning of the transaction, except for the random
+ * state.
+ */
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* End the failed transaction */
+ st->state = CSTATE_END_TX;
+ }
+ break;
+
+ /*
* End of transaction.
*/
case CSTATE_END_TX:
@@ -3222,7 +3923,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
/* conditional stack must be empty */
if (!conditional_stack_empty(st->cstack))
{
- fprintf(stderr, "end of script reached within a conditional, missing \\endif\n");
+ elog(thread, ELEVEL_MAIN,
+ "end of script reached within a conditional, missing \\endif\n");
exit(1);
}
@@ -3232,7 +3934,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
INSTR_TIME_SET_ZERO(now);
}
- if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
+ if ((getTotalCnt(st) >= nxacts && duration <= 0) ||
+ timer_exceeded)
{
/* exit success */
st->state = CSTATE_FINISHED;
@@ -3292,7 +3995,7 @@ doLog(TState *thread, CState *st,
* to the random sample.
*/
if (sample_rate != 0.0 &&
- pg_erand48(thread->random_state) > sample_rate)
+ pg_erand48(thread->random_state.data) > sample_rate)
return;
/* should we aggregate the results or not? */
@@ -3308,13 +4011,15 @@ doLog(TState *thread, CState *st,
while (agg->start_time + agg_interval <= now)
{
/* print aggregated report to logfile */
- fprintf(logfile, "%ld " INT64_FORMAT " %.0f %.0f %.0f %.0f",
+ fprintf(logfile, "%ld " INT64_FORMAT " %.0f %.0f %.0f %.0f " INT64_FORMAT " " INT64_FORMAT,
(long) agg->start_time,
agg->cnt,
agg->latency.sum,
agg->latency.sum2,
agg->latency.min,
- agg->latency.max);
+ agg->latency.max,
+ agg->errors,
+ agg->errors_in_failed_tx);
if (throttle_delay)
{
fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -3325,6 +4030,10 @@ doLog(TState *thread, CState *st,
if (latency_limit)
fprintf(logfile, " " INT64_FORMAT, agg->skipped);
}
+ if (max_tries > 1 || latency_limit)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->retried,
+ agg->retries);
fputc('\n', logfile);
/* reset data and move to next interval */
@@ -3332,7 +4041,8 @@ doLog(TState *thread, CState *st,
}
/* accumulate the current transaction */
- accumStats(agg, skipped, latency, lag);
+ accumStats(agg, skipped, latency, lag, st->first_failure.status,
+ st->retries);
}
else
{
@@ -3342,14 +4052,25 @@ doLog(TState *thread, CState *st,
gettimeofday(&tv, NULL);
if (skipped)
fprintf(logfile, "%d " INT64_FORMAT " skipped %d %ld %ld",
- st->id, st->cnt, st->use_file,
+ st->id, getTotalCnt(st), st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
- else
+ else if (st->first_failure.status == NO_FAILURE)
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d %ld %ld",
- st->id, st->cnt, latency, st->use_file,
+ st->id, getTotalCnt(st), latency, st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+ else if (st->first_failure.status == IN_FAILED_SQL_TRANSACTION)
+ fprintf(logfile, "%d " INT64_FORMAT " in_failed_tx %d %ld %ld",
+ st->id, getTotalCnt(st), st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
+ else
+ fprintf(logfile, "%d " INT64_FORMAT " failed %d %ld %ld",
+ st->id, getTotalCnt(st), st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+
if (throttle_delay)
fprintf(logfile, " %.0f", lag);
+ if (max_tries > 1 || latency_limit)
+ fprintf(logfile, " %d", st->retries);
fputc('\n', logfile);
}
}
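For illustration, the aggregated log line now carries the two error counters right after the latency fields (and the retried/retries pair when retries are enabled). A sketch of the base format with an invented helper name, invented values, and `%lld` standing in for INT64_FORMAT:

```c
#include <stdio.h>
#include <string.h>

/* Sketch of the extended aggregation line written by doLog(): start time,
 * transaction count, latency sum/sum2/min/max, then the new errors and
 * errors_in_failed_tx counters. The helper name and values are invented. */
static int
format_agg_line(char *buf, size_t sz, long start_time, long long cnt,
				double lat_sum, double lat_sum2, double lat_min,
				double lat_max, long long errors,
				long long errors_in_failed_tx)
{
	return snprintf(buf, sz, "%ld %lld %.0f %.0f %.0f %.0f %lld %lld",
					start_time, cnt, lat_sum, lat_sum2, lat_min, lat_max,
					errors, errors_in_failed_tx);
}
```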
@@ -3369,7 +4090,8 @@ processXactStats(TState *thread, CState *st, instr_time *now,
bool thread_details = progress || throttle_delay || latency_limit,
detailed = thread_details || use_log || per_script_stats;
- if (detailed && !skipped)
+ if (detailed && !skipped &&
+ (st->first_failure.status == NO_FAILURE || latency_limit))
{
if (INSTR_TIME_IS_ZERO(*now))
INSTR_TIME_SET_CURRENT(*now);
@@ -3382,7 +4104,8 @@ processXactStats(TState *thread, CState *st, instr_time *now,
if (thread_details)
{
/* keep detailed thread stats */
- accumStats(&thread->stats, skipped, latency, lag);
+ accumStats(&thread->stats, skipped, latency, lag,
+ st->first_failure.status, st->retries);
/* count transactions over the latency limit, if needed */
if (latency_limit && latency > latency_limit)
@@ -3390,19 +4113,24 @@ processXactStats(TState *thread, CState *st, instr_time *now,
}
else
{
- /* no detailed stats, just count */
- thread->stats.cnt++;
+ /* no detailed stats */
+ accumStats(&thread->stats, skipped, 0, 0, st->first_failure.status,
+ st->retries);
}
/* client stat is just counting */
- st->cnt++;
+ if (st->first_failure.status == NO_FAILURE)
+ st->cnt++;
+ else
+ st->ecnt++;
if (use_log)
doLog(thread, st, agg, skipped, latency, lag);
/* XXX could use a mutex here, but we choose not to */
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+ st->first_failure.status, st->retries);
}
@@ -3422,7 +4150,7 @@ disconnect_all(CState *state, int length)
static void
initDropTables(PGconn *con)
{
- fprintf(stderr, "dropping old tables...\n");
+ elog(NULL, ELEVEL_MAIN, "dropping old tables...\n");
/*
* We drop all the tables in one command, so that whether there are
@@ -3497,7 +4225,7 @@ initCreateTables(PGconn *con)
};
int i;
- fprintf(stderr, "creating tables...\n");
+ elog(NULL, ELEVEL_MAIN, "creating tables...\n");
for (i = 0; i < lengthof(DDLs); i++)
{
@@ -3550,7 +4278,7 @@ initGenerateData(PGconn *con)
remaining_sec;
int log_interval = 1;
- fprintf(stderr, "generating data...\n");
+ elog(NULL, ELEVEL_MAIN, "generating data...\n");
/*
* we do all of this in one transaction to enable the backend's
@@ -3596,7 +4324,7 @@ initGenerateData(PGconn *con)
res = PQexec(con, "copy pgbench_accounts from stdin");
if (PQresultStatus(res) != PGRES_COPY_IN)
{
- fprintf(stderr, "%s", PQerrorMessage(con));
+ elog(NULL, ELEVEL_MAIN, "%s", PQerrorMessage(con));
exit(1);
}
PQclear(res);
@@ -3613,7 +4341,7 @@ initGenerateData(PGconn *con)
j, k / naccounts + 1, 0);
if (PQputline(con, sql))
{
- fprintf(stderr, "PQputline failed\n");
+ elog(NULL, ELEVEL_MAIN, "PQputline failed\n");
exit(1);
}
@@ -3629,10 +4357,11 @@ initGenerateData(PGconn *con)
elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
remaining_sec = ((double) scale * naccounts - j) * elapsed_sec / j;
- fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n",
- j, (int64) naccounts * scale,
- (int) (((int64) j * 100) / (naccounts * (int64) scale)),
- elapsed_sec, remaining_sec);
+ elog(NULL, ELEVEL_MAIN,
+ INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n",
+ j, (int64) naccounts * scale,
+ (int) (((int64) j * 100) / (naccounts * (int64) scale)),
+ elapsed_sec, remaining_sec);
}
/* let's not call the timing for each row, but only each 100 rows */
else if (use_quiet && (j % 100 == 0))
@@ -3646,9 +4375,10 @@ initGenerateData(PGconn *con)
/* have we reached the next interval (or end)? */
if ((j == scale * naccounts) || (elapsed_sec >= log_interval * LOG_STEP_SECONDS))
{
- fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n",
- j, (int64) naccounts * scale,
- (int) (((int64) j * 100) / (naccounts * (int64) scale)), elapsed_sec, remaining_sec);
+ elog(NULL, ELEVEL_MAIN,
+ INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n",
+ j, (int64) naccounts * scale,
+ (int) (((int64) j * 100) / (naccounts * (int64) scale)), elapsed_sec, remaining_sec);
/* skip to the next interval */
log_interval = (int) ceil(elapsed_sec / LOG_STEP_SECONDS);
@@ -3658,12 +4388,12 @@ initGenerateData(PGconn *con)
}
if (PQputline(con, "\\.\n"))
{
- fprintf(stderr, "very last PQputline failed\n");
+ elog(NULL, ELEVEL_MAIN, "very last PQputline failed\n");
exit(1);
}
if (PQendcopy(con))
{
- fprintf(stderr, "PQendcopy failed\n");
+ elog(NULL, ELEVEL_MAIN, "PQendcopy failed\n");
exit(1);
}
@@ -3676,7 +4406,7 @@ initGenerateData(PGconn *con)
static void
initVacuum(PGconn *con)
{
- fprintf(stderr, "vacuuming...\n");
+ elog(NULL, ELEVEL_MAIN, "vacuuming...\n");
executeStatement(con, "vacuum analyze pgbench_branches");
executeStatement(con, "vacuum analyze pgbench_tellers");
executeStatement(con, "vacuum analyze pgbench_accounts");
@@ -3696,7 +4426,7 @@ initCreatePKeys(PGconn *con)
};
int i;
- fprintf(stderr, "creating primary keys...\n");
+ elog(NULL, ELEVEL_MAIN, "creating primary keys...\n");
for (i = 0; i < lengthof(DDLINDEXes); i++)
{
char buffer[256];
@@ -3733,7 +4463,7 @@ initCreateFKeys(PGconn *con)
};
int i;
- fprintf(stderr, "creating foreign keys...\n");
+ elog(NULL, ELEVEL_MAIN, "creating foreign keys...\n");
for (i = 0; i < lengthof(DDLKEYs); i++)
{
executeStatement(con, DDLKEYs[i]);
@@ -3754,7 +4484,7 @@ checkInitSteps(const char *initialize_steps)
if (initialize_steps[0] == '\0')
{
- fprintf(stderr, "no initialization steps specified\n");
+ elog(NULL, ELEVEL_MAIN, "no initialization steps specified\n");
exit(1);
}
@@ -3762,9 +4492,13 @@ checkInitSteps(const char *initialize_steps)
{
if (strchr("dtgvpf ", *step) == NULL)
{
- fprintf(stderr, "unrecognized initialization step \"%c\"\n",
- *step);
- fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n");
+ if (errstart(NULL, ELEVEL_MAIN))
+ {
+ errmsg(NULL, "unrecognized initialization step \"%c\"\n",
+ *step);
+ errmsg(NULL, "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n");
+ errfinish(NULL);
+ }
exit(1);
}
}
@@ -3779,7 +4513,7 @@ runInitSteps(const char *initialize_steps)
PGconn *con;
const char *step;
- if ((con = doConnect()) == NULL)
+ if ((con = doConnect(NULL)) == NULL)
exit(1);
for (step = initialize_steps; *step != '\0'; step++)
@@ -3807,14 +4541,14 @@ runInitSteps(const char *initialize_steps)
case ' ':
break; /* ignore */
default:
- fprintf(stderr, "unrecognized initialization step \"%c\"\n",
- *step);
+ elog(NULL, ELEVEL_MAIN,
+ "unrecognized initialization step \"%c\"\n", *step);
PQfinish(con);
exit(1);
}
}
- fprintf(stderr, "done.\n");
+ elog(NULL, ELEVEL_MAIN, "done.\n");
PQfinish(con);
}
@@ -3852,8 +4586,9 @@ parseQuery(Command *cmd)
if (cmd->argc >= MAX_ARGS)
{
- fprintf(stderr, "statement has too many arguments (maximum is %d): %s\n",
- MAX_ARGS - 1, cmd->argv[0]);
+ elog(NULL, ELEVEL_MAIN,
+ "statement has too many arguments (maximum is %d): %s\n",
+ MAX_ARGS - 1, cmd->argv[0]);
pg_free(name);
return false;
}
@@ -3879,9 +4614,14 @@ pgbench_error(const char *fmt,...)
va_list ap;
fflush(stdout);
- va_start(ap, fmt);
- vfprintf(stderr, _(fmt), ap);
- va_end(ap);
+ if (errstart(NULL, ELEVEL_MAIN))
+ {
+ va_start(ap, fmt);
+ errmsg_internal(NULL, fmt, &ap);
+ va_end(ap);
+
+ errfinish(NULL);
+ }
}
/*
@@ -3901,25 +4641,29 @@ syntax_error(const char *source, int lineno,
const char *line, const char *command,
const char *msg, const char *more, int column)
{
- fprintf(stderr, "%s:%d: %s", source, lineno, msg);
- if (more != NULL)
- fprintf(stderr, " (%s)", more);
- if (column >= 0 && line == NULL)
- fprintf(stderr, " at column %d", column + 1);
- if (command != NULL)
- fprintf(stderr, " in command \"%s\"", command);
- fprintf(stderr, "\n");
- if (line != NULL)
- {
- fprintf(stderr, "%s\n", line);
- if (column >= 0)
+ if (errstart(NULL, ELEVEL_MAIN))
+ {
+ errmsg(NULL, "%s:%d: %s", source, lineno, msg);
+ if (more != NULL)
+ errmsg(NULL, " (%s)", more);
+ if (column >= 0 && line == NULL)
+ errmsg(NULL, " at column %d", column + 1);
+ if (command != NULL)
+ errmsg(NULL, " in command \"%s\"", command);
+ errmsg(NULL, "\n");
+ if (line != NULL)
{
- int i;
+ errmsg(NULL, "%s\n", line);
+ if (column >= 0)
+ {
+ int i;
- for (i = 0; i < column; i++)
- fprintf(stderr, " ");
- fprintf(stderr, "^ error found here\n");
+ for (i = 0; i < column; i++)
+ errmsg(NULL, " ");
+ errmsg(NULL, "^ error found here\n");
+ }
}
+ errfinish(NULL);
}
exit(1);
}
@@ -4170,9 +4914,8 @@ process_backslash_command(PsqlScanState sstate, const char *source)
static void
ConditionError(const char *desc, int cmdn, const char *msg)
{
- fprintf(stderr,
- "condition error in script \"%s\" command %d: %s\n",
- desc, cmdn, msg);
+ elog(NULL, ELEVEL_MAIN,
+ "condition error in script \"%s\" command %d: %s\n", desc, cmdn, msg);
exit(1);
}
@@ -4370,8 +5113,8 @@ process_file(const char *filename, int weight)
fd = stdin;
else if ((fd = fopen(filename, "r")) == NULL)
{
- fprintf(stderr, "could not open file \"%s\": %s\n",
- filename, strerror(errno));
+ elog(NULL, ELEVEL_MAIN,
+ "could not open file \"%s\": %s\n", filename, strerror(errno));
exit(1);
}
@@ -4379,8 +5122,8 @@ process_file(const char *filename, int weight)
if (ferror(fd))
{
- fprintf(stderr, "could not read file \"%s\": %s\n",
- filename, strerror(errno));
+ elog(NULL, ELEVEL_MAIN,
+ "could not read file \"%s\": %s\n", filename, strerror(errno));
exit(1);
}
@@ -4405,10 +5148,14 @@ listAvailableScripts(void)
{
int i;
- fprintf(stderr, "Available builtin scripts:\n");
- for (i = 0; i < lengthof(builtin_script); i++)
- fprintf(stderr, "\t%s\n", builtin_script[i].name);
- fprintf(stderr, "\n");
+ if (errstart(NULL, ELEVEL_MAIN))
+ {
+ errmsg(NULL, "Available builtin scripts:\n");
+ for (i = 0; i < lengthof(builtin_script); i++)
+ errmsg(NULL, "\t%s\n", builtin_script[i].name);
+ errmsg(NULL, "\n");
+ errfinish(NULL);
+ }
}
/* return builtin script "name" if unambiguous, fails if not found */
@@ -4435,10 +5182,16 @@ findBuiltin(const char *name)
/* error cases */
if (found == 0)
- fprintf(stderr, "no builtin script found for name \"%s\"\n", name);
- else /* found > 1 */
- fprintf(stderr,
- "ambiguous builtin name: %d builtin scripts found for prefix \"%s\"\n", found, name);
+ {
+ elog(NULL, ELEVEL_MAIN,
+ "no builtin script found for name \"%s\"\n", name);
+ }
+ else
+ { /* found > 1 */
+ elog(NULL, ELEVEL_MAIN,
+ "ambiguous builtin name: %d builtin scripts found for prefix \"%s\"\n",
+ found, name);
+ }
listAvailableScripts();
exit(1);
@@ -4471,14 +5224,14 @@ parseScriptWeight(const char *option, char **script)
wtmp = strtol(sep + 1, &badp, 10);
if (errno != 0 || badp == sep + 1 || *badp != '\0')
{
- fprintf(stderr, "invalid weight specification: %s\n", sep);
+ elog(NULL, ELEVEL_MAIN, "invalid weight specification: %s\n", sep);
exit(1);
}
if (wtmp > INT_MAX || wtmp < 0)
{
- fprintf(stderr,
- "weight specification out of range (0 .. %u): " INT64_FORMAT "\n",
- INT_MAX, (int64) wtmp);
+ elog(NULL, ELEVEL_MAIN,
+ "weight specification out of range (0 .. %u): " INT64_FORMAT "\n",
+ INT_MAX, (int64) wtmp);
exit(1);
}
weight = wtmp;
@@ -4498,13 +5251,15 @@ addScript(ParsedScript script)
{
if (script.commands == NULL || script.commands[0] == NULL)
{
- fprintf(stderr, "empty command list for script \"%s\"\n", script.desc);
+ elog(NULL, ELEVEL_MAIN,
+ "empty command list for script \"%s\"\n", script.desc);
exit(1);
}
if (num_scripts >= MAX_SCRIPTS)
{
- fprintf(stderr, "at most %d SQL scripts are allowed\n", MAX_SCRIPTS);
+ elog(NULL, ELEVEL_MAIN,
+ "at most %d SQL scripts are allowed\n", MAX_SCRIPTS);
exit(1);
}
@@ -4535,7 +5290,8 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
double time_include,
tps_include,
tps_exclude;
- int64 ntx = total->cnt - total->skipped;
+ int64 ntx = total->cnt - total->skipped,
+ total_ntx = total->cnt + total->errors;
int i,
totalCacheOverflows = 0;
@@ -4556,8 +5312,8 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (duration <= 0)
{
printf("number of transactions per client: %d\n", nxacts);
- printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
- ntx, nxacts * nclients);
+ printf("number of transactions actually processed: " INT64_FORMAT "/" INT64_FORMAT "\n",
+ ntx, total_ntx);
}
else
{
@@ -4565,6 +5321,43 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printf("number of transactions actually processed: " INT64_FORMAT "\n",
ntx);
}
+
+ if (total->errors > 0)
+ printf("number of errors: " INT64_FORMAT " (%.3f%%)\n",
+ total->errors, 100.0 * total->errors / total_ntx);
+
+ if (total->errors_in_failed_tx > 0)
+ printf("number of errors \"in failed SQL transaction\": " INT64_FORMAT " (%.3f%%)\n",
+ total->errors_in_failed_tx,
+ 100.0 * total->errors_in_failed_tx / total_ntx);
+
+ /*
+ * It can be non-zero only if max_tries is greater than one or
+ * latency_limit is used.
+ */
+ if (total->retried > 0)
+ {
+ printf("number of retried: " INT64_FORMAT " (%.3f%%)\n",
+ total->retried, 100.0 * total->retried / total_ntx);
+ printf("number of retries: " INT64_FORMAT "\n", total->retries);
+ }
+
+ if (max_tries)
+ printf("maximum number of tries: %d\n", max_tries);
+
+ if (latency_limit)
+ {
+ printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "/" INT64_FORMAT " (%.3f %%)",
+ latency_limit / 1000.0, latency_late, total_ntx,
+ (total_ntx > 0) ? 100.0 * latency_late / total_ntx : 0.0);
+
+ /* these statistics include both successful and failed transactions */
+ if (total->errors > 0)
+ printf(" (including errors)");
+
+ printf("\n");
+ }
+
/* Report zipfian cache overflow */
for (i = 0; i < nthreads; i++)
{
@@ -4584,18 +5377,19 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
total->skipped,
100.0 * total->skipped / total->cnt);
- if (latency_limit)
- printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "/" INT64_FORMAT " (%.3f %%)\n",
- latency_limit / 1000.0, latency_late, ntx,
- (ntx > 0) ? 100.0 * latency_late / ntx : 0.0);
-
if (throttle_delay || progress || latency_limit)
printSimpleStats("latency", &total->latency);
else
{
/* no measurement, show average latency computed from run time */
- printf("latency average = %.3f ms\n",
- 1000.0 * time_include * nclients / total->cnt);
+ printf("latency average = %.3f ms",
+ 1000.0 * time_include * nclients / total_ntx);
+
+ /* these statistics include both successful and failed transactions */
+ if (total->errors > 0)
+ printf(" (including errors)");
+
+ printf("\n");
}
if (throttle_delay)
@@ -4614,7 +5408,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printf("tps = %f (excluding connections establishing)\n", tps_exclude);
/* Report per-script/command statistics */
- if (per_script_stats || is_latencies)
+ if (per_script_stats || report_per_command)
{
int i;
@@ -4623,6 +5417,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (per_script_stats)
{
StatsData *sstats = &sql_script[i].stats;
+ int64 script_total_ntx = sstats->cnt + sstats->errors;
printf("SQL script %d: %s\n"
" - weight: %d (targets %.1f%% of total)\n"
@@ -4631,9 +5426,33 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
sql_script[i].weight,
100.0 * sql_script[i].weight / total_weight,
sstats->cnt,
- 100.0 * sstats->cnt / total->cnt,
+ 100.0 * sstats->cnt / script_total_ntx,
(sstats->cnt - sstats->skipped) / time_include);
+ if (total->errors > 0)
+ printf(" - number of errors: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->errors,
+ 100.0 * sstats->errors / script_total_ntx);
+
+ if (total->errors_in_failed_tx > 0)
+ printf(" - number of errors \"in failed SQL transaction\": " INT64_FORMAT " (%.3f%%)\n",
+ sstats->errors_in_failed_tx,
+ (100.0 * sstats->errors_in_failed_tx /
+ script_total_ntx));
+
+ /*
+ * It can be non-zero only if max_tries is greater than one or
+ * latency_limit is used.
+ */
+ if (total->retried > 0)
+ {
+ printf(" - number of retried: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->retried,
+ 100.0 * sstats->retried / script_total_ntx);
+ printf(" - number of retries: " INT64_FORMAT "\n",
+ sstats->retries);
+ }
+
if (throttle_delay && latency_limit && sstats->cnt > 0)
printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
sstats->skipped,
@@ -4642,15 +5461,33 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printSimpleStats(" - latency", &sstats->latency);
}
- /* Report per-command latencies */
- if (is_latencies)
+ /* Report per-command latencies and errors */
+ if (report_per_command)
{
Command **commands;
if (per_script_stats)
- printf(" - statement latencies in milliseconds:\n");
+ printf(" - statement latencies in milliseconds");
else
- printf("statement latencies in milliseconds:\n");
+ printf("statement latencies in milliseconds");
+
+ if (total->errors > 0)
+ {
+ printf("%s errors",
+ ((total->errors_in_failed_tx == 0 &&
+ total->retried == 0) ?
+ " and" : ","));
+ }
+ if (total->errors_in_failed_tx > 0)
+ {
+ printf("%s errors \"in failed SQL transaction\"",
+ total->retried == 0 ? " and" : ",");
+ }
+ if (total->retried > 0)
+ {
+ printf(" and retries");
+ }
+ printf(":\n");
for (commands = sql_script[i].commands;
*commands != NULL;
@@ -4658,10 +5495,25 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
{
SimpleStats *cstats = &(*commands)->stats;
- printf(" %11.3f %s\n",
+ printf(" %11.3f",
(cstats->count > 0) ?
- 1000.0 * cstats->sum / cstats->count : 0.0,
- (*commands)->line);
+ 1000.0 * cstats->sum / cstats->count : 0.0);
+ if (total->errors > 0)
+ {
+ printf(" %20" INT64_MODIFIER "d",
+ (*commands)->errors);
+ }
+ if (total->errors_in_failed_tx > 0)
+ {
+ printf(" %20" INT64_MODIFIER "d",
+ (*commands)->errors_in_failed_tx);
+ }
+ if (total->retried > 0)
+ {
+ printf(" %20" INT64_MODIFIER "d",
+ (*commands)->retries);
+ }
+ printf(" %s\n", (*commands)->line);
}
}
}
@@ -4689,9 +5541,9 @@ set_random_seed(const char *seed)
if (!pg_strong_random(&iseed, sizeof(iseed)))
#endif
{
- fprintf(stderr,
- "cannot seed random from a strong source, none available: "
- "use \"time\" or an unsigned integer value.\n");
+ elog(NULL, ELEVEL_MAIN,
+ "cannot seed random from a strong source, none available: "
+ "use \"time\" or an unsigned integer value.\n");
return false;
}
}
@@ -4701,21 +5553,32 @@ set_random_seed(const char *seed)
char garbage;
if (sscanf(seed, "%u%c", &iseed, &garbage) != 1)
{
- fprintf(stderr,
- "unrecognized random seed option \"%s\": expecting an unsigned integer, \"time\" or \"rand\"\n",
- seed);
+ elog(NULL, ELEVEL_MAIN,
+ "unrecognized random seed option \"%s\": expecting an unsigned integer, \"time\" or \"rand\"\n",
+ seed);
return false;
}
}
if (seed != NULL)
- fprintf(stderr, "setting random seed to %u\n", iseed);
+ elog(NULL, ELEVEL_MAIN, "setting random seed to %u\n", iseed);
srandom(iseed);
/* no precision loss: 32 bit unsigned int cast to 64 bit int */
random_seed = iseed;
return true;
}
+/*
+ * Initialize the random state of the client/thread.
+ */
+static void
+initRandomState(RandomState *random_state)
+{
+ random_state->data[0] = random();
+ random_state->data[1] = random();
+ random_state->data[2] = random();
+}
+
int
main(int argc, char **argv)
@@ -4725,7 +5588,7 @@ main(int argc, char **argv)
{"builtin", required_argument, NULL, 'b'},
{"client", required_argument, NULL, 'c'},
{"connect", no_argument, NULL, 'C'},
- {"debug", no_argument, NULL, 'd'},
+ {"debug", required_argument, NULL, 'd'},
{"define", required_argument, NULL, 'D'},
{"file", required_argument, NULL, 'f'},
{"fillfactor", required_argument, NULL, 'F'},
@@ -4740,7 +5603,7 @@ main(int argc, char **argv)
{"progress", required_argument, NULL, 'P'},
{"protocol", required_argument, NULL, 'M'},
{"quiet", no_argument, NULL, 'q'},
- {"report-latencies", no_argument, NULL, 'r'},
+ {"report-per-command", no_argument, NULL, 'r'},
{"rate", required_argument, NULL, 'R'},
{"scale", required_argument, NULL, 's'},
{"select-only", no_argument, NULL, 'S'},
@@ -4759,6 +5622,7 @@ main(int argc, char **argv)
{"log-prefix", required_argument, NULL, 7},
{"foreign-keys", no_argument, NULL, 8},
{"random-seed", required_argument, NULL, 9},
+ {"max-tries", required_argument, NULL, 10},
{NULL, 0, NULL, 0}
};
@@ -4827,14 +5691,21 @@ main(int argc, char **argv)
state = (CState *) pg_malloc(sizeof(CState));
memset(state, 0, sizeof(CState));
+ /*
+ * Set the main error state early, because it may be used while setting
+ * parameters from environment variables or from the command line.
+ */
+ memset(&main_error_state, 0, sizeof(ErrorState));
+
/* set random seed early, because it may be used while parsing scripts. */
if (!set_random_seed(getenv("PGBENCH_RANDOM_SEED")))
{
- fprintf(stderr, "error while setting random seed from PGBENCH_RANDOM_SEED environment variable\n");
+ elog(NULL, ELEVEL_MAIN,
+ "error while setting random seed from PGBENCH_RANDOM_SEED environment variable\n");
exit(1);
}
- while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
+ while ((c = getopt_long(argc, argv, "iI:h:nvp:d:qb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
{
char *script;
@@ -4864,15 +5735,29 @@ main(int argc, char **argv)
pgport = pg_strdup(optarg);
break;
case 'd':
- debug++;
- break;
+ {
+ for (debug_level = 0;
+ debug_level < NUM_DEBUGLEVEL;
+ debug_level++)
+ {
+ if (strcmp(optarg, DEBUGLEVEL[debug_level]) == 0)
+ break;
+ }
+ if (debug_level >= NUM_DEBUGLEVEL)
+ {
+ elog(NULL, ELEVEL_MAIN,
+ "invalid debug level (-d): \"%s\"\n", optarg);
+ exit(1);
+ }
+ break;
+ }
case 'c':
benchmarking_option_set = true;
nclients = atoi(optarg);
if (nclients <= 0 || nclients > MAXCLIENTS)
{
- fprintf(stderr, "invalid number of clients: \"%s\"\n",
- optarg);
+ elog(NULL, ELEVEL_MAIN,
+ "invalid number of clients: \"%s\"\n", optarg);
exit(1);
}
#ifdef HAVE_GETRLIMIT
@@ -4882,14 +5767,19 @@ main(int argc, char **argv)
if (getrlimit(RLIMIT_OFILE, &rlim) == -1)
#endif /* RLIMIT_NOFILE */
{
- fprintf(stderr, "getrlimit failed: %s\n", strerror(errno));
+ elog(NULL, ELEVEL_MAIN,
+ "getrlimit failed: %s\n", strerror(errno));
exit(1);
}
if (rlim.rlim_cur < nclients + 3)
{
- fprintf(stderr, "need at least %d open files, but system limit is %ld\n",
- nclients + 3, (long) rlim.rlim_cur);
- fprintf(stderr, "Reduce number of clients, or use limit/ulimit to increase the system limit.\n");
+ if (errstart(NULL, ELEVEL_MAIN))
+ {
+ errmsg(NULL, "need at least %d open files, but system limit is %ld\n",
+ nclients + 3, (long) rlim.rlim_cur);
+ errmsg(NULL, "Reduce number of clients, or use limit/ulimit to increase the system limit.\n");
+ errfinish(NULL);
+ }
exit(1);
}
#endif /* HAVE_GETRLIMIT */
@@ -4899,14 +5789,15 @@ main(int argc, char **argv)
nthreads = atoi(optarg);
if (nthreads <= 0)
{
- fprintf(stderr, "invalid number of threads: \"%s\"\n",
- optarg);
+ elog(NULL, ELEVEL_MAIN,
+ "invalid number of threads: \"%s\"\n", optarg);
exit(1);
}
#ifndef ENABLE_THREAD_SAFETY
if (nthreads != 1)
{
- fprintf(stderr, "threads are not supported on this platform; use -j1\n");
+ elog(NULL, ELEVEL_MAIN,
+ "threads are not supported on this platform; use -j1\n");
exit(1);
}
#endif /* !ENABLE_THREAD_SAFETY */
@@ -4917,14 +5808,15 @@ main(int argc, char **argv)
break;
case 'r':
benchmarking_option_set = true;
- is_latencies = true;
+ report_per_command = true;
break;
case 's':
scale_given = true;
scale = atoi(optarg);
if (scale <= 0)
{
- fprintf(stderr, "invalid scaling factor: \"%s\"\n", optarg);
+ elog(NULL, ELEVEL_MAIN,
+ "invalid scaling factor: \"%s\"\n", optarg);
exit(1);
}
break;
@@ -4933,8 +5825,8 @@ main(int argc, char **argv)
nxacts = atoi(optarg);
if (nxacts <= 0)
{
- fprintf(stderr, "invalid number of transactions: \"%s\"\n",
- optarg);
+ elog(NULL, ELEVEL_MAIN,
+ "invalid number of transactions: \"%s\"\n", optarg);
exit(1);
}
break;
@@ -4943,7 +5835,8 @@ main(int argc, char **argv)
duration = atoi(optarg);
if (duration <= 0)
{
- fprintf(stderr, "invalid duration: \"%s\"\n", optarg);
+ elog(NULL, ELEVEL_MAIN,
+ "invalid duration: \"%s\"\n", optarg);
exit(1);
}
break;
@@ -4992,13 +5885,13 @@ main(int argc, char **argv)
if ((p = strchr(optarg, '=')) == NULL || p == optarg || *(p + 1) == '\0')
{
- fprintf(stderr, "invalid variable definition: \"%s\"\n",
- optarg);
+ elog(NULL, ELEVEL_MAIN,
+ "invalid variable definition: \"%s\"\n", optarg);
exit(1);
}
*p++ = '\0';
- if (!putVariable(&state[0], "option", optarg, p))
+ if (!putVariable(&state[0].variables, "option", optarg, p))
exit(1);
}
break;
@@ -5007,7 +5900,8 @@ main(int argc, char **argv)
fillfactor = atoi(optarg);
if (fillfactor < 10 || fillfactor > 100)
{
- fprintf(stderr, "invalid fillfactor: \"%s\"\n", optarg);
+ elog(NULL, ELEVEL_MAIN,
+ "invalid fillfactor: \"%s\"\n", optarg);
exit(1);
}
break;
@@ -5018,8 +5912,8 @@ main(int argc, char **argv)
break;
if (querymode >= NUM_QUERYMODE)
{
- fprintf(stderr, "invalid query mode (-M): \"%s\"\n",
- optarg);
+ elog(NULL, ELEVEL_MAIN,
+ "invalid query mode (-M): \"%s\"\n", optarg);
exit(1);
}
break;
@@ -5028,8 +5922,8 @@ main(int argc, char **argv)
progress = atoi(optarg);
if (progress <= 0)
{
- fprintf(stderr, "invalid thread progress delay: \"%s\"\n",
- optarg);
+ elog(NULL, ELEVEL_MAIN,
+ "invalid thread progress delay: \"%s\"\n", optarg);
exit(1);
}
break;
@@ -5042,7 +5936,8 @@ main(int argc, char **argv)
if (throttle_value <= 0.0)
{
- fprintf(stderr, "invalid rate limit: \"%s\"\n", optarg);
+ elog(NULL, ELEVEL_MAIN,
+ "invalid rate limit: \"%s\"\n", optarg);
exit(1);
}
/* Invert rate limit into a time offset */
@@ -5055,8 +5950,8 @@ main(int argc, char **argv)
if (limit_ms <= 0.0)
{
- fprintf(stderr, "invalid latency limit: \"%s\"\n",
- optarg);
+ elog(NULL, ELEVEL_MAIN,
+ "invalid latency limit: \"%s\"\n", optarg);
exit(1);
}
benchmarking_option_set = true;
@@ -5080,7 +5975,8 @@ main(int argc, char **argv)
sample_rate = atof(optarg);
if (sample_rate <= 0.0 || sample_rate > 1.0)
{
- fprintf(stderr, "invalid sampling rate: \"%s\"\n", optarg);
+ elog(NULL, ELEVEL_MAIN,
+ "invalid sampling rate: \"%s\"\n", optarg);
exit(1);
}
break;
@@ -5089,8 +5985,9 @@ main(int argc, char **argv)
agg_interval = atoi(optarg);
if (agg_interval <= 0)
{
- fprintf(stderr, "invalid number of seconds for aggregation: \"%s\"\n",
- optarg);
+ elog(NULL, ELEVEL_MAIN,
+ "invalid number of seconds for aggregation: \"%s\"\n",
+ optarg);
exit(1);
}
break;
@@ -5110,12 +6007,28 @@ main(int argc, char **argv)
benchmarking_option_set = true;
if (!set_random_seed(optarg))
{
- fprintf(stderr, "error while setting random seed from --random-seed option\n");
+ elog(NULL, ELEVEL_MAIN,
+ "error while setting random seed from --random-seed option\n");
exit(1);
}
break;
+ case 10: /* max-tries */
+ {
+ int32 max_tries_arg = atoi(optarg);
+
+ if (max_tries_arg <= 0)
+ {
+ elog(NULL, ELEVEL_MAIN,
+ "invalid number of maximum tries: \"%s\"\n", optarg);
+ exit(1);
+ }
+ benchmarking_option_set = true;
+ max_tries = (uint32) max_tries_arg;
+ }
+ break;
default:
- fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+ elog(NULL, ELEVEL_MAIN,
+ _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
break;
}
@@ -5154,7 +6067,7 @@ main(int argc, char **argv)
if (total_weight == 0 && !is_init_mode)
{
- fprintf(stderr, "total script weight must not be zero\n");
+ elog(NULL, ELEVEL_MAIN, "total script weight must not be zero\n");
exit(1);
}
@@ -5189,7 +6102,8 @@ main(int argc, char **argv)
{
if (benchmarking_option_set)
{
- fprintf(stderr, "some of the specified options cannot be used in initialization (-i) mode\n");
+ elog(NULL, ELEVEL_MAIN,
+ "some of the specified options cannot be used in initialization (-i) mode\n");
exit(1);
}
@@ -5224,14 +6138,16 @@ main(int argc, char **argv)
{
if (initialization_option_set)
{
- fprintf(stderr, "some of the specified options cannot be used in benchmarking mode\n");
+ elog(NULL, ELEVEL_MAIN,
+ "some of the specified options cannot be used in benchmarking mode\n");
exit(1);
}
}
if (nxacts > 0 && duration > 0)
{
- fprintf(stderr, "specify either a number of transactions (-t) or a duration (-T), not both\n");
+ elog(NULL, ELEVEL_MAIN,
+ "specify either a number of transactions (-t) or a duration (-T), not both\n");
exit(1);
}
@@ -5242,47 +6158,60 @@ main(int argc, char **argv)
/* --sampling-rate may be used only with -l */
if (sample_rate > 0.0 && !use_log)
{
- fprintf(stderr, "log sampling (--sampling-rate) is allowed only when logging transactions (-l)\n");
+ elog(NULL, ELEVEL_MAIN,
+ "log sampling (--sampling-rate) is allowed only when logging transactions (-l)\n");
exit(1);
}
/* --sampling-rate may not be used with --aggregate-interval */
if (sample_rate > 0.0 && agg_interval > 0)
{
- fprintf(stderr, "log sampling (--sampling-rate) and aggregation (--aggregate-interval) cannot be used at the same time\n");
+ elog(NULL, ELEVEL_MAIN,
+ "log sampling (--sampling-rate) and aggregation (--aggregate-interval) cannot be used at the same time\n");
exit(1);
}
if (agg_interval > 0 && !use_log)
{
- fprintf(stderr, "log aggregation is allowed only when actually logging transactions\n");
+ elog(NULL, ELEVEL_MAIN,
+ "log aggregation is allowed only when actually logging transactions\n");
exit(1);
}
if (!use_log && logfile_prefix)
{
- fprintf(stderr, "log file prefix (--log-prefix) is allowed only when logging transactions (-l)\n");
+ elog(NULL, ELEVEL_MAIN,
+ "log file prefix (--log-prefix) is allowed only when logging transactions (-l)\n");
exit(1);
}
if (duration > 0 && agg_interval > duration)
{
- fprintf(stderr, "number of seconds for aggregation (%d) must not be higher than test duration (%d)\n", agg_interval, duration);
+ elog(NULL, ELEVEL_MAIN,
+ "number of seconds for aggregation (%d) must not be higher than test duration (%d)\n",
+ agg_interval, duration);
exit(1);
}
if (duration > 0 && agg_interval > 0 && duration % agg_interval != 0)
{
- fprintf(stderr, "duration (%d) must be a multiple of aggregation interval (%d)\n", duration, agg_interval);
+ elog(NULL, ELEVEL_MAIN,
+ "duration (%d) must be a multiple of aggregation interval (%d)\n",
+ duration, agg_interval);
exit(1);
}
if (progress_timestamp && progress == 0)
{
- fprintf(stderr, "--progress-timestamp is allowed only under --progress\n");
+ elog(NULL, ELEVEL_MAIN,
+ "--progress-timestamp is allowed only under --progress\n");
exit(1);
}
+ /* If necessary set the default tries limit */
+ if (!max_tries && !latency_limit)
+ max_tries = 1;
+
/*
* save main process id in the global variable because process id will be
* changed after fork.
@@ -5300,19 +6229,19 @@ main(int argc, char **argv)
int j;
state[i].id = i;
- for (j = 0; j < state[0].nvariables; j++)
+ for (j = 0; j < state[0].variables.nvariables; j++)
{
- Variable *var = &state[0].variables[j];
+ Variable *var = &state[0].variables.array[j];
if (var->value.type != PGBT_NO_VALUE)
{
- if (!putVariableValue(&state[i], "startup",
- var->name, &var->value))
+ if (!putVariableValue(NULL, &state[i].variables, "startup",
+ var->name, &var->value))
exit(1);
}
else
{
- if (!putVariable(&state[i], "startup",
+ if (!putVariable(&state[i].variables, "startup",
var->name, var->svalue))
exit(1);
}
@@ -5324,27 +6253,31 @@ main(int argc, char **argv)
for (i = 0; i < nclients; i++)
{
state[i].cstack = conditional_stack_create();
+ initRandomState(&state[i].random_state);
}
- if (debug)
- {
- if (duration <= 0)
- printf("pghost: %s pgport: %s nclients: %d nxacts: %d dbName: %s\n",
- pghost, pgport, nclients, nxacts, dbName);
- else
- printf("pghost: %s pgport: %s nclients: %d duration: %d dbName: %s\n",
- pghost, pgport, nclients, duration, dbName);
- }
+ if (duration <= 0)
+ elog(NULL, ELEVEL_DEBUG,
+ "pghost: %s pgport: %s nclients: %d nxacts: %d dbName: %s\n",
+ pghost, pgport, nclients, nxacts, dbName);
+ else
+ elog(NULL, ELEVEL_DEBUG,
+ "pghost: %s pgport: %s nclients: %d duration: %d dbName: %s\n",
+ pghost, pgport, nclients, duration, dbName);
/* opening connection... */
- con = doConnect();
+ con = doConnect(NULL);
if (con == NULL)
exit(1);
if (PQstatus(con) == CONNECTION_BAD)
{
- fprintf(stderr, "connection to database \"%s\" failed\n", dbName);
- fprintf(stderr, "%s", PQerrorMessage(con));
+ if (errstart(NULL, ELEVEL_MAIN))
+ {
+ errmsg(NULL, "connection to database \"%s\" failed\n", dbName);
+ errmsg(NULL, "%s", PQerrorMessage(con));
+ errfinish(NULL);
+ }
exit(1);
}
@@ -5359,10 +6292,15 @@ main(int argc, char **argv)
{
char *sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
- fprintf(stderr, "%s", PQerrorMessage(con));
- if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) == 0)
+ if (errstart(NULL, ELEVEL_MAIN))
{
- fprintf(stderr, "Perhaps you need to do initialization (\"pgbench -i\") in database \"%s\"\n", PQdb(con));
+ errmsg(NULL, "%s", PQerrorMessage(con));
+ if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) == 0)
+ {
+ errmsg(NULL, "Perhaps you need to do initialization (\"pgbench -i\") in database \"%s\"\n",
+ PQdb(con));
+ }
+ errfinish(NULL);
}
exit(1);
@@ -5370,28 +6308,30 @@ main(int argc, char **argv)
scale = atoi(PQgetvalue(res, 0, 0));
if (scale < 0)
{
- fprintf(stderr, "invalid count(*) from pgbench_branches: \"%s\"\n",
- PQgetvalue(res, 0, 0));
+ elog(NULL, ELEVEL_MAIN,
+ "invalid count(*) from pgbench_branches: \"%s\"\n",
+ PQgetvalue(res, 0, 0));
exit(1);
}
PQclear(res);
/* warn if we override user-given -s switch */
if (scale_given)
- fprintf(stderr,
- "scale option ignored, using count from pgbench_branches table (%d)\n",
- scale);
+ elog(NULL, ELEVEL_MAIN,
+ "scale option ignored, using count from pgbench_branches table (%d)\n",
+ scale);
}
/*
* :scale variables normally get -s or database scale, but don't override
* an explicit -D switch
*/
- if (lookupVariable(&state[0], "scale") == NULL)
+ if (lookupVariable(&state[0].variables, "scale") == NULL)
{
for (i = 0; i < nclients; i++)
{
- if (!putVariableInt(&state[i], "startup", "scale", scale))
+ if (!putVariableInt(NULL, &state[i].variables, "startup", "scale",
+ scale))
exit(1);
}
}
@@ -5400,15 +6340,18 @@ main(int argc, char **argv)
* Define a :client_id variable that is unique per connection. But don't
* override an explicit -D switch.
*/
- if (lookupVariable(&state[0], "client_id") == NULL)
+ if (lookupVariable(&state[0].variables, "client_id") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "client_id", i))
+ {
+ if (!putVariableInt(NULL, &state[i].variables, "startup",
+ "client_id", i))
exit(1);
+ }
}
/* set default seed for hash functions */
- if (lookupVariable(&state[0], "default_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "default_seed") == NULL)
{
uint64 seed = ((uint64) (random() & 0xFFFF) << 48) |
((uint64) (random() & 0xFFFF) << 32) |
@@ -5416,31 +6359,33 @@ main(int argc, char **argv)
(uint64) (random() & 0xFFFF);
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "default_seed", (int64) seed))
+ if (!putVariableInt(NULL, &state[i].variables, "startup",
+ "default_seed", (int64) seed))
exit(1);
}
/* set random seed unless overwritten */
- if (lookupVariable(&state[0], "random_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "random_seed") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+ if (!putVariableInt(NULL, &state[i].variables, "startup",
+ "random_seed", random_seed))
exit(1);
}
if (!is_no_vacuum)
{
- fprintf(stderr, "starting vacuum...");
+ elog(NULL, ELEVEL_MAIN, "starting vacuum...");
tryExecuteStatement(con, "vacuum pgbench_branches");
tryExecuteStatement(con, "vacuum pgbench_tellers");
tryExecuteStatement(con, "truncate pgbench_history");
- fprintf(stderr, "end.\n");
+ elog(NULL, ELEVEL_MAIN, "end.\n");
if (do_vacuum_accounts)
{
- fprintf(stderr, "starting vacuum pgbench_accounts...");
+ elog(NULL, ELEVEL_MAIN, "starting vacuum pgbench_accounts...");
tryExecuteStatement(con, "vacuum analyze pgbench_accounts");
- fprintf(stderr, "end.\n");
+ elog(NULL, ELEVEL_MAIN, "end.\n");
}
}
PQfinish(con);
@@ -5457,15 +6402,14 @@ main(int argc, char **argv)
thread->state = &state[nclients_dealt];
thread->nstate =
(nclients - nclients_dealt + nthreads - i - 1) / (nthreads - i);
- thread->random_state[0] = random();
- thread->random_state[1] = random();
- thread->random_state[2] = random();
+ initRandomState(&thread->random_state);
thread->logfile = NULL; /* filled in later */
thread->latency_late = 0;
thread->zipf_cache.nb_cells = 0;
thread->zipf_cache.current = 0;
thread->zipf_cache.overflowCount = 0;
initStats(&thread->stats, 0);
+ memset(&(thread->error_state), 0, sizeof(ErrorState));
nclients_dealt += thread->nstate;
}
@@ -5500,7 +6444,8 @@ main(int argc, char **argv)
if (err != 0 || thread->thread == INVALID_THREAD)
{
- fprintf(stderr, "could not create thread: %s\n", strerror(err));
+ elog(NULL, ELEVEL_MAIN,
+ "could not create thread: %s\n", strerror(err));
exit(1);
}
}
@@ -5541,6 +6486,10 @@ main(int argc, char **argv)
mergeSimpleStats(&stats.lag, &thread->stats.lag);
stats.cnt += thread->stats.cnt;
stats.skipped += thread->stats.skipped;
+ stats.retries += thread->stats.retries;
+ stats.retried += thread->stats.retried;
+ stats.errors += thread->stats.errors;
+ stats.errors_in_failed_tx += thread->stats.errors_in_failed_tx;
latency_late += thread->latency_late;
INSTR_TIME_ADD(conn_total_time, thread->conn_time);
}
@@ -5610,8 +6559,9 @@ threadRun(void *arg)
if (thread->logfile == NULL)
{
- fprintf(stderr, "could not open logfile \"%s\": %s\n",
- logpath, strerror(errno));
+ elog(thread, ELEVEL_MAIN,
+ "could not open logfile \"%s\": %s\n",
+ logpath, strerror(errno));
goto done;
}
}
@@ -5621,7 +6571,7 @@ threadRun(void *arg)
/* make connections to the database */
for (i = 0; i < nstate; i++)
{
- if ((state[i].con = doConnect()) == NULL)
+ if ((state[i].con = doConnect(NULL)) == NULL)
goto done;
}
}
@@ -5689,8 +6639,8 @@ threadRun(void *arg)
if (sock < 0)
{
- fprintf(stderr, "invalid socket: %s",
- PQerrorMessage(st->con));
+ elog(thread, ELEVEL_MAIN,
+ "invalid socket: %s", PQerrorMessage(st->con));
goto done;
}
@@ -5766,7 +6716,8 @@ threadRun(void *arg)
continue;
}
/* must be something wrong */
- fprintf(stderr, "select() failed: %s\n", strerror(errno));
+ elog(thread, ELEVEL_MAIN,
+ "select() failed: %s\n", strerror(errno));
goto done;
}
}
@@ -5790,8 +6741,8 @@ threadRun(void *arg)
if (sock < 0)
{
- fprintf(stderr, "invalid socket: %s",
- PQerrorMessage(st->con));
+ elog(thread, ELEVEL_MAIN,
+ "invalid socket: %s", PQerrorMessage(st->con));
goto done;
}
@@ -5825,7 +6776,11 @@ threadRun(void *arg)
/* generate and show report */
StatsData cur;
int64 run = now - last_report,
- ntx;
+ ntx,
+ retries,
+ retried,
+ errors,
+ errors_in_failed_tx;
double tps,
total_run,
latency,
@@ -5852,6 +6807,11 @@ threadRun(void *arg)
mergeSimpleStats(&cur.lag, &thread[i].stats.lag);
cur.cnt += thread[i].stats.cnt;
cur.skipped += thread[i].stats.skipped;
+ cur.retries += thread[i].stats.retries;
+ cur.retried += thread[i].stats.retried;
+ cur.errors += thread[i].stats.errors;
+ cur.errors_in_failed_tx +=
+ thread[i].stats.errors_in_failed_tx;
}
/* we count only actually executed transactions */
@@ -5869,6 +6829,11 @@ threadRun(void *arg)
{
latency = sqlat = stdev = lag = 0;
}
+ retries = cur.retries - last.retries;
+ retried = cur.retried - last.retried;
+ errors = cur.errors - last.errors;
+ errors_in_failed_tx = cur.errors_in_failed_tx -
+ last.errors_in_failed_tx;
if (progress_timestamp)
{
@@ -5890,18 +6855,39 @@ threadRun(void *arg)
snprintf(tbuf, sizeof(tbuf), "%.1f s", total_run);
}
- fprintf(stderr,
- "progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
- tbuf, tps, latency, stdev);
-
- if (throttle_delay)
+ if (errstart(NULL, ELEVEL_MAIN))
{
- fprintf(stderr, ", lag %.3f ms", lag);
- if (latency_limit)
- fprintf(stderr, ", " INT64_FORMAT " skipped",
- cur.skipped - last.skipped);
+ errmsg(NULL, "progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
+ tbuf, tps, latency, stdev);
+
+ if (errors > 0)
+ {
+ errmsg(NULL, ", " INT64_FORMAT " failed", errors);
+ if (errors_in_failed_tx > 0)
+ errmsg(NULL, " (" INT64_FORMAT " in failed tx)",
+ errors_in_failed_tx);
+ }
+
+ if (throttle_delay)
+ {
+ errmsg(NULL, ", lag %.3f ms", lag);
+ if (latency_limit)
+ errmsg(NULL, ", " INT64_FORMAT " skipped",
+ cur.skipped - last.skipped);
+ }
+
+ /*
+ * retried can be non-zero only if max_tries is greater than one or
+ * latency_limit is used.
+ */
+ if (retried > 0)
+ {
+ errmsg(NULL, ", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
+ retried, retries);
+ }
+ errmsg(NULL, "\n");
+ errfinish(NULL);
}
- fprintf(stderr, "\n");
last = cur;
last_report = now;
@@ -5986,7 +6972,7 @@ setalarm(int seconds)
win32_timer_callback, NULL, seconds * 1000, 0,
WT_EXECUTEINTIMERTHREAD | WT_EXECUTEONLYONCE))
{
- fprintf(stderr, "failed to set timer\n");
+ elog(NULL, ELEVEL_MAIN, "failed to set timer\n");
exit(1);
}
}
@@ -6058,3 +7044,169 @@ pthread_join(pthread_t th, void **thread_return)
}
#endif /* WIN32 */
+
+/*
+ * Return the error state of the thread if the thread is not NULL; otherwise
+ * return the main error state.
+ */
+static ErrorState *
+getErrorState(TState *thread)
+{
+ return thread ? &(thread->error_state) : &main_error_state;
+}
+
+/*
+ * Add the formatted string to the end of the saved error/log message if the
+ * buffer size allows it.
+ *
+ * Only for use in the functions elog() / errmsg() / pgbench_error().
+ */
+static void
+errmsg_internal(TState *thread, const char *fmt, va_list *args)
+{
+ ErrorState *error_state = getErrorState(thread);
+
+ error_state->message_length +=
+ vsnprintf(error_state->message + error_state->message_length,
+ sizeof(error_state->message) - error_state->message_length,
+ _(fmt), *args);
+
+ if (!error_state->in_process)
+ {
+ /* internal error which should never occur */
+ /* try to print an existing error message to stderr */
+ fprintf(stderr, "could not process an existing error message:\n%s",
+ error_state->message);
+ exit(1);
+ }
+}
+
+/*
+ * Report an error or other log message if the current debugging level
+ * permits this error/logging level.
+ *
+ * Pass the thread as not NULL for client commands and as NULL otherwise.
+ */
+static void
+elog(TState *thread, ErrorLevel error_level, const char *fmt,...)
+{
+ va_list ap;
+
+ if (errstart(thread, error_level))
+ {
+ va_start(ap, fmt);
+ errmsg_internal(thread, fmt, &ap);
+ va_end(ap);
+
+ errfinish(thread);
+ }
+}
+
+/*
+ * Returns true if the current debugging level permits this error/logging
+ * level, and false otherwise.
+ *
+ * Pass the thread as not NULL for client commands and as NULL otherwise.
+ */
+static bool
+errstart(TState *thread, ErrorLevel error_level)
+{
+ ErrorState *error_state = getErrorState(thread);
+ bool start_error_reporting;
+
+ if (error_state->in_process)
+ {
+ /* internal error which should never occur */
+ fprintf(stderr, "cannot handle more than one error at a time\n");
+ exit(1);
+ }
+
+ /* Check if we have the appropriate debugging level */
+ switch (error_level)
+ {
+ case ELEVEL_DEBUG:
+ /*
+ * Print the message only if debugging is enabled for all types of
+ * messages.
+ */
+ start_error_reporting = debug_level >= DEBUG_ALL;
+ break;
+ case ELEVEL_CLIENT_FAIL:
+ /*
+ * Print a failure message only if debugging is enabled at least for
+ * failures.
+ */
+ start_error_reporting = debug_level >= DEBUG_FAILS;
+ break;
+ case ELEVEL_CLIENT_ABORTED:
+ case ELEVEL_MAIN:
+ /*
+ * Always print an error message if the client is aborted or this is
+ * the main program error/log message.
+ */
+ start_error_reporting = true;
+ break;
+ default:
+ /* internal error which should never occur */
+ fprintf(stderr, "unexpected error level: %d\n", error_level);
+ exit(1);
+ }
+
+ if (start_error_reporting)
+ {
+ error_state->in_process = true;
+ error_state->message[0] = '\0';
+ error_state->message_length = 0;
+ }
+
+ return start_error_reporting;
+}
+
+/*
+ * Add the formatted string to the end of the stored error/log message if the
+ * buffer size allows it.
+ *
+ * Pass the thread as not NULL for client commands and as NULL otherwise.
+ */
+static void
+errmsg(TState *thread, const char *fmt,...)
+{
+ va_list ap;
+
+ va_start(ap, fmt);
+ errmsg_internal(thread, fmt, &ap);
+ va_end(ap);
+}
+
+/*
+ * Print the stored error/log message to stderr (the message is assumed to be
+ * non-empty).
+ *
+ * Pass the thread as not NULL for client commands and as NULL otherwise.
+ */
+static void
+errfinish(TState *thread)
+{
+ ErrorState *error_state = getErrorState(thread);
+
+ if (!error_state->in_process)
+ {
+ /* internal error which should never occur */
+ /* try to print an existing error message to stderr */
+ fprintf(stderr, "could not complete an existing error message:\n%s",
+ error_state->message);
+ exit(1);
+ }
+
+ if (error_state->message_length)
+ {
+ fprintf(stderr, "%s", error_state->message);
+ }
+ else
+ {
+ /* internal error which should never occur */
+ fprintf(stderr, "empty error message cannot be reported\n");
+ exit(1);
+ }
+
+ error_state->in_process = false;
+}
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index be08b20..370fce6 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -118,7 +118,8 @@ pgbench(
[ qr{builtin: TPC-B},
qr{clients: 2\b},
qr{processed: 10/10},
- qr{mode: simple} ],
+ qr{mode: simple},
+ qr{maximum number of tries: 1} ],
[qr{^$}],
'pgbench tpcb-like');
@@ -134,7 +135,7 @@ pgbench(
'pgbench simple update');
pgbench(
- '-t 100 -c 7 -M prepared -b se --debug',
+ '-t 100 -c 7 -M prepared -b se --debug all',
0,
[ qr{builtin: select only},
qr{clients: 7\b},
@@ -491,6 +492,10 @@ my @errors = (
\set i 0
SELECT LEAST(:i, :i, :i, :i, :i, :i, :i, :i, :i, :i, :i);
} ],
+ [ 'sql division by zero', 0, [qr{ERROR: division by zero}],
+ q{-- SQL division by zero
+ SELECT 1 / 0;
+} ],
# SHELL
[ 'shell bad command', 0,
@@ -621,6 +626,16 @@ SELECT LEAST(:i, :i, :i, :i, :i, :i, :i, :i, :i, :i, :i);
[ 'sleep unknown unit', 1,
[qr{unrecognized time unit}], q{\sleep 1 week} ],
+ # CONDITIONAL BLOCKS
+ [ 'if elif failed conditions', 0,
+ [qr{division by zero}],
+ q{-- failed conditions
+\if 1 / 0
+\elif 1 / 0
+\else
+\endif
+} ],
+
# MISC
[ 'misc invalid backslash command', 1,
[qr{invalid command .* "nosuchcommand"}], q{\nosuchcommand} ],
@@ -635,14 +650,32 @@ for my $e (@errors)
my $n = '001_pgbench_error_' . $name;
$n =~ s/ /_/g;
pgbench(
- '-n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX -M prepared',
+ '-n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX -M prepared -d fails',
$status,
- [ $status ? qr{^$} : qr{processed: 0/1} ],
+ ($status ?
+ [ qr{^$} ] :
+ [ qr{processed: 0/1}, qr{number of errors: 1 \(100.000%\)},
+ qr{^((?!number of retried)(.|\n))*$} ]),
$re,
'pgbench script error: ' . $name,
{ $n => $script });
}
+# reset client variables in case of failure
+pgbench(
+ '-n -t 2 -d fails', 0,
+ [ qr{processed: 0/2}, qr{number of errors: 2 \(100.000%\)},
+ qr{^((?!number of retried)(.|\n))*$} ],
+ [ qr{(client 0 got a failure in command 1 \(SQL\) of script 0; ERROR: syntax error at or near ":"(.|\n)*){2}} ],
+ 'pgbench reset client variables in case of failure',
+ { '001_pgbench_reset_client_variables' => q{
+BEGIN;
+-- select an unassigned variable
+SELECT :unassigned_var;
+\set unassigned_var 1
+END;
+} });
+
# zipfian cache array overflow
pgbench(
'-t 1', 0,
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index af21f04..ed8def9 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -57,7 +57,7 @@ my @options = (
# name, options, stderr checks
[ 'bad option',
- '-h home -p 5432 -U calvin -d --bad-option',
+ '-h home -p 5432 -U calvin -d all --bad-option',
[ qr{(unrecognized|illegal) option}, qr{--help.*more information} ] ],
[ 'no file',
'-f no-such-file',
@@ -113,6 +113,8 @@ my @options = (
[ 'bad random seed', '--random-seed=one',
[qr{unrecognized random seed option "one": expecting an unsigned integer, "time" or "rand"},
qr{error while setting random seed from --random-seed option} ] ],
+ [ 'bad maximum number of tries', '--max-tries -10',
+ [qr{invalid number of maximum tries: "-10"} ] ],
# logging sub-options
[ 'sampling => log', '--sampling-rate=0.01',
diff --git a/src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl b/src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl
new file mode 100644
index 0000000..5e45cb1
--- /dev/null
+++ b/src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl
@@ -0,0 +1,761 @@
+use strict;
+use warnings;
+
+use Config;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 34;
+
+use constant
+{
+ READ_COMMITTED => 0,
+ REPEATABLE_READ => 1,
+ SERIALIZABLE => 2,
+};
+
+my @isolation_level_shell = (
+ 'read\\ committed',
+ 'repeatable\\ read',
+ 'serializable');
+
+# The keys of advisory locks for testing deadlock failures:
+use constant
+{
+ DEADLOCK_1 => 3,
+ WAIT_PGBENCH_2 => 4,
+ DEADLOCK_2 => 5,
+ TRANSACTION_ENDS_1 => 6,
+ TRANSACTION_ENDS_2 => 7,
+};
+
+# Test concurrent update in table row.
+my $node = get_new_node('main');
+$node->init;
+$node->start;
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2), (2, 3);');
+
+my $script_serialization = $node->basedir . '/pgbench_script_serialization';
+append_to_file($script_serialization,
+ "\\set delta random(-5000, 5000)\n"
+ . "BEGIN;\n"
+ . "SELECT pg_sleep(1);\n"
+ . "UPDATE xy SET y = y + :delta "
+ . "WHERE x = 1 AND pg_advisory_lock(0) IS NOT NULL;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "END;\n");
+
+my $script_deadlocks1 = $node->basedir . '/pgbench_script_deadlocks1';
+append_to_file($script_deadlocks1,
+ "BEGIN;\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_1 . ");\n"
+ . "SELECT pg_advisory_lock(" . WAIT_PGBENCH_2 . ");\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_2 . ");\n"
+ . "END;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "SELECT pg_advisory_lock(" . TRANSACTION_ENDS_1 . ");\n"
+ . "SELECT pg_advisory_unlock_all();");
+
+my $script_deadlocks2 = $node->basedir . '/pgbench_script_deadlocks2';
+append_to_file($script_deadlocks2,
+ "BEGIN;\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_2 . ");\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_1 . ");\n"
+ . "END;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "SELECT pg_advisory_lock(" . TRANSACTION_ENDS_2 . ");\n"
+ . "SELECT pg_advisory_unlock_all();");
+
+sub test_pgbench_serialization_errors
+{
+ my ($max_tries, $latency_limit, $test_name) = @_;
+
+ my $isolation_level = REPEATABLE_READ;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $err_pgbench);
+
+ # Open a psql session, run a parallel transaction and acquire an advisory
+ # lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql = "begin;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /BEGIN/;
+
+ $in_psql =
+ "update xy set y = y + 1 "
+ . "where x = 1 and pg_advisory_lock(0) is not null;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /UPDATE 1/;
+
+ my $retry_options =
+ ($max_tries ? "--max-tries $max_tries " : "")
+ . ($latency_limit ? "--latency-limit $latency_limit" : "");
+
+ # Start pgbench:
+ my @command = (
+ qw(pgbench --no-vacuum --transactions 1 --debug fails --file),
+ $script_serialization,
+ split /\s+/, $retry_options);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$err_pgbench;
+
+ # Wait until pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select * from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = 0::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /1 row/);
+
+ # In psql, commit the transaction, release advisory locks and end the
+ # session:
+ $in_psql = "end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "select pg_advisory_unlock_all();\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_results on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+
+ like($out_pgbench,
+ qr{processed: 0/1},
+ "$test_name: check processed transactions");
+
+ like($out_pgbench,
+ qr{number of errors: 1 \(100\.000%\)},
+ "$test_name: check errors");
+
+ like($out_pgbench,
+ qr{^((?!number of retried)(.|\n))*$},
+ "$test_name: check retried");
+
+ if ($max_tries)
+ {
+ like($out_pgbench,
+ qr{maximum number of tries: $max_tries},
+ "$test_name: check the maximum number of tries");
+ }
+ else
+ {
+ like($out_pgbench,
+ qr{^((?!maximum number of tries)(.|\n))*$},
+ "$test_name: check the maximum number of tries");
+ }
+
+ if ($latency_limit)
+ {
+ like($out_pgbench,
+ qr{number of transactions above the $latency_limit\.0 ms latency limit: 1/1 \(100.000 \%\) \(including errors\)},
+ "$test_name: check transactions above latency limit");
+ }
+ else
+ {
+ like($out_pgbench,
+ qr{^((?!latency limit)(.|\n))*$},
+ "$test_name: check transactions above latency limit");
+ }
+
+ my $pattern =
+ "client 0 got a failure in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update";
+
+ like($err_pgbench,
+ qr{$pattern},
+ "$test_name: check serialization failure");
+}
+
+sub test_pgbench_serialization_failures
+{
+ my $isolation_level = REPEATABLE_READ;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $err_pgbench);
+
+ # Open a psql session, run a parallel transaction and acquire an advisory
+ # lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql = "begin;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /BEGIN/;
+
+ $in_psql =
+ "update xy set y = y + 1 "
+ . "where x = 1 and pg_advisory_lock(0) is not null;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /UPDATE 1/;
+
+ # Start pgbench:
+ my @command = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --max-tries 2),
+ "--file",
+ $script_serialization);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$err_pgbench;
+
+ # Wait until pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select * from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = 0::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /1 row/);
+
+ # In psql, commit the transaction, release advisory locks and end the
+ # session:
+ $in_psql = "end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "select pg_advisory_unlock_all();\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_results on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+
+ like($out_pgbench,
+ qr{processed: 1/1},
+ "concurrent update with retrying: check processed transactions");
+
+ like($out_pgbench,
+ qr{^((?!number of errors)(.|\n))*$},
+ "concurrent update with retrying: check errors");
+
+ like($out_pgbench,
+ qr{number of retried: 1 \(100\.000%\)},
+ "concurrent update with retrying: check retried");
+
+ like($out_pgbench,
+ qr{number of retries: 1},
+ "concurrent update with retrying: check retries");
+
+ like($out_pgbench,
+ qr{latency average = \d+\.\d{3} ms\n},
+ "concurrent update with retrying: check latency average");
+
+ my $pattern =
+ "client 0 sending UPDATE xy SET y = y \\+ (-?\\d+) "
+ . "WHERE x = 1 AND pg_advisory_lock\\(0\\) IS NOT NULL;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 got a failure in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update\n\n"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g2+"
+ . "client 0 continues a failed transaction in command 4 \\(SQL\\) of script 0; "
+ . "ERROR: current transaction is aborted, commands ignored until end of transaction block\n\n"
+ . "client 0 sending END;\n"
+ . "\\g2+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 executing \\\\set delta\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g2+"
+ . "client 0 sending SELECT pg_sleep\\(1\\);\n"
+ . "\\g2+"
+ . "client 0 sending UPDATE xy SET y = y \\+ \\g1 "
+ . "WHERE x = 1 AND pg_advisory_lock\\(0\\) IS NOT NULL;";
+
+ like($err_pgbench,
+ qr{$pattern},
+ "concurrent update with retrying: check the retried transaction");
+}
+
+sub test_pgbench_deadlock_errors
+{
+ my $isolation_level = READ_COMMITTED;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h1, $in1, $out1, $err1);
+ my ($h2, $in2, $out2, $err2);
+
+ # Open a psql session and acquire an advisory lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql =
+ "select pg_advisory_lock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_lock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ # Run the first pgbench:
+ my @command1 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug fails --file),
+ $script_deadlocks1);
+ print "# Running: " . join(" ", @command1) . "\n";
+ $h1 = IPC::Run::start \@command1, \$in1, \$out1, \$err1;
+
+ # Wait until the first pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . WAIT_PGBENCH_2 . "_zero' "
+ . "else '" . WAIT_PGBENCH_2 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . WAIT_PGBENCH_2
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ WAIT_PGBENCH_2 ]}_not_zero/);
+
+ # Run the second pgbench:
+ my @command2 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug fails --file),
+ $script_deadlocks2);
+ print "# Running: " . join(" ", @command2) . "\n";
+ $h2 = IPC::Run::start \@command2, \$in2, \$out2, \$err2;
+
+ # Wait until the second pgbench tries to acquire the lock held by the first
+ # pgbench:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . DEADLOCK_1 . "_zero' "
+ . "else '" . DEADLOCK_1 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . DEADLOCK_1
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ DEADLOCK_1 ]}_not_zero/);
+
+ # In the psql session, release the lock that the first pgbench is waiting
+ # for and end the session:
+ $in_psql =
+ "select pg_advisory_unlock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_unlock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get results from all pgbenches:
+ $h1->pump() until length $out1;
+ $h1->finish();
+
+ $h2->pump() until length $out2;
+ $h2->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_results on Windows instead.
+ my $result1 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h1->full_results)[0]
+ : $h1->result(0);
+
+ my $result2 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h2->full_results)[0]
+ : $h2->result(0);
+
+ # Check all pgbench results
+ ok(!$result1, "@command1 exit code 0");
+ ok(!$result2, "@command2 exit code 0");
+
+ # The first or second pgbench should get a deadlock error
+ ok((($out1 =~ /processed: 0\/1/ and $out2 =~ /processed: 1\/1/) or
+ ($out2 =~ /processed: 0\/1/ and $out1 =~ /processed: 1\/1/)),
+ "concurrent deadlock update: check processed transactions");
+
+ ok((($out1 =~ /number of errors: 1 \(100\.000%\)/ and
+ $out2 =~ /^((?!number of errors)(.|\n))*$/) or
+ ($out2 =~ /number of errors: 1 \(100\.000%\)/ and
+ $out1 =~ /^((?!number of errors)(.|\n))*$/)),
+ "concurrent deadlock update: check errors");
+
+ ok(($err1 =~ /client 0 got a failure in command 3 \(SQL\) of script 0; ERROR: deadlock detected/ or
+ $err2 =~ /client 0 got a failure in command 2 \(SQL\) of script 0; ERROR: deadlock detected/),
+ "concurrent deadlock update: check deadlock failure");
+
+ # Neither pgbench run should have retried transactions
+ like($out1 . $out2,
+ qr{^((?!number of retried)(.|\n))*$},
+ "concurrent deadlock update: check retried");
+}
+
+sub test_pgbench_deadlock_failures
+{
+ my $isolation_level = READ_COMMITTED;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h1, $in1, $out1, $err1);
+ my ($h2, $in2, $out2, $err2);
+
+ # Open a psql session and acquire an advisory lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql =
+ "select pg_advisory_lock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_lock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ # Run the first pgbench:
+ my @command1 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --max-tries 2),
+ "--file",
+ $script_deadlocks1);
+ print "# Running: " . join(" ", @command1) . "\n";
+ $h1 = IPC::Run::start \@command1, \$in1, \$out1, \$err1;
+
+ # Wait until the first pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . WAIT_PGBENCH_2 . "_zero' "
+ . "else '" . WAIT_PGBENCH_2 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . WAIT_PGBENCH_2
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ WAIT_PGBENCH_2 ]}_not_zero/);
+
+ # Run the second pgbench:
+ my @command2 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --max-tries 2),
+ "--file",
+ $script_deadlocks2);
+ print "# Running: " . join(" ", @command2) . "\n";
+ $h2 = IPC::Run::start \@command2, \$in2, \$out2, \$err2;
+
+ # Wait until the second pgbench tries to acquire the lock held by the first
+ # pgbench:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . DEADLOCK_1 . "_zero' "
+ . "else '" . DEADLOCK_1 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . DEADLOCK_1
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ DEADLOCK_1 ]}_not_zero/);
+
+ # In the psql session, acquire the locks that pgbenches will wait for:
+ $in_psql =
+ "select pg_advisory_lock(" . TRANSACTION_ENDS_1 . ") "
+ . "as pg_advisory_lock_" . TRANSACTION_ENDS_1 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ TRANSACTION_ENDS_1 ]}/;
+
+ $in_psql =
+ "select pg_advisory_lock(" . TRANSACTION_ENDS_2 . ") "
+ . "as pg_advisory_lock_" . TRANSACTION_ENDS_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ TRANSACTION_ENDS_2 ]}/;
+
+ # In the psql session, release the lock that the first pgbench is waiting
+ # for:
+ $in_psql =
+ "select pg_advisory_unlock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_unlock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ # Wait until pgbenches try to acquire the locks held by the psql session:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . TRANSACTION_ENDS_1 . "_zero' "
+ . "else '" . TRANSACTION_ENDS_1 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . TRANSACTION_ENDS_1
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ TRANSACTION_ENDS_1 ]}_not_zero/);
+
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . TRANSACTION_ENDS_2 . "_zero' "
+ . "else '" . TRANSACTION_ENDS_2 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . TRANSACTION_ENDS_2
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ TRANSACTION_ENDS_2 ]}_not_zero/);
+
+ # In the psql session, release advisory locks and end the session:
+ $in_psql = "select pg_advisory_unlock_all() as pg_advisory_unlock_all;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get results from all pgbenches:
+ $h1->pump() until length $out1;
+ $h1->finish();
+
+ $h2->pump() until length $out2;
+ $h2->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_result on Windows instead.
+ my $result1 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h1->full_results)[0]
+ : $h1->result(0);
+
+ my $result2 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h2->full_results)[0]
+ : $h2->result(0);
+
+ # Check all pgbench results
+ ok(!$result1, "@command1 exit code 0");
+ ok(!$result2, "@command2 exit code 0");
+
+ like($out1,
+ qr{processed: 1/1},
+ "concurrent deadlock update with retrying: pgbench 1: "
+ . "check processed transactions");
+ like($out2,
+ qr{processed: 1/1},
+ "concurrent deadlock update with retrying: pgbench 2: "
+ . "check processed transactions");
+
+ # The first or second pgbench should get a deadlock error which was retried:
+ like($out1 . $out2,
+ qr{^((?!number of errors)(.|\n))*$},
+ "concurrent deadlock update with retrying: check errors");
+
+ ok((($out1 =~ /number of retried: 1 \(100\.000%\)/ and
+ $out2 =~ /^((?!number of retried)(.|\n))*$/) or
+ ($out2 =~ /number of retried: 1 \(100\.000%\)/ and
+ $out1 =~ /^((?!number of retried)(.|\n))*$/)),
+ "concurrent deadlock update with retrying: check retries");
+
+ my $pattern1 =
+ "client 0 sending BEGIN;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . WAIT_PGBENCH_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 got a failure in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\n"
+ . "((?!client 0)(.|\n))*"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . WAIT_PGBENCH_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+";
+
+ my $pattern2 =
+ "client 0 sending BEGIN;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 got a failure in command 2 \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\n"
+ . "((?!client 0)(.|\n))*"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+";
+
+ ok(($err1 =~ /$pattern1/ or $err2 =~ /$pattern2/),
+ "concurrent deadlock update with retrying: "
+ . "check the retried transaction");
+}
+
+test_pgbench_serialization_errors(
+ 1, # --max-tries
+ 0, # --latency-limit (will not be used)
+ "concurrent update");
+test_pgbench_serialization_errors(
+ 0, # --max-tries (will not be used)
+ 900, # --latency-limit
+ "concurrent update with maximum time of tries");
+
+test_pgbench_serialization_failures();
+
+test_pgbench_deadlock_errors();
+test_pgbench_deadlock_failures();
+
+#done
+$node->stop;
--
2.7.4
Hello Marina,
FYI the v8 patch does not apply anymore, mostly because of a recent perl
reindentation.
I think that I'll have time for a round of review in the first half of
July. Providing a rebased patch before then would be nice.
--
Fabien.
Fabien COELHO wrote:
I think that I'll have time for a round of review in the first half of July.
Providing a rebased patch before then would be nice.
Note that even in the absence of a rebased patch, you can apply to an
older checkout if you have some limited window of time for a review.
Looking over the diff, I find that this patch tries to do too much and
needs to be split up. At a minimum there is a preliminary patch that
introduces the error reporting stuff (errstart etc); there are other
thread-related changes (for example to the random generation functions)
that probably belong in a separate one too. Not sure if there are other
smaller patches hidden inside the rest.
On elog/errstart: we already have a convention for what ereport() calls
look like; I suggest to use that instead of inventing your own. With
that, is there a need for elog()? In the backend we have it because
$HISTORY but there's no need for that here -- I propose to lose elog()
and use only ereport everywhere. Also, I don't see that you need
errmsg_internal() at all; let's lose it too.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Alvaro,
I think that I'll have time for a round of review in the first half of July.
Providing a rebased patch before then would be nice.
Note that even in the absence of a rebased patch, you can apply to an
older checkout if you have some limited window of time for a review.
Yes, sure. I'd like to bring this feature to a committable state, so it will
have to be rebased at some point anyway.
Looking over the diff, I find that this patch tries to do too much and
needs to be split up.
Yep, I agree that it would help the reviewing process. On the other hand I
have bad memories about maintaining dependent patches which interfere
significantly. Maybe that will not be the case with this feature.
Thanks for the advice.
--
Fabien.
Hello,
Fabien COELHO wrote:
Looking over the diff, I find that this patch tries to do too much and
needs to be split up.
Yep, I agree that it would help the reviewing process. On the other hand I
have bad memories about maintaining dependent patches which interfere
significantly.
Sure. I suggest not posting these patches separately -- instead, post
as a series of commits in a single email, attaching files from "git
format-patch".
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello!
Fabien and Alvaro, thank you very much! And sorry for such a late reply
(I was a bit busy and implementing ereport took some time...) :-( Below is a
rebased version of the patch (commit
9effb63e0dd12b0704cd8e11106fe08ff5c9d685) divided into several smaller
patches:
v9-0001-Pgbench-errors-use-the-RandomState-structure-for-.patch
- a patch for the RandomState structure (this is used to reset a
client's random seed during the repeating of transactions after
serialization/deadlock failures).
v9-0002-Pgbench-errors-use-the-Variables-structure-for-cl.patch
- a patch for the Variables structure (this is used to reset client
variables during the repeating of transactions after
serialization/deadlock failures).
v9-0003-Pgbench-errors-use-the-ereport-macro-to-report-de.patch
- a patch for the ereport() macro (this is used to report client
failures that do not cause an abort; reporting depends on the level of
debugging).
- implementation: if possible, use a local ErrorData structure during the
errstart()/errmsg()/errfinish() calls; otherwise use a static variable,
protected by a mutex if necessary. To do all of this, the function
appendPQExpBufferVA is exported from libpq.
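For illustration only, the "static variable protected by a mutex" fallback could look roughly like this in C; the names report_error and shared_errbuf are hypothetical sketches, not the patch's actual symbols:

```c
/*
 * Minimal sketch of the mutex-protected fallback described above: format an
 * error message into a shared static buffer under a lock before emitting it.
 * Names here are illustrative, not the patch's actual symbols.
 */
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

static char shared_errbuf[1024];
static pthread_mutex_t errbuf_lock = PTHREAD_MUTEX_INITIALIZER;

static void
report_error(const char *fmt, ...)
{
	va_list		ap;

	/* Serialize buffer access so concurrent threads cannot interleave. */
	pthread_mutex_lock(&errbuf_lock);
	va_start(ap, fmt);
	vsnprintf(shared_errbuf, sizeof(shared_errbuf), fmt, ap);
	va_end(ap);
	fprintf(stderr, "%s\n", shared_errbuf);
	pthread_mutex_unlock(&errbuf_lock);
}
```

In the approach described above, the local-ErrorData path avoids the lock entirely; the mutex is only needed when no local buffer can be used.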
v9-0004-Pgbench-errors-and-serialization-deadlock-retries.patch
- the main patch for handling client errors and repetition of
transactions with serialization/deadlock failures (see the detailed
description in the file).
Any suggestions are welcome!
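As a rough sketch of the retry decision this last patch implements: PostgreSQL reports serialization failures as SQLSTATE 40001 and deadlocks as 40P01, so a client can test the error code and the try counter before repeating the transaction. The helper names below are illustrative, not taken from the patch:

```c
/*
 * Illustrative sketch (not the patch's actual code): decide whether a failed
 * transaction should be repeated, based on the SQLSTATE reported by the
 * server and the number of tries already used.
 */
#include <stdbool.h>
#include <string.h>

#define ERRCODE_SERIALIZATION_FAILURE "40001"	/* serialization_failure */
#define ERRCODE_DEADLOCK_DETECTED "40P01"	/* deadlock_detected */

/* Only serialization and deadlock failures are worth repeating. */
static bool
failure_is_retriable(const char *sqlstate)
{
	return sqlstate != NULL &&
		(strcmp(sqlstate, ERRCODE_SERIALIZATION_FAILURE) == 0 ||
		 strcmp(sqlstate, ERRCODE_DEADLOCK_DETECTED) == 0);
}

/* Repeat while tries remain and the failure is of a retriable kind. */
static bool
should_retry(const char *sqlstate, int tries_done, int max_tries)
{
	return failure_is_retriable(sqlstate) && tries_done < max_tries;
}
```

Any other error (for example a unique-key violation) aborts the client as before, since repeating it cannot succeed.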
On 08-05-2018 9:00, Fabien COELHO wrote:
Hello Marina,
FYI the v8 patch does not apply anymore, mostly because of a recent
perl reindentation.
I think that I'll have time for a round of review in the first half of
July. Providing a rebased patch before then would be nice.
They are attached, though a little delayed due to testing...
On 08-05-2018 13:58, Alvaro Herrera wrote:
Looking over the diff, I find that this patch tries to do too much and
needs to be split up. At a minimum there is a preliminary patch that
introduces the error reporting stuff (errstart etc); there are other
thread-related changes (for example to the random generation functions)
that probably belong in a separate one too. Not sure if there are
other
smaller patches hidden inside the rest.
Here is a try to do it..
On elog/errstart: we already have a convention for what ereport() calls
look like; I suggest to use that instead of inventing your own. With
that, is there a need for elog()? In the backend we have it because
$HISTORY but there's no need for that here -- I propose to lose elog()
and use only ereport everywhere. Also, I don't see that you need
errmsg_internal() at all; let's lose it too.
I agree, done. But some changes were needed to make such a design
thread-safe...
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v9-0001-Pgbench-errors-use-the-RandomState-structure-for-.patch (text/x-diff)
From b5b8ca42a3299a75c80b89dcabf512a8b4f361f3 Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Mon, 21 May 2018 13:03:11 +0300
Subject: [PATCH v9] Pgbench errors: use the RandomState structure for
thread/client random seed.
This is most important when it is used to reset a client's random seed during
the repeating of transactions after serialization/deadlock failures.
Use the random state of the client while processing its commands, and the
random state of the thread to choose the script, to get the throttle delay,
and to log with a sample rate.
---
src/bin/pgbench/pgbench.c | 85 +++++++++++++++++++++++++++++++----------------
1 file changed, 56 insertions(+), 29 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index f0c5149..7b8f357 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -251,6 +251,14 @@ typedef struct StatsData
} StatsData;
/*
+ * Data structure for thread/client random seed.
+ */
+typedef struct RandomState
+{
+ unsigned short data[3];
+} RandomState;
+
+/*
* Connection state machine states.
*/
typedef enum
@@ -330,6 +338,7 @@ typedef struct
int id; /* client No. */
ConnectionStateEnum state; /* state machine's current state. */
ConditionalStack cstack; /* enclosing conditionals state */
+ RandomState random_state; /* separate randomness for each client */
int use_file; /* index in sql_script for this client */
int command; /* command number in script */
@@ -390,7 +399,7 @@ typedef struct
pthread_t thread; /* thread handle */
CState *state; /* array of CState */
int nstate; /* length of state[] */
- unsigned short random_state[3]; /* separate randomness for each thread */
+ RandomState random_state; /* separate randomness for each thread */
int64 throttle_trigger; /* previous/next throttling (us) */
FILE *logfile; /* where to log, or NULL */
ZipfCache zipf_cache; /* for thread-safe zipfian random number
@@ -694,7 +703,7 @@ gotdigits:
/* random number generator: uniform distribution from min to max inclusive */
static int64
-getrand(TState *thread, int64 min, int64 max)
+getrand(RandomState *random_state, int64 min, int64 max)
{
/*
* Odd coding is so that min and max have approximately the same chance of
@@ -705,7 +714,7 @@ getrand(TState *thread, int64 min, int64 max)
* protected by a mutex, and therefore a bottleneck on machines with many
* CPUs.
*/
- return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
+ return min + (int64) ((max - min + 1) * pg_erand48(random_state->data));
}
/*
@@ -714,7 +723,8 @@ getrand(TState *thread, int64 min, int64 max)
* value is exp(-parameter).
*/
static int64
-getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
+getExponentialRand(RandomState *random_state, int64 min, int64 max,
+ double parameter)
{
double cut,
uniform,
@@ -724,7 +734,7 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
Assert(parameter > 0.0);
cut = exp(-parameter);
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(random_state->data);
/*
* inner expression in (cut, 1] (if parameter > 0), rand in [0, 1)
@@ -737,7 +747,8 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
/* random number generator: gaussian distribution from min to max inclusive */
static int64
-getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
+getGaussianRand(RandomState *random_state, int64 min, int64 max,
+ double parameter)
{
double stdev;
double rand;
@@ -765,8 +776,8 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* are expected in (0, 1] (see
* http://en.wikipedia.org/wiki/Box_muller)
*/
- double rand1 = 1.0 - pg_erand48(thread->random_state);
- double rand2 = 1.0 - pg_erand48(thread->random_state);
+ double rand1 = 1.0 - pg_erand48(random_state->data);
+ double rand2 = 1.0 - pg_erand48(random_state->data);
/* Box-Muller basic form transform */
double var_sqrt = sqrt(-2.0 * log(rand1));
@@ -793,7 +804,7 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* will approximate a Poisson distribution centered on the given value.
*/
static int64
-getPoissonRand(TState *thread, int64 center)
+getPoissonRand(RandomState *random_state, int64 center)
{
/*
* Use inverse transform sampling to generate a value > 0, such that the
@@ -802,7 +813,7 @@ getPoissonRand(TState *thread, int64 center)
double uniform;
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(random_state->data);
return (int64) (-log(uniform) * ((double) center) + 0.5);
}
@@ -880,7 +891,7 @@ zipfFindOrCreateCacheCell(ZipfCache *cache, int64 n, double s)
* Luc Devroye, p. 550-551, Springer 1986.
*/
static int64
-computeIterativeZipfian(TState *thread, int64 n, double s)
+computeIterativeZipfian(RandomState *random_state, int64 n, double s)
{
double b = pow(2.0, s - 1.0);
double x,
@@ -891,8 +902,8 @@ computeIterativeZipfian(TState *thread, int64 n, double s)
while (true)
{
/* random variates */
- u = pg_erand48(thread->random_state);
- v = pg_erand48(thread->random_state);
+ u = pg_erand48(random_state->data);
+ v = pg_erand48(random_state->data);
x = floor(pow(u, -1.0 / (s - 1.0)));
@@ -910,10 +921,11 @@ computeIterativeZipfian(TState *thread, int64 n, double s)
* Jim Gray et al, SIGMOD 1994
*/
static int64
-computeHarmonicZipfian(TState *thread, int64 n, double s)
+computeHarmonicZipfian(TState *thread, RandomState *random_state, int64 n,
+ double s)
{
ZipfCell *cell = zipfFindOrCreateCacheCell(&thread->zipf_cache, n, s);
- double uniform = pg_erand48(thread->random_state);
+ double uniform = pg_erand48(random_state->data);
double uz = uniform * cell->harmonicn;
if (uz < 1.0)
@@ -925,7 +937,8 @@ computeHarmonicZipfian(TState *thread, int64 n, double s)
/* random number generator: zipfian distribution from min to max inclusive */
static int64
-getZipfianRand(TState *thread, int64 min, int64 max, double s)
+getZipfianRand(TState *thread, RandomState *random_state, int64 min,
+ int64 max, double s)
{
int64 n = max - min + 1;
@@ -934,8 +947,8 @@ getZipfianRand(TState *thread, int64 min, int64 max, double s)
return min - 1 + ((s > 1)
- ? computeIterativeZipfian(thread, n, s)
- : computeHarmonicZipfian(thread, n, s));
+ ? computeIterativeZipfian(random_state, n, s)
+ : computeHarmonicZipfian(thread, random_state, n, s));
}
/*
@@ -2209,7 +2222,7 @@ evalStandardFunc(TState *thread, CState *st,
if (func == PGBENCH_RANDOM)
{
Assert(nargs == 2);
- setIntValue(retval, getrand(thread, imin, imax));
+ setIntValue(retval, getrand(&st->random_state, imin, imax));
}
else /* gaussian & exponential */
{
@@ -2231,7 +2244,8 @@ evalStandardFunc(TState *thread, CState *st,
}
setIntValue(retval,
- getGaussianRand(thread, imin, imax, param));
+ getGaussianRand(&st->random_state, imin,
+ imax, param));
}
else if (func == PGBENCH_RANDOM_ZIPFIAN)
{
@@ -2243,7 +2257,8 @@ evalStandardFunc(TState *thread, CState *st,
return false;
}
setIntValue(retval,
- getZipfianRand(thread, imin, imax, param));
+ getZipfianRand(thread, &st->random_state,
+ imin, imax, param));
}
else /* exponential */
{
@@ -2256,7 +2271,8 @@ evalStandardFunc(TState *thread, CState *st,
}
setIntValue(retval,
- getExponentialRand(thread, imin, imax, param));
+ getExponentialRand(&st->random_state, imin,
+ imax, param));
}
}
@@ -2551,7 +2567,7 @@ chooseScript(TState *thread)
if (num_scripts == 1)
return 0;
- w = getrand(thread, 0, total_weight - 1);
+ w = getrand(&thread->random_state, 0, total_weight - 1);
do
{
w -= sql_script[i++].weight;
@@ -2745,7 +2761,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* away.
*/
Assert(throttle_delay > 0);
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(&thread->random_state, throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
@@ -2779,7 +2795,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
{
processXactStats(thread, st, &now, true, agg);
/* next rendez-vous */
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(&thread->random_state,
+ throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
}
@@ -3322,7 +3339,7 @@ doLog(TState *thread, CState *st,
* to the random sample.
*/
if (sample_rate != 0.0 &&
- pg_erand48(thread->random_state) > sample_rate)
+ pg_erand48(thread->random_state.data) > sample_rate)
return;
/* should we aggregate the results or not? */
@@ -4750,6 +4767,17 @@ set_random_seed(const char *seed)
return true;
}
+/*
+ * Initialize the random state of the client/thread.
+ */
+static void
+initRandomState(RandomState *random_state)
+{
+ random_state->data[0] = random();
+ random_state->data[1] = random();
+ random_state->data[2] = random();
+}
+
int
main(int argc, char **argv)
@@ -5358,6 +5386,7 @@ main(int argc, char **argv)
for (i = 0; i < nclients; i++)
{
state[i].cstack = conditional_stack_create();
+ initRandomState(&state[i].random_state);
}
if (debug)
@@ -5491,9 +5520,7 @@ main(int argc, char **argv)
thread->state = &state[nclients_dealt];
thread->nstate =
(nclients - nclients_dealt + nthreads - i - 1) / (nthreads - i);
- thread->random_state[0] = random();
- thread->random_state[1] = random();
- thread->random_state[2] = random();
+ initRandomState(&thread->random_state);
thread->logfile = NULL; /* filled in later */
thread->latency_late = 0;
thread->zipf_cache.nb_cells = 0;
--
2.7.4
v9-0002-Pgbench-errors-use-the-Variables-structure-for-cl.patch (text/x-diff)
From 89af56aa7beaa6403ea2b01290710f64ee3d4570 Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Mon, 21 May 2018 13:04:50 +0300
Subject: [PATCH v9] Pgbench errors: use the Variables structure for client
variables
This is most important when it is used to reset client variables during the
repeating of transactions after serialization/deadlock failures.
---
src/bin/pgbench/pgbench.c | 133 ++++++++++++++++++++++++++--------------------
1 file changed, 75 insertions(+), 58 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 7b8f357..254f125 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -251,6 +251,16 @@ typedef struct StatsData
} StatsData;
/*
+ * Data structure for client variables.
+ */
+typedef struct Variables
+{
+ Variable *array; /* array of variable definitions */
+ int nvariables; /* number of variables */
+ bool vars_sorted; /* are variables sorted by name? */
+} Variables;
+
+/*
* Data structure for thread/client random seed.
*/
typedef struct RandomState
@@ -344,9 +354,7 @@ typedef struct
int command; /* command number in script */
/* client variables */
- Variable *variables; /* array of variable definitions */
- int nvariables; /* number of variables */
- bool vars_sorted; /* are variables sorted by name? */
+ Variables variables;
/* various times about current transaction */
int64 txn_scheduled; /* scheduled start time of transaction (usec) */
@@ -1198,39 +1206,39 @@ compareVariableNames(const void *v1, const void *v2)
/* Locate a variable by name; returns NULL if unknown */
static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
{
Variable key;
/* On some versions of Solaris, bsearch of zero items dumps core */
- if (st->nvariables <= 0)
+ if (variables->nvariables <= 0)
return NULL;
/* Sort if we have to */
- if (!st->vars_sorted)
+ if (!variables->vars_sorted)
{
- qsort((void *) st->variables, st->nvariables, sizeof(Variable),
- compareVariableNames);
- st->vars_sorted = true;
+ qsort((void *) variables->array, variables->nvariables,
+ sizeof(Variable), compareVariableNames);
+ variables->vars_sorted = true;
}
/* Now we can search */
key.name = name;
return (Variable *) bsearch((void *) &key,
- (void *) st->variables,
- st->nvariables,
+ (void *) variables->array,
+ variables->nvariables,
sizeof(Variable),
compareVariableNames);
}
/* Get the value of a variable, in string form; returns NULL if unknown */
static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
{
Variable *var;
char stringform[64];
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
return NULL; /* not found */
@@ -1354,11 +1362,11 @@ valid_variable_name(const char *name)
* Returns NULL on failure (bad name).
*/
static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
+lookupCreateVariable(Variables *variables, const char *context, char *name)
{
Variable *var;
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
{
Variable *newvars;
@@ -1375,23 +1383,23 @@ lookupCreateVariable(CState *st, const char *context, char *name)
}
/* Create variable at the end of the array */
- if (st->variables)
- newvars = (Variable *) pg_realloc(st->variables,
- (st->nvariables + 1) * sizeof(Variable));
+ if (variables->array)
+ newvars = (Variable *) pg_realloc(variables->array,
+ (variables->nvariables + 1) * sizeof(Variable));
else
newvars = (Variable *) pg_malloc(sizeof(Variable));
- st->variables = newvars;
+ variables->array = newvars;
- var = &newvars[st->nvariables];
+ var = &newvars[variables->nvariables];
var->name = pg_strdup(name);
var->svalue = NULL;
/* caller is expected to initialize remaining fields */
- st->nvariables++;
+ variables->nvariables++;
/* we don't re-sort the array till we have to */
- st->vars_sorted = false;
+ variables->vars_sorted = false;
}
return var;
@@ -1400,12 +1408,13 @@ lookupCreateVariable(CState *st, const char *context, char *name)
/* Assign a string value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariable(CState *st, const char *context, char *name, const char *value)
+putVariable(Variables *variables, const char *context, char *name,
+ const char *value)
{
Variable *var;
char *val;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1423,12 +1432,12 @@ putVariable(CState *st, const char *context, char *name, const char *value)
/* Assign a value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableValue(CState *st, const char *context, char *name,
+putVariableValue(Variables *variables, const char *context, char *name,
const PgBenchValue *value)
{
Variable *var;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1443,12 +1452,13 @@ putVariableValue(CState *st, const char *context, char *name,
/* Assign an integer value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putVariableInt(Variables *variables, const char *context, char *name,
+ int64 value)
{
PgBenchValue val;
setIntValue(&val, value);
- return putVariableValue(st, context, name, &val);
+ return putVariableValue(variables, context, name, &val);
}
/*
@@ -1503,7 +1513,7 @@ replaceVariable(char **sql, char *param, int len, char *value)
}
static char *
-assignVariables(CState *st, char *sql)
+assignVariables(Variables *variables, char *sql)
{
char *p,
*name,
@@ -1524,7 +1534,7 @@ assignVariables(CState *st, char *sql)
continue;
}
- val = getVariable(st, name);
+ val = getVariable(variables, name);
free(name);
if (val == NULL)
{
@@ -1539,12 +1549,13 @@ assignVariables(CState *st, char *sql)
}
static void
-getQueryParams(CState *st, const Command *command, const char **params)
+getQueryParams(Variables *variables, const Command *command,
+ const char **params)
{
int i;
for (i = 0; i < command->argc - 1; i++)
- params[i] = getVariable(st, command->argv[i + 1]);
+ params[i] = getVariable(variables, command->argv[i + 1]);
}
static char *
@@ -2375,7 +2386,7 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
{
Variable *var;
- if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
+ if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
fprintf(stderr, "undefined variable \"%s\"\n",
expr->u.variable.varname);
@@ -2439,7 +2450,7 @@ getMetaCommand(const char *cmd)
* Return true if succeeded, or false on error.
*/
static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
char command[SHELL_COMMAND_SIZE];
int i,
@@ -2470,7 +2481,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
arg = argv[i] + 1; /* a string literal starting with colons */
}
- else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
+ else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
fprintf(stderr, "%s: undefined variable \"%s\"\n",
argv[0], argv[i]);
@@ -2533,7 +2544,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
argv[0], res);
return false;
}
- if (!putVariableInt(st, "setshell", variable, retval))
+ if (!putVariableInt(variables, "setshell", variable, retval))
return false;
#ifdef DEBUG
@@ -2587,7 +2598,7 @@ sendCommand(CState *st, Command *command)
char *sql;
sql = pg_strdup(command->argv[0]);
- sql = assignVariables(st, sql);
+ sql = assignVariables(&st->variables, sql);
if (debug)
fprintf(stderr, "client %d sending %s\n", st->id, sql);
@@ -2599,7 +2610,7 @@ sendCommand(CState *st, Command *command)
const char *sql = command->argv[0];
const char *params[MAX_ARGS];
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
if (debug)
fprintf(stderr, "client %d sending %s\n", st->id, sql);
@@ -2633,7 +2644,7 @@ sendCommand(CState *st, Command *command)
st->prepared[st->use_file] = true;
}
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
if (debug)
@@ -2661,14 +2672,14 @@ sendCommand(CState *st, Command *command)
* of delay, in microseconds. Returns true on success, false on error.
*/
static bool
-evaluateSleep(CState *st, int argc, char **argv, int *usecs)
+evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
char *var;
int usec;
if (*argv[1] == ':')
{
- if ((var = getVariable(st, argv[1] + 1)) == NULL)
+ if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
fprintf(stderr, "%s: undefined variable \"%s\"\n",
argv[0], argv[1]);
@@ -2941,7 +2952,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
int usec;
- if (!evaluateSleep(st, argc, argv, &usec))
+ if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
commandFailed(st, "sleep", "execution of meta-command failed");
st->state = CSTATE_ABORTED;
@@ -2982,7 +2993,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (command->meta == META_SET)
{
- if (!putVariableValue(st, argv[0], argv[1], &result))
+ if (!putVariableValue(&st->variables, argv[0],
+ argv[1], &result))
{
commandFailed(st, "set", "assignment of meta-command failed");
st->state = CSTATE_ABORTED;
@@ -3035,7 +3047,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (command->meta == META_SETSHELL)
{
- bool ret = runShellCommand(st, argv[1], argv + 2, argc - 2);
+ bool ret = runShellCommand(&st->variables,
+ argv[1], argv + 2,
+ argc - 2);
if (timer_exceeded) /* timeout */
{
@@ -3055,7 +3069,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (command->meta == META_SHELL)
{
- bool ret = runShellCommand(st, NULL, argv + 1, argc - 1);
+ bool ret = runShellCommand(&st->variables, NULL,
+ argv + 1, argc - 1);
if (timer_exceeded) /* timeout */
{
@@ -5060,7 +5075,7 @@ main(int argc, char **argv)
}
*p++ = '\0';
- if (!putVariable(&state[0], "option", optarg, p))
+ if (!putVariable(&state[0].variables, "option", optarg, p))
exit(1);
}
break;
@@ -5362,19 +5377,19 @@ main(int argc, char **argv)
int j;
state[i].id = i;
- for (j = 0; j < state[0].nvariables; j++)
+ for (j = 0; j < state[0].variables.nvariables; j++)
{
- Variable *var = &state[0].variables[j];
+ Variable *var = &state[0].variables.array[j];
if (var->value.type != PGBT_NO_VALUE)
{
- if (!putVariableValue(&state[i], "startup",
+ if (!putVariableValue(&state[i].variables, "startup",
var->name, &var->value))
exit(1);
}
else
{
- if (!putVariable(&state[i], "startup",
+ if (!putVariable(&state[i].variables, "startup",
var->name, var->svalue))
exit(1);
}
@@ -5450,11 +5465,11 @@ main(int argc, char **argv)
* :scale variables normally get -s or database scale, but don't override
* an explicit -D switch
*/
- if (lookupVariable(&state[0], "scale") == NULL)
+ if (lookupVariable(&state[0].variables, "scale") == NULL)
{
for (i = 0; i < nclients; i++)
{
- if (!putVariableInt(&state[i], "startup", "scale", scale))
+ if (!putVariableInt(&state[i].variables, "startup", "scale", scale))
exit(1);
}
}
@@ -5463,15 +5478,15 @@ main(int argc, char **argv)
* Define a :client_id variable that is unique per connection. But don't
* override an explicit -D switch.
*/
- if (lookupVariable(&state[0], "client_id") == NULL)
+ if (lookupVariable(&state[0].variables, "client_id") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "client_id", i))
+ if (!putVariableInt(&state[i].variables, "startup", "client_id", i))
exit(1);
}
/* set default seed for hash functions */
- if (lookupVariable(&state[0], "default_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "default_seed") == NULL)
{
uint64 seed = ((uint64) (random() & 0xFFFF) << 48) |
((uint64) (random() & 0xFFFF) << 32) |
@@ -5479,15 +5494,17 @@ main(int argc, char **argv)
(uint64) (random() & 0xFFFF);
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "default_seed", (int64) seed))
+ if (!putVariableInt(&state[i].variables, "startup", "default_seed",
+ (int64) seed))
exit(1);
}
/* set random seed unless overwritten */
- if (lookupVariable(&state[0], "random_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "random_seed") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+ if (!putVariableInt(&state[i].variables, "startup", "random_seed",
+ random_seed))
exit(1);
}
--
2.7.4
Attachment: v9-0003-Pgbench-errors-use-the-ereport-macro-to-report-de.patch (text/x-diff)
From 3721e6b2a98dd68014521269d4d6016937fa4e6d Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Mon, 21 May 2018 14:31:53 +0300
Subject: [PATCH v9] Pgbench errors: use the ereport() macro to report
debug/log/error messages
This is most important when reporting client failures that do not cause an
abort, since whether they are printed depends on the debugging level.
If possible, use a local ErrorData structure during the
errstart()/errmsg()/errfinish() calls. Otherwise use a static variable,
protected by a mutex if necessary. To do all of this, export the function
appendPQExpBufferVA from libpq.
---
src/bin/pgbench/pgbench.c | 1011 ++++++++++++++++++++++++------------
src/interfaces/libpq/exports.txt | 1 +
src/interfaces/libpq/pqexpbuffer.c | 4 +-
src/interfaces/libpq/pqexpbuffer.h | 8 +
4 files changed, 692 insertions(+), 332 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 254f125..d100cee 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -526,6 +526,93 @@ static const BuiltinScript builtin_script[] =
}
};
+typedef enum ErrorLevel
+{
+ /*
+ * To report throttling, executed/sent/received commands etc.
+ */
+ ELEVEL_DEBUG,
+
+ /*
+ * To report the error/log messages and/or PGBENCH_DEBUG.
+ */
+ ELEVEL_LOG,
+
+ /*
+ * To report the error messages of the main program and to exit immediately.
+ */
+ ELEVEL_FATAL
+} ErrorLevel;
+
+typedef struct ErrorData
+{
+ ErrorLevel elevel;
+ PQExpBufferData message;
+} ErrorData;
+
+typedef ErrorData *Error;
+
+#if defined(ENABLE_THREAD_SAFETY) && defined(HAVE__VA_ARGS)
+/* use the local ErrorData in ereport */
+#define LOCAL_ERROR_DATA() ErrorData edata;
+
+#define errstart(elevel) errstartImpl(&edata, elevel)
+#define errmsg(...) errmsgImpl(&edata, __VA_ARGS__)
+#define errfinish(...) errfinishImpl(&edata, __VA_ARGS__)
+#else /* !(ENABLE_THREAD_SAFETY && HAVE__VA_ARGS) */
+/* use the global ErrorData in ereport... */
+#define LOCAL_ERROR_DATA()
+static ErrorData edata;
+static Error error = &edata;
+
+/* ...and protect it with a mutex if necessary */
+#ifdef ENABLE_THREAD_SAFETY
+static pthread_mutex_t error_mutex = PTHREAD_MUTEX_INITIALIZER;
+#endif /* ENABLE_THREAD_SAFETY */
+
+#define errstart errstartImpl
+#define errmsg errmsgImpl
+#define errfinish errfinishImpl
+#endif /* ENABLE_THREAD_SAFETY && HAVE__VA_ARGS */
+
+/*
+ * Error reporting API: to be used in this way:
+ * ereport(ELEVEL_LOG,
+ * (errmsg("connection to database \"%s\" failed\n", dbName),
+ * ... other errxxx() fields as needed ...));
+ *
+ * The error level is required, and so is a primary error message. All else is
+ * optional.
+ *
+ * If elevel >= ELEVEL_FATAL, the call will not return; we try to inform the
+ * compiler of that via abort(). However, no useful optimization effect is
+ * obtained unless the compiler sees elevel as a compile-time constant, else
+ * we're just adding code bloat. So, if __builtin_constant_p is available, use
+ * that to cause the second if() to vanish completely for non-constant cases. We
+ * avoid using a local variable because it's not necessary and prevents gcc from
+ * making the unreachability deduction at optlevel -O0.
+ */
+#ifdef HAVE__BUILTIN_CONSTANT_P
+#define ereport(elevel, rest) \
+ do { \
+ LOCAL_ERROR_DATA() \
+ if (errstart(elevel)) \
+ errfinish rest; \
+ if (__builtin_constant_p(elevel) && (elevel) >= ELEVEL_FATAL) \
+ abort(); \
+ } while(0)
+#else /* !HAVE__BUILTIN_CONSTANT_P */
+#define ereport(elevel, rest) \
+ do { \
+ const int elevel_ = (elevel); \
+ LOCAL_ERROR_DATA() \
+ if (errstart(elevel_)) \
+ errfinish rest; \
+ if (elevel_ >= ELEVEL_FATAL) \
+ abort(); \
+ } while(0)
+#endif /* HAVE__BUILTIN_CONSTANT_P */
+
/* Function prototypes */
static void setNullValue(PgBenchValue *pv);
@@ -543,6 +630,17 @@ static void *threadRun(void *arg);
static void setalarm(int seconds);
static void finishCon(CState *st);
+#if defined(ENABLE_THREAD_SAFETY) && defined(HAVE__VA_ARGS)
+static bool errstartImpl(Error error, ErrorLevel elevel);
+static int errmsgImpl(Error error,
+ const char *fmt,...) pg_attribute_printf(2, 3);
+static void errfinishImpl(Error error, int dummy,...);
+#else /* !(ENABLE_THREAD_SAFETY && HAVE__VA_ARGS) */
+static bool errstartImpl(ErrorLevel elevel);
+static int errmsgImpl(const char *fmt,...) pg_attribute_printf(1, 2);
+static void errfinishImpl(int dummy,...);
+#endif /* ENABLE_THREAD_SAFETY && HAVE__VA_ARGS */
+
/* callback functions for our flex lexer */
static const PsqlScanCallbacks pgbench_callbacks = {
@@ -685,7 +783,10 @@ strtoint64(const char *str)
/* require at least one digit */
if (!isdigit((unsigned char) *ptr))
- fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+ {
+ ereport(ELEVEL_LOG,
+ (errmsg("invalid input syntax for integer: \"%s\"\n", str)));
+ }
/* process digits */
while (*ptr && isdigit((unsigned char) *ptr))
@@ -693,7 +794,11 @@ strtoint64(const char *str)
int64 tmp = result * 10 + (*ptr++ - '0');
if ((tmp / 10) != result) /* overflow? */
- fprintf(stderr, "value \"%s\" is out of range for type bigint\n", str);
+ {
+ ereport(ELEVEL_LOG,
+ (errmsg("value \"%s\" is out of range for type bigint\n",
+ str)));
+ }
result = tmp;
}
@@ -704,7 +809,10 @@ gotdigits:
ptr++;
if (*ptr != '\0')
- fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+ {
+ ereport(ELEVEL_LOG,
+ (errmsg("invalid input syntax for integer: \"%s\"\n", str)));
+ }
return ((sign < 0) ? -result : result);
}
@@ -1091,10 +1199,7 @@ executeStatement(PGconn *con, const char *sql)
res = PQexec(con, sql);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
- {
- fprintf(stderr, "%s", PQerrorMessage(con));
- exit(1);
- }
+ ereport(ELEVEL_FATAL, (errmsg("%s", PQerrorMessage(con))));
PQclear(res);
}
@@ -1107,8 +1212,9 @@ tryExecuteStatement(PGconn *con, const char *sql)
res = PQexec(con, sql);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
{
- fprintf(stderr, "%s", PQerrorMessage(con));
- fprintf(stderr, "(ignoring this error and continuing anyway)\n");
+ ereport(ELEVEL_LOG,
+ (errmsg("%s(ignoring this error and continuing anyway)\n",
+ PQerrorMessage(con))));
}
PQclear(res);
}
@@ -1154,8 +1260,8 @@ doConnect(void)
if (!conn)
{
- fprintf(stderr, "connection to database \"%s\" failed\n",
- dbName);
+ ereport(ELEVEL_LOG,
+ (errmsg("connection to database \"%s\" failed\n", dbName)));
return NULL;
}
@@ -1173,8 +1279,9 @@ doConnect(void)
/* check to see that the backend connection was successfully made */
if (PQstatus(conn) == CONNECTION_BAD)
{
- fprintf(stderr, "connection to database \"%s\" failed:\n%s",
- dbName, PQerrorMessage(conn));
+ ereport(ELEVEL_LOG,
+ (errmsg("connection to database \"%s\" failed:\n%s",
+ dbName, PQerrorMessage(conn))));
PQfinish(conn);
return NULL;
}
@@ -1312,9 +1419,9 @@ makeVariableValue(Variable *var)
if (sscanf(var->svalue, "%lf%c", &dv, &xs) != 1)
{
- fprintf(stderr,
- "malformed variable \"%s\" value: \"%s\"\n",
- var->name, var->svalue);
+ ereport(ELEVEL_LOG,
+ (errmsg("malformed variable \"%s\" value: \"%s\"\n",
+ var->name, var->svalue)));
return false;
}
setDoubleValue(&var->value, dv);
@@ -1359,10 +1466,12 @@ valid_variable_name(const char *name)
/*
* Lookup a variable by name, creating it if need be.
* Caller is expected to assign a value to the variable.
- * Returns NULL on failure (bad name).
+ * On failure (bad name): if this is a client run returns NULL; exits the
+ * program otherwise.
*/
static Variable *
-lookupCreateVariable(Variables *variables, const char *context, char *name)
+lookupCreateVariable(Variables *variables, const char *context, char *name,
+ bool client)
{
Variable *var;
@@ -1377,8 +1486,13 @@ lookupCreateVariable(Variables *variables, const char *context, char *name)
*/
if (!valid_variable_name(name))
{
- fprintf(stderr, "%s: invalid variable name: \"%s\"\n",
- context, name);
+ /*
+ * About the error level used: if we are processing client commands, this is
+ * a normal failure; otherwise it is not, and we exit the program.
+ */
+ ereport(client ? ELEVEL_LOG : ELEVEL_FATAL,
+ (errmsg("%s: invalid variable name: \"%s\"\n",
+ context, name)));
return NULL;
}
@@ -1406,17 +1520,15 @@ lookupCreateVariable(Variables *variables, const char *context, char *name)
}
/* Assign a string value to a variable, creating it if need be */
-/* Returns false on failure (bad name) */
-static bool
+/* Exits on failure (bad name) */
+static void
putVariable(Variables *variables, const char *context, char *name,
const char *value)
{
Variable *var;
char *val;
- var = lookupCreateVariable(variables, context, name);
- if (!var)
- return false;
+ var = lookupCreateVariable(variables, context, name, false);
/* dup then free, in case value is pointing at this variable */
val = pg_strdup(value);
@@ -1425,19 +1537,20 @@ putVariable(Variables *variables, const char *context, char *name,
free(var->svalue);
var->svalue = val;
var->value.type = PGBT_NO_VALUE;
-
- return true;
}
-/* Assign a value to a variable, creating it if need be */
-/* Returns false on failure (bad name) */
+/*
+ * Assign a value to a variable, creating it if need be.
+ * On failure (bad name): returns false if this is a client run; otherwise
+ * exits the program.
+ */
static bool
putVariableValue(Variables *variables, const char *context, char *name,
- const PgBenchValue *value)
+ const PgBenchValue *value, bool client)
{
Variable *var;
- var = lookupCreateVariable(variables, context, name);
+ var = lookupCreateVariable(variables, context, name, client);
if (!var)
return false;
@@ -1449,16 +1562,19 @@ putVariableValue(Variables *variables, const char *context, char *name,
return true;
}
-/* Assign an integer value to a variable, creating it if need be */
-/* Returns false on failure (bad name) */
+/*
+ * Assign an integer value to a variable, creating it if need be.
+ * On failure (bad name): returns false if this is a client run; otherwise
+ * exits the program.
+ */
static bool
putVariableInt(Variables *variables, const char *context, char *name,
- int64 value)
+ int64 value, bool client)
{
PgBenchValue val;
setIntValue(&val, value);
- return putVariableValue(variables, context, name, &val);
+ return putVariableValue(variables, context, name, &val, client);
}
/*
@@ -1590,7 +1706,8 @@ coerceToBool(PgBenchValue *pval, bool *bval)
}
else /* NULL, INT or DOUBLE */
{
- fprintf(stderr, "cannot coerce %s to boolean\n", valueTypeName(pval));
+ ereport(ELEVEL_LOG,
+ (errmsg("cannot coerce %s to boolean\n", valueTypeName(pval))));
*bval = false; /* suppress uninitialized-variable warnings */
return false;
}
@@ -1635,7 +1752,8 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
if (dval < PG_INT64_MIN || PG_INT64_MAX < dval)
{
- fprintf(stderr, "double to int overflow for %f\n", dval);
+ ereport(ELEVEL_LOG,
+ (errmsg("double to int overflow for %f\n", dval)));
return false;
}
*ival = (int64) dval;
@@ -1643,7 +1761,8 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
}
else /* BOOLEAN or NULL */
{
- fprintf(stderr, "cannot coerce %s to int\n", valueTypeName(pval));
+ ereport(ELEVEL_LOG,
+ (errmsg("cannot coerce %s to int\n", valueTypeName(pval))));
return false;
}
}
@@ -1664,7 +1783,8 @@ coerceToDouble(PgBenchValue *pval, double *dval)
}
else /* BOOLEAN or NULL */
{
- fprintf(stderr, "cannot coerce %s to double\n", valueTypeName(pval));
+ ereport(ELEVEL_LOG,
+ (errmsg("cannot coerce %s to double\n", valueTypeName(pval))));
return false;
}
}
@@ -1845,8 +1965,9 @@ evalStandardFunc(TState *thread, CState *st,
if (l != NULL)
{
- fprintf(stderr,
- "too many function arguments, maximum is %d\n", MAX_FARGS);
+ ereport(ELEVEL_LOG,
+ (errmsg("too many function arguments, maximum is %d\n",
+ MAX_FARGS)));
return false;
}
@@ -1969,7 +2090,8 @@ evalStandardFunc(TState *thread, CState *st,
case PGBENCH_MOD:
if (ri == 0)
{
- fprintf(stderr, "division by zero\n");
+ ereport(ELEVEL_LOG,
+ (errmsg("division by zero\n")));
return false;
}
/* special handling of -1 divisor */
@@ -1980,7 +2102,9 @@ evalStandardFunc(TState *thread, CState *st,
/* overflow check (needed for INT64_MIN) */
if (li == PG_INT64_MIN)
{
- fprintf(stderr, "bigint out of range\n");
+ ereport(ELEVEL_LOG,
+ (errmsg("bigint out of range\n")));
return false;
}
else
@@ -2081,22 +2205,42 @@ evalStandardFunc(TState *thread, CState *st,
case PGBENCH_DEBUG:
{
PgBenchValue *varg = &vargs[0];
+ PQExpBufferData errormsg_buf;
Assert(nargs == 1);
- fprintf(stderr, "debug(script=%d,command=%d): ",
- st->use_file, st->command + 1);
+ initPQExpBuffer(&errormsg_buf);
+ printfPQExpBuffer(&errormsg_buf,
+ "debug(script=%d,command=%d): ",
+ st->use_file, st->command + 1);
if (varg->type == PGBT_NULL)
- fprintf(stderr, "null\n");
+ {
+ appendPQExpBuffer(&errormsg_buf, "null\n");
+ }
else if (varg->type == PGBT_BOOLEAN)
- fprintf(stderr, "boolean %s\n", varg->u.bval ? "true" : "false");
+ {
+ appendPQExpBuffer(&errormsg_buf,
+ "boolean %s\n",
+ varg->u.bval ? "true" : "false");
+ }
else if (varg->type == PGBT_INT)
- fprintf(stderr, "int " INT64_FORMAT "\n", varg->u.ival);
+ {
+ appendPQExpBuffer(&errormsg_buf,
+ "int " INT64_FORMAT "\n", varg->u.ival);
+ }
else if (varg->type == PGBT_DOUBLE)
- fprintf(stderr, "double %.*g\n", DBL_DIG, varg->u.dval);
+ {
+ appendPQExpBuffer(&errormsg_buf,
+ "double %.*g\n", DBL_DIG, varg->u.dval);
+ }
else /* internal error, unexpected type */
+ {
Assert(0);
+ }
+
+ ereport(ELEVEL_LOG, (errmsg("%s", errormsg_buf.data)));
+ termPQExpBuffer(&errormsg_buf);
*retval = *varg;
@@ -2220,13 +2364,15 @@ evalStandardFunc(TState *thread, CState *st,
/* check random range */
if (imin > imax)
{
- fprintf(stderr, "empty range given to random\n");
+ ereport(ELEVEL_LOG,
+ (errmsg("empty range given to random\n")));
return false;
}
else if (imax - imin < 0 || (imax - imin) + 1 < 0)
{
/* prevent int overflows in random functions */
- fprintf(stderr, "random range is too large\n");
+ ereport(ELEVEL_LOG,
+ (errmsg("random range is too large\n")));
return false;
}
@@ -2248,9 +2394,9 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param < MIN_GAUSSIAN_PARAM)
{
- fprintf(stderr,
- "gaussian parameter must be at least %f "
- "(not %f)\n", MIN_GAUSSIAN_PARAM, param);
+ ereport(ELEVEL_LOG,
+ (errmsg("gaussian parameter must be at least %f (not %f)\n",
+ MIN_GAUSSIAN_PARAM, param)));
return false;
}
@@ -2262,9 +2408,9 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param <= 0.0 || param == 1.0 || param > MAX_ZIPFIAN_PARAM)
{
- fprintf(stderr,
- "zipfian parameter must be in range (0, 1) U (1, %d]"
- " (got %f)\n", MAX_ZIPFIAN_PARAM, param);
+ ereport(ELEVEL_LOG,
+ (errmsg("zipfian parameter must be in range (0, 1) U (1, %d] (got %f)\n",
+ MAX_ZIPFIAN_PARAM, param)));
return false;
}
setIntValue(retval,
@@ -2275,9 +2421,9 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param <= 0.0)
{
- fprintf(stderr,
- "exponential parameter must be greater than zero"
- " (got %f)\n", param);
+ ereport(ELEVEL_LOG,
+ (errmsg("exponential parameter must be greater than zero (got %f)\n",
+ param)));
return false;
}
@@ -2388,8 +2534,9 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
- fprintf(stderr, "undefined variable \"%s\"\n",
- expr->u.variable.varname);
+ ereport(ELEVEL_LOG,
+ (errmsg("undefined variable \"%s\"\n",
+ expr->u.variable.varname)));
return false;
}
@@ -2408,9 +2555,9 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
default:
/* internal error which should never occur */
- fprintf(stderr, "unexpected enode type in evaluation: %d\n",
- expr->etype);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("unexpected enode type in evaluation: %d\n",
+ expr->etype)));
}
}
@@ -2483,15 +2630,17 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
}
else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
- fprintf(stderr, "%s: undefined variable \"%s\"\n",
- argv[0], argv[i]);
+ ereport(ELEVEL_LOG,
+ (errmsg("%s: undefined variable \"%s\"\n",
+ argv[0], argv[i])));
return false;
}
arglen = strlen(arg);
if (len + arglen + (i > 0 ? 1 : 0) >= SHELL_COMMAND_SIZE - 1)
{
- fprintf(stderr, "%s: shell command is too long\n", argv[0]);
+ ereport(ELEVEL_LOG,
+ (errmsg("%s: shell command is too long\n", argv[0])));
return false;
}
@@ -2509,7 +2658,11 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
if (system(command))
{
if (!timer_exceeded)
- fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
+ {
+ ereport(ELEVEL_LOG,
+ (errmsg("%s: could not launch shell command\n",
+ argv[0])));
+ }
return false;
}
return true;
@@ -2518,19 +2671,25 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
/* Execute the command with pipe and read the standard output. */
if ((fp = popen(command, "r")) == NULL)
{
- fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
+ ereport(ELEVEL_LOG,
+ (errmsg("%s: could not launch shell command\n", argv[0])));
return false;
}
if (fgets(res, sizeof(res), fp) == NULL)
{
if (!timer_exceeded)
- fprintf(stderr, "%s: could not read result of shell command\n", argv[0]);
+ {
+ ereport(ELEVEL_LOG,
+ (errmsg("%s: could not read result of shell command\n",
+ argv[0])));
+ }
(void) pclose(fp);
return false;
}
if (pclose(fp) < 0)
{
- fprintf(stderr, "%s: could not close shell command\n", argv[0]);
+ ereport(ELEVEL_LOG,
+ (errmsg("%s: could not close shell command\n", argv[0])));
return false;
}
@@ -2540,11 +2699,12 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
endptr++;
if (*res == '\0' || *endptr != '\0')
{
- fprintf(stderr, "%s: shell command must return an integer (not \"%s\")\n",
- argv[0], res);
+ ereport(ELEVEL_LOG,
+ (errmsg("%s: shell command must return an integer (not \"%s\")\n",
+ argv[0], res)));
return false;
}
- if (!putVariableInt(variables, "setshell", variable, retval))
+ if (!putVariableInt(variables, "setshell", variable, retval, true))
return false;
#ifdef DEBUG
@@ -2563,9 +2723,9 @@ preparedStatementName(char *buffer, int file, int state)
static void
commandFailed(CState *st, const char *cmd, const char *message)
{
- fprintf(stderr,
- "client %d aborted in command %d (%s) of script %d; %s\n",
- st->id, st->command, cmd, st->use_file, message);
+ ereport(ELEVEL_LOG,
+ (errmsg("client %d aborted in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file, message)));
}
/* return a script number with a weighted choice. */
@@ -2600,8 +2760,7 @@ sendCommand(CState *st, Command *command)
sql = pg_strdup(command->argv[0]);
sql = assignVariables(&st->variables, sql);
- if (debug)
- fprintf(stderr, "client %d sending %s\n", st->id, sql);
+ ereport(ELEVEL_DEBUG, (errmsg("client %d sending %s\n", st->id, sql)));
r = PQsendQuery(st->con, sql);
free(sql);
}
@@ -2612,8 +2771,7 @@ sendCommand(CState *st, Command *command)
getQueryParams(&st->variables, command, params);
- if (debug)
- fprintf(stderr, "client %d sending %s\n", st->id, sql);
+ ereport(ELEVEL_DEBUG, (errmsg("client %d sending %s\n", st->id, sql)));
r = PQsendQueryParams(st->con, sql, command->argc - 1,
NULL, params, NULL, NULL, 0);
}
@@ -2638,7 +2796,10 @@ sendCommand(CState *st, Command *command)
res = PQprepare(st->con, name,
commands[j]->argv[0], commands[j]->argc - 1, NULL);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
- fprintf(stderr, "%s", PQerrorMessage(st->con));
+ {
+ ereport(ELEVEL_LOG,
+ (errmsg("%s", PQerrorMessage(st->con))));
+ }
PQclear(res);
}
st->prepared[st->use_file] = true;
@@ -2647,8 +2808,7 @@ sendCommand(CState *st, Command *command)
getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
- if (debug)
- fprintf(stderr, "client %d sending %s\n", st->id, name);
+ ereport(ELEVEL_DEBUG, (errmsg("client %d sending %s\n", st->id, name)));
r = PQsendQueryPrepared(st->con, name, command->argc - 1,
params, NULL, NULL, 0);
}
@@ -2657,9 +2817,9 @@ sendCommand(CState *st, Command *command)
if (r == 0)
{
- if (debug)
- fprintf(stderr, "client %d could not send %s\n",
- st->id, command->argv[0]);
+ ereport(ELEVEL_DEBUG,
+ (errmsg("client %d could not send %s\n",
+ st->id, command->argv[0])));
st->ecnt++;
return false;
}
@@ -2681,8 +2841,9 @@ evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
- fprintf(stderr, "%s: undefined variable \"%s\"\n",
- argv[0], argv[1]);
+ ereport(ELEVEL_LOG,
+ (errmsg("%s: undefined variable \"%s\"\n",
+ argv[0], argv[1])));
return false;
}
usec = atoi(var);
@@ -2745,9 +2906,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->use_file = chooseScript(thread);
- if (debug)
- fprintf(stderr, "client %d executing script \"%s\"\n", st->id,
- sql_script[st->use_file].desc);
+ ereport(ELEVEL_DEBUG,
+ (errmsg("client %d executing script \"%s\"\n",
+ st->id, sql_script[st->use_file].desc)));
if (throttle_delay > 0)
st->state = CSTATE_START_THROTTLE;
@@ -2820,9 +2981,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
st->state = CSTATE_THROTTLE;
- if (debug)
- fprintf(stderr, "client %d throttling " INT64_FORMAT " us\n",
- st->id, wait);
+ ereport(ELEVEL_DEBUG,
+ (errmsg("client %d throttling " INT64_FORMAT " us\n",
+ st->id, wait)));
break;
/*
@@ -2854,8 +3015,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
start = now;
if ((st->con = doConnect()) == NULL)
{
- fprintf(stderr, "client %d aborted while establishing connection\n",
- st->id);
+ ereport(ELEVEL_LOG,
+ (errmsg("client %d aborted while establishing connection\n",
+ st->id)));
st->state = CSTATE_ABORTED;
break;
}
@@ -2932,14 +3094,16 @@ doCustom(TState *thread, CState *st, StatsData *agg)
int argc = command->argc,
i;
char **argv = command->argv;
+ PQExpBufferData errmsg_buf;
- if (debug)
- {
- fprintf(stderr, "client %d executing \\%s", st->id, argv[0]);
- for (i = 1; i < argc; i++)
- fprintf(stderr, " %s", argv[i]);
- fprintf(stderr, "\n");
- }
+ initPQExpBuffer(&errmsg_buf);
+ printfPQExpBuffer(&errmsg_buf, "client %d executing \\%s",
+ st->id, argv[0]);
+ for (i = 1; i < argc; i++)
+ appendPQExpBuffer(&errmsg_buf, " %s", argv[i]);
+ appendPQExpBufferChar(&errmsg_buf, '\n');
+ ereport(ELEVEL_DEBUG, (errmsg("%s", errmsg_buf.data)));
+ termPQExpBuffer(&errmsg_buf);
if (command->meta == META_SLEEP)
{
@@ -2994,7 +3158,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (command->meta == META_SET)
{
if (!putVariableValue(&st->variables, argv[0],
- argv[1], &result))
+ argv[1], &result, true))
{
commandFailed(st, "set", "assignment of meta-command failed");
st->state = CSTATE_ABORTED;
@@ -3197,8 +3361,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
case CSTATE_WAIT_RESULT:
command = sql_script[st->use_file].commands[st->command];
- if (debug)
- fprintf(stderr, "client %d receiving\n", st->id);
+ ereport(ELEVEL_DEBUG,
+ (errmsg("client %d receiving\n", st->id)));
if (!PQconsumeInput(st->con))
{ /* there's something wrong */
commandFailed(st, "SQL", "perhaps the backend died while processing");
@@ -3284,8 +3448,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
/* conditional stack must be empty */
if (!conditional_stack_empty(st->cstack))
{
- fprintf(stderr, "end of script reached within a conditional, missing \\endif\n");
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("end of script reached within a conditional, missing \\endif\n")));
}
if (is_connect)
@@ -3484,7 +3648,7 @@ disconnect_all(CState *state, int length)
static void
initDropTables(PGconn *con)
{
- fprintf(stderr, "dropping old tables...\n");
+ ereport(ELEVEL_LOG, (errmsg("dropping old tables...\n")));
/*
* We drop all the tables in one command, so that whether there are
@@ -3559,7 +3723,7 @@ initCreateTables(PGconn *con)
};
int i;
- fprintf(stderr, "creating tables...\n");
+ ereport(ELEVEL_LOG, (errmsg("creating tables...\n")));
for (i = 0; i < lengthof(DDLs); i++)
{
@@ -3612,7 +3776,7 @@ initGenerateData(PGconn *con)
remaining_sec;
int log_interval = 1;
- fprintf(stderr, "generating data...\n");
+ ereport(ELEVEL_LOG, (errmsg("generating data...\n")));
/*
* we do all of this in one transaction to enable the backend's
@@ -3657,10 +3821,7 @@ initGenerateData(PGconn *con)
*/
res = PQexec(con, "copy pgbench_accounts from stdin");
if (PQresultStatus(res) != PGRES_COPY_IN)
- {
- fprintf(stderr, "%s", PQerrorMessage(con));
- exit(1);
- }
+ ereport(ELEVEL_FATAL, (errmsg("%s", PQerrorMessage(con))));
PQclear(res);
INSTR_TIME_SET_CURRENT(start);
@@ -3674,10 +3835,7 @@ initGenerateData(PGconn *con)
INT64_FORMAT "\t" INT64_FORMAT "\t%d\t\n",
j, k / naccounts + 1, 0);
if (PQputline(con, sql))
- {
- fprintf(stderr, "PQputline failed\n");
- exit(1);
- }
+ ereport(ELEVEL_FATAL, (errmsg("PQputline failed\n")));
/*
* If we want to stick with the original logging, print a message each
@@ -3691,10 +3849,12 @@ initGenerateData(PGconn *con)
elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
remaining_sec = ((double) scale * naccounts - j) * elapsed_sec / j;
- fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n",
- j, (int64) naccounts * scale,
- (int) (((int64) j * 100) / (naccounts * (int64) scale)),
- elapsed_sec, remaining_sec);
+ ereport(ELEVEL_LOG,
+ (errmsg(INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n",
+ j, (int64) naccounts * scale,
+ (int) (((int64) j * 100) /
+ (naccounts * (int64) scale)),
+ elapsed_sec, remaining_sec)));
}
/* let's not call the timing for each row, but only each 100 rows */
else if (use_quiet && (j % 100 == 0))
@@ -3708,9 +3868,12 @@ initGenerateData(PGconn *con)
/* have we reached the next interval (or end)? */
if ((j == scale * naccounts) || (elapsed_sec >= log_interval * LOG_STEP_SECONDS))
{
- fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n",
- j, (int64) naccounts * scale,
- (int) (((int64) j * 100) / (naccounts * (int64) scale)), elapsed_sec, remaining_sec);
+ ereport(ELEVEL_LOG,
+ (errmsg(INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n",
+ j, (int64) naccounts * scale,
+ (int) (((int64) j * 100) /
+ (naccounts * (int64) scale)),
+ elapsed_sec, remaining_sec)));
/* skip to the next interval */
log_interval = (int) ceil(elapsed_sec / LOG_STEP_SECONDS);
@@ -3719,15 +3882,9 @@ initGenerateData(PGconn *con)
}
if (PQputline(con, "\\.\n"))
- {
- fprintf(stderr, "very last PQputline failed\n");
- exit(1);
- }
+ ereport(ELEVEL_FATAL, (errmsg("very last PQputline failed\n")));
if (PQendcopy(con))
- {
- fprintf(stderr, "PQendcopy failed\n");
- exit(1);
- }
+ ereport(ELEVEL_FATAL, (errmsg("PQendcopy failed\n")));
executeStatement(con, "commit");
}
@@ -3738,7 +3895,7 @@ initGenerateData(PGconn *con)
static void
initVacuum(PGconn *con)
{
- fprintf(stderr, "vacuuming...\n");
+ ereport(ELEVEL_LOG, (errmsg("vacuuming...\n")));
executeStatement(con, "vacuum analyze pgbench_branches");
executeStatement(con, "vacuum analyze pgbench_tellers");
executeStatement(con, "vacuum analyze pgbench_accounts");
@@ -3758,7 +3915,7 @@ initCreatePKeys(PGconn *con)
};
int i;
- fprintf(stderr, "creating primary keys...\n");
+ ereport(ELEVEL_LOG, (errmsg("creating primary keys...\n")));
for (i = 0; i < lengthof(DDLINDEXes); i++)
{
char buffer[256];
@@ -3795,7 +3952,7 @@ initCreateFKeys(PGconn *con)
};
int i;
- fprintf(stderr, "creating foreign keys...\n");
+ ereport(ELEVEL_LOG, (errmsg("creating foreign keys...\n")));
for (i = 0; i < lengthof(DDLKEYs); i++)
{
executeStatement(con, DDLKEYs[i]);
@@ -3815,19 +3972,16 @@ checkInitSteps(const char *initialize_steps)
const char *step;
if (initialize_steps[0] == '\0')
- {
- fprintf(stderr, "no initialization steps specified\n");
- exit(1);
- }
+ ereport(ELEVEL_FATAL, (errmsg("no initialization steps specified\n")));
for (step = initialize_steps; *step != '\0'; step++)
{
if (strchr("dtgvpf ", *step) == NULL)
{
- fprintf(stderr, "unrecognized initialization step \"%c\"\n",
- *step);
- fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n");
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("unrecognized initialization step \"%c\"\n"
+ "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n",
+ *step)));
}
}
}
@@ -3869,14 +4023,15 @@ runInitSteps(const char *initialize_steps)
case ' ':
break; /* ignore */
default:
- fprintf(stderr, "unrecognized initialization step \"%c\"\n",
- *step);
+ ereport(ELEVEL_LOG,
+ (errmsg("unrecognized initialization step \"%c\"\n",
+ *step)));
PQfinish(con);
exit(1);
}
}
- fprintf(stderr, "done.\n");
+ ereport(ELEVEL_LOG, (errmsg("done.\n")));
PQfinish(con);
}
@@ -3914,8 +4069,9 @@ parseQuery(Command *cmd)
if (cmd->argc >= MAX_ARGS)
{
- fprintf(stderr, "statement has too many arguments (maximum is %d): %s\n",
- MAX_ARGS - 1, cmd->argv[0]);
+ ereport(ELEVEL_LOG,
+ (errmsg("statement has too many arguments (maximum is %d): %s\n",
+ MAX_ARGS - 1, cmd->argv[0])));
pg_free(name);
return false;
}
@@ -3939,11 +4095,22 @@ static void
pgbench_error(const char *fmt,...)
{
va_list ap;
+ PQExpBufferData errmsg_buf;
+ bool done;
fflush(stdout);
- va_start(ap, fmt);
- vfprintf(stderr, _(fmt), ap);
- va_end(ap);
+ initPQExpBuffer(&errmsg_buf);
+
+ /* Loop in case we have to retry after enlarging the buffer. */
+ do
+ {
+ va_start(ap, fmt);
+ done = appendPQExpBufferVA(&errmsg_buf, fmt, ap);
+ va_end(ap);
+ } while (!done);
+
+ ereport(ELEVEL_LOG, (errmsg("%s", errmsg_buf.data)));
+ termPQExpBuffer(&errmsg_buf);
}
/*
@@ -3963,26 +4130,32 @@ syntax_error(const char *source, int lineno,
const char *line, const char *command,
const char *msg, const char *more, int column)
{
- fprintf(stderr, "%s:%d: %s", source, lineno, msg);
+ PQExpBufferData errmsg_buf;
+
+ initPQExpBuffer(&errmsg_buf);
+ printfPQExpBuffer(&errmsg_buf, "%s:%d: %s", source, lineno, msg);
if (more != NULL)
- fprintf(stderr, " (%s)", more);
+ appendPQExpBuffer(&errmsg_buf, " (%s)", more);
if (column >= 0 && line == NULL)
- fprintf(stderr, " at column %d", column + 1);
+ appendPQExpBuffer(&errmsg_buf, " at column %d", column + 1);
if (command != NULL)
- fprintf(stderr, " in command \"%s\"", command);
- fprintf(stderr, "\n");
+ appendPQExpBuffer(&errmsg_buf, " in command \"%s\"", command);
+ appendPQExpBufferChar(&errmsg_buf, '\n');
if (line != NULL)
{
- fprintf(stderr, "%s\n", line);
+ appendPQExpBuffer(&errmsg_buf, "%s\n", line);
if (column >= 0)
{
int i;
for (i = 0; i < column; i++)
- fprintf(stderr, " ");
- fprintf(stderr, "^ error found here\n");
+ appendPQExpBufferChar(&errmsg_buf, ' ');
+ appendPQExpBufferStr(&errmsg_buf, "^ error found here\n");
}
}
+
+ ereport(ELEVEL_LOG, (errmsg("%s", errmsg_buf.data)));
+ termPQExpBuffer(&errmsg_buf);
exit(1);
}
@@ -4232,10 +4405,9 @@ process_backslash_command(PsqlScanState sstate, const char *source)
static void
ConditionError(const char *desc, int cmdn, const char *msg)
{
- fprintf(stderr,
- "condition error in script \"%s\" command %d: %s\n",
- desc, cmdn, msg);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("condition error in script \"%s\" command %d: %s\n",
+ desc, cmdn, msg)));
}
/*
@@ -4434,18 +4606,18 @@ process_file(const char *filename, int weight)
fd = stdin;
else if ((fd = fopen(filename, "r")) == NULL)
{
- fprintf(stderr, "could not open file \"%s\": %s\n",
- filename, strerror(errno));
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("could not open file \"%s\": %s\n",
+ filename, strerror(errno))));
}
buf = read_file_contents(fd);
if (ferror(fd))
{
- fprintf(stderr, "could not read file \"%s\": %s\n",
- filename, strerror(errno));
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("could not read file \"%s\": %s\n",
+ filename, strerror(errno))));
}
if (fd != stdin)
@@ -4468,11 +4640,16 @@ static void
listAvailableScripts(void)
{
int i;
+ PQExpBufferData errmsg_buf;
- fprintf(stderr, "Available builtin scripts:\n");
+ initPQExpBuffer(&errmsg_buf);
+ printfPQExpBuffer(&errmsg_buf, "Available builtin scripts:\n");
for (i = 0; i < lengthof(builtin_script); i++)
- fprintf(stderr, "\t%s\n", builtin_script[i].name);
- fprintf(stderr, "\n");
+ appendPQExpBuffer(&errmsg_buf, "\t%s\n", builtin_script[i].name);
+ appendPQExpBufferChar(&errmsg_buf, '\n');
+
+ ereport(ELEVEL_LOG, (errmsg("%s", errmsg_buf.data)));
+ termPQExpBuffer(&errmsg_buf);
}
/* return builtin script "name" if unambiguous, fails if not found */
@@ -4499,10 +4676,16 @@ findBuiltin(const char *name)
/* error cases */
if (found == 0)
- fprintf(stderr, "no builtin script found for name \"%s\"\n", name);
- else /* found > 1 */
- fprintf(stderr,
- "ambiguous builtin name: %d builtin scripts found for prefix \"%s\"\n", found, name);
+ {
+ ereport(ELEVEL_LOG,
+ (errmsg("no builtin script found for name \"%s\"\n", name)));
+ }
+ else
+ { /* found > 1 */
+ ereport(ELEVEL_LOG,
+ (errmsg("ambiguous builtin name: %d builtin scripts found for prefix \"%s\"\n",
+ found, name)));
+ }
listAvailableScripts();
exit(1);
@@ -4535,15 +4718,14 @@ parseScriptWeight(const char *option, char **script)
wtmp = strtol(sep + 1, &badp, 10);
if (errno != 0 || badp == sep + 1 || *badp != '\0')
{
- fprintf(stderr, "invalid weight specification: %s\n", sep);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("invalid weight specification: %s\n", sep)));
}
if (wtmp > INT_MAX || wtmp < 0)
{
- fprintf(stderr,
- "weight specification out of range (0 .. %u): " INT64_FORMAT "\n",
- INT_MAX, (int64) wtmp);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("weight specification out of range (0 .. %u): " INT64_FORMAT "\n",
+ INT_MAX, (int64) wtmp)));
}
weight = wtmp;
}
@@ -4562,14 +4744,15 @@ addScript(ParsedScript script)
{
if (script.commands == NULL || script.commands[0] == NULL)
{
- fprintf(stderr, "empty command list for script \"%s\"\n", script.desc);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("empty command list for script \"%s\"\n",
+ script.desc)));
}
if (num_scripts >= MAX_SCRIPTS)
{
- fprintf(stderr, "at most %d SQL scripts are allowed\n", MAX_SCRIPTS);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("at most %d SQL scripts are allowed\n", MAX_SCRIPTS)));
}
CheckConditional(script);
@@ -4754,9 +4937,8 @@ set_random_seed(const char *seed)
if (!pg_strong_random(&iseed, sizeof(iseed)))
#endif
{
- fprintf(stderr,
- "cannot seed random from a strong source, none available: "
- "use \"time\" or an unsigned integer value.\n");
+ ereport(ELEVEL_LOG,
+ (errmsg("cannot seed random from a strong source, none available: use \"time\" or an unsigned integer value.\n")));
return false;
}
}
@@ -4767,15 +4949,18 @@ set_random_seed(const char *seed)
if (sscanf(seed, "%u%c", &iseed, &garbage) != 1)
{
- fprintf(stderr,
- "unrecognized random seed option \"%s\": expecting an unsigned integer, \"time\" or \"rand\"\n",
- seed);
+ ereport(ELEVEL_LOG,
+ (errmsg("unrecognized random seed option \"%s\": expecting an unsigned integer, \"time\" or \"rand\"\n",
+ seed)));
return false;
}
}
if (seed != NULL)
- fprintf(stderr, "setting random seed to %u\n", iseed);
+ {
+ ereport(ELEVEL_LOG,
+ (errmsg("setting random seed to %u\n", iseed)));
+ }
srandom(iseed);
/* no precision loss: 32 bit unsigned int cast to 64 bit int */
random_seed = iseed;
@@ -4907,8 +5092,8 @@ main(int argc, char **argv)
/* set random seed early, because it may be used while parsing scripts. */
if (!set_random_seed(getenv("PGBENCH_RANDOM_SEED")))
{
- fprintf(stderr, "error while setting random seed from PGBENCH_RANDOM_SEED environment variable\n");
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("error while setting random seed from PGBENCH_RANDOM_SEED environment variable\n")));
}
while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
@@ -4948,9 +5133,9 @@ main(int argc, char **argv)
nclients = atoi(optarg);
if (nclients <= 0 || nclients > MAXCLIENTS)
{
- fprintf(stderr, "invalid number of clients: \"%s\"\n",
- optarg);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("invalid number of clients: \"%s\"\n",
+ optarg)));
}
#ifdef HAVE_GETRLIMIT
#ifdef RLIMIT_NOFILE /* most platforms use RLIMIT_NOFILE */
@@ -4959,15 +5144,16 @@ main(int argc, char **argv)
if (getrlimit(RLIMIT_OFILE, &rlim) == -1)
#endif /* RLIMIT_NOFILE */
{
- fprintf(stderr, "getrlimit failed: %s\n", strerror(errno));
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("getrlimit failed: %s\n",
+ strerror(errno))));
}
if (rlim.rlim_cur < nclients + 3)
{
- fprintf(stderr, "need at least %d open files, but system limit is %ld\n",
- nclients + 3, (long) rlim.rlim_cur);
- fprintf(stderr, "Reduce number of clients, or use limit/ulimit to increase the system limit.\n");
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("need at least %d open files, but system limit is %ld\n"
+ "Reduce number of clients, or use limit/ulimit to increase the system limit.\n",
+ nclients + 3, (long) rlim.rlim_cur)));
}
#endif /* HAVE_GETRLIMIT */
break;
@@ -4976,15 +5162,15 @@ main(int argc, char **argv)
nthreads = atoi(optarg);
if (nthreads <= 0)
{
- fprintf(stderr, "invalid number of threads: \"%s\"\n",
- optarg);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("invalid number of threads: \"%s\"\n",
+ optarg)));
}
#ifndef ENABLE_THREAD_SAFETY
if (nthreads != 1)
{
- fprintf(stderr, "threads are not supported on this platform; use -j1\n");
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("threads are not supported on this platform; use -j1\n")));
}
#endif /* !ENABLE_THREAD_SAFETY */
break;
@@ -5001,8 +5187,9 @@ main(int argc, char **argv)
scale = atoi(optarg);
if (scale <= 0)
{
- fprintf(stderr, "invalid scaling factor: \"%s\"\n", optarg);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("invalid scaling factor: \"%s\"\n",
+ optarg)));
}
break;
case 't':
@@ -5010,9 +5197,9 @@ main(int argc, char **argv)
nxacts = atoi(optarg);
if (nxacts <= 0)
{
- fprintf(stderr, "invalid number of transactions: \"%s\"\n",
- optarg);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("invalid number of transactions: \"%s\"\n",
+ optarg)));
}
break;
case 'T':
@@ -5020,8 +5207,8 @@ main(int argc, char **argv)
duration = atoi(optarg);
if (duration <= 0)
{
- fprintf(stderr, "invalid duration: \"%s\"\n", optarg);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("invalid duration: \"%s\"\n", optarg)));
}
break;
case 'U':
@@ -5069,14 +5256,13 @@ main(int argc, char **argv)
if ((p = strchr(optarg, '=')) == NULL || p == optarg || *(p + 1) == '\0')
{
- fprintf(stderr, "invalid variable definition: \"%s\"\n",
- optarg);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("invalid variable definition: \"%s\"\n",
+ optarg)));
}
*p++ = '\0';
- if (!putVariable(&state[0].variables, "option", optarg, p))
- exit(1);
+ putVariable(&state[0].variables, "option", optarg, p);
}
break;
case 'F':
@@ -5084,8 +5270,8 @@ main(int argc, char **argv)
fillfactor = atoi(optarg);
if (fillfactor < 10 || fillfactor > 100)
{
- fprintf(stderr, "invalid fillfactor: \"%s\"\n", optarg);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("invalid fillfactor: \"%s\"\n", optarg)));
}
break;
case 'M':
@@ -5095,9 +5281,9 @@ main(int argc, char **argv)
break;
if (querymode >= NUM_QUERYMODE)
{
- fprintf(stderr, "invalid query mode (-M): \"%s\"\n",
- optarg);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("invalid query mode (-M): \"%s\"\n",
+ optarg)));
}
break;
case 'P':
@@ -5105,9 +5291,9 @@ main(int argc, char **argv)
progress = atoi(optarg);
if (progress <= 0)
{
- fprintf(stderr, "invalid thread progress delay: \"%s\"\n",
- optarg);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("invalid thread progress delay: \"%s\"\n",
+ optarg)));
}
break;
case 'R':
@@ -5119,8 +5305,9 @@ main(int argc, char **argv)
if (throttle_value <= 0.0)
{
- fprintf(stderr, "invalid rate limit: \"%s\"\n", optarg);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("invalid rate limit: \"%s\"\n",
+ optarg)));
}
/* Invert rate limit into a time offset */
throttle_delay = (int64) (1000000.0 / throttle_value);
@@ -5132,9 +5319,9 @@ main(int argc, char **argv)
if (limit_ms <= 0.0)
{
- fprintf(stderr, "invalid latency limit: \"%s\"\n",
- optarg);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("invalid latency limit: \"%s\"\n",
+ optarg)));
}
benchmarking_option_set = true;
latency_limit = (int64) (limit_ms * 1000);
@@ -5157,8 +5344,9 @@ main(int argc, char **argv)
sample_rate = atof(optarg);
if (sample_rate <= 0.0 || sample_rate > 1.0)
{
- fprintf(stderr, "invalid sampling rate: \"%s\"\n", optarg);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("invalid sampling rate: \"%s\"\n",
+ optarg)));
}
break;
case 5: /* aggregate-interval */
@@ -5166,9 +5354,9 @@ main(int argc, char **argv)
agg_interval = atoi(optarg);
if (agg_interval <= 0)
{
- fprintf(stderr, "invalid number of seconds for aggregation: \"%s\"\n",
- optarg);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("invalid number of seconds for aggregation: \"%s\"\n",
+ optarg)));
}
break;
case 6: /* progress-timestamp */
@@ -5187,13 +5375,14 @@ main(int argc, char **argv)
benchmarking_option_set = true;
if (!set_random_seed(optarg))
{
- fprintf(stderr, "error while setting random seed from --random-seed option\n");
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("error while setting random seed from --random-seed option\n")));
}
break;
default:
- fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg(_("Try \"%s --help\" for more information.\n"),
+ progname)));
break;
}
}
@@ -5231,8 +5420,8 @@ main(int argc, char **argv)
if (total_weight == 0 && !is_init_mode)
{
- fprintf(stderr, "total script weight must not be zero\n");
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("total script weight must not be zero\n")));
}
/* show per script stats if several scripts are used */
@@ -5266,8 +5455,8 @@ main(int argc, char **argv)
{
if (benchmarking_option_set)
{
- fprintf(stderr, "some of the specified options cannot be used in initialization (-i) mode\n");
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("some of the specified options cannot be used in initialization (-i) mode\n")));
}
if (initialize_steps == NULL)
@@ -5301,15 +5490,15 @@ main(int argc, char **argv)
{
if (initialization_option_set)
{
- fprintf(stderr, "some of the specified options cannot be used in benchmarking mode\n");
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("some of the specified options cannot be used in benchmarking mode\n")));
}
}
if (nxacts > 0 && duration > 0)
{
- fprintf(stderr, "specify either a number of transactions (-t) or a duration (-T), not both\n");
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("specify either a number of transactions (-t) or a duration (-T), not both\n")));
}
/* Use DEFAULT_NXACTS if neither nxacts nor duration is specified. */
@@ -5319,45 +5508,47 @@ main(int argc, char **argv)
/* --sampling-rate may be used only with -l */
if (sample_rate > 0.0 && !use_log)
{
- fprintf(stderr, "log sampling (--sampling-rate) is allowed only when logging transactions (-l)\n");
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("log sampling (--sampling-rate) is allowed only when logging transactions (-l)\n")));
}
/* --sampling-rate may not be used with --aggregate-interval */
if (sample_rate > 0.0 && agg_interval > 0)
{
- fprintf(stderr, "log sampling (--sampling-rate) and aggregation (--aggregate-interval) cannot be used at the same time\n");
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("log sampling (--sampling-rate) and aggregation (--aggregate-interval) cannot be used at the same time\n")));
}
if (agg_interval > 0 && !use_log)
{
- fprintf(stderr, "log aggregation is allowed only when actually logging transactions\n");
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("log aggregation is allowed only when actually logging transactions\n")));
}
if (!use_log && logfile_prefix)
{
- fprintf(stderr, "log file prefix (--log-prefix) is allowed only when logging transactions (-l)\n");
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("log file prefix (--log-prefix) is allowed only when logging transactions (-l)\n")));
}
if (duration > 0 && agg_interval > duration)
{
- fprintf(stderr, "number of seconds for aggregation (%d) must not be higher than test duration (%d)\n", agg_interval, duration);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("number of seconds for aggregation (%d) must not be higher than test duration (%d)\n",
+ agg_interval, duration)));
}
if (duration > 0 && agg_interval > 0 && duration % agg_interval != 0)
{
- fprintf(stderr, "duration (%d) must be a multiple of aggregation interval (%d)\n", duration, agg_interval);
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("duration (%d) must be a multiple of aggregation interval (%d)\n",
+ duration, agg_interval)));
}
if (progress_timestamp && progress == 0)
{
- fprintf(stderr, "--progress-timestamp is allowed only under --progress\n");
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("--progress-timestamp is allowed only under --progress\n")));
}
/*
@@ -5383,15 +5574,13 @@ main(int argc, char **argv)
if (var->value.type != PGBT_NO_VALUE)
{
- if (!putVariableValue(&state[i].variables, "startup",
- var->name, &var->value))
- exit(1);
+ putVariableValue(&state[i].variables, "startup", var->name,
+ &var->value, false);
}
else
{
- if (!putVariable(&state[i].variables, "startup",
- var->name, var->svalue))
- exit(1);
+ putVariable(&state[i].variables, "startup", var->name,
+ var->svalue);
}
}
}
@@ -5404,15 +5593,14 @@ main(int argc, char **argv)
initRandomState(&state[i].random_state);
}
- if (debug)
- {
- if (duration <= 0)
- printf("pghost: %s pgport: %s nclients: %d nxacts: %d dbName: %s\n",
- pghost, pgport, nclients, nxacts, dbName);
- else
- printf("pghost: %s pgport: %s nclients: %d duration: %d dbName: %s\n",
- pghost, pgport, nclients, duration, dbName);
- }
+ if (duration <= 0)
+ ereport(ELEVEL_DEBUG,
+ (errmsg("pghost: %s pgport: %s nclients: %d nxacts: %d dbName: %s\n",
+ pghost, pgport, nclients, nxacts, dbName)));
+ else
+ ereport(ELEVEL_DEBUG,
+ (errmsg("pghost: %s pgport: %s nclients: %d duration: %d dbName: %s\n",
+ pghost, pgport, nclients, duration, dbName)));
/* opening connection... */
con = doConnect();
@@ -5421,9 +5609,9 @@ main(int argc, char **argv)
if (PQstatus(con) == CONNECTION_BAD)
{
- fprintf(stderr, "connection to database \"%s\" failed\n", dbName);
- fprintf(stderr, "%s", PQerrorMessage(con));
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("connection to database \"%s\" failed\n%s",
+ dbName, PQerrorMessage(con))));
}
if (internal_script_used)
@@ -5436,29 +5624,35 @@ main(int argc, char **argv)
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
char *sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+ PQExpBufferData errmsg_buf;
- fprintf(stderr, "%s", PQerrorMessage(con));
+ initPQExpBuffer(&errmsg_buf);
+ printfPQExpBuffer(&errmsg_buf, "%s", PQerrorMessage(con));
if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) == 0)
{
- fprintf(stderr, "Perhaps you need to do initialization (\"pgbench -i\") in database \"%s\"\n", PQdb(con));
+ appendPQExpBuffer(&errmsg_buf,
+ "Perhaps you need to do initialization (\"pgbench -i\") in database \"%s\"\n",
+ PQdb(con));
}
+ ereport(ELEVEL_LOG, (errmsg("%s", errmsg_buf.data)));
+ termPQExpBuffer(&errmsg_buf);
exit(1);
}
scale = atoi(PQgetvalue(res, 0, 0));
if (scale < 0)
{
- fprintf(stderr, "invalid count(*) from pgbench_branches: \"%s\"\n",
- PQgetvalue(res, 0, 0));
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("invalid count(*) from pgbench_branches: \"%s\"\n",
+ PQgetvalue(res, 0, 0))));
}
PQclear(res);
/* warn if we override user-given -s switch */
if (scale_given)
- fprintf(stderr,
- "scale option ignored, using count from pgbench_branches table (%d)\n",
- scale);
+ ereport(ELEVEL_LOG,
+ (errmsg("scale option ignored, using count from pgbench_branches table (%d)\n",
+ scale)));
}
/*
@@ -5469,8 +5663,8 @@ main(int argc, char **argv)
{
for (i = 0; i < nclients; i++)
{
- if (!putVariableInt(&state[i].variables, "startup", "scale", scale))
- exit(1);
+ putVariableInt(&state[i].variables, "startup", "scale", scale,
+ false);
}
}
@@ -5481,8 +5675,8 @@ main(int argc, char **argv)
if (lookupVariable(&state[0].variables, "client_id") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i].variables, "startup", "client_id", i))
- exit(1);
+ putVariableInt(&state[i].variables, "startup", "client_id", i,
+ false);
}
/* set default seed for hash functions */
@@ -5494,33 +5688,32 @@ main(int argc, char **argv)
(uint64) (random() & 0xFFFF);
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i].variables, "startup", "default_seed",
- (int64) seed))
- exit(1);
+ putVariableInt(&state[i].variables, "startup", "default_seed",
+ (int64) seed, false);
}
/* set random seed unless overwritten */
if (lookupVariable(&state[0].variables, "random_seed") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i].variables, "startup", "random_seed",
- random_seed))
- exit(1);
+ putVariableInt(&state[i].variables, "startup", "random_seed",
+ random_seed, false);
}
if (!is_no_vacuum)
{
- fprintf(stderr, "starting vacuum...");
+ ereport(ELEVEL_LOG, (errmsg("starting vacuum...")));
tryExecuteStatement(con, "vacuum pgbench_branches");
tryExecuteStatement(con, "vacuum pgbench_tellers");
tryExecuteStatement(con, "truncate pgbench_history");
- fprintf(stderr, "end.\n");
+ ereport(ELEVEL_LOG, (errmsg("end.\n")));
if (do_vacuum_accounts)
{
- fprintf(stderr, "starting vacuum pgbench_accounts...");
+ ereport(ELEVEL_LOG,
+ (errmsg("starting vacuum pgbench_accounts...")));
tryExecuteStatement(con, "vacuum analyze pgbench_accounts");
- fprintf(stderr, "end.\n");
+ ereport(ELEVEL_LOG, (errmsg("end.\n")));
}
}
PQfinish(con);
@@ -5578,8 +5771,9 @@ main(int argc, char **argv)
if (err != 0 || thread->thread == INVALID_THREAD)
{
- fprintf(stderr, "could not create thread: %s\n", strerror(err));
- exit(1);
+ ereport(ELEVEL_FATAL,
+ (errmsg("could not create thread: %s\n",
+ strerror(err))));
}
}
else
@@ -5688,8 +5882,9 @@ threadRun(void *arg)
if (thread->logfile == NULL)
{
- fprintf(stderr, "could not open logfile \"%s\": %s\n",
- logpath, strerror(errno));
+ ereport(ELEVEL_LOG,
+ (errmsg("could not open logfile \"%s\": %s\n",
+ logpath, strerror(errno))));
goto done;
}
}
@@ -5767,8 +5962,9 @@ threadRun(void *arg)
if (sock < 0)
{
- fprintf(stderr, "invalid socket: %s",
- PQerrorMessage(st->con));
+ ereport(ELEVEL_LOG,
+ (errmsg("invalid socket: %s",
+ PQerrorMessage(st->con))));
goto done;
}
@@ -5844,7 +6040,8 @@ threadRun(void *arg)
continue;
}
/* must be something wrong */
- fprintf(stderr, "select() failed: %s\n", strerror(errno));
+ ereport(ELEVEL_LOG,
+ (errmsg("select() failed: %s\n", strerror(errno))));
goto done;
}
}
@@ -5868,8 +6065,9 @@ threadRun(void *arg)
if (sock < 0)
{
- fprintf(stderr, "invalid socket: %s",
- PQerrorMessage(st->con));
+ ereport(ELEVEL_LOG,
+ (errmsg("invalid socket: %s",
+ PQerrorMessage(st->con))));
goto done;
}
@@ -5911,6 +6109,7 @@ threadRun(void *arg)
lag,
stdev;
char tbuf[315];
+ PQExpBufferData progress_buf;
/*
* Add up the statistics of all threads.
@@ -5968,18 +6167,23 @@ threadRun(void *arg)
snprintf(tbuf, sizeof(tbuf), "%.1f s", total_run);
}
- fprintf(stderr,
- "progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
- tbuf, tps, latency, stdev);
+ initPQExpBuffer(&progress_buf);
+ printfPQExpBuffer(&progress_buf,
+ "progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
+ tbuf, tps, latency, stdev);
if (throttle_delay)
{
- fprintf(stderr, ", lag %.3f ms", lag);
+ appendPQExpBuffer(&progress_buf, ", lag %.3f ms", lag);
if (latency_limit)
- fprintf(stderr, ", " INT64_FORMAT " skipped",
- cur.skipped - last.skipped);
+ appendPQExpBuffer(&progress_buf,
+ ", " INT64_FORMAT " skipped",
+ cur.skipped - last.skipped);
}
- fprintf(stderr, "\n");
+ appendPQExpBufferChar(&progress_buf, '\n');
+
+ ereport(ELEVEL_LOG, (errmsg("%s", progress_buf.data)));
+ termPQExpBuffer(&progress_buf);
last = cur;
last_report = now;
@@ -6063,10 +6267,7 @@ setalarm(int seconds)
!CreateTimerQueueTimer(&timer, queue,
win32_timer_callback, NULL, seconds * 1000, 0,
WT_EXECUTEINTIMERTHREAD | WT_EXECUTEONLYONCE))
- {
- fprintf(stderr, "failed to set timer\n");
- exit(1);
- }
+ ereport(ELEVEL_FATAL, (errmsg("failed to set timer\n")));
}
/* partial pthread implementation for Windows */
@@ -6136,3 +6337,155 @@ pthread_join(pthread_t th, void **thread_return)
}
#endif /* WIN32 */
+
+/*
+ * errstartImpl --- begin an error-reporting cycle
+ *
+ * Initialize the error data and store the given parameters in it.
+ * Subsequently, errmsg() and perhaps other routines will be called to further
+ * populate the error data. Finally, errfinish() will be called to actually
+ * process the error report. If multiple threads can use the same error data,
+ * the error mutex is locked before the error data is initialized and will be
+ * unlocked at the end of the errfinish() call.
+ *
+ * Returns true in the normal case. Returns false to short-circuit the error
+ * report (if the current debugging level does not include this level).
+ */
+static bool
+#if defined(ENABLE_THREAD_SAFETY) && defined(HAVE__VA_ARGS)
+errstartImpl(Error error, ErrorLevel elevel)
+#else /* !(ENABLE_THREAD_SAFETY && HAVE__VA_ARGS) */
+errstartImpl(ErrorLevel elevel)
+#endif /* ENABLE_THREAD_SAFETY && HAVE__VA_ARGS */
+{
+ bool start_error_reporting;
+
+ /* Check if we have the appropriate debugging level */
+ switch (elevel)
+ {
+ case ELEVEL_DEBUG:
+ /*
+ * Print the message only if there's a debugging mode for all types
+ * of messages.
+ */
+ start_error_reporting = debug;
+ break;
+ case ELEVEL_LOG:
+ case ELEVEL_FATAL:
+ /*
+ * Always print the error/log message.
+ */
+ start_error_reporting = true;
+ break;
+ default:
+ /* internal error which should never occur */
+ ereport(ELEVEL_FATAL,
+ (errmsg("unexpected error level: %d\n", elevel)));
+ break;
+ }
+
+ /* Initialize the error data */
+ if (start_error_reporting)
+ {
+ Assert(error);
+
+#if defined(ENABLE_THREAD_SAFETY) && !defined(HAVE__VA_ARGS)
+ pthread_mutex_lock(&error_mutex);
+#endif /* ENABLE_THREAD_SAFETY && !HAVE__VA_ARGS */
+
+ error->elevel = elevel;
+ initPQExpBuffer(&error->message);
+ }
+
+ return start_error_reporting;
+}
+
+/*
+ * errmsgImpl --- add a primary error message text to the current error
+ */
+static int
+#if defined(ENABLE_THREAD_SAFETY) && defined(HAVE__VA_ARGS)
+errmsgImpl(Error error, const char *fmt,...)
+#else /* !(ENABLE_THREAD_SAFETY && HAVE__VA_ARGS) */
+errmsgImpl(const char *fmt,...)
+#endif /* ENABLE_THREAD_SAFETY && HAVE__VA_ARGS */
+{
+ va_list ap;
+ bool done;
+
+ Assert(error);
+
+ if (PQExpBufferBroken(&error->message))
+ {
+ /* Already failed. */
+ /* Return value does not matter. */
+ return 0;
+ }
+
+ /* Loop in case we have to retry after enlarging the buffer. */
+ do
+ {
+ va_start(ap, fmt);
+ done = appendPQExpBufferVA(&error->message, fmt, ap);
+ va_end(ap);
+ } while (!done);
+
+ /* Return value does not matter. */
+ return 0;
+}
+
+/*
+ * errfinishImpl --- end an error-reporting cycle
+ *
+ * Print the appropriate error report to stderr.
+ *
+ * If elevel is ELEVEL_FATAL or worse, control does not return to the caller.
+ * See ErrorLevel enumeration for the error level definitions.
+ *
+ * If the error message buffer is empty or broken, prints a corresponding error
+ * message and exits the program.
+ */
+static void
+#if defined(ENABLE_THREAD_SAFETY) && defined(HAVE__VA_ARGS)
+errfinishImpl(Error error, int dummy,...)
+#else /* !(ENABLE_THREAD_SAFETY && HAVE__VA_ARGS) */
+errfinishImpl(int dummy,...)
+#endif /* ENABLE_THREAD_SAFETY && HAVE__VA_ARGS */
+{
+ bool error_during_reporting = false;
+ ErrorLevel elevel;
+
+ Assert(error);
+ elevel = error->elevel;
+
+ /*
+ * Immediately print the message to stderr so as not to get an endless cycle
+ * of errors...
+ */
+ if (PQExpBufferDataBroken(error->message))
+ {
+ error_during_reporting = true;
+ fprintf(stderr, "out of memory\n");
+ }
+ else if (*(error->message.data) == '\0')
+ {
+ /* internal error which should never occur */
+ error_during_reporting = true;
+ fprintf(stderr, "empty error message cannot be reported\n");
+ }
+ else
+ {
+ fprintf(stderr, "%s", error->message.data);
+ }
+
+ /* Release the error data and exit if needed */
+
+ termPQExpBuffer(&error->message);
+
+#if defined(ENABLE_THREAD_SAFETY) && !defined(HAVE__VA_ARGS)
+ pthread_mutex_unlock(&error_mutex);
+#endif /* ENABLE_THREAD_SAFETY && !HAVE__VA_ARGS */
+
+ if (elevel >= ELEVEL_FATAL || error_during_reporting)
+ exit(1);
+}
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index d6a38d0..e983abc 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -172,3 +172,4 @@ PQsslAttribute 169
PQsetErrorContextVisibility 170
PQresultVerboseErrorMessage 171
PQencryptPasswordConn 172
+appendPQExpBufferVA 173
diff --git a/src/interfaces/libpq/pqexpbuffer.c b/src/interfaces/libpq/pqexpbuffer.c
index 86b16e6..3db2d4c 100644
--- a/src/interfaces/libpq/pqexpbuffer.c
+++ b/src/interfaces/libpq/pqexpbuffer.c
@@ -37,8 +37,6 @@
/* All "broken" PQExpBuffers point to this string. */
static const char oom_buffer[1] = "";
-static bool appendPQExpBufferVA(PQExpBuffer str, const char *fmt, va_list args) pg_attribute_printf(2, 0);
-
/*
* markPQExpBufferBroken
@@ -282,7 +280,7 @@ appendPQExpBuffer(PQExpBuffer str, const char *fmt,...)
* Attempt to format data and append it to str. Returns true if done
* (either successful or hard failure), false if need to retry.
*/
-static bool
+bool
appendPQExpBufferVA(PQExpBuffer str, const char *fmt, va_list args)
{
size_t avail;
diff --git a/src/interfaces/libpq/pqexpbuffer.h b/src/interfaces/libpq/pqexpbuffer.h
index 771602a..b70b868 100644
--- a/src/interfaces/libpq/pqexpbuffer.h
+++ b/src/interfaces/libpq/pqexpbuffer.h
@@ -158,6 +158,14 @@ extern void printfPQExpBuffer(PQExpBuffer str, const char *fmt,...) pg_attribute
extern void appendPQExpBuffer(PQExpBuffer str, const char *fmt,...) pg_attribute_printf(2, 3);
/*------------------------
+ * appendPQExpBufferVA
+ * Shared guts of printfPQExpBuffer/appendPQExpBuffer.
+ * Attempt to format data and append it to str. Returns true if done
+ * (either successful or hard failure), false if need to retry.
+ */
+extern bool appendPQExpBufferVA(PQExpBuffer str, const char *fmt, va_list args) pg_attribute_printf(2, 0);
+
+/*------------------------
* appendPQExpBufferStr
* Append the given string to a PQExpBuffer, allocating more space
* if necessary.
--
2.7.4
Attachment: v9-0004-Pgbench-errors-and-serialization-deadlock-retries.patch (text/x-diff)
From b3d71b02d0ad7277e2c18e0d2073722871f1552f Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Mon, 21 May 2018 15:26:06 +0300
Subject: [PATCH v9] Pgbench errors and serialization/deadlock retries
A client's run is aborted only in case of a serious error, for example, if the
connection with the backend is lost. Otherwise, if the execution of an SQL or
meta command fails, the client's run continues normally until the end of the
current script execution (it is assumed that one transaction script contains
exactly one transaction).
Transactions with serialization or deadlock failures are rolled back and
retried until they complete successfully or reach the maximum number of tries
(specified by the --max-tries option) or the maximum time allowed for retries
(specified by the --latency-limit option). These options can be combined; if
neither of them is used, failed transactions are not retried at all. If the
last run of a transaction fails, the transaction is reported as failed, and
the client variables are reset to their values before the first run of that
transaction.
If there are retries and/or errors, their statistics are printed in the
progress reports, in the transaction/aggregation logs, and at the end with the
other results (in total and for each script). A transaction error is reported
here only if the last try of the transaction fails. Retries and/or errors are
also printed per command together with average latencies if you use the
appropriate benchmarking option (--report-per-command, -r) and the total
number of retries and/or errors is not zero.
If a failed transaction block does not terminate in the current script, the
commands of the following scripts are processed as usual, so you can get many
errors of the type "in failed SQL transaction" (when the current SQL
transaction is aborted and commands are ignored until the end of the
transaction block). In such cases you can use the separate statistics of these
errors in all reports.
If you want to distinguish failures or errors by type (including which retry
limit was violated and by how much it was exceeded for serialization/deadlock
errors), use the pgbench debugging output produced with the option --debug and
the debugging level "fails" or "all". The first variant is recommended because
in the second case the debugging output can be very large.
---
doc/src/sgml/ref/pgbench.sgml | 321 +++++-
src/bin/pgbench/pgbench.c | 1109 ++++++++++++++++----
src/bin/pgbench/t/001_pgbench_with_server.pl | 44 +-
src/bin/pgbench/t/002_pgbench_no_server.pl | 7 +-
.../t/003_serialization_and_deadlock_fails.pl | 761 ++++++++++++++
5 files changed, 2025 insertions(+), 217 deletions(-)
create mode 100644 src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index e4b37dd..f894390 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -55,16 +55,19 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
tps = 85.184871 (including connections establishing)
tps = 85.296346 (excluding connections establishing)
</screen>
- The first six lines report some of the most important parameter
- settings. The next line reports the number of transactions completed
- and intended (the latter being just the product of number of clients
+ The first six lines and the eighth line report some of the most important
+ parameter settings. The seventh line reports the number of transactions
+ completed and intended (the latter being just the product of number of clients
and number of transactions per client); these will be equal unless the run
- failed before completion. (In <option>-T</option> mode, only the actual
- number of transactions is printed.)
+ failed before completion or some SQL/meta command(s) failed. (In
+ <option>-T</option> mode, only the actual number of transactions is printed.)
+ (See <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information.)
The last two lines report the number of transactions per second,
figured with and without counting the time to start database sessions.
</para>
@@ -380,11 +383,28 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</varlistentry>
<varlistentry>
- <term><option>-d</option></term>
- <term><option>--debug</option></term>
+ <term><option>-d</option> <replaceable>debug_level</replaceable></term>
+ <term><option>--debug=</option><replaceable>debug_level</replaceable></term>
<listitem>
<para>
- Print debugging output.
+ Print debugging output. You can use the following debugging levels:
+ <itemizedlist>
+ <listitem>
+ <para><literal>no</literal>: no debugging output (except built-in
+ function <function>debug</function>, see <xref
+ linkend="pgbench-functions"/>).</para>
+ </listitem>
+ <listitem>
+ <para><literal>fails</literal>: print only failure messages, errors
+ and retries (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information).</para>
+ </listitem>
+ <listitem>
+ <para><literal>all</literal>: print all debugging output
+ (throttling, executed/sent/received commands etc.).</para>
+ </listitem>
+ </itemizedlist>
+ The default is no debugging output.
</para>
</listitem>
</varlistentry>
@@ -453,6 +473,16 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
at all. They are counted and reported separately as
<firstterm>skipped</firstterm>.
</para>
+ <para>
+ A transaction with a serialization or deadlock failure can be retried if
+ the total time of all its tries is less than
+ <replaceable>limit</replaceable> ms. This option can be combined with
+ the option <option>--max-tries</option> which limits the total number of
+ transaction tries. But if none of them are used, failed transactions are
+ not retried at all. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information about retrying failed transactions.
+ </para>
</listitem>
</varlistentry>
@@ -513,22 +543,38 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
Show progress report every <replaceable>sec</replaceable> seconds. The report
includes the time since the beginning of the run, the tps since the
last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</option>),
- the latency is computed with respect to the transaction scheduled
- start time, not the actual transaction beginning time, thus it also
- includes the average schedule lag time.
+ deviation since the last report. If any transactions ended with a
+ failed SQL or meta command since the last report, they are also reported
+ as failed. If any transactions ended with an error "in failed SQL
+ transaction block", they are reported separately as <literal>in failed
+ tx</literal> (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information). Under
+ throttling (<option>-R</option>), the latency is computed with respect
+ to the transaction scheduled start time, not the actual transaction
+ beginning time, thus it also includes the average schedule lag time. If
+ any transactions have been rolled back and retried after a
+ serialization/deadlock failure since the last report, the report
+ includes the number of such transactions and the sum of all retries. Use
+ the options <option>--max-tries</option> and/or
+ <option>--latency-limit</option> to enable transactions retries after
+ serialization/deadlock failures.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-r</option></term>
- <term><option>--report-latencies</option></term>
+ <term><option>--report-per-command</option></term>
<listitem>
<para>
- Report the average per-statement latency (execution time from the
- perspective of the client) of each command after the benchmark
- finishes. See below for details.
+ Report the following statistics for each command after the benchmark
+ finishes: the average per-statement latency (execution time from the
+ perspective of the client), the number of all errors, the number of
+ errors "in failed SQL transaction block", and the number of retries
+ after serialization or deadlock failures. The report displays the
+ columns with statistics on errors and retries only if the current
+ <application>pgbench</application> run has an error of the corresponding
+ type or retry, respectively. See below for details.
</para>
</listitem>
</varlistentry>
@@ -667,6 +713,21 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</varlistentry>
<varlistentry>
+ <term><option>--max-tries=<replaceable>number_of_tries</replaceable></option></term>
+ <listitem>
+ <para>
+ Set the maximum number of tries for transactions with
+ serialization/deadlock failures. This option can be combined with the
+ option <option>--latency-limit</option> which limits the total time of
+ transaction tries. If neither of them is used, failed transactions are
+ not retried at all. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information about retrying failed transactions.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--progress-timestamp</option></term>
<listitem>
<para>
@@ -807,8 +868,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<refsect1>
<title>Notes</title>
- <refsect2>
- <title>What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts">
+ <title id="transactions-and-scripts-title">What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
<para>
<application>pgbench</application> executes test scripts chosen randomly
@@ -1583,7 +1644,7 @@ END;
The format of the log is:
<synopsis>
-<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -1604,6 +1665,17 @@ END;
When both <option>--rate</option> and <option>--latency-limit</option> are used,
the <replaceable>time</replaceable> for a skipped transaction will be reported as
<literal>skipped</literal>.
+ <replaceable>retries</replaceable> is the sum of all the retries after the
+ serialization or deadlock failures during the current script execution. It is
+ only present when the maximum number of tries for transactions is more than 1
+ (<option>--max-tries</option>) and/or the maximum time of tries for
+ transactions is used (<option>--latency-limit</option>). If the transaction
+ ended with an error "in failed SQL transaction", its
+ <replaceable>time</replaceable> will be reported as
+ <literal>in_failed_tx</literal>. If the transaction ended with another error,
+ its <replaceable>time</replaceable> will be reported as
+ <literal>failed</literal> (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information).
</para>
<para>
@@ -1633,6 +1705,24 @@ END;
</para>
<para>
+ The following example shows a snippet of a log file with errors and retries,
+ with the maximum number of tries set to 10 (note the additional
+ <replaceable>retries</replaceable> column):
+<screen>
+3 0 47423 0 1499414498 34501 4
+3 1 8333 0 1499414498 42848 1
+3 2 8358 0 1499414498 51219 1
+4 0 72345 0 1499414498 59433 7
+1 3 41718 0 1499414498 67879 5
+1 4 8416 0 1499414498 76311 1
+3 3 33235 0 1499414498 84469 4
+0 0 failed 0 1499414498 84905 10
+2 0 failed 0 1499414498 86248 10
+3 4 8307 0 1499414498 92788 1
+</screen>
+ </para>
+
+ <para>
When running a long test on hardware that can handle a lot of transactions,
the log files can become very large. The <option>--sampling-rate</option> option
can be used to log only a random sample of transactions.
@@ -1647,7 +1737,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> <replaceable>failed_tx</replaceable> <replaceable>in_failed_tx</replaceable> <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried_tx</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -1661,7 +1751,13 @@ END;
transaction latencies within the interval,
<replaceable>min_latency</replaceable> is the minimum latency within the interval,
and
- <replaceable>max_latency</replaceable> is the maximum latency within the interval.
+ <replaceable>max_latency</replaceable> is the maximum latency within the interval,
+ <replaceable>failed_tx</replaceable> is the number of transactions that ended
+ with a failed SQL or meta command within the interval,
+ <replaceable>in_failed_tx</replaceable> is the number of transactions that
+ ended with an error "in failed SQL transaction block" (see
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information).
The next fields,
<replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
@@ -1669,21 +1765,28 @@ END;
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e. the difference between each transaction's
scheduled start time and the time it actually started.
- The very last field, <replaceable>skipped</replaceable>,
+ The next field, <replaceable>skipped</replaceable>,
is only present if the <option>--latency-limit</option> option is used, too.
It counts the number of transactions skipped because they would have
started too late.
+ The <replaceable>retried_tx</replaceable> and
+ <replaceable>retries</replaceable> fields are only present if the maximum
+ number of tries for transactions is more than 1
+ (<option>--max-tries</option>) and/or the maximum time of tries for
+ transactions is used (<option>--latency-limit</option>). They report the
+ number of retried transactions and the sum of all the retries after
+ serialization or deadlock failures within the interval.
Each transaction is counted in the interval when it was committed.
</para>
<para>
Here is some example output:
<screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0 0
+1345828503 7884 1979812 565806736 60 1479 0 0
+1345828505 7208 1979422 567277552 59 1391 0 0
+1345828507 7685 1980268 569784714 60 1398 0 0
+1345828509 7073 1979779 573489941 236 1411 0 0
</screen></para>
<para>
@@ -1695,15 +1798,54 @@ END;
</refsect2>
<refsect2>
- <title>Per-Statement Latencies</title>
+ <title>Per-Statement Report</title>
+
+ <para>
+ With the <option>-r</option> option, <application>pgbench</application>
+ collects the following statistics for each statement:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>latency</literal> — elapsed transaction time for each
+ statement. <application>pgbench</application> reports an average value
+ of all successful runs of the statement.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of errors in this statement. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of errors "in failed SQL transaction" in this statement. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of retries after a serialization or a deadlock failure in
+ this statement. See <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
<para>
- With the <option>-r</option> option, <application>pgbench</application> collects
- the elapsed transaction time of each statement executed by every
- client. It then reports an average of those values, referred to
- as the latency for each statement, after the benchmark has finished.
+ The report displays the columns with statistics on errors and retries only if
+ the current <application>pgbench</application> run has an error or retry,
+ respectively.
</para>
+ <para>
+ All values are computed for each statement executed by every client and are
+ reported after the benchmark has finished.
+ </para>
+
<para>
For the default script, the output will look similar to this:
<screen>
@@ -1715,6 +1857,7 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
latency average = 15.844 ms
latency stddev = 2.715 ms
tps = 618.764555 (including connections establishing)
@@ -1732,10 +1875,50 @@ statement latencies in milliseconds:
0.371 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
1.212 END;
</screen>
+
+ Another example of output for the default script using serializable default
+ transaction isolation level (<command>PGOPTIONS='-c
+ default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: <builtin: TPC-B (sort of)>
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+number of transactions per client: 1000
+number of transactions actually processed: 4473/10000
+number of errors: 5527 (55.270%)
+number of retried: 7467 (74.670%)
+number of retries: 257244
+maximum number of tries: 100
+number of transactions above the 100.0 ms latency limit: 5766/10000 (57.660 %) (including errors)
+latency average = 41.169 ms
+latency stddev = 51.783 ms
+tps = 50.322494 (including connections establishing)
+tps = 50.324595 (excluding connections establishing)
+statement latencies in milliseconds, errors and retries:
+ 0.004 0 0 \set aid random(1, 100000 * :scale)
+ 0.000 0 0 \set bid random(1, 1 * :scale)
+ 0.000 0 0 \set tid random(1, 10 * :scale)
+ 0.000 0 0 \set delta random(-5000, 5000)
+ 0.213 0 0 BEGIN;
+ 0.393 0 0 UPDATE pgbench_accounts
+ SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.332 0 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 0.409 4971 250265 UPDATE pgbench_tellers
+ SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 0.311 556 6975 UPDATE pgbench_branches
+ SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.299 0 0 INSERT INTO pgbench_history
+ (tid, bid, aid, delta, mtime)
+ VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 0.520 0 4 END;
+</screen>
</para>
<para>
- If multiple script files are specified, the averages are reported
+ If multiple script files are specified, all statistics are reported
separately for each script file.
</para>
@@ -1749,6 +1932,78 @@ statement latencies in milliseconds:
</para>
</refsect2>
+ <refsect2 id="errors-and-retries">
+ <title id="errors-and-retries-title">Errors and Serialization/Deadlock Retries</title>
+
+ <para>
+ A client's run is aborted only in case of a serious error, for example, the
+ connection with the backend was lost. Otherwise, if the execution of an SQL or
+ meta command fails, the client's run continues normally until the end of the
+ current script execution (it is assumed that one transaction script contains
+ only one transaction; see <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"/> for more information).
+ Transactions with serialization or deadlock failures are rolled back and
+ repeated until they complete successfully or reach the maximum number of
+ tries (specified by the <option>--max-tries</option> option) / the maximum
+ time of tries (specified by the <option>--latency-limit</option> option). If
+ the last transaction run fails, this transaction will be reported as failed,
+ and the client variables will be set as they were before the first run of
+ this transaction.
+ </para>
+
+ <note>
+ <para>
+ Be careful when repeating scripts that contain multiple transactions: the
+ script is always retried completely, so the successful transactions can be
+ performed several times.
+ </para>
+ <para>
+ Be careful when repeating transactions with shell commands. Unlike the
+ results of SQL commands, the results of shell commands are not rolled back,
+ except for the variable value of the <command>\setshell</command> command.
+ </para>
+ <para>
+ If a failed transaction block does not terminate in the current script, the
+ commands of the following scripts are processed as usual so you can get a
+ lot of errors of type "in failed SQL transaction" (when the current SQL
+ transaction is aborted and commands ignored until end of transaction block).
+ In such cases you can use separate statistics of these errors in all
+ reports.
+ </para>
+ </note>
+
+ <para>
+ The latency of a successful transaction includes the entire time of
+ transaction execution with rollbacks and retries. The latency for failed
+ transactions and commands is not computed separately.
+ </para>
+
+ <para>
+ The main report contains the number of failed transactions if it is non-zero.
+ If the total number of transactions ended with an error "in failed SQL
+ transaction block" is non-zero, the main report also contains it. If the
+ total number of retried transactions is non-zero, the main report also
+ contains the statistics related to retries: the total number of retried
+ transactions and total number of retries (use the options
+ <option>--max-tries</option> and/or <option>--latency-limit</option> to make
+ it possible). The per-statement report inherits all columns from the main
+ report. Note that if a failure/error occurs, the following failures/errors in
+ the current script execution are not shown in the reports. The retry is only
+ reported for the first command where the failure occurred during the current
+ script execution.
+ </para>
+
+ <para>
+ If you want to distinguish between failures or errors by type (including
+ which limit for retries was violated and how far it was exceeded for the
+ serialization/deadlock errors), use the <application>pgbench</application>
+ debugging output created with the option <option>--debug</option> and with
+ the debugging level <literal>fails</literal> or <literal>all</literal>. The
+ first variant is recommended for this purpose because in the second case
+ the debugging output can be very large.
+ </para>
+ </refsect2>
+
<refsect2>
<title>Good Practices</title>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d100cee..57495d6 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -59,6 +59,9 @@
#include "pgbench.h"
+#define ERRCODE_IN_FAILED_SQL_TRANSACTION "25P02"
+#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"
#define ERRCODE_UNDEFINED_TABLE "42P01"
/*
@@ -187,9 +190,25 @@ bool progress_timestamp = false; /* progress report with Unix time */
int nclients = 1; /* number of clients */
int nthreads = 1; /* number of threads */
bool is_connect; /* establish connection for each transaction */
-bool is_latencies; /* report per-command latencies */
+bool report_per_command = false; /* report per-command latencies, retries
+ * after failures, and errors
+ * (failures without retrying) */
int main_pid; /* main process id used in log filename */
+/*
+ * There are different types of restrictions for deciding that the current failed
+ * transaction can no longer be retried and should be reported as failed:
+ * - max_tries can be used to limit the number of tries;
+ * - latency_limit can be used to limit the total time of tries.
+ *
+ * They can be combined together, and you need to use at least one of them to
+ * retry the failed transactions. By default, failed transactions are not
+ * retried at all.
+ */
+uint32 max_tries = 0; /* we cannot retry a failed transaction if its
+ * number of tries reaches this maximum; if its
+ * value is zero, it is not used */
+
char *pghost = "";
char *pgport = "";
char *login = NULL;
@@ -243,9 +262,21 @@ typedef struct SimpleStats
typedef struct StatsData
{
time_t start_time; /* interval start time, for aggregates */
- int64 cnt; /* number of transactions, including skipped */
+ int64 cnt; /* number of successful transactions, including
+ * skipped */
int64 skipped; /* number of transactions skipped under --rate
* and --latency-limit */
+ int64 retries;
+ int64 retried; /* number of transactions that were retried
+ * after a serialization or a deadlock
+ * failure */
+ int64 errors; /* number of transactions that were not retried
+ * after a serialization or a deadlock
+ * failure or had another error (including meta
+ * commands errors) */
+ int64 errors_in_failed_tx; /* number of transactions that failed with
+ * an error
+ * ERRCODE_IN_FAILED_SQL_TRANSACTION */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -269,6 +300,36 @@ typedef struct RandomState
} RandomState;
/*
+ * Data structure for repeating a transaction from the beginning with the same
+ * parameters.
+ */
+typedef struct RetryState
+{
+ RandomState random_state; /* random seed */
+ Variables variables; /* client variables */
+} RetryState;
+
+/*
+ * For the failures during script execution.
+ */
+typedef enum FailureStatus
+{
+ NO_FAILURE = 0,
+ ANOTHER_FAILURE, /* other failures that are not listed by
+ * themselves below */
+ SERIALIZATION_FAILURE,
+ DEADLOCK_FAILURE,
+ IN_FAILED_SQL_TRANSACTION
+} FailureStatus;
+
+typedef struct Failure
+{
+ FailureStatus status; /* type of the failure */
+ int command; /* command number in script where the failure
+ * occurred */
+} Failure;
+
+/*
* Connection state machine states.
*/
typedef enum
@@ -323,6 +384,22 @@ typedef enum
CSTATE_END_COMMAND,
/*
+ * States for transactions with serialization or deadlock failures.
+ *
+ * First, remember the failure in CSTATE_FAILURE. Then process other
+ * commands of the failed transaction if any and go to CSTATE_RETRY. If we
+ * can re-execute the transaction from the very beginning, report this as a
+ * failure, set the same parameters for the transaction execution as in the
+ * previous tries and process the first transaction command in
+ * CSTATE_START_COMMAND. Otherwise, report this as an error, set the
+ * parameters for the transaction execution as they were before the first
+ * run of this transaction (except for a random state) and go to
+ * CSTATE_END_TX to complete this transaction.
+ */
+ CSTATE_FAILURE,
+ CSTATE_RETRY,
+
+ /*
* CSTATE_END_TX performs end-of-transaction processing. Calculates
* latency, and logs the transaction. In --connect mode, closes the
* current connection. Chooses the next script to execute and starts over
@@ -364,6 +441,18 @@ typedef struct
bool prepared[MAX_SCRIPTS]; /* whether client prepared the script */
+ /*
+ * For processing errors and repeating transactions with serialization or
+ * deadlock failures:
+ */
+ Failure first_failure; /* status and command number of the first
+ * failure in the current transaction execution;
+ * status NO_FAILURE if there were no failures
+ * or errors */
+ RetryState retry_state;
+ uint32 retries; /* how many times have we already retried the
+ * current transaction? */
+
/* per client collected stats */
int64 cnt; /* client transaction count, for -t */
int ecnt; /* error count */
@@ -417,7 +506,8 @@ typedef struct
instr_time start_time; /* thread start time */
instr_time conn_time;
StatsData stats;
- int64 latency_late; /* executed but late transactions */
+ int64 latency_late; /* executed but late transactions (including
+ * errors) */
} TState;
#define INVALID_THREAD ((pthread_t) 0)
@@ -463,6 +553,10 @@ typedef struct
char *argv[MAX_ARGS]; /* command word list */
PgBenchExpr *expr; /* parsed expression, if needed */
SimpleStats stats; /* time spent in this command */
+ int64 retries;
+ int64 errors; /* number of failures that were not retried */
+ int64 errors_in_failed_tx; /* number of errors
+ * ERRCODE_IN_FAILED_SQL_TRANSACTION */
} Command;
typedef struct ParsedScript
@@ -478,7 +572,18 @@ static int num_scripts; /* number of scripts in sql_script[] */
static int num_commands = 0; /* total number of Command structs */
static int64 total_weight = 0;
-static int debug = 0; /* debug flag */
+typedef enum DebugLevel
+{
+ NO_DEBUG = 0, /* no debugging output (except PGBENCH_DEBUG) */
+ DEBUG_FAILS, /* print only failure messages, errors and
+ * retries */
+ DEBUG_ALL, /* print all debugging output (throttling,
+ * executed/sent/received commands etc.) */
+ NUM_DEBUGLEVEL
+} DebugLevel;
+
+static DebugLevel debug_level = NO_DEBUG; /* debug flag */
+static const char *DEBUGLEVEL[] = {"no", "fails", "all"};
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -534,9 +639,22 @@ typedef enum ErrorLevel
ELEVEL_DEBUG,
/*
- * To report the error/log messages and/or PGBENCH_DEBUG.
+ * Normal failure of the SQL/meta command, or processing of the failed
+ * transaction (its end/retry).
+ */
+ ELEVEL_LOG_CLIENT_FAIL,
+
+ /*
+ * Something serious, e.g. the connection with the backend was lost; therefore
+ * abort the client.
*/
- ELEVEL_LOG,
+ ELEVEL_LOG_CLIENT_ABORTED,
+
+ /*
+ * To report the error/log messages of the main program and/or
+ * PGBENCH_DEBUG.
+ */
+ ELEVEL_LOG_MAIN,
/*
* To report the error messages of the main program and to exit immediately.
@@ -641,7 +759,6 @@ static int errmsgImpl(const char *fmt,...) pg_attribute_printf(1, 2);
static void errfinishImpl(int dummy,...);
#endif /* ENABLE_THREAD_SAFETY && HAVE__VA_ARGS */
-
/* callback functions for our flex lexer */
static const PsqlScanCallbacks pgbench_callbacks = {
NULL, /* don't need get_variable functionality */
@@ -688,7 +805,7 @@ usage(void)
" protocol for submitting queries (default: simple)\n"
" -n, --no-vacuum do not run VACUUM before tests\n"
" -P, --progress=NUM show thread progress report every NUM seconds\n"
- " -r, --report-latencies report average latency per command\n"
+ " -r, --report-per-command report latencies, errors and retries per command\n"
" -R, --rate=NUM target rate in transactions per second\n"
" -s, --scale=NUM report this scale factor in output\n"
" -t, --transactions=NUM number of transactions each client runs (default: 10)\n"
@@ -697,11 +814,12 @@ usage(void)
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
+ " --max-tries=NUM max number of tries to run transaction\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
"\nCommon options:\n"
- " -d, --debug print debugging output\n"
+ " -d, --debug=no|fails|all print debugging output (default: no)\n"
" -h, --host=HOSTNAME database server host or socket directory\n"
" -p, --port=PORT database server port number\n"
" -U, --username=USERNAME connect as specified database user\n"
@@ -784,7 +902,7 @@ strtoint64(const char *str)
/* require at least one digit */
if (!isdigit((unsigned char) *ptr))
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("invalid input syntax for integer: \"%s\"\n", str)));
}
@@ -795,7 +913,7 @@ strtoint64(const char *str)
if ((tmp / 10) != result) /* overflow? */
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("value \"%s\" is out of range for type bigint\n",
str)));
}
@@ -810,7 +928,7 @@ gotdigits:
if (*ptr != '\0')
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("invalid input syntax for integer: \"%s\"\n", str)));
}
@@ -1164,6 +1282,10 @@ initStats(StatsData *sd, time_t start_time)
sd->start_time = start_time;
sd->cnt = 0;
sd->skipped = 0;
+ sd->retries = 0;
+ sd->retried = 0;
+ sd->errors = 0;
+ sd->errors_in_failed_tx = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1172,8 +1294,30 @@ initStats(StatsData *sd, time_t start_time)
* Accumulate one additional item into the given stats object.
*/
static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumStats(StatsData *stats, bool skipped, double lat, double lag,
+ FailureStatus first_error, int64 retries)
{
+ /*
+ * Record the number of retries regardless of whether the transaction was
+ * successful or failed.
+ */
+ stats->retries += retries;
+ if (retries > 0)
+ stats->retried++;
+
+ /* Record the failed transaction */
+ if (first_error != NO_FAILURE)
+ {
+ stats->errors++;
+
+ if (first_error == IN_FAILED_SQL_TRANSACTION)
+ stats->errors_in_failed_tx++;
+
+ return;
+ }
+
+ /* Record the successful transaction */
+
stats->cnt++;
if (skipped)
@@ -1212,7 +1356,7 @@ tryExecuteStatement(PGconn *con, const char *sql)
res = PQexec(con, sql);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("%s(ignoring this error and continuing anyway)\n",
PQerrorMessage(con))));
}
@@ -1260,7 +1404,7 @@ doConnect(void)
if (!conn)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("connection to database \"%s\" failed\n", dbName)));
return NULL;
}
@@ -1279,7 +1423,7 @@ doConnect(void)
/* check to see that the backend connection was successfully made */
if (PQstatus(conn) == CONNECTION_BAD)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("connection to database \"%s\" failed:\n%s",
dbName, PQerrorMessage(conn))));
PQfinish(conn);
@@ -1419,7 +1563,7 @@ makeVariableValue(Variable *var)
if (sscanf(var->svalue, "%lf%c", &dv, &xs) != 1)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("malformed variable \"%s\" value: \"%s\"\n",
var->name, var->svalue)));
return false;
@@ -1490,7 +1634,7 @@ lookupCreateVariable(Variables *variables, const char *context, char *name,
* About the error level used: if we process client commands, it is a
* normal failure; otherwise it is not and we exit the program.
*/
- ereport(client ? ELEVEL_LOG : ELEVEL_FATAL,
+ ereport(client ? ELEVEL_LOG_CLIENT_FAIL : ELEVEL_FATAL,
(errmsg("%s: invalid variable name: \"%s\"\n",
context, name)));
return NULL;
@@ -1706,7 +1850,7 @@ coerceToBool(PgBenchValue *pval, bool *bval)
}
else /* NULL, INT or DOUBLE */
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("cannot coerce %s to boolean\n", valueTypeName(pval))));
*bval = false; /* suppress uninitialized-variable warnings */
return false;
@@ -1752,7 +1896,7 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
if (dval < PG_INT64_MIN || PG_INT64_MAX < dval)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("double to int overflow for %f\n", dval)));
return false;
}
@@ -1761,7 +1905,7 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
}
else /* BOOLEAN or NULL */
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("cannot coerce %s to int\n", valueTypeName(pval))));
return false;
}
@@ -1783,7 +1927,7 @@ coerceToDouble(PgBenchValue *pval, double *dval)
}
else /* BOOLEAN or NULL */
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("cannot coerce %s to double\n", valueTypeName(pval))));
return false;
}
@@ -1965,7 +2109,7 @@ evalStandardFunc(TState *thread, CState *st,
if (l != NULL)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("too many function arguments, maximum is %d\n",
MAX_FARGS)));
return false;
@@ -2090,7 +2234,7 @@ evalStandardFunc(TState *thread, CState *st,
case PGBENCH_MOD:
if (ri == 0)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("division by zero\n")));
return false;
}
@@ -2103,7 +2247,7 @@ evalStandardFunc(TState *thread, CState *st,
if (li == PG_INT64_MIN)
{
ereport(
- ELEVEL_LOG,
+ ELEVEL_LOG_CLIENT_FAIL,
(errmsg("bigint out of range\n")));
return false;
}
@@ -2239,7 +2383,7 @@ evalStandardFunc(TState *thread, CState *st,
Assert(0);
}
- ereport(ELEVEL_LOG, (errmsg("%s", errormsg_buf.data)));
+ ereport(ELEVEL_LOG_MAIN, (errmsg("%s", errormsg_buf.data)));
termPQExpBuffer(&errormsg_buf);
*retval = *varg;
@@ -2364,14 +2508,14 @@ evalStandardFunc(TState *thread, CState *st,
/* check random range */
if (imin > imax)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("empty range given to random\n")));
return false;
}
else if (imax - imin < 0 || (imax - imin) + 1 < 0)
{
/* prevent int overflows in random functions */
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("random range is too large\n")));
return false;
}
@@ -2394,7 +2538,7 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param < MIN_GAUSSIAN_PARAM)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("gaussian parameter must be at least %f (not %f)\n",
MIN_GAUSSIAN_PARAM, param)));
return false;
@@ -2408,7 +2552,7 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param <= 0.0 || param == 1.0 || param > MAX_ZIPFIAN_PARAM)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("zipfian parameter must be in range (0, 1) U (1, %d] (got %f)\n",
MAX_ZIPFIAN_PARAM, param)));
return false;
@@ -2421,7 +2565,7 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param <= 0.0)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("exponential parameter must be greater than zero (got %f)\n",
param)));
return false;
@@ -2534,7 +2678,7 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("undefined variable \"%s\"\n",
expr->u.variable.varname)));
return false;
@@ -2630,7 +2774,7 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
}
else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("%s: undefined variable \"%s\"\n",
argv[0], argv[i])));
return false;
@@ -2639,7 +2783,7 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
arglen = strlen(arg);
if (len + arglen + (i > 0 ? 1 : 0) >= SHELL_COMMAND_SIZE - 1)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("%s: shell command is too long\n", argv[0])));
return false;
}
@@ -2659,7 +2803,7 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
if (!timer_exceeded)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("%s: could not launch shell command\n",
argv[0])));
}
@@ -2671,7 +2815,7 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
/* Execute the command with pipe and read the standard output. */
if ((fp = popen(command, "r")) == NULL)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("%s: could not launch shell command\n", argv[0])));
return false;
}
@@ -2679,7 +2823,7 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
if (!timer_exceeded)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("%s: could not read result of shell command\n",
argv[0])));
}
@@ -2688,7 +2832,7 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
}
if (pclose(fp) < 0)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("%s: could not close shell command\n", argv[0])));
return false;
}
@@ -2699,7 +2843,7 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
endptr++;
if (*res == '\0' || *endptr != '\0')
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("%s: shell command must return an integer (not \"%s\")\n",
argv[0], res)));
return false;
@@ -2721,11 +2865,50 @@ preparedStatementName(char *buffer, int file, int state)
}
static void
-commandFailed(CState *st, const char *cmd, const char *message)
+commandFailed(CState *st, const char *cmd, const char *message,
+ ErrorLevel elevel)
{
- ereport(ELEVEL_LOG,
- (errmsg("client %d aborted in command %d (%s) of script %d; %s\n",
- st->id, st->command, cmd, st->use_file, message)));
+ switch (elevel)
+ {
+ case ELEVEL_LOG_CLIENT_FAIL:
+ if (st->first_failure.status == NO_FAILURE)
+ {
+ /*
+ * This is the first failure during the execution of the current
+ * script.
+ */
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
+ (errmsg("client %d got a failure in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file,
+ message)));
+ }
+ else
+ {
+ /*
+ * This is not the first failure during the execution of the
+ * current script.
+ */
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
+ (errmsg("client %d continues a failed transaction in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file,
+ message)));
+ }
+ break;
+ case ELEVEL_LOG_CLIENT_ABORTED:
+ ereport(ELEVEL_LOG_CLIENT_ABORTED,
+ (errmsg("client %d aborted in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file, message)));
+ break;
+ case ELEVEL_DEBUG:
+ case ELEVEL_LOG_MAIN:
+ case ELEVEL_FATAL:
+ default:
+ /* internal error which should never occur */
+ ereport(ELEVEL_FATAL,
+ (errmsg("unexpected error level when the command failed: %d\n",
+ elevel)));
+ break;
+ }
}
/* return a script number with a weighted choice. */
@@ -2797,7 +2980,7 @@ sendCommand(CState *st, Command *command)
commands[j]->argv[0], commands[j]->argc - 1, NULL);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("%s", PQerrorMessage(st->con))));
}
PQclear(res);
@@ -2820,7 +3003,6 @@ sendCommand(CState *st, Command *command)
ereport(ELEVEL_DEBUG,
(errmsg("client %d could not send %s\n",
st->id, command->argv[0])));
- st->ecnt++;
return false;
}
else
@@ -2841,7 +3023,7 @@ evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
(errmsg("%s: undefined variable \"%s\"\n",
argv[0], argv[1])));
return false;
@@ -2866,6 +3048,186 @@ evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
}
/*
+ * Get the total number of processed transactions, including skipped and
+ * failed ones.
+ */
+static int64
+getTotalCnt(const CState *st)
+{
+ return st->cnt + st->ecnt;
+}
+
+/*
+ * Copy a random state array.
+ */
+static void
+copyRandomState(RandomState *destination, const RandomState *source)
+{
+ memcpy(destination->data, source->data, sizeof(unsigned short) * 3);
+}
+
+/*
+ * Make a deep copy of variables array.
+ */
+static void
+copyVariables(Variables *destination_vars, const Variables *source_vars)
+{
+ Variable *destination;
+ Variable *current_destination;
+ const Variable *source;
+ const Variable *current_source;
+ int nvariables;
+
+ if (!destination_vars || !source_vars)
+ return;
+
+ destination = destination_vars->array;
+ source = source_vars->array;
+ nvariables = source_vars->nvariables;
+
+ for (current_destination = destination;
+ current_destination - destination < destination_vars->nvariables;
+ ++current_destination)
+ {
+ pg_free(current_destination->name);
+ pg_free(current_destination->svalue);
+ }
+
+ destination_vars->array = pg_realloc(destination_vars->array,
+ sizeof(Variable) * nvariables);
+ destination = destination_vars->array;
+
+ for (current_source = source, current_destination = destination;
+ current_source - source < nvariables;
+ ++current_source, ++current_destination)
+ {
+ current_destination->name = pg_strdup(current_source->name);
+ if (current_source->svalue)
+ current_destination->svalue = pg_strdup(current_source->svalue);
+ else
+ current_destination->svalue = NULL;
+ current_destination->value = current_source->value;
+ }
+
+ destination_vars->nvariables = nvariables;
+ destination_vars->vars_sorted = source_vars->vars_sorted;
+}
+
+/*
+ * Returns true if this type of failure can be retried.
+ */
+static bool
+canRetryFailure(FailureStatus failure_status)
+{
+ return (failure_status == SERIALIZATION_FAILURE ||
+ failure_status == DEADLOCK_FAILURE);
+}
+
+/*
+ * Returns true if the failure can be retried.
+ */
+static bool
+canRetry(CState *st, instr_time *now)
+{
+ FailureStatus failure_status = st->first_failure.status;
+
+ Assert(failure_status != NO_FAILURE);
+
+ /* We can only retry serialization or deadlock failures. */
+ if (!canRetryFailure(failure_status))
+ return false;
+
+ /*
+ * We must have at least one option to limit the retrying of failed
+ * transactions.
+ */
+ Assert(max_tries || latency_limit);
+
+ /*
+ * We cannot retry the failure if we have reached the maximum number of
+ * tries.
+ */
+ if (max_tries && st->retries + 1 >= max_tries)
+ return false;
+
+ /*
+ * We cannot retry the failure if we spent too much time on this
+ * transaction.
+ */
+ if (latency_limit)
+ {
+ if (INSTR_TIME_IS_ZERO(*now))
+ INSTR_TIME_SET_CURRENT(*now);
+
+ if (INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled >= latency_limit)
+ return false;
+ }
+
+ /* OK */
+ return true;
+}
+
+/*
+ * Process the conditional stack depending on the condition value; used by
+ * the meta commands \if and \elif.
+ */
+static void
+executeCondition(CState *st, bool condition)
+{
+ Command *command = sql_script[st->use_file].commands[st->command];
+
+ /* execute or not depending on evaluated condition */
+ if (command->meta == META_IF)
+ {
+ conditional_stack_push(st->cstack,
+ condition ? IFSTATE_TRUE : IFSTATE_FALSE);
+ }
+ else if (command->meta == META_ELIF)
+ {
+ /* we should get here only if the "elif" needed evaluation */
+ Assert(conditional_stack_peek(st->cstack) == IFSTATE_FALSE);
+ conditional_stack_poke(st->cstack,
+ condition ? IFSTATE_TRUE : IFSTATE_FALSE);
+ }
+}
+
+/*
+ * Get the failure status from the error code.
+ */
+static FailureStatus
+getFailureStatus(char *sqlState)
+{
+ if (sqlState)
+ {
+ if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
+ return SERIALIZATION_FAILURE;
+ else if (strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
+ return DEADLOCK_FAILURE;
+ else if (strcmp(sqlState, ERRCODE_IN_FAILED_SQL_TRANSACTION) == 0)
+ return IN_FAILED_SQL_TRANSACTION;
+ }
+
+ return ANOTHER_FAILURE;
+}
+
+/*
+ * If the latency limit is used, return the current transaction latency as a
+ * percentage of the latency limit. Otherwise return zero.
+ */
+static double
+getLatencyUsed(CState *st, instr_time *now)
+{
+ if (!latency_limit)
+ return 0;
+
+ if (INSTR_TIME_IS_ZERO(*now))
+ INSTR_TIME_SET_CURRENT(*now);
+
+ return (100.0 * (INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled) /
+ latency_limit);
+}
+
+/*
* Advance the state machine of a connection, if possible.
*/
static void
@@ -2876,6 +3238,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
instr_time now;
bool end_tx_processed = false;
int64 wait;
+ FailureStatus failure_status = NO_FAILURE;
/*
* gettimeofday() isn't free, so we get the current timestamp lazily the
@@ -2916,6 +3279,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->state = CSTATE_START_TX;
/* check consistency */
Assert(conditional_stack_empty(st->cstack));
+
+ /* reset transaction variables to default values */
+ st->first_failure.status = NO_FAILURE;
+ st->retries = 0;
+
break;
/*
@@ -2963,7 +3331,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
INSTR_TIME_SET_CURRENT(now);
now_us = INSTR_TIME_GET_MICROSEC(now);
while (thread->throttle_trigger < now_us - latency_limit &&
- (nxacts <= 0 || st->cnt < nxacts))
+ (nxacts <= 0 || getTotalCnt(st) < nxacts))
{
processXactStats(thread, st, &now, true, agg);
/* next rendez-vous */
@@ -2973,7 +3341,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->txn_scheduled = thread->throttle_trigger;
}
/* stop client if -t exceeded */
- if (nxacts > 0 && st->cnt >= nxacts)
+ if (nxacts > 0 && getTotalCnt(st) >= nxacts)
{
st->state = CSTATE_FINISHED;
break;
@@ -3015,7 +3383,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
start = now;
if ((st->con = doConnect()) == NULL)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_CLIENT_ABORTED,
(errmsg("client %d aborted while establishing connection\n",
st->id)));
st->state = CSTATE_ABORTED;
@@ -3029,6 +3397,15 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
/*
+ * This is the first try to run this transaction. Remember its
+ * parameters in case it fails, or in case we have to repeat it
+ * later.
+ */
+ copyRandomState(&st->retry_state.random_state,
+ &st->random_state);
+ copyVariables(&st->retry_state.variables, &st->variables);
+
+ /*
* Record transaction start time under logging, progress or
* throttling.
*/
@@ -3064,7 +3441,15 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
if (command == NULL)
{
- st->state = CSTATE_END_TX;
+ if (st->first_failure.status == NO_FAILURE)
+ {
+ st->state = CSTATE_END_TX;
+ }
+ else
+ {
+ /* check if we can retry the failure */
+ st->state = CSTATE_RETRY;
+ }
break;
}
@@ -3072,7 +3457,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Record statement start time if per-command latencies are
* requested
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -3083,7 +3468,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
{
if (!sendCommand(st, command))
{
- commandFailed(st, "SQL", "SQL command send failed");
+ commandFailed(st, "SQL", "SQL command send failed",
+ ELEVEL_LOG_CLIENT_ABORTED);
st->state = CSTATE_ABORTED;
}
else
@@ -3105,6 +3491,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
ereport(ELEVEL_DEBUG, (errmsg("%s", errmsg_buf.data)));
termPQExpBuffer(&errmsg_buf);
+ /* change it if the meta command fails */
+ failure_status = NO_FAILURE;
+
if (command->meta == META_SLEEP)
{
/*
@@ -3118,8 +3507,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
- commandFailed(st, "sleep", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(st, "sleep",
+ "execution of meta-command failed",
+ ELEVEL_LOG_CLIENT_FAIL);
+ failure_status = ANOTHER_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
@@ -3150,8 +3542,18 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (!evaluateExpr(thread, st, expr, &result))
{
- commandFailed(st, argv[0], "evaluation of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(st, argv[0],
+ "evaluation of meta-command failed",
+ ELEVEL_LOG_CLIENT_FAIL);
+
+ /*
+ * Do not ruin the following conditional commands,
+ * if any.
+ */
+ executeCondition(st, false);
+
+ failure_status = ANOTHER_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
@@ -3160,29 +3562,17 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (!putVariableValue(&st->variables, argv[0],
argv[1], &result, true))
{
- commandFailed(st, "set", "assignment of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(st, "set",
+ "assignment of meta-command failed",
+ ELEVEL_LOG_CLIENT_FAIL);
+ failure_status = ANOTHER_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
}
else /* if and elif evaluated cases */
{
- bool cond = valueTruth(&result);
-
- /* execute or not depending on evaluated condition */
- if (command->meta == META_IF)
- {
- conditional_stack_push(st->cstack, cond ? IFSTATE_TRUE : IFSTATE_FALSE);
- }
- else /* elif */
- {
- /*
- * we should get here only if the "elif"
- * needed evaluation
- */
- Assert(conditional_stack_peek(st->cstack) == IFSTATE_FALSE);
- conditional_stack_poke(st->cstack, cond ? IFSTATE_TRUE : IFSTATE_FALSE);
- }
+ executeCondition(st, valueTruth(&result));
}
}
else if (command->meta == META_ELSE)
@@ -3222,8 +3612,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (!ret) /* on error */
{
- commandFailed(st, "setshell", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(st, "setshell",
+ "execution of meta-command failed",
+ ELEVEL_LOG_CLIENT_FAIL);
+ failure_status = ANOTHER_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
else
@@ -3243,8 +3636,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (!ret) /* on error */
{
- commandFailed(st, "shell", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(st, "shell",
+ "execution of meta-command failed",
+ ELEVEL_LOG_CLIENT_FAIL);
+ failure_status = ANOTHER_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
else
@@ -3360,37 +3756,55 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Wait for the current SQL command to complete
*/
case CSTATE_WAIT_RESULT:
- command = sql_script[st->use_file].commands[st->command];
- ereport(ELEVEL_DEBUG,
- (errmsg("client %d receiving\n", st->id)));
- if (!PQconsumeInput(st->con))
- { /* there's something wrong */
- commandFailed(st, "SQL", "perhaps the backend died while processing");
- st->state = CSTATE_ABORTED;
- break;
- }
- if (PQisBusy(st->con))
- return; /* don't have the whole result yet */
-
- /*
- * Read and discard the query result;
- */
- res = PQgetResult(st->con);
- switch (PQresultStatus(res))
{
- case PGRES_COMMAND_OK:
- case PGRES_TUPLES_OK:
- case PGRES_EMPTY_QUERY:
- /* OK */
- PQclear(res);
- discard_response(st);
- st->state = CSTATE_END_COMMAND;
- break;
- default:
- commandFailed(st, "SQL", PQerrorMessage(st->con));
- PQclear(res);
+ char *sqlState;
+
+ command = sql_script[st->use_file].commands[st->command];
+ ereport(ELEVEL_DEBUG,
+ (errmsg("client %d receiving\n", st->id)));
+ if (!PQconsumeInput(st->con))
+ { /* there's something wrong */
+ commandFailed(st, "SQL",
+ "perhaps the backend died while processing",
+ ELEVEL_LOG_CLIENT_ABORTED);
st->state = CSTATE_ABORTED;
break;
+ }
+ if (PQisBusy(st->con))
+ return; /* don't have the whole result yet */
+
+ /*
+ * Read and discard the query result.
+ */
+ res = PQgetResult(st->con);
+ sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+ switch (PQresultStatus(res))
+ {
+ case PGRES_COMMAND_OK:
+ case PGRES_TUPLES_OK:
+ case PGRES_EMPTY_QUERY:
+ /* OK */
+ PQclear(res);
+ discard_response(st);
+ failure_status = NO_FAILURE;
+ st->state = CSTATE_END_COMMAND;
+ break;
+ case PGRES_NONFATAL_ERROR:
+ case PGRES_FATAL_ERROR:
+ failure_status = getFailureStatus(sqlState);
+ commandFailed(st, "SQL", PQerrorMessage(st->con),
+ ELEVEL_LOG_CLIENT_FAIL);
+ PQclear(res);
+ discard_response(st);
+ st->state = CSTATE_FAILURE;
+ break;
+ default:
+ commandFailed(st, "SQL", PQerrorMessage(st->con),
+ ELEVEL_LOG_CLIENT_ABORTED);
+ PQclear(res);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
}
break;
@@ -3419,7 +3833,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* in thread-local data structure, if per-command latencies
* are requested.
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -3438,6 +3852,139 @@ doCustom(TState *thread, CState *st, StatsData *agg)
break;
/*
+ * Remember the failure and go ahead with next command.
+ */
+ case CSTATE_FAILURE:
+
+ Assert(failure_status != NO_FAILURE);
+
+ /*
+ * Whether the subsequent failures are retried or final depends
+ * only on whether the first failure of this transaction can be
+ * retried. Therefore remember only the first failure.
+ */
+ if (st->first_failure.status == NO_FAILURE)
+ {
+ st->first_failure.status = failure_status;
+ st->first_failure.command = st->command;
+ }
+
+ /* Go ahead with next command, to be executed or skipped */
+ st->command++;
+ st->state = conditional_active(st->cstack) ?
+ CSTATE_START_COMMAND : CSTATE_SKIP_COMMAND;
+ break;
+
+ /*
+ * Retry the failed transaction if possible.
+ */
+ case CSTATE_RETRY:
+ {
+ PQExpBufferData errmsg_buf;
+
+ command = sql_script[st->use_file].commands[st->first_failure.command];
+
+ if (canRetry(st, &now))
+ {
+ /*
+ * The failed transaction will be retried. So accumulate
+ * the retry.
+ */
+ st->retries++;
+ command->retries++;
+
+ /*
+ * Report this with failures to indicate that the failed
+ * transaction will be retried.
+ */
+ initPQExpBuffer(&errmsg_buf);
+ printfPQExpBuffer(&errmsg_buf,
+ "client %d repeats the failed transaction (try %d",
+ st->id, st->retries + 1);
+ if (max_tries)
+ appendPQExpBuffer(&errmsg_buf, "/%d", max_tries);
+ if (latency_limit)
+ {
+ appendPQExpBuffer(&errmsg_buf,
+ ", %.3f%% of the maximum time of tries was used",
+ getLatencyUsed(st, &now));
+ }
+ appendPQExpBufferStr(&errmsg_buf, ")\n");
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
+ (errmsg("%s", errmsg_buf.data)));
+ termPQExpBuffer(&errmsg_buf);
+
+ /*
+ * Reset the execution parameters as they were at the
+ * beginning of the transaction.
+ */
+ copyRandomState(&st->random_state,
+ &st->retry_state.random_state);
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* Process the first transaction command */
+ st->command = 0;
+ st->first_failure.status = NO_FAILURE;
+ st->state = CSTATE_START_COMMAND;
+ }
+ else
+ {
+ /*
+ * We will not be able to retry this failed transaction.
+ * So accumulate the error.
+ */
+ command->errors++;
+ if (st->first_failure.status ==
+ IN_FAILED_SQL_TRANSACTION)
+ command->errors_in_failed_tx++;
+
+ /*
+ * Report this with failures to indicate that the failed
+ * transaction will not be retried.
+ */
+ initPQExpBuffer(&errmsg_buf);
+ printfPQExpBuffer(&errmsg_buf,
+ "client %d ends the failed transaction (try %d",
+ st->id, st->retries + 1);
+
+ /*
+ * Report the actual number and/or time of tries. We do
+ * not need this information if this type of failure can
+ * never be retried.
+ */
+ if (canRetryFailure(st->first_failure.status))
+ {
+ if (max_tries)
+ {
+ appendPQExpBuffer(&errmsg_buf, "/%d",
+ max_tries);
+ }
+ if (latency_limit)
+ {
+ appendPQExpBuffer(&errmsg_buf,
+ ", %.3f%% of the maximum time of tries was used",
+ getLatencyUsed(st, &now));
+ }
+ }
+ appendPQExpBufferStr(&errmsg_buf, ")\n");
+ ereport(ELEVEL_LOG_CLIENT_FAIL,
+ (errmsg("%s", errmsg_buf.data)));
+ termPQExpBuffer(&errmsg_buf);
+
+ /*
+ * Reset the execution parameters as they were at the
+ * beginning of the transaction, except for the random
+ * state.
+ */
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* End the failed transaction */
+ st->state = CSTATE_END_TX;
+ }
+ }
+ break;
+
+ /*
* End of transaction.
*/
case CSTATE_END_TX:
@@ -3458,7 +4005,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
INSTR_TIME_SET_ZERO(now);
}
- if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
+ if ((getTotalCnt(st) >= nxacts && duration <= 0) ||
+ timer_exceeded)
{
/* exit success */
st->state = CSTATE_FINISHED;
@@ -3534,13 +4082,15 @@ doLog(TState *thread, CState *st,
while (agg->start_time + agg_interval <= now)
{
/* print aggregated report to logfile */
- fprintf(logfile, "%ld " INT64_FORMAT " %.0f %.0f %.0f %.0f",
+ fprintf(logfile, "%ld " INT64_FORMAT " %.0f %.0f %.0f %.0f " INT64_FORMAT " " INT64_FORMAT,
(long) agg->start_time,
agg->cnt,
agg->latency.sum,
agg->latency.sum2,
agg->latency.min,
- agg->latency.max);
+ agg->latency.max,
+ agg->errors,
+ agg->errors_in_failed_tx);
if (throttle_delay)
{
fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -3551,6 +4101,10 @@ doLog(TState *thread, CState *st,
if (latency_limit)
fprintf(logfile, " " INT64_FORMAT, agg->skipped);
}
+ if (max_tries > 1 || latency_limit)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->retried,
+ agg->retries);
fputc('\n', logfile);
/* reset data and move to next interval */
@@ -3558,7 +4112,8 @@ doLog(TState *thread, CState *st,
}
/* accumulate the current transaction */
- accumStats(agg, skipped, latency, lag);
+ accumStats(agg, skipped, latency, lag, st->first_failure.status,
+ st->retries);
}
else
{
@@ -3568,14 +4123,25 @@ doLog(TState *thread, CState *st,
gettimeofday(&tv, NULL);
if (skipped)
fprintf(logfile, "%d " INT64_FORMAT " skipped %d %ld %ld",
- st->id, st->cnt, st->use_file,
+ st->id, getTotalCnt(st), st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
- else
+ else if (st->first_failure.status == NO_FAILURE)
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d %ld %ld",
- st->id, st->cnt, latency, st->use_file,
+ st->id, getTotalCnt(st), latency, st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+ else if (st->first_failure.status == IN_FAILED_SQL_TRANSACTION)
+ fprintf(logfile, "%d " INT64_FORMAT " in_failed_tx %d %ld %ld",
+ st->id, getTotalCnt(st), st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
+ else
+ fprintf(logfile, "%d " INT64_FORMAT " failed %d %ld %ld",
+ st->id, getTotalCnt(st), st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+
if (throttle_delay)
fprintf(logfile, " %.0f", lag);
+ if (max_tries > 1 || latency_limit)
+ fprintf(logfile, " %d", st->retries);
fputc('\n', logfile);
}
}
@@ -3595,7 +4161,8 @@ processXactStats(TState *thread, CState *st, instr_time *now,
bool thread_details = progress || throttle_delay || latency_limit,
detailed = thread_details || use_log || per_script_stats;
- if (detailed && !skipped)
+ if (detailed && !skipped &&
+ (st->first_failure.status == NO_FAILURE || latency_limit))
{
if (INSTR_TIME_IS_ZERO(*now))
INSTR_TIME_SET_CURRENT(*now);
@@ -3608,7 +4175,8 @@ processXactStats(TState *thread, CState *st, instr_time *now,
if (thread_details)
{
/* keep detailed thread stats */
- accumStats(&thread->stats, skipped, latency, lag);
+ accumStats(&thread->stats, skipped, latency, lag,
+ st->first_failure.status, st->retries);
/* count transactions over the latency limit, if needed */
if (latency_limit && latency > latency_limit)
@@ -3616,19 +4184,24 @@ processXactStats(TState *thread, CState *st, instr_time *now,
}
else
{
- /* no detailed stats, just count */
- thread->stats.cnt++;
+ /* no detailed stats */
+ accumStats(&thread->stats, skipped, 0, 0, st->first_failure.status,
+ st->retries);
}
/* client stat is just counting */
- st->cnt++;
+ if (st->first_failure.status == NO_FAILURE)
+ st->cnt++;
+ else
+ st->ecnt++;
if (use_log)
doLog(thread, st, agg, skipped, latency, lag);
/* XXX could use a mutex here, but we choose not to */
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+ st->first_failure.status, st->retries);
}
@@ -3648,7 +4221,7 @@ disconnect_all(CState *state, int length)
static void
initDropTables(PGconn *con)
{
- ereport(ELEVEL_LOG, (errmsg("dropping old tables...\n")));
+ ereport(ELEVEL_LOG_MAIN, (errmsg("dropping old tables...\n")));
/*
* We drop all the tables in one command, so that whether there are
@@ -3723,7 +4296,7 @@ initCreateTables(PGconn *con)
};
int i;
- ereport(ELEVEL_LOG, (errmsg("creating tables...\n")));
+ ereport(ELEVEL_LOG_MAIN, (errmsg("creating tables...\n")));
for (i = 0; i < lengthof(DDLs); i++)
{
@@ -3776,7 +4349,7 @@ initGenerateData(PGconn *con)
remaining_sec;
int log_interval = 1;
- ereport(ELEVEL_LOG, (errmsg("generating data...\n")));
+ ereport(ELEVEL_LOG_MAIN, (errmsg("generating data...\n")));
/*
* we do all of this in one transaction to enable the backend's
@@ -3849,7 +4422,7 @@ initGenerateData(PGconn *con)
elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
remaining_sec = ((double) scale * naccounts - j) * elapsed_sec / j;
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg(INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n",
j, (int64) naccounts * scale,
(int) (((int64) j * 100) /
@@ -3868,7 +4441,7 @@ initGenerateData(PGconn *con)
/* have we reached the next interval (or end)? */
if ((j == scale * naccounts) || (elapsed_sec >= log_interval * LOG_STEP_SECONDS))
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg(INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n",
j, (int64) naccounts * scale,
(int) (((int64) j * 100) /
@@ -3895,7 +4468,7 @@ initGenerateData(PGconn *con)
static void
initVacuum(PGconn *con)
{
- ereport(ELEVEL_LOG, (errmsg("vacuuming...\n")));
+ ereport(ELEVEL_LOG_MAIN, (errmsg("vacuuming...\n")));
executeStatement(con, "vacuum analyze pgbench_branches");
executeStatement(con, "vacuum analyze pgbench_tellers");
executeStatement(con, "vacuum analyze pgbench_accounts");
@@ -3915,7 +4488,7 @@ initCreatePKeys(PGconn *con)
};
int i;
- ereport(ELEVEL_LOG, (errmsg("creating primary keys...\n")));
+ ereport(ELEVEL_LOG_MAIN, (errmsg("creating primary keys...\n")));
for (i = 0; i < lengthof(DDLINDEXes); i++)
{
char buffer[256];
@@ -3952,7 +4525,7 @@ initCreateFKeys(PGconn *con)
};
int i;
- ereport(ELEVEL_LOG, (errmsg("creating foreign keys...\n")));
+ ereport(ELEVEL_LOG_MAIN, (errmsg("creating foreign keys...\n")));
for (i = 0; i < lengthof(DDLKEYs); i++)
{
executeStatement(con, DDLKEYs[i]);
@@ -4023,7 +4596,7 @@ runInitSteps(const char *initialize_steps)
case ' ':
break; /* ignore */
default:
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("unrecognized initialization step \"%c\"\n",
*step)));
PQfinish(con);
@@ -4031,7 +4604,7 @@ runInitSteps(const char *initialize_steps)
}
}
- ereport(ELEVEL_LOG, (errmsg("done.\n")));
+ ereport(ELEVEL_LOG_MAIN, (errmsg("done.\n")));
PQfinish(con);
}
@@ -4069,7 +4642,7 @@ parseQuery(Command *cmd)
if (cmd->argc >= MAX_ARGS)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("statement has too many arguments (maximum is %d): %s\n",
MAX_ARGS - 1, cmd->argv[0])));
pg_free(name);
@@ -4109,7 +4682,7 @@ pgbench_error(const char *fmt,...)
va_end(ap);
} while (!done);
- ereport(ELEVEL_LOG, (errmsg("%s", errmsg_buf.data)));
+ ereport(ELEVEL_LOG_MAIN, (errmsg("%s", errmsg_buf.data)));
termPQExpBuffer(&errmsg_buf);
}
@@ -4154,7 +4727,7 @@ syntax_error(const char *source, int lineno,
}
}
- ereport(ELEVEL_LOG, (errmsg("%s", errmsg_buf.data)));
+ ereport(ELEVEL_LOG_MAIN, (errmsg("%s", errmsg_buf.data)));
termPQExpBuffer(&errmsg_buf);
exit(1);
}
@@ -4648,7 +5221,7 @@ listAvailableScripts(void)
appendPQExpBuffer(&errmsg_buf, "\t%s\n", builtin_script[i].name);
appendPQExpBufferChar(&errmsg_buf, '\n');
- ereport(ELEVEL_LOG, (errmsg("%s", errmsg_buf.data)));
+ ereport(ELEVEL_LOG_MAIN, (errmsg("%s", errmsg_buf.data)));
termPQExpBuffer(&errmsg_buf);
}
@@ -4677,12 +5250,12 @@ findBuiltin(const char *name)
/* error cases */
if (found == 0)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("no builtin script found for name \"%s\"\n", name)));
}
else
{ /* found > 1 */
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("ambiguous builtin name: %d builtin scripts found for prefix \"%s\"\n",
found, name)));
}
@@ -4782,7 +5355,8 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
double time_include,
tps_include,
tps_exclude;
- int64 ntx = total->cnt - total->skipped;
+ int64 ntx = total->cnt - total->skipped,
+ total_ntx = total->cnt + total->errors;
int i,
totalCacheOverflows = 0;
@@ -4803,8 +5377,8 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (duration <= 0)
{
printf("number of transactions per client: %d\n", nxacts);
- printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
- ntx, nxacts * nclients);
+ printf("number of transactions actually processed: " INT64_FORMAT "/" INT64_FORMAT "\n",
+ ntx, total_ntx);
}
else
{
@@ -4812,6 +5386,43 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printf("number of transactions actually processed: " INT64_FORMAT "\n",
ntx);
}
+
+ if (total->errors > 0)
+ printf("number of errors: " INT64_FORMAT " (%.3f%%)\n",
+ total->errors, 100.0 * total->errors / total_ntx);
+
+ if (total->errors_in_failed_tx > 0)
+ printf("number of errors \"in failed SQL transaction\": " INT64_FORMAT " (%.3f%%)\n",
+ total->errors_in_failed_tx,
+ 100.0 * total->errors_in_failed_tx / total_ntx);
+
+ /*
+ * It can be non-zero only if max_tries is greater than one or
+ * latency_limit is used.
+ */
+ if (total->retried > 0)
+ {
+ printf("number of retried: " INT64_FORMAT " (%.3f%%)\n",
+ total->retried, 100.0 * total->retried / total_ntx);
+ printf("number of retries: " INT64_FORMAT "\n", total->retries);
+ }
+
+ if (max_tries)
+ printf("maximum number of tries: %d\n", max_tries);
+
+ if (latency_limit)
+ {
+ printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "/" INT64_FORMAT " (%.3f %%)",
+ latency_limit / 1000.0, latency_late, total_ntx,
+ (total_ntx > 0) ? 100.0 * latency_late / total_ntx : 0.0);
+
+ /* these statistics include both successful and failed transactions */
+ if (total->errors > 0)
+ printf(" (including errors)");
+
+ printf("\n");
+ }
+
/* Report zipfian cache overflow */
for (i = 0; i < nthreads; i++)
{
@@ -4831,18 +5442,19 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
total->skipped,
100.0 * total->skipped / total->cnt);
- if (latency_limit)
- printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "/" INT64_FORMAT " (%.3f %%)\n",
- latency_limit / 1000.0, latency_late, ntx,
- (ntx > 0) ? 100.0 * latency_late / ntx : 0.0);
-
if (throttle_delay || progress || latency_limit)
printSimpleStats("latency", &total->latency);
else
{
/* no measurement, show average latency computed from run time */
- printf("latency average = %.3f ms\n",
- 1000.0 * time_include * nclients / total->cnt);
+ printf("latency average = %.3f ms",
+ 1000.0 * time_include * nclients / total_ntx);
+
+ /* these statistics include both successful and failed transactions */
+ if (total->errors > 0)
+ printf(" (including errors)");
+
+ printf("\n");
}
if (throttle_delay)
@@ -4861,7 +5473,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printf("tps = %f (excluding connections establishing)\n", tps_exclude);
/* Report per-script/command statistics */
- if (per_script_stats || is_latencies)
+ if (per_script_stats || report_per_command)
{
int i;
@@ -4870,6 +5482,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (per_script_stats)
{
StatsData *sstats = &sql_script[i].stats;
+ int64 script_total_ntx = sstats->cnt + sstats->errors;
printf("SQL script %d: %s\n"
" - weight: %d (targets %.1f%% of total)\n"
@@ -4878,9 +5491,33 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
sql_script[i].weight,
100.0 * sql_script[i].weight / total_weight,
sstats->cnt,
- 100.0 * sstats->cnt / total->cnt,
+ 100.0 * sstats->cnt / script_total_ntx,
(sstats->cnt - sstats->skipped) / time_include);
+ if (total->errors > 0)
+ printf(" - number of errors: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->errors,
+ 100.0 * sstats->errors / script_total_ntx);
+
+ if (total->errors_in_failed_tx > 0)
+ printf(" - number of errors \"in failed SQL transaction\": " INT64_FORMAT " (%.3f%%)\n",
+ sstats->errors_in_failed_tx,
+ (100.0 * sstats->errors_in_failed_tx /
+ script_total_ntx));
+
+ /*
+ * It can be non-zero only if max_tries is greater than one or
+ * latency_limit is used.
+ */
+ if (total->retried > 0)
+ {
+ printf(" - number of retried: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->retried,
+ 100.0 * sstats->retried / script_total_ntx);
+ printf(" - number of retries: " INT64_FORMAT "\n",
+ sstats->retries);
+ }
+
if (throttle_delay && latency_limit && sstats->cnt > 0)
printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
sstats->skipped,
@@ -4889,15 +5526,33 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printSimpleStats(" - latency", &sstats->latency);
}
- /* Report per-command latencies */
- if (is_latencies)
+ /* Report per-command latencies and errors */
+ if (report_per_command)
{
Command **commands;
if (per_script_stats)
- printf(" - statement latencies in milliseconds:\n");
+ printf(" - statement latencies in milliseconds");
else
- printf("statement latencies in milliseconds:\n");
+ printf("statement latencies in milliseconds");
+
+ if (total->errors > 0)
+ {
+ printf("%s errors",
+ ((total->errors_in_failed_tx == 0 &&
+ total->retried == 0) ?
+ " and" : ","));
+ }
+ if (total->errors_in_failed_tx > 0)
+ {
+ printf("%s errors \"in failed SQL transaction\"",
+ total->retried == 0 ? " and" : ",");
+ }
+ if (total->retried > 0)
+ {
+ printf(" and retries");
+ }
+ printf(":\n");
for (commands = sql_script[i].commands;
*commands != NULL;
@@ -4905,10 +5560,25 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
{
SimpleStats *cstats = &(*commands)->stats;
- printf(" %11.3f %s\n",
+ printf(" %11.3f",
(cstats->count > 0) ?
- 1000.0 * cstats->sum / cstats->count : 0.0,
- (*commands)->line);
+ 1000.0 * cstats->sum / cstats->count : 0.0);
+ if (total->errors > 0)
+ {
+ printf(" %20" INT64_MODIFIER "d",
+ (*commands)->errors);
+ }
+ if (total->errors_in_failed_tx > 0)
+ {
+ printf(" %20" INT64_MODIFIER "d",
+ (*commands)->errors_in_failed_tx);
+ }
+ if (total->retried > 0)
+ {
+ printf(" %20" INT64_MODIFIER "d",
+ (*commands)->retries);
+ }
+ printf(" %s\n", (*commands)->line);
}
}
}
@@ -4937,7 +5607,7 @@ set_random_seed(const char *seed)
if (!pg_strong_random(&iseed, sizeof(iseed)))
#endif
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("cannot seed random from a strong source, none available: use \"time\" or an unsigned integer value.\n")));
return false;
}
@@ -4949,7 +5619,7 @@ set_random_seed(const char *seed)
if (sscanf(seed, "%u%c", &iseed, &garbage) != 1)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("unrecognized random seed option \"%s\": expecting an unsigned integer, \"time\" or \"rand\"\n",
seed)));
return false;
@@ -4958,7 +5628,7 @@ set_random_seed(const char *seed)
if (seed != NULL)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("setting random seed to %u\n", iseed)));
}
srandom(iseed);
@@ -4987,7 +5657,7 @@ main(int argc, char **argv)
{"builtin", required_argument, NULL, 'b'},
{"client", required_argument, NULL, 'c'},
{"connect", no_argument, NULL, 'C'},
- {"debug", no_argument, NULL, 'd'},
+ {"debug", required_argument, NULL, 'd'},
{"define", required_argument, NULL, 'D'},
{"file", required_argument, NULL, 'f'},
{"fillfactor", required_argument, NULL, 'F'},
@@ -5002,7 +5672,7 @@ main(int argc, char **argv)
{"progress", required_argument, NULL, 'P'},
{"protocol", required_argument, NULL, 'M'},
{"quiet", no_argument, NULL, 'q'},
- {"report-latencies", no_argument, NULL, 'r'},
+ {"report-per-command", no_argument, NULL, 'r'},
{"rate", required_argument, NULL, 'R'},
{"scale", required_argument, NULL, 's'},
{"select-only", no_argument, NULL, 'S'},
@@ -5021,6 +5691,7 @@ main(int argc, char **argv)
{"log-prefix", required_argument, NULL, 7},
{"foreign-keys", no_argument, NULL, 8},
{"random-seed", required_argument, NULL, 9},
+ {"max-tries", required_argument, NULL, 10},
{NULL, 0, NULL, 0}
};
@@ -5096,7 +5767,7 @@ main(int argc, char **argv)
(errmsg("error while setting random seed from PGBENCH_RANDOM_SEED environment variable\n")));
}
- while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
+ while ((c = getopt_long(argc, argv, "iI:h:nvp:d:qb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
{
char *script;
@@ -5126,8 +5797,22 @@ main(int argc, char **argv)
pgport = pg_strdup(optarg);
break;
case 'd':
- debug++;
- break;
+ {
+ for (debug_level = 0;
+ debug_level < NUM_DEBUGLEVEL;
+ debug_level++)
+ {
+ if (strcmp(optarg, DEBUGLEVEL[debug_level]) == 0)
+ break;
+ }
+ if (debug_level >= NUM_DEBUGLEVEL)
+ {
+ ereport(ELEVEL_FATAL,
+ (errmsg("invalid debug level (-d): \"%s\"\n",
+ optarg)));
+ }
+ break;
+ }
case 'c':
benchmarking_option_set = true;
nclients = atoi(optarg);
@@ -5180,7 +5865,7 @@ main(int argc, char **argv)
break;
case 'r':
benchmarking_option_set = true;
- is_latencies = true;
+ report_per_command = true;
break;
case 's':
scale_given = true;
@@ -5379,6 +6064,20 @@ main(int argc, char **argv)
(errmsg("error while setting random seed from --random-seed option\n")));
}
break;
+ case 10: /* max-tries */
+ {
+ int32 max_tries_arg = atoi(optarg);
+
+ if (max_tries_arg <= 0)
+ {
+ ereport(ELEVEL_FATAL,
+ (errmsg("invalid number of maximum tries: \"%s\"\n",
+ optarg)));
+ }
+ benchmarking_option_set = true;
+ max_tries = (uint32) max_tries_arg;
+ }
+ break;
default:
ereport(ELEVEL_FATAL,
(errmsg(_("Try \"%s --help\" for more information.\n"),
@@ -5551,6 +6250,10 @@ main(int argc, char **argv)
(errmsg("--progress-timestamp is allowed only under --progress\n")));
}
+ /* If necessary, set the default tries limit */
+ if (!max_tries && !latency_limit)
+ max_tries = 1;
+
/*
* save main process id in the global variable because process id will be
* changed after fork.
@@ -5635,7 +6338,7 @@ main(int argc, char **argv)
PQdb(con));
}
- ereport(ELEVEL_LOG, (errmsg("%s", errmsg_buf.data)));
+ ereport(ELEVEL_LOG_MAIN, (errmsg("%s", errmsg_buf.data)));
termPQExpBuffer(&errmsg_buf);
exit(1);
}
@@ -5650,7 +6353,7 @@ main(int argc, char **argv)
/* warn if we override user-given -s switch */
if (scale_given)
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("scale option ignored, using count from pgbench_branches table (%d)\n",
scale)));
}
@@ -5702,18 +6405,18 @@ main(int argc, char **argv)
if (!is_no_vacuum)
{
- ereport(ELEVEL_LOG, (errmsg("starting vacuum...")));
+ ereport(ELEVEL_LOG_MAIN, (errmsg("starting vacuum...")));
tryExecuteStatement(con, "vacuum pgbench_branches");
tryExecuteStatement(con, "vacuum pgbench_tellers");
tryExecuteStatement(con, "truncate pgbench_history");
- ereport(ELEVEL_LOG, (errmsg("end.\n")));
+ ereport(ELEVEL_LOG_MAIN, (errmsg("end.\n")));
if (do_vacuum_accounts)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("starting vacuum pgbench_accounts...")));
tryExecuteStatement(con, "vacuum analyze pgbench_accounts");
- ereport(ELEVEL_LOG, (errmsg("end.\n")));
+ ereport(ELEVEL_LOG_MAIN, (errmsg("end.\n")));
}
}
PQfinish(con);
@@ -5813,6 +6516,10 @@ main(int argc, char **argv)
mergeSimpleStats(&stats.lag, &thread->stats.lag);
stats.cnt += thread->stats.cnt;
stats.skipped += thread->stats.skipped;
+ stats.retries += thread->stats.retries;
+ stats.retried += thread->stats.retried;
+ stats.errors += thread->stats.errors;
+ stats.errors_in_failed_tx += thread->stats.errors_in_failed_tx;
latency_late += thread->latency_late;
INSTR_TIME_ADD(conn_total_time, thread->conn_time);
}
@@ -5882,7 +6589,7 @@ threadRun(void *arg)
if (thread->logfile == NULL)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("could not open logfile \"%s\": %s\n",
logpath, strerror(errno))));
goto done;
@@ -5962,7 +6669,7 @@ threadRun(void *arg)
if (sock < 0)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("invalid socket: %s",
PQerrorMessage(st->con))));
goto done;
@@ -6040,7 +6747,7 @@ threadRun(void *arg)
continue;
}
/* must be something wrong */
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("select() failed: %s\n", strerror(errno))));
goto done;
}
@@ -6065,7 +6772,7 @@ threadRun(void *arg)
if (sock < 0)
{
- ereport(ELEVEL_LOG,
+ ereport(ELEVEL_LOG_MAIN,
(errmsg("invalid socket: %s",
PQerrorMessage(st->con))));
goto done;
@@ -6101,7 +6808,11 @@ threadRun(void *arg)
/* generate and show report */
StatsData cur;
int64 run = now - last_report,
- ntx;
+ ntx,
+ retries,
+ retried,
+ errors,
+ errors_in_failed_tx;
double tps,
total_run,
latency,
@@ -6129,6 +6840,11 @@ threadRun(void *arg)
mergeSimpleStats(&cur.lag, &thread[i].stats.lag);
cur.cnt += thread[i].stats.cnt;
cur.skipped += thread[i].stats.skipped;
+ cur.retries += thread[i].stats.retries;
+ cur.retried += thread[i].stats.retried;
+ cur.errors += thread[i].stats.errors;
+ cur.errors_in_failed_tx +=
+ thread[i].stats.errors_in_failed_tx;
}
/* we count only actually executed transactions */
@@ -6146,6 +6862,11 @@ threadRun(void *arg)
{
latency = sqlat = stdev = lag = 0;
}
+ retries = cur.retries - last.retries;
+ retried = cur.retried - last.retried;
+ errors = cur.errors - last.errors;
+ errors_in_failed_tx = cur.errors_in_failed_tx -
+ last.errors_in_failed_tx;
if (progress_timestamp)
{
@@ -6172,6 +6893,16 @@ threadRun(void *arg)
"progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
tbuf, tps, latency, stdev);
+ if (errors > 0)
+ {
+ appendPQExpBuffer(&progress_buf,
+ ", " INT64_FORMAT " failed" , errors);
+ if (errors_in_failed_tx > 0)
+ appendPQExpBuffer(&progress_buf,
+ " (" INT64_FORMAT " in failed tx)",
+ errors_in_failed_tx);
+ }
+
if (throttle_delay)
{
appendPQExpBuffer(&progress_buf, ", lag %.3f ms", lag);
@@ -6180,9 +6911,20 @@ threadRun(void *arg)
", " INT64_FORMAT " skipped",
cur.skipped - last.skipped);
}
+
+ /*
+ * It can be non-zero only if max_tries is greater than one or
+ * latency_limit is used.
+ */
+ if (retried > 0)
+ {
+ appendPQExpBuffer(&progress_buf,
+ ", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
+ retried, retries);
+ }
appendPQExpBufferChar(&progress_buf, '\n');
- ereport(ELEVEL_LOG, (errmsg("%s", progress_buf.data)));
+ ereport(ELEVEL_LOG_MAIN, (errmsg("%s", progress_buf.data)));
termPQExpBuffer(&progress_buf);
last = cur;
@@ -6368,12 +7110,21 @@ errstartImpl(ErrorLevel elevel)
* Print the message only if there's a debugging mode for all types
* of messages.
*/
- start_error_reporting = debug;
+ start_error_reporting = debug_level >= DEBUG_ALL;
+ break;
+ case ELEVEL_LOG_CLIENT_FAIL:
+ case ELEVEL_LOG_CLIENT_FAIL:
+ /*
+ * Print a failure message only if the debug level is at least
+ * DEBUG_FAILS.
+ */
+ start_error_reporting = debug_level >= DEBUG_FAILS;
break;
- case ELEVEL_LOG:
+ case ELEVEL_LOG_CLIENT_ABORTED:
+ case ELEVEL_LOG_MAIN:
case ELEVEL_FATAL:
/*
- * Always print the error/log message.
+ * Always print an error message if the client is aborted or this is
+ * the main program error/log message.
*/
start_error_reporting = true;
break;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 00fb04f..9a0ea00 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -132,7 +132,8 @@ pgbench(
qr{builtin: TPC-B},
qr{clients: 2\b},
qr{processed: 10/10},
- qr{mode: simple}
+ qr{mode: simple},
+ qr{maximum number of tries: 1}
],
[qr{^$}],
'pgbench tpcb-like');
@@ -151,7 +152,7 @@ pgbench(
'pgbench simple update');
pgbench(
- '-t 100 -c 7 -M prepared -b se --debug',
+ '-t 100 -c 7 -M prepared -b se --debug all',
0,
[
qr{builtin: select only},
@@ -546,6 +547,11 @@ my @errors = (
SELECT LEAST(:i, :i, :i, :i, :i, :i, :i, :i, :i, :i, :i);
}
],
+ [ 'sql division by zero', 0, [qr{ERROR: division by zero}],
+ q{-- SQL division by zero
+SELECT 1 / 0;
+}
+ ],
# SHELL
[
@@ -718,6 +724,17 @@ SELECT LEAST(:i, :i, :i, :i, :i, :i, :i, :i, :i, :i, :i);
[qr{unrecognized time unit}], q{\sleep 1 week}
],
+ # CONDITIONAL BLOCKS
+ [ 'if elif failed conditions', 0,
+ [qr{division by zero}],
+ q{-- failed conditions
+\if 1 / 0
+\elif 1 / 0
+\else
+\endif
+}
+ ],
+
# MISC
[
'misc invalid backslash command', 1,
@@ -736,14 +753,33 @@ for my $e (@errors)
my $n = '001_pgbench_error_' . $name;
$n =~ s/ /_/g;
pgbench(
- '-n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX -M prepared',
+ '-n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX -M prepared -d fails',
$status,
- [ $status ? qr{^$} : qr{processed: 0/1} ],
+ ($status ?
+ [ qr{^$} ] :
+ [ qr{processed: 0/1}, qr{number of errors: 1 \(100.000%\)},
+ qr{^((?!number of retried)(.|\n))*$} ]),
$re,
'pgbench script error: ' . $name,
{ $n => $script });
}
+# reset client variables in case of failure
+pgbench(
+ '-n -t 2 -d fails', 0,
+ [ qr{processed: 0/2}, qr{number of errors: 2 \(100.000%\)},
+ qr{^((?!number of retried)(.|\n))*$} ],
+ [ qr{(client 0 got a failure in command 1 \(SQL\) of script 0; ERROR: syntax error at or near ":"(.|\n)*){2}} ],
+ 'pgbench reset client variables in case of failure',
+ { '001_pgbench_reset_client_variables' => q{
+BEGIN;
+-- select an unassigned variable
+SELECT :unassigned_var;
+\set unassigned_var 1
+END;
+}
+ });
+
# zipfian cache array overflow
pgbench(
'-t 1', 0,
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index a9e067b..b262d5d 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -59,7 +59,7 @@ my @options = (
# name, options, stderr checks
[
'bad option',
- '-h home -p 5432 -U calvin -d --bad-option',
+ '-h home -p 5432 -U calvin -d all --bad-option',
[ qr{(unrecognized|illegal) option}, qr{--help.*more information} ]
],
[
@@ -151,6 +151,11 @@ my @options = (
qr{error while setting random seed from --random-seed option}
]
],
+ [
+ 'bad maximum number of tries',
+ '--max-tries -10',
+ [ qr{invalid number of maximum tries: "-10"} ]
+ ],
# loging sub-options
[
diff --git a/src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl b/src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl
new file mode 100644
index 0000000..5e45cb1
--- /dev/null
+++ b/src/bin/pgbench/t/003_serialization_and_deadlock_fails.pl
@@ -0,0 +1,761 @@
+use strict;
+use warnings;
+
+use Config;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 34;
+
+use constant
+{
+ READ_COMMITTED => 0,
+ REPEATABLE_READ => 1,
+ SERIALIZABLE => 2,
+};
+
+my @isolation_level_shell = (
+ 'read\\ committed',
+ 'repeatable\\ read',
+ 'serializable');
+
+# The keys of advisory locks for testing deadlock failures:
+use constant
+{
+ DEADLOCK_1 => 3,
+ WAIT_PGBENCH_2 => 4,
+ DEADLOCK_2 => 5,
+ TRANSACTION_ENDS_1 => 6,
+ TRANSACTION_ENDS_2 => 7,
+};
+
+# Test concurrent update in table row.
+my $node = get_new_node('main');
+$node->init;
+$node->start;
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2), (2, 3);');
+
+my $script_serialization = $node->basedir . '/pgbench_script_serialization';
+append_to_file($script_serialization,
+ "\\set delta random(-5000, 5000)\n"
+ . "BEGIN;\n"
+ . "SELECT pg_sleep(1);\n"
+ . "UPDATE xy SET y = y + :delta "
+ . "WHERE x = 1 AND pg_advisory_lock(0) IS NOT NULL;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "END;\n");
+
+my $script_deadlocks1 = $node->basedir . '/pgbench_script_deadlocks1';
+append_to_file($script_deadlocks1,
+ "BEGIN;\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_1 . ");\n"
+ . "SELECT pg_advisory_lock(" . WAIT_PGBENCH_2 . ");\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_2 . ");\n"
+ . "END;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "SELECT pg_advisory_lock(" . TRANSACTION_ENDS_1 . ");\n"
+ . "SELECT pg_advisory_unlock_all();");
+
+my $script_deadlocks2 = $node->basedir . '/pgbench_script_deadlocks2';
+append_to_file($script_deadlocks2,
+ "BEGIN;\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_2 . ");\n"
+ . "SELECT pg_advisory_lock(" . DEADLOCK_1 . ");\n"
+ . "END;\n"
+ . "SELECT pg_advisory_unlock_all();\n"
+ . "SELECT pg_advisory_lock(" . TRANSACTION_ENDS_2 . ");\n"
+ . "SELECT pg_advisory_unlock_all();");
+
+sub test_pgbench_serialization_errors
+{
+ my ($max_tries, $latency_limit, $test_name) = @_;
+
+ my $isolation_level = REPEATABLE_READ;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $err_pgbench);
+
+ # Open a psql session, run a parallel transaction and acquire an advisory
+ # lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql = "begin;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /BEGIN/;
+
+ $in_psql =
+ "update xy set y = y + 1 "
+ . "where x = 1 and pg_advisory_lock(0) is not null;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /UPDATE 1/;
+
+ my $retry_options =
+ ($max_tries ? "--max-tries $max_tries" : "")
+ . ($latency_limit ? "--latency-limit $latency_limit" : "");
+
+ # Start pgbench:
+ my @command = (
+ qw(pgbench --no-vacuum --transactions 1 --debug fails --file),
+ $script_serialization,
+ split /\s+/, $retry_options);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$err_pgbench;
+
+ # Wait until pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select * from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = 0::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /1 row/);
+
+ # In psql, commit the transaction, release advisory locks and end the
+ # session:
+ $in_psql = "end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "select pg_advisory_unlock_all();\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_results on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+
+ like($out_pgbench,
+ qr{processed: 0/1},
+ "$test_name: check processed transactions");
+
+ like($out_pgbench,
+ qr{number of errors: 1 \(100\.000%\)},
+ "$test_name: check errors");
+
+ like($out_pgbench,
+ qr{^((?!number of retried)(.|\n))*$},
+ "$test_name: check retried");
+
+ if ($max_tries)
+ {
+ like($out_pgbench,
+ qr{maximum number of tries: $max_tries},
+ "$test_name: check the maximum number of tries");
+ }
+ else
+ {
+ like($out_pgbench,
+ qr{^((?!maximum number of tries)(.|\n))*$},
+ "$test_name: check the maximum number of tries");
+ }
+
+ if ($latency_limit)
+ {
+ like($out_pgbench,
+ qr{number of transactions above the $latency_limit\.0 ms latency limit: 1/1 \(100.000 \%\) \(including errors\)},
+ "$test_name: check transactions above latency limit");
+ }
+ else
+ {
+ like($out_pgbench,
+ qr{^((?!latency limit)(.|\n))*$},
+ "$test_name: check transactions above latency limit");
+ }
+
+ my $pattern =
+ "client 0 got a failure in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update";
+
+ like($err_pgbench,
+ qr{$pattern},
+ "$test_name: check serialization failure");
+}
+
+sub test_pgbench_serialization_failures
+{
+ my $isolation_level = REPEATABLE_READ;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h_pgbench, $in_pgbench, $out_pgbench, $err_pgbench);
+
+ # Open a psql session, run a parallel transaction and acquire an advisory
+ # lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql = "begin;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /BEGIN/;
+
+ $in_psql =
+ "update xy set y = y + 1 "
+ . "where x = 1 and pg_advisory_lock(0) is not null;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /UPDATE 1/;
+
+ # Start pgbench:
+ my @command = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --max-tries 2),
+ "--file",
+ $script_serialization);
+ print "# Running: " . join(" ", @command) . "\n";
+ $h_pgbench = IPC::Run::start \@command, \$in_pgbench, \$out_pgbench,
+ \$err_pgbench;
+
+ # Wait until pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select * from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = 0::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /1 row/);
+
+ # In psql, commit the transaction, release advisory locks and end the
+ # session:
+ $in_psql = "end;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /COMMIT/;
+
+ $in_psql = "select pg_advisory_unlock_all();\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get pgbench results
+ $h_pgbench->pump() until length $out_pgbench;
+ $h_pgbench->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_results on Windows instead.
+ my $result =
+ ($Config{osname} eq "MSWin32")
+ ? ($h_pgbench->full_results)[0]
+ : $h_pgbench->result(0);
+
+ # Check pgbench results
+ ok(!$result, "@command exit code 0");
+
+ like($out_pgbench,
+ qr{processed: 1/1},
+ "concurrent update with retrying: check processed transactions");
+
+ like($out_pgbench,
+ qr{^((?!number of errors)(.|\n))*$},
+ "concurrent update with retrying: check errors");
+
+ like($out_pgbench,
+ qr{number of retried: 1 \(100\.000%\)},
+ "concurrent update with retrying: check retried");
+
+ like($out_pgbench,
+ qr{number of retries: 1},
+ "concurrent update with retrying: check retries");
+
+ like($out_pgbench,
+ qr{latency average = \d+\.\d{3} ms\n},
+ "concurrent update with retrying: check latency average");
+
+ my $pattern =
+ "client 0 sending UPDATE xy SET y = y \\+ (-?\\d+) "
+ . "WHERE x = 1 AND pg_advisory_lock\\(0\\) IS NOT NULL;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 got a failure in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update\n\n"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g2+"
+ . "client 0 continues a failed transaction in command 4 \\(SQL\\) of script 0; "
+ . "ERROR: current transaction is aborted, commands ignored until end of transaction block\n\n"
+ . "client 0 sending END;\n"
+ . "\\g2+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 executing \\\\set delta\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g2+"
+ . "client 0 sending SELECT pg_sleep\\(1\\);\n"
+ . "\\g2+"
+ . "client 0 sending UPDATE xy SET y = y \\+ \\g1 "
+ . "WHERE x = 1 AND pg_advisory_lock\\(0\\) IS NOT NULL;";
+
+ like($err_pgbench,
+ qr{$pattern},
+ "concurrent update with retrying: check the retried transaction");
+}
+
+sub test_pgbench_deadlock_errors
+{
+ my $isolation_level = READ_COMMITTED;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h1, $in1, $out1, $err1);
+ my ($h2, $in2, $out2, $err2);
+
+ # Open a psql session and acquire an advisory lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql =
+ "select pg_advisory_lock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_lock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ # Run the first pgbench:
+ my @command1 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug fails --file),
+ $script_deadlocks1);
+ print "# Running: " . join(" ", @command1) . "\n";
+ $h1 = IPC::Run::start \@command1, \$in1, \$out1, \$err1;
+
+ # Wait until the first pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . WAIT_PGBENCH_2 . "_zero' "
+ . "else '" . WAIT_PGBENCH_2 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . WAIT_PGBENCH_2
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ WAIT_PGBENCH_2 ]}_not_zero/);
+
+ # Run the second pgbench:
+ my @command2 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug fails --file),
+ $script_deadlocks2);
+ print "# Running: " . join(" ", @command2) . "\n";
+ $h2 = IPC::Run::start \@command2, \$in2, \$out2, \$err2;
+
+ # Wait until the second pgbench tries to acquire the lock held by the first
+ # pgbench:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . DEADLOCK_1 . "_zero' "
+ . "else '" . DEADLOCK_1 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . DEADLOCK_1
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ DEADLOCK_1 ]}_not_zero/);
+
+ # In the psql session, release the lock that the first pgbench is waiting
+ # for and end the session:
+ $in_psql =
+ "select pg_advisory_unlock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_unlock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get results from all pgbenches:
+ $h1->pump() until length $out1;
+ $h1->finish();
+
+ $h2->pump() until length $out2;
+ $h2->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_results on Windows instead.
+ my $result1 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h1->full_results)[0]
+ : $h1->result(0);
+
+ my $result2 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h2->full_results)[0]
+ : $h2->result(0);
+
+ # Check all pgbench results
+ ok(!$result1, "@command1 exit code 0");
+ ok(!$result2, "@command2 exit code 0");
+
+ # The first or second pgbench should get a deadlock error
+ ok((($out1 =~ /processed: 0\/1/ and $out2 =~ /processed: 1\/1/) or
+ ($out2 =~ /processed: 0\/1/ and $out1 =~ /processed: 1\/1/)),
+ "concurrent deadlock update: check processed transactions");
+
+ ok((($out1 =~ /number of errors: 1 \(100\.000%\)/ and
+ $out2 =~ /^((?!number of errors)(.|\n))*$/) or
+ ($out2 =~ /number of errors: 1 \(100\.000%\)/ and
+ $out1 =~ /^((?!number of errors)(.|\n))*$/)),
+ "concurrent deadlock update: check errors");
+
+ ok(($err1 =~ /client 0 got a failure in command 3 \(SQL\) of script 0; ERROR: deadlock detected/ or
+ $err2 =~ /client 0 got a failure in command 2 \(SQL\) of script 0; ERROR: deadlock detected/),
+ "concurrent deadlock update: check deadlock failure");
+
+ # Both pgbenches do not have retried transactions
+ like($out1 . $out2,
+ qr{^((?!number of retried)(.|\n))*$},
+ "concurrent deadlock update: check retried");
+}
+
+sub test_pgbench_deadlock_failures
+{
+ my $isolation_level = READ_COMMITTED;
+ my $isolation_level_shell = $isolation_level_shell[$isolation_level];
+
+ local $ENV{PGPORT} = $node->port;
+ local $ENV{PGOPTIONS} =
+ "-c default_transaction_isolation=" . $isolation_level_shell;
+ print "# PGOPTIONS: " . $ENV{PGOPTIONS} . "\n";
+
+ my ($h_psql, $in_psql, $out_psql);
+ my ($h1, $in1, $out1, $err1);
+ my ($h2, $in2, $out2, $err2);
+
+ # Open a psql session and acquire an advisory lock:
+ print "# Starting psql\n";
+ $h_psql = IPC::Run::start [ 'psql' ], \$in_psql, \$out_psql;
+
+ $in_psql =
+ "select pg_advisory_lock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_lock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ # Run the first pgbench:
+ my @command1 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --max-tries 2),
+ "--file",
+ $script_deadlocks1);
+ print "# Running: " . join(" ", @command1) . "\n";
+ $h1 = IPC::Run::start \@command1, \$in1, \$out1, \$err1;
+
+ # Wait until the first pgbench also tries to acquire the same advisory lock:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . WAIT_PGBENCH_2 . "_zero' "
+ . "else '" . WAIT_PGBENCH_2 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . WAIT_PGBENCH_2
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ WAIT_PGBENCH_2 ]}_not_zero/);
+
+ # Run the second pgbench:
+ my @command2 = (
+ qw(pgbench --no-vacuum --transactions 1 --debug all --max-tries 2),
+ "--file",
+ $script_deadlocks2);
+ print "# Running: " . join(" ", @command2) . "\n";
+ $h2 = IPC::Run::start \@command2, \$in2, \$out2, \$err2;
+
+ # Wait until the second pgbench tries to acquire the lock held by the first
+ # pgbench:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . DEADLOCK_1 . "_zero' "
+ . "else '" . DEADLOCK_1 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . DEADLOCK_1
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ DEADLOCK_1 ]}_not_zero/);
+
+ # In the psql session, acquire the locks that pgbenches will wait for:
+ $in_psql =
+ "select pg_advisory_lock(" . TRANSACTION_ENDS_1 . ") "
+ . "as pg_advisory_lock_" . TRANSACTION_ENDS_1 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ TRANSACTION_ENDS_1 ]}/;
+
+ $in_psql =
+ "select pg_advisory_lock(" . TRANSACTION_ENDS_2 . ") "
+ . "as pg_advisory_lock_" . TRANSACTION_ENDS_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_lock_@{[ TRANSACTION_ENDS_2 ]}/;
+
+ # In the psql session, release the lock that the first pgbench is waiting
+ # for:
+ $in_psql =
+ "select pg_advisory_unlock(" . WAIT_PGBENCH_2 . ") "
+ . "as pg_advisory_unlock_" . WAIT_PGBENCH_2 . ";\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_@{[ WAIT_PGBENCH_2 ]}/;
+
+ # Wait until pgbenches try to acquire the locks held by the psql session:
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . TRANSACTION_ENDS_1 . "_zero' "
+ . "else '" . TRANSACTION_ENDS_1 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . TRANSACTION_ENDS_1
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ TRANSACTION_ENDS_1 ]}_not_zero/);
+
+ do
+ {
+ $in_psql =
+ "select case count(*) "
+ . "when 0 then '" . TRANSACTION_ENDS_2 . "_zero' "
+ . "else '" . TRANSACTION_ENDS_2 . "_not_zero' end "
+ . "from pg_locks where "
+ . "locktype = 'advisory' and "
+ . "objsubid = 1 and "
+ . "((classid::bigint << 32) | objid::bigint = "
+ . TRANSACTION_ENDS_2
+ . "::bigint) and "
+ . "not granted;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+ } while ($out_psql !~ /@{[ TRANSACTION_ENDS_2 ]}_not_zero/);
+
+ # In the psql session, release advisory locks and end the session:
+ $in_psql = "select pg_advisory_unlock_all() as pg_advisory_unlock_all;\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() until $out_psql =~ /pg_advisory_unlock_all/;
+
+ $in_psql = "\\q\n";
+ print "# Running in psql: " . join(" ", $in_psql);
+ $h_psql->pump() while length $in_psql;
+
+ $h_psql->finish();
+
+ # Get results from all pgbenches:
+ $h1->pump() until length $out1;
+ $h1->finish();
+
+ $h2->pump() until length $out2;
+ $h2->finish();
+
+ # On Windows, the exit status of the process is returned directly as the
+ # process's exit code, while on Unix, it's returned in the high bits
+ # of the exit code (see WEXITSTATUS macro in the standard <sys/wait.h>
+ # header file). IPC::Run's result function always returns exit code >> 8,
+ # assuming the Unix convention, which will always return 0 on Windows as
+ # long as the process was not terminated by an exception. To work around
+ # that, use $h->full_results on Windows instead.
+ my $result1 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h1->full_results)[0]
+ : $h1->result(0);
+
+ my $result2 =
+ ($Config{osname} eq "MSWin32")
+ ? ($h2->full_results)[0]
+ : $h2->result(0);
+
+ # Check all pgbench results
+ ok(!$result1, "@command1 exit code 0");
+ ok(!$result2, "@command2 exit code 0");
+
+ like($out1,
+ qr{processed: 1/1},
+ "concurrent deadlock update with retrying: pgbench 1: "
+ . "check processed transactions");
+ like($out2,
+ qr{processed: 1/1},
+ "concurrent deadlock update with retrying: pgbench 2: "
+ . "check processed transactions");
+
+ # The first or second pgbench should get a deadlock error which was retried:
+ like($out1 . $out2,
+ qr{^((?!number of errors)(.|\n))*$},
+ "concurrent deadlock update with retrying: check errors");
+
+ ok((($out1 =~ /number of retried: 1 \(100\.000%\)/ and
+ $out2 =~ /^((?!number of retried)(.|\n))*$/) or
+ ($out2 =~ /number of retried: 1 \(100\.000%\)/ and
+ $out1 =~ /^((?!number of retried)(.|\n))*$/)),
+ "concurrent deadlock update with retrying: check retries");
+
+ my $pattern1 =
+ "client 0 sending BEGIN;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . WAIT_PGBENCH_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 got a failure in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\n"
+ . "((?!client 0)(.|\n))*"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . WAIT_PGBENCH_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+";
+
+ my $pattern2 =
+ "client 0 sending BEGIN;\n"
+ . "(client 0 receiving\n)+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 got a failure in command 2 \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\n"
+ . "((?!client 0)(.|\n))*"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 repeats the failed transaction \\(try 2/2\\)\n"
+ . "client 0 sending BEGIN;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . DEADLOCK_1 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending END;\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_lock\\(" . TRANSACTION_ENDS_2 . "\\);\n"
+ . "\\g1+"
+ . "client 0 sending SELECT pg_advisory_unlock_all\\(\\);\n"
+ . "\\g1+";
+
+ ok(($err1 =~ /$pattern1/ or $err2 =~ /$pattern2/),
+ "concurrent deadlock update with retrying: "
+ . "check the retried transaction");
+}
+
+test_pgbench_serialization_errors(
+ 1, # --max-tries
+ 0, # --latency-limit (will not be used)
+ "concurrent update");
+test_pgbench_serialization_errors(
+ 0, # --max-tries (will not be used)
+ 900, # --latency-limit
+ "concurrent update with maximum time of tries");
+
+test_pgbench_serialization_failures();
+
+test_pgbench_deadlock_errors();
+test_pgbench_deadlock_failures();
+
+# done
+$node->stop;
--
2.7.4
Hello Marina,
v9-0001-Pgbench-errors-use-the-RandomState-structure-for-.patch
- a patch for the RandomState structure (this is used to reset a client's
random seed during the repeating of transactions after serialization/deadlock
failures).
A few comments about this first patch.
Patch applies cleanly, compiles, global & pgbench "make check" ok.
I'm mostly ok with the changes, which cleanly separate the different use
of random between threads (script choice, throttle delay, sampling...) and
client (random*() calls).
This change is necessary so that a client can restart a transaction
deterministically (at the client level at least), which is the ultimate
aim of the patch series.
A few remarks:
The RandomState struct is 6 bytes, which will induce some padding when
used. This is life and pre-existing. No problem.
ISTM that the struct itself does not need a name, ie. "typedef struct {
... } RandomState" is enough.
There could be clear comments, say in the TState and CState structs, about
what randomness is impacted (i.e. script choices, etc.).
getZipfianRand, computeHarmonicZipfian: The "thread" parameter was
justified because it was used for two fields. As the random state is
separated, I'd suggest that the other argument should be a zipfcache
pointer.
While reading your patch, it occurs to me that a run is not deterministic
at the thread level under throttling and sampling, because the random
state is solicited differently depending on when a transaction ends. This
suggests that maybe each thread random_state use should have its own random
state.
In passing, and totally unrelated to this patch:
I've always been a little puzzled about why a quite small 48-bit internal
state random generator is used. I understand the need for pg to have a
portable & state-controlled thread-safe random generator, but why this
particular small one fails me. The source code (src/port/erand48.c,
copyright in 1993...) looks optimized for 16 bits architectures, which is
probably pretty inefficient to run on 64 bits architectures. Maybe this
could be updated with something more consistent with today's processors,
providing more quality at a lower cost.
--
Fabien.
Hello Marina,
v9-0002-Pgbench-errors-use-the-Variables-structure-for-cl.patch
- a patch for the Variables structure (this is used to reset client variables
during the repeating of transactions after serialization/deadlock failures).
About this second patch:
This extracts the variable holding structure, so that it is somehow easier
to reset them to their initial state on transaction failures, the
management of which is the ultimate aim of this patch series.
It is also cleaner this way.
Patch applies cleanly on top of the previous one (there is no real
interactions with it). It compiles cleanly. Global & pgbench "make check"
are both ok.
The structure typedef does not need a name. "typedef struct { } V...".
I tend to disagree with naming things after their type, eg "array". I'd
suggest "vars" instead. "nvariables" could be "nvars" for consistency with
that and "vars_sorted", and because "foo.variables->nvariables" starts
looking heavy.
I'd suggest to put the "Variables" type declaration just after the "Variable"
type declaration in the file.
--
Fabien.
On 09-06-2018 9:55, Fabien COELHO wrote:
Hello Marina,
Hello!
v9-0001-Pgbench-errors-use-the-RandomState-structure-for-.patch
- a patch for the RandomState structure (this is used to reset a
client's random seed during the repeating of transactions after
serialization/deadlock failures).
A few comments about this first patch.
Thank you very much!
Patch applies cleanly, compiles, global & pgbench "make check" ok.
I'm mostly ok with the changes, which cleanly separate the different
use of random between threads (script choice, throttle delay,
sampling...) and client (random*() calls).
Glad to hear it :)
This change is necessary so that a client can restart a transaction
deterministically (at the client level at least), which is the
ultimate aim of the patch series.
A few remarks:
The RandomState struct is 6 bytes, which will induce some padding when
used. This is life and pre-existing. No problem.
ISTM that the struct itself does not need a name, ie. "typedef struct
{ ... } RandomState" is enough.
Ok!
There could be clear comments, say in the TState and CState structs,
about what randomness is impacted (i.e. script choices, etc.).
Thank you, I'll add them.
getZipfianRand, computeHarmonicZipfian: The "thread" parameter was
justified because it was used for two fields. As the random state is
separated, I'd suggest that the other argument should be a zipfcache
pointer.
I agree with you and I will change it.
While reading your patch, it occurs to me that a run is not
deterministic at the thread level under throttling and sampling,
because the random state is solicited differently depending on when a
transaction ends. This suggests that maybe each thread random_state use
should have its own random state.
Thank you, I'll fix this.
In passing, and totally unrelated to this patch:
I've always been a little puzzled about why a quite small 48-bit
internal state random generator is used. I understand the need for pg
to have a portable & state-controlled thread-safe random generator,
but why this particular small one fails me. The source code
(src/port/erand48.c, copyright in 1993...) looks optimized for 16 bits
architectures, which is probably pretty inefficient to run on 64 bits
architectures. Maybe this could be updated with something more
consistent with today's processors, providing more quality at a lower
cost.
This sounds interesting, thanks!
*went to look for a multiplier and a summand that are large enough and
are mutually prime..*
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 09-06-2018 16:31, Fabien COELHO wrote:
Hello Marina,
Hello!
v9-0002-Pgbench-errors-use-the-Variables-structure-for-cl.patch
- a patch for the Variables structure (this is used to reset client
variables during the repeating of transactions after
serialization/deadlock failures).
About this second patch:
This extracts the variable holding structure, so that it is somehow
easier to reset them to their initial state on transaction failures,
the management of which is the ultimate aim of this patch series.
It is also cleaner this way.
Patch applies cleanly on top of the previous one (there is no real
interactions with it). It compiles cleanly. Global & pgbench "make
check" are both ok.
:-)
The structure typedef does not need a name. "typedef struct { } V...".
Ok!
I tend to disagree with naming things after their type, eg "array".
I'd suggest "vars" instead. "nvariables" could be "nvars" for
consistency with that and "vars_sorted", and because
"foo.variables->nvariables" starts looking heavy.
I'd suggest to put the "Variables" type declaration just after the "Variable"
type declaration in the file.
Thank you, I agree and I'll fix all this.
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Marina,
v9-0003-Pgbench-errors-use-the-ereport-macro-to-report-de.patch
- a patch for the ereport() macro (this is used to report client failures
that do not cause an abort, depending on the level of debugging).
ISTM that abort() is called under FATAL.
- implementation: if possible, use the local ErrorData structure during the
errstart()/errmsg()/errfinish() calls. Otherwise use a static variable
protected by a mutex if necessary. To do all of this export the function
appendPQExpBufferVA from libpq.
This patch applies cleanly on top of the other ones (there are minimal
interactions), compiles cleanly, global & pgbench "make check" are ok.
IMO this patch is more controversial than the other ones.
It is not really related to the aim of the patch series, which could do
without, couldn't it? Moreover, it changes pgbench current behavior, which
might be admissible, but should be discussed clearly.
I'd suggest that it should be an independent submission, unrelated to the
pgbench error management patch.
The code adapts/duplicates existing server-side "ereport" stuff and brings
it to the frontend, where the logging needs are somehow quite different.
I'd prefer to avoid duplication and/or have some code sharing. If it
really needs to be duplicated, I'd suggest to put all this stuff in
separated files. If we want to do that, I think that it would belong to
fe_utils, and where it could/should be used by all front-end programs.
I do not understand why names are changed, eg ELEVEL_FATAL instead of
FATAL. ISTM that part of the point of the move would be to be homogeneous,
which suggests that the same names should be reused.
For logging purposes, ISTM that the "elog" macro interface is nicer,
closer to the existing "fprintf(stderr", as it would not introduce the
additional parentheses hack for "rest".
I see no actual value in creating on the fly a dynamic buffer through
plenty of macros and functions as the end result is just to print the message
out to stderr in the end.
errfinishImpl: fprintf(stderr, "%s", error->message.data);
This looks like overkill. From reading the code, this does not look
like an improvement:
fprintf(stderr, "invalid socket: %s", PQerrorMessage(st->con));
vs
ereport(ELEVEL_LOG, (errmsg("invalid socket: %s", PQerrorMessage(st->con))));
The whole complexity of the server-side interface only makes sense
because of the TRY/CATCH stuff and complex logging requirements (eg several
the backend. The patch adds quite some code and complexity without clear
added value that I can see.
The semantics of the existing code is changed, the FATAL level calls
abort() and replaces existing exit(1) calls. Maybe you want an ERROR level
as well.
My 0.02€: maybe you just want to turn
fprintf(stderr, format, ...);
// then possibly exit or abort depending...
into
elog(level, format, ...);
which maybe would exit or abort depending on level, and possibly not
actually report under some levels and/or some conditions. For that, it
could be enough to just provide a nice "elog" function.
In conclusion, which you can disagree with because maybe I have missed
something... anyway I currently think that:
- it should be an independent submission
- possibly at "fe_utils" level
- possibly just a nice "elog" function is enough, if so just do that.
--
Fabien.
On 10-06-2018 10:38, Fabien COELHO wrote:
Hello Marina,
Hello!
v9-0003-Pgbench-errors-use-the-ereport-macro-to-report-de.patch
- a patch for the ereport() macro (this is used to report client
failures that do not cause an abort, depending on the level of
debugging).
ISTM that abort() is called under FATAL.
If you mean aborting the client, this is not an abort of the main
program.
- implementation: if possible, use the local ErrorData structure
during the errstart()/errmsg()/errfinish() calls. Otherwise use a
static variable protected by a mutex if necessary. To do all of this
export the function appendPQExpBufferVA from libpq.
This patch applies cleanly on top of the other ones (there are minimal
interactions), compiles cleanly, global & pgbench "make check" are ok.
:-)
IMO this patch is more controversial than the other ones.
It is not really related to the aim of the patch series, which could
do without, couldn't it?
I'd suggest that it should be an independent submission, unrelated to
the pgbench error management patch.
I suppose that this is related; because of my patch there may be a lot
of such code (see v7 in [1]):
- fprintf(stderr,
- "malformed variable \"%s\" value: \"%s\"\n",
- var->name, var->svalue);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "malformed variable \"%s\" value: \"%s\"\n",
+ var->name, var->svalue);
+ }
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d sending %s\n", st->id, sql);
That's why it was suggested to make the error function which hides all
these things (see [2]):
There is a lot of checks like "if (debug_level >= DEBUG_FAILS)" with
corresponding fprintf(stderr..) I think it's time to do it like in the
main code, wrap with some function like log(level, msg).
Moreover, it changes pgbench current
behavior, which might be admissible, but should be discussed clearly.
The semantics of the existing code is changed, the FATAL level calls
abort() and replaces existing exit(1) calls. Maybe you want an ERROR
level as well.
Oh, thanks, I agree with you. And I do not want to change the program
exit code without good reasons, but I'm sorry I may not know all pros
and cons in this matter..
Or did you also mean other changes?
The code adapts/duplicates existing server-side "ereport" stuff and
brings it to the frontend, where the logging needs are somehow quite
different.I'd prefer to avoid duplication and/or have some code sharing.
I was recommended to use the same interface in [3]:
On elog/errstart: we already have a convention for what ereport() calls
look like; I suggest to use that instead of inventing your own.
If it
really needs to be duplicated, I'd suggest to put all this stuff in
separated files. If we want to do that, I think that it would belong
to fe_utils, and where it could/should be used by all front-end
programs.
I'll try to do it..
I do not understand why names are changed, eg ELEVEL_FATAL instead of
FATAL. ISTM that part of the point of the move would be to be
homogeneous, which suggests that the same names should be reused.
Ok!
For logging purposes, ISTM that the "elog" macro interface is nicer,
closer to the existing "fprintf(stderr", as it would not introduce the
additional parentheses hack for "rest".
I was also recommended to use ereport() instead of elog() in [3]:
With that, is there a need for elog()? In the backend we have it
because $HISTORY but there's no need for that here -- I propose to lose
elog() and use only ereport everywhere.
I see no actual value in creating on the fly a dynamic buffer through
plenty of macros and functions as the end result is just to print the
message out to stderr in the end.
errfinishImpl: fprintf(stderr, "%s", error->message.data);
This looks like overkill. From reading the code, this does not look
like an improvement:
fprintf(stderr, "invalid socket: %s", PQerrorMessage(st->con));
vs
ereport(ELEVEL_LOG, (errmsg("invalid socket: %s",
PQerrorMessage(st->con))));
The whole complexity of the server-side interface only makes sense
because of the TRY/CATCH stuff and complex logging requirements (eg several
outputs) in the backend. The patch adds quite some code and complexity
without clear added value that I can see.
My 0.02€: maybe you just want to turn
fprintf(stderr, format, ...);
// then possibly exit or abort depending...
into
elog(level, format, ...);
which maybe would exit or abort depending on level, and possibly not
actually report under some levels and/or some conditions. For that, it
could be enough to just provide a nice "elog" function.
I agree that elog() can be coded in this way. To use ereport() I need a
structure to store the error level as a condition to exit.
In conclusion, which you can disagree with because maybe I have missed
something... anyway I currently think that:
- it should be an independent submission
- possibly at "fe_utils" level
- possibly just a nice "elog" function is enough, if so just do that.
I hope I answered all this above..
[1]: /messages/by-id/453fa52de88477df2c4a2d82e09e461c@postgrespro.ru
[2]: /messages/by-id/20180405180807.0bc1114f@wp.localdomain
[3]: /messages/by-id/20180508105832.6o3uf3npfpjgk5m7@alvherre.pgsql
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Marina,
I suppose that this is related; because of my patch there may be a lot of
such code (see v7 in [1]):
- fprintf(stderr,
- "malformed variable \"%s\" value: \"%s\"\n",
- var->name, var->svalue);
+ if (debug_level >= DEBUG_FAILS)
+ {
+ fprintf(stderr,
+ "malformed variable \"%s\" value: \"%s\"\n",
+ var->name, var->svalue);
+ }
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d sending %s\n", st->id, sql);
I'm not sure that debug messages needs to be kept after debug, if it is
about debugging pgbench itself. That is debatable.
That's why it was suggested to make the error function which hides all these
things (see [2]):
There is a lot of checks like "if (debug_level >= DEBUG_FAILS)" with
corresponding fprintf(stderr..) I think it's time to do it like in the
main code, wrap with some function like log(level, msg).
Yep. I did not write that, but I agree with an "elog" suggestion to switch
if (...) { fprintf(...); exit/abort/continue/... }
to a simpler:
elog(level, ...)
Moreover, it changes pgbench current behavior, which might be
admissible, but should be discussed clearly.
The semantics of the existing code is changed, the FATAL level calls
abort() and replaces existing exit(1) calls. Maybe you want an ERROR
level as well.
Oh, thanks, I agree with you. And I do not want to change the program exit
code without good reasons, but I'm sorry I may not know all pros and cons in
this matter..
Or did you also mean other changes?
AFAICR I meant switching exit to abort in some cases.
The code adapts/duplicates existing server-side "ereport" stuff and
brings it to the frontend, where the logging needs are somehow quite
different.
I'd prefer to avoid duplication and/or have some code sharing.
I was recommended to use the same interface in [3]:
On elog/errstart: we already have a convention for what ereport()
calls look like; I suggest to use that instead of inventing your own.
The "elog" interface already exists, it is not an invention. "ereport" is
a hack which is somehow necessary in some cases. I prefer a simple
function call if possible for the purpose, and ISTM that this is the case.
If it really needs to be duplicated, I'd suggest to put all this stuff
in separated files. If we want to do that, I think that it would belong
to fe_utils, and where it could/should be used by all front-end
programs.
I'll try to do it..
Dunno. If you only need one "elog" function which prints a message to
stderr and decides whether to abort/exit/whatevrer, maybe it can just be
kept in pgbench. If there are several complicated functions and
macros, better with a file. So I'd say it depends.
For logging purposes, ISTM that the "elog" macro interface is nicer,
closer to the existing "fprintf(stderr", as it would not introduce the
additional parentheses hack for "rest".
I was also recommended to use ereport() instead of elog() in [3]:
Probably. Are you hoping that advice from different reviewers should be
consistent? That seems optimistic :-)
With that, is there a need for elog()? In the backend we have it
because $HISTORY but there's no need for that here -- I propose to
lose elog() and use only ereport everywhere.
See commit 8a07ebb3c172 which turns some ereport into elog...
My 0.02€: maybe you just want to turn

fprintf(stderr, format, ...);
// then possibly exit or abort depending...

into

elog(level, format, ...);

which maybe would exit or abort depending on level, and possibly not actually report under some levels and/or some conditions. For that, it could be enough to just provide a nice "elog" function.

I agree that elog() can be coded in this way. To use ereport() I need a structure to store the error level as a condition to exit.
Yep. That is a lot of complication which is justified server-side, where logging requirements are special, but in this case I see it as overkill.
So my current view is that if you only need an "elog" function, it is
simpler to add it to "pgbench.c".
--
Fabien.
On 2018-Jun-13, Fabien COELHO wrote:
With that, is there a need for elog()? In the backend we have it because $HISTORY but there's no need for that here -- I propose to lose elog() and use only ereport everywhere.

See commit 8a07ebb3c172 which turns some ereport into elog...
For context: in the backend, elog() is only used for internal messages
(i.e. "can't-happen" conditions), and ereport() is used for user-facing
messages. There are many things ereport() has that elog() doesn't, such
as additional message fields (HINT, DETAIL, etc) that I think could have
some use in pgbench as well. If you use elog() then you can't have that.
Another difference is that in the backend, elog() messages are never translated, while ereport() messages are translated. Since pgbench is translatable I think it would be best to keep those things in sync, to avoid confusion. (Although of course you could do it differently in pgbench than in the backend.)
One thing that just came to mind is that pgbench uses some src/fe_utils
stuff. I hope having ereport() doesn't cause a conflict with that ...
BTW I think abort() is not the right thing, as it'll cause core dumps if
enabled. Why not just exit(1)?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 13-06-2018 22:59, Alvaro Herrera wrote:
For context: in the backend, elog() is only used for internal messages (i.e. "can't-happen" conditions), and ereport() is used for user-facing messages. There are many things ereport() has that elog() doesn't, such as additional message fields (HINT, DETAIL, etc) that I think could have some use in pgbench as well. If you use elog() then you can't have that.
AFAIU originally it was not supposed that the pgbench error messages would have these fields, so would it be good to change the final output to stderr?.. For example:
- fprintf(stderr, "%s", PQerrorMessage(con));
- fprintf(stderr, "(ignoring this error and continuing anyway)\n");
+ ereport(LOG,
+         (errmsg("Ignoring the server error and continuing anyway"),
+          errdetail("%s", PQerrorMessage(con))));

- fprintf(stderr, "%s", PQerrorMessage(con));
- if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) == 0)
- {
-     fprintf(stderr, "Perhaps you need to do initialization (\"pgbench -i\") in database \"%s\"\n", PQdb(con));
- }
-
- exit(1);
+ ereport(ERROR,
+         (errmsg("Server error"),
+          errdetail("%s", PQerrorMessage(con)),
+          sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) == 0 ?
+          errhint("Perhaps you need to do initialization (\"pgbench -i\") in database \"%s\"\n",
+                  PQdb(con)) : 0));
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 13-06-2018 22:44, Fabien COELHO wrote:
Hello Marina,
I suppose that this is related; because of my patch there may be a lot of such code (see v7 in [1]):

- fprintf(stderr,
-         "malformed variable \"%s\" value: \"%s\"\n",
-         var->name, var->svalue);
+ if (debug_level >= DEBUG_FAILS)
+ {
+     fprintf(stderr,
+             "malformed variable \"%s\" value: \"%s\"\n",
+             var->name, var->svalue);
+ }

- if (debug)
+ if (debug_level >= DEBUG_ALL)
      fprintf(stderr, "client %d sending %s\n", st->id, sql);

I'm not sure that debug messages need to be kept after debugging, if it is about debugging pgbench itself. That is debatable.
AFAICS it is not about debugging pgbench itself, but about more detailed
information that can be used to understand what exactly happened during
its launch. In the case of errors this helps to distinguish between
failures or errors by type (including which limit for retries was
violated and how far it was exceeded for the serialization/deadlock
errors).
The code adapts/duplicates existing server-side "ereport" stuff and brings it to the frontend, where the logging needs are somehow quite different.

I'd prefer to avoid duplication and/or have some code sharing.
I was recommended to use the same interface in [3]:
On elog/errstart: we already have a convention for what ereport() calls look like; I suggest to use that instead of inventing your own.

The "elog" interface already exists, it is not an invention. "ereport" is a hack which is somehow necessary in some cases. I prefer a simple function call if possible for the purpose, and ISTM that this is the case.
That is a lot of complication which is justified server-side, where logging requirements are special, but in this case I see it as overkill.
I think we need ereport() if we want to make detailed error messages (see examples in [1])..
If it really needs to be duplicated, I'd suggest to put all this stuff in separate files. If we want to do that, I think that it would belong to fe_utils, where it could/should be used by all front-end programs.

I'll try to do it..
Dunno. If you only need one "elog" function which prints a message to stderr and decides whether to abort/exit/whatever, maybe it can just be kept in pgbench. If there are several complicated functions and macros, better with a separate file. So I'd say it depends.
So my current view is that if you only need an "elog" function, it is
simpler to add it to "pgbench.c".
Thank you!
For logging purposes, ISTM that the "elog" macro interface is nicer, closer to the existing "fprintf(stderr", as it would not introduce the additional parentheses hack for "rest".

I was also recommended to use ereport() instead of elog() in [3]:
Probably. Are you hoping that advice from different reviewers will be consistent? That seems optimistic :-)
To make the patch committable there should be no objection to it..
[1]: /messages/by-id/c89fcc380a19380260b5ea463efc1416@postgrespro.ru
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Alvaro,
For context: in the backend, elog() is only used for internal messages
(i.e. "can't-happen" conditions), and ereport() is used for user-facing
messages. There are many things ereport() has that elog() doesn't, such
as additional message fields (HINT, DETAIL, etc) that I think could have
some use in pgbench as well. If you use elog() then you can't have that.
[...]
Ok. Then forget elog, but I'm pretty against having a kind of ereport
which looks greatly overkill to me, because:
(1) the syntax is pretty heavy, and does not look like a function.
(2) the implementation allocates a string buffer for the message
this is greatly overkill for pgbench which only needs to print
to stderr once.
This makes sense server-side because the generated message may be output several times (eg stderr, file logging, to the client), and the implementation has to work with cpp implementations which do not handle varargs (and maybe other reasons).
So I would be in favor of having just a simpler error function.
Incidentally, one already exists, "pgbench_error", and could be improved, extended, or replaced. There is also "syntax_error".
One thing that just came to mind is that pgbench uses some src/fe_utils
stuff. I hope having ereport() doesn't cause a conflict with that ...
Currently ereport does not exist client-side. I do not think that this patch is the right moment to decide to do that. Also, there are some "elog" calls in libpq, but they are excluded with an "#ifndef FRONTEND".
BTW I think abort() is not the right thing, as it'll cause core dumps if
enabled. Why not just exit(1)?
Yes, I agree and already reported that.
Conclusion:
My current opinion is that I'm pretty against bringing "ereport" to the
front-end on this specific pgbench patch. I agree with you that "elog"
would be misleading there as well, for the arguments you developed above.
I'd suggest to have just one clean and simple pgbench internal function to
handle errors and possibly exit, debug... Something like
void pgb_error(FATAL, "error %d raised", 12);
Implemented as
void pgb_error(int/enum XXX level, const char *format, ...)
{
    test level and maybe return immediately (eg debug);
    print to stderr;
    exit/abort/return depending;
}
Then if some advanced error handling is introduced for front-end programs,
possibly through some macros, then it would be time to improve upon that.
--
Fabien.
Hello Marina,
v9-0004-Pgbench-errors-and-serialization-deadlock-retries.patch
- the main patch for handling client errors and repetition of transactions
with serialization/deadlock failures (see the detailed description in the
file).
Here is a review for the last part of your v9 version.
Patch does not "git apply" (maybe not anymore):
error: patch failed: doc/src/sgml/ref/pgbench.sgml:513
error: doc/src/sgml/ref/pgbench.sgml: patch does not apply
However I could get it to apply with the "patch" command.
Then patch compiles, global & pgbench "make check" are ok.
Feature
=======
The patch adds the ability to restart transactions (i.e. the full script) on some errors, which is a good thing as it allows to exercise postgres performance in more realistic scenarios.
* -d/--debug: I'm not in favor of requiring a mandatory text argument on this option. It is not practical, the user has to remember it, and it is a change.

I'm sceptical of the overall debug handling changes. Maybe we could have multiple -d options which lead to higher debug levels, but I'm not sure that it can be made to work for this case and still be compatible with the previous behavior.

Maybe you need a specific option for your purpose, eg "--debug-retry"?
Code
====
* The implementation is less complex than the previous submission, which is a good thing. I'm not sure that all the remaining complexity is still fully needed.
* I'm reserved about the whole ereport thing, see comments in other
messages.
Levels ELEVEL_LOG_CLIENT_{FAIL,ABORTED} & LOG_MAIN look unclear to me. In particular, the "CLIENT" part is not very useful. If the distinction makes sense, I would have kept "LOG" for the initial one and added other ones for ABORT and PGBENCH, maybe.
* There are no comments about "retries" in StatData, CState and Command
structures.
* Also, for StatData, I would like to understand the logic between cnt, skipped, retries, retried, errors, ... so clear information about the expected invariant, if any, would be welcome. One has to go in the code to understand how these fields relate one to the other.
* "errors_in_failed_tx" is some subcounter of "errors", for a special case. Why it is there escapes me [I finally understood, and I think it should be removed, see end of review]. If we wanted to distinguish, then we should distinguish homogeneously: maybe just count the different error types, eg have things like "deadlock_errors", "serializable_errors", "other_errors", "internal_pgbench_errors" which would be orthogonal one to the other, and "errors" could be recomputed from these.
* How "errors" differs from "ecnt" is unclear to me.
* FailureStatus states are not homogeneously named. I'd suggest to use *_FAILURE for all cases. The miscellaneous case should probably be the last. I do not understand the distinction between ANOTHER_FAILURE & IN_FAILED_SQL_TRANSACTION. Why should it be needed? [again, see end of review]
* I do not understand the comments on the CState enum: "First, remember the failure in CSTATE_FAILURE. Then process other commands of the failed transaction if any". Why would other commands be processed at all if the transaction is aborted? For me any error must lead to the rollback and possible retry of the transaction. This comment needs to be clarified. It should also say that on FAILURE, it will go either to RETRY or ABORTED. See below my comments about doCustom.
It is unclear to me why there could be several failures within a transaction, as I would have thought that it would be aborted on the first one.
* I do not understand the purpose of first_failure. The comment should explain why it would need to be remembered. From my point of view, I'm not fully convinced that it should.
* commandFailed: I think that it should be kept much simpler. In particular, having errors on errors does not help much: on ELEVEL_FATAL, it ignores the actual reported error and generates another error of the same level, so that the initial issue is hidden. Even if these are can't-happen cases, hiding the origin if it occurs looks unhelpful. Just print it directly, and maybe abort if you think that it is a can't-happen case.
* copyRandomState: just use sizeof(RandomState) instead of making assumptions
about the contents of the struct. Also, this function looks pretty useless,
why not just do a plain assignment?
* copyVariables: lacks comments to explain that the destination is cleaned up and so on. The cleanup phase could probably be in a distinct function, so that the code would be clearer. Maybe the function variable names are too long.
if (current_source->svalue)
in the context of a guard for a strdup, maybe:
if (current_source->svalue != NULL)
* executeCondition: this hides client automaton state changes which were
clearly visible beforehand in the switch, and the different handling of
if & elif is also hidden.
I'm against this unnecessary restructuring and against hiding such information: all state changes should be clearly seen in the state switch so that it is easier to understand and follow.
I do not see why touching the conditional stack on internal errors
(evaluateExpr failure) brings anything, the whole transaction will be aborted
anyway.
* doCustom changes.
On CSTATE_START_COMMAND, it considers whether to retry at the end. For me, this cannot happen: if some command failed, then it should have skipped directly to the RETRY state, so that you cannot get to the end of the script with an error. Maybe you could assert that the state of the previous command is NO_FAILURE, though.
On CSTATE_FAILURE, the next command is possibly started. Although there is some
consistency with the previous point, I think that it totally breaks the state
automaton where now a command can start while the whole transaction is
in failing state anyway. There was no point in starting it in the first
place.
So, for me, the FAILURE state should record/count the failure, then skip
to RETRY if a retry is decided, else proceed to ABORT. Nothing else.
This is much clearer that way.
Then RETRY should reinstate the global state and proceed to start the *first*
command again.
The current RETRY state does memory allocations to generate a message
with buffer allocation and so on. This looks like a costly and useless
operation. If the user required "retries", then this is normal behavior,
the retries are counted and will be printed out in the final report,
and there is no point in printing out every single one of them.
Maybe you want that for debugging, but then costly operations should be guarded.
It is unclear to me why backslash command errors are turned to FAILURE
instead of ABORTED: there is no way they are going to be retried, so
maybe they should/could skip directly to ABORTED?
Function executeCondition is a bad idea, as stated above.
* reporting
The number of transactions above the latency limit report can be simplified. Remove the if and just use one printf with a %s for the optional comment. I'm not sure this optional comment is useful there.
Before the patch, ISTM that all lines relied on one printf. You have changed to a style where a collection of printfs is used to compose a line. I'd suggest to keep to the previous one-printf-prints-one-line style, where possible.
You have added 20-column alignment prints. This looks like too much and generates much too large lines. Probably 10 (billion) would be enough.
Some people try to parse the output, so it should be deterministic. I'd add the needed columns always if appropriate (i.e. under retry), even if none occurred.
* processXactStats: An else is replaced by detailed stats, with the initial "no detailed stats" comment kept. The function is called both in the then & else branch. The structure does not make sense anymore. I'm not sure this change was needed.
* getLatencyUsed: declared "double" so "return 0.0".
* typo: ruin -> run; probably others, I did not check for them in detail.
TAP Tests
=========
On my laptop, tests last 5.5 seconds before the patch, and about 13 seconds
after. This is much too large. Pgbench TAP tests do not deserve to take over
twice as much time as before just on this patch.
One reason which explains this large time is that there is a new script with a newly created instance. I'd suggest to append tests to the existing 2 scripts, depending on whether they need a running instance or not.
Secondly, I think that the design of the tests is too heavy. For such a feature, ISTM enough to check that it works, i.e. one test for deadlocks (trigger one or a few deadlocks), idem for serializable, maybe idem for other errors if any.
The challenge is to do that reliably and efficiently, i.e. so that the test does
not rely on chance and is still quite efficient.
The trick you use is to run an interactive psql in parallel to pgbench so as to play with concurrent locks. That is interesting, but deserves more comments and explanation, eg before the test functions.
Maybe this could be achieved within pgbench by using some wait stuff in PL/pgSQL so that concurrent clients can wait for one another based on data in an unlogged table updated by a CALL within an "embedded" transaction? Not sure. Otherwise, maybe a (simple) pgbench-side thread barrier could help, but this would require more thinking.
Anyway, TAP tests should be much lighter (in total time), and if possible
much simpler.
The 900 ms latency limit try is a bad idea because it takes a lot of time. I did such tests before and they were removed by Tom Lane because of determinism and time issues. I would comment this test out for now.
Documentation
=============
Not looked at in much details for now. Just a few comments:
Having the "most important settings" on lines 1-6 and 8 (i.e. skipping 7) looks silly. The important ones should simply be the first ones, and the 8th is not that important, or it is in 7th position.
I do not understand why there is so much text about the in-failed-sql-transaction stuff, while we are mainly interested in serialization & deadlock errors, and this only falls in some "other" category. There seem to be more details about other errors than about deadlock & serialization errors.
The reporting should focus on what is of interest, either all errors, or some
detailed split of these errors. The documentation should state clearly what
are the counted errors, and then what are their effects on the reported stats.
The "Errors and Serialization/Deadlock Retries" section is a good start in that
direction, but it does not talk about pgbench internal errors (eg "cos(true)").
I think it should be more explicit about errors.
Option --max-tries default value should be spelled out in the doc.
"Client's run is aborted", do you mean "Pgbench run is aborted"?
"If a failed transaction block does not terminate in the current script": this just looks like a very bad idea, and explains my general ranting above about this error condition. ISTM that the only reasonable option is that a pgbench script should be enforced as a transaction, or a set of transactions, but cannot be a "piece" of a transaction, i.e. a pgbench script with "BEGIN;" but without a corresponding "COMMIT" is a user error and warrants an abort, so that there is no need to manage these "in aborted transaction" errors everywhere and report about them and document them extensively.
This means adding a check when a script is finished or starting that
PQtransactionStatus(const PGconn *conn) == PQTRANS_IDLE, and abort if not
with a fatal error. Then we can forget about these "in tx errors" counting,
reporting and so on, and just have to document the restriction.
--
Fabien.
On 09-07-2018 16:05, Fabien COELHO wrote:
Hello Marina,
Hello, Fabien!
Here is a review for the last part of your v9 version.
Thank you very much for this!
Patch does not "git apply" (may anymore):
error: patch failed: doc/src/sgml/ref/pgbench.sgml:513
error: doc/src/sgml/ref/pgbench.sgml: patch does not apply
Sorry, I'll send a new version soon.
However I could get it to apply with the "patch" command.
Then patch compiles, global & pgbench "make check" are ok.
:-)
Feature
=======

The patch adds the ability to restart transactions (i.e. the full script) on some errors, which is a good thing as it allows to exercise postgres performance in more realistic scenarios.

* -d/--debug: I'm not in favor of requiring a mandatory text argument on this option. It is not practical, the user has to remember it, and it is a change.

I'm sceptical of the overall debug handling changes. Maybe we could have multiple -d options which lead to higher debug levels, but I'm not sure that it can be made to work for this case and still be compatible with the previous behavior.

Maybe you need a specific option for your purpose, eg "--debug-retry"?
As you wrote in [1] (/messages/by-id/alpine.DEB.2.20.1801031720270.20034@lancre), adding an additional option is also a bad idea:

I'm sceptical of the "--debug-fails" options. ISTM that --debug is already there and should just be reused.
Maybe it's better to use an optional argument/arguments for compatibility (--debug[=fails] or --debug[=NUM])? But if we use the numbers, now I can see only 2 levels, and there's no guarantee that they will not change..
Code
====

* The implementation is less complex than the previous submission, which is a good thing. I'm not sure that all the remaining complexity is still fully needed.

* I'm reserved about the whole ereport thing, see comments in other messages.
Thank you, I'll try to implement the error reporting in the way you
suggested.
Levels ELEVEL_LOG_CLIENT_{FAIL,ABORTED} & LOG_MAIN look unclear to me. In particular, the "CLIENT" part is not very useful. If the distinction makes sense, I would have kept "LOG" for the initial one and added other ones for ABORT and PGBENCH, maybe.
Ok!
* There are no comments about "retries" in StatData, CState and Command structures.

* Also, for StatData, I would like to understand the logic between cnt, skipped, retries, retried, errors, ... so clear information about the expected invariant, if any, would be welcome. One has to go in the code to understand how these fields relate one to the other. <...>

* How "errors" differs from "ecnt" is unclear to me.
Thank you, I'll fix this.
* commandFailed: I think that it should be kept much simpler. In particular, having errors on errors does not help much: on ELEVEL_FATAL, it ignores the actual reported error and generates another error of the same level, so that the initial issue is hidden. Even if these are can't-happen cases, hiding the origin if it occurs looks unhelpful. Just print it directly, and maybe abort if you think that it is a can't-happen case.
Oh, thanks, my mistake(
* copyRandomState: just use sizeof(RandomState) instead of making assumptions about the contents of the struct. Also, this function looks pretty useless, why not just do a plain assignment?

* copyVariables: lacks comments to explain that the destination is cleaned up and so on. The cleanup phase could probably be in a distinct function, so that the code would be clearer. Maybe the function variable names are too long.
Thank you, I'll fix this.
if (current_source->svalue)
in the context of a guard for a strdup, maybe:
if (current_source->svalue != NULL)
I'm sorry, I'll fix this.
* I do not understand the comments on the CState enum: "First, remember the failure in CSTATE_FAILURE. Then process other commands of the failed transaction if any". Why would other commands be processed at all if the transaction is aborted? For me any error must lead to the rollback and possible retry of the transaction. This comment needs to be clarified. It should also say that on FAILURE, it will go either to RETRY or ABORTED. See below my comments about doCustom.

It is unclear to me why there could be several failures within a transaction, as I would have thought that it would be aborted on the first one.

* I do not understand the purpose of first_failure. The comment should explain why it would need to be remembered. From my point of view, I'm not fully convinced that it should. <...>
* executeCondition: this hides client automaton state changes which were clearly visible beforehand in the switch, and the different handling of if & elif is also hidden.

I'm against this unnecessary restructuring and against hiding such information: all state changes should be clearly seen in the state switch so that it is easier to understand and follow.

I do not see why touching the conditional stack on internal errors (evaluateExpr failure) brings anything, the whole transaction will be aborted anyway.

* doCustom changes.

On CSTATE_START_COMMAND, it considers whether to retry at the end. For me, this cannot happen: if some command failed, then it should have skipped directly to the RETRY state, so that you cannot get to the end of the script with an error. Maybe you could assert that the state of the previous command is NO_FAILURE, though.

On CSTATE_FAILURE, the next command is possibly started. Although there is some consistency with the previous point, I think that it totally breaks the state automaton where now a command can start while the whole transaction is in failing state anyway. There was no point in starting it in the first place.

So, for me, the FAILURE state should record/count the failure, then skip to RETRY if a retry is decided, else proceed to ABORT. Nothing else. This is much clearer that way.

Then RETRY should reinstate the global state and proceed to start the *first* command again. <...>

It is unclear to me why backslash command errors are turned to FAILURE instead of ABORTED: there is no way they are going to be retried, so maybe they should/could skip directly to ABORTED?

Function executeCondition is a bad idea, as stated above.
So do you propose to execute the command "ROLLBACK" without calculating
its latency etc. if we are in a failed transaction and clear the
conditional stack after each failure?
Also just to be clear: do you want to have the state CSTATE_ABORTED for
client abortion and another state for interrupting the current
transaction?
The current RETRY state does memory allocations to generate a message with buffer allocation and so on. This looks like a costly and useless operation. If the user required "retries", then this is normal behavior, the retries are counted and will be printed out in the final report, and there is no point in printing out every single one of them. Maybe you want that for debugging, but then costly operations should be guarded.
I think we need these debugging messages because, for example, if you use the option --latency-limit, we will never know in advance whether the serialization/deadlock failure will be retried or not. They also help to understand which limit of retries was violated or how close we were to these limits during the execution of a specific transaction. But I agree with you that they are costly and can be skipped if the failure type is never retried. Maybe it is better to split them into multiple error function calls?..
* reporting

The number of transactions above the latency limit report can be simplified. Remove the if and just use one printf with a %s for the optional comment. I'm not sure this optional comment is useful there.
Oh, thanks, my mistake(
Before the patch, ISTM that all lines relied on one printf. You have changed to a style where a collection of printfs is used to compose a line. I'd suggest to keep to the previous one-printf-prints-one-line style, where possible.
Ok!
You have added 20-column alignment prints. This looks like too much and generates much too large lines. Probably 10 (billion) would be enough.
I have already asked you about this in [2] (/messages/by-id/e4c5e8cefa4a8e88f1273b0f1ee29e56@postgrespro.ru):
The variables for the numbers of failures and retries are of type int64 since the variable for the total number of transactions has the same type. That's why such a large alignment (as I understand it now, 20 characters is enough). Do you prefer floating alignments, depending on the maximum number of failures/retries for any command in any script?
Some people try to parse the output, so it should be deterministic. I'd add the needed columns always if appropriate (i.e. under retry), even if none occurred.
Ok!
* processXactStats: An else is replaced by detailed stats, with the initial "no detailed stats" comment kept. The function is called both in the then & else branch. The structure does not make sense anymore. I'm not sure this change was needed.

* getLatencyUsed: declared "double" so "return 0.0".

* typo: ruin -> run; probably others, I did not check for them in detail.
Oh, thanks, my mistakes(
TAP Tests
=========

On my laptop, tests last 5.5 seconds before the patch, and about 13 seconds after. This is much too large. Pgbench TAP tests do not deserve to take over twice as much time as before just on this patch.

One reason which explains this large time is that there is a new script with a newly created instance. I'd suggest to append tests to the existing 2 scripts, depending on whether they need a running instance or not.
Ok! All new tests that do not need a running instance are already added
to the file 002_pgbench_no_server.pl.
Secondly, I think that the design of the tests is too heavy. For such a feature, ISTM enough to check that it works, i.e. one test for deadlocks (trigger one or a few deadlocks), idem for serializable, maybe idem for other errors if any. <...>
The 900 ms latency limit try is a bad idea because it takes a lot of time. I did such tests before and they were removed by Tom Lane because of determinism and time issues. I would comment this test out for now.
Ok! If it doesn't bother you - can you tell more about the causes of
these determinism issues?.. Tests for some other failures that cannot be
retried are already added to 001_pgbench_with_server.pl.
The challenge is to do that reliably and efficiently, i.e. so that the test does not rely on chance and is still quite efficient.

The trick you use is to run an interactive psql in parallel to pgbench so as to play with concurrent locks. That is interesting, but deserves more comments and explanation, eg before the test functions.

Maybe this could be achieved within pgbench by using some wait stuff in PL/pgSQL so that concurrent clients can wait for one another based on data in an unlogged table updated by a CALL within an "embedded" transaction? Not sure. <...>
Anyway, TAP tests should be much lighter (in total time), and if
possible much simpler.
I'll try, thank you..
Otherwise, maybe (simple) pgbench-side thread
barrier could help, but this would require more thinking.
Tests must pass if we use --disable-thread-safety..
Documentation
=============

Not looked at in much detail for now. Just a few comments:

Having the "most important settings" on lines 1-6 and 8 (i.e. skipping 7) looks silly. The important ones should simply be the first ones, and the 8th is not that important, or it is in 7th position.
Ok!
I do not understand why there is so much text about the in-failed-sql-transaction stuff, while we are mainly interested in serialization & deadlock errors, and this only falls in some "other" category. There seems to be more detail about other errors than about deadlock & serializable errors.

The reporting should focus on what is of interest, either all errors, or some detailed split of these errors.<...>
* "errors_in_failed_tx" is some subcounter of "errors", for a special
case. Why it is there fails me [I finally understood, and I think it
should be removed, see end of review]. If we wanted to distinguish,
then we should distinguish homogeneously: maybe just count the
different error types, eg have things like "deadlock_errors",
"serializable_errors", "other_errors", "internal_pgbench_errors" which
would be orthogonal one to the other, and "errors" could be recomputed
from these.
Thank you, I agree with you. Unfortunately each new error type adds 1 or 2 new columns of maximum width 20 to the per-statement report (to report errors and possibly retries of this type in this statement), and we already have 2 new columns for all errors and retries. So I'm not sure that we need to add anything other than statistics about all the errors and all the retries in general.
The documentation should state clearly what the counted errors are, and then what their effects on the reported stats are. The "Errors and Serialization/Deadlock Retries" section is a good start in that direction, but it does not talk about pgbench internal errors (eg "cos(true)"). I think it should be more explicit about errors.
Thank you, I'll try to improve it.
Option --max-tries default value should be spelled out in the doc.
If you mean that it is set to 1 if neither of the options --max-tries or
--latency-limit is explicitly used, I'll fix this.
"Client's run is aborted", do you mean "Pgbench run is aborted"?
No, other clients continue their run as usual.
* FailureStatus states are not homogeneously named. I'd suggest to use *_FAILURE for all cases. The miscellaneous case should probably be the last. I do not understand the distinction between ANOTHER_FAILURE & IN_FAILED_SQL_TRANSACTION. Why should it be needed? [again, see end of review]<...>
"If a failed transaction block does not terminate in the current script": this just looks like a very bad idea, and explains my general ranting above about this error condition. ISTM that the only reasonable option is that a pgbench script should be enforced as a transaction, or a set of transactions, but cannot be a "piece" of a transaction, i.e. a pgbench script with "BEGIN;" but without a corresponding "COMMIT" is a user error and warrants an abort, so that there is no need to manage these "in aborted transaction" errors everywhere and report about them and document them extensively.

This means adding a check, when a script finishes or starts, that PQtransactionStatus(const PGconn *conn) == PQTRANS_IDLE, and aborting with a fatal error if not. Then we can forget about these "in tx errors" counting, reporting and so on, and just have to document the restriction.
Ok!
[1]: /messages/by-id/alpine.DEB.2.20.1801031720270.20034@lancre
[2]: /messages/by-id/e4c5e8cefa4a8e88f1273b0f1ee29e56@postgrespro.ru
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Marina,
* -d/--debug: I'm not in favor in requiring a mandatory text argument on this option.

As you wrote in [1], adding an additional option is also a bad idea:
Hey, I'm entitled to some internal contradictions:-)
I'm sceptical of the "--debug-fails" options. ISTM that --debug is
already there and should just be reused.
I was thinking that you could just use the existing --debug, not change
its syntax. My point was that --debug exists, and you could just print
the messages when under --debug.
Maybe it's better to use an optional argument/arguments for compatibility (--debug[=fails] or --debug[=NUM])? But if we use the numbers, now I can see only 2 levels, and there's no guarantee that they will not change..
Optional arguments to options (!) are not really clean things, so I'd like to avoid going onto this path, esp. as I cannot see any other instance in pgbench or elsewhere in postgres, and I personally consider these a bad idea.
So if absolutely necessary, a new option is still better than changing
--debug syntax. If not necessary, then it is better:-)
* I'm reserved about the whole ereport thing, see comments in other messages.

Thank you, I'll try to implement the error reporting in the way you suggested.

Dunno if it is a good idea either. The committer's word is the good one in the end:-)
Thank you, I'll fix this.
I'm sorry, I'll fix this.
You do not have to thank me or be sorry on every comment I make; once is enough for the former, and there is no need for the latter.
* doCustom changes.
On CSTATE_FAILURE, the next command is possibly started. Although there is some consistency with the previous point, I think that it totally breaks the state automaton, where now a command can start while the whole transaction is in a failing state anyway. There was no point in starting it in the first place.

So, for me, the FAILURE state should record/count the failure, then skip to RETRY if a retry is decided, else proceed to ABORT. Nothing else. This is much clearer that way.

Then RETRY should reinstate the global state and proceed to start the *first* command again.
<...>
It is unclear to me why backslash command errors are turned to FAILURE instead of ABORTED: there is no way they are going to be retried, so maybe they should/could skip directly to ABORTED?
So do you propose to execute the command "ROLLBACK" without calculating its
latency etc. if we are in a failed transaction and clear the conditional
stack after each failure?
Also just to be clear: do you want to have the state CSTATE_ABORTED for
client abortion and another state for interrupting the current transaction?
I do not understand what "interrupting the current transaction" means. A transaction is either committed or rolled back, I do not know about "interrupted". When it is rolled back, probably some stats will be collected in passing, I'm fine with that.
If there is an error in a pgbench script, the transaction is aborted,
which means for me that the script execution is stopped where it was, and
either it is restarted from the beginning (retry) or counted as failure
(not retry, just aborted, really).
If by interrupted you mean that one script begins a transaction and
another ends it, as I said in the review I think that this strange case
should be forbidden, so that all the code and documentation trying to
manage that can be removed.
The current RETRY state does memory allocations to generate a message, with buffer allocation and so on. This looks like a costly and useless operation. If the user required "retries", then this is normal behavior: the retries are counted and will be printed out in the final report, and there is no point in printing out every single one of them. Maybe you want that for debugging, but then costly operations should be guarded.

I think we need these debugging messages because, for example,

Debugging messages should cost only when under debug. When not under debug, there should be no debugging message, and there should be no cost for building and discarding such messages in the executed code path beyond testing whether the program is under debug.

if you use the option --latency-limit, we will never know in advance whether the serialization/deadlock failure will be retried or not.
ISTM that it will be shown in the final report. If I want debug, I ask for --debug, otherwise I think that the command should do what it was asked for, i.e. run scripts, collect performance statistics and show them at the end. In particular, when running with retries enabled, the user is expecting deadlock/serialization errors, so they are not "errors" as such for them.
They also help to understand which limit of retries was violated or how
close we were to these limits during the execution of a specific
transaction. But I agree with you that they are costly and can be
skipped if the failure type is never retried. Maybe it is better to
split them into multiple error function calls?..
Debugging message costs should only be incurred when under --debug, not
otherwise.
You have added 20-column alignment prints. This looks like too much and generates much too large lines. Probably 10 (billion) would be enough.

I have already asked you about this in [2]:
Probably:-)
The variables for the numbers of failures and retries are of type int64 since the variable for the total number of transactions has the same type. That's why such a large alignment (as I understand it now, 20 characters is enough). Do you prefer floating alignments, depending on the maximum number of failures/retries for any command in any script?
An int64 counter is not likely to reach its limit anytime soon:-) If the
column display limit is ever reached, ISTM that then the text is just
misaligned, which is a minor and rare inconvenience. If very wide columns
are used, then it does not fit my terminal and the report text will always
be wrapped around, which makes it harder to read, every time.
The 900 ms latency limit test is a bad idea because it takes a lot of time. I did such tests before and they were removed by Tom Lane because of determinism and time issues. I would comment this test out for now.

Ok! If it doesn't bother you - can you tell more about the causes of these determinism issues?.. Tests for some other failures that cannot be retried are already added to 001_pgbench_with_server.pl.
Some farm animals are very slow, so you cannot really assume much about
time one way or another.
Otherwise, maybe (simple) pgbench-side thread barrier could help, but this would require more thinking.

Tests must pass if we use --disable-thread-safety..
Sure. My wording was misleading. I just meant a synchronisation barrier
between concurrent clients, which could be managed with one thread.
Anyway, it is probably overkill for the problem at hand, so just forget.
I do not understand why there is so much text about the in-failed-sql-transaction stuff, while we are mainly interested in serialization & deadlock errors, and this only falls in some "other" category. There seems to be more detail about other errors than about deadlock & serializable errors.

The reporting should focus on what is of interest, either all errors, or some detailed split of these errors.<...>
* "errors_in_failed_tx" is some subcounter of "errors", for a special case. Why it is there fails me [I finally understood, and I think it should be removed, see end of review]. If we wanted to distinguish, then we should distinguish homogeneously: maybe just count the different error types, eg have things like "deadlock_errors", "serializable_errors", "other_errors", "internal_pgbench_errors", which would be orthogonal to one another, and "errors" could be recomputed from these.

Thank you, I agree with you. Unfortunately each new error type adds 1 or 2 new columns of maximum width 20 to the per-statement report

The fact that some data are collected does not mean that they should all be reported in detail. We can have detailed error counts and report the sum of these errors for instance, or have some more verbose/detailed reports as options (eg --latencies does just that).
<...>
"If a failed transaction block does not terminate in the current script": this just looks like a very bad idea, and explains my general ranting above about this error condition. ISTM that the only reasonable option is that a pgbench script should be enforced as a transaction, or a set of transactions, but cannot be a "piece" of a transaction, i.e. a pgbench script with "BEGIN;" but without a corresponding "COMMIT" is a user error and warrants an abort, so that there is no need to manage these "in aborted transaction" errors everywhere and report about them and document them extensively.

This means adding a check, when a script finishes or starts, that PQtransactionStatus(const PGconn *conn) == PQTRANS_IDLE, and aborting with a fatal error if not. Then we can forget about these "in tx errors" counting, reporting and so on, and just have to document the restriction.

Ok!
Good:-) ISTM that this would remove a significant amount of complexity
from the code and documentation.
--
Fabien.
On 11-07-2018 16:24, Fabien COELHO wrote:
Hello Marina,
* -d/--debug: I'm not in favor in requiring a mandatory text argument on this option.

As you wrote in [1], adding an additional option is also a bad idea:

Hey, I'm entitled to some internal contradictions:-)
... and discussions will continue forever %-)
I'm sceptical of the "--debug-fails" options. ISTM that --debug is already there and should just be reused.

I was thinking that you could just use the existing --debug, not change its syntax. My point was that --debug exists, and you could just print the messages when under --debug.
Now I understand you better, thanks. I think it will be useful to receive only messages about failures, because they and progress reports can be lost among many other debug messages such as "client %d sending ..." / "client %d executing ..." / "client %d receiving".
Maybe it's better to use an optional argument/arguments for compatibility (--debug[=fails] or --debug[=NUM])? But if we use the numbers, now I can see only 2 levels, and there's no guarantee that they will not change..

Optional arguments to options (!) are not really clean things, so I'd like to avoid going onto this path, esp. as I cannot see any other instance in pgbench or elsewhere in postgres,

AFAICS they are used in pg_waldump (option --stats[=record]) and in psql (option --help[=topic]).

and I personally consider these a bad idea.
So if absolutely necessary, a new option is still better than changing
--debug syntax. If not necessary, then it is better:-)
Ok!
* I'm reserved about the whole ereport thing, see comments in other messages.

Thank you, I'll try to implement the error reporting in the way you suggested.

Dunno if it is a good idea either. The committer's word is the good one in the end:-)

I agree with you that ereport has good reasons to be non-trivial in the backend, and it does not have the same reasons in pgbench..
* doCustom changes.
On CSTATE_FAILURE, the next command is possibly started. Although there is some consistency with the previous point, I think that it totally breaks the state automaton, where now a command can start while the whole transaction is in a failing state anyway. There was no point in starting it in the first place.

So, for me, the FAILURE state should record/count the failure, then skip to RETRY if a retry is decided, else proceed to ABORT. Nothing else. This is much clearer that way.

Then RETRY should reinstate the global state and proceed to start the *first* command again.
<...>
It is unclear to me why backslash command errors are turned to FAILURE instead of ABORTED: there is no way they are going to be retried, so maybe they should/could skip directly to ABORTED?

So do you propose to execute the command "ROLLBACK" without calculating its latency etc. if we are in a failed transaction and clear the conditional stack after each failure?

Also just to be clear: do you want to have the state CSTATE_ABORTED for client abortion and another state for interrupting the current transaction?

I do not understand what "interrupting the current transaction" means. A transaction is either committed or rolled back, I do not know about "interrupted".
I mean that IIUC the server usually only reports the error, and you must manually send the command "END" or "ROLLBACK" to roll back a failed transaction.
When it is rolled back, probably some stats will be collected in passing, I'm fine with that.

If there is an error in a pgbench script, the transaction is aborted, which means for me that the script execution is stopped where it was, and either it is restarted from the beginning (retry) or counted as a failure (not retry, just aborted, really).

If by interrupted you mean that one script begins a transaction and another ends it, as I said in the review I think that this strange case should be forbidden, so that all the code and documentation trying to manage that can be removed.
Ok!
The current RETRY state does memory allocations to generate a message, with buffer allocation and so on. This looks like a costly and useless operation. If the user required "retries", then this is normal behavior: the retries are counted and will be printed out in the final report, and there is no point in printing out every single one of them. Maybe you want that for debugging, but then costly operations should be guarded.

I think we need these debugging messages because, for example,

Debugging messages should cost only when under debug. When not under debug, there should be no debugging message, and there should be no cost for building and discarding such messages in the executed code path beyond testing whether the program is under debug.

if you use the option --latency-limit, we will never know in advance whether the serialization/deadlock failure will be retried or not.

ISTM that it will be shown in the final report. If I want debug, I ask for --debug, otherwise I think that the command should do what it was asked for, i.e. run scripts, collect performance statistics and show them at the end.

In particular, when running with retries enabled, the user is expecting deadlock/serialization errors, so they are not "errors" as such for them.

They also help to understand which limit of retries was violated or how close we were to these limits during the execution of a specific transaction. But I agree with you that they are costly and can be skipped if the failure type is never retried. Maybe it is better to split them into multiple error function calls?..

Debugging message costs should only be incurred when under --debug, not otherwise.
Ok! IIUC instead of this part of the code

    initPQExpBuffer(&errmsg_buf);
    printfPQExpBuffer(&errmsg_buf,
                      "client %d repeats the failed transaction (try %d",
                      st->id, st->retries + 1);
    if (max_tries)
        appendPQExpBuffer(&errmsg_buf, "/%d", max_tries);
    if (latency_limit)
    {
        appendPQExpBuffer(&errmsg_buf,
                          ", %.3f%% of the maximum time of tries was used",
                          getLatencyUsed(st, &now));
    }
    appendPQExpBufferStr(&errmsg_buf, ")\n");
    pgbench_error(DEBUG_FAIL, "%s", errmsg_buf.data);
    termPQExpBuffer(&errmsg_buf);
can we try something like this?

    PGBENCH_ERROR_START(DEBUG_FAIL)
    {
        PGBENCH_ERROR("client %d repeats the failed transaction (try %d",
                      st->id, st->retries + 1);
        if (max_tries)
            PGBENCH_ERROR("/%d", max_tries);
        if (latency_limit)
        {
            PGBENCH_ERROR(", %.3f%% of the maximum time of tries was used",
                          getLatencyUsed(st, &now));
        }
        PGBENCH_ERROR(")\n");
    }
    PGBENCH_ERROR_END();
You have added 20-column alignment prints. This looks like too much and generates much too large lines. Probably 10 (billion) would be enough.

I have already asked you about this in [2]:

Probably:-)

The variables for the numbers of failures and retries are of type int64 since the variable for the total number of transactions has the same type. That's why such a large alignment (as I understand it now, 20 characters is enough). Do you prefer floating alignments, depending on the maximum number of failures/retries for any command in any script?

An int64 counter is not likely to reach its limit anytime soon:-) If the column display limit is ever reached, ISTM that then the text is just misaligned, which is a minor and rare inconvenience. If very wide columns are used, then it does not fit my terminal and the report text will always be wrapped around, which makes it harder to read, every time.
Ok!
The 900 ms latency limit test is a bad idea because it takes a lot of time. I did such tests before and they were removed by Tom Lane because of determinism and time issues. I would comment this test out for now.

Ok! If it doesn't bother you - can you tell more about the causes of these determinism issues?.. Tests for some other failures that cannot be retried are already added to 001_pgbench_with_server.pl.

Some farm animals are very slow, so you cannot really assume much about time one way or another.
Thanks!
I do not understand why there is so much text about the in-failed-sql-transaction stuff, while we are mainly interested in serialization & deadlock errors, and this only falls in some "other" category. There seems to be more detail about other errors than about deadlock & serializable errors.

The reporting should focus on what is of interest, either all errors, or some detailed split of these errors.<...>

* "errors_in_failed_tx" is some subcounter of "errors", for a special case. Why it is there fails me [I finally understood, and I think it should be removed, see end of review]. If we wanted to distinguish, then we should distinguish homogeneously: maybe just count the different error types, eg have things like "deadlock_errors", "serializable_errors", "other_errors", "internal_pgbench_errors", which would be orthogonal to one another, and "errors" could be recomputed from these.

Thank you, I agree with you. Unfortunately each new error type adds 1 or 2 new columns of maximum width 20 to the per-statement report

The fact that some data are collected does not mean that they should all be reported in detail. We can have detailed error counts and report the sum of these errors for instance, or have some more verbose/detailed reports as options (eg --latencies does just that).
Ok!
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2018-Jul-11, Marina Polyakova wrote:
can we try something like this?

    PGBENCH_ERROR_START(DEBUG_FAIL)
    {
        PGBENCH_ERROR("client %d repeats the failed transaction (try %d",
                      st->id, st->retries + 1);
        if (max_tries)
            PGBENCH_ERROR("/%d", max_tries);
        if (latency_limit)
        {
            PGBENCH_ERROR(", %.3f%% of the maximum time of tries was used",
                          getLatencyUsed(st, &now));
        }
        PGBENCH_ERROR(")\n");
    }
    PGBENCH_ERROR_END();
I didn't quite understand what these PGBENCH_ERROR() functions/macros
are supposed to do. Care to explain?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Just a quick skim while refreshing what were those error reporting API
changes about ...
On 2018-May-21, Marina Polyakova wrote:
v9-0001-Pgbench-errors-use-the-RandomState-structure-for-.patch
- a patch for the RandomState structure (this is used to reset a client's
random seed during the repeating of transactions after
serialization/deadlock failures).
LGTM, though I'd rename the random_state struct members so that it
wouldn't look as confusing. Maybe that's just me.
v9-0002-Pgbench-errors-use-the-Variables-structure-for-cl.patch
- a patch for the Variables structure (this is used to reset client
variables during the repeating of transactions after serialization/deadlock
failures).
Please don't allocate Variable structs one by one. First time allocate some decent number (say 8) and then enlarge by doubling the size. That way you save realloc overhead. We use this technique everywhere else, no reason to do differently here. Other than that, LGTM.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
can we try something like this?

    PGBENCH_ERROR_START(DEBUG_FAIL)
    {
        PGBENCH_ERROR("client %d repeats the failed transaction (try %d",
Argh, no? I was thinking of something much more trivial:
pgbench_error(DEBUG, "message format %d %s...", 12, "hello world");
If you really need some complex dynamic buffer, and I would prefer that you avoid that, then the fallback is:

    if (level >= DEBUG)
    {
        initPQstuff(&msg);
        ...
        pgbench_error(DEBUG, "fixed message... %s\n", msg);
        freePQstuff(&msg);
    }
The point is to avoid building the message with dynamic allocation and so on if in the end it is not used.
--
Fabien.
On 11-07-2018 20:49, Alvaro Herrera wrote:
On 2018-Jul-11, Marina Polyakova wrote:
can we try something like this?

    PGBENCH_ERROR_START(DEBUG_FAIL)
    {
        PGBENCH_ERROR("client %d repeats the failed transaction (try %d",
                      st->id, st->retries + 1);
        if (max_tries)
            PGBENCH_ERROR("/%d", max_tries);
        if (latency_limit)
        {
            PGBENCH_ERROR(", %.3f%% of the maximum time of tries was used",
                          getLatencyUsed(st, &now));
        }
        PGBENCH_ERROR(")\n");
    }
    PGBENCH_ERROR_END();

I didn't quite understand what these PGBENCH_ERROR() functions/macros are supposed to do. Care to explain?
It is used only to print a string with the given arguments to stderr.
Probably it might be just the function pgbench_error and not a macro..
P.S. This is my mistake: I did not take into account that PGBENCH_ERROR_END does not know the elevel, which it would need in order to call exit(1) if the elevel >= ERROR.
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 11-07-2018 21:04, Alvaro Herrera wrote:
Just a quick skim while refreshing what were those error reporting API
changes about ...
Thank you!
On 2018-May-21, Marina Polyakova wrote:
v9-0001-Pgbench-errors-use-the-RandomState-structure-for-.patch
- a patch for the RandomState structure (this is used to reset a client's random seed during the repeating of transactions after serialization/deadlock failures).

LGTM, though I'd rename the random_state struct members so that it wouldn't look as confusing. Maybe that's just me.
IIUC, do you like "xseed" instead of "data"?

    typedef struct RandomState
    {
    -	unsigned short data[3];
    +	unsigned short xseed[3];
    } RandomState;
Or do you want to rename "random_state" in the structures RetryState / CState / TState? Thanks to Fabien Coelho's comments in [1], TState can contain several RandomStates for different purposes, something like this:

    /*
     * Thread state
     */
    typedef struct
    {
        ...

        /*
         * Separate randomness for each thread. Each thread option uses its
         * own random state to make all of them independent of each other
         * and therefore deterministic at the thread level.
         */
        RandomState choose_script_rs;	/* random state for selecting a script */
        RandomState throttling_rs;		/* random state for transaction throttling */
        RandomState sampling_rs;		/* random state for log sampling */

        ...
    } TState;
v9-0002-Pgbench-errors-use-the-Variables-structure-for-cl.patch
- a patch for the Variables structure (this is used to reset client variables during the repeating of transactions after serialization/deadlock failures).

Please don't allocate Variable structs one by one. First time allocate some decent number (say 8) and then enlarge by doubling the size. That way you save realloc overhead. We use this technique everywhere else, no reason to do differently here. Other than that, LGTM.
Ok!
[1]: /messages/by-id/alpine.DEB.2.21.1806090810090.5307@lancre
While reading your patch, it occurs to me that a run is not deterministic at the thread level under throttling and sampling, because the random state is solicited differently depending on when a transaction ends. This suggests that maybe each thread random_state use should have its own random state.
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 11-07-2018 22:34, Fabien COELHO wrote:
can we try something like this?

    PGBENCH_ERROR_START(DEBUG_FAIL)
    {
        PGBENCH_ERROR("client %d repeats the failed transaction (try %d",

Argh, no? I was thinking of something much more trivial:

    pgbench_error(DEBUG, "message format %d %s...", 12, "hello world");

If you really need some complex dynamic buffer, and I would prefer that you avoid that, then the fallback is:

    if (level >= DEBUG)
    {
        initPQstuff(&msg);
        ...
        pgbench_error(DEBUG, "fixed message... %s\n", msg);
        freePQstuff(&msg);
    }

The point is to avoid building the message with dynamic allocation and so on if in the end it is not used.
Ok! About avoidance - I'm afraid there's one more piece of debugging code with the same problem:

    else if (command->type == META_COMMAND)
    {
        ...
        initPQExpBuffer(&errmsg_buf);
        printfPQExpBuffer(&errmsg_buf, "client %d executing \\%s",
                          st->id, argv[0]);
        for (i = 1; i < argc; i++)
            appendPQExpBuffer(&errmsg_buf, " %s", argv[i]);
        appendPQExpBufferChar(&errmsg_buf, '\n');
        ereport(ELEVEL_DEBUG, (errmsg("%s", errmsg_buf.data)));
        termPQExpBuffer(&errmsg_buf);
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
The point is to avoid building the message with dynamic allocation and so on if in the end it is not used.

Ok! About avoidance - I'm afraid there's one more piece of debugging code with the same problem:
Indeed. I'd like to avoid all instances, so that PQExpBufferData is not
needed anywhere, if possible. If not possible, then too bad, but I'd
prefer to make do with formatted prints only, for simplicity.
--
Fabien.
Hello, hackers!
Here is the tenth version of the patch for error handling and retrying of transactions with serialization/deadlock failures in pgbench (based on commit e0ee93053998b159e395deed7c42e02b1f921552), thanks to the comments of Fabien Coelho and Alvaro Herrera in this thread.
v10-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
- a patch for the RandomState structure (this is used to reset a
client's random seed during the repeating of transactions after
serialization/deadlock failures).
v10-0002-Pgbench-errors-use-a-separate-function-to-report.patch
- a patch for a separate error reporting function (this is used to report client failures that do not cause an abort, and this depends on the debugging level).
v10-0003-Pgbench-errors-use-the-Variables-structure-for-c.patch
- a patch for the Variables structure (this is used to reset client
variables during the repeating of transactions after
serialization/deadlock failures).
v10-0004-Pgbench-errors-and-serialization-deadlock-retrie.patch
- the main patch for handling client errors and repetition of
transactions with serialization/deadlock failures (see the detailed
description in the file).
As Fabien wrote in [5], some of the new tests were too slow. Earlier on my laptop they increased the testing time of pgbench from 5.5 seconds to 12.5 seconds. In the new version the pgbench tests take about 7 seconds. These tests include one test for serialization failure and retry, as well as one test for deadlock failure and retry. Both of them are in the file 001_pgbench_with_server.pl, each test uses only one pgbench run, and they use PL/pgSQL scripts instead of a parallel psql session.
Any suggestions are welcome!
All that was fixed from the previous version:
[1]: /messages/by-id/alpine.DEB.2.21.1806090810090.5307@lancre
ISTM that the struct itself does not need a name, ie. "typedef struct { ... } RandomState" is enough.

There could be clear comments, say in the TState and CState structs, about what randomness is impacted (i.e. script choices, etc.).

getZipfianRand, computeHarmonicZipfian: The "thread" parameter was justified because it was used for two fields. As the random state is separated, I'd suggest that the other argument should be a zipfcache pointer.
While reading your patch, it occurs to me that a run is not deterministic at the thread level under throttling and sampling, because the random state is solicited differently depending on when a transaction ends. This suggests that maybe each thread random_state use should have its own random state.
[2]: /messages/by-id/alpine.DEB.2.21.1806091514060.3655@lancre
The structure typedef does not need a name. "typedef struct { } V...".

I tend to disagree with naming things after their type, eg "array". I'd suggest "vars" instead. "nvariables" could be "nvars" for consistency with that and "vars_sorted", and because "foo.variables->nvariables" starts looking heavy.

I'd suggest putting the "Variables" type declaration just after the "Variable" type declaration in the file.
[3]: /messages/by-id/alpine.DEB.2.21.1806100837380.3655@lancre
The semantics of the existing code is changed: the FATAL level calls
abort() and replaces existing exit(1) calls. Maybe you want an ERROR
level as well.
I do not understand why names are changed, eg ELEVEL_FATAL instead of
FATAL. ISTM that part of the point of the move would be to be homogeneous,
which suggests that the same names should be reused.
[4]: /messages/by-id/alpine.DEB.2.21.1807081014260.17811@lancre
I'd suggest to have just one clean and simple pgbench internal function to
handle errors and possibly exit, debug... Something like:

    void pgb_error(FATAL, "error %d raised", 12);

Implemented as

    void pgb_error(int/enum XXX level, const char *format, ...)
    {
        test level and maybe return immediately (eg debug);
        print to stderr;
        exit/abort/return depending;
    }
[5]: /messages/by-id/alpine.DEB.2.21.1807091451520.17811@lancre
Levels ELEVEL_LOG_CLIENT_{FAIL,ABORTED} & LOG_MAIN look unclear to me.
In particular, the "CLIENT" part is not very useful. If the distinction
makes sense, I would have kept "LOG" for the initial one and added other
ones for ABORT and PGBENCH, maybe.
* There are no comments about "retries" in the StatData, CState and
Command structures.
* Also, for StatData, I would like to understand the logic between cnt,
skipped, retries, retried, errors, ... so clear information about the
expected invariant, if any, would be welcome. One has to go into the code
to understand how these fields relate to one another.
* "errors_in_failed_tx" is some subcounter of "errors", for a special
case. Why it is there escapes me [I finally understood, and I think it
should be removed, see end of review]. If we wanted to distinguish, then
we should distinguish homogeneously: maybe just count the different error
types, eg have things like "deadlock_errors", "serializable_errors",
"other_errors", "internal_pgbench_errors" which would be orthogonal to
one another, and "errors" could be recomputed from these.
* How "errors" differs from "ecnt" is unclear to me.
* FailureStatus states are not homogeneously named. I'd suggest to use
*_FAILURE for all cases. The miscellaneous case should probably be the
last.
* I do not understand the comments on the CState enum: "First, remember
the failure in CSTATE_FAILURE. Then process other commands of the failed
transaction if any". Why would other commands be processed at all if the
transaction is aborted? For me any error must lead to the rollback and
possible retry of the transaction.
...
So, for me, the FAILURE state should record/count the failure, then skip
to RETRY if a retry is decided, else proceed to ABORT. Nothing else. This
is much clearer that way.
Then RETRY should reinstate the global state and proceed to start the
*first* command again.
* commandFailed: I think that it should be kept much simpler. In
particular, having errors on errors does not help much: on ELEVEL_FATAL,
it ignores the actual reported error and generates another error of the
same level, so that the initial issue is hidden. Even if these are
can't-happen cases, hiding the origin if it occurs looks unhelpful. Just
report it directly, and maybe abort if you think that it is a can't-happen
case.
* copyRandomState: just use sizeof(RandomState) instead of making
assumptions about the contents of the struct. Also, this function looks
pretty useless, why not just do a plain assignment?
* copyVariables: lacks comments to explain that the destination is
cleaned up and so on. The cleanup phase could probably be in a distinct
function, so that the code would be clearer. Maybe the function variable
names are too long.

    if (current_source->svalue)

in the context of a guard for a strdup, maybe:

    if (current_source->svalue != NULL)
* executeCondition: this hides client automaton state changes which were
clearly visible beforehand in the switch, and the different handling of
if & elif is also hidden.
I'm against this unnecessary restructuring: rather than hiding such
information, all state changes should be clearly seen in the state switch
so that it is easier to understand and follow.
I do not see why touching the conditional stack on internal errors
(evaluateExpr failure) brings anything, the whole transaction will be
aborted anyway.
The current RETRY state does memory allocations to generate a message
with buffer allocation and so on. This looks like a costly and useless
operation. If the user requested "retries", then this is normal behavior,
the retries are counted and will be printed out in the final report, and
there is no point in printing out every single one of them. Maybe you
want that for debugging, but then costly operations should be guarded.
The number-of-transactions-above-the-latency-limit report can be
simplified. Remove the if and just use one printf with a %s for the
optional comment. I'm not sure this optional comment is useful there.
Before the patch, ISTM that all lines relied on one printf. You have
changed to a style where a collection of printfs is used to compose a
line. I'd suggest to keep to the previous one-printf-prints-one-line
style where possible.
You have added 20-column alignment prints. This looks like too much and
generates much too long lines. Probably 10 (billion) would be enough.
Some people try to parse the output, so it should be deterministic. I'd
add the needed columns always if appropriate (i.e. under retry), even if
none occurred.
* processXactStats: An else is replaced by detailed stats, with the
initial "no detailed stats" comment kept. The function is called both in
the then & else branches. The structure does not make sense anymore. I'm
not sure this change was needed.
* getLatencyUsed: declared "double" so "return 0.0".
* typo: ruin -> run; probably others, I did not check for them in detail.
On my laptop, tests last 5.5 seconds before the patch and about 13
seconds after. This is much too long. Pgbench TAP tests do not deserve to
take over twice as much time as before just because of this patch.
One reason which explains this large time is that there is a new script
with a newly created instance. I'd suggest to append tests to the
existing 2 scripts, depending on whether they need a running instance or
not.
Secondly, I think that the design of the tests is too heavy. For such a
feature, ISTM enough to check that it works, i.e. one test for deadlocks
(trigger one or a few deadlocks), idem for serializable, maybe idem for
other errors if any.
The challenge is to do that reliably and efficiently, i.e. so that the
test does not rely on chance and is still quite efficient.
The trick you use is to run an interactive psql in parallel to pgbench so
as to play with concurrent locks. That is interesting, but deserves more
comments and explanation, eg before the test functions.
Maybe this could be achieved within pgbench by using some wait stuff in
PL/pgSQL so that concurrent clients can wait on one another based on data
in an unlogged table updated by a CALL within "embedded" transactions?
Not sure. ...
Anyway, TAP tests should be much lighter (in total time), and if possible
much simpler.
The 900 ms latency limit try is a bad idea because it takes a lot of
time. I did such tests before and they were removed by Tom Lane because
of determinism and time issues. I would comment this test out for now.
Documentation
...
Having the "most important settings" on lines 1-6 and 8 (i.e. skipping 7)
looks silly. The important ones should simply be the first ones, and the
8th is not that important, or it is in 7th position.
I do not understand why there is so much text about the in-failed-sql-
transaction stuff, while we are mainly interested in serialization &
deadlock errors, and this only falls in some "other" category. There seem
to be more details about other errors than about deadlock & serialization
errors.
The reporting should focus on what is of interest, either all errors, or
some detailed split of these errors. The documentation should state
clearly what the counted errors are, and then what their effects on the
reported stats are.
The "Errors and Serialization/Deadlock Retries" section is a good start
in that direction, but it does not talk about pgbench internal errors (eg
"cos(true)"). I think it should be more explicit about errors.
Option --max-tries default value should be spelled out in the doc.
[6]: /messages/by-id/alpine.DEB.2.21.1807111435250.27883@lancre
So if absolutely necessary, a new option is still better than changing
--debug syntax. If not necessary, then it is better:-)
The fact that some data are collected does not mean that they should all
be reported in detail. We can have detailed error counts and report the
sum of these errors for instance, or have some more verbose/detailed
reports as options (eg --latencies does just that).
[7]: /messages/by-id/20180711180417.3ytmmwmonsr5lra7@alvherre.pgsql
LGTM, though I'd rename the random_state struct members so that it
wouldn't look as confusing. Maybe that's just me.
Please don't allocate Variable structs one by one. First time allocate
some decent number (say 8) and then enlarge by doubling the size. That
way you save realloc overhead. We use this technique everywhere else, no
reason to do it differently here. Other than that, LGTM.
[8]: /messages/by-id/alpine.DEB.2.21.1807112124210.27883@lancre
If you really need some complex dynamic buffer, and I would prefer that
you avoid that, then the fallback is:

    if (level >= DEBUG)
    {
        initPQstuff(&msg);
        ...
        pgbench_error(DEBUG, "fixed message... %s\n", msg);
        freePQstuff(&msg);
    }

The point is to avoid building the message with dynamic allocation and so
on if in the end it is not used.
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v10-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
From d71a8157b48880722092e6e747f684fb31d8d019 Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Tue, 7 Aug 2018 13:27:21 +0300
Subject: [PATCH v10 1/4] Pgbench errors: use the RandomState structure for
thread/client random seed.
This is most important when it is used to reset a client's random seed during
the repeating of transactions after serialization/deadlock failures.
Use the random state of the client for random functions PGBENCH_RANDOM_* during
the execution of the script. Use the random state of the each thread option (to
choose the script / get the throttle delay / to log with a sample rate) to make
all of them independent of each other and therefore deterministic at the thread
level.
---
src/bin/pgbench/pgbench.c | 104 +++++++++++++++++++++++++++++++++-------------
1 file changed, 74 insertions(+), 30 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 41b756c..988e37b 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -251,6 +251,14 @@ typedef struct StatsData
} StatsData;
/*
+ * Data structure for thread/client random seed.
+ */
+typedef struct
+{
+ unsigned short xseed[3];
+} RandomState;
+
+/*
* Connection state machine states.
*/
typedef enum
@@ -331,6 +339,12 @@ typedef struct
ConnectionStateEnum state; /* state machine's current state. */
ConditionalStack cstack; /* enclosing conditionals state */
+ /*
+ * Separate randomness for each client. This is used for random functions
+ * PGBENCH_RANDOM_* during the execution of the script.
+ */
+ RandomState random_state;
+
int use_file; /* index in sql_script for this client */
int command; /* command number in script */
@@ -390,7 +404,16 @@ typedef struct
pthread_t thread; /* thread handle */
CState *state; /* array of CState */
int nstate; /* length of state[] */
- unsigned short random_state[3]; /* separate randomness for each thread */
+
+ /*
+ * Separate randomness for each thread. Each thread option uses its own
+ * random state to make all of them independent of each other and therefore
+ * deterministic at the thread level.
+ */
+ RandomState choose_script_rs; /* random state for selecting a script */
+ RandomState throttling_rs; /* random state for transaction throttling */
+ RandomState sampling_rs; /* random state for log sampling */
+
int64 throttle_trigger; /* previous/next throttling (us) */
FILE *logfile; /* where to log, or NULL */
ZipfCache zipf_cache; /* for thread-safe zipfian random number
@@ -694,7 +717,7 @@ gotdigits:
/* random number generator: uniform distribution from min to max inclusive */
static int64
-getrand(TState *thread, int64 min, int64 max)
+getrand(RandomState *random_state, int64 min, int64 max)
{
/*
* Odd coding is so that min and max have approximately the same chance of
@@ -705,7 +728,7 @@ getrand(TState *thread, int64 min, int64 max)
* protected by a mutex, and therefore a bottleneck on machines with many
* CPUs.
*/
- return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
+ return min + (int64) ((max - min + 1) * pg_erand48(random_state->xseed));
}
/*
@@ -714,7 +737,8 @@ getrand(TState *thread, int64 min, int64 max)
* value is exp(-parameter).
*/
static int64
-getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
+getExponentialRand(RandomState *random_state, int64 min, int64 max,
+ double parameter)
{
double cut,
uniform,
@@ -724,7 +748,7 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
Assert(parameter > 0.0);
cut = exp(-parameter);
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(random_state->xseed);
/*
* inner expression in (cut, 1] (if parameter > 0), rand in [0, 1)
@@ -737,7 +761,8 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
/* random number generator: gaussian distribution from min to max inclusive */
static int64
-getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
+getGaussianRand(RandomState *random_state, int64 min, int64 max,
+ double parameter)
{
double stdev;
double rand;
@@ -765,8 +790,8 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* are expected in (0, 1] (see
* https://en.wikipedia.org/wiki/Box-Muller_transform)
*/
- double rand1 = 1.0 - pg_erand48(thread->random_state);
- double rand2 = 1.0 - pg_erand48(thread->random_state);
+ double rand1 = 1.0 - pg_erand48(random_state->xseed);
+ double rand2 = 1.0 - pg_erand48(random_state->xseed);
/* Box-Muller basic form transform */
double var_sqrt = sqrt(-2.0 * log(rand1));
@@ -793,7 +818,7 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* will approximate a Poisson distribution centered on the given value.
*/
static int64
-getPoissonRand(TState *thread, int64 center)
+getPoissonRand(RandomState *random_state, int64 center)
{
/*
* Use inverse transform sampling to generate a value > 0, such that the
@@ -802,7 +827,7 @@ getPoissonRand(TState *thread, int64 center)
double uniform;
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(random_state->xseed);
return (int64) (-log(uniform) * ((double) center) + 0.5);
}
@@ -880,7 +905,7 @@ zipfFindOrCreateCacheCell(ZipfCache *cache, int64 n, double s)
* Luc Devroye, p. 550-551, Springer 1986.
*/
static int64
-computeIterativeZipfian(TState *thread, int64 n, double s)
+computeIterativeZipfian(RandomState *random_state, int64 n, double s)
{
double b = pow(2.0, s - 1.0);
double x,
@@ -891,8 +916,8 @@ computeIterativeZipfian(TState *thread, int64 n, double s)
while (true)
{
/* random variates */
- u = pg_erand48(thread->random_state);
- v = pg_erand48(thread->random_state);
+ u = pg_erand48(random_state->xseed);
+ v = pg_erand48(random_state->xseed);
x = floor(pow(u, -1.0 / (s - 1.0)));
@@ -910,10 +935,11 @@ computeIterativeZipfian(TState *thread, int64 n, double s)
* Jim Gray et al, SIGMOD 1994
*/
static int64
-computeHarmonicZipfian(TState *thread, int64 n, double s)
+computeHarmonicZipfian(ZipfCache *zipf_cache, RandomState *random_state,
+ int64 n, double s)
{
- ZipfCell *cell = zipfFindOrCreateCacheCell(&thread->zipf_cache, n, s);
- double uniform = pg_erand48(thread->random_state);
+ ZipfCell *cell = zipfFindOrCreateCacheCell(zipf_cache, n, s);
+ double uniform = pg_erand48(random_state->xseed);
double uz = uniform * cell->harmonicn;
if (uz < 1.0)
@@ -925,7 +951,8 @@ computeHarmonicZipfian(TState *thread, int64 n, double s)
/* random number generator: zipfian distribution from min to max inclusive */
static int64
-getZipfianRand(TState *thread, int64 min, int64 max, double s)
+getZipfianRand(ZipfCache *zipf_cache, RandomState *random_state, int64 min,
+ int64 max, double s)
{
int64 n = max - min + 1;
@@ -934,8 +961,8 @@ getZipfianRand(TState *thread, int64 min, int64 max, double s)
return min - 1 + ((s > 1)
- ? computeIterativeZipfian(thread, n, s)
- : computeHarmonicZipfian(thread, n, s));
+ ? computeIterativeZipfian(random_state, n, s)
+ : computeHarmonicZipfian(zipf_cache, random_state, n, s));
}
/*
@@ -2209,7 +2236,7 @@ evalStandardFunc(TState *thread, CState *st,
if (func == PGBENCH_RANDOM)
{
Assert(nargs == 2);
- setIntValue(retval, getrand(thread, imin, imax));
+ setIntValue(retval, getrand(&st->random_state, imin, imax));
}
else /* gaussian & exponential */
{
@@ -2231,7 +2258,8 @@ evalStandardFunc(TState *thread, CState *st,
}
setIntValue(retval,
- getGaussianRand(thread, imin, imax, param));
+ getGaussianRand(&st->random_state, imin,
+ imax, param));
}
else if (func == PGBENCH_RANDOM_ZIPFIAN)
{
@@ -2243,7 +2271,9 @@ evalStandardFunc(TState *thread, CState *st,
return false;
}
setIntValue(retval,
- getZipfianRand(thread, imin, imax, param));
+ getZipfianRand(&thread->zipf_cache,
+ &st->random_state, imin,
+ imax, param));
}
else /* exponential */
{
@@ -2256,7 +2286,8 @@ evalStandardFunc(TState *thread, CState *st,
}
setIntValue(retval,
- getExponentialRand(thread, imin, imax, param));
+ getExponentialRand(&st->random_state, imin,
+ imax, param));
}
}
@@ -2551,7 +2582,7 @@ chooseScript(TState *thread)
if (num_scripts == 1)
return 0;
- w = getrand(thread, 0, total_weight - 1);
+ w = getrand(&thread->choose_script_rs, 0, total_weight - 1);
do
{
w -= sql_script[i++].weight;
@@ -2745,7 +2776,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* away.
*/
Assert(throttle_delay > 0);
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(&thread->throttling_rs, throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
@@ -2779,7 +2810,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
{
processXactStats(thread, st, &now, true, agg);
/* next rendez-vous */
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(&thread->throttling_rs,
+ throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
}
@@ -3322,7 +3354,7 @@ doLog(TState *thread, CState *st,
* to the random sample.
*/
if (sample_rate != 0.0 &&
- pg_erand48(thread->random_state) > sample_rate)
+ pg_erand48(thread->sampling_rs.xseed) > sample_rate)
return;
/* should we aggregate the results or not? */
@@ -4750,6 +4782,17 @@ set_random_seed(const char *seed)
return true;
}
+/*
+ * Initialize the random state of the client/thread.
+ */
+static void
+initRandomState(RandomState *random_state)
+{
+ random_state->xseed[0] = random();
+ random_state->xseed[1] = random();
+ random_state->xseed[2] = random();
+}
+
int
main(int argc, char **argv)
@@ -5358,6 +5401,7 @@ main(int argc, char **argv)
for (i = 0; i < nclients; i++)
{
state[i].cstack = conditional_stack_create();
+ initRandomState(&state[i].random_state);
}
if (debug)
@@ -5491,9 +5535,9 @@ main(int argc, char **argv)
thread->state = &state[nclients_dealt];
thread->nstate =
(nclients - nclients_dealt + nthreads - i - 1) / (nthreads - i);
- thread->random_state[0] = random();
- thread->random_state[1] = random();
- thread->random_state[2] = random();
+ initRandomState(&thread->choose_script_rs);
+ initRandomState(&thread->throttling_rs);
+ initRandomState(&thread->sampling_rs);
thread->logfile = NULL; /* filled in later */
thread->latency_late = 0;
thread->zipf_cache.nb_cells = 0;
--
2.7.4
v10-0002-Pgbench-errors-use-a-separate-function-to-report.patch
From 29c1740fd3ad0921c6460e57d6745ea700ef6399 Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Tue, 7 Aug 2018 13:28:35 +0300
Subject: [PATCH v10 2/4] Pgbench errors: use a separate function to report a
debug/log/error message
This is most important when it is used to report client failures that do not
cause an aborts and this depends on the level of debugging.
Rename the already used function pgbench_error() to pgbench_simple_error() for
flex lexer errors. Also export the function appendPQExpBufferVA from libpq.
---
src/bin/pgbench/pgbench.c | 846 +++++++++++++++++++++++--------------
src/interfaces/libpq/exports.txt | 1 +
src/interfaces/libpq/pqexpbuffer.c | 4 +-
src/interfaces/libpq/pqexpbuffer.h | 8 +
4 files changed, 530 insertions(+), 329 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 988e37b..c45cd44 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -484,8 +484,6 @@ static int num_scripts; /* number of scripts in sql_script[] */
static int num_commands = 0; /* total number of Command structs */
static int64 total_weight = 0;
-static int debug = 0; /* debug flag */
-
/* Builtin test scripts */
typedef struct BuiltinScript
{
@@ -532,6 +530,31 @@ static const BuiltinScript builtin_script[] =
}
};
+typedef enum ErrorLevel
+{
+ /*
+ * To report throttling, executed/sent/received commands etc.
+ */
+ DEBUG,
+
+ /*
+ * To report:
+ * - abortion of the client (something bad e.g. the SQL/meta command failed
+ * or the connection with the backend was lost);
+ * - the log messages of the main program;
+ * - PGBENCH_DEBUG messages.
+ */
+ LOG,
+
+ /*
+ * To report the error messages of the main program and immediately call
+ * exit(1).
+ */
+ ERROR
+} ErrorLevel;
+
+static ErrorLevel log_min_messages = LOG; /* no debug by default */
+
/* Function prototypes */
static void setNullValue(PgBenchValue *pv);
@@ -543,17 +566,19 @@ static void doLog(TState *thread, CState *st,
StatsData *agg, bool skipped, double latency, double lag);
static void processXactStats(TState *thread, CState *st, instr_time *now,
bool skipped, StatsData *agg);
-static void pgbench_error(const char *fmt,...) pg_attribute_printf(1, 2);
+static void pgbench_simple_error(const char *fmt,...) pg_attribute_printf(1, 2);
static void addScript(ParsedScript script);
static void *threadRun(void *arg);
static void setalarm(int seconds);
static void finishCon(CState *st);
+static void pgbench_error(ErrorLevel elevel,
+ const char *fmt,...) pg_attribute_printf(2, 3);
/* callback functions for our flex lexer */
static const PsqlScanCallbacks pgbench_callbacks = {
NULL, /* don't need get_variable functionality */
- pgbench_error
+ pgbench_simple_error
};
@@ -691,7 +716,7 @@ strtoint64(const char *str)
/* require at least one digit */
if (!isdigit((unsigned char) *ptr))
- fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+ pgbench_error(LOG, "invalid input syntax for integer: \"%s\"\n", str);
/* process digits */
while (*ptr && isdigit((unsigned char) *ptr))
@@ -699,7 +724,10 @@ strtoint64(const char *str)
int64 tmp = result * 10 + (*ptr++ - '0');
if ((tmp / 10) != result) /* overflow? */
- fprintf(stderr, "value \"%s\" is out of range for type bigint\n", str);
+ {
+ pgbench_error(LOG, "value \"%s\" is out of range for type bigint\n",
+ str);
+ }
result = tmp;
}
@@ -710,7 +738,7 @@ gotdigits:
ptr++;
if (*ptr != '\0')
- fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+ pgbench_error(LOG, "invalid input syntax for integer: \"%s\"\n", str);
return ((sign < 0) ? -result : result);
}
@@ -1098,8 +1126,10 @@ executeStatement(PGconn *con, const char *sql)
res = PQexec(con, sql);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
{
- fprintf(stderr, "%s", PQerrorMessage(con));
- exit(1);
+ /* we are sure that the function PQerrorMessage is always called */
+ Assert(ERROR >= log_min_messages);
+
+ pgbench_error(ERROR, "%s", PQerrorMessage(con));
}
PQclear(res);
}
@@ -1113,8 +1143,11 @@ tryExecuteStatement(PGconn *con, const char *sql)
res = PQexec(con, sql);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
{
- fprintf(stderr, "%s", PQerrorMessage(con));
- fprintf(stderr, "(ignoring this error and continuing anyway)\n");
+ /* we are sure that the function PQerrorMessage is always called */
+ Assert(LOG >= log_min_messages);
+
+ pgbench_error(LOG, "%s(ignoring this error and continuing anyway)\n",
+ PQerrorMessage(con));
}
PQclear(res);
}
@@ -1160,8 +1193,8 @@ doConnect(void)
if (!conn)
{
- fprintf(stderr, "connection to database \"%s\" failed\n",
- dbName);
+ pgbench_error(LOG, "connection to database \"%s\" failed\n",
+ dbName);
return NULL;
}
@@ -1179,8 +1212,11 @@ doConnect(void)
/* check to see that the backend connection was successfully made */
if (PQstatus(conn) == CONNECTION_BAD)
{
- fprintf(stderr, "connection to database \"%s\" failed:\n%s",
- dbName, PQerrorMessage(conn));
+ /* we are sure that the function PQerrorMessage is always called */
+ Assert(LOG >= log_min_messages);
+
+ pgbench_error(LOG, "connection to database \"%s\" failed:\n%s",
+ dbName, PQerrorMessage(conn));
PQfinish(conn);
return NULL;
}
@@ -1318,9 +1354,9 @@ makeVariableValue(Variable *var)
if (sscanf(var->svalue, "%lf%c", &dv, &xs) != 1)
{
- fprintf(stderr,
- "malformed variable \"%s\" value: \"%s\"\n",
- var->name, var->svalue);
+ pgbench_error(LOG,
+ "malformed variable \"%s\" value: \"%s\"\n",
+ var->name, var->svalue);
return false;
}
setDoubleValue(&var->value, dv);
@@ -1365,10 +1401,11 @@ valid_variable_name(const char *name)
/*
* Lookup a variable by name, creating it if need be.
* Caller is expected to assign a value to the variable.
- * Returns NULL on failure (bad name).
+ * On failure (bad name): if this is a client run returns NULL; exits the
+ * program otherwise.
*/
static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
+lookupCreateVariable(CState *st, const char *context, char *name, bool client)
{
Variable *var;
@@ -1383,8 +1420,12 @@ lookupCreateVariable(CState *st, const char *context, char *name)
*/
if (!valid_variable_name(name))
{
- fprintf(stderr, "%s: invalid variable name: \"%s\"\n",
- context, name);
+ /*
+ * About the error level used: if we process client commands, it a
+ * normal failure; otherwise it is not and we exit the program.
+ */
+ pgbench_error(client ? LOG : ERROR,
+ "%s: invalid variable name: \"%s\"\n", context, name);
return NULL;
}
@@ -1412,16 +1453,14 @@ lookupCreateVariable(CState *st, const char *context, char *name)
}
/* Assign a string value to a variable, creating it if need be */
-/* Returns false on failure (bad name) */
-static bool
+/* Exits on failure (bad name) */
+static void
putVariable(CState *st, const char *context, char *name, const char *value)
{
Variable *var;
char *val;
- var = lookupCreateVariable(st, context, name);
- if (!var)
- return false;
+ var = lookupCreateVariable(st, context, name, false);
/* dup then free, in case value is pointing at this variable */
val = pg_strdup(value);
@@ -1430,19 +1469,20 @@ putVariable(CState *st, const char *context, char *name, const char *value)
free(var->svalue);
var->svalue = val;
var->value.type = PGBT_NO_VALUE;
-
- return true;
}
-/* Assign a value to a variable, creating it if need be */
-/* Returns false on failure (bad name) */
+/*
+ * Assign a value to a variable, creating it if need be.
+ * On failure (bad name): if this is a client run returns false; exits the
+ * program otherwise.
+ */
static bool
putVariableValue(CState *st, const char *context, char *name,
- const PgBenchValue *value)
+ const PgBenchValue *value, bool client)
{
Variable *var;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(st, context, name, client);
if (!var)
return false;
@@ -1454,15 +1494,19 @@ putVariableValue(CState *st, const char *context, char *name,
return true;
}
-/* Assign an integer value to a variable, creating it if need be */
-/* Returns false on failure (bad name) */
+/*
+ * Assign an integer value to a variable, creating it if need be.
+ * On failure (bad name): if this is a client run returns false; exits the
+ * program otherwise.
+ */
static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putVariableInt(CState *st, const char *context, char *name, int64 value,
+ bool client)
{
PgBenchValue val;
setIntValue(&val, value);
- return putVariableValue(st, context, name, &val);
+ return putVariableValue(st, context, name, &val, client);
}
/*
@@ -1593,7 +1637,11 @@ coerceToBool(PgBenchValue *pval, bool *bval)
}
else /* NULL, INT or DOUBLE */
{
- fprintf(stderr, "cannot coerce %s to boolean\n", valueTypeName(pval));
+ /* we are sure that the function valueTypeName only is always called */
+ Assert(LOG >= log_min_messages);
+
+ pgbench_error(LOG, "cannot coerce %s to boolean\n",
+ valueTypeName(pval));
*bval = false; /* suppress uninitialized-variable warnings */
return false;
}
@@ -1638,7 +1686,7 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
if (dval < PG_INT64_MIN || PG_INT64_MAX < dval)
{
- fprintf(stderr, "double to int overflow for %f\n", dval);
+ pgbench_error(LOG, "double to int overflow for %f\n", dval);
return false;
}
*ival = (int64) dval;
@@ -1646,7 +1694,10 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
}
else /* BOOLEAN or NULL */
{
- fprintf(stderr, "cannot coerce %s to int\n", valueTypeName(pval));
+ /* we are sure that the function valueTypeName is always called */
+ Assert(LOG >= log_min_messages);
+
+ pgbench_error(LOG, "cannot coerce %s to int\n", valueTypeName(pval));
return false;
}
}
@@ -1667,7 +1718,10 @@ coerceToDouble(PgBenchValue *pval, double *dval)
}
else /* BOOLEAN or NULL */
{
- fprintf(stderr, "cannot coerce %s to double\n", valueTypeName(pval));
+ /* we are sure that the function valueTypeName is always called */
+ Assert(LOG >= log_min_messages);
+
+ pgbench_error(LOG, "cannot coerce %s to double\n", valueTypeName(pval));
return false;
}
}
@@ -1848,8 +1902,8 @@ evalStandardFunc(TState *thread, CState *st,
if (l != NULL)
{
- fprintf(stderr,
- "too many function arguments, maximum is %d\n", MAX_FARGS);
+ pgbench_error(LOG, "too many function arguments, maximum is %d\n",
+ MAX_FARGS);
return false;
}
@@ -1972,7 +2026,7 @@ evalStandardFunc(TState *thread, CState *st,
case PGBENCH_MOD:
if (ri == 0)
{
- fprintf(stderr, "division by zero\n");
+ pgbench_error(LOG, "division by zero\n");
return false;
}
/* special handling of -1 divisor */
@@ -1983,7 +2037,8 @@ evalStandardFunc(TState *thread, CState *st,
/* overflow check (needed for INT64_MIN) */
if (li == PG_INT64_MIN)
{
- fprintf(stderr, "bigint out of range\n");
+ pgbench_error(LOG,
+ "bigint out of range\n");
return false;
}
else
@@ -2084,22 +2139,48 @@ evalStandardFunc(TState *thread, CState *st,
case PGBENCH_DEBUG:
{
PgBenchValue *varg = &vargs[0];
+ PQExpBufferData errmsg_buf;
Assert(nargs == 1);
- fprintf(stderr, "debug(script=%d,command=%d): ",
- st->use_file, st->command + 1);
+ /*
+ * We are sure that the allocated memory for the message is
+ * always used.
+ */
+ Assert(LOG >= log_min_messages);
+
+ initPQExpBuffer(&errmsg_buf);
+ printfPQExpBuffer(&errmsg_buf,
+ "debug(script=%d,command=%d): ",
+ st->use_file, st->command + 1);
if (varg->type == PGBT_NULL)
- fprintf(stderr, "null\n");
+ {
+ appendPQExpBufferStr(&errmsg_buf, "null\n");
+ }
else if (varg->type == PGBT_BOOLEAN)
- fprintf(stderr, "boolean %s\n", varg->u.bval ? "true" : "false");
+ {
+ appendPQExpBuffer(&errmsg_buf,
+ "boolean %s\n",
+ varg->u.bval ? "true" : "false");
+ }
else if (varg->type == PGBT_INT)
- fprintf(stderr, "int " INT64_FORMAT "\n", varg->u.ival);
+ {
+ appendPQExpBuffer(&errmsg_buf,
+ "int " INT64_FORMAT "\n", varg->u.ival);
+ }
else if (varg->type == PGBT_DOUBLE)
- fprintf(stderr, "double %.*g\n", DBL_DIG, varg->u.dval);
+ {
+ appendPQExpBuffer(&errmsg_buf,
+ "double %.*g\n", DBL_DIG, varg->u.dval);
+ }
else /* internal error, unexpected type */
+ {
Assert(0);
+ }
+
+ pgbench_error(LOG, "%s", errmsg_buf.data);
+ termPQExpBuffer(&errmsg_buf);
*retval = *varg;
@@ -2223,13 +2304,13 @@ evalStandardFunc(TState *thread, CState *st,
/* check random range */
if (imin > imax)
{
- fprintf(stderr, "empty range given to random\n");
+ pgbench_error(LOG, "empty range given to random\n");
return false;
}
else if (imax - imin < 0 || (imax - imin) + 1 < 0)
{
/* prevent int overflows in random functions */
- fprintf(stderr, "random range is too large\n");
+ pgbench_error(LOG, "random range is too large\n");
return false;
}
@@ -2251,9 +2332,9 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param < MIN_GAUSSIAN_PARAM)
{
- fprintf(stderr,
- "gaussian parameter must be at least %f "
- "(not %f)\n", MIN_GAUSSIAN_PARAM, param);
+ pgbench_error(LOG,
+ "gaussian parameter must be at least %f (not %f)\n",
+ MIN_GAUSSIAN_PARAM, param);
return false;
}
@@ -2265,9 +2346,9 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param <= 0.0 || param == 1.0 || param > MAX_ZIPFIAN_PARAM)
{
- fprintf(stderr,
- "zipfian parameter must be in range (0, 1) U (1, %d]"
- " (got %f)\n", MAX_ZIPFIAN_PARAM, param);
+ pgbench_error(LOG,
+ "zipfian parameter must be in range (0, 1) U (1, %d] (got %f)\n",
+ MAX_ZIPFIAN_PARAM, param);
return false;
}
setIntValue(retval,
@@ -2279,9 +2360,9 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param <= 0.0)
{
- fprintf(stderr,
- "exponential parameter must be greater than zero"
- " (got %f)\n", param);
+ pgbench_error(LOG,
+ "exponential parameter must be greater than zero (got %f)\n",
+ param);
return false;
}
@@ -2392,8 +2473,8 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
{
- fprintf(stderr, "undefined variable \"%s\"\n",
- expr->u.variable.varname);
+ pgbench_error(LOG, "undefined variable \"%s\"\n",
+ expr->u.variable.varname);
return false;
}
@@ -2411,10 +2492,15 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
retval);
default:
- /* internal error which should never occur */
- fprintf(stderr, "unexpected enode type in evaluation: %d\n",
- expr->etype);
- exit(1);
+ {
+ /* internal error which should never occur */
+ pgbench_error(ERROR,
+ "unexpected enode type in evaluation: %d\n",
+ expr->etype);
+
+ /* keep compiler quiet */
+ return false;
+ }
}
}
@@ -2487,15 +2573,15 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
}
else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
{
- fprintf(stderr, "%s: undefined variable \"%s\"\n",
- argv[0], argv[i]);
+ pgbench_error(LOG, "%s: undefined variable \"%s\"\n",
+ argv[0], argv[i]);
return false;
}
arglen = strlen(arg);
if (len + arglen + (i > 0 ? 1 : 0) >= SHELL_COMMAND_SIZE - 1)
{
- fprintf(stderr, "%s: shell command is too long\n", argv[0]);
+ pgbench_error(LOG, "%s: shell command is too long\n", argv[0]);
return false;
}
@@ -2513,7 +2599,10 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
if (system(command))
{
if (!timer_exceeded)
- fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
+ {
+ pgbench_error(LOG, "%s: could not launch shell command\n",
+ argv[0]);
+ }
return false;
}
return true;
@@ -2522,19 +2611,22 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
/* Execute the command with pipe and read the standard output. */
if ((fp = popen(command, "r")) == NULL)
{
- fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
+ pgbench_error(LOG, "%s: could not launch shell command\n", argv[0]);
return false;
}
if (fgets(res, sizeof(res), fp) == NULL)
{
if (!timer_exceeded)
- fprintf(stderr, "%s: could not read result of shell command\n", argv[0]);
+ {
+ pgbench_error(LOG, "%s: could not read result of shell command\n",
+ argv[0]);
+ }
(void) pclose(fp);
return false;
}
if (pclose(fp) < 0)
{
- fprintf(stderr, "%s: could not close shell command\n", argv[0]);
+ pgbench_error(LOG, "%s: could not close shell command\n", argv[0]);
return false;
}
@@ -2544,11 +2636,12 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
endptr++;
if (*res == '\0' || *endptr != '\0')
{
- fprintf(stderr, "%s: shell command must return an integer (not \"%s\")\n",
- argv[0], res);
+ pgbench_error(LOG,
+ "%s: shell command must return an integer (not \"%s\")\n",
+ argv[0], res);
return false;
}
- if (!putVariableInt(st, "setshell", variable, retval))
+ if (!putVariableInt(st, "setshell", variable, retval, true))
return false;
#ifdef DEBUG
@@ -2567,9 +2660,9 @@ preparedStatementName(char *buffer, int file, int state)
static void
commandFailed(CState *st, const char *cmd, const char *message)
{
- fprintf(stderr,
- "client %d aborted in command %d (%s) of script %d; %s\n",
- st->id, st->command, cmd, st->use_file, message);
+ pgbench_error(LOG,
+ "client %d aborted in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file, message);
}
/* return a script number with a weighted choice. */
@@ -2604,8 +2697,7 @@ sendCommand(CState *st, Command *command)
sql = pg_strdup(command->argv[0]);
sql = assignVariables(st, sql);
- if (debug)
- fprintf(stderr, "client %d sending %s\n", st->id, sql);
+ pgbench_error(DEBUG, "client %d sending %s\n", st->id, sql);
r = PQsendQuery(st->con, sql);
free(sql);
}
@@ -2616,8 +2708,7 @@ sendCommand(CState *st, Command *command)
getQueryParams(st, command, params);
- if (debug)
- fprintf(stderr, "client %d sending %s\n", st->id, sql);
+ pgbench_error(DEBUG, "client %d sending %s\n", st->id, sql);
r = PQsendQueryParams(st->con, sql, command->argc - 1,
NULL, params, NULL, NULL, 0);
}
@@ -2642,7 +2733,15 @@ sendCommand(CState *st, Command *command)
res = PQprepare(st->con, name,
commands[j]->argv[0], commands[j]->argc - 1, NULL);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
- fprintf(stderr, "%s", PQerrorMessage(st->con));
+ {
+ /*
+ * We are sure that the result of PQerrorMessage() is always
+ * used.
+ */
+ Assert(LOG >= log_min_messages);
+
+ pgbench_error(LOG, "%s", PQerrorMessage(st->con));
+ }
PQclear(res);
}
st->prepared[st->use_file] = true;
@@ -2651,8 +2750,7 @@ sendCommand(CState *st, Command *command)
getQueryParams(st, command, params);
preparedStatementName(name, st->use_file, st->command);
- if (debug)
- fprintf(stderr, "client %d sending %s\n", st->id, name);
+ pgbench_error(DEBUG, "client %d sending %s\n", st->id, name);
r = PQsendQueryPrepared(st->con, name, command->argc - 1,
params, NULL, NULL, 0);
}
@@ -2661,9 +2759,8 @@ sendCommand(CState *st, Command *command)
if (r == 0)
{
- if (debug)
- fprintf(stderr, "client %d could not send %s\n",
- st->id, command->argv[0]);
+ pgbench_error(DEBUG, "client %d could not send %s\n",
+ st->id, command->argv[0]);
st->ecnt++;
return false;
}
@@ -2685,8 +2782,8 @@ evaluateSleep(CState *st, int argc, char **argv, int *usecs)
{
if ((var = getVariable(st, argv[1] + 1)) == NULL)
{
- fprintf(stderr, "%s: undefined variable \"%s\"\n",
- argv[0], argv[1]);
+ pgbench_error(LOG, "%s: undefined variable \"%s\"\n",
+ argv[0], argv[1]);
return false;
}
usec = atoi(var);
@@ -2749,9 +2846,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->use_file = chooseScript(thread);
- if (debug)
- fprintf(stderr, "client %d executing script \"%s\"\n", st->id,
- sql_script[st->use_file].desc);
+ pgbench_error(DEBUG, "client %d executing script \"%s\"\n",
+ st->id, sql_script[st->use_file].desc);
if (throttle_delay > 0)
st->state = CSTATE_START_THROTTLE;
@@ -2824,9 +2920,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
st->state = CSTATE_THROTTLE;
- if (debug)
- fprintf(stderr, "client %d throttling " INT64_FORMAT " us\n",
- st->id, wait);
+ pgbench_error(DEBUG,
+ "client %d throttling " INT64_FORMAT " us\n",
+ st->id, wait);
break;
/*
@@ -2858,8 +2954,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
start = now;
if ((st->con = doConnect()) == NULL)
{
- fprintf(stderr, "client %d aborted while establishing connection\n",
- st->id);
+ pgbench_error(LOG,
+ "client %d aborted while establishing connection\n",
+ st->id);
st->state = CSTATE_ABORTED;
break;
}
@@ -2937,12 +3034,19 @@ doCustom(TState *thread, CState *st, StatsData *agg)
i;
char **argv = command->argv;
- if (debug)
+ /* allocate memory for the message only if necessary */
+ if (DEBUG >= log_min_messages)
{
- fprintf(stderr, "client %d executing \\%s", st->id, argv[0]);
+ PQExpBufferData errmsg_buf;
+
+ initPQExpBuffer(&errmsg_buf);
+ printfPQExpBuffer(&errmsg_buf, "client %d executing \\%s",
+ st->id, argv[0]);
for (i = 1; i < argc; i++)
- fprintf(stderr, " %s", argv[i]);
- fprintf(stderr, "\n");
+ appendPQExpBuffer(&errmsg_buf, " %s", argv[i]);
+ appendPQExpBufferChar(&errmsg_buf, '\n');
+ pgbench_error(DEBUG, "%s", errmsg_buf.data);
+ termPQExpBuffer(&errmsg_buf);
}
if (command->meta == META_SLEEP)
@@ -2997,7 +3101,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (command->meta == META_SET)
{
- if (!putVariableValue(st, argv[0], argv[1], &result))
+ if (!putVariableValue(st, argv[0], argv[1], &result,
+ true))
{
commandFailed(st, "set", "assignment of meta-command failed");
st->state = CSTATE_ABORTED;
@@ -3197,8 +3302,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
case CSTATE_WAIT_RESULT:
command = sql_script[st->use_file].commands[st->command];
- if (debug)
- fprintf(stderr, "client %d receiving\n", st->id);
+ pgbench_error(DEBUG, "client %d receiving\n", st->id);
if (!PQconsumeInput(st->con))
{ /* there's something wrong */
commandFailed(st, "SQL", "perhaps the backend died while processing");
@@ -3284,8 +3388,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
/* conditional stack must be empty */
if (!conditional_stack_empty(st->cstack))
{
- fprintf(stderr, "end of script reached within a conditional, missing \\endif\n");
- exit(1);
+ pgbench_error(ERROR,
+ "end of script reached within a conditional, missing \\endif\n");
}
if (is_connect)
@@ -3484,7 +3588,7 @@ disconnect_all(CState *state, int length)
static void
initDropTables(PGconn *con)
{
- fprintf(stderr, "dropping old tables...\n");
+ pgbench_error(LOG, "dropping old tables...\n");
/*
* We drop all the tables in one command, so that whether there are
@@ -3559,7 +3663,7 @@ initCreateTables(PGconn *con)
};
int i;
- fprintf(stderr, "creating tables...\n");
+ pgbench_error(LOG, "creating tables...\n");
for (i = 0; i < lengthof(DDLs); i++)
{
@@ -3612,7 +3716,7 @@ initGenerateData(PGconn *con)
remaining_sec;
int log_interval = 1;
- fprintf(stderr, "generating data...\n");
+ pgbench_error(LOG, "generating data...\n");
/*
* we do all of this in one transaction to enable the backend's
@@ -3658,8 +3762,10 @@ initGenerateData(PGconn *con)
res = PQexec(con, "copy pgbench_accounts from stdin");
if (PQresultStatus(res) != PGRES_COPY_IN)
{
- fprintf(stderr, "%s", PQerrorMessage(con));
- exit(1);
+ /* we are sure that the result of PQerrorMessage() is always used */
+ Assert(ERROR >= log_min_messages);
+
+ pgbench_error(ERROR, "%s", PQerrorMessage(con));
}
PQclear(res);
@@ -3674,10 +3780,7 @@ initGenerateData(PGconn *con)
INT64_FORMAT "\t" INT64_FORMAT "\t%d\t\n",
j, k / naccounts + 1, 0);
if (PQputline(con, sql))
- {
- fprintf(stderr, "PQputline failed\n");
- exit(1);
- }
+ pgbench_error(ERROR, "PQputline failed\n");
/*
* If we want to stick with the original logging, print a message each
@@ -3691,10 +3794,12 @@ initGenerateData(PGconn *con)
elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
remaining_sec = ((double) scale * naccounts - j) * elapsed_sec / j;
- fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n",
- j, (int64) naccounts * scale,
- (int) (((int64) j * 100) / (naccounts * (int64) scale)),
- elapsed_sec, remaining_sec);
+ pgbench_error(LOG,
+ INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n",
+ j, (int64) naccounts * scale,
+ (int) (((int64) j * 100) /
+ (naccounts * (int64) scale)),
+ elapsed_sec, remaining_sec);
}
/* let's not call the timing for each row, but only each 100 rows */
else if (use_quiet && (j % 100 == 0))
@@ -3708,9 +3813,12 @@ initGenerateData(PGconn *con)
/* have we reached the next interval (or end)? */
if ((j == scale * naccounts) || (elapsed_sec >= log_interval * LOG_STEP_SECONDS))
{
- fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n",
- j, (int64) naccounts * scale,
- (int) (((int64) j * 100) / (naccounts * (int64) scale)), elapsed_sec, remaining_sec);
+ pgbench_error(LOG,
+ INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n",
+ j, (int64) naccounts * scale,
+ (int) (((int64) j * 100) /
+ (naccounts * (int64) scale)),
+ elapsed_sec, remaining_sec);
/* skip to the next interval */
log_interval = (int) ceil(elapsed_sec / LOG_STEP_SECONDS);
@@ -3719,15 +3827,9 @@ initGenerateData(PGconn *con)
}
if (PQputline(con, "\\.\n"))
- {
- fprintf(stderr, "very last PQputline failed\n");
- exit(1);
- }
+ pgbench_error(ERROR, "very last PQputline failed\n");
if (PQendcopy(con))
- {
- fprintf(stderr, "PQendcopy failed\n");
- exit(1);
- }
+ pgbench_error(ERROR, "PQendcopy failed\n");
executeStatement(con, "commit");
}
@@ -3738,7 +3840,7 @@ initGenerateData(PGconn *con)
static void
initVacuum(PGconn *con)
{
- fprintf(stderr, "vacuuming...\n");
+ pgbench_error(LOG, "vacuuming...\n");
executeStatement(con, "vacuum analyze pgbench_branches");
executeStatement(con, "vacuum analyze pgbench_tellers");
executeStatement(con, "vacuum analyze pgbench_accounts");
@@ -3758,7 +3860,7 @@ initCreatePKeys(PGconn *con)
};
int i;
- fprintf(stderr, "creating primary keys...\n");
+ pgbench_error(LOG, "creating primary keys...\n");
for (i = 0; i < lengthof(DDLINDEXes); i++)
{
char buffer[256];
@@ -3795,7 +3897,7 @@ initCreateFKeys(PGconn *con)
};
int i;
- fprintf(stderr, "creating foreign keys...\n");
+ pgbench_error(LOG, "creating foreign keys...\n");
for (i = 0; i < lengthof(DDLKEYs); i++)
{
executeStatement(con, DDLKEYs[i]);
@@ -3815,19 +3917,16 @@ checkInitSteps(const char *initialize_steps)
const char *step;
if (initialize_steps[0] == '\0')
- {
- fprintf(stderr, "no initialization steps specified\n");
- exit(1);
- }
+ pgbench_error(ERROR, "no initialization steps specified\n");
for (step = initialize_steps; *step != '\0'; step++)
{
if (strchr("dtgvpf ", *step) == NULL)
{
- fprintf(stderr, "unrecognized initialization step \"%c\"\n",
- *step);
- fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n");
- exit(1);
+ pgbench_error(ERROR,
+ "unrecognized initialization step \"%c\"\n"
+ "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n",
+ *step);
}
}
}
@@ -3869,14 +3968,14 @@ runInitSteps(const char *initialize_steps)
case ' ':
break; /* ignore */
default:
- fprintf(stderr, "unrecognized initialization step \"%c\"\n",
- *step);
+ pgbench_error(LOG, "unrecognized initialization step \"%c\"\n",
+ *step);
PQfinish(con);
exit(1);
}
}
- fprintf(stderr, "done.\n");
+ pgbench_error(LOG, "done.\n");
PQfinish(con);
}
@@ -3914,8 +4013,9 @@ parseQuery(Command *cmd)
if (cmd->argc >= MAX_ARGS)
{
- fprintf(stderr, "statement has too many arguments (maximum is %d): %s\n",
- MAX_ARGS - 1, cmd->argv[0]);
+ pgbench_error(LOG,
+ "statement has too many arguments (maximum is %d): %s\n",
+ MAX_ARGS - 1, cmd->argv[0]);
pg_free(name);
return false;
}
@@ -3936,14 +4036,28 @@ parseQuery(Command *cmd)
* Simple error-printing function, might be needed by lexer
*/
static void
-pgbench_error(const char *fmt,...)
+pgbench_simple_error(const char *fmt,...)
{
va_list ap;
+ PQExpBufferData errmsg_buf;
+ bool done;
+
+ /* We are sure that the allocated memory for the message is always used. */
+ Assert(LOG >= log_min_messages);
fflush(stdout);
- va_start(ap, fmt);
- vfprintf(stderr, _(fmt), ap);
- va_end(ap);
+ initPQExpBuffer(&errmsg_buf);
+
+ /* Loop in case we have to retry after enlarging the buffer. */
+ do
+ {
+ va_start(ap, fmt);
+ done = appendPQExpBufferVA(&errmsg_buf, fmt, ap);
+ va_end(ap);
+ } while (!done);
+
+ pgbench_error(LOG, "%s", errmsg_buf.data);
+ termPQExpBuffer(&errmsg_buf);
}
/*
@@ -3963,26 +4077,35 @@ syntax_error(const char *source, int lineno,
const char *line, const char *command,
const char *msg, const char *more, int column)
{
- fprintf(stderr, "%s:%d: %s", source, lineno, msg);
+ PQExpBufferData errmsg_buf;
+
+ /* we are sure that the allocated memory for the message is always used */
+ Assert(LOG >= log_min_messages);
+
+ initPQExpBuffer(&errmsg_buf);
+ printfPQExpBuffer(&errmsg_buf, "%s:%d: %s", source, lineno, msg);
if (more != NULL)
- fprintf(stderr, " (%s)", more);
+ appendPQExpBuffer(&errmsg_buf, " (%s)", more);
if (column >= 0 && line == NULL)
- fprintf(stderr, " at column %d", column + 1);
+ appendPQExpBuffer(&errmsg_buf, " at column %d", column + 1);
if (command != NULL)
- fprintf(stderr, " in command \"%s\"", command);
- fprintf(stderr, "\n");
+ appendPQExpBuffer(&errmsg_buf, " in command \"%s\"", command);
+ appendPQExpBufferChar(&errmsg_buf, '\n');
if (line != NULL)
{
- fprintf(stderr, "%s\n", line);
+ appendPQExpBuffer(&errmsg_buf, "%s\n", line);
if (column >= 0)
{
int i;
for (i = 0; i < column; i++)
- fprintf(stderr, " ");
- fprintf(stderr, "^ error found here\n");
+ appendPQExpBufferChar(&errmsg_buf, ' ');
+ appendPQExpBufferStr(&errmsg_buf, "^ error found here\n");
}
}
+
+ pgbench_error(LOG, "%s", errmsg_buf.data);
+ termPQExpBuffer(&errmsg_buf);
exit(1);
}
@@ -4232,10 +4355,8 @@ process_backslash_command(PsqlScanState sstate, const char *source)
static void
ConditionError(const char *desc, int cmdn, const char *msg)
{
- fprintf(stderr,
- "condition error in script \"%s\" command %d: %s\n",
- desc, cmdn, msg);
- exit(1);
+ pgbench_error(ERROR, "condition error in script \"%s\" command %d: %s\n",
+ desc, cmdn, msg);
}
/*
@@ -4434,18 +4555,22 @@ process_file(const char *filename, int weight)
fd = stdin;
else if ((fd = fopen(filename, "r")) == NULL)
{
- fprintf(stderr, "could not open file \"%s\": %s\n",
- filename, strerror(errno));
- exit(1);
+ /* we are sure that the result of strerror() is always used */
+ Assert(ERROR >= log_min_messages);
+
+ pgbench_error(ERROR, "could not open file \"%s\": %s\n",
+ filename, strerror(errno));
}
buf = read_file_contents(fd);
if (ferror(fd))
{
- fprintf(stderr, "could not read file \"%s\": %s\n",
- filename, strerror(errno));
- exit(1);
+ /* we are sure that the result of strerror() is always used */
+ Assert(ERROR >= log_min_messages);
+
+ pgbench_error(ERROR, "could not read file \"%s\": %s\n",
+ filename, strerror(errno));
}
if (fd != stdin)
@@ -4468,11 +4593,19 @@ static void
listAvailableScripts(void)
{
int i;
+ PQExpBufferData errmsg_buf;
- fprintf(stderr, "Available builtin scripts:\n");
+ /* we are sure that the allocated memory for the message is always used */
+ Assert(LOG >= log_min_messages);
+
+ initPQExpBuffer(&errmsg_buf);
+ printfPQExpBuffer(&errmsg_buf, "Available builtin scripts:\n");
for (i = 0; i < lengthof(builtin_script); i++)
- fprintf(stderr, "\t%s\n", builtin_script[i].name);
- fprintf(stderr, "\n");
+ appendPQExpBuffer(&errmsg_buf, "\t%s\n", builtin_script[i].name);
+ appendPQExpBufferChar(&errmsg_buf, '\n');
+
+ pgbench_error(LOG, "%s", errmsg_buf.data);
+ termPQExpBuffer(&errmsg_buf);
}
/* return builtin script "name" if unambiguous, fails if not found */
@@ -4499,10 +4632,11 @@ findBuiltin(const char *name)
/* error cases */
if (found == 0)
- fprintf(stderr, "no builtin script found for name \"%s\"\n", name);
+ pgbench_error(LOG, "no builtin script found for name \"%s\"\n", name);
else /* found > 1 */
- fprintf(stderr,
- "ambiguous builtin name: %d builtin scripts found for prefix \"%s\"\n", found, name);
+ pgbench_error(LOG,
+ "ambiguous builtin name: %d builtin scripts found for prefix \"%s\"\n",
+ found, name);
listAvailableScripts();
exit(1);
@@ -4534,16 +4668,12 @@ parseScriptWeight(const char *option, char **script)
errno = 0;
wtmp = strtol(sep + 1, &badp, 10);
if (errno != 0 || badp == sep + 1 || *badp != '\0')
- {
- fprintf(stderr, "invalid weight specification: %s\n", sep);
- exit(1);
- }
+ pgbench_error(ERROR, "invalid weight specification: %s\n", sep);
if (wtmp > INT_MAX || wtmp < 0)
{
- fprintf(stderr,
- "weight specification out of range (0 .. %u): " INT64_FORMAT "\n",
- INT_MAX, (int64) wtmp);
- exit(1);
+ pgbench_error(ERROR,
+ "weight specification out of range (0 .. %u): " INT64_FORMAT "\n",
+ INT_MAX, (int64) wtmp);
}
weight = wtmp;
}
@@ -4562,14 +4692,14 @@ addScript(ParsedScript script)
{
if (script.commands == NULL || script.commands[0] == NULL)
{
- fprintf(stderr, "empty command list for script \"%s\"\n", script.desc);
- exit(1);
+ pgbench_error(ERROR, "empty command list for script \"%s\"\n",
+ script.desc);
}
if (num_scripts >= MAX_SCRIPTS)
{
- fprintf(stderr, "at most %d SQL scripts are allowed\n", MAX_SCRIPTS);
- exit(1);
+ pgbench_error(ERROR, "at most %d SQL scripts are allowed\n",
+ MAX_SCRIPTS);
}
CheckConditional(script);
@@ -4754,9 +4884,8 @@ set_random_seed(const char *seed)
if (!pg_strong_random(&iseed, sizeof(iseed)))
#endif
{
- fprintf(stderr,
- "cannot seed random from a strong source, none available: "
- "use \"time\" or an unsigned integer value.\n");
+ pgbench_error(LOG,
+ "cannot seed random from a strong source, none available: use \"time\" or an unsigned integer value.\n");
return false;
}
}
@@ -4767,15 +4896,15 @@ set_random_seed(const char *seed)
if (sscanf(seed, "%u%c", &iseed, &garbage) != 1)
{
- fprintf(stderr,
- "unrecognized random seed option \"%s\": expecting an unsigned integer, \"time\" or \"rand\"\n",
- seed);
+ pgbench_error(LOG,
+ "unrecognized random seed option \"%s\": expecting an unsigned integer, \"time\" or \"rand\"\n",
+ seed);
return false;
}
}
if (seed != NULL)
- fprintf(stderr, "setting random seed to %u\n", iseed);
+ pgbench_error(LOG, "setting random seed to %u\n", iseed);
srandom(iseed);
/* no precision loss: 32 bit unsigned int cast to 64 bit int */
random_seed = iseed;
@@ -4907,8 +5036,8 @@ main(int argc, char **argv)
/* set random seed early, because it may be used while parsing scripts. */
if (!set_random_seed(getenv("PGBENCH_RANDOM_SEED")))
{
- fprintf(stderr, "error while setting random seed from PGBENCH_RANDOM_SEED environment variable\n");
- exit(1);
+ pgbench_error(ERROR,
+ "error while setting random seed from PGBENCH_RANDOM_SEED environment variable\n");
}
while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
@@ -4941,16 +5070,15 @@ main(int argc, char **argv)
pgport = pg_strdup(optarg);
break;
case 'd':
- debug++;
+ log_min_messages = DEBUG;
break;
case 'c':
benchmarking_option_set = true;
nclients = atoi(optarg);
if (nclients <= 0 || nclients > MAXCLIENTS)
{
- fprintf(stderr, "invalid number of clients: \"%s\"\n",
- optarg);
- exit(1);
+ pgbench_error(ERROR, "invalid number of clients: \"%s\"\n",
+ optarg);
}
#ifdef HAVE_GETRLIMIT
#ifdef RLIMIT_NOFILE /* most platforms use RLIMIT_NOFILE */
@@ -4959,15 +5087,20 @@ main(int argc, char **argv)
if (getrlimit(RLIMIT_OFILE, &rlim) == -1)
#endif /* RLIMIT_NOFILE */
{
- fprintf(stderr, "getrlimit failed: %s\n", strerror(errno));
- exit(1);
+ /*
+ * We are sure that the result of strerror() is always used.
+ */
+ Assert(ERROR >= log_min_messages);
+
+ pgbench_error(ERROR, "getrlimit failed: %s\n",
+ strerror(errno));
}
if (rlim.rlim_cur < nclients + 3)
{
- fprintf(stderr, "need at least %d open files, but system limit is %ld\n",
- nclients + 3, (long) rlim.rlim_cur);
- fprintf(stderr, "Reduce number of clients, or use limit/ulimit to increase the system limit.\n");
- exit(1);
+ pgbench_error(ERROR,
+ "need at least %d open files, but system limit is %ld\n"
+ "Reduce number of clients, or use limit/ulimit to increase the system limit.\n",
+ nclients + 3, (long) rlim.rlim_cur);
}
#endif /* HAVE_GETRLIMIT */
break;
@@ -4976,15 +5109,14 @@ main(int argc, char **argv)
nthreads = atoi(optarg);
if (nthreads <= 0)
{
- fprintf(stderr, "invalid number of threads: \"%s\"\n",
- optarg);
- exit(1);
+ pgbench_error(ERROR, "invalid number of threads: \"%s\"\n",
+ optarg);
}
#ifndef ENABLE_THREAD_SAFETY
if (nthreads != 1)
{
- fprintf(stderr, "threads are not supported on this platform; use -j1\n");
- exit(1);
+ pgbench_error(ERROR,
+ "threads are not supported on this platform; use -j1\n");
}
#endif /* !ENABLE_THREAD_SAFETY */
break;
@@ -5001,8 +5133,8 @@ main(int argc, char **argv)
scale = atoi(optarg);
if (scale <= 0)
{
- fprintf(stderr, "invalid scaling factor: \"%s\"\n", optarg);
- exit(1);
+ pgbench_error(ERROR, "invalid scaling factor: \"%s\"\n",
+ optarg);
}
break;
case 't':
@@ -5010,19 +5142,16 @@ main(int argc, char **argv)
nxacts = atoi(optarg);
if (nxacts <= 0)
{
- fprintf(stderr, "invalid number of transactions: \"%s\"\n",
- optarg);
- exit(1);
+ pgbench_error(ERROR,
+ "invalid number of transactions: \"%s\"\n",
+ optarg);
}
break;
case 'T':
benchmarking_option_set = true;
duration = atoi(optarg);
if (duration <= 0)
- {
- fprintf(stderr, "invalid duration: \"%s\"\n", optarg);
- exit(1);
- }
+ pgbench_error(ERROR, "invalid duration: \"%s\"\n", optarg);
break;
case 'U':
login = pg_strdup(optarg);
@@ -5069,14 +5198,13 @@ main(int argc, char **argv)
if ((p = strchr(optarg, '=')) == NULL || p == optarg || *(p + 1) == '\0')
{
- fprintf(stderr, "invalid variable definition: \"%s\"\n",
- optarg);
- exit(1);
+ pgbench_error(ERROR,
+ "invalid variable definition: \"%s\"\n",
+ optarg);
}
*p++ = '\0';
- if (!putVariable(&state[0], "option", optarg, p))
- exit(1);
+ putVariable(&state[0], "option", optarg, p);
}
break;
case 'F':
@@ -5084,8 +5212,8 @@ main(int argc, char **argv)
fillfactor = atoi(optarg);
if (fillfactor < 10 || fillfactor > 100)
{
- fprintf(stderr, "invalid fillfactor: \"%s\"\n", optarg);
- exit(1);
+ pgbench_error(ERROR, "invalid fillfactor: \"%s\"\n",
+ optarg);
}
break;
case 'M':
@@ -5095,9 +5223,8 @@ main(int argc, char **argv)
break;
if (querymode >= NUM_QUERYMODE)
{
- fprintf(stderr, "invalid query mode (-M): \"%s\"\n",
- optarg);
- exit(1);
+ pgbench_error(ERROR, "invalid query mode (-M): \"%s\"\n",
+ optarg);
}
break;
case 'P':
@@ -5105,9 +5232,9 @@ main(int argc, char **argv)
progress = atoi(optarg);
if (progress <= 0)
{
- fprintf(stderr, "invalid thread progress delay: \"%s\"\n",
- optarg);
- exit(1);
+ pgbench_error(ERROR,
+ "invalid thread progress delay: \"%s\"\n",
+ optarg);
}
break;
case 'R':
@@ -5119,8 +5246,8 @@ main(int argc, char **argv)
if (throttle_value <= 0.0)
{
- fprintf(stderr, "invalid rate limit: \"%s\"\n", optarg);
- exit(1);
+ pgbench_error(ERROR, "invalid rate limit: \"%s\"\n",
+ optarg);
}
/* Invert rate limit into a time offset */
throttle_delay = (int64) (1000000.0 / throttle_value);
@@ -5132,9 +5259,8 @@ main(int argc, char **argv)
if (limit_ms <= 0.0)
{
- fprintf(stderr, "invalid latency limit: \"%s\"\n",
- optarg);
- exit(1);
+ pgbench_error(ERROR, "invalid latency limit: \"%s\"\n",
+ optarg);
}
benchmarking_option_set = true;
latency_limit = (int64) (limit_ms * 1000);
@@ -5157,8 +5283,8 @@ main(int argc, char **argv)
sample_rate = atof(optarg);
if (sample_rate <= 0.0 || sample_rate > 1.0)
{
- fprintf(stderr, "invalid sampling rate: \"%s\"\n", optarg);
- exit(1);
+ pgbench_error(ERROR, "invalid sampling rate: \"%s\"\n",
+ optarg);
}
break;
case 5: /* aggregate-interval */
@@ -5166,9 +5292,9 @@ main(int argc, char **argv)
agg_interval = atoi(optarg);
if (agg_interval <= 0)
{
- fprintf(stderr, "invalid number of seconds for aggregation: \"%s\"\n",
- optarg);
- exit(1);
+ pgbench_error(ERROR,
+ "invalid number of seconds for aggregation: \"%s\"\n",
+ optarg);
}
break;
case 6: /* progress-timestamp */
@@ -5187,13 +5313,14 @@ main(int argc, char **argv)
benchmarking_option_set = true;
if (!set_random_seed(optarg))
{
- fprintf(stderr, "error while setting random seed from --random-seed option\n");
- exit(1);
+ pgbench_error(ERROR,
+ "error while setting random seed from --random-seed option\n");
}
break;
default:
- fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
- exit(1);
+ pgbench_error(ERROR,
+ _("Try \"%s --help\" for more information.\n"),
+ progname);
break;
}
}
@@ -5230,10 +5357,7 @@ main(int argc, char **argv)
total_weight += sql_script[i].weight;
if (total_weight == 0 && !is_init_mode)
- {
- fprintf(stderr, "total script weight must not be zero\n");
- exit(1);
- }
+ pgbench_error(ERROR, "total script weight must not be zero\n");
/* show per script stats if several scripts are used */
if (num_scripts > 1)
@@ -5266,8 +5390,8 @@ main(int argc, char **argv)
{
if (benchmarking_option_set)
{
- fprintf(stderr, "some of the specified options cannot be used in initialization (-i) mode\n");
- exit(1);
+ pgbench_error(ERROR,
+ "some of the specified options cannot be used in initialization (-i) mode\n");
}
if (initialize_steps == NULL)
@@ -5301,15 +5425,15 @@ main(int argc, char **argv)
{
if (initialization_option_set)
{
- fprintf(stderr, "some of the specified options cannot be used in benchmarking mode\n");
- exit(1);
+ pgbench_error(ERROR,
+ "some of the specified options cannot be used in benchmarking mode\n");
}
}
if (nxacts > 0 && duration > 0)
{
- fprintf(stderr, "specify either a number of transactions (-t) or a duration (-T), not both\n");
- exit(1);
+ pgbench_error(ERROR,
+ "specify either a number of transactions (-t) or a duration (-T), not both\n");
}
/* Use DEFAULT_NXACTS if neither nxacts nor duration is specified. */
@@ -5319,45 +5443,47 @@ main(int argc, char **argv)
/* --sampling-rate may be used only with -l */
if (sample_rate > 0.0 && !use_log)
{
- fprintf(stderr, "log sampling (--sampling-rate) is allowed only when logging transactions (-l)\n");
- exit(1);
+ pgbench_error(ERROR,
+ "log sampling (--sampling-rate) is allowed only when logging transactions (-l)\n");
}
/* --sampling-rate may not be used with --aggregate-interval */
if (sample_rate > 0.0 && agg_interval > 0)
{
- fprintf(stderr, "log sampling (--sampling-rate) and aggregation (--aggregate-interval) cannot be used at the same time\n");
- exit(1);
+ pgbench_error(ERROR,
+ "log sampling (--sampling-rate) and aggregation (--aggregate-interval) cannot be used at the same time\n");
}
if (agg_interval > 0 && !use_log)
{
- fprintf(stderr, "log aggregation is allowed only when actually logging transactions\n");
- exit(1);
+ pgbench_error(ERROR,
+ "log aggregation is allowed only when actually logging transactions\n");
}
if (!use_log && logfile_prefix)
{
- fprintf(stderr, "log file prefix (--log-prefix) is allowed only when logging transactions (-l)\n");
- exit(1);
+ pgbench_error(ERROR,
+ "log file prefix (--log-prefix) is allowed only when logging transactions (-l)\n");
}
if (duration > 0 && agg_interval > duration)
{
- fprintf(stderr, "number of seconds for aggregation (%d) must not be higher than test duration (%d)\n", agg_interval, duration);
- exit(1);
+ pgbench_error(ERROR,
+ "number of seconds for aggregation (%d) must not be higher than test duration (%d)\n",
+ agg_interval, duration);
}
if (duration > 0 && agg_interval > 0 && duration % agg_interval != 0)
{
- fprintf(stderr, "duration (%d) must be a multiple of aggregation interval (%d)\n", duration, agg_interval);
- exit(1);
+ pgbench_error(ERROR,
+ "duration (%d) must be a multiple of aggregation interval (%d)\n",
+ duration, agg_interval);
}
if (progress_timestamp && progress == 0)
{
- fprintf(stderr, "--progress-timestamp is allowed only under --progress\n");
- exit(1);
+ pgbench_error(ERROR,
+ "--progress-timestamp is allowed only under --progress\n");
}
/*
@@ -5383,15 +5509,12 @@ main(int argc, char **argv)
if (var->value.type != PGBT_NO_VALUE)
{
- if (!putVariableValue(&state[i], "startup",
- var->name, &var->value))
- exit(1);
+ putVariableValue(&state[i], "startup",
+ var->name, &var->value, false);
}
else
{
- if (!putVariable(&state[i], "startup",
- var->name, var->svalue))
- exit(1);
+ putVariable(&state[i], "startup", var->name, var->svalue);
}
}
}
@@ -5404,7 +5527,7 @@ main(int argc, char **argv)
initRandomState(&state[i].random_state);
}
- if (debug)
+ if (DEBUG >= log_min_messages)
{
if (duration <= 0)
printf("pghost: %s pgport: %s nclients: %d nxacts: %d dbName: %s\n",
@@ -5421,9 +5544,11 @@ main(int argc, char **argv)
if (PQstatus(con) == CONNECTION_BAD)
{
- fprintf(stderr, "connection to database \"%s\" failed\n", dbName);
- fprintf(stderr, "%s", PQerrorMessage(con));
- exit(1);
+ /* we are sure that the function PQerrorMessage is always called */
+ Assert(ERROR >= log_min_messages);
+
+ pgbench_error(ERROR, "connection to database \"%s\" failed\n%s",
+ dbName, PQerrorMessage(con));
}
if (internal_script_used)
@@ -5436,29 +5561,44 @@ main(int argc, char **argv)
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
char *sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+ PQExpBufferData errmsg_buf;
- fprintf(stderr, "%s", PQerrorMessage(con));
+ /*
+ * we are sure that the allocated memory for the message is always
+ * used
+ */
+ Assert(LOG >= log_min_messages);
+
+ initPQExpBuffer(&errmsg_buf);
+ printfPQExpBuffer(&errmsg_buf, "%s", PQerrorMessage(con));
if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) == 0)
{
- fprintf(stderr, "Perhaps you need to do initialization (\"pgbench -i\") in database \"%s\"\n", PQdb(con));
+ appendPQExpBuffer(&errmsg_buf,
+ "Perhaps you need to do initialization (\"pgbench -i\") in database \"%s\"\n",
+ PQdb(con));
}
+ pgbench_error(LOG, "%s", errmsg_buf.data);
+ termPQExpBuffer(&errmsg_buf);
exit(1);
}
scale = atoi(PQgetvalue(res, 0, 0));
if (scale < 0)
{
- fprintf(stderr, "invalid count(*) from pgbench_branches: \"%s\"\n",
- PQgetvalue(res, 0, 0));
- exit(1);
+ /* we are sure that the function PQgetvalue is always called */
+ Assert(ERROR >= log_min_messages);
+
+ pgbench_error(ERROR,
+ "invalid count(*) from pgbench_branches: \"%s\"\n",
+ PQgetvalue(res, 0, 0));
}
PQclear(res);
/* warn if we override user-given -s switch */
if (scale_given)
- fprintf(stderr,
- "scale option ignored, using count from pgbench_branches table (%d)\n",
- scale);
+ pgbench_error(LOG,
+ "scale option ignored, using count from pgbench_branches table (%d)\n",
+ scale);
}
/*
@@ -5468,10 +5608,7 @@ main(int argc, char **argv)
if (lookupVariable(&state[0], "scale") == NULL)
{
for (i = 0; i < nclients; i++)
- {
- if (!putVariableInt(&state[i], "startup", "scale", scale))
- exit(1);
- }
+ putVariableInt(&state[i], "startup", "scale", scale, false);
}
/*
@@ -5481,8 +5618,7 @@ main(int argc, char **argv)
if (lookupVariable(&state[0], "client_id") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "client_id", i))
- exit(1);
+ putVariableInt(&state[i], "startup", "client_id", i, false);
}
/* set default seed for hash functions */
@@ -5494,31 +5630,35 @@ main(int argc, char **argv)
(uint64) (random() & 0xFFFF);
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "default_seed", (int64) seed))
- exit(1);
+ {
+ putVariableInt(&state[i], "startup", "default_seed", (int64) seed,
+ false);
+ }
}
/* set random seed unless overwritten */
if (lookupVariable(&state[0], "random_seed") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
- exit(1);
+ {
+ putVariableInt(&state[i], "startup", "random_seed", random_seed,
+ false);
+ }
}
if (!is_no_vacuum)
{
- fprintf(stderr, "starting vacuum...");
+ pgbench_error(LOG, "starting vacuum...");
tryExecuteStatement(con, "vacuum pgbench_branches");
tryExecuteStatement(con, "vacuum pgbench_tellers");
tryExecuteStatement(con, "truncate pgbench_history");
- fprintf(stderr, "end.\n");
+ pgbench_error(LOG, "end.\n");
if (do_vacuum_accounts)
{
- fprintf(stderr, "starting vacuum pgbench_accounts...");
+ pgbench_error(LOG, "starting vacuum pgbench_accounts...");
tryExecuteStatement(con, "vacuum analyze pgbench_accounts");
- fprintf(stderr, "end.\n");
+ pgbench_error(LOG, "end.\n");
}
}
PQfinish(con);
@@ -5578,8 +5718,11 @@ main(int argc, char **argv)
if (err != 0 || thread->thread == INVALID_THREAD)
{
- fprintf(stderr, "could not create thread: %s\n", strerror(err));
- exit(1);
+ /* we are sure that the function strerror is always called */
+ Assert(ERROR >= log_min_messages);
+
+ pgbench_error(ERROR, "could not create thread: %s\n",
+ strerror(err));
}
}
else
@@ -5688,8 +5831,11 @@ threadRun(void *arg)
if (thread->logfile == NULL)
{
- fprintf(stderr, "could not open logfile \"%s\": %s\n",
- logpath, strerror(errno));
+ /* we are sure that the function strerror is always called */
+ Assert(LOG >= log_min_messages);
+
+ pgbench_error(LOG, "could not open logfile \"%s\": %s\n",
+ logpath, strerror(errno));
goto done;
}
}
@@ -5767,8 +5913,14 @@ threadRun(void *arg)
if (sock < 0)
{
- fprintf(stderr, "invalid socket: %s",
- PQerrorMessage(st->con));
+ /*
+ * We are sure that the function PQerrorMessage is always
+ * called.
+ */
+ Assert(LOG >= log_min_messages);
+
+ pgbench_error(LOG, "invalid socket: %s",
+ PQerrorMessage(st->con));
goto done;
}
@@ -5844,7 +5996,11 @@ threadRun(void *arg)
continue;
}
/* must be something wrong */
- fprintf(stderr, "select() failed: %s\n", strerror(errno));
+
+ /* we are sure that the function strerror is always called */
+ Assert(LOG >= log_min_messages);
+
+ pgbench_error(LOG, "select() failed: %s\n", strerror(errno));
goto done;
}
}
@@ -5868,8 +6024,14 @@ threadRun(void *arg)
if (sock < 0)
{
- fprintf(stderr, "invalid socket: %s",
- PQerrorMessage(st->con));
+ /*
+ * We are sure that the function PQerrorMessage is always
+ * called.
+ */
+ Assert(LOG >= log_min_messages);
+
+ pgbench_error(LOG, "invalid socket: %s",
+ PQerrorMessage(st->con));
goto done;
}
@@ -5911,6 +6073,7 @@ threadRun(void *arg)
lag,
stdev;
char tbuf[315];
+ PQExpBufferData progress_buf;
/*
* Add up the statistics of all threads.
@@ -5968,18 +6131,29 @@ threadRun(void *arg)
snprintf(tbuf, sizeof(tbuf), "%.1f s", total_run);
}
- fprintf(stderr,
- "progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
- tbuf, tps, latency, stdev);
+ /*
+ * We are sure that the allocated memory for the message is
+ * always used.
+ */
+ Assert(LOG >= log_min_messages);
+
+ initPQExpBuffer(&progress_buf);
+ printfPQExpBuffer(&progress_buf,
+ "progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
+ tbuf, tps, latency, stdev);
if (throttle_delay)
{
- fprintf(stderr, ", lag %.3f ms", lag);
+ appendPQExpBuffer(&progress_buf, ", lag %.3f ms", lag);
if (latency_limit)
- fprintf(stderr, ", " INT64_FORMAT " skipped",
- cur.skipped - last.skipped);
+ appendPQExpBuffer(&progress_buf,
+ ", " INT64_FORMAT " skipped",
+ cur.skipped - last.skipped);
}
- fprintf(stderr, "\n");
+ appendPQExpBufferChar(&progress_buf, '\n');
+
+ pgbench_error(LOG, "%s", progress_buf.data);
+ termPQExpBuffer(&progress_buf);
last = cur;
last_report = now;
@@ -6063,10 +6237,7 @@ setalarm(int seconds)
!CreateTimerQueueTimer(&timer, queue,
win32_timer_callback, NULL, seconds * 1000, 0,
WT_EXECUTEINTIMERTHREAD | WT_EXECUTEONLYONCE))
- {
- fprintf(stderr, "failed to set timer\n");
- exit(1);
- }
+ pgbench_error(ERROR, "failed to set timer\n");
}
/* partial pthread implementation for Windows */
@@ -6136,3 +6307,26 @@ pthread_join(pthread_t th, void **thread_return)
}
#endif /* WIN32 */
+
+static void
+pgbench_error(ErrorLevel elevel, const char *fmt,...)
+{
+ va_list ap;
+
+ /* Determine whether message is enabled for log output */
+ if (elevel < log_min_messages)
+ return;
+
+ if (!fmt || !fmt[0])
+ {
+ /* internal error which should never occur */
+ pgbench_error(ERROR, "empty error message cannot be reported\n");
+ }
+
+ va_start(ap, fmt);
+ vfprintf(stderr, _(fmt), ap);
+ va_end(ap);
+
+ if (elevel >= ERROR)
+ exit(1);
+}
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index d6a38d0..e983abc 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -172,3 +172,4 @@ PQsslAttribute 169
PQsetErrorContextVisibility 170
PQresultVerboseErrorMessage 171
PQencryptPasswordConn 172
+appendPQExpBufferVA 173
diff --git a/src/interfaces/libpq/pqexpbuffer.c b/src/interfaces/libpq/pqexpbuffer.c
index 86b16e6..3db2d4c 100644
--- a/src/interfaces/libpq/pqexpbuffer.c
+++ b/src/interfaces/libpq/pqexpbuffer.c
@@ -37,8 +37,6 @@
/* All "broken" PQExpBuffers point to this string. */
static const char oom_buffer[1] = "";
-static bool appendPQExpBufferVA(PQExpBuffer str, const char *fmt, va_list args) pg_attribute_printf(2, 0);
-
/*
* markPQExpBufferBroken
@@ -282,7 +280,7 @@ appendPQExpBuffer(PQExpBuffer str, const char *fmt,...)
* Attempt to format data and append it to str. Returns true if done
* (either successful or hard failure), false if need to retry.
*/
-static bool
+bool
appendPQExpBufferVA(PQExpBuffer str, const char *fmt, va_list args)
{
size_t avail;
diff --git a/src/interfaces/libpq/pqexpbuffer.h b/src/interfaces/libpq/pqexpbuffer.h
index 771602a..b70b868 100644
--- a/src/interfaces/libpq/pqexpbuffer.h
+++ b/src/interfaces/libpq/pqexpbuffer.h
@@ -158,6 +158,14 @@ extern void printfPQExpBuffer(PQExpBuffer str, const char *fmt,...) pg_attribute
extern void appendPQExpBuffer(PQExpBuffer str, const char *fmt,...) pg_attribute_printf(2, 3);
/*------------------------
+ * appendPQExpBufferVA
+ * Shared guts of printfPQExpBuffer/appendPQExpBuffer.
+ * Attempt to format data and append it to str. Returns true if done
+ * (either successful or hard failure), false if need to retry.
+ */
+extern bool appendPQExpBufferVA(PQExpBuffer str, const char *fmt, va_list args) pg_attribute_printf(2, 0);
+
+/*------------------------
* appendPQExpBufferStr
* Append the given string to a PQExpBuffer, allocating more space
* if necessary.
--
2.7.4
Attachment: v10-0003-Pgbench-errors-use-the-Variables-structure-for-c.patch (text/x-diff)
From 9a00415b91bcae2a0ee1cce4b9593a374b17b83b Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Tue, 7 Aug 2018 13:29:38 +0300
Subject: [PATCH v10 3/4] Pgbench errors: use the Variables structure for
client variables
This is most important when the structure is used to reset client variables while
repeating transactions after serialization/deadlock failures.
Don't allocate Variable structs one by one. Instead, allocate space for 8
variables at first, and then double the array size whenever it overflows.
---
src/bin/pgbench/pgbench.c | 243 ++++++++++++++++++++++++++++++++--------------
1 file changed, 171 insertions(+), 72 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index c45cd44..4f8700b 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -202,6 +202,12 @@ const char *progname;
volatile bool timer_exceeded = false; /* flag from signal handler */
/*
+ * The number of variables for which the array is allocated the first time.
+ * If necessary, the array size is doubled on each subsequent allocation.
+ */
+#define DEFAULT_MAX_VARIABLES 8
+
+/*
* Variable definitions.
*
* If a variable only has a string value, "svalue" is that value, and value is
@@ -218,6 +224,24 @@ typedef struct
PgBenchValue value; /* actual variable's value */
} Variable;
+/*
+ * Data structure for client variables.
+ */
+typedef struct
+{
+ Variable *vars; /* array of variable definitions */
+ int nvars; /* number of variables */
+
+ /*
+ * The maximum number of variables that we can currently store in 'vars'
+ * without having to reallocate more space. We must always have max_vars >=
+ * nvars.
+ */
+ int max_vars;
+
+ bool vars_sorted; /* are variables sorted by name? */
+} Variables;
+
#define MAX_SCRIPTS 128 /* max number of SQL scripts allowed */
#define SHELL_COMMAND_SIZE 256 /* maximum size allowed for shell command */
@@ -349,9 +373,7 @@ typedef struct
int command; /* command number in script */
/* client variables */
- Variable *variables; /* array of variable definitions */
- int nvariables; /* number of variables */
- bool vars_sorted; /* are variables sorted by name? */
+ Variables variables;
/* various times about current transaction */
int64 txn_scheduled; /* scheduled start time of transaction (usec) */
@@ -1248,39 +1270,39 @@ compareVariableNames(const void *v1, const void *v2)
/* Locate a variable by name; returns NULL if unknown */
static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
{
Variable key;
/* On some versions of Solaris, bsearch of zero items dumps core */
- if (st->nvariables <= 0)
+ if (variables->nvars <= 0)
return NULL;
/* Sort if we have to */
- if (!st->vars_sorted)
+ if (!variables->vars_sorted)
{
- qsort((void *) st->variables, st->nvariables, sizeof(Variable),
+ qsort((void *) variables->vars, variables->nvars, sizeof(Variable),
compareVariableNames);
- st->vars_sorted = true;
+ variables->vars_sorted = true;
}
/* Now we can search */
key.name = name;
return (Variable *) bsearch((void *) &key,
- (void *) st->variables,
- st->nvariables,
+ (void *) variables->vars,
+ variables->nvars,
sizeof(Variable),
compareVariableNames);
}
/* Get the value of a variable, in string form; returns NULL if unknown */
static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
{
Variable *var;
char stringform[64];
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
return NULL; /* not found */
@@ -1399,54 +1421,119 @@ valid_variable_name(const char *name)
}
/*
+ * Make sure there is enough space for 'needed' more variables in the
+ * variables array.
+ * On failure (too many variables are requested): if elevel < ERROR returns
+ * false; exits the program otherwise.
+ */
+static bool
+enlargeVariables(Variables *variables, const char *context, int needed,
+ ErrorLevel elevel)
+{
+ size_t new_max_vars;
+ Variable *new_vars;
+
+ if ((size_t) variables->nvars + needed > INT_MAX)
+ {
+ if (elevel >= log_min_messages)
+ {
+ PQExpBufferData errmsg_buf;
+
+ initPQExpBuffer(&errmsg_buf);
+ if (context)
+ appendPQExpBuffer(&errmsg_buf, "%s: ", context);
+ appendPQExpBuffer(&errmsg_buf,
+ "too many variables are used (limit is %d)\n",
+ INT_MAX);
+ pgbench_error(elevel, "%s", errmsg_buf.data);
+ termPQExpBuffer(&errmsg_buf);
+ }
+
+ return false;
+ }
+
+ /* total number of variables required now */
+ needed += variables->nvars;
+
+ /* Because of the above test, we now have needed <= INT_MAX */
+
+ if (needed <= variables->max_vars)
+ return true; /* got enough space already */
+
+ /*
+ * We don't want to allocate just a little more space with each addition;
+ * for efficiency, double the array size each time it overflows.
+ * Actually, we might need to more than double it if 'needed' is big...
+ */
+
+ if (variables->max_vars > 0)
+ new_max_vars = 2 * ((size_t) variables->max_vars);
+ else
+ new_max_vars = DEFAULT_MAX_VARIABLES;
+
+ while ((size_t) needed > new_max_vars)
+ new_max_vars = 2 * new_max_vars;
+
+ /*
+ * Clamp to INT_MAX in case we went past it. Note we are assuming here
+ * that INT_MAX <= UINT_MAX/2, else the above loop could overflow. We
+ * will still have new_max_vars >= needed.
+ */
+ if (new_max_vars > (size_t) INT_MAX)
+ new_max_vars = (size_t) INT_MAX;
+
+ new_vars = (Variable *) pg_realloc(variables->vars,
+ new_max_vars * sizeof(Variable));
+ variables->vars = new_vars;
+ variables->max_vars = new_max_vars;
+ return true;
+}
+
+/*
* Lookup a variable by name, creating it if need be.
* Caller is expected to assign a value to the variable.
* On failure (bad name): if this is a client run returns NULL; exits the
* program otherwise.
*/
static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name, bool client)
+lookupCreateVariable(Variables *variables, const char *context, char *name,
+ bool client)
{
Variable *var;
- var = lookupVariable(st, name);
+ /*
+ * About the error level used: if we process client commands, it is a normal
+ * failure; otherwise it is not and we exit the program.
+ */
+ ErrorLevel elevel = client ? LOG : ERROR;
+
+ var = lookupVariable(variables, name);
if (var == NULL)
{
- Variable *newvars;
-
/*
* Check for the name only when declaring a new variable to avoid
* overhead.
*/
if (!valid_variable_name(name))
{
- /*
- * About the error level used: if we process client commands, it a
- * normal failure; otherwise it is not and we exit the program.
- */
- pgbench_error(client ? LOG : ERROR,
- "%s: invalid variable name: \"%s\"\n", context, name);
+ pgbench_error(elevel, "%s: invalid variable name: \"%s\"\n",
+ context, name);
return NULL;
}
/* Create variable at the end of the array */
- if (st->variables)
- newvars = (Variable *) pg_realloc(st->variables,
- (st->nvariables + 1) * sizeof(Variable));
- else
- newvars = (Variable *) pg_malloc(sizeof(Variable));
-
- st->variables = newvars;
+ if (!enlargeVariables(variables, context, 1, elevel))
+ return NULL;
- var = &newvars[st->nvariables];
+ var = &(variables->vars[variables->nvars]);
var->name = pg_strdup(name);
var->svalue = NULL;
/* caller is expected to initialize remaining fields */
- st->nvariables++;
+ variables->nvars++;
/* we don't re-sort the array till we have to */
- st->vars_sorted = false;
+ variables->vars_sorted = false;
}
return var;
@@ -1455,12 +1542,13 @@ lookupCreateVariable(CState *st, const char *context, char *name, bool client)
/* Assign a string value to a variable, creating it if need be */
/* Exits on failure (bad name) */
static void
-putVariable(CState *st, const char *context, char *name, const char *value)
+putVariable(Variables *variables, const char *context, char *name,
+ const char *value)
{
Variable *var;
char *val;
- var = lookupCreateVariable(st, context, name, false);
+ var = lookupCreateVariable(variables, context, name, false);
/* dup then free, in case value is pointing at this variable */
val = pg_strdup(value);
@@ -1477,12 +1565,12 @@ putVariable(CState *st, const char *context, char *name, const char *value)
* program otherwise.
*/
static bool
-putVariableValue(CState *st, const char *context, char *name,
+putVariableValue(Variables *variables, const char *context, char *name,
const PgBenchValue *value, bool client)
{
Variable *var;
- var = lookupCreateVariable(st, context, name, client);
+ var = lookupCreateVariable(variables, context, name, client);
if (!var)
return false;
@@ -1500,13 +1588,13 @@ putVariableValue(CState *st, const char *context, char *name,
* program otherwise.
*/
static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value,
- bool client)
+putVariableInt(Variables *variables, const char *context, char *name,
+ int64 value, bool client)
{
PgBenchValue val;
setIntValue(&val, value);
- return putVariableValue(st, context, name, &val, client);
+ return putVariableValue(variables, context, name, &val, client);
}
/*
@@ -1561,7 +1649,7 @@ replaceVariable(char **sql, char *param, int len, char *value)
}
static char *
-assignVariables(CState *st, char *sql)
+assignVariables(Variables *variables, char *sql)
{
char *p,
*name,
@@ -1582,7 +1670,7 @@ assignVariables(CState *st, char *sql)
continue;
}
- val = getVariable(st, name);
+ val = getVariable(variables, name);
free(name);
if (val == NULL)
{
@@ -1597,12 +1685,13 @@ assignVariables(CState *st, char *sql)
}
static void
-getQueryParams(CState *st, const Command *command, const char **params)
+getQueryParams(Variables *variables, const Command *command,
+ const char **params)
{
int i;
for (i = 0; i < command->argc - 1; i++)
- params[i] = getVariable(st, command->argv[i + 1]);
+ params[i] = getVariable(variables, command->argv[i + 1]);
}
static char *
@@ -2471,7 +2560,7 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
{
Variable *var;
- if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
+ if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
pgbench_error(LOG, "undefined variable \"%s\"\n",
expr->u.variable.varname);
@@ -2540,7 +2629,7 @@ getMetaCommand(const char *cmd)
* Return true if succeeded, or false on error.
*/
static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
char command[SHELL_COMMAND_SIZE];
int i,
@@ -2571,7 +2660,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
arg = argv[i] + 1; /* a string literal starting with colons */
}
- else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
+ else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
pgbench_error(LOG, "%s: undefined variable \"%s\"\n",
argv[0], argv[i]);
@@ -2641,7 +2730,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
argv[0], res);
return false;
}
- if (!putVariableInt(st, "setshell", variable, retval, true))
+ if (!putVariableInt(variables, "setshell", variable, retval, true))
return false;
#ifdef DEBUG
@@ -2695,7 +2784,7 @@ sendCommand(CState *st, Command *command)
char *sql;
sql = pg_strdup(command->argv[0]);
- sql = assignVariables(st, sql);
+ sql = assignVariables(&st->variables, sql);
pgbench_error(DEBUG, "client %d sending %s\n", st->id, sql);
r = PQsendQuery(st->con, sql);
@@ -2706,7 +2795,7 @@ sendCommand(CState *st, Command *command)
const char *sql = command->argv[0];
const char *params[MAX_ARGS];
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
pgbench_error(DEBUG, "client %d sending %s\n", st->id, sql);
r = PQsendQueryParams(st->con, sql, command->argc - 1,
@@ -2747,7 +2836,7 @@ sendCommand(CState *st, Command *command)
st->prepared[st->use_file] = true;
}
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
pgbench_error(DEBUG, "client %d sending %s\n", st->id, name);
@@ -2773,14 +2862,14 @@ sendCommand(CState *st, Command *command)
* of delay, in microseconds. Returns true on success, false on error.
*/
static bool
-evaluateSleep(CState *st, int argc, char **argv, int *usecs)
+evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
char *var;
int usec;
if (*argv[1] == ':')
{
- if ((var = getVariable(st, argv[1] + 1)) == NULL)
+ if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
pgbench_error(LOG, "%s: undefined variable \"%s\"\n",
argv[0], argv[1]);
@@ -3060,7 +3149,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
int usec;
- if (!evaluateSleep(st, argc, argv, &usec))
+ if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
commandFailed(st, "sleep", "execution of meta-command failed");
st->state = CSTATE_ABORTED;
@@ -3101,8 +3190,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (command->meta == META_SET)
{
- if (!putVariableValue(st, argv[0], argv[1], &result,
- true))
+ if (!putVariableValue(&st->variables, argv[0],
+ argv[1], &result, true))
{
commandFailed(st, "set", "assignment of meta-command failed");
st->state = CSTATE_ABORTED;
@@ -3155,7 +3244,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (command->meta == META_SETSHELL)
{
- bool ret = runShellCommand(st, argv[1], argv + 2, argc - 2);
+ bool ret = runShellCommand(&st->variables,
+ argv[1], argv + 2,
+ argc - 2);
if (timer_exceeded) /* timeout */
{
@@ -3175,7 +3266,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (command->meta == META_SHELL)
{
- bool ret = runShellCommand(st, NULL, argv + 1, argc - 1);
+ bool ret = runShellCommand(&st->variables, NULL,
+ argv + 1, argc - 1);
if (timer_exceeded) /* timeout */
{
@@ -5204,7 +5296,7 @@ main(int argc, char **argv)
}
*p++ = '\0';
- putVariable(&state[0], "option", optarg, p);
+ putVariable(&state[0].variables, "option", optarg, p);
}
break;
case 'F':
@@ -5503,18 +5595,19 @@ main(int argc, char **argv)
int j;
state[i].id = i;
- for (j = 0; j < state[0].nvariables; j++)
+ for (j = 0; j < state[0].variables.nvars; j++)
{
- Variable *var = &state[0].variables[j];
+ Variable *var = &state[0].variables.vars[j];
if (var->value.type != PGBT_NO_VALUE)
{
- putVariableValue(&state[i], "startup",
+ putVariableValue(&state[i].variables, "startup",
var->name, &var->value, false);
}
else
{
- putVariable(&state[i], "startup", var->name, var->svalue);
+ putVariable(&state[i].variables, "startup", var->name,
+ var->svalue);
}
}
}
@@ -5605,24 +5698,30 @@ main(int argc, char **argv)
* :scale variables normally get -s or database scale, but don't override
* an explicit -D switch
*/
- if (lookupVariable(&state[0], "scale") == NULL)
+ if (lookupVariable(&state[0].variables, "scale") == NULL)
{
for (i = 0; i < nclients; i++)
- putVariableInt(&state[i], "startup", "scale", scale, false);
+ {
+ putVariableInt(&state[i].variables, "startup", "scale", scale,
+ false);
+ }
}
/*
* Define a :client_id variable that is unique per connection. But don't
* override an explicit -D switch.
*/
- if (lookupVariable(&state[0], "client_id") == NULL)
+ if (lookupVariable(&state[0].variables, "client_id") == NULL)
{
for (i = 0; i < nclients; i++)
- putVariableInt(&state[i], "startup", "client_id", i, false);
+ {
+ putVariableInt(&state[i].variables, "startup", "client_id", i,
+ false);
+ }
}
/* set default seed for hash functions */
- if (lookupVariable(&state[0], "default_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "default_seed") == NULL)
{
uint64 seed = ((uint64) (random() & 0xFFFF) << 48) |
((uint64) (random() & 0xFFFF) << 32) |
@@ -5631,18 +5730,18 @@ main(int argc, char **argv)
for (i = 0; i < nclients; i++)
{
- putVariableInt(&state[i], "startup", "default_seed", (int64) seed,
- false);
+ putVariableInt(&state[i].variables, "startup", "default_seed",
+ (int64) seed, false);
}
}
/* set random seed unless overwritten */
- if (lookupVariable(&state[0], "random_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "random_seed") == NULL)
{
for (i = 0; i < nclients; i++)
{
- putVariableInt(&state[i], "startup", "random_seed", random_seed,
- false);
+ putVariableInt(&state[i].variables, "startup", "random_seed",
+ random_seed, false);
}
}
--
2.7.4
Attachment: v10-0004-Pgbench-errors-and-serialization-deadlock-retrie.patch (text/x-diff)
From 3200b96bd42cc279634575153ab8a94020ac37e2 Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Tue, 7 Aug 2018 13:30:13 +0300
Subject: [PATCH v10 4/4] Pgbench errors and serialization/deadlock retries
A client's run is aborted only in case of a serious error, for example, if the
connection with the backend is lost. Otherwise, if the execution of an SQL or meta
command fails, the client's run continues normally until the end of the current
script execution (it is assumed that one transaction script contains exactly one
transaction).
Transactions with serialization or deadlock failures are rolled back and
repeated until they complete successfully or reach the maximum number of tries
(specified by the --max-tries option) / the maximum time of tries (specified by
the --latency-limit option). These options can be combined together; but if
none of them are used, the default value for the option --max-tries is set to 1
and failed transactions are not retried at all. If the last transaction run
fails, this transaction will be reported as failed, and the client variables
will be set as they were before the first run of this transaction.
If there are retries and/or errors, their statistics are printed in the progress
reports, in the transaction / aggregation logs, and at the end with the other
results (overall and for each script). A transaction error is reported here only
if the last try of the transaction fails. Retries and errors are also printed
per command with average latencies if you use the appropriate benchmarking option
(--report-per-command, -r). If you want to group errors by basic types
(serialization errors / deadlock errors / other SQL errors / errors in meta
commands), use the option --errors-detailed.
If you want to distinguish between failures or errors by type (including which
limit for retries was violated and how far it was exceeded for the
serialization/deadlock errors), use the pgbench debugging output created with
the option --debug-fails or --debug. The first option is recommended for this
purpose because with the second option the debugging output can be very large.
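As a usage sketch of the behavior described above (assuming the patched pgbench is built and a database `bench` has been initialized; the exact report format may differ from the final version):

```shell
# Retry serialization/deadlock failures up to 10 times per transaction,
# reporting per-command latencies, errors and retries:
pgbench --max-tries=10 --report-per-command bench

# Combine a time limit on tries with a count limit,
# grouping reported errors by basic type:
pgbench --latency-limit=100 --max-tries=5 --errors-detailed bench
```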
---
doc/src/sgml/ref/pgbench.sgml | 421 +++++++++-
src/bin/pgbench/pgbench.c | 1160 +++++++++++++++++++++++---
src/bin/pgbench/t/001_pgbench_with_server.pl | 372 ++++++++-
src/bin/pgbench/t/002_pgbench_no_server.pl | 5 +
src/fe_utils/conditional.c | 16 +-
src/include/fe_utils/conditional.h | 2 +
6 files changed, 1756 insertions(+), 220 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 88cf8b3..12d1120 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -55,16 +55,20 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
tps = 85.184871 (including connections establishing)
tps = 85.296346 (excluding connections establishing)
</screen>
The first six lines report some of the most important parameter
- settings. The next line reports the number of transactions completed
+ settings. The seventh line reports the number of transactions completed
and intended (the latter being just the product of number of clients
and number of transactions per client); these will be equal unless the run
- failed before completion. (In <option>-T</option> mode, only the actual
- number of transactions is printed.)
+ failed before completion or some SQL/meta command(s) failed. (In
+ <option>-T</option> mode, only the actual number of transactions is printed.)
+ The next line reports the maximum number of tries for transactions with
+ serialization and/or deadlock failures (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information)
The last two lines report the number of transactions per second,
figured with and without counting the time to start database sessions.
</para>
@@ -453,6 +457,17 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
at all. They are counted and reported separately as
<firstterm>skipped</firstterm>.
</para>
+ <para>
+ A transaction with a serialization or deadlock failure can be retried if
+ the total time of all its tries is less than
+ <replaceable>limit</replaceable> ms. This option can be combined with
+ the option <option>--max-tries</option> which limits the total number of
+ transaction tries. If neither of them is used, the default value for the
+ option <option>--max-tries</option> is set to 1 and failed transactions
+ are not retried at all. See <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information about retrying
+ failed transactions.
+ </para>
</listitem>
</varlistentry>
@@ -513,22 +528,34 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
Show progress report every <replaceable>sec</replaceable> seconds. The report
includes the time since the beginning of the run, the TPS since the
last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</option>),
- the latency is computed with respect to the transaction scheduled
- start time, not the actual transaction beginning time, thus it also
- includes the average schedule lag time.
+ deviation since the last report. If any transactions have received an
+ error in an SQL or meta command since the last report, they are also
+ reported as failed. Under throttling (<option>-R</option>), the latency
+ is computed with respect to the transaction scheduled start time, not
+ the actual transaction beginning time, thus it also includes the average
+ schedule lag time. If any transactions have been rolled back and
+ retried after a serialization/deadlock failure since the last report,
+ the report includes the number of such transactions and the sum of all
+ retries. Use the options <option>--max-tries</option> and/or
+ <option>--latency-limit</option> to enable transaction retries after
+ serialization/deadlock failures.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-r</option></term>
- <term><option>--report-latencies</option></term>
+ <term><option>--report-per-command</option></term>
<listitem>
<para>
- Report the average per-statement latency (execution time from the
- perspective of the client) of each command after the benchmark
- finishes. See below for details.
+ Report the following statistics for each command after the benchmark
+ finishes: the average per-statement latency (execution time from the
+ perspective of the client), the number of errors and the number of
+ retries after serialization or deadlock failures in this command. The
+ report displays retry statistics only if the maximum number of tries for
+ transactions is more than 1 (<option>--max-tries</option>) and/or the
+ maximum time of tries for transactions is used
+ (<option>--latency-limit</option>). See below for details.
</para>
</listitem>
</varlistentry>
@@ -657,6 +684,49 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</varlistentry>
<varlistentry>
+ <term><option>--errors-detailed</option></term>
+ <listitem>
+ <para>
+ Report errors in per-transaction and aggregation logs, as well as in the
+ main and per-script reports, grouped by the following types:
+ <itemizedlist>
+ <listitem>
+ <para>
+ serialization errors;
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ deadlock errors;
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ other SQL errors;
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ errors in meta commands.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--debug-fails</option></term>
+ <listitem>
+ <para>
+ Print debugging output only for errors, serialization/deadlock failures
+ and retries. See <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--log-prefix=<replaceable>prefix</replaceable></option></term>
<listitem>
<para>
@@ -667,6 +737,22 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</varlistentry>
<varlistentry>
+ <term><option>--max-tries=<replaceable>number_of_tries</replaceable></option></term>
+ <listitem>
+ <para>
+ Set the maximum number of tries for transactions with
+ serialization/deadlock failures. This option can be combined with the
+ option <option>--latency-limit</option>, which limits the total time of
+ transaction tries. If neither option is used, the default value of
+ <option>--max-tries</option> is 1 and failed transactions
+ are not retried at all. See <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information about retrying
+ failed transactions.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>--progress-timestamp</option></term>
<listitem>
<para>
@@ -807,8 +893,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<refsect1>
<title>Notes</title>
- <refsect2>
- <title>What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts">
+ <title id="transactions-and-scripts-title">What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
<para>
<application>pgbench</application> executes test scripts chosen randomly
@@ -1583,7 +1669,7 @@ END;
The format of the log is:
<synopsis>
-<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -1604,6 +1690,19 @@ END;
When both <option>--rate</option> and <option>--latency-limit</option> are used,
the <replaceable>time</replaceable> for a skipped transaction will be reported as
<literal>skipped</literal>.
+ <replaceable>retries</replaceable> is the sum of all the retries after the
+ serialization or deadlock failures during the current script execution. It is
+ only present when the maximum number of tries for transactions is more than 1
+ (<option>--max-tries</option>) and/or the maximum time of tries for
+ transactions is used (<option>--latency-limit</option>). If the transaction
+ ended with an error, its <replaceable>time</replaceable> will be reported as
+ <literal>failed</literal>. If you use the option
+ <option>--errors-detailed</option>, the <replaceable>time</replaceable> of the
+ failed transaction will be reported as
+ <literal>serialization_error</literal> / <literal>deadlock_error</literal> /
+ <literal>another_sql_error</literal> / <literal>meta_command_error</literal>
+ depending on the type of error (see <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information).
</para>
<para>
@@ -1633,6 +1732,24 @@ END;
</para>
<para>
+ The following example shows a snippet of a log file with errors and retries,
+ with the maximum number of tries set to 10 (note the additional
+ <replaceable>retries</replaceable> column):
+<screen>
+3 0 47423 0 1499414498 34501 4
+3 1 8333 0 1499414498 42848 1
+3 2 8358 0 1499414498 51219 1
+4 0 72345 0 1499414498 59433 7
+1 3 41718 0 1499414498 67879 5
+1 4 8416 0 1499414498 76311 1
+3 3 33235 0 1499414498 84469 4
+0 0 failed 0 1499414498 84905 10
+2 0 failed 0 1499414498 86248 10
+3 4 8307 0 1499414498 92788 1
+</screen>
+ </para>
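(For post-processing these logs, the optional retries column and the "failed" time marker could be handled as in the following minimal C sketch. This is illustrative only, not part of the patch; the field order follows the synopsis above, and parse_log_line is a name of my own invention.)

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/*
 * Parse one per-transaction log line of the form:
 *   client_id transaction_no time script_no time_epoch time_us [retries]
 * where "time" may be the word "failed" for a failed transaction.
 * Returns the number of retries (0 if the optional column is absent),
 * -1 on a malformed line, and sets *failed accordingly.
 */
static long
parse_log_line(const char *line, int *failed)
{
	char	time_field[64];
	long	client_id, transaction_no, script_no, time_epoch, time_us;
	long	retries = 0;
	int		n;

	n = sscanf(line, "%ld %ld %63s %ld %ld %ld %ld",
			   &client_id, &transaction_no, time_field,
			   &script_no, &time_epoch, &time_us, &retries);
	if (n < 6)
		return -1;				/* malformed line */
	*failed = (strcmp(time_field, "failed") == 0);
	return retries;
}
```

(Lines from the snippet above parse as expected: a numeric time with a retries column, a "failed" line, and a plain six-field line without retries.)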
+
+ <para>
When running a long test on hardware that can handle a lot of transactions,
the log files can become very large. The <option>--sampling-rate</option> option
can be used to log only a random sample of transactions.
@@ -1647,7 +1764,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failed_tx</replaceable> <optional> | <replaceable>serialization_errors</replaceable> <replaceable>deadlock_errors</replaceable> <replaceable>other_sql_errors</replaceable> <replaceable>meta_command_errors</replaceable> </optional> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried_tx</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -1661,7 +1778,20 @@ END;
transaction latencies within the interval,
<replaceable>min_latency</replaceable> is the minimum latency within the interval,
and
- <replaceable>max_latency</replaceable> is the maximum latency within the interval.
+ <replaceable>max_latency</replaceable> is the maximum latency within the interval,
+ <replaceable>failed_tx</replaceable> is the number of transactions that ended
+ with a failed SQL or meta command within the interval. If you use the option
+ <option>--errors-detailed</option>, instead of the sum of all failed
+ transactions you will get more detailed statistics for the failed
+ transactions grouped by the following types:
+ <replaceable>serialization_errors</replaceable> is the number of transactions
+ that got a serialization failure and were not retried afterwards,
+ <replaceable>deadlock_errors</replaceable> is the number of transactions that
+ got a deadlock failure and were not retried afterwards,
+ <replaceable>other_sql_errors</replaceable> is the number of transactions
+ that got a different error in the SQL command,
+ <replaceable>meta_command_errors</replaceable> is the number of transactions
+ that got an error in the meta command.
The next fields,
<replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
@@ -1669,21 +1799,28 @@ END;
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e. the difference between each transaction's
scheduled start time and the time it actually started.
- The very last field, <replaceable>skipped</replaceable>,
+ The next field, <replaceable>skipped</replaceable>,
is only present if the <option>--latency-limit</option> option is used, too.
It counts the number of transactions skipped because they would have
started too late.
+ The <replaceable>retried_tx</replaceable> and
+ <replaceable>retries</replaceable> fields are only present if the maximum
+ number of tries for transactions is more than 1
+ (<option>--max-tries</option>) and/or the maximum time of tries for
+ transactions is used (<option>--latency-limit</option>). They report the
+ number of retried transactions and the sum of all the retries after
+ serialization or deadlock failures within the interval.
Each transaction is counted in the interval when it was committed.
</para>
<para>
Here is some example output:
<screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0
+1345828503 7884 1979812 565806736 60 1479 0
+1345828505 7208 1979422 567277552 59 1391 0
+1345828507 7685 1980268 569784714 60 1398 0
+1345828509 7073 1979779 573489941 236 1411 0
</screen></para>
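(As a quick sanity check of the format, the simplest aggregate line — no --rate, no --latency-limit, no retries, as in the example above — can be reduced to an average latency in microseconds. Again an illustrative C sketch, not patch code; avg_latency is a hypothetical helper.)

```c
#include <stdio.h>
#include <assert.h>

/*
 * Compute the average latency (in microseconds) from one aggregate log
 * line in its simplest form:
 *   interval_start num_transactions sum_latency sum_latency_2
 *   min_latency max_latency failed_tx
 * Returns -1.0 on a malformed line or an empty interval.
 */
static double
avg_latency(const char *line)
{
	long	start, ntx, failed;
	double	sum_lat, sum_lat2, min_lat, max_lat;

	if (sscanf(line, "%ld %ld %lf %lf %lf %lf %ld",
			   &start, &ntx, &sum_lat, &sum_lat2,
			   &min_lat, &max_lat, &failed) != 7 || ntx == 0)
		return -1.0;
	return sum_lat / ntx;
}
```

(On the first example line above, 1542744 µs over 5601 transactions gives roughly 275 µs per transaction.)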
<para>
@@ -1695,13 +1832,45 @@ END;
</refsect2>
<refsect2>
- <title>Per-Statement Latencies</title>
+ <title>Per-Statement Report</title>
+
+ <para>
+ With the <option>-r</option> option, <application>pgbench</application>
+ collects the following statistics for each statement:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>latency</literal> — elapsed transaction time for each
+ statement. <application>pgbench</application> reports an average value
+ of all successful runs of the statement.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of errors in this statement. See
+ <xref linkend="errors-and-retries" endterm="errors-and-retries-title"/>
+ for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of retries after a serialization or a deadlock failure in
+ this statement. See <xref linkend="errors-and-retries"
+ endterm="errors-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ The report displays retry statistics only if the maximum number of tries for
+ transactions is more than 1 (<option>--max-tries</option>) and/or the maximum
+ time of tries for transactions is used (<option>--latency-limit</option>).
+ </para>
<para>
- With the <option>-r</option> option, <application>pgbench</application> collects
- the elapsed transaction time of each statement executed by every
- client. It then reports an average of those values, referred to
- as the latency for each statement, after the benchmark has finished.
+ All values are computed for each statement executed by every client and are
+ reported after the benchmark has finished.
</para>
<para>
@@ -1715,27 +1884,69 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
latency average = 15.844 ms
latency stddev = 2.715 ms
tps = 618.764555 (including connections establishing)
tps = 622.977698 (excluding connections establishing)
-statement latencies in milliseconds:
- 0.002 \set aid random(1, 100000 * :scale)
- 0.005 \set bid random(1, 1 * :scale)
- 0.002 \set tid random(1, 10 * :scale)
- 0.001 \set delta random(-5000, 5000)
- 0.326 BEGIN;
- 0.603 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
- 0.454 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
- 5.528 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
- 7.335 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
- 0.371 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
- 1.212 END;
+statement latencies in milliseconds and errors:
+ 0.002 0 \set aid random(1, 100000 * :scale)
+ 0.005 0 \set bid random(1, 1 * :scale)
+ 0.002 0 \set tid random(1, 10 * :scale)
+ 0.001 0 \set delta random(-5000, 5000)
+ 0.326 0 BEGIN;
+ 0.603 0 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.454 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 5.528 0 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 7.335 0 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.371 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.212 0 END;
+</screen>
+
+ Another example of output for the default script using serializable default
+ transaction isolation level (<command>PGOPTIONS='-c
+ default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: <builtin: TPC-B (sort of)>
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+number of transactions per client: 1000
+number of transactions actually processed: 4473/10000
+number of errors: 5527 (55.270%)
+number of serialization errors: 5527 (55.270%)
+number of retried: 7467 (74.670%)
+number of retries: 257244
+maximum number of tries: 100
+number of transactions above the 100.0 ms latency limit: 5766/10000 (57.660 %) (including errors)
+latency average = 41.169 ms
+latency stddev = 51.783 ms
+tps = 50.322494 (including connections establishing)
+tps = 50.324595 (excluding connections establishing)
+statement latencies in milliseconds, errors and retries:
+ 0.004 0 0 \set aid random(1, 100000 * :scale)
+ 0.000 0 0 \set bid random(1, 1 * :scale)
+ 0.000 0 0 \set tid random(1, 10 * :scale)
+ 0.000 0 0 \set delta random(-5000, 5000)
+ 0.213 0 0 BEGIN;
+ 0.393 0 0 UPDATE pgbench_accounts
+ SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.332 0 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 0.409 4971 250265 UPDATE pgbench_tellers
+ SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 0.311 556 6975 UPDATE pgbench_branches
+ SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.299 0 0 INSERT INTO pgbench_history
+ (tid, bid, aid, delta, mtime)
+ VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 0.520 0 4 END;
</screen>
</para>
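(The percentage lines in the report above, such as "number of errors: 5527 (55.270%)", are simply counts relative to the total number of transactions; a hypothetical formatting helper makes the arithmetic explicit — this is not the patch's code.)

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/*
 * Format a count pgbench-style as "count (percent%)" relative to the
 * total number of transactions, as in "number of errors: 5527 (55.270%)".
 * Illustrative helper only.
 */
static void
format_count(char *buf, size_t buflen, long count, long total)
{
	snprintf(buf, buflen, "%ld (%.3f%%)", count, 100.0 * count / total);
}
```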
<para>
- If multiple script files are specified, the averages are reported
+ If multiple script files are specified, all statistics are reported
separately for each script file.
</para>
@@ -1749,6 +1960,136 @@ statement latencies in milliseconds:
</para>
</refsect2>
+ <refsect2 id="errors-and-retries">
+ <title id="errors-and-retries-title">Errors and Serialization/Deadlock Retries</title>
+
+ <para>
+ When executing <application>pgbench</application>, there are three main types of
+ errors:
+ <itemizedlist>
+ <listitem>
+ <para>
+ Errors of the main program. They are the most serious and always result
+ in an immediate exit from <application>pgbench</application> with
+ the corresponding error message. They include:
+ <itemizedlist>
+ <listitem>
+ <para>
+ errors during <application>pgbench</application> startup
+ (e.g. an invalid option value);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ errors in the initialization mode (e.g. the query to create
+ tables for built-in scripts fails);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ errors before starting threads (e.g. a failed connection to the
+ database server, a syntax error in a meta command, or a thread
+ creation failure);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ internal <application>pgbench</application> errors (which are
+ never supposed to occur...).
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Errors when a thread manages its clients (e.g. a client could not
+ start a connection to the database server, or the socket for connecting
+ the client to the database server has become invalid). In such cases
+ all clients of this thread stop while other threads continue to work.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Direct client errors. They lead to an immediate exit from
+ <application>pgbench</application> with the corresponding error message
+ only in the case of an internal <application>pgbench</application>
+ error (which is never supposed to occur...). Otherwise, in the worst
+ case they only lead to aborting the failed client while other
+ clients continue their run (but most client errors are handled without
+ aborting the client and are reported separately; see below). Later in
+ this section it is assumed that the discussed failures and errors are
+ only the direct client errors and that they are not internal
+ <application>pgbench</application> errors.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ A client's run is aborted only in case of a serious error, for example, if
+ the connection with the database server was lost. Otherwise, if the execution
+ of an SQL or meta command fails, the failed transaction is always rolled back,
+ which also includes restoring the client variables to their values before the
+ run of this transaction (it is assumed that one transaction script contains only one
+ transaction; see <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"/> for more information).
+ Transactions with serialization or deadlock failures are repeated after
+ rollbacks until they complete successfully or reach the maximum number of
+ tries (specified by the <option>--max-tries</option> option) / the maximum
+ time of tries (specified by the <option>--latency-limit</option> option). If
+ the last transaction run fails, this transaction will be reported as failed.
+ </para>
+
+ <note>
+ <para>
+ Be careful when repeating scripts that contain multiple transactions: the
+ script is always retried completely, so the successful transactions can be
+ performed several times.
+ </para>
+ <para>
+ Be careful when repeating scripts that continue the transaction block from
+ the previous script: the command used to start this transaction block, like
+ all the commands from previous scripts, is never re-executed on a retry.
+ </para>
+ <para>
+ Be careful when repeating transactions with shell commands. Unlike the
+ results of SQL commands, the results of shell commands are not rolled back,
+ except for the variable value of the <command>\setshell</command> command.
+ </para>
+ </note>
+
+ <para>
+ The latency of a successful transaction includes the entire time of
+ transaction execution with rollbacks and retries. The latency for failed
+ transactions and commands is not computed separately.
+ </para>
+
+ <para>
+ The main report contains the number of failed transactions if it is non-zero
+ (to group them into basic types, use the option
+ <option>--errors-detailed</option>). If the total number of retried
+ transactions is non-zero, the main report also contains the statistics
+ related to retries: the total number of retried transactions and total number
+ of retries. The per-script report inherits all these fields from the main
+ report. The per-statement report displays retry statistics only if the
+ maximum number of tries for transactions is more than 1
+ (<option>--max-tries</option>) and/or the maximum time of tries for
+ transactions is used (<option>--latency-limit</option>). A retry is reported
+ for the command where the failure occurred during the current script
+ execution.
+ </para>
+
+ <para>
+ If you want to distinguish between failures or errors by type (including
+ which limit for retries was violated and how far it was exceeded for the
+ serialization/deadlock errors), use the <application>pgbench</application>
+ debugging output created with the option <option>--debug-fails</option> or
+ <option>--debug</option>. The first option is recommended for this purpose
+ because with the second option the debugging output can be very large.
+ </para>
+ </refsect2>
+
<refsect2>
<title>Good Practices</title>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4f8700b..1ed24fd 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -59,6 +59,8 @@
#include "pgbench.h"
+#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"
#define ERRCODE_UNDEFINED_TABLE "42P01"
/*
@@ -187,9 +189,30 @@ bool progress_timestamp = false; /* progress report with Unix time */
int nclients = 1; /* number of clients */
int nthreads = 1; /* number of threads */
bool is_connect; /* establish connection for each transaction */
-bool is_latencies; /* report per-command latencies */
+bool report_per_command = false; /* report per-command latencies, retries
+ * after the failures and errors
+ * (failures without retrying) */
int main_pid; /* main process id used in log filename */
+/*
+ * There're different types of restrictions for deciding that the current failed
+ * transaction can no longer be retried and should be reported as failed:
+ * - max_tries can be used to limit the number of tries;
+ * - latency_limit can be used to limit the total time of tries.
+ *
+ * They can be combined together, and you need to use at least one of them to
+ * retry the failed transactions. If neither is used, the default value of
+ * max_tries is set to 1 and failed transactions are not retried at all.
+ */
+uint32 max_tries = 0; /* we cannot retry a failed transaction if its
+ * number of tries reaches this maximum; if its
+ * value is zero, it is not used */
+
+#define RETRIES_ENABLED (max_tries > 1 || latency_limit)
+
+bool errors_detailed = false; /* whether to group errors in reports or
+ * logs by basic types */
+
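(To make the interplay of the two limits concrete, here is a hypothetical C sketch of the retry decision. The name and signature are mine, not the patch's; the patch performs the equivalent checks in its state machine, where the elapsed time of the tries is compared against --latency-limit.)

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

/*
 * Sketch of the retry decision described above.  A transaction that has
 * just failed with a serialization/deadlock error may be retried if
 * neither limit is exceeded.  max_tries == 0 means "no limit on the
 * number of tries"; latency_limit_us == 0 means "no time limit".  With
 * the default max_tries == 1 and no latency limit, failed transactions
 * are never retried.
 */
static bool
can_retry(uint32_t tries_done, uint32_t max_tries,
		  int64_t tries_duration_us, int64_t latency_limit_us)
{
	if (max_tries > 0 && tries_done >= max_tries)
		return false;			/* try limit reached */
	if (latency_limit_us > 0 && tries_duration_us >= latency_limit_us)
		return false;			/* time limit reached */
	/* at least one limit must be in effect to allow retries at all */
	if (max_tries == 0 && latency_limit_us == 0)
		return false;
	return true;
}
```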
char *pghost = "";
char *pgport = "";
char *login = NULL;
@@ -267,9 +290,58 @@ typedef struct SimpleStats
typedef struct StatsData
{
time_t start_time; /* interval start time, for aggregates */
- int64 cnt; /* number of transactions, including skipped */
+
+ /*
+ * All the transactions fall into two main types: is there any command that
+ * got a failure during the last execution of the transaction script or not?
+ * Thus
+ *
+ * the number of all transactions =
+ * cnt (the number of successful ones) +
+ * ecnt (the number of failed transactions).
+ *
+ * A successful transaction can be one of several types:
+ *
+ * cnt (the number of successful transactions) =
+ * skipped (it was too late to execute them) +
+ * retried (they got a serialization or a deadlock failure(s), but were
+ * successfully retried from the very beginning) +
+ * number of other transactions.
+ *
+ * A failed transaction can be one of several types:
+ *
+ * ecnt (the number of failed transactions) =
+ * serialization_errors (they got a serialization failure and were not
+ * retried) +
+ * deadlock_errors (they got a deadlock failure and were not retried) +
+ * other_sql_errors (they got a different error in the SQL command) +
+ * meta_command_errors (they got an error in the meta command).
+ *
+ * If the transaction was retried after a serialization or a deadlock
+ * failure, this does not guarantee that this retry was successful. Thus
+ *
+ * number of retries =
+ * number of retries in (retried + serialization_errors + deadlock_errors)
+ * transactions.
+ */
+ int64 cnt; /* number of successful transactions, including
+ * skipped */
int64 skipped; /* number of transactions skipped under --rate
* and --latency-limit */
+ int64 retries; /* number of retries after a serialization or a
+ * deadlock failure in all the transactions */
+ int64 retried; /* number of transactions that were retried
+ * after a serialization or a deadlock
+ * failure */
+ int64 serialization_errors; /* number of transactions that were not
+ * retried after a serialization
+ * failure */
+ int64 deadlock_errors; /* number of transactions that were not
+ * retried after a deadlock failure */
+ int64 other_sql_errors; /* number of transactions with a different
+ * error in the SQL command */
+ int64 meta_command_errors; /* number of transactions with an error
+ * in the meta command */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -283,6 +355,29 @@ typedef struct
} RandomState;
/*
+ * Data structure for repeating a transaction from the beginning with the same
+ * parameters.
+ */
+typedef struct
+{
+ RandomState random_state; /* random seed */
+ Variables variables; /* client variables */
+} RetryState;
+
+/*
+ * For the failures during script execution.
+ */
+typedef enum FailureStatus
+{
+ NO_FAILURE = 0,
+ META_COMMAND_FAILURE,
+ ANOTHER_SQL_FAILURE, /* other failures in SQL commands that are not
+ * listed by themselves below */
+ SERIALIZATION_FAILURE,
+ DEADLOCK_FAILURE
+} FailureStatus;
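(The classification of a failed SQL command by its SQLSTATE — as obtained from PQresultErrorField(res, PG_DIAG_SQLSTATE) in libpq — can be sketched as follows. This is an illustrative, self-contained variant of the enum above, not the patch's actual code; only serialization and deadlock failures are retried, everything else falls into ANOTHER_SQL_FAILURE.)

```c
#include <string.h>
#include <assert.h>

#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"

typedef enum FailureStatus
{
	NO_FAILURE = 0,
	META_COMMAND_FAILURE,
	ANOTHER_SQL_FAILURE,
	SERIALIZATION_FAILURE,
	DEADLOCK_FAILURE
} FailureStatus;

/*
 * Classify the five-character SQLSTATE of a failed SQL command into a
 * FailureStatus.  A NULL sqlstate is treated as a generic SQL failure.
 * Hypothetical helper written for illustration only.
 */
static FailureStatus
sqlstate_to_failure(const char *sqlstate)
{
	if (sqlstate == NULL)
		return ANOTHER_SQL_FAILURE;
	if (strcmp(sqlstate, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
		return SERIALIZATION_FAILURE;
	if (strcmp(sqlstate, ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
		return DEADLOCK_FAILURE;
	return ANOTHER_SQL_FAILURE;
}
```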
+
+/*
* Connection state machine states.
*/
typedef enum
@@ -337,6 +432,23 @@ typedef enum
CSTATE_END_COMMAND,
/*
+ * States for failed commands.
+ *
+ * If the SQL/meta command fails, in CSTATE_FAILURE abort the current
+ * transaction execution, roll back the failed transaction block if any and
+ * clear the conditional stack. Then go to CSTATE_RETRY. If this is a
+ * serialization or deadlock failure and we can re-execute the transaction
+ * from the very beginning, report this as a failure, set the same
+ * parameters for the transaction execution as in the previous tries and
+ * process the first transaction command in CSTATE_START_COMMAND. Otherwise,
+ * report this as an error, set the parameters for the transaction execution
+ * as they were before the first run of this transaction (except for a
+ * random state) and go to CSTATE_END_TX to complete this transaction.
+ */
+ CSTATE_FAILURE,
+ CSTATE_RETRY,
+
+ /*
* CSTATE_END_TX performs end-of-transaction processing. Calculates
* latency, and logs the transaction. In --connect mode, closes the
* current connection. Chooses the next script to execute and starts over
@@ -383,6 +495,18 @@ typedef struct
bool prepared[MAX_SCRIPTS]; /* whether client prepared the script */
+ /*
+ * For processing errors and repeating transactions with serialization or
+ * deadlock failures:
+ */
+ FailureStatus failure_status; /* the failure status of the current
+ * transaction execution; this is NO_FAILURE
+ * if there were no failures */
+ RetryState retry_state;
+ uint32 retries; /* how many times have we already retried the
+ * current transaction after a serialization or
+ * a deadlock failure? */
+
/* per client collected stats */
int64 cnt; /* client transaction count, for -t */
int ecnt; /* error count */
@@ -445,7 +569,8 @@ typedef struct
instr_time start_time; /* thread start time */
instr_time conn_time;
StatsData stats;
- int64 latency_late; /* executed but late transactions */
+ int64 latency_late; /* executed but late transactions (including
+ * errors) */
} TState;
#define INVALID_THREAD ((pthread_t) 0)
@@ -491,6 +616,9 @@ typedef struct
char *argv[MAX_ARGS]; /* command word list */
PgBenchExpr *expr; /* parsed expression, if needed */
SimpleStats stats; /* time spent in this command */
+ int64 retries; /* number of retries after a serialization or a
+ * deadlock failure */
+ int64 ecnt; /* number of failures that were not retried */
} Command;
typedef struct ParsedScript
@@ -560,9 +688,15 @@ typedef enum ErrorLevel
DEBUG,
/*
+ * Normal failure of the SQL/meta command, or processing of the failed
+ * transaction (its end/retry).
+ */
+ DEBUG_FAIL,
+
+ /*
* To report:
- * - abortion of the client (something bad e.g. the SQL/meta command failed
- * or the connection with the backend was lost);
+ * - abortion of the client (something serious e.g. connection with the
+ * backend was lost);
* - the log messages of the main program;
* - PGBENCH_DEBUG messages.
*/
@@ -596,7 +730,6 @@ static void finishCon(CState *st);
static void pgbench_error(ErrorLevel elevel,
const char *fmt,...) pg_attribute_printf(2, 3);
-
/* callback functions for our flex lexer */
static const PsqlScanCallbacks pgbench_callbacks = {
NULL, /* don't need get_variable functionality */
@@ -643,15 +776,18 @@ usage(void)
" protocol for submitting queries (default: simple)\n"
" -n, --no-vacuum do not run VACUUM before tests\n"
" -P, --progress=NUM show thread progress report every NUM seconds\n"
- " -r, --report-latencies report average latency per command\n"
+ " -r, --report-per-command report latencies, errors and retries per command\n"
" -R, --rate=NUM target rate in transactions per second\n"
" -s, --scale=NUM report this scale factor in output\n"
" -t, --transactions=NUM number of transactions each client runs (default: 10)\n"
" -T, --time=NUM duration of benchmark test in seconds\n"
" -v, --vacuum-all vacuum all four standard tables before tests\n"
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
+ " --debug-fails print debugging output only for failures and errors\n"
+ " --errors-detailed report the errors grouped by basic types\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
+ " --max-tries=NUM max number of tries to run transaction\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1112,6 +1248,12 @@ initStats(StatsData *sd, time_t start_time)
sd->start_time = start_time;
sd->cnt = 0;
sd->skipped = 0;
+ sd->retries = 0;
+ sd->retried = 0;
+ sd->serialization_errors = 0;
+ sd->deadlock_errors = 0;
+ sd->other_sql_errors = 0;
+ sd->meta_command_errors = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1120,10 +1262,47 @@ initStats(StatsData *sd, time_t start_time)
* Accumulate one additional item into the given stats object.
*/
static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumStats(StatsData *stats, bool skipped, double lat, double lag,
+ FailureStatus failure_status, int64 retries)
{
- stats->cnt++;
+ /*
+ * Record the number of retries regardless of whether the transaction was
+ * successful or failed.
+ */
+ stats->retries += retries;
+ if (retries > 0)
+ stats->retried++;
+
+ /* Record the failed transaction */
+ if (failure_status != NO_FAILURE)
+ {
+ if (failure_status == SERIALIZATION_FAILURE)
+ {
+ stats->serialization_errors++;
+ }
+ else if (failure_status == DEADLOCK_FAILURE)
+ {
+ stats->deadlock_errors++;
+ }
+ else if (failure_status == ANOTHER_SQL_FAILURE)
+ {
+ stats->other_sql_errors++;
+ }
+ else if (failure_status == META_COMMAND_FAILURE)
+ {
+ stats->meta_command_errors++;
+ }
+ else
+ {
+ /* internal error which should never occur */
+ pgbench_error(ERROR, "unexpected failure status: %d\n",
+ failure_status);
+ }
+ return;
+ }
+ /* Record the successful transaction */
+ stats->cnt++;
if (skipped)
{
/* no latency to record on skipped transactions */
@@ -1376,7 +1555,7 @@ makeVariableValue(Variable *var)
if (sscanf(var->svalue, "%lf%c", &dv, &xs) != 1)
{
- pgbench_error(LOG,
+ pgbench_error(DEBUG_FAIL,
"malformed variable \"%s\" value: \"%s\"\n",
var->name, var->svalue);
return false;
@@ -1505,7 +1684,7 @@ lookupCreateVariable(Variables *variables, const char *context, char *name,
* About the error level used: if we process client commands, it is a normal
* failure; otherwise it is not and we exit the program.
*/
- ErrorLevel elevel = client ? LOG : ERROR;
+ ErrorLevel elevel = client ? DEBUG_FAIL : ERROR;
var = lookupVariable(variables, name);
if (var == NULL)
@@ -1726,11 +1905,12 @@ coerceToBool(PgBenchValue *pval, bool *bval)
}
else /* NULL, INT or DOUBLE */
{
- /* we are sure that the function valueTypeName only is always called */
- Assert(LOG >= log_min_messages);
-
- pgbench_error(LOG, "cannot coerce %s to boolean\n",
- valueTypeName(pval));
+ /* call the function valueTypeName only if necessary */
+ if (DEBUG_FAIL >= log_min_messages)
+ {
+ pgbench_error(DEBUG_FAIL, "cannot coerce %s to boolean\n",
+ valueTypeName(pval));
+ }
*bval = false; /* suppress uninitialized-variable warnings */
return false;
}
@@ -1775,7 +1955,7 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
if (dval < PG_INT64_MIN || PG_INT64_MAX < dval)
{
- pgbench_error(LOG, "double to int overflow for %f\n", dval);
+ pgbench_error(DEBUG_FAIL, "double to int overflow for %f\n", dval);
return false;
}
*ival = (int64) dval;
@@ -1783,10 +1963,12 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
}
else /* BOOLEAN or NULL */
{
- /* we are sure that the function valueTypeName is always called */
- Assert(LOG >= log_min_messages);
-
- pgbench_error(LOG, "cannot coerce %s to int\n", valueTypeName(pval));
+ /* call the function valueTypeName only if necessary */
+ if (DEBUG_FAIL >= log_min_messages)
+ {
+ pgbench_error(DEBUG_FAIL, "cannot coerce %s to int\n",
+ valueTypeName(pval));
+ }
return false;
}
}
@@ -1807,10 +1989,12 @@ coerceToDouble(PgBenchValue *pval, double *dval)
}
else /* BOOLEAN or NULL */
{
- /* we are sure that the function valueTypeName is always called */
- Assert(LOG >= log_min_messages);
-
- pgbench_error(LOG, "cannot coerce %s to double\n", valueTypeName(pval));
+ /* call the function valueTypeName only if necessary */
+ if (DEBUG_FAIL >= log_min_messages)
+ {
+ pgbench_error(DEBUG_FAIL, "cannot coerce %s to double\n",
+ valueTypeName(pval));
+ }
return false;
}
}
@@ -1991,7 +2175,8 @@ evalStandardFunc(TState *thread, CState *st,
if (l != NULL)
{
- pgbench_error(LOG, "too many function arguments, maximum is %d\n",
+ pgbench_error(DEBUG_FAIL,
+ "too many function arguments, maximum is %d\n",
MAX_FARGS);
return false;
}
@@ -2115,7 +2300,7 @@ evalStandardFunc(TState *thread, CState *st,
case PGBENCH_MOD:
if (ri == 0)
{
- pgbench_error(LOG, "division by zero\n");
+ pgbench_error(DEBUG_FAIL, "division by zero\n");
return false;
}
/* special handling of -1 divisor */
@@ -2126,7 +2311,7 @@ evalStandardFunc(TState *thread, CState *st,
/* overflow check (needed for INT64_MIN) */
if (li == PG_INT64_MIN)
{
- pgbench_error(LOG,
+ pgbench_error(DEBUG_FAIL,
"bigint out of range\n");
return false;
}
@@ -2393,13 +2578,13 @@ evalStandardFunc(TState *thread, CState *st,
/* check random range */
if (imin > imax)
{
- pgbench_error(LOG, "empty range given to random\n");
+ pgbench_error(DEBUG_FAIL, "empty range given to random\n");
return false;
}
else if (imax - imin < 0 || (imax - imin) + 1 < 0)
{
/* prevent int overflows in random functions */
- pgbench_error(LOG, "random range is too large\n");
+ pgbench_error(DEBUG_FAIL, "random range is too large\n");
return false;
}
@@ -2421,7 +2606,7 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param < MIN_GAUSSIAN_PARAM)
{
- pgbench_error(LOG,
+ pgbench_error(DEBUG_FAIL,
"gaussian parameter must be at least %f (not %f)\n",
MIN_GAUSSIAN_PARAM, param);
return false;
@@ -2435,7 +2620,7 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param <= 0.0 || param == 1.0 || param > MAX_ZIPFIAN_PARAM)
{
- pgbench_error(LOG,
+ pgbench_error(DEBUG_FAIL,
"zipfian parameter must be in range (0, 1) U (1, %d] (got %f)\n",
MAX_ZIPFIAN_PARAM, param);
return false;
@@ -2449,7 +2634,7 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param <= 0.0)
{
- pgbench_error(LOG,
+ pgbench_error(DEBUG_FAIL,
"exponential parameter must be greater than zero (got %f)\n",
param);
return false;
@@ -2562,7 +2747,7 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
- pgbench_error(LOG, "undefined variable \"%s\"\n",
+ pgbench_error(DEBUG_FAIL, "undefined variable \"%s\"\n",
expr->u.variable.varname);
return false;
}
@@ -2662,7 +2847,7 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
}
else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
- pgbench_error(LOG, "%s: undefined variable \"%s\"\n",
+ pgbench_error(DEBUG_FAIL, "%s: undefined variable \"%s\"\n",
argv[0], argv[i]);
return false;
}
@@ -2670,7 +2855,8 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
arglen = strlen(arg);
if (len + arglen + (i > 0 ? 1 : 0) >= SHELL_COMMAND_SIZE - 1)
{
- pgbench_error(LOG, "%s: shell command is too long\n", argv[0]);
+ pgbench_error(DEBUG_FAIL, "%s: shell command is too long\n",
+ argv[0]);
return false;
}
@@ -2689,8 +2875,8 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
if (!timer_exceeded)
{
- pgbench_error(LOG, "%s: could not launch shell command\n",
- argv[0]);
+ pgbench_error(DEBUG_FAIL,
+ "%s: could not launch shell command\n", argv[0]);
}
return false;
}
@@ -2700,14 +2886,16 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
/* Execute the command with pipe and read the standard output. */
if ((fp = popen(command, "r")) == NULL)
{
- pgbench_error(LOG, "%s: could not launch shell command\n", argv[0]);
+ pgbench_error(DEBUG_FAIL, "%s: could not launch shell command\n",
+ argv[0]);
return false;
}
if (fgets(res, sizeof(res), fp) == NULL)
{
if (!timer_exceeded)
{
- pgbench_error(LOG, "%s: could not read result of shell command\n",
+ pgbench_error(DEBUG_FAIL,
+ "%s: could not read result of shell command\n",
argv[0]);
}
(void) pclose(fp);
@@ -2715,7 +2903,8 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
}
if (pclose(fp) < 0)
{
- pgbench_error(LOG, "%s: could not close shell command\n", argv[0]);
+ pgbench_error(DEBUG_FAIL, "%s: could not close shell command\n",
+ argv[0]);
return false;
}
@@ -2725,7 +2914,7 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
endptr++;
if (*res == '\0' || *endptr != '\0')
{
- pgbench_error(LOG,
+ pgbench_error(DEBUG_FAIL,
"%s: shell command must return an integer (not \"%s\")\n",
argv[0], res);
return false;
@@ -2747,11 +2936,20 @@ preparedStatementName(char *buffer, int file, int state)
}
static void
-commandFailed(CState *st, const char *cmd, const char *message)
+commandFailed(CState *st, const char *cmd, const char *message, bool aborted)
{
- pgbench_error(LOG,
- "client %d aborted in command %d (%s) of script %d; %s\n",
- st->id, st->command, cmd, st->use_file, message);
+ if (aborted)
+ {
+ pgbench_error(LOG,
+ "client %d aborted in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file, message);
+ }
+ else
+ {
+ pgbench_error(DEBUG_FAIL,
+ "client %d got a failure in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file, message);
+ }
}
/* return a script number with a weighted choice. */
@@ -2850,7 +3048,6 @@ sendCommand(CState *st, Command *command)
{
pgbench_error(DEBUG, "client %d could not send %s\n",
st->id, command->argv[0]);
- st->ecnt++;
return false;
}
else
@@ -2871,7 +3068,7 @@ evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
- pgbench_error(LOG, "%s: undefined variable \"%s\"\n",
+ pgbench_error(DEBUG_FAIL, "%s: undefined variable \"%s\"\n",
argv[0], argv[1]);
return false;
}
@@ -2895,6 +3092,225 @@ evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
}
/*
+ * Get the total number of processed transactions, including skipped ones and
+ * errors.
+ */
+static int64
+getTotalCnt(const CState *st)
+{
+ return st->cnt + st->ecnt;
+}
+
+static int64
+getEcnt(const StatsData *stats)
+{
+ return (stats->serialization_errors +
+ stats->deadlock_errors +
+ stats->other_sql_errors +
+ stats->meta_command_errors);
+}
+
+/*
+ * Clear the variables in the array. The array itself is not freed.
+ */
+static void
+clearVariables(Variables *variables)
+{
+ Variable *vars,
+ *var;
+ int nvars;
+
+ if (!variables)
+ return; /* nothing to do here */
+
+ vars = variables->vars;
+ nvars = variables->nvars;
+ for (var = vars; var - vars < nvars; ++var)
+ {
+ pg_free(var->name);
+ pg_free(var->svalue);
+ }
+
+ variables->nvars = 0;
+}
+
+/*
+ * Make a deep copy of the variables array.
+ * Before copying, the function frees the string fields of the destination
+ * variables and enlarges their array if necessary.
+ * Returns false on failure (too many variables are used in the source).
+ */
+static bool
+copyVariables(Variables *dest, const Variables *source)
+{
+ Variable *dest_var;
+ const Variable *source_vars,
+ *source_var;
+ int nvars;
+
+ if (!dest || !source || dest == source)
+ return true; /* nothing to do here */
+
+ source_vars = source->vars;
+ nvars = source->nvars;
+
+ /*
+ * Clear the original variables and make sure that we have enough space for
+ * the new variables.
+ */
+
+ clearVariables(dest);
+
+ /*
+ * In case of an error the client will be aborted so always print an error
+ * message.
+ */
+ if (!enlargeVariables(dest, NULL, nvars, LOG))
+ return false;
+
+ /* Make a deep copy of variables array */
+ for (source_var = source_vars, dest_var = dest->vars;
+ source_var - source_vars < nvars;
+ ++source_var, ++dest_var)
+ {
+ dest_var->name = pg_strdup(source_var->name);
+ if (source_var->svalue == NULL)
+ dest_var->svalue = NULL;
+ else
+ dest_var->svalue = pg_strdup(source_var->svalue);
+ dest_var->value = source_var->value;
+ }
+ dest->nvars = nvars;
+ dest->vars_sorted = source->vars_sorted;
+ return true;
+}
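For reference, the deep-copy step that copyVariables() performs can be sketched outside of pgbench as follows (struct and helper names here are illustrative, not the actual pgbench types):

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for pgbench's Variable: a name plus an optional
 * string value. */
typedef struct
{
    char       *name;
    char       *svalue;         /* may be NULL */
} Var;

static char *
dup_str(const char *s)
{
    char       *copy = malloc(strlen(s) + 1);

    strcpy(copy, s);
    return copy;
}

/* Deep-copy nvars variables from source to dest, duplicating every string
 * so that later changes to one array cannot affect the other. */
static void
copy_vars(Var *dest, const Var *source, int nvars)
{
    int         i;

    for (i = 0; i < nvars; i++)
    {
        dest[i].name = dup_str(source[i].name);
        dest[i].svalue = source[i].svalue ? dup_str(source[i].svalue) : NULL;
    }
}
```

The point of the deep copy is that a retried transaction must see exactly the variable values it started with, even if the failed attempt modified them via \set or \setshell.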
+
+/*
+ * Returns true if this type of failure can be retried.
+ */
+static bool
+canRetryFailure(FailureStatus failure_status)
+{
+ return (failure_status == SERIALIZATION_FAILURE ||
+ failure_status == DEADLOCK_FAILURE);
+}
+
+/*
+ * Returns true if the failure can be retried.
+ */
+static bool
+canRetry(CState *st, instr_time *now)
+{
+ FailureStatus failure_status = st->failure_status;
+
+ Assert(failure_status != NO_FAILURE);
+
+ /* We can only retry serialization or deadlock failures. */
+ if (!canRetryFailure(failure_status))
+ return false;
+
+ /*
+ * We must have at least one option to limit the retrying of failed
+ * transactions.
+ */
+ Assert(max_tries || latency_limit);
+
+ /*
+ * We cannot retry the failure if we have reached the maximum number of
+ * tries.
+ */
+ if (max_tries && st->retries + 1 >= max_tries)
+ return false;
+
+ /*
+ * We cannot retry the failure if we spent too much time on this
+ * transaction.
+ */
+ if (latency_limit)
+ {
+ if (INSTR_TIME_IS_ZERO(*now))
+ INSTR_TIME_SET_CURRENT(*now);
+
+ if (INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled >= latency_limit)
+ return false;
+ }
+
+ /* OK */
+ return true;
+}
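The retry-limit checks above reduce to two simple conditions. A minimal model of that decision (parameter names are illustrative; in the patch max_tries and latency_limit are globals, and a zero value means the limit is unset):

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of canRetry(): a failed transaction may be retried only
 * while both limits (if set) still allow it. */
static bool
can_retry(uint32_t retries_so_far, uint32_t max_tries,
          int64_t elapsed_us, int64_t latency_limit_us)
{
    /* with max_tries = 3, the transaction is run at most 3 times */
    if (max_tries && retries_so_far + 1 >= max_tries)
        return false;

    /* stop retrying once the transaction has used up its time budget */
    if (latency_limit_us && elapsed_us >= latency_limit_us)
        return false;

    return true;
}
```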
+
+/*
+ * Get the failure status from the error code.
+ */
+static FailureStatus
+getSQLFailureStatus(char *sqlState)
+{
+ if (sqlState)
+ {
+ if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
+ return SERIALIZATION_FAILURE;
+ else if (strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
+ return DEADLOCK_FAILURE;
+ }
+
+ return ANOTHER_SQL_FAILURE;
+}
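The SQLSTATE values tested here are the documented PostgreSQL codes 40001 (serialization_failure) and 40P01 (deadlock_detected). A standalone sketch of the same classification, with the enum and macro values reproduced for illustration:

```c
#include <string.h>

/* The two error codes the patch treats as retriable; in the real tree they
 * come from the generated errcodes header. */
#define SQLSTATE_SERIALIZATION_FAILURE "40001"
#define SQLSTATE_DEADLOCK_DETECTED     "40P01"

typedef enum
{
    NO_FAILURE,
    SERIALIZATION_FAILURE,
    DEADLOCK_FAILURE,
    ANOTHER_SQL_FAILURE
} FailureStatus;

/* Mirror of getSQLFailureStatus(): map a SQLSTATE string (possibly NULL,
 * e.g. for a lost connection) to a failure category. */
static FailureStatus
classify_sqlstate(const char *sqlState)
{
    if (sqlState)
    {
        if (strcmp(sqlState, SQLSTATE_SERIALIZATION_FAILURE) == 0)
            return SERIALIZATION_FAILURE;
        if (strcmp(sqlState, SQLSTATE_DEADLOCK_DETECTED) == 0)
            return DEADLOCK_FAILURE;
    }
    return ANOTHER_SQL_FAILURE;
}
```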
+
+static bool
+inTransactionBlock(CState *st)
+{
+ PGTransactionStatusType tx_status;
+
+ tx_status = PQtransactionStatus(st->con);
+ switch (tx_status)
+ {
+ case PQTRANS_IDLE:
+ return false;
+ case PQTRANS_INTRANS:
+ case PQTRANS_INERROR:
+ return true;
+ case PQTRANS_UNKNOWN:
+ /* PQTRANS_UNKNOWN is expected given a broken connection */
+ if (PQstatus(st->con) == CONNECTION_BAD)
+ { /* there's something wrong */
+ pgbench_error(LOG,
+ "client %d aborted while receiving the transaction status; perhaps the backend died while processing\n",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ return false; /* return value does not matter */
+ }
+ /* fall through */
+ case PQTRANS_ACTIVE:
+ default:
+ /*
+ * We cannot find out whether we are in a transaction block or not.
+ * Internal error which should never occur.
+ */
+ pgbench_error(LOG,
+ "client %d aborted while receiving the transaction status; unexpected transaction status %d\n",
+ st->id, tx_status);
+ st->state = CSTATE_ABORTED;
+ return false; /* return value does not matter */
+ }
+}
+
+/*
+ * If the latency limit is used, return the current transaction latency as a
+ * percentage of the latency limit. Otherwise return zero.
+ */
+static double
+getLatencyUsed(CState *st, instr_time *now)
+{
+ if (!latency_limit)
+ return 0.0;
+
+ if (INSTR_TIME_IS_ZERO(*now))
+ INSTR_TIME_SET_CURRENT(*now);
+
+ return (100.0 * (INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled) /
+ latency_limit);
+}
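The percentage computed here is simply elapsed time over the limit; a minimal sketch with both times in microseconds (the function name is illustrative):

```c
#include <stdint.h>

/* Sketch of getLatencyUsed(): how much of the latency limit the current
 * transaction has already consumed, as a percentage.  Returns 0.0 when no
 * limit is set, matching the patch's behavior. */
static double
latency_used_percent(int64_t elapsed_us, int64_t latency_limit_us)
{
    if (!latency_limit_us)
        return 0.0;
    return 100.0 * (double) elapsed_us / (double) latency_limit_us;
}
```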
+
+/*
* Advance the state machine of a connection, if possible.
*/
static void
@@ -2944,6 +3360,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->state = CSTATE_START_TX;
/* check consistency */
Assert(conditional_stack_empty(st->cstack));
+
+ /* reset transaction variables to default values */
+ st->failure_status = NO_FAILURE;
+ st->retries = 0;
+
break;
/*
@@ -2991,7 +3412,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
INSTR_TIME_SET_CURRENT(now);
now_us = INSTR_TIME_GET_MICROSEC(now);
while (thread->throttle_trigger < now_us - latency_limit &&
- (nxacts <= 0 || st->cnt < nxacts))
+ (nxacts <= 0 || getTotalCnt(st) < nxacts))
{
processXactStats(thread, st, &now, true, agg);
/* next rendez-vous */
@@ -3001,7 +3422,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->txn_scheduled = thread->throttle_trigger;
}
/* stop client if -t exceeded */
- if (nxacts > 0 && st->cnt >= nxacts)
+ if (nxacts > 0 && getTotalCnt(st) >= nxacts)
{
st->state = CSTATE_FINISHED;
break;
@@ -3057,6 +3478,22 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
/*
+ * This is the first try to run this transaction. Remember its
+ * parameters in case it fails or has to be repeated later.
+ */
+ memcpy(&(st->retry_state.random_state), &(st->random_state),
+ sizeof(RandomState));
+ if (!copyVariables(&st->retry_state.variables, &st->variables))
+ {
+ pgbench_error(LOG,
+ "client %d aborted when preparing to execute a transaction\n",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+
+ /*
* Record transaction start time under logging, progress or
* throttling.
*/
@@ -3100,7 +3537,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Record statement start time if per-command latencies are
* requested
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -3111,7 +3548,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
{
if (!sendCommand(st, command))
{
- commandFailed(st, "SQL", "SQL command send failed");
+ commandFailed(st, "SQL", "SQL command send failed",
+ true);
st->state = CSTATE_ABORTED;
}
else
@@ -3151,8 +3589,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
- commandFailed(st, "sleep", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(st, "sleep",
+ "execution of meta-command failed",
+ false);
+ st->failure_status = META_COMMAND_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
@@ -3183,18 +3624,24 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (!evaluateExpr(thread, st, expr, &result))
{
- commandFailed(st, argv[0], "evaluation of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(st, argv[0],
+ "evaluation of meta-command failed",
+ false);
+ st->failure_status = META_COMMAND_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
if (command->meta == META_SET)
{
- if (!putVariableValue(&st->variables, argv[0],
- argv[1], &result, true))
+ if (!putVariableValue(&st->variables, argv[0],
+ argv[1], &result, true))
{
- commandFailed(st, "set", "assignment of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(st, "set",
+ "assignment of meta-command failed",
+ false);
+ st->failure_status = META_COMMAND_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
}
@@ -3255,8 +3702,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (!ret) /* on error */
{
- commandFailed(st, "setshell", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(st, "setshell",
+ "execution of meta-command failed",
+ false);
+ st->failure_status = META_COMMAND_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
else
@@ -3276,8 +3726,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (!ret) /* on error */
{
- commandFailed(st, "shell", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ commandFailed(st, "shell",
+ "execution of meta-command failed",
+ false);
+ st->failure_status = META_COMMAND_FAILURE;
+ st->state = CSTATE_FAILURE;
break;
}
else
@@ -3393,36 +3846,54 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Wait for the current SQL command to complete
*/
case CSTATE_WAIT_RESULT:
- command = sql_script[st->use_file].commands[st->command];
- pgbench_error(DEBUG, "client %d receiving\n", st->id);
- if (!PQconsumeInput(st->con))
- { /* there's something wrong */
- commandFailed(st, "SQL", "perhaps the backend died while processing");
- st->state = CSTATE_ABORTED;
- break;
- }
- if (PQisBusy(st->con))
- return; /* don't have the whole result yet */
-
- /*
- * Read and discard the query result;
- */
- res = PQgetResult(st->con);
- switch (PQresultStatus(res))
{
- case PGRES_COMMAND_OK:
- case PGRES_TUPLES_OK:
- case PGRES_EMPTY_QUERY:
- /* OK */
- PQclear(res);
- discard_response(st);
- st->state = CSTATE_END_COMMAND;
- break;
- default:
- commandFailed(st, "SQL", PQerrorMessage(st->con));
- PQclear(res);
+ char *sqlState;
+
+ command = sql_script[st->use_file].commands[st->command];
+ pgbench_error(DEBUG, "client %d receiving\n", st->id);
+ if (!PQconsumeInput(st->con))
+ { /* there's something wrong */
+ commandFailed(st, "SQL",
+ "perhaps the backend died while processing",
+ true);
st->state = CSTATE_ABORTED;
break;
+ }
+ if (PQisBusy(st->con))
+ return; /* don't have the whole result yet */
+
+ /*
+ * Read and discard the query result;
+ */
+ res = PQgetResult(st->con);
+ sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+ switch (PQresultStatus(res))
+ {
+ case PGRES_COMMAND_OK:
+ case PGRES_TUPLES_OK:
+ case PGRES_EMPTY_QUERY:
+ /* OK */
+ st->failure_status = NO_FAILURE;
+ PQclear(res);
+ discard_response(st);
+ st->state = CSTATE_END_COMMAND;
+ break;
+ case PGRES_NONFATAL_ERROR:
+ case PGRES_FATAL_ERROR:
+ st->failure_status = getSQLFailureStatus(sqlState);
+ commandFailed(st, "SQL", PQerrorMessage(st->con),
+ false);
+ PQclear(res);
+ discard_response(st);
+ st->state = CSTATE_FAILURE;
+ break;
+ default:
+ commandFailed(st, "SQL", PQerrorMessage(st->con),
+ true);
+ PQclear(res);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
}
break;
@@ -3451,7 +3922,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* in thread-local data structure, if per-command latencies
* are requested.
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -3469,6 +3940,169 @@ doCustom(TState *thread, CState *st, StatsData *agg)
CSTATE_START_COMMAND : CSTATE_SKIP_COMMAND;
break;
+ /*
+ * Clean up after a failed transaction.
+ */
+ case CSTATE_FAILURE:
+ {
+ bool in_tx_block;
+
+ Assert(st->failure_status != NO_FAILURE);
+
+ /*
+ * Check if we have a failed transaction block or not, and
+ * roll it back if any.
+ */
+ in_tx_block = inTransactionBlock(st);
+ if (st->state == CSTATE_ABORTED)
+ break; /* there's something wrong */
+ if (in_tx_block)
+ {
+ PGresult *res;
+
+ res = PQexec(st->con, "ROLLBACK");
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ /*
+ * We are sure that the result of PQerrorMessage
+ * below is always used.
+ */
+ Assert(LOG >= log_min_messages);
+
+ pgbench_error(LOG,
+ "client %d aborted during the termination of the failed transaction block; %s\n",
+ st->id, PQerrorMessage(st->con));
+ PQclear(res);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ PQclear(res);
+ }
+
+ /* Clear the conditional stack */
+ conditional_stack_reset(st->cstack);
+
+ /* Check if we can retry the failure */
+ st->state = CSTATE_RETRY;
+ }
+ break;
+
+ /*
+ * Retry the failed transaction if possible.
+ */
+ case CSTATE_RETRY:
+ command = sql_script[st->use_file].commands[st->command];
+
+ if (canRetry(st, &now))
+ {
+ /*
+ * The failed transaction will be retried. So accumulate
+ * the retry.
+ */
+ st->retries++;
+ command->retries++;
+
+ /*
+ * Inform that the failed transaction will be retried.
+ * Allocate memory for the message only if necessary.
+ */
+ if (DEBUG_FAIL >= log_min_messages)
+ {
+ PQExpBufferData errmsg_buf;
+
+ initPQExpBuffer(&errmsg_buf);
+ printfPQExpBuffer(&errmsg_buf,
+ "client %d repeats the failed transaction (try %d",
+ st->id, st->retries + 1);
+ if (max_tries)
+ appendPQExpBuffer(&errmsg_buf, "/%d", max_tries);
+ if (latency_limit)
+ {
+ appendPQExpBuffer(&errmsg_buf,
+ ", %.3f%% of the maximum time of tries was used",
+ getLatencyUsed(st, &now));
+ }
+ appendPQExpBufferStr(&errmsg_buf, ")\n");
+ pgbench_error(DEBUG_FAIL, "%s", errmsg_buf.data);
+ termPQExpBuffer(&errmsg_buf);
+ }
+
+ /*
+ * Reset the execution parameters as they were at the
+ * beginning of the transaction.
+ */
+ memcpy(&(st->random_state), &(st->retry_state.random_state),
+ sizeof(RandomState));
+ if (!copyVariables(&st->variables,
+ &st->retry_state.variables))
+ {
+ pgbench_error(LOG,
+ "client %d aborted when preparing to retry the failed transaction\n",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+
+ /* Process the first transaction command */
+ st->command = 0;
+ st->failure_status = NO_FAILURE;
+ st->state = CSTATE_START_COMMAND;
+ }
+ else
+ {
+ /*
+ * We will not be able to retry this failed transaction.
+ * So accumulate the error.
+ */
+ command->ecnt++;
+
+ /*
+ * If this is a serialization or deadlock failure, inform
+ * that the failed transaction will not be retried. Allocate
+ * memory for the message only if necessary.
+ */
+ if (DEBUG_FAIL >= log_min_messages &&
+ canRetryFailure(st->failure_status))
+ {
+ PQExpBufferData errmsg_buf;
+
+ initPQExpBuffer(&errmsg_buf);
+ printfPQExpBuffer(&errmsg_buf,
+ "client %d ends the failed transaction (try %d",
+ st->id, st->retries + 1);
+ if (max_tries)
+ appendPQExpBuffer(&errmsg_buf, "/%d", max_tries);
+ if (latency_limit)
+ {
+ appendPQExpBuffer(&errmsg_buf,
+ ", %.3f%% of the maximum time of tries was used",
+ getLatencyUsed(st, &now));
+ }
+ appendPQExpBufferStr(&errmsg_buf, ")\n");
+ pgbench_error(DEBUG_FAIL, "%s", errmsg_buf.data);
+ termPQExpBuffer(&errmsg_buf);
+ }
+
+ /*
+ * Reset the execution parameters as they were at the
+ * beginning of the transaction, except for the random
+ * state.
+ */
+ if (!copyVariables(&st->variables,
+ &st->retry_state.variables))
+ {
+ pgbench_error(LOG,
+ "client %d aborted when ending the failed transaction\n",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+
+ /* End the failed transaction */
+ st->state = CSTATE_END_TX;
+ }
+ break;
+
/*
* End of transaction.
*/
@@ -3490,7 +4124,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
INSTR_TIME_SET_ZERO(now);
}
- if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
+ if ((getTotalCnt(st) >= nxacts && duration <= 0) ||
+ timer_exceeded)
{
/* exit success */
st->state = CSTATE_FINISHED;
@@ -3573,6 +4208,20 @@ doLog(TState *thread, CState *st,
agg->latency.sum2,
agg->latency.min,
agg->latency.max);
+
+ if (errors_detailed)
+ {
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
+ agg->serialization_errors,
+ agg->deadlock_errors,
+ agg->other_sql_errors,
+ agg->meta_command_errors);
+ }
+ else
+ {
+ fprintf(logfile, " " INT64_FORMAT, getEcnt(agg));
+ }
+
if (throttle_delay)
{
fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -3583,6 +4232,10 @@ doLog(TState *thread, CState *st,
if (latency_limit)
fprintf(logfile, " " INT64_FORMAT, agg->skipped);
}
+ if (RETRIES_ENABLED)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->retried,
+ agg->retries);
fputc('\n', logfile);
/* reset data and move to next interval */
@@ -3590,7 +4243,7 @@ doLog(TState *thread, CState *st,
}
/* accumulate the current transaction */
- accumStats(agg, skipped, latency, lag);
+ accumStats(agg, skipped, latency, lag, st->failure_status, st->retries);
}
else
{
@@ -3600,14 +4253,46 @@ doLog(TState *thread, CState *st,
gettimeofday(&tv, NULL);
if (skipped)
fprintf(logfile, "%d " INT64_FORMAT " skipped %d %ld %ld",
- st->id, st->cnt, st->use_file,
+ st->id, getTotalCnt(st), st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
- else
+ else if (st->failure_status == NO_FAILURE)
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d %ld %ld",
- st->id, st->cnt, latency, st->use_file,
+ st->id, getTotalCnt(st), latency, st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+ else if (errors_detailed)
+ {
+ if (st->failure_status == SERIALIZATION_FAILURE)
+ fprintf(logfile, "%d " INT64_FORMAT " serialization_error %d %ld %ld",
+ st->id, getTotalCnt(st), st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+ else if (st->failure_status == DEADLOCK_FAILURE)
+ fprintf(logfile, "%d " INT64_FORMAT " deadlock_error %d %ld %ld",
+ st->id, getTotalCnt(st), st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+ else if (st->failure_status == ANOTHER_SQL_FAILURE)
+ fprintf(logfile, "%d " INT64_FORMAT " another_sql_error %d %ld %ld",
+ st->id, getTotalCnt(st), st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+ else if (st->failure_status == META_COMMAND_FAILURE)
+ fprintf(logfile, "%d " INT64_FORMAT " meta_command_error %d %ld %ld",
+ st->id, getTotalCnt(st), st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+ else
+ {
+ /* internal error which should never occur */
+ pgbench_error(ERROR, "unexpected failure status: %d",
+ st->failure_status);
+ }
+ }
+ else
+ fprintf(logfile, "%d " INT64_FORMAT " failed %d %ld %ld",
+ st->id, getTotalCnt(st), st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
+
if (throttle_delay)
fprintf(logfile, " %.0f", lag);
+ if (RETRIES_ENABLED)
+ fprintf(logfile, " %d", st->retries);
fputc('\n', logfile);
}
}
@@ -3624,10 +4309,11 @@ processXactStats(TState *thread, CState *st, instr_time *now,
{
double latency = 0.0,
lag = 0.0;
- bool thread_details = progress || throttle_delay || latency_limit,
- detailed = thread_details || use_log || per_script_stats;
+ bool detailed = progress || throttle_delay || latency_limit ||
+ use_log || per_script_stats;
- if (detailed && !skipped)
+ if (detailed && !skipped &&
+ (st->failure_status == NO_FAILURE || latency_limit))
{
if (INSTR_TIME_IS_ZERO(*now))
INSTR_TIME_SET_CURRENT(*now);
@@ -3637,30 +4323,29 @@ processXactStats(TState *thread, CState *st, instr_time *now,
lag = INSTR_TIME_GET_MICROSEC(st->txn_begin) - st->txn_scheduled;
}
- if (thread_details)
- {
- /* keep detailed thread stats */
- accumStats(&thread->stats, skipped, latency, lag);
+ /* keep detailed thread stats */
+ accumStats(&thread->stats, skipped, latency, lag, st->failure_status,
+ st->retries);
- /* count transactions over the latency limit, if needed */
- if (latency_limit && latency > latency_limit)
- thread->latency_late++;
- }
- else
- {
- /* no detailed stats, just count */
- thread->stats.cnt++;
- }
+ /* count transactions over the latency limit, if needed */
+ if (latency_limit && latency > latency_limit)
+ thread->latency_late++;
/* client stat is just counting */
- st->cnt++;
+ if (st->failure_status == NO_FAILURE)
+ st->cnt++;
+ else
+ st->ecnt++;
if (use_log)
doLog(thread, st, agg, skipped, latency, lag);
/* XXX could use a mutex here, but we choose not to */
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ {
+ accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+ st->failure_status, st->retries);
+ }
}
@@ -4821,7 +5506,9 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
double time_include,
tps_include,
tps_exclude;
- int64 ntx = total->cnt - total->skipped;
+ int64 ecnt = getEcnt(total);
+ int64 ntx = total->cnt - total->skipped,
+ total_ntx = total->cnt + ecnt;
int i,
totalCacheOverflows = 0;
@@ -4842,8 +5529,8 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (duration <= 0)
{
printf("number of transactions per client: %d\n", nxacts);
- printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
- ntx, nxacts * nclients);
+ printf("number of transactions actually processed: " INT64_FORMAT "/" INT64_FORMAT "\n",
+ ntx, total_ntx);
}
else
{
@@ -4851,6 +5538,67 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printf("number of transactions actually processed: " INT64_FORMAT "\n",
ntx);
}
+
+ if (ecnt > 0)
+ {
+ printf("number of errors: " INT64_FORMAT " (%.3f%%)\n",
+ ecnt, 100.0 * ecnt / total_ntx);
+
+ if (errors_detailed)
+ {
+ /* SQL errors */
+ if (total->serialization_errors || total->other_sql_errors)
+ {
+ printf("number of serialization errors: " INT64_FORMAT " (%.3f%%)\n",
+ total->serialization_errors,
+ 100.0 * total->serialization_errors / total_ntx);
+ }
+ if (total->deadlock_errors || total->other_sql_errors)
+ {
+ printf("number of deadlock errors: " INT64_FORMAT " (%.3f%%)\n",
+ total->deadlock_errors,
+ 100.0 * total->deadlock_errors / total_ntx);
+ }
+ if (total->other_sql_errors)
+ {
+ printf("number of other SQL errors: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_errors,
+ 100.0 * total->other_sql_errors / total_ntx);
+ }
+
+ /* errors in meta commands */
+ if (total->meta_command_errors > 0)
+ {
+ printf("number of errors in meta-commands: " INT64_FORMAT " (%.3f%%)\n",
+ total->meta_command_errors,
+ 100.0 * total->meta_command_errors / total_ntx);
+ }
+ }
+ }
+
+ /*
+ * The number of retried transactions can be non-zero only if max_tries is
+ * greater than one or latency_limit is used.
+ */
+ if (total->retried > 0)
+ {
+ printf("number of retried: " INT64_FORMAT " (%.3f%%)\n",
+ total->retried, 100.0 * total->retried / total_ntx);
+ printf("number of retries: " INT64_FORMAT "\n", total->retries);
+ }
+
+ if (max_tries)
+ printf("maximum number of tries: %d\n", max_tries);
+
+ if (latency_limit)
+ {
+ /* this statistic includes both successful and failed transactions */
+ printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "/" INT64_FORMAT " (%.3f %%)%s\n",
+ latency_limit / 1000.0, latency_late, total_ntx,
+ (total_ntx > 0) ? 100.0 * latency_late / total_ntx : 0.0,
+ ecnt > 0 ? " (including errors)" : "");
+ }
+
/* Report zipfian cache overflow */
for (i = 0; i < nthreads; i++)
{
@@ -4870,18 +5618,14 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
total->skipped,
100.0 * total->skipped / total->cnt);
- if (latency_limit)
- printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "/" INT64_FORMAT " (%.3f %%)\n",
- latency_limit / 1000.0, latency_late, ntx,
- (ntx > 0) ? 100.0 * latency_late / ntx : 0.0);
-
if (throttle_delay || progress || latency_limit)
printSimpleStats("latency", &total->latency);
else
{
/* no measurement, show average latency computed from run time */
- printf("latency average = %.3f ms\n",
- 1000.0 * time_include * nclients / total->cnt);
+ printf("latency average = %.3f ms%s\n",
+ 1000.0 * time_include * nclients / total_ntx,
+ ecnt > 0 ? " (including errors)" : "");
}
if (throttle_delay)
@@ -4900,7 +5644,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printf("tps = %f (excluding connections establishing)\n", tps_exclude);
/* Report per-script/command statistics */
- if (per_script_stats || is_latencies)
+ if (per_script_stats || report_per_command)
{
int i;
@@ -4909,6 +5653,8 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (per_script_stats)
{
StatsData *sstats = &sql_script[i].stats;
+ int64 script_ecnt = getEcnt(sstats);
+ int64 script_total_ntx = sstats->cnt + script_ecnt;
printf("SQL script %d: %s\n"
" - weight: %d (targets %.1f%% of total)\n"
@@ -4920,6 +5666,62 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
100.0 * sstats->cnt / total->cnt,
(sstats->cnt - sstats->skipped) / time_include);
+ if (ecnt > 0)
+ {
+ printf(" - number of errors: " INT64_FORMAT " (%.3f%%)\n",
+ script_ecnt, 100.0 * script_ecnt / script_total_ntx);
+
+ if (errors_detailed)
+ {
+ /* SQL errors */
+ if (total->serialization_errors ||
+ total->other_sql_errors)
+ {
+ printf(" - number of serialization errors: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->serialization_errors,
+ (100.0 * sstats->serialization_errors /
+ script_total_ntx));
+ }
+ if (total->deadlock_errors ||
+ total->other_sql_errors)
+ {
+ printf(" - number of deadlock errors: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->deadlock_errors,
+ (100.0 * sstats->deadlock_errors /
+ script_total_ntx));
+ }
+ if (total->other_sql_errors)
+ {
+ printf(" - number of other SQL errors: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_errors,
+ (100.0 * sstats->other_sql_errors /
+ script_total_ntx));
+ }
+
+ /* errors in meta commands */
+ if (total->meta_command_errors > 0)
+ {
+ printf(" - number of errors in meta-commands: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->meta_command_errors,
+ (100.0 * sstats->meta_command_errors /
+ script_total_ntx));
+ }
+ }
+ }
+
+ /*
+ * It can be non-zero only if max_tries is greater than one or
+ * latency_limit is used.
+ */
+ if (total->retried > 0)
+ {
+ printf(" - number of retried: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->retried,
+ 100.0 * sstats->retried / script_total_ntx);
+ printf(" - number of retries: " INT64_FORMAT "\n",
+ sstats->retries);
+ }
+
if (throttle_delay && latency_limit && sstats->cnt > 0)
printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
sstats->skipped,
@@ -4928,15 +5730,16 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printSimpleStats(" - latency", &sstats->latency);
}
- /* Report per-command latencies */
- if (is_latencies)
+ /* Report per-command latencies and errors */
+ if (report_per_command)
{
Command **commands;
- if (per_script_stats)
- printf(" - statement latencies in milliseconds:\n");
- else
- printf("statement latencies in milliseconds:\n");
+ printf("%sstatement latencies in milliseconds%s:\n",
+ per_script_stats ? " - " : "",
+ (RETRIES_ENABLED ?
+ ", errors and retries" :
+ " and errors"));
for (commands = sql_script[i].commands;
*commands != NULL;
@@ -4944,10 +5747,23 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
{
SimpleStats *cstats = &(*commands)->stats;
- printf(" %11.3f %s\n",
- (cstats->count > 0) ?
- 1000.0 * cstats->sum / cstats->count : 0.0,
- (*commands)->line);
+ if (RETRIES_ENABLED)
+ {
+ printf(" %11.3f %10" INT64_MODIFIER "d %10" INT64_MODIFIER "d %s\n",
+ (cstats->count > 0) ?
+ 1000.0 * cstats->sum / cstats->count : 0.0,
+ (*commands)->ecnt,
+ (*commands)->retries,
+ (*commands)->line);
+ }
+ else
+ {
+ printf(" %11.3f %10" INT64_MODIFIER "d %s\n",
+ (cstats->count > 0) ?
+ 1000.0 * cstats->sum / cstats->count : 0.0,
+ (*commands)->ecnt,
+ (*commands)->line);
+ }
}
}
}
@@ -5038,7 +5854,7 @@ main(int argc, char **argv)
{"progress", required_argument, NULL, 'P'},
{"protocol", required_argument, NULL, 'M'},
{"quiet", no_argument, NULL, 'q'},
- {"report-latencies", no_argument, NULL, 'r'},
+ {"report-per-command", no_argument, NULL, 'r'},
{"rate", required_argument, NULL, 'R'},
{"scale", required_argument, NULL, 's'},
{"select-only", no_argument, NULL, 'S'},
@@ -5057,6 +5873,9 @@ main(int argc, char **argv)
{"log-prefix", required_argument, NULL, 7},
{"foreign-keys", no_argument, NULL, 8},
{"random-seed", required_argument, NULL, 9},
+ {"debug-fails", no_argument, NULL, 10},
+ {"errors-detailed", no_argument, NULL, 11},
+ {"max-tries", required_argument, NULL, 12},
{NULL, 0, NULL, 0}
};
@@ -5218,7 +6037,7 @@ main(int argc, char **argv)
break;
case 'r':
benchmarking_option_set = true;
- is_latencies = true;
+ report_per_command = true;
break;
case 's':
scale_given = true;
@@ -5409,6 +6228,28 @@ main(int argc, char **argv)
"error while setting random seed from --random-seed option\n");
}
break;
+ case 10: /* debug-fails */
+ /* do not conflict with the option --debug */
+ if (log_min_messages > DEBUG_FAIL)
+ log_min_messages = DEBUG_FAIL;
+ break;
+ case 11: /* errors-detailed */
+ errors_detailed = true;
+ break;
+ case 12: /* max-tries */
+ {
+ int32 max_tries_arg = atoi(optarg);
+
+ if (max_tries_arg <= 0)
+ {
+ pgbench_error(ERROR,
+ "invalid number of maximum tries: \"%s\"\n",
+ optarg);
+ }
+ benchmarking_option_set = true;
+ max_tries = (uint32) max_tries_arg;
+ }
+ break;
default:
pgbench_error(ERROR,
_("Try \"%s --help\" for more information.\n"),
@@ -5578,6 +6419,10 @@ main(int argc, char **argv)
"--progress-timestamp is allowed only under --progress\n");
}
+ /* If necessary set the default tries limit */
+ if (!max_tries && !latency_limit)
+ max_tries = 1;
+
/*
* save main process id in the global variable because process id will be
* changed after fork.
@@ -5861,6 +6706,12 @@ main(int argc, char **argv)
mergeSimpleStats(&stats.lag, &thread->stats.lag);
stats.cnt += thread->stats.cnt;
stats.skipped += thread->stats.skipped;
+ stats.retries += thread->stats.retries;
+ stats.retried += thread->stats.retried;
+ stats.serialization_errors += thread->stats.serialization_errors;
+ stats.deadlock_errors += thread->stats.deadlock_errors;
+ stats.other_sql_errors += thread->stats.other_sql_errors;
+ stats.meta_command_errors += thread->stats.meta_command_errors;
latency_late += thread->latency_late;
INSTR_TIME_ADD(conn_total_time, thread->conn_time);
}
@@ -6164,7 +7015,10 @@ threadRun(void *arg)
/* generate and show report */
StatsData cur;
int64 run = now - last_report,
- ntx;
+ ntx,
+ retries,
+ retried,
+ ecnt;
double tps,
total_run,
latency,
@@ -6192,6 +7046,14 @@ threadRun(void *arg)
mergeSimpleStats(&cur.lag, &thread[i].stats.lag);
cur.cnt += thread[i].stats.cnt;
cur.skipped += thread[i].stats.skipped;
+ cur.retries += thread[i].stats.retries;
+ cur.retried += thread[i].stats.retried;
+ cur.serialization_errors +=
+ thread[i].stats.serialization_errors;
+ cur.deadlock_errors += thread[i].stats.deadlock_errors;
+ cur.other_sql_errors += thread[i].stats.other_sql_errors;
+ cur.meta_command_errors +=
+ thread[i].stats.meta_command_errors;
}
/* we count only actually executed transactions */
@@ -6209,6 +7071,9 @@ threadRun(void *arg)
{
latency = sqlat = stdev = lag = 0;
}
+ retries = cur.retries - last.retries;
+ retried = cur.retried - last.retried;
+ ecnt = getEcnt(&cur) - getEcnt(&last);
if (progress_timestamp)
{
@@ -6241,6 +7106,12 @@ threadRun(void *arg)
"progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
tbuf, tps, latency, stdev);
+ if (ecnt > 0)
+ {
+ appendPQExpBuffer(&progress_buf,
+ ", " INT64_FORMAT " failed", ecnt);
+ }
+
if (throttle_delay)
{
appendPQExpBuffer(&progress_buf, ", lag %.3f ms", lag);
@@ -6249,6 +7120,17 @@ threadRun(void *arg)
", " INT64_FORMAT " skipped",
cur.skipped - last.skipped);
}
+
+ /*
+ * It can be non-zero only if max_tries is greater than one or
+ * latency_limit is used.
+ */
+ if (retried > 0)
+ {
+ appendPQExpBuffer(&progress_buf,
+ ", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
+ retried, retries);
+ }
appendPQExpBufferChar(&progress_buf, '\n');
pgbench_error(LOG, "%s", progress_buf.data);
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 2fc021d..2114329 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -5,9 +5,20 @@ use PostgresNode;
use TestLib;
use Test::More;
+use constant
+{
+ SQL_ERROR => 0,
+ META_COMMAND_ERROR => 1,
+ SYNTAX_ERROR => 2,
+};
+
# start a pgbench specific server
my $node = get_new_node('main');
-$node->init;
+
+# Set to untranslated messages, to be able to compare program output with
+# expected strings.
+$node->init(extra => [ '--locale', 'C' ]);
+
$node->start;
# invoke pgbench
@@ -136,7 +147,8 @@ pgbench(
qr{builtin: TPC-B},
qr{clients: 2\b},
qr{processed: 10/10},
- qr{mode: simple}
+ qr{mode: simple},
+ qr{maximum number of tries: 1}
],
[qr{^$}],
'pgbench tpcb-like');
@@ -530,11 +542,12 @@ pgbench(
# trigger many expression errors
my @errors = (
- # [ test name, expected status, expected stderr, script ]
+ # [ test name, expected status, error type, expected stderr, script ]
# SQL
[
'sql syntax error',
0,
+ SQL_ERROR,
[
qr{ERROR: syntax error},
qr{prepared statement .* does not exist}
@@ -544,28 +557,36 @@ my @errors = (
}
],
[
- 'sql too many args', 1, [qr{statement has too many arguments.*\b9\b}],
+ 'sql too many args', 1, SYNTAX_ERROR,
+ [qr{statement has too many arguments.*\b9\b}],
q{-- MAX_ARGS=10 for prepared
\set i 0
SELECT LEAST(:i, :i, :i, :i, :i, :i, :i, :i, :i, :i, :i);
}
],
+ [ 'sql division by zero', 0, SQL_ERROR, [qr{ERROR: division by zero}],
+ q{-- SQL division by zero
+SELECT 1 / 0;
+}
+ ],
# SHELL
[
- 'shell bad command', 0,
+ 'shell bad command', 0, META_COMMAND_ERROR,
[qr{\(shell\) .* meta-command failed}], q{\shell no-such-command}
],
[
- 'shell undefined variable', 0,
+ 'shell undefined variable', 0, META_COMMAND_ERROR,
[qr{undefined variable ":nosuchvariable"}],
q{-- undefined variable in shell
\shell echo ::foo :nosuchvariable
}
],
- [ 'shell missing command', 1, [qr{missing command }], q{\shell} ],
+ [ 'shell missing command', 1, SYNTAX_ERROR, [qr{missing command }],
+ q{\shell} ],
[
- 'shell too many args', 1, [qr{too many arguments in command "shell"}],
+ 'shell too many args', 1, SYNTAX_ERROR,
+ [qr{too many arguments in command "shell"}],
q{-- 257 arguments to \shell
\shell echo \
0 1 2 3 4 5 6 7 8 9 A B C D E F \
@@ -589,162 +610,232 @@ SELECT LEAST(:i, :i, :i, :i, :i, :i, :i, :i, :i, :i, :i);
# SET
[
- 'set syntax error', 1,
+ 'set syntax error', 1, SYNTAX_ERROR,
[qr{syntax error in command "set"}], q{\set i 1 +}
],
[
- 'set no such function', 1,
+ 'set no such function', 1, SYNTAX_ERROR,
[qr{unexpected function name}], q{\set i noSuchFunction()}
],
[
- 'set invalid variable name', 0,
+ 'set invalid variable name', 0, META_COMMAND_ERROR,
[qr{invalid variable name}], q{\set . 1}
],
[
- 'set int overflow', 0,
+ 'set int overflow', 0, META_COMMAND_ERROR,
[qr{double to int overflow for 100}], q{\set i int(1E32)}
],
- [ 'set division by zero', 0, [qr{division by zero}], q{\set i 1/0} ],
[
- 'set bigint out of range', 0,
+ 'set division by zero', 0, META_COMMAND_ERROR,
+ [qr{division by zero}], q{\set i 1/0}
+ ],
+ [
+ 'set bigint out of range', 0, META_COMMAND_ERROR,
[qr{bigint out of range}], q{\set i 9223372036854775808 / -1}
],
[
'set undefined variable',
0,
+ META_COMMAND_ERROR,
[qr{undefined variable "nosuchvariable"}],
q{\set i :nosuchvariable}
],
- [ 'set unexpected char', 1, [qr{unexpected character .;.}], q{\set i ;} ],
+ [
+ 'set unexpected char', 1, SYNTAX_ERROR,
+ [qr{unexpected character .;.}], q{\set i ;}
+ ],
[
'set too many args',
0,
+ META_COMMAND_ERROR,
[qr{too many function arguments}],
q{\set i least(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)}
],
[
- 'set empty random range', 0,
+ 'set empty random range', 0, META_COMMAND_ERROR,
[qr{empty range given to random}], q{\set i random(5,3)}
],
[
'set random range too large',
0,
+ META_COMMAND_ERROR,
[qr{random range is too large}],
q{\set i random(-9223372036854775808, 9223372036854775807)}
],
[
'set gaussian param too small',
0,
+ META_COMMAND_ERROR,
[qr{gaussian param.* at least 2}],
q{\set i random_gaussian(0, 10, 1.0)}
],
[
'set exponential param greater 0',
0,
+ META_COMMAND_ERROR,
[qr{exponential parameter must be greater }],
q{\set i random_exponential(0, 10, 0.0)}
],
[
'set zipfian param to 1',
0,
+ META_COMMAND_ERROR,
[qr{zipfian parameter must be in range \(0, 1\) U \(1, \d+\]}],
q{\set i random_zipfian(0, 10, 1)}
],
[
'set zipfian param too large',
0,
+ META_COMMAND_ERROR,
[qr{zipfian parameter must be in range \(0, 1\) U \(1, \d+\]}],
q{\set i random_zipfian(0, 10, 1000000)}
],
[
- 'set non numeric value', 0,
+ 'set non numeric value', 0, META_COMMAND_ERROR,
[qr{malformed variable "foo" value: "bla"}], q{\set i :foo + 1}
],
- [ 'set no expression', 1, [qr{syntax error}], q{\set i} ],
- [ 'set missing argument', 1, [qr{missing argument}i], q{\set} ],
+ [ 'set no expression', 1, SYNTAX_ERROR, [qr{syntax error}], q{\set i} ],
[
- 'set not a bool', 0,
+ 'set missing argument', 1, SYNTAX_ERROR,
+ [qr{missing argument}i], q{\set}
+ ],
+ [
+ 'set not a bool', 0, META_COMMAND_ERROR,
[qr{cannot coerce double to boolean}], q{\set b NOT 0.0}
],
[
- 'set not an int', 0,
+ 'set not an int', 0, META_COMMAND_ERROR,
[qr{cannot coerce boolean to int}], q{\set i TRUE + 2}
],
[
- 'set not a double', 0,
+ 'set not a double', 0, META_COMMAND_ERROR,
[qr{cannot coerce boolean to double}], q{\set d ln(TRUE)}
],
[
'set case error',
1,
+ SYNTAX_ERROR,
[qr{syntax error in command "set"}],
q{\set i CASE TRUE THEN 1 ELSE 0 END}
],
[
- 'set random error', 0,
+ 'set random error', 0, META_COMMAND_ERROR,
[qr{cannot coerce boolean to int}], q{\set b random(FALSE, TRUE)}
],
[
- 'set number of args mismatch', 1,
+ 'set number of args mismatch', 1, SYNTAX_ERROR,
[qr{unexpected number of arguments}], q{\set d ln(1.0, 2.0))}
],
[
- 'set at least one arg', 1,
+ 'set at least one arg', 1, SYNTAX_ERROR,
[qr{at least one argument expected}], q{\set i greatest())}
],
# SETSHELL
[
- 'setshell not an int', 0,
+ 'setshell not an int', 0, META_COMMAND_ERROR,
[qr{command must return an integer}], q{\setshell i echo -n one}
],
- [ 'setshell missing arg', 1, [qr{missing argument }], q{\setshell var} ],
[
- 'setshell no such command', 0,
+ 'setshell missing arg', 1, SYNTAX_ERROR,
+ [qr{missing argument }], q{\setshell var}
+ ],
+ [
+ 'setshell no such command', 0, META_COMMAND_ERROR,
[qr{could not read result }], q{\setshell var no-such-command}
],
# SLEEP
[
- 'sleep undefined variable', 0,
+ 'sleep undefined variable', 0, META_COMMAND_ERROR,
[qr{sleep: undefined variable}], q{\sleep :nosuchvariable}
],
[
- 'sleep too many args', 1,
+ 'sleep too many args', 1, SYNTAX_ERROR,
[qr{too many arguments}], q{\sleep too many args}
],
[
- 'sleep missing arg', 1,
+ 'sleep missing arg', 1, SYNTAX_ERROR,
[ qr{missing argument}, qr{\\sleep} ], q{\sleep}
],
[
- 'sleep unknown unit', 1,
+ 'sleep unknown unit', 1, SYNTAX_ERROR,
[qr{unrecognized time unit}], q{\sleep 1 week}
],
+ # CONDITIONAL BLOCKS
+ [ 'error inside a conditional block', 0, SQL_ERROR,
+ [qr{ERROR: division by zero}],
+ q{-- error inside a conditional block
+\if true
+SELECT 1 / 0;
+\endif
+}
+ ],
+
# MISC
[
- 'misc invalid backslash command', 1,
+ 'misc invalid backslash command', 1, SYNTAX_ERROR,
[qr{invalid command .* "nosuchcommand"}], q{\nosuchcommand}
],
- [ 'misc empty script', 1, [qr{empty command list for script}], q{} ],
[
- 'bad boolean', 0,
+ 'misc empty script', 1, SYNTAX_ERROR,
+ [qr{empty command list for script}], q{}
+ ],
+ [
+ 'bad boolean', 0, META_COMMAND_ERROR,
[qr{malformed variable.*trueXXX}], q{\set b :badtrue or true}
],);
for my $e (@errors)
{
- my ($name, $status, $re, $script) = @$e;
+ my ($name, $status, $error_type, $re, $script) = @$e;
my $n = '001_pgbench_error_' . $name;
$n =~ s/ /_/g;
+ my $test_name = 'pgbench script error: ' . $name;
+ my $stdout_re;
+
+ if ($status)
+ {
+ # only syntax errors get non-zero exit status
+ # internal error which should never occur
+ die $test_name . ": unexpected error type: " . $error_type . "\n"
+ if ($error_type != SYNTAX_ERROR);
+
+ $stdout_re = [ qr{^$} ];
+ }
+ else
+ {
+ $stdout_re =
+ [ qr{processed: 0/1}, qr{number of errors: 1 \(100.000%\)},
+ qr{^((?!number of retried)(.|\n))*$} ];
+
+ if ($error_type == SQL_ERROR)
+ {
+ push @$stdout_re,
+ qr{number of serialization errors: 0 \(0.000%\)},
+ qr{number of deadlock errors: 0 \(0.000%\)},
+ qr{number of other SQL errors: 1 \(100.000%\)};
+ }
+ elsif ($error_type == META_COMMAND_ERROR)
+ {
+ push @$stdout_re,
+ qr{number of errors in meta-commands: 1 \(100.000%\)};
+ }
+ else
+ {
+ # internal error which should never occur
+ die $test_name . ": unexpected error type: " . $error_type . "\n";
+ }
+ }
+
pgbench(
- '-n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX -M prepared',
+ '-n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX -M prepared --debug-fails --errors-detailed',
$status,
- [ $status ? qr{^$} : qr{processed: 0/1} ],
+ $stdout_re,
$re,
- 'pgbench script error: ' . $name,
+ $test_name,
{ $n => $script });
}
@@ -848,6 +939,209 @@ pgbench(
check_pgbench_logs("$bdir/001_pgbench_log_3", 1, 10, 10,
qr{^\d \d{1,2} \d+ \d \d+ \d+$});
+# Test the concurrent update in the table row and deadlocks.
+
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE first_client_table (value integer); '
+ . 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2);');
+
+# Serialization failure and retry
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read";
+
+# Check that we have a serialization failure and the same random value of the
+# delta variable in the next try
+my $err_pattern =
+ "(client (0|1) sending UPDATE xy SET y = y \\+ -?\\d+\\b).*"
+ . "client \\g2 got a failure in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update\\b.*"
+ . "\\g1";
+
+pgbench(
+ "-n -c 2 -t 1 -d --max-tries 2",
+ 0,
+ [ qr{processed: 2/2\b}, qr{^((?!number of errors)(.|\n))*$},
+ qr{number of retried: 1\b}, qr{number of retries: 1\b} ],
+ [ qr/$err_pattern/s ],
+ 'concurrent update with retrying',
+ {
+ '001_pgbench_serialization' => q{
+-- What's happening:
+-- The first client starts the transaction with the isolation level Repeatable
+-- Read:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+--
+-- The second client starts a similar transaction with the same isolation level:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+-- <waiting for the first client>
+--
+-- The first client commits its transaction, and the second client gets a
+-- serialization failure.
+
+\set delta random(-5000, 5000)
+
+-- The second client will stop here
+SELECT pg_advisory_lock(0);
+
+-- Start transaction with concurrent update
+BEGIN;
+UPDATE xy SET y = y + :delta WHERE x = 1 AND pg_advisory_lock(1) IS NOT NULL;
+
+-- Wait for the second client
+DO $$
+DECLARE
+ exists boolean;
+ waiters integer;
+BEGIN
+ -- The second client always comes in second, and the number of rows in the
+ -- table first_client_table reflects this. Here the first client inserts a row,
+ -- so the second client will see a non-empty table in the second try to run a
+ -- previously failed transaction.
+ SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+ IF NOT exists THEN
+ -- Let the second client begin
+ PERFORM pg_advisory_unlock(0);
+ -- And wait until the second client tries to get the same lock
+ LOOP
+ SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+ locktype = 'advisory' AND objsubid = 1 AND
+ ((classid::bigint << 32) | objid::bigint = 1::bigint) AND NOT granted;
+ IF waiters = 1 THEN
+ INSERT INTO first_client_table VALUES (1);
+
+ -- Exit loop
+ EXIT;
+ END IF;
+ END LOOP;
+ END IF;
+END$$;
+
+COMMIT;
+SELECT pg_advisory_unlock_all();
+}
+ });
+
+# Clean up
+
+$node->safe_psql('postgres', 'DELETE FROM first_client_table;');
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=read\\ committed";
+
+# Deadlock failure and retry
+
+# Check that we have a deadlock failure
+$err_pattern =
+ "client (0|1) got a failure in command (3|5) \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\\b";
+
+pgbench(
+ "-n -c 2 -t 1 --debug-fails --max-tries 2",
+ 0,
+ [ qr{processed: 2/2\b}, qr{^((?!number of errors)(.|\n))*$},
+ qr{number of retried: 1\b}, qr{number of retries: 1\b} ],
+ [ qr{$err_pattern} ],
+ 'deadlock with retrying',
+ {
+ '001_pgbench_deadlock' => q{
+-- What's happening:
+-- The first client gets the lock 2.
+-- The second client gets the lock 3 and tries to get the lock 2.
+-- The first client tries to get the lock 3 and one of them gets a deadlock
+-- failure.
+--
+-- A client who has received a deadlock failure is called a (future) failed
+-- client although it will retry the failed transaction.
+--
+-- Also any successful client must hold a lock at the transaction start so the
+-- failed client will wait until the successful one releases all of its locks
+-- at the transaction end (we do not want any failures again).
+
+-- Since the client in the failed transaction has not released the blocking
+-- locks, let's do this here.
+SELECT pg_advisory_unlock_all();
+
+-- The second and future failed clients will stop here
+SELECT pg_advisory_lock(0);
+SELECT pg_advisory_lock(1);
+
+-- The second and future failed clients always come after the first and the
+-- number of rows in the table first_client_table reflect this. Here the first
+-- client inserts a row, so the second and future failed clients will see a
+-- non-empty table.
+DO $$
+DECLARE
+ exists boolean;
+BEGIN
+ SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+ IF exists THEN
+ -- We are the second or the failed client
+
+ -- The first client will take care by itself of this lock (see below)
+ PERFORM pg_advisory_unlock(0);
+
+ PERFORM pg_advisory_lock(3);
+
+ -- The second client can get a deadlock here
+ PERFORM pg_advisory_lock(2);
+ ELSE
+ -- We are the first client
+
+ -- This code should not be used in a new transaction after a failure
+ INSERT INTO first_client_table VALUES (1);
+
+ PERFORM pg_advisory_lock(2);
+ END IF;
+END$$;
+
+DO $$
+DECLARE
+ num_rows integer;
+ waiters integer;
+BEGIN
+ -- Check if we are the first client
+ SELECT COUNT(*) FROM first_client_table INTO STRICT num_rows;
+ IF num_rows = 1 THEN
+ -- This code should not be used in a new transaction after a failure
+ INSERT INTO first_client_table VALUES (2);
+
+ -- Let the second client begin
+ PERFORM pg_advisory_unlock(0);
+ PERFORM pg_advisory_unlock(1);
+
+ -- Make sure the second client is ready for deadlock
+ LOOP
+ SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+ locktype = 'advisory' AND
+ objsubid = 1 AND
+ ((classid::bigint << 32) | objid::bigint = 2::bigint) AND
+ NOT granted;
+
+ IF waiters = 1 THEN
+ -- Exit loop
+ EXIT;
+ END IF;
+ END LOOP;
+
+ PERFORM pg_advisory_lock(0);
+ -- And the second client took care by itself of the lock 1
+ END IF;
+END$$;
+
+-- The first client can get a deadlock here
+SELECT pg_advisory_lock(3);
+
+SELECT pg_advisory_unlock_all();
+}
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DELETE FROM first_client_table;');
+
# done
$node->stop;
done_testing();
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index c1c2c1e..50626c2 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -157,6 +157,11 @@ my @options = (
qr{error while setting random seed from --random-seed option}
]
],
+ [
+ 'bad maximum number of tries',
+ '--max-tries -10',
+ [ qr{invalid number of maximum tries: "-10"} ]
+ ],
# loging sub-options
[
diff --git a/src/fe_utils/conditional.c b/src/fe_utils/conditional.c
index db2a0a5..4d14066 100644
--- a/src/fe_utils/conditional.c
+++ b/src/fe_utils/conditional.c
@@ -24,13 +24,25 @@ conditional_stack_create(void)
}
/*
- * destroy stack
+ * Destroy all the elements from the stack. The stack itself is not freed.
*/
void
-conditional_stack_destroy(ConditionalStack cstack)
+conditional_stack_reset(ConditionalStack cstack)
{
+ if (!cstack)
+ return; /* nothing to do here */
+
while (conditional_stack_pop(cstack))
continue;
+}
+
+/*
+ * destroy stack
+ */
+void
+conditional_stack_destroy(ConditionalStack cstack)
+{
+ conditional_stack_reset(cstack);
free(cstack);
}
diff --git a/src/include/fe_utils/conditional.h b/src/include/fe_utils/conditional.h
index 9b91de5..59c8d8a 100644
--- a/src/include/fe_utils/conditional.h
+++ b/src/include/fe_utils/conditional.h
@@ -73,6 +73,8 @@ typedef struct ConditionalStackData *ConditionalStack;
extern ConditionalStack conditional_stack_create(void);
+extern void conditional_stack_reset(ConditionalStack cstack);
+
extern void conditional_stack_destroy(ConditionalStack cstack);
extern int conditional_stack_depth(ConditionalStack cstack);
--
2.7.4
Hello Marina,
v10-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
- a patch for the RandomState structure (this is used to reset a client's
random seed during the repeating of transactions after serialization/deadlock
failures).
About this v10 part 1:
Patch applies cleanly, compiles, and global & local make check are both ok.
The random state is cleanly separated, so it will be easy to reset it
on client error handling. ISTM that the pgbench side is deterministic with
the separation of the seeds for different uses.
Code is clean, comments are clear.
I'm wondering what the rationale is for the "xseed" field name? In
particular, what does the "x" stand for?
--
Fabien.
On 07-08-2018 19:21, Fabien COELHO wrote:
Hello Marina,
Hello, Fabien!
v10-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
- a patch for the RandomState structure (this is used to reset a
client's random seed during the repeating of transactions after
serialization/deadlock failures).

About this v10 part 1:
Patch applies cleanly, compiles, and global & local make check are both ok.
The random state is cleanly separated, so it will be easy to reset
it on client error handling. ISTM that the pgbench side is
deterministic with the separation of the seeds for different uses.
Code is clean, comments are clear.
:-)
I'm wondering what the rationale is for the "xseed" field name? In
particular, what does the "x" stand for?
I called it "...seed" instead of "data" because perhaps the "data" is
too general a name for use here (but I'm not entirely sure what Alvaro
Herrera meant in [1]/messages/by-id/20180711180417.3ytmmwmonsr5lra7@alvherre.pgsql, see my answer in [2]/messages/by-id/cb2cde10e4e7a10a38b48e9cae8fbd28@postgrespro.ru). I called it "xseed" to
combine it with the arguments of the functions _dorand48 / pg_erand48 /
pg_jrand48 in the file erand48.c. IIUC they use a linear congruential
generator, and perhaps "xseed" means the sequence named X of 48-bit
pseudorandom values (X_0, X_1, ..., X_n), where X_0 is the seed / the
start value.
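For reference, the rand48 family is indeed a 48-bit linear congruential generator whose state array is conventionally named X. Below is a minimal sketch (the struct name follows the patch; the LCG constants are the standard rand48 ones, and `random_step` is a hypothetical stand-in for `_dorand48`) of why keeping the whole xseed per purpose makes a retry reproducible:

```c
#include <assert.h>
#include <string.h>

/* Per-purpose random state, as in the patch: 48 bits of LCG state. */
typedef struct RandomState
{
	unsigned short xseed[3];
} RandomState;

/* One step of the standard rand48 LCG: X(n+1) = (a*X(n) + c) mod 2^48. */
static void
random_step(RandomState *rs)
{
	unsigned long long x =
		((unsigned long long) rs->xseed[2] << 32) |
		((unsigned long long) rs->xseed[1] << 16) |
		(unsigned long long) rs->xseed[0];

	x = (0x5DEECE66DULL * x + 0xBULL) & 0xFFFFFFFFFFFFULL;
	rs->xseed[0] = (unsigned short) x;
	rs->xseed[1] = (unsigned short) (x >> 16);
	rs->xseed[2] = (unsigned short) (x >> 32);
}
```

Saving a copy of a client's RandomState before a transaction and restoring it on a serialization/deadlock failure replays exactly the same pseudorandom values (e.g. the same :delta) on the retry, which is what the TAP test above checks.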
[1]: /messages/by-id/20180711180417.3ytmmwmonsr5lra7@alvherre.pgsql
/messages/by-id/20180711180417.3ytmmwmonsr5lra7@alvherre.pgsql
LGTM, though I'd rename the random_state struct members so that it
wouldn't look as confusing. Maybe that's just me.
[2]: /messages/by-id/cb2cde10e4e7a10a38b48e9cae8fbd28@postgrespro.ru
/messages/by-id/cb2cde10e4e7a10a38b48e9cae8fbd28@postgrespro.ru
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Marina,
v10-0002-Pgbench-errors-use-a-separate-function-to-report.patch
- a patch for a separate error reporting function (this is used to report
client failures that do not cause an abort, and this depends on the level of
debugging).
Patch applies cleanly, compiles, global & local make check ok.
This patch improves/homogenizes logging & error reporting in pgbench, in
preparation for another patch which will manage transaction restarts in
some cases.
However, ISTM that it is not as necessary as the previous one, i.e. we
could do without it and still get the desired feature, so I see it more as a
refactoring done "in passing". I'm wondering whether it is really
worth it, because it adds some new complexity, so I'm not sure of the
net benefit.
Anyway, I still have quite a few comments/suggestions on this version.
* ErrorLevel
If ErrorLevel is used for things which are not errors, should its name
really include "Error"? Maybe "LogLevel"?
I'm at odds with the proposed levels. ISTM that pgbench internal errors
which warrant an immediate exit should be dubbed "FATAL", which would
leave the "ERROR" name for... errors, e.g. SQL errors. I'd suggest using an
INFO level for the PGBENCH_DEBUG function, and keeping LOG for main
program messages, so that all use cases are separate. Or maybe the
distinction between LOG/INFO is unclear, so INFO is not necessary.
I'm unsure about the "log_min_messages" variable name, I'd suggest
"log_level".
I do not see the asserts on LOG >= log_min_messages as useful, because the
level can only be LOG or DEBUG anyway.
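A sketch of the level scheme suggested here (all names are hypothetical, not the patch's actual identifiers): messages below the configured minimum are filtered out, and FATAL implies an immediate exit.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical level names, ordered from most verbose to most severe. */
typedef enum LogLevel
{
	PGB_DEBUG,					/* internal tracing, shown only under -d */
	PGB_LOG,					/* main program messages */
	PGB_ERROR,					/* client/SQL errors; the run continues */
	PGB_FATAL					/* pgbench internal errors; exit at once */
} LogLevel;

static LogLevel log_level = PGB_LOG;	/* minimum level actually emitted */

/* Whether a message at this level passes the filter. */
static int
level_is_emitted(LogLevel level)
{
	return level >= log_level;
}

static void
pgbench_log(LogLevel level, const char *fmt, ...)
{
	if (level_is_emitted(level))
	{
		va_list		ap;

		va_start(ap, fmt);
		vfprintf(stderr, fmt, ap);
		va_end(ap);
	}
	if (level == PGB_FATAL)
		exit(1);
}
```

With this ordering, "--debug-fails" amounts to lowering the minimum level so that per-failure messages pass the filter while full -d tracing stays off.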
This point also suggests that maybe "pgbench_error" is misnamed as well
(ok, I know I suggested it in place of ereport, but the "e" stands for error
there), as it is called on errors but also on other things. Maybe
"pgbench_log"? Or simply "log" or "report", as it is really a local
function which does not need a prefix? That would mean that
"pgbench_simple_error", which is indeed only called on errors, could keep
its initial name "pgbench_error" and be called on errors.
Alternatively, the debug/logging code could be left as it is (i.e. direct
print to stderr) and the function only called when there is some kind of
error, in which case it could be named with "error" in its name (or
elog/ereport...).
* PQExpBuffer
I still do not see a positive value from importing PQExpBuffer complexity
and cost into pgbench, as the resulting code is not very readable and it
adds malloc/free cycles, so I'd try to avoid using PQExpBuf as much as
possible. ISTM that all usages could be avoided in the patch, and most
should be avoided even if ExpBuffer is imported because it is really
useful somewhere.
- to call pgbench_error from pgbench_simple_error, you can do a
pgbench_log_va(level, format, va_list) version called both from
pgbench_error & pgbench_simple_error.
- for the PGBENCH_DEBUG function, do separate calls per type; the
very small amount of code duplication is worth it to avoid PQExpBuffer, IMO.
- for doCustom debug: I'd just let the printf as it is, with a comment, as
it is really very internal stuff for debug. Or I'd just snprintf a
something in a static buffer.
- for syntax_error: it should terminate, so it should call
pgbench_error(FATAL, ...). Idem, I'd either keep the printf then call
pgbench_error(FATAL, "syntax error found\n") for a final message,
or snprintf in a static buffer.
- for listAvailableScript: I'd simply call "pgbench_error(LOG" several
times, once per line.
I see building a string with a format (printfExpBuf..) and then calling
the pgbench_error function with just a "%s" format on the result as not
very elegant, because the second format is somehow hacked around.
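The pgbench_log_va suggestion above could look like the following sketch; names, levels, and the exit policy are illustrative here, not the patch's actual code:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum { PG_LOG, PG_FATAL } ErrorLevel;

/* Shared worker: formats and prints once for every entry point,
 * so neither caller needs a PQExpBuffer. Returns chars written. */
static int
pgbench_log_va(ErrorLevel level, const char *fmt, va_list ap)
{
    int n = vfprintf(stderr, fmt, ap);

    if (level == PG_FATAL)
        exit(1);
    return n;
}

/* General-purpose logging entry point. */
static int
pgbench_log(ErrorLevel level, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = pgbench_log_va(level, fmt, ap);
    va_end(ap);
    return n;
}

/* Error-only entry point; same worker, different name for call sites. */
static int
pgbench_error(ErrorLevel level, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = pgbench_log_va(level, fmt, ap);
    va_end(ap);
    return n;
}
```

Forwarding the va_list like this is the standard C idiom (cf. vfprintf vs fprintf), which is why no intermediate string buffer is needed.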
* bool client
I'm unconvinced by this added boolean just to switch the level on
encountered errors.
I'd suggest to let lookupCreateVariable, putVariable* as they are, call
pgbench_error with a level which does not stop the execution, and abort if
necessary from the callers with an "aborted because of putVariable/eval/...
error" message, as it was done before.
pgbench_error calls pgbench_error. Hmmm, why not.
--
Fabien.
On 09-08-2018 12:28, Fabien COELHO wrote:
Hello Marina,
Hello!
v10-0002-Pgbench-errors-use-a-separate-function-to-report.patch
- a patch for a separate error reporting function (this is used to
report client failures that do not cause an abort and this depends on
the level of debugging).

Patch applies cleanly, compiles, global & local make check ok.

:-)

This patch improves/homogenizes logging & error reporting in pgbench,
in preparation for another patch which will manage transaction
restarts in some cases.

However ISTM that it is not as necessary as the previous one, i.e. we
could do without it to get the desired feature, so I see it more as a
refactoring done "in passing", and I'm wondering whether it is really
worth it because it adds some new complexity, so I'm not sure of the
net benefit.
We discussed this starting with [1]:
IMO this patch is more controversial than the other ones.
It is not really related to the aim of the patch series, which could
do without, couldn't it?
I'd suggest that it should be an independent submission, unrelated to
the pgbench error management patch.

I suppose that this is related; because of my patch there may be a lot
of such code (see v7 in [1]):

- fprintf(stderr,
-         "malformed variable \"%s\" value: \"%s\"\n",
-         var->name, var->svalue);
+ if (debug_level >= DEBUG_FAILS)
+ {
+     fprintf(stderr,
+             "malformed variable \"%s\" value: \"%s\"\n",
+             var->name, var->svalue);
+ }

- if (debug)
+ if (debug_level >= DEBUG_ALL)
      fprintf(stderr, "client %d sending %s\n", st->id, sql);

I'm not sure that debug messages need to be kept after debug, if it is
about debugging pgbench itself. That is debatable.

AFAICS it is not about debugging pgbench itself, but about more detailed
information that can be used to understand what exactly happened during
its launch. In the case of errors this helps to distinguish between
failures or errors by type (including which limit for retries was
violated and how far it was exceeded for the serialization/deadlock
errors).

That's why it was suggested to make the error function which hides all
these things (see [2]):

There is a lot of checks like "if (debug_level >= DEBUG_FAILS)" with
corresponding fprintf(stderr..) I think it's time to do it like in the
main code, wrap with some function like log(level, msg).

Yep. I did not write that, but I agree with an "elog" suggestion to
switch

if (...) { fprintf(...); exit/abort/continue/... }

to a simpler:
elog(level, ...)
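Such an elog-style wrapper might be sketched as follows; the type, level, and variable names here are made up for illustration:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum { ELEVEL_DEBUG, ELEVEL_LOG, ELEVEL_FATAL } ELevel;

static ELevel log_min_level = ELEVEL_LOG;   /* set from command-line options */
static int    messages_emitted = 0;         /* for illustration only */

/* The level check and the exit-on-fatal behaviour live inside the
 * wrapper, so "if (debug_level >= DEBUG_FAILS) fprintf(...)" call
 * sites shrink to a single elog() line. */
static void
elog(ELevel level, const char *fmt, ...)
{
    va_list ap;

    if (level < log_min_level)
        return;                     /* filtered out, e.g. debug messages */

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    messages_emitted++;

    if (level == ELEVEL_FATAL)
        exit(1);
}
```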
Anyway, I still have quite a few comments/suggestions on this version.
Thank you very much for them!
* ErrorLevel
If ErrorLevel is used for things which are not errors, its name should
not include "Error"? Maybe "LogLevel"?
On the one hand, this sounds better for me too. On the other hand, will
not this be in some kind of conflict with error level codes in elog.h?..
/* Error level codes */
#define DEBUG5 10 /* Debugging messages, in categories of
* decreasing detail. */
#define DEBUG4 11
...
I'm at odds with the proposed levels. ISTM that pgbench internal
errors which warrant an immediate exit should be dubbed "FATAL",
Ok!
which
would leave the "ERROR" name for... errors, eg SQL errors. I'd suggest
to use an INFO level for the PGBENCH_DEBUG function, and to keep LOG
for main program messages, so that all use cases are separate. Or,
maybe the distinction between LOG/INFO is unclear so info is not
necessary.
The messages of the errors in SQL and meta commands are printed only if
the option --debug-fails is used, so I'm not sure that they should have a
higher error level than main program messages (ERROR vs LOG). About an
INFO level for the PGBENCH_DEBUG function - ISTM that some main program
messages such as "dropping old tables...\n" or "... tuples (%d%%) done
(elapsed %.2f s, remaining %.2f s)\n" can also use it.. About all use
cases being separate - in the current version the level LOG also
includes messages about aborts of the clients.
I'm unsure about the "log_min_messages" variable name, I'd suggest
"log_level".

I do not see the asserts on LOG >= log_min_messages as useful, because
the level can only be LOG or DEBUG anyway.
Ok!
This point also suggests that maybe "pgbench_error" is misnamed as well
(ok, I know I suggested it in place of ereport, but e stands for error
there), as it is called on errors, but also on other things. Maybe
"pgbench_log"? Or just simply "log" or "report", as it is really a
local function, which does not need a prefix? That would mean that
"pgbench_simple_error", which is indeed called on errors, could keep
its initial name "pgbench_error", and be called on errors.
About the name "log" - we already have the function doLog, so perhaps
the name "report" will be better. But as with ErrorLevel, won't this be
in some kind of conflict with ereport, which is also used for the
levels DEBUG... / LOG / INFO?
Alternatively, the debug/logging code could be left as it is (i.e.
direct print to stderr) and the function only called when there is
some kind of error, in which case it could be named with "error" in
its name (or elog/ereport...).
As I wrote in [2]:

because of my patch there may be a lot of such code (see v7 in [1]):

- fprintf(stderr,
-         "malformed variable \"%s\" value: \"%s\"\n",
-         var->name, var->svalue);
+ if (debug_level >= DEBUG_FAILS)
+ {
+     fprintf(stderr,
+             "malformed variable \"%s\" value: \"%s\"\n",
+             var->name, var->svalue);
+ }

- if (debug)
+ if (debug_level >= DEBUG_ALL)
      fprintf(stderr, "client %d sending %s\n", st->id, sql);

That's why it was suggested to make the error function which hides all
these things (see [2]):

There is a lot of checks like "if (debug_level >= DEBUG_FAILS)" with
corresponding fprintf(stderr..) I think it's time to do it like in the
main code, wrap with some function like log(level, msg).
And IIUC macros will not help in the absence of __VA_ARGS__.
* PQExpBuffer

I still do not see a positive value from importing PQExpBuffer
complexity and cost into pgbench, as the resulting code is not very
readable and it adds malloc/free cycles, so I'd try to avoid using
PQExpBuf as much as possible. ISTM that all usages could be avoided in
the patch, and most should be avoided even if ExpBuffer is imported
because it is really useful somewhere.

- to call pgbench_error from pgbench_simple_error, you can do a
pgbench_log_va(level, format, va_list) version called both from
pgbench_error & pgbench_simple_error.
- for PGBENCH_DEBUG function, do separate calls per type, the very
small partial code duplication is worth avoiding ExpBuf IMO.
- for doCustom debug: I'd just let the printf as it is, with a
comment, as it is really very internal stuff for debug. Or I'd just
snprintf a something in a static buffer.
- for syntax_error: it should terminate, so it should call
pgbench_error(FATAL, ...). Idem, I'd either keep the printf then call
pgbench_error(FATAL, "syntax error found\n") for a final message,
or snprintf in a static buffer.
- for listAvailableScript: I'd simply call "pgbench_error(LOG" several
times, once per line.

I see building a string with a format (printfExpBuf..) and then
calling the pgbench_error function with just a "%s" format on the
result as not very elegant, because the second format is somehow
hacked around.
Ok! About using a static buffer in doCustom debug or in syntax_error -
I'm not sure that this is always possible because ISTM that the variable
name can be quite large.
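For reference, snprintf into a static buffer degrades gracefully even when a variable name is long; a small sketch (format_var_error is a hypothetical helper, not part of the patch):

```c
#include <stdio.h>

/* snprintf cannot overrun the buffer: with a very long variable name
 * the message is truncated, not corrupted, and the return value is
 * the length the full message would have had (so truncation can be
 * detected if desired). */
static int
format_var_error(char *buf, size_t buflen, const char *name, const char *value)
{
    return snprintf(buf, buflen,
                    "malformed variable \"%s\" value: \"%s\"", name, value);
}
```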
* bool client

I'm unconvinced by this added boolean just to switch the level on
encountered errors.

I'd suggest to let lookupCreateVariable, putVariable* as they are,
call pgbench_error with a level which does not stop the execution, and
abort if necessary from the callers with an "aborted because of
putVariable/eval/... error" message, as it was done before.
There's one more problem: if this is a client failure, an error message
inside any of these functions should be printed at the level
DEBUG_FAILS; otherwise it should be printed at the level LOG. Or do you
suggest using the error level as an argument for these functions?
pgbench_error calls pgbench_error. Hmmm, why not.
[1]: /messages/by-id/alpine.DEB.2.21.1806100837380.3655@lancre
[2]: /messages/by-id/b692de21caaed13c59f31c06d0098488@postgrespro.ru
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Marina,
I'd suggest to let lookupCreateVariable, putVariable* as they are,
call pgbench_error with a level which does not stop the execution, and
abort if necessary from the callers with an "aborted because of
putVariable/eval/... error" message, as it was done before.

There's one more problem: if this is a client failure, an error message
inside any of these functions should be printed at the level DEBUG_FAILS;
otherwise it should be printed at the level LOG. Or do you suggest using the
error level as an argument for these functions?
No. I suggest that the called function does only one simple thing,
probably "DEBUG", and that the *caller* prints a message if it is unhappy
about the failure of the called function, as it is currently done. This
allows to provide context as well from the caller, eg "setting variable %s
failed while <some specific context>". The user can rerun under debug for
precision if they need it.

I'm still not over enthusiastic with these changes, and still think that
it should be an independent patch, not submitted together with the "retry
on error" feature.
--
Fabien.
On 10-08-2018 11:33, Fabien COELHO wrote:
Hello Marina,
I'd suggest to let lookupCreateVariable, putVariable* as they are,
call pgbench_error with a level which does not stop the execution, and
abort if necessary from the callers with an "aborted because of
putVariable/eval/... error" message, as it was done before.

There's one more problem: if this is a client failure, an error
message inside any of these functions should be printed at the level
DEBUG_FAILS; otherwise it should be printed at the level LOG. Or do
you suggest using the error level as an argument for these functions?

No. I suggest that the called function does only one simple thing,
probably "DEBUG", and that the *caller* prints a message if it is
unhappy about the failure of the called function, as it is currently
done. This allows to provide context as well from the caller, eg
"setting variable %s failed while <some specific context>". The user
can rerun under debug for precision if they need it.
Ok!
I'm still not over enthousiastic with these changes, and still think
that it should be an independent patch, not submitted together with
the "retry on error" feature.
In the next version I will put the error patch last, so it will be
possible to compare the "retry on error" feature with and without it,
and let the committer decide which is better)
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Thu, Aug 09, 2018 at 06:17:22PM +0300, Marina Polyakova wrote:
* ErrorLevel
If ErrorLevel is used for things which are not errors, its name should
not include "Error"? Maybe "LogLevel"?

On the one hand, this sounds better for me too. On the other hand, will
not this be in some kind of conflict with error level codes in elog.h?..
I think it shouldn't, because those error levels are backend levels.
pgbench is a client-side utility with its own code; it shares some code
with libpq and other utilities, but elog.h isn't one of them.
This point also suggests that maybe "pgbench_error" is misnamed as well
(ok, I know I suggested it in place of ereport, but e stands for error
there), as it is called on errors, but also on other things. Maybe
"pgbench_log"? Or just simply "log" or "report", as it is really a
local function, which does not need a prefix? That would mean that
"pgbench_simple_error", which is indeed called on errors, could keep
its initial name "pgbench_error", and be called on errors.

About the name "log" - we already have the function doLog, so perhaps the
name "report" will be better. But as with ErrorLevel, won't this be in
some kind of conflict with ereport, which is also used for the levels
DEBUG... / LOG / INFO?
+1 from me to keep initial name "pgbench_error". "pgbench_log" for new
function looks nice to me. I think it is better than just "log",
because "log" may conflict with natural logarithmic function (see "man 3
log").
pgbench_error calls pgbench_error. Hmmm, why not.
I agree with Fabien. Calling pgbench_error() inside pgbench_error()
could be dangerous. I think "fmt" checking could be removed, or we may
use Assert() or fprintf()+exit(1) at least.
--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On 10-08-2018 15:53, Arthur Zakirov wrote:
On Thu, Aug 09, 2018 at 06:17:22PM +0300, Marina Polyakova wrote:
* ErrorLevel
If ErrorLevel is used for things which are not errors, its name should
not include "Error"? Maybe "LogLevel"?

On the one hand, this sounds better for me too. On the other hand,
will not this be in some kind of conflict with error level codes in
elog.h?..

I think it shouldn't, because those error levels are backend levels.
pgbench is a client-side utility with its own code; it shares some code
with libpq and other utilities, but elog.h isn't one of them.
I agree with you on this :) I just meant that maybe it would be better
to call this group in the same way because they are used in general for
the same purpose?..
This point also suggests that maybe "pgbench_error" is misnamed as well
(ok, I know I suggested it in place of ereport, but e stands for error
there), as it is called on errors, but also on other things. Maybe
"pgbench_log"? Or just simply "log" or "report", as it is really a
local function, which does not need a prefix? That would mean that
"pgbench_simple_error", which is indeed called on errors, could keep
its initial name "pgbench_error", and be called on errors.

About the name "log" - we already have the function doLog, so perhaps
the name "report" will be better. But as with ErrorLevel, won't this
be in some kind of conflict with ereport, which is also used for the
levels DEBUG... / LOG / INFO?

+1 from me to keep initial name "pgbench_error". "pgbench_log" for new
function looks nice to me. I think it is better than just "log",
because "log" may conflict with natural logarithmic function (see "man
3 log").
Do you think that pgbench_log (or another name that speaks only about
logging) will look good, for example, with FATAL? Because this means
that the logging function also processes errors and calls exit(1) if
necessary..
pgbench_error calls pgbench_error. Hmmm, why not.
I agree with Fabien. Calling pgbench_error() inside pgbench_error()
could be dangerous. I think "fmt" checking could be removed, or we may
use Assert()
I would like not to use Assert in this case because IIUC they are mostly
used for testing.
or fprintf()+exit(1) at least.
Ok!
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Fri, Aug 10, 2018 at 04:46:04PM +0300, Marina Polyakova wrote:
+1 from me to keep initial name "pgbench_error". "pgbench_log" for new
function looks nice to me. I think it is better than just "log",
because "log" may conflict with natural logarithmic function (see "man 3
log").

Do you think that pgbench_log (or another whose name speaks only about
logging) will look good, for example, with FATAL? Because this means that
the logging function also processes errors and calls exit(1) if necessary..
Yes, why not. "_log" just means that you want to log some message with
the specified log level. Moreover those messages sometimes aren't errors:
pgbench_error(LOG, "starting vacuum...");
I agree with Fabien. Calling pgbench_error() inside pgbench_error()
could be dangerous. I think "fmt" checking could be removed, or we may
use Assert()

I would like not to use Assert in this case because IIUC they are mostly
used for testing.
I'd vote to remove this check at all. I don't see any place where it is
possible to call pgbench_error() passing empty "fmt".
--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On 10-08-2018 17:19, Arthur Zakirov wrote:
On Fri, Aug 10, 2018 at 04:46:04PM +0300, Marina Polyakova wrote:
+1 from me to keep initial name "pgbench_error". "pgbench_log" for new
function looks nice to me. I think it is better than just "log",
because "log" may conflict with natural logarithmic function (see "man 3
log").

Do you think that pgbench_log (or another whose name speaks only about
logging) will look good, for example, with FATAL? Because this means
that the logging function also processes errors and calls exit(1) if
necessary..

Yes, why not. "_log" just means that you want to log some message with
the specified log level. Moreover those messages sometimes aren't
errors:

pgbench_error(LOG, "starting vacuum...");
"pgbench_log" is already used as the default filename prefix for
transaction logging.
I agree with Fabien. Calling pgbench_error() inside pgbench_error()
could be dangerous. I think "fmt" checking could be removed, or we may
use Assert()

I would like not to use Assert in this case because IIUC they are
mostly used for testing.

I'd vote to remove this check at all. I don't see any place where it is
possible to call pgbench_error() passing empty "fmt".
pgbench_error(..., "%s", PQerrorMessage(con)); ?
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Marina,
v10-0003-Pgbench-errors-use-the-Variables-structure-for-c.patch
- a patch for the Variables structure (this is used to reset client variables
during the repeating of transactions after serialization/deadlock failures).
This patch adds an explicit structure to manage Variables, which is useful
to reset these on pgbench script retries, which is the purpose of the
whole patch series.
About part 3:
Patch applies cleanly,
* typo in comments: "varaibles"
* About enlargeVariables:
multiple INT_MAX error handling looks strange, especially as this code can
never be triggered because pgbench would be dead long before having
allocated INT_MAX variables. So I would not bother to add such checks.
ISTM that if something is amiss it will fail in pg_realloc anyway. Also I
do not like the ExpBuf stuff, as usual.
I'm not sure that the size_t cast here and there are useful for any
practical values likely to be encountered by pgbench.
The exponential allocation seems overkill. I'd simply add a constant
number of slots, with a simple rule:
/* reallocated with a margin */
if (max_vars < needed) max_vars = needed + 8;
So in the end the function should be much simpler.
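The simplified function could look like the following sketch; the struct layout is assumed, and plain realloc stands in for pgbench's pg_realloc (which exits on failure):

```c
#include <stdlib.h>

typedef struct Variable
{
    char *name;
    char *svalue;
} Variable;

typedef struct Variables
{
    Variable *vars;      /* array of variables */
    int       nvars;     /* number of variables in use */
    int       max_vars;  /* allocated size of the array */
} Variables;

/* Grow by a constant margin instead of exponentially: pgbench scripts
 * use a handful of variables, so a fixed slack of 8 slots is plenty
 * and keeps the function trivial. */
static void
enlargeVariables(Variables *v, int needed)
{
    if (v->max_vars < needed)
    {
        v->max_vars = needed + 8;
        v->vars = (Variable *) realloc(v->vars,
                                       v->max_vars * sizeof(Variable));
    }
}
```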
--
Fabien.
About part 3:
Patch applies cleanly,
I forgot: compiles, global & local "make check" are ok.
--
Fabien.
On 12-08-2018 12:14, Fabien COELHO wrote:
Hello Marina,
Hello, Fabien!
v10-0003-Pgbench-errors-use-the-Variables-structure-for-c.patch
- a patch for the Variables structure (this is used to reset client
variables during the repeating of transactions after
serialization/deadlock failures).

This patch adds an explicit structure to manage Variables, which is
useful to reset these on pgbench script retries, which is the purpose
of the whole patch series.

About part 3:
Patch applies cleanly,

On 12-08-2018 12:17, Fabien COELHO wrote:
About part 3:
Patch applies cleanly,
I forgot: compiles, global & local "make check" are ok.
I'm glad to hear it :-)
* typo in comments: "varaibles"
I'm sorry, I'll fix it.
* About enlargeVariables:
multiple INT_MAX error handling looks strange, especially as this code
can never be triggered because pgbench would be dead long before
having allocated INT_MAX variables. So I would not bother to add such
checks.
...
I'm not sure that the size_t cast here and there are useful for any
practical values likely to be encountered by pgbench.
Looking at the code of the functions, for example, ParseScript and
psql_scan_setup, where the integer variable is used for the size of the
entire script - ISTM that you are right.. Therefore size_t casts will
also be removed.
ISTM that if something is amiss it will fail in pg_realloc anyway.

IIUC, if physical RAM is not enough, this may depend on the size of the
swap.
Also I do not like the ExpBuf stuff, as usual.
The exponential allocation seems overkill. I'd simply add a constant
number of slots, with a simple rule:

/* reallocated with a margin */
if (max_vars < needed) max_vars = needed + 8;

So in the end the function should be much simpler.
Ok!
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Marina,
v10-0004-Pgbench-errors-and-serialization-deadlock-retrie.patch
- the main patch for handling client errors and repetition of transactions
with serialization/deadlock failures (see the detailed description in the
file).
Patch applies cleanly.
It allows retrying a script (considered as a transaction) on serializable
and deadlock errors, which is a very interesting extension but also
impacts pgbench significantly.
I'm waiting for the feature to be right before checking in full the
documentation and tests. There are still some issues to resolve before
checking that.
Anyway, tests look reasonable. Taking advantage of transaction control
from PL/pgsql is a good use of this new feature.
A few comments about the doc.
According to the documentation, the feature is triggered by --max-tries and
--latency-limit. I disagree with the latter, because it means that having
a latency limit without retrying is not supported anymore.
Maybe you can allow an "unlimited" max-tries, say with special value zero,
and the latency limit does its job if set, over all tries.
Doc: "error in meta commands" -> "meta command errors", for homogeneity with
other cases?
Detailed -r report. I understand from the doc that the retry number on the
detailed per-statement report is to identify at what point errors occur?
Probably this is more or less always at the same point in a given script,
so that the most interesting feature is to report the number of retries at the
script level.
Doc: "never occur.." -> "never occur", or eventually "...".
Doc: "Directly client errors" -> "Direct client errors".
I'm still in favor of asserting that the sql connection is idle (no tx in
progress) at the beginning and/or end of a script, and report a user error
if not, instead of writing complex caveats.
If someone has a use-case for that, then maybe it can be changed, but I
cannot see any in a benchmarking context, and I can see how easy it is
to have a buggy script with this allowed.
I do not think that the RETRIES_ENABLED macro is a good thing. I'd suggest
to write the condition four times.
ISTM that "skipped" transactions are NOT "successful", so there is a problem
with comments. I believe that your formulas are probably right; it has more to do
with what is "success". For the cnt decomposition, ISTM that "other transactions"
are really "directly successful transactions".
I'd suggest to put "ANOTHER_SQL_FAILURE" as the last option, otherwise "another"
does not make sense yet. I'd suggest to name it "OTHER_SQL_FAILURE".
In TState, field "uint32 retries": maybe it would be simpler to count "tries",
which can be compared directly to max tries set in the option?
ErrorLevel: I have already commented on it in the review of 10.2. I'm not sure of
the LOG -> DEBUG_FAIL changes. I do not understand the name "DEBUG_FAIL", as it
is not related to debug; they just seem to be internal errors. META_ERROR maybe?
inTransactionBlock: I disagree with any function other than doCustom changing
the client state, because it makes understanding the state machine harder. There
is already one exception to that (threadRun) that I wish to remove. All state
changes must be performed explicitly in doCustom.
The automaton skips to FAILURE on every possible error. I'm wondering whether
it could do so only on SQL errors, because other fails will lead to ABORTED
anyway? If there is no good reason to skip to FAILURE from some errors, I'd
suggest to keep the previous behavior. Maybe the good reason is to do some
counting, but this means that on eg metacommand errors now the script would
loop over instead of aborting, which does not look like a desirable change
of behavior.
PQexec("ROLLBACK"): you are inserting a synchronous command, for which the
thread will have to wait for the result, in a middle of a framework which
takes great care to use only asynchronous stuff so that one thread can
manage several clients efficiently. You cannot call PQexec there.
From where I sit, I'd suggest to sendQuery("ROLLBACK"), then switch to
a new state CSTATE_WAIT_ABORT_RESULT which would be similar to
CSTATE_WAIT_RESULT, but on success would skip to RETRY or ABORT instead
of proceeding to the next command.
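The suggested transition could be sketched like this; the state names follow the suggestion above, while the function names and the reduced state set are illustrative (the real pgbench machine has many more states, and the libpq call is only indicated in a comment):

```c
/* Minimal sketch of the proposed state machine extension. */
typedef enum
{
    CSTATE_FAILURE,            /* an SQL error was just detected */
    CSTATE_WAIT_ABORT_RESULT,  /* async ROLLBACK sent, awaiting its result */
    CSTATE_RETRY,              /* rollback done, retry the transaction */
    CSTATE_ABORTED             /* client gives up */
} ClientState;

/* On failure, send ROLLBACK asynchronously (PQsendQuery in the real
 * code) and go wait for the result instead of blocking in PQexec, so
 * one thread can keep serving its other clients. */
static ClientState
on_failure(void)
{
    /* PQsendQuery(st->con, "ROLLBACK"); */
    return CSTATE_WAIT_ABORT_RESULT;
}

/* When the ROLLBACK result arrives, decide between retry and abort
 * instead of proceeding to the next command. */
static ClientState
on_abort_result(int rollback_ok, int can_retry)
{
    if (!rollback_ok)
        return CSTATE_ABORTED;
    return can_retry ? CSTATE_RETRY : CSTATE_ABORTED;
}
```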
ISTM that it would be more logical to only get into RETRY if there is a retry,
i.e. move the test RETRY/ABORT in FAILURE. For that, instead of "canRetry",
maybe you want "doRetry", which tells that a retry is possible (the error
is serializable or deadlock) and that the current parameters allow it
(timeout, max retries).
* Minor C style comments:
if / else if / else if ... on *_FAILURE: I'd suggest a switch.
The following line removal does not seem useful, I'd have kept it:
stats->cnt++;
-
if (skipped)
copyVariables: I'm not convinced that source_vars & nvars variables are that
useful.
memcpy(&(st->retry_state.random_state), &(st->random_state), sizeof(RandomState));
Is there a problem with "st->retry_state.random_state = st->random_state;"
instead of memcpy? ISTM that simple assignments work in C. Idem in the reverse
copy under RETRY.
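For example, with a stand-in RandomState (the real struct wraps different members, but the point holds for any struct, arrays included):

```c
typedef struct RandomState
{
    unsigned short xseed[3];   /* stand-in for pgbench's pg_erand48 state */
} RandomState;

/* Plain struct assignment copies every member, arrays included, so
 * "dst = src" does the same job as memcpy over sizeof(RandomState). */
static RandomState
copy_random_state(const RandomState *src)
{
    RandomState dst = *src;    /* simple assignment, no memcpy needed */

    return dst;
}
```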
if (!copyVariables(&st->retry_state.variables, &st->variables)) {
pgbench_error(LOG, "client %d aborted when preparing to execute a transaction\n", st->id);
The message could be more precise, eg "client %d failed while copying
variables", unless copyVariables already printed a message. As this is really
an internal error from pgbench, I'd rather do a FATAL (direct exit) there.
ISTM that the only possible failure is OOM here, and pgbench is in a very bad
shape if it gets into that.
commandFailed: I'm not thrilled by the added boolean, which is partially
redundant with the second argument.
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ {
+ accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+ st->failure_status, st->retries);
+ }
}
I do not see the point of changing the style here.
--
Fabien.
On 15-08-2018 11:50, Fabien COELHO wrote:
Hello Marina,
Hello!
v10-0004-Pgbench-errors-and-serialization-deadlock-retrie.patch
- the main patch for handling client errors and repetition of
transactions with serialization/deadlock failures (see the detailed
description in the file).

Patch applies cleanly.
It allows retrying a script (considered as a transaction) on
serializable and deadlock errors, which is a very interesting
extension but also impacts pgbench significantly.

I'm waiting for the feature to be right before checking in full the
documentation and tests. There are still some issues to resolve before
checking that.

Anyway, tests look reasonable. Taking advantage of transaction
control from PL/pgsql is a good use of this new feature.
:-)
A few comments about the doc.
According to the documentation, the feature is triggered by --max-tries
and --latency-limit. I disagree with the latter, because it means that
having a latency limit without retrying is not supported anymore.

Maybe you can allow an "unlimited" max-tries, say with special value
zero, and the latency limit does its job if set, over all tries.

Doc: "error in meta commands" -> "meta command errors", for homogeneity
with other cases?
...
Doc: "never occur.." -> "never occur", or eventually "...".
Doc: "Directly client errors" -> "Direct client errors".
...
inTransactionBlock: I disagree with any function other than doCustom
changing the client state, because it makes understanding the state
machine harder. There is already one exception to that (threadRun) that
I wish to remove. All state changes must be performed explicitly in
doCustom.
...
PQexec("ROLLBACK"): you are inserting a synchronous command, for which
the thread will have to wait for the result, in a middle of a framework
which takes great care to use only asynchronous stuff so that one
thread can manage several clients efficiently. You cannot call PQexec
there.
From where I sit, I'd suggest to sendQuery("ROLLBACK"), then switch to
a new state CSTATE_WAIT_ABORT_RESULT which would be similar to
CSTATE_WAIT_RESULT, but on success would skip to RETRY or ABORT instead
of proceeding to the next command.
...
memcpy(&(st->retry_state.random_state), &(st->random_state),
sizeof(RandomState));

Is there a problem with "st->retry_state.random_state =
st->random_state;" instead of memcpy? ISTM that simple assignments
work in C. Idem in the reverse copy under RETRY.
Thank you, I'll fix this.
Detailed -r report. I understand from the doc that the retry number on
the detailed per-statement report is to identify at what point errors
occur? Probably this is more or less always at the same point in a
given script, so that the most interesting feature is to report the
number of retries at the script level.
This may depend on various factors.. for example:
transaction type: pgbench_test_serialization.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
duration: 10 s
number of transactions actually processed: 266
number of errors: 10 (3.623%)
number of serialization errors: 10 (3.623%)
number of retried: 75 (27.174%)
number of retries: 75
maximum number of tries: 2
latency average = 72.734 ms (including errors)
tps = 26.501162 (including connections establishing)
tps = 26.515082 (excluding connections establishing)
statement latencies in milliseconds, errors and retries:
0.012 0 0 \set delta random(-5000, 5000)
0.001 0 0 \set x1 random(1, 100000)
0.001 0 0 \set x3 random(1, 2)
0.001 0 0 \set x2 random(1, 1)
19.837 0 0 UPDATE xy1 SET y = y + :delta
WHERE x = :x1;
21.239 5 36 UPDATE xy3 SET y = y + :delta
WHERE x = :x3;
21.360 5 39 UPDATE xy2 SET y = y + :delta
WHERE x = :x2;
And you can always get the number of retries at the script level from
the main report (if only one script is used) or from the report for each
script (if multiple scripts are used).
I'm still in favor of asserting that the sql connection is idle (no tx
in progress) at the beginning and/or end of a script, and report a user
error if not, instead of writing complex caveats.

If someone has a use-case for that, then maybe it can be changed, but I
cannot see any in a benchmarking context, and I can see how easy it is
to have a buggy script with this allowed.

I do not think that the RETRIES_ENABLED macro is a good thing. I'd
suggest to write the condition four times.
Ok!
ISTM that "skipped" transactions are NOT "successful", so there is a
problem with comments. I believe that your formulas are probably right;
it has more to do with what is "success". For the cnt decomposition,
ISTM that "other transactions" are really "directly successful
transactions".
I agree with you, but I also think that skipped transactions should not
be considered errors. So we can write something like this:
All the transactions are divided into several types depending on their
execution. Firstly, they can be divided into transactions that we
started to execute, and transactions which were skipped (it was too late
to execute them). Secondly, running transactions fall into 2 main types:
is there any command that got a failure during the last execution of the
transaction script or not? Thus
the number of all transactions =
  skipped (it was too late to execute them) +
  cnt (the number of successful transactions) +
  ecnt (the number of failed transactions).
A successful transaction can have several unsuccessful tries before a
successful run. Thus
cnt (the number of successful transactions) =
  retried (they got a serialization or deadlock failure(s), but were
  successfully retried from the very beginning) +
  directly successful transactions (they were successfully completed on
  the first try).
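This accounting can be sketched in C; the struct and function names below are illustrative, not pgbench's actual ones:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counters mirroring the decomposition described above. */
typedef struct
{
	int64_t		skipped;	/* too late to execute under --rate/--latency-limit */
	int64_t		cnt;		/* successful transactions */
	int64_t		ecnt;		/* failed transactions */
	int64_t		retried;	/* successful only after at least one retry */
} TxCounters;

/* total transactions = skipped + cnt + ecnt */
static int64_t
total_transactions(const TxCounters *c)
{
	return c->skipped + c->cnt + c->ecnt;
}

/* cnt = retried + directly successful, so directly successful = cnt - retried */
static int64_t
directly_successful(const TxCounters *c)
{
	return c->cnt - c->retried;
}
```

With the numbers from the sample report above (266 successful, 10 errors, 75 retried, and assuming 5 skipped), the identities hold.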
I'd suggest to put "ANOTHER_SQL_FAILURE" as the last option, otherwise
"another"
does not make sense yet.
Maybe firstly put a general group, and then special cases?...
I'd suggest to name it "OTHER_SQL_FAILURE".
Ok!
In TState, field "uint32 retries": maybe it would be simpler to count
"tries",
which can be compared directly to max tries set in the option?
If you mean retries in CState - on the one hand, yes, on the other hand,
statistics always use the number of retries...
ErrorLevel: I have already commented on this in my review of 10.2. I'm not sure of the LOG -> DEBUG_FAIL changes. I do not understand the name "DEBUG_FAIL", as it is not related to debug; they just seem to be internal errors. META_ERROR maybe?
As I wrote to you in [1]:

I'm at odds with the proposed levels. ISTM that pgbench internal errors which warrant an immediate exit should be dubbed "FATAL", which would leave the "ERROR" name for... errors, eg SQL errors.

Ok!
...The messages of the errors in SQL and meta commands are printed only if the option --debug-fails is used, so I'm not sure that they should have a higher error level than the main program messages (ERROR vs LOG).
Perhaps we can rename the levels DEBUG_FAIL and LOG to LOG and
LOG_PGBENCH respectively. In this case the client error messages do not
use debug error levels and the term "logging" is already used for
transaction/aggregation logging... Therefore perhaps we can also combine
the options --errors-detailed and --debug-fails into the option
--fails-detailed=none|groups|all_messages. Here --fails-detailed=groups
can be used to group errors in reports or logs by basic types.
--fails-detailed=all_messages can add to this all error messages in the
SQL/meta commands, and messages for processing the failed transaction
(its end/retry).
The automaton skips to FAILURE on every possible error. I'm wondering whether it could do so only on SQL errors, because other fails will lead to ABORTED anyway? If there is no good reason to skip to FAILURE from some errors, I'd suggest to keep the previous behavior. Maybe the good reason is to do some counting, but this means that on eg metacommand errors now the script would loop over instead of aborting, which does not look like a desirable change of behavior.
Even in the case of meta command errors we must prepare for
CSTATE_END_TX and the execution of the next script: if necessary, clear
the conditional stack and rollback the current transaction block.
ISTM that it would be more logical to only get into RETRY if there is a retry, i.e. move the RETRY/ABORT test into FAILURE. For that, instead of "canRetry", maybe you want "doRetry", which tells that a retry is possible (the error is a serialization or deadlock failure) and that the current parameters allow it (timeout, max retries).

* Minor C style comments:
if / else if / else if ... on *_FAILURE: I'd suggest a switch.
The following line removal does not seem useful, I'd have kept it:

  stats->cnt++;
-
  if (skipped)

copyVariables: I'm not convinced that the source_vars & nvars variables are that useful.

  if (!copyVariables(&st->retry_state.variables, &st->variables)) {
      pgbench_error(LOG, "client %d aborted when preparing to execute a transaction\n", st->id);

The message could be more precise, eg "client %d failed while copying variables", unless copyVariables already printed a message. As this is really an internal error from pgbench, I'd rather do a FATAL (direct exit) there.
ISTM that the only possible failure is OOM here, and pgbench is in a
very bad
shape if it gets into that.
Ok!
commandFailed: I'm not thrilled by the added boolean, which is
partially
redundant with the second argument.
Do you mean that it is partially redundant with the argument "cmd" and that, for example, meta command errors never cause the client to abort?
  if (per_script_stats)
-     accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ {
+     accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+                st->failure_status, st->retries);
+ }
  }

I do not see the point of changing the style here.

If in such cases one command is placed on several lines, ISTM that the code is more understandable if curly brackets are used...
[1]: /messages/by-id/fcc2512cdc9e6bc49d3b489181f454da@postgrespro.ru
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Marina,
Detailed -r report. I understand from the doc that the retry number on the detailed per-statement report is to identify at what point errors occur? Probably this is more or less always at the same point on a given script, so that the most interesting feature is to report the number of retries at the script level.

This may depend on various factors... for example:
[...]
21.239 5 36 UPDATE xy3 SET y = y + :delta WHERE x
= :x3;
21.360 5 39 UPDATE xy2 SET y = y + :delta WHERE x
= :x2;
Ok, not always the same point, and you confirm that it identifies where
the error is raised which leads to a retry.
And you can always get the number of retries at the script level from the
main report (if only one script is used) or from the report for each script
(if multiple scripts are used).
Ok.
ISTM that "skipped" transactions are NOT "successful", so there is a problem with the comments. I believe that your formulas are probably right; it has more to do with what counts as "success". For the cnt decomposition, ISTM that "other transactions" are really "directly successful transactions".

I agree with you, but I also think that skipped transactions should not be considered errors.
I'm ok with having a special category for them in the explanations, which
is neither success nor error.
So we can write something like this:
All the transactions are divided into several types depending on their
execution. Firstly, they can be divided into transactions that we started to
execute, and transactions which were skipped (it was too late to execute
them). Secondly, running transactions fall into 2 main types: is there any
command that got a failure during the last execution of the transaction
script or not? Thus
Here is an attempt at having a more precise and shorter version, not sure
it is much better than yours, though:
"""
Transactions are counted depending on their execution and outcome. First
a transaction may have started or not: skipped transactions occur under
--rate and --latency-limit when the client is too late to execute them.
Secondly, a started transaction may ultimately succeed or fail on some
error, possibly after some retries when --max-tries is not one. Thus
"""
the number of all transactions =
  skipped (it was too late to execute them) +
  cnt (the number of successful transactions) +
  ecnt (the number of failed transactions).

A successful transaction can have several unsuccessful tries before a successful run. Thus

cnt (the number of successful transactions) =
  retried (they got a serialization or deadlock failure(s), but were successfully retried from the very beginning) +
  directly successful transactions (they were successfully completed on the first try).
The above description is clearer to me.
I'd suggest to put "ANOTHER_SQL_FAILURE" as the last option, otherwise "another" does not make sense yet.

Maybe firstly put a general group, and then special cases?...
I understand it more as a catch all default "none of the above" case.
In TState, field "uint32 retries": maybe it would be simpler to count "tries", which can be compared directly to the max tries set in the option?

If you mean retries in CState - on the one hand, yes, on the other hand, statistics always use the number of retries...
Ok.
The automaton skips to FAILURE on every possible error. I'm wondering whether it could do so only on SQL errors, because other fails will lead to ABORTED anyway? If there is no good reason to skip to FAILURE from some errors, I'd suggest to keep the previous behavior. Maybe the good reason is to do some counting, but this means that on eg metacommand errors now the script would loop over instead of aborting, which does not look like a desirable change of behavior.

Even in the case of meta command errors we must prepare for CSTATE_END_TX and the execution of the next script: if necessary, clear the conditional stack and roll back the current transaction block.
Seems ok.
commandFailed: I'm not thrilled by the added boolean, which is partially redundant with the second argument.

Do you mean that it is partially redundant with the argument "cmd" and that, for example, meta command errors never cause the client to abort?
Yes. And also I'm not sure we should want this boolean at all.
[...]
If in such cases one command is placed on several lines, ISTM that the code
is more understandable if curly brackets are used...
Hmmm. Such basic style changes are avoided because they break
backpatching, so we try to avoid gratuitous changes unless there is a
strong added value, which does not seem to be the case here.
--
Fabien.
On 17-08-2018 10:49, Fabien COELHO wrote:
Hello Marina,
Detailed -r report. I understand from the doc that the retry number on the detailed per-statement report is to identify at what point errors occur? Probably this is more or less always at the same point on a given script, so that the most interesting feature is to report the number of retries at the script level.

This may depend on various factors... for example:
[...]
21.239 5 36 UPDATE xy3 SET y = y + :delta WHERE x = :x3;
21.360 5 39 UPDATE xy2 SET y = y + :delta WHERE x = :x2;

Ok, not always the same point, and you confirm that it identifies where the error is raised which leads to a retry.
Yes, I confirm this. I'll try to write more clearly about this in the
documentation...
So we can write something like this:

All the transactions are divided into several types depending on their execution. Firstly, they can be divided into transactions that we started to execute, and transactions which were skipped (it was too late to execute them). Secondly, running transactions fall into 2 main types: is there any command that got a failure during the last execution of the transaction script or not? Thus

Here is an attempt at having a more precise and shorter version, not sure it is much better than yours, though:

"""
Transactions are counted depending on their execution and outcome. First a transaction may have started or not: skipped transactions occur under --rate and --latency-limit when the client is too late to execute them. Secondly, a started transaction may ultimately succeed or fail on some error, possibly after some retries when --max-tries is not one. Thus
"""
Thank you!
I'd suggest to put "ANOTHER_SQL_FAILURE" as the last option, otherwise "another" does not make sense yet.

Maybe firstly put a general group, and then special cases?...
I understand it more as a catch all default "none of the above" case.
Ok!
commandFailed: I'm not thrilled by the added boolean, which is partially redundant with the second argument.

Do you mean that it is partially redundant with the argument "cmd" and that, for example, meta command errors never cause the client to abort?

Yes. And also I'm not sure we should want this boolean at all.
Perhaps we can use a separate function to print the messages about
client's abortion, something like this (it is assumed that all abortions
happen when processing SQL commands):
static void
clientAborted(CState *st, const char *message)
{
	pgbench_error(...,
	              "client %d aborted in command %d (SQL) of script %d; %s\n",
	              st->id, st->command, st->use_file, message);
}
Or perhaps we can use a more detailed failure status so for each type of
failure we always know the command name (argument "cmd") and whether the
client is aborted. Something like this (but in comparison with the first
variant ISTM overly complicated):
/*
 * For the failures during script execution.
 */
typedef enum FailureStatus
{
	NO_FAILURE = 0,

	/*
	 * Failures in meta commands. In these cases the failed transaction is
	 * terminated.
	 */
	META_SET_FAILURE,
	META_SETSHELL_FAILURE,
	META_SHELL_FAILURE,
	META_SLEEP_FAILURE,
	META_IF_FAILURE,
	META_ELIF_FAILURE,

	/*
	 * Failures in SQL commands. In cases of serialization/deadlock failures a
	 * failed transaction is re-executed from the very beginning if possible;
	 * otherwise the failed transaction is terminated.
	 */
	SERIALIZATION_FAILURE,
	DEADLOCK_FAILURE,
	OTHER_SQL_FAILURE,			/* other failures in SQL commands that are not
								 * listed by themselves above */

	/*
	 * Failures while processing SQL commands. In this case the client is
	 * aborted.
	 */
	SQL_CONNECTION_FAILURE
} FailureStatus;
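As a sketch of how such a status could drive the decisions described in the comments (retryable vs. terminating vs. client-aborting), with hypothetical helper names; the enum values are copied from the proposal above:

```c
#include <assert.h>
#include <stdbool.h>

/* Status values as proposed in the message above. */
typedef enum FailureStatus
{
	NO_FAILURE = 0,
	META_SET_FAILURE,
	META_SETSHELL_FAILURE,
	META_SHELL_FAILURE,
	META_SLEEP_FAILURE,
	META_IF_FAILURE,
	META_ELIF_FAILURE,
	SERIALIZATION_FAILURE,
	DEADLOCK_FAILURE,
	OTHER_SQL_FAILURE,
	SQL_CONNECTION_FAILURE
} FailureStatus;

/* Hypothetical helper: only serialization/deadlock failures may be retried. */
static bool
canRetryFailure(FailureStatus fs)
{
	switch (fs)
	{
		case SERIALIZATION_FAILURE:
		case DEADLOCK_FAILURE:
			return true;
		default:
			return false;
	}
}

/* Hypothetical helper: only connection-level failures abort the client. */
static bool
abortsClient(FailureStatus fs)
{
	return fs == SQL_CONNECTION_FAILURE;
}
```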
[...]

If in such cases one command is placed on several lines, ISTM that the code is more understandable if curly brackets are used...

Hmmm. Such basic style changes are avoided because they break backpatching, so we try to avoid gratuitous changes unless there is a strong added value, which does not seem to be the case here.
Ok!
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
commandFailed: I'm not thrilled by the added boolean, which is partially redundant with the second argument.

Do you mean that it is partially redundant with the argument "cmd" and that, for example, meta command errors never cause the client to abort?

Yes. And also I'm not sure we should want this boolean at all.
Perhaps we can use a separate function to print the messages about the client's abortion, something like this (it is assumed that all abortions happen when processing SQL commands):

static void
clientAborted(CState *st, const char *message)
Possibly.
Or perhaps we can use a more detailed failure status so for each type of
failure we always know the command name (argument "cmd") and whether the
client is aborted. Something like this (but in comparison with the first
variant ISTM overly complicated):
I agree. I do not think that it would be useful given that the same thing is done on all meta-command error cases in the end.
--
Fabien.
On 17-08-2018 14:04, Fabien COELHO wrote:
...
Or perhaps we can use a more detailed failure status so for each type of failure we always know the command name (argument "cmd") and whether the client is aborted. Something like this (but in comparison with the first variant ISTM overly complicated):

I agree. I do not think that it would be useful given that the same thing is done on all meta-command error cases in the end.
Ok!
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello, hackers!
This is the eleventh version of the patch for error handling and
retrying of transactions with serialization/deadlock failures in pgbench
(based on the commit 14e9b2a752efaa427ce1b400b9aaa5a636898a04) thanks to
the comments of Fabien Coelho and Arthur Zakirov in this thread.
v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
- a patch for the RandomState structure (this is used to reset a
client's random seed during the repeating of transactions after
serialization/deadlock failures).
v11-0002-Pgbench-errors-use-the-Variables-structure-for-c.patch
- a patch for the Variables structure (this is used to reset client
variables during the repeating of transactions after
serialization/deadlock failures).
v11-0003-Pgbench-errors-and-serialization-deadlock-retrie.patch
- the main patch for handling client errors and repetition of
transactions with serialization/deadlock failures (see the detailed
description in the file).
v11-0004-Pgbench-errors-use-a-separate-function-to-report.patch
- a patch for a separate error reporting function (this is used to report client failures that do not cause an abort, and this depends on the level of debugging). Although this is an attempt to fix duplicated code for debug messages (see [1]), it may seem to be mostly refactoring and therefore not very necessary for this set of patches (see [2], [3]).
Any suggestions are welcome!
[1]: /messages/by-id/20180405180807.0bc1114f@wp.localdomain
There are a lot of checks like "if (debug_level >= DEBUG_FAILS)" with a corresponding fprintf(stderr...). I think it's time to do it like in the main code, wrap with some function like log(level, msg).
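A minimal sketch of such a level-gated wrapper; the names and levels here are illustrative and do not match the patch's actual ErrorLevel enum:

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative levels; the patch's actual ErrorLevel enum differs. */
typedef enum { PG_DEBUG, PG_LOG, PG_ERROR, PG_FATAL } ErrorLevel;

static ErrorLevel log_level = PG_LOG;	/* minimum level that is printed */

/*
 * Level-gated wrapper around fprintf(stderr, ...); FATAL exits.
 * Returns true if the message was actually emitted (useful for testing).
 */
static bool
pgbench_error(ErrorLevel level, const char *fmt, ...)
{
	bool		emitted = false;

	if (level >= log_level)
	{
		va_list		ap;

		va_start(ap, fmt);
		vfprintf(stderr, fmt, ap);
		va_end(ap);
		emitted = true;
	}
	if (level == PG_FATAL)
		exit(1);
	return emitted;
}
```

Callers then replace the repeated "if (debug_level >= DEBUG_FAILS) fprintf(stderr, ...)" pattern with a single pgbench_error(level, ...) call.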
[2]: /messages/by-id/alpine.DEB.2.21.1808071823540.13466@lancre
However ISTM that it is not as necessary as the previous one, i.e. we
could do without it to get the desired feature, so I see it more as a
refactoring done "in passing", and I'm wondering whether it is
really worth it because it adds some new complexity, so I'm not sure of
the net benefit.
[3]: /messages/by-id/alpine.DEB.2.21.1808101027390.9120@lancre
I'm still not over-enthusiastic with these changes, and still think that it should be an independent patch, not submitted together with the "retry on error" feature.
All that was fixed from the previous version:
[4]: /messages/by-id/alpine.DEB.2.21.1808071823540.13466@lancre
I'm at odds with the proposed levels. ISTM that pgbench internal errors which warrant an immediate exit should be dubbed "FATAL".

I'm unsure about the "log_min_messages" variable name, I'd suggest "log_level".

I do not see the asserts on LOG >= log_min_messages as useful, because the level can only be LOG or DEBUG anyway.
* PQExpBuffer

I still do not see a positive value from importing PQExpBuffer complexity and cost into pgbench, as the resulting code is not very readable and it adds malloc/free cycles, so I'd try to avoid using PQExpBuf as much as possible. ISTM that all usages could be avoided in the patch, and most should be avoided even if ExpBuffer is imported because it is really useful somewhere.

- to call pgbench_error from pgbench_simple_error, you can do a pgbench_log_va(level, format, va_list) version called both from pgbench_error & pgbench_simple_error.

- for the PGBENCH_DEBUG function, do separate calls per type, the very small partial code duplication is worth avoiding ExpBuf IMO.

- for doCustom debug: I'd just let the printf as it is, with a comment, as it is really very internal stuff for debug. Or I'd just snprintf a something in a static buffer....

- for listAvailableScript: I'd simply call "pgbench_error(LOG" several times, once per line.

I see building a string with a format (printfExpBuf..) and then calling the pgbench_error function with just a "%s" format on the result as not very elegant, because the second format is somehow hacked around.
[5]: /messages/by-id/alpine.DEB.2.21.1808101027390.9120@lancre
I suggest that the called function does only one simple thing, probably "DEBUG", and that the *caller* prints a message if it is unhappy about the failure of the called function, as it is currently done. This allows to provide context as well from the caller, eg "setting variable %s failed while <some specific context>". The user can rerun under debug for precision if they need it.
[6]: /messages/by-id/20180810125327.GA2374@zakirov.localdomain
I agree with Fabien. Calling pgbench_error() inside pgbench_error()
could be dangerous. I think "fmt" checking could be removed, or we may
use Assert() or fprintf()+exit(1) at least.
[7]: /messages/by-id/alpine.DEB.2.21.1808121057540.6189@lancre
* typo in comments: "varaibles"

* About enlargeVariables:

multiple INT_MAX error handling looks strange, especially as this code can never be triggered because pgbench would be dead long before having allocated INT_MAX variables. So I would not bother to add such checks. I'm not sure that the size_t casts here and there are useful for any practical values likely to be encountered by pgbench.

The exponential allocation seems overkill. I'd simply add a constant number of slots, with a simple rule:

/* reallocated with a margin */
if (max_vars < needed) max_vars = needed + 8;
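A self-contained sketch of that constant-margin rule; the struct and function names are illustrative, not the patch's actual code:

```c
#include <assert.h>
#include <stdlib.h>

#define MARGIN 8

/* Hypothetical variable-slot array, grown by a constant margin rather than
 * exponentially, per the suggestion above. */
typedef struct
{
	int		 max_vars;		/* allocated slots */
	int		 nvars;			/* used slots */
	void   **vars;
} Variables;

static void
enlargeVariables(Variables *v, int needed)
{
	if (v->max_vars < needed)
	{
		void	  **tmp;

		v->max_vars = needed + MARGIN;	/* reallocated with a margin */
		tmp = realloc(v->vars, v->max_vars * sizeof(void *));
		if (tmp == NULL)
			exit(1);			/* OOM: nothing sensible to do in a sketch */
		v->vars = tmp;
	}
}
```

The margin amortizes reallocations without ever over-allocating by more than a small constant.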
[8]: /messages/by-id/alpine.DEB.2.21.1808151046090.30050@lancre
A few comments about the doc.

According to the documentation, the feature is triggered by --max-tries and --latency-limit. I disagree with the latter, because it means that having a latency limit without retrying is not supported anymore.

Maybe you can allow an "unlimited" max-tries, say with special value zero, and the latency limit does its job if set, over all tries.

Doc: "error in meta commands" -> "meta command errors", for homogeneity with other cases?

Doc: "never occur.." -> "never occur", or eventually "...".

Doc: "Directly client errors" -> "Direct client errors".
I'm still in favor of asserting that the sql connection is idle (no tx in progress) at the beginning and/or end of a script, and report a user error if not, instead of writing complex caveats.

I do not think that the RETRIES_ENABLED macro is a good thing. I'd suggest to write the condition four times.

ISTM that "skipped" transactions are NOT "successful", so there is a problem with the comments. I believe that your formulas are probably right; it has more to do with what counts as "success". For the cnt decomposition, ISTM that "other transactions" are really "directly successful transactions".

I'd suggest to put "ANOTHER_SQL_FAILURE" as the last option, otherwise "another" does not make sense yet. I'd suggest to name it "OTHER_SQL_FAILURE".
I'm not sure of the LOG -> DEBUG_FAIL changes. I do not understand the name "DEBUG_FAIL", as it is not related to debug; they just seem to be internal errors.
inTransactionBlock: I disagree with any function other than doCustom changing the client state, because it makes understanding the state machine harder. There is already one exception to that (threadRun) that I wish to remove. All state changes must be performed explicitly in doCustom.
PQexec("ROLLBACK"): you are inserting a synchronous command, for which the thread will have to wait for the result, in the middle of a framework which takes great care to use only asynchronous stuff so that one thread can manage several clients efficiently. You cannot call PQexec there. From where I sit, I'd suggest to sendQuery("ROLLBACK"), then switch to a new state CSTATE_WAIT_ABORT_RESULT which would be similar to CSTATE_WAIT_RESULT, but on success would skip to RETRY or ABORT instead of proceeding to the next command.

ISTM that it would be more logical to only get into RETRY if there is a retry, i.e. move the RETRY/ABORT test into FAILURE. For that, instead of "canRetry", maybe you want "doRetry", which tells that a retry is possible (the error is a serialization or deadlock failure) and that the current parameters allow it (timeout, max retries).

* Minor C style comments:
if / else if / else if ... on *_FAILURE: I'd suggest a switch.
The following line removal does not seem useful, I'd have kept it:

  stats->cnt++;
-
  if (skipped)

copyVariables: I'm not convinced that the source_vars & nvars variables are that useful.

memcpy(&(st->retry_state.random_state), &(st->random_state), sizeof(RandomState));

Is there a problem with "st->retry_state.random_state = st->random_state;" instead of memcpy? ISTM that simple assignments work in C. Idem in the reverse copy under RETRY.
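Indeed, plain struct assignment copies embedded arrays, so the memcpy is unnecessary; a small demonstration using the same struct shape as the patch's RandomState:

```c
#include <assert.h>

/* Same shape as the RandomState introduced by the patch. */
typedef struct
{
	unsigned short xseed[3];
} RandomState;

/* Returning/assigning the struct by value copies the embedded array. */
static RandomState
copy_random_state(const RandomState *src)
{
	return *src;
}
```

Mutating the source afterwards does not affect the copy, which is exactly the behavior the memcpy was providing.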
commandFailed: I'm not thrilled by the added boolean, which is partially redundant with the second argument.

  if (per_script_stats)
-     accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ {
+     accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+                st->failure_status, st->retries);
+ }
  }

I do not see the point of changing the style here.
[9]: /messages/by-id/alpine.DEB.2.21.1808170917510.20841@lancre
Here is an attempt at having a more precise and shorter version, not sure it is much better than yours, though:

"""
Transactions are counted depending on their execution and outcome.
First
a transaction may have started or not: skipped transactions occur under
--rate and --latency-limit when the client is too late to execute them.
Secondly, a started transaction may ultimately succeed or fail on some
error, possibly after some retries when --max-tries is not one. Thus
"""
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch (text/x-diff)
From d615a3e2cdc6949f4fae83a4a7a7328937f7acbf Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Tue, 4 Sep 2018 19:02:32 +0300
Subject: [PATCH v11 1/4] Pgbench errors: use the RandomState structure for
thread/client random seed.
This is most important when it is used to reset a client's random seed during
the repeating of transactions after serialization/deadlock failures.
Use the random state of the client for the random functions PGBENCH_RANDOM_* during
the execution of the script. Use the random state of each thread option (to
choose the script / get the throttle delay / to log with a sample rate) to make
all of them independent of each other and therefore deterministic at the thread
level.
---
src/bin/pgbench/pgbench.c | 104 +++++++++++++++++++++++++++-----------
1 file changed, 74 insertions(+), 30 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 41b756c089..988e37bce5 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -250,6 +250,14 @@ typedef struct StatsData
SimpleStats lag;
} StatsData;
+/*
+ * Data structure for thread/client random seed.
+ */
+typedef struct
+{
+ unsigned short xseed[3];
+} RandomState;
+
/*
* Connection state machine states.
*/
@@ -331,6 +339,12 @@ typedef struct
ConnectionStateEnum state; /* state machine's current state. */
ConditionalStack cstack; /* enclosing conditionals state */
+ /*
+ * Separate randomness for each client. This is used for random functions
+ * PGBENCH_RANDOM_* during the execution of the script.
+ */
+ RandomState random_state;
+
int use_file; /* index in sql_script for this client */
int command; /* command number in script */
@@ -390,7 +404,16 @@ typedef struct
pthread_t thread; /* thread handle */
CState *state; /* array of CState */
int nstate; /* length of state[] */
- unsigned short random_state[3]; /* separate randomness for each thread */
+
+ /*
+ * Separate randomness for each thread. Each thread option uses its own
+ * random state to make all of them independent of each other and therefore
+ * deterministic at the thread level.
+ */
+ RandomState choose_script_rs; /* random state for selecting a script */
+ RandomState throttling_rs; /* random state for transaction throttling */
+ RandomState sampling_rs; /* random state for log sampling */
+
int64 throttle_trigger; /* previous/next throttling (us) */
FILE *logfile; /* where to log, or NULL */
ZipfCache zipf_cache; /* for thread-safe zipfian random number
@@ -694,7 +717,7 @@ gotdigits:
/* random number generator: uniform distribution from min to max inclusive */
static int64
-getrand(TState *thread, int64 min, int64 max)
+getrand(RandomState *random_state, int64 min, int64 max)
{
/*
* Odd coding is so that min and max have approximately the same chance of
@@ -705,7 +728,7 @@ getrand(TState *thread, int64 min, int64 max)
* protected by a mutex, and therefore a bottleneck on machines with many
* CPUs.
*/
- return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
+ return min + (int64) ((max - min + 1) * pg_erand48(random_state->xseed));
}
/*
@@ -714,7 +737,8 @@ getrand(TState *thread, int64 min, int64 max)
* value is exp(-parameter).
*/
static int64
-getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
+getExponentialRand(RandomState *random_state, int64 min, int64 max,
+ double parameter)
{
double cut,
uniform,
@@ -724,7 +748,7 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
Assert(parameter > 0.0);
cut = exp(-parameter);
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(random_state->xseed);
/*
* inner expression in (cut, 1] (if parameter > 0), rand in [0, 1)
@@ -737,7 +761,8 @@ getExponentialRand(TState *thread, int64 min, int64 max, double parameter)
/* random number generator: gaussian distribution from min to max inclusive */
static int64
-getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
+getGaussianRand(RandomState *random_state, int64 min, int64 max,
+ double parameter)
{
double stdev;
double rand;
@@ -765,8 +790,8 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* are expected in (0, 1] (see
* https://en.wikipedia.org/wiki/Box-Muller_transform)
*/
- double rand1 = 1.0 - pg_erand48(thread->random_state);
- double rand2 = 1.0 - pg_erand48(thread->random_state);
+ double rand1 = 1.0 - pg_erand48(random_state->xseed);
+ double rand2 = 1.0 - pg_erand48(random_state->xseed);
/* Box-Muller basic form transform */
double var_sqrt = sqrt(-2.0 * log(rand1));
@@ -793,7 +818,7 @@ getGaussianRand(TState *thread, int64 min, int64 max, double parameter)
* will approximate a Poisson distribution centered on the given value.
*/
static int64
-getPoissonRand(TState *thread, int64 center)
+getPoissonRand(RandomState *random_state, int64 center)
{
/*
* Use inverse transform sampling to generate a value > 0, such that the
@@ -802,7 +827,7 @@ getPoissonRand(TState *thread, int64 center)
double uniform;
/* erand in [0, 1), uniform in (0, 1] */
- uniform = 1.0 - pg_erand48(thread->random_state);
+ uniform = 1.0 - pg_erand48(random_state->xseed);
return (int64) (-log(uniform) * ((double) center) + 0.5);
}
@@ -880,7 +905,7 @@ zipfFindOrCreateCacheCell(ZipfCache *cache, int64 n, double s)
* Luc Devroye, p. 550-551, Springer 1986.
*/
static int64
-computeIterativeZipfian(TState *thread, int64 n, double s)
+computeIterativeZipfian(RandomState *random_state, int64 n, double s)
{
double b = pow(2.0, s - 1.0);
double x,
@@ -891,8 +916,8 @@ computeIterativeZipfian(TState *thread, int64 n, double s)
while (true)
{
/* random variates */
- u = pg_erand48(thread->random_state);
- v = pg_erand48(thread->random_state);
+ u = pg_erand48(random_state->xseed);
+ v = pg_erand48(random_state->xseed);
x = floor(pow(u, -1.0 / (s - 1.0)));
@@ -910,10 +935,11 @@ computeIterativeZipfian(TState *thread, int64 n, double s)
* Jim Gray et al, SIGMOD 1994
*/
static int64
-computeHarmonicZipfian(TState *thread, int64 n, double s)
+computeHarmonicZipfian(ZipfCache *zipf_cache, RandomState *random_state,
+ int64 n, double s)
{
- ZipfCell *cell = zipfFindOrCreateCacheCell(&thread->zipf_cache, n, s);
- double uniform = pg_erand48(thread->random_state);
+ ZipfCell *cell = zipfFindOrCreateCacheCell(zipf_cache, n, s);
+ double uniform = pg_erand48(random_state->xseed);
double uz = uniform * cell->harmonicn;
if (uz < 1.0)
@@ -925,7 +951,8 @@ computeHarmonicZipfian(TState *thread, int64 n, double s)
/* random number generator: zipfian distribution from min to max inclusive */
static int64
-getZipfianRand(TState *thread, int64 min, int64 max, double s)
+getZipfianRand(ZipfCache *zipf_cache, RandomState *random_state, int64 min,
+ int64 max, double s)
{
int64 n = max - min + 1;
@@ -934,8 +961,8 @@ getZipfianRand(TState *thread, int64 min, int64 max, double s)
return min - 1 + ((s > 1)
- ? computeIterativeZipfian(thread, n, s)
- : computeHarmonicZipfian(thread, n, s));
+ ? computeIterativeZipfian(random_state, n, s)
+ : computeHarmonicZipfian(zipf_cache, random_state, n, s));
}
/*
@@ -2209,7 +2236,7 @@ evalStandardFunc(TState *thread, CState *st,
if (func == PGBENCH_RANDOM)
{
Assert(nargs == 2);
- setIntValue(retval, getrand(thread, imin, imax));
+ setIntValue(retval, getrand(&st->random_state, imin, imax));
}
else /* gaussian & exponential */
{
@@ -2231,7 +2258,8 @@ evalStandardFunc(TState *thread, CState *st,
}
setIntValue(retval,
- getGaussianRand(thread, imin, imax, param));
+ getGaussianRand(&st->random_state, imin,
+ imax, param));
}
else if (func == PGBENCH_RANDOM_ZIPFIAN)
{
@@ -2243,7 +2271,9 @@ evalStandardFunc(TState *thread, CState *st,
return false;
}
setIntValue(retval,
- getZipfianRand(thread, imin, imax, param));
+ getZipfianRand(&thread->zipf_cache,
+ &st->random_state, imin,
+ imax, param));
}
else /* exponential */
{
@@ -2256,7 +2286,8 @@ evalStandardFunc(TState *thread, CState *st,
}
setIntValue(retval,
- getExponentialRand(thread, imin, imax, param));
+ getExponentialRand(&st->random_state, imin,
+ imax, param));
}
}
@@ -2551,7 +2582,7 @@ chooseScript(TState *thread)
if (num_scripts == 1)
return 0;
- w = getrand(thread, 0, total_weight - 1);
+ w = getrand(&thread->choose_script_rs, 0, total_weight - 1);
do
{
w -= sql_script[i++].weight;
@@ -2745,7 +2776,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* away.
*/
Assert(throttle_delay > 0);
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(&thread->throttling_rs, throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
@@ -2779,7 +2810,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
{
processXactStats(thread, st, &now, true, agg);
/* next rendez-vous */
- wait = getPoissonRand(thread, throttle_delay);
+ wait = getPoissonRand(&thread->throttling_rs,
+ throttle_delay);
thread->throttle_trigger += wait;
st->txn_scheduled = thread->throttle_trigger;
}
@@ -3322,7 +3354,7 @@ doLog(TState *thread, CState *st,
* to the random sample.
*/
if (sample_rate != 0.0 &&
- pg_erand48(thread->random_state) > sample_rate)
+ pg_erand48(thread->sampling_rs.xseed) > sample_rate)
return;
/* should we aggregate the results or not? */
@@ -4750,6 +4782,17 @@ set_random_seed(const char *seed)
return true;
}
+/*
+ * Initialize the random state of the client/thread.
+ */
+static void
+initRandomState(RandomState *random_state)
+{
+ random_state->xseed[0] = random();
+ random_state->xseed[1] = random();
+ random_state->xseed[2] = random();
+}
+
int
main(int argc, char **argv)
@@ -5358,6 +5401,7 @@ main(int argc, char **argv)
for (i = 0; i < nclients; i++)
{
state[i].cstack = conditional_stack_create();
+ initRandomState(&state[i].random_state);
}
if (debug)
@@ -5491,9 +5535,9 @@ main(int argc, char **argv)
thread->state = &state[nclients_dealt];
thread->nstate =
(nclients - nclients_dealt + nthreads - i - 1) / (nthreads - i);
- thread->random_state[0] = random();
- thread->random_state[1] = random();
- thread->random_state[2] = random();
+ initRandomState(&thread->choose_script_rs);
+ initRandomState(&thread->throttling_rs);
+ initRandomState(&thread->sampling_rs);
thread->logfile = NULL; /* filled in later */
thread->latency_late = 0;
thread->zipf_cache.nb_cells = 0;
--
2.17.1
Attachment: v11-0002-Pgbench-errors-use-the-Variables-structure-for-c.patch (text/x-diff)
From ffbf34d489f8f01621866908b9dd418459bb0b45 Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Wed, 5 Sep 2018 16:03:15 +0300
Subject: [PATCH v11 2/4] Pgbench errors: use the Variables structure for
client variables
This is most important when the structure is used to reset client variables
while repeating transactions after serialization/deadlock failures.
Don't allocate Variable structs one by one. Instead, grow the array by a
constant margin each time it overflows.
---
src/bin/pgbench/pgbench.c | 171 ++++++++++++++++++++++++--------------
1 file changed, 109 insertions(+), 62 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 988e37bce5..1b25487bfc 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -201,6 +201,12 @@ const char *progname;
volatile bool timer_exceeded = false; /* flag from signal handler */
+/*
+ * We don't want to allocate variables one by one; for efficiency, add a
+ * constant margin each time it overflows.
+ */
+#define VARIABLES_ALLOC_MARGIN 8
+
/*
* Variable definitions.
*
@@ -218,6 +224,24 @@ typedef struct
PgBenchValue value; /* actual variable's value */
} Variable;
+/*
+ * Data structure for client variables.
+ */
+typedef struct
+{
+ Variable *vars; /* array of variable definitions */
+ int nvars; /* number of variables */
+
+ /*
+ * The maximum number of variables that we can currently store in 'vars'
+ * without having to reallocate more space. We must always have max_vars >=
+ * nvars.
+ */
+ int max_vars;
+
+ bool vars_sorted; /* are variables sorted by name? */
+} Variables;
+
#define MAX_SCRIPTS 128 /* max number of SQL scripts allowed */
#define SHELL_COMMAND_SIZE 256 /* maximum size allowed for shell command */
@@ -349,9 +373,7 @@ typedef struct
int command; /* command number in script */
/* client variables */
- Variable *variables; /* array of variable definitions */
- int nvariables; /* number of variables */
- bool vars_sorted; /* are variables sorted by name? */
+ Variables variables;
/* various times about current transaction */
int64 txn_scheduled; /* scheduled start time of transaction (usec) */
@@ -1212,39 +1234,39 @@ compareVariableNames(const void *v1, const void *v2)
/* Locate a variable by name; returns NULL if unknown */
static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
{
Variable key;
/* On some versions of Solaris, bsearch of zero items dumps core */
- if (st->nvariables <= 0)
+ if (variables->nvars <= 0)
return NULL;
/* Sort if we have to */
- if (!st->vars_sorted)
+ if (!variables->vars_sorted)
{
- qsort((void *) st->variables, st->nvariables, sizeof(Variable),
+ qsort((void *) variables->vars, variables->nvars, sizeof(Variable),
compareVariableNames);
- st->vars_sorted = true;
+ variables->vars_sorted = true;
}
/* Now we can search */
key.name = name;
return (Variable *) bsearch((void *) &key,
- (void *) st->variables,
- st->nvariables,
+ (void *) variables->vars,
+ variables->nvars,
sizeof(Variable),
compareVariableNames);
}
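The lookup above follows a sort-on-demand pattern: the array stays unsorted while variables are appended, is qsort()'d lazily on the first lookup, and is remembered as sorted until the next append invalidates it. A minimal sketch of that pattern (struct and helper names are illustrative, not the patch's):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
	const char *name;
} Var;

typedef struct
{
	Var		   *vars;
	int			nvars;
	int			sorted;			/* are variables sorted by name? */
} Vars;

static int
cmp_var(const void *a, const void *b)
{
	return strcmp(((const Var *) a)->name, ((const Var *) b)->name);
}

static Var *
lookup_var(Vars *vs, const char *name)
{
	Var			key;

	/* bsearch of zero items may dump core on some platforms */
	if (vs->nvars <= 0)
		return NULL;

	/* sort only when a lookup actually needs it */
	if (!vs->sorted)
	{
		qsort(vs->vars, vs->nvars, sizeof(Var), cmp_var);
		vs->sorted = 1;
	}

	key.name = name;
	return bsearch(&key, vs->vars, vs->nvars, sizeof(Var), cmp_var);
}
```

Appending many variables and then doing many lookups costs one sort instead of keeping the array sorted on every insert.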
/* Get the value of a variable, in string form; returns NULL if unknown */
static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
{
Variable *var;
char stringform[64];
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
return NULL; /* not found */
@@ -1362,21 +1384,43 @@ valid_variable_name(const char *name)
return true;
}
+/*
+ * Make sure there is enough space for 'needed' more variables in the variables
+ * array. It is assumed that the sum of the number of current variables and the
+ * number of needed variables is less than or equal to (INT_MAX -
+ * VARIABLES_ALLOC_MARGIN).
+ */
+static void
+enlargeVariables(Variables *variables, int needed)
+{
+ /* total number of variables required now */
+ needed += variables->nvars;
+
+ if (variables->max_vars < needed)
+ {
+ /*
+ * We don't want to allocate variables one by one; for efficiency, add a
+ * constant margin each time it overflows.
+ */
+ variables->max_vars = needed + VARIABLES_ALLOC_MARGIN;
+ variables->vars = (Variable *)
+ pg_realloc(variables->vars, variables->max_vars * sizeof(Variable));
+ }
+}
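The effect of the margin can be shown with a small standalone sketch of the same growth policy (the realloc counter is ours, added only to make the amortization observable):

```c
#include <assert.h>
#include <stdlib.h>

#define ALLOC_MARGIN 8

typedef struct
{
	int		   *items;
	int			nitems;
	int			max_items;
	int			nreallocs;		/* instrumentation, not in the patch */
} IntArray;

/* Grow only on overflow, overshooting by a constant margin. */
static void
enlarge(IntArray *a, int needed)
{
	needed += a->nitems;
	if (a->max_items < needed)
	{
		a->max_items = needed + ALLOC_MARGIN;
		a->items = realloc(a->items, a->max_items * sizeof(int));
		a->nreallocs++;
	}
}

static void
append(IntArray *a, int value)
{
	enlarge(a, 1);
	a->items[a->nitems++] = value;
}
```

Appending 20 elements triggers only 3 reallocations (at sizes 1, 10 and 19) instead of 20 with the old one-by-one scheme.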
+
/*
* Lookup a variable by name, creating it if need be.
* Caller is expected to assign a value to the variable.
* Returns NULL on failure (bad name).
*/
static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
+lookupCreateVariable(Variables *variables, const char *context, char *name)
{
Variable *var;
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
{
- Variable *newvars;
-
/*
* Check for the name only when declaring a new variable to avoid
* overhead.
@@ -1389,23 +1433,17 @@ lookupCreateVariable(CState *st, const char *context, char *name)
}
/* Create variable at the end of the array */
- if (st->variables)
- newvars = (Variable *) pg_realloc(st->variables,
- (st->nvariables + 1) * sizeof(Variable));
- else
- newvars = (Variable *) pg_malloc(sizeof(Variable));
-
- st->variables = newvars;
+ enlargeVariables(variables, 1);
- var = &newvars[st->nvariables];
+ var = &(variables->vars[variables->nvars]);
var->name = pg_strdup(name);
var->svalue = NULL;
/* caller is expected to initialize remaining fields */
- st->nvariables++;
+ variables->nvars++;
/* we don't re-sort the array till we have to */
- st->vars_sorted = false;
+ variables->vars_sorted = false;
}
return var;
@@ -1414,12 +1452,13 @@ lookupCreateVariable(CState *st, const char *context, char *name)
/* Assign a string value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariable(CState *st, const char *context, char *name, const char *value)
+putVariable(Variables *variables, const char *context, char *name,
+ const char *value)
{
Variable *var;
char *val;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1437,12 +1476,12 @@ putVariable(CState *st, const char *context, char *name, const char *value)
/* Assign a value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableValue(CState *st, const char *context, char *name,
+putVariableValue(Variables *variables, const char *context, char *name,
const PgBenchValue *value)
{
Variable *var;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1457,12 +1496,13 @@ putVariableValue(CState *st, const char *context, char *name,
/* Assign an integer value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putVariableInt(Variables *variables, const char *context, char *name,
+ int64 value)
{
PgBenchValue val;
setIntValue(&val, value);
- return putVariableValue(st, context, name, &val);
+ return putVariableValue(variables, context, name, &val);
}
/*
@@ -1517,7 +1557,7 @@ replaceVariable(char **sql, char *param, int len, char *value)
}
static char *
-assignVariables(CState *st, char *sql)
+assignVariables(Variables *variables, char *sql)
{
char *p,
*name,
@@ -1538,7 +1578,7 @@ assignVariables(CState *st, char *sql)
continue;
}
- val = getVariable(st, name);
+ val = getVariable(variables, name);
free(name);
if (val == NULL)
{
@@ -1553,12 +1593,13 @@ assignVariables(CState *st, char *sql)
}
static void
-getQueryParams(CState *st, const Command *command, const char **params)
+getQueryParams(Variables *variables, const Command *command,
+ const char **params)
{
int i;
for (i = 0; i < command->argc - 1; i++)
- params[i] = getVariable(st, command->argv[i + 1]);
+ params[i] = getVariable(variables, command->argv[i + 1]);
}
static char *
@@ -2390,7 +2431,7 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
{
Variable *var;
- if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
+ if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
fprintf(stderr, "undefined variable \"%s\"\n",
expr->u.variable.varname);
@@ -2454,7 +2495,7 @@ getMetaCommand(const char *cmd)
* Return true if succeeded, or false on error.
*/
static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
char command[SHELL_COMMAND_SIZE];
int i,
@@ -2485,7 +2526,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
arg = argv[i] + 1; /* a string literal starting with colons */
}
- else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
+ else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
fprintf(stderr, "%s: undefined variable \"%s\"\n",
argv[0], argv[i]);
@@ -2548,7 +2589,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
argv[0], res);
return false;
}
- if (!putVariableInt(st, "setshell", variable, retval))
+ if (!putVariableInt(variables, "setshell", variable, retval))
return false;
#ifdef DEBUG
@@ -2602,7 +2643,7 @@ sendCommand(CState *st, Command *command)
char *sql;
sql = pg_strdup(command->argv[0]);
- sql = assignVariables(st, sql);
+ sql = assignVariables(&st->variables, sql);
if (debug)
fprintf(stderr, "client %d sending %s\n", st->id, sql);
@@ -2614,7 +2655,7 @@ sendCommand(CState *st, Command *command)
const char *sql = command->argv[0];
const char *params[MAX_ARGS];
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
if (debug)
fprintf(stderr, "client %d sending %s\n", st->id, sql);
@@ -2648,7 +2689,7 @@ sendCommand(CState *st, Command *command)
st->prepared[st->use_file] = true;
}
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
if (debug)
@@ -2676,14 +2717,14 @@ sendCommand(CState *st, Command *command)
* of delay, in microseconds. Returns true on success, false on error.
*/
static bool
-evaluateSleep(CState *st, int argc, char **argv, int *usecs)
+evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
char *var;
int usec;
if (*argv[1] == ':')
{
- if ((var = getVariable(st, argv[1] + 1)) == NULL)
+ if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
fprintf(stderr, "%s: undefined variable \"%s\"\n",
argv[0], argv[1]);
@@ -2956,7 +2997,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
int usec;
- if (!evaluateSleep(st, argc, argv, &usec))
+ if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
commandFailed(st, "sleep", "execution of meta-command failed");
st->state = CSTATE_ABORTED;
@@ -2997,7 +3038,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (command->meta == META_SET)
{
- if (!putVariableValue(st, argv[0], argv[1], &result))
+ if (!putVariableValue(&st->variables, argv[0],
+ argv[1], &result))
{
commandFailed(st, "set", "assignment of meta-command failed");
st->state = CSTATE_ABORTED;
@@ -3050,7 +3092,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (command->meta == META_SETSHELL)
{
- bool ret = runShellCommand(st, argv[1], argv + 2, argc - 2);
+ bool ret = runShellCommand(&st->variables,
+ argv[1], argv + 2,
+ argc - 2);
if (timer_exceeded) /* timeout */
{
@@ -3070,7 +3114,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (command->meta == META_SHELL)
{
- bool ret = runShellCommand(st, NULL, argv + 1, argc - 1);
+ bool ret = runShellCommand(&st->variables, NULL,
+ argv + 1, argc - 1);
if (timer_exceeded) /* timeout */
{
@@ -5075,7 +5120,7 @@ main(int argc, char **argv)
}
*p++ = '\0';
- if (!putVariable(&state[0], "option", optarg, p))
+ if (!putVariable(&state[0].variables, "option", optarg, p))
exit(1);
}
break;
@@ -5377,19 +5422,19 @@ main(int argc, char **argv)
int j;
state[i].id = i;
- for (j = 0; j < state[0].nvariables; j++)
+ for (j = 0; j < state[0].variables.nvars; j++)
{
- Variable *var = &state[0].variables[j];
+ Variable *var = &state[0].variables.vars[j];
if (var->value.type != PGBT_NO_VALUE)
{
- if (!putVariableValue(&state[i], "startup",
+ if (!putVariableValue(&state[i].variables, "startup",
var->name, &var->value))
exit(1);
}
else
{
- if (!putVariable(&state[i], "startup",
+ if (!putVariable(&state[i].variables, "startup",
var->name, var->svalue))
exit(1);
}
@@ -5465,11 +5510,11 @@ main(int argc, char **argv)
* :scale variables normally get -s or database scale, but don't override
* an explicit -D switch
*/
- if (lookupVariable(&state[0], "scale") == NULL)
+ if (lookupVariable(&state[0].variables, "scale") == NULL)
{
for (i = 0; i < nclients; i++)
{
- if (!putVariableInt(&state[i], "startup", "scale", scale))
+ if (!putVariableInt(&state[i].variables, "startup", "scale", scale))
exit(1);
}
}
@@ -5478,15 +5523,15 @@ main(int argc, char **argv)
* Define a :client_id variable that is unique per connection. But don't
* override an explicit -D switch.
*/
- if (lookupVariable(&state[0], "client_id") == NULL)
+ if (lookupVariable(&state[0].variables, "client_id") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "client_id", i))
+ if (!putVariableInt(&state[i].variables, "startup", "client_id", i))
exit(1);
}
/* set default seed for hash functions */
- if (lookupVariable(&state[0], "default_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "default_seed") == NULL)
{
uint64 seed = ((uint64) (random() & 0xFFFF) << 48) |
((uint64) (random() & 0xFFFF) << 32) |
@@ -5494,15 +5539,17 @@ main(int argc, char **argv)
(uint64) (random() & 0xFFFF);
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "default_seed", (int64) seed))
+ if (!putVariableInt(&state[i].variables, "startup", "default_seed",
+ (int64) seed))
exit(1);
}
/* set random seed unless overwritten */
- if (lookupVariable(&state[0], "random_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "random_seed") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+ if (!putVariableInt(&state[i].variables, "startup", "random_seed",
+ random_seed))
exit(1);
}
--
2.17.1
Attachment: v11-0003-Pgbench-errors-and-serialization-deadlock-retrie.patch (text/x-diff)
From 7f5d15d564c3b35a2ed0e9b0f337ea5480b4be83 Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Wed, 5 Sep 2018 18:23:07 +0300
Subject: [PATCH v11 3/4] Pgbench errors and serialization/deadlock retries
A client's run is aborted only in case of a serious error, for example, if the
connection with the database server is lost or the end of the script is reached
without completing the last transaction. Otherwise, if the execution of an SQL
or meta command fails, the current transaction is rolled back, which also
includes restoring the client variables to the values they had before the run
of this transaction (it is assumed that one transaction script contains only
one transaction).
Transactions with serialization or deadlock errors are rolled back and repeated
until they complete successfully or reach the maximum number of tries (specified
by the --max-tries option) or the maximum time of tries (specified by the
--latency-limit option). These options can be combined; moreover, you cannot use
an infinite number of tries (--max-tries=0) without the option --latency-limit.
By default the option --max-tries is set to 1 and transactions with
serialization/deadlock errors are not retried at all. If the last transaction
run fails, the transaction is reported as failed, and the client variables are
restored to the values they had before the first run of this transaction.
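The retry policy above can be sketched as two predicates (the function names and the microsecond units are ours; the SQLSTATEs 40001 and 40P01 are PostgreSQL's serialization_failure and deadlock_detected):

```c
#include <assert.h>
#include <string.h>

/* Only these two SQLSTATEs make a failed transaction retriable. */
static int
error_is_retriable(const char *sqlstate)
{
	return sqlstate != NULL &&
		(strcmp(sqlstate, "40001") == 0 ||		/* serialization_failure */
		 strcmp(sqlstate, "40P01") == 0);		/* deadlock_detected */
}

/*
 * Whether another try is allowed.  max_tries == 0 means "unlimited", but
 * then a latency limit must be in effect, mirroring the rule that
 * --max-tries=0 cannot be used without --latency-limit.
 */
static int
can_retry(int tries_used, int max_tries,
		  double elapsed_us, double latency_limit_us)
{
	if (max_tries != 0 && tries_used >= max_tries)
		return 0;
	if (latency_limit_us > 0.0 && elapsed_us >= latency_limit_us)
		return 0;
	return 1;
}
```

With the defaults (max_tries = 1, no latency limit), can_retry() is always false after the first try, matching the "not retried at all" default.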
If there are retries and/or failures, their statistics are printed in the
progress report, in the transaction/aggregation logs, and at the end with the
other results (overall and for each script). Retries and failures are also
printed per command with average latencies if you use the appropriate
benchmarking option (--report-per-command, -r). If you want to group failures
by basic type (serialization failures / deadlock failures / other SQL failures /
failures in meta commands), use the option --failures-detailed.
If you want to distinguish all errors and failures (errors without retrying) by
type, including which retry limit was violated and by how much for the
serialization/deadlock failures, use the option --print-errors or --debug. The
first option is recommended for this purpose because with the second option the
output can grow significantly due to debug messages for the successful commands
of all transactions.
---
doc/src/sgml/ref/pgbench.sgml | 420 +++++-
src/bin/pgbench/pgbench.c | 1358 +++++++++++++++---
src/bin/pgbench/t/001_pgbench_with_server.pl | 408 +++++-
src/bin/pgbench/t/002_pgbench_no_server.pl | 10 +
src/fe_utils/conditional.c | 16 +-
src/include/fe_utils/conditional.h | 2 +
6 files changed, 1962 insertions(+), 252 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 88cf8b3933..4afc996825 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -55,16 +55,20 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
tps = 85.184871 (including connections establishing)
tps = 85.296346 (excluding connections establishing)
</screen>
The first six lines report some of the most important parameter
- settings. The next line reports the number of transactions completed
+ settings. The seventh line reports the number of transactions completed
and intended (the latter being just the product of number of clients
and number of transactions per client); these will be equal unless the run
- failed before completion. (In <option>-T</option> mode, only the actual
- number of transactions is printed.)
+ failed before completion or some SQL/meta command(s) failed. (In
+ <option>-T</option> mode, only the actual number of transactions is printed.)
+ The next line reports the maximum number of tries for transactions with
+ serialization or deadlock errors (see <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information).
The last two lines report the number of transactions per second,
figured with and without counting the time to start database sessions.
</para>
@@ -384,7 +388,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<term><option>--debug</option></term>
<listitem>
<para>
- Print debugging output.
+ Print debugging output. This option automatically turns on the option
+ <option>--print-errors</option>.
</para>
</listitem>
</varlistentry>
@@ -453,6 +458,17 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
at all. They are counted and reported separately as
<firstterm>skipped</firstterm>.
</para>
+ <para>
+ When the option <option>--max-tries</option> is used, a transaction
+ with a serialization or deadlock error cannot be retried if the total time
+ of all its tries is greater than <replaceable>limit</replaceable> ms. To
+ limit only the time of tries and not their number, use
+ <literal>--max-tries=0</literal>. By default the option
+ <option>--max-tries</option> is set to 1 and transactions with
+ serialization/deadlock errors are not retried at all. See <xref
+ linkend="failures-and-retries" endterm="failures-and-retries-title"/>
+ for more information about retrying such transactions.
+ </para>
</listitem>
</varlistentry>
@@ -513,22 +529,31 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
Show progress report every <replaceable>sec</replaceable> seconds. The report
includes the time since the beginning of the run, the TPS since the
last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</option>),
- the latency is computed with respect to the transaction scheduled
- start time, not the actual transaction beginning time, thus it also
- includes the average schedule lag time.
+ deviation since the last report. If any transactions have received a
+ failure in the SQL or meta command since the last report, they are also
+ reported as failed. Under throttling (<option>-R</option>), the latency
+ is computed with respect to the transaction scheduled start time, not
+ the actual transaction beginning time, thus it also includes the average
+ schedule lag time. If any transactions have been rolled back and
+ retried after a serialization/deadlock error since the last report, the
+ report includes the number of such transactions and the sum of all
+ retries. Use the option <option>--max-tries</option> to enable
+ transaction retries after serialization/deadlock errors.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-r</option></term>
- <term><option>--report-latencies</option></term>
+ <term><option>--report-per-command</option></term>
<listitem>
<para>
- Report the average per-statement latency (execution time from the
- perspective of the client) of each command after the benchmark
- finishes. See below for details.
+ Report the following statistics for each command after the benchmark
+ finishes: the average per-statement latency (execution time from the
+ perspective of the client), the number of failures and the number of
+ retries after serialization or deadlock errors in this command. The
+ report displays retry statistics only if the option
+ <option>--max-tries</option> is not equal to 1. See below for details.
</para>
</listitem>
</varlistentry>
@@ -656,6 +681,32 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--failures-detailed</option></term>
+ <listitem>
+ <para>
+ Report failures in per-transaction and aggregation logs, as well as in
+ the main and per-script reports, grouped by the following types:
+ <itemizedlist>
+ <listitem>
+ <para>serialization failures;</para>
+ </listitem>
+ <listitem>
+ <para>deadlock failures;</para>
+ </listitem>
+ <listitem>
+ <para>other SQL failures;</para>
+ </listitem>
+ <listitem>
+ <para>meta command failures.</para>
+ </listitem>
+ </itemizedlist>
+ See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--log-prefix=<replaceable>prefix</replaceable></option></term>
<listitem>
@@ -666,6 +717,39 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--max-tries=<replaceable>number_of_tries</replaceable></option></term>
+ <listitem>
+ <para>
+ Enable retries for transactions with serialization/deadlock errors and
+ set the maximum number of these tries. This option can be combined with
+ the option <option>--latency-limit</option> which limits the total time
+ of all transaction tries; moreover, you cannot use an infinite number
+ of tries (<literal>--max-tries=0</literal>) without the option
+ <option>--latency-limit</option>. The default value is 1 and
+ transactions with serialization/deadlock errors are not retried at all.
+ See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information about
+ retrying such transactions.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--print-errors</option></term>
+ <listitem>
+ <para>
+ Print messages of all errors and failures (errors without retrying)
+ including which limit for retries was violated and how far it was
+ exceeded for the serialization/deadlock failures. (Note that in this
+ case the output can be significantly increased.) This option is
+ automatically enabled if the option <option>--debug</option> is used.
+ See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--progress-timestamp</option></term>
<listitem>
@@ -807,8 +891,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<refsect1>
<title>Notes</title>
- <refsect2>
- <title>What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts">
+ <title id="transactions-and-scripts-title">What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
<para>
<application>pgbench</application> executes test scripts chosen randomly
@@ -881,6 +965,11 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
both old and new versions of <application>pgbench</application>, be sure to write
each SQL command on a single line ending with a semicolon.
</para>
+ <para>
+ It is assumed that the scripts used do not contain incomplete blocks of SQL
+ transactions. If at runtime the client reaches the end of the script without
+ completing the last transaction block, it will be aborted.
+ </para>
</note>
<para>
@@ -1583,7 +1672,7 @@ END;
The format of the log is:
<synopsis>
-<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -1604,6 +1693,18 @@ END;
When both <option>--rate</option> and <option>--latency-limit</option> are used,
the <replaceable>time</replaceable> for a skipped transaction will be reported as
<literal>skipped</literal>.
+ <replaceable>retries</replaceable> is the sum of all retries after
+ serialization or deadlock errors during the current script execution. It is
+ present only if the option <option>--max-tries</option> is not equal to 1.
+ If the transaction ended with a failure, its <replaceable>time</replaceable>
+ will be reported as <literal>failed</literal>. If you use the option
+ <option>--failures-detailed</option>, the <replaceable>time</replaceable> of
+ the failed transaction will be reported as
+ <literal>serialization_failure</literal> /
+ <literal>deadlock_failure</literal> / <literal>other_sql_failure</literal> /
+ <literal>meta_command_failure</literal> depending on the type of failure (see
+ <xref linkend="failures-and-retries" endterm="failures-and-retries-title"/>
+ for more information).
</para>
<para>
@@ -1632,6 +1733,24 @@ END;
were already late before they were even started.
</para>
+ <para>
+ The following example shows a snippet of a log file with failures and
+ retries, with the maximum number of tries set to 10 (note the additional
+ <replaceable>retries</replaceable> column):
+<screen>
+3 0 47423 0 1499414498 34501 3
+3 1 8333 0 1499414498 42848 0
+3 2 8358 0 1499414498 51219 0
+4 0 72345 0 1499414498 59433 6
+1 3 41718 0 1499414498 67879 4
+1 4 8416 0 1499414498 76311 0
+3 3 33235 0 1499414498 84469 3
+0 0 failed 0 1499414498 84905 9
+2 0 failed 0 1499414498 86248 9
+3 4 8307 0 1499414498 92788 0
+</screen>
+ </para>
+
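A log line in the snippet above can be consumed with a short parser; this is a sketch (struct and function names are illustrative) that maps the documented fields — client_id, transaction_no, time, script_no, time_epoch, time_us, retries — and treats the literal "failed" as a sentinel:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
	int			client_id,
				transaction_no,
				script_no,
				retries;
	long		time_epoch,
				time_us;
	long		latency_us;		/* -1 when the transaction failed */
} LogLine;

static int
parse_log_line(const char *line, LogLine *out)
{
	char		time_buf[32];

	if (sscanf(line, "%d %d %31s %d %ld %ld %d",
			   &out->client_id, &out->transaction_no, time_buf,
			   &out->script_no, &out->time_epoch, &out->time_us,
			   &out->retries) != 7)
		return 0;
	out->latency_us = (strcmp(time_buf, "failed") == 0) ? -1 : atol(time_buf);
	return 1;
}
```

(With --failures-detailed the sentinel can also be serialization_failure, deadlock_failure, other_sql_failure or meta_command_failure; a real parser would handle those too.)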
<para>
When running a long test on hardware that can handle a lot of transactions,
the log files can become very large. The <option>--sampling-rate</option> option
@@ -1647,7 +1766,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> <optional> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> <replaceable>other_sql_failures</replaceable> <replaceable>meta_command_failures</replaceable> </optional> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -1661,7 +1780,22 @@ END;
transaction latencies within the interval,
<replaceable>min_latency</replaceable> is the minimum latency within the interval,
and
- <replaceable>max_latency</replaceable> is the maximum latency within the interval.
+ <replaceable>max_latency</replaceable> is the maximum latency within the interval,
+ <replaceable>failures</replaceable> is the number of transactions that ended
+ with a failed SQL or meta command within the interval. If you use the option
+ <option>--failures-detailed</option>, instead of the sum of all failed
+ transactions you will get more detailed statistics for the failed
+ transactions grouped by the following types:
+ <replaceable>serialization_failures</replaceable> is the number of
+ transactions that got a serialization error and were not retried after this,
+ <replaceable>deadlock_failures</replaceable> is the number of transactions
+ that got a deadlock error and were not retried after this,
+ <replaceable>other_sql_failures</replaceable> is the number of transactions
+ that got a different error in the SQL command (such errors are never
+ retried),
+ <replaceable>meta_command_failures</replaceable> is the number of
+ transactions that got an error in the meta command (such errors are never
+ retried).
The next fields,
<replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
@@ -1669,21 +1803,25 @@ END;
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e. the difference between each transaction's
scheduled start time and the time it actually started.
- The very last field, <replaceable>skipped</replaceable>,
+ The next field, <replaceable>skipped</replaceable>,
is only present if the <option>--latency-limit</option> option is used, too.
It counts the number of transactions skipped because they would have
started too late.
+ The <replaceable>retried</replaceable> and <replaceable>retries</replaceable>
+ fields are present only if the option <option>--max-tries</option> is not
+ equal to 1. They report the number of retried transactions and the sum of all
+ the retries after serialization or deadlock errors within the interval.
Each transaction is counted in the interval when it was committed.
</para>
<para>
Here is some example output:
<screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0
+1345828503 7884 1979812 565806736 60 1479 0
+1345828505 7208 1979422 567277552 59 1391 0
+1345828507 7685 1980268 569784714 60 1398 0
+1345828509 7073 1979779 573489941 236 1411 0
</screen></para>
<para>
@@ -1695,13 +1833,44 @@ END;
</refsect2>
<refsect2>
- <title>Per-Statement Latencies</title>
+ <title>Per-Statement Report</title>
+
+ <para>
+ With the <option>-r</option> option, <application>pgbench</application>
+ collects the following statistics for each statement:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>latency</literal> — elapsed transaction time for each
+ statement. <application>pgbench</application> reports an average value
+ of all successful runs of the statement.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of failures in this statement. See
+ <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of retries after a serialization or a deadlock error in this
+ statement. See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ The report displays retry statistics only if the option
+ <option>--max-tries</option> is not equal to 1.
+ </para>
<para>
- With the <option>-r</option> option, <application>pgbench</application> collects
- the elapsed transaction time of each statement executed by every
- client. It then reports an average of those values, referred to
- as the latency for each statement, after the benchmark has finished.
+ All values are computed for each statement executed by every client and are
+ reported after the benchmark has finished.
</para>
<para>
@@ -1715,27 +1884,64 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
latency average = 15.844 ms
latency stddev = 2.715 ms
tps = 618.764555 (including connections establishing)
tps = 622.977698 (excluding connections establishing)
-statement latencies in milliseconds:
- 0.002 \set aid random(1, 100000 * :scale)
- 0.005 \set bid random(1, 1 * :scale)
- 0.002 \set tid random(1, 10 * :scale)
- 0.001 \set delta random(-5000, 5000)
- 0.326 BEGIN;
- 0.603 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
- 0.454 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
- 5.528 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
- 7.335 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
- 0.371 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
- 1.212 END;
+statement latencies in milliseconds and failures:
+ 0.002 0 \set aid random(1, 100000 * :scale)
+ 0.005 0 \set bid random(1, 1 * :scale)
+ 0.002 0 \set tid random(1, 10 * :scale)
+ 0.001 0 \set delta random(-5000, 5000)
+ 0.326 0 BEGIN;
+ 0.603 0 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.454 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 5.528 0 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 7.335 0 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.371 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.212 0 END;
+</screen>
+
+   Another example of output, for the default script run with the serializable
+   default transaction isolation level (<command>PGOPTIONS='-c
+   default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: <builtin: TPC-B (sort of)>
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+number of transactions per client: 1000
+number of transactions actually processed: 9676/10000
+number of failures: 324 (3.240%)
+number of serialization failures: 324 (3.240%)
+number of retried: 5629 (56.290%)
+number of retries: 103299
+maximum number of tries: 100
+number of transactions above the 100.0 ms latency limit: 21/9676 (0.217 %)
+latency average = 16.138 ms
+latency stddev = 21.017 ms
+tps = 413.650224 (including connections establishing)
+tps = 413.686560 (excluding connections establishing)
+statement latencies in milliseconds, failures and retries:
+ 0.002 0 0 \set aid random(1, 100000 * :scale)
+ 0.000 0 0 \set bid random(1, 1 * :scale)
+ 0.000 0 0 \set tid random(1, 10 * :scale)
+ 0.000 0 0 \set delta random(-5000, 5000)
+ 0.121 0 0 BEGIN;
+ 0.290 0 2 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.221 0 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 0.266 212 72127 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 0.222 112 31170 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.178 0 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.210 0 0 END;
</screen>
</para>
<para>
- If multiple script files are specified, the averages are reported
+ If multiple script files are specified, all statistics are reported
separately for each script file.
</para>
@@ -1749,6 +1955,138 @@ statement latencies in milliseconds:
</para>
</refsect2>
+ <refsect2 id="failures-and-retries">
+ <title id="failures-and-retries-title">Failures and Serialization/Deadlock Retries</title>
+
+ <para>
+    When executing <application>pgbench</application>, there are three main
+    types of errors:
+ <itemizedlist>
+ <listitem>
+ <para>
+        Errors in the main program. They are the most serious and always
+        result in an immediate exit from <application>pgbench</application>
+        with a corresponding error message. They include:
+ <itemizedlist>
+ <listitem>
+ <para>
+            errors during the startup of <application>pgbench</application>
+            (e.g. an invalid option value);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ errors in the initialization mode (e.g. the query to create
+ tables for built-in scripts fails);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+            errors before the threads are started (e.g. a failure to connect
+            to the database server, a syntax error in a meta command, or a
+            thread creation failure);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+            internal <application>pgbench</application> errors (which are
+            never supposed to occur...).
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+        Errors when a thread manages its clients (e.g. a client could not
+        start a connection to the database server, or the socket for connecting
+        the client to the database server has become invalid). In such cases
+        all clients of this thread stop, while other threads continue to run.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+        Direct client errors. They lead to an immediate exit from
+        <application>pgbench</application> with a corresponding error message
+        only in the case of an internal <application>pgbench</application>
+        error (which is never supposed to occur...). Otherwise, in the worst
+        case they only lead to aborting the failed client, while other
+        clients continue their run (but most client errors are handled without
+        aborting the client and are reported separately, see below). In the
+        rest of this section the discussed errors are assumed to be direct
+        client errors only, and not internal
+        <application>pgbench</application> errors.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+    A client's run is aborted only in case of a serious error, for example,
+    if the connection with the database server was lost or the end of the
+    script was reached without completing the last transaction. Otherwise,
+    if the execution of an SQL or meta command fails, the current transaction
+    is always rolled back, which also includes restoring the client variables
+    to their values before the run of this transaction (it is assumed that one
+    transaction script contains only one transaction; see <xref
+    linkend="transactions-and-scripts"
+    endterm="transactions-and-scripts-title"/> for more information).
+    Transactions with serialization or deadlock errors are repeated after
+    rollbacks until they complete successfully or reach the maximum number of
+    tries (specified by the <option>--max-tries</option> option) or the maximum
+    time of tries (specified by the <option>--latency-limit</option> option).
+ </para>
+
+ <note>
+ <para>
+     Without the option <option>--max-tries</option> a transaction is never
+     retried after an error. To limit only the maximum time of tries, use an
+     unlimited number of tries (<literal>--max-tries=0</literal>) together
+     with the option <option>--latency-limit</option>.
+ </para>
+ <para>
+ Be careful when repeating scripts that contain multiple transactions: the
+ script is always retried completely, so the successful transactions can be
+ performed several times.
+ </para>
+ <para>
+ Be careful when repeating transactions with shell commands. Unlike the
+ results of SQL commands, the results of shell commands are not rolled back,
+ except for the variable value of the <command>\setshell</command> command.
+ </para>
+ </note>
+
+ <para>
+ The latency of a successful transaction includes the entire time of
+ transaction execution with rollbacks and retries. The latency for failed
+ transactions and commands is not computed separately.
+ </para>
+
+ <para>
+ The main report contains the number of failed transactions if it is non-zero.
+ If the total number of retried transactions is non-zero, the main report also
+ contains the statistics related to retries: the total number of retried
+ transactions and total number of retries. The per-script report inherits all
+ these fields from the main report. The per-statement report displays retry
+ statistics only if the option <option>--max-tries</option> is not equal to 1.
+    A retry is reported for a command if an error raised in this command
+    leads to a retry.
+ </para>
+
+ <para>
+ If you want to group failures by basic types in per-transaction and
+ aggregation logs, as well as in the main and per-script reports, use the
+ option <option>--failures-detailed</option>. If you also want to distinguish
+    all errors and failures (errors without retrying) by type, including which
+    limit for retries was exceeded and by how much for the
+    serialization/deadlock failures, use the option
+    <option>--print-errors</option> or <option>--debug</option>. The former is
+    recommended for this purpose because with <option>--debug</option> the
+    output can grow significantly due to debug messages for the successful
+    commands of all transactions.
+ </para>
+ </refsect2>
+
<refsect2>
<title>Good Practices</title>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 1b25487bfc..8da11209ad 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -59,6 +59,8 @@
#include "pgbench.h"
+#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"
#define ERRCODE_UNDEFINED_TABLE "42P01"
/*
@@ -187,9 +189,33 @@ bool progress_timestamp = false; /* progress report with Unix time */
int nclients = 1; /* number of clients */
int nthreads = 1; /* number of threads */
bool is_connect; /* establish connection for each transaction */
-bool is_latencies; /* report per-command latencies */
+bool report_per_command = false; /* report per-command latencies, retries
+                                  * after errors, and failures (errors
+                                  * without retrying) */
int main_pid; /* main process id used in log filename */
+/*
+ * There are different types of restrictions for deciding that the current
+ * transaction with a serialization/deadlock error can no longer be retried and
+ * should be reported as failed:
+ * - max_tries can be used to limit the number of tries;
+ * - latency_limit can be used to limit the total time of tries.
+ *
+ * They can be combined, and at least one of them must be used to retry
+ * transactions with serialization/deadlock errors. If neither is used,
+ * max_tries defaults to 1 and such transactions will not be retried at all.
+ */
+
+/*
+ * We cannot retry the transaction after the serialization/deadlock error if its
+ * number of tries reaches this maximum; if its value is zero, it is not used.
+ */
+uint32 max_tries = 0;
+
+bool failures_detailed = false; /* whether to group failures in reports
+ * or logs by basic types */
+
char *pghost = "";
char *pgport = "";
char *login = NULL;
@@ -267,9 +293,67 @@ typedef struct SimpleStats
typedef struct StatsData
{
time_t start_time; /* interval start time, for aggregates */
- int64 cnt; /* number of transactions, including skipped */
+
+ /*
+	 * Transactions are counted depending on their execution and outcome.
+	 * First, a transaction may or may not have started: skipped transactions
+	 * occur under --rate and --latency-limit when the client is too late to
+	 * execute them. Second, a started transaction may ultimately succeed or
+	 * fail, possibly after some retries when --max-tries is not one. Thus
+ *
+ * the number of all transactions =
+	 *   skipped (it was too late to execute them) +
+ * cnt (the number of successful transactions) +
+ * failed (the number of failed transactions).
+ *
+ * A successful transaction can have several unsuccessful tries before a
+ * successful run. Thus
+ *
+ * cnt (the number of successful transactions) =
+ * successfully retried transactions (they got a serialization or a
+ * deadlock error(s), but were
+ * successfully retried from the very
+ * beginning) +
+ * directly successful transactions (they were successfully completed on
+ * the first try).
+ *
+ * A failed transaction can be one of several types:
+ *
+ * failed (the number of failed transactions) =
+ * serialization_failures (they got a serialization error and were not
+ * retried) +
+ * deadlock_failures (they got a deadlock error and were not retried) +
+ * other_sql_failures (they got a different error in the SQL command; such
+ * errors are never retried) +
+ * meta_command_failures (they got an error in the meta command; such
+ * errors are never retried).
+ *
+ * If the transaction was retried after a serialization or a deadlock error
+ * this does not guarantee that this retry was successful. Thus
+ *
+ * number of retries =
+ * number of retries in all retried transactions =
+ * number of retries in (successfully retried transactions +
+ * serialization failures +
+ * deadlock failures) transactions.
+ */
+ int64 cnt; /* number of successful transactions */
int64 skipped; /* number of transactions skipped under --rate
* and --latency-limit */
+ int64 retries; /* number of retries after a serialization or a
+ * deadlock error in all the transactions */
+ int64 retried; /* number of all transactions that were retried
+ * after a serialization or a deadlock error
+ * (perhaps the last try was unsuccessful) */
+ int64 serialization_failures; /* number of transactions that were not
+ * retried after a serialization
+ * error */
+ int64 deadlock_failures; /* number of transactions that were not
+ * retried after a deadlock error */
+ int64 other_sql_failures; /* number of transactions with a different
+ * error in the SQL command */
+ int64 meta_command_failures; /* number of transactions with an error
+ * in the meta command */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -282,6 +366,30 @@ typedef struct
unsigned short xseed[3];
} RandomState;
+/*
+ * Data structure for repeating a transaction from the beginning with the same
+ * parameters.
+ */
+typedef struct
+{
+ RandomState random_state; /* random seed */
+ Variables variables; /* client variables */
+} RetryState;
+
+/*
+ * Error status for errors during script execution.
+ */
+typedef enum EStatus
+{
+ ESTATUS_NO_ERROR = 0,
+ ESTATUS_META_COMMAND_ERROR,
+
+ /* SQL errors */
+ ESTATUS_SERIALIZATION_ERROR,
+ ESTATUS_DEADLOCK_ERROR,
+ ESTATUS_OTHER_SQL_ERROR
+} EStatus;
+
/*
* Connection state machine states.
*/
@@ -336,6 +444,35 @@ typedef enum
CSTATE_SLEEP,
CSTATE_END_COMMAND,
+ /*
+ * States for failed commands.
+ *
+	 * If an SQL/meta command fails, clean up after the error in CSTATE_ERROR:
+ * - clear the conditional stack;
+ * - if we have an unterminated (possibly failed) transaction block, send
+ * the rollback command to the server and wait for the result in
+ * CSTATE_WAIT_ROLLBACK_RESULT. If something goes wrong with rolling back,
+ * go to CSTATE_ABORTED.
+ *
+ * But if everything is ok we are ready for future transactions: if this is
+ * a serialization or deadlock error and we can re-execute the transaction
+ * from the very beginning, go to CSTATE_RETRY; otherwise go to
+ * CSTATE_FAILURE.
+ *
+ * In CSTATE_RETRY report an error, set the same parameters for the
+ * transaction execution as in the previous tries and process the first
+ * transaction command in CSTATE_START_COMMAND.
+ *
+ * In CSTATE_FAILURE report a failure, set the parameters for the
+ * transaction execution as they were before the first run of this
+ * transaction (except for a random state) and go to CSTATE_END_TX to
+ * complete this transaction.
+ */
+ CSTATE_ERROR,
+ CSTATE_WAIT_ROLLBACK_RESULT,
+ CSTATE_RETRY,
+ CSTATE_FAILURE,
+
/*
* CSTATE_END_TX performs end-of-transaction processing. Calculates
* latency, and logs the transaction. In --connect mode, closes the
@@ -382,10 +519,24 @@ typedef struct
instr_time stmt_begin; /* used for measuring statement latencies */
bool prepared[MAX_SCRIPTS]; /* whether client prepared the script */
+ bool rollback_prepared; /* whether client prepared a rollback
+ * command */
+
+ /*
+ * For processing failures and repeating transactions with serialization or
+ * deadlock errors:
+ */
+ EStatus estatus; /* the error status of the current transaction
+ * execution; this is ESTATUS_NO_ERROR if there were
+ * no errors */
+ RetryState retry_state;
+ uint32 retries; /* how many times have we already retried the
+ * current transaction after a serialization or
+ * a deadlock error? */
/* per client collected stats */
- int64 cnt; /* client transaction count, for -t */
- int ecnt; /* error count */
+ int64 cnt; /* client transaction count, for -t; skipped and
+ * failed transactions are also counted here */
} CState;
/*
@@ -491,6 +642,10 @@ typedef struct
char *argv[MAX_ARGS]; /* command word list */
PgBenchExpr *expr; /* parsed expression, if needed */
SimpleStats stats; /* time spent in this command */
+ int64 retries; /* number of retries after a serialization or a
+ * deadlock error in the current command */
+ int64 failures; /* number of errors in the current command that
+ * were not retried */
} Command;
typedef struct ParsedScript
@@ -506,7 +661,16 @@ static int num_scripts; /* number of scripts in sql_script[] */
static int num_commands = 0; /* total number of Command structs */
static int64 total_weight = 0;
-static int debug = 0; /* debug flag */
+typedef enum DebugLevel
+{
+ NO_DEBUG = 0, /* no debugging output (except PGBENCH_DEBUG) */
+ DEBUG_ERRORS, /* print only error messages, retries and
+ * failures */
+ DEBUG_ALL /* print all debugging output (throttling,
+ * executed/sent/received commands etc.) */
+} DebugLevel;
+
+static DebugLevel debug_level = NO_DEBUG; /* debug flag */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -618,15 +782,18 @@ usage(void)
" protocol for submitting queries (default: simple)\n"
" -n, --no-vacuum do not run VACUUM before tests\n"
" -P, --progress=NUM show thread progress report every NUM seconds\n"
- " -r, --report-latencies report average latency per command\n"
+ " -r, --report-per-command report latencies, failures and retries per command\n"
" -R, --rate=NUM target rate in transactions per second\n"
" -s, --scale=NUM report this scale factor in output\n"
" -t, --transactions=NUM number of transactions each client runs (default: 10)\n"
" -T, --time=NUM duration of benchmark test in seconds\n"
" -v, --vacuum-all vacuum all four standard tables before tests\n"
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
+ " --failures-detailed report the failures grouped by basic types\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
+ " --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --print-errors print messages of all errors\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1084,6 +1251,12 @@ initStats(StatsData *sd, time_t start_time)
sd->start_time = start_time;
sd->cnt = 0;
sd->skipped = 0;
+ sd->retries = 0;
+ sd->retried = 0;
+ sd->serialization_failures = 0;
+ sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
+ sd->meta_command_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1092,22 +1265,55 @@ initStats(StatsData *sd, time_t start_time)
* Accumulate one additional item into the given stats object.
*/
static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumStats(StatsData *stats, bool skipped, double lat, double lag,
+ EStatus estatus, int64 retries)
{
- stats->cnt++;
-
+ /* Record the skipped transaction */
if (skipped)
{
/* no latency to record on skipped transactions */
stats->skipped++;
+ return;
}
- else
+
+ /*
+ * Record the number of retries regardless of whether the transaction was
+ * successful or failed.
+ */
+ stats->retries += retries;
+ if (retries > 0)
+ stats->retried++;
+
+ switch (estatus)
{
- addToSimpleStats(&stats->latency, lat);
+ /* Record the successful transaction */
+ case ESTATUS_NO_ERROR:
+ stats->cnt++;
- /* and possibly the same for schedule lag */
- if (throttle_delay)
- addToSimpleStats(&stats->lag, lag);
+ addToSimpleStats(&stats->latency, lat);
+
+ /* and possibly the same for schedule lag */
+ if (throttle_delay)
+ addToSimpleStats(&stats->lag, lag);
+ break;
+
+ /* Record the failed transaction */
+ case ESTATUS_META_COMMAND_ERROR:
+ stats->meta_command_failures++;
+ break;
+ case ESTATUS_SERIALIZATION_ERROR:
+ stats->serialization_failures++;
+ break;
+ case ESTATUS_DEADLOCK_ERROR:
+ stats->deadlock_failures++;
+ break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
+ default:
+ /* internal error which should never occur */
+ fprintf(stderr, "unexpected error status: %d\n", estatus);
+ exit(1);
}
}
@@ -1340,9 +1546,10 @@ makeVariableValue(Variable *var)
if (sscanf(var->svalue, "%lf%c", &dv, &xs) != 1)
{
- fprintf(stderr,
- "malformed variable \"%s\" value: \"%s\"\n",
- var->name, var->svalue);
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr,
+ "malformed variable \"%s\" value: \"%s\"\n",
+ var->name, var->svalue);
return false;
}
setDoubleValue(&var->value, dv);
@@ -1411,7 +1618,9 @@ enlargeVariables(Variables *variables, int needed)
/*
* Lookup a variable by name, creating it if need be.
* Caller is expected to assign a value to the variable.
- * Returns NULL on failure (bad name).
+ * Returns NULL on failure (bad name). Because this can be used by client
+ * commands, print an error message only in debug mode. The caller can print his
+ * own error message.
*/
static Variable *
lookupCreateVariable(Variables *variables, const char *context, char *name)
@@ -1427,8 +1636,9 @@ lookupCreateVariable(Variables *variables, const char *context, char *name)
*/
if (!valid_variable_name(name))
{
- fprintf(stderr, "%s: invalid variable name: \"%s\"\n",
- context, name);
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr, "%s: invalid variable name: \"%s\"\n",
+ context, name);
return NULL;
}
@@ -1460,7 +1670,11 @@ putVariable(Variables *variables, const char *context, char *name,
var = lookupCreateVariable(variables, context, name);
if (!var)
+ {
+ fprintf(stderr, "%s: error while setting variable \"%s\"\n",
+ context, name);
return false;
+ }
/* dup then free, in case value is pointing at this variable */
val = pg_strdup(value);
@@ -1473,8 +1687,12 @@ putVariable(Variables *variables, const char *context, char *name,
return true;
}
-/* Assign a value to a variable, creating it if need be */
-/* Returns false on failure (bad name) */
+/*
+ * Assign a value to a variable, creating it if need be.
+ * Returns false on failure (bad name). Because this can be used by client
+ * commands, print an error message only in debug mode. The caller can print
+ * its own error message.
+ */
static bool
putVariableValue(Variables *variables, const char *context, char *name,
const PgBenchValue *value)
@@ -1483,7 +1701,12 @@ putVariableValue(Variables *variables, const char *context, char *name,
var = lookupCreateVariable(variables, context, name);
if (!var)
+ {
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr, "%s: error while setting variable \"%s\"\n",
+ context, name);
return false;
+ }
if (var->svalue)
free(var->svalue);
@@ -1493,8 +1716,12 @@ putVariableValue(Variables *variables, const char *context, char *name,
return true;
}
-/* Assign an integer value to a variable, creating it if need be */
-/* Returns false on failure (bad name) */
+/*
+ * Assign an integer value to a variable, creating it if need be.
+ * Returns false on failure (bad name). Because this can be used by client
+ * commands, print an error message only in debug mode. The caller can print
+ * its own error message.
+ */
static bool
putVariableInt(Variables *variables, const char *context, char *name,
int64 value)
@@ -1634,7 +1861,9 @@ coerceToBool(PgBenchValue *pval, bool *bval)
}
else /* NULL, INT or DOUBLE */
{
- fprintf(stderr, "cannot coerce %s to boolean\n", valueTypeName(pval));
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr, "cannot coerce %s to boolean\n",
+ valueTypeName(pval));
*bval = false; /* suppress uninitialized-variable warnings */
return false;
}
@@ -1679,7 +1908,8 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
if (dval < PG_INT64_MIN || PG_INT64_MAX < dval)
{
- fprintf(stderr, "double to int overflow for %f\n", dval);
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr, "double to int overflow for %f\n", dval);
return false;
}
*ival = (int64) dval;
@@ -1687,7 +1917,8 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
}
else /* BOOLEAN or NULL */
{
- fprintf(stderr, "cannot coerce %s to int\n", valueTypeName(pval));
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr, "cannot coerce %s to int\n", valueTypeName(pval));
return false;
}
}
@@ -1708,7 +1939,9 @@ coerceToDouble(PgBenchValue *pval, double *dval)
}
else /* BOOLEAN or NULL */
{
- fprintf(stderr, "cannot coerce %s to double\n", valueTypeName(pval));
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr, "cannot coerce %s to double\n",
+ valueTypeName(pval));
return false;
}
}
@@ -1889,8 +2122,9 @@ evalStandardFunc(TState *thread, CState *st,
if (l != NULL)
{
- fprintf(stderr,
- "too many function arguments, maximum is %d\n", MAX_FARGS);
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr,
+ "too many function arguments, maximum is %d\n", MAX_FARGS);
return false;
}
@@ -2013,7 +2247,8 @@ evalStandardFunc(TState *thread, CState *st,
case PGBENCH_MOD:
if (ri == 0)
{
- fprintf(stderr, "division by zero\n");
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr, "division by zero\n");
return false;
}
/* special handling of -1 divisor */
@@ -2024,7 +2259,9 @@ evalStandardFunc(TState *thread, CState *st,
/* overflow check (needed for INT64_MIN) */
if (li == PG_INT64_MIN)
{
- fprintf(stderr, "bigint out of range\n");
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr,
+ "bigint out of range\n");
return false;
}
else
@@ -2264,13 +2501,15 @@ evalStandardFunc(TState *thread, CState *st,
/* check random range */
if (imin > imax)
{
- fprintf(stderr, "empty range given to random\n");
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr, "empty range given to random\n");
return false;
}
else if (imax - imin < 0 || (imax - imin) + 1 < 0)
{
/* prevent int overflows in random functions */
- fprintf(stderr, "random range is too large\n");
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr, "random range is too large\n");
return false;
}
@@ -2292,9 +2531,10 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param < MIN_GAUSSIAN_PARAM)
{
- fprintf(stderr,
- "gaussian parameter must be at least %f "
- "(not %f)\n", MIN_GAUSSIAN_PARAM, param);
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr,
+ "gaussian parameter must be at least %f (not %f)\n",
+ MIN_GAUSSIAN_PARAM, param);
return false;
}
@@ -2306,9 +2546,10 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param <= 0.0 || param == 1.0 || param > MAX_ZIPFIAN_PARAM)
{
- fprintf(stderr,
- "zipfian parameter must be in range (0, 1) U (1, %d]"
- " (got %f)\n", MAX_ZIPFIAN_PARAM, param);
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr,
+ "zipfian parameter must be in range (0, 1) U (1, %d] (got %f)\n",
+ MAX_ZIPFIAN_PARAM, param);
return false;
}
setIntValue(retval,
@@ -2320,9 +2561,10 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param <= 0.0)
{
- fprintf(stderr,
- "exponential parameter must be greater than zero"
- " (got %f)\n", param);
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr,
+ "exponential parameter must be greater than zero (got %f)\n",
+ param);
return false;
}
@@ -2433,8 +2675,9 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
- fprintf(stderr, "undefined variable \"%s\"\n",
- expr->u.variable.varname);
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr, "undefined variable \"%s\"\n",
+ expr->u.variable.varname);
return false;
}
@@ -2528,15 +2771,17 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
}
else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
- fprintf(stderr, "%s: undefined variable \"%s\"\n",
- argv[0], argv[i]);
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr, "%s: undefined variable \"%s\"\n",
+ argv[0], argv[i]);
return false;
}
arglen = strlen(arg);
if (len + arglen + (i > 0 ? 1 : 0) >= SHELL_COMMAND_SIZE - 1)
{
- fprintf(stderr, "%s: shell command is too long\n", argv[0]);
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr, "%s: shell command is too long\n", argv[0]);
return false;
}
@@ -2553,7 +2798,7 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
if (system(command))
{
- if (!timer_exceeded)
+ if (!timer_exceeded && debug_level >= DEBUG_ERRORS)
fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
return false;
}
@@ -2563,19 +2808,21 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
/* Execute the command with pipe and read the standard output. */
if ((fp = popen(command, "r")) == NULL)
{
- fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
return false;
}
if (fgets(res, sizeof(res), fp) == NULL)
{
- if (!timer_exceeded)
+ if (!timer_exceeded && debug_level >= DEBUG_ERRORS)
fprintf(stderr, "%s: could not read result of shell command\n", argv[0]);
(void) pclose(fp);
return false;
}
if (pclose(fp) < 0)
{
- fprintf(stderr, "%s: could not close shell command\n", argv[0]);
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr, "%s: could not close shell command\n", argv[0]);
return false;
}
@@ -2585,8 +2832,10 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
endptr++;
if (*res == '\0' || *endptr != '\0')
{
- fprintf(stderr, "%s: shell command must return an integer (not \"%s\")\n",
- argv[0], res);
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr,
+ "%s: shell command must return an integer (not \"%s\")\n",
+ argv[0], res);
return false;
}
if (!putVariableInt(variables, "setshell", variable, retval))
@@ -2605,14 +2854,31 @@ preparedStatementName(char *buffer, int file, int state)
sprintf(buffer, "P%d_%d", file, state);
}
+/*
+ * Report an error in a command during script execution.
+ */
static void
commandFailed(CState *st, const char *cmd, const char *message)
{
fprintf(stderr,
- "client %d aborted in command %d (%s) of script %d; %s\n",
+ "client %d got an error in command %d (%s) of script %d; %s\n",
st->id, st->command, cmd, st->use_file, message);
}
+/*
+ * Report that the client was aborted while processing an SQL command.
+ */
+static void
+clientAborted(CState *st, const char *message)
+{
+ const Command *command = sql_script[st->use_file].commands[st->command];
+
+ Assert(command->type == SQL_COMMAND);
+ fprintf(stderr,
+ "client %d aborted in command %d (SQL) of script %d; %s\n",
+ st->id, st->command, st->use_file, message);
+}
+
/* return a script number with a weighted choice. */
static int
chooseScript(TState *thread)
@@ -2645,7 +2911,7 @@ sendCommand(CState *st, Command *command)
sql = pg_strdup(command->argv[0]);
sql = assignVariables(&st->variables, sql);
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d sending %s\n", st->id, sql);
r = PQsendQuery(st->con, sql);
free(sql);
@@ -2657,7 +2923,7 @@ sendCommand(CState *st, Command *command)
getQueryParams(&st->variables, command, params);
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d sending %s\n", st->id, sql);
r = PQsendQueryParams(st->con, sql, command->argc - 1,
NULL, params, NULL, NULL, 0);
@@ -2692,7 +2958,7 @@ sendCommand(CState *st, Command *command)
getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d sending %s\n", st->id, name);
r = PQsendQueryPrepared(st->con, name, command->argc - 1,
params, NULL, NULL, 0);
@@ -2702,10 +2968,62 @@ sendCommand(CState *st, Command *command)
if (r == 0)
{
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d could not send %s\n",
st->id, command->argv[0]);
- st->ecnt++;
+ return false;
+ }
+ else
+ return true;
+}
+
+/* Send a rollback command, using the chosen querymode */
+static bool
+sendRollback(CState *st)
+{
+ static const char *rollback_cmd = "ROLLBACK;";
+ static const char *prepared_name = "P_rollback"; /* for QUERY_PREPARED */
+ int r;
+
+ if (querymode == QUERY_SIMPLE)
+ {
+ if (debug_level >= DEBUG_ALL)
+ fprintf(stderr, "client %d sending %s\n", st->id, rollback_cmd);
+ r = PQsendQuery(st->con, rollback_cmd);
+ }
+ else if (querymode == QUERY_EXTENDED)
+ {
+ if (debug_level >= DEBUG_ALL)
+ fprintf(stderr, "client %d sending %s\n", st->id, rollback_cmd);
+ r = PQsendQueryParams(st->con, rollback_cmd, 0,
+ NULL, NULL, NULL, NULL, 0);
+ }
+ else if (querymode == QUERY_PREPARED)
+ {
+ if (!st->rollback_prepared)
+ {
+ PGresult *res;
+
+ res = PQprepare(st->con, prepared_name, rollback_cmd, 0, NULL);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ fprintf(stderr, "%s", PQerrorMessage(st->con));
+ PQclear(res);
+ st->rollback_prepared = true;
+ }
+
+ if (debug_level >= DEBUG_ALL)
+ fprintf(stderr, "client %d sending %s\n", st->id, prepared_name);
+ r = PQsendQueryPrepared(st->con, prepared_name, 0,
+ NULL, NULL, NULL, 0);
+ }
+ else /* unknown sql mode */
+ r = 0;
+
+ if (r == 0)
+ {
+ if (debug_level >= DEBUG_ALL)
+ fprintf(stderr, "client %d could not send %s\n",
+ st->id, rollback_cmd);
return false;
}
else
@@ -2726,8 +3044,9 @@ evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
- fprintf(stderr, "%s: undefined variable \"%s\"\n",
- argv[0], argv[1]);
+ if (debug_level >= DEBUG_ERRORS)
+ fprintf(stderr, "%s: undefined variable \"%s\"\n",
+ argv[0], argv[1]);
return false;
}
usec = atoi(var);
@@ -2749,6 +3068,192 @@ evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
return true;
}
+/*
+ * Clear the variables in the array. The array itself is not freed.
+ */
+static void
+clearVariables(Variables *variables)
+{
+ Variable *vars,
+ *var;
+ int nvars;
+
+ if (!variables)
+ return; /* nothing to do here */
+
+ vars = variables->vars;
+ nvars = variables->nvars;
+ for (var = vars; var - vars < nvars; ++var)
+ {
+ pg_free(var->name);
+ pg_free(var->svalue);
+ }
+
+ variables->nvars = 0;
+}
+
+/*
+ * Make a deep copy of the variables array.
+ * Before copying, the function frees the string fields of the destination
+ * variables and enlarges their array if necessary.
+ */
+static void
+copyVariables(Variables *dest, const Variables *source)
+{
+ Variable *dest_var;
+ const Variable *source_var;
+
+ if (!dest || !source || dest == source)
+ return; /* nothing to do here */
+
+ /*
+ * Clear the original variables and make sure that we have enough space for
+ * the new variables.
+ */
+ clearVariables(dest);
+ enlargeVariables(dest, source->nvars);
+
+ /* Make a deep copy of variables array */
+ for (source_var = source->vars, dest_var = dest->vars;
+ source_var - source->vars < source->nvars;
+ ++source_var, ++dest_var)
+ {
+ dest_var->name = pg_strdup(source_var->name);
+ if (source_var->svalue == NULL)
+ dest_var->svalue = NULL;
+ else
+ dest_var->svalue = pg_strdup(source_var->svalue);
+ dest_var->value = source_var->value;
+ }
+ dest->nvars = source->nvars;
+ dest->vars_sorted = source->vars_sorted;
+}
+
+/*
+ * Get the error status from the SQLSTATE of the error.
+ */
+static EStatus
+getSQLErrorStatus(const char *sqlState)
+{
+ if (sqlState)
+ {
+ if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
+ return ESTATUS_SERIALIZATION_ERROR;
+ else if (strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
+ return ESTATUS_DEADLOCK_ERROR;
+ }
+
+ return ESTATUS_OTHER_SQL_ERROR;
+}
+
+/*
+ * Returns true if this type of error can be retried.
+ */
+static bool
+canRetryError(EStatus estatus)
+{
+ return (estatus == ESTATUS_SERIALIZATION_ERROR ||
+ estatus == ESTATUS_DEADLOCK_ERROR);
+}
+
+/*
+ * Returns true if the failed transaction should be retried, i.e. the error
+ * is retryable and the retry limits have not been exhausted.
+ */
+static bool
+doRetry(CState *st, instr_time *now)
+{
+ Assert(st->estatus != ESTATUS_NO_ERROR);
+
+ /* We can only retry serialization or deadlock errors. */
+ if (!canRetryError(st->estatus))
+ return false;
+
+ /*
+ * We must have at least one option to limit the retrying of transactions
+ * that got an error.
+ */
+ Assert(max_tries || latency_limit);
+
+ /*
+ * We cannot retry the error if we have reached the maximum number of tries.
+ */
+ if (max_tries && st->retries + 1 >= max_tries)
+ return false;
+
+ /*
+ * We cannot retry the error if we spent too much time on this transaction.
+ */
+ if (latency_limit)
+ {
+ if (INSTR_TIME_IS_ZERO(*now))
+ INSTR_TIME_SET_CURRENT(*now);
+
+ if (INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled > latency_limit)
+ return false;
+ }
+
+ /* OK */
+ return true;
+}
+
+/*
+ * Set in_tx_block to true if we are in a (failed) transaction block and false
+ * otherwise.
+ * Returns false on failure (broken connection or internal error).
+ */
+static bool
+getTransactionStatus(PGconn *con, bool *in_tx_block)
+{
+ PGTransactionStatusType tx_status;
+
+ tx_status = PQtransactionStatus(con);
+ switch (tx_status)
+ {
+ case PQTRANS_IDLE:
+ *in_tx_block = false;
+ break;
+ case PQTRANS_INTRANS:
+ case PQTRANS_INERROR:
+ *in_tx_block = true;
+ break;
+ case PQTRANS_UNKNOWN:
+ /* PQTRANS_UNKNOWN is expected given a broken connection */
+ if (PQstatus(con) == CONNECTION_BAD)
+ { /* there's something wrong */
+ fprintf(stderr, "perhaps the backend died while processing\n");
+ return false;
+ }
+ /* fall through */
+ case PQTRANS_ACTIVE:
+ default:
+ /*
+ * We cannot find out whether we are in a transaction block or not.
+ * Internal error which should never occur.
+ */
+ fprintf(stderr, "unexpected transaction status %d\n", tx_status);
+ return false;
+ }
+
+ /* OK */
+ return true;
+}
+
+/*
+ * If the latency limit is used, return the current transaction latency as a
+ * percentage of the latency limit. Otherwise return zero.
+ */
+static double
+getLatencyUsed(CState *st, instr_time *now)
+{
+ if (!latency_limit)
+ return 0.0;
+
+ if (INSTR_TIME_IS_ZERO(*now))
+ INSTR_TIME_SET_CURRENT(*now);
+
+ return (100.0 * (INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled) /
+ latency_limit);
+}
+
/*
* Advance the state machine of a connection, if possible.
*/
@@ -2790,9 +3295,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->use_file = chooseScript(thread);
- if (debug)
- fprintf(stderr, "client %d executing script \"%s\"\n", st->id,
- sql_script[st->use_file].desc);
+ if (debug_level >= DEBUG_ALL)
+ fprintf(stderr, "client %d executing script \"%s\"\n",
+ st->id, sql_script[st->use_file].desc);
if (throttle_delay > 0)
st->state = CSTATE_START_THROTTLE;
@@ -2800,6 +3305,10 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->state = CSTATE_START_TX;
/* check consistency */
Assert(conditional_stack_empty(st->cstack));
+
+ /* reset transaction variables to default values */
+ st->estatus = ESTATUS_NO_ERROR;
+ st->retries = 0;
break;
/*
@@ -2865,7 +3374,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
st->state = CSTATE_THROTTLE;
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d throttling " INT64_FORMAT " us\n",
st->id, wait);
break;
@@ -2909,8 +3418,17 @@ doCustom(TState *thread, CState *st, StatsData *agg)
/* Reset session-local state */
memset(st->prepared, 0, sizeof(st->prepared));
+ st->rollback_prepared = false;
}
+ /*
+ * This is the first try of this transaction. Remember its
+ * parameters: it may get an error and have to be run again.
+ */
+ st->retry_state.random_state = st->random_state;
+ copyVariables(&st->retry_state.variables, &st->variables);
+
/*
* Record transaction start time under logging, progress or
* throttling.
@@ -2955,7 +3473,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Record statement start time if per-command latencies are
* requested
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -2966,7 +3484,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
{
if (!sendCommand(st, command))
{
- commandFailed(st, "SQL", "SQL command send failed");
+ clientAborted(st, "SQL command send failed");
st->state = CSTATE_ABORTED;
}
else
@@ -2978,7 +3496,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
i;
char **argv = command->argv;
- if (debug)
+ if (debug_level >= DEBUG_ALL)
{
fprintf(stderr, "client %d executing \\%s", st->id, argv[0]);
for (i = 1; i < argc; i++)
@@ -2999,8 +3517,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
- commandFailed(st, "sleep", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ if (debug_level >= DEBUG_ERRORS)
+ commandFailed(st, "sleep",
+ "execution of meta-command failed");
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
+ st->state = CSTATE_ERROR;
break;
}
@@ -3031,8 +3552,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (!evaluateExpr(thread, st, expr, &result))
{
- commandFailed(st, argv[0], "evaluation of meta-command failed");
- st->state = CSTATE_ABORTED;
+ if (debug_level >= DEBUG_ERRORS)
+ commandFailed(st, argv[0],
+ "evaluation of meta-command failed");
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
+ st->state = CSTATE_ERROR;
break;
}
@@ -3041,8 +3565,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (!putVariableValue(&st->variables, argv[0],
argv[1], &result))
{
- commandFailed(st, "set", "assignment of meta-command failed");
- st->state = CSTATE_ABORTED;
+ if (debug_level >= DEBUG_ERRORS)
+ commandFailed(st, "set",
+ "assignment of meta-command failed");
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
+ st->state = CSTATE_ERROR;
break;
}
}
@@ -3103,8 +3630,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (!ret) /* on error */
{
- commandFailed(st, "setshell", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ if (debug_level >= DEBUG_ERRORS)
+ commandFailed(st, "setshell",
+ "execution of meta-command failed");
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
+ st->state = CSTATE_ERROR;
break;
}
else
@@ -3124,8 +3654,11 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (!ret) /* on error */
{
- commandFailed(st, "shell", "execution of meta-command failed");
- st->state = CSTATE_ABORTED;
+ if (debug_level >= DEBUG_ERRORS)
+ commandFailed(st, "shell",
+ "execution of meta-command failed");
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
+ st->state = CSTATE_ERROR;
break;
}
else
@@ -3242,11 +3775,12 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
case CSTATE_WAIT_RESULT:
command = sql_script[st->use_file].commands[st->command];
- if (debug)
+ if (debug_level >= DEBUG_ALL)
fprintf(stderr, "client %d receiving\n", st->id);
if (!PQconsumeInput(st->con))
{ /* there's something wrong */
- commandFailed(st, "SQL", "perhaps the backend died while processing");
+ clientAborted(st,
+ "perhaps the backend died while processing");
st->state = CSTATE_ABORTED;
break;
}
@@ -3263,12 +3797,66 @@ doCustom(TState *thread, CState *st, StatsData *agg)
case PGRES_TUPLES_OK:
case PGRES_EMPTY_QUERY:
/* OK */
+ st->estatus = ESTATUS_NO_ERROR;
PQclear(res);
discard_response(st);
st->state = CSTATE_END_COMMAND;
break;
+ case PGRES_NONFATAL_ERROR:
+ case PGRES_FATAL_ERROR:
+ st->estatus = getSQLErrorStatus(
+ PQresultErrorField(res, PG_DIAG_SQLSTATE));
+ if (debug_level >= DEBUG_ERRORS)
+ commandFailed(st, "SQL", PQerrorMessage(st->con));
+ PQclear(res);
+ discard_response(st);
+ st->state = CSTATE_ERROR;
+ break;
default:
- commandFailed(st, "SQL", PQerrorMessage(st->con));
+ clientAborted(st, PQerrorMessage(st->con));
+ PQclear(res);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ break;
+
+ /*
+ * Wait for the rollback command to complete
+ */
+ case CSTATE_WAIT_ROLLBACK_RESULT:
+ if (debug_level >= DEBUG_ALL)
+ fprintf(stderr, "client %d receiving\n", st->id);
+ if (!PQconsumeInput(st->con))
+ {
+ fprintf(stderr,
+ "client %d aborted while rolling back the transaction after an error; perhaps the backend died while processing\n",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ if (PQisBusy(st->con))
+ return; /* don't have the whole result yet */
+
+ /*
+ * Read and discard the query result.
+ */
+ res = PQgetResult(st->con);
+ switch (PQresultStatus(res))
+ {
+ case PGRES_COMMAND_OK:
+ /* OK */
+ PQclear(res);
+ discard_response(st);
+ /* Check if we can retry the error. */
+ if (doRetry(st, &now))
+ st->state = CSTATE_RETRY;
+ else
+ st->state = CSTATE_FAILURE;
+ break;
+ default:
+ fprintf(stderr,
+ "client %d aborted while rolling back the transaction after an error; %s\n",
+ st->id, PQerrorMessage(st->con));
PQclear(res);
st->state = CSTATE_ABORTED;
break;
@@ -3300,7 +3888,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* in thread-local data structure, if per-command latencies
* are requested.
*/
- if (is_latencies)
+ if (report_per_command)
{
if (INSTR_TIME_IS_ZERO(now))
INSTR_TIME_SET_CURRENT(now);
@@ -3319,52 +3907,215 @@ doCustom(TState *thread, CState *st, StatsData *agg)
break;
/*
- * End of transaction.
+ * Clean up after an error.
*/
- case CSTATE_END_TX:
+ case CSTATE_ERROR:
+ {
+ bool in_tx_block;
- /* transaction finished: calculate latency and do log */
- processXactStats(thread, st, &now, false, agg);
+ Assert(st->estatus != ESTATUS_NO_ERROR);
- /* conditional stack must be empty */
- if (!conditional_stack_empty(st->cstack))
- {
- fprintf(stderr, "end of script reached within a conditional, missing \\endif\n");
- exit(1);
+ /* Clear the conditional stack */
+ conditional_stack_reset(st->cstack);
+
+ /*
+ * Check if we have a (failed) transaction block or not, and
+ * roll it back if so.
+ */
+
+ if (!getTransactionStatus(st->con, &in_tx_block))
+ {
+ /*
+ * There's something wrong...
+ * It is assumed that the function getTransactionStatus
+ * has already printed a more detailed error message.
+ */
+ fprintf(stderr,
+ "client %d aborted while receiving the transaction status\n",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+
+ if (in_tx_block)
+ {
+ /* Try to rollback a (failed) transaction block. */
+ if (!sendRollback(st))
+ {
+ fprintf(stderr,
+ "client %d aborted: failed to send SQL command for rolling back the failed transaction\n",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ }
+ else
+ st->state = CSTATE_WAIT_ROLLBACK_RESULT;
+ }
+ else
+ {
+ /* Check if we can retry the error. */
+ if (doRetry(st, &now))
+ st->state = CSTATE_RETRY;
+ else
+ st->state = CSTATE_FAILURE;
+ }
}
+ break;
+
+ /*
+ * Retry the transaction after an error.
+ */
+ case CSTATE_RETRY:
+ command = sql_script[st->use_file].commands[st->command];
+
+ /* Accumulate the retry. */
+ st->retries++;
+ if (report_per_command)
+ command->retries++;
- if (is_connect)
+ /*
+ * Report that the transaction will be retried after the error.
+ */
+ if (debug_level >= DEBUG_ERRORS)
{
- finishCon(st);
- INSTR_TIME_SET_ZERO(now);
+ fprintf(stderr,
+ "client %d retries the transaction after the error (try %d",
+ st->id, st->retries + 1);
+ if (max_tries)
+ fprintf(stderr, "/%d", max_tries);
+ if (latency_limit)
+ fprintf(stderr,
+ ", %.3f%% of the maximum time of tries was used",
+ getLatencyUsed(st, &now));
+ fprintf(stderr, ")\n");
}
- if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
+ /*
+ * Reset the execution parameters as they were at the beginning
+ * of the transaction.
+ */
+ st->random_state = st->retry_state.random_state;
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* Process the first transaction command. */
+ st->command = 0;
+ st->estatus = ESTATUS_NO_ERROR;
+ st->state = CSTATE_START_COMMAND;
+ break;
+
+ /*
+ * Complete the failed transaction.
+ */
+ case CSTATE_FAILURE:
+ command = sql_script[st->use_file].commands[st->command];
+
+ /* Accumulate the failure. */
+ if (report_per_command)
+ command->failures++;
+
+ /*
+ * If this is a serialization or deadlock failure, report that
+ * the failed transaction will not be retried.
+ */
+ if (debug_level >= DEBUG_ERRORS && canRetryError(st->estatus))
{
- /* exit success */
- st->state = CSTATE_FINISHED;
- break;
+ fprintf(stderr,
+ "client %d ends the failed transaction (try %d",
+ st->id, st->retries + 1);
+ if (max_tries)
+ fprintf(stderr, "/%d", max_tries);
+ if (latency_limit)
+ fprintf(stderr,
+ ", %.3f%% of the maximum time of tries was used",
+ getLatencyUsed(st, &now));
+ fprintf(stderr, ")\n");
}
/*
- * No transaction is underway anymore.
+ * Reset the execution parameters as they were at the beginning
+ * of the transaction, except for the random state.
*/
- st->state = CSTATE_CHOOSE_SCRIPT;
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* End the failed transaction. */
+ st->state = CSTATE_END_TX;
+ break;
/*
- * If we paced through all commands in the script in this
- * loop, without returning to the caller even once, do it now.
- * This gives the thread a chance to process other
- * connections, and to do progress reporting. This can
- * currently only happen if the script consists entirely of
- * meta-commands.
+ * End of transaction.
*/
- if (end_tx_processed)
- return;
- else
+ case CSTATE_END_TX:
{
- end_tx_processed = true;
- break;
+ bool in_tx_block;
+
+ /* transaction finished: calculate latency and do log */
+ processXactStats(thread, st, &now, false, agg);
+
+ /* conditional stack must be empty */
+ if (!conditional_stack_empty(st->cstack))
+ {
+ fprintf(stderr, "end of script reached within a conditional, missing \\endif\n");
+ exit(1);
+ }
+
+ /*
+ * We must complete all the transaction blocks that were
+ * started in this script.
+ */
+ if (!getTransactionStatus(st->con, &in_tx_block))
+ {
+ /*
+ * There's something wrong...
+ * It is assumed that the function getTransactionStatus
+ * has already printed a more detailed error message.
+ */
+ fprintf(stderr,
+ "client %d aborted while receiving the transaction status\n",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ if (in_tx_block)
+ {
+ fprintf(stderr,
+ "client %d aborted: end of script reached without completing the last transaction\n",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+
+ if (is_connect)
+ {
+ finishCon(st);
+ INSTR_TIME_SET_ZERO(now);
+ }
+
+ if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
+ {
+ /* exit success */
+ st->state = CSTATE_FINISHED;
+ break;
+ }
+
+ /*
+ * No transaction is underway anymore.
+ */
+ st->state = CSTATE_CHOOSE_SCRIPT;
+
+ /*
+ * If we paced through all commands in the script in this
+ * loop, without returning to the caller even once, do it now.
+ * This gives the thread a chance to process other
+ * connections, and to do progress reporting. This can
+ * currently only happen if the script consists entirely of
+ * meta-commands.
+ */
+ if (end_tx_processed)
+ return;
+ else
+ {
+ end_tx_processed = true;
+ break;
+ }
}
/*
@@ -3378,6 +4129,15 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
}
+static int64
+getFailures(const StatsData *stats)
+{
+ return (stats->serialization_failures +
+ stats->deadlock_failures +
+ stats->other_sql_failures +
+ stats->meta_command_failures);
+}
+
/*
* Print log entry after completing one transaction.
*
@@ -3422,6 +4182,16 @@ doLog(TState *thread, CState *st,
agg->latency.sum2,
agg->latency.min,
agg->latency.max);
+
+ if (failures_detailed)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
+ agg->serialization_failures,
+ agg->deadlock_failures,
+ agg->other_sql_failures,
+ agg->meta_command_failures);
+ else
+ fprintf(logfile, " " INT64_FORMAT, getFailures(agg));
+
if (throttle_delay)
{
fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -3432,6 +4202,10 @@ doLog(TState *thread, CState *st,
if (latency_limit)
fprintf(logfile, " " INT64_FORMAT, agg->skipped);
}
+ if (max_tries != 1)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->retried,
+ agg->retries);
fputc('\n', logfile);
/* reset data and move to next interval */
@@ -3439,7 +4213,7 @@ doLog(TState *thread, CState *st,
}
/* accumulate the current transaction */
- accumStats(agg, skipped, latency, lag);
+ accumStats(agg, skipped, latency, lag, st->estatus, st->retries);
}
else
{
@@ -3451,12 +4225,50 @@ doLog(TState *thread, CState *st,
fprintf(logfile, "%d " INT64_FORMAT " skipped %d %ld %ld",
st->id, st->cnt, st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
- else
+ else if (st->estatus == ESTATUS_NO_ERROR)
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d %ld %ld",
st->id, st->cnt, latency, st->use_file,
(long) tv.tv_sec, (long) tv.tv_usec);
+ else if (failures_detailed)
+ {
+ switch (st->estatus)
+ {
+ case ESTATUS_META_COMMAND_ERROR:
+ fprintf(logfile, "%d " INT64_FORMAT " meta_command_failure %d %ld %ld",
+ st->id, st->cnt, st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+ break;
+ case ESTATUS_SERIALIZATION_ERROR:
+ fprintf(logfile, "%d " INT64_FORMAT " serialization_failure %d %ld %ld",
+ st->id, st->cnt, st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+ break;
+ case ESTATUS_DEADLOCK_ERROR:
+ fprintf(logfile, "%d " INT64_FORMAT " deadlock_failure %d %ld %ld",
+ st->id, st->cnt, st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+ break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ fprintf(logfile, "%d " INT64_FORMAT " other_sql_failure %d %ld %ld",
+ st->id, st->cnt, st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+ break;
+ default:
+ /* internal error which should never occur */
+ fprintf(stderr, "unexpected error status: %d\n",
+ st->estatus);
+ exit(1);
+ }
+ }
+ else
+ fprintf(logfile, "%d " INT64_FORMAT " failed %d %ld %ld",
+ st->id, st->cnt, st->use_file,
+ (long) tv.tv_sec, (long) tv.tv_usec);
+
if (throttle_delay)
fprintf(logfile, " %.0f", lag);
+ if (max_tries != 1)
+ fprintf(logfile, " %d", st->retries);
fputc('\n', logfile);
}
}
@@ -3465,7 +4277,8 @@ doLog(TState *thread, CState *st,
* Accumulate and report statistics at end of a transaction.
*
* (This is also called when a transaction is late and thus skipped.
- * Note that even skipped transactions are counted in the "cnt" fields.)
+ * Note that even skipped and failed transactions are counted in the CState
+ * "cnt" field.)
*/
static void
processXactStats(TState *thread, CState *st, instr_time *now,
@@ -3473,10 +4286,10 @@ processXactStats(TState *thread, CState *st, instr_time *now,
{
double latency = 0.0,
lag = 0.0;
- bool thread_details = progress || throttle_delay || latency_limit,
- detailed = thread_details || use_log || per_script_stats;
+ bool detailed = progress || throttle_delay || latency_limit ||
+ use_log || per_script_stats;
- if (detailed && !skipped)
+ if (detailed && !skipped && st->estatus == ESTATUS_NO_ERROR)
{
if (INSTR_TIME_IS_ZERO(*now))
INSTR_TIME_SET_CURRENT(*now);
@@ -3486,20 +4299,12 @@ processXactStats(TState *thread, CState *st, instr_time *now,
lag = INSTR_TIME_GET_MICROSEC(st->txn_begin) - st->txn_scheduled;
}
- if (thread_details)
- {
- /* keep detailed thread stats */
- accumStats(&thread->stats, skipped, latency, lag);
+ /* keep detailed thread stats */
+ accumStats(&thread->stats, skipped, latency, lag, st->estatus, st->retries);
- /* count transactions over the latency limit, if needed */
- if (latency_limit && latency > latency_limit)
- thread->latency_late++;
- }
- else
- {
- /* no detailed stats, just count */
- thread->stats.cnt++;
- }
+ /* count transactions over the latency limit, if needed */
+ if (latency_limit && latency > latency_limit)
+ thread->latency_late++;
/* client stat is just counting */
st->cnt++;
@@ -3509,7 +4314,8 @@ processXactStats(TState *thread, CState *st, instr_time *now,
/* XXX could use a mutex here, but we choose not to */
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+ st->estatus, st->retries);
}
@@ -4644,15 +5450,16 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
double time_include,
tps_include,
tps_exclude;
- int64 ntx = total->cnt - total->skipped;
+ int64 failures = getFailures(total);
+ int64 total_cnt = total->cnt + total->skipped + failures;
int i,
totalCacheOverflows = 0;
time_include = INSTR_TIME_GET_DOUBLE(total_time);
/* tps is about actually executed transactions */
- tps_include = ntx / time_include;
- tps_exclude = ntx /
+ tps_include = total->cnt / time_include;
+ tps_exclude = total->cnt /
(time_include - (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));
/* Report test parameters. */
@@ -4666,14 +5473,55 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
{
printf("number of transactions per client: %d\n", nxacts);
printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
- ntx, nxacts * nclients);
+ total->cnt, nxacts * nclients);
}
else
{
printf("duration: %d s\n", duration);
printf("number of transactions actually processed: " INT64_FORMAT "\n",
- ntx);
+ total->cnt);
+ }
+
+ if (failures > 0)
+ {
+ printf("number of failures: " INT64_FORMAT " (%.3f%%)\n",
+ failures, 100.0 * failures / total_cnt);
+
+ if (failures_detailed)
+ {
+ /* SQL failures */
+ if (total->serialization_failures || total->other_sql_failures)
+ printf("number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->serialization_failures,
+ 100.0 * total->serialization_failures / total_cnt);
+ if (total->deadlock_failures || total->other_sql_failures)
+ printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->deadlock_failures,
+ 100.0 * total->deadlock_failures / total_cnt);
+ if (total->other_sql_failures)
+ printf("number of other SQL failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
+
+ /* meta command failures */
+ if (total->meta_command_failures > 0)
+ printf("number of meta-command failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->meta_command_failures,
+ 100.0 * total->meta_command_failures / total_cnt);
+ }
+ }
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (total->retried > 0)
+ {
+ printf("number of retried: " INT64_FORMAT " (%.3f%%)\n",
+ total->retried, 100.0 * total->retried / total_cnt);
+ printf("number of retries: " INT64_FORMAT "\n", total->retries);
}
+
+ if (max_tries)
+ printf("maximum number of tries: %d\n", max_tries);
+
/* Report zipfian cache overflow */
for (i = 0; i < nthreads; i++)
{
@@ -4685,26 +5533,27 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
}
/* Remaining stats are nonsensical if we failed to execute any xacts */
- if (total->cnt <= 0)
+ if (total->cnt + total->skipped <= 0)
return;
if (throttle_delay && latency_limit)
printf("number of transactions skipped: " INT64_FORMAT " (%.3f %%)\n",
total->skipped,
- 100.0 * total->skipped / total->cnt);
+ 100.0 * total->skipped / total_cnt);
if (latency_limit)
printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "/" INT64_FORMAT " (%.3f %%)\n",
- latency_limit / 1000.0, latency_late, ntx,
- (ntx > 0) ? 100.0 * latency_late / ntx : 0.0);
+ latency_limit / 1000.0, latency_late, total->cnt,
+ (total->cnt > 0) ? 100.0 * latency_late / total->cnt : 0.0);
if (throttle_delay || progress || latency_limit)
printSimpleStats("latency", &total->latency);
else
{
/* no measurement, show average latency computed from run time */
- printf("latency average = %.3f ms\n",
- 1000.0 * time_include * nclients / total->cnt);
+ printf("latency average = %.3f ms%s\n",
+ 1000.0 * time_include * nclients / total_cnt,
+ failures > 0 ? " (including failures)" : "");
}
if (throttle_delay)
@@ -4723,7 +5572,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
printf("tps = %f (excluding connections establishing)\n", tps_exclude);
/* Report per-script/command statistics */
- if (per_script_stats || is_latencies)
+ if (per_script_stats || report_per_command)
{
int i;
@@ -4732,6 +5581,9 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
if (per_script_stats)
{
StatsData *sstats = &sql_script[i].stats;
+ int64 script_failures = getFailures(sstats);
+ int64 script_total_cnt =
+ sstats->cnt + sstats->skipped + script_failures;
printf("SQL script %d: %s\n"
" - weight: %d (targets %.1f%% of total)\n"
@@ -4741,25 +5593,75 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
100.0 * sql_script[i].weight / total_weight,
sstats->cnt,
100.0 * sstats->cnt / total->cnt,
- (sstats->cnt - sstats->skipped) / time_include);
+ sstats->cnt / time_include);
+
+ if (failures > 0)
+ {
+ printf(" - number of failures: " INT64_FORMAT " (%.3f%%)\n",
+ script_failures,
+ 100.0 * script_failures / script_total_cnt);
+
+ if (failures_detailed)
+ {
+ /* SQL failures */
+ if (total->serialization_failures ||
+ total->other_sql_failures)
+ printf(" - number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->serialization_failures,
+ (100.0 * sstats->serialization_failures /
+ script_total_cnt));
+ if (total->deadlock_failures ||
+ total->other_sql_failures)
+ printf(" - number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->deadlock_failures,
+ (100.0 * sstats->deadlock_failures /
+ script_total_cnt));
+ if (total->other_sql_failures)
+ printf(" - number of other SQL failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
+
+ /* meta command failures */
+ if (total->meta_command_failures > 0)
+ printf(" - number of meta-command failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->meta_command_failures,
+ (100.0 * sstats->meta_command_failures /
+ script_total_cnt));
+ }
+ }
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (total->retried > 0)
+ {
+ printf(" - number of retried: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->retried,
+ 100.0 * sstats->retried / script_total_cnt);
+ printf(" - number of retries: " INT64_FORMAT "\n",
+ sstats->retries);
+ }
- if (throttle_delay && latency_limit && sstats->cnt > 0)
+ if (throttle_delay && latency_limit && script_total_cnt > 0)
printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
sstats->skipped,
- 100.0 * sstats->skipped / sstats->cnt);
+ 100.0 * sstats->skipped / script_total_cnt);
printSimpleStats(" - latency", &sstats->latency);
}
- /* Report per-command latencies */
- if (is_latencies)
+ /*
+ * Report per-command statistics: latencies, retries after errors,
+ * failures (errors without retrying).
+ */
+ if (report_per_command)
{
Command **commands;
- if (per_script_stats)
- printf(" - statement latencies in milliseconds:\n");
- else
- printf("statement latencies in milliseconds:\n");
+ printf("%sstatement latencies in milliseconds%s:\n",
+ per_script_stats ? " - " : "",
+ (max_tries == 1 ?
+ " and failures" :
+ ", failures and retries"));
for (commands = sql_script[i].commands;
*commands != NULL;
@@ -4767,10 +5669,19 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
{
SimpleStats *cstats = &(*commands)->stats;
- printf(" %11.3f %s\n",
- (cstats->count > 0) ?
- 1000.0 * cstats->sum / cstats->count : 0.0,
- (*commands)->line);
+ if (max_tries == 1)
+ printf(" %11.3f %10" INT64_MODIFIER "d %s\n",
+ (cstats->count > 0) ?
+ 1000.0 * cstats->sum / cstats->count : 0.0,
+ (*commands)->failures,
+ (*commands)->line);
+ else
+ printf(" %11.3f %10" INT64_MODIFIER "d %10" INT64_MODIFIER "d %s\n",
+ (cstats->count > 0) ?
+ 1000.0 * cstats->sum / cstats->count : 0.0,
+ (*commands)->failures,
+ (*commands)->retries,
+ (*commands)->line);
}
}
}
@@ -4862,7 +5773,7 @@ main(int argc, char **argv)
{"progress", required_argument, NULL, 'P'},
{"protocol", required_argument, NULL, 'M'},
{"quiet", no_argument, NULL, 'q'},
- {"report-latencies", no_argument, NULL, 'r'},
+ {"report-per-command", no_argument, NULL, 'r'},
{"rate", required_argument, NULL, 'R'},
{"scale", required_argument, NULL, 's'},
{"select-only", no_argument, NULL, 'S'},
@@ -4881,6 +5792,9 @@ main(int argc, char **argv)
{"log-prefix", required_argument, NULL, 7},
{"foreign-keys", no_argument, NULL, 8},
{"random-seed", required_argument, NULL, 9},
+ {"failures-detailed", no_argument, NULL, 10},
+ {"max-tries", required_argument, NULL, 11},
+ {"print-errors", no_argument, NULL, 12},
{NULL, 0, NULL, 0}
};
@@ -4917,6 +5831,7 @@ main(int argc, char **argv)
PGconn *con;
PGresult *res;
char *env;
+ bool retry = false; /* retry transactions with errors or not */
progname = get_progname(argv[0]);
@@ -4986,7 +5901,7 @@ main(int argc, char **argv)
pgport = pg_strdup(optarg);
break;
case 'd':
- debug++;
+ debug_level = DEBUG_ALL;
break;
case 'c':
benchmarking_option_set = true;
@@ -5039,7 +5954,7 @@ main(int argc, char **argv)
break;
case 'r':
benchmarking_option_set = true;
- is_latencies = true;
+ report_per_command = true;
break;
case 's':
scale_given = true;
@@ -5236,6 +6151,40 @@ main(int argc, char **argv)
exit(1);
}
break;
+ case 10: /* failures-detailed */
+ benchmarking_option_set = true;
+ failures_detailed = true;
+ break;
+ case 11: /* max-tries */
+ {
+ int32 max_tries_arg = atoi(optarg);
+
+ if (max_tries_arg < 0)
+ {
+ fprintf(stderr,
+ "invalid number of maximum tries: \"%s\"\n",
+ optarg);
+ exit(1);
+ }
+
+ benchmarking_option_set = true;
+
+ /*
+ * Always retry transactions with errors if this option is
+ * used. But if its value is 0, use the option
+ * --latency-limit to limit the number of tries.
+ */
+ retry = true;
+
+ max_tries = (uint32) max_tries_arg;
+ }
+ break;
+ case 12: /* print-errors */
+ benchmarking_option_set = true;
+ /* do not conflict with the option --debug */
+ if (debug_level < DEBUG_ERRORS)
+ debug_level = DEBUG_ERRORS;
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
@@ -5405,6 +6354,20 @@ main(int argc, char **argv)
exit(1);
}
+ if (!max_tries)
+ {
+ if (retry && !latency_limit)
+ {
+ fprintf(stderr, "an infinite number of transaction tries can only be used with the option --latency-limit\n");
+ exit(1);
+ }
+ else if (!retry)
+ {
+ /* By default transactions with errors are not retried */
+ max_tries = 1;
+ }
+ }
+
/*
* save main process id in the global variable because process id will be
* changed after fork.
@@ -5430,7 +6393,12 @@ main(int argc, char **argv)
{
if (!putVariableValue(&state[i].variables, "startup",
var->name, &var->value))
+ {
+ fprintf(stderr,
+ "error when setting the startup variable \"%s\" for client %d\n",
+ var->name, i);
exit(1);
+ }
}
else
{
@@ -5449,7 +6417,7 @@ main(int argc, char **argv)
initRandomState(&state[i].random_state);
}
- if (debug)
+ if (debug_level >= DEBUG_ALL)
{
if (duration <= 0)
printf("pghost: %s pgport: %s nclients: %d nxacts: %d dbName: %s\n",
@@ -5515,7 +6483,12 @@ main(int argc, char **argv)
for (i = 0; i < nclients; i++)
{
if (!putVariableInt(&state[i].variables, "startup", "scale", scale))
+ {
+ fprintf(stderr,
+ "error when setting the startup variable \"scale\" for client %d\n",
+ i);
exit(1);
+ }
}
}
@@ -5527,7 +6500,12 @@ main(int argc, char **argv)
{
for (i = 0; i < nclients; i++)
if (!putVariableInt(&state[i].variables, "startup", "client_id", i))
+ {
+ fprintf(stderr,
+ "error when setting the startup variable \"client_id\" for client %d\n",
+ i);
exit(1);
+ }
}
/* set default seed for hash functions */
@@ -5541,7 +6519,12 @@ main(int argc, char **argv)
for (i = 0; i < nclients; i++)
if (!putVariableInt(&state[i].variables, "startup", "default_seed",
(int64) seed))
+ {
+ fprintf(stderr,
+ "error when setting the startup variable \"default_seed\" for client %d\n",
+ i);
exit(1);
+ }
}
/* set random seed unless overwritten */
@@ -5550,7 +6533,12 @@ main(int argc, char **argv)
for (i = 0; i < nclients; i++)
if (!putVariableInt(&state[i].variables, "startup", "random_seed",
random_seed))
+ {
+ fprintf(stderr,
+ "error when setting the startup variable \"random_seed\" for client %d\n",
+ i);
exit(1);
+ }
}
if (!is_no_vacuum)
@@ -5666,6 +6654,12 @@ main(int argc, char **argv)
mergeSimpleStats(&stats.lag, &thread->stats.lag);
stats.cnt += thread->stats.cnt;
stats.skipped += thread->stats.skipped;
+ stats.retries += thread->stats.retries;
+ stats.retried += thread->stats.retried;
+ stats.serialization_failures += thread->stats.serialization_failures;
+ stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
+ stats.meta_command_failures += thread->stats.meta_command_failures;
latency_late += thread->latency_late;
INSTR_TIME_ADD(conn_total_time, thread->conn_time);
}
@@ -5804,7 +6798,8 @@ threadRun(void *arg)
if (min_usec > this_usec)
min_usec = this_usec;
}
- else if (st->state == CSTATE_WAIT_RESULT)
+ else if (st->state == CSTATE_WAIT_RESULT ||
+ st->state == CSTATE_WAIT_ROLLBACK_RESULT)
{
/*
* waiting for result from server - nothing to do unless the
@@ -5908,7 +6903,8 @@ threadRun(void *arg)
{
CState *st = &state[i];
- if (st->state == CSTATE_WAIT_RESULT)
+ if (st->state == CSTATE_WAIT_RESULT ||
+ st->state == CSTATE_WAIT_ROLLBACK_RESULT)
{
/* don't call doCustom unless data is available */
int sock = PQsocket(st->con);
@@ -5950,7 +6946,9 @@ threadRun(void *arg)
/* generate and show report */
StatsData cur;
int64 run = now - last_report,
- ntx;
+ cnt,
+ failures,
+ retried;
double tps,
total_run,
latency,
@@ -5977,23 +6975,34 @@ threadRun(void *arg)
mergeSimpleStats(&cur.lag, &thread[i].stats.lag);
cur.cnt += thread[i].stats.cnt;
cur.skipped += thread[i].stats.skipped;
+ cur.retries += thread[i].stats.retries;
+ cur.retried += thread[i].stats.retried;
+ cur.serialization_failures +=
+ thread[i].stats.serialization_failures;
+ cur.deadlock_failures += thread[i].stats.deadlock_failures;
+ cur.other_sql_failures +=
+ thread[i].stats.other_sql_failures;
+ cur.meta_command_failures +=
+ thread[i].stats.meta_command_failures;
}
/* we count only actually executed transactions */
- ntx = (cur.cnt - cur.skipped) - (last.cnt - last.skipped);
+ cnt = cur.cnt - last.cnt;
total_run = (now - thread_start) / 1000000.0;
- tps = 1000000.0 * ntx / run;
- if (ntx > 0)
+ tps = 1000000.0 * cnt / run;
+ if (cnt > 0)
{
- latency = 0.001 * (cur.latency.sum - last.latency.sum) / ntx;
- sqlat = 1.0 * (cur.latency.sum2 - last.latency.sum2) / ntx;
+ latency = 0.001 * (cur.latency.sum - last.latency.sum) / cnt;
+ sqlat = 1.0 * (cur.latency.sum2 - last.latency.sum2) / cnt;
stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
- lag = 0.001 * (cur.lag.sum - last.lag.sum) / ntx;
+ lag = 0.001 * (cur.lag.sum - last.lag.sum) / cnt;
}
else
{
latency = sqlat = stdev = lag = 0;
}
+ failures = getFailures(&cur) - getFailures(&last);
+ retried = cur.retried - last.retried;
if (progress_timestamp)
{
@@ -6019,6 +7028,9 @@ threadRun(void *arg)
"progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
tbuf, tps, latency, stdev);
+ if (failures > 0)
+ fprintf(stderr, ", " INT64_FORMAT " failed", failures);
+
if (throttle_delay)
{
fprintf(stderr, ", lag %.3f ms", lag);
@@ -6026,6 +7038,12 @@ threadRun(void *arg)
fprintf(stderr, ", " INT64_FORMAT " skipped",
cur.skipped - last.skipped);
}
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (retried > 0)
+ fprintf(stderr,
+ ", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
+ retried, cur.retries - last.retries);
fprintf(stderr, "\n");
last = cur;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 2fc021dde7..033e66996e 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -5,9 +5,20 @@ use PostgresNode;
use TestLib;
use Test::More;
+use constant
+{
+ SQL_ERROR => 0,
+ META_COMMAND_ERROR => 1,
+ SYNTAX_ERROR => 2,
+};
+
# start a pgbench specific server
my $node = get_new_node('main');
-$node->init;
+
+# Set to untranslated messages, to be able to compare program output with
+# expected strings.
+$node->init(extra => [ '--locale', 'C' ]);
+
$node->start;
# invoke pgbench
@@ -136,7 +147,8 @@ pgbench(
qr{builtin: TPC-B},
qr{clients: 2\b},
qr{processed: 10/10},
- qr{mode: simple}
+ qr{mode: simple},
+ qr{maximum number of tries: 1}
],
[qr{^$}],
'pgbench tpcb-like');
@@ -530,11 +542,12 @@ pgbench(
# trigger many expression errors
my @errors = (
- # [ test name, expected status, expected stderr, script ]
+ # [ test name, expected status, error type, expected stderr, script ]
# SQL
[
'sql syntax error',
0,
+ SQL_ERROR,
[
qr{ERROR: syntax error},
qr{prepared statement .* does not exist}
@@ -544,28 +557,36 @@ my @errors = (
}
],
[
- 'sql too many args', 1, [qr{statement has too many arguments.*\b9\b}],
+ 'sql too many args', 1, SYNTAX_ERROR,
+ [qr{statement has too many arguments.*\b9\b}],
q{-- MAX_ARGS=10 for prepared
\set i 0
SELECT LEAST(:i, :i, :i, :i, :i, :i, :i, :i, :i, :i, :i);
+}
+ ],
+ [ 'sql division by zero', 0, SQL_ERROR, [qr{ERROR: division by zero}],
+ q{-- SQL division by zero
+SELECT 1 / 0;
}
],
# SHELL
[
- 'shell bad command', 0,
+ 'shell bad command', 0, META_COMMAND_ERROR,
[qr{\(shell\) .* meta-command failed}], q{\shell no-such-command}
],
[
- 'shell undefined variable', 0,
+ 'shell undefined variable', 0, META_COMMAND_ERROR,
[qr{undefined variable ":nosuchvariable"}],
q{-- undefined variable in shell
\shell echo ::foo :nosuchvariable
}
],
- [ 'shell missing command', 1, [qr{missing command }], q{\shell} ],
+ [ 'shell missing command', 1, SYNTAX_ERROR, [qr{missing command }],
+ q{\shell} ],
[
- 'shell too many args', 1, [qr{too many arguments in command "shell"}],
+ 'shell too many args', 1, SYNTAX_ERROR,
+ [qr{too many arguments in command "shell"}],
q{-- 257 arguments to \shell
\shell echo \
0 1 2 3 4 5 6 7 8 9 A B C D E F \
@@ -589,162 +610,232 @@ SELECT LEAST(:i, :i, :i, :i, :i, :i, :i, :i, :i, :i, :i);
# SET
[
- 'set syntax error', 1,
+ 'set syntax error', 1, SYNTAX_ERROR,
[qr{syntax error in command "set"}], q{\set i 1 +}
],
[
- 'set no such function', 1,
+ 'set no such function', 1, SYNTAX_ERROR,
[qr{unexpected function name}], q{\set i noSuchFunction()}
],
[
- 'set invalid variable name', 0,
+ 'set invalid variable name', 0, META_COMMAND_ERROR,
[qr{invalid variable name}], q{\set . 1}
],
[
- 'set int overflow', 0,
+ 'set int overflow', 0, META_COMMAND_ERROR,
[qr{double to int overflow for 100}], q{\set i int(1E32)}
],
- [ 'set division by zero', 0, [qr{division by zero}], q{\set i 1/0} ],
[
- 'set bigint out of range', 0,
+ 'set division by zero', 0, META_COMMAND_ERROR,
+ [qr{division by zero}], q{\set i 1/0}
+ ],
+ [
+ 'set bigint out of range', 0, META_COMMAND_ERROR,
[qr{bigint out of range}], q{\set i 9223372036854775808 / -1}
],
[
'set undefined variable',
0,
+ META_COMMAND_ERROR,
[qr{undefined variable "nosuchvariable"}],
q{\set i :nosuchvariable}
],
- [ 'set unexpected char', 1, [qr{unexpected character .;.}], q{\set i ;} ],
+ [
+ 'set unexpected char', 1, SYNTAX_ERROR,
+ [qr{unexpected character .;.}], q{\set i ;}
+ ],
[
'set too many args',
0,
+ META_COMMAND_ERROR,
[qr{too many function arguments}],
q{\set i least(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)}
],
[
- 'set empty random range', 0,
+ 'set empty random range', 0, META_COMMAND_ERROR,
[qr{empty range given to random}], q{\set i random(5,3)}
],
[
'set random range too large',
0,
+ META_COMMAND_ERROR,
[qr{random range is too large}],
q{\set i random(-9223372036854775808, 9223372036854775807)}
],
[
'set gaussian param too small',
0,
+ META_COMMAND_ERROR,
[qr{gaussian param.* at least 2}],
q{\set i random_gaussian(0, 10, 1.0)}
],
[
'set exponential param greater 0',
0,
+ META_COMMAND_ERROR,
[qr{exponential parameter must be greater }],
q{\set i random_exponential(0, 10, 0.0)}
],
[
'set zipfian param to 1',
0,
+ META_COMMAND_ERROR,
[qr{zipfian parameter must be in range \(0, 1\) U \(1, \d+\]}],
q{\set i random_zipfian(0, 10, 1)}
],
[
'set zipfian param too large',
0,
+ META_COMMAND_ERROR,
[qr{zipfian parameter must be in range \(0, 1\) U \(1, \d+\]}],
q{\set i random_zipfian(0, 10, 1000000)}
],
[
- 'set non numeric value', 0,
+ 'set non numeric value', 0, META_COMMAND_ERROR,
[qr{malformed variable "foo" value: "bla"}], q{\set i :foo + 1}
],
- [ 'set no expression', 1, [qr{syntax error}], q{\set i} ],
- [ 'set missing argument', 1, [qr{missing argument}i], q{\set} ],
+ [ 'set no expression', 1, SYNTAX_ERROR, [qr{syntax error}], q{\set i} ],
+ [
+ 'set missing argument', 1, SYNTAX_ERROR,
+ [qr{missing argument}i], q{\set}
+ ],
[
- 'set not a bool', 0,
+ 'set not a bool', 0, META_COMMAND_ERROR,
[qr{cannot coerce double to boolean}], q{\set b NOT 0.0}
],
[
- 'set not an int', 0,
+ 'set not an int', 0, META_COMMAND_ERROR,
[qr{cannot coerce boolean to int}], q{\set i TRUE + 2}
],
[
- 'set not a double', 0,
+ 'set not a double', 0, META_COMMAND_ERROR,
[qr{cannot coerce boolean to double}], q{\set d ln(TRUE)}
],
[
'set case error',
1,
+ SYNTAX_ERROR,
[qr{syntax error in command "set"}],
q{\set i CASE TRUE THEN 1 ELSE 0 END}
],
[
- 'set random error', 0,
+ 'set random error', 0, META_COMMAND_ERROR,
[qr{cannot coerce boolean to int}], q{\set b random(FALSE, TRUE)}
],
[
- 'set number of args mismatch', 1,
+ 'set number of args mismatch', 1, SYNTAX_ERROR,
[qr{unexpected number of arguments}], q{\set d ln(1.0, 2.0))}
],
[
- 'set at least one arg', 1,
+ 'set at least one arg', 1, SYNTAX_ERROR,
[qr{at least one argument expected}], q{\set i greatest())}
],
# SETSHELL
[
- 'setshell not an int', 0,
+ 'setshell not an int', 0, META_COMMAND_ERROR,
[qr{command must return an integer}], q{\setshell i echo -n one}
],
- [ 'setshell missing arg', 1, [qr{missing argument }], q{\setshell var} ],
[
- 'setshell no such command', 0,
+ 'setshell missing arg', 1, SYNTAX_ERROR,
+ [qr{missing argument }], q{\setshell var}
+ ],
+ [
+ 'setshell no such command', 0, META_COMMAND_ERROR,
[qr{could not read result }], q{\setshell var no-such-command}
],
# SLEEP
[
- 'sleep undefined variable', 0,
+ 'sleep undefined variable', 0, META_COMMAND_ERROR,
[qr{sleep: undefined variable}], q{\sleep :nosuchvariable}
],
[
- 'sleep too many args', 1,
+ 'sleep too many args', 1, SYNTAX_ERROR,
[qr{too many arguments}], q{\sleep too many args}
],
[
- 'sleep missing arg', 1,
+ 'sleep missing arg', 1, SYNTAX_ERROR,
[ qr{missing argument}, qr{\\sleep} ], q{\sleep}
],
[
- 'sleep unknown unit', 1,
+ 'sleep unknown unit', 1, SYNTAX_ERROR,
[qr{unrecognized time unit}], q{\sleep 1 week}
],
+ # CONDITIONAL BLOCKS
+ [ 'error inside a conditional block', 0, SQL_ERROR,
+ [qr{ERROR: division by zero}],
+ q{-- error inside a conditional block
+\if true
+SELECT 1 / 0;
+\endif
+}
+ ],
+
# MISC
[
- 'misc invalid backslash command', 1,
+ 'misc invalid backslash command', 1, SYNTAX_ERROR,
[qr{invalid command .* "nosuchcommand"}], q{\nosuchcommand}
],
- [ 'misc empty script', 1, [qr{empty command list for script}], q{} ],
[
- 'bad boolean', 0,
+ 'misc empty script', 1, SYNTAX_ERROR,
+ [qr{empty command list for script}], q{}
+ ],
+ [
+ 'bad boolean', 0, META_COMMAND_ERROR,
[qr{malformed variable.*trueXXX}], q{\set b :badtrue or true}
],);
for my $e (@errors)
{
- my ($name, $status, $re, $script) = @$e;
+ my ($name, $status, $error_type, $re, $script) = @$e;
my $n = '001_pgbench_error_' . $name;
$n =~ s/ /_/g;
+ my $test_name = 'pgbench script error: ' . $name;
+ my $stdout_re;
+
+ if ($status)
+ {
+ # Only syntax errors produce a non-zero exit status;
+ # anything else here is an internal error that should never occur.
+ die $test_name . ": unexpected error type: " . $error_type . "\n"
+ if ($error_type != SYNTAX_ERROR);
+
+ $stdout_re = [ qr{^$} ];
+ }
+ else
+ {
+ $stdout_re =
+ [ qr{processed: 0/1}, qr{number of failures: 1 \(100.000%\)},
+ qr{^((?!number of retried)(.|\n))*$} ];
+
+ if ($error_type == SQL_ERROR)
+ {
+ push @$stdout_re,
+ qr{number of serialization failures: 0 \(0.000%\)},
+ qr{number of deadlock failures: 0 \(0.000%\)},
+ qr{number of other SQL failures: 1 \(100.000%\)};
+ }
+ elsif ($error_type == META_COMMAND_ERROR)
+ {
+ push @$stdout_re,
+ qr{number of meta-command failures: 1 \(100.000%\)};
+ }
+ else
+ {
+ # internal error which should never occur
+ die $test_name . ": unexpected error type: " . $error_type . "\n";
+ }
+ }
+
pgbench(
- '-n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX -M prepared',
+ '-n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX -M prepared --failures-detailed --print-errors',
$status,
- [ $status ? qr{^$} : qr{processed: 0/1} ],
+ $stdout_re,
$re,
- 'pgbench script error: ' . $name,
+ $test_name,
{ $n => $script });
}
@@ -848,6 +939,245 @@ pgbench(
check_pgbench_logs("$bdir/001_pgbench_log_3", 1, 10, 10,
qr{^\d \d{1,2} \d+ \d \d+ \d+$});
+# abort of the client if the script contains an incomplete transaction block
+pgbench(
+ '--no-vacuum', 0, [ qr{processed: 1/10} ],
+ [ qr{client 0 aborted: end of script reached without completing the last transaction} ],
+ 'incomplete transaction block',
+ { '001_pgbench_incomplete_transaction_block' => q{BEGIN;SELECT 1;} });
+
+# Rollback of transaction block in case of meta command failure.
+#
+# If the rollback is not performed, we either continue the current transaction
+# block or terminate it successfully. In the first case the client aborts (the
+# end of the script is reached with an incomplete transaction block). In the
+# second case the second transaction runs and fails in the SQL command (the
+# previous transaction succeeded, so inserting the same value raises a unique
+# violation error).
+
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE x_unique (x integer UNIQUE);');
+
+pgbench(
+ '--no-vacuum -t 2 --failures-detailed', 0,
+ [
+ qr{processed: 0/2},
+ qr{number of meta-command failures: 2 \(100.000%\)}
+ ],
+ [qr{^$}],
+ 'rollback of transaction block in case of meta command failure',
+ { '001_pgbench_rollback_of_transaction_block_in_case_of_meta_command_failure' => q{
+BEGIN;
+INSERT INTO x_unique VALUES (1);
+\set i 1/0
+END;
+}
+ });
+
+# clean up
+$node->safe_psql('postgres', 'DROP TABLE x_unique');
+
+# Test the concurrent update in the table row and deadlocks.
+
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE first_client_table (value integer); '
+ . 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2);');
+
+# Serialization error and retry
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read";
+
+# Check that we have a serialization error and the same random value of the
+# delta variable in the next try
+my $err_pattern =
+ "(client (0|1) sending UPDATE xy SET y = y \\+ -?\\d+\\b).*"
+ . "client \\g2 got an error in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update\\b.*"
+ . "\\g1";
+
+pgbench(
+ "-n -c 2 -t 1 -d --max-tries 2",
+ 0,
+ [ qr{processed: 2/2\b}, qr{^((?!number of failures)(.|\n))*$},
+ qr{number of retried: 1\b}, qr{number of retries: 1\b} ],
+ [ qr/$err_pattern/s ],
+ 'concurrent update with retrying',
+ {
+ '001_pgbench_serialization' => q{
+-- What's happening:
+-- The first client starts the transaction with the isolation level Repeatable
+-- Read:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+--
+-- The second client starts a similar transaction with the same isolation level:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+-- <waiting for the first client>
+--
+-- The first client commits its transaction, and the second client gets a
+-- serialization error.
+
+\set delta random(-5000, 5000)
+
+-- The second client will stop here
+SELECT pg_advisory_lock(0);
+
+-- Start transaction with concurrent update
+BEGIN;
+UPDATE xy SET y = y + :delta WHERE x = 1 AND pg_advisory_lock(1) IS NOT NULL;
+
+-- Wait for the second client
+DO $$
+DECLARE
+ exists boolean;
+ waiters integer;
+BEGIN
+ -- The second client always comes in second, and the number of rows in the
+ -- table first_client_table reflects this. Here the first client inserts a row,
+ -- so the second client will see a non-empty table when repeating the
+ -- transaction after the serialization error.
+ SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+ IF NOT exists THEN
+ -- Let the second client begin
+ PERFORM pg_advisory_unlock(0);
+ -- And wait until the second client tries to get the same lock
+ LOOP
+ SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+ locktype = 'advisory' AND objsubid = 1 AND
+ ((classid::bigint << 32) | objid::bigint = 1::bigint) AND NOT granted;
+ IF waiters = 1 THEN
+ INSERT INTO first_client_table VALUES (1);
+
+ -- Exit loop
+ EXIT;
+ END IF;
+ END LOOP;
+ END IF;
+END$$;
+
+COMMIT;
+SELECT pg_advisory_unlock_all();
+}
+ });
+
+# Clean up
+
+$node->safe_psql('postgres', 'DELETE FROM first_client_table;');
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=read\\ committed";
+
+# Deadlock error and retry
+
+# Check that we have a deadlock error
+$err_pattern =
+ "client (0|1) got an error in command (3|5) \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\\b";
+
+pgbench(
+ "-n -c 2 -t 1 --max-tries 2 --print-errors",
+ 0,
+ [ qr{processed: 2/2\b}, qr{^((?!number of failures)(.|\n))*$},
+ qr{number of retried: 1\b}, qr{number of retries: 1\b} ],
+ [ qr{$err_pattern} ],
+ 'deadlock with retrying',
+ {
+ '001_pgbench_deadlock' => q{
+-- What's happening:
+-- The first client gets the lock 2.
+-- The second client gets the lock 3 and tries to get the lock 2.
+-- The first client tries to get the lock 3 and one of them gets a deadlock
+-- error.
+--
+-- A client that does not get a deadlock error must hold a lock at the
+-- transaction start. Thus in the end it releases all of its locks before the
+-- client with the deadlock error starts a retry (we do not want any errors
+-- again).
+
+-- Since the client with the deadlock error has not released the blocking locks,
+-- let's do this here.
+SELECT pg_advisory_unlock_all();
+
+-- The second client and the client with the deadlock error stop here
+SELECT pg_advisory_lock(0);
+SELECT pg_advisory_lock(1);
+
+-- The second client and the client with the deadlock error always come after
+-- the first and the number of rows in the table first_client_table reflects
+-- this. Here the first client inserts a row, so in the future the table is
+-- always non-empty.
+DO $$
+DECLARE
+ exists boolean;
+BEGIN
+ SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+ IF exists THEN
+ -- We are the second client or the client with the deadlock error
+
+ -- The first client will take care by itself of this lock (see below)
+ PERFORM pg_advisory_unlock(0);
+
+ PERFORM pg_advisory_lock(3);
+
+ -- The second client can get a deadlock here
+ PERFORM pg_advisory_lock(2);
+ ELSE
+ -- We are the first client
+
+ -- This code should not be used in a new transaction after an error
+ INSERT INTO first_client_table VALUES (1);
+
+ PERFORM pg_advisory_lock(2);
+ END IF;
+END$$;
+
+DO $$
+DECLARE
+ num_rows integer;
+ waiters integer;
+BEGIN
+ -- Check if we are the first client
+ SELECT COUNT(*) FROM first_client_table INTO STRICT num_rows;
+ IF num_rows = 1 THEN
+ -- This code should not be used in a new transaction after an error
+ INSERT INTO first_client_table VALUES (2);
+
+ -- Let the second client begin
+ PERFORM pg_advisory_unlock(0);
+ PERFORM pg_advisory_unlock(1);
+
+ -- Make sure the second client is ready for deadlock
+ LOOP
+ SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+ locktype = 'advisory' AND
+ objsubid = 1 AND
+ ((classid::bigint << 32) | objid::bigint = 2::bigint) AND
+ NOT granted;
+
+ IF waiters = 1 THEN
+ -- Exit loop
+ EXIT;
+ END IF;
+ END LOOP;
+
+ PERFORM pg_advisory_lock(0);
+ -- And the second client took care by itself of the lock 1
+ END IF;
+END$$;
+
+-- The first client can get a deadlock here
+SELECT pg_advisory_lock(3);
+
+SELECT pg_advisory_unlock_all();
+}
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE first_client_table, xy;');
+
# done
$node->stop;
done_testing();
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index c1c2c1e3d4..7cca6df57d 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -157,6 +157,16 @@ my @options = (
qr{error while setting random seed from --random-seed option}
]
],
+ [
+ 'bad maximum number of tries',
+ '--max-tries -10',
+ [qr{invalid number of maximum tries: "-10"}]
+ ],
+ [
+ 'an infinite number of tries',
+ '--max-tries 0',
+ [qr{an infinite number of transaction tries can only be used with the option --latency-limit}]
+ ],
# logging sub-options
[
diff --git a/src/fe_utils/conditional.c b/src/fe_utils/conditional.c
index db2a0a53b3..4d14066024 100644
--- a/src/fe_utils/conditional.c
+++ b/src/fe_utils/conditional.c
@@ -24,13 +24,25 @@ conditional_stack_create(void)
}
/*
- * destroy stack
+ * Destroy all the elements from the stack. The stack itself is not freed.
*/
void
-conditional_stack_destroy(ConditionalStack cstack)
+conditional_stack_reset(ConditionalStack cstack)
{
+ if (!cstack)
+ return; /* nothing to do here */
+
while (conditional_stack_pop(cstack))
continue;
+}
+
+/*
+ * destroy stack
+ */
+void
+conditional_stack_destroy(ConditionalStack cstack)
+{
+ conditional_stack_reset(cstack);
free(cstack);
}
diff --git a/src/include/fe_utils/conditional.h b/src/include/fe_utils/conditional.h
index 9b91de5a3d..59c8d8a8e5 100644
--- a/src/include/fe_utils/conditional.h
+++ b/src/include/fe_utils/conditional.h
@@ -73,6 +73,8 @@ typedef struct ConditionalStackData *ConditionalStack;
extern ConditionalStack conditional_stack_create(void);
+extern void conditional_stack_reset(ConditionalStack cstack);
+
extern void conditional_stack_destroy(ConditionalStack cstack);
extern int conditional_stack_depth(ConditionalStack cstack);
--
2.17.1
Attachment: v11-0004-Pgbench-errors-use-a-separate-function-to-report.patch (text/x-diff)
From ab40f131aa781a28f0593715d233af1c19ae951a Mon Sep 17 00:00:00 2001
From: Marina Polyakova <m.polyakova@postgrespro.ru>
Date: Wed, 5 Sep 2018 19:39:41 +0300
Subject: [PATCH v11 4/4] Pgbench errors: use a separate function to report a
debug/log/error message
This matters most when reporting client failures that do not cause a client
abort; whether such failures are printed depends on the debugging level.
The existing function pgbench_error() is renamed to pgbench_simple_error() and
kept for flex lexer errors.
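The leveled reporting described here can be sketched as a tiny standalone version. The names (ErrorLevel, log_level, pgbench_error()) mirror the patch, but the bodies below only illustrate the filtering idea and are not the patch's actual code:

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Mirrors the patch's ErrorLevel: ordered from least to most severe. */
typedef enum ErrorLevel
{
	DEBUG,						/* throttling, sent/received commands, ... */
	LOG,						/* ordinary client errors, retries */
	LOG_PGBENCH,				/* client aborts, main-program log messages */
	FATAL						/* report the error and exit(1) */
} ErrorLevel;

/* Default threshold: no debug output, no per-retry reports. */
static ErrorLevel log_level = LOG_PGBENCH;

/* Should a message at this level be printed at all? */
static bool
message_enabled(ErrorLevel elevel)
{
	return elevel >= log_level;
}

/* Illustrative counterpart of the patch's pgbench_error(). */
static void
pgbench_error(ErrorLevel elevel, const char *fmt,...)
{
	if (message_enabled(elevel))
	{
		va_list		args;

		va_start(args, fmt);
		vfprintf(stderr, fmt, args);
		va_end(args);
	}
	if (elevel == FATAL)
		exit(1);
}
```

A message is emitted only when its level reaches the log_level threshold, so a caller can guard expensive argument construction with the same comparison (as the patch does before calling valueTypeName()), and a FATAL report never returns.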
---
src/bin/pgbench/pgbench.c | 1019 ++++++++++++++++++-------------------
1 file changed, 485 insertions(+), 534 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 8da11209ad..18742fcd56 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -661,17 +661,6 @@ static int num_scripts; /* number of scripts in sql_script[] */
static int num_commands = 0; /* total number of Command structs */
static int64 total_weight = 0;
-typedef enum DebugLevel
-{
- NO_DEBUG = 0, /* no debugging output (except PGBENCH_DEBUG) */
- DEBUG_ERRORS, /* print only error messages, retries and
- * failures */
- DEBUG_ALL /* print all debugging output (throttling,
- * executed/sent/received commands etc.) */
-} DebugLevel;
-
-static DebugLevel debug_level = NO_DEBUG; /* debug flag */
-
/* Builtin test scripts */
typedef struct BuiltinScript
{
@@ -718,6 +707,41 @@ static const BuiltinScript builtin_script[] =
}
};
+typedef enum ErrorLevel
+{
+ /*
+ * To report throttling, executed/sent/received commands etc.
+ */
+ DEBUG,
+
+ /*
+ * To report an ordinary error in an SQL/meta command; the transaction
+ * is then processed further (ended or retried).
+ */
+ LOG,
+
+ /*
+ * To report:
+ * - an abort of the client (something serious, e.g. the connection to
+ * the backend was lost);
+ * - the log messages of the main program;
+ * - PGBENCH_DEBUG messages.
+ */
+ LOG_PGBENCH,
+
+ /*
+ * To report the error messages of the main program and immediately call
+ * exit(1).
+ */
+ FATAL
+} ErrorLevel;
+
+/*
+ * By default there are no debug messages or detailed reports about
+ * retried/failed transactions.
+ */
+static ErrorLevel log_level = LOG_PGBENCH;
+
/* Function prototypes */
static void setNullValue(PgBenchValue *pv);
@@ -729,17 +753,21 @@ static void doLog(TState *thread, CState *st,
StatsData *agg, bool skipped, double latency, double lag);
static void processXactStats(TState *thread, CState *st, instr_time *now,
bool skipped, StatsData *agg);
-static void pgbench_error(const char *fmt,...) pg_attribute_printf(1, 2);
+static void pgbench_simple_error(const char *fmt,...) pg_attribute_printf(1, 2);
static void addScript(ParsedScript script);
static void *threadRun(void *arg);
static void setalarm(int seconds);
static void finishCon(CState *st);
+static void pgbench_error(ErrorLevel elevel,
+ const char *fmt,...) pg_attribute_printf(2, 3);
+static void pgbench_error_va(ErrorLevel elevel, const char *fmt,
+ va_list *args) pg_attribute_printf(2, 0);
/* callback functions for our flex lexer */
static const PsqlScanCallbacks pgbench_callbacks = {
NULL, /* don't need get_variable functionality */
- pgbench_error
+ pgbench_simple_error
};
@@ -880,7 +908,8 @@ strtoint64(const char *str)
/* require at least one digit */
if (!isdigit((unsigned char) *ptr))
- fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+ pgbench_error(LOG_PGBENCH, "invalid input syntax for integer: \"%s\"\n",
+ str);
/* process digits */
while (*ptr && isdigit((unsigned char) *ptr))
@@ -888,7 +917,9 @@ strtoint64(const char *str)
int64 tmp = result * 10 + (*ptr++ - '0');
if ((tmp / 10) != result) /* overflow? */
- fprintf(stderr, "value \"%s\" is out of range for type bigint\n", str);
+ pgbench_error(LOG_PGBENCH,
+ "value \"%s\" is out of range for type bigint\n",
+ str);
result = tmp;
}
@@ -899,7 +930,8 @@ gotdigits:
ptr++;
if (*ptr != '\0')
- fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+ pgbench_error(LOG_PGBENCH, "invalid input syntax for integer: \"%s\"\n",
+ str);
return ((sign < 0) ? -result : result);
}
@@ -1312,8 +1344,8 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
break;
default:
/* internal error which should never occur */
- fprintf(stderr, "unexpected error status: %d\n", estatus);
- exit(1);
+ pgbench_error(FATAL, "unexpected error status: %d\n", estatus);
+ break;
}
}
@@ -1325,10 +1357,7 @@ executeStatement(PGconn *con, const char *sql)
res = PQexec(con, sql);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
- {
- fprintf(stderr, "%s", PQerrorMessage(con));
- exit(1);
- }
+ pgbench_error(FATAL, "%s", PQerrorMessage(con));
PQclear(res);
}
@@ -1340,10 +1369,9 @@ tryExecuteStatement(PGconn *con, const char *sql)
res = PQexec(con, sql);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
- {
- fprintf(stderr, "%s", PQerrorMessage(con));
- fprintf(stderr, "(ignoring this error and continuing anyway)\n");
- }
+ pgbench_error(LOG_PGBENCH,
+ "%s(ignoring this error and continuing anyway)\n",
+ PQerrorMessage(con));
PQclear(res);
}
@@ -1388,8 +1416,8 @@ doConnect(void)
if (!conn)
{
- fprintf(stderr, "connection to database \"%s\" failed\n",
- dbName);
+ pgbench_error(LOG_PGBENCH, "connection to database \"%s\" failed\n",
+ dbName);
return NULL;
}
@@ -1407,8 +1435,8 @@ doConnect(void)
/* check to see that the backend connection was successfully made */
if (PQstatus(conn) == CONNECTION_BAD)
{
- fprintf(stderr, "connection to database \"%s\" failed:\n%s",
- dbName, PQerrorMessage(conn));
+ pgbench_error(LOG_PGBENCH, "connection to database \"%s\" failed:\n%s",
+ dbName, PQerrorMessage(conn));
PQfinish(conn);
return NULL;
}
@@ -1546,10 +1574,8 @@ makeVariableValue(Variable *var)
if (sscanf(var->svalue, "%lf%c", &dv, &xs) != 1)
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr,
- "malformed variable \"%s\" value: \"%s\"\n",
- var->name, var->svalue);
+ pgbench_error(LOG, "malformed variable \"%s\" value: \"%s\"\n",
+ var->name, var->svalue);
return false;
}
setDoubleValue(&var->value, dv);
@@ -1619,7 +1645,7 @@ enlargeVariables(Variables *variables, int needed)
* Lookup a variable by name, creating it if need be.
* Caller is expected to assign a value to the variable.
* Returns NULL on failure (bad name). Because this can be used by client
- * commands, print an error message only in debug mode. The caller can print his
+ * commands, print an error message at LOG level. The caller can print its
* own error message.
*/
static Variable *
@@ -1636,9 +1662,8 @@ lookupCreateVariable(Variables *variables, const char *context, char *name)
*/
if (!valid_variable_name(name))
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr, "%s: invalid variable name: \"%s\"\n",
- context, name);
+ pgbench_error(LOG, "%s: invalid variable name: \"%s\"\n",
+ context, name);
return NULL;
}
@@ -1671,8 +1696,8 @@ putVariable(Variables *variables, const char *context, char *name,
var = lookupCreateVariable(variables, context, name);
if (!var)
{
- fprintf(stderr, "%s: error while setting variable \"%s\"\n",
- context, name);
+ pgbench_error(LOG_PGBENCH, "%s: error while setting variable \"%s\"\n",
+ context, name);
return false;
}
@@ -1690,7 +1715,7 @@ putVariable(Variables *variables, const char *context, char *name,
/*
* Assign a value to a variable, creating it if need be.
* Returns false on failure (bad name). Because this can be used by client
- * commands, print an error message only in debug mode. The caller can print his
+ * commands, print an error message at LOG level. The caller can print its
* own error message.
*/
static bool
@@ -1702,9 +1727,8 @@ putVariableValue(Variables *variables, const char *context, char *name,
var = lookupCreateVariable(variables, context, name);
if (!var)
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr, "%s: error while setting variable \"%s\"\n",
- context, name);
+ pgbench_error(LOG, "%s: error while setting variable \"%s\"\n",
+ context, name);
return false;
}
@@ -1719,7 +1743,7 @@ putVariableValue(Variables *variables, const char *context, char *name,
/*
* Assign an integer value to a variable, creating it if need be.
* Returns false on failure (bad name). Because this can be used by client
- * commands, print an error message only in debug mode. The caller can print his
+ * commands, print an error message at LOG level. The caller can print its
* own error message.
*/
static bool
@@ -1861,9 +1885,10 @@ coerceToBool(PgBenchValue *pval, bool *bval)
}
else /* NULL, INT or DOUBLE */
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr, "cannot coerce %s to boolean\n",
- valueTypeName(pval));
+ /* call the function valueTypeName only if necessary */
+ if (LOG >= log_level)
+ pgbench_error(LOG, "cannot coerce %s to boolean\n",
+ valueTypeName(pval));
*bval = false; /* suppress uninitialized-variable warnings */
return false;
}
@@ -1908,8 +1933,7 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
if (dval < PG_INT64_MIN || PG_INT64_MAX < dval)
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr, "double to int overflow for %f\n", dval);
+ pgbench_error(LOG, "double to int overflow for %f\n", dval);
return false;
}
*ival = (int64) dval;
@@ -1917,8 +1941,10 @@ coerceToInt(PgBenchValue *pval, int64 *ival)
}
else /* BOOLEAN or NULL */
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr, "cannot coerce %s to int\n", valueTypeName(pval));
+ /* call the function valueTypeName only if necessary */
+ if (LOG >= log_level)
+ pgbench_error(LOG, "cannot coerce %s to int\n",
+ valueTypeName(pval));
return false;
}
}
@@ -1939,9 +1965,10 @@ coerceToDouble(PgBenchValue *pval, double *dval)
}
else /* BOOLEAN or NULL */
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr, "cannot coerce %s to double\n",
- valueTypeName(pval));
+ /* call the function valueTypeName only if necessary */
+ if (LOG >= log_level)
+ pgbench_error(LOG, "cannot coerce %s to double\n",
+ valueTypeName(pval));
return false;
}
}
@@ -2122,9 +2149,8 @@ evalStandardFunc(TState *thread, CState *st,
if (l != NULL)
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr,
- "too many function arguments, maximum is %d\n", MAX_FARGS);
+ pgbench_error(LOG, "too many function arguments, maximum is %d\n",
+ MAX_FARGS);
return false;
}
@@ -2247,8 +2273,7 @@ evalStandardFunc(TState *thread, CState *st,
case PGBENCH_MOD:
if (ri == 0)
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr, "division by zero\n");
+ pgbench_error(LOG, "division by zero\n");
return false;
}
/* special handling of -1 divisor */
@@ -2259,9 +2284,8 @@ evalStandardFunc(TState *thread, CState *st,
/* overflow check (needed for INT64_MIN) */
if (li == PG_INT64_MIN)
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr,
- "bigint out of range\n");
+ pgbench_error(LOG,
+ "bigint out of range\n");
return false;
}
else
@@ -2365,17 +2389,20 @@ evalStandardFunc(TState *thread, CState *st,
Assert(nargs == 1);
- fprintf(stderr, "debug(script=%d,command=%d): ",
- st->use_file, st->command + 1);
+ pgbench_error(LOG_PGBENCH, "debug(script=%d,command=%d): ",
+ st->use_file, st->command + 1);
if (varg->type == PGBT_NULL)
- fprintf(stderr, "null\n");
+ pgbench_error(LOG_PGBENCH, "null\n");
else if (varg->type == PGBT_BOOLEAN)
- fprintf(stderr, "boolean %s\n", varg->u.bval ? "true" : "false");
+ pgbench_error(LOG_PGBENCH, "boolean %s\n",
+ varg->u.bval ? "true" : "false");
else if (varg->type == PGBT_INT)
- fprintf(stderr, "int " INT64_FORMAT "\n", varg->u.ival);
+ pgbench_error(LOG_PGBENCH, "int " INT64_FORMAT "\n",
+ varg->u.ival);
else if (varg->type == PGBT_DOUBLE)
- fprintf(stderr, "double %.*g\n", DBL_DIG, varg->u.dval);
+ pgbench_error(LOG_PGBENCH, "double %.*g\n",
+ DBL_DIG, varg->u.dval);
else /* internal error, unexpected type */
Assert(0);
@@ -2501,15 +2528,13 @@ evalStandardFunc(TState *thread, CState *st,
/* check random range */
if (imin > imax)
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr, "empty range given to random\n");
+ pgbench_error(LOG, "empty range given to random\n");
return false;
}
else if (imax - imin < 0 || (imax - imin) + 1 < 0)
{
/* prevent int overflows in random functions */
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr, "random range is too large\n");
+ pgbench_error(LOG, "random range is too large\n");
return false;
}
@@ -2531,10 +2556,9 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param < MIN_GAUSSIAN_PARAM)
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr,
- "gaussian parameter must be at least %f (not %f)\n",
- MIN_GAUSSIAN_PARAM, param);
+ pgbench_error(LOG,
+ "gaussian parameter must be at least %f (not %f)\n",
+ MIN_GAUSSIAN_PARAM, param);
return false;
}
@@ -2546,10 +2570,9 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param <= 0.0 || param == 1.0 || param > MAX_ZIPFIAN_PARAM)
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr,
- "zipfian parameter must be in range (0, 1) U (1, %d] (got %f)\n",
- MAX_ZIPFIAN_PARAM, param);
+ pgbench_error(LOG,
+ "zipfian parameter must be in range (0, 1) U (1, %d] (got %f)\n",
+ MAX_ZIPFIAN_PARAM, param);
return false;
}
setIntValue(retval,
@@ -2561,10 +2584,9 @@ evalStandardFunc(TState *thread, CState *st,
{
if (param <= 0.0)
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr,
- "exponential parameter must be greater than zero (got %f)\n",
- param);
+ pgbench_error(LOG,
+ "exponential parameter must be greater than zero (got %f)\n",
+ param);
return false;
}
@@ -2675,9 +2697,8 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr, "undefined variable \"%s\"\n",
- expr->u.variable.varname);
+ pgbench_error(LOG, "undefined variable \"%s\"\n",
+ expr->u.variable.varname);
return false;
}
@@ -2696,9 +2717,12 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
default:
/* internal error which should never occur */
- fprintf(stderr, "unexpected enode type in evaluation: %d\n",
- expr->etype);
- exit(1);
+ pgbench_error(FATAL,
+ "unexpected enode type in evaluation: %d\n",
+ expr->etype);
+
+ /* keep compiler quiet */
+ return false;
}
}
@@ -2771,17 +2795,15 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
}
else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr, "%s: undefined variable \"%s\"\n",
- argv[0], argv[i]);
+ pgbench_error(LOG, "%s: undefined variable \"%s\"\n",
+ argv[0], argv[i]);
return false;
}
arglen = strlen(arg);
if (len + arglen + (i > 0 ? 1 : 0) >= SHELL_COMMAND_SIZE - 1)
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr, "%s: shell command is too long\n", argv[0]);
+ pgbench_error(LOG, "%s: shell command is too long\n", argv[0]);
return false;
}
@@ -2798,8 +2820,9 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
if (system(command))
{
- if (!timer_exceeded && debug_level >= DEBUG_ERRORS)
- fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
+ if (!timer_exceeded)
+ pgbench_error(LOG, "%s: could not launch shell command\n",
+ argv[0]);
return false;
}
return true;
@@ -2808,21 +2831,20 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
/* Execute the command with pipe and read the standard output. */
if ((fp = popen(command, "r")) == NULL)
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr, "%s: could not launch shell command\n", argv[0]);
+ pgbench_error(LOG, "%s: could not launch shell command\n", argv[0]);
return false;
}
if (fgets(res, sizeof(res), fp) == NULL)
{
- if (!timer_exceeded && debug_level >= DEBUG_ERRORS)
- fprintf(stderr, "%s: could not read result of shell command\n", argv[0]);
+ if (!timer_exceeded)
+ pgbench_error(LOG, "%s: could not read result of shell command\n",
+ argv[0]);
(void) pclose(fp);
return false;
}
if (pclose(fp) < 0)
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr, "%s: could not close shell command\n", argv[0]);
+ pgbench_error(LOG, "%s: could not close shell command\n", argv[0]);
return false;
}
@@ -2832,10 +2854,9 @@ runShellCommand(Variables *variables, char *variable, char **argv, int argc)
endptr++;
if (*res == '\0' || *endptr != '\0')
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr,
- "%s: shell command must return an integer (not \"%s\")\n",
- argv[0], res);
+ pgbench_error(LOG,
+ "%s: shell command must return an integer (not \"%s\")\n",
+ argv[0], res);
return false;
}
if (!putVariableInt(variables, "setshell", variable, retval))
@@ -2860,9 +2881,9 @@ preparedStatementName(char *buffer, int file, int state)
static void
commandFailed(CState *st, const char *cmd, const char *message)
{
- fprintf(stderr,
- "client %d got an error in command %d (%s) of script %d; %s\n",
- st->id, st->command, cmd, st->use_file, message);
+ pgbench_error(LOG,
+ "client %d got an error in command %d (%s) of script %d; %s\n",
+ st->id, st->command, cmd, st->use_file, message);
}
/*
@@ -2874,9 +2895,9 @@ clientAborted(CState *st, const char *message)
const Command *command = sql_script[st->use_file].commands[st->command];
Assert(command->type == SQL_COMMAND);
- fprintf(stderr,
- "client %d aborted in command %d (SQL) of script %d; %s\n",
- st->id, st->command, st->use_file, message);
+ pgbench_error(LOG_PGBENCH,
+ "client %d aborted in command %d (SQL) of script %d; %s\n",
+ st->id, st->command, st->use_file, message);
}
/* return a script number with a weighted choice. */
@@ -2911,8 +2932,7 @@ sendCommand(CState *st, Command *command)
sql = pg_strdup(command->argv[0]);
sql = assignVariables(&st->variables, sql);
- if (debug_level >= DEBUG_ALL)
- fprintf(stderr, "client %d sending %s\n", st->id, sql);
+ pgbench_error(DEBUG, "client %d sending %s\n", st->id, sql);
r = PQsendQuery(st->con, sql);
free(sql);
}
@@ -2923,8 +2943,7 @@ sendCommand(CState *st, Command *command)
getQueryParams(&st->variables, command, params);
- if (debug_level >= DEBUG_ALL)
- fprintf(stderr, "client %d sending %s\n", st->id, sql);
+ pgbench_error(DEBUG, "client %d sending %s\n", st->id, sql);
r = PQsendQueryParams(st->con, sql, command->argc - 1,
NULL, params, NULL, NULL, 0);
}
@@ -2949,7 +2968,7 @@ sendCommand(CState *st, Command *command)
res = PQprepare(st->con, name,
commands[j]->argv[0], commands[j]->argc - 1, NULL);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
- fprintf(stderr, "%s", PQerrorMessage(st->con));
+ pgbench_error(LOG_PGBENCH, "%s", PQerrorMessage(st->con));
PQclear(res);
}
st->prepared[st->use_file] = true;
@@ -2958,8 +2977,7 @@ sendCommand(CState *st, Command *command)
getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
- if (debug_level >= DEBUG_ALL)
- fprintf(stderr, "client %d sending %s\n", st->id, name);
+ pgbench_error(DEBUG, "client %d sending %s\n", st->id, name);
r = PQsendQueryPrepared(st->con, name, command->argc - 1,
params, NULL, NULL, 0);
}
@@ -2968,9 +2986,8 @@ sendCommand(CState *st, Command *command)
if (r == 0)
{
- if (debug_level >= DEBUG_ALL)
- fprintf(stderr, "client %d could not send %s\n",
- st->id, command->argv[0]);
+ pgbench_error(DEBUG, "client %d could not send %s\n",
+ st->id, command->argv[0]);
return false;
}
else
@@ -2987,14 +3004,12 @@ sendRollback(CState *st)
if (querymode == QUERY_SIMPLE)
{
- if (debug_level >= DEBUG_ALL)
- fprintf(stderr, "client %d sending %s\n", st->id, rollback_cmd);
+ pgbench_error(DEBUG, "client %d sending %s\n", st->id, rollback_cmd);
r = PQsendQuery(st->con, rollback_cmd);
}
else if (querymode == QUERY_EXTENDED)
{
- if (debug_level >= DEBUG_ALL)
- fprintf(stderr, "client %d sending %s\n", st->id, rollback_cmd);
+ pgbench_error(DEBUG, "client %d sending %s\n", st->id, rollback_cmd);
r = PQsendQueryParams(st->con, rollback_cmd, 0,
NULL, NULL, NULL, NULL, 0);
}
@@ -3006,13 +3021,12 @@ sendRollback(CState *st)
res = PQprepare(st->con, prepared_name, rollback_cmd, 0, NULL);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
- fprintf(stderr, "%s", PQerrorMessage(st->con));
+ pgbench_error(LOG_PGBENCH, "%s", PQerrorMessage(st->con));
PQclear(res);
st->rollback_prepared = true;
}
- if (debug_level >= DEBUG_ALL)
- fprintf(stderr, "client %d sending %s\n", st->id, prepared_name);
+ pgbench_error(DEBUG, "client %d sending %s\n", st->id, prepared_name);
r = PQsendQueryPrepared(st->con, prepared_name, 0,
NULL, NULL, NULL, 0);
}
@@ -3021,9 +3035,8 @@ sendRollback(CState *st)
if (r == 0)
{
- if (debug_level >= DEBUG_ALL)
- fprintf(stderr, "client %d could not send %s\n",
- st->id, rollback_cmd);
+ pgbench_error(DEBUG, "client %d could not send %s\n",
+ st->id, rollback_cmd);
return false;
}
else
@@ -3044,9 +3057,8 @@ evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
- if (debug_level >= DEBUG_ERRORS)
- fprintf(stderr, "%s: undefined variable \"%s\"\n",
- argv[0], argv[1]);
+ pgbench_error(LOG, "%s: undefined variable \"%s\"\n",
+ argv[0], argv[1]);
return false;
}
usec = atoi(var);
@@ -3220,7 +3232,8 @@ getTransactionStatus(PGconn *con, bool *in_tx_block)
/* PQTRANS_UNKNOWN is expected given a broken connection */
if (PQstatus(con) == CONNECTION_BAD)
{ /* there's something wrong */
- fprintf(stderr, "perhaps the backend died while processing\n");
+ pgbench_error(LOG_PGBENCH,
+ "perhaps the backend died while processing\n");
return false;
}
case PQTRANS_ACTIVE:
@@ -3229,7 +3242,8 @@ getTransactionStatus(PGconn *con, bool *in_tx_block)
* We cannot find out whether we are in a transaction block or not.
* Internal error which should never occur.
*/
- fprintf(stderr, "unexpected transaction status %d\n", tx_status);
+ pgbench_error(LOG_PGBENCH, "unexpected transaction status %d\n",
+ tx_status);
return false;
}
@@ -3295,9 +3309,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->use_file = chooseScript(thread);
- if (debug_level >= DEBUG_ALL)
- fprintf(stderr, "client %d executing script \"%s\"\n",
- st->id, sql_script[st->use_file].desc);
+ pgbench_error(DEBUG, "client %d executing script \"%s\"\n",
+ st->id, sql_script[st->use_file].desc);
if (throttle_delay > 0)
st->state = CSTATE_START_THROTTLE;
@@ -3374,9 +3387,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
st->state = CSTATE_THROTTLE;
- if (debug_level >= DEBUG_ALL)
- fprintf(stderr, "client %d throttling " INT64_FORMAT " us\n",
- st->id, wait);
+ pgbench_error(DEBUG,
+ "client %d throttling " INT64_FORMAT " us\n",
+ st->id, wait);
break;
/*
@@ -3408,8 +3421,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
start = now;
if ((st->con = doConnect()) == NULL)
{
- fprintf(stderr, "client %d aborted while establishing connection\n",
- st->id);
+ pgbench_error(LOG_PGBENCH,
+ "client %d aborted while establishing connection\n",
+ st->id);
st->state = CSTATE_ABORTED;
break;
}
@@ -3496,12 +3510,17 @@ doCustom(TState *thread, CState *st, StatsData *agg)
i;
char **argv = command->argv;
- if (debug_level >= DEBUG_ALL)
+ /*
+ * The variable name can be quite long, so do not use a
+ * static buffer.
+ */
+ if (DEBUG >= log_level)
{
- fprintf(stderr, "client %d executing \\%s", st->id, argv[0]);
+ pgbench_error(DEBUG, "client %d executing \\%s",
+ st->id, argv[0]);
for (i = 1; i < argc; i++)
- fprintf(stderr, " %s", argv[i]);
- fprintf(stderr, "\n");
+ pgbench_error(DEBUG, " %s", argv[i]);
+ pgbench_error(DEBUG, "\n");
}
if (command->meta == META_SLEEP)
@@ -3517,9 +3536,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
- if (debug_level >= DEBUG_ERRORS)
- commandFailed(st, "sleep",
- "execution of meta-command failed");
+ commandFailed(st, "sleep",
+ "execution of meta-command failed");
st->estatus = ESTATUS_META_COMMAND_ERROR;
st->state = CSTATE_ERROR;
break;
@@ -3552,9 +3570,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (!evaluateExpr(thread, st, expr, &result))
{
- if (debug_level >= DEBUG_ERRORS)
- commandFailed(st, argv[0],
- "evaluation of meta-command failed");
+ commandFailed(st, argv[0],
+ "evaluation of meta-command failed");
st->estatus = ESTATUS_META_COMMAND_ERROR;
st->state = CSTATE_ERROR;
break;
@@ -3565,9 +3582,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
if (!putVariableValue(&st->variables, argv[0],
argv[1], &result))
{
- if (debug_level >= DEBUG_ERRORS)
- commandFailed(st, "set",
- "assignment of meta-command failed");
+ commandFailed(st, "set",
+ "assignment of meta-command failed");
st->estatus = ESTATUS_META_COMMAND_ERROR;
st->state = CSTATE_ERROR;
break;
@@ -3630,9 +3646,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (!ret) /* on error */
{
- if (debug_level >= DEBUG_ERRORS)
- commandFailed(st, "setshell",
- "execution of meta-command failed");
+ commandFailed(st, "setshell",
+ "execution of meta-command failed");
st->estatus = ESTATUS_META_COMMAND_ERROR;
st->state = CSTATE_ERROR;
break;
@@ -3654,9 +3669,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
}
else if (!ret) /* on error */
{
- if (debug_level >= DEBUG_ERRORS)
- commandFailed(st, "shell",
- "execution of meta-command failed");
+ commandFailed(st, "shell",
+ "execution of meta-command failed");
st->estatus = ESTATUS_META_COMMAND_ERROR;
st->state = CSTATE_ERROR;
break;
@@ -3775,8 +3789,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
*/
case CSTATE_WAIT_RESULT:
command = sql_script[st->use_file].commands[st->command];
- if (debug_level >= DEBUG_ALL)
- fprintf(stderr, "client %d receiving\n", st->id);
+ pgbench_error(DEBUG, "client %d receiving\n", st->id);
if (!PQconsumeInput(st->con))
{ /* there's something wrong */
clientAborted(st,
@@ -3806,8 +3819,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
case PGRES_FATAL_ERROR:
st->estatus = getSQLErrorStatus(
PQresultErrorField(res, PG_DIAG_SQLSTATE));
- if (debug_level >= DEBUG_ERRORS)
- commandFailed(st, "SQL", PQerrorMessage(st->con));
+ commandFailed(st, "SQL", PQerrorMessage(st->con));
PQclear(res);
discard_response(st);
st->state = CSTATE_ERROR;
@@ -3824,13 +3836,12 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* Wait for the rollback command to complete
*/
case CSTATE_WAIT_ROLLBACK_RESULT:
- if (debug_level >= DEBUG_ALL)
- fprintf(stderr, "client %d receiving\n", st->id);
+ pgbench_error(DEBUG, "client %d receiving\n", st->id);
if (!PQconsumeInput(st->con))
{
- fprintf(stderr,
- "client %d aborted while rolling back the transaction after an error; perhaps the backend died while processing\n",
- st->id);
+ pgbench_error(LOG_PGBENCH,
+ "client %d aborted while rolling back the transaction after an error; perhaps the backend died while processing\n",
+ st->id);
st->state = CSTATE_ABORTED;
break;
}
@@ -3854,9 +3865,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
st->state = CSTATE_FAILURE;
break;
default:
- fprintf(stderr,
- "client %d aborted while rolling back the transaction after an error; %s\n",
- st->id, PQerrorMessage(st->con));
+ pgbench_error(LOG_PGBENCH,
+ "client %d aborted while rolling back the transaction after an error; %s\n",
+ st->id, PQerrorMessage(st->con));
PQclear(res);
st->state = CSTATE_ABORTED;
break;
@@ -3930,9 +3941,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* It is assumed that the function getTransactionStatus
* has already printed a more detailed error message.
*/
- fprintf(stderr,
- "client %d aborted while receiving the transaction status\n",
- st->id);
+ pgbench_error(LOG_PGBENCH,
+ "client %d aborted while receiving the transaction status\n",
+ st->id);
st->state = CSTATE_ABORTED;
break;
}
@@ -3942,9 +3953,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
/* Try to rollback a (failed) transaction block. */
if (!sendRollback(st))
{
- fprintf(stderr,
- "client %d aborted: failed to send sql command for rolling back the failed transaction\n",
- st->id);
+ pgbench_error(LOG_PGBENCH,
+ "client %d aborted: failed to send sql command for rolling back the failed transaction\n",
+ st->id);
st->state = CSTATE_ABORTED;
}
else
@@ -3975,18 +3986,24 @@ doCustom(TState *thread, CState *st, StatsData *agg)
/*
* Inform that the transaction will be retried after the error.
*/
- if (debug_level >= DEBUG_ERRORS)
+ if (LOG >= log_level)
{
- fprintf(stderr,
- "client %d repeats the transaction after the error (try %d",
- st->id, st->retries + 1);
+ char buff[512];
+ int buff_size = 0;
+
if (max_tries)
- fprintf(stderr, "/%d", max_tries);
+ buff_size += snprintf(buff + buff_size,
+ sizeof(buff) - buff_size,
+ "/%d", max_tries);
if (latency_limit)
- fprintf(stderr,
- ", %.3f%% of the maximum time of tries was used",
- getLatencyUsed(st, &now));
- fprintf(stderr, ")\n");
+ buff_size += snprintf(buff + buff_size,
+ sizeof(buff) - buff_size,
+ ", %.3f%% of the maximum time of tries was used",
+ getLatencyUsed(st, &now));
+ pgbench_error(LOG,
+ "client %d repeats the transaction after the error (try %d%s)\n",
+ st->id, st->retries + 1,
+ buff_size ? buff : "");
}
/*
@@ -4016,18 +4033,24 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* If this is a serialization or deadlock failure, inform that
* the failed transaction will not be retried.
*/
- if (debug_level >= DEBUG_ERRORS && canRetryError(st->estatus))
+ if (LOG >= log_level && canRetryError(st->estatus))
{
- fprintf(stderr,
- "client %d ends the failed transaction (try %d",
- st->id, st->retries + 1);
+ char buff[512];
+ int buff_size = 0;
+
if (max_tries)
- fprintf(stderr, "/%d", max_tries);
+ buff_size += snprintf(buff + buff_size,
+ sizeof(buff) - buff_size,
+ "/%d", max_tries);
if (latency_limit)
- fprintf(stderr,
- ", %.3f%% of the maximum time of tries was used",
- getLatencyUsed(st, &now));
- fprintf(stderr, ")\n");
+ buff_size += snprintf(buff + buff_size,
+ sizeof(buff) - buff_size,
+ ", %.3f%% of the maximum time of tries was used",
+ getLatencyUsed(st, &now));
+ pgbench_error(LOG,
+ "client %d ends the failed transaction (try %d%s)\n",
+ st->id, st->retries + 1,
+ buff_size ? buff : "");
}
/*
@@ -4052,10 +4075,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
/* conditional stack must be empty */
if (!conditional_stack_empty(st->cstack))
- {
- fprintf(stderr, "end of script reached within a conditional, missing \\endif\n");
- exit(1);
- }
+ pgbench_error(FATAL,
+ "end of script reached within a conditional, missing \\endif\n");
/*
* We must complete all the transaction blocks that were
@@ -4068,17 +4089,17 @@ doCustom(TState *thread, CState *st, StatsData *agg)
* It is assumed that the function getTransactionStatus
* has already printed a more detailed error message.
*/
- fprintf(stderr,
- "client %d aborted while receiving the transaction status\n",
- st->id);
+ pgbench_error(LOG_PGBENCH,
+ "client %d aborted while receiving the transaction status\n",
+ st->id);
st->state = CSTATE_ABORTED;
break;
}
if (in_tx_block)
{
- fprintf(stderr,
- "client %d aborted: end of script reached without completing the last transaction\n",
- st->id);
+ pgbench_error(LOG_PGBENCH,
+ "client %d aborted: end of script reached without completing the last transaction\n",
+ st->id);
st->state = CSTATE_ABORTED;
break;
}
@@ -4255,9 +4276,9 @@ doLog(TState *thread, CState *st,
break;
default:
/* internal error which should never occur */
- fprintf(stderr, "unexpected error status: %d\n",
- st->estatus);
- exit(1);
+ pgbench_error(FATAL, "unexpected error status: %d\n",
+ st->estatus);
+ break;
}
}
else
@@ -4335,7 +4356,7 @@ disconnect_all(CState *state, int length)
static void
initDropTables(PGconn *con)
{
- fprintf(stderr, "dropping old tables...\n");
+ pgbench_error(LOG_PGBENCH, "dropping old tables...\n");
/*
* We drop all the tables in one command, so that whether there are
@@ -4410,7 +4431,7 @@ initCreateTables(PGconn *con)
};
int i;
- fprintf(stderr, "creating tables...\n");
+ pgbench_error(LOG_PGBENCH, "creating tables...\n");
for (i = 0; i < lengthof(DDLs); i++)
{
@@ -4463,7 +4484,7 @@ initGenerateData(PGconn *con)
remaining_sec;
int log_interval = 1;
- fprintf(stderr, "generating data...\n");
+ pgbench_error(LOG_PGBENCH, "generating data...\n");
/*
* we do all of this in one transaction to enable the backend's
@@ -4508,10 +4529,7 @@ initGenerateData(PGconn *con)
*/
res = PQexec(con, "copy pgbench_accounts from stdin");
if (PQresultStatus(res) != PGRES_COPY_IN)
- {
- fprintf(stderr, "%s", PQerrorMessage(con));
- exit(1);
- }
+ pgbench_error(FATAL, "%s", PQerrorMessage(con));
PQclear(res);
INSTR_TIME_SET_CURRENT(start);
@@ -4525,10 +4543,7 @@ initGenerateData(PGconn *con)
INT64_FORMAT "\t" INT64_FORMAT "\t%d\t\n",
j, k / naccounts + 1, 0);
if (PQputline(con, sql))
- {
- fprintf(stderr, "PQputline failed\n");
- exit(1);
- }
+ pgbench_error(FATAL, "PQputline failed\n");
/*
* If we want to stick with the original logging, print a message each
@@ -4542,10 +4557,12 @@ initGenerateData(PGconn *con)
elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
remaining_sec = ((double) scale * naccounts - j) * elapsed_sec / j;
- fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n",
- j, (int64) naccounts * scale,
- (int) (((int64) j * 100) / (naccounts * (int64) scale)),
- elapsed_sec, remaining_sec);
+ pgbench_error(LOG_PGBENCH,
+ INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n",
+ j, (int64) naccounts * scale,
+ (int) (((int64) j * 100) /
+ (naccounts * (int64) scale)),
+ elapsed_sec, remaining_sec);
}
/* let's not call the timing for each row, but only each 100 rows */
else if (use_quiet && (j % 100 == 0))
@@ -4559,9 +4576,12 @@ initGenerateData(PGconn *con)
/* have we reached the next interval (or end)? */
if ((j == scale * naccounts) || (elapsed_sec >= log_interval * LOG_STEP_SECONDS))
{
- fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n",
- j, (int64) naccounts * scale,
- (int) (((int64) j * 100) / (naccounts * (int64) scale)), elapsed_sec, remaining_sec);
+ pgbench_error(LOG_PGBENCH,
+ INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s)\n",
+ j, (int64) naccounts * scale,
+ (int) (((int64) j * 100) /
+ (naccounts * (int64) scale)),
+ elapsed_sec, remaining_sec);
/* skip to the next interval */
log_interval = (int) ceil(elapsed_sec / LOG_STEP_SECONDS);
@@ -4570,15 +4590,9 @@ initGenerateData(PGconn *con)
}
if (PQputline(con, "\\.\n"))
- {
- fprintf(stderr, "very last PQputline failed\n");
- exit(1);
- }
+ pgbench_error(FATAL, "very last PQputline failed\n");
if (PQendcopy(con))
- {
- fprintf(stderr, "PQendcopy failed\n");
- exit(1);
- }
+ pgbench_error(FATAL, "PQendcopy failed\n");
executeStatement(con, "commit");
}
@@ -4589,7 +4603,7 @@ initGenerateData(PGconn *con)
static void
initVacuum(PGconn *con)
{
- fprintf(stderr, "vacuuming...\n");
+ pgbench_error(LOG_PGBENCH, "vacuuming...\n");
executeStatement(con, "vacuum analyze pgbench_branches");
executeStatement(con, "vacuum analyze pgbench_tellers");
executeStatement(con, "vacuum analyze pgbench_accounts");
@@ -4609,7 +4623,7 @@ initCreatePKeys(PGconn *con)
};
int i;
- fprintf(stderr, "creating primary keys...\n");
+ pgbench_error(LOG_PGBENCH, "creating primary keys...\n");
for (i = 0; i < lengthof(DDLINDEXes); i++)
{
char buffer[256];
@@ -4646,7 +4660,7 @@ initCreateFKeys(PGconn *con)
};
int i;
- fprintf(stderr, "creating foreign keys...\n");
+ pgbench_error(LOG_PGBENCH, "creating foreign keys...\n");
for (i = 0; i < lengthof(DDLKEYs); i++)
{
executeStatement(con, DDLKEYs[i]);
@@ -4666,20 +4680,15 @@ checkInitSteps(const char *initialize_steps)
const char *step;
if (initialize_steps[0] == '\0')
- {
- fprintf(stderr, "no initialization steps specified\n");
- exit(1);
- }
+ pgbench_error(FATAL, "no initialization steps specified\n");
for (step = initialize_steps; *step != '\0'; step++)
{
if (strchr("dtgvpf ", *step) == NULL)
- {
- fprintf(stderr, "unrecognized initialization step \"%c\"\n",
- *step);
- fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n");
- exit(1);
- }
+ pgbench_error(FATAL,
+ "unrecognized initialization step \"%c\"\n"
+ "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n",
+ *step);
}
}
@@ -4720,14 +4729,15 @@ runInitSteps(const char *initialize_steps)
case ' ':
break; /* ignore */
default:
- fprintf(stderr, "unrecognized initialization step \"%c\"\n",
- *step);
+ pgbench_error(LOG_PGBENCH,
+ "unrecognized initialization step \"%c\"\n",
+ *step);
PQfinish(con);
exit(1);
}
}
- fprintf(stderr, "done.\n");
+ pgbench_error(LOG_PGBENCH, "done.\n");
PQfinish(con);
}
@@ -4765,8 +4775,9 @@ parseQuery(Command *cmd)
if (cmd->argc >= MAX_ARGS)
{
- fprintf(stderr, "statement has too many arguments (maximum is %d): %s\n",
- MAX_ARGS - 1, cmd->argv[0]);
+ pgbench_error(LOG_PGBENCH,
+ "statement has too many arguments (maximum is %d): %s\n",
+ MAX_ARGS - 1, cmd->argv[0]);
pg_free(name);
return false;
}
@@ -4787,13 +4798,13 @@ parseQuery(Command *cmd)
* Simple error-printing function, might be needed by lexer
*/
static void
-pgbench_error(const char *fmt,...)
+pgbench_simple_error(const char *fmt,...)
{
va_list ap;
fflush(stdout);
va_start(ap, fmt);
- vfprintf(stderr, _(fmt), ap);
+ pgbench_error_va(LOG_PGBENCH, fmt, &ap);
va_end(ap);
}
@@ -4814,24 +4825,24 @@ syntax_error(const char *source, int lineno,
const char *line, const char *command,
const char *msg, const char *more, int column)
{
- fprintf(stderr, "%s:%d: %s", source, lineno, msg);
+ pgbench_error(LOG_PGBENCH, "%s:%d: %s", source, lineno, msg);
if (more != NULL)
- fprintf(stderr, " (%s)", more);
+ pgbench_error(LOG_PGBENCH, " (%s)", more);
if (column >= 0 && line == NULL)
- fprintf(stderr, " at column %d", column + 1);
+ pgbench_error(LOG_PGBENCH, " at column %d", column + 1);
if (command != NULL)
- fprintf(stderr, " in command \"%s\"", command);
- fprintf(stderr, "\n");
+ pgbench_error(LOG_PGBENCH, " in command \"%s\"", command);
+ pgbench_error(LOG_PGBENCH, "\n");
if (line != NULL)
{
- fprintf(stderr, "%s\n", line);
+ pgbench_error(LOG_PGBENCH, "%s\n", line);
if (column >= 0)
{
int i;
for (i = 0; i < column; i++)
- fprintf(stderr, " ");
- fprintf(stderr, "^ error found here\n");
+ pgbench_error(LOG_PGBENCH, " ");
+ pgbench_error(LOG_PGBENCH, "^ error found here\n");
}
}
exit(1);
@@ -5083,10 +5094,8 @@ process_backslash_command(PsqlScanState sstate, const char *source)
static void
ConditionError(const char *desc, int cmdn, const char *msg)
{
- fprintf(stderr,
- "condition error in script \"%s\" command %d: %s\n",
- desc, cmdn, msg);
- exit(1);
+ pgbench_error(FATAL, "condition error in script \"%s\" command %d: %s\n",
+ desc, cmdn, msg);
}
/*
@@ -5284,20 +5293,14 @@ process_file(const char *filename, int weight)
if (strcmp(filename, "-") == 0)
fd = stdin;
else if ((fd = fopen(filename, "r")) == NULL)
- {
- fprintf(stderr, "could not open file \"%s\": %s\n",
- filename, strerror(errno));
- exit(1);
- }
+ pgbench_error(FATAL, "could not open file \"%s\": %s\n",
+ filename, strerror(errno));
buf = read_file_contents(fd);
if (ferror(fd))
- {
- fprintf(stderr, "could not read file \"%s\": %s\n",
- filename, strerror(errno));
- exit(1);
- }
+ pgbench_error(FATAL, "could not read file \"%s\": %s\n",
+ filename, strerror(errno));
if (fd != stdin)
fclose(fd);
@@ -5320,10 +5323,10 @@ listAvailableScripts(void)
{
int i;
- fprintf(stderr, "Available builtin scripts:\n");
+ pgbench_error(LOG_PGBENCH, "Available builtin scripts:\n");
for (i = 0; i < lengthof(builtin_script); i++)
- fprintf(stderr, "\t%s\n", builtin_script[i].name);
- fprintf(stderr, "\n");
+ pgbench_error(LOG_PGBENCH, "\t%s\n", builtin_script[i].name);
+ pgbench_error(LOG_PGBENCH, "\n");
}
/* return builtin script "name" if unambiguous, fails if not found */
@@ -5350,10 +5353,12 @@ findBuiltin(const char *name)
/* error cases */
if (found == 0)
- fprintf(stderr, "no builtin script found for name \"%s\"\n", name);
+ pgbench_error(LOG_PGBENCH, "no builtin script found for name \"%s\"\n",
+ name);
else /* found > 1 */
- fprintf(stderr,
- "ambiguous builtin name: %d builtin scripts found for prefix \"%s\"\n", found, name);
+ pgbench_error(LOG_PGBENCH,
+ "ambiguous builtin name: %d builtin scripts found for prefix \"%s\"\n",
+ found, name);
listAvailableScripts();
exit(1);
@@ -5385,17 +5390,11 @@ parseScriptWeight(const char *option, char **script)
errno = 0;
wtmp = strtol(sep + 1, &badp, 10);
if (errno != 0 || badp == sep + 1 || *badp != '\0')
- {
- fprintf(stderr, "invalid weight specification: %s\n", sep);
- exit(1);
- }
+ pgbench_error(FATAL, "invalid weight specification: %s\n", sep);
if (wtmp > INT_MAX || wtmp < 0)
- {
- fprintf(stderr,
- "weight specification out of range (0 .. %u): " INT64_FORMAT "\n",
- INT_MAX, (int64) wtmp);
- exit(1);
- }
+ pgbench_error(FATAL,
+ "weight specification out of range (0 .. %u): " INT64_FORMAT "\n",
+ INT_MAX, (int64) wtmp);
weight = wtmp;
}
else
@@ -5412,16 +5411,12 @@ static void
addScript(ParsedScript script)
{
if (script.commands == NULL || script.commands[0] == NULL)
- {
- fprintf(stderr, "empty command list for script \"%s\"\n", script.desc);
- exit(1);
- }
+ pgbench_error(FATAL, "empty command list for script \"%s\"\n",
+ script.desc);
if (num_scripts >= MAX_SCRIPTS)
- {
- fprintf(stderr, "at most %d SQL scripts are allowed\n", MAX_SCRIPTS);
- exit(1);
- }
+ pgbench_error(FATAL, "at most %d SQL scripts are allowed\n",
+ MAX_SCRIPTS);
CheckConditional(script);
@@ -5710,9 +5705,8 @@ set_random_seed(const char *seed)
if (!pg_strong_random(&iseed, sizeof(iseed)))
#endif
{
- fprintf(stderr,
- "cannot seed random from a strong source, none available: "
- "use \"time\" or an unsigned integer value.\n");
+ pgbench_error(LOG_PGBENCH,
+ "cannot seed random from a strong source, none available: use \"time\" or an unsigned integer value.\n");
return false;
}
}
@@ -5723,15 +5717,15 @@ set_random_seed(const char *seed)
if (sscanf(seed, "%u%c", &iseed, &garbage) != 1)
{
- fprintf(stderr,
- "unrecognized random seed option \"%s\": expecting an unsigned integer, \"time\" or \"rand\"\n",
- seed);
+ pgbench_error(LOG_PGBENCH,
+ "unrecognized random seed option \"%s\": expecting an unsigned integer, \"time\" or \"rand\"\n",
+ seed);
return false;
}
}
if (seed != NULL)
- fprintf(stderr, "setting random seed to %u\n", iseed);
+ pgbench_error(LOG_PGBENCH, "setting random seed to %u\n", iseed);
srandom(iseed);
/* no precision loss: 32 bit unsigned int cast to 64 bit int */
random_seed = iseed;
@@ -5866,10 +5860,8 @@ main(int argc, char **argv)
/* set random seed early, because it may be used while parsing scripts. */
if (!set_random_seed(getenv("PGBENCH_RANDOM_SEED")))
- {
- fprintf(stderr, "error while setting random seed from PGBENCH_RANDOM_SEED environment variable\n");
- exit(1);
- }
+ pgbench_error(FATAL,
+ "error while setting random seed from PGBENCH_RANDOM_SEED environment variable\n");
while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
{
@@ -5901,51 +5893,39 @@ main(int argc, char **argv)
pgport = pg_strdup(optarg);
break;
case 'd':
- debug_level = DEBUG_ALL;
+ log_level = DEBUG;
break;
case 'c':
benchmarking_option_set = true;
nclients = atoi(optarg);
if (nclients <= 0 || nclients > MAXCLIENTS)
- {
- fprintf(stderr, "invalid number of clients: \"%s\"\n",
- optarg);
- exit(1);
- }
+ pgbench_error(FATAL, "invalid number of clients: \"%s\"\n",
+ optarg);
#ifdef HAVE_GETRLIMIT
#ifdef RLIMIT_NOFILE /* most platforms use RLIMIT_NOFILE */
if (getrlimit(RLIMIT_NOFILE, &rlim) == -1)
#else /* but BSD doesn't ... */
if (getrlimit(RLIMIT_OFILE, &rlim) == -1)
#endif /* RLIMIT_NOFILE */
- {
- fprintf(stderr, "getrlimit failed: %s\n", strerror(errno));
- exit(1);
- }
+ pgbench_error(FATAL, "getrlimit failed: %s\n",
+ strerror(errno));
if (rlim.rlim_cur < nclients + 3)
- {
- fprintf(stderr, "need at least %d open files, but system limit is %ld\n",
- nclients + 3, (long) rlim.rlim_cur);
- fprintf(stderr, "Reduce number of clients, or use limit/ulimit to increase the system limit.\n");
- exit(1);
- }
+ pgbench_error(FATAL,
+ "need at least %d open files, but system limit is %ld\n"
+ "Reduce number of clients, or use limit/ulimit to increase the system limit.\n",
+ nclients + 3, (long) rlim.rlim_cur);
#endif /* HAVE_GETRLIMIT */
break;
case 'j': /* jobs */
benchmarking_option_set = true;
nthreads = atoi(optarg);
if (nthreads <= 0)
- {
- fprintf(stderr, "invalid number of threads: \"%s\"\n",
- optarg);
- exit(1);
- }
+ pgbench_error(FATAL, "invalid number of threads: \"%s\"\n",
+ optarg);
#ifndef ENABLE_THREAD_SAFETY
if (nthreads != 1)
- {
- fprintf(stderr, "threads are not supported on this platform; use -j1\n");
- exit(1);
- }
+ pgbench_error(FATAL,
+ "threads are not supported on this platform; use -j1\n");
#endif /* !ENABLE_THREAD_SAFETY */
break;
case 'C':
@@ -5960,29 +5940,22 @@ main(int argc, char **argv)
scale_given = true;
scale = atoi(optarg);
if (scale <= 0)
- {
- fprintf(stderr, "invalid scaling factor: \"%s\"\n", optarg);
- exit(1);
- }
+ pgbench_error(FATAL, "invalid scaling factor: \"%s\"\n",
+ optarg);
break;
case 't':
benchmarking_option_set = true;
nxacts = atoi(optarg);
if (nxacts <= 0)
- {
- fprintf(stderr, "invalid number of transactions: \"%s\"\n",
- optarg);
- exit(1);
- }
+ pgbench_error(FATAL,
+ "invalid number of transactions: \"%s\"\n",
+ optarg);
break;
case 'T':
benchmarking_option_set = true;
duration = atoi(optarg);
if (duration <= 0)
- {
- fprintf(stderr, "invalid duration: \"%s\"\n", optarg);
- exit(1);
- }
+ pgbench_error(FATAL, "invalid duration: \"%s\"\n", optarg);
break;
case 'U':
login = pg_strdup(optarg);
@@ -6028,11 +6001,9 @@ main(int argc, char **argv)
benchmarking_option_set = true;
if ((p = strchr(optarg, '=')) == NULL || p == optarg || *(p + 1) == '\0')
- {
- fprintf(stderr, "invalid variable definition: \"%s\"\n",
- optarg);
- exit(1);
- }
+ pgbench_error(FATAL,
+ "invalid variable definition: \"%s\"\n",
+ optarg);
*p++ = '\0';
if (!putVariable(&state[0].variables, "option", optarg, p))
@@ -6043,10 +6014,8 @@ main(int argc, char **argv)
initialization_option_set = true;
fillfactor = atoi(optarg);
if (fillfactor < 10 || fillfactor > 100)
- {
- fprintf(stderr, "invalid fillfactor: \"%s\"\n", optarg);
- exit(1);
- }
+ pgbench_error(FATAL, "invalid fillfactor: \"%s\"\n",
+ optarg);
break;
case 'M':
benchmarking_option_set = true;
@@ -6054,21 +6023,16 @@ main(int argc, char **argv)
if (strcmp(optarg, QUERYMODE[querymode]) == 0)
break;
if (querymode >= NUM_QUERYMODE)
- {
- fprintf(stderr, "invalid query mode (-M): \"%s\"\n",
- optarg);
- exit(1);
- }
+ pgbench_error(FATAL, "invalid query mode (-M): \"%s\"\n",
+ optarg);
break;
case 'P':
benchmarking_option_set = true;
progress = atoi(optarg);
if (progress <= 0)
- {
- fprintf(stderr, "invalid thread progress delay: \"%s\"\n",
- optarg);
- exit(1);
- }
+ pgbench_error(FATAL,
+ "invalid thread progress delay: \"%s\"\n",
+ optarg);
break;
case 'R':
{
@@ -6078,10 +6042,8 @@ main(int argc, char **argv)
benchmarking_option_set = true;
if (throttle_value <= 0.0)
- {
- fprintf(stderr, "invalid rate limit: \"%s\"\n", optarg);
- exit(1);
- }
+ pgbench_error(FATAL, "invalid rate limit: \"%s\"\n",
+ optarg);
/* Invert rate limit into a time offset */
throttle_delay = (int64) (1000000.0 / throttle_value);
}
@@ -6091,11 +6053,8 @@ main(int argc, char **argv)
double limit_ms = atof(optarg);
if (limit_ms <= 0.0)
- {
- fprintf(stderr, "invalid latency limit: \"%s\"\n",
- optarg);
- exit(1);
- }
+ pgbench_error(FATAL, "invalid latency limit: \"%s\"\n",
+ optarg);
benchmarking_option_set = true;
latency_limit = (int64) (limit_ms * 1000);
}
@@ -6116,20 +6075,16 @@ main(int argc, char **argv)
benchmarking_option_set = true;
sample_rate = atof(optarg);
if (sample_rate <= 0.0 || sample_rate > 1.0)
- {
- fprintf(stderr, "invalid sampling rate: \"%s\"\n", optarg);
- exit(1);
- }
+ pgbench_error(FATAL, "invalid sampling rate: \"%s\"\n",
+ optarg);
break;
case 5: /* aggregate-interval */
benchmarking_option_set = true;
agg_interval = atoi(optarg);
if (agg_interval <= 0)
- {
- fprintf(stderr, "invalid number of seconds for aggregation: \"%s\"\n",
- optarg);
- exit(1);
- }
+ pgbench_error(FATAL,
+ "invalid number of seconds for aggregation: \"%s\"\n",
+ optarg);
break;
case 6: /* progress-timestamp */
progress_timestamp = true;
@@ -6146,10 +6101,8 @@ main(int argc, char **argv)
case 9: /* random-seed */
benchmarking_option_set = true;
if (!set_random_seed(optarg))
- {
- fprintf(stderr, "error while setting random seed from --random-seed option\n");
- exit(1);
- }
+ pgbench_error(FATAL,
+ "error while setting random seed from --random-seed option\n");
break;
case 10: /* failures-detailed */
benchmarking_option_set = true;
@@ -6160,12 +6113,9 @@ main(int argc, char **argv)
int32 max_tries_arg = atoi(optarg);
if (max_tries_arg < 0)
- {
- fprintf(stderr,
- "invalid number of maximum tries: \"%s\"\n",
- optarg);
- exit(1);
- }
+ pgbench_error(FATAL,
+ "invalid number of maximum tries: \"%s\"\n",
+ optarg);
benchmarking_option_set = true;
@@ -6182,12 +6132,13 @@ main(int argc, char **argv)
case 12: /* print-errors */
benchmarking_option_set = true;
/* do not conflict with the option --debug */
- if (debug_level < DEBUG_ERRORS)
- debug_level = DEBUG_ERRORS;
+ if (log_level > LOG)
+ log_level = LOG;
break;
default:
- fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
- exit(1);
+ pgbench_error(FATAL,
+ _("Try \"%s --help\" for more information.\n"),
+ progname);
break;
}
}
@@ -6224,10 +6175,7 @@ main(int argc, char **argv)
total_weight += sql_script[i].weight;
if (total_weight == 0 && !is_init_mode)
- {
- fprintf(stderr, "total script weight must not be zero\n");
- exit(1);
- }
+ pgbench_error(FATAL, "total script weight must not be zero\n");
/* show per script stats if several scripts are used */
if (num_scripts > 1)
@@ -6259,10 +6207,8 @@ main(int argc, char **argv)
if (is_init_mode)
{
if (benchmarking_option_set)
- {
- fprintf(stderr, "some of the specified options cannot be used in initialization (-i) mode\n");
- exit(1);
- }
+ pgbench_error(FATAL,
+ "some of the specified options cannot be used in initialization (-i) mode\n");
if (initialize_steps == NULL)
initialize_steps = pg_strdup(DEFAULT_INIT_STEPS);
@@ -6294,17 +6240,13 @@ main(int argc, char **argv)
else
{
if (initialization_option_set)
- {
- fprintf(stderr, "some of the specified options cannot be used in benchmarking mode\n");
- exit(1);
- }
+ pgbench_error(FATAL,
+ "some of the specified options cannot be used in benchmarking mode\n");
}
if (nxacts > 0 && duration > 0)
- {
- fprintf(stderr, "specify either a number of transactions (-t) or a duration (-T), not both\n");
- exit(1);
- }
+ pgbench_error(FATAL,
+ "specify either a number of transactions (-t) or a duration (-T), not both\n");
/* Use DEFAULT_NXACTS if neither nxacts nor duration is specified. */
if (nxacts <= 0 && duration <= 0)
@@ -6312,54 +6254,42 @@ main(int argc, char **argv)
/* --sampling-rate may be used only with -l */
if (sample_rate > 0.0 && !use_log)
- {
- fprintf(stderr, "log sampling (--sampling-rate) is allowed only when logging transactions (-l)\n");
- exit(1);
- }
+ pgbench_error(FATAL,
+ "log sampling (--sampling-rate) is allowed only when logging transactions (-l)\n");
/* --sampling-rate may not be used with --aggregate-interval */
if (sample_rate > 0.0 && agg_interval > 0)
- {
- fprintf(stderr, "log sampling (--sampling-rate) and aggregation (--aggregate-interval) cannot be used at the same time\n");
- exit(1);
- }
+ pgbench_error(FATAL,
+ "log sampling (--sampling-rate) and aggregation (--aggregate-interval) cannot be used at the same time\n");
if (agg_interval > 0 && !use_log)
- {
- fprintf(stderr, "log aggregation is allowed only when actually logging transactions\n");
- exit(1);
- }
+ pgbench_error(FATAL,
+ "log aggregation is allowed only when actually logging transactions\n");
if (!use_log && logfile_prefix)
- {
- fprintf(stderr, "log file prefix (--log-prefix) is allowed only when logging transactions (-l)\n");
- exit(1);
- }
+ pgbench_error(FATAL,
+ "log file prefix (--log-prefix) is allowed only when logging transactions (-l)\n");
if (duration > 0 && agg_interval > duration)
- {
- fprintf(stderr, "number of seconds for aggregation (%d) must not be higher than test duration (%d)\n", agg_interval, duration);
- exit(1);
- }
+ pgbench_error(FATAL,
+ "number of seconds for aggregation (%d) must not be higher than test duration (%d)\n",
+ agg_interval, duration);
if (duration > 0 && agg_interval > 0 && duration % agg_interval != 0)
- {
- fprintf(stderr, "duration (%d) must be a multiple of aggregation interval (%d)\n", duration, agg_interval);
- exit(1);
- }
+ pgbench_error(FATAL,
+ "duration (%d) must be a multiple of aggregation interval (%d)\n",
+ duration, agg_interval);
if (progress_timestamp && progress == 0)
- {
- fprintf(stderr, "--progress-timestamp is allowed only under --progress\n");
- exit(1);
- }
+ pgbench_error(FATAL,
+ "--progress-timestamp is allowed only under --progress\n");
if (!max_tries)
{
if (retry && !latency_limit)
{
- fprintf(stderr, "an infinite number of transaction tries can only be used with the option --latency-limit\n");
- exit(1);
+ pgbench_error(FATAL,
+ "an infinite number of transaction tries can only be used with the option --latency-limit\n");
}
else if (!retry)
{
@@ -6393,18 +6323,17 @@ main(int argc, char **argv)
{
if (!putVariableValue(&state[i].variables, "startup",
var->name, &var->value))
- {
- fprintf(stderr,
- "error when setting the startup variable \"%s\" for client %d\n",
- var->name, i);
- exit(1);
- }
+ pgbench_error(FATAL,
+ "error when setting the startup variable \"%s\" for client %d\n",
+ var->name, i);
}
else
{
if (!putVariable(&state[i].variables, "startup",
var->name, var->svalue))
- exit(1);
+ pgbench_error(FATAL,
+ "error when setting the startup variable \"%s\" for client %d\n",
+ var->name, i);
}
}
}
@@ -6417,7 +6346,7 @@ main(int argc, char **argv)
initRandomState(&state[i].random_state);
}
- if (debug_level >= DEBUG_ALL)
+ if (DEBUG >= log_level)
{
if (duration <= 0)
printf("pghost: %s pgport: %s nclients: %d nxacts: %d dbName: %s\n",
@@ -6433,11 +6362,8 @@ main(int argc, char **argv)
exit(1);
if (PQstatus(con) == CONNECTION_BAD)
- {
- fprintf(stderr, "connection to database \"%s\" failed\n", dbName);
- fprintf(stderr, "%s", PQerrorMessage(con));
- exit(1);
- }
+ pgbench_error(FATAL, "connection to database \"%s\" failed\n%s",
+ dbName, PQerrorMessage(con));
if (internal_script_used)
{
@@ -6450,28 +6376,26 @@ main(int argc, char **argv)
{
char *sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
- fprintf(stderr, "%s", PQerrorMessage(con));
if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) == 0)
- {
- fprintf(stderr, "Perhaps you need to do initialization (\"pgbench -i\") in database \"%s\"\n", PQdb(con));
- }
-
+ pgbench_error(LOG_PGBENCH,
+ "%sPerhaps you need to do initialization (\"pgbench -i\") in database \"%s\"\n",
+ PQerrorMessage(con), PQdb(con));
+ else
+ pgbench_error(LOG_PGBENCH, "%s", PQerrorMessage(con));
exit(1);
}
scale = atoi(PQgetvalue(res, 0, 0));
if (scale < 0)
- {
- fprintf(stderr, "invalid count(*) from pgbench_branches: \"%s\"\n",
- PQgetvalue(res, 0, 0));
- exit(1);
- }
+ pgbench_error(FATAL,
+ "invalid count(*) from pgbench_branches: \"%s\"\n",
+ PQgetvalue(res, 0, 0));
PQclear(res);
/* warn if we override user-given -s switch */
if (scale_given)
- fprintf(stderr,
- "scale option ignored, using count from pgbench_branches table (%d)\n",
- scale);
+ pgbench_error(LOG_PGBENCH,
+ "scale option ignored, using count from pgbench_branches table (%d)\n",
+ scale);
}
/*
@@ -6483,12 +6407,9 @@ main(int argc, char **argv)
for (i = 0; i < nclients; i++)
{
if (!putVariableInt(&state[i].variables, "startup", "scale", scale))
- {
- fprintf(stderr,
- "error when setting the startup variable \"scale\" for client %d\n",
- i);
- exit(1);
- }
+ pgbench_error(FATAL,
+ "error when setting the startup variable \"scale\" for client %d\n",
+ i);
}
}
@@ -6500,12 +6421,9 @@ main(int argc, char **argv)
{
for (i = 0; i < nclients; i++)
if (!putVariableInt(&state[i].variables, "startup", "client_id", i))
- {
- fprintf(stderr,
- "error when setting the startup variable \"client_id\" for client %d\n",
- i);
- exit(1);
- }
+ pgbench_error(FATAL,
+ "error when setting the startup variable \"client_id\" for client %d\n",
+ i);
}
/* set default seed for hash functions */
@@ -6519,12 +6437,9 @@ main(int argc, char **argv)
for (i = 0; i < nclients; i++)
if (!putVariableInt(&state[i].variables, "startup", "default_seed",
(int64) seed))
- {
- fprintf(stderr,
- "error when setting the startup variable \"default_seed\" for client %d\n",
- i);
- exit(1);
- }
+ pgbench_error(FATAL,
+ "error when setting the startup variable \"default_seed\" for client %d\n",
+ i);
}
/* set random seed unless overwritten */
@@ -6533,27 +6448,24 @@ main(int argc, char **argv)
for (i = 0; i < nclients; i++)
if (!putVariableInt(&state[i].variables, "startup", "random_seed",
random_seed))
- {
- fprintf(stderr,
- "error when setting the startup variable \"random_seed\" for client %d\n",
- i);
- exit(1);
- }
+ pgbench_error(FATAL,
+ "error when setting the startup variable \"random_seed\" for client %d\n",
+ i);
}
if (!is_no_vacuum)
{
- fprintf(stderr, "starting vacuum...");
+ pgbench_error(LOG_PGBENCH, "starting vacuum...");
tryExecuteStatement(con, "vacuum pgbench_branches");
tryExecuteStatement(con, "vacuum pgbench_tellers");
tryExecuteStatement(con, "truncate pgbench_history");
- fprintf(stderr, "end.\n");
+ pgbench_error(LOG_PGBENCH, "end.\n");
if (do_vacuum_accounts)
{
- fprintf(stderr, "starting vacuum pgbench_accounts...");
+ pgbench_error(LOG_PGBENCH, "starting vacuum pgbench_accounts...");
tryExecuteStatement(con, "vacuum analyze pgbench_accounts");
- fprintf(stderr, "end.\n");
+ pgbench_error(LOG_PGBENCH, "end.\n");
}
}
PQfinish(con);
@@ -6612,10 +6524,8 @@ main(int argc, char **argv)
int err = pthread_create(&thread->thread, NULL, threadRun, thread);
if (err != 0 || thread->thread == INVALID_THREAD)
- {
- fprintf(stderr, "could not create thread: %s\n", strerror(err));
- exit(1);
- }
+ pgbench_error(FATAL, "could not create thread: %s\n",
+ strerror(err));
}
else
{
@@ -6729,8 +6639,8 @@ threadRun(void *arg)
if (thread->logfile == NULL)
{
- fprintf(stderr, "could not open logfile \"%s\": %s\n",
- logpath, strerror(errno));
+ pgbench_error(LOG_PGBENCH, "could not open logfile \"%s\": %s\n",
+ logpath, strerror(errno));
goto done;
}
}
@@ -6809,8 +6719,8 @@ threadRun(void *arg)
if (sock < 0)
{
- fprintf(stderr, "invalid socket: %s",
- PQerrorMessage(st->con));
+ pgbench_error(LOG_PGBENCH, "invalid socket: %s",
+ PQerrorMessage(st->con));
goto done;
}
@@ -6886,7 +6796,8 @@ threadRun(void *arg)
continue;
}
/* must be something wrong */
- fprintf(stderr, "select() failed: %s\n", strerror(errno));
+ pgbench_error(LOG_PGBENCH, "select() failed: %s\n",
+ strerror(errno));
goto done;
}
}
@@ -6911,8 +6822,8 @@ threadRun(void *arg)
if (sock < 0)
{
- fprintf(stderr, "invalid socket: %s",
- PQerrorMessage(st->con));
+ pgbench_error(LOG_PGBENCH, "invalid socket: %s",
+ PQerrorMessage(st->con));
goto done;
}
@@ -6957,6 +6868,10 @@ threadRun(void *arg)
stdev;
char tbuf[315];
+ /* buffer for an optional part of the progress message */
+ char pbuf[512];
+ int pbuf_size = 0;
+
/*
* Add up the statistics of all threads.
*
@@ -7024,27 +6939,35 @@ threadRun(void *arg)
snprintf(tbuf, sizeof(tbuf), "%.1f s", total_run);
}
- fprintf(stderr,
- "progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
- tbuf, tps, latency, stdev);
-
if (failures > 0)
- fprintf(stderr, ", " INT64_FORMAT " failed", failures);
+ pbuf_size += snprintf(pbuf + pbuf_size,
+ sizeof(pbuf) - pbuf_size,
+ ", " INT64_FORMAT " failed",
+ failures);
if (throttle_delay)
{
- fprintf(stderr, ", lag %.3f ms", lag);
+ pbuf_size += snprintf(pbuf + pbuf_size,
+ sizeof(pbuf) - pbuf_size,
+ ", lag %.3f ms", lag);
if (latency_limit)
- fprintf(stderr, ", " INT64_FORMAT " skipped",
- cur.skipped - last.skipped);
+ pbuf_size += snprintf(pbuf + pbuf_size,
+ sizeof(pbuf) - pbuf_size,
+ ", " INT64_FORMAT " skipped",
+ cur.skipped - last.skipped);
}
/* it can be non-zero only if max_tries is not equal to one */
if (retried > 0)
- fprintf(stderr,
- ", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
- retried, cur.retries - last.retries);
- fprintf(stderr, "\n");
+ pbuf_size += snprintf(
+ pbuf + pbuf_size,
+ sizeof(pbuf) - pbuf_size,
+ ", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
+ retried, cur.retries - last.retries);
+
+ pgbench_error(LOG_PGBENCH,
+ "progress: %s, %.1f tps, lat %.3f ms stddev %.3f%s\n",
+ tbuf, tps, latency, stdev, pbuf_size ? pbuf : "");
last = cur;
last_report = now;
@@ -7128,10 +7051,7 @@ setalarm(int seconds)
!CreateTimerQueueTimer(&timer, queue,
win32_timer_callback, NULL, seconds * 1000, 0,
WT_EXECUTEINTIMERTHREAD | WT_EXECUTEONLYONCE))
- {
- fprintf(stderr, "failed to set timer\n");
- exit(1);
- }
+ pgbench_error(FATAL, "failed to set timer\n");
}
/* partial pthread implementation for Windows */
@@ -7201,3 +7121,34 @@ pthread_join(pthread_t th, void **thread_return)
}
#endif /* WIN32 */
+
+static void
+pgbench_error(ErrorLevel elevel, const char *fmt,...)
+{
+ va_list ap;
+
+ va_start(ap, fmt);
+ pgbench_error_va(elevel, fmt, &ap);
+ va_end(ap);
+}
+
+static void
+pgbench_error_va(ErrorLevel elevel, const char *fmt, va_list *args)
+{
+ /* Determine whether message is enabled for log output */
+ if (elevel < log_level)
+ return;
+
+ if (!fmt || !fmt[0])
+ {
+ /* internal error which should never occur */
+ /* do not call pgbench_error recursively */
+ fprintf(stderr, "unexpected empty error message\n");
+ exit(1);
+ }
+
+ vfprintf(stderr, _(fmt), *args);
+
+ if (elevel >= FATAL)
+ exit(1);
+}
--
2.17.1
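For readers following the patch, the new logging scheme boils down to an ordered ErrorLevel enum compared against a global log_level threshold, with FATAL additionally terminating the program. A minimal standalone sketch of that filtering logic (the names mirror the patch, but this is an illustration, not the patch code):

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Levels are ordered: lower values are more verbose. */
typedef enum ErrorLevel
{
    DEBUG,                      /* enabled by -d */
    LOG,                        /* enabled by --print-errors */
    LOG_PGBENCH,                /* default: ordinary progress/error output */
    FATAL                       /* always printed; terminates the program */
} ErrorLevel;

/* Default threshold: only LOG_PGBENCH and FATAL messages get through. */
static ErrorLevel log_level = LOG_PGBENCH;

/* A message is emitted only if its level reaches the threshold. */
static bool
message_enabled(ErrorLevel elevel)
{
    return elevel >= log_level;
}

static void
pgbench_error(ErrorLevel elevel, const char *fmt, ...)
{
    va_list ap;

    if (!message_enabled(elevel))
        return;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);

    /* FATAL means the run cannot continue. */
    if (elevel >= FATAL)
        exit(1);
}
```

With the default threshold, DEBUG and LOG messages are filtered out; per the patch, `-d` lowers `log_level` to DEBUG so everything is printed, and `--print-errors` lowers it to LOG.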
Hello Marina,
About the two first preparatory patches.
v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
- a patch for the RandomState structure (this is used to reset a client's
random seed during the repeating of transactions after serialization/deadlock
failures).
Same version as the previous one, which was ok. Still applies, compiles,
passes tests. Fine with me.
v11-0002-Pgbench-errors-use-the-Variables-structure-for-c.patch
- a patch for the Variables structure (this is used to reset client variables
during the repeating of transactions after serialization/deadlock failures).
Simpler version, applies cleanly on top of previous patch, compiles and
global & local "make check" are ok. Fine with me as well.
--
Fabien.
Hello Marina,
v11-0003-Pgbench-errors-and-serialization-deadlock-retrie.patch
- the main patch for handling client errors and repetition of transactions
with serialization/deadlock failures (see the detailed description in the
file).
About patch v11-3.
Patch applies cleanly on top of the other two. Compiles, global and local
"make check" are ok.
* Features
As far as the actual retry feature is concerned, I'd say we are nearly
there. However, I have an issue with changing the behavior on meta-command
and other SQL errors, which I find undesirable.
When a meta-command fails, before the patch the command is aborted and
there is a convenient error message:
sh> pgbench -T 10 -f bad-meta.sql
bad-meta.sql:1: unexpected function name (false) in command "set" [...]
\set i false + 1 [...]
After the patch it is simply counted, pgbench loops on the same error till
the time is completed, and there are no clue about the actual issue:
sh> pgbench -T 10 -f bad-meta.sql
starting vacuum...end.
transaction type: bad-meta.sql
duration: 10 s
number of transactions actually processed: 0
number of failures: 27993953 (100.000%)
...
Same thing about SQL errors, an immediate abort...
sh> pgbench -T 10 -f bad-sql.sql
starting vacuum...end.
client 0 aborted in command 0 of script 0; ERROR: syntax error at or near ";"
LINE 1: SELECT 1 + ;
... is turned into counting, without aborting or any error message, so that
there is no clue that the user was asking for something bad.
sh> pgbench -T 10 -f bad-sql.sql
starting vacuum...end.
transaction type: bad-sql.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 0
number of failures: 274617 (100.000%)
# no clue that there was a syntax error in the script
I do not think that these changes of behavior are desirable. Meta-command and
miscellaneous SQL errors should result in immediately aborting the whole run,
because the client test code itself could not run correctly, or the SQL sent
was somehow wrong, which is also the client's fault, and the server
performance bench does not make much sense in such conditions.
ISTM that the focus of this patch should only be to handle some server
runtime errors that can be retried, not to change pgbench behavior on
other kinds of errors. If these are to be changed, ISTM that it would be a
distinct patch and would require some discussion, and possibly an option
to enable it or not if some use case emerges. As far as this patch is
concerned, I'd suggest leaving that out.
Doc says "you cannot use an infinite number of retries without latency-limit..."
Why should this be forbidden? At least if -T timeout takes precedence and
shortens the execution, ISTM that there could be good reason to test that.
Maybe it could be blocked only under -t if this would lead to a non-ending
run.
As "--print-errors" is really for debug, maybe it could be named
"--debug-errors". I'm not sure that having "--debug" implying this option
is useful: As there are two distinct options, the user may be allowed
to trigger one or the other as they wish?
* Code
The following remarks are linked to the change of behavior discussed above:
makeVariableValue error message is not for debug, but must be kept in all
cases, and the false returned must result in an immediate abort. Same thing
about lookupCreateVariable: an invalid name is a user error which warrants
an immediate abort. Same thing again about coerce* functions or
evalStandardFunc...
Basically, most/all added "debug_level >= DEBUG_ERRORS" are not desirable.
sendRollback(): I'd suggest to simplify. The prepare/extended statement stuff is
really about the transaction script, not dealing with errors, esp as there is no
significant advantage in preparing a "ROLLBACK" statement which is short and has
no parameters. I'd suggest to remove this function and just issue
PQsendQuery("ROLLBACK;") in all cases.
In copyVariables, I'd simplify
+ if (source_var->svalue == NULL)
+ dest_var->svalue = NULL;
+ else
+ dest_var->svalue = pg_strdup(source_var->svalue);
as:
dest_var->svalue = (source_var->svalue == NULL) ? NULL : pg_strdup(source_var->svalue);
+ if (sqlState) -> if (sqlState != NULL) ?
Function getTransactionStatus name does not seem to correspond fully to what the
function does. There is a passthru case which should be either avoided or
clearly commented.
About:
- commandFailed(st, "SQL", "perhaps the backend died while processing");
+ clientAborted(st,
+ "perhaps the backend died while processing");
keep on one line?
About:
+ if (doRetry(st, &now))
+ st->state = CSTATE_RETRY;
+ else
+ st->state = CSTATE_FAILURE;
-> st->state = doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
* Comments
"There're different types..." -> "There are different types..."
"after the errors and"... -> "after errors and"...
"the default value of max_tries is set to 1" -> "the default value
of max_tries is 1"
"We cannot retry the transaction" -> "We cannot retry a transaction"
"may ultimately succeed or get a failure," -> "may ultimately succeed or fail,"
Overall, the comment text in StatsData is very clear. However, the comments
are not clearly linked to the struct fields. I'd suggest that each field,
when used, should be quoted, so as to separate English from code, and the
struct name should always be used explicitly when possible.
I'd insist in a comment that "cnt" does not include "skipped" transactions
(anymore).
* Documentation:
Some suggestions which may be improvements, although I'm not a native English
speaker.
ISTM that there are too many "the":
- "turns on the option ..." -> "turns on option ..."
- "When the option ..." -> "When option ..."
- "By default the option ..." -> "By default option ..."
- "only if the option ..." -> "only if option ..."
- "combined with the option ..." -> "combined with option ..."
- "without the option ..." -> "without option ..."
- "is the sum of all the retries" -> "is the sum of all retries"
"infinite" -> "unlimited"
"not retried at all" -> "not retried" (maybe several times).
"messages of all errors" -> "messages about all errors".
"It is assumed that the scripts used do not contain" ->
"It is assumed that pgbench scripts do not contain"
About v11-4: I do not feel that these changes are very useful/important
for now. I'd propose that you prioritize updating 11-3 so that we can
have another round about it as soon as possible, and keep that one for later.
--
Fabien.
On 08-09-2018 10:17, Fabien COELHO wrote:
Hello Marina,
Hello, Fabien!
About the two first preparatory patches.
v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
- a patch for the RandomState structure (this is used to reset a
client's random seed during the repeating of transactions after
serialization/deadlock failures).
Same version as the previous one, which was ok. Still applies,
compiles, passes tests. Fine with me.
v11-0002-Pgbench-errors-use-the-Variables-structure-for-c.patch
- a patch for the Variables structure (this is used to reset client
variables during the repeating of transactions after
serialization/deadlock failures).
Simpler version, applies cleanly on top of previous patch, compiles
and global & local "make check" are ok. Fine with me as well.
Glad to hear it :)
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 08-09-2018 16:03, Fabien COELHO wrote:
Hello Marina,
v11-0003-Pgbench-errors-and-serialization-deadlock-retrie.patch
- the main patch for handling client errors and repetition of
transactions with serialization/deadlock failures (see the detailed
description in the file).
About patch v11-3.
Patch applies cleanly on top of the other two. Compiles, global and
local
"make check" are ok.
:-)
* Features
As far as the actual retry feature is concerned, I'd say we are nearly
there. However I have an issue with changing the behavior on meta
command and other SQL errors, which I find undesirable.
When a meta-command fails, before the patch the command is aborted and
there is a convenient error message:
sh> pgbench -T 10 -f bad-meta.sql
bad-meta.sql:1: unexpected function name (false) in command "set"
[...]
\set i false + 1 [...]
After the patch it is simply counted, pgbench loops on the same error
till the time is completed, and there is no clue about the actual
issue:
sh> pgbench -T 10 -f bad-meta.sql
starting vacuum...end.
transaction type: bad-meta.sql
duration: 10 s
number of transactions actually processed: 0
number of failures: 27993953 (100.000%)
...
Same thing about SQL errors, an immediate abort...
sh> pgbench -T 10 -f bad-sql.sql
starting vacuum...end.
client 0 aborted in command 0 of script 0; ERROR: syntax error at or
near ";"
LINE 1: SELECT 1 + ;
... is turned into counting without aborting nor error messages, so
that there is no clue that the user was asking for something bad.
sh> pgbench -T 10 -f bad-sql.sql
starting vacuum...end.
transaction type: bad-sql.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 0
number of failures: 274617 (100.000%)
# no clue that there was a syntax error in the script
I do not think that these changes of behavior are desirable. Meta-command
and miscellaneous SQL errors should result in immediately aborting the
whole run, because the client test code itself could not run correctly,
or the SQL sent was somehow wrong, which is also the client's fault, and
the server performance bench does not make much sense in such conditions.
ISTM that the focus of this patch should only be to handle some server
runtime errors that can be retried, not to change pgbench behavior
on other kinds of errors. If these are to be changed, ISTM that it
would be a distinct patch and would require some discussion, and
possibly an option to enable it or not if some use case emerges. As far
as this patch is concerned, I'd suggest leaving that out.
...
The following remarks are linked to the change of behavior discussed
above:
makeVariableValue error message is not for debug, but must be kept in
all cases, and the false returned must result in an immediate abort.
Same thing about lookupCreateVariable: an invalid name is a user error
which warrants an immediate abort. Same thing again about coerce*
functions or evalStandardFunc...
Basically, most/all added "debug_level >= DEBUG_ERRORS" are not
desirable.
Hmm, but we can say the same for serialization or deadlock errors that
were not retried (the client test code itself could not run correctly or
the SQL sent was somehow wrong, which is also the client's fault), can't
we? Why not handle client errors that can occur (but they may also not
occur) the same way? (For example, always abort the client, or
conversely do not abort in these cases.) Here's an example of such an
error:
starting vacuum...end.
transaction type: pgbench_rare_sql_error.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 250
number of transactions actually processed: 2500/2500
maximum number of tries: 1
latency average = 0.375 ms
tps = 26695.292848 (including connections establishing)
tps = 27489.678525 (excluding connections establishing)
statement latencies in milliseconds and failures:
0.001 0 \set divider random(-1000, 1000)
0.245 0 SELECT 1 / :divider;
starting vacuum...end.
client 5 got an error in command 1 (SQL) of script 0; ERROR: division
by zero
client 0 got an error in command 1 (SQL) of script 0; ERROR: division
by zero
client 7 got an error in command 1 (SQL) of script 0; ERROR: division
by zero
transaction type: pgbench_rare_sql_error.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 250
number of transactions actually processed: 2497/2500
number of failures: 3 (0.120%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other SQL failures: 3 (0.120%)
maximum number of tries: 1
latency average = 0.579 ms (including failures)
tps = 17240.662547 (including connections establishing)
tps = 17862.090137 (excluding connections establishing)
statement latencies in milliseconds and failures:
0.001 0 \set divider random(-1000, 1000)
0.338 3 SELECT 1 / :divider;
Maybe we can limit the number of failures in one statement, and abort
the client if this limit is exceeded?...
To get a clue about the actual issue you can use the options
--failures-detailed (to find out whether this is a serialization
failure / deadlock failure / other SQL failure / meta-command failure)
and/or --print-errors (to get the complete error message).
Doc says "you cannot use an infinite number of retries without
latency-limit..."
Why should this be forbidden? At least if -T timeout takes precedence
and shortens the execution, ISTM that there could be good reason to
test that.
Maybe it could be blocked only under -t if this would lead to a
non-ending run.
...
* Comments
"There're different types..." -> "There are different types..."
"after the errors and"... -> "after errors and"...
"the default value of max_tries is set to 1" -> "the default value
of max_tries is 1"
"We cannot retry the transaction" -> "We cannot retry a transaction"
"may ultimately succeed or get a failure," -> "may ultimately succeed
or fail,"
...
* Documentation:
Some suggestions which may be improvements, although I'm not a native
English
speaker.
ISTM that there are too many "the":
- "turns on the option ..." -> "turns on option ..."
- "When the option ..." -> "When option ..."
- "By default the option ..." -> "By default option ..."
- "only if the option ..." -> "only if option ..."
- "combined with the option ..." -> "combined with option ..."
- "without the option ..." -> "without option ..."
- "is the sum of all the retries" -> "is the sum of all retries"
"infinite" -> "unlimited"
"not retried at all" -> "not retried" (maybe several times).
"messages of all errors" -> "messages about all errors".
"It is assumed that the scripts used do not contain" ->
"It is assumed that pgbench scripts do not contain"
Thank you, I'll fix this.
If you use the option --latency-limit, the time of tries will be limited
regardless of the use of the option -t. Therefore ISTM that an unlimited
number of tries can be used only if the time of tries is limited by the
options -T and/or -L.
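The constraint Marina describes could be sketched as a small option check. The helper name and parameters below are hypothetical, not taken from the patch; they only illustrate the rule that unlimited tries require a time bound:

```c
#include <stdbool.h>

/*
 * Hypothetical helper illustrating the rule above: an unlimited number
 * of tries (max_tries == 0 here) is only allowed when the total time
 * spent retrying is bounded by a duration (-T) and/or a latency limit
 * (--latency-limit). A finite number of tries is always acceptable,
 * even with -t alone.
 */
bool
retry_options_valid(int max_tries, bool duration_given, bool latency_limit_given)
{
	if (max_tries == 0)
		return duration_given || latency_limit_given;
	return true;
}
```

With this shape, a run using only -t and unlimited retries would be rejected, matching the documentation text quoted earlier.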
As "--print-errors" is really for debug, maybe it could be named
"--debug-errors".
Ok!
I'm not sure that having "--debug" implying this option
is useful: As there are two distinct options, the user may be allowed
to trigger one or the other as they wish?
I'm not sure that the main debugging output will give a good clue of
what's happened without full messages about errors, retries and
failures...
* Code
<...>
sendRollback(): I'd suggest to simplify. The prepare/extended statement
stuff is really about the transaction script, not dealing with errors,
esp as there is no significant advantage in preparing a "ROLLBACK"
statement which is short and has no parameters. I'd suggest to remove
this function and just issue PQsendQuery("ROLLBACK;") in all cases.
Ok!
In copyVariables, I'd simplify
+ if (source_var->svalue == NULL)
+ dest_var->svalue = NULL;
+ else
+ dest_var->svalue = pg_strdup(source_var->svalue);
as:
dest_var->svalue = (source_var->svalue == NULL) ? NULL :
pg_strdup(source_var->svalue);
About:
+ if (doRetry(st, &now))
+ st->state = CSTATE_RETRY;
+ else
+ st->state = CSTATE_FAILURE;
-> st->state = doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
These lines are quite long - do you suggest to wrap them this way?
+ dest_var->svalue = ((source_var->svalue == NULL) ? NULL :
+ pg_strdup(source_var->svalue));
+ st->state = (doRetry(st, &now) ? CSTATE_RETRY :
+ CSTATE_FAILURE);
+ if (sqlState) -> if (sqlState != NULL) ?
Ok!
Function getTransactionStatus name does not seem to correspond fully to
what the function does. There is a passthru case which should be either
avoided or clearly commented.
I don't quite understand you - do you mean that in fact this function
finds out whether we are in a (failed) transaction block or not? Or do
you mean that the case of PQTRANS_INTRANS is also ok?...
About:
- commandFailed(st, "SQL", "perhaps the backend died while processing");
+ clientAborted(st,
+ "perhaps the backend died while processing");
keep on one line?
I tried not to break the limit of 80 characters, but if you think that
this is better, I'll change it.
Overall, the comment text in StatsData is very clear. However, the
comments are not clearly linked to the struct fields. I'd suggest that
each field, when used, should be quoted, so as to separate English from
code, and the struct name should always be used explicitly when
possible.
Ok!
I'd insist in a comment that "cnt" does not include "skipped"
transactions
(anymore).
If you mean CState.cnt I'm not sure if this is practically useful
because the code uses only the sum of all client transactions including
skipped and failed... Maybe we can rename this field to nxacts or
total_cnt?
About v11-4: I do not feel that these changes are very
useful/important for now. I'd propose that you prioritize updating
11-3 so that we can have another round about it as soon as possible,
and keep that one later.
Ok!
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 11-09-2018 16:47, Marina Polyakova wrote:
On 08-09-2018 16:03, Fabien COELHO wrote:
Hello Marina,
I'd insist in a comment that "cnt" does not include "skipped"
transactions
(anymore).
If you mean CState.cnt I'm not sure if this is practically useful
because the code uses only the sum of all client transactions
including skipped and failed... Maybe we can rename this field to
nxacts or total_cnt?
Sorry, I misread your proposal for the first time. Ok!
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Marina,
Hmm, but we can say the same for serialization or deadlock errors that were
not retried (the client test code itself could not run correctly or the SQL
sent was somehow wrong, which is also the client's fault), can't we?
I think not.
If a client asks for something "legal", but some other client in parallel
happens to make an incompatible change which results in a serialization or
deadlock error, the clients are not responsible for the raised errors; it
is just that they happen to ask for something incompatible at the same
time. So there is no user error per se, but the server is reporting its
(temporary) inability to process what was asked for. For these errors,
retrying is fine. If the client was alone, there would be no such errors,
you cannot deadlock with yourself. This is really an isolation issue
linked to parallel execution.
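In SQLSTATE terms, PostgreSQL reports serialization failures as 40001 and deadlocks as 40P01, so the retryable class Fabien describes could be sketched as follows. The function name is illustrative only, not the patch's actual API:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Sketch of the distinction discussed above: only serialization
 * failures (SQLSTATE 40001) and deadlocks (SQLSTATE 40P01) are
 * transient, concurrency-induced errors worth retrying. Any other
 * error (e.g. 22012, division_by_zero) is a script/user problem that
 * retrying cannot fix.
 */
bool
error_is_retryable(const char *sqlstate)
{
	if (sqlstate == NULL)
		return false;
	return strcmp(sqlstate, "40001") == 0 ||	/* serialization_failure */
		strcmp(sqlstate, "40P01") == 0;			/* deadlock_detected */
}
```

In libpq terms, the SQLSTATE would come from PQresultErrorField(res, PG_DIAG_SQLSTATE) after a failed statement.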
Why not handle client errors that can occur (but they may also not
occur) the same way? (For example, always abort the client, or
conversely do not make aborts in these cases.) Here's an example of such
error:
client 5 got an error in command 1 (SQL) of script 0; ERROR: division by zero
This is an interesting case. For me we must stop the script because the
client is asking for something "stupid", and retrying the same won't
change the outcome; the division will still be by zero. It is the client's
responsibility not to ask for something stupid: the bench script is buggy,
it should not submit illegal SQL queries. This is quite different from
submitting something legal which happens to fail.
Maybe we can limit the number of failures in one statement, and abort the
client if this limit is exceeded?...
I think this is quite debatable, and that the best option is to leave
this point out of the current patch, so that we could have retry on
serial/deadlock errors.
Then you can submit another patch for a feature about other errors if you
feel that there is a use case for going on in some cases. I think that the
previous behavior made sense, and that changing it should only be
considered as an option. As it involves discussing and is not obvious,
later is better.
To get a clue about the actual issue you can use the options
--failures-detailed (to find out whether this is a serialization failure
/ deadlock failure / other SQL failure / meta command failure) and/or
--print-errors (to get the complete error message).
Yep, but for me it should have stopped immediately, as it did before.
If you use the option --latency-limit, the time of tries will be limited
regardless of the use of the option -t. Therefore ISTM that an unlimited
number of tries can be used only if the time of tries is limited by the
options -T and/or -L.
Indeed, I'm ok with forbidding unlimited retries when under -t.
I'm not sure that having "--debug" implying this option
is useful: As there are two distinct options, the user may be allowed
to trigger one or the other as they wish?
I'm not sure that the main debugging output will give a good clue of what's
happened without full messages about errors, retries and failures...
I'm arguing more for letting the user decide what they want.
These lines are quite long - do you suggest to wrap them this way?
Sure, if it is too long, then wrap.
Function getTransactionStatus name does not seem to correspond fully to
what the function does. There is a passthru case which should be either
avoided or clearly commented.
I don't quite understand you - do you mean that in fact this function finds
out whether we are in a (failed) transaction block or not? Or do you mean
that the case of PQTRANS_INTRANS is also ok?...
The former: although the function is named "getTransactionStatus", it does
not really return the "status" of the transaction (aka PQstatus()?).
I tried not to break the limit of 80 characters, but if you think that this
is better, I'll change it.
Hmmm. 80 columns, indeed...
I'd insist in a comment that "cnt" does not include "skipped" transactions
(anymore).
If you mean CState.cnt I'm not sure if this is practically useful because the
code uses only the sum of all client transactions including skipped and
failed... Maybe we can rename this field to nxacts or total_cnt?
I'm fine with renaming the field if it makes things clearer. They are all
counters, so naming them "cnt" or "total_cnt" does not help much. Maybe
"succeeded" or "success" to show what is really counted?
--
Fabien.
On 11-09-2018 18:29, Fabien COELHO wrote:
Hello Marina,
Hmm, but we can say the same for serialization or deadlock errors that
were not retried (the client test code itself could not run correctly
or the SQL sent was somehow wrong, which is also the client's fault),
can't we?
I think not.
If a client asks for something "legal", but some other client in
parallel happens to make an incompatible change which results in a
serialization or deadlock error, the clients are not responsible for
the raised errors, it is just that they happen to ask for something
incompatible at the same time. So there is no user error per se, but
the server is reporting its (temporary) inability to process what was
asked for. For these errors, retrying is fine. If the client was
alone, there would be no such errors, you cannot deadlock with
yourself. This is really an isolation issue linked to parallel
execution.
You can get other errors that cannot happen for only one client if you
use shell commands in meta commands:
starting vacuum...end.
transaction type: pgbench_meta_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 20/20
maximum number of tries: 1
latency average = 6.953 ms
tps = 287.630161 (including connections establishing)
tps = 303.232242 (excluding connections establishing)
statement latencies in milliseconds and failures:
1.636 0 BEGIN;
1.497 0 \setshell var mkdir my_directory && echo 1
0.007 0 \sleep 1 us
1.465 0 \setshell var rmdir my_directory && echo 1
1.622 0 END;
starting vacuum...end.
mkdir: cannot create directory ‘my_directory’: File exists
mkdir: could not read result of shell command
client 1 got an error in command 1 (setshell) of script 0; execution of
meta-command failed
transaction type: pgbench_meta_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 19/20
number of failures: 1 (5.000%)
number of meta-command failures: 1 (5.000%)
maximum number of tries: 1
latency average = 11.782 ms (including failures)
tps = 161.269033 (including connections establishing)
tps = 167.733278 (excluding connections establishing)
statement latencies in milliseconds and failures:
2.731 0 BEGIN;
2.909 1 \setshell var mkdir my_directory && echo 1
0.231 0 \sleep 1 us
2.366 0 \setshell var rmdir my_directory && echo 1
2.664 0 END;
Or if you use untrusted procedural languages in SQL expressions (see the
used file in the attachments):
starting vacuum...ERROR: relation "pgbench_branches" does not exist
(ignoring this error and continuing anyway)
ERROR: relation "pgbench_tellers" does not exist
(ignoring this error and continuing anyway)
ERROR: relation "pgbench_history" does not exist
(ignoring this error and continuing anyway)
end.
client 1 got an error in command 0 (SQL) of script 0; ERROR: could not
create the directory "my_directory": File exists at line 3.
CONTEXT: PL/Perl anonymous code block
client 1 got an error in command 0 (SQL) of script 0; ERROR: could not
create the directory "my_directory": File exists at line 3.
CONTEXT: PL/Perl anonymous code block
transaction type: pgbench_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 18/20
number of failures: 2 (10.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other SQL failures: 2 (10.000%)
maximum number of tries: 1
latency average = 3.282 ms (including failures)
tps = 548.437196 (including connections establishing)
tps = 637.662753 (excluding connections establishing)
statement latencies in milliseconds and failures:
1.566 2 DO $$
starting vacuum...ERROR: relation "pgbench_branches" does not exist
(ignoring this error and continuing anyway)
ERROR: relation "pgbench_tellers" does not exist
(ignoring this error and continuing anyway)
ERROR: relation "pgbench_history" does not exist
(ignoring this error and continuing anyway)
end.
transaction type: pgbench_concurrent_error.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 20/20
maximum number of tries: 1
latency average = 2.760 ms
tps = 724.746078 (including connections establishing)
tps = 853.131985 (excluding connections establishing)
statement latencies in milliseconds and failures:
1.893 0 DO $$
Or if you try to create a function and perhaps replace an existing one:
starting vacuum...end.
client 0 got an error in command 0 (SQL) of script 0; ERROR: duplicate
key value violates unique constraint "pg_proc_proname_args_nsp_index"
DETAIL: Key (proname, proargtypes, pronamespace)=(my_function, , 2200)
already exists.
client 0 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated
client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated
client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated
client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated
client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated
client 0 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated
client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated
client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated
client 0 got an error in command 0 (SQL) of script 0; ERROR: tuple
concurrently updated
transaction type: pgbench_create_function.sql
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/20
number of failures: 10 (50.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other SQL failures: 10 (50.000%)
maximum number of tries: 1
latency average = 82.881 ms (including failures)
tps = 12.065492 (including connections establishing)
tps = 12.092216 (excluding connections establishing)
statement latencies in milliseconds and failures:
82.549 10 CREATE OR REPLACE FUNCTION my_function()
RETURNS integer AS 'select 1;' LANGUAGE SQL;
Why not handle client errors that can occur (but they may also not
occur) the same way? (For example, always abort the client, or
conversely do not abort in these cases.) Here's an example of
such an error:
client 5 got an error in command 1 (SQL) of script 0; ERROR: division
by zero
This is an interesting case. For me we must stop the script because
the client is asking for something "stupid", and retrying the same
won't change the outcome; the division will still be by zero. It is
the client's responsibility not to ask for something stupid: the bench
script is buggy, it should not submit illegal SQL queries. This is
quite different from submitting something legal which happens to fail.
...I'm not sure that having "--debug" implying this option
is useful: As there are two distinct options, the user may be allowed
to trigger one or the other as they wish?
I'm not sure that the main debugging output will give a good clue of
what's happened without full messages about errors, retries and
failures...
I'm arguing more for letting the user decide what they want.
These lines are quite long - do you suggest to wrap them this way?
Sure, if it is too long, then wrap.
Ok!
Function getTransactionStatus name does not seem to correspond fully
to what the function does. There is a passthru case which should be
either avoided or clearly commented.
I don't quite understand you - do you mean that in fact this function
finds out whether we are in a (failed) transaction block or not? Or do
you mean that the case of PQTRANS_INTRANS is also ok?...
The former: although the function is named "getTransactionStatus", it
does not really return the "status" of the transaction (aka
PQstatus()?).
Thank you, I'll think how to improve it. Perhaps the name
checkTransactionStatus will be better...
I'd insist in a comment that "cnt" does not include "skipped"
transactions
(anymore).
If you mean CState.cnt I'm not sure if this is practically useful
because the code uses only the sum of all client transactions
including skipped and failed... Maybe we can rename this field to
nxacts or total_cnt?
I'm fine with renaming the field if it makes things clearer. They are
all counters, so naming them "cnt" or "total_cnt" does not help much.
Maybe "succeeded" or "success" to show what is really counted?
Perhaps renaming of StatsData.cnt is better than just adding a comment
to this field. But IMO we have the same problem (They are all counters,
so naming them "cnt" or "total_cnt" does not help much.) for CState.cnt
which cannot be named in the same way because it also includes skipped
and failed transactions.
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hello Marina,
You can get other errors that cannot happen for only one client if you use
shell commands in meta commands:
Or if you use untrusted procedural languages in SQL expressions (see the used
file in the attachments):
Or if you try to create a function and perhaps replace an existing one:
Sure. Indeed there can be shell errors, perl errors, create-function
conflicts... I do not understand what your point is wrt these.
I'm mostly saying that your patch should focus on implementing the retry
feature when appropriate, and avoid changing the behavior (error
displayed, abort or not) on features unrelated to serialization & deadlock
errors.
Maybe there are inconsistencies, and "bug"/"feature" worth fixing, but if
so that should be a separate patch, if possible, and if these are bugs
they could be backpatched.
For now I'm still convinced that pgbench should keep on aborting on "\set"
or SQL syntax errors, and show clear error messages on these, and your
examples have not changed my mind on that point.
I'm fine with renaming the field if it makes things clearer. They are
all counters, so naming them "cnt" or "total_cnt" does not help much.
Maybe "succeeded" or "success" to show what is really counted?
Perhaps renaming of StatsData.cnt is better than just adding a comment to
this field. But IMO we have the same problem (They are all counters, so
naming them "cnt" or "total_cnt" does not help much.) for CState.cnt which
cannot be named in the same way because it also includes skipped and failed
transactions.
Hmmm. CState's cnt seems only used to implement -t anyway? I'm okay if it
has a different name, esp if it has a different semantics. I think I was
arguing only about cnt in StatsData.
--
Fabien.
On 12-09-2018 17:04, Fabien COELHO wrote:
Hello Marina,
You can get other errors that cannot happen for only one client if you
use shell commands in meta commands:
Or if you use untrusted procedural languages in SQL expressions (see
the used file in the attachments):
Or if you try to create a function and perhaps replace an existing
one:
Sure. Indeed there can be shell errors, perl errors, create-function
conflicts... I do not understand what your point is wrt these.
I'm mostly saying that your patch should focus on implementing the
retry feature when appropriate, and avoid changing the behavior (error
displayed, abort or not) on features unrelated to serialization &
deadlock errors.
Maybe there are inconsistencies, and "bugs"/"features" worth fixing, but
if so that should be a separate patch, if possible, and if these are
bugs they could be backpatched.
For now I'm still convinced that pgbench should keep on aborting on
"\set" or SQL syntax errors, and show clear error messages on these,
and your examples have not changed my mind on that point.
I'm fine with renaming the field if it makes things clearer. They are
all counters, so naming them "cnt" or "total_cnt" does not help much.
Maybe "succeeded" or "success" to show what is really counted?
Perhaps renaming of StatsData.cnt is better than just adding a comment
to this field. But IMO we have the same problem (They are all
counters, so naming them "cnt" or "total_cnt" does not help much.) for
CState.cnt which cannot be named in the same way because it also
includes skipped and failed transactions.
Hmmm. CState's cnt seems only used to implement -t anyway? I'm okay if
it has a different name, esp if it has a different semantics.
Ok!
I think I was arguing only about cnt in StatsData.
The discussion about this has become entangled from the beginning,
because as I wrote in [1] at first I misread your original proposal...
[1]: /messages/by-id/d318cdee8f96de6b1caf2ce684ffe4db@postgrespro.ru
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Wed, Sep 12, 2018 at 06:12:29PM +0300, Marina Polyakova wrote:
The discussion about this has become entangled from the beginning, because
as I wrote in [1] at first I misread your original proposal...
The last emails are about the latest reviews from Fabien, which have
remained unanswered for the last couple of weeks. I am marking this patch
as returned with feedback for now.
--
Michael
On 2018-Sep-05, Marina Polyakova wrote:
v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
- a patch for the RandomState structure (this is used to reset a client's
random seed during the repeating of transactions after
serialization/deadlock failures).
Pushed this one with minor stylistic changes (the most notable of which
is the move of initRandomState to where the rest of the random generator
infrastructure is, instead of in a totally random place). Thanks,
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-11-16 22:59, Alvaro Herrera wrote:
On 2018-Sep-05, Marina Polyakova wrote:
v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
- a patch for the RandomState structure (this is used to reset a
client's random seed during the repeating of transactions after
serialization/deadlock failures).
Pushed this one with minor stylistic changes (the most notable of which
is the move of initRandomState to where the rest of the random generator
infrastructure is, instead of in a totally random place). Thanks,
Thank you very much! I'm going to send a new patch set by the end of
this week (I'm sorry, I was very busy with the release of Postgres Pro
11...).
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2018-Nov-19, Marina Polyakova wrote:
On 2018-11-16 22:59, Alvaro Herrera wrote:
On 2018-Sep-05, Marina Polyakova wrote:
v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
- a patch for the RandomState structure (this is used to reset a
client's random seed during the repeating of transactions after
serialization/deadlock failures).
Pushed this one with minor stylistic changes (the most notable of which
is the move of initRandomState to where the rest of the random generator
infrastructure is, instead of in a totally random place). Thanks,
Thank you very much! I'm going to send a new patch set by the end of this
week (I'm sorry, I was very busy with the release of Postgres Pro 11...).
Great, thanks.
I also think that the pgbench_error() patch should go in before the main
one. It seems a bit pointless to introduce code using a bad API only to
fix the API together with all the new callers immediately afterwards.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Alvaro,
I also think that the pgbench_error() patch should go in before the main
one. It seems a bit pointless to introduce code using a bad API only to
fix the API together with all the new callers immediately afterwards.
I'm not that keen on this part of the patch, because ISTM that it introduces
significant and possibly costly malloc/free cycles when handling errors,
which do not currently exist in pgbench.
Previously an error was basically the end of the script, but with the
feature being introduced by Marina some errors are handled, in which case
we end up paying these costs in the test loop. Also, refactoring
error handling is not necessary for the new feature. That is why I advised
moving it out and possibly keeping it for later.
Related to Marina patch (triggered by reviewing the patches), I have
submitted a refactoring patch which aims at cleaning up the internal state
machine, so that additions and checking that all is well is simpler.
https://commitfest.postgresql.org/20/1754/
It has been reviewed, I think I answered the reviewer's concerns, but the
reviewer did not update the patch state in the CF app, so I do not know
whether he is unsatisfied or if it was just forgotten.
--
Fabien.
On 2018-Nov-19, Fabien COELHO wrote:
Hello Alvaro,
I also think that the pgbench_error() patch should go in before the main
one. It seems a bit pointless to introduce code using a bad API only to
fix the API together with all the new callers immediately afterwards.
I'm not that keen on this part of the patch, because ISTM that it introduces
significant and possibly costly malloc/free cycles when handling errors,
which do not currently exist in pgbench.
Oh, I wasn't aware of that.
Related to Marina patch (triggered by reviewing the patches), I have
submitted a refactoring patch which aims at cleaning up the internal state
machine, so that additions and checking that all is well is simpler.
Let me look at this one.
It has been reviewed, I think I answered the reviewer's concerns, but the
reviewer did not update the patch state in the CF app, so I do not know
whether he is unsatisfied or if it was just forgotten.
Feel free to update a patch's status to "needs review" yourself after
submitting a new version that in your opinion responds to a reviewer's
comments.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Feel free to update a patch's status to "needs review" yourself after
submitting a new version that in your opinion responds to a reviewer's
comments.
Sure, I'll do that. But I will not switch any of my patches to "Ready".
AFAICR the concerns were mostly about imprecise comments in the code, and
a few questions that I answered.
--
Fabien.
On Mon, Mar 9, 2020 at 10:00 AM Marina Polyakova
<m.polyakova@postgrespro.ru> wrote:
On 2018-11-16 22:59, Alvaro Herrera wrote:
On 2018-Sep-05, Marina Polyakova wrote:
v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
- a patch for the RandomState structure (this is used to reset a
client's random seed during the repeating of transactions after
serialization/deadlock failures).
Pushed this one with minor stylistic changes (the most notable of which
is the move of initRandomState to where the rest of the random generator
infrastructure is, instead of in a totally random place). Thanks,
Thank you very much! I'm going to send a new patch set by the end of this
week (I'm sorry, I was very busy with the release of Postgres Pro 11...).
Is anyone interested in rebasing this, and summarising what needs to
be done to get it in? It's arguably a bug or at least quite
unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
that a couple of forks already ship Marina's patch set.
Hello Thomas,
Thank you very much! I'm going to send a new patch set by the end of this
week (I'm sorry, I was very busy with the release of Postgres Pro 11...).
Is anyone interested in rebasing this, and summarising what needs to
be done to get it in? It's arguably a bug or at least quite
unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
that a couple of forks already ship Marina's patch set.
I'm a reviewer on this patch, which I find a good thing (tm), and which was
converging to a reasonable and simple enough addition, IMHO.
If I proceed in place of Marina, who is going to do the reviews?
--
Fabien.
On Tue, Mar 10, 2020 at 8:43 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
Thank you very much! I'm going to send a new patch set by the end of this
week (I'm sorry, I was very busy with the release of Postgres Pro 11...).
Is anyone interested in rebasing this, and summarising what needs to
be done to get it in? It's arguably a bug or at least quite
unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
that a couple of forks already ship Marina's patch set.
I'm a reviewer on this patch, which I find a good thing (tm), and which was
converging to a reasonable and simple enough addition, IMHO.
If I proceed in place of Marina, who is going to do the reviews?
Hi Fabien,
Cool. I'll definitely take it for a spin if you post a fresh patch
set. Any place that we arbitrarily don't support SERIALIZABLE, I
consider a bug, so I'd like to commit this if we can agree it's ready.
It sounds like it's actually in pretty good shape.
Hi hackers,
On Tue, 10 Mar 2020 09:48:23 +1300
Thomas Munro <thomas.munro@gmail.com> wrote:
On Tue, Mar 10, 2020 at 8:43 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
Thank you very much! I'm going to send a new patch set by the end of this
week (I'm sorry, I was very busy with the release of Postgres Pro 11...).
Is anyone interested in rebasing this, and summarising what needs to
be done to get it in? It's arguably a bug or at least quite
unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
that a couple of forks already ship Marina's patch set.
I got interested in this and am now looking into the patch and the past
discussion. If no one else will do it and there are no objections, I would
like to rebase this. Is that okay?
Regards,
Yugo NAGATA
I'm a reviewer on this patch, which I find a good thing (tm), and which was
converging to a reasonable and simple enough addition, IMHO.
If I proceed in place of Marina, who is going to do the reviews?
Hi Fabien,
Cool. I'll definitely take it for a spin if you post a fresh patch
set. Any place that we arbitrarily don't support SERIALIZABLE, I
consider a bug, so I'd like to commit this if we can agree it's ready.
It sounds like it's actually in pretty good shape.
--
Yugo NAGATA <nagata@sraoss.co.jp>
Hi hackers,
On Mon, 24 May 2021 11:29:10 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:
Hi hackers,
On Tue, 10 Mar 2020 09:48:23 +1300
Thomas Munro <thomas.munro@gmail.com> wrote:
On Tue, Mar 10, 2020 at 8:43 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
Thank you very much! I'm going to send a new patch set by the end of this
week (I'm sorry, I was very busy with the release of Postgres Pro 11...).
Is anyone interested in rebasing this, and summarising what needs to
be done to get it in? It's arguably a bug or at least quite
unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
that a couple of forks already ship Marina's patch set.
I got interested in this and am now looking into the patch and the past
discussion. If no one else will do it and there are no objections, I would
like to rebase this. Is that okay?
I rebased and fixed the previous patches (v11) written by Marina Polyakova,
and attached the revised version (v12).
v12-0001-Pgbench-errors-use-the-Variables-structure-for-c.patch
- a patch for the Variables structure (this is used to reset client
variables during the repeating of transactions after
serialization/deadlock failures).
v12-0002-Pgbench-errors-and-serialization-deadlock-retrie.patch
- the main patch for handling client errors and repetition of
transactions with serialization/deadlock failures (see the detailed
description in the file).
These are the revised versions of v11-0002 and v11-0003. v11-0001
(for the RandomState structure) is not included because it has already
been committed (40923191944). v11-0004 (for a separate error reporting
function) is not included either, because pgbench now uses the common
logging APIs (30a3e772b40).
In addition to rebasing on master, I updated the patch according to the
review from Fabien COELHO [1] and the discussions after it. Also, I added
some other fixes through my review of the previous patch.
[1]: /messages/by-id/alpine.DEB.2.21.1809081450100.10506@lancre
The following are fixes according to Fabien's review.
* Features
As far as the actual retry feature is concerned, I'd say we are nearly
there. However I have an issue with changing the behavior on meta command
and other sql errors, which I find not desirable.
...
I do not think that these changes of behavior are desirable. Meta command and
miscellaneous SQL errors should result in immediately aborting the whole run,
because the client test code itself could not run correctly or the SQL sent
was somehow wrong, which is also the client's fault, and the server
performance bench does not make much sense in such conditions.
ISTM that the focus of this patch should only be to handle some server
runtime errors that can be retried, but not to change pgbench behavior on
other kinds of errors. If these are to be changed, ISTM that it would be a
distinct patch and would require some discussion, and possibly an option
to enable it or not if some use case emerges. AFA this patch is concerned,
I'd suggest to leave that out.
Previously, all SQL and meta command errors could be retried, but I changed
this so that only serialization & deadlock errors can be retried.
Doc says "you cannot use an infinite number of retries without latency-limit..."
Why should this be forbidden? At least if the -T timeout takes precedence and
shortens the execution, ISTM that there could be good reason to test that.
Maybe it could be blocked only under -t if this would lead to a non-ending
run.
I changed this to allow --max-tries to be used with the -T option even if
--latency-limit is not used.
As "--print-errors" is really for debug, maybe it could be named
"--debug-errors". I'm not sure that having "--debug" implying this option
is useful: As there are two distinct options, the user may be allowed
to trigger one or the other as they wish?
The --print-errors option was renamed to --debug-errors.
makeVariableValue error message is not for debug, but must be kept in all
cases, and the false returned must result in an immediate abort. Same thing about
lookupCreateVariable, an invalid name is a user error which warrants an immediate
abort. Same thing again about coerce* functions or evalStandardFunc...
Basically, most/all added "debug_level >= DEBUG_ERRORS" are not desirable.
"DEBUG_ERRORS" messages unrelated to serialization & deadlock errors were removed.
sendRollback(): I'd suggest to simplify. The prepare/extended statement stuff is
really about the transaction script, not dealing with errors, esp as there is no
significant advantage in preparing a "ROLLBACK" statement which is short and has
no parameters. I'd suggest to remove this function and just issue
PQsendQuery("ROLLBACK;") in all cases.
Now, we just issue PQsendQuery("ROLLBACK;").
In copyVariables, I'd simplify
    if (source_var->svalue == NULL)
        dest_var->svalue = NULL;
    else
        dest_var->svalue = pg_strdup(source_var->svalue);
as:
    dest_var->svalue = (source_var->svalue == NULL) ? NULL : pg_strdup(source_var->svalue);
Fixed using a ternary operator.
+ if (sqlState) -> if (sqlState != NULL) ?
Fixed.
Function getTransactionStatus name does not seem to correspond fully to what the
function does. There is a passthru case which should be either avoided or
clearly commented.
This was renamed to checkTransactionStatus according to [2].
[2]: /messages/by-id/c262e889315625e0fc0d77ca78fe2eac@postgrespro.ru
    - commandFailed(st, "SQL", "perhaps the backend died while processing");
    + clientAborted(st,
    +               "perhaps the backend died while processing");
keep on one line?
This fix that replaced commandFailed with clientAborted was removed.
(See below)
    + if (doRetry(st, &now))
    +     st->state = CSTATE_RETRY;
    + else
    +     st->state = CSTATE_FAILURE;
-> st->state = doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
Fixed using a ternary operator.
* Comments
"There're different types..." -> "There are different types..."
"after the errors and"... -> "after errors and"...
"the default value of max_tries is set to 1" -> "the default value
of max_tries is 1"
"We cannot retry the transaction" -> "We cannot retry a transaction"
"may ultimately succeed or get a failure," -> "may ultimately succeed or fail,"
Fixed.
Overall, the comment text in StatsData is very clear. However the comments
are not clearly linked to the struct fields. I'd suggest that each field,
when used, should be quoted, so as to separate English from code, and the
struct name should always be used explicitly when possible.
The comment in StatsData was fixed to clarify what each field in this struct
represents.
I'd insist on a comment that "cnt" does not include "skipped" transactions
(anymore).
StatsData.cnt has a comment "number of successful transactions, not including
'skipped'", and CState.cnt has a comment "skipped and failed transactions are
also counted here".
* Documentation:
ISTM that there are too many "the":
- "turns on the option ..." -> "turns on option ..."
- "When the option ..." -> "When option ..."
- "By default the option ..." -> "By default option ..."
- "only if the option ..." -> "only if option ..."
- "combined with the option ..." -> "combined with option ..."
- "without the option ..." -> "without option ..."
The previous patch used a lot of "the option xxxx", but I fixed
them to "the xxxx option" because I found that the documentation
refers to a certain option in that way. For example,
- You can (and, for most purposes, probably should) increase the number
of rows by using the <option>-s</option> (scale factor) option.
- The prefix can be changed by using the <option>--log-prefix</option> option.
- If the <option>-j</option> option is 2 or higher, so that there are multiple
worker threads,
- "is the sum of all the retries" -> "is the sum of all retries"
"infinite" -> "unlimited"
"not retried at all" -> "not retried" (maybe several times).
"messages of all errors" -> "messages about all errors".
"It is assumed that the scripts used do not contain" ->
"It is assumed that pgbench scripts do not contain"
Fixed.
The following are additional fixes based on my review of the previous patch.
* About error reporting
In the previous patch, commandFailed() was changed to report an error
that doesn't immediately abort the client, and clientAborted() was
added to report an abort of the client. In the attached patch,
behaviors around errors other than serialization and deadlock are
not changed and such errors cause the client to abort, so commandFailed()
is used without any changes to report a client abort, and commandError()
is added to report an error that can be retried under --debug-errors.
* About progress reporting
In the previous patch, the number of failures was reported only when some
transaction failed, and retry statistics were reported only when some
transaction was retried. This means the number of columns in the report
differed depending on the interval, which was odd and made the output
harder to parse.
In the attached patch, the number of failures is always reported, and
retry statistics are reported when max-tries is not 1.
* About result outputs
In the previous patch, the number of failed transactions, the number
of retried transactions, and the number of total retries were reported
as:
number of failures: 324 (3.240%)
...
number of retried: 5629 (56.290%)
number of retries: 103299
I think this was confusing. In particular, it was unclear to me what
"retried" and "retries" represent respectively. Therefore, in the
attached patch, they are reported as:
number of transactions failed: 324 (3.240%)
...
number of transactions retried: 5629 (56.290%)
number of total retries: 103299
which clarifies that the first two are numbers of transactions and the
last one is the number of retries over all transactions.
* About average connection time
In the previous patch, this was calculated as "conn_total_duration / total->cnt",
where conn_total_duration is the cumulative connection time summed over
threads and total->cnt is the number of successfully processed transactions.
However, the average connection time could be overestimated because
conn_total_duration includes the connection time of transactions that failed
due to serialization and deadlock errors. So, in the attached patch,
this is calculated as "conn_total_duration / (total->cnt + failures)".
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v12-0002-Pgbench-errors-and-serialization-deadlock-retrie.patch (text/x-diff)
From ae18f7445eff881800a398c27c850806048b060f Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Fri, 28 May 2021 10:48:57 +0900
Subject: [PATCH v12 2/2] Pgbench errors and serialization/deadlock retries
Client's run is aborted in case of a serious error, for example, the
connection with the database server was lost or the end of script reached
without completing the last transaction. In addition, if an execution of SQL
or meta command fails for reasons other than serialization or deadlock errors,
the client is aborted. Otherwise, if an SQL fails with serialization or
deadlock errors, the current transaction is rolled back which also
includes setting the client variables as they were before the run of this
transaction (it is assumed that one transaction script contains only one
transaction).
Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum number of
tries (specified by the --max-tries option) / the maximum time of tries
(specified by the --latency-limit option). These options can be combined
together; moreover, you cannot use an unlimited number of tries (--max-tries=0)
without the --latency-limit option or the --time option. By default the
--max-tries option is set to 1 and transactions with serialization/deadlock errors
are not retried. If the last transaction run fails, this transaction will be
reported as failed, and the client variables will be set as they were before
the first run of this transaction.
If there are retries and/or failures, their statistics are printed in the
progress, in the transaction / aggregation logs and in the end with other
results (all and for each script). Also retries and failures are printed
per-command with average latencies if you use the appropriate benchmarking
option (--report-per-command, -r). If you want to group failures by basic types
(serialization failures / deadlock failures), use the option --failures-detailed.
If you want to distinguish all errors and failures (errors without retrying)
by type, including which limit for retries was violated and how far it was
exceeded for the serialization/deadlock failures, use the --debug-errors
option.
---
doc/src/sgml/ref/pgbench.sgml | 399 +++++++-
src/bin/pgbench/pgbench.c | 952 +++++++++++++++++--
src/bin/pgbench/t/001_pgbench_with_server.pl | 217 ++++-
src/bin/pgbench/t/002_pgbench_no_server.pl | 10 +
src/fe_utils/conditional.c | 16 +-
src/include/fe_utils/conditional.h | 2 +
6 files changed, 1480 insertions(+), 116 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0c60077e1f..6811a6b29c 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -58,6 +58,7 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
latency average = 11.013 ms
latency stddev = 7.351 ms
initial connection time = 45.758 ms
@@ -65,11 +66,14 @@ tps = 896.967014 (without initial connection time)
</screen>
The first six lines report some of the most important parameter
- settings. The next line reports the number of transactions completed
+ settings. The seventh line reports the number of transactions completed
and intended (the latter being just the product of number of clients
and number of transactions per client); these will be equal unless the run
- failed before completion. (In <option>-T</option> mode, only the actual
- number of transactions is printed.)
+ failed before completion or some SQL command(s) failed. (In
+ <option>-T</option> mode, only the actual number of transactions is printed.)
+ The next line reports the maximum number of tries for transactions with
+ serialization or deadlock errors (see <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -528,6 +532,17 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
at all. They are counted and reported separately as
<firstterm>skipped</firstterm>.
</para>
+ <para>
+ When the <option>--max-tries</option> option is used, the transaction with
+ serialization or deadlock error cannot be retried if the total time of
+ all its tries is greater than <replaceable>limit</replaceable> ms. To
+ limit only the time of tries and not their number, use
+ <literal>--max-tries=0</literal>. By default option
+ <option>--max-tries</option> is set to 1 and transactions with
+ serialization/deadlock errors are not retried. See <xref
+ linkend="failures-and-retries" endterm="failures-and-retries-title"/>
+ for more information about retrying such transactions.
+ </para>
</listitem>
</varlistentry>
@@ -594,23 +609,29 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<para>
Show progress report every <replaceable>sec</replaceable> seconds. The report
includes the time since the beginning of the run, the TPS since the
- last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</option>),
- the latency is computed with respect to the transaction scheduled
- start time, not the actual transaction beginning time, thus it also
- includes the average schedule lag time.
+ last report, and the transaction latency average, standard deviation,
+ and the number of failed transactions since the last report. Under
+ throttling (<option>-R</option>), the latency is computed with respect
+ to the transaction scheduled start time, not the actual transaction
+ beginning time, thus it also includes the average schedule lag time.
+ When <option>--max-tries</option> is used to enable transactions retries
+ after serialization/deadlock errors, the report includes the number of
+ retried transactions and the sum of all retries.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-r</option></term>
- <term><option>--report-latencies</option></term>
+ <term><option>--report-per-command</option></term>
<listitem>
<para>
- Report the average per-statement latency (execution time from the
- perspective of the client) of each command after the benchmark
- finishes. See below for details.
+ Report the following statistics for each command after the benchmark
+ finishes: the average per-statement latency (execution time from the
+ perspective of the client), the number of failures and the number of
+ retries after serialization or deadlock errors in this command. The
+ report displays retry statistics only if the
+ <option>--max-tries</option> option is not equal to 1.
</para>
</listitem>
</varlistentry>
@@ -738,6 +759,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--failures-detailed</option></term>
+ <listitem>
+ <para>
+ Report failures in per-transaction and aggregation logs, as well as in
+ the main and per-script reports, grouped by the following types:
+ <itemizedlist>
+ <listitem>
+ <para>serialization failures;</para>
+ </listitem>
+ <listitem>
+ <para>deadlock failures;</para>
+ </listitem>
+ </itemizedlist>
+ See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--log-prefix=<replaceable>prefix</replaceable></option></term>
<listitem>
@@ -748,6 +789,38 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--max-tries=<replaceable>number_of_tries</replaceable></option></term>
+ <listitem>
+ <para>
+ Enable retries for transactions with serialization/deadlock errors and
+ set the maximum number of these tries. This option can be combined with
+ the <option>--latency-limit</option> option which limits the total time
+ of all transaction tries; more over, you cannot use an unlimited number
+ of tries (<literal>--max-tries=0</literal>) without
+ <option>--latency-limit</option> or <option>--time</option>.
+ The default value is 1 and transactions with serialization/deadlock
+ errors are not retried. See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information about
+ retrying such transactions.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--debug-errors</option></term>
+ <listitem>
+ <para>
+ Print messages about all errors and failures (errors without retrying)
+ including which limit for retries was violated and how far it was
+ exceeded for the serialization/deadlock failures. (Note that in this
+ case the output can be significantly increased.).
+ See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--progress-timestamp</option></term>
<listitem>
@@ -943,8 +1016,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<refsect1>
<title>Notes</title>
- <refsect2>
- <title>What Is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts">
+ <title id="transactions-and-scripts-title">What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
<para>
<application>pgbench</application> executes test scripts chosen randomly
@@ -1017,6 +1090,11 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
both old and new versions of <application>pgbench</application>, be sure to write
each SQL command on a single line ending with a semicolon.
</para>
+ <para>
+ It is assumed that pgbench scripts do not contain incomplete blocks of SQL
+ transactions. If at runtime the client reaches the end of the script without
+ completing the last transaction block, he will be aborted.
+ </para>
</note>
<para>
@@ -2207,7 +2285,7 @@ END;
The format of the log is:
<synopsis>
-<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -2228,6 +2306,17 @@ END;
When both <option>--rate</option> and <option>--latency-limit</option> are used,
the <replaceable>time</replaceable> for a skipped transaction will be reported as
<literal>skipped</literal>.
+ <replaceable>retries</replaceable> is the sum of all retries after
+ serialization or deadlock errors during the current script execution. It is
+ present only if the <option>--max-tries</option> option is not equal to 1.
+ If the transaction ends with a failure, its <replaceable>time</replaceable>
+ will be reported as <literal>failed</literal>. If you use the
+ <option>--failures-detailed</option> option, the
+ <replaceable>time</replaceable> of the failed transaction will be reported as
+ <literal>serialization_failure</literal> or
+ <literal>deadlock_failure</literal> depending on the type of failure (see
+ <xref linkend="failures-and-retries" endterm="failures-and-retries-title"/>
+ for more information).
</para>
<para>
@@ -2256,6 +2345,24 @@ END;
were already late before they were even started.
</para>
+ <para>
+ The following example shows a snippet of a log file with failures and
+ retries, with the maximum number of tries set to 10 (note the additional
+ <replaceable>retries</replaceable> column):
+<screen>
+3 0 47423 0 1499414498 34501 3
+3 1 8333 0 1499414498 42848 0
+3 2 8358 0 1499414498 51219 0
+4 0 72345 0 1499414498 59433 6
+1 3 41718 0 1499414498 67879 4
+1 4 8416 0 1499414498 76311 0
+3 3 33235 0 1499414498 84469 3
+0 0 failed 0 1499414498 84905 9
+2 0 failed 0 1499414498 86248 9
+3 4 8307 0 1499414498 92788 0
+</screen>
+ </para>
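A log in this format is easy to post-process with standard text tools. As an illustration only (not part of the patch), the following sketch, assuming the seven-column layout above with the retries column last and a POSIX awk, tallies failed transactions and total retries from the sample snippet:

```shell
# Summarize a pgbench per-transaction log that has a trailing retries
# column. "failed" appears in the time field ($3) for transactions that
# ended in failure; $7 is the retries count.
summary=$(awk '
    $3 == "failed" { failed++ }     # count failed transactions
    { retries += $7 }               # sum the retries column
    END { printf "failed=%d total_retries=%d\n", failed, retries }
' <<'EOF'
3 0 47423 0 1499414498 34501 3
3 1 8333 0 1499414498 42848 0
3 2 8358 0 1499414498 51219 0
4 0 72345 0 1499414498 59433 6
1 3 41718 0 1499414498 67879 4
1 4 8416 0 1499414498 76311 0
3 3 33235 0 1499414498 84469 3
0 0 failed 0 1499414498 84905 9
2 0 failed 0 1499414498 86248 9
3 4 8307 0 1499414498 92788 0
EOF
)
echo "$summary"
```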
+
<para>
When running a long test on hardware that can handle a lot of transactions,
the log files can become very large. The <option>--sampling-rate</option> option
@@ -2271,7 +2378,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable>&zwsp; <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable>&zwsp; <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -2285,7 +2392,16 @@ END;
transaction latencies within the interval,
<replaceable>min_latency</replaceable> is the minimum latency within the interval,
and
- <replaceable>max_latency</replaceable> is the maximum latency within the interval.
+ <replaceable>max_latency</replaceable> is the maximum latency within the interval,
+ <replaceable>failures</replaceable> is the number of transactions that ended
+ with a failed SQL command within the interval. If you use the
+ <option>--failures-detailed</option> option, instead of the sum of all failed
+ transactions you will get more detailed statistics for the failed
+ transactions grouped by the following types:
+ <replaceable>serialization_failures</replaceable> is the number of
+ transactions that got a serialization error and were not retried after this,
+ <replaceable>deadlock_failures</replaceable> is the number of transactions
+ that got a deadlock error and were not retried after this.
The next fields,
<replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
@@ -2293,21 +2409,25 @@ END;
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e., the difference between each transaction's
scheduled start time and the time it actually started.
- The very last field, <replaceable>skipped</replaceable>,
+ The next field, <replaceable>skipped</replaceable>,
is only present if the <option>--latency-limit</option> option is used, too.
It counts the number of transactions skipped because they would have
started too late.
+ The <replaceable>retried</replaceable> and <replaceable>retries</replaceable>
+ fields are present only if the <option>--max-tries</option> option is not
+ equal to 1. They report the number of retried transactions and the sum of all
+ retries after serialization or deadlock errors within the interval.
Each transaction is counted in the interval when it was committed.
</para>
<para>
Here is some example output:
<screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0
+1345828503 7884 1979812 565806736 60 1479 0
+1345828505 7208 1979422 567277552 59 1391 0
+1345828507 7685 1980268 569784714 60 1398 0
+1345828509 7073 1979779 573489941 236 1411 0
</screen></para>
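The aggregated format is likewise amenable to external post-processing. For example, per-interval average latency can be derived as sum_latency / num_transactions; a sketch over the sample above (assuming the layout without --failures-detailed, i.e. a single failures column):

```shell
# Average latency per interval, from the sample aggregate lines above.
# Fields: interval_start num_transactions sum_latency sum_latency_2
#         min_latency max_latency failures
avg=$(awk '{ printf "%s %.1f\n", $1, $3 / $2 }' <<'EOF'
1345828501 5601 1542744 483552416 61 2573 0
1345828503 7884 1979812 565806736 60 1479 0
1345828505 7208 1979422 567277552 59 1391 0
1345828507 7685 1980268 569784714 60 1398 0
1345828509 7073 1979779 573489941 236 1411 0
EOF
)
echo "$avg"
```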
<para>
@@ -2319,13 +2439,44 @@ END;
</refsect2>
<refsect2>
- <title>Per-Statement Latencies</title>
+ <title>Per-Statement Report</title>
<para>
- With the <option>-r</option> option, <application>pgbench</application> collects
- the elapsed transaction time of each statement executed by every
- client. It then reports an average of those values, referred to
- as the latency for each statement, after the benchmark has finished.
+ With the <option>-r</option> option, <application>pgbench</application>
+ collects the following statistics for each statement:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>latency</literal> — elapsed transaction time for each
+ statement. <application>pgbench</application> reports an average value
+ of all successful runs of the statement.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of failures in this statement. See
+ <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of retries after a serialization or a deadlock error in this
+ statement. See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ The report displays retry statistics only if the <option>--max-tries</option>
+ option is not equal to 1.
+ </para>
+
+ <para>
+ All values are computed for each statement executed by every client and are
+ reported after the benchmark has finished.
</para>
<para>
@@ -2339,27 +2490,64 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
latency average = 10.870 ms
latency stddev = 7.341 ms
initial connection time = 30.954 ms
tps = 907.949122 (without initial connection time)
-statement latencies in milliseconds:
- 0.001 \set aid random(1, 100000 * :scale)
- 0.001 \set bid random(1, 1 * :scale)
- 0.001 \set tid random(1, 10 * :scale)
- 0.000 \set delta random(-5000, 5000)
- 0.046 BEGIN;
- 0.151 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
- 0.107 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
- 4.241 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
- 5.245 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
- 0.102 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
- 0.974 END;
+statement latencies in milliseconds and failures:
+ 0.002 0 \set aid random(1, 100000 * :scale)
+ 0.005 0 \set bid random(1, 1 * :scale)
+ 0.002 0 \set tid random(1, 10 * :scale)
+ 0.001 0 \set delta random(-5000, 5000)
+ 0.326 0 BEGIN;
+ 0.603 0 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.454 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 5.528 0 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 7.335 0 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.371 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.212 0 END;
</screen>
+
+ Another example of output for the default script using serializable default
+ transaction isolation level (<command>PGOPTIONS='-c
+ default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: <builtin: TPC-B (sort of)>
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+number of transactions per client: 1000
+number of transactions actually processed: 9676/10000
+number of transactions failed: 324 (3.240%)
+number of serialization failures: 324 (3.240%)
+number of transactions retried: 5629 (56.290%)
+number of total retries: 103299
+maximum number of tries: 100
+number of transactions above the 100.0 ms latency limit: 21/9676 (0.217 %)
+latency average = 16.138 ms
+latency stddev = 21.017 ms
+tps = 413.650224 (including connections establishing)
+tps = 413.686560 (excluding connections establishing)
+statement latencies in milliseconds, failures and retries:
+ 0.002 0 0 \set aid random(1, 100000 * :scale)
+ 0.000 0 0 \set bid random(1, 1 * :scale)
+ 0.000 0 0 \set tid random(1, 10 * :scale)
+ 0.000 0 0 \set delta random(-5000, 5000)
+ 0.121 0 0 BEGIN;
+ 0.290 0 2 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.221 0 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 0.266 212 72127 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 0.222 112 31170 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.178 0 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.210 0 0 END;
+ </screen>
</para>
<para>
- If multiple script files are specified, the averages are reported
+ If multiple script files are specified, all statistics are reported
separately for each script file.
</para>
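The per-statement report is also plain columns, so failure hot spots can be extracted mechanically. A sketch over the serializable-mode sample above (columns assumed: latency, failures, retries, then the statement text), listing the tables whose statements saw at least one failure:

```shell
# Filter the per-statement report lines for nonzero failure counts ($2)
# and print the table name, which is the second word of the statement
# ($5) for the UPDATE statements in the sample.
failing=$(awk '$2 + 0 > 0 { print $5 }' <<'EOF'
0.002 0 0 \set aid random(1, 100000 * :scale)
0.000 0 0 \set bid random(1, 1 * :scale)
0.000 0 0 \set tid random(1, 10 * :scale)
0.000 0 0 \set delta random(-5000, 5000)
0.121 0 0 BEGIN;
0.290 0 2 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.221 0 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.266 212 72127 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.222 112 31170 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.178 0 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
1.210 0 0 END;
EOF
)
echo "$failing"
```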
@@ -2373,6 +2561,135 @@ statement latencies in milliseconds:
</para>
</refsect2>
+ <refsect2 id="failures-and-retries">
+ <title id="failures-and-retries-title">Failures and Serialization/Deadlock Retries</title>
+
+ <para>
+ When executing <application>pgbench</application>, there are three main
+ types of errors:
+ <itemizedlist>
+ <listitem>
+ <para>
+ Errors of the main program. They are the most serious and always result
+ in an immediate exit from the <application>pgbench</application> with
+ the corresponding error message. They include:
+ <itemizedlist>
+ <listitem>
+ <para>
+ errors during the startup of <application>pgbench</application>
+ (e.g. an invalid option value);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ errors in the initialization mode (e.g. a query to create
+ tables for built-in scripts fails);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ errors before starting threads (e.g. failure to connect to the
+ database server, a syntax error in a meta command, or a thread
+ creation failure);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ internal <application>pgbench</application> errors (which are
+ supposed to never occur...).
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Errors that occur while a thread manages its clients (e.g. a client
+ could not start a connection to the database server, or the socket
+ for connecting the client to the database server has become invalid).
+ In such cases all clients of this thread stop while other threads
+ continue to work.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Direct client errors. They lead to an immediate exit from
+ <application>pgbench</application> with the corresponding error message
+ only in the case of an internal <application>pgbench</application>
+ error (which is supposed never to occur...). Otherwise, in the worst
+ case they only lead to the abort of the failed client, while other
+ clients continue their run (but some client errors are handled without
+ a client abort and reported separately, see below). Later in this
+ section it is assumed that the discussed errors are only direct client
+ errors and are not internal <application>pgbench</application> errors.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ A client's run is aborted in case of a serious error; for example, the
+ connection with the database server was lost, or the end of the script was
+ reached without completing the last transaction. In addition, if execution
+ of an SQL or meta command fails for reasons other than serialization or
+ deadlock errors, the client is aborted. Otherwise, if an SQL command fails
+ with a serialization or deadlock error, the current transaction is rolled
+ back, which also includes setting the client variables as they were before
+ the run of this transaction (it is assumed that one transaction script
+ contains only one transaction; see <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"/> for more information).
+ Transactions with serialization or deadlock errors are repeated after
+ rollbacks until they complete successfully or reach the maximum number of
+ tries (specified by the <option>--max-tries</option> option) or the maximum
+ time of tries (specified by the <option>--latency-limit</option> option). If
+ the last transaction run fails, this transaction will be reported as failed.
+ </para>
+
+ <note>
+ <para>
+ Without the <option>--max-tries</option> option, a transaction will
+ never be retried after an error, because its default value is 1. To limit
+ only the maximum time of tries, use an unlimited number of tries
+ (<literal>--max-tries=0</literal>) together with the
+ <option>--latency-limit</option> or <option>--time</option> option.
+ </para>
+ <para>
+ Be careful when repeating scripts that contain multiple transactions: the
+ script is always retried completely, so the successful transactions can be
+ performed several times.
+ </para>
+ <para>
+ Be careful when repeating transactions with shell commands. Unlike the
+ results of SQL commands, the results of shell commands are not rolled back,
+ except for the variable value of the <command>\setshell</command> command.
+ </para>
+ </note>
+
+ <para>
+ The latency of a successful transaction includes the entire time of
+ transaction execution with rollbacks and retries. The latency for failed
+ transactions and commands is not computed separately.
+ </para>
+
+ <para>
+ The main report contains the number of failed transactions if it is non-zero.
+ If the total number of retried transactions is non-zero, the main report also
+ contains the statistics related to retries: the total number of retried
+ transactions and total number of retries. The per-script report inherits all
+ these fields from the main report. The per-statement report displays retry
+ statistics only if the <option>--max-tries</option> option is not equal to 1.
+ </para>
+
+ <para>
+ If you want to group failures by basic types in per-transaction and
+ aggregation logs, as well as in the main and per-script reports, use the
+ <option>--failures-detailed</option> option. If you also want to distinguish
+ all errors and failures (errors without retrying) by type including which
+ limit for retries was violated and how far it was exceeded for the
+ serialization/deadlock failures, use the <option>--debug-errors</option>
+ option.
+ </para>
+ </refsect2>
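Putting the options from this section together, a retry-oriented run might look like the following sketch. This is a command-line fragment only (it needs a running server and an initialized database, here assumed to be `postgres`); the long option names are the ones added by this patch:

```shell
# Run the built-in TPC-B-like script under serializable isolation,
# retrying serialization/deadlock failures up to 10 times per
# transaction, with failures grouped by type and a per-statement report.
PGOPTIONS='-c default_transaction_isolation=serializable' \
pgbench --max-tries=10 --failures-detailed --report-per-command \
        -c 10 -j 2 -T 60 postgres
```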
+
<refsect2>
<title>Good Practices</title>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 8acda86cad..77888146e2 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -74,6 +74,8 @@
#define M_PI 3.14159265358979323846
#endif
+#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"
#define ERRCODE_UNDEFINED_TABLE "42P01"
/*
@@ -273,9 +275,34 @@ bool progress_timestamp = false; /* progress report with Unix time */
int nclients = 1; /* number of clients */
int nthreads = 1; /* number of threads */
bool is_connect; /* establish connection for each transaction */
-bool report_per_command; /* report per-command latencies */
+bool report_per_command = false; /* report per-command latencies, retries
+ * after errors and failures (errors
+ * without retrying) */
int main_pid; /* main process id used in log filename */
+/*
+ * There are different types of restrictions for deciding that the current
+ * transaction with a serialization/deadlock error can no longer be retried and
+ * should be reported as failed:
+ * - max_tries (--max-tries) can be used to limit the number of tries;
+ * - latency_limit (-L) can be used to limit the total time of tries;
+ * - duration (-T) can be used to limit the total benchmark time.
+ *
+ * They can be combined together, and you need to use at least one of them to
+ * retry the transactions with serialization/deadlock errors. If none of them is
+ * used, the default value of max_tries is 1 and such transactions will not be
+ * retried.
+ */
+
+/*
+ * We cannot retry a transaction after the serialization/deadlock error if its
+ * number of tries reaches this maximum; if its value is zero, it is not used.
+ */
+uint32 max_tries = 0;
+
+bool failures_detailed = false; /* whether to group failures in reports
+ * or logs by basic types */
+
const char *pghost = NULL;
const char *pgport = NULL;
const char *username = NULL;
@@ -360,9 +387,65 @@ typedef int64 pg_time_usec_t;
typedef struct StatsData
{
pg_time_usec_t start_time; /* interval start time, for aggregates */
- int64 cnt; /* number of transactions, including skipped */
+
+ /*
+ * Transactions are counted depending on their execution and outcome.
+ * First, a transaction may or may not have started: skipped transactions
+ * occur under --rate and --latency-limit when the client is too late to
+ * execute them. Second, a started transaction may ultimately succeed or
+ * fail, possibly after some retries when --max-tries is not one. Thus
+ *
+ * the number of all transactions =
+ * 'skipped' (it was too late to execute them) +
+ * 'cnt' (the number of successful transactions) +
+ * failed (the number of failed transactions).
+ *
+ * A successful transaction can have several unsuccessful tries before a
+ * successful run. Thus
+ *
+ * 'cnt' (the number of successful transactions) =
+ * successfully retried transactions (they got a serialization or a
+ * deadlock error(s), but were
+ * successfully retried from the very
+ * beginning) +
+ * directly successful transactions (they were successfully completed on
+ * the first try).
+ *
+ * A failed transaction can be one of two types:
+ *
+ * failed (the number of failed transactions) =
+ * 'serialization_failures' (they got a serialization error and were not
+ * successfully retried) +
+ * 'deadlock_failures' (they got a deadlock error and were not successfully
+ * retried).
+ *
+ * If the transaction was retried after a serialization or a deadlock error,
+ * this does not guarantee that this retry was successful. Thus
+ *
+ * 'retries' (number of retries) =
+ * number of retries in all retried transactions =
+ * number of retries in (successfully retried transactions +
+ * failed transactions);
+ *
+ * 'retried' (number of all retried transactions) =
+ * successfully retried transactions +
+ * failed transactions.
+ */
+ int64 cnt; /* number of successful transactions, not
+ * including 'skipped' */
int64 skipped; /* number of transactions skipped under --rate
* and --latency-limit */
+ int64 retries; /* number of retries after a serialization or a
+ * deadlock error in all the transactions */
+ int64 retried; /* number of all transactions that were retried
+ * after a serialization or a deadlock error
+ * (perhaps the last try was unsuccessful) */
+ int64 serialization_failures; /* number of transactions that were not
+ * successfully retried after a
+ * serialization error */
+ int64 deadlock_failures; /* number of transactions that were not
+ * successfully retried after a deadlock
+ * error */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -375,6 +458,30 @@ typedef struct RandomState
unsigned short xseed[3];
} RandomState;
+/*
+ * Data structure for repeating a transaction from the beginning with the same
+ * parameters.
+ */
+typedef struct
+{
+ RandomState random_state; /* random seed */
+ Variables variables; /* client variables */
+} RetryState;
+
+/*
+ * Error status for errors during script execution.
+ */
+typedef enum EStatus
+{
+ ESTATUS_NO_ERROR = 0,
+ ESTATUS_META_COMMAND_ERROR,
+
+ /* SQL errors */
+ ESTATUS_SERIALIZATION_ERROR,
+ ESTATUS_DEADLOCK_ERROR,
+ ESTATUS_OTHER_SQL_ERROR
+} EStatus;
+
/* Various random sequences are initialized from this one. */
static RandomState base_random_sequence;
@@ -446,6 +553,35 @@ typedef enum
CSTATE_END_COMMAND,
CSTATE_SKIP_COMMAND,
+ /*
+ * States for failed commands.
+ *
+ * If the SQL/meta command fails, in CSTATE_ERROR clean up after an error:
+ * - clear the conditional stack;
+ * - if we have an unterminated (possibly failed) transaction block, send
+ * the rollback command to the server and wait for the result in
+ * CSTATE_WAIT_ROLLBACK_RESULT. If something goes wrong with rolling back,
+ * go to CSTATE_ABORTED.
+ *
+ * But if everything is ok we are ready for future transactions: if this is
+ * a serialization or deadlock error and we can re-execute the transaction
+ * from the very beginning, go to CSTATE_RETRY; otherwise go to
+ * CSTATE_FAILURE.
+ *
+ * In CSTATE_RETRY report an error, set the same parameters for the
+ * transaction execution as in the previous tries and process the first
+ * transaction command in CSTATE_START_COMMAND.
+ *
+ * In CSTATE_FAILURE report a failure, set the parameters for the
+ * transaction execution as they were before the first run of this
+ * transaction (except for a random state) and go to CSTATE_END_TX to
+ * complete this transaction.
+ */
+ CSTATE_ERROR,
+ CSTATE_WAIT_ROLLBACK_RESULT,
+ CSTATE_RETRY,
+ CSTATE_FAILURE,
+
/*
* CSTATE_END_TX performs end-of-transaction processing. It calculates
* latency, and logs the transaction. In --connect mode, it closes the
@@ -494,8 +630,21 @@ typedef struct
bool prepared[MAX_SCRIPTS]; /* whether client prepared the script */
+ /*
+ * For processing failures and repeating transactions with serialization or
+ * deadlock errors:
+ */
+ EStatus estatus; /* the error status of the current transaction
+ * execution; this is ESTATUS_NO_ERROR if there were
+ * no errors */
+ RetryState retry_state;
+ uint32 retries; /* how many times have we already retried the
+ * current transaction after a serialization or
+ * a deadlock error? */
+
/* per client collected stats */
- int64 cnt; /* client transaction count, for -t */
+ int64 cnt; /* client transaction count, for -t; skipped and
+ * failed transactions are also counted here */
} CState;
/*
@@ -590,6 +739,9 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
* aset do gset on all possible queries of a combined query (\;).
* expr Parsed expression, if needed.
* stats Time spent in this command.
+ * retries Number of retries after a serialization or deadlock error in the
+ * current command.
+ * failures Number of errors in the current command that were not retried.
*/
typedef struct Command
{
@@ -602,6 +754,8 @@ typedef struct Command
char *varprefix;
PgBenchExpr *expr;
SimpleStats stats;
+ int64 retries;
+ int64 failures;
} Command;
typedef struct ParsedScript
@@ -616,6 +770,8 @@ static ParsedScript sql_script[MAX_SCRIPTS]; /* SQL script files */
static int num_scripts; /* number of scripts in sql_script[] */
static int64 total_weight = 0;
+static bool debug_errors = false; /* print debug messages of all errors */
+
/* Builtin test scripts */
typedef struct BuiltinScript
{
@@ -753,15 +909,18 @@ usage(void)
" protocol for submitting queries (default: simple)\n"
" -n, --no-vacuum do not run VACUUM before tests\n"
" -P, --progress=NUM show thread progress report every NUM seconds\n"
- " -r, --report-latencies report average latency per command\n"
+ " -r, --report-per-command report latencies, failures and retries per command\n"
" -R, --rate=NUM target rate in transactions per second\n"
" -s, --scale=NUM report this scale factor in output\n"
" -t, --transactions=NUM number of transactions each client runs (default: 10)\n"
" -T, --time=NUM duration of benchmark test in seconds\n"
" -v, --vacuum-all vacuum all four standard tables before tests\n"
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
+ " --failures-detailed report the failures grouped by basic types\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
+ " --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --debug-errors print messages of all errors\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1307,6 +1466,10 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->start_time = start;
sd->cnt = 0;
sd->skipped = 0;
+ sd->retries = 0;
+ sd->retried = 0;
+ sd->serialization_failures = 0;
+ sd->deadlock_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1315,22 +1478,49 @@ initStats(StatsData *sd, pg_time_usec_t start)
* Accumulate one additional item into the given stats object.
*/
static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumStats(StatsData *stats, bool skipped, double lat, double lag,
+ EStatus estatus, int64 retries)
{
- stats->cnt++;
-
+ /* Record the skipped transaction */
if (skipped)
{
/* no latency to record on skipped transactions */
stats->skipped++;
+ return;
}
- else
+
+ /*
+ * Record the number of retries regardless of whether the transaction was
+ * successful or failed.
+ */
+ stats->retries += retries;
+ if (retries > 0)
+ stats->retried++;
+
+ switch (estatus)
{
- addToSimpleStats(&stats->latency, lat);
+ /* Record the successful transaction */
+ case ESTATUS_NO_ERROR:
+ stats->cnt++;
- /* and possibly the same for schedule lag */
- if (throttle_delay)
- addToSimpleStats(&stats->lag, lag);
+ addToSimpleStats(&stats->latency, lat);
+
+ /* and possibly the same for schedule lag */
+ if (throttle_delay)
+ addToSimpleStats(&stats->lag, lag);
+ break;
+
+ /* Record the failed transaction */
+ case ESTATUS_SERIALIZATION_ERROR:
+ stats->serialization_failures++;
+ break;
+ case ESTATUS_DEADLOCK_ERROR:
+ stats->deadlock_failures++;
+ break;
+ default:
+ /* internal error which should never occur */
+ pg_log_fatal("unexpected error status: %d", estatus);
+ exit(1);
}
}
@@ -2865,6 +3055,9 @@ preparedStatementName(char *buffer, int file, int state)
sprintf(buffer, "P%d_%d", file, state);
}
+/*
+ * Report the abort of the client when processing SQL commands.
+ */
static void
commandFailed(CState *st, const char *cmd, const char *message)
{
@@ -2872,6 +3065,19 @@ commandFailed(CState *st, const char *cmd, const char *message)
st->id, st->command, cmd, st->use_file, message);
}
+/*
+ * Report the error in the command while the script is executing.
+ */
+static void
+commandError(CState *st, const char *message)
+{
+ const Command *command = sql_script[st->use_file].commands[st->command];
+
+ Assert(command->type == SQL_COMMAND);
+ pg_log_error("client %d got an error in command %d (SQL) of script %d; %s",
+ st->id, st->command, st->use_file, message);
+}
+
/* return a script number with a weighted choice. */
static int
chooseScript(TState *thread)
@@ -2979,6 +3185,33 @@ sendCommand(CState *st, Command *command)
return true;
}
+/*
+ * Get the error status from the error code.
+ */
+static EStatus
+getSQLErrorStatus(const char *sqlState)
+{
+ if (sqlState != NULL)
+ {
+ if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
+ return ESTATUS_SERIALIZATION_ERROR;
+ else if (strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
+ return ESTATUS_DEADLOCK_ERROR;
+ }
+
+ return ESTATUS_OTHER_SQL_ERROR;
+}
+
+/*
+ * Returns true if this type of error can be retried.
+ */
+static bool
+canRetryError(EStatus estatus)
+{
+ return (estatus == ESTATUS_SERIALIZATION_ERROR ||
+ estatus == ESTATUS_DEADLOCK_ERROR);
+}
+
/*
* Process query response from the backend.
*
@@ -3021,6 +3254,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
{
pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
st->id, st->use_file, st->command, qrynum, 0);
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
break;
@@ -3035,6 +3269,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
/* under \gset, report the error */
pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
st->id, st->use_file, st->command, qrynum, PQntuples(res));
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
else if (meta == META_ASET && ntuples <= 0)
@@ -3059,6 +3294,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
/* internal error */
pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
st->id, st->use_file, st->command, qrynum, varname);
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3076,6 +3312,20 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
PQerrorMessage(st->con));
break;
+ case PGRES_NONFATAL_ERROR:
+ case PGRES_FATAL_ERROR:
+ st->estatus = getSQLErrorStatus(
+ PQresultErrorField(res, PG_DIAG_SQLSTATE));
+ if (canRetryError(st->estatus))
+ {
+ if (debug_errors)
+ commandError(st, PQerrorMessage(st->con));
+ if (PQpipelineStatus(st->con) == PQ_PIPELINE_ABORTED)
+ PQpipelineSync(st->con);
+ goto error;
+ }
+ /* fall through */
+
default:
/* anything else is unexpected */
pg_log_error("client %d script %d aborted in command %d query %d: %s",
@@ -3154,6 +3404,160 @@ evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
return true;
}
+/*
+ * Clear the variables in the array. The array itself is not freed.
+ */
+static void
+clearVariables(Variables *variables)
+{
+ Variable *vars,
+ *var;
+ int nvars;
+
+ if (!variables)
+ return; /* nothing to do here */
+
+ vars = variables->vars;
+ nvars = variables->nvars;
+ for (var = vars; var - vars < nvars; ++var)
+ {
+ pg_free(var->name);
+ pg_free(var->svalue);
+ }
+
+ variables->nvars = 0;
+}
+
+/*
+ * Make a deep copy of variables array.
+ * Before copying the function frees the string fields of the destination
+ * variables and if necessary enlarges their array.
+ */
+static void
+copyVariables(Variables *dest, const Variables *source)
+{
+ Variable *dest_var;
+ const Variable *source_var;
+
+ if (!dest || !source || dest == source)
+ return; /* nothing to do here */
+
+ /*
+ * Clear the original variables and make sure that we have enough space for
+ * the new variables.
+ */
+ clearVariables(dest);
+ enlargeVariables(dest, source->nvars);
+
+ /* Make a deep copy of variables array */
+ for (source_var = source->vars, dest_var = dest->vars;
+ source_var - source->vars < source->nvars;
+ ++source_var, ++dest_var)
+ {
+ dest_var->name = pg_strdup(source_var->name);
+ dest_var->svalue = (source_var->svalue == NULL) ?
+ NULL : pg_strdup(source_var->svalue);
+ dest_var->value = source_var->value;
+ }
+ dest->nvars = source->nvars;
+ dest->vars_sorted = source->vars_sorted;
+}
+
+/*
+ * Returns true if the transaction can be retried after the error.
+ */
+static bool
+doRetry(CState *st, pg_time_usec_t *now)
+{
+ Assert(st->estatus != ESTATUS_NO_ERROR);
+
+ /* We can only retry serialization or deadlock errors. */
+ if (!canRetryError(st->estatus))
+ return false;
+
+ /*
+ * We must have at least one option to limit the retrying of transactions
+ * that got an error.
+ */
+ Assert(max_tries || latency_limit || duration > 0);
+
+ /*
+ * We cannot retry the error if we have reached the maximum number of tries
+ * or the benchmark duration has elapsed.
+ */
+ if ((max_tries && st->retries + 1 >= max_tries) || timer_exceeded)
+ return false;
+
+ /*
+ * We cannot retry the error if we spent too much time on this transaction.
+ */
+ if (latency_limit)
+ {
+ pg_time_now_lazy(now);
+ if (*now - st->txn_scheduled > latency_limit)
+ return false;
+ }
+
+ /* OK */
+ return true;
+}
+
+/*
+ * Set in_tx_block to true if we are in a (failed) transaction block and false
+ * otherwise.
+ * Returns false on failure (broken connection or internal error).
+ */
+static bool
+checkTransactionStatus(PGconn *con, bool *in_tx_block)
+{
+ PGTransactionStatusType tx_status;
+
+ tx_status = PQtransactionStatus(con);
+ switch (tx_status)
+ {
+ case PQTRANS_IDLE:
+ *in_tx_block = false;
+ break;
+ case PQTRANS_INTRANS:
+ case PQTRANS_INERROR:
+ *in_tx_block = true;
+ break;
+ case PQTRANS_UNKNOWN:
+ /* PQTRANS_UNKNOWN is expected given a broken connection */
+ if (PQstatus(con) == CONNECTION_BAD)
+ { /* there's something wrong */
+ pg_log_error("perhaps the backend died while processing");
+ return false;
+ }
+ /* fall through */
+ case PQTRANS_ACTIVE:
+ default:
+ /*
+ * We cannot find out whether we are in a transaction block or not.
+ * Internal error which should never occur.
+ */
+ pg_log_error("unexpected transaction status %d", tx_status);
+ return false;
+ }
+
+ /* OK */
+ return true;
+}
+
+/*
+ * If the latency limit is used, return the current transaction latency as a
+ * percentage of the latency limit. Otherwise return zero.
+ */
+static double
+getLatencyUsed(CState *st, pg_time_usec_t *now)
+{
+ if (!latency_limit)
+ return 0.0;
+
+ pg_time_now_lazy(now);
+ return (100.0 * (*now - st->txn_scheduled) / latency_limit);
+}
+
/*
* Advance the state machine of a connection.
*/
@@ -3183,6 +3587,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
for (;;)
{
Command *command;
+ PGresult *res;
+ bool in_tx_block;
switch (st->state)
{
@@ -3191,6 +3597,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
st->use_file = chooseScript(thread);
Assert(conditional_stack_empty(st->cstack));
+ /* reset transaction variables to default values */
+ st->estatus = ESTATUS_NO_ERROR;
+ st->retries = 0;
+
pg_log_debug("client %d executing script \"%s\"",
st->id, sql_script[st->use_file].desc);
@@ -3227,6 +3637,14 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
memset(st->prepared, 0, sizeof(st->prepared));
}
+ /*
+ * This is the first try of this transaction. Remember its
+ * parameters in case it gets an error and needs to be run again.
+ */
+ st->retry_state.random_state = st->cs_func_rs;
+ copyVariables(&st->retry_state.variables, &st->variables);
+
/* record transaction start time */
st->txn_begin = now;
@@ -3378,6 +3796,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
* - else CSTATE_END_COMMAND
*/
st->state = executeMetaCommand(st, &now);
+ if (st->state == CSTATE_ABORTED)
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
}
/*
@@ -3516,10 +3936,55 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
+ else if (canRetryError(st->estatus))
+ st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
break;
+ /*
+ * Wait for the rollback command to complete
+ */
+ case CSTATE_WAIT_ROLLBACK_RESULT:
+ pg_log_debug("client %d receiving", st->id);
+ if (!PQconsumeInput(st->con))
+ {
+ pg_log_error("client %d aborted while rolling back the transaction after an error; perhaps the backend died while processing",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ if (PQisBusy(st->con))
+ return; /* don't have the whole result yet */
+
+ /*
+ * Read and discard the query result.
+ */
+ res = PQgetResult(st->con);
+ switch (PQresultStatus(res))
+ {
+ case PGRES_COMMAND_OK:
+ /* OK */
+ PQclear(res);
+ do
+ {
+ res = PQgetResult(st->con);
+ if (res)
+ PQclear(res);
+ } while (res);
+ /* Check if we can retry the error. */
+ st->state =
+ doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
+ break;
+ default:
+ pg_log_error("client %d aborted while rolling back the transaction after an error; %s",
+ st->id, PQerrorMessage(st->con));
+ PQclear(res);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ break;
+
/*
* Wait until sleep is done. This state is entered after a
* \sleep metacommand. The behavior is similar to
@@ -3562,6 +4027,132 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
CSTATE_START_COMMAND : CSTATE_SKIP_COMMAND;
break;
+ /*
+ * Clean up after an error.
+ */
+ case CSTATE_ERROR:
+
+ Assert(st->estatus != ESTATUS_NO_ERROR);
+
+ /* Clear the conditional stack */
+ conditional_stack_reset(st->cstack);
+
+ /*
+ * Check if we have a (failed) transaction block or not, and
+ * roll it back if any.
+ */
+
+ if (!checkTransactionStatus(st->con, &in_tx_block))
+ {
+ /*
+ * There's something wrong...
+ * It is assumed that the function checkTransactionStatus
+ * has already printed a more detailed error message.
+ */
+ pg_log_error("client %d aborted while receiving the transaction status", st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+
+ if (in_tx_block)
+ {
+ /* Try to rollback a (failed) transaction block. */
+ if (!PQsendQuery(st->con, "ROLLBACK"))
+ {
+ pg_log_error("client %d aborted: failed to send sql command for rolling back the failed transaction",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ }
+ else
+ st->state = CSTATE_WAIT_ROLLBACK_RESULT;
+ }
+ else
+ {
+ /* Check if we can retry the error. */
+ st->state = doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
+ }
+ break;
+
+ /*
+ * Retry the transaction after an error.
+ */
+ case CSTATE_RETRY:
+ command = sql_script[st->use_file].commands[st->command];
+
+ /* Accumulate the retry. */
+ st->retries++;
+ if (report_per_command)
+ command->retries++;
+
+ /*
+ * Inform that the transaction will be retried after the error.
+ */
+ if (debug_errors)
+ {
+ fprintf(stderr,
+ "client %d repeats the transaction after the error (try %d",
+ st->id, st->retries);
+ if (max_tries)
+ fprintf(stderr, "/%d", max_tries);
+ if (latency_limit)
+ fprintf(stderr,
+ ", %.3f%% of the maximum time of tries was used",
+ getLatencyUsed(st, &now));
+ fprintf(stderr, ")\n");
+ }
+
+ /*
+ * Reset the execution parameters to what they were at the
+ * beginning of the transaction.
+ */
+ st->cs_func_rs = st->retry_state.random_state;
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* Process the first transaction command. */
+ st->command = 0;
+ st->estatus = ESTATUS_NO_ERROR;
+ st->state = CSTATE_START_COMMAND;
+ break;
+
+ /*
+ * Complete the failed transaction.
+ */
+ case CSTATE_FAILURE:
+ command = sql_script[st->use_file].commands[st->command];
+
+ /* Accumulate the failure. */
+ if (report_per_command)
+ command->failures++;
+
+ /*
+ * Inform that the failed transaction will not be retried.
+ */
+ if (debug_errors)
+ {
+ fprintf(stderr,
+ "client %d ends the failed transaction (try %d",
+ st->id, st->retries + 1);
+ if (max_tries)
+ fprintf(stderr, "/%d", max_tries);
+ if (latency_limit)
+ fprintf(stderr,
+ ", %.3f%% of the maximum time of tries was used",
+ getLatencyUsed(st, &now));
+ else if (timer_exceeded)
+ fprintf(stderr, ", the duration is exceeded");
+ fprintf(stderr, ")\n");
+ }
+
+ /*
+ * Reset the execution parameters to what they were at the
+ * beginning of the transaction, except for the random state.
+ */
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* End the failed transaction. */
+ st->state = CSTATE_END_TX;
+ break;
+
/*
* End of transaction (end of script, really).
*/
@@ -3576,6 +4167,29 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
*/
Assert(conditional_stack_empty(st->cstack));
+ /*
+ * We must complete all the transaction blocks that were
+ * started in this script.
+ */
+ if (!checkTransactionStatus(st->con, &in_tx_block))
+ {
+ /*
+ * There's something wrong...
+ * It is assumed that the function checkTransactionStatus
+ * has already printed a more detailed error message.
+ */
+ pg_log_error("client %d aborted while receiving the transaction status", st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ if (in_tx_block)
+ {
+ pg_log_error("client %d aborted: end of script reached without completing the last transaction",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+
if (is_connect)
{
finishCon(st);
@@ -3807,6 +4421,43 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
return CSTATE_END_COMMAND;
}
+/*
+ * Return the number of failed transactions.
+ */
+static int64
+getFailures(const StatsData *stats)
+{
+ return (stats->serialization_failures +
+ stats->deadlock_failures);
+}
+
+/*
+ * Return a string constant representing the result of a transaction
+ * that is not successfully processed.
+ */
+static const char *
+getResultString(bool skipped, EStatus estatus)
+{
+ if (skipped)
+ return "skipped";
+ else if (failures_detailed)
+ {
+ switch (estatus)
+ {
+ case ESTATUS_SERIALIZATION_ERROR:
+ return "serialization_failure";
+ case ESTATUS_DEADLOCK_ERROR:
+ return "deadlock_failure";
+ default:
+ /* internal error which should never occur */
+ pg_log_fatal("unexpected error status: %d", estatus);
+ exit(1);
+ }
+ }
+ else
+ return "failed";
+}
+
/*
* Print log entry after completing one transaction.
*
@@ -3851,6 +4502,14 @@ doLog(TState *thread, CState *st,
agg->latency.sum2,
agg->latency.min,
agg->latency.max);
+
+ if (failures_detailed)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->serialization_failures,
+ agg->deadlock_failures);
+ else
+ fprintf(logfile, " " INT64_FORMAT, getFailures(agg));
+
if (throttle_delay)
{
fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -3861,6 +4520,10 @@ doLog(TState *thread, CState *st,
if (latency_limit)
fprintf(logfile, " " INT64_FORMAT, agg->skipped);
}
+ if (max_tries != 1)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->retried,
+ agg->retries);
fputc('\n', logfile);
/* reset data and move to next interval */
@@ -3868,22 +4531,26 @@ doLog(TState *thread, CState *st,
}
/* accumulate the current transaction */
- accumStats(agg, skipped, latency, lag);
+ accumStats(agg, skipped, latency, lag, st->estatus, st->retries);
}
else
{
/* no, print raw transactions */
- if (skipped)
- fprintf(logfile, "%d " INT64_FORMAT " skipped %d " INT64_FORMAT " "
- INT64_FORMAT,
- st->id, st->cnt, st->use_file, now / 1000000, now % 1000000);
- else
+ if (!skipped && st->estatus == ESTATUS_NO_ERROR)
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d " INT64_FORMAT " "
INT64_FORMAT,
st->id, st->cnt, latency, st->use_file,
now / 1000000, now % 1000000);
+ else
+ fprintf(logfile, "%d " INT64_FORMAT " %s %d " INT64_FORMAT " "
+ INT64_FORMAT,
+ st->id, st->cnt, getResultString(skipped, st->estatus),
+ st->use_file, now / 1000000, now % 1000000);
+
if (throttle_delay)
fprintf(logfile, " %.0f", lag);
+ if (max_tries != 1)
+ fprintf(logfile, " %d", st->retries);
fputc('\n', logfile);
}
}
@@ -3892,7 +4559,8 @@ doLog(TState *thread, CState *st,
* Accumulate and report statistics at end of a transaction.
*
* (This is also called when a transaction is late and thus skipped.
- * Note that even skipped transactions are counted in the "cnt" fields.)
+ * Note that even skipped and failed transactions are counted in the CState
+ * "cnt" field.)
*/
static void
processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
@@ -3900,10 +4568,10 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
{
double latency = 0.0,
lag = 0.0;
- bool thread_details = progress || throttle_delay || latency_limit,
- detailed = thread_details || use_log || per_script_stats;
+ bool detailed = progress || throttle_delay || latency_limit ||
+ use_log || per_script_stats;
- if (detailed && !skipped)
+ if (detailed && !skipped && st->estatus == ESTATUS_NO_ERROR)
{
pg_time_now_lazy(now);
@@ -3912,20 +4580,12 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
lag = st->txn_begin - st->txn_scheduled;
}
- if (thread_details)
- {
- /* keep detailed thread stats */
- accumStats(&thread->stats, skipped, latency, lag);
+ /* keep detailed thread stats */
+ accumStats(&thread->stats, skipped, latency, lag, st->estatus, st->retries);
- /* count transactions over the latency limit, if needed */
- if (latency_limit && latency > latency_limit)
- thread->latency_late++;
- }
- else
- {
- /* no detailed stats, just count */
- thread->stats.cnt++;
- }
+ /* count transactions over the latency limit, if needed */
+ if (latency_limit && latency > latency_limit)
+ thread->latency_late++;
/* client stat is just counting */
st->cnt++;
@@ -3935,7 +4595,8 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
/* XXX could use a mutex here, but we choose not to */
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+ st->estatus, st->retries);
}
@@ -4782,6 +5443,8 @@ create_sql_command(PQExpBuffer buf, const char *source)
my_command->type = SQL_COMMAND;
my_command->meta = META_NONE;
my_command->argc = 0;
+ my_command->retries = 0;
+ my_command->failures = 0;
memset(my_command->argv, 0, sizeof(my_command->argv));
my_command->varprefix = NULL; /* allocated later, if needed */
my_command->expr = NULL;
@@ -5450,7 +6113,9 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
{
/* generate and show report */
pg_time_usec_t run = now - *last_report;
- int64 ntx;
+ int64 cnt,
+ failures,
+ retried;
double tps,
total_run,
latency,
@@ -5477,23 +6142,30 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
mergeSimpleStats(&cur.lag, &threads[i].stats.lag);
cur.cnt += threads[i].stats.cnt;
cur.skipped += threads[i].stats.skipped;
+ cur.retries += threads[i].stats.retries;
+ cur.retried += threads[i].stats.retried;
+ cur.serialization_failures +=
+ threads[i].stats.serialization_failures;
+ cur.deadlock_failures += threads[i].stats.deadlock_failures;
}
/* we count only actually executed transactions */
- ntx = (cur.cnt - cur.skipped) - (last->cnt - last->skipped);
+ cnt = cur.cnt - last->cnt;
total_run = (now - test_start) / 1000000.0;
- tps = 1000000.0 * ntx / run;
- if (ntx > 0)
+ tps = 1000000.0 * cnt / run;
+ if (cnt > 0)
{
- latency = 0.001 * (cur.latency.sum - last->latency.sum) / ntx;
- sqlat = 1.0 * (cur.latency.sum2 - last->latency.sum2) / ntx;
+ latency = 0.001 * (cur.latency.sum - last->latency.sum) / cnt;
+ sqlat = 1.0 * (cur.latency.sum2 - last->latency.sum2) / cnt;
stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
- lag = 0.001 * (cur.lag.sum - last->lag.sum) / ntx;
+ lag = 0.001 * (cur.lag.sum - last->lag.sum) / cnt;
}
else
{
latency = sqlat = stdev = lag = 0;
}
+ failures = getFailures(&cur) - getFailures(last);
+ retried = cur.retried - last->retried;
if (progress_timestamp)
{
@@ -5506,8 +6178,8 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
}
fprintf(stderr,
- "progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
- tbuf, tps, latency, stdev);
+ "progress: %s, %.1f tps, lat %.3f ms stddev %.3f, " INT64_FORMAT " failed",
+ tbuf, tps, latency, stdev, failures);
if (throttle_delay)
{
@@ -5516,6 +6188,12 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
fprintf(stderr, ", " INT64_FORMAT " skipped",
cur.skipped - last->skipped);
}
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (max_tries != 1)
+ fprintf(stderr,
+ ", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
+ retried, cur.retries - last->retries);
fprintf(stderr, "\n");
*last = cur;
@@ -5575,9 +6253,10 @@ printResults(StatsData *total,
int64 latency_late)
{
/* tps is about actually executed transactions during benchmarking */
- int64 ntx = total->cnt - total->skipped;
+ int64 failures = getFailures(total);
+ int64 total_cnt = total->cnt + total->skipped + failures;
double bench_duration = PG_TIME_GET_DOUBLE(total_duration);
- double tps = ntx / bench_duration;
+ double tps = total->cnt / bench_duration;
/* Report test parameters. */
printf("transaction type: %s\n",
@@ -5594,35 +6273,65 @@ printResults(StatsData *total,
{
printf("number of transactions per client: %d\n", nxacts);
printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
- ntx, nxacts * nclients);
+ total->cnt, nxacts * nclients);
}
else
{
printf("duration: %d s\n", duration);
printf("number of transactions actually processed: " INT64_FORMAT "\n",
- ntx);
+ total->cnt);
+ }
+
+ if (failures > 0)
+ {
+ printf("number of transactions failed: " INT64_FORMAT " (%.3f%%)\n",
+ failures, 100.0 * failures / total_cnt);
+
+ if (failures_detailed)
+ {
+ if (total->serialization_failures)
+ printf("number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->serialization_failures,
+ 100.0 * total->serialization_failures / total_cnt);
+ if (total->deadlock_failures)
+ printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->deadlock_failures,
+ 100.0 * total->deadlock_failures / total_cnt);
+ }
}
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (total->retried > 0)
+ {
+ printf("number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
+ total->retried, 100.0 * total->retried / total_cnt);
+ printf("number of total retries: " INT64_FORMAT "\n", total->retries);
+ }
+
+ if (max_tries)
+ printf("maximum number of tries: %d\n", max_tries);
+
/* Remaining stats are nonsensical if we failed to execute any xacts */
- if (total->cnt <= 0)
+ if (total->cnt + total->skipped <= 0)
return;
if (throttle_delay && latency_limit)
printf("number of transactions skipped: " INT64_FORMAT " (%.3f %%)\n",
- total->skipped, 100.0 * total->skipped / total->cnt);
+ total->skipped, 100.0 * total->skipped / total_cnt);
if (latency_limit)
printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "/" INT64_FORMAT " (%.3f %%)\n",
- latency_limit / 1000.0, latency_late, ntx,
- (ntx > 0) ? 100.0 * latency_late / ntx : 0.0);
+ latency_limit / 1000.0, latency_late, total->cnt,
+ (total->cnt > 0) ? 100.0 * latency_late / total->cnt : 0.0);
if (throttle_delay || progress || latency_limit)
printSimpleStats("latency", &total->latency);
else
{
/* no measurement, show average latency computed from run time */
- printf("latency average = %.3f ms\n",
- 0.001 * total_duration * nclients / total->cnt);
+ printf("latency average = %.3f ms%s\n",
+ 0.001 * total_duration * nclients / total_cnt,
+ failures > 0 ? " (including failures)" : "");
}
if (throttle_delay)
@@ -5648,7 +6357,7 @@ printResults(StatsData *total,
*/
if (is_connect)
{
- printf("average connection time = %.3f ms\n", 0.001 * conn_total_duration / total->cnt);
+ printf("average connection time = %.3f ms\n", 0.001 * conn_total_duration / (total->cnt + failures));
printf("tps = %f (including reconnection times)\n", tps);
}
else
@@ -5667,6 +6376,9 @@ printResults(StatsData *total,
if (per_script_stats)
{
StatsData *sstats = &sql_script[i].stats;
+ int64 script_failures = getFailures(sstats);
+ int64 script_total_cnt =
+ sstats->cnt + sstats->skipped + script_failures;
printf("SQL script %d: %s\n"
" - weight: %d (targets %.1f%% of total)\n"
@@ -5676,25 +6388,60 @@ printResults(StatsData *total,
100.0 * sql_script[i].weight / total_weight,
sstats->cnt,
100.0 * sstats->cnt / total->cnt,
- (sstats->cnt - sstats->skipped) / bench_duration);
+ sstats->cnt / bench_duration);
+
+ if (failures > 0)
+ {
+ printf(" - number of transactions failed: " INT64_FORMAT " (%.3f%%)\n",
+ script_failures,
+ 100.0 * script_failures / script_total_cnt);
- if (throttle_delay && latency_limit && sstats->cnt > 0)
+ if (failures_detailed)
+ {
+ if (total->serialization_failures)
+ printf(" - number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->serialization_failures,
+ (100.0 * sstats->serialization_failures /
+ script_total_cnt));
+ if (total->deadlock_failures)
+ printf(" - number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->deadlock_failures,
+ (100.0 * sstats->deadlock_failures /
+ script_total_cnt));
+ }
+ }
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (total->retried > 0)
+ {
+ printf(" - number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->retried,
+ 100.0 * sstats->retried / script_total_cnt);
+ printf(" - number of total retries: " INT64_FORMAT "\n",
+ sstats->retries);
+ }
+
+ if (throttle_delay && latency_limit && script_total_cnt > 0)
printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
sstats->skipped,
- 100.0 * sstats->skipped / sstats->cnt);
+ 100.0 * sstats->skipped / script_total_cnt);
printSimpleStats(" - latency", &sstats->latency);
}
- /* Report per-command latencies */
+ /*
+ * Report per-command statistics: latencies, retries after errors,
+ * failures (errors without retrying).
+ */
if (report_per_command)
{
Command **commands;
- if (per_script_stats)
- printf(" - statement latencies in milliseconds:\n");
- else
- printf("statement latencies in milliseconds:\n");
+ printf("%sstatement latencies in milliseconds%s:\n",
+ per_script_stats ? " - " : "",
+ (max_tries == 1 ?
+ " and failures" :
+ ", failures and retries"));
for (commands = sql_script[i].commands;
*commands != NULL;
@@ -5702,10 +6449,19 @@ printResults(StatsData *total,
{
SimpleStats *cstats = &(*commands)->stats;
- printf(" %11.3f %s\n",
- (cstats->count > 0) ?
- 1000.0 * cstats->sum / cstats->count : 0.0,
- (*commands)->first_line);
+ if (max_tries == 1)
+ printf(" %11.3f %10" INT64_MODIFIER "d %s\n",
+ (cstats->count > 0) ?
+ 1000.0 * cstats->sum / cstats->count : 0.0,
+ (*commands)->failures,
+ (*commands)->first_line);
+ else
+ printf(" %11.3f %10" INT64_MODIFIER "d %10" INT64_MODIFIER "d %s\n",
+ (cstats->count > 0) ?
+ 1000.0 * cstats->sum / cstats->count : 0.0,
+ (*commands)->failures,
+ (*commands)->retries,
+ (*commands)->first_line);
}
}
}
@@ -5786,7 +6542,7 @@ main(int argc, char **argv)
{"progress", required_argument, NULL, 'P'},
{"protocol", required_argument, NULL, 'M'},
{"quiet", no_argument, NULL, 'q'},
- {"report-latencies", no_argument, NULL, 'r'},
+ {"report-per-command", no_argument, NULL, 'r'},
{"rate", required_argument, NULL, 'R'},
{"scale", required_argument, NULL, 's'},
{"select-only", no_argument, NULL, 'S'},
@@ -5808,6 +6564,9 @@ main(int argc, char **argv)
{"show-script", required_argument, NULL, 10},
{"partitions", required_argument, NULL, 11},
{"partition-method", required_argument, NULL, 12},
+ {"failures-detailed", no_argument, NULL, 13},
+ {"max-tries", required_argument, NULL, 14},
+ {"debug-errors", no_argument, NULL, 15},
{NULL, 0, NULL, 0}
};
@@ -5845,6 +6604,7 @@ main(int argc, char **argv)
PGconn *con;
char *env;
+ bool retry = false; /* retry transactions with errors or not */
int exit_code = 0;
@@ -6176,6 +6936,36 @@ main(int argc, char **argv)
exit(1);
}
break;
+ case 13: /* failures-detailed */
+ benchmarking_option_set = true;
+ failures_detailed = true;
+ break;
+ case 14: /* max-tries */
+ {
+ int32 max_tries_arg = atoi(optarg);
+
+ if (max_tries_arg < 0)
+ {
+ pg_log_fatal("invalid number of maximum tries: \"%s\"", optarg);
+ exit(1);
+ }
+
+ benchmarking_option_set = true;
+
+ /*
+ * Always retry transactions that got an error if this option
+ * is used. If its value is 0, the number of tries is
+ * unlimited and must be bounded by --latency-limit or the
+ * test duration (-T).
+ */
+ retry = true;
+
+ max_tries = (uint32) max_tries_arg;
+ }
+ break;
+ case 15: /* debug-errors */
+ benchmarking_option_set = true;
+ debug_errors = true;
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
@@ -6357,6 +7147,20 @@ main(int argc, char **argv)
exit(1);
}
+ if (!max_tries)
+ {
+ if (retry && !(latency_limit || duration > 0))
+ {
+ pg_log_fatal("an unlimited number of transaction tries can only be used with --latency-limit or a duration (-T)");
+ exit(1);
+ }
+ else if (!retry)
+ {
+ /* By default transactions with errors are not retried */
+ max_tries = 1;
+ }
+ }
+
/*
* save main process id in the global variable because process id will be
* changed after fork.
@@ -6565,6 +7369,10 @@ main(int argc, char **argv)
mergeSimpleStats(&stats.lag, &thread->stats.lag);
stats.cnt += thread->stats.cnt;
stats.skipped += thread->stats.skipped;
+ stats.retries += thread->stats.retries;
+ stats.retried += thread->stats.retried;
+ stats.serialization_failures += thread->stats.serialization_failures;
+ stats.deadlock_failures += thread->stats.deadlock_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
@@ -6713,7 +7521,8 @@ threadRun(void *arg)
if (min_usec > this_usec)
min_usec = this_usec;
}
- else if (st->state == CSTATE_WAIT_RESULT)
+ else if (st->state == CSTATE_WAIT_RESULT ||
+ st->state == CSTATE_WAIT_ROLLBACK_RESULT)
{
/*
* waiting for result from server - nothing to do unless the
@@ -6802,7 +7611,8 @@ threadRun(void *arg)
{
CState *st = &state[i];
- if (st->state == CSTATE_WAIT_RESULT)
+ if (st->state == CSTATE_WAIT_RESULT ||
+ st->state == CSTATE_WAIT_ROLLBACK_RESULT)
{
/* don't call advanceConnectionState unless data is available */
int sock = PQsocket(st->con);
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 923203ea51..4488cf926c 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -11,7 +11,11 @@ use Config;
# start a pgbench specific server
my $node = get_new_node('main');
-$node->init;
+
+# Set to untranslated messages, to be able to compare program output with
+# expected strings.
+$node->init(extra => [ '--locale', 'C' ]);
+
$node->start;
# invoke pgbench, with parameters:
@@ -159,7 +163,8 @@ pgbench(
qr{builtin: TPC-B},
qr{clients: 2\b},
qr{processed: 10/10},
- qr{mode: simple}
+ qr{mode: simple},
+ qr{maximum number of tries: 1}
],
[qr{^$}],
'pgbench tpcb-like');
@@ -1225,6 +1230,214 @@ pgbench(
check_pgbench_logs($bdir, '001_pgbench_log_3', 1, 10, 10,
qr{^\d \d{1,2} \d+ \d \d+ \d+$});
+# the client aborts if the script contains an incomplete transaction block
+pgbench(
+ '--no-vacuum', 2, [ qr{processed: 1/10} ],
+ [ qr{client 0 aborted: end of script reached without completing the last transaction} ],
+ 'incomplete transaction block',
+ { '001_pgbench_incomplete_transaction_block' => q{BEGIN;SELECT 1;} });
+
+# Test the concurrent update in the table row and deadlocks.
+
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE first_client_table (value integer); '
+ . 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2);');
+
+# Serialization error and retry
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read";
+
+# Check that we have a serialization error and the same random value of the
+# delta variable in the next try
+my $err_pattern =
+ "client (0|1) got an error in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update\\b.*"
+ . "\\g1";
+
+pgbench(
+ "-n -c 2 -t 1 -d --debug-errors --max-tries 2",
+ 0,
+ [ qr{processed: 2/2\b}, qr{^((?!number of transactions failed)(.|\n))*$},
+ qr{number of transactions retried: 1\b}, qr{number of total retries: 1\b} ],
+ [ qr/$err_pattern/s ],
+ 'concurrent update with retrying',
+ {
+ '001_pgbench_serialization' => q{
+-- What's happening:
+-- The first client starts the transaction with the isolation level Repeatable
+-- Read:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+--
+-- The second client starts a similar transaction with the same isolation level:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+-- <waiting for the first client>
+--
+-- The first client commits its transaction, and the second client gets a
+-- serialization error.
+
+\set delta random(-5000, 5000)
+
+-- The second client will stop here
+SELECT pg_advisory_lock(0);
+
+-- Start transaction with concurrent update
+BEGIN;
+UPDATE xy SET y = y + :delta WHERE x = 1 AND pg_advisory_lock(1) IS NOT NULL;
+
+-- Wait for the second client
+DO $$
+DECLARE
+ exists boolean;
+ waiters integer;
+BEGIN
+ -- The second client always comes in second, and the number of rows in the
+ -- table first_client_table reflects this. Here the first client inserts a row,
+ -- so the second client will see a non-empty table when repeating the
+ -- transaction after the serialization error.
+ SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+ IF NOT exists THEN
+ -- Let the second client begin
+ PERFORM pg_advisory_unlock(0);
+ -- And wait until the second client tries to get the same lock
+ LOOP
+ SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+ locktype = 'advisory' AND objsubid = 1 AND
+ ((classid::bigint << 32) | objid::bigint = 1::bigint) AND NOT granted;
+ IF waiters = 1 THEN
+ INSERT INTO first_client_table VALUES (1);
+
+ -- Exit loop
+ EXIT;
+ END IF;
+ END LOOP;
+ END IF;
+END$$;
+
+COMMIT;
+SELECT pg_advisory_unlock_all();
+}
+ });
+
+# Clean up
+
+$node->safe_psql('postgres', 'DELETE FROM first_client_table;');
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=read\\ committed";
+
+# Deadlock error and retry
+
+# Check that we have a deadlock error
+$err_pattern =
+ "client (0|1) got an error in command (3|5) \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\\b";
+
+pgbench(
+ "-n -c 2 -t 1 --max-tries 2 --debug-errors",
+ 0,
+ [ qr{processed: 2/2\b}, qr{^((?!number of transactions failed)(.|\n))*$},
+ qr{number of transactions retried: 1\b}, qr{number of total retries: 1\b} ],
+ [ qr{$err_pattern} ],
+ 'deadlock with retrying',
+ {
+ '001_pgbench_deadlock' => q{
+-- What's happening:
+-- The first client gets the lock 2.
+-- The second client gets the lock 3 and tries to get the lock 2.
+-- The first client tries to get the lock 3 and one of them gets a deadlock
+-- error.
+--
+-- A client that does not get a deadlock error must hold a lock at the
+-- transaction start. Thus in the end it releases all of its locks before the
+-- client with the deadlock error starts a retry (we do not want any errors
+-- again).
+
+-- Since the client with the deadlock error has not released the blocking locks,
+-- let's do this here.
+SELECT pg_advisory_unlock_all();
+
+-- The second client and the client with the deadlock error stop here
+SELECT pg_advisory_lock(0);
+SELECT pg_advisory_lock(1);
+
+-- The second client and the client with the deadlock error always come after
+-- the first and the number of rows in the table first_client_table reflects
+-- this. Here the first client inserts a row, so in the future the table is
+-- always non-empty.
+DO $$
+DECLARE
+ exists boolean;
+BEGIN
+ SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+ IF exists THEN
+ -- We are the second client or the client with the deadlock error
+
+ -- The first client will take care by itself of this lock (see below)
+ PERFORM pg_advisory_unlock(0);
+
+ PERFORM pg_advisory_lock(3);
+
+ -- The second client can get a deadlock here
+ PERFORM pg_advisory_lock(2);
+ ELSE
+ -- We are the first client
+
+ -- This code should not be used in a new transaction after an error
+ INSERT INTO first_client_table VALUES (1);
+
+ PERFORM pg_advisory_lock(2);
+ END IF;
+END$$;
+
+DO $$
+DECLARE
+ num_rows integer;
+ waiters integer;
+BEGIN
+ -- Check if we are the first client
+ SELECT COUNT(*) FROM first_client_table INTO STRICT num_rows;
+ IF num_rows = 1 THEN
+ -- This code should not be used in a new transaction after an error
+ INSERT INTO first_client_table VALUES (2);
+
+ -- Let the second client begin
+ PERFORM pg_advisory_unlock(0);
+ PERFORM pg_advisory_unlock(1);
+
+ -- Make sure the second client is ready for deadlock
+ LOOP
+ SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+ locktype = 'advisory' AND
+ objsubid = 1 AND
+ ((classid::bigint << 32) | objid::bigint = 2::bigint) AND
+ NOT granted;
+
+ IF waiters = 1 THEN
+ -- Exit loop
+ EXIT;
+ END IF;
+ END LOOP;
+
+ PERFORM pg_advisory_lock(0);
+ -- And the second client took care by itself of the lock 1
+ END IF;
+END$$;
+
+-- The first client can get a deadlock here
+SELECT pg_advisory_lock(3);
+
+SELECT pg_advisory_unlock_all();
+}
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE first_client_table, xy;');
+
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 9023fac52d..5bf9ab1f0e 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -179,6 +179,16 @@ my @options = (
'-i --partition-method=hash',
[qr{partition-method requires greater than zero --partitions}]
],
+ [
+ 'bad maximum number of tries',
+ '--max-tries -10',
+ [qr{invalid number of maximum tries: "-10"}]
+ ],
+ [
+ 'an infinite number of tries',
+ '--max-tries 0',
+ [qr{an unlimited number of transaction tries can only be used with --latency-limit or a duration}]
+ ],
# logging sub-options
[
diff --git a/src/fe_utils/conditional.c b/src/fe_utils/conditional.c
index a562e28846..c304014f51 100644
--- a/src/fe_utils/conditional.c
+++ b/src/fe_utils/conditional.c
@@ -24,13 +24,25 @@ conditional_stack_create(void)
}
/*
- * destroy stack
+ * Destroy all the elements from the stack. The stack itself is not freed.
*/
void
-conditional_stack_destroy(ConditionalStack cstack)
+conditional_stack_reset(ConditionalStack cstack)
{
+ if (!cstack)
+ return; /* nothing to do here */
+
while (conditional_stack_pop(cstack))
continue;
+}
+
+/*
+ * destroy stack
+ */
+void
+conditional_stack_destroy(ConditionalStack cstack)
+{
+ conditional_stack_reset(cstack);
free(cstack);
}
diff --git a/src/include/fe_utils/conditional.h b/src/include/fe_utils/conditional.h
index c64c655775..9c495072aa 100644
--- a/src/include/fe_utils/conditional.h
+++ b/src/include/fe_utils/conditional.h
@@ -73,6 +73,8 @@ typedef struct ConditionalStackData *ConditionalStack;
extern ConditionalStack conditional_stack_create(void);
+extern void conditional_stack_reset(ConditionalStack cstack);
+
extern void conditional_stack_destroy(ConditionalStack cstack);
extern int conditional_stack_depth(ConditionalStack cstack);
--
2.17.1
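The conditional.c change above splits stack teardown into a reset step (pop everything, keep the container alive) and a destroy step (reset, then free), so a retried transaction can start from a clean \if state without invalidating the pointer held by the client. A minimal Python sketch of the same pattern — the names mirror the C functions, but this is an illustration, not the patch's code:

```python
class ConditionalStack:
    """Sketch of ConditionalStack with the reset/destroy split."""

    def __init__(self):
        self._elems = []

    def push(self, state):
        self._elems.append(state)

    def pop(self):
        # Mirrors conditional_stack_pop(): returns False when empty.
        if not self._elems:
            return False
        self._elems.pop()
        return True

    def reset(self):
        # conditional_stack_reset(): pop every element but keep the
        # object, so it can be reused on the next try of the script.
        while self.pop():
            pass

    def depth(self):
        return len(self._elems)
```

destroy() would simply be reset() followed by freeing the object; in Python the garbage collector plays that role.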
Attachment: v12-0001-Pgbench-errors-use-the-Variables-structure-for-c.patch (text/x-diff)
From fd3e1e0203f8b2b0c500d9f1f9905d315e97b6f6 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Wed, 26 May 2021 16:58:36 +0900
Subject: [PATCH v12 1/2] Pgbench errors: use the Variables structure for
client variables
This is most important when it is used to reset client variables during the
repeating of transactions after serialization/deadlock failures.
Don't allocate Variable structs one by one. Instead, add a constant margin each
time it overflows.
---
src/bin/pgbench/pgbench.c | 169 ++++++++++++++++++++++++--------------
1 file changed, 106 insertions(+), 63 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e61055b6b7..8acda86cad 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -287,6 +287,12 @@ const char *progname;
volatile bool timer_exceeded = false; /* flag from signal handler */
+/*
+ * We don't want to allocate variables one by one; for efficiency, add a
+ * constant margin each time it overflows.
+ */
+#define VARIABLES_ALLOC_MARGIN 8
+
/*
* Variable definitions.
*
@@ -304,6 +310,24 @@ typedef struct
PgBenchValue value; /* actual variable's value */
} Variable;
+/*
+ * Data structure for client variables.
+ */
+typedef struct
+{
+ Variable *vars; /* array of variable definitions */
+ int nvars; /* number of variables */
+
+ /*
+ * The maximum number of variables that we can currently store in 'vars'
+ * without having to reallocate more space. We must always have max_vars >=
+ * nvars.
+ */
+ int max_vars;
+
+ bool vars_sorted; /* are variables sorted by name? */
+} Variables;
+
#define MAX_SCRIPTS 128 /* max number of SQL scripts allowed */
#define SHELL_COMMAND_SIZE 256 /* maximum size allowed for shell command */
@@ -460,9 +484,7 @@ typedef struct
int command; /* command number in script */
/* client variables */
- Variable *variables; /* array of variable definitions */
- int nvariables; /* number of variables */
- bool vars_sorted; /* are variables sorted by name? */
+ Variables variables;
/* various times about current transaction in microseconds */
pg_time_usec_t txn_scheduled; /* scheduled start time of transaction */
@@ -1418,39 +1440,39 @@ compareVariableNames(const void *v1, const void *v2)
/* Locate a variable by name; returns NULL if unknown */
static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
{
Variable key;
/* On some versions of Solaris, bsearch of zero items dumps core */
- if (st->nvariables <= 0)
+ if (variables->nvars <= 0)
return NULL;
/* Sort if we have to */
- if (!st->vars_sorted)
+ if (!variables->vars_sorted)
{
- qsort((void *) st->variables, st->nvariables, sizeof(Variable),
+ qsort((void *) variables->vars, variables->nvars, sizeof(Variable),
compareVariableNames);
- st->vars_sorted = true;
+ variables->vars_sorted = true;
}
/* Now we can search */
key.name = name;
return (Variable *) bsearch((void *) &key,
- (void *) st->variables,
- st->nvariables,
+ (void *) variables->vars,
+ variables->nvars,
sizeof(Variable),
compareVariableNames);
}
/* Get the value of a variable, in string form; returns NULL if unknown */
static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
{
Variable *var;
char stringform[64];
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
return NULL; /* not found */
@@ -1582,21 +1604,43 @@ valid_variable_name(const char *name)
return true;
}
+/*
+ * Make sure there is enough space for 'needed' more variable in the variables
+ * array. It is assumed that the sum of the number of current variables and the
+ * number of needed variables is less than or equal to (INT_MAX -
+ * VARIABLES_ALLOC_MARGIN).
+ */
+static void
+enlargeVariables(Variables *variables, int needed)
+{
+ /* total number of variables required now */
+ needed += variables->nvars;
+
+ if (variables->max_vars < needed)
+ {
+ /*
+ * We don't want to allocate variables one by one; for efficiency, add a
+ * constant margin each time it overflows.
+ */
+ variables->max_vars = needed + VARIABLES_ALLOC_MARGIN;
+ variables->vars = (Variable *)
+ pg_realloc(variables->vars, variables->max_vars * sizeof(Variable));
+ }
+}
+
/*
* Lookup a variable by name, creating it if need be.
* Caller is expected to assign a value to the variable.
* Returns NULL on failure (bad name).
*/
static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
+lookupCreateVariable(Variables *variables, const char *context, char *name)
{
Variable *var;
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
{
- Variable *newvars;
-
/*
* Check for the name only when declaring a new variable to avoid
* overhead.
@@ -1608,23 +1652,17 @@ lookupCreateVariable(CState *st, const char *context, char *name)
}
/* Create variable at the end of the array */
- if (st->variables)
- newvars = (Variable *) pg_realloc(st->variables,
- (st->nvariables + 1) * sizeof(Variable));
- else
- newvars = (Variable *) pg_malloc(sizeof(Variable));
-
- st->variables = newvars;
+ enlargeVariables(variables, 1);
- var = &newvars[st->nvariables];
+ var = &(variables->vars[variables->nvars]);
var->name = pg_strdup(name);
var->svalue = NULL;
/* caller is expected to initialize remaining fields */
- st->nvariables++;
+ variables->nvars++;
/* we don't re-sort the array till we have to */
- st->vars_sorted = false;
+ variables->vars_sorted = false;
}
return var;
@@ -1633,12 +1671,13 @@ lookupCreateVariable(CState *st, const char *context, char *name)
/* Assign a string value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariable(CState *st, const char *context, char *name, const char *value)
+putVariable(Variables *variables, const char *context, char *name,
+ const char *value)
{
Variable *var;
char *val;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1656,12 +1695,12 @@ putVariable(CState *st, const char *context, char *name, const char *value)
/* Assign a value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableValue(CState *st, const char *context, char *name,
+putVariableValue(Variables *variables, const char *context, char *name,
const PgBenchValue *value)
{
Variable *var;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1676,12 +1715,13 @@ putVariableValue(CState *st, const char *context, char *name,
/* Assign an integer value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putVariableInt(Variables *variables, const char *context, char *name,
+ int64 value)
{
PgBenchValue val;
setIntValue(&val, value);
- return putVariableValue(st, context, name, &val);
+ return putVariableValue(variables, context, name, &val);
}
/*
@@ -1740,7 +1780,7 @@ replaceVariable(char **sql, char *param, int len, char *value)
}
static char *
-assignVariables(CState *st, char *sql)
+assignVariables(Variables *variables, char *sql)
{
char *p,
*name,
@@ -1761,7 +1801,7 @@ assignVariables(CState *st, char *sql)
continue;
}
- val = getVariable(st, name);
+ val = getVariable(variables, name);
free(name);
if (val == NULL)
{
@@ -1776,12 +1816,13 @@ assignVariables(CState *st, char *sql)
}
static void
-getQueryParams(CState *st, const Command *command, const char **params)
+getQueryParams(Variables *variables, const Command *command,
+ const char **params)
{
int i;
for (i = 0; i < command->argc - 1; i++)
- params[i] = getVariable(st, command->argv[i + 1]);
+ params[i] = getVariable(variables, command->argv[i + 1]);
}
static char *
@@ -2647,7 +2688,7 @@ evaluateExpr(CState *st, PgBenchExpr *expr, PgBenchValue *retval)
{
Variable *var;
- if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
+ if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
pg_log_error("undefined variable \"%s\"", expr->u.variable.varname);
return false;
@@ -2717,7 +2758,7 @@ getMetaCommand(const char *cmd)
* Return true if succeeded, or false on error.
*/
static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
char command[SHELL_COMMAND_SIZE];
int i,
@@ -2748,7 +2789,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
arg = argv[i] + 1; /* a string literal starting with colons */
}
- else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
+ else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
pg_log_error("%s: undefined variable \"%s\"", argv[0], argv[i]);
return false;
@@ -2809,7 +2850,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
pg_log_error("%s: shell command must return an integer (not \"%s\")", argv[0], res);
return false;
}
- if (!putVariableInt(st, "setshell", variable, retval))
+ if (!putVariableInt(variables, "setshell", variable, retval))
return false;
pg_log_debug("%s: shell parameter name: \"%s\", value: \"%s\"", argv[0], argv[1], res);
@@ -2861,7 +2902,7 @@ sendCommand(CState *st, Command *command)
char *sql;
sql = pg_strdup(command->argv[0]);
- sql = assignVariables(st, sql);
+ sql = assignVariables(&st->variables, sql);
pg_log_debug("client %d sending %s", st->id, sql);
r = PQsendQuery(st->con, sql);
@@ -2872,7 +2913,7 @@ sendCommand(CState *st, Command *command)
const char *sql = command->argv[0];
const char *params[MAX_ARGS];
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
pg_log_debug("client %d sending %s", st->id, sql);
r = PQsendQueryParams(st->con, sql, command->argc - 1,
@@ -2919,7 +2960,7 @@ sendCommand(CState *st, Command *command)
st->prepared[st->use_file] = true;
}
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
pg_log_debug("client %d sending %s", st->id, name);
@@ -3012,7 +3053,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
varname = psprintf("%s%s", varprefix, varname);
/* store last row result as a string */
- if (!putVariable(st, meta == META_ASET ? "aset" : "gset", varname,
+ if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
PQgetvalue(res, ntuples - 1, fld)))
{
/* internal error */
@@ -3073,14 +3114,14 @@ error:
* of delay, in microseconds. Returns true on success, false on error.
*/
static bool
-evaluateSleep(CState *st, int argc, char **argv, int *usecs)
+evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
char *var;
int usec;
if (*argv[1] == ':')
{
- if ((var = getVariable(st, argv[1] + 1)) == NULL)
+ if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
pg_log_error("%s: undefined variable \"%s\"", argv[0], argv[1] + 1);
return false;
@@ -3612,7 +3653,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
* latency will be recorded in CSTATE_SLEEP state, not here, after the
* delay has elapsed.)
*/
- if (!evaluateSleep(st, argc, argv, &usec))
+ if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
commandFailed(st, "sleep", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -3633,7 +3674,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
return CSTATE_ABORTED;
}
- if (!putVariableValue(st, argv[0], argv[1], &result))
+ if (!putVariableValue(&st->variables, argv[0], argv[1], &result))
{
commandFailed(st, "set", "assignment of meta-command failed");
return CSTATE_ABORTED;
@@ -3703,7 +3744,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
}
else if (command->meta == META_SETSHELL)
{
- if (!runShellCommand(st, argv[1], argv + 2, argc - 2))
+ if (!runShellCommand(&st->variables, argv[1], argv + 2, argc - 2))
{
commandFailed(st, "setshell", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -3711,7 +3752,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
}
else if (command->meta == META_SHELL)
{
- if (!runShellCommand(st, NULL, argv + 1, argc - 1))
+ if (!runShellCommand(&st->variables, NULL, argv + 1, argc - 1))
{
commandFailed(st, "shell", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -5993,7 +6034,7 @@ main(int argc, char **argv)
}
*p++ = '\0';
- if (!putVariable(&state[0], "option", optarg, p))
+ if (!putVariable(&state[0].variables, "option", optarg, p))
exit(1);
}
break;
@@ -6333,19 +6374,19 @@ main(int argc, char **argv)
int j;
state[i].id = i;
- for (j = 0; j < state[0].nvariables; j++)
+ for (j = 0; j < state[0].variables.nvars; j++)
{
- Variable *var = &state[0].variables[j];
+ Variable *var = &state[0].variables.vars[j];
if (var->value.type != PGBT_NO_VALUE)
{
- if (!putVariableValue(&state[i], "startup",
+ if (!putVariableValue(&state[i].variables, "startup",
var->name, &var->value))
exit(1);
}
else
{
- if (!putVariable(&state[i], "startup",
+ if (!putVariable(&state[i].variables, "startup",
var->name, var->svalue))
exit(1);
}
@@ -6380,11 +6421,11 @@ main(int argc, char **argv)
* :scale variables normally get -s or database scale, but don't override
* an explicit -D switch
*/
- if (lookupVariable(&state[0], "scale") == NULL)
+ if (lookupVariable(&state[0].variables, "scale") == NULL)
{
for (i = 0; i < nclients; i++)
{
- if (!putVariableInt(&state[i], "startup", "scale", scale))
+ if (!putVariableInt(&state[i].variables, "startup", "scale", scale))
exit(1);
}
}
@@ -6393,30 +6434,32 @@ main(int argc, char **argv)
* Define a :client_id variable that is unique per connection. But don't
* override an explicit -D switch.
*/
- if (lookupVariable(&state[0], "client_id") == NULL)
+ if (lookupVariable(&state[0].variables, "client_id") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "client_id", i))
+ if (!putVariableInt(&state[i].variables, "startup", "client_id", i))
exit(1);
}
/* set default seed for hash functions */
- if (lookupVariable(&state[0], "default_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "default_seed") == NULL)
{
uint64 seed =
((uint64) pg_jrand48(base_random_sequence.xseed) & 0xFFFFFFFF) |
(((uint64) pg_jrand48(base_random_sequence.xseed) & 0xFFFFFFFF) << 32);
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "default_seed", (int64) seed))
+ if (!putVariableInt(&state[i].variables, "startup", "default_seed",
+ (int64) seed))
exit(1);
}
/* set random seed unless overwritten */
- if (lookupVariable(&state[0], "random_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "random_seed") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+ if (!putVariableInt(&state[i].variables, "startup", "random_seed",
+ random_seed))
exit(1);
}
--
2.17.1
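The enlargeVariables() hunk in the patch above grows the variables array by a constant margin instead of reallocating one element at a time. The capacity arithmetic can be sketched like this (Python, purely illustrative; VARIABLES_ALLOC_MARGIN is the constant from the patch):

```python
VARIABLES_ALLOC_MARGIN = 8  # from the patch above


def enlarge_capacity(nvars, max_vars, needed):
    """Mirror of enlargeVariables(): given the current element count and
    allocated capacity, return the capacity after ensuring room for
    `needed` more entries."""
    needed += nvars
    if max_vars < needed:
        # Grow by a constant margin so that adding variables one by one
        # does not trigger a reallocation on every call.
        max_vars = needed + VARIABLES_ALLOC_MARGIN
    return max_vars
```

So starting from an empty array, the first variable allocates room for nine, and the next eight additions cost no reallocation at all.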
Hello Yugo-san,
Thanks a lot for continuing this work started by Marina!
I'm planning to review it for the July CF. I've just added an entry there:
https://commitfest.postgresql.org/33/3194/
--
Fabien.
Hello Fabien,
On Tue, 22 Jun 2021 20:03:58 +0200 (CEST)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:
Hello Yugo-san,
Thanks a lot for continuing this work started by Marina!
I'm planning to review it for the July CF. I've just added an entry there:
Thanks!
--
Yugo NAGATA <nagata@sraoss.co.jp>
Hello Yugo-san:
# About v12.1
This is a refactoring patch, which creates a separate structure for
holding variables. This will become handy in the next patch. There is also
a benefit from a software engineering point of view, so it has merit on
its own.
## Compilation
Patch applies cleanly, compiles, global & local checks pass.
## About the code
Fine.
I'm wondering whether we could use "vars" instead of "variables" as a
struct field name and function parameter name, so that it is shorter and
more distinct from the type name "Variables". What do you think?
## About comments
Remove the comment on enlargeVariables about "It is assumed …": the issue
of trying MAXINT vars is more than remote and is not worth mentioning. In
the same function, remove the comments about MARGIN, it is already on the
macro declaration, once is enough.
--
Fabien.
On Wed, 23 Jun 2021 10:38:43 +0200 (CEST)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:
Hello Yugo-san:
# About v12.1
This is a refactoring patch, which creates a separate structure for
holding variables. This will become handy in the next patch. There is also
a benefit from a software engineering point of view, so it has merit on
its own.
## Compilation
Patch applies cleanly, compiles, global & local checks pass.
## About the code
Fine.
I'm wondering whether we could use "vars" instead of "variables" as a
struct field name and function parameter name, so that it is shorter and
more distinct from the type name "Variables". What do you think?
The struct "Variables" has a field named "vars" which is an array of
"Variable" type. I guess this is a reason why "variables" is used instead
of "vars" as a name of "Variables" type variable so that we could know
a variable's type is Variable or Variables. Also, in order to refer to
the field, we would use
vars->vars[vars->nvars]
and there are nested "vars". Could this confuse a code reader?
## About comments
Remove the comment on enlargeVariables about "It is assumed …": the issue
of trying MAXINT vars is more than remote and is not worth mentioning. In
the same function, remove the comments about MARGIN, it is already on the
macro declaration, once is enough.
Sure. I'll remove them.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Hello Yugo-san,
I'm wondering whether we could use "vars" instead of "variables" as a
struct field name and function parameter name, so that it is shorter and
more distinct from the type name "Variables". What do you think?
The struct "Variables" has a field named "vars" which is an array of
"Variable" type. I guess this is a reason why "variables" is used instead
of "vars" as a name of "Variables" type variable, so that we can tell
whether a variable's type is Variable or Variables. Also, in order to refer to
the field, we would use
vars->vars[vars->nvars]
and there are nested "vars". Could this confuse a code reader?
Hmmm… Probably. Let's keep "variables" then.
--
Fabien.
Hello Yugo-san,
# About v12.2
## Compilation
Patch seems to apply cleanly with "git apply", but does not compile on my
host: "undefined reference to `conditional_stack_reset'".
However it works better when using the "patch" command. I'm wondering why git
apply fails silently…
When compiling there are warnings about "pg_log_fatal", which does not
expect a FILE* on pgbench.c:4453. Remove the "stderr" argument.
Global and local checks ok.
number of transactions failed: 324 (3.240%)
...
number of transactions retried: 5629 (56.290%)
number of total retries: 103299
I'd suggest: "number of failed transactions". "total number of retries" or
just "number of retries"?
## Feature
The overall code structure changes to implement the feature seem
reasonable to me, as we are at the 12th iteration of the patch.
Comments below are mostly about details, asking questions
about choices, and commenting…
## Documentation
There is a lot of documentation, which is good. I'll review these
separately. It looks good, but having a native English speaker/writer
would really help!
Some output examples do not correspond to actual output for
the current version. In particular, there is always one TPS figure
given now, instead of the confusing two shown before.
## Comments
transactinos -> transactions.
## Code
By default max_tries = 0. Should not the initialization be 1,
as the documentation argues that it is the default?
Counter comments, missing + in the formula on the skipped line.
Given that we manage errors, ISTM that we should not necessarily
stop on other not retried errors, but rather count/report them and
possibly proceed. Eg with something like:
-- server side random fail
DO LANGUAGE plpgsql $$
BEGIN
IF RANDOM() < 0.1 THEN
RAISE EXCEPTION 'unlucky!';
END IF;
END;
$$;
Or:
-- client side random fail
BEGIN;
\if random(1, 10) <= 1
SELECT 1 +;
\else
SELECT 2;
\endif
COMMIT;
We could count the fail, rollback if necessary, and go on. What do you think?
Maybe such behavior would deserve an option.
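The count-and-proceed behaviour proposed here can be illustrated with a small client-side harness (plain Python, not pgbench code; `run_script_once` and the 10% failure rate are stand-ins for the RAISE EXCEPTION example above):

```python
import random


def run_script_once():
    """Stand-in for executing one script; randomly 'fails', like the
    server-side DO block that raises 'unlucky!' 10% of the time."""
    if random.random() < 0.1:
        raise RuntimeError("unlucky!")


def bench(n_txns, seed=42):
    """Count failures and keep going instead of aborting the client."""
    random.seed(seed)
    completed = failed = 0
    for _ in range(n_txns):
        try:
            run_script_once()
            completed += 1
        except RuntimeError:
            failed += 1  # would ROLLBACK here, then proceed to next txn
    return completed, failed
```

The aggregate report would then show both counters, much like the failed-transactions line discussed earlier in the thread.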
--report-latencies -> --report-per-command: should we keep supporting
the previous option?
--failures-detailed: if we bother to run with handling failures, should
it always be on?
--debug-errors: I'm not sure we should want a special debug mode for that,
I'd consider integrating it with the standard debug, or just for development.
Also, should it use pg_log_debug?
doRetry: I'd separate the 3 no retries options instead of mixing max_tries and
timer_exceeded, for clarity.
Tries vs retries: I'm at odds with having tries & retries and + 1 here
and there to handle that, which is a little bit confusing. I'm wondering whether
we could only count "tries" and adjust to report what we want later?
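The tries-only bookkeeping suggested here could look like the following sketch, under the assumption that each transaction records how many times it was attempted and the reported figures are derived afterwards:

```python
def summarize(tries_per_txn):
    """Count only 'tries' per transaction; derive the reported numbers
    later: a transaction was retried iff tries > 1, and its retry count
    is tries - 1."""
    txns = len(tries_per_txn)
    retried = sum(1 for t in tries_per_txn if t > 1)
    total_retries = sum(t - 1 for t in tries_per_txn)
    return txns, retried, total_retries
```

This avoids scattering "+ 1" adjustments through the per-client state, at the cost of one subtraction at reporting time.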
advanceConnectionState: ISTM that ERROR should logically be before others which
lead to it.
Variables management: it looks expensive, with copying and freeing variable arrays.
I'm wondering whether we should think of something more clever. Well, that would be
for some other patch.
"Accumulate the retries" -> "Count (re)tries"?
Currently, ISTM that the retry on error mode is implicitly always on.
Do we want that? I'd say yes, but maybe people could disagree.
## Tests
There are tests, good!
I'm wondering whether something simpler could be devised to trigger
serialization or deadlock errors, eg with a SEQUENCE and an \if.
See the attached files for generating deadlocks reliably (start with 2 clients).
What do you think? The PL/pgSQL is minimal; it is really client-code
oriented.
Given that deadlocks are detected about once every second, the test runs
would take some time. Let it be for now.
--
Fabien.
Hello Fabien,
On Sat, 26 Jun 2021 12:15:38 +0200 (CEST)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:
Hello Yugo-san,
# About v12.2
## Compilation
Patch seems to apply cleanly with "git apply", but does not compile on my
host: "undefined reference to `conditional_stack_reset'".
However it works better when using the "patch" command. I'm wondering why git
apply fails silently…
Hmm, I don't know why your compilation fails... I can apply and compile
it successfully using git.
When compiling there are warnings about "pg_log_fatal", which does not
expect a FILE* on pgbench.c:4453. Remove the "stderr" argument.
Ok.
Global and local checks ok.
number of transactions failed: 324 (3.240%)
...
number of transactions retried: 5629 (56.290%)
number of total retries: 103299
I'd suggest: "number of failed transactions". "total number of retries" or
just "number of retries"?
Ok. I fixed to use "number of failed transactions" and "total number of retries".
## Feature
The overall code structure changes to implement the feature seem
reasonable to me, as we are at the 12th iteration of the patch.
Comments below are mostly about details, asking questions
about choices, and commenting…
## Documentation
There is a lot of documentation, which is good. I'll review these
separately. It looks good, but having a native English speaker/writer
would really help!
Some output examples do not correspond to actual output for
the current version. In particular, there is always one TPS figure
given now, instead of the confusing two shown before.
Fixed.
## Comments
transactinos -> transactions.
Fixed.
## Code
By default max_tries = 0. Should not the initialization be 1,
as the documentation argues that it is the default?
Ok. I fixed the default value to 1.
Counter comments, missing + in the formula on the skipped line.
Fixed.
Given that we manage errors, ISTM that we should not necessarily
stop on other not retried errors, but rather count/report them and
possibly proceed. Eg with something like:
-- server side random fail
DO LANGUAGE plpgsql $$
BEGIN
IF RANDOM() < 0.1 THEN
RAISE EXCEPTION 'unlucky!';
END IF;
END;
$$;
Or:
-- client side random fail
BEGIN;
\if random(1, 10) <= 1
SELECT 1 +;
\else
SELECT 2;
\endif
COMMIT;
We could count the fail, rollback if necessary, and go on. What do you think?
Maybe such behavior would deserve an option.
This feature to count failures that could occur at runtime seems nice. However,
as discussed in [1], I think it is better to focus only on failures that can be
retried in this patch, and introduce the handling of other failures in a
separate patch.
[1]: /messages/by-id/alpine.DEB.2.21.1809121519590.13887@lancre
--report-latencies -> --report-per-command: should we keep supporting
the previous option?
Ok. Although the option is now not only about latencies, considering users
who are relying on the existing option I'm fine with this. I reverted it to
the previous name.
--failures-detailed: if we bother to run with handling failures, should
it always be on?
If we print other failures that cannot be retried in the future, it could
produce a lot of lines and might annoy users who don't need the details of
failures. Moreover, some users would always need detailed failure information
in the log, while others would need only the total numbers of failures.
Currently we handle only serialization and deadlock failures, so the number of
lines printed and the number of logging columns is not large even under
--failures-detailed, but if we get a chance to handle other failures in the
future, ISTM adding this option makes sense for users who would like simple
outputs.
--debug-errors: I'm not sure we should want a special debug mode for that,
I'd consider integrating it with the standard debug, or just for development.
I think --debug is a debug option for telling users about pgbench's internal
behavior, that is, which client is doing what. On the other hand, --debug-errors
is for telling users what error caused a retry or a failure in detail. For
users who are not interested in pgbench's internal behavior (sending a command,
receiving a result, ...) but are interested in the actual errors raised while
running the script, this option seems useful.
Also, should it use pg_log_debug?
If we used pg_log_debug, the message would be printed only under --debug.
Therefore, I changed it to use pg_log_info instead of pg_log_error or fprintf.
doRetry: I'd separate the 3 no retries options instead of mixing max_tries and
timer_exceeded, for clarity.
Ok. I separated them.
Tries vs retries: I'm at odds with having tries & retries and + 1 here
and there to handle that, which is a little bit confusing. I'm wondering whether
we could only count "tries" and adjust to report what we want later?
I changed CState to use "tries" instead of "retries". However, we still use
"retries" in StatsData and Command because the number of retries is printed
in the final results. Is this less confusing than before?
advanceConnectionState: ISTM that ERROR should logically be before others which
lead to it.
Sorry, I couldn't understand your suggestion. Is this about the order of
the case statements, or about pg_log_error?
Variables management: it looks expensive, with copying and freeing variable arrays.
I'm wondering whether we should think of something more clever. Well, that would be
for some other patch.
Well... indeed there may be a more efficient way. For example, instead of
clearing all vars in dest, it might be possible to copy or clear only the part
that differs between dest and source, leaving the unchanged part of dest alone.
Anyway, I think this work should be done in another patch.
"Accumulate the retries" -> "Count (re)tries"?
Fixed.
Currently, ISTM that the retry on error mode is implicitly always on.
Do we want that? I'd say yes, but maybe people could disagree.
The default value of --max-tries is 1, so retry on error is off by default.
Failed transactions are retried only when the user wants it and
specifies a valid value for --max-tries.
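Putting the limits together, the retry decision discussed in this thread can be sketched as follows (Python with assumed names — the real doRetry() works on pgbench's CState, and the latency-limit units here are arbitrary):

```python
def can_retry(tries, max_tries, elapsed, latency_limit, timer_exceeded):
    """A failed transaction is retried only while every configured limit
    still allows it. max_tries == 0 means unlimited, which the options
    parser only permits together with --latency-limit or a duration."""
    if timer_exceeded:
        # The benchmark duration (-T) is over: never start another try.
        return False
    if max_tries and tries >= max_tries:
        # Default max_tries is 1, so by default there is no retry.
        return False
    if latency_limit is not None and elapsed >= latency_limit:
        # --latency-limit bounds the total time spent on one transaction.
        return False
    return True
```

With the default max_tries of 1 every branch below the first is unreachable after one try, which matches the "retry is off by default" behaviour described above.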
## Tests
There are tests, good!
I'm wondering whether something simpler could be devised to trigger
serialization or deadlock errors, eg with a SEQUENCE and an \if.
See the attached files for generating deadlocks reliably (start with 2 clients).
What do you think? The PL/pgSQL is minimal; it is really client-code
oriented.
Given that deadlocks are detected about once every second, the test runs
would take some time. Let it be for now.
Sorry, but I cannot find the attached file. I don't have a good idea
for a simpler test right now, but I can rework the test based on your
idea after getting the file.
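Since the attachment is missing, here is one plausible shape such a deadlock-provoking pgbench script could take (a sketch under assumptions, not the reviewer's actual file): each client updates the same two teller rows in a random order, so with two clients the second UPDATEs regularly cross and the server reports a deadlock (SQLSTATE 40P01) after deadlock_timeout, which matches "deadlocks are detected about every second".

```sql
-- deadlock.sql: run with e.g. "pgbench -n -c 2 -f deadlock.sql ..."
-- Pick two distinct teller ids in a random order per transaction.
\set a random(1, 2)
\set b 3 - :a
BEGIN;
UPDATE pgbench_tellers SET tbalance = tbalance + 1 WHERE tid = :a;
-- Hold the first row lock long enough for the other client to grab
-- the other row, so the second UPDATEs block on each other.
\sleep 10 ms
UPDATE pgbench_tellers SET tbalance = tbalance + 1 WHERE tid = :b;
END;
```

With --max-tries greater than 1 the deadlocked client should then roll back and retry, exercising the new code path.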
I attached the patch updated according to your suggestions.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v13-0002-Pgbench-errors-and-serialization-deadlock-retrie.patch (text/x-diff)
From 034db9dd6a44f76794a2f62f135b7543bddd23b2 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Fri, 28 May 2021 10:48:57 +0900
Subject: [PATCH v13 2/2] Pgbench errors and serialization/deadlock retries
A client's run is aborted in case of a serious error, for example, if the
connection with the database server was lost or the end of the script was
reached without completing the last transaction. In addition, if execution
of an SQL or meta command fails for reasons other than serialization or
deadlock errors, the client is aborted. Otherwise, if an SQL command fails
with a serialization or deadlock error, the current transaction is rolled
back, which also includes setting the client variables as they were before
the run of this transaction (it is assumed that one transaction script
contains only one transaction).
Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum number of
tries (specified by the --max-tries option) / the maximum time of tries
(specified by the --latency-limit option). These options can be combined
together; moreover, you cannot use an unlimited number of tries (--max-tries=0)
without the --latency-limit option or the --time option. By default the option
--max-tries is set to 1 and transactions with serialization/deadlock errors
are not retried. If the last transaction run fails, this transaction will be
reported as failed, and the client variables will be set as they were before
the first run of this transaction.
If there are retries and/or failures, their statistics are printed in the
progress, in the transaction / aggregation logs and in the end with other
results (all and for each script). Also retries and failures are printed
per-command with average latencies if you use the appropriate benchmarking
option (--report-latencies, -r). If you want to group failures by basic types
(serialization failures / deadlock failures), use the option --failures-detailed.
If you want to distinguish all errors and failures (errors without retrying) by
type including which limit for retries was violated and how far it was exceeded
for the serialization/deadlock failures, use the option --debug-errors.
---
doc/src/sgml/ref/pgbench.sgml | 396 +++++++-
src/bin/pgbench/pgbench.c | 960 +++++++++++++++++--
src/bin/pgbench/t/001_pgbench_with_server.pl | 217 ++++-
src/bin/pgbench/t/002_pgbench_no_server.pl | 10 +
src/fe_utils/conditional.c | 16 +-
src/include/fe_utils/conditional.h | 2 +
6 files changed, 1487 insertions(+), 114 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0c60077e1f..5290ff0945 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -58,6 +58,7 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
latency average = 11.013 ms
latency stddev = 7.351 ms
initial connection time = 45.758 ms
@@ -65,11 +66,14 @@ tps = 896.967014 (without initial connection time)
</screen>
The first six lines report some of the most important parameter
- settings. The next line reports the number of transactions completed
+ settings. The seventh line reports the number of transactions completed
and intended (the latter being just the product of number of clients
and number of transactions per client); these will be equal unless the run
- failed before completion. (In <option>-T</option> mode, only the actual
- number of transactions is printed.)
+ failed before completion or some SQL command(s) failed. (In
+ <option>-T</option> mode, only the actual number of transactions is printed.)
+ The next line reports the maximum number of tries for transactions with
+ serialization or deadlock errors (see <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -528,6 +532,17 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
at all. They are counted and reported separately as
<firstterm>skipped</firstterm>.
</para>
+ <para>
+ When the <option>--max-tries</option> option is used, the transaction with
+ serialization or deadlock error cannot be retried if the total time of
+ all its tries is greater than <replaceable>limit</replaceable> ms. To
+ limit only the time of tries and not their number, use
+ <literal>--max-tries=0</literal>. By default, the
+ <option>--max-tries</option> option is set to 1 and transactions with
+ serialization/deadlock errors are not retried. See <xref
+ linkend="failures-and-retries" endterm="failures-and-retries-title"/>
+ for more information about retrying such transactions.
+ </para>
</listitem>
</varlistentry>
@@ -594,11 +609,14 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<para>
Show progress report every <replaceable>sec</replaceable> seconds. The report
includes the time since the beginning of the run, the TPS since the
- last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</option>),
- the latency is computed with respect to the transaction scheduled
- start time, not the actual transaction beginning time, thus it also
- includes the average schedule lag time.
+ last report, and the transaction latency average, standard deviation,
+ and the number of failed transactions since the last report. Under
+ throttling (<option>-R</option>), the latency is computed with respect
+ to the transaction scheduled start time, not the actual transaction
+ beginning time, thus it also includes the average schedule lag time.
+ When <option>--max-tries</option> is used to enable transactions retries
+ after serialization/deadlock errors, the report includes the number of
+ retried transactions and the sum of all retries.
</para>
</listitem>
</varlistentry>
@@ -608,9 +626,12 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<term><option>--report-latencies</option></term>
<listitem>
<para>
- Report the average per-statement latency (execution time from the
- perspective of the client) of each command after the benchmark
- finishes. See below for details.
+ Report the following statistics for each command after the benchmark
+ finishes: the average per-statement latency (execution time from the
+ perspective of the client), the number of failures and the number of
+ retries after serialization or deadlock errors in this command. The
+ report displays retry statistics only if the
+ <option>--max-tries</option> option is not equal to 1.
</para>
</listitem>
</varlistentry>
@@ -738,6 +759,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--failures-detailed</option></term>
+ <listitem>
+ <para>
+ Report failures in per-transaction and aggregation logs, as well as in
+ the main and per-script reports, grouped by the following types:
+ <itemizedlist>
+ <listitem>
+ <para>serialization failures;</para>
+ </listitem>
+ <listitem>
+ <para>deadlock failures;</para>
+ </listitem>
+ </itemizedlist>
+ See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--log-prefix=<replaceable>prefix</replaceable></option></term>
<listitem>
@@ -748,6 +789,38 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--max-tries=<replaceable>number_of_tries</replaceable></option></term>
+ <listitem>
+ <para>
+ Enable retries for transactions with serialization/deadlock errors and
+ set the maximum number of these tries. This option can be combined with
+ the <option>--latency-limit</option> option which limits the total time
+ of all transaction tries; moreover, you cannot use an unlimited number
+ of tries (<literal>--max-tries=0</literal>) without
+ <option>--latency-limit</option> or <option>--time</option>.
+ The default value is 1 and transactions with serialization/deadlock
+ errors are not retried. See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information about
+ retrying such transactions.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--debug-errors</option></term>
+ <listitem>
+ <para>
+ Print messages about all errors and failures (errors without retrying)
+ including which limit for retries was violated and how far it was
+ exceeded for the serialization/deadlock failures. (Note that in this
+ case the output can grow considerably.)
+ See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--progress-timestamp</option></term>
<listitem>
@@ -943,8 +1016,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<refsect1>
<title>Notes</title>
- <refsect2>
- <title>What Is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts">
+ <title id="transactions-and-scripts-title">What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
<para>
<application>pgbench</application> executes test scripts chosen randomly
@@ -1017,6 +1090,11 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
both old and new versions of <application>pgbench</application>, be sure to write
each SQL command on a single line ending with a semicolon.
</para>
+ <para>
+ It is assumed that pgbench scripts do not contain incomplete blocks of SQL
+ transactions. If at runtime the client reaches the end of the script without
+ completing the last transaction block, it will be aborted.
+ </para>
</note>
<para>
@@ -2207,7 +2285,7 @@ END;
The format of the log is:
<synopsis>
-<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -2228,6 +2306,17 @@ END;
When both <option>--rate</option> and <option>--latency-limit</option> are used,
the <replaceable>time</replaceable> for a skipped transaction will be reported as
<literal>skipped</literal>.
+ <replaceable>retries</replaceable> is the sum of all retries after the
+ serialization or deadlock errors during the current script execution. It is
+ present only if the <option>--max-tries</option> option is not equal to 1.
+ If the transaction ends with a failure, its <replaceable>time</replaceable>
+ will be reported as <literal>failed</literal>. If you use the
+ <option>--failures-detailed</option> option, the
+ <replaceable>time</replaceable> of the failed transaction will be reported as
+ <literal>serialization_failure</literal> or
+ <literal>deadlock_failure</literal> depending on the type of failure (see
+ <xref linkend="failures-and-retries" endterm="failures-and-retries-title"/>
+ for more information).
</para>
<para>
@@ -2256,6 +2345,24 @@ END;
were already late before they were even started.
</para>
+ <para>
+ The following example shows a snippet of a log file with failures and
+ retries, with the maximum number of tries set to 10 (note the additional
+ <replaceable>retries</replaceable> column):
+<screen>
+3 0 47423 0 1499414498 34501 3
+3 1 8333 0 1499414498 42848 0
+3 2 8358 0 1499414498 51219 0
+4 0 72345 0 1499414498 59433 6
+1 3 41718 0 1499414498 67879 4
+1 4 8416 0 1499414498 76311 0
+3 3 33235 0 1499414498 84469 3
+0 0 failed 0 1499414498 84905 9
+2 0 failed 0 1499414498 86248 9
+3 4 8307 0 1499414498 92788 0
+</screen>
+ </para>
+
<para>
When running a long test on hardware that can handle a lot of transactions,
the log files can become very large. The <option>--sampling-rate</option> option
@@ -2271,7 +2378,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable>&zwsp; <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable>&zwsp; <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -2285,7 +2392,16 @@ END;
transaction latencies within the interval,
<replaceable>min_latency</replaceable> is the minimum latency within the interval,
and
- <replaceable>max_latency</replaceable> is the maximum latency within the interval.
+ <replaceable>max_latency</replaceable> is the maximum latency within the interval,
+ <replaceable>failures</replaceable> is the number of transactions that ended
+ with a failed SQL command within the interval. If you use option
+ <option>--failures-detailed</option>, instead of the sum of all failed
+ transactions you will get more detailed statistics for the failed
+ transactions grouped by the following types:
+ <replaceable>serialization_failures</replaceable> is the number of
+ transactions that got a serialization error and were not retried after this,
+ <replaceable>deadlock_failures</replaceable> is the number of transactions
+ that got a deadlock error and were not retried after this.
The next fields,
<replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
@@ -2293,21 +2409,25 @@ END;
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e., the difference between each transaction's
scheduled start time and the time it actually started.
- The very last field, <replaceable>skipped</replaceable>,
+ The next field, <replaceable>skipped</replaceable>,
is only present if the <option>--latency-limit</option> option is used, too.
It counts the number of transactions skipped because they would have
started too late.
+ The <replaceable>retried</replaceable> and <replaceable>retries</replaceable>
+ fields are present only if the <option>--max-tries</option> option is not
+ equal to 1. They report the number of retried transactions and the sum of all
+ retries after serialization or deadlock errors within the interval.
Each transaction is counted in the interval when it was committed.
</para>
<para>
Here is some example output:
<screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0
+1345828503 7884 1979812 565806736 60 1479 0
+1345828505 7208 1979422 567277552 59 1391 0
+1345828507 7685 1980268 569784714 60 1398 0
+1345828509 7073 1979779 573489941 236 1411 0
</screen></para>
<para>
@@ -2319,13 +2439,44 @@ END;
</refsect2>
<refsect2>
- <title>Per-Statement Latencies</title>
+ <title>Per-Statement Report</title>
<para>
- With the <option>-r</option> option, <application>pgbench</application> collects
- the elapsed transaction time of each statement executed by every
- client. It then reports an average of those values, referred to
- as the latency for each statement, after the benchmark has finished.
+ With the <option>-r</option> option, <application>pgbench</application>
+ collects the following statistics for each statement:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>latency</literal> — elapsed transaction time for each
+ statement. <application>pgbench</application> reports an average value
+ of all successful runs of the statement.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of failures in this statement. See
+ <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of retries after a serialization or a deadlock error in this
+ statement. See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ The report displays retry statistics only if the <option>--max-tries</option>
+ option is not equal to 1.
+ </para>
+
+ <para>
+ All values are computed for each statement executed by every client and are
+ reported after the benchmark has finished.
</para>
<para>
@@ -2339,27 +2490,63 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
latency average = 10.870 ms
latency stddev = 7.341 ms
initial connection time = 30.954 ms
tps = 907.949122 (without initial connection time)
-statement latencies in milliseconds:
- 0.001 \set aid random(1, 100000 * :scale)
- 0.001 \set bid random(1, 1 * :scale)
- 0.001 \set tid random(1, 10 * :scale)
- 0.000 \set delta random(-5000, 5000)
- 0.046 BEGIN;
- 0.151 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
- 0.107 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
- 4.241 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
- 5.245 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
- 0.102 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
- 0.974 END;
+statement latencies in milliseconds and failures:
+ 0.002 0 \set aid random(1, 100000 * :scale)
+ 0.005 0 \set bid random(1, 1 * :scale)
+ 0.002 0 \set tid random(1, 10 * :scale)
+ 0.001 0 \set delta random(-5000, 5000)
+ 0.326 0 BEGIN;
+ 0.603 0 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.454 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 5.528 0 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 7.335 0 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.371 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.212 0 END;
</screen>
+
+ Another example of output for the default script using serializable default
+ transaction isolation level (<command>PGOPTIONS='-c
+ default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: <builtin: TPC-B (sort of)>
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+number of transactions per client: 1000
+number of transactions actually processed: 9676/10000
+number of failed transactions: 324 (3.240%)
+number of serialization failures: 324 (3.240%)
+number of transactions retried: 5629 (56.290%)
+total number of retries: 103299
+maximum number of tries: 100
+number of transactions above the 100.0 ms latency limit: 21/9676 (0.217 %)
+latency average = 16.138 ms
+latency stddev = 21.017 ms
+tps = 413.686560 (without initial connection time)
+statement latencies in milliseconds, failures and retries:
+ 0.002 0 0 \set aid random(1, 100000 * :scale)
+ 0.000 0 0 \set bid random(1, 1 * :scale)
+ 0.000 0 0 \set tid random(1, 10 * :scale)
+ 0.000 0 0 \set delta random(-5000, 5000)
+ 0.121 0 0 BEGIN;
+ 0.290 0 2 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.221 0 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 0.266 212 72127 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 0.222 112 31170 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.178 0 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.210 0 0 END;
+ </screen>
</para>
<para>
- If multiple script files are specified, the averages are reported
+ If multiple script files are specified, all statistics are reported
separately for each script file.
</para>
@@ -2373,6 +2560,135 @@ statement latencies in milliseconds:
</para>
</refsect2>
+ <refsect2 id="failures-and-retries">
+ <title id="failures-and-retries-title">Failures and Serialization/Deadlock Retries</title>
+
+ <para>
+ When executing <application>pgbench</application>, there are three main types
+ of errors:
+ <itemizedlist>
+ <listitem>
+ <para>
+ Errors of the main program. They are the most serious and always result
+ in an immediate exit from the <application>pgbench</application> with
+ the corresponding error message. They include:
+ <itemizedlist>
+ <listitem>
+ <para>
+ errors at the beginning of the <application>pgbench</application>
+ (e.g. an invalid option value);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ errors in the initialization mode (e.g. the query to create
+ tables for built-in scripts fails);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ errors before starting threads (e.g. we could not connect to the
+ database server / the syntax error in the meta command / thread
+ creation failure);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ internal <application>pgbench</application> errors (which are
+ supposed to never occur...).
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Errors when the thread manages its clients (e.g. the client could not
+ start a connection to the database server / the socket for connecting
+ the client to the database server has become invalid). In such cases
+ all clients of this thread stop while other threads continue to work.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Direct client errors. They lead to immediate exit from the
+ <application>pgbench</application> with the corresponding error message
+ only in the case of an internal <application>pgbench</application>
+ error (which is supposed to never occur...). Otherwise, in the worst
+ case they only lead to aborting the failed client while other
+ clients continue their run (but some client errors are handled without
+ aborting the client and reported separately, see below). Later in
+ this section it is assumed that the discussed errors are only the
+ direct client errors and they are not internal
+ <application>pgbench</application> errors.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ A client's run is aborted in case of a serious error, for example, if the
+ connection with the database server was lost or the end of the script was
+ reached without completing the last transaction. Also, if execution of an
+ SQL or meta command fails for reasons other than serialization or deadlock
+ errors, the client is aborted. Otherwise, if an SQL command fails with serialization or
+ deadlock errors, the current transaction is rolled back which also
+ includes setting the client variables as they were before the run of this
+ transaction (it is assumed that one transaction script contains only one
+ transaction; see <xref linkend="transactions-and-scripts"
+ endterm="transactions-and-scripts-title"/> for more information).
+ Transactions with serialization or deadlock errors are repeated after
+ rollbacks until they complete successfully or reach the maximum number of
+ tries (specified by the <option>--max-tries</option> option) / the maximum
+ time of tries (specified by the <option>--latency-limit</option> option). If
+ the last transaction run fails, this transaction will be reported as failed.
+ </para>
+
+ <note>
+ <para>
+ Without the <option>--max-tries</option> option, a transaction is never
+ retried after a serialization/deadlock error. To limit only the maximum
+ time of tries rather than their number, use an unlimited number of tries
+ (<literal>--max-tries=0</literal>) with <option>--latency-limit</option> or <option>--time</option>.
+ </para>
+ <para>
+ Be careful when repeating scripts that contain multiple transactions: the
+ script is always retried completely, so the successful transactions can be
+ performed several times.
+ </para>
+ <para>
+ Be careful when repeating transactions with shell commands. Unlike the
+ results of SQL commands, the results of shell commands are not rolled back,
+ except for the variable value of the <command>\setshell</command> command.
+ </para>
+ </note>
+
+ <para>
+ The latency of a successful transaction includes the entire time of
+ transaction execution with rollbacks and retries. The latency for failed
+ transactions and commands is not computed separately.
+ </para>
+
+ <para>
+ The main report contains the number of failed transactions if it is non-zero.
+ If the total number of retried transactions is non-zero, the main report also
+ contains the statistics related to retries: the total number of retried
+ transactions and total number of retries. The per-script report inherits all
+ these fields from the main report. The per-statement report displays retry
+ statistics only if the <option>--max-tries</option> option is not equal to 1.
+ </para>
+
+ <para>
+ If you want to group failures by basic types in per-transaction and
+ aggregation logs, as well as in the main and per-script reports, use the
+ <option>--failures-detailed</option> option. If you also want to distinguish
+ all errors and failures (errors without retrying) by type including which
+ limit for retries was violated and how far it was exceeded for the
+ serialization/deadlock failures, use the <option>--debug-errors</option>
+ option.
+ </para>
+ </refsect2>
+
<refsect2>
<title>Good Practices</title>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 3629caba42..11482032ba 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -74,6 +74,8 @@
#define M_PI 3.14159265358979323846
#endif
+#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"
#define ERRCODE_UNDEFINED_TABLE "42P01"
/*
@@ -273,9 +275,34 @@ bool progress_timestamp = false; /* progress report with Unix time */
int nclients = 1; /* number of clients */
int nthreads = 1; /* number of threads */
bool is_connect; /* establish connection for each transaction */
-bool report_per_command; /* report per-command latencies */
+bool report_per_command = false; /* report per-command latencies, retries
+ * after errors and failures (errors
+ * without retrying) */
int main_pid; /* main process id used in log filename */
+/*
+ * There are different types of restrictions for deciding that the current
+ * transaction with a serialization/deadlock error can no longer be retried and
+ * should be reported as failed:
+ * - max_tries (--max-tries) can be used to limit the number of tries;
+ * - latency_limit (-L) can be used to limit the total time of tries;
+ * - duration (-T) can be used to limit the total benchmark time.
+ *
+ * They can be combined together, and you need to use at least one of them to
+ * retry the transactions with serialization/deadlock errors. If none of them is
+ * used, the default value of max_tries is 1 and such transactions will not be
+ * retried.
+ */
+
+/*
+ * We cannot retry a transaction after the serialization/deadlock error if its
+ * number of tries reaches this maximum; if its value is zero, it is not used.
+ */
+uint32 max_tries = 1;
+
+bool failures_detailed = false; /* whether to group failures in reports
+ * or logs by basic types */
+
const char *pghost = NULL;
const char *pgport = NULL;
const char *username = NULL;
@@ -360,9 +387,65 @@ typedef int64 pg_time_usec_t;
typedef struct StatsData
{
pg_time_usec_t start_time; /* interval start time, for aggregates */
- int64 cnt; /* number of transactions, including skipped */
+
+ /*
+ * Transactions are counted depending on their execution and outcome. First
+ * a transaction may have started or not: skipped transactions occur under
+ * --rate and --latency-limit when the client is too late to execute them.
+ * Secondly, a started transaction may ultimately succeed or fail, possibly
+ * after some retries when --max-tries is not one. Thus
+ *
+ * the number of all transactions =
+ * 'skipped' (it was too late to execute them) +
+ * 'cnt' (the number of successful transactions) +
+ * failed (the number of failed transactions).
+ *
+ * A successful transaction can have several unsuccessful tries before a
+ * successful run. Thus
+ *
+ * 'cnt' (the number of successful transactions) =
+ * successfully retried transactions (they got a serialization or a
+ * deadlock error(s), but were
+ * successfully retried from the very
+ * beginning) +
+ * directly successful transactions (they were successfully completed on
+ * the first try).
+ *
+ * A failed transaction can be one of two types:
+ *
+ * failed (the number of failed transactions) =
+ * 'serialization_failures' (they got a serialization error and were not
+ * successfully retried) +
+ * 'deadlock_failures' (they got a deadlock error and were not successfully
+ * retried).
+ *
+ * If the transaction was retried after a serialization or a deadlock error
+ * this does not guarantee that this retry was successful. Thus
+ *
+ * 'retries' (number of retries) =
+ * number of retries in all retried transactions =
+ * number of retries in (successfully retried transactions +
+ * failed transactions);
+ *
+ * 'retried' (number of all retried transactions) =
+ * successfully retried transactions +
+ * failed transactions.
+ */
+ int64 cnt; /* number of successful transactions, not
+ * including 'skipped' */
int64 skipped; /* number of transactions skipped under --rate
* and --latency-limit */
+ int64 retries; /* number of retries after a serialization or a
+ * deadlock error in all the transactions */
+ int64 retried; /* number of all transactions that were retried
+ * after a serialization or a deadlock error
+ * (perhaps the last try was unsuccessful) */
+ int64 serialization_failures; /* number of transactions that were not
+ * successfully retried after a
+ * serialization error */
+ int64 deadlock_failures; /* number of transactions that were not
+ * successfully retried after a deadlock
+ * error */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -375,6 +458,30 @@ typedef struct RandomState
unsigned short xseed[3];
} RandomState;
+/*
+ * Data structure for repeating a transaction from the beginning with the same
+ * parameters.
+ */
+typedef struct
+{
+ RandomState random_state; /* random seed */
+ Variables variables; /* client variables */
+} RetryState;
+
+/*
+ * Error status for errors during script execution.
+ */
+typedef enum EStatus
+{
+ ESTATUS_NO_ERROR = 0,
+ ESTATUS_META_COMMAND_ERROR,
+
+ /* SQL errors */
+ ESTATUS_SERIALIZATION_ERROR,
+ ESTATUS_DEADLOCK_ERROR,
+ ESTATUS_OTHER_SQL_ERROR
+} EStatus;
+
/* Various random sequences are initialized from this one. */
static RandomState base_random_sequence;
@@ -446,6 +553,35 @@ typedef enum
CSTATE_END_COMMAND,
CSTATE_SKIP_COMMAND,
+ /*
+ * States for failed commands.
+ *
+ * If the SQL/meta command fails, in CSTATE_ERROR clean up after an error:
+ * - clear the conditional stack;
+ * - if we have an unterminated (possibly failed) transaction block, send
+ * the rollback command to the server and wait for the result in
+ * CSTATE_WAIT_ROLLBACK_RESULT. If something goes wrong with rolling back,
+ * go to CSTATE_ABORTED.
+ *
+ * But if everything is ok we are ready for future transactions: if this is
+ * a serialization or deadlock error and we can re-execute the transaction
+ * from the very beginning, go to CSTATE_RETRY; otherwise go to
+ * CSTATE_FAILURE.
+ *
+ * In CSTATE_RETRY report an error, set the same parameters for the
+ * transaction execution as in the previous tries and process the first
+ * transaction command in CSTATE_START_COMMAND.
+ *
+ * In CSTATE_FAILURE report a failure, set the parameters for the
+ * transaction execution as they were before the first run of this
+ * transaction (except for a random state) and go to CSTATE_END_TX to
+ * complete this transaction.
+ */
+ CSTATE_ERROR,
+ CSTATE_WAIT_ROLLBACK_RESULT,
+ CSTATE_RETRY,
+ CSTATE_FAILURE,
+
/*
* CSTATE_END_TX performs end-of-transaction processing. It calculates
* latency, and logs the transaction. In --connect mode, it closes the
@@ -494,8 +630,20 @@ typedef struct
bool prepared[MAX_SCRIPTS]; /* whether client prepared the script */
+ /*
+ * For processing failures and repeating transactions with serialization or
+ * deadlock errors:
+ */
+ EStatus estatus; /* the error status of the current transaction
+ * execution; this is ESTATUS_NO_ERROR if there were
+ * no errors */
+ RetryState retry_state;
+ uint32 tries; /* how many times have we already tried the
+ * current transaction? */
+
/* per client collected stats */
- int64 cnt; /* client transaction count, for -t */
+ int64 cnt; /* client transaction count, for -t; skipped and
+ * failed transactions are also counted here */
} CState;
/*
@@ -590,6 +738,9 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
* aset do gset on all possible queries of a combined query (\;).
* expr Parsed expression, if needed.
* stats Time spent in this command.
+ * retries Number of retries after a serialization or deadlock error in the
+ * current command.
+ * failures Number of errors in the current command that were not retried.
*/
typedef struct Command
{
@@ -602,6 +753,8 @@ typedef struct Command
char *varprefix;
PgBenchExpr *expr;
SimpleStats stats;
+ int64 retries;
+ int64 failures;
} Command;
typedef struct ParsedScript
@@ -616,6 +769,8 @@ static ParsedScript sql_script[MAX_SCRIPTS]; /* SQL script files */
static int num_scripts; /* number of scripts in sql_script[] */
static int64 total_weight = 0;
+static bool debug_errors = false; /* print debug messages of all errors */
+
/* Builtin test scripts */
typedef struct BuiltinScript
{
@@ -753,15 +908,18 @@ usage(void)
" protocol for submitting queries (default: simple)\n"
" -n, --no-vacuum do not run VACUUM before tests\n"
" -P, --progress=NUM show thread progress report every NUM seconds\n"
- " -r, --report-latencies report average latency per command\n"
+ " -r, --report-latencies report latencies, failures and retries per command\n"
" -R, --rate=NUM target rate in transactions per second\n"
" -s, --scale=NUM report this scale factor in output\n"
" -t, --transactions=NUM number of transactions each client runs (default: 10)\n"
" -T, --time=NUM duration of benchmark test in seconds\n"
" -v, --vacuum-all vacuum all four standard tables before tests\n"
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
+ " --failures-detailed report the failures grouped by basic types\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
+ " --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --debug-errors print messages of all errors\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1307,6 +1465,10 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->start_time = start;
sd->cnt = 0;
sd->skipped = 0;
+ sd->retries = 0;
+ sd->retried = 0;
+ sd->serialization_failures = 0;
+ sd->deadlock_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1315,22 +1477,51 @@ initStats(StatsData *sd, pg_time_usec_t start)
* Accumulate one additional item into the given stats object.
*/
static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumStats(StatsData *stats, bool skipped, double lat, double lag,
+ EStatus estatus, int64 tries)
{
- stats->cnt++;
-
+ /* Record the skipped transaction */
if (skipped)
{
/* no latency to record on skipped transactions */
stats->skipped++;
+ return;
}
- else
+
+ /*
+ * Record the number of retries regardless of whether the transaction was
+ * successful or failed.
+ */
+ if (tries > 1)
+ {
+ stats->retries += (tries - 1);
+ stats->retried++;
+ }
+
+ switch (estatus)
{
- addToSimpleStats(&stats->latency, lat);
+ /* Record the successful transaction */
+ case ESTATUS_NO_ERROR:
+ stats->cnt++;
- /* and possibly the same for schedule lag */
- if (throttle_delay)
- addToSimpleStats(&stats->lag, lag);
+ addToSimpleStats(&stats->latency, lat);
+
+ /* and possibly the same for schedule lag */
+ if (throttle_delay)
+ addToSimpleStats(&stats->lag, lag);
+ break;
+
+ /* Record the failed transaction */
+ case ESTATUS_SERIALIZATION_ERROR:
+ stats->serialization_failures++;
+ break;
+ case ESTATUS_DEADLOCK_ERROR:
+ stats->deadlock_failures++;
+ break;
+ default:
+ /* internal error which should never occur */
+ pg_log_fatal("unexpected error status: %d", estatus);
+ exit(1);
}
}
@@ -2861,6 +3052,9 @@ preparedStatementName(char *buffer, int file, int state)
sprintf(buffer, "P%d_%d", file, state);
}
+/*
+ * Report that the client aborted while processing a SQL command.
+ */
static void
commandFailed(CState *st, const char *cmd, const char *message)
{
@@ -2868,6 +3062,19 @@ commandFailed(CState *st, const char *cmd, const char *message)
st->id, st->command, cmd, st->use_file, message);
}
+/*
+ * Report the error in the command while the script is executing.
+ */
+static void
+commandError(CState *st, const char *message)
+{
+ const Command *command = sql_script[st->use_file].commands[st->command];
+
+ Assert(command->type == SQL_COMMAND);
+ pg_log_info("client %d got an error in command %d (SQL) of script %d; %s",
+ st->id, st->command, st->use_file, message);
+}
+
/* return a script number with a weighted choice. */
static int
chooseScript(TState *thread)
@@ -2975,6 +3182,33 @@ sendCommand(CState *st, Command *command)
return true;
}
+/*
+ * Get the error status from the error code.
+ */
+static EStatus
+getSQLErrorStatus(const char *sqlState)
+{
+ if (sqlState != NULL)
+ {
+ if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
+ return ESTATUS_SERIALIZATION_ERROR;
+ else if (strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
+ return ESTATUS_DEADLOCK_ERROR;
+ }
+
+ return ESTATUS_OTHER_SQL_ERROR;
+}
+
+/*
+ * Returns true if this type of error can be retried.
+ */
+static bool
+canRetryError(EStatus estatus)
+{
+ return (estatus == ESTATUS_SERIALIZATION_ERROR ||
+ estatus == ESTATUS_DEADLOCK_ERROR);
+}
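The two SQLSTATEs involved here are the standard codes 40001 (serialization_failure) and 40P01 (deadlock_detected). The classification can be sketched standalone as follows; the enum and function names are illustrative stand-ins for the patch's EStatus and helpers, with the SQLSTATE passed as a plain string instead of coming from libpq:

```c
#include <string.h>

/* Illustrative stand-in for the patch's EStatus */
typedef enum
{
    SKETCH_NO_ERROR = 0,
    SKETCH_SERIALIZATION_ERROR,
    SKETCH_DEADLOCK_ERROR,
    SKETCH_OTHER_SQL_ERROR
} SketchStatus;

/* Map a SQLSTATE string to an error status, as getSQLErrorStatus() does. */
static SketchStatus
sketch_error_status(const char *sqlState)
{
    if (sqlState != NULL)
    {
        if (strcmp(sqlState, "40001") == 0)     /* serialization_failure */
            return SKETCH_SERIALIZATION_ERROR;
        if (strcmp(sqlState, "40P01") == 0)     /* deadlock_detected */
            return SKETCH_DEADLOCK_ERROR;
    }
    return SKETCH_OTHER_SQL_ERROR;
}

/* Only serialization and deadlock errors are retriable. */
static int
sketch_can_retry(SketchStatus st)
{
    return st == SKETCH_SERIALIZATION_ERROR || st == SKETCH_DEADLOCK_ERROR;
}
```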
+
/*
* Process query response from the backend.
*
@@ -3017,6 +3251,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
{
pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
st->id, st->use_file, st->command, qrynum, 0);
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
break;
@@ -3031,6 +3266,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
/* under \gset, report the error */
pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
st->id, st->use_file, st->command, qrynum, PQntuples(res));
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
else if (meta == META_ASET && ntuples <= 0)
@@ -3055,6 +3291,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
/* internal error */
pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
st->id, st->use_file, st->command, qrynum, varname);
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3072,6 +3309,20 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
PQerrorMessage(st->con));
break;
+ case PGRES_NONFATAL_ERROR:
+ case PGRES_FATAL_ERROR:
+ st->estatus = getSQLErrorStatus(
+ PQresultErrorField(res, PG_DIAG_SQLSTATE));
+ if (canRetryError(st->estatus))
+ {
+ if (debug_errors)
+ commandError(st, PQerrorMessage(st->con));
+ if (PQpipelineStatus(st->con) == PQ_PIPELINE_ABORTED)
+ PQpipelineSync(st->con);
+ goto error;
+ }
+ /* fall through */
+
default:
/* anything else is unexpected */
pg_log_error("client %d script %d aborted in command %d query %d: %s",
@@ -3150,6 +3401,171 @@ evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
return true;
}
+/*
+ * Clear the variables in the array. The array itself is not freed.
+ */
+static void
+clearVariables(Variables *variables)
+{
+ Variable *vars,
+ *var;
+ int nvars;
+
+ if (!variables)
+ return; /* nothing to do here */
+
+ vars = variables->vars;
+ nvars = variables->nvars;
+ for (var = vars; var - vars < nvars; ++var)
+ {
+ pg_free(var->name);
+ pg_free(var->svalue);
+ }
+
+ variables->nvars = 0;
+}
+
+/*
+ * Make a deep copy of the variables array. Before copying, free the string
+ * fields of the destination variables and, if necessary, enlarge their
+ * array.
+ */
+static void
+copyVariables(Variables *dest, const Variables *source)
+{
+ Variable *dest_var;
+ const Variable *source_var;
+
+ if (!dest || !source || dest == source)
+ return; /* nothing to do here */
+
+ /*
+ * Clear the original variables and make sure that we have enough space for
+ * the new variables.
+ */
+ clearVariables(dest);
+ enlargeVariables(dest, source->nvars);
+
+ /* Make a deep copy of variables array */
+ for (source_var = source->vars, dest_var = dest->vars;
+ source_var - source->vars < source->nvars;
+ ++source_var, ++dest_var)
+ {
+ dest_var->name = pg_strdup(source_var->name);
+ dest_var->svalue = (source_var->svalue == NULL) ?
+ NULL : pg_strdup(source_var->svalue);
+ dest_var->value = source_var->value;
+ }
+ dest->nvars = source->nvars;
+ dest->vars_sorted = source->vars_sorted;
+}
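The clear-then-deep-copy pattern used by copyVariables() reduces to the following standalone sketch. VarSketch and sketch_copy are hypothetical stand-ins; the real code also grows the destination array and copies numeric values:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the patch's Variable */
typedef struct
{
    char *name;
    char *svalue;
} VarSketch;

/* Portable strdup substitute so the sketch stays self-contained. */
static char *
sketch_strdup(const char *s)
{
    char *copy = malloc(strlen(s) + 1);

    strcpy(copy, s);
    return copy;
}

/* Free the old string fields (as clearVariables does), then deep-copy. */
static void
sketch_copy(VarSketch *dest, const VarSketch *source)
{
    free(dest->name);
    free(dest->svalue);
    dest->name = sketch_strdup(source->name);
    dest->svalue = source->svalue ? sketch_strdup(source->svalue) : NULL;
}
```

The deep copy matters because the retry state must outlive the script run that may overwrite the client's variables.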
+
+/*
+ * Returns true if the transaction can be retried after the current error.
+ */
+static bool
+doRetry(CState *st, pg_time_usec_t *now)
+{
+ Assert(st->estatus != ESTATUS_NO_ERROR);
+
+ /* We can only retry serialization or deadlock errors. */
+ if (!canRetryError(st->estatus))
+ return false;
+
+ /*
+ * We must have at least one option to limit the retrying of transactions
+ * that got an error.
+ */
+ Assert(max_tries || latency_limit || duration > 0);
+
+ /*
+ * We cannot retry the error if we have reached the maximum number of tries.
+ */
+ if (max_tries && st->tries >= max_tries)
+ return false;
+
+ /*
+ * We cannot retry the error if the benchmark duration is over.
+ */
+ if (timer_exceeded)
+ return false;
+
+ /*
+ * We cannot retry the error if we spent too much time on this transaction.
+ */
+ if (latency_limit)
+ {
+ pg_time_now_lazy(now);
+ if (*now - st->txn_scheduled > latency_limit)
+ return false;
+ }
+
+ /* OK */
+ return true;
+}
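The retry-permission checks above amount to a small predicate. A standalone sketch, with the patch's globals (max_tries, timer_exceeded, latency_limit) passed as parameters for illustration; times are in microseconds:

```c
/* Standalone sketch of doRetry()'s checks; all names are illustrative. */
static int
sketch_do_retry(int can_retry_error, unsigned int tries,
                unsigned int max_tries, int timer_exceeded,
                long now_us, long txn_scheduled_us, long latency_limit_us)
{
    if (!can_retry_error)
        return 0;               /* only serialization/deadlock errors */
    if (max_tries && tries >= max_tries)
        return 0;               /* out of tries */
    if (timer_exceeded)
        return 0;               /* the benchmark duration is over */
    if (latency_limit_us && now_us - txn_scheduled_us > latency_limit_us)
        return 0;               /* spent too long on this transaction */
    return 1;
}
```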
+
+/*
+ * Set in_tx_block to true if we are in a (failed) transaction block and false
+ * otherwise.
+ * Returns false on failure (broken connection or internal error).
+ */
+static bool
+checkTransactionStatus(PGconn *con, bool *in_tx_block)
+{
+ PGTransactionStatusType tx_status;
+
+ tx_status = PQtransactionStatus(con);
+ switch (tx_status)
+ {
+ case PQTRANS_IDLE:
+ *in_tx_block = false;
+ break;
+ case PQTRANS_INTRANS:
+ case PQTRANS_INERROR:
+ *in_tx_block = true;
+ break;
+ case PQTRANS_UNKNOWN:
+ /* PQTRANS_UNKNOWN is expected given a broken connection */
+ if (PQstatus(con) == CONNECTION_BAD)
+ { /* there's something wrong */
+ pg_log_error("perhaps the backend died while processing");
+ return false;
+ }
+ /* fall through */
+ case PQTRANS_ACTIVE:
+ default:
+ /*
+ * We cannot find out whether we are in a transaction block or not.
+ * Internal error which should never occur.
+ */
+ pg_log_error("unexpected transaction status %d", tx_status);
+ return false;
+ }
+
+ /* OK */
+ return true;
+}
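The status mapping can be summarized in a standalone sketch that uses a local enum in place of libpq's PGTransactionStatusType (so it needs no connection); the names here are illustrative:

```c
/* Local stand-in for libpq's PGTransactionStatusType values */
typedef enum
{
    TX_IDLE,                    /* like PQTRANS_IDLE */
    TX_ACTIVE,                  /* like PQTRANS_ACTIVE */
    TX_INTRANS,                 /* like PQTRANS_INTRANS */
    TX_INERROR,                 /* like PQTRANS_INERROR */
    TX_UNKNOWN                  /* like PQTRANS_UNKNOWN */
} TxSketch;

/* Returns 0 on success and sets *in_tx_block; -1 on an unexpected status. */
static int
sketch_check_tx(TxSketch status, int *in_tx_block)
{
    switch (status)
    {
        case TX_IDLE:
            *in_tx_block = 0;
            return 0;
        case TX_INTRANS:
        case TX_INERROR:
            *in_tx_block = 1;
            return 0;
        default:
            /* active or unknown: we cannot tell, caller must abort */
            return -1;
    }
}
```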
+
+/*
+ * If the latency limit is used, return the current transaction latency as
+ * a percentage of the latency limit. Otherwise return zero.
+ */
+static double
+getLatencyUsed(CState *st, pg_time_usec_t *now)
+{
+ if (!latency_limit)
+ return 0.0;
+
+ pg_time_now_lazy(now);
+ return (100.0 * (*now - st->txn_scheduled) / latency_limit);
+}
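The percentage computation is straightforward; a standalone sketch with times in microseconds (parameter names are illustrative):

```c
/* Sketch of getLatencyUsed(): latency so far as a percentage of the limit. */
static double
sketch_latency_used(long now_us, long txn_scheduled_us, long latency_limit_us)
{
    if (!latency_limit_us)
        return 0.0;             /* no limit configured */
    return 100.0 * (now_us - txn_scheduled_us) / latency_limit_us;
}
```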
+
/*
* Advance the state machine of a connection.
*/
@@ -3179,6 +3595,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
for (;;)
{
Command *command;
+ PGresult *res;
+ bool in_tx_block;
switch (st->state)
{
@@ -3187,6 +3605,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
st->use_file = chooseScript(thread);
Assert(conditional_stack_empty(st->cstack));
+ /* reset transaction variables to default values */
+ st->estatus = ESTATUS_NO_ERROR;
+ st->tries = 1;
+
pg_log_debug("client %d executing script \"%s\"",
st->id, sql_script[st->use_file].desc);
@@ -3223,6 +3645,14 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
memset(st->prepared, 0, sizeof(st->prepared));
}
+ /*
+ * This is the first try of this transaction. Remember its
+ * parameters: it may get an error and need to be run again.
+ */
+ st->retry_state.random_state = st->cs_func_rs;
+ copyVariables(&st->retry_state.variables, &st->variables);
+
/* record transaction start time */
st->txn_begin = now;
@@ -3374,6 +3804,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
* - else CSTATE_END_COMMAND
*/
st->state = executeMetaCommand(st, &now);
+ if (st->state == CSTATE_ABORTED)
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
}
/*
@@ -3512,10 +3944,55 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
+ else if (canRetryError(st->estatus))
+ st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
break;
+ /*
+ * Wait for the rollback command to complete
+ */
+ case CSTATE_WAIT_ROLLBACK_RESULT:
+ pg_log_debug("client %d receiving", st->id);
+ if (!PQconsumeInput(st->con))
+ {
+ pg_log_error("client %d aborted while rolling back the transaction after an error; perhaps the backend died while processing",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ if (PQisBusy(st->con))
+ return; /* don't have the whole result yet */
+
+ /*
+ * Read and discard the query result.
+ */
+ res = PQgetResult(st->con);
+ switch (PQresultStatus(res))
+ {
+ case PGRES_COMMAND_OK:
+ /* OK */
+ PQclear(res);
+ do
+ {
+ res = PQgetResult(st->con);
+ if (res)
+ PQclear(res);
+ } while (res);
+ /* Check if we can retry the error. */
+ st->state =
+ doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
+ break;
+ default:
+ pg_log_error("client %d aborted while rolling back the transaction after an error; %s",
+ st->id, PQerrorMessage(st->con));
+ PQclear(res);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ break;
+
/*
* Wait until sleep is done. This state is entered after a
* \sleep metacommand. The behavior is similar to
@@ -3558,6 +4035,144 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
CSTATE_START_COMMAND : CSTATE_SKIP_COMMAND;
break;
+ /*
+ * Clean up after an error.
+ */
+ case CSTATE_ERROR:
+
+ Assert(st->estatus != ESTATUS_NO_ERROR);
+
+ /* Clear the conditional stack */
+ conditional_stack_reset(st->cstack);
+
+ /*
+ * Check if we have a (failed) transaction block or not, and
+ * roll it back if any.
+ */
+
+ if (!checkTransactionStatus(st->con, &in_tx_block))
+ {
+ /*
+ * There's something wrong... checkTransactionStatus() should
+ * have already printed a more detailed error message.
+ */
+ pg_log_error("client %d aborted while receiving the transaction status", st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+
+ if (in_tx_block)
+ {
+ /* Try to rollback a (failed) transaction block. */
+ if (!PQsendQuery(st->con, "ROLLBACK"))
+ {
+ pg_log_error("client %d aborted: failed to send sql command for rolling back the failed transaction",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ }
+ else
+ st->state = CSTATE_WAIT_ROLLBACK_RESULT;
+ }
+ else
+ {
+ /* Check if we can retry the error. */
+ st->state = doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
+ }
+ break;
+
+ /*
+ * Retry the transaction after an error.
+ */
+ case CSTATE_RETRY:
+ command = sql_script[st->use_file].commands[st->command];
+
+ /*
+ * Inform that the transaction will be retried after the error.
+ */
+ if (debug_errors)
+ {
+ PQExpBufferData buf;
+
+ initPQExpBuffer(&buf);
+
+ printfPQExpBuffer(&buf, "client %d repeats the transaction after the error (try %d",
+ st->id, st->tries);
+ if (max_tries)
+ appendPQExpBuffer(&buf, "/%d", max_tries);
+ if (latency_limit)
+ appendPQExpBuffer(&buf, ", %.3f%% of the maximum time of tries was used",
+ getLatencyUsed(st, &now));
+ appendPQExpBuffer(&buf, ")\n");
+
+ pg_log_info("%s", buf.data);
+
+ termPQExpBuffer(&buf);
+ }
+
+ /* Count tries and retries */
+ st->tries++;
+ if (report_per_command)
+ command->retries++;
+
+ /*
+ * Reset the execution parameters as they were at the beginning
+ * of the transaction.
+ */
+ st->cs_func_rs = st->retry_state.random_state;
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* Process the first transaction command. */
+ st->command = 0;
+ st->estatus = ESTATUS_NO_ERROR;
+ st->state = CSTATE_START_COMMAND;
+ break;
+
+ /*
+ * Complete the failed transaction.
+ */
+ case CSTATE_FAILURE:
+ command = sql_script[st->use_file].commands[st->command];
+
+ /* Accumulate the failure. */
+ if (report_per_command)
+ command->failures++;
+
+ /*
+ * Inform that the failed transaction will not be retried.
+ */
+ if (debug_errors)
+ {
+ PQExpBufferData buf;
+
+ initPQExpBuffer(&buf);
+
+ printfPQExpBuffer(&buf, "client %d ends the failed transaction (try %d",
+ st->id, st->tries);
+ if (max_tries)
+ appendPQExpBuffer(&buf, "/%d", max_tries);
+ if (latency_limit)
+ appendPQExpBuffer(&buf, ", %.3f%% of the maximum time of tries was used",
+ getLatencyUsed(st, &now));
+ else if (timer_exceeded)
+ appendPQExpBuffer(&buf, ", the duration is exceeded");
+ appendPQExpBuffer(&buf, ")\n");
+
+ pg_log_info("%s", buf.data);
+
+ termPQExpBuffer(&buf);
+ }
+
+ /*
+ * Reset the execution parameters as they were at the beginning
+ * of the transaction except for a random state.
+ */
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* End the failed transaction. */
+ st->state = CSTATE_END_TX;
+ break;
+
/*
* End of transaction (end of script, really).
*/
@@ -3572,6 +4187,29 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
*/
Assert(conditional_stack_empty(st->cstack));
+ /*
+ * We must complete all the transaction blocks that were
+ * started in this script.
+ */
+ if (!checkTransactionStatus(st->con, &in_tx_block))
+ {
+ /*
+ * There's something wrong... checkTransactionStatus() should
+ * have already printed a more detailed error message.
+ */
+ pg_log_error("client %d aborted while receiving the transaction status", st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ if (in_tx_block)
+ {
+ pg_log_error("client %d aborted: end of script reached without completing the last transaction",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+
if (is_connect)
{
finishCon(st);
@@ -3803,6 +4441,43 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
return CSTATE_END_COMMAND;
}
+/*
+ * Return the number of failed transactions.
+ */
+static int64
+getFailures(const StatsData *stats)
+{
+ return (stats->serialization_failures +
+ stats->deadlock_failures);
+}
+
+/*
+ * Return a string constant representing the result of a transaction
+ * that is not successfully processed.
+ */
+static const char *
+getResultString(bool skipped, EStatus estatus)
+{
+ if (skipped)
+ return "skipped";
+ else if (failures_detailed)
+ {
+ switch (estatus)
+ {
+ case ESTATUS_SERIALIZATION_ERROR:
+ return "serialization_failure";
+ case ESTATUS_DEADLOCK_ERROR:
+ return "deadlock_failure";
+ default:
+ /* internal error which should never occur */
+ pg_log_fatal("unexpected error status: %d", estatus);
+ exit(1);
+ }
+ }
+ else
+ return "failed";
+}
+
/*
* Print log entry after completing one transaction.
*
@@ -3847,6 +4522,14 @@ doLog(TState *thread, CState *st,
agg->latency.sum2,
agg->latency.min,
agg->latency.max);
+
+ if (failures_detailed)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->serialization_failures,
+ agg->deadlock_failures);
+ else
+ fprintf(logfile, " " INT64_FORMAT, getFailures(agg));
+
if (throttle_delay)
{
fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -3857,6 +4540,10 @@ doLog(TState *thread, CState *st,
if (latency_limit)
fprintf(logfile, " " INT64_FORMAT, agg->skipped);
}
+ if (max_tries != 1)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->retried,
+ agg->retries);
fputc('\n', logfile);
/* reset data and move to next interval */
@@ -3864,22 +4551,26 @@ doLog(TState *thread, CState *st,
}
/* accumulate the current transaction */
- accumStats(agg, skipped, latency, lag);
+ accumStats(agg, skipped, latency, lag, st->estatus, st->tries);
}
else
{
/* no, print raw transactions */
- if (skipped)
- fprintf(logfile, "%d " INT64_FORMAT " skipped %d " INT64_FORMAT " "
- INT64_FORMAT,
- st->id, st->cnt, st->use_file, now / 1000000, now % 1000000);
- else
+ if (!skipped && st->estatus == ESTATUS_NO_ERROR)
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d " INT64_FORMAT " "
INT64_FORMAT,
st->id, st->cnt, latency, st->use_file,
now / 1000000, now % 1000000);
+ else
+ fprintf(logfile, "%d " INT64_FORMAT " %s %d " INT64_FORMAT " "
+ INT64_FORMAT,
+ st->id, st->cnt, getResultString(skipped, st->estatus),
+ st->use_file, now / 1000000, now % 1000000);
+
if (throttle_delay)
fprintf(logfile, " %.0f", lag);
+ if (max_tries != 1)
+ fprintf(logfile, " %d", st->tries - 1);
fputc('\n', logfile);
}
}
@@ -3888,7 +4579,8 @@ doLog(TState *thread, CState *st,
* Accumulate and report statistics at end of a transaction.
*
* (This is also called when a transaction is late and thus skipped.
- * Note that even skipped transactions are counted in the "cnt" fields.)
+ * Note that even skipped and failed transactions are counted in the CState
+ * "cnt" field.)
*/
static void
processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
@@ -3896,10 +4588,10 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
{
double latency = 0.0,
lag = 0.0;
- bool thread_details = progress || throttle_delay || latency_limit,
- detailed = thread_details || use_log || per_script_stats;
+ bool detailed = progress || throttle_delay || latency_limit ||
+ use_log || per_script_stats;
- if (detailed && !skipped)
+ if (detailed && !skipped && st->estatus == ESTATUS_NO_ERROR)
{
pg_time_now_lazy(now);
@@ -3908,20 +4600,12 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
lag = st->txn_begin - st->txn_scheduled;
}
- if (thread_details)
- {
- /* keep detailed thread stats */
- accumStats(&thread->stats, skipped, latency, lag);
+ /* keep detailed thread stats */
+ accumStats(&thread->stats, skipped, latency, lag, st->estatus, st->tries);
- /* count transactions over the latency limit, if needed */
- if (latency_limit && latency > latency_limit)
- thread->latency_late++;
- }
- else
- {
- /* no detailed stats, just count */
- thread->stats.cnt++;
- }
+ /* count transactions over the latency limit, if needed */
+ if (latency_limit && latency > latency_limit)
+ thread->latency_late++;
/* client stat is just counting */
st->cnt++;
@@ -3931,7 +4615,8 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
/* XXX could use a mutex here, but we choose not to */
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+ st->estatus, st->tries);
}
@@ -4778,6 +5463,8 @@ create_sql_command(PQExpBuffer buf, const char *source)
my_command->type = SQL_COMMAND;
my_command->meta = META_NONE;
my_command->argc = 0;
+ my_command->retries = 0;
+ my_command->failures = 0;
memset(my_command->argv, 0, sizeof(my_command->argv));
my_command->varprefix = NULL; /* allocated later, if needed */
my_command->expr = NULL;
@@ -5446,7 +6133,9 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
{
/* generate and show report */
pg_time_usec_t run = now - *last_report;
- int64 ntx;
+ int64 cnt,
+ failures,
+ retried;
double tps,
total_run,
latency,
@@ -5473,23 +6162,30 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
mergeSimpleStats(&cur.lag, &threads[i].stats.lag);
cur.cnt += threads[i].stats.cnt;
cur.skipped += threads[i].stats.skipped;
+ cur.retries += threads[i].stats.retries;
+ cur.retried += threads[i].stats.retried;
+ cur.serialization_failures +=
+ threads[i].stats.serialization_failures;
+ cur.deadlock_failures += threads[i].stats.deadlock_failures;
}
/* we count only actually executed transactions */
- ntx = (cur.cnt - cur.skipped) - (last->cnt - last->skipped);
+ cnt = cur.cnt - last->cnt;
total_run = (now - test_start) / 1000000.0;
- tps = 1000000.0 * ntx / run;
- if (ntx > 0)
+ tps = 1000000.0 * cnt / run;
+ if (cnt > 0)
{
- latency = 0.001 * (cur.latency.sum - last->latency.sum) / ntx;
- sqlat = 1.0 * (cur.latency.sum2 - last->latency.sum2) / ntx;
+ latency = 0.001 * (cur.latency.sum - last->latency.sum) / cnt;
+ sqlat = 1.0 * (cur.latency.sum2 - last->latency.sum2) / cnt;
stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
- lag = 0.001 * (cur.lag.sum - last->lag.sum) / ntx;
+ lag = 0.001 * (cur.lag.sum - last->lag.sum) / cnt;
}
else
{
latency = sqlat = stdev = lag = 0;
}
+ failures = getFailures(&cur) - getFailures(last);
+ retried = cur.retried - last->retried;
if (progress_timestamp)
{
@@ -5502,8 +6198,8 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
}
fprintf(stderr,
- "progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
- tbuf, tps, latency, stdev);
+ "progress: %s, %.1f tps, lat %.3f ms stddev %.3f, " INT64_FORMAT " failed",
+ tbuf, tps, latency, stdev, failures);
if (throttle_delay)
{
@@ -5512,6 +6208,12 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
fprintf(stderr, ", " INT64_FORMAT " skipped",
cur.skipped - last->skipped);
}
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (max_tries != 1)
+ fprintf(stderr,
+ ", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
+ retried, cur.retries - last->retries);
fprintf(stderr, "\n");
*last = cur;
@@ -5571,9 +6273,10 @@ printResults(StatsData *total,
int64 latency_late)
{
/* tps is about actually executed transactions during benchmarking */
- int64 ntx = total->cnt - total->skipped;
+ int64 failures = getFailures(total);
+ int64 total_cnt = total->cnt + total->skipped + failures;
double bench_duration = PG_TIME_GET_DOUBLE(total_duration);
- double tps = ntx / bench_duration;
+ double tps = total->cnt / bench_duration;
/* Report test parameters. */
printf("transaction type: %s\n",
@@ -5590,35 +6293,65 @@ printResults(StatsData *total,
{
printf("number of transactions per client: %d\n", nxacts);
printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
- ntx, nxacts * nclients);
+ total->cnt, nxacts * nclients);
}
else
{
printf("duration: %d s\n", duration);
printf("number of transactions actually processed: " INT64_FORMAT "\n",
- ntx);
+ total->cnt);
}
+ if (failures > 0)
+ {
+ printf("number of failed transactions: " INT64_FORMAT " (%.3f%%)\n",
+ failures, 100.0 * failures / total_cnt);
+
+ if (failures_detailed)
+ {
+ if (total->serialization_failures)
+ printf("number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->serialization_failures,
+ 100.0 * total->serialization_failures / total_cnt);
+ if (total->deadlock_failures)
+ printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->deadlock_failures,
+ 100.0 * total->deadlock_failures / total_cnt);
+ }
+ }
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (total->retried > 0)
+ {
+ printf("number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
+ total->retried, 100.0 * total->retried / total_cnt);
+ printf("total number of retries: " INT64_FORMAT "\n", total->retries);
+ }
+
+ if (max_tries)
+ printf("maximum number of tries: %d\n", max_tries);
+
/* Remaining stats are nonsensical if we failed to execute any xacts */
- if (total->cnt <= 0)
+ if (total->cnt + total->skipped <= 0)
return;
if (throttle_delay && latency_limit)
printf("number of transactions skipped: " INT64_FORMAT " (%.3f %%)\n",
- total->skipped, 100.0 * total->skipped / total->cnt);
+ total->skipped, 100.0 * total->skipped / total_cnt);
if (latency_limit)
printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "/" INT64_FORMAT " (%.3f %%)\n",
- latency_limit / 1000.0, latency_late, ntx,
- (ntx > 0) ? 100.0 * latency_late / ntx : 0.0);
+ latency_limit / 1000.0, latency_late, total->cnt,
+ (total->cnt > 0) ? 100.0 * latency_late / total->cnt : 0.0);
if (throttle_delay || progress || latency_limit)
printSimpleStats("latency", &total->latency);
else
{
/* no measurement, show average latency computed from run time */
- printf("latency average = %.3f ms\n",
- 0.001 * total_duration * nclients / total->cnt);
+ printf("latency average = %.3f ms%s\n",
+ 0.001 * total_duration * nclients / total_cnt,
+ failures > 0 ? " (including failures)" : "");
}
if (throttle_delay)
@@ -5644,7 +6377,7 @@ printResults(StatsData *total,
*/
if (is_connect)
{
- printf("average connection time = %.3f ms\n", 0.001 * conn_total_duration / total->cnt);
+ printf("average connection time = %.3f ms\n", 0.001 * conn_total_duration / (total->cnt + failures));
printf("tps = %f (including reconnection times)\n", tps);
}
else
@@ -5663,6 +6396,9 @@ printResults(StatsData *total,
if (per_script_stats)
{
StatsData *sstats = &sql_script[i].stats;
+ int64 script_failures = getFailures(sstats);
+ int64 script_total_cnt =
+ sstats->cnt + sstats->skipped + script_failures;
printf("SQL script %d: %s\n"
" - weight: %d (targets %.1f%% of total)\n"
@@ -5672,25 +6408,60 @@ printResults(StatsData *total,
100.0 * sql_script[i].weight / total_weight,
sstats->cnt,
100.0 * sstats->cnt / total->cnt,
- (sstats->cnt - sstats->skipped) / bench_duration);
+ sstats->cnt / bench_duration);
- if (throttle_delay && latency_limit && sstats->cnt > 0)
+ if (failures > 0)
+ {
+ printf(" - number of failed transactions: " INT64_FORMAT " (%.3f%%)\n",
+ script_failures,
+ 100.0 * script_failures / script_total_cnt);
+
+ if (failures_detailed)
+ {
+ if (total->serialization_failures)
+ printf(" - number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->serialization_failures,
+ (100.0 * sstats->serialization_failures /
+ script_total_cnt));
+ if (total->deadlock_failures)
+ printf(" - number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->deadlock_failures,
+ (100.0 * sstats->deadlock_failures /
+ script_total_cnt));
+ }
+ }
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (total->retried > 0)
+ {
+ printf(" - number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->retried,
+ 100.0 * sstats->retried / script_total_cnt);
+ printf(" - total number of retries: " INT64_FORMAT "\n",
+ sstats->retries);
+ }
+
+ if (throttle_delay && latency_limit && script_total_cnt > 0)
printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
sstats->skipped,
- 100.0 * sstats->skipped / sstats->cnt);
+ 100.0 * sstats->skipped / script_total_cnt);
printSimpleStats(" - latency", &sstats->latency);
}
- /* Report per-command latencies */
+ /*
+ * Report per-command statistics: latencies, retries after errors,
+ * failures (errors without retrying).
+ */
if (report_per_command)
{
Command **commands;
- if (per_script_stats)
- printf(" - statement latencies in milliseconds:\n");
- else
- printf("statement latencies in milliseconds:\n");
+ printf("%sstatement latencies in milliseconds%s:\n",
+ per_script_stats ? " - " : "",
+ (max_tries == 1 ?
+ " and failures" :
+ ", failures and retries"));
for (commands = sql_script[i].commands;
*commands != NULL;
@@ -5698,10 +6469,19 @@ printResults(StatsData *total,
{
SimpleStats *cstats = &(*commands)->stats;
- printf(" %11.3f %s\n",
- (cstats->count > 0) ?
- 1000.0 * cstats->sum / cstats->count : 0.0,
- (*commands)->first_line);
+ if (max_tries == 1)
+ printf(" %11.3f %10" INT64_MODIFIER "d %s\n",
+ (cstats->count > 0) ?
+ 1000.0 * cstats->sum / cstats->count : 0.0,
+ (*commands)->failures,
+ (*commands)->first_line);
+ else
+ printf(" %11.3f %10" INT64_MODIFIER "d %10" INT64_MODIFIER "d %s\n",
+ (cstats->count > 0) ?
+ 1000.0 * cstats->sum / cstats->count : 0.0,
+ (*commands)->failures,
+ (*commands)->retries,
+ (*commands)->first_line);
}
}
}
@@ -5804,6 +6584,9 @@ main(int argc, char **argv)
{"show-script", required_argument, NULL, 10},
{"partitions", required_argument, NULL, 11},
{"partition-method", required_argument, NULL, 12},
+ {"failures-detailed", no_argument, NULL, 13},
+ {"max-tries", required_argument, NULL, 14},
+ {"debug-errors", no_argument, NULL, 15},
{NULL, 0, NULL, 0}
};
@@ -6172,6 +6955,28 @@ main(int argc, char **argv)
exit(1);
}
break;
+ case 13: /* failures-detailed */
+ benchmarking_option_set = true;
+ failures_detailed = true;
+ break;
+ case 14: /* max-tries */
+ {
+ int32 max_tries_arg = atoi(optarg);
+
+ if (max_tries_arg < 0)
+ {
+ pg_log_fatal("invalid number of maximum tries: \"%s\"", optarg);
+ exit(1);
+ }
+
+ benchmarking_option_set = true;
+ max_tries = (uint32) max_tries_arg;
+ }
+ break;
+ case 15: /* debug-errors */
+ benchmarking_option_set = true;
+ debug_errors = true;
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
@@ -6353,6 +7158,15 @@ main(int argc, char **argv)
exit(1);
}
+ if (!max_tries)
+ {
+ if (!latency_limit && duration <= 0)
+ {
+ pg_log_fatal("an unlimited number of transaction tries can only be used with --latency-limit or a duration (-T)");
+ exit(1);
+ }
+ }
+
/*
* save main process id in the global variable because process id will be
* changed after fork.
@@ -6561,6 +7375,10 @@ main(int argc, char **argv)
mergeSimpleStats(&stats.lag, &thread->stats.lag);
stats.cnt += thread->stats.cnt;
stats.skipped += thread->stats.skipped;
+ stats.retries += thread->stats.retries;
+ stats.retried += thread->stats.retried;
+ stats.serialization_failures += thread->stats.serialization_failures;
+ stats.deadlock_failures += thread->stats.deadlock_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
@@ -6709,7 +7527,8 @@ threadRun(void *arg)
if (min_usec > this_usec)
min_usec = this_usec;
}
- else if (st->state == CSTATE_WAIT_RESULT)
+ else if (st->state == CSTATE_WAIT_RESULT ||
+ st->state == CSTATE_WAIT_ROLLBACK_RESULT)
{
/*
* waiting for result from server - nothing to do unless the
@@ -6798,7 +7617,8 @@ threadRun(void *arg)
{
CState *st = &state[i];
- if (st->state == CSTATE_WAIT_RESULT)
+ if (st->state == CSTATE_WAIT_RESULT ||
+ st->state == CSTATE_WAIT_ROLLBACK_RESULT)
{
/* don't call advanceConnectionState unless data is available */
int sock = PQsocket(st->con);
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 3aa9d5d753..afaa2b9d0b 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -11,7 +11,11 @@ use Config;
# start a pgbench specific server
my $node = get_new_node('main');
-$node->init;
+
+# Set to untranslated messages, to be able to compare program output with
+# expected strings.
+$node->init(extra => [ '--locale', 'C' ]);
+
$node->start;
# invoke pgbench, with parameters:
@@ -159,7 +163,8 @@ pgbench(
qr{builtin: TPC-B},
qr{clients: 2\b},
qr{processed: 10/10},
- qr{mode: simple}
+ qr{mode: simple},
+ qr{maximum number of tries: 1}
],
[qr{^$}],
'pgbench tpcb-like');
@@ -1239,6 +1244,214 @@ pgbench(
check_pgbench_logs($bdir, '001_pgbench_log_3', 1, 10, 10,
qr{^0 \d{1,2} \d+ \d \d+ \d+$});
+# abortion of the client if the script contains an incomplete transaction block
+pgbench(
+ '--no-vacuum', 2, [ qr{processed: 1/10} ],
+ [ qr{client 0 aborted: end of script reached without completing the last transaction} ],
+ 'incomplete transaction block',
+ { '001_pgbench_incomplete_transaction_block' => q{BEGIN;SELECT 1;} });
+
+# Test the concurrent update in the table row and deadlocks.
+
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE first_client_table (value integer); '
+ . 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2);');
+
+# Serialization error and retry
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read";
+
+# Check that we have a serialization error and the same random value of the
+# delta variable in the next try
+my $err_pattern =
+ "client (0|1) got an error in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update\\b.*"
+ . "\\g1";
+
+pgbench(
+ "-n -c 2 -t 1 -d --debug-errors --max-tries 2",
+ 0,
+ [ qr{processed: 2/2\b}, qr{^((?!number of transactions failed)(.|\n))*$},
+ qr{number of transactions retried: 1\b}, qr{number of total retries: 1\b} ],
+ [ qr/$err_pattern/s ],
+ 'concurrent update with retrying',
+ {
+ '001_pgbench_serialization' => q{
+-- What's happening:
+-- The first client starts the transaction with the isolation level Repeatable
+-- Read:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+--
+-- The second client starts a similar transaction with the same isolation level:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+-- <waiting for the first client>
+--
+-- The first client commits its transaction, and the second client gets a
+-- serialization error.
+
+\set delta random(-5000, 5000)
+
+-- The second client will stop here
+SELECT pg_advisory_lock(0);
+
+-- Start transaction with concurrent update
+BEGIN;
+UPDATE xy SET y = y + :delta WHERE x = 1 AND pg_advisory_lock(1) IS NOT NULL;
+
+-- Wait for the second client
+DO $$
+DECLARE
+ exists boolean;
+ waiters integer;
+BEGIN
+ -- The second client always comes in second, and the number of rows in the
+ -- table first_client_table reflect this. Here the first client inserts a row,
+ -- so the second client will see a non-empty table when repeating the
+ -- transaction after the serialization error.
+ SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+ IF NOT exists THEN
+ -- Let the second client begin
+ PERFORM pg_advisory_unlock(0);
+ -- And wait until the second client tries to get the same lock
+ LOOP
+ SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+ locktype = 'advisory' AND objsubid = 1 AND
+ ((classid::bigint << 32) | objid::bigint = 1::bigint) AND NOT granted;
+ IF waiters = 1 THEN
+ INSERT INTO first_client_table VALUES (1);
+
+ -- Exit loop
+ EXIT;
+ END IF;
+ END LOOP;
+ END IF;
+END$$;
+
+COMMIT;
+SELECT pg_advisory_unlock_all();
+}
+ });
+
+# Clean up
+
+$node->safe_psql('postgres', 'DELETE FROM first_client_table;');
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=read\\ committed";
+
+# Deadlock error and retry
+
+# Check that we have a deadlock error
+$err_pattern =
+ "client (0|1) got an error in command (3|5) \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\\b";
+
+pgbench(
+ "-n -c 2 -t 1 --max-tries 2 --debug-errors",
+ 0,
+ [ qr{processed: 2/2\b}, qr{^((?!number of transactions failed)(.|\n))*$},
+ qr{number of transactions retried: 1\b}, qr{number of total retries: 1\b} ],
+ [ qr{$err_pattern} ],
+ 'deadlock with retrying',
+ {
+ '001_pgbench_deadlock' => q{
+-- What's happening:
+-- The first client gets the lock 2.
+-- The second client gets the lock 3 and tries to get the lock 2.
+-- The first client tries to get the lock 3 and one of them gets a deadlock
+-- error.
+--
+-- A client that does not get a deadlock error must hold a lock at the
+-- transaction start. Thus in the end it releases all of its locks before the
+-- client with the deadlock error starts a retry (we do not want any errors
+-- again).
+
+-- Since the client with the deadlock error has not released the blocking locks,
+-- let's do this here.
+SELECT pg_advisory_unlock_all();
+
+-- The second client and the client with the deadlock error stop here
+SELECT pg_advisory_lock(0);
+SELECT pg_advisory_lock(1);
+
+-- The second client and the client with the deadlock error always come after
+-- the first and the number of rows in the table first_client_table reflects
+-- this. Here the first client inserts a row, so in the future the table is
+-- always non-empty.
+DO $$
+DECLARE
+ exists boolean;
+BEGIN
+ SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+ IF exists THEN
+ -- We are the second client or the client with the deadlock error
+
+ -- The first client will take care by itself of this lock (see below)
+ PERFORM pg_advisory_unlock(0);
+
+ PERFORM pg_advisory_lock(3);
+
+ -- The second client can get a deadlock here
+ PERFORM pg_advisory_lock(2);
+ ELSE
+ -- We are the first client
+
+ -- This code should not be used in a new transaction after an error
+ INSERT INTO first_client_table VALUES (1);
+
+ PERFORM pg_advisory_lock(2);
+ END IF;
+END$$;
+
+DO $$
+DECLARE
+ num_rows integer;
+ waiters integer;
+BEGIN
+ -- Check if we are the first client
+ SELECT COUNT(*) FROM first_client_table INTO STRICT num_rows;
+ IF num_rows = 1 THEN
+ -- This code should not be used in a new transaction after an error
+ INSERT INTO first_client_table VALUES (2);
+
+ -- Let the second client begin
+ PERFORM pg_advisory_unlock(0);
+ PERFORM pg_advisory_unlock(1);
+
+ -- Make sure the second client is ready for deadlock
+ LOOP
+ SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+ locktype = 'advisory' AND
+ objsubid = 1 AND
+ ((classid::bigint << 32) | objid::bigint = 2::bigint) AND
+ NOT granted;
+
+ IF waiters = 1 THEN
+ -- Exit loop
+ EXIT;
+ END IF;
+ END LOOP;
+
+ PERFORM pg_advisory_lock(0);
+ -- And the second client took care by itself of the lock 1
+ END IF;
+END$$;
+
+-- The first client can get a deadlock here
+SELECT pg_advisory_lock(3);
+
+SELECT pg_advisory_unlock_all();
+}
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE first_client_table, xy;');
+
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 346a2667fc..56f7226c8e 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -178,6 +178,16 @@ my @options = (
'-i --partition-method=hash',
[qr{partition-method requires greater than zero --partitions}]
],
+ [
+ 'bad maximum number of tries',
+ '--max-tries -10',
+ [qr{invalid number of maximum tries: "-10"}]
+ ],
+ [
+ 'an infinite number of tries',
+ '--max-tries 0',
+ [qr{an unlimited number of transaction tries can only be used with --latency-limit or a duration}]
+ ],
# logging sub-options
[
diff --git a/src/fe_utils/conditional.c b/src/fe_utils/conditional.c
index a562e28846..c304014f51 100644
--- a/src/fe_utils/conditional.c
+++ b/src/fe_utils/conditional.c
@@ -24,13 +24,25 @@ conditional_stack_create(void)
}
/*
- * destroy stack
+ * Destroy all the elements from the stack. The stack itself is not freed.
*/
void
-conditional_stack_destroy(ConditionalStack cstack)
+conditional_stack_reset(ConditionalStack cstack)
{
+ if (!cstack)
+ return; /* nothing to do here */
+
while (conditional_stack_pop(cstack))
continue;
+}
+
+/*
+ * destroy stack
+ */
+void
+conditional_stack_destroy(ConditionalStack cstack)
+{
+ conditional_stack_reset(cstack);
free(cstack);
}
diff --git a/src/include/fe_utils/conditional.h b/src/include/fe_utils/conditional.h
index c64c655775..9c495072aa 100644
--- a/src/include/fe_utils/conditional.h
+++ b/src/include/fe_utils/conditional.h
@@ -73,6 +73,8 @@ typedef struct ConditionalStackData *ConditionalStack;
extern ConditionalStack conditional_stack_create(void);
+extern void conditional_stack_reset(ConditionalStack cstack);
+
extern void conditional_stack_destroy(ConditionalStack cstack);
extern int conditional_stack_depth(ConditionalStack cstack);
--
2.17.1
Attachment: v13-0001-Pgbench-errors-use-the-Variables-structure-for-c.patch (text/x-diff)
From 0c506d7de82ffd82570aa751e641faa42be8dd84 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Wed, 26 May 2021 16:58:36 +0900
Subject: [PATCH v13 1/2] Pgbench errors: use the Variables structure for
client variables
This is most important when it is used to reset client variables during the
repeating of transactions after serialization/deadlock failures.
Don't allocate Variable structs one by one. Instead, add a constant margin each
time it overflows.
---
src/bin/pgbench/pgbench.c | 163 +++++++++++++++++++++++---------------
1 file changed, 100 insertions(+), 63 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4aeccd93af..3629caba42 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -287,6 +287,12 @@ const char *progname;
volatile bool timer_exceeded = false; /* flag from signal handler */
+/*
+ * We don't want to allocate variables one by one; for efficiency, add a
+ * constant margin each time it overflows.
+ */
+#define VARIABLES_ALLOC_MARGIN 8
+
/*
* Variable definitions.
*
@@ -304,6 +310,24 @@ typedef struct
PgBenchValue value; /* actual variable's value */
} Variable;
+/*
+ * Data structure for client variables.
+ */
+typedef struct
+{
+ Variable *vars; /* array of variable definitions */
+ int nvars; /* number of variables */
+
+ /*
+ * The maximum number of variables that we can currently store in 'vars'
+ * without having to reallocate more space. We must always have max_vars >=
+ * nvars.
+ */
+ int max_vars;
+
+ bool vars_sorted; /* are variables sorted by name? */
+} Variables;
+
#define MAX_SCRIPTS 128 /* max number of SQL scripts allowed */
#define SHELL_COMMAND_SIZE 256 /* maximum size allowed for shell command */
@@ -460,9 +484,7 @@ typedef struct
int command; /* command number in script */
/* client variables */
- Variable *variables; /* array of variable definitions */
- int nvariables; /* number of variables */
- bool vars_sorted; /* are variables sorted by name? */
+ Variables variables;
/* various times about current transaction in microseconds */
pg_time_usec_t txn_scheduled; /* scheduled start time of transaction */
@@ -1418,39 +1440,39 @@ compareVariableNames(const void *v1, const void *v2)
/* Locate a variable by name; returns NULL if unknown */
static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
{
Variable key;
/* On some versions of Solaris, bsearch of zero items dumps core */
- if (st->nvariables <= 0)
+ if (variables->nvars <= 0)
return NULL;
/* Sort if we have to */
- if (!st->vars_sorted)
+ if (!variables->vars_sorted)
{
- qsort((void *) st->variables, st->nvariables, sizeof(Variable),
+ qsort((void *) variables->vars, variables->nvars, sizeof(Variable),
compareVariableNames);
- st->vars_sorted = true;
+ variables->vars_sorted = true;
}
/* Now we can search */
key.name = name;
return (Variable *) bsearch((void *) &key,
- (void *) st->variables,
- st->nvariables,
+ (void *) variables->vars,
+ variables->nvars,
sizeof(Variable),
compareVariableNames);
}
/* Get the value of a variable, in string form; returns NULL if unknown */
static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
{
Variable *var;
char stringform[64];
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
return NULL; /* not found */
@@ -1582,21 +1604,37 @@ valid_variable_name(const char *name)
return true;
}
+/*
+ * Make sure there is enough space for 'needed' more variables in the variables
+ * array.
+ */
+static void
+enlargeVariables(Variables *variables, int needed)
+{
+ /* total number of variables required now */
+ needed += variables->nvars;
+
+ if (variables->max_vars < needed)
+ {
+ variables->max_vars = needed + VARIABLES_ALLOC_MARGIN;
+ variables->vars = (Variable *)
+ pg_realloc(variables->vars, variables->max_vars * sizeof(Variable));
+ }
+}
+
/*
* Lookup a variable by name, creating it if need be.
* Caller is expected to assign a value to the variable.
* Returns NULL on failure (bad name).
*/
static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
+lookupCreateVariable(Variables *variables, const char *context, char *name)
{
Variable *var;
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
{
- Variable *newvars;
-
/*
* Check for the name only when declaring a new variable to avoid
* overhead.
@@ -1608,23 +1646,17 @@ lookupCreateVariable(CState *st, const char *context, char *name)
}
/* Create variable at the end of the array */
- if (st->variables)
- newvars = (Variable *) pg_realloc(st->variables,
- (st->nvariables + 1) * sizeof(Variable));
- else
- newvars = (Variable *) pg_malloc(sizeof(Variable));
-
- st->variables = newvars;
+ enlargeVariables(variables, 1);
- var = &newvars[st->nvariables];
+ var = &(variables->vars[variables->nvars]);
var->name = pg_strdup(name);
var->svalue = NULL;
/* caller is expected to initialize remaining fields */
- st->nvariables++;
+ variables->nvars++;
/* we don't re-sort the array till we have to */
- st->vars_sorted = false;
+ variables->vars_sorted = false;
}
return var;
@@ -1633,12 +1665,13 @@ lookupCreateVariable(CState *st, const char *context, char *name)
/* Assign a string value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariable(CState *st, const char *context, char *name, const char *value)
+putVariable(Variables *variables, const char *context, char *name,
+ const char *value)
{
Variable *var;
char *val;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1656,12 +1689,12 @@ putVariable(CState *st, const char *context, char *name, const char *value)
/* Assign a value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableValue(CState *st, const char *context, char *name,
+putVariableValue(Variables *variables, const char *context, char *name,
const PgBenchValue *value)
{
Variable *var;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1676,12 +1709,13 @@ putVariableValue(CState *st, const char *context, char *name,
/* Assign an integer value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putVariableInt(Variables *variables, const char *context, char *name,
+ int64 value)
{
PgBenchValue val;
setIntValue(&val, value);
- return putVariableValue(st, context, name, &val);
+ return putVariableValue(variables, context, name, &val);
}
/*
@@ -1740,7 +1774,7 @@ replaceVariable(char **sql, char *param, int len, char *value)
}
static char *
-assignVariables(CState *st, char *sql)
+assignVariables(Variables *variables, char *sql)
{
char *p,
*name,
@@ -1761,7 +1795,7 @@ assignVariables(CState *st, char *sql)
continue;
}
- val = getVariable(st, name);
+ val = getVariable(variables, name);
free(name);
if (val == NULL)
{
@@ -1776,12 +1810,13 @@ assignVariables(CState *st, char *sql)
}
static void
-getQueryParams(CState *st, const Command *command, const char **params)
+getQueryParams(Variables *variables, const Command *command,
+ const char **params)
{
int i;
for (i = 0; i < command->argc - 1; i++)
- params[i] = getVariable(st, command->argv[i + 1]);
+ params[i] = getVariable(variables, command->argv[i + 1]);
}
static char *
@@ -2649,7 +2684,7 @@ evaluateExpr(CState *st, PgBenchExpr *expr, PgBenchValue *retval)
{
Variable *var;
- if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
+ if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
pg_log_error("undefined variable \"%s\"", expr->u.variable.varname);
return false;
@@ -2719,7 +2754,7 @@ getMetaCommand(const char *cmd)
* Return true if succeeded, or false on error.
*/
static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
char command[SHELL_COMMAND_SIZE];
int i,
@@ -2750,7 +2785,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
arg = argv[i] + 1; /* a string literal starting with colons */
}
- else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
+ else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
pg_log_error("%s: undefined variable \"%s\"", argv[0], argv[i]);
return false;
@@ -2811,7 +2846,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
pg_log_error("%s: shell command must return an integer (not \"%s\")", argv[0], res);
return false;
}
- if (!putVariableInt(st, "setshell", variable, retval))
+ if (!putVariableInt(variables, "setshell", variable, retval))
return false;
pg_log_debug("%s: shell parameter name: \"%s\", value: \"%s\"", argv[0], argv[1], res);
@@ -2863,7 +2898,7 @@ sendCommand(CState *st, Command *command)
char *sql;
sql = pg_strdup(command->argv[0]);
- sql = assignVariables(st, sql);
+ sql = assignVariables(&st->variables, sql);
pg_log_debug("client %d sending %s", st->id, sql);
r = PQsendQuery(st->con, sql);
@@ -2874,7 +2909,7 @@ sendCommand(CState *st, Command *command)
const char *sql = command->argv[0];
const char *params[MAX_ARGS];
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
pg_log_debug("client %d sending %s", st->id, sql);
r = PQsendQueryParams(st->con, sql, command->argc - 1,
@@ -2921,7 +2956,7 @@ sendCommand(CState *st, Command *command)
st->prepared[st->use_file] = true;
}
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
pg_log_debug("client %d sending %s", st->id, name);
@@ -3014,7 +3049,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
varname = psprintf("%s%s", varprefix, varname);
/* store last row result as a string */
- if (!putVariable(st, meta == META_ASET ? "aset" : "gset", varname,
+ if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
PQgetvalue(res, ntuples - 1, fld)))
{
/* internal error */
@@ -3075,14 +3110,14 @@ error:
* of delay, in microseconds. Returns true on success, false on error.
*/
static bool
-evaluateSleep(CState *st, int argc, char **argv, int *usecs)
+evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
char *var;
int usec;
if (*argv[1] == ':')
{
- if ((var = getVariable(st, argv[1] + 1)) == NULL)
+ if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
pg_log_error("%s: undefined variable \"%s\"", argv[0], argv[1] + 1);
return false;
@@ -3614,7 +3649,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
* latency will be recorded in CSTATE_SLEEP state, not here, after the
* delay has elapsed.)
*/
- if (!evaluateSleep(st, argc, argv, &usec))
+ if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
commandFailed(st, "sleep", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -3635,7 +3670,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
return CSTATE_ABORTED;
}
- if (!putVariableValue(st, argv[0], argv[1], &result))
+ if (!putVariableValue(&st->variables, argv[0], argv[1], &result))
{
commandFailed(st, "set", "assignment of meta-command failed");
return CSTATE_ABORTED;
@@ -3705,7 +3740,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
}
else if (command->meta == META_SETSHELL)
{
- if (!runShellCommand(st, argv[1], argv + 2, argc - 2))
+ if (!runShellCommand(&st->variables, argv[1], argv + 2, argc - 2))
{
commandFailed(st, "setshell", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -3713,7 +3748,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
}
else if (command->meta == META_SHELL)
{
- if (!runShellCommand(st, NULL, argv + 1, argc - 1))
+ if (!runShellCommand(&st->variables, NULL, argv + 1, argc - 1))
{
commandFailed(st, "shell", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -5995,7 +6030,7 @@ main(int argc, char **argv)
}
*p++ = '\0';
- if (!putVariable(&state[0], "option", optarg, p))
+ if (!putVariable(&state[0].variables, "option", optarg, p))
exit(1);
}
break;
@@ -6335,19 +6370,19 @@ main(int argc, char **argv)
int j;
state[i].id = i;
- for (j = 0; j < state[0].nvariables; j++)
+ for (j = 0; j < state[0].variables.nvars; j++)
{
- Variable *var = &state[0].variables[j];
+ Variable *var = &state[0].variables.vars[j];
if (var->value.type != PGBT_NO_VALUE)
{
- if (!putVariableValue(&state[i], "startup",
+ if (!putVariableValue(&state[i].variables, "startup",
var->name, &var->value))
exit(1);
}
else
{
- if (!putVariable(&state[i], "startup",
+ if (!putVariable(&state[i].variables, "startup",
var->name, var->svalue))
exit(1);
}
@@ -6382,11 +6417,11 @@ main(int argc, char **argv)
* :scale variables normally get -s or database scale, but don't override
* an explicit -D switch
*/
- if (lookupVariable(&state[0], "scale") == NULL)
+ if (lookupVariable(&state[0].variables, "scale") == NULL)
{
for (i = 0; i < nclients; i++)
{
- if (!putVariableInt(&state[i], "startup", "scale", scale))
+ if (!putVariableInt(&state[i].variables, "startup", "scale", scale))
exit(1);
}
}
@@ -6395,30 +6430,32 @@ main(int argc, char **argv)
* Define a :client_id variable that is unique per connection. But don't
* override an explicit -D switch.
*/
- if (lookupVariable(&state[0], "client_id") == NULL)
+ if (lookupVariable(&state[0].variables, "client_id") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "client_id", i))
+ if (!putVariableInt(&state[i].variables, "startup", "client_id", i))
exit(1);
}
/* set default seed for hash functions */
- if (lookupVariable(&state[0], "default_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "default_seed") == NULL)
{
uint64 seed =
((uint64) pg_jrand48(base_random_sequence.xseed) & 0xFFFFFFFF) |
(((uint64) pg_jrand48(base_random_sequence.xseed) & 0xFFFFFFFF) << 32);
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "default_seed", (int64) seed))
+ if (!putVariableInt(&state[i].variables, "startup", "default_seed",
+ (int64) seed))
exit(1);
}
/* set random seed unless overwritten */
- if (lookupVariable(&state[0], "random_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "random_seed") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+ if (!putVariableInt(&state[i].variables, "startup", "random_seed",
+ random_seed))
exit(1);
}
--
2.17.1
Hello Yugo-san,
Thanks for the update!
Patch seems to apply cleanly with "git apply", but does not compile on my
host: "undefined reference to `conditional_stack_reset'". However, it works
better when using "patch". I'm wondering why "git apply" fails silently…

Hmm, I don't know why your compilation fails... I can apply and compile
successfully using git.
Hmmm. Strange!
Given that we manage errors, ISTM that we should not necessarily stop
on other, non-retried errors, but rather count/report them and possibly
proceed. Eg with something like: [...] We could count the failure,
rollback if necessary, and go on. What do you think? Maybe such
behavior would deserve an option.

This feature to count failures that could occur at runtime seems nice.
However, as discussed in [1], I think it is better to focus only on
failures that can be retried in this patch, and introduce the feature to
handle other failures in a separate patch.
Ok.
--report-latencies -> --report-per-command: should we keep supporting
the previous option?

Ok. Although the option is now not only about latencies, considering users
who are using the existing option, I'm fine with this. I reverted it to the
previous name.
Hmmm. I liked the new name! My point was whether we need to support the
old one as well for compatibility, or whether we should not bother. I'm
still wondering. As I think that the new name is better, I'd suggest
keeping it.
--failures-detailed: if we bother to run with handling failures, should
it always be on?

If we print other failures that cannot be retried in the future, it could
print a lot of lines and might annoy some users who don't need details of
failures. Moreover, some users would always need information on detailed
failures in the log, and others would need only total numbers of failures.
Ok.
Currently we handle only serialization and deadlock failures, so the number
of lines printed and the number of logging columns are not large even under
--failures-detailed, but if we have a chance to handle other failures in the
future, ISTM adding this option makes sense considering users who would like
simple outputs.
Hmmm. What kind of failures could be managed with retries? I guess that on
a connection failure we can try to reconnect, but otherwise it is less
clear that retrying other failures makes sense.
--debug-errors: I'm not sure we should want a special debug mode for that;
I'd consider integrating it with the standard debug, or keeping it just for
development.

I think --debug is a debug option for telling users about pgbench's internal
behavior, that is, which client is doing what. On the other hand,
--debug-errors is for telling users in detail what error caused a retry or a
failure. For users who are not interested in pgbench's internal behavior
(sending a command, receiving a result, ...) but are interested in actual
errors raised while running the script, this option seems useful.
Ok. So this is not really about debug per se, but a verbosity setting?
Maybe --verbose-errors would make more sense? I'm unsure. I'll think about
it.
Also, should it use pg_log_debug?
If we use pg_log_debug, the message is printed only under --debug.
Therefore, I changed it to use pg_log_info instead of pg_log_error or fprintf.
Ok, pg_log_info seems right.
Tries vs retries: I'm at odds with having tries & retries and "+ 1" here
and there to handle that, which is a little bit confusing. I'm wondering
whether we could only count "tries" and adjust the report to what we want
later?

I changed the code to use "tries" instead of "retries" in CState. However,
we still use "retries" in StatsData and Command because the number of
retries is printed in the final result. Is it less confusing than before?
I'm going to think about it.
advanceConnectionState: ISTM that ERROR should logically be before others which
lead to it.
Sorry, I couldn't understand your suggestion. Is this about the order of case
statements or pg_log_error?
My sentence got mixed up. My point was about the case order, so that they
are put in a more logical order when reading all the cases.
Currently, ISTM that the retry on error mode is implicitly always on.
Do we want that? I'd say yes, but maybe people could disagree.
The default value of max-tries is 1, so the retry on error is off.
Failed transactions are retried only when the user wants it and
specifies a valid value for max-tries.
Ok. My point is that we do not stop on such errors, whereas before ISTM
that we would have stopped, so somehow the default behavior has changed
and the previous behavior cannot be reinstated with an option. Maybe that
is not bad, but this is a behavioral change which needs to be documented
and justified.
See the attached files for generating deadlocks reliably (start with 2
clients). What do you think? The PL/pgSQL is minimal; it is really
client-code oriented.
Sorry, but I cannot find the attached file.
Sorry. Attached to this mail. The serialization stuff does not seem to
work as well as the deadlock one. Run with 2 clients.
--
Fabien.
I attached the patch updated according to your suggestions.
v13 patches gave a compiler warning...
$ make >/dev/null
pgbench.c: In function ‘commandError’:
pgbench.c:3071:17: warning: unused variable ‘command’ [-Wunused-variable]
const Command *command = sql_script[st->use_file].commands[st->command];
^~~~~~~
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
v13 patches gave a compiler warning...
$ make >/dev/null
pgbench.c: In function ‘commandError’:
pgbench.c:3071:17: warning: unused variable ‘command’ [-Wunused-variable]
const Command *command = sql_script[st->use_file].commands[st->command];
^~~~~~~
There is a typo in the doc (more over -> moreover).
of all transaction tries; more over, you cannot use an unlimited number
of all transaction tries; moreover, you cannot use an unlimited number
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
I have found an interesting result from patched pgbench (I have set
the isolation level to REPEATABLE READ):
$ pgbench -p 11000 -c 10 -T 30 --max-tries=0 test
pgbench (15devel, server 13.3)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 30 s
number of transactions actually processed: 2586
number of failed transactions: 9 (0.347%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
number of transactions retried: 1892 (72.909%)
total number of retries: 21819
latency average = 115.551 ms (including failures)
initial connection time = 35.268 ms
tps = 86.241799 (without initial connection time)
I ran pgbench with 10 concurrent sessions. In this case pgbench always
reports 9 failed transactions regardless of the setting of the -T
option. This is because at the end of a pgbench session, only 1 out of
10 transactions succeeded; 9 transactions failed due to a
serialization error without any chance to retry because -T expired.
This is a little bit disappointing because I wanted to see a result in which
all transactions succeeded with retries. I tried -t instead of -T, but
-t cannot be used with --max-tries=0.
Also I think this behavior is somewhat inconsistent with the existing
behavior of pgbench. When pgbench runs without the --max-tries option,
pgbench continues to run transactions even after -T expires:
$ time pgbench -p 11000 -T 10 -f pgbench.sql test
pgbench (15devel, server 13.3)
starting vacuum...end.
transaction type: pgbench.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 2
maximum number of tries: 1
latency average = 7009.006 ms
initial connection time = 8.045 ms
tps = 0.142674 (without initial connection time)
real 0m14.067s
user 0m0.010s
sys 0m0.004s
$ cat pgbench.sql
SELECT pg_sleep(7);
So pgbench does not stop transactions after the 10 seconds have passed but
waits for the last transaction to complete. To be consistent with this
behavior when --max-tries=0, shouldn't we retry until the last
transaction finishes?
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Hello Ishii-san,
On Thu, 01 Jul 2021 09:03:42 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
v13 patches gave a compiler warning...
$ make >/dev/null
pgbench.c: In function ‘commandError’:
pgbench.c:3071:17: warning: unused variable ‘command’ [-Wunused-variable]
const Command *command = sql_script[st->use_file].commands[st->command];
^~~~~~~
Hmm, we'll get the warning when --enable-cassert is not specified.
I'll fix it.
There is a typo in the doc (more over -> moreover).
of all transaction tries; more over, you cannot use an unlimited number
of all transaction tries; moreover, you cannot use an unlimited number
Thanks. I'll fix.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Hello Ishii-san,
On Fri, 02 Jul 2021 09:25:03 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
I have found an interesting result from patched pgbench (I have set
the isolation level to REPEATABLE READ):
$ pgbench -p 11000 -c 10 -T 30 --max-tries=0 test
pgbench (15devel, server 13.3)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 30 s
number of transactions actually processed: 2586
number of failed transactions: 9 (0.347%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
number of transactions retried: 1892 (72.909%)
total number of retries: 21819
latency average = 115.551 ms (including failures)
initial connection time = 35.268 ms
tps = 86.241799 (without initial connection time)
I ran pgbench with 10 concurrent sessions. In this case pgbench always
reports 9 failed transactions regardless of the setting of the -T
option. This is because at the end of a pgbench session, only 1 out of
10 transactions succeeded; 9 transactions failed due to a
serialization error without any chance to retry because -T expired.
This is a little bit disappointing because I wanted to see a result in which
all transactions succeeded with retries. I tried -t instead of -T, but
-t cannot be used with --max-tries=0.
Also I think this behavior is somewhat inconsistent with the existing
behavior of pgbench. When pgbench runs without the --max-tries option,
pgbench continues to run transactions even after -T expires:
$ time pgbench -p 11000 -T 10 -f pgbench.sql test
pgbench (15devel, server 13.3)
starting vacuum...end.
transaction type: pgbench.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 2
maximum number of tries: 1
latency average = 7009.006 ms
initial connection time = 8.045 ms
tps = 0.142674 (without initial connection time)
real 0m14.067s
user 0m0.010s
sys 0m0.004s
$ cat pgbench.sql
SELECT pg_sleep(7);
So pgbench does not stop transactions after the 10 seconds have passed but
waits for the last transaction to complete. To be consistent with this
behavior when --max-tries=0, shouldn't we retry until the last
transaction finishes?
I changed the previous patch so that the -T option can terminate
a retrying transaction, and so that we can specify --max-tries=0 without
--latency-limit if we have -T, in line with the following comment.
Doc says "you cannot use an infinite number of retries without latency-limit..."
Why should this be forbidden? At least if -T timeout takes precedent and
shortens the execution, ISTM that there could be good reason to test that.
Maybe it could be blocked only under -t if this would lead to a non-ending
run.
Indeed, as Ishii-san pointed out, some users might not want to terminate
retrying transactions due to -T. However, the actual negative effect is only
printing the number of failed transactions. The other results that users want to
know, such as tps, are almost unaffected because they are measured over
transactions processed successfully. Actually, the percentage of failed
transactions is very small, only 0.347%.
In the existing behaviour, running transactions are never terminated due to
the -T option. However, ISTM that this would be based on an assumption
that the latency of each transaction is small and that the moment when we can
finish the benchmark will come soon. On the other hand, when transactions can
be retried unlimitedly, it may take a longer time than expected, and we cannot
guarantee that this will finish successfully in limited time. Therefore,
terminating the benchmark by giving up retrying the transaction after time
expiration seems reasonable under unlimited retries. In the sense that we don't
terminate running transactions forcibly, this doesn't change the existing behaviour.
If we don't want to print the number of transactions failed due to -T, we can
forbid using -T without latency-limit under max-tries=0 to avoid a
possible never-ending benchmark. In this case, users have to limit the number of
transaction retries by specifying latency-limit or max-tries (>0). However, if some
users would like to benchmark simply allowing unlimited retries, using -T and
max-tries=0 seems the most straightforward way, so I think it is better that they
can be used together.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Indeed, as Ishii-san pointed out, some users might not want to terminate
retrying transactions due to -T. However, the actual negative effect is only
printing the number of failed transactions. The other result that users want to
know, such as tps, are almost not affected because they are measured for
transactions processed successfully. Actually, the percentage of failed
transaction is very little, only 0.347%.
Well, "that's very little, let's ignore it" is not technically a right
direction IMO.
In the existing behaviour, running transactions are never terminated due to
the -T option. However, ISTM that this would be based on an assumption
that a latency of each transaction is small and that a timing when we can
finish the benchmark would come soon. On the other hand, when transactions can
be retried unlimitedly, it may take a long time more than expected, and we can
not guarantee that this would finish successfully in limited time.
Therefore, terminating the benchmark by giving up retrying the transaction
after time expiration seems reasonable under unlimited retries.
That's not necessarily true in practice. By the time when -T is about to
expire, transactions are all finished in finite time as you can see from
the result I showed. So it's reasonable that the very last cycle of
the benchmark will finish in finite time as well.
Of course if a benchmark cycle takes infinite time, this will be a
problem. However, the same thing can be said of non-retry
benchmarks. Theoretically it is possible that *one* benchmark cycle
takes forever. In this case the only solution will be just hitting ^C
to terminate pgbench. Why can't we have the same assumption in the
--max-tries=0 case?
In the sense that we don't
terminate running transactions forcibly, this doesn't change the existing behaviour.
This statement seems to depend on your personal assumption.
I still don't understand why you think that the --max-tries non-0 case
will *certainly* finish in finite time whereas the --max-tries=0 case will
not.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
On Wed, 07 Jul 2021 16:11:23 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
Indeed, as Ishii-san pointed out, some users might not want to terminate
retrying transactions due to -T. However, the actual negative effect is only
printing the number of failed transactions. The other result that users want to
know, such as tps, are almost not affected because they are measured for
transactions processed successfully. Actually, the percentage of failed
transactions is very small, only 0.347%.
Well, "that's very little, let's ignore it" is not technically a right
direction IMO.
Hmmm, it seems to me these failures are ignorable because, with regard to failures
due to -T, they occur only in the last transaction of each client and do not affect
results such as TPS and latency of successfully processed transactions.
(although I am not sure in what sense you use the word "technically"...)
However, maybe I am missing something. Could you please tell me what you think
the actual harm for users of failures due to -D is?
In the existing behaviour, running transactions are never terminated due to
the -T option. However, ISTM that this would be based on an assumption
that a latency of each transaction is small and that a timing when we can
finish the benchmark would come soon. On the other hand, when transactions can
be retried unlimitedly, it may take a long time more than expected, and we can
not guarantee that this would finish successfully in limited time.
Therefore, terminating the benchmark by giving up retrying the transaction
after time expiration seems reasonable under unlimited retries.
That's not necessarily true in practice. By the time when -T is about to
expire, transactions are all finished in finite time as you can see from
the result I showed. So it's reasonable that the very last cycle of
the benchmark will finish in finite time as well.
Your script may finish in finite time, but others may not. However,
considering only serialization and deadlock errors, almost all transactions
would finish in finite time eventually. In the previous version of the
patch, errors other than serialization or deadlock could be retried, and
that easily caused unlimited retrying. Now, only these two kinds of errors
can be retried; nevertheless, it is unclear to me whether we can assume
that retrying will finish in finite time. If we can assume it, maybe
we can remove the restriction that --max-tries=0 must be used with
--latency-limit or -T.
Of course if a benchmark cycle takes infinite time, this will be a
problem. However same thing can be said to non-retry
benchmarks. Theoretically it is possible that *one* benchmark cycle
takes forever. In this case the only solution will be just hitting ^C
to terminate pgbench. Why can't we have same assumption with
--max-tries=0 case?
Indeed, it is possible that an execution of a query takes a long or infinite
time. However, its cause would be a problematic query in the custom script
or other problems occurring on the server side. These are not problems of
pgbench, and pgbench itself can't control them either. On the other hand, the
unlimited number of tries is a behaviour specified by a pgbench option,
so I think pgbench itself should internally avoid problems caused by its own
behaviour. That is, if max-tries=0 could cause an infinite or much longer
benchmark time than the user expected due to too many retries, I think
pgbench should avoid it.
In the sense that we don't
terminate running transactions forcibly, this doesn't change the existing behaviour.
This statement seems to depend on your personal assumption.
Ok. If we regard a transaction as still running even when it is being retried
after an error, terminating the retry may imply terminating the running
transaction forcibly.
I still don't understand why you think that --max-tries non 0 case
will *certainly* finish in finite time whereas --max-tries=0 case will
not.
I just mean that --max-tries greater than zero will prevent pgbench from retrying a
transaction forever.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Well, "that's very little, let's ignore it" is not technically a right
direction IMO.
Hmmm, it seems to me these failures are ignorable because, with regard to failures
due to -T, they occur only in the last transaction of each client and do not affect
results such as TPS and latency of successfully processed transactions.
(although I am not sure in what sense you use the word "technically"...)
"My application button does not respond once in 100 times. It's just
a 1% error rate. You should ignore it." I would say this attitude is not
technically correct.
However, maybe I am missing something. Could you please tell me what you think
the actual harm for users of failures due to -D is?
I don't know why you are referring to -D.
That's not necessarily true in practice. By the time when -T is about to
expire, transactions are all finished in finite time as you can see from
the result I showed. So it's reasonable that the very last cycle of
the benchmark will finish in finite time as well.
Your script may finish in finite time, but others may not.
That's why I said "practically". In other words "in most cases the
scenario will finish in finite time".
Indeed, it is possible an execution of a query takes a long or infinite
time. However, its cause would a problematic query in the custom script
or other problems occurs on the server side. These are not problem of
pgbench and, pgbench itself can't control either. On the other hand, the
unlimited number of tries is a behaviours specified by the pgbench option,
so I think pgbench itself should internally avoid problems caused from its
behaviours. That is, if max-tries=0 could cause infinite or much longer
benchmark time more than user expected due to too many retries, I think
pgbench should avoid it.
I would say it's the user's responsibility to avoid an infinitely running
benchmark. Remember, pgbench is a tool for serious users, not for
novice users.
Or, we should terminate the last cycle of the benchmark, regardless of whether
it is retrying or not, if -T expires. This will make pgbench behave much
more consistently.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
On Wed, 07 Jul 2021 21:50:16 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
Well, "that's very little, let's ignore it" is not technically a right
direction IMO.
Hmmm, it seems to me these failures are ignorable because, with regard to failures
due to -T, they occur only in the last transaction of each client and do not affect
results such as TPS and latency of successfully processed transactions.
(although I am not sure in what sense you use the word "technically"...)
"My application button does not respond once in 100 times. It's just
a 1% error rate. You should ignore it." I would say this attitude is not
technically correct.
I cannot understand what you want to say. Can reporting the number of transactions
that failed intentionally be treated the same as the error rate of your
application's button?
However, maybe I am missing something. Could you please tell me what you think
the actual harm for users of failures due to -D is?
I don't know why you are referring to -D.
Sorry. It's just a typo as you can imagine.
I am asking you what you think the actual harm for users due to termination
of retrying by the -T option is.
That's not necessarily true in practice. By the time when -T is about to
expire, transactions are all finished in finite time as you can see from
the result I showed. So it's reasonable that the very last cycle of
the benchmark will finish in finite time as well.
Your script may finish in finite time, but others may not.
That's why I said "practically". In other words, "in most cases the
scenario will finish in finite time".
Sure.
Indeed, it is possible an execution of a query takes a long or infinite
time. However, its cause would a problematic query in the custom script
or other problems occurs on the server side. These are not problem of
pgbench and, pgbench itself can't control either. On the other hand, the
unlimited number of tries is a behaviours specified by the pgbench option,
so I think pgbench itself should internally avoid problems caused from its
behaviours. That is, if max-tries=0 could cause infinite or much longer
benchmark time more than user expected due to too many retries, I think
pgbench should avoid it.
I would say it's the user's responsibility to avoid an infinitely running
benchmark. Remember, pgbench is a tool for serious users, not for
novice users.
Of course, users themselves should be careful about problematic scripts, but it
would be better if pgbench itself avoided problems beforehand where it can.
Or, we should terminate the last cycle of benchmark regardless it is
retrying or not if -T expires. This will make pgbench behaves much
more consistent.
Hmmm, indeed this might make the behaviour a bit more consistent, but I am not
sure such a behavioural change would benefit users.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Hello Fabien,
I attached the updated patch (v14)!
On Wed, 30 Jun 2021 17:33:24 +0200 (CEST)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:
--report-latencies -> --report-per-command: should we keep supporting
the previous option?
Ok. Although now the option is not only for latencies, considering users who
are using the existing option, I'm fine with this. I changed this back to the
previous name.
Hmmm. I liked the new name! My point was whether we need to support the
old one as well for compatibility, or whether we should not bother. I'm
still wondering. As I think that the new name is better, I'd suggest to
keep it.
Ok. I misunderstood it. I returned the option name to report-per-command.
If we keep report-latencies, I can imagine the following choices:
- use report-latencies to print only latency information
- use report-latencies as an alias of report-per-command for compatibility
and remove it at an appropriate time (that is, treat it as deprecated)
Among these, I prefer the latter because ISTM we would not need many options
for reporting information per command. However, actually, I wonder whether we
have to keep the previous one at all if we plan to remove it eventually.
--failures-detailed: if we bother to run with handling failures, should
it always be on?
If we print other failures that cannot be retried in future, it could be a lot
of lines and might make some users who don't need details of failures annoyed.
Moreover, some users would always need information on detailed failures in the log,
and others would need only total numbers of failures.
Ok.
Currently we handle only serialization and deadlock failures, so the number of
lines printed and the number of columns of logging is not large even under
failures-detailed, but if we have a chance to handle other failures in future,
ISTM adding this option makes sense considering users who would like simple
outputs.
Hmmm. What kind of failures could be managed with retries? I guess that on
a connection failure we can try to reconnect, but otherwise it is less
clear that other failures make sense to retry.
Indeed, there would be few failures that we should retry, and I cannot imagine
others than serialization, deadlock, and connection failures for now. However,
considering reporting the number of failed transactions and their causes in
future, as you said
Given that we manage errors, ISTM that we should not necessarily
stop on other not retried errors, but rather count/report them and
possibly proceed.
, we could define a few more kinds of failures. At least we can consider
meta-command and other SQL command errors in addition to serialization,
deadlock, and connection failures. So, the total number of kinds of failures
would be at least five, and always reporting all of them would result in a lot
of lines and columns in logging.
--debug-errors: I'm not sure we should want a special debug mode for that,
I'd consider integrating it with the standard debug, or just for development.
I think --debug is a debug option for telling users pgbench's internal
behaviors, that is, which client is doing what. On the other hand, --debug-errors
is for telling users what error caused a retry or a failure in detail. For
users who are not interested in pgbench's internal behavior (sending a command,
receiving a result, ...) but interested in actual errors raised while running the
script, this option seems useful.
Ok. Then this is not really about debug per se, but a verbosity setting?
I think so.
Maybe --verbose-errors would make more sense? I'm unsure. I'll think about
it.
Agreed. This seems more appropriate than the previous one, so I changed the name to
--verbose-errors.
Sorry, I couldn't understand your suggestion. Is this about the order of case
statements or pg_log_error?
My sentence got mixed up. My point was about the case order, so that they
are put in a more logical order when reading all the cases.
Ok. Considering the logical order, I moved WAIT_ROLLBACK_RESULT to
between ERROR and RETRY, because WAIT_ROLLBACK_RESULT comes after the ERROR
state, and RETRY comes after ERROR or WAIT_ROLLBACK_RESULT.
Currently, ISTM that the retry on error mode is implicitly always on.
Do we want that? I'd say yes, but maybe people could disagree.
The default value of max-tries is 1, so the retry on error is off.
Failed transactions are retried only when the user wants it and
specifies a valid value for max-tries.
Ok. My point is that we do not stop on such errors, whereas before ISTM
that we would have stopped, so somehow the default behavior has changed
and the previous behavior cannot be reinstated with an option. Maybe that
is not bad, but this is a behavioral change which needs to be documented
and justified.
I understood. Indeed, there is a behavioural change regarding whether we abort
the client after some types of errors or not. Now, serialization / deadlock
errors don't cause an abort and are recorded as failures, whereas other
errors cause the client to abort.
If we want to record other errors as failures in future, we would need
a new option to specify which types of failures (or all types of errors, maybe)
should be reported. Until then, ISTM we can treat serialization and
deadlock errors as special errors to be reported as failures.
I rewrote the "Failures and Serialization/Deadlock Retries" section a bit to
emphasize that such errors are treated differently than other errors.
See the attached files for generating deadlocks reliably (start with 2
clients). What do you think? The PL/pgSQL is minimal; it is really
client-code oriented.
Sorry, but I cannot find the attached file.
Sorry. Attached to this mail. The serialization stuff does not seem to
work as well as the deadlock one. Run with 2 clients.
Hmmm, your test didn't work well for me. Both tests got stuck in
pgbench_deadlock_wait() and pgbench didn't finish.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v14-0002-Pgbench-errors-and-serialization-deadlock-retrie.patch (text/x-diff)
From feb0391c642a72568b45946f4aae19f8976fd713 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Mon, 7 Jun 2021 18:35:14 +0900
Subject: [PATCH v14 2/2] Pgbench errors and serialization/deadlock retries
Client's run is aborted in case of a serious error, for example, the
connection with the database server was lost or the end of script reached
without completing the last transaction. In addition, if an execution of SQL
or meta command fails for reasons other than serialization or deadlock errors,
the client is aborted. Otherwise, if an SQL command fails with serialization or
deadlock errors, the current transaction is rolled back which also
includes setting the client variables as they were before the run of this
transaction (it is assumed that one transaction script contains only one
transaction).
Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum number of
tries (specified by the --max-tries option) / the maximum time of tries
(specified by the --latency-limit option). These options can be combined
together; moreover, you cannot use an unlimited number of tries (--max-tries=0)
without the --latency-limit option or the --time option. By default the option
--max-tries is set to 1 and transactions with serialization/deadlock errors
are not retried. If the last transaction run fails, this transaction will be
reported as failed, and the client variables will be set as they were before
the first run of this transaction.
If there are retries and/or failures, their statistics are printed in the
progress report, in the transaction / aggregation logs, and at the end with other
results (all and for each script). Also, retries and failures are printed
per-command with average latencies if you use the appropriate benchmarking
option (--report-per-command, -r). If you want to group failures by basic types
(serialization failures / deadlock failures), use the option --failures-detailed.
If you want to distinguish all errors and failures (errors without retrying) by
type including which limit for retries was violated and how far it was exceeded
for the serialization/deadlock failures, use the option --verbose-errors.
---
doc/src/sgml/ref/pgbench.sgml | 402 +++++++-
src/bin/pgbench/pgbench.c | 954 +++++++++++++++++--
src/bin/pgbench/t/001_pgbench_with_server.pl | 217 ++++-
src/bin/pgbench/t/002_pgbench_no_server.pl | 10 +
src/fe_utils/conditional.c | 16 +-
src/include/fe_utils/conditional.h | 2 +
6 files changed, 1485 insertions(+), 116 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0c60077e1f..0a3ccb3c92 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -58,6 +58,7 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
latency average = 11.013 ms
latency stddev = 7.351 ms
initial connection time = 45.758 ms
@@ -65,11 +66,14 @@ tps = 896.967014 (without initial connection time)
</screen>
The first six lines report some of the most important parameter
- settings. The next line reports the number of transactions completed
+ settings. The seventh line reports the number of transactions completed
and intended (the latter being just the product of number of clients
and number of transactions per client); these will be equal unless the run
- failed before completion. (In <option>-T</option> mode, only the actual
- number of transactions is printed.)
+ failed before completion or some SQL command(s) failed. (In
+ <option>-T</option> mode, only the actual number of transactions is printed.)
+ The next line reports the maximum number of tries for transactions with
+ serialization or deadlock errors (see <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -528,6 +532,17 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
at all. They are counted and reported separately as
<firstterm>skipped</firstterm>.
</para>
+ <para>
+ When the <option>--max-tries</option> option is used, the transaction with
+ serialization or deadlock error cannot be retried if the total time of
+ all its tries is greater than <replaceable>limit</replaceable> ms. To
+ limit only the time of tries and not their number, use
+ <literal>--max-tries=0</literal>. By default option
+ <option>--max-tries</option> is set to 1 and transactions with
+ serialization/deadlock errors are not retried. See <xref
+ linkend="failures-and-retries" endterm="failures-and-retries-title"/>
+ for more information about retrying such transactions.
+ </para>
</listitem>
</varlistentry>
@@ -594,23 +609,29 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<para>
Show progress report every <replaceable>sec</replaceable> seconds. The report
includes the time since the beginning of the run, the TPS since the
- last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</option>),
- the latency is computed with respect to the transaction scheduled
- start time, not the actual transaction beginning time, thus it also
- includes the average schedule lag time.
+ last report, and the transaction latency average, standard deviation,
+ and the number of failed transactions since the last report. Under
+ throttling (<option>-R</option>), the latency is computed with respect
+ to the transaction scheduled start time, not the actual transaction
+ beginning time, thus it also includes the average schedule lag time.
+ When <option>--max-tries</option> is used to enable transaction retries
+ after serialization/deadlock errors, the report includes the number of
+ retried transactions and the sum of all retries.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-r</option></term>
- <term><option>--report-latencies</option></term>
+ <term><option>--report-per-command</option></term>
<listitem>
<para>
- Report the average per-statement latency (execution time from the
- perspective of the client) of each command after the benchmark
- finishes. See below for details.
+ Report the following statistics for each command after the benchmark
+ finishes: the average per-statement latency (execution time from the
+ perspective of the client), the number of failures and the number of
+ retries after serialization or deadlock errors in this command. The
+ report displays retry statistics only if the
+ <option>--max-tries</option> option is not equal to 1.
</para>
</listitem>
</varlistentry>
@@ -738,6 +759,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--failures-detailed</option></term>
+ <listitem>
+ <para>
+ Report failures in per-transaction and aggregation logs, as well as in
+ the main and per-script reports, grouped by the following types:
+ <itemizedlist>
+ <listitem>
+ <para>serialization failures;</para>
+ </listitem>
+ <listitem>
+ <para>deadlock failures;</para>
+ </listitem>
+ </itemizedlist>
+ See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--log-prefix=<replaceable>prefix</replaceable></option></term>
<listitem>
@@ -748,6 +789,38 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--max-tries=<replaceable>number_of_tries</replaceable></option></term>
+ <listitem>
+ <para>
+ Enable retries for transactions with serialization/deadlock errors and
+ set the maximum number of these tries. This option can be combined with
+ the <option>--latency-limit</option> option which limits the total time
+ of all transaction tries; moreover, you cannot use an unlimited number
+ of tries (<literal>--max-tries=0</literal>) without
+ <option>--latency-limit</option> or <option>--time</option>.
+ The default value is 1 and transactions with serialization/deadlock
+ errors are not retried. See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information about
+ retrying such transactions.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--verbose-errors</option></term>
+ <listitem>
+ <para>
+ Print messages about all errors and failures (errors without retrying)
+ including which limit for retries was violated and how far it was
+ exceeded for the serialization/deadlock failures. (Note that in this
+ case the output can grow significantly.)
+ See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--progress-timestamp</option></term>
<listitem>
@@ -943,8 +1016,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<refsect1>
<title>Notes</title>
- <refsect2>
- <title>What Is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts">
+ <title id="transactions-and-scripts-title">What Is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
<para>
<application>pgbench</application> executes test scripts chosen randomly
@@ -1017,6 +1090,11 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
both old and new versions of <application>pgbench</application>, be sure to write
each SQL command on a single line ending with a semicolon.
</para>
+ <para>
+ It is assumed that pgbench scripts do not contain incomplete blocks of SQL
+ transactions. If at runtime the client reaches the end of the script without
+ completing the last transaction block, it will be aborted.
+ </para>
</note>
<para>
@@ -2207,7 +2285,7 @@ END;
The format of the log is:
<synopsis>
-<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -2228,6 +2306,17 @@ END;
When both <option>--rate</option> and <option>--latency-limit</option> are used,
the <replaceable>time</replaceable> for a skipped transaction will be reported as
<literal>skipped</literal>.
+ <replaceable>retries</replaceable> is the sum of all retries after
+ serialization or deadlock errors during the current script execution. It is
+ present only if the <option>--max-tries</option> option is not equal to 1.
+ If the transaction ends with a failure, its <replaceable>time</replaceable>
+ will be reported as <literal>failed</literal>. If you use the
+ <option>--failures-detailed</option> option, the
+ <replaceable>time</replaceable> of the failed transaction will be reported as
+ <literal>serialization_failure</literal> or
+ <literal>deadlock_failure</literal> depending on the type of failure (see
+ <xref linkend="failures-and-retries" endterm="failures-and-retries-title"/>
+ for more information).
</para>
<para>
@@ -2256,6 +2345,24 @@ END;
were already late before they were even started.
</para>
+ <para>
+ The following example shows a snippet of a log file with failures and
+ retries, with the maximum number of tries set to 10 (note the additional
+ <replaceable>retries</replaceable> column):
+<screen>
+3 0 47423 0 1499414498 34501 3
+3 1 8333 0 1499414498 42848 0
+3 2 8358 0 1499414498 51219 0
+4 0 72345 0 1499414498 59433 6
+1 3 41718 0 1499414498 67879 4
+1 4 8416 0 1499414498 76311 0
+3 3 33235 0 1499414498 84469 3
+0 0 failed 0 1499414498 84905 9
+2 0 failed 0 1499414498 86248 9
+3 4 8307 0 1499414498 92788 0
+</screen>
+ </para>
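As an illustration of the per-transaction log format above, here is a minimal sketch that parses one log line, including the optional <replaceable>retries</replaceable> column and the <literal>failed</literal> marker in the time field. The struct and function names are invented for this example; this is not pgbench code:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One parsed per-transaction log line (illustrative names). */
typedef struct
{
    int  client_id;
    int  transaction_no;
    long time_us_or_failed;     /* -1 when the time column says "failed" */
    int  script_no;
    long time_epoch;
    long time_us;
    long retries;               /* present only when --max-tries != 1 */
} LogLine;

/* Returns 1 on success, 0 when the line has too few fields. */
static int
parse_log_line(const char *line, LogLine *out)
{
    char timebuf[32];
    int  n = sscanf(line, "%d %d %31s %d %ld %ld %ld",
                    &out->client_id, &out->transaction_no, timebuf,
                    &out->script_no, &out->time_epoch, &out->time_us,
                    &out->retries);

    if (n < 6)
        return 0;
    if (n == 6)
        out->retries = 0;       /* retries column absent */
    out->time_us_or_failed =
        (strcmp(timebuf, "failed") == 0) ? -1 : atol(timebuf);
    return 1;
}
```

A line with only six fields is still accepted: the retries column is simply absent (the default <literal>--max-tries=1</literal> case).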
+
<para>
When running a long test on hardware that can handle a lot of transactions,
the log files can become very large. The <option>--sampling-rate</option> option
@@ -2271,7 +2378,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable>&zwsp; <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable>&zwsp; <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -2285,7 +2392,16 @@ END;
transaction latencies within the interval,
<replaceable>min_latency</replaceable> is the minimum latency within the interval,
and
- <replaceable>max_latency</replaceable> is the maximum latency within the interval.
+ <replaceable>max_latency</replaceable> is the maximum latency within the interval,
+ <replaceable>failures</replaceable> is the number of transactions that ended
+ with a failed SQL command within the interval. If you use the
+ <option>--failures-detailed</option> option, instead of the sum of all failed
+ transactions you will get more detailed statistics for the failed
+ transactions grouped by the following types:
+ <replaceable>serialization_failures</replaceable> is the number of
+ transactions that got a serialization error and were not retried after this,
+ <replaceable>deadlock_failures</replaceable> is the number of transactions
+ that got a deadlock error and were not retried after this.
The next fields,
<replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
@@ -2293,21 +2409,25 @@ END;
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e., the difference between each transaction's
scheduled start time and the time it actually started.
- The very last field, <replaceable>skipped</replaceable>,
+ The next field, <replaceable>skipped</replaceable>,
is only present if the <option>--latency-limit</option> option is used, too.
It counts the number of transactions skipped because they would have
started too late.
+ The <replaceable>retried</replaceable> and <replaceable>retries</replaceable>
+ fields are present only if the <option>--max-tries</option> option is not
+ equal to 1. They report the number of retried transactions and the sum of all
+ retries after serialization or deadlock errors within the interval.
Each transaction is counted in the interval when it was committed.
</para>
<para>
Here is some example output:
<screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0
+1345828503 7884 1979812 565806736 60 1479 0
+1345828505 7208 1979422 567277552 59 1391 0
+1345828507 7685 1980268 569784714 60 1398 0
+1345828509 7073 1979779 573489941 236 1411 0
</screen></para>
<para>
@@ -2319,13 +2439,44 @@ END;
</refsect2>
<refsect2>
- <title>Per-Statement Latencies</title>
+ <title>Per-Statement Report</title>
<para>
- With the <option>-r</option> option, <application>pgbench</application> collects
- the elapsed transaction time of each statement executed by every
- client. It then reports an average of those values, referred to
- as the latency for each statement, after the benchmark has finished.
+ With the <option>-r</option> option, <application>pgbench</application>
+ collects the following statistics for each statement:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>latency</literal> — elapsed transaction time for each
+ statement. <application>pgbench</application> reports an average value
+ of all successful runs of the statement.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of failures in this statement. See
+ <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of retries after a serialization or a deadlock error in this
+ statement. See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ The report displays retry statistics only if the <option>--max-tries</option>
+ option is not equal to 1.
+ </para>
+
+ <para>
+ All values are computed for each statement executed by every client and are
+ reported after the benchmark has finished.
</para>
<para>
@@ -2339,27 +2490,63 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
latency average = 10.870 ms
latency stddev = 7.341 ms
initial connection time = 30.954 ms
tps = 907.949122 (without initial connection time)
-statement latencies in milliseconds:
- 0.001 \set aid random(1, 100000 * :scale)
- 0.001 \set bid random(1, 1 * :scale)
- 0.001 \set tid random(1, 10 * :scale)
- 0.000 \set delta random(-5000, 5000)
- 0.046 BEGIN;
- 0.151 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
- 0.107 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
- 4.241 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
- 5.245 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
- 0.102 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
- 0.974 END;
+statement latencies in milliseconds and failures:
+ 0.002 0 \set aid random(1, 100000 * :scale)
+ 0.005 0 \set bid random(1, 1 * :scale)
+ 0.002 0 \set tid random(1, 10 * :scale)
+ 0.001 0 \set delta random(-5000, 5000)
+ 0.326 0 BEGIN;
+ 0.603 0 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.454 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 5.528 0 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 7.335 0 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.371 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.212 0 END;
</screen>
+
+ Another example of output for the default script using serializable default
+ transaction isolation level (<command>PGOPTIONS='-c
+ default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: <builtin: TPC-B (sort of)>
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+number of transactions per client: 1000
+number of transactions actually processed: 9676/10000
+number of failed transactions: 324 (3.240%)
+number of serialization failures: 324 (3.240%)
+number of transactions retried: 5629 (56.290%)
+total number of retries: 103299
+maximum number of tries: 100
+number of transactions above the 100.0 ms latency limit: 21/9676 (0.217 %)
+latency average = 16.138 ms
+latency stddev = 21.017 ms
+tps = 413.686560 (without initial connection time)
+statement latencies in milliseconds, failures and retries:
+ 0.002 0 0 \set aid random(1, 100000 * :scale)
+ 0.000 0 0 \set bid random(1, 1 * :scale)
+ 0.000 0 0 \set tid random(1, 10 * :scale)
+ 0.000 0 0 \set delta random(-5000, 5000)
+ 0.121 0 0 BEGIN;
+ 0.290 0 2 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.221 0 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 0.266 212 72127 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 0.222 112 31170 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.178 0 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.210 0 0 END;
+ </screen>
</para>
<para>
- If multiple script files are specified, the averages are reported
+ If multiple script files are specified, all statistics are reported
separately for each script file.
</para>
@@ -2373,6 +2560,139 @@ statement latencies in milliseconds:
</para>
</refsect2>
+ <refsect2 id="failures-and-retries">
+ <title id="failures-and-retries-title">Failures and Serialization/Deadlock Retries</title>
+
+ <para>
+ When executing <application>pgbench</application>, there are three main types
+ of errors:
+ <itemizedlist>
+ <listitem>
+ <para>
+ Errors of the main program. They are the most serious and always result
+ in an immediate exit from <application>pgbench</application> with
+ the corresponding error message. They include:
+ <itemizedlist>
+ <listitem>
+ <para>
+ errors during the startup of <application>pgbench</application>
+ (e.g. an invalid option value);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ errors in the initialization mode (e.g. the query to create
+ tables for built-in scripts fails);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ errors before starting threads (e.g. we could not connect to the
+ database server / a syntax error in a meta command / thread
+ creation failure);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ internal <application>pgbench</application> errors (which are
+ supposed to never occur...).
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Errors when the thread manages its clients (e.g. the client could not
+ start a connection to the database server / the socket for connecting
+ the client to the database server has become invalid). In such cases
+ all clients of this thread stop while other threads continue to work.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Direct client errors. They lead to an immediate exit from
+ <application>pgbench</application> with the corresponding error message
+ only in the case of an internal <application>pgbench</application>
+ error (which are supposed to never occur...). Otherwise in the worst
+ case they only lead to the failed client being aborted while other
+ clients continue their run (but some client errors are handled without
+ aborting the client and are reported separately; see below). Later in
+ this section it is assumed that the discussed errors are only the
+ direct client errors and that they are not internal
+ <application>pgbench</application> errors.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ A client's run is aborted in case of a serious error; for example, the
+ connection with the database server was lost or the end of the script was
+ reached without completing the last transaction. In addition, if execution
+ of an SQL or meta command fails for reasons other than serialization or
+ deadlock errors, the client is aborted. Otherwise, if an SQL command fails
+ with a serialization or deadlock error, the client is not aborted. In such
+ cases, the current transaction is rolled back, which also includes setting
+ the client variables as they were before the run of this transaction (it is
+ assumed that one transaction script contains only one transaction; see
+ <xref linkend="transactions-and-scripts" endterm="transactions-and-scripts-title"/>
+ for more information). Transactions with serialization or deadlock errors are
+ repeated after rollbacks until they complete successfully or reach the maximum
+ number of tries (specified by the <option>--max-tries</option> option) / the maximum
+ time of retries (specified by the <option>--latency-limit</option> option) / the end
+ of benchmark (specified by the <option>--time</option> option). If
+ the last try fails, this transaction will be reported as failed, but
+ the client is not aborted and continues to run.
+ </para>
+
+ <note>
+ <para>
+ Without specifying the <option>--max-tries</option> option, a transaction will
+ never be retried after a serialization or deadlock error because its default
+ value is 1. Use an unlimited number of tries (<literal>--max-tries=0</literal>)
+ and the <option>--latency-limit</option> option to limit only the maximum time
+ of tries. You can also use the <option>--time</option> option to limit the
+ benchmark duration under an unlimited number of tries.
+ </para>
+ <para>
+ Be careful when repeating scripts that contain multiple transactions: the
+ script is always retried completely, so the successful transactions can be
+ performed several times.
+ </para>
+ <para>
+ Be careful when repeating transactions with shell commands. Unlike the
+ results of SQL commands, the results of shell commands are not rolled back,
+ except for the variable value of the <command>\setshell</command> command.
+ </para>
+ </note>
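The retry rules described above can be sketched as a single predicate. This is a simplified illustration with invented names, not the actual pgbench logic; for each parameter, zero means the corresponding limit is not in use:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a transaction that just hit a serialization or deadlock
 * error may be retried.  Mirrors the documented options:
 *   max_tries        -- --max-tries (0 = unlimited)
 *   latency_limit_us -- --latency-limit (0 = none)
 *   end_time_us      -- end of the benchmark under --time (0 = none)
 */
static bool
can_retry(uint32_t tries, uint32_t max_tries,
          int64_t elapsed_us, int64_t latency_limit_us,
          int64_t now_us, int64_t end_time_us)
{
    if (max_tries > 0 && tries >= max_tries)
        return false;           /* used up the allowed number of tries */
    if (latency_limit_us > 0 && elapsed_us >= latency_limit_us)
        return false;           /* total time of all tries exceeded */
    if (end_time_us > 0 && now_us >= end_time_us)
        return false;           /* benchmark duration is over */
    return true;
}
```

With the default <literal>--max-tries=1</literal>, the first error already exhausts the allowed tries, so nothing is ever retried.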
+
+ <para>
+ The latency of a successful transaction includes the entire time of
+ transaction execution with rollbacks and retries. The latency for failed
+ transactions and commands is not computed separately.
+ </para>
+
+ <para>
+ The main report contains the number of failed transactions if it is non-zero.
+ If the total number of retried transactions is non-zero, the main report also
+ contains the statistics related to retries: the total number of retried
+ transactions and total number of retries. The per-script report inherits all
+ these fields from the main report. The per-statement report displays retry
+ statistics only if the <option>--max-tries</option> option is not equal to 1.
+ </para>
+
+ <para>
+ If you want to group failures by basic types in per-transaction and
+ aggregation logs, as well as in the main and per-script reports, use the
+ <option>--failures-detailed</option> option. If you also want to distinguish
+ all errors and failures (errors without retrying) by type including which
+ limit for retries was violated and how far it was exceeded for the
+ serialization/deadlock failures, use the <option>--verbose-errors</option>
+ option.
+ </para>
+ </refsect2>
+
<refsect2>
<title>Good Practices</title>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 3629caba42..1e3d024bde 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -74,6 +74,8 @@
#define M_PI 3.14159265358979323846
#endif
+#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"
#define ERRCODE_UNDEFINED_TABLE "42P01"
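These SQLSTATE codes are matched against the value libpq returns from <function>PQresultErrorField(res, PG_DIAG_SQLSTATE)</function>. A minimal sketch of such a classification; the enum and function names here are invented for illustration and do not appear in the patch:

```c
#include <assert.h>
#include <string.h>

/* Rough analogue of the patch's SQL-error statuses (illustrative). */
typedef enum
{
    SKETCH_SERIALIZATION_ERROR,
    SKETCH_DEADLOCK_ERROR,
    SKETCH_OTHER_SQL_ERROR
} SketchStatus;

/* Map a five-character SQLSTATE string to an error class. */
static SketchStatus
classify_sqlstate(const char *sqlstate)
{
    if (strcmp(sqlstate, "40001") == 0)     /* serialization_failure */
        return SKETCH_SERIALIZATION_ERROR;
    if (strcmp(sqlstate, "40P01") == 0)     /* deadlock_detected */
        return SKETCH_DEADLOCK_ERROR;
    return SKETCH_OTHER_SQL_ERROR;
}
```

Only the first two classes are retryable; any other SQLSTATE aborts the client as described in the documentation above.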
/*
@@ -273,9 +275,34 @@ bool progress_timestamp = false; /* progress report with Unix time */
int nclients = 1; /* number of clients */
int nthreads = 1; /* number of threads */
bool is_connect; /* establish connection for each transaction */
-bool report_per_command; /* report per-command latencies */
+bool report_per_command = false; /* report per-command latencies, retries
+ * after errors and failures (errors
+ * without retrying) */
int main_pid; /* main process id used in log filename */
+/*
+ * There are different types of restrictions for deciding that the current
+ * transaction with a serialization/deadlock error can no longer be retried and
+ * should be reported as failed:
+ * - max_tries (--max-tries) can be used to limit the number of tries;
+ * - latency_limit (-L) can be used to limit the total time of tries;
+ * - duration (-T) can be used to limit the total benchmark time.
+ *
+ * They can be combined together, and you need to use at least one of them to
+ * retry the transactions with serialization/deadlock errors. If none of them is
+ * used, the default value of max_tries is 1 and such transactions will not be
+ * retried.
+ */
+
+/*
+ * We cannot retry a transaction after the serialization/deadlock error if its
+ * number of tries reaches this maximum; if its value is zero, it is not used.
+ */
+uint32 max_tries = 1;
+
+bool failures_detailed = false; /* whether to group failures in reports
+ * or logs by basic types */
+
const char *pghost = NULL;
const char *pgport = NULL;
const char *username = NULL;
@@ -360,9 +387,65 @@ typedef int64 pg_time_usec_t;
typedef struct StatsData
{
pg_time_usec_t start_time; /* interval start time, for aggregates */
- int64 cnt; /* number of transactions, including skipped */
+
+ /*
+ * Transactions are counted depending on their execution and outcome. First
+ * a transaction may have started or not: skipped transactions occur under
+ * --rate and --latency-limit when the client is too late to execute them.
+ * Secondly, a started transaction may ultimately succeed or fail, possibly
+ * after some retries when --max-tries is not one. Thus
+ *
+ * the number of all transactions =
+ * 'skipped' (it was too late to execute them) +
+ * 'cnt' (the number of successful transactions) +
+ * failed (the number of failed transactions).
+ *
+ * A successful transaction can have several unsuccessful tries before a
+ * successful run. Thus
+ *
+ * 'cnt' (the number of successful transactions) =
+ * successfully retried transactions (they got a serialization or a
+ * deadlock error(s), but were
+ * successfully retried from the very
+ * beginning) +
+ * directly successful transactions (they were successfully completed on
+ * the first try).
+ *
+ * A failed transaction can be one of two types:
+ *
+ * failed (the number of failed transactions) =
+ * 'serialization_failures' (they got a serialization error and were not
+ * successfully retried) +
+ * 'deadlock_failures' (they got a deadlock error and were not successfully
+ * retried).
+ *
+ * If the transaction was retried after a serialization or a deadlock error
+ * this does not guarantee that this retry was successful. Thus
+ *
+ * 'retries' (number of retries) =
+ * number of retries in all retried transactions =
+ * number of retries in (successfully retried transactions +
+ * failed transactions);
+ *
+ * 'retried' (number of all retried transactions) =
+ * successfully retried transactions +
+ * failed transactions.
+ */
+ int64 cnt; /* number of successful transactions, not
+ * including 'skipped' */
int64 skipped; /* number of transactions skipped under --rate
* and --latency-limit */
+ int64 retries; /* number of retries after a serialization or a
+ * deadlock error in all the transactions */
+ int64 retried; /* number of all transactions that were retried
+ * after a serialization or a deadlock error
+ * (perhaps the last try was unsuccessful) */
+ int64 serialization_failures; /* number of transactions that were not
+ * successfully retried after a
+ * serialization error */
+ int64 deadlock_failures; /* number of transactions that were not
+ * successfully retried after a deadlock
+ * error */
SimpleStats latency;
SimpleStats lag;
} StatsData;
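The accounting identities spelled out in the StatsData comment can be written as trivial helpers (invented names, illustrative only) and checked against the serializable-mode example from the documentation changes, where 9676 successful plus 324 failed transactions make up the 10000 total:

```c
#include <assert.h>
#include <stdint.h>

/* failed = serialization_failures + deadlock_failures */
static int64_t
failed_count(int64_t serialization_failures, int64_t deadlock_failures)
{
    return serialization_failures + deadlock_failures;
}

/* all transactions = skipped + cnt (successful) + failed */
static int64_t
all_transactions(int64_t skipped, int64_t cnt,
                 int64_t serialization_failures, int64_t deadlock_failures)
{
    return skipped + cnt
        + failed_count(serialization_failures, deadlock_failures);
}
```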
@@ -375,6 +458,30 @@ typedef struct RandomState
unsigned short xseed[3];
} RandomState;
+/*
+ * Data structure for repeating a transaction from the beginning with the same
+ * parameters.
+ */
+typedef struct
+{
+ RandomState random_state; /* random seed */
+ Variables variables; /* client variables */
+} RetryState;
+
+/*
+ * Error status for errors during script execution.
+ */
+typedef enum EStatus
+{
+ ESTATUS_NO_ERROR = 0,
+ ESTATUS_META_COMMAND_ERROR,
+
+ /* SQL errors */
+ ESTATUS_SERIALIZATION_ERROR,
+ ESTATUS_DEADLOCK_ERROR,
+ ESTATUS_OTHER_SQL_ERROR
+} EStatus;
+
/* Various random sequences are initialized from this one. */
static RandomState base_random_sequence;
@@ -446,6 +553,35 @@ typedef enum
CSTATE_END_COMMAND,
CSTATE_SKIP_COMMAND,
+ /*
+ * States for failed commands.
+ *
+ * If the SQL/meta command fails, in CSTATE_ERROR clean up after an error:
+ * - clear the conditional stack;
+ * - if we have an unterminated (possibly failed) transaction block, send
+ * the rollback command to the server and wait for the result in
+ * CSTATE_WAIT_ROLLBACK_RESULT. If something goes wrong with rolling back,
+ * go to CSTATE_ABORTED.
+ *
+ * But if everything is ok we are ready for future transactions: if this is
+ * a serialization or deadlock error and we can re-execute the transaction
+ * from the very beginning, go to CSTATE_RETRY; otherwise go to
+ * CSTATE_FAILURE.
+ *
+ * In CSTATE_RETRY report an error, set the same parameters for the
+ * transaction execution as in the previous tries and process the first
+ * transaction command in CSTATE_START_COMMAND.
+ *
+ * In CSTATE_FAILURE report a failure, set the parameters for the
+ * transaction execution as they were before the first run of this
+ * transaction (except for a random state) and go to CSTATE_END_TX to
+ * complete this transaction.
+ */
+ CSTATE_ERROR,
+ CSTATE_WAIT_ROLLBACK_RESULT,
+ CSTATE_RETRY,
+ CSTATE_FAILURE,
+
/*
* CSTATE_END_TX performs end-of-transaction processing. It calculates
* latency, and logs the transaction. In --connect mode, it closes the
@@ -494,8 +630,20 @@ typedef struct
bool prepared[MAX_SCRIPTS]; /* whether client prepared the script */
+ /*
+ * For processing failures and repeating transactions with serialization or
+ * deadlock errors:
+ */
+ EStatus estatus; /* the error status of the current transaction
+ * execution; this is ESTATUS_NO_ERROR if there were
+ * no errors */
+ RetryState retry_state;
+ uint32 tries; /* how many times have we already tried the
+ * current transaction? */
+
/* per client collected stats */
- int64 cnt; /* client transaction count, for -t */
+ int64 cnt; /* client transaction count, for -t; skipped and
+ * failed transactions are also counted here */
} CState;
/*
@@ -590,6 +738,9 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
* aset do gset on all possible queries of a combined query (\;).
* expr Parsed expression, if needed.
* stats Time spent in this command.
+ * retries Number of retries after a serialization or deadlock error in the
+ * current command.
+ * failures Number of errors in the current command that were not retried.
*/
typedef struct Command
{
@@ -602,6 +753,8 @@ typedef struct Command
char *varprefix;
PgBenchExpr *expr;
SimpleStats stats;
+ int64 retries;
+ int64 failures;
} Command;
typedef struct ParsedScript
@@ -616,6 +769,8 @@ static ParsedScript sql_script[MAX_SCRIPTS]; /* SQL script files */
static int num_scripts; /* number of scripts in sql_script[] */
static int64 total_weight = 0;
+static bool verbose_errors = false; /* print verbose messages of all errors */
+
/* Builtin test scripts */
typedef struct BuiltinScript
{
@@ -753,15 +908,18 @@ usage(void)
" protocol for submitting queries (default: simple)\n"
" -n, --no-vacuum do not run VACUUM before tests\n"
" -P, --progress=NUM show thread progress report every NUM seconds\n"
- " -r, --report-latencies report average latency per command\n"
+ " -r, --report-per-command report latencies, failures and retries per command\n"
" -R, --rate=NUM target rate in transactions per second\n"
" -s, --scale=NUM report this scale factor in output\n"
" -t, --transactions=NUM number of transactions each client runs (default: 10)\n"
" -T, --time=NUM duration of benchmark test in seconds\n"
" -v, --vacuum-all vacuum all four standard tables before tests\n"
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
+ " --failures-detailed report the failures grouped by basic types\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
+ " --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --verbose-errors print messages of all errors\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1307,6 +1465,10 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->start_time = start;
sd->cnt = 0;
sd->skipped = 0;
+ sd->retries = 0;
+ sd->retried = 0;
+ sd->serialization_failures = 0;
+ sd->deadlock_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1315,22 +1477,51 @@ initStats(StatsData *sd, pg_time_usec_t start)
* Accumulate one additional item into the given stats object.
*/
static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumStats(StatsData *stats, bool skipped, double lat, double lag,
+ EStatus estatus, int64 tries)
{
- stats->cnt++;
-
+ /* Record the skipped transaction */
if (skipped)
{
/* no latency to record on skipped transactions */
stats->skipped++;
+ return;
}
- else
+
+ /*
+ * Record the number of retries regardless of whether the transaction was
+ * successful or failed.
+ */
+ if (tries > 1)
{
- addToSimpleStats(&stats->latency, lat);
+ stats->retries += (tries - 1);
+ stats->retried++;
+ }
- /* and possibly the same for schedule lag */
- if (throttle_delay)
- addToSimpleStats(&stats->lag, lag);
+ switch (estatus)
+ {
+ /* Record the successful transaction */
+ case ESTATUS_NO_ERROR:
+ stats->cnt++;
+
+ addToSimpleStats(&stats->latency, lat);
+
+ /* and possibly the same for schedule lag */
+ if (throttle_delay)
+ addToSimpleStats(&stats->lag, lag);
+ break;
+
+ /* Record the failed transaction */
+ case ESTATUS_SERIALIZATION_ERROR:
+ stats->serialization_failures++;
+ break;
+ case ESTATUS_DEADLOCK_ERROR:
+ stats->deadlock_failures++;
+ break;
+ default:
+ /* internal error which should never occur */
+ pg_log_fatal("unexpected error status: %d", estatus);
+ exit(1);
}
}
@@ -2861,6 +3052,9 @@ preparedStatementName(char *buffer, int file, int state)
sprintf(buffer, "P%d_%d", file, state);
}
+/*
+ * Report the abort of the client while processing SQL commands.
+ */
static void
commandFailed(CState *st, const char *cmd, const char *message)
{
@@ -2868,6 +3062,17 @@ commandFailed(CState *st, const char *cmd, const char *message)
st->id, st->command, cmd, st->use_file, message);
}
+/*
+ * Report an error in the current command during script execution.
+ */
+static void
+commandError(CState *st, const char *message)
+{
+ Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND);
+ pg_log_info("client %d got an error in command %d (SQL) of script %d; %s",
+ st->id, st->command, st->use_file, message);
+}
+
/* return a script number with a weighted choice. */
static int
chooseScript(TState *thread)
@@ -2975,6 +3180,33 @@ sendCommand(CState *st, Command *command)
return true;
}
+/*
+ * Get the error status from the error code.
+ */
+static EStatus
+getSQLErrorStatus(const char *sqlState)
+{
+ if (sqlState != NULL)
+ {
+ if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
+ return ESTATUS_SERIALIZATION_ERROR;
+ else if (strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
+ return ESTATUS_DEADLOCK_ERROR;
+ }
+
+ return ESTATUS_OTHER_SQL_ERROR;
+}
+
+/*
+ * Returns true if this type of error can be retried.
+ */
+static bool
+canRetryError(EStatus estatus)
+{
+ return (estatus == ESTATUS_SERIALIZATION_ERROR ||
+ estatus == ESTATUS_DEADLOCK_ERROR);
+}
+
/*
* Process query response from the backend.
*
@@ -3017,6 +3249,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
{
pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
st->id, st->use_file, st->command, qrynum, 0);
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
break;
@@ -3031,6 +3264,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
/* under \gset, report the error */
pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
st->id, st->use_file, st->command, qrynum, PQntuples(res));
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
else if (meta == META_ASET && ntuples <= 0)
@@ -3055,6 +3289,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
/* internal error */
pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
st->id, st->use_file, st->command, qrynum, varname);
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3072,6 +3307,20 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
PQerrorMessage(st->con));
break;
+ case PGRES_NONFATAL_ERROR:
+ case PGRES_FATAL_ERROR:
+ st->estatus = getSQLErrorStatus(
+ PQresultErrorField(res, PG_DIAG_SQLSTATE));
+ if (canRetryError(st->estatus))
+ {
+ if (verbose_errors)
+ commandError(st, PQerrorMessage(st->con));
+ if (PQpipelineStatus(st->con) == PQ_PIPELINE_ABORTED)
+ PQpipelineSync(st->con);
+ goto error;
+ }
+ /* fall through */
+
default:
/* anything else is unexpected */
pg_log_error("client %d script %d aborted in command %d query %d: %s",
@@ -3150,6 +3399,165 @@ evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
return true;
}
+/*
+ * Clear the variables in the array. The array itself is not freed.
+ */
+static void
+clearVariables(Variables *variables)
+{
+ Variable *vars,
+ *var;
+ int nvars;
+
+ if (!variables)
+ return; /* nothing to do here */
+
+ vars = variables->vars;
+ nvars = variables->nvars;
+ for (var = vars; var - vars < nvars; ++var)
+ {
+ pg_free(var->name);
+ pg_free(var->svalue);
+ }
+
+ variables->nvars = 0;
+}
+
+/*
+ * Make a deep copy of variables array.
+ * Before copying the function frees the string fields of the destination
+ * variables and if necessary enlarges their array.
+ */
+static void
+copyVariables(Variables *dest, const Variables *source)
+{
+ Variable *dest_var;
+ const Variable *source_var;
+
+ if (!dest || !source || dest == source)
+ return; /* nothing to do here */
+
+ /*
+ * Clear the original variables and make sure that we have enough space for
+ * the new variables.
+ */
+ clearVariables(dest);
+ enlargeVariables(dest, source->nvars);
+
+ /* Make a deep copy of variables array */
+ for (source_var = source->vars, dest_var = dest->vars;
+ source_var - source->vars < source->nvars;
+ ++source_var, ++dest_var)
+ {
+ dest_var->name = pg_strdup(source_var->name);
+ dest_var->svalue = (source_var->svalue == NULL) ?
+ NULL : pg_strdup(source_var->svalue);
+ dest_var->value = source_var->value;
+ }
+ dest->nvars = source->nvars;
+ dest->vars_sorted = source->vars_sorted;
+}
+
+/*
+ * Returns true if the error can be retried.
+ */
+static bool
+doRetry(CState *st, pg_time_usec_t *now)
+{
+ Assert(st->estatus != ESTATUS_NO_ERROR);
+
+ /* We can only retry serialization or deadlock errors. */
+ if (!canRetryError(st->estatus))
+ return false;
+
+ /*
+ * We must have at least one option to limit the retrying of transactions
+ * that got an error.
+ */
+ Assert(max_tries || latency_limit || duration > 0);
+
+ /*
+ * We cannot retry the error if we have reached the maximum number of tries.
+ */
+ if (max_tries && st->tries >= max_tries)
+ return false;
+
+ /*
+ * We cannot retry the error if the benchmark duration is over.
+ */
+ if (timer_exceeded)
+ return false;
+
+ /*
+ * We cannot retry the error if we spent too much time on this transaction.
+ */
+ if (latency_limit)
+ {
+ pg_time_now_lazy(now);
+ if (*now - st->txn_scheduled > latency_limit)
+ return false;
+ }
+
+ /* OK */
+ return true;
+}
+
+/*
+ * Set in_tx_block to true if we are in a (failed) transaction block and false
+ * otherwise.
+ * Returns false on failure (broken connection or internal error).
+ */
+static bool
+checkTransactionStatus(PGconn *con, bool *in_tx_block)
+{
+ PGTransactionStatusType tx_status;
+
+ tx_status = PQtransactionStatus(con);
+ switch (tx_status)
+ {
+ case PQTRANS_IDLE:
+ *in_tx_block = false;
+ break;
+ case PQTRANS_INTRANS:
+ case PQTRANS_INERROR:
+ *in_tx_block = true;
+ break;
+ case PQTRANS_UNKNOWN:
+ /* PQTRANS_UNKNOWN is expected given a broken connection */
+ if (PQstatus(con) == CONNECTION_BAD)
+ { /* there's something wrong */
+ pg_log_error("perhaps the backend died while processing");
+ return false;
+ }
+ /* fall through */
+ case PQTRANS_ACTIVE:
+ default:
+ /*
+ * We cannot find out whether we are in a transaction block or not.
+ * Internal error which should never occur.
+ */
+ pg_log_error("unexpected transaction status %d", tx_status);
+ return false;
+ }
+
+ /* OK */
+ return true;
+}
+
+/*
+ * If the latency limit is used, return the current transaction latency as a
+ * percentage of the latency limit. Otherwise return zero.
+ */
+static double
+getLatencyUsed(CState *st, pg_time_usec_t *now)
+{
+ if (!latency_limit)
+ return 0.0;
+
+ pg_time_now_lazy(now);
+ return (100.0 * (*now - st->txn_scheduled) / latency_limit);
+}
+
/*
* Advance the state machine of a connection.
*/
@@ -3179,6 +3587,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
for (;;)
{
Command *command;
+ PGresult *res;
+ bool in_tx_block;
switch (st->state)
{
@@ -3187,6 +3597,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
st->use_file = chooseScript(thread);
Assert(conditional_stack_empty(st->cstack));
+ /* reset transaction variables to default values */
+ st->estatus = ESTATUS_NO_ERROR;
+ st->tries = 1;
+
pg_log_debug("client %d executing script \"%s\"",
st->id, sql_script[st->use_file].desc);
@@ -3223,6 +3637,14 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
memset(st->prepared, 0, sizeof(st->prepared));
}
+ /*
+ * This is the first try of this transaction. Remember its
+ * parameters: if it fails with a retryable error we will need
+ * to run it again from the same starting state.
+ */
+ st->retry_state.random_state = st->cs_func_rs;
+ copyVariables(&st->retry_state.variables, &st->variables);
+
/* record transaction start time */
st->txn_begin = now;
@@ -3374,6 +3796,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
* - else CSTATE_END_COMMAND
*/
st->state = executeMetaCommand(st, &now);
+ if (st->state == CSTATE_ABORTED)
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
}
/*
@@ -3512,6 +3936,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
+ else if (canRetryError(st->estatus))
+ st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
break;
@@ -3558,6 +3984,187 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
CSTATE_START_COMMAND : CSTATE_SKIP_COMMAND;
break;
+ /*
+ * Clean up after an error.
+ */
+ case CSTATE_ERROR:
+
+ Assert(st->estatus != ESTATUS_NO_ERROR);
+
+ /* Clear the conditional stack */
+ conditional_stack_reset(st->cstack);
+
+ /*
+ * Check whether we are inside a (possibly failed) transaction
+ * block, and roll it back if so.
+ */
+
+ if (!checkTransactionStatus(st->con, &in_tx_block))
+ {
+ /*
+ * Something went wrong; checkTransactionStatus() has already
+ * printed a more detailed error message.
+ */
+ pg_log_error("client %d aborted while receiving the transaction status", st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+
+ if (in_tx_block)
+ {
+ /* Try to rollback a (failed) transaction block. */
+ if (!PQsendQuery(st->con, "ROLLBACK"))
+ {
+ pg_log_error("client %d aborted: failed to send SQL command for rolling back the failed transaction",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ }
+ else
+ st->state = CSTATE_WAIT_ROLLBACK_RESULT;
+ }
+ else
+ {
+ /* Check if we can retry the error. */
+ st->state = doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
+ }
+ break;
+
+ /*
+ * Wait for the rollback command to complete
+ */
+ case CSTATE_WAIT_ROLLBACK_RESULT:
+ pg_log_debug("client %d receiving", st->id);
+ if (!PQconsumeInput(st->con))
+ {
+ pg_log_error("client %d aborted while rolling back the transaction after an error; perhaps the backend died while processing",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ if (PQisBusy(st->con))
+ return; /* don't have the whole result yet */
+
+ /*
+ * Read and discard the query result.
+ */
+ res = PQgetResult(st->con);
+ switch (PQresultStatus(res))
+ {
+ case PGRES_COMMAND_OK:
+ /* OK */
+ PQclear(res);
+ do
+ {
+ res = PQgetResult(st->con);
+ if (res)
+ PQclear(res);
+ } while (res);
+ /* Check if we can retry the error. */
+ st->state =
+ doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
+ break;
+ default:
+ pg_log_error("client %d aborted while rolling back the transaction after an error; %s",
+ st->id, PQerrorMessage(st->con));
+ PQclear(res);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ break;
+
+ /*
+ * Retry the transaction after an error.
+ */
+ case CSTATE_RETRY:
+ command = sql_script[st->use_file].commands[st->command];
+
+ /*
+ * Inform that the transaction will be retried after the error.
+ */
+ if (verbose_errors)
+ {
+ PQExpBufferData buf;
+
+ initPQExpBuffer(&buf);
+
+ printfPQExpBuffer(&buf, "client %d repeats the transaction after the error (try %d",
+ st->id, st->tries);
+ if (max_tries)
+ appendPQExpBuffer(&buf, "/%d", max_tries);
+ if (latency_limit)
+ appendPQExpBuffer(&buf, ", %.3f%% of the maximum time of tries was used",
+ getLatencyUsed(st, &now));
+ appendPQExpBuffer(&buf, ")\n");
+
+ pg_log_info("%s", buf.data);
+
+ termPQExpBuffer(&buf);
+ }
+
+ /* Count tries and retries */
+ st->tries++;
+ if (report_per_command)
+ command->retries++;
+
+ /*
+ * Reset the execution parameters as they were at the beginning
+ * of the transaction.
+ */
+ st->cs_func_rs = st->retry_state.random_state;
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* Process the first transaction command. */
+ st->command = 0;
+ st->estatus = ESTATUS_NO_ERROR;
+ st->state = CSTATE_START_COMMAND;
+ break;
+
+ /*
+ * Complete the failed transaction.
+ */
+ case CSTATE_FAILURE:
+ command = sql_script[st->use_file].commands[st->command];
+
+ /* Accumulate the failure. */
+ if (report_per_command)
+ command->failures++;
+
+ /*
+ * Inform that the failed transaction will not be retried.
+ */
+ if (verbose_errors)
+ {
+ PQExpBufferData buf;
+
+ initPQExpBuffer(&buf);
+
+ printfPQExpBuffer(&buf, "client %d ends the failed transaction (try %d",
+ st->id, st->tries);
+ if (max_tries)
+ appendPQExpBuffer(&buf, "/%d", max_tries);
+ if (latency_limit)
+ appendPQExpBuffer(&buf, ", %.3f%% of the maximum time of tries was used",
+ getLatencyUsed(st, &now));
+ else if (timer_exceeded)
+ appendPQExpBuffer(&buf, ", the duration is exceeded");
+ appendPQExpBuffer(&buf, ")\n");
+
+ pg_log_info("%s", buf.data);
+
+ termPQExpBuffer(&buf);
+ }
+
+ /*
+ * Reset the execution parameters as they were at the beginning
+ * of the transaction except for a random state.
+ */
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* End the failed transaction. */
+ st->state = CSTATE_END_TX;
+ break;
+
/*
* End of transaction (end of script, really).
*/
@@ -3572,6 +4179,29 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
*/
Assert(conditional_stack_empty(st->cstack));
+ /*
+ * We must complete all the transaction blocks that were
+ * started in this script.
+ */
+ if (!checkTransactionStatus(st->con, &in_tx_block))
+ {
+ /*
+ * Something went wrong; checkTransactionStatus() has already
+ * printed a more detailed error message.
+ */
+ pg_log_error("client %d aborted while receiving the transaction status", st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ if (in_tx_block)
+ {
+ pg_log_error("client %d aborted: end of script reached without completing the last transaction",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+
if (is_connect)
{
finishCon(st);
@@ -3803,6 +4433,43 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
return CSTATE_END_COMMAND;
}
+/*
+ * Return the number of failed transactions.
+ */
+static int64
+getFailures(const StatsData *stats)
+{
+ return (stats->serialization_failures +
+ stats->deadlock_failures);
+}
+
+/*
+ * Return a string constant representing the result of a transaction
+ * that is not successfully processed.
+ */
+static const char *
+getResultString(bool skipped, EStatus estatus)
+{
+ if (skipped)
+ return "skipped";
+ else if (failures_detailed)
+ {
+ switch (estatus)
+ {
+ case ESTATUS_SERIALIZATION_ERROR:
+ return "serialization_failure";
+ case ESTATUS_DEADLOCK_ERROR:
+ return "deadlock_failure";
+ default:
+ /* internal error which should never occur */
+ pg_log_fatal("unexpected error status: %d", estatus);
+ exit(1);
+ }
+ }
+ else
+ return "failed";
+}
+
/*
* Print log entry after completing one transaction.
*
@@ -3847,6 +4514,14 @@ doLog(TState *thread, CState *st,
agg->latency.sum2,
agg->latency.min,
agg->latency.max);
+
+ if (failures_detailed)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->serialization_failures,
+ agg->deadlock_failures);
+ else
+ fprintf(logfile, " " INT64_FORMAT, getFailures(agg));
+
if (throttle_delay)
{
fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -3857,6 +4532,10 @@ doLog(TState *thread, CState *st,
if (latency_limit)
fprintf(logfile, " " INT64_FORMAT, agg->skipped);
}
+ if (max_tries != 1)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->retried,
+ agg->retries);
fputc('\n', logfile);
/* reset data and move to next interval */
@@ -3864,22 +4543,26 @@ doLog(TState *thread, CState *st,
}
/* accumulate the current transaction */
- accumStats(agg, skipped, latency, lag);
+ accumStats(agg, skipped, latency, lag, st->estatus, st->tries);
}
else
{
/* no, print raw transactions */
- if (skipped)
- fprintf(logfile, "%d " INT64_FORMAT " skipped %d " INT64_FORMAT " "
- INT64_FORMAT,
- st->id, st->cnt, st->use_file, now / 1000000, now % 1000000);
- else
+ if (!skipped && st->estatus == ESTATUS_NO_ERROR)
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d " INT64_FORMAT " "
INT64_FORMAT,
st->id, st->cnt, latency, st->use_file,
now / 1000000, now % 1000000);
+ else
+ fprintf(logfile, "%d " INT64_FORMAT " %s %d " INT64_FORMAT " "
+ INT64_FORMAT,
+ st->id, st->cnt, getResultString(skipped, st->estatus),
+ st->use_file, now / 1000000, now % 1000000);
+
if (throttle_delay)
fprintf(logfile, " %.0f", lag);
+ if (max_tries != 1)
+ fprintf(logfile, " %d", st->tries - 1);
fputc('\n', logfile);
}
}
@@ -3888,7 +4571,8 @@ doLog(TState *thread, CState *st,
* Accumulate and report statistics at end of a transaction.
*
* (This is also called when a transaction is late and thus skipped.
- * Note that even skipped transactions are counted in the "cnt" fields.)
+ * Note that even skipped and failed transactions are counted in the CState
+ * "cnt" field.)
*/
static void
processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
@@ -3896,10 +4580,10 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
{
double latency = 0.0,
lag = 0.0;
- bool thread_details = progress || throttle_delay || latency_limit,
- detailed = thread_details || use_log || per_script_stats;
+ bool detailed = progress || throttle_delay || latency_limit ||
+ use_log || per_script_stats;
- if (detailed && !skipped)
+ if (detailed && !skipped && st->estatus == ESTATUS_NO_ERROR)
{
pg_time_now_lazy(now);
@@ -3908,20 +4592,12 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
lag = st->txn_begin - st->txn_scheduled;
}
- if (thread_details)
- {
- /* keep detailed thread stats */
- accumStats(&thread->stats, skipped, latency, lag);
+ /* keep detailed thread stats */
+ accumStats(&thread->stats, skipped, latency, lag, st->estatus, st->tries);
- /* count transactions over the latency limit, if needed */
- if (latency_limit && latency > latency_limit)
- thread->latency_late++;
- }
- else
- {
- /* no detailed stats, just count */
- thread->stats.cnt++;
- }
+ /* count transactions over the latency limit, if needed */
+ if (latency_limit && latency > latency_limit)
+ thread->latency_late++;
/* client stat is just counting */
st->cnt++;
@@ -3931,7 +4607,8 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
/* XXX could use a mutex here, but we choose not to */
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+ st->estatus, st->tries);
}
@@ -4778,6 +5455,8 @@ create_sql_command(PQExpBuffer buf, const char *source)
my_command->type = SQL_COMMAND;
my_command->meta = META_NONE;
my_command->argc = 0;
+ my_command->retries = 0;
+ my_command->failures = 0;
memset(my_command->argv, 0, sizeof(my_command->argv));
my_command->varprefix = NULL; /* allocated later, if needed */
my_command->expr = NULL;
@@ -5446,7 +6125,9 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
{
/* generate and show report */
pg_time_usec_t run = now - *last_report;
- int64 ntx;
+ int64 cnt,
+ failures,
+ retried;
double tps,
total_run,
latency,
@@ -5473,23 +6154,30 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
mergeSimpleStats(&cur.lag, &threads[i].stats.lag);
cur.cnt += threads[i].stats.cnt;
cur.skipped += threads[i].stats.skipped;
+ cur.retries += threads[i].stats.retries;
+ cur.retried += threads[i].stats.retried;
+ cur.serialization_failures +=
+ threads[i].stats.serialization_failures;
+ cur.deadlock_failures += threads[i].stats.deadlock_failures;
}
/* we count only actually executed transactions */
- ntx = (cur.cnt - cur.skipped) - (last->cnt - last->skipped);
+ cnt = cur.cnt - last->cnt;
total_run = (now - test_start) / 1000000.0;
- tps = 1000000.0 * ntx / run;
- if (ntx > 0)
+ tps = 1000000.0 * cnt / run;
+ if (cnt > 0)
{
- latency = 0.001 * (cur.latency.sum - last->latency.sum) / ntx;
- sqlat = 1.0 * (cur.latency.sum2 - last->latency.sum2) / ntx;
+ latency = 0.001 * (cur.latency.sum - last->latency.sum) / cnt;
+ sqlat = 1.0 * (cur.latency.sum2 - last->latency.sum2) / cnt;
stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
- lag = 0.001 * (cur.lag.sum - last->lag.sum) / ntx;
+ lag = 0.001 * (cur.lag.sum - last->lag.sum) / cnt;
}
else
{
latency = sqlat = stdev = lag = 0;
}
+ failures = getFailures(&cur) - getFailures(last);
+ retried = cur.retried - last->retried;
if (progress_timestamp)
{
@@ -5502,8 +6190,8 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
}
fprintf(stderr,
- "progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
- tbuf, tps, latency, stdev);
+ "progress: %s, %.1f tps, lat %.3f ms stddev %.3f, " INT64_FORMAT " failed",
+ tbuf, tps, latency, stdev, failures);
if (throttle_delay)
{
@@ -5512,6 +6200,12 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
fprintf(stderr, ", " INT64_FORMAT " skipped",
cur.skipped - last->skipped);
}
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (max_tries != 1)
+ fprintf(stderr,
+ ", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
+ retried, cur.retries - last->retries);
fprintf(stderr, "\n");
*last = cur;
@@ -5571,9 +6265,10 @@ printResults(StatsData *total,
int64 latency_late)
{
/* tps is about actually executed transactions during benchmarking */
- int64 ntx = total->cnt - total->skipped;
+ int64 failures = getFailures(total);
+ int64 total_cnt = total->cnt + total->skipped + failures;
double bench_duration = PG_TIME_GET_DOUBLE(total_duration);
- double tps = ntx / bench_duration;
+ double tps = total->cnt / bench_duration;
/* Report test parameters. */
printf("transaction type: %s\n",
@@ -5590,35 +6285,65 @@ printResults(StatsData *total,
{
printf("number of transactions per client: %d\n", nxacts);
printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
- ntx, nxacts * nclients);
+ total->cnt, nxacts * nclients);
}
else
{
printf("duration: %d s\n", duration);
printf("number of transactions actually processed: " INT64_FORMAT "\n",
- ntx);
+ total->cnt);
}
+ if (failures > 0)
+ {
+ printf("number of failed transactions: " INT64_FORMAT " (%.3f%%)\n",
+ failures, 100.0 * failures / total_cnt);
+
+ if (failures_detailed)
+ {
+ if (total->serialization_failures)
+ printf("number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->serialization_failures,
+ 100.0 * total->serialization_failures / total_cnt);
+ if (total->deadlock_failures)
+ printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->deadlock_failures,
+ 100.0 * total->deadlock_failures / total_cnt);
+ }
+ }
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (total->retried > 0)
+ {
+ printf("number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
+ total->retried, 100.0 * total->retried / total_cnt);
+ printf("total number of retries: " INT64_FORMAT "\n", total->retries);
+ }
+
+ if (max_tries)
+ printf("maximum number of tries: %d\n", max_tries);
+
/* Remaining stats are nonsensical if we failed to execute any xacts */
- if (total->cnt <= 0)
+ if (total->cnt + total->skipped <= 0)
return;
if (throttle_delay && latency_limit)
printf("number of transactions skipped: " INT64_FORMAT " (%.3f %%)\n",
- total->skipped, 100.0 * total->skipped / total->cnt);
+ total->skipped, 100.0 * total->skipped / total_cnt);
if (latency_limit)
printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "/" INT64_FORMAT " (%.3f %%)\n",
- latency_limit / 1000.0, latency_late, ntx,
- (ntx > 0) ? 100.0 * latency_late / ntx : 0.0);
+ latency_limit / 1000.0, latency_late, total->cnt,
+ (total->cnt > 0) ? 100.0 * latency_late / total->cnt : 0.0);
if (throttle_delay || progress || latency_limit)
printSimpleStats("latency", &total->latency);
else
{
/* no measurement, show average latency computed from run time */
- printf("latency average = %.3f ms\n",
- 0.001 * total_duration * nclients / total->cnt);
+ printf("latency average = %.3f ms%s\n",
+ 0.001 * total_duration * nclients / total_cnt,
+ failures > 0 ? " (including failures)" : "");
}
if (throttle_delay)
@@ -5644,7 +6369,7 @@ printResults(StatsData *total,
*/
if (is_connect)
{
- printf("average connection time = %.3f ms\n", 0.001 * conn_total_duration / total->cnt);
+ printf("average connection time = %.3f ms\n", 0.001 * conn_total_duration / (total->cnt + failures));
printf("tps = %f (including reconnection times)\n", tps);
}
else
@@ -5663,6 +6388,9 @@ printResults(StatsData *total,
if (per_script_stats)
{
StatsData *sstats = &sql_script[i].stats;
+ int64 script_failures = getFailures(sstats);
+ int64 script_total_cnt =
+ sstats->cnt + sstats->skipped + script_failures;
printf("SQL script %d: %s\n"
" - weight: %d (targets %.1f%% of total)\n"
@@ -5672,25 +6400,60 @@ printResults(StatsData *total,
100.0 * sql_script[i].weight / total_weight,
sstats->cnt,
100.0 * sstats->cnt / total->cnt,
- (sstats->cnt - sstats->skipped) / bench_duration);
+ sstats->cnt / bench_duration);
- if (throttle_delay && latency_limit && sstats->cnt > 0)
+ if (failures > 0)
+ {
+ printf(" - number of failed transactions: " INT64_FORMAT " (%.3f%%)\n",
+ script_failures,
+ 100.0 * script_failures / script_total_cnt);
+
+ if (failures_detailed)
+ {
+ if (total->serialization_failures)
+ printf(" - number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->serialization_failures,
+ (100.0 * sstats->serialization_failures /
+ script_total_cnt));
+ if (total->deadlock_failures)
+ printf(" - number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->deadlock_failures,
+ (100.0 * sstats->deadlock_failures /
+ script_total_cnt));
+ }
+ }
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (total->retried > 0)
+ {
+ printf(" - number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->retried,
+ 100.0 * sstats->retried / script_total_cnt);
+ printf(" - total number of retries: " INT64_FORMAT "\n",
+ sstats->retries);
+ }
+
+ if (throttle_delay && latency_limit && script_total_cnt > 0)
printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
sstats->skipped,
- 100.0 * sstats->skipped / sstats->cnt);
+ 100.0 * sstats->skipped / script_total_cnt);
printSimpleStats(" - latency", &sstats->latency);
}
- /* Report per-command latencies */
+ /*
+ * Report per-command statistics: latencies, retries after errors,
+ * failures (errors without retrying).
+ */
if (report_per_command)
{
Command **commands;
- if (per_script_stats)
- printf(" - statement latencies in milliseconds:\n");
- else
- printf("statement latencies in milliseconds:\n");
+ printf("%sstatement latencies in milliseconds%s:\n",
+ per_script_stats ? " - " : "",
+ (max_tries == 1 ?
+ " and failures" :
+ ", failures and retries"));
for (commands = sql_script[i].commands;
*commands != NULL;
@@ -5698,10 +6461,19 @@ printResults(StatsData *total,
{
SimpleStats *cstats = &(*commands)->stats;
- printf(" %11.3f %s\n",
- (cstats->count > 0) ?
- 1000.0 * cstats->sum / cstats->count : 0.0,
- (*commands)->first_line);
+ if (max_tries == 1)
+ printf(" %11.3f %10" INT64_MODIFIER "d %s\n",
+ (cstats->count > 0) ?
+ 1000.0 * cstats->sum / cstats->count : 0.0,
+ (*commands)->failures,
+ (*commands)->first_line);
+ else
+ printf(" %11.3f %10" INT64_MODIFIER "d %10" INT64_MODIFIER "d %s\n",
+ (cstats->count > 0) ?
+ 1000.0 * cstats->sum / cstats->count : 0.0,
+ (*commands)->failures,
+ (*commands)->retries,
+ (*commands)->first_line);
}
}
}
@@ -5782,7 +6554,7 @@ main(int argc, char **argv)
{"progress", required_argument, NULL, 'P'},
{"protocol", required_argument, NULL, 'M'},
{"quiet", no_argument, NULL, 'q'},
- {"report-latencies", no_argument, NULL, 'r'},
+ {"report-per-command", no_argument, NULL, 'r'},
{"rate", required_argument, NULL, 'R'},
{"scale", required_argument, NULL, 's'},
{"select-only", no_argument, NULL, 'S'},
@@ -5804,6 +6576,9 @@ main(int argc, char **argv)
{"show-script", required_argument, NULL, 10},
{"partitions", required_argument, NULL, 11},
{"partition-method", required_argument, NULL, 12},
+ {"failures-detailed", no_argument, NULL, 13},
+ {"max-tries", required_argument, NULL, 14},
+ {"verbose-errors", no_argument, NULL, 15},
{NULL, 0, NULL, 0}
};
@@ -6172,6 +6947,28 @@ main(int argc, char **argv)
exit(1);
}
break;
+ case 13: /* failures-detailed */
+ benchmarking_option_set = true;
+ failures_detailed = true;
+ break;
+ case 14: /* max-tries */
+ {
+ int32 max_tries_arg = atoi(optarg);
+
+ if (max_tries_arg < 0)
+ {
+ pg_log_fatal("invalid number of maximum tries: \"%s\"", optarg);
+ exit(1);
+ }
+
+ benchmarking_option_set = true;
+ max_tries = (uint32) max_tries_arg;
+ }
+ break;
+ case 15: /* verbose-errors */
+ benchmarking_option_set = true;
+ verbose_errors = true;
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
@@ -6353,6 +7150,15 @@ main(int argc, char **argv)
exit(1);
}
+ if (!max_tries)
+ {
+ if (!latency_limit && duration <= 0)
+ {
+ pg_log_fatal("an unlimited number of transaction tries can only be used with --latency-limit or a duration (-T)");
+ exit(1);
+ }
+ }
+
/*
* save main process id in the global variable because process id will be
* changed after fork.
@@ -6561,6 +7367,10 @@ main(int argc, char **argv)
mergeSimpleStats(&stats.lag, &thread->stats.lag);
stats.cnt += thread->stats.cnt;
stats.skipped += thread->stats.skipped;
+ stats.retries += thread->stats.retries;
+ stats.retried += thread->stats.retried;
+ stats.serialization_failures += thread->stats.serialization_failures;
+ stats.deadlock_failures += thread->stats.deadlock_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
@@ -6709,7 +7519,8 @@ threadRun(void *arg)
if (min_usec > this_usec)
min_usec = this_usec;
}
- else if (st->state == CSTATE_WAIT_RESULT)
+ else if (st->state == CSTATE_WAIT_RESULT ||
+ st->state == CSTATE_WAIT_ROLLBACK_RESULT)
{
/*
* waiting for result from server - nothing to do unless the
@@ -6798,7 +7609,8 @@ threadRun(void *arg)
{
CState *st = &state[i];
- if (st->state == CSTATE_WAIT_RESULT)
+ if (st->state == CSTATE_WAIT_RESULT ||
+ st->state == CSTATE_WAIT_ROLLBACK_RESULT)
{
/* don't call advanceConnectionState unless data is available */
int sock = PQsocket(st->con);
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 3aa9d5d753..f3351b51cf 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -11,7 +11,11 @@ use Config;
# start a pgbench specific server
my $node = get_new_node('main');
-$node->init;
+
+# Set to untranslated messages, to be able to compare program output with
+# expected strings.
+$node->init(extra => [ '--locale', 'C' ]);
+
$node->start;
# invoke pgbench, with parameters:
@@ -159,7 +163,8 @@ pgbench(
qr{builtin: TPC-B},
qr{clients: 2\b},
qr{processed: 10/10},
- qr{mode: simple}
+ qr{mode: simple},
+ qr{maximum number of tries: 1}
],
[qr{^$}],
'pgbench tpcb-like');
@@ -1239,6 +1244,214 @@ pgbench(
check_pgbench_logs($bdir, '001_pgbench_log_3', 1, 10, 10,
qr{^0 \d{1,2} \d+ \d \d+ \d+$});
+# abortion of the client if the script contains an incomplete transaction block
+pgbench(
+ '--no-vacuum', 2, [ qr{processed: 1/10} ],
+ [ qr{client 0 aborted: end of script reached without completing the last transaction} ],
+ 'incomplete transaction block',
+ { '001_pgbench_incomplete_transaction_block' => q{BEGIN;SELECT 1;} });
+
+# Test the concurrent update in the table row and deadlocks.
+
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE first_client_table (value integer); '
+ . 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2);');
+
+# Serialization error and retry
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read";
+
+# Check that we have a serialization error and the same random value of the
+# delta variable in the next try
+my $err_pattern =
+ "client (0|1) got an error in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update\\b.*"
+ . "\\g1";
+
+pgbench(
+ "-n -c 2 -t 1 -d --verbose-errors --max-tries 2",
+ 0,
+ [ qr{processed: 2/2\b}, qr{^((?!number of failed transactions)(.|\n))*$},
+ qr{number of transactions retried: 1\b}, qr{total number of retries: 1\b} ],
+ [ qr/$err_pattern/s ],
+ 'concurrent update with retrying',
+ {
+ '001_pgbench_serialization' => q{
+-- What's happening:
+-- The first client starts the transaction with the isolation level Repeatable
+-- Read:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+--
+-- The second client starts a similar transaction with the same isolation level:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+-- <waiting for the first client>
+--
+-- The first client commits its transaction, and the second client gets a
+-- serialization error.
+
+\set delta random(-5000, 5000)
+
+-- The second client will stop here
+SELECT pg_advisory_lock(0);
+
+-- Start transaction with concurrent update
+BEGIN;
+UPDATE xy SET y = y + :delta WHERE x = 1 AND pg_advisory_lock(1) IS NOT NULL;
+
+-- Wait for the second client
+DO $$
+DECLARE
+ exists boolean;
+ waiters integer;
+BEGIN
+ -- The second client always comes in second, and the number of rows in the
+ -- table first_client_table reflect this. Here the first client inserts a row,
+ -- so the second client will see a non-empty table when repeating the
+ -- transaction after the serialization error.
+ SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+ IF NOT exists THEN
+ -- Let the second client begin
+ PERFORM pg_advisory_unlock(0);
+ -- And wait until the second client tries to get the same lock
+ LOOP
+ SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+ locktype = 'advisory' AND objsubid = 1 AND
+ ((classid::bigint << 32) | objid::bigint = 1::bigint) AND NOT granted;
+ IF waiters = 1 THEN
+ INSERT INTO first_client_table VALUES (1);
+
+ -- Exit loop
+ EXIT;
+ END IF;
+ END LOOP;
+ END IF;
+END$$;
+
+COMMIT;
+SELECT pg_advisory_unlock_all();
+}
+ });
+
+# Clean up
+
+$node->safe_psql('postgres', 'DELETE FROM first_client_table;');
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=read\\ committed";
+
+# Deadlock error and retry
+
+# Check that we have a deadlock error
+$err_pattern =
+ "client (0|1) got an error in command (3|5) \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\\b";
+
+pgbench(
+ "-n -c 2 -t 1 --max-tries 2 --verbose-errors",
+ 0,
+ [ qr{processed: 2/2\b}, qr{^((?!number of failed transactions)(.|\n))*$},
+ qr{number of transactions retried: 1\b}, qr{total number of retries: 1\b} ],
+ [ qr{$err_pattern} ],
+ 'deadlock with retrying',
+ {
+ '001_pgbench_deadlock' => q{
+-- What's happening:
+-- The first client gets the lock 2.
+-- The second client gets the lock 3 and tries to get the lock 2.
+-- The first client tries to get the lock 3 and one of them gets a deadlock
+-- error.
+--
+-- A client that does not get a deadlock error must hold a lock at the
+-- transaction start. Thus in the end it releases all of its locks before the
+-- client with the deadlock error starts a retry (we do not want any errors
+-- again).
+
+-- Since the client with the deadlock error has not released the blocking locks,
+-- let's do this here.
+SELECT pg_advisory_unlock_all();
+
+-- The second client and the client with the deadlock error stop here
+SELECT pg_advisory_lock(0);
+SELECT pg_advisory_lock(1);
+
+-- The second client and the client with the deadlock error always come after
+-- the first and the number of rows in the table first_client_table reflects
+-- this. Here the first client inserts a row, so in the future the table is
+-- always non-empty.
+DO $$
+DECLARE
+ exists boolean;
+BEGIN
+ SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+ IF exists THEN
+ -- We are the second client or the client with the deadlock error
+
+ -- The first client will take care by itself of this lock (see below)
+ PERFORM pg_advisory_unlock(0);
+
+ PERFORM pg_advisory_lock(3);
+
+ -- The second client can get a deadlock here
+ PERFORM pg_advisory_lock(2);
+ ELSE
+ -- We are the first client
+
+ -- This code should not be used in a new transaction after an error
+ INSERT INTO first_client_table VALUES (1);
+
+ PERFORM pg_advisory_lock(2);
+ END IF;
+END$$;
+
+DO $$
+DECLARE
+ num_rows integer;
+ waiters integer;
+BEGIN
+ -- Check if we are the first client
+ SELECT COUNT(*) FROM first_client_table INTO STRICT num_rows;
+ IF num_rows = 1 THEN
+ -- This code should not be used in a new transaction after an error
+ INSERT INTO first_client_table VALUES (2);
+
+ -- Let the second client begin
+ PERFORM pg_advisory_unlock(0);
+ PERFORM pg_advisory_unlock(1);
+
+ -- Make sure the second client is ready for deadlock
+ LOOP
+ SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+ locktype = 'advisory' AND
+ objsubid = 1 AND
+ ((classid::bigint << 32) | objid::bigint = 2::bigint) AND
+ NOT granted;
+
+ IF waiters = 1 THEN
+ -- Exit loop
+ EXIT;
+ END IF;
+ END LOOP;
+
+ PERFORM pg_advisory_lock(0);
+ -- And the second client took care by itself of the lock 1
+ END IF;
+END$$;
+
+-- The first client can get a deadlock here
+SELECT pg_advisory_lock(3);
+
+SELECT pg_advisory_unlock_all();
+}
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE first_client_table, xy;');
+
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 346a2667fc..56f7226c8e 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -178,6 +178,16 @@ my @options = (
'-i --partition-method=hash',
[qr{partition-method requires greater than zero --partitions}]
],
+ [
+ 'bad maximum number of tries',
+ '--max-tries -10',
+ [qr{invalid number of maximum tries: "-10"}]
+ ],
+ [
+ 'an infinite number of tries',
+ '--max-tries 0',
+ [qr{an unlimited number of transaction tries can only be used with --latency-limit or a duration}]
+ ],
# logging sub-options
[
diff --git a/src/fe_utils/conditional.c b/src/fe_utils/conditional.c
index a562e28846..c304014f51 100644
--- a/src/fe_utils/conditional.c
+++ b/src/fe_utils/conditional.c
@@ -24,13 +24,25 @@ conditional_stack_create(void)
}
/*
- * destroy stack
+ * Destroy all the elements from the stack. The stack itself is not freed.
*/
void
-conditional_stack_destroy(ConditionalStack cstack)
+conditional_stack_reset(ConditionalStack cstack)
{
+ if (!cstack)
+ return; /* nothing to do here */
+
while (conditional_stack_pop(cstack))
continue;
+}
+
+/*
+ * destroy stack
+ */
+void
+conditional_stack_destroy(ConditionalStack cstack)
+{
+ conditional_stack_reset(cstack);
free(cstack);
}
diff --git a/src/include/fe_utils/conditional.h b/src/include/fe_utils/conditional.h
index c64c655775..9c495072aa 100644
--- a/src/include/fe_utils/conditional.h
+++ b/src/include/fe_utils/conditional.h
@@ -73,6 +73,8 @@ typedef struct ConditionalStackData *ConditionalStack;
extern ConditionalStack conditional_stack_create(void);
+extern void conditional_stack_reset(ConditionalStack cstack);
+
extern void conditional_stack_destroy(ConditionalStack cstack);
extern int conditional_stack_depth(ConditionalStack cstack);
--
2.17.1
Attachment: v14-0001-Pgbench-errors-use-the-Variables-structure-for-c.patch (text/x-diff)
From e83a30aea551ed3af2e5a90bbff887953e717002 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Wed, 26 May 2021 16:58:36 +0900
Subject: [PATCH v14 1/2] Pgbench errors: use the Variables structure for
client variables
This is most important when it is used to reset client variables during the
repeating of transactions after serialization/deadlock failures.
Don't allocate Variable structs one by one. Instead, add a constant margin each
time it overflows.
---
src/bin/pgbench/pgbench.c | 163 +++++++++++++++++++++++---------------
1 file changed, 100 insertions(+), 63 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4aeccd93af..3629caba42 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -287,6 +287,12 @@ const char *progname;
volatile bool timer_exceeded = false; /* flag from signal handler */
+/*
+ * We don't want to allocate variables one by one; for efficiency, add a
+ * constant margin each time it overflows.
+ */
+#define VARIABLES_ALLOC_MARGIN 8
+
/*
* Variable definitions.
*
@@ -304,6 +310,24 @@ typedef struct
PgBenchValue value; /* actual variable's value */
} Variable;
+/*
+ * Data structure for client variables.
+ */
+typedef struct
+{
+ Variable *vars; /* array of variable definitions */
+ int nvars; /* number of variables */
+
+ /*
+ * The maximum number of variables that we can currently store in 'vars'
+ * without having to reallocate more space. We must always have max_vars >=
+ * nvars.
+ */
+ int max_vars;
+
+ bool vars_sorted; /* are variables sorted by name? */
+} Variables;
+
#define MAX_SCRIPTS 128 /* max number of SQL scripts allowed */
#define SHELL_COMMAND_SIZE 256 /* maximum size allowed for shell command */
@@ -460,9 +484,7 @@ typedef struct
int command; /* command number in script */
/* client variables */
- Variable *variables; /* array of variable definitions */
- int nvariables; /* number of variables */
- bool vars_sorted; /* are variables sorted by name? */
+ Variables variables;
/* various times about current transaction in microseconds */
pg_time_usec_t txn_scheduled; /* scheduled start time of transaction */
@@ -1418,39 +1440,39 @@ compareVariableNames(const void *v1, const void *v2)
/* Locate a variable by name; returns NULL if unknown */
static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
{
Variable key;
/* On some versions of Solaris, bsearch of zero items dumps core */
- if (st->nvariables <= 0)
+ if (variables->nvars <= 0)
return NULL;
/* Sort if we have to */
- if (!st->vars_sorted)
+ if (!variables->vars_sorted)
{
- qsort((void *) st->variables, st->nvariables, sizeof(Variable),
+ qsort((void *) variables->vars, variables->nvars, sizeof(Variable),
compareVariableNames);
- st->vars_sorted = true;
+ variables->vars_sorted = true;
}
/* Now we can search */
key.name = name;
return (Variable *) bsearch((void *) &key,
- (void *) st->variables,
- st->nvariables,
+ (void *) variables->vars,
+ variables->nvars,
sizeof(Variable),
compareVariableNames);
}
/* Get the value of a variable, in string form; returns NULL if unknown */
static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
{
Variable *var;
char stringform[64];
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
return NULL; /* not found */
@@ -1582,21 +1604,37 @@ valid_variable_name(const char *name)
return true;
}
+/*
+ * Make sure there is enough space for 'needed' more variables in the variables
+ * array.
+ */
+static void
+enlargeVariables(Variables *variables, int needed)
+{
+ /* total number of variables required now */
+ needed += variables->nvars;
+
+ if (variables->max_vars < needed)
+ {
+ variables->max_vars = needed + VARIABLES_ALLOC_MARGIN;
+ variables->vars = (Variable *)
+ pg_realloc(variables->vars, variables->max_vars * sizeof(Variable));
+ }
+}
+
/*
* Lookup a variable by name, creating it if need be.
* Caller is expected to assign a value to the variable.
* Returns NULL on failure (bad name).
*/
static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
+lookupCreateVariable(Variables *variables, const char *context, char *name)
{
Variable *var;
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
{
- Variable *newvars;
-
/*
* Check for the name only when declaring a new variable to avoid
* overhead.
@@ -1608,23 +1646,17 @@ lookupCreateVariable(CState *st, const char *context, char *name)
}
/* Create variable at the end of the array */
- if (st->variables)
- newvars = (Variable *) pg_realloc(st->variables,
- (st->nvariables + 1) * sizeof(Variable));
- else
- newvars = (Variable *) pg_malloc(sizeof(Variable));
-
- st->variables = newvars;
+ enlargeVariables(variables, 1);
- var = &newvars[st->nvariables];
+ var = &(variables->vars[variables->nvars]);
var->name = pg_strdup(name);
var->svalue = NULL;
/* caller is expected to initialize remaining fields */
- st->nvariables++;
+ variables->nvars++;
/* we don't re-sort the array till we have to */
- st->vars_sorted = false;
+ variables->vars_sorted = false;
}
return var;
@@ -1633,12 +1665,13 @@ lookupCreateVariable(CState *st, const char *context, char *name)
/* Assign a string value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariable(CState *st, const char *context, char *name, const char *value)
+putVariable(Variables *variables, const char *context, char *name,
+ const char *value)
{
Variable *var;
char *val;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1656,12 +1689,12 @@ putVariable(CState *st, const char *context, char *name, const char *value)
/* Assign a value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableValue(CState *st, const char *context, char *name,
+putVariableValue(Variables *variables, const char *context, char *name,
const PgBenchValue *value)
{
Variable *var;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1676,12 +1709,13 @@ putVariableValue(CState *st, const char *context, char *name,
/* Assign an integer value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putVariableInt(Variables *variables, const char *context, char *name,
+ int64 value)
{
PgBenchValue val;
setIntValue(&val, value);
- return putVariableValue(st, context, name, &val);
+ return putVariableValue(variables, context, name, &val);
}
/*
@@ -1740,7 +1774,7 @@ replaceVariable(char **sql, char *param, int len, char *value)
}
static char *
-assignVariables(CState *st, char *sql)
+assignVariables(Variables *variables, char *sql)
{
char *p,
*name,
@@ -1761,7 +1795,7 @@ assignVariables(CState *st, char *sql)
continue;
}
- val = getVariable(st, name);
+ val = getVariable(variables, name);
free(name);
if (val == NULL)
{
@@ -1776,12 +1810,13 @@ assignVariables(CState *st, char *sql)
}
static void
-getQueryParams(CState *st, const Command *command, const char **params)
+getQueryParams(Variables *variables, const Command *command,
+ const char **params)
{
int i;
for (i = 0; i < command->argc - 1; i++)
- params[i] = getVariable(st, command->argv[i + 1]);
+ params[i] = getVariable(variables, command->argv[i + 1]);
}
static char *
@@ -2649,7 +2684,7 @@ evaluateExpr(CState *st, PgBenchExpr *expr, PgBenchValue *retval)
{
Variable *var;
- if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
+ if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
pg_log_error("undefined variable \"%s\"", expr->u.variable.varname);
return false;
@@ -2719,7 +2754,7 @@ getMetaCommand(const char *cmd)
* Return true if succeeded, or false on error.
*/
static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
char command[SHELL_COMMAND_SIZE];
int i,
@@ -2750,7 +2785,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
arg = argv[i] + 1; /* a string literal starting with colons */
}
- else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
+ else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
pg_log_error("%s: undefined variable \"%s\"", argv[0], argv[i]);
return false;
@@ -2811,7 +2846,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
pg_log_error("%s: shell command must return an integer (not \"%s\")", argv[0], res);
return false;
}
- if (!putVariableInt(st, "setshell", variable, retval))
+ if (!putVariableInt(variables, "setshell", variable, retval))
return false;
pg_log_debug("%s: shell parameter name: \"%s\", value: \"%s\"", argv[0], argv[1], res);
@@ -2863,7 +2898,7 @@ sendCommand(CState *st, Command *command)
char *sql;
sql = pg_strdup(command->argv[0]);
- sql = assignVariables(st, sql);
+ sql = assignVariables(&st->variables, sql);
pg_log_debug("client %d sending %s", st->id, sql);
r = PQsendQuery(st->con, sql);
@@ -2874,7 +2909,7 @@ sendCommand(CState *st, Command *command)
const char *sql = command->argv[0];
const char *params[MAX_ARGS];
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
pg_log_debug("client %d sending %s", st->id, sql);
r = PQsendQueryParams(st->con, sql, command->argc - 1,
@@ -2921,7 +2956,7 @@ sendCommand(CState *st, Command *command)
st->prepared[st->use_file] = true;
}
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
pg_log_debug("client %d sending %s", st->id, name);
@@ -3014,7 +3049,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
varname = psprintf("%s%s", varprefix, varname);
/* store last row result as a string */
- if (!putVariable(st, meta == META_ASET ? "aset" : "gset", varname,
+ if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
PQgetvalue(res, ntuples - 1, fld)))
{
/* internal error */
@@ -3075,14 +3110,14 @@ error:
* of delay, in microseconds. Returns true on success, false on error.
*/
static bool
-evaluateSleep(CState *st, int argc, char **argv, int *usecs)
+evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
char *var;
int usec;
if (*argv[1] == ':')
{
- if ((var = getVariable(st, argv[1] + 1)) == NULL)
+ if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
pg_log_error("%s: undefined variable \"%s\"", argv[0], argv[1] + 1);
return false;
@@ -3614,7 +3649,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
* latency will be recorded in CSTATE_SLEEP state, not here, after the
* delay has elapsed.)
*/
- if (!evaluateSleep(st, argc, argv, &usec))
+ if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
commandFailed(st, "sleep", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -3635,7 +3670,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
return CSTATE_ABORTED;
}
- if (!putVariableValue(st, argv[0], argv[1], &result))
+ if (!putVariableValue(&st->variables, argv[0], argv[1], &result))
{
commandFailed(st, "set", "assignment of meta-command failed");
return CSTATE_ABORTED;
@@ -3705,7 +3740,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
}
else if (command->meta == META_SETSHELL)
{
- if (!runShellCommand(st, argv[1], argv + 2, argc - 2))
+ if (!runShellCommand(&st->variables, argv[1], argv + 2, argc - 2))
{
commandFailed(st, "setshell", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -3713,7 +3748,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
}
else if (command->meta == META_SHELL)
{
- if (!runShellCommand(st, NULL, argv + 1, argc - 1))
+ if (!runShellCommand(&st->variables, NULL, argv + 1, argc - 1))
{
commandFailed(st, "shell", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -5995,7 +6030,7 @@ main(int argc, char **argv)
}
*p++ = '\0';
- if (!putVariable(&state[0], "option", optarg, p))
+ if (!putVariable(&state[0].variables, "option", optarg, p))
exit(1);
}
break;
@@ -6335,19 +6370,19 @@ main(int argc, char **argv)
int j;
state[i].id = i;
- for (j = 0; j < state[0].nvariables; j++)
+ for (j = 0; j < state[0].variables.nvars; j++)
{
- Variable *var = &state[0].variables[j];
+ Variable *var = &state[0].variables.vars[j];
if (var->value.type != PGBT_NO_VALUE)
{
- if (!putVariableValue(&state[i], "startup",
+ if (!putVariableValue(&state[i].variables, "startup",
var->name, &var->value))
exit(1);
}
else
{
- if (!putVariable(&state[i], "startup",
+ if (!putVariable(&state[i].variables, "startup",
var->name, var->svalue))
exit(1);
}
@@ -6382,11 +6417,11 @@ main(int argc, char **argv)
* :scale variables normally get -s or database scale, but don't override
* an explicit -D switch
*/
- if (lookupVariable(&state[0], "scale") == NULL)
+ if (lookupVariable(&state[0].variables, "scale") == NULL)
{
for (i = 0; i < nclients; i++)
{
- if (!putVariableInt(&state[i], "startup", "scale", scale))
+ if (!putVariableInt(&state[i].variables, "startup", "scale", scale))
exit(1);
}
}
@@ -6395,30 +6430,32 @@ main(int argc, char **argv)
* Define a :client_id variable that is unique per connection. But don't
* override an explicit -D switch.
*/
- if (lookupVariable(&state[0], "client_id") == NULL)
+ if (lookupVariable(&state[0].variables, "client_id") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "client_id", i))
+ if (!putVariableInt(&state[i].variables, "startup", "client_id", i))
exit(1);
}
/* set default seed for hash functions */
- if (lookupVariable(&state[0], "default_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "default_seed") == NULL)
{
uint64 seed =
((uint64) pg_jrand48(base_random_sequence.xseed) & 0xFFFFFFFF) |
(((uint64) pg_jrand48(base_random_sequence.xseed) & 0xFFFFFFFF) << 32);
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "default_seed", (int64) seed))
+ if (!putVariableInt(&state[i].variables, "startup", "default_seed",
+ (int64) seed))
exit(1);
}
/* set random seed unless overwritten */
- if (lookupVariable(&state[0], "random_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "random_seed") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+ if (!putVariableInt(&state[i].variables, "startup", "random_seed",
+ random_seed))
exit(1);
}
--
2.17.1
I have played with the v14 patch. I previously complained that pgbench
always reported 9 errors (actually the number is always the number of
clients specified by -c, minus 1, in my case).
$ pgbench -p 11000 -c 10 -T 10 --max-tries=0 test
pgbench (15devel, server 13.3)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 10 s
number of transactions actually processed: 974
number of failed transactions: 9 (0.916%)
number of transactions retried: 651 (66.226%)
total number of retries: 8482
latency average = 101.317 ms (including failures)
initial connection time = 44.440 ms
tps = 97.796487 (without initial connection time)
To reduce the number of errors I provided --max-tries=9000, because
pgbench had reported 8482 retries.
$ pgbench -p 11000 -c 10 -T 10 --max-tries=9000 test
pgbench (15devel, server 13.3)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 10 s
number of transactions actually processed: 1133
number of failed transactions: 9 (0.788%)
number of transactions retried: 755 (66.112%)
total number of retries: 9278
maximum number of tries: 9000
latency average = 88.570 ms (including failures)
initial connection time = 23.384 ms
tps = 112.015219 (without initial connection time)
Unfortunately this didn't work. There were still 9 errors because pgbench
terminated the last round of the run.
Then I gave up on -T and switched to -t. The number of transactions for
the -t option was calculated from the total number of transactions
actually processed (1133) divided by the number of clients (10), which is
113.3 per client. I rounded that up to the next multiple of ten
and got 120. The result:
$ pgbench -p 11000 -c 10 -t 120 --max-tries=9000 test
pgbench (15devel, server 13.3)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 120
number of transactions actually processed: 1200/1200
number of transactions retried: 675 (56.250%)
total number of retries: 8524
maximum number of tries: 9000
latency average = 93.777 ms
initial connection time = 14.120 ms
tps = 106.635908 (without initial connection time)
Finally I was able to get a result without any errors. This is not a
particularly simple way to obtain pgbench results without errors, but
I can probably live with it.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Hello,
Of course, users themselves should be careful with problematic scripts,
but it would be better if pgbench itself avoided such problems beforehand
when it can.
Or, we should terminate the last cycle of the benchmark, regardless of
whether it is retrying or not, if -T expires. This will make pgbench
behave much more consistently.
I would tend to agree with this behavior, that is not to start any new
transaction or transaction attempt once -T has expired.
I'm a little hesitant about how to count and report transactions left
unfinished because of the benchmark timeout, though. Not counting them
seems to be the best option.
Hmmm, indeed this might make the behaviour a bit more consistent, but I am
not sure such a behavioural change benefits users.
The user benefit would be that if they asked for a 100s benchmark, pgbench
makes a reasonable effort not to overshoot that?
--
Fabien.
Or, we should terminate the last cycle of the benchmark, regardless of
whether it is retrying or not, if -T expires. This will make pgbench
behave much more consistently.
I would tend to agree with this behavior, that is not to start any new
transaction or transaction attempt once -T has expired.
I'm a little hesitant about how to count and report transactions left
unfinished because of the benchmark timeout, though. Not counting them
seems to be the best option.
I agree.
Hmmm, indeed this might make the behaviour a bit more consistent, but I am
not sure such a behavioural change benefits users.
The user benefit would be that if they asked for a 100s benchmark,
pgbench makes a reasonable effort not to overshoot that?
Right.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
On Tue, 13 Jul 2021 13:00:49 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
Or, we should terminate the last cycle of the benchmark, regardless of
whether it is retrying or not, if -T expires. This will make pgbench
behave much more consistently.
I would tend to agree with this behavior, that is not to start any new
transaction or transaction attempt once -T has expired.
That is the behavior in the latest patch. Once -T has expired, no new
transaction or retry is started.
IIUC, Ishii-san's proposal was to change pgbench's behavior when -T has
expired so that any running transactions are terminated immediately, regardless of retrying.
I am not sure we should do that in this patch. If we want this change,
it should be done in another patch as an improvement of the -T option.
I'm a little hesitant about how to count and report transactions left
unfinished because of the benchmark timeout, though. Not counting them
seems to be the best option.
I agree.
I also agree. Although I couldn't get an answer about what he thinks the actual
harm to users from terminating retries at -T expiration is, I guess the complaint
was just about reporting the termination of retrying as failures. Therefore,
I will fix it to finish the benchmark when the time is over during retrying, that is,
change the state to CSTATE_FINISHED instead of CSTATE_ERROR in such cases.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
I would tend to agree with this behavior, that is not to start any new
transaction or transaction attempt once -T has expired.
That is the behavior in the latest patch. Once -T has expired, no new
transaction or retry is started.
Actually v14 has not changed the behavior in this regard, as explained
in a different email:
$ pgbench -p 11000 -c 10 -T 10 --max-tries=0 test
pgbench (15devel, server 13.3)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 10 s
number of transactions actually processed: 974
number of failed transactions: 9 (0.916%)
number of transactions retried: 651 (66.226%)
total number of retries: 8482
latency average = 101.317 ms (including failures)
initial connection time = 44.440 ms
tps = 97.796487 (without initial connection time)
I'm a little hesitant about how to count and report transactions left
unfinished because of the benchmark timeout, though. Not counting them
seems to be the best option.
I agree.
I also agree. Although I couldn't get an answer about what he thinks the actual
harm to users from terminating retries at -T expiration is, I guess the complaint
was just about reporting the termination of retrying as failures. Therefore,
I will fix it to finish the benchmark when the time is over during retrying, that is,
change the state to CSTATE_FINISHED instead of CSTATE_ERROR in such cases.
I guess Fabien wanted it differently. Suppose "-c 10 -T 30" and we
have 100 successful transactions by time 25. At time 25 pgbench starts
the next benchmark cycle, and by time 30 there are 10 failing transactions
(because they are retrying). pgbench stops the execution at time
30. According to your proposal (changing the state to CSTATE_FINISHED
instead of CSTATE_ERROR), the total number of successful transactions will
be 100 + 10 = 110, right? I guess Fabien wants the number to
be 100 rather than 110.
Fabien,
Please correct me if you think differently.
Also, actually I have explained the harm a number of times, but you have
kept ignoring it because "it's subtle". My request has been pretty
simple.
number of failed transactions: 9 (0.916%)
I don't like this and want the number of failed transactions to be 0.
Who wants a benchmark result containing errors?
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
On Tue, 13 Jul 2021 14:35:00 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
I would tend to agree with this behavior, that is not to start any new
transaction or transaction attempt once -T has expired.
That is the behavior in the latest patch. Once -T has expired, no new
transaction or retry is started.
Actually v14 has not changed the behavior in this regard, as explained
in a different email:
Right. Neither v13 nor v14 starts any new transaction or retry once
-T has expired.
I'm a little hesitant about how to count and report transactions left
unfinished because of the benchmark timeout, though. Not counting them
seems to be the best option.
I agree.
I also agree. Although I couldn't get an answer about what he thinks the actual
harm to users from terminating retries at -T expiration is, I guess the complaint
was just about reporting the termination of retrying as failures. Therefore,
I will fix it to finish the benchmark when the time is over during retrying, that is,
change the state to CSTATE_FINISHED instead of CSTATE_ERROR in such cases.
I guess Fabien wanted it differently. Suppose "-c 10 -T 30" and we
have 100 successful transactions by time 25. At time 25 pgbench starts
the next benchmark cycle, and by time 30 there are 10 failing transactions
(because they are retrying). pgbench stops the execution at time
30. According to your proposal (changing the state to CSTATE_FINISHED
instead of CSTATE_ERROR), the total number of successful transactions will
be 100 + 10 = 110, right?
No. The last failed transaction is not counted because CSTATE_END_TX is
bypassed, so please don't worry.
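The state-machine decision being discussed can be sketched as follows. This is an illustrative model, not the patch's literal code: the state names follow pgbench's client states, but the helper function and its signature are my own.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified subset of pgbench's client states. */
typedef enum
{
	CSTATE_RETRY,			/* a failed try may be repeated */
	CSTATE_END_TX,			/* transaction ends; stats are counted here */
	CSTATE_FINISHED			/* benchmark is over for this client */
} ClientState;

/*
 * If the -T timer has already expired when a try fails, go straight to
 * CSTATE_FINISHED.  Because CSTATE_END_TX is bypassed, the last failed
 * try is counted neither as a success nor as a failure.
 */
static ClientState
next_state_on_failed_try(bool timer_expired, bool can_retry)
{
	if (timer_expired)
		return CSTATE_FINISHED; /* bypasses CSTATE_END_TX: not counted */
	if (can_retry)
		return CSTATE_RETRY;	/* roll back and try again */
	return CSTATE_END_TX;		/* counted as a failed transaction */
}
```

With this shape, the 10 transactions still retrying at time 30 in the example above contribute to neither the success nor the failure count.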
Also, actually I have explained the harm a number of times, but you have
kept on ignoring it because "it's subtle". My request has been pretty
simple.

number of failed transactions: 9 (0.916%)
I don't like this and want to have the failed transactions to be 0.
Who wants a benchmark result having errors?
I was asking you because I would like to confirm what you really complained
about: whether the problem is that a retrying transaction is terminated by
the -T option, or that pgbench reports it among the failed transactions. But
now I understand that it is the latter: you don't want the termination of
retrying to be counted as a failure. Thanks.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Hello,
I attached the updated patch.
On Tue, 13 Jul 2021 15:50:52 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:
I'm a little hesitant about how to count and report such transactions
left unfinished because of the benchmark timeout, though. Not counting
them seems to be the best option.
I will fix to finish the benchmark when the time is over during retrying, that is,
change the state to CSTATE_FINISHED instead of CSTATE_ERROR in such cases.
Done.
(I wrote CSTATE_ERROR, but correctly it is CSTATE_FAILURE.)
Now, once the timer has expired while a failed transaction is being retried,
pgbench never starts a new try. If the transaction succeeds, it will be
counted in the result. Otherwise, if the transaction fails again, it is not
counted.
In addition, I fixed the patch to work well with pipeline mode. Previously,
pipeline mode was not sufficiently considered, and ROLLBACK was not sent
correctly. I fixed error handling in pipeline mode, and now it works.
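For context, the error classification all of this relies on is by SQLSTATE; the two codes below are the ones the patch defines in pgbench.c. A self-contained sketch (the helper function is illustrative, not the patch's code):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* SQLSTATEs the patch treats as retriable (as defined in pgbench.c). */
#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
#define ERRCODE_T_R_DEADLOCK_DETECTED     "40P01"

/*
 * Only serialization failures and deadlocks make a transaction eligible
 * for rollback and retry; any other SQL error still aborts the client.
 */
static bool
is_retriable_sqlstate(const char *sqlstate)
{
	return sqlstate != NULL &&
		(strcmp(sqlstate, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0 ||
		 strcmp(sqlstate, ERRCODE_T_R_DEADLOCK_DETECTED) == 0);
}
```

In pipeline mode the same classification applies per result, which is why the ROLLBACK handling mentioned above matters there.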
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v15-0002-Pgbench-errors-and-serialization-deadlock-retrie.patchtext/x-diff; name=v15-0002-Pgbench-errors-and-serialization-deadlock-retrie.patchDownload
From 1cd3519f3a1cfbffe7bcce35fd6f12da566625b1 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Mon, 7 Jun 2021 18:35:14 +0900
Subject: [PATCH v15 2/2] Pgbench errors and serialization/deadlock retries
A client's run is aborted in case of a serious error, for example, if the
connection with the database server is lost or the end of the script is
reached without completing the last transaction. In addition, if an
execution of an SQL or meta command fails for reasons other than
serialization or deadlock errors, the client is aborted. Otherwise, if an
SQL command fails with a serialization or deadlock error, the current
transaction is rolled back, which also includes setting the client
variables as they were before the run of this transaction (it is assumed
that one transaction script contains only one transaction).

Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum number of
tries (specified by the --max-tries option) or the maximum time of tries
(specified by the --latency-limit option). These options can be combined;
moreover, you cannot use an unlimited number of tries (--max-tries=0)
without the --latency-limit option or the --time option. By default the
--max-tries option is set to 1 and transactions with serialization/deadlock
errors are not retried. If the last transaction run fails, this transaction
will be reported as failed, and the client variables will be set as they
were before the first run of this transaction.

If there are retries and/or failures, their statistics are printed in the
progress report, in the transaction/aggregation logs, and at the end with
the other results (overall and for each script). Retries and failures are
also printed per command, with average latencies, if you use the
appropriate benchmarking option (--report-per-command, -r). If you want to
group failures by basic type (serialization failures / deadlock failures),
use the --failures-detailed option. If you want to distinguish all errors
and failures (errors without retrying) by type, including which retry limit
was violated and how far it was exceeded for serialization/deadlock
failures, use the --verbose-errors option.
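The retry cutoff described in the commit message can be sketched as below. This is a simplified model under stated assumptions: the parameter names mirror the --max-tries and --latency-limit options, but the function itself is illustrative and not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a transaction that just hit a serialization/deadlock
 * error may be tried again.  max_tries == 0 means "unlimited tries"
 * (valid only together with --latency-limit or --time), and
 * latency_limit_us == 0 means no time limit on the tries.
 */
static bool
can_retry(uint32_t tries_so_far, uint32_t max_tries,
		  int64_t tries_elapsed_us, int64_t latency_limit_us)
{
	if (max_tries != 0 && tries_so_far >= max_tries)
		return false;			/* number-of-tries limit reached */
	if (latency_limit_us != 0 && tries_elapsed_us >= latency_limit_us)
		return false;			/* total time of tries exceeded */
	return true;
}
```

With the default max_tries of 1, the first failed try already exhausts the budget, which is why retries are off by default.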
---
doc/src/sgml/ref/pgbench.sgml | 402 +++++++-
src/bin/pgbench/pgbench.c | 974 +++++++++++++++++--
src/bin/pgbench/t/001_pgbench_with_server.pl | 217 ++++-
src/bin/pgbench/t/002_pgbench_no_server.pl | 10 +
src/fe_utils/conditional.c | 16 +-
src/include/fe_utils/conditional.h | 2 +
6 files changed, 1505 insertions(+), 116 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0c60077e1f..0a3ccb3c92 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -58,6 +58,7 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
latency average = 11.013 ms
latency stddev = 7.351 ms
initial connection time = 45.758 ms
@@ -65,11 +66,14 @@ tps = 896.967014 (without initial connection time)
</screen>
The first six lines report some of the most important parameter
- settings. The next line reports the number of transactions completed
+ settings. The seventh line reports the number of transactions completed
and intended (the latter being just the product of number of clients
and number of transactions per client); these will be equal unless the run
- failed before completion. (In <option>-T</option> mode, only the actual
- number of transactions is printed.)
+ failed before completion or some SQL command(s) failed. (In
+ <option>-T</option> mode, only the actual number of transactions is printed.)
+ The next line reports the maximum number of tries for transactions with
+ serialization or deadlock errors (see <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -528,6 +532,17 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
at all. They are counted and reported separately as
<firstterm>skipped</firstterm>.
</para>
+ <para>
+ When the <option>--max-tries</option> option is used, a transaction with
+ a serialization or deadlock error cannot be retried if the total time of
+ all its tries is greater than <replaceable>limit</replaceable> ms. To
+ limit only the time of tries and not their number, use
+ <literal>--max-tries=0</literal>. By default the option
+ <option>--max-tries</option> is set to 1 and transactions with
+ serialization/deadlock errors are not retried. See <xref
+ linkend="failures-and-retries" endterm="failures-and-retries-title"/>
+ for more information about retrying such transactions.
+ </para>
</listitem>
</varlistentry>
@@ -594,23 +609,29 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<para>
Show progress report every <replaceable>sec</replaceable> seconds. The report
includes the time since the beginning of the run, the TPS since the
- last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</option>),
- the latency is computed with respect to the transaction scheduled
- start time, not the actual transaction beginning time, thus it also
- includes the average schedule lag time.
+ last report, and the transaction latency average, standard deviation,
+ and the number of failed transactions since the last report. Under
+ throttling (<option>-R</option>), the latency is computed with respect
+ to the transaction scheduled start time, not the actual transaction
+ beginning time, thus it also includes the average schedule lag time.
+ When <option>--max-tries</option> is used to enable transactions retries
+ after serialization/deadlock errors, the report includes the number of
+ retried transactions and the sum of all retries.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-r</option></term>
- <term><option>--report-latencies</option></term>
+ <term><option>--report-per-command</option></term>
<listitem>
<para>
- Report the average per-statement latency (execution time from the
- perspective of the client) of each command after the benchmark
- finishes. See below for details.
+ Report the following statistics for each command after the benchmark
+ finishes: the average per-statement latency (execution time from the
+ perspective of the client), the number of failures and the number of
+ retries after serialization or deadlock errors in this command. The
+ report displays retry statistics only if the
+ <option>--max-tries</option> option is not equal to 1.
</para>
</listitem>
</varlistentry>
@@ -738,6 +759,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--failures-detailed</option></term>
+ <listitem>
+ <para>
+ Report failures in per-transaction and aggregation logs, as well as in
+ the main and per-script reports, grouped by the following types:
+ <itemizedlist>
+ <listitem>
+ <para>serialization failures;</para>
+ </listitem>
+ <listitem>
+ <para>deadlock failures;</para>
+ </listitem>
+ </itemizedlist>
+ See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--log-prefix=<replaceable>prefix</replaceable></option></term>
<listitem>
@@ -748,6 +789,38 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--max-tries=<replaceable>number_of_tries</replaceable></option></term>
+ <listitem>
+ <para>
+ Enable retries for transactions with serialization/deadlock errors and
+ set the maximum number of these tries. This option can be combined with
+ the <option>--latency-limit</option> option which limits the total time
+ of all transaction tries; moreover, you cannot use an unlimited number
+ of tries (<literal>--max-tries=0</literal>) without
+ <option>--latency-limit</option> or <option>--time</option>.
+ The default value is 1 and transactions with serialization/deadlock
+ errors are not retried. See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information about
+ retrying such transactions.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--verbose-errors</option></term>
+ <listitem>
+ <para>
+ Print messages about all errors and failures (errors without retrying)
+ including which limit for retries was violated and how far it was
+ exceeded for the serialization/deadlock failures. (Note that in this
+ case the output can be significantly increased.)
+ See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--progress-timestamp</option></term>
<listitem>
@@ -943,8 +1016,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<refsect1>
<title>Notes</title>
- <refsect2>
- <title>What Is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts">
+ <title id="transactions-and-scripts-title">What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
<para>
<application>pgbench</application> executes test scripts chosen randomly
@@ -1017,6 +1090,11 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
both old and new versions of <application>pgbench</application>, be sure to write
each SQL command on a single line ending with a semicolon.
</para>
+ <para>
+ It is assumed that pgbench scripts do not contain incomplete blocks of SQL
+ transactions. If at runtime the client reaches the end of the script without
+ completing the last transaction block, it will be aborted.
+ </para>
</note>
<para>
@@ -2207,7 +2285,7 @@ END;
The format of the log is:
<synopsis>
-<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -2228,6 +2306,17 @@ END;
When both <option>--rate</option> and <option>--latency-limit</option> are used,
the <replaceable>time</replaceable> for a skipped transaction will be reported as
<literal>skipped</literal>.
+ <replaceable>retries</replaceable> is the sum of all retries after the
+ serialization or deadlock errors during the current script execution. It is
+ present only if the <option>--max-tries</option> option is not equal to 1.
+ If the transaction ends with a failure, its <replaceable>time</replaceable>
+ will be reported as <literal>failed</literal>. If you use the
+ <option>--failures-detailed</option> option, the
+ <replaceable>time</replaceable> of the failed transaction will be reported as
+ <literal>serialization_failure</literal> or
+ <literal>deadlock_failure</literal> depending on the type of failure (see
+ <xref linkend="failures-and-retries" endterm="failures-and-retries-title"/>
+ for more information).
</para>
<para>
@@ -2256,6 +2345,24 @@ END;
were already late before they were even started.
</para>
+ <para>
+ The following example shows a snippet of a log file with failures and
+ retries, with the maximum number of tries set to 10 (note the additional
+ <replaceable>retries</replaceable> column):
+<screen>
+3 0 47423 0 1499414498 34501 3
+3 1 8333 0 1499414498 42848 0
+3 2 8358 0 1499414498 51219 0
+4 0 72345 0 1499414498 59433 6
+1 3 41718 0 1499414498 67879 4
+1 4 8416 0 1499414498 76311 0
+3 3 33235 0 1499414498 84469 3
+0 0 failed 0 1499414498 84905 9
+2 0 failed 0 1499414498 86248 9
+3 4 8307 0 1499414498 92788 0
+</screen>
+ </para>
+
<para>
When running a long test on hardware that can handle a lot of transactions,
the log files can become very large. The <option>--sampling-rate</option> option
@@ -2271,7 +2378,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable>&zwsp; <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable>&zwsp; <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -2285,7 +2392,16 @@ END;
transaction latencies within the interval,
<replaceable>min_latency</replaceable> is the minimum latency within the interval,
and
- <replaceable>max_latency</replaceable> is the maximum latency within the interval.
+ <replaceable>max_latency</replaceable> is the maximum latency within the interval,
+ <replaceable>failures</replaceable> is the number of transactions that ended
+ with a failed SQL command within the interval. If you use option
+ <option>--failures-detailed</option>, instead of the sum of all failed
+ transactions you will get more detailed statistics for the failed
+ transactions grouped by the following types:
+ <replaceable>serialization_failures</replaceable> is the number of
+ transactions that got a serialization error and were not retried after this,
+ <replaceable>deadlock_failures</replaceable> is the number of transactions
+ that got a deadlock error and were not retried after this.
The next fields,
<replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
@@ -2293,21 +2409,25 @@ END;
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e., the difference between each transaction's
scheduled start time and the time it actually started.
- The very last field, <replaceable>skipped</replaceable>,
+ The next field, <replaceable>skipped</replaceable>,
is only present if the <option>--latency-limit</option> option is used, too.
It counts the number of transactions skipped because they would have
started too late.
+ The <replaceable>retried</replaceable> and <replaceable>retries</replaceable>
+ fields are present only if the <option>--max-tries</option> option is not
+ equal to 1. They report the number of retried transactions and the sum of all
+ retries after serialization or deadlock errors within the interval.
Each transaction is counted in the interval when it was committed.
</para>
<para>
Here is some example output:
<screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0
+1345828503 7884 1979812 565806736 60 1479 0
+1345828505 7208 1979422 567277552 59 1391 0
+1345828507 7685 1980268 569784714 60 1398 0
+1345828509 7073 1979779 573489941 236 1411 0
</screen></para>
<para>
@@ -2319,13 +2439,44 @@ END;
</refsect2>
<refsect2>
- <title>Per-Statement Latencies</title>
+ <title>Per-Statement Report</title>
<para>
- With the <option>-r</option> option, <application>pgbench</application> collects
- the elapsed transaction time of each statement executed by every
- client. It then reports an average of those values, referred to
- as the latency for each statement, after the benchmark has finished.
+ With the <option>-r</option> option, <application>pgbench</application>
+ collects the following statistics for each statement:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>latency</literal> — elapsed transaction time for each
+ statement. <application>pgbench</application> reports an average value
+ of all successful runs of the statement.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of failures in this statement. See
+ <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of retries after a serialization or a deadlock error in this
+ statement. See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ The report displays retry statistics only if the <option>--max-tries</option>
+ option is not equal to 1.
+ </para>
+
+ <para>
+ All values are computed for each statement executed by every client and are
+ reported after the benchmark has finished.
</para>
<para>
@@ -2339,27 +2490,63 @@ number of clients: 10
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+maximum number of tries: 1
latency average = 10.870 ms
latency stddev = 7.341 ms
initial connection time = 30.954 ms
tps = 907.949122 (without initial connection time)
-statement latencies in milliseconds:
- 0.001 \set aid random(1, 100000 * :scale)
- 0.001 \set bid random(1, 1 * :scale)
- 0.001 \set tid random(1, 10 * :scale)
- 0.000 \set delta random(-5000, 5000)
- 0.046 BEGIN;
- 0.151 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
- 0.107 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
- 4.241 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
- 5.245 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
- 0.102 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
- 0.974 END;
+statement latencies in milliseconds and failures:
+ 0.002 0 \set aid random(1, 100000 * :scale)
+ 0.005 0 \set bid random(1, 1 * :scale)
+ 0.002 0 \set tid random(1, 10 * :scale)
+ 0.001 0 \set delta random(-5000, 5000)
+ 0.326 0 BEGIN;
+ 0.603 0 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.454 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 5.528 0 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 7.335 0 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.371 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.212 0 END;
</screen>
+
+ Another example of output for the default script using serializable default
+ transaction isolation level (<command>PGOPTIONS='-c
+ default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: <builtin: TPC-B (sort of)>
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+number of transactions per client: 1000
+number of transactions actually processed: 9676/10000
+number of failed transactions: 324 (3.240%)
+number of serialization failures: 324 (3.240%)
+number of transactions retried: 5629 (56.290%)
+total number of retries: 103299
+maximum number of tries: 100
+number of transactions above the 100.0 ms latency limit: 21/9676 (0.217 %)
+latency average = 16.138 ms
+latency stddev = 21.017 ms
+tps = 413.686560 (without initial connection time)
+statement latencies in milliseconds, failures and retries:
+ 0.002 0 0 \set aid random(1, 100000 * :scale)
+ 0.000 0 0 \set bid random(1, 1 * :scale)
+ 0.000 0 0 \set tid random(1, 10 * :scale)
+ 0.000 0 0 \set delta random(-5000, 5000)
+ 0.121 0 0 BEGIN;
+ 0.290 0 2 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.221 0 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 0.266 212 72127 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 0.222 112 31170 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.178 0 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.210 0 0 END;
+ </screen>
</para>
<para>
- If multiple script files are specified, the averages are reported
+ If multiple script files are specified, all statistics are reported
separately for each script file.
</para>
@@ -2373,6 +2560,139 @@ statement latencies in milliseconds:
</para>
</refsect2>
+ <refsect2 id="failures-and-retries">
+ <title id="failures-and-retries-title">Failures and Serialization/Deadlock Retries</title>
+
+ <para>
+ When executing <application>pgbench</application>, there are three main types
+ of errors:
+ <itemizedlist>
+ <listitem>
+ <para>
+ Errors of the main program. They are the most serious and always result
+ in an immediate exit from the <application>pgbench</application> with
+ the corresponding error message. They include:
+ <itemizedlist>
+ <listitem>
+ <para>
+ errors at the beginning of the <application>pgbench</application>
+ (e.g. an invalid option value);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ errors in the initialization mode (e.g. the query to create
+ tables for built-in scripts fails);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ errors before starting threads (e.g. we could not connect to the
+ database server / a syntax error in a meta command / thread
+ creation failure);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ internal <application>pgbench</application> errors (which are
+ supposed to never occur...).
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Errors when the thread manages its clients (e.g. the client could not
+ start a connection to the database server / the socket for connecting
+ the client to the database server has become invalid). In such cases
+ all clients of this thread stop while other threads continue to work.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Direct client errors. They lead to immediate exit from the
+ <application>pgbench</application> with the corresponding error message
+ only in the case of an internal <application>pgbench</application>
+ error (which is never supposed to occur). Otherwise, in the worst
+ case they only lead to the abort of the failed client while other
+ clients continue their run (but some client errors are handled without
+ aborting the client and are reported separately; see below). Later in
+ this section it is assumed that the discussed errors are only the
+ direct client errors and they are not internal
+ <application>pgbench</application> errors.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ A client's run is aborted in case of a serious error, for example, the
+ connection with the database server was lost or the end of the script was
+ reached without completing the last transaction. In addition, if an execution
+ of an SQL or meta command fails for reasons other than serialization or
+ deadlock errors, the client is aborted. Otherwise, if an SQL command fails with serialization or
+ deadlock errors, the client is not aborted. In such cases, the current
+ transaction is rolled back, which also includes setting the client variables
+ as they were before the run of this transaction (it is assumed that one
+ transaction script contains only one transaction; see
+ <xref linkend="transactions-and-scripts" endterm="transactions-and-scripts-title"/>
+ for more information). Transactions with serialization or deadlock errors are
+ repeated after rollbacks until they complete successfully or reach the maximum
+ number of tries (specified by the <option>--max-tries</option> option) / the maximum
+ time of retries (specified by the <option>--latency-limit</option> option) / the end
+ of benchmark (specified by the <option>--time</option> option). If
+ the last trial run fails, this transaction will be reported as failed but
+ the client is not aborted and continues to work.
+ </para>
+
+ <note>
+ <para>
+ Without specifying the <option>--max-tries</option> option a transaction will
+ never be retried after a serialization or deadlock error because its default
+ value is 1. Use an unlimited number of tries (<literal>--max-tries=0</literal>)
+ and the <option>--latency-limit</option> option to limit only the maximum time
+ of tries. You can also use the <option>--time</option> option to limit the benchmark
+ duration under an unlimited number of tries.
+ </para>
+ <para>
+ Be careful when repeating scripts that contain multiple transactions: the
+ script is always retried completely, so the successful transactions can be
+ performed several times.
+ </para>
+ <para>
+ Be careful when repeating transactions with shell commands. Unlike the
+ results of SQL commands, the results of shell commands are not rolled back,
+ except for the variable value of the <command>\setshell</command> command.
+ </para>
+ </note>
+
+ <para>
+ The latency of a successful transaction includes the entire time of
+ transaction execution with rollbacks and retries. The latency for failed
+ transactions and commands is not computed separately.
+ </para>
+
+ <para>
+ The main report contains the number of failed transactions if it is non-zero.
+ If the total number of retried transactions is non-zero, the main report also
+ contains the statistics related to retries: the total number of retried
+ transactions and total number of retries. The per-script report inherits all
+ these fields from the main report. The per-statement report displays retry
+ statistics only if the <option>--max-tries</option> option is not equal to 1.
+ </para>
+
+ <para>
+ If you want to group failures by basic types in per-transaction and
+ aggregation logs, as well as in the main and per-script reports, use the
+ <option>--failures-detailed</option> option. If you also want to distinguish
+ all errors and failures (errors without retrying) by type including which
+ limit for retries was violated and how far it was exceeded for the
+ serialization/deadlock failures, use the <option>--verbose-errors</option>
+ option.
+ </para>
+ </refsect2>
+
<refsect2>
<title>Good Practices</title>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index c4c2fd3566..9e022c25cd 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -74,6 +74,8 @@
#define M_PI 3.14159265358979323846
#endif
+#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"
#define ERRCODE_UNDEFINED_TABLE "42P01"
/*
@@ -273,9 +275,34 @@ bool progress_timestamp = false; /* progress report with Unix time */
int nclients = 1; /* number of clients */
int nthreads = 1; /* number of threads */
bool is_connect; /* establish connection for each transaction */
-bool report_per_command; /* report per-command latencies */
+bool report_per_command = false; /* report per-command latencies, retries
+ * after errors and failures (errors
+ * without retrying) */
int main_pid; /* main process id used in log filename */
+/*
+ * There are different types of restrictions for deciding that the current
+ * transaction with a serialization/deadlock error can no longer be retried and
+ * should be reported as failed:
+ * - max_tries (--max-tries) can be used to limit the number of tries;
+ * - latency_limit (-L) can be used to limit the total time of tries;
+ * - duration (-T) can be used to limit the total benchmark time.
+ *
+ * They can be combined together, and you need to use at least one of them to
+ * retry the transactions with serialization/deadlock errors. If none of them is
+ * used, the default value of max_tries is 1 and such transactions will not be
+ * retried.
+ */
+
+/*
+ * We cannot retry a transaction after the serialization/deadlock error if its
+ * number of tries reaches this maximum; if its value is zero, it is not used.
+ */
+uint32 max_tries = 1;
+
+bool failures_detailed = false; /* whether to group failures in reports
+ * or logs by basic types */
+
const char *pghost = NULL;
const char *pgport = NULL;
const char *username = NULL;
@@ -360,9 +387,65 @@ typedef int64 pg_time_usec_t;
typedef struct StatsData
{
pg_time_usec_t start_time; /* interval start time, for aggregates */
- int64 cnt; /* number of transactions, including skipped */
+
+ /*
+ * Transactions are counted depending on their execution and outcome. First
+ * a transaction may have started or not: skipped transactions occur under
+ * --rate and --latency-limit when the client is too late to execute them.
+ * Secondly, a started transaction may ultimately succeed or fail, possibly
+ * after some retries when --max-tries is not one. Thus
+ *
+ * the number of all transactions =
+ * 'skipped' (it was too late to execute them) +
+ * 'cnt' (the number of successful transactions) +
+ * failed (the number of failed transactions).
+ *
+ * A successful transaction can have several unsuccessful tries before a
+ * successful run. Thus
+ *
+ * 'cnt' (the number of successful transactions) =
+ * successfully retried transactions (they got a serialization or a
+ * deadlock error(s), but were
+ * successfully retried from the very
+ * beginning) +
+ * directly successful transactions (they were successfully completed on
+ * the first try).
+ *
+ * A failed transaction can be one of two types:
+ *
+ * failed (the number of failed transactions) =
+ * 'serialization_failures' (they got a serialization error and were not
+ * successfully retried) +
+ * 'deadlock_failures' (they got a deadlock error and were not successfully
+ * retried).
+ *
+ * If the transaction was retried after a serialization or a deadlock error
+ * this does not guarantee that this retry was successful. Thus
+ *
+ * 'retries' (number of retries) =
+ * number of retries in all retried transactions =
+ * number of retries in (successfully retried transactions +
+ * failed transactions);
+ *
+ * 'retried' (number of all retried transactions) =
+ * successfully retried transactions +
+ * failed transactions.
+ */
+ int64 cnt; /* number of successful transactions, not
+ * including 'skipped' */
int64 skipped; /* number of transactions skipped under --rate
* and --latency-limit */
+ int64 retries; /* number of retries after a serialization or a
+ * deadlock error in all the transactions */
+ int64 retried; /* number of all transactions that were retried
+ * after a serialization or a deadlock error
+ * (perhaps the last try was unsuccessful) */
+ int64 serialization_failures; /* number of transactions that were not
+ * successfully retried after a
+ * serialization error */
+ int64 deadlock_failures; /* number of transactions that were not
+ * successfully retried after a deadlock
+ * error */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -381,6 +464,30 @@ typedef struct RandomState
unsigned short xseed[3];
} RandomState;
+/*
+ * Data structure for repeating a transaction from the beginning with the same
+ * parameters.
+ */
+typedef struct
+{
+ RandomState random_state; /* random seed */
+ Variables variables; /* client variables */
+} RetryState;
+
+/*
+ * Error status for errors during script execution.
+ */
+typedef enum EStatus
+{
+ ESTATUS_NO_ERROR = 0,
+ ESTATUS_META_COMMAND_ERROR,
+
+ /* SQL errors */
+ ESTATUS_SERIALIZATION_ERROR,
+ ESTATUS_DEADLOCK_ERROR,
+ ESTATUS_OTHER_SQL_ERROR
+} EStatus;
+
/* Various random sequences are initialized from this one. */
static RandomState base_random_sequence;
@@ -452,6 +559,35 @@ typedef enum
CSTATE_END_COMMAND,
CSTATE_SKIP_COMMAND,
+ /*
+ * States for failed commands.
+ *
+ * If the SQL/meta command fails, in CSTATE_ERROR clean up after an error:
+ * - clear the conditional stack;
+ * - if we have an unterminated (possibly failed) transaction block, send
+ * the rollback command to the server and wait for the result in
+ * CSTATE_WAIT_ROLLBACK_RESULT. If something goes wrong with rolling back,
+ * go to CSTATE_ABORTED.
+ *
+ * But if everything is ok we are ready for future transactions: if this is
+ * a serialization or deadlock error and we can re-execute the transaction
+ * from the very beginning, go to CSTATE_RETRY; otherwise go to
+ * CSTATE_FAILURE.
+ *
+ * In CSTATE_RETRY report an error, set the same parameters for the
+ * transaction execution as in the previous tries and process the first
+ * transaction command in CSTATE_START_COMMAND.
+ *
+ * In CSTATE_FAILURE report a failure, set the parameters for the
+ * transaction execution as they were before the first run of this
+ * transaction (except for a random state) and go to CSTATE_END_TX to
+ * complete this transaction.
+ */
+ CSTATE_ERROR,
+ CSTATE_WAIT_ROLLBACK_RESULT,
+ CSTATE_RETRY,
+ CSTATE_FAILURE,
+
/*
* CSTATE_END_TX performs end-of-transaction processing. It calculates
* latency, and logs the transaction. In --connect mode, it closes the
@@ -500,8 +636,20 @@ typedef struct
bool prepared[MAX_SCRIPTS]; /* whether client prepared the script */
+ /*
+ * For processing failures and repeating transactions with serialization or
+ * deadlock errors:
+ */
+ EStatus estatus; /* the error status of the current transaction
+ * execution; this is ESTATUS_NO_ERROR if there were
+ * no errors */
+ RetryState retry_state;
+ uint32 tries; /* how many times have we already tried the
+ * current transaction? */
+
/* per client collected stats */
- int64 cnt; /* client transaction count, for -t */
+ int64 cnt; /* client transaction count, for -t; skipped and
+ * failed transactions are also counted here */
} CState;
/*
@@ -596,6 +744,9 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
* aset do gset on all possible queries of a combined query (\;).
* expr Parsed expression, if needed.
* stats Time spent in this command.
+ * retries Number of retries after a serialization or deadlock error in the
+ * current command.
+ * failures Number of errors in the current command that were not retried.
*/
typedef struct Command
{
@@ -608,6 +759,8 @@ typedef struct Command
char *varprefix;
PgBenchExpr *expr;
SimpleStats stats;
+ int64 retries;
+ int64 failures;
} Command;
typedef struct ParsedScript
@@ -622,6 +775,8 @@ static ParsedScript sql_script[MAX_SCRIPTS]; /* SQL script files */
static int num_scripts; /* number of scripts in sql_script[] */
static int64 total_weight = 0;
+static bool verbose_errors = false; /* print verbose messages of all errors */
+
/* Builtin test scripts */
typedef struct BuiltinScript
{
@@ -759,15 +914,18 @@ usage(void)
" protocol for submitting queries (default: simple)\n"
" -n, --no-vacuum do not run VACUUM before tests\n"
" -P, --progress=NUM show thread progress report every NUM seconds\n"
- " -r, --report-latencies report average latency per command\n"
+ " -r, --report-per-command report latencies, failures and retries per command\n"
" -R, --rate=NUM target rate in transactions per second\n"
" -s, --scale=NUM report this scale factor in output\n"
" -t, --transactions=NUM number of transactions each client runs (default: 10)\n"
" -T, --time=NUM duration of benchmark test in seconds\n"
" -v, --vacuum-all vacuum all four standard tables before tests\n"
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
+ " --failures-detailed report the failures grouped by basic types\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
+ " --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --verbose-errors print messages of all errors\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1313,6 +1471,10 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->start_time = start;
sd->cnt = 0;
sd->skipped = 0;
+ sd->retries = 0;
+ sd->retried = 0;
+ sd->serialization_failures = 0;
+ sd->deadlock_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1321,22 +1483,51 @@ initStats(StatsData *sd, pg_time_usec_t start)
* Accumulate one additional item into the given stats object.
*/
static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumStats(StatsData *stats, bool skipped, double lat, double lag,
+ EStatus estatus, int64 tries)
{
- stats->cnt++;
-
+ /* Record the skipped transaction */
if (skipped)
{
/* no latency to record on skipped transactions */
stats->skipped++;
+ return;
}
- else
+
+ /*
+ * Record the number of retries regardless of whether the transaction was
+ * successful or failed.
+ */
+ if (tries > 1)
+ {
+ stats->retries += (tries - 1);
+ stats->retried++;
+ }
+
+ switch (estatus)
{
- addToSimpleStats(&stats->latency, lat);
+ /* Record the successful transaction */
+ case ESTATUS_NO_ERROR:
+ stats->cnt++;
- /* and possibly the same for schedule lag */
- if (throttle_delay)
- addToSimpleStats(&stats->lag, lag);
+ addToSimpleStats(&stats->latency, lat);
+
+ /* and possibly the same for schedule lag */
+ if (throttle_delay)
+ addToSimpleStats(&stats->lag, lag);
+ break;
+
+ /* Record the failed transaction */
+ case ESTATUS_SERIALIZATION_ERROR:
+ stats->serialization_failures++;
+ break;
+ case ESTATUS_DEADLOCK_ERROR:
+ stats->deadlock_failures++;
+ break;
+ default:
+ /* internal error which should never occur */
+ pg_log_fatal("unexpected error status: %d", estatus);
+ exit(1);
}
}
@@ -2867,6 +3058,9 @@ preparedStatementName(char *buffer, int file, int state)
sprintf(buffer, "P%d_%d", file, state);
}
+/*
+ * Report the abort of the client while processing SQL commands.
+ */
static void
commandFailed(CState *st, const char *cmd, const char *message)
{
@@ -2874,6 +3068,17 @@ commandFailed(CState *st, const char *cmd, const char *message)
st->id, st->command, cmd, st->use_file, message);
}
+/*
+ * Report the error in the command while the script is executing.
+ */
+static void
+commandError(CState *st, const char *message)
+{
+ Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND);
+ pg_log_info("client %d got an error in command %d (SQL) of script %d; %s",
+ st->id, st->command, st->use_file, message);
+}
+
/* return a script number with a weighted choice. */
static int
chooseScript(TState *thread)
@@ -2981,6 +3186,33 @@ sendCommand(CState *st, Command *command)
return true;
}
+/*
+ * Get the error status from the error code.
+ */
+static EStatus
+getSQLErrorStatus(const char *sqlState)
+{
+ if (sqlState != NULL)
+ {
+ if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
+ return ESTATUS_SERIALIZATION_ERROR;
+ else if (strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
+ return ESTATUS_DEADLOCK_ERROR;
+ }
+
+ return ESTATUS_OTHER_SQL_ERROR;
+}
+
+/*
+ * Returns true if this type of error can be retried.
+ */
+static bool
+canRetryError(EStatus estatus)
+{
+ return (estatus == ESTATUS_SERIALIZATION_ERROR ||
+ estatus == ESTATUS_DEADLOCK_ERROR);
+}
+
/*
* Process query response from the backend.
*
@@ -3023,6 +3255,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
{
pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
st->id, st->use_file, st->command, qrynum, 0);
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
break;
@@ -3037,6 +3270,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
/* under \gset, report the error */
pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
st->id, st->use_file, st->command, qrynum, PQntuples(res));
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
else if (meta == META_ASET && ntuples <= 0)
@@ -3061,6 +3295,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
/* internal error */
pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
st->id, st->use_file, st->command, qrynum, varname);
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3078,6 +3313,18 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
PQerrorMessage(st->con));
break;
+ case PGRES_NONFATAL_ERROR:
+ case PGRES_FATAL_ERROR:
+ st->estatus = getSQLErrorStatus(
+ PQresultErrorField(res, PG_DIAG_SQLSTATE));
+ if (canRetryError(st->estatus))
+ {
+ if (verbose_errors)
+ commandError(st, PQerrorMessage(st->con));
+ goto error;
+ }
+ /* fall through */
+
default:
/* anything else is unexpected */
pg_log_error("client %d script %d aborted in command %d query %d: %s",
@@ -3156,6 +3403,159 @@ evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
return true;
}
+/*
+ * Clear the variables in the array. The array itself is not freed.
+ */
+static void
+clearVariables(Variables *variables)
+{
+ Variable *vars,
+ *var;
+ int nvars;
+
+ if (!variables)
+ return; /* nothing to do here */
+
+ vars = variables->vars;
+ nvars = variables->nvars;
+ for (var = vars; var - vars < nvars; ++var)
+ {
+ pg_free(var->name);
+ pg_free(var->svalue);
+ }
+
+ variables->nvars = 0;
+}
+
+/*
+ * Make a deep copy of variables array.
+ * Before copying, the function frees the string fields of the destination
+ * variables and, if necessary, enlarges their array.
+ */
+static void
+copyVariables(Variables *dest, const Variables *source)
+{
+ Variable *dest_var;
+ const Variable *source_var;
+
+ if (!dest || !source || dest == source)
+ return; /* nothing to do here */
+
+ /*
+ * Clear the original variables and make sure that we have enough space for
+ * the new variables.
+ */
+ clearVariables(dest);
+ enlargeVariables(dest, source->nvars);
+
+ /* Make a deep copy of variables array */
+ for (source_var = source->vars, dest_var = dest->vars;
+ source_var - source->vars < source->nvars;
+ ++source_var, ++dest_var)
+ {
+ dest_var->name = pg_strdup(source_var->name);
+ dest_var->svalue = (source_var->svalue == NULL) ?
+ NULL : pg_strdup(source_var->svalue);
+ dest_var->value = source_var->value;
+ }
+ dest->nvars = source->nvars;
+ dest->vars_sorted = source->vars_sorted;
+}
+
+/*
+ * Returns true if the transaction can be retried after the current error.
+ */
+static bool
+doRetry(CState *st, pg_time_usec_t *now)
+{
+ Assert(st->estatus != ESTATUS_NO_ERROR);
+
+ /* We can only retry serialization or deadlock errors. */
+ if (!canRetryError(st->estatus))
+ return false;
+
+ /*
+ * We must have at least one option to limit the retrying of transactions
+ * that got an error.
+ */
+ Assert(max_tries || latency_limit || duration > 0);
+
+ /*
+ * We cannot retry the error if we have reached the maximum number of tries.
+ */
+ if (max_tries && st->tries >= max_tries)
+ return false;
+
+ /*
+ * We cannot retry the error if we spent too much time on this transaction.
+ */
+ if (latency_limit)
+ {
+ pg_time_now_lazy(now);
+ if (*now - st->txn_scheduled > latency_limit)
+ return false;
+ }
+
+ /* OK */
+ return true;
+}
+
+/*
+ * Set in_tx_block to true if we are in a (failed) transaction block and false
+ * otherwise.
+ * Returns false on failure (broken connection or internal error).
+ */
+static bool
+checkTransactionStatus(PGconn *con, bool *in_tx_block)
+{
+ PGTransactionStatusType tx_status;
+
+ tx_status = PQtransactionStatus(con);
+ switch (tx_status)
+ {
+ case PQTRANS_IDLE:
+ *in_tx_block = false;
+ break;
+ case PQTRANS_INTRANS:
+ case PQTRANS_INERROR:
+ *in_tx_block = true;
+ break;
+ case PQTRANS_UNKNOWN:
+ /* PQTRANS_UNKNOWN is expected given a broken connection */
+ if (PQstatus(con) == CONNECTION_BAD)
+ { /* there's something wrong */
+ pg_log_error("perhaps the backend died while processing");
+ return false;
+ }
+ /* fall through */
+ case PQTRANS_ACTIVE:
+ default:
+ /*
+ * We cannot find out whether we are in a transaction block or not.
+ * Internal error which should never occur.
+ */
+ pg_log_error("unexpected transaction status %d", tx_status);
+ return false;
+ }
+
+ /* OK */
+ return true;
+}
+
+/*
+ * If the latency limit is used, return the current transaction latency as a
+ * percentage of the latency limit. Otherwise return zero.
+ */
+static double
+getLatencyUsed(CState *st, pg_time_usec_t *now)
+{
+ if (!latency_limit)
+ return 0.0;
+
+ pg_time_now_lazy(now);
+ return (100.0 * (*now - st->txn_scheduled) / latency_limit);
+}
+
/*
* Advance the state machine of a connection.
*/
@@ -3185,6 +3585,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
for (;;)
{
Command *command;
+ PGresult *res;
+ bool in_tx_block;
switch (st->state)
{
@@ -3193,6 +3595,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
st->use_file = chooseScript(thread);
Assert(conditional_stack_empty(st->cstack));
+ /* reset transaction variables to default values */
+ st->estatus = ESTATUS_NO_ERROR;
+ st->tries = 1;
+
pg_log_debug("client %d executing script \"%s\"",
st->id, sql_script[st->use_file].desc);
@@ -3229,6 +3635,14 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
memset(st->prepared, 0, sizeof(st->prepared));
}
+ /*
+ * This is the first try to run this transaction. Remember its
+ * parameters: maybe it will get an error and we will need to
+ * run it again.
+ */
+ st->retry_state.random_state = st->cs_func_rs;
+ copyVariables(&st->retry_state.variables, &st->variables);
+
/* record transaction start time */
st->txn_begin = now;
@@ -3380,6 +3794,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
* - else CSTATE_END_COMMAND
*/
st->state = executeMetaCommand(st, &now);
+ if (st->state == CSTATE_ABORTED)
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
}
/*
@@ -3518,6 +3934,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
+ else if (canRetryError(st->estatus))
+ st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
break;
@@ -3564,6 +3982,215 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
CSTATE_START_COMMAND : CSTATE_SKIP_COMMAND;
break;
+ /*
+ * Clean up after an error.
+ */
+ case CSTATE_ERROR:
+
+ Assert(st->estatus != ESTATUS_NO_ERROR);
+
+ /* Clear the conditional stack */
+ conditional_stack_reset(st->cstack);
+
+ /* Read and discard until a sync point in pipeline mode */
+ if (PQpipelineStatus(st->con) != PQ_PIPELINE_OFF)
+ {
+ bool is_sync = false;
+ do {
+ res = PQgetResult(st->con);
+ if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
+ is_sync = true;
+ PQclear(res);
+ } while (!is_sync);
+ }
+
+ /*
+ * Check if we have a (failed) transaction block or not, and
+ * roll it back if any.
+ */
+ if (!checkTransactionStatus(st->con, &in_tx_block))
+ {
+ /*
+ * There's something wrong...
+ * It is assumed that the function checkTransactionStatus
+ * has already printed a more detailed error message.
+ */
+ pg_log_error("client %d aborted while receiving the transaction status", st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+
+ if (in_tx_block)
+ {
+ /* Exit pipeline mode before rollback */
+ if (PQpipelineStatus(st->con) != PQ_PIPELINE_OFF)
+ {
+ if (PQexitPipelineMode(st->con) != 1)
+ {
+ pg_log_error("client %d aborted: failed to exit pipeline mode for rolling back the failed transaction",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ }
+ }
+
+ /* Try to rollback a (failed) transaction block. */
+ if (!PQsendQuery(st->con, "ROLLBACK"))
+ {
+ pg_log_error("client %d aborted: failed to send sql command for rolling back the failed transaction",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ }
+ else
+ st->state = CSTATE_WAIT_ROLLBACK_RESULT;
+ }
+ else
+ {
+ /*
+ * If time is over, we're done;
+ * otherwise, check if we can retry the error.
+ */
+ st->state = timer_exceeded ? CSTATE_FINISHED :
+ doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
+ }
+ break;
+
+ /*
+ * Wait for the rollback command to complete
+ */
+ case CSTATE_WAIT_ROLLBACK_RESULT:
+ pg_log_debug("client %d receiving", st->id);
+ if (!PQconsumeInput(st->con))
+ {
+ pg_log_error("client %d aborted while rolling back the transaction after an error; perhaps the backend died while processing",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ if (PQisBusy(st->con))
+ return; /* don't have the whole result yet */
+
+ /*
+ * Read and discard the query result.
+ */
+ res = PQgetResult(st->con);
+ switch (PQresultStatus(res))
+ {
+ case PGRES_COMMAND_OK:
+ /* OK */
+ PQclear(res);
+ do
+ {
+ res = PQgetResult(st->con);
+ if (res)
+ PQclear(res);
+ } while (res);
+
+ /*
+ * If time is over, we're done;
+ * otherwise, check if we can retry the error.
+ */
+ st->state = timer_exceeded ? CSTATE_FINISHED :
+ doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
+ break;
+ default:
+ pg_log_error("client %d aborted while rolling back the transaction after an error; %s",
+ st->id, PQerrorMessage(st->con));
+ PQclear(res);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ break;
+
+ /*
+ * Retry the transaction after an error.
+ */
+ case CSTATE_RETRY:
+ command = sql_script[st->use_file].commands[st->command];
+
+ /*
+ * Inform that the transaction will be retried after the error.
+ */
+ if (verbose_errors)
+ {
+ PQExpBufferData buf;
+
+ initPQExpBuffer(&buf);
+
+ printfPQExpBuffer(&buf, "client %d repeats the transaction after the error (try %u",
+ st->id, st->tries);
+ if (max_tries)
+ appendPQExpBuffer(&buf, "/%u", max_tries);
+ if (latency_limit)
+ appendPQExpBuffer(&buf, ", %.3f%% of the maximum time of tries was used",
+ getLatencyUsed(st, &now));
+ appendPQExpBuffer(&buf, ")\n");
+
+ pg_log_info("%s", buf.data);
+
+ termPQExpBuffer(&buf);
+ }
+
+ /* Count tries and retries */
+ st->tries++;
+ if (report_per_command)
+ command->retries++;
+
+ /*
+ * Reset the execution parameters as they were at the beginning
+ * of the transaction.
+ */
+ st->cs_func_rs = st->retry_state.random_state;
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* Process the first transaction command. */
+ st->command = 0;
+ st->estatus = ESTATUS_NO_ERROR;
+ st->state = CSTATE_START_COMMAND;
+ break;
+
+ /*
+ * Complete the failed transaction.
+ */
+ case CSTATE_FAILURE:
+ command = sql_script[st->use_file].commands[st->command];
+
+ /* Accumulate the failure. */
+ if (report_per_command)
+ command->failures++;
+
+ /*
+ * Inform that the failed transaction will not be retried.
+ */
+ if (verbose_errors)
+ {
+ PQExpBufferData buf;
+
+ initPQExpBuffer(&buf);
+
+ printfPQExpBuffer(&buf, "client %d ends the failed transaction (try %u",
+ st->id, st->tries);
+ if (max_tries)
+ appendPQExpBuffer(&buf, "/%u", max_tries);
+ if (latency_limit)
+ appendPQExpBuffer(&buf, ", %.3f%% of the maximum time of tries was used",
+ getLatencyUsed(st, &now));
+ appendPQExpBuffer(&buf, ")\n");
+
+ pg_log_info("%s", buf.data);
+
+ termPQExpBuffer(&buf);
+ }
+
+ /*
+ * Reset the execution parameters as they were at the beginning
+ * of the transaction except for a random state.
+ */
+ copyVariables(&st->variables, &st->retry_state.variables);
+
+ /* End the failed transaction. */
+ st->state = CSTATE_END_TX;
+ break;
+
/*
* End of transaction (end of script, really).
*/
@@ -3578,6 +4205,29 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
*/
Assert(conditional_stack_empty(st->cstack));
+ /*
+ * We must complete all the transaction blocks that were
+ * started in this script.
+ */
+ if (!checkTransactionStatus(st->con, &in_tx_block))
+ {
+ /*
+ * There's something wrong...
+ * It is assumed that the function checkTransactionStatus
+ * has already printed a more detailed error message.
+ */
+ pg_log_error("client %d aborted while receiving the transaction status", st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ if (in_tx_block)
+ {
+ pg_log_error("client %d aborted: end of script reached without completing the last transaction",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+
if (is_connect)
{
finishCon(st);
@@ -3809,6 +4459,43 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
return CSTATE_END_COMMAND;
}
+/*
+ * Return the number of failed transactions.
+ */
+static int64
+getFailures(const StatsData *stats)
+{
+ return (stats->serialization_failures +
+ stats->deadlock_failures);
+}
+
+/*
+ * Return a string constant representing the result of a transaction
+ * that is not successfully processed.
+ */
+static const char *
+getResultString(bool skipped, EStatus estatus)
+{
+ if (skipped)
+ return "skipped";
+ else if (failures_detailed)
+ {
+ switch (estatus)
+ {
+ case ESTATUS_SERIALIZATION_ERROR:
+ return "serialization_failure";
+ case ESTATUS_DEADLOCK_ERROR:
+ return "deadlock_failure";
+ default:
+ /* internal error which should never occur */
+ pg_log_fatal("unexpected error status: %d", estatus);
+ exit(1);
+ }
+ }
+ else
+ return "failed";
+}
+
/*
* Print log entry after completing one transaction.
*
@@ -3856,6 +4543,14 @@ doLog(TState *thread, CState *st,
agg->latency.sum2,
agg->latency.min,
agg->latency.max);
+
+ if (failures_detailed)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->serialization_failures,
+ agg->deadlock_failures);
+ else
+ fprintf(logfile, " " INT64_FORMAT, getFailures(agg));
+
if (throttle_delay)
{
fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -3866,6 +4561,10 @@ doLog(TState *thread, CState *st,
if (latency_limit)
fprintf(logfile, " " INT64_FORMAT, agg->skipped);
}
+ if (max_tries != 1)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->retried,
+ agg->retries);
fputc('\n', logfile);
/* reset data and move to next interval */
@@ -3873,22 +4572,26 @@ doLog(TState *thread, CState *st,
}
/* accumulate the current transaction */
- accumStats(agg, skipped, latency, lag);
+ accumStats(agg, skipped, latency, lag, st->estatus, st->tries);
}
else
{
/* no, print raw transactions */
- if (skipped)
- fprintf(logfile, "%d " INT64_FORMAT " skipped %d " INT64_FORMAT " "
- INT64_FORMAT,
- st->id, st->cnt, st->use_file, now / 1000000, now % 1000000);
- else
+ if (!skipped && st->estatus == ESTATUS_NO_ERROR)
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d " INT64_FORMAT " "
INT64_FORMAT,
st->id, st->cnt, latency, st->use_file,
now / 1000000, now % 1000000);
+ else
+ fprintf(logfile, "%d " INT64_FORMAT " %s %d " INT64_FORMAT " "
+ INT64_FORMAT,
+ st->id, st->cnt, getResultString(skipped, st->estatus),
+ st->use_file, now / 1000000, now % 1000000);
+
if (throttle_delay)
fprintf(logfile, " %.0f", lag);
+ if (max_tries != 1)
+ fprintf(logfile, " %u", st->tries - 1);
fputc('\n', logfile);
}
}
@@ -3897,7 +4600,8 @@ doLog(TState *thread, CState *st,
* Accumulate and report statistics at end of a transaction.
*
* (This is also called when a transaction is late and thus skipped.
- * Note that even skipped transactions are counted in the "cnt" fields.)
+ * Note that even skipped and failed transactions are counted in the CState
+ * "cnt" field.)
*/
static void
processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
@@ -3905,10 +4609,10 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
{
double latency = 0.0,
lag = 0.0;
- bool thread_details = progress || throttle_delay || latency_limit,
- detailed = thread_details || use_log || per_script_stats;
+ bool detailed = progress || throttle_delay || latency_limit ||
+ use_log || per_script_stats;
- if (detailed && !skipped)
+ if (detailed && !skipped && st->estatus == ESTATUS_NO_ERROR)
{
pg_time_now_lazy(now);
@@ -3917,20 +4621,12 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
lag = st->txn_begin - st->txn_scheduled;
}
- if (thread_details)
- {
- /* keep detailed thread stats */
- accumStats(&thread->stats, skipped, latency, lag);
+ /* keep detailed thread stats */
+ accumStats(&thread->stats, skipped, latency, lag, st->estatus, st->tries);
- /* count transactions over the latency limit, if needed */
- if (latency_limit && latency > latency_limit)
- thread->latency_late++;
- }
- else
- {
- /* no detailed stats, just count */
- thread->stats.cnt++;
- }
+ /* count transactions over the latency limit, if needed */
+ if (latency_limit && latency > latency_limit)
+ thread->latency_late++;
/* client stat is just counting */
st->cnt++;
@@ -3940,7 +4636,8 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
/* XXX could use a mutex here, but we choose not to */
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+ st->estatus, st->tries);
}
@@ -4787,6 +5484,8 @@ create_sql_command(PQExpBuffer buf, const char *source)
my_command->type = SQL_COMMAND;
my_command->meta = META_NONE;
my_command->argc = 0;
+ my_command->retries = 0;
+ my_command->failures = 0;
memset(my_command->argv, 0, sizeof(my_command->argv));
my_command->varprefix = NULL; /* allocated later, if needed */
my_command->expr = NULL;
@@ -5455,7 +6154,9 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
{
/* generate and show report */
pg_time_usec_t run = now - *last_report;
- int64 ntx;
+ int64 cnt,
+ failures,
+ retried;
double tps,
total_run,
latency,
@@ -5482,23 +6183,30 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
mergeSimpleStats(&cur.lag, &threads[i].stats.lag);
cur.cnt += threads[i].stats.cnt;
cur.skipped += threads[i].stats.skipped;
+ cur.retries += threads[i].stats.retries;
+ cur.retried += threads[i].stats.retried;
+ cur.serialization_failures +=
+ threads[i].stats.serialization_failures;
+ cur.deadlock_failures += threads[i].stats.deadlock_failures;
}
/* we count only actually executed transactions */
- ntx = (cur.cnt - cur.skipped) - (last->cnt - last->skipped);
+ cnt = cur.cnt - last->cnt;
total_run = (now - test_start) / 1000000.0;
- tps = 1000000.0 * ntx / run;
- if (ntx > 0)
+ tps = 1000000.0 * cnt / run;
+ if (cnt > 0)
{
- latency = 0.001 * (cur.latency.sum - last->latency.sum) / ntx;
- sqlat = 1.0 * (cur.latency.sum2 - last->latency.sum2) / ntx;
+ latency = 0.001 * (cur.latency.sum - last->latency.sum) / cnt;
+ sqlat = 1.0 * (cur.latency.sum2 - last->latency.sum2) / cnt;
stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
- lag = 0.001 * (cur.lag.sum - last->lag.sum) / ntx;
+ lag = 0.001 * (cur.lag.sum - last->lag.sum) / cnt;
}
else
{
latency = sqlat = stdev = lag = 0;
}
+ failures = getFailures(&cur) - getFailures(last);
+ retried = cur.retried - last->retried;
if (progress_timestamp)
{
@@ -5512,8 +6220,8 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
}
fprintf(stderr,
- "progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
- tbuf, tps, latency, stdev);
+ "progress: %s, %.1f tps, lat %.3f ms stddev %.3f, " INT64_FORMAT " failed",
+ tbuf, tps, latency, stdev, failures);
if (throttle_delay)
{
@@ -5522,6 +6230,12 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
fprintf(stderr, ", " INT64_FORMAT " skipped",
cur.skipped - last->skipped);
}
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (max_tries != 1)
+ fprintf(stderr,
+ ", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
+ retried, cur.retries - last->retries);
fprintf(stderr, "\n");
*last = cur;
@@ -5581,9 +6295,10 @@ printResults(StatsData *total,
int64 latency_late)
{
/* tps is about actually executed transactions during benchmarking */
- int64 ntx = total->cnt - total->skipped;
+ int64 failures = getFailures(total);
+ int64 total_cnt = total->cnt + total->skipped + failures;
double bench_duration = PG_TIME_GET_DOUBLE(total_duration);
- double tps = ntx / bench_duration;
+ double tps = total->cnt / bench_duration;
/* Report test parameters. */
printf("transaction type: %s\n",
@@ -5600,35 +6315,65 @@ printResults(StatsData *total,
{
printf("number of transactions per client: %d\n", nxacts);
printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
- ntx, nxacts * nclients);
+ total->cnt, nxacts * nclients);
}
else
{
printf("duration: %d s\n", duration);
printf("number of transactions actually processed: " INT64_FORMAT "\n",
- ntx);
+ total->cnt);
}
+ if (failures > 0)
+ {
+ printf("number of failed transactions: " INT64_FORMAT " (%.3f%%)\n",
+ failures, 100.0 * failures / total_cnt);
+
+ if (failures_detailed)
+ {
+ if (total->serialization_failures)
+ printf("number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->serialization_failures,
+ 100.0 * total->serialization_failures / total_cnt);
+ if (total->deadlock_failures)
+ printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->deadlock_failures,
+ 100.0 * total->deadlock_failures / total_cnt);
+ }
+ }
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (total->retried > 0)
+ {
+ printf("number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
+ total->retried, 100.0 * total->retried / total_cnt);
+ printf("total number of retries: " INT64_FORMAT "\n", total->retries);
+ }
+
+ if (max_tries)
+ printf("maximum number of tries: %d\n", max_tries);
+
/* Remaining stats are nonsensical if we failed to execute any xacts */
- if (total->cnt <= 0)
+ if (total->cnt + total->skipped <= 0)
return;
if (throttle_delay && latency_limit)
printf("number of transactions skipped: " INT64_FORMAT " (%.3f %%)\n",
- total->skipped, 100.0 * total->skipped / total->cnt);
+ total->skipped, 100.0 * total->skipped / total_cnt);
if (latency_limit)
printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "/" INT64_FORMAT " (%.3f %%)\n",
- latency_limit / 1000.0, latency_late, ntx,
- (ntx > 0) ? 100.0 * latency_late / ntx : 0.0);
+ latency_limit / 1000.0, latency_late, total->cnt,
+ (total->cnt > 0) ? 100.0 * latency_late / total->cnt : 0.0);
if (throttle_delay || progress || latency_limit)
printSimpleStats("latency", &total->latency);
else
{
/* no measurement, show average latency computed from run time */
- printf("latency average = %.3f ms\n",
- 0.001 * total_duration * nclients / total->cnt);
+ printf("latency average = %.3f ms%s\n",
+ 0.001 * total_duration * nclients / total_cnt,
+ failures > 0 ? " (including failures)" : "");
}
if (throttle_delay)
@@ -5654,7 +6399,7 @@ printResults(StatsData *total,
*/
if (is_connect)
{
- printf("average connection time = %.3f ms\n", 0.001 * conn_total_duration / total->cnt);
+ printf("average connection time = %.3f ms\n", 0.001 * conn_total_duration / (total->cnt + failures));
printf("tps = %f (including reconnection times)\n", tps);
}
else
@@ -5673,6 +6418,9 @@ printResults(StatsData *total,
if (per_script_stats)
{
StatsData *sstats = &sql_script[i].stats;
+ int64 script_failures = getFailures(sstats);
+ int64 script_total_cnt =
+ sstats->cnt + sstats->skipped + script_failures;
printf("SQL script %d: %s\n"
" - weight: %d (targets %.1f%% of total)\n"
@@ -5682,25 +6430,60 @@ printResults(StatsData *total,
100.0 * sql_script[i].weight / total_weight,
sstats->cnt,
100.0 * sstats->cnt / total->cnt,
- (sstats->cnt - sstats->skipped) / bench_duration);
+ sstats->cnt / bench_duration);
- if (throttle_delay && latency_limit && sstats->cnt > 0)
+ if (failures > 0)
+ {
+ printf(" - number of failed transactions: " INT64_FORMAT " (%.3f%%)\n",
+ script_failures,
+ 100.0 * script_failures / script_total_cnt);
+
+ if (failures_detailed)
+ {
+ if (total->serialization_failures)
+ printf(" - number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->serialization_failures,
+ (100.0 * sstats->serialization_failures /
+ script_total_cnt));
+ if (total->deadlock_failures)
+ printf(" - number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->deadlock_failures,
+ (100.0 * sstats->deadlock_failures /
+ script_total_cnt));
+ }
+ }
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (total->retried > 0)
+ {
+ printf(" - number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->retried,
+ 100.0 * sstats->retried / script_total_cnt);
+ printf(" - total number of retries: " INT64_FORMAT "\n",
+ sstats->retries);
+ }
+
+ if (throttle_delay && latency_limit && script_total_cnt > 0)
printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
sstats->skipped,
- 100.0 * sstats->skipped / sstats->cnt);
+ 100.0 * sstats->skipped / script_total_cnt);
printSimpleStats(" - latency", &sstats->latency);
}
- /* Report per-command latencies */
+ /*
+ * Report per-command statistics: latencies, retries after errors,
+ * failures (errors without retrying).
+ */
if (report_per_command)
{
Command **commands;
- if (per_script_stats)
- printf(" - statement latencies in milliseconds:\n");
- else
- printf("statement latencies in milliseconds:\n");
+ printf("%sstatement latencies in milliseconds%s:\n",
+ per_script_stats ? " - " : "",
+ (max_tries == 1 ?
+ " and failures" :
+ ", failures and retries"));
for (commands = sql_script[i].commands;
*commands != NULL;
@@ -5708,10 +6491,19 @@ printResults(StatsData *total,
{
SimpleStats *cstats = &(*commands)->stats;
- printf(" %11.3f %s\n",
- (cstats->count > 0) ?
- 1000.0 * cstats->sum / cstats->count : 0.0,
- (*commands)->first_line);
+ if (max_tries == 1)
+ printf(" %11.3f %10" INT64_MODIFIER "d %s\n",
+ (cstats->count > 0) ?
+ 1000.0 * cstats->sum / cstats->count : 0.0,
+ (*commands)->failures,
+ (*commands)->first_line);
+ else
+ printf(" %11.3f %10" INT64_MODIFIER "d %10" INT64_MODIFIER "d %s\n",
+ (cstats->count > 0) ?
+ 1000.0 * cstats->sum / cstats->count : 0.0,
+ (*commands)->failures,
+ (*commands)->retries,
+ (*commands)->first_line);
}
}
}
@@ -5792,7 +6584,7 @@ main(int argc, char **argv)
{"progress", required_argument, NULL, 'P'},
{"protocol", required_argument, NULL, 'M'},
{"quiet", no_argument, NULL, 'q'},
- {"report-latencies", no_argument, NULL, 'r'},
+ {"report-per-command", no_argument, NULL, 'r'},
{"rate", required_argument, NULL, 'R'},
{"scale", required_argument, NULL, 's'},
{"select-only", no_argument, NULL, 'S'},
@@ -5814,6 +6606,9 @@ main(int argc, char **argv)
{"show-script", required_argument, NULL, 10},
{"partitions", required_argument, NULL, 11},
{"partition-method", required_argument, NULL, 12},
+ {"failures-detailed", no_argument, NULL, 13},
+ {"max-tries", required_argument, NULL, 14},
+ {"verbose-errors", no_argument, NULL, 15},
{NULL, 0, NULL, 0}
};
@@ -6190,6 +6985,28 @@ main(int argc, char **argv)
exit(1);
}
break;
+ case 13: /* failures-detailed */
+ benchmarking_option_set = true;
+ failures_detailed = true;
+ break;
+ case 14: /* max-tries */
+ {
+ int32 max_tries_arg = atoi(optarg);
+
+ if (max_tries_arg < 0)
+ {
+ pg_log_fatal("invalid number of maximum tries: \"%s\"", optarg);
+ exit(1);
+ }
+
+ benchmarking_option_set = true;
+ max_tries = (uint32) max_tries_arg;
+ }
+ break;
+ case 15: /* verbose-errors */
+ benchmarking_option_set = true;
+ verbose_errors = true;
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
@@ -6371,6 +7188,15 @@ main(int argc, char **argv)
exit(1);
}
+ if (!max_tries)
+ {
+ if (!latency_limit && duration <= 0)
+ {
+ pg_log_fatal("an unlimited number of transaction tries can only be used with --latency-limit or a duration (-T)");
+ exit(1);
+ }
+ }
+
/*
* save main process id in the global variable because process id will be
* changed after fork.
@@ -6579,6 +7405,10 @@ main(int argc, char **argv)
mergeSimpleStats(&stats.lag, &thread->stats.lag);
stats.cnt += thread->stats.cnt;
stats.skipped += thread->stats.skipped;
+ stats.retries += thread->stats.retries;
+ stats.retried += thread->stats.retried;
+ stats.serialization_failures += thread->stats.serialization_failures;
+ stats.deadlock_failures += thread->stats.deadlock_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
@@ -6734,7 +7564,8 @@ threadRun(void *arg)
if (min_usec > this_usec)
min_usec = this_usec;
}
- else if (st->state == CSTATE_WAIT_RESULT)
+ else if (st->state == CSTATE_WAIT_RESULT ||
+ st->state == CSTATE_WAIT_ROLLBACK_RESULT)
{
/*
* waiting for result from server - nothing to do unless the
@@ -6823,7 +7654,8 @@ threadRun(void *arg)
{
CState *st = &state[i];
- if (st->state == CSTATE_WAIT_RESULT)
+ if (st->state == CSTATE_WAIT_RESULT ||
+ st->state == CSTATE_WAIT_ROLLBACK_RESULT)
{
/* don't call advanceConnectionState unless data is available */
int sock = PQsocket(st->con);
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 3aa9d5d753..f3351b51cf 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -11,7 +11,11 @@ use Config;
# start a pgbench specific server
my $node = get_new_node('main');
-$node->init;
+
+# Set to untranslated messages, to be able to compare program output with
+# expected strings.
+$node->init(extra => [ '--locale', 'C' ]);
+
$node->start;
# invoke pgbench, with parameters:
@@ -159,7 +163,8 @@ pgbench(
qr{builtin: TPC-B},
qr{clients: 2\b},
qr{processed: 10/10},
- qr{mode: simple}
+ qr{mode: simple},
+ qr{maximum number of tries: 1}
],
[qr{^$}],
'pgbench tpcb-like');
@@ -1239,6 +1244,214 @@ pgbench(
check_pgbench_logs($bdir, '001_pgbench_log_3', 1, 10, 10,
qr{^0 \d{1,2} \d+ \d \d+ \d+$});
+# the client is aborted if the script contains an incomplete transaction block
+pgbench(
+ '--no-vacuum', 2, [ qr{processed: 1/10} ],
+ [ qr{client 0 aborted: end of script reached without completing the last transaction} ],
+ 'incomplete transaction block',
+ { '001_pgbench_incomplete_transaction_block' => q{BEGIN;SELECT 1;} });
+
+# Test the concurrent update in the table row and deadlocks.
+
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE first_client_table (value integer); '
+ . 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2);');
+
+# Serialization error and retry
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read";
+
+# Check that we have a serialization error and the same random value of the
+# delta variable in the next try
+my $err_pattern =
+ "client (0|1) got an error in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update\\b.*"
+ . "\\g1";
+
+pgbench(
+ "-n -c 2 -t 1 -d --verbose-errors --max-tries 2",
+ 0,
+ [ qr{processed: 2/2\b}, qr{^((?!number of failed transactions)(.|\n))*$},
+ qr{number of transactions retried: 1\b}, qr{total number of retries: 1\b} ],
+ [ qr/$err_pattern/s ],
+ 'concurrent update with retrying',
+ {
+ '001_pgbench_serialization' => q{
+-- What's happening:
+-- The first client starts the transaction with the isolation level Repeatable
+-- Read:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+--
+-- The second client starts a similar transaction with the same isolation level:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+-- <waiting for the first client>
+--
+-- The first client commits its transaction, and the second client gets a
+-- serialization error.
+
+\set delta random(-5000, 5000)
+
+-- The second client will stop here
+SELECT pg_advisory_lock(0);
+
+-- Start transaction with concurrent update
+BEGIN;
+UPDATE xy SET y = y + :delta WHERE x = 1 AND pg_advisory_lock(1) IS NOT NULL;
+
+-- Wait for the second client
+DO $$
+DECLARE
+ exists boolean;
+ waiters integer;
+BEGIN
+ -- The second client always comes in second, and the number of rows in the
+ -- table first_client_table reflects this. Here the first client inserts a row,
+ -- so the second client will see a non-empty table when repeating the
+ -- transaction after the serialization error.
+ SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+ IF NOT exists THEN
+ -- Let the second client begin
+ PERFORM pg_advisory_unlock(0);
+ -- And wait until the second client tries to get the same lock
+ LOOP
+ SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+ locktype = 'advisory' AND objsubid = 1 AND
+ ((classid::bigint << 32) | objid::bigint = 1::bigint) AND NOT granted;
+ IF waiters = 1 THEN
+ INSERT INTO first_client_table VALUES (1);
+
+ -- Exit loop
+ EXIT;
+ END IF;
+ END LOOP;
+ END IF;
+END$$;
+
+COMMIT;
+SELECT pg_advisory_unlock_all();
+}
+ });
+
+# Clean up
+
+$node->safe_psql('postgres', 'DELETE FROM first_client_table;');
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=read\\ committed";
+
+# Deadlock error and retry
+
+# Check that we have a deadlock error
+$err_pattern =
+ "client (0|1) got an error in command (3|5) \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\\b";
+
+pgbench(
+ "-n -c 2 -t 1 --max-tries 2 --verbose-errors",
+ 0,
+ [ qr{processed: 2/2\b}, qr{^((?!number of failed transactions)(.|\n))*$},
+ qr{number of transactions retried: 1\b}, qr{total number of retries: 1\b} ],
+ [ qr{$err_pattern} ],
+ 'deadlock with retrying',
+ {
+ '001_pgbench_deadlock' => q{
+-- What's happening:
+-- The first client gets the lock 2.
+-- The second client gets the lock 3 and tries to get the lock 2.
+-- The first client tries to get the lock 3 and one of them gets a deadlock
+-- error.
+--
+-- A client that does not get a deadlock error must hold a lock at the
+-- transaction start. Thus in the end it releases all of its locks before the
+-- client with the deadlock error starts a retry (we do not want any errors
+-- again).
+
+-- Since the client with the deadlock error has not released the blocking locks,
+-- let's do this here.
+SELECT pg_advisory_unlock_all();
+
+-- The second client and the client with the deadlock error stop here
+SELECT pg_advisory_lock(0);
+SELECT pg_advisory_lock(1);
+
+-- The second client and the client with the deadlock error always come after
+-- the first and the number of rows in the table first_client_table reflects
+-- this. Here the first client inserts a row, so in the future the table is
+-- always non-empty.
+DO $$
+DECLARE
+ exists boolean;
+BEGIN
+ SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+ IF exists THEN
+ -- We are the second client or the client with the deadlock error
+
+ -- The first client will release this lock itself (see below)
+ PERFORM pg_advisory_unlock(0);
+
+ PERFORM pg_advisory_lock(3);
+
+ -- The second client can get a deadlock here
+ PERFORM pg_advisory_lock(2);
+ ELSE
+ -- We are the first client
+
+ -- This code should not be used in a new transaction after an error
+ INSERT INTO first_client_table VALUES (1);
+
+ PERFORM pg_advisory_lock(2);
+ END IF;
+END$$;
+
+DO $$
+DECLARE
+ num_rows integer;
+ waiters integer;
+BEGIN
+ -- Check if we are the first client
+ SELECT COUNT(*) FROM first_client_table INTO STRICT num_rows;
+ IF num_rows = 1 THEN
+ -- This code should not be used in a new transaction after an error
+ INSERT INTO first_client_table VALUES (2);
+
+ -- Let the second client begin
+ PERFORM pg_advisory_unlock(0);
+ PERFORM pg_advisory_unlock(1);
+
+ -- Make sure the second client is ready for deadlock
+ LOOP
+ SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+ locktype = 'advisory' AND
+ objsubid = 1 AND
+ ((classid::bigint << 32) | objid::bigint = 2::bigint) AND
+ NOT granted;
+
+ IF waiters = 1 THEN
+ -- Exit loop
+ EXIT;
+ END IF;
+ END LOOP;
+
+ PERFORM pg_advisory_lock(0);
+ -- And the second client released lock 1 itself
+ END IF;
+END$$;
+
+-- The first client can get a deadlock here
+SELECT pg_advisory_lock(3);
+
+SELECT pg_advisory_unlock_all();
+}
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE first_client_table, xy;');
+
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 346a2667fc..56f7226c8e 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -178,6 +178,16 @@ my @options = (
'-i --partition-method=hash',
[qr{partition-method requires greater than zero --partitions}]
],
+ [
+ 'bad maximum number of tries',
+ '--max-tries -10',
+ [qr{invalid number of maximum tries: "-10"}]
+ ],
+ [
+ 'an infinite number of tries',
+ '--max-tries 0',
+ [qr{an unlimited number of transaction tries can only be used with --latency-limit or a duration}]
+ ],
# logging sub-options
[
diff --git a/src/fe_utils/conditional.c b/src/fe_utils/conditional.c
index a562e28846..c304014f51 100644
--- a/src/fe_utils/conditional.c
+++ b/src/fe_utils/conditional.c
@@ -24,13 +24,25 @@ conditional_stack_create(void)
}
/*
- * destroy stack
+ * Destroy all the elements from the stack. The stack itself is not freed.
*/
void
-conditional_stack_destroy(ConditionalStack cstack)
+conditional_stack_reset(ConditionalStack cstack)
{
+ if (!cstack)
+ return; /* nothing to do here */
+
while (conditional_stack_pop(cstack))
continue;
+}
+
+/*
+ * destroy stack
+ */
+void
+conditional_stack_destroy(ConditionalStack cstack)
+{
+ conditional_stack_reset(cstack);
free(cstack);
}
diff --git a/src/include/fe_utils/conditional.h b/src/include/fe_utils/conditional.h
index c64c655775..9c495072aa 100644
--- a/src/include/fe_utils/conditional.h
+++ b/src/include/fe_utils/conditional.h
@@ -73,6 +73,8 @@ typedef struct ConditionalStackData *ConditionalStack;
extern ConditionalStack conditional_stack_create(void);
+extern void conditional_stack_reset(ConditionalStack cstack);
+
extern void conditional_stack_destroy(ConditionalStack cstack);
extern int conditional_stack_depth(ConditionalStack cstack);
--
2.17.1
Attachment: v15-0001-Pgbench-errors-use-the-Variables-structure-for-c.patch (text/x-diff)
From 36882744153c3642e4c75e101cd31b7f8a65a893 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Wed, 26 May 2021 16:58:36 +0900
Subject: [PATCH v15 1/2] Pgbench errors: use the Variables structure for
client variables
This is most important when it is used to reset client variables during the
repeating of transactions after serialization/deadlock failures.
Don't allocate Variable structs one by one. Instead, add a constant margin each
time it overflows.
---
src/bin/pgbench/pgbench.c | 163 +++++++++++++++++++++++---------------
1 file changed, 100 insertions(+), 63 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 364b5a2e47..c4c2fd3566 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -287,6 +287,12 @@ const char *progname;
volatile bool timer_exceeded = false; /* flag from signal handler */
+/*
+ * We don't want to allocate variables one by one; for efficiency, add a
+ * constant margin each time it overflows.
+ */
+#define VARIABLES_ALLOC_MARGIN 8
+
/*
* Variable definitions.
*
@@ -304,6 +310,24 @@ typedef struct
PgBenchValue value; /* actual variable's value */
} Variable;
+/*
+ * Data structure for client variables.
+ */
+typedef struct
+{
+ Variable *vars; /* array of variable definitions */
+ int nvars; /* number of variables */
+
+ /*
+ * The maximum number of variables that we can currently store in 'vars'
+ * without having to reallocate more space. We must always have max_vars >=
+ * nvars.
+ */
+ int max_vars;
+
+ bool vars_sorted; /* are variables sorted by name? */
+} Variables;
+
#define MAX_SCRIPTS 128 /* max number of SQL scripts allowed */
#define SHELL_COMMAND_SIZE 256 /* maximum size allowed for shell command */
@@ -466,9 +490,7 @@ typedef struct
int command; /* command number in script */
/* client variables */
- Variable *variables; /* array of variable definitions */
- int nvariables; /* number of variables */
- bool vars_sorted; /* are variables sorted by name? */
+ Variables variables;
/* various times about current transaction in microseconds */
pg_time_usec_t txn_scheduled; /* scheduled start time of transaction */
@@ -1424,39 +1446,39 @@ compareVariableNames(const void *v1, const void *v2)
/* Locate a variable by name; returns NULL if unknown */
static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
{
Variable key;
/* On some versions of Solaris, bsearch of zero items dumps core */
- if (st->nvariables <= 0)
+ if (variables->nvars <= 0)
return NULL;
/* Sort if we have to */
- if (!st->vars_sorted)
+ if (!variables->vars_sorted)
{
- qsort((void *) st->variables, st->nvariables, sizeof(Variable),
+ qsort((void *) variables->vars, variables->nvars, sizeof(Variable),
compareVariableNames);
- st->vars_sorted = true;
+ variables->vars_sorted = true;
}
/* Now we can search */
key.name = name;
return (Variable *) bsearch((void *) &key,
- (void *) st->variables,
- st->nvariables,
+ (void *) variables->vars,
+ variables->nvars,
sizeof(Variable),
compareVariableNames);
}
/* Get the value of a variable, in string form; returns NULL if unknown */
static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
{
Variable *var;
char stringform[64];
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
return NULL; /* not found */
@@ -1588,21 +1610,37 @@ valid_variable_name(const char *name)
return true;
}
+/*
+ * Make sure there is enough space for 'needed' more variables in the
+ * variables array.
+ */
+static void
+enlargeVariables(Variables *variables, int needed)
+{
+ /* total number of variables required now */
+ needed += variables->nvars;
+
+ if (variables->max_vars < needed)
+ {
+ variables->max_vars = needed + VARIABLES_ALLOC_MARGIN;
+ variables->vars = (Variable *)
+ pg_realloc(variables->vars, variables->max_vars * sizeof(Variable));
+ }
+}
+
/*
* Lookup a variable by name, creating it if need be.
* Caller is expected to assign a value to the variable.
* Returns NULL on failure (bad name).
*/
static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
+lookupCreateVariable(Variables *variables, const char *context, char *name)
{
Variable *var;
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
{
- Variable *newvars;
-
/*
* Check for the name only when declaring a new variable to avoid
* overhead.
@@ -1614,23 +1652,17 @@ lookupCreateVariable(CState *st, const char *context, char *name)
}
/* Create variable at the end of the array */
- if (st->variables)
- newvars = (Variable *) pg_realloc(st->variables,
- (st->nvariables + 1) * sizeof(Variable));
- else
- newvars = (Variable *) pg_malloc(sizeof(Variable));
-
- st->variables = newvars;
+ enlargeVariables(variables, 1);
- var = &newvars[st->nvariables];
+ var = &(variables->vars[variables->nvars]);
var->name = pg_strdup(name);
var->svalue = NULL;
/* caller is expected to initialize remaining fields */
- st->nvariables++;
+ variables->nvars++;
/* we don't re-sort the array till we have to */
- st->vars_sorted = false;
+ variables->vars_sorted = false;
}
return var;
@@ -1639,12 +1671,13 @@ lookupCreateVariable(CState *st, const char *context, char *name)
/* Assign a string value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariable(CState *st, const char *context, char *name, const char *value)
+putVariable(Variables *variables, const char *context, char *name,
+ const char *value)
{
Variable *var;
char *val;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1662,12 +1695,12 @@ putVariable(CState *st, const char *context, char *name, const char *value)
/* Assign a value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableValue(CState *st, const char *context, char *name,
+putVariableValue(Variables *variables, const char *context, char *name,
const PgBenchValue *value)
{
Variable *var;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1682,12 +1715,13 @@ putVariableValue(CState *st, const char *context, char *name,
/* Assign an integer value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putVariableInt(Variables *variables, const char *context, char *name,
+ int64 value)
{
PgBenchValue val;
setIntValue(&val, value);
- return putVariableValue(st, context, name, &val);
+ return putVariableValue(variables, context, name, &val);
}
/*
@@ -1746,7 +1780,7 @@ replaceVariable(char **sql, char *param, int len, char *value)
}
static char *
-assignVariables(CState *st, char *sql)
+assignVariables(Variables *variables, char *sql)
{
char *p,
*name,
@@ -1767,7 +1801,7 @@ assignVariables(CState *st, char *sql)
continue;
}
- val = getVariable(st, name);
+ val = getVariable(variables, name);
free(name);
if (val == NULL)
{
@@ -1782,12 +1816,13 @@ assignVariables(CState *st, char *sql)
}
static void
-getQueryParams(CState *st, const Command *command, const char **params)
+getQueryParams(Variables *variables, const Command *command,
+ const char **params)
{
int i;
for (i = 0; i < command->argc - 1; i++)
- params[i] = getVariable(st, command->argv[i + 1]);
+ params[i] = getVariable(variables, command->argv[i + 1]);
}
static char *
@@ -2655,7 +2690,7 @@ evaluateExpr(CState *st, PgBenchExpr *expr, PgBenchValue *retval)
{
Variable *var;
- if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
+ if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
pg_log_error("undefined variable \"%s\"", expr->u.variable.varname);
return false;
@@ -2725,7 +2760,7 @@ getMetaCommand(const char *cmd)
* Return true if succeeded, or false on error.
*/
static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
char command[SHELL_COMMAND_SIZE];
int i,
@@ -2756,7 +2791,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
arg = argv[i] + 1; /* a string literal starting with colons */
}
- else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
+ else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
pg_log_error("%s: undefined variable \"%s\"", argv[0], argv[i]);
return false;
@@ -2817,7 +2852,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
pg_log_error("%s: shell command must return an integer (not \"%s\")", argv[0], res);
return false;
}
- if (!putVariableInt(st, "setshell", variable, retval))
+ if (!putVariableInt(variables, "setshell", variable, retval))
return false;
pg_log_debug("%s: shell parameter name: \"%s\", value: \"%s\"", argv[0], argv[1], res);
@@ -2869,7 +2904,7 @@ sendCommand(CState *st, Command *command)
char *sql;
sql = pg_strdup(command->argv[0]);
- sql = assignVariables(st, sql);
+ sql = assignVariables(&st->variables, sql);
pg_log_debug("client %d sending %s", st->id, sql);
r = PQsendQuery(st->con, sql);
@@ -2880,7 +2915,7 @@ sendCommand(CState *st, Command *command)
const char *sql = command->argv[0];
const char *params[MAX_ARGS];
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
pg_log_debug("client %d sending %s", st->id, sql);
r = PQsendQueryParams(st->con, sql, command->argc - 1,
@@ -2927,7 +2962,7 @@ sendCommand(CState *st, Command *command)
st->prepared[st->use_file] = true;
}
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
pg_log_debug("client %d sending %s", st->id, name);
@@ -3020,7 +3055,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
varname = psprintf("%s%s", varprefix, varname);
/* store last row result as a string */
- if (!putVariable(st, meta == META_ASET ? "aset" : "gset", varname,
+ if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
PQgetvalue(res, ntuples - 1, fld)))
{
/* internal error */
@@ -3081,14 +3116,14 @@ error:
* of delay, in microseconds. Returns true on success, false on error.
*/
static bool
-evaluateSleep(CState *st, int argc, char **argv, int *usecs)
+evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
char *var;
int usec;
if (*argv[1] == ':')
{
- if ((var = getVariable(st, argv[1] + 1)) == NULL)
+ if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
pg_log_error("%s: undefined variable \"%s\"", argv[0], argv[1] + 1);
return false;
@@ -3620,7 +3655,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
* latency will be recorded in CSTATE_SLEEP state, not here, after the
* delay has elapsed.)
*/
- if (!evaluateSleep(st, argc, argv, &usec))
+ if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
commandFailed(st, "sleep", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -3641,7 +3676,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
return CSTATE_ABORTED;
}
- if (!putVariableValue(st, argv[0], argv[1], &result))
+ if (!putVariableValue(&st->variables, argv[0], argv[1], &result))
{
commandFailed(st, "set", "assignment of meta-command failed");
return CSTATE_ABORTED;
@@ -3711,7 +3746,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
}
else if (command->meta == META_SETSHELL)
{
- if (!runShellCommand(st, argv[1], argv + 2, argc - 2))
+ if (!runShellCommand(&st->variables, argv[1], argv + 2, argc - 2))
{
commandFailed(st, "setshell", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -3719,7 +3754,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
}
else if (command->meta == META_SHELL)
{
- if (!runShellCommand(st, NULL, argv + 1, argc - 1))
+ if (!runShellCommand(&st->variables, NULL, argv + 1, argc - 1))
{
commandFailed(st, "shell", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -6013,7 +6048,7 @@ main(int argc, char **argv)
}
*p++ = '\0';
- if (!putVariable(&state[0], "option", optarg, p))
+ if (!putVariable(&state[0].variables, "option", optarg, p))
exit(1);
}
break;
@@ -6353,19 +6388,19 @@ main(int argc, char **argv)
int j;
state[i].id = i;
- for (j = 0; j < state[0].nvariables; j++)
+ for (j = 0; j < state[0].variables.nvars; j++)
{
- Variable *var = &state[0].variables[j];
+ Variable *var = &state[0].variables.vars[j];
if (var->value.type != PGBT_NO_VALUE)
{
- if (!putVariableValue(&state[i], "startup",
+ if (!putVariableValue(&state[i].variables, "startup",
var->name, &var->value))
exit(1);
}
else
{
- if (!putVariable(&state[i], "startup",
+ if (!putVariable(&state[i].variables, "startup",
var->name, var->svalue))
exit(1);
}
@@ -6400,11 +6435,11 @@ main(int argc, char **argv)
* :scale variables normally get -s or database scale, but don't override
* an explicit -D switch
*/
- if (lookupVariable(&state[0], "scale") == NULL)
+ if (lookupVariable(&state[0].variables, "scale") == NULL)
{
for (i = 0; i < nclients; i++)
{
- if (!putVariableInt(&state[i], "startup", "scale", scale))
+ if (!putVariableInt(&state[i].variables, "startup", "scale", scale))
exit(1);
}
}
@@ -6413,30 +6448,32 @@ main(int argc, char **argv)
* Define a :client_id variable that is unique per connection. But don't
* override an explicit -D switch.
*/
- if (lookupVariable(&state[0], "client_id") == NULL)
+ if (lookupVariable(&state[0].variables, "client_id") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "client_id", i))
+ if (!putVariableInt(&state[i].variables, "startup", "client_id", i))
exit(1);
}
/* set default seed for hash functions */
- if (lookupVariable(&state[0], "default_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "default_seed") == NULL)
{
uint64 seed =
((uint64) pg_jrand48(base_random_sequence.xseed) & 0xFFFFFFFF) |
(((uint64) pg_jrand48(base_random_sequence.xseed) & 0xFFFFFFFF) << 32);
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "default_seed", (int64) seed))
+ if (!putVariableInt(&state[i].variables, "startup", "default_seed",
+ (int64) seed))
exit(1);
}
/* set random seed unless overwritten */
- if (lookupVariable(&state[0], "random_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "random_seed") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+ if (!putVariableInt(&state[i].variables, "startup", "random_seed",
+ random_seed))
exit(1);
}
--
2.17.1
I attached the updated patch.
# About pgbench error handling v15
Patches apply cleanly. Compilation, global and local tests ok.
- v15.1: refactoring is a definite improvement.
Good, even if it is not very useful (see below).
While restructuring, maybe predefined variables could be made read-only
so that a script which would update them would fail, which would be a
good thing. This is probably material for an independent patch.
- v15.2: see detailed comments below
# Doc
Doc build is ok.
ISTM that "number of tries" line would be better placed between the
#threads and #transactions lines. What do you think?
Aggregate logging description: "{ failures | ... }" seems misleading
because it suggests we have one or the other, whereas it can also be
empty. I suggest: "{ | failures | ... }" to show the empty case.
Having a full example with retries in the doc is a good thing, and
illustrates in passing that running with a number of clients on a small
scale does not make much sense because of the contention on
tellers/branches. I'd wonder whether the number of tries is set too high,
though, ISTM that an application should give up before 100? I like that
the feature is also limited by the latency limit.
Minor editing:
"there're" -> "there are".
"the --time" -> "the --time option".
The overall English seems good, but I'm not a native speaker. As I already said,
proofreading by a native speaker would be nice.
From a technical writing point of view, maybe the documentation could be improved a bit,
but I'm not at ease on that subject. Some comments:
"The latency for failed transactions and commands is not computed separately." is unclear;
please use a positive sentence that tells what is true instead of what is not, so the reader
does not have to guess. Maybe: "The latency figures include failed transactions which have reached
the maximum number of tries or the transaction latency limit.".
"The main report contains the number of failed transactions if it is non-zero." ISTM that
this is a pain for scripts which would like to process these report data, because the data
may or may not be there. I'm likely to write such scripts, which explains my concern:-)
"If the total number of retried transactions is non-zero…" should it rather be "not one",
because zero means unlimited retries?
The section describing the various type of errors that can occur is a good addition.
Option "--report-latencies" changed to "--report-per-commands": I'm fine with this change.
# FEATURES
--failures-detailed: I'm not convinced that this option should not always be on, but
this is not very important, so let it be.
--verbose-errors: I still think this is only for debugging, but let it be.
Copying variables: ISTM that we should not need to save the variables
states… no clearing, no copying should be needed. The restarted
transaction simply overrides the existing variables which is what the
previous version was doing anyway. The scripts should write their own
variables before using them, and if they don't then it is the user's
problem. This is important for performance, because it means that after a
client has executed all scripts once the variable array is stable and does
not incur significant maintenance costs. The only thing that needs saving
for retry is the pseudo-random generator state. This suggests simplifying
or removing "RetryState".
# CODE
The semantics of "cnt" is changed. Ok, the overall counters and their
relationships make sense, and it simplifies the reporting code. Good.
In readCommandResponse: ISTM that PGRES_NONFATAL_ERROR is not needed and
could be dealt with the default case. We are only interested in
serialization/deadlocks which are fatal errors?
doRetry: for consistency, given the assert, ISTM that it should return
false if duration has expired, by testing end_time or timer_exceeded.
checkTransactionStatus: this function does several things at once with 2
booleans, which makes it not very clear to me. Maybe it would be clearer if
it would just return an enum (in trans, not in trans, conn error, other
error). Another reason to do that is that on connection error pgbench
could try to reconnect, which would be an interesting later extension, so
let's pave the way for that. Also, I do not think that the function
should print out a message, it should be the caller decision to do that.
verbose_errors: there is more or less repeated code under RETRY and
FAILURE, which should be factored out in a separate function. The
advanceConnectionState function is long enough. Once this is done, there is no
need for a getLatencyUsed function.
I'd put cleaning up the pipeline in a function. I do not understand why
the pipeline mode is not exited in all cases, the code checks for the
pipeline status twice in a few lines. I'd put this cleanup in the sync
function as well, report to the caller (advanceConnectionState) if there
was an error, which would be managed there.
WAIT_ROLLBACK_RESULT: consuming results in a while loop could be a function to
avoid code repetition (there and in the "error:" label in
readCommandResponse). On the other hand, I'm not sure why the loop is
needed: we can only get there by submitting a "ROLLBACK" command, so there
should be only one result anyway?
report_per_command: please always count retries and failures of commands
even if they will not be reported in the end, the code will be simpler and
more efficient.
doLog: the format has changed, including a new string on failures which
replaces the time field. Hmmm. Cannot say I like it much, but why not. ISTM
that the string could be shortened to "deadlock" or "serialization". ISTM
that the documentation example should include a line with a failure, to
make it clear what to expect.
I'm okay with always computing thread stats.
# COMMENTS
struct StatsData comment is helpful.
- "failed transactions" -> "unsuccessfully retried transactions"?
- 'cnt' decomposition: first term is field 'retried'? if so say it
explicitly?
"Complete the failed transaction" sounds strange: If it failed, it could
not complete? I'd suggest "Record a failed transaction".
# TESTS
I suggested to simplify the tests by using conditionals & sequences. You
reported that you got stuck. Hmmm.
I tried again my tests, which worked fine when started with 2 clients;
otherwise they get stuck because the first client waits for the other one
which does not exist (the point is to generate deadlocks and other
errors). Maybe this is your issue?
Could you try with:
psql < deadlock_prep.sql
pgbench -t 4 -c 2 -f deadlock.sql
# note: each deadlock detection takes 1 second
psql < deadlock_prep.sql
pgbench -t 10 -c 2 -f serializable.sql
# very quick 50% serialization errors
--
Fabien.
Hello Fabien,
Thank you so much for your review.
Sorry for the late reply. I've stopped working on it due to other
jobs but I came back again. I attached the updated patch. I would
appreciate it if you could review this again.
On Mon, 19 Jul 2021 20:04:23 +0200 (CEST)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:
# About pgbench error handling v15
Patches apply cleanly. Compilation, global and local tests ok.
- v15.1: refactoring is a definite improvement.
Good, even if it is not very useful (see below).
Ok, we don't need to save variables in order to implement the
retry feature in pgbench as you suggested. Well, should we completely
separate these two patches, and should I fix v15.2 to not rely on v15.1?
While restructuring, maybe predefined variables could be made read-only
so that a script which would update them would fail, which would be a
good thing. This is probably material for an independent patch.
Yes, it should be material for an independent patch.
- v15.2: see detailed comments below
# Doc
Doc build is ok.
ISTM that "number of tries" line would be better placed between the
#threads and #transactions lines. What do you think?
Agreed. Fixed.
Aggregate logging description: "{ failures | ... }" seems misleading
because it suggests we have one or the other, whereas it can also be
empty. I suggest: "{ | failures | ... }" to show the empty case.
The description is correct because either "failures" or both of
"serialization_failures" and "deadlock_failures" should appear in aggregate
logging. If "failures" were printed only when some transaction failed,
each line in the aggregate log could have a different number of columns,
which would make it difficult to parse the results.
I'd wonder whether the number of tries is set too high,
though, ISTM that an application should give up before 100?
Indeed, max-tries=100 seems too high for a practical system.
Also, I noticed that the sum of the latencies of each command (= 15.839 ms)
was significantly larger than the latency average (= 10.870 ms),
because the "per command" results in the documentation were fixed values.
So, I retook the measurements on my machine for more accurate documentation. I
used max-tries=10.
Minor editing:
"there're" -> "there are".
"the --time" -> "the --time option".
Fixed.
"The latency for failed transactions and commands is not computed separately." is unclear;
please use a positive sentence that tells what is true instead of what is not, so the reader
does not have to guess. Maybe: "The latency figures include failed transactions which have reached
the maximum number of tries or the transaction latency limit.".
I'm not the original author of this description, but I guess this means "The latency is
measured only for successful transactions and commands but not for failed transactions
or commands.".
"The main report contains the number of failed transactions if it is non-zero." ISTM that
this is a pain for scripts which would like to process these report data, because the data
may or may not be there. I'm likely to write such scripts, which explains my concern:-)
I agree with you. I fixed the behavior to always report the number of failed
transactions regardless of whether it is non-zero.
"If the total number of retried transactions is non-zero…" should it rather be "not one",
because zero means unlimited retries?
I guess that this means the actual number of retried transactions, not max-tries, so
"non-zero" was correct. However, for the same reason as above, I fixed the behavior to
always report the retry statistics regardless of the actual retry numbers.
# FEATURES
Copying variables: ISTM that we should not need to save the variables
states… no clearing, no copying should be needed. The restarted
transaction simply overrides the existing variables which is what the
previous version was doing anyway. The scripts should write their own
variables before using them, and if they don't then it is the user's
problem. This is important for performance, because it means that after a
client has executed all scripts once the variable array is stable and does
not incur significant maintenance costs. The only thing that needs saving
for retry is the pseudo-random generator state. This suggests simplifying
or removing "RetryState".
Yes. Saving the variable states is not necessary because we retry the
whole script. It was necessary in the initial patch because it
planned to retry one transaction included in the script. I removed
RetryState and copyVariables.
# CODE
In readCommandResponse: ISTM that PGRES_NONFATAL_ERROR is not needed and
could be dealt with the default case. We are only interested in
serialization/deadlocks which are fatal errors?
We need PGRES_NONFATAL_ERROR to save st->estatus. It is used outside
readCommandResponse to determine whether we should abort or not.
doRetry: for consistency, given the assert, ISTM that it should return
false if duration has expired, by testing end_time or timer_exceeded.
Ok. I fixed doRetry to check timer_exceeded again.
checkTransactionStatus: this function does several things at once with 2
booleans, which makes it not very clear to me. Maybe it would be clearer if
it would just return an enum (in trans, not in trans, conn error, other
error). Another reason to do that is that on connection error pgbench
could try to reconnect, which would be an interesting later extension, so
let's pave the way for that. Also, I do not think that the function
should print out a message, it should be the caller decision to do that.
OK. I added a new enum type TStatus and I fixed the function to return it.
Also, I changed the function name to getTransactionStatus because the
actual check is done by the caller.
verbose_errors: there is more or less repeated code under RETRY and
FAILURE, which should be factored out in a separate function. The
advanceConnectionState function is long enough. Once this is done, there is no
need for a getLatencyUsed function.
OK. I made a function to print verbose error messages and removed the
getLatencyUsed function.
I'd put cleaning up the pipeline in a function. I do not understand why
the pipeline mode is not exited in all cases, the code checks for the
pipeline status twice in a few lines. I'd put this cleanup in the sync
function as well, report to the caller (advanceConnectionState) if there
was an error, which would be managed there.
I fixed the code to exit pipeline mode whenever an error occurs in pipeline mode.
Also, I added a PQpipelineSync call which was forgotten in the previous patch.
WAIT_ROLLBACK_RESULT: consuming results in a while loop could be a function to
avoid code repetition (there and in the "error:" label in
readCommandResponse). On the other hand, I'm not sure why the loop is
needed: we can only get there by submitting a "ROLLBACK" command, so there
should be only one result anyway?
Right. We should receive just one PGRES_COMMAND_OK followed by NULL.
I eliminated the loop.
report_per_command: please always count retries and failures of commands
even if they will not be reported in the end, the code will be simpler and
more efficient.
Ok. I fixed the code to count retries and failures of commands even when
report_per_command is false.
doLog: the format has changed, including a new string on failures which
replaces the time field. Hmmm. Cannot say I like it much, but why not. ISTM
that the string could be shortened to "deadlock" or "serialization". ISTM
that the documentation example should include a line with a failure, to
make it clear what to expect.
I fixed getResultString to return "deadlock" or "serialization" instead of
"deadlock_failure" or "serialization_failure". Also, I added an output
example to the documentation.
I'm okay with always computing thread stats.
# COMMENTS
struct StatsData comment is helpful.
- "failed transactions" -> "unsuccessfully retried transactions"?
This seems an accurate description. However, "failed transaction" is
short and simple, and it is used in several places, so instead of
replacing them I added the following statement to define it:
"A failed transaction is defined as an unsuccessfully retried transaction."
- 'cnt' decomposition: first term is field 'retried'? if so say it
explicitly?
No. 'retried' includes unsuccessfully retried transactions, but 'cnt'
includes only successfully retried transactions.
"Complete the failed transaction" sounds strange: If it failed, it could
not complete? I'd suggest "Record a failed transaction".
Sounds good. Fixed.
# TESTS
I suggested to simplify the tests by using conditionals & sequences. You
reported that you got stuck. Hmmm.
I tried again my tests, which worked fine when started with 2 clients;
otherwise they get stuck because the first client waits for the other one
which does not exist (the point is to generate deadlocks and other
errors). Maybe this is your issue?
That seems to be right. It got stuck when I used the -T option rather than -t;
I guess it was because the number of transactions on each thread was
different.
Could you try with:
psql < deadlock_prep.sql
pgbench -t 4 -c 2 -f deadlock.sql
# note: each deadlock detection takes 1 second
psql < deadlock_prep.sql
pgbench -t 10 -c 2 -f serializable.sql
# very quick 50% serialization errors
That works. However, it still hangs when --max-tries=2,
so I don't think we can use it for testing the retry
feature....
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v16-0002-Pgbench-errors-and-serialization-deadlock-retrie.patch (text/x-diff)
From 1762b812942a8d1b97ca07e269b0e41ff4189cce Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Mon, 7 Jun 2021 18:35:14 +0900
Subject: [PATCH v16 2/2] Pgbench errors and serialization/deadlock retries
A client's run is aborted in case of a serious error, for example, if the
connection with the database server is lost or the end of the script is reached
without completing the last transaction. In addition, if an execution of an SQL
or meta command fails for reasons other than serialization or deadlock errors,
the client is aborted. Otherwise, if an SQL command fails with a serialization or
deadlock error, the current transaction is rolled back, which also
includes setting the client variables as they were before the run of this
transaction (it is assumed that one transaction script contains only one
transaction).
Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum number of
tries (specified by the --max-tries option) / the maximum time of tries
(specified by the --latency-limit option). These options can be combined
together; moreover, you cannot use an unlimited number of tries (--max-tries=0)
without the --latency-limit option or the --time option. By default the option
--max-tries is set to 1 and transactions with serialization/deadlock errors
are not retried. If the last transaction run fails, this transaction will be
reported as failed, and the client variables will be set as they were before
the first run of this transaction.
If there are retries and/or failures, their statistics are printed in the
progress reports, in the transaction / aggregation logs, and at the end with the
other results (overall and for each script). Also, retries and failures are printed
per command with average latencies if you use the appropriate benchmarking
option (--report-per-command, -r). If you want to group failures by basic types
(serialization failures / deadlock failures), use the option --failures-detailed.
If you want to distinguish all errors and failures (errors without retrying) by
type, including which limit for retries was violated and how far it was exceeded
for the serialization/deadlock failures, use the option --verbose-errors.
---
doc/src/sgml/ref/pgbench.sgml | 435 ++++++++-
src/bin/pgbench/pgbench.c | 900 +++++++++++++++++--
src/bin/pgbench/t/001_pgbench_with_server.pl | 215 ++++-
src/bin/pgbench/t/002_pgbench_no_server.pl | 10 +
src/fe_utils/conditional.c | 16 +-
src/include/fe_utils/conditional.h | 2 +
6 files changed, 1458 insertions(+), 120 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index be1896fa99..d1f10f6835 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -56,8 +56,10 @@ scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 1
+maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+number of failed transactions: 0 (0.000%)
latency average = 11.013 ms
latency stddev = 7.351 ms
initial connection time = 45.758 ms
@@ -65,11 +67,18 @@ tps = 896.967014 (without initial connection time)
</screen>
The first six lines report some of the most important parameter
- settings. The next line reports the number of transactions completed
+ settings.
+ The seventh line reports the maximum number of tries for transactions with
+ serialization or deadlock errors (see <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information).
+ The eighth line reports the number of transactions completed
and intended (the latter being just the product of number of clients
and number of transactions per client); these will be equal unless the run
- failed before completion. (In <option>-T</option> mode, only the actual
- number of transactions is printed.)
+ failed before completion or some SQL command(s) failed. (In
+ <option>-T</option> mode, only the actual number of transactions is printed.)
+ The next line reports the number of failed transactions due to
+ serialization or deadlock errors (see <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -531,6 +540,17 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
at all. They are counted and reported separately as
<firstterm>skipped</firstterm>.
</para>
+ <para>
+ When the <option>--max-tries</option> option is used, the transaction with
+ serialization or deadlock error cannot be retried if the total time of
+ all its tries is greater than <replaceable>limit</replaceable> ms. To
+ limit only the time of tries and not their number, use
+ <literal>--max-tries=0</literal>. By default option
+ <option>--max-tries</option> is set to 1 and transactions with
+ serialization/deadlock errors are not retried. See <xref
+ linkend="failures-and-retries" endterm="failures-and-retries-title"/>
+ for more information about retrying such transactions.
+ </para>
</listitem>
</varlistentry>
@@ -597,23 +617,29 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<para>
Show progress report every <replaceable>sec</replaceable> seconds. The report
includes the time since the beginning of the run, the TPS since the
- last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</option>),
- the latency is computed with respect to the transaction scheduled
- start time, not the actual transaction beginning time, thus it also
- includes the average schedule lag time.
+ last report, and the transaction latency average, standard deviation,
+ and the number of failed transactions since the last report. Under
+ throttling (<option>-R</option>), the latency is computed with respect
+ to the transaction scheduled start time, not the actual transaction
+ beginning time, thus it also includes the average schedule lag time.
+ When <option>--max-tries</option> is used to enable transactions retries
+ after serialization/deadlock errors, the report includes the number of
+ retried transactions and the sum of all retries.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-r</option></term>
- <term><option>--report-latencies</option></term>
+ <term><option>--report-per-command</option></term>
<listitem>
<para>
- Report the average per-statement latency (execution time from the
- perspective of the client) of each command after the benchmark
- finishes. See below for details.
+ Report the following statistics for each command after the benchmark
+ finishes: the average per-statement latency (execution time from the
+ perspective of the client), the number of failures and the number of
+ retries after serialization or deadlock errors in this command. The
+ report displays retry statistics only if the
+ <option>--max-tries</option> option is not equal to 1.
</para>
</listitem>
</varlistentry>
@@ -741,6 +767,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--failures-detailed</option></term>
+ <listitem>
+ <para>
+ Report failures in per-transaction and aggregation logs, as well as in
+ the main and per-script reports, grouped by the following types:
+ <itemizedlist>
+ <listitem>
+ <para>serialization failures;</para>
+ </listitem>
+ <listitem>
+ <para>deadlock failures;</para>
+ </listitem>
+ </itemizedlist>
+ See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--log-prefix=<replaceable>prefix</replaceable></option></term>
<listitem>
@@ -751,6 +797,38 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--max-tries=<replaceable>number_of_tries</replaceable></option></term>
+ <listitem>
+ <para>
+ Enable retries for transactions with serialization/deadlock errors and
+ set the maximum number of these tries. This option can be combined with
+ the <option>--latency-limit</option> option which limits the total time
+ of all transaction tries; moreover, you cannot use an unlimited number
+ of tries (<literal>--max-tries=0</literal>) without
+ <option>--latency-limit</option> or <option>--time</option>.
+ The default value is 1 and transactions with serialization/deadlock
+ errors are not retried. See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information about
+ retrying such transactions.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--verbose-errors</option></term>
+ <listitem>
+ <para>
+ Print messages about all errors and failures (errors without retrying)
+ including which limit for retries was violated and how far it was
+ exceeded for the serialization/deadlock failures. (Note that in this
+ case the output can be significantly increased.)
+ See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--progress-timestamp</option></term>
<listitem>
@@ -948,8 +1026,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<refsect1>
<title>Notes</title>
- <refsect2>
- <title>What Is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts">
+ <title id="transactions-and-scripts-title">What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
<para>
<application>pgbench</application> executes test scripts chosen randomly
@@ -1022,6 +1100,11 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
both old and new versions of <application>pgbench</application>, be sure to write
each SQL command on a single line ending with a semicolon.
</para>
+ <para>
+ It is assumed that pgbench scripts do not contain incomplete blocks of SQL
+ transactions. If at runtime the client reaches the end of the script without
+ completing the last transaction block, it will be aborted.
+ </para>
</note>
<para>
@@ -2212,7 +2295,7 @@ END;
The format of the log is:
<synopsis>
-<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -2233,6 +2316,17 @@ END;
When both <option>--rate</option> and <option>--latency-limit</option> are used,
the <replaceable>time</replaceable> for a skipped transaction will be reported as
<literal>skipped</literal>.
+ <replaceable>retries</replaceable> is the sum of all retries after the
+ serialization or deadlock errors during the current script execution. It is
+ present only if the <option>--max-tries</option> option is not equal to 1.
+ If the transaction ends with a failure, its <replaceable>time</replaceable>
+ will be reported as <literal>failed</literal>. If you use the
+ <option>--failures-detailed</option> option, the
+ <replaceable>time</replaceable> of the failed transaction will be reported as
+ <literal>serialization</literal> or
+ <literal>deadlock</literal> depending on the type of failure (see
+ <xref linkend="failures-and-retries" endterm="failures-and-retries-title"/>
+ for more information).
</para>
<para>
@@ -2261,6 +2355,41 @@ END;
were already late before they were even started.
</para>
+ <para>
+ The following example shows a snippet of a log file with failures and
+ retries, with the maximum number of tries set to 10 (note the additional
+ <replaceable>retries</replaceable> column):
+<screen>
+3 0 47423 0 1499414498 34501 3
+3 1 8333 0 1499414498 42848 0
+3 2 8358 0 1499414498 51219 0
+4 0 72345 0 1499414498 59433 6
+1 3 41718 0 1499414498 67879 4
+1 4 8416 0 1499414498 76311 0
+3 3 33235 0 1499414498 84469 3
+0 0 failed 0 1499414498 84905 9
+2 0 failed 0 1499414498 86248 9
+3 4 8307 0 1499414498 92788 0
+</screen>
+ </para>
+
+ <para>
+ If <option>--failures-detailed</option> option is used, the type of
+ failure is reported in the <replaceable>time</replaceable> like this:
+<screen>
+3 0 47423 0 1499414498 34501 3
+3 1 8333 0 1499414498 42848 0
+3 2 8358 0 1499414498 51219 0
+4 0 72345 0 1499414498 59433 6
+1 3 41718 0 1499414498 67879 4
+1 4 8416 0 1499414498 76311 0
+3 3 33235 0 1499414498 84469 3
+0 0 serialization 0 1499414498 84905 9
+2 0 serialization 0 1499414498 86248 9
+3 4 8307 0 1499414498 92788 0
+</screen>
+ </para>
+
<para>
When running a long test on hardware that can handle a lot of transactions,
the log files can become very large. The <option>--sampling-rate</option> option
@@ -2276,7 +2405,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable>&zwsp; <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable>&zwsp; <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -2290,7 +2419,16 @@ END;
transaction latencies within the interval,
<replaceable>min_latency</replaceable> is the minimum latency within the interval,
and
- <replaceable>max_latency</replaceable> is the maximum latency within the interval.
+ <replaceable>max_latency</replaceable> is the maximum latency within the interval,
+ <replaceable>failures</replaceable> is the number of transactions that ended
+ with a failed SQL command within the interval. If you use option
+ <option>--failures-detailed</option>, instead of the sum of all failed
+ transactions you will get more detailed statistics for the failed
+ transactions grouped by the following types:
+ <replaceable>serialization_failures</replaceable> is the number of
+ transactions that got a serialization error and were not retried after this,
+ <replaceable>deadlock_failures</replaceable> is the number of transactions
+ that got a deadlock error and were not retried after this.
The next fields,
<replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
@@ -2298,21 +2436,25 @@ END;
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e., the difference between each transaction's
scheduled start time and the time it actually started.
- The very last field, <replaceable>skipped</replaceable>,
+ The next field, <replaceable>skipped</replaceable>,
is only present if the <option>--latency-limit</option> option is used, too.
It counts the number of transactions skipped because they would have
started too late.
+ The <replaceable>retried</replaceable> and <replaceable>retries</replaceable>
+ fields are present only if the <option>--max-tries</option> option is not
+ equal to 1. They report the number of retried transactions and the sum of all
+ retries after serialization or deadlock errors within the interval.
Each transaction is counted in the interval when it was committed.
</para>
<para>
Here is some example output:
<screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0
+1345828503 7884 1979812 565806736 60 1479 0
+1345828505 7208 1979422 567277552 59 1391 0
+1345828507 7685 1980268 569784714 60 1398 0
+1345828509 7073 1979779 573489941 236 1411 0
</screen></para>
<para>
@@ -2324,13 +2466,44 @@ END;
</refsect2>
<refsect2>
- <title>Per-Statement Latencies</title>
+ <title>Per-Statement Report</title>
+
+ <para>
+ With the <option>-r</option> option, <application>pgbench</application>
+ collects the following statistics for each statement:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>latency</literal> — the elapsed transaction time of each
+ statement. <application>pgbench</application> reports the average value
+ over all successful runs of the statement.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of failures in this statement. See
+ <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of retries after a serialization or a deadlock error in this
+ statement. See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ The report displays retry statistics only if the <option>--max-tries</option>
+ option is not equal to 1.
+ </para>
<para>
- With the <option>-r</option> option, <application>pgbench</application> collects
- the elapsed transaction time of each statement executed by every
- client. It then reports an average of those values, referred to
- as the latency for each statement, after the benchmark has finished.
+ All values are computed for each statement executed by every client and are
+ reported after the benchmark has finished.
</para>
<para>
@@ -2342,29 +2515,67 @@ scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
+maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
-latency average = 10.870 ms
-latency stddev = 7.341 ms
-initial connection time = 30.954 ms
-tps = 907.949122 (without initial connection time)
-statement latencies in milliseconds:
- 0.001 \set aid random(1, 100000 * :scale)
- 0.001 \set bid random(1, 1 * :scale)
- 0.001 \set tid random(1, 10 * :scale)
- 0.000 \set delta random(-5000, 5000)
- 0.046 BEGIN;
- 0.151 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
- 0.107 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
- 4.241 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
- 5.245 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
- 0.102 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
- 0.974 END;
+number of failed transactions: 0 (0.000%)
+number of transactions above the 50.0 ms latency limit: 1311/10000 (13.110 %)
+latency average = 28.488 ms
+latency stddev = 21.009 ms
+initial connection time = 69.068 ms
+tps = 346.224794 (without initial connection time)
+statement latencies in milliseconds and failures:
+ 0.012 0 \set aid random(1, 100000 * :scale)
+ 0.002 0 \set bid random(1, 1 * :scale)
+ 0.002 0 \set tid random(1, 10 * :scale)
+ 0.002 0 \set delta random(-5000, 5000)
+ 0.319 0 BEGIN;
+ 0.834 0 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.641 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 11.126 0 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 12.961 0 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.634 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.957 0 END;
</screen>
+
+ Here is another example of output, for the default script run with the
+ serializable default transaction isolation level (<command>PGOPTIONS='-c
+ default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: <builtin: TPC-B (sort of)>
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+maximum number of tries: 10
+number of transactions per client: 1000
+number of transactions actually processed: 6317/10000
+number of failed transactions: 3683 (36.830%)
+number of transactions retried: 7667 (76.670%)
+total number of retries: 45339
+number of transactions above the 50.0 ms latency limit: 106/6317 (1.678 %)
+latency average = 17.016 ms
+latency stddev = 13.283 ms
+initial connection time = 45.017 ms
+tps = 186.792667 (without initial connection time)
+statement latencies in milliseconds, failures and retries:
+ 0.006 0 0 \set aid random(1, 100000 * :scale)
+ 0.001 0 0 \set bid random(1, 1 * :scale)
+ 0.001 0 0 \set tid random(1, 10 * :scale)
+ 0.001 0 0 \set delta random(-5000, 5000)
+ 0.385 0 0 BEGIN;
+ 0.773 0 1 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.624 0 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 1.098 320 3762 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 0.582 3363 41576 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.465 0 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.933 0 0 END;
+ </screen>
</para>
<para>
- If multiple script files are specified, the averages are reported
+ If multiple script files are specified, all statistics are reported
separately for each script file.
</para>
@@ -2378,6 +2589,140 @@ statement latencies in milliseconds:
</para>
</refsect2>
+ <refsect2 id="failures-and-retries">
+ <title id="failures-and-retries-title">Failures and Serialization/Deadlock Retries</title>
+
+ <para>
+ When executing <application>pgbench</application>, there are three main types
+ of errors:
+ <itemizedlist>
+ <listitem>
+ <para>
+ Errors of the main program. They are the most serious and always result
+ in an immediate exit from <application>pgbench</application> with
+ the corresponding error message. They include:
+ <itemizedlist>
+ <listitem>
+ <para>
+ errors during the startup of <application>pgbench</application>
+ (e.g. an invalid option value);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ errors in the initialization mode (e.g. a query to create
+ tables for built-in scripts fails);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ errors before starting threads (e.g. a connection to the
+ database server could not be established, a syntax error in a
+ meta command, or a thread creation failure);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ internal <application>pgbench</application> errors (which are
+ supposed to never occur...).
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Errors when the thread manages its clients (e.g. a client could not
+ start a connection to the database server, or the socket for connecting
+ the client to the database server has become invalid). In such cases
+ all clients of this thread stop, while other threads continue to run.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Direct client errors. They lead to an immediate exit from
+ <application>pgbench</application> with the corresponding error message
+ only in the case of an internal <application>pgbench</application>
+ error (which is supposed to never occur...). Otherwise, in the worst
+ case they only lead to the abort of the failed client, while other
+ clients continue to run (but some client errors are handled without
+ a client abort and are reported separately, see below). In the rest
+ of this section it is assumed that the discussed errors are only the
+ direct client errors and that they are not internal
+ <application>pgbench</application> errors.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ A client's run is aborted in case of a serious error; for example, the
+ connection with the database server was lost or the end of the script was
+ reached without completing the last transaction. In addition, if execution
+ of an SQL or meta command fails for reasons other than serialization or
+ deadlock errors, the client is aborted. Otherwise, if an SQL command fails
+ with a serialization or deadlock error, the client is not aborted. In such
+ cases, the current transaction is rolled back, which also includes setting
+ the client variables as they were before the run of this transaction (it is
+ assumed that one transaction script contains only one transaction; see
+ <xref linkend="transactions-and-scripts" endterm="transactions-and-scripts-title"/>
+ for more information). Transactions with serialization or deadlock errors
+ are repeated after rollbacks until they complete successfully or reach the
+ maximum number of tries (specified by the <option>--max-tries</option>
+ option), the maximum time of retries (specified by the
+ <option>--latency-limit</option> option), or the end of the benchmark
+ (specified by the <option>--time</option> option). If the last try fails,
+ this transaction is reported as failed, but the client is not aborted and
+ continues to run.
+ </para>
+
+ <note>
+ <para>
+ Without specifying the <option>--max-tries</option> option, a transaction
+ will never be retried after a serialization or deadlock error because its
+ default value is 1. Use an unlimited number of tries
+ (<literal>--max-tries=0</literal>) and the <option>--latency-limit</option>
+ option to limit only the maximum time of tries. You can also use the
+ <option>--time</option> option to limit the benchmark duration under an
+ unlimited number of tries.
+ </para>
+ <para>
+ Be careful when repeating scripts that contain multiple transactions: the
+ script is always retried completely, so successful transactions can be
+ performed several times.
+ </para>
+ <para>
+ Be careful when repeating transactions with shell commands. Unlike the
+ results of SQL commands, the results of shell commands are not rolled back,
+ except for the variable value of the <command>\setshell</command> command.
+ </para>
+ </note>
+
+ <para>
+ The latency of a successful transaction includes the entire time of
+ transaction execution with rollbacks and retries. The latency is measured
+ only for successful transactions and commands but not for failed transactions
+ or commands.
+ </para>
+
+ <para>
+ The main report contains the number of failed transactions. If the
+ <option>--max-tries</option> option is not equal to 1, the main report also
+ contains the statistics related to retries: the total number of retried
+ transactions and total number of retries. The per-script report inherits all
+ these fields from the main report. The per-statement report displays retry
+ statistics only if the <option>--max-tries</option> option is not equal to 1.
+ </para>
+
+ <para>
+ If you want to group failures by basic types in per-transaction and
+ aggregation logs, as well as in the main and per-script reports, use the
+ <option>--failures-detailed</option> option. If you also want to distinguish
+ all errors and failures (errors without retrying) by type, including which
+ limit for retries was exceeded and by how much for the
+ serialization/deadlock failures, use the <option>--verbose-errors</option>
+ option.
+ </para>
+ </refsect2>
+
<refsect2>
<title>Good Practices</title>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 3743e36dac..da7dad8f65 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -76,6 +76,8 @@
#define M_PI 3.14159265358979323846
#endif
+#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"
#define ERRCODE_UNDEFINED_TABLE "42P01"
/*
@@ -275,9 +277,34 @@ bool progress_timestamp = false; /* progress report with Unix time */
int nclients = 1; /* number of clients */
int nthreads = 1; /* number of threads */
bool is_connect; /* establish connection for each transaction */
-bool report_per_command; /* report per-command latencies */
+bool report_per_command = false; /* report per-command latencies, retries
+ * after errors and failures (errors
+ * without retrying) */
int main_pid; /* main process id used in log filename */
+/*
+ * There are different types of restrictions for deciding that the current
+ * transaction with a serialization/deadlock error can no longer be retried and
+ * should be reported as failed:
+ * - max_tries (--max-tries) can be used to limit the number of tries;
+ * - latency_limit (-L) can be used to limit the total time of tries;
+ * - duration (-T) can be used to limit the total benchmark time.
+ *
+ * They can be combined together, and you need to use at least one of them to
+ * retry the transactions with serialization/deadlock errors. If none of them is
+ * used, the default value of max_tries is 1 and such transactions will not be
+ * retried.
+ */
+
+/*
+ * We cannot retry a transaction after the serialization/deadlock error if its
+ * number of tries reaches this maximum; if its value is zero, it is not used.
+ */
+uint32 max_tries = 1;
+
+bool failures_detailed = false; /* whether to group failures in reports
+ * or logs by basic types */
+
const char *pghost = NULL;
const char *pgport = NULL;
const char *username = NULL;
@@ -362,9 +389,66 @@ typedef int64 pg_time_usec_t;
typedef struct StatsData
{
pg_time_usec_t start_time; /* interval start time, for aggregates */
- int64 cnt; /* number of transactions, including skipped */
+
+ /*
+ * Transactions are counted depending on their execution and outcome.
+ * First, a transaction may have started or not: skipped transactions occur
+ * under --rate and --latency-limit when the client is too late to execute
+ * them. Second, a started transaction may ultimately succeed or fail,
+ * possibly after some retries when --max-tries is not one. Thus
+ *
+ * the number of all transactions =
+ * 'skipped' (it was too late to execute them) +
+ * 'cnt' (the number of successful transactions) +
+ * failed (the number of failed transactions).
+ *
+ * A successful transaction can have several unsuccessful tries before a
+ * successful run. Thus
+ *
+ * 'cnt' (the number of successful transactions) =
+ * successfully retried transactions (they got a serialization or a
+ * deadlock error(s), but were
+ * successfully retried from the very
+ * beginning) +
+ * directly successful transactions (they were successfully completed on
+ * the first try).
+ *
+ * A failed transaction is an unsuccessfully retried transaction.
+ * It can be one of two types:
+ *
+ * failed (the number of failed transactions) =
+ * 'serialization_failures' (they got a serialization error and were not
+ * successfully retried) +
+ * 'deadlock_failures' (they got a deadlock error and were not successfully
+ * retried).
+ *
+ * If the transaction was retried after a serialization or a deadlock error
+ * this does not guarantee that this retry was successful. Thus
+ *
+ * 'retries' (number of retries) =
+ * number of retries in all retried transactions =
+ * number of retries in (successfully retried transactions +
+ * failed transactions);
+ *
+ * 'retried' (number of all retried transactions) =
+ * successfully retried transactions +
+ * failed transactions.
+ */
+ int64 cnt; /* number of successful transactions, not
+ * including 'skipped' */
int64 skipped; /* number of transactions skipped under --rate
* and --latency-limit */
+ int64 retries; /* number of retries after a serialization or a
+ * deadlock error in all the transactions */
+ int64 retried; /* number of all transactions that were retried
+ * after a serialization or a deadlock error
+ * (perhaps the last try was unsuccessful) */
+ int64 serialization_failures; /* number of transactions that were not
+ * successfully retried after a
+ * serialization error */
+ int64 deadlock_failures; /* number of transactions that were not
+ * successfully retried after a deadlock
+ * error */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -375,6 +459,31 @@ typedef struct StatsData
*/
pg_time_usec_t epoch_shift;
+/*
+ * Error status for errors during script execution.
+ */
+typedef enum EStatus
+{
+ ESTATUS_NO_ERROR = 0,
+ ESTATUS_META_COMMAND_ERROR,
+
+ /* SQL errors */
+ ESTATUS_SERIALIZATION_ERROR,
+ ESTATUS_DEADLOCK_ERROR,
+ ESTATUS_OTHER_SQL_ERROR
+} EStatus;
+
+/*
+ * Transaction status at the end of a command.
+ */
+typedef enum TStatus
+{
+ TSTATUS_IDLE,
+ TSTATUS_IN_BLOCK,
+ TSTATUS_CONN_ERROR,
+ TSTATUS_OTHER_ERROR
+} TStatus;
+
/* Various random sequences are initialized from this one. */
static pg_prng_state base_random_sequence;
@@ -446,6 +555,35 @@ typedef enum
CSTATE_END_COMMAND,
CSTATE_SKIP_COMMAND,
+ /*
+ * States for failed commands.
+ *
+ * If the SQL/meta command fails, in CSTATE_ERROR clean up after an error:
+ * - clear the conditional stack;
+ * - if we have an unterminated (possibly failed) transaction block, send
+ * the rollback command to the server and wait for the result in
+ * CSTATE_WAIT_ROLLBACK_RESULT. If something goes wrong with rolling back,
+ * go to CSTATE_ABORTED.
+ *
+ * But if everything is ok we are ready for future transactions: if this is
+ * a serialization or deadlock error and we can re-execute the transaction
+ * from the very beginning, go to CSTATE_RETRY; otherwise go to
+ * CSTATE_FAILURE.
+ *
+ * In CSTATE_RETRY report an error, set the same parameters for the
+ * transaction execution as in the previous tries and process the first
+ * transaction command in CSTATE_START_COMMAND.
+ *
+ * In CSTATE_FAILURE report a failure, set the parameters for the
+ * transaction execution as they were before the first run of this
+ * transaction (except for a random state) and go to CSTATE_END_TX to
+ * complete this transaction.
+ */
+ CSTATE_ERROR,
+ CSTATE_WAIT_ROLLBACK_RESULT,
+ CSTATE_RETRY,
+ CSTATE_FAILURE,
+
/*
* CSTATE_END_TX performs end-of-transaction processing. It calculates
* latency, and logs the transaction. In --connect mode, it closes the
@@ -494,8 +632,20 @@ typedef struct
bool prepared[MAX_SCRIPTS]; /* whether client prepared the script */
+ /*
+ * For processing failures and repeating transactions with serialization or
+ * deadlock errors:
+ */
+ EStatus estatus; /* the error status of the current transaction
+ * execution; this is ESTATUS_NO_ERROR if there were
+ * no errors */
+ pg_prng_state random_state; /* random state */
+ uint32 tries; /* how many times have we already tried the
+ * current transaction? */
+
/* per client collected stats */
- int64 cnt; /* client transaction count, for -t */
+ int64 cnt; /* client transaction count, for -t; skipped and
+ * failed transactions are also counted here */
} CState;
/*
@@ -590,6 +740,9 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
* aset do gset on all possible queries of a combined query (\;).
* expr Parsed expression, if needed.
* stats Time spent in this command.
+ * retries Number of retries after a serialization or deadlock error in the
+ * current command.
+ * failures Number of errors in the current command that were not retried.
*/
typedef struct Command
{
@@ -602,6 +755,8 @@ typedef struct Command
char *varprefix;
PgBenchExpr *expr;
SimpleStats stats;
+ int64 retries;
+ int64 failures;
} Command;
typedef struct ParsedScript
@@ -616,6 +771,8 @@ static ParsedScript sql_script[MAX_SCRIPTS]; /* SQL script files */
static int num_scripts; /* number of scripts in sql_script[] */
static int64 total_weight = 0;
+static bool verbose_errors = false; /* print verbose messages of all errors */
+
/* Builtin test scripts */
typedef struct BuiltinScript
{
@@ -753,15 +910,18 @@ usage(void)
" protocol for submitting queries (default: simple)\n"
" -n, --no-vacuum do not run VACUUM before tests\n"
" -P, --progress=NUM show thread progress report every NUM seconds\n"
- " -r, --report-latencies report average latency per command\n"
+ " -r, --report-per-command report latencies, failures and retries per command\n"
" -R, --rate=NUM target rate in transactions per second\n"
" -s, --scale=NUM report this scale factor in output\n"
" -t, --transactions=NUM number of transactions each client runs (default: 10)\n"
" -T, --time=NUM duration of benchmark test in seconds\n"
" -v, --vacuum-all vacuum all four standard tables before tests\n"
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
+ " --failures-detailed report the failures grouped by basic types\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
+ " --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --verbose-errors print messages of all errors\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1287,6 +1447,10 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->start_time = start;
sd->cnt = 0;
sd->skipped = 0;
+ sd->retries = 0;
+ sd->retried = 0;
+ sd->serialization_failures = 0;
+ sd->deadlock_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1295,22 +1459,51 @@ initStats(StatsData *sd, pg_time_usec_t start)
* Accumulate one additional item into the given stats object.
*/
static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumStats(StatsData *stats, bool skipped, double lat, double lag,
+ EStatus estatus, int64 tries)
{
- stats->cnt++;
-
+ /* Record the skipped transaction */
if (skipped)
{
/* no latency to record on skipped transactions */
stats->skipped++;
+ return;
}
- else
+
+ /*
+ * Record the number of retries regardless of whether the transaction was
+ * successful or failed.
+ */
+ if (tries > 1)
{
- addToSimpleStats(&stats->latency, lat);
+ stats->retries += (tries - 1);
+ stats->retried++;
+ }
- /* and possibly the same for schedule lag */
- if (throttle_delay)
- addToSimpleStats(&stats->lag, lag);
+ switch (estatus)
+ {
+ /* Record the successful transaction */
+ case ESTATUS_NO_ERROR:
+ stats->cnt++;
+
+ addToSimpleStats(&stats->latency, lat);
+
+ /* and possibly the same for schedule lag */
+ if (throttle_delay)
+ addToSimpleStats(&stats->lag, lag);
+ break;
+
+ /* Record the failed transaction */
+ case ESTATUS_SERIALIZATION_ERROR:
+ stats->serialization_failures++;
+ break;
+ case ESTATUS_DEADLOCK_ERROR:
+ stats->deadlock_failures++;
+ break;
+ default:
+ /* internal error which should never occur */
+ pg_log_fatal("unexpected error status: %d", estatus);
+ exit(1);
}
}
@@ -2841,6 +3034,9 @@ preparedStatementName(char *buffer, int file, int state)
sprintf(buffer, "P%d_%d", file, state);
}
+/*
+ * Report the abort of the client when processing SQL commands.
+ */
static void
commandFailed(CState *st, const char *cmd, const char *message)
{
@@ -2848,6 +3044,17 @@ commandFailed(CState *st, const char *cmd, const char *message)
st->id, st->command, cmd, st->use_file, message);
}
+/*
+ * Report the error in the command while the script is executing.
+ */
+static void
+commandError(CState *st, const char *message)
+{
+ Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND);
+ pg_log_info("client %d got an error in command %d (SQL) of script %d; %s",
+ st->id, st->command, st->use_file, message);
+}
+
/* return a script number with a weighted choice. */
static int
chooseScript(TState *thread)
@@ -2955,6 +3162,33 @@ sendCommand(CState *st, Command *command)
return true;
}
+/*
+ * Get the error status from the error code.
+ */
+static EStatus
+getSQLErrorStatus(const char *sqlState)
+{
+ if (sqlState != NULL)
+ {
+ if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
+ return ESTATUS_SERIALIZATION_ERROR;
+ else if (strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
+ return ESTATUS_DEADLOCK_ERROR;
+ }
+
+ return ESTATUS_OTHER_SQL_ERROR;
+}
+
+/*
+ * Returns true if this type of error can be retried.
+ */
+static bool
+canRetryError(EStatus estatus)
+{
+ return (estatus == ESTATUS_SERIALIZATION_ERROR ||
+ estatus == ESTATUS_DEADLOCK_ERROR);
+}
+
/*
* Process query response from the backend.
*
@@ -2997,6 +3231,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
{
pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
st->id, st->use_file, st->command, qrynum, 0);
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
break;
@@ -3011,6 +3246,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
/* under \gset, report the error */
pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
st->id, st->use_file, st->command, qrynum, PQntuples(res));
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
else if (meta == META_ASET && ntuples <= 0)
@@ -3035,6 +3271,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
/* internal error */
pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
st->id, st->use_file, st->command, qrynum, varname);
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3052,6 +3289,18 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
PQerrorMessage(st->con));
break;
+ case PGRES_NONFATAL_ERROR:
+ case PGRES_FATAL_ERROR:
+ st->estatus = getSQLErrorStatus(
+ PQresultErrorField(res, PG_DIAG_SQLSTATE));
+ if (canRetryError(st->estatus))
+ {
+ if (verbose_errors)
+ commandError(st, PQerrorMessage(st->con));
+ goto error;
+ }
+ /* fall through */
+
default:
/* anything else is unexpected */
pg_log_error("client %d script %d aborted in command %d query %d: %s",
@@ -3130,6 +3379,126 @@ evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
return true;
}
+
+/*
+ * Returns true if the error can be retried.
+ */
+static bool
+doRetry(CState *st, pg_time_usec_t *now)
+{
+ Assert(st->estatus != ESTATUS_NO_ERROR);
+
+ /* We can only retry serialization or deadlock errors. */
+ if (!canRetryError(st->estatus))
+ return false;
+
+ /*
+ * We must have at least one option to limit the retrying of transactions
+ * that got an error.
+ */
+ Assert(max_tries || latency_limit || duration > 0);
+
+ /*
+ * We cannot retry the error if we have reached the maximum number of tries.
+ */
+ if (max_tries && st->tries >= max_tries)
+ return false;
+
+ /*
+ * We cannot retry the error if we spent too much time on this transaction.
+ */
+ if (latency_limit)
+ {
+ pg_time_now_lazy(now);
+ if (*now - st->txn_scheduled > latency_limit)
+ return false;
+ }
+
+ /*
+ * We cannot retry the error if the benchmark duration is over.
+ */
+ if (timer_exceeded)
+ return false;
+
+ /* OK */
+ return true;
+}
+
+/*
+ * Get the transaction status at the end of a command especially for
+ * checking if we are in a (failed) transaction block.
+ */
+static TStatus
+getTransactionStatus(PGconn *con)
+{
+ PGTransactionStatusType tx_status;
+
+ tx_status = PQtransactionStatus(con);
+ switch (tx_status)
+ {
+ case PQTRANS_IDLE:
+ return TSTATUS_IDLE;
+ case PQTRANS_INTRANS:
+ case PQTRANS_INERROR:
+ return TSTATUS_IN_BLOCK;
+ case PQTRANS_UNKNOWN:
+ /* PQTRANS_UNKNOWN is expected given a broken connection */
+ if (PQstatus(con) == CONNECTION_BAD)
+ return TSTATUS_CONN_ERROR;
+ /* fall through */
+ case PQTRANS_ACTIVE:
+ default:
+ /*
+ * We cannot find out whether we are in a transaction block or not.
+ * Internal error which should never occur.
+ */
+ pg_log_error("unexpected transaction status %d", tx_status);
+ return TSTATUS_OTHER_ERROR;
+ }
+
+ /* not reached */
+ Assert(false);
+ return TSTATUS_OTHER_ERROR;
+}
+
+/*
+ * Print verbose messages of an error
+ */
+static void
+printVerboseErrorMessages(CState *st, pg_time_usec_t *now, bool is_retry)
+{
+ PQExpBufferData buf;
+
+ initPQExpBuffer(&buf);
+
+ printfPQExpBuffer(&buf, "client %d ", st->id);
+ appendPQExpBuffer(&buf, "%s",
+ (is_retry ?
+ "repeats the transaction after the error" :
+ "ends the failed transaction"));
+ appendPQExpBuffer(&buf, " (try %u", st->tries);
+
+ /* Print max_tries if it is not unlimited. */
+ if (max_tries)
+ appendPQExpBuffer(&buf, "/%u", max_tries);
+
+ /*
+ * If the latency limit is used, print a percentage of the current transaction
+ * latency from the latency limit.
+ */
+ if (latency_limit)
+ {
+ pg_time_now_lazy(now);
+ appendPQExpBuffer(&buf, ", %.3f%% of the maximum time of tries was used",
+ (100.0 * (*now - st->txn_scheduled) / latency_limit));
+ }
+ appendPQExpBuffer(&buf, ")\n");
+
+ pg_log_info("%s", buf.data);
+
+ termPQExpBuffer(&buf);
+}
+
/*
* Advance the state machine of a connection.
*/
@@ -3159,6 +3528,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
for (;;)
{
Command *command;
+ PGresult *res;
+ TStatus tstatus;
switch (st->state)
{
@@ -3167,6 +3538,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
st->use_file = chooseScript(thread);
Assert(conditional_stack_empty(st->cstack));
+ /* reset transaction variables to default values */
+ st->estatus = ESTATUS_NO_ERROR;
+ st->tries = 1;
+
pg_log_debug("client %d executing script \"%s\"",
st->id, sql_script[st->use_file].desc);
@@ -3207,6 +3582,13 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
memset(st->prepared, 0, sizeof(st->prepared));
}
+ /*
+ * This is the first try to run this transaction. Remember its
+ * random state: maybe it will get an error and we will need to
+ * run it again.
+ */
+ st->random_state = st->cs_func_rs;
+
/* record transaction start time */
st->txn_begin = now;
@@ -3363,6 +3745,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
* - else CSTATE_END_COMMAND
*/
st->state = executeMetaCommand(st, &now);
+ if (st->state == CSTATE_ABORTED)
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
}
/*
@@ -3508,6 +3892,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
+ else if (canRetryError(st->estatus))
+ st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
break;
@@ -3554,6 +3940,179 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
CSTATE_START_COMMAND : CSTATE_SKIP_COMMAND;
break;
+ /*
+ * Clean up after an error.
+ */
+ case CSTATE_ERROR:
+
+ Assert(st->estatus != ESTATUS_NO_ERROR);
+
+ /* Clear the conditional stack */
+ conditional_stack_reset(st->cstack);
+
+ /* Read and discard until a sync point in pipeline mode */
+ if (PQpipelineStatus(st->con) != PQ_PIPELINE_OFF)
+ {
+ /* send a sync */
+ if (!PQpipelineSync(st->con))
+ {
+ pg_log_error("client %d aborted: failed to send a pipeline sync",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+
+ /* receive PGRES_PIPELINE_SYNC and the NULL following it */
+ for(;;)
+ {
+ res = PQgetResult(st->con);
+ if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
+ {
+ PQclear(res);
+ res = PQgetResult(st->con);
+ Assert(res == NULL);
+ break;
+ }
+ PQclear(res);
+ }
+
+ /* exit pipeline mode */
+ if (PQexitPipelineMode(st->con) != 1)
+ {
+ pg_log_error("client %d aborted: failed to exit pipeline mode for rolling back the failed transaction",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ }
+
+ /*
+ * Check whether we are inside a (possibly failed) transaction
+ * block; if so, roll it back.
+ */
+ tstatus = getTransactionStatus(st->con);
+ if (tstatus == TSTATUS_IN_BLOCK)
+ {
+ /* Try to rollback a (failed) transaction block. */
+ if (!PQsendQuery(st->con, "ROLLBACK"))
+ {
+ pg_log_error("client %d aborted: failed to send SQL command for rolling back the failed transaction",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ }
+ else
+ st->state = CSTATE_WAIT_ROLLBACK_RESULT;
+ }
+ else if (tstatus == TSTATUS_IDLE)
+ {
+ /*
+ * If time is over, we're done;
+ * otherwise, check if we can retry the error.
+ */
+ st->state = timer_exceeded ? CSTATE_FINISHED :
+ doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
+ }
+ else
+ {
+ if (tstatus == TSTATUS_CONN_ERROR)
+ pg_log_error("perhaps the backend died while processing");
+
+ pg_log_error("client %d aborted while receiving the transaction status", st->id);
+ st->state = CSTATE_ABORTED;
+ }
+ break;
+
+ /*
+ * Wait for the rollback command to complete
+ */
+ case CSTATE_WAIT_ROLLBACK_RESULT:
+ pg_log_debug("client %d receiving", st->id);
+ if (!PQconsumeInput(st->con))
+ {
+ pg_log_error("client %d aborted while rolling back the transaction after an error; perhaps the backend died while processing",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ if (PQisBusy(st->con))
+ return; /* don't have the whole result yet */
+
+ /*
+ * Read and discard the query result.
+ */
+ res = PQgetResult(st->con);
+ switch (PQresultStatus(res))
+ {
+ case PGRES_COMMAND_OK:
+ /* OK */
+ PQclear(res);
+ /* null must be returned */
+ res = PQgetResult(st->con);
+ Assert(res == NULL);
+
+ /*
+ * If time is over, we're done;
+ * otherwise, check if we can retry the error.
+ */
+ st->state = timer_exceeded ? CSTATE_FINISHED :
+ doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
+ break;
+ default:
+ pg_log_error("client %d aborted while rolling back the transaction after an error; %s",
+ st->id, PQerrorMessage(st->con));
+ PQclear(res);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ break;
+
+ /*
+ * Retry the transaction after an error.
+ */
+ case CSTATE_RETRY:
+ command = sql_script[st->use_file].commands[st->command];
+
+ /*
+ * Inform that the transaction will be retried after the error.
+ */
+ if (verbose_errors)
+ printVerboseErrorMessages(st, &now, true);
+
+ /* Count tries and retries */
+ st->tries++;
+ command->retries++;
+
+ /*
+ * Reset the random state to what it was at the beginning
+ * of the transaction.
+ */
+ st->cs_func_rs = st->random_state;
+
+ /* Process the first transaction command. */
+ st->command = 0;
+ st->estatus = ESTATUS_NO_ERROR;
+ st->state = CSTATE_START_COMMAND;
+ break;
+
+ /*
+ * Record a failed transaction.
+ */
+ case CSTATE_FAILURE:
+ command = sql_script[st->use_file].commands[st->command];
+
+ /* Accumulate the failure. */
+ command->failures++;
+
+ /*
+ * Inform that the failed transaction will not be retried.
+ */
+ if (verbose_errors)
+ printVerboseErrorMessages(st, &now, false);
+
+ /* End the failed transaction. */
+ st->state = CSTATE_END_TX;
+ break;
+
/*
* End of transaction (end of script, really).
*/
@@ -3568,6 +4127,28 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
*/
Assert(conditional_stack_empty(st->cstack));
+ /*
+ * We must complete all the transaction blocks that were
+ * started in this script.
+ */
+ tstatus = getTransactionStatus(st->con);
+ if (tstatus == TSTATUS_IN_BLOCK)
+ {
+ pg_log_error("client %d aborted: end of script reached without completing the last transaction",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ else if (tstatus != TSTATUS_IDLE)
+ {
+ if (tstatus == TSTATUS_CONN_ERROR)
+ pg_log_error("perhaps the backend died while processing");
+
+ pg_log_error("client %d aborted while receiving the transaction status", st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+
if (is_connect)
{
pg_time_usec_t start = now;
@@ -3816,6 +4397,43 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
return CSTATE_END_COMMAND;
}
+/*
+ * Return the number of failed transactions.
+ */
+static int64
+getFailures(const StatsData *stats)
+{
+ return (stats->serialization_failures +
+ stats->deadlock_failures);
+}
+
+/*
+ * Return a string constant representing the result of a transaction
+ * that is not successfully processed.
+ */
+static const char *
+getResultString(bool skipped, EStatus estatus)
+{
+ if (skipped)
+ return "skipped";
+ else if (failures_detailed)
+ {
+ switch (estatus)
+ {
+ case ESTATUS_SERIALIZATION_ERROR:
+ return "serialization";
+ case ESTATUS_DEADLOCK_ERROR:
+ return "deadlock";
+ default:
+ /* internal error which should never occur */
+ pg_log_fatal("unexpected error status: %d", estatus);
+ exit(1);
+ }
+ }
+ else
+ return "failed";
+}
+
/*
* Print log entry after completing one transaction.
*
@@ -3863,6 +4481,14 @@ doLog(TState *thread, CState *st,
agg->latency.sum2,
agg->latency.min,
agg->latency.max);
+
+ if (failures_detailed)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->serialization_failures,
+ agg->deadlock_failures);
+ else
+ fprintf(logfile, " " INT64_FORMAT, getFailures(agg));
+
if (throttle_delay)
{
fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -3873,6 +4499,10 @@ doLog(TState *thread, CState *st,
if (latency_limit)
fprintf(logfile, " " INT64_FORMAT, agg->skipped);
}
+ if (max_tries != 1)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->retried,
+ agg->retries);
fputc('\n', logfile);
/* reset data and move to next interval */
@@ -3880,22 +4510,26 @@ doLog(TState *thread, CState *st,
}
/* accumulate the current transaction */
- accumStats(agg, skipped, latency, lag);
+ accumStats(agg, skipped, latency, lag, st->estatus, st->tries);
}
else
{
/* no, print raw transactions */
- if (skipped)
- fprintf(logfile, "%d " INT64_FORMAT " skipped %d " INT64_FORMAT " "
- INT64_FORMAT,
- st->id, st->cnt, st->use_file, now / 1000000, now % 1000000);
- else
+ if (!skipped && st->estatus == ESTATUS_NO_ERROR)
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d " INT64_FORMAT " "
INT64_FORMAT,
st->id, st->cnt, latency, st->use_file,
now / 1000000, now % 1000000);
+ else
+ fprintf(logfile, "%d " INT64_FORMAT " %s %d " INT64_FORMAT " "
+ INT64_FORMAT,
+ st->id, st->cnt, getResultString(skipped, st->estatus),
+ st->use_file, now / 1000000, now % 1000000);
+
if (throttle_delay)
fprintf(logfile, " %.0f", lag);
+ if (max_tries != 1)
+ fprintf(logfile, " %d", st->tries - 1);
fputc('\n', logfile);
}
}
@@ -3904,7 +4538,8 @@ doLog(TState *thread, CState *st,
* Accumulate and report statistics at end of a transaction.
*
* (This is also called when a transaction is late and thus skipped.
- * Note that even skipped transactions are counted in the "cnt" fields.)
+ * Note that even skipped and failed transactions are counted in the CState
+ * "cnt" field.)
*/
static void
processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
@@ -3912,10 +4547,10 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
{
double latency = 0.0,
lag = 0.0;
- bool thread_details = progress || throttle_delay || latency_limit,
- detailed = thread_details || use_log || per_script_stats;
+ bool detailed = progress || throttle_delay || latency_limit ||
+ use_log || per_script_stats;
- if (detailed && !skipped)
+ if (detailed && !skipped && st->estatus == ESTATUS_NO_ERROR)
{
pg_time_now_lazy(now);
@@ -3924,20 +4559,12 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
lag = st->txn_begin - st->txn_scheduled;
}
- if (thread_details)
- {
- /* keep detailed thread stats */
- accumStats(&thread->stats, skipped, latency, lag);
+ /* keep detailed thread stats */
+ accumStats(&thread->stats, skipped, latency, lag, st->estatus, st->tries);
- /* count transactions over the latency limit, if needed */
- if (latency_limit && latency > latency_limit)
- thread->latency_late++;
- }
- else
- {
- /* no detailed stats, just count */
- thread->stats.cnt++;
- }
+ /* count transactions over the latency limit, if needed */
+ if (latency_limit && latency > latency_limit)
+ thread->latency_late++;
/* client stat is just counting */
st->cnt++;
@@ -3947,7 +4574,8 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
/* XXX could use a mutex here, but we choose not to */
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+ st->estatus, st->tries);
}
@@ -4806,6 +5434,8 @@ create_sql_command(PQExpBuffer buf, const char *source)
my_command->type = SQL_COMMAND;
my_command->meta = META_NONE;
my_command->argc = 0;
+ my_command->retries = 0;
+ my_command->failures = 0;
memset(my_command->argv, 0, sizeof(my_command->argv));
my_command->varprefix = NULL; /* allocated later, if needed */
my_command->expr = NULL;
@@ -5474,7 +6104,9 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
{
/* generate and show report */
pg_time_usec_t run = now - *last_report;
- int64 ntx;
+ int64 cnt,
+ failures,
+ retried;
double tps,
total_run,
latency,
@@ -5501,23 +6133,30 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
mergeSimpleStats(&cur.lag, &threads[i].stats.lag);
cur.cnt += threads[i].stats.cnt;
cur.skipped += threads[i].stats.skipped;
+ cur.retries += threads[i].stats.retries;
+ cur.retried += threads[i].stats.retried;
+ cur.serialization_failures +=
+ threads[i].stats.serialization_failures;
+ cur.deadlock_failures += threads[i].stats.deadlock_failures;
}
/* we count only actually executed transactions */
- ntx = (cur.cnt - cur.skipped) - (last->cnt - last->skipped);
+ cnt = cur.cnt - last->cnt;
total_run = (now - test_start) / 1000000.0;
- tps = 1000000.0 * ntx / run;
- if (ntx > 0)
+ tps = 1000000.0 * cnt / run;
+ if (cnt > 0)
{
- latency = 0.001 * (cur.latency.sum - last->latency.sum) / ntx;
- sqlat = 1.0 * (cur.latency.sum2 - last->latency.sum2) / ntx;
+ latency = 0.001 * (cur.latency.sum - last->latency.sum) / cnt;
+ sqlat = 1.0 * (cur.latency.sum2 - last->latency.sum2) / cnt;
stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
- lag = 0.001 * (cur.lag.sum - last->lag.sum) / ntx;
+ lag = 0.001 * (cur.lag.sum - last->lag.sum) / cnt;
}
else
{
latency = sqlat = stdev = lag = 0;
}
+ failures = getFailures(&cur) - getFailures(last);
+ retried = cur.retried - last->retried;
if (progress_timestamp)
{
@@ -5531,8 +6170,8 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
}
fprintf(stderr,
- "progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
- tbuf, tps, latency, stdev);
+ "progress: %s, %.1f tps, lat %.3f ms stddev %.3f, " INT64_FORMAT " failed",
+ tbuf, tps, latency, stdev, failures);
if (throttle_delay)
{
@@ -5541,6 +6180,12 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
fprintf(stderr, ", " INT64_FORMAT " skipped",
cur.skipped - last->skipped);
}
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (max_tries != 1)
+ fprintf(stderr,
+ ", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
+ retried, cur.retries - last->retries);
fprintf(stderr, "\n");
*last = cur;
@@ -5600,9 +6245,10 @@ printResults(StatsData *total,
int64 latency_late)
{
/* tps is about actually executed transactions during benchmarking */
- int64 ntx = total->cnt - total->skipped;
+ int64 failures = getFailures(total);
+ int64 total_cnt = total->cnt + total->skipped + failures;
double bench_duration = PG_TIME_GET_DOUBLE(total_duration);
- double tps = ntx / bench_duration;
+ double tps = total->cnt / bench_duration;
/* Report test parameters. */
printf("transaction type: %s\n",
@@ -5615,39 +6261,67 @@ printResults(StatsData *total,
printf("query mode: %s\n", QUERYMODE[querymode]);
printf("number of clients: %d\n", nclients);
printf("number of threads: %d\n", nthreads);
+
+ if (max_tries)
+ printf("maximum number of tries: %d\n", max_tries);
+
if (duration <= 0)
{
printf("number of transactions per client: %d\n", nxacts);
printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
- ntx, nxacts * nclients);
+ total->cnt, nxacts * nclients);
}
else
{
printf("duration: %d s\n", duration);
printf("number of transactions actually processed: " INT64_FORMAT "\n",
- ntx);
+ total->cnt);
+ }
+
+ printf("number of failed transactions: " INT64_FORMAT " (%.3f%%)\n",
+ failures, 100.0 * failures / total_cnt);
+
+ if (failures_detailed)
+ {
+ if (total->serialization_failures)
+ printf("number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->serialization_failures,
+ 100.0 * total->serialization_failures / total_cnt);
+ if (total->deadlock_failures)
+ printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->deadlock_failures,
+ 100.0 * total->deadlock_failures / total_cnt);
+ }
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (max_tries != 1)
+ {
+ printf("number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
+ total->retried, 100.0 * total->retried / total_cnt);
+ printf("total number of retries: " INT64_FORMAT "\n", total->retries);
}
/* Remaining stats are nonsensical if we failed to execute any xacts */
- if (total->cnt <= 0)
+ if (total->cnt + total->skipped <= 0)
return;
if (throttle_delay && latency_limit)
printf("number of transactions skipped: " INT64_FORMAT " (%.3f %%)\n",
- total->skipped, 100.0 * total->skipped / total->cnt);
+ total->skipped, 100.0 * total->skipped / total_cnt);
if (latency_limit)
printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "/" INT64_FORMAT " (%.3f %%)\n",
- latency_limit / 1000.0, latency_late, ntx,
- (ntx > 0) ? 100.0 * latency_late / ntx : 0.0);
+ latency_limit / 1000.0, latency_late, total->cnt,
+ (total->cnt > 0) ? 100.0 * latency_late / total->cnt : 0.0);
if (throttle_delay || progress || latency_limit)
printSimpleStats("latency", &total->latency);
else
{
/* no measurement, show average latency computed from run time */
- printf("latency average = %.3f ms\n",
- 0.001 * total_duration * nclients / total->cnt);
+ printf("latency average = %.3f ms%s\n",
+ 0.001 * total_duration * nclients / total_cnt,
+ failures > 0 ? " (including failures)" : "");
}
if (throttle_delay)
@@ -5673,7 +6347,7 @@ printResults(StatsData *total,
*/
if (is_connect)
{
- printf("average connection time = %.3f ms\n", 0.001 * conn_total_duration / total->cnt);
+ printf("average connection time = %.3f ms\n", 0.001 * conn_total_duration / (total->cnt + failures));
printf("tps = %f (including reconnection times)\n", tps);
}
else
@@ -5692,6 +6366,9 @@ printResults(StatsData *total,
if (per_script_stats)
{
StatsData *sstats = &sql_script[i].stats;
+ int64 script_failures = getFailures(sstats);
+ int64 script_total_cnt =
+ sstats->cnt + sstats->skipped + script_failures;
printf("SQL script %d: %s\n"
" - weight: %d (targets %.1f%% of total)\n"
@@ -5701,25 +6378,57 @@ printResults(StatsData *total,
100.0 * sql_script[i].weight / total_weight,
sstats->cnt,
100.0 * sstats->cnt / total->cnt,
- (sstats->cnt - sstats->skipped) / bench_duration);
+ sstats->cnt / bench_duration);
- if (throttle_delay && latency_limit && sstats->cnt > 0)
+ printf(" - number of failed transactions: " INT64_FORMAT " (%.3f%%)\n",
+ script_failures,
+ 100.0 * script_failures / script_total_cnt);
+
+ if (failures_detailed)
+ {
+ if (total->serialization_failures)
+ printf(" - number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->serialization_failures,
+ (100.0 * sstats->serialization_failures /
+ script_total_cnt));
+ if (total->deadlock_failures)
+ printf(" - number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->deadlock_failures,
+ (100.0 * sstats->deadlock_failures /
+ script_total_cnt));
+ }
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (max_tries != 1)
+ {
+ printf(" - number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->retried,
+ 100.0 * sstats->retried / script_total_cnt);
+ printf(" - total number of retries: " INT64_FORMAT "\n",
+ sstats->retries);
+ }
+
+ if (throttle_delay && latency_limit && script_total_cnt > 0)
printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
sstats->skipped,
- 100.0 * sstats->skipped / sstats->cnt);
+ 100.0 * sstats->skipped / script_total_cnt);
printSimpleStats(" - latency", &sstats->latency);
}
- /* Report per-command latencies */
+ /*
+ * Report per-command statistics: latencies, retries after errors,
+ * failures (errors without retrying).
+ */
if (report_per_command)
{
Command **commands;
- if (per_script_stats)
- printf(" - statement latencies in milliseconds:\n");
- else
- printf("statement latencies in milliseconds:\n");
+ printf("%sstatement latencies in milliseconds%s:\n",
+ per_script_stats ? " - " : "",
+ (max_tries == 1 ?
+ " and failures" :
+ ", failures and retries"));
for (commands = sql_script[i].commands;
*commands != NULL;
@@ -5727,10 +6436,19 @@ printResults(StatsData *total,
{
SimpleStats *cstats = &(*commands)->stats;
- printf(" %11.3f %s\n",
- (cstats->count > 0) ?
- 1000.0 * cstats->sum / cstats->count : 0.0,
- (*commands)->first_line);
+ if (max_tries == 1)
+ printf(" %11.3f %10" INT64_MODIFIER "d %s\n",
+ (cstats->count > 0) ?
+ 1000.0 * cstats->sum / cstats->count : 0.0,
+ (*commands)->failures,
+ (*commands)->first_line);
+ else
+ printf(" %11.3f %10" INT64_MODIFIER "d %10" INT64_MODIFIER "d %s\n",
+ (cstats->count > 0) ?
+ 1000.0 * cstats->sum / cstats->count : 0.0,
+ (*commands)->failures,
+ (*commands)->retries,
+ (*commands)->first_line);
}
}
}
@@ -5810,7 +6528,7 @@ main(int argc, char **argv)
{"progress", required_argument, NULL, 'P'},
{"protocol", required_argument, NULL, 'M'},
{"quiet", no_argument, NULL, 'q'},
- {"report-latencies", no_argument, NULL, 'r'},
+ {"report-per-command", no_argument, NULL, 'r'},
{"rate", required_argument, NULL, 'R'},
{"scale", required_argument, NULL, 's'},
{"select-only", no_argument, NULL, 'S'},
@@ -5832,6 +6550,9 @@ main(int argc, char **argv)
{"show-script", required_argument, NULL, 10},
{"partitions", required_argument, NULL, 11},
{"partition-method", required_argument, NULL, 12},
+ {"failures-detailed", no_argument, NULL, 13},
+ {"max-tries", required_argument, NULL, 14},
+ {"verbose-errors", no_argument, NULL, 15},
{NULL, 0, NULL, 0}
};
@@ -6185,6 +6906,28 @@ main(int argc, char **argv)
exit(1);
}
break;
+ case 13: /* failures-detailed */
+ benchmarking_option_set = true;
+ failures_detailed = true;
+ break;
+ case 14: /* max-tries */
+ {
+ int32 max_tries_arg = atoi(optarg);
+
+ if (max_tries_arg < 0)
+ {
+ pg_log_fatal("invalid number of maximum tries: \"%s\"", optarg);
+ exit(1);
+ }
+
+ benchmarking_option_set = true;
+ max_tries = (uint32) max_tries_arg;
+ }
+ break;
+ case 15: /* verbose-errors */
+ benchmarking_option_set = true;
+ verbose_errors = true;
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
@@ -6366,6 +7109,15 @@ main(int argc, char **argv)
exit(1);
}
+ if (!max_tries)
+ {
+ if (!latency_limit && duration <= 0)
+ {
+ pg_log_fatal("an unlimited number of transaction tries can only be used with --latency-limit or a duration (-T)");
+ exit(1);
+ }
+ }
+
/*
* save main process id in the global variable because process id will be
* changed after fork.
@@ -6578,6 +7330,10 @@ main(int argc, char **argv)
mergeSimpleStats(&stats.lag, &thread->stats.lag);
stats.cnt += thread->stats.cnt;
stats.skipped += thread->stats.skipped;
+ stats.retries += thread->stats.retries;
+ stats.retried += thread->stats.retried;
+ stats.serialization_failures += thread->stats.serialization_failures;
+ stats.deadlock_failures += thread->stats.deadlock_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
@@ -6724,7 +7480,8 @@ threadRun(void *arg)
if (min_usec > this_usec)
min_usec = this_usec;
}
- else if (st->state == CSTATE_WAIT_RESULT)
+ else if (st->state == CSTATE_WAIT_RESULT ||
+ st->state == CSTATE_WAIT_ROLLBACK_RESULT)
{
/*
* waiting for result from server - nothing to do unless the
@@ -6813,7 +7570,8 @@ threadRun(void *arg)
{
CState *st = &state[i];
- if (st->state == CSTATE_WAIT_RESULT)
+ if (st->state == CSTATE_WAIT_RESULT ||
+ st->state == CSTATE_WAIT_ROLLBACK_RESULT)
{
/* don't call advanceConnectionState unless data is available */
int sock = PQsocket(st->con);
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 8b03900f32..d07bf665a3 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -11,7 +11,9 @@ use Config;
# start a pgbench specific server
my $node = PostgreSQL::Test::Cluster->new('main');
-$node->init;
+# Use untranslated messages so that program output can be compared with the
+# expected strings.
+$node->init(extra => [ '--locale', 'C' ]);
$node->start;
# tablespace for testing, because partitioned tables cannot use pg_default
@@ -111,7 +113,8 @@ $node->pgbench(
qr{builtin: TPC-B},
qr{clients: 2\b},
qr{processed: 10/10},
- qr{mode: simple}
+ qr{mode: simple},
+ qr{maximum number of tries: 1}
],
[qr{^$}],
'pgbench tpcb-like');
@@ -1200,6 +1203,214 @@ $node->pgbench(
check_pgbench_logs($bdir, '001_pgbench_log_3', 1, 10, 10,
qr{^0 \d{1,2} \d+ \d \d+ \d+$});
+# client abort if the script contains an incomplete transaction block
+$node->pgbench(
+ '--no-vacuum', 2, [ qr{processed: 1/10} ],
+ [ qr{client 0 aborted: end of script reached without completing the last transaction} ],
+ 'incomplete transaction block',
+ { '001_pgbench_incomplete_transaction_block' => q{BEGIN;SELECT 1;} });
+
+# Test the concurrent update in the table row and deadlocks.
+
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE first_client_table (value integer); '
+ . 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2);');
+
+# Serialization error and retry
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read";
+
+# Check that we have a serialization error and the same random value of the
+# delta variable in the next try
+my $err_pattern =
+ "client (0|1) got an error in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update\\b.*"
+ . "\\g1";
+
+$node->pgbench(
+ "-n -c 2 -t 1 -d --verbose-errors --max-tries 2",
+ 0,
+ [ qr{processed: 2/2\b}, qr{number of transactions retried: 1\b},
+ qr{total number of retries: 1\b} ],
+ [ qr/$err_pattern/s ],
+ 'concurrent update with retrying',
+ {
+ '001_pgbench_serialization' => q{
+-- What's happening:
+-- The first client starts the transaction with the isolation level Repeatable
+-- Read:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+--
+-- The second client starts a similar transaction with the same isolation level:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+-- <waiting for the first client>
+--
+-- The first client commits its transaction, and the second client gets a
+-- serialization error.
+
+\set delta random(-5000, 5000)
+
+-- The second client will stop here
+SELECT pg_advisory_lock(0);
+
+-- Start transaction with concurrent update
+BEGIN;
+UPDATE xy SET y = y + :delta WHERE x = 1 AND pg_advisory_lock(1) IS NOT NULL;
+
+-- Wait for the second client
+DO $$
+DECLARE
+ exists boolean;
+ waiters integer;
+BEGIN
+ -- The second client always comes in second, and the number of rows in the
+ -- table first_client_table reflects this. Here the first client inserts a row,
+ -- so the second client will see a non-empty table when repeating the
+ -- transaction after the serialization error.
+ SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+ IF NOT exists THEN
+ -- Let the second client begin
+ PERFORM pg_advisory_unlock(0);
+ -- And wait until the second client tries to get the same lock
+ LOOP
+ SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+ locktype = 'advisory' AND objsubid = 1 AND
+ ((classid::bigint << 32) | objid::bigint = 1::bigint) AND NOT granted;
+ IF waiters = 1 THEN
+ INSERT INTO first_client_table VALUES (1);
+
+ -- Exit loop
+ EXIT;
+ END IF;
+ END LOOP;
+ END IF;
+END$$;
+
+COMMIT;
+SELECT pg_advisory_unlock_all();
+}
+ });
+
+# Clean up
+
+$node->safe_psql('postgres', 'DELETE FROM first_client_table;');
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=read\\ committed";
+
+# Deadlock error and retry
+
+# Check that we have a deadlock error
+$err_pattern =
+ "client (0|1) got an error in command (3|5) \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\\b";
+
+$node->pgbench(
+ "-n -c 2 -t 1 --max-tries 2 --verbose-errors",
+ 0,
+ [ qr{processed: 2/2\b}, qr{number of transactions retried: 1\b},
+ qr{total number of retries: 1\b} ],
+ [ qr{$err_pattern} ],
+ 'deadlock with retrying',
+ {
+ '001_pgbench_deadlock' => q{
+-- What's happening:
+-- The first client gets the lock 2.
+-- The second client gets the lock 3 and tries to get the lock 2.
+-- The first client tries to get the lock 3 and one of them gets a deadlock
+-- error.
+--
+-- A client that does not get a deadlock error must hold a lock at the
+-- transaction start. Thus in the end it releases all of its locks before the
+-- client with the deadlock error starts a retry (we do not want any errors
+-- again).
+
+-- Since the client with the deadlock error has not released the blocking locks,
+-- let's do this here.
+SELECT pg_advisory_unlock_all();
+
+-- The second client and the client with the deadlock error stop here
+SELECT pg_advisory_lock(0);
+SELECT pg_advisory_lock(1);
+
+-- The second client and the client with the deadlock error always come after
+-- the first and the number of rows in the table first_client_table reflects
+-- this. Here the first client inserts a row, so in the future the table is
+-- always non-empty.
+DO $$
+DECLARE
+ exists boolean;
+BEGIN
+ SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+ IF exists THEN
+ -- We are the second client or the client with the deadlock error
+
+ -- The first client will take care by itself of this lock (see below)
+ PERFORM pg_advisory_unlock(0);
+
+ PERFORM pg_advisory_lock(3);
+
+ -- The second client can get a deadlock here
+ PERFORM pg_advisory_lock(2);
+ ELSE
+ -- We are the first client
+
+ -- This code should not be used in a new transaction after an error
+ INSERT INTO first_client_table VALUES (1);
+
+ PERFORM pg_advisory_lock(2);
+ END IF;
+END$$;
+
+DO $$
+DECLARE
+ num_rows integer;
+ waiters integer;
+BEGIN
+ -- Check if we are the first client
+ SELECT COUNT(*) FROM first_client_table INTO STRICT num_rows;
+ IF num_rows = 1 THEN
+ -- This code should not be used in a new transaction after an error
+ INSERT INTO first_client_table VALUES (2);
+
+ -- Let the second client begin
+ PERFORM pg_advisory_unlock(0);
+ PERFORM pg_advisory_unlock(1);
+
+ -- Make sure the second client is ready for deadlock
+ LOOP
+ SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+ locktype = 'advisory' AND
+ objsubid = 1 AND
+ ((classid::bigint << 32) | objid::bigint = 2::bigint) AND
+ NOT granted;
+
+ IF waiters = 1 THEN
+ -- Exit loop
+ EXIT;
+ END IF;
+ END LOOP;
+
+ PERFORM pg_advisory_lock(0);
+ -- And the second client took care by itself of the lock 1
+ END IF;
+END$$;
+
+-- The first client can get a deadlock here
+SELECT pg_advisory_lock(3);
+
+SELECT pg_advisory_unlock_all();
+}
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE first_client_table, xy;');
+
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index acad19edd0..a5074c70d9 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -188,6 +188,16 @@ my @options = (
'-i --partition-method=hash',
[qr{partition-method requires greater than zero --partitions}]
],
+ [
+ 'bad maximum number of tries',
+ '--max-tries -10',
+ [qr{invalid number of maximum tries: "-10"}]
+ ],
+ [
+ 'an infinite number of tries',
+ '--max-tries 0',
+ [qr{an unlimited number of transaction tries can only be used with --latency-limit or a duration}]
+ ],
# logging sub-options
[
diff --git a/src/fe_utils/conditional.c b/src/fe_utils/conditional.c
index 0bf877e895..5a94664989 100644
--- a/src/fe_utils/conditional.c
+++ b/src/fe_utils/conditional.c
@@ -24,13 +24,25 @@ conditional_stack_create(void)
}
/*
- * destroy stack
+ * Pop all elements from the stack. The stack itself is not freed.
*/
void
-conditional_stack_destroy(ConditionalStack cstack)
+conditional_stack_reset(ConditionalStack cstack)
{
+ if (!cstack)
+ return; /* nothing to do here */
+
while (conditional_stack_pop(cstack))
continue;
+}
+
+/*
+ * destroy stack
+ */
+void
+conditional_stack_destroy(ConditionalStack cstack)
+{
+ conditional_stack_reset(cstack);
free(cstack);
}
diff --git a/src/include/fe_utils/conditional.h b/src/include/fe_utils/conditional.h
index b28189471c..fa53d86501 100644
--- a/src/include/fe_utils/conditional.h
+++ b/src/include/fe_utils/conditional.h
@@ -73,6 +73,8 @@ typedef struct ConditionalStackData *ConditionalStack;
extern ConditionalStack conditional_stack_create(void);
+extern void conditional_stack_reset(ConditionalStack cstack);
+
extern void conditional_stack_destroy(ConditionalStack cstack);
extern int conditional_stack_depth(ConditionalStack cstack);
--
2.17.1
Attachment: v16-0001-Pgbench-errors-use-the-Variables-structure-for-c.patch (text/x-diff)
From 12eb5ff63cb92adeee86937104e2cbb274337289 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Wed, 26 May 2021 16:58:36 +0900
Subject: [PATCH v16 1/2] Pgbench errors: use the Variables structure for
client variables
This is most important when the structure is used to reset client variables
while retrying transactions after serialization/deadlock failures.
Don't allocate Variable structs one by one. Instead, grow the array by a
constant margin each time it overflows.
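The amortized-growth idea described above can be sketched as follows. This is a simplified illustration, not the patch code itself: the struct fields mirror the `Variables` type the patch introduces, but error handling is reduced to plain `realloc()` (the patch uses `pg_realloc()`, which exits on out-of-memory instead of returning NULL):

```c
#include <stdlib.h>

/* Over-allocate by a fixed margin instead of growing one slot at a time. */
#define VARIABLES_ALLOC_MARGIN 8

typedef struct
{
	char	   *name;			/* variable's name */
} Variable;

typedef struct
{
	Variable   *vars;			/* array of variable definitions */
	int			nvars;			/* number of variables in use */
	int			max_vars;		/* allocated capacity; always >= nvars */
} Variables;

/*
 * Make sure there is room for 'needed' more variables; reallocate with a
 * constant margin only when the array would otherwise overflow.
 */
static void
enlargeVariables(Variables *variables, int needed)
{
	/* total number of variables required now */
	needed += variables->nvars;

	if (variables->max_vars < needed)
	{
		variables->max_vars = needed + VARIABLES_ALLOC_MARGIN;
		variables->vars = (Variable *)
			realloc(variables->vars, variables->max_vars * sizeof(Variable));
	}
}
```

With a margin of 8, a script that sets variables one at a time triggers a reallocation only on every ninth new variable rather than on each one.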
---
src/bin/pgbench/pgbench.c | 163 +++++++++++++++++++++++---------------
1 file changed, 100 insertions(+), 63 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index f166a77e3a..3743e36dac 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -289,6 +289,12 @@ const char *progname;
volatile bool timer_exceeded = false; /* flag from signal handler */
+/*
+ * We don't want to allocate variables one by one; for efficiency, grow the
+ * array by a constant margin each time it overflows.
+ */
+#define VARIABLES_ALLOC_MARGIN 8
+
/*
* Variable definitions.
*
@@ -306,6 +312,24 @@ typedef struct
PgBenchValue value; /* actual variable's value */
} Variable;
+/*
+ * Data structure for client variables.
+ */
+typedef struct
+{
+ Variable *vars; /* array of variable definitions */
+ int nvars; /* number of variables */
+
+ /*
+ * The maximum number of variables that we can currently store in 'vars'
+ * without having to reallocate more space. We must always have max_vars >=
+ * nvars.
+ */
+ int max_vars;
+
+ bool vars_sorted; /* are variables sorted by name? */
+} Variables;
+
#define MAX_SCRIPTS 128 /* max number of SQL scripts allowed */
#define SHELL_COMMAND_SIZE 256 /* maximum size allowed for shell command */
@@ -460,9 +484,7 @@ typedef struct
int command; /* command number in script */
/* client variables */
- Variable *variables; /* array of variable definitions */
- int nvariables; /* number of variables */
- bool vars_sorted; /* are variables sorted by name? */
+ Variables variables;
/* various times about current transaction in microseconds */
pg_time_usec_t txn_scheduled; /* scheduled start time of transaction */
@@ -1398,39 +1420,39 @@ compareVariableNames(const void *v1, const void *v2)
/* Locate a variable by name; returns NULL if unknown */
static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
{
Variable key;
/* On some versions of Solaris, bsearch of zero items dumps core */
- if (st->nvariables <= 0)
+ if (variables->nvars <= 0)
return NULL;
/* Sort if we have to */
- if (!st->vars_sorted)
+ if (!variables->vars_sorted)
{
- qsort((void *) st->variables, st->nvariables, sizeof(Variable),
+ qsort((void *) variables->vars, variables->nvars, sizeof(Variable),
compareVariableNames);
- st->vars_sorted = true;
+ variables->vars_sorted = true;
}
/* Now we can search */
key.name = name;
return (Variable *) bsearch((void *) &key,
- (void *) st->variables,
- st->nvariables,
+ (void *) variables->vars,
+ variables->nvars,
sizeof(Variable),
compareVariableNames);
}
/* Get the value of a variable, in string form; returns NULL if unknown */
static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
{
Variable *var;
char stringform[64];
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
return NULL; /* not found */
@@ -1562,21 +1584,37 @@ valid_variable_name(const char *name)
return true;
}
+/*
+ * Make sure there is enough space for 'needed' more variables in the variables
+ * array.
+ */
+static void
+enlargeVariables(Variables *variables, int needed)
+{
+ /* total number of variables required now */
+ needed += variables->nvars;
+
+ if (variables->max_vars < needed)
+ {
+ variables->max_vars = needed + VARIABLES_ALLOC_MARGIN;
+ variables->vars = (Variable *)
+ pg_realloc(variables->vars, variables->max_vars * sizeof(Variable));
+ }
+}
+
/*
* Lookup a variable by name, creating it if need be.
* Caller is expected to assign a value to the variable.
* Returns NULL on failure (bad name).
*/
static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
+lookupCreateVariable(Variables *variables, const char *context, char *name)
{
Variable *var;
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
{
- Variable *newvars;
-
/*
* Check for the name only when declaring a new variable to avoid
* overhead.
@@ -1588,23 +1626,17 @@ lookupCreateVariable(CState *st, const char *context, char *name)
}
/* Create variable at the end of the array */
- if (st->variables)
- newvars = (Variable *) pg_realloc(st->variables,
- (st->nvariables + 1) * sizeof(Variable));
- else
- newvars = (Variable *) pg_malloc(sizeof(Variable));
-
- st->variables = newvars;
+ enlargeVariables(variables, 1);
- var = &newvars[st->nvariables];
+ var = &(variables->vars[variables->nvars]);
var->name = pg_strdup(name);
var->svalue = NULL;
/* caller is expected to initialize remaining fields */
- st->nvariables++;
+ variables->nvars++;
/* we don't re-sort the array till we have to */
- st->vars_sorted = false;
+ variables->vars_sorted = false;
}
return var;
@@ -1613,12 +1645,13 @@ lookupCreateVariable(CState *st, const char *context, char *name)
/* Assign a string value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariable(CState *st, const char *context, char *name, const char *value)
+putVariable(Variables *variables, const char *context, char *name,
+ const char *value)
{
Variable *var;
char *val;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1636,12 +1669,12 @@ putVariable(CState *st, const char *context, char *name, const char *value)
/* Assign a value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableValue(CState *st, const char *context, char *name,
+putVariableValue(Variables *variables, const char *context, char *name,
const PgBenchValue *value)
{
Variable *var;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1656,12 +1689,13 @@ putVariableValue(CState *st, const char *context, char *name,
/* Assign an integer value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putVariableInt(Variables *variables, const char *context, char *name,
+ int64 value)
{
PgBenchValue val;
setIntValue(&val, value);
- return putVariableValue(st, context, name, &val);
+ return putVariableValue(variables, context, name, &val);
}
/*
@@ -1720,7 +1754,7 @@ replaceVariable(char **sql, char *param, int len, char *value)
}
static char *
-assignVariables(CState *st, char *sql)
+assignVariables(Variables *variables, char *sql)
{
char *p,
*name,
@@ -1741,7 +1775,7 @@ assignVariables(CState *st, char *sql)
continue;
}
- val = getVariable(st, name);
+ val = getVariable(variables, name);
free(name);
if (val == NULL)
{
@@ -1756,12 +1790,13 @@ assignVariables(CState *st, char *sql)
}
static void
-getQueryParams(CState *st, const Command *command, const char **params)
+getQueryParams(Variables *variables, const Command *command,
+ const char **params)
{
int i;
for (i = 0; i < command->argc - 1; i++)
- params[i] = getVariable(st, command->argv[i + 1]);
+ params[i] = getVariable(variables, command->argv[i + 1]);
}
static char *
@@ -2629,7 +2664,7 @@ evaluateExpr(CState *st, PgBenchExpr *expr, PgBenchValue *retval)
{
Variable *var;
- if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
+ if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
pg_log_error("undefined variable \"%s\"", expr->u.variable.varname);
return false;
@@ -2699,7 +2734,7 @@ getMetaCommand(const char *cmd)
* Return true if succeeded, or false on error.
*/
static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
char command[SHELL_COMMAND_SIZE];
int i,
@@ -2730,7 +2765,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
arg = argv[i] + 1; /* a string literal starting with colons */
}
- else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
+ else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
pg_log_error("%s: undefined variable \"%s\"", argv[0], argv[i]);
return false;
@@ -2791,7 +2826,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
pg_log_error("%s: shell command must return an integer (not \"%s\")", argv[0], res);
return false;
}
- if (!putVariableInt(st, "setshell", variable, retval))
+ if (!putVariableInt(variables, "setshell", variable, retval))
return false;
pg_log_debug("%s: shell parameter name: \"%s\", value: \"%s\"", argv[0], argv[1], res);
@@ -2843,7 +2878,7 @@ sendCommand(CState *st, Command *command)
char *sql;
sql = pg_strdup(command->argv[0]);
- sql = assignVariables(st, sql);
+ sql = assignVariables(&st->variables, sql);
pg_log_debug("client %d sending %s", st->id, sql);
r = PQsendQuery(st->con, sql);
@@ -2854,7 +2889,7 @@ sendCommand(CState *st, Command *command)
const char *sql = command->argv[0];
const char *params[MAX_ARGS];
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
pg_log_debug("client %d sending %s", st->id, sql);
r = PQsendQueryParams(st->con, sql, command->argc - 1,
@@ -2901,7 +2936,7 @@ sendCommand(CState *st, Command *command)
st->prepared[st->use_file] = true;
}
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
pg_log_debug("client %d sending %s", st->id, name);
@@ -2994,7 +3029,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
varname = psprintf("%s%s", varprefix, varname);
/* store last row result as a string */
- if (!putVariable(st, meta == META_ASET ? "aset" : "gset", varname,
+ if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
PQgetvalue(res, ntuples - 1, fld)))
{
/* internal error */
@@ -3055,14 +3090,14 @@ error:
* of delay, in microseconds. Returns true on success, false on error.
*/
static bool
-evaluateSleep(CState *st, int argc, char **argv, int *usecs)
+evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
char *var;
int usec;
if (*argv[1] == ':')
{
- if ((var = getVariable(st, argv[1] + 1)) == NULL)
+ if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
pg_log_error("%s: undefined variable \"%s\"", argv[0], argv[1] + 1);
return false;
@@ -3627,7 +3662,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
* latency will be recorded in CSTATE_SLEEP state, not here, after the
* delay has elapsed.)
*/
- if (!evaluateSleep(st, argc, argv, &usec))
+ if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
commandFailed(st, "sleep", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -3648,7 +3683,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
return CSTATE_ABORTED;
}
- if (!putVariableValue(st, argv[0], argv[1], &result))
+ if (!putVariableValue(&st->variables, argv[0], argv[1], &result))
{
commandFailed(st, "set", "assignment of meta-command failed");
return CSTATE_ABORTED;
@@ -3718,7 +3753,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
}
else if (command->meta == META_SETSHELL)
{
- if (!runShellCommand(st, argv[1], argv + 2, argc - 2))
+ if (!runShellCommand(&st->variables, argv[1], argv + 2, argc - 2))
{
commandFailed(st, "setshell", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -3726,7 +3761,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
}
else if (command->meta == META_SHELL)
{
- if (!runShellCommand(st, NULL, argv + 1, argc - 1))
+ if (!runShellCommand(&st->variables, NULL, argv + 1, argc - 1))
{
commandFailed(st, "shell", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -6020,7 +6055,7 @@ main(int argc, char **argv)
}
*p++ = '\0';
- if (!putVariable(&state[0], "option", optarg, p))
+ if (!putVariable(&state[0].variables, "option", optarg, p))
exit(1);
}
break;
@@ -6348,19 +6383,19 @@ main(int argc, char **argv)
int j;
state[i].id = i;
- for (j = 0; j < state[0].nvariables; j++)
+ for (j = 0; j < state[0].variables.nvars; j++)
{
- Variable *var = &state[0].variables[j];
+ Variable *var = &state[0].variables.vars[j];
if (var->value.type != PGBT_NO_VALUE)
{
- if (!putVariableValue(&state[i], "startup",
+ if (!putVariableValue(&state[i].variables, "startup",
var->name, &var->value))
exit(1);
}
else
{
- if (!putVariable(&state[i], "startup",
+ if (!putVariable(&state[i].variables, "startup",
var->name, var->svalue))
exit(1);
}
@@ -6398,11 +6433,11 @@ main(int argc, char **argv)
* :scale variables normally get -s or database scale, but don't override
* an explicit -D switch
*/
- if (lookupVariable(&state[0], "scale") == NULL)
+ if (lookupVariable(&state[0].variables, "scale") == NULL)
{
for (i = 0; i < nclients; i++)
{
- if (!putVariableInt(&state[i], "startup", "scale", scale))
+ if (!putVariableInt(&state[i].variables, "startup", "scale", scale))
exit(1);
}
}
@@ -6411,28 +6446,30 @@ main(int argc, char **argv)
* Define a :client_id variable that is unique per connection. But don't
* override an explicit -D switch.
*/
- if (lookupVariable(&state[0], "client_id") == NULL)
+ if (lookupVariable(&state[0].variables, "client_id") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "client_id", i))
+ if (!putVariableInt(&state[i].variables, "startup", "client_id", i))
exit(1);
}
/* set default seed for hash functions */
- if (lookupVariable(&state[0], "default_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "default_seed") == NULL)
{
uint64 seed = pg_prng_uint64(&base_random_sequence);
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "default_seed", (int64) seed))
+ if (!putVariableInt(&state[i].variables, "startup", "default_seed",
+ (int64) seed))
exit(1);
}
/* set random seed unless overwritten */
- if (lookupVariable(&state[0], "random_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "random_seed") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+ if (!putVariableInt(&state[i].variables, "startup", "random_seed",
+ random_seed))
exit(1);
}
--
2.17.1
Hi Yugo and Fabien,
It seems the patch is ready for committer except below. Do you guys
want to do more on below?
# TESTS
I suggested simplifying the tests by using conditionals & sequences. You
reported that you got stuck. Hmmm.
I tried again my tests, which worked fine when started with 2 clients;
otherwise they get stuck because the first client waits for the other one,
which does not exist (the point is to generate deadlocks and other errors).
Maybe this is your issue?
That seems to be right. It got stuck when I used the -T option rather than -t;
it was because, I guess, the number of transactions on each thread was
different.
Could you try with:
psql < deadlock_prep.sql
pgbench -t 4 -c 2 -f deadlock.sql
# note: each deadlock detection takes 1 second
psql < deadlock_prep.sql
pgbench -t 10 -c 2 -f serializable.sql
# very quick 50% serialization errors
That works. However, it still hangs when --max-tries = 2, so maybe we cannot
use it for testing the retry feature...
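The test scripts themselves are not quoted in the thread; a plausible minimal pair that would produce the behavior described above (hypothetical, for illustration only: the actual deadlock_prep.sql and deadlock.sql may differ) could look like this. The preparation script creates two rows:

```sql
-- deadlock_prep.sql (hypothetical): two rows that clients will lock in opposite order
DROP TABLE IF EXISTS deadlock_test;
CREATE TABLE deadlock_test (id int PRIMARY KEY, val int);
INSERT INTO deadlock_test VALUES (1, 0), (2, 0);
```

And the pgbench script uses :client_id (0 or 1) so that the two clients update the rows in opposite order, which deadlocks when their transactions overlap:

```sql
-- deadlock.sql (hypothetical pgbench script, run with -c 2)
\set first 1 + :client_id
\set second 2 - :client_id
BEGIN;
UPDATE deadlock_test SET val = val + 1 WHERE id = :first;
\sleep 500 ms
UPDATE deadlock_test SET val = val + 1 WHERE id = :second;
COMMIT;
```

The ~1 second per detected deadlock mentioned above matches the default deadlock_timeout of 1s; this also explains why a single client (or mismatched transaction counts under -T) simply proceeds without errors, since there is no second client to conflict with.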
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Hello Tatsuo-san,
It seems the patch is ready for committer except below. Do you guys want
to do more on below?
I'm planning a new review of this significant patch, possibly over the
next week-end, or the next.
--
Fabien.
Hello Yugo-san,
About Pgbench error handling v16:
This patch set needs a minor rebase because of 506035b0. Otherwise, patch
compiles, global and local "make check" are ok. Doc generation is ok.
This patch is in good shape, the code and comments are clear.
Some minor remarks below, including typos and a few small suggestions.
## About v16-1
This refactoring patch adds a struct for managing pgbench variables, instead of
mixing fields into the client state (CState) struct.
Patch compiles, global and local "make check" are both ok.
Although this patch is not necessary to add the feature, I'm fine with it as it
improves pgbench source code readability.
## About v16-2
This last patch adds handling of serialization and deadlock errors to pgbench
transactions. This feature is desirable because it enlarges performance-testing
options and makes pgbench behave more like a database client application.
Possible future extensions enabled by this patch include handling disconnection
errors by trying to reconnect, for instance.
The documentation is clear and well written, at least for my non-native speaker
eyes and ears.
English: "he will be aborted" -> "it will be aborted".
I'm fine with renaming --report-latencies to --report-per-command as the latter
is clearer about what the option does.
I'm still not sure I like the "failure detailed" option; ISTM that the report
could always be detailed. That would remove some complexity and I do not think
that people executing a bench with error handling would mind having the details.
No big deal.
printVerboseErrorMessages: I'd make the buffer static and initialized only once
so that there is no significant malloc/free cycle involved when calling the function.
advanceConnectionState: I'd really prefer not to add new variables (res, status)
in the loop scope, and only declare them when actually needed in the state branches,
so as to avoid any unwanted interaction between states.
typo: "fullowing" -> "following"
Pipeline cleaning: the advance function is already soooo long, I'd put that in a
separate function and call it.
I think that the report should not remove data when they are 0, otherwise it makes
it harder to script around it (in failures_detailed on line 6284).
The tests cover the different cases. I tried to suggest a simpler approach
in a previous round, but it seems not so simple to do so. They could be
simplified later, if possible.
--
Fabien.
Hello Fabien,
On Sat, 12 Mar 2022 15:54:54 +0100 (CET)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:
Hello Yugo-san,
About Pgbench error handling v16:
Thank you for your review! I attached the updated patches.
This patch set needs a minor rebase because of 506035b0. Otherwise, patch
compiles, global and local "make check" are ok. Doc generation is ok.
I rebased it.
## About v16-2
English: "he will be aborted" -> "it will be aborted".
Fixed.
I'm still not sure I like the "failure detailed" option; ISTM that the report
could always be detailed. That would remove some complexity and I do not think
that people executing a bench with error handling would mind having the details.
No big deal.
I didn't change it because I think those who don't expect any failures, using a
well-designed script, may not need details of failures. I think reporting such
details will be required only for benchmarks where failures are expected.
printVerboseErrorMessages: I'd make the buffer static and initialized only once
so that there is no significant malloc/free cycle involved when calling the function.
OK. I fixed printVerboseErrorMessages to use a static variable.
advanceConnectionState: I'd really prefer not to add new variables (res, status)
in the loop scope, and only declare them when actually needed in the state branches,
so as to avoid any unwanted interaction between states.
I changed it to declare the variables in the case statement blocks.
typo: "fullowing" -> "following"
fixed.
Pipeline cleaning: the advance function is already soooo long, I'd put that in a
separate function and call it.
Ok. I made a new function "discardUntilSync" for the pipeline cleaning.
I think that the report should not remove data when they are 0, otherwise it makes
it harder to script around it (in failures_detailed on line 6284).
I changed it to always report both serialization and deadlock failures, even
when they are 0.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v17-0001-Pgbench-errors-use-the-Variables-structure-for-c.patch (text/x-diff)
From b4360d3c03013c86e1e62247f1c3c1378aacc38d Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Wed, 26 May 2021 16:58:36 +0900
Subject: [PATCH v17 1/2] Pgbench errors: use the Variables structure for
client variables
This is most important when it is used to reset client variables during the
repeating of transactions after serialization/deadlock failures.
Don't allocate Variable structs one by one. Instead, add a constant margin each
time it overflows.
---
src/bin/pgbench/pgbench.c | 163 +++++++++++++++++++++++---------------
1 file changed, 100 insertions(+), 63 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 000ffc4a5c..ab2c5dfc5f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -289,6 +289,12 @@ const char *progname;
volatile bool timer_exceeded = false; /* flag from signal handler */
+/*
+ * We don't want to allocate variables one by one; for efficiency, add a
+ * constant margin each time it overflows.
+ */
+#define VARIABLES_ALLOC_MARGIN 8
+
/*
* Variable definitions.
*
@@ -306,6 +312,24 @@ typedef struct
PgBenchValue value; /* actual variable's value */
} Variable;
+/*
+ * Data structure for client variables.
+ */
+typedef struct
+{
+ Variable *vars; /* array of variable definitions */
+ int nvars; /* number of variables */
+
+ /*
+ * The maximum number of variables that we can currently store in 'vars'
+ * without having to reallocate more space. We must always have max_vars >=
+ * nvars.
+ */
+ int max_vars;
+
+ bool vars_sorted; /* are variables sorted by name? */
+} Variables;
+
#define MAX_SCRIPTS 128 /* max number of SQL scripts allowed */
#define SHELL_COMMAND_SIZE 256 /* maximum size allowed for shell command */
@@ -460,9 +484,7 @@ typedef struct
int command; /* command number in script */
/* client variables */
- Variable *variables; /* array of variable definitions */
- int nvariables; /* number of variables */
- bool vars_sorted; /* are variables sorted by name? */
+ Variables variables;
/* various times about current transaction in microseconds */
pg_time_usec_t txn_scheduled; /* scheduled start time of transaction */
@@ -1398,39 +1420,39 @@ compareVariableNames(const void *v1, const void *v2)
/* Locate a variable by name; returns NULL if unknown */
static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
{
Variable key;
/* On some versions of Solaris, bsearch of zero items dumps core */
- if (st->nvariables <= 0)
+ if (variables->nvars <= 0)
return NULL;
/* Sort if we have to */
- if (!st->vars_sorted)
+ if (!variables->vars_sorted)
{
- qsort((void *) st->variables, st->nvariables, sizeof(Variable),
+ qsort((void *) variables->vars, variables->nvars, sizeof(Variable),
compareVariableNames);
- st->vars_sorted = true;
+ variables->vars_sorted = true;
}
/* Now we can search */
key.name = name;
return (Variable *) bsearch((void *) &key,
- (void *) st->variables,
- st->nvariables,
+ (void *) variables->vars,
+ variables->nvars,
sizeof(Variable),
compareVariableNames);
}
/* Get the value of a variable, in string form; returns NULL if unknown */
static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
{
Variable *var;
char stringform[64];
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
return NULL; /* not found */
@@ -1562,21 +1584,37 @@ valid_variable_name(const char *name)
return true;
}
+/*
+ * Make sure there is enough space for 'needed' more variables in the variables
+ * array.
+ */
+static void
+enlargeVariables(Variables *variables, int needed)
+{
+ /* total number of variables required now */
+ needed += variables->nvars;
+
+ if (variables->max_vars < needed)
+ {
+ variables->max_vars = needed + VARIABLES_ALLOC_MARGIN;
+ variables->vars = (Variable *)
+ pg_realloc(variables->vars, variables->max_vars * sizeof(Variable));
+ }
+}
+
/*
* Lookup a variable by name, creating it if need be.
* Caller is expected to assign a value to the variable.
* Returns NULL on failure (bad name).
*/
static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
+lookupCreateVariable(Variables *variables, const char *context, char *name)
{
Variable *var;
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
{
- Variable *newvars;
-
/*
* Check for the name only when declaring a new variable to avoid
* overhead.
@@ -1588,23 +1626,17 @@ lookupCreateVariable(CState *st, const char *context, char *name)
}
/* Create variable at the end of the array */
- if (st->variables)
- newvars = (Variable *) pg_realloc(st->variables,
- (st->nvariables + 1) * sizeof(Variable));
- else
- newvars = (Variable *) pg_malloc(sizeof(Variable));
-
- st->variables = newvars;
+ enlargeVariables(variables, 1);
- var = &newvars[st->nvariables];
+ var = &(variables->vars[variables->nvars]);
var->name = pg_strdup(name);
var->svalue = NULL;
/* caller is expected to initialize remaining fields */
- st->nvariables++;
+ variables->nvars++;
/* we don't re-sort the array till we have to */
- st->vars_sorted = false;
+ variables->vars_sorted = false;
}
return var;
@@ -1613,12 +1645,13 @@ lookupCreateVariable(CState *st, const char *context, char *name)
/* Assign a string value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariable(CState *st, const char *context, char *name, const char *value)
+putVariable(Variables *variables, const char *context, char *name,
+ const char *value)
{
Variable *var;
char *val;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1636,12 +1669,12 @@ putVariable(CState *st, const char *context, char *name, const char *value)
/* Assign a value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableValue(CState *st, const char *context, char *name,
+putVariableValue(Variables *variables, const char *context, char *name,
const PgBenchValue *value)
{
Variable *var;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1656,12 +1689,13 @@ putVariableValue(CState *st, const char *context, char *name,
/* Assign an integer value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putVariableInt(Variables *variables, const char *context, char *name,
+ int64 value)
{
PgBenchValue val;
setIntValue(&val, value);
- return putVariableValue(st, context, name, &val);
+ return putVariableValue(variables, context, name, &val);
}
/*
@@ -1720,7 +1754,7 @@ replaceVariable(char **sql, char *param, int len, char *value)
}
static char *
-assignVariables(CState *st, char *sql)
+assignVariables(Variables *variables, char *sql)
{
char *p,
*name,
@@ -1741,7 +1775,7 @@ assignVariables(CState *st, char *sql)
continue;
}
- val = getVariable(st, name);
+ val = getVariable(variables, name);
free(name);
if (val == NULL)
{
@@ -1756,12 +1790,13 @@ assignVariables(CState *st, char *sql)
}
static void
-getQueryParams(CState *st, const Command *command, const char **params)
+getQueryParams(Variables *variables, const Command *command,
+ const char **params)
{
int i;
for (i = 0; i < command->argc - 1; i++)
- params[i] = getVariable(st, command->argv[i + 1]);
+ params[i] = getVariable(variables, command->argv[i + 1]);
}
static char *
@@ -2629,7 +2664,7 @@ evaluateExpr(CState *st, PgBenchExpr *expr, PgBenchValue *retval)
{
Variable *var;
- if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
+ if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
pg_log_error("undefined variable \"%s\"", expr->u.variable.varname);
return false;
@@ -2699,7 +2734,7 @@ getMetaCommand(const char *cmd)
* Return true if succeeded, or false on error.
*/
static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
char command[SHELL_COMMAND_SIZE];
int i,
@@ -2730,7 +2765,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
arg = argv[i] + 1; /* a string literal starting with colons */
}
- else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
+ else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
pg_log_error("%s: undefined variable \"%s\"", argv[0], argv[i]);
return false;
@@ -2791,7 +2826,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
pg_log_error("%s: shell command must return an integer (not \"%s\")", argv[0], res);
return false;
}
- if (!putVariableInt(st, "setshell", variable, retval))
+ if (!putVariableInt(variables, "setshell", variable, retval))
return false;
pg_log_debug("%s: shell parameter name: \"%s\", value: \"%s\"", argv[0], argv[1], res);
@@ -2843,7 +2878,7 @@ sendCommand(CState *st, Command *command)
char *sql;
sql = pg_strdup(command->argv[0]);
- sql = assignVariables(st, sql);
+ sql = assignVariables(&st->variables, sql);
pg_log_debug("client %d sending %s", st->id, sql);
r = PQsendQuery(st->con, sql);
@@ -2854,7 +2889,7 @@ sendCommand(CState *st, Command *command)
const char *sql = command->argv[0];
const char *params[MAX_ARGS];
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
pg_log_debug("client %d sending %s", st->id, sql);
r = PQsendQueryParams(st->con, sql, command->argc - 1,
@@ -2901,7 +2936,7 @@ sendCommand(CState *st, Command *command)
st->prepared[st->use_file] = true;
}
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
pg_log_debug("client %d sending %s", st->id, name);
@@ -2994,7 +3029,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
varname = psprintf("%s%s", varprefix, varname);
/* store last row result as a string */
- if (!putVariable(st, meta == META_ASET ? "aset" : "gset", varname,
+ if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
PQgetvalue(res, ntuples - 1, fld)))
{
/* internal error */
@@ -3055,14 +3090,14 @@ error:
* of delay, in microseconds. Returns true on success, false on error.
*/
static bool
-evaluateSleep(CState *st, int argc, char **argv, int *usecs)
+evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
char *var;
int usec;
if (*argv[1] == ':')
{
- if ((var = getVariable(st, argv[1] + 1)) == NULL)
+ if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
pg_log_error("%s: undefined variable \"%s\"", argv[0], argv[1] + 1);
return false;
@@ -3627,7 +3662,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
* latency will be recorded in CSTATE_SLEEP state, not here, after the
* delay has elapsed.)
*/
- if (!evaluateSleep(st, argc, argv, &usec))
+ if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
commandFailed(st, "sleep", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -3648,7 +3683,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
return CSTATE_ABORTED;
}
- if (!putVariableValue(st, argv[0], argv[1], &result))
+ if (!putVariableValue(&st->variables, argv[0], argv[1], &result))
{
commandFailed(st, "set", "assignment of meta-command failed");
return CSTATE_ABORTED;
@@ -3718,7 +3753,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
}
else if (command->meta == META_SETSHELL)
{
- if (!runShellCommand(st, argv[1], argv + 2, argc - 2))
+ if (!runShellCommand(&st->variables, argv[1], argv + 2, argc - 2))
{
commandFailed(st, "setshell", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -3726,7 +3761,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
}
else if (command->meta == META_SHELL)
{
- if (!runShellCommand(st, NULL, argv + 1, argc - 1))
+ if (!runShellCommand(&st->variables, NULL, argv + 1, argc - 1))
{
commandFailed(st, "shell", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -6020,7 +6055,7 @@ main(int argc, char **argv)
}
*p++ = '\0';
- if (!putVariable(&state[0], "option", optarg, p))
+ if (!putVariable(&state[0].variables, "option", optarg, p))
exit(1);
}
break;
@@ -6348,19 +6383,19 @@ main(int argc, char **argv)
int j;
state[i].id = i;
- for (j = 0; j < state[0].nvariables; j++)
+ for (j = 0; j < state[0].variables.nvars; j++)
{
- Variable *var = &state[0].variables[j];
+ Variable *var = &state[0].variables.vars[j];
if (var->value.type != PGBT_NO_VALUE)
{
- if (!putVariableValue(&state[i], "startup",
+ if (!putVariableValue(&state[i].variables, "startup",
var->name, &var->value))
exit(1);
}
else
{
- if (!putVariable(&state[i], "startup",
+ if (!putVariable(&state[i].variables, "startup",
var->name, var->svalue))
exit(1);
}
@@ -6398,11 +6433,11 @@ main(int argc, char **argv)
* :scale variables normally get -s or database scale, but don't override
* an explicit -D switch
*/
- if (lookupVariable(&state[0], "scale") == NULL)
+ if (lookupVariable(&state[0].variables, "scale") == NULL)
{
for (i = 0; i < nclients; i++)
{
- if (!putVariableInt(&state[i], "startup", "scale", scale))
+ if (!putVariableInt(&state[i].variables, "startup", "scale", scale))
exit(1);
}
}
@@ -6411,28 +6446,30 @@ main(int argc, char **argv)
* Define a :client_id variable that is unique per connection. But don't
* override an explicit -D switch.
*/
- if (lookupVariable(&state[0], "client_id") == NULL)
+ if (lookupVariable(&state[0].variables, "client_id") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "client_id", i))
+ if (!putVariableInt(&state[i].variables, "startup", "client_id", i))
exit(1);
}
/* set default seed for hash functions */
- if (lookupVariable(&state[0], "default_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "default_seed") == NULL)
{
uint64 seed = pg_prng_uint64(&base_random_sequence);
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "default_seed", (int64) seed))
+ if (!putVariableInt(&state[i].variables, "startup", "default_seed",
+ (int64) seed))
exit(1);
}
/* set random seed unless overwritten */
- if (lookupVariable(&state[0], "random_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "random_seed") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+ if (!putVariableInt(&state[i].variables, "startup", "random_seed",
+ random_seed))
exit(1);
}
--
2.17.1
Attachment: v17-0002-Pgbench-errors-and-serialization-deadlock-retrie.patch (text/x-diff)
From e5f646ef87641a67cc0003b5ec7c42c0e0beb6fe Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Mon, 7 Jun 2021 18:35:14 +0900
Subject: [PATCH v17 2/2] Pgbench errors and serialization/deadlock retries
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost, or the end of the script was
reached without the last transaction being completed. In addition, if
execution of an SQL or meta command fails for reasons other than
serialization or deadlock errors, the client is aborted. Otherwise, if an SQL
command fails with a serialization or deadlock error, the current transaction
is rolled back, which also includes setting the client variables back to what
they were before the run of this transaction (it is assumed that one
transaction script contains only one transaction).
Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum number of
tries (specified by the --max-tries option) / the maximum time of tries
(specified by the --latency-limit option). These options can be combined;
moreover, you cannot use an unlimited number of tries (--max-tries=0)
without the --latency-limit option or the --time option. By default the
option --max-tries is set to 1 and transactions with serialization/deadlock
errors are not retried. If the last transaction run fails, this transaction
will be reported as failed, and the client variables will be set back to what
they were before the first run of this transaction.
If there are retries and/or failures, their statistics are printed in the
progress output, in the transaction/aggregation logs, and at the end with the
other results (overall and for each script). Retries and failures are also
printed per command, with average latencies, if you use the appropriate
benchmarking option (--report-per-command, -r). If you want to group failures
by basic type (serialization failures / deadlock failures), use the
--failures-detailed option. If you want to distinguish all errors and failures
(errors without retrying) by type, including which retry limit was violated
and by how far it was exceeded for serialization/deadlock failures, use the
--verbose-errors option.
---
doc/src/sgml/ref/pgbench.sgml | 435 ++++++++-
src/bin/pgbench/pgbench.c | 965 +++++++++++++++++--
src/bin/pgbench/t/001_pgbench_with_server.pl | 215 ++++-
src/bin/pgbench/t/002_pgbench_no_server.pl | 10 +
src/fe_utils/conditional.c | 16 +-
src/include/fe_utils/conditional.h | 2 +
6 files changed, 1499 insertions(+), 144 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index be1896fa99..49f57bda61 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -56,8 +56,10 @@ scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 1
+maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+number of failed transactions: 0 (0.000%)
latency average = 11.013 ms
latency stddev = 7.351 ms
initial connection time = 45.758 ms
@@ -65,11 +67,18 @@ tps = 896.967014 (without initial connection time)
</screen>
The first six lines report some of the most important parameter
- settings. The next line reports the number of transactions completed
+ settings.
+ The seventh line reports the maximum number of tries for transactions with
+ serialization or deadlock errors (see <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information).
+ The eighth line reports the number of transactions completed
and intended (the latter being just the product of number of clients
and number of transactions per client); these will be equal unless the run
- failed before completion. (In <option>-T</option> mode, only the actual
- number of transactions is printed.)
+ failed before completion or some SQL command(s) failed. (In
+ <option>-T</option> mode, only the actual number of transactions is printed.)
+ The next line reports the number of failed transactions due to
+ serialization or deadlock errors (see <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -531,6 +540,17 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
at all. They are counted and reported separately as
<firstterm>skipped</firstterm>.
</para>
+ <para>
+ When the <option>--max-tries</option> option is used, a transaction with
+ a serialization or deadlock error cannot be retried if the total time of
+ all its tries is greater than <replaceable>limit</replaceable> ms. To
+ limit only the time of tries and not their number, use
+ <literal>--max-tries=0</literal>. By default, the option
+ <option>--max-tries</option> is set to 1 and transactions with
+ serialization/deadlock errors are not retried. See <xref
+ linkend="failures-and-retries" endterm="failures-and-retries-title"/>
+ for more information about retrying such transactions.
+ </para>
</listitem>
</varlistentry>
@@ -597,23 +617,29 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<para>
Show progress report every <replaceable>sec</replaceable> seconds. The report
includes the time since the beginning of the run, the TPS since the
- last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</option>),
- the latency is computed with respect to the transaction scheduled
- start time, not the actual transaction beginning time, thus it also
- includes the average schedule lag time.
+ last report, and the transaction latency average, standard deviation,
+ and the number of failed transactions since the last report. Under
+ throttling (<option>-R</option>), the latency is computed with respect
+ to the transaction scheduled start time, not the actual transaction
+ beginning time, thus it also includes the average schedule lag time.
+ When <option>--max-tries</option> is used to enable transaction retries
+ after serialization/deadlock errors, the report includes the number of
+ retried transactions and the sum of all retries.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-r</option></term>
- <term><option>--report-latencies</option></term>
+ <term><option>--report-per-command</option></term>
<listitem>
<para>
- Report the average per-statement latency (execution time from the
- perspective of the client) of each command after the benchmark
- finishes. See below for details.
+ Report the following statistics for each command after the benchmark
+ finishes: the average per-statement latency (execution time from the
+ perspective of the client), the number of failures and the number of
+ retries after serialization or deadlock errors in this command. The
+ report displays retry statistics only if the
+ <option>--max-tries</option> option is not equal to 1.
</para>
</listitem>
</varlistentry>
@@ -741,6 +767,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--failures-detailed</option></term>
+ <listitem>
+ <para>
+ Report failures in per-transaction and aggregation logs, as well as in
+ the main and per-script reports, grouped by the following types:
+ <itemizedlist>
+ <listitem>
+ <para>serialization failures;</para>
+ </listitem>
+ <listitem>
+ <para>deadlock failures;</para>
+ </listitem>
+ </itemizedlist>
+ See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--log-prefix=<replaceable>prefix</replaceable></option></term>
<listitem>
@@ -751,6 +797,38 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--max-tries=<replaceable>number_of_tries</replaceable></option></term>
+ <listitem>
+ <para>
+ Enable retries for transactions with serialization/deadlock errors and
+ set the maximum number of these tries. This option can be combined with
+ the <option>--latency-limit</option> option which limits the total time
+ of all transaction tries; moreover, you cannot use an unlimited number
+ of tries (<literal>--max-tries=0</literal>) without
+ <option>--latency-limit</option> or <option>--time</option>.
+ The default value is 1 and transactions with serialization/deadlock
+ errors are not retried. See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information about
+ retrying such transactions.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--verbose-errors</option></term>
+ <listitem>
+ <para>
+ Print messages about all errors and failures (errors without retrying),
+ including which retry limit was violated and by how far it was
+ exceeded for serialization/deadlock failures. (Note that in this
+ case the output can grow significantly.)
+ See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--progress-timestamp</option></term>
<listitem>
@@ -948,8 +1026,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<refsect1>
<title>Notes</title>
- <refsect2>
- <title>What Is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts">
+ <title id="transactions-and-scripts-title">What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
<para>
<application>pgbench</application> executes test scripts chosen randomly
@@ -1022,6 +1100,11 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
both old and new versions of <application>pgbench</application>, be sure to write
each SQL command on a single line ending with a semicolon.
</para>
+ <para>
+ It is assumed that pgbench scripts do not contain incomplete blocks of SQL
+ transactions. If at runtime the client reaches the end of the script without
+ completing the last transaction block, it will be aborted.
+ </para>
</note>
<para>
@@ -2212,7 +2295,7 @@ END;
The format of the log is:
<synopsis>
-<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -2233,6 +2316,17 @@ END;
When both <option>--rate</option> and <option>--latency-limit</option> are used,
the <replaceable>time</replaceable> for a skipped transaction will be reported as
<literal>skipped</literal>.
+ <replaceable>retries</replaceable> is the sum of all retries after
+ serialization or deadlock errors during the current script execution. It is
+ present only if the <option>--max-tries</option> option is not equal to 1.
+ If the transaction ends with a failure, its <replaceable>time</replaceable>
+ will be reported as <literal>failed</literal>. If you use the
+ <option>--failures-detailed</option> option, the
+ <replaceable>time</replaceable> of the failed transaction will be reported as
+ <literal>serialization</literal> or
+ <literal>deadlock</literal> depending on the type of failure (see
+ <xref linkend="failures-and-retries" endterm="failures-and-retries-title"/>
+ for more information).
</para>
<para>
@@ -2261,6 +2355,41 @@ END;
were already late before they were even started.
</para>
+ <para>
+ The following example shows a snippet of a log file with failures and
+ retries, with the maximum number of tries set to 10 (note the additional
+ <replaceable>retries</replaceable> column):
+<screen>
+3 0 47423 0 1499414498 34501 3
+3 1 8333 0 1499414498 42848 0
+3 2 8358 0 1499414498 51219 0
+4 0 72345 0 1499414498 59433 6
+1 3 41718 0 1499414498 67879 4
+1 4 8416 0 1499414498 76311 0
+3 3 33235 0 1499414498 84469 3
+0 0 failed 0 1499414498 84905 9
+2 0 failed 0 1499414498 86248 9
+3 4 8307 0 1499414498 92788 0
+</screen>
+ </para>
+
+ <para>
+ If the <option>--failures-detailed</option> option is used, the type of
+ failure is reported in the <replaceable>time</replaceable> field like this:
+<screen>
+3 0 47423 0 1499414498 34501 3
+3 1 8333 0 1499414498 42848 0
+3 2 8358 0 1499414498 51219 0
+4 0 72345 0 1499414498 59433 6
+1 3 41718 0 1499414498 67879 4
+1 4 8416 0 1499414498 76311 0
+3 3 33235 0 1499414498 84469 3
+0 0 serialization 0 1499414498 84905 9
+2 0 serialization 0 1499414498 86248 9
+3 4 8307 0 1499414498 92788 0
+</screen>
+ </para>
+
<para>
When running a long test on hardware that can handle a lot of transactions,
the log files can become very large. The <option>--sampling-rate</option> option
@@ -2276,7 +2405,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable>&zwsp; <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable>&zwsp; <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -2290,7 +2419,16 @@ END;
transaction latencies within the interval,
<replaceable>min_latency</replaceable> is the minimum latency within the interval,
and
- <replaceable>max_latency</replaceable> is the maximum latency within the interval.
+ <replaceable>max_latency</replaceable> is the maximum latency within the interval,
+ <replaceable>failures</replaceable> is the number of transactions that ended
+ with a failed SQL command within the interval. If you use the option
+ <option>--failures-detailed</option>, instead of the sum of all failed
+ transactions you will get more detailed statistics for the failed
+ transactions grouped by the following types:
+ <replaceable>serialization_failures</replaceable> is the number of
+ transactions that got a serialization error and were not retried after this,
+ <replaceable>deadlock_failures</replaceable> is the number of transactions
+ that got a deadlock error and were not retried after this.
The next fields,
<replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
@@ -2298,21 +2436,25 @@ END;
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e., the difference between each transaction's
scheduled start time and the time it actually started.
- The very last field, <replaceable>skipped</replaceable>,
+ The next field, <replaceable>skipped</replaceable>,
is only present if the <option>--latency-limit</option> option is used, too.
It counts the number of transactions skipped because they would have
started too late.
+ The <replaceable>retried</replaceable> and <replaceable>retries</replaceable>
+ fields are present only if the <option>--max-tries</option> option is not
+ equal to 1. They report the number of retried transactions and the sum of all
+ retries after serialization or deadlock errors within the interval.
Each transaction is counted in the interval when it was committed.
</para>
<para>
Here is some example output:
<screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0
+1345828503 7884 1979812 565806736 60 1479 0
+1345828505 7208 1979422 567277552 59 1391 0
+1345828507 7685 1980268 569784714 60 1398 0
+1345828509 7073 1979779 573489941 236 1411 0
</screen></para>
<para>
@@ -2324,13 +2466,44 @@ END;
</refsect2>
<refsect2>
- <title>Per-Statement Latencies</title>
+ <title>Per-Statement Report</title>
+
+ <para>
+ With the <option>-r</option> option, <application>pgbench</application>
+ collects the following statistics for each statement:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>latency</literal> — elapsed transaction time for each
+ statement. <application>pgbench</application> reports an average value
+ of all successful runs of the statement.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of failures in this statement. See
+ <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of retries after a serialization or a deadlock error in this
+ statement. See <xref linkend="failures-and-retries"
+ endterm="failures-and-retries-title"/> for more information.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ The report displays retry statistics only if the <option>--max-tries</option>
+ option is not equal to 1.
+ </para>
<para>
- With the <option>-r</option> option, <application>pgbench</application> collects
- the elapsed transaction time of each statement executed by every
- client. It then reports an average of those values, referred to
- as the latency for each statement, after the benchmark has finished.
+ All values are computed for each statement executed by every client and are
+ reported after the benchmark has finished.
</para>
<para>
@@ -2342,29 +2515,67 @@ scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
+maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
-latency average = 10.870 ms
-latency stddev = 7.341 ms
-initial connection time = 30.954 ms
-tps = 907.949122 (without initial connection time)
-statement latencies in milliseconds:
- 0.001 \set aid random(1, 100000 * :scale)
- 0.001 \set bid random(1, 1 * :scale)
- 0.001 \set tid random(1, 10 * :scale)
- 0.000 \set delta random(-5000, 5000)
- 0.046 BEGIN;
- 0.151 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
- 0.107 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
- 4.241 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
- 5.245 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
- 0.102 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
- 0.974 END;
+number of failed transactions: 0 (0.000%)
+number of transactions above the 50.0 ms latency limit: 1311/10000 (13.110 %)
+latency average = 28.488 ms
+latency stddev = 21.009 ms
+initial connection time = 69.068 ms
+tps = 346.224794 (without initial connection time)
+statement latencies in milliseconds and failures:
+ 0.012 0 \set aid random(1, 100000 * :scale)
+ 0.002 0 \set bid random(1, 1 * :scale)
+ 0.002 0 \set tid random(1, 10 * :scale)
+ 0.002 0 \set delta random(-5000, 5000)
+ 0.319 0 BEGIN;
+ 0.834 0 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.641 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 11.126 0 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 12.961 0 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.634 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.957 0 END;
</screen>
+
+ Another example of output for the default script using the serializable
+ default transaction isolation level (<command>PGOPTIONS='-c
+ default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: <builtin: TPC-B (sort of)>
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+maximum number of tries: 10
+number of transactions per client: 1000
+number of transactions actually processed: 6317/10000
+number of failed transactions: 3683 (36.830%)
+number of transactions retried: 7667 (76.670%)
+total number of retries: 45339
+number of transactions above the 50.0 ms latency limit: 106/6317 (1.678 %)
+latency average = 17.016 ms
+latency stddev = 13.283 ms
+initial connection time = 45.017 ms
+tps = 186.792667 (without initial connection time)
+statement latencies in milliseconds, failures and retries:
+ 0.006 0 0 \set aid random(1, 100000 * :scale)
+ 0.001 0 0 \set bid random(1, 1 * :scale)
+ 0.001 0 0 \set tid random(1, 10 * :scale)
+ 0.001 0 0 \set delta random(-5000, 5000)
+ 0.385 0 0 BEGIN;
+ 0.773 0 1 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.624 0 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 1.098 320 3762 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 0.582 3363 41576 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.465 0 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.933 0 0 END;
+ </screen>
</para>
<para>
- If multiple script files are specified, the averages are reported
+ If multiple script files are specified, all statistics are reported
separately for each script file.
</para>
@@ -2378,6 +2589,140 @@ statement latencies in milliseconds:
</para>
</refsect2>
+ <refsect2 id="failures-and-retries">
+ <title id="failures-and-retries-title">Failures and Serialization/Deadlock Retries</title>
+
+ <para>
+ When executing <application>pgbench</application>, there are three main types
+ of errors:
+ <itemizedlist>
+ <listitem>
+ <para>
+ Errors of the main program. They are the most serious and always result
+ in an immediate exit from <application>pgbench</application> with
+ the corresponding error message. They include:
+ <itemizedlist>
+ <listitem>
+ <para>
+ errors during <application>pgbench</application> startup
+ (e.g. an invalid option value);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ errors in the initialization mode (e.g. the query to create
+ tables for built-in scripts fails);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ errors before starting threads (e.g. failure to connect to the
+ database server, a syntax error in a meta command, or a thread
+ creation failure);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ internal <application>pgbench</application> errors (which are
+ supposed to never occur...).
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Errors when a thread manages its clients (e.g. a client could not
+ start a connection to the database server, or the socket for
+ connecting the client to the database server has become invalid). In
+ such cases all clients of this thread stop while other threads
+ continue to work.
+ </listitem>
+ <listitem>
+ <para>
+ Direct client errors. They lead to an immediate exit from
+ <application>pgbench</application> with the corresponding error message
+ only in the case of an internal <application>pgbench</application>
+ error (which is never supposed to occur...). Otherwise, in the worst
+ case they only lead to aborting the failed client while other
+ clients continue their run (but some client errors are handled without
+ aborting the client and are reported separately; see below). Later in
+ this section it is assumed that the errors discussed are only
+ direct client errors and not internal
+ <application>pgbench</application> errors.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ A client's run is aborted in case of a serious error; for example, the
+ connection with the database server was lost, or the end of the script was
+ reached without the last transaction being completed. In addition, if
+ execution of an SQL or meta command fails for reasons other than
+ serialization or deadlock errors, the client is aborted. Otherwise, if an
+ SQL command fails with a serialization or deadlock error, the client is not
+ aborted. In such cases, the current transaction is rolled back, which also
+ includes setting the client variables back to what they were before the run
+ of this transaction (it is assumed that one transaction script contains
+ only one transaction; see
+ <xref linkend="transactions-and-scripts" endterm="transactions-and-scripts-title"/>
+ for more information). Transactions with serialization or deadlock errors
+ are repeated after rollbacks until they complete successfully or reach the
+ maximum number of tries (specified by the <option>--max-tries</option>
+ option), the maximum time of retries (specified by the
+ <option>--latency-limit</option> option), or the end of the benchmark
+ (specified by the <option>--time</option> option). If the last try fails,
+ this transaction will be reported as failed, but the client is not aborted
+ and continues to work.
+ </para>
+
+ <note>
+ <para>
+ Without specifying the <option>--max-tries</option> option, a transaction
+ will never be retried after a serialization or deadlock error because its
+ default value is 1. Use an unlimited number of tries (<literal>--max-tries=0</literal>)
+ and the <option>--latency-limit</option> option to limit only the maximum time
+ of tries. You can also use the <option>--time</option> option to limit the
+ benchmark duration under an unlimited number of tries.
+ </para>
+ <para>
+ Be careful when repeating scripts that contain multiple transactions: the
+ script is always retried completely, so the successful transactions can be
+ performed several times.
+ </para>
+ <para>
+ Be careful when repeating transactions with shell commands. Unlike the
+ results of SQL commands, the results of shell commands are not rolled back,
+ except for the variable value of the <command>\setshell</command> command.
+ </para>
+ </note>
+
+ <para>
+ The latency of a successful transaction includes the entire time of
+ transaction execution with rollbacks and retries. The latency is measured
+ only for successful transactions and commands but not for failed transactions
+ or commands.
+ </para>
+
+ <para>
+ The main report contains the number of failed transactions. If the
+ <option>--max-tries</option> option is not equal to 1, the main report also
+ contains the statistics related to retries: the total number of retried
+ transactions and the total number of retries. The per-script report inherits all
+ these fields from the main report. The per-statement report displays retry
+ statistics only if the <option>--max-tries</option> option is not equal to 1.
+ </para>
+
+ <para>
+ If you want to group failures by basic types in per-transaction and
+ aggregation logs, as well as in the main and per-script reports, use the
+ <option>--failures-detailed</option> option. If you also want to distinguish
+ all errors and failures (errors without retrying) by type, including which
+ retry limit was violated and by how far it was exceeded for
+ serialization/deadlock failures, use the <option>--verbose-errors</option>
+ option.
+ </para>
+ </refsect2>
+
<refsect2>
<title>Good Practices</title>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ab2c5dfc5f..7080d2a795 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -76,6 +76,8 @@
#define M_PI 3.14159265358979323846
#endif
+#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"
#define ERRCODE_UNDEFINED_TABLE "42P01"
/*
@@ -275,9 +277,34 @@ bool progress_timestamp = false; /* progress report with Unix time */
int nclients = 1; /* number of clients */
int nthreads = 1; /* number of threads */
bool is_connect; /* establish connection for each transaction */
-bool report_per_command; /* report per-command latencies */
+bool report_per_command = false; /* report per-command latencies, retries
+ * after errors and failures (errors
+ * without retrying) */
int main_pid; /* main process id used in log filename */
+/*
+ * There are different types of restrictions for deciding that the current
+ * transaction with a serialization/deadlock error can no longer be retried and
+ * should be reported as failed:
+ * - max_tries (--max-tries) can be used to limit the number of tries;
+ * - latency_limit (-L) can be used to limit the total time of tries;
+ * - duration (-T) can be used to limit the total benchmark time.
+ *
+ * They can be combined together, and you need to use at least one of them to
+ * retry the transactions with serialization/deadlock errors. If none of them is
+ * used, the default value of max_tries is 1 and such transactions will not be
+ * retried.
+ */
+
+/*
+ * We cannot retry a transaction after a serialization/deadlock error if its
+ * number of tries reaches this maximum; if its value is zero, it is not used.
+ */
+uint32 max_tries = 1;
+
+bool failures_detailed = false; /* whether to group failures in reports
+ * or logs by basic types */
+
const char *pghost = NULL;
const char *pgport = NULL;
const char *username = NULL;
@@ -362,9 +389,66 @@ typedef int64 pg_time_usec_t;
typedef struct StatsData
{
pg_time_usec_t start_time; /* interval start time, for aggregates */
- int64 cnt; /* number of transactions, including skipped */
+
+ /*
+ * Transactions are counted depending on their execution and outcome. First
+ * a transaction may have started or not: skipped transactions occur under
+ * --rate and --latency-limit when the client is too late to execute them.
+ * Secondly, a started transaction may ultimately succeed or fail, possibly
+ * after some retries when --max-tries is not one. Thus
+ *
+ * the number of all transactions =
+ * 'skipped' (it was too late to execute them) +
+ * 'cnt' (the number of successful transactions) +
+ * failed (the number of failed transactions).
+ *
+ * A successful transaction can have several unsuccessful tries before a
+ * successful run. Thus
+ *
+ * 'cnt' (the number of successful transactions) =
+ * successfully retried transactions (they got a serialization or a
+ * deadlock error(s), but were
+ * successfully retried from the very
+ * beginning) +
+ * directly successful transactions (they were successfully completed on
+ * the first try).
+ *
+ * A failed transaction is one that was not successfully retried. It can be
+ * one of two types:
+ *
+ * failed (the number of failed transactions) =
+ * 'serialization_failures' (they got a serialization error and were not
+ * successfully retried) +
+ * 'deadlock_failures' (they got a deadlock error and were not successfully
+ * retried).
+ *
+ * If the transaction was retried after a serialization or a deadlock error,
+ * this does not guarantee that the retry was successful. Thus
+ *
+ * 'retries' (number of retries) =
+ * number of retries in all retried transactions =
+ * number of retries in (successfully retried transactions +
+ * failed transactions);
+ *
+ * 'retried' (number of all retried transactions) =
+ * successfully retried transactions +
+ * failed transactions.
+ */
+ int64 cnt; /* number of successful transactions, not
+ * including 'skipped' */
int64 skipped; /* number of transactions skipped under --rate
* and --latency-limit */
+ int64 retries; /* number of retries after a serialization or a
+ * deadlock error in all the transactions */
+ int64 retried; /* number of all transactions that were retried
+ * after a serialization or a deadlock error
+ * (perhaps the last try was unsuccessful) */
+ int64 serialization_failures; /* number of transactions that were not
+ * successfully retried after a
+ * serialization error */
+ int64 deadlock_failures; /* number of transactions that were not
+ * successfully retried after a deadlock
+ * error */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -375,6 +459,31 @@ typedef struct StatsData
*/
pg_time_usec_t epoch_shift;
+/*
+ * Error status for errors during script execution.
+ */
+typedef enum EStatus
+{
+ ESTATUS_NO_ERROR = 0,
+ ESTATUS_META_COMMAND_ERROR,
+
+ /* SQL errors */
+ ESTATUS_SERIALIZATION_ERROR,
+ ESTATUS_DEADLOCK_ERROR,
+ ESTATUS_OTHER_SQL_ERROR
+} EStatus;
+
+/*
+ * Transaction status at the end of a command.
+ */
+typedef enum TStatus
+{
+ TSTATUS_IDLE,
+ TSTATUS_IN_BLOCK,
+ TSTATUS_CONN_ERROR,
+ TSTATUS_OTHER_ERROR
+} TStatus;
+
/* Various random sequences are initialized from this one. */
static pg_prng_state base_random_sequence;
@@ -446,6 +555,35 @@ typedef enum
CSTATE_END_COMMAND,
CSTATE_SKIP_COMMAND,
+ /*
+ * States for failed commands.
+ *
+ * If the SQL/meta command fails, in CSTATE_ERROR clean up after an error:
+ * - clear the conditional stack;
+ * - if we have an unterminated (possibly failed) transaction block, send
+ * the rollback command to the server and wait for the result in
+ * CSTATE_WAIT_ROLLBACK_RESULT. If something goes wrong with rolling back,
+ * go to CSTATE_ABORTED.
+ *
+ * But if everything is ok we are ready for future transactions: if this is
+ * a serialization or deadlock error and we can re-execute the transaction
+ * from the very beginning, go to CSTATE_RETRY; otherwise go to
+ * CSTATE_FAILURE.
+ *
+ * In CSTATE_RETRY report an error, set the same parameters for the
+ * transaction execution as in the previous tries and process the first
+ * transaction command in CSTATE_START_COMMAND.
+ *
+ * In CSTATE_FAILURE report a failure, set the parameters for the
+ * transaction execution as they were before the first run of this
+ * transaction (except for a random state) and go to CSTATE_END_TX to
+ * complete this transaction.
+ */
+ CSTATE_ERROR,
+ CSTATE_WAIT_ROLLBACK_RESULT,
+ CSTATE_RETRY,
+ CSTATE_FAILURE,
+
/*
* CSTATE_END_TX performs end-of-transaction processing. It calculates
* latency, and logs the transaction. In --connect mode, it closes the
@@ -494,8 +632,20 @@ typedef struct
bool prepared[MAX_SCRIPTS]; /* whether client prepared the script */
+ /*
+ * For processing failures and repeating transactions with serialization or
+ * deadlock errors:
+ */
+ EStatus estatus; /* the error status of the current transaction
+ * execution; this is ESTATUS_NO_ERROR if there were
+ * no errors */
+ pg_prng_state random_state; /* random state at the start of the transaction */
+ uint32 tries; /* how many times have we already tried the
+ * current transaction? */
+
/* per client collected stats */
- int64 cnt; /* client transaction count, for -t */
+ int64 cnt; /* client transaction count, for -t; skipped and
+ * failed transactions are also counted here */
} CState;
/*
@@ -590,6 +740,9 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
* aset do gset on all possible queries of a combined query (\;).
* expr Parsed expression, if needed.
* stats Time spent in this command.
+ * retries Number of retries after a serialization or deadlock error in the
+ * current command.
+ * failures Number of errors in the current command that were not retried.
*/
typedef struct Command
{
@@ -602,6 +755,8 @@ typedef struct Command
char *varprefix;
PgBenchExpr *expr;
SimpleStats stats;
+ int64 retries;
+ int64 failures;
} Command;
typedef struct ParsedScript
@@ -616,6 +771,8 @@ static ParsedScript sql_script[MAX_SCRIPTS]; /* SQL script files */
static int num_scripts; /* number of scripts in sql_script[] */
static int64 total_weight = 0;
+static bool verbose_errors = false; /* print verbose messages of all errors */
+
/* Builtin test scripts */
typedef struct BuiltinScript
{
@@ -753,15 +910,18 @@ usage(void)
" protocol for submitting queries (default: simple)\n"
" -n, --no-vacuum do not run VACUUM before tests\n"
" -P, --progress=NUM show thread progress report every NUM seconds\n"
- " -r, --report-latencies report average latency per command\n"
+ " -r, --report-per-command report latencies, failures and retries per command\n"
" -R, --rate=NUM target rate in transactions per second\n"
" -s, --scale=NUM report this scale factor in output\n"
" -t, --transactions=NUM number of transactions each client runs (default: 10)\n"
" -T, --time=NUM duration of benchmark test in seconds\n"
" -v, --vacuum-all vacuum all four standard tables before tests\n"
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
+ " --failures-detailed report the failures grouped by basic types\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
+ " --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --verbose-errors print messages of all errors\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1287,6 +1447,10 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->start_time = start;
sd->cnt = 0;
sd->skipped = 0;
+ sd->retries = 0;
+ sd->retried = 0;
+ sd->serialization_failures = 0;
+ sd->deadlock_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1295,22 +1459,51 @@ initStats(StatsData *sd, pg_time_usec_t start)
* Accumulate one additional item into the given stats object.
*/
static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumStats(StatsData *stats, bool skipped, double lat, double lag,
+ EStatus estatus, int64 tries)
{
- stats->cnt++;
-
+ /* Record the skipped transaction */
if (skipped)
{
/* no latency to record on skipped transactions */
stats->skipped++;
+ return;
}
- else
+
+ /*
+ * Record the number of retries regardless of whether the transaction was
+ * successful or failed.
+ */
+ if (tries > 1)
+ {
+ stats->retries += (tries - 1);
+ stats->retried++;
+ }
+
+ switch (estatus)
{
- addToSimpleStats(&stats->latency, lat);
+ /* Record the successful transaction */
+ case ESTATUS_NO_ERROR:
+ stats->cnt++;
- /* and possibly the same for schedule lag */
- if (throttle_delay)
- addToSimpleStats(&stats->lag, lag);
+ addToSimpleStats(&stats->latency, lat);
+
+ /* and possibly the same for schedule lag */
+ if (throttle_delay)
+ addToSimpleStats(&stats->lag, lag);
+ break;
+
+ /* Record the failed transaction */
+ case ESTATUS_SERIALIZATION_ERROR:
+ stats->serialization_failures++;
+ break;
+ case ESTATUS_DEADLOCK_ERROR:
+ stats->deadlock_failures++;
+ break;
+ default:
+ /* internal error which should never occur */
+ pg_log_fatal("unexpected error status: %d", estatus);
+ exit(1);
}
}
@@ -2841,6 +3034,9 @@ preparedStatementName(char *buffer, int file, int state)
sprintf(buffer, "P%d_%d", file, state);
}
+/*
+ * Report the abort of the client while processing SQL commands.
+ */
static void
commandFailed(CState *st, const char *cmd, const char *message)
{
@@ -2848,6 +3044,17 @@ commandFailed(CState *st, const char *cmd, const char *message)
st->id, st->command, cmd, st->use_file, message);
}
+/*
+ * Report the error in the command while the script is executing.
+ */
+static void
+commandError(CState *st, const char *message)
+{
+ Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND);
+ pg_log_info("client %d got an error in command %d (SQL) of script %d; %s",
+ st->id, st->command, st->use_file, message);
+}
+
/* return a script number with a weighted choice. */
static int
chooseScript(TState *thread)
@@ -2955,6 +3162,33 @@ sendCommand(CState *st, Command *command)
return true;
}
+/*
+ * Get the error status from the error code.
+ */
+static EStatus
+getSQLErrorStatus(const char *sqlState)
+{
+ if (sqlState != NULL)
+ {
+ if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
+ return ESTATUS_SERIALIZATION_ERROR;
+ else if (strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
+ return ESTATUS_DEADLOCK_ERROR;
+ }
+
+ return ESTATUS_OTHER_SQL_ERROR;
+}
+
+/*
+ * Returns true if this type of error can be retried.
+ */
+static bool
+canRetryError(EStatus estatus)
+{
+ return (estatus == ESTATUS_SERIALIZATION_ERROR ||
+ estatus == ESTATUS_DEADLOCK_ERROR);
+}
+
/*
* Process query response from the backend.
*
@@ -2997,6 +3231,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
{
pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
st->id, st->use_file, st->command, qrynum, 0);
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
break;
@@ -3011,6 +3246,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
/* under \gset, report the error */
pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
st->id, st->use_file, st->command, qrynum, PQntuples(res));
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
else if (meta == META_ASET && ntuples <= 0)
@@ -3035,6 +3271,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
/* internal error */
pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
st->id, st->use_file, st->command, qrynum, varname);
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3052,6 +3289,18 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
PQerrorMessage(st->con));
break;
+ case PGRES_NONFATAL_ERROR:
+ case PGRES_FATAL_ERROR:
+ st->estatus = getSQLErrorStatus(
+ PQresultErrorField(res, PG_DIAG_SQLSTATE));
+ if (canRetryError(st->estatus))
+ {
+ if (verbose_errors)
+ commandError(st, PQerrorMessage(st->con));
+ goto error;
+ }
+ /* fall through */
+
default:
/* anything else is unexpected */
pg_log_error("client %d script %d aborted in command %d query %d: %s",
@@ -3130,6 +3379,165 @@ evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
return true;
}
+
+/*
+ * Returns true if the current transaction can be retried after the error.
+ */
+static bool
+doRetry(CState *st, pg_time_usec_t *now)
+{
+ Assert(st->estatus != ESTATUS_NO_ERROR);
+
+ /* We can only retry serialization or deadlock errors. */
+ if (!canRetryError(st->estatus))
+ return false;
+
+ /*
+ * We must have at least one option to limit the retrying of transactions
+ * that got an error.
+ */
+ Assert(max_tries || latency_limit || duration > 0);
+
+ /*
+ * We cannot retry the error if we have reached the maximum number of tries.
+ */
+ if (max_tries && st->tries >= max_tries)
+ return false;
+
+ /*
+ * We cannot retry the error if we spent too much time on this transaction.
+ */
+ if (latency_limit)
+ {
+ pg_time_now_lazy(now);
+ if (*now - st->txn_scheduled > latency_limit)
+ return false;
+ }
+
+ /*
+ * We cannot retry the error if the benchmark duration is over.
+ */
+ if (timer_exceeded)
+ return false;
+
+ /* OK */
+ return true;
+}
+
+/*
+ * Read and discard query results until a sync point.
+ */
+static int
+discardUntilSync(CState *st)
+{
+ /* send a sync */
+ if (!PQpipelineSync(st->con))
+ {
+ pg_log_error("client %d aborted: failed to send a pipeline sync",
+ st->id);
+ return 0;
+ }
+
+ /* receive PGRES_PIPELINE_SYNC and null following it */
+ for (;;)
+ {
+ PGresult *res = PQgetResult(st->con);
+ if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
+ {
+ PQclear(res);
+ res = PQgetResult(st->con);
+ Assert(res == NULL);
+ break;
+ }
+ PQclear(res);
+ }
+
+ /* exit pipeline mode */
+ if (PQexitPipelineMode(st->con) != 1)
+ {
+ pg_log_error("client %d aborted: failed to exit pipeline mode for rolling back the failed transaction",
+ st->id);
+ return 0;
+ }
+ return 1;
+}
+
+/*
+ * Get the transaction status at the end of a command, in particular to check
+ * whether we are in a (failed) transaction block.
+ */
+static TStatus
+getTransactionStatus(PGconn *con)
+{
+ PGTransactionStatusType tx_status;
+
+ tx_status = PQtransactionStatus(con);
+ switch (tx_status)
+ {
+ case PQTRANS_IDLE:
+ return TSTATUS_IDLE;
+ case PQTRANS_INTRANS:
+ case PQTRANS_INERROR:
+ return TSTATUS_IN_BLOCK;
+ case PQTRANS_UNKNOWN:
+ /* PQTRANS_UNKNOWN is expected given a broken connection */
+ if (PQstatus(con) == CONNECTION_BAD)
+ return TSTATUS_CONN_ERROR;
+ /* fall through */
+ case PQTRANS_ACTIVE:
+ default:
+ /*
+ * We cannot find out whether we are in a transaction block or not.
+ * Internal error which should never occur.
+ */
+ pg_log_error("unexpected transaction status %d", tx_status);
+ return TSTATUS_OTHER_ERROR;
+ }
+
+ /* not reached */
+ Assert(false);
+ return TSTATUS_OTHER_ERROR;
+}
+
+/*
+ * Print a verbose message about an error.
+ */
+static void
+printVerboseErrorMessages(CState *st, pg_time_usec_t *now, bool is_retry)
+{
+ static PQExpBuffer buf = NULL;
+
+ if (buf == NULL)
+ buf = createPQExpBuffer();
+ else
+ resetPQExpBuffer(buf);
+
+ printfPQExpBuffer(buf, "client %d ", st->id);
+ appendPQExpBuffer(buf, "%s",
+ (is_retry ?
+ "repeats the transaction after the error" :
+ "ends the failed transaction"));
+ appendPQExpBuffer(buf, " (try %u", st->tries);
+
+ /* Print max_tries if it is not unlimited. */
+ if (max_tries)
+ appendPQExpBuffer(buf, "/%u", max_tries);
+
+ /*
+ * If the latency limit is used, print a percentage of the current transaction
+ * latency from the latency limit.
+ */
+ if (latency_limit)
+ {
+ pg_time_now_lazy(now);
+ appendPQExpBuffer(buf, ", %.3f%% of the maximum time of tries was used",
+ (100.0 * (*now - st->txn_scheduled) / latency_limit));
+ }
+ appendPQExpBuffer(buf, ")\n");
+
+ pg_log_info("%s", buf->data);
+}
+
/*
* Advance the state machine of a connection.
*/
@@ -3167,6 +3575,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
st->use_file = chooseScript(thread);
Assert(conditional_stack_empty(st->cstack));
+ /* reset transaction variables to default values */
+ st->estatus = ESTATUS_NO_ERROR;
+ st->tries = 1;
+
pg_log_debug("client %d executing script \"%s\"",
st->id, sql_script[st->use_file].desc);
@@ -3207,6 +3619,13 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
memset(st->prepared, 0, sizeof(st->prepared));
}
+ /*
+ * This is the first try to run this transaction. Remember its random
+ * state: if it gets a serialization or deadlock error, we will need
+ * it to run the transaction again from the very beginning.
+ */
+ st->random_state = st->cs_func_rs;
+
/* record transaction start time */
st->txn_begin = now;
@@ -3363,6 +3782,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
* - else CSTATE_END_COMMAND
*/
st->state = executeMetaCommand(st, &now);
+ if (st->state == CSTATE_ABORTED)
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
}
/*
@@ -3508,6 +3929,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
+ else if (canRetryError(st->estatus))
+ st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
break;
@@ -3555,44 +3978,223 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
break;
/*
- * End of transaction (end of script, really).
+ * Clean up after an error.
*/
- case CSTATE_END_TX:
+ case CSTATE_ERROR:
+ {
+ TStatus tstatus;
- /* transaction finished: calculate latency and do log */
- processXactStats(thread, st, &now, false, agg);
+ Assert(st->estatus != ESTATUS_NO_ERROR);
- /*
- * missing \endif... cannot happen if CheckConditional was
- * okay
- */
- Assert(conditional_stack_empty(st->cstack));
+ /* Clear the conditional stack */
+ conditional_stack_reset(st->cstack);
- if (is_connect)
- {
- pg_time_usec_t start = now;
+ /* Read and discard until a sync point in pipeline mode */
+ if (PQpipelineStatus(st->con) != PQ_PIPELINE_OFF)
+ {
+ if (!discardUntilSync(st))
+ {
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ }
- pg_time_now_lazy(&start);
- finishCon(st);
- now = pg_time_now();
- thread->conn_duration += now - start;
+ /*
+ * Check whether we are in a (failed) transaction block, and roll it
+ * back if so.
+ */
+ tstatus = getTransactionStatus(st->con);
+ if (tstatus == TSTATUS_IN_BLOCK)
+ {
+ /* Try to rollback a (failed) transaction block. */
+ if (!PQsendQuery(st->con, "ROLLBACK"))
+ {
+ pg_log_error("client %d aborted: failed to send sql command for rolling back the failed transaction",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ }
+ else
+ st->state = CSTATE_WAIT_ROLLBACK_RESULT;
+ }
+ else if (tstatus == TSTATUS_IDLE)
+ {
+ /*
+ * If time is over, we're done;
+ * otherwise, check if we can retry the error.
+ */
+ st->state = timer_exceeded ? CSTATE_FINISHED :
+ doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
+ }
+ else
+ {
+ if (tstatus == TSTATUS_CONN_ERROR)
+ pg_log_error("perhaps the backend died while processing");
+
+ pg_log_error("client %d aborted while receiving the transaction status", st->id);
+ st->state = CSTATE_ABORTED;
+ }
+ break;
}
- if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
+ /*
+ * Wait for the rollback command to complete
+ */
+ case CSTATE_WAIT_ROLLBACK_RESULT:
{
- /* script completed */
- st->state = CSTATE_FINISHED;
+ PGresult *res;
+
+ pg_log_debug("client %d receiving", st->id);
+ if (!PQconsumeInput(st->con))
+ {
+ pg_log_error("client %d aborted while rolling back the transaction after an error; perhaps the backend died while processing",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ if (PQisBusy(st->con))
+ return; /* don't have the whole result yet */
+
+ /*
+ * Read and discard the query result.
+ */
+ res = PQgetResult(st->con);
+ switch (PQresultStatus(res))
+ {
+ case PGRES_COMMAND_OK:
+ /* OK */
+ PQclear(res);
+ /* null must be returned */
+ res = PQgetResult(st->con);
+ Assert(res == NULL);
+
+ /*
+ * If time is over, we're done;
+ * otherwise, check if we can retry the error.
+ */
+ st->state = timer_exceeded ? CSTATE_FINISHED :
+ doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
+ break;
+ default:
+ pg_log_error("client %d aborted while rolling back the transaction after an error; %s",
+ st->id, PQerrorMessage(st->con));
+ PQclear(res);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
break;
}
- /* next transaction (script) */
- st->state = CSTATE_CHOOSE_SCRIPT;
+ /*
+ * Retry the transaction after an error.
+ */
+ case CSTATE_RETRY:
+ command = sql_script[st->use_file].commands[st->command];
/*
- * Ensure that we always return on this point, so as to avoid
- * an infinite loop if the script only contains meta commands.
+ * Inform that the transaction will be retried after the error.
*/
- return;
+ if (verbose_errors)
+ printVerboseErrorMessages(st, &now, true);
+
+ /* Count tries and retries */
+ st->tries++;
+ command->retries++;
+
+ /*
+ * Reset the random state as they were at the beginning
+ * of the transaction.
+ */
+ st->cs_func_rs = st->random_state;
+
+ /* Process the first transaction command. */
+ st->command = 0;
+ st->estatus = ESTATUS_NO_ERROR;
+ st->state = CSTATE_START_COMMAND;
+ break;
+
+ /*
+ * Record a failed transaction.
+ */
+ case CSTATE_FAILURE:
+ command = sql_script[st->use_file].commands[st->command];
+
+ /* Accumulate the failure. */
+ command->failures++;
+
+ /*
+ * Inform that the failed transaction will not be retried.
+ */
+ if (verbose_errors)
+ printVerboseErrorMessages(st, &now, false);
+
+ /* End the failed transaction. */
+ st->state = CSTATE_END_TX;
+ break;
+
+ /*
+ * End of transaction (end of script, really).
+ */
+ case CSTATE_END_TX:
+ {
+ TStatus tstatus;
+
+ /* transaction finished: calculate latency and do log */
+ processXactStats(thread, st, &now, false, agg);
+
+ /*
+ * missing \endif... cannot happen if CheckConditional was
+ * okay
+ */
+ Assert(conditional_stack_empty(st->cstack));
+
+ /*
+ * We must complete all the transaction blocks that were
+ * started in this script.
+ */
+ tstatus = getTransactionStatus(st->con);
+ if (tstatus == TSTATUS_IN_BLOCK)
+ {
+ pg_log_error("client %d aborted: end of script reached without completing the last transaction",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ else if (tstatus != TSTATUS_IDLE)
+ {
+ if (tstatus == TSTATUS_CONN_ERROR)
+ pg_log_error("perhaps the backend died while processing");
+
+ pg_log_error("client %d aborted while receiving the transaction status", st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+
+ if (is_connect)
+ {
+ pg_time_usec_t start = now;
+
+ pg_time_now_lazy(&start);
+ finishCon(st);
+ now = pg_time_now();
+ thread->conn_duration += now - start;
+ }
+
+ if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
+ {
+ /* script completed */
+ st->state = CSTATE_FINISHED;
+ break;
+ }
+
+ /* next transaction (script) */
+ st->state = CSTATE_CHOOSE_SCRIPT;
+
+ /*
+ * Ensure that we always return on this point, so as to avoid
+ * an infinite loop if the script only contains meta commands.
+ */
+ return;
+ }
/*
* Final states. Close the connection if it's still open.
@@ -3816,6 +4418,43 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
return CSTATE_END_COMMAND;
}
+/*
+ * Return the number of failed transactions.
+ */
+static int64
+getFailures(const StatsData *stats)
+{
+ return (stats->serialization_failures +
+ stats->deadlock_failures);
+}
+
+/*
+ * Return a string constant representing the result of a transaction
+ * that is not successfully processed.
+ */
+static const char *
+getResultString(bool skipped, EStatus estatus)
+{
+ if (skipped)
+ return "skipped";
+ else if (failures_detailed)
+ {
+ switch (estatus)
+ {
+ case ESTATUS_SERIALIZATION_ERROR:
+ return "serialization";
+ case ESTATUS_DEADLOCK_ERROR:
+ return "deadlock";
+ default:
+ /* internal error which should never occur */
+ pg_log_fatal("unexpected error status: %d", estatus);
+ exit(1);
+ }
+ }
+ else
+ return "failed";
+}
+
/*
* Print log entry after completing one transaction.
*
@@ -3863,6 +4502,14 @@ doLog(TState *thread, CState *st,
agg->latency.sum2,
agg->latency.min,
agg->latency.max);
+
+ if (failures_detailed)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->serialization_failures,
+ agg->deadlock_failures);
+ else
+ fprintf(logfile, " " INT64_FORMAT, getFailures(agg));
+
if (throttle_delay)
{
fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -3873,6 +4520,10 @@ doLog(TState *thread, CState *st,
if (latency_limit)
fprintf(logfile, " " INT64_FORMAT, agg->skipped);
}
+ if (max_tries != 1)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->retried,
+ agg->retries);
fputc('\n', logfile);
/* reset data and move to next interval */
@@ -3880,22 +4531,26 @@ doLog(TState *thread, CState *st,
}
/* accumulate the current transaction */
- accumStats(agg, skipped, latency, lag);
+ accumStats(agg, skipped, latency, lag, st->estatus, st->tries);
}
else
{
/* no, print raw transactions */
- if (skipped)
- fprintf(logfile, "%d " INT64_FORMAT " skipped %d " INT64_FORMAT " "
- INT64_FORMAT,
- st->id, st->cnt, st->use_file, now / 1000000, now % 1000000);
- else
+ if (!skipped && st->estatus == ESTATUS_NO_ERROR)
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d " INT64_FORMAT " "
INT64_FORMAT,
st->id, st->cnt, latency, st->use_file,
now / 1000000, now % 1000000);
+ else
+ fprintf(logfile, "%d " INT64_FORMAT " %s %d " INT64_FORMAT " "
+ INT64_FORMAT,
+ st->id, st->cnt, getResultString(skipped, st->estatus),
+ st->use_file, now / 1000000, now % 1000000);
+
if (throttle_delay)
fprintf(logfile, " %.0f", lag);
+ if (max_tries != 1)
+ fprintf(logfile, " %u", st->tries - 1);
fputc('\n', logfile);
}
}
@@ -3904,7 +4559,8 @@ doLog(TState *thread, CState *st,
* Accumulate and report statistics at end of a transaction.
*
* (This is also called when a transaction is late and thus skipped.
- * Note that even skipped transactions are counted in the "cnt" fields.)
+ * Note that even skipped and failed transactions are counted in the CState
+ * "cnt" field.)
*/
static void
processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
@@ -3912,10 +4568,10 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
{
double latency = 0.0,
lag = 0.0;
- bool thread_details = progress || throttle_delay || latency_limit,
- detailed = thread_details || use_log || per_script_stats;
+ bool detailed = progress || throttle_delay || latency_limit ||
+ use_log || per_script_stats;
- if (detailed && !skipped)
+ if (detailed && !skipped && st->estatus == ESTATUS_NO_ERROR)
{
pg_time_now_lazy(now);
@@ -3924,20 +4580,12 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
lag = st->txn_begin - st->txn_scheduled;
}
- if (thread_details)
- {
- /* keep detailed thread stats */
- accumStats(&thread->stats, skipped, latency, lag);
+ /* keep detailed thread stats */
+ accumStats(&thread->stats, skipped, latency, lag, st->estatus, st->tries);
- /* count transactions over the latency limit, if needed */
- if (latency_limit && latency > latency_limit)
- thread->latency_late++;
- }
- else
- {
- /* no detailed stats, just count */
- thread->stats.cnt++;
- }
+ /* count transactions over the latency limit, if needed */
+ if (latency_limit && latency > latency_limit)
+ thread->latency_late++;
/* client stat is just counting */
st->cnt++;
@@ -3947,7 +4595,8 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
/* XXX could use a mutex here, but we choose not to */
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+ st->estatus, st->tries);
}
@@ -4806,6 +5455,8 @@ create_sql_command(PQExpBuffer buf, const char *source)
my_command->type = SQL_COMMAND;
my_command->meta = META_NONE;
my_command->argc = 0;
+ my_command->retries = 0;
+ my_command->failures = 0;
memset(my_command->argv, 0, sizeof(my_command->argv));
my_command->varprefix = NULL; /* allocated later, if needed */
my_command->expr = NULL;
@@ -5474,7 +6125,9 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
{
/* generate and show report */
pg_time_usec_t run = now - *last_report;
- int64 ntx;
+ int64 cnt,
+ failures,
+ retried;
double tps,
total_run,
latency,
@@ -5501,23 +6154,30 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
mergeSimpleStats(&cur.lag, &threads[i].stats.lag);
cur.cnt += threads[i].stats.cnt;
cur.skipped += threads[i].stats.skipped;
+ cur.retries += threads[i].stats.retries;
+ cur.retried += threads[i].stats.retried;
+ cur.serialization_failures +=
+ threads[i].stats.serialization_failures;
+ cur.deadlock_failures += threads[i].stats.deadlock_failures;
}
/* we count only actually executed transactions */
- ntx = (cur.cnt - cur.skipped) - (last->cnt - last->skipped);
+ cnt = cur.cnt - last->cnt;
total_run = (now - test_start) / 1000000.0;
- tps = 1000000.0 * ntx / run;
- if (ntx > 0)
+ tps = 1000000.0 * cnt / run;
+ if (cnt > 0)
{
- latency = 0.001 * (cur.latency.sum - last->latency.sum) / ntx;
- sqlat = 1.0 * (cur.latency.sum2 - last->latency.sum2) / ntx;
+ latency = 0.001 * (cur.latency.sum - last->latency.sum) / cnt;
+ sqlat = 1.0 * (cur.latency.sum2 - last->latency.sum2) / cnt;
stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
- lag = 0.001 * (cur.lag.sum - last->lag.sum) / ntx;
+ lag = 0.001 * (cur.lag.sum - last->lag.sum) / cnt;
}
else
{
latency = sqlat = stdev = lag = 0;
}
+ failures = getFailures(&cur) - getFailures(last);
+ retried = cur.retried - last->retried;
if (progress_timestamp)
{
@@ -5531,8 +6191,8 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
}
fprintf(stderr,
- "progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
- tbuf, tps, latency, stdev);
+ "progress: %s, %.1f tps, lat %.3f ms stddev %.3f, " INT64_FORMAT " failed",
+ tbuf, tps, latency, stdev, failures);
if (throttle_delay)
{
@@ -5541,6 +6201,12 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
fprintf(stderr, ", " INT64_FORMAT " skipped",
cur.skipped - last->skipped);
}
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (max_tries != 1)
+ fprintf(stderr,
+ ", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
+ retried, cur.retries - last->retries);
fprintf(stderr, "\n");
*last = cur;
@@ -5600,9 +6266,10 @@ printResults(StatsData *total,
int64 latency_late)
{
/* tps is about actually executed transactions during benchmarking */
- int64 ntx = total->cnt - total->skipped;
+ int64 failures = getFailures(total);
+ int64 total_cnt = total->cnt + total->skipped + failures;
double bench_duration = PG_TIME_GET_DOUBLE(total_duration);
- double tps = ntx / bench_duration;
+ double tps = total->cnt / bench_duration;
/* Report test parameters. */
printf("transaction type: %s\n",
@@ -5615,39 +6282,65 @@ printResults(StatsData *total,
printf("query mode: %s\n", QUERYMODE[querymode]);
printf("number of clients: %d\n", nclients);
printf("number of threads: %d\n", nthreads);
+
+ if (max_tries)
+ printf("maximum number of tries: %u\n", max_tries);
+
if (duration <= 0)
{
printf("number of transactions per client: %d\n", nxacts);
printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
- ntx, nxacts * nclients);
+ total->cnt, nxacts * nclients);
}
else
{
printf("duration: %d s\n", duration);
printf("number of transactions actually processed: " INT64_FORMAT "\n",
- ntx);
+ total->cnt);
+ }
+
+ printf("number of failed transactions: " INT64_FORMAT " (%.3f%%)\n",
+ failures, 100.0 * failures / total_cnt);
+
+ if (failures_detailed)
+ {
+ printf("number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->serialization_failures,
+ 100.0 * total->serialization_failures / total_cnt);
+ printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->deadlock_failures,
+ 100.0 * total->deadlock_failures / total_cnt);
+ }
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (max_tries != 1)
+ {
+ printf("number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
+ total->retried, 100.0 * total->retried / total_cnt);
+ printf("total number of retries: " INT64_FORMAT "\n", total->retries);
}
/* Remaining stats are nonsensical if we failed to execute any xacts */
- if (total->cnt <= 0)
+ if (total->cnt + total->skipped <= 0)
return;
if (throttle_delay && latency_limit)
printf("number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
- total->skipped, 100.0 * total->skipped / total->cnt);
+ total->skipped, 100.0 * total->skipped / total_cnt);
if (latency_limit)
printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "/" INT64_FORMAT " (%.3f%%)\n",
- latency_limit / 1000.0, latency_late, ntx,
- (ntx > 0) ? 100.0 * latency_late / ntx : 0.0);
+ latency_limit / 1000.0, latency_late, total->cnt,
+ (total->cnt > 0) ? 100.0 * latency_late / total->cnt : 0.0);
if (throttle_delay || progress || latency_limit)
printSimpleStats("latency", &total->latency);
else
{
/* no measurement, show average latency computed from run time */
- printf("latency average = %.3f ms\n",
- 0.001 * total_duration * nclients / total->cnt);
+ printf("latency average = %.3f ms%s\n",
+ 0.001 * total_duration * nclients / total_cnt,
+ failures > 0 ? " (including failures)" : "");
}
if (throttle_delay)
@@ -5673,7 +6366,7 @@ printResults(StatsData *total,
*/
if (is_connect)
{
- printf("average connection time = %.3f ms\n", 0.001 * conn_total_duration / total->cnt);
+ printf("average connection time = %.3f ms\n", 0.001 * conn_total_duration / (total->cnt + failures));
printf("tps = %f (including reconnection times)\n", tps);
}
else
@@ -5692,6 +6385,9 @@ printResults(StatsData *total,
if (per_script_stats)
{
StatsData *sstats = &sql_script[i].stats;
+ int64 script_failures = getFailures(sstats);
+ int64 script_total_cnt =
+ sstats->cnt + sstats->skipped + script_failures;
printf("SQL script %d: %s\n"
" - weight: %d (targets %.1f%% of total)\n"
@@ -5701,25 +6397,55 @@ printResults(StatsData *total,
100.0 * sql_script[i].weight / total_weight,
sstats->cnt,
100.0 * sstats->cnt / total->cnt,
- (sstats->cnt - sstats->skipped) / bench_duration);
+ sstats->cnt / bench_duration);
+
+ printf(" - number of failed transactions: " INT64_FORMAT " (%.3f%%)\n",
+ script_failures,
+ 100.0 * script_failures / script_total_cnt);
+
+ if (failures_detailed)
+ {
+ printf(" - number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->serialization_failures,
+ (100.0 * sstats->serialization_failures /
+ script_total_cnt));
+ printf(" - number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->deadlock_failures,
+ (100.0 * sstats->deadlock_failures /
+ script_total_cnt));
+ }
- if (throttle_delay && latency_limit && sstats->cnt > 0)
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (max_tries != 1)
+ {
+ printf(" - number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->retried,
+ 100.0 * sstats->retried / script_total_cnt);
+ printf(" - total number of retries: " INT64_FORMAT "\n",
+ sstats->retries);
+ }
+
+ if (throttle_delay && latency_limit && script_total_cnt > 0)
printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
sstats->skipped,
- 100.0 * sstats->skipped / sstats->cnt);
+ 100.0 * sstats->skipped / script_total_cnt);
printSimpleStats(" - latency", &sstats->latency);
}
- /* Report per-command latencies */
+ /*
+ * Report per-command statistics: latencies, retries after errors,
+ * failures (errors without retrying).
+ */
if (report_per_command)
{
Command **commands;
- if (per_script_stats)
- printf(" - statement latencies in milliseconds:\n");
- else
- printf("statement latencies in milliseconds:\n");
+ printf("%sstatement latencies in milliseconds%s:\n",
+ per_script_stats ? " - " : "",
+ (max_tries == 1 ?
+ " and failures" :
+ ", failures and retries"));
for (commands = sql_script[i].commands;
*commands != NULL;
@@ -5727,10 +6453,19 @@ printResults(StatsData *total,
{
SimpleStats *cstats = &(*commands)->stats;
- printf(" %11.3f %s\n",
- (cstats->count > 0) ?
- 1000.0 * cstats->sum / cstats->count : 0.0,
- (*commands)->first_line);
+ if (max_tries == 1)
+ printf(" %11.3f %10" INT64_MODIFIER "d %s\n",
+ (cstats->count > 0) ?
+ 1000.0 * cstats->sum / cstats->count : 0.0,
+ (*commands)->failures,
+ (*commands)->first_line);
+ else
+ printf(" %11.3f %10" INT64_MODIFIER "d %10" INT64_MODIFIER "d %s\n",
+ (cstats->count > 0) ?
+ 1000.0 * cstats->sum / cstats->count : 0.0,
+ (*commands)->failures,
+ (*commands)->retries,
+ (*commands)->first_line);
}
}
}
@@ -5810,7 +6545,7 @@ main(int argc, char **argv)
{"progress", required_argument, NULL, 'P'},
{"protocol", required_argument, NULL, 'M'},
{"quiet", no_argument, NULL, 'q'},
- {"report-latencies", no_argument, NULL, 'r'},
+ {"report-per-command", no_argument, NULL, 'r'},
{"rate", required_argument, NULL, 'R'},
{"scale", required_argument, NULL, 's'},
{"select-only", no_argument, NULL, 'S'},
@@ -5832,6 +6567,9 @@ main(int argc, char **argv)
{"show-script", required_argument, NULL, 10},
{"partitions", required_argument, NULL, 11},
{"partition-method", required_argument, NULL, 12},
+ {"failures-detailed", no_argument, NULL, 13},
+ {"max-tries", required_argument, NULL, 14},
+ {"verbose-errors", no_argument, NULL, 15},
{NULL, 0, NULL, 0}
};
@@ -6185,6 +6923,28 @@ main(int argc, char **argv)
exit(1);
}
break;
+ case 13: /* failures-detailed */
+ benchmarking_option_set = true;
+ failures_detailed = true;
+ break;
+ case 14: /* max-tries */
+ {
+ int32 max_tries_arg = atoi(optarg);
+
+ if (max_tries_arg < 0)
+ {
+ pg_log_fatal("invalid number of maximum tries: \"%s\"", optarg);
+ exit(1);
+ }
+
+ benchmarking_option_set = true;
+ max_tries = (uint32) max_tries_arg;
+ }
+ break;
+ case 15: /* verbose-errors */
+ benchmarking_option_set = true;
+ verbose_errors = true;
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
@@ -6366,6 +7126,15 @@ main(int argc, char **argv)
exit(1);
}
+ if (!max_tries)
+ {
+ if (!latency_limit && duration <= 0)
+ {
+ pg_log_fatal("an unlimited number of transaction tries can only be used with --latency-limit or a duration (-T)");
+ exit(1);
+ }
+ }
+
/*
* save main process id in the global variable because process id will be
* changed after fork.
@@ -6578,6 +7347,10 @@ main(int argc, char **argv)
mergeSimpleStats(&stats.lag, &thread->stats.lag);
stats.cnt += thread->stats.cnt;
stats.skipped += thread->stats.skipped;
+ stats.retries += thread->stats.retries;
+ stats.retried += thread->stats.retried;
+ stats.serialization_failures += thread->stats.serialization_failures;
+ stats.deadlock_failures += thread->stats.deadlock_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
@@ -6724,7 +7497,8 @@ threadRun(void *arg)
if (min_usec > this_usec)
min_usec = this_usec;
}
- else if (st->state == CSTATE_WAIT_RESULT)
+ else if (st->state == CSTATE_WAIT_RESULT ||
+ st->state == CSTATE_WAIT_ROLLBACK_RESULT)
{
/*
* waiting for result from server - nothing to do unless the
@@ -6813,7 +7587,8 @@ threadRun(void *arg)
{
CState *st = &state[i];
- if (st->state == CSTATE_WAIT_RESULT)
+ if (st->state == CSTATE_WAIT_RESULT ||
+ st->state == CSTATE_WAIT_ROLLBACK_RESULT)
{
/* don't call advanceConnectionState unless data is available */
int sock = PQsocket(st->con);
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index f1341092fe..d173ceae7a 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -11,7 +11,9 @@ use Config;
# start a pgbench specific server
my $node = PostgreSQL::Test::Cluster->new('main');
-$node->init;
+# Set to untranslated messages, to be able to compare program output with
+# expected strings.
+$node->init(extra => [ '--locale', 'C' ]);
$node->start;
# tablespace for testing, because partitioned tables cannot use pg_default
@@ -109,7 +111,8 @@ $node->pgbench(
qr{builtin: TPC-B},
qr{clients: 2\b},
qr{processed: 10/10},
- qr{mode: simple}
+ qr{mode: simple},
+ qr{maximum number of tries: 1}
],
[qr{^$}],
'pgbench tpcb-like');
@@ -1198,6 +1201,214 @@ $node->pgbench(
check_pgbench_logs($bdir, '001_pgbench_log_3', 1, 10, 10,
qr{^0 \d{1,2} \d+ \d \d+ \d+$});
+# abortion of the client if the script contains an incomplete transaction block
+$node->pgbench(
+ '--no-vacuum', 2, [ qr{processed: 1/10} ],
+ [ qr{client 0 aborted: end of script reached without completing the last transaction} ],
+ 'incomplete transaction block',
+ { '001_pgbench_incomplete_transaction_block' => q{BEGIN;SELECT 1;} });
+
+# Test the concurrent update in the table row and deadlocks.
+
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE first_client_table (value integer); '
+ . 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2);');
+
+# Serialization error and retry
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read";
+
+# Check that we have a serialization error and the same random value of the
+# delta variable in the next try
+my $err_pattern =
+ "client (0|1) got an error in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update\\b.*"
+ . "\\g1";
+
+$node->pgbench(
+ "-n -c 2 -t 1 -d --verbose-errors --max-tries 2",
+ 0,
+ [ qr{processed: 2/2\b}, qr{number of transactions retried: 1\b},
+ qr{total number of retries: 1\b} ],
+ [ qr/$err_pattern/s ],
+ 'concurrent update with retrying',
+ {
+ '001_pgbench_serialization' => q{
+-- What's happening:
+-- The first client starts the transaction with the isolation level Repeatable
+-- Read:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+--
+-- The second client starts a similar transaction with the same isolation level:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+-- <waiting for the first client>
+--
+-- The first client commits its transaction, and the second client gets a
+-- serialization error.
+
+\set delta random(-5000, 5000)
+
+-- The second client will stop here
+SELECT pg_advisory_lock(0);
+
+-- Start transaction with concurrent update
+BEGIN;
+UPDATE xy SET y = y + :delta WHERE x = 1 AND pg_advisory_lock(1) IS NOT NULL;
+
+-- Wait for the second client
+DO $$
+DECLARE
+ exists boolean;
+ waiters integer;
+BEGIN
+ -- The second client always comes in second, and the number of rows in the
+ -- table first_client_table reflect this. Here the first client inserts a row,
+ -- so the second client will see a non-empty table when repeating the
+ -- transaction after the serialization error.
+ SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+ IF NOT exists THEN
+ -- Let the second client begin
+ PERFORM pg_advisory_unlock(0);
+ -- And wait until the second client tries to get the same lock
+ LOOP
+ SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+ locktype = 'advisory' AND objsubid = 1 AND
+ ((classid::bigint << 32) | objid::bigint = 1::bigint) AND NOT granted;
+ IF waiters = 1 THEN
+ INSERT INTO first_client_table VALUES (1);
+
+ -- Exit loop
+ EXIT;
+ END IF;
+ END LOOP;
+ END IF;
+END$$;
+
+COMMIT;
+SELECT pg_advisory_unlock_all();
+}
+ });
+
+# Clean up
+
+$node->safe_psql('postgres', 'DELETE FROM first_client_table;');
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=read\\ committed";
+
+# Deadlock error and retry
+
+# Check that we have a deadlock error
+$err_pattern =
+ "client (0|1) got an error in command (3|5) \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\\b";
+
+$node->pgbench(
+ "-n -c 2 -t 1 --max-tries 2 --verbose-errors",
+ 0,
+ [ qr{processed: 2/2\b}, qr{number of transactions retried: 1\b},
+ qr{total number of retries: 1\b} ],
+ [ qr{$err_pattern} ],
+ 'deadlock with retrying',
+ {
+ '001_pgbench_deadlock' => q{
+-- What's happening:
+-- The first client gets the lock 2.
+-- The second client gets the lock 3 and tries to get the lock 2.
+-- The first client tries to get the lock 3 and one of them gets a deadlock
+-- error.
+--
+-- A client that does not get a deadlock error must hold a lock at the
+-- transaction start. Thus in the end it releases all of its locks before the
+-- client with the deadlock error starts a retry (we do not want any errors
+-- again).
+
+-- Since the client with the deadlock error has not released the blocking locks,
+-- let's do this here.
+SELECT pg_advisory_unlock_all();
+
+-- The second client and the client with the deadlock error stop here
+SELECT pg_advisory_lock(0);
+SELECT pg_advisory_lock(1);
+
+-- The second client and the client with the deadlock error always come after
+-- the first and the number of rows in the table first_client_table reflects
+-- this. Here the first client inserts a row, so in the future the table is
+-- always non-empty.
+DO $$
+DECLARE
+ exists boolean;
+BEGIN
+ SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+ IF exists THEN
+ -- We are the second client or the client with the deadlock error
+
+ -- The first client will take care by itself of this lock (see below)
+ PERFORM pg_advisory_unlock(0);
+
+ PERFORM pg_advisory_lock(3);
+
+ -- The second client can get a deadlock here
+ PERFORM pg_advisory_lock(2);
+ ELSE
+ -- We are the first client
+
+ -- This code should not be used in a new transaction after an error
+ INSERT INTO first_client_table VALUES (1);
+
+ PERFORM pg_advisory_lock(2);
+ END IF;
+END$$;
+
+DO $$
+DECLARE
+ num_rows integer;
+ waiters integer;
+BEGIN
+ -- Check if we are the first client
+ SELECT COUNT(*) FROM first_client_table INTO STRICT num_rows;
+ IF num_rows = 1 THEN
+ -- This code should not be used in a new transaction after an error
+ INSERT INTO first_client_table VALUES (2);
+
+ -- Let the second client begin
+ PERFORM pg_advisory_unlock(0);
+ PERFORM pg_advisory_unlock(1);
+
+ -- Make sure the second client is ready for deadlock
+ LOOP
+ SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+ locktype = 'advisory' AND
+ objsubid = 1 AND
+ ((classid::bigint << 32) | objid::bigint = 2::bigint) AND
+ NOT granted;
+
+ IF waiters = 1 THEN
+ -- Exit loop
+ EXIT;
+ END IF;
+ END LOOP;
+
+ PERFORM pg_advisory_lock(0);
+ -- And the second client took care by itself of the lock 1
+ END IF;
+END$$;
+
+-- The first client can get a deadlock here
+SELECT pg_advisory_lock(3);
+
+SELECT pg_advisory_unlock_all();
+}
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE first_client_table, xy;');
+
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index acad19edd0..a5074c70d9 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -188,6 +188,16 @@ my @options = (
'-i --partition-method=hash',
[qr{partition-method requires greater than zero --partitions}]
],
+ [
+ 'bad maximum number of tries',
+ '--max-tries -10',
+ [qr{invalid number of maximum tries: "-10"}]
+ ],
+ [
+ 'an infinite number of tries',
+ '--max-tries 0',
+ [qr{an unlimited number of transaction tries can only be used with --latency-limit or a duration}]
+ ],
# logging sub-options
[
diff --git a/src/fe_utils/conditional.c b/src/fe_utils/conditional.c
index 0bf877e895..5a94664989 100644
--- a/src/fe_utils/conditional.c
+++ b/src/fe_utils/conditional.c
@@ -24,13 +24,25 @@ conditional_stack_create(void)
}
/*
- * destroy stack
+ * Destroy all the elements from the stack. The stack itself is not freed.
*/
void
-conditional_stack_destroy(ConditionalStack cstack)
+conditional_stack_reset(ConditionalStack cstack)
{
+ if (!cstack)
+ return; /* nothing to do here */
+
while (conditional_stack_pop(cstack))
continue;
+}
+
+/*
+ * destroy stack
+ */
+void
+conditional_stack_destroy(ConditionalStack cstack)
+{
+ conditional_stack_reset(cstack);
free(cstack);
}
diff --git a/src/include/fe_utils/conditional.h b/src/include/fe_utils/conditional.h
index b28189471c..fa53d86501 100644
--- a/src/include/fe_utils/conditional.h
+++ b/src/include/fe_utils/conditional.h
@@ -73,6 +73,8 @@ typedef struct ConditionalStackData *ConditionalStack;
extern ConditionalStack conditional_stack_create(void);
+extern void conditional_stack_reset(ConditionalStack cstack);
+
extern void conditional_stack_destroy(ConditionalStack cstack);
extern int conditional_stack_depth(ConditionalStack cstack);
--
2.17.1
Hi Yugo,
I have looked into the patch and I noticed that <xref
linkend=... endterm=...> is used in pgbench.sgml. e.g.
<xref linkend="failures-and-retries" endterm="failures-and-retries-title"/>
AFAIK this is the only place where "endterm" is used. In other places
the "link" tag is used instead:
<link linkend="failures-and-retries">Failures and Serialization/Deadlock Retries</link>
Note that the rendered result is identical. Do we want to use the link tag as well?
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Hi Yugo,
I tested with serialization error scenario by setting:
default_transaction_isolation = 'repeatable read'
The result was:
$ pgbench -t 10 -c 10 --max-tries=10 test
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 1
maximum number of tries: 10
number of transactions per client: 10
number of transactions actually processed: 100/100
number of failed transactions: 0 (0.000%)
number of transactions retried: 35 (35.000%)
total number of retries: 74
latency average = 5.306 ms
initial connection time = 15.575 ms
tps = 1884.516810 (without initial connection time)
I had a hard time understanding what those numbers mean:
number of transactions retried: 35 (35.000%)
total number of retries: 74
It seems "total number of retries" matches the number of ERRORs
reported by PostgreSQL. Good. What I am not sure about is "number of
transactions retried". What does this mean?
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Hi Yugo,
I tested with serialization error scenario by setting:
default_transaction_isolation = 'repeatable read'
The result was:
$ pgbench -t 10 -c 10 --max-tries=10 test
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 1
maximum number of tries: 10
number of transactions per client: 10
number of transactions actually processed: 100/100
number of failed transactions: 0 (0.000%)
number of transactions retried: 35 (35.000%)
total number of retries: 74
latency average = 5.306 ms
initial connection time = 15.575 ms
tps = 1884.516810 (without initial connection time)
I had a hard time understanding what those numbers mean:
number of transactions retried: 35 (35.000%)
total number of retries: 74
It seems "total number of retries" matches the number of ERRORs
reported by PostgreSQL. Good. What I am not sure about is "number of
transactions retried". What does this mean?
Oh, ok. I see it now. It turns out that "number of transactions
retried" does not actually mean the number of transactions
retried. Suppose pgbench executes the following in a session:
BEGIN; -- transaction A starts
:
(ERROR)
ROLLBACK; -- transaction A aborts
(retry)
BEGIN; -- transaction B starts
:
(ERROR)
ROLLBACK; -- transaction B aborts
(retry)
BEGIN; -- transaction C starts
:
END; -- finally succeeds
In this case "total number of retries" = 2 and "number of
transactions retried" = 1. In this patch, transactions A, B and C are
regarded as the "same" transaction, so the retried transaction count
becomes 1. But it's confusing to use the word "transaction" here
because A, B and C are different transactions. I think it would be
better to use a different word instead of "transaction", something
like "cycle", i.e.:
number of cycles retried: 35 (35.000%)
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Hi Ishii-san,
On Sun, 20 Mar 2022 09:52:06 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
Hi Yugo,
I have looked into the patch and I noticed that <xref
linkend=... endterm=...> is used in pgbench.sgml. e.g.
<xref linkend="failures-and-retries" endterm="failures-and-retries-title"/>
AFAIK this is the only place where "endterm" is used. In other places
"link" tag is used instead:
Thank you for pointing it out.
I've checked other places using <xref/> referring to <refsect2>, and found
that "xreflabel"s are used in such <refsect2> tags. So, I'll fix it
in this style.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
On Sun, 20 Mar 2022 16:11:43 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
Hi Yugo,
I tested with serialization error scenario by setting:
default_transaction_isolation = 'repeatable read'
The result was:
$ pgbench -t 10 -c 10 --max-tries=10 test
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 1
maximum number of tries: 10
number of transactions per client: 10
number of transactions actually processed: 100/100
number of failed transactions: 0 (0.000%)
number of transactions retried: 35 (35.000%)
total number of retries: 74
latency average = 5.306 ms
initial connection time = 15.575 ms
tps = 1884.516810 (without initial connection time)
I had a hard time understanding what those numbers mean:
number of transactions retried: 35 (35.000%)
total number of retries: 74
It seems "total number of retries" matches the number of ERRORs
reported by PostgreSQL. Good. What I am not sure about is "number of
transactions retried". What does this mean?
Oh, ok. I see it now. It turns out that "number of transactions
retried" does not actually mean the number of transactions
retried. Suppose pgbench executes the following in a session:
BEGIN; -- transaction A starts
:
(ERROR)
ROLLBACK; -- transaction A aborts
(retry)
BEGIN; -- transaction B starts
:
(ERROR)
ROLLBACK; -- transaction B aborts
(retry)
BEGIN; -- transaction C starts
:
END; -- finally succeeds
In this case "total number of retries" = 2 and "number of
transactions retried" = 1. In this patch, transactions A, B and C are
regarded as the "same" transaction, so the retried transaction count
becomes 1. But it's confusing to use the word "transaction" here
because A, B and C are different transactions. I think it would be
better to use a different word instead of "transaction", something
like "cycle", i.e.:
number of cycles retried: 35 (35.000%)
In the original patch by Marina Polyakova it was "number of retried",
but I changed it to "number of transactions retried" because I felt
it was confusing with "number of retries". I chose the word "transaction"
because a transaction ends in one of a successful commit, a skip, or a
failure, after possible retries.
Well, I agree that the wording is somewhat confusing. If we can find
a nice word to resolve the confusion, I don't mind changing it.
Maybe we could use "executions" as well as "cycles". However, I am not
sure that such a word would improve the situation, because what it
exactly means would still be unclear to users.
Another idea is to instead report only "the number of successfully
retried transactions", which does not include "failed transactions",
that is, transactions that failed after retries, like this:
number of transactions actually processed: 100/100
number of failed transactions: 0 (0.000%)
number of successfully retried transactions: 35 (35.000%)
total number of retries: 74
The meaning is clear and there seems to be no confusion.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
On Sun, 20 Mar 2022 16:11:43 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
Hi Yugo,
I tested with serialization error scenario by setting:
default_transaction_isolation = 'repeatable read'
The result was:
$ pgbench -t 10 -c 10 --max-tries=10 test
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 1
maximum number of tries: 10
number of transactions per client: 10
number of transactions actually processed: 100/100
number of failed transactions: 0 (0.000%)
number of transactions retried: 35 (35.000%)
total number of retries: 74
latency average = 5.306 ms
initial connection time = 15.575 ms
tps = 1884.516810 (without initial connection time)
I had a hard time understanding what those numbers mean:
number of transactions retried: 35 (35.000%)
total number of retries: 74
It seems "total number of retries" matches the number of ERRORs
reported by PostgreSQL. Good. What I am not sure about is "number of
transactions retried". What does this mean?
Oh, ok. I see it now. It turns out that "number of transactions
retried" does not actually mean the number of transactions
retried. Suppose pgbench executes the following in a session:
BEGIN; -- transaction A starts
:
(ERROR)
ROLLBACK; -- transaction A aborts
(retry)
BEGIN; -- transaction B starts
:
(ERROR)
ROLLBACK; -- transaction B aborts
(retry)
BEGIN; -- transaction C starts
:
END; -- finally succeeds
In this case "total number of retries" = 2 and "number of
transactions retried" = 1. In this patch, transactions A, B and C are
regarded as the "same" transaction, so the retried transaction count
becomes 1. But it's confusing to use the word "transaction" here
because A, B and C are different transactions. I think it would be
better to use a different word instead of "transaction", something
like "cycle", i.e.:
number of cycles retried: 35 (35.000%)
I realized that the same argument can be applied even to "number of
transactions actually processed" because, with the retry feature, a
"transaction" could comprise multiple transactions.
But if we go forward and replace those "transactions" with "cycles"
(or whatever) altogether, it could probably bring enough confusion to
users who have been using pgbench. Probably we should give up changing
the language and instead redefine "transaction" when the retry feature
is enabled, like: "when the retry feature is enabled, each transaction
can consist of multiple retried transactions."
In the original patch by Marina Polyakova it was "number of retried",
but I changed it to "number of transactions retried" because I felt
it was confusing with "number of retries". I chose the word "transaction"
because a transaction ends in one of a successful commit, a skip, or a
failure, after possible retries.
Ok.
Well, I agree that the wording is somewhat confusing. If we can find
a nice word to resolve the confusion, I don't mind changing it.
Maybe we could use "executions" as well as "cycles". However, I am not
sure that such a word would improve the situation, because what it
exactly means would still be unclear to users.
Another idea is to instead report only "the number of successfully
retried transactions", which does not include "failed transactions",
that is, transactions that failed after retries, like this:
number of transactions actually processed: 100/100
number of failed transactions: 0 (0.000%)
number of successfully retried transactions: 35 (35.000%)
total number of retries: 74
The meaning is clear and there seems to be no confusion.
Thank you for the suggestion. But I think it would be better to leave
it as it is, for the reason I mentioned above.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
On Tue, 22 Mar 2022 09:08:15 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:
Hi Ishii-san,
On Sun, 20 Mar 2022 09:52:06 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
Hi Yugo,
I have looked into the patch and I noticed that <xref
linkend=... endterm=...> is used in pgbench.sgml. e.g.
<xref linkend="failures-and-retries" endterm="failures-and-retries-title"/>
AFAIK this is the only place where "endterm" is used. In other places
the "link" tag is used instead:
Thank you for pointing it out.
I've checked other places using <xref/> referring to <refsect2>, and found
that "xreflabel"s are used in such <refsect2> tags. So, I'll fix it
in this style.
I attached the updated patch. I also fixed the following paragraph which I had
forgotten to fix in the previous patch.
The first seven lines report some of the most important parameter settings.
The sixth line reports the maximum number of tries for transactions with
serialization or deadlock errors
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v18-0001-Pgbench-errors-use-the-Variables-structure-for-c.patch
From b9f993e81836c4379478809da0e690023c319038 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Wed, 26 May 2021 16:58:36 +0900
Subject: [PATCH v18 1/2] Pgbench errors: use the Variables structure for
client variables
This is most important when it is used to reset client variables during the
repeating of transactions after serialization/deadlock failures.
Don't allocate Variable structs one by one. Instead, add a constant margin each
time it overflows.
---
src/bin/pgbench/pgbench.c | 163 +++++++++++++++++++++++---------------
1 file changed, 100 insertions(+), 63 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 000ffc4a5c..ab2c5dfc5f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -289,6 +289,12 @@ const char *progname;
volatile bool timer_exceeded = false; /* flag from signal handler */
+/*
+ * We don't want to allocate variables one by one; for efficiency, add a
+ * constant margin each time it overflows.
+ */
+#define VARIABLES_ALLOC_MARGIN 8
+
/*
* Variable definitions.
*
@@ -306,6 +312,24 @@ typedef struct
PgBenchValue value; /* actual variable's value */
} Variable;
+/*
+ * Data structure for client variables.
+ */
+typedef struct
+{
+ Variable *vars; /* array of variable definitions */
+ int nvars; /* number of variables */
+
+ /*
+ * The maximum number of variables that we can currently store in 'vars'
+ * without having to reallocate more space. We must always have max_vars >=
+ * nvars.
+ */
+ int max_vars;
+
+ bool vars_sorted; /* are variables sorted by name? */
+} Variables;
+
#define MAX_SCRIPTS 128 /* max number of SQL scripts allowed */
#define SHELL_COMMAND_SIZE 256 /* maximum size allowed for shell command */
@@ -460,9 +484,7 @@ typedef struct
int command; /* command number in script */
/* client variables */
- Variable *variables; /* array of variable definitions */
- int nvariables; /* number of variables */
- bool vars_sorted; /* are variables sorted by name? */
+ Variables variables;
/* various times about current transaction in microseconds */
pg_time_usec_t txn_scheduled; /* scheduled start time of transaction */
@@ -1398,39 +1420,39 @@ compareVariableNames(const void *v1, const void *v2)
/* Locate a variable by name; returns NULL if unknown */
static Variable *
-lookupVariable(CState *st, char *name)
+lookupVariable(Variables *variables, char *name)
{
Variable key;
/* On some versions of Solaris, bsearch of zero items dumps core */
- if (st->nvariables <= 0)
+ if (variables->nvars <= 0)
return NULL;
/* Sort if we have to */
- if (!st->vars_sorted)
+ if (!variables->vars_sorted)
{
- qsort((void *) st->variables, st->nvariables, sizeof(Variable),
+ qsort((void *) variables->vars, variables->nvars, sizeof(Variable),
compareVariableNames);
- st->vars_sorted = true;
+ variables->vars_sorted = true;
}
/* Now we can search */
key.name = name;
return (Variable *) bsearch((void *) &key,
- (void *) st->variables,
- st->nvariables,
+ (void *) variables->vars,
+ variables->nvars,
sizeof(Variable),
compareVariableNames);
}
/* Get the value of a variable, in string form; returns NULL if unknown */
static char *
-getVariable(CState *st, char *name)
+getVariable(Variables *variables, char *name)
{
Variable *var;
char stringform[64];
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
return NULL; /* not found */
@@ -1562,21 +1584,37 @@ valid_variable_name(const char *name)
return true;
}
+/*
+ * Make sure there is enough space for 'needed' more variables in the variables
+ * array.
+ */
+static void
+enlargeVariables(Variables *variables, int needed)
+{
+ /* total number of variables required now */
+ needed += variables->nvars;
+
+ if (variables->max_vars < needed)
+ {
+ variables->max_vars = needed + VARIABLES_ALLOC_MARGIN;
+ variables->vars = (Variable *)
+ pg_realloc(variables->vars, variables->max_vars * sizeof(Variable));
+ }
+}
+
/*
* Lookup a variable by name, creating it if need be.
* Caller is expected to assign a value to the variable.
* Returns NULL on failure (bad name).
*/
static Variable *
-lookupCreateVariable(CState *st, const char *context, char *name)
+lookupCreateVariable(Variables *variables, const char *context, char *name)
{
Variable *var;
- var = lookupVariable(st, name);
+ var = lookupVariable(variables, name);
if (var == NULL)
{
- Variable *newvars;
-
/*
* Check for the name only when declaring a new variable to avoid
* overhead.
@@ -1588,23 +1626,17 @@ lookupCreateVariable(CState *st, const char *context, char *name)
}
/* Create variable at the end of the array */
- if (st->variables)
- newvars = (Variable *) pg_realloc(st->variables,
- (st->nvariables + 1) * sizeof(Variable));
- else
- newvars = (Variable *) pg_malloc(sizeof(Variable));
-
- st->variables = newvars;
+ enlargeVariables(variables, 1);
- var = &newvars[st->nvariables];
+ var = &(variables->vars[variables->nvars]);
var->name = pg_strdup(name);
var->svalue = NULL;
/* caller is expected to initialize remaining fields */
- st->nvariables++;
+ variables->nvars++;
/* we don't re-sort the array till we have to */
- st->vars_sorted = false;
+ variables->vars_sorted = false;
}
return var;
@@ -1613,12 +1645,13 @@ lookupCreateVariable(CState *st, const char *context, char *name)
/* Assign a string value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariable(CState *st, const char *context, char *name, const char *value)
+putVariable(Variables *variables, const char *context, char *name,
+ const char *value)
{
Variable *var;
char *val;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1636,12 +1669,12 @@ putVariable(CState *st, const char *context, char *name, const char *value)
/* Assign a value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableValue(CState *st, const char *context, char *name,
+putVariableValue(Variables *variables, const char *context, char *name,
const PgBenchValue *value)
{
Variable *var;
- var = lookupCreateVariable(st, context, name);
+ var = lookupCreateVariable(variables, context, name);
if (!var)
return false;
@@ -1656,12 +1689,13 @@ putVariableValue(CState *st, const char *context, char *name,
/* Assign an integer value to a variable, creating it if need be */
/* Returns false on failure (bad name) */
static bool
-putVariableInt(CState *st, const char *context, char *name, int64 value)
+putVariableInt(Variables *variables, const char *context, char *name,
+ int64 value)
{
PgBenchValue val;
setIntValue(&val, value);
- return putVariableValue(st, context, name, &val);
+ return putVariableValue(variables, context, name, &val);
}
/*
@@ -1720,7 +1754,7 @@ replaceVariable(char **sql, char *param, int len, char *value)
}
static char *
-assignVariables(CState *st, char *sql)
+assignVariables(Variables *variables, char *sql)
{
char *p,
*name,
@@ -1741,7 +1775,7 @@ assignVariables(CState *st, char *sql)
continue;
}
- val = getVariable(st, name);
+ val = getVariable(variables, name);
free(name);
if (val == NULL)
{
@@ -1756,12 +1790,13 @@ assignVariables(CState *st, char *sql)
}
static void
-getQueryParams(CState *st, const Command *command, const char **params)
+getQueryParams(Variables *variables, const Command *command,
+ const char **params)
{
int i;
for (i = 0; i < command->argc - 1; i++)
- params[i] = getVariable(st, command->argv[i + 1]);
+ params[i] = getVariable(variables, command->argv[i + 1]);
}
static char *
@@ -2629,7 +2664,7 @@ evaluateExpr(CState *st, PgBenchExpr *expr, PgBenchValue *retval)
{
Variable *var;
- if ((var = lookupVariable(st, expr->u.variable.varname)) == NULL)
+ if ((var = lookupVariable(&st->variables, expr->u.variable.varname)) == NULL)
{
pg_log_error("undefined variable \"%s\"", expr->u.variable.varname);
return false;
@@ -2699,7 +2734,7 @@ getMetaCommand(const char *cmd)
* Return true if succeeded, or false on error.
*/
static bool
-runShellCommand(CState *st, char *variable, char **argv, int argc)
+runShellCommand(Variables *variables, char *variable, char **argv, int argc)
{
char command[SHELL_COMMAND_SIZE];
int i,
@@ -2730,7 +2765,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
{
arg = argv[i] + 1; /* a string literal starting with colons */
}
- else if ((arg = getVariable(st, argv[i] + 1)) == NULL)
+ else if ((arg = getVariable(variables, argv[i] + 1)) == NULL)
{
pg_log_error("%s: undefined variable \"%s\"", argv[0], argv[i]);
return false;
@@ -2791,7 +2826,7 @@ runShellCommand(CState *st, char *variable, char **argv, int argc)
pg_log_error("%s: shell command must return an integer (not \"%s\")", argv[0], res);
return false;
}
- if (!putVariableInt(st, "setshell", variable, retval))
+ if (!putVariableInt(variables, "setshell", variable, retval))
return false;
pg_log_debug("%s: shell parameter name: \"%s\", value: \"%s\"", argv[0], argv[1], res);
@@ -2843,7 +2878,7 @@ sendCommand(CState *st, Command *command)
char *sql;
sql = pg_strdup(command->argv[0]);
- sql = assignVariables(st, sql);
+ sql = assignVariables(&st->variables, sql);
pg_log_debug("client %d sending %s", st->id, sql);
r = PQsendQuery(st->con, sql);
@@ -2854,7 +2889,7 @@ sendCommand(CState *st, Command *command)
const char *sql = command->argv[0];
const char *params[MAX_ARGS];
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
pg_log_debug("client %d sending %s", st->id, sql);
r = PQsendQueryParams(st->con, sql, command->argc - 1,
@@ -2901,7 +2936,7 @@ sendCommand(CState *st, Command *command)
st->prepared[st->use_file] = true;
}
- getQueryParams(st, command, params);
+ getQueryParams(&st->variables, command, params);
preparedStatementName(name, st->use_file, st->command);
pg_log_debug("client %d sending %s", st->id, name);
@@ -2994,7 +3029,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
varname = psprintf("%s%s", varprefix, varname);
/* store last row result as a string */
- if (!putVariable(st, meta == META_ASET ? "aset" : "gset", varname,
+ if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
PQgetvalue(res, ntuples - 1, fld)))
{
/* internal error */
@@ -3055,14 +3090,14 @@ error:
* of delay, in microseconds. Returns true on success, false on error.
*/
static bool
-evaluateSleep(CState *st, int argc, char **argv, int *usecs)
+evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
{
char *var;
int usec;
if (*argv[1] == ':')
{
- if ((var = getVariable(st, argv[1] + 1)) == NULL)
+ if ((var = getVariable(variables, argv[1] + 1)) == NULL)
{
pg_log_error("%s: undefined variable \"%s\"", argv[0], argv[1] + 1);
return false;
@@ -3627,7 +3662,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
* latency will be recorded in CSTATE_SLEEP state, not here, after the
* delay has elapsed.)
*/
- if (!evaluateSleep(st, argc, argv, &usec))
+ if (!evaluateSleep(&st->variables, argc, argv, &usec))
{
commandFailed(st, "sleep", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -3648,7 +3683,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
return CSTATE_ABORTED;
}
- if (!putVariableValue(st, argv[0], argv[1], &result))
+ if (!putVariableValue(&st->variables, argv[0], argv[1], &result))
{
commandFailed(st, "set", "assignment of meta-command failed");
return CSTATE_ABORTED;
@@ -3718,7 +3753,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
}
else if (command->meta == META_SETSHELL)
{
- if (!runShellCommand(st, argv[1], argv + 2, argc - 2))
+ if (!runShellCommand(&st->variables, argv[1], argv + 2, argc - 2))
{
commandFailed(st, "setshell", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -3726,7 +3761,7 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
}
else if (command->meta == META_SHELL)
{
- if (!runShellCommand(st, NULL, argv + 1, argc - 1))
+ if (!runShellCommand(&st->variables, NULL, argv + 1, argc - 1))
{
commandFailed(st, "shell", "execution of meta-command failed");
return CSTATE_ABORTED;
@@ -6020,7 +6055,7 @@ main(int argc, char **argv)
}
*p++ = '\0';
- if (!putVariable(&state[0], "option", optarg, p))
+ if (!putVariable(&state[0].variables, "option", optarg, p))
exit(1);
}
break;
@@ -6348,19 +6383,19 @@ main(int argc, char **argv)
int j;
state[i].id = i;
- for (j = 0; j < state[0].nvariables; j++)
+ for (j = 0; j < state[0].variables.nvars; j++)
{
- Variable *var = &state[0].variables[j];
+ Variable *var = &state[0].variables.vars[j];
if (var->value.type != PGBT_NO_VALUE)
{
- if (!putVariableValue(&state[i], "startup",
+ if (!putVariableValue(&state[i].variables, "startup",
var->name, &var->value))
exit(1);
}
else
{
- if (!putVariable(&state[i], "startup",
+ if (!putVariable(&state[i].variables, "startup",
var->name, var->svalue))
exit(1);
}
@@ -6398,11 +6433,11 @@ main(int argc, char **argv)
* :scale variables normally get -s or database scale, but don't override
* an explicit -D switch
*/
- if (lookupVariable(&state[0], "scale") == NULL)
+ if (lookupVariable(&state[0].variables, "scale") == NULL)
{
for (i = 0; i < nclients; i++)
{
- if (!putVariableInt(&state[i], "startup", "scale", scale))
+ if (!putVariableInt(&state[i].variables, "startup", "scale", scale))
exit(1);
}
}
@@ -6411,28 +6446,30 @@ main(int argc, char **argv)
* Define a :client_id variable that is unique per connection. But don't
* override an explicit -D switch.
*/
- if (lookupVariable(&state[0], "client_id") == NULL)
+ if (lookupVariable(&state[0].variables, "client_id") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "client_id", i))
+ if (!putVariableInt(&state[i].variables, "startup", "client_id", i))
exit(1);
}
/* set default seed for hash functions */
- if (lookupVariable(&state[0], "default_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "default_seed") == NULL)
{
uint64 seed = pg_prng_uint64(&base_random_sequence);
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "default_seed", (int64) seed))
+ if (!putVariableInt(&state[i].variables, "startup", "default_seed",
+ (int64) seed))
exit(1);
}
/* set random seed unless overwritten */
- if (lookupVariable(&state[0], "random_seed") == NULL)
+ if (lookupVariable(&state[0].variables, "random_seed") == NULL)
{
for (i = 0; i < nclients; i++)
- if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+ if (!putVariableInt(&state[i].variables, "startup", "random_seed",
+ random_seed))
exit(1);
}
--
2.17.1
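The enlargeVariables() helper introduced by the patch above replaces the old grow-by-one pg_realloc with margin-based over-allocation, so repeated one-at-a-time variable additions do not reallocate every time. A standalone sketch of that growth strategy (the margin value and the element type here are stand-ins, not the patch's actual definitions):

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Grow the array only when needed, and over-allocate by a fixed margin
 * so that adding entries one at a time amortizes the realloc cost.
 */
#define VARIABLES_ALLOC_MARGIN 8	/* assumed value for this sketch */

typedef struct
{
	char	  **vars;		/* stand-in for the patch's Variable array */
	int			nvars;		/* number of entries currently in use */
	int			max_vars;	/* allocated capacity of the array */
} Variables;

static void
enlargeVariables(Variables *variables, int needed)
{
	/* total number of slots required now */
	needed += variables->nvars;

	if (variables->max_vars < needed)
	{
		variables->max_vars = needed + VARIABLES_ALLOC_MARGIN;
		variables->vars = realloc(variables->vars,
								  variables->max_vars * sizeof(char *));
	}
}
```

With this scheme, a first insertion triggers one allocation and the next several insertions fit inside the margin without touching the allocator, which matters because lookupCreateVariable() adds variables one at a time.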
Attachment: v18-0002-Pgbench-errors-and-serialization-deadlock-retrie.patch (text/x-diff)
From 686a2beb6b4fd20e3f21c544ffbacb22782c3409 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Mon, 7 Jun 2021 18:35:14 +0900
Subject: [PATCH v18 2/2] Pgbench errors and serialization/deadlock retries
A client's run is aborted in case of a serious error, for example, when the
connection with the database server is lost or when the end of the script is
reached without the last transaction having completed. In addition, if
execution of an SQL command or meta command fails for reasons other than
serialization or deadlock errors, the client is aborted. Otherwise, if an SQL
command fails with a serialization or deadlock error, the current transaction
is rolled back, which also includes restoring the client variables to the
values they had before the run of this transaction (it is assumed that one
transaction script contains only one transaction).
Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum number of
tries (specified by the --max-tries option) or the maximum time of tries
(specified by the --latency-limit option). These options can be combined;
moreover, you cannot use an unlimited number of tries (--max-tries=0)
without the --latency-limit option or the --time option. By default the
option --max-tries is set to 1 and transactions with serialization/deadlock
errors are not retried. If the last run of a transaction fails, this
transaction will be reported as failed, and the client variables will be set
as they were before the first run of this transaction.
If there are retries and/or failures, their statistics are printed in the
progress reports, in the transaction and aggregation logs, and at the end
with the other results (overall and for each script). Retries and failures
are also printed per command, together with average latencies, if you use
the appropriate benchmarking option (--report-per-command, -r). To group
failures by basic type (serialization failures / deadlock failures), use the
option --failures-detailed. To distinguish all errors and failures (errors
without retrying) by type, including which limit for retries was exceeded
and by how much for serialization/deadlock failures, use the option
--verbose-errors.
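The retry rule described above reduces to a small decision made after each serialization/deadlock failure. A hedged standalone sketch with hypothetical names (this is not the patch's actual code): retry unless --max-tries (0 meaning unlimited) is exhausted or the total duration of the tries has reached --latency-limit (0 meaning no limit); pgbench itself additionally rejects --max-tries=0 without --latency-limit or --time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decide whether a transaction that just failed with a serialization
 * or deadlock error may be tried again. */
static bool
can_retry(uint32_t tries_done, uint32_t max_tries,
		  int64_t total_try_usec, int64_t latency_limit_usec)
{
	if (max_tries != 0 && tries_done >= max_tries)
		return false;			/* retry cap exhausted */
	if (latency_limit_usec != 0 && total_try_usec >= latency_limit_usec)
		return false;			/* tries already took too long */
	return true;
}
```

Under the default --max-tries=1, can_retry() is false after the first failure, which matches the documented behavior that such transactions are simply reported as failed.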
---
doc/src/sgml/ref/pgbench.sgml | 431 ++++++++-
src/bin/pgbench/pgbench.c | 965 +++++++++++++++++--
src/bin/pgbench/t/001_pgbench_with_server.pl | 215 ++++-
src/bin/pgbench/t/002_pgbench_no_server.pl | 10 +
src/fe_utils/conditional.c | 16 +-
src/include/fe_utils/conditional.h | 2 +
6 files changed, 1494 insertions(+), 145 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index be1896fa99..ebdb4b3f46 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -56,20 +56,29 @@ scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 1
+maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
+number of failed transactions: 0 (0.000%)
latency average = 11.013 ms
latency stddev = 7.351 ms
initial connection time = 45.758 ms
tps = 896.967014 (without initial connection time)
</screen>
- The first six lines report some of the most important parameter
- settings. The next line reports the number of transactions completed
+ The first seven lines report some of the most important parameter
+ settings.
+ The sixth line reports the maximum number of tries for transactions with
+ serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
+ for more information).
+ The eighth line reports the number of transactions completed
and intended (the latter being just the product of number of clients
and number of transactions per client); these will be equal unless the run
- failed before completion. (In <option>-T</option> mode, only the actual
- number of transactions is printed.)
+ failed before completion or some SQL command(s) failed. (In
+ <option>-T</option> mode, only the actual number of transactions is printed.)
+ The next line reports the number of failed transactions due to
+ serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
+ for more information).
The last line reports the number of transactions per second.
</para>
@@ -531,6 +540,17 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
at all. They are counted and reported separately as
<firstterm>skipped</firstterm>.
</para>
+ <para>
+ When the <option>--max-tries</option> option is used, a transaction with a
+ serialization or deadlock error cannot be retried if the total time of
+ all its tries is greater than <replaceable>limit</replaceable> ms. To
+ limit only the time of tries and not their number, use
+ <literal>--max-tries=0</literal>. By default, the option
+ <option>--max-tries</option> is set to 1 and transactions with
+ serialization/deadlock errors are not retried. See <xref
+ linkend="failures-and-retries"/> for more information about retrying
+ such transactions.
+ </para>
</listitem>
</varlistentry>
@@ -597,23 +617,29 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<para>
Show progress report every <replaceable>sec</replaceable> seconds. The report
includes the time since the beginning of the run, the TPS since the
- last report, and the transaction latency average and standard
- deviation since the last report. Under throttling (<option>-R</option>),
- the latency is computed with respect to the transaction scheduled
- start time, not the actual transaction beginning time, thus it also
- includes the average schedule lag time.
+ last report, and the transaction latency average, standard deviation,
+ and the number of failed transactions since the last report. Under
+ throttling (<option>-R</option>), the latency is computed with respect
+ to the transaction scheduled start time, not the actual transaction
+ beginning time, thus it also includes the average schedule lag time.
+ When <option>--max-tries</option> is used to enable transaction retries
+ after serialization/deadlock errors, the report includes the number of
+ retried transactions and the sum of all retries.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-r</option></term>
- <term><option>--report-latencies</option></term>
+ <term><option>--report-per-command</option></term>
<listitem>
<para>
- Report the average per-statement latency (execution time from the
- perspective of the client) of each command after the benchmark
- finishes. See below for details.
+ Report the following statistics for each command after the benchmark
+ finishes: the average per-statement latency (execution time from the
+ perspective of the client), the number of failures and the number of
+ retries after serialization or deadlock errors in this command. The
+ report displays retry statistics only if the
+ <option>--max-tries</option> option is not equal to 1.
</para>
</listitem>
</varlistentry>
@@ -741,6 +767,25 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--failures-detailed</option></term>
+ <listitem>
+ <para>
+ Report failures in per-transaction and aggregation logs, as well as in
+ the main and per-script reports, grouped by the following types:
+ <itemizedlist>
+ <listitem>
+ <para>serialization failures;</para>
+ </listitem>
+ <listitem>
+ <para>deadlock failures;</para>
+ </listitem>
+ </itemizedlist>
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--log-prefix=<replaceable>prefix</replaceable></option></term>
<listitem>
@@ -751,6 +796,36 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--max-tries=<replaceable>number_of_tries</replaceable></option></term>
+ <listitem>
+ <para>
+ Enable retries for transactions with serialization/deadlock errors and
+ set the maximum number of these tries. This option can be combined with
+ the <option>--latency-limit</option> option which limits the total time
+ of all transaction tries; moreover, you cannot use an unlimited number
+ of tries (<literal>--max-tries=0</literal>) without
+ <option>--latency-limit</option> or <option>--time</option>.
+ The default value is 1 and transactions with serialization/deadlock
+ errors are not retried. See <xref linkend="failures-and-retries"/>
+ for more information about retrying such transactions.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--verbose-errors</option></term>
+ <listitem>
+ <para>
+ Print messages about all errors and failures (errors without retrying),
+ including which limit for retries was exceeded and by how much, for
+ serialization/deadlock failures. (Note that in this case the output
+ can grow significantly.)
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>--progress-timestamp</option></term>
<listitem>
@@ -948,8 +1023,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<refsect1>
<title>Notes</title>
- <refsect2>
- <title>What Is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
+ <refsect2 id="transactions-and-scripts" xreflabel="What is the &quot;Transaction&quot; Actually Performed in pgbench?">
+ <title>What is the <quote>Transaction</quote> Actually Performed in <application>pgbench</application>?</title>
<para>
<application>pgbench</application> executes test scripts chosen randomly
@@ -1022,6 +1097,11 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
both old and new versions of <application>pgbench</application>, be sure to write
each SQL command on a single line ending with a semicolon.
</para>
+ <para>
+ It is assumed that pgbench scripts do not contain incomplete blocks of SQL
+ transactions. If at runtime the client reaches the end of the script without
+ completing the last transaction block, it will be aborted.
+ </para>
</note>
<para>
@@ -2212,7 +2292,7 @@ END;
The format of the log is:
<synopsis>
-<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional>
+<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -2233,6 +2313,16 @@ END;
When both <option>--rate</option> and <option>--latency-limit</option> are used,
the <replaceable>time</replaceable> for a skipped transaction will be reported as
<literal>skipped</literal>.
+ <replaceable>retries</replaceable> is the sum of all retries after
+ serialization or deadlock errors during the current script execution. It is
+ present only if the <option>--max-tries</option> option is not equal to 1.
+ If the transaction ends with a failure, its <replaceable>time</replaceable>
+ will be reported as <literal>failed</literal>. If you use the
+ <option>--failures-detailed</option> option, the
+ <replaceable>time</replaceable> of the failed transaction will be reported as
+ <literal>serialization</literal> or
+ <literal>deadlock</literal> depending on the type of failure (see
+ <xref linkend="failures-and-retries"/> for more information).
</para>
<para>
@@ -2261,6 +2351,41 @@ END;
were already late before they were even started.
</para>
+ <para>
+ The following example shows a snippet of a log file with failures and
+ retries, with the maximum number of tries set to 10 (note the additional
+ <replaceable>retries</replaceable> column):
+<screen>
+3 0 47423 0 1499414498 34501 3
+3 1 8333 0 1499414498 42848 0
+3 2 8358 0 1499414498 51219 0
+4 0 72345 0 1499414498 59433 6
+1 3 41718 0 1499414498 67879 4
+1 4 8416 0 1499414498 76311 0
+3 3 33235 0 1499414498 84469 3
+0 0 failed 0 1499414498 84905 9
+2 0 failed 0 1499414498 86248 9
+3 4 8307 0 1499414498 92788 0
+</screen>
+ </para>
+
+ <para>
+ If the <option>--failures-detailed</option> option is used, the type of
+ failure is reported in the <replaceable>time</replaceable> like this:
+<screen>
+3 0 47423 0 1499414498 34501 3
+3 1 8333 0 1499414498 42848 0
+3 2 8358 0 1499414498 51219 0
+4 0 72345 0 1499414498 59433 6
+1 3 41718 0 1499414498 67879 4
+1 4 8416 0 1499414498 76311 0
+3 3 33235 0 1499414498 84469 3
+0 0 serialization 0 1499414498 84905 9
+2 0 serialization 0 1499414498 86248 9
+3 4 8307 0 1499414498 92788 0
+</screen>
+ </para>
+
<para>
When running a long test on hardware that can handle a lot of transactions,
the log files can become very large. The <option>--sampling-rate</option> option
@@ -2276,7 +2401,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable>&zwsp; <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable>&zwsp; <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
where
@@ -2290,7 +2415,16 @@ END;
transaction latencies within the interval,
<replaceable>min_latency</replaceable> is the minimum latency within the interval,
and
- <replaceable>max_latency</replaceable> is the maximum latency within the interval.
+ <replaceable>max_latency</replaceable> is the maximum latency within the interval,
+ <replaceable>failures</replaceable> is the number of transactions that ended
+ with a failed SQL command within the interval. If you use the option
+ <option>--failures-detailed</option>, instead of the sum of all failed
+ transactions you will get more detailed statistics for the failed
+ transactions grouped by the following types:
+ <replaceable>serialization_failures</replaceable> is the number of
+ transactions that got a serialization error and were not retried after this,
+ <replaceable>deadlock_failures</replaceable> is the number of transactions
+ that got a deadlock error and were not retried after this.
The next fields,
<replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
@@ -2298,21 +2432,25 @@ END;
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e., the difference between each transaction's
scheduled start time and the time it actually started.
- The very last field, <replaceable>skipped</replaceable>,
+ The next field, <replaceable>skipped</replaceable>,
is only present if the <option>--latency-limit</option> option is used, too.
It counts the number of transactions skipped because they would have
started too late.
+ The <replaceable>retried</replaceable> and <replaceable>retries</replaceable>
+ fields are present only if the <option>--max-tries</option> option is not
+ equal to 1. They report the number of retried transactions and the sum of all
+ retries after serialization or deadlock errors within the interval.
Each transaction is counted in the interval when it was committed.
</para>
<para>
Here is some example output:
<screen>
-1345828501 5601 1542744 483552416 61 2573
-1345828503 7884 1979812 565806736 60 1479
-1345828505 7208 1979422 567277552 59 1391
-1345828507 7685 1980268 569784714 60 1398
-1345828509 7073 1979779 573489941 236 1411
+1345828501 5601 1542744 483552416 61 2573 0
+1345828503 7884 1979812 565806736 60 1479 0
+1345828505 7208 1979422 567277552 59 1391 0
+1345828507 7685 1980268 569784714 60 1398 0
+1345828509 7073 1979779 573489941 236 1411 0
</screen></para>
<para>
@@ -2324,13 +2462,42 @@ END;
</refsect2>
<refsect2>
- <title>Per-Statement Latencies</title>
+ <title>Per-Statement Report</title>
+
+ <para>
+ With the <option>-r</option> option, <application>pgbench</application>
+ collects the following statistics for each statement:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>latency</literal> — elapsed transaction time for each
+ statement. <application>pgbench</application> reports an average value
+ of all successful runs of the statement.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of failures in this statement. See
+ <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The number of retries after a serialization or a deadlock error in this
+ statement. See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ The report displays retry statistics only if the <option>--max-tries</option>
+ option is not equal to 1.
+ </para>
<para>
- With the <option>-r</option> option, <application>pgbench</application> collects
- the elapsed transaction time of each statement executed by every
- client. It then reports an average of those values, referred to
- as the latency for each statement, after the benchmark has finished.
+ All values are computed for each statement executed by every client and are
+ reported after the benchmark has finished.
</para>
<para>
@@ -2342,29 +2509,67 @@ scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
+maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
-latency average = 10.870 ms
-latency stddev = 7.341 ms
-initial connection time = 30.954 ms
-tps = 907.949122 (without initial connection time)
-statement latencies in milliseconds:
- 0.001 \set aid random(1, 100000 * :scale)
- 0.001 \set bid random(1, 1 * :scale)
- 0.001 \set tid random(1, 10 * :scale)
- 0.000 \set delta random(-5000, 5000)
- 0.046 BEGIN;
- 0.151 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
- 0.107 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
- 4.241 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
- 5.245 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
- 0.102 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
- 0.974 END;
+number of failed transactions: 0 (0.000%)
+number of transactions above the 50.0 ms latency limit: 1311/10000 (13.110 %)
+latency average = 28.488 ms
+latency stddev = 21.009 ms
+initial connection time = 69.068 ms
+tps = 346.224794 (without initial connection time)
+statement latencies in milliseconds and failures:
+ 0.012 0 \set aid random(1, 100000 * :scale)
+ 0.002 0 \set bid random(1, 1 * :scale)
+ 0.002 0 \set tid random(1, 10 * :scale)
+ 0.002 0 \set delta random(-5000, 5000)
+ 0.319 0 BEGIN;
+ 0.834 0 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.641 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 11.126 0 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 12.961 0 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.634 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.957 0 END;
</screen>
+
+ Another example of output for the default script using serializable default
+ transaction isolation level (<command>PGOPTIONS='-c
+ default_transaction_isolation=serializable' pgbench ...</command>):
+<screen>
+starting vacuum...end.
+transaction type: <builtin: TPC-B (sort of)>
+scaling factor: 1
+query mode: simple
+number of clients: 10
+number of threads: 1
+maximum number of tries: 10
+number of transactions per client: 1000
+number of transactions actually processed: 6317/10000
+number of failed transactions: 3683 (36.830%)
+number of transactions retried: 7667 (76.670%)
+total number of retries: 45339
+number of transactions above the 50.0 ms latency limit: 106/6317 (1.678 %)
+latency average = 17.016 ms
+latency stddev = 13.283 ms
+initial connection time = 45.017 ms
+tps = 186.792667 (without initial connection time)
+statement latencies in milliseconds, failures and retries:
+ 0.006 0 0 \set aid random(1, 100000 * :scale)
+ 0.001 0 0 \set bid random(1, 1 * :scale)
+ 0.001 0 0 \set tid random(1, 10 * :scale)
+ 0.001 0 0 \set delta random(-5000, 5000)
+ 0.385 0 0 BEGIN;
+ 0.773 0 1 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+ 0.624 0 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+ 1.098 320 3762 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+ 0.582 3363 41576 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+ 0.465 0 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
+ 1.933 0 0 END;
+ </screen>
</para>
<para>
- If multiple script files are specified, the averages are reported
+ If multiple script files are specified, all statistics are reported
separately for each script file.
</para>
@@ -2378,6 +2583,140 @@ statement latencies in milliseconds:
</para>
</refsect2>
+ <refsect2 id="failures-and-retries" xreflabel="Failures and Serialization/Deadlock Retries">
+ <title>Failures and Serialization/Deadlock Retries</title>
+
+ <para>
+ When executing <application>pgbench</application>, there are three main types
+ of errors:
+ <itemizedlist>
+ <listitem>
+ <para>
+ Errors of the main program. They are the most serious and always result
+ in an immediate exit from <application>pgbench</application> with
+ the corresponding error message. They include:
+ <itemizedlist>
+ <listitem>
+ <para>
+ errors when starting <application>pgbench</application>
+ (e.g. an invalid option value);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ errors in the initialization mode (e.g. the query to create
+ tables for built-in scripts fails);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ errors before starting threads (e.g. a connection to the
+ database server could not be made, a syntax error in a meta
+ command, a thread creation failure);
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ internal <application>pgbench</application> errors (which are
+ never supposed to occur...).
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Errors when a thread manages its clients (e.g. a client could not
+ start a connection to the database server, or the socket for
+ connecting the client to the database server has become invalid).
+ In such cases all clients of this thread stop while other threads
+ continue to work.
+ </listitem>
+ <listitem>
+ <para>
+ Direct client errors. They lead to an immediate exit from
+ <application>pgbench</application> with the corresponding error
+ message only in the case of an internal
+ <application>pgbench</application> error (which is never supposed
+ to occur...). Otherwise, in the worst case they only lead to the
+ abortion of the failed client, while other clients continue their
+ run (but some client errors are handled without client abortion
+ and are reported separately, see below). Later in this section it
+ is assumed that the errors discussed are direct client errors and
+ not internal <application>pgbench</application> errors.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ A client's run is aborted in case of a serious error; for example, the
+ connection with the database server was lost or the end of the script was
+ reached without completing the last transaction. In addition, if execution
+ of an SQL or meta command fails for reasons other than serialization or
+ deadlock errors, the client is aborted. Otherwise, if an SQL command fails
+ with a serialization or deadlock error, the client is not aborted. In such
+ cases, the current transaction is rolled back, which also includes setting
+ the client variables as they were before the run of this transaction (it is
+ assumed that one transaction script contains only one transaction; see
+ <xref linkend="transactions-and-scripts"/> for more information).
+ Transactions with serialization or deadlock errors are repeated after
+ rollbacks until they complete successfully or reach the maximum number of
+ tries (specified by the <option>--max-tries</option> option) / the maximum
+ time of retries (specified by the <option>--latency-limit</option> option) /
+ the end of the benchmark (specified by the <option>--time</option> option).
+ If the last try fails, the transaction is reported as failed, but
+ the client is not aborted and continues to work.
+ </para>
+
+ <note>
+ <para>
+ Without specifying the <option>--max-tries</option> option, a transaction
+ will never be retried after a serialization or deadlock error because its
+ default value is 1. Use an unlimited number of tries (<literal>--max-tries=0</literal>)
+ and the <option>--latency-limit</option> option to limit only the maximum time
+ of tries. You can also use the <option>--time</option> option to limit the
+ benchmark duration under an unlimited number of tries.
+ </para>
+ <para>
+ Be careful when repeating scripts that contain multiple transactions: the
+ script is always retried completely, so successful transactions can be
+ performed several times.
+ </para>
+ <para>
+ Be careful when repeating transactions with shell commands. Unlike the
+ results of SQL commands, the results of shell commands are not rolled back,
+ except for the variable value of the <command>\setshell</command> command.
+ </para>
+ </note>
+
+ <para>
+ The latency of a successful transaction includes the entire time of
+ transaction execution with rollbacks and retries. The latency is measured
+ only for successful transactions and commands but not for failed transactions
+ or commands.
+ </para>
+
+ <para>
+ The main report contains the number of failed transactions. If the
+ <option>--max-tries</option> option is not equal to 1, the main report also
+ contains the statistics related to retries: the total number of retried
+ transactions and total number of retries. The per-script report inherits all
+ these fields from the main report. The per-statement report displays retry
+ statistics only if the <option>--max-tries</option> option is not equal to 1.
+ </para>
+
+ <para>
+ If you want to group failures by basic types in per-transaction and
+ aggregation logs, as well as in the main and per-script reports, use the
+ <option>--failures-detailed</option> option. If you also want to distinguish
+ all errors and failures (errors without retrying) by type, including which
+ limit for retries was violated and how far it was exceeded for
+ serialization/deadlock failures, use the <option>--verbose-errors</option>
+ option.
+ </para>
+ </refsect2>
+
<refsect2>
<title>Good Practices</title>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ab2c5dfc5f..7080d2a795 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -76,6 +76,8 @@
#define M_PI 3.14159265358979323846
#endif
+#define ERRCODE_T_R_SERIALIZATION_FAILURE "40001"
+#define ERRCODE_T_R_DEADLOCK_DETECTED "40P01"
#define ERRCODE_UNDEFINED_TABLE "42P01"
/*
@@ -275,9 +277,34 @@ bool progress_timestamp = false; /* progress report with Unix time */
int nclients = 1; /* number of clients */
int nthreads = 1; /* number of threads */
bool is_connect; /* establish connection for each transaction */
-bool report_per_command; /* report per-command latencies */
+bool report_per_command = false; /* report per-command latencies, retries
+ * after errors and failures (errors
+ * without retrying) */
int main_pid; /* main process id used in log filename */
+/*
+ * There are different types of restrictions for deciding that the current
+ * transaction with a serialization/deadlock error can no longer be retried and
+ * should be reported as failed:
+ * - max_tries (--max-tries) can be used to limit the number of tries;
+ * - latency_limit (-L) can be used to limit the total time of tries;
+ * - duration (-T) can be used to limit the total benchmark time.
+ *
+ * They can be combined together, and you need to use at least one of them to
+ * retry the transactions with serialization/deadlock errors. If none of them is
+ * used, the default value of max_tries is 1 and such transactions will not be
+ * retried.
+ */
+
+/*
+ * We cannot retry a transaction after the serialization/deadlock error if its
+ * number of tries reaches this maximum; if its value is zero, it is not used.
+ */
+uint32 max_tries = 1;
+
+bool failures_detailed = false; /* whether to group failures in reports
+ * or logs by basic types */
+
const char *pghost = NULL;
const char *pgport = NULL;
const char *username = NULL;
@@ -362,9 +389,66 @@ typedef int64 pg_time_usec_t;
typedef struct StatsData
{
pg_time_usec_t start_time; /* interval start time, for aggregates */
- int64 cnt; /* number of transactions, including skipped */
+
+ /*
+ * Transactions are counted depending on their execution and outcome. First
+ * a transaction may have started or not: skipped transactions occur under
+ * --rate and --latency-limit when the client is too late to execute them.
+ * Secondly, a started transaction may ultimately succeed or fail, possibly
+ * after some retries when --max-tries is not one. Thus
+ *
+ * the number of all transactions =
+ * 'skipped' (it was too late to execute them) +
+ * 'cnt' (the number of successful transactions) +
+ * failed (the number of failed transactions).
+ *
+ * A successful transaction can have several unsuccessful tries before a
+ * successful run. Thus
+ *
+ * 'cnt' (the number of successful transactions) =
+ * successfully retried transactions (they got a serialization or a
+ * deadlock error(s), but were
+ * successfully retried from the very
+ * beginning) +
+ * directly successful transactions (they were successfully completed on
+ * the first try).
+ *
+ * A failed transaction is defined as an unsuccessfully retried transaction.
+ * It can be one of two types:
+ *
+ * failed (the number of failed transactions) =
+ * 'serialization_failures' (they got a serialization error and were not
+ * successfully retried) +
+ * 'deadlock_failures' (they got a deadlock error and were not successfully
+ * retried).
+ *
+ * If the transaction was retried after a serialization or a deadlock error,
+ * this does not guarantee that this retry was successful. Thus
+ *
+ * 'retries' (number of retries) =
+ * number of retries in all retried transactions =
+ * number of retries in (successfully retried transactions +
+ * failed transactions);
+ *
+ * 'retried' (number of all retried transactions) =
+ * successfully retried transactions +
+ * failed transactions.
+ */
+ int64 cnt; /* number of successful transactions, not
+ * including 'skipped' */
int64 skipped; /* number of transactions skipped under --rate
* and --latency-limit */
+ int64 retries; /* number of retries after a serialization or a
+ * deadlock error in all the transactions */
+ int64 retried; /* number of all transactions that were retried
+ * after a serialization or a deadlock error
+ * (perhaps the last try was unsuccessful) */
+ int64 serialization_failures; /* number of transactions that were not
+ * successfully retried after a
+ * serialization error */
+ int64 deadlock_failures; /* number of transactions that were not
+ * successfully retried after a deadlock
+ * error */
SimpleStats latency;
SimpleStats lag;
} StatsData;
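
As a side note for reviewers (not part of the patch): the accounting identities in the comment above can be checked with a standalone sketch; all names here are illustrative, not the patch's.

```c
#include <stdint.h>

/* Illustrative mirror of the per-interval counters described above. */
typedef struct
{
	int64_t		cnt;		/* successful transactions */
	int64_t		skipped;	/* too late to execute them */
	int64_t		retries;	/* total retries across all transactions */
	int64_t		retried;	/* transactions retried at least once */
	int64_t		serialization_failures;
	int64_t		deadlock_failures;
} MiniStats;

/* the number of all transactions = skipped + cnt + failed */
static int64_t
mini_total(const MiniStats *s)
{
	return s->skipped + s->cnt +
		s->serialization_failures + s->deadlock_failures;
}
```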
@@ -375,6 +459,31 @@ typedef struct StatsData
*/
pg_time_usec_t epoch_shift;
+/*
+ * Error status for errors during script execution.
+ */
+typedef enum EStatus
+{
+ ESTATUS_NO_ERROR = 0,
+ ESTATUS_META_COMMAND_ERROR,
+
+ /* SQL errors */
+ ESTATUS_SERIALIZATION_ERROR,
+ ESTATUS_DEADLOCK_ERROR,
+ ESTATUS_OTHER_SQL_ERROR
+} EStatus;
+
+/*
+ * Transaction status at the end of a command.
+ */
+typedef enum TStatus
+{
+ TSTATUS_IDLE,
+ TSTATUS_IN_BLOCK,
+ TSTATUS_CONN_ERROR,
+ TSTATUS_OTHER_ERROR
+} TStatus;
+
/* Various random sequences are initialized from this one. */
static pg_prng_state base_random_sequence;
@@ -446,6 +555,35 @@ typedef enum
CSTATE_END_COMMAND,
CSTATE_SKIP_COMMAND,
+ /*
+ * States for failed commands.
+ *
+ * If the SQL/meta command fails, in CSTATE_ERROR clean up after an error:
+ * - clear the conditional stack;
+ * - if we have an unterminated (possibly failed) transaction block, send
+ * the rollback command to the server and wait for the result in
+ * CSTATE_WAIT_ROLLBACK_RESULT. If something goes wrong with rolling back,
+ * go to CSTATE_ABORTED.
+ *
+ * But if everything is ok we are ready for future transactions: if this is
+ * a serialization or deadlock error and we can re-execute the transaction
+ * from the very beginning, go to CSTATE_RETRY; otherwise go to
+ * CSTATE_FAILURE.
+ *
+ * In CSTATE_RETRY report an error, set the same parameters for the
+ * transaction execution as in the previous tries and process the first
+ * transaction command in CSTATE_START_COMMAND.
+ *
+ * In CSTATE_FAILURE report a failure, set the parameters for the
+ * transaction execution as they were before the first run of this
+ * transaction (except for a random state) and go to CSTATE_END_TX to
+ * complete this transaction.
+ */
+ CSTATE_ERROR,
+ CSTATE_WAIT_ROLLBACK_RESULT,
+ CSTATE_RETRY,
+ CSTATE_FAILURE,
+
/*
* CSTATE_END_TX performs end-of-transaction processing. It calculates
* latency, and logs the transaction. In --connect mode, it closes the
@@ -494,8 +632,20 @@ typedef struct
bool prepared[MAX_SCRIPTS]; /* whether client prepared the script */
+ /*
+ * For processing failures and repeating transactions with serialization or
+ * deadlock errors:
+ */
+ EStatus estatus; /* the error status of the current transaction
+ * execution; this is ESTATUS_NO_ERROR if there were
+ * no errors */
+ pg_prng_state random_state; /* random state */
+ uint32 tries; /* how many times have we already tried the
+ * current transaction? */
+
/* per client collected stats */
- int64 cnt; /* client transaction count, for -t */
+ int64 cnt; /* client transaction count, for -t; skipped and
+ * failed transactions are also counted here */
} CState;
/*
@@ -590,6 +740,9 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
* aset do gset on all possible queries of a combined query (\;).
* expr Parsed expression, if needed.
* stats Time spent in this command.
+ * retries Number of retries after a serialization or deadlock error in the
+ * current command.
+ * failures Number of errors in the current command that were not retried.
*/
typedef struct Command
{
@@ -602,6 +755,8 @@ typedef struct Command
char *varprefix;
PgBenchExpr *expr;
SimpleStats stats;
+ int64 retries;
+ int64 failures;
} Command;
typedef struct ParsedScript
@@ -616,6 +771,8 @@ static ParsedScript sql_script[MAX_SCRIPTS]; /* SQL script files */
static int num_scripts; /* number of scripts in sql_script[] */
static int64 total_weight = 0;
+static bool verbose_errors = false; /* print verbose messages of all errors */
+
/* Builtin test scripts */
typedef struct BuiltinScript
{
@@ -753,15 +910,18 @@ usage(void)
" protocol for submitting queries (default: simple)\n"
" -n, --no-vacuum do not run VACUUM before tests\n"
" -P, --progress=NUM show thread progress report every NUM seconds\n"
- " -r, --report-latencies report average latency per command\n"
+ " -r, --report-per-command report latencies, failures and retries per command\n"
" -R, --rate=NUM target rate in transactions per second\n"
" -s, --scale=NUM report this scale factor in output\n"
" -t, --transactions=NUM number of transactions each client runs (default: 10)\n"
" -T, --time=NUM duration of benchmark test in seconds\n"
" -v, --vacuum-all vacuum all four standard tables before tests\n"
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
+ " --failures-detailed report the failures grouped by basic types\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
+ " --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --verbose-errors print messages of all errors\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1287,6 +1447,10 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->start_time = start;
sd->cnt = 0;
sd->skipped = 0;
+ sd->retries = 0;
+ sd->retried = 0;
+ sd->serialization_failures = 0;
+ sd->deadlock_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1295,22 +1459,51 @@ initStats(StatsData *sd, pg_time_usec_t start)
* Accumulate one additional item into the given stats object.
*/
static void
-accumStats(StatsData *stats, bool skipped, double lat, double lag)
+accumStats(StatsData *stats, bool skipped, double lat, double lag,
+ EStatus estatus, int64 tries)
{
- stats->cnt++;
-
+ /* Record the skipped transaction */
if (skipped)
{
/* no latency to record on skipped transactions */
stats->skipped++;
+ return;
}
- else
+
+ /*
+ * Record the number of retries regardless of whether the transaction was
+ * successful or failed.
+ */
+ if (tries > 1)
+ {
+ stats->retries += (tries - 1);
+ stats->retried++;
+ }
+
+ switch (estatus)
{
- addToSimpleStats(&stats->latency, lat);
+ /* Record the successful transaction */
+ case ESTATUS_NO_ERROR:
+ stats->cnt++;
- /* and possibly the same for schedule lag */
- if (throttle_delay)
- addToSimpleStats(&stats->lag, lag);
+ addToSimpleStats(&stats->latency, lat);
+
+ /* and possibly the same for schedule lag */
+ if (throttle_delay)
+ addToSimpleStats(&stats->lag, lag);
+ break;
+
+ /* Record the failed transaction */
+ case ESTATUS_SERIALIZATION_ERROR:
+ stats->serialization_failures++;
+ break;
+ case ESTATUS_DEADLOCK_ERROR:
+ stats->deadlock_failures++;
+ break;
+ default:
+ /* internal error which should never occur */
+ pg_log_fatal("unexpected error status: %d", estatus);
+ exit(1);
}
}
@@ -2841,6 +3034,9 @@ preparedStatementName(char *buffer, int file, int state)
sprintf(buffer, "P%d_%d", file, state);
}
+/*
+ * Report the abort of a client while processing SQL commands.
+ */
static void
commandFailed(CState *st, const char *cmd, const char *message)
{
@@ -2848,6 +3044,17 @@ commandFailed(CState *st, const char *cmd, const char *message)
st->id, st->command, cmd, st->use_file, message);
}
+/*
+ * Report the error in the command while the script is executing.
+ */
+static void
+commandError(CState *st, const char *message)
+{
+ Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND);
+ pg_log_info("client %d got an error in command %d (SQL) of script %d; %s",
+ st->id, st->command, st->use_file, message);
+}
+
/* return a script number with a weighted choice. */
static int
chooseScript(TState *thread)
@@ -2955,6 +3162,33 @@ sendCommand(CState *st, Command *command)
return true;
}
+/*
+ * Get the error status from the error code.
+ */
+static EStatus
+getSQLErrorStatus(const char *sqlState)
+{
+ if (sqlState != NULL)
+ {
+ if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
+ return ESTATUS_SERIALIZATION_ERROR;
+ else if (strcmp(sqlState, ERRCODE_T_R_DEADLOCK_DETECTED) == 0)
+ return ESTATUS_DEADLOCK_ERROR;
+ }
+
+ return ESTATUS_OTHER_SQL_ERROR;
+}
+
+/*
+ * Returns true if this type of error can be retried.
+ */
+static bool
+canRetryError(EStatus estatus)
+{
+ return (estatus == ESTATUS_SERIALIZATION_ERROR ||
+ estatus == ESTATUS_DEADLOCK_ERROR);
+}
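
For reference (outside the patch), the two helpers above boil down to a pure SQLSTATE string check; a minimal standalone sketch with illustrative names, using the same error codes as the patch's #defines:

```c
#include <stdbool.h>
#include <string.h>

/* SQLSTATEs treated as retryable, as in the patch's #defines. */
#define SKETCH_SERIALIZATION_FAILURE	"40001"
#define SKETCH_DEADLOCK_DETECTED		"40P01"

/* true if an error with this SQLSTATE can be retried */
static bool
sketch_can_retry_sqlstate(const char *sqlState)
{
	return sqlState != NULL &&
		(strcmp(sqlState, SKETCH_SERIALIZATION_FAILURE) == 0 ||
		 strcmp(sqlState, SKETCH_DEADLOCK_DETECTED) == 0);
}
```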
+
/*
* Process query response from the backend.
*
@@ -2997,6 +3231,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
{
pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
st->id, st->use_file, st->command, qrynum, 0);
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
break;
@@ -3011,6 +3246,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
/* under \gset, report the error */
pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
st->id, st->use_file, st->command, qrynum, PQntuples(res));
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
else if (meta == META_ASET && ntuples <= 0)
@@ -3035,6 +3271,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
/* internal error */
pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
st->id, st->use_file, st->command, qrynum, varname);
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3052,6 +3289,18 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
PQerrorMessage(st->con));
break;
+ case PGRES_NONFATAL_ERROR:
+ case PGRES_FATAL_ERROR:
+ st->estatus = getSQLErrorStatus(
+ PQresultErrorField(res, PG_DIAG_SQLSTATE));
+ if (canRetryError(st->estatus))
+ {
+ if (verbose_errors)
+ commandError(st, PQerrorMessage(st->con));
+ goto error;
+ }
+ /* fall through */
+
default:
/* anything else is unexpected */
pg_log_error("client %d script %d aborted in command %d query %d: %s",
@@ -3130,6 +3379,165 @@ evaluateSleep(Variables *variables, int argc, char **argv, int *usecs)
return true;
}
+
+/*
+ * Returns true if the error can be retried.
+ */
+static bool
+doRetry(CState *st, pg_time_usec_t *now)
+{
+ Assert(st->estatus != ESTATUS_NO_ERROR);
+
+ /* We can only retry serialization or deadlock errors. */
+ if (!canRetryError(st->estatus))
+ return false;
+
+ /*
+ * We must have at least one option to limit the retrying of transactions
+ * that got an error.
+ */
+ Assert(max_tries || latency_limit || duration > 0);
+
+ /*
+ * We cannot retry the error if we have reached the maximum number of tries.
+ */
+ if (max_tries && st->tries >= max_tries)
+ return false;
+
+ /*
+ * We cannot retry the error if we spent too much time on this transaction.
+ */
+ if (latency_limit)
+ {
+ pg_time_now_lazy(now);
+ if (*now - st->txn_scheduled > latency_limit)
+ return false;
+ }
+
+ /*
+ * We cannot retry the error if the benchmark duration is over.
+ */
+ if (timer_exceeded)
+ return false;
+
+ /* OK */
+ return true;
+}
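
The three retry limits combine as in this standalone sketch (illustrative names only, not the patch's; times in microseconds, with 0 meaning a limit is unused):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the retry decision: the error must be retryable, and none
 * of the configured limits (tries, latency, benchmark timer) may have
 * been reached.
 */
static bool
sketch_do_retry(bool retryable_error, uint32_t tries, uint32_t max_tries,
				int64_t elapsed_us, int64_t latency_limit_us,
				bool timer_exceeded)
{
	if (!retryable_error)
		return false;			/* only serialization/deadlock errors */
	if (max_tries != 0 && tries >= max_tries)
		return false;			/* try limit reached */
	if (latency_limit_us != 0 && elapsed_us > latency_limit_us)
		return false;			/* spent too long on this transaction */
	if (timer_exceeded)
		return false;			/* benchmark duration is over */
	return true;
}
```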
+
+/*
+ * Read and discard results until a sync point.
+ */
+static int
+discardUntilSync(CState *st)
+{
+ /* send a sync */
+ if (!PQpipelineSync(st->con))
+ {
+ pg_log_error("client %d aborted: failed to send a pipeline sync",
+ st->id);
+ return 0;
+ }
+
+ /* receive PGRES_PIPELINE_SYNC and null following it */
+ for (;;)
+ {
+ PGresult *res = PQgetResult(st->con);
+ if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
+ {
+ PQclear(res);
+ res = PQgetResult(st->con);
+ Assert(res == NULL);
+ break;
+ }
+ PQclear(res);
+ }
+
+ /* exit pipeline */
+ if (PQexitPipelineMode(st->con) != 1)
+ {
+ pg_log_error("client %d aborted: failed to exit pipeline mode for rolling back the failed transaction",
+ st->id);
+ return 0;
+ }
+ return 1;
+}
+
+/*
+ * Get the transaction status at the end of a command especially for
+ * checking if we are in a (failed) transaction block.
+ */
+static TStatus
+getTransactionStatus(PGconn *con)
+{
+ PGTransactionStatusType tx_status;
+
+ tx_status = PQtransactionStatus(con);
+ switch (tx_status)
+ {
+ case PQTRANS_IDLE:
+ return TSTATUS_IDLE;
+ case PQTRANS_INTRANS:
+ case PQTRANS_INERROR:
+ return TSTATUS_IN_BLOCK;
+ case PQTRANS_UNKNOWN:
+ /* PQTRANS_UNKNOWN is expected given a broken connection */
+ if (PQstatus(con) == CONNECTION_BAD)
+ return TSTATUS_CONN_ERROR;
+ /* fall through */
+ case PQTRANS_ACTIVE:
+ default:
+ /*
+ * We cannot find out whether we are in a transaction block or not.
+ * Internal error which should never occur.
+ */
+ pg_log_error("unexpected transaction status %d", tx_status);
+ return TSTATUS_OTHER_ERROR;
+ }
+
+ /* not reached */
+ Assert(false);
+ return TSTATUS_OTHER_ERROR;
+}
+
+/*
+ * Print verbose messages of an error
+ */
+static void
+printVerboseErrorMessages(CState *st, pg_time_usec_t *now, bool is_retry)
+{
+ static PQExpBuffer buf = NULL;
+
+ if (buf == NULL)
+ buf = createPQExpBuffer();
+ else
+ resetPQExpBuffer(buf);
+
+ printfPQExpBuffer(buf, "client %d ", st->id);
+ appendPQExpBuffer(buf, "%s",
+ (is_retry ?
+ "repeats the transaction after the error" :
+ "ends the failed transaction"));
+ appendPQExpBuffer(buf, " (try %u", st->tries);
+
+ /* Print max_tries if it is not unlimited. */
+ if (max_tries)
+ appendPQExpBuffer(buf, "/%u", max_tries);
+
+ /*
+ * If the latency limit is used, print the current transaction latency
+ * as a percentage of the latency limit.
+ */
+ if (latency_limit)
+ {
+ pg_time_now_lazy(now);
+ appendPQExpBuffer(buf, ", %.3f%% of the maximum time of tries was used",
+ (100.0 * (*now - st->txn_scheduled) / latency_limit));
+ }
+ appendPQExpBuffer(buf, ")\n");
+
+ pg_log_info("%s", buf->data);
+}
+
/*
* Advance the state machine of a connection.
*/
@@ -3167,6 +3575,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
st->use_file = chooseScript(thread);
Assert(conditional_stack_empty(st->cstack));
+ /* reset transaction variables to default values */
+ st->estatus = ESTATUS_NO_ERROR;
+ st->tries = 1;
+
pg_log_debug("client %d executing script \"%s\"",
st->id, sql_script[st->use_file].desc);
@@ -3207,6 +3619,13 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
memset(st->prepared, 0, sizeof(st->prepared));
}
+ /*
+ * This is the first try of this transaction. Remember the
+ * random state: if the transaction gets an error, we will need
+ * it to run the transaction again.
+ */
+ st->random_state = st->cs_func_rs;
+
/* record transaction start time */
st->txn_begin = now;
@@ -3363,6 +3782,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
* - else CSTATE_END_COMMAND
*/
st->state = executeMetaCommand(st, &now);
+ if (st->state == CSTATE_ABORTED)
+ st->estatus = ESTATUS_META_COMMAND_ERROR;
}
/*
@@ -3508,6 +3929,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
+ else if (canRetryError(st->estatus))
+ st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
break;
@@ -3555,44 +3978,223 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
break;
/*
- * End of transaction (end of script, really).
+ * Clean up after an error.
*/
- case CSTATE_END_TX:
+ case CSTATE_ERROR:
+ {
+ TStatus tstatus;
- /* transaction finished: calculate latency and do log */
- processXactStats(thread, st, &now, false, agg);
+ Assert(st->estatus != ESTATUS_NO_ERROR);
- /*
- * missing \endif... cannot happen if CheckConditional was
- * okay
- */
- Assert(conditional_stack_empty(st->cstack));
+ /* Clear the conditional stack */
+ conditional_stack_reset(st->cstack);
- if (is_connect)
- {
- pg_time_usec_t start = now;
+ /* Read and discard until a sync point in pipeline mode */
+ if (PQpipelineStatus(st->con) != PQ_PIPELINE_OFF)
+ {
+ if (!discardUntilSync(st))
+ {
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ }
- pg_time_now_lazy(&start);
- finishCon(st);
- now = pg_time_now();
- thread->conn_duration += now - start;
+ /*
+ * Check if we have a (failed) transaction block or not, and
+ * roll it back if any.
+ */
+ tstatus = getTransactionStatus(st->con);
+ if (tstatus == TSTATUS_IN_BLOCK)
+ {
+ /* Try to rollback a (failed) transaction block. */
+ if (!PQsendQuery(st->con, "ROLLBACK"))
+ {
+ pg_log_error("client %d aborted: failed to send sql command for rolling back the failed transaction",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ }
+ else
+ st->state = CSTATE_WAIT_ROLLBACK_RESULT;
+ }
+ else if (tstatus == TSTATUS_IDLE)
+ {
+ /*
+ * If time is over, we're done;
+ * otherwise, check if we can retry the error.
+ */
+ st->state = timer_exceeded ? CSTATE_FINISHED :
+ doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
+ }
+ else
+ {
+ if (tstatus == TSTATUS_CONN_ERROR)
+ pg_log_error("perhaps the backend died while processing");
+
+ pg_log_error("client %d aborted while receiving the transaction status", st->id);
+ st->state = CSTATE_ABORTED;
+ }
+ break;
}
- if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
+ /*
+ * Wait for the rollback command to complete
+ */
+ case CSTATE_WAIT_ROLLBACK_RESULT:
{
- /* script completed */
- st->state = CSTATE_FINISHED;
+ PGresult *res;
+
+ pg_log_debug("client %d receiving", st->id);
+ if (!PQconsumeInput(st->con))
+ {
+ pg_log_error("client %d aborted while rolling back the transaction after an error; perhaps the backend died while processing",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ if (PQisBusy(st->con))
+ return; /* don't have the whole result yet */
+
+ /*
+ * Read and discard the query result.
+ */
+ res = PQgetResult(st->con);
+ switch (PQresultStatus(res))
+ {
+ case PGRES_COMMAND_OK:
+ /* OK */
+ PQclear(res);
+ /* null must be returned */
+ res = PQgetResult(st->con);
+ Assert(res == NULL);
+
+ /*
+ * If time is over, we're done;
+ * otherwise, check if we can retry the error.
+ */
+ st->state = timer_exceeded ? CSTATE_FINISHED :
+ doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
+ break;
+ default:
+ pg_log_error("client %d aborted while rolling back the transaction after an error; %s",
+ st->id, PQerrorMessage(st->con));
+ PQclear(res);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
break;
}
- /* next transaction (script) */
- st->state = CSTATE_CHOOSE_SCRIPT;
+ /*
+ * Retry the transaction after an error.
+ */
+ case CSTATE_RETRY:
+ command = sql_script[st->use_file].commands[st->command];
/*
- * Ensure that we always return on this point, so as to avoid
- * an infinite loop if the script only contains meta commands.
+ * Inform that the transaction will be retried after the error.
*/
- return;
+ if (verbose_errors)
+ printVerboseErrorMessages(st, &now, true);
+
+ /* Count tries and retries */
+ st->tries++;
+ command->retries++;
+
+ /*
+ * Reset the random state to what it was at the beginning
+ * of the transaction.
+ */
+ st->cs_func_rs = st->random_state;
+
+ /* Process the first transaction command. */
+ st->command = 0;
+ st->estatus = ESTATUS_NO_ERROR;
+ st->state = CSTATE_START_COMMAND;
+ break;
+
+ /*
+ * Record a failed transaction.
+ */
+ case CSTATE_FAILURE:
+ command = sql_script[st->use_file].commands[st->command];
+
+ /* Accumulate the failure. */
+ command->failures++;
+
+ /*
+ * Inform that the failed transaction will not be retried.
+ */
+ if (verbose_errors)
+ printVerboseErrorMessages(st, &now, false);
+
+ /* End the failed transaction. */
+ st->state = CSTATE_END_TX;
+ break;
+
+ /*
+ * End of transaction (end of script, really).
+ */
+ case CSTATE_END_TX:
+ {
+ TStatus tstatus;
+
+ /* transaction finished: calculate latency and do log */
+ processXactStats(thread, st, &now, false, agg);
+
+ /*
+ * missing \endif... cannot happen if CheckConditional was
+ * okay
+ */
+ Assert(conditional_stack_empty(st->cstack));
+
+ /*
+ * We must complete all the transaction blocks that were
+ * started in this script.
+ */
+ tstatus = getTransactionStatus(st->con);
+ if (tstatus == TSTATUS_IN_BLOCK)
+ {
+ pg_log_error("client %d aborted: end of script reached without completing the last transaction",
+ st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+ else if (tstatus != TSTATUS_IDLE)
+ {
+ if (tstatus == TSTATUS_CONN_ERROR)
+ pg_log_error("perhaps the backend died while processing");
+
+ pg_log_error("client %d aborted while receiving the transaction status", st->id);
+ st->state = CSTATE_ABORTED;
+ break;
+ }
+
+ if (is_connect)
+ {
+ pg_time_usec_t start = now;
+
+ pg_time_now_lazy(&start);
+ finishCon(st);
+ now = pg_time_now();
+ thread->conn_duration += now - start;
+ }
+
+ if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
+ {
+ /* script completed */
+ st->state = CSTATE_FINISHED;
+ break;
+ }
+
+ /* next transaction (script) */
+ st->state = CSTATE_CHOOSE_SCRIPT;
+
+ /*
+ * Ensure that we always return on this point, so as to avoid
+ * an infinite loop if the script only contains meta commands.
+ */
+ return;
+ }
/*
* Final states. Close the connection if it's still open.
@@ -3816,6 +4418,43 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
return CSTATE_END_COMMAND;
}
+/*
+ * Return the number of failed transactions.
+ */
+static int64
+getFailures(const StatsData *stats)
+{
+ return (stats->serialization_failures +
+ stats->deadlock_failures);
+}
+
+/*
+ * Return a string constant representing the result of a transaction
+ * that is not successfully processed.
+ */
+static const char *
+getResultString(bool skipped, EStatus estatus)
+{
+ if (skipped)
+ return "skipped";
+ else if (failures_detailed)
+ {
+ switch (estatus)
+ {
+ case ESTATUS_SERIALIZATION_ERROR:
+ return "serialization";
+ case ESTATUS_DEADLOCK_ERROR:
+ return "deadlock";
+ default:
+ /* internal error which should never occur */
+ pg_log_fatal("unexpected error status: %d", estatus);
+ exit(1);
+ }
+ }
+ else
+ return "failed";
+}
+
/*
* Print log entry after completing one transaction.
*
@@ -3863,6 +4502,14 @@ doLog(TState *thread, CState *st,
agg->latency.sum2,
agg->latency.min,
agg->latency.max);
+
+ if (failures_detailed)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->serialization_failures,
+ agg->deadlock_failures);
+ else
+ fprintf(logfile, " " INT64_FORMAT, getFailures(agg));
+
if (throttle_delay)
{
fprintf(logfile, " %.0f %.0f %.0f %.0f",
@@ -3873,6 +4520,10 @@ doLog(TState *thread, CState *st,
if (latency_limit)
fprintf(logfile, " " INT64_FORMAT, agg->skipped);
}
+ if (max_tries != 1)
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ agg->retried,
+ agg->retries);
fputc('\n', logfile);
/* reset data and move to next interval */
@@ -3880,22 +4531,26 @@ doLog(TState *thread, CState *st,
}
/* accumulate the current transaction */
- accumStats(agg, skipped, latency, lag);
+ accumStats(agg, skipped, latency, lag, st->estatus, st->tries);
}
else
{
/* no, print raw transactions */
- if (skipped)
- fprintf(logfile, "%d " INT64_FORMAT " skipped %d " INT64_FORMAT " "
- INT64_FORMAT,
- st->id, st->cnt, st->use_file, now / 1000000, now % 1000000);
- else
+ if (!skipped && st->estatus == ESTATUS_NO_ERROR)
fprintf(logfile, "%d " INT64_FORMAT " %.0f %d " INT64_FORMAT " "
INT64_FORMAT,
st->id, st->cnt, latency, st->use_file,
now / 1000000, now % 1000000);
+ else
+ fprintf(logfile, "%d " INT64_FORMAT " %s %d " INT64_FORMAT " "
+ INT64_FORMAT,
+ st->id, st->cnt, getResultString(skipped, st->estatus),
+ st->use_file, now / 1000000, now % 1000000);
+
if (throttle_delay)
fprintf(logfile, " %.0f", lag);
+ if (max_tries != 1)
+ fprintf(logfile, " %d", st->tries - 1);
fputc('\n', logfile);
}
}
@@ -3904,7 +4559,8 @@ doLog(TState *thread, CState *st,
* Accumulate and report statistics at end of a transaction.
*
* (This is also called when a transaction is late and thus skipped.
- * Note that even skipped transactions are counted in the "cnt" fields.)
+ * Note that even skipped and failed transactions are counted in the CState
+ * "cnt" field.)
*/
static void
processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
@@ -3912,10 +4568,10 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
{
double latency = 0.0,
lag = 0.0;
- bool thread_details = progress || throttle_delay || latency_limit,
- detailed = thread_details || use_log || per_script_stats;
+ bool detailed = progress || throttle_delay || latency_limit ||
+ use_log || per_script_stats;
- if (detailed && !skipped)
+ if (detailed && !skipped && st->estatus == ESTATUS_NO_ERROR)
{
pg_time_now_lazy(now);
@@ -3924,20 +4580,12 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
lag = st->txn_begin - st->txn_scheduled;
}
- if (thread_details)
- {
- /* keep detailed thread stats */
- accumStats(&thread->stats, skipped, latency, lag);
+ /* keep detailed thread stats */
+ accumStats(&thread->stats, skipped, latency, lag, st->estatus, st->tries);
- /* count transactions over the latency limit, if needed */
- if (latency_limit && latency > latency_limit)
- thread->latency_late++;
- }
- else
- {
- /* no detailed stats, just count */
- thread->stats.cnt++;
- }
+ /* count transactions over the latency limit, if needed */
+ if (latency_limit && latency > latency_limit)
+ thread->latency_late++;
/* client stat is just counting */
st->cnt++;
@@ -3947,7 +4595,8 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
/* XXX could use a mutex here, but we choose not to */
if (per_script_stats)
- accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
+ accumStats(&sql_script[st->use_file].stats, skipped, latency, lag,
+ st->estatus, st->tries);
}
@@ -4806,6 +5455,8 @@ create_sql_command(PQExpBuffer buf, const char *source)
my_command->type = SQL_COMMAND;
my_command->meta = META_NONE;
my_command->argc = 0;
+ my_command->retries = 0;
+ my_command->failures = 0;
memset(my_command->argv, 0, sizeof(my_command->argv));
my_command->varprefix = NULL; /* allocated later, if needed */
my_command->expr = NULL;
@@ -5474,7 +6125,9 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
{
/* generate and show report */
pg_time_usec_t run = now - *last_report;
- int64 ntx;
+ int64 cnt,
+ failures,
+ retried;
double tps,
total_run,
latency,
@@ -5501,23 +6154,30 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
mergeSimpleStats(&cur.lag, &threads[i].stats.lag);
cur.cnt += threads[i].stats.cnt;
cur.skipped += threads[i].stats.skipped;
+ cur.retries += threads[i].stats.retries;
+ cur.retried += threads[i].stats.retried;
+ cur.serialization_failures +=
+ threads[i].stats.serialization_failures;
+ cur.deadlock_failures += threads[i].stats.deadlock_failures;
}
/* we count only actually executed transactions */
- ntx = (cur.cnt - cur.skipped) - (last->cnt - last->skipped);
+ cnt = cur.cnt - last->cnt;
total_run = (now - test_start) / 1000000.0;
- tps = 1000000.0 * ntx / run;
- if (ntx > 0)
+ tps = 1000000.0 * cnt / run;
+ if (cnt > 0)
{
- latency = 0.001 * (cur.latency.sum - last->latency.sum) / ntx;
- sqlat = 1.0 * (cur.latency.sum2 - last->latency.sum2) / ntx;
+ latency = 0.001 * (cur.latency.sum - last->latency.sum) / cnt;
+ sqlat = 1.0 * (cur.latency.sum2 - last->latency.sum2) / cnt;
stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
- lag = 0.001 * (cur.lag.sum - last->lag.sum) / ntx;
+ lag = 0.001 * (cur.lag.sum - last->lag.sum) / cnt;
}
else
{
latency = sqlat = stdev = lag = 0;
}
+ failures = getFailures(&cur) - getFailures(last);
+ retried = cur.retried - last->retried;
if (progress_timestamp)
{
@@ -5531,8 +6191,8 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
}
fprintf(stderr,
- "progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
- tbuf, tps, latency, stdev);
+ "progress: %s, %.1f tps, lat %.3f ms stddev %.3f, " INT64_FORMAT " failed",
+ tbuf, tps, latency, stdev, failures);
if (throttle_delay)
{
@@ -5541,6 +6201,12 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
fprintf(stderr, ", " INT64_FORMAT " skipped",
cur.skipped - last->skipped);
}
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (max_tries != 1)
+ fprintf(stderr,
+ ", " INT64_FORMAT " retried, " INT64_FORMAT " retries",
+ retried, cur.retries - last->retries);
fprintf(stderr, "\n");
*last = cur;
@@ -5600,9 +6266,10 @@ printResults(StatsData *total,
int64 latency_late)
{
/* tps is about actually executed transactions during benchmarking */
- int64 ntx = total->cnt - total->skipped;
+ int64 failures = getFailures(total);
+ int64 total_cnt = total->cnt + total->skipped + failures;
double bench_duration = PG_TIME_GET_DOUBLE(total_duration);
- double tps = ntx / bench_duration;
+ double tps = total->cnt / bench_duration;
/* Report test parameters. */
printf("transaction type: %s\n",
@@ -5615,39 +6282,65 @@ printResults(StatsData *total,
printf("query mode: %s\n", QUERYMODE[querymode]);
printf("number of clients: %d\n", nclients);
printf("number of threads: %d\n", nthreads);
+
+ if (max_tries)
+ printf("maximum number of tries: %d\n", max_tries);
+
if (duration <= 0)
{
printf("number of transactions per client: %d\n", nxacts);
printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
- ntx, nxacts * nclients);
+ total->cnt, nxacts * nclients);
}
else
{
printf("duration: %d s\n", duration);
printf("number of transactions actually processed: " INT64_FORMAT "\n",
- ntx);
+ total->cnt);
+ }
+
+ printf("number of failed transactions: " INT64_FORMAT " (%.3f%%)\n",
+ failures, 100.0 * failures / total_cnt);
+
+ if (failures_detailed)
+ {
+ printf("number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->serialization_failures,
+ 100.0 * total->serialization_failures / total_cnt);
+ printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->deadlock_failures,
+ 100.0 * total->deadlock_failures / total_cnt);
+ }
+
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (max_tries != 1)
+ {
+ printf("number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
+ total->retried, 100.0 * total->retried / total_cnt);
+ printf("total number of retries: " INT64_FORMAT "\n", total->retries);
}
/* Remaining stats are nonsensical if we failed to execute any xacts */
- if (total->cnt <= 0)
+ if (total->cnt + total->skipped <= 0)
return;
if (throttle_delay && latency_limit)
printf("number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
- total->skipped, 100.0 * total->skipped / total->cnt);
+ total->skipped, 100.0 * total->skipped / total_cnt);
if (latency_limit)
printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT "/" INT64_FORMAT " (%.3f%%)\n",
- latency_limit / 1000.0, latency_late, ntx,
- (ntx > 0) ? 100.0 * latency_late / ntx : 0.0);
+ latency_limit / 1000.0, latency_late, total->cnt,
+ (total->cnt > 0) ? 100.0 * latency_late / total->cnt : 0.0);
if (throttle_delay || progress || latency_limit)
printSimpleStats("latency", &total->latency);
else
{
/* no measurement, show average latency computed from run time */
- printf("latency average = %.3f ms\n",
- 0.001 * total_duration * nclients / total->cnt);
+ printf("latency average = %.3f ms%s\n",
+ 0.001 * total_duration * nclients / total_cnt,
+ failures > 0 ? " (including failures)" : "");
}
if (throttle_delay)
@@ -5673,7 +6366,7 @@ printResults(StatsData *total,
*/
if (is_connect)
{
- printf("average connection time = %.3f ms\n", 0.001 * conn_total_duration / total->cnt);
+ printf("average connection time = %.3f ms\n", 0.001 * conn_total_duration / (total->cnt + failures));
printf("tps = %f (including reconnection times)\n", tps);
}
else
@@ -5692,6 +6385,9 @@ printResults(StatsData *total,
if (per_script_stats)
{
StatsData *sstats = &sql_script[i].stats;
+ int64 script_failures = getFailures(sstats);
+ int64 script_total_cnt =
+ sstats->cnt + sstats->skipped + script_failures;
printf("SQL script %d: %s\n"
" - weight: %d (targets %.1f%% of total)\n"
@@ -5701,25 +6397,55 @@ printResults(StatsData *total,
100.0 * sql_script[i].weight / total_weight,
sstats->cnt,
100.0 * sstats->cnt / total->cnt,
- (sstats->cnt - sstats->skipped) / bench_duration);
+ sstats->cnt / bench_duration);
+
+ printf(" - number of failed transactions: " INT64_FORMAT " (%.3f%%)\n",
+ script_failures,
+ 100.0 * script_failures / script_total_cnt);
+
+ if (failures_detailed)
+ {
+ printf(" - number of serialization failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->serialization_failures,
+ (100.0 * sstats->serialization_failures /
+ script_total_cnt));
+ printf(" - number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->deadlock_failures,
+ (100.0 * sstats->deadlock_failures /
+ script_total_cnt));
+ }
- if (throttle_delay && latency_limit && sstats->cnt > 0)
+ /* it can be non-zero only if max_tries is not equal to one */
+ if (max_tries != 1)
+ {
+ printf(" - number of transactions retried: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->retried,
+ 100.0 * sstats->retried / script_total_cnt);
+ printf(" - total number of retries: " INT64_FORMAT "\n",
+ sstats->retries);
+ }
+
+ if (throttle_delay && latency_limit && script_total_cnt > 0)
printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
sstats->skipped,
- 100.0 * sstats->skipped / sstats->cnt);
+ 100.0 * sstats->skipped / script_total_cnt);
printSimpleStats(" - latency", &sstats->latency);
}
- /* Report per-command latencies */
+ /*
+ * Report per-command statistics: latencies, retries after errors,
+ * failures (errors without retrying).
+ */
if (report_per_command)
{
Command **commands;
- if (per_script_stats)
- printf(" - statement latencies in milliseconds:\n");
- else
- printf("statement latencies in milliseconds:\n");
+ printf("%sstatement latencies in milliseconds%s:\n",
+ per_script_stats ? " - " : "",
+ (max_tries == 1 ?
+ " and failures" :
+ ", failures and retries"));
for (commands = sql_script[i].commands;
*commands != NULL;
@@ -5727,10 +6453,19 @@ printResults(StatsData *total,
{
SimpleStats *cstats = &(*commands)->stats;
- printf(" %11.3f %s\n",
- (cstats->count > 0) ?
- 1000.0 * cstats->sum / cstats->count : 0.0,
- (*commands)->first_line);
+ if (max_tries == 1)
+ printf(" %11.3f %10" INT64_MODIFIER "d %s\n",
+ (cstats->count > 0) ?
+ 1000.0 * cstats->sum / cstats->count : 0.0,
+ (*commands)->failures,
+ (*commands)->first_line);
+ else
+ printf(" %11.3f %10" INT64_MODIFIER "d %10" INT64_MODIFIER "d %s\n",
+ (cstats->count > 0) ?
+ 1000.0 * cstats->sum / cstats->count : 0.0,
+ (*commands)->failures,
+ (*commands)->retries,
+ (*commands)->first_line);
}
}
}
@@ -5810,7 +6545,7 @@ main(int argc, char **argv)
{"progress", required_argument, NULL, 'P'},
{"protocol", required_argument, NULL, 'M'},
{"quiet", no_argument, NULL, 'q'},
- {"report-latencies", no_argument, NULL, 'r'},
+ {"report-per-command", no_argument, NULL, 'r'},
{"rate", required_argument, NULL, 'R'},
{"scale", required_argument, NULL, 's'},
{"select-only", no_argument, NULL, 'S'},
@@ -5832,6 +6567,9 @@ main(int argc, char **argv)
{"show-script", required_argument, NULL, 10},
{"partitions", required_argument, NULL, 11},
{"partition-method", required_argument, NULL, 12},
+ {"failures-detailed", no_argument, NULL, 13},
+ {"max-tries", required_argument, NULL, 14},
+ {"verbose-errors", no_argument, NULL, 15},
{NULL, 0, NULL, 0}
};
@@ -6185,6 +6923,28 @@ main(int argc, char **argv)
exit(1);
}
break;
+ case 13: /* failures-detailed */
+ benchmarking_option_set = true;
+ failures_detailed = true;
+ break;
+ case 14: /* max-tries */
+ {
+ int32 max_tries_arg = atoi(optarg);
+
+ if (max_tries_arg < 0)
+ {
+ pg_log_fatal("invalid number of maximum tries: \"%s\"", optarg);
+ exit(1);
+ }
+
+ benchmarking_option_set = true;
+ max_tries = (uint32) max_tries_arg;
+ }
+ break;
+ case 15: /* verbose-errors */
+ benchmarking_option_set = true;
+ verbose_errors = true;
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
@@ -6366,6 +7126,15 @@ main(int argc, char **argv)
exit(1);
}
+ if (!max_tries)
+ {
+ if (!latency_limit && duration <= 0)
+ {
+ pg_log_fatal("an unlimited number of transaction tries can only be used with --latency-limit or a duration (-T)");
+ exit(1);
+ }
+ }
+
/*
* save main process id in the global variable because process id will be
* changed after fork.
@@ -6578,6 +7347,10 @@ main(int argc, char **argv)
mergeSimpleStats(&stats.lag, &thread->stats.lag);
stats.cnt += thread->stats.cnt;
stats.skipped += thread->stats.skipped;
+ stats.retries += thread->stats.retries;
+ stats.retried += thread->stats.retried;
+ stats.serialization_failures += thread->stats.serialization_failures;
+ stats.deadlock_failures += thread->stats.deadlock_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
@@ -6724,7 +7497,8 @@ threadRun(void *arg)
if (min_usec > this_usec)
min_usec = this_usec;
}
- else if (st->state == CSTATE_WAIT_RESULT)
+ else if (st->state == CSTATE_WAIT_RESULT ||
+ st->state == CSTATE_WAIT_ROLLBACK_RESULT)
{
/*
* waiting for result from server - nothing to do unless the
@@ -6813,7 +7587,8 @@ threadRun(void *arg)
{
CState *st = &state[i];
- if (st->state == CSTATE_WAIT_RESULT)
+ if (st->state == CSTATE_WAIT_RESULT ||
+ st->state == CSTATE_WAIT_ROLLBACK_RESULT)
{
/* don't call advanceConnectionState unless data is available */
int sock = PQsocket(st->con);
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index f1341092fe..d173ceae7a 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -11,7 +11,9 @@ use Config;
# start a pgbench specific server
my $node = PostgreSQL::Test::Cluster->new('main');
-$node->init;
+# Set to untranslated messages, to be able to compare program output with
+# expected strings.
+$node->init(extra => [ '--locale', 'C' ]);
$node->start;
# tablespace for testing, because partitioned tables cannot use pg_default
@@ -109,7 +111,8 @@ $node->pgbench(
qr{builtin: TPC-B},
qr{clients: 2\b},
qr{processed: 10/10},
- qr{mode: simple}
+ qr{mode: simple},
+ qr{maximum number of tries: 1}
],
[qr{^$}],
'pgbench tpcb-like');
@@ -1198,6 +1201,214 @@ $node->pgbench(
check_pgbench_logs($bdir, '001_pgbench_log_3', 1, 10, 10,
qr{^0 \d{1,2} \d+ \d \d+ \d+$});
+# abort the client if the script contains an incomplete transaction block
+$node->pgbench(
+ '--no-vacuum', 2, [ qr{processed: 1/10} ],
+ [ qr{client 0 aborted: end of script reached without completing the last transaction} ],
+ 'incomplete transaction block',
+ { '001_pgbench_incomplete_transaction_block' => q{BEGIN;SELECT 1;} });
+
+# Test the concurrent update in the table row and deadlocks.
+
+$node->safe_psql('postgres',
+ 'CREATE UNLOGGED TABLE first_client_table (value integer); '
+ . 'CREATE UNLOGGED TABLE xy (x integer, y integer); '
+ . 'INSERT INTO xy VALUES (1, 2);');
+
+# Serialization error and retry
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read";
+
+# Check that we have a serialization error and the same random value of the
+# delta variable in the next try
+my $err_pattern =
+ "client (0|1) got an error in command 3 \\(SQL\\) of script 0; "
+ . "ERROR: could not serialize access due to concurrent update\\b.*"
+ . "\\g1";
+
+$node->pgbench(
+ "-n -c 2 -t 1 -d --verbose-errors --max-tries 2",
+ 0,
+ [ qr{processed: 2/2\b}, qr{number of transactions retried: 1\b},
+ qr{total number of retries: 1\b} ],
+ [ qr/$err_pattern/s ],
+ 'concurrent update with retrying',
+ {
+ '001_pgbench_serialization' => q{
+-- What's happening:
+-- The first client starts the transaction with the isolation level Repeatable
+-- Read:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+--
+-- The second client starts a similar transaction with the same isolation level:
+--
+-- BEGIN;
+-- UPDATE xy SET y = ... WHERE x = 1;
+-- <waiting for the first client>
+--
+-- The first client commits its transaction, and the second client gets a
+-- serialization error.
+
+\set delta random(-5000, 5000)
+
+-- The second client will stop here
+SELECT pg_advisory_lock(0);
+
+-- Start transaction with concurrent update
+BEGIN;
+UPDATE xy SET y = y + :delta WHERE x = 1 AND pg_advisory_lock(1) IS NOT NULL;
+
+-- Wait for the second client
+DO $$
+DECLARE
+ exists boolean;
+ waiters integer;
+BEGIN
+ -- The second client always comes in second, and the number of rows in the
+ -- table first_client_table reflect this. Here the first client inserts a row,
+ -- so the second client will see a non-empty table when repeating the
+ -- transaction after the serialization error.
+ SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+ IF NOT exists THEN
+ -- Let the second client begin
+ PERFORM pg_advisory_unlock(0);
+ -- And wait until the second client tries to get the same lock
+ LOOP
+ SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+ locktype = 'advisory' AND objsubid = 1 AND
+ ((classid::bigint << 32) | objid::bigint = 1::bigint) AND NOT granted;
+ IF waiters = 1 THEN
+ INSERT INTO first_client_table VALUES (1);
+
+ -- Exit loop
+ EXIT;
+ END IF;
+ END LOOP;
+ END IF;
+END$$;
+
+COMMIT;
+SELECT pg_advisory_unlock_all();
+}
+ });
+
+# Clean up
+
+$node->safe_psql('postgres', 'DELETE FROM first_client_table;');
+
+local $ENV{PGOPTIONS} = "-c default_transaction_isolation=read\\ committed";
+
+# Deadlock error and retry
+
+# Check that we have a deadlock error
+$err_pattern =
+ "client (0|1) got an error in command (3|5) \\(SQL\\) of script 0; "
+ . "ERROR: deadlock detected\\b";
+
+$node->pgbench(
+ "-n -c 2 -t 1 --max-tries 2 --verbose-errors",
+ 0,
+ [ qr{processed: 2/2\b}, qr{number of transactions retried: 1\b},
+ qr{total number of retries: 1\b} ],
+ [ qr{$err_pattern} ],
+ 'deadlock with retrying',
+ {
+ '001_pgbench_deadlock' => q{
+-- What's happening:
+-- The first client gets the lock 2.
+-- The second client gets the lock 3 and tries to get the lock 2.
+-- The first client tries to get the lock 3 and one of them gets a deadlock
+-- error.
+--
+-- A client that does not get a deadlock error must hold a lock at the
+-- transaction start. Thus in the end it releases all of its locks before the
+-- client with the deadlock error starts a retry (we do not want any errors
+-- again).
+
+-- Since the client with the deadlock error has not released the blocking locks,
+-- let's do this here.
+SELECT pg_advisory_unlock_all();
+
+-- The second client and the client with the deadlock error stop here
+SELECT pg_advisory_lock(0);
+SELECT pg_advisory_lock(1);
+
+-- The second client and the client with the deadlock error always come after
+-- the first and the number of rows in the table first_client_table reflects
+-- this. Here the first client inserts a row, so in the future the table is
+-- always non-empty.
+DO $$
+DECLARE
+ exists boolean;
+BEGIN
+ SELECT EXISTS (SELECT * FROM first_client_table) INTO STRICT exists;
+ IF exists THEN
+ -- We are the second client or the client with the deadlock error
+
+ -- The first client will take care by itself of this lock (see below)
+ PERFORM pg_advisory_unlock(0);
+
+ PERFORM pg_advisory_lock(3);
+
+ -- The second client can get a deadlock here
+ PERFORM pg_advisory_lock(2);
+ ELSE
+ -- We are the first client
+
+ -- This code should not be used in a new transaction after an error
+ INSERT INTO first_client_table VALUES (1);
+
+ PERFORM pg_advisory_lock(2);
+ END IF;
+END$$;
+
+DO $$
+DECLARE
+ num_rows integer;
+ waiters integer;
+BEGIN
+ -- Check if we are the first client
+ SELECT COUNT(*) FROM first_client_table INTO STRICT num_rows;
+ IF num_rows = 1 THEN
+ -- This code should not be used in a new transaction after an error
+ INSERT INTO first_client_table VALUES (2);
+
+ -- Let the second client begin
+ PERFORM pg_advisory_unlock(0);
+ PERFORM pg_advisory_unlock(1);
+
+ -- Make sure the second client is ready for deadlock
+ LOOP
+ SELECT COUNT(*) INTO STRICT waiters FROM pg_locks WHERE
+ locktype = 'advisory' AND
+ objsubid = 1 AND
+ ((classid::bigint << 32) | objid::bigint = 2::bigint) AND
+ NOT granted;
+
+ IF waiters = 1 THEN
+ -- Exit loop
+ EXIT;
+ END IF;
+ END LOOP;
+
+ PERFORM pg_advisory_lock(0);
+ -- And the second client took care by itself of the lock 1
+ END IF;
+END$$;
+
+-- The first client can get a deadlock here
+SELECT pg_advisory_lock(3);
+
+SELECT pg_advisory_unlock_all();
+}
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE first_client_table, xy;');
+
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index acad19edd0..a5074c70d9 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -188,6 +188,16 @@ my @options = (
'-i --partition-method=hash',
[qr{partition-method requires greater than zero --partitions}]
],
+ [
+ 'bad maximum number of tries',
+ '--max-tries -10',
+ [qr{invalid number of maximum tries: "-10"}]
+ ],
+ [
+ 'an infinite number of tries',
+ '--max-tries 0',
+ [qr{an unlimited number of transaction tries can only be used with --latency-limit or a duration}]
+ ],
# logging sub-options
[
diff --git a/src/fe_utils/conditional.c b/src/fe_utils/conditional.c
index 0bf877e895..5a94664989 100644
--- a/src/fe_utils/conditional.c
+++ b/src/fe_utils/conditional.c
@@ -24,13 +24,25 @@ conditional_stack_create(void)
}
/*
- * destroy stack
+ * Destroy all the elements from the stack. The stack itself is not freed.
*/
void
-conditional_stack_destroy(ConditionalStack cstack)
+conditional_stack_reset(ConditionalStack cstack)
{
+ if (!cstack)
+ return; /* nothing to do here */
+
while (conditional_stack_pop(cstack))
continue;
+}
+
+/*
+ * destroy stack
+ */
+void
+conditional_stack_destroy(ConditionalStack cstack)
+{
+ conditional_stack_reset(cstack);
free(cstack);
}
diff --git a/src/include/fe_utils/conditional.h b/src/include/fe_utils/conditional.h
index b28189471c..fa53d86501 100644
--- a/src/include/fe_utils/conditional.h
+++ b/src/include/fe_utils/conditional.h
@@ -73,6 +73,8 @@ typedef struct ConditionalStackData *ConditionalStack;
extern ConditionalStack conditional_stack_create(void);
+extern void conditional_stack_reset(ConditionalStack cstack);
+
extern void conditional_stack_destroy(ConditionalStack cstack);
extern int conditional_stack_depth(ConditionalStack cstack);
--
2.17.1
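For readers who want to consume the new per-transaction log lines, here is a minimal sketch of a parser following the doLog() format in the patch above (client id, transaction count, latency or result string, script number, epoch seconds, microseconds). The dictionary keys are my own labels, not names from the patch, and the optional schedule-lag and retries columns are left unparsed for brevity:

```python
# Minimal parser for the per-transaction log format emitted by doLog() above.
# The third field is either a latency (microseconds, printed with %.0f) or one
# of the result strings from getResultString(): "skipped", "failed",
# "serialization" or "deadlock". Dictionary keys are my own labels; the
# optional schedule-lag and retries columns are ignored for brevity.
RESULT_STRINGS = {"skipped", "failed", "serialization", "deadlock"}

def parse_log_line(line):
    fields = line.split()
    entry = {
        "client_id": int(fields[0]),
        "txn_no": int(fields[1]),
        "script_no": int(fields[3]),
        "time_epoch": int(fields[4]),
        "time_us": int(fields[5]),
    }
    if fields[2] in RESULT_STRINGS:
        entry["status"] = fields[2]
        entry["latency_us"] = None
    else:
        entry["status"] = "ok"
        entry["latency_us"] = float(fields[2])
    return entry

print(parse_log_line("0 199 2241 0 1175850568 995598"))
print(parse_log_line("1 4 serialization 0 1499414498 84905"))
```

This mirrors the two fprintf branches in the patched doLog(): successful transactions keep the numeric latency field, while skipped and failed ones carry a result string in its place.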
I've checked other places using <xref/> referring to <refsect2>, and found
that "xreflabel"s are used in such <refsect2> tags. So, I'll fix it
in this style.

I attached the updated patch. I also fixed the following paragraph which I had
forgotten to fix in the previous patch.

The first seven lines report some of the most important parameter settings.
The sixth line reports the maximum number of tries for transactions with
serialization or deadlock errors.
Thank you for the updated patch. I think the patches look good and now
it's ready for commit. If there's no objection, I would like to
commit/push the patches.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
I attached the updated patch. I also fixed the following paragraph which I had
forgotten to fix in the previous patch.

The first seven lines report some of the most important parameter settings.
The sixth line reports the maximum number of tries for transactions with
serialization or deadlock errors.

Thank you for the updated patch. I think the patches look good and now
it's ready for commit. If there's no objection, I would like to
commit/push the patches.
The patch Pushed. Thank you!
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Tatsuo Ishii <ishii@sraoss.co.jp> writes:
The patch Pushed. Thank you!
My hoary animal prairiedog doesn't like this [1]:
# Failed test 'concurrent update with retrying stderr /(?s-xim:client (0|1) got an error in command 3 \\(SQL\\) of script 0; ERROR: could not serialize access due to concurrent update\\b.*\\g1)/'
# at t/001_pgbench_with_server.pl line 1229.
# 'pgbench: pghost: /tmp/nhghgwAoki pgport: 58259 nclients: 2 nxacts: 1 dbName: postgres
...
# pgbench: client 0 got an error in command 3 (SQL) of script 0; ERROR: could not serialize access due to concurrent update
...
# '
# doesn't match '(?s-xim:client (0|1) got an error in command 3 \\(SQL\\) of script 0; ERROR: could not serialize access due to concurrent update\\b.*\\g1)'
# Looks like you failed 1 test of 425.
I'm not sure what the "\\b.*\\g1" part of this regex is meant to
accomplish, but it seems to be assuming more than it should
about the output format of TAP messages.
regards, tom lane
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2022-03-23%2013%3A21%3A44
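To illustrate what the trailing backreference is presumably meant to do — require the captured client number to reappear later in the multi-line stderr — here is a rough Python sketch; the sample text is invented, and the second line merely stands in for whatever pgbench's retry message actually looks like:

```python
import re

# Group 1 captures which client hit the serialization error; the trailing
# backreference then requires that same client number to appear again later
# in the (DOTALL) match. The sample stderr text below is invented.
pattern = re.compile(
    r"client (0|1) got an error in command 3 \(SQL\) of script 0; "
    r"ERROR: could not serialize access due to concurrent update\b.*\1",
    re.DOTALL)

stderr = ("client 1 got an error in command 3 (SQL) of script 0; "
          "ERROR: could not serialize access due to concurrent update\n"
          "client 1 repeats the transaction (try 2/2)\n")

print(bool(pattern.search(stderr)))                  # full two-line text
print(bool(pattern.search(stderr.splitlines()[0])))  # error line only
```

Since group 1 is just a single digit here, the backreference only asserts that the same digit occurs somewhere afterwards — which is exactly the weakness the later fix addresses by capturing the whole statement instead.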
On Wed, 23 Mar 2022 14:26:54 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Tatsuo Ishii <ishii@sraoss.co.jp> writes:
The patch Pushed. Thank you!
My hoary animal prairiedog doesn't like this [1]:
# Failed test 'concurrent update with retrying stderr /(?s-xim:client (0|1) got an error in command 3 \\(SQL\\) of script 0; ERROR: could not serialize access due to concurrent update\\b.*\\g1)/'
# at t/001_pgbench_with_server.pl line 1229.
# 'pgbench: pghost: /tmp/nhghgwAoki pgport: 58259 nclients: 2 nxacts: 1 dbName: postgres
...
# pgbench: client 0 got an error in command 3 (SQL) of script 0; ERROR: could not serialize access due to concurrent update
...
# '
# doesn't match '(?s-xim:client (0|1) got an error in command 3 \\(SQL\\) of script 0; ERROR: could not serialize access due to concurrent update\\b.*\\g1)'
# Looks like you failed 1 test of 425.

I'm not sure what the "\\b.*\\g1" part of this regex is meant to
accomplish, but it seems to be assuming more than it should
about the output format of TAP messages.
I edited the test code from the original patch by mistake, but did not
notice because the test somehow passes on my machine without any
errors.
I attached a patch that restores the test to what was in the original patch,
where backreferences are used to check that the same query is retried.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
fix_pgbench_test.patch (text/x-diff)
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index d173ceae7a..3eb5905e5a 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1222,7 +1222,8 @@ local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read";
# Check that we have a serialization error and the same random value of the
# delta variable in the next try
my $err_pattern =
- "client (0|1) got an error in command 3 \\(SQL\\) of script 0; "
+ "(client (0|1) sending UPDATE xy SET y = y \\+ -?\\d+\\b).*"
+ . "client \\g2 got an error in command 3 \\(SQL\\) of script 0; "
. "ERROR: could not serialize access due to concurrent update\\b.*"
. "\\g1";
My hoary animal prairiedog doesn't like this [1]:
# Failed test 'concurrent update with retrying stderr /(?s-xim:client (0|1) got an error in command 3 \\(SQL\\) of script 0; ERROR: could not serialize access due to concurrent update\\b.*\\g1)/'
# at t/001_pgbench_with_server.pl line 1229.
# 'pgbench: pghost: /tmp/nhghgwAoki pgport: 58259 nclients: 2 nxacts: 1 dbName: postgres
...
# pgbench: client 0 got an error in command 3 (SQL) of script 0; ERROR: could not serialize access due to concurrent update
...
# '
# doesn't match '(?s-xim:client (0|1) got an error in command 3 \\(SQL\\) of script 0; ERROR: could not serialize access due to concurrent update\\b.*\\g1)'
# Looks like you failed 1 test of 425.

I'm not sure what the "\\b.*\\g1" part of this regex is meant to
accomplish, but it seems to be assuming more than it should
about the output format of TAP messages.

I edited the test code from the original patch by mistake, but did not
notice because the test somehow passes on my machine without any errors.

I attached a patch that restores the test to what was in the original patch,
where backreferences are used to check that the same query is retried.
My machine (Ubuntu 20) did not complain either. Maybe a Perl version
difference? Anyway, the fix is pushed. Let's see how prairiedog feels.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Tatsuo Ishii <ishii@sraoss.co.jp> writes:
My hoary animal prairiedog doesn't like this [1]:
My machine (Ubuntu 20) did not complain either. Maybe perl version
difference? Any way, the fix pushed. Let's see how prairiedog feels.
Still not happy. After some digging in man pages, I believe the
problem is that its old version of Perl does not understand "\gN"
backreferences. Is there a good reason to be using that rather
than the traditional "\N" backref notation?
regards, tom lane
My machine (Ubuntu 20) did not complain either. Maybe a Perl version
difference? Anyway, the fix is pushed. Let's see how prairiedog feels.

Still not happy. After some digging in man pages, I believe the
problem is that its old version of Perl does not understand "\gN"
backreferences. Is there a good reason to be using that rather
than the traditional "\N" backref notation?
I don't see a reason to use "\gN" either. Actually, after applying the
attached patch, my machine is still happy with the pgbench test.
Yugo?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Attachments:
fix_pgbench_test_v2.patch (text/x-patch; charset=us-ascii)
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 60cae1e843..22a23489e8 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1224,7 +1224,7 @@ my $err_pattern =
"(client (0|1) sending UPDATE xy SET y = y \\+ -?\\d+\\b).*"
. "client \\g2 got an error in command 3 \\(SQL\\) of script 0; "
. "ERROR: could not serialize access due to concurrent update\\b.*"
- . "\\g1";
+ . "\\1";
$node->pgbench(
"-n -c 2 -t 1 -d --verbose-errors --max-tries 2",
Tatsuo Ishii <ishii@sraoss.co.jp> writes:
I don't see a reason to use "\gN" either. Actually, after applying the
attached patch, my machine is still happy with the pgbench test.
Note that the \\g2 just above also needs to be changed.
regards, tom lane
Note that the \\g2 just above also needs to be changed.
Oops. Thanks. New patch attached. Test has passed on my machine.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Attachments:
fix_pgbench_test_v3.patch (text/x-patch; charset=us-ascii)
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 60cae1e843..ca71f968dc 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1222,9 +1222,9 @@ local $ENV{PGOPTIONS} = "-c default_transaction_isolation=repeatable\\ read";
# delta variable in the next try
my $err_pattern =
"(client (0|1) sending UPDATE xy SET y = y \\+ -?\\d+\\b).*"
- . "client \\g2 got an error in command 3 \\(SQL\\) of script 0; "
+ . "client \\2 got an error in command 3 \\(SQL\\) of script 0; "
. "ERROR: could not serialize access due to concurrent update\\b.*"
- . "\\g1";
+ . "\\1";
$node->pgbench(
"-n -c 2 -t 1 -d --verbose-errors --max-tries 2",
Tatsuo Ishii <ishii@sraoss.co.jp> writes:
Oops. Thanks. New patch attached. Test has passed on my machine.
I reproduced the failure on another machine with perl 5.8.8,
and I can confirm that this patch fixes it.
regards, tom lane
On Fri, 25 Mar 2022 09:14:00 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
Note that the \\g2 just above also needs to be changed.
Oops. Thanks. New patch attached. Test has passed on my machine.
This patch works for me. I think it is ok to use \N instead of \gN.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
I reproduced the failure on another machine with perl 5.8.8,
and I can confirm that this patch fixes it.
Thank you for the test. I have pushed the patch.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Oops. Thanks. New patch attached. Test has passed on my machine.
This patch works for me. I think it is ok to use \N instead of \gN.
Thanks. Patch pushed.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Tatsuo Ishii <ishii@sraoss.co.jp> writes:
Thanks. Patch pushed.
This patch has caused the PDF documentation to fail to build cleanly:
[WARN] FOUserAgent - The contents of fo:block line 1 exceed the available area in the inline-progression direction by more than 50 points. (See position 125066:375)
It's complaining about this:
<synopsis>
<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
which runs much too wide in HTML format too, even though that toolchain
doesn't tell you so.
We could silence the warning by inserting an arbitrary line break or two,
or refactoring the syntax description into multiple parts. Either way
seems to create a risk of confusion.
TBH, I think the *real* problem is that the complexity of this log format
has blown past "out of hand". Can't we simplify it? Who is really going
to use all these numbers? I pity the poor sucker who tries to write a
log analysis tool that will handle all the variants.
regards, tom lane
This patch has caused the PDF documentation to fail to build cleanly:
[WARN] FOUserAgent - The contents of fo:block line 1 exceed the available area in the inline-progression direction by more than 50 points. (See position 125066:375)
It's complaining about this:
<synopsis>
<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
which runs much too wide in HTML format too, even though that toolchain
doesn't tell you so.
Yeah.
We could silence the warning by inserting an arbitrary line break or two,
or refactoring the syntax description into multiple parts. Either way
seems to create a risk of confusion.
I think we can fold the line nicely. Here is the rendered image.
Before:
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
After:
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
{ failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
Note that before it was like this:
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ]
So newly added items are "{ failures | serialization_failures deadlock_failures }" and "[ retried retries ]".
TBH, I think the *real* problem is that the complexity of this log format
has blown past "out of hand". Can't we simplify it? Who is really going
to use all these numbers? I pity the poor sucker who tries to write a
log analysis tool that will handle all the variants.
Well, the extra logging items above only appear when the retry feature
is enabled. For those who do not use the feature the only new logging
item is "failures". For those who use the feature, the extra logging
items are apparently necessary. For example, if we write an application
using the repeatable read or serializable transaction isolation mode,
retrying transactions that fail with serialization errors is an essential
technique. Also, the retry rate of transactions deeply affects
performance, and in such use cases the newly added items provide
precious information. I would suggest leaving the log items as they are.
Patch attached.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Attachments:
pgbench-doc.patch (text/x-patch; charset=us-ascii)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ebdb4b3f46..b65b813ebe 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2398,10 +2398,11 @@ END;
<para>
With the <option>--aggregate-interval</option> option, a different
- format is used for the log files:
+ format is used for the log files (note that the actual log line is not folded).
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
+ <replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable>
+ { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
where
On Sun, 27 Mar 2022 15:28:41 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
This patch has caused the PDF documentation to fail to build cleanly:
[WARN] FOUserAgent - The contents of fo:block line 1 exceed the available area in the inline-progression direction by more than 50 points. (See position 125066:375)
It's complaining about this:
<synopsis>
<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
which runs much too wide in HTML format too, even though that toolchain
doesn't tell you so.
Yeah.
We could silence the warning by inserting an arbitrary line break or two,
or refactoring the syntax description into multiple parts. Either way
seems to create a risk of confusion.
I think we can fold the line nicely. Here is the rendered image.
Before:
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
After:
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
{ failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
Note that before it was like this:
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ]
So newly added items are "{ failures | serialization_failures deadlock_failures }" and "[ retried retries ]".
TBH, I think the *real* problem is that the complexity of this log format
has blown past "out of hand". Can't we simplify it? Who is really going
to use all these numbers? I pity the poor sucker who tries to write a
log analysis tool that will handle all the variants.
Well, the extra logging items above only appear when the retry feature
is enabled. For those who do not use the feature the only new logging
item is "failures". For those who use the feature, the extra logging
items are apparently necessary. For example, if we write an application
using the repeatable read or serializable transaction isolation mode,
retrying transactions that fail with serialization errors is an essential
technique. Also, the retry rate of transactions deeply affects
performance, and in such use cases the newly added items provide
precious information. I would suggest leaving the log items as they are.
Patch attached.
Even after applying this patch, "make postgres-A4.pdf" raises the warning on my
machine. After some investigation, I found that the previous document had a break
after 'num_transactions', but it was removed by this commit. So,
I would like to restore it as it was. I attached the patch.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
pgbench-doc_v2.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ebdb4b3f46..4437d5ef53 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2401,7 +2401,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable>&zwsp <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
where
Even after applying this patch, "make postgres-A4.pdf" raises the warning on my
machine. After some investigation, I found that the previous document had a break
after 'num_transactions', but it was removed by this commit.
Yes, your patch removed "&zwsp;".
So,
I would like to restore it as it was. I attached the patch.
This produces errors. Needs ";" postfix?
ref/pgbench.sgml:2404: parser error : EntityRef: expecting ';'
le>interval_start</replaceable> <replaceable>num_transactions</replaceable>&zwsp
^
ref/pgbench.sgml:2781: parser error : chunk is not well balanced
^
reference.sgml:251: parser error : Failure to process entity pgbench
&pgbench;
^
reference.sgml:251: parser error : Entity 'pgbench' not defined
&pgbench;
^
reference.sgml:296: parser error : chunk is not well balanced
^
postgres.sgml:240: parser error : Failure to process entity reference
&reference;
^
postgres.sgml:240: parser error : Entity 'reference' not defined
&reference;
^
make: *** [Makefile:135: html-stamp] Error 1
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
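The well-formedness rule behind the "expecting ';'" errors above is easy to see with any XML parser: a reference must be terminated by a semicolon. A minimal sketch in Python, using the numeric character reference for U+200B (what `&zwsp;` expands to via the DocBook toolchain, since the named entity itself is not defined in bare XML):

```python
import xml.etree.ElementTree as ET

# A properly terminated character reference for ZERO WIDTH SPACE
# parses fine.
ok = ET.fromstring(
    "<synopsis>num_transactions&#8203; sum_latency</synopsis>")
assert ok.text == "num_transactions\u200b sum_latency"

# Dropping the trailing ';' leaves an ill-formed reference, which is
# essentially the parser error quoted above.
try:
    ET.fromstring(
        "<synopsis>num_transactions&#8203 sum_latency</synopsis>")
    well_formed = True
except ET.ParseError:
    well_formed = False
assert not well_formed
```

The zero-width space survives into the parsed text, which is exactly why it gives the PDF/HTML renderers a legal break point without changing the visible synopsis.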
On Mon, 28 Mar 2022 12:17:13 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
Even after applying this patch, "make postgres-A4.pdf" raises the warning on my
machine. After some investigation, I found that the previous document had a break
after 'num_transactions', but it was removed by this commit.
Yes, your patch removed "&zwsp;".
So,
I would like to restore it as it was. I attached the patch.
This produces errors. Needs ";" postfix?
Oops. Yes, it needs ';'. Also, I found another "&zwsp;" dropped.
I attached the fixed patch.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
pgbench-doc_v3.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ebdb4b3f46..b16a5b9b7b 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2401,7 +2401,7 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable>&zwsp; <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable>&zwsp; { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
</synopsis>
where
Even after applying this patch, "make postgres-A4.pdf" raises the warning on my
machine. After some investigation, I found that the previous document had a break
after 'num_transactions', but it was removed by this commit.
Yes, your patch removed "&zwsp;".
So,
I would like to restore it as it was. I attached the patch.
This produces errors. Needs ";" postfix?
Oops. Yes, it needs ';'. Also, I found another "&zwsp;" dropped.
I attached the fixed patch.
The basic problem with this patch is that it may solve the issue with PDF
generation, but it does not solve the issue with HTML generation. The
PDF manual of pgbench has a ridiculously long line, which Tom Lane
complained about too:
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
Why can't we just use line feeds instead of &zwsp;? Although it's not
a command usage, the SELECT manual already uses line feeds to nicely
break command usage into multiple lines.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Even after applying this patch, "make postgres-A4.pdf" raises the warning on my
machine. After some investigation, I found that the previous document had a break
after 'num_transactions', but it was removed by this commit.
Yes, your patch removed "&zwsp;".
So,
I would like to restore it as it was. I attached the patch.
This produces errors. Needs ";" postfix?
Oops. Yes, it needs ';'. Also, I found another "&zwsp;" dropped.
I attached the fixed patch.
The basic problem with this patch is that it may solve the issue with PDF
generation, but it does not solve the issue with HTML generation. The
PDF manual of pgbench has a ridiculously long line, which Tom Lane
I meant "HTML manual" here.
complained about too:
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
Why can't we just use line feeds instead of &zwsp;? Although it's not
a command usage, the SELECT manual already uses line feeds to nicely
break command usage into multiple lines.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Hello,
On 2022-Mar-27, Tatsuo Ishii wrote:
After:
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
{ failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
You're showing an indentation, but looking at the HTML output there is
no such. Is the HTML processor eating leading whitespace or something
like that?
I think that the explanatory paragraph is way too long now, particularly
since it explains --failures-detailed starting in the middle. Also, the
example output doesn't include the failures-detailed mode. I suggest
that this should be broken down even more; first to explain the output
without failures-detailed, including an example, and then the output
with failures-detailed, and an example of that. Something like this,
perhaps:
Aggregated Logging
With the --aggregate-interval option, a different format is used for the log files (note that the actual log line is not folded).
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
failures [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
where interval_start is the start of the interval (as a Unix epoch time stamp), num_transactions is the number of transactions within the interval, sum_latency is the sum of the transaction latencies within the interval, sum_latency_2 is the sum of squares of the transaction latencies within the interval, min_latency is the minimum latency within the interval, max_latency is the maximum latency within the interval, and failures is the number of transactions that ended with a failed SQL command within the interval.
The next fields, sum_lag, sum_lag_2, min_lag, and max_lag, are only present if the --rate option is used. They provide statistics about the time each transaction had to wait for the previous one to finish, i.e., the difference between each transaction's scheduled start time and the time it actually started. The next field, skipped, is only present if the --latency-limit option is used, too. It counts the number of transactions skipped because they would have started too late. The retried and retries fields are present only if the --max-tries option is not equal to 1. They report the number of retried transactions and the sum of all retries after serialization or deadlock errors within the interval. Each transaction is counted in the interval when it was committed.
Notice that while the plain (unaggregated) log file shows which script was used for each transaction, the aggregated log does not. Therefore if you need per-script data, you need to aggregate the data on your own.
Here is some example output:
1345828501 5601 1542744 483552416 61 2573 0
1345828503 7884 1979812 565806736 60 1479 0
1345828505 7208 1979422 567277552 59 1391 0
1345828507 7685 1980268 569784714 60 1398 0
1345828509 7073 1979779 573489941 236 1411 0
If you use option --failures-detailed, instead of the sum of all failed transactions you will get more detailed statistics for the failed transactions:
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
serialization_failures deadlock_failures [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
This is similar to the above, but here the single failures figure is replaced by serialization_failures, the number of transactions that got a serialization error and were not retried afterwards, and deadlock_failures, the number of transactions that got a deadlock error and were not retried afterwards. The other fields are as above. Here is some example output:
[example with detailed failures]
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"If you have nothing to say, maybe you need just the right tool to help you
not say it." (New York Times, about Microsoft PowerPoint)
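The default (non --failures-detailed, non --rate) layout in the draft above is also straightforward to consume programmatically. A minimal parsing sketch; the field names come from the draft text, the sample line from its example output, and the helper name is of course hypothetical:

```python
# Default aggregate-interval layout from the draft documentation text:
# interval_start num_transactions sum_latency sum_latency_2
# min_latency max_latency failures
FIELDS = ["interval_start", "num_transactions", "sum_latency",
          "sum_latency_2", "min_latency", "max_latency", "failures"]

def parse_aggregate_line(line):
    """Map one aggregated log line onto the documented field names."""
    values = [int(v) for v in line.split()]
    if len(values) != len(FIELDS):
        # --rate, --latency-limit and --failures-detailed add columns,
        # which is exactly the ambiguity being debated in this thread.
        raise ValueError("unexpected column count")
    return dict(zip(FIELDS, values))

row = parse_aggregate_line("1345828501 5601 1542744 483552416 61 2573 0")
assert row["num_transactions"] == 5601
assert row["failures"] == 0
```

Tom's point stands out in the error branch: a tool like this can only distinguish the optional layouts by counting columns, which breaks as soon as two variants have the same width.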
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
After:
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
{ failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
I think that the explanatory paragraph is way too long now, particularly
since it explains --failures-detailed starting in the middle. Also, the
example output doesn't include the failures-detailed mode.
I think the problem is not merely one of documentation, but one of
bad design. Up to now it was possible to tell what was what from
counting the number of columns in the output; but with this design,
that is impossible. That should be fixed. The first thing you have
got to do is drop the alternation { failures | serialization_failures
deadlock_failures }. That doesn't make any sense in the first place:
counting serialization and deadlock failures doesn't make it impossible
for other errors to occur. It'd probably make the most sense to have
three columns always, serialization, deadlock and total. Now maybe
that change alone is sufficient, but I'm not convinced, because the
multiple options at the end of the line mean we will never again be
able to add any more columns without reintroducing ambiguity. I
would be happier if the syntax diagram were such that columns could
only be dropped from right to left.
regards, tom lane
Hello,
On 2022-Mar-27, Tatsuo Ishii wrote:
After:
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
{ failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
You're showing an indentation, but looking at the HTML output there is
no such. Is the HTML processor eating leading whitespace or something
like that?
I just copied from my web browser screen (Firefox 98.0.2 on Ubuntu 20).
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
After:
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
{ failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries ]
I think that the explanatory paragraph is way too long now, particularly
since it explains --failures-detailed starting in the middle. Also, the
example output doesn't include the failures-detailed mode.
I think the problem is not merely one of documentation, but one of
bad design. Up to now it was possible to tell what was what from
counting the number of columns in the output; but with this design,
that is impossible. That should be fixed. The first thing you have
got to do is drop the alternation { failures | serialization_failures
deadlock_failures }. That doesn't make any sense in the first place:
counting serialization and deadlock failures doesn't make it impossible
for other errors to occur. It'd probably make the most sense to have
three columns always, serialization, deadlock and total.
+1.
Now maybe
that change alone is sufficient, but I'm not convinced, because the
multiple options at the end of the line mean we will never again be
able to add any more columns without reintroducing ambiguity. I
would be happier if the syntax diagram were such that columns could
only be dropped from right to left.
Or should those columns also always be present: sum_lag sum_lag_2, min_lag max_lag, skipped, retried retries?
Anyway, now that the current CF is closing, it will not be possible to
change the logging design soon. Or can we change the logging design
even after the CF is closed?
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Or should those columns also always be present: sum_lag sum_lag_2, min_lag max_lag,
skipped, retried retries?
Anyway, now that the current CF is closing, it will not be possible to
change the logging design soon. Or can we change the logging design
even after the CF is closed?
My 0.02€: I'm not sure how the official guidelines are to be interpreted
in that case, but if the design is to be changed, ISTM that it is better
to do it before a release instead of letting the release out with one
format and changing it in the next release?
--
Fabien.
I think the problem is not merely one of documentation, but one of
bad design. Up to now it was possible to tell what was what from
counting the number of columns in the output; but with this design,
that is impossible. That should be fixed. The first thing you have
got to do is drop the alternation { failures | serialization_failures
deadlock_failures }. That doesn't make any sense in the first place:
counting serialization and deadlock failures doesn't make it impossible
for other errors to occur. It'd probably make the most sense to have
three columns always, serialization, deadlock and total.
+1.
Now maybe
that change alone is sufficient, but I'm not convinced, because the
multiple options at the end of the line mean we will never again be
able to add any more columns without reintroducing ambiguity. I
would be happier if the syntax diagram were such that columns could
only be dropped from right to left.
Or should those columns also always be present: sum_lag sum_lag_2, min_lag max_lag, skipped, retried retries?
What about this? (a log line is not actually folded)
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
failures serialization_failures deadlock_failures retried retries [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ]
failures:
always 0 (if --max-tries is 1, the default)
sum of serialization_failures and deadlock_failures (if --max-tries is not 1)
serialization_failures and deadlock_failures:
always 0 (if --max-tries is 1, the default)
0 or more (if --max-tries is not 1)
retried and retries:
always 0 (if --max-tries is 1, the default)
0 or more (if --max-tries is not 1)
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
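Under the semantics listed above, a log consumer could sanity-check the failure columns of each line: failures must equal serialization_failures plus deadlock_failures, and every failure/retry counter stays zero when --max-tries is 1. A hypothetical checker following that proposal (the column order and zero rules were still under discussion at this point):

```python
def check_failure_columns(failures, serialization_failures,
                          deadlock_failures, retried, retries,
                          max_tries=1):
    """Validate the failure/retry columns of one aggregated log line
    against the semantics proposed above."""
    # failures is defined as the sum of the two detailed counters.
    if failures != serialization_failures + deadlock_failures:
        return False
    if max_tries == 1:
        # With the default --max-tries of 1, all five counters are 0.
        return all(v == 0 for v in
                   (failures, serialization_failures, deadlock_failures,
                    retried, retries))
    return True

assert check_failure_columns(0, 0, 0, 0, 0)              # default run
assert check_failure_columns(3, 2, 1, 4, 7, max_tries=5)
assert not check_failure_columns(3, 1, 1, 0, 0, max_tries=5)
```

Having such an invariant is one practical argument for a fixed-width format: the redundancy is cheap for pgbench to emit and lets analysis tools detect truncated or misparsed lines.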
Or should those columns also always be present: sum_lag sum_lag_2, min_lag max_lag,
skipped, retried retries?
What about this? (a log line is not actually folded)
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
failures serialization_failures deadlock_failures retried retries [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ]
My 0.02€:
I agree that it would be better to have a more deterministic
aggregated log format.
ISTM that it should skip failures and lags if no fancy options have been
selected, i.e.:
[ fails ... retries [ sum_lag ... [ skipped ] ] ] ?
Alternatively, as the failure stuff is added to the format, maybe it could
be at the end:
[ sum_lag ... [ skipped [ fails ... retries ] ] ] ?
failures:
always 0 (if --max-tries is 1, the default)
sum of serialization_failures and deadlock_failures (if --max-tries is not 1)
serialization_failures and deadlock_failures:
always 0 (if --max-tries is 1, the default)
0 or more (if --max-tries is not 1)
retried and retries:
always 0 (if --max-tries is 1, the default)
0 or more (if --max-tries is not 1)
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
--
Fabien.
Or those three columns always, sum_lag sum_lag_2, min_lag max_lag,
skipped, retried retries?
What about this? (a log line is not actually folded)
interval_start num_transactions sum_latency sum_latency_2 min_latency
max_latency
failures serialization_failures deadlock_failures retried retries [
sum_lag sum_lag_2 min_lag max_lag [ skipped ] ]
My 0.02€:
I agree that it would be better to have a more deterministic
aggregated log format.
ISTM that it should skip failures and lags if no fancy options have
been selected, i.e.:
[ fails ... retries [ sum_lag ... [ skipped ] ] ?
Alternatively, as the failure stuff is added to the format, maybe it
could be at the end:
[ sum_lag ... [ skipped [ fails ... retries ] ] ] ?
I like this one.
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
[sum_lag sum_lag_2 min_lag max_lag [ skipped [
failures serialization_failures deadlock_failures retried retries ] ] ]
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
On 2022-Apr-03, Fabien COELHO wrote:
What about this? (a log line is not actually folded)
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
failures serialization_failures deadlock_failures retried retries [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ]
My 0.02€:
I agree that it would be better to have a more deterministic aggregated log
format.
ISTM that it should skip failures and lags if no fancy options have been
selected, i.e.:
[ fails ... retries [ sum_lag ... [ skipped ] ] ?
I think it's easier to just say "if feature X is not enabled, then
columns XYZ are always zeroes".
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
My 0.02€:
I agree that it would be better to have a more deterministic aggregated log
format.
ISTM that it should skip failures and lags if no fancy options have been
selected, i.e.:
[ fails ... retries [ sum_lag ... [ skipped ] ] ?
I think it's easier to just say "if feature X is not enabled, then
columns XYZ are always zeroes".
Ok, I will come up with a patch in this direction.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
I think it's easier to just say "if feature X is not enabled, then
columns XYZ are always zeroes".
+1, that's pretty much what I was thinking.
regards, tom lane
I think it's easier to just say "if feature X is not enabled, then
columns XYZ are always zeroes".
Ok, I will come up with a patch in this direction.
Please find attached patch for this.
With the patch, the log line is as follows (actually no line foldings of course):
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
sum_lag sum_lag_2 min_lag max_lag skipped
failures serialization_failures deadlock_failures retried retries
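One reason sum_latency_2 is logged alongside sum_latency is that per-interval mean and standard deviation can be recovered without the raw samples. A hypothetical helper (not pgbench code) to illustrate:

```python
import math

def latency_stats(num_transactions, sum_latency, sum_latency_2):
    """Mean and (population) standard deviation of per-transaction
    latency, recovered from the aggregate log columns."""
    mean = sum_latency / num_transactions
    # Sum-of-squares identity: var = E[x^2] - (E[x])^2
    variance = sum_latency_2 / num_transactions - mean * mean
    return mean, math.sqrt(max(variance, 0.0))  # clamp rounding noise

# Four transactions with latencies 5, 5, 15, 15:
mean, stddev = latency_stats(4, 40.0, 500.0)
assert (mean, stddev) == (10.0, 5.0)
```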
I updated the doc as well:
- fold the log line using line feeds to avoid an error when rendering PDF. I
did not use &zwsp; because it does not enhance the HTML output.
- split the explanation of the log output into multiple paragraphs to
enhance readability.
- replace the example output with one generated with all options specified.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Attachments:
pgbench-aggregate-log.patch (text/x-patch; charset=us-ascii)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ebdb4b3f46..e1f98ae228 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2401,7 +2401,9 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable>
+<replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <replaceable>skipped</replaceable>
+<replaceable>failures</replaceable> <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> <replaceable>retried</replaceable> <replaceable>retries</replaceable>
</synopsis>
where
@@ -2417,41 +2419,55 @@ END;
and
<replaceable>max_latency</replaceable> is the maximum latency within the interval,
<replaceable>failures</replaceable> is the number of transactions that ended
- with a failed SQL command within the interval. If you use option
- <option>--failures-detailed</option>, instead of the sum of all failed
- transactions you will get more detailed statistics for the failed
- transactions grouped by the following types:
- <replaceable>serialization_failures</replaceable> is the number of
- transactions that got a serialization error and were not retried after this,
- <replaceable>deadlock_failures</replaceable> is the number of transactions
- that got a deadlock error and were not retried after this.
+ with a failed SQL command within the interval.
+ </para>
+ <para>
The next fields,
<replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
- and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
- option is used.
+ and <replaceable>max_lag</replaceable>, only meaningful if the <option>--rate</option>
+ option is used. Otherwise, they are all 0.0.
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e., the difference between each transaction's
scheduled start time and the time it actually started.
The next field, <replaceable>skipped</replaceable>,
- is only present if the <option>--latency-limit</option> option is used, too.
+ is only meaningful if the <option>--latency-limit</option> option is used, too. Otherwise it is 0.
It counts the number of transactions skipped because they would have
started too late.
- The <replaceable>retried</replaceable> and <replaceable>retries</replaceable>
- fields are present only if the <option>--max-tries</option> option is not
- equal to 1. They report the number of retried transactions and the sum of all
- retries after serialization or deadlock errors within the interval.
- Each transaction is counted in the interval when it was committed.
+ </para>
+ <para>
+ <replaceable>failures</replaceable> is the sum of all failed transactions.
+ If <option>--failures-detailed</option> is specified, instead of the sum of
+ all failed transactions you will get more detailed statistics for the
+ failed transactions grouped by the following types:
+ <replaceable>serialization_failures</replaceable> is the number of
+ transactions that got a serialization error and were not retried after this,
+ <replaceable>deadlock_failures</replaceable> is the number of transactions
+ that got a deadlock error and were not retried after this.
+ If <option>--failures-detailed</option> is not
+ specified, <replaceable>serialization_failures</replaceable>
+ and <replaceable>deadlock_failures</replaceable> are always 0.
+ </para>
+ <para>
+ The <replaceable>retried</replaceable>
+ and <replaceable>retries</replaceable> fields are only meaningful if
+ the <option>--max-tries</option> option is not equal to 1. Otherwise they
+ are 0. They report the number of retried transactions and the sum of all
+ retries after serialization or deadlock errors within the interval. Each
+ transaction is counted in the interval when it was committed.
</para>
<para>
- Here is some example output:
+ Here is some example output with following options:
<screen>
-1345828501 5601 1542744 483552416 61 2573 0
-1345828503 7884 1979812 565806736 60 1479 0
-1345828505 7208 1979422 567277552 59 1391 0
-1345828507 7685 1980268 569784714 60 1398 0
-1345828509 7073 1979779 573489941 236 1411 0
-</screen></para>
+pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000
+--latency-limit=10 --failures-detailed --max-tries=10 test
+</screen>
+
+<screen>
+1649033235 5398 26186948 170797073304 1051 15347 2471626 6290343560 0 8261 0 3925 3925 0 7524 29534
+1649033245 5651 27345519 210270761637 1011 67780 2480555 6835066067 0 9999 496 3839 3839 0 7533 30118
+</screen>
+ </para>
<para>
Notice that while the plain (unaggregated) log file shows which script
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index acf3e56413..aea27b1383 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -4494,6 +4494,17 @@ doLog(TState *thread, CState *st,
while ((next = agg->start_time + agg_interval * INT64CONST(1000000)) <= now)
{
+ double lag_sum = 0.0;
+ double lag_sum2 = 0.0;
+ double lag_min = 0.0;
+ double lag_max = 0.0;
+ int64 skipped = 0;
+ int64 serialization_failures = 0;
+ int64 deadlock_failures = 0;
+ int64 serialization_or_deadlock_failures = 0;
+ int64 retried = 0;
+ int64 retries = 0;
+
/* print aggregated report to logfile */
fprintf(logfile, INT64_FORMAT " " INT64_FORMAT " %.0f %.0f %.0f %.0f",
agg->start_time / 1000000, /* seconds since Unix epoch */
@@ -4503,27 +4514,41 @@ doLog(TState *thread, CState *st,
agg->latency.min,
agg->latency.max);
- if (failures_detailed)
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
- agg->serialization_failures,
- agg->deadlock_failures);
- else
- fprintf(logfile, " " INT64_FORMAT, getFailures(agg));
-
if (throttle_delay)
{
- fprintf(logfile, " %.0f %.0f %.0f %.0f",
- agg->lag.sum,
- agg->lag.sum2,
- agg->lag.min,
- agg->lag.max);
- if (latency_limit)
- fprintf(logfile, " " INT64_FORMAT, agg->skipped);
+ lag_sum = agg->lag.sum;
+ lag_sum2 = agg->lag.sum2;
+ lag_min = agg->lag.min;
+ lag_max = agg->lag.max;
+ }
+ fprintf(logfile, " %.0f %.0f %.0f %.0f",
+ lag_sum,
+ lag_sum2,
+ lag_min,
+ lag_max);
+
+ if (latency_limit)
+ skipped = agg->skipped;
+ fprintf(logfile, " " INT64_FORMAT, skipped);
+
+ if (failures_detailed)
+ {
+ serialization_failures = agg->serialization_failures;
+ deadlock_failures = agg->deadlock_failures;
}
+ serialization_or_deadlock_failures = serialization_failures + deadlock_failures;
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
+ serialization_or_deadlock_failures,
+ serialization_failures,
+ deadlock_failures);
+
if (max_tries != 1)
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
- agg->retried,
- agg->retries);
+ {
+ retried = agg->retried;
+ retries = agg->retries;
+ }
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT, retried, retries);
+
fputc('\n', logfile);
/* reset data and move to next interval */
Hello Tatsuo-san,
interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
sum_lag sum_lag_2 min_lag max_lag skipped
failures serialization_failures deadlock_failures retried retries
I would suggest to reorder the last chunk to:
... retried retries failures serfail dlfail
because I intend to add connection failures handling at some point, and it
would make more sense to add the corresponding count at the end with other
fails.
--
Fabien Coelho - CRI, MINES ParisTech
Hi Fabien,
Hello Tatsuo-san,
interval_start num_transactions sum_latency sum_latency_2 min_latency
max_latency
sum_lag sum_lag_2 min_lag max_lag skipped
failures serialization_failures deadlock_failures retried retries
I would suggest to reorder the last chunk to:
... retried retries failures serfail dlfail
because I intend to add connection failures handling at some point,
and it would make more sense to add the corresponding count at the end
with other fails.
Ok, I have adjusted the patch. V2 patch attached.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Attachments:
pgbench-aggregate-log-v2.patch (text/x-patch; charset=us-ascii)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ebdb4b3f46..d1818ff316 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2401,7 +2401,9 @@ END;
format is used for the log files:
<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> { <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable> <replaceable>retries</replaceable> </optional>
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable>
+<replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <replaceable>skipped</replaceable>
+<replaceable>retried</replaceable> <replaceable>retries</replaceable> <replaceable>failures</replaceable> <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable>
</synopsis>
where
@@ -2417,41 +2419,55 @@ END;
and
<replaceable>max_latency</replaceable> is the maximum latency within the interval,
<replaceable>failures</replaceable> is the number of transactions that ended
- with a failed SQL command within the interval. If you use option
- <option>--failures-detailed</option>, instead of the sum of all failed
- transactions you will get more detailed statistics for the failed
- transactions grouped by the following types:
- <replaceable>serialization_failures</replaceable> is the number of
- transactions that got a serialization error and were not retried after this,
- <replaceable>deadlock_failures</replaceable> is the number of transactions
- that got a deadlock error and were not retried after this.
+ with a failed SQL command within the interval.
+ </para>
+ <para>
The next fields,
<replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
- and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
- option is used.
+ and <replaceable>max_lag</replaceable>, only meaningful if the <option>--rate</option>
+ option is used. Otherwise, they are all 0.0.
They provide statistics about the time each transaction had to wait for the
previous one to finish, i.e., the difference between each transaction's
scheduled start time and the time it actually started.
The next field, <replaceable>skipped</replaceable>,
- is only present if the <option>--latency-limit</option> option is used, too.
+ is only meaningful if the <option>--latency-limit</option> option is used, too. Otherwise it is 0.
It counts the number of transactions skipped because they would have
started too late.
- The <replaceable>retried</replaceable> and <replaceable>retries</replaceable>
- fields are present only if the <option>--max-tries</option> option is not
- equal to 1. They report the number of retried transactions and the sum of all
- retries after serialization or deadlock errors within the interval.
- Each transaction is counted in the interval when it was committed.
+ </para>
+ <para>
+ The <replaceable>retried</replaceable>
+ and <replaceable>retries</replaceable> fields are only meaningful if
+ the <option>--max-tries</option> option is not equal to 1. Otherwise they
+ are 0. They report the number of retried transactions and the sum of all
+ retries after serialization or deadlock errors within the interval. Each
+ transaction is counted in the interval when it was committed.
+ </para>
+ <para>
+ <replaceable>failures</replaceable> is the sum of all failed transactions.
+ If <option>--failures-detailed</option> is specified, instead of the sum of
+ all failed transactions you will get more detailed statistics for the
+ failed transactions grouped by the following types:
+ <replaceable>serialization_failures</replaceable> is the number of
+ transactions that got a serialization error and were not retried after this,
+ <replaceable>deadlock_failures</replaceable> is the number of transactions
+ that got a deadlock error and were not retried after this.
+ If <option>--failures-detailed</option> is not
+ specified, <replaceable>serialization_failures</replaceable>
+ and <replaceable>deadlock_failures</replaceable> are always 0.
</para>
<para>
- Here is some example output:
+ Here is some example output with following options:
<screen>
-1345828501 5601 1542744 483552416 61 2573 0
-1345828503 7884 1979812 565806736 60 1479 0
-1345828505 7208 1979422 567277552 59 1391 0
-1345828507 7685 1980268 569784714 60 1398 0
-1345828509 7073 1979779 573489941 236 1411 0
-</screen></para>
+pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000
+--latency-limit=10 --failures-detailed --max-tries=10 test
+</screen>
+
+<screen>
+1649114136 5815 27552565 177846919143 1078 21716 2756787 7264696105 0 9661 0 7854 31472 4022 4022 0
+1649114146 5958 28460110 182785513108 1083 20391 2539395 6411761497 0 7268 0 8127 32595 4101 4101 0
+</screen>
+ </para>
<para>
Notice that while the plain (unaggregated) log file shows which script
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index acf3e56413..4d4b979e4f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -4494,6 +4494,17 @@ doLog(TState *thread, CState *st,
while ((next = agg->start_time + agg_interval * INT64CONST(1000000)) <= now)
{
+ double lag_sum = 0.0;
+ double lag_sum2 = 0.0;
+ double lag_min = 0.0;
+ double lag_max = 0.0;
+ int64 skipped = 0;
+ int64 serialization_failures = 0;
+ int64 deadlock_failures = 0;
+ int64 serialization_or_deadlock_failures = 0;
+ int64 retried = 0;
+ int64 retries = 0;
+
/* print aggregated report to logfile */
fprintf(logfile, INT64_FORMAT " " INT64_FORMAT " %.0f %.0f %.0f %.0f",
agg->start_time / 1000000, /* seconds since Unix epoch */
@@ -4503,27 +4514,41 @@ doLog(TState *thread, CState *st,
agg->latency.min,
agg->latency.max);
- if (failures_detailed)
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
- agg->serialization_failures,
- agg->deadlock_failures);
- else
- fprintf(logfile, " " INT64_FORMAT, getFailures(agg));
-
if (throttle_delay)
{
- fprintf(logfile, " %.0f %.0f %.0f %.0f",
- agg->lag.sum,
- agg->lag.sum2,
- agg->lag.min,
- agg->lag.max);
- if (latency_limit)
- fprintf(logfile, " " INT64_FORMAT, agg->skipped);
+ lag_sum = agg->lag.sum;
+ lag_sum2 = agg->lag.sum2;
+ lag_min = agg->lag.min;
+ lag_max = agg->lag.max;
}
+ fprintf(logfile, " %.0f %.0f %.0f %.0f",
+ lag_sum,
+ lag_sum2,
+ lag_min,
+ lag_max);
+
+ if (latency_limit)
+ skipped = agg->skipped;
+ fprintf(logfile, " " INT64_FORMAT, skipped);
+
if (max_tries != 1)
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
- agg->retried,
- agg->retries);
+ {
+ retried = agg->retried;
+ retries = agg->retries;
+ }
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT, retried, retries);
+
+ if (failures_detailed)
+ {
+ serialization_failures = agg->serialization_failures;
+ deadlock_failures = agg->deadlock_failures;
+ }
+ serialization_or_deadlock_failures = serialization_failures + deadlock_failures;
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
+ serialization_or_deadlock_failures,
+ serialization_failures,
+ deadlock_failures);
+
fputc('\n', logfile);
/* reset data and move to next interval */
I would suggest to reorder the last chunk to:
... retried retries failures serfail dlfail
because I intend to add connection failures handling at some point,
and it would make more sense to add the corresponding count at the end
with other fails.
Ok, I have adjusted the patch. V2 patch attached.
Patch pushed. Thanks.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Tatsuo Ishii <ishii@sraoss.co.jp> writes:
Patch pushed. Thanks.
The buildfarm is still complaining about the synopsis being too
wide for PDF format. I think what we ought to do is give up on
using a <synopsis> for log lines at all, and instead convert the
documentation into a tabular list of fields. Proposal attached,
which also fixes a couple of outright errors.
One thing that this doesn't fix is that the existing text appears
to suggest that the "failures" column is something different from
the sum of the serialization_failures and deadlock_failures
columns, which it's obvious from the code is not so. If this isn't
a code bug then I think we ought to just drop that column entirely,
because it's redundant.
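The claimed redundancy is easy to check mechanically against an existing log file. A small sketch (the helper name is made up; column positions assume the pushed v2 ordering, where the last five fields are retried retries failures serialization_failures deadlock_failures):

```python
def first_inconsistent_line(lines):
    """Return the 1-based index of the first aggregated-log line where
    failures != serialization_failures + deadlock_failures, else None."""
    for lineno, line in enumerate(lines, start=1):
        cols = line.split()
        retried, retries, failures, ser_fail, dl_fail = (int(c) for c in cols[-5:])
        if failures != ser_fail + dl_fail:
            return lineno
    return None

# Sample line taken from the v2 patch's example output.
sample = [
    "1649114136 5815 27552565 177846919143 1078 21716 "
    "2756787 7264696105 0 9661 0 7854 31472 4022 4022 0",
]
assert first_inconsistent_line(sample) is None
```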
(BTW, now that I've read this stuff I am quite horrified by how
the non-aggregated log format has been mangled for error retries,
and will probably be submitting a proposal to change that.
But that's a different issue.)
regards, tom lane
Attachments:
change-log-line-docs-layout.patch (text/x-diff; charset=us-ascii)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 9ba26e5e86..d12cbaa8ab 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2289,33 +2289,95 @@ END;
</para>
<para>
- The format of the log is:
-
-<synopsis>
-<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable> <replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable> </optional> <optional> <replaceable>retries</replaceable> </optional>
-</synopsis>
-
- where
- <replaceable>client_id</replaceable> indicates which client session ran the transaction,
- <replaceable>transaction_no</replaceable> counts how many transactions have been
- run by that session,
- <replaceable>time</replaceable> is the total elapsed transaction time in microseconds,
- <replaceable>script_no</replaceable> identifies which script file was used (useful when
- multiple scripts were specified with <option>-f</option> or <option>-b</option>),
- and <replaceable>time_epoch</replaceable>/<replaceable>time_us</replaceable> are a
- Unix-epoch time stamp and an offset
- in microseconds (suitable for creating an ISO 8601
- time stamp with fractional seconds) showing when
- the transaction completed.
- The <replaceable>schedule_lag</replaceable> field is the difference between the
- transaction's scheduled start time, and the time it actually started, in
- microseconds. It is only present when the <option>--rate</option> option is used.
+ Each line in a log file describes one transaction.
+ It contains the following space-separated fields:
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable>client_id</replaceable></term>
+ <listitem>
+ <para>
+ identifies the client session that ran the transaction
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>transaction_no</replaceable></term>
+ <listitem>
+ <para>
+ counts how many transactions have been run by that session
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>time</replaceable></term>
+ <listitem>
+ <para>
+ transaction's elapsed time, in microseconds
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>script_no</replaceable></term>
+ <listitem>
+ <para>
+ identifies the script file that was used for the transaction
+ (useful when multiple scripts are specified
+ with <option>-f</option> or <option>-b</option>)
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>time_epoch</replaceable></term>
+ <listitem>
+ <para>
+ transaction's completion time, as a Unix-epoch time stamp
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>time_us</replaceable></term>
+ <listitem>
+ <para>
+ fractional-second part of transaction's completion time, in
+ microseconds
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>schedule_lag</replaceable></term>
+ <listitem>
+ <para>
+ difference between the transaction's scheduled start time and the
+ time it actually started, in microseconds
+ (present only if <option>--rate</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>retries</replaceable></term>
+ <listitem>
+ <para>
+ count of retries after serialization or deadlock errors during the
+ transaction
+ (present only if <option>--max-tries</option> is not equal to one)
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
When both <option>--rate</option> and <option>--latency-limit</option> are used,
the <replaceable>time</replaceable> for a skipped transaction will be reported as
<literal>skipped</literal>.
- <replaceable>retries</replaceable> is the sum of all retries after the
- serialization or deadlock errors during the current script execution. It is
- present only if the <option>--max-tries</option> option is not equal to 1.
If the transaction ends with a failure, its <replaceable>time</replaceable>
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
@@ -2398,66 +2460,171 @@ END;
<para>
With the <option>--aggregate-interval</option> option, a different
- format is used for the log files:
-
-<synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable> <replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable>
-<replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable> <replaceable>skipped</replaceable>
-<replaceable>retried</replaceable> <replaceable>retries</replaceable> <replaceable>failures</replaceable> <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable>
-</synopsis>
-
- where
- <replaceable>interval_start</replaceable> is the start of the interval (as a Unix
- epoch time stamp),
- <replaceable>num_transactions</replaceable> is the number of transactions
- within the interval,
- <replaceable>sum_latency</replaceable> is the sum of the transaction
- latencies within the interval,
- <replaceable>sum_latency_2</replaceable> is the sum of squares of the
- transaction latencies within the interval,
- <replaceable>min_latency</replaceable> is the minimum latency within the interval,
- and
- <replaceable>max_latency</replaceable> is the maximum latency within the interval,
- <replaceable>failures</replaceable> is the number of transactions that ended
- with a failed SQL command within the interval.
- </para>
- <para>
- The next fields,
- <replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
- and <replaceable>max_lag</replaceable>, only meaningful if the <option>--rate</option>
- option is used. Otherwise, they are all 0.0.
- They provide statistics about the time each transaction had to wait for the
- previous one to finish, i.e., the difference between each transaction's
- scheduled start time and the time it actually started.
- The next field, <replaceable>skipped</replaceable>,
- is only meaningful if the <option>--latency-limit</option> option is used, too. Otherwise it is 0.
- It counts the number of transactions skipped because they would have
- started too late.
- </para>
- <para>
- The <replaceable>retried</replaceable>
- and <replaceable>retries</replaceable> fields are only meaningful if
- the <option>--max-tries</option> option is not equal to 1. Otherwise they
- are 0. They report the number of retried transactions and the sum of all
- retries after serialization or deadlock errors within the interval. Each
- transaction is counted in the interval when it was committed.
- </para>
- <para>
- <replaceable>failures</replaceable> is the sum of all failed transactions.
- If <option>--failures-detailed</option> is specified, instead of the sum of
- all failed transactions you will get more detailed statistics for the
- failed transactions grouped by the following types:
- <replaceable>serialization_failures</replaceable> is the number of
- transactions that got a serialization error and were not retried after this,
- <replaceable>deadlock_failures</replaceable> is the number of transactions
- that got a deadlock error and were not retried after this.
- If <option>--failures-detailed</option> is not
- specified, <replaceable>serialization_failures</replaceable>
- and <replaceable>deadlock_failures</replaceable> are always 0.
+ format is used for the log files. Each log line describes one
+ aggregation interval. It contains the following space-separated
+ fields:
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable>interval_start</replaceable></term>
+ <listitem>
+ <para>
+ start time of the interval, as a Unix-epoch time stamp
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>num_transactions</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions within the interval
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>sum_latency</replaceable></term>
+ <listitem>
+ <para>
+ sum of transaction latencies
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>sum_latency_2</replaceable></term>
+ <listitem>
+ <para>
+ sum of squares of transaction latencies
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>min_latency</replaceable></term>
+ <listitem>
+ <para>
+ minimum transaction latency
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>max_latency</replaceable></term>
+ <listitem>
+ <para>
+ maximum transaction latency
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>sum_lag</replaceable></term>
+ <listitem>
+ <para>
+ sum of transaction start delays
+ (zero unless <option>--rate</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>sum_lag_2</replaceable></term>
+ <listitem>
+ <para>
+ sum of squares of transaction start delays
+ (zero unless <option>--rate</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>min_lag</replaceable></term>
+ <listitem>
+ <para>
+ minimum transaction start delay
+ (zero unless <option>--rate</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>max_lag</replaceable></term>
+ <listitem>
+ <para>
+ maximum transaction start delay
+ (zero unless <option>--rate</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>skipped</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions skipped because they would have started too late
+ (zero unless <option>--rate</option>
+ and <option>--latency-limit</option> are specified)
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>retried</replaceable></term>
+ <listitem>
+ <para>
+ number of retried transactions
+ (zero unless <option>--max-tries</option> is not equal to one)
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>retries</replaceable></term>
+ <listitem>
+ <para>
+ number of retries after serialization or deadlock errors
+ (zero unless <option>--max-tries</option> is not equal to one)
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that ended with a failed SQL command
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>serialization_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got a serialization error and were not
+ retried afterwards
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable>deadlock_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got a deadlock error and were not
+ retried afterwards
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
</para>
<para>
- Here is some example output with following options:
+ Here is some example output generated with these options:
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000
--latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
@@ -2468,8 +2635,8 @@ END;
</para>
<para>
- Notice that while the plain (unaggregated) log file shows which script
- was used for each transaction, the aggregated log does not. Therefore if
+ Notice that while the plain (unaggregated) log format shows which script
+ was used for each transaction, the aggregated format does not. Therefore if
you need per-script data, you need to aggregate the data on your own.
</para>
The buildfarm is still complaining about the synopsis being too
wide for PDF format. I think what we ought to do is give up on
using a <synopsis> for log lines at all, and instead convert the
documentation into a tabular list of fields. Proposal attached,
which also fixes a couple of outright errors.
I once thought about that too. Looks good to me.
One thing that this doesn't fix is that the existing text appears
to suggest that the "failures" column is something different from
the sum of the serialization_failures and deadlock_failures
columns, which it's obvious from the code is not so. If this isn't
a code bug then I think we ought to just drop that column entirely,
because it's redundant.
+1.
(BTW, now that I've read this stuff I am quite horrified by how
the non-aggregated log format has been mangled for error retries,
and will probably be submitting a proposal to change that.
But that's a different issue.)
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Hello Tom,
The buildfarm is still complaining about the synopsis being too
wide for PDF format. I think what we ought to do is give up on
using a <synopsis> for log lines at all, and instead convert the
documentation into a tabular list of fields. Proposal attached,
which also fixes a couple of outright errors.
Looks ok. Html doc generation ok.
While looking at the html output, the "pgbench" command line just below
wraps strangely:
pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000
--latency-limit=10 --failures-detailed --max-tries=10 test
ISTM that there should be no newline in the <textinput>pgbench …</textinput>
section, although maybe it would trigger a complaint in the pdf format.
One thing that this doesn't fix is that the existing text appears
to suggest that the "failures" column is something different from
the sum of the serialization_failures and deadlock_failures
columns, which it's obvious from the code is not so. If this isn't
a code bug then I think we ought to just drop that column entirely,
because it's redundant.
Ok. Fine with me. Possibly at some point there was the idea that there
could be other failures counted, but there are none. Also, there has been
questions about the failures detailed option, or whether the reports
should always be detailed, and the result may be some kind of not
convincing compromise.
(BTW, now that I've read this stuff I am quite horrified by how
the non-aggregated log format has been mangled for error retries,
and will probably be submitting a proposal to change that.
But that's a different issue.)
Indeed, any improvement is welcome!
--
Fabien.
Fabien COELHO <coelho@cri.ensmp.fr> writes:
While looking at the html output, the "pgbench" command line just below
wraps strangely:
pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000
--latency-limit=10 --failures-detailed --max-tries=10 test
ISTM that there should be no newline in the <textinput>pgbench …</textinput>
section, although maybe it would trigger a complaint in the pdf format.
PDF wraps that text where it wants to anyway, so I removed the newline.
regards, tom lane
One thing that this doesn't fix is that the existing text appears
to suggest that the "failures" column is something different from
the sum of the serialization_failures and deadlock_failures
columns, which it's obvious from the code is not so. If this isn't
a code bug then I think we ought to just drop that column entirely,
because it's redundant.
Ok. Fine with me. Possibly at some point there was the idea that there
could be other failures counted, but there are none. Also, there has
been questions about the failures detailed option, or whether the
reports should always be detailed, and the result may be some kind of
not convincing compromise.
Attached is the patch to remove the column.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Attachment: pgbench-remove-failures-column.patch (text/x-patch)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 387a836287..dbae4e2321 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2591,15 +2591,6 @@ END;
</listitem>
</varlistentry>
- <varlistentry>
- <term><replaceable>failures</replaceable></term>
- <listitem>
- <para>
- number of transactions that ended with a failed SQL command
- </para>
- </listitem>
- </varlistentry>
-
<varlistentry>
<term><replaceable>serialization_failures</replaceable></term>
<listitem>
@@ -2629,8 +2620,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1649114136 5815 27552565 177846919143 1078 21716 2756787 7264696105 0 9661 0 7854 31472 4022 4022 0
-1649114146 5958 28460110 182785513108 1083 20391 2539395 6411761497 0 7268 0 8127 32595 4101 4101 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
</screen>
</para>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e63cea56a1..f8bcb1ab6d 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -4498,7 +4498,6 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
- int64 serialization_or_deadlock_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4540,9 +4539,7 @@ doLog(TState *thread, CState *st,
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
}
- serialization_or_deadlock_failures = serialization_failures + deadlock_failures;
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
- serialization_or_deadlock_failures,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
deadlock_failures);
Ok. Fine with me. Possibly at some point there was the idea that there
could be other failures counted, but there are none. Also, there has
been questions about the failures detailed option, or whether the
reports should always be detailed, and the result may be some kind of
not convincing compromise.
Attached is the patch to remove the column.
Patch applies cleanly. Compilation ok. Global and local "make check" ok.
Doc build ok.
I find it a little annoying that there is no change in tests; it means
that the format is not checked at all :-(
--
Fabien.
Ok. Fine with me. Possibly at some point there was the idea that there
could be other failures counted, but there are none. Also, there has
been questions about the failures detailed option, or whether the
reports should always be detailed, and the result may be some kind of
not convincing compromise.
Attached is the patch to remove the column.
Patch applies cleanly. Compilation ok. Global and local "make check"
ok.
Doc build ok.
Thank you for reviewing. Patch pushed.
I find it a little annoying that there is no change in tests; it
means that the format is not checked at all :-(
Yeah. Perhaps it's a little bit hard to perform this kind of test in
the TAP test?
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
I find it a little annoying that there is no change in tests; it
means that the format is not checked at all :-(
Yeah. Perhaps it's a little bit hard to perform this kind of test in
the TAP test?
Not really. I'll look into it.
--
Fabien.
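A check of the kind discussed above need not be elaborate: after the column removal, a well-formed aggregated log line under these options is exactly 15 space-separated non-negative integers. A hypothetical sketch of the idea in Python (the actual pgbench TAP tests are written in Perl; this only illustrates the shape of the check):

```python
import re

# After removing the redundant "failures" column, an aggregated log line
# (with --rate, --latency-limit and --failures-detailed in effect) holds
# exactly 15 space-separated non-negative integer fields.
AGG_LINE = re.compile(r'^\d+( \d+){14}$')

sample = ("1650260552 5178 26171317 177284491527 1136 44462 2647617 "
          "7321113867 0 9866 64 7564 28340 4148 0")

assert AGG_LINE.match(sample), "unexpected aggregated log line format"
print("format ok")
```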