Suggestion to add --continue-client-on-abort option to pgbench
Hi hackers,
I would like to suggest adding a new option to pgbench, which enables
the client to continue processing transactions even if some errors occur
during a transaction.
Currently, a client stops sending requests when its transaction is
aborted due to reasons other than serialization failures or deadlocks. I
think in some cases, especially when using custom scripts, the client
should be able to roll back the failed transaction and start a new one.
For example, my custom script (insert_to_unique_column.sql) is as follows:
```
CREATE TABLE IF NOT EXISTS test (col1 serial, col2 int unique);
INSERT INTO test (col2) VALUES (random(0, 50000));
```
Assume we need to continuously apply load to the server using 5 clients
for a certain period of time. However, a client sometimes stops when its
transaction in my custom script is aborted due to a unique constraint
violation. As a result, the load on the server is lower than expected,
which is the problem I want to address.
The proposed new option solves this problem. When
--continue-client-on-error is specified, the client rolls back the
failed transaction and starts a new one. This allows all 5 clients to
continuously apply load to the server, even if some transactions fail.
```
% bin/pgbench -d postgres -f ../insert_to_unique_column.sql -T 10
--failures-detailed --continue-client-on-error
transaction type: ../insert_to_unique_column.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 33552
number of failed transactions: 21901 (39.495%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other failures: 21901 (39.495%)
latency average = 0.180 ms (including failures)
initial connection time = 2.857 ms
tps = 3356.092385 (without initial connection time)
```
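A quick sanity check on the reported percentages (an illustrative query, runnable in any psql session): the failure rate is computed against successful plus failed transactions, i.e. 21901 out of 33552 + 21901:

```sql
-- 21901 failed transactions out of 33552 successful + 21901 failed
SELECT round(21901.0 / (33552 + 21901) * 100, 3) AS failure_pct;  -- 39.495
```

So "number of transactions actually processed" here counts only the successful transactions.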
I have attached the patch. I would appreciate your feedback.
Best regards,
Rintaro Ikeda
NTT DATA Corporation Japan
Attachments:
0001-add-continue-client-on-error-option-to-pgbench.patch (text/x-diff, +37-5)
Hi Rintaro,
Thanks for the patch and explanation. I understand your goal is to ensure
that pgbench clients continue running even when some transactions fail due
to application-level errors (e.g., constraint violations), especially when
running custom scripts.
However, I wonder if the intended behavior can't already be achieved using
standard SQL constructs — specifically ON CONFLICT or careful transaction
structure. For example, your sample script:
CREATE TABLE IF NOT EXISTS test (col1 serial, col2 int unique);
INSERT INTO test (col2) VALUES (random(0, 50000));
can be rewritten as:
\set val random(0, 50000)
INSERT INTO test (col2) VALUES (:val) ON CONFLICT DO NOTHING;
This avoids transaction aborts entirely in the presence of uniqueness
violations and ensures the client continues to issue load without
interruption. In many real-world benchmarking scenarios, this is the
preferred and simplest approach.
So from that angle, could you elaborate on specific cases where this
SQL-level workaround wouldn't be sufficient? Are there error types you
intend to handle that cannot be gracefully avoided or recovered from using
SQL constructs like ON CONFLICT, or SAVEPOINT/ROLLBACK TO?
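For reference, the SAVEPOINT/ROLLBACK TO pattern mentioned here would look roughly like this in plain SQL (a sketch only: stock pgbench cannot branch on a statement error, so it has no way to issue the ROLLBACK TO only when the INSERT fails):

```sql
BEGIN;
SAVEPOINT before_insert;
-- this INSERT may raise a unique-key violation
INSERT INTO test (col2) VALUES (42);
-- a client that can catch the error would recover with:
ROLLBACK TO SAVEPOINT before_insert;
COMMIT;
```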
Best regards,
Stepan Neretin
+1. I've had similar cases before too, where I'd wanted pgbench to
continue creating load on the server even if a transaction failed
server-side for any reason. Sometimes, I'd even want that type of
load.
On Sat, 10 May 2025 at 17:02, Stepan Neretin <slpmcf@gmail.com> wrote:
INSERT INTO test (col2) VALUES (random(0, 50000));
can be rewritten as:
\set val random(0, 50000)
INSERT INTO test (col2) VALUES (:val) ON CONFLICT DO NOTHING;
That won't test the same execution paths, so an option to explicitly
roll back or ignore failed transactions (rather than stopping the
benchmark) would be a nice feature.
With e.g. ON CONFLICT DO NOTHING you'll have a much higher workload if
there are many conflicting entries, as that triggers and catches
per-row errors rather than per-statement ones. E.g. an INSERT INTO ... SELECT
... could conflict on multiple rows, but will fail on the
first conflict, while DO NOTHING causes full execution of the SELECT
statement, which has an inherently different performance profile.
This avoids transaction aborts entirely in the presence of uniqueness violations and ensures the client continues to issue load without interruption. In many real-world benchmarking scenarios, this is the preferred and simplest approach.
So from that angle, could you elaborate on specific cases where this SQL-level workaround wouldn't be sufficient? Are there error types you intend to handle that cannot be gracefully avoided or recovered from using SQL constructs like ON CONFLICT, or SAVEPOINT/ROLLBACK TO?
The issue isn't necessarily whether you can construct SQL scripts that
don't raise such errors (indeed, it's possible to do so for nearly any
command; you can run pl/pgsql procedures or DO blocks which catch and
ignore errors), but rather whether we can make pgbench function in a
way that can keep load on the server even when it notices an error.
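To illustrate that point, a hypothetical DO-block rewrite of the original script catches the error entirely on the server side, so pgbench never observes a failure, while again exercising a different execution path than a plain INSERT:

```sql
-- Hypothetical variant of insert_to_unique_column.sql: the unique-key
-- violation is trapped inside the DO block, so the transaction never
-- aborts and the client keeps running even without the new option.
DO $$
BEGIN
    INSERT INTO test (col2) VALUES (trunc(random() * 50000)::int);
EXCEPTION
    WHEN unique_violation THEN
        NULL;  -- swallow the conflict and let the transaction commit
END;
$$;
```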
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
Hi Matthias,
Thanks for your detailed explanation — it really helped clarify the
usefulness of the patch. I agree that the feature is indeed valuable, and
it's great to see it being pushed forward.
Regarding the patch code, I noticed that there are duplicate case entries
in the command-line option handling (specifically for case 18 or case
ESTATUS_OTHER_SQL_ERROR, the continue-client-on-error option). These
duplicated cases can be merged to simplify the logic and reduce redundancy.
Best regards,
Stepan Neretin
Hi Stepan and Matthias,
Thank you both for your replies. I agree with Matthias's detailed explanation regarding the purpose of the patch.
Regarding the patch code, I noticed that there are duplicate case
entries in the command-line option handling (specifically for case 18
or case ESTATUS_OTHER_SQL_ERROR, the continue-client-on-error option).
These duplicated cases can be merged to simplify the logic and reduce
redundancy.
I also appreciate your pointing out my mistakes in the previous version of the patch. I fixed the duplicated lines. I've attached the updated patch.
Best regards,
Rintaro Ikeda
Attachments:
0001-add-continue-client-on-error-option-to-pgbench_ver2.patch (application/octet-stream, +30-5)
On Tue, May 13, 2025 at 9:20 AM <Rintaro.Ikeda@nttdata.com> wrote:
I also appreciate your pointing out my mistakes in the previous version of the patch. I fixed the duplicated lines. I've attached the updated patch.
This is a useful feature, so +1 from my side. Here are some initial
comments from a quick look at the patch.
1. You need to update the stats for this new counter in the
"accumStats()" function.
2. IMHO, "continue-on-error" is more user-friendly than
"continue-client-on-error".
3. There are a lot of whitespace errors, so those can be fixed. You
can just try to apply using git am, and it will report those
whitespace warnings. And for fixing, you can just use
"--whitespace=fix" along with git am.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Hi, +1 for the idea. I’ve reviewed and tested the patch. Aside from Dilip’s
feedback and the missing usage information for this option, the patch LGTM.
Here's the diff for the missing usage information for this option and, as
Dilip mentioned, for updating the new counter in the "accumStats()" function.
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index baaf1379be2..20d456bc4b9 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -959,6 +959,8 @@ usage(void)
		   "  --log-prefix=PREFIX      prefix for transaction time log file\n"
		   "                           (default: \"pgbench_log\")\n"
		   "  --max-tries=NUM          max number of tries to run transaction (default: 1)\n"
+		   "  --continue-client-on-error\n"
+		   "                           continue and retry transactions that failed due to errors other than serialization or deadlocks\n"
		   "  --progress-timestamp     use Unix epoch timestamps for progress\n"
		   "  --random-seed=SEED       set random seed (\"time\", \"rand\", integer)\n"
		   "  --sampling-rate=NUM      fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1522,6 +1524,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
		case ESTATUS_DEADLOCK_ERROR:
			stats->deadlock_failures++;
			break;
+		case ESTATUS_OTHER_SQL_ERROR:
+			stats->other_sql_failures++;
+			break;
		default:
			/* internal error which should never occur */
			pg_fatal("unexpected error status: %d", estatus);
--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
Hi, hackers.
Thank you very much for the helpful comments, and apologies for my delayed reply.
I've updated the patch based on your suggestions:
- Modified name of the option.
- Added the missing explanation.
- Updated the new counter in the `accumStats()` function as pointed out.
- Fixed the whitespace issues.
Additionally, I've included documentation for the new option.
I'm submitting this updated patch to the current CommitFest.
Best Regards,
Rintaro Ikeda
Attachments:
v3-0001-Add-continue-on-error-option-to-pgbench.patch (application/octet-stream, +47-5)
Hi, Hackers.
I've attached the patch that I failed to include in my previous email.
(I'm still a bit confused about how to attach files using the standard
Mail client on macOS.)
Best Regards,
Rintaro Ikeda
Attachments:
v3-0001-Add-continue-on-error-option-to-pgbench.patch (text/x-diff, +47-5)
Dear Ikeda-san,
Thanks for starting the new thread! I had not known about this issue before I
heard about it at PGConf.dev.
A few comments:
1.
This parameter seems to be a benchmarking option, so should we also set
benchmarking_option_set?
2.
Not sure, but exit-on-abort seems to be a similar option. What if both are
specified? Is that allowed?
3.
Could you add a test case for the new parameter?
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Dear Kuroda-san, hackers,
On 2025/06/04 21:57, Hayato Kuroda (Fujitsu) wrote:
Thank you for your valuable comment!
1. I should've also set benchmarking_option_set. I've modified it accordingly.
2. The exit-on-abort and continue-on-error options are mutually exclusive.
Therefore, I've updated the patch to throw a FATAL error when the two options
are set simultaneously. A corresponding explanation was also added.
(I'm wondering whether the parameter should be named continue-on-abort so that
users understand that the two options are mutually exclusive.)
3. I've added a test.
Additionally, I modified the patch so that st->state does not transition to
CSTATE_RETRY when a transaction fails and the continue-on-error option is enabled.
In the previous patch, we retried the failed transaction up to max-tries times,
which is unnecessary for our purpose: clients do not exit when their
transactions fail.
I've attached the updated patch.
v3-0001-Add-continue-on-error-option-to-pgbench.patch is identical to
v4-0001-Add-continue-on-error-option-to-pgbench.patch. The v4-0002 patch is the
diff from the previous patch.
Best regards,
Rintaro Ikeda
Attachments:
v4-0001-Add-continue-on-error-option-to-pgbench.patch (text/plain, +47-5)
v4-0002-1.-Do-not-retry-failed-transaction-due-to-other_sql_.patch (text/plain, +45-7)
Dear Ikeda-san,
Thanks for updating the patch!
1. I should've also set benchmarking_option_set. I've modified it accordingly.
Confirmed it has been fixed. Thanks.
2. The exit-on-abort option and continue-on-error option are mutually exclusive.
Therefore, I've updated the patch to throw a FATAL error when two options are
set simultaneously. Corresponding explanation was also added.
(I'm wondering the name of parameter should be continue-on-abort so that users
understand the two option are mutually exclusive.)
Makes sense, +1.
Here are new comments.
01. build failure
According to the cfbot [1], the documentation cannot be built. IIUC, a </para>
seems to be missing here:
```
+ <para>
+ Note that this option can not be used together with
+ <option>--exit-on-abort</option>.
+ </listitem>
+ </varlistentry>
```
02. patch separation
How about separating the patch series like:
0001 - contains option handling and retry part, and documentation
0002 - contains accumulation/reporting part
0003 - contains tests.
I hope the above style is more helpful for reviewers.
03. documentation
```
+ Note that this option can not be used together with
+ <option>--exit-on-abort</option>.
```
I feel we should add a similar description to the `exit-on-abort` part.
04. documentation
```
+ Client rolls back the failed transaction and starts a new one when its
+ transaction fails due to the reason other than the deadlock and
+ serialization failure. This allows all clients specified with -c option
+ to continuously apply load to the server, even if some transactions fail.
```
I feel the description is a bit redundant and misses the default behavior.
How about:
```
<para>
Clients survive when their transactions are aborted, and they continue
their run. Without the option, clients exit when transactions they run
are aborted.
</para>
<para>
Note that serialization failures or deadlock failures do not abort the
client, so they are not affected by this option.
See <xref linkend="failures-and-retries"/> for more information.
</para>
```
05. StatsData
```
+ * When continue-on-error option is specified,
+ * failed (the number of failed transactions) =
+ * 'other_sql_failures' (they got a error when continue-on-error option
+ * was specified).
```
Let me confirm one point; can serialization_failures and deadlock_failures be
counted when continue-on-error is true? If so, the comment seems incorrect to me.
The formula would be 'serialization_failures' + 'deadlock_failures' +
'other_sql_failures' in that case.
06. StatsData
Another point; can other_sql_failures be counted when the continue-on-error is NOT
specified? I feel it should be...
07. usage()
The added line is too long. According to program_help_ok(), each line of the
help output should be less than 80 characters.
08.
Please run pgindent/pgperltidy; I got some diffs.
[1]: https://cirrus-ci.com/task/5210061275922432
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Mon, 9 Jun 2025 09:34:03 +0000
"Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote:
2. The exit-on-abort option and continue-on-error option are mutually exclusive.
Therefore, I've updated the patch to throw a FATAL error when two options are
set simultaneously. Corresponding explanation was also added.
I don't think that's right, since "abort" and "error" are different concepts in pgbench.
(Here, "abort" refers to the termination of a client, not a transaction abort.)
The --exit-on-abort option forces pgbench to exit immediately when any client is aborted
due to some error. When the --continue-on-error option is not set, SQL errors other than
deadlock or serialization error cause a client to be aborted. On the other hand, when the option
is set, clients are not aborted due to any SQL errors; instead they continue to run after them.
However, clients can still be aborted for other reasons, such as connection failures or
meta-command errors (e.g., \set x 1/0). In these cases, the --exit-on-abort option remains
useful even when --continue-on-error is enabled.
(I'm wondering the name of parameter should be continue-on-abort so that users
understand the two option are mutually exclusive.)
For the same reason as above, I believe --continue-on-error is a more accurate description
of the option's behavior.
02. patch separation
How about separating the patch series like:
0001 - contains option handling and retry part, and documentation
0002 - contains accumulation/reporting part
0003 - contains tests.
I hope above style is more helpful for reviewers.
I'm not sure whether it's necessary to split the patch, as the change doesn't seem very
complex. However, the current separation appears inconsistent. For example, patch 0001
modifies canRetryError(), but patch 0002 reverts that change, and so on.
04. documentation
```
+        Client rolls back the failed transaction and starts a new one when its
+        transaction fails due to the reason other than the deadlock and
+        serialization failure. This allows all clients specified with -c option
+        to continuously apply load to the server, even if some transactions fail.
```
I feel the description contains bit redundant part and misses the default behavior.
How about:
```
<para>
Clients survive when their transactions are aborted, and they continue
their run. Without the option, clients exit when transactions they run
are aborted.
</para>
<para>
Note that serialization failures or deadlock failures do not abort the
client, so they are not affected by this option.
See <xref linkend="failures-and-retries"/> for more information.
</para>
```
I think we can make it clearer as follows:
Allows clients to continue their run even if an SQL statement fails due to errors other
than serialization or deadlock. Without this option, the client is aborted after
such errors.
Note that serialization and deadlock failures never cause the client to be aborted,
so they are not affected by this option. See <xref linkend="failures-and-retries"/>
for more information.
That said, a review by a native English speaker would still be appreciated.
Also, we would need to update several parts of the documentation. For example, the
"Failures and Serialization/Deadlock Retries" section should be revised to describe the
behavior change. In addition, we should update the explanations of output result examples
and logging, the description of the --failures-detailed option, and so on.
If transactions are not retried after SQL errors other than serialization or deadlock,
this should also be explicitly documented.
05. StatsData
```
+ * When continue-on-error option is specified,
+ * failed (the number of failed transactions) =
+ * 'other_sql_failures' (they got a error when continue-on-error option
+ * was specified).
```
Let me confirm one point; can serialization_failures and deadlock_failures be
counted when continue-on-error is true? If so, the comment seems not correct to me.
The formula should be 'serialization_failures' + 'deadlock_failures' +
'other_sql_failures' in that case.
+1
06. StatsData
Another point; can other_sql_failures be counted when continue-on-error is NOT
specified? I feel it should be...
We could do that. However, if an SQL error other than a serialization or deadlock error
occurs when --continue-on-error is not set, pgbench will be aborted midway and the printed
results will be incomplete. Therefore, this might not make much sense.
06. usage()
The added line is too long. According to program_help_ok(), the output by help should
be less than 80 columns.
+1
Here are additional comments from me.
@@ -4548,6 +4570,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "error (except serialization/deadlock)";
Strings returned by getResultString() are printed in the "time" field of the
log when both the -l and --failures-detailed options are set. Therefore, they
should be single words that do not contain any space characters. I wonder if
something like "other" or "other_sql_error" would be appropriate.
@@ -4099,6 +4119,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
* can retry the error.
*/
st->state = timer_exceeded ? CSTATE_FINISHED :
+ continue_on_error ? CSTATE_FAILURE :
doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
}
else
This fix is not necessary because doRetry() (and canRetryError(), which is called
within it) will return false when continue_on_error is set (after applying patch 0002).
case PGRES_NONFATAL_ERROR:
case PGRES_FATAL_ERROR:
    st->estatus = getSQLErrorStatus(PQresultErrorField(res,
                                                       PG_DIAG_SQLSTATE));
    if (canRetryError(st->estatus))
    {
        if (verbose_errors)
            commandError(st, PQerrorMessage(st->con));
        goto error;
    }
    /* fall through */
default:
    /* anything else is unexpected */
    pg_log_error("client %d script %d aborted in command %d query %d: %s",
                 st->id, st->use_file, st->command, qrynum,
                 PQerrorMessage(st->con));
    goto error;
}
When an SQL error other than a serialization or deadlock error occurs, an error message is
output via pg_log_error in this code path. However, I think this should be reported only
when verbose_errors is set, similar to how serialization and deadlock errors are handled
when --continue-on-error is enabled.
Best regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Dear Nagata-san,
2. The exit-on-abort option and continue-on-error option are mutually exclusive.
Therefore, I've updated the patch to throw a FATAL error when two options are
set simultaneously. Corresponding explanation was also added.
I don't think that's right since "abort" and "error" are different concepts in pgbench.
(Here, "abort" refers to the termination of a client, not a transaction abort.)
The --exit-on-abort option forces pgbench to exit immediately when any client is
aborted due to some error. When the --continue-on-error option is not set, SQL errors
other than deadlock or serialization error cause a client to be aborted. On the other
hand, when the option is set, clients are not aborted due to any SQL errors; instead
they continue to run after them. However, clients can still be aborted for other
reasons, such as connection failures or meta-command errors (e.g., \set x 1/0). In
these cases, the --exit-on-abort option remains useful even when --continue-on-error
is enabled.
To clarify: another approach is to allow the --continue-on-error option to continue
running even when clients meet such errors. Which one is better?
02. patch separation
How about separating the patch series like:
0001 - contains option handling and retry part, and documentation
0002 - contains accumulation/reporting part
0003 - contains tests.
I hope the above style is more helpful for reviewers.
I'm not sure whether it's necessary to split the patch, as the change doesn't seem
very complex. However, the current separation appears inconsistent. For example,
patch 0001 modifies canRetryError(), but patch 0002 reverts that change, and so on.
Either way is fine for me if they are changed from the current method.
04. documentation
```
+        Client rolls back the failed transaction and starts a new one when its
+        transaction fails due to the reason other than the deadlock and
+        serialization failure. This allows all clients specified with -c option
+        to continuously apply load to the server, even if some transactions fail.
```
I feel the description contains a bit redundant part and misses the default behavior.
How about:
```
<para>
Clients survive when their transactions are aborted, and they continue
their run. Without the option, clients exit when transactions they run
are aborted.
</para>
<para>
Note that serialization failures or deadlock failures do not abort the
client, so they are not affected by this option.
See <xref linkend="failures-and-retries"/> for more information.
</para>
```
I think we can make it clearer as follows:
I do not have confidence in my English; a native speaker's review is needed....
06. usage()
The added line is too long. According to program_help_ok(), the output by help should
be less than 80 columns.
+1
FYI - I posted a patch which adds the test [1]. You can apply it and confirm how it behaves.
[1]: /messages/by-id/OSCPR01MB1496610451F5896375B2562E6F56BA@OSCPR01MB14966.jpnprd01.prod.outlook.com
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Tue, 17 Jun 2025 03:47:00 +0000
"Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote:
Dear Nagata-san,
2. The exit-on-abort option and continue-on-error option are mutually exclusive.
Therefore, I've updated the patch to throw a FATAL error when two options are
set simultaneously. Corresponding explanation was also added.
I don't think that's right since "abort" and "error" are different concepts in pgbench.
(Here, "abort" refers to the termination of a client, not a transaction abort.)
The --exit-on-abort option forces pgbench to exit immediately when any client is
aborted due to some error. When the --continue-on-error option is not set, SQL errors
other than deadlock or serialization error cause a client to be aborted. On the other
hand, when the option is set, clients are not aborted due to any SQL errors; instead
they continue to run after them. However, clients can still be aborted for other
reasons, such as connection failures or meta-command errors (e.g., \set x 1/0). In
these cases, the --exit-on-abort option remains useful even when --continue-on-error
is enabled.
To clarify: another approach is to allow the --continue-on-error option to continue
running even when clients meet such errors. Which one is better?
It might be worth discussing which types of errors this option should allow pgbench
to continue after. As I understand it, the current patch's goal is to allow only SQL
level errors like constraint violations. It seems good because this could simulate the
behaviour of applications that ignore or retry such errors (although they are not
retried in the current patch). Perhaps, it makes sense to allow to continue after
some network errors because it would enable benchmarks using a cluster system or a
cloud service that could report a temporary error during a failover.
It might be worth discussing which types of errors this option should allow pgbench to
continue after.
As I understand it, the current patch aims to allow continuation only after SQL-level
errors, such as constraint violations. That seems reasonable, as it can simulate the
behavior of applications that ignore or retry such errors (even though retries are not
implemented in the current patch).
Perhaps it also makes sense to allow continuation after certain network errors, as this
would enable benchmarking with cluster systems or cloud services, which might report
temporary errors during a failover. We would need additional work to properly detect
and handle network errors, though.
However, I'm not sure it's reasonable to allow continuation after other types of errors,
such as misuse of meta-commands or unexpected errors during their execution, since these
wouldn't simulate any real application behavior and would more likely indicate a failure
in the benchmarking process itself.
Best regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Tue, 17 Jun 2025 16:28:28 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
On Tue, 17 Jun 2025 03:47:00 +0000
"Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote:
Dear Nagata-san,
2. The exit-on-abort option and continue-on-error option are mutually exclusive.
Therefore, I've updated the patch to throw a FATAL error when two options are
set simultaneously. Corresponding explanation was also added.
I don't think that's right since "abort" and "error" are different concepts in pgbench.
(Here, "abort" refers to the termination of a client, not a transaction abort.)
The --exit-on-abort option forces pgbench to exit immediately when any client is
aborted due to some error. When the --continue-on-error option is not set, SQL errors
other than deadlock or serialization error cause a client to be aborted. On the other
hand, when the option is set, clients are not aborted due to any SQL errors; instead
they continue to run after them. However, clients can still be aborted for other
reasons, such as connection failures or meta-command errors (e.g., \set x 1/0). In
these cases, the --exit-on-abort option remains useful even when --continue-on-error
is enabled.
To clarify: another approach is to allow the --continue-on-error option to continue
running even when clients meet such errors. Which one is better?
It might be worth discussing which types of errors this option should allow pgbench
to continue after. As I understand it, the current patch's goal is to allow only SQL
level errors like constraint violations. It seems good because this could simulate the
behaviour of applications that ignore or retry such errors (although they are not
retried in the current patch). Perhaps, it makes sense to allow to continue after
some network errors because it would enable benchmarks using a cluster system or a
cloud service that could report a temporary error during a failover.
I apologize for accidentally leaving the draft paragraph just above in my previous post.
Please ignore it.
It might be worth discussing which types of errors this option should allow pgbench to
continue after.
As I understand it, the current patch aims to allow continuation only after SQL-level
errors, such as constraint violations. That seems reasonable, as it can simulate the
behavior of applications that ignore or retry such errors (even though retries are not
implemented in the current patch).
Perhaps it also makes sense to allow continuation after certain network errors, as this
would enable benchmarking with cluster systems or cloud services, which might report
temporary errors during a failover. We would need additional work to properly detect
and handle network errors, though.
However, I'm not sure it's reasonable to allow continuation after other types of errors,
such as misuse of meta-commands or unexpected errors during their execution, since these
wouldn't simulate any real application behavior and would more likely indicate a failure
in the benchmarking process itself.
Best regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
--
Yugo Nagata <nagata@sraoss.co.jp>
Dear Nagata-san,
As I understand it, the current patch aims to allow continuation only after SQL-level
errors, such as constraint violations. That seems reasonable, as it can simulate the
behavior of applications that ignore or retry such errors (even though retries are
not implemented in the current patch).
Yes, no one has objections to retry in this case. This is a main part of the proposal.
However, I'm not sure it's reasonable to allow continuation after other types of errors,
such as misuse of meta-commands or unexpected errors during their execution, since these
wouldn't simulate any real application behavior and would more likely indicate a failure
in the benchmarking process itself.
I have a concern for \gset metacommand.
According to the doc and source code, \gset assumes that the executed command surely
returns one tuple:
```
if (meta == META_GSET && ntuples != 1)
{
    /* under \gset, report the error */
    pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
                 st->id, st->use_file, st->command, qrynum, PQntuples(res));
    st->estatus = ESTATUS_META_COMMAND_ERROR;
    goto error;
}
```
But sometimes the SQL command may return no tuples, or multiple ones, due to
concurrent transactions. I feel retrying the transaction is very useful
in this case.
Anyway, we must confirm the opinion from the proposer.
[1]: https://github.com/ryogrid/tpcc_like_with_pgbench
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Thu, 26 Jun 2025 05:45:12 +0000
"Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote:
Dear Nagata-san,
As I understand it, the current patch aims to allow continuation only after SQL-level
errors, such as constraint violations. That seems reasonable, as it can simulate the
behavior of applications that ignore or retry such errors (even though retries are
not implemented in the current patch).
Yes, no one has objections to retry in this case. This is a main part of the proposal.
As I understand it, the proposed --continue-on-error option does not retry the transaction
in any case; it simply gives up on the transaction. That is, when an SQL-level error occurs,
the transaction is reported as "failed" rather than "retried", and the random state is discarded.
However, I'm not sure it's reasonable to allow continuation after other types of errors,
such as misuse of meta-commands or unexpected errors during their execution, since these
wouldn't simulate any real application behavior and would more likely indicate a failure
in the benchmarking process itself.
I have a concern for the \gset metacommand.
According to the doc and source code, \gset assumes that the executed command surely
returns one tuple:
```
if (meta == META_GSET && ntuples != 1)
{
    /* under \gset, report the error */
    pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
                 st->id, st->use_file, st->command, qrynum, PQntuples(res));
    st->estatus = ESTATUS_META_COMMAND_ERROR;
    goto error;
}
```
But sometimes the SQL command may return no tuples, or multiple ones, due to
concurrent transactions. I feel retrying the transaction is very useful
in this case.
You can use the \aset command instead to avoid the pgbench error. If the query doesn't
return any row, a subsequent SQL command trying to use the variable will fail, but this
would be ignored without terminating the benchmark when the --continue-on-error option
is enabled.
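As a sketch of this \aset workaround (reusing the test table from the original post; the WHERE condition and variable name are made up for illustration):

```
-- \aset does not report an error when the query returns zero rows;
-- the variable is simply left unset in that case
SELECT col2 AS found_val FROM test WHERE col2 = 123 \aset
-- if :found_val was never set, this statement fails with an SQL-level error,
-- which --continue-on-error turns into a failed (not fatal) transaction
INSERT INTO test (col2) VALUES (:found_val);
```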
Anyway, we must confirm the opinion from the proposer.
+1
Best regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Hi,
Thank you very much for your valuable comments and kind advice. I'm
currently working on revising the previous patch based on the feedback
received. I would like to share my thoughts regarding the conditions
under which the --continue-on-error option should initiate a new
transaction or a new connection.
In my opinion, when the --continue-on-error option is enabled, pgbench
clients do not need to start new transactions after network errors or
other errors except for SQL-level errors.
Network errors are relatively rare, except in failover scenarios.
Outside of failover, any network issues should be resolved rather than
worked around. In the context of failover, the key metric is not TPS,
but system downtime. While one might infer the timing of a failover by
observing the output of the --progress option, you can easily determine the
downtime by executing a simple SQL query such as `psql -c 'SELECT 1'` every
second.
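As a rough sketch of that probing approach (a hypothetical helper, not part of the patch; the real loop assumes psql is on PATH with suitable connection defaults):

```shell
# check_once runs the given probe command and prints "up" or "down"
# depending on whether it succeeded
check_once() {
    if "$@" >/dev/null 2>&1; then echo up; else echo down; fi
}

# hypothetical downtime probe -- run manually during a failover test:
#   while true; do echo "$(date +%T) $(check_once psql -Atc 'SELECT 1')"; sleep 1; done

check_once true    # prints: up
check_once false   # prints: down
```

Counting the "down" lines then gives an estimate of the downtime in seconds.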
On 2025/06/26 18:47, Yugo Nagata wrote:
On Thu, 26 Jun 2025 05:45:12 +0000
"Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote:
Dear Nagata-san,
As I understand it, the current patch aims to allow continuation only
after SQL-level errors, such as constraint violations. That seems
reasonable, as it can simulate the behavior of applications that ignore
or retry such errors (even though retries are not implemented in the
current patch).
Yes, no one has objections to retry in this case. This is a main part
of the proposal.
As I understand it, the proposed --continue-on-error option does not
retry the transaction in any case; it simply gives up on the transaction.
That is, when an SQL-level error occurs, the transaction is reported as
"failed" rather than "retried", and the random state is discarded.
Retrying the failed transaction is not necessary when the transaction
failed due to SQL-level errors. Unlike real-world applications, pgbench
does not need to complete a specific transaction successfully. In the case
of unique constraint violations, retrying the same transaction will
likely result in the same error again.
I want to hear your thoughts on this.
Best regards,
Rintaro Ikeda
On Fri, 27 Jun 2025 14:06:24 +0900
ikedarintarof <ikedarintarof@oss.nttdata.com> wrote:
Hi,
Thank you very much for your valuable comments and kind advice. I'm
currently working on revising the previous patch based on the feedback
received. I would like to share my thoughts regarding the conditions
under which the --continue-on-error option should initiate a new
transaction or a new connection.
In my opinion, when the --continue-on-error option is enabled, pgbench
clients do not need to start new transactions after network errors or
other errors except for SQL-level errors.
+1
I agree that --continue-on-error prevents pgbench from terminating only when
SQL-level errors occur, and does not change the behavior in the case of other
types of errors, including network errors.
As I understand it, the proposed --continue-on-error option does not
retry the transaction in any case; it simply gives up on the transaction.
That is, when an SQL-level error occurs, the transaction is reported as
"failed" rather than "retried", and the random state is discarded.
Retrying the failed transaction is not necessary when the transaction
failed due to SQL-level errors. Unlike real-world applications, pgbench
does not need to complete a specific transaction successfully. In the case
of unique constraint violations, retrying the same transaction will
likely result in the same error again.
Agreed.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>