Suggestion to add --continue-client-on-abort option to pgbench
Hi hackers,
I would like to suggest adding a new option to pgbench, which enables
the client to continue processing transactions even if some errors occur
during a transaction.
Currently, a client stops sending requests when its transaction is
aborted due to reasons other than serialization failures or deadlocks. I
think in some cases, especially when using custom scripts, the client
should be able to roll back the failed transaction and start a new one.
For example, my custom script (insert_to_unique_column.sql) is as follows:
```
CREATE TABLE IF NOT EXISTS test (col1 serial, col2 int unique);
INSERT INTO test (col2) VALUES (random(0, 50000));
```
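With this script, a transaction aborts on a duplicate key error; the server reports something like the following (the constraint name assumes the default one generated for the unique column):
```
ERROR:  duplicate key value violates unique constraint "test_col2_key"
DETAIL:  Key (col2)=(12345) already exists.
```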
Assume we need to continuously apply load to the server using 5 clients
for a certain period of time. However, a client sometimes stops when its
transaction in my custom script is aborted due to a unique constraint
violation. As a result, the load on the server is lower than expected,
which is the problem I want to address.
The proposed new option solves this problem. When
--continue-client-on-error is specified, the client rolls back the
failed transaction and starts a new one. This allows all 5 clients to
continuously apply load to the server, even if some transactions fail.
```
% bin/pgbench -d postgres -f ../insert_to_unique_column.sql -T 10
--failures-detailed --continue-client-on-error
transaction type: ../custom_script_insert.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 33552
number of failed transactions: 21901 (39.495%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other failures: 21901 (39.495%)
latency average = 0.180 ms (including failures)
initial connection time = 2.857 ms
tps = 3356.092385 (without initial connection time)
```
I have attached the patch. I would appreciate your feedback.
Best regards,
Rintaro Ikeda
NTT DATA Corporation Japan
Attachments:
0001-add-continue-client-on-error-option-to-pgbench.patch (text/x-diff)
From a15432989e2539d55e6ad2f26b3aac7b2221413f Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <51394766+rinikeda@users.noreply.github.com>
Date: Sat, 10 May 2025 16:49:17 +0900
Subject: [PATCH] add continue-client-on-error option to pgbench
---
src/bin/pgbench/pgbench.c | 41 +++++++++++++++++++++++++++++++++++----
1 file changed, 37 insertions(+), 4 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 497a936c141..8eaf6ea38e3 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -440,6 +440,8 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions, which is enabled
+ * if --continue-client-on-error is used */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +772,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -1467,6 +1470,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1520,12 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3247,7 +3257,8 @@ static bool
canRetryError(EStatus estatus)
{
return (estatus == ESTATUS_SERIALIZATION_ERROR ||
- estatus == ESTATUS_DEADLOCK_ERROR);
+ estatus == ESTATUS_DEADLOCK_ERROR ||
+ (continue_on_error && estatus == ESTATUS_OTHER_SQL_ERROR));
}
/*
@@ -4528,7 +4539,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4548,6 +4560,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "error";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4603,6 +4617,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4643,10 +4658,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6285,6 +6302,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6443,6 +6461,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6546,6 +6567,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6705,6 +6730,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-client-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7058,6 +7084,12 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-client-on-error */
+ continue_on_error = true;
+ break;
+ case 18: /* continue-client-on-error */
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7413,6 +7445,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
--
2.39.5 (Apple Git-154)
On Sat, May 10, 2025 at 8:45 PM ikedarintarof <ikedarintarof@oss.nttdata.com> wrote:
[...]
Hi Rintaro,
Thanks for the patch and explanation. I understand your goal is to ensure
that pgbench clients continue running even when some transactions fail due
to application-level errors (e.g., constraint violations), especially when
running custom scripts.
However, I wonder if the intended behavior can't already be achieved using
standard SQL constructs — specifically ON CONFLICT or careful transaction
structure. For example, your sample script:
CREATE TABLE IF NOT EXISTS test (col1 serial, col2 int unique);
INSERT INTO test (col2) VALUES (random(0, 50000));
can be rewritten as:
\setrandom val 0 50000
INSERT INTO test (col2) VALUES (:val) ON CONFLICT DO NOTHING;
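(\setrandom is the old pgbench meta-command and has since been removed; on current versions the equivalent form would be:)
```
\set val random(0, 50000)
INSERT INTO test (col2) VALUES (:val) ON CONFLICT DO NOTHING;
```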
This avoids transaction aborts entirely in the presence of uniqueness
violations and ensures the client continues to issue load without
interruption. In many real-world benchmarking scenarios, this is the
preferred and simplest approach.
So from that angle, could you elaborate on specific cases where this
SQL-level workaround wouldn't be sufficient? Are there error types you
intend to handle that cannot be gracefully avoided or recovered from using
SQL constructs like ON CONFLICT, or SAVEPOINT/ROLLBACK TO?
Best regards,
Stepan Neretin
On Sat, May 10, 2025 at 8:45 PM ikedarintarof <ikedarintarof@oss.nttdata.com> wrote:
[...]
+1. I've had similar cases before too, where I'd wanted pgbench to
continue creating load on the server even if a transaction failed
server-side for any reason. Sometimes, I'd even want that type of
load.
On Sat, 10 May 2025 at 17:02, Stepan Neretin <slpmcf@gmail.com> wrote:
INSERT INTO test (col2) VALUES (random(0, 50000));
can be rewritten as:
\setrandom val 0 50000
INSERT INTO test (col2) VALUES (:val) ON CONFLICT DO NOTHING;
That won't test the same execution paths, so an option to explicitly
roll back or ignore failed transactions (rather than stopping the
benchmark) would be a nice feature.
With e.g. ON CONFLICT DO NOTHING you'll have a much higher workload if
there are many conflicting entries, as that triggers and catches errors
per row, rather than per statement. E.g., an INSERT INTO ... SELECT ...
could conflict on multiple rows, but will fail on the first conflict,
while DO NOTHING causes full execution of the SELECT statement, which
has an inherently different performance profile.
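A sketch of that difference, reusing the test table from upthread (row counts arbitrary):
```
-- Aborts the whole statement (and transaction) at the first duplicate:
INSERT INTO test (col2) SELECT g % 100 FROM generate_series(1, 10000) g;

-- Runs the SELECT to completion and skips each conflicting row individually:
INSERT INTO test (col2) SELECT g % 100 FROM generate_series(1, 10000) g
ON CONFLICT DO NOTHING;
```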
This avoids transaction aborts entirely in the presence of uniqueness violations and ensures the client continues to issue load without interruption. In many real-world benchmarking scenarios, this is the preferred and simplest approach.
So from that angle, could you elaborate on specific cases where this SQL-level workaround wouldn't be sufficient? Are there error types you intend to handle that cannot be gracefully avoided or recovered from using SQL constructs like ON CONFLICT, or SAVEPOINT/ROLLBACK TO?
The issue isn't necessarily whether you can construct SQL scripts that
don't raise such errors (indeed, it's possible to do so for nearly any
command; you can run pl/pgsql procedures or DO blocks which catch and
ignore errors), but rather whether we can make pgbench function in a
way that can keep load on the server even when it notices an error.
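For instance, a DO block along these lines (a sketch, reusing the example table) absorbs unique violations server-side, so the client never sees an aborted transaction:
```
DO $$
BEGIN
  INSERT INTO test (col2) VALUES (trunc(random() * 50000)::int);
EXCEPTION WHEN unique_violation THEN
  NULL;  -- swallow the duplicate-key error; the transaction survives
END
$$;
```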
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
On Sun, May 11, 2025 at 7:07 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
[...]
Hi Matthias,
Thanks for your detailed explanation — it really helped clarify the
usefulness of the patch. I agree that the feature is indeed valuable, and
it's great to see it being pushed forward.
Regarding the patch code, I noticed that there are duplicate case entries
in the command-line option handling and in accumStats() (the case 18 and
case ESTATUS_OTHER_SQL_ERROR labels for the continue-client-on-error
option each appear twice). These duplicated cases can be merged to
simplify the logic and reduce redundancy.
Best regards,
Stepan Neretin
Hi Stepan and Matthias,
Thank you both for your replies. I agree with Matthias's detailed explanation regarding the purpose of the patch.
I also appreciate your pointing out my mistakes in the previous version of the patch. I fixed the duplicated lines. I’ve attached the updated patch.
Best regards,
Rintaro Ikeda
Attachments:
0001-add-continue-client-on-error-option-to-pgbench_ver2.patch (application/octet-stream)
From 1d36f296b3672c2b9570022705931f9a7b265f47 Mon Sep 17 00:00:00 2001
From: "Rintaro.Ikeda" <ikedarintarof@oss.nttdata.com>
Date: Mon, 12 May 2025 21:57:39 +0900
Subject: [PATCH] add continue-client-on-error option to pgbench
---
src/bin/pgbench/pgbench.c | 34 ++++++++++++++++++++++++++++++----
1 file changed, 30 insertions(+), 4 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 497a936c141..f1c4e7a3ea8 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -440,6 +440,10 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for reasons
+ * other than serialization/deadlock failures,
+ * which is counted only when
+ * --continue-client-on-error is used */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +774,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -1467,6 +1472,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -3247,7 +3253,8 @@ static bool
canRetryError(EStatus estatus)
{
return (estatus == ESTATUS_SERIALIZATION_ERROR ||
- estatus == ESTATUS_DEADLOCK_ERROR);
+ estatus == ESTATUS_DEADLOCK_ERROR ||
+ (continue_on_error && estatus == ESTATUS_OTHER_SQL_ERROR));
}
/*
@@ -4528,7 +4535,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4548,6 +4556,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "error (except serialization/deadlock)";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4603,6 +4613,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4643,10 +4654,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6285,6 +6298,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6443,6 +6457,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6546,6 +6563,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6705,6 +6726,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-client-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7058,6 +7080,9 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-client-on-error */
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7413,6 +7438,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
--
2.39.5 (Apple Git-154)
On Tue, May 13, 2025 at 9:20 AM <Rintaro.Ikeda@nttdata.com> wrote:
I also appreciate your pointing out my mistakes in the previous version of the patch. I fixed the duplicated lines. I’ve attached the updated patch.
This is a useful feature, so +1 from my side. Here are some initial
comments from a quick look at the patch.
1. You need to update the stats for this new counter in the
"accumStats()" function.
2. IMHO, " continue-on-error " is more user-friendly than
"continue-client-on-error".
3. There are a lot of whitespace errors, so those can be fixed. You
can just try to apply using git am, and it will report those
whitespace warnings. And for fixing, you can just use
"--whitespace=fix" along with git am.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Hi,
On Tue, May 13, 2025 at 11:27 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
[...]
Hi, +1 for the idea. I’ve reviewed and tested the patch. Aside from Dilip’s
feedback and the missing usage information for this option, the patch LGTM.
Here's a diff adding the missing usage information for this option and,
as Dilip mentioned, updating the new counter in the "accumStats()" function.
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index baaf1379be2..20d456bc4b9 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -959,6 +959,8 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --continue-client-on-error\n"
+ " Continue and retry transactions that failed due to errors other than serialization or deadlocks.\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1522,6 +1524,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
Hi, hackers.
On Tue, May 13, 2025 at 11:27 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
1. You need to update the stats for this new counter in the "accumStats()" function.
2. IMHO, "continue-on-error" is more user-friendly than "continue-client-on-error".
3. There are a lot of whitespace errors, so those can be fixed. You can just try to apply using git am, and it will report those whitespace warnings. And for fixing, you can just use "--whitespace=fix" along with git am.
On May 14, 2025, at 18:08, Srinath Reddy Sadipiralla <srinath2133@gmail.com> wrote:
Here's the diff for the missing usage information for this option and as Dilip mentioned updating the new counter in the "accumStats()" function.
Thank you very much for the helpful comments, and apologies for my delayed reply.
I've updated the patch based on your suggestions:
- Modified the name of the option.
- Added the missing explanation.
- Updated the new counter in the `accumStats()` function as pointed out.
- Fixed the whitespace issues.
Additionally, I've included documentation for the new option.
I'm submitting this updated patch to the current CommitFest.
Best Regards,
Rintaro Ikeda
Hi, Hackers.
I've attached the patch that I failed to include in my previous email.
(I'm still a bit confused about how to attach files using the standard
Mail client on macOS.)
Best Regards,
Rintaro Ikeda
Attachments:
v3-0001-Add-continue-on-error-option-to-pgbench.patch (text/x-diff)
From fca20d18dbedc8a9c66408da3e7139cd4192ff5b Mon Sep 17 00:00:00 2001
From: "Rintaro.Ikeda" <ikedarintarof@oss.nttdata.com>
Date: Mon, 12 May 2025 21:57:39 +0900
Subject: [PATCH] Add continue-on-error option to pgbench
When the option is set, client rolls back the failed transaction and starts a
new one when its transaction fails due to the reason other than the deadlock and
serialization failure.
---
doc/src/sgml/ref/pgbench.sgml | 12 +++++++++++
src/bin/pgbench/pgbench.c | 39 +++++++++++++++++++++++++++++++----
2 files changed, 47 insertions(+), 4 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..dcb8c1c487c 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -914,6 +914,18 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Client rolls back the failed transaction and starts a new one when its
+ transaction fails due to the reason other than the deadlock and
+ serialization failure. This allows all clients specified with -c option
+ to continuously apply load to the server, even if some transactions fail.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 497a936c141..5db222f2c1e 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -440,6 +440,10 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for reasons
+ * other than serialization/deadlock failures,
+ * which is counted only when
+ * --continue-on-error is used */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +774,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +959,8 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --continue-on-error\n"
+ " Continue and retry transactions that failed due to errors other than serialization or deadlocks.\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1474,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1524,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3247,7 +3258,8 @@ static bool
canRetryError(EStatus estatus)
{
return (estatus == ESTATUS_SERIALIZATION_ERROR ||
- estatus == ESTATUS_DEADLOCK_ERROR);
+ estatus == ESTATUS_DEADLOCK_ERROR ||
+ (continue_on_error && estatus == ESTATUS_OTHER_SQL_ERROR));
}
/*
@@ -4528,7 +4540,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4548,6 +4561,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "error (except serialization/deadlock)";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4603,6 +4618,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4643,10 +4659,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6285,6 +6303,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6443,6 +6462,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6546,6 +6568,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6705,6 +6731,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7058,6 +7085,9 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7413,6 +7443,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
--
2.39.5 (Apple Git-154)
Dear Ikeda-san,
Thanks for starting the new thread! I had not known about the issue before I
heard about it at PGConf.dev.
A few comments:
1.
This parameter seems to be a benchmarking option, so should we set
benchmarking_option_set as well?
2.
Not sure, but exit-on-abort seems like a similar option. What if both are specified?
Is that allowed?
3.
Can you add a test case for the new parameter?
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Dear Kuroda-san, hackers,
On 2025/06/04 21:57, Hayato Kuroda (Fujitsu) wrote:
[...]
Thank you for your valuable comments!
1. I should've also set benchmarking_option_set. I've modified it accordingly.
2. The exit-on-abort and continue-on-error options are mutually exclusive.
Therefore, I've updated the patch to throw a FATAL error when the two options
are set simultaneously (see the sketch below), and added a corresponding
explanation to the documentation.
(I'm wondering whether the parameter should be named continue-on-abort so that
users understand that the two options are mutually exclusive.)
3. I've added the test.
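For point 2, the check is roughly the following (a sketch; the exact message and placement are in the attached patch):
```
if (exit_on_abort && continue_on_error)
    pg_fatal("--exit-on-abort and --continue-on-error are mutually exclusive");
```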
Additionally, I modified the patch so that st->state does not transition to
CSTATE_RETRY when a transaction fails and the continue-on-error option is enabled.
In the previous patch, we retried the failed transaction up to max-tries times,
which is unnecessary for our purpose: clients should simply not exit when their
transactions fail.
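The idea is roughly the following (a sketch against pgbench's client state machine; identifier names as in pgbench.c, exact placement differs in the attached patch):
```
/* after a failed SQL command, decide what to do with the transaction */
if (continue_on_error && st->estatus == ESTATUS_OTHER_SQL_ERROR)
    st->state = CSTATE_FAILURE;  /* roll back, count it, start a new transaction */
else if (doRetry(st, &now))
    st->state = CSTATE_RETRY;    /* serialization/deadlock: retry up to max_tries */
else
    st->state = CSTATE_FAILURE;
```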
I've attached the updated patches.
v3-0001-Add-continue-on-error-option-to-pgbench.patch is identical to
v4-0001-Add-continue-on-error-option-to-pgbench.patch. The v4-0002 patch
contains the changes relative to it.
Best regards,
Rintaro Ikeda
Attachments:
v4-0001-Add-continue-on-error-option-to-pgbench.patch (text/plain)
From 6a539c2d467b671728a564a0368eb474d398310c Mon Sep 17 00:00:00 2001
From: "Rintaro.Ikeda" <ikedarintarof@oss.nttdata.com>
Date: Mon, 12 May 2025 21:57:39 +0900
Subject: [PATCH 1/2] Add continue-on-error option to pgbench
When the option is set, client rolls back the failed transaction and starts a
new one when its transaction fails due to the reason other than the deadlock and
serialization failure.
---
doc/src/sgml/ref/pgbench.sgml | 12 +++++++++++
src/bin/pgbench/pgbench.c | 39 +++++++++++++++++++++++++++++++----
2 files changed, 47 insertions(+), 4 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..dcb8c1c487c 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -914,6 +914,18 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Client rolls back the failed transaction and starts a new one when its
+ transaction fails due to the reason other than the deadlock and
+ serialization failure. This allows all clients specified with -c option
+ to continuously apply load to the server, even if some transactions fail.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 497a936c141..5db222f2c1e 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -440,6 +440,10 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for reasons
+ * other than serialization/deadlock failures,
+ * which is counted only when
+ * --continue-on-error is used */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +774,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +959,8 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --continue-on-error\n"
+ " Continue and retry transactions that failed due to errors other than serialization or deadlocks.\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1474,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1524,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3247,7 +3258,8 @@ static bool
canRetryError(EStatus estatus)
{
return (estatus == ESTATUS_SERIALIZATION_ERROR ||
- estatus == ESTATUS_DEADLOCK_ERROR);
+ estatus == ESTATUS_DEADLOCK_ERROR ||
+ (continue_on_error && estatus == ESTATUS_OTHER_SQL_ERROR));
}
/*
@@ -4528,7 +4540,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4548,6 +4561,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "error (except serialization/deadlock)";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4603,6 +4618,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4643,10 +4659,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6285,6 +6303,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6443,6 +6462,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6546,6 +6568,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6705,6 +6731,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7058,6 +7085,9 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7413,6 +7443,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
--
2.39.5 (Apple Git-154)
v4-0002-1.-Do-not-retry-failed-transaction-due-to-other_sql_.patchtext/plain; charset=UTF-8; name=v4-0002-1.-Do-not-retry-failed-transaction-due-to-other_sql_.patchDownload
From d9d363c7c298f44063a2aa33530622548ee45cbf Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <ikedarintaro@ikedarintarous-MacBook-Air.local>
Date: Sun, 8 Jun 2025 23:40:32 +0900
Subject: [PATCH 2/2] 1. Do not retry failed transaction due to
other_sql_failures. 2. modify documentation and comments. 3. add test.
---
doc/src/sgml/ref/pgbench.sgml | 3 +++
src/bin/pgbench/pgbench.c | 26 +++++++++++++++-----
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 +++++++++++++++++
3 files changed, 45 insertions(+), 6 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index dcb8c1c487c..2086dd59cb3 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -923,6 +923,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
serialization failure. This allows all clients specified with -c option
to continuously apply load to the server, even if some transactions fail.
</para>
+ <para>
+ Note that this option can not be used together with
+ <option>--exit-on-abort</option>.
</listitem>
</varlistentry>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 5db222f2c1e..2333110c29f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -287,6 +287,9 @@ static int main_pid; /* main process id used in log filename */
/*
* We cannot retry a transaction after the serialization/deadlock error if its
* number of tries reaches this maximum; if its value is zero, it is not used.
+ * We can ignore errors including serialization/deadlock errors and other errors
+ * if --continue-on-error is set, but in this case the failed transaction is not
+ * retried.
*/
static uint32 max_tries = 1;
@@ -402,7 +405,8 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried transactions.
+ * A failed transaction is defined as unsuccessfully retried transactions
+ * unless continue-on-error option is specified.
* It can be one of two types:
*
* failed (the number of failed transactions) =
@@ -411,6 +415,11 @@ typedef struct StatsData
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried).
*
+ * When continue-on-error option is specified,
+ * failed (the number of failed transactions) =
+ * 'other_sql_failures' (they got a error when continue-on-error option
+ * was specified).
+ *
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
*
@@ -960,7 +969,7 @@ usage(void)
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
" --continue-on-error\n"
- " Continue and retry transactions that failed due to errors other than serialization or deadlocks.\n"
+ " continue to process transactions after a trasaction fails due to errors other than serialization or deadlocks.\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -3258,8 +3267,7 @@ static bool
canRetryError(EStatus estatus)
{
return (estatus == ESTATUS_SERIALIZATION_ERROR ||
- estatus == ESTATUS_DEADLOCK_ERROR ||
- (continue_on_error && estatus == ESTATUS_OTHER_SQL_ERROR));
+ estatus == ESTATUS_DEADLOCK_ERROR);
}
/*
@@ -4019,7 +4027,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if (canRetryError(st->estatus) | continue_on_error)
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4111,6 +4119,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
* can retry the error.
*/
st->state = timer_exceeded ? CSTATE_FINISHED :
+ continue_on_error ? CSTATE_FAILURE :
doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
}
else
@@ -6446,7 +6455,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+ * to others than serialization or deadlock errors and --continue-on-error
+ * is not set.
*/
if (total_cnt <= 0)
return;
@@ -7086,6 +7096,7 @@ main(int argc, char **argv)
pg_logging_increase_verbosity();
break;
case 18: /* continue-on-error */
+ benchmarking_option_set = true;
continue_on_error = true;
break;
default:
@@ -7242,6 +7253,9 @@ main(int argc, char **argv)
pg_fatal("an unlimited number of transaction tries can only be used with --latency-limit or a duration (-T)");
}
+ if (exit_on_abort && continue_on_error)
+ pg_fatal("--exit-on-abort and --continue-on-error are mutually exclusive options");
+
/*
* save main process id in the global variable because process id will be
* changed after fork.
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7dd78940300..afb49b554d0 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1813,6 +1813,28 @@ update counter set i = i+1 returning i \gset
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique); ' . 'INSERT INTO unique_table VALUES (0);');
+
+$node->pgbench(
+ '-t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 0/10\b},
+ qr{other failures: 10\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '002_continue_on_error' => q{
+ insert into unique_table values (0);
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.39.5 (Apple Git-154)
Dear Ikeda-san,
Thanks for updating the patch!
1. I should've also set benchmarking_option_set. I've modified it accordingly.
Confirmed it has been fixed. Thanks.
2. The exit-on-abort option and continue-on-error option are mutually exclusive.
Therefore, I've updated the patch to throw a FATAL error when two options are
set simultaneously. Corresponding explanation was also added.
(I'm wondering whether the parameter should be named continue-on-abort so that users
understand the two options are mutually exclusive.)
Make sense, +1.
Here are new comments.
01. build failure
According to the cfbot [1], the documentation cannot be built. IIUC a </para> seems
to be missing here:
```
+ <para>
+ Note that this option can not be used together with
+ <option>--exit-on-abort</option>.
+ </listitem>
+ </varlistentry>
```
02. patch separation
How about separating the patch series like:
0001 - contains option handling and retry part, and documentation
0002 - contains accumulation/reporting part
0003 - contains tests.
I hope above style is more helpful for reviewers.
03. documentation
```
+ Note that this option can not be used together with
+ <option>--exit-on-abort</option>.
```
I feel we should add a similar description in the `exit-on-abort` part.
04. documentation
```
+ Client rolls back the failed transaction and starts a new one when its
+ transaction fails due to the reason other than the deadlock and
+ serialization failure. This allows all clients specified with -c option
+ to continuously apply load to the server, even if some transactions fail.
```
I feel the description contains a somewhat redundant part and misses the default behavior.
How about:
```
<para>
Clients survive when their transactions are aborted, and they continue
their run. Without the option, clients exit when transactions they run
are aborted.
</para>
<para>
Note that serialization failures or deadlock failures do not abort the
client, so they are not affected by this option.
See <xref linkend="failures-and-retries"/> for more information.
</para>
```
05. StatsData
```
+ * When continue-on-error option is specified,
+ * failed (the number of failed transactions) =
+ * 'other_sql_failures' (they got a error when continue-on-error option
+ * was specified).
```
Let me confirm one point; can serialization_failures and deadlock_failures be
counted when continue-on-error is true? If so, the comment seems incorrect to me.
The formula should be 'serialization_failures' + 'deadlock_failures' +
'other_sql_failures' in that case.
06. StatsData
Another point; can other_sql_failures be counted when continue-on-error is NOT
specified? I feel it should be...
06. usage()
Added line is too long. According to program_help_ok(), the help output should
be less than 80 columns.
07.
Please run pgindent/pgperltidy, I got some diffs.
[1]: https://cirrus-ci.com/task/5210061275922432
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Mon, 9 Jun 2025 09:34:03 +0000
"Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote:
2. The exit-on-abort option and continue-on-error option are mutually exclusive.
Therefore, I've updated the patch to throw a FATAL error when two options are
set simultaneously. Corresponding explanation was also added.
I don't think that's right since "abort" and "error" are different concepts in pgbench.
(Here, "abort" refers to the termination of a client, not a transaction abort.)
The --exit-on-abort option forces pgbench to exit immediately when any client is aborted
due to some error. When the --continue-on-error option is not set, SQL errors other than
deadlock or serialization error cause a client to be aborted. On the other hand, when the option
is set, clients are not aborted due to any SQL errors; instead they continue to run after them.
However, clients can still be aborted for other reasons, such as connection failures or
meta-command errors (e.g., \set x 1/0). In these cases, the --exit-on-abort option remains
useful even when --continue-on-error is enabled.
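As an illustration, consider a hypothetical two-line script (not from the patch; unique_table
is the table used in the TAP test below). The \set line fails inside pgbench itself, which is
a meta-command error that aborts the client even with --continue-on-error, while the INSERT
can only fail at the SQL level and would be survived:
```
-- hypothetical sketch: the meta-command below fails in pgbench itself
\set x 1/0
-- an SQL-level failure here would be survived under --continue-on-error
INSERT INTO unique_table VALUES (:x);
```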
(I'm wondering whether the parameter should be named continue-on-abort so that users
understand the two options are mutually exclusive.)
For the same reason as above, I believe --continue-on-error is a more accurate description
of the option's behavior.
02. patch separation
How about separating the patch series like:
0001 - contains option handling and retry part, and documentation
0002 - contains accumulation/reporting part
0003 - contains tests.
I hope above style is more helpful for reviewers.
I'm not sure whether it's necessary to split the patch, as the change doesn't seem very
complex. However, the current separation appears inconsistent. For example, patch 0001
modifies canRetryError(), but patch 0002 reverts that change, and so on.
04. documentation
```
+ Client rolls back the failed transaction and starts a new one when its
+ transaction fails due to the reason other than the deadlock and
+ serialization failure. This allows all clients specified with -c option
+ to continuously apply load to the server, even if some transactions fail.
```
I feel the description contains a somewhat redundant part and misses the default behavior.
How about:
```
<para>
Clients survive when their transactions are aborted, and they continue
their run. Without the option, clients exit when transactions they run
are aborted.
</para>
<para>
Note that serialization failures or deadlock failures do not abort the
client, so they are not affected by this option.
See <xref linkend="failures-and-retries"/> for more information.
</para>
```
I think we can make it clearer as follows:
Allows clients to continue their run even if an SQL statement fails due to errors other
than serialization or deadlock. Without this option, the client is aborted after
such errors.
Note that serialization and deadlock failures never cause the client to be aborted,
so they are not affected by this option. See <xref linkend="failures-and-retries"/>
for more information.
That said, a review by a native English speaker would still be appreciated.
Also, we would need to update several parts of the documentation. For example, the
"Failures and Serialization/Deadlock Retries" section should be revised to describe the
behavior change. In addition, we should update the explanations of output result examples
and logging, the description of the --failures-detailed option, and so on.
If transactions are not retried after SQL errors other than serialization or deadlock,
this should also be explicitly documented.
05. StatsData
```
+ * When continue-on-error option is specified,
+ * failed (the number of failed transactions) =
+ * 'other_sql_failures' (they got a error when continue-on-error option
+ * was specified).
```
Let me confirm one point; can serialization_failures and deadlock_failures be
counted when continue-on-error is true? If so, the comment seems incorrect to me.
The formula should be 'serialization_failures' + 'deadlock_failures' +
'other_sql_failures' in that case.
+1
06. StatsData
Another point; can other_sql_failures be counted when continue-on-error is NOT
specified? I feel it should be...
We could do that. However, if an SQL error other than a serialization or deadlock error
occurs when --continue-on-error is not set, pgbench will be aborted midway and the printed
results will be incomplete. Therefore, this might not make much sense.
06. usage()
Added line is too long. According to program_help_ok(), the help output should
be less than 80 columns.
+1
Here are additional comments from me.
@@ -4548,6 +4570,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "error (except serialization/deadlock)";
Strings returned by getResultString() are printed in the "time" field of the
log when both the -l and --failures-detailed options are set. Therefore, they
should be single words that do not contain any space characters. I wonder if
something like "other" or "other_sql_error" would be appropriate.
@@ -4099,6 +4119,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
* can retry the error.
*/
st->state = timer_exceeded ? CSTATE_FINISHED :
+ continue_on_error ? CSTATE_FAILURE :
doRetry(st, &now) ? CSTATE_RETRY : CSTATE_FAILURE;
}
else
This fix is not necessary because doRetry() (and canRetryError(), which is called
within it) will return false when continue_on_error is set (after applying patch 0002).
case PGRES_NONFATAL_ERROR:
case PGRES_FATAL_ERROR:
st->estatus = getSQLErrorStatus(PQresultErrorField(res,
PG_DIAG_SQLSTATE));
if (canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st, PQerrorMessage(st->con));
goto error;
}
/* fall through */
default:
/* anything else is unexpected */
pg_log_error("client %d script %d aborted in command %d query %d: %s",
st->id, st->use_file, st->command, qrynum,
PQerrorMessage(st->con));
goto error;
}
When an SQL error other than a serialization or deadlock error occurs, an error message is
output via pg_log_error in this code path. However, I think this should be reported only
when verbose_errors is set, similar to how serialization and deadlock errors are handled when
--continue-on-error is enabled.
Best regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Dear Nagata-san,
2. The exit-on-abort option and continue-on-error option are mutually exclusive.
Therefore, I've updated the patch to throw a FATAL error when two options are
set simultaneously. Corresponding explanation was also added.
I don't think that's right since "abort" and "error" are different concepts in pgbench.
(Here, "abort" refers to the termination of a client, not a transaction abort.)
The --exit-on-abort option forces pgbench to exit immediately when any client is aborted
due to some error. When the --continue-on-error option is not set, SQL errors other than
deadlock or serialization error cause a client to be aborted. On the other hand, when the
option is set, clients are not aborted due to any SQL errors; instead they continue to run
after them.
However, clients can still be aborted for other reasons, such as connection failures or
meta-command errors (e.g., \set x 1/0). In these cases, the --exit-on-abort option remains
useful even when --continue-on-error is enabled.
To clarify: another approach is to allow the --continue-on-error option to keep clients
running even when they hit such errors. Which one is better?
02. patch separation
How about separating the patch series like:
0001 - contains option handling and retry part, and documentation
0002 - contains accumulation/reporting part
0003 - contains tests.
I hope above style is more helpful for reviewers.
I'm not sure whether it's necessary to split the patch, as the change doesn't seem very
complex. However, the current separation appears inconsistent. For example, patch 0001
modifies canRetryError(), but patch 0002 reverts that change, and so on.
Either way is fine for me if they are changed from the current method.
04. documentation
```
+ Client rolls back the failed transaction and starts a new one when its
+ transaction fails due to the reason other than the deadlock and
+ serialization failure. This allows all clients specified with -c option
+ to continuously apply load to the server, even if some transactions fail.
```
I feel the description contains a somewhat redundant part and misses the default
behavior.
How about:
```
<para>
Clients survive when their transactions are aborted, and they continue
their run. Without the option, clients exit when transactions they run
are aborted.
</para>
<para>
Note that serialization failures or deadlock failures do not abort the
client, so they are not affected by this option.
See <xref linkend="failures-and-retries"/> for more information.
</para>
```
I think we can make it clearer as follows:
I am not confident in my English, so a native speaker's review is needed....
06. usage()
Added line is too long. According to program_help_ok(), the help output should
be less than 80 columns.
+1
FYI - I posted a patch which adds the test. You can apply it and confirm how it behaves.
[1]: /messages/by-id/OSCPR01MB1496610451F5896375B2562E6F56BA@OSCPR01MB14966.jpnprd01.prod.outlook.com
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Tue, 17 Jun 2025 03:47:00 +0000
"Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote:
Dear Nagata-san,
2. The exit-on-abort option and continue-on-error option are mutually exclusive.
Therefore, I've updated the patch to throw a FATAL error when two options are
set simultaneously. Corresponding explanation was also added.
I don't think that's right since "abort" and "error" are different concepts in pgbench.
(Here, "abort" refers to the termination of a client, not a transaction abort.)
The --exit-on-abort option forces pgbench to exit immediately when any client is aborted
due to some error. When the --continue-on-error option is not set, SQL errors other than
deadlock or serialization error cause a client to be aborted. On the other hand, when the
option is set, clients are not aborted due to any SQL errors; instead they continue to run
after them.
However, clients can still be aborted for other reasons, such as connection failures or
meta-command errors (e.g., \set x 1/0). In these cases, the --exit-on-abort option remains
useful even when --continue-on-error is enabled.
To clarify: another approach is to allow the --continue-on-error option to keep clients
running even when they hit such errors. Which one is better?
It might be worth discussing which types of errors this option should allow pgbench
to continue after. As I understand it, the current patch's goal is to allow only SQL-level
errors like constraint violations. It seems good because this could simulate the
behavior of applications that ignore or retry such errors (although they are not
retried in the current patch). Perhaps it makes sense to allow continuing after
some network errors because it would enable benchmarks using a cluster system or a
cloud service that could report a temporary error during a failover.
It might be worth discussing which types of errors this option should allow pgbench to
continue after.
As I understand it, the current patch aims to allow continuation only after SQL-level
errors, such as constraint violations. That seems reasonable, as it can simulate the
behavior of applications that ignore or retry such errors (even though retries are not
implemented in the current patch).
Perhaps it also makes sense to allow continuation after certain network errors, as this
would enable benchmarking with cluster systems or cloud services, which might report
temporary errors during a failover. We would need additional work to properly detect
and handle network errors, though.
However, I'm not sure it's reasonable to allow continuation after other types of errors,
such as misuse of meta-commands or unexpected errors during their execution, since these
wouldn't simulate any real application behavior and would more likely indicate a failure
in the benchmarking process itself.
Best regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Tue, 17 Jun 2025 16:28:28 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
On Tue, 17 Jun 2025 03:47:00 +0000
"Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote:Dear Nagata-san,
2. The exit-on-abort option and continue-on-error option are mutually
exclusive.
Therefore, I've updated the patch to throw a FATAL error when two options
are
set simultaneously. Corresponding explanation was also added.
I don't think that's right since "abort" and "error" are different concept in pgbench.
(Here, "abort" refers to the termination of a client, not a transaction abort.)The --exit-on-abort option forces to exit pgbench immediately when any client is
aborted
due to some error. When the --continue-on-error option is not set, SQL errors
other than
deadlock or serialization error cause a client to be aborted. On the other hand,
when the option
is set, clients are not aborted due to any SQL errors; instead they continue to run
after them.
However, clients can still be aborted for other reasons, such as connection
failures or
meta-command errors (e.g., \set x 1/0). In these cases, the --exit-on-abort option
remains
useful even when --continue-on-error is enabled.To clarify: another approach is that allow --continue-on-error option to continue
running even when clients meet such errors. Which one is better?It might be worth discussing which types of errors this option should allow pgbench
to continue after. On my understand the current patch's goal is to allow only SQL
level errors like comstraint violations. It seems good because this could simulate
behaviour of applications that ignore or retry such errors (although they are not
retried in the current patch). Perhaps, it makes sense to allow to continue after
some network errors because it would enable benchmarks usign a cluster system as a
cloud service that could report a temporary error during a failover.
I apologize for accidentally leaving the draft paragraph just above in my previous post.
Please ignore it.
It might be worth discussing which types of errors this option should allow pgbench to
continue after.
As I understand it, the current patch aims to allow continuation only after SQL-level
errors, such as constraint violations. That seems reasonable, as it can simulate the
behavior of applications that ignore or retry such errors (even though retries are not
implemented in the current patch).
Perhaps it also makes sense to allow continuation after certain network errors, as this
would enable benchmarking with cluster systems or cloud services, which might report
temporary errors during a failover. We would need additional work to properly detect
and handle network errors, though.
However, I'm not sure it's reasonable to allow continuation after other types of errors,
such as misuse of meta-commands or unexpected errors during their execution, since these
wouldn't simulate any real application behavior and would more likely indicate a failure
in the benchmarking process itself.
Best regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
--
Yugo Nagata <nagata@sraoss.co.jp>
Dear Nagata-san,
As I understand it, the current patch aims to allow continuation only after SQL-level
errors, such as constraint violations. That seems reasonable, as it can simulate the
behavior of applications that ignore or retry such errors (even though retries are not
implemented in the current patch).
Yes, no one has objections to retry in this case. This is a main part of the proposal.
However, I'm not sure it's reasonable to allow continuation after other types of errors,
such as misuse of meta-commands or unexpected errors during their execution, since these
wouldn't simulate any real application behavior and would more likely indicate a failure
in the benchmarking process itself.
I have a concern about the \gset meta-command.
According to the docs and source code, \gset assumes that the executed command returns
exactly one tuple:
```
if (meta == META_GSET && ntuples != 1)
{
/* under \gset, report the error */
pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
st->id, st->use_file, st->command, qrynum, PQntuples(res));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
```
But sometimes the SQL may return no tuples, or multiple ones, due to concurrent
transactions. I feel retrying the transaction would be very useful in this case.
Anyway, we must confirm the opinion from the proposer.
[1]: https://github.com/ryogrid/tpcc_like_with_pgbench
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Thu, 26 Jun 2025 05:45:12 +0000
"Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote:
Dear Nagata-san,
As I understand it, the current patch aims to allow continuation only after SQL-level
errors, such as constraint violations. That seems reasonable, as it can simulate the
behavior of applications that ignore or retry such errors (even though retries are not
implemented in the current patch).
Yes, no one has objections to retry in this case. This is a main part of the proposal.
As I understand it, the proposed --continue-on-error option does not retry the transaction
in any case; it simply gives up on the transaction. That is, when an SQL-level error occurs,
the transaction is reported as "failed" rather than "retried", and the random state is discarded.
However, I'm not sure it's reasonable to allow continuation after other types of errors,
such as misuse of meta-commands or unexpected errors during their execution, since these
wouldn't simulate any real application behavior and would more likely indicate a failure
in the benchmarking process itself.
I have a concern about the \gset meta-command.
According to the docs and source code, \gset assumes that the executed command returns
exactly one tuple:
```
if (meta == META_GSET && ntuples != 1)
{
/* under \gset, report the error */
pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
st->id, st->use_file, st->command, qrynum, PQntuples(res));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
```
But sometimes the SQL may return no tuples, or multiple ones, due to concurrent
transactions. I feel retrying the transaction would be very useful in this case.
You can use the \aset command instead to avoid the pgbench error. If the query doesn't
return any row, a subsequent SQL command trying to use the variable will fail, but this
failure would be ignored without terminating the benchmark when the --continue-on-error
option is enabled.
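A minimal sketch of that workaround (hypothetical script; unique_table is the table from
the TAP test): \gset would abort the client if the SELECT returned zero rows, whereas
\aset just leaves the variable unset, so the next command fails with an ordinary SQL error
that --continue-on-error survives:
```
-- \aset assigns :i only if a row is returned; zero rows is not an error
SELECT i FROM unique_table WHERE i = 0 \aset
-- if :i was never set, this fails as a plain SQL error, not a client abort
UPDATE unique_table SET i = i + 1 WHERE i = :i;
```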
Anyway, we must confirm the opinion from the proposer.
+1
Best regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Hi,
Thank you very much for your valuable comments and kind advice. I'm
currently working on revising the previous patch based on the feedback
received. I would like to share my thoughts regarding the conditions
under which the --continue-on-error option should initiate a new
transaction or a new connection.
In my opinion, when the --continue-on-error option is enabled, pgbench
clients do not need to start new transactions after network errors or
other errors except for SQL-level errors.
Network errors are relatively rare, except in failover scenarios.
Outside of failover, any network issues should be resolved rather than
worked around. In the context of failover, the key metric is not TPS,
but system downtime. While one might infer the timing of a failover by
observing the output of the --progress option, you can easily determine the
downtime by executing a simple SQL query such as `psql -c 'SELECT 1'` every
second.
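For example, a rough availability probe along those lines (a sketch, not part of the
patch) could be run alongside the benchmark:
```
% while :; do psql -qAtc 'SELECT 1' postgres >/dev/null 2>&1 || date '+%H:%M:%S down'; sleep 1; done
```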
On 2025/06/26 18:47, Yugo Nagata wrote:
On Thu, 26 Jun 2025 05:45:12 +0000
"Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote:Dear Nagata-san,
As I understand it, the current patch aims to allow continuation only
after
SQL-level
errors, such as constraint violations. That seems reasonable, as it
can simulate
the
behavior of applications that ignore or retry such errors (even
though retries are
not
implemented in the current patch).Yes, no one has objections to retry in this case. This is a main part
of the proposal.,As I understand it, the proposed --continue-on-error option does not
retry the transaction
in any case; it simply gives up on the transaction. That is, when an
SQL-level error occurs,
the transaction is reported as "failed" rather than "retried", and the
random state is discarded.
Retrying the failed transaction is not necessary when the transaction
failed due to SQL-level errors. Unlike real-world applications, pgbench
does not need to complete a specific transaction successfully. In the case
of unique constraint violations, retrying the same transaction will
likely result in the same error again.
I want to hear your thoughts on this.
Best regards,
Rintaro Ikeda
On Fri, 27 Jun 2025 14:06:24 +0900
ikedarintarof <ikedarintarof@oss.nttdata.com> wrote:
Hi,
Thank you very much for your valuable comments and kind advice. I'm
currently working on revising the previous patch based on the feedback
received. I would like to share my thoughts regarding the conditions
under which the --continue-on-error option should initiate a new
transaction or a new connection.
In my opinion, when the --continue-on-error option is enabled, pgbench
clients do not need to start new transactions after network errors or
other errors except for SQL-level errors.
+1
I agree that --continue-on-error prevents pgbench from terminating only when
SQL-level errors occur, and does not change the behavior in the case of other
types of errors, including network errors.
As I understand it, the proposed --continue-on-error option does not retry the transaction
in any case; it simply gives up on the transaction. That is, when an SQL-level error occurs,
the transaction is reported as "failed" rather than "retried", and the random state is
discarded.
Retrying the failed transaction is not necessary when the transaction
failed due to SQL-level errors. Unlike real-world applications, pgbench
does not need to complete a specific transaction successfully. In the case
of unique constraint violations, retrying the same transaction will
likely result in the same error again.
Agreed.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Dear Nagata-san, Ikeda-san,
In my opinion, when the --continue-on-error option is enabled, pgbench
clients do not need to start new transactions after network errors or
other errors except for SQL-level errors.
+1
I agree that --continue-on-error prevents pgbench from terminating only when
SQL-level errors occur, and does not change the behavior in the case of other
types of errors, including network errors.
OK, so let's do it like that.
BTW, initially we were discussing the combination of --continue-on-error and
--exit-on-abort. What is the conclusion?
I feel Nagata-san's point [1] is valid in this approach.
As I understand it, the proposed --continue-on-error option does not retry the transaction
in any case; it simply gives up on the transaction. That is, when an SQL-level error occurs,
the transaction is reported as "failed" rather than "retried", and the random state is
discarded.
Retrying the failed transaction is not necessary when the transaction
failed due to SQL-level errors. Unlike real-world applications, pgbench
does not need to complete a specific transaction successfully. In the case
of unique constraint violations, retrying the same transaction will
likely result in the same error again.
I intended here that clients could throw away the failed transaction and start
a new one in that case. I hope we are on the same page...
[1]: /messages/by-id/20250614002453.5c72f2ec80864d840150a642@sraoss.co.jp
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Fri, 27 Jun 2025 10:59:09 +0000
"Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote:
Retrying the failed transaction is not necessary when the transaction
failed due to SQL-level errors. Unlike real-world applications, pgbench
does not need to complete a specific transaction successfully. In the case
of unique constraint violations, retrying the same transaction will
likely result in the same error again.
I intended here that clients could throw away the failed transaction and start
a new one in that case. I hope we are on the same page...
Could I confirm what you mean by "start a new one"?
In the current pgbench, when a query raises an error (a deadlock or
serialization failure), it can be retried using the same random state.
This typically means the query will be retried with the same parameter values.
On the other hand, when the query ultimately fails (possibly after some retries),
the transaction is marked as a "failure", and the next transaction starts with a
new random state (i.e., with new parameter values).
Therefore, if a query fails due to a unique constraint violation and is retried
with the same parameters, it will keep failing on each retry.
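To illustrate with a hypothetical script (not from the patch): on a retry pgbench restores
the saved random state, so the same :v is replayed and the INSERT below keeps colliding,
whereas a new transaction draws a fresh :v:
```
\set v random(1, 50000)
-- a retry replays the same :v; a new transaction draws a new one
INSERT INTO unique_table VALUES (:v);
```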
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Hi,
I've updated the previous patch based on your feedback. Below is a summary of
the changes from v4 to v5:
1. (v5-0001) Added documentation and removed some code paths in response to the
comments.
2. (v5-0001) Modified the condition to transition from CSTATE_WAIT_RESULT to
CSTATE_ERROR, instead of adding a condition in canRetryError(), which had
enabled clients to continue after their transactions failed. This is because, when the
--continue-on-error option is set, clients do not retry failed transactions but
start new ones.
3. (v5-0002) Renamed the enumerator TSTATUS_OTHER_ERROR, which could be
mistakenly interpreted as being related to other SQL errors. It represents an
unknown transaction status, so it has been renamed to TSTATUS_UNKNOWN_ERROR.
On 2025/06/14 0:24, Yugo Nagata wrote:
case PGRES_NONFATAL_ERROR:
case PGRES_FATAL_ERROR:
st->estatus = getSQLErrorStatus(PQresultErrorField(res,
PG_DIAG_SQLSTATE));
if (canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st, PQerrorMessage(st->con));
goto error;
}
/* fall through */
default:
/* anything else is unexpected */
pg_log_error("client %d script %d aborted in command %d query %d: %s",
st->id, st->use_file, st->command, qrynum,
PQerrorMessage(st->con));
goto error;
}
When an SQL error other than a serialization or deadlock error occurs, an error message is
output via pg_log_error in this code path. However, I think this should be reported only
when verbose_errors is set, similar to how serialization and deadlock errors are handled
when --continue-on-error is enabled.
I think the error message logged via pg_log_error is useful when verbose_errors
is not specified, because it informs users that the client has exited. Without
it, users may not notice that something went wrong.
On 2025/06/27 19:59, Hayato Kuroda (Fujitsu) wrote:
BTW, initially we were discussing the combination of --continue-on-error and
--exit-on-abort. What is the conclusion?
I feel Nagata-san's point [1] is valid in this approach.
I agree with the conclusion. I've removed the code path that prohibited using
--continue-on-error and --exit-on-abort options together.
On 2025/06/30 15:02, Yugo Nagata wrote:
On Fri, 27 Jun 2025 10:59:09 +0000
"Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote:Retrying the failed transaction is not necessary when the transaction
failed due to SQL-level errors. Unlike real-world applications, pgbench
does not need to complete specific transaction successfully. In the case
of unique constraint violations, retrying the same transaction will
likely to result in the same error again.I intended here that clients could throw away the failed transaction and start
new one again in the case. I hope we are on the same page...Could I confirm what you mean by "start new one"?
In the current pgbench, when a query raises an error (a deadlock or
serialization failure), it can be retried using the same random state.
This typically means the query will be retried with the same parameter values.
On the other hand, when the query ultimately fails (possibly after some retries),
the transaction is marked as a "failure", and the next transaction starts with a
new random state (i.e., with new parameter values).
Therefore, if a query fails due to a unique constraint violation and is retried
with the same parameters, it will keep failing on each retry.
Thank you for your explanation. I understand it as you described. I've also
attached a schematic diagram of the state machine. I hope it will help clarify
the behavior of pgbench. Red arrows represent the state transitions taken when an SQL
command fails and the --continue-on-error option is specified.
Best Regards,
Rintaro Ikeda
Attachments:
v5-0001-Add-continue-on-error-option-to-pgbench.patchtext/plain; charset=UTF-8; name=v5-0001-Add-continue-on-error-option-to-pgbench.patchDownload
From e9b8d4579c4adf0582f739327aaa3b9877311633 Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <ikedarintarof@oss.nttdata.com>
Date: Tue, 1 Jul 2025 14:18:44 +0900
Subject: [PATCH v5 1/2] When the option is set, client rolls back the failed
transaction and starts a new one when its transaction fails due to the reason
other than the deadlock and serialization failure.
---
doc/src/sgml/ref/pgbench.sgml | 70 +++++++++++++++-----
src/bin/pgbench/pgbench.c | 51 ++++++++++++--
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 ++++++
3 files changed, 121 insertions(+), 22 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..cc5ab173f2f 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -77,8 +77,8 @@ tps = 896.967014 (without initial connection time)
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ serialization or deadlock errors by default (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -790,6 +790,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -914,6 +917,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Allows clients to continue their run even if an SQL statement fails due to
+ errors other than serialization or deadlock. Unlike serialization and deadlock
+ failures, clients do not retry the same transactions but start new transaction.
+ This option is useful when your custom script may raise errors due to some
+ reason like unique constraints violation. Without this option, the client is
+ aborted after such errors.
+ </para>
+ <para>
+ Note that serialization and deadlock failures never cause the client to be
+ aborted even after clients retries <option>--max-tries</option> times by
+ default, so they are not affected by this option.
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
@@ -2409,8 +2432,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2638,6 +2661,16 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got a SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2646,8 +2679,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
</screen>
</para>
@@ -2839,9 +2872,11 @@ statement latencies in milliseconds, failures and retries:
<option>--exit-on-abort</option> is specified. Otherwise in the worst
case they only lead to the abortion of the failed client while other
clients continue their run (but some client errors are handled without
- an abortion of the client and reported separately, see below). Later in
- this section it is assumed that the discussed errors are only the
- direct client errors and they are not internal
+ an abortion of the client and reported separately, see below). When
+ <option>--continue-on-error</option> is specified, the client
+ continues to process new transactions even if it encounters an error.
+ Later in this section it is assumed that the discussed errors are only
+ the direct client errors and they are not internal
<application>pgbench</application> errors.
</para>
</listitem>
@@ -2853,12 +2888,14 @@ statement latencies in milliseconds, failures and retries:
connection with the database server was lost or the end of script was reached
without completing the last transaction. In addition, if execution of an SQL
or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
- transaction is rolled back, which also includes setting the client variables
- as they were before the run of this transaction (it is assumed that one
- transaction script contains only one transaction; see
- <xref linkend="transactions-and-scripts"/> for more information).
+ the client is aborted by default. However, if the --continue-on-error option
+ is specified, the client does not abort and proceeds to the next transaction
+ regardless of the error. This case is reported as other failures in the output.
+ Otherwise, if an SQL command fails with serialization or deadlock errors, the
+ client is not aborted. In such cases, the current transaction is rolled back,
+ which also includes setting the client variables as they were before the run
+ of this transaction (it is assumed that one transaction script contains only
+ one transaction; see <xref linkend="transactions-and-scripts"/> for more information).
Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum
number of tries (specified by the <option>--max-tries</option> option) / the maximum
@@ -2898,7 +2935,8 @@ statement latencies in milliseconds, failures and retries:
<para>
The main report contains the number of failed transactions. If the
- <option>--max-tries</option> option is not equal to 1, the main report also
+ <option>--max-tries</option> option is not equal to 1 and
+ <option>--continue-on-error</option> is not specified, the main report also
contains statistics related to retries: the total number of retried
transactions and total number of retries. The per-script report inherits all
these fields from the main report. The per-statement report displays retry
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 497a936c141..15207290811 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,7 +402,8 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried transactions.
+ * A failed transaction is defined as unsuccessfully retried transactions
+ * unless continue-on-error option is specified.
* It can be one of two types:
*
* failed (the number of failed transactions) =
@@ -411,6 +412,12 @@ typedef struct StatsData
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried).
*
+ * When continue-on-error option is specified,
+ * failed (the number of failed transactions) =
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got a error when continue-on-error option
+ * was specified).
+ *
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
*
@@ -440,6 +447,11 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure , which
+ * is enabled if --continue-on-error is
+ * used */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +782,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +967,7 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --continue-on-error continue processing transactions after a trasaction fails\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1481,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1531,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4007,7 +4025,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if (continue_on_error | canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4528,7 +4546,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4548,6 +4567,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4603,6 +4624,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4643,10 +4665,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6285,6 +6309,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6427,7 +6452,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+ * to other than serialization or deadlock errors and --continue-on-error
+ * is not set.
*/
if (total_cnt <= 0)
return;
@@ -6443,6 +6469,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6546,6 +6575,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6705,6 +6738,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7058,6 +7092,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7413,6 +7451,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7dd78940300..afb49b554d0 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1813,6 +1813,28 @@ update counter set i = i+1 returning i \gset
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique); ' . 'INSERT INTO unique_table VALUES (0);');
+
+$node->pgbench(
+ '-t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 0/10\b},
+ qr{other failures: 10\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '002_continue_on_error' => q{
+ insert into unique_table values (0);
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.39.5 (Apple Git-154)
v5-0002-Rename-confusing-enumerator.patchtext/plain; charset=UTF-8; name=v5-0002-Rename-confusing-enumerator.patchDownload
From 690a4ec636eae6fcaf171abb2480e29f07cc0a88 Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <ikedarintarof@oss.nttdata.com>
Date: Tue, 1 Jul 2025 14:28:04 +0900
Subject: [PATCH v5 2/2] Rename the confusing enumerator which may be
mistakenly assumed to be related to other_sql_errors
---
src/bin/pgbench/pgbench.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 15207290811..3435a8894b1 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -484,7 +484,7 @@ typedef enum TStatus
TSTATUS_IDLE,
TSTATUS_IN_BLOCK,
TSTATUS_CONN_ERROR,
- TSTATUS_OTHER_ERROR,
+ TSTATUS_UNKNOWN_ERROR,
} TStatus;
/* Various random sequences are initialized from this one. */
@@ -3576,12 +3576,12 @@ getTransactionStatus(PGconn *con)
* not. Internal error which should never occur.
*/
pg_log_error("unexpected transaction status %d", tx_status);
- return TSTATUS_OTHER_ERROR;
+ return TSTATUS_UNKNOWN_ERROR;
}
/* not reached */
Assert(false);
- return TSTATUS_OTHER_ERROR;
+ return TSTATUS_UNKNOWN_ERROR;
}
/*
--
2.39.5 (Apple Git-154)
Dear Ikeda-san, Nagata-san,
Thanks for updating the patch!
Could I confirm what you mean by "start a new one"?
In the current pgbench, when a query raises an error (a deadlock or
serialization failure), it can be retried using the same random state.
This typically means the query will be retried with the same parameter values.
On the other hand, when the query ultimately fails (possibly after some retries),
the transaction is marked as a "failure", and the next transaction starts with a
new random state (i.e., with new parameter values).
Therefore, if a query fails due to a unique constraint violation and is retried
with the same parameters, it will keep failing on each retry.
Thank you for your explanation. I understand it as you described. I've also
attached a schematic diagram of the state machine. I hope it will help clarify
the behavior of pgbench. Red arrows represent the state transitions taken when an SQL
command fails and the --continue-on-error option is specified.
Thanks for the diagram, it's quite helpful. Let me share my understanding and opinion.
The terminology "retry" is being used for the transition CSTATE_ERROR->CSTATE_RETRY,
and here the random state would be restored to be the begining:
```
/*
* Reset the random state as they were at the beginning of the
* transaction.
*/
st->cs_func_rs = st->random_state;
```
In the --continue-on-error case, the transition CSTATE_WAIT_RESULT->CSTATE_ERROR
can happen even when the reason for the failure is not serialization or deadlock.
Ultimately the path will reach ...->CSTATE_END_TX->CSTATE_CHOOSE_SCRIPT, the
beginning of the state machine. cs_func_rs is not overwritten along this route, so
a different random value will be generated, or even another script may be
chosen. Is that correct?
And I feel this behavior is OK. The most likely failure here is a unique constraint
violation. Clients should roll the dice again; otherwise they would face the same
error again.
Below are my comments for the latest patch.
01.
```
$ git am ../patches/pgbench/v5-0001-Add-continue-on-error-option-to-pgbench.patch
Applying: When the option is set, client rolls back the failed transaction and...
.git/rebase-apply/patch:65: trailing whitespace.
<literal>serialization</literal>, <literal>deadlock</literal>, or
.git/rebase-apply/patch:139: trailing whitespace.
<option>--max-tries</option> option is not equal to 1 and
warning: 2 lines add whitespace errors.
```
I got warnings when I applied the patch. Please fix it.
02.
```
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got a error when continue-on-error option
```
The first line has a tab, but it should be a normal blank.
03.
```
+ else if (continue_on_error | canRetryError(st->estatus))
```
I feel "|" should be "||".
04.
```
<term><replaceable>retries</replaceable></term>
<listitem>
<para>
number of retries after serialization or deadlock errors
(zero unless <option>--max-tries</option> is not equal to one)
</para>
</listitem>
```
To confirm: failures under --continue-on-error won't be counted here because they
are not "retries"; in other words, they do not reach CSTATE_RETRY, right?
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Hi,
On Tue, 1 Jul 2025 17:43:18 +0900
Rintaro Ikeda <ikedarintarof@oss.nttdata.com> wrote:
I've updated the previous patch based on your feedback. Below is a summary of
the changes from v4 to v5:
Thank you for updating the patch.
On 2025/06/14 0:24, Yugo Nagata wrote:
case PGRES_NONFATAL_ERROR:
case PGRES_FATAL_ERROR:
st->estatus = getSQLErrorStatus(PQresultErrorField(res,
PG_DIAG_SQLSTATE));
if (canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st, PQerrorMessage(st->con));
goto error;
}
/* fall through */
default:
/* anything else is unexpected */
pg_log_error("client %d script %d aborted in command %d query %d: %s",
st->id, st->use_file, st->command, qrynum,
PQerrorMessage(st->con));
goto error;
}
When an SQL error other than a serialization or deadlock error occurs, an error message is
output via pg_log_error in this code path. However, I think this should be reported only
when verbose_errors is set, similar to how serialization and deadlock errors are handled when
--continue-on-error is enabled.
I think the error message logged via pg_log_error is useful when verbose_errors
is not specified, because it informs users that the client has exited. Without
it, users may not notice that something went wrong.
However, if a large number of errors occur, this could result in a significant increase
in stderr output during the benchmark.
Users can still notice that something went wrong by checking the “number of other failures”
reported after the run, and I assume that in most cases, when --continue-on-error is enabled,
users aren’t particularly interested in seeing individual error messages as they happen.
It’s true that seeing error messages during the benchmark might be useful in some cases, but
the same could be said for serialization or deadlock errors, and that’s exactly what the
--verbose-errors option is for.
Here are some comments on the patch.
(1)
}
- else if (canRetryError(st->estatus))
+ else if (continue_on_error | canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
Due to this change, when --continue-on-error is enabled, st->state is set to
CSTATE_ERROR regardless of the type of error returned by readCommandResponse,
even when the error is not ESTATUS_OTHER_SQL_ERROR, e.g. ESTATUS_META_COMMAND_ERROR
due to a failure of \gset with a query returning more than one row.
Therefore, this should be something like:
else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
canRetryError(st->estatus))
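For context, a self-contained sketch of this decision logic (the EStatus names
and the serialization/deadlock-only behavior of canRetryError() follow
pgbench.c; the rest is stripped down for illustration, so treat it as an
approximation rather than the actual code):
```
#include <assert.h>
#include <stdbool.h>

/* Subset of pgbench.c's status enums, for illustration only. */
typedef enum
{
    ESTATUS_SERIALIZATION_ERROR,
    ESTATUS_DEADLOCK_ERROR,
    ESTATUS_OTHER_SQL_ERROR,
    ESTATUS_META_COMMAND_ERROR,
} EStatus;

typedef enum { CSTATE_ERROR, CSTATE_ABORTED } ConnectionStateEnum;

static bool continue_on_error = true;

/* In pgbench.c this returns true only for serialization/deadlock errors. */
static bool
canRetryError(EStatus estatus)
{
    return (estatus == ESTATUS_SERIALIZATION_ERROR ||
            estatus == ESTATUS_DEADLOCK_ERROR);
}

/* The corrected branch: --continue-on-error only downgrades plain SQL
 * errors to CSTATE_ERROR; meta-command failures still abort the client. */
static ConnectionStateEnum
next_state(EStatus estatus)
{
    if ((estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
        canRetryError(estatus))
        return CSTATE_ERROR;
    return CSTATE_ABORTED;
}

int
main(void)
{
    assert(next_state(ESTATUS_OTHER_SQL_ERROR) == CSTATE_ERROR);
    assert(next_state(ESTATUS_META_COMMAND_ERROR) == CSTATE_ABORTED);
    assert(next_state(ESTATUS_DEADLOCK_ERROR) == CSTATE_ERROR);
    return 0;
}
```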
(2)
+ " --continue-on-error continue processing transations after a trasaction fails\n"
"trasaction" is a typo and including "transaction" twice looks a bit redundant.
Instead of using the word "transaction", how about
"--continue-on-error continue running after an SQL error"?
This version is shorter, avoids repetition, and describes the actual behavior
well when SQL statements fail.
As for the comments:
(3)
- * A failed transaction is defined as unsuccessfully retried transactions.
+ * A failed transaction is defined as unsuccessfully retried transactions
+ * unless continue-on-error option is specified.
* It can be one of two types:
*
* failed (the number of failed transactions) =
@@ -411,6 +412,12 @@ typedef struct StatsData
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried).
*
+ * When continue-on-error option is specified,
+ * failed (the number of failed transactions) =
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got a error when continue-on-error option
+ * was specified).
+ *
To explain explicitly that there are two definitions of failed transactions
depending on the situation, how about:
"""
A failed transaction is counted differently depending on whether
the --continue-on-error option is specified.
Without --continue-on-error:
failed (the number of failed transactions) =
'serialization_failures' (they got a serialization error and were not
successfully retried) +
'deadlock_failures' (they got a deadlock error and were not
successfully retried).
When --continue-on-error is specified:
failed (number of failed transactions) =
'serialization_failures' + 'deadlock_failures' +
'other_sql_failures' (they got some other SQL error; the transaction was
not retried and counted as failed due to
--continue-on-error).
"""
(4)
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure , which
+ * is enabled if --continue-on-error is
+ * used */
Is "counted" is more proper than "enabled" here?
Af for the documentations:
(5)
The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ serialization or deadlock errors by default (see
+ <xref linkend="failures-and-retries"/> for more information).
Would it be more readable to simply say
"The next line reports the number of failed transactions (see ... for more information)",
since the definition of "failed transaction" has become a bit messy?
(6)
connection with the database server was lost or the end of script was reached
without completing the last transaction. In addition, if execution of an SQL
or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
- transaction is rolled back, which also includes setting the client variables
- as they were before the run of this transaction (it is assumed that one
- transaction script contains only one transaction; see
- <xref linkend="transactions-and-scripts"/> for more information).
+ the client is aborted by default. However, if the --continue-on-error option
+ is specified, the client does not abort and proceeds to the next transaction
+ regardless of the error. This case is reported as other failures in the output.
+ Otherwise, if an SQL command fails with serialization or deadlock errors, the
+ client is not aborted. In such cases, the current transaction is rolled back,
+ which also includes setting the client variables as they were before the run
+ of this transaction (it is assumed that one transaction script contains only
+ one transaction; see <xref linkend="transactions-and-scripts"/> for more information).
To emphasize the default behavior, I wonder if it would be better to move "by default"
to the beginning of the sentence, like:
"By default, if execution of an SQL or meta command fails for reasons other than
serialization or deadlock errors, the client is aborted."
How about quoting "other failures"? Like:
"These cases are reported as "other failures" in the output."
Also, I feel the meaning of "Otherwise" has become somewhat unclear since the
explanation of --continue-on-error was added between the sentences. So, how about
clarifying that clients are not aborted due to serialization/deadlock errors even
without --continue-on-error? For example:
"In contrast, if an SQL command fails with serialization or deadlock errors, the
client is not aborted even without <option>--continue-on-error</option>.
Instead, the current transaction is rolled back, which also includes setting
the client variables as they were before the run of this transaction
(it is assumed that one transaction script contains only
one transaction; see <xref linkend="transactions-and-scripts"/> for more information)."
(7)
The main report contains the number of failed transactions. If the
- <option>--max-tries</option> option is not equal to 1, the main report also
+ <option>--max-tries</option> option is not equal to 1 and
+ <option>--continue-on-error</option> is not specified, the main report also
contains statistics related to retries: the total number of retried
Is that true?
The retries statistics would be included even without --continue-on-error.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Fri, 4 Jul 2025 13:01:12 +0000
"Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote:
Thanks for the diagram, it's quite helpful. Let me share my understanding and opinion.
The terminology "retry" is used for the transition CSTATE_ERROR->CSTATE_RETRY,
and here the random state is restored to what it was at the beginning:
```
/*
* Reset the random state as they were at the beginning of the
* transaction.
*/
st->cs_func_rs = st->random_state;
```
Yes. The random state is reset in the CSTATE_RETRY state, which then transitions
directly to CSTATE_START_COMMAND.
In the --continue-on-error case, the transition CSTATE_WAIT_RESULT->CSTATE_ERROR
can happen even when the reason for the failure is neither serialization nor
deadlock. Ultimately the path reaches ...->CSTATE_END_TX->CSTATE_CHOOSE_SCRIPT,
the beginning of the state machine. cs_func_rs is not overwritten along this
route, so a different random value would be generated, or even another script
may be chosen. Is that correct?
Yes, that matches my understanding.
04.
```
<term><replaceable>retries</replaceable></term>
<listitem>
<para>
number of retries after serialization or deadlock errors
(zero unless <option>--max-tries</option> is not equal to one)
</para>
</listitem>
```
To confirm: failures under --continue-on-error won't be counted here because they
are not "retries"; in other words, they do not reach CSTATE_RETRY, right?
Right. Transactions marked as failed due to --continue-on-error are not retried
and should not be counted here.
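To spell out the bookkeeping, here is a compilable toy version of the counters
(the field names mirror the StatsData additions in the patch; the helper names
are mine and hypothetical):
```
#include <stdint.h>
#include <stdio.h>

/* Toy subset of StatsData with the field the patch adds. */
typedef struct
{
    int64_t serialization_failures;
    int64_t deadlock_failures;
    int64_t other_sql_failures; /* counted only with --continue-on-error */
    int64_t retries;            /* bumped only via CSTATE_RETRY */
} Stats;

/* A --continue-on-error failure increments other_sql_failures; since the
 * client never enters CSTATE_RETRY for it, 'retries' stays untouched. */
static void
record_other_failure(Stats *s)
{
    s->other_sql_failures++;
}

/* Mirrors getFailures() as extended by the patch. */
static int64_t
failures(const Stats *s)
{
    return s->serialization_failures + s->deadlock_failures +
        s->other_sql_failures;
}

int
main(void)
{
    Stats s = {0};

    record_other_failure(&s);
    printf("failed = %lld, retries = %lld\n",
           (long long) failures(&s), (long long) s.retries);
    return 0;
}
```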
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Hi,
Thank you for the kind comments.
I've updated the previous patch.
Below is a summary of the changes:
1. The code path and documentation have been corrected based on your feedback.
2. The following message is now suppressed by default. Instead, an error message
is added when a client aborts during SQL execution. (v6-0003-Suppress-xxx.patch)
```
if (verbose_errors)
pg_log_error("client %d script %d aborted in command %d query %d: %s",
st->id, st->use_file, st->command, qrynum,
PQerrorMessage(st->con));
```
On 2025/07/04 22:01, Hayato Kuroda (Fujitsu) wrote:
Could I confirm what you mean by "start a new one"?
In the current pgbench, when a query raises an error (a deadlock or
serialization failure), it can be retried using the same random state.
This typically means the query will be retried with the same parameter values.
On the other hand, when the query ultimately fails (possibly after some retries),
the transaction is marked as a "failure", and the next transaction starts with a
new random state (i.e., with new parameter values).
Therefore, if a query fails due to a unique constraint violation and is retried
with the same parameters, it will keep failing on each retry.
Thank you for your explanation. I understand it as you described. I've also
attached a schematic diagram of the state machine. I hope it will help clarify
the behavior of pgbench. Red arrows represent the transition of state when an SQL
command fails and the --continue-on-error option is specified.
Thanks for the diagram, it's quite helpful. Let me share my understanding and opinion.
The terminology "retry" is used for the transition CSTATE_ERROR->CSTATE_RETRY,
and here the random state is restored to what it was at the beginning:
```
/*
* Reset the random state as they were at the beginning of the
* transaction.
*/
st->cs_func_rs = st->random_state;
```
In the --continue-on-error case, the transition CSTATE_WAIT_RESULT->CSTATE_ERROR
can happen even when the reason for the failure is neither serialization nor
deadlock. Ultimately the path reaches ...->CSTATE_END_TX->CSTATE_CHOOSE_SCRIPT,
the beginning of the state machine. cs_func_rs is not overwritten along this
route, so a different random value would be generated, or even another script
may be chosen. Is that correct?
Yes, I believe that’s correct.
01.
```
$ git am ../patches/pgbench/v5-0001-Add-continue-on-error-option-to-pgbench.patch
Applying: When the option is set, client rolls back the failed transaction and...
.git/rebase-apply/patch:65: trailing whitespace.
<literal>serialization</literal>, <literal>deadlock</literal>, or
.git/rebase-apply/patch:139: trailing whitespace.
<option>--max-tries</option> option is not equal to 1 and
warning: 2 lines add whitespace errors.
```
I got warnings when I applied the patch. Please fix it.
It's been fixed.
02.
```
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got a error when continue-on-error option
```
The first line has a tab, but it should be a normal blank.
I hadn't noticed it. It's fixed.
03.
```
+ else if (continue_on_error | canRetryError(st->estatus))
```
I feel "|" should be "||".
Thank you for pointing that out. Fixed.
04.
```
<term><replaceable>retries</replaceable></term>
<listitem>
<para>
number of retries after serialization or deadlock errors
(zero unless <option>--max-tries</option> is not equal to one)
</para>
</listitem>
```
To confirm: failures under --continue-on-error won't be counted here because they
are not "retries"; in other words, they do not reach CSTATE_RETRY, right?
Yes. I agree with Nagata-san [1]: --continue-on-error is not considered a
"retry" because it doesn't reach CSTATE_RETRY.
On 2025/07/05 0:03, Yugo Nagata wrote:
case PGRES_NONFATAL_ERROR:
case PGRES_FATAL_ERROR:
st->estatus = getSQLErrorStatus(PQresultErrorField(res,
PG_DIAG_SQLSTATE));
if (canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st, PQerrorMessage(st->con));
goto error;
}
/* fall through */
default:
/* anything else is unexpected */
pg_log_error("client %d script %d aborted in command %d query %d: %s",
st->id, st->use_file, st->command, qrynum,
PQerrorMessage(st->con));
goto error;
}
When an SQL error other than a serialization or deadlock error occurs, an error message is
output via pg_log_error in this code path. However, I think this should be reported only
when verbose_errors is set, similar to how serialization and deadlock errors are handled when
--continue-on-error is enabled.
I think the error message logged via pg_log_error is useful when verbose_errors
is not specified, because it informs users that the client has exited. Without
it, users may not notice that something went wrong.
However, if a large number of errors occur, this could result in a significant increase
in stderr output during the benchmark.
Users can still notice that something went wrong by checking the “number of other failures”
reported after the run, and I assume that in most cases, when --continue-on-error is enabled,
users aren’t particularly interested in seeing individual error messages as they happen.
It’s true that seeing error messages during the benchmark might be useful in some cases, but
the same could be said for serialization or deadlock errors, and that’s exactly what the
--verbose-errors option is for.
I understand your concern. The condition for calling pg_log_error() was modified
to reduce stderr output.
Additionally, an error message was added for cases where some clients aborted
while executing SQL commands, similar to other code paths that transition to
st->state = CSTATE_ABORTED, as shown in the example below:
```
pg_log_error("client %d aborted while establishing connection", st->id);
st->state = CSTATE_ABORTED;
```
Here are some comments on the patch.
(1)
}
- else if (canRetryError(st->estatus))
+ else if (continue_on_error | canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
Due to this change, when --continue-on-error is enabled, st->state is set to
CSTATE_ERROR regardless of the type of error returned by readCommandResponse,
even when the error is not ESTATUS_OTHER_SQL_ERROR, e.g. ESTATUS_META_COMMAND_ERROR
due to a failure of \gset with a query returning more than one row.
Therefore, this should be something like:
else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
canRetryError(st->estatus))
Thanks for pointing that out — I’ve corrected it.
(2)
+ " --continue-on-error continue processing transations after a trasaction fails\n"
"trasaction" is a typo and including "transaction" twice looks a bit redundant.
Instead of using the word "transaction", how about
"--continue-on-error continue running after an SQL error"?
This version is shorter, avoids repetition, and describes the actual behavior
well when SQL statements fail.
Fixed it.
(3)
- * A failed transaction is defined as unsuccessfully retried transactions.
+ * A failed transaction is defined as unsuccessfully retried transactions
+ * unless continue-on-error option is specified.
* It can be one of two types:
*
* failed (the number of failed transactions) =
@@ -411,6 +412,12 @@ typedef struct StatsData
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried).
*
+ * When continue-on-error option is specified,
+ * failed (the number of failed transactions) =
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got a error when continue-on-error option
+ * was specified).
+ *
To explain explicitly that there are two definitions of failed transactions
depending on the situation, how about:
"""
A failed transaction is counted differently depending on whether
the --continue-on-error option is specified.
Without --continue-on-error:
failed (the number of failed transactions) =
'serialization_failures' (they got a serialization error and were not
successfully retried) +
'deadlock_failures' (they got a deadlock error and were not
successfully retried).
When --continue-on-error is specified:
failed (number of failed transactions) =
'serialization_failures' + 'deadlock_failures' +
'other_sql_failures' (they got some other SQL error; the transaction was
not retried and counted as failed due to
--continue-on-error).
"""
Thank you for your suggestion. I modified it accordingly.
(4)
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure , which
+ * is enabled if --continue-on-error is
+ * used */
Is "counted" more proper than "enabled" here?
Fixed.
As for the documentation:
(5)
The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ serialization or deadlock errors by default (see
+ <xref linkend="failures-and-retries"/> for more information).
Would it be more readable to simply say
"The next line reports the number of failed transactions (see ... for more information)",
since the definition of "failed transaction" has become a bit messy?
I fixed it to the simple explanation.
(6)
connection with the database server was lost or the end of script was reached
without completing the last transaction. In addition, if execution of an SQL
or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
- transaction is rolled back, which also includes setting the client variables
- as they were before the run of this transaction (it is assumed that one
- transaction script contains only one transaction; see
- <xref linkend="transactions-and-scripts"/> for more information).
+ the client is aborted by default. However, if the --continue-on-error option
+ is specified, the client does not abort and proceeds to the next transaction
+ regardless of the error. This case is reported as other failures in the output.
+ Otherwise, if an SQL command fails with serialization or deadlock errors, the
+ client is not aborted. In such cases, the current transaction is rolled back,
+ which also includes setting the client variables as they were before the run
+ of this transaction (it is assumed that one transaction script contains only
+ one transaction; see <xref linkend="transactions-and-scripts"/> for more information).
To emphasize the default behavior, I wonder if it would be better to move "by default"
to the beginning of the sentence, like:
"By default, if execution of an SQL or meta command fails for reasons other than
serialization or deadlock errors, the client is aborted."
How about quoting "other failures"? Like:
"These cases are reported as "other failures" in the output."
Also, I feel the meaning of "Otherwise" has become somewhat unclear since the
explanation of --continue-on-error was added between the sentences. So, how about
clarifying that clients are not aborted due to serialization/deadlock errors even
without --continue-on-error? For example:
"In contrast, if an SQL command fails with serialization or deadlock errors, the
client is not aborted even without <option>--continue-on-error</option>.
Instead, the current transaction is rolled back, which also includes setting
the client variables as they were before the run of this transaction
(it is assumed that one transaction script contains only
one transaction; see <xref linkend="transactions-and-scripts"/> for more information)."
I've modified it according to your suggestion.
(7)
The main report contains the number of failed transactions. If the
- <option>--max-tries</option> option is not equal to 1, the main report also
+ <option>--max-tries</option> option is not equal to 1 and
+ <option>--continue-on-error</option> is not specified, the main report also
contains statistics related to retries: the total number of retried
Is that true?
The retries statistics would be included even without --continue-on-error.
That was wrong. I corrected it.
[1]: /messages/by-id/20250705002239.27e6e5a4ba22c047ac2fa16a@sraoss.co.jp
Regards,
Rintaro Ikeda
Attachments:
v6-0001-Add-continue-on-error-option.patchtext/plain; charset=UTF-8; name=v6-0001-Add-continue-on-error-option.patchDownload
From caa1ede6a7b5ac3e19b73943a1a810bf98e32e21 Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <ikedarintarof@oss.nttdata.com>
Date: Wed, 9 Jul 2025 23:36:37 +0900
Subject: [PATCH v6 1/3] Add --continue-on-error option
When the option is set, the client rolls back the failed transaction and starts
a new one when the transaction fails for reasons other than deadlock or
serialization failure.
---
doc/src/sgml/ref/pgbench.sgml | 71 +++++++++++++++-----
src/bin/pgbench/pgbench.c | 55 +++++++++++++--
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 ++++++
3 files changed, 124 insertions(+), 24 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..15fcb45e223 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
and number of transactions per client); these will be equal unless the run
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
- The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ The next line reports the number of failed transactions (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -790,6 +789,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -914,6 +916,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Allows clients to continue their run even if an SQL statement fails due to
+ errors other than serialization or deadlock. Unlike serialization and deadlock
+ failures, clients do not retry the same transaction but start a new one.
+ This option is useful when your custom script may raise errors for reasons
+ such as a unique constraint violation. Without this option, the client is
+ aborted after such errors.
+ </para>
+ <para>
+ Note that serialization and deadlock failures never cause the client to be
+ aborted, even after the client retries <option>--max-tries</option> times,
+ so they are not affected by this option.
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
@@ -2409,8 +2431,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2638,6 +2660,16 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got an SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2646,8 +2678,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
</screen>
</para>
@@ -2839,9 +2871,11 @@ statement latencies in milliseconds, failures and retries:
<option>--exit-on-abort</option> is specified. Otherwise in the worst
case they only lead to the abortion of the failed client while other
clients continue their run (but some client errors are handled without
- an abortion of the client and reported separately, see below). Later in
- this section it is assumed that the discussed errors are only the
- direct client errors and they are not internal
+ an abortion of the client and reported separately, see below). When
+ <option>--continue-on-error</option> is specified, the client
+ continues to process new transactions even if it encounters an error.
+ Later in this section it is assumed that the discussed errors are only
+ the direct client errors and they are not internal
<application>pgbench</application> errors.
</para>
</listitem>
@@ -2851,14 +2885,17 @@ statement latencies in milliseconds, failures and retries:
<para>
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
- without completing the last transaction. In addition, if execution of an SQL
+ without completing the last transaction. By default, if execution of an SQL
or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
- transaction is rolled back, which also includes setting the client variables
- as they were before the run of this transaction (it is assumed that one
- transaction script contains only one transaction; see
- <xref linkend="transactions-and-scripts"/> for more information).
+ the client is aborted. However, if the --continue-on-error option is specified,
+ the client does not abort and proceeds to the next transaction regardless of
+ the error. These cases are reported as "other failures" in the output.
+ In contrast, if an SQL command fails with serialization or deadlock errors, the
+ client is not aborted even without <option>--continue-on-error</option>.
+ Instead, the current transaction is rolled back, which also includes setting
+ the client variables as they were before the run of this transaction
+ (it is assumed that one transaction script contains only one transaction;
+ see <xref linkend="transactions-and-scripts"/> for more information).
Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum
number of tries (specified by the <option>--max-tries</option> option) / the maximum
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 497a936c141..4b3ddb3146f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,15 +402,23 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried transactions.
- * It can be one of two types:
+ * A failed transaction is counted differently depending on whether
+ * the --continue-on-error option is specified.
*
+ * Without --continue-on-error:
* failed (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
* successfully retried) +
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried).
*
+ * When --continue-on-error is specified:
+ *
+ * failed (number of failed transactions) =
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got some other SQL error; the transaction was
+ * not retried and counted as failed due to --continue-on-error).
+ *
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
*
@@ -440,6 +448,11 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure, which
+ * is counted if --continue-on-error is
+ * specified */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +783,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +968,7 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --continue-on-error continue running after an SQL error\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1482,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1532,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4007,7 +4026,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
+ canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4528,7 +4548,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4548,6 +4569,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4603,6 +4626,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4643,10 +4667,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6285,6 +6311,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6427,7 +6454,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+ * to errors other than serialization or deadlock and --continue-on-error
+ * is not set.
*/
if (total_cnt <= 0)
return;
@@ -6443,6 +6471,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6546,6 +6577,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6705,6 +6740,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7058,6 +7094,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7413,6 +7453,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7dd78940300..8bb35dda5f7 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1813,6 +1813,28 @@ update counter set i = i+1 returning i \gset
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique);' . 'INSERT INTO unique_table VALUES (0);');
+
+$node->pgbench(
+ '-t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 0/10\b},
+ qr{other failures: 10\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '002_continue_on_error' => q{
+ insert into unique_table values (0);
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.39.5 (Apple Git-154)
v6-0002-Rename-a-confusing-enumerator.patchtext/plain; charset=UTF-8; name=v6-0002-Rename-a-confusing-enumerator.patchDownload
From c1074c2a076e879196e5c68bc641995bface8453 Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <ikedarintarof@oss.nttdata.com>
Date: Wed, 9 Jul 2025 23:50:36 +0900
Subject: [PATCH v6 2/3] Rename a confusing enumerator
Rename the confusing enumerator which may be mistakenly assumed to be related to
other_sql_errors
---
src/bin/pgbench/pgbench.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4b3ddb3146f..95a7083ede0 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -485,7 +485,7 @@ typedef enum TStatus
TSTATUS_IDLE,
TSTATUS_IN_BLOCK,
TSTATUS_CONN_ERROR,
- TSTATUS_OTHER_ERROR,
+ TSTATUS_UNKNOWN_ERROR,
} TStatus;
/* Various random sequences are initialized from this one. */
@@ -3577,12 +3577,12 @@ getTransactionStatus(PGconn *con)
* not. Internal error which should never occur.
*/
pg_log_error("unexpected transaction status %d", tx_status);
- return TSTATUS_OTHER_ERROR;
+ return TSTATUS_UNKNOWN_ERROR;
}
/* not reached */
Assert(false);
- return TSTATUS_OTHER_ERROR;
+ return TSTATUS_UNKNOWN_ERROR;
}
/*
--
2.39.5 (Apple Git-154)
v6-0003-Suppress-error-messages-unless-client-abort.patchtext/plain; charset=UTF-8; name=v6-0003-Suppress-error-messages-unless-client-abort.patchDownload
From 6d916730e26384e7f3a559515bd16d0d9831064b Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <ikedarintarof@oss.nttdata.com>
Date: Wed, 9 Jul 2025 23:46:19 +0900
Subject: [PATCH v6 3/3] Suppress error messages unless client abort
Suppress error messages for individual failed SQL commands and report them only
when the client aborts.
---
src/bin/pgbench/pgbench.c | 10 +++++++---
src/bin/pgbench/t/001_pgbench_with_server.pl | 14 +++++++-------
2 files changed, 14 insertions(+), 10 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 95a7083ede0..26995b93313 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3385,9 +3385,10 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
default:
/* anything else is unexpected */
- pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum,
- PQerrorMessage(st->con));
+ if (verbose_errors)
+ pg_log_error("client %d script %d aborted in command %d query %d: %s",
+ st->id, st->use_file, st->command, qrynum,
+ PQerrorMessage(st->con));
goto error;
}
@@ -4030,7 +4031,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
+ {
+ pg_log_error("client %d aborted while executing SQL commands", st->id);
st->state = CSTATE_ABORTED;
+ }
break;
/*
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 8bb35dda5f7..a38a1cf4ab7 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -301,7 +301,7 @@ $node->append_conf('postgresql.conf',
. "log_parameter_max_length_on_error = 0");
$node->reload;
$node->pgbench(
- '-n -t1 -c1 -M prepared',
+ '-n -t1 -c1 -M prepared --verbose',
2,
[],
[
@@ -328,7 +328,7 @@ $node->append_conf('postgresql.conf',
. "log_parameter_max_length_on_error = 64");
$node->reload;
$node->pgbench(
- '-n -t1 -c1 -M prepared',
+ '-n -t1 -c1 -M prepared --verbose',
2,
[],
[
@@ -342,7 +342,7 @@ SELECT 1 / (random() / 2)::int, :one::int, :two::int;
}
});
$node->pgbench(
- '-n -t1 -c1 -M prepared',
+ '-n -t1 -c1 -M prepared --verbose',
2,
[],
[
@@ -370,7 +370,7 @@ $node->append_conf('postgresql.conf',
. "log_parameter_max_length_on_error = -1");
$node->reload;
$node->pgbench(
- '-n -t1 -c1 -M prepared',
+ '-n -t1 -c1 -M prepared --verbose',
2,
[],
[
@@ -387,7 +387,7 @@ SELECT 1 / (random() / 2)::int, :one::int, :two::int;
$node->append_conf('postgresql.conf', "log_min_duration_statement = 0");
$node->reload;
$node->pgbench(
- '-n -t1 -c1 -M prepared',
+ '-n -t1 -c1 -M prepared --verbose',
2,
[],
[
@@ -410,7 +410,7 @@ $log = undef;
# Check that bad parameters are reported during typinput phase of BIND
$node->pgbench(
- '-n -t1 -c1 -M prepared',
+ '-n -t1 -c1 -M prepared --verbose',
2,
[],
[
@@ -1464,7 +1464,7 @@ for my $e (@errors)
my $n = '001_pgbench_error_' . $name;
$n =~ s/ /_/g;
$node->pgbench(
- '-n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX'
+ '-n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX --verbose'
. ' -Dmaxint=9223372036854775807 -Dminint=-9223372036854775808'
. ($no_prepare ? '' : ' -M prepared'),
$status,
--
2.39.5 (Apple Git-154)
On Wed, 9 Jul 2025 23:58:32 +0900
Rintaro Ikeda <ikedarintarof@oss.nttdata.com> wrote:
Hi,
Thank you for the kind comments.
I've updated the previous patch.
Thank you for updating the patch!
However, if a large number of errors occur, this could result in a significant increase
in stderr output during the benchmark.
Users can still notice that something went wrong by checking the “number of other failures”
reported after the run, and I assume that in most cases, when --continue-on-error is enabled,
users aren’t particularly interested in seeing individual error messages as they happen.
It’s true that seeing error messages during the benchmark might be useful in some cases, but
the same could be said for serialization or deadlock errors, and that’s exactly what the
--verbose-errors option is for.
I understand your concern. The condition for calling pg_log_error() was modified
to reduce stderr output.
Additionally, an error message was added for cases where some clients aborted
while executing SQL commands, similar to other code paths that transition to
st->state = CSTATE_ABORTED, as shown in the example below:
```
pg_log_error("client %d aborted while establishing connection", st->id);
st->state = CSTATE_ABORTED;
```
default:
/* anything else is unexpected */
- pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum,
- PQerrorMessage(st->con));
+ if (verbose_errors)
+ pg_log_error("client %d script %d aborted in command %d query %d: %s",
+ st->id, st->use_file, st->command, qrynum,
+ PQerrorMessage(st->con));
goto error;
}
Thanks to this fix, error messages caused by SQL errors are now output only when
--verbose-errors is enabled. However, the comment describes the condition as "unexpected",
and the message states that the client was "aborted". This does not seem accurate, since
clients are not aborted due to SQL errors when --continue-on-error is enabled.
I think the error message should be emitted using commandError() when both
--continue-on-error and --verbose-errors are specified, like this:
case PGRES_NONFATAL_ERROR:
case PGRES_FATAL_ERROR:
st->estatus = getSQLErrorStatus(PQresultErrorField(res,
PG_DIAG_SQLSTATE));
if (continue_on_error || canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st, PQerrorMessage(st->con));
goto error;
}
/* fall through */
In addition, the error message in the "default" case should be shown regardless
of --verbose-errors, since it represents an unexpected situation and should
always be reported.
Finally, I believe this fix should be included in patch 0001 rather than 0003,
as it would be part of the implementation of --continue-on-error.
As for 0003:
+ {
+ pg_log_error("client %d aborted while executing SQL commands", st->id);
st->state = CSTATE_ABORTED;
+ }
break;
I understand that the patch is not directly related to --continue-on-error, similar to 0002,
and that it aims to improve the error message to indicate that the client was aborted due to
some error during readCommandResponse().
However, this message doesn't seem entirely accurate, since the error is not always caused
by an SQL command failure itself. For example, it could also be due to a failure of the \gset
meta-command.
In addition, this fix causes error messages to be emitted twice. For example, if \gset fails,
the following similar messages are printed:
pgbench: error: client 0 script 0 command 0 query 0: expected one row, got 0
pgbench: error: client 0 aborted while executing SQL commands
Even worse, if an unexpected error occurs in readCommandResponse() (i.e., the default case),
the following messages are emitted, both indicating that the client was aborted:
pgbench: error: client 0 script 0 aborted in command ... query ...
pgbench: error: client 0 aborted while executing SQL commands
I feel this is a bit redundant.
Therefore, if we are to improve these messages to indicate explicitly that the client
was aborted, I would suggest modifying the error messages in readCommandResponse() rather
than adding a new one in advanceConnectionState().
I've attached patch 0003 incorporating my suggestion. What do you think?
Additionally, the patch 0001 includes the fix that was originally part of
your proposed 0003, as previously discussed.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
v7-0003-Improve-error-messages-for-errors-that-cause-clie.patchtext/x-diff; name=v7-0003-Improve-error-messages-for-errors-that-cause-clie.patchDownload
From f9c3ad15d2cac1e536b0eb3c93aabc2f127b4f30 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Thu, 10 Jul 2025 17:21:05 +0900
Subject: [PATCH v7 3/3] Improve error messages for errors that cause client
abortion
This commit modifies relevant error messages to explicitly indicate that the
client was aborted. As part of this change, pg_log_error was replaced with
commandFailed().
---
src/bin/pgbench/pgbench.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 7dbeb79ca8d..41a7c19fff5 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3309,8 +3309,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_EMPTY_QUERY: /* may be used for testing no-op overhead */
if (is_last && meta == META_GSET)
{
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, 0);
+ commandFailed(st, "gset", psprintf("expected one row, got %d", 0));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3324,8 +3323,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
if (meta == META_GSET && ntuples != 1)
{
/* under \gset, report the error */
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, PQntuples(res));
+ commandFailed(st, "gset", psprintf("expected one row, got %d", PQntuples(res)));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3339,18 +3337,18 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
for (int fld = 0; fld < PQnfields(res); fld++)
{
char *varname = PQfname(res, fld);
+ char *cmd = (meta == META_ASET ? "aset" : "gset");
/* allocate varname only if necessary, freed below */
if (*varprefix != '\0')
varname = psprintf("%s%s", varprefix, varname);
/* store last row result as a string */
- if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
+ if (!putVariable(&st->variables, cmd, varname,
PQgetvalue(res, ntuples - 1, fld)))
{
/* internal error */
- pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
- st->id, st->use_file, st->command, qrynum, varname);
+ commandFailed(st, cmd, psprintf("error storing into variable %s", varname));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
--
2.43.0
v7-0002-Rename-a-confusing-enumerator.patchtext/x-diff; name=v7-0002-Rename-a-confusing-enumerator.patchDownload
From b1da757cb71acb7df7174a3a3dd2755461e80276 Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <ikedarintarof@oss.nttdata.com>
Date: Wed, 9 Jul 2025 23:50:36 +0900
Subject: [PATCH v7 2/3] Rename a confusing enumerator
Rename the confusing enumerator which may be mistakenly assumed to be related to
other_sql_errors
---
src/bin/pgbench/pgbench.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index edd8b01f794..7dbeb79ca8d 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -485,7 +485,7 @@ typedef enum TStatus
TSTATUS_IDLE,
TSTATUS_IN_BLOCK,
TSTATUS_CONN_ERROR,
- TSTATUS_OTHER_ERROR,
+ TSTATUS_UNKNOWN_ERROR,
} TStatus;
/* Various random sequences are initialized from this one. */
@@ -3577,12 +3577,12 @@ getTransactionStatus(PGconn *con)
* not. Internal error which should never occur.
*/
pg_log_error("unexpected transaction status %d", tx_status);
- return TSTATUS_OTHER_ERROR;
+ return TSTATUS_UNKNOWN_ERROR;
}
/* not reached */
Assert(false);
- return TSTATUS_OTHER_ERROR;
+ return TSTATUS_UNKNOWN_ERROR;
}
/*
--
2.43.0
v7-0001-Add-continue-on-error-option.patchtext/x-diff; name=v7-0001-Add-continue-on-error-option.patchDownload
From d0d3ec97abe98d77501dbf38bbb248493535be52 Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <ikedarintarof@oss.nttdata.com>
Date: Wed, 9 Jul 2025 23:36:37 +0900
Subject: [PATCH v7 1/3] Add --continue-on-error option
When the option is set, the client rolls back the failed transaction and starts
a new one when the transaction fails for reasons other than deadlock or
serialization failure.
---
doc/src/sgml/ref/pgbench.sgml | 71 +++++++++++++++-----
src/bin/pgbench/pgbench.c | 57 +++++++++++++---
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 ++++++
3 files changed, 125 insertions(+), 25 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..15fcb45e223 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
and number of transactions per client); these will be equal unless the run
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
- The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ The next line reports the number of failed transactions (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -790,6 +789,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -914,6 +916,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Allows clients to continue their run even if an SQL statement fails due to
+ errors other than serialization or deadlock. Unlike serialization and deadlock
+ failures, clients do not retry the same transaction but start a new one.
+ This option is useful when your custom script may raise errors for reasons
+ such as a unique constraint violation. Without this option, the client is
+ aborted after such errors.
+ </para>
+ <para>
+ Note that serialization and deadlock failures never cause the client to be
+ aborted, even after the client retries <option>--max-tries</option> times,
+ so they are not affected by this option.
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
@@ -2409,8 +2431,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2638,6 +2660,16 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got an SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2646,8 +2678,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
</screen>
</para>
@@ -2839,9 +2871,11 @@ statement latencies in milliseconds, failures and retries:
<option>--exit-on-abort</option> is specified. Otherwise in the worst
case they only lead to the abortion of the failed client while other
clients continue their run (but some client errors are handled without
- an abortion of the client and reported separately, see below). Later in
- this section it is assumed that the discussed errors are only the
- direct client errors and they are not internal
+ an abortion of the client and reported separately, see below). When
+ <option>--continue-on-error</option> is specified, the client
+ continues to process new transactions even if it encounters an error.
+ Later in this section it is assumed that the discussed errors are only
+ the direct client errors and they are not internal
<application>pgbench</application> errors.
</para>
</listitem>
@@ -2851,14 +2885,17 @@ statement latencies in milliseconds, failures and retries:
<para>
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
- without completing the last transaction. In addition, if execution of an SQL
+ without completing the last transaction. By default, if execution of an SQL
or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
- transaction is rolled back, which also includes setting the client variables
- as they were before the run of this transaction (it is assumed that one
- transaction script contains only one transaction; see
- <xref linkend="transactions-and-scripts"/> for more information).
+ the client is aborted. However, if the --continue-on-error option is specified,
+ the client does not abort and proceeds to the next transaction regardless of
+ the error. These cases are reported as "other failures" in the output.
+ In contrast, if an SQL command fails with serialization or deadlock errors, the
+ client is not aborted even without <option>--continue-on-error</option>.
+ Instead, the current transaction is rolled back, which also includes setting
+ the client variables as they were before the run of this transaction
+ (it is assumed that one transaction script contains only one transaction;
+ see <xref linkend="transactions-and-scripts"/> for more information).
Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum
number of tries (specified by the <option>--max-tries</option> option) / the maximum
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 497a936c141..edd8b01f794 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,15 +402,23 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried transactions.
- * It can be one of two types:
+ * A failed transaction is counted differently depending on whether
+ * the --continue-on-error option is specified.
*
+ * Without --continue-on-error:
* failed (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
* successfully retried) +
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried).
*
+ * When --continue-on-error is specified:
+ *
+ * failed (number of failed transactions) =
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got some other SQL error; the transaction was
+ * not retried and counted as failed due to --continue-on-error).
+ *
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
*
@@ -440,6 +448,11 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure, which
+ * is counted if --continue-on-error is
+ * specified */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +783,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +968,7 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --continue-on-error continue running after an SQL error\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1482,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1532,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3356,7 +3375,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_FATAL_ERROR:
st->estatus = getSQLErrorStatus(PQresultErrorField(res,
PG_DIAG_SQLSTATE));
- if (canRetryError(st->estatus))
+ if (continue_on_error || canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st, PQerrorMessage(st->con));
@@ -4007,7 +4026,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
+ canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4528,7 +4548,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4548,6 +4569,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4603,6 +4626,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4643,10 +4667,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6285,6 +6311,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6427,7 +6454,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+ * to errors other than serialization or deadlock errors and
+ * --continue-on-error is not set.
*/
if (total_cnt <= 0)
return;
@@ -6443,6 +6471,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6546,6 +6577,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6705,6 +6740,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7058,6 +7094,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7413,6 +7453,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7dd78940300..8bb35dda5f7 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1813,6 +1813,28 @@ update counter set i = i+1 returning i \gset
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique);' . 'INSERT INTO unique_table VALUES (0);');
+
+$node->pgbench(
+ '-t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 0/10\b},
+ qr{other failures: 10\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '002_continue_on_error' => q{
insert into unique_table values (0);
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.43.0
Hi,
On 2025/07/10 18:17, Yugo Nagata wrote:
On Wed, 9 Jul 2025 23:58:32 +0900
Rintaro Ikeda <ikedarintarof@oss.nttdata.com> wrote:

Hi,
Thank you for the kind comments.
I've updated the previous patch.

Thank you for updating the patch!
However, if a large number of errors occur, this could result in a significant increase
in stderr output during the benchmark.

Users can still notice that something went wrong by checking the “number of other failures”
reported after the run, and I assume that in most cases, when --continue-on-error is enabled,
users aren’t particularly interested in seeing individual error messages as they happen.

It’s true that seeing error messages during the benchmark might be useful in some cases, but
the same could be said for serialization or deadlock errors, and that’s exactly what the
--verbose-errors option is for.

I understand your concern. The condition for calling pg_log_error() was modified
to reduce stderr output.
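With that change, per-error messages become opt-in. For illustration only, a run that still prints each error as it happens would combine the new option with --verbose-errors (assuming the insert_to_unique_column.sql script from earlier in the thread):
```
% pgbench -d postgres -f insert_to_unique_column.sql -T 10 \
    --continue-on-error --verbose-errors --failures-detailed
```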
Additionally, an error message was added for cases where some clients aborted
while executing SQL commands, similar to other code paths that transition to
st->state = CSTATE_ABORTED, as shown in the example below:
```
pg_log_error("client %d aborted while establishing connection", st->id);
st->state = CSTATE_ABORTED;
```

```
            default:
                /* anything else is unexpected */
-               pg_log_error("client %d script %d aborted in command %d query %d: %s",
-                            st->id, st->use_file, st->command, qrynum,
-                            PQerrorMessage(st->con));
+               if (verbose_errors)
+                   pg_log_error("client %d script %d aborted in command %d query %d: %s",
+                                st->id, st->use_file, st->command, qrynum,
+                                PQerrorMessage(st->con));
                goto error;
        }
```

Thanks to this fix, error messages caused by SQL errors are now output only when
--verbose-errors is enabled. However, the comment describes the condition as "unexpected",
and the message states that the client was "aborted". This does not seem accurate, since
clients are not aborted due to SQL errors when --continue-on-error is enabled.

I think the error message should be emitted using commandError() when both
--continue-on-error and --verbose-errors are specified, like this;
```
            case PGRES_NONFATAL_ERROR:
            case PGRES_FATAL_ERROR:
                st->estatus = getSQLErrorStatus(PQresultErrorField(res,
                                                                   PG_DIAG_SQLSTATE));
                if (continue_on_error || canRetryError(st->estatus))
                {
                    if (verbose_errors)
                        commandError(st, PQerrorMessage(st->con));
                    goto error;
                }
                /* fall through */
```

In addition, the error message in the "default" case should be shown regardless
of --verbose-errors, since it represents an unexpected situation and should
always be reported.

Finally, I believe this fix should be included in patch 0001 rather than 0003,
as it would be a part of the implementation of --continue-on-error.

As of 0003:
```
+                   {
+                       pg_log_error("client %d aborted while executing SQL commands", st->id);
                        st->state = CSTATE_ABORTED;
+                   }
                    break;
```

I understand that the patch is not directly related to --continue-on-error, similar to 0002,
and that it aims to improve the error message to indicate that the client was aborted due to
some error during readCommandResponse().

However, this message doesn't seem entirely accurate, since the error is not always caused
by an SQL command failure itself. For example, it could also be due to a failure of the \gset
meta-command.

In addition, this fix causes error messages to be emitted twice. For example, if \gset fails,
the following similar messages are printed:
```
pgbench: error: client 0 script 0 command 0 query 0: expected one row, got 0
pgbench: error: client 0 aborted while executing SQL commands
```

Even worse, if an unexpected error occurs in readCommandResponse() (i.e., the default case),
the following messages are emitted, both indicating that the client was aborted;
```
pgbench: error: client 0 script 0 aborted in command ... query ...
pgbench: error: client 0 aborted while executing SQL commands
```

I feel this is a bit redundant.
Therefore, if we are to improve these messages to indicate explicitly that the client
was aborted, I would suggest modifying the error messages in readCommandResponse() rather
than adding a new one in advanceConnectionState().

I've attached patch 0003 incorporating my suggestion. What do you think?
Thank you very much for the updated patch!
I reviewed 0003 and it looks great - the error messages have become easier to understand.
I noticed one small thing I’d like to discuss. I'm not sure that users can clearly
tell which one was aborted in the following error message, the client or the script.
pgbench: error: client 0 script 0 aborted in command ... query ...
Since the code path always results in a client abort, I wonder if the following
message might be clearer:
pgbench: error: client 0 aborted in script 0 command ... query ...
Regards,
Rintaro Ikeda
Hi,
On Sun, 13 Jul 2025 23:15:24 +0900
Rintaro Ikeda <ikedarintarof@oss.nttdata.com> wrote:
I noticed one small thing I’d like to discuss. I'm not sure that users can clearly
tell which one was aborted in the following error message, the client or the script.

pgbench: error: client 0 script 0 aborted in command ... query ...

Since the code path always results in a client abort, I wonder if the following
message might be clearer:

pgbench: error: client 0 aborted in script 0 command ... query ...
Indeed, it seems clearer to explicitly state that it is the client that
was aborted.
I've attached an updated patch that replaces the remaining message mentioned
above with a call to commandFailed(). With this change, the output in such
situations will now be:
"client 0 aborted in command 0 (SQL) of script 0; ...."
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
v8-0003-Improve-error-messages-for-errors-that-cause-clie.patchtext/x-diff; name=v8-0003-Improve-error-messages-for-errors-that-cause-clie.patchDownload
From 19c5ee6c077091eaf99e133b26a3e822a39f3964 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Thu, 10 Jul 2025 17:21:05 +0900
Subject: [PATCH v8 3/3] Improve error messages for errors that cause client
abortion
This commit modifies relevant error messages to explicitly indicate that the
client was aborted. As part of this change, pg_log_error was replaced with
commandFailed().
---
src/bin/pgbench/pgbench.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 7dbeb79ca8d..41a7c19fff5 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3309,8 +3309,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_EMPTY_QUERY: /* may be used for testing no-op overhead */
if (is_last && meta == META_GSET)
{
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, 0);
+ commandFailed(st, "gset", psprintf("expected one row, got %d", 0));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3324,8 +3323,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
if (meta == META_GSET && ntuples != 1)
{
/* under \gset, report the error */
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, PQntuples(res));
+ commandFailed(st, "gset", psprintf("expected one row, got %d", PQntuples(res)));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3339,18 +3337,18 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
for (int fld = 0; fld < PQnfields(res); fld++)
{
char *varname = PQfname(res, fld);
+ char *cmd = (meta == META_ASET ? "aset" : "gset");
/* allocate varname only if necessary, freed below */
if (*varprefix != '\0')
varname = psprintf("%s%s", varprefix, varname);
/* store last row result as a string */
- if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
+ if (!putVariable(&st->variables, cmd, varname,
PQgetvalue(res, ntuples - 1, fld)))
{
/* internal error */
- pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
- st->id, st->use_file, st->command, qrynum, varname);
+ commandFailed(st, cmd, psprintf("error storing into variable %s", varname));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
--
2.43.0
v8-0002-Rename-a-confusing-enumerator.patchtext/x-diff; name=v8-0002-Rename-a-confusing-enumerator.patchDownload
From 8a1583068ee4737ba82664a359638902c93e56a3 Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <ikedarintarof@oss.nttdata.com>
Date: Wed, 9 Jul 2025 23:50:36 +0900
Subject: [PATCH v8 2/3] Rename a confusing enumerator
Rename the confusing enumerator, which may be mistakenly assumed to be related to
ESTATUS_OTHER_SQL_ERROR.
---
src/bin/pgbench/pgbench.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index edd8b01f794..7dbeb79ca8d 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -485,7 +485,7 @@ typedef enum TStatus
TSTATUS_IDLE,
TSTATUS_IN_BLOCK,
TSTATUS_CONN_ERROR,
- TSTATUS_OTHER_ERROR,
+ TSTATUS_UNKNOWN_ERROR,
} TStatus;
/* Various random sequences are initialized from this one. */
@@ -3577,12 +3577,12 @@ getTransactionStatus(PGconn *con)
* not. Internal error which should never occur.
*/
pg_log_error("unexpected transaction status %d", tx_status);
- return TSTATUS_OTHER_ERROR;
+ return TSTATUS_UNKNOWN_ERROR;
}
/* not reached */
Assert(false);
- return TSTATUS_OTHER_ERROR;
+ return TSTATUS_UNKNOWN_ERROR;
}
/*
--
2.43.0
v8-0001-Add-continue-on-error-option.patchtext/x-diff; name=v8-0001-Add-continue-on-error-option.patchDownload
From a5d4081648105990d1ce9085ea9ffe23f09e01f9 Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <ikedarintarof@oss.nttdata.com>
Date: Wed, 9 Jul 2025 23:36:37 +0900
Subject: [PATCH v8 1/3] Add --continue-on-error option
When the option is set, the client rolls back the failed transaction and starts a
new one when its transaction fails for a reason other than a deadlock or
serialization failure.
---
doc/src/sgml/ref/pgbench.sgml | 71 +++++++++++++++-----
src/bin/pgbench/pgbench.c | 57 +++++++++++++---
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 ++++++
3 files changed, 125 insertions(+), 25 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..15fcb45e223 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
and number of transactions per client); these will be equal unless the run
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
- The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ The next line reports the number of failed transactions (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -790,6 +789,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -914,6 +916,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Allows clients to continue their run even if an SQL statement fails due to
+ errors other than serialization or deadlock. Unlike serialization and deadlock
+ failures, clients do not retry the same transactions but start a new transaction.
+ This option is useful when your custom script may raise errors for some
+ reason, such as a unique constraint violation. Without this option, the client is
+ aborted after such errors.
+ </para>
+ <para>
+ Note that serialization and deadlock failures never cause the client to be
+ aborted, even after a transaction has been retried <option>--max-tries</option>
+ times, so they are not affected by this option.
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
@@ -2409,8 +2431,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2638,6 +2660,16 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got an SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2646,8 +2678,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
</screen>
</para>
@@ -2839,9 +2871,11 @@ statement latencies in milliseconds, failures and retries:
<option>--exit-on-abort</option> is specified. Otherwise in the worst
case they only lead to the abortion of the failed client while other
clients continue their run (but some client errors are handled without
- an abortion of the client and reported separately, see below). Later in
- this section it is assumed that the discussed errors are only the
- direct client errors and they are not internal
+ an abortion of the client and reported separately, see below). When
+ <option>--continue-on-error</option> is specified, the client
+ continues to process new transactions even if it encounters an error.
+ Later in this section it is assumed that the discussed errors are only
+ the direct client errors and they are not internal
<application>pgbench</application> errors.
</para>
</listitem>
@@ -2851,14 +2885,17 @@ statement latencies in milliseconds, failures and retries:
<para>
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
- without completing the last transaction. In addition, if execution of an SQL
+ without completing the last transaction. By default, if execution of an SQL
or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
- transaction is rolled back, which also includes setting the client variables
- as they were before the run of this transaction (it is assumed that one
- transaction script contains only one transaction; see
- <xref linkend="transactions-and-scripts"/> for more information).
+ the client is aborted. However, if the <option>--continue-on-error</option> option is specified,
+ the client does not abort and proceeds to the next transaction regardless of
+ the error. These cases are reported as "other failures" in the output.
+ In contrast, if an SQL command fails with serialization or deadlock errors, the
+ client is not aborted even without <option>--continue-on-error</option>.
+ Instead, the current transaction is rolled back, which also includes setting
+ the client variables as they were before the run of this transaction
+ (it is assumed that one transaction script contains only one transaction;
+ see <xref linkend="transactions-and-scripts"/> for more information).
Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum
number of tries (specified by the <option>--max-tries</option> option) / the maximum
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 497a936c141..edd8b01f794 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,15 +402,23 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried transactions.
- * It can be one of two types:
+ * A failed transaction is counted differently depending on whether
+ * the --continue-on-error option is specified.
*
+ * Without --continue-on-error:
* failed (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
* successfully retried) +
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried).
*
+ * When --continue-on-error is specified:
+ *
+ * failed (number of failed transactions) =
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got some other SQL error; the transaction was
+ * not retried and counted as failed due to --continue-on-error).
+ *
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
*
@@ -440,6 +448,11 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure, which
+ * is counted if --continue-on-error is
+ * specified */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +783,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +968,7 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --continue-on-error continue running after an SQL error\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1482,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1532,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3356,7 +3375,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_FATAL_ERROR:
st->estatus = getSQLErrorStatus(PQresultErrorField(res,
PG_DIAG_SQLSTATE));
- if (canRetryError(st->estatus))
+ if (continue_on_error || canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st, PQerrorMessage(st->con));
@@ -4007,7 +4026,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
+ canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4528,7 +4548,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4548,6 +4569,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4603,6 +4626,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4643,10 +4667,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6285,6 +6311,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6427,7 +6454,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+ * to errors other than serialization or deadlock errors and
+ * --continue-on-error is not set.
*/
if (total_cnt <= 0)
return;
@@ -6443,6 +6471,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6546,6 +6577,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6705,6 +6740,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7058,6 +7094,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7413,6 +7453,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7dd78940300..8bb35dda5f7 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1813,6 +1813,28 @@ update counter set i = i+1 returning i \gset
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique);' . 'INSERT INTO unique_table VALUES (0);');
+
+$node->pgbench(
+ '-t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 0/10\b},
+ qr{other failures: 10\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '002_continue_on_error' => q{
insert into unique_table values (0);
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.43.0
Hi,
On 2025/07/15 11:16, Yugo Nagata wrote:
I noticed one small thing I’d like to discuss. I'm not sure that users can clearly
tell which one was aborted in the following error message, the client or the script.

pgbench: error: client 0 script 0 aborted in command ... query ...

Since the code path always results in a client abort, I wonder if the following
message might be clearer:

pgbench: error: client 0 aborted in script 0 command ... query ...

Indeed, it seems clearer to explicitly state that it is the client that
was aborted.

I've attached an updated patch that replaces the remaining message mentioned
above with a call to commandFailed(). With this change, the output in such
situations will now be:

"client 0 aborted in command 0 (SQL) of script 0; ...."
Thank you for updating the patch!
When I executed a custom script that may raise a unique constraint violation, I
got the following output:
pgbench: error: client 0 script 0 aborted in command 1 query 0: ERROR:
duplicate key value violates unique constraint "test_col2_key"
I think we should also change the error message in pg_log_error. I modified the
patch v8-0003 as follows:
```
@@ -3383,8 +3383,8 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
             default:
                 /* anything else is unexpected */
-                pg_log_error("client %d script %d aborted in command %d query %d: %s",
-                             st->id, st->use_file, st->command, qrynum,
+                pg_log_error("client %d aborted in command %d query %d of script %d: %s",
+                             st->id, st->command, qrynum, st->use_file,
                              PQerrorMessage(st->con));
                 goto error;
         }
```
With this change, the output now is like this:
pgbench: error: client 0 aborted in command 1 query 0 of script 0: ERROR:
duplicate key value violates unique constraint "test_col2_key"
I want to hear your thoughts.
Also, let me ask one question. In this case, I directly modified your commit in
the v8-0003 patch. Is that the right way to update the patch?
Regards,
Rintaro Ikeda
Attachments:
v9-0001-Add-continue-on-error-option.patchtext/plain; charset=UTF-8; name=v9-0001-Add-continue-on-error-option.patchDownload
From 202e24cfad77763bf4da2f3023845223adb60e2c Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <ikedarintarof@oss.nttdata.com>
Date: Wed, 9 Jul 2025 23:36:37 +0900
Subject: [PATCH v9 1/3] Add --continue-on-error option
When the option is set, the client rolls back the failed transaction and starts a
new one when its transaction fails for a reason other than a deadlock or
serialization failure.
---
doc/src/sgml/ref/pgbench.sgml | 71 +++++++++++++++-----
src/bin/pgbench/pgbench.c | 57 +++++++++++++---
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 ++++++
3 files changed, 125 insertions(+), 25 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..15fcb45e223 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
and number of transactions per client); these will be equal unless the run
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
- The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ The next line reports the number of failed transactions (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -790,6 +789,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -914,6 +916,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Allows clients to continue their run even if an SQL statement fails due to
+ errors other than serialization or deadlock. Unlike serialization and deadlock
+ failures, clients do not retry the same transactions but start a new transaction.
+ This option is useful when your custom script may raise errors for some
+ reason, such as a unique constraint violation. Without this option, the client is
+ aborted after such errors.
+ </para>
+ <para>
+ Note that serialization and deadlock failures never cause the client to be
+ aborted, even after a transaction has been retried <option>--max-tries</option>
+ times, so they are not affected by this option.
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
@@ -2409,8 +2431,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2638,6 +2660,16 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got an SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2646,8 +2678,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
</screen>
</para>
@@ -2839,9 +2871,11 @@ statement latencies in milliseconds, failures and retries:
<option>--exit-on-abort</option> is specified. Otherwise in the worst
case they only lead to the abortion of the failed client while other
clients continue their run (but some client errors are handled without
- an abortion of the client and reported separately, see below). Later in
- this section it is assumed that the discussed errors are only the
- direct client errors and they are not internal
+ an abortion of the client and reported separately, see below). When
+ <option>--continue-on-error</option> is specified, the client
+ continues to process new transactions even if it encounters an error.
+ Later in this section it is assumed that the discussed errors are only
+ the direct client errors and they are not internal
<application>pgbench</application> errors.
</para>
</listitem>
@@ -2851,14 +2885,17 @@ statement latencies in milliseconds, failures and retries:
<para>
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
- without completing the last transaction. In addition, if execution of an SQL
+ without completing the last transaction. By default, if execution of an SQL
or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
- transaction is rolled back, which also includes setting the client variables
- as they were before the run of this transaction (it is assumed that one
- transaction script contains only one transaction; see
- <xref linkend="transactions-and-scripts"/> for more information).
+ the client is aborted. However, if the <option>--continue-on-error</option> option is specified,
+ the client does not abort and proceeds to the next transaction regardless of
+ the error. These cases are reported as "other failures" in the output.
+ In contrast, if an SQL command fails with serialization or deadlock errors, the
+ client is not aborted even without <option>--continue-on-error</option>.
+ Instead, the current transaction is rolled back, which also includes setting
+ the client variables as they were before the run of this transaction
+ (it is assumed that one transaction script contains only one transaction;
+ see <xref linkend="transactions-and-scripts"/> for more information).
Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum
number of tries (specified by the <option>--max-tries</option> option) / the maximum
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 497a936c141..edd8b01f794 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,15 +402,23 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried transactions.
- * It can be one of two types:
+ * A failed transaction is counted differently depending on whether
+ * the --continue-on-error option is specified.
*
+ * Without --continue-on-error:
* failed (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
* successfully retried) +
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried).
*
+ * When --continue-on-error is specified:
+ *
+ * failed (number of failed transactions) =
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got some other SQL error; the transaction was
+ * not retried and counted as failed due to --continue-on-error).
+ *
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
*
@@ -440,6 +448,11 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure, which
+ * is counted if --continue-on-error is
+ * specified */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +783,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +968,7 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --continue-on-error continue running after an SQL error\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1482,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1532,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3356,7 +3375,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_FATAL_ERROR:
st->estatus = getSQLErrorStatus(PQresultErrorField(res,
PG_DIAG_SQLSTATE));
- if (canRetryError(st->estatus))
+ if (continue_on_error || canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st, PQerrorMessage(st->con));
@@ -4007,7 +4026,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
+ canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4528,7 +4548,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4548,6 +4569,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4603,6 +4626,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4643,10 +4667,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6285,6 +6311,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6427,7 +6454,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+ * to errors other than serialization or deadlock errors and
+ * --continue-on-error is not set.
*/
if (total_cnt <= 0)
return;
@@ -6443,6 +6471,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6546,6 +6577,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6705,6 +6740,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7058,6 +7094,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7413,6 +7453,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7dd78940300..8bb35dda5f7 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1813,6 +1813,28 @@ update counter set i = i+1 returning i \gset
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique);' . 'INSERT INTO unique_table VALUES (0);');
+
+$node->pgbench(
+ '-t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 0/10\b},
+ qr{other failures: 10\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '002_continue_on_error' => q{
insert into unique_table values (0);
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.39.5 (Apple Git-154)
v9-0002-Rename-a-confusing-enumerator.patchtext/plain; charset=UTF-8; name=v9-0002-Rename-a-confusing-enumerator.patchDownload
From e92761bfffc97117b732d589a786f8ee4d9e29a7 Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <ikedarintarof@oss.nttdata.com>
Date: Wed, 9 Jul 2025 23:50:36 +0900
Subject: [PATCH v9 2/3] Rename a confusing enumerator
Rename the confusing enumerator, which may be mistakenly assumed to be related to
ESTATUS_OTHER_SQL_ERROR.
---
src/bin/pgbench/pgbench.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index edd8b01f794..7dbeb79ca8d 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -485,7 +485,7 @@ typedef enum TStatus
TSTATUS_IDLE,
TSTATUS_IN_BLOCK,
TSTATUS_CONN_ERROR,
- TSTATUS_OTHER_ERROR,
+ TSTATUS_UNKNOWN_ERROR,
} TStatus;
/* Various random sequences are initialized from this one. */
@@ -3577,12 +3577,12 @@ getTransactionStatus(PGconn *con)
* not. Internal error which should never occur.
*/
pg_log_error("unexpected transaction status %d", tx_status);
- return TSTATUS_OTHER_ERROR;
+ return TSTATUS_UNKNOWN_ERROR;
}
/* not reached */
Assert(false);
- return TSTATUS_OTHER_ERROR;
+ return TSTATUS_UNKNOWN_ERROR;
}
/*
--
2.39.5 (Apple Git-154)
v9-0003-Improve-error-messages-for-errors-that-cause-clie.patchtext/plain; charset=UTF-8; name=v9-0003-Improve-error-messages-for-errors-that-cause-clie.patchDownload
From e83f635c5dbafa6d7c2de6c2b9a111b4fd906e55 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Thu, 10 Jul 2025 17:21:05 +0900
Subject: [PATCH v9 3/3] Improve error messages for errors that cause client
abortion
This commit modifies relevant error messages to explicitly indicate that the
client was aborted. As part of this change, pg_log_error was replaced with
commandFailed().
---
src/bin/pgbench/pgbench.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 7dbeb79ca8d..3e855c1b0aa 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3309,8 +3309,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_EMPTY_QUERY: /* may be used for testing no-op overhead */
if (is_last && meta == META_GSET)
{
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, 0);
+ commandFailed(st, "gset", psprintf("expected one row, got %d", 0));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3324,8 +3323,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
if (meta == META_GSET && ntuples != 1)
{
/* under \gset, report the error */
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, PQntuples(res));
+ commandFailed(st, "gset", psprintf("expected one row, got %d", PQntuples(res)));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3339,18 +3337,18 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
for (int fld = 0; fld < PQnfields(res); fld++)
{
char *varname = PQfname(res, fld);
+ char *cmd = (meta == META_ASET ? "aset" : "gset");
/* allocate varname only if necessary, freed below */
if (*varprefix != '\0')
varname = psprintf("%s%s", varprefix, varname);
/* store last row result as a string */
- if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
+ if (!putVariable(&st->variables, cmd, varname,
PQgetvalue(res, ntuples - 1, fld)))
{
/* internal error */
- pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
- st->id, st->use_file, st->command, qrynum, varname);
+ commandFailed(st, cmd, psprintf("error storing into variable %s", varname));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3385,8 +3383,8 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
default:
/* anything else is unexpected */
- pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum,
+ pg_log_error("client %d aborted in command %d query %d of script %d: %s",
+ st->id, st->command, qrynum, st->use_file,
PQerrorMessage(st->con));
goto error;
}
--
2.39.5 (Apple Git-154)
On Wed, 16 Jul 2025 21:35:01 +0900
Rintaro Ikeda <ikedarintarof@oss.nttdata.com> wrote:
Hi,
On 2025/07/15 11:16, Yugo Nagata wrote:
I noticed one small thing I’d like to discuss. I'm not sure that users can clearly
tell which one was aborted in the following error message, the client or the script.

pgbench: error: client 0 script 0 aborted in command ... query ...

Since the code path always results in a client abort, I wonder if the following
message might be clearer:

pgbench: error: client 0 aborted in script 0 command ... query ...

Indeed, it seems clearer to explicitly state that it is the client that
was aborted.

I've attached an updated patch that replaces the remaining message mentioned
above with a call to commandFailed(). With this change, the output in such
situations will now be:

"client 0 aborted in command 0 (SQL) of script 0; ...."
Thank you for updating the patch!
When I executed a custom script that may raise a unique constraint violation, I
got the following output:

pgbench: error: client 0 script 0 aborted in command 1 query 0: ERROR:
duplicate key value violates unique constraint "test_col2_key"
I'm sorry. I must have failed to attach the correct patch in my previous post.
As a result, patch v8 was actually the same as v7, and the message in question
was not modified as intended.
I think we should also change the error message in pg_log_error. I modified the
patch v8-0003 as follows:
```
@@ -3383,8 +3383,8 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
             default:
                 /* anything else is unexpected */
-                pg_log_error("client %d script %d aborted in command %d query %d: %s",
-                             st->id, st->use_file, st->command, qrynum,
+                pg_log_error("client %d aborted in command %d query %d of script %d: %s",
+                             st->id, st->command, qrynum, st->use_file,
                              PQerrorMessage(st->con));
                 goto error;
         }
```

With this change, the output now is like this:
pgbench: error: client 0 aborted in command 1 query 0 of script 0: ERROR:
duplicate key value violates unique constraint "test_col2_key"
I want to hear your thoughts.
My idea is to modify this as follows;
```
             default:
                 /* anything else is unexpected */
-                pg_log_error("client %d script %d aborted in command %d query %d: %s",
-                             st->id, st->use_file, st->command, qrynum,
-                             PQerrorMessage(st->con));
+                commandFailed(st, "SQL", PQerrorMessage(st->con));
                 goto error;
         }
```
This fix was originally planned to be included in patch v8, but was missed.
It is now included in the attached patch, v10.
With this change, the output becomes:
pgbench: error: client 0 aborted in command 0 (SQL) of script 0;
ERROR: duplicate key value violates unique constraint "t2_pkey"
Although there is a slight difference, the message is essentially the same as
your proposal. Also, I believe the use of commandFailed() makes the code simpler
and more consistent.
What do you think?
Also, let me ask one question. In this case, I directly modified your commit in
the v8-0003 patch. Is that the right way to update the patch?
I’m not sure if that’s the best way, but I think modifying the patch directly is a
valid way to propose an alternative approach during discussion, as long as the original
patch is respected. It can often help clarify suggestions.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
v10-0003-Improve-error-messages-for-errors-that-cause-cli.patch (text/x-diff)
From 9b45e1a0d5a2efd9443002bd84e0f3b93e6a4332 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Thu, 10 Jul 2025 17:21:05 +0900
Subject: [PATCH v10 3/3] Improve error messages for errors that cause client
abortion
This commit modifies relevant error messages to explicitly indicate that the
client was aborted. As part of this change, pg_log_error was replaced with
commandFailed().
---
src/bin/pgbench/pgbench.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 7dbeb79ca8d..4124c7b341c 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3309,8 +3309,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_EMPTY_QUERY: /* may be used for testing no-op overhead */
if (is_last && meta == META_GSET)
{
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, 0);
+ commandFailed(st, "gset", psprintf("expected one row, got %d", 0));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3324,8 +3323,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
if (meta == META_GSET && ntuples != 1)
{
/* under \gset, report the error */
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, PQntuples(res));
+ commandFailed(st, "gset", psprintf("expected one row, got %d", PQntuples(res)));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3339,18 +3337,18 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
for (int fld = 0; fld < PQnfields(res); fld++)
{
char *varname = PQfname(res, fld);
+ char *cmd = (meta == META_ASET ? "aset" : "gset");
/* allocate varname only if necessary, freed below */
if (*varprefix != '\0')
varname = psprintf("%s%s", varprefix, varname);
/* store last row result as a string */
- if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
+ if (!putVariable(&st->variables, cmd, varname,
PQgetvalue(res, ntuples - 1, fld)))
{
/* internal error */
- pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
- st->id, st->use_file, st->command, qrynum, varname);
+ commandFailed(st, cmd, psprintf("error storing into variable %s", varname));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3385,9 +3383,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
default:
/* anything else is unexpected */
- pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum,
- PQerrorMessage(st->con));
+ commandFailed(st, "SQL", PQerrorMessage(st->con));
goto error;
}
--
2.43.0
v10-0002-Rename-a-confusing-enumerator.patch (text/x-diff)
From 54ae59a76bd9b465f546c02ac248df14d82aa36c Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <ikedarintarof@oss.nttdata.com>
Date: Wed, 9 Jul 2025 23:50:36 +0900
Subject: [PATCH v10 2/3] Rename a confusing enumerator
Rename the confusing enumerator which may be mistakenly assumed to be related to
other_sql_errors
---
src/bin/pgbench/pgbench.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index edd8b01f794..7dbeb79ca8d 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -485,7 +485,7 @@ typedef enum TStatus
TSTATUS_IDLE,
TSTATUS_IN_BLOCK,
TSTATUS_CONN_ERROR,
- TSTATUS_OTHER_ERROR,
+ TSTATUS_UNKNOWN_ERROR,
} TStatus;
/* Various random sequences are initialized from this one. */
@@ -3577,12 +3577,12 @@ getTransactionStatus(PGconn *con)
* not. Internal error which should never occur.
*/
pg_log_error("unexpected transaction status %d", tx_status);
- return TSTATUS_OTHER_ERROR;
+ return TSTATUS_UNKNOWN_ERROR;
}
/* not reached */
Assert(false);
- return TSTATUS_OTHER_ERROR;
+ return TSTATUS_UNKNOWN_ERROR;
}
/*
--
2.43.0
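As background for the later discussion of this rename: TStatus is derived from libpq's PQtransactionStatus(). A condensed sketch of that mapping, simplified from getTransactionStatus() in pgbench.c (error logging trimmed, so details differ):
```
#include <libpq-fe.h>

typedef enum TStatus
{
	TSTATUS_IDLE,
	TSTATUS_IN_BLOCK,
	TSTATUS_CONN_ERROR,
	TSTATUS_UNKNOWN_ERROR,
} TStatus;

/* Condensed sketch of the mapping behind TSTATUS_*; simplified, not the
 * verbatim pgbench.c function. */
TStatus
getTransactionStatusSketch(PGconn *con)
{
	switch (PQtransactionStatus(con))
	{
		case PQTRANS_IDLE:
			return TSTATUS_IDLE;
		case PQTRANS_INTRANS:
		case PQTRANS_INERROR:
			return TSTATUS_IN_BLOCK;
		case PQTRANS_UNKNOWN:
			/* in pgbench this usually indicates a bad connection */
			return TSTATUS_CONN_ERROR;
		default:
			/* anything else is an internal error */
			return TSTATUS_UNKNOWN_ERROR;
	}
}
```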
v10-0001-Add-continue-on-error-option.patch (text/x-diff)
From 7d948731260679b7dde6861a7176a0cf8cb2b8b9 Mon Sep 17 00:00:00 2001
From: Rintaro Ikeda <ikedarintarof@oss.nttdata.com>
Date: Wed, 9 Jul 2025 23:36:37 +0900
Subject: [PATCH v10 1/3] Add --continue-on-error option
When the option is set, the client rolls back the failed transaction and starts
a new one when its transaction fails for reasons other than deadlock and
serialization failures.
---
doc/src/sgml/ref/pgbench.sgml | 71 +++++++++++++++-----
src/bin/pgbench/pgbench.c | 57 +++++++++++++---
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 ++++++
3 files changed, 125 insertions(+), 25 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..15fcb45e223 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
and number of transactions per client); these will be equal unless the run
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
- The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ The next line reports the number of failed transactions (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -790,6 +789,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -914,6 +916,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Allows clients to continue their run even if an SQL statement fails due to
+ errors other than serialization or deadlock. Unlike serialization and deadlock
+ failures, clients do not retry the same transactions but start a new transaction.
+ This option is useful when your custom script may raise errors for some
+ reason, such as a unique constraint violation. Without this option, the client is
+ aborted after such errors.
+ </para>
+ <para>
+ Note that serialization and deadlock failures never cause the client to be
+ aborted, even after the client has retried <option>--max-tries</option> times,
+ so they are not affected by this option.
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
@@ -2409,8 +2431,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2638,6 +2660,16 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got an SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2646,8 +2678,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
</screen>
</para>
@@ -2839,9 +2871,11 @@ statement latencies in milliseconds, failures and retries:
<option>--exit-on-abort</option> is specified. Otherwise in the worst
case they only lead to the abortion of the failed client while other
clients continue their run (but some client errors are handled without
- an abortion of the client and reported separately, see below). Later in
- this section it is assumed that the discussed errors are only the
- direct client errors and they are not internal
+ an abortion of the client and reported separately, see below). When
+ <option>--continue-on-error</option> is specified, the client
+ continues to process new transactions even if it encounters an error.
+ Later in this section it is assumed that the discussed errors are only
+ the direct client errors and they are not internal
<application>pgbench</application> errors.
</para>
</listitem>
@@ -2851,14 +2885,17 @@ statement latencies in milliseconds, failures and retries:
<para>
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
- without completing the last transaction. In addition, if execution of an SQL
+ without completing the last transaction. By default, if execution of an SQL
or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
- transaction is rolled back, which also includes setting the client variables
- as they were before the run of this transaction (it is assumed that one
- transaction script contains only one transaction; see
- <xref linkend="transactions-and-scripts"/> for more information).
+ the client is aborted. However, if the --continue-on-error option is specified,
+ the client does not abort and proceeds to the next transaction regardless of
+ the error. These cases are reported as "other failures" in the output.
+ In contrast, if an SQL command fails with serialization or deadlock errors, the
+ client is not aborted even without <option>--continue-on-error</option>.
+ Instead, the current transaction is rolled back, which also includes setting
+ the client variables as they were before the run of this transaction
+ (it is assumed that one transaction script contains only one transaction;
+ see <xref linkend="transactions-and-scripts"/> for more information).
Transactions with serialization or deadlock errors are repeated after
rollbacks until they complete successfully or reach the maximum
number of tries (specified by the <option>--max-tries</option> option) / the maximum
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 497a936c141..edd8b01f794 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,15 +402,23 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried transactions.
- * It can be one of two types:
+ * A failed transaction is counted differently depending on whether
+ * the --continue-on-error option is specified.
*
+ * Without --continue-on-error:
* failed (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
* successfully retried) +
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried).
*
+ * When --continue-on-error is specified:
+ *
+ * failed (number of failed transactions) =
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got some other SQL error; the transaction was
+ * not retried and counted as failed due to --continue-on-error).
+ *
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
*
@@ -440,6 +448,11 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure, which
+ * is counted if --continue-on-error is
+ * specified */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +783,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +968,7 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --continue-on-error continue running after an SQL error\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1482,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1532,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3356,7 +3375,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_FATAL_ERROR:
st->estatus = getSQLErrorStatus(PQresultErrorField(res,
PG_DIAG_SQLSTATE));
- if (canRetryError(st->estatus))
+ if (continue_on_error || canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st, PQerrorMessage(st->con));
@@ -4007,7 +4026,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
+ canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4528,7 +4548,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4548,6 +4569,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4603,6 +4626,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4643,10 +4667,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6285,6 +6311,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6427,7 +6454,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+ * to errors other than serialization or deadlock errors and
+ * --continue-on-error is not set.
*/
if (total_cnt <= 0)
return;
@@ -6443,6 +6471,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6546,6 +6577,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6705,6 +6740,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7058,6 +7094,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7413,6 +7453,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7dd78940300..8bb35dda5f7 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1813,6 +1813,28 @@ update counter set i = i+1 returning i \gset
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique);' . 'INSERT INTO unique_table VALUES (0);');
+
+$node->pgbench(
+ '-t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 0/10\b},
+ qr{other failures: 10\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '002_continue_on_error' => q{
+ insert into unique_table values 0;
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.43.0
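To make the core behavioral change of 0001 easy to see at a glance: after an SQL error the client now moves to CSTATE_ERROR (roll back, then start a new transaction) rather than CSTATE_ABORTED whenever --continue-on-error is set. A reduced, self-contained sketch of that decision, with types simplified from pgbench.c:
```
#include <stdbool.h>
#include <stdio.h>

typedef enum
{
	ESTATUS_SERIALIZATION_ERROR,
	ESTATUS_DEADLOCK_ERROR,
	ESTATUS_OTHER_SQL_ERROR,
} EStatus;

typedef enum { CSTATE_ERROR, CSTATE_ABORTED } ClientState;

static bool continue_on_error = true;	/* set by --continue-on-error */

static bool
canRetryError(EStatus estatus)
{
	/* only serialization and deadlock failures are ever retried */
	return estatus == ESTATUS_SERIALIZATION_ERROR ||
		estatus == ESTATUS_DEADLOCK_ERROR;
}

static ClientState
nextStateAfterSQLError(EStatus estatus)
{
	/* with --continue-on-error, other SQL errors also roll back and start
	 * a new transaction instead of aborting the client */
	if ((estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
		canRetryError(estatus))
		return CSTATE_ERROR;
	return CSTATE_ABORTED;
}

int
main(void)
{
	printf("unique violation -> %s\n",
		   nextStateAfterSQLError(ESTATUS_OTHER_SQL_ERROR) == CSTATE_ERROR
		   ? "CSTATE_ERROR (new transaction)" : "CSTATE_ABORTED");
	return 0;
}
```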
Hi,
On 2025/07/16 22:49, Yugo Nagata wrote:
I think we should also change the error message in pg_log_error. I modified the
patch v8-0003 as follows:
@@ -3383,8 +3383,8 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
default:
/* anything else is unexpected */
- pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum,
+ pg_log_error("client %d aborted in command %d query %d of script %d: %s",
+ st->id, st->command, qrynum, st->use_file,
PQerrorMessage(st->con));
goto error;
}
With this change, the output now is like this:
pgbench: error: client 0 aborted in command 1 query 0 of script 0: ERROR:
duplicate key value violates unique constraint "test_col2_key"
I want to hear your thoughts.
My idea is to modify this as follows:
default:
/* anything else is unexpected */
- pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum,
- PQerrorMessage(st->con));
+ commandFailed(st, "SQL", PQerrorMessage(st->con));
goto error;
}
This fix was originally planned to be included in patch v8, but was missed.
It is now included in the attached patch, v10.
With this change, the output becomes:
pgbench: error: client 0 aborted in command 0 (SQL) of script 0;
ERROR: duplicate key value violates unique constraint "t2_pkey"Although there is a slight difference, the message is essentially the same as
your proposal. Also, I believe the use of commandFailed() makes the code simpler
and more consistent.What do you think?
Thank you for the new patch! I think Nagata-san's v10 patch is a clear
improvement over my v9 patch. I'm happy with the changes.
Also, let me ask one question. In this case, I directly modified your commit in
the v8-0003 patch. Is that the right way to update the patch?
I’m not sure if that’s the best way, but I think modifying the patch directly is a
valid way to propose an alternative approach during discussion, as long as the original
patch is respected. It can often help clarify suggestions.
I understand that. Thank you.
Regards,
Rintaro Ikeda
On Fri, 18 Jul 2025 17:07:53 +0900
Rintaro Ikeda <ikedarintarof@oss.nttdata.com> wrote:
Hi,
On 2025/07/16 22:49, Yugo Nagata wrote:
I think we should also change the error message in pg_log_error. I modified the
patch v8-0003 as follows:
@@ -3383,8 +3383,8 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
default:
/* anything else is unexpected */
- pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum,
+ pg_log_error("client %d aborted in command %d query %d of script %d: %s",
+ st->id, st->command, qrynum, st->use_file,
PQerrorMessage(st->con));
goto error;
}
With this change, the output now is like this:
pgbench: error: client 0 aborted in command 1 query 0 of script 0: ERROR:
duplicate key value violates unique constraint "test_col2_key"
I want to hear your thoughts.
My idea is to modify this as follows:
default:
/* anything else is unexpected */
- pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum,
- PQerrorMessage(st->con));
+ commandFailed(st, "SQL", PQerrorMessage(st->con));
goto error;
}
This fix was originally planned to be included in patch v8, but was missed.
It is now included in the attached patch, v10.
With this change, the output becomes:
pgbench: error: client 0 aborted in command 0 (SQL) of script 0;
ERROR: duplicate key value violates unique constraint "t2_pkey"Although there is a slight difference, the message is essentially the same as
your proposal. Also, I believe the use of commandFailed() makes the code simpler
and more consistent.What do you think?
Thank you for the new patch! I think Nagata-san's v10 patch is a clear
improvement over my v9 patch. I'm happy with the changes.
Thank you.
I believe the patches implement the expected behavior, include appropriate doc and test
modifications, and are in good shape overall, so if there are no objections,
I'll mark this as Ready-for-Committer.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Tue, 22 Jul 2025 17:49:49 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
On Fri, 18 Jul 2025 17:07:53 +0900
Rintaro Ikeda <ikedarintarof@oss.nttdata.com> wrote:
Hi,
On 2025/07/16 22:49, Yugo Nagata wrote:
I think we should also change the error message in pg_log_error. I modified the
patch v8-0003 as follows:
@@ -3383,8 +3383,8 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
default:
/* anything else is unexpected */
- pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum,
+ pg_log_error("client %d aborted in command %d query %d of script %d: %s",
+ st->id, st->command, qrynum, st->use_file,
PQerrorMessage(st->con));
goto error;
}
With this change, the output now is like this:
pgbench: error: client 0 aborted in command 1 query 0 of script 0: ERROR:
duplicate key value violates unique constraint "test_col2_key"
I want to hear your thoughts.
My idea is to modify this as follows:
default:
/* anything else is unexpected */
- pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum,
- PQerrorMessage(st->con));
+ commandFailed(st, "SQL", PQerrorMessage(st->con));
goto error;
}
This fix was originally planned to be included in patch v8, but was missed.
It is now included in the attached patch, v10.
With this change, the output becomes:
pgbench: error: client 0 aborted in command 0 (SQL) of script 0;
ERROR: duplicate key value violates unique constraint "t2_pkey"Although there is a slight difference, the message is essentially the same as
your proposal. Also, I believe the use of commandFailed() makes the code simpler
and more consistent.What do you think?
Thank you for the new patch! I think Nagata-san's v10 patch is a clear
improvement over my v9 patch. I'm happy with the changes.
Thank you.
I believe the patches implement the expected behavior, include appropriate doc and test
modifications, and are in good shape overall, so if there are no objections,
I'll mark this as Ready-for-Committer.
I've updated the CF status to Ready for Committer.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Thu, Jul 24, 2025 at 5:44 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
I believe the patches implement the expected behavior, include appropriate doc and test
modifications, and are in good shape overall, so if there are no objections,
I'll mark this as Ready-for-Committer.
I've updated the CF status to Ready for Committer.
Thanks for working on it! As Matthias, Dilip, Srinath, and many others
pointed out, it would be a very nice and helpful addition to pgbench.
I've just used it out of necessity and it worked as advertised for me,
and it even adds a cool-looking "XXX failed" when used with the -P
progress meter:
progress: 1.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 3854 failed
progress: 2.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 3796 failed
-J.
On Tue, Sep 16, 2025 at 5:34 PM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
On Thu, Jul 24, 2025 at 5:44 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
I believe the patches implement the expected behavior, include appropriate doc and test
modifications, and are in good shape overall, so if there are no objections,
I'll mark this as Ready-for-Committer.
I've updated the CF status to Ready for Committer.
Since this patch is marked as ready for committer, I've started reviewing it.
The patch basically looks good to me.
+ the client is aborted. However, if the --continue-on-error option is specified,
"--continue-on-error" should be enclosed in <option> tags.
+ without completing the last transaction. By default, if execution of an SQL
or meta command fails for reasons other than serialization or deadlock errors,
<snip>
+ the client is aborted. However, if the --continue-on-error option is specified,
+ the client does not abort and proceeds to the next transaction regardless of
+ the error. These cases are reported as "other failures" in the output.
This explanation can be read as if --continue-on-error allows the client to
proceed to the next transaction even when a meta command (not SQL) fails,
but that is not correct, right? If so, the description should be updated to
make it clear that only SQL errors are affected, while meta command failures
are not.
+$node->pgbench(
+ '-t 10 --continue-on-error --failures-detailed',
Isn't it better to also specify the -n option to skip the unnecessary VACUUM and
speed the test up?
+ 'test --continue-on-error',
+ {
+ '002_continue_on_error' => q{
Regarding the test file name, perhaps 001 would be a better prefix than 002,
since other tests in 001_pgbench_with_server.pl use 001 as the prefix.
+ insert into unique_table values 0;
This INSERT causes a syntax error. Was this intentional? If the intention was
to test unique constraint violations, it should instead be
INSERT INTO unique_table VALUES (0);.
To further improve the test, it might also be useful to mix successful and
failed transactions in the --continue-on-error case. For example,
the following change would result in one successful transaction and
nine failures:
-----------------------------
$node->safe_psql('postgres',
- 'CREATE TABLE unique_table(i int unique);' . 'INSERT INTO unique_table VALUES (0);');
+ 'CREATE TABLE unique_table(i int unique);');
$node->pgbench(
'-t 10 --continue-on-error --failures-detailed',
0,
[
- qr{processed: 0/10\b},
- qr{other failures: 10\b}
+ qr{processed: 1/10\b},
+ qr{other failures: 9\b}
-----------------------------
Regards,
--
Fujii Masao
On Thu, 18 Sep 2025 01:52:46 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
On Tue, Sep 16, 2025 at 5:34 PM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
On Thu, Jul 24, 2025 at 5:44 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
I believe the patches implement the expected behavior, include appropriate doc and test
modifications, and are in good shape overall, so if there are no objections,
I'll mark this as Ready-for-Committer.
I've updated the CF status to Ready for Committer.
Since this patch is marked as ready for committer, I've started reviewing it.
The patch basically looks good to me.
+ the client is aborted. However, if the --continue-on-error option is specified,
"--continue-on-error" should be enclosed in <option> tags.
+1
+ without completing the last transaction. By default, if execution of an SQL
or meta command fails for reasons other than serialization or deadlock errors,
<snip>
+ the client is aborted. However, if the --continue-on-error option is specified,
+ the client does not abort and proceeds to the next transaction regardless of
+ the error. These cases are reported as "other failures" in the output.
This explanation can be read as if --continue-on-error allows the client to
proceed to the next transaction even when a meta command (not SQL) fails,
but that is not correct, right? If so, the description should be updated to
make it clear that only SQL errors are affected, while meta command failures
are not.
That makes sense. How about rewriting this like:
However, if the --continue-on-error option is specified and the error occurs in
an SQL command, the client does not abort and proceeds to the next
transaction regardless of the error. These cases are reported as "other failures"
in the output. Note that if the error occurs in a meta-command, the client will
still abort even when this option is specified.
+$node->pgbench(
+ '-t 10 --continue-on-error --failures-detailed',
Isn't it better to also specify the -n option to skip the unnecessary VACUUM and
speed the test up?
+1
+ 'test --continue-on-error',
+ {
+ '002_continue_on_error' => q{
Regarding the test file name, perhaps 001 would be a better prefix than 002,
since other tests in 001_pgbench_with_server.pl use 001 as the prefix.
Right. This filename is shown in the “transaction type:” field of the results
when the test fails, so it should be aligned with the test file name.
+ insert into unique_table values 0;
This INSERT causes a syntax error. Was this intentional? If the intention was
to test unique constraint violations, it should instead be
INSERT INTO unique_table VALUES (0);.
This was clearly unintentional. I happened to overlook it during my review.
To further improve the test, it might also be useful to mix successful and
failed transactions in the --continue-on-error case. For example,
the following change would result in one successful transaction and
nine failures:
-----------------------------
$node->safe_psql('postgres',
- 'CREATE TABLE unique_table(i int unique);' . 'INSERT INTO unique_table VALUES (0);');
+ 'CREATE TABLE unique_table(i int unique);');
$node->pgbench(
'-t 10 --continue-on-error --failures-detailed',
0,
[
- qr{processed: 0/10\b},
- qr{other failures: 10\b}
+ qr{processed: 1/10\b},
+ qr{other failures: 9\b}
-----------------------------
+1
This makes the purpose of the test clearer.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Thu, Sep 18, 2025 at 10:22 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
That makes sense. How about rewriting this like:
However, if the --continue-on-error option is specified and the error occurs in
an SQL command, the client does not abort and proceeds to the next
transaction regardless of the error. These cases are reported as "other failures"
in the output. Note that if the error occurs in a meta-command, the client will
still abort even when this option is specified.
How about phrasing it like this, based on your version?
----------------------------
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
without completing the last transaction. The client also aborts
if a meta-command fails, or if an SQL command fails for reasons other than
serialization or deadlock errors when --continue-on-error is not specified.
With --continue-on-error, the client does not abort on such SQL errors
and instead proceeds to the next transaction. These cases are reported
as "other failures" in the output. If the error occurs in a meta-command,
however, the client still aborts even when this option is specified.
----------------------------
Regards,
--
Fujii Masao
On Thu, 18 Sep 2025 14:37:29 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Sep 18, 2025 at 10:22 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
That makes sense. How about rewriting this like:
However, if the --continue-on-error option is specified and the error occurs in
an SQL command, the client does not abort and proceeds to the next
transaction regardless of the error. These cases are reported as "other failures"
in the output. Note that if the error occurs in a meta-command, the client will
still abort even when this option is specified.
How about phrasing it like this, based on your version?
----------------------------
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
without completing the last transaction. The client also aborts
if a meta-command fails, or if an SQL command fails for reasons other than
serialization or deadlock errors when --continue-on-error is not specified.
With --continue-on-error, the client does not abort on such SQL errors
and instead proceeds to the next transaction. These cases are reported
as "other failures" in the output. If the error occurs in a meta-command,
however, the client still aborts even when this option is specified.
----------------------------
I'm fine with that. This version is clearer.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Thu, Sep 18, 2025 at 4:20 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
On Thu, 18 Sep 2025 14:37:29 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Sep 18, 2025 at 10:22 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
That makes sense. How about rewriting this like:
However, if the --continue-on-error option is specified and the error occurs in
an SQL command, the client does not abort and proceeds to the next
transaction regardless of the error. These cases are reported as "other failures"
in the output. Note that if the error occurs in a meta-command, the client will
still abort even when this option is specified.
How about phrasing it like this, based on your version?
----------------------------
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
without completing the last transaction. The client also aborts
if a meta-command fails, or if an SQL command fails for reasons other than
serialization or deadlock errors when --continue-on-error is not specified.
With --continue-on-error, the client does not abort on such SQL errors
and instead proceeds to the next transaction. These cases are reported
as "other failures" in the output. If the error occurs in a meta-command,
however, the client still aborts even when this option is specified.
----------------------------
I'm fine with that. This version is clearer.
Thanks for checking!
Also, I'd like to share the review comments for 0002 and 0003.
Regarding 0002:
- TSTATUS_OTHER_ERROR,
+ TSTATUS_UNKNOWN_ERROR,
You did this rename to avoid confusion with other_sql_errors.
I see the intention, but I'm not sure if this concern is really valid
and if the rename adds much value. Also, TSTATUS_UNKNOWN_ERROR
might be mistakenly assumed to be related to PQTRANS_UNKNOWN,
even though they aren't related...
But if we agree with this change, I think it should be folded into 0001,
since there's no strong reason to keep it separate.
Regarding 0003:
- pg_log_error("client %d script %d command %d query %d: expected one
row, got %d",
- st->id, st->use_file, st->command, qrynum, 0);
+ commandFailed(st, "gset", psprintf("expected one row, got %d", 0));
The change to use commandFailed() seems to remove
the "query %d" detail that the current pg_log_error() call reports.
Is it OK to lose that information?
Regards,
--
Fujii Masao
On Fri, Sep 19, 2025 at 11:43 AM Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Sep 18, 2025 at 4:20 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
On Thu, 18 Sep 2025 14:37:29 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Sep 18, 2025 at 10:22 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
That makes sense. How about rewriting this like:
However, if the --continue-on-error option is specified and the error occurs in
an SQL command, the client does not abort and proceeds to the next
transaction regardless of the error. These cases are reported as "other failures"
in the output. Note that if the error occurs in a meta-command, the client will
still abort even when this option is specified.
How about phrasing it like this, based on your version?
----------------------------
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
without completing the last transaction. The client also aborts
if a meta-command fails, or if an SQL command fails for reasons other than
serialization or deadlock errors when --continue-on-error is not specified.
With --continue-on-error, the client does not abort on such SQL errors
and instead proceeds to the next transaction. These cases are reported
as "other failures" in the output. If the error occurs in a meta-command,
however, the client still aborts even when this option is specified.
----------------------------
I'm fine with that. This version is clearer.
Thanks for checking!
I've updated the 0001 patch based on the comments.
The revised version is attached.
While testing, I found that running pgbench with --continue-on-error and
pipeline mode triggers the following assertion failure. Could this be
a bug in the patch?
---------------------------------------------------
$ cat pipeline.pgbench
\startpipeline
DO $$
BEGIN
PERFORM pg_sleep(3);
PERFORM pg_terminate_backend(pg_backend_pid());
END $$;
\endpipeline
$ pgbench -n --debug --verbose-errors -f pipeline.pgbench -c 2 -t 4 -M extended --continue-on-error
...
Assertion failed:
(sql_script[st->use_file].commands[st->command]->type == 1), function
commandError, file pgbench.c, line 3081.
Abort trap: 6
---------------------------------------------------
When I ran the same command without --continue-on-error,
the assertion failure did not occur.
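One plausible reading of this failure, inferred only from the v11 hunks and the reproducer above (an assumption, not a verified diagnosis): with --continue-on-error, the PGRES_FATAL_ERROR branch now reaches the --verbose-errors reporting call for non-retryable errors too, and in pipeline mode that happens while the current command is the \endpipeline meta-command, which commandError() appears to assert against. A reduced, self-contained sketch of that shape:
```
/*
 * Sketch of the *suspected* failure path; an assumption based on the
 * reproducer above, not a confirmed diagnosis.
 */
#include <assert.h>
#include <stdbool.h>

typedef enum { SQL_COMMAND, META_COMMAND } CommandType;

static bool continue_on_error = true;	/* --continue-on-error */
static bool verbose_errors = true;	/* --verbose-errors */

static void
reportVerboseError(CommandType current)
{
	/* stand-in for commandError(), which asserts an SQL command */
	assert(current == SQL_COMMAND);
}

static void
onFatalSQLError(CommandType current, bool retryable)
{
	if (continue_on_error || retryable)
	{
		if (verbose_errors)
			reportVerboseError(current);	/* fires for a meta-command */
	}
}

int
main(void)
{
	/* pipeline mode: the error is consumed while processing \endpipeline */
	onFatalSQLError(META_COMMAND, false);	/* aborts via assert, as reported */
	return 0;
}
```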
Regards,
--
Fujii Masao
Attachments:
v11-0001-Add-continue-on-error-option.patch (application/octet-stream)
From 85febac195673e375f4847815261f36adcc6b860 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Fri, 19 Sep 2025 16:54:49 +0900
Subject: [PATCH v11] Add --continue-on-error option
When the option is set, the client rolls back the failed transaction and starts
a new one when its transaction fails for reasons other than deadlock and
serialization failures.
---
doc/src/sgml/ref/pgbench.sgml | 64 ++++++++++++++++----
src/bin/pgbench/pgbench.c | 57 ++++++++++++++---
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 +++++++
3 files changed, 124 insertions(+), 19 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..828ce0d90cf 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
and number of transactions per client); these will be equal unless the run
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
- The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ The next line reports the number of failed transactions (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -790,6 +789,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -914,6 +916,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Allows clients to continue their run even if an SQL statement fails due to
+ errors other than serialization or deadlock. Unlike serialization and deadlock
+ failures, clients do not retry the same transactions but start a new transaction.
+ This option is useful when your custom script may raise errors for some
+ reason, such as a unique constraint violation. Without this option, the client is
+ aborted after such errors.
+ </para>
+ <para>
+ Note that serialization and deadlock failures never cause the client to be
+ aborted, even after the client has retried <option>--max-tries</option> times,
+ so they are not affected by this option.
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
@@ -2409,8 +2431,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2638,6 +2660,16 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got an SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2646,8 +2678,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
</screen>
</para>
@@ -2851,10 +2883,20 @@ statement latencies in milliseconds, failures and retries:
<para>
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
- without completing the last transaction. In addition, if execution of an SQL
- or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
+ without completing the last transaction. The client also aborts
+ if a meta command fails, or if an SQL command fails for reasons other than
+ serialization or deadlock errors when <option>--continue-on-error</option>
+ is not specified. With <option>--continue-on-error</option>,
+ the client does not abort on such SQL errors and instead proceeds to
+ the next transaction. These cases are reported as
+ <literal>other failures</literal> in the output. If the error occurs
+ in a meta command, however, the client still aborts even when this option
+ is specified.
+ </para>
+ <para>
+ If an SQL command fails due to serialization or deadlock errors, the
+ client is not aborted, regardless of whether
+ <option>--continue-on-error</option> is used. Instead, the current
transaction is rolled back, which also includes setting the client variables
as they were before the run of this transaction (it is assumed that one
transaction script contains only one transaction; see
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 3cafd88ac53..c6e0444d6e2 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,15 +402,23 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried transactions.
- * It can be one of two types:
+ * A failed transaction is counted differently depending on whether
+ * the --continue-on-error option is specified.
*
+ * Without --continue-on-error:
* failed (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
* successfully retried) +
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried).
*
+ * When --continue-on-error is specified:
+ *
+ * failed (number of failed transactions) =
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got some other SQL error; the transaction was
+ * not retried and counted as failed due to --continue-on-error).
+ *
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
*
@@ -440,6 +448,11 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure, which
+ * is counted if --continue-on-error is
+ * specified */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +783,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +968,7 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --continue-on-error continue running after an SQL error\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1482,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1532,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3356,7 +3375,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_FATAL_ERROR:
st->estatus = getSQLErrorStatus(PQresultErrorField(res,
PG_DIAG_SQLSTATE));
- if (canRetryError(st->estatus))
+ if (continue_on_error || canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st, PQerrorMessage(st->con));
@@ -4020,7 +4039,8 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
+ canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4541,7 +4561,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4561,6 +4582,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4616,6 +4639,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4656,10 +4680,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6298,6 +6324,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6440,7 +6467,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+ * to errors other than serialization or deadlock errors and
+ * --continue-on-error is not set.
*/
if (total_cnt <= 0)
return;
@@ -6456,6 +6484,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6559,6 +6590,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6718,6 +6753,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7071,6 +7107,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7426,6 +7466,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7dd78940300..3c19a36a005 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1813,6 +1813,28 @@ update counter set i = i+1 returning i \gset
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique);');
+
+$node->pgbench(
+ '-n -t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 1/10\b},
+ qr{other failures: 9\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '001_continue_on_error' => q{
+ INSERT INTO unique_table VALUES(0);
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.50.1
On Fri, 19 Sep 2025 11:43:28 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Sep 18, 2025 at 4:20 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
On Thu, 18 Sep 2025 14:37:29 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Sep 18, 2025 at 10:22 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
That makes sense. How about rewriting this like:
However, if the --continue-on-error option is specified and the error occurs in
an SQL command, the client does not abort and proceeds to the next
transaction regardless of the error. These cases are reported as "other failures"
in the output. Note that if the error occurs in a meta-command, the client will
still abort even when this option is specified.
How about phrasing it like this, based on your version?
----------------------------
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
without completing the last transaction. The client also aborts
if a meta-command fails, or if an SQL command fails for reasons other than
serialization or deadlock errors when --continue-on-error is not specified.
With --continue-on-error, the client does not abort on such SQL errors
and instead proceeds to the next transaction. These cases are reported
as "other failures" in the output. If the error occurs in a meta-command,
however, the client still aborts even when this option is specified.
----------------------------
I'm fine with that. This version is clearer.
Thanks for checking!
Also, I'd like to share the review comments for 0002 and 0003.
Regarding 0002:
- TSTATUS_OTHER_ERROR,
+ TSTATUS_UNKNOWN_ERROR,
You did this rename to avoid confusion with other_sql_errors.
I see the intention, but I'm not sure if this concern is really valid
and if the rename adds much value. Also, TSTATUS_UNKNOWN_ERROR
might be mistakenly assumed to be related to PQTRANS_UNKNOWN,
even though they aren't related...
I don’t have a strong opinion on this, but I think TSTATUS_* is slightly
related to PQTRANS_*, since getTransactionStatus() determines the TSTATUS
value based on PQTRANS. There is no one-to-one relationship, of course,
but it is more related than ESTATUS_OTHER_SQL_ERROR, which is entirely
separate.
But if we agree with this change, I think it should be folded into 0001,
since there's no strong reason to keep it separate.
+1
I personally don't mind omitting this change, but I would like to wait
for Ikeda-san's response because he is the author of these two patches.
Regarding 0003:
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d", - st->id, st->use_file, st->command, qrynum, 0); + commandFailed(st, "gset", psprintf("expected one row, got %d", 0));The change to use commandFailed() seems to remove
the "query %d" detail that the current pg_log_error() call reports.
Is it OK to lose that information?
"qrynum" is the index of SQL queries combined by "\;", but reporting it
in \gset errors is almost useless, since \gset can only be applied to the
last query of a compound query. So I think it’s fine to commit this.
That said, it might still be useful for debugging when an internal error like
the following occurs (mainly for developers rather than users):
/* internal error */
commandFailed(st, cmd, psprintf("error storing into variable %s", varname));
For that case, I’d be fine with adding information like this:
/* internal error */
commandFailed(st, cmd, psprintf("error storing into variable %s, at query %d", varname, qrynum));
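To illustrate the earlier point about compound queries, a hypothetical script:
```
-- two queries combined with \; are sent as one compound command;
-- \gset applies only to the last sub-query, so only :two is set
SELECT 1 \; SELECT 2 AS two \gset
```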
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Fri, 19 Sep 2025 19:21:29 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
[...]
I've updated the 0001 patch based on the comments.
The revised version is attached.
Thank you for updating the patch.
While testing, I found that running pgbench with --continue-on-error and
pipeline mode triggers the following assertion failure. Could this be
a bug in the patch?
---------------------------------------------------
$ cat pipeline.pgbench
\startpipeline
DO $$
BEGIN
PERFORM pg_sleep(3);
PERFORM pg_terminate_backend(pg_backend_pid());
END $$;
\endpipeline

$ pgbench -n --debug --verbose-errors -f pipeline.pgbench -c 2 -t 4 -M
extended --continue-on-error
...
Assertion failed:
(sql_script[st->use_file].commands[st->command]->type == 1), function
commandError, file pgbench.c, line 3081.
Abort trap: 6
---------------------------------------------------
When I ran the same command without --continue-on-error,
the assertion failure did not occur.
I think this bug was introduced by commit 4a39f87acd6e, which enabled pgbench
to retry and added the --verbose-errors option, rather than by this patch itself.
The assertion failure occurs in commandError(), which is called to report an error
when it can be retried (i.e., a serialization failure or deadlock), or, after this
patch, when --continue-on-error is used.
Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND);
This assumes the error is always detected during SQL command execution, but
that’s not correct, since in pipeline mode, the error can be detected when
a \endpipeline meta-command is executed.
$ cat deadlock.sql
\startpipeline
begin;
lock b;
lock a;
end;
\endpipeline
$ cat deadlock2.sql
\startpipeline
begin;
lock a;
lock b;
end;
\endpipeline
$ pgbench --verbose-errors -f deadlock.sql -f deadlock2.sql -c 2 -T 3 -M extended
pgbench (19devel)
starting vacuum...end.
pgbench: pgbench.c:3062: commandError: Assertion `sql_script[st->use_file].commands[st->command]->type == 1' failed.
Although one option would be to remove this assertion, if we prefer to keep it,
the attached patch fixes the issue.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
fix_pgbench_assertion_failure_in_pipeline.patch.txt (text/plain)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 3cafd88ac53..35e17939190 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3059,7 +3059,9 @@ commandFailed(CState *st, const char *cmd, const char *message)
static void
commandError(CState *st, const char *message)
{
- Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND);
+ Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND ||
+ sql_script[st->use_file].commands[st->command]->meta == META_ENDPIPELINE);
+
pg_log_info("client %d got an error in command %d (SQL) of script %d; %s",
st->id, st->command, st->use_file, message);
}
Thank you for reviewing the patches.
On 2025/09/19 20:56, Yugo Nagata wrote:
A client's run is aborted in case of a serious error; [...]
I'm fine with that. This version is clearer.
I also agree with this.
Also I'd like to share the review comments for 0002 and 0003. [...]
The points you both raise make sense to me.
Changing the macro name is not important for the purpose of the patch, so I now
feel it would be reasonable to drop patch 0002.
Regards,
Rintaro Ikeda
On Sat, Sep 20, 2025 at 12:21 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
[...]
Although one option would be to remove this assertion, if we prefer to keep it,
the attached patch fixes the issue.
Thanks for the analysis and the patch!
I think we should fix the issue rather than just removing the assertion.
I'd like to apply your patch with the following source comment:
---------------------------
Errors should only be detected during an SQL command or the \endpipeline
meta command. Any other case triggers an assertion failure.
--------------------------
With your patch and the continue-on-error patches, running the same pgbench
command I used to reproduce the assertion failure upthread causes pgbench
to hang. From my analysis, it enters an infinite loop in discardUntilSync().
That loop waits for PGRES_PIPELINE_SYNC, but since the connection has already
been closed, it never arrives, leaving pgbench stuck.
Could this also happen without the continue-on-error patch, or is it a new bug
introduced by it? Either way, it seems pgbench needs to exit the loop when
the result status is PGRES_FATAL_ERROR.
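To sketch the loop-exit behavior being suggested (simplified; the real
discardUntilSync() also tracks st->num_syncs and the trailing NULL result):
```
#include <libpq-fe.h>

/* Discard pipeline results until a sync arrives, but stop on a fatal
 * error: on a dead connection PGRES_PIPELINE_SYNC never arrives, so
 * waiting for it would loop forever. */
static void
discard_until_sync_sketch(PGconn *con)
{
	PGresult   *res;

	while ((res = PQgetResult(con)) != NULL)
	{
		ExecStatusType s = PQresultStatus(res);

		PQclear(res);
		if (s == PGRES_PIPELINE_SYNC)
			break;		/* sync seen; earlier results discarded */
		if (s == PGRES_FATAL_ERROR)
			break;		/* e.g. connection failure; stop waiting */
	}
	/* may also fail if the connection is gone; ignored in this sketch */
	(void) PQexitPipelineMode(con);
}
```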
Regards,
--
Fujii Masao
On Sat, Sep 20, 2025 at 9:58 PM Rintaro Ikeda
<ikedarintarof@oss.nttdata.com> wrote:
The points you both raise make sense to me.
Changing the macro name is not important for the purpose of the patch, so I now
feel it would be reasonable to drop patch 0002.
Thanks for your thoughts! So let's focus on the 0001 patch for now.
Regards,
--
Fujii Masao
Hi,
On 2025/09/22 11:56, Fujii Masao wrote:
[...]
Could this also happen without the continue-on-error patch, or is it a new bug
introduced by it? Either way, it seems pgbench needs to exit the loop when
the result status is PGRES_FATAL_ERROR.
Thank you for the analysis and the patches.
I think the issue is a new bug because we have transitioned to CSTATE_ABORT
immediately after queries failed, without executing discardUntilSync().
I've attached a patch that fixes the assertion error. The content of the v1 patch
by Mr. Nagata is also included. I would appreciate it if you review my patch.
Regards,
Rintaro Ikeda
Attachments:
v2_fix_pgbench_fix_assertion_in_pipeline.patch.txt (text/plain; charset=UTF-8)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 6e9304e254f..cd5faf3370a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3078,7 +3078,13 @@ commandFailed(CState *st, const char *cmd, const char *message)
static void
commandError(CState *st, const char *message)
{
- Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND);
+ /*
+ Errors should only be detected during an SQL command or the \endpipeline
+ meta command. Any other case triggers an assertion failure.
+ */
+ Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND ||
+ sql_script[st->use_file].commands[st->command]->meta == META_ENDPIPELINE);
+
pg_log_info("client %d got an error in command %d (SQL) of script %d; %s",
st->id, st->command, st->use_file, message);
}
@@ -3525,9 +3531,7 @@ discardUntilSync(CState *st)
{
PGresult *res = PQgetResult(st->con);
- if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
- received_sync = true;
- else if (received_sync)
+ if (received_sync == true)
{
/*
* PGRES_PIPELINE_SYNC must be followed by another
@@ -3541,11 +3545,23 @@ discardUntilSync(CState *st)
*/
st->num_syncs = 0;
PQclear(res);
- break;
+ goto done;
}
- PQclear(res);
+
+ switch (PQresultStatus(res))
+ {
+ case PGRES_PIPELINE_SYNC:
+ received_sync = true;
+ case PGRES_FATAL_ERROR:
+ PQclear(res);
+ goto done;
+ default:
+ PQclear(res);
+ }
+
}
+done:
/* exit pipeline */
if (PQexitPipelineMode(st->con) != 1)
{
On Tue, Sep 23, 2025 at 11:58 AM Rintaro Ikeda
<ikedarintarof@oss.nttdata.com> wrote:
I think the issue is a new bug because we have transitioned to CSTATE_ABORT
immediately after queries failed, without executing discardUntilSync().
If so, the fix in discardUntilSync() should be part of the continue-on-error
patch. The assertion failure fix should be a separate patch, since only
that needs to be backpatched (the failure can also occur in back branches).
I've attached a patch that fixes the assertion error. The content of v1 patch by
Mr. Nagata is also included. I would appreciate it if you review my patch.
+ if (received_sync == true)
For boolean flags, we usually just use the variable itself instead of
"== true/false".
+ switch (PQresultStatus(res))
+ {
+ case PGRES_PIPELINE_SYNC:
+ received_sync = true;
In the PGRES_PIPELINE_SYNC case, PQclear(res) isn't called but should be.
+ case PGRES_FATAL_ERROR:
+ PQclear(res);
+ goto done;
I don't think we need goto here. How about this instead?
-----------------------
@@ -3525,11 +3525,18 @@ discardUntilSync(CState *st)
* results have been discarded.
*/
st->num_syncs = 0;
- PQclear(res);
break;
}
+ /*
+ * Stop receiving further results if PGRES_FATAL_ERROR is returned
+ * (e.g., due to a connection failure) before PGRES_PIPELINE_SYNC,
+ * since PGRES_PIPELINE_SYNC will never arrive.
+ */
+ else if (PQresultStatus(res) == PGRES_FATAL_ERROR)
+ break;
PQclear(res);
}
+ PQclear(res);
/* exit pipeline */
if (PQexitPipelineMode(st->con) != 1)
-----------------------
Regards,
--
Fujii Masao
On Thu, 25 Sep 2025 02:19:27 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
On Tue, Sep 23, 2025 at 11:58 AM Rintaro Ikeda
<ikedarintarof@oss.nttdata.com> wrote:
I think the issue is a new bug because we have transitioned to CSTATE_ABORT
immediately after queries failed, without executing discardUntilSync().
Agreed.
If so, the fix in discardUntilSync() should be part of the continue-on-error
patch. The assertion failure fix should be a separate patch, since only
that needs to be backpatched (the failure can also occur in back branches).
+1
I've attached a patch that fixes the assertion error. The content of the v1 patch
by Mr. Nagata is also included. I would appreciate it if you review my patch.
In the PGRES_PIPELINE_SYNC case, PQclear(res) isn't called but should be.
I don't think we need goto here. How about this instead? [...]
I think Fujii-san's version is better because Ikeda-san's version doesn't
consider the case where PGRES_PIPELINE_SYNC is followed by another one.
In that situation, the loop would terminate without getting NULL, which would
cause an error in PQexitPipelineMode().
However, I would like to suggest an alternative solution: checking the connection
status when readCommandResponse() returns false. This seems more straightforward,
since the cause of the error can be investigated immediately after it is detected.
@@ -4024,7 +4043,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if (PQstatus(st->con) == CONNECTION_BAD)
+ st->state = CSTATE_ABORTED;
+ else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
+ canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
What do you think?
Additionally, I noticed that in pipeline mode, the error message reported in
readCommandResponse() is lost, because it is reset when PQgetResult() returns
NULL to indicate the end of query processing. For example:
pgbench: client 0 got an error in command 3 (SQL) of script 0;
pgbench: client 1 got an error in command 3 (SQL) of script 0;
This can be fixed by saving the previous error message and using it
for the report. After the fix:
pgbench: client 0 got an error in command 3 (SQL) of script 0; FATAL: terminating connection due to administrator command
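The core of the fix, as a simplified sketch with hypothetical names (the
actual patch does this inside readCommandResponse() using pg_strdup/pg_free):
```
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <libpq-fe.h>

/* Copy the connection's error text *before* the look-ahead PQgetResult()
 * call, which can reset it when it returns NULL at the end of processing;
 * the caller reports the saved copy and then free()s it. */
static char *
save_error_then_peek(PGconn *con, PGresult **next_res, bool *is_last)
{
	char	   *errmsg = strdup(PQerrorMessage(con));	/* save first */

	*next_res = PQgetResult(con);	/* may clear the error message */
	*is_last = (*next_res == NULL);
	return errmsg;
}
```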
I've attached updated patches.
0001 fixes the assertion failure in commandError() and the lost error message
in readCommandResponse().
0002 was the previous 0001 that adds --continue-on-error, including the
fix to handle connection termination errors.
0003 is for improving error messages for errors that cause client abortion.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
v12-0003-Improve-error-messages-for-errors-that-cause-cli.patch (text/x-diff)
From e6b4022ec06f97a1ed100de9aca9eebd5fd4bc02 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Thu, 10 Jul 2025 17:21:05 +0900
Subject: [PATCH v12 3/3] Improve error messages for errors that cause client
abortion
This commit modifies relevant error messages to explicitly indicate that the
client was aborted. As part of this change, pg_log_error was replaced with
commandFailed().
---
src/bin/pgbench/pgbench.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 36d15c95f3e..43450b4b54a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3318,8 +3318,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_EMPTY_QUERY: /* may be used for testing no-op overhead */
if (is_last && meta == META_GSET)
{
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, 0);
+ commandFailed(st, "gset", psprintf("expected one row, got %d", 0));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3333,8 +3332,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
if (meta == META_GSET && ntuples != 1)
{
/* under \gset, report the error */
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, PQntuples(res));
+ commandFailed(st, "gset", psprintf("expected one row, got %d", PQntuples(res)));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3348,18 +3346,18 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
for (int fld = 0; fld < PQnfields(res); fld++)
{
char *varname = PQfname(res, fld);
+ char *cmd = (meta == META_ASET ? "aset" : "gset");
/* allocate varname only if necessary, freed below */
if (*varprefix != '\0')
varname = psprintf("%s%s", varprefix, varname);
/* store last row result as a string */
- if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
+ if (!putVariable(&st->variables, cmd, varname,
PQgetvalue(res, ntuples - 1, fld)))
{
/* internal error */
- pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
- st->id, st->use_file, st->command, qrynum, varname);
+ commandFailed(st, cmd, psprintf("error storing into variable %s", varname));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3394,8 +3392,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
default:
/* anything else is unexpected */
- pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum, errmsg);
+ commandFailed(st, "SQL", errmsg);
goto error;
}
--
2.43.0
v12-0002-Add-continue-on-error-option.patch (text/x-diff)
From b8f05f605176232bf0aa1eaf8a2783c17059a39a Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Fri, 19 Sep 2025 16:54:49 +0900
Subject: [PATCH v12 2/3] Add --continue-on-error option
When the option is set, the client rolls back the failed transaction and starts
a new one when its transaction fails for a reason other than a deadlock or
serialization failure.
---
doc/src/sgml/ref/pgbench.sgml | 64 ++++++++++++++++----
src/bin/pgbench/pgbench.c | 59 +++++++++++++++---
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 +++++++
3 files changed, 126 insertions(+), 19 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..828ce0d90cf 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
and number of transactions per client); these will be equal unless the run
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
- The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ The next line reports the number of failed transactions (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -790,6 +789,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -914,6 +916,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Allows clients to continue their run even if an SQL statement fails due to
+ errors other than serialization or deadlock. Unlike serialization and deadlock
+ failures, clients do not retry the same transactions but start new ones.
+ This option is useful when your custom script may raise errors due to some
+ reason such as a unique constraint violation. Without this option, the client is
+ aborted after such errors.
+ </para>
+ <para>
+ Note that serialization and deadlock failures never cause the client to be
+ aborted, even after the client retries <option>--max-tries</option> times,
+ so they are not affected by this option.
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
@@ -2409,8 +2431,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2638,6 +2660,16 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got an SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2646,8 +2678,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
</screen>
</para>
@@ -2851,10 +2883,20 @@ statement latencies in milliseconds, failures and retries:
<para>
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
- without completing the last transaction. In addition, if execution of an SQL
- or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
+ without completing the last transaction. The client also aborts
+ if a meta command fails, or if an SQL command fails for reasons other than
+ serialization or deadlock errors when <option>--continue-on-error</option>
+ is not specified. With <option>--continue-on-error</option>,
+ the client does not abort on such SQL errors and instead proceeds to
+ the next transaction. These cases are reported as
+ <literal>other failures</literal> in the output. If the error occurs
+ in a meta command, however, the client still aborts even when this option
+ is specified.
+ </para>
+ <para>
+ If an SQL command fails due to serialization or deadlock errors, the
+ client is not aborted, regardless of whether
+ <option>--continue-on-error</option> is used. Instead, the current
transaction is rolled back, which also includes setting the client variables
as they were before the run of this transaction (it is assumed that one
transaction script contains only one transaction; see
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index f25a2e20e70..36d15c95f3e 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,15 +402,23 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried transactions.
- * It can be one of two types:
+ * A failed transaction is counted differently depending on whether
+ * the --continue-on-error option is specified.
*
+ * Without --continue-on-error:
* failed (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
* successfully retried) +
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried).
*
+ * When --continue-on-error is specified:
+ *
+ * failed (number of failed transactions) =
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got some other SQL error; the transaction was
+ * not retried and counted as failed due to --continue-on-error).
+ *
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
*
@@ -440,6 +448,11 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure, which
+ * is counted if --continue-on-error is
+ * specified */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +783,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +968,7 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --continue-on-error continue running after an SQL error\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1482,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1532,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3365,7 +3384,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_FATAL_ERROR:
st->estatus = getSQLErrorStatus(PQresultErrorField(res,
PG_DIAG_SQLSTATE));
- if (canRetryError(st->estatus))
+ if (continue_on_error || canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st, errmsg);
@@ -4029,7 +4048,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if (PQstatus(st->con) == CONNECTION_BAD)
+ st->state = CSTATE_ABORTED;
+ else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
+ canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4550,7 +4572,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4570,6 +4593,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4625,6 +4650,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4665,10 +4691,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6307,6 +6335,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6449,7 +6478,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+ * to errors other than serialization or deadlock errors and
+ * --continue-on-error is not set.
*/
if (total_cnt <= 0)
return;
@@ -6465,6 +6495,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6568,6 +6601,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6727,6 +6764,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7080,6 +7118,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7435,6 +7477,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7dd78940300..3c19a36a005 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1813,6 +1813,28 @@ update counter set i = i+1 returning i \gset
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique);');
+
+$node->pgbench(
+ '-n -t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 1/10\b},
+ qr{other failures: 9\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '001_continue_on_error' => q{
+ INSERT INTO unique_table VALUES(0);
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.43.0
v12-0001-Fix-assertion-failure-and-verbose-messages-in-pi.patch (text/x-diff)
From d141ea4422d76b59021e2c25ad378bfd12d97651 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Wed, 24 Sep 2025 22:23:25 +0900
Subject: [PATCH v12 1/3] Fix assertion failure and verbose messages in
pipeline mode
commandError() is called to report errors when they can be retried, and
it previously assumed that errors are always detected during SQL command
execution. However, in pipeline mode, an error may also be detected when
a \endpipeline meta-command is executed.
This caused an assertion failure. To fix this, the assertion now also
accepts errors detected at a \endpipeline meta-command.
Additionally, in pipeline mode, the error message reported in
readCommandResponse() was lost, because it was reset when PQgetResult()
returned NULL to indicate the end of query processing. To fix this, save
the previous error message and use it for reporting.
---
src/bin/pgbench/pgbench.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 3cafd88ac53..f25a2e20e70 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3059,7 +3059,13 @@ commandFailed(CState *st, const char *cmd, const char *message)
static void
commandError(CState *st, const char *message)
{
- Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND);
+ /*
+ Errors should only be detected during an SQL command or the \endpipeline
+ meta command. Any other case triggers an assertion failure.
+ */
+ Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND ||
+ sql_script[st->use_file].commands[st->command]->meta == META_ENDPIPELINE);
+
pg_log_info("client %d got an error in command %d (SQL) of script %d; %s",
st->id, st->command, st->use_file, message);
}
@@ -3265,6 +3271,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
PGresult *res;
PGresult *next_res;
int qrynum = 0;
+ char *errmsg;
/*
* varprefix should be set only with \gset or \aset, and \endpipeline and
@@ -3280,6 +3287,8 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
{
bool is_last;
+ errmsg = pg_strdup(PQerrorMessage(st->con));
+
/* peek at the next result to know whether the current is last */
next_res = PQgetResult(st->con);
is_last = (next_res == NULL);
@@ -3349,7 +3358,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
st->num_syncs--;
if (st->num_syncs == 0 && PQexitPipelineMode(st->con) != 1)
pg_log_error("client %d failed to exit pipeline mode: %s", st->id,
- PQerrorMessage(st->con));
+ errmsg);
break;
case PGRES_NONFATAL_ERROR:
@@ -3359,7 +3368,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
if (canRetryError(st->estatus))
{
if (verbose_errors)
- commandError(st, PQerrorMessage(st->con));
+ commandError(st, errmsg);
goto error;
}
/* fall through */
@@ -3367,14 +3376,14 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
default:
/* anything else is unexpected */
pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum,
- PQerrorMessage(st->con));
+ st->id, st->use_file, st->command, qrynum, errmsg);
goto error;
}
PQclear(res);
qrynum++;
res = next_res;
+ pg_free(errmsg);
}
if (qrynum == 0)
--
2.43.0
On Thu, 25 Sep 2025 11:09:40 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
[...]
I think the patch 0001 should be back-patched, since the issue can occur
even for retries after serialization failures or deadlocks in pipeline mode.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Thu, Sep 25, 2025 at 11:17 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
On Thu, 25 Sep 2025 11:09:40 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
[...]
I think the patch 0001 should be back-patched, since the issue can occur
even for retries after serialization failures or deadlocks in pipeline mode.
Agreed.
Regarding 0001:
+ /*
+ Errors should only be detected during an SQL command or the \endpipeline
+ meta command. Any other case triggers an assertion failure.
+ */
* should be added before "Errors" and "meta".
+ errmsg = pg_strdup(PQerrorMessage(st->con));
It would be good to add a comment explaining why we do this.
+ pg_free(errmsg);
Shouldn't pg_free() be called also in the error case, i.e., after
jumping to the error label?
Regards,
--
Fujii Masao
On Thu, 25 Sep 2025 13:49:05 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
[...]
Regarding 0001:
+ /*
+ Errors should only be detected during an SQL command or the \endpipeline
+ meta command. Any other case triggers an assertion failure.
+ */
* should be added before "Errors" and "meta".
Oops. Fixed.
+ errmsg = pg_strdup(PQerrorMessage(st->con));
It would be good to add a comment explaining why we do this.
+ pg_free(errmsg);
Shouldn't pg_free() be called also in the error case, i.e., after
jumping to the error label?
Yes, it should be.
Fixed.
I've attached updated patches.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
v13-0003-Improve-error-messages-for-errors-that-cause-cli.patch (text/x-diff)
From 4e4edc2cc8d1bb565059c72836b026ecceee882f Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Thu, 10 Jul 2025 17:21:05 +0900
Subject: [PATCH v13 3/3] Improve error messages for errors that cause client
abortion
This commit modifies relevant error messages to explicitly indicate that the
client was aborted. As part of this change, pg_log_error was replaced with
commandFailed().
---
src/bin/pgbench/pgbench.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ee288e19bd0..7d078de3457 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3318,8 +3318,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_EMPTY_QUERY: /* may be used for testing no-op overhead */
if (is_last && meta == META_GSET)
{
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, 0);
+ commandFailed(st, "gset", psprintf("expected one row, got %d", 0));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3333,8 +3332,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
if (meta == META_GSET && ntuples != 1)
{
/* under \gset, report the error */
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, PQntuples(res));
+ commandFailed(st, "gset", psprintf("expected one row, got %d", PQntuples(res)));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3348,18 +3346,18 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
for (int fld = 0; fld < PQnfields(res); fld++)
{
char *varname = PQfname(res, fld);
+ char *cmd = (meta == META_ASET ? "aset" : "gset");
/* allocate varname only if necessary, freed below */
if (*varprefix != '\0')
varname = psprintf("%s%s", varprefix, varname);
/* store last row result as a string */
- if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
+ if (!putVariable(&st->variables, cmd, varname,
PQgetvalue(res, ntuples - 1, fld)))
{
/* internal error */
- pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
- st->id, st->use_file, st->command, qrynum, varname);
+ commandFailed(st, cmd, psprintf("error storing into variable %s", varname));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3394,8 +3392,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
default:
/* anything else is unexpected */
- pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum, errmsg);
+ commandFailed(st, "SQL", errmsg);
goto error;
}
--
2.43.0
v13-0002-Add-continue-on-error-option.patch (text/x-diff)
From e531d12a192a2db529a7844450e6af72d80e244b Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Fri, 19 Sep 2025 16:54:49 +0900
Subject: [PATCH v13 2/3] Add --continue-on-error option
When the option is set, the client rolls back the failed transaction and starts
a new one when its transaction fails for reasons other than deadlock and
serialization failure.
---
doc/src/sgml/ref/pgbench.sgml | 64 ++++++++++++++++----
src/bin/pgbench/pgbench.c | 59 +++++++++++++++---
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 +++++++
3 files changed, 126 insertions(+), 19 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..828ce0d90cf 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
and number of transactions per client); these will be equal unless the run
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
- The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ The next line reports the number of failed transactions (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -790,6 +789,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -914,6 +916,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Allows clients to continue their run even if an SQL statement fails due to
+ errors other than serialization or deadlock. Unlike serialization and deadlock
+ failures, clients do not retry the same transactions but start new transaction.
+ This option is useful when your custom script may raise errors due to some
+ reason like unique constraints violation. Without this option, the client is
+ aborted after such errors.
+ </para>
+ <para>
+ Note that serialization and deadlock failures never cause the client to be
+ aborted, even after the client retries <option>--max-tries</option> times,
+ so they are not affected by this option.
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
@@ -2409,8 +2431,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2638,6 +2660,16 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got a SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2646,8 +2678,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
</screen>
</para>
@@ -2851,10 +2883,20 @@ statement latencies in milliseconds, failures and retries:
<para>
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
- without completing the last transaction. In addition, if execution of an SQL
- or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
+ without completing the last transaction. The client also aborts
+ if a meta command fails, or if an SQL command fails for reasons other than
+ serialization or deadlock errors when <option>--continue-on-error</option>
+ is not specified. With <option>--continue-on-error</option>,
+ the client does not abort on such SQL errors and instead proceeds to
+ the next transaction. These cases are reported as
+ <literal>other failures</literal> in the output. If the error occurs
+ in a meta command, however, the client still aborts even when this option
+ is specified.
+ </para>
+ <para>
+ If an SQL command fails due to serialization or deadlock errors, the
+ client does not aborted, regardless of whether
+ <option>--continue-on-error</option> is used. Instead, the current
transaction is rolled back, which also includes setting the client variables
as they were before the run of this transaction (it is assumed that one
transaction script contains only one transaction; see
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index de00669f288..ee288e19bd0 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,15 +402,23 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried transactions.
- * It can be one of two types:
+ * A failed transaction is counted differently depending on whether
+ * the --continue-on-error option is specified.
*
+ * Without --continue-on-error:
* failed (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
* successfully retried) +
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried).
*
+ * When --continue-on-error is specified:
+ *
+ * failed (number of failed transactions) =
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got some other SQL error; the transaction was
+ * not retried and counted as failed due to --continue-on-error).
+ *
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
*
@@ -440,6 +448,11 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure, which
+ * is counted if --continue-on-error is
+ * specified */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +783,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +968,7 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --continue-on-error continue running after an SQL error\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1482,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1532,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3365,7 +3384,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_FATAL_ERROR:
st->estatus = getSQLErrorStatus(PQresultErrorField(res,
PG_DIAG_SQLSTATE));
- if (canRetryError(st->estatus))
+ if (continue_on_error || canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st, errmsg);
@@ -4030,7 +4049,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if (PQstatus(st->con) == CONNECTION_BAD)
+ st->state = CSTATE_ABORTED;
+ else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
+ canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4551,7 +4573,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4571,6 +4594,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4626,6 +4651,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4666,10 +4692,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6308,6 +6336,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6450,7 +6479,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+ * to errors other than serialization or deadlock and --continue-on-error
+ * is not set.
*/
if (total_cnt <= 0)
return;
@@ -6466,6 +6496,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6569,6 +6602,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6728,6 +6765,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7081,6 +7119,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7436,6 +7478,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7dd78940300..3c19a36a005 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1813,6 +1813,28 @@ update counter set i = i+1 returning i \gset
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique);');
+
+$node->pgbench(
+ '-n -t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 1/10\b},
+ qr{other failures: 9\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '001_continue_on_error' => q{
+ INSERT INTO unique_table VALUES(0);
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.43.0
v13-0001-Fix-assertion-failure-and-verbose-messages-in-pi.patchtext/x-diff; name=v13-0001-Fix-assertion-failure-and-verbose-messages-in-pi.patchDownload
From 84935ea888d8ef607af6afb521cee8d98a85d1db Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Wed, 24 Sep 2025 22:23:25 +0900
Subject: [PATCH v13 1/3] Fix assertion failure and verbose messages in
pipeline mode
commandError() is called to report errors when they can be retried, and
it previously assumed that errors are always detected during SQL command
execution. However, in pipeline mode, an error may also be detected when
a \endpipeline meta-command is executed.
This caused an assertion failure. To fix this, it is now assumed that
errors can also be detected in this case.
Additionally, in pipeline mode, the error message reported in
readCommandResponse() was lost, because it was reset when PQgetResult()
returned NULL to indicate the end of query processing. To fix this, save
the previous error message and use it for reporting.
---
src/bin/pgbench/pgbench.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 3cafd88ac53..de00669f288 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3059,7 +3059,13 @@ commandFailed(CState *st, const char *cmd, const char *message)
static void
commandError(CState *st, const char *message)
{
- Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND);
+ /*
+ * Errors should only be detected during an SQL command or the \endpipeline
+ * meta command. Any other case triggers an assertion failure.
+ */
+ Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND ||
+ sql_script[st->use_file].commands[st->command]->meta == META_ENDPIPELINE);
+
pg_log_info("client %d got an error in command %d (SQL) of script %d; %s",
st->id, st->command, st->use_file, message);
}
@@ -3265,6 +3271,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
PGresult *res;
PGresult *next_res;
int qrynum = 0;
+ char *errmsg;
/*
* varprefix should be set only with \gset or \aset, and \endpipeline and
@@ -3280,6 +3287,8 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
{
bool is_last;
+ errmsg = pg_strdup(PQerrorMessage(st->con));
+
/* peek at the next result to know whether the current is last */
next_res = PQgetResult(st->con);
is_last = (next_res == NULL);
@@ -3349,7 +3358,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
st->num_syncs--;
if (st->num_syncs == 0 && PQexitPipelineMode(st->con) != 1)
pg_log_error("client %d failed to exit pipeline mode: %s", st->id,
- PQerrorMessage(st->con));
+ errmsg);
break;
case PGRES_NONFATAL_ERROR:
@@ -3359,7 +3368,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
if (canRetryError(st->estatus))
{
if (verbose_errors)
- commandError(st, PQerrorMessage(st->con));
+ commandError(st, errmsg);
goto error;
}
/* fall through */
@@ -3367,14 +3376,14 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
default:
/* anything else is unexpected */
pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum,
- PQerrorMessage(st->con));
+ st->id, st->use_file, st->command, qrynum, errmsg);
goto error;
}
PQclear(res);
qrynum++;
res = next_res;
+ pg_free(errmsg);
}
if (qrynum == 0)
@@ -3388,6 +3397,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
error:
PQclear(res);
PQclear(next_res);
+ pg_free(errmsg);
do
{
res = PQgetResult(st->con);
--
2.43.0
Hi,
The patch looks good; I've spotted some typos in the doc.
+ Allows clients to continue their run even if an SQL statement fails due to
+ errors other than serialization or deadlock. Unlike serialization and deadlock
+ failures, clients do not retry the same transactions but start new transaction.
Should be "but start a new transaction.", although "proceed to the
next transaction." may be clearer here that ?
+ number of transactions that got a SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
It seems like both "a SQL" and "an SQL" are used in the codebase and
doc, but this page only uses "an SQL", so using "an SQL" may be better
for consistency.
+ If an SQL command fails due to serialization or deadlock errors, the
+ client does not aborted, regardless of whether
Should be "the client does not abort."
Regards,
Anthonin Bonnefoy
Hi Yugo,
Thanks for the patch. After reviewing it, I got a few small comments:
On Sep 25, 2025, at 15:22, Yugo Nagata <nagata@sraoss.co.jp> wrote:
--
Yugo Nagata <nagata@sraoss.co.jp>
<v13-0003-Improve-error-messages-for-errors-that-cause-cli.patch><v13-0002-Add-continue-on-error-option.patch><v13-0001-Fix-assertion-failure-and-verbose-messages-in-pi.patch>
1 - 0001
```
@@ -3265,6 +3271,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
PGresult *res;
PGresult *next_res;
int qrynum = 0;
+ char *errmsg;
```
I think we should initialize errmsg to NULL. The compiler won’t auto-initialize a local variable. If it happens to not enter the while loop, errmsg will hold a random value, and pg_free(errmsg) will have trouble.
2 - 0002
```
+ <para>
+ Allows clients to continue their run even if an SQL statement fails due to
+ errors other than serialization or deadlock. Unlike serialization and deadlock
+ failures, clients do not retry the same transactions but start new transaction.
+ This option is useful when your custom script may raise errors due to some
+ reason like unique constraints violation. Without this option, the client is
+ aborted after such errors.
+ </para>
```
A few nit suggestions:
* “continue their run” => “continue running”
* “clients to not retry the same transactions but start new transaction” => “clients do not retry the same transaction but start a new transaction instead"
* “due to some reason like” => “for reasons such as"
3 - 0002
```
+ * Without --continue-on-error:
* failed (the number of failed transactions) =
```
Maybe add an empty line after the “without” line.
4 - 0002
```
+ * When --continue-on-error is specified:
+ *
+ * failed (number of failed transactions) =
```
Maybe change to “With --continue-on-error”, which sounds consistent with the previous “without”.
5 - 0002
```
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure, which
+ * is counted if --continue-on-error is
+ * specified */
```
How about renaming this variable to “sql_errors”, which reflects the new option name.
6 - 0002
```
@@ -4571,6 +4594,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other”;
```
I think this can just return “error”. I checked where this function is called; no other words such as “error” are appended.
7 - 0002
```
/* it can be non-zero only if max_tries is not equal to one */
@@ -6569,6 +6602,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
```
Do we only want to print this number when “--continue-on-error” is given?
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Thu, Sep 25, 2025 at 4:22 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
I've attached updated patches.
Thanks for updating the patches!
About 0001: you mentioned that the lost error message issue occurs in
pipeline mode.
Just to confirm, are you sure it never happens in non-pipeline mode?
From a quick look,
readCommandResponse() seems to have this problem regardless of whether pipeline
mode is used.
If it can also happen outside pipeline mode, maybe we should split this from
the assertion failure fix, since they'd need to be backpatched to
different branches.
What do you think?
Regards,
--
Fujii Masao
On Fri, 26 Sep 2025 00:03:06 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Sep 25, 2025 at 4:22 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
I've attached updated patches.
Thanks for updating the patches!
About 0001: you mentioned that the lost error message issue occurs in
pipeline mode.
Just to confirm, are you sure it never happens in non-pipeline mode?
From a quick look,
readCommandResponse() seems to have this problem regardless of whether pipeline
mode is used. If it can also happen outside pipeline mode, maybe we should split this from
the assertion failure fix, since they'd need to be backpatched to
different branches.
I could not find a code path that resets the error state before reporting in
non-pipeline mode, since it is typically reset when starting to send a query.
However, referencing an error message after another PQgetResult() does not seem
like a good idea in general, so I agree with splitting the patch.
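To illustrate the hazard, here is a minimal sketch; peek_next_result() is a hypothetical helper written for this mail only, not the actual readCommandResponse() logic:
```
#include "postgres_fe.h"
#include "common/fe_memutils.h"
#include "libpq-fe.h"

/*
 * PQerrorMessage() returns a pointer into the PGconn's internal buffer,
 * and the next PQgetResult() call may reset that buffer. Copy the message
 * before peeking so it stays valid for later reporting.
 */
static PGresult *
peek_next_result(PGconn *conn, char **saved_errmsg)
{
	*saved_errmsg = pg_strdup(PQerrorMessage(conn));	/* private copy */
	return PQgetResult(conn);	/* may clobber the buffer we copied from */
}
```
The caller then reports *saved_errmsg where needed and pg_free()s it on both the normal and the error paths.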
I'll submit updated patches soon.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Thu, 25 Sep 2025 17:17:36 +0800
Chao Li <li.evan.chao@gmail.com> wrote:
Hi Yugo,
Thanks for the patch. After reviewing it, I got a few small comments:
Thank you for your reviewing and comments.
On Sep 25, 2025, at 15:22, Yugo Nagata <nagata@sraoss.co.jp> wrote:
--
Yugo Nagata <nagata@sraoss.co.jp>
<v13-0003-Improve-error-messages-for-errors-that-cause-cli.patch><v13-0002-Add-continue-on-error-option.patch><v13-0001-Fix-assertion-failure-and-verbose-messages-in-pi.patch>
1 - 0001
```
@@ -3265,6 +3271,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
PGresult *res;
PGresult *next_res;
int qrynum = 0;
+ char *errmsg;
```
I think we should initialize errmsg to NULL. The compiler won’t auto-initialize a local variable. If it happens to not enter the while loop, errmsg will hold a random value, and pg_free(errmsg) will have trouble.
I think this initialization is unnecessary, just like for res and next_res.
If the code happens not to enter the while loop, pg_free(errmsg) will not be
called anyway, since the error: label is only reachable from inside the loop.
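To sketch that control flow (an assumed, simplified shape for illustration; consume_results() is hypothetical, not the actual pgbench code):
```
#include "postgres_fe.h"
#include "common/fe_memutils.h"
#include "libpq-fe.h"

static void
consume_results(PGconn *conn)
{
	PGresult   *res = PQgetResult(conn);
	char	   *errmsg;			/* no NULL initialization needed */

	while (res != NULL)
	{
		PGresult   *next_res;

		/* assigned on every iteration before any goto can fire */
		errmsg = pg_strdup(PQerrorMessage(conn));
		next_res = PQgetResult(conn);

		if (PQresultStatus(res) == PGRES_FATAL_ERROR)
			goto error;			/* errmsg is always valid here */

		pg_free(errmsg);
		PQclear(res);
		res = next_res;
	}
	return;						/* loop never entered: pg_free() never runs */

error:
	fprintf(stderr, "aborted: %s", errmsg);
	pg_free(errmsg);			/* reachable only from inside the loop */
	PQclear(res);				/* next_res cleanup omitted for brevity */
}
```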
2 - 0002
```
+ <para>
+ Allows clients to continue their run even if an SQL statement fails due to
+ errors other than serialization or deadlock. Unlike serialization and deadlock
+ failures, clients do not retry the same transactions but start new transaction.
+ This option is useful when your custom script may raise errors due to some
+ reason like unique constraints violation. Without this option, the client is
+ aborted after such errors.
+ </para>
```
A few nit suggestions:
* “continue their run” => “continue running”
Fixed.
* “clients to not retry the same transactions but start new transaction” => “clients do not retry the same transaction but start a new transaction instead"
I see your point. Maybe we could follow Anthonin Bonnefoy's suggestion
to use "proceed to the next transaction", as it may sound a bit more natural.
* “due to some reason like” => “for reasons such as"
Fixed.
3 - 0002
```
+ * Without --continue-on-error:
* failed (the number of failed transactions) =
```
Maybe add an empty line after the “without” line.
Makes sense. Fixed.
4 - 0002
```
+ * When --continue-on-error is specified:
+ *
+ * failed (number of failed transactions) =
```
Maybe change to “With --continue-on-error”, which sounds consistent with the previous “without”.
Fixed.
5 - 0002
```
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure, which
+ * is counted if --continue-on-error is
+ * specified */
```
How about renaming this variable to “sql_errors”, which reflects the new option name.
I think it’s better to keep the current name, since the variable counts failed transactions,
even though that happens to be equivalent to the number of SQL errors. It’s also consistent
with the other variables, serialization_failures and deadlock_failures.
6 - 0002
```
@@ -4571,6 +4594,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other”;
```
I think this can just return “error”. I checked where this function is called; no other words such as “error” are appended.
getResultString() is called to get a string that represents the type of error
causing the transaction failure, so simply returning "error" doesn’t seem very
useful.
7 - 0002
```
/* it can be non-zero only if max_tries is not equal to one */
@@ -6569,6 +6602,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
```
Do we only want to print this number when “--continue-on-error” is given?
We could do that, but this message is printed only when
--failures-detailed is specified. So I think users would not mind
if it shows that the number of other failures is zero, even when
--continue-on-error is not specified.
I would appreciate hearing other people's opinions on this.
I've attached updated patches that include fixes for some of your
suggestions and for Anthonin Bonnefoy's suggestion on the documentation.
I also split the patch according to Fujii-san's suggestion.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
v14-0004-pgbench-Improve-error-messages-for-errors-that-c.patchtext/x-diff; name=v14-0004-pgbench-Improve-error-messages-for-errors-that-c.patchDownload
From b8cfbb44bae06def9ed2ad78edd8e3ec80e34a16 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Thu, 10 Jul 2025 17:21:05 +0900
Subject: [PATCH v14 4/4] pgbench: Improve error messages for errors that cause
client abortion
This commit modifies relevant error messages to explicitly indicate that the
client was aborted. As part of this change, pg_log_error was replaced with
commandFailed().
---
src/bin/pgbench/pgbench.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 382c0367157..4468ff38d33 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3320,8 +3320,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_EMPTY_QUERY: /* may be used for testing no-op overhead */
if (is_last && meta == META_GSET)
{
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, 0);
+ commandFailed(st, "gset", psprintf("expected one row, got %d", 0));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3335,8 +3334,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
if (meta == META_GSET && ntuples != 1)
{
/* under \gset, report the error */
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, PQntuples(res));
+ commandFailed(st, "gset", psprintf("expected one row, got %d", PQntuples(res)));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3350,18 +3348,18 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
for (int fld = 0; fld < PQnfields(res); fld++)
{
char *varname = PQfname(res, fld);
+ char *cmd = (meta == META_ASET ? "aset" : "gset");
/* allocate varname only if necessary, freed below */
if (*varprefix != '\0')
varname = psprintf("%s%s", varprefix, varname);
/* store last row result as a string */
- if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
+ if (!putVariable(&st->variables, cmd, varname,
PQgetvalue(res, ntuples - 1, fld)))
{
/* internal error */
- pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
- st->id, st->use_file, st->command, qrynum, varname);
+ commandFailed(st, cmd, psprintf("error storing into variable %s", varname));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3396,8 +3394,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
default:
/* anything else is unexpected */
- pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum, errmsg);
+ commandFailed(st, "SQL", errmsg);
goto error;
}
--
2.43.0
v14-0003-pgbench-Add-continue-on-error-option.patchtext/x-diff; name=v14-0003-pgbench-Add-continue-on-error-option.patchDownload
From e6259f68aa3683ae52baefc4f7616e0674dcd4c1 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Fri, 19 Sep 2025 16:54:49 +0900
Subject: [PATCH v14 3/4] pgbench: Add --continue-on-error option
When the option is set, the client rolls back the failed transaction and starts
a new one when its transaction fails for reasons other than deadlock and
serialization failure.
---
doc/src/sgml/ref/pgbench.sgml | 64 ++++++++++++++++----
src/bin/pgbench/pgbench.c | 60 +++++++++++++++---
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 +++++++
3 files changed, 127 insertions(+), 19 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..63230102357 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
and number of transactions per client); these will be equal unless the run
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
- The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ The next line reports the number of failed transactions (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -790,6 +789,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -914,6 +916,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Allows clients to continue running even if an SQL statement fails due to
+ errors other than serialization or deadlock. Unlike serialization and deadlock
+ failures, clients do not retry the same transactions but proceed to the next
+ transaction. This option is useful when your custom script may raise errors for
+ reasons such as unique constraints violation. Without this option, the
+ client is aborted after such errors.
+ </para>
+ <para>
+ Note that serialization and deadlock failures never cause the client to be
+ aborted, even after the client retries <option>--max-tries</option> times,
+ so they are not affected by this option.
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
@@ -2409,8 +2431,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2638,6 +2660,16 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got an SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2646,8 +2678,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
</screen>
</para>
@@ -2851,10 +2883,20 @@ statement latencies in milliseconds, failures and retries:
<para>
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
- without completing the last transaction. In addition, if execution of an SQL
- or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
+ without completing the last transaction. The client also aborts
+ if a meta command fails, or if an SQL command fails for reasons other than
+ serialization or deadlock errors when <option>--continue-on-error</option>
+ is not specified. With <option>--continue-on-error</option>,
+ the client does not abort on such SQL errors and instead proceeds to
+ the next transaction. These cases are reported as
+ <literal>other failures</literal> in the output. If the error occurs
+ in a meta command, however, the client still aborts even when this option
+ is specified.
+ </para>
+ <para>
+ If an SQL command fails due to serialization or deadlock errors, the
+ client does not abort, regardless of whether
+ <option>--continue-on-error</option> is used. Instead, the current
transaction is rolled back, which also includes setting the client variables
as they were before the run of this transaction (it is assumed that one
transaction script contains only one transaction; see
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 36c52303a9a..382c0367157 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,8 +402,10 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried transactions.
- * It can be one of two types:
+ * A failed transaction is counted differently depending on whether
+ * the --continue-on-error option is specified.
+ *
+ * Without --continue-on-error:
*
* failed (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
@@ -411,6 +413,13 @@ typedef struct StatsData
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried).
*
+ * With --continue-on-error:
+ *
+ * failed (number of failed transactions) =
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got some other SQL error; the transaction was
+ * not retried and counted as failed due to --continue-on-error).
+ *
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
*
@@ -440,6 +449,11 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure, which
+ * is counted if --continue-on-error is
+ * specified */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +784,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +969,7 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --continue-on-error continue running after an SQL error\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1483,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1533,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3366,7 +3386,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_FATAL_ERROR:
st->estatus = getSQLErrorStatus(PQresultErrorField(res,
PG_DIAG_SQLSTATE));
- if (canRetryError(st->estatus))
+ if (continue_on_error || canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st, errmsg);
@@ -4031,7 +4051,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if (PQstatus(st->con) == CONNECTION_BAD)
+ st->state = CSTATE_ABORTED;
+ else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
+ canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4552,7 +4575,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4572,6 +4596,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4627,6 +4653,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4667,10 +4694,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6309,6 +6338,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6451,7 +6481,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+ * to errors other than serialization or deadlock and --continue-on-error
+ * is not set.
*/
if (total_cnt <= 0)
return;
@@ -6467,6 +6498,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6570,6 +6604,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6729,6 +6767,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7082,6 +7121,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7437,6 +7480,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7dd78940300..3c19a36a005 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1813,6 +1813,28 @@ update counter set i = i+1 returning i \gset
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique);');
+
+$node->pgbench(
+ '-n -t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 1/10\b},
+ qr{other failures: 9\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '001_continue_on_error' => q{
+ INSERT INTO unique_table VALUES(0);
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.43.0
v14-0002-pgbench-Fix-assertion-failure-at-using-verbose-e.patchtext/x-diff; name=v14-0002-pgbench-Fix-assertion-failure-at-using-verbose-e.patchDownload
From c158421f3b152a89d5f2e411cbea07b2588b5c76 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Fri, 26 Sep 2025 10:43:01 +0900
Subject: [PATCH v14 2/4] pgbench: Fix assertion failure at using
--verbose-errors in pipeline mode
commandError() is called to report errors when they can be retried, and
it previously assumed that errors are always detected during SQL command
execution. However, in pipeline mode, an error may also be detected when
a \endpipeline meta-command is executed.
This caused an assertion failure. To fix this, it is now assumed that
errors can also be detected in this case.
---
src/bin/pgbench/pgbench.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index f0a405ca129..36c52303a9a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3059,7 +3059,13 @@ commandFailed(CState *st, const char *cmd, const char *message)
static void
commandError(CState *st, const char *message)
{
- Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND);
+ /*
+ * Errors should only be detected during an SQL command or the \endpipeline
+ * meta command. Any other case triggers an assertion failure.
+ */
+ Assert(sql_script[st->use_file].commands[st->command]->type == SQL_COMMAND ||
+ sql_script[st->use_file].commands[st->command]->meta == META_ENDPIPELINE);
+
pg_log_info("client %d got an error in command %d (SQL) of script %d; %s",
st->id, st->command, st->use_file, message);
}
--
2.43.0
v14-0001-pgbench-Do-not-reference-error-message-after-ano.patchtext/x-diff; name=v14-0001-pgbench-Do-not-reference-error-message-after-ano.patchDownload
From 1e9bccb2d43c0d2264133ef239655947c8e0864e Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Fri, 26 Sep 2025 10:41:34 +0900
Subject: [PATCH v14 1/4] pgbench: Do not reference error message after another
PQgetResult() call
Previously, readCommandResponse() accessed the error message
after calling another PQgetResult() to peek at the next result
in order to determine whether the current one was the last.
This caused the error message to be lost in pipeline mode.
Although this issue has never been observed in non-pipeline mode,
referencing an error message after another PQgetResult() call
does not seem like a good idea in general.
Fix this by saving the previous error message and using it for reporting.
---
src/bin/pgbench/pgbench.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 3cafd88ac53..f0a405ca129 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3265,6 +3265,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
PGresult *res;
PGresult *next_res;
int qrynum = 0;
+ char *errmsg;
/*
* varprefix should be set only with \gset or \aset, and \endpipeline and
@@ -3280,6 +3281,9 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
{
bool is_last;
+ /* save the previous error message before peeking at the next result */
+ errmsg = pg_strdup(PQerrorMessage(st->con));
+
/* peek at the next result to know whether the current is last */
next_res = PQgetResult(st->con);
is_last = (next_res == NULL);
@@ -3349,7 +3353,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
st->num_syncs--;
if (st->num_syncs == 0 && PQexitPipelineMode(st->con) != 1)
pg_log_error("client %d failed to exit pipeline mode: %s", st->id,
- PQerrorMessage(st->con));
+ errmsg);
break;
case PGRES_NONFATAL_ERROR:
@@ -3359,7 +3363,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
if (canRetryError(st->estatus))
{
if (verbose_errors)
- commandError(st, PQerrorMessage(st->con));
+ commandError(st, errmsg);
goto error;
}
/* fall through */
@@ -3367,14 +3371,14 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
default:
/* anything else is unexpected */
pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum,
- PQerrorMessage(st->con));
+ st->id, st->use_file, st->command, qrynum, errmsg);
goto error;
}
PQclear(res);
qrynum++;
res = next_res;
+ pg_free(errmsg);
}
if (qrynum == 0)
@@ -3388,6 +3392,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
error:
PQclear(res);
PQclear(next_res);
+ pg_free(errmsg);
do
{
res = PQgetResult(st->con);
--
2.43.0
On Thu, 25 Sep 2025 10:27:44 +0200
Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> wrote:
Hi,
The patch looks good, I've spotted some typos in the doc.
[...]
Thank you for your review.
I've attached the updated patch in my previous post in this thread.
By the way, on the pgsql-hackers list, top-posting is generally discouraged [1],
so replying below the quoted messages is usually preferred.
[1]: https://wiki.postgresql.org/wiki/Mailing_Lists
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Fri, 26 Sep 2025 11:44:42 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
[...]
Fujii-san, thank you for committing the patch that fixes the assertion failure.
I've attached the remaining patches so that cfbot stays green.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
v15-0003-pgbench-Improve-error-messages-for-errors-that-c.patchtext/x-diff; name=v15-0003-pgbench-Improve-error-messages-for-errors-that-c.patchDownload
From 353376bd49fa322c222049fa2fada540b0b7f2b3 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Thu, 10 Jul 2025 17:21:05 +0900
Subject: [PATCH v15 3/3] pgbench: Improve error messages for errors that cause
client abortion
This commit modifies relevant error messages to explicitly indicate that the
client was aborted. As part of this change, pg_log_error was replaced with
commandFailed().
---
src/bin/pgbench/pgbench.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 18bce17a245..9afdf9e6d6c 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3321,8 +3321,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_EMPTY_QUERY: /* may be used for testing no-op overhead */
if (is_last && meta == META_GSET)
{
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, 0);
+ commandFailed(st, "gset", psprintf("expected one row, got %d", 0));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3336,8 +3335,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
if (meta == META_GSET && ntuples != 1)
{
/* under \gset, report the error */
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, PQntuples(res));
+ commandFailed(st, "gset", psprintf("expected one row, got %d", PQntuples(res)));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3351,18 +3349,18 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
for (int fld = 0; fld < PQnfields(res); fld++)
{
char *varname = PQfname(res, fld);
+ char *cmd = (meta == META_ASET ? "aset" : "gset");
/* allocate varname only if necessary, freed below */
if (*varprefix != '\0')
varname = psprintf("%s%s", varprefix, varname);
/* store last row result as a string */
- if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
+ if (!putVariable(&st->variables, cmd, varname,
PQgetvalue(res, ntuples - 1, fld)))
{
/* internal error */
- pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
- st->id, st->use_file, st->command, qrynum, varname);
+ commandFailed(st, cmd, psprintf("error storing into variable %s", varname));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3397,8 +3395,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
default:
/* anything else is unexpected */
- pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum, errmsg);
+ commandFailed(st, "SQL", errmsg);
goto error;
}
--
2.43.0
v15-0002-pgbench-Add-continue-on-error-option.patchtext/x-diff; name=v15-0002-pgbench-Add-continue-on-error-option.patchDownload
From 426e6cb4d711f61a792c3d4ec38e2a07bd59d2ac Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Fri, 19 Sep 2025 16:54:49 +0900
Subject: [PATCH v15 2/3] pgbench: Add --continue-on-error option
When the option is set, client rolls back the failed transaction and starts a
new one when its transaction fails due to the reason other than the deadlock and
serialization failure.
---
doc/src/sgml/ref/pgbench.sgml | 64 ++++++++++++++++----
src/bin/pgbench/pgbench.c | 60 +++++++++++++++---
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 +++++++
3 files changed, 127 insertions(+), 19 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index a5edf612443..0305f4553d3 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
and number of transactions per client); these will be equal unless the run
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
- The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ The next line reports the number of failed transactions (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -790,6 +789,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -914,6 +916,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Allows clients to continue running even if an SQL statement fails due to
+ errors other than serialization or deadlock. Unlike serialization and deadlock
+ failures, clients do not retry the same transactions but proceed to the next
+ transaction. This option is useful when your custom script may raise errors for
+ reasons such as unique constraints violation. Without this option, the
+ client is aborted after such errors.
+ </para>
+ <para>
+ Note that serialization and deadlock failures never cause the client to be
+ aborted even after clients retries <option>--max-tries</option> times by
+ default, so they are not affected by this option.
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
@@ -2408,8 +2430,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2637,6 +2659,16 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got an SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2645,8 +2677,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
</screen>
</para>
@@ -2850,10 +2882,20 @@ statement latencies in milliseconds, failures and retries:
<para>
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
- without completing the last transaction. In addition, if execution of an SQL
- or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
+ without completing the last transaction. The client also aborts
+ if a meta command fails, or if an SQL command fails for reasons other than
+ serialization or deadlock errors when <option>--continue-on-error</option>
+ is not specified. With <option>--continue-on-error</option>,
+ the client does not abort on such SQL errors and instead proceeds to
+ the next transaction. These cases are reported as
+ <literal>other failures</literal> in the output. If the error occurs
+ in a meta command, however, the client still aborts even when this option
+ is specified.
+ </para>
+ <para>
+ If an SQL command fails due to serialization or deadlock errors, the
+ client does not abort, regardless of whether
+ <option>--continue-on-error</option> is used. Instead, the current
transaction is rolled back, which also includes setting the client variables
as they were before the run of this transaction (it is assumed that one
transaction script contains only one transaction; see
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index a84c68705de..18bce17a245 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,8 +402,10 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried transactions.
- * It can be one of two types:
+ * A failed transaction is counted differently depending on whether
+ * the --continue-on-error option is specified.
+ *
+ * Without --continue-on-error:
*
* failed (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
@@ -411,6 +413,13 @@ typedef struct StatsData
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried).
*
+ * With --continue-on-error:
+ *
+ * failed (number of failed transactions) =
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got some other SQL error; the transaction was
+ * not retried and counted as failed due to --continue-on-error).
+ *
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
*
@@ -440,6 +449,11 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure, which
+ * is counted if --continue-on-error is
+ * specified */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +784,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +969,7 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --continue-on-error continue running after an SQL error\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1483,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1533,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3367,7 +3387,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_FATAL_ERROR:
st->estatus = getSQLErrorStatus(PQresultErrorField(res,
PG_DIAG_SQLSTATE));
- if (canRetryError(st->estatus))
+ if (continue_on_error || canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st, errmsg);
@@ -4032,7 +4052,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if (PQstatus(st->con) == CONNECTION_BAD)
+ st->state = CSTATE_ABORTED;
+ else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
+ canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4553,7 +4576,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4573,6 +4597,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4628,6 +4654,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4668,10 +4695,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6310,6 +6339,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6452,7 +6482,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+ * to other than serialization or deadlock errors and --continue-on-error
+ * is not set.
*/
if (total_cnt <= 0)
return;
@@ -6468,6 +6499,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6571,6 +6605,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6730,6 +6768,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7083,6 +7122,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7438,6 +7481,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7dd78940300..3c19a36a005 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1813,6 +1813,28 @@ update counter set i = i+1 returning i \gset
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique);');
+
+$node->pgbench(
+ '-n -t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 1/10\b},
+ qr{other failures: 9\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '001_continue_on_error' => q{
+ INSERT INTO unique_table VALUES(0);
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.43.0
v15-0001-pgbench-Do-not-reference-error-message-after-ano.patchtext/x-diff; name=v15-0001-pgbench-Do-not-reference-error-message-after-ano.patchDownload
From 8e4b9d2489f12d342b3a613466dfa043130e8e5f Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Fri, 26 Sep 2025 10:41:34 +0900
Subject: [PATCH v15 1/3] pgbench: Do not reference error message after another
PQgetResult() call
Previously, readCommandResponse() accessed the error message
after calling another PQgetResult() to peek at the next result
in order to determine whether the current one was the last.
This caused the error message to be lost in pipeline mode.
Although this issue has never been observed in non-pipeline mode,
referencing an error message after another PQgetResult() call
does not seem like a good idea in general.
Fix this by saving the previous error message and using it for reporting.
---
src/bin/pgbench/pgbench.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index cc03af05447..a84c68705de 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3272,6 +3272,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
PGresult *res;
PGresult *next_res;
int qrynum = 0;
+ char *errmsg;
/*
* varprefix should be set only with \gset or \aset, and \endpipeline and
@@ -3287,6 +3288,9 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
{
bool is_last;
+ /* save the previous error message before peek at the next result */
+ errmsg = pg_strdup(PQerrorMessage(st->con));
+
/* peek at the next result to know whether the current is last */
next_res = PQgetResult(st->con);
is_last = (next_res == NULL);
@@ -3356,7 +3360,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
st->num_syncs--;
if (st->num_syncs == 0 && PQexitPipelineMode(st->con) != 1)
pg_log_error("client %d failed to exit pipeline mode: %s", st->id,
- PQerrorMessage(st->con));
+ errmsg);
break;
case PGRES_NONFATAL_ERROR:
@@ -3366,7 +3370,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
if (canRetryError(st->estatus))
{
if (verbose_errors)
- commandError(st, PQerrorMessage(st->con));
+ commandError(st, errmsg);
goto error;
}
/* fall through */
@@ -3374,14 +3378,14 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
default:
/* anything else is unexpected */
pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum,
- PQerrorMessage(st->con));
+ st->id, st->use_file, st->command, qrynum, errmsg);
goto error;
}
PQclear(res);
qrynum++;
res = next_res;
+ pg_free(errmsg);
}
if (qrynum == 0)
@@ -3395,6 +3399,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
error:
PQclear(res);
PQclear(next_res);
+ pg_free(errmsg);
do
{
res = PQgetResult(st->con);
--
2.43.0
On Tue, Sep 30, 2025 at 10:24 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
Fujii-san, thank you for committing the patch that fixes the assertion failure.
I've attached the remaining patches so that cfbot stays green.
Thanks for reattaching the patches!
For 0001, after reading the docs on PQresultErrorMessage(), I wonder if it would
be better to just use that to get the error message. Thoughts?
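For context, the difference between the two libpq calls (a sketch):
```
/* connection-level message: may be overwritten by a later PQgetResult() */
char *conn_msg = PQerrorMessage(st->con);

/* result-level message: tied to this PGresult for its whole lifetime */
char *res_msg = PQresultErrorMessage(res);
```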
Regards,
--
Fujii Masao
On Tue, 30 Sep 2025 13:46:11 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
On Tue, Sep 30, 2025 at 10:24 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
Fujii-san, thank you for committing the patch that fixes the assertion failure.
I've attached the remaining patches so that cfbot stays green.

Thanks for reattaching the patches!
For 0001, after reading the docs on PQresultErrorMessage(), I wonder if it would
be better to just use that to get the error message. Thoughts?
Thank you for your suggestion.
I agree that it is better to use PQresultErrorMessage().
I had overlooked the existence of this interface.
I've attached the updated patches.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
v16-0003-pgbench-Improve-error-messages-for-errors-that-c.patchtext/x-diff; name=v16-0003-pgbench-Improve-error-messages-for-errors-that-c.patchDownload
From 04813c8a3af687fda6bb6141eff8d8e97a0ff52f Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Thu, 10 Jul 2025 17:21:05 +0900
Subject: [PATCH v16 3/3] pgbench: Improve error messages for errors that cause
client abortion
This commit modifies relevant error messages to explicitly indicate that the
client was aborted. As part of this change, pg_log_error was replaced with
commandFailed().
---
src/bin/pgbench/pgbench.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 316d95cc1fe..680283a0122 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3317,8 +3317,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_EMPTY_QUERY: /* may be used for testing no-op overhead */
if (is_last && meta == META_GSET)
{
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, 0);
+ commandFailed(st, "gset", psprintf("expected one row, got %d", 0));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3332,8 +3331,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
if (meta == META_GSET && ntuples != 1)
{
/* under \gset, report the error */
- pg_log_error("client %d script %d command %d query %d: expected one row, got %d",
- st->id, st->use_file, st->command, qrynum, PQntuples(res));
+ commandFailed(st, "gset", psprintf("expected one row, got %d", PQntuples(res)));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3347,18 +3345,18 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
for (int fld = 0; fld < PQnfields(res); fld++)
{
char *varname = PQfname(res, fld);
+ char *cmd = (meta == META_ASET ? "aset" : "gset");
/* allocate varname only if necessary, freed below */
if (*varprefix != '\0')
varname = psprintf("%s%s", varprefix, varname);
/* store last row result as a string */
- if (!putVariable(&st->variables, meta == META_ASET ? "aset" : "gset", varname,
+ if (!putVariable(&st->variables, cmd, varname,
PQgetvalue(res, ntuples - 1, fld)))
{
/* internal error */
- pg_log_error("client %d script %d command %d query %d: error storing into variable %s",
- st->id, st->use_file, st->command, qrynum, varname);
+ commandFailed(st, cmd, psprintf("error storing into variable %s", varname));
st->estatus = ESTATUS_META_COMMAND_ERROR;
goto error;
}
@@ -3393,9 +3391,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
default:
/* anything else is unexpected */
- pg_log_error("client %d script %d aborted in command %d query %d: %s",
- st->id, st->use_file, st->command, qrynum,
- PQresultErrorMessage(res));
+ commandFailed(st, "SQL", PQresultErrorMessage(res));
goto error;
}
--
2.43.0
v16-0002-pgbench-Add-continue-on-error-option.patchtext/x-diff; name=v16-0002-pgbench-Add-continue-on-error-option.patchDownload
From 137c557a27b5af9b8fdbfb9cc31b77e63ce5492b Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Fri, 19 Sep 2025 16:54:49 +0900
Subject: [PATCH v16 2/3] pgbench: Add --continue-on-error option
When the option is set, client rolls back the failed transaction and starts a
new one when its transaction fails due to the reason other than the deadlock and
serialization failure.
---
doc/src/sgml/ref/pgbench.sgml | 64 ++++++++++++++++----
src/bin/pgbench/pgbench.c | 60 +++++++++++++++---
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 +++++++
3 files changed, 127 insertions(+), 19 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index a5edf612443..0305f4553d3 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
and number of transactions per client); these will be equal unless the run
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
- The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ The next line reports the number of failed transactions (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -790,6 +789,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -914,6 +916,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Allows clients to continue running even if an SQL statement fails due to
+ errors other than serialization or deadlock. Unlike serialization and deadlock
+ failures, clients do not retry the same transactions but proceed to the next
+ transaction. This option is useful when your custom script may raise errors for
+ reasons such as unique constraints violation. Without this option, the
+ client is aborted after such errors.
+ </para>
+ <para>
+ Note that serialization and deadlock failures never cause the client to be
+ aborted even after clients retries <option>--max-tries</option> times by
+ default, so they are not affected by this option.
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
@@ -2408,8 +2430,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2637,6 +2659,16 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got an SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2645,8 +2677,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
</screen>
</para>
@@ -2850,10 +2882,20 @@ statement latencies in milliseconds, failures and retries:
<para>
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
- without completing the last transaction. In addition, if execution of an SQL
- or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
+ without completing the last transaction. The client also aborts
+ if a meta command fails, or if an SQL command fails for reasons other than
+ serialization or deadlock errors when <option>--continue-on-error</option>
+ is not specified. With <option>--continue-on-error</option>,
+ the client does not abort on such SQL errors and instead proceeds to
+ the next transaction. These cases are reported as
+ <literal>other failures</literal> in the output. If the error occurs
+ in a meta command, however, the client still aborts even when this option
+ is specified.
+ </para>
+ <para>
+ If an SQL command fails due to serialization or deadlock errors, the
+ client does not abort, regardless of whether
+ <option>--continue-on-error</option> is used. Instead, the current
transaction is rolled back, which also includes setting the client variables
as they were before the run of this transaction (it is assumed that one
transaction script contains only one transaction; see
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 36c6469149e..316d95cc1fe 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,8 +402,10 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried transactions.
- * It can be one of two types:
+ * A failed transaction is counted differently depending on whether
+ * the --continue-on-error option is specified.
+ *
+ * Without --continue-on-error:
*
* failed (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
@@ -411,6 +413,13 @@ typedef struct StatsData
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried).
*
+ * With --continue-on-error:
+ *
+ * failed (number of failed transactions) =
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got some other SQL error; the transaction was
+ * not retried and counted as failed due to --continue-on-error).
+ *
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
*
@@ -440,6 +449,11 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure, which
+ * is counted if --continue-on-error is
+ * specified */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +784,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +969,7 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --continue-on-error continue running after an SQL error\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1483,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1533,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3363,7 +3383,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_FATAL_ERROR:
st->estatus = getSQLErrorStatus(PQresultErrorField(res,
PG_DIAG_SQLSTATE));
- if (canRetryError(st->estatus))
+ if (continue_on_error || canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st, PQresultErrorMessage(res));
@@ -4027,7 +4047,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if (PQstatus(st->con) == CONNECTION_BAD)
+ st->state = CSTATE_ABORTED;
+ else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
+ canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4548,7 +4571,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4568,6 +4592,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4623,6 +4649,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4663,10 +4690,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6305,6 +6334,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6447,7 +6477,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+ * to other than serialization or deadlock errors and --continue-on-error
+ * is not set.
*/
if (total_cnt <= 0)
return;
@@ -6463,6 +6494,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6566,6 +6600,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6725,6 +6763,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7078,6 +7117,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7433,6 +7476,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7dd78940300..3c19a36a005 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1813,6 +1813,28 @@ update counter set i = i+1 returning i \gset
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique);');
+
+$node->pgbench(
+ '-n -t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 1/10\b},
+ qr{other failures: 9\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '001_continue_on_error' => q{
+ INSERT INTO unique_table VALUES(0);
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.43.0
v16-0001-pgbench-Use-PQresultErrorMessage-instead-of-PQer.patchtext/x-diff; name=v16-0001-pgbench-Use-PQresultErrorMessage-instead-of-PQer.patchDownload
From 30cd08bac78448afa18630b3c479b2630b4279ba Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Fri, 26 Sep 2025 10:41:34 +0900
Subject: [PATCH v16 1/3] pgbench: Use PQresultErrorMessage() instead of
PQerrorMessage()
Previously, readCommandResponse() used PQerrorMessage() to get the error
message after calling another PQgetResult() to peek at the next result
in order to determine whether the current one was the last.
This caused the error message to be lost in pipeline mode.
Although this issue has never been observed in non-pipeline mode,
referencing an error message using PQerrorMessage() after another
PQgetResult() call does not seem like a good idea in general.
Fix this by using PQresultErrorMessage() instead.
---
src/bin/pgbench/pgbench.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index cc03af05447..36c6469149e 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3356,7 +3356,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
st->num_syncs--;
if (st->num_syncs == 0 && PQexitPipelineMode(st->con) != 1)
pg_log_error("client %d failed to exit pipeline mode: %s", st->id,
- PQerrorMessage(st->con));
+ PQresultErrorMessage(res));
break;
case PGRES_NONFATAL_ERROR:
@@ -3366,7 +3366,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
if (canRetryError(st->estatus))
{
if (verbose_errors)
- commandError(st, PQerrorMessage(st->con));
+ commandError(st, PQresultErrorMessage(res));
goto error;
}
/* fall through */
@@ -3375,7 +3375,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
/* anything else is unexpected */
pg_log_error("client %d script %d aborted in command %d query %d: %s",
st->id, st->use_file, st->command, qrynum,
- PQerrorMessage(st->con));
+ PQresultErrorMessage(res));
goto error;
}
--
2.43.0
On Tue, Sep 30, 2025 at 3:17 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
On Tue, 30 Sep 2025 13:46:11 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
On Tue, Sep 30, 2025 at 10:24 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
Fujii-san, thank you for committing the patch that fixes the assertion failure.
I've attached the remaining patches so that cfbot stays green.

Thanks for reattaching the patches!
For 0001, after reading the docs on PQresultErrorMessage(), I wonder if it would
be better to just use that to get the error message. Thoughts?

Thank you for your suggestion.
I agree that it is better to use PQresultErrorMessage().
I had overlooked the existence of this interface.

I've attached the updated patches.
Thanks for updating the patches! I've pushed 0001.
Regarding 0002:
- if (canRetryError(st->estatus))
+ if (continue_on_error || canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st, PQresultErrorMessage(res));
goto error;
With this change, even non-SQL errors (e.g., connection failures) would
satisfy the condition when --continue-on-error is set. Isn't that a problem?
Shouldn't we also check that the error status is one that
--continue-on-error is meant to handle?
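For example, the continue path could be restricted to plain SQL errors along these
lines (a rough sketch, not the wording of any posted patch):
```
/* sketch: only retryable errors or, with --continue-on-error, other
 * SQL errors take the error path; anything else aborts the client */
if (canRetryError(st->estatus) ||
    (continue_on_error && st->estatus == ESTATUS_OTHER_SQL_ERROR))
{
    if (verbose_errors)
        commandError(st, PQresultErrorMessage(res));
    goto error;
}
```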
+ * Without --continue-on-error:
*
* failed (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
* successfully retried) +
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried).
*
+ * With --continue-on-error:
+ *
+ * failed (number of failed transactions) =
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got some other SQL error; the transaction was
+ * not retried and counted as failed due to --continue-on-error).
About the comments on failed transactions: I don't think we need
to split them into separate "with/without --continue-on-error" sections.
How about simplifying them like this?
------------------------
* failed (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
* successfully retried) +
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried) +
* 'other_sql_failures' (they failed on the first try or after retries
* due to a SQL error other than serialization or
* deadlock; they are counted as a failed transaction
* only when --continue-on-error is specified).
------------------------
* 'retried' (number of all retried transactions) =
* successfully retried transactions +
* failed transactions.
Since transactions that failed on the first try (i.e., no retries) due to
an SQL error are not counted as 'retried', shouldn't this source comment
be updated?
Regards,
--
Fujii Masao
Hi,
On 2025/10/02 1:22, Fujii Masao wrote:
Regarding 0002:
- if (canRetryError(st->estatus))
+ if (continue_on_error || canRetryError(st->estatus))
{
if (verbose_errors)
commandError(st, PQresultErrorMessage(res));
goto error;

With this change, even non-SQL errors (e.g., connection failures) would
satisfy the condition when --continue-on-error is set. Isn't that a problem?
Shouldn't we also check that the error status is one that
--continue-on-error is meant to handle?
I agree that connection failures should not be ignored even when
--continue-on-error is specified.
For now, I’m not sure if other cases would cause issues, so the updated patch
explicitly checks the connection status and emits an error message when the
connection is lost.
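For reference, the corresponding condition in readCommandResponse() in the attached
v17 patch becomes roughly:
```
case PGRES_FATAL_ERROR:
    st->estatus = getSQLErrorStatus(PQresultErrorField(res,
                                                       PG_DIAG_SQLSTATE));
    /* take the error path only while the connection is still usable */
    if ((continue_on_error || canRetryError(st->estatus)) &&
        PQstatus(st->con) != CONNECTION_BAD)
    {
        if (verbose_errors)
            commandError(st, PQresultErrorMessage(res));
        goto error;
    }
```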
+ * Without --continue-on-error:
 *
 * failed (the number of failed transactions) =
 * 'serialization_failures' (they got a serialization error and were not
 * successfully retried) +
 * 'deadlock_failures' (they got a deadlock error and were not
 * successfully retried).
 *
+ * With --continue-on-error:
+ *
+ * failed (number of failed transactions) =
+ * 'serialization_failures' + 'deadlock_failures' +
+ * 'other_sql_failures' (they got some other SQL error; the transaction was
+ * not retried and counted as failed due to --continue-on-error).

About the comments on failed transactions: I don't think we need
to split them into separate "with/without --continue-on-error" sections.
How about simplifying them like this?
------------------------
* failed (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
* successfully retried) +
* 'deadlock_failures' (they got a deadlock error and were not
* successfully retried) +
* 'other_sql_failures' (they failed on the first try or after retries
* due to a SQL error other than serialization or
* deadlock; they are counted as a failed transaction
* only when --continue-on-error is specified).
------------------------
Thank you for the suggestion. I’ve updated the comments as you proposed.
* 'retried' (number of all retried transactions) =
* successfully retried transactions +
* failed transactions.

Since transactions that failed on the first try
an SQL error are not counted as 'retried', shouldn't this source comment
be updated?
Agreed. I updated the comment so that failed transactions are counted in 'retried' only when they were actually retried.
I've attached the updated patch v17-0002. 0003 remains unchanged.
Best regards,
Rintaro Ikeda
Attachments:
v17-0002-pgbench-Add-continue-on-error-option.patchtext/plain; charset=UTF-8; name=v17-0002-pgbench-Add-continue-on-error-option.patchDownload
From 8ae5be55a2704f813e200917968ae040146486ab Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Fri, 19 Sep 2025 16:54:49 +0900
Subject: [PATCH v17 2/3] pgbench: Add --continue-on-error option
When the option is set, client rolls back the failed transaction and starts a
new one when its transaction fails due to the reason other than the deadlock and
serialization failure.
---
doc/src/sgml/ref/pgbench.sgml | 64 ++++++++++++++++----
src/bin/pgbench/pgbench.c | 63 +++++++++++++++----
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 +++++++
3 files changed, 125 insertions(+), 24 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..63230102357 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
and number of transactions per client); these will be equal unless the run
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
- The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ The next line reports the number of failed transactions (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -790,6 +789,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -914,6 +916,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Allows clients to continue running even if an SQL statement fails due to
+ errors other than serialization or deadlock. Unlike serialization and deadlock
+ failures, clients do not retry the same transactions but proceed to the next
+ transaction. This option is useful when your custom script may raise errors for
+ reasons such as unique constraints violation. Without this option, the
+ client is aborted after such errors.
+ </para>
+ <para>
+ Note that serialization and deadlock failures never cause the client to be
+ aborted even after clients retries <option>--max-tries</option> times by
+ default, so they are not affected by this option.
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
@@ -2409,8 +2431,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2638,6 +2660,16 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got an SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2646,8 +2678,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
</screen>
</para>
@@ -2851,10 +2883,20 @@ statement latencies in milliseconds, failures and retries:
<para>
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
- without completing the last transaction. In addition, if execution of an SQL
- or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
+ without completing the last transaction. The client also aborts
+ if a meta command fails, or if an SQL command fails for reasons other than
+ serialization or deadlock errors when <option>--continue-on-error</option>
+ is not specified. With <option>--continue-on-error</option>,
+ the client does not abort on such SQL errors and instead proceeds to
+ the next transaction. These cases are reported as
+ <literal>other failures</literal> in the output. If the error occurs
+ in a meta command, however, the client still aborts even when this option
+ is specified.
+ </para>
+ <para>
+ If an SQL command fails due to serialization or deadlock errors, the
+ client does not abort, regardless of whether
+ <option>--continue-on-error</option> is used. Instead, the current
transaction is rolled back, which also includes setting the client variables
as they were before the run of this transaction (it is assumed that one
transaction script contains only one transaction; see
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 8656a87d280..7aa4dd0a893 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,14 +402,15 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried transactions.
- * It can be one of two types:
- *
- * failed (the number of failed transactions) =
+ * 'failed' (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
- * successfully retried) +
+ * successfully retried) +
* 'deadlock_failures' (they got a deadlock error and were not
- * successfully retried).
+ * successfully retried) +
+ * 'other_sql_failures' (they failed on the first try or after retries
+ * due to a SQL error other than serialization or
+ * deadlock; they are counted as a failed transaction
+ * only when --continue-on-error is specified).
*
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
@@ -421,7 +422,7 @@ typedef struct StatsData
*
* 'retried' (number of all retried transactions) =
* successfully retried transactions +
- * failed transactions.
+ * unsuccessful retried transactions.
*----------
*/
int64 cnt; /* number of successful transactions, not
@@ -440,6 +441,11 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure, which
+ * is counted if --continue-on-error is
+ * specified */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -770,6 +776,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +961,7 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --continue-on-error continue running after an SQL error\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1475,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1525,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3356,7 +3368,8 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_FATAL_ERROR:
st->estatus = getSQLErrorStatus(PQresultErrorField(res,
PG_DIAG_SQLSTATE));
- if (canRetryError(st->estatus))
+ if ((continue_on_error || canRetryError(st->estatus)) &&
+ PQstatus(st->con) != CONNECTION_BAD)
{
if (verbose_errors)
commandError(st, PQresultErrorMessage(res));
@@ -4020,7 +4033,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if (PQstatus(st->con) == CONNECTION_BAD)
+ st->state = CSTATE_ABORTED;
+ else if ((st->estatus == ESTATUS_OTHER_SQL_ERROR && continue_on_error) ||
+ canRetryError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4541,7 +4557,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4561,6 +4578,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4616,6 +4635,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4656,10 +4676,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6298,6 +6320,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6440,7 +6463,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+ * to other than serialization or deadlock errors and --continue-on-error
+ * is not set.
*/
if (total_cnt <= 0)
return;
@@ -6456,6 +6480,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6559,6 +6586,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6718,6 +6749,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7071,6 +7103,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7426,6 +7462,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7dd78940300..3c19a36a005 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1813,6 +1813,28 @@ update counter set i = i+1 returning i \gset
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique);');
+
+$node->pgbench(
+ '-n -t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 1/10\b},
+ qr{other failures: 9\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '001_continue_on_error' => q{
+ INSERT INTO unique_table VALUES(0);
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.39.5 (Apple Git-154)
On Sun, Oct 19, 2025 at 10:12 PM Rintaro Ikeda
<ikedarintarof@oss.nttdata.com> wrote:
Hi,
On 2025/10/02 1:22, Fujii Masao wrote:
Regarding 0002:
-        if (canRetryError(st->estatus))
+        if (continue_on_error || canRetryError(st->estatus))
         {
             if (verbose_errors)
                 commandError(st, PQresultErrorMessage(res));
             goto error;

With this change, even non-SQL errors (e.g., connection failures) would
satisfy the condition when --continue-on-error is set. Isn't that a problem?
Shouldn't we also check that the error status is one that
--continue-on-error is meant to handle?

I agree that connection failures should not be ignored even when
--continue-on-error is specified.
For now, I’m not sure if other cases would cause issues, so the updated patch
explicitly checks the connection status and emits an error message when the
connection is lost.
I agree that connection failures should prevent further processing even with
--continue-on-error, and pgbench should focus on handling that first.
However, the patch doesn't seem to handle cases where the connection is
terminated by an admin (e.g., via pg_terminate_backend()) correctly.
Please see the following test case, which is the same one I shared earlier:
-----------------------------------------
$ cat pipeline.sql
\startpipeline
DO $$
BEGIN
PERFORM pg_sleep(3);
PERFORM pg_terminate_backend(pg_backend_pid());
END $$;
\endpipeline
$ pgbench -n -f pipeline.sql -c 2 -t 4 -M extended --continue-on-error
-----------------------------------------
In this case, PQstatus() (added in readCommandResponse() by the patch)
still returns CONNECTION_OK (BTW, the SQLSTATE is 57P01 in this case).
As a result, the expected error message like “client ... script ... aborted
in command ...” isn't reported. So the PQstatus() check alone that
the patch added doesn't fully fix the issue.
Regards,
--
Fujii Masao
On Tue, Oct 21, 2025 at 9:58 AM Fujii Masao <masao.fujii@gmail.com> wrote:
I agree that connection failures should prevent further processing even with
--continue-on-error, and pgbench should focus on handling that first.
However, the patch doesn't seem to handle cases where the connection is
terminated by an admin (e.g., via pg_terminate_backend()) correctly.
Please see the following test case, which is the same one I shared earlier:
-----------------------------------------
$ cat pipeline.sql
\startpipeline
DO $$
BEGIN
PERFORM pg_sleep(3);
PERFORM pg_terminate_backend(pg_backend_pid());
END $$;
\endpipeline

$ pgbench -n -f pipeline.sql -c 2 -t 4 -M extended --continue-on-error
-----------------------------------------

In this case, PQstatus() (added in readCommandResponse() by the patch)
still returns CONNECTION_OK (BTW, the SQLSTATE is 57P01 in this case).
As a result, the expected error message like “client ... script ... aborted
in command ...” isn't reported. So the PQstatus() check alone that
the patch added doesn't fully fix the issue.
One approach to address this issue is to keep calling PQgetResult() until
it returns NULL, and then check the connection status when getSQLErrorStatus()
determines the error state. If the connection status is CONNECTION_BAD
at that point, we can treat it as a connection failure and stop processing
even when --continue-on-error is specified. Attached is a WIP patch
implementing this idea based on the v17 patch. It still needs more testing,
review, and possibly documentation updates.
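In condensed form, the idea looks like this (a minimal sketch rather than the patch itself; connectionFailed() is a hypothetical helper, and the pipeline-mode subtleties discussed later in this thread are omitted):
```
#include <libpq-fe.h>
#include <stdbool.h>

/*
 * Drain all pending results, then report whether the connection itself
 * has failed.  Right after an error result is read, PQstatus() can still
 * report CONNECTION_OK; only once libpq has consumed the remaining input
 * does it notice that the server closed the connection.
 */
static bool
connectionFailed(PGconn *conn)
{
    PGresult *res;

    while ((res = PQgetResult(conn)) != NULL)
        PQclear(res);

    return PQstatus(conn) == CONNECTION_BAD;
}
```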
Another option would be to explicitly list all SQLSTATE codes (e.g., 57P01)
that should prevent continued processing, even with --continue-on-error,
inside getSQLErrorStatus(). However, maintaining such a list would be
cumbersome, so I believe the first approach is preferable. Thoughts?
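For reference, such a list-based check might look like the sketch below. The codes other than 57P01 are only examples of what would need to be enumerated and kept in sync with the server, which is the maintenance burden mentioned above:
```
#include <stdbool.h>
#include <string.h>

/* Hypothetical list of SQLSTATEs that should still abort the client. */
#define ERRCODE_ADMIN_SHUTDOWN      "57P01"    /* e.g., pg_terminate_backend() */
#define ERRCODE_CRASH_SHUTDOWN      "57P02"
#define ERRCODE_CANNOT_CONNECT_NOW  "57P03"

static bool
isFatalSqlState(const char *sqlState)
{
    return sqlState != NULL &&
        (strcmp(sqlState, ERRCODE_ADMIN_SHUTDOWN) == 0 ||
         strcmp(sqlState, ERRCODE_CRASH_SHUTDOWN) == 0 ||
         strcmp(sqlState, ERRCODE_CANNOT_CONNECT_NOW) == 0);
}
```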
Regards,
--
Fujii Masao
Attachments:
v18-0001-pgbench-Add-continue-on-error-option.patchapplication/octet-stream; name=v18-0001-pgbench-Add-continue-on-error-option.patchDownload
From 7fc0d4de78fa3873068e9fc4672a97b9c0181686 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Mon, 27 Oct 2025 15:23:11 +0900
Subject: [PATCH v18] pgbench: Add --continue-on-error option
When this option is set, the client rolls back the failed transaction and
starts a new one when the transaction fails for a reason other than a
deadlock or serialization failure.
---
doc/src/sgml/ref/pgbench.sgml | 64 +++++++++--
src/bin/pgbench/pgbench.c | 108 +++++++++++++++----
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 ++++
3 files changed, 161 insertions(+), 33 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index a5edf612443..0305f4553d3 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
and number of transactions per client); these will be equal unless the run
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
- The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ The next line reports the number of failed transactions (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -790,6 +789,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -914,6 +916,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Allows clients to continue running even if an SQL statement fails due to
+ errors other than serialization or deadlock. Unlike serialization and deadlock
+ failures, clients do not retry the same transactions but proceed to the next
+ transaction. This option is useful when your custom script may raise errors for
+ reasons such as unique constraint violations. Without this option, the
+ client is aborted after such errors.
+ </para>
+ <para>
+ Note that serialization and deadlock failures never cause the client to be
+ aborted even after the client retries <option>--max-tries</option> times by
+ default, so they are not affected by this option.
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
@@ -2408,8 +2430,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2637,6 +2659,16 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got an SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2645,8 +2677,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
</screen>
</para>
@@ -2850,10 +2882,20 @@ statement latencies in milliseconds, failures and retries:
<para>
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
- without completing the last transaction. In addition, if execution of an SQL
- or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
+ without completing the last transaction. The client also aborts
+ if a meta command fails, or if an SQL command fails for reasons other than
+ serialization or deadlock errors when <option>--continue-on-error</option>
+ is not specified. With <option>--continue-on-error</option>,
+ the client does not abort on such SQL errors and instead proceeds to
+ the next transaction. These cases are reported as
+ <literal>other failures</literal> in the output. If the error occurs
+ in a meta command, however, the client still aborts even when this option
+ is specified.
+ </para>
+ <para>
+ If an SQL command fails due to serialization or deadlock errors, the
+ client does not abort, regardless of whether
+ <option>--continue-on-error</option> is used. Instead, the current
transaction is rolled back, which also includes setting the client variables
as they were before the run of this transaction (it is assumed that one
transaction script contains only one transaction; see
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 1515ed405ba..dd8c2b2748e 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,14 +402,15 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried transactions.
- * It can be one of two types:
- *
- * failed (the number of failed transactions) =
+ * 'failed' (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
- * successfully retried) +
+ * successfully retried) +
* 'deadlock_failures' (they got a deadlock error and were not
- * successfully retried).
+ * successfully retried) +
+ * 'other_sql_failures' (they failed on the first try or after retries
+ * due to a SQL error other than serialization or
+ * deadlock; they are counted as a failed transaction
+ * only when --continue-on-error is specified).
*
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
@@ -421,7 +422,7 @@ typedef struct StatsData
*
* 'retried' (number of all retried transactions) =
* successfully retried transactions +
- * failed transactions.
+ * unsuccessfully retried transactions.
*----------
*/
int64 cnt; /* number of successful transactions, not
@@ -440,6 +441,11 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure, which
+ * is counted if --continue-on-error is
+ * specified */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -462,6 +468,7 @@ typedef enum EStatus
ESTATUS_SERIALIZATION_ERROR,
ESTATUS_DEADLOCK_ERROR,
ESTATUS_OTHER_SQL_ERROR,
+ ESTATUS_CONN_ERROR,
} EStatus;
/*
@@ -770,6 +777,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -954,6 +962,7 @@ usage(void)
" --log-prefix=PREFIX prefix for transaction time log file\n"
" (default: \"pgbench_log\")\n"
" --max-tries=NUM max number of tries to run transaction (default: 1)\n"
+ " --continue-on-error continue running after an SQL error\n"
" --progress-timestamp use Unix epoch timestamps for progress\n"
" --random-seed=SEED set random seed (\"time\", \"rand\", integer)\n"
" --sampling-rate=NUM fraction of transactions to log (e.g., 0.01 for 1%%)\n"
@@ -1467,6 +1476,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1526,11 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
+ case ESTATUS_CONN_ERROR:
+ break; /* don't count connection failures */
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3231,11 +3246,30 @@ sendCommand(CState *st, Command *command)
}
/*
- * Get the error status from the error code.
+ * Read and discard all available results from the connection.
+ */
+static void
+discardAvailableResults(CState *st)
+{
+ PGresult *res;
+
+ do
+ {
+ res = PQgetResult(st->con);
+ PQclear(res);
+ } while (res);
+}
+
+/*
+ * Determine the error status based on the connection status and error code.
*/
static EStatus
-getSQLErrorStatus(const char *sqlState)
+getSQLErrorStatus(CState *st, const char *sqlState)
{
+ discardAvailableResults(st);
+ if (PQstatus(st->con) == CONNECTION_BAD)
+ return ESTATUS_CONN_ERROR;
+
if (sqlState != NULL)
{
if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
@@ -3257,6 +3291,17 @@ canRetryError(EStatus estatus)
estatus == ESTATUS_DEADLOCK_ERROR);
}
+/*
+ * Returns true if --continue-on-error is specified and this error allows
+ * processing to continue.
+ */
+static bool
+canContinueOnError(EStatus estatus)
+{
+ return (continue_on_error &&
+ estatus == ESTATUS_OTHER_SQL_ERROR);
+}
+
/*
* Process query response from the backend.
*
@@ -3375,9 +3420,9 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_NONFATAL_ERROR:
case PGRES_FATAL_ERROR:
- st->estatus = getSQLErrorStatus(PQresultErrorField(res,
- PG_DIAG_SQLSTATE));
- if (canRetryError(st->estatus))
+ st->estatus = getSQLErrorStatus(st, PQresultErrorField(res,
+ PG_DIAG_SQLSTATE));
+ if (canRetryError(st->estatus) || canContinueOnError(st->estatus))
{
if (verbose_errors)
commandError(st, PQresultErrorMessage(res));
@@ -3409,11 +3454,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
error:
PQclear(res);
PQclear(next_res);
- do
- {
- res = PQgetResult(st->con);
- PQclear(res);
- } while (res);
+ discardAvailableResults(st);
return false;
}
@@ -4041,7 +4082,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if (canRetryError(st->estatus) || canContinueOnError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4562,7 +4603,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4582,6 +4624,10 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
+ case ESTATUS_CONN_ERROR:
+ return "connection";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4637,6 +4683,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4677,10 +4724,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6319,6 +6368,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6461,7 +6511,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+ * to other than serialization or deadlock errors and --continue-on-error
+ * is not set.
*/
if (total_cnt <= 0)
return;
@@ -6477,6 +6528,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6580,6 +6634,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6739,6 +6797,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7092,6 +7151,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7447,6 +7510,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index f820e88abe4..581e9af7907 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1835,6 +1835,28 @@ $node->pgbench(
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique);');
+
+$node->pgbench(
+ '-n -t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 1/10\b},
+ qr{other failures: 9\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '001_continue_on_error' => q{
+ INSERT INTO unique_table VALUES(0);
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.50.1
On Mon, Oct 27, 2025 at 6:13 PM Fujii Masao <masao.fujii@gmail.com> wrote:
One approach to address this issue is to keep calling PQgetResult() until
it returns NULL, and then check the connection status when getSQLErrorStatus()
determines the error state. If the connection status is CONNECTION_BAD
at that point, we can treat it as a connection failure and stop processing
even when --continue-on-error is specified. Attached is a WIP patch
implementing this idea based on the v17 patch. It still needs more testing,
review, and possibly documentation updates.

Another option would be to explicitly list all SQLSTATE codes (e.g., 57P01)
that should prevent continued processing, even with --continue-on-error,
inside getSQLErrorStatus(). However, maintaining such a list would be
cumbersome, so I believe the first approach is preferable. Thoughts?

Nagata-san let me know off-list that there was a case where the previous
patch didn't work correctly in pipeline mode. I've updated the patch so that
--continue-on-error now works properly in that mode, and also revised
the commit message. Updated patch attached.
Regards,
--
Fujii Masao
Attachments:
v19-0001-pgbench-Add-continue-on-error-option.patchapplication/octet-stream; name=v19-0001-pgbench-Add-continue-on-error-option.patchDownload
From 92e1ea78e898c2df11bc505216e059f6c9d3714b Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Tue, 28 Oct 2025 10:33:39 +0900
Subject: [PATCH v19] pgbench: Add --continue-on-error option.
This commit adds the --continue-on-error option, allowing pgbench clients
to continue running even when SQL statements fail for reasons other than
serialization or deadlock errors. Without this option (by default),
the clients aborted in such cases, which was the only available behavior
previously.
This option is useful for benchmarks using custom scripts that may
raise errors, such as unique constraint violations, where users want
pgbench to complete the run despite individual statement failures.
Author: Rintaro Ikeda <ikedarintarof@oss.nttdata.com>
Co-authored-by: Yugo Nagata <nagata@sraoss.co.jp>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Stepan Neretin <slpmcf@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/44334231a4d214fac382a69cceb7d9fc@oss.nttdata.com
---
doc/src/sgml/ref/pgbench.sgml | 64 ++++++++--
src/bin/pgbench/pgbench.c | 117 +++++++++++++++----
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 ++++
3 files changed, 170 insertions(+), 33 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index a5edf612443..0305f4553d3 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
and number of transactions per client); these will be equal unless the run
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
- The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ The next line reports the number of failed transactions (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -790,6 +789,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -914,6 +916,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Allows clients to continue running even if an SQL statement fails due to
+ errors other than serialization or deadlock. Unlike serialization and deadlock
+ failures, clients do not retry the same transactions but proceed to the next
+ transaction. This option is useful when your custom script may raise errors for
+ reasons such as unique constraint violations. Without this option, the
+ client is aborted after such errors.
+ </para>
+ <para>
+ Note that serialization and deadlock failures never cause the client to be
+ aborted even after the client retries <option>--max-tries</option> times by
+ default, so they are not affected by this option.
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</para>
@@ -2408,8 +2430,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2637,6 +2659,16 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got an SQL error
+ (zero unless <option>--failures-detailed</option> is specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2645,8 +2677,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
</screen>
</para>
@@ -2850,10 +2882,20 @@ statement latencies in milliseconds, failures and retries:
<para>
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
- without completing the last transaction. In addition, if execution of an SQL
- or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
+ without completing the last transaction. The client also aborts
+ if a meta command fails, or if an SQL command fails for reasons other than
+ serialization or deadlock errors when <option>--continue-on-error</option>
+ is not specified. With <option>--continue-on-error</option>,
+ the client does not abort on such SQL errors and instead proceeds to
+ the next transaction. These cases are reported as
+ <literal>other failures</literal> in the output. If the error occurs
+ in a meta command, however, the client still aborts even when this option
+ is specified.
+ </para>
+ <para>
+ If an SQL command fails due to serialization or deadlock errors, the
+ client does not abort, regardless of whether
+ <option>--continue-on-error</option> is used. Instead, the current
transaction is rolled back, which also includes setting the client variables
as they were before the run of this transaction (it is assumed that one
transaction script contains only one transaction; see
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 1515ed405ba..d8764ba6fe0 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,14 +402,15 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried transactions.
- * It can be one of two types:
- *
- * failed (the number of failed transactions) =
+ * 'failed' (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
- * successfully retried) +
+ * successfully retried) +
* 'deadlock_failures' (they got a deadlock error and were not
- * successfully retried).
+ * successfully retried) +
+ * 'other_sql_failures' (they failed on the first try or after retries
+ * due to a SQL error other than serialization or
+ * deadlock; they are counted as a failed transaction
+ * only when --continue-on-error is specified).
*
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
@@ -421,7 +422,7 @@ typedef struct StatsData
*
* 'retried' (number of all retried transactions) =
* successfully retried transactions +
- * failed transactions.
+ * unsuccessfully retried transactions.
*----------
*/
int64 cnt; /* number of successful transactions, not
@@ -440,6 +441,11 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure, which
+ * is counted if --continue-on-error is
+ * specified */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -457,6 +463,7 @@ typedef enum EStatus
{
ESTATUS_NO_ERROR = 0,
ESTATUS_META_COMMAND_ERROR,
+ ESTATUS_CONN_ERROR,
/* SQL errors */
ESTATUS_SERIALIZATION_ERROR,
@@ -770,6 +777,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -949,6 +957,7 @@ usage(void)
" -T, --time=NUM duration of benchmark test in seconds\n"
" -v, --vacuum-all vacuum all four standard tables before tests\n"
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
+ " --continue-on-error continue running after an SQL error\n"
" --exit-on-abort exit when any client is aborted\n"
" --failures-detailed report the failures grouped by basic types\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
@@ -1467,6 +1476,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1526,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3231,11 +3244,43 @@ sendCommand(CState *st, Command *command)
}
/*
- * Get the error status from the error code.
+ * Read and discard all available results from the connection.
+ */
+static void
+discardAvailableResults(CState *st)
+{
+ PGresult *res = NULL;
+
+ for (;;)
+ {
+ res = PQgetResult(st->con);
+
+ /*
+ * Read and discard results until PQgetResult() returns NULL (no more
+ * results) or a connection failure is detected. If the pipeline
+ * status is PQ_PIPELINE_ABORTED, more results may still be available
+ * even after PQgetResult() returns NULL, so continue reading in that
+ * case.
+ */
+ if ((res == NULL && PQpipelineStatus(st->con) != PQ_PIPELINE_ABORTED) ||
+ PQstatus(st->con) == CONNECTION_BAD)
+ break;
+
+ PQclear(res);
+ }
+ PQclear(res);
+}
+
+/*
+ * Determine the error status based on the connection status and error code.
*/
static EStatus
-getSQLErrorStatus(const char *sqlState)
+getSQLErrorStatus(CState *st, const char *sqlState)
{
+ discardAvailableResults(st);
+ if (PQstatus(st->con) == CONNECTION_BAD)
+ return ESTATUS_CONN_ERROR;
+
if (sqlState != NULL)
{
if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
@@ -3257,6 +3302,17 @@ canRetryError(EStatus estatus)
estatus == ESTATUS_DEADLOCK_ERROR);
}
+/*
+ * Returns true if --continue-on-error is specified and this error allows
+ * processing to continue.
+ */
+static bool
+canContinueOnError(EStatus estatus)
+{
+ return (continue_on_error &&
+ estatus == ESTATUS_OTHER_SQL_ERROR);
+}
+
/*
* Process query response from the backend.
*
@@ -3375,9 +3431,9 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_NONFATAL_ERROR:
case PGRES_FATAL_ERROR:
- st->estatus = getSQLErrorStatus(PQresultErrorField(res,
- PG_DIAG_SQLSTATE));
- if (canRetryError(st->estatus))
+ st->estatus = getSQLErrorStatus(st, PQresultErrorField(res,
+ PG_DIAG_SQLSTATE));
+ if (canRetryError(st->estatus) || canContinueOnError(st->estatus))
{
if (verbose_errors)
commandError(st, PQresultErrorMessage(res));
@@ -3409,11 +3465,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
error:
PQclear(res);
PQclear(next_res);
- do
- {
- res = PQgetResult(st->con);
- PQclear(res);
- } while (res);
+ discardAvailableResults(st);
return false;
}
@@ -4041,7 +4093,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if (canRetryError(st->estatus) || canContinueOnError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4562,7 +4614,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4582,6 +4635,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4637,6 +4692,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4677,10 +4733,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6319,6 +6377,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6461,7 +6520,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+ * to other than serialization or deadlock errors and --continue-on-error
+ * is not set.
*/
if (total_cnt <= 0)
return;
@@ -6477,6 +6537,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6580,6 +6643,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6739,6 +6806,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7092,6 +7160,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7447,6 +7519,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index f820e88abe4..581e9af7907 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1835,6 +1835,28 @@ $node->pgbench(
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique);');
+
+$node->pgbench(
+ '-n -t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 1/10\b},
+ qr{other failures: 9\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '001_continue_on_error' => q{
+ INSERT INTO unique_table VALUES(0);
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.50.1
On Wed, Oct 29, 2025 at 1:00 AM Fujii Masao <masao.fujii@gmail.com> wrote:
On Mon, Oct 27, 2025 at 6:13 PM Fujii Masao <masao.fujii@gmail.com> wrote:
One approach to address this issue is to keep calling PQgetResult() until
it returns NULL, and then check the connection status when getSQLErrorStatus()
determines the error state. If the connection status is CONNECTION_BAD
at that point, we can treat it as a connection failure and stop processing
even when --continue-on-error is specified. Attached is a WIP patch
implementing this idea based on the v17 patch. It still needs more testing,
review, and possibly documentation updates.

Another option would be to explicitly list all SQLSTATE codes (e.g., 57P01)
that should prevent continued processing, even with --continue-on-error,
inside getSQLErrorStatus(). However, maintaining such a list would be
cumbersome, so I believe the first approach is preferable. Thoughts?

Nagata-san let me know off-list that there was a case where the previous
patch didn't work correctly in pipeline mode. I've updated the patch so that
--continue-on-error now works properly in that mode, and also revised
the commit message. Updated patch attached.
In the v19 patch, the description of --continue-on-error was placed right after
--verbose-errors in the docs. Since pgbench long option descriptions are listed
in alphabetical order, I've moved it to follow --aggregate-interval instead.
I've also refined the wording of the --continue-on-error description.
Attached is the updated patch. Unless there are any objections, I will
commit it.
Regards,
--
Fujii Masao
Attachments:
v20-0001-pgbench-Add-continue-on-error-option.patchapplication/octet-stream; name=v20-0001-pgbench-Add-continue-on-error-option.patchDownload
From e8edd13874145bf9d888b1ba6a3e9154a25ba4fe Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Wed, 5 Nov 2025 13:57:18 +0900
Subject: [PATCH v20] pgbench: Add --continue-on-error option.
This commit adds the --continue-on-error option, allowing pgbench clients
to continue running even when SQL statements fail for reasons other than
serialization or deadlock errors. Without this option (by default),
the clients aborted in such cases, which was the only available behavior
previously.
This option is useful for benchmarks using custom scripts that may
raise errors, such as unique constraint violations, where users want
pgbench to complete the run despite individual statement failures.
Author: Rintaro Ikeda <ikedarintarof@oss.nttdata.com>
Co-authored-by: Yugo Nagata <nagata@sraoss.co.jp>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Stepan Neretin <slpmcf@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/44334231a4d214fac382a69cceb7d9fc@oss.nttdata.com
---
doc/src/sgml/ref/pgbench.sgml | 65 +++++++++--
src/bin/pgbench/pgbench.c | 117 +++++++++++++++----
src/bin/pgbench/t/001_pgbench_with_server.pl | 22 ++++
3 files changed, 171 insertions(+), 33 deletions(-)
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index a5edf612443..ecfc3d2f2b7 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -76,9 +76,8 @@ tps = 896.967014 (without initial connection time)
and number of transactions per client); these will be equal unless the run
failed before completion or some SQL command(s) failed. (In
<option>-T</option> mode, only the actual number of transactions is printed.)
- The next line reports the number of failed transactions due to
- serialization or deadlock errors (see <xref linkend="failures-and-retries"/>
- for more information).
+ The next line reports the number of failed transactions (see
+ <xref linkend="failures-and-retries"/> for more information).
The last line reports the number of transactions per second.
</para>
@@ -759,6 +758,26 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
</listitem>
</varlistentry>
+ <varlistentry id="pgbench-option-continue-on-error">
+ <term><option>--continue-on-error</option></term>
+ <listitem>
+ <para>
+ Allows clients to continue running even if an SQL statement fails
+ due to errors other than serialization or deadlock. By default,
+ clients abort after such errors, but with this option enabled,
+ they proceed to the next transaction instead. Note that
+ clients still abort even with this option if an error causes
+ the connection to fail.
+ See <xref linkend="failures-and-retries"/> for more information.
+ </para>
+ <para>
+ This option is useful when your custom script may raise errors
+ such as unique constraint violations, but you want the benchmark
+ to continue and measure performance including those failures.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="pgbench-option-exit-on-abort">
<term><option>--exit-on-abort</option></term>
<listitem>
@@ -790,6 +809,9 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
<listitem>
<para>deadlock failures;</para>
</listitem>
+ <listitem>
+ <para>other failures;</para>
+ </listitem>
</itemizedlist>
See <xref linkend="failures-and-retries"/> for more information.
</para>
@@ -2408,8 +2430,8 @@ END;
will be reported as <literal>failed</literal>. If you use the
<option>--failures-detailed</option> option, the
<replaceable>time</replaceable> of the failed transaction will be reported as
- <literal>serialization</literal> or
- <literal>deadlock</literal> depending on the type of failure (see
+ <literal>serialization</literal>, <literal>deadlock</literal>, or
+ <literal>other</literal> depending on the type of failure (see
<xref linkend="failures-and-retries"/> for more information).
</para>
@@ -2637,6 +2659,17 @@ END;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><replaceable>other_sql_failures</replaceable></term>
+ <listitem>
+ <para>
+ number of transactions that got an SQL error
+ (zero unless both <option>--failures-detailed</option> and
+ <option>--continue-on-error</option> are specified)
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
@@ -2645,8 +2678,8 @@ END;
<screen>
<userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput>
-1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0
-1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0
+1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 0
+1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 0
</screen>
</para>
@@ -2850,10 +2883,20 @@ statement latencies in milliseconds, failures and retries:
<para>
A client's run is aborted in case of a serious error; for example, the
connection with the database server was lost or the end of script was reached
- without completing the last transaction. In addition, if execution of an SQL
- or meta command fails for reasons other than serialization or deadlock errors,
- the client is aborted. Otherwise, if an SQL command fails with serialization or
- deadlock errors, the client is not aborted. In such cases, the current
+ without completing the last transaction. The client also aborts
+ if a meta command fails, or if an SQL command fails for reasons other than
+ serialization or deadlock errors when <option>--continue-on-error</option>
+ is not specified. With <option>--continue-on-error</option>,
+ the client does not abort on such SQL errors and instead proceeds to
+ the next transaction. These cases are reported as
+ <literal>other failures</literal> in the output. If the error occurs
+ in a meta command, however, the client still aborts even when this option
+ is specified.
+ </para>
+ <para>
+ If an SQL command fails due to serialization or deadlock errors, the
+ client does not abort, regardless of whether
+ <option>--continue-on-error</option> is used. Instead, the current
transaction is rolled back, which also includes setting the client variables
as they were before the run of this transaction (it is assumed that one
transaction script contains only one transaction; see
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 1515ed405ba..d8764ba6fe0 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -402,14 +402,15 @@ typedef struct StatsData
* directly successful transactions (they were successfully completed on
* the first try).
*
- * A failed transaction is defined as unsuccessfully retried transactions.
- * It can be one of two types:
- *
- * failed (the number of failed transactions) =
+ * 'failed' (the number of failed transactions) =
* 'serialization_failures' (they got a serialization error and were not
- * successfully retried) +
+ * successfully retried) +
* 'deadlock_failures' (they got a deadlock error and were not
- * successfully retried).
+ * successfully retried) +
+ * 'other_sql_failures' (they failed on the first try or after retries
+ * due to a SQL error other than serialization or
+ * deadlock; they are counted as a failed transaction
+ * only when --continue-on-error is specified).
*
* If the transaction was retried after a serialization or a deadlock
* error this does not guarantee that this retry was successful. Thus
@@ -421,7 +422,7 @@ typedef struct StatsData
*
* 'retried' (number of all retried transactions) =
* successfully retried transactions +
- * failed transactions.
+ * unsuccessfully retried transactions.
*----------
*/
int64 cnt; /* number of successful transactions, not
@@ -440,6 +441,11 @@ typedef struct StatsData
int64 deadlock_failures; /* number of transactions that were not
* successfully retried after a deadlock
* error */
+ int64 other_sql_failures; /* number of failed transactions for
+ * reasons other than
+ * serialization/deadlock failure, which
+ * is counted if --continue-on-error is
+ * specified */
SimpleStats latency;
SimpleStats lag;
} StatsData;
@@ -457,6 +463,7 @@ typedef enum EStatus
{
ESTATUS_NO_ERROR = 0,
ESTATUS_META_COMMAND_ERROR,
+ ESTATUS_CONN_ERROR,
/* SQL errors */
ESTATUS_SERIALIZATION_ERROR,
@@ -770,6 +777,7 @@ static int64 total_weight = 0;
static bool verbose_errors = false; /* print verbose messages of all errors */
static bool exit_on_abort = false; /* exit when any client is aborted */
+static bool continue_on_error = false; /* continue after errors */
/* Builtin test scripts */
typedef struct BuiltinScript
@@ -949,6 +957,7 @@ usage(void)
" -T, --time=NUM duration of benchmark test in seconds\n"
" -v, --vacuum-all vacuum all four standard tables before tests\n"
" --aggregate-interval=NUM aggregate data over NUM seconds\n"
+ " --continue-on-error continue running after an SQL error\n"
" --exit-on-abort exit when any client is aborted\n"
" --failures-detailed report the failures grouped by basic types\n"
" --log-prefix=PREFIX prefix for transaction time log file\n"
@@ -1467,6 +1476,7 @@ initStats(StatsData *sd, pg_time_usec_t start)
sd->retried = 0;
sd->serialization_failures = 0;
sd->deadlock_failures = 0;
+ sd->other_sql_failures = 0;
initSimpleStats(&sd->latency);
initSimpleStats(&sd->lag);
}
@@ -1516,6 +1526,9 @@ accumStats(StatsData *stats, bool skipped, double lat, double lag,
case ESTATUS_DEADLOCK_ERROR:
stats->deadlock_failures++;
break;
+ case ESTATUS_OTHER_SQL_ERROR:
+ stats->other_sql_failures++;
+ break;
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -3231,11 +3244,43 @@ sendCommand(CState *st, Command *command)
}
/*
- * Get the error status from the error code.
+ * Read and discard all available results from the connection.
+ */
+static void
+discardAvailableResults(CState *st)
+{
+ PGresult *res = NULL;
+
+ for (;;)
+ {
+ res = PQgetResult(st->con);
+
+ /*
+ * Read and discard results until PQgetResult() returns NULL (no more
+ * results) or a connection failure is detected. If the pipeline
+ * status is PQ_PIPELINE_ABORTED, more results may still be available
+ * even after PQgetResult() returns NULL, so continue reading in that
+ * case.
+ */
+ if ((res == NULL && PQpipelineStatus(st->con) != PQ_PIPELINE_ABORTED) ||
+ PQstatus(st->con) == CONNECTION_BAD)
+ break;
+
+ PQclear(res);
+ }
+ PQclear(res);
+}
+
+/*
+ * Determine the error status based on the connection status and error code.
*/
static EStatus
-getSQLErrorStatus(const char *sqlState)
+getSQLErrorStatus(CState *st, const char *sqlState)
{
+ discardAvailableResults(st);
+ if (PQstatus(st->con) == CONNECTION_BAD)
+ return ESTATUS_CONN_ERROR;
+
if (sqlState != NULL)
{
if (strcmp(sqlState, ERRCODE_T_R_SERIALIZATION_FAILURE) == 0)
@@ -3257,6 +3302,17 @@ canRetryError(EStatus estatus)
estatus == ESTATUS_DEADLOCK_ERROR);
}
+/*
+ * Returns true if --continue-on-error is specified and this error allows
+ * processing to continue.
+ */
+static bool
+canContinueOnError(EStatus estatus)
+{
+ return (continue_on_error &&
+ estatus == ESTATUS_OTHER_SQL_ERROR);
+}
+
/*
* Process query response from the backend.
*
@@ -3375,9 +3431,9 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_NONFATAL_ERROR:
case PGRES_FATAL_ERROR:
- st->estatus = getSQLErrorStatus(PQresultErrorField(res,
- PG_DIAG_SQLSTATE));
- if (canRetryError(st->estatus))
+ st->estatus = getSQLErrorStatus(st, PQresultErrorField(res,
+ PG_DIAG_SQLSTATE));
+ if (canRetryError(st->estatus) || canContinueOnError(st->estatus))
{
if (verbose_errors)
commandError(st, PQresultErrorMessage(res));
@@ -3409,11 +3465,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
error:
PQclear(res);
PQclear(next_res);
- do
- {
- res = PQgetResult(st->con);
- PQclear(res);
- } while (res);
+ discardAvailableResults(st);
return false;
}
@@ -4041,7 +4093,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
if (PQpipelineStatus(st->con) != PQ_PIPELINE_ON)
st->state = CSTATE_END_COMMAND;
}
- else if (canRetryError(st->estatus))
+ else if (canRetryError(st->estatus) || canContinueOnError(st->estatus))
st->state = CSTATE_ERROR;
else
st->state = CSTATE_ABORTED;
@@ -4562,7 +4614,8 @@ static int64
getFailures(const StatsData *stats)
{
return (stats->serialization_failures +
- stats->deadlock_failures);
+ stats->deadlock_failures +
+ stats->other_sql_failures);
}
/*
@@ -4582,6 +4635,8 @@ getResultString(bool skipped, EStatus estatus)
return "serialization";
case ESTATUS_DEADLOCK_ERROR:
return "deadlock";
+ case ESTATUS_OTHER_SQL_ERROR:
+ return "other";
default:
/* internal error which should never occur */
pg_fatal("unexpected error status: %d", estatus);
@@ -4637,6 +4692,7 @@ doLog(TState *thread, CState *st,
int64 skipped = 0;
int64 serialization_failures = 0;
int64 deadlock_failures = 0;
+ int64 other_sql_failures = 0;
int64 retried = 0;
int64 retries = 0;
@@ -4677,10 +4733,12 @@ doLog(TState *thread, CState *st,
{
serialization_failures = agg->serialization_failures;
deadlock_failures = agg->deadlock_failures;
+ other_sql_failures = agg->other_sql_failures;
}
- fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
+ fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
serialization_failures,
- deadlock_failures);
+ deadlock_failures,
+ other_sql_failures);
fputc('\n', logfile);
@@ -6319,6 +6377,7 @@ printProgressReport(TState *threads, int64 test_start, pg_time_usec_t now,
cur.serialization_failures +=
threads[i].stats.serialization_failures;
cur.deadlock_failures += threads[i].stats.deadlock_failures;
+ cur.other_sql_failures += threads[i].stats.other_sql_failures;
}
/* we count only actually executed transactions */
@@ -6461,7 +6520,8 @@ printResults(StatsData *total,
/*
* Remaining stats are nonsensical if we failed to execute any xacts due
- * to others than serialization or deadlock errors
+ * to other than serialization or deadlock errors and --continue-on-error
+ * is not set.
*/
if (total_cnt <= 0)
return;
@@ -6477,6 +6537,9 @@ printResults(StatsData *total,
printf("number of deadlock failures: " INT64_FORMAT " (%.3f%%)\n",
total->deadlock_failures,
100.0 * total->deadlock_failures / total_cnt);
+ printf("number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ total->other_sql_failures,
+ 100.0 * total->other_sql_failures / total_cnt);
}
/* it can be non-zero only if max_tries is not equal to one */
@@ -6580,6 +6643,10 @@ printResults(StatsData *total,
sstats->deadlock_failures,
(100.0 * sstats->deadlock_failures /
script_total_cnt));
+ printf(" - number of other failures: " INT64_FORMAT " (%.3f%%)\n",
+ sstats->other_sql_failures,
+ (100.0 * sstats->other_sql_failures /
+ script_total_cnt));
}
/*
@@ -6739,6 +6806,7 @@ main(int argc, char **argv)
{"verbose-errors", no_argument, NULL, 15},
{"exit-on-abort", no_argument, NULL, 16},
{"debug", no_argument, NULL, 17},
+ {"continue-on-error", no_argument, NULL, 18},
{NULL, 0, NULL, 0}
};
@@ -7092,6 +7160,10 @@ main(int argc, char **argv)
case 17: /* debug */
pg_logging_increase_verbosity();
break;
+ case 18: /* continue-on-error */
+ benchmarking_option_set = true;
+ continue_on_error = true;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7447,6 +7519,7 @@ main(int argc, char **argv)
stats.retried += thread->stats.retried;
stats.serialization_failures += thread->stats.serialization_failures;
stats.deadlock_failures += thread->stats.deadlock_failures;
+ stats.other_sql_failures += thread->stats.other_sql_failures;
latency_late += thread->latency_late;
conn_total_duration += thread->conn_duration;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index f820e88abe4..581e9af7907 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -1835,6 +1835,28 @@ $node->pgbench(
# Clean up
$node->safe_psql('postgres', 'DROP TABLE counter;');
+# Test --continue-on-error
+$node->safe_psql('postgres',
+ 'CREATE TABLE unique_table(i int unique);');
+
+$node->pgbench(
+ '-n -t 10 --continue-on-error --failures-detailed',
+ 0,
+ [
+ qr{processed: 1/10\b},
+ qr{other failures: 9\b}
+ ],
+ [],
+ 'test --continue-on-error',
+ {
+ '001_continue_on_error' => q{
+ INSERT INTO unique_table VALUES(0);
+ }
+ });
+
+# Clean up
+$node->safe_psql('postgres', 'DROP TABLE unique_table;');
+
# done
$node->safe_psql('postgres', 'DROP TABLESPACE regress_pgbench_tap_1_ts');
$node->stop;
--
2.51.2
On Nov 5, 2025, at 23:12, Fujii Masao <masao.fujii@gmail.com> wrote:
On Wed, Oct 29, 2025 at 1:00 AM Fujii Masao <masao.fujii@gmail.com> wrote:
On Mon, Oct 27, 2025 at 6:13 PM Fujii Masao <masao.fujii@gmail.com> wrote:
One approach to address this issue is to keep calling PQgetResult() until
it returns NULL, and then check the connection status when getSQLErrorStatus()
determines the error state. If the connection status is CONNECTION_BAD
at that point, we can treat it as a connection failure and stop processing
even when --continue-on-error is specified. Attached is a WIP patch
implementing this idea based on the v17 patch. It still needs more testing,
review, and possibly documentation updates.
Another option would be to explicitly list all SQLSTATE codes (e.g., 57P01)
that should prevent continued processing, even with --continue-on-error,
inside getSQLErrorStatus(). However, maintaining such a list would be
cumbersome, so I believe the first approach is preferable. Thoughts?
Nagata-san let me know off-list that there was a case where the previous
patch didn't work correctly in pipeline mode. I've updated the patch so that
--continue-on-error now works properly in that mode, and also revised
the commit message. Updated patch attached.
In the v19 patch, the description of --continue-on-error was placed right after
--verbose-errors in the docs. Since pgbench long option descriptions are listed
in alphabetical order, I've moved it to follow --aggregate-interval instead.
I've also refined the wording of the --continue-on-error description.
Attached is the updated patch. Unless there are any objections, I will
commit it.
Regards,
--
Fujii Masao
<v20-0001-pgbench-Add-continue-on-error-option.patch>
I just eyeball reviewed v20 and got a doubt:
```
+static void
+discardAvailableResults(CState *st)
+{
+ PGresult *res = NULL;
+
+ for (;;)
+ {
+ res = PQgetResult(st->con);
+
+ /*
+ * Read and discard results until PQgetResult() returns NULL (no more
+ * results) or a connection failure is detected. If the pipeline
+ * status is PQ_PIPELINE_ABORTED, more results may still be available
+ * even after PQgetResult() returns NULL, so continue reading in that
+ * case.
+ */
+ if ((res == NULL && PQpipelineStatus(st->con) != PQ_PIPELINE_ABORTED) ||
+ PQstatus(st->con) == CONNECTION_BAD)
+ break;
+
+ PQclear(res);
+ }
+ PQclear(res);
+}
```
If the pipeline is aborted and there are no more results, the “if” becomes “true && false”. And in that case, I guess PQstatus(st->con) != CONNECTION_BAD because it’s not a connection error, so overall the “if” evaluates to “false”, and it falls into an infinite loop.
Except for that, everything else looks good to me.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Thu, Nov 6, 2025 at 8:38 AM Chao Li <li.evan.chao@gmail.com> wrote:
I just eyeball reviewed v20 and got a doubt:
Thanks for the review!
```
+static void
+discardAvailableResults(CState *st)
+{
+ PGresult *res = NULL;
+
+ for (;;)
+ {
+ res = PQgetResult(st->con);
+
+ /*
+ * Read and discard results until PQgetResult() returns NULL (no more
+ * results) or a connection failure is detected. If the pipeline
+ * status is PQ_PIPELINE_ABORTED, more results may still be available
+ * even after PQgetResult() returns NULL, so continue reading in that
+ * case.
+ */
+ if ((res == NULL && PQpipelineStatus(st->con) != PQ_PIPELINE_ABORTED) ||
+ PQstatus(st->con) == CONNECTION_BAD)
+ break;
+
+ PQclear(res);
+ }
+ PQclear(res);
+}
```
If the pipeline is aborted and there are no more results, the “if” becomes “true && false”. And in that case, I guess PQstatus(st->con) != CONNECTION_BAD because it’s not a connection error, so overall the “if” evaluates to “false”, and it falls into an infinite loop.
Can this situation actually happen? It would be helpful if you could share
the custom script that triggers it.
When the pipeline is aborted, PGRES_PIPELINE_SYNC should arrive afterward,
changing the status from PQ_PIPELINE_ABORTED to PQ_PIPELINE_ON. That should
make the condition true and prevent an infinite loop, right?
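For reference, here is a minimal standalone libpq sketch (not part of the patch; it assumes a server reachable through the usual PGHOST/PGDATABASE environment variables) that drives an aborted pipeline through the same exit condition and prints each status:
```
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("");
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK || PQenterPipelineMode(conn) != 1)
		return 1;

	/* Queue a failing statement and a victim statement, then a sync. */
	PQsendQueryParams(conn, "select 1/0", 0, NULL, NULL, NULL, NULL, 0);
	PQsendQueryParams(conn, "select 1", 0, NULL, NULL, NULL, NULL, 0);
	PQpipelineSync(conn);

	for (;;)
	{
		res = PQgetResult(conn);

		printf("result=%s pipeline=%d\n",
			   res ? PQresStatus(PQresultStatus(res)) : "NULL",
			   (int) PQpipelineStatus(conn));

		/* The same exit condition as discardAvailableResults() */
		if ((res == NULL && PQpipelineStatus(conn) != PQ_PIPELINE_ABORTED) ||
			PQstatus(conn) == CONNECTION_BAD)
			break;

		PQclear(res);
	}
	PQclear(res);
	PQfinish(conn);
	return 0;
}
```
Once the PGRES_PIPELINE_SYNC result is consumed, PQpipelineStatus() goes back to PQ_PIPELINE_ON, so the next NULL makes the condition true and the loop ends.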
Regards,
--
Fujii Masao
On Nov 7, 2025, at 00:38, Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Nov 6, 2025 at 8:38 AM Chao Li <li.evan.chao@gmail.com> wrote:
I just eyeball reviewed v20 and got a doubt:
Thanks for the review!
```
+static void
+discardAvailableResults(CState *st)
+{
+ PGresult *res = NULL;
+
+ for (;;)
+ {
+ res = PQgetResult(st->con);
+
+ /*
+ * Read and discard results until PQgetResult() returns NULL (no more
+ * results) or a connection failure is detected. If the pipeline
+ * status is PQ_PIPELINE_ABORTED, more results may still be available
+ * even after PQgetResult() returns NULL, so continue reading in that
+ * case.
+ */
+ if ((res == NULL && PQpipelineStatus(st->con) != PQ_PIPELINE_ABORTED) ||
+ PQstatus(st->con) == CONNECTION_BAD)
+ break;
+
+ PQclear(res);
+ }
+ PQclear(res);
+}
```
If the pipeline is aborted and there are no more results, the “if” becomes “true && false”. And in that case, I guess PQstatus(st->con) != CONNECTION_BAD because it’s not a connection error, so overall the “if” evaluates to “false”, and it falls into an infinite loop.
Can this situation actually happen? It would be helpful if you could share
the custom script that triggers it.
No, I don’t have such a script. I am on vacation, traveling with my family this week; I only found a little time to work today, which is why I did just an eyeball review.
When the pipeline is aborted, PGRES_PIPELINE_SYNC should arrive afterward,
changing the status from PQ_PIPELINE_ABORTED to PQ_PIPELINE_ON. That should
make the condition true and prevent an infinite loop, right?
If you put this explanation into the inline comment, things would be clearer. But based on this explanation, I have another doubt. When a pipeline is aborted, res is NULL, but we still stay in the for loop; PQclear(res) does nothing then, so the “for” loop is effectively an empty loop. Would that lead to high CPU usage? From this perspective, while waiting for PIPELINE_SYNC after the pipeline is aborted, adding a tiny sleep might be better.
I will be back at work next Monday; then I will try to run a test and reproduce the pipeline-abort scenario.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Fri, Nov 7, 2025 at 9:07 AM Chao Li <li.evan.chao@gmail.com> wrote:
If you put this explanation into the inline comment, things would be clearer. But based on this explanation, I have another doubt. When a pipeline is aborted, res is NULL, but we still stay in the for loop; PQclear(res) does nothing then, so the “for” loop is effectively an empty loop. Would that lead to high CPU usage? From this perspective, while waiting for PIPELINE_SYNC after the pipeline is aborted, adding a tiny sleep might be better.
You're concerned about cases where the server response is delayed,
causing the pipeline status to take time to reach PIPELINE_SYNC, right?
In that situation, since discardAvailableResults() waits on PQgetResult(),
it shouldn't enter a busy loop, correct?
I will be back at work next Monday; then I will try to run a test and reproduce the pipeline-abort scenario.
I plan to commit the patch soon, but let's keep discussing and
investigating the case you mentioned afterward!
Regards,
--
Fujii Masao
On Nov 7, 2025, at 17:33, Fujii Masao <masao.fujii@gmail.com> wrote:
On Fri, Nov 7, 2025 at 9:07 AM Chao Li <li.evan.chao@gmail.com> wrote:
If you put this explanation into the inline comment, things would be clearer. But based on this explanation, I have another doubt. When a pipeline is aborted, res is NULL, but we still stay in the for loop; PQclear(res) does nothing then, so the “for” loop is effectively an empty loop. Would that lead to high CPU usage? From this perspective, while waiting for PIPELINE_SYNC after the pipeline is aborted, adding a tiny sleep might be better.
You're concerned about cases where the server response is delayed,
causing the pipeline status to take time to reach PIPELINE_SYNC, right?
In that situation, since discardAvailableResults() waits on PQgetResult(),
it shouldn't enter a busy loop, correct?
I will be back at work next Monday; then I will try to run a test and reproduce the pipeline-abort scenario.
I plan to commit the patch soon, but let's keep discussing and
investigating the case you mentioned afterward!
I just did a test. In the test, I inserted a tuple with the same primary key so that the insert fails due to the unique key constraint, which breaks the pipeline, with some random select statements following. I added some debug messages in discardAvailableResults(), which showed me that the function discards the rest of the statements’ results until \endpipeline. As there is anyway only a limited number of statements before \endpipeline, my concern is actually not valid. So I am now good with this patch.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Mon, Nov 10, 2025 at 11:07 AM Chao Li <li.evan.chao@gmail.com> wrote:
I just did a test. In the test, I inserted a tuple with the same primary key so that the insert fails due to the unique key constraint, which breaks the pipeline, with some random select statements following. I added some debug messages in discardAvailableResults(), which showed me that the function discards the rest of the statements’ results until \endpipeline. As there is anyway only a limited number of statements before \endpipeline, my concern is actually not valid. So I am now good with this patch.
Thanks a lot for testing!
Regards,
--
Fujii Masao
On Fri, 7 Nov 2025 18:33:17 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
I plan to commit the patch soon, but let's keep discussing and
investigating the case you mentioned afterward!
I'm sorry for the late reply and for not joining the discussion earlier.
I've spent some time investigating the code in pgbench and libpq, and
it seems to me that your commit looks fine.
However, I found another issue related to the --continue-on-error option,
where an assertion failure occurs in the following test case:
$ cat pgbench_error.sql
\startpipeline
select 1/0;
\syncpipeline
select 1;
\syncpipeline
select 1;
\syncpipeline
select 1;
\endpipeline
$ pgbench -f pgbench_error.sql -M extended --continue-on-error -T 1
pgbench (19devel)
starting vacuum...end.
pgbench: pgbench.c:3594: discardUntilSync: Assertion `res == ((void *)0)' failed.
Even after removing the Assert(), we get the following error:
pgbench: error: client 0 aborted: failed to exit pipeline mode for rolling back the failed transaction
This happens because discardUntilSync() does not expect that a PGRES_TUPLES_OK may be
received after \syncpipeline, and also fails to discard all PGRES_PIPELINE_SYNC results
when multiple \syncpipeline commands are used.
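Roughly, the result stream that discardUntilSync() sees for the script above, once the division-by-zero error itself has been consumed, looks like this (an illustrative sequence, statuses only):
```
/*
 * PGRES_PIPELINE_SYNC    first \syncpipeline: received_sync = true
 * PGRES_TUPLES_OK        "select 1": neither SYNC nor NULL, so the old
 *                        Assert(res == NULL) fires here
 * PGRES_PIPELINE_SYNC    second \syncpipeline
 * ...                    more TUPLES_OK/SYNC pairs, then the sync sent
 *                        by discardUntilSync() itself, then NULL
 */
```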
I've attached a patch to fix this.
If a PGRES_PIPELINE_SYNC is followed by something other than PGRES_PIPELINE_SYNC or NULL,
it means that another PGRES_PIPELINE_SYNC will eventually follow after some other results.
In this case, we should reset the receive_sync flag and continue discarding results.
I think this fix should be back-patched, since this is not a bug introduced by
--continue-on-error. The same assertion failure occurs in the following test case,
where transactions are retried after a deadlock error:
$ cat deadlock.sql
\startpipeline
select * from a order by i for update;
select 1;
\syncpipeline
select 1;
\syncpipeline
select 1;
\syncpipeline
select 1;
\endpipeline
$ cat deadlock2.sql
\startpipeline
select * from a order by i desc for update;
select 1;
\syncpipeline
select 1;
\syncpipeline
select 1;
\syncpipeline
select 1;
\endpipeline
$ pgbench -f deadlock.sql -f deadlock2.sql -j 2 -c 2 -M extended
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
0001-Make-sure-discardUntilSync-discards-until-the-last-s.patchtext/x-diff; name=0001-Make-sure-discardUntilSync-discards-until-the-last-s.patchDownload
From 6a20315b9d25ddc9f77b96d2e8318d9853b105eb Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Tue, 11 Nov 2025 10:14:30 +0900
Subject: [PATCH] Make sure discardUntilSync() discards until the last sync
point
---
src/bin/pgbench/pgbench.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d8764ba6fe0..c31dd30672b 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3563,14 +3563,14 @@ doRetry(CState *st, pg_time_usec_t *now)
}
/*
- * Read results and discard it until a sync point.
+ * Read and discard results until the last sync point.
*/
static int
discardUntilSync(CState *st)
{
bool received_sync = false;
- /* send a sync */
+ /* Send a sync since all PGRES_PIPELINE_SYNC results may already have been received. */
if (!PQpipelineSync(st->con))
{
pg_log_error("client %d aborted: failed to send a pipeline sync",
@@ -3588,10 +3588,15 @@ discardUntilSync(CState *st)
else if (received_sync)
{
/*
- * PGRES_PIPELINE_SYNC must be followed by another
- * PGRES_PIPELINE_SYNC or NULL; otherwise, assert failure.
+ * If a PGRES_PIPELINE_SYNC is followed by something other than
+ * PGRES_PIPELINE_SYNC or NULL, another PGRES_PIPELINE_SYNC will
+ * eventually follow.
*/
- Assert(res == NULL);
+ if (res)
+ {
+ received_sync = false;
+ continue;
+ }
/*
* Reset ongoing sync count to 0 since all PGRES_PIPELINE_SYNC
--
2.43.0
On Nov 11, 2025, at 09:50, Yugo Nagata <nagata@sraoss.co.jp> wrote:
On Fri, 7 Nov 2025 18:33:17 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
I plan to commit the patch soon, but let's keep discussing and
investigating the case you mentioned afterward!
I'm sorry for the late reply and for not joining the discussion earlier.
I've spent some time investigating the code in pgbench and libpq, and
it seems to me that your commit looks fine.
However, I found another issue related to the --continue-on-error option,
where an assertion failure occurs in the following test case:
$ cat pgbench_error.sql
\startpipeline
select 1/0;
\syncpipeline
select 1;
\syncpipeline
select 1;
\syncpipeline
select 1;
\endpipeline
$ pgbench -f pgbench_error.sql -M extended --continue-on-error -T 1
pgbench (19devel)
starting vacuum...end.
pgbench: pgbench.c:3594: discardUntilSync: Assertion `res == ((void *)0)' failed.
Even after removing the Assert(), we get the following error:
pgbench: error: client 0 aborted: failed to exit pipeline mode for rolling back the failed transaction
This happens because discardUntilSync() does not expect that a PGRES_TUPLES_OK may be
received after \syncpipeline, and also fails to discard all PGRES_PIPELINE_SYNC results
when multiple \syncpipeline commands are used.
I've attached a patch to fix this.
If a PGRES_PIPELINE_SYNC is followed by something other than PGRES_PIPELINE_SYNC or NULL,
it means that another PGRES_PIPELINE_SYNC will eventually follow after some other results.
In this case, we should reset the receive_sync flag and continue discarding results.
I think this fix should be back-patched, since this is not a bug introduced by
--continue-on-error. The same assertion failure occurs in the following test case,
where transactions are retried after a deadlock error:
$ cat deadlock.sql
\startpipeline
select * from a order by i for update;
select 1;
\syncpipeline
select 1;
\syncpipeline
select 1;
\syncpipeline
select 1;
\endpipeline
$ cat deadlock2.sql
\startpipeline
select * from a order by i desc for update;
select 1;
\syncpipeline
select 1;
\syncpipeline
select 1;
\syncpipeline
select 1;
\endpipeline
$ pgbench -f deadlock.sql -f deadlock2.sql -j 2 -c 2 -M extended
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
<0001-Make-sure-discardUntilSync-discards-until-the-last-s.patch>
Hi Yugo-san,
I am also debugging the patch for the other purpose when I saw your email, so I tried to reproduce the problem with your script.
I think in master branch, we can simply fix the problem by calling discardAvailableResults(st) before discardUntilSync(st), like this:
```
/* Read and discard until a sync point in pipeline mode */
if (PQpipelineStatus(st->con) != PQ_PIPELINE_OFF)
{
discardAvailableResults(st); # <=== Add this line
if (!discardUntilSync(st))
{
st->state = CSTATE_ABORTED;
break;
}
}
```
But this is not good for back-patch.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Nov 10, 2025, at 12:45, Fujii Masao <masao.fujii@gmail.com> wrote:
On Mon, Nov 10, 2025 at 11:07 AM Chao Li <li.evan.chao@gmail.com> wrote:
I just did a test. In the test, I inserted a tuple with the same primary key so that the insert fails due to the unique key constraint, which breaks the pipeline, with some random select statements following. I added some debug messages in discardAvailableResults(), which showed me that the function discards the rest of the statements’ results until \endpipeline. As there is anyway only a limited number of statements before \endpipeline, my concern is actually not valid. So I am now good with this patch.
Thanks a lot for testing!
Hi Fujii-san,
I just did more tests in both pipeline mode and non-pipeline mode. I think the main purpose of discardAvailableResults() is to drain results in pipeline mode. In non-pipeline mode, a NULL res indicates there are no more results to read, while in pipeline mode, when a pipeline is aborted, either a valid result or NULL can still be returned, so we need to wait until the pipeline state switches back to PQ_PIPELINE_ON. From this perspective, the current inline comment is correct, but I feel it’s not clear enough.
So I am proposing the function comment and inline comment like the following:
```
/*
* Read and discard all available results from the connection.
*
* Non-pipeline mode:
* ------------------
* PQgetResult() returns each PGresult in order for the last command sent.
* When it returns NULL, that definitively means there are no more results
* for that command. We stop on NULL (or on CONNECTION_BAD).
*
* Pipeline mode:
* --------------
* If an earlier command in the pipeline errors, libpq enters the
* PQ_PIPELINE_ABORTED state. In this state, PQgetResult() may return
* either a valid PGresult or NULL, and a NULL return does NOT mean
* that the connection is drained. More results for later commands (or
* protocol housekeeping such as the pipeline sync result) can still
* arrive afterward. Therefore we must continue calling PQgetResult()
* while PQpipelineStatus(conn) == PQ_PIPELINE_ABORTED, even if we see
* intermittent NULLs.
*/
static void
discardAvailableResults(CState *st)
{
PGresult *res = NULL;
for (;;)
{
res = PQgetResult(st->con);
/*
* Stop when there are no more results *and* the pipeline is not
* in the aborted state, or if the connection has failed.
*/
if ((res == NULL && PQpipelineStatus(st->con) != PQ_PIPELINE_ABORTED) ||
PQstatus(st->con) == CONNECTION_BAD)
break;
PQclear(res);
}
PQclear(res);
}
```
What do you think?
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Tue, Nov 11, 2025 at 10:50 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
I've attached a patch to fix this.
Thanks for reporting the issue and providing the patch!
If a PGRES_PIPELINE_SYNC is followed by something other than PGRES_PIPELINE_SYNC or NULL,
it means that another PGRES_PIPELINE_SYNC will eventually follow after some other results.
In this case, we should reset the receive_sync flag and continue discarding results.
Yes.
+ if (res)
+ {
+ received_sync = false;
+ continue;
Shouldn't we also call PQclear(res) here? For example:
---------------------------
if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
received_sync = true;
- else if (received_sync)
+ else if (received_sync && res == NULL)
{
- /*
- * PGRES_PIPELINE_SYNC must be followed by another
- * PGRES_PIPELINE_SYNC or NULL; otherwise,
assert failure.
- */
- Assert(res == NULL);
-
/*
* Reset ongoing sync count to 0 since all
PGRES_PIPELINE_SYNC
* results have been discarded.
@@ -3601,6 +3595,8 @@ discardUntilSync(CState *st)
PQclear(res);
break;
}
+ else
+ received_sync = false;
PQclear(res);
---------------------------
Regards,
--
Fujii Masao
On Tue, Nov 11, 2025 at 11:41 AM Chao Li <li.evan.chao@gmail.com> wrote:
I was also debugging the patch for another purpose when I saw your email, so I tried to reproduce the problem with your script.
I think in master branch, we can simply fix the problem by calling discardAvailableResults(st) before discardUntilSync(st), like this:
This change doesn't seem to fix the issue. If the custom script includes
many \syncpipeline commands, the assertion failure can still occur. No?
Regards,
--
Fujii Masao
On Tue, Nov 11, 2025 at 11:49 AM Chao Li <li.evan.chao@gmail.com> wrote:
I just did more tests in both pipeline mode and non-pipeline mode. I think the main purpose of discardAvailableResults() is to drain results in pipeline mode. In non-pipeline mode, a NULL res indicates there are no more results to read, while in pipeline mode, when a pipeline is aborted, either a valid result or NULL can still be returned, so we need to wait until the pipeline state switches back to PQ_PIPELINE_ON. From this perspective, the current inline comment is correct, but I feel it’s not clear enough.
Thanks for working on this!
After reconsidering, I think the main goal here is to determine whether
the error causes a connection failure after it occurs.
If we can read and discard results without PQstatus() becoming CONNECTION_BAD
either until the end (in non-pipeline mode) or until the first sync point
after an error (in pipeline mode), that means the connection is still alive,
and processing can continue when --continue-on-error is specified.
The current function comments don’t mention this purpose enough,
so it seems they should be updated to clarify that.
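For example, something along these lines (just a sketch, not final wording):
```
/*
 * Read and discard results after an error until we can tell whether the
 * error also broke the connection: up to the end of the results in
 * non-pipeline mode, or up to the first sync point following the error
 * in pipeline mode. If PQstatus() never becomes CONNECTION_BAD along the
 * way, the connection is still usable and the client can keep processing
 * when --continue-on-error is specified.
 */
```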
Regards,
--
Fujii Masao
On Wed, 12 Nov 2025 00:22:38 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
On Tue, Nov 11, 2025 at 11:41 AM Chao Li <li.evan.chao@gmail.com> wrote:
I was also debugging the patch for another purpose when I saw your email, so I tried to reproduce the problem with your script.
I think in master branch, we can simply fix the problem by calling discardAvailableResults(st) before discardUntilSync(st), like this:
This change doesn't seem to fix the issue. If the custom script includes
many \syncpipeline commands, the assertion failure can still occur. No?
Yes. In pipeline mode without a connection failure, discardAvailableResults()
does not discard all syncs; it only discards up to the NULL following the first sync.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Wed, 12 Nov 2025 00:20:15 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
On Tue, Nov 11, 2025 at 10:50 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
I've attached a patch to fix this.
Thanks for reporting the issue and providing the patch!
If a PGRES_PIPELINE_SYNC is followed by something other than PGRES_PIPELINE_SYNC or NULL,
it means that another PGRES_PIPELINE_SYNC will eventually follow after some other results.
In this case, we should reset the receive_sync flag and continue discarding results.
Yes.
+ if (res)
+ {
+ received_sync = false;
+ continue;
Shouldn't we also call PQclear(res) here? For example:
Thank you for your review!
Yes, we need PQclear() here.
I've attached an updated patch.
The comment for the PQpipelineSync() call has also been updated to clarify
why it is necessary.
In addition, I added a connection status check in the loop to avoid an
infinite loop waiting for PGRES_PIPELINE_SYNC after a connection failure.
I packed these changes in the same patch, but they can be split into separate
patches.
What do you think?
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
v2-0001-pgbench-Fix-assertion-failure-with-multiple-syncp.patchtext/x-diff; name=v2-0001-pgbench-Fix-assertion-failure-with-multiple-syncp.patchDownload
From 8170406337210755442eedcf6b253031e22297e1 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Tue, 11 Nov 2025 10:14:30 +0900
Subject: [PATCH v2] pgbench: Fix assertion failure with multiple \syncpipeline
in pipeline mode.
When running pgbench with a custom script that triggered retriable errors
(e.g., deadlock errors) followed by multiple \syncpipeline commands in
pipeline mode, an assertion failure could occur:
pgbench.c:3594: discardUntilSync: Assertion `res == ((void *)0)' failed
This happened because discardUntilSync() did not expect that a result
other than NULL (e.g. PGRES_TUPLES_OK) might be received after \syncpipeline.
This commit fixes the assertion failure by resetting the receive_sync flag
and continuing to discard results to ensure that all results are discarded
until the last sync point.
Also, if the connection was unexpectedly closed, this function could get
stuck in an infinite loop waiting for PGRES_PIPELINE_SYNC, which would never
be received. To fix this, exit the loop immediately if a connection failure
is detected.
---
src/bin/pgbench/pgbench.c | 35 +++++++++++++++++++++++++----------
1 file changed, 25 insertions(+), 10 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d8764ba6fe0..f165fabce36 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3563,14 +3563,18 @@ doRetry(CState *st, pg_time_usec_t *now)
}
/*
- * Read results and discard it until a sync point.
+ * Read and discard results until the last sync point.
*/
static int
discardUntilSync(CState *st)
{
bool received_sync = false;
- /* send a sync */
+ /*
+ * Send a sync to ensure at least one PGRES_PIPELINE_SYNC is received
+ * and to avoid an infinite loop, since all earlier ones may have
+ * already been received.
+ */
if (!PQpipelineSync(st->con))
{
pg_log_error("client %d aborted: failed to send a pipeline sync",
@@ -3578,21 +3582,15 @@ discardUntilSync(CState *st)
return 0;
}
- /* receive PGRES_PIPELINE_SYNC and null following it */
+ /* receive the last PGRES_PIPELINE_SYNC and null following it */
for (;;)
{
PGresult *res = PQgetResult(st->con);
if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
received_sync = true;
- else if (received_sync)
+ else if (received_sync && res == NULL)
{
- /*
- * PGRES_PIPELINE_SYNC must be followed by another
- * PGRES_PIPELINE_SYNC or NULL; otherwise, assert failure.
- */
- Assert(res == NULL);
-
/*
* Reset ongoing sync count to 0 since all PGRES_PIPELINE_SYNC
* results have been discarded.
@@ -3601,6 +3599,23 @@ discardUntilSync(CState *st)
PQclear(res);
break;
}
+ else
+ {
+ if (PQstatus(st->con) == CONNECTION_BAD)
+ {
+ pg_log_error("client %d aborted: the backend died while rolling back the failed transaction after",
+ st->id);
+ PQclear(res);
+ return 0;
+ }
+
+ /*
+ * If a PGRES_PIPELINE_SYNC is followed by something other than
+ * PGRES_PIPELINE_SYNC or NULL, another PGRES_PIPELINE_SYNC will
+ * eventually follow.
+ */
+ received_sync = false;
+ }
PQclear(res);
}
--
2.43.0
On Nov 12, 2025, at 17:34, Yugo Nagata <nagata@sraoss.co.jp> wrote:
On Wed, 12 Nov 2025 00:20:15 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
On Tue, Nov 11, 2025 at 10:50 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
I've attached a patch to fix this.
Thanks for reporting the issue and providing the patch!
If a PGRES_PIPELINE_SYNC is followed by something other than PGRES_PIPELINE_SYNC or NULL,
it means that another PGRES_PIPELINE_SYNC will eventually follow after some other results.
In this case, we should reset the receive_sync flag and continue discarding results.
Yes.
+ if (res)
+ {
+ received_sync = false;
+ continue;
Shouldn't we also call PQclear(res) here? For example:
Thank you for your review!
Yes, we need PQclear() here.
I've attached an updated patch.
The comment for the PQpipelineSync() call has also been updated to clarify
why it is necessary.
In addition, I added a connection status check in the loop to avoid an
infinite loop waiting for PGRES_PIPELINE_SYNC after a connection failure.
I packed these changes in the same patch, but they can be split into separate
patches.
What do you think?
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
<v2-0001-pgbench-Fix-assertion-failure-with-multiple-syncp.patch>
I debugged further this morning, and I think I have found the root cause. Ultimately, the problem is not with discardUntilSync(), instead, discardAvailableResults() mistakenly eats PGRES_PIPELINE_SYNC.
In my debugging, I slightly updated Yugo's script so that every select returns a different value:
```
% cat pgbench_error.sql
\startpipeline
select 1/0;
\syncpipeline
select 2;
\syncpipeline
select 3;
\syncpipeline
select 4;
\endpipeline
```
Please see my dirty fix in the attachment. The diff is based on master + Yugo's v2 patch.
In my fix, I make discardAvailableResults() return the PGRES_PIPELINE_SYNC it reads, and I moved discardAvailableResults() out of getSQLErrorStatus(), so that if discardAvailableResults() returns a result, readCommandResponse() uses it as next_res to continue the reading loop.
Here is my execution output:
```
% pgbench -n --failures-detailed --continue-on-error -M extended -t 5 -f pgbench_error.sql evantest
pgbench (19devel)
EVAN: readCommandResponse: Got result: res=7, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: discardAvailableResults: Got result: res=10, conn=0
EVAN: discardAvailableResults: Got sync, returning, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 2, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 3, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 4, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=7, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: discardAvailableResults: Got result: res=10, conn=0
EVAN: discardAvailableResults: Got sync, returning, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 2, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 3, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 4, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=7, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: discardAvailableResults: Got result: res=10, conn=0
EVAN: discardAvailableResults: Got sync, returning, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 2, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 3, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 4, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=7, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: discardAvailableResults: Got result: res=10, conn=0
EVAN: discardAvailableResults: Got sync, returning, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 2, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 3, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 4, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=7, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: discardAvailableResults: Got result: res=10, conn=0
EVAN: discardAvailableResults: Got sync, returning, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 2, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 3, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=2, conn=0
EVAN: readCommandResponse2: Got next-result value: 4, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
EVAN: readCommandResponse: Got result: res=10, conn=0
EVAN: readCommandResponse2: Got result: next_res=7, conn=0
EVAN: readCommandResponse: completed, conn=0
transaction type: pgbench_error.sql
scaling factor: 1
query mode: extended
number of clients: 1
number of threads: 1
maximum number of tries: 1
number of transactions per client: 5
number of transactions actually processed: 5/5
number of failed transactions: 0 (0.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other failures: 0 (0.000%)
latency average = 0.265 ms
initial connection time = 2.092 ms
tps = 3773.584906 (without initial connection time)
```
You can see that select 2/3/4 are properly handled.
Yugo-san, if you add some debug logging, you will see that with your patch, 2 and 3 will be discarded by discardUntilSync(), so I don't think your patch works.
To apply my dirty diff:
* git checkout master
* git am Yugo’s v2 patch
* git apply dirty-fix.diff
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Attachments:
dirty-fix.diffapplication/octet-stream; name=dirty-fix.diff; x-unix-mode=0644Download
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 4bdf21c319c..663fa7fafbb 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -1855,7 +1855,7 @@ asyncQueueReadAllNotifications(void)
Snapshot snapshot;
/* page_buffer must be adequately aligned */
- alignas(AsyncQueueEntry) char page_buffer[QUEUE_PAGESIZE];
+ alignas(alignof(AsyncQueueEntry)) char page_buffer[QUEUE_PAGESIZE];
/* Fetch current state */
LWLockAcquire(NotifyQueueLock, LW_SHARED);
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index f165fabce36..8e8564ae8ae 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3246,7 +3246,7 @@ sendCommand(CState *st, Command *command)
/*
* Read and discard all available results from the connection.
*/
-static void
+static PGresult *
discardAvailableResults(CState *st)
{
PGresult *res = NULL;
@@ -3255,6 +3255,21 @@ discardAvailableResults(CState *st)
{
res = PQgetResult(st->con);
+ printf("EVAN: discardAvailableResults: Got result: res=%d, conn=%d\n",
+ PQresultStatus(res), PQstatus(st->con));
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ char *val = PQgetvalue(res, 0, 0);
+ printf("EVAN: discardAvailableResults: Got result value: %s, conn=%d\n",
+ val, PQstatus(st->con));
+ }
+ if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
+ {
+ printf("EVAN: discardAvailableResults: Got sync, returning, conn=%d\n",
+ PQstatus(st->con));
+ return res;
+ }
+
/*
* Read and discard results until PQgetResult() returns NULL (no more
* results) or a connection failure is detected. If the pipeline
@@ -3264,11 +3279,16 @@ discardAvailableResults(CState *st)
*/
if ((res == NULL && PQpipelineStatus(st->con) != PQ_PIPELINE_ABORTED) ||
PQstatus(st->con) == CONNECTION_BAD)
+ {
+ printf("EVAN: discardAvailableResults: breaking loop, conn=%d\n",
+ PQstatus(st->con));
break;
+ }
PQclear(res);
}
PQclear(res);
+ return NULL;
}
/*
@@ -3277,7 +3297,10 @@ discardAvailableResults(CState *st)
static EStatus
getSQLErrorStatus(CState *st, const char *sqlState)
{
- discardAvailableResults(st);
+ //discardAvailableResults(st);
+ if (st->estatus == ESTATUS_NO_ERROR)
+ return ESTATUS_NO_ERROR;
+
if (PQstatus(st->con) == CONNECTION_BAD)
return ESTATUS_CONN_ERROR;
@@ -3338,13 +3361,28 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
((meta == META_GSET || meta == META_ASET) && varprefix != NULL));
res = PQgetResult(st->con);
-
+ printf("EVAN: readCommandResponse: Got result: res=%d, conn=%d\n",
+ PQresultStatus(res), PQstatus(st->con));
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ char *val = PQgetvalue(res, 0, 0);
+ printf("EVAN: readCommandResponse: Got result value: %s, conn=%d\n",
+ val, PQstatus(st->con));
+ }
while (res != NULL)
{
bool is_last;
/* peek at the next result to know whether the current is last */
next_res = PQgetResult(st->con);
+ printf("EVAN: readCommandResponse2: Got result: next_res=%d, conn=%d\n",
+ PQresultStatus(next_res), PQstatus(st->con));
+ if (PQresultStatus(next_res) == PGRES_TUPLES_OK)
+ {
+ char *val = PQgetvalue(next_res, 0, 0);
+ printf("EVAN: readCommandResponse2: Got next-result value: %s, conn=%d\n",
+ val, PQstatus(st->con));
+ }
is_last = (next_res == NULL);
switch (PQresultStatus(res))
@@ -3431,8 +3469,19 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
case PGRES_NONFATAL_ERROR:
case PGRES_FATAL_ERROR:
+ {
+ PGresult *temp_res = discardAvailableResults(st);
+ if (temp_res != NULL)
+ {
+ next_res = temp_res;
+ break;
+ }
st->estatus = getSQLErrorStatus(st, PQresultErrorField(res,
PG_DIAG_SQLSTATE));
+ if (st->estatus == ESTATUS_NO_ERROR)
+ {
+ break;
+ }
if (canRetryError(st->estatus) || canContinueOnError(st->estatus))
{
if (verbose_errors)
@@ -3440,6 +3489,7 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
goto error;
}
/* fall through */
+ }
default:
/* anything else is unexpected */
@@ -3460,6 +3510,8 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
return false;
}
+ printf("EVAN: readCommandResponse: completed, conn=%d\n",
+ PQstatus(st->con));
return true;
error:
@@ -3588,9 +3640,14 @@ discardUntilSync(CState *st)
PGresult *res = PQgetResult(st->con);
if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
+ {
+ printf("EVAN: Got sync while discarding until sync, conn=%d\n", PQstatus(st->con));
received_sync = true;
- else if (received_sync && res == NULL)
+ }
+ else if (received_sync) // && res == NULL)
{
+ printf("EVAN: Got null while discarding until sync, conn=%d\n", PQstatus(st->con));
+ Assert(res == NULL);
/*
* Reset ongoing sync count to 0 since all PGRES_PIPELINE_SYNC
* results have been discarded.
@@ -3599,23 +3656,31 @@ discardUntilSync(CState *st)
PQclear(res);
break;
}
- else
- {
- if (PQstatus(st->con) == CONNECTION_BAD)
- {
- pg_log_error("client %d aborted: the backend died while rolling back the failed transaction after",
- st->id);
- PQclear(res);
- return 0;
- }
-
- /*
- * If a PGRES_PIPELINE_SYNC is followed by something other than
- * PGRES_PIPELINE_SYNC or NULL, another PGRES_PIPELINE_SYNC will
- * eventually follow.
- */
- received_sync = false;
- }
+ //else
+ //{
+ // printf("EVAN: Got result while discarding until sync: %d, conn=%d\n",
+ // PQresultStatus(res), PQstatus(st->con));
+ // if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ // {
+ // char *val = PQgetvalue(res, 0, 0);
+ // printf("EVAN: Got result value while discarding until sync: %s, conn=%d\n",
+ // val, PQstatus(st->con));
+ // }
+ // if (PQstatus(st->con) == CONNECTION_BAD)
+ // {
+ // pg_log_error("client %d aborted: the backend died while rolling back the failed transaction after",
+ // st->id);
+ // PQclear(res);
+ // return 0;
+ // }
+ //
+ // /*
+ // * If a PGRES_PIPELINE_SYNC is followed by something other than
+ // * PGRES_PIPELINE_SYNC or NULL, another PGRES_PIPELINE_SYNC will
+ // * eventually follow.
+ // */
+ // received_sync = false;
+ //}
PQclear(res);
}
On Wed, 12 Nov 2025 01:47:37 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
On Tue, Nov 11, 2025 at 11:49 AM Chao Li <li.evan.chao@gmail.com> wrote:
I just did more tests in both pipeline mode and non-pipeline mode. I think the main purpose of discardAvailableResults() is to drain results in pipeline mode. In non-pipeline mode, a NULL res indicates there are no more results to read, while in pipeline mode, when a pipeline is aborted, either a valid result or NULL can still be returned, so we need to wait until the pipeline state switches back to PQ_PIPELINE_ON. From this perspective, the current inline comment is correct, but I feel it’s not clear enough.
Thanks for working on this!
After reconsidering, I think the main goal here is to determine whether
the error causes a connection failure after it occurs.
If we can read and discard results without PQstatus() becoming CONNECTION_BAD
either until the end (in non-pipeline mode) or until the first sync point
after an error (in pipeline mode), that means the connection is still alive,
and processing can continue when --continue-on-error is specified.
The current function comments don't mention this purpose enough,
so it seems they should be updated to clarify that.
I agree that the goal of this function is to discard results until the point
where a connection failure can be detected. When the socket reaches EOF,
PQgetResult() returns PGRES_FATAL_ERROR to report it, followed by NULL.
However, in an aborted pipeline, several NULLs following each PGRES_PIPELINE_ABORTED
may be returned before that, so we need to discard those NULLs beforehand.
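In other words, the two sequences we need to tell apart look roughly like this (illustrative):
```
/*
 * Aborted pipeline, connection still alive:
 *   PGRES_FATAL_ERROR, NULL, PGRES_PIPELINE_ABORTED, NULL, ...,
 *   PGRES_PIPELINE_SYNC, NULL               -> continue processing
 *
 * Socket reached EOF:
 *   ..., PGRES_FATAL_ERROR (reporting the broken connection), NULL,
 *   with PQstatus() == CONNECTION_BAD       -> ESTATUS_CONN_ERROR
 *
 * The intermediate NULLs in the first case must be discarded, or we
 * would stop before the point where a dead connection becomes visible.
 */
```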
Considering this, the function name "discardAvailableResults" might be a bit misleading,
since it doesn’t actually discard all available results. How about renaming it to something
like "discardForErrorStatusCheck" (a bit long, though)?
Related to this, I doubt the necessity of calling this function after the error: label in
readCommandResponse(). If the error is retriable, all results will be discarded later by
discardUntilSync(). If it’s not retriable, the thread will immediately exit and the connection
will be abandoned, so discarding results here seems unnecessary.
If discardAvailableResults() is unnecessary here, we could embed its logic into
getSQLErrorStatus() instead of leaving it as a separate function.
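Roughly like this, for example (a sketch; the existing SQLSTATE-to-EStatus mapping is unchanged and elided):
```
static EStatus
getSQLErrorStatus(CState *st, const char *sqlState)
{
	PGresult   *res = NULL;

	/*
	 * Drain results first so that a dead connection becomes visible
	 * through PQstatus() below (the same loop discardAvailableResults()
	 * uses today).
	 */
	for (;;)
	{
		res = PQgetResult(st->con);

		if ((res == NULL && PQpipelineStatus(st->con) != PQ_PIPELINE_ABORTED) ||
			PQstatus(st->con) == CONNECTION_BAD)
			break;

		PQclear(res);
	}
	PQclear(res);

	if (PQstatus(st->con) == CONNECTION_BAD)
		return ESTATUS_CONN_ERROR;

	/* ... existing SQLSTATE-to-EStatus mapping ... */
}
```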
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Wed, Nov 12, 2025 at 6:34 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
I've attached an updated patch.
Thanks for updating the patch!
The comment for the PQpipelineSync() call has also been updated to clarify
why it is necessary.
+ /*
+ * If a PGRES_PIPELINE_SYNC is followed by something other than
+ * PGRES_PIPELINE_SYNC or NULL, another PGRES_PIPELINE_SYNC will
+ * eventually follow.
+ */
LGTM. I'd like to append "Reset received_sync to false to wait for
it." to this comment.
In addition, I added a connection status check in the loop to avoid an
infinite loop waiting for PGRES_PIPELINE_SYNC after a connection failure.
Would it be better to move this status check right after PQgetResult()
so that connection failures can be detected regardless of what result
it returns?
+ pg_log_error("client %d aborted: the backend died while rolling back
the failed transaction after",
The trailing “after” seems unnecessary.
Since there's no guarantee the backend actually died in this case,
it might be better to use something like "client %d aborted while rolling back
the transaction after an error; perhaps the backend died while processing"
which matches the wording used under CSTATE_WAIT_ROLLBACK_RESULT
in advanceConnectionState().
Regards,
--
Fujii Masao
On Thu, Nov 13, 2025 at 11:21 AM Chao Li <li.evan.chao@gmail.com> wrote:
I debugged further this morning, and I think I have found the root cause. Ultimately, the problem is not with discardUntilSync(), instead, discardAvailableResults() mistakenly eats PGRES_PIPELINE_SYNC.
Thanks for debugging!
Yes, discardAvailableResults() can discard PGRES_PIPELINE_SYNC,
but do you mean that's the root cause of the assertion failure
Nagata-san reported?
Since that failure can occur even in older branches, I was thinking
that newer code
like discardAvailableResults() in master isn't the root cause...
Regards,
--
Fujii Masao
On Nov 13, 2025, at 11:47, Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Nov 13, 2025 at 11:21 AM Chao Li <li.evan.chao@gmail.com> wrote:
I debugged further this morning, and I think I have found the root cause. Ultimately, the problem is not with discardUntilSync(), instead, discardAvailableResults() mistakenly eats PGRES_PIPELINE_SYNC.
Thanks for debugging!
Yes, discardAvailableResults() can discard PGRES_PIPELINE_SYNC,
but do you mean that's the root cause of the assertion failure
Nagata-san reported?
Since that failure can occur even in older branches, I was thinking
that newer code
like discardAvailableResults() in master isn't the root cause...
I haven't debugged with old code, but the old code also discards non-NULL results:
```
- do
- {
- res = PQgetResult(st->con);
- PQclear(res);
- } while (res);
+ discardAvailableResults(st);
```
It may also discard the sync message; that's my guess. I can also debug the old code this afternoon.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Nov 13, 2025, at 12:02, Chao Li <li.evan.chao@gmail.com> wrote:
On Nov 13, 2025, at 11:47, Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Nov 13, 2025 at 11:21 AM Chao Li <li.evan.chao@gmail.com> wrote:
I debugged further this morning, and I think I have found the root cause. Ultimately, the problem is not with discardUntilSync(), instead, discardAvailableResults() mistakenly eats PGRES_PIPELINE_SYNC.
Thanks for debugging!
Yes, discardAvailableResults() can discard PGRES_PIPELINE_SYNC,
but do you mean that's the root cause of the assertion failure
Nagata-san reported?
Since that failure can occur even in older branches, I was thinking
that newer code
like discardAvailableResults() in master isn't the root cause...
I haven't debugged with old code, but the old code also discards non-NULL results:
```
- do
- {
- res = PQgetResult(st->con);
- PQclear(res);
- } while (res);
+ discardAvailableResults(st);
```
It may also discard the sync message; that's my guess. I can also debug the old code this afternoon.
I just tried the old code but it didn’t trigger the assert with Yugo’s deadlock scripts.
I did "git reset --hard a3ea5330fcf47390c8ab420bbf433a97a54505d6", that is the commit just before --continue-on-error. Then I ran Yugo's deadlock scripts, but I didn't get the assert:
```
% pgbench -n --failures-detailed -M extended -j 2 -c 2 -f deadlock.sql -f deadlock2.sql evantest
pgbench (19devel)
transaction type: multiple scripts
scaling factor: 1
query mode: extended
number of clients: 2
number of threads: 2
maximum number of tries: 1
number of transactions per client: 10
number of transactions actually processed: 20/20
number of failed transactions: 0 (0.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
latency average = 0.341 ms
initial connection time = 2.637 ms
tps = 5865.102639 (without initial connection time)
SQL script 1: deadlock.sql
- weight: 1 (targets 50.0% of total)
- 12 transactions (60.0% of total)
- number of transactions actually processed: 12 (tps = 3519.061584)
- number of failed transactions: 0 (0.000%)
- number of serialization failures: 0 (0.000%)
- number of deadlock failures: 0 (0.000%)
- latency average = 0.311 ms
- latency stddev = 0.304 ms
SQL script 2: deadlock2.sql
- weight: 1 (targets 50.0% of total)
- 8 transactions (40.0% of total)
- number of transactions actually processed: 8 (tps = 2346.041056)
- number of failed transactions: 0 (0.000%)
- number of serialization failures: 0 (0.000%)
- number of deadlock failures: 0 (0.000%)
- latency average = 0.366 ms
- latency stddev = 0.364 ms
```
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Thu, 13 Nov 2025 13:14:37 +0800
Chao Li <li.evan.chao@gmail.com> wrote:
On Nov 13, 2025, at 12:02, Chao Li <li.evan.chao@gmail.com> wrote:
On Nov 13, 2025, at 11:47, Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Nov 13, 2025 at 11:21 AM Chao Li <li.evan.chao@gmail.com> wrote:
I debugged further this morning, and I think I have found the root cause. Ultimately, the problem is not with discardUntilSync(), instead, discardAvailableResults() mistakenly eats PGRES_PIPELINE_SYNC.
Thanks for debugging!
Yes, discardAvailableResults() can discard PGRES_PIPELINE_SYNC,
but do you mean that's the root cause of the assertion failure
Nagata-san reported?
Since that failure can occur even in older branches, I was thinking
that newer code
like discardAvailableResults() in master isn't the root cause...
I haven't debugged with old code, but the old code also discards non-NULL results:
```
- do
- {
- res = PQgetResult(st->con);
- PQclear(res);
- } while (res);
+ discardAvailableResults(st);
```
It may also discard the sync message; that's my guess. I can also debug the old code this afternoon.
I just tried the old code but it didn’t trigger the assert with Yugo’s deadlock scripts.
To trigger a deadlock error, the tables need to have enough rows so that the scan takes some
time. In my environment, about 1,000 rows were enough to cause a deadlock.
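For example, a setup along the lines of "CREATE TABLE a (i int); INSERT INTO a SELECT generate_series(1, 1000);" should be enough to reproduce it.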
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Nov 13, 2025, at 13:50, Yugo Nagata <nagata@sraoss.co.jp> wrote:
To trigger a deadlock error, the tables need to have enough rows so that the scan takes some
time. In my environment, about 1,000 rows were enough to cause a deadlock.
Yes, after inserting 1000 rows, I got the assert triggered. I added some logs to track what had been read:
```
% pgbench -n --failures-detailed -M extended -j 2 -c 2 -f deadlock.sql -f deadlock2.sql evantest
pgbench (19devel)
EVAN: on error discard: Got result: res=11, conn=0
EVAN: on error discard: Got result: res=7, conn=0
EVAN: discardUntilSync: Got result: res=10, conn=0 <== received sync
EVAN: discardUntilSync: Got sync, conn=0
EVAN: discardUntilSync: Got result: res=2, conn=0 <== then immediately received result of next select, without a null res in between
EVAN: discardUntilSync: Got result value: 2, conn=0
Assertion failed: (res == ((void*)0)), function discardUntilSync, file pgbench.c, line 3579.
zsh: abort pgbench -n --failures-detailed -M extended -j 2 -c 2 -f deadlock.sql -f
```
Looks like there is no null result following the PIPELINE_SYNC message,
so the code comment seems inaccurate:
```
/*
* PGRES_PIPELINE_SYNC must be followed by another
* PGRES_PIPELINE_SYNC or NULL; otherwise, assert failure.
*/
Assert(res == NULL);
```
Then I made a dirty change that returns from discardUntilSync() once it receives a SYNC:
```
if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
{
printf("EVAN: discardUntilSync: Got sync, conn=%d\n",
PQstatus(st->con));
received_sync = true;
st->num_syncs = 0;
PQclear(res);
break;
}
```
That eliminates the assert:
```
% pgbench -n --failures-detailed -M extended -j 2 -c 2 -f deadlock.sql -f deadlock2.sql evantest
pgbench (19devel)
EVAN: on error discard: Got result: res=11, conn=0
EVAN: on error discard: Got result: res=7, conn=0
EVAN: discardUntilSync: Got result: res=10, conn=0
EVAN: discardUntilSync: Got sync, conn=0
pgbench: error: client 0 aborted: failed to exit pipeline mode for rolling back the failed transaction
transaction type: multiple scripts
scaling factor: 1
query mode: extended
number of clients: 2
number of threads: 2
maximum number of tries: 1
number of transactions per client: 10
number of transactions actually processed: 10/20
number of failed transactions: 0 (0.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
latency average = 203.933 ms
initial connection time = 3.006 ms
tps = 9.807152 (without initial connection time)
SQL script 1: deadlock.sql
- weight: 1 (targets 50.0% of total)
- 8 transactions (80.0% of total)
- number of transactions actually processed: 8 (tps = 7.845722)
- number of failed transactions: 0 (0.000%)
- number of serialization failures: 0 (0.000%)
- number of deadlock failures: 0 (0.000%)
- latency average = 127.115 ms
- latency stddev = 332.002 ms
SQL script 2: deadlock2.sql
- weight: 1 (targets 50.0% of total)
- 2 transactions (20.0% of total)
- number of transactions actually processed: 2 (tps = 1.961430)
- number of failed transactions: 0 (0.000%)
- number of serialization failures: 0 (0.000%)
- number of deadlock failures: 0 (0.000%)
- latency average = 1.347 ms
- latency stddev = 0.207 ms
pgbench: error: Run was aborted; the above results are incomplete.
```
So I think the key problem now is to confirm whether there must be a NULL following PGRES_PIPELINE_SYNC.
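For what it's worth, a drain loop that does not rely on that assumption would look roughly like this (a sketch only: the function name is made up, it assumes a sync has already been queued with PQpipelineSync(), and the fix eventually adopted in this thread additionally bails out on CONNECTION_BAD):
```
#include <stdbool.h>
#include <libpq-fe.h>

/* Sketch: read and discard results until the last sync point. */
static void
drain_until_last_sync(PGconn *conn)
{
	bool		received_sync = false;

	for (;;)
	{
		PGresult   *res = PQgetResult(conn);

		if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
			received_sync = true;	/* possibly the last sync */
		else if (received_sync && res == NULL)
			break;					/* sync followed by NULL: done */
		else
			received_sync = false;	/* another sync is still queued */

		PQclear(res);				/* no-op for NULL */
	}
}
```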
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Thu, 13 Nov 2025 11:55:25 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
On Wed, Nov 12, 2025 at 6:34 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
I've attached an updated patch.
Thanks for updating the patch!
The comment for the PQpipelineSync() call has also been updated to clarify
why it is necessary.

+ /*
+  * If a PGRES_PIPELINE_SYNC is followed by something other than
+  * PGRES_PIPELINE_SYNC or NULL, another PGRES_PIPELINE_SYNC will
+  * eventually follow.
+  */

LGTM. I'd like to append "Reset received_sync to false to wait for
it." into this comment.

In addition, I added a connection status check in the loop to avoid an
infinite loop waiting for PQpipelineSync after a connection failure.

Would it be better to move this status check right after PQgetResult()
so that connection failures can be detected regardless of what result
it returns?

+ pg_log_error("client %d aborted: the backend died while rolling back
the failed transaction after",

The trailing "after" seems unnecessary.
Since there's no guarantee the backend actually died in this case,
it might be better to use something like "client %d aborted while rolling back
the transaction after an error; perhaps the backend died while processing",
which matches the wording used under CSTATE_WAIT_ROLLBACK_RESULT
in advanceConnectionState().
Thank you for your review!
I've attached an updated patch reflecting your suggestion.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
v3-0001-pgbench-Fix-assertion-failure-with-multiple-syncp.patchtext/x-diff; name=v3-0001-pgbench-Fix-assertion-failure-with-multiple-syncp.patchDownload
From c7b3d71880dc1bbd9efbf22f663c30a0b7e01a9a Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Tue, 11 Nov 2025 10:14:30 +0900
Subject: [PATCH v3] pgbench: Fix assertion failure with multiple \syncpipeline
in pipeline mode.
When running pgbench with a custom script that triggered retriable errors
(e.g., deadlock errors) followed by multiple \syncpipeline commands in
pipeline mode, an assertion failure could occur:
pgbench.c:3594: discardUntilSync: Assertion `res == ((void *)0)' failed
This happened because discardUntilSync() did not expect that a result
other than NULL (e.g. PGRES_TUPLES_OK) might be received after \syncpipeline.
This commit fixes the assertion failure by resetting the received_sync flag
and continuing to discard results to ensure that all results are discarded
until the last sync point.
Also, if the connection was unexpectedly closed, this function could get
stuck in an infinite loop waiting for PGRES_PIPELINE_SYNC, which would never
be received. To fix this, exit the loop immediately if a connection failure
is detected.
---
src/bin/pgbench/pgbench.c | 35 +++++++++++++++++++++++++----------
1 file changed, 25 insertions(+), 10 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d8764ba6fe0..7d50ee38399 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3563,14 +3563,18 @@ doRetry(CState *st, pg_time_usec_t *now)
}
/*
- * Read results and discard it until a sync point.
+ * Read and discard results until the last sync point.
*/
static int
discardUntilSync(CState *st)
{
bool received_sync = false;
- /* send a sync */
+ /*
+ * Send a sync to ensure at least one PGRES_PIPELINE_SYNC is received
+ * and to avoid an infinite loop, since all earlier ones may have
+ * already been received.
+ */
if (!PQpipelineSync(st->con))
{
pg_log_error("client %d aborted: failed to send a pipeline sync",
@@ -3578,21 +3582,23 @@ discardUntilSync(CState *st)
return 0;
}
- /* receive PGRES_PIPELINE_SYNC and null following it */
+ /* receive the last PGRES_PIPELINE_SYNC and null following it */
for (;;)
{
PGresult *res = PQgetResult(st->con);
+ if (PQstatus(st->con) == CONNECTION_BAD)
+ {
+ pg_log_error("client %d aborted while rolling back the transaction after an error; perhaps the backend died while processing",
+ st->id);
+ PQclear(res);
+ return 0;
+ }
+
if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
received_sync = true;
- else if (received_sync)
+ else if (received_sync && res == NULL)
{
- /*
- * PGRES_PIPELINE_SYNC must be followed by another
- * PGRES_PIPELINE_SYNC or NULL; otherwise, assert failure.
- */
- Assert(res == NULL);
-
/*
* Reset ongoing sync count to 0 since all PGRES_PIPELINE_SYNC
* results have been discarded.
@@ -3601,6 +3607,15 @@ discardUntilSync(CState *st)
PQclear(res);
break;
}
+ else
+ {
+ /*
+ * If a PGRES_PIPELINE_SYNC is followed by something other than
+ * PGRES_PIPELINE_SYNC or NULL, another PGRES_PIPELINE_SYNC will
+ * eventually follow. Reset received_sync to false to wait for it.
+ */
+ received_sync = false;
+ }
PQclear(res);
}
--
2.43.0
On Nov 13, 2025, at 15:09, Yugo Nagata <nagata@sraoss.co.jp> wrote:
With v3 patch, the assert is gone, but test result is no longer accurate, because discardAvailableResults() discarded PIPELINE_SYNC messages. This is my test result with v3:
```
% pgbench -n --failures-detailed -M extended -j 2 -c 2 -f deadlock.sql -f deadlock2.sql evantest
pgbench (19devel)
EVAN: discardAvailableResults: discarding result: res=11, conn=0
EVAN: discardAvailableResults: discarding result: res=7, conn=0
EVAN: discardAvailableResults: discarding result: res=10, conn=0
EVAN: discardAvailableResults: discarding result: res=2, conn=0
EVAN: discardAvailableResults: discarding result value: 6, conn=0
EVAN: discardAvailableResults: discarding result: res=10, conn=0
EVAN: discardAvailableResults: discarding result: res=2, conn=0
EVAN: discardAvailableResults: discarding result value: 7, conn=0
EVAN: discardUntilSync: Got SYNC, conn=0
EVAN: discardUntilSync: Got result: res=2, conn=0
EVAN: discardUntilSync: discarding result value=8, conn=0
EVAN: discardUntilSync: Got result: res=7, conn=0
EVAN: discardUntilSync: Got SYNC, conn=0
EVAN: discardUntilSync: Got SYNC, conn=0
EVAN: discardUntilSync: Got NULL, conn=0
EVAN: discardAvailableResults: discarding result: res=11, conn=0
EVAN: discardAvailableResults: discarding result: res=7, conn=0
EVAN: discardAvailableResults: discarding result: res=10, conn=0
EVAN: discardAvailableResults: discarding result: res=2, conn=0
EVAN: discardAvailableResults: discarding result value: 6, conn=0
EVAN: discardAvailableResults: discarding result: res=10, conn=0
EVAN: discardAvailableResults: discarding result: res=2, conn=0
EVAN: discardAvailableResults: discarding result value: 7, conn=0
EVAN: discardUntilSync: Got SYNC, conn=0
EVAN: discardUntilSync: Got result: res=2, conn=0
EVAN: discardUntilSync: discarding result value=8, conn=0
EVAN: discardUntilSync: Got result: res=7, conn=0
EVAN: discardUntilSync: Got SYNC, conn=0
EVAN: discardUntilSync: Got SYNC, conn=0
EVAN: discardUntilSync: Got NULL, conn=0
EVAN: discardAvailableResults: discarding result: res=11, conn=0
EVAN: discardAvailableResults: discarding result: res=7, conn=0
EVAN: discardAvailableResults: discarding result: res=10, conn=0
EVAN: discardAvailableResults: discarding result: res=2, conn=0
EVAN: discardAvailableResults: discarding result value: 2, conn=0
EVAN: discardAvailableResults: discarding result: res=10, conn=0
EVAN: discardAvailableResults: discarding result: res=2, conn=0
EVAN: discardAvailableResults: discarding result value: 3, conn=0
EVAN: discardUntilSync: Got SYNC, conn=0
EVAN: discardUntilSync: Got result: res=2, conn=0
EVAN: discardUntilSync: discarding result value=4, conn=0
EVAN: discardUntilSync: Got result: res=7, conn=0
EVAN: discardUntilSync: Got SYNC, conn=0
EVAN: discardUntilSync: Got SYNC, conn=0
EVAN: discardUntilSync: Got NULL, conn=0
EVAN: discardAvailableResults: discarding result: res=11, conn=0
EVAN: discardAvailableResults: discarding result: res=7, conn=0
EVAN: discardAvailableResults: discarding result: res=10, conn=0
EVAN: discardAvailableResults: discarding result: res=2, conn=0
EVAN: discardAvailableResults: discarding result value: 6, conn=0
EVAN: discardAvailableResults: discarding result: res=10, conn=0
EVAN: discardAvailableResults: discarding result: res=2, conn=0
EVAN: discardAvailableResults: discarding result value: 7, conn=0
EVAN: discardUntilSync: Got SYNC, conn=0
EVAN: discardUntilSync: Got result: res=2, conn=0
EVAN: discardUntilSync: discarding result value=8, conn=0
EVAN: discardUntilSync: Got result: res=7, conn=0
EVAN: discardUntilSync: Got SYNC, conn=0
EVAN: discardUntilSync: Got SYNC, conn=0
EVAN: discardUntilSync: Got NULL, conn=0
EVAN: discardAvailableResults: discarding result: res=11, conn=0
EVAN: discardAvailableResults: discarding result: res=7, conn=0
EVAN: discardAvailableResults: discarding result: res=10, conn=0
EVAN: discardAvailableResults: discarding result: res=2, conn=0
EVAN: discardAvailableResults: discarding result value: 2, conn=0
EVAN: discardAvailableResults: discarding result: res=10, conn=0
EVAN: discardAvailableResults: discarding result: res=2, conn=0
EVAN: discardAvailableResults: discarding result value: 3, conn=0
EVAN: discardUntilSync: Got SYNC, conn=0
EVAN: discardUntilSync: Got result: res=2, conn=0
EVAN: discardUntilSync: discarding result value=4, conn=0
EVAN: discardUntilSync: Got result: res=7, conn=0
EVAN: discardUntilSync: Got SYNC, conn=0
EVAN: discardUntilSync: Got SYNC, conn=0
EVAN: discardUntilSync: Got NULL, conn=0
transaction type: multiple scripts
scaling factor: 1
query mode: extended
number of clients: 2
number of threads: 2
maximum number of tries: 1
number of transactions per client: 10
number of transactions actually processed: 15/20
number of failed transactions: 5 (25.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 5 (25.000%)
number of other failures: 0 (0.000%)
latency average = 502.741 ms (including failures)
initial connection time = 2.882 ms
tps = 2.983644 (without initial connection time)
SQL script 1: deadlock.sql
- weight: 1 (targets 50.0% of total)
- 11 transactions (55.0% of total)
- number of transactions actually processed: 9 (tps = 1.790186)
- number of failed transactions: 2 (18.182%)
- number of serialization failures: 0 (0.000%)
- number of deadlock failures: 2 (18.182%)
- number of other failures: 0 (0.000%)
- latency average = 336.030 ms
- latency stddev = 472.160 ms
SQL script 2: deadlock2.sql
- weight: 1 (targets 50.0% of total)
- 9 transactions (45.0% of total)
- number of transactions actually processed: 6 (tps = 1.193457)
- number of failed transactions: 3 (33.333%)
- number of serialization failures: 0 (0.000%)
- number of deadlock failures: 3 (33.333%)
- number of other failures: 0 (0.000%)
- latency average = 335.757 ms
- latency stddev = 472.107 ms
```
We can see:
* number of transactions actually processed: 15/20
* number of failed transactions: 5 (25.000%)
However, with the dirty diff I sent in the morning:
```
% pgbench -n --failures-detailed -M extended -j 2 -c 2 -f deadlock.sql -f deadlock2.sql evantest
… omit debug logs …
transaction type: multiple scripts
scaling factor: 1
query mode: extended
number of clients: 2
number of threads: 2
maximum number of tries: 1
number of transactions per client: 10
number of transactions actually processed: 20/20
number of failed transactions: 0 (0.000%)
number of serialization failures: 0 (0.000%)
number of deadlock failures: 0 (0.000%)
number of other failures: 0 (0.000%)
latency average = 302.863 ms
initial connection time = 2.749 ms
tps = 6.603655 (without initial connection time)
SQL script 1: deadlock.sql
- weight: 1 (targets 50.0% of total)
- 11 transactions (55.0% of total)
- number of transactions actually processed: 11 (tps = 3.632010)
- number of failed transactions: 0 (0.000%)
- number of serialization failures: 0 (0.000%)
- number of deadlock failures: 0 (0.000%)
- number of other failures: 0 (0.000%)
- latency average = 275.532 ms
- latency stddev = 445.629 ms
SQL script 2: deadlock2.sql
- weight: 1 (targets 50.0% of total)
- 9 transactions (45.0% of total)
- number of transactions actually processed: 9 (tps = 2.971645)
- number of failed transactions: 0 (0.000%)
- number of serialization failures: 0 (0.000%)
- number of deadlock failures: 0 (0.000%)
- number of other failures: 0 (0.000%)
- latency average = 336.150 ms
- latency stddev = 472.091 ms
```
Now all transactions are processed and there are no failures. I think that is expected, because \syncpipeline should roll back failures, so all scripts should succeed.
It feels to me that, because of the newly introduced discardAvailableResults(), we need different fixes for master and the old branches.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Thu, 13 Nov 2025 16:13:30 +0800
Chao Li <li.evan.chao@gmail.com> wrote:
Now all transactions are processed and there are no failures. I think that is expected, because \syncpipeline should roll back failures, so all scripts should succeed.
It feels to me that, because of the newly introduced discardAvailableResults(), we need different fixes for master and the old branches.
I understand your claim that scripts rolled back by \syncpipeline should
be considered successful. However, I believe treating them as failed
transactions is the expected behavior in pgbench, since it assumes that
a transaction script contains only one transaction, as described in the
documentation [1].
The following script:
\startpipeline
<queries list 1>
\syncpipeline
<queries list 2>
\endpipeline
can be considered equivalent to:
BEGIN;
<queries list 1>
END;
BEGIN;
<queries list 2>
END;
with respect to the scope of queries rolled back.
In the latter script, an error (such as a deadlock or serialization failure)
in any query is recorded as a failed transaction in the current pgbench, even
if part of the script has already been committed.
Therefore, the same behavior would be expected for the former script using a
pipeline.
[1]: https://www.postgresql.org/docs/current/pgbench.html#FAILURES-AND-RETRIES
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Nov 13, 2025, at 17:40, Yugo Nagata <nagata@sraoss.co.jp> wrote:
The following script:
\startpipeline
<queries list 1>
\syncpipeline
<queries list 2>
\endpipeline

can be considered equivalent to:
BEGIN;
<queries list 1>
END;
BEGIN;
<queries list 2>
END;
It looks like every \syncpipeline starts a new transaction, is that true?
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Thu, 13 Nov 2025 18:17:37 +0800
Chao Li <li.evan.chao@gmail.com> wrote:
On Nov 13, 2025, at 17:40, Yugo Nagata <nagata@sraoss.co.jp> wrote:
The following script:
\startpipeline
<queries list 1>
\syncpipeline
<queries list 2>
\endpipeline

can be considered equivalent to:
BEGIN;
<queries list 1>
END;
BEGIN;
<queries list 2>
END;

It looks like every \syncpipeline starts a new transaction, is that true?
Yes, it causes a new transaction to start.
In a pipeline, an implicit transaction block is started, and \syncpipeline closes it.
Then, a new implicit transaction begins.
Here’s a simple example to illustrate this:
$ cat pipeline_tx.sql
drop table if exists tbl;
create table tbl (i int);
\startpipeline
insert into tbl values(1);
insert into tbl values(2);
\syncpipeline
insert into tbl values(3);
insert into tbl values(4);
\endpipeline
$ pgbench -f pipeline_tx.sql -t 1 -M extended -n > /dev/null
$ psql -c "select xmin, i from tbl"
xmin | i
------+---
1268 | 1
1268 | 2
1269 | 3
1269 | 4
(4 rows)
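Since xmin records the ID of the inserting transaction, the two distinct values above (1268 for rows 1 and 2, 1269 for rows 3 and 4) confirm that \syncpipeline closed the first implicit transaction and a new one started for the remaining inserts.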
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Nov 13, 2025, at 21:55, Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Nov 13, 2025 at 4:09 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
Thank you for your review!
I've attached an updated patch reflecting your suggestion.

Thanks for updating the patch! LGTM.

You mentioned that the assertion failure could occur when using \syncpipeline,
but it seems that multiple PGRES_PIPELINE_SYNC results can also appear
even without it, which can still trigger the same issue. For example,
I was able to reproduce the assertion failure in v16 (which doesn't support
\syncpipeline) with the following setup:

--------------------------------
$ cat deadlock.sql
\startpipeline
select * from a order by i for update;
select 1;
\endpipeline

$ cat deadlock2.sql
\startpipeline
select * from a order by i desc for update;
select 1;
\endpipeline

$ psql -c "create table a (i int primary key); insert into a
values(generate_series(1,1000));"

$ pgbench -n -j 4 -c 4 -T 5 -M extended -f deadlock.sql -f deadlock2.sql
...
Assertion failed: (res == ((void *)0)), function discardUntilSync,
file pgbench.c, line 3479.
--------------------------------

So I've updated the commit message to clarify that while using \syncpipeline
makes the failure more likely, it can still occur without it. Since the issue
can also happen in v15 and v16 (which both lack \syncpipeline), I plan to
backpatch the fix to v15. The failure doesn't occur in v14 because it doesn't
support retriable error retries.

I've also made a few cosmetic tweaks to the patch. Attached is the updated
version, which I plan to push.

Regards,
--
Fujii Masao

<v4-0001-pgbench-PG15-PG16-Fix-assertion-failure-when-discarding-res.txt><v4-0001-pgbench-Fix-assertion-failure-when-discarding-res.patch>
I was misunderstanding; I thought "\syncpipeline" would recover the transaction. Now that the confusion is resolved, I think the v4 patch is overall good. Only one small comment:
```
+ else if (received_sync && res == NULL)
{
- /*
- * PGRES_PIPELINE_SYNC must be followed by another
- * PGRES_PIPELINE_SYNC or NULL; otherwise, assert failure.
- */
- Assert(res == NULL);
-
/*
* Reset ongoing sync count to 0 since all PGRES_PIPELINE_SYNC
* results have been discarded.
@@ -3601,6 +3610,15 @@ discardUntilSync(CState *st)
PQclear(res);
break;
}
```
As we now add "res == NULL" to the "else if", once entering "else if (received_sync && res == NULL)", res must be NULL, so the "PQclear(res);" should be deleted. Leaving it there does no harm today, but it is error-prone: if someone later removes "res == NULL" from the "else if", it will lead to a double free, because after the "break", PQclear(res) will be called again.
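Concretely, the cleaned-up branch would look something like this (a sketch of the suggestion, not a hunk from the patch):
```
else if (received_sync && res == NULL)
{
	/*
	 * Reset ongoing sync count to 0 since all PGRES_PIPELINE_SYNC
	 * results have been discarded. res is known to be NULL here, so
	 * there is nothing to PQclear() before leaving the loop.
	 */
	st->num_syncs = 0;
	break;
}
```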
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Thu, 13 Nov 2025 22:55:53 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Nov 13, 2025 at 4:09 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
Thank you for your review!
I've attached an updated patch reflecting your suggestion.

Thanks for updating the patch! LGTM.

You mentioned that the assertion failure could occur when using \syncpipeline,
but it seems that multiple PGRES_PIPELINE_SYNC results can also appear
even without it, which can still trigger the same issue. For example,
I was able to reproduce the assertion failure in v16 (which doesn't support
\syncpipeline) with the following setup:

--------------------------------
$ cat deadlock.sql
\startpipeline
select * from a order by i for update;
select 1;
\endpipeline

$ cat deadlock2.sql
\startpipeline
select * from a order by i desc for update;
select 1;
\endpipeline

$ psql -c "create table a (i int primary key); insert into a
values(generate_series(1,1000));"

$ pgbench -n -j 4 -c 4 -T 5 -M extended -f deadlock.sql -f deadlock2.sql
...
Assertion failed: (res == ((void *)0)), function discardUntilSync,
file pgbench.c, line 3479.
--------------------------------

So I've updated the commit message to clarify that while using \syncpipeline
makes the failure more likely, it can still occur without it. Since the issue
can also happen in v15 and v16 (which both lack \syncpipeline), I plan to
backpatch the fix to v15. The failure doesn't occur in v14 because it doesn't
support retriable error retries.
I could not reproduce it with the latest REL_16_STABLE branch.
Perhaps, the assertion failure you mentioned above was the one
fixed by 1d3ded521?
Or, I am missing something...
I've also made a few cosmetic tweaks to the patch. Attached is the updated
version, which I plan to push.
Thank you for updating the patch.
By the way, your previous email has not been archived [1].
I guess it was not received by the server due to some issue.
Therefore, I've attached the patches you sent.
[1]: https://www.postgresql.org/list/pgsql-hackers/since/202511130000/
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
v4-0001-pgbench-Fix-assertion-failure-when-discarding-res.patchtext/x-diff; name=v4-0001-pgbench-Fix-assertion-failure-when-discarding-res.patchDownload
From f62f3acb82ebea71cb322b5a4b4effb3de557261 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Thu, 13 Nov 2025 18:43:19 +0900
Subject: [PATCH v4] pgbench: Fix assertion failure when discarding results
after retriable errors.
Previously, when pgbench ran a custom script that triggered retriable errors
(e.g., deadlocks) in pipeline mode, the following assertion failure could occur:
Assertion failed: (res == ((void*)0)), function discardUntilSync, file pgbench.c, line 3594.
This typically happened when multiple \syncpipeline commands followed
a statement that caused a retriable error. However, even in v15 and v16
where \syncpipeline is not supported, scripts without it could still trigger
this failure.
The issue was that discardUntilSync() assumed a pipeline sync result
(PGRES_PIPELINE_SYNC) would always be followed by either another sync result
or NULL. This assumption was incorrect: when multiple sync requests were sent,
a sync result could instead be followed by another result type. In such cases,
discardUntilSync() mishandled the results, leading to the assertion failure.
This commit fixes the issue by making discardUntilSync() correctly handle cases
where a pipeline sync result is followed by other result types. It now continues
discarding results until another pipeline sync followed by NULL is reached.
Backpatched to v15, where support for retrying retriable errors in pgbench
was introduced.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/20251111105037.f3fc554616bc19891f926c5b@sraoss.co.jp
Backpatch-through: 15
---
src/bin/pgbench/pgbench.c | 38 ++++++++++++++++++++++++++++----------
1 file changed, 28 insertions(+), 10 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d8764ba6fe0..8caf7b8bdaf 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3563,14 +3563,18 @@ doRetry(CState *st, pg_time_usec_t *now)
}
/*
- * Read results and discard it until a sync point.
+ * Read and discard results until the last sync point.
*/
static int
discardUntilSync(CState *st)
{
bool received_sync = false;
- /* send a sync */
+ /*
+ * Send a Sync message to ensure at least one PGRES_PIPELINE_SYNC is
+ * received and to avoid an infinite loop, since all earlier ones may have
+ * already been received.
+ */
if (!PQpipelineSync(st->con))
{
pg_log_error("client %d aborted: failed to send a pipeline sync",
@@ -3578,21 +3582,26 @@ discardUntilSync(CState *st)
return 0;
}
- /* receive PGRES_PIPELINE_SYNC and null following it */
+ /*
+ * Continue reading results until the last sync point, i.e., until
+ * reaching null just after PGRES_PIPELINE_SYNC.
+ */
for (;;)
{
PGresult *res = PQgetResult(st->con);
+ if (PQstatus(st->con) == CONNECTION_BAD)
+ {
+ pg_log_error("client %d aborted while rolling back the transaction after an error; perhaps the backend died while processing",
+ st->id);
+ PQclear(res);
+ return 0;
+ }
+
if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
received_sync = true;
- else if (received_sync)
+ else if (received_sync && res == NULL)
{
- /*
- * PGRES_PIPELINE_SYNC must be followed by another
- * PGRES_PIPELINE_SYNC or NULL; otherwise, assert failure.
- */
- Assert(res == NULL);
-
/*
* Reset ongoing sync count to 0 since all PGRES_PIPELINE_SYNC
* results have been discarded.
@@ -3601,6 +3610,15 @@ discardUntilSync(CState *st)
PQclear(res);
break;
}
+ else
+ {
+ /*
+ * If a PGRES_PIPELINE_SYNC is followed by something other than
+ * PGRES_PIPELINE_SYNC or NULL, another PGRES_PIPELINE_SYNC will
+ * appear later. Reset received_sync to false to wait for it.
+ */
+ received_sync = false;
+ }
PQclear(res);
}
--
2.51.2
v4-0001-pgbench-PG15-PG16-Fix-assertion-failure-when-discarding-res.txttext/plain; name=v4-0001-pgbench-PG15-PG16-Fix-assertion-failure-when-discarding-res.txtDownload
From 1865eabfd65232feff106e7c01c8c6c9161571c8 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Thu, 13 Nov 2025 18:43:19 +0900
Subject: [PATCH v4] pgbench: Fix assertion failure when discarding results
after retriable errors.
Previously, when pgbench ran a custom script that triggered retriable errors
(e.g., deadlocks) in pipeline mode, the following assertion failure could occur:
Assertion failed: (res == ((void*)0)), function discardUntilSync, file pgbench.c, line 3594.
This typically happened when multiple \syncpipeline commands followed
a statement that caused a retriable error. However, even in v15 and v16
where \syncpipeline is not supported, scripts without it could still trigger
this failure.
The issue was that discardUntilSync() assumed a pipeline sync result
(PGRES_PIPELINE_SYNC) would always be followed by either another sync result
or NULL. This assumption was incorrect: when multiple sync requests were sent,
a sync result could instead be followed by another result type. In such cases,
discardUntilSync() mishandled the results, leading to the assertion failure.
This commit fixes the issue by making discardUntilSync() correctly handle cases
where a pipeline sync result is followed by other result types. It now continues
discarding results until another pipeline sync followed by NULL is reached.
Backpatched to v15, where support for retrying retriable errors in pgbench
was introduced.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/20251111105037.f3fc554616bc19891f926c5b@sraoss.co.jp
Backpatch-through: 15
---
src/bin/pgbench/pgbench.c | 38 ++++++++++++++++++++++++++++----------
1 file changed, 28 insertions(+), 10 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index adf6e45953b..4bdd507582a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3475,14 +3475,18 @@ doRetry(CState *st, pg_time_usec_t *now)
}
/*
- * Read results and discard it until a sync point.
+ * Read and discard results until the last sync point.
*/
static int
discardUntilSync(CState *st)
{
bool received_sync = false;
- /* send a sync */
+ /*
+ * Send a Sync message to ensure at least one PGRES_PIPELINE_SYNC is
+ * received and to avoid an infinite loop, since all earlier ones may have
+ * already been received.
+ */
if (!PQpipelineSync(st->con))
{
pg_log_error("client %d aborted: failed to send a pipeline sync",
@@ -3490,24 +3494,38 @@ discardUntilSync(CState *st)
return 0;
}
- /* receive PGRES_PIPELINE_SYNC and null following it */
+ /*
+ * Continue reading results until the last sync point, i.e., until
+ * reaching null just after PGRES_PIPELINE_SYNC.
+ */
for (;;)
{
PGresult *res = PQgetResult(st->con);
+ if (PQstatus(st->con) == CONNECTION_BAD)
+ {
+ pg_log_error("client %d aborted while rolling back the transaction after an error; perhaps the backend died while processing",
+ st->id);
+ PQclear(res);
+ return 0;
+ }
+
if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
received_sync = true;
- else if (received_sync)
+ else if (received_sync && res == NULL)
{
- /*
- * PGRES_PIPELINE_SYNC must be followed by another
- * PGRES_PIPELINE_SYNC or NULL; otherwise, assert failure.
- */
- Assert(res == NULL);
-
PQclear(res);
break;
}
+ else
+ {
+ /*
+ * If a PGRES_PIPELINE_SYNC is followed by something other than
+ * PGRES_PIPELINE_SYNC or NULL, another PGRES_PIPELINE_SYNC will
+ * appear later. Reset received_sync to false to wait for it.
+ */
+ received_sync = false;
+ }
PQclear(res);
}
--
2.51.2
On Fri, Nov 14, 2025 at 4:50 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
I could not reproduce it with the latest REL_16_STABLE branch.
Perhaps, the assertion failure you mentioned above was the one
fixed by 1d3ded521?
Yeah, you're right! Thanks for catching that.
I've updated the commit message to explicitly mention the \syncpipeline command.
Patch attached.
Since the assertion failure can occur only in versions that support \syncpipeline,
the fix doesn't need to be backpatched to v16 or older.
Regards,
--
Fujii Masao
Attachments:
v5-0001-pgbench-Fix-assertion-failure-with-multiple-syncp.patchapplication/octet-stream; name=v5-0001-pgbench-Fix-assertion-failure-with-multiple-syncp.patchDownload
From 29816a73e048e1a35a99ef467c03366df9b5b249 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Thu, 13 Nov 2025 18:43:19 +0900
Subject: [PATCH v5] pgbench: Fix assertion failure with multiple \syncpipeline
in pipeline mode.
Previously, when pgbench ran a custom script that triggered retriable errors
(e.g., deadlocks) followed by multiple \syncpipeline commands in pipeline mode,
the following assertion failure could occur:
Assertion failed: (res == ((void*)0)), function discardUntilSync, file pgbench.c, line 3594.
The issue was that discardUntilSync() assumed a pipeline sync result
(PGRES_PIPELINE_SYNC) would always be followed by either another sync result
or NULL. This assumption was incorrect: when multiple sync requests were sent,
a sync result could instead be followed by another result type. In such cases,
discardUntilSync() mishandled the results, leading to the assertion failure.
This commit fixes the issue by making discardUntilSync() correctly handle cases
where a pipeline sync result is followed by other result types. It now continues
discarding results until another pipeline sync followed by NULL is reached.
Backpatched to v17, where support for \syncpipeline command in pgbench was
introduced.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/20251111105037.f3fc554616bc19891f926c5b@sraoss.co.jp
Backpatch-through: 17
---
src/bin/pgbench/pgbench.c | 39 ++++++++++++++++++++++++++++-----------
1 file changed, 28 insertions(+), 11 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d8764ba6fe0..a425176ecdc 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3563,14 +3563,18 @@ doRetry(CState *st, pg_time_usec_t *now)
}
/*
- * Read results and discard it until a sync point.
+ * Read and discard results until the last sync point.
*/
static int
discardUntilSync(CState *st)
{
bool received_sync = false;
- /* send a sync */
+ /*
+ * Send a Sync message to ensure at least one PGRES_PIPELINE_SYNC is
+ * received and to avoid an infinite loop, since all earlier ones may have
+ * already been received.
+ */
if (!PQpipelineSync(st->con))
{
pg_log_error("client %d aborted: failed to send a pipeline sync",
@@ -3578,29 +3582,42 @@ discardUntilSync(CState *st)
return 0;
}
- /* receive PGRES_PIPELINE_SYNC and null following it */
+ /*
+ * Continue reading results until the last sync point, i.e., until
+ * reaching null just after PGRES_PIPELINE_SYNC.
+ */
for (;;)
{
PGresult *res = PQgetResult(st->con);
+ if (PQstatus(st->con) == CONNECTION_BAD)
+ {
+ pg_log_error("client %d aborted while rolling back the transaction after an error; perhaps the backend died while processing",
+ st->id);
+ PQclear(res);
+ return 0;
+ }
+
if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
received_sync = true;
- else if (received_sync)
+ else if (received_sync && res == NULL)
{
- /*
- * PGRES_PIPELINE_SYNC must be followed by another
- * PGRES_PIPELINE_SYNC or NULL; otherwise, assert failure.
- */
- Assert(res == NULL);
-
/*
* Reset ongoing sync count to 0 since all PGRES_PIPELINE_SYNC
* results have been discarded.
*/
st->num_syncs = 0;
- PQclear(res);
break;
}
+ else
+ {
+ /*
+ * If a PGRES_PIPELINE_SYNC is followed by something other than
+ * PGRES_PIPELINE_SYNC or NULL, another PGRES_PIPELINE_SYNC will
+ * appear later. Reset received_sync to false to wait for it.
+ */
+ received_sync = false;
+ }
PQclear(res);
}
--
2.51.2
On Fri, Nov 14, 2025 at 4:45 PM Chao Li <li.evan.chao@gmail.com> wrote:
```
+ else if (received_sync && res == NULL)
{
- /*
- * PGRES_PIPELINE_SYNC must be followed by another
- * PGRES_PIPELINE_SYNC or NULL; otherwise, assert failure.
- */
- Assert(res == NULL);
-
/*
* Reset ongoing sync count to 0 since all PGRES_PIPELINE_SYNC
* results have been discarded.
@@ -3601,6 +3610,15 @@ discardUntilSync(CState *st)
PQclear(res);
break;
}
```

As we now add "res == NULL" to the "else if", once entering "else if (received_sync && res == NULL)", res must be NULL, so the "PQclear(res);" should be deleted.
OK, the PQclear() there is unnecessary, so I removed it in the patch I
posted earlier.
Regards,
--
Fujii Masao
On Fri, 14 Nov 2025 18:08:24 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:
On Fri, Nov 14, 2025 at 4:50 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
I could not reproduce it with the latest REL_16_STABLE branch.
Perhaps, the assertion failure you mentioned above was the one
fixed by 1d3ded521?

Yeah, you're right! Thanks for catching that.
I've updated the commit message to explicitly mention the \syncpipeline command.
Patch attached.

Since the assertion failure can occur only in versions that support \syncpipeline,
the fix doesn't need to be backpatched to v16 or older.
Thank you for updating and pushing the patch!
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>