Retry in pgbench

Started by Tatsuo Ishii · almost 5 years ago · 8 messages
#1 Tatsuo Ishii
ishii@sraoss.co.jp

Currently the standard pgbench scenario produces transaction
serialization errors ("could not serialize access due to concurrent
update") if PostgreSQL runs at the REPEATABLE READ or SERIALIZABLE
isolation level, and the session aborts. In order to achieve meaningful
results even at these transaction isolation levels, I would like to
propose an automatic retry feature for when a "could not serialize
access due to concurrent update" error occurs.

Probably just adding a switch to enable retries is not enough; a retry
method (random interval, etc.) and a maximum number of retries may also
need to be added.
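
For illustration, here is a minimal client-side sketch of such a retry
loop, written against libpq directly rather than pgbench internals (the
MAX_RETRIES constant and the helper name are placeholders; SQLSTATE
40001 is "serialization_failure"):

/*
 * Sketch only: on a serialization failure the whole transaction must be
 * rolled back and re-run from the start, so tx_sql should contain the
 * complete BEGIN ... END block.
 */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <libpq-fe.h>

#define MAX_RETRIES 5                   /* placeholder for an option */

static int
run_tx_with_retry(PGconn *conn, const char *tx_sql)
{
    for (int attempt = 0; attempt <= MAX_RETRIES; attempt++)
    {
        PGresult   *res = PQexec(conn, tx_sql);
        ExecStatusType st = PQresultStatus(res);

        if (st == PGRES_COMMAND_OK || st == PGRES_TUPLES_OK)
        {
            PQclear(res);
            return 0;                   /* transaction committed */
        }

        const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
        int         retryable = sqlstate && strcmp(sqlstate, "40001") == 0;

        PQclear(res);
        PQclear(PQexec(conn, "ROLLBACK"));  /* leave the aborted tx */

        if (!retryable)
            return -1;                  /* other errors are not retried */

        /* random interval (up to ~100 ms) so clients don't retry in lockstep */
        usleep((useconds_t) (random() % 100000));
    }
    return -1;                          /* gave up after MAX_RETRIES */
}

A real patch would of course also count retries in the reported stats
rather than silently looping.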

I would like to hear your thoughts,

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#2 Thomas Munro
thomas.munro@gmail.com
In reply to: Tatsuo Ishii (#1)
Re: Retry in pgbench

On Tue, Apr 13, 2021 at 5:51 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

Currently the standard pgbench scenario produces transaction
serialization errors ("could not serialize access due to concurrent
update") if PostgreSQL runs at the REPEATABLE READ or SERIALIZABLE
isolation level, and the session aborts. In order to achieve meaningful
results even at these transaction isolation levels, I would like to
propose an automatic retry feature for when a "could not serialize
access due to concurrent update" error occurs.

Probably just adding a switch to enable retries is not enough; a retry
method (random interval, etc.) and a maximum number of retries may also
need to be added.

I would like to hear your thoughts,

See also:

/messages/by-id/72a0d590d6ba06f242d75c2e641820ec@postgrespro.ru

#3 Tatsuo Ishii
ishii@sraoss.co.jp
In reply to: Thomas Munro (#2)
Re: Retry in pgbench

On Tue, Apr 13, 2021 at 5:51 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

Currently the standard pgbench scenario produces transaction
serialization errors ("could not serialize access due to concurrent
update") if PostgreSQL runs at the REPEATABLE READ or SERIALIZABLE
isolation level, and the session aborts. In order to achieve meaningful
results even at these transaction isolation levels, I would like to
propose an automatic retry feature for when a "could not serialize
access due to concurrent update" error occurs.

Probably just adding a switch to enable retries is not enough; a retry
method (random interval, etc.) and a maximum number of retries may also
need to be added.

I would like to hear your thoughts,

See also:

/messages/by-id/72a0d590d6ba06f242d75c2e641820ec@postgrespro.ru

Thanks for the pointer. It seems we need to resume the discussion.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#4 Jehan-Guillaume de Rorthais
In reply to: Tatsuo Ishii (#3)
Re: Retry in pgbench

Hi,

On Tue, 13 Apr 2021 16:12:59 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

[...]
[...]
[...]

Thanks for the pointer. It seems we need to resume the discussion.

By the way, I've been playing with the idea of failing gracefully and
retrying indefinitely (or until the given -T expires) on SQL errors AND
connection issues.

It would be useful to test replicating clusters with a (switch|fail)over
procedure.
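
Roughly, the client loop I have in mind looks like this (a standalone
libpq sketch, not actual pgbench code; the helper name is mine):

#include <time.h>
#include <unistd.h>
#include <libpq-fe.h>

/*
 * Keep issuing a statement until the deadline (emulating -T),
 * reconnecting instead of aborting when the server goes away during a
 * (switch|fail)over.
 */
static void
run_until_deadline(PGconn *conn, const char *sql, time_t deadline)
{
    while (time(NULL) < deadline)
    {
        if (PQstatus(conn) != CONNECTION_OK)
        {
            PQreset(conn);      /* try to re-establish the connection */
            if (PQstatus(conn) != CONNECTION_OK)
            {
                sleep(1);       /* server still down: wait, then retry */
                continue;
            }
        }

        PGresult *res = PQexec(conn, sql);

        /*
         * A failure here is either a dropped connection (handled above
         * on the next iteration) or an SQL error that is simply retried.
         */
        PQclear(res);
    }
}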

Regards,

#5 Tatsuo Ishii
ishii@sraoss.co.jp
In reply to: Jehan-Guillaume de Rorthais (#4)
Re: Retry in pgbench

By the way, I've been playing with the idea of failing gracefully and
retrying indefinitely (or until the given -T expires) on SQL errors AND
connection issues.

It would be useful to test replicating clusters with a (switch|fail)over
procedure.

Interesting idea, but in general a failover takes some time (like a few
minutes), and it will strongly affect TPS. I think in the end it would
just be comparing failover times.

Or are you suggesting that we ignore the time spent in failover?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#6 Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Tatsuo Ishii (#5)
Re: Retry in pgbench

It would be useful to test replicating clusters with a (switch|fail)over
procedure.

Interesting idea, but in general a failover takes some time (like a few
minutes), and it will strongly affect TPS. I think in the end it would
just be comparing failover times.

Or are you suggesting that we ignore the time spent in failover?

Or simply to be able to measure it from a client perspective? How much
delay is introduced, how long does it take to get back to the previous
TPS level…

My recollection of Marina's patch is that it was non-trivial; adding
such a new and interesting feature suggests a set of patches, not just
one patch.

--
Fabien.

#7 Jehan-Guillaume de Rorthais
In reply to: Tatsuo Ishii (#5)
Re: Retry in pgbench

On Fri, 16 Apr 2021 10:28:48 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

By the way, I've been playing with the idea of failing gracefully and
retrying indefinitely (or until the given -T expires) on SQL errors AND
connection issues.

It would be useful to test replicating clusters with a (switch|fail)over
procedure.

Interesting idea, but in general a failover takes some time (like a few
minutes), and it will strongly affect TPS. I think in the end it would
just be comparing failover times.

This use case is not about benchmarking. It's about generating constant
traffic to be able to practice/train some [auto]switchover procedures
while staying close to production activity.

In this context, the max-saturated TPS of a single node is not relevant.
But being able to add some stats about downtime might be a good
addition.
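
For example, the client could track outage windows along these lines (a
sketch; the struct and field names are purely illustrative):

#include <stdbool.h>
#include <time.h>

/*
 * Downtime as seen by the client: the span from the first failed
 * request to the first success after it, summed over all outages.
 */
typedef struct
{
    bool    down;               /* currently inside an outage window? */
    time_t  down_since;         /* when the current outage started */
    double  total_downtime;     /* accumulated downtime, in seconds */
} downtime_stats;

static void
record_result(downtime_stats *st, bool success)
{
    if (!success && !st->down)
    {
        st->down = true;        /* outage begins */
        st->down_since = time(NULL);
    }
    else if (success && st->down)
    {
        st->down = false;       /* outage ends */
        st->total_downtime += difftime(time(NULL), st->down_since);
    }
}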

Regards,

#8 Tatsuo Ishii
ishii@sraoss.co.jp
In reply to: Jehan-Guillaume de Rorthais (#7)
Re: Retry in pgbench

This use case is not about benchmarking. It's about generating constant
traffic to be able to practice/train some [auto]switchover procedures
while staying close to production activity.

In this context, the max-saturated TPS of a single node is not relevant.
But being able to add some stats about downtime might be a good
addition.

Oh I see. That makes sense.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp