[PATCH] pgbench: add multiconnect option

Started by David Christensen, almost 5 years ago, 20 messages, hackers
#1David Christensen
david.christensen@crunchydata.com

-hackers,

This patch adds the concept of "multiconnect" to pgbench (better
terminology welcome). The basic idea here is to allow connections made
with pgbench to use different auth values or connect to multiple
databases. We implement this using a user-provided PGSERVICEFILE and
choosing a PGSERVICE from this based on a number of strategies.
(Currently the only supported strategies are round robin or random.)
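
For illustration, the two strategies could be sketched roughly as follows. This is a minimal sketch and not code from the attached patch: the names `conn_policy`, `conn_chooser` and `choose_connection`, as well as the small xorshift generator, are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the patches in this thread may use different ones. */
typedef enum { POLICY_ROUND_ROBIN, POLICY_RANDOM } conn_policy;

typedef struct
{
    conn_policy policy;
    int         nconn;      /* number of candidate services/connections */
    int         next;       /* round-robin cursor */
    uint64_t    rng_state;  /* xorshift64 state, must be non-zero */
} conn_chooser;

/* Small self-contained PRNG so the sketch has no external dependency. */
static uint64_t
xorshift64(uint64_t *state)
{
    uint64_t x = *state;

    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Return the index of the entry to use for the next connection attempt. */
static int
choose_connection(conn_chooser *c)
{
    assert(c->nconn > 0);

    if (c->policy == POLICY_ROUND_ROBIN)
    {
        int idx = c->next;

        c->next = (c->next + 1) % c->nconn;
        return idx;
    }

    /* Random policy; the modulo bias is ignored in this sketch. */
    return (int) (xorshift64(&c->rng_state) % (uint64_t) c->nconn);
}
```

Calling `choose_connection()` once per connection attempt would also cover reconnections, since each new attempt simply asks the chooser again.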

There is definite room for improvement here; at the very least, teaching
`pgbench -i` about all of the distinct DBs referenced in this service
file would ensure that initialization works as expected in all places.
For now, we are punting initialization to the user in this version of
the patch if using more than one database in the given service file.

Best,

David

Attachments:

pgbench-add-multiconnect.patch (text/x-patch, +217 -2)
#2Fabien COELHO
coelho@cri.ensmp.fr
In reply to: David Christensen (#1)
Re: [PATCH] pgbench: add multiconnect option

Hello David,

This patch adds the concept of "multiconnect" to pgbench (better
terminology welcome).

Good. I was thinking of adding such capability, possibly for handling
connection errors and reconnecting…

The basic idea here is to allow connections made with pgbench to use
different auth values or connect to multiple databases. We implement
this using a user-provided PGSERVICEFILE and choosing a PGSERVICE from
this based on a number of strategies. (Currently the only supported
strategies are round robin or random.)

I was thinking of allowing a list of conninfo strings with
repeated options, e.g. --conninfo "foo" --conninfo "bla"…

Your approach using PGSERVICEFILE also makes sense!

Maybe it could be simplified, the code base reduced, and provide more
benefits, by mixing both ideas.

In particular, pgbench parses the file but then it will be read also by
libpq, yuk yuk.

Also, I do not like that PGSERVICE is overridden by pgbench, while other
options are passed with the parameters approach in doConnect. It would
make more sense to add a "service" field to the parameters for
consistency, if this approach were to be pursued.

On reflection, I'd suggest using the --conninfo (or some other name)
approach, e.g. "pgbench --conninfo='service=s1' --conninfo='service=s2'", and
users just have to set the PGSERVICEFILE environment variable themselves,
which I think is better than pgbench overriding environment variables
behind their back.

This allows having a service file with more connections and just telling
pgbench which ones to use, which is the expected way to use this feature.
This drops file parsing.

I can only see benefit to this simplified approach.
What do you think?

About the patch:

There are warnings about trailing whitespaces when applying the patch, and
there are some tabbing issues in the file.

I would not consume the "-g" option unless there is some logical link with the
feature. I'd be okay with "-m" if it is still needed. I would suggest
using it for the choice strategy?

stringinfo: We already have PQExpBuffer imported, could we use that
instead? Having two sets of structs/functions which do the same thing in the
same source file does not look like a good idea. If we do not parse the file,
nothing is needed, which is a relief.

Attached is my work-in-progress start at adding conninfo, that would need
to be improved with strategies.

--
Fabien.

Attachments:

pgbench-multi-connect-conninfo-1.patch (text/x-diff, +86 -6)
#3Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#2)
Re: [PATCH] pgbench: add multiconnect option

On Thu, Jul 01, 2021 at 12:22:45PM +0200, Fabien COELHO wrote:

Good. I was thinking of adding such capability, possibly for handling
connection errors and reconnecting…

round-robin and random make sense. I am wondering how round-robin
would work with -C, though? Would you just reuse the same connection
string as the one chosen at the starting point.

I was thinking of allowing a list of conninfo strings with
repeated options, e.g. --conninfo "foo" --conninfo "bla"…

That was my first thought when reading the subject of this thread:
create a list of connection strings and pass one of them to
doConnect() to grab the properties looked for. That's a bit confusing
though as pgbench does not directly support connection strings, and we
should be careful to keep fallback_application_name intact.

Your approach using PGSERVICEFILE also makes sense!

I am not sure that's actually needed here, as it is possible to pass
down a service name within a connection string. I think that you'd
better let libpq do all the work related to a service file, if
specified. pgbench does not need to know any of that.
--
Michael

#4Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#3)
Re: [PATCH] pgbench: add multiconnect option

Bonjour Michaël,

Good. I was thinking of adding such capability, possibly for handling
connection errors and reconnecting…

round-robin and random make sense. I am wondering how round-robin
would work with -C, though? Would you just reuse the same connection
string as the one chosen at the starting point.

Well, not necessarily, but this is debatable.

I was thinking of allowing a list of conninfo strings with
repeated options, e.g. --conninfo "foo" --conninfo "bla"…

That was my first thought when reading the subject of this thread:
create a list of connection strings and pass one of them to
doConnect() to grab the properties looked for. That's a bit confusing
though as pgbench does not directly support connection strings,

They are supported because libpq silently assumes that "dbname" can be a
full connection string.

and we should be careful to keep fallback_application_name intact.

Hmmm. See attached patch, ISTM that it does the right thing.

Your approach using PGSERVICEFILE also makes sense!

I am not sure that's actually needed here, as it is possible to pass
down a service name within a connection string. I think that you'd
better let libpq do all the work related to a service file, if
specified. pgbench does not need to know any of that.

Yes, this is a drawback of this approach: part of the libpq machinery
is more or less replicated in pgbench, which is quite annoying, and less
powerful.

Attached my work-in-progress version, with a few open issues (eg probably
not thread safe), but comments about the provided feature are welcome.

I borrowed the "strategy" option, renamed policy, from the initial patch.
Pgbench just accepts several connection strings as parameters, eg:

pgbench ... "service=db1" "service=db2" "service=db3"

The next stage is to map scripts to connection types and connections
to connection types, so that pgbench could run W transactions against a
primary and R transactions against a hot standby, for instance. I have
some design for that, but nothing is implemented.

There is also the combination with the error handling patch to consider:
if a connection fails, a connection to a replica could be issued instead.

--
Fabien.

Attachments:

pgbench-multi-connect-conninfo-2.patch (text/x-diff, +151 -21)
#5David Christensen
david.christensen@crunchydata.com
In reply to: Fabien COELHO (#4)
Re: [PATCH] pgbench: add multiconnect option

Good. I was thinking of adding such capability, possibly for handling
connection errors and reconnecting…

round-robin and random make sense. I am wondering how round-robin
would work with -C, though? Would you just reuse the same connection
string as the one chosen at the starting point.

Well, not necessarily, but this is debatable.

My expectation for such a behavior would be that it would reconnect to
a random connstring each time, otherwise what's the point of using
this with -C? If we needed to forbid some option combinations that is
also an option.

I was thinking of allowing a list of conninfo strings with
repeated options, e.g. --conninfo "foo" --conninfo "bla"…

That was my first thought when reading the subject of this thread:
create a list of connection strings and pass one of them to
doConnect() to grab the properties looked for. That's a bit confusing
though as pgbench does not directly support connection strings,

They are supported because libpq silently assumes that "dbname" can be a
full connection string.

and we should be careful to keep fallback_application_name intact.

Hmmm. See attached patch, ISTM that it does the right thing.

I guess the multiple --conninfo approach is fine; I personally liked
having the list come from a file, as you could benchmark different
groups/clusters based on a file, much easier than constructing
multiple pgbench invocations depending. I can see an argument for
both approaches. The PGSERVICEFILE was an idea I'd had to store
easily indexed groups of connection information in a way that I didn't
need to know all the details, could easily parse, and could later pass
in the ENV so libpq could just pull out the information.

Your approach using PGSERVICEFILE also makes sense!

I am not sure that's actually needed here, as it is possible to pass
down a service name within a connection string. I think that you'd
better let libpq do all the work related to a service file, if
specified. pgbench does not need to know any of that.

Yes, this is a drawback of this approach: part of the libpq machinery
is more or less replicated in pgbench, which is quite annoying, and less
powerful.

There is some small fraction reproduced here just to pull out the
named sections; no other parsing should be done though.

Attached my work-in-progress version, with a few open issues (eg probably
not thread safe), but comments about the provided feature are welcome.

I borrowed the "strategy" option, renamed policy, from the initial patch.
Pgbench just accepts several connection strings as parameters, eg:

pgbench ... "service=db1" "service=db2" "service=db3"

The next stage is to map scripts to connection types and connections
to connection types, so that pgbench could run W transactions against a
primary and R transactions against a hot standby, for instance. I have
some design for that, but nothing is implemented.

There is also the combination with the error handling patch to consider:
if a connection fails, a connection to a replica could be issued instead.

I'll see if I can take a look at your latest patch. I was also
wondering about how we should handle `pgbench -i` with multiple
connection strings; currently it would only initialize with the first
DSN it gets, but it probably makes sense to run initialize against all
of the databases (or at least attempt to). Maybe this is one argument
for the multiple --conninfo handling, since you could explicitly pass
the databases you want. (Not that it is hard to just loop over
connection info and `pgbench -i` with ENV, or any other number of ways
to accomplish the same thing.)

Best,

David

#6Fabien COELHO
coelho@cri.ensmp.fr
In reply to: David Christensen (#5)
Re: [PATCH] pgbench: add multiconnect option

Hello David,

round-robin and random make sense. I am wondering how round-robin
would work with -C, though? Would you just reuse the same connection
string as the one chosen at the starting point.

Well, not necessarily, but this is debatable.

My expectation for such a behavior would be that it would reconnect to
a random connstring each time, otherwise what's the point of using
this with -C? If we needed to forbid some option combinations that is
also an option.

Yep. ISTM that it should follow the connection policy/strategy, whatever
it is.

I was thinking of allowing a list of conninfo strings with
repeated options, e.g. --conninfo "foo" --conninfo "bla"…

That was my first thought when reading the subject of this thread:
create a list of connection strings and pass one of them to
doConnect() to grab the properties looked for. That's a bit confusing
though as pgbench does not directly support connection strings,

They are supported because libpq silently assumes that "dbname" can be a
full connection string.

and we should be careful to keep fallback_application_name intact.

Hmmm. See attached patch, ISTM that it does the right thing.

I guess the multiple --conninfo approach is fine; I personally liked
having the list come from a file, as you could benchmark different
groups/clusters based on a file, much easier than constructing
multiple pgbench invocations depending. I can see an argument for
both approaches. The PGSERVICEFILE was an idea I'd had to store
easily indexed groups of connection information in a way that I didn't
need to know all the details, could easily parse, and could later pass
in the ENV so libpq could just pull out the information.

The attached version does work with the service file if the user provides
"service=whatever" on the command line. The main difference is that it
sticks to the libpq policy to use an explicit connection string or list of
connection strings.

Also, note that the patch I sent dropped the --conninfo option.
Connections are simply the last arguments to pgbench.

I'll see if I can take a look at your latest patch.

Thanks!

I was also wondering about how we should handle `pgbench -i` with
multiple connection strings; currently it would only initialize with the
first DSN it gets, but it probably makes sense to run initialize against
all of the databases (or at least attempt to).

I tend to disagree on this one. Pgbench's whole expectation is to run
against "one" system, which might be composed of several nodes because of
replication. I do not think that it is desirable to jump to "several
fully independent databases".

Maybe this is one argument for the multiple --conninfo handling, since
you could explicitly pass the databases you want. (Not that it is hard
to just loop over connection info and `pgbench -i` with ENV, or any
other number of ways to accomplish the same thing.)

Yep.

--
Fabien.

#7Bruce Momjian
bruce@momjian.us
In reply to: Fabien COELHO (#6)
Re: [PATCH] pgbench: add multiconnect option

Hi guys,

It looks like David sent a patch and Fabien sent a followup patch. But
there hasn't been a whole lot of discussion or further patches.

It sounds like there are some basic questions about what the right
interface should be. Are there specific questions that would be
helpful for moving forward?

#8Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Bruce Momjian (#7)
Re: [PATCH] pgbench: add multiconnect option

Hello Greg,

It looks like David sent a patch and Fabien sent a followup patch. But
there hasn't been a whole lot of discussion or further patches.

It sounds like there are some basic questions about what the right
interface should be. Are there specific questions that would be
helpful for moving forward?

Review the designs and patches and tell us what you think?

Personally, I think that allowing multiple connections is a good thing,
especially if the code impact is reduced, which is the case with the
version I sent.

Then for me the next step would be to have a reconnection on errors so as
to implement a client-side failover policy that could help testing the
performance impact of a server failover. I have done that internally but it
requires "Pgbench Serialization and deadlock errors" to land, as it
would just be another error that can be handled.

--
Fabien.

#9Sami Imseih
samimseih@gmail.com
In reply to: Fabien COELHO (#8)
Re: [PATCH] pgbench: add multiconnect option

The current version of the patch does not apply, so I could not test it.

Here are some comments I have.

Pgbench is a simple benchmark tool by design, and I wonder if adding
a multiconnect feature will cause pgbench to be used incorrectly.
A real world use-case will be helpful for this thread.

For the current patch, should the report also cover per-database statistics (tps/latency/etc.)?

Regards,

Sami Imseih
Amazon Web Services

#10Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Sami Imseih (#9)
Re: [PATCH] pgbench: add multiconnect option

Hi Sami,

Pgbench is a simple benchmark tool by design, and I wonder if adding
a multiconnect feature will cause pgbench to be used incorrectly.

Maybe, but I do not see how it would be worse than what pgbench already
allows.

A real world use-case will be helpful for this thread.

Basically more versatile testing for non-single-host setups.

For instance, it would allow directly testing a multi-master setup, such
as Bucardo, SymmetricDS or CockroachDB.

It would be a first step on the path to allow interesting features such
as:

- testing a failover setup: on connection error a client could connect to
another host.

- testing a primary/standby setup, with write transactions sent to the
primary and read transactions sent to the standbys.
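
The failover case in the first item could be sketched along these lines. This is only an illustration under stated assumptions, not code from any patch in this thread: `connect_with_failover`, `try_connect_fn` and `fake_try_connect` are hypothetical names, and the callback stands in for a real connection attempt (for instance a wrapper around libpq's `PQconnectdb()` and `PQstatus()`).

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical callback type: returns true if a connection using the
 * given connection string could be established.
 */
typedef bool (*try_connect_fn) (const char *conninfo);

/*
 * Try conninfos[start] first, then the following entries, wrapping around.
 * Returns the index of the first entry that connects, or -1 if none does.
 * Assumes 0 <= start < n whenever n > 0.
 */
static int
connect_with_failover(const char *const *conninfos, int n, int start,
                      try_connect_fn try_connect)
{
    for (int i = 0; i < n; i++)
    {
        int idx = (start + i) % n;

        if (try_connect(conninfos[idx]))
            return idx;
    }
    return -1;
}

/*
 * Stand-in for a real connection attempt, for illustration only:
 * pretends that only "service=db2" is reachable.
 */
static bool
fake_try_connect(const char *conninfo)
{
    return strcmp(conninfo, "service=db2") == 0;
}
```

With the list "service=db1", "service=db2", "service=db3" and the stand-in callback, a client starting at any index ends up on the second entry, and gets -1 if that entry is removed from the list.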

Basically I have no doubt that it can be useful.

For the current patch, should the report also cover per-database
statistics (tps/latency/etc.)?

That could be a "per-connection" option. If there is a reasonable use case
I think that it would be an easy enough feature to implement.
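
As a rough idea of what such a per-connection report would need, here is a minimal sketch of the bookkeeping. It is not taken from pgbench: `conn_stats`, `record_tx`, `avg_latency_ms` and `tps` are names assumed for this example, and real code would also have to merge the counters across threads.

```c
#include <stdint.h>

/* Hypothetical per-connection counters, one entry per connection string. */
typedef struct
{
    int64_t ntx;             /* transactions completed on this connection */
    double  latency_sum_us;  /* sum of transaction latencies, microseconds */
} conn_stats;

/* Account for one finished transaction on connection conn_idx. */
static void
record_tx(conn_stats *stats, int conn_idx, double latency_us)
{
    stats[conn_idx].ntx++;
    stats[conn_idx].latency_sum_us += latency_us;
}

/* Average latency in milliseconds, or 0 if nothing ran on this connection. */
static double
avg_latency_ms(const conn_stats *s)
{
    return s->ntx > 0 ? s->latency_sum_us / (double) s->ntx / 1000.0 : 0.0;
}

/* Transactions per second over the given elapsed wall-clock time. */
static double
tps(const conn_stats *s, double elapsed_sec)
{
    return elapsed_sec > 0.0 ? (double) s->ntx / elapsed_sec : 0.0;
}
```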

Attached a rebased version.

--
Fabien.

Attachments:

pgbench-multi-connect-conninfo-3.patch (text/x-diff, +151 -21)
#11David Christensen
david.christensen@crunchydata.com
In reply to: Fabien COELHO (#10)
Re: [PATCH] pgbench: add multiconnect option

On Sat, Mar 19, 2022 at 11:43 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Hi Sami,

Pgbench is a simple benchmark tool by design, and I wonder if adding
a multiconnect feature will cause pgbench to be used incorrectly.

Maybe, but I do not see how it would be worse than what pgbench already
allows.

I agree that pgbench is simple; perhaps really too simple when it comes to
being able to measure much more than basic query flows. What pgbench does
have in its favor is being distributed with the core distribution.

I think there is definitely space for a more complicated benchmarking tool
that exercises more scenarios and more realistic query patterns and
scenarios. Whether that is distributed with the core is another question.

David

#12Fabien COELHO
coelho@cri.ensmp.fr
In reply to: David Christensen (#11)
Re: [PATCH] pgbench: add multiconnect option

Pgbench is a simple benchmark tool by design, and I wonder if adding
a multiconnect feature will cause pgbench to be used incorrectly.

Maybe, but I do not see how it would be worse than what pgbench already
allows.

I agree that pgbench is simple; perhaps really too simple when it comes to
being able to measure much more than basic query flows. What pgbench does
have in its favor is being distributed with the core distribution.

I think there is definitely space for a more complicated benchmarking tool
that exercises more scenarios and more realistic query patterns and
scenarios. Whether that is distributed with the core is another question.

As far as this feature is concerned, the source code impact of the patch
is very small, so I do not think that it is worth barring this feature on
that ground.

--
Fabien.

#13Bruce Momjian
bruce@momjian.us
In reply to: Fabien COELHO (#12)
Re: [PATCH] pgbench: add multiconnect option

According to the cfbot this patch needs a rebase

#14Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Bruce Momjian (#13)
Re: [PATCH] pgbench: add multiconnect option

According to the cfbot this patch needs a rebase

Indeed. v4 attached.

--
Fabien.

Attachments:

pgbench-multi-connect-conninfo-4.patch (text/x-diff, +151 -21)
#15Ian Lawrence Barwick
barwick@gmail.com
In reply to: Fabien COELHO (#14)
Re: [PATCH] pgbench: add multiconnect option

2022年4月2日(土) 22:35 Fabien COELHO <coelho@cri.ensmp.fr>:

According to the cfbot this patch needs a rebase

Indeed. v4 attached.

Hi

cfbot reports the patch no longer applies. As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

Thanks

Ian Barwick

#16Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Ian Lawrence Barwick (#15)
Re: [PATCH] pgbench: add multiconnect option

Hello Ian,

cfbot reports the patch no longer applies. As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

Attached a v5 which is just a rebase.

--
Fabien.

Attachments:

pgbench-multi-connect-conninfo-5.patch (text/x-diff, +151 -21)
#17Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Fabien COELHO (#16)
Re: [PATCH] pgbench: add multiconnect option

This patch seems to have quite some use case overlap with my patch which
adds load balancing to libpq itself:
/messages/by-id/PR3PR83MB04768E2FF04818EEB2179949F7A69@PR3PR83MB0476.EURPRD83.prod.outlook.com

My patch is only able to add "random" load balancing though, not
"round-robin". So this patch still definitely seems useful, even when mine
gets merged.

I'm not sure that the support for the "working" connection is necessary
from a feature perspective though (usability/discoverability is another
question). It's already possible to achieve the same behaviour by simply
providing multiple host names in the connection string. You can even tell
libpq to connect to a primary or secondary by using the
target_session_attrs option.
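
To make the multi-host form concrete, here is a small sketch that assembles such a connection string. The helper name `build_multihost_conninfo` and the host names in the usage note are assumptions for illustration; the sketch also assumes host names that need no quoting, and it only builds the string without connecting to anything.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build "host=h1,h2,... target_session_attrs=<attrs>" into buf.
 * Returns 0 on success, or -1 if buf is too small.
 */
static int
build_multihost_conninfo(char *buf, size_t buflen,
                         const char *const *hosts, int nhosts,
                         const char *target_session_attrs)
{
    size_t off;
    int    w;

    w = snprintf(buf, buflen, "host=");
    if (w < 0 || (size_t) w >= buflen)
        return -1;
    off = (size_t) w;

    /* Comma-separated host list, tried by libpq in the order given. */
    for (int i = 0; i < nhosts; i++)
    {
        w = snprintf(buf + off, buflen - off, "%s%s",
                     i > 0 ? "," : "", hosts[i]);
        if (w < 0 || (size_t) w >= buflen - off)
            return -1;
        off += (size_t) w;
    }

    w = snprintf(buf + off, buflen - off, " target_session_attrs=%s",
                 target_session_attrs);
    if (w < 0 || (size_t) w >= buflen - off)
        return -1;
    return 0;
}
```

For two hypothetical hosts `pg1` and `pg2` with `read-write`, this yields `host=pg1,pg2 target_session_attrs=read-write`, which libpq would resolve to the first listed host that accepts read-write sessions.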

On Fri, 6 Jan 2023 at 11:33, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Hello Ian,

cfbot reports the patch no longer applies. As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

Attached a v5 which is just a rebase.

--
Fabien.

#18Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Jelte Fennema-Nio (#17)
Re: [PATCH] pgbench: add multiconnect option

Hello Jelte,

This patch seems to have quite some use case overlap with my patch which
adds load balancing to libpq itself:
/messages/by-id/PR3PR83MB04768E2FF04818EEB2179949F7A69@PR3PR83MB0476.EURPRD83.prod.outlook.com

Thanks for the pointer.

The end purpose of the patch is to allow pgbench to follow a failover at
some point, at the client level, AFAICR.

My patch is only able to add "random" load balancing though, not
"round-robin". So this patch still definitely seems useful, even when mine
gets merged.

Yep. I'm not sure the end purpose is the same, but possibly the pgbench
patch could take advantage of the libpq extension.

I'm not sure that the support for the "working" connection is necessary
from a feature perspective though (usability/discoverability is another
question). It's already possible to achieve the same behaviour by simply
providing multiple host names in the connection string. You can even tell
libpq to connect to a primary or secondary by using the
target_session_attrs option.

--
Fabien.

#19vignesh C
vignesh21@gmail.com
In reply to: Fabien COELHO (#16)
Re: [PATCH] pgbench: add multiconnect option

On Tue, 8 Nov 2022 at 02:16, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Hello Ian,

cfbot reports the patch no longer applies. As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

Attached a v5 which is just a rebase.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
3c6fc58209f24b959ee18f5d19ef96403d08f15c ===
=== applying patch ./pgbench-multi-connect-conninfo-5.patch
(Stripping trailing CRs from patch; use --binary to disable.)
patching file doc/src/sgml/ref/pgbench.sgml
Hunk #3 FAILED at 921.
1 out of 3 hunks FAILED -- saving rejects to file
doc/src/sgml/ref/pgbench.sgml.rej

[1]: http://cfbot.cputube.org/patch_41_3227.log

Regards,
Vignesh

#20vignesh C
vignesh21@gmail.com
In reply to: vignesh C (#19)
Re: [PATCH] pgbench: add multiconnect option

On Wed, 11 Jan 2023 at 22:17, vignesh C <vignesh21@gmail.com> wrote:

On Tue, 8 Nov 2022 at 02:16, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Hello Ian,

cfbot reports the patch no longer applies. As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

Attached a v5 which is just a rebase.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
3c6fc58209f24b959ee18f5d19ef96403d08f15c ===
=== applying patch ./pgbench-multi-connect-conninfo-5.patch
(Stripping trailing CRs from patch; use --binary to disable.)
patching file doc/src/sgml/ref/pgbench.sgml
Hunk #3 FAILED at 921.
1 out of 3 hunks FAILED -- saving rejects to file
doc/src/sgml/ref/pgbench.sgml.rej

There have been no updates on this thread for some time, so this has
been marked as Returned with Feedback. Feel free to reopen it
in the next commitfest if you plan to continue working on this.

Regards,
Vignesh