multi-master pgbench?

Started by Tatsuo Ishii over 13 years ago, 19 messages
#1Tatsuo Ishii
ishii@postgresql.org

Hi,

I am thinking about implementing a "multi-master" option for pgbench.
Suppose we have multiple PostgreSQL servers running on host1 and host2.
Something like "pgbench -c 10 -h host1,host2..." would create 5
connections each to host1 and host2 and send queries to both hosts.
The point of this functionality is to test cluster software that
has the capability to create a multi-master configuration.
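A minimal Python sketch of the connection split described above, assuming a round-robin policy. The proposal only says that 10 clients over two hosts yields 5 per host; the round-robin rule and the `distribute_clients` name are hypothetical, not taken from pgbench:

```python
from collections import Counter


def distribute_clients(num_clients: int, hosts: list[str]) -> list[str]:
    """Return the host assigned to each client, cycling through hosts in order.

    Hypothetical round-robin policy; not taken from pgbench's source.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    return [hosts[i % len(hosts)] for i in range(num_clients)]


if __name__ == "__main__":
    # Mirrors "pgbench -c 10 -h host1,host2": 5 connections per host.
    per_host = Counter(distribute_clients(10, ["host1", "host2"]))
    print(dict(per_host))  # {'host1': 5, 'host2': 5}
```

With an uneven split (say 3 clients over 2 hosts) the first host simply gets the extra connection.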

Comments?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#2Michael Paquier
michael.paquier@gmail.com
In reply to: Tatsuo Ishii (#1)
Re: multi-master pgbench?

On Tue, Aug 21, 2012 at 6:04 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:

Hi,

I am thinking about implementing a "multi-master" option for pgbench.
Suppose we have multiple PostgreSQL servers running on host1 and host2.
Something like "pgbench -c 10 -h host1,host2..." would create 5
connections each to host1 and host2 and send queries to both hosts.
The point of this functionality is to test cluster software that
has the capability to create a multi-master configuration.

Perhaps a read-only option would be of real interest for PostgreSQL, to check
simultaneous load on multiple Postgres clusters with read operations.
But I do not see any immediate use for write operations alone. Have you
thought about the possibility of defining a different set of transactions
depending on the targeted node? For example, you could target a master with
read-write transactions and slaves with read-only ones.

Btw, this could be useful not only for Postgres, but also for other
projects based on it, with which you could really do multi-master
write benchmarks.
Do you have some thoughts about the possible option specifications?
Configuration files would be too heavy for pgbench's limited purpose. So,
specify all the info in a single command? That is of course possible, but
the command would easily become unreadable, and that might cause many
mistakes.

However, here are some ideas you might use:
1) pgbench -h host1:port1,host2:port2 ...
2) pgbench -h host1,host2 -p port1:port2
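To make the two proposed syntaxes concrete, here is a hedged Python sketch of how each could be parsed into (host, port) pairs. Both function names are hypothetical, and falling back to 5432 (PostgreSQL's default port) when a port is omitted is an assumption, not part of either proposal:

```python
DEFAULT_PORT = 5432  # assumption: PostgreSQL's default port when none is given


def parse_host_port_list(spec: str) -> list[tuple[str, int]]:
    """Idea 1: "-h host1:port1,host2:port2" (a port may be omitted)."""
    pairs = []
    for item in spec.split(","):
        host, sep, port = item.partition(":")
        pairs.append((host, int(port) if sep else DEFAULT_PORT))
    return pairs


def parse_hosts_and_ports(hosts: str, ports: str) -> list[tuple[str, int]]:
    """Idea 2: "-h host1,host2 -p port1:port2" (one port per host)."""
    host_list = hosts.split(",")
    port_list = [int(p) for p in ports.split(":")]
    if len(host_list) != len(port_list):
        raise ValueError("expected exactly one port per host")
    return list(zip(host_list, port_list))


if __name__ == "__main__":
    print(parse_host_port_list("host1:5433,host2"))
    # [('host1', 5433), ('host2', 5432)]
    print(parse_hosts_and_ports("host1,host2", "5433:5434"))
    # [('host1', 5433), ('host2', 5434)]
```

One caveat the sketch ignores: a literal IPv6 address contains colons, so the first syntax would need some quoting rule (for example brackets) to stay unambiguous, while the second keeps hosts and ports in separate options.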

Regards,
--
Michael Paquier
http://michael.otacoo.com

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#1)
Re: multi-master pgbench?

Tatsuo Ishii <ishii@postgresql.org> writes:

I am thinking about implementing a "multi-master" option for pgbench.
Suppose we have multiple PostgreSQL servers running on host1 and host2.
Something like "pgbench -c 10 -h host1,host2..." would create 5
connections each to host1 and host2 and send queries to both hosts.
The point of this functionality is to test cluster software that
has the capability to create a multi-master configuration.

Why wouldn't you just fire up several copies of pgbench, one per host?

The main reason I'm dubious about this is that it's demonstrable that
pgbench itself is the bottleneck in many test scenarios. That problem
gets worse the more backends you try to have it control. You can of
course "solve" this with multiple threads in pgbench, but as soon as you
do that there's no functional benefit over just running several copies.
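Tom's alternative can be scripted with very little code. A rough Python sketch, assuming the hypothetical helpers below: it builds one pgbench command per host using only the standard `-h`, `-c` and `-T` options, leaves the database name to pgbench's defaults, and starts all copies concurrently:

```python
import subprocess


def pgbench_commands(hosts: list[str], total_clients: int,
                     duration_s: int) -> list[list[str]]:
    """One pgbench argv per host, splitting total_clients as evenly as possible."""
    per_host, extra = divmod(total_clients, len(hosts))
    cmds = []
    for i, host in enumerate(hosts):
        clients = per_host + (1 if i < extra else 0)
        if clients == 0:
            continue  # more hosts than clients: skip hosts with nothing to do
        cmds.append(["pgbench", "-h", host, "-c", str(clients),
                     "-T", str(duration_s)])
    return cmds


def run_all(cmds: list[list[str]]) -> list[int]:
    """Start every pgbench copy at once, then wait and collect exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in cmds]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    for cmd in pgbench_commands(["host1", "host2"], 10, 60):
        print(" ".join(cmd))
    # pgbench -h host1 -c 5 -T 60
    # pgbench -h host2 -c 5 -T 60
```

Note that every copy still runs on the same client machine, so this does nothing about the client-side bottleneck discussed below.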

regards, tom lane

#4Tatsuo Ishii
ishii@postgresql.org
In reply to: Michael Paquier (#2)
Re: multi-master pgbench?

I am thinking about implementing a "multi-master" option for pgbench.
Suppose we have multiple PostgreSQL servers running on host1 and host2.
Something like "pgbench -c 10 -h host1,host2..." would create 5
connections each to host1 and host2 and send queries to both hosts.
The point of this functionality is to test cluster software that
has the capability to create a multi-master configuration.

Perhaps a read-only option would be of real interest for PostgreSQL, to check
simultaneous load on multiple Postgres clusters with read operations.
But I do not see any immediate use for write operations alone. Have you
thought about the possibility of defining a different set of transactions
depending on the targeted node? For example, you could target a master with
read-write transactions and slaves with read-only ones.

I think that kind of "intelligence" is beyond the scope of pgbench. I
would prefer to leave such work to another tool.

Btw, this could be useful not only for Postgres, but also for other
projects based on it, with which you could really do multi-master
write benchmarks.

Right. If pgbench had such functionality, we could compare
those projects by using pgbench. Currently those projects use
different benchmarking tools, which means the comparison is something
like apples to oranges. With an enhanced pgbench we could do an
apples-to-apples comparison.

Do you have some thoughts about the possible option specifications?
Configuration files would be too heavy for pgbench's limited purpose. So,
specify all the info in a single command? That is of course possible, but
the command would easily become unreadable, and that might cause many
mistakes.

Agreed.

However, here are some ideas you might use:
1) pgbench -h host1:port1,host2:port2 ...
2) pgbench -h host1,host2 -p port1:port2

Looks good.
--
Tatsuo Ishii

#5David Fetter
david@fetter.org
In reply to: Tatsuo Ishii (#1)
Re: multi-master pgbench?

On Tue, Aug 21, 2012 at 06:04:42PM +0900, Tatsuo Ishii wrote:

Hi,

I am thinking about implementing a "multi-master" option for pgbench.
Suppose we have multiple PostgreSQL servers running on host1 and host2.
Something like "pgbench -c 10 -h host1,host2..." would create 5
connections each to host1 and host2 and send queries to both hosts.
The point of this functionality is to test cluster software that
has the capability to create a multi-master configuration.

Comments?

To distinguish it from simply running separate pgbench tests for each
host, would this somehow test propagation of the writes? Such a thing
would be quite useful, but it seems at first glance like a large
project.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#6Tatsuo Ishii
ishii@postgresql.org
In reply to: David Fetter (#5)
Re: multi-master pgbench?

I am thinking about implementing a "multi-master" option for pgbench.
Suppose we have multiple PostgreSQL servers running on host1 and host2.
Something like "pgbench -c 10 -h host1,host2..." would create 5
connections each to host1 and host2 and send queries to both hosts.
The point of this functionality is to test cluster software that
has the capability to create a multi-master configuration.

Comments?

To distinguish it from simply running separate pgbench tests for each
host, would this somehow test propagation of the writes? Such a thing
would be quite useful, but it seems at first glance like a large
project.

What does "propagation of the writes" mean?
--
Tatsuo Ishii

#7Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#3)
Re: multi-master pgbench?

Why wouldn't you just fire up several copies of pgbench, one per host?

Well, it's more convenient. Aside from the bottleneck discussion below, a
simple tool to generate load is important IMO. It will help developers
improving multi-master configurations to find bugs and problems, if
any. I see a similar relationship between pgbench and PostgreSQL.

The main reason I'm dubious about this is that it's demonstrable that
pgbench itself is the bottleneck in many test scenarios. That problem
gets worse the more backends you try to have it control. You can of
course "solve" this with multiple threads in pgbench, but as soon as you
do that there's no functional benefit over just running several copies.

Are you sure that running several copies of pgbench could produce more
TPS than a single pgbench? I thought that was just a limitation of the
resources of the machine pgbench is running on. So I thought that, to
avoid the pgbench bottleneck, I would have to use several pgbench client
machines simultaneously anyway.

Another point is what kind of transactions you want. "pgbench -S"
type transactions produce a trivial load and could easily reveal the
pgbench bottleneck. However, this type of transaction is an extreme one
and very different from real-world transactions. So even if
your argument is true, I guess it only applies to the "pgbench -S" case.
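One way to settle the "several copies vs. one copy" question empirically is to sum the throughput reported by each concurrent copy and compare it with a single run. A hedged Python sketch: it assumes each run's output contains a summary line starting with `tps = `, and takes the first such line per run (older pgbench versions print two, including and excluding connection time). The `total_tps` helper is hypothetical, and summing is only meaningful if the copies ran over the same interval:

```python
import re

# Assumption: each run prints a summary line beginning "tps = <number>".
# Older pgbench prints two such lines (including/excluding connection
# time); this sketch simply takes the first one in each output.
TPS_LINE = re.compile(r"^tps = ([0-9]+(?:\.[0-9]+)?)", re.MULTILINE)


def total_tps(outputs: list[str]) -> float:
    """Sum the first reported tps figure from each pgbench run's output."""
    total = 0.0
    for out in outputs:
        match = TPS_LINE.search(out)
        if match is None:
            raise ValueError("no 'tps =' line found in pgbench output")
        total += float(match.group(1))
    return total


if __name__ == "__main__":
    run_a = "tps = 100.5 (including connections establishing)\n"
    run_b = "tps = 200.25 (including connections establishing)\n"
    print(total_tps([run_a, run_b]))  # 300.75
```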
--
Tatsuo Ishii

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#7)
Re: multi-master pgbench?

Tatsuo Ishii <ishii@postgresql.org> writes:

Why wouldn't you just fire up several copies of pgbench, one per host?

Well, it's more convenient. Aside from the bottleneck discussion below, a
simple tool to generate load is important IMO.

Well, my concern here is that it's *not* going to be simple. By the
time we get done adding enough switches to control connection to N
different hosts (possibly with different usernames, passwords, etc),
then adding frammishes to control which scripts get sent to which hosts,
and so on, I don't think it's really going to be simpler to use than
launching N copies of pgbench.

It might be worth doing if we had features that allowed the different
test scripts to interact, so that they could do things like check
replication propagation from one host to another. But pgbench hasn't
got that, and in multi-job mode really can't have that (at least not
in the Unix separate-processes implementation). Anyway that's a whole
nother level of complexity that would have to be added on before you
got to a useful feature.

regards, tom lane

#9David Fetter
david@fetter.org
In reply to: Tatsuo Ishii (#6)
Re: multi-master pgbench?

On Wed, Aug 22, 2012 at 06:26:00AM +0900, Tatsuo Ishii wrote:

I am thinking about implementing a "multi-master" option for pgbench.
Suppose we have multiple PostgreSQL servers running on host1 and host2.
Something like "pgbench -c 10 -h host1,host2..." would create 5
connections each to host1 and host2 and send queries to both hosts.
The point of this functionality is to test cluster software that
has the capability to create a multi-master configuration.

Comments?

To distinguish it from simply running separate pgbench tests for each
host, would this somehow test propagation of the writes? Such a thing
would be quite useful, but it seems at first glance like a large
project.

What does "propagation of the writes" mean?

I apologize for not being clear. In a multi-master system, people
frequently wish to know how quickly a write operation has been
duplicated to the other nodes. In some sense, those write operations
are incomplete until they have happened on all nodes, even in the
asynchronous case.
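The measurement David describes can be sketched as "write a marker on one node, poll another node until it shows up". A minimal Python sketch under stated assumptions: `write` and `visible_on_peer` are hypothetical callables (in practice an INSERT on one node and a SELECT on another), and an in-memory pair with a fixed delay stands in for real asynchronous replication:

```python
import threading
import time
from typing import Callable


def measure_propagation(write: Callable[[str], None],
                        visible_on_peer: Callable[[str], bool],
                        marker: str,
                        timeout_s: float = 10.0,
                        poll_s: float = 0.005) -> float:
    """Write `marker` via one node, poll another until it appears; return seconds."""
    start = time.monotonic()
    write(marker)
    while time.monotonic() - start < timeout_s:
        if visible_on_peer(marker):
            return time.monotonic() - start
        time.sleep(poll_s)
    raise TimeoutError(f"{marker!r} not visible on peer within {timeout_s}s")


class FakeAsyncPair:
    """Stand-in for two nodes: a write reaches the peer after `delay_s` seconds."""

    def __init__(self, delay_s: float) -> None:
        self.delay_s = delay_s
        self._peer_rows: set[str] = set()
        self._lock = threading.Lock()

    def write(self, row: str) -> None:
        def apply() -> None:
            with self._lock:
                self._peer_rows.add(row)
        threading.Timer(self.delay_s, apply).start()  # simulated async replication

    def visible_on_peer(self, row: str) -> bool:
        with self._lock:
            return row in self._peer_rows


if __name__ == "__main__":
    pair = FakeAsyncPair(delay_s=0.05)
    lag = measure_propagation(pair.write, pair.visible_on_peer, "marker-1")
    print(f"propagation lag: {lag:.3f}s")
```

The measured lag is only as precise as the polling interval, and in a real test the polling query itself adds load to the peer node.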

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/

#10Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#8)
Re: multi-master pgbench?

Well, my concern here is that it's *not* going to be simple. By the
time we get done adding enough switches to control connection to N
different hosts (possibly with different usernames, passwords, etc),
then adding frammishes to control which scripts get sent to which hosts,
and so on, I don't think it's really going to be simpler to use than
launching N copies of pgbench.

It might be worth doing if we had features that allowed the different
test scripts to interact, so that they could do things like check
replication propagation from one host to another. But pgbench hasn't
got that, and in multi-job mode really can't have that (at least not
in the Unix separate-processes implementation). Anyway that's a whole
nother level of complexity that would have to be added on before you
got to a useful feature.

I do not intend to implement such a feature. As I wrote in the
subject line, I intended to enhance pgbench for "multi-master"
configurations. IMO, any node in a multi-master configuration should
accept *any* query, not only read queries but also write queries. So a
bare PostgreSQL streaming replication configuration cannot be a
multi-master configuration and will not be a target of the new
pgbench.
--
Tatsuo Ishii

#11Tatsuo Ishii
ishii@postgresql.org
In reply to: David Fetter (#9)
Re: multi-master pgbench?

What does "propagation of the writes" mean?

I apologize for not being clear. In a multi-master system, people
frequently wish to know how quickly a write operation has been
duplicated to the other nodes. In some sense, those write operations
are incomplete until they have happened on all nodes, even in the
asynchronous case.

IMO, that kind of functionality is beyond the scope of benchmark tools.
--
Tatsuo Ishii

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#10)
Re: multi-master pgbench?

Tatsuo Ishii <ishii@postgresql.org> writes:

Well, my concern here is that it's *not* going to be simple. By the
time we get done adding enough switches to control connection to N
different hosts (possibly with different usernames, passwords, etc),
then adding frammishes to control which scripts get sent to which hosts,
and so on, I don't think it's really going to be simpler to use than
launching N copies of pgbench.

I do not intend to implement such a feature. As I wrote in the
subject line, I intended to enhance pgbench for "multi-master"
configurations. IMO, any node in a multi-master configuration should
accept *any* query, not only read queries but also write queries. So a
bare PostgreSQL streaming replication configuration cannot be a
multi-master configuration and will not be a target of the new
pgbench.

Well, you're being shortsighted then, because such a feature will barely
have hit the git repository before somebody wants to use it differently.
I can easily imagine wanting to stress a master plus some hot-standby
slaves, for instance; and that would absolutely require being able to
direct different subsets of the test scripts to different hosts.
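A hedged Python sketch of the master-plus-hot-standby case, assuming one pgbench copy per host rather than a new pgbench option: the master runs the default TPC-B-like script, and each standby runs the built-in select-only script (`-S`) with `-n`, which skips pgbench's pre-run vacuuming (VACUUM cannot run on a read-only standby). The `role_based_commands` helper is hypothetical:

```python
def role_based_commands(master: str, standbys: list[str],
                        clients_per_host: int,
                        duration_s: int) -> list[list[str]]:
    """One pgbench argv per host: read-write on the master, read-only on standbys."""
    common = ["-c", str(clients_per_host), "-T", str(duration_s)]
    # Master: pgbench's default (TPC-B-like) read-write script.
    cmds = [["pgbench", "-h", master, *common]]
    for host in standbys:
        # -S: built-in SELECT-only script; -n: skip the pre-run vacuuming,
        # which a read-only hot standby cannot execute.
        cmds.append(["pgbench", "-h", host, "-S", "-n", *common])
    return cmds


if __name__ == "__main__":
    for cmd in role_based_commands("master", ["standby1", "standby2"], 4, 30):
        print(" ".join(cmd))
```

Because the built-in select-only script only reads rows created by `pgbench -i`, it does not depend on recently replicated data; custom scripts passed with `-f` that read freshly written rows would.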

regards, tom lane

#13Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#12)
Re: multi-master pgbench?

I do not intend to implement such a feature. As I wrote in the
subject line, I intended to enhance pgbench for "multi-master"
configurations. IMO, any node in a multi-master configuration should
accept *any* query, not only read queries but also write queries. So a
bare PostgreSQL streaming replication configuration cannot be a
multi-master configuration and will not be a target of the new
pgbench.

Well, you're being shortsighted then, because such a feature will barely
have hit the git repository before somebody wants to use it differently.
I can easily imagine wanting to stress a master plus some hot-standby
slaves, for instance; and that would absolutely require being able to
direct different subsets of the test scripts to different hosts.

I don't see any practical way to implement such a tool, because there is
always a chance of trying to retrieve data that does not yet exist on a
hot standby because of replication delay.
--
Tatsuo Ishii

#14Greg Sabino Mullane
greg@turnstep.com
In reply to: Tatsuo Ishii (#6)
Re: multi-master pgbench?

The point of this functionality is to test cluster
software that has the capability to create a multi-master
configuration.

As the maintainer of software that does multi-master, I'm a little
confused as to why we would extend pgbench to do this. The software
in question should be doing the testing itself, ideally via
its test suite (i.e. "make test"). Having pgbench do any of this
would be at best a very poor subset of the tests the software
should be performing. I suppose if the software *uses* pgbench for
its tests already, one could argue for a limited test case, but it seems
difficult to design pgbench options generic and powerful enough
to handle cases outside of the one piece of software this change is aimed at.

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201208212330
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

#15Tatsuo Ishii
ishii@postgresql.org
In reply to: Greg Sabino Mullane (#14)
Re: multi-master pgbench?

As the maintainer of software that does multi-master, I'm a little
confused as to why we would extend pgbench to do this. The software
in question should be doing the testing itself, ideally via
its test suite (i.e. "make test"). Having pgbench do any of this
would be at best a very poor subset of the tests the software
should be performing. I suppose if the software *uses* pgbench for
its tests already, one could argue for a limited test case, but it seems
difficult to design pgbench options generic and powerful enough
to handle cases outside of the one piece of software this change is aimed at.

Well, my point was in upthread:

Right. If pgbench had such functionality, we could compare
those projects by using pgbench. Currently those projects use
different benchmarking tools, which means the comparison is something
like apples to oranges. With an enhanced pgbench we could do an
apples-to-apples comparison.

--
Tatsuo Ishii

#16David Fetter
david@fetter.org
In reply to: Tatsuo Ishii (#11)
Re: multi-master pgbench?

On Wed, Aug 22, 2012 at 10:13:43AM +0900, Tatsuo Ishii wrote:

What does "propagation of the writes" mean?

I apologize for not being clear. In a multi-master system, people
frequently wish to know how quickly a write operation has been
duplicated to the other nodes. In some sense, those write
operations are incomplete until they have happened on all nodes,
even in the asynchronous case.

IMO, that kind of functionality is beyond the scope of benchmark
tools.

I was trying to come up with something that would distinguish pgbench
for multi-master from pgbench run on independent nodes. Is there some
other distinction to draw?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/

#17Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Tatsuo Ishii (#1)
Re: multi-master pgbench?

Tatsuo Ishii <ishii@postgresql.org> writes:

I am thinking about implementing a "multi-master" option for pgbench.

Please consider using Tsung, which solves that problem and many others.

http://tsung.erlang-projects.org/

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#18Tatsuo Ishii
ishii@postgresql.org
In reply to: Dimitri Fontaine (#17)
Re: multi-master pgbench?

Tatsuo Ishii <ishii@postgresql.org> writes:

I am thinking about implementing a "multi-master" option for pgbench.

Please consider using Tsung, which solves that problem and many others.

http://tsung.erlang-projects.org/

Thank you for introducing Tsung. I have some questions about it.
Does it support the extended query protocol? Does it support the V3 protocol?
--
Tatsuo Ishii

#19Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Tatsuo Ishii (#18)
Re: multi-master pgbench?

Tatsuo Ishii <ishii@postgresql.org> writes:

Does it support the extended query protocol? Does it support the V3 protocol?

Yes.

It also has a proxy mode in which it captures the queries sent by the
client, along with think times, and outputs them in the session format it
reads from its setup, which is very useful. Using that capability, you
can easily have Tsung replay your existing pgbench script.

Regards,
--
Dimitri Fontaine