Doubt in pgbench TPS number
Today I got an interesting output from pgbench.
The scenario is executing SELECT pg_sleep(0.1).
$ cat test.sql
select pg_sleep(0.1);
$ pgbench -C -n -p 11002 -c 10 -T 30 -f test.sql test
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 30 s
number of transactions actually processed: 2883
latency average: 104.058 ms
tps = 95.799114 (including connections establishing)
tps = 124.487149 (excluding connections establishing)
The interesting thing is the number from "excluding connections
establishing": 124.487149 tps means 0.008032 seconds per
transaction. Since the query executes pg_sleep(0.1), I think the
number should be equal to or greater than 0.1 second. Maybe there is some
tolerance, but a 20% error seems too high to me.
Note that if "-C" is not present, the TPS number seems to be sane.
$ pgbench -n -p 11002 -c 10 -T 30 -f test.sql test
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 30 s
number of transactions actually processed: 2970
latency average: 101.010 ms
tps = 98.692514 (including connections establishing)
tps = 98.747053 (excluding connections establishing)
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello Tatsuo-san,
$ pgbench -C -n -p 11002 -c 10 -T 30 -f test.sql test
tps = 95.799114 (including connections establishing)
tps = 124.487149 (excluding connections establishing)
Interesting thing is, the number from "excluding connections
establishing". 124.487149 tps means 0.008032 seconds per
transaction. Since the query executes pg_sleep(0.1), I think the
number should be equal to or greater than 0.1. Maybe a tolerance, but
20% of error seems to be too high for me.
Indeed.
Note that if "-C" is not present, the TPS number seems to be sane.
Hmmm... I never use -C. The formula seems ok:
tps_exclude = normal_xacts / (time_include -
(INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads));
conn_total_time is the cumulated time spent by all threads.
A quick look at the logic reveals some minor issues (conn_time is passed
as an argument to doCustom even though it is already available as a
"thread" field, strange). I also spotted "threads" used instead of
"thread" in an accumulation, but that is not related to this computation.
Hmmm. I'll have a slower look...
--
Fabien.
Hello Tatsuo,
Hmmm... I never use -C. The formula seems ok:
tps_exclude = normal_xacts / (time_include -
(INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads));
Hmmm... it is not:-)
I think that the degree of parallelism to consider is nclients, not
nthreads: while connection time is accumulated in conn_time, other clients
are possibly doing their transactions, in parallel, even if it is in the
same thread, so it is not "stopped time" for all clients. It starts to
matter with "-j 1 -c 30" and slow transactions: the cumulated conn_time in
each thread may be arbitrarily close to the whole time if there are many
clients.
Now, I do not think that this tps computation makes that much sense. If
you want to know the tps without reconnect, run without reconnecting... It
is clear that I do not get this figure when running without -C, so maybe
the tps with/without reconnection should be dropped?
Anyway, here is a patch, which:
- fixes the "exclude" computation (s/nthreads/nclients/)
- fixes the count total for skipped (s/threads/thread/)
- removes a useless parameter to doCustom
(the conn_time is available through the thread parameter).
--
Fabien.
Attachment: pgbench-connect-1.patch (text/x-diff)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 6ae1b86..9a7dd6a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -1126,7 +1126,7 @@ agg_vals_init(AggVals *aggs, instr_time start)
/* return false iff client should be disconnected */
static bool
-doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVals *agg)
+doCustom(TState *thread, CState *st, FILE *logfile, AggVals *agg)
{
PGresult *res;
Command **commands;
@@ -1357,7 +1357,7 @@ top:
return clientDone(st, false);
}
INSTR_TIME_SET_CURRENT(end);
- INSTR_TIME_ACCUM_DIFF(*conn_time, end, start);
+ INSTR_TIME_ACCUM_DIFF(thread->conn_time, end, start);
}
/*
@@ -2616,7 +2616,7 @@ printResults(int ttype, int64 normal_xacts, int nclients,
time_include = INSTR_TIME_GET_DOUBLE(total_time);
tps_include = normal_xacts / time_include;
tps_exclude = normal_xacts / (time_include -
- (INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads));
+ (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));
if (ttype == 0)
s = "TPC-B (sort of)";
@@ -3462,7 +3462,7 @@ main(int argc, char **argv)
/* thread level stats */
throttle_lag += thread->throttle_lag;
- throttle_latency_skipped += threads->throttle_latency_skipped;
+ throttle_latency_skipped += thread->throttle_latency_skipped;
latency_late += thread->latency_late;
if (throttle_lag_max > thread->throttle_lag_max)
throttle_lag_max = thread->throttle_lag_max;
@@ -3578,7 +3578,7 @@ threadRun(void *arg)
int prev_ecnt = st->ecnt;
st->use_file = getrand(thread, 0, num_files - 1);
- if (!doCustom(thread, st, &thread->conn_time, logfile, &aggs))
+ if (!doCustom(thread, st, logfile, &aggs))
remains--; /* I've aborted */
if (st->ecnt > prev_ecnt && commands[st->state]->type == META_COMMAND)
@@ -3717,7 +3717,7 @@ threadRun(void *arg)
if (st->con && (FD_ISSET(PQsocket(st->con), &input_mask)
|| commands[st->state]->type == META_COMMAND))
{
- if (!doCustom(thread, st, &thread->conn_time, logfile, &aggs))
+ if (!doCustom(thread, st, logfile, &aggs))
remains--; /* I've aborted */
}
I think that the degree of parallelism to consider is nclients, not
nthreads: while connection time is accumulated in conn_time, other
clients are possibly doing their transactions, in parallel,
I'm not sure about this. I think pgbench will not start transactions
until all clients establish connections to PostgreSQL.
I found this while playing with pgpool-II. pgpool-II pre-forks child
processes, whose number is defined by the "num_init_children"
directive. What I observed was that pgbench keeps connecting to pgpool-II
until "-c" connections are established. So, if "-c" is larger than
"num_init_children", no transaction starts.
even if it
is in the same thread, so it is not "stopped time" for all clients. It
starts to matter with "-j 1 -c 30" and slow transactions: the
cumulated conn_time in each thread may be arbitrarily close to the whole
time if there are many clients.
Now, I do not think that this tps computation makes that much
sense. If you want to know the tps without reconnect, run without
reconnecting... It is clear that I do not get this figure when running
without -C, so maybe
the tps with/without reconnection should be dropped?
Anyway, here is a patch, which:
- fixes the "exclude" computation (s/nthreads/nclients/)
- fixes the count total for skipped (s/threads/thread/)
- removes a useless parameter to doCustom
(the conn_time is available through the thread parameter).
I have tested your patch. It seems the error is much smaller with
your patch than before.
With the patch:
./pgbench -C -n -p 11002 -c 10 -T 30 -f test.sql test
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 30 s
number of transactions actually processed: 2887
latency average: 103.914 ms
tps = 95.896850 (including connections establishing)
tps = 98.101662 (excluding connections establishing)
Without the patch:
./pgbench -C -n -p 11002 -c 10 -T 30 -f test.sql test
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 30 s
number of transactions actually processed: 2887
latency average: 103.914 ms
tps = 95.919415 (including connections establishing)
tps = 124.732475 (excluding connections establishing)
I'm going to commit your patch if there's no objection.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
I have tested your patch. It seems the error is much smaller with
your patch than before.
[snip]
I'm going to commit your patch if there's no objection.
I think we should commit this to master and 9.5 stable branch only
because the fix significantly changes the benchmark result of pgbench.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Hello Tatsuo-san,
I think that the degree of parallelism to consider is nclients, not
nthreads: while connection time is accumulated in conn_time, other
clients are possibly doing their transactions, in parallel,
I'm not sure about this. I think pgbench will not start transactions
until all clients establish connections to PostgreSQL.
I think that is true if "!is_connect": all client connections are
performed at the beginning of threadRun. But under -C each client has its
own connect/disconnect integrated within doCustom, so it is done in
parallel with other clients having their transactions processed, hence the
issue with the formula.
I found this while playing with pgpool-II. pgpool-II pre-forks child
processes, whose number is defined by the "num_init_children"
directive. What I observed was that pgbench keeps connecting to pgpool-II
until "-c" connections are established. So, if "-c" is larger than
"num_init_children", no transaction starts.
Yep, unless -C, I think, where some client may start nevertheless? Not
sure... Does not matter.
I have tested your patch. It seems the error is much smaller with
your patch than before.
Yep.
I'm going to commit your patch if there's no objection.
This seems fine with me.
The formula change, and just this one, should probably be backported
somehow, as this is a bug, wherever pgbench resides in older versions. It
is just 's/nthreads/nclients/' in the printResults formula for computing
tps_exclude.
--
Fabien.
I'm not sure about this. I think pgbench will not start transactions
until all clients establish connections to PostgreSQL.
I think that is true if "!is_connect": all client connections are
performed at the beginning of threadRun. But under -C each client has
its own connect/disconnect integrated within doCustom, so it is done in
parallel with other clients having their transactions processed, hence
the issue with the formula.
Really?
I have tested with pgpool-II set to accept up to 2 connections, then ran
pgbench with -C and -c 32. pgbench was blocked as expected, and I
attached gdb and got this stack trace:
(gdb) bt
#0 0x00007f48d5f17110 in __poll_nocancel ()
at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f48d6724056 in pqSocketCheck ()
from /usr/local/src/pgsql/current/lib/libpq.so.5
#2 0x00007f48d6724940 in pqWaitTimed ()
from /usr/local/src/pgsql/current/lib/libpq.so.5
#3 0x00007f48d671f3e2 in connectDBComplete ()
from /usr/local/src/pgsql/current/lib/libpq.so.5
#4 0x00007f48d671fbcf in PQconnectdbParams ()
from /usr/local/src/pgsql/current/lib/libpq.so.5
#5 0x0000000000402b2b in doConnect () at pgbench.c:650
#6 0x0000000000404591 in doCustom (thread=0x25c2f40, st=0x25c2770,
conn_time=0x25c2f90, logfile=0x0, agg=0x7fffba224260) at pgbench.c:1353
#7 0x000000000040a1d5 in threadRun (arg=0x25c2f40) at pgbench.c:3581
#8 0x0000000000409ab4 in main (argc=12, argv=0x7fffba224668) at pgbench.c:3455
As you can see, one of threads wants to connect to PostgreSQL
(actually pgpool-II) and waits for reply.
In threadRun() around line 3581:
/* send start up queries in async manner */
for (i = 0; i < nstate; i++)
{
CState *st = &state[i];
Command **commands = sql_files[st->use_file];
int prev_ecnt = st->ecnt;
st->use_file = getrand(thread, 0, num_files - 1);
if (!doCustom(thread, st, &thread->conn_time, logfile, &aggs))
remains--; /* I've aborted */
if (st->ecnt > prev_ecnt && commands[st->state]->type == META_COMMAND)
{
fprintf(stderr, "client %d aborted in state %d; execution of meta-command failed\n",
i, st->state);
remains--; /* I've aborted */
PQfinish(st->con);
st->con = NULL;
}
}
Here doCustom() is called with st->con == NULL. In doCustom(), around
line 1353:
if (st->con == NULL)
{
instr_time start,
end;
INSTR_TIME_SET_CURRENT(start);
if ((st->con = doConnect()) == NULL)
{
doConnect() blocks until PostgreSQL (pgpool-II) allows the connection
to be established.
Because the outer loop in threadRun() iterates until all clients
successfully connect to PostgreSQL, pgbench is blocked here:
/* send start up queries in async manner */
for (i = 0; i < nstate; i++)
I'm going to commit your patch if there's no objection.
This seems fine with me.
The formula change, and just this one, should probably be backported
somehow, as this is a bug, wherever pgbench resides in older
versions. It is just 's/nthreads/nclients/' in the printResults formula
for computing tps_exclude.
Yeah, that's definitely a bug, but I'm afraid the fix will change the
TPS numbers and may break backward compatibility. Since we have
lived with this bug for years, I hesitate to back-port it to older
stable branches...
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
#5 0x0000000000402b2b in doConnect () at pgbench.c:650
#6 0x0000000000404591 in doCustom (thread=0x25c2f40, st=0x25c2770,
conn_time=0x25c2f90, logfile=0x0, agg=0x7fffba224260) at pgbench.c:1353
#7 0x000000000040a1d5 in threadRun (arg=0x25c2f40) at pgbench.c:3581
#8 0x0000000000409ab4 in main (argc=12, argv=0x7fffba224668) at pgbench.c:3455
Hmm. OK, whatever the connection position (from doCustom or from
threadRun), doConnect would block the thread. On the other hand, if you
start several threads, probably those threads which could connect all
their clients would proceed.
What looks to be needed is some timeout when connecting to the
server.
The formula change, and just this one, should probably be backported
somehow, as this is a bug, wherever pgbench resides in older
versions. It is just 's/nthreads/nclients/' in the printResults formula
for computing tps_exclude.
Yeah, that's definitely a bug, but I'm afraid the fix will change the
TPS numbers and may break backward compatibility. Since we have
lived with this bug for years, I hesitate to back-port it to older
stable branches...
My 2¢: I do not see a good argument for keeping wrong tps numbers once it
is known that they are plainly wrong, especially as it is not a behavioral
change as such which could block applications or whatever, just a
different number printed at the end of a run. So I would not bother much
with backward-compatibility considerations in this case.
--
Fabien.
Yeah, that's definitely a bug, but I'm afraid the fix will change the
TPS numbers and may break backward compatibility. Since we have
lived with this bug for years, I hesitate to back-port it to older
stable branches...
My 2¢: I do not see a good argument for keeping wrong tps numbers
once it is known that they are plainly wrong, especially as it is not a
behavioral change as such which could block applications or whatever,
just a different number printed at the end of a run. So I would not
bother much with backward-compatibility considerations in this case.
--
Fabien.
FYI, the bug has been there since thread support was introduced in commit:
da0dfb4b1460c3701abc8ed5f516d138dc4654c
That was in 9.0, which was released in 2010.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Fabien COELHO <coelho@cri.ensmp.fr> writes:
Yeah, that's definitely a bug, but I'm afraid the fix will change the
TPS numbers and may break backward compatibility. Since we have
lived with this bug for years, I hesitate to back-port it to older
stable branches...
My 2¢: I do not see a good argument for keeping wrong tps numbers once it
is known that they are plainly wrong, especially as it is not a behavioral
change as such which could block applications or whatever, just a
different number printed at the end of a run. So I would not bother much
with backward-compatibility considerations in this case.
FWIW, I vote with Tatsuo-san. Such a change will break comparability of
results with all previous versions, which means it's not something to do
in minor releases, even if we now believe the previous results were
somewhat bogus. Arguing that it's "not a behavioral change" seems
quite loony to me: for most people, the TPS numbers are the only
interesting output of pgbench.
I think we should fix it in 9.5 and document it as an incompatible change.
(Note: I've not read the patch, so this is not an opinion about its
correctness.)
regards, tom lane
Hello Tom,
FWIW, I vote with Tatsuo-san. Such a change will break comparability of
results
I would not classify a performance measure as a "result compatibility"
issue. What matters is the *correctness* of the results.
When a bug is fixed anywhere in pg it may change performance significantly
for some tests, and I have never seen that as a relevant consideration not
to fix a problem...
with all previous versions, which means it's not something to do
in minor releases, even if we now believe the previous results were
somewhat bogus. Arguing that it's "not a behavioral change" seems
quite loony to me: for most people, the TPS numbers are the only
interesting output of pgbench.
I think that people interested in the tps without reconnecting on each
transaction would not run with "-C" to trigger reconnections and then
look at the "tps excluding connections" figure for what they want... so I
do not think that keeping this error is worth anything.
On the other hand, and on principle, keeping the bug looks wrong. I
cannot agree with the logic behind shipping something which is known to be
bad, especially when it displays *optimistic* performance figures... It
looks too much like the VW way:-)
I think we should fix it in 9.5 and document it as an incompatible
change.
Hmm.
(Note: I've not read the patch, so this is not an opinion about its
correctness.)
That is another question:-)
--
Fabien.
FWIW, I vote with Tatsuo-san. Such a change will break comparability of
results with all previous versions, which means it's not something to do
in minor releases, even if we now believe the previous results were
somewhat bogus. Arguing that it's "not a behavioral change" seems
quite loony to me: for most people, the TPS numbers are the only
interesting output of pgbench.
I think we should fix it in 9.5 and document it as an incompatible change.
Plus, if a user of PostgreSQL 9.4 or earlier wants the fixed version
of pgbench, he/she can always use the pgbench shipped with 9.5 or a
later version of PostgreSQL.
(Note: I've not read the patch, so this is not an opinion about its
correctness.)
As Fabien analyzed, the problem was that pgbench simply used the wrong
variable: nthreads (number of threads, specified by the "-j" option)
instead of nclients (number of clients, specified by the "-c" option).
@@ -2616,7 +2616,7 @@ printResults(int ttype, int64 normal_xacts, int nclients,
time_include = INSTR_TIME_GET_DOUBLE(total_time);
tps_include = normal_xacts / time_include;
tps_exclude = normal_xacts / (time_include -
- (INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads));
+ (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));
Here conn_total_time is the total time spent establishing connections to
PostgreSQL. Since establishing connections to PostgreSQL is done serially
rather than in parallel, conn_total_time should have been divided by
nclients.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
(Note: I've not read the patch, so this is not an opinion about its
correctness.)
As Fabien analyzed, the problem was that pgbench simply used the wrong
variable: nthreads (number of threads, specified by the "-j" option)
instead of nclients (number of clients, specified by the "-c" option).
@@ -2616,7 +2616,7 @@ printResults(int ttype, int64 normal_xacts, int nclients,
time_include = INSTR_TIME_GET_DOUBLE(total_time);
tps_include = normal_xacts / time_include;
tps_exclude = normal_xacts / (time_include -
- (INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads));
+ (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));
Here conn_total_time is the total time spent establishing connections to
PostgreSQL. Since establishing connections to PostgreSQL is done serially
rather than in parallel, conn_total_time should have been divided by
nclients.
After some more thinking and looking again at the connection code, I
slightly revise my diagnosis:
- the amount of parallelism is "nclients", as discussed above, when
reconnecting on each transaction (-C), because the connections are managed
in parallel from doCustom,
*BUT*
- if there is no reconnection (no -C), the connections are performed in
threadRun in a sequential way: all clients wait for the connections of
other clients in the same thread before starting to process transactions,
so "nthreads" is the right amount of parallelism in this case.
So on second thought the formula should rather be:
... / (is_connect ? nclients : nthreads)
Here is a v2 with this formula. Note that it does not take into account
thread creation time, which might be significant; the start and end of a
pgbench run are quite fuzzy.
I've removed the doCustom parameter change as it is included in a larger
patch I submitted about reworking stat collections in pgbench, so this
attached patch is bug fix only.
For the record, I still plainly disagree with the idea of shipping a
performance measuring tool which knowingly displays wrong and optimistic
figures under some conditions.
--
Fabien.
Attachment: pgbench-connect-2.patch (text/x-diff)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 6ae1b86..a6763f6 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -2616,7 +2616,8 @@ printResults(int ttype, int64 normal_xacts, int nclients,
time_include = INSTR_TIME_GET_DOUBLE(total_time);
tps_include = normal_xacts / time_include;
tps_exclude = normal_xacts / (time_include -
- (INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads));
+ (INSTR_TIME_GET_DOUBLE(conn_total_time) /
+ (is_connect? nclients: nthreads)));
if (ttype == 0)
s = "TPC-B (sort of)";
@@ -3462,7 +3463,7 @@ main(int argc, char **argv)
/* thread level stats */
throttle_lag += thread->throttle_lag;
- throttle_latency_skipped += threads->throttle_latency_skipped;
+ throttle_latency_skipped += thread->throttle_latency_skipped;
latency_late += thread->latency_late;
if (throttle_lag_max > thread->throttle_lag_max)
throttle_lag_max = thread->throttle_lag_max;
Here conn_total_time is the total time spent establishing connections to
PostgreSQL. Since establishing connections to PostgreSQL is done serially
rather than in parallel, conn_total_time should have been divided by
nclients.
After some more thinking and looking again at the connection code, I
slightly revise my diagnosis:
- the amount of parallelism is "nclients", as discussed above, when
reconnecting on each transaction (-C), because the connections are managed
in parallel from doCustom,
*BUT*
- if there is no reconnection (no -C), the connections are performed in
threadRun in a sequential way: all clients wait for the connections of
other clients in the same thread before starting to process transactions,
so "nthreads" is the right amount of parallelism in this case.
So on second thought the formula should rather be:
... / (is_connect ? nclients : nthreads)
I don't think this is quite correct.
If is_connect is false, then following loop is executed in threadRun():
/* make connections to the database */
for (i = 0; i < nstate; i++)
{
if ((state[i].con = doConnect()) == NULL)
goto done;
}
Here, nstate is nclients/nthreads. Suppose nclients = 16 and nthreads
= 2; then 2 threads run in parallel, and each thread connects 8
times (nstate = 8) in *serial*. The total connection time for a thread
is calculated as "the time the loop ends" - "the time the loop
starts". So if the time to establish a connection is 1 second, the
total connection time for a thread will be 8 seconds. Thus the grand
total of connection time will be 2 * 8 = 16 seconds.
If is_connect is true, following loop is executed.
/* send start up queries in async manner */
for (i = 0; i < nstate; i++)
{
CState *st = &state[i];
Command **commands = sql_files[st->use_file];
int prev_ecnt = st->ecnt;
st->use_file = getrand(thread, 0, num_files - 1);
if (!doCustom(thread, st, &thread->conn_time, logfile, &aggs))
In the loop, exactly the same thing happens as in the is_connect = false
case. If t = 1, the total connection time will be the same as in the
is_connect = false case, i.e. 16 seconds.
In summary, I see no reason to change the v1 patch.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
FWIW, I vote with Tatsuo-san. Such a change will break comparability of
results with all previous versions, which means it's not something to do
in minor releases, even if we now believe the previous results were
somewhat bogus. Arguing that it's "not a behavioral change" seems
quite loony to me: for most people, the TPS numbers are the only
interesting output of pgbench.I think we should fix it in 9.5 and document it as an incompatible change.
Plus if 9.4 or earlier version of PostgreSQL user wants to use fixed
version of pgbench, he/she can always use pgbench coming with 9.5 or
later version of PostgreSQL.(Note: I've not read the patch, so this is not an opinion about its
correctness.)As Fabien anayzed the problem was that pgbench simply uses wrong
variable: nthreads (number of threads, specified by "-j" option)
vs. nclients (number of clients, specified by "-c" option).@@ -2616,7 +2616,7 @@ printResults(int ttype, int64 normal_xacts, int nclients, time_include = INSTR_TIME_GET_DOUBLE(total_time); tps_include = normal_xacts / time_include; tps_exclude = normal_xacts / (time_include - - (INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads)); + (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));Here conn_total_time is the sum of time to establish connection to
PostgreSQL. Since establishing connections to PostgreSQL is done in
serial rather than in parallel, conn_total_time should have been
divided by nclients.
Fix committed to master and 9.5 stable branches.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Hello Tatsuo,
So on second thought the formula should rather be:
... / (is_connect? nclients: nthreads)
I don't think this is quite correct.
If is_connect is false, then following loop is executed in threadRun():
/* make connections to the database */
for (i = 0; i < nstate; i++)
{
if ((state[i].con = doConnect()) == NULL)
goto done;
}
Yep. The loop initializes all client connections *BEFORE* starting any
transactions on any client; at this point the thread does only
connections, and this time is included in conn_time.
Here, nstate is nclients/nthreads. Suppose nclients = 16 and nthreads
= 2, then 2 threads run in parallel, and each thread is connecting 8
times (nstate = 8) in *serial*.
Yes.
The total connection time for this thread is calculated by "the time
ends the loop" - "the time starts the loop". So if the time to establish
a connection is 1 second, the total connection time for a thread will be
8 seconds. Thus grand total of connection time will be 2 * 8 = 16
seconds.
Yes: 16 seconds across 2 threads, 8 seconds per thread. That "real time"
of the test is spent in connections, and no transactions are processed
during it.
If is_connect is true, following loop is executed.
/* send start up queries in async manner */
for (i = 0; i < nstate; i++)
{
CState *st = &state[i];
Command **commands = sql_files[st->use_file];
int prev_ecnt = st->ecnt;
st->use_file = getrand(thread, 0, num_files - 1);
if (!doCustom(thread, st, &thread->conn_time, logfile, &aggs))
In the loop, exactly the same thing happens as in the is_connect = false
case. If t = 1, the total connection time will be the same as in the
is_connect = false case, i.e. 16 seconds.
Without -C, 1 thread, 2 clients, if transactions take the same time as
connections:
Client 1: C-|TTTTTTTTTTTTTTTTTTTTTTTTTTT
Client 2: -C|TTTTTTTTTTTTTTTTTTTTTTTTTTT
<> connection time (initial loop in threadRun)
<----------------------------> whole execution time
<-------------------------> transaction time
The actual transaction time to consider in this drawing is the whole time
minus the connection time of the thread, which is serialised. It is not
the whole execution time minus connection time / 2 (the number of
clients), because of the '|' synchronisation (clients do not start before
all other clients of the thread are connected).
With -C, there is no initial connection phase; the connections are managed
within doCustom, which does transaction processing asynchronously.
Client 1: CTCTCTCTCTCTCTCTCTCTCTCTCTCT-
Client 2: -CTCTCTCTCTCTCTCTCTCTCTCTCTCT
<---------------------------> whole execution time
<--------------------------> measured connection time
While a client is connecting, the other client is performing its
transaction in an asynchronous manner, so the measured connection time may
be arbitrarily close to the execution time; this was the bug you detected.
In summary, I see no reason to change the v1 patch.
I still think that my revised reasoning is the right one... I hope that
the drawings above make my thinking clearer. For me, the initial formula
was right when not using -C; only the -C case needs fixing.
--
Fabien.
Hi,
On 2015-09-25 20:35:45 +0200, Fabien COELHO wrote:
Hello Tatsuo,
Hmmm... I never use -C. The formula seems ok:
tps_exclude = normal_xacts / (time_include -
(INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads));
Hmmm... it is not:-)
I think that the degree of parallelism to consider is nclients, not
nthreads: while connection time is accumulated in conn_time, other clients
are possibly doing their transactions, in parallel, even if it is in the
same thread, so it is not "stopped time" for all clients. It starts to
matter with "-j 1 -c 30" and slow transactions: the cumulated conn_time in
each thread may be arbitrarily close to the whole time if there are many
clients.
I think this pretty much entirely broke the tps_exclude logic when not
using -C, especially when -c and -j differ. The wait time there is
actually per thread, not per client.
In this example I set post_auth_delay=1s on the server. Pgbench
tells me:
pgbench -M prepared -c 180 -j 180 -T 10 -P1 -S
tps = 897607.544862 (including connections establishing)
tps = 1004793.708611 (excluding connections establishing)
pgbench -M prepared -c 180 -j 60 -T 10 -P1 -S
tps = 739502.979613 (including connections establishing)
tps = 822639.038779 (excluding connections establishing)
pgbench -M prepared -c 180 -j 30 -T 10 -P1 -S
tps = 376468.177081 (including connections establishing)
tps = 418554.527585 (excluding connections establishing)
which is pretty obviously bogus. While I'd not expect it to work
perfectly, the "excluding" number should stay roughly constant.
The fundamental issue is that without -C *none* of the connections in
each thread gets to actually execute work before all of them have
established a connection. So dividing conn_total_time by nclients is
wrong.
For more realistic connection delays this leads to the 'excluding'
number being way too close to the 'including' number, even if a
substantial portion of the time is spent on it.
Greetings,
Andres Freund
On 2020-02-27 12:26:36 -0800, Andres Freund wrote:
Not suggesting it as an actual fix, but just multiplying the computed
conn_time within threadRun() by the number of connections the thread
handles (for the non -C case) results in much more reasonable output.
pgbench -M prepared -c 180 -j 30 -T 10 -P1 -S
before:
tps = 378393.985650 (including connections establishing)
tps = 420691.025930 (excluding connections establishing)
after:
tps = 379818.929680 (including connections establishing)
tps = 957405.785600 (excluding connections establishing)
pgbench -M prepared -c 180 -j 180 -T 10 -P1 -S
before:
tps = 906662.031099 (including connections establishing)
tps = 1012223.500880 (excluding connections establishing)
after:
tps = 906178.154876 (including connections establishing)
tps = 1012431.154463 (excluding connections establishing)
The numbers are still a bit bogus because of thread startup overhead
being included, and conn_time being computed relative to the thread
creation. But they at least make some basic sense.
diff --git i/src/bin/pgbench/pgbench.c w/src/bin/pgbench/pgbench.c
index 1159757acb0..3bc45107136 100644
--- i/src/bin/pgbench/pgbench.c
+++ w/src/bin/pgbench/pgbench.c
@@ -6265,6 +6265,16 @@ threadRun(void *arg)
INSTR_TIME_SET_CURRENT(thread->conn_time);
INSTR_TIME_SUBTRACT(thread->conn_time, thread->start_time);
+ /* add once for each other connection */
+ if (!is_connect)
+ {
+ instr_time e = thread->conn_time;
+ for (i = 0; i < (nstate - 1); i++)
+ {
+ INSTR_TIME_ADD(thread->conn_time, e);
+ }
+ }
+
/* explicitly initialize the state machines */
for (i = 0; i < nstate; i++)
{
Greetings,
Andres Freund