multi-threaded pgbench

Started by ITAGAKI Takahiro, almost 17 years ago, 26 messages, hackers
#1ITAGAKI Takahiro
itagaki.takahiro@oss.ntt.co.jp

Pgbench is a well-known tool for measuring postgres performance, but nowadays
it does not work well because it cannot use multiple CPUs. The postgres
server, on the other hand, can use multiple CPUs very well, so the bottleneck
of the workload is *in pgbench*.

Multi-threading would be a solution. The attached patch adds -j
(number of jobs) option to pgbench. If the value N is greater than 1,
pgbench runs with N threads. Connections are divided equally among
them (e.g. -c64 -j4 => 4 threads with 16 connections each). It can
run on POSIX platforms with pthread and on Windows with win32 threads.

Here are results of multi-threaded pgbench runs on Fedora 11 with an Intel
Core i7 (8 logical cores = 4 physical cores * HT). -j8 (8 threads) was
the best, with tps about 4.5 times that of -j1, which is the traditional
single-threaded behavior.

$ pgbench -i -s10
$ pgbench -n -S -c64 -j1 => tps = 11600.158593
$ pgbench -n -S -c64 -j2 => tps = 17947.100954
$ pgbench -n -S -c64 -j4 => tps = 26571.124001
$ pgbench -n -S -c64 -j8 => tps = 52725.470403
$ pgbench -n -S -c64 -j16 => tps = 38976.675319
$ pgbench -n -S -c64 -j32 => tps = 28998.499601
$ pgbench -n -S -c64 -j64 => tps = 26701.877815

Is it acceptable to use pthread in contrib module?
If ok, I will add the patch to the next commitfest.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachments:

pgbench-mt.patch (application/octet-stream), +385 -252
#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: ITAGAKI Takahiro (#1)
Re: multi-threaded pgbench

Itagaki Takahiro wrote:

Is it acceptable to use pthread in contrib module?

We don't have a precedent it seems. I think the requirement would be
that it should compile if pthread support is not present.

If ok, I will add the patch to the next commitfest.

Add it anyway -- discussion should happen during commitfest if it
doesn't spark right away.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alvaro Herrera (#2)
Re: multi-threaded pgbench

Alvaro Herrera wrote:

Itagaki Takahiro wrote:

Is it acceptable to use pthread in contrib module?

We don't have a precedent it seems. I think the requirement would be
that it should compile if pthread support is not present.

My thoughts as well. But I wonder, would it be harder or easier to use
fork() instead?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#2)
Re: multi-threaded pgbench

Alvaro Herrera <alvherre@commandprompt.com> writes:

Itagaki Takahiro wrote:

Is it acceptable to use pthread in contrib module?

We don't have a precedent it seems. I think the requirement would be
that it should compile if pthread support is not present.

Right. Breaking it for non-pthread environments is not acceptable.

The real question here is whether it will be a problem if pgbench
delivers significantly different results when built with or without
threading support. I can see arguments either way on that ...

regards, tom lane

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Heikki Linnakangas (#3)
Re: multi-threaded pgbench

Heikki Linnakangas wrote:

Alvaro Herrera wrote:

Itagaki Takahiro wrote:

Is it acceptable to use pthread in contrib module?

We don't have a precedent it seems. I think the requirement would be
that it should compile if pthread support is not present.

My thoughts as well. But I wonder, would it be harder or easier to use
fork() instead?

I have just been down this road to some extent with parallel pg_restore,
which uses threads on Windows. That might be useful as a bit of a
template. Extending it to use pthreads would probably be fairly trivial.
The thread/fork specific stuff ended up being fairly isolated for
pg_restore. see src/bin/pg_dump/pg_backup_archiver.c:spawn_restore()

I think you should have it use pthreads if available, or Windows threads
there, or fork() elsewhere.

cheers

andrew

#6Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Tom Lane (#4)
Re: multi-threaded pgbench

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Itagaki Takahiro wrote:

Is it acceptable to use pthread in contrib module?

We don't have a precedent it seems. I think the requirement would be
that it should compile if pthread support is not present.

Right. Breaking it for non-pthread environments is not acceptable.

The real question here is whether it will be a problem if pgbench
delivers significantly different results when built with or without
threading support. I can see arguments either way on that ...

well pgbench as it is now is more or less unusable on modern
hardware for SELECT type queries (way too slow to scale to what the
backend can do these days and the number of cores in a recent box).
It is only somewhat usable on the default update heavy test as well
because even there it is hitting scalability limits (ie I can easily
improve on its numbers with a perl script that forks and issues the same
queries).
I would even go as far as issuing a WARNING if pgbench is invoked and
not compiled with threads if we accept this patch...

Stefan

#7Greg Smith
gsmith@gregsmith.com
In reply to: ITAGAKI Takahiro (#1)
Re: multi-threaded pgbench

On Wed, 8 Jul 2009, Itagaki Takahiro wrote:

Multi-threading would be a solution. The attached patch adds -j
(number of jobs) option to pgbench.

Should probably name this -w "number of workers" to stay consistent with
terminology used on the server side.

Is it acceptable to use pthread in contrib module?
If ok, I will add the patch to the next commitfest.

pgbench is basically broken right now, as demonstrated by the lack of
scaling shown in your results and similar ones I've collected. This looks
like it fixes the primary problem there. While it would be nice if a
multi-process based solution were written instead, unless someone is
willing to step up and volunteer to write one I'd much rather see your
patch go in than doing nothing at all. It shouldn't even impact old
results if you don't toggle the option on.

I have 3 new server systems I was going to run pgbench on anyway in the
next month as part of my standard performance testing on new hardware.
I'll be happy to mix in results using the multi-threaded pgbench to check
the patch's performance, along with the rest of the initial review here.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#5)
Re: multi-threaded pgbench

Andrew Dunstan <andrew@dunslane.net> writes:

I think you should have it use pthreads if available, or Windows threads
there, or fork() elsewhere.

Hmm, but how will you communicate stats back from the sub-processes?
pg_restore doesn't need anything more than a success/failure result
from its child processes, but I think pgbench will want more.

regards, tom lane

#9Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#8)
Re: multi-threaded pgbench

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

I think you should have it use pthreads if available, or Windows threads
there, or fork() elsewhere.

Hmm, but how will you communicate stats back from the sub-processes?
pg_restore doesn't need anything more than a success/failure result
from its child processes, but I think pgbench will want more.

My first reaction is to say "use a pipe."
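
For readers following the thread, the pipe idea can be sketched roughly as below. This is only an illustration under stated assumptions, not code from the patch: WorkerStats, demo_worker and run_worker_in_child are hypothetical names, and it is POSIX-only (fork is not available on Windows).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-worker result record; the real patch may track more. */
typedef struct WorkerStats
{
    long   transactions;   /* transactions completed by this worker */
    double conn_time;      /* seconds spent establishing connections */
} WorkerStats;

/* Stand-in worker for illustration: pretends to run n transactions. */
static WorkerStats
demo_worker(int n)
{
    WorkerStats s;

    s.transactions = n;
    s.conn_time = 0.0;
    return s;
}

/*
 * Run worker(arg) in a forked child and send its stats back to the
 * parent through a pipe. Returns 0 on success, -1 on any failure.
 */
static int
run_worker_in_child(WorkerStats (*worker) (int), int arg, WorkerStats *out)
{
    int     fds[2];
    pid_t   pid;
    ssize_t n;
    int     status;

    assert(worker != NULL && out != NULL);

    if (pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0)
    {
        /* Child: do the work, write the result, exit without returning. */
        WorkerStats s;

        close(fds[0]);
        s = worker(arg);
        n = write(fds[1], &s, sizeof(s));
        _exit(n == (ssize_t) sizeof(s) ? 0 : 1);
    }

    /*
     * Parent: one read is enough here only because the record is far
     * smaller than PIPE_BUF; larger payloads would need a read loop.
     */
    close(fds[1]);
    n = read(fds[0], out, sizeof(*out));
    close(fds[0]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (n != (ssize_t) sizeof(*out))
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling run_worker_in_child(demo_worker, 5, &stats) should leave stats.transactions set to 5 in the parent. Per-transaction latency log lines would need a different channel or a streaming format; this sketch only returns a fixed-size summary.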

cheers

andrew

#10ITAGAKI Takahiro
itagaki.takahiro@oss.ntt.co.jp
In reply to: Andrew Dunstan (#5)
Re: multi-threaded pgbench

Andrew Dunstan <andrew@dunslane.net> wrote:

I think you should have it use pthreads if available, or Windows threads
there, or fork() elsewhere.

Just a question - which platform does not support any threading?
I think threading is very common in modern applications. If there
are such OSes, they seem to be just abandoned and not maintained...

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

#11Greg Smith
gsmith@gregsmith.com
In reply to: Tom Lane (#8)
Re: multi-threaded pgbench

On Wed, 8 Jul 2009, Tom Lane wrote:

pg_restore doesn't need anything more than a success/failure result
from its child processes, but I think pgbench will want more.

The biggest chunk of returned state to consider is how each client
transaction generates a line of latency information that goes into the log
file.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#12ITAGAKI Takahiro
itagaki.takahiro@oss.ntt.co.jp
In reply to: Andrew Dunstan (#9)
Re: multi-threaded pgbench

Here is an updated version of multi-threaded pgbench patch.

Andrew Dunstan <andrew@dunslane.net> wrote:

Hmm, but how will you communicate stats back from the sub-processes?

My first reaction is to say "use a pipe."

I added a partial implementation of the pthread API using fork and pipe for
platforms without ENABLE_THREAD_SAFETY. The pthread version is not strictly
needed if we have the fork version, but I have left it as-is.

The name of the new option is still -j, which is borrowed from pg_restore
and gmake. They use -j for multi-worker processing.

-j NUM number of threads (default: 1)

I needed to modify the meaning of tps (excluding connections establishing)
a little because connections are established in parallel. I subtract the
average per-thread connection time from the total execution time.

total_time := last_thread_finish_time - first_thread_start_time
tps (including connection) := num_transaction / total_time
tps (excluding connection) := num_transaction /
(total_time - (total_connection_time / num_threads))
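
The formulas above can be written out as a small C helper. This is a sketch under assumptions, not the patch's code: TpsResult and compute_tps are hypothetical names, and all times are taken to be in seconds.

```c
#include <assert.h>

/* Hypothetical container for the two tps figures pgbench reports. */
typedef struct TpsResult
{
    double including;   /* tps including connection establishment */
    double excluding;   /* tps excluding connection establishment */
} TpsResult;

/*
 * total_time            = last_thread_finish_time - first_thread_start_time
 * total_connection_time = sum of connection times over all threads
 */
static TpsResult
compute_tps(long num_transaction, double total_time,
            double total_connection_time, int num_threads)
{
    TpsResult r;
    double    avg_connection_time;

    assert(num_threads > 0 && total_time > 0.0);

    /* Threads connect in parallel, so subtract the per-thread average. */
    avg_connection_time = total_connection_time / num_threads;
    assert(avg_connection_time < total_time);

    r.including = num_transaction / total_time;
    r.excluding = num_transaction / (total_time - avg_connection_time);
    return r;
}
```

For example, 1000 transactions over 10 s with 8 s of summed connection time across 4 threads gives 100 tps including connections and 1000 / (10 - 2) = 125 tps excluding them.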

Note that I also fixed a few other parts of pgbench:
* Use instr_time instead of struct timeval.
Macros in portability/instr_time.h make the code cleaner.
* Accept the "\sleep 1ms" format (no space between "1" and "ms") for the sleep
meta command. The old version of pgbench interpreted "1ms" as just "1",
which means "1 s". That was confusing.
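
The "\sleep 1ms" behavior described above could be sketched as follows. This is an illustration, not the patch's parser: parse_sleep_usec is a hypothetical name, it takes the whole argument as one string, it assumes the units us, ms and s with seconds as the default, and it ignores overflow.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Convert a sleep argument such as "1", "1ms" or "1 ms" to microseconds.
 * Returns -1 if the argument cannot be parsed.
 */
static long
parse_sleep_usec(const char *arg)
{
    char *end;
    long  value;

    assert(arg != NULL);

    value = strtol(arg, &end, 10);
    if (end == arg || value < 0)
        return -1;              /* no digits, or a negative duration */

    while (*end == ' ')
        end++;                  /* accept "1 ms" as well as "1ms" */

    if (*end == '\0' || strcmp(end, "s") == 0)
        return value * 1000000L;    /* no unit means seconds */
    if (strcmp(end, "ms") == 0)
        return value * 1000L;
    if (strcmp(end, "us") == 0)
        return value;

    return -1;                  /* unknown unit: reject, don't guess */
}
```

With this, "1ms" yields 1000 microseconds instead of silently becoming one second, and an unrecognized suffix is rejected.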

I'll add the patch to the commitfest page.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachments:

pgbench-mt_20090709.patch (application/octet-stream), +713 -442
#13Robert Haas
robertmhaas@gmail.com
In reply to: ITAGAKI Takahiro (#12)
Re: multi-threaded pgbench

On Thu, Jul 9, 2009 at 4:51 AM, Itagaki
Takahiro<itagaki.takahiro@oss.ntt.co.jp> wrote:

Here is an updated version of multi-threaded pgbench patch.

Greg (Smith), do you have time to review this version? If not, I will
assign a round-robin reviewer when one becomes available.

...Robert

#14Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#13)
Re: multi-threaded pgbench

On Sat, Jul 18, 2009 at 8:25 PM, Robert Haas<robertmhaas@gmail.com> wrote:

On Thu, Jul 9, 2009 at 4:51 AM, Itagaki
Takahiro<itagaki.takahiro@oss.ntt.co.jp> wrote:

Here is an updated version of multi-threaded pgbench patch.

Greg (Smith), do you have time to review this version?  If not, I will
assign a round-robin reviewer when one becomes available.

Incidentally you could assign me something if you want.

I gave feedback on Simon/Your join removal and the Append min/max
patch. I don't think either has really reached any conclusive
"finished" state though. I suppose I should mark your patch as
"returned with feedback" even if it's mostly just "good work, keep
going"? And the other patch isn't actually in this commitfest but I
think we're still discussing what it should do.

--
greg
http://mit.edu/~gsstark/resume.pdf

#15Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#13)
Re: multi-threaded pgbench

Greg (Smith), do you have time to review this version? If not, I will
assign a round-robin reviewer when one becomes available.

I can do a concurrency test of this next week.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#16Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#14)
Re: multi-threaded pgbench

On Jul 18, 2009, at 3:40 PM, Greg Stark <gsstark@mit.edu> wrote:

On Sat, Jul 18, 2009 at 8:25 PM, Robert Haas<robertmhaas@gmail.com>
wrote:

On Thu, Jul 9, 2009 at 4:51 AM, Itagaki
Takahiro<itagaki.takahiro@oss.ntt.co.jp> wrote:

Here is an updated version of multi-threaded pgbench patch.

Greg (Smith), do you have time to review this version? If not, I
will
assign a round-robin reviewer when one becomes available.

Incidentally you could assign me something if you want.

OK.

I gave feedback on Simon/Your join removal and the Append min/max
patch. I don't think either has really reached any conclusive
"finished" state though. I suppose I should mark your patch as
"returned with feedback" even if it's mostly just "good work, keep
going"? And the other patch isn't actually in this commitfest but I
think we're still discussing what it should do.

Well, I think we really need Tom to look at join removal. If he
doesn't have any better ideas for how to structure the code it's not
clear to me that we shouldn't just commit what I already did and then
start future work from there. But this seems like an issue for that
thread rather than this one.

Wrt append min/max I think we should postpone further discussion until
end of commitfest, since it was submitted mid-CommitFest.

...Robert

#17Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#15)
Re: multi-threaded pgbench

On Sun, Jul 19, 2009 at 12:50 AM, Josh Berkus<josh@agliodbs.com> wrote:

Greg (Smith), do you have time to review this version?  If not, I will
assign a round-robin reviewer when one becomes available.

I can do a concurrency test of this next week.

Sounds good.

...Robert

#18Greg Smith
gsmith@gregsmith.com
In reply to: ITAGAKI Takahiro (#12)
Re: multi-threaded pgbench

I just took multi-threaded pgbench for an initial spin, looks good overall
with only a couple of small rough edges.

The latest code works differently depending on whether you compiled with
--enable-thread-safety or not; it defines some structures based on fork if
it's not enabled:

#elif defined(ENABLE_THREAD_SAFETY)
#include <pthread.h>
#else
#include <sys/wait.h>
typedef struct fork_pthread *pthread_t;
typedef int pthread_attr_t;
static int pthread_create(pthread_t *thread, pthread_attr_t *attr,
                          void *(*start_routine)(void *), void *arg);
static int pthread_join(pthread_t th, void **thread_return);
#endif

That second code path, when --enable-thread-safety is turned off, crashes
and burns on my Linux system:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv
-I../../src/interfaces/libpq -I. -I../../src/include -D_GNU_SOURCE -c -o
pgbench.o pgbench.c -MMD -MP -MF .deps/pgbench.Po
pgbench.c:72: error: conflicting types for pthread_t
/usr/include/bits/pthreadtypes.h:50: error: previous declaration of
pthread_t was here
pgbench.c:73: error: conflicting types for pthread_attr_t
/usr/include/bits/pthreadtypes.h:57: error: previous declaration of
pthread_attr_t was here

So that's the first problem to sort out; I was planning to test that path
as well as the regular threaded one. Since I'd expect there to be Linux
packages built both with and without thread safety enabled, they both
should work, even though people should always be turning safety on
nowadays.

We should try to get a Windows based tester here too at some point,
there's a completely different set of thread wrapper code for that OS that
could use a look by somebody more familiar than me with that platform.

The second thing that concerns me is that there's a limitation in the code
where the number of clients must be a multiple of the number of workers.
When I tried to gradually step up the client volume the tests wouldn't
run:

$ ./pgbench -j 16 -S -c 24 -t 10000 pgbench
number of clients (24) must be a multiple number of threads (16)

Once the larger issues are worked out, it would be much friendlier if it
were possible to pass new threads a client count so that the last in the
pool could service a smaller number. The logic for that is kind of a
pain--in this case you'd want 8 threads running 2 clients each while 8 ran
1 client--but it would be considerably more flexible that way.
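
One way to get that split is a static assignment computed up front, sketched below. This is only an illustration of the arithmetic, not code from the patch; clients_for_thread is a hypothetical helper.

```c
#include <assert.h>

/*
 * Number of clients handled by the thread at thread_index when nclients
 * need not be a multiple of nthreads. The first (nclients % nthreads)
 * threads each take one extra client.
 */
static int
clients_for_thread(int nclients, int nthreads, int thread_index)
{
    int base;
    int extra;

    assert(nclients >= 0 && nthreads > 0);
    assert(thread_index >= 0 && thread_index < nthreads);

    base = nclients / nthreads;     /* every thread gets at least this */
    extra = nclients % nthreads;    /* leftover clients to hand out */

    return base + (thread_index < extra ? 1 : 0);
}
```

For -c 24 -j 16 this gives threads 0 through 7 two clients each and threads 8 through 15 one client each, matching the 8 + 8 split described above. Because the counts are fixed before any thread starts, nothing has to be shared between workers at run time.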

On to performance. My test system has 16 cores of Xeon X5550 @ 2.67GHz.
I created a little pgbench database (-s 10) and used the default
postgresql.conf parameters for everything but max_connections for a rough
initial test.

Performance on this box drops off pretty fast once you get past 16
clients; using the original, unpatched pgbench:

clients    tps
     16  86887
     24  70685
     32  63787
     64  64712
    128  60602

A quick test of the new version suggests that there's no glaring
performance regression running it with a single client thread:

$ ./pgbench.orig -S -c 64 -t 10000 pgbench
tps = 64712.451737 (including connections establishing)

$ ./pgbench -S -c 64 -t 10000 pgbench
tps = 63806.494046 (including connections establishing)

So I moved on to testing with a worker thread per CPU:

./pgbench -j 16 -S -c 16 -t 100000 pgbench
./pgbench -j 16 -S -c 32 -t 50000 pgbench
./pgbench -j 16 -S -c 64 -t 10000 pgbench
./pgbench -j 16 -S -c 128 -t 10000 pgbench

And got considerably better results:

clients    tps
     16  96223
     32  89014
     64  82487
    128  74217

That's as much as a 40% speedup @ 32 clients, and even a decent win at
lower counts.

The patch looks like it accomplishes its performance goals quite well
here. I'll be glad to run some more extensive performance tests, but I'd
like to at least see the version without --enable-thread-safety fixed
first so that I can queue up and compare both versions when I go through
that.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#19ITAGAKI Takahiro
itagaki.takahiro@oss.ntt.co.jp
In reply to: Greg Smith (#18)
Re: multi-threaded pgbench

Greg Smith <gsmith@gregsmith.com> wrote:

That second code path, when --enable-thread-safety is turned off, crashes
and burns on my Linux system:

It comes from a conflict of identifiers.
Renaming the identifiers with #define can solve the errors:

#define pthread_t pg_pthread_t
#define pthread_attr_t pg_pthread_attr_t
#define pthread_create pg_pthread_create
#define pthread_join pg_pthread_join
typedef struct fork_pthread *pthread_t;
...

Another idea is to stop using the pthread names directly and add a
'pg_thread' wrapper module on top of pthread.

We can choose either implementation... Which is better?

$ ./pgbench -j 16 -S -c 24 -t 10000 pgbench
number of clients (24) must be a multiple number of threads (16)

It's hard on forking-thread platforms because multiple threads need
to access the job queue. We would need to put the queue in inter-process
shared memory, which introduces additional complexity.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

#20ITAGAKI Takahiro
itagaki.takahiro@oss.ntt.co.jp
In reply to: ITAGAKI Takahiro (#19)
Re: multi-threaded pgbench

Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote:

Greg Smith <gsmith@gregsmith.com> wrote:

That second code path, when --enable-thread-safety is turned off, crashes
and burns on my Linux system:

It comes from a conflict of identifiers.
Renaming the identifiers with #define can solve the errors:
#define pthread_t pg_pthread_t

Here is a patch to fix the compile errors by renaming identifiers
when thread-safety is disabled on Linux.

I also fixed file descriptor leaks at the end of the benchmark.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachments:

pgbench-mt_20090724.patch (application/octet-stream), +748 -442
#21Josh Williams
joshwilliams@ij.net
In reply to: Greg Smith (#18)
#22Greg Smith
gsmith@gregsmith.com
In reply to: Josh Williams (#21)
#23Josh Williams
joshwilliams@ij.net
In reply to: Greg Smith (#22)
#24Josh Williams
joshwilliams@ij.net
In reply to: Josh Williams (#23)
#25Greg Smith
gsmith@gregsmith.com
In reply to: Josh Williams (#24)
#26Magnus Hagander
magnus@hagander.net
In reply to: Josh Williams (#24)