7.2 is slow?
With the freshly retrieved current source, now PostgreSQL is running
fine on an AIX 5L box. Thanks Tom.
BTW, I have done some benchmarking using pgbench on this machine and
found that 7.2 is almost two times slower than 7.1. The hardware is a
4way machine. Since I thought that 7.2 improves the performance for
SMP machines, I'm now wondering why 7.2 is so slow.
postgresql.conf parameters changed from default values are:
max_connections = 1024
wal_sync_method = fdatasync
shared_buffers = 4096
deadlock_timeout = 1000000
configure option is: --enable-multibyte=EUC_JP
Of course, these settings are identical for both 7.1 and 7.2.
See attached graph...
With the freshly retrieved current source, now PostgreSQL is running
fine on an AIX 5L box. Thanks Tom.
BTW, I have done some benchmarking using pgbench on this machine and
found that 7.2 is almost two times slower than 7.1. The hardware is a
4way machine. Since I thought that 7.2 improves the performance for
^^^^
SMP machines, I'm now wondering why 7.2 is so slow.
Eww. I will remind people that this multi-CPU setup is exactly the type
of machine we wanted to speed up with the new lightweight locking code
that reduced spinlock looping.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Tatsuo Ishii wrote:
With the freshly retrieved current source, now PostgreSQL is running
fine on an AIX 5L box. Thanks Tom.
BTW, I have done some benchmarking using pgbench on this machine and
found that 7.2 is almost two times slower than 7.1.
Is this an AIX-specific problem, or do all / all SMP / all 4-way
computers have it?
Is this a bug that needs to be addressed before the final release?
Or would we just prominently warn people that the new release is 2x
slower and advise upgrading only if they have powerful enough computers
(system load < 0.5 during normal operation)?
------------------------
Hannu
BTW, I have done some benchmarking using pgbench on this machine and
found that 7.2 is almost two times slower than 7.1.
Is this an AIX specific problem or do all/all SMP/all 4way computers
have it ?
Not sure. As far as I can tell, nobody except me has tested 7.2 on big
boxes.
Is this a bug that needs to be addressed before the final release?
I hope this will be solved before final. At least I would like to
know what's going on.
Anyway, I will do some testing on a smaller machine (that is, my
laptop) to see if I see the same performance degradation on it.
--
Tatsuo Ishii
Tatsuo Ishii wrote:
BTW, I have done some benchmarking using pgbench on this machine and
found that 7.2 is almost two times slower than 7.1.
Is this an AIX specific problem or do all/all SMP/all 4way computers
have it ?
Not sure. As far as I can tell, nobody except me has tested 7.2 on big
boxes.
Is this a bug that needs to be addressed before the final release?
I hope this will be solved before final. At least I would like to
know what's going on.
Anyway, I will do some testing on a smaller machine (that is, my
laptop) to see if I see the same performance degradation on it.
How did you test ?
I could do the same test on Dual Pentium III / 800 w/1024 MB
with IBM 45 G/7200 IDE disk.
So we could compare different platforms as well :)
-------------
Hannu
On Mon, Dec 17, 2001 at 12:43:05PM +0200, Hannu Krosing allegedly wrote:
<snip>
I could do some testing on a Sun 450 / 4x400 MHz / 4 GB, if that's helpful.
Cheers,
Mathijs
Is this an AIX specific problem or do all/all SMP/all 4way computers
have it ?
I'll have 4-way and 8-way Xeon boxes Tuesday evening that I can test this
against (though I won't get to test till Wednesday unless I don't sleep)
- Brandon
----------------------------------------------------------------------------
c: 646-456-5455 h: 201-798-4983
b. palmer, bpalmer@crimelabs.net pgp:crimelabs.net/bpalmer.pgp5
How did you test ?
I could do the same test on Dual Pentium III / 800 w/1024 MB
with IBM 45 G/7200 IDE disk.
So we could compare different platforms as well :)
I could do some testing on a Sun 450 / 4x400 MHz / 4 GB, if that's helpful.
Cheers,
Mathijs
I'll have 4 way and 8 way xeon boxes tues evening that I can test this
against (though I won't get to test till wed unless I don't sleep)
- Brandon
Thanks to everyone. Here are the methods I used for testing, including
generating graphs (actually very simple).
(1) Tweak postgresql.conf to allow a large number of concurrent users. I
tested up to 1024 on AIX, but for the comparison I think testing up
to 128 users is enough. Here are example settings:
max_connections = 128
shared_buffers = 4096
deadlock_timeout = 100000
You might want to tweak wal_sync_method to get the best
performance. However this should not affect the comparison between
7.1 and 7.2.
(2) Run:
sh bench.sh
It will invoke pgbench for various concurrent users. So you need
to install pgbench beforehand (it's in contrib/pgbench. Just type
make install there to install pgbench).
This will take a while.
(3) (2) will generate a file named "bench.data". The file has rows
where the first column is the number of concurrent users and the
second is the tps. Rename it to bench-7.2.data.
(4) Do (1) and (2) for PostgreSQL 7.1 and rename bench.data to
bench-7.1.data.
(5) Run plot.sh to see the result graph. Note that plot.sh requires
gnuplot.
---
Tatsuo Ishii
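The bench.sh script itself was sent as an attachment and is not
reproduced in this archive. A minimal sketch of what such a driver might
look like (the tps-extraction pattern and the particular client counts
are my assumptions, not taken from the thread):

```shell
#!/bin/sh
# Hypothetical reconstruction of bench.sh: run pgbench over a range of
# client counts and record "<clients> <tps>" pairs in bench.data.

extract_tps () {
    # pull the tps figure out of pgbench's summary line, e.g.
    # "tps = 48.3 (including connections establishing)"
    grep 'including' | sed 's/.*tps = \([0-9.]*\).*/\1/'
}

run_bench () {
    : > bench.data
    for c in 1 2 4 8 16 32 64 128; do
        tps=$(pgbench -c "$c" -t 100 2>/dev/null | extract_tps)
        echo "$c $tps" >> bench.data
    done
}

# only run the loop when pgbench is actually installed
command -v pgbench >/dev/null 2>&1 && run_bench
```

The resulting two-column file matches the "bench.data" format that step
(3) above describes.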
It seems that on dual PIII we are indeed faster than 7.1.3 for
small number of clients but slower for large number (~ 40)
My initial results on dual PIII/800 are as follows
7.1.3 7.2b4 7.2b4-FULL
==================================================================
./pgbench -i -p 5433
./pgbench -p 5433 -c 1 -t 100 240/251 217/223 177/181
./pgbench -p 5433 -c 5 -t 100 93/ 94 211/217 207/212
./pgbench -p 5433 -c 10 -t 100 57/ 58 145/148 160/163
------------------------------------------------------------------
./pgbench -i -s 10 -p 5433
./pgbench -p 5433 -c 1 -t 100 171/177 162/166 169/173
./pgbench -p 5433 -c 5 -t 100 140/143 191/196 202/207
./pgbench -p 5433 -c 10 -t 100 132/135 165/168 159/163
./pgbench -p 5433 -c 25 -t 100 65/ 66 60/ 60 75/ 76
./pgbench -p 5433 -c 50 -t 100 60/ 61 43/ 43 55/ 59
./pgbench -p 5433 -c 100 -t 100 48/ 48 23/ 23 34/ 34
------------------------------------------------------------------
One of the reasons seems to be that vacuum has changed.
After doing
psql -p 5433 -c 'vacuum full'
the result of
./pgbench -p 5433 -c 100 -t 100
was 34/34 - still ~25% slower than 7.1.3, but much better
than with a non-full vacuum (which I guess is what pgbench uses).
The third column, 7.2b4-FULL, is done by running
"psql -p 5433 -c 'vacuum full'"
between each pgbench run - now the lines cross somewhere
between 25 and 50 concurrent users.
One of the reasons pg is slower on the last lines of my test is that
postgres is slower when vacuum is not done often enough -
on fresh db
"./pgbench -p 5433 -c 100 -t 10" gives 67/75 as result
indicating that one reason is just our non-overwriting storage manager.
I also tried to outsmart pg by running the new vacuum
concurrently, but was disappointed.
vacuuming in 'normal' psql gave me 20/20 tps and running
with nice psql gave 21/21 tps
running ./pgbench -p 5433 -c 100 -t 100 as first benchmark gave the
same result as running it after vacuum full
-----------------------------------------------------------------------
PS. I hope to get single-processor results from the same computer in
about 6 hours as well (after my co-worker arrives home and can reboot
his computer to single-user)
Inxc - after you have rebooted to single-processor mode, please start
the postgres daemon by
su - hannu
cd db/7.2b4/
bin/pg_ctl -D data -l logfile start
and then run the above pgbench commands from
cd /home/hannu/src/postgresql-7.1.3/contrib/pgbench/
-----------------------------------------------------------------------
Tatsuo Ishii wrote:
<snip>
(2) Run:
sh bench.sh
I have no more time today, but I'll redo the tests with your script
tomorrow
(after I have found where to stick database name and port :)
----------------
Hannu
Hannu Krosing <hannu@tm.ee> writes:
./pgbench -i -s 10 -p 5433
./pgbench -p 5433 -c 1 -t 100 171/177 162/166 169/173
./pgbench -p 5433 -c 5 -t 100 140/143 191/196 202/207
./pgbench -p 5433 -c 10 -t 100 132/135 165/168 159/163
./pgbench -p 5433 -c 25 -t 100 65/ 66 60/ 60 75/ 76
./pgbench -p 5433 -c 50 -t 100 60/ 61 43/ 43 55/ 59
./pgbench -p 5433 -c 100 -t 100 48/ 48 23/ 23 34/ 34
You realize, of course, that when the number of clients exceeds the
scale factor you're not really measuring anything except update
contention on the "branch" rows? Every transaction tries to update
the balance for its branch, so if you have more clients than branches
then there will be lots of transactions blocked waiting for someone
else to commit. With a 10:1 ratio, there will be several transactions
blocked waiting for *each* active transaction; and when that guy
commits, all the others will waken simultaneously and contend for the
chance to update the branch row. One will win, the others will go
back to sleep, having done nothing except wasting CPU time. Thus a
severe falloff in measured TPS is inevitable when -c >> -s. I don't
think this scenario has all that much to do with real-world loads,
however.
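Tom's point can be put in back-of-the-envelope numbers (an editorial
illustration, not from the thread): only one updater per branch row can
proceed at a time, so effective parallelism is at most min(c, s), and
roughly c/s - 1 transactions sit queued behind each branch.

```shell
#!/bin/sh
# Rough model of the -c >> -s effect: effective parallelism is capped at
# min(clients, scale), and about c/s - 1 transactions queue behind each
# branch row while one updater holds it.
for pair in "10 10" "50 10" "100 10"; do
    set -- $pair
    c=$1; s=$2
    par=$s
    [ "$c" -lt "$s" ] && par=$c        # par = min(c, s)
    queued=$(( c / s - 1 ))
    echo "c=$c s=$s: parallelism <= $par, ~$queued queued per branch"
done
```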
I think you are right that the difference between 7.1 and 7.2 may have
more to do with the change in VACUUM strategy than anything else. Could
you retry the test after changing all the "vacuum" commands in pgbench.c
to "vacuum full"?
regards, tom lane
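One mechanical way to make the change Tom suggests is a substitution
over pgbench.c. The exact command strings in 7.2's pgbench.c are not
shown in the thread, so this sketch just demonstrates the rewrite on
whatever "vacuum ..." literals the file contains:

```shell
#!/bin/sh
# Rewrite embedded "vacuum ..." SQL literals to "vacuum full ...".
# Prints the transformed file to stdout; redirect over the original
# yourself once you have eyeballed the result.
vacuum_to_full () {
    sed 's/"vacuum /"vacuum full /g' "$1"
}
```

Usage would be something like
vacuum_to_full contrib/pgbench/pgbench.c > pgbench-full.c
followed by a rebuild.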
Tom Lane wrote:
Hannu Krosing <hannu@tm.ee> writes:
./pgbench -i -s 10 -p 5433
./pgbench -p 5433 -c 1 -t 100 171/177 162/166 169/173
./pgbench -p 5433 -c 5 -t 100 140/143 191/196 202/207
./pgbench -p 5433 -c 10 -t 100 132/135 165/168 159/163
./pgbench -p 5433 -c 25 -t 100 65/ 66 60/ 60 75/ 76
./pgbench -p 5433 -c 50 -t 100 60/ 61 43/ 43 55/ 59
./pgbench -p 5433 -c 100 -t 100 48/ 48 23/ 23 34/ 34
You realize, of course, that when the number of clients exceeds the
scale factor you're not really measuring anything except update
contention on the "branch" rows?
Oops! I thought that the deciding table would be tellers and this -s 10
would be ok for up to 100 users.
I will retry this with Tatsuo's script using -s 128 (if it still fits on
disk - at about 160MB per 1M tuples, a test with -s 100 needs 1.6GB, and
I currently have only 1.3G free)
I re-ran some of them with -s 50 (on 7.2b4),
each one after running "psql -p 5433 -c 'vacuum full;checkpoint;'"
tps
./pgbench -p 5433 -i -s 50
./pgbench -p 5433 -c 1 -t 1000 93/ 93
./pgbench -p 5433 -c 3 -t 333 106/107
./pgbench -p 5433 -c 5 -t 200 106/107
./pgbench -p 5433 -c 8 -t 125 112/113
./pgbench -p 5433 -c 10 -t 100 94/ 95
./pgbench -p 5433 -c 25 -t 40 98/ 91
./pgbench -p 5433 -c 50 -t 20 70/ 74
Every transaction tries to update
the balance for its branch, so if you have more clients than branches
then there will be lots of transactions blocked waiting for someone
else to commit. With a 10:1 ratio, there will be several transactions
blocked waiting for *each* active transaction; and when that guy
commits, all the others will waken simultaneously and contend for the
chance to update the branch row. One will win, the others will go
back to sleep, having done nothing except wasting CPU time. Thus a
severe falloff in measured TPS is inevitable when -c >> -s. I don't
think this scenario has all that much to do with real-world loads,
however.
It probably models a real-world ill-tuned database :)
And it seems that we fall off more rapidly on 7.2 than we did on 7.1,
so much so that we end up slower.
I think you are right that the difference between 7.1 and 7.2 may have
more to do with the change in VACUUM strategy than anything else. Could
you retry the test after changing all the "vacuum" commands in pgbench.c
to "vacuum full"?
The third column should be the equivalent of doing so (I did run
'vacuum full' between each pgbench run, and AFAICT pgbench runs vacuum
only before each run)
--------------
Hannu
I haven't tested with the new 7.2 betas, but here are some results from
7.1.
We have a development computer, an IBM xSeries 250, with 4 processors
(PIII Xeon 750MHz), 1 GB memory and 2 SCSI disks (U160).
The software is writing new rows to a table, and after this it reads the
id from that row. There are currently about 50 connections doing the
same thing.
When I run this test with the Red Hat 7.1 SMP kernel, I noticed that the
processors are more than 90% idle. Disk utilisation is not the
bottleneck either, since there is very low disk usage. Some data is
written to disk every 4-5 seconds. Fsync is turned off. In transactions,
this means about 200 inserted rows per second. The software that is used
to give the feed is capable of several thousand rows per second.
Okay, so I tried this also on the same computer, but using the non-SMP
kernel, i.e. with only one processor. The result was about 600
rows per second. The configuration file was unchanged. Now the
processor is about 100% utilized.
I didn't find any parameters that should help in this, but if you have a
version of 7.2 that you would like to get information about, let me
know, so I'll test.
Jussi
I think you are right that the difference between 7.1 and 7.2 may have
more to do with the change in VACUUM strategy than anything else. Could
you retry the test after changing all the "vacuum" commands in pgbench.c
to "vacuum full"?
Might there also be a difference in chosen query plans ?
Wasn't 7.1 more willing to choose an index over seq scan,
even though the scan would be faster in the single user case ?
Or was that change after 7.0 ?
The seq scan would be slower than the index in the case of
many concurrent accesses.
Andreas
<snip>
I didn't find any parameters that should help in this, but if you have a
version of 7.2 that you would like to get information about, let me
know, so I'll test.
Yes! This sleeping case is the problem we expected to see on SMP
machines in >= 7.1 because of lock contention and a select() that can't
sleep for less than 1/100 second. Please try the current 7.2 snapshot
and let us know what performance you get.
Hi !
Yes, now I've tested with 7.2b4. The result is about the same as with 7.1.
About 200 messages with four processors and about 600 messages with one
processor.
Jussi
--
Jussi Mikkola Project Manager
Bonware Oy gsm +358 40 830 7561
Tekniikantie 12 tel +358 9 2517 5570
02150 Espoo fax +358 9 2517 5571
Finland www.bonware.com
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
Might there also be a difference in chosen query plans ?
If so, it'd affect the results across-the-board, but AFAICT
Tatsuo is seeing comparable results for small numbers of clients.
regards, tom lane
Hi All,
I have experienced similar problems after moving my main server from UP
(PII 300 box) to SMP (PII 400 box). I remember someone said to look at
the iostat figures for the different runs, but I haven't had time to
check it out.
Out of curiosity, what does iostat say when run in SMP vs UP?
Ashley Cambrell
Jussi Mikkola wrote:
Hi !
Yes, now I've tested with 7.2b4. The result is about the same as with 7.1.
About 200 messages with four processors and about 600 messages with one
processor.
Jussi
<snip>
We're getting similar problem.
We're currently working on TPC-H benchmarking using postgresql 7.2b3.
From one up to 8 parallel connections (we've got 8 MIPS processors),
speedup increases from 1 to 8, but increasing above 8 makes performance
drop rapidly, to speedups even lower than 2 for 22 connections.
As far as we could trace, when we have more processes than processors we
get an increasing number of collisions.
When a collision happens, both processes go idle for a while; then the
collision may happen again, and so on.
Regards
Luis Amigo
Universidad de Cantabria
Jussi Mikkola <jussi.mikkola@bonware.com> writes:
Yes, now I've tested with 7.2b4. The result is about the same as with 7.1.
About 200 messages with four processors and about 600 messages with one
processor.
That's annoying. The LWLock changes were intended to solve the
inefficiency with multiple CPUs, but it seems like we still have a
problem somewhere.
Could you recompile the backend with profiling enabled and try to get
a profile from your test case? To build a profilable backend, it's
sufficient to do
cd .../src/backend
gmake clean
gmake PROFILE=-pg all
gmake install-bin
(assuming you are using gcc). Then restart the postmaster, and you
should notice "gmon.out" files being dropped into the various database
subdirectories anytime a backend exits. Next run your test case,
and as soon as it finishes copy the gmon.out file to a safe place.
(You'll only be able to get the profile from the last process to exit,
so try to make sure that this is representative. Might be worth
repeating the test a few times to make sure that the results don't
vary a whole lot.) Finally, do
gprof .../bin/postgres gmon.out >resultfile
to produce a legible result.
Oh, one more thing: on Linuxen you are likely to find that all the
reported routine runtimes are zero, rendering the results useless.
Apply the attached patch (for 7.2beta) to fix this.
regards, tom lane
*** src/backend/postmaster/postmaster.c.orig Wed Dec 12 14:52:03 2001
--- src/backend/postmaster/postmaster.c Mon Dec 17 19:38:29 2001
***************
*** 1823,1828 ****
--- 1823,1829 ----
{
Backend *bn; /* for backend cleanup */
pid_t pid;
+ struct itimerval svitimer;
/*
* Compute the cancel key that will be assigned to this backend. The
***************
*** 1858,1869 ****
--- 1859,1874 ----
beos_before_backend_startup();
#endif
+ getitimer(ITIMER_PROF, &svitimer);
+
pid = fork();
if (pid == 0) /* child */
{
int status;
+ setitimer(ITIMER_PROF, &svitimer, NULL);
+
free(bn);
#ifdef __BEOS__
/* Specific beos backend startup actions */
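Tom's recipe above, collected into one script for convenience. The
PGSRC/PGBIN/PGDATA paths are illustrative placeholders, not taken from
the thread, and the rebuild step is skipped when the source tree is
absent.

```shell
#!/bin/sh
# Sketch: rebuild a profilable backend, then turn a gmon.out dropped by
# an exiting backend into a readable gprof report.
PGSRC=${PGSRC:-$HOME/pgsql/src/backend}   # illustrative path
PGBIN=${PGBIN:-/usr/local/pgsql/bin}      # illustrative path
PGDATA=${PGDATA:-/usr/local/pgsql/data}   # illustrative path

rebuild_profiled () {
    (cd "$PGSRC" && gmake clean && gmake PROFILE=-pg all && gmake install-bin)
}

# backends write gmon.out into the per-database subdirectories under base/
gmon_candidates () {
    find "$1" -name gmon.out -type f 2>/dev/null
}

run_gprof () {
    gprof "$PGBIN/postgres" "$1" > resultfile
}

[ -d "$PGSRC" ] && rebuild_profiled
```

After a test run, pick a representative file from gmon_candidates
"$PGDATA/base" and feed it to run_gprof.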
Luis Amigo <lamigo@atc.unican.es> writes:
We're getting similar problem.
We're currently working on TPC-H benchmarking using postgresql 7.2b3.
From one up to 8 paralell conexions (we've got 8 MIPS processors),
MIPS? Which spinlock implementation is getting used? (Look in
src/include/storage/s_lock.h and src/backend/storage/lmgr/s_lock.c)
If you're falling back to the default SysV-semaphore based spinlock
implementation, I wouldn't be surprised to see a performance problem...
regards, tom lane
I haven't actually tried to compile 7.2 from the CVS, but there seems to
be a problem? [maybe on my side]
make[3]: Leaving directory
`/home/ash/ash-server/Work/build/pgsql/src/backend/utils'
gcc -O2 -Wall -Wmissing-prototypes -Wmissing-declarations
-Wl,-rpath,/tmp//lib -export-dynamic access/SUBSYS.o bootstrap/SUBSYS.o
catalog/SUBSYS.o parser/SUBSYS.o commands/SUBSYS.o executor/SUBSYS.o
lib/SUBSYS.o libpq/SUBSYS.o main/SUBSYS.o nodes/SUBSYS.o
optimizer/SUBSYS.o port/SUBSYS.o postmaster/SUBSYS.o regex/SUBSYS.o
rewrite/SUBSYS.o storage/SUBSYS.o tcop/SUBSYS.o utils/SUBSYS.o -lz
-lcrypt -lresolv -lnsl -ldl -lm -lreadline -o postgres
nodes/SUBSYS.o: In function `pprint':
nodes/SUBSYS.o(.text+0xdc95): undefined reference to `MIN'
nodes/SUBSYS.o(.text+0xdcfd): undefined reference to `MIN'
collect2: ld returned 1 exit status
make[2]: *** [postgres] Error 1
make[2]: Leaving directory
`/home/ash/ash-server/Work/build/pgsql/src/backend'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/home/ash/ash-server/Work/build/pgsql/src'
make: *** [all] Error 2
In ./src/backend/nodes/print.c:
/* outdent */
if (indentLev > 0)
{
indentLev--;
indentDist = MIN(indentLev * INDENTSTOP, MAXINDENT);
}
If I add
#ifndef MIN
#define MIN(a,b) (((a)<(b)) ? (a) : (b))
#endif
to print.c it compiles fine.
Ashley Cambrell
<snip>
I just got the same problem on latest CVS on freebsd/i386
gmake[4]: Entering directory `/home/chriskl/pgsql/src/backend/utils/time'
gcc -pipe -O2 -Wall -Wmissing-prototypes -Wmissing-declarations -I../../../.
./src/include -c -o tqual.o tqual.c
/usr/libexec/elf/ld -r -o SUBSYS.o tqual.o
gmake[4]: Leaving directory `/home/chriskl/pgsql/src/backend/utils/time'
/usr/libexec/elf/ld -r -o SUBSYS.o fmgrtab.o adt/SUBSYS.o cache/SUBSYS.o
error/SUBSYS.o fmgr/SUBSYS.o hash/SUBSYS.o init/SUBSYS.o misc/SUBS
YS.o mmgr/SUBSYS.o sort/SUBSYS.o time/SUBSYS.o
gmake[3]: Leaving directory `/home/chriskl/pgsql/src/backend/utils'
gcc -pipe -O2 -Wall -Wmissing-prototypes -Wmissing-declarations -R/home/chr
iskl/local/lib -export-dynamic access/SUBSYS.o bootstrap/SUBSYS
.o catalog/SUBSYS.o parser/SUBSYS.o commands/SUBSYS.o executor/SUBSYS.o
lib/SUBSYS.o libpq/SUBSYS.o main/SUBSYS.o nodes/SUBSYS.o optimizer/
SUBSYS.o port/SUBSYS.o postmaster/SUBSYS.o regex/SUBSYS.o rewrite/SUBSYS.o
storage/SUBSYS.o tcop/SUBSYS.o utils/SUBSYS.o -lz -lcrypt -lcomp
at -lm -lutil -lreadline -o postgres
nodes/SUBSYS.o: In function `pprint':
nodes/SUBSYS.o(.text+0xda71): undefined reference to `MIN'
nodes/SUBSYS.o(.text+0xdade): undefined reference to `MIN'
gmake[2]: *** [postgres] Error 1
gmake[2]: Leaving directory `/home/chriskl/pgsql/src/backend'
gmake[1]: *** [all] Error 2
gmake[1]: Leaving directory `/home/chriskl/pgsql/src'
gmake: *** [all] Error 2
Chris
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Ashley Cambrell
Sent: Thursday, 20 December 2001 8:51 AM
To: Tom Lane
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] 7.2 is slow? [compile problem]
<snip>
OK, I just committed a fix. MIN() was used in the pretty node print
patch; should have been Min().
---------------------------------------------------------------------------
I just got the same problem on latest CVS on freebsd/i386
gmake[4]: Entering directory `/home/chriskl/pgsql/src/backend/utils/time'
gcc -pipe -O2 -Wall -Wmissing-prototypes -Wmissing-declarations -I../../../.
./src/include -c -o tqual.o tqual.c
/usr/libexec/elf/ld -r -o SUBSYS.o tqual.o
gmake[4]: Leaving directory `/home/chriskl/pgsql/src/backend/utils/time'
/usr/libexec/elf/ld -r -o SUBSYS.o fmgrtab.o adt/SUBSYS.o cache/SUBSYS.o
error/SUBSYS.o fmgr/SUBSYS.o hash/SUBSYS.o init/SUBSYS.o misc/SUBS
YS.o mmgr/SUBSYS.o sort/SUBSYS.o time/SUBSYS.o
gmake[3]: Leaving directory `/home/chriskl/pgsql/src/backend/utils'
gcc -pipe -O2 -Wall -Wmissing-prototypes -Wmissing-declarations -R/home/chr
iskl/local/lib -export-dynamic access/SUBSYS.o bootstrap/SUBSYS
.o catalog/SUBSYS.o parser/SUBSYS.o commands/SUBSYS.o executor/SUBSYS.o
lib/SUBSYS.o libpq/SUBSYS.o main/SUBSYS.o nodes/SUBSYS.o optimizer/
SUBSYS.o port/SUBSYS.o postmaster/SUBSYS.o regex/SUBSYS.o rewrite/SUBSYS.o
storage/SUBSYS.o tcop/SUBSYS.o utils/SUBSYS.o -lz -lcrypt -lcomp
at -lm -lutil -lreadline -o postgres
nodes/SUBSYS.o: In function `pprint':
nodes/SUBSYS.o(.text+0xda71): undefined reference to `MIN'
nodes/SUBSYS.o(.text+0xdade): undefined reference to `MIN'
gmake[2]: *** [postgres] Error 1
gmake[2]: Leaving directory `/home/chriskl/pgsql/src/backend'
gmake[1]: *** [all] Error 2
gmake[1]: Leaving directory `/home/chriskl/pgsql/src'
gmake: *** [all] Error 2

Chris
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Ashley Cambrell
Sent: Thursday, 20 December 2001 8:51 AM
To: Tom Lane
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] 7.2 is slow? [compile problem]

I haven't actually tried to compile 7.2 from the CVS, but there seems to
be a problem? [maybe on my side]

make[3]: Leaving directory
`/home/ash/ash-server/Work/build/pgsql/src/backend/utils'
gcc -O2 -Wall -Wmissing-prototypes -Wmissing-declarations
-Wl,-rpath,/tmp//lib -export-dynamic access/SUBSYS.o bootstrap/SUBSYS.o
catalog/SUBSYS.o parser/SUBSYS.o commands/SUBSYS.o executor/SUBSYS.o
lib/SUBSYS.o libpq/SUBSYS.o main/SUBSYS.o nodes/SUBSYS.o
optimizer/SUBSYS.o port/SUBSYS.o postmaster/SUBSYS.o regex/SUBSYS.o
rewrite/SUBSYS.o storage/SUBSYS.o tcop/SUBSYS.o utils/SUBSYS.o -lz
-lcrypt -lresolv -lnsl -ldl -lm -lreadline -o postgres
nodes/SUBSYS.o: In function `pprint':
nodes/SUBSYS.o(.text+0xdc95): undefined reference to `MIN'
nodes/SUBSYS.o(.text+0xdcfd): undefined reference to `MIN'
collect2: ld returned 1 exit status
make[2]: *** [postgres] Error 1
make[2]: Leaving directory
`/home/ash/ash-server/Work/build/pgsql/src/backend'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/home/ash/ash-server/Work/build/pgsql/src'
make: *** [all] Error 2

In ./src/backend/nodes/print.c:
	/* outdent */
	if (indentLev > 0)
	{
		indentLev--;
		indentDist = MIN(indentLev * INDENTSTOP, MAXINDENT);
	}

If I add
#ifndef MIN
#define MIN(a,b) (((a)<(b)) ? (a) : (b))
#endif
to print.c it compiles fine.

Ashley Cambrell
<snip>
That's annoying. The LWLock changes were intended to solve the
inefficiency with multiple CPUs, but it seems like we still have a
problem somewhere.

Could you recompile the backend with profiling enabled and try to get
a profile from your test case? To build a profilable backend, it's
sufficient to do

    cd .../src/backend
    gmake clean
    gmake PROFILE=-pg all
    gmake install-bin

(assuming you are using gcc). Then restart the postmaster, and you
should notice "gmon.out" files being dropped into the various database
subdirectories anytime a backend exits. Next run your test case,
and as soon as it finishes copy the gmon.out file to a safe place.
(You'll only be able to get the profile from the last process to exit,
so try to make sure that this is representative. Might be worth
repeating the test a few times to make sure that the results don't
vary a whole lot.)  Finally, do

    gprof .../bin/postgres gmon.out >resultfile
to produce a legible result.
Oh, one more thing: on Linuxen you are likely to find that all the
reported routine runtimes are zero, rendering the results useless.
Apply the attached patch (for 7.2beta) to fix this.

			regards, tom lane
*** src/backend/postmaster/postmaster.c.orig	Wed Dec 12 14:52:03 2001
--- src/backend/postmaster/postmaster.c	Mon Dec 17 19:38:29 2001
***************
*** 1823,1828 ****
--- 1823,1829 ----
  {
  	Backend    *bn;				/* for backend cleanup */
  	pid_t		pid;
+ 	struct itimerval svitimer;
  
  	/*
  	 * Compute the cancel key that will be assigned to this backend. The
***************
*** 1858,1869 ****
--- 1859,1874 ----
  	beos_before_backend_startup();
  #endif
  
+ 	getitimer(ITIMER_PROF, &svitimer);
+ 
  	pid = fork();
  
  	if (pid == 0)				/* child */
  	{
  		int			status;
  
+ 		setitimer(ITIMER_PROF, &svitimer, NULL);
+ 
  		free(bn);
  #ifdef __BEOS__
  		/* Specific beos backend startup actions */
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
OK, I just committed a fix. MIN() was used in the pretty node print
patch; should have been Min().
Mea maxima (or MINima?) culpa. Thanks ...
regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
OK, I just committed a fix. MIN() was used in the pretty node print
patch; should have been Min().

Mea maxima (or MINima?) culpa. Thanks ...
I also noticed we define our own MAX/MIN in adt/numeric.c and a few
other files. I put it on my list to research that in 7.3 and maybe use
the c.h versions.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Hi Tom!
Well, here's the profile, but yes, almost all the times are zero. Both without
the patch, and with it. Did I miss something? (Yes, I did make and install
afterwards ;-)
I made a new database, so as it was growing, the times were even slower than
before. (It would be nice, if I could create a large database from the
beginning.;)
Is this of any help? Do you need the same result with fewer CPUs?
Jussi
Tom Lane wrote:
Jussi Mikkola <jussi.mikkola@bonware.com> writes:
Yes, now I've tested with 7.2b4. The result is about the same as with 7.1.
About 200 messages with four processors and about 600 messages with one
processor.

That's annoying. The LWLock changes were intended to solve the
inefficiency with multiple CPUs, but it seems like we still have a
problem somewhere.

Could you recompile the backend with profiling enabled and try to get
a profile from your test case? To build a profilable backend, it's
sufficient to do

    cd .../src/backend
    gmake clean
    gmake PROFILE=-pg all
    gmake install-bin

(assuming you are using gcc). Then restart the postmaster, and you
should notice "gmon.out" files being dropped into the various database
subdirectories anytime a backend exits. Next run your test case,
and as soon as it finishes copy the gmon.out file to a safe place.
(You'll only be able to get the profile from the last process to exit,
so try to make sure that this is representative. Might be worth
repeating the test a few times to make sure that the results don't
vary a whole lot.)  Finally, do

    gprof .../bin/postgres gmon.out >resultfile
to produce a legible result.
Oh, one more thing: on Linuxen you are likely to find that all the
reported routine runtimes are zero, rendering the results useless.
Apply the attached patch (for 7.2beta) to fix this.

			regards, tom lane
*** src/backend/postmaster/postmaster.c.orig	Wed Dec 12 14:52:03 2001
--- src/backend/postmaster/postmaster.c	Mon Dec 17 19:38:29 2001
***************
*** 1823,1828 ****
--- 1823,1829 ----
  {
  	Backend    *bn;				/* for backend cleanup */
  	pid_t		pid;
+ 	struct itimerval svitimer;
  
  	/*
  	 * Compute the cancel key that will be assigned to this backend. The
***************
*** 1858,1869 ****
--- 1859,1874 ----
  	beos_before_backend_startup();
  #endif
  
+ 	getitimer(ITIMER_PROF, &svitimer);
+ 
  	pid = fork();
  
  	if (pid == 0)				/* child */
  	{
  		int			status;
  
+ 	
	setitimer(ITIMER_PROF, &svitimer, NULL);
+ 
  		free(bn);
  #ifdef __BEOS__
  		/* Specific beos backend startup actions */
--
Jussi Mikkola Project Manager
Bonware Oy gsm +358 40 830 7561
Tekniikantie 12 tel +358 9 2517 5570
02150 Espoo fax +358 9 2517 5571
Finland www.bonware.com
Jussi Mikkola <jussi.mikkola@bonware.com> writes:
Well, here's the profile, but yes, almost all the times are zero.
It looks to me like this profile only covers the postmaster, not a
backend. You want to use gmon.out from down inside the database's
subdirectory ($PGDATA/base/something/gmon.out).
Both without the patch, and with it. Did I miss something? (Yes, I did
make and install afterwards ;-)
[ scratches head ] Dunno. You did restart the postmaster after
installing the new executable, right?
regards, tom lane
Tom Lane wrote:
Hannu Krosing <hannu@tm.ee> writes:
./pgbench -i -s 10 -p 5433
./pgbench -p 5433 -c 1   -t 100    171/177    162/166    169/173
./pgbench -p 5433 -c 5   -t 100    140/143    191/196    202/207
./pgbench -p 5433 -c 10  -t 100    132/135    165/168    159/163
./pgbench -p 5433 -c 25  -t 100     65/ 66     60/ 60     75/ 76
./pgbench -p 5433 -c 50  -t 100     60/ 61     43/ 43     55/ 59
./pgbench -p 5433 -c 100 -t 100     48/ 48     23/ 23     34/ 34

You realize, of course, that when the number of clients exceeds the
scale factor you're not really measuring anything except update
contention on the "branch" rows? Every transaction tries to update
the balance for its branch, so if you have more clients than branches
then there will be lots of transactions blocked waiting for someone
else to commit. With a 10:1 ratio, there will be several transactions
blocked waiting for *each* active transaction; and when that guy
commits, all the others will waken simultaneously and contend for the
chance to update the branch row. One will win, the others will go
back to sleep, having done nothing except wasting CPU time. Thus a
severe falloff in measured TPS is inevitable when -c >> -s. I don't
think this scenario has all that much to do with real-world loads,
however.
I did some benchmarking and the interesting part is that 7.2b4 is up to
2.5X faster than 7.1.3 for _small_ scale factors and up to 25% slower
when there is no contention (-s128, clients <= 128).
Perhaps waiting on the lock somehow organizes things to happen in some
order that avoids some stupidity in some other locking logic?
I ran the benchmark (with added vacuum full for 7.2b4) on a dual PIII 800MHz
with 1 GB of RAM and an IDE disk. The results are the mean of six runs
with the two slowest removed (there was other activity going on sometimes);
they are for scale factors 1, 10 and 128.
in order to measure real performance of roughly the _same_ dataset each
test run did the same total number of transactions 512 with each client
doing 512/nr_of_trx.
Hannu Krosing <hannu@tm.ee> writes:
in order to measure real performance of roughly the _same_ dataset each
test run did the same total number of transactions 512 with each client
doing 512/nr_of_trx.
That means you're only measuring a few transactions per backend (as few
as 4, near the upper end of the scale). I think the results may say
more about backend-startup transients than true peak throughput.
Could you try it again with a run about ten times that long?
regards, tom lane
Tom Lane wrote:
Hannu Krosing <hannu@tm.ee> writes:
in order to measure real performance of roughly the _same_ dataset each
test run did the same total number of transactions 512 with each client
doing 512/nr_of_trx.

That means you're only measuring a few transactions per backend (as few
as 4, near the upper end of the scale). I think the results may say
more about backend-startup transients than true peak throughput.
Could you try it again with a run about ten times that long?
I did run 4096trx on 7.2b4 with -s 1, best 4-of-6
clients    512trx    4096trx    ratio
1 180.59 90.15 2.00
2 221.52 80.92 2.74
4 203.72 75.60 2.69
8 179.54 69.29 2.59
16 156.68 63.15 2.48
32 123.48 57.73 2.14
64 89.99 54.14 1.66
128 61.84 48.97 1.26
so it seems that a large number of transactions degrades tps
performance faster than connection setup overhead.

I'll try running the whole suite again with a higher number of
transactions in a few days.
------------------
Hannu
Jussi Mikkola wrote:
Hi Tom!
Well, here's the profile, but yes, almost all the times are zero. Both without
the patch, and with it. Did I miss something? (Yes, I did make and install
afterwards ;-)

I made a new database, so as it was growing, the times were even slower than
before. (It would be nice, if I could create a large database from the
beginning. ;)

Is this of any help? Do you need the same result with fewer CPUs?
Jussi
Tom Lane wrote:
Jussi Mikkola <jussi.mikkola@bonware.com> writes:
Yes, now I've tested with 7.2b4. The result is about the same as with 7.1.
About 200 messages with four processors and about 600 messages with one
processor.
Was this solved with the latest LWLock patches?
--------------
Hannu