Re: postgresql and process titles
The only way that I'm aware of for disabling this is at compile time ...
after running configure, you want to modify:
src/include/pg_config.h
and undef HAVE_SETPROCTITLE ...
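Concretely, the post-configure edit being described would look something like this (the exact surrounding comment text in the generated header may differ):

```c
/* src/include/pg_config.h -- generated by configure */

/* was: #define HAVE_SETPROCTITLE 1 */
#undef HAVE_SETPROCTITLE   /* ps_status.c then falls back to another method */
```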
I'm CC'ing -hackers about this though, since I think you are the first to
point to setproctitle() as being a serious performance bottleneck ...
On Sun, 11 Jun 2006, Kris Kennaway wrote:
Why does postgresql change its process title so frequently and how can
this be disabled? Profiling suggests it's a fairly serious
performance bottleneck.

Kris
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org MSN . scrappy@hub.org
Yahoo . yscrappy Skype: hub.org ICQ . 7615664
On Sun, 11 Jun 2006, Kris Kennaway wrote:
Why does postgresql change its process title so frequently and how can
this be disabled? Profiling suggests it's a fairly serious
performance bottleneck.
Let's see the evidence.
regards, tom lane
On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
On Sun, 11 Jun 2006, Kris Kennaway wrote:
Why does postgresql change its process title so frequently and how can
this be disabled? Profiling suggests it's a fairly serious
performance bottleneck.

Let's see the evidence.
The calls to setproctitle() (it looks like 4 setproctitle syscalls per
DB query) are causing contention on the Giant lock 25% of the time on
a dual p4 + HTT. Disabling process title setting completely gives an
8% peak performance boost to the super-smack select benchmark.
It's actually worse than that in stock FreeBSD because in certain
"mixed" workloads with other sources of Giant contention present they
will contend with each other to make overall machine performance much
worse; that's the "serious" aspect, although 8% performance loss just
in case someone should ever want to run 'ps' to check the state of the
database processes is nothing to laugh at either.
Kris
Kris Kennaway <kris@obsecurity.org> writes:
On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
Let's see the evidence.
The calls to setproctitle() (it looks like 4 setproctitle syscalls per
DB query) are causing contention on the Giant lock 25% of the time on
a dual p4 + HTT. Disabling process title setting completely gives an
8% peak performance boost to the super-smack select benchmark.
I think you misunderstood me: I asked for evidence, not interpretation.
What are you measuring, and with what tool, and what are the numbers?
On what benchmark case? And what did you do to "disable process title
setting completely"?
The reason I'm being doubting Thomas here is that I've never seen any
indication on any other platform that ps_status is a major bottleneck.
Now maybe FreeBSD really sucks, or maybe you're onto something of
interest, but let's see the proof in a form that someone else can
check and reproduce.
regards, tom lane
On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote:
Kris Kennaway <kris@obsecurity.org> writes:
On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
Let's see the evidence.
The calls to setproctitle() (it looks like 4 setproctitle syscalls per
DB query) are causing contention on the Giant lock 25% of the time on
a dual p4 + HTT. Disabling process title setting completely gives an
8% peak performance boost to the super-smack select benchmark.

I think you misunderstood me: I asked for evidence, not interpretation.
What are you measuring, and with what tool, and what are the numbers?
On what benchmark case?
As I said, I'm using the super-smack select benchmark; presumably you
are aware of it.
And what did you do to "disable process title setting completely"?
Added this to ps_status.c:
#undef PS_USE_SETPROCTITLE
#undef PS_USE_PSTAT
#undef PS_USE_PS_STRINGS
#undef PS_USE_CHANGE_ARGV
#undef PS_USE_CLOBBER_ARGV
#define PS_USE_NONE
Here are the queries/second data, before and after:
x pg
+ pg-noproctitle
+------------------------------------------------------------+
|x xx + |
|x xx + + |
|xx xxx + + + ++++ |
| |AM| |____AM___||
+------------------------------------------------------------+
N Min Max Median Avg Stddev
x 11 3399.29 3425.1 3413.93 3411.8418 9.4287675
+ 10 3615.63 3699.32 3685.515 3679.344 24.894177
Difference at 95.0% confidence
267.502 +/- 16.871
7.8404% +/- 0.494483%
(Student's t, pooled s = 18.4484)
Kris
On Sun, Jun 11, 2006 at 10:07:13PM -0500, Jim C. Nasby wrote:
On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote:
Kris Kennaway <kris@obsecurity.org> writes:
On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
Let's see the evidence.
The calls to setproctitle() (it looks like 4 setproctitle syscalls per
DB query) are causing contention on the Giant lock 25% of the time on
a dual p4 + HTT. Disabling process title setting completely gives an
8% peak performance boost to the super-smack select benchmark.

I think you misunderstood me: I asked for evidence, not interpretation.
What are you measuring, and with what tool, and what are the numbers?
On what benchmark case? And what did you do to "disable process title
setting completely"?

The reason I'm being doubting Thomas here is that I've never seen any
Ba-da-bum!
indication on any other platform that ps_status is a major bottleneck.
Now maybe FreeBSD really sucks, or maybe you're onto something of
interest, but let's see the proof in a form that someone else can
check and reproduce.

It's also important to find out what version of FreeBSD this is. A lot
of things have been pulled out of GIANT in 5.x and 6.x, so it's entirely
possible this isn't an issue in newer versions.
It's still true in 6.x and 7.x. I have a patch that removes Giant
from the sysctl in question, and I have also removed it from another
relevant part of the kernel (semop() is bogusly acquiring Giant when
it is supposed to be mpsafe).
When it's possible to commit that patch (should be in time for 7.0,
but not sure if it will make it into 6.2) it will eliminate the worst
part of the problem, but it still leaves postgresql making thousands
of syscalls per second to flip its process titles back and forth,
which needs to be looked at carefully for a performance impact. For
now, users of FreeBSD who want that 8% should turn it off though (or
maybe one of the alternative methods is usable).
BTW, another promising performance/scalability change on BSD systems
would be to convert pgsql to use kqueue instead of select, since mutex
profiling traces show a lot of contention on the select lock in
FreeBSD.
FYI, the biggest source of contention is via semop() - it might be
possible to optimize that some more in FreeBSD, I don't know.
Kris
On Mon, Jun 12, 2006 at 12:24:36AM -0400, Kris Kennaway wrote:
On Sun, Jun 11, 2006 at 10:07:13PM -0500, Jim C. Nasby wrote:
On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote:
Kris Kennaway <kris@obsecurity.org> writes:
On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
Let's see the evidence.
The calls to setproctitle() (it looks like 4 setproctitle syscalls per
DB query) are causing contention on the Giant lock 25% of the time on
a dual p4 + HTT. Disabling process title setting completely gives an
8% peak performance boost to the super-smack select benchmark.

I think you misunderstood me: I asked for evidence, not interpretation.
What are you measuring, and with what tool, and what are the numbers?
On what benchmark case? And what did you do to "disable process title
setting completely"?

The reason I'm being doubting Thomas here is that I've never seen any
Ba-da-bum!
indication on any other platform that ps_status is a major bottleneck.
Now maybe FreeBSD really sucks, or maybe you're onto something of
interest, but let's see the proof in a form that someone else can
check and reproduce.

It's also important to find out what version of FreeBSD this is. A lot
of things have been pulled out of GIANT in 5.x and 6.x, so it's entirely
possible this isn't an issue in newer versions.
Can you provide the actual commands you used to setup and run the test?
This would allow others to duplicate your results and debug the
situation on their own. This is also important because we've generally
found HTT to be a loss (except on Windows), so it'd be good to see what
impact this has on AMD hardware. It would also be very useful to have
the raw test result numbers you obtained.
It's still true in 6.x and 7.x. I have a patch that removes Giant
from the sysctl in question, and I have also removed it from another
relevant part of the kernel (semop() is bogusly acquiring Giant when
it is supposed to be mpsafe).
What effect did that patch have on the numbers? And where is it, in case
anyone here wants to try it?
When it's possible to commit that patch (should be in time for 7.0,
but not sure if it will make it into 6.2) it will eliminate the worst
part of the problem, but it still leaves postgresql making thousands
of syscalls per second to flip its process titles back and forth,
which needs to be looked at carefully for a performance impact. For
now, users of FreeBSD who want that 8% should turn it off though (or
maybe one of the alternative methods is usable).
We have a similar issue internally with stats_command_string. The issue
is that it's very helpful to be able to see what a 'long running' (more
than a second or so) statement is doing, but there's very little value
in doing all that work for a statement that's only going to run for a
few ms. If there's a very cheap way to set some kind of a timer that
would update this information once a statement's been around long enough
that might be a way to handle this (I don't know if we're already using
ALRM in the backend, or if that's a cheap enough solution).
FYI, the biggest source of contention is via semop() - it might be
possible to optimize that some more in FreeBSD, I don't know.
Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot
of procs in either semwait or semlock. :(
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Mon, Jun 12, 2006 at 10:08:22AM -0500, Jim C. Nasby wrote:
On Mon, Jun 12, 2006 at 12:24:36AM -0400, Kris Kennaway wrote:
On Sun, Jun 11, 2006 at 10:07:13PM -0500, Jim C. Nasby wrote:
On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote:
Kris Kennaway <kris@obsecurity.org> writes:
On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
Let's see the evidence.
The calls to setproctitle() (it looks like 4 setproctitle syscalls per
DB query) are causing contention on the Giant lock 25% of the time on
a dual p4 + HTT. Disabling process title setting completely gives an
8% peak performance boost to the super-smack select benchmark.

I think you misunderstood me: I asked for evidence, not interpretation.
What are you measuring, and with what tool, and what are the numbers?
On what benchmark case? And what did you do to "disable process title
setting completely"?

The reason I'm being doubting Thomas here is that I've never seen any
Ba-da-bum!
indication on any other platform that ps_status is a major bottleneck.
Now maybe FreeBSD really sucks, or maybe you're onto something of
interest, but let's see the proof in a form that someone else can
check and reproduce.

It's also important to find out what version of FreeBSD this is. A lot
of things have been pulled out of GIANT in 5.x and 6.x, so it's entirely
possible this isn't an issue in newer versions.

Can you provide the actual commands you used to setup and run the test?
I actually forget all the steps I needed to do to get super-smack
working with postgresql since it required a lot of trial and error for
a database newbie like me (compiling it from the
benchmarks/super-smack port was trivial, but unlike mysql it required
configuring the database by hand - this should hopefully be more
obvious to someone familiar with pgsql though).
It would be great if someone on your end could make this easier, BTW -
e.g. at least document the steps. Also super-smack should be changed
to allow use via a local socket with pgsql (this is the default with
mysql) - this avoids measuring network stack overhead.
The only thing I had to change on FreeBSD was to edit the
select-key.smack and change "localhost" to "127.0.0.1" in two
locations to avoid possibly using IPv6 transport.
This would allow others to duplicate your results and debug the
situation on their own. This is also important because we've generally
found HTT to be a loss (except on Windows), so it'd be good to see what
impact this has on AMD hardware. It would also be very useful to have
the raw test result numbers you obtained.
They were posted previously. This is Intel hardware (AMD doesn't do
HTT), and I didn't retest without HTT. I'll try to do so if I have
the time (however previously when profiling mysql, HTT did give a small
positive change).
It's still true in 6.x and 7.x. I have a patch that removes Giant
from the sysctl in question, and I have also removed it from another
relevant part of the kernel (semop() is bogusly acquiring Giant when
it is supposed to be mpsafe).

What effect did that patch have on the numbers? And where is it, in case
anyone here wants to try it?
I didn't yet retest with the patch. It's in my perforce branch:
http://perforce.freebsd.org/changeList.cgi?FSPC=//depot/user/kris/contention/...
although you probably need a combination of the changes in order for
it to be usable.
When it's possible to commit that patch (should be in time for 7.0,
but not sure if it will make it into 6.2) it will eliminate the worst
part of the problem, but it still leaves postgresql making thousands
of syscalls per second to flip its process titles back and forth,
which needs to be looked at carefully for a performance impact. For
now, users of FreeBSD who want that 8% should turn it off though (or
maybe one of the alternative methods is usable).

We have a similar issue internally with stats_command_string. The issue
is that it's very helpful to be able to see what a 'long running' (more
than a second or so) statement is doing, but there's very little value
in doing all that work for a statement that's only going to run for a
few ms. If there's a very cheap way to set some kind of a timer that
would update this information once a statement's been around long enough
that might be a way to handle this (I don't know if we're already using
ALRM in the backend, or if that's a cheap enough solution).
I don't know what the best way to implement it would be, but limiting
the frequency of these updates does seem to be the way to go.
FYI, the biggest source of contention is via semop() - it might be
possible to optimize that some more in FreeBSD, I don't know.

Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot
of procs in either semwait or semlock. :(
Part of that is Giant contention again; for example on 6.x semop() and
setproctitle() both want to acquire it, so they'll fight with each
other and with anything else on the system that wants Giant
(e.g. IPv6, or the USB stack, etc). As I mentioned Giant is not an
issue here going forward, but there is still as much lock contention
just between semop() calls running on different CPUs. It may be
possible for someone to implement more fine-grained locking here, but
I don't know if there is available interest.
Kris
On Mon, Jun 12, 2006 at 11:38:01AM -0400, Kris Kennaway wrote:
On Mon, Jun 12, 2006 at 10:08:22AM -0500, Jim C. Nasby wrote:
On Mon, Jun 12, 2006 at 12:24:36AM -0400, Kris Kennaway wrote:
On Sun, Jun 11, 2006 at 10:07:13PM -0500, Jim C. Nasby wrote:
On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote:
Kris Kennaway <kris@obsecurity.org> writes:
On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
Let's see the evidence.
The calls to setproctitle() (it looks like 4 setproctitle syscalls per
DB query) are causing contention on the Giant lock 25% of the time on
a dual p4 + HTT. Disabling process title setting completely gives an
8% peak performance boost to the super-smack select benchmark.

I think you misunderstood me: I asked for evidence, not interpretation.
What are you measuring, and with what tool, and what are the numbers?
On what benchmark case? And what did you do to "disable process title
setting completely"?

The reason I'm being doubting Thomas here is that I've never seen any
Ba-da-bum!
indication on any other platform that ps_status is a major bottleneck.
Now maybe FreeBSD really sucks, or maybe you're onto something of
interest, but let's see the proof in a form that someone else can
check and reproduce.

It's also important to find out what version of FreeBSD this is. A lot
of things have been pulled out of GIANT in 5.x and 6.x, so it's entirely
possible this isn't an issue in newer versions.

Can you provide the actual commands you used to setup and run the test?
I actually forget all the steps I needed to do to get super-smack
working with postgresql since it required a lot of trial and error for
a database newbie like me (compiling it from the
benchmarks/super-smack port was trivial, but unlike mysql it required
configuring the database by hand - this should hopefully be more
obvious to someone familiar with pgsql though).

It would be great if someone on your end could make this easier, BTW -
e.g. at least document the steps. Also super-smack should be changed
to allow use via a local socket with pgsql (this is the default with
mysql) - this avoids measuring network stack overhead.
Unless supersmack has improved substantially, you're unlikely to find
much interest. Last I heard it was a pretty brain-dead benchmark. DBT2/3
(http://sourceforge.net/projects/osdldbt) is much more realistic (based
on TPC-C and TPC-H).
FYI, the biggest source of contention is via semop() - it might be
possible to optimize that some more in FreeBSD, I don't know.

Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot
of procs in either semwait or semlock. :(

Part of that is Giant contention again; for example on 6.x semop() and
setproctitle() both want to acquire it, so they'll fight with each
other and with anything else on the system that wants Giant
(e.g. IPv6, or the USB stack, etc). As I mentioned Giant is not an
issue here going forward, but there is still as much lock contention
just between semop() calls running on different CPUs. It may be
possible for someone to implement more fine-grained locking here, but
I don't know if there is available interest.
FWIW, turning off stats_command_string substantially reduced
this contention, so it appears the issue is somewhere in the stats code.
This code sends stats messages to a different process via a socket (or
is it UDP?), with the intention that if the system gets heavily loaded
we'll lose some stats in the interest of not bogging down all the
backends. It seems that doesn't work so hot on FreeBSD. :(
On Tue, Jun 13, 2006 at 12:29:14PM -0500, Jim C. Nasby wrote:
Can you provide the actual commands you used to setup and run the test?
I actually forget all the steps I needed to do to get super-smack
working with postgresql since it required a lot of trial and error for
a database newbie like me (compiling it from the
benchmarks/super-smack port was trivial, but unlike mysql it required
configuring the database by hand - this should hopefully be more
obvious to someone familiar with pgsql though).

It would be great if someone on your end could make this easier, BTW -
e.g. at least document the steps. Also super-smack should be changed
to allow use via a local socket with pgsql (this is the default with
mysql) - this avoids measuring network stack overhead.

Unless supersmack has improved substantially, you're unlikely to find
much interest. Last I heard it was a pretty brain-dead benchmark. DBT2/3
(http://sourceforge.net/projects/osdldbt) is much more realistic (based
on TPC-C and TPC-H).
Thanks for the reference, I'll check it out. Brain-dead or not though
(and I've also seen evidence that it is, e.g. it does a hell of a lot
of 1-byte reads), it's a well known benchmark that seems to be fairly
widely used, so there's still some value in making it perform well
(but not at the expense of other things). However my interest is not
so much for measuring database performance as for measuring kernel
performance and trying to optimize bottlenecks.
FYI, the biggest source of contention is via semop() - it might be
possible to optimize that some more in FreeBSD, I don't know.

Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot
of procs in either semwait or semlock. :(

Part of that is Giant contention again; for example on 6.x semop() and
setproctitle() both want to acquire it, so they'll fight with each
other and with anything else on the system that wants Giant
(e.g. IPv6, or the USB stack, etc). As I mentioned Giant is not an
issue here going forward, but there is still as much lock contention
just between semop() calls running on different CPUs. It may be
possible for someone to implement more fine-grained locking here, but
I don't know if there is available interest.

FWIW, turning off stats_command_string substantially reduced
this contention, so it appears the issue is somewhere in the stats code.
This code sends stats messages to a different process via a socket (or
is it UDP?), with the intention that if the system gets heavily loaded
we'll lose some stats in the interest of not bogging down all the
backends. It seems that doesn't work so hot on FreeBSD. :(
It could be it's making assumptions about what is cheap and what is
not, which are not true universally. e.g. you might be adding a new
contention point that is even worse. I didn't notice a change in
behaviour as I varied the load, so perhaps it was not visible on this
benchmark or I wasn't looking in the right place.
Kris
Jim C. Nasby wrote:
FWIW, turning off stats_command_string substantially reduced
this contention, so it appears the issue is somewhere in the stats code.
This code sends stats messages to a different process via a socket (or
is it UDP?), with the intention that if the system gets heavily loaded
we'll lose some stats in the interest of not bogging down all the
backends. It seems that doesn't work so hot on FreeBSD. :(
I am working on a patch for 8.2 to fix that for all platforms.
--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Tue, Jun 13, 2006 at 12:29:14PM -0500, Jim C. Nasby wrote:
Unless supersmack has improved substantially, you're unlikely to find
much interest. Last I heard it was a pretty brain-dead benchmark. DBT2/3
(http://sourceforge.net/projects/osdldbt) is much more realistic (based
on TPC-C and TPC-H).
Have you tried to compile this on FreeBSD? It looks like it (dbt1 at
least) will need a moderate amount of hacking - there are some Linux
assumptions in the source and the configure script makes assumptions
about where things are installed that cannot be overridden on the
commandline.
Kris
On Tue, Jun 13, 2006 at 02:10:15PM -0400, Bruce Momjian wrote:
Jim C. Nasby wrote:
FWIW, turning off stats_command_string substantially reduced
this contention, so it appears the issue is somewhere in the stats code.
This code sends stats messages to a different process via a socket (or
is it UDP?), with the intention that if the system gets heavily loaded
we'll lose some stats in the interest of not bogging down all the
backends. It seems that doesn't work so hot on FreeBSD. :(

I am working on a patch for 8.2 to fix that for all platforms.
Excellent. Did I miss discussion of that or have you not mentioned it?
I'm curious as to how you're fixing it...
Jim C. Nasby wrote:
On Tue, Jun 13, 2006 at 02:10:15PM -0400, Bruce Momjian wrote:
Jim C. Nasby wrote:
FWIW, turning off stats_command_string substantially reduced
this contention, so it appears the issue is somewhere in the stats code.
This code sends stats messages to a different process via a socket (or
is it UDP?), with the intention that if the system gets heavily loaded
we'll lose some stats in the interest of not bogging down all the
backends. It seems that doesn't work so hot on FreeBSD. :(

I am working on a patch for 8.2 to fix that for all platforms.
Excellent. Did I miss discussion of that or have you not mentioned it?
I'm curious as to how you're fixing it...
The patches are in the hold queue:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold
Title is "Stats collector performance improvement". I need someone to
test my patches on a non-BSD platform to move forward.
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Jim C. Nasby wrote:
Excellent. Did I miss discussion of that or have you not mentioned it?
I'm curious as to how you're fixing it...
The patches are in the hold queue:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold
That's talking about the stats code, which has approximately zip to do
with setproctitle (ps_status.c). But IIRC that patch is on hold because
nobody particularly liked the approach it's taking. I think we should
investigate rewriting the stats communication architecture entirely ---
in particular, do we really need the stats buffer process at all? It'd
be interesting to see what happens if we just make the collector process
read the UDP socket directly. Or alternatively drop the UDP socket in
favor of having the backends write directly to the collector process'
input pipe (not sure if this would port to Windows though).
As far as Kris' complaint goes, one thing that might be interesting is
to delay both the setproctitle call and the stats command message send
until the current command has been running a little while (say 100ms
or so). The main objection I see to this is that it replaces a kernel
call that actually does some work with a kernel call to start a timer,
plus possibly a later kernel call to actually do the work. Not clear
that there's a win there. (If you're using statement_timeout it might
not matter, but if you aren't...)
Also not clear how you make the necessary actions happen when the timer
expires --- I seriously doubt it'd be safe to try to do either action
directly in a signal handler.
regards, tom lane
On Tue, Jun 13, 2006 at 04:35:24PM -0400, Tom Lane wrote:
... The main objection I see to this is that it replaces a kernel
call that actually does some work with a kernel call to start a timer,
plus possibly a later kernel call to actually do the work. Not clear
that there's a win there.
And of course it's an almost guaranteed loss on systems that don't
require a syscall to set the proc title.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
On Jun 12, 2006, at 10:38 AM, Kris Kennaway wrote:
FYI, the biggest source of contention is via semop() - it might be
possible to optimize that some more in FreeBSD, I don't know.

Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot
of procs in either semwait or semlock. :(

Part of that is Giant contention again; for example on 6.x semop() and
setproctitle() both want to acquire it, so they'll fight with each
other and with anything else on the system that wants Giant
(e.g. IPv6, or the USB stack, etc). As I mentioned Giant is not an
issue here going forward, but there is still as much lock contention
just between semop() calls running on different CPUs. It may be
possible for someone to implement more fine-grained locking here, but
I don't know if there is available interest.
BTW, there's another FBSD performance oddity I've run across. Running
pg_dump -t email_contrib -COx stats | bzip2 > ec.sql.bz2 &
which dumps the email_contrib table to bzip2 then to disk, the OS
won't use more than 1 CPU on an SMP system... unless the data is
cached. According to both gstat and systat -v, the system isn't I/O
bound; both are reporting the RAID10 with that table on it as only
about 10% busy. If I let that command run for a bit then cancel it
and re-start it so that the beginning of that table is in cache, it
will use one entire CPU for bzip2, which is what I'd expect to happen.
That's talking about the stats code, which has approximately
zip to do with setproctitle (ps_status.c). But IIRC that
patch is on hold because nobody particularly liked the
approach it's taking. I think we should investigate
rewriting the stats communication architecture entirely ---
in particular, do we really need the stats buffer process at
all? It'd be interesting to see what happens if we just make
the collector process read the UDP socket directly. Or
alternatively drop the UDP socket in favor of having the
backends write directly to the collector process'
input pipe (not sure if this would port to Windows though).
(Yes, I remember saying I was planning to look at this. As is probably
obvious by now, I haven't had the time to do that (yet)).
As for your question, it will be a bit painful to port to windows. We
did have a lot of problems with the pgstat pipe in the initial porting
work, and I'm not convinced that there aren't some small issues still
lurking there under heavy load. The point is that the whole concept of
sharing socket descriptors doesn't really play well between processes on
Windows.
Using UDP would make that a whole lot better. Without knowing anything,
I would assume the overhead of a localhost UDP packet isn't very large
on a reasonably modern platform.
//Magnus
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Jim C. Nasby wrote:
Excellent. Did I miss discussion of that or have you not mentioned it?
I'm curious as to how you're fixing it...

The patches are in the hold queue:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold

That's talking about the stats code, which has approximately zip to do
with setproctitle (ps_status.c). But IIRC that patch is on hold because
I thought the bug reporter was asking about the stats code as well.
nobody particularly liked the approach it's taking. I think we should
investigate rewriting the stats communication architecture entirely ---
in particular, do we really need the stats buffer process at all? It'd
be interesting to see what happens if we just make the collector process
read the UDP socket directly. Or alternatively drop the UDP socket in
Agreed, that's what I would prefer, and tested something like that, but
even pulling the packet into the buffer and throwing them away had
significant overhead, so I think the timeout trick has to be employed as
well as going to a single process.
favor of having the backends write directly to the collector process'
input pipe (not sure if this would port to Windows though).

As far as Kris' complaint goes, one thing that might be interesting is
to delay both the setproctitle call and the stats command message send
until the current command has been running a little while (say 100ms
or so). The main objection I see to this is that it replaces a kernel
call that actually does some work with a kernel call to start a timer,
plus possibly a later kernel call to actually do the work. Not clear
that there's a win there. (If you're using statement_timeout it might
not matter, but if you aren't...)

Also not clear how you make the necessary actions happen when the timer
expires --- I seriously doubt it'd be safe to try to do either action
directly in a signal handler.
Right. What if the postmaster signals the backend once a second to do
their reporting. I am not sure what the final solution will be, but we
_need_ one based on the performance numbers I and others have seen.
Could we have PGPROC have a reporting boolean that is set every second
and somehow checked by each backend?
On Tue, Jun 13, 2006 at 05:05:31PM -0400, Bruce Momjian wrote:
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Jim C. Nasby wrote:
Excellent. Did I miss discussion of that or have you not mentioned it?
I'm curious as to how you're fixing it...

The patches are in the hold queue:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold

That's talking about the stats code, which has approximately zip to do
with setproctitle (ps_status.c). But IIRC that patch is on hold because

I thought the bug reporter was asking about the stats code as well.
It did get brought up...
As far as Kris' complaint goes, one thing that might be interesting is
to delay both the setproctitle call and the stats command message send
until the current command has been running a little while (say 100ms
or so). The main objection I see to this is that it replaces a kernel
call that actually does some work with a kernel call to start a timer,
plus possibly a later kernel call to actually do the work. Not clear
that there's a win there. (If you're using statement_timeout it might
not matter, but if you aren't...)

Also not clear how you make the necessary actions happen when the timer
expires --- I seriously doubt it'd be safe to try to do either action
directly in a signal handler.

Right. What if the postmaster signals the backend once a second to do
their reporting. I am not sure what the final solution will be, but we
_need_ one based on the performance numbers I and others have seen.
Could we have PGPROC have a reporting boolean that is set every second
and somehow checked by each backend?
One second might be a bit more delay than some folks want... it would be
nice if this was tuneable. Also, what would the overhead on this look
like if there's a large number of idle backends?
It does sound more appealing than setting a timer every time you start a
transaction, though...