Posix Shared Mem patch
Robert, all:
Last I checked, we had a reasonably acceptable patch to use mostly Posix
Shared mem with a very small sysv ram partition. Is there anything
keeping this from going into 9.3? It would eliminate a major
configuration headache for our users.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Excerpts from Josh Berkus's message of mar jun 26 15:49:59 -0400 2012:
Robert, all:
Last I checked, we had a reasonably acceptable patch to use mostly Posix
Shared mem with a very small sysv ram partition. Is there anything
keeping this from going into 9.3? It would eliminate a major
configuration headache for our users.
I don't think that patch was all that reasonable. It needed work, and
in any case it needs a rebase because it was pretty old.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tue, Jun 26, 2012 at 4:29 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
I don't think that patch was all that reasonable. It needed work, and
in any case it needs a rebase because it was pretty old.
Yep, agreed.
I'd like to get this fixed too, but it hasn't made it up to the top of
my list of things to worry about.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 6/26/12 2:13 PM, Robert Haas wrote:
Yep, agreed.
I'd like to get this fixed too, but it hasn't made it up to the top of
my list of things to worry about.
Was there a post-AgentM version of the patch, which incorporated the
small SysV RAM partition? I'm not finding it.
--
Josh Berkus
On Tue, Jun 26, 2012 at 2:18 PM, Josh Berkus <josh@agliodbs.com> wrote:
Was there a post-AgentM version of the patch, which incorporated the
small SysV RAM partition? I'm not finding it.
On that, I used to be of the opinion that this is a good compromise (a
small amount of interlock space, plus mostly posix shmem), but I've
heard since then (I think via AgentM indirectly, but I'm not sure)
that there are cases where even the small SysV segment can cause
problems -- notably when other software tweaks shared memory settings
on behalf of a user, but only leaves just-enough for the software
being installed. This is most likely on platforms that don't have a
high SysV shmem limit by default, so installers all feel the
prerogative to increase the limit, but there's no great answer for how
to compose a series of such installations. It only takes one
installer that says "whatever, I'm just catenating stuff to
sysctl.conf that works for me" to sabotage Postgres' ability to start.
So there may be a benefit in finding a way to have no SysV memory at
all. I wouldn't let perfect be the enemy of good to make progress
here, but it appears this was a witnessed real problem, so it may be
worth reconsidering if there is a way we can safely remove all SysV by
finding an alternative to the nattach mechanic.
--
fdr
On Tue, Jun 26, 2012 at 5:18 PM, Josh Berkus <josh@agliodbs.com> wrote:
Was there a post-AgentM version of the patch, which incorporated the
small SysV RAM partition? I'm not finding it.
To my knowledge, no.
--
Robert Haas
On that, I used to be of the opinion that this is a good compromise (a
small amount of interlock space, plus mostly posix shmem), but I've
heard since then (I think via AgentM indirectly, but I'm not sure)
that there are cases where even the small SysV segment can cause
problems -- notably when other software tweaks shared memory settings
on behalf of a user, but only leaves just-enough for the software
being installed. This is most likely on platforms that don't have a
high SysV shmem limit by default, so installers all feel the
prerogative to increase the limit, but there's no great answer for how
to compose a series of such installations. It only takes one
installer that says "whatever, I'm just catenating stuff to
sysctl.conf that works for me" to sabotage Postgres' ability to start.
Personally, I see this as rather an extreme case, and aside from AgentM
himself, have never run into it before. Certainly it would be useful to
not need SysV RAM at all, but it's more important to get a working patch
for 9.3.
--
Josh Berkus
On Tue, Jun 26, 2012 at 5:44 PM, Josh Berkus <josh@agliodbs.com> wrote:
Personally, I see this as rather an extreme case, and aside from AgentM
himself, have never run into it before. Certainly it would be useful to
not need SysV RAM at all, but it's more important to get a working patch
for 9.3.
+1.
I'd sort of given up on finding a solution that doesn't involve system
V shmem anyway, but now that I think about it... what about using a
FIFO? The man page for open on MacOS X says:
[ENXIO] O_NONBLOCK and O_WRONLY are set, the file is a FIFO,
and no process has it open for reading.
And Linux says:
ENXIO O_NONBLOCK | O_WRONLY is set, the named file is a FIFO and no
process has the file open for reading. Or, the file is a device
special file and no corresponding device exists.
And HP/UX says:
[ENXIO] O_NDELAY is set, the named file is a FIFO,
O_WRONLY is set, and no process has the file open
for reading.
So, what about keeping a FIFO in the data directory? When the
postmaster starts up, it tries to open the file with O_NONBLOCK |
O_WRONLY (or O_NDELAY | O_WRONLY, if the platform has O_NDELAY rather
than O_NONBLOCK). If that succeeds, it bails out. If it fails with
anything other than ENXIO, it bails out. If it fails with exactly
ENXIO, then it opens the pipe with O_RDONLY and arranges to pass the
file descriptor down to all of its children, so that a subsequent open
will fail if it or any of its children are still alive.
This might even be more reliable than what we do right now, because
our current system appears not to be robust against the removal of
postmaster.pid.
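For illustration, the FIFO interlock described above could be sketched as follows. This is Python rather than the postmaster's C. The function name, the FIFO path, the create-on-first-use step, and the O_NONBLOCK flag on the read-side open are all assumptions rather than part of the proposal. The flag is there because a plain O_RDONLY open of a FIFO would block until a writer appears.

```python
import errno
import os


def try_acquire_fifo_interlock(fifo_path):
    """Return a read descriptor on the FIFO if nobody else holds one open.

    Raises RuntimeError if some process already has the FIFO open for
    reading, and re-raises any unexpected OSError.
    """
    # Assumption: the FIFO is created on first use inside the data directory.
    try:
        os.mkfifo(fifo_path, 0o600)
    except FileExistsError:
        pass

    try:
        # Succeeds only if some process has the FIFO open for reading.
        wfd = os.open(fifo_path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno != errno.ENXIO:
            raise  # anything other than ENXIO: bail out
    else:
        os.close(wfd)
        raise RuntimeError("FIFO already has a reader; refusing to start")

    # ENXIO: no reader exists, so become the reader. Note that nothing here
    # closes the window between the failed open above and this open.
    return os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)
```

The returned descriptor would have to stay open in the postmaster and be inherited by every child for the interlock to hold for as long as any of them is alive.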
--
Robert Haas
Excerpts from Daniel Farina's message of mar jun 26 17:40:16 -0400 2012:
On that, I used to be of the opinion that this is a good compromise (a
small amount of interlock space, plus mostly posix shmem), but I've
heard since then (I think via AgentM indirectly, but I'm not sure)
that there are cases where even the small SysV segment can cause
problems -- notably when other software tweaks shared memory settings
on behalf of a user, but only leaves just-enough for the software
being installed.
This argument is what killed the original patch. If you want to get
anything done *at all* I think it needs to be dropped. Changing shmem
implementation is already difficult enough --- you don't need to add the
requirement that the interlocking mechanism be changed simultaneously.
You (or whoever else) can always work on that as a followup patch.
--
Álvaro Herrera <alvherre@commandprompt.com>
On Tue, Jun 26, 2012 at 2:53 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
This argument is what killed the original patch. If you want to get
anything done *at all* I think it needs to be dropped. Changing shmem
implementation is already difficult enough --- you don't need to add the
requirement that the interlocking mechanism be changed simultaneously.
You (or whoever else) can always work on that as a followup patch.
True, but then again, I did very intentionally write:
Excerpts from Daniel Farina's message of mar jun 26 17:40:16 -0400 2012:
*I wouldn't let perfect be the enemy of good* to make progress
here, but it appears this was a witnessed real problem, so it may
be worth reconsidering if there is a way we can safely remove all
SysV by finding an alternative to the nattach mechanic.
(Emphasis mine).
I don't think that -hackers at the time gave the zero-shmem rationale
much weight (I also was not that happy about the safety mechanism of
that patch), but upon more reflection (and taking into account *other*
software that may mangle shmem settings) I think it's something at
least worth thinking about again one more time. What killed the patch
was an attachment to the deemed-less-safe strategy for avoiding bogus
shmem attachments already in it, but I don't seem to recall anyone
putting a whole lot of thought at the time into the zero-shmem case
from what I could read on the list, because a small interlock with
nattach seemed good-enough.
I'm simply suggesting that for additional benefits it may be worth
thinking about getting around nattach and thus SysV shmem, especially
with regard to safety, in an open-ended way. Maybe there's a solution
(like Robert's FIFO suggestion?) that is not too onerous and can
satisfy everyone.
--
fdr
On Jun 26, 2012, at 5:44 PM, Josh Berkus wrote:
Personally, I see this as rather an extreme case, and aside from AgentM
himself, have never run into it before. Certainly it would be useful to
not need SysV RAM at all, but it's more important to get a working patch
for 9.3.
This can be trivially reproduced if one runs an old (SysV shared memory-based) postgresql alongside a potentially newer postgresql with a smaller SysV segment. This can occur with applications that bundle postgresql as part of the app.
Cheers,
M
This can be trivially reproduced if one runs an old (SysV shared memory-based) postgresql alongside a potentially newer postgresql with a smaller SysV segment. This can occur with applications that bundle postgresql as part of the app.
I'm not saying it doesn't happen at all. I'm saying it's not the 80%
case.
So let's fix the 80% case with something we feel confident in, and then
revisit the no-sysv interlock as a separate patch. That way if we can't
fix the interlock issues, we still have a reduced-shmem version of Postgres.
--
Josh Berkus
Robert Haas <robertmhaas@gmail.com> writes:
So, what about keeping a FIFO in the data directory?
Hm, does that work if the data directory is on NFS? Or some other weird
not-really-Unix file system?
When the
postmaster starts up, it tries to open the file with O_NONBLOCK |
O_WRONLY (or O_NDELAY | O_WRONLY, if the platform has O_NDELAY rather
than O_NONBLOCK). If that succeeds, it bails out. If it fails with
anything other than ENXIO, it bails out. If it fails with exactly
ENXIO, then it opens the pipe with O_RDONLY
... race condition here ...
and arranges to pass the
file descriptor down to all of its children, so that a subsequent open
will fail if it or any of its children are still alive.
This might be made to work, but that doesn't sound quite right in
detail.
I remember we speculated about using an fcntl lock on some file in the
data directory, but that fails because child processes don't inherit
fcntl locks.
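The fcntl behaviour mentioned above can be observed with a small experiment. The following is a sketch in Python, with a hypothetical function name and lock file. The parent takes an exclusive record lock and forks, and the child then tries to take the same lock through the inherited descriptor. If locks were inherited, the child's attempt would succeed. On POSIX systems it is expected to fail, because the lock belongs to the parent process only.

```python
import fcntl
import os


def child_can_take_parents_lock(path):
    """Return True if a forked child can lock a file its parent has locked."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Parent takes an exclusive fcntl-style lock on the whole file.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        pid = os.fork()
        if pid == 0:
            # Child: the descriptor is inherited, the lock ownership is not.
            code = 2
            try:
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                code = 0  # acquired: would mean the child owns the lock
            except OSError:
                code = 1  # conflict with the parent's lock
            finally:
                os._exit(code)
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status) == 0
    finally:
        os.close(fd)
```

A False result means the child is treated as a competitor for the lock, so a lock held by the postmaster alone says nothing about surviving children once the postmaster is gone.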
In the modern world, it'd be really a step forward if the lock mechanism
worked on shared storage, ie a data directory on NFS or similar could be
locked against all comers not just those on the same node as the
original postmaster. I don't know how to do that though.
In the meantime, insisting that we solve this problem before we do
anything is a good recipe for ensuring that nothing happens, just
like it hasn't happened for the last half dozen years. (I see Alvaro
just made the same point.)
regards, tom lane
On Jun 26, 2012, at 6:12 PM, Daniel Farina wrote:
(Emphasis mine).
I don't think that -hackers at the time gave the zero-shmem rationale
much weight (I also was not that happy about the safety mechanism of
that patch), but upon more reflection (and taking into account *other*
software that may mangle shmem settings) I think it's something at
least worth thinking about again one more time. What killed the patch
was an attachment to the deemed-less-safe strategy for avoiding bogus
shmem attachments already in it, but I don't seem to recall anyone
putting a whole lot of thought at the time into the zero-shmem case
from what I could read on the list, because a small interlock with
nattach seemed good-enough.
I'm simply suggesting that for additional benefits it may be worth
thinking about getting around nattach and thus SysV shmem, especially
with regard to safety, in an open-ended way. Maybe there's a solution
(like Robert's FIFO suggestion?) that is not too onerous and can
satisfy everyone.
I solved this via fcntl locking. I also set up gdb to break in critical regions to test the interlock and I found no flaw in the design. More eyes would be welcome, of course.
https://github.com/agentm/postgres/tree/posix_shmem
Cheers,
M
Josh Berkus <josh@agliodbs.com> writes:
So let's fix the 80% case with something we feel confident in, and then
revisit the no-sysv interlock as a separate patch. That way if we can't
fix the interlock issues, we still have a reduced-shmem version of Postgres.
Yes. Insisting that we have the whole change in one patch is a good way
to prevent any forward progress from happening. As Alvaro noted, there
are plenty of issues to resolve without trying to change the interlock
mechanism at the same time.
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
In the meantime, insisting that we solve this problem before we do
anything is a good recipe for ensuring that nothing happens, just
like it hasn't happened for the last half dozen years. (I see
Alvaro just made the same point.)
And now so has Josh.
+1 from me, too.
-Kevin
"A.M." <agentm@themactionfaction.com> writes:
This can be trivially reproduced if one runs an old (SysV shared memory-based) postgresql alongside a potentially newer postgresql with a smaller SysV segment. This can occur with applications that bundle postgresql as part of the app.
I don't believe that that case is a counterexample to what's being
proposed (namely, grabbing a minimum-size shmem segment, perhaps 1K).
It would only fail if the old postmaster ate up *exactly* SHMMAX worth
of shmem, which is not real likely. As a data point, on my Mac laptop
with SHMMAX set to 32MB, 9.2 will by default eat up 31624KB, leaving
more than a meg available. Sure, that isn't enough to start another
old-style postmaster, but it would be plenty of room for one that only
wants 1K.
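The headroom figure can be checked with quick arithmetic. This is a trivial Python sketch that uses only the numbers quoted above; the variable names are illustrative.

```python
shmmax_kb = 32 * 1024          # SHMMAX of 32MB expressed in KB
used_kb = 31624                # what 9.2 takes by default, per the figure above
free_kb = shmmax_kb - used_kb  # headroom left under SHMMAX
print(free_kb)                 # 1144
```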
Even if you actively try to configure the shmem settings to exactly
fill shmmax (which I concede some installation scripts might do),
it's going to be hard to do because of the 8K granularity of the main
knob, shared_buffers. Moreover, an installation script that did that
would soon learn not to, because of the fact that we don't worry too
much about changing small details of shared memory consumption in minor
releases.
regards, tom lane
Excerpts from Tom Lane's message of mar jun 26 18:58:45 -0400 2012:
Even if you actively try to configure the shmem settings to exactly
fill shmmax (which I concede some installation scripts might do),
it's going to be hard to do because of the 8K granularity of the main
knob, shared_buffers.
Actually it's very easy -- just try to start postmaster on a system with
not enough shmmax and it will tell you how much shmem it wants. Then
copy that number verbatim in the config file. This might fail on picky
systems such as MacOSX that require some exact multiple or power of some
other parameter, but it works fine on Linux.
I think the minimum you can request, at least on Linux, is 1 byte.
Moreover, an installation script that did that
would soon learn not to, because of the fact that we don't worry too
much about changing small details of shared memory consumption in minor
releases.
+1
--
Álvaro Herrera <alvherre@commandprompt.com>
"A.M." <agentm@themactionfaction.com> writes:
On Jun 26, 2012, at 6:12 PM, Daniel Farina wrote:
I'm simply suggesting that for additional benefits it may be worth
thinking about getting around nattach and thus SysV shmem, especially
with regard to safety, in an open-ended way.
I solved this via fcntl locking.
No, you didn't, because fcntl locks aren't inherited by child processes.
Too bad, because they'd be a great solution otherwise.
regards, tom lane
On 06/26/2012 07:30 PM, Tom Lane wrote:
"A.M." <agentm@themactionfaction.com> writes:
On Jun 26, 2012, at 6:12 PM, Daniel Farina wrote:
I'm simply suggesting that for additional benefits it may be worth
thinking about getting around nattach and thus SysV shmem, especially
with regard to safety, in an open-ended way.
I solved this via fcntl locking.
No, you didn't, because fcntl locks aren't inherited by child processes.
Too bad, because they'd be a great solution otherwise.
You claimed this last time and I replied:
http://archives.postgresql.org/pgsql-hackers/2011-04/msg00656.php
"I address this race condition by ensuring that a lock-holding violator
is the postmaster or a postmaster child. If such a condition is
detected, the child exits immediately without touching the shared
memory. POSIX shmem is inherited via file descriptors."
This is possible because the locking API allows one to request which PID
violates the lock. The child expects the lock to be held and checks that
the PID is the parent. If the lock is not held, that means that the
postmaster is dead, so the child exits immediately.
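The F_GETLK query behind this check could look roughly like the following Python sketch. It is not taken from the linked branch, and the function names are hypothetical. The `struct flock` layout packed here (`hhqqi`) is an assumption that holds on 64-bit Linux but differs on other platforms such as macOS.

```python
import fcntl
import os
import struct

# Assumption: struct flock layout on 64-bit Linux
# (short l_type, short l_whence, off_t l_start, off_t l_len, pid_t l_pid).
FLOCK_FMT = "hhqqi"


def lock_holder_pid(fd):
    """Return the PID holding a conflicting lock on fd, or None if there is none."""
    query = struct.pack(FLOCK_FMT, fcntl.F_WRLCK, os.SEEK_SET, 0, 0, 0)
    reply = fcntl.fcntl(fd, fcntl.F_GETLK, query)
    l_type, _whence, _start, _len, l_pid = struct.unpack(FLOCK_FMT, reply)
    return None if l_type == fcntl.F_UNLCK else l_pid


def child_sees_parent_as_holder(path):
    """Fork while holding the lock; report whether the child sees the parent's PID."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # stand-in for the postmaster's lock
        parent = os.getpid()
        pid = os.fork()
        if pid == 0:
            code = 2
            try:
                # Child: the lock should be held, and held by the parent.
                code = 0 if lock_holder_pid(fd) == parent else 1
            finally:
                os._exit(code)
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status) == 0
    finally:
        os.close(fd)
```

In this sketch a child that gets None back would conclude the lock holder is gone, and one that gets some other PID would conclude a different process owns the data directory. Either case corresponds to the "exit immediately" branch described above.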
Cheers,
M