Proposal to add a QNX 6.5 port to PostgreSQL
I propose that a QNX 6.5 port be introduced to PostgreSQL.
I am new to PostgreSQL development, so please bear with me.
I have made good progress (with 1 outstanding issue, details below):
* I created a QNX 6.5 port of PostgreSQL 9.3.4 which passes regression tests.
* I merged my changes into 9.4beta2, and with a few minor changes, it passes regression tests.
* QNX support states that QNX 6.5 SP1 binaries run on QNX 6.6 without modification, which I confirmed with a few quick tests.
Summary of changes required for PostgreSQL 9.3.4 on QNX 6.5:
* Typical changes required for any new port (template, configure.in, dynloader, etc.)
* QNX lacks System V shared memory: I created "src/backend/port/posix_shmem.c" which replaces System V calls (shmget, shmat, shmdt, ...) with POSIX calls (shm_open, mmap, munmap, shm_unlink); a minimal sketch of this call mapping appears just after the change summary below
* QNX lacks sigaction SA_RESTART: I modified "src/include/port.h" to define macros to retry system calls upon EINTR (open,read,write,...) when compiled on QNX
* A few files required addition of #include <sys/select.h> on QNX (for fd_set).
Additional changes required for PostgreSQL 9.4beta2 on QNX 6.5:
* "DSM" changes introduced in 9.4 (R. Haas) required that I make minor updates to my new "posix_shmem.c" code.
* src/include/replication/logical.h: struct LogicalDecodingContext field "write" interferes with my "write" retry macro. Renaming field "write" to "do_write" solved this problem.
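To make the System V-to-POSIX mapping above concrete, here is a minimal, self-contained sketch of the idea (this is not the attached posix_shmem.c; the helper names and error handling are illustrative only): shmget() becomes shm_open() plus ftruncate(), shmat() becomes mmap(), shmdt() becomes munmap(), and segment removal becomes shm_unlink().

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative only; see the attached posix_shmem.c for the real code. */
static void *
create_segment(const char *name, size_t size, int *fd_out)
{
    int     fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    void   *addr;

    if (fd < 0)
        return NULL;                /* e.g. EEXIST: name already in use */
    if (ftruncate(fd, (off_t) size) < 0 ||
        (addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0)) == MAP_FAILED)
    {
        close(fd);
        shm_unlink(name);
        return NULL;
    }
    *fd_out = fd;
    return addr;                    /* shmget() + shmat() counterpart */
}

static void
destroy_segment(const char *name, void *addr, size_t size)
{
    munmap(addr, size);             /* shmdt() counterpart */
    shm_unlink(name);               /* shmctl(IPC_RMID) counterpart */
}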
Outstanding Issue #1:
src/backend/commands/dbcommands.c :: createdb() complains when copying template1 to template0 (apparently a locale issue)
"FATAL: 22023: new LC_CTYPE (C;collate:POSIX;ctype:POSIX) is incompatible with the LC_CTYPE of the template database (POSIX;messages:C)"
I would appreciate help from an experienced PostgreSQL hacker to address this.
I have temporarily disabled this check on QNX (I can live with the assumption/limitation that template0 and template1 contain strictly ASCII).
I can work toward setting up a build farm member should this proposal be accepted.
Your feedback and guidance on next steps are appreciated.
Thank you.
Keith Baker
"Baker, Keith [OCDUS Non-J&J]" <KBaker9@its.jnj.com> writes:
I propose that a QNX 6.5 port be introduced to PostgreSQL.
Hmm ... you're aware that there used to be a QNX port? We removed it
back in 2006 for lack of interest and maintainers, and AFAIR you're
the first person to show any interest in reintroducing it since then.
I'm a bit concerned about reintroducing something that seems to have so
little usage, especially if the port is going to be as invasive as you
suggest:
* QNX lacks System V shared memory: I created "src/backend/port/posix_shmem.c" which replaces System V calls (shmget, shmat, shmdt, ...) with POSIX calls (shm_open, mmap, munmap, shm_unlink)
This isn't really acceptable for production usage; if it were, we'd have
done it already. The POSIX APIs lack any way to tell how many processes
are attached to a shmem segment, which is *necessary* functionality for
us (it's a critical part of the interlock against starting multiple
postmasters in one data directory).
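For reference, the attach count in question is the System V shm_nattch field, reported by shmctl(IPC_STAT). A rough sketch of the kind of check that makes possible follows; the helper name is illustrative and the real tests in sysv_shmem.c are more involved.

#include <stdbool.h>
#include <sys/ipc.h>
#include <sys/shm.h>

static bool
segment_has_other_attachments(int shmid)
{
    struct shmid_ds shmstat;

    if (shmctl(shmid, IPC_STAT, &shmstat) < 0)
        return false;               /* segment is gone, or not ours to inspect */
    return shmstat.shm_nattch > 1;  /* someone besides us is still attached */
}

POSIX shm_open()/mmap() offers no equivalent of shm_nattch, which is why the interlock question keeps coming up in this thread.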
* QNX lacks sigaction SA_RESTART: I modified "src/include/port.h" to define macros to retry system calls upon EINTR (open,read,write,...) when compiled on QNX
That's pretty scary too. For one thing, such macros would affect every
call site whether it's running with SA_RESTART or not. Do you really
need it? It looks to me like we just turn off HAVE_POSIX_SIGNALS if
you don't have SA_RESTART. Maybe that code has bit-rotted by now, but
it did work at one time.
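For context, the mechanism in question looks roughly like this: a simplified, pqsignal()-style sketch (not the actual src/port source) that installs handlers with sigaction() and SA_RESTART when HAVE_POSIX_SIGNALS is defined, and otherwise falls back to plain signal(), leaving callers to retry on EINTR.

#include <signal.h>

typedef void (*handler_fn) (int);

static handler_fn
install_handler(int signo, handler_fn func)
{
#ifndef HAVE_POSIX_SIGNALS
    /* No sigaction(): interrupted syscalls fail with EINTR and must be retried. */
    return signal(signo, func);
#else
    struct sigaction act,
                oact;

    act.sa_handler = func;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_RESTART;      /* restart interrupted syscalls automatically */
    if (sigaction(signo, &act, &oact) < 0)
        return SIG_ERR;
    return oact.sa_handler;
#endif
}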
regards, tom lane
On Fri, Jul 25, 2014 at 3:16 PM, Baker, Keith [OCDUS Non-J&J]
<KBaker9@its.jnj.com> wrote:
I propose that a QNX 6.5 port be introduced to PostgreSQL.
Maybe step #1 is to get a buildfarm member set up. Is there any
policy against unsupported environments in the buildfarm? (I hope not)
You're going to have to run it against a git repository containing
your custom patches. It's a long and uncertain road to getting a new
port (re-) accepted, but demonstrated commitment to support is a
necessary first step. It will also advertise support for the platform.
merlin
On 2014-07-28 11:19:48 -0500, Merlin Moncure wrote:
Maybe step #1 is to get a buildfarm member set up. Is there any
policy against unsupported environments in the buildfarm? (I hope not)
You're going to have to run it against a git repository containing
your custom patches. It's a long and uncertain road to getting a new
port (re-) accepted, but demonstrated commitment to support is a
necessary first step. It will also advertise support for the platform.
I don't think a buildfarm animal that doesn't run the actual upstream
code is a good idea. That'll make it a lot harder to understand what's
going on when something breaks after a commit. It'd also require the
custom patches being rebased on top of $branch before every run...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jul 28, 2014 at 11:22 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-07-28 11:19:48 -0500, Merlin Moncure wrote:
Maybe step #1 is to get a buildfarm member set up. Is there any
policy against unsupported environments in the buildfarm? (I hope not)
You're going to have to run it against a git repository containing
your custom patches. It's a long and uncertain road to getting a new
port (re-) accepted, but demonstrated commitment to support is a
necessary first step. It will also advertise support for the platform.
I don't think a buildfarm animal that doesn't run the actual upstream
code is a good idea. That'll make it a lot harder to understand what's
going on when something breaks after a commit. It'd also require the
custom patches being rebased on top of $branch before every run...
hm. oh well. maybe if there was a separate page for custom builds
(basically, an unsupported section).
merlin
On Mon, Jul 28, 2014 at 9:41 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
I don't think a buildfarm animal that doesn't run the actual upstream
code is a good idea. That'll make it a lot harder to understand what's
going on when something breaks after a commit. It'd also require the
custom patches being rebased on top of $branch before every run...
hm. oh well. maybe if there was a separate page for custom builds
(basically, an unsupported section).
I think that's a bad idea. The QNX OS seems to be mostly used in
safety-critical systems; it has a microkernel design. I think it would
be particularly bad to have iffy support for something like that.
--
Peter Geoghegan
On Fri, Jul 25, 2014 at 6:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
* QNX lacks System V shared memory: I created "src/backend/port/posix_shmem.c" which replaces System V calls (shmget, shmat, shmdt, ...) with POSIX calls (shm_open, mmap, munmap, shm_unlink)
This isn't really acceptable for production usage; if it were, we'd have
done it already. The POSIX APIs lack any way to tell how many processes
are attached to a shmem segment, which is *necessary* functionality for
us (it's a critical part of the interlock against starting multiple
postmasters in one data directory).
I think it would be good to spend some energy figuring out what to do
about this. The Linux developers, for reasons I have not been able to
understand, appear to hate System V shared memory, and rumors have
circulated here that they would like to get rid of it altogether. And
quite apart from that, even using a few bytes of System V shared
memory is apparently inconvenient for people who run many copies of
PostgreSQL on the same machine or who run in environments where it's
not available, such as FreeBSD jails for which it hasn't been
specifically enabled.[1]
Now, in fairness, all of the alternative systems have their own share
of problems. POSIX shared memory isn't available everywhere, and the
anonymous mmap we're now using doesn't work in EXEC_BACKEND builds,
can't be used for dynamic shared memory, and apparently performs
poorly on BSD systems.[1] In spite of that, I think that having an
option to use POSIX shared memory would make a reasonable number of
PostgreSQL users happier than they are today; and maybe even attract a
few new ones.
In our last discussion on this topic, we talked about using file locks
as a substitute for nattch. You concluded that fcntl was totally
broken for this purpose because of the possibility of some other piece
of code accidentally opening and closing the lock file.[2] lockf
appears to have the same problem, but flock might not, at least on
some systems. The semantics as described in my copy of the Linux man
pages are that a child created by fork() inherits a copy of the
filehandle pointing to the same lock, and that the lock is released
when either ANY process with a copy of that filehandle makes an
explicit unlock request or ALL copies of the filehandle are closed.
That seems like it'd be OK for our purposes, though the Linux guys
seem to think the semantics might be different on other platforms, and
note that it won't work over NFS.
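A minimal sketch of that flock()-based interlock, assuming BSD-style semantics (the lock is shared across fork() and released only when the last copy of the descriptor is closed); the file name, helper name, and error handling are illustrative:

#include <fcntl.h>
#include <stdbool.h>
#include <sys/file.h>

static int lock_fd = -1;            /* kept open for the life of the process */

static bool
acquire_datadir_lock(const char *lockpath)
{
    lock_fd = open(lockpath, O_RDWR | O_CREAT, 0600);
    if (lock_fd < 0)
        return false;

    /*
     * LOCK_NB makes this fail with EWOULDBLOCK if another postmaster, or a
     * leftover backend holding an inherited copy of the descriptor, still
     * owns the lock.
     */
    if (flock(lock_fd, LOCK_EX | LOCK_NB) < 0)
        return false;
    return true;
}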
Another thing that strikes me is that lsof works on just about every
platform I've ever used, and it tells you who has got a certain file
open. Of course it has to use different methods to do that on
different platforms, but at least on Linux, /proc/self/fd/N is a
symlink to the file you've got open, and shared memory segments are
files in /dev/shm. So maybe at least on particular platforms where we
care enough, we could install operating-system-specific code to
provide an interlock using a mechanism of this type. Not sure if that
will fly, but it's a thought.
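As a Linux-only illustration of that idea, purely a sketch (a real version would need care around permissions, races, and processes whose fd tables we cannot read), walk /proc/<pid>/fd and compare the symlink targets against the /dev/shm path of the segment:

#include <dirent.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed to run before this process has opened the segment itself. */
static bool
segment_open_elsewhere(const char *shm_path)    /* a file under /dev/shm */
{
    DIR        *proc = opendir("/proc");
    struct dirent *p;
    bool        found = false;

    if (proc == NULL)
        return false;
    while (!found && (p = readdir(proc)) != NULL)
    {
        char        fddir[PATH_MAX];
        DIR        *fds;
        struct dirent *f;

        if (p->d_name[0] < '0' || p->d_name[0] > '9')
            continue;                   /* not a PID entry */
        snprintf(fddir, sizeof(fddir), "/proc/%s/fd", p->d_name);
        if ((fds = opendir(fddir)) == NULL)
            continue;                   /* typically EACCES for other users */
        while ((f = readdir(fds)) != NULL)
        {
            char        lnk[PATH_MAX];
            char        target[PATH_MAX];
            ssize_t     len;

            snprintf(lnk, sizeof(lnk), "%s/%s", fddir, f->d_name);
            len = readlink(lnk, target, sizeof(target) - 1);
            if (len <= 0)
                continue;               /* ".", "..", or unreadable link */
            target[len] = '\0';
            if (strcmp(target, shm_path) == 0)
            {
                found = true;
                break;
            }
        }
        closedir(fds);
    }
    closedir(proc);
    return found;
}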
Yet another idea is to somehow use System V semaphores, which are
distinct from System V shared memory. semop() has a SEM_UNDO flag which
causes whatever operation you perform to be reversed out at process exit.
So you could have each new postgres process increment the semaphore
value in such a way that it would be decremented on exit, although I'm
not sure how to avoid a race if the postmaster dies before a new child
has a chance to increment the semaphore.
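A sketch of that SEM_UNDO approach (key handling and error checking are illustrative); note that it does nothing about the postmaster-death race just mentioned:

#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

static int
register_liveness(key_t key)
{
    int         semid = semget(key, 1, IPC_CREAT | 0600);
    struct sembuf inc = {0, 1, SEM_UNDO};   /* +1 on semaphore 0, undone on exit */

    if (semid < 0 || semop(semid, &inc, 1) < 0)
        return -1;
    return semid;
}

static int
others_still_alive(int semid)
{
    /* GETVAL is the current count; a value above 1 means processes besides us. */
    return semctl(semid, 0, GETVAL) > 1;
}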
Finally, how about named pipes? Linux says that trying to open a
named pipe for write when there are no readers will return ENXIO, and
attempting to write to an already-open pipe with no remaining readers
will cause SIGPIPE. So: create a permanent named pipe in the data
directory that all PostgreSQL processes keep open. When the
postmaster starts, it opens the pipe for read, then for write, then
closes it for read. It then tries to write to the pipe. If this
fails to result in SIGPIPE, then somebody else has got the thing open;
so the new postmaster should die at once. But if it does get a SIGPIPE
then there are as of that moment no other readers.
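A sketch of that named-pipe protocol exactly as described (path and helper name are illustrative, and see the reply below pointing out that two postmasters running this concurrently could both pass):

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

static bool
fifo_interlock(const char *fifo_path)   /* a permanent fifo in the data directory */
{
    int         rfd,
                wfd;

    if (mkfifo(fifo_path, 0600) < 0 && errno != EEXIST)
        return false;

    /* Open for read first so the nonblocking write-side open cannot fail. */
    rfd = open(fifo_path, O_RDONLY | O_NONBLOCK);
    wfd = open(fifo_path, O_WRONLY | O_NONBLOCK);
    if (rfd < 0 || wfd < 0)
        return false;
    close(rfd);                         /* drop our own read end */

    /* With SIGPIPE ignored, write() reports EPIPE instead of killing us. */
    signal(SIGPIPE, SIG_IGN);
    if (write(wfd, "x", 1) != -1 || errno != EPIPE)
        return false;       /* someone else still has the fifo open for read */

    /* No other readers existed at that instant; keep wfd open for our lifetime. */
    return true;
}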
I'm not sure if any of this helps QNX or not, but maybe if we figure
out which of these mechanisms (or others) might be acceptable we can
cross-check that against what QNX supports.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[1]: See comments on http://rhaas.blogspot.com/2012/06/absurd-shared-memory-limits.html
[2]: /messages/by-id/18958.1340764854@sss.pgh.pa.us
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Jul 25, 2014 at 6:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
This isn't really acceptable for production usage; if it were, we'd have
done it already. The POSIX APIs lack any way to tell how many processes
are attached to a shmem segment, which is *necessary* functionality for
us (it's a critical part of the interlock against starting multiple
postmasters in one data directory).
I think it would be good to spend some energy figuring out what to do
about this.
Well, we've been around on this multiple times before, but if we have
any new ideas, sure ...
In our last discussion on this topic, we talked about using file locks
as a substitute for nattch. You concluded that fcntl was totally
broken for this purpose because of the possibility of some other piece
of code accidentally opening and closing the lock file.[2] lockf
appears to have the same problem, but flock might not, at least on
some systems.
My Linux man page for flock says
flock() does not lock files over NFS. Use fcntl(2) instead: that does
work over NFS, given a sufficiently recent version of Linux and a
server which supports locking.
which seems like a showstopper problem; we might try to tell people not to
put their databases on NFS, but they're not gonna listen. It also says
flock() and fcntl(2) locks have different semantics with respect to
forked processes and dup(2). On systems that implement flock() using
fcntl(2), the semantics of flock() will be different from those
described in this manual page.
which is pretty scary if it's accurate for any still-extant platforms;
we might think we're using flock and still get fcntl behavior. It's
also of concern that (AFAICS) flock is not in POSIX, which means we
can't even expect that platforms will agree on how it *should* behave.
I also noted that flock does not support atomic downgrade of exclusive
lock to shared lock, which seems like a problem for the lock inheritance
scheme sketched in
/messages/by-id/18162.1340761845@sss.pgh.pa.us
... but OTOH, it sounds like flock locks are not only inherited through
fork() but even preserved across exec(), which would mean that we don't
need that scheme for file lock inheritance, even with EXEC_BACKEND.
Still, it's not clear to me how we could put much faith in flock.
Finally, how about named pipes? Linux says that trying to open a
named pipe for write when there are no readers will return ENXIO, and
attempting to write to an already-open pipe with no remaining readers
will cause SIGPIPE. So: create a permanent named pipe in the data
directory that all PostgreSQL processes keep open. When the
postmaster starts, it opens the pipe for read, then for write, then
closes it for read. It then tries to write to the pipe. If this
fails to result in SIGPIPE, then somebody else has got the thing open;
so the new postmaster should die at once. But if it does get a SIGPIPE
then there are as of that moment no other readers.
Hm. That particular protocol is broken: two postmasters doing it at the
same time would both pass (because neither has it open for read at the
instant where they try to write). But we could possibly frob the idea
until it works. Bigger question is how portable is this behavior?
I see named pipes (fifos) in SUS v2, which is our usual baseline
assumption about what's portable across Unixen, so maybe it would work.
But does NFS support named pipes?
regards, tom lane
Thank you to all who have responded to this proposal.
PostgreSQL manages to meet all production requirements on Windows without System V shared memory, so I would think this can be achieved on QNX/Linux.
The old PostgreSQL QNX port ran on the very old "QNX4" (1991), so I understand why it would be of little value today.
Currently, QNX Neutrino 6.5 is well established (and QNX 6.6 is emerging) and that is where a PostgreSQL port would be well received.
I have attached my current work-in-progress patches for 9.3.4 and 9.4beta2 for the curious.
To minimize risk, I have been careful to ensure my changes affect only QNX builds; existing ports should see zero impact.
To minimize addition of new files, I have used the "linux" template rather than add qnx6 as a separate port/template.
All regression tests pass on my system, so while not perfect it is at least a reasonable start.
posix_shmem.c is still in need of some cleanup and mitigations to make it "production-strength".
If there are existing tests I can run to ensure the QNX port meets your criteria for robust failure handling in this area, I would be happy to run them.
If not, perhaps someone can provide a quick list of failure modes to consider.
As-is:
- starting of a second postmaster fails with message 'FATAL: lock file "postmaster.pid" already exists'
- Kill -9 of postmaster followed by a pg_ctl start seems to go through recovery, although the original shared memory segments hang out in /dev/shmem until reboot (that could be better).
Thanks again and please let me know if I can be of any assistance.
Keith Baker
Attachments:
pg_94beta2_qnx_20140729.patch (application/octet-stream)
diff -rdupN postgresql-9.4beta2/INSTALL postgresql-9.4beta2_qnx/INSTALL
--- postgresql-9.4beta2/INSTALL 2014-07-21 15:24:46.000000000 -0400
+++ postgresql-9.4beta2_qnx/INSTALL 2014-07-29 15:40:50.000000000 -0400
@@ -1513,3 +1513,16 @@ make: *** [postgres] Error 1
your DTrace installation is too old to handle probes in static
functions. You need Solaris 10u4 or newer.
+ __________________________________________________________________
+
+QNX 6.5 and QNX 6.6
+
+ PostgreSQL can be built natively on QNX 6.5 SP1 using gcc.
+ The executables will also run on QNX 6.6.
+ Changes required for QNX:
+ a. Replace all System V shared memory with POSIX named shared memory (posix_shmem.c).
+ b. port.h now includes a #ifdef __QNX__ section, where macros to retry
+ interrupted system calls (e.g., read, write) are defined.
+ This is needed because QNX does not support sigaction SA_RESTART.
+
+ ./configure --without-readline --disable-thread-safety
diff -rdupN postgresql-9.4beta2/configure postgresql-9.4beta2_qnx/configure
--- postgresql-9.4beta2/configure 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/configure 2014-07-29 16:06:00.000000000 -0400
@@ -2850,7 +2850,7 @@ case $host_os in
dragonfly*) template=netbsd ;;
freebsd*) template=freebsd ;;
hpux*) template=hpux ;;
- linux*|gnu*|k*bsd*-gnu)
+ linux*|gnu*|k*bsd*-gnu|*qnx6*)
template=linux ;;
mingw*) template=win32 ;;
netbsd*) template=netbsd ;;
@@ -13895,16 +13895,21 @@ fi
# Select shared-memory implementation type.
-if test "$PORTNAME" != "win32"; then
+if test "$PORTNAME" = "win32"; then
-$as_echo "#define USE_SYSV_SHARED_MEMORY 1" >>confdefs.h
+$as_echo "#define USE_WIN32_SHARED_MEMORY 1" >>confdefs.h
- SHMEM_IMPLEMENTATION="src/backend/port/sysv_shmem.c"
+ SHMEM_IMPLEMENTATION="src/backend/port/win32_shmem.c"
+elif test x"$USE_POSIX_SHARED_MEMORY" = x"1" ; then
+
+$as_echo "#define USE_POSIX_SHARED_MEMORY 1" >>confdefs.h
+
+ SHMEM_IMPLEMENTATION="src/backend/port/posix_shmem.c"
else
-$as_echo "#define USE_WIN32_SHARED_MEMORY 1" >>confdefs.h
+$as_echo "#define USE_SYSV_SHARED_MEMORY 1" >>confdefs.h
- SHMEM_IMPLEMENTATION="src/backend/port/win32_shmem.c"
+ SHMEM_IMPLEMENTATION="src/backend/port/sysv_shmem.c"
fi
# Select latch implementation type.
diff -rdupN postgresql-9.4beta2/configure.in postgresql-9.4beta2_qnx/configure.in
--- postgresql-9.4beta2/configure.in 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/configure.in 2014-07-29 16:07:38.000000000 -0400
@@ -64,7 +64,7 @@ case $host_os in
dragonfly*) template=netbsd ;;
freebsd*) template=freebsd ;;
hpux*) template=hpux ;;
- linux*|gnu*|k*bsd*-gnu)
+ linux*|gnu*|k*bsd*-gnu|*qnx6*)
template=linux ;;
mingw*) template=win32 ;;
netbsd*) template=netbsd ;;
@@ -1794,12 +1794,15 @@ fi
# Select shared-memory implementation type.
-if test "$PORTNAME" != "win32"; then
- AC_DEFINE(USE_SYSV_SHARED_MEMORY, 1, [Define to select SysV-style shared memory.])
- SHMEM_IMPLEMENTATION="src/backend/port/sysv_shmem.c"
-else
+if test "$PORTNAME" = "win32"; then
AC_DEFINE(USE_WIN32_SHARED_MEMORY, 1, [Define to select Win32-style shared memory.])
SHMEM_IMPLEMENTATION="src/backend/port/win32_shmem.c"
+elif test x"$USE_POSIX_SHARED_MEMORY" = x"1" ; then
+ AC_DEFINE(USE_POSIX_SHARED_MEMORY, 1, [Define to select POSIX-style shared memory (QNX).])
+ SHMEM_IMPLEMENTATION="src/backend/port/posix_shmem.c"
+else
+ AC_DEFINE(USE_SYSV_SHARED_MEMORY, 1, [Define to select SysV-style shared memory.])
+ SHMEM_IMPLEMENTATION="src/backend/port/sysv_shmem.c"
fi
# Select latch implementation type.
diff -rdupN postgresql-9.4beta2/src/backend/Makefile postgresql-9.4beta2_qnx/src/backend/Makefile
--- postgresql-9.4beta2/src/backend/Makefile 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/backend/Makefile 2014-07-29 15:40:51.000000000 -0400
@@ -52,6 +52,7 @@ all: submake-libpgport submake-schemapg
ifneq ($(PORTNAME), cygwin)
ifneq ($(PORTNAME), win32)
ifneq ($(PORTNAME), aix)
+ifeq (,$(findstring qnx6, $(host_os)))
postgres: $(OBJS)
$(CC) $(CFLAGS) $(LDFLAGS) $(LDFLAGS_EX) $(export_dynamic) $(call expand_subsys,$^) $(LIBS) -o $@
@@ -59,6 +60,7 @@ postgres: $(OBJS)
endif
endif
endif
+endif
ifeq ($(PORTNAME), cygwin)
@@ -105,6 +107,14 @@ endif
endif # aix
+ifneq (,$(findstring qnx6, $(host_os)))
+
+postgres: $(OBJS)
+ $(CC) $(CFLAGS) $(LDFLAGS) $(LDFLAGS_EX) $(export_dynamic) $(call expand_subsys,$^) $(LIBS) -o $@
+ ldrel -S 3M $@
+
+endif # nto-qnx6.5.0
+
# Update the commonly used headers before building the subdirectories
$(SUBDIRS:%=%-recursive): $(top_builddir)/src/include/parser/gram.h $(top_builddir)/src/include/catalog/schemapg.h $(top_builddir)/src/include/utils/fmgroids.h $(top_builddir)/src/include/utils/errcodes.h $(top_builddir)/src/include/utils/probes.h
diff -rdupN postgresql-9.4beta2/src/backend/commands/dbcommands.c postgresql-9.4beta2_qnx/src/backend/commands/dbcommands.c
--- postgresql-9.4beta2/src/backend/commands/dbcommands.c 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/backend/commands/dbcommands.c 2014-07-29 15:40:51.000000000 -0400
@@ -348,7 +348,19 @@ createdb(const CreatedbStmt *stmt)
* nor any indexes that depend on collation or ctype, so template0 can be
* used as template for creating a database with any encoding or locale.
*/
- if (strcmp(dbtemplate, "template0") != 0)
+ if ((strcmp(dbtemplate, "template0") != 0)
+#ifdef __QNX__
+ /* KBAKER: QNX6 port has some problem here.
+ * Regression test fails when copying template1 to template0 with msg below:
+ * copying template1 to template0 FATAL: 22023:
+ * new LC_CTYPE (C;collate:POSIX;ctype:POSIX) is incompatible with
+ * the LC_CTYPE of the template database (POSIX;messages:C)
+ *
+ * For now QNX6 will live with the assumption/restriction that template1 will contain only ASCII
+ */
+ && (strcmp(dbtemplate, "template1") != 0)
+#endif
+ )
{
if (encoding != src_encoding)
ereport(ERROR,
diff -rdupN postgresql-9.4beta2/src/backend/port/posix_shmem.c postgresql-9.4beta2_qnx/src/backend/port/posix_shmem.c
--- postgresql-9.4beta2/src/backend/port/posix_shmem.c 1969-12-31 19:00:00.000000000 -0500
+++ postgresql-9.4beta2_qnx/src/backend/port/posix_shmem.c 2014-07-29 17:27:43.000000000 -0400
@@ -0,0 +1,492 @@
+/*-------------------------------------------------------------------------
+ *
+ * posix_shmem.c
+ * Implement shared memory using POSIX (non-SysV) facilities
+ *
+ * These routines represent a fairly thin layer on top of POSIX (non-SysV) shared
+ * memory functionality. Originally created for QNX6 port.
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/port/posix_shmem.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include <signal.h>
+#include <unistd.h>
+#include <sys/file.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#ifdef HAVE_SYS_IPC_H
+#include <sys/ipc.h>
+#endif
+#ifdef HAVE_SYS_SHM_H
+#include <sys/shm.h>
+#endif
+
+#include "postgres.h"
+#include "miscadmin.h"
+#include "storage/ipc.h"
+#include "storage/pg_shmem.h"
+
+
+typedef key_t IpcMemoryKey; /* shared memory key passed to shmget(2) */
+typedef int IpcMemoryId; /* shared memory ID returned by shmget(2) */
+
+#define IPCProtection (0600) /* access/modify by user only */
+
+#ifdef SHM_SHARE_MMU /* use intimate shared memory on Solaris */
+#define PG_SHMAT_FLAGS SHM_SHARE_MMU
+#else
+#define PG_SHMAT_FLAGS 0
+#endif
+
+/* Linux prefers MAP_ANONYMOUS, but the flag is called MAP_ANON on other systems. */
+#ifndef MAP_ANONYMOUS
+#define MAP_ANONYMOUS MAP_ANON
+#endif
+
+/* BSD-derived systems have MAP_HASSEMAPHORE, but it's not present (or needed) on Linux. */
+#ifndef MAP_HASSEMAPHORE
+#define MAP_HASSEMAPHORE 0
+#endif
+
+#define PG_MMAP_FLAGS (MAP_SHARED|MAP_ANONYMOUS|MAP_HASSEMAPHORE)
+
+/* Some really old systems don't define MAP_FAILED. */
+#ifndef MAP_FAILED
+#define MAP_FAILED ((void *) -1)
+#endif
+
+
+/* Variables to remember details for the USED named shared memory segment (small) */
+#define USED_SHMEM_SEG_NAME_MAX 100
+static char UsedShmemSegName[USED_SHMEM_SEG_NAME_MAX];
+static Size UsedShmemSegSize= 0;
+unsigned long UsedShmemSegID = 0;
+void *UsedShmemSegAddr = NULL;
+
+/* Variables to remember details for the MAIN anonymous shared memory segment (large) */
+static Size AnonymousShmemSize;
+static void *AnonymousShmem;
+
+static void *InternalIpcMemoryCreate(IpcMemoryKey memKey, Size size);
+static void IpcMemoryDetach(int status, Datum shmaddr);
+static void IpcMemoryDelete(int status, Datum shmId);
+static PGShmemHeader *PGSharedMemoryAttach(IpcMemoryKey key,
+ IpcMemoryId *shmid);
+
+
+static char *
+keytoname(key_t key, char *name)
+{
+ /* POSIX shared memory segment names which start with "/" are system-wide (across processes) */
+ sprintf(name, "/PgShmHdr%x", key);
+ return name;
+}
+
+/*
+ * InternalIpcMemoryCreate(memKey, size)
+ *
+ * Attempt to create a new shared memory segment with the specified key.
+ * Will fail (return NULL) if such a segment already exists. If successful,
+ * attach the segment to the current process and return its attached address.
+ * On success, callbacks are registered with on_shmem_exit to detach and
+ * delete the segment when on_shmem_exit is called.
+ *
+ * If we fail with a failure code other than collision-with-existing-segment,
+ * print out an error and abort. Other types of errors are not recoverable.
+ */
+static void *
+InternalIpcMemoryCreate(IpcMemoryKey memKey, Size size)
+{
+ IpcMemoryId shmid;
+ void *memAddress;
+
+ keytoname(memKey, UsedShmemSegName);
+
+ shmid = shm_open(UsedShmemSegName, (O_CREAT | O_EXCL | O_RDWR), 0);
+
+ if (shmid < 0)
+ {
+ int shmopen_errno = errno;
+
+ /*
+ * Fail quietly if error indicates a collision with existing segment.
+ * One would expect EEXIST, given that we said IPC_EXCL, but perhaps
+ * we could get a permission violation instead? Also, EIDRM might
+ * occur if an old seg is slated for destruction but not gone yet.
+ */
+ if (shmopen_errno == EEXIST || shmopen_errno == EACCES
+#ifdef EIDRM
+ || shmopen_errno == EIDRM
+#endif
+ )
+ return NULL;
+
+
+ /*
+ * Else complain and abort.
+ */
+ errno = shmopen_errno;
+
+ ereport(FATAL,
+ (errmsg("could not map used shared memory (%s): %m", UsedShmemSegName),
+ (shmopen_errno == ENOMEM) ?
+ errhint("This error usually means that PostgreSQL's request "
+ "for a shared memory segment exceeded available memory "
+ "or swap space. To reduce the request size (currently "
+ "%lu bytes), reduce PostgreSQL's shared memory usage, "
+ "perhaps by reducing shared_buffers or "
+ "max_connections.",
+ (unsigned long) size) : 0));
+
+ return NULL;
+ }
+
+ /* we need to set the size of the shared memory segment after creation */
+ if (ftruncate(shmid, size) < 0)
+ elog(FATAL, "ftruncate(shmid=%d, size=%lu) failed: %m", shmid, (unsigned long) size);
+
+ /* Register on-exit routine to delete the new segment */
+ on_shmem_exit(IpcMemoryDelete, Int32GetDatum(shmid));
+
+ /* OK, should be able to attach to the segment */
+ memAddress = mmap(NULL, size, (PROT_READ|PROT_WRITE), MAP_SHARED, shmid, 0);
+
+ /* remember the USED shared memory info so we can unmap and unlink (delete) it upon exit */
+ UsedShmemSegID = shmid;
+ UsedShmemSegAddr = memAddress;
+ UsedShmemSegSize = size;
+
+ if (memAddress == (void *) -1)
+ //TODO: fix elog
+ elog(FATAL, "mmap(id=%d, name=%s, size=%lu) failed: %m", shmid, UsedShmemSegName, (unsigned long)UsedShmemSegSize);
+
+ /* Register on-exit routine to detach new segment before deleting */
+ on_shmem_exit(IpcMemoryDetach, PointerGetDatum(memAddress));
+
+ /*
+ * Store shmem key and ID in data directory lockfile. Format to try to
+ * keep it the same length always (trailing junk in the lockfile won't
+ * hurt, but might confuse humans).
+ */
+ {
+ char line[64];
+
+ sprintf(line, "%9lu %9lu",
+ (unsigned long) memKey, (unsigned long) shmid);
+ AddToDataDirLockFile(LOCK_FILE_LINE_SHMEM_KEY, line);
+ }
+
+ return memAddress;
+}
+
+/* from process' address space */
+/* (called as an on_shmem_exit callback, hence funny argument list) */
+/****************************************************************************/
+static void
+IpcMemoryDetach(int status, Datum shmaddr)
+{
+ /* Release USED shared memory block, if any. */
+ if (UsedShmemSegAddr != NULL
+ && munmap(UsedShmemSegAddr, UsedShmemSegSize) < 0)
+ elog(LOG, "munmap(%p) failed: %m", UsedShmemSegAddr);
+ /* Release anonymous shared memory block, if any. */
+ if (AnonymousShmem != NULL
+ && munmap(AnonymousShmem, AnonymousShmemSize) < 0)
+ elog(LOG, "munmap(%p) failed: %m", AnonymousShmem);
+}
+
+/****************************************************************************/
+/* IpcMemoryDelete(status, shmId) deletes a shared memory segment */
+/* (called as an on_shmem_exit callback, hence funny argument list) */
+/****************************************************************************/
+static void
+IpcMemoryDelete(int status, Datum shmId)
+{
+
+ if (shm_unlink(UsedShmemSegName) < 0)
+ elog(LOG, "shm_unlink(%s) failed: %m",
+ UsedShmemSegName);
+}
+
+/*
+ * PGSharedMemoryIsInUse
+ *
+ * Is a previously-existing shmem segment still existing and in use?
+ *
+ * The point of this exercise is to detect the case where a prior postmaster
+ * crashed, but it left child backends that are still running. Therefore
+ * we only care about shmem segments that are associated with the intended
+ * DataDir. This is an important consideration since accidental matches of
+ * shmem segment IDs are reasonably common.
+ */
+bool
+PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2)
+{
+ /* TODO: Enhance for QNX? */
+ return false;
+}
+
+
+/*
+ * PGSharedMemoryCreate
+ *
+ * Create a shared memory segment of the given size and initialize its
+ * standard header. Also, register an on_shmem_exit callback to release
+ * the storage.
+ *
+ * Dead Postgres segments are recycled if found, but we do not fail upon
+ * collision with non-Postgres shmem segments. The idea here is to detect and
+ * re-use keys that may have been assigned by a crashed postmaster or backend.
+ *
+ * makePrivate means to always create a new segment, rather than attach to
+ * or recycle any existing segment.
+ *
+ * The port number is passed for possible use as a key (for SysV, we use
+ * it to generate the starting shmem key). In a standalone backend,
+ * zero will be passed.
+ */
+PGShmemHeader *
+PGSharedMemoryCreate(Size size, bool makePrivate, int port,
+/* KBAKER: added for 9.4beta */ PGShmemHeader **shim
+)
+{
+ IpcMemoryKey NextShmemSegID;
+ void *memAddress;
+ PGShmemHeader *hdr;
+ struct stat statbuf;
+ Size sysvsize = size;
+
+ /* Room for a header? */
+ Assert(size > MAXALIGN(sizeof(PGShmemHeader)));
+
+ /*
+ * As of PostgreSQL 9.3, we normally allocate only a very small amount of
+ * System V shared memory, and only for the purposes of providing an
+ * interlock to protect the data directory. The real shared memory block
+ * is allocated using mmap(). This works around the problem that many
+ * systems have very low limits on the amount of System V shared memory
+ * that can be allocated. Even a limit of a few megabytes will be enough
+ * to run many copies of PostgreSQL without needing to adjust system
+ * settings.
+ *
+ * However, we disable this logic in the EXEC_BACKEND case, and fall back
+ * to the old method of allocating the entire segment using System V
+ * shared memory, because there's no way to attach an mmap'd segment to a
+ * process after exec(). Since EXEC_BACKEND is intended only for
+ * developer use, this shouldn't be a big problem.
+ */
+#ifndef EXEC_BACKEND
+ {
+ long pagesize = sysconf(_SC_PAGE_SIZE);
+
+ /*
+ * Ensure request size is a multiple of pagesize.
+ *
+ * pagesize will, for practical purposes, always be a power of two.
+ * But just in case it isn't, we do it this way instead of using
+ * TYPEALIGN().
+ */
+ if (pagesize > 0 && size % pagesize != 0)
+ size += pagesize - (size % pagesize);
+
+ /*
+ * We assume that no one will attempt to run PostgreSQL 9.3 or later
+ * on systems that are ancient enough that anonymous shared memory is
+ * not supported, such as pre-2.4 versions of Linux. If that turns
+ * out to be false, we might need to add a run-time test here and do
+ * this only if the running kernel supports it.
+ */
+ AnonymousShmem = mmap(NULL, size, PROT_READ | PROT_WRITE, PG_MMAP_FLAGS,
+ -1, 0);
+ if (AnonymousShmem == MAP_FAILED)
+ {
+ int saved_errno = errno;
+
+ ereport(FATAL,
+ (errmsg("could not map anonymous shared memory: %m"),
+ (saved_errno == ENOMEM) ?
+ errhint("This error usually means that PostgreSQL's request "
+ "for a shared memory segment exceeded available memory "
+ "or swap space. To reduce the request size (currently "
+ "%lu bytes), reduce PostgreSQL's shared memory usage, "
+ "perhaps by reducing shared_buffers or "
+ "max_connections.",
+ (unsigned long) size) : 0));
+ }
+ AnonymousShmemSize = size;
+
+ /* Now we need only allocate a minimal-sized SysV shmem block. */
+ sysvsize = sizeof(PGShmemHeader);
+
+
+ }
+#endif
+
+ /* Make sure PGSharedMemoryAttach doesn't fail without need */
+ UsedShmemSegAddr = NULL;
+
+ /* Loop till we find a free IPC key */
+ NextShmemSegID = port * 1000;
+
+ for (NextShmemSegID++;; NextShmemSegID++)
+ {
+ /* Try to create new segment */
+ memAddress = InternalIpcMemoryCreate(NextShmemSegID, sysvsize);
+ if (memAddress)
+ break; /* successful create and attach */
+
+ /*
+ * Can only get here if some other process managed to create the same
+ * shmem key before we did. Let him have that one, loop around to try
+ * next key.
+ */
+ }
+
+ /*
+ * OK, we created a new segment. Mark it as created by this process. The
+ * order of assignments here is critical so that another Postgres process
+ * can't see the header as valid but belonging to an invalid PID!
+ */
+ hdr = (PGShmemHeader *) memAddress;
+ hdr->creatorPID = getpid();
+ hdr->magic = PGShmemMagic;
+ hdr->dsm_control = 0; /* KBAKER: Added for 9.4beta */
+
+ /* Fill in the data directory ID info, too */
+ if (stat(DataDir, &statbuf) < 0)
+ ereport(FATAL,
+ (errcode_for_file_access(),
+ errmsg("could not stat data directory \"%s\": %m",
+ DataDir)));
+ hdr->device = statbuf.st_dev;
+ hdr->inode = statbuf.st_ino;
+
+ /*
+ * Initialize space allocation status for segment.
+ */
+ hdr->totalsize = size;
+ hdr->freeoffset = MAXALIGN(sizeof(PGShmemHeader));
+ *shim = hdr; /* KBAKER: Added for 9.4beta */
+
+ /* Save info for possible future use */
+ UsedShmemSegAddr = memAddress;
+ UsedShmemSegID = (unsigned long) NextShmemSegID;
+
+ /*
+ * If AnonymousShmem is NULL here, then we're not using anonymous shared
+ * memory, and should return a pointer to the System V shared memory
+ * block. Otherwise, the System V shared memory block is only a shim, and
+ * we must return a pointer to the real block.
+ */
+ if (AnonymousShmem == NULL)
+ return hdr;
+ memcpy(AnonymousShmem, hdr, sizeof(PGShmemHeader));
+ return (PGShmemHeader *) AnonymousShmem;
+}
+
+#ifdef EXEC_BACKEND
+
+/*
+ * PGSharedMemoryReAttach
+ *
+ * Re-attach to an already existing shared memory segment. In the non
+ * EXEC_BACKEND case this is not used, because postmaster children inherit
+ * the shared memory segment attachment via fork().
+ *
+ * UsedShmemSegID and UsedShmemSegAddr are implicit parameters to this
+ * routine. The caller must have already restored them to the postmaster's
+ * values.
+ */
+void
+PGSharedMemoryReAttach(void)
+{
+ IpcMemoryId shmid;
+ void *hdr;
+ void *origUsedShmemSegAddr = UsedShmemSegAddr;
+
+ Assert(UsedShmemSegAddr != NULL);
+ Assert(IsUnderPostmaster);
+
+#ifdef __CYGWIN__
+ /* cygipc (currently) appears to not detach on exec. */
+ PGSharedMemoryDetach();
+ UsedShmemSegAddr = origUsedShmemSegAddr;
+#endif
+
+ elog(DEBUG3, "attaching to %p", UsedShmemSegAddr);
+ hdr = (void *) PGSharedMemoryAttach((IpcMemoryKey) UsedShmemSegID, &shmid);
+ if (hdr == NULL)
+ elog(FATAL, "could not reattach to shared memory (key=%d, addr=%p): %m",
+ (int) UsedShmemSegID, UsedShmemSegAddr);
+ if (hdr != origUsedShmemSegAddr)
+ elog(FATAL, "reattaching to shared memory returned unexpected address (got %p, expected %p)",
+ hdr, origUsedShmemSegAddr);
+
+ UsedShmemSegAddr = hdr; /* probably redundant */
+}
+#endif /* EXEC_BACKEND */
+
+/*
+ * PGSharedMemoryDetach
+ *
+ * Detach from the shared memory segment, if still attached. This is not
+ * intended for use by the process that originally created the segment
+ * (it will have an on_shmem_exit callback registered to do that). Rather,
+ * this is for subprocesses that have inherited an attachment and want to
+ * get rid of it.
+ */
+void
+PGSharedMemoryDetach(void)
+{
+ if (UsedShmemSegAddr != NULL
+ && munmap(UsedShmemSegAddr, UsedShmemSegSize) < 0)
+ {
+ elog(LOG, "used munmap(%p) failed: %m", UsedShmemSegAddr);
+ }
+ else
+ {
+ UsedShmemSegAddr = NULL;
+ }
+
+ /* Release anonymous shared memory block, if any. */
+ if (AnonymousShmem != NULL
+ && munmap(AnonymousShmem, AnonymousShmemSize) < 0)
+ elog(LOG, "anonymous munmap(%p) failed: %m", AnonymousShmem);
+}
+
+
+/*
+ * Attach to shared memory and make sure it has a Postgres header
+ *
+ * Returns attach address if OK, else NULL
+ */
+static PGShmemHeader *
+PGSharedMemoryAttach(IpcMemoryKey key, IpcMemoryId *shmid)
+{
+ PGShmemHeader *hdr;
+
+ keytoname(key, UsedShmemSegName);
+
+ *shmid = shm_open(UsedShmemSegName, (O_RDWR), IPCProtection);
+ if (*shmid == -1)
+ return NULL;
+
+ hdr = mmap(UsedShmemSegAddr, UsedShmemSegSize, (PROT_READ|PROT_WRITE), MAP_SHARED, *shmid, 0);
+
+ if (hdr == (PGShmemHeader *) -1)
+ return NULL; /* failed: must be some other app's */
+
+ if (hdr->magic != PGShmemMagic)
+ {
+ munmap(UsedShmemSegName, UsedShmemSegSize);
+ return NULL; /* segment belongs to a non-Postgres app */
+ }
+
+ return hdr;
+}
diff -rdupN postgresql-9.4beta2/src/backend/replication/logical/logical.c postgresql-9.4beta2_qnx/src/backend/replication/logical/logical.c
--- postgresql-9.4beta2/src/backend/replication/logical/logical.c 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/backend/replication/logical/logical.c 2014-07-29 16:16:04.000000000 -0400
@@ -168,7 +168,7 @@ StartupDecodingContext(List *output_plug
ctx->out = makeStringInfo();
ctx->prepare_write = prepare_write;
- ctx->write = do_write;
+ ctx->do_write = do_write;
ctx->output_plugin_options = output_plugin_options;
@@ -510,7 +510,7 @@ OutputPluginWrite(struct LogicalDecoding
if (!ctx->prepared_write)
elog(ERROR, "OutputPluginPrepareWrite needs to be called before OutputPluginWrite");
- ctx->write(ctx, ctx->write_location, ctx->write_xid, last_write);
+ ctx->do_write(ctx, ctx->write_location, ctx->write_xid, last_write);
ctx->prepared_write = false;
}
diff -rdupN postgresql-9.4beta2/src/backend/utils/error/elog.c postgresql-9.4beta2_qnx/src/backend/utils/error/elog.c
--- postgresql-9.4beta2/src/backend/utils/error/elog.c 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/backend/utils/error/elog.c 2014-07-29 15:40:51.000000000 -0400
@@ -3326,8 +3326,10 @@ get_errno_symbol(int errnum)
return "EAGAIN";
#endif
#ifdef EALREADY
+#if !defined(__QNX__)
case EALREADY:
return "EALREADY";
+#endif /* !defined(__QNX__) */
#endif
case EBADF:
return "EBADF";
diff -rdupN postgresql-9.4beta2/src/backend/utils/misc/postgresql.conf.sample postgresql-9.4beta2_qnx/src/backend/utils/misc/postgresql.conf.sample
--- postgresql-9.4beta2/src/backend/utils/misc/postgresql.conf.sample 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/backend/utils/misc/postgresql.conf.sample 2014-07-29 16:56:42.000000000 -0400
@@ -408,6 +408,7 @@
#log_disconnections = off
#log_duration = off
#log_error_verbosity = default # terse, default, or verbose messages
+#log_error_verbosity = verbose # terse, default, or verbose messages
#log_hostname = off
#log_line_prefix = '' # special values:
# %a = application name
diff -rdupN postgresql-9.4beta2/src/include/port.h postgresql-9.4beta2_qnx/src/include/port.h
--- postgresql-9.4beta2/src/include/port.h 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/include/port.h 2014-07-29 16:35:33.000000000 -0400
@@ -479,4 +479,97 @@ extern char *escape_single_quotes_ascii(
/* port/wait_error.c */
extern char *wait_result_to_str(int exit_status);
+
+#if defined(__QNX__)
+
+#include <sys/mman.h>
+#include <sys/select.h>
+#include <fcntl.h>
+
+/* QNX does not support sigaction SA_RESTART. We must retry interrupted calls (EINTR) */
+
+/* Helper macros, used to build our retry macros */
+#define PG_RETRY_EINTR3(exp,val,type) ({ type _tmp_rc; do _tmp_rc = (exp); while (_tmp_rc == (val) && errno == EINTR); _tmp_rc; })
+#define PG_RETRY_EINTR(exp) PG_RETRY_EINTR3(exp,-1L,long int)
+#define PG_RETRY_EINTR_FILE(exp) PG_RETRY_EINTR3(exp,NULL,FILE *)
+
+/* override calls known to return EINTR when interrupted */
+#define close(a) PG_RETRY_EINTR(close(a))
+#define fclose(a) PG_RETRY_EINTR(fclose(a))
+#define fdopen(a,b) PG_RETRY_EINTR_FILE(fdopen(a,b))
+#define fopen(a,b) PG_RETRY_EINTR_FILE(fopen(a,b))
+#define freopen(a,b,c) PG_RETRY_EINTR_FILE(freopen(a,b,c))
+#define fseek(a,b,c) PG_RETRY_EINTR(fseek(a,b,c))
+#define fseeko(a,b,c) PG_RETRY_EINTR(fseeko(a,b,c))
+#define ftruncate(a,b) PG_RETRY_EINTR(ftruncate(a,b))
+#define lseek(a,b,c) PG_RETRY_EINTR(lseek(a,b,c))
+#define open(a,b,...) ({ int _tmp_rc; do _tmp_rc = open(a,b,##__VA_ARGS__); while (_tmp_rc == (-1) && errno == EINTR); _tmp_rc; })
+#define shm_open(a,b,c) PG_RETRY_EINTR(shm_open(a,b,c))
+#define stat(a,b) PG_RETRY_EINTR(stat(a,b))
+#define unlink(a) PG_RETRY_EINTR(unlink(a))
+
+/* reads and writes can be partial, and not always return -1 on failure. retry these partials. */
+#define read(fildes,buf,nbytes) ({ \
+ ssize_t _tmp_bytes_completed = 0; \
+ while (_tmp_bytes_completed < (nbytes)) { \
+ ssize_t _tmp_rc = read(fildes, (char *)(buf) + _tmp_bytes_completed, (nbytes) - _tmp_bytes_completed); \
+ if (_tmp_rc <= 0) { \
+ if (errno == EINTR) continue; \
+ if (_tmp_bytes_completed == 0) _tmp_bytes_completed = _tmp_rc; \
+ break; \
+ } \
+ else { \
+ _tmp_bytes_completed += _tmp_rc; \
+ } \
+ } \
+ _tmp_bytes_completed; \
+ })
+#define fread(buf,size,num,fp) ({ \
+ size_t _tmp_elements_completed = 0; \
+ while (_tmp_elements_completed < (num)) { \
+ size_t _tmp_rc = fread((char *)(buf) + (_tmp_elements_completed * (size)), size, (num) - _tmp_elements_completed, fp); \
+ if (_tmp_rc <= 0) { \
+ if (errno == EINTR) continue; \
+ if (_tmp_elements_completed == 0) _tmp_elements_completed = _tmp_rc; \
+ break; \
+ } \
+ else { \
+ _tmp_elements_completed += _tmp_rc; \
+ } \
+ } \
+ _tmp_elements_completed; \
+ })
+#define write(fildes,buf,nbytes) ({ \
+ ssize_t _tmp_bytes_completed = 0; \
+ while (_tmp_bytes_completed < (nbytes)) { \
+ ssize_t _tmp_rc = write(fildes, (char *)(buf) + _tmp_bytes_completed, (nbytes) - _tmp_bytes_completed); \
+ if (_tmp_rc <= 0) { \
+ if (errno == EINTR) continue; \
+ if (_tmp_bytes_completed == 0) _tmp_bytes_completed = _tmp_rc; \
+ break; \
+ } \
+ else { \
+ _tmp_bytes_completed += _tmp_rc; \
+ } \
+ } \
+ _tmp_bytes_completed; \
+ })
+#define fwrite(buf,size,num,fp) ({ \
+ size_t _tmp_elements_completed = 0; \
+ while (_tmp_elements_completed < (num)) { \
+ size_t _tmp_rc = fwrite((char *)(buf) + (_tmp_elements_completed * (size)), size, (num) - _tmp_elements_completed, fp); \
+ if (_tmp_rc <= 0) { \
+ if (errno == EINTR) continue; \
+ if (_tmp_elements_completed == 0) _tmp_elements_completed = _tmp_rc; \
+ break; \
+ } \
+ else { \
+ _tmp_elements_completed += _tmp_rc; \
+ } \
+ } \
+ _tmp_elements_completed; \
+ })
+
+#endif /* __QNX__ */
+
#endif /* PG_PORT_H */
diff -rdupN postgresql-9.4beta2/src/include/replication/logical.h postgresql-9.4beta2_qnx/src/include/replication/logical.h
--- postgresql-9.4beta2/src/include/replication/logical.h 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/include/replication/logical.h 2014-07-29 16:15:08.000000000 -0400
@@ -49,7 +49,7 @@ typedef struct LogicalDecodingContext
* User-Provided callback for writing/streaming out data.
*/
LogicalOutputPluginWriterPrepareWrite prepare_write;
- LogicalOutputPluginWriterWrite write;
+ LogicalOutputPluginWriterWrite do_write;
/*
* Output buffer.
diff -rdupN postgresql-9.4beta2/src/include/storage/dsm_impl.h postgresql-9.4beta2_qnx/src/include/storage/dsm_impl.h
--- postgresql-9.4beta2/src/include/storage/dsm_impl.h 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/include/storage/dsm_impl.h 2014-07-29 16:27:14.000000000 -0400
@@ -32,10 +32,14 @@
#define USE_DSM_POSIX
#define DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE DSM_IMPL_POSIX
#endif
+
+#if !defined(__QNX__)
#define USE_DSM_SYSV
#ifndef DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE
#define DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE DSM_IMPL_SYSV
#endif
+#endif /* !defined(__QNX__) */
+
#define USE_DSM_MMAP
#endif
diff -rdupN postgresql-9.4beta2/src/template/linux postgresql-9.4beta2_qnx/src/template/linux
--- postgresql-9.4beta2/src/template/linux 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/template/linux 2014-07-29 15:40:51.000000000 -0400
@@ -28,3 +28,10 @@ if test "$SUN_STUDIO_CC" = "yes" ; then
;;
esac
fi
+
+case $host_os in
+ *qnx6*)
+ USE_UNNAMED_POSIX_SEMAPHORES=1
+ USE_POSIX_SHARED_MEMORY=1
+ ;;
+esac
diff -rdupN postgresql-9.4beta2/src/timezone/private.h postgresql-9.4beta2_qnx/src/timezone/private.h
--- postgresql-9.4beta2/src/timezone/private.h 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/timezone/private.h 2014-07-29 15:40:52.000000000 -0400
@@ -40,7 +40,10 @@
*/
#ifndef remove
+
+#ifndef __QNX__
extern int unlink(const char *filename);
+#endif
#define remove unlink
#endif /* !defined remove */
pg_934_qnx_20140729.patch (application/octet-stream)
diff -rdupN postgresql-9.3.4/INSTALL postgresql-9.3.4_qnx/INSTALL
--- postgresql-9.3.4/INSTALL 2014-03-17 15:44:44.000000000 -0400
+++ postgresql-9.3.4_qnx/INSTALL 2014-07-25 05:23:58.000000000 -0400
@@ -1542,3 +1542,16 @@ gmake: *** [postgres] Error 1
your DTrace installation is too old to handle probes in static
functions. You need Solaris 10u4 or newer.
+ __________________________________________________________________
+
+QNX 6.5 and QNX 6.6
+
+ PostgreSQL can be built natively on QNX 6.5 SP1 using gcc.
+ The executables will also run on QNX 6.6.
+ Changes required for QNX:
+ a. Replace all System V shared memory with POSIX named shared memory (posix_shmem.c).
+ b. port.h now includes a #ifdef __QNX__ section, where macros to retry
+ interrupted system calls (e.g., read, write) are defined.
+ This is needed because QNX does not support sigaction SA_RESTART.
+
+ ./configure --without-readline --disable-thread-safety
diff -rdupN postgresql-9.3.4/configure postgresql-9.3.4_qnx/configure
--- postgresql-9.3.4/configure 2014-03-17 15:35:47.000000000 -0400
+++ postgresql-9.3.4_qnx/configure 2014-07-25 05:20:12.000000000 -0400
@@ -2192,7 +2192,7 @@ dragonfly*) template=netbsd ;;
freebsd*) template=freebsd ;;
hpux*) template=hpux ;;
irix*) template=irix ;;
- linux*|gnu*|k*bsd*-gnu)
+ linux*|gnu*|k*bsd*-gnu|*qnx6*)
template=linux ;;
mingw*) template=win32 ;;
netbsd*) template=netbsd ;;
@@ -28856,20 +28856,27 @@ fi
# Select shared-memory implementation type.
-if test "$PORTNAME" != "win32"; then
+if test "$PORTNAME" = "win32"; then
cat >>confdefs.h <<\_ACEOF
-#define USE_SYSV_SHARED_MEMORY 1
+#define USE_WIN32_SHARED_MEMORY 1
_ACEOF
- SHMEM_IMPLEMENTATION="src/backend/port/sysv_shmem.c"
+ SHMEM_IMPLEMENTATION="src/backend/port/win32_shmem.c"
+elif test x"$USE_POSIX_SHARED_MEMORY" = x"1" ; then
+
+cat >>confdefs.h <<\_ACEOF
+#define USE_POSIX_SHARED_MEMORY 1
+_ACEOF
+
+ SHMEM_IMPLEMENTATION="src/backend/port/posix_shmem.c"
else
cat >>confdefs.h <<\_ACEOF
-#define USE_WIN32_SHARED_MEMORY 1
+#define USE_SYSV_SHARED_MEMORY 1
_ACEOF
- SHMEM_IMPLEMENTATION="src/backend/port/win32_shmem.c"
+ SHMEM_IMPLEMENTATION="src/backend/port/sysv_shmem.c"
fi
# Select latch implementation type.
diff -rdupN postgresql-9.3.4/configure.in postgresql-9.3.4_qnx/configure.in
--- postgresql-9.3.4/configure.in 2014-03-17 15:35:47.000000000 -0400
+++ postgresql-9.3.4_qnx/configure.in 2014-07-29 11:44:48.000000000 -0400
@@ -61,7 +61,7 @@ dragonfly*) template=netbsd ;;
freebsd*) template=freebsd ;;
hpux*) template=hpux ;;
irix*) template=irix ;;
- linux*|gnu*|k*bsd*-gnu)
+ linux*|gnu*|k*bsd*-gnu|*qnx6*)
template=linux ;;
mingw*) template=win32 ;;
netbsd*) template=netbsd ;;
@@ -1774,12 +1774,15 @@ fi
# Select shared-memory implementation type.
-if test "$PORTNAME" != "win32"; then
- AC_DEFINE(USE_SYSV_SHARED_MEMORY, 1, [Define to select SysV-style shared memory.])
- SHMEM_IMPLEMENTATION="src/backend/port/sysv_shmem.c"
-else
+if test "$PORTNAME" = "win32"; then
AC_DEFINE(USE_WIN32_SHARED_MEMORY, 1, [Define to select Win32-style shared memory.])
SHMEM_IMPLEMENTATION="src/backend/port/win32_shmem.c"
+elif test x"$USE_POSIX_SHARED_MEMORY" = x"1" ; then
+ AC_DEFINE(USE_POSIX_SHARED_MEMORY, 1, [Define to select POSIX-style shared memory (QNX).])
+ SHMEM_IMPLEMENTATION="src/backend/port/posix_shmem.c"
+else
+ AC_DEFINE(USE_SYSV_SHARED_MEMORY, 1, [Define to select SysV-style shared memory.])
+ SHMEM_IMPLEMENTATION="src/backend/port/sysv_shmem.c"
fi
# Select latch implementation type.
diff -rdupN postgresql-9.3.4/src/backend/Makefile postgresql-9.3.4_qnx/src/backend/Makefile
--- postgresql-9.3.4/src/backend/Makefile 2014-03-17 15:35:47.000000000 -0400
+++ postgresql-9.3.4_qnx/src/backend/Makefile 2014-07-29 15:24:23.000000000 -0400
@@ -52,6 +52,7 @@ all: submake-libpgport submake-schemapg
ifneq ($(PORTNAME), cygwin)
ifneq ($(PORTNAME), win32)
ifneq ($(PORTNAME), aix)
+ifeq (,$(findstring qnx6, $(host_os)))
postgres: $(OBJS)
$(CC) $(CFLAGS) $(LDFLAGS) $(LDFLAGS_EX) $(export_dynamic) $(call expand_subsys,$^) $(LIBS) -o $@
@@ -59,6 +60,7 @@ postgres: $(OBJS)
endif
endif
endif
+endif
ifeq ($(PORTNAME), cygwin)
@@ -115,6 +117,14 @@ endif
endif # aix
+ifneq (,$(findstring qnx6, $(host_os)))
+
+postgres: $(OBJS)
+ $(CC) $(CFLAGS) $(LDFLAGS) $(LDFLAGS_EX) $(export_dynamic) $(call expand_subsys,$^) $(LIBS) -o $@
+ ldrel -S 3M $@
+
+endif # nto-qnx6.5.0
+
# Update the commonly used headers before building the subdirectories
$(SUBDIRS:%=%-recursive): $(top_builddir)/src/include/parser/gram.h $(top_builddir)/src/include/catalog/schemapg.h $(top_builddir)/src/include/utils/fmgroids.h $(top_builddir)/src/include/utils/errcodes.h $(top_builddir)/src/include/utils/probes.h
diff -rdupN postgresql-9.3.4/src/backend/commands/dbcommands.c postgresql-9.3.4_qnx/src/backend/commands/dbcommands.c
--- postgresql-9.3.4/src/backend/commands/dbcommands.c 2014-03-17 15:35:47.000000000 -0400
+++ postgresql-9.3.4_qnx/src/backend/commands/dbcommands.c 2014-07-25 02:27:56.000000000 -0400
@@ -348,7 +348,19 @@ createdb(const CreatedbStmt *stmt)
* nor any indexes that depend on collation or ctype, so template0 can be
* used as template for creating a database with any encoding or locale.
*/
- if (strcmp(dbtemplate, "template0") != 0)
+ if ((strcmp(dbtemplate, "template0") != 0)
+#ifdef __QNX__
+ /* KBAKER: QNX6 port has some problem here.
+ * Regression test fails when copying template1 to template0 with msg below:
+ * copying template1 to template0 FATAL: 22023:
+ * new LC_CTYPE (C;collate:POSIX;ctype:POSIX) is incompatible with
+ * the LC_CTYPE of the template database (POSIX;messages:C)
+ *
+ * For now QNX6 will live with the assumption/restriction that template1 will contain only ASCII
+ */
+ && (strcmp(dbtemplate, "template1") != 0)
+#endif
+ )
{
if (encoding != src_encoding)
ereport(ERROR,
diff -rdupN postgresql-9.3.4/src/backend/port/posix_shmem.c postgresql-9.3.4_qnx/src/backend/port/posix_shmem.c
--- postgresql-9.3.4/src/backend/port/posix_shmem.c 1969-12-31 19:00:00.000000000 -0500
+++ postgresql-9.3.4_qnx/src/backend/port/posix_shmem.c 2014-07-25 02:44:39.000000000 -0400
@@ -0,0 +1,492 @@
+/*-------------------------------------------------------------------------
+ *
+ * posix_shmem.c
+ * Implement shared memory using POSIX (non-SysV) facilities
+ *
+ * These routines represent a fairly thin layer on top of POSIX (non-SysV) shared
+ * memory functionality. Originally created for QNX6 port.
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/port/posix_shmem.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef __QNX__
+#error "This file is NOT TESTED on your platform! Use at your own risk!"
+#endif
+
+#include <signal.h>
+#include <unistd.h>
+#include <sys/file.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#ifdef HAVE_SYS_IPC_H
+#include <sys/ipc.h>
+#endif
+#ifdef HAVE_SYS_SHM_H
+#include <sys/shm.h>
+#endif
+
+#include "postgres.h"
+#include "miscadmin.h"
+#include "storage/ipc.h"
+#include "storage/pg_shmem.h"
+
+
+typedef key_t IpcMemoryKey; /* shared memory key passed to shmget(2) */
+typedef int IpcMemoryId; /* shared memory ID returned by shmget(2) */
+
+#define IPCProtection (0600) /* access/modify by user only */
+
+#ifdef SHM_SHARE_MMU /* use intimate shared memory on Solaris */
+#define PG_SHMAT_FLAGS SHM_SHARE_MMU
+#else
+#define PG_SHMAT_FLAGS 0
+#endif
+
+/* Linux prefers MAP_ANONYMOUS, but the flag is called MAP_ANON on other systems. */
+#ifndef MAP_ANONYMOUS
+#define MAP_ANONYMOUS MAP_ANON
+#endif
+
+/* BSD-derived systems have MAP_HASSEMAPHORE, but it's not present (or needed) on Linux. */
+#ifndef MAP_HASSEMAPHORE
+#define MAP_HASSEMAPHORE 0
+#endif
+
+#define PG_MMAP_FLAGS (MAP_SHARED|MAP_ANONYMOUS|MAP_HASSEMAPHORE)
+
+/* Some really old systems don't define MAP_FAILED. */
+#ifndef MAP_FAILED
+#define MAP_FAILED ((void *) -1)
+#endif
+
+
+/* Variables to remember details for the USED named shared memory segment (small) */
+#define USED_SHMEM_SEG_NAME_MAX 100
+static char UsedShmemSegName[USED_SHMEM_SEG_NAME_MAX];
+static Size UsedShmemSegSize= 0;
+unsigned long UsedShmemSegID = 0;
+void *UsedShmemSegAddr = NULL;
+
+/* Variables to remember details for the MAIN anonymous shared memory segment (large) */
+static Size AnonymousShmemSize;
+static void *AnonymousShmem;
+
+static void *InternalIpcMemoryCreate(IpcMemoryKey memKey, Size size);
+static void IpcMemoryDetach(int status, Datum shmaddr);
+static void IpcMemoryDelete(int status, Datum shmId);
+static PGShmemHeader *PGSharedMemoryAttach(IpcMemoryKey key,
+ IpcMemoryId *shmid);
+
+
+static char *
+keytoname(key_t key, char *name)
+{
+ /* POSIX shared memory segment names which start with "/" are system-wide (across processes) */
+ sprintf(name, "/PgShmHdr%x", key);
+ return name;
+}
+
+/*
+ * InternalIpcMemoryCreate(memKey, size)
+ *
+ * Attempt to create a new shared memory segment with the specified key.
+ * Will fail (return NULL) if such a segment already exists. If successful,
+ * attach the segment to the current process and return its attached address.
+ * On success, callbacks are registered with on_shmem_exit to detach and
+ * delete the segment when on_shmem_exit is called.
+ *
+ * If we fail with a failure code other than collision-with-existing-segment,
+ * print out an error and abort. Other types of errors are not recoverable.
+ */
+static void *
+InternalIpcMemoryCreate(IpcMemoryKey memKey, Size size)
+{
+ IpcMemoryId shmid;
+ void *memAddress;
+
+ keytoname(memKey, UsedShmemSegName);
+
+ shmid = shm_open(UsedShmemSegName, (O_CREAT | O_EXCL | O_RDWR), 0);
+
+ if (shmid < 0)
+ {
+ int shmopen_errno = errno;
+
+ /*
+ * Fail quietly if error indicates a collision with existing segment.
+ * One would expect EEXIST, given that we said O_EXCL, but perhaps
+ * we could get a permission violation instead? Also, EIDRM might
+ * occur if an old seg is slated for destruction but not gone yet.
+ */
+ if (shmopen_errno == EEXIST || shmopen_errno == EACCES
+#ifdef EIDRM
+ || shmopen_errno == EIDRM
+#endif
+ )
+ return NULL;
+
+
+ /*
+ * Else complain and abort.
+ */
+ errno = shmopen_errno;
+
+ ereport(FATAL,
+ (errmsg("could not map used shared memory (%s): %m", UsedShmemSegName),
+ (shmopen_errno == ENOMEM) ?
+ errhint("This error usually means that PostgreSQL's request "
+ "for a shared memory segment exceeded available memory "
+ "or swap space. To reduce the request size (currently "
+ "%lu bytes), reduce PostgreSQL's shared memory usage, "
+ "perhaps by reducing shared_buffers or "
+ "max_connections.",
+ (unsigned long) size) : 0));
+
+ return NULL;
+ }
+
+ /* we need to set the size of the shared memory segment after creation */
+ if (ftruncate(shmid, size) < 0)
+ elog(FATAL, "ftruncate(shmid=%d, size=%lu) failed: %m", shmid, (unsigned long) size);
+
+ /* Register on-exit routine to delete the new segment */
+ on_shmem_exit(IpcMemoryDelete, Int32GetDatum(shmid));
+
+ /* OK, should be able to attach to the segment */
+ memAddress = mmap(NULL, size, (PROT_READ|PROT_WRITE), MAP_SHARED, shmid, 0);
+
+ /* remember the USED shared memory info so we can unmap and unlink (delete) it upon exit */
+ UsedShmemSegID = shmid;
+ UsedShmemSegAddr = memAddress;
+ UsedShmemSegSize = size;
+
+ if (memAddress == (void *) -1)
+ /* TODO: fix elog */
+ elog(FATAL, "mmap(id=%d, name=%s, size=%lu) failed: %m", shmid, UsedShmemSegName, (unsigned long)UsedShmemSegSize);
+
+ /* Register on-exit routine to detach new segment before deleting */
+ on_shmem_exit(IpcMemoryDetach, PointerGetDatum(memAddress));
+
+ /*
+ * Store shmem key and ID in data directory lockfile. Format to try to
+ * keep it the same length always (trailing junk in the lockfile won't
+ * hurt, but might confuse humans).
+ */
+ {
+ char line[64];
+
+ sprintf(line, "%9lu %9lu",
+ (unsigned long) memKey, (unsigned long) shmid);
+ AddToDataDirLockFile(LOCK_FILE_LINE_SHMEM_KEY, line);
+ }
+
+ return memAddress;
+}
+
+/* IpcMemoryDetach(status, shmaddr) detaches a shared memory segment */
+/* from the process' address space */
+/* (called as an on_shmem_exit callback, hence funny argument list) */
+static void
+IpcMemoryDetach(int status, Datum shmaddr)
+{
+ /* Release USED shared memory block, if any. */
+ if (UsedShmemSegAddr != NULL
+ && munmap(UsedShmemSegAddr, UsedShmemSegSize) < 0)
+ elog(LOG, "munmap(%p) failed: %m", UsedShmemSegAddr);
+ /* Release anonymous shared memory block, if any. */
+ if (AnonymousShmem != NULL
+ && munmap(AnonymousShmem, AnonymousShmemSize) < 0)
+ elog(LOG, "munmap(%p) failed: %m", AnonymousShmem);
+}
+
+/****************************************************************************/
+/* IpcMemoryDelete(status, shmId) deletes a shared memory segment */
+/* (called as an on_shmem_exit callback, hence funny argument list) */
+/****************************************************************************/
+static void
+IpcMemoryDelete(int status, Datum shmId)
+{
+
+ if (shm_unlink(UsedShmemSegName) < 0)
+ elog(LOG, "shm_unlink(%s) failed: %m",
+ UsedShmemSegName);
+}
+
+/*
+ * PGSharedMemoryIsInUse
+ *
+ * Is a previously-existing shmem segment still existing and in use?
+ *
+ * The point of this exercise is to detect the case where a prior postmaster
+ * crashed, but it left child backends that are still running. Therefore
+ * we only care about shmem segments that are associated with the intended
+ * DataDir. This is an important consideration since accidental matches of
+ * shmem segment IDs are reasonably common.
+ */
+bool
+PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2)
+{
+ /* TODO: Enhance for QNX? */
+ return false;
+}
+
+
+/*
+ * PGSharedMemoryCreate
+ *
+ * Create a shared memory segment of the given size and initialize its
+ * standard header. Also, register an on_shmem_exit callback to release
+ * the storage.
+ *
+ * Dead Postgres segments are recycled if found, but we do not fail upon
+ * collision with non-Postgres shmem segments. The idea here is to detect and
+ * re-use keys that may have been assigned by a crashed postmaster or backend.
+ *
+ * makePrivate means to always create a new segment, rather than attach to
+ * or recycle any existing segment.
+ *
+ * The port number is passed for possible use as a key (for SysV, we use
+ * it to generate the starting shmem key). In a standalone backend,
+ * zero will be passed.
+ */
+PGShmemHeader *
+PGSharedMemoryCreate(Size size, bool makePrivate, int port)
+{
+ IpcMemoryKey NextShmemSegID;
+ void *memAddress;
+ PGShmemHeader *hdr;
+ struct stat statbuf;
+ Size sysvsize = size;
+
+ /* Room for a header? */
+ Assert(size > MAXALIGN(sizeof(PGShmemHeader)));
+
+ /*
+ * As of PostgreSQL 9.3, we normally allocate only a very small amount of
+ * System V shared memory, and only for the purposes of providing an
+ * interlock to protect the data directory. The real shared memory block
+ * is allocated using mmap(). This works around the problem that many
+ * systems have very low limits on the amount of System V shared memory
+ * that can be allocated. Even a limit of a few megabytes will be enough
+ * to run many copies of PostgreSQL without needing to adjust system
+ * settings.
+ *
+ * However, we disable this logic in the EXEC_BACKEND case, and fall back
+ * to the old method of allocating the entire segment using System V
+ * shared memory, because there's no way to attach an mmap'd segment to a
+ * process after exec(). Since EXEC_BACKEND is intended only for
+ * developer use, this shouldn't be a big problem.
+ */
+#ifndef EXEC_BACKEND
+ {
+ long pagesize = sysconf(_SC_PAGE_SIZE);
+
+ /*
+ * Ensure request size is a multiple of pagesize.
+ *
+ * pagesize will, for practical purposes, always be a power of two.
+ * But just in case it isn't, we do it this way instead of using
+ * TYPEALIGN().
+ */
+ if (pagesize > 0 && size % pagesize != 0)
+ size += pagesize - (size % pagesize);
+
+ /*
+ * We assume that no one will attempt to run PostgreSQL 9.3 or later
+ * on systems that are ancient enough that anonymous shared memory is
+ * not supported, such as pre-2.4 versions of Linux. If that turns
+ * out to be false, we might need to add a run-time test here and do
+ * this only if the running kernel supports it.
+ */
+ AnonymousShmem = mmap(NULL, size, PROT_READ | PROT_WRITE, PG_MMAP_FLAGS,
+ -1, 0);
+ if (AnonymousShmem == MAP_FAILED)
+ {
+ int saved_errno = errno;
+
+ ereport(FATAL,
+ (errmsg("could not map anonymous shared memory: %m"),
+ (saved_errno == ENOMEM) ?
+ errhint("This error usually means that PostgreSQL's request "
+ "for a shared memory segment exceeded available memory "
+ "or swap space. To reduce the request size (currently "
+ "%lu bytes), reduce PostgreSQL's shared memory usage, "
+ "perhaps by reducing shared_buffers or "
+ "max_connections.",
+ (unsigned long) size) : 0));
+ }
+ AnonymousShmemSize = size;
+
+ /* Now we need only allocate a minimal-sized SysV shmem block. */
+ sysvsize = sizeof(PGShmemHeader);
+
+
+ }
+#endif
+
+ /* Make sure PGSharedMemoryAttach doesn't fail without need */
+ UsedShmemSegAddr = NULL;
+
+ /* Loop till we find a free IPC key */
+ NextShmemSegID = port * 1000;
+
+ for (NextShmemSegID++;; NextShmemSegID++)
+ {
+ /* Try to create new segment */
+ memAddress = InternalIpcMemoryCreate(NextShmemSegID, sysvsize);
+ if (memAddress)
+ break; /* successful create and attach */
+
+ /*
+ * Can only get here if some other process managed to create the same
+ * shmem key before we did. Let him have that one, loop around to try
+ * next key.
+ */
+ }
+
+ /*
+ * OK, we created a new segment. Mark it as created by this process. The
+ * order of assignments here is critical so that another Postgres process
+ * can't see the header as valid but belonging to an invalid PID!
+ */
+ hdr = (PGShmemHeader *) memAddress;
+ hdr->creatorPID = getpid();
+ hdr->magic = PGShmemMagic;
+
+ /* Fill in the data directory ID info, too */
+ if (stat(DataDir, &statbuf) < 0)
+ ereport(FATAL,
+ (errcode_for_file_access(),
+ errmsg("could not stat data directory \"%s\": %m",
+ DataDir)));
+ hdr->device = statbuf.st_dev;
+ hdr->inode = statbuf.st_ino;
+
+ /*
+ * Initialize space allocation status for segment.
+ */
+ hdr->totalsize = size;
+ hdr->freeoffset = MAXALIGN(sizeof(PGShmemHeader));
+
+ /* Save info for possible future use */
+ UsedShmemSegAddr = memAddress;
+ UsedShmemSegID = (unsigned long) NextShmemSegID;
+
+ /*
+ * If AnonymousShmem is NULL here, then we're not using anonymous shared
+ * memory, and should return a pointer to the System V shared memory
+ * block. Otherwise, the System V shared memory block is only a shim, and
+ * we must return a pointer to the real block.
+ */
+ if (AnonymousShmem == NULL)
+ return hdr;
+ memcpy(AnonymousShmem, hdr, sizeof(PGShmemHeader));
+ return (PGShmemHeader *) AnonymousShmem;
+}
+
+#ifdef EXEC_BACKEND
+
+/*
+ * PGSharedMemoryReAttach
+ *
+ * Re-attach to an already existing shared memory segment. In the non
+ * EXEC_BACKEND case this is not used, because postmaster children inherit
+ * the shared memory segment attachment via fork().
+ *
+ * UsedShmemSegID and UsedShmemSegAddr are implicit parameters to this
+ * routine. The caller must have already restored them to the postmaster's
+ * values.
+ */
+void
+PGSharedMemoryReAttach(void)
+{
+ IpcMemoryId shmid;
+ void *hdr;
+ void *origUsedShmemSegAddr = UsedShmemSegAddr;
+
+ Assert(UsedShmemSegAddr != NULL);
+ Assert(IsUnderPostmaster);
+
+#ifdef __CYGWIN__
+ /* cygipc (currently) appears to not detach on exec. */
+ PGSharedMemoryDetach();
+ UsedShmemSegAddr = origUsedShmemSegAddr;
+#endif
+
+ elog(DEBUG3, "attaching to %p", UsedShmemSegAddr);
+ hdr = (void *) PGSharedMemoryAttach((IpcMemoryKey) UsedShmemSegID, &shmid);
+ if (hdr == NULL)
+ elog(FATAL, "could not reattach to shared memory (key=%d, addr=%p): %m",
+ (int) UsedShmemSegID, UsedShmemSegAddr);
+ if (hdr != origUsedShmemSegAddr)
+ elog(FATAL, "reattaching to shared memory returned unexpected address (got %p, expected %p)",
+ hdr, origUsedShmemSegAddr);
+
+ UsedShmemSegAddr = hdr; /* probably redundant */
+}
+#endif /* EXEC_BACKEND */
+
+/*
+ * PGSharedMemoryDetach
+ *
+ * Detach from the shared memory segment, if still attached. This is not
+ * intended for use by the process that originally created the segment
+ * (it will have an on_shmem_exit callback registered to do that). Rather,
+ * this is for subprocesses that have inherited an attachment and want to
+ * get rid of it.
+ */
+void
+PGSharedMemoryDetach(void)
+{
+ if (UsedShmemSegAddr != NULL
+ && munmap(UsedShmemSegAddr, UsedShmemSegSize) < 0)
+ {
+ elog(LOG, "used munmap(%p) failed: %m", UsedShmemSegAddr);
+ }
+ else
+ {
+ UsedShmemSegAddr = NULL;
+ }
+
+ /* Release anonymous shared memory block, if any. */
+ if (AnonymousShmem != NULL
+ && munmap(AnonymousShmem, AnonymousShmemSize) < 0)
+ elog(LOG, "anonymous munmap(%p) failed: %m", AnonymousShmem);
+}
+
+
+/*
+ * Attach to shared memory and make sure it has a Postgres header
+ *
+ * Returns attach address if OK, else NULL
+ */
+static PGShmemHeader *
+PGSharedMemoryAttach(IpcMemoryKey key, IpcMemoryId *shmid)
+{
+ PGShmemHeader *hdr;
+
+ keytoname(key, UsedShmemSegName);
+
+ *shmid = shm_open(UsedShmemSegName, (O_RDWR), IPCProtection);
+ if (*shmid == -1)
+ return NULL;
+
+ hdr = mmap(UsedShmemSegAddr, UsedShmemSegSize, (PROT_READ|PROT_WRITE), MAP_SHARED, *shmid, 0);
+
+ if (hdr == (PGShmemHeader *) -1)
+ return NULL; /* failed: must be some other app's */
+
+ if (hdr->magic != PGShmemMagic)
+ {
+ munmap(hdr, UsedShmemSegSize);
+ return NULL; /* segment belongs to a non-Postgres app */
+ }
+
+ return hdr;
+}
diff -rdupN postgresql-9.3.4/src/backend/utils/error/elog.c postgresql-9.3.4_qnx/src/backend/utils/error/elog.c
--- postgresql-9.3.4/src/backend/utils/error/elog.c 2014-03-17 15:35:47.000000000 -0400
+++ postgresql-9.3.4_qnx/src/backend/utils/error/elog.c 2014-07-25 02:11:57.000000000 -0400
@@ -3024,8 +3024,10 @@ get_errno_symbol(int errnum)
return "EAGAIN";
#endif
#ifdef EALREADY
+#if !defined(__QNX__)
case EALREADY:
return "EALREADY";
+#endif /* !defined(__QNX__) */
#endif
case EBADF:
return "EBADF";
diff -rdupN postgresql-9.3.4/src/include/port.h postgresql-9.3.4_qnx/src/include/port.h
--- postgresql-9.3.4/src/include/port.h 2014-03-17 15:35:47.000000000 -0400
+++ postgresql-9.3.4_qnx/src/include/port.h 2014-07-25 04:30:55.000000000 -0400
@@ -472,4 +472,96 @@ extern char *escape_single_quotes_ascii(
/* port/wait_error.c */
extern char *wait_result_to_str(int exit_status);
+
+#if defined(__QNX__)
+
+#include <sys/select.h>
+#include <fcntl.h>
+
+/* QNX does not support sigaction SA_RESTART. We must retry interrupted calls (EINTR) */
+
+/* Helper macros, used to build our retry macros */
+#define PG_RETRY_EINTR3(exp,val,type) ({ type _tmp_rc; do _tmp_rc = (exp); while (_tmp_rc == (val) && errno == EINTR); _tmp_rc; })
+#define PG_RETRY_EINTR(exp) PG_RETRY_EINTR3(exp,-1L,long int)
+#define PG_RETRY_EINTR_FILE(exp) PG_RETRY_EINTR3(exp,NULL,FILE *)
+
+/* override calls known to return EINTR when interrupted */
+#define close(a) PG_RETRY_EINTR(close(a))
+#define fclose(a) PG_RETRY_EINTR(fclose(a))
+#define fdopen(a,b) PG_RETRY_EINTR_FILE(fdopen(a,b))
+#define fopen(a,b) PG_RETRY_EINTR_FILE(fopen(a,b))
+#define freopen(a,b,c) PG_RETRY_EINTR_FILE(freopen(a,b,c))
+#define fseek(a,b,c) PG_RETRY_EINTR(fseek(a,b,c))
+#define fseeko(a,b,c) PG_RETRY_EINTR(fseeko(a,b,c))
+#define ftruncate(a,b) PG_RETRY_EINTR(ftruncate(a,b))
+#define lseek(a,b,c) PG_RETRY_EINTR(lseek(a,b,c))
+#define open(a,b,...) ({ int _tmp_rc; do _tmp_rc = open(a,b,##__VA_ARGS__); while (_tmp_rc == (-1) && errno == EINTR); _tmp_rc; })
+#define shm_open(a,b,c) PG_RETRY_EINTR(shm_open(a,b,c))
+#define stat(a,b) PG_RETRY_EINTR(stat(a,b))
+#define unlink(a) PG_RETRY_EINTR(unlink(a))
+
+/* reads and writes can be partial, and not always return -1 on failure. retry these partials. */
+#define read(fildes,buf,nbytes) ({ \
+ ssize_t _tmp_bytes_completed = 0; \
+ while (_tmp_bytes_completed < (nbytes)) { \
+ ssize_t _tmp_rc = read(fildes, (char *)(buf) + _tmp_bytes_completed, (nbytes) - _tmp_bytes_completed); \
+ if (_tmp_rc <= 0) { \
+ if (errno == EINTR) continue; \
+ if (_tmp_bytes_completed == 0) _tmp_bytes_completed = _tmp_rc; \
+ break; \
+ } \
+ else { \
+ _tmp_bytes_completed += _tmp_rc; \
+ } \
+ } \
+ _tmp_bytes_completed; \
+ })
+#define fread(buf,size,num,fp) ({ \
+ size_t _tmp_elements_completed = 0; \
+ while (_tmp_elements_completed < (num)) { \
+ size_t _tmp_rc = fread((char *)(buf) + (_tmp_elements_completed * (size)), size, (num) - _tmp_elements_completed, fp); \
+ if (_tmp_rc <= 0) { \
+ if (errno == EINTR) continue; \
+ if (_tmp_elements_completed == 0) _tmp_elements_completed = _tmp_rc; \
+ break; \
+ } \
+ else { \
+ _tmp_elements_completed += _tmp_rc; \
+ } \
+ } \
+ _tmp_elements_completed; \
+ })
+#define write(fildes,buf,nbytes) ({ \
+ ssize_t _tmp_bytes_completed = 0; \
+ while (_tmp_bytes_completed < (nbytes)) { \
+ ssize_t _tmp_rc = write(fildes, (char *)(buf) + _tmp_bytes_completed, (nbytes) - _tmp_bytes_completed); \
+ if (_tmp_rc <= 0) { \
+ if (errno == EINTR) continue; \
+ if (_tmp_bytes_completed == 0) _tmp_bytes_completed = _tmp_rc; \
+ break; \
+ } \
+ else { \
+ _tmp_bytes_completed += _tmp_rc; \
+ } \
+ } \
+ _tmp_bytes_completed; \
+ })
+#define fwrite(buf,size,num,fp) ({ \
+ size_t _tmp_elements_completed = 0; \
+ while (_tmp_elements_completed < (num)) { \
+ size_t _tmp_rc = fwrite((char *)(buf) + (_tmp_elements_completed * (size)), size, (num) - _tmp_elements_completed, fp); \
+ if (_tmp_rc <= 0) { \
+ if (errno == EINTR) continue; \
+ if (_tmp_elements_completed == 0) _tmp_elements_completed = _tmp_rc; \
+ break; \
+ } \
+ else { \
+ _tmp_elements_completed += _tmp_rc; \
+ } \
+ } \
+ _tmp_elements_completed; \
+ })
+
+#endif /* __QNX__ */
+
#endif /* PG_PORT_H */
diff -rdupN postgresql-9.3.4/src/template/linux postgresql-9.3.4_qnx/src/template/linux
--- postgresql-9.3.4/src/template/linux 2014-03-17 15:35:47.000000000 -0400
+++ postgresql-9.3.4_qnx/src/template/linux 2014-07-25 02:57:09.000000000 -0400
@@ -28,3 +28,10 @@ if test "$SUN_STUDIO_CC" = "yes" ; then
;;
esac
fi
+
+case $host_os in
+ *qnx6*)
+ USE_UNNAMED_POSIX_SEMAPHORES=1
+ USE_POSIX_SHARED_MEMORY=1
+ ;;
+esac
diff -rdupN postgresql-9.3.4/src/timezone/private.h postgresql-9.3.4_qnx/src/timezone/private.h
--- postgresql-9.3.4/src/timezone/private.h 2014-03-17 15:35:47.000000000 -0400
+++ postgresql-9.3.4_qnx/src/timezone/private.h 2014-07-25 00:55:23.000000000 -0400
@@ -40,7 +40,10 @@
*/
#ifndef remove
+
+#ifndef __QNX__
extern int unlink(const char *filename);
+#endif
#define remove unlink
#endif /* !defined remove */
"Baker, Keith [OCDUS Non-J&J]" <KBaker9@its.jnj.com> writes:
If there are existing tests I can run to ensure the QNX port meets your criteria for robust failure handling in this area I would be happy to run them.
If not, perhaps someone can provide a quick list of failure modes to consider.
As-is:
- starting a second postmaster fails with the message 'FATAL: lock file "postmaster.pid" already exists'
- Kill -9 of postmaster followed by a pg_ctl start seems to go through recovery, although the original shared memory segments hang out in /dev/shmem until reboot (that could be better).
Unfortunately, that probably proves it's broken rather than that it works.
The behavior we need is that after kill -9'ing the postmaster, subsequent
postmaster start attempts *fail* until all the original postmaster's child
processes are gone. Otherwise you end up with two independent sets of
processes scribbling on the same files (and not sharing shmem either).
Kiss consistency goodbye ...
It's possible that all the children automatically exited, especially if
you had only background processes active; but if you had a live regular
session it would not exit just because the parent process died.
regards, tom lane
On Tue, Jul 29, 2014 at 7:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think it would be good to spend some energy figuring out what to do
about this.
Well, we've been around on this multiple times before, but if we have
any new ideas, sure ...
Well, I tried to compile a more comprehensive list of possible
techniques in that email than I've seen anyone post before.
Still, it's not clear to me how we could put much faith in flock.
Yeah, after some more research, I think you're right. Apparently, as
recently as 2010, the Linux kernel transparently converted flock()
requests to fcntl()-style locks when running on NFS:
http://0pointer.de/blog/projects/locking.html
Maybe someday this will be reliable enough to use, but the odds of it
happening in the next decade don't look good.
Finally, how about named pipes? Linux says that trying to open a
named pipe for write when there are no readers will return ENXIO, and
attempting to write to an already-open pipe with no remaining readers
will cause SIGPIPE. So: create a permanent named pipe in the data
directory that all PostgreSQL processes keep open. When the
postmaster starts, it opens the pipe for read, then for write, then
closes it for read. It then tries to write to the pipe. If this
fails to result in SIGPIPE, then somebody else has got the thing open;
so the new postmaster should die at once. But if it does get a SIGPIPE
then there are as of that moment no other readers.
Hm. That particular protocol is broken: two postmasters doing it at the
same time would both pass (because neither has it open for read at the
instant where they try to write). But we could possibly frob the idea
until it works. Bigger question is how portable is this behavior?
I see named pipes (fifos) in SUS v2, which is our usual baseline
assumption about what's portable across Unixen, so maybe it would work.
But does NFS support named pipes?
Looks iffy, on a quick search. Sigh.
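For concreteness, the probe sketched above might look roughly like the following. This is purely an illustrative sketch: the fifo path, the non-blocking opens, and the error handling are assumptions, and it still has the race noted above where two postmasters probing at the same time could both conclude the fifo is free.

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Probe a fifo in the data directory: if writing to it fails with EPIPE
 * (SIGPIPE ignored), no other process currently has it open for read.
 */
static bool
fifo_appears_free(const char *fifo_path)
{
	int			rfd,
				wfd;
	ssize_t		rc;

	(void) mkfifo(fifo_path, 0600);		/* ok if it already exists */

	rfd = open(fifo_path, O_RDONLY | O_NONBLOCK);	/* so the write open won't block */
	wfd = open(fifo_path, O_WRONLY | O_NONBLOCK);
	if (rfd < 0 || wfd < 0)
		return false;					/* be conservative on any failure */
	close(rfd);							/* drop our own read side */

	signal(SIGPIPE, SIG_IGN);			/* turn SIGPIPE into EPIPE */
	rc = write(wfd, "x", 1);

	if (rc < 0 && errno == EPIPE)
	{
		/*
		 * Looks free.  A real postmaster would now reopen the fifo for read
		 * and hold it open for its whole lifetime, so that later probes by
		 * other postmasters see a reader and fail.
		 */
		close(wfd);
		return true;
	}

	close(wfd);
	return false;						/* somebody else has it open */
}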
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Jul 29, 2014 at 7:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm. That particular protocol is broken: two postmasters doing it at the
same time would both pass (because neither has it open for read at the
instant where they try to write). But we could possibly frob the idea
until it works. Bigger question is how portable is this behavior?
I see named pipes (fifos) in SUS v2, which is our usual baseline
assumption about what's portable across Unixen, so maybe it would work.
But does NFS support named pipes?
Looks iffy, on a quick search. Sigh.
I poked around, and it seems like a lot of the people who think it's flaky
are imagining that they should be able to use a named pipe on an NFS
server to pass data between two different machines. That doesn't work,
but it's not what we need, either. For communication between processes
on the same server, all that's needed is that the filesystem entry looks
like a pipe to the local kernel --- and that's been required NFS
functionality since RFC1813 (v3, in 1995).
So it seems like we could possibly go this route, assuming we can think
of a variant of your proposal that's race-condition-free. A disadvantage
compared to a true file lock is that it would not protect against people
trying to start postmasters from two different NFS client machines --- but
we don't have protection against that now. (Maybe we could do this *and*
do a regular file lock to offer some protection against that case, even if
it's not bulletproof?)
regards, tom lane
Robert and Tom,
Please let me know if either of you are ready to experiment with the "named pipe" idea anytime soon.
If not, I would be happy to take a crack at it, but would appreciate your expert advice to start me down the right path (files/functions to update, pseudo-code, etc.).
-Keith Baker
"Baker, Keith [OCDUS Non-J&J]" <KBaker9@its.jnj.com> writes:
Please let me know if either of you are ready to experiment with the "named pipe" idea anytime soon.
If not, I would be happy to take a crack at it, but would appreciate your expert advice to start me down the right path (files/functions to update, pseudo-code, etc.).
Well, before we start coding anything, the first order of business would
be to think of a bulletproof locking protocol using the available pipe
operations. Robert's straw man isn't that, but it seems like there might
be one in there.
regards, tom lane
On Wed, Jul 30, 2014 at 11:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So it seems like we could possibly go this route, assuming we can think
of a variant of your proposal that's race-condition-free. A disadvantage
compared to a true file lock is that it would not protect against people
trying to start postmasters from two different NFS client machines --- but
we don't have protection against that now. (Maybe we could do this *and*
do a regular file lock to offer some protection against that case, even if
it's not bulletproof?)
That's not a bad idea. By the way, it also wouldn't be too hard to
test at runtime whether or not flock() has first-close semantics. Not
that we'd want this exact design, but suppose you configure
shmem_interlock=flock in postgresql.conf. On startup, we test whether
flock is reliable, determine that it is, and proceed accordingly.
Now, you move your database onto an NFS volume and the semantics
change (because, hey, breaking userspace assumptions is fun) and try
to restart up your database, and it says FATAL: flock() is broken.
Now you can either move the database back, or set shmem_interlock to
some other value.
Now maybe, as you say, it's best to use multiple locking protocols and
hope that at least one will catch whatever the dangerous situation is.
I'm just trying to point out that we need not blindly assume the
semantics we want are there (or that they are not); we can check.
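A startup check along those lines could be as simple as the sketch below. It is illustrative only; the probe file name, the fork-based second locker, and the pessimistic error handling are assumptions rather than a worked-out design.

#include <fcntl.h>
#include <stdbool.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Return true if flock() keeps a lock when an unrelated descriptor on the
 * same file is closed (BSD semantics), false if closing any descriptor
 * drops it (fcntl()-like emulation, as seen on some NFS setups).
 */
static bool
flock_is_reliable(const char *probe_path)
{
	int			fd1,
				fd2,
				status;
	pid_t		child;

	fd1 = open(probe_path, O_RDWR | O_CREAT, 0600);
	fd2 = open(probe_path, O_RDWR);
	if (fd1 < 0 || fd2 < 0 || flock(fd1, LOCK_EX) < 0)
		return false;			/* be pessimistic on any failure */

	close(fd2);					/* the questionable operation */

	child = fork();
	if (child == 0)
	{
		/* a different process must NOT be able to take the lock now */
		int			fd3 = open(probe_path, O_RDWR);

		_exit(flock(fd3, LOCK_EX | LOCK_NB) == 0 ? 1 : 0);
	}
	waitpid(child, &status, 0);
	close(fd1);
	unlink(probe_path);

	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}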
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
I will be on vacation until August 11; I look forward to any progress you are able to make.
Since ensuring there are not orphaned back-end processes is vital, could we add a check for getppid() == 1 ?
Patch below seemed to work on QNX (first client command after a kill -9 of postmaster resulted in exit of its associated server process).
diff -rdup postgresql-9.3.5/src/backend/tcop/postgres.c postgresql-9.3.5_qnx/src/backend/tcop/postgres.c
--- postgresql-9.3.5/src/backend/tcop/postgres.c 2014-07-21 15:10:42.000000000 -0400
+++ postgresql-9.3.5_qnx/src/backend/tcop/postgres.c 2014-07-31 18:17:40.000000000 -0400
@@ -3967,6 +3967,14 @@ PostgresMain(int argc, char *argv[],
*/
firstchar = ReadCommand(&input_message);
+#ifndef WIN32
+ /* Check for death of parent */
+ if (getppid() == 1)
+ ereport(FATAL,
+ (errcode(ERRCODE_CRASH_SHUTDOWN),
+ errmsg("Parent server process has exited")));
+#endif
+
/*
* (4) disable async signal conditions again.
*/
Keith Baker
"Baker, Keith [OCDUS Non-J&J]" <KBaker9@its.jnj.com> writes:
Since ensuring there are not orphaned back-end processes is vital, could we add a check for getppid() == 1 ?
No. Or yeah, we could, but that patch would add no security worth
mentioning. For example, someone could launch a query that runs for
many minutes, and would have plenty of time to conflict with a
subsequently-started postmaster.
Even without that issue, there's no consensus that forcibly making
orphan backends exit would be a good thing. (Some people would
like to have such an option, but the key word in that sentence is
"option".)
regards, tom lane
On Thu, Jul 31, 2014 at 9:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Baker, Keith [OCDUS Non-J&J]" <KBaker9@its.jnj.com> writes:
Since ensuring there are not orphaned back-end processes is vital, could we add a check for getppid() == 1 ?
No. Or yeah, we could, but that patch would add no security worth
mentioning. For example, someone could launch a query that runs for
many minutes, and would have plenty of time to conflict with a
subsequently-started postmaster.
True.
Even without that issue, there's no consensus that forcibly making
orphan backends exit would be a good thing. (Some people would
like to have such an option, but the key word in that sentence is
"option".)
I believe that multiple people have said multiple times that we should
change the behavior so that orphaned backends exit immediately; I
think you are the only one defending the current behavior. There are
several problems with the status quo:
1. Most seriously, once the postmaster is gone, there's nobody to
SIGQUIT remaining backends if somebody exits uncleanly. This means
that a backend running without a postmaster could be running in a
corrupt shared memory segment, which could lead to all sorts of
misbehavior, including possible data corruption.
2. Operationally, orphaned backends prevent the system from being
restarted. There's no easy, automatic way to kill them, so scripts
that automatically restart the database server if it exits don't work.
Even if letting the remaining backends continue to operate is good,
not being able to accept new connections is bad enough to completely
overshadow it. In many situations, killing them is a small price to
pay to get the system back on line.
3. Practically, the performance of any remaining backends will be
poor, because processes like the WAL writer and background writer
aren't going to be around to help any more. I think this will only
get worse over time; certainly, any future parallel query facility
won't work if the postmaster isn't around to fork new children. And
maybe we'll have other utility processes over time, too. But in any
case the situation isn't great right now, either.
Now, I don't say that any of this is a reason not to have a strong
shared memory interlock, but I'm quite unconvinced that the current
behavior should even be optional, let alone the default.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 08/04/2014 07:54 AM, Robert Haas wrote:
1. Most seriously, once the postmaster is gone, there's nobody to
SIGQUIT remaining backends if somebody exits uncleanly. This means
that a backend running without a postmaster could be running in a
corrupt shared memory segment, which could lead to all sorts of
misbehavior, including possible data corruption.
I've seen this in the field.
2. Operationally, orphaned backends prevent the system from being
restarted. There's no easy, automatic way to kill them, so scripts
that automatically restart the database server if it exits don't work.
I've also seen this in the field.
Now, I don't say that any of this is a reason not to have a strong
shared memory interlock, but I'm quite unconvinced that the current
behavior should even be optional, let alone the default.
I always assumed that the current behavior existed because we *couldn't*
fix it, not because anybody wanted it.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 2014-08-04 10:54:25 -0400, Robert Haas wrote:
On Thu, Jul 31, 2014 at 9:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Even without that issue, there's no consensus that forcibly making
orphan backends exit would be a good thing. (Some people would
like to have such an option, but the key word in that sentence is
"option".)I believe that multiple people have said multiple times that we should
change the behavior so that orphaned backends exit immediately; I
think you are the only one defending the current behavior. There are
several problems with the status quo:
+1. I think the current behaviour is a seriously bad idea.
Greetings,
Andres Freund
Andres Freund <andres@2ndquadrant.com> writes:
On 2014-08-04 10:54:25 -0400, Robert Haas wrote:
I believe that multiple people have said multiple times that we should
change the behavior so that orphaned backends exit immediately; I
think you are the only one defending the current behavior. There are
several problems with the status quo:
+1. I think the current behaviour is a seriously bad idea.
I don't think it's anywhere near as black-and-white as you guys claim.
What it comes down to is whether allowing existing transactions/sessions
to finish is more important than allowing new sessions to start.
Depending on the application, either could be more important.
Ideally we'd have some way to configure the behavior appropriately for
a given installation; but short of that, it's unclear to me that
unilaterally changing the system's bias is something our users would
thank us for. I've not noticed a large groundswell of complaints about
it (though this may just reflect that we've made the postmaster pretty
darn robust, so that the case seldom comes up).
regards, tom lane
On 2014-08-09 14:00:49 -0400, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2014-08-04 10:54:25 -0400, Robert Haas wrote:
I believe that multiple people have said multiple times that we should
change the behavior so that orphaned backends exit immediately; I
think you are the only one defending the current behavior. There are
several problems with the status quo:
+1. I think the current behaviour is a seriously bad idea.
I don't think it's anywhere near as black-and-white as you guys claim.
What it comes down to is whether allowing existing transactions/sessions
to finish is more important than allowing new sessions to start.
Depending on the application, either could be more important.
Nah. The current behaviour circumvents security measures we normally
consider absolutely essential. If the postmaster died some bad shit went
on. The likelihood of hitting corner case bugs where it's important that
we react to a segfault/panic with a restart/crash replay is rather high.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
On 2014-08-09 14:00:49 -0400, Tom Lane wrote:
I don't think it's anywhere near as black-and-white as you guys claim.
What it comes down to is whether allowing existing transactions/sessions
to finish is more important than allowing new sessions to start.
Depending on the application, either could be more important.
Nah. The current behaviour circumvents security measures we normally
consider absolutely essential. If the postmaster died some bad shit went
on. The likelihood of hitting corner case bugs where it's important that
we react to a segfault/panic with a restart/crash replay is rather high.
What's your point? Once a new postmaster starts, it *will* do a crash
restart, because certainly no shutdown checkpoint ever happened. The
only issue here is what grace period existing orphaned backends are given
to finish their work --- and it's not possible for the answer to that
to be "zero", so you don't get to assume that nothing happens in
backend-land after the instant of postmaster crash.
regards, tom lane
On 2014-08-09 14:09:36 -0400, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2014-08-09 14:00:49 -0400, Tom Lane wrote:
I don't think it's anywhere near as black-and-white as you guys claim.
What it comes down to is whether allowing existing transactions/sessions
to finish is more important than allowing new sessions to start.
Depending on the application, either could be more important.Nah. The current behaviour circumvents security measures we normally
consider absolutely essential. If the postmaster died some bad shit went
on. The likelihood of hitting corner case bugs where it's important that
we react to a segfault/panic with a restart/crash replay is rather high.
What's your point? Once a new postmaster starts, it *will* do a crash
restart, because certainly no shutdown checkpoint ever happened.
That's not saying much. For one, there can be online checkpoints in that
time. So it's certainly not guaranteed (or even all that likely) that
all the WAL since the incident is replayed. For another, it can be
*hours* before all the backends finish.
IIRC we'll continue to happily write WAL and everything after postmaster
(and possibly some backends, corrupting shmem) have crashed. The
bgwriter, checkpointer, backends will continue to write dirty buffers to
disk. We'll IIRC continue to write checkpoints. That's simply not
things we should be doing after postmaster crashed if we can avoid at
all.
The
only issue here is what grace period existing orphaned backends are given
to finish their work --- and it's not possible for the answer to that
to be "zero", so you don't get to assume that nothing happens in
backend-land after the instant of postmaster crash.
Sure. But I don't think a window in the range of seconds comes close to
being the same as a window that easily can be hours.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[Due for a new subject line?]
On Sat, Aug 09, 2014 at 08:16:01PM +0200, Andres Freund wrote:
On 2014-08-09 14:09:36 -0400, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2014-08-09 14:00:49 -0400, Tom Lane wrote:
I don't think it's anywhere near as black-and-white as you guys claim.
What it comes down to is whether allowing existing transactions/sessions
to finish is more important than allowing new sessions to start.
Depending on the application, either could be more important.
Nah. The current behaviour circumvents security measures we normally
consider absolutely essential. If the postmaster died some bad shit went
on. The likelihood of hitting corner case bugs where it's important that
we react to a segfault/panic with a restart/crash replay is rather high.
What's your point? Once a new postmaster starts, it *will* do a crash
restart, because certainly no shutdown checkpoint ever happened.
That's not saying much. For one, there can be online checkpoints in that
time. So it's certainly not guaranteed (or even all that likely) that
all the WAL since the incident is replayed. For another, it can be
*hours* before all the backends finish.
IIRC we'll continue to happily write WAL and everything after postmaster
(and possibly some backends, corrupting shmem) have crashed. The
bgwriter, checkpointer, backends will continue to write dirty buffers to
disk. We'll IIRC continue to write checkpoints. That's simply not
things we should be doing after postmaster crashed if we can avoid at
all.
The basic support processes, including the checkpointer, exit promptly upon
detecting a postmaster exit. Checkpoints cease. Your central point still
stands. WAL protects data integrity only to the extent that we stop writing
it after shared memory ceases to be trustworthy. Crash recovery of WAL
written based on corrupt buffers just reproduces the corruption.
The
only issue here is what grace period existing orphaned backends are given
to finish their work --- and it's not possible for the answer to that
to be "zero", so you don't get to assume that nothing happens in
backend-land after the instant of postmaster crash.
Our grace period for active backends after unclean exit of one of their peers
is low, milliseconds to seconds. Our grace period for active backends after
unclean exit of the postmaster is unconstrained. At least one of those
policies has to be wrong. Like Andres and Robert, I pick the second one.
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
* Noah Misch (noah@leadboat.com) wrote:
[Due for a new subject line?]
Probably.
Our grace period for active backends after unclean exit of one of their peers
is low, milliseconds to seconds. Our grace period for active backends after
unclean exit of the postmaster is unconstrained. At least one of those
policies has to be wrong. Like Andres and Robert, I pick the second one.
Ditto for me. The postmaster going away really is a bad sign and the
confusion due to leftover processes is terrible for our users.
Thanks,
Stephen
Stephen Frost <sfrost@snowman.net> wrote:
Our grace period for active backends after unclean exit of one
of their peers is low, milliseconds to seconds. Our grace
period for active backends after unclean exit of the postmaster
is unconstrained. At least one of those policies has to be
wrong. Like Andres and Robert, I pick the second one.
Ditto for me.
+1
In fact, I would say that is slightly understated. The grace
period for active backends after unclean exit of one of their peers
is low, milliseconds to seconds, *unless the postmaster has also
crashed* -- in which case it is unconstrained. Why is the crash of
a backend less serious if the postmaster has also crashed?
Certainly it can't be considered to be surprising that if the
postmaster is crashing that other backends might be also crashing
around the same time?
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2014-08-10 18:36:18 -0400, Noah Misch wrote:
[Due for a new subject line?]
On Sat, Aug 09, 2014 at 08:16:01PM +0200, Andres Freund wrote:
On 2014-08-09 14:09:36 -0400, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2014-08-09 14:00:49 -0400, Tom Lane wrote:
I don't think it's anywhere near as black-and-white as you guys claim.
What it comes down to is whether allowing existing transactions/sessions
to finish is more important than allowing new sessions to start.
Depending on the application, either could be more important.
Nah. The current behaviour circumvents security measures we normally
consider absolutely essential. If the postmaster died some bad shit went
on. The likelihood of hitting corner case bugs where it's important that
we react to a segfault/panic with a restart/crash replay is rather high.
What's your point? Once a new postmaster starts, it *will* do a crash
restart, because certainly no shutdown checkpoint ever happened.
That's not saying much. For one, there can be online checkpoints in that
time. So it's certainly not guaranteed (or even all that likely) that
all the WAL since the incident is replayed. For another, it can be
*hours* before all the backends finish.
IIRC we'll continue to happily write WAL and everything after postmaster
(and possibly some backends, corrupting shmem) have crashed. The
bgwriter, checkpointer, backends will continue to write dirty buffers to
disk. We'll IIRC continue to write checkpoints. That's simply not
things we should be doing after postmaster crashed if we can avoid at
all.
The basic support processes, including the checkpointer, exit promptly upon
detecting a postmaster exit. Checkpoints cease.
Only after finishing an 'in process' checkpoint though afaics. And only
if no new checkpoint has been requested since. The latter because we
don't even test for postmaster death if a latch has been set... I think
it's similar for the bgwriter and such.
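For reference, the main loops of the auxiliary processes are shaped roughly like the sketch below (simplified, not actual source; the exact flags, latch variable, and ordering differ by process and release). The comment marks the gap being discussed: if the latch is already set, the wait can return without ever having looked at the postmaster pipe.

#include "postgres.h"

#include "storage/latch.h"
#include "storage/proc.h"

/* Simplified shape of an auxiliary-process main loop (illustration only) */
static void
aux_process_main_loop(long sleep_ms)
{
	for (;;)
	{
		int			rc;

		/* ... one round of this process's work goes here ... */

		rc = WaitLatch(&MyProc->procLatch,
					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
					   sleep_ms);
		ResetLatch(&MyProc->procLatch);

		/*
		 * Emergency bailout if the postmaster has died.  If the latch was
		 * already set, WaitLatch() may have returned at once without
		 * checking the postmaster pipe, so this flag can be missed even
		 * though the postmaster is gone.
		 */
		if (rc & WL_POSTMASTER_DEATH)
			exit(1);
	}
}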
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Aug 9, 2014 at 2:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
+1. I think the current behaviour is a seriously bad idea.
I don't think it's anywhere near as black-and-white as you guys claim.
What it comes down to is whether allowing existing transactions/sessions
to finish is more important than allowing new sessions to start.
Depending on the application, either could be more important.
It's partly about that, and I think the answer is that being able to
start new sessions is almost always more important; but it's also
about about the fact that the postmaster provides essential
protections against data corruption, and running without those
protections is a bad idea. If it's not a bad idea, then why do we
need those protections ever? Why have we put so much effort into
bullet-proofing them over the years?
I mean, we could simply regard the unexpected end of a backend as
being something that is "probably OK" and we'd usually be right; after
all, a backend wouldn't crap out without releasing a critical spinlock
very often. A lot of users would probably be very happy to be
liberated from the tyranny of a server-wide restart every time a
backend crashes, and 90% of the time nothing bad would happen. But
clearly this is insanity, because every now and then something would
go terribly wrong and there would be no automated way for the system
to recover, and on even rarer occasions your data would get eaten.
That is why it is right to think that the service provided by the
postmaster is essential, not nice-to-have.
Ideally we'd have some way to configure the behavior appropriately for
a given installation; but short of that, it's unclear to me that
unilaterally changing the system's bias is something our users would
thank us for. I've not noticed a large groundswell of complaints about
it (though this may just reflect that we've made the postmaster pretty
darn robust, so that the case seldom comes up).
I do think that's a large part of it. The postmaster doesn't get
killed very often, and when it does, things are often messed up to a
degree where the user's just going to reboot anyway. But I've
encountered customers who managed to corrupt their database because
backends didn't exit when the postmaster died, because it turns out
that removing postmaster.pid defeats the shared memory interlocks that
normally prevent starting a new postmaster, and the customer did that.
And I've personally experienced at least one protracted outage that
resulted from orphaned backends preventing 'pg_ctl restart' from
working. If the postmaster weren't so reliable, I'm sure these kinds
of problems would be a lot more common.
But the fact that they're uncommon doesn't mean that the current
behavior is the best one, and I'm convinced that it isn't.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert and Tom,
I assume you guys are working on other priorities, so I did some locking experiments on QNX.
I know fcntl() locking has downsides, but I think it deserves a second look:
- it is POSIX, so should be fairly consistent across platforms (at least more consistent than lockf and flock)
- the "accidental" open/close lock release can be easily avoided (simply don't add new code which touches the new, unique lock file)
- don't know if it will work on NFS, but that is not a priority for me (is that really a requirement for a QNX port?)
Existing System V shared memory locking can be left in place for all existing platforms (so nothing lost), while fcntl()-style locking could be limited to platforms which lack System V shared memory (like QNX).
Experimental patch is attached, but logic is basically this:
a. postmaster obtains exclusive lock on data dir file "postmaster.fcntl" (or FATAL)
b. postmaster then downgrades to shared lock (or FATAL)
c. all other backend processes obtain shared lock on this file (or FATAL)
A quick test on QNX 6.5 appeared to behave well (orphan backends left behind after kill -9 of postmaster held their locks, thus database restart was prevented as desired).
Let me know if there are other test scenarios to consider.
Thanks!
-Keith Baker
-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Thursday, July 31, 2014 12:58 PM
To: Tom Lane
Cc: Baker, Keith [OCDUS Non-J&J]; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
On Wed, Jul 30, 2014 at 11:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So it seems like we could possibly go this route, assuming we can
think of a variant of your proposal that's race-condition-free. A
disadvantage compared to a true file lock is that it would not protect
against people trying to start postmasters from two different NFS
client machines --- but we don't have protection against that now.
(Maybe we could do this *and* do a regular file lock to offer some
protection against that case, even if it's not bulletproof?)
That's not a bad idea. By the way, it also wouldn't be too hard to test at
runtime whether or not flock() has first-close semantics. Not that we'd want
this exact design, but suppose you configure shmem_interlock=flock in
postgresql.conf. On startup, we test whether flock is reliable, determine
that it is, and proceed accordingly.
Now, you move your database onto an NFS volume and the semantics
change (because, hey, breaking userspace assumptions is fun) and try to
restart your database, and it says FATAL: flock() is broken.
Now you can either move the database back, or set shmem_interlock to
some other value.
Now maybe, as you say, it's best to use multiple locking protocols and hope
that at least one will catch whatever the dangerous situation is.
I'm just trying to point out that we need not blindly assume the semantics we
want are there (or that they are not); we can check.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
Company
Attachment: fcntl_lock_20140813.patch (application/octet-stream)
diff -rdup postgresql-9.4beta2/src/backend/bootstrap/bootstrap.c postgresql-9.4beta2_qnx/src/backend/bootstrap/bootstrap.c
--- postgresql-9.4beta2/src/backend/bootstrap/bootstrap.c 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/backend/bootstrap/bootstrap.c 2014-08-13 13:07:27.000000000 -0400
@@ -352,6 +352,11 @@ AuxiliaryProcessMain(int argc, char *arg
SetProcessingMode(BootstrapProcessing);
IgnoreSystemIndexes = true;
+
+ /* obtain shared lock using fcntl (or FATAL error) */
+ CreateDataDirLockFcntl(false);
+
+
/* Initialize MaxBackends (if under postmaster, was done already) */
if (!IsUnderPostmaster)
InitializeMaxBackends();
diff -rdup postgresql-9.4beta2/src/backend/postmaster/postmaster.c postgresql-9.4beta2_qnx/src/backend/postmaster/postmaster.c
--- postgresql-9.4beta2/src/backend/postmaster/postmaster.c 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/backend/postmaster/postmaster.c 2014-08-13 17:21:06.000000000 -0400
@@ -881,6 +881,9 @@ PostmasterMain(int argc, char *argv[])
*/
CreateDataDirLockFile(true);
+ /* postmaster gets exclusive lock then holds shared lock (or FATAL error) */
+ CreateDataDirLockFcntl(true);
+
/*
* Initialize SSL library, if specified.
*/
diff -rdup postgresql-9.4beta2/src/backend/tcop/postgres.c postgresql-9.4beta2_qnx/src/backend/tcop/postgres.c
--- postgresql-9.4beta2/src/backend/tcop/postgres.c 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/backend/tcop/postgres.c 2014-08-13 17:23:00.000000000 -0400
@@ -3679,6 +3679,9 @@ PostgresMain(int argc, char *argv[],
/* Initialize MaxBackends (if under postmaster, was done already) */
InitializeMaxBackends();
}
+
+ /* obtain shared lock using fcntl (or FATAL error) */
+ CreateDataDirLockFcntl(false);
/* Early initialization */
BaseInit();
diff -rdup postgresql-9.4beta2/src/backend/utils/init/miscinit.c postgresql-9.4beta2_qnx/src/backend/utils/init/miscinit.c
--- postgresql-9.4beta2/src/backend/utils/init/miscinit.c 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/backend/utils/init/miscinit.c 2014-08-13 17:23:35.000000000 -0400
@@ -47,6 +47,7 @@
#define DIRECTORY_LOCK_FILE "postmaster.pid"
+#define DIRECTORY_LOCK_FCNTL "postmaster.fcntl"
ProcessingMode Mode = InitProcessing;
@@ -1257,3 +1258,64 @@ pg_bindtextdomain(const char *domain)
}
#endif
}
+
+
+/*
+ * Create a lock using fcntl.
+ *
+ */
+void
+CreateDataDirLockFcntl(bool amPostmaster)
+{
+ int fd = -1;
+ struct flock fl;
+
+ fd = open(DIRECTORY_LOCK_FCNTL, O_RDWR | O_CREAT, 0600);
+ if (fd < 0)
+ {
+ ereport(FATAL,
+ (errcode_for_file_access(),
+ errmsg("KBAKER process [%d] could not open fcntl lock file \"%s\": %m",
+ getpid(), DIRECTORY_LOCK_FCNTL)));
+ }
+
+ if (amPostmaster)
+ {
+ /* postmaster must start with exclusive lock to ensure no locks exist from old processes. */
+ fl.l_type = F_WRLCK;
+ fl.l_whence = SEEK_SET;
+ fl.l_start = 0;
+ fl.l_len = 0;
+
+ if (fcntl(fd, F_SETLK, &fl) == -1)
+ {
+ ereport(FATAL,
+ (errcode_for_file_access(),
+ errmsg("KBAKER process [%d] could not exclusive lock \"%s\": %m",
+ getpid(), DIRECTORY_LOCK_FCNTL)));
+
+ }
+
+ ereport(LOG,
+ (errmsg("KBAKER: process [%d] exclusive lock OK \"%s\"", getpid(), DIRECTORY_LOCK_FCNTL)));
+
+ /* postmaster now downgrades to a read lock held for the lifetime of the process, allowing other processes to obtain read locks */
+ }
+
+ /* obtain read lock, to be held for lifetime of process. */
+ fl.l_type = F_RDLCK;
+ fl.l_whence = SEEK_SET;
+ fl.l_start = 0;
+ fl.l_len = 0;
+
+ if (fcntl(fd, F_SETLK, &fl) == -1)
+ {
+ ereport(FATAL,
+ (errcode_for_file_access(),
+ errmsg("KBAKER process [%d] could not share lock \"%s\": %m",
+ getpid(), DIRECTORY_LOCK_FCNTL)));
+ }
+
+ ereport(LOG,
+ (errmsg("KBAKER: process [%d] share lock OK \"%s\"", getpid(), DIRECTORY_LOCK_FCNTL)));
+}
diff -rdup postgresql-9.4beta2/src/include/miscadmin.h postgresql-9.4beta2_qnx/src/include/miscadmin.h
--- postgresql-9.4beta2/src/include/miscadmin.h 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/include/miscadmin.h 2014-08-13 17:17:19.000000000 -0400
@@ -425,6 +425,7 @@ extern char *local_preload_libraries_str
#define LOCK_FILE_LINE_SHMEM_KEY 7
extern void CreateDataDirLockFile(bool amPostmaster);
+extern void CreateDataDirLockFcntl(bool amPostmaster);
extern void CreateSocketLockFile(const char *socketfile, bool amPostmaster,
const char *socketDir);
extern void TouchSocketLockFiles(void);
"Baker, Keith [OCDUS Non-J&J]" <KBaker9@its.jnj.com> writes:
I assume you guys are working on other priorities, so I did some locking experiments on QNX.
I know fcntl() locking has downsides, but I think it deserves a second look:
- it is POSIX, so should be fairly consistent across platforms (at least more consistent than lockf and flock)
- the "accidental" open/close lock release can be easily avoided (simply don't add new code which touches the new, unique lock file)
I guess you didn't read the previous discussion. Asserting that it's
"easy to avoid" an accidental unlock doesn't make it true. In the case of
a PG backend, we have to expect that people will run random code inside,
say, plperlu or plpythonu functions. And it doesn't seem unlikely that
someone might scan the entire PGDATA directory tree as part of, for
example, a backup or archiving operation. If we had full control of
everything that ever happens in a PG backend process then *maybe* we could
have adequate confidence that we'd never lose the lock, but we don't.
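(To spell out the hazard: POSIX says the first close() of *any* descriptor a
process has for the locked file releases all of that process's locks on it.
A minimal illustration, not taken from any patch; the function and path names
are made up:)
#include <fcntl.h>
#include <unistd.h>

/* Demonstrates the POSIX "first close drops the lock" rule for fcntl() locks. */
static int
fcntl_lock_demo(const char *lockpath)
{
    struct flock fl;
    int         fd1 = open(lockpath, O_RDWR | O_CREAT, 0600);
    int         fd2;

    if (fd1 < 0)
        return -1;
    fl.l_type = F_WRLCK;        /* exclusive record lock on the whole file */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    if (fcntl(fd1, F_SETLK, &fl) == -1)
        return -1;

    /* some unrelated code (plperlu, a backup script, ...) touches the file */
    fd2 = open(lockpath, O_RDONLY);
    close(fd2);                 /* this close() silently releases the lock we
                                 * still think we hold through fd1 */
    return 0;
}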
regards, tom lane
Tom,
I appreciate your patience and explanation. (I am new to PostgreSQL hacking. I have read many old posts but not all of it sticks, sorry).
I know QNX support is not high on your TODO list, so I am trying to keep the effort moving without being a distraction.
Couldn't backend "random code" corrupt any file in the PGDATA dir?
Perhaps the new fcntl lock file could be kept outside PGDATA directory tree to make likelihood of backend "random code" interference remote.
This could be present and used only on systems without System V shared memory (QNX), leaving existing platforms unaffected.
I know this falls short of perfect, but perhaps is good enough to get the QNX port off the ground.
I would rather have a QNX port with reasonable restrictions than no port at all.
Also, I will try to experiment with named pipe locking as Robert had suggested.
Thanks again for your feedback, I really do appreciate it.
-Keith Baker
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, August 13, 2014 7:05 PM
To: Baker, Keith [OCDUS Non-J&J]
Cc: Robert Haas; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
"Baker, Keith [OCDUS Non-J&J]" <KBaker9@its.jnj.com> writes:
I assume you guys are working on other priorities, so I did some locking
experiments on QNX.
I know fcntl() locking has downsides, but I think it deserves a second look:
- it is POSIX, so should be fairly consistent across platforms (at
least more consistent than lockf and flock)
- the "accidental" open/close lock release can be easily avoided
(simply don't add new code which touches the new, unique lock file)
I guess you didn't read the previous discussion. Asserting that it's "easy to
avoid" an accidental unlock doesn't make it true. In the case of a PG
backend, we have to expect that people will run random code inside, say,
plperlu or plpythonu functions. And it doesn't seem unlikely that someone
might scan the entire PGDATA directory tree as part of, for example, a
backup or archiving operation. If we had full control of everything that ever
happens in a PG backend process then *maybe* we could have adequate
confidence that we'd never lose the lock, but we don't.
regards, tom lane
Tom and Robert,
I tried a combination of PIPE lock and file lock (fcntl) as Tom had suggested.
Attached experimental patch has this logic...
Postmaster :
- get exclusive fcntl lock (to guard against race condition in PIPE-based lock)
- check PIPE for any existing readers
- open PIPE for read
All other backends:
- get shared fcntl lock
- open PIPE for read
Your feedback is appreciated.
Thanks.
-Keith Baker
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
owner@postgresql.org] On Behalf Of Tom Lane
Sent: Wednesday, July 30, 2014 11:02 AM
To: Robert Haas
Cc: Baker, Keith [OCDUS Non-J&J]; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Jul 29, 2014 at 7:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm. That particular protocol is broken: two postmasters doing it at
the same time would both pass (because neither has it open for read
at the instant where they try to write). But we could possibly frob
the idea until it works. Bigger question is how portable is this behavior?
I see named pipes (fifos) in SUS v2, which is our usual baseline
assumption about what's portable across Unixen, so maybe it would work.
But does NFS support named pipes?
Looks iffy, on a quick search. Sigh.
I poked around, and it seems like a lot of the people who think it's flaky are
imagining that they should be able to use a named pipe on an NFS server to
pass data between two different machines. That doesn't work, but it's not
what we need, either. For communication between processes on the same
server, all that's needed is that the filesystem entry looks like a pipe to the
local kernel --- and that's been required NFS functionality since RFC1813 (v3,
in 1995).
So it seems like we could possibly go this route, assuming we can think of a
variant of your proposal that's race-condition-free. A disadvantage
compared to a true file lock is that it would not protect against people trying
to start postmasters from two different NFS client machines --- but we don't
have protection against that now. (Maybe we could do this *and* do a
regular file lock to offer some protection against that case, even if it's not
bulletproof?)
regards, tom lane
Attachment: locking_20140814.patch (application/octet-stream)
diff -rdup postgresql-9.4beta2/src/backend/bootstrap/bootstrap.c postgresql-9.4beta2_qnx/src/backend/bootstrap/bootstrap.c
--- postgresql-9.4beta2/src/backend/bootstrap/bootstrap.c 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/backend/bootstrap/bootstrap.c 2014-08-14 11:02:14.000000000 -0400
@@ -352,6 +352,11 @@ AuxiliaryProcessMain(int argc, char *arg
SetProcessingMode(BootstrapProcessing);
IgnoreSystemIndexes = true;
+
+ /* obtain shared file-based locks (or FATAL error) */
+ CreateDataDirLockFiles(false);
+
+
/* Initialize MaxBackends (if under postmaster, was done already) */
if (!IsUnderPostmaster)
InitializeMaxBackends();
diff -rdup postgresql-9.4beta2/src/backend/postmaster/postmaster.c postgresql-9.4beta2_qnx/src/backend/postmaster/postmaster.c
--- postgresql-9.4beta2/src/backend/postmaster/postmaster.c 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/backend/postmaster/postmaster.c 2014-08-14 11:03:25.000000000 -0400
@@ -881,6 +881,9 @@ PostmasterMain(int argc, char *argv[])
*/
CreateDataDirLockFile(true);
+ /* postmaster gets exclusive file-based locks then holds shared lock (or FATAL error) */
+ CreateDataDirLockFiles(true);
+
/*
* Initialize SSL library, if specified.
*/
diff -rdup postgresql-9.4beta2/src/backend/tcop/postgres.c postgresql-9.4beta2_qnx/src/backend/tcop/postgres.c
--- postgresql-9.4beta2/src/backend/tcop/postgres.c 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/backend/tcop/postgres.c 2014-08-14 11:01:31.000000000 -0400
@@ -3679,6 +3679,9 @@ PostgresMain(int argc, char *argv[],
/* Initialize MaxBackends (if under postmaster, was done already) */
InitializeMaxBackends();
}
+
+ /* obtain shared file-based locks (or FATAL error) */
+ CreateDataDirLockFiles(false);
/* Early initialization */
BaseInit();
diff -rdup postgresql-9.4beta2/src/backend/utils/init/miscinit.c postgresql-9.4beta2_qnx/src/backend/utils/init/miscinit.c
--- postgresql-9.4beta2/src/backend/utils/init/miscinit.c 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/backend/utils/init/miscinit.c 2014-08-14 11:19:14.000000000 -0400
@@ -47,6 +47,8 @@
#define DIRECTORY_LOCK_FILE "postmaster.pid"
+#define DIRECTORY_LOCK_FCNTL "postmaster.fcntl"
+#define DIRECTORY_LOCK_PIPE "postmaster.pipe"
ProcessingMode Mode = InitProcessing;
@@ -1257,3 +1259,131 @@ pg_bindtextdomain(const char *domain)
}
#endif
}
+
+
+/*
+ * Create a lock using fcntl.
+ *
+ */
+void
+CreateDataDirLockFcntl(bool amPostmaster)
+{
+ int fd = -1;
+ struct flock fl;
+
+ fd = open(DIRECTORY_LOCK_FCNTL, O_RDWR | O_CREAT, 0600);
+ if (fd < 0)
+ {
+ ereport(FATAL,
+ (errcode_for_file_access(),
+ errmsg("KBAKER process [%d] could not open fcntl lock file \"%s\": %m",
+ getpid(), DIRECTORY_LOCK_FCNTL)));
+ }
+
+ if (amPostmaster)
+ {
+ /* postmaster must start with exclusive lock to ensure no locks exist from old processes. */
+ fl.l_type = F_WRLCK;
+ fl.l_whence = SEEK_SET;
+ fl.l_start = 0;
+ fl.l_len = 0;
+
+ if (fcntl(fd, F_SETLK, &fl) == -1)
+ {
+ ereport(FATAL,
+ (errcode_for_file_access(),
+ errmsg("KBAKER process [%d] could not exclusive lock \"%s\": %m",
+ getpid(), DIRECTORY_LOCK_FCNTL)));
+
+ }
+
+ ereport(LOG,
+ (errmsg("KBAKER: process [%d] exclusive lock OK \"%s\"", getpid(), DIRECTORY_LOCK_FCNTL)));
+
+ /* postmaster now downgrades to a read lock held for the lifetime of the process, allowing other processes to obtain read locks */
+ }
+
+ /* obtain read lock, to be held for lifetime of process. */
+ fl.l_type = F_RDLCK;
+ fl.l_whence = SEEK_SET;
+ fl.l_start = 0;
+ fl.l_len = 0;
+
+ if (fcntl(fd, F_SETLK, &fl) == -1)
+ {
+ ereport(FATAL,
+ (errcode_for_file_access(),
+ errmsg("KBAKER process [%d] could not share lock \"%s\": %m",
+ getpid(), DIRECTORY_LOCK_FCNTL)));
+ }
+
+ ereport(LOG,
+ (errmsg("KBAKER: process [%d] share lock OK \"%s\"", getpid(), DIRECTORY_LOCK_FCNTL)));
+}
+
+/*
+ * Create a lock using pipe.
+ *
+ */
+void
+CreateDataDirLockPipe(bool amPostmaster)
+{
+ int fd_read = -1;
+ int fd_write = -1;
+
+ /* create the pipe if not there */
+ mkfifo(DIRECTORY_LOCK_PIPE, 0600);
+
+ if (amPostmaster)
+ {
+ /* try to open pipe for write nonblock. error with ENXIO indicates no readers, which is good. */
+ fd_write = open(DIRECTORY_LOCK_PIPE, O_WRONLY | O_NONBLOCK);
+
+ if (!((fd_write < 0) && (errno == ENXIO)))
+ {
+ ereport(FATAL,
+ (errcode_for_file_access(),
+ errmsg("KBAKER process [%d] no ENXIO upon open for write PIPE lock file \"%s\": %m",
+ getpid(), DIRECTORY_LOCK_PIPE)));
+ }
+
+ if (fd_write > -1)
+ {
+ close(fd_write);
+ }
+ ereport(LOG,
+ (errmsg("KBAKER: process [%d] ENXIO check OK \"%s\"", getpid(), DIRECTORY_LOCK_PIPE)));
+ }
+
+ /* open pipe for read forever, so other processes know someone is active */
+ fd_read = open(DIRECTORY_LOCK_PIPE, O_RDONLY | O_NONBLOCK);
+
+ if (fd_read < 0)
+ {
+ ereport(FATAL,
+ (errcode_for_file_access(),
+ errmsg("KBAKER process [%d] could not open for read PIPE lock file \"%s\": %m",
+ getpid(), DIRECTORY_LOCK_PIPE)));
+ }
+
+ ereport(LOG,
+ (errmsg("KBAKER: process [%d] open for read OK \"%s\"", getpid(), DIRECTORY_LOCK_PIPE)));
+
+}
+
+/*
+ * Create lock files
+ *
+ */
+void
+CreateDataDirLockFiles(bool amPostmaster)
+{
+ /* first obtain fcntl-based lock.
+ * Lock could be accidentally broken by random backend code open/close of the lock file,
+ * but should hold until postmaster obtains secure pipe-based lock.
+ * This prevents race condition of 2 postmasters starting at same time. */
+ CreateDataDirLockFcntl(amPostmaster);
+
+ /* now create the more secure pipe-based lock */
+ CreateDataDirLockPipe(amPostmaster);
+}
diff -rdup postgresql-9.4beta2/src/include/miscadmin.h postgresql-9.4beta2_qnx/src/include/miscadmin.h
--- postgresql-9.4beta2/src/include/miscadmin.h 2014-07-21 15:07:50.000000000 -0400
+++ postgresql-9.4beta2_qnx/src/include/miscadmin.h 2014-08-14 10:45:29.000000000 -0400
@@ -425,6 +425,9 @@ extern char *local_preload_libraries_str
#define LOCK_FILE_LINE_SHMEM_KEY 7
extern void CreateDataDirLockFile(bool amPostmaster);
+extern void CreateDataDirLockFiles(bool amPostmaster);
+extern void CreateDataDirLockFcntl(bool amPostmaster);
+extern void CreateDataDirLockPipe(bool amPostmaster);
extern void CreateSocketLockFile(const char *socketfile, bool amPostmaster,
const char *socketDir);
extern void TouchSocketLockFiles(void);
On Thu, Aug 14, 2014 at 12:08 PM, Baker, Keith [OCDUS Non-J&J]
<KBaker9@its.jnj.com> wrote:
I tried a combination of PIPE lock and file lock (fcntl) as Tom had suggested.
Attached experimental patch has this logic...
Postmaster :
- get exclusive fcntl lock (to guard against race condition in PIPE-based lock)
- check PIPE for any existing readers
- open PIPE for read
All other backends:
- get shared fcntl lock
- open PIPE for read
Hmm. This seems like it might almost work. But I don't see why the
other backends need to care about fcntl() at all. How about this
locking protocol:
Postmaster:
1. Acquire an exclusive lock on some file in the data directory, maybe
the control file, using fcntl().
2. Open the named pipe for read.
3. Open the named pipe for write.
4. Close the named pipe for read.
5. Install a signal handler for SIGPIPE which sets a global variable.
6. Try to write to the pipe.
7. Check that the variable is set; if not, FATAL.
8. Revert SIGPIPE handler.
9. Close the named pipe for write.
10. Open the named pipe for read.
11. Release the fcntl() lock acquired in step 1.
Regular backends don't need to do anything special, except that they
need to make sure that the file descriptor opened in step 8 gets
inherited by the right set of processes. That means that the
close-on-exec flag should be turned on in the postmaster; except in
EXEC_BACKEND builds, where it should be turned off but then turned on
again by child processes before they do anything that might fork.
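(In other words, something like this on the postmaster side for the
non-EXEC_BACKEND case -- a sketch, not a patch:)
#include <fcntl.h>

/* Sketch: mark the inherited pipe descriptor close-on-exec so that programs
 * exec()ed from backends don't keep the interlock alive. */
static void
mark_cloexec(int fd)
{
    int         flags = fcntl(fd, F_GETFD);

    if (flags >= 0)
        (void) fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}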
It's impossible for two postmasters to start up at the same time
because the fcntl() lock acquired at step 1 will block any
newly-arriving postmaster until step 11 is complete. The
first-to-close semantics of fcntl() aren't a problem for this purpose
because we only execute a very limited amount of code over which we
have full control while holding the lock. By the time the postmaster
that gets the lock first completes step 10, any later-arriving
postmaster is guaranteed to fall out at step 7 while that postmaster
or any children who inherit the pipe descriptor remain alive. No
process holds any resource that will survive its exit, so cleanup is
fully automatic.
This seems solid to me, but watch somebody find a problem with it...
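Just to make that concrete, an untested sketch of the postmaster side,
following the numbering above (error handling, cleanup, and creation of the
FIFO are elided; the file names are placeholders):
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigpipe = 0;

static void
sigpipe_handler(int signo)
{
    got_sigpipe = 1;
}

static void
postmaster_fifo_interlock(void)
{
    struct flock fl;
    int         lockfd, rfd, wfd;

    /* 1. exclusive fcntl() lock on some file in the data directory */
    lockfd = open("postmaster.interlock", O_RDWR | O_CREAT, 0600);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    if (lockfd < 0 || fcntl(lockfd, F_SETLK, &fl) == -1)
        exit(1);                /* another postmaster is mid-startup */

    /* 2-4. open the FIFO for read, then for write, then drop the read end */
    rfd = open("postmaster.fifo", O_RDONLY | O_NONBLOCK);
    wfd = open("postmaster.fifo", O_WRONLY);
    close(rfd);

    /* 5-7. the write must fail with SIGPIPE iff no old readers survive */
    signal(SIGPIPE, sigpipe_handler);
    (void) write(wfd, "x", 1);
    if (!got_sigpipe)
        exit(1);                /* FATAL: orphaned processes still hold the pipe */

    /* 8-10. revert SIGPIPE, drop the write end, re-open for read and keep it */
    signal(SIGPIPE, SIG_DFL);
    close(wfd);
    rfd = open("postmaster.fifo", O_RDONLY | O_NONBLOCK);

    /* 11. closing lockfd releases the fcntl() lock from step 1 */
    close(lockfd);
}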
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
How about this locking protocol:
Postmaster:
1. Acquire an exclusive lock on some file in the data directory, maybe
the control file, using fcntl().
2. Open the named pipe for read.
3. Open the named pipe for write.
4. Close the named pipe for read.
5. Install a signal handler for SIGPIPE which sets a global variable.
6. Try to write to the pipe.
7. Check that the variable is set; if not, FATAL.
8. Revert SIGPIPE handler.
9. Close the named pipe for write.
10. Open the named pipe for read.
11. Release the fcntl() lock acquired in step 1.
Hm, this seems like it would work. A couple other thoughts:
* I think 5..8 are overly complex: we can just set SIGPIPE to SIG_IGN
(which is its usual setting in the postmaster already) and check for
EPIPE from the write().
* There might be some benefit to swapping steps 9 and 10; at the
very least, this would eliminate the need to use O_NONBLOCK while
re-opening for read.
* We talked about combining this technique with a plain file lock
so that we would have belt-and-suspenders protection, in particular
something that would have a chance of working across NFS clients.
This would suggest leaving the fcntl lock in place, ie, don't do
step 11, and also that the file-to-be-locked *not* have any other
purpose (which would only increase the risk of losing the lock
through careless open/close).
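(In code, the EPIPE form of the check is roughly the following, where wfd is
the write end opened in step 3 -- a sketch only:)
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <unistd.h>

/* true iff nobody else has the FIFO open for read; assumes this process has
 * already closed its own read end and holds wfd from step 3 (sketch only) */
static bool
no_other_fifo_readers(int wfd)
{
    signal(SIGPIPE, SIG_IGN);   /* the postmaster's usual setting anyway */
    return (write(wfd, "x", 1) < 0 && errno == EPIPE);
}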
Regular backends don't need to do anything special, except that they
need to make sure that the file descriptor opened in step 8 gets
inherited by the right set of processes. That means that the
close-on-exec flag should be turned on in the postmaster; except in
EXEC_BACKEND builds, where it should be turned off but then turned on
again by child processes before they do anything that might fork.
Meh. Do we really want to allow a new postmaster to start if there
are any processes remaining that were launched by backends? I'd
be inclined to just suppress close-on-exec, period.
regards, tom lane
On Fri, Aug 15, 2014 at 12:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
* I think 5..8 are overly complex: we can just set SIGPIPE to SIG_IGN
(which is its usual setting in the postmaster already) and check for
EPIPE from the write().
wfm.
* There might be some benefit to swapping steps 9 and 10; at the
very least, this would eliminate the need to use O_NONBLOCK while
re-opening for read.
Also wfm.
* We talked about combining this technique with a plain file lock
so that we would have belt-and-suspenders protection, in particular
something that would have a chance of working across NFS clients.
This would suggest leaving the fcntl lock in place, ie, don't do
step 11, and also that the file-to-be-locked *not* have any other
purpose (which would only increase the risk of losing the lock
through careless open/close).
I'd be afraid that a secondary mechanism that mostly-but-not-really
works could do more harm by allowing us to miss bugs in the primary,
pipe-based locking mechanism than the good it would accomplish.
Regular backends don't need to do anything special, except that they
need to make sure that the file descriptor opened in step 8 gets
inherited by the right set of processes. That means that the
close-on-exec flag should be turned on in the postmaster; except in
EXEC_BACKEND builds, where it should be turned off but then turned on
again by child processes before they do anything that might fork.
Meh. Do we really want to allow a new postmaster to start if there
are any processes remaining that were launched by backends? I'd
be inclined to just suppress close-on-exec, period.
Seems like a pretty weird and artificial restriction. Anything that
has done exec() will not be connected to shared memory, so it really
doesn't matter whether it's still alive or not. People can and do
write extensions that launch processes from PostgreSQL backends via
fork()+exec(), and we've taken pains in the past not to break such
cases. I don't see a reason to impose now (for no
data-integrity-related reason) the rule that any such processes must
not be daemons.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Nice algorithm.
On Fri, Aug 15, 2014 at 02:16:08PM -0400, Robert Haas wrote:
On Fri, Aug 15, 2014 at 12:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
* We talked about combining this technique with a plain file lock
so that we would have belt-and-suspenders protection, in particular
something that would have a chance of working across NFS clients.
This would suggest leaving the fcntl lock in place, ie, don't do
step 11, and also that the file-to-be-locked *not* have any other
purpose (which would only increase the risk of losing the lock
through careless open/close).
I'd be afraid that a secondary mechanism that mostly-but-not-really
works could do more harm by allowing us to miss bugs in the primary,
pipe-based locking mechanism than the good it would accomplish.
Users do corrupt their NFS- and GFS2-hosted databases today. I would rather
have each process hold only an fcntl() lock than hold only the FIFO file
descriptor. There's no such dichotomy, so let's have both.
Meh. Do we really want to allow a new postmaster to start if there
are any processes remaining that were launched by backends? I'd
be inclined to just suppress close-on-exec, period.
Seems like a pretty weird and artificial restriction. Anything that
has done exec() will not be connected to shared memory, so it really
doesn't matter whether it's still alive or not. People can and do
write extensions that launch processes from PostgreSQL backends via
fork()+exec(), and we've taken pains in the past not to break such
cases. I don't see a reason to impose now (for no
data-integrity-related reason) the rule that any such processes must
not be daemons.
+1
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
On Sat, Aug 16, 2014 at 3:28 AM, Noah Misch <noah@leadboat.com> wrote:
Nice algorithm.
Thanks.
I'd be afraid that a secondary mechanism that mostly-but-not-really
works could do more harm by allowing us to miss bugs in the primary,
pipe-based locking mechanism than the good it would accomplish.
Users do corrupt their NFS- and GFS2-hosted databases today. I would rather
have each process hold only an fcntl() lock than hold only the FIFO file
descriptor. There's no such dichotomy, so let's have both.
Meh. We can do that, but I think that will provide us with only the
it-works-until-it-doesn't level of protection. Granted, that's more
than zero, but does anyone advocate wearing seatbelts for the first 60
minutes you're in the car and then taking them off after that? I
think that with a sufficiently long-running server the chances of the
lock somehow getting released approach certainty. But I'm not going
to fight this one tooth and nail.
A bigger question in my view is what to do with the existing
mechanism. The main advantage of making a change like this is that we
could finally dispense with System V shared memory completely. But we
risk encountering systems where the battle-tested System V mechanism
works and this new one either fails to work at all (server won't
start) or fails to work as desired (interlock broken). So it's
tempting to think we should have a GUC or control-file setting to
control which mechanism gets used. Of course for QNX, the actual
subject of this thread, System V won't be an option, but other people
might like a big red button they can push if the new code turns out to
be less than we're hoping.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Aug 18, 2014 at 09:01:20AM -0400, Robert Haas wrote:
On Sat, Aug 16, 2014 at 3:28 AM, Noah Misch <noah@leadboat.com> wrote:
I'd be afraid that a secondary mechanism that mostly-but-not-really
works could do more harm by allowing us to miss bugs in the primary,
pipe-based locking mechanism than the good it would accomplish.
Users do corrupt their NFS- and GFS2-hosted databases today. I would rather
have each process hold only an fcntl() lock than hold only the FIFO file
descriptor. There's no such dichotomy, so let's have both.
Meh. We can do that, but I think that will provide us with only the
it-works-until-it-doesn't level of protection. Granted, that's more
than zero, but does anyone advocate wearing seatbelts for the first 60
minutes you're in the car and then taking them off after that? I
think that with a sufficiently long-running server the chances of the
lock somehow getting released approach certainty. But I'm not going
to fight this one tooth and nail.
In case it wasn't clear, I advocate both using the FIFO defense and holding
fcntl locks throughout the life of every PostgreSQL process having a shared
memory attachment. I grant that this raises the chance of a shortcoming in
one mechanism remaining undiscovered. However, we already know that each by
itself has limitations. I don't like the prospect of accepting a known hole
to help discover unknown holes.
We could have the would-be new postmaster, when it hits a fcntl lock conflict,
proceed with the FIFO check anyway. If the FIFO check says "go" after the
fcntl check said "stop", emit a message about the apparent bug. (That's
oversimplified; it needs looping to account for the case of the old postmaster
exiting concurrently.)
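Schematically, with hypothetical helpers feeding the two results in, and
ignoring the retry loop just mentioned (assumes the usual postgres.h
environment):
/* sketch: fcntl_ok / fifo_ok are the results of the two checks above */
static void
crosscheck_interlocks(bool fcntl_ok, bool fifo_ok)
{
    if (!fcntl_ok && fifo_ok)
        ereport(LOG,
                (errmsg("fcntl interlock refused startup but FIFO interlock did not; possible interlock bug")));

    if (!fcntl_ok || !fifo_ok)
        ereport(FATAL,
                (errmsg("data directory appears to be in use by another process")));
}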
A bigger question in my view is what to do with the existing
mechanism. The main advantage of making a change like this is that we
could finally dispense with System V shared memory completely. But we
risk encountering systems where the battle-tested System V mechanism
works and this new one either fails to work at all (server won't
start) or fails to work as desired (interlock broken). So it's
tempting to think we should have a GUC or control-file setting to
control which mechanism gets used. Of course for QNX, the actual
subject of this thread, System V won't be an option, but other people
might like a big red button they can push if the new code turns out to
be less than we're hoping.
A GUC sounds fine to me, as would using the sysv interlock unconditionally for
a couple more releases before removing it.
Thanks,
nm
Robert, Tom, and others,
Glad to see good discussion and progress on the locking topic!
My proof of concept code (steps a through e below) avoided any reading or writing to the pipe (and associated handling of SIGPIPE); it just relied on the postmaster's open of the PIPE returning ENXIO to indicate all is clear.
Trying to keep things simple, I created 1 function for fcntl locks, 1 function for PIPE locks, and a wrapper that called both in sequence (wrapper is called by the Backend mains).
I agree that "d." could be omitted, but I thought better to be conservative and has all processes obtain fcntl and PIPE locks.
Is there a gap that a-e does not cover? (Sorry, not clear to me).
Postmaster :
a. get exclusive fcntl lock (to guard against race condition in PIPE-based lock)
b. check PIPE for any existing readers
+ fd_write = open(DIRECTORY_LOCK_PIPE, O_WRONLY | O_NONBLOCK);
+ if (!((fd_write < 0) && (errno == ENXIO))) ereport(FATAL,
+ if (fd_write > -1) close(fd_write);
c. open PIPE for read
+ fd_read = open(DIRECTORY_LOCK_PIPE, O_RDONLY | O_NONBLOCK);
All other backends:
d. get shared fcntl lock
e. open PIPE for read
+ fd_read = open(DIRECTORY_LOCK_PIPE, O_RDONLY | O_NONBLOCK);
Just my 2 cents, I am happy with whatever solution you find agreeable.
My assumptions:
1. Platforms without System V shared memory (QNX) would use POSIX shared memory and file-based (fcntl+pipe) locks.
2. Existing platforms would continue to rely on System V shared memory and its proven locking by default (perhaps with an option to use all-POSIX shared memory and file-based locks instead, at your discretion).
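(For reference, the POSIX shared memory path in assumption 1 boils down to
shm_open + mmap; a minimal sketch, error handling elided and the segment name
made up:)
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* sketch: create and map a POSIX shared memory segment */
static void *
create_posix_segment(const char *name, size_t size)
{
    int         fd = shm_open(name, O_RDWR | O_CREAT, 0600);
    void       *addr;

    if (fd < 0 || ftruncate(fd, (off_t) size) < 0)
        return NULL;
    addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping remains valid after close */
    return (addr == MAP_FAILED) ? NULL : addr;
}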
Robert, Assuming an algorithm choice is agreed upon in the near future, would you be the logical choice to implement the change?
I am happy to help, especially with any QNX-specific aspects, but don't want to step on anyone's toes.
Thanks.
Keith Baker
-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Friday, August 15, 2014 2:16 PM
To: Tom Lane
Cc: Baker, Keith [OCDUS Non-J&J]; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
On Fri, Aug 15, 2014 at 12:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
* I think 5..8 are overly complex: we can just set SIGPIPE to SIG_IGN
(which is its usual setting in the postmaster already) and check for
EPIPE from the write().
wfm.
* There might be some benefit to swapping steps 9 and 10; at the very
least, this would eliminate the need to use O_NONBLOCK while
re-opening for read.
Also wfm.
* We talked about combining this technique with a plain file lock so
that we would have belt-and-suspenders protection, in particular
something that would have a chance of working across NFS clients.
This would suggest leaving the fcntl lock in place, ie, don't do step
11, and also that the file-to-be-locked *not* have any other purpose
(which would only increase the risk of losing the lock through
careless open/close).
I'd be afraid that a secondary mechanism that mostly-but-not-really works
could do more harm by allowing us to miss bugs in the primary, pipe-based
locking mechanism than the good it would accomplish.
Regular backends don't need to do anything special, except that they
need to make sure that the file descriptor opened in step 8 gets
inherited by the right set of processes. That means that the
close-on-exec flag should be turned on in the postmaster; except in
EXEC_BACKEND builds, where it should be turned off but then turned on
again by child processes before they do anything that might fork.
Meh. Do we really want to allow a new postmaster to start if there
are any processes remaining that were launched by backends? I'd be
inclined to just suppress close-on-exec, period.
Seems like a pretty weird and artificial restriction. Anything that has done
exec() will not be connected to shared memory, so it really doesn't matter
whether it's still alive or not. People can and do write extensions that launch
processes from PostgreSQL backends via fork()+exec(), and we've taken
pains in the past not to break such cases. I don't see a reason to impose now
(for no data-integrity-related reason) the rule that any such processes must
not be daemons.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
Company
On Mon, Aug 18, 2014 at 11:02 AM, Baker, Keith [OCDUS Non-J&J]
<KBaker9@its.jnj.com> wrote:
My proof of concept code (steps a through e below) avoided any reading or writing to the pipe (and associated handling of SIGPIPE); it just relied on the postmaster's open of the PIPE returning ENXIO to indicate all is clear.
I'm not following.
Robert, Assuming an algorithm choice is agreed upon in the near future, would you be the logical choice to implement the change?
I am happy to help, especially with any QNX-specific aspects, but don't want to step on anyone's toes.
I'm unlikely to have time to work on this in the immediate future, but
I may be able to help review.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert and Tom,
Sorry for any confusion, I will try to clarify.
Here is progression of events as I recall them:
- My Initial QNX 6.5 port proposal lacked a robust replacement for the existing System V shared memory locking mechanism, a show stopper.
- Robert proposed a nice set of possible alternatives for locking (to enable an all POSIX shared memory solution for future platforms).
- Tom and Robert seemed to agree that a combination of file-based locking plus pipe-based locking should be sufficiently robust on platforms without Sys V shared memory (e.g., QNX).
- I coded a proof-of-concept patch (fcntl + PIPE) which appeared to work on QNX (steps a through e).
- Robert countered with an 11 step algorithm (all in the postmaster)
- Tom suggested elimination of steps 5,6,7,8, and 11 (and swapping order 9 and 10)
I was just taking a step back to ask what gaps existed in the proof-of-concept patch (steps a through e).
Is there a scenario it fails to cover, prompting the seemingly more complex 11 step algorithm (which added writing data to the pipe and handling of SIGPIPE)?
I am willing to attempt coding of the set of changes for a QNX port (option for new locking and all POSIX shared memory, plus a few minor QNX-specific tweaks), provided you and Tom are satisfied that the show stoppers have been sufficiently addressed.
Please let me know if more discussion is required, or if it would be reasonable for me (or someone else of your choosing) to work on the coding effort (perhaps targeted for 9.5?)
If on the other hand it has been decided that a QNX port is not in the cards, I would like to know (I hope that is not the case given the progress made, but no point in wasting anyone's time).
Thanks again for your time, effort, patience, and coaching.
Keith Baker
-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Wednesday, August 20, 2014 12:26 PM
To: Baker, Keith [OCDUS Non-J&J]
Cc: Tom Lane; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
On Mon, Aug 18, 2014 at 11:02 AM, Baker, Keith [OCDUS Non-J&J]
<KBaker9@its.jnj.com> wrote:
My proof of concept code (steps a through e below) avoided any reading or
writing to the pipe (and associated handling of SIGPIPE); it just relied on
the postmaster's open of the PIPE returning ENXIO to indicate all is clear.
I'm not following.
Robert, Assuming an algorithm choice is agreed upon in the near future,
would you be the logical choice to implement the change?
I am happy to help, especially with any QNX-specific aspects, but don't
want to step on anyone's toes.
I'm unlikely to have time to work on this in the immediate future, but I may
be able to help review.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
Company
Baker, Keith [OCDUS Non-J&J] wrote:
Please let me know if more discussion is required, or if it would be
reasonable for me (or someone else of your choosing) to work on the
coding effort (perhaps targeted for 9.5?)
If on the other hand it has been decided that a QNX port is not in the
cards, I would like to know (I hope that is not the case given the
progress made, but no point in wasting anyone's time).
As I recall, other than the postmaster startup interlock, the other
major missing item you mentioned is SA_RESTART. That could well turn
out to be a showstopper, so I suggest you study that in more depth.
Are there other major items missing? Did you have to use
configure --disable-spinlocks for instance?
What's your compiler, and what are the underlying hardware platforms you
want to support?
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Alvaro,
Thanks for your interest and questions.
At this point I have created a proof-of-concept QNX 6.5 port which appears to work on the surface (passes regression tests), but needs to be deemed "production-quality".
To work around lack of SA_RESTART, I added QNX-specific retry macros to port.h
With these macros in place "make check" runs cleanly (fails in many places without them).
+#if defined(__QNX__)
+/* QNX does not support sigaction SA_RESTART. We must retry interrupted calls (EINTR) */
+/* Helper macros, used to build our retry macros */
+#define PG_RETRY_EINTR3(exp,val,type) ({ type _tmp_rc; do _tmp_rc = (exp); while (_tmp_rc == (val) && errno == EINTR); _tmp_rc; })
+#define PG_RETRY_EINTR(exp) PG_RETRY_EINTR3(exp,-1L,long int)
+#define PG_RETRY_EINTR_FILE(exp) PG_RETRY_EINTR3(exp,NULL,FILE *)
+/* override calls known to return EINTR when interrupted */
+#define close(a) PG_RETRY_EINTR(close(a))
+#define fclose(a) PG_RETRY_EINTR(fclose(a))
+#define fdopen(a,b) PG_RETRY_EINTR_FILE(fdopen(a,b))
+#define fopen(a,b) PG_RETRY_EINTR_FILE(fopen(a,b))
+#define freopen(a,b,c) PG_RETRY_EINTR_FILE(freopen(a,b,c))
+#define fseek(a,b,c) PG_RETRY_EINTR(fseek(a,b,c))
+#define fseeko(a,b,c) PG_RETRY_EINTR(fseeko(a,b,c))
+#define ftruncate(a,b) PG_RETRY_EINTR(ftruncate(a,b))
+#define lseek(a,b,c) PG_RETRY_EINTR(lseek(a,b,c))
+#define open(a,b,...) ({ int _tmp_rc; do _tmp_rc = open(a,b,##__VA_ARGS__); while (_tmp_rc == (-1) && errno == EINTR); _tmp_rc; })
+#define shm_open(a,b,c) PG_RETRY_EINTR(shm_open(a,b,c))
+#define stat(a,b) PG_RETRY_EINTR(stat(a,b))
+#define unlink(a) PG_RETRY_EINTR(unlink(a))
... (Macros for read and write are similar but slightly longer, so I omit them here)...
+#endif /* __QNX__ */
Here is what I used for configure, I am open to suggestions:
./configure --without-readline --disable-thread-safety
I am targeting QNX 6.5 on x86, using gcc 4.4.2.
Also, I have an issue to work out for locale support, but expect I can solve that.
Keith Baker
-----Original Message-----
From: Alvaro Herrera [mailto:alvherre@2ndquadrant.com]
Sent: Wednesday, August 20, 2014 4:16 PM
To: Baker, Keith [OCDUS Non-J&J]
Cc: Robert Haas; Tom Lane; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
Baker, Keith [OCDUS Non-J&J] wrote:
Please let me know if more discussion is required, or if it would be
reasonable for me (or someone else of your choosing) to work on the
coding effort (perhaps targeted for 9.5?) If on the other hand it has
been decided that a QNX port is not in the cards, I would like to know
(I hope that is not the case given the progress made, but no point in
wasting anyone's time).
As I recall, other than the postmaster startup interlock, the other major
missing item you mentioned is SA_RESTART. That could well turn out to be a
showstopper, so I suggest you study that in more depth.
Are there other major items missing? Did you have to use
configure --disable-spinlocks for instance?
What's your compiler, and what are the underlying hardware platforms you
want to support?
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
On 2014-08-20 21:21:41 +0000, Baker, Keith [OCDUS Non-J&J] wrote:
To work around lack of SA_RESTART, I added QNX-specific retry macros to port.h
With these macros in place "make check" runs cleanly (fails in many places without them).
+#if defined(__QNX__)
+/* QNX does not support sigaction SA_RESTART. We must retry interrupted calls (EINTR) */
+/* Helper macros, used to build our retry macros */
+#define PG_RETRY_EINTR3(exp,val,type) ({ type _tmp_rc; do _tmp_rc = (exp); while (_tmp_rc == (val) && errno == EINTR); _tmp_rc; })
+#define PG_RETRY_EINTR(exp) PG_RETRY_EINTR3(exp,-1L,long int)
+#define PG_RETRY_EINTR_FILE(exp) PG_RETRY_EINTR3(exp,NULL,FILE *)
+/* override calls known to return EINTR when interrupted */
+#define close(a) PG_RETRY_EINTR(close(a))
+#define fclose(a) PG_RETRY_EINTR(fclose(a))
+#define fdopen(a,b) PG_RETRY_EINTR_FILE(fdopen(a,b))
+#define fopen(a,b) PG_RETRY_EINTR_FILE(fopen(a,b))
+#define freopen(a,b,c) PG_RETRY_EINTR_FILE(freopen(a,b,c))
+#define fseek(a,b,c) PG_RETRY_EINTR(fseek(a,b,c))
+#define fseeko(a,b,c) PG_RETRY_EINTR(fseeko(a,b,c))
+#define ftruncate(a,b) PG_RETRY_EINTR(ftruncate(a,b))
+#define lseek(a,b,c) PG_RETRY_EINTR(lseek(a,b,c))
+#define open(a,b,...) ({ int _tmp_rc; do _tmp_rc = open(a,b,##__VA_ARGS__); while (_tmp_rc == (-1) && errno == EINTR); _tmp_rc; })
+#define shm_open(a,b,c) PG_RETRY_EINTR(shm_open(a,b,c))
+#define stat(a,b) PG_RETRY_EINTR(stat(a,b))
+#define unlink(a) PG_RETRY_EINTR(unlink(a))
... (Macros for read and write are similar but slightly longer, so I omit them here)...
+#endif /* __QNX__ */
I think this is a horrible way to go and unlikely to succeed. You're
surely going to miss calls and it's going to need to be maintained
continuously. We'll miss adding things which will then only break under
load. Which most people won't be able to generate under QNX.
The only reasonable way to fake kernel SA_RESTART support is doing so
in $platform's libc. In the syscall wrapper.
Here is what I used for configure, I am open to suggestions:
./configure --without-readline --disable-thread-safety
Why is the --disable-thread-safety needed?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2014-07-25 18:29:53 -0400, Tom Lane wrote:
* QNX lacks sigaction SA_RESTART: I modified "src/include/port.h" to define macros to retry system calls upon EINTR (open,read,write,...) when compiled on QNX
That's pretty scary too. For one thing, such macros would affect every
call site whether it's running with SA_RESTART or not. Do you really
need it? It looks to me like we just turn off HAVE_POSIX_SIGNALS if
you don't have SA_RESTART. Maybe that code has bit-rotted by now, but
it did work at one time.
I have pretty much no trust that we're maintaining
!HAVE_POSIX_SIGNAL. And none that we have that capability of doing so. I
seriously doubt there's any !HAVE_POSIX_SIGNAL animals and
873ab97219caabeb2f7b390268a4fe01e2b7518c makes it pretty darn unlikely
that we have much chance of finding such mistakes during development.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hello Andres,
Thanks for your response.
About SA_RESTART:
------------------------
I would like to offer you a different perspective which may alter your current opinion.
I believe the port.h QNX macro replacement for SA_RESTART is still a reasonable solution on QNX for these reasons:
First, I think it is better to adapt PostgreSQL to suit the platform than to adapt the platform to suit PostgreSQL.
Changing default behavior of libc on QNX to suit PostgreSQL may break other applications which rely on the current behavior of libc.
Yes, I could forget to add a port.h macro for a given interruptible primitive, but I could likewise forget to update the wrapper for that call in a custom libc.
I requested that QNX support provide me a list of interruptible primitives, but I was able to identify many by searching through the QNX help.
Definition of a new interruptible primitive is a rare event, so once a solid list of macros is in place for QNX, it should need very little maintenance.
If you have any specific calls you believe are missing from my list of macros, I would be happy to add them.
port.h is included in c.h, which is in postgres.h, so the QNX macros should be effective for all QNX PostgreSQL compiles.
If it were not, no one could rely on any port.h features on any platform.
Testing so far has demonstrated that the macro fixes are effective on QNX. Repeated runs of the regression tests run cleanly.
More testing will be required to boost the confidence and expose any gaps, but the foundation appears to be solid.
The first release on any platform has risk of defects, which can be corrected once identified.
I would expect that a first release on any platform would include a warning or disclaimer stating that it is new port.
Lastly, the QNX-specific section added to port.h appears to solve the SA_RESTART issue for QNX, while having no impact on compiles of existing platforms.
About configure:
--------------------
"./configure" barked at 2 things on QNX, and it advised using both "--without-readline --disable-thread-safety".
I can investigate further, but I have been focusing on the bigger issues first.
I hope the explanations above address your main concerns.
Again, thanks for your response!
Keith Baker
-----Original Message-----
From: Andres Freund [mailto:andres@2ndquadrant.com]
Sent: Wednesday, August 20, 2014 7:25 PM
To: Baker, Keith [OCDUS Non-J&J]
Cc: Alvaro Herrera; Robert Haas; Tom Lane; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
Hi,
On 2014-08-20 21:21:41 +0000, Baker, Keith [OCDUS Non-J&J] wrote:
To work around lack of SA_RESTART, I added QNX-specific retry macros to port.h
With these macros in place "make check" runs cleanly (fails in many places without them).
+#if defined(__QNX__)
+/* QNX does not support sigaction SA_RESTART. We must retry interrupted calls (EINTR) */
+/* Helper macros, used to build our retry macros */
+#define PG_RETRY_EINTR3(exp,val,type) ({ type _tmp_rc; do _tmp_rc = (exp); while (_tmp_rc == (val) && errno == EINTR); _tmp_rc; })
+#define PG_RETRY_EINTR(exp) PG_RETRY_EINTR3(exp,-1L,long int)
+#define PG_RETRY_EINTR_FILE(exp) PG_RETRY_EINTR3(exp,NULL,FILE *)
+/* override calls known to return EINTR when interrupted */
+#define close(a) PG_RETRY_EINTR(close(a))
+#define fclose(a) PG_RETRY_EINTR(fclose(a))
+#define fdopen(a,b) PG_RETRY_EINTR_FILE(fdopen(a,b))
+#define fopen(a,b) PG_RETRY_EINTR_FILE(fopen(a,b))
+#define freopen(a,b,c) PG_RETRY_EINTR_FILE(freopen(a,b,c))
+#define fseek(a,b,c) PG_RETRY_EINTR(fseek(a,b,c))
+#define fseeko(a,b,c) PG_RETRY_EINTR(fseeko(a,b,c))
+#define ftruncate(a,b) PG_RETRY_EINTR(ftruncate(a,b))
+#define lseek(a,b,c) PG_RETRY_EINTR(lseek(a,b,c))
+#define open(a,b,...) ({ int _tmp_rc; do _tmp_rc = open(a,b,##__VA_ARGS__); while (_tmp_rc == (-1) && errno == EINTR); _tmp_rc; })
+#define shm_open(a,b,c) PG_RETRY_EINTR(shm_open(a,b,c))
+#define stat(a,b) PG_RETRY_EINTR(stat(a,b))
+#define unlink(a) PG_RETRY_EINTR(unlink(a))
... (Macros for read and write are similar but slightly longer, so I omit them here)...
+#endif /* __QNX__ */
I think this is a horrible way to go and unlikely to succeed. You're surely going
to miss calls and it's going to need to be maintained continuously. We'll miss
adding things which will then only break under load. Which most people
won't be able to generate under QNX.
The only reasonable way to fake kernel SA_RESTART support is doing so
in $platform's libc. In the syscall wrapper.
Here is what I used for configure, I am open to suggestions:
./configure --without-readline --disable-thread-safety
Why is the --disable-thread-safety needed?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Baker, Keith [OCDUS Non-J&J] wrote:
About configure:
--------------------
"./configure" barked at 2 things on QNX, and it advised using both "--without-readline --disable-thread-safety".
I can investigate further, but I have been focusing on the bigger issues first.
I don't think thread-safety is of great concern. The backend is not
multithreaded, and neither are the utilities (I think the only exception
is pgbench, and even there it is optional). The only problem, as I
recall, would be that libpq would not lock things correctly when used in
a multithreaded program. I think you will need to solve this
eventually, but it doesn't look as critical as the others.
I was asking specifically about spinlocks because if you have to use
that switch, it means our spinlock implementation doesn't cover your
platform, and you would need to add something to support native
spinlocks. Since you're using gcc on x86, I assume your port is
choosing an already existing, working implementation.
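(For reference, the gcc-on-x86 test-and-set that s_lock.h already supplies
looks roughly like this -- a sketch, not the exact PostgreSQL definition:)
typedef unsigned char slock_t;

static __inline__ int
tas(volatile slock_t *lock)
{
    slock_t     _res = 1;

    /* atomically exchange 1 into *lock; the old value tells us if it was held */
    __asm__ __volatile__(
        "   lock            \n"
        "   xchgb   %0,%1   \n"
        : "+q" (_res), "+m" (*lock)
        :
        : "memory", "cc");
    return (int) _res;
}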
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Aug 21, 2014 at 01:33:38AM +0200, Andres Freund wrote:
On 2014-07-25 18:29:53 -0400, Tom Lane wrote:
* QNX lacks sigaction SA_RESTART: I modified "src/include/port.h" to define macros to retry system calls upon EINTR (open,read,write,...) when compiled on QNX
That's pretty scary too. For one thing, such macros would affect every
call site whether it's running with SA_RESTART or not. Do you really
need it? It looks to me like we just turn off HAVE_POSIX_SIGNALS if
you don't have SA_RESTART. Maybe that code has bit-rotted by now, but
it did work at one time.
I have pretty much no trust that we're maintaining
!HAVE_POSIX_SIGNAL. And none that we have that capability of doing so. I
seriously doubt there's any !HAVE_POSIX_SIGNAL animals and
873ab97219caabeb2f7b390268a4fe01e2b7518c makes it pretty darn unlikely
that we have much chance of finding such mistakes during development.
I bet it's fine for its intended target, namely BSD-style signal() in which
SA_RESTART-like behavior is implicit. See the src/port/pqsignal.c header
comment. PostgreSQL has no support for V7-style/QNX-style signal().
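(The sigaction-based pqsignal() mentioned above looks roughly like this; a simplified sketch, since the real src/port/pqsignal.c carries additional portability handling.)

#include <signal.h>

typedef void (*pqsigfunc) (int signo);

pqsigfunc
pqsignal(int signo, pqsigfunc func)
{
	struct sigaction act,
				oact;

	act.sa_handler = func;
	sigemptyset(&act.sa_mask);
	act.sa_flags = SA_RESTART;	/* the very flag QNX 6.5's sigaction() does not offer */
	if (sigaction(signo, &act, &oact) < 0)
		return SIG_ERR;
	return oact.sa_handler;	/* previous handler, as with signal() */
}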
On 2014-08-22 01:36:37 -0400, Noah Misch wrote:
On Thu, Aug 21, 2014 at 01:33:38AM +0200, Andres Freund wrote:
On 2014-07-25 18:29:53 -0400, Tom Lane wrote:
* QNX lacks sigaction SA_RESTART: I modified "src/include/port.h" to define macros to retry system calls upon EINTR (open,read,write,...) when compiled on QNX
That's pretty scary too. For one thing, such macros would affect every
call site whether it's running with SA_RESTART or not. Do you really
need it? It looks to me like we just turn off HAVE_POSIX_SIGNALS if
you don't have SA_RESTART. Maybe that code has bit-rotted by now, but
it did work at one time.
I have pretty much no trust that we're maintaining
!HAVE_POSIX_SIGNAL. And none that we have the capability of doing so. I
seriously doubt there's any !HAVE_POSIX_SIGNAL animals and
873ab97219caabeb2f7b390268a4fe01e2b7518c makes it pretty darn unlikely
that we have much chance of finding such mistakes during development.
I bet it's fine for its intended target, namely BSD-style signal() in which
SA_RESTART-like behavior is implicit. See the src/port/pqsignal.c header
comment. PostgreSQL has no support for V7-style/QNX-style signal().
That might be true - although I'm not sure it actually still works - but
my point is that I can't see Tom's suggestion of relying on
!HAVE_POSIX_SIGNALS working out for QNX.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Aug 22, 2014 at 09:34:42AM +0200, Andres Freund wrote:
On 2014-08-22 01:36:37 -0400, Noah Misch wrote:
On Thu, Aug 21, 2014 at 01:33:38AM +0200, Andres Freund wrote:
On 2014-07-25 18:29:53 -0400, Tom Lane wrote:
* QNX lacks sigaction SA_RESTART: I modified "src/include/port.h" to define macros to retry system calls upon EINTR (open,read,write,...) when compiled on QNX
That's pretty scary too. For one thing, such macros would affect every
call site whether it's running with SA_RESTART or not. Do you really
need it? It looks to me like we just turn off HAVE_POSIX_SIGNALS if
you don't have SA_RESTART. Maybe that code has bit-rotted by now, but
it did work at one time.
I have pretty much no trust that we're maintaining
!HAVE_POSIX_SIGNAL. And none that we have the capability of doing so. I
seriously doubt there's any !HAVE_POSIX_SIGNAL animals and
873ab97219caabeb2f7b390268a4fe01e2b7518c makes it pretty darn unlikely
that we have much chance of finding such mistakes during development.
I bet it's fine for its intended target, namely BSD-style signal() in which
SA_RESTART-like behavior is implicit. See the src/port/pqsignal.c header
comment. PostgreSQL has no support for V7-style/QNX-style signal().
That might be true - although I'm not sure it actually still works - but
my point is that I can't see Tom's suggestion of relying on
!HAVE_POSIX_SIGNALS working out for QNX.
True.
Hi,
On 2014-08-21 15:25:44 +0000, Baker, Keith [OCDUS Non-J&J] wrote:
About SA_RESTART:
------------------------
I would like to offer you a different perspective which may alter your current opinion.
I believe the port.h QNX macro replacement for SA_RESTART is still a reasonable solution on QNX for these reasons:
First, I think it is better to adapt PostgreSQL to suit the platform
than to adapt the platform to suit PostgreSQL.
Well. That might be somewhat true for a popular platform. Which QNX
really isn't. I personally don't believe your approach to be likely to
end up with a correct and maintainable port.
Changing default behavior of libc on QNX to suit PostgreSQL may break
other applications which rely on the current behavior of libc.
I don't see how *adding* SA_RESTART support, which would only be used
when SA_RESTART is being passed to sigaction(), would do that.
Yes, I could forget to add a port.h macro for a given interruptible
primitive, but I could likewise forget to update the wrapper for that
call in a custom libc.
I requested that QNX support provide me a list of interruptible
primitives, but I was able to identify many by searching through the
QNX help. Definition of a new interruptible primitive is a rare
event, so once a solid list of macros is in place for QNX, it should
need very little maintenance. If you have any specific calls you
believe are missing from my list of macros, I would be happy to add
them.
I have no idea whether there are any other ones - I don't have access to
a QNX machine, and I don't personally want any. The problem is that we
might want to start using new syscalls, or QNX might introduce new
interruptible syscalls. Problems caused by missed interruptible syscalls
won't show during low-load testing like pg_regress. They'll show up
during production usage.
port.h is included in c.h, which is in postgres.h, so the QNX macros
should be effective for all QNX PostgreSQL compiles. If it were not,
no one could rely on any port.h features on any platform.
Yea, that's not a concern I have.
The first release on any platform has risk of defects, which can be
corrected once identified. I would expect that a first release on any
platform would include a warning or disclaimer stating that it is a new
port.
Lastly, the QNX-specific section added to port.h appears to solve the
SA_RESTART issue for QNX, while having no impact on compiles of
existing platforms.
My problem is that it's an ugly hack for a niche platform that will need to
be maintained for a long while into the future. I don't have a problem
adding support for infrequently used platforms if the support is
very localized, but that's definitely not the case here.
About configure:
--------------------
"./configure" barked at 2 things on QNX, and it advised using both
"--without-readline --disable-thread-safety". I can investigate
further, but I have been focusing on the bigger issues first.
Yea, those aren't really critical. It'd be interesting to know why the
thread safety test fails - quite possibly it's just the configure
test for pthreads not being very good.
Greetings,
Andres Freund
Andres Freund wrote:
Hi,
On 2014-08-21 15:25:44 +0000, Baker, Keith [OCDUS Non-J&J] wrote:
About SA_RESTART:
------------------------
I would like to offer you a different perspective which may alter your current opinion.
I believe the port.h QNX macro replacement for SA_RESTART is still a reasonable solution on QNX for these reasons:
First, I think it is better to adapt PostgreSQL to suit the platform
than to adapt the platform to suit PostgreSQL.
Well. That might be somewhat true for a popular platform. Which QNX
really isn't. I personally don't believe your approach to be likely to
end up with a correct and maintainable port.
Changing default behavior of libc on QNX to suit PostgreSQL may break
other applications which rely on the current behavior of libc.
I don't see how *adding* SA_RESTART support, which would only be used
when SA_RESTART is being passed to sigaction(), would do that.
I guess the important question here is how much traction Keith has
with the QNX development group.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
I am reaching out to our QNX support contacts today; I will let you know how they respond.
Keith Baker
-----Original Message-----
From: Alvaro Herrera [mailto:alvherre@2ndquadrant.com]
Sent: Friday, August 22, 2014 10:42 AM
To: Andres Freund
Cc: Baker, Keith [OCDUS Non-J&J]; Robert Haas; Tom Lane; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
Andres Freund wrote:
Hi,
On 2014-08-21 15:25:44 +0000, Baker, Keith [OCDUS Non-J&J] wrote:
About SA_RESTART:
------------------------
I would like to offer you a different perspective which may alter your current opinion.
I believe the port.h QNX macro replacement for SA_RESTART is still a
reasonable solution on QNX for these reasons:
First, I think it is better to adapt PostgreSQL to suit the platform
than to adapt the platform to suit PostgreSQL.
Well. That might be somewhat true for a popular platform. Which QNX
really isn't. I personally don't believe your approach to be likely to
end up with a correct and maintainable port.
Changing default behavior of libc on QNX to suit PostgreSQL may
break other applications which rely on the current behavior of libc.
I don't see how *adding* SA_RESTART support, which would only be used
when SA_RESTART is being passed to sigaction(), would do that.
I guess the important question here is how much traction Keith has
with the QNX development group.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2014-08-22 10:41:55 -0400, Alvaro Herrera wrote:
Andres Freund wrote:
Hi,
On 2014-08-21 15:25:44 +0000, Baker, Keith [OCDUS Non-J&J] wrote:
About SA_RESTART:
------------------------
I would like to offer you a different perspective which may alter your current opinion.
I believe the port.h QNX macro replacement for SA_RESTART is still a reasonable solution on QNX for these reasons:
First, I think it is better to adapt PostgreSQL to suit the platform
than to adapt the platform to suit PostgreSQL.
Well. That might be somewhat true for a popular platform. Which QNX
really isn't. I personally don't believe your approach to be likely to
end up with a correct and maintainable port.
Changing default behavior of libc on QNX to suit PostgreSQL may break
other applications which rely on the current behavior of libc.
I don't see how *adding* SA_RESTART support, which would only be used
when SA_RESTART is being passed to sigaction(), would do that.
I guess the important question here is how much traction Keith has
with the QNX development group.
If you search for SA_RESTART and QNX there's a fair number of bugs
cropping up where the lack of it leads to problems... I think a large amount of open
source software essentially relies on it these days.
Note that it doesn't necessarily need to be implemented inside QNX. It
could very well be a wrapper library that you would optionally link
against. That'd benefit more users than just postgres on QNX.
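(A minimal sketch of what such an optional wrapper library could look like, shown here for read() only. The dlsym(RTLD_NEXT, ...) interposition technique, and whether QNX's runtime linker supports it, are assumptions rather than anything verified on QNX.)

#include <dlfcn.h>			/* RTLD_NEXT may require a feature-test macro (e.g. _GNU_SOURCE) on some platforms */
#include <errno.h>
#include <unistd.h>

/* Interpose read(): forward to the next read() in link order and retry on
 * EINTR, giving SA_RESTART-like behaviour to any program linked against
 * this library, not just PostgreSQL. */
ssize_t
read(int fd, void *buf, size_t nbytes)
{
	static ssize_t (*next_read) (int, void *, size_t) = NULL;
	ssize_t		rc;

	if (next_read == NULL)
		next_read = (ssize_t (*) (int, void *, size_t)) dlsym(RTLD_NEXT, "read");

	do
		rc = next_read(fd, buf, nbytes);
	while (rc == -1 && errno == EINTR);

	return rc;
}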
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services