backend freezeing on win32 fixed (I hope ;-) )
Hi,
I think I have fixed the freezing of the postgres backend on Windows NT. Now
it survives 5 regression test in a cycle with some concurrent connections
during running the tests.
Where the problem was (manual backtrace):
- InitPostgres() (utils/init/postinit.c)
- InitProcess() (storage/lmgr/proc.c)
- IpcSemaphoreCreate() (storage/ipc/ipc.c)
- semget() (now in libcygipc - sem.c)
- sem_connect() (sem.c)
- WaitForSingleObject() (win32 system call)
- freezing....
It have looked like a problem with initializing the same semaphore for
second time (they are "preinitialized" for performance reasons in
InitProcGlobal() in storage/lmgr/proc.c)
The fix (made for v6.5.1) is here:
-------------------------------
--- src/backend/storage/lmgr/proc.c.old Sat Aug 14 16:50:19 1999
+++ src/backend/storage/lmgr/proc.c Sat Aug 14 16:50:52 1999
@@ -160,6 +160,7 @@
* Pre-create the semaphores for the first maxBackends
processes,
* unless we are running as a standalone backend.
*/
+#ifndef __CYGWIN__
if (key != PrivateIPCKey)
{
for (i = 0;
@@ -180,6 +181,7 @@
ProcGlobal->freeSemMap[i] = (1 <<
PROC_NSEMS_PER_SET);
}
}
+#endif /* __CYGWIN__ */
}
}
-------------------------------
Dan
PS: I have packed the tree after "make install" for 6.5.1 with the patch
above, so it is a "binary distribution".
----------------------------------------------
Daniel Horak
network and system administrator
e-mail: horak@mmp.plzen-city.cz
privat e-mail: dan.horak@email.cz ICQ:36448176
----------------------------------------------
Horak Daniel <horak@mmp.plzen-city.cz> writes:
I think I have fixed the freezing of the postgres backend on Windows NT. Now
it survives 5 regression test in a cycle with some concurrent connections
during running the tests.
It have looked like a problem with initializing the same semaphore for
second time (they are "preinitialized" for performance reasons in
InitProcGlobal() in storage/lmgr/proc.c)
They should never be "initialized a second time". And the preallocation
is *not* for performance reasons, it is to make sure we can actually get
enough semaphores (rather than dying under load when we fail to get the
N+1'st semaphore when starting the N+1'st backend).
The fix (made for v6.5.1) is here:
[ Fix consists of diking out preallocation of semaphores by postmaster ]
I do not like this patch one bit --- I think it is voodoo that doesn't
really have anything to do with the true problem. I don't know what
the true problem is, mind you, but I don't think this is the way to
fix it.
Is it possible that the CygWin environment doesn't have a correct
emulation of IPC semaphores, such that a sema allocated by one process
(the postmaster) is not available to other procs (the backends)?
That would explain preallocation not working --- but if that's it then
we have major problems in other places, since the code assumes that a
sema once allocated will remain available to later backends.
regards, tom lane
Import Notes
Reply to msg id not found: YourmessageofMon16Aug1999113554+02002E7F82FAC1FCD2118E1500A024B3BF907DED3B@exchange.mmp.plzen-city.cz | Resolved by subject fallback
They should never be "initialized a second time". And the
preallocation
is *not* for performance reasons, it is to make sure we can
actually get
enough semaphores (rather than dying under load when we fail
to get the
N+1'st semaphore when starting the N+1'st backend).The fix (made for v6.5.1) is here:
[ Fix consists of diking out preallocation of semaphores bypostmaster ]
I do not like this patch one bit --- I think it is voodoo that doesn't
really have anything to do with the true problem. I don't know what
the true problem is, mind you, but I don't think this is the way to
fix it.
I know it is not a perfect solution but now it is better than nothing.
Is it possible that the CygWin environment doesn't have a correct
emulation of IPC semaphores, such that a sema allocated by one process
It seems that there is really a problem in the IPC for cygwin, but I am not
an expert in Windows programming and internals so it is hard for me to make
a better one. But I will try to correct the IPC library.
(the postmaster) is not available to other procs (the backends)?
That would explain preallocation not working --- but if that's it then
we have major problems in other places, since the code assumes that a
sema once allocated will remain available to later backends.
Does this mean that when I have one connection to the server, I end it and
start a new one, this new one will use the same semaphores? But it seems to
work.
Can disabling the semaphores prealocation have some negative effects on the
correct function of the backend?
Dan
Import Notes
Resolved by subject fallback
Horak Daniel <horak@mmp.plzen-city.cz> writes:
I do not like this patch one bit --- I think it is voodoo that doesn't
really have anything to do with the true problem. I don't know what
the true problem is, mind you, but I don't think this is the way to
fix it.
I know it is not a perfect solution but now it is better than nothing.
Does this mean that when I have one connection to the server, I end it and
start a new one, this new one will use the same semaphores? But it seems to
work.
Yes. The way the code used to work (pre 6.5) was that the first backend
to fire up would grab a block of 16 semaphores, which would be used by
backends 2-16; when you started a 17'th concurrent backend another block
of 16 semaphores would be grabbed; etc. The code you diked out simply
forces preallocation of a bunch of semaphores at postmaster start time,
rather than having it done by selected backends. That's why it doesn't
make any sense to me that removing it would fix anything --- you'll
still have backends dependent on semaphores that were created by other
processes, it's just that they were other backends instead of the
postmaster.
In any case, when one backend quits and another one is started, the new
one will re-use the semaphore no longer used by the defunct backend.
regards, tom lane
Import Notes
Reply to msg id not found: YourmessageofMon16Aug1999163505+02002E7F82FAC1FCD2118E1500A024B3BF907DED3D@exchange.mmp.plzen-city.cz | Resolved by subject fallback
On Mon, Aug 16, 1999 at 10:07:00AM -0400, Tom Lane wrote:
Is it possible that the CygWin environment doesn't have a correct
emulation of IPC semaphores, such that a sema allocated by one process
(the postmaster) is not available to other procs (the backends)?
That would explain preallocation not working --- but if that's it then
we have major problems in other places, since the code assumes that a
sema once allocated will remain available to later backends.
We don't have correct emulation of IPC semapahores since they are not
implemented at all. I assume that if you're relying on persistent
semaphores, then some add-on package is being used.
cgf
In any case, when one backend quits and another one is
started, the new
one will re-use the semaphore no longer used by the defunct backend.
I have tested my solution a bit more and I have to say that reusing a
semaphore by a new backend works OK. But it is not possible for a newly
created backend to use a semaphore allocated by postmaster (it freezes on
test if the semaphore with given key already exists - done with
semId=semget(semKey, 0, 0) in function IpcSemaphoreCreate() in
storage/ipc/ipc.c ). Why it is, I don't know, but it seems that my solution
uses the ipc library in the right way. There are no longer any error
messages from the ipc library when running the server. And I can't say that
the ipc library is a 100% correct implementation of SysV IPC, it is probably
(sure ;-) )caused by the Windows internals.
Dan
Import Notes
Resolved by subject fallback
Horak Daniel <horak@mmp.plzen-city.cz> writes:
I have tested my solution a bit more and I have to say that reusing a
semaphore by a new backend works OK. But it is not possible for a newly
created backend to use a semaphore allocated by postmaster (it freezes on
test if the semaphore with given key already exists - done with
semId=semget(semKey, 0, 0) in function IpcSemaphoreCreate() in
storage/ipc/ipc.c ). Why it is, I don't know, but it seems that my solution
uses the ipc library in the right way.
It seems that you have found a bug in the cygipc library. I suggest
reporting it to the author of same...
regards, tom lane
Import Notes
Reply to msg id not found: YourmessageofTue17Aug1999140626+02002E7F82FAC1FCD2118E1500A024B3BF907DED3F@exchange.mmp.plzen-city.cz | Resolved by subject fallback
Horak Daniel <horak@mmp.plzen-city.cz> writes:
I have tested my solution a bit more and I have to say that
reusing a
semaphore by a new backend works OK. But it is not possible
for a newly
created backend to use a semaphore allocated by postmaster
(it freezes on
test if the semaphore with given key already exists - done with
semId=semget(semKey, 0, 0) in function IpcSemaphoreCreate() in
storage/ipc/ipc.c ). Why it is, I don't know, but it seemsthat my solution
uses the ipc library in the right way.
It seems that you have found a bug in the cygipc library. I suggest
reporting it to the author of same...
Or it can be a feature ;-) But before it will be fixed (if it can be fixed)
I would like to see my patch it the sources. It is very simple, without
negative effects... The win32 port will be more stable than it is now. We
still can't consider the win32 port to be run in a production environment.
Dan
Import Notes
Resolved by subject fallback
Horak Daniel <horak@mmp.plzen-city.cz> writes:
Or it can be a feature ;-) But before it will be fixed (if it can be fixed)
I would like to see my patch it the sources. It is very simple, without
negative effects...
How do you know it has no negative effects? The problem that it was
intended to fix only showed up with large numbers of backends (ie, more
than the system limit on number of semaphores, which is depressingly
small on many old-line Unixes). Perhaps cygipc has no limit on number
of semaphores, or perhaps it tries to be a faithful imitation of SysV ;-)
Have you checked?
regards, tom lane
Import Notes
Reply to msg id not found: YourmessageofTue17Aug1999154417+02002E7F82FAC1FCD2118E1500A024B3BF907DED40@exchange.mmp.plzen-city.cz | Resolved by subject fallback
How do you know it has no negative effects? The problem that it was
intended to fix only showed up with large numbers of backends
(ie, more
than the system limit on number of semaphores, which is depressingly
small on many old-line Unixes). Perhaps cygipc has no limit on number
of semaphores, or perhaps it tries to be a faithful imitation
of SysV ;-)
Have you checked?
There is a static limit on the max number of semaphores, it can cause the
same problems as on Unix.
this is part of sys/sem.h:
#define SEMMNI 128 /* ? max # of semaphore identifiers */
#define SEMMSL 32 /* <= 512 max num of semaphores per id */
?? should be 32 sems per id (DH)
#define SEMMNS (SEMMNI*SEMMSL) /* ? max # of semaphores in system */
#define SEMOPM 32 /* ~ 100 max num of ops per semop call */
#define SEMVMX 32767 /* semaphore maximum value */
But I have thought no negative effects on other ports.
Dan
Import Notes
Resolved by subject fallback
[Charset iso-8859-1 unsupported, filtering to ASCII...]
In any case, when one backend quits and another one is
started, the new
one will re-use the semaphore no longer used by the defunct backend.I have tested my solution a bit more and I have to say that reusing a
semaphore by a new backend works OK. But it is not possible for a newly
created backend to use a semaphore allocated by postmaster (it freezes on
test if the semaphore with given key already exists - done with
semId=semget(semKey, 0, 0) in function IpcSemaphoreCreate() in
storage/ipc/ipc.c ). Why it is, I don't know, but it seems that my solution
uses the ipc library in the right way. There are no longer any error
messages from the ipc library when running the server. And I can't say that
the ipc library is a 100% correct implementation of SysV IPC, it is probably
(sure ;-) )caused by the Windows internals.
Seems we may have to use the patch, or make some other patch for NT-only
that works around this NT bug.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Horak Daniel <horak@mmp.plzen-city.cz> writes:
I have tested my solution a bit more and I have to say that reusing a
semaphore by a new backend works OK. But it is not possible for a newly
created backend to use a semaphore allocated by postmaster (it freezes on
test if the semaphore with given key already exists - done with
semId=semget(semKey, 0, 0) in function IpcSemaphoreCreate() in
storage/ipc/ipc.c ). Why it is, I don't know, but it seems that my solution
uses the ipc library in the right way.It seems that you have found a bug in the cygipc library. I suggest
reporting it to the author of same...
Yes, but can we expect all NT sites to get the patch before using
PostgreSQL? Is there a workaround we can implement?
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
It seems that you have found a bug in the cygipc library. I suggest
reporting it to the author of same...Yes, but can we expect all NT sites to get the patch before using
PostgreSQL? Is there a workaround we can implement?
The workaround is in my first mail in this thread - it disables the
preallocation of semaphores (storage/lmgr/proc.c/InitProcGlobal()) in the
cygwin port.Some patch that will correct the behavior of IPC library is not
available yet and I don't if it ever can be available (it can be a desing
problem in the library or Windows internals).
Dan
Import Notes
Resolved by subject fallback
Bruce Momjian <maillist@candle.pha.pa.us> writes:
storage/ipc/ipc.c ). Why it is, I don't know, but it seems that my solution
uses the ipc library in the right way. There are no longer any error
messages from the ipc library when running the server. And I can't say that
the ipc library is a 100% correct implementation of SysV IPC, it is probably
(sure ;-) )caused by the Windows internals.
Seems we may have to use the patch, or make some other patch for NT-only
that works around this NT bug.
I don't have a problem with installing an NT patch (lord knows there
are plenty of #ifdef __CYGWIN32__'s in the code already). But I have
a problem with *this* patch because I don't believe we understand what
it is doing, and therefore I have no confidence in it. The extent of
our understanding so far is that one backend can create a semaphore that
can be used by a later backend, but the postmaster cannot create a
semaphore that can be used by a later backend. I don't really believe
that; I think there is something else going on. Until we understand
what the something else is, I don't think we have a trustworthy
solution.
The real reason I feel itchy about this is that I know that interprocess
synchronization is a very tricky area, so I'm not confident that the
limited amount of testing Dan can do by himself proves that things are
solid. As the old saw goes, "testing cannot prove the absence of bugs".
I want to have both clean test results *and* an understanding of what
we are doing before I will feel comfortable.
Looking again at the code, it occurs to me that a backend exiting
normally will probably leave its semaphore set nonzero, which could
(given a buggy IPC library) have something to do with whether another
process can attach to the sema or not. The postmaster code is *trying*
to create the semas with nonzero starting values, but I see that the
backend code takes the additional step of doing
semun.val = IpcSemaphoreDefaultStartValue;
semctl(semId, semNum, SETVAL, semun);
whereas the postmaster code doesn't. Maybe the create call isn't
initializing the semaphores the way it's told to? It'd be worth
trying adding a step like this to the postmaster preallocation.
In any case, I'd really like us to get some feedback from the author of
cygipc about this issue. I don't mind working around a bug once we
understand exactly what the bug is --- but in this particular area,
I think guessing our way to a workaround isn't good enough.
regards, tom lane
Import Notes
Reply to msg id not found: YourmessageofTue17Aug1999104128-0400199908171441.KAA08651@candle.pha.pa.us | Resolved by subject fallback
Got it.
Bruce Momjian <maillist@candle.pha.pa.us> writes:
storage/ipc/ipc.c ). Why it is, I don't know, but it seems that my solution
uses the ipc library in the right way. There are no longer any error
messages from the ipc library when running the server. And I can't say that
the ipc library is a 100% correct implementation of SysV IPC, it is probably
(sure ;-) )caused by the Windows internals.Seems we may have to use the patch, or make some other patch for NT-only
that works around this NT bug.I don't have a problem with installing an NT patch (lord knows there
are plenty of #ifdef __CYGWIN32__'s in the code already). But I have
a problem with *this* patch because I don't believe we understand what
it is doing, and therefore I have no confidence in it. The extent of
our understanding so far is that one backend can create a semaphore that
can be used by a later backend, but the postmaster cannot create a
semaphore that can be used by a later backend. I don't really believe
that; I think there is something else going on. Until we understand
what the something else is, I don't think we have a trustworthy
solution.The real reason I feel itchy about this is that I know that interprocess
synchronization is a very tricky area, so I'm not confident that the
limited amount of testing Dan can do by himself proves that things are
solid. As the old saw goes, "testing cannot prove the absence of bugs".
I want to have both clean test results *and* an understanding of what
we are doing before I will feel comfortable.Looking again at the code, it occurs to me that a backend exiting
normally will probably leave its semaphore set nonzero, which could
(given a buggy IPC library) have something to do with whether another
process can attach to the sema or not. The postmaster code is *trying*
to create the semas with nonzero starting values, but I see that the
backend code takes the additional step of doing
semun.val = IpcSemaphoreDefaultStartValue;
semctl(semId, semNum, SETVAL, semun);
whereas the postmaster code doesn't. Maybe the create call isn't
initializing the semaphores the way it's told to? It'd be worth
trying adding a step like this to the postmaster preallocation.In any case, I'd really like us to get some feedback from the author of
cygipc about this issue. I don't mind working around a bug once we
understand exactly what the bug is --- but in this particular area,
I think guessing our way to a workaround isn't good enough.regards, tom lane
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
-----Original Message-----
From: owner-pgsql-hackers@postgreSQL.org
[mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Horak Daniel
Sent: Tuesday, August 17, 1999 9:06 PM
To: 'Tom Lane'
Cc: 'pgsql-hackers@postgreSQL.org'
Subject: RE: [HACKERS] backend freezeing on win32 fixed (I hope ;-) )In any case, when one backend quits and another one is
started, the new
one will re-use the semaphore no longer used by the defunct backend.I have tested my solution a bit more and I have to say that reusing a
semaphore by a new backend works OK. But it is not possible for a newly
created backend to use a semaphore allocated by postmaster (it freezes on
test if the semaphore with given key already exists - done with
semId=semget(semKey, 0, 0) in function IpcSemaphoreCreate() in
storage/ipc/ipc.c ). Why it is, I don't know, but it seems that
my solution
uses the ipc library in the right way. There are no longer any error
messages from the ipc library when running the server. And I
can't say that
the ipc library is a 100% correct implementation of SysV IPC, it
is probably
(sure ;-) )caused by the Windows internals.
Yutaka Tanida [yutaka@marin.or.jp] and I have examined IPC
library.
We found that postmaster doesn't call exec() after fork() since v6.4.
The value of static/extern variables which cygipc library holds may
be different from their initial values when postmaster fork()s child
backend processes.
I made the following patch for cygipc library on trial.
This patch was effective for Yutaka's test case.
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
*** sem.c.orig Tue Dec 01 00:16:25 1998
--- sem.c Tue Aug 17 13:22:06 1999
***************
*** 58,63 ****
--- 58,78 ----
static int GFirstSem = 0; /*PCPC*/
static int GFdSem ; /*PCPC*/
+ static pid_t GProcessId = 0;
+
+ static void init_globals(void)
+ {
+ pid_t pid;
+
+ if (pid=getpid(), pid != GProcessId)
+ {
+ GFirstSem = 0;
+ used_sems = used_semids = max_semid = 0;
+ sem_seq = 0;
+ GProcessId = pid;
+ }
+ }
+
/************************************************************************/
/* Demande d'acces a la zone partagee de gestion des semaphores */
/************************************************************************/
***************
*** 77,82 ****
--- 92,98 ----
{
int LRet ;
+ init_globals();
if( GFirstSem == 0 )
{
if( IsGSemSemExist() )
*** shm.c.orig Tue Dec 01 01:04:57 1998
--- shm.c Tue Aug 17 13:22:27 1999
***************
*** 59,64 ****
--- 59,81 ----
static int GFirstShm = 0; /*PCPC*/
static int GFdShm ; /*PCPC*/
+ /*****************************************/
+ /* Initialization of static variables */
+ /*****************************************/
+ static pid_t GProcessId = 0;
+ static void init_globals(void)
+ {
+ pid_t pid;
+
+ if (pid=getpid(), pid != GProcessId)
+ {
+ GFirstShm = 0;
+ shm_rss = shm_swp = max_shmid = 0;
+ shm_seq = 0;
+ GProcessId = pid;
+ }
+ }
+
/************************************************************************/
/* Demande d'acces a la zone partagee de gestion des shm */
/************************************************************************/
***************
*** 82,87 ****
--- 99,105 ----
{
int LRet ;
+ init_globals();
if( GFirstShm == 0 )
{
if( IsGSemShmExist() )
*** msg.c.orig Tue Dec 01 00:16:09 1998
--- msg.c Tue Aug 17 13:20:04 1999
***************
*** 57,62 ****
--- 57,77 ----
static int GFirstMsg = 0; /*PCPC*/
static int GFdMsg ; /*PCPC*/
+ /*****************************************/
+ /* Initialization of static variables */
+ /*****************************************/
+ static pid_t GProcessId = 0;
+ static void init_globals(void)
+ {
+ pid_t pid;
+
+ if (pid=getpid(), pid != GProcessId)
+ {
+ GFirstMsg = 0;
+ msgbytes = msghdrs = msg_seq = used_queues = max_msqid = 0;
+ GProcessId = pid;
+ }
+ }
/************************************************************************/
/* Demande d'acces a la zone partagee de gestion des semaphores */
/************************************************************************/
***************
*** 79,84 ****
--- 94,100 ----
{
int LRet ;
+ init_globals();
if( GFirstMsg == 0 )
{
if( IsGSemMsgExist() )
In any case, when one backend quits and another one is
started, the new
one will re-use the semaphore no longer used by thedefunct backend.
I have tested my solution a bit more and I have to say that
reusing a
semaphore by a new backend works OK. But it is not possible
for a newly
created backend to use a semaphore allocated by postmaster
(it freezes on
test if the semaphore with given key already exists - done with
semId=semget(semKey, 0, 0) in function IpcSemaphoreCreate() in
storage/ipc/ipc.c ). Why it is, I don't know, but it seems that
my solution
uses the ipc library in the right way. There are no longer any error
messages from the ipc library when running the server. And I
can't say that
the ipc library is a 100% correct implementation of SysV IPC, it
is probably
(sure ;-) )caused by the Windows internals.Yutaka Tanida [yutaka@marin.or.jp] and I have examined IPC
library.We found that postmaster doesn't call exec() after fork() since v6.4.
The value of static/extern variables which cygipc library holds may
be different from their initial values when postmaster fork()s child
backend processes.I made the following patch for cygipc library on trial.
This patch was effective for Yutaka's test case.
I will try it right now, it looks interesting. It could also explain some
"non-deterministic" behavior of the cygipc library.
Dan
Import Notes
Resolved by subject fallback
I made the following patch for cygipc library on trial.
This patch was effective for Yutaka's test case.I will try it right now, it looks interesting. It could also
explain some
"non-deterministic" behavior of the cygipc library.
After first few tests I can say, that the patch for cygipc looks like the
right solution of the backend-freezing problem. So we can put it into
src/win32 and mention to apply it in the README.NT in part about cygipc. I
will send it to author of the cygipc library.
Dan
Import Notes
Resolved by subject fallback