FATAL: semctl(1672698088, 12, SETVAL, 0) failed

Started by Qingqing Zhou about 20 years ago | 6 messages | bugs
#1 Qingqing Zhou
zhouqq@cs.toronto.edu

I encountered an error when doing a fast shutdown of 8.1.1 on Win2k:

FATAL: semctl(1672698088, 12, SETVAL, 0) failed: A blocking operation
was interrupted by a call to WSACancelBlockingCall.

A similar error on 8.1/win2003 was reported on pgsql-general (sorry, I can't
dig out the original post from our web archives):

From: Niederland
Date: Tues, Dec 13 2005 9:49 am

2005-12-12 20:30:00 FATAL: semctl(50884184, 15, SETVAL, 0) failed: A
non-blocking socket operation could not be completed immediately.

---

There are two problems here:

(1) Why a socket error?
In port/win32.h, we have

    #undef EAGAIN
    #undef EINTR
    #define EINTR WSAEINTR
    #define EAGAIN WSAEWOULDBLOCK

What's the rationale of doing so?
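A minimal sketch of why a semaphore failure ends up printing Winsock text: once EINTR is defined as WSAEINTR, code that sets errno = EINTR stores the Winsock code, and an error-text lookup on that code returns the socket message. To compile anywhere, the sketch uses renamed stand-ins (SIM_WSAEINTR = 10004 and SIM_WSAEWOULDBLOCK = 10035, the Winsock values) and a hypothetical sim_strerror() table whose strings are copied from the two FATAL messages above; sim_semop_interrupted() and report_semctl_failure() are illustrative names, not PostgreSQL functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Winsock error codes: WSAEINTR and WSAEWOULDBLOCK. */
#define SIM_WSAEINTR        10004
#define SIM_WSAEWOULDBLOCK  10035

/* Stand-ins for port/win32.h's "#define EINTR WSAEINTR" and
 * "#define EAGAIN WSAEWOULDBLOCK" (renamed so this compiles anywhere). */
#define SIM_EINTR   SIM_WSAEINTR
#define SIM_EAGAIN  SIM_WSAEWOULDBLOCK

/* Hypothetical error-text lookup; the strings are the ones from the
 * FATAL messages quoted in this thread. */
static const char *sim_strerror(int err)
{
    switch (err)
    {
        case SIM_WSAEINTR:
            return "A blocking operation was interrupted by a call to "
                   "WSACancelBlockingCall.";
        case SIM_WSAEWOULDBLOCK:
            return "A non-blocking socket operation could not be "
                   "completed immediately.";
        default:
            return "unknown error";
    }
}

/* Hypothetical semop() emulation whose wait was interrupted by a signal:
 * it "means" EINTR, but EINTR now carries the Winsock code. */
static int sim_semop_interrupted(void)
{
    errno = SIM_EINTR;
    return -1;
}

/* Format the failure the way the FATAL messages above are laid out.
 * Returns 0 if a failure was reported, 1 if the call succeeded. */
static int report_semctl_failure(char *buf, size_t len,
                                 int semId, int semNum, int value)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    if (sim_semop_interrupted() < 0)
    {
        int save_errno = errno;

        snprintf(buf, len, "semctl(%d, %d, SETVAL, %d) failed: %s",
                 semId, semNum, value, sim_strerror(save_errno));
        return 0;
    }
    return 1;
}
```

Because the real remapping is global, the same substitution applies to EAGAIN, which is consistent with the WSAEWOULDBLOCK text in the second report.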

(2) What happened here?
It may come from PGSemaphoreReset(), and win32 semop() looks like this:

    ret = WaitForMultipleObjectsEx(2, wh, FALSE,
              (sops[0].sem_flg & IPC_NOWAIT) ? 0 : INFINITE, TRUE);
    ...
    else if (ret == WAIT_OBJECT_0 + 1 || ret == WAIT_IO_COMPLETION)
    {
        pgwin32_dispatch_queued_signals();
        errno = EINTR;
    }
    else if (ret == WAIT_TIMEOUT)
        errno = EAGAIN;

So it seems the EINTR is caused by an incoming signal and the EAGAIN by a
timeout... any ideas?
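The reasoning about where each errno value comes from can be checked against a small model of the branch structure in the excerpt. This is a sketch only: the SIM_WAIT_* and SIM_INFINITE values are those of the Windows constants WAIT_OBJECT_0, WAIT_IO_COMPLETION, WAIT_TIMEOUT and INFINITE, SIM_IPC_NOWAIT is a hypothetical flag bit, and treating WAIT_OBJECT_0 (wh[0]) as "semaphore acquired" is an assumption, since that branch is elided ("...") in the excerpt.

```c
#include <assert.h>

/* Values of the Windows constants used in the excerpt, redefined under
 * SIM_ names so the sketch compiles anywhere. */
#define SIM_WAIT_OBJECT_0       0x00000000UL
#define SIM_WAIT_IO_COMPLETION  0x000000C0UL
#define SIM_WAIT_TIMEOUT        0x00000102UL
#define SIM_INFINITE            0xFFFFFFFFUL

/* Hypothetical flag bit; only its use as a mask matters here. */
#define SIM_IPC_NOWAIT          04000

typedef enum
{
    OUTCOME_ACQUIRED,   /* wh[0] signaled: assumed to mean "got the semaphore" */
    OUTCOME_EINTR,      /* wh[1] (apparently the signal event) or an APC */
    OUTCOME_EAGAIN,     /* the wait timed out */
    OUTCOME_OTHER       /* branches elided from the excerpt */
} sem_outcome;

/* Timeout argument passed to WaitForMultipleObjectsEx() in the excerpt. */
static unsigned long wait_timeout_ms(int sem_flg)
{
    return (sem_flg & SIM_IPC_NOWAIT) ? 0UL : SIM_INFINITE;
}

/* Mirrors the if/else chain on the wait result. */
static sem_outcome classify_wait_result(unsigned long ret)
{
    if (ret == SIM_WAIT_OBJECT_0)
        return OUTCOME_ACQUIRED;
    if (ret == SIM_WAIT_OBJECT_0 + 1 || ret == SIM_WAIT_IO_COMPLETION)
        return OUTCOME_EINTR;
    if (ret == SIM_WAIT_TIMEOUT)
        return OUTCOME_EAGAIN;
    return OUTCOME_OTHER;
}

/* An INFINITE wait never returns WAIT_TIMEOUT, so EAGAIN from this path
 * is only possible when the caller passed IPC_NOWAIT. */
static int eagain_possible(int sem_flg)
{
    return wait_timeout_ms(sem_flg) != SIM_INFINITE;
}
```

If this model matches the real code, the EAGAIN in the second report must come from a call made with IPC_NOWAIT, while an EINTR can come from any call that was woken by the signal event or an APC.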

Regards,
Qingqing

#2 Bruce Momjian
bruce@momjian.us
In reply to: Qingqing Zhou (#1)
Re: FATAL: semctl(1672698088, 12, SETVAL, 0) failed

Qingqing Zhou wrote:

> I encountered an error when doing a fast shutdown of 8.1.1 on Win2k:
>
> FATAL: semctl(1672698088, 12, SETVAL, 0) failed: A blocking operation
> was interrupted by a call to WSACancelBlockingCall.
>
> A similar error on 8.1/win2003 was reported on pgsql-general (sorry, I can't
> dig out the original post from our web archives):
>
> From: Niederland
> Date: Tues, Dec 13 2005 9:49 am
>
> 2005-12-12 20:30:00 FATAL: semctl(50884184, 15, SETVAL, 0) failed: A
> non-blocking socket operation could not be completed immediately.
>
> ---
>
> There are two problems here:
>
> (1) Why a socket error?
> In port/win32.h, we have
>
>     #undef EAGAIN
>     #undef EINTR
>     #define EINTR WSAEINTR
>     #define EAGAIN WSAEWOULDBLOCK
>
> What's the rationale of doing so?

We did this so that our code could refer to EINTR/EAGAIN without
port-specific tests.

> (2) What happened here?
> It may come from PGSemaphoreReset(), and win32 semop() looks like this:
>
>     ret = WaitForMultipleObjectsEx(2, wh, FALSE,
>               (sops[0].sem_flg & IPC_NOWAIT) ? 0 : INFINITE, TRUE);
>     ...
>     else if (ret == WAIT_OBJECT_0 + 1 || ret == WAIT_IO_COMPLETION)
>     {
>         pgwin32_dispatch_queued_signals();
>         errno = EINTR;
>     }
>     else if (ret == WAIT_TIMEOUT)
>         errno = EAGAIN;
>
> So it seems the EINTR is caused by an incoming signal and the EAGAIN by a
> timeout... any ideas?

I looked at the documentation for the function:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/waitformultipleobjectsex.asp

and it isn't clear what return failure values it has. We certainly
could loop on WSAEINTR. Can you test it?

--
Bruce Momjian http://candle.pha.pa.us
SRA OSS, Inc. http://www.sraoss.com

+ If your life is a hard drive, Christ can be your backup. +

#3 Qingqing Zhou
zhouqq@cs.toronto.edu
In reply to: Qingqing Zhou (#1)
Re: FATAL: semctl(1672698088, 12, SETVAL, 0) failed

"Bruce Momjian" <pgman@candle.pha.pa.us> wrote:

>> In port/win32.h, we have
>>
>>     #undef EAGAIN
>>     #undef EINTR
>>     #define EINTR WSAEINTR
>>     #define EAGAIN WSAEWOULDBLOCK
>>
>> What's the rationale of doing so?
>
> We did this so that our code could refer to EINTR/EAGAIN without
> port-specific tests.

AFAICS, by doing so, EINTR/EAGAIN are translated into
WSAEINTR/WSAEWOULDBLOCK throughout *all* the backend code. That doesn't seem
appropriate for code that doesn't involve sockets at all... I think we need
a fix here.
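One way to see the concern: the remapping changes what every `errno == EINTR` test compares against, including in code that never touches a socket. The sketch below assumes the C runtime reports an interrupted call with its own EINTR value (4 in MSVC's <errno.h>) while the remapped EINTR is the Winsock value 10004; CRT_EINTR, REMAPPED_EINTR, crt_style_call() and should_retry() are hypothetical names, not backend code.

```c
#include <assert.h>
#include <errno.h>

/* Assumed values: the C runtime's EINTR (4 in MSVC) vs. Winsock's WSAEINTR. */
#define CRT_EINTR       4
#define SIM_WSAEINTR    10004

/* Stand-in for the global "#define EINTR WSAEINTR" from port/win32.h. */
#define REMAPPED_EINTR  SIM_WSAEINTR

/* Hypothetical non-socket call that fails the way the C runtime would. */
static int crt_style_call(void)
{
    errno = CRT_EINTR;
    return -1;
}

/* A typical backend retry test, compiled against the remapped EINTR. */
static int should_retry(int err)
{
    return err == REMAPPED_EINTR;
}

/* Returns 1 if an interrupted C-runtime call would be retried. */
static int crt_interrupt_is_retried(void)
{
    if (crt_style_call() < 0)
        return should_retry(errno);
    return 0;
}
```

Under these assumptions the interrupted runtime call is not recognized as EINTR at all, while emulation code that sets errno = EINTR produces a Winsock code, which is what shows up in the FATAL messages.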

>> (2) What happened here?
>> It may come from PGSemaphoreReset(), and win32 semop() looks like this:
>>
>>     ret = WaitForMultipleObjectsEx(2, wh, FALSE,
>>               (sops[0].sem_flg & IPC_NOWAIT) ? 0 : INFINITE, TRUE);
>>     ...
>>     else if (ret == WAIT_OBJECT_0 + 1 || ret == WAIT_IO_COMPLETION)
>>     {
>>         pgwin32_dispatch_queued_signals();
>>         errno = EINTR;
>>     }
>>     else if (ret == WAIT_TIMEOUT)
>>         errno = EAGAIN;
>>
>> So it seems the EINTR is caused by an incoming signal and the EAGAIN by a
>> timeout... any ideas?

> I looked at the documentation for the function:
>
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/waitformultipleobjectsex.asp
>
> and it isn't clear what return failure values it has. We certainly
> could loop on WSAEINTR. Can you test it?

Yeah, looking at other code that uses semop(), we could plug a retry loop
into the win32 semctl():

        /* Quickly lock/unlock the semaphore (if we can) */
    +   do
    +   {
    +       errStatus = semop(semId, &sops, 1);
    +   } while (errStatus < 0 && errno == EINTR);
    -   if (semop(semId, &sops, 1) < 0)
    +   if (errStatus < 0)
            return -1;

But:
(1) The EINTR problem happens rather rarely, so it is difficult to test;
(2) I would rather not make the above change before we understand what
happened here, especially since we have also seen an EAGAIN reported.

Regards,
Qingqing

#4 Bruce Momjian
bruce@momjian.us
In reply to: Qingqing Zhou (#3)
Re: FATAL: semctl(1672698088, 12, SETVAL, 0) failed

Qingqing Zhou wrote:

> "Bruce Momjian" <pgman@candle.pha.pa.us> wrote:
>
>>> In port/win32.h, we have
>>>
>>>     #undef EAGAIN
>>>     #undef EINTR
>>>     #define EINTR WSAEINTR
>>>     #define EAGAIN WSAEWOULDBLOCK
>>>
>>> What's the rationale of doing so?
>>
>> We did this so that our code could refer to EINTR/EAGAIN without
>> port-specific tests.
>
> AFAICS, by doing so, EINTR/EAGAIN are translated into
> WSAEINTR/WSAEWOULDBLOCK throughout *all* the backend code. That doesn't seem
> appropriate for code that doesn't involve sockets at all... I think we need
> a fix here.

Uh, how do we handle it now? I thought we did just that.

>> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/waitformultipleobjectsex.asp
>>
>> and it isn't clear what return failure values it has. We certainly
>> could loop on WSAEINTR. Can you test it?
>
> Yeah, looking at other code that uses semop(), we could plug a retry loop
> into the win32 semctl():
>
>         /* Quickly lock/unlock the semaphore (if we can) */
>     +   do
>     +   {
>     +       errStatus = semop(semId, &sops, 1);
>     +   } while (errStatus < 0 && errno == EINTR);
>     -   if (semop(semId, &sops, 1) < 0)
>     +   if (errStatus < 0)
>             return -1;
>
> But:
> (1) The EINTR problem happens rather rarely, so it is difficult to test;
> (2) I would rather not make the above change before we understand what
> happened here, especially since we have also seen an EAGAIN reported.

OK, so how do we find the answer?


#5 Qingqing Zhou
zhouqq@cs.toronto.edu
In reply to: Bruce Momjian (#4)
Re: FATAL: semctl(1672698088, 12, SETVAL, 0) failed

On Tue, 28 Feb 2006, Bruce Momjian wrote:

> Uh, how do we handle it now? I thought we did just that.
>
> OK, so how do we find the answer?

I am uncertain about both problems (otherwise I would have sent a patch
already :-(). Calling for more artillery support here...

Regards,
Qingqing

#6 Bruce Momjian
bruce@momjian.us
In reply to: Qingqing Zhou (#5)
Re: FATAL: semctl(1672698088, 12, SETVAL, 0) failed

Thread added to TODO.detail for Win32:

o Check WSACancelBlockingCall() for interrupts (win32intr)

---------------------------------------------------------------------------

Qingqing Zhou wrote:

> On Tue, 28 Feb 2006, Bruce Momjian wrote:
>
>> Uh, how do we handle it now? I thought we did just that.
>>
>> OK, so how do we find the answer?
>
> I am uncertain about both problems (otherwise I would have sent a patch
> already :-(). Calling for more artillery support here...
>
> Regards,
> Qingqing
