Question on win32 semaphore simulation

Started by Qingqing Zhoualmost 20 years ago15 messages
#1Qingqing Zhou
zhouqq@cs.toronto.edu

As I reviewed the win32/sema.c, there is some code that I am not clear, can
anybody explain please?

In semctl(SETVAL):

if (semun.val < sem_counts[semNum])
sops.sem_op = -1;
else
sops.sem_op = 1;

/* Quickly lock/unlock the semaphore (if we can) */
if (semop(semId, &sops, 1) < 0)
return -1;

When semun.val < sem_counts[semNum], it means we want to set the semaphore
to semun.val, but because somebody ReleaseSemaphore() for serveral times, so
we should wait for this semaphore several times (i.e., sem_counts[semNum] -
semun.val) to recover it. When semun.val > sem_counts[semNum], we should
ReleaseSemaphore() serveral times to recovery it.

That is, should the sem_op assignment logic be:

sops.sem_op = semun.val - sem_counts[semNum];

Of course, this would require we add a loop logic in semop().

Regards,
Qingqing

#2Qingqing Zhou
zhouqq@cs.toronto.edu
In reply to: Qingqing Zhou (#1)
Re: Question on win32 semaphore simulation

"Qingqing Zhou" <zhouqq@cs.toronto.edu> wrote

As I reviewed the win32/sema.c, there is some code that I am not clear,

can

anybody explain please?

There is another problem related to concurrent operations on win32 sema. Say
two processes are doing semop(+1) concurrently. Look at this code:

/* Don't want the lock anymore */
sem_counts[sops[0].sem_num]++;
ReleaseSemaphore(cur_handle, sops[0].sem_op, NULL);

Except for the problem mentioned in the above thread that the first line
should be: sem_counts[sops[0].sem_num] += sops[0].sem_op, the sem_counts[]
are unprotected by anything, so we might lose an update. Maybe I totally
misunderstand something?

Regards,
Qingqing

#3Magnus Hagander
mha@sollentuna.net
In reply to: Qingqing Zhou (#2)
Re: Question on win32 semaphore simulation

As I reviewed the win32/sema.c, there is some code that I am not
clear,

can

anybody explain please?

There is another problem related to concurrent operations on
win32 sema. Say two processes are doing semop(+1)
concurrently. Look at this code:

/* Don't want the lock anymore */
sem_counts[sops[0].sem_num]++;
ReleaseSemaphore(cur_handle, sops[0].sem_op, NULL);

Except for the problem mentioned in the above thread that the
first line should be: sem_counts[sops[0].sem_num] +=
sops[0].sem_op, the sem_counts[] are unprotected by anything,
so we might lose an update. Maybe I totally misunderstand something?

I've never really looked intot eh semaphore stuff, but if sem_counts[]
is in shared memory it should definitly be protected.

Looking at the code, it looks fairly complex to me. I don't really know
how sysv semaphores are supposed to work, or how we use them, but
perhaps the whole piece of code can be simplified?

//Magnus

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#3)
Re: Question on win32 semaphore simulation

"Magnus Hagander" <mha@sollentuna.net> writes:

Looking at the code, it looks fairly complex to me. I don't really know
how sysv semaphores are supposed to work, or how we use them, but
perhaps the whole piece of code can be simplified?

I'm not sure why the win32 port chose to emulate the SysV semaphore
interface anyway. You could equally well have used the Posix interface
(src/backend/port/posix_sema.c). Or, given Microsoft's NIH tendencies,
you might have needed to write a third implementation of the pg_sema.h
interface ... but it'd likely still be no larger than win32/sema.c ...

regards, tom lane

#5Magnus Hagander
mha@sollentuna.net
In reply to: Tom Lane (#4)
Re: Question on win32 semaphore simulation

Looking at the code, it looks fairly complex to me. I don't really
know how sysv semaphores are supposed to work, or how we

use them, but

perhaps the whole piece of code can be simplified?

I'm not sure why the win32 port chose to emulate the SysV
semaphore interface anyway. You could equally well have used
the Posix interface (src/backend/port/posix_sema.c). Or,
given Microsoft's NIH tendencies, you might have needed to
write a third implementation of the pg_sema.h interface ...
but it'd likely still be no larger than win32/sema.c ...

I think that's a leftover from that code coming from a time when we
didn't have an abstraction for semaphores. Specifically, it may have
come out of the peerdirect port with was IIRC 7.3.

Going with the third option might be a good idea - win32 *does* have
native semaphores, and most of the work appears to be first adapting our
need to sysv, then adapting sysv to win32. Worth looking at I guess.

//Magnus

#6Qingqing Zhou
zhouqq@cs.toronto.edu
In reply to: Magnus Hagander (#5)
Re: Question on win32 semaphore simulation

""Magnus Hagander"" <mha@sollentuna.net> wrote

I'm not sure why the win32 port chose to emulate the SysV
semaphore interface anyway. You could equally well have used
the Posix interface (src/backend/port/posix_sema.c). Or,
given Microsoft's NIH tendencies, you might have needed to
write a third implementation of the pg_sema.h interface ...
but it'd likely still be no larger than win32/sema.c ...

Going with the third option might be a good idea - win32 *does* have
native semaphores, and most of the work appears to be first adapting our
need to sysv, then adapting sysv to win32. Worth looking at I guess.

Emulating the posix interface has almost the same difficulty as SysV for two
reasons:
(1) we don't have to emulate everything of SysV. For example, the "nops"
parameter of semop() since we don't use it;
(2) the killer function is PGSemaphoreReset(). There is no direct function
for this in Win32 either.

So we might want to fix current win32/sema.c for two problems:
(1) semctl(SETVAL, val=0) - there is no other "val" than zero is used;
(2) concurrent access to sem_counts[];

For problem 1, we can do it by locking the whole session of semaphore
operation (which is the saftest way, but it is performance bad enough), or
we could just do it as the posix_sema.c/PGSemaphoreReset does (which is the
easiest way - just get rid of the EAGAIN report if we can't get it). But
both of them to me, too bad - because I don't really understand why there is
a semctl(SETVAL, val) in SysV anyway - IMHO this does not make any sense -
even if you SETVAL to some value, you are not guranteed that you will stll
get this value just after the function return. The
UnlockBuffers()/UnpinBuffer() fix several days ago have proved this. In
short, if we can get rid of this usage, both POSIX and Win32 will be happy
to see it.

Regards,
Qingqing

#7Qingqing Zhou
zhouqq@cs.toronto.edu
In reply to: Magnus Hagander (#5)
Re: Question on win32 semaphore simulation

"Qingqing Zhou" <zhouqq@cs.toronto.edu> wrote

So we might want to fix current win32/sema.c for two problems:
(1) semctl(SETVAL, val=0) - there is no other "val" than zero is used;
(2) concurrent access to sem_counts[];

Attached is a patch for the above proposed change -- but still, I hope we
don't need semctl(SETVAL) at all.

Regards,
Qingqing

---------

Index: sema.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/port/win32/sema.c,v
retrieving revision 1.13
diff -c -r1.13 sema.c
*** sema.c 9 Apr 2006 19:21:34 -0000 1.13
--- sema.c 19 Apr 2006 03:56:55 -0000
***************
*** 42,48 ****
   /* Fix the count of all sem of the pool to semun.array */
   if (flag == SETALL)
   {
!   int   i;
    struct sembuf sops;
    sops.sem_flg = IPC_NOWAIT;
--- 42,49 ----
   /* Fix the count of all sem of the pool to semun.array */
   if (flag == SETALL)
   {
!   int  i;
!   int  errStatus;
    struct sembuf sops;

sops.sem_flg = IPC_NOWAIT;
***************
*** 60,66 ****
sops.sem_num = i;

     /* Quickly lock/unlock the semaphore (if we can) */
!    if (semop(semId, &sops, 1) < 0)
      return -1;
    }
    return 1;
--- 61,72 ----
     sops.sem_num = i;

/* Quickly lock/unlock the semaphore (if we can) */
! do
! {
! errStatus = semop(semId, &sops, 1);
! } while (errStatus < 0 && errno == EINTR);
!
! if (errStatus < 0)
return -1;
}
return 1;
***************
*** 72,88 ****
if (semun.val != sem_counts[semNum])
{
struct sembuf sops;

sops.sem_flg = IPC_NOWAIT;
sops.sem_num = semNum;
!
! if (semun.val < sem_counts[semNum])
! sops.sem_op = -1;
! else
! sops.sem_op = 1;

/* Quickly lock/unlock the semaphore (if we can) */
! if (semop(semId, &sops, 1) < 0)
return -1;
}

--- 78,96 ----
    if (semun.val != sem_counts[semNum])
    {
     struct sembuf sops;
+    int  errStatus;

sops.sem_flg = IPC_NOWAIT;
sops.sem_num = semNum;
! sops.sem_op = semun.val - sem_counts[semNum];

/* Quickly lock/unlock the semaphore (if we can) */
! do
! {
! errStatus = semop(semId, &sops, 1);
! } while (errStatus < 0 && errno == EINTR);
!
! if (errStatus < 0)
return -1;
}

***************
*** 226,269 ****

cur_handle = sem_handles[sops[0].sem_num];

! if (sops[0].sem_op == -1)
{
DWORD ret;
HANDLE wh[2];

wh[0] = cur_handle;
wh[1] = pgwin32_signal_event;

! ret = WaitForMultipleObjectsEx(2, wh, FALSE, (sops[0].sem_flg &
IPC_NOWAIT) ? 0 : INFINITE, TRUE);
!
! if (ret == WAIT_OBJECT_0)
{
! /* We got it! */
! sem_counts[sops[0].sem_num]--;
! return 0;
}
! else if (ret == WAIT_OBJECT_0 + 1 || ret == WAIT_IO_COMPLETION)
! {
! /* Signal event is set - we have a signal to deliver */
! pgwin32_dispatch_queued_signals();
! errno = EINTR;
! }
! else if (ret == WAIT_TIMEOUT)
! /* Couldn't get it */
! errno = EAGAIN;
! else
! errno = EIDRM;
}
! else if (sops[0].sem_op > 0)
{
/* Don't want the lock anymore */
! sem_counts[sops[0].sem_num]++;
ReleaseSemaphore(cur_handle, sops[0].sem_op, NULL);
return 0;
}
- else
- /* Not supported */
- errno = ERANGE;

   /* If we get down here, then something is wrong */
   return -1;
--- 234,295 ----

cur_handle = sem_handles[sops[0].sem_num];

! if (sops[0].sem_op < 0)
{
DWORD ret;
HANDLE wh[2];
+ int i;

wh[0] = cur_handle;
wh[1] = pgwin32_signal_event;

! /*
! * Try to consume the specified sem count. If we can't, we just
! * quit the operation silently because it is possible there is
! * another process just did some semop(-k) during our loop.
! */
! errno = 0;
! for (i = 0; i < -(sops[0].sem_op); i++)
{
! ret = WaitForMultipleObjectsEx(2, wh, FALSE,
! (sops[0].sem_flg & IPC_NOWAIT) ? 0 : INFINITE, TRUE);
!
! if (ret == WAIT_OBJECT_0)
! {
! /* We got it! */
! InterlockedDecrement((volatile long *)(sem_counts + sops[0].sem_num));
! }
! else if (ret == WAIT_OBJECT_0 + 1 || ret == WAIT_IO_COMPLETION)
! {
! /* Signal event is set - we have a signal to deliver */
! pgwin32_dispatch_queued_signals();
! errno = EINTR;
! }
! else if (ret == WAIT_TIMEOUT)
! /* Couldn't get it */
! break;
! else
! errno = EIDRM;
!
! /* return immediately on error */
! if (errno != 0)
! break;
}
!
! /* successfully done */
! if (errno == 0)
! return 0;
}
! else
{
+ Assert(sops[0].sem_op > 0);
+
/* Don't want the lock anymore */
! InterlockedExchangeAdd((volatile long *)
! (sem_counts + sops[0].sem_num), sops[0].sem_op);
ReleaseSemaphore(cur_handle, sops[0].sem_op, NULL);
return 0;
}

/* If we get down here, then something is wrong */
return -1;

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Qingqing Zhou (#6)
Re: Question on win32 semaphore simulation

"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:

(2) the killer function is PGSemaphoreReset(). There is no direct function
for this in Win32 either.

If you can do PGSemaphoreTryLock, then Reset need only be a loop around
it (cf. posix_sema.c). In current usage Reset doesn't have to be very
efficient at all, because it's only used during backend startup to bring
the semaphore to a known state.

(1) semctl(SETVAL, val=0) - there is no other "val" than zero is used;

Really? Better look again.

If you think the SysV interface is baroque (which I don't disagree
with), then you should just get rid of it entirely and implement
pg_sema.h directly atop the Windows primitives. I don't have a lot of
sympathy for "let's implement just part of SysV because I don't like
that other part". There is no contract saying that sysv_sema.c might
not start using SysV features it doesn't use today.

regards, tom lane

#9Magnus Hagander
mha@sollentuna.net
In reply to: Tom Lane (#8)
Re: Question on win32 semaphore simulation

(2) the killer function is PGSemaphoreReset(). There is no direct
function for this in Win32 either.

If you can do PGSemaphoreTryLock, then Reset need only be a
loop around it (cf. posix_sema.c). In current usage Reset
doesn't have to be very efficient at all, because it's only
used during backend startup to bring the semaphore to a known state.

(1) semctl(SETVAL, val=0) - there is no other "val" than

zero is used;

Really? Better look again.

If you think the SysV interface is baroque (which I don't
disagree with), then you should just get rid of it entirely
and implement pg_sema.h directly atop the Windows primitives.
I don't have a lot of sympathy for "let's implement just
part of SysV because I don't like that other part". There is
no contract saying that sysv_sema.c might not start using
SysV features it doesn't use today.

That's what I was thinking when I said "option 3". It shouldn't be *too*
hard, and much cleaner.

//Magnus

#10Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Qingqing Zhou (#7)
Re: [HACKERS] Question on win32 semaphore simulation

While we have installed a Win32-specific semaphore implementation for
CVS HEAD, what do we want do apply for the back branches, 8.0.X and
8.1.X. Is this the patch that should be applied?

---------------------------------------------------------------------------

Qingqing Zhou wrote:

"Qingqing Zhou" <zhouqq@cs.toronto.edu> wrote

So we might want to fix current win32/sema.c for two problems:
(1) semctl(SETVAL, val=0) - there is no other "val" than zero is used;
(2) concurrent access to sem_counts[];

Attached is a patch for the above proposed change -- but still, I hope we
don't need semctl(SETVAL) at all.

Regards,
Qingqing

---------

Index: sema.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/port/win32/sema.c,v
retrieving revision 1.13
diff -c -r1.13 sema.c
*** sema.c 9 Apr 2006 19:21:34 -0000 1.13
--- sema.c 19 Apr 2006 03:56:55 -0000
***************
*** 42,48 ****
/* Fix the count of all sem of the pool to semun.array */
if (flag == SETALL)
{
!   int   i;
struct sembuf sops;
sops.sem_flg = IPC_NOWAIT;
--- 42,49 ----
/* Fix the count of all sem of the pool to semun.array */
if (flag == SETALL)
{
!   int  i;
!   int  errStatus;
struct sembuf sops;

sops.sem_flg = IPC_NOWAIT;
***************
*** 60,66 ****
sops.sem_num = i;

/* Quickly lock/unlock the semaphore (if we can) */
!    if (semop(semId, &sops, 1) < 0)
return -1;
}
return 1;
--- 61,72 ----
sops.sem_num = i;

/* Quickly lock/unlock the semaphore (if we can) */
! do
! {
! errStatus = semop(semId, &sops, 1);
! } while (errStatus < 0 && errno == EINTR);
!
! if (errStatus < 0)
return -1;
}
return 1;
***************
*** 72,88 ****
if (semun.val != sem_counts[semNum])
{
struct sembuf sops;

sops.sem_flg = IPC_NOWAIT;
sops.sem_num = semNum;
!
! if (semun.val < sem_counts[semNum])
! sops.sem_op = -1;
! else
! sops.sem_op = 1;

/* Quickly lock/unlock the semaphore (if we can) */
! if (semop(semId, &sops, 1) < 0)
return -1;
}

--- 78,96 ----
if (semun.val != sem_counts[semNum])
{
struct sembuf sops;
+    int  errStatus;

sops.sem_flg = IPC_NOWAIT;
sops.sem_num = semNum;
! sops.sem_op = semun.val - sem_counts[semNum];

/* Quickly lock/unlock the semaphore (if we can) */
! do
! {
! errStatus = semop(semId, &sops, 1);
! } while (errStatus < 0 && errno == EINTR);
!
! if (errStatus < 0)
return -1;
}

***************
*** 226,269 ****

cur_handle = sem_handles[sops[0].sem_num];

! if (sops[0].sem_op == -1)
{
DWORD ret;
HANDLE wh[2];

wh[0] = cur_handle;
wh[1] = pgwin32_signal_event;

! ret = WaitForMultipleObjectsEx(2, wh, FALSE, (sops[0].sem_flg &
IPC_NOWAIT) ? 0 : INFINITE, TRUE);
!
! if (ret == WAIT_OBJECT_0)
{
! /* We got it! */
! sem_counts[sops[0].sem_num]--;
! return 0;
}
! else if (ret == WAIT_OBJECT_0 + 1 || ret == WAIT_IO_COMPLETION)
! {
! /* Signal event is set - we have a signal to deliver */
! pgwin32_dispatch_queued_signals();
! errno = EINTR;
! }
! else if (ret == WAIT_TIMEOUT)
! /* Couldn't get it */
! errno = EAGAIN;
! else
! errno = EIDRM;
}
! else if (sops[0].sem_op > 0)
{
/* Don't want the lock anymore */
! sem_counts[sops[0].sem_num]++;
ReleaseSemaphore(cur_handle, sops[0].sem_op, NULL);
return 0;
}
- else
- /* Not supported */
- errno = ERANGE;

/* If we get down here, then something is wrong */
return -1;
--- 234,295 ----

cur_handle = sem_handles[sops[0].sem_num];

! if (sops[0].sem_op < 0)
{
DWORD ret;
HANDLE wh[2];
+ int i;

wh[0] = cur_handle;
wh[1] = pgwin32_signal_event;

! /*
! * Try to consume the specified sem count. If we can't, we just
! * quit the operation silently because it is possible there is
! * another process just did some semop(-k) during our loop.
! */
! errno = 0;
! for (i = 0; i < -(sops[0].sem_op); i++)
{
! ret = WaitForMultipleObjectsEx(2, wh, FALSE,
! (sops[0].sem_flg & IPC_NOWAIT) ? 0 : INFINITE, TRUE);
!
! if (ret == WAIT_OBJECT_0)
! {
! /* We got it! */
! InterlockedDecrement((volatile long *)(sem_counts + sops[0].sem_num));
! }
! else if (ret == WAIT_OBJECT_0 + 1 || ret == WAIT_IO_COMPLETION)
! {
! /* Signal event is set - we have a signal to deliver */
! pgwin32_dispatch_queued_signals();
! errno = EINTR;
! }
! else if (ret == WAIT_TIMEOUT)
! /* Couldn't get it */
! break;
! else
! errno = EIDRM;
!
! /* return immediately on error */
! if (errno != 0)
! break;
}
!
! /* successfully done */
! if (errno == 0)
! return 0;
}
! else
{
+ Assert(sops[0].sem_op > 0);
+
/* Don't want the lock anymore */
! InterlockedExchangeAdd((volatile long *)
! (sem_counts + sops[0].sem_num), sops[0].sem_op);
ReleaseSemaphore(cur_handle, sops[0].sem_op, NULL);
return 0;
}

/* If we get down here, then something is wrong */
return -1;

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#10)
Re: [HACKERS] Question on win32 semaphore simulation

Bruce Momjian <pgman@candle.pha.pa.us> writes:

While we have installed a Win32-specific semaphore implementation for
CVS HEAD, what do we want do apply for the back branches, 8.0.X and
8.1.X. Is this the patch that should be applied?

Leave 'em alone. That code has zero field validation, and should
certainly not get shipped until it's survived a beta-test cycle.

regards, tom lane

#12Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#11)
Re: [HACKERS] Question on win32 semaphore simulation

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

While we have installed a Win32-specific semaphore implementation for
CVS HEAD, what do we want do apply for the back branches, 8.0.X and
8.1.X. Is this the patch that should be applied?

Leave 'em alone. That code has zero field validation, and should
certainly not get shipped until it's survived a beta-test cycle.

Uh, this is a bug fix, and the patch I am asking about is not the Win32
semaphore reimplementation but a more limited fix. Here is the problem
report:

http://archives.postgresql.org/pgsql-bugs/2006-04/msg00101.php

The test request:

http://archives.postgresql.org/pgsql-bugs/2006-04/msg00251.php

and the successful test run:

http://archives.postgresql.org/pgsql-bugs/2006-05/msg00002.php

We don't require bug fixes to go through beta testing.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#13Qingqing Zhou
zhouqq@cs.toronto.edu
In reply to: Bruce Momjian (#12)
Re: [HACKERS] Question on win32 semaphore simulation

On Sun, 7 May 2006, Bruce Momjian wrote:

Leave 'em alone. That code has zero field validation, and should
certainly not get shipped until it's survived a beta-test cycle.

Uh, this is a bug fix, and the patch I am asking about is not the Win32
semaphore reimplementation but a more limited fix.

Sorry for the late reply. Maybe more intensive tests are needed? Since
this bug seems could not lead data corruption, we can wait till next bug
report and let the user test it then decide to apply?

Regards,
Qingqing

#14Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Qingqing Zhou (#13)
Re: [HACKERS] Question on win32 semaphore simulation

Qingqing Zhou wrote:

On Sun, 7 May 2006, Bruce Momjian wrote:

Leave 'em alone. That code has zero field validation, and should
certainly not get shipped until it's survived a beta-test cycle.

Uh, this is a bug fix, and the patch I am asking about is not the Win32
semaphore reimplementation but a more limited fix.

Sorry for the late reply. Maybe more intensive tests are needed? Since
this bug seems could not lead data corruption, we can wait till next bug
report and let the user test it then decide to apply?

Well we did have a bug report by Peter Brant, and a test by him with the
patch that fixes it, so it seems it should be applied. The URLs I
posted had that information.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#15Qingqing Zhou
zhouqq@cs.toronto.edu
In reply to: Qingqing Zhou (#13)
Re: [HACKERS] Question on win32 semaphore simulation

"Bruce Momjian" <pgman@candle.pha.pa.us> wrote

Sorry for the late reply. Maybe more intensive tests are needed? Since
this bug seems could not lead data corruption, we can wait till next bug
report and let the user test it then decide to apply?

Well we did have a bug report by Peter Brant, and a test by him with the
patch that fixes it, so it seems it should be applied. The URLs I
posted had that information.

Ok, go ahead. What I am worrying about is to cause bigger trouble than now
... I will watch it if anything unexpected reported.

Regards,
Qingqing