WL_SOCKET_ACCEPT fairness on Windows
Hi,
Commit 7389aad6 started using WaitEventSetWait() to wait for incoming
connections. Before that, we used select(), for which we have our own
implementation for Windows.
While hacking on patches to rip a bunch of unused Win32 socket wrapper
code out, I twigged that the old pgwin32_select() code was careful to
report multiple sockets at once by brute force polling of all of them,
while WaitEventSetWait() doesn't do that: it reports just one event,
because that's what the Windows WaitForMultipleObjects() call does.
I guess this means you can probably fill up the listen queue of server
socket 1 to prevent server socket 2 from ever being serviced, whereas
on Unix we'll accept one connection at a time from each in round-robin
fashion.
I think we could get the same effect as pgwin32_select() more cheaply
by doing the initial WaitForMultipleObjects() call with the caller's
timeout exactly as we do today, and then, while we have space,
repeatedly calling
WaitForMultipleObjects(handles=&events[last_signaled_event_index + 1],
timeout=0) until it reports timeout. Windows always reports the
signaled event with the lowest index in the array, so the idea is to
poll the remaining part of the array without waiting, to check for any
more. In the most common case of FEBE socket waits etc there would be
no extra system call (because it uses nevents=1, so no space for
more), and in the postmaster's main loop there would commonly be only
one extra system call to determine that no other events have been
signaled.
The attached patch shows the idea. It's using an ugly goto, but I
guess it should be made decent with a real loop; I just didn't want to
change the indentation level for this POC.
I mention this now because I'm not sure whether to consider this an
'open item' for 16, or merely an enhancement for 17. I guess the
former, because someone might call that a new denial of service
vector. On the other hand, if you fill up the listen queue for socket
1 with enough vigour, you're also denying service to socket 1, so I
don't know if it's worth worrying about. Opinions on that?
I don't have Windows to test any of this. Although this patch passes
on CI, that means nothing, as I don't expect the new code to be
reached, so I'm hoping to find someone who would be able to set up
such a test on Windows, i.e. putting an elog() into the new code path and
trying to reach it by connecting to two different ports fast enough to
exercise the multiple event case.
Or maybe latch.c really needs its own test suite.
Attachments:
0001-Teach-WaitEventSetWait-to-report-multiple-events-on-.patch (text/x-patch)
From 733fd375578d8216fed9a81d4a1a575b1f542ca9 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 1 Apr 2023 12:36:12 +1300
Subject: [PATCH] Teach WaitEventSetWait to report multiple events on Windows.
The WaitEventSet implementation on Windows has always reported only one
event at a time, and always the "lowest" in its event array. Since
commit 7389aad6 started using WaitEventSet to handle incoming socket
connections, this unfairness might potentially upset someone who wants
to handle incoming connections on multiple sockets. If one of them has a
non-empty listen queue due to incoming connections, the other might
never be serviced. The previous coding based on select() was fair in
that way.
Fix, by teaching WaitEventSetWait() to poll for extra events. No change
in behavior in the common case of callers with nevents=1, but for the
postmaster's main loop, we'll drain all the events that can fit in the
output buffer, which is deliberately set large enough to handle the
maximum possible number of sockets. This brings the Windows behavior in
line with Unix.
---
src/backend/storage/ipc/latch.c | 34 +++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index f4123e7de7..cc7b572008 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -2025,6 +2025,8 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
*/
cur_event = (WaitEvent *) &set->events[rc - WAIT_OBJECT_0 - 1];
+loop:
+
occurred_events->pos = cur_event->pos;
occurred_events->user_data = cur_event->user_data;
occurred_events->events = 0;
@@ -2044,6 +2046,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
occurred_events->events = WL_LATCH_SET;
occurred_events++;
returned_events++;
+ nevents--;
}
}
else if (cur_event->events == WL_POSTMASTER_DEATH)
@@ -2063,6 +2066,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
occurred_events->events = WL_POSTMASTER_DEATH;
occurred_events++;
returned_events++;
+ nevents--;
}
}
else if (cur_event->events & WL_SOCKET_MASK)
@@ -2124,6 +2128,36 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
{
occurred_events++;
returned_events++;
+ nevents--;
+ }
+ }
+
+ /*
+ * Do we have space to report more events that might also be signaled later
+ * in the array than cur_event? Being able to return multiple socket
+ * events at a time like the Unix implementations might be important for
+ * client code that wants to be able to service busy sockets fairly.
+ */
+ if (nevents > 0)
+ {
+ int next_pos = cur_event->pos + 1;
+ int count = set->nevents - next_pos;
+
+ if (count > 0)
+ {
+ /*
+ * Poll with zero timeout, and ignore errors now because we
+ * already have events to report.
+ */
+ rc = WaitForMultipleObjects(count,
+ set->handles + next_pos + 1,
+ false,
+ 0);
+ if (rc >= WAIT_OBJECT_0 && rc < WAIT_OBJECT_0 + count)
+ {
+ cur_event = &set->events[next_pos + rc];
+ goto loop;
+ }
}
}
--
2.40.0
Hi,
On 2023-04-01 13:42:21 +1300, Thomas Munro wrote:
Commit 7389aad6 started using WaitEventSetWait() to wait for incoming
connections. Before that, we used select(), for which we have our own
implementation for Windows. While hacking on patches to rip a bunch of unused Win32 socket wrapper
code out, I twigged that the old pgwin32_select() code was careful to
report multiple sockets at once by brute force polling of all of them,
while WaitEventSetWait() doesn't do that: it reports just one event,
because that's what the Windows WaitForMultipleObjects() call does.
I guess this means you can probably fill up the listen queue of server
socket 1 to prevent server socket 2 from ever being serviced, whereas
on Unix we'll accept one connection at a time from each in round-robin
fashion.
It does indeed sound like it'd behave that way:
If bWaitAll is FALSE, the return value minus WAIT_OBJECT_0 indicates the
lpHandles array index of the object that satisfied the wait. If more than
one object became signaled during the call, this is the array index of the
signaled object with the smallest index value of all the signaled objects.
I wonder if we ought to bite the bullet and replace the use of
WaitForMultipleObjects() with RegisterWaitForSingleObject() and then use
GetQueuedCompletionStatus() to wait. The fairness issue here is a motivation,
but the bigger one is that that'd get us out from under the
MAXIMUM_WAIT_OBJECTS (IIRC 64) limit. Afaict that'd also allow us to read
multiple notifications at once, using GetQueuedCompletionStatusEx().
Medium term that'd also be a small step towards using readiness based APIs in
a few places...
I think we could get the same effect as pgwin32_select() more cheaply
by doing the initial WaitForMultipleObjects() call with the caller's
timeout exactly as we do today, and then, while we have space,
repeatedly calling
WaitForMultipleObjects(handles=&events[last_signaled_event_index + 1],
timeout=0) until it reports timeout.
That would make sense, and should indeed be reasonable cost-wise.
I mention this now because I'm not sure whether to consider this an
'open item' for 16, or merely an enhancement for 17. I guess the
former, because someone might call that a new denial of service
vector. On the other hand, if you fill up the listen queue for socket
1 with enough vigour, you're also denying service to socket 1, so I
don't know if it's worth worrying about. Opinions on that?
I'm not sure either. It doesn't strike me as a particularly relevant
bottleneck. And the old approach of doing more work for every single
connection also made many connections worse, I think?
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index f4123e7de7..cc7b572008 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -2025,6 +2025,8 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
*/
cur_event = (WaitEvent *) &set->events[rc - WAIT_OBJECT_0 - 1];
+loop:
+
As far as I can tell, we'll never see WL_LATCH_SET or WL_POSTMASTER_DEATH. I
think it'd look cleaner to move the body of if (cur_event->events & WL_SOCKET_MASK)
into a separate function that we then also can call further down.
occurred_events->pos = cur_event->pos;
occurred_events->user_data = cur_event->user_data;
occurred_events->events = 0;
@@ -2044,6 +2046,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
occurred_events->events = WL_LATCH_SET;
occurred_events++;
returned_events++;
+ nevents--;
}
}
else if (cur_event->events == WL_POSTMASTER_DEATH)
@@ -2063,6 +2066,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
occurred_events->events = WL_POSTMASTER_DEATH;
occurred_events++;
returned_events++;
+ nevents--;
}
}
else if (cur_event->events & WL_SOCKET_MASK)
@@ -2124,6 +2128,36 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
{
occurred_events++;
returned_events++;
+ nevents--;
+ }
+ }
Seems like we could use returned_events to get nevents in the way you want it,
without adding even more ++/-- to each of the different events?
Greetings,
Andres Freund
On Sat, Apr 1, 2023 at 2:35 PM Andres Freund <andres@anarazel.de> wrote:
I wonder if we ought to bite the bullet and replace the use of
WaitForMultipleObjects() with RegisterWaitForSingleObject() and then use
GetQueuedCompletionStatus() to wait. The fairness issue here is a motivation,
but the bigger one is that that'd get us out from under the
MAXIMUM_WAIT_OBJECTS (IIRC 64) limit. Afaict that'd also allow us to read
multiple notifications at once, using GetQueuedCompletionStatusEx().
Interesting. So a callback would run in a system-managed thread, and
that would post a custom message in an IOCP for us to read, kinda like
the fake waitpid() thing? Seems a bit gross and context-switchy but I
agree that the 64 event limit is also terrible.
Medium term that'd also be a small step towards using readiness based APIs in
a few places...
Yeah, that would be cool.
I think we could get the same effect as pgwin32_select() more cheaply
by doing the initial WaitForMultipleObjects() call with the caller's
timeout exactly as we do today, and then, while we have space,
repeatedly calling
WaitForMultipleObjects(handles=&events[last_signaled_event_index + 1],
timeout=0) until it reports timeout.

That would make sense, and should indeed be reasonable cost-wise.
Cool.
I mention this now because I'm not sure whether to consider this an
'open item' for 16, or merely an enhancement for 17. I guess the
former, because someone might call that a new denial of service
vector. On the other hand, if you fill up the listen queue for socket
1 with enough vigour, you're also denying service to socket 1, so I
don't know if it's worth worrying about. Opinions on that?

I'm not sure either. It doesn't strike me as a particularly relevant
bottleneck. And the old approach of doing more work for every single
connection also made many connections worse, I think?
Alright, let's see if anyone else thinks this is worth fixing for 16.
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index f4123e7de7..cc7b572008 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -2025,6 +2025,8 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
*/
cur_event = (WaitEvent *) &set->events[rc - WAIT_OBJECT_0 - 1];
+loop:
+

As far as I can tell, we'll never see WL_LATCH_SET or WL_POSTMASTER_DEATH. I
think it'd look cleaner to move the body of if (cur_event->events & WL_SOCKET_MASK)
into a separate function that we then also can call further down.
We could see them, AFAICS, and I don't see much advantage in making
that assumption? Shouldn't we just shove it in a loop, just like the
other platforms' implementations? Done in this version, which is best
viewed with git show --ignore-all-space.
Seems like we could use returned_events to get nevents in the way you want it,
without adding even more ++/-- to each of the different events?
Yeah. This time I use returned_events. I also fixed a maths failure:
I'd forgotten to use rc - WAIT_OBJECT_0, suggesting that CI never
reached the code.
Attachments:
v2-0001-Teach-WaitEventSetWait-to-report-multiple-events-.patch (application/x-patch)
From 125deab4030d7cf8918df19fb67fc4c8bee27570 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 1 Apr 2023 12:36:12 +1300
Subject: [PATCH v2] Teach WaitEventSetWait to report multiple events on
Windows.
The WaitEventSet implementation on Windows has always reported only one
event at a time, and always the "lowest" in its event array. Since
commit 7389aad6 started using WaitEventSet to handle incoming socket
connections, this unfairness changes the accept priority when using
multiple server sockets. If one of them has a non-empty listen queue
due to incoming connections, the other might never be serviced. The
previous coding based on select() was fair in that way.
Fix, by teaching WaitEventSetWait() to poll for extra events. No change
in behavior in the common case of callers with nevents=1 (for example
the main FEBE socket code), but for the postmaster's main loop we'll
drain all the events that can fit in the output buffer, which is
deliberately set large enough to handle the maximum possible number of
sockets. This brings the Windows behavior in line with Unix.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BA2dk29hr5zRP3HVJQ-_PncNJM6HVQ7aaYLXLRBZU-xw%40mail.gmail.com
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index f4123e7de7..334236aefc 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -1933,14 +1933,10 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
#elif defined(WAIT_USE_WIN32)
/*
- * Wait using Windows' WaitForMultipleObjects().
+ * Wait using Windows' WaitForMultipleObjects(). Each call only "consumes" one
+ * event, so we keep calling until we've filled up our output buffer.
*
- * Unfortunately this will only ever return a single readiness notification at
- * a time. Note that while the official documentation for
- * WaitForMultipleObjects is ambiguous about multiple events being "consumed"
- * with a single bWaitAll = FALSE call,
- * https://blogs.msdn.microsoft.com/oldnewthing/20150409-00/?p=44273 confirms
- * that only one event is "consumed".
+ * https://blogs.msdn.microsoft.com/oldnewthing/20150409-00/?p=44273
*/
static inline int
WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
@@ -2025,106 +2021,145 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
*/
cur_event = (WaitEvent *) &set->events[rc - WAIT_OBJECT_0 - 1];
- occurred_events->pos = cur_event->pos;
- occurred_events->user_data = cur_event->user_data;
- occurred_events->events = 0;
-
- if (cur_event->events == WL_LATCH_SET)
- {
- /*
- * We cannot use set->latch->event to reset the fired event if we
- * aren't waiting on this latch now.
- */
- if (!ResetEvent(set->handles[cur_event->pos + 1]))
- elog(ERROR, "ResetEvent failed: error code %lu", GetLastError());
-
- if (set->latch && set->latch->is_set)
- {
- occurred_events->fd = PGINVALID_SOCKET;
- occurred_events->events = WL_LATCH_SET;
- occurred_events++;
- returned_events++;
- }
- }
- else if (cur_event->events == WL_POSTMASTER_DEATH)
- {
- /*
- * Postmaster apparently died. Since the consequences of falsely
- * returning WL_POSTMASTER_DEATH could be pretty unpleasant, we take
- * the trouble to positively verify this with PostmasterIsAlive(),
- * even though there is no known reason to think that the event could
- * be falsely set on Windows.
- */
- if (!PostmasterIsAliveInternal())
- {
- if (set->exit_on_postmaster_death)
- proc_exit(1);
- occurred_events->fd = PGINVALID_SOCKET;
- occurred_events->events = WL_POSTMASTER_DEATH;
- occurred_events++;
- returned_events++;
- }
- }
- else if (cur_event->events & WL_SOCKET_MASK)
+ for (;;)
{
- WSANETWORKEVENTS resEvents;
- HANDLE handle = set->handles[cur_event->pos + 1];
-
- Assert(cur_event->fd);
+ int next_pos;
+ int count;
- occurred_events->fd = cur_event->fd;
+ occurred_events->pos = cur_event->pos;
+ occurred_events->user_data = cur_event->user_data;
+ occurred_events->events = 0;
- ZeroMemory(&resEvents, sizeof(resEvents));
- if (WSAEnumNetworkEvents(cur_event->fd, handle, &resEvents) != 0)
- elog(ERROR, "failed to enumerate network events: error code %d",
- WSAGetLastError());
- if ((cur_event->events & WL_SOCKET_READABLE) &&
- (resEvents.lNetworkEvents & FD_READ))
+ if (cur_event->events == WL_LATCH_SET)
{
- /* data available in socket */
- occurred_events->events |= WL_SOCKET_READABLE;
-
- /*------
- * WaitForMultipleObjects doesn't guarantee that a read event will
- * be returned if the latch is set at the same time. Even if it
- * did, the caller might drop that event expecting it to reoccur
- * on next call. So, we must force the event to be reset if this
- * WaitEventSet is used again in order to avoid an indefinite
- * hang. Refer https://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx
- * for the behavior of socket events.
- *------
+ /*
+ * We cannot use set->latch->event to reset the fired event if we
+ * aren't waiting on this latch now.
*/
- cur_event->reset = true;
- }
- if ((cur_event->events & WL_SOCKET_WRITEABLE) &&
- (resEvents.lNetworkEvents & FD_WRITE))
- {
- /* writeable */
- occurred_events->events |= WL_SOCKET_WRITEABLE;
- }
- if ((cur_event->events & WL_SOCKET_CONNECTED) &&
- (resEvents.lNetworkEvents & FD_CONNECT))
- {
- /* connected */
- occurred_events->events |= WL_SOCKET_CONNECTED;
+ if (!ResetEvent(set->handles[cur_event->pos + 1]))
+ elog(ERROR, "ResetEvent failed: error code %lu", GetLastError());
+
+ if (set->latch && set->latch->is_set)
+ {
+ occurred_events->fd = PGINVALID_SOCKET;
+ occurred_events->events = WL_LATCH_SET;
+ occurred_events++;
+ returned_events++;
+ }
}
- if ((cur_event->events & WL_SOCKET_ACCEPT) &&
- (resEvents.lNetworkEvents & FD_ACCEPT))
+ else if (cur_event->events == WL_POSTMASTER_DEATH)
{
- /* incoming connection could be accepted */
- occurred_events->events |= WL_SOCKET_ACCEPT;
+ /*
+ * Postmaster apparently died. Since the consequences of falsely
+ * returning WL_POSTMASTER_DEATH could be pretty unpleasant, we
+ * take the trouble to positively verify this with
+ * PostmasterIsAlive(), even though there is no known reason to
+ * think that the event could be falsely set on Windows.
+ */
+ if (!PostmasterIsAliveInternal())
+ {
+ if (set->exit_on_postmaster_death)
+ proc_exit(1);
+ occurred_events->fd = PGINVALID_SOCKET;
+ occurred_events->events = WL_POSTMASTER_DEATH;
+ occurred_events++;
+ returned_events++;
+ }
}
- if (resEvents.lNetworkEvents & FD_CLOSE)
+ else if (cur_event->events & WL_SOCKET_MASK)
{
- /* EOF/error, so signal all caller-requested socket flags */
- occurred_events->events |= (cur_event->events & WL_SOCKET_MASK);
- }
+ WSANETWORKEVENTS resEvents;
+ HANDLE handle = set->handles[cur_event->pos + 1];
- if (occurred_events->events != 0)
- {
- occurred_events++;
- returned_events++;
+ Assert(cur_event->fd);
+
+ occurred_events->fd = cur_event->fd;
+
+ ZeroMemory(&resEvents, sizeof(resEvents));
+ if (WSAEnumNetworkEvents(cur_event->fd, handle, &resEvents) != 0)
+ elog(ERROR, "failed to enumerate network events: error code %d",
+ WSAGetLastError());
+ if ((cur_event->events & WL_SOCKET_READABLE) &&
+ (resEvents.lNetworkEvents & FD_READ))
+ {
+ /* data available in socket */
+ occurred_events->events |= WL_SOCKET_READABLE;
+
+ /*------
+ * WaitForMultipleObjects doesn't guarantee that a read event
+ * will be returned if the latch is set at the same time. Even
+ * if it did, the caller might drop that event expecting it to
+ * reoccur on next call. So, we must force the event to be
+ * reset if this WaitEventSet is used again in order to avoid
+ * an indefinite hang.
+ *
+ * Refer
+ * https://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx
+ * for the behavior of socket events.
+ *------
+ */
+ cur_event->reset = true;
+ }
+ if ((cur_event->events & WL_SOCKET_WRITEABLE) &&
+ (resEvents.lNetworkEvents & FD_WRITE))
+ {
+ /* writeable */
+ occurred_events->events |= WL_SOCKET_WRITEABLE;
+ }
+ if ((cur_event->events & WL_SOCKET_CONNECTED) &&
+ (resEvents.lNetworkEvents & FD_CONNECT))
+ {
+ /* connected */
+ occurred_events->events |= WL_SOCKET_CONNECTED;
+ }
+ if ((cur_event->events & WL_SOCKET_ACCEPT) &&
+ (resEvents.lNetworkEvents & FD_ACCEPT))
+ {
+ /* incoming connection could be accepted */
+ occurred_events->events |= WL_SOCKET_ACCEPT;
+ }
+ if (resEvents.lNetworkEvents & FD_CLOSE)
+ {
+ /* EOF/error, so signal all caller-requested socket flags */
+ occurred_events->events |= (cur_event->events & WL_SOCKET_MASK);
+ }
+
+ if (occurred_events->events != 0)
+ {
+ occurred_events++;
+ returned_events++;
+ }
}
+
+ /* Is the output buffer full? */
+ if (returned_events == nevents)
+ break;
+
+ /* Have we run out of possible events? */
+ next_pos = cur_event->pos + 1;
+ if (next_pos == set->nevents)
+ break;
+
+ /*
+ * Poll the rest of the event handles in the array starting at
+ * next_pos being careful to skip over the initial signal handle too.
+ * This time we use a zero timeout.
+ */
+ count = set->nevents - next_pos;
+ rc = WaitForMultipleObjects(count,
+ set->handles + 1 + next_pos,
+ false,
+ 0);
+
+ /*
+ * We don't distinguish between errors and WAIT_TIMEOUT here because
+ * we already have events to report.
+ */
+ if (rc < WAIT_OBJECT_0 || rc >= WAIT_OBJECT_0 + count)
+ break;
+
+ /* We have another event to decode. */
+ cur_event = &set->events[next_pos + (rc - WAIT_OBJECT_0)];
}
return returned_events;
--
2.40.0
On 3/31/23 11:00 PM, Thomas Munro wrote:
I mention this now because I'm not sure whether to consider this an
'open item' for 16, or merely an enhancement for 17. I guess the
former, because someone might call that a new denial of service
vector. On the other hand, if you fill up the listen queue for socket
1 with enough vigour, you're also denying service to socket 1, so I
don't know if it's worth worrying about. Opinions on that?

I'm not sure either. It doesn't strike me as a particularly relevant
bottleneck. And the old approach of doing more work for every single
connection also made many connections worse, I think?

Alright, let's see if anyone else thinks this is worth fixing for 16.
[RMT hat]
Given this has sat for a bit, I wanted to see if any of your thinking
has changed on whether this should be fixed for v16 or v17. I have
personally not formed an opinion yet, but per the current discussion, it
seems like this could wait?
Thanks,
Jonathan
On Wed, May 17, 2023 at 2:57 AM Jonathan S. Katz <jkatz@postgresql.org> wrote:
On 3/31/23 11:00 PM, Thomas Munro wrote:
I mention this now because I'm not sure whether to consider this an
'open item' for 16, or merely an enhancement for 17. I guess the
former, because someone might call that a new denial of service
vector. On the other hand, if you fill up the listen queue for socket
1 with enough vigour, you're also denying service to socket 1, so I
don't know if it's worth worrying about. Opinions on that?

I'm not sure either. It doesn't strike me as a particularly relevant
bottleneck. And the old approach of doing more work for every single
connection also made many connections worse, I think?

Alright, let's see if anyone else thinks this is worth fixing for 16.
[RMT hat]
Given this has sat for a bit, I wanted to see if any of your thinking
has changed on whether this should be fixed for v16 or v17. I have
personally not formed an opinion yet, but per the current discussion, it
seems like this could wait?
Yeah. No one seems to think this is worth worrying about (please
speak up if you do). I'll go ahead and remove this from the open item
lists now, but I'll leave the patch in the CF for 17, to see if a
Windows hacker/tester thinks it's a worthwhile improvement.
On 5/16/23 4:41 PM, Thomas Munro wrote:
On Wed, May 17, 2023 at 2:57 AM Jonathan S. Katz <jkatz@postgresql.org> wrote:
Given this has sat for a bit, I wanted to see if any of your thinking
has changed on whether this should be fixed for v16 or v17. I have
personally not formed an opinion yet, but per the current discussion, it
seems like this could wait?

Yeah. No one seems to think this is worth worrying about (please
speak up if you do). I'll go ahead and remove this from the open item
lists now, but I'll leave the patch in the CF for 17, to see if a
Windows hacker/tester thinks it's a worthwhile improvement.
[RMT hat, personal opinion]
That seems reasonable to me.
Thanks,
Jonathan
Hi,
On 2023-05-17 08:41:24 +1200, Thomas Munro wrote:
Yeah. No one seems to think this is worth worrying about (please
speak up if you do).
+1 - we have much bigger fish to fry IMO.
Greetings,
Andres Freund
On Sat, April 1, 2023 at 11:00 AM Thomas Munro <thomas.munro@gmail.com> wrote:
Hi,
Thanks for your patch.
I tried to test this patch on Windows. And I did cover the new code path below:
```
+ /* We have another event to decode. */
+ cur_event = &set->events[next_pos + (rc - WAIT_OBJECT_0)];
```
But there is one thing I want to confirm:
In my tests, I think the scenario with two different events (e.g., one socket
connection ending and one incoming socket connection) has been optimized.
However, it seems that when there are multiple incoming socket connections, the
function WaitEventSetWaitBlock is called multiple times instead of being called
once. Is this the expected result?
Here are my test details:
I use the debugger to ensure that multiple events occur when the function
WaitEventSetWaitBlock is called. First, I added a breakpoint in the below code:
```
/*
* Sleep.
*
* Need to wait for ->nevents + 1, because signal handle is in [0].
*/
rc = WaitForMultipleObjects(set->nevents + 1, set->handles, FALSE,
cur_timeout);
```
And then make sure that the postmaster process enters the function
WaitForMultipleObjects. (I think the postmaster process will only return from
the function WaitForMultipleObjects when any object is signaled or a timeout
occurs). Before the timeout occurs, I set up multiple socket connections using
psql (the first connection makes the process return from the function
WaitForMultipleObjects). Then, as I continued debugging, multiple socket
connections were handled by different calls of the function
WaitEventSetWaitBlock.
Please let me know if I am missing something.
Regards,
Wang wei
On Thu, May 18, 2023 at 8:53 PM Wei Wang (Fujitsu)
<wangw.fnst@fujitsu.com> wrote:
On Sat, April 1, 2023 at 11:00 AM Thomas Munro <thomas.munro@gmail.com> wrote:

I tried to test this patch on Windows. And I did cover the new code path below:
```
+ /* We have another event to decode. */
+ cur_event = &set->events[next_pos + (rc - WAIT_OBJECT_0)];
```

But there is one thing I want to confirm:
In my tests, I think the scenario with two different events (e.g., one socket
connection ending and one incoming socket connection) has been optimized.
However, it seems that when there are multiple incoming socket connections, the
function WaitEventSetWaitBlock is called multiple times instead of being called
once. Is this the expected result?
Thanks for testing! Maybe I am misunderstanding something: what I
expect to happen is that we call *WaitForMultipleObjects()* one extra
time (to see if there is another event available immediately, and
usually there is not), but not WaitEventSetWaitBlock().
Here are my test details:
I use the debugger to ensure that multiple events occur when the function
WaitEventSetWaitBlock is called. First, I added a breakpoint in the below code:
```
/*
* Sleep.
*
* Need to wait for ->nevents + 1, because signal handle is in [0].
*/
rc = WaitForMultipleObjects(set->nevents + 1, set->handles, FALSE,
cur_timeout);
```
And then make sure that the postmaster process enters the function
WaitForMultipleObjects. (I think the postmaster process will only return from
the function WaitForMultipleObjects when any object is signaled or a timeout
occurs). Before the timeout occurs, I set up multiple socket connections using
psql (the first connection makes the process return from the function
WaitForMultipleObjects). Then, as I continued debugging, multiple socket
connections were handled by different calls of the function
WaitEventSetWaitBlock.
This is a good test. Also, to test the exact scenario I was worrying
about, you could initiate 4 psql sessions while the server is blocked
in a debugger, 2 on a Unix domain socket, and 2 on a TCP socket, and
then when you "continue" the server with a break on (say) accept() so
that you can "continue" each time, you should see that it alternates
between the two sockets, accepting from both "fairly" instead of
draining one socket entirely first.
Please let me know if I am missing something.
I think your report already shows that it basically works correctly.
Do you think it's worth committing for 17 to make it work more like
Unix, or was I just being paranoid?