pgbench stopped supporting large number of client connections on Windows

Started by Marina Polyakovaabout 5 years ago11 messages
#1Marina Polyakova
m.polyakova@postgrespro.ru
2 attachment(s)

Hello, hackers!

While trying to test a patch that adds a synchronization barrier in
pgbench [1]/messages/by-id/20200227180100.zyvjwzcpiokfsqm2@alap3.anarazel.de on Windows, I found that since the commit "Use ppoll(2), if
available, to wait for input in pgbench." [2]https://github.com/postgres/postgres/commit/60e612b602999e670f2d57a01e52799eaa903ca9 I cannot use a large
number of client connections in pgbench on my Windows virtual machines
(Windows Server 2008 R2 and Windows 2019), for example:

bin\pgbench.exe -c 90 -S -T 3 postgres

starting vacuum...end.
too many client connections for select()

The almost same thing happens with reindexdb and vacuumdb (build on
commit [3]https://github.com/postgres/postgres/commit/48e1291342dd7771cf8c67aa1d7ec1f394b95dd8):

bin\reindexdb.exe -j 95 postgres

reindexdb: fatal: too many jobs for this platform -- try 90

bin\vacuumdb.exe -j 95 postgres

vacuumdb: vacuuming database "postgres"
vacuumdb: fatal: too many jobs for this platform -- try 90

IIUC the checks below are not correct on Windows, since on this system
sockets can have values equal to or greater than FD_SETSIZE (see Windows
documentation [4]From https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select : Internally, socket handles in an fd_set structure are not represented as bit flags as in Berkeley Unix. Their data representation is opaque. and pgbench debug output in attached
pgbench_debug.txt).

src/bin/pgbench/pgbench.c, the function add_socket_to_set:
if (fd < 0 || fd >= FD_SETSIZE)
{
/*
* Doing a hard exit here is a bit grotty, but it doesn't seem worth
* complicating the API to make it less grotty.
*/
pg_log_fatal("too many client connections for select()");
exit(1);
}

src/bin/scripts/scripts_parallel.c, the function ParallelSlotsSetup:
/*
* Fail and exit immediately if trying to use a socket in an
* unsupported range.  POSIX requires open(2) to use the lowest
* unused file descriptor and the hint given relies on that.
*/
if (PQsocket(conn) >= FD_SETSIZE)
{
pg_log_fatal("too many jobs for this platform -- try %d", i);
exit(1);
}

I tried to fix this, see attached fix_max_client_conn_on_Windows.patch
(based on commit [3]https://github.com/postgres/postgres/commit/48e1291342dd7771cf8c67aa1d7ec1f394b95dd8). I checked it for reindexdb and vacuumdb, and it
works for simple databases (1025 jobs are not allowed and 1024 jobs is
ok). Unfortunately, pgbench was getting connection errors when it tried
to use 1000 jobs on my virtual machines, although there were no errors
for fewer jobs (500) and the same number of clients (1000)...

Any suggestions are welcome!

[1]: /messages/by-id/20200227180100.zyvjwzcpiokfsqm2@alap3.anarazel.de
/messages/by-id/20200227180100.zyvjwzcpiokfsqm2@alap3.anarazel.de
[2]: https://github.com/postgres/postgres/commit/60e612b602999e670f2d57a01e52799eaa903ca9
https://github.com/postgres/postgres/commit/60e612b602999e670f2d57a01e52799eaa903ca9
[3]: https://github.com/postgres/postgres/commit/48e1291342dd7771cf8c67aa1d7ec1f394b95dd8
https://github.com/postgres/postgres/commit/48e1291342dd7771cf8c67aa1d7ec1f394b95dd8
[4]: From https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select : Internally, socket handles in an fd_set structure are not represented as bit flags as in Berkeley Unix. Their data representation is opaque.
https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select
:
Internally, socket handles in an fd_set structure are not represented as
bit flags as in Berkeley Unix. Their data representation is opaque.

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

fix_max_client_conn_on_Windows.patchtext/x-diff; name=fix_max_client_conn_on_Windows.patchDownload
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 3057665bbec567331ad5ea03d31af707f5e91b4c..7a54638db191982d538cabf007d82715fa254b6a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -62,6 +62,7 @@
 #include "common/string.h"
 #include "fe_utils/cancel.h"
 #include "fe_utils/conditional.h"
+#include "fe_utils/socket_utils.h"
 #include "getopt_long.h"
 #include "libpq-fe.h"
 #include "pgbench.h"
@@ -6698,7 +6699,7 @@ clear_socket_set(socket_set *sa)
 static void
 add_socket_to_set(socket_set *sa, int fd, int idx)
 {
-	if (fd < 0 || fd >= FD_SETSIZE)
+	if (fd < 0 || !check_fd_set_size(fd, &sa->fds))
 	{
 		/*
 		 * Doing a hard exit here is a bit grotty, but it doesn't seem worth
diff --git a/src/bin/scripts/scripts_parallel.c b/src/bin/scripts/scripts_parallel.c
index ec264a269a7d7a506c347112af6be183b2aec9ce..61977741c7c5f76b1966ab8e59792a1d2b53b934 100644
--- a/src/bin/scripts/scripts_parallel.c
+++ b/src/bin/scripts/scripts_parallel.c
@@ -25,6 +25,7 @@
 #include "common.h"
 #include "common/logging.h"
 #include "fe_utils/cancel.h"
+#include "fe_utils/socket_utils.h"
 #include "scripts_parallel.h"
 
 static void init_slot(ParallelSlot *slot, PGconn *conn);
@@ -144,6 +145,16 @@ ParallelSlotsGetIdle(ParallelSlot *slots, int numslots)
 			if (sock < 0)
 				continue;
 
+			/*
+			 * Fail and exit immediately if trying to use a socket in an
+			 * unsupported range.
+			 */
+			if (!check_fd_set_size(sock, &slotset))
+			{
+				pg_log_fatal("too many jobs for this platform -- try %d", i);
+				exit(1);
+			}
+
 			FD_SET(sock, &slotset);
 			if (sock > maxFd)
 				maxFd = sock;
@@ -221,18 +232,6 @@ ParallelSlotsSetup(const ConnParams *cparams,
 		for (i = 1; i < numslots; i++)
 		{
 			conn = connectDatabase(cparams, progname, echo, false, true);
-
-			/*
-			 * Fail and exit immediately if trying to use a socket in an
-			 * unsupported range.  POSIX requires open(2) to use the lowest
-			 * unused file descriptor and the hint given relies on that.
-			 */
-			if (PQsocket(conn) >= FD_SETSIZE)
-			{
-				pg_log_fatal("too many jobs for this platform -- try %d", i);
-				exit(1);
-			}
-
 			init_slot(slots + i, conn);
 		}
 	}
diff --git a/src/include/fe_utils/socket_utils.h b/src/include/fe_utils/socket_utils.h
new file mode 100644
index 0000000000000000000000000000000000000000..31f00c869631a8845823d99c74045557d6272e19
--- /dev/null
+++ b/src/include/fe_utils/socket_utils.h
@@ -0,0 +1,38 @@
+/*-------------------------------------------------------------------------
+ *
+ * Socket-processing utility routines for frontend code
+ *
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * src/include/fe_utils/socket_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SOCKET_UTILS_H
+#define SOCKET_UTILS_H
+
+/*
+ * Return true if the file descriptor can be added to the set with the size
+ * FD_SETSIZE and false otherwise.
+ */
+static inline bool
+check_fd_set_size(int fd, const fd_set *fds)
+{
+#ifdef WIN32
+	/*
+	 * We cannot check the socket value at runtime because on Windows it can be
+	 * greater than or equal to FD_SETSIZE.
+	 */
+	Assert(fds != NULL);
+	return (fds->fd_count < FD_SETSIZE);
+#else							/* !WIN32 */
+	/*
+	 * POSIX: the behavior of the macro FD_SET is undefined if the fd argument
+	 * is less than 0 or greater than or equal to FD_SETSIZE.
+	 */
+	return (fd < FD_SETSIZE);
+#endif							/* !WIN32 */
+}
+
+#endif							/* SOCKET_UTILS_H */
pgbench_debug.txttext/plain; name=pgbench_debug.txtDownload
#2Ranier Vilela
ranier.vf@gmail.com
In reply to: Marina Polyakova (#1)
re: pgbench stopped supporting large number of client connections on Windows

Hi Marina,
Nice catch.

rc/bin/pgbench/pgbench.c, the function add_socket_to_set:
if (fd < 0 || fd >= FD_SETSIZE)
{
/*
* Doing a hard exit here is a bit grotty, but it doesn't seem worth
* complicating the API to make it less grotty.
*/
pg_log_fatal("too many client connections for select()");
exit(1);
}

It seems to me that the limit is hardcode in,
src/backend/port/win32/socket.c

FD_SETSIZE * 2

that would be 2048?

regards,

Ranier Vilela

#3Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Marina Polyakova (#1)
Re: pgbench stopped supporting large number of client connections on Windows

Hello Marina,

While trying to test a patch that adds a synchronization barrier in pgbench
[1] on Windows,

Thanks for trying that, I do not have a windows setup for testing, and the
sync code I wrote for Windows is basically blind coding:-(

I found that since the commit "Use ppoll(2), if available, to
wait for input in pgbench." [2] I cannot use a large number of client
connections in pgbench on my Windows virtual machines (Windows Server 2008 R2
and Windows 2019), for example:

bin\pgbench.exe -c 90 -S -T 3 postgres

starting vacuum...end.

ISTM that 1 thread with 90 clients is a bad idea, see below.

The almost same thing happens with reindexdb and vacuumdb (build on
commit [3]):

Windows fd implementation is somehow buggy because it does not return the
smallest number available, and then with the assumption that select uses a
dense array indexed with them (true on linux, less so on Windows which
probably uses a sparse array), so that the number gets over the limit,
even if less are actually used, hence the catch, as you noted.

Another point is windows has a hardcoded number of objects one thread can
really wait for, typically 64, so that waiting for more requires actually
forking threads to do the waiting. But if you are ready to fork threads
just to wait, then probaly you could have started pgbench with more
threads in the first place. Now it would probably not make the problem go
away because fd numbers would be per process, not per thread, but it
really suggests that one should not load a thread is more than 64 clients.

IIUC the checks below are not correct on Windows, since on this system
sockets can have values equal to or greater than FD_SETSIZE (see Windows
documentation [4] and pgbench debug output in attached pgbench_debug.txt).

Okay.

But then, how may one detect that there are too many fds in the set?

I think that an earlier version of the code needed to make assumptions
about the internal implementation of windows (there is a counter somewhere
in windows fd_set struct), which was rejected because if was breaking the
interface. Now your patch is basically resurrecting that. Why not if there
is no other solution, but this is quite depressing, and because it breaks
the interface it would be broken if windows changed its internals for some
reason:-(

Doesn't windows has "ppoll"? Should we implement the stuff above windows
polling capabilities and coldly skip its failed posix portability
attempts? This raises again the issue that you should not have more that
64 clients per thread anyway, because it is an intrinsic limit on windows.

I think that at one point it was suggested to error or warn if
nclients/nthreads is too great, but that was not kept in the end.

I tried to fix this, see attached fix_max_client_conn_on_Windows.patch (based
on commit [3]). I checked it for reindexdb and vacuumdb, and it works for
simple databases (1025 jobs are not allowed and 1024 jobs is ok).
Unfortunately, pgbench was getting connection errors when it tried to use
1000 jobs on my virtual machines, although there were no errors for fewer
jobs (500) and the same number of clients (1000)...

It seems that the max number of threads you can start depends on available
memory, because each thread is given its own stack, so it would depend on
your vm settings?

Any suggestions are welcome!

Use ppoll, and start more threads but not too many?

--
Fabien.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fabien COELHO (#3)
Re: pgbench stopped supporting large number of client connections on Windows

Fabien COELHO <coelho@cri.ensmp.fr> writes:

Any suggestions are welcome!

Use ppoll, and start more threads but not too many?

Does ppoll exist on Windows?

There was a prior thread on this topic, which seems to have drifted off
into the sunset:

/messages/by-id/BL0PR1901MB1985F219C46C61EDE7036C34ED8E0@BL0PR1901MB1985.namprd19.prod.outlook.com

regards, tom lane

#5Marina Polyakova
m.polyakova@postgrespro.ru
In reply to: Ranier Vilela (#2)
Re: pgbench stopped supporting large number of client connections on Windows

On 2020-11-06 23:54, Ranier Vilela wrote:

Hi Marina,

Hello!

Nice catch.

Thank you!

rc/bin/pgbench/pgbench.c, the function add_socket_to_set:
if (fd < 0 || fd >= FD_SETSIZE)
{
/*
* Doing a hard exit here is a bit grotty, but it doesn't seem worth
* complicating the API to make it less grotty.
*/
pg_log_fatal("too many client connections for select()");
exit(1);
}

It seems to me that the limit is hardcode in,
src/backend/port/win32/socket.c

FD_SETSIZE * 2

that would be 2048?

1) If you mean the function pgwin32_select in the file
src/backend/port/win32/socket.c, IIUC it is only used in the backend,
see src/include/port/win32_port.h:

#ifndef FRONTEND
<...>
#define select(n, r, w, e, timeout) pgwin32_select(n, r, w, e, timeout)
<...>
#endif /* FRONTEND */

2) It looks like FD_SETSIZE does not set a limit on the socket value on
Windows, see
https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2
:

The maximum number of sockets that a Windows Sockets application can use
is not affected by the manifest constant FD_SETSIZE. This value defined
in the Winsock2.h header file is used in constructing the FD_SET
structures used with select function.

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#6Ranier Vilela
ranier.vf@gmail.com
In reply to: Marina Polyakova (#5)
Re: pgbench stopped supporting large number of client connections on Windows

Em sáb., 7 de nov. de 2020 às 14:55, Marina Polyakova <
m.polyakova@postgrespro.ru> escreveu:

On 2020-11-06 23:54, Ranier Vilela wrote:

Hi Marina,

Hello!

1) If you mean the function pgwin32_select in the file
src/backend/port/win32/socket.c, IIUC it is only used in the backend,
see src/include/port/win32_port.h:

#ifndef FRONTEND
<...>
#define select(n, r, w, e, timeout) pgwin32_select(n, r, w, e, timeout)
<...>
#endif /* FRONTEND */

Yes. My mistake, you right here.

2) It looks like FD_SETSIZE does not set a limit on the socket value on
Windows, see

https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2
:

The maximum number of sockets that a Windows Sockets application can use
is not affected by the manifest constant FD_SETSIZE. This value defined
in the Winsock2.h header file is used in constructing the FD_SET
structures used with select function.

Correct.
It seems that the limit will be defined by compilation, before the
inclusion of Winsock2.h.
Have you tried to define -DFD_SETSIZE=2048

best regards,
Ranier Vilela

#7Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Tom Lane (#4)
Re: pgbench stopped supporting large number of client connections on Windows

Hello Tom,

Use ppoll, and start more threads but not too many?

Does ppoll exist on Windows?

Some g*gling suggest that the answer is no.

There was a prior thread on this topic, which seems to have drifted off
into the sunset:

Indeed. I may have contributed to this dwindling by not adding a CF entry
for this thread, so that there was no reminder anywhere.

/messages/by-id/BL0PR1901MB1985F219C46C61EDE7036C34ED8E0@BL0PR1901MB1985.namprd19.prod.outlook.com

It seems that there is no simple good solution around windows wait event
implementations:
- timeout seems to be milliseconds on all things I saw
- 64 is an intrinsic limit, probably because of the underlying nᅵ implementations

Maybe we could provide a specific windows implementation limited to 64 fd
(which is not a bad thing bench-wise) but with a rounded-down timeout, so
that it could end-up on an activate spinning wait in some cases, which is
probably not a bug issue, all things considered.

--
Fabien.

#8Marina Polyakova
m.polyakova@postgrespro.ru
In reply to: Fabien COELHO (#3)
1 attachment(s)
Re: pgbench stopped supporting large number of client connections on Windows

On 2020-11-07 01:01, Fabien COELHO wrote:

Hello Marina,

Hello, Fabien!

Thank you for your comments!

While trying to test a patch that adds a synchronization barrier in
pgbench [1] on Windows,

Thanks for trying that, I do not have a windows setup for testing, and
the sync code I wrote for Windows is basically blind coding:-(

FYI:

1) It looks like pgbench will no longer support Windows XP due to the
function DeleteSynchronizationBarrier. From
https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-deletesynchronizationbarrier
:

Minimum supported client: Windows 8 [desktop apps only]
Minimum supported server: Windows Server 2012 [desktop apps only]

On Windows Server 2008 R2 (MSVC 2013) the 6-th version of the patch [1]/messages/by-id/alpine.DEB.2.22.394.2011021726390.989361@pseudo
has compiled without (new) warnings, but when running pgbench I got the
following error:

The procedure entry point DeleteSynchronizationBarrier could not be
located in the dynamic link library KERNEL32.dll.

2) On Windows Server 2019 (MSVC 2019) the 6-th version of the patch [1]/messages/by-id/alpine.DEB.2.22.394.2011021726390.989361@pseudo
with fix_max_client_conn_on_Windows.patch has compiled without (new)
warnings. I made a few runs (-M prepared -c 100 -j 10 -T 10 -P1 -S) with
and without your patches. On Linux (-M prepared -c 1000 -j 500 -T 10 -P1
-S) your patches fix problems with progress reports as in [2]/messages/by-id/20200227185129.hikscyenomnlrord@alap3.anarazel.de, but on
Windows I did not notice such changes, see attached
pgbench_runs_linux_vs_windows.zip.

The almost same thing happens with reindexdb and vacuumdb (build on
commit [3]):

Windows fd implementation is somehow buggy because it does not return
the smallest number available, and then with the assumption that
select uses a dense array indexed with them (true on linux, less so on
Windows which probably uses a sparse array), so that the number gets
over the limit, even if less are actually used, hence the catch, as
you noted.

I agree with you. It looks like the structure fd_set just contains used
sockets by this application on Windows, and the macro FD_SETSIZE is used
only here.

From
https://docs.microsoft.com/en-us/windows/win32/api/winsock/ns-winsock-fd_set
:

typedef struct fd_set {
u_int fd_count;
SOCKET fd_array[FD_SETSIZE];
} fd_set, FD_SET, *PFD_SET, *LPFD_SET;

From
https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2
:

The maximum number of sockets that a Windows Sockets application can use
is not affected by the manifest constant FD_SETSIZE. This value defined
in the Winsock2.h header file is used in constructing the FD_SET
structures used with select function.

IIUC the checks below are not correct on Windows, since on this system
sockets can have values equal to or greater than FD_SETSIZE (see
Windows documentation [4] and pgbench debug output in attached
pgbench_debug.txt).

Okay.

But then, how may one detect that there are too many fds in the set?

I think that an earlier version of the code needed to make assumptions
about the internal implementation of windows (there is a counter
somewhere in windows fd_set struct), which was rejected because if was
breaking the interface. Now your patch is basically resurrecting that.

I tried to keep the behaviour "we check if the socket value can be used
in select() at runtime", but now I will also read that thread...

Why not if there is no other solution, but this is quite depressing,
and because it breaks the interface it would be broken if windows
changed its internals for some reason:-(

It looks like if the internals of the structure fd_set are changed, we
will also have problems with the function pgwin32_select from
src/backend/port/win32/socket.c, because it uses fd_set.fd_count too?..

(I'm writing responses to the rest of your comments but it takes
time...)

[1]: /messages/by-id/alpine.DEB.2.22.394.2011021726390.989361@pseudo
/messages/by-id/alpine.DEB.2.22.394.2011021726390.989361@pseudo
[2]: /messages/by-id/20200227185129.hikscyenomnlrord@alap3.anarazel.de
/messages/by-id/20200227185129.hikscyenomnlrord@alap3.anarazel.de

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

pgbench_runs_linux_vs_windows.zipapplication/zip; name=pgbench_runs_linux_vs_windows.zipDownload
#9Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Marina Polyakova (#8)
Re: pgbench stopped supporting large number of client connections on Windows

Hi Marina,

On Sun, Nov 8, 2020 at 11:59 PM Marina Polyakova
<m.polyakova@postgrespro.ru> wrote:

On 2020-11-07 01:01, Fabien COELHO wrote:

Hello Marina,

Hello, Fabien!

Thank you for your comments!

While trying to test a patch that adds a synchronization barrier in
pgbench [1] on Windows,

Thanks for trying that, I do not have a windows setup for testing, and
the sync code I wrote for Windows is basically blind coding:-(

FYI:

1) It looks like pgbench will no longer support Windows XP due to the
function DeleteSynchronizationBarrier. From
https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-deletesynchronizationbarrier
:

Minimum supported client: Windows 8 [desktop apps only]
Minimum supported server: Windows Server 2012 [desktop apps only]

On Windows Server 2008 R2 (MSVC 2013) the 6-th version of the patch [1]
has compiled without (new) warnings, but when running pgbench I got the
following error:

The procedure entry point DeleteSynchronizationBarrier could not be
located in the dynamic link library KERNEL32.dll.

2) On Windows Server 2019 (MSVC 2019) the 6-th version of the patch [1]
with fix_max_client_conn_on_Windows.patch has compiled without (new)
warnings. I made a few runs (-M prepared -c 100 -j 10 -T 10 -P1 -S) with
and without your patches. On Linux (-M prepared -c 1000 -j 500 -T 10 -P1
-S) your patches fix problems with progress reports as in [2], but on
Windows I did not notice such changes, see attached
pgbench_runs_linux_vs_windows.zip.

The almost same thing happens with reindexdb and vacuumdb (build on
commit [3]):

Windows fd implementation is somehow buggy because it does not return
the smallest number available, and then with the assumption that
select uses a dense array indexed with them (true on linux, less so on
Windows which probably uses a sparse array), so that the number gets
over the limit, even if less are actually used, hence the catch, as
you noted.

I agree with you. It looks like the structure fd_set just contains used
sockets by this application on Windows, and the macro FD_SETSIZE is used
only here.

From
https://docs.microsoft.com/en-us/windows/win32/api/winsock/ns-winsock-fd_set
:

typedef struct fd_set {
u_int fd_count;
SOCKET fd_array[FD_SETSIZE];
} fd_set, FD_SET, *PFD_SET, *LPFD_SET;

From
https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2
:

The maximum number of sockets that a Windows Sockets application can use
is not affected by the manifest constant FD_SETSIZE. This value defined
in the Winsock2.h header file is used in constructing the FD_SET
structures used with select function.

IIUC the checks below are not correct on Windows, since on this system
sockets can have values equal to or greater than FD_SETSIZE (see
Windows documentation [4] and pgbench debug output in attached
pgbench_debug.txt).

Okay.

But then, how may one detect that there are too many fds in the set?

I think that an earlier version of the code needed to make assumptions
about the internal implementation of windows (there is a counter
somewhere in windows fd_set struct), which was rejected because if was
breaking the interface. Now your patch is basically resurrecting that.

I tried to keep the behaviour "we check if the socket value can be used
in select() at runtime", but now I will also read that thread...

Why not if there is no other solution, but this is quite depressing,
and because it breaks the interface it would be broken if windows
changed its internals for some reason:-(

It looks like if the internals of the structure fd_set are changed, we
will also have problems with the function pgwin32_select from
src/backend/port/win32/socket.c, because it uses fd_set.fd_count too?..

(I'm writing responses to the rest of your comments but it takes
time...)

This patch on Commitfest has been "Waiting on Author" for almost 2
months. Could you share the current status? Are you updating the
patch?

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

#10Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#9)
Re: pgbench stopped supporting large number of client connections on Windows

Hi,

On Thu, Jan 7, 2021 at 10:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi Marina,

On Sun, Nov 8, 2020 at 11:59 PM Marina Polyakova
<m.polyakova@postgrespro.ru> wrote:

On 2020-11-07 01:01, Fabien COELHO wrote:

Hello Marina,

Hello, Fabien!

Thank you for your comments!

While trying to test a patch that adds a synchronization barrier in
pgbench [1] on Windows,

Thanks for trying that, I do not have a windows setup for testing, and
the sync code I wrote for Windows is basically blind coding:-(

FYI:

1) It looks like pgbench will no longer support Windows XP due to the
function DeleteSynchronizationBarrier. From
https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-deletesynchronizationbarrier
:

Minimum supported client: Windows 8 [desktop apps only]
Minimum supported server: Windows Server 2012 [desktop apps only]

On Windows Server 2008 R2 (MSVC 2013) the 6-th version of the patch [1]
has compiled without (new) warnings, but when running pgbench I got the
following error:

The procedure entry point DeleteSynchronizationBarrier could not be
located in the dynamic link library KERNEL32.dll.

2) On Windows Server 2019 (MSVC 2019) the 6-th version of the patch [1]
with fix_max_client_conn_on_Windows.patch has compiled without (new)
warnings. I made a few runs (-M prepared -c 100 -j 10 -T 10 -P1 -S) with
and without your patches. On Linux (-M prepared -c 1000 -j 500 -T 10 -P1
-S) your patches fix problems with progress reports as in [2], but on
Windows I did not notice such changes, see attached
pgbench_runs_linux_vs_windows.zip.

The almost same thing happens with reindexdb and vacuumdb (build on
commit [3]):

Windows fd implementation is somehow buggy because it does not return
the smallest number available, and then with the assumption that
select uses a dense array indexed with them (true on linux, less so on
Windows which probably uses a sparse array), so that the number gets
over the limit, even if less are actually used, hence the catch, as
you noted.

I agree with you. It looks like the structure fd_set just contains used
sockets by this application on Windows, and the macro FD_SETSIZE is used
only here.

From
https://docs.microsoft.com/en-us/windows/win32/api/winsock/ns-winsock-fd_set
:

typedef struct fd_set {
u_int fd_count;
SOCKET fd_array[FD_SETSIZE];
} fd_set, FD_SET, *PFD_SET, *LPFD_SET;

From
https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2
:

The maximum number of sockets that a Windows Sockets application can use
is not affected by the manifest constant FD_SETSIZE. This value defined
in the Winsock2.h header file is used in constructing the FD_SET
structures used with select function.

IIUC the checks below are not correct on Windows, since on this system
sockets can have values equal to or greater than FD_SETSIZE (see
Windows documentation [4] and pgbench debug output in attached
pgbench_debug.txt).

Okay.

But then, how may one detect that there are too many fds in the set?

I think that an earlier version of the code needed to make assumptions
about the internal implementation of windows (there is a counter
somewhere in windows fd_set struct), which was rejected because if was
breaking the interface. Now your patch is basically resurrecting that.

I tried to keep the behaviour "we check if the socket value can be used
in select() at runtime", but now I will also read that thread...

Why not if there is no other solution, but this is quite depressing,
and because it breaks the interface it would be broken if windows
changed its internals for some reason:-(

It looks like if the internals of the structure fd_set are changed, we
will also have problems with the function pgwin32_select from
src/backend/port/win32/socket.c, because it uses fd_set.fd_count too?..

(I'm writing responses to the rest of your comments but it takes
time...)

This patch on Commitfest has been "Waiting on Author" for almost 2
months. Could you share the current status? Are you updating the
patch?

Status update for a commitfest entry.

Since this is a bug fix, I've moved this patch to the next commitfest.
I think if this patch is still inactive until the feature freeze we
can remove this entry by setting it to "Returned with Feedback".

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#11David Steele
david@pgmasters.net
In reply to: Masahiko Sawada (#10)
Re: pgbench stopped supporting large number of client connections on Windows

On 2/1/21 8:18 AM, Masahiko Sawada wrote:

This patch on Commitfest has been "Waiting on Author" for almost 2
months. Could you share the current status? Are you updating the
patch?

Status update for a commitfest entry.

Since this is a bug fix, I've moved this patch to the next commitfest.
I think if this patch is still inactive until the feature freeze we
can remove this entry by setting it to "Returned with Feedback".

If no further work is planned on this patch then we should close it. I
plan to do that on April 1 if there are no objections.

It's listed as a bug but seems like a limitation that could just be
documented.

Regards,
--
-David
david@pgmasters.net