Current Win32 port status

Started by Claudio Natoli about 22 years ago, 10 messages
#1Claudio Natoli
claudio.natoli@memetrics.com

Hi all,

just a small note to anyone who is interested in the status of this port.

Firstly, the fork/exec changes are coming along well. The first patch, for
fork/exec'ing of backends has been accepted and applied. A second patch, for
fork/exec'ing of the remainder of the postgres process has just been
submitted for review. Two patches (which are essentially already completed)
will follow this; the first to allow some further rearrangement of the
postmaster fork/execs in preparation for the Win32 CreateProcess calls, and
the final patch will place the actual CreateProcess calls into the
code-base.

It is reasonable to expect that we will have these changes in place within a
few weeks.

At that point, we will be within striking distance of a Win32 port. The
"only" remaining barriers to a running, albeit "imperfect", implementation
are:
* signals (non-trivial, to say the least, but encouraging to see
discussion occurring in this regard),
* a workable pipe replacement
* possible bootstrap issue between semaphores + shared memory

[these, and other remaining issues to "perfect" the port, are listed in
greater detail on
http://momjian.postgresql.org/main/writings/pgsql/win32.html]

FWIW, having kludged up some quick + dirty workarounds to the above points,
I have actually had postgres running natively on my Win2K box, which I trust
is an encouraging sign (to say the least) to anyone hanging out for this
port...

Merry Christmas all,
Claudio

--- 
Certain disclaimers and policies apply to all email sent from Memetrics.
For the full text of these disclaimers and policies see
http://www.memetrics.com/emailpolicy.html
#2Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Claudio Natoli (#1)
Re: Current Win32 port status

Claudio Natoli wrote:

Hi all,

just a small note to anyone who is interested in the status of this port.

Firstly, the fork/exec changes are coming along well. The first patch, for
fork/exec'ing of backends has been accepted and applied. A second patch, for
fork/exec'ing of the remainder of the postgres process has just been
submitted for review. Two patches (which are essentially already completed)
will follow this; the first to allow some further rearrangement of the
postmaster fork/execs in preparation for the Win32 CreateProcess calls, and
the final patch will place the actual CreateProcess calls into the
code-base.

It is reasonable to expect that we will have these changes in place within a
few weeks.

At that point, we will be within striking distance of a Win32 port. The
"only" remaining barriers to a running, albeit "imperfect", implementation
are:
* signals (non-trivial, to say the least, but encouraging to see
discussion occurring in this regard),
* a workable pipe replacement

I don't have 'pipe' mentioned on the win32 patch. Can you give details?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#3Claudio Natoli
claudio.natoli@memetrics.com
In reply to: Bruce Momjian (#2)
Re: Current Win32 port status

Bruce Momjian wrote:

* a workable pipe replacement

I don't have 'pipe' mentioned on the win32 patch. Can you
give details?

Yeah you do. The second point under "Problems with select()".

Basically, the Win32 call to pipe() returns a file descriptor which is
invalid to pass on to Win32 select() (as it only takes socket handles).

So, we need to replace the select'ing mechanism under Win32 (yech), or write
a Win32 pipe() replacement that returns two socket endpoints (good enough
for our purposes), or something else...
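
A minimal sketch of the second option (a pipe() replacement that hands back two connected socket endpoints), written against the BSD socket API so it can be compiled on a Unix box. The names socket_pipe() and fd_is_readable() are hypothetical and do not come from any patch in this thread. A Win32 build would additionally need WSAStartup(), SOCKET handles instead of int descriptors, and closesocket() instead of close(); those parts are assumed, not shown.

```c
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fill fds[0] (read end) and fds[1] (write end) with
 * two connected TCP sockets on the loopback interface.
 * Returns 0 on success, -1 on failure.
 */
int
socket_pipe(int fds[2])
{
	int			listener;
	int			reader;
	int			writer = -1;
	struct sockaddr_in addr;
	socklen_t	len = sizeof(addr);

	listener = socket(AF_INET, SOCK_STREAM, 0);
	if (listener < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;			/* let the kernel pick a free port */

	/* Bind, listen, then read back the port the kernel assigned. */
	if (bind(listener, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
		listen(listener, 1) < 0 ||
		getsockname(listener, (struct sockaddr *) &addr, &len) < 0)
		goto fail;

	/* Connect the "write end" to our own listener ... */
	writer = socket(AF_INET, SOCK_STREAM, 0);
	if (writer < 0 ||
		connect(writer, (struct sockaddr *) &addr, sizeof(addr)) < 0)
		goto fail;

	/* ... and accept the matching "read end". */
	reader = accept(listener, NULL, NULL);
	if (reader < 0)
		goto fail;

	close(listener);
	fds[0] = reader;
	fds[1] = writer;
	return 0;

fail:
	if (writer >= 0)
		close(writer);
	close(listener);
	return -1;
}

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
int
fd_is_readable(int fd, int timeout_ms)
{
	fd_set		rfds;
	struct timeval tv;

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;
	return select(fd + 1, &rfds, NULL, NULL, &tv);
}
```

Because both ends are real sockets, the read end can be passed to select() like any other socket. The sketch ignores one caveat: between listen() and connect(), another local process could connect to the port first, so a hardened version would want to verify the peer.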

Cheers,
Claudio

#4Magnus Hagander
mha@sollentuna.net
In reply to: Claudio Natoli (#3)
Re: [HACKERS] Current Win32 port status

Bruce Momjian wrote:

* a workable pipe replacement

I don't have 'pipe' mentioned on the win32 patch. Can you
give details?

Yeah you do. The second point under "Problems with select()".

Basically, the Win32 call to pipe() returns a file descriptor
which is invalid to pass on to Win32 select() (as it only
takes socket handles).

So, we need to replace the select'ing mechanism under Win32
(yech), or write a Win32 pipe() replacement that returns two
socket endpoints (good enough for our purposes), or something else...

I think you want to be investigating
WSAEventSelect() and then WaitForMultipleObjectsEx().

WSAEventSelect() claims it needs a WSAEVENT, but according to docs
elsewhere it should accept a standard event handle on NT+ platforms.

WaitForMultiple... will accept pipes, events, anything. (The Ex function
will also allow dispatching of user APCs, see related discussion about
signals)

//Magnus

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#4)
Re: [HACKERS] Current Win32 port status

Magnus Hagander wrote:

Bruce Momjian wrote:

* a workable pipe replacement

I don't have 'pipe' mentioned on the win32 patch. Can you
give details?

Yeah you do. The second point under "Problems with select()".

Basically, the Win32 call to pipe() returns a file descriptor
which is invalid to pass on to Win32 select() (as it only
takes socket handles).

So, we need to replace the select'ing mechanism under Win32
(yech), or write a Win32 pipe() replacement that returns two
socket endpoints (good enough for our purposes), or something else...

I think you want to be investigating
WSAEventSelect() and then WaitForMultipleObjectsEx().

WSAEventSelect() claims it needs a WSAEVENT, but according to docs
elsewhere it should accept a standard event handle on NT+ platforms.

WaitForMultiple... will accept pipes, events, anything. (The Ex function
will also allow dispatching of user APCs, see related discussion about
signals)

Using a socket or a pair of sockets is a very common practice in porting
this sort of code from Unix to Windows. IIRC this is what Cygwin does
under the hood.

That would help to preserve the programming paradigms already in use in
Postgres. If it proves to be a performance bottleneck then it could be
revisited, but it seems unlikely.

Tom (rightly) admonished me not long ago that we do not need to use
every last part of the Unix API in our code. The same goes a fortiori
for the Windows API, IMNSHO. Minimal disturbance and acceptable
performance should be the initial goals.

cheers

andrew

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#5)
Re: [HACKERS] Current Win32 port status

Andrew Dunstan <andrew@dunslane.net> writes:

Using a socket or a pair of sockets is a very common practice in porting
this sort of code from Unix to Windows. IIRC this is what Cygwin does
under the hood.

That would help to preserve the programming paradigms already in use in
Postgres. If it proves to be a performance bottleneck then it could be
revisited, but it seems unlikely.

AFAIR there is no place in Postgres where performance of a pipe
connection is critical. Don't go out of your way to make it fast.

In fact, right offhand I only see two pipes used at all in the source
code: they are both in pgstat.c. It's fairly likely that that could be
redesigned if it poses a problem on Windows. (One of the pipes never
even transports any data; it's only used as a cheap-and-dirty means of
letting the statistics subprocess detect postmaster exit.)
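
The exit-detection trick can be sketched as follows, as a rough illustration rather than the pgstat.c code itself. The parent creates a pipe and never writes to it; the child closes its inherited copy of the write end and watches the read end. Once the parent exits, the last write end disappears, the read end becomes readable, and read() returns 0 (EOF). The function name writer_is_gone() is hypothetical.

```c
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms on the read end of a pipe.
 * Returns 1 if every write end has been closed (read() reports EOF),
 * 0 if the pipe is still open (timeout, or stray data was drained),
 * -1 on error.
 */
int
writer_is_gone(int read_fd, int timeout_ms)
{
	fd_set		rfds;
	struct timeval tv;
	char		buf[16];
	ssize_t		n;
	int			rc;

	FD_ZERO(&rfds);
	FD_SET(read_fd, &rfds);
	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;

	rc = select(read_fd + 1, &rfds, NULL, NULL, &tv);
	if (rc < 0)
		return -1;
	if (rc == 0)
		return 0;				/* timed out: writer still holds the pipe */

	/* Readable means either data or EOF; only EOF signals writer exit. */
	n = read(read_fd, buf, sizeof(buf));
	if (n < 0)
		return -1;
	return (n == 0) ? 1 : 0;
}
```

The watching process must close its own copy of the write end after fork(), otherwise the pipe never reports EOF. On Win32 this exact form would not work, since select() there accepts only sockets.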

regards, tom lane

#7Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#6)
Re: [HACKERS] Current Win32 port status

Tom Lane wrote:

AFAIR there is no place in Postgres where performance of a pipe
connection is critical. Don't go out of your way to make it fast.

In fact, right offhand I only see two pipes used at all in the source
code: they are both in pgstat.c. It's fairly likely that that could be
redesigned if it poses a problem on Windows. (One of the pipes never
even transports any data; it's only used as a cheap-and-dirty means of
letting the statistics subprocess detect postmaster exit.)

You are correct - I noticed that. Also, the only places in the backend
where we seem to use select() on FDs are in pgstat.c and postmaster.c,
and in the latter case the FDs *are* sockets, so we won't have a problem
there on Windows. There are a couple of other places where it is used
for small sleeps (storage/lmgr/s_lock.c and access/transam/xact.c) -
those should possibly be abstracted out (Windows doesn't behave well
there anyway, I believe - with 0 FDs I read somewhere it returns
immediately regardless of the timeout setting).

Bottom line: this should be a very small nut to crack.

cheers

andrew

#8Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#7)
select() for small sleep

I wrote:

There are a couple of other places where [select()] is used for small
sleeps (storage/lmgr/s_lock.c and access/transam/xact.c) - those
should possibly be abstracted out (Windows doesn't behave well there
anyway, I believe - with 0 FDs I read somewhere it returns immediately
regardless of the timeout setting).

What is the preferred way to handle these 2 cases? We could handle them
with #ifdef'd code inline, or create a new function pg_usleep(), or
possibly handle it with conditional macros inline. If a new function or
macro, where should they go?

cheers

andrew

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#8)
Re: select() for small sleep

Andrew Dunstan <andrew@dunslane.net> writes:

I wrote:

There are a couple of other places where [select()] is used for small
sleeps (storage/lmgr/s_lock.c and access/transam/xact.c) -

What is the preferred way to handle these 2 cases? We could handle them
with #ifdef'd code inline, or create a new function pg_usleep(), or
possibly handle it with conditional macros inline. If a new function or
macro, where should they go?

I'd go with a new function. There is no reason to try to "optimize"
this code by putting it inline; if you're trying to delay, another few
nanoseconds to enter a subroutine doesn't matter.

As for where, maybe make a new file in src/port/. That would make it
relatively easy to use the same function in client-side code if we
needed to.
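
A rough sketch of what such a function could look like. This is not the pg_usleep.c attached later in the thread, whose contents are not shown here; it only assumes the prototype that the patch adds to src/include/port.h, and the Win32 branch (Sleep(), with millisecond granularity) is an assumption about how the delay might be done there.

```c
#include <stddef.h>
#ifdef WIN32
#include <windows.h>
#else
#include <sys/select.h>
#include <sys/time.h>
#endif

/* Sleep for roughly the given number of microseconds. */
void
pg_usleep(unsigned int usecs)
{
	if (usecs > 0)
	{
#ifdef WIN32
		/* Sleep() takes milliseconds; round up so short delays still wait. */
		Sleep(usecs / 1000 + (usecs % 1000 != 0));
#else
		struct timeval delay;

		/* select() with no descriptors simply waits for the timeout. */
		delay.tv_sec = usecs / 1000000;
		delay.tv_usec = usecs % 1000000;
		(void) select(0, NULL, NULL, NULL, &delay);
#endif
	}
}
```

Callers then write pg_usleep(CommitDelay) instead of filling in a struct timeval by hand. On Unix the select() call can return early if a signal arrives, so the delay is a best effort rather than a guarantee.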

regards, tom lane

#10Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#9)
2 attachment(s)
pg_usleep

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

I wrote:

There are a couple of other places where [select()] is used for small
sleeps (storage/lmgr/s_lock.c and access/transam/xact.c) -

What is the preferred way to handle these 2 cases? We could handle them
with #ifdef'd code inline, or create a new function pg_usleep(), or
possibly handle it with conditional macros inline. If a new function or
macro, where should they go?

I'd go with a new function. There is no reason to try to "optimize"
this code by putting it inline; if you're trying to delay, another few
nanoseconds to enter a subroutine doesn't matter.

As for where, maybe make a new file in src/port/. That would make it
relatively easy to use the same function in client-side code if we
needed to.

patch + new file attached. Haven't tested on Windows, but should be fine.

cheers

andrew

Attachments:

pg_usleep.c (text/plain)
usleep.patch (text/plain)
Index: src/Makefile.global.in
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/Makefile.global.in,v
retrieving revision 1.172
diff -c -w -r1.172 Makefile.global.in
*** src/Makefile.global.in	19 Dec 2003 23:29:15 -0000	1.172
--- src/Makefile.global.in	30 Dec 2003 18:50:56 -0000
***************
*** 342,348 ****
  #
  # substitute implementations of the C library
  
! LIBOBJS = @LIBOBJS@ path.o sprompt.o thread.o
  
  ifneq (,$(LIBOBJS))
  LIBS += -lpgport
--- 342,348 ----
  #
  # substitute implementations of the C library
  
! LIBOBJS = @LIBOBJS@ path.o sprompt.o thread.o pg_usleep.o
  
  ifneq (,$(LIBOBJS))
  LIBS += -lpgport
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/access/transam/xact.c,v
retrieving revision 1.158
diff -c -w -r1.158 xact.c
*** src/backend/access/transam/xact.c	2 Dec 2003 19:26:47 -0000	1.158
--- src/backend/access/transam/xact.c	30 Dec 2003 18:50:56 -0000
***************
*** 562,572 ****
  			if (CommitDelay > 0 && enableFsync &&
  				CountActiveBackends() >= CommitSiblings)
  			{
! 				struct timeval delay;
! 
! 				delay.tv_sec = 0;
! 				delay.tv_usec = CommitDelay;
! 				(void) select(0, NULL, NULL, NULL, &delay);
  			}
  
  			XLogFlush(recptr);
--- 562,569 ----
  			if (CommitDelay > 0 && enableFsync &&
  				CountActiveBackends() >= CommitSiblings)
  			{
! 				/* call platform independent usleep */
! 				pg_usleep(CommitDelay);
  			}
  
  			XLogFlush(recptr);
Index: src/backend/storage/lmgr/s_lock.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/storage/lmgr/s_lock.c,v
retrieving revision 1.23
diff -c -w -r1.23 s_lock.c
*** src/backend/storage/lmgr/s_lock.c	27 Dec 2003 20:58:58 -0000	1.23
--- src/backend/storage/lmgr/s_lock.c	30 Dec 2003 18:50:56 -0000
***************
*** 46,52 ****
  s_lock(volatile slock_t *lock, const char *file, int line)
  {
  	/*
! 	 * We loop tightly for awhile, then delay using select() and try
  	 * again. Preferably, "awhile" should be a small multiple of the
  	 * maximum time we expect a spinlock to be held.  100 iterations seems
  	 * about right.  In most multi-CPU scenarios, the spinlock is probably
--- 46,52 ----
  s_lock(volatile slock_t *lock, const char *file, int line)
  {
  	/*
! 	 * We loop tightly for awhile, then delay using pg_usleep() and try
  	 * again. Preferably, "awhile" should be a small multiple of the
  	 * maximum time we expect a spinlock to be held.  100 iterations seems
  	 * about right.  In most multi-CPU scenarios, the spinlock is probably
***************
*** 84,90 ****
  	int			spins = 0;
  	int			delays = 0;
  	int			cur_delay = MIN_DELAY_CSEC;
- 	struct timeval delay;
  
  	while (TAS(lock))
  	{
--- 84,89 ----
***************
*** 97,105 ****
  			if (++delays > NUM_DELAYS)
  				s_lock_stuck(lock, file, line);
  
! 			delay.tv_sec = cur_delay / 100;
! 			delay.tv_usec = (cur_delay % 100) * 10000;
! 			(void) select(0, NULL, NULL, NULL, &delay);
  
  #if defined(S_LOCK_TEST)
  			fprintf(stdout, "*");
--- 96,102 ----
  			if (++delays > NUM_DELAYS)
  				s_lock_stuck(lock, file, line);
  
! 			pg_usleep(cur_delay * 10000);
  
  #if defined(S_LOCK_TEST)
  			fprintf(stdout, "*");
Index: src/include/port.h
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/include/port.h,v
retrieving revision 1.15
diff -c -w -r1.15 port.h
*** src/include/port.h	29 Nov 2003 22:40:53 -0000	1.15
--- src/include/port.h	30 Dec 2003 18:50:56 -0000
***************
*** 122,124 ****
--- 122,127 ----
  				char *buffer, size_t buflen,
  				struct hostent **result,
  				int *herrno);
+ 
+ extern void pg_usleep(unsigned int usecs);
+
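
The s_lock.c hunk replaces a two-field struct timeval computation with a single multiplication, since cur_delay is in centiseconds and one centisecond is 10000 microseconds. The check below is only an illustration of that arithmetic; the function names are hypothetical and not part of the patch.

```c
/*
 * Old form: split centiseconds into tv_sec and tv_usec, here flattened
 * back into a total number of microseconds for comparison.
 */
unsigned int
old_delay_usecs(int cur_delay)
{
	long		tv_sec = cur_delay / 100;
	long		tv_usec = (cur_delay % 100) * 10000;

	return (unsigned int) (tv_sec * 1000000L + tv_usec);
}

/* New form: the argument the patch passes to pg_usleep(). */
unsigned int
new_delay_usecs(int cur_delay)
{
	return (unsigned int) (cur_delay * 10000);
}
```

For example, cur_delay = 150 gives tv_sec = 1 and tv_usec = 500000 in the old form, and 150 * 10000 = 1500000 microseconds in the new one. The multiplication is done in int, so with a 32-bit int the two forms agree only while cur_delay stays below about 214,000 centiseconds (roughly 35 minutes).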