Win32 CHECK_FOR_INTERRUPTS() performance tweak
This patch improves the win32 CHECK_FOR_INTERRUPTS() performance by
testing if any unblocked signals are queued before check
pgwin32_signal_event. This avoids an unnecessary system call.
Regards,
Qingqing
---
Index: backend/port/win32/signal.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/port/win32/signal.c,v
retrieving revision 1.12
diff -u -r1.12 signal.c
--- backend/port/win32/signal.c 15 Oct 2005 02:49:23 -0000 1.12
+++ backend/port/win32/signal.c 21 Oct 2005 05:32:52 -0000
@@ -19,11 +19,12 @@
/* pg_signal_crit_sec is used to protect only pg_signal_queue. That is the only
* variable that can be accessed from the signal sending threads! */
static CRITICAL_SECTION pg_signal_crit_sec;
-static int pg_signal_queue;
static pqsigfunc pg_signal_array[PG_SIGNAL_COUNT];
static pqsigfunc pg_signal_defaults[PG_SIGNAL_COUNT];
-static int pg_signal_mask;
+
+int pg_signal_queue;
+int pg_signal_mask;
DLLIMPORT HANDLE pgwin32_signal_event;
HANDLE pgwin32_initial_signal_pipe = INVALID_HANDLE_VALUE;
@@ -91,11 +92,10 @@
int i;
EnterCriticalSection(&pg_signal_crit_sec);
- while (pg_signal_queue & ~pg_signal_mask)
+ while (UNBLOCKED_SIGNAL_QUEUE())
{
/* One or more unblocked signals queued for execution */
-
- int exec_mask = pg_signal_queue & ~pg_signal_mask;
+ int exec_mask = UNBLOCKED_SIGNAL_QUEUE();
for (i = 0; i < PG_SIGNAL_COUNT; i++)
{
Index: include/port/win32.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/port/win32.h,v
retrieving revision 1.47
diff -u -r1.47 win32.h
--- include/port/win32.h 15 Oct 2005 02:49:45 -0000 1.47
+++ include/port/win32.h 21 Oct 2005 05:33:26 -0000
@@ -214,6 +214,12 @@
/* In backend/port/win32/signal.c */
+extern int pg_signal_queue;
+extern int pg_signal_mask;
+
+#define UNBLOCKED_SIGNAL_QUEUE() \
+ (pg_signal_queue & ~pg_signal_mask)
+
extern DLLIMPORT HANDLE pgwin32_signal_event;
extern HANDLE pgwin32_initial_signal_pipe;
Index: include/miscadmin.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/miscadmin.h,v
retrieving revision 1.180
diff -u -r1.180 miscadmin.h
--- include/miscadmin.h 15 Oct 2005 02:49:41 -0000 1.180
+++ include/miscadmin.h 21 Oct 2005 05:33:56 -0000
@@ -87,8 +87,10 @@
#define CHECK_FOR_INTERRUPTS() \
do { \
- if (WaitForSingleObjectEx(pgwin32_signal_event,0,TRUE) == WAIT_OBJECT_0) \
- pgwin32_dispatch_queued_signals(); \
+ if (UNBLOCKED_SIGNAL_QUEUE()) { \
+ if (WaitForSingleObjectEx(pgwin32_signal_event,0,TRUE) == WAIT_OBJECT_0) \
+ pgwin32_dispatch_queued_signals(); \
+ } \
if (InterruptPending) \
ProcessInterrupts(); \
} while(0)
Qingqing Zhou <zhouqq@cs.toronto.edu> writes:
This patch improves the win32 CHECK_FOR_INTERRUPTS() performance by
testing if any unblocked signals are queued before check
pgwin32_signal_event. This avoids an unnecessary system call.
http://archives.postgresql.org/pgsql-patches/2005-10/msg00191.php
This looks to me like a pretty important performance tweak for Windows.
Can any of the people who worked on the Windows signal implementation
look it over and confirm it's OK?
regards, tom lane
This patch improves the win32 CHECK_FOR_INTERRUPTS() performance by
testing if any unblocked signals are queued before check
pgwin32_signal_event. This avoids an unnecessary system call.http://archives.postgresql.org/pgsql-patches/2005-10/msg00191.php
This looks to me like a pretty important performance tweak
for Windows.
Can any of the people who worked on the Windows signal
implementation look it over and confirm it's OK?
I think it looks good. (Haven't tested it yet, but from reading it.)
It seems it's safe to skip the interlocking that we do. If we miss one
signal for some reason, we will find it on the next hit to
CHECK_FOR_INTERRUPTS(). And if we get an "extra hit", we still
re-examine the stuff when we're locked, so it shouldn't be a big
problem.
Do you have any numbers on how much performance increases with it? I
agree that it looks like it could be a significant help in some cases,
but it'd be great to have numbers...
I'm a little bit concerned about doing it this late in beta, but it does
look safe to me. When it's that late, it'd be good to have one more
person review it though :)
//Magnus
Import Notes
Resolved by subject fallback
"Magnus Hagander" <mha@sollentuna.net> writes:
Do you have any numbers on how much performance increases with it?
A rough estimate is that it would cost more than half as much as EXPLAIN
ANALYZE does: that imposes two extra syscalls per ExecProcNode, while
this adds one. There are other high-frequency CHECK_FOR_INTERRUPTS
calls that I can't quantify as easily. My guess is that it's costing us
less than a factor of 2, but well more than 10%, on typical queries
... so definitely worth fixing for 8.1 if we can convince ourselves
it's correct.
regards, tom lane
I can't get the postgres to link with the patch...
Am I missing something?
Merlin
Info: resolving _pg_signal_mask by linking to __imp__pg_signal_mask
(auto-import)
Info: resolving _pg_signal_queue by linking to __imp__pg_signal_queue
(auto-import)
fu000001.o(.idata$3+0xc): undefined reference to `libpostgres_a_iname'
fu000003.o(.idata$3+0xc): undefined reference to `libpostgres_a_iname'
nmth000000.o(.idata$4+0x0): undefined reference to `_nm__pg_signal_mask'
nmth000002.o(.idata$4+0x0): undefined reference to
`_nm__pg_signal_queue'
Import Notes
Resolved by subject fallback
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
I can't get the postgres to link with the patch...
Am I missing something?
Probably those variables need "extern DLLIMPORT".
regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> wrote
... so definitely worth fixing for 8.1 if we can convince ourselves
it's correct.
Despite the performance, there is one thing I am not exactly sure. Shall we
add "volatile" quanlifier to at least pg_signal_queue? The dangerous place
is PGSemaphoreLock(). If the compiler cache this value somehow, then we are
in trouble, but the original way (check event directly) does not have this
problem.
Regards,
Qingqing
I can't get the postgres to link with the patch...
Am I missing something?
Merlin
False alarm. I had to rerun configure which copies win32.h in various
places, as Qingqing noted.
Merlin
Import Notes
Resolved by subject fallback
I can't get the postgres to link with the patch...
Am I missing something?
MerlinFalse alarm. I had to rerun configure which copies win32.h in various
places, as Qingqing noted.
Not false alarm :) Only with DLLIMPORT can I link all the libraries.
Will have performance #s up in a bit.
Merlin
Import Notes
Resolved by subject fallback
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
Shall we add "volatile" quanlifier to at least pg_signal_queue?
If that's changed by a separate thread, "volatile" seems essential.
What about the mask variable?
regards, tom lane
Qingqing Zhou wrote:
"Tom Lane" <tgl@sss.pgh.pa.us> wrote
... so definitely worth fixing for 8.1 if we can convince ourselves
it's correct.Despite the performance, there is one thing I am not exactly sure. Shall we
add "volatile" quanlifier to at least pg_signal_queue? The dangerous place
is PGSemaphoreLock(). If the compiler cache this value somehow, then we are
in trouble, but the original way (check event directly) does not have this
problem.
The fact this question is asked worries me a bit.
Also, I have a small style question - why use a nested if instead of
just saying
if (UNBLOCKED_SIGNAL_QUEUE() && WaitForSingleObjectEx(pgwin32_signal_event,0,TRUE) == WAIT_OBJECT_0)
?
cheers
andrew
Shall we add "volatile" quanlifier to at least pg_signal_queue?
If that's changed by a separate thread, "volatile" seems essential.
What about the mask variable?
Yes, that does seem right. Previously it would never be concurrently
modified, because it was always locked by the critical section, but now
we read it without locking, and we certainly don't want that optimized
away.
The mask is only ever written by the main thread, never by the signal
dispatching thread. So I think that one could do without.
//Magnus
Import Notes
Resolved by subject fallback
I can't get the postgres to link with the patch...
Am I missing something?
MerlinFalse alarm. I had to rerun configure which copies win32.h in
various
places, as Qingqing noted.
Not false alarm :) Only with DLLIMPORT can I link all the libraries.
Will have performance #s up in a bit.
I have a couple of cpu-bound performance tests that I just ran with and
without the patch. Everything is ran with n=1 until volatile issue is
sorted out but so far I am not seeing any performance
improvement...believe me I would like nothing more than to report
otherwise. Also my tests maybe don't stress the signal code much. I
observed both the running time measured from the client and the weighted
average cpu load on the server.
Maybe it make a difference with high user load.
Merlin
Import Notes
Resolved by subject fallback
On Fri, 21 Oct 2005, Magnus Hagander wrote:
Shall we add "volatile" quanlifier to at least pg_signal_queue?
If that's changed by a separate thread, "volatile" seems essential.
What about the mask variable?Yes, that does seem right. Previously it would never be concurrently
modified, because it was always locked by the critical section, but now
we read it without locking, and we certainly don't want that optimized
away.The mask is only ever written by the main thread, never by the signal
dispatching thread. So I think that one could do without.
Agreed.
Regards,
Qingqing
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
Will have performance #s up in a bit.
I have a couple of cpu-bound performance tests that I just ran with and
without the patch. Everything is ran with n=1 until volatile issue is
sorted out but so far I am not seeing any performance
improvement...
Hm, what were the tests exactly? Offhand I'd expect something like a
SELECT COUNT(*) on a large but not-too-wide table to show noticeable
improvement.
regards, tom lane
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
Hm, what were the tests exactly? Offhand I'd expect something like a
SELECT COUNT(*) on a large but not-too-wide table to show noticeable
improvement.regards, tom lane
I STAND CORRECTED! My tests were high volume record by record
iterators, etc. Read and drool, gentlemen.
Merlin
=============stock============
esp=# select count(*) from data1.line_file;
count
--------
321306
(1 row)
Time: 844.000 ms
esp=# select count(*) from data1.line_file;
count
--------
321306
(1 row)
Time: 843.000 ms
esp=# select count(*) from data1.line_file;
count
--------
321306
(1 row)
Time: 844.000 ms
esp=# \q
=============patched============
esp=# \timing
Timing is on.
esp=# select count(*) from data1.line_file;
count
--------
321306
(1 row)
Time: 453.000 ms
esp=# select count(*) from data1.line_file;
count
--------
321306
(1 row)
Time: 468.000 ms
esp=# select count(*) from data1.line_file;
count
--------
321306
(1 row)
Time: 469.000 ms
Import Notes
Resolved by subject fallback
Hm, what were the tests exactly? Offhand I'd expect
something like a
SELECT COUNT(*) on a large but not-too-wide table to show
noticeable
improvement.
regards, tom lane
I STAND CORRECTED! My tests were high volume record by
record iterators, etc. Read and drool, gentlemen.
Wow, that's just great!
Was that with the volatile attribute or not?
//Magnus
Import Notes
Resolved by subject fallback
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
I STAND CORRECTED! My tests were high volume record by record
iterators, etc. Read and drool, gentlemen.
Looks good to me ;-) ...
If I recall the bidding correctly, the original patch needs DLLIMPORT
qualifiers attached to both of the variables, plus volatile attached to
pg_signal_queue. Do you want to send along the modified patch, or do
you think a non-Windows-hacker can get it right the first time?
regards, tom lane
On Fri, 21 Oct 2005, Merlin Moncure wrote:
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
Hm, what were the tests exactly? Offhand I'd expect something like a
SELECT COUNT(*) on a large but not-too-wide table to show noticeable
improvement.regards, tom lane
I STAND CORRECTED! My tests were high volume record by record
iterators, etc. Read and drool, gentlemen.
Great! Did you patch the "volatile" thing?
Regards,
Qingqing
Wow, that's just great!
Was that with the volatile attribute or not?
//Magnus
not. However (I'm assuming) this should not greatly impact things.
Good work. QQ: Can you fix the patch? I'm done till Monday.
Merlin
Import Notes
Resolved by subject fallback