Stats collection on Windows
Hi all,
I think I've found the cause (or one of the causes) why stats
collection is unreliable on Windows and I'm wondering about the best way
to go about fixing it.
The problem is that process IDs on Windows seem to be assigned without
much rhyme or reason and it seems to happen relatively frequently that a
new process will be assigned the same process ID as a process which
recently died. If this happens before the backend has been expired out
of pgstat.c's pgStatBeDead hash, the backend will be missed.
I was thinking the postmaster could maintain a backend sequence number
with similar semantics to a UNIXish process ID which could then be used
as the key for pgStatBeDead instead of the actual process ID. Does that
sound reasonable?
Peter
"Peter Brant" <Peter.Brant@wicourts.gov> writes:
I think I've found the cause (or one of the causes) why stats
collection is unreliable on Windows and I'm wondering about the best way
to go about fixing it.
The problem is that process IDs on Windows seem to be assigned without
much rhyme or reason and it seems to happen relatively frequently that a
new process will be assigned the same process ID as a process which
recently died. If this happens before the backend has been expired out
of pgstat.c's pgStatBeDead hash, the backend will be missed.
That's an interesting theory, but do you have any actual evidence for it?
The evidence I've seen says that our big problem on Windows is the stats
collector process just quitting due to unexplained piperead() failures.
(I mean, I'd love to blame Microsoft for everything, but even the
Redmond crowd should be able to figure out that recycling process IDs
instantly would be a stupid idea...)
regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> wrote
The problem is that process IDs on Windows seem to be assigned without
much rhyme or reason and it seems to happen relatively frequently that a
new process will be assigned the same process ID as a process which
recently died.That's an interesting theory, but do you have any actual evidence for it?
I can confirm that on Windows 2000 the process ID is recycled instantly.
Regards,
Qingqing
"Tom Lane" <tgl@sss.pgh.pa.us> wrote
Redmond crowd should be able to figure out that recycling process IDs
instantly would be a stupid idea...)
Can you explain more of this? IMHO, if we rely on feature like this, the
difference is unstable-every-day vs. unstable-every-year.
Regards,
Qingqing
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
"Tom Lane" <tgl@sss.pgh.pa.us> wrote
Redmond crowd should be able to figure out that recycling process IDs
instantly would be a stupid idea...)
Can you explain more of this? IMHO, if we rely on feature like this, the
difference is unstable-every-day vs. unstable-every-year.
The mere existence of the kill() primitive should bring to mind reasons
why it's a bad idea.
regards, tom lane
Redmond crowd should be able to figure out that recycling
process IDs
instantly would be a stupid idea...)
Can you explain more of this? IMHO, if we rely on feature
like this,
the difference is unstable-every-day vs. unstable-every-year.
The mere existence of the kill() primitive should bring to
mind reasons why it's a bad idea.
Except the kill() primitive *does not exist* on windows.
That said, how did you go about to confirm that the pid is recycled
instantly? I was under the impression that it assignes any unused pid in
random order, which is also what a quick glance at my XP box looks like
(don't have a 2000 box around, but I wasn't aware of such a change
between those - but it's certainly not impossible). But if oyu had some
better method of determining it, please let me know :-)
If that's how, several other OSes do the same thing AFAIK - for security
reasons. For example OpenBSD. So if we rely heavily on that, we may be
in trouble elsewhere as well.
//Magnus
Import Notes
Resolved by subject fallback
On Tue, Apr 04, 2006 at 11:02:11PM -0400, Tom Lane wrote:
"Peter Brant" <Peter.Brant@wicourts.gov> writes:
I think I've found the cause (or one of the causes) why stats
collection is unreliable on Windows and I'm wondering about the best way
to go about fixing it.
The problem is that process IDs on Windows seem to be assigned without
much rhyme or reason and it seems to happen relatively frequently that a
new process will be assigned the same process ID as a process which
recently died. If this happens before the backend has been expired out
of pgstat.c's pgStatBeDead hash, the backend will be missed.That's an interesting theory, but do you have any actual evidence for it?
The evidence I've seen says that our big problem on Windows is the stats
collector process just quitting due to unexplained piperead() failures.
(I mean, I'd love to blame Microsoft for everything, but even the
Redmond crowd should be able to figure out that recycling process IDs
instantly would be a stupid idea...)
Why? :-)
They use HANDLE. The process ID isn't nearly as useful as it is on UNIX.
I haven't looked at that stuff in a long time, but process "ID" on Windows
may be a compatibility method.
Process "ID" isn't necessarily a good way of identifying tasks,
precisely because they may be reused. Using a serial allocated at
process start might make more sense. Relying on it not being reused,
is not dissimilar to the old malloc() "tricks" of assuming that
malloc() will not return something recently free()d. It's bad.
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
On Tue, Apr 04, 2006 at 11:17:49PM -0400, Tom Lane wrote:
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
"Tom Lane" <tgl@sss.pgh.pa.us> wrote
Redmond crowd should be able to figure out that recycling process IDs
instantly would be a stupid idea...)Can you explain more of this? IMHO, if we rely on feature like this, the
difference is unstable-every-day vs. unstable-every-year.The mere existence of the kill() primitive should bring to mind reasons
why it's a bad idea.
CreateProcess - "The process is assigned a process identifier. The
identifier is valid until the process terminates. It can be used to
identify the process, or specified in the OpenProcess function to open
a handle to the process. The initial thread in the process is also
assigned a thread identifier. ..."
TerminateProcess - "Terminates the specified process and all of its
threads."
TerminateProcess takes a HANDLE, not a process identifier. Yes, they
provide the "kill" primitive, but only as a compatibility measure. A
"good" Windows process, should maintain a HANDLE to the process, and
kill the process using the HANDLE. This way, there is no race. The
HANDLE is also how you wait for the process to terminate normally.
I prefer the "Redmond" way, in that I find UNIX's use of integer
identifiers to be encouraging of race conditions. UNIX requires hacks
like minimizing PID reuse, because UNIX is the one that is broken. :-)
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
On Wed, Apr 05, 2006 at 03:20:47AM -0400, mark@mark.mielke.cc wrote:
TerminateProcess takes a HANDLE, not a process identifier. Yes, they
provide the "kill" primitive, but only as a compatibility measure. A
"good" Windows process, should maintain a HANDLE to the process, and
kill the process using the HANDLE. This way, there is no race. The
HANDLE is also how you wait for the process to terminate normally.
Which presents the solution, we should use the HANDLE on windows rather
than the process identifier.
I prefer the "Redmond" way, in that I find UNIX's use of integer
identifiers to be encouraging of race conditions. UNIX requires hacks
like minimizing PID reuse, because UNIX is the one that is broken. :-)
Eh? A HANDLE is (or can be mapped to) an integer too. I don't see
anything on that page about handle reuse. If you run a machine long
enough I'm sure it can be reused also...
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.
On Wed, Apr 05, 2006 at 09:30:06AM +0200, Martijn van Oosterhout wrote:
On Wed, Apr 05, 2006 at 03:20:47AM -0400, mark@mark.mielke.cc wrote:
TerminateProcess takes a HANDLE, not a process identifier. Yes, they
provide the "kill" primitive, but only as a compatibility measure. A
"good" Windows process, should maintain a HANDLE to the process, and
kill the process using the HANDLE. This way, there is no race. The
HANDLE is also how you wait for the process to terminate normally.Which presents the solution, we should use the HANDLE on windows rather
than the process identifier.
Yes.
I prefer the "Redmond" way, in that I find UNIX's use of integer
identifiers to be encouraging of race conditions. UNIX requires hacks
like minimizing PID reuse, because UNIX is the one that is broken. :-)Eh? A HANDLE is (or can be mapped to) an integer too. I don't see
anything on that page about handle reuse. If you run a machine long
enough I'm sure it can be reused also...
Once upon a time, when I played with this stuff (I mostly use UNIX, not
Windows), I concluded to myself that HANDLE was process-local, and that
it was allocated. Meaning - it won't be re-used until you CloseHandle().
It's best then, to think of HANDLE as a opaque object. Regardless, of
whether it is process-local or not, until you run CloseHandle(), it is
yours to keep, and it won't be re-used.
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
On Wed, Apr 05, 2006 at 03:38:28AM -0400, mark@mark.mielke.cc wrote:
Once upon a time, when I played with this stuff (I mostly use UNIX, not
Windows), I concluded to myself that HANDLE was process-local, and that
it was allocated. Meaning - it won't be re-used until you CloseHandle().
It's best then, to think of HANDLE as a opaque object. Regardless, of
whether it is process-local or not, until you run CloseHandle(), it is
yours to keep, and it won't be re-used.
HANDLE is process local? That is worse then, because then there's no
guarentee that each process will see a different identifier.
The stats collector identifies processes by their process id, which
they get using getpid(). If instead they used a handle for their own
process (GetCurrentProcess() always returns -1, but you can apparently
clone it to get a real handle), you have no idea whether that handle is
unique amongst backends, because it's process local.
The stats collector doesn't have any open handles for the backend, it's
just a way for backends to identify themselves. It appears that process
handles are not up to the task either...
Do we have a plan C?
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.
Once upon a time, when I played with this stuff (I mostly use UNIX,
not Windows), I concluded to myself that HANDLE wasprocess-local, and
that it was allocated. Meaning - it won't be re-used until
you CloseHandle().
It's best then, to think of HANDLE as a opaque object.
Regardless, of
whether it is process-local or not, until you run
CloseHandle(), it is
yours to keep, and it won't be re-used.
HANDLE is process local? That is worse then, because then
there's no guarentee that each process will see a different
identifier.
HANDLE is process local. What you need to do is run DuplicateHandle() on
it specifying it should "also be valid for process Y" (for which you
need a HANDLE opened, in this case the stats collector). This will give
you a new handle whichi s valid in the *target process*, but it is *not*
valid in your own process.
A quick look shows that pids appear to be reused quite fast on Windows
2000, but *not* on XP or 2003. Don't have any other versions around to
test.
AFAIK, the pid on unix is also only vaild while the process is running.
It's not likely to be reused quickly, which appears to be the same for
XP+ on win32.
Do we have a plan C?
Well, first we need to know that this really *is* the problem. I'm
unsure if we've seen "proof" that this actually causes a real problem.
(It's certainly not the main issue with teh stats system on win32, which
is the fact that it crashes now and then)
How likely is it that the arrive-in-wrong-order really happens? And can
we do something about that one? (I assume the problem is that the
backend-die message is sent from postmaster, meaning that the die can
appear before the alive sent from the backend itself. What if we changed
it so we had one message sent from a backend doing a controlled shutdown
(99.9% of the time) and a different one when the postmaster picks up a
dead backend. And we apply the "dead counter" only to the messages from
the postmaster, and expire the record right away when we receive one
from a controlled shutdown (which would happen in-line with the
backend-start message)
//Magnus
//Magnus
Import Notes
Resolved by subject fallback
Redmond crowd should be able to figure out that
recycling process
IDs instantly would be a stupid idea...)
Can you explain more of this? IMHO, if we rely on feature
like this,
the difference is unstable-every-day vs. unstable-every-year.
The mere existence of the kill() primitive should bring to mind
reasons why it's a bad idea.CreateProcess - "The process is assigned a process
identifier. The identifier is valid until the process
terminates. It can be used to identify the process, or
specified in the OpenProcess function to open a handle to the
process. The initial thread in the process is also assigned a
thread identifier. ..."TerminateProcess - "Terminates the specified process and all
of its threads."TerminateProcess takes a HANDLE, not a process identifier.
Yes. You get the handle by doing OpenProcess() with PROCESS_TERMINATE
access.
The functions used to enumerate processes all return the process id, not
a HANDLE.(See Process32First/Process32Next). You use the HANDLE only
when you need to *modify* something (for example, killing it).
Yes, they provide the "kill" primitive, but only as a
compatibility measure.
They do? Where is it? Certainly not in the documented SDK that I can
see.
A "good" Windows process, should
maintain a HANDLE to the process, and kill the process using
the HANDLE. This way, there is no race. The HANDLE is also
how you wait for the process to terminate normally.
A "good" Windows process would be using threads ;-)
That's what the Windows API was written for, and that's why anything
that deals with multiple processes working together is a lot harder than
on Unix.
//Magnus
Import Notes
Resolved by subject fallback
On Wed, Apr 05, 2006 at 09:58:54AM +0200, Martijn van Oosterhout wrote:
On Wed, Apr 05, 2006 at 03:38:28AM -0400, mark@mark.mielke.cc wrote:
Once upon a time, when I played with this stuff (I mostly use UNIX, not
Windows), I concluded to myself that HANDLE was process-local, and that
it was allocated. Meaning - it won't be re-used until you CloseHandle().
It's best then, to think of HANDLE as a opaque object. Regardless, of
whether it is process-local or not, until you run CloseHandle(), it is
yours to keep, and it won't be re-used.HANDLE is process local? That is worse then, because then there's no
guarentee that each process will see a different identifier.
It's no different from a file descriptor on UNIX.
Neither UNIX nor Windows promise that a process identifier is valid
beyond the life of the process. UNIX avoids it from happening, as it
is necessary to avoid races with system calls such as kill(). Windows
does not have this problem.
The stats collector identifies processes by their process id, which
they get using getpid(). If instead they used a handle for their own
process (GetCurrentProcess() always returns -1, but you can apparently
clone it to get a real handle), you have no idea whether that handle is
unique amongst backends, because it's process local.
The stats collector doesn't have any open handles for the backend, it's
just a way for backends to identify themselves. It appears that process
handles are not up to the task either...
Do we have a plan C?
Sure. Serial. Allocate on process start.
Or, back to another topic from months ago - UUID generation... :-)
Process identifier should not be used beyond the life of the process.
As soon as wait() removes the process on UNIX, the process identifier
is no longer valid, and could be reused. That the operating system tries
to prevent problems by avoiding recycling isn't necessarily a good
reason to exploit this capability.
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
On Wed, Apr 05, 2006 at 10:31:37AM +0200, Magnus Hagander wrote:
TerminateProcess takes a HANDLE, not a process identifier.
Yes. You get the handle by doing OpenProcess() with PROCESS_TERMINATE
access.
The functions used to enumerate processes all return the process id, not
a HANDLE.(See Process32First/Process32Next). You use the HANDLE only
when you need to *modify* something (for example, killing it).
Enumerating processes isn't a common operationg for programs other
than 'top'. :-) But even so, one can use OpenProcess() to evaluate
that the process being referenced is actually the one you think it
is, before calling TerminateProcess(), eliminating the race.
Yes, they provide the "kill" primitive, but only as a
compatibility measure.They do? Where is it? Certainly not in the documented SDK that I can
see.
I believe it is called KillProcess().
A "good" Windows process, should
maintain a HANDLE to the process, and kill the process using
the HANDLE. This way, there is no race. The HANDLE is also
how you wait for the process to terminate normally.A "good" Windows process would be using threads ;-)
That's what the Windows API was written for, and that's why anything
that deals with multiple processes working together is a lot harder than
on Unix.
Haha. Good point. :-)
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
On Wed, Apr 05, 2006 at 06:03:31AM -0400, mark@mark.mielke.cc wrote:
It's no different from a file descriptor on UNIX.
Neither UNIX nor Windows promise that a process identifier is valid
beyond the life of the process. UNIX avoids it from happening, as it
is necessary to avoid races with system calls such as kill(). Windows
does not have this problem.
But consider, why is the process id there? (Amongst other reasons) so
that users can monitor pg_stat_activity and kill a backend that's out
of control. The equivalent to this in windows would be to.
1. Get HANDLE from the process ID.
2. TerminateProcess with that HANDLE.
Presumably users have a little GUI app that displays processes on the
system with the process ID so they can kill it. If a process ID is
quickly reused, they may end up killing the wrong one.
The race condition in this case involves the user and you can't solve
that programmatically. The non-reuse of pids is more for
user-friendlyness than anything else. The Window use of HANDLE doesn't
solve this problem at all.
Sure. Serial. Allocate on process start.
Or, back to another topic from months ago - UUID generation... :-)
Neither of which solve the "I'm a user and want to kill *that* backend"
problem. Because even on windows you'll have to get the process id,
convert it to a handle an kill it. Same race condition.
Process identifier should not be used beyond the life of the process.
As soon as wait() removes the process on UNIX, the process identifier
is no longer valid, and could be reused. That the operating system tries
to prevent problems by avoiding recycling isn't necessarily a good
reason to exploit this capability.
Yeah, but it's very useful as a user. Consider this scenerio:
<process 1234 goes AWOL>
kill -INT 1234
<check if process still there>
kill -TERM 1234
<process still there, damn it!>
kill -9 1234
There gone. With no quick PID reuse I can be sure I won't kill the
wrong one. This is presumably why recent versions of windows don't reuse
pids quickly either... It for *users* not programs.
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.
On Wed, Apr 05, 2006 at 12:20:49PM +0200, Martijn van Oosterhout wrote:
On Wed, Apr 05, 2006 at 06:03:31AM -0400, mark@mark.mielke.cc wrote:
It's no different from a file descriptor on UNIX.
Neither UNIX nor Windows promise that a process identifier is valid
beyond the life of the process. UNIX avoids it from happening, as it
is necessary to avoid races with system calls such as kill(). Windows
does not have this problem.But consider, why is the process id there? (Amongst other reasons) so
that users can monitor pg_stat_activity and kill a backend that's out
of control. The equivalent to this in windows would be to.
1. Get HANDLE from the process ID.
2. TerminateProcess with that HANDLE.
You missed 1.5. Ensure that HANDLE matches the process you intend. :-)
This is a little bit of a distraction though, as any system that requires
the user to be able to kill broken backends, is only indicative of a
broken backend. We're talking about how to deal with a broken process,
after the process owner (PostgreSQL) has forgotten about it.
What would be wrong with having a PostgreSQL generated serial associated
with each backend, that can be used by the backend process owner, to
map to a HANDLE, which uses TerminateProcess() underneath?
Presumably users have a little GUI app that displays processes on the
system with the process ID so they can kill it. If a process ID is
quickly reused, they may end up killing the wrong one.
Sure. But the problem here, is that PostgreSQL is broken, so they find a
need to go to their process listing GUI. And who is to say that the GUI
doesn't do as I've suggested above? Ensure that TerminateProcess() is
against the intended process? I have no idea if it does - but it could.
The race condition in this case involves the user and you can't solve
that programmatically. The non-reuse of pids is more for
user-friendlyness than anything else. The Window use of HANDLE doesn't
solve this problem at all.
Sure you can. But it also shouldn't matter.
Sure. Serial. Allocate on process start.
Or, back to another topic from months ago - UUID generation... :-)Neither of which solve the "I'm a user and want to kill *that* backend"
problem. Because even on windows you'll have to get the process id,
convert it to a handle an kill it. Same race condition.
No, and not if PostgreSQL kills the backend for you cleanly, using the
HANDLE, that it owns for the process.
Process identifier should not be used beyond the life of the process.
As soon as wait() removes the process on UNIX, the process identifier
is no longer valid, and could be reused. That the operating system tries
to prevent problems by avoiding recycling isn't necessarily a good
reason to exploit this capability.Yeah, but it's very useful as a user. Consider this scenerio:
<process 1234 goes AWOL>
kill -INT 1234
<check if process still there>
kill -TERM 1234
<process still there, damn it!>
kill -9 1234
There gone. With no quick PID reuse I can be sure I won't kill the
wrong one. This is presumably why recent versions of windows don't reuse
pids quickly either... It for *users* not programs.
Users shouldn't need to kill programs.
If they do - on the off chance that they do, they should be more
careful than you list above.
That's a kneejerk reaction to a problem that shouldn't occur in the
first place. It's a habit trained by UNIX users.
I rarely ever need to kill a process on my Windows box, and when I have,
the process listing in the GUI hasn't offered me a process ID. I click
on the item I wish to kill, and I right click "End Process". A lot more
user friendly, in my opinion. And if they happen to fix the race in the
background using the method I suggest above? All the better...
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
It's no different from a file descriptor on UNIX.
Neither UNIX nor Windows promise that a process identifier is valid
beyond the life of the process. UNIX avoids it fromhappening, as it
is necessary to avoid races with system calls such as
kill(). Windows
does not have this problem.
But consider, why is the process id there? (Amongst other
reasons) so that users can monitor pg_stat_activity and kill
a backend that's out of control. The equivalent to this in
windows would be to.1. Get HANDLE from the process ID.
2. TerminateProcess with that HANDLE.Presumably users have a little GUI app that displays
processes on the system with the process ID so they can kill
it. If a process ID is quickly reused, they may end up
killing the wrong one.
Actualy, not quite. That's the equivalent of kill -9. What you'd do is
"pg_ctl kill TERM <pid>". Or whatever other signal you want to send. But
you still need the pid, there is no way to get the required information
from a plain handle when you're not in that process.
Though I'd say I'm usually more interested in noticing what process it
is, rather than killing it. Possibly canceling it, but very seldomly
actually kill it. But it still needs the process id, yes.
(BTW, the GUI would generaelly be Task Manager, so it's not like it's a
special tool)
<snip rest>
//Magnus
Import Notes
Resolved by subject fallback
Yes, they provide the "kill" primitive, but only as a
compatibility
measure.
They do? Where is it? Certainly not in the documented SDK
that I can
see.
I believe it is called KillProcess().
No such function documented on my MSDN, other than one in SQL Server
DMO.
Are you referring to TerminateProcess()? If so, it can just terminate
the process, not send it arbitrary signals. Considering windows doesnt'
reallyi have the concept of signals, that's not very surprising though
:-)
//Magnus
Import Notes
Resolved by subject fallback
This is a little bit of a distraction though, as any system
that requires the user to be able to kill broken backends, is
only indicative of a broken backend. We're talking about how
to deal with a broken process, after the process owner
(PostgreSQL) has forgotten about it.
Don't confuse the postmaster with the statistics system. the postmaster
knows about the backends per handle, but the stats system runs outside
of the postmaster, and can possibly get confused.
What would be wrong with having a PostgreSQL generated serial
associated with each backend, that can be used by the backend
process owner, to map to a HANDLE, which uses
TerminateProcess() underneath?
It might be a good idea to use the combination pid+internalid. Note taht
we *don't* just want to TerminateProcess(), we need the pid to send
actual signals with. And in this case, that's not even the problem - the
problem is that we're not dropping the process from our list (I think).
Presumably users have a little GUI app that displays
processes on the
system with the process ID so they can kill it. If a process ID is
quickly reused, they may end up killing the wrong one.Sure. But the problem here, is that PostgreSQL is broken, so
they find a need to go to their process listing GUI. And who
is to say that the GUI doesn't do as I've suggested above?
Ensure that TerminateProcess() is against the intended
process? I have no idea if it does - but it could.
Because the GUI is most likely either Task Manager or Process Explorer.
There is no need for a special GUI.
Process identifier should not be used beyond the life of
the process.
As soon as wait() removes the process on UNIX, the process
identifier is no longer valid, and could be reused. That the
operating system tries to prevent problems by avoiding recycling
isn't necessarily a good reason to exploit this capability.Yeah, but it's very useful as a user. Consider this scenerio:
<process 1234 goes AWOL>
kill -INT 1234
<check if process still there>
kill -TERM 1234
<process still there, damn it!>
kill -9 1234
There gone. With no quick PID reuse I can be sure I won't kill the
wrong one. This is presumably why recent versions of windows don't
reuse pids quickly either... It for *users* not programs.Users shouldn't need to kill programs.
And normally they don't. But again, this isn't where the problem is. If
your process is stuck and you need to kill it, it most likely will still
be around once you get to killing it. If not, you were trying to kill it
too soon.
I rarely ever need to kill a process on my Windows box, and
when I have, the process listing in the GUI hasn't offered me
a process ID. I click on the item I wish to kill, and I right
click "End Process". A lot more user friendly, in my opinion.
And if they happen to fix the race in the background using
the method I suggest above? All the better...
AFAIK, Task Manager uses the pid to kill the task. Try tracing it. (And
it will happily show you the pid as well - just enable it in the
checkbox. It's the only reasonable way to differ between two processes
off the same exe, for example two different notepad:s...)
//Magnus
Import Notes
Resolved by subject fallback