pg_terminate_backend

Started by Andreas Pflug over 19 years ago, 28 messages
#1Andreas Pflug
pgadmin@pse-consulting.de

Since I have a stuck backend without client again, I'll have to kill -SIGTERM a backend. Fortunately, I do
have console access to that machine and it's not win32 but a decent OS. For other cases I'd really really really
appreciate if that function would make it into 8.2.

utils/adt/misc.c says:

#ifdef NOT_USED

/* Disabled in 8.0 due to reliability concerns; FIXME someday */
Datum
pg_terminate_backend(PG_FUNCTION_ARGS)

Well, AFAIR there were no more issues raised about code paths that don't clean up correctly, so can we please
remove that comment and make the function live finally?

Regards,
Andreas

#2Andrew Dunstan
andrew@dunslane.net
In reply to: Andreas Pflug (#1)
Re: pg_terminate_backend

Andreas Pflug wrote:

Since I have a stuck backend without client again, I'll have to kill -SIGTERM a backend. Fortunately, I do
have console access to that machine and it's not win32 but a decent OS.

You do know that on Windows you can use pg_ctl to send a pseudo SIGTERM
to a backend, don't you?

cheers

andrew

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Pflug (#1)
Re: pg_terminate_backend

Andreas Pflug <pgadmin@pse-consulting.de> writes:

utils/adt/misc.c says:
/* Disabled in 8.0 due to reliability concerns; FIXME someday */
Datum
pg_terminate_backend(PG_FUNCTION_ARGS)

Well, AFAIR there were no more issues raised about code paths that don't clean up correctly, so can we please
remove that comment and make the function live finally?

No, you have that backwards. The burden of proof is on those who want
it to show that it's now safe. The situation is not different than it
was before, except that we can now actually point to a specific bug that
did exist, whereas the original concern was just an unfocused one that
the code path hadn't been adequately exercised. That concern is now
even more pressing than it was.

regards, tom lane

#4Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Andrew Dunstan (#2)
Re: pg_terminate_backend

Andrew Dunstan wrote:

Andreas Pflug wrote:

Since I have a stuck backend without client again, I'll have to kill
-SIGTERM a backend. Fortunately, I do have console access to that
machine and it's not win32 but a decent OS.

You do know that on Windows you can use pg_ctl to send a pseudo
SIGTERM to a backend, don't you?

The main issue still is that console access is required, on any OS.

Regards,
Andreas

#5Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Tom Lane (#3)
Re: pg_terminate_backend

Tom Lane wrote:

Andreas Pflug <pgadmin@pse-consulting.de> writes:

utils/adt/misc.c says:
/* Disabled in 8.0 due to reliability concerns; FIXME someday */
Datum
pg_terminate_backend(PG_FUNCTION_ARGS)

Well, AFAIR there were no more issues raised about code paths that don't clean up correctly, so can we please
remove that comment and make the function live finally?

No, you have that backwards. The burden of proof is on those who want
it to show that it's now safe. The situation is not different than it
was before, except that we can now actually point to a specific bug that
did exist, whereas the original concern was just an unfocused one that
the code path hadn't been adequately exercised. That concern is now
even more pressing than it was.

If the backend's stuck, I'll have to SIGTERM it, whether there's
pg_terminate_backend or not. Ultimately, if resources should remain
locked, there's no chance except restarting the whole server anyway.
SIGTERM gives me a fair chance (>90%) that it will work without restart.

The persistent refusal to support the function makes the task more painful
to carry out, but not less necessary.

Regards,
Andreas

#6Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#3)
Re: pg_terminate_backend

Tom Lane wrote:

Andreas Pflug <pgadmin@pse-consulting.de> writes:

utils/adt/misc.c says:
/* Disabled in 8.0 due to reliability concerns; FIXME someday */
Datum
pg_terminate_backend(PG_FUNCTION_ARGS)

Well, AFAIR there were no more issues raised about code paths that don't clean up correctly, so can we please
remove that comment and make the function live finally?

No, you have that backwards. The burden of proof is on those who want
it to show that it's now safe. The situation is not different than it
was before, except that we can now actually point to a specific bug that
did exist, whereas the original concern was just an unfocused one that
the code path hadn't been adequately exercised. That concern is now
even more pressing than it was.

I am not sure how you prove the non-existence of a bug. Ideas?

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#6)
Re: pg_terminate_backend

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

No, you have that backwards. The burden of proof is on those who want
it to show that it's now safe. The situation is not different than it
was before, except that we can now actually point to a specific bug that
did exist, whereas the original concern was just an unfocused one that
the code path hadn't been adequately exercised. That concern is now
even more pressing than it was.

I am not sure how you prove the non-existence of a bug. Ideas?

What I'm looking for is some concentrated testing. The fact that some
people once in a while SIGTERM a backend doesn't give me any confidence
in it.

regards, tom lane

#8Csaba Nagy
nagy@ecircle-ag.com
In reply to: Tom Lane (#7)
Re: pg_terminate_backend

What I'm looking for is some concentrated testing. The fact that some
people once in a while SIGTERM a backend doesn't give me any confidence
in it.

Now wait a minute, is there some risk of lockup if I kill a backend?
Because I do that relatively often (say 20 times a day, when some web
users time out but their query keeps running). Should I rather not do it?

Thanks,
Csaba.

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Csaba Nagy (#8)
Re: pg_terminate_backend

Csaba Nagy <nagy@ecircle-ag.com> writes:

Now wait a minute, is there some risk of lockup if I kill a backend?
Because I do that relatively often (say 20 times a day, when some web
users time out but their query keeps running). Should I rather not do it?

statement_timeout is your friend.

regards, tom lane

#10Csaba Nagy
nagy@ecircle-ag.com
In reply to: Tom Lane (#9)
Re: pg_terminate_backend

You didn't answer the original question: is killing SIGTERM a backend
known/suspected to be dangerous? And if yes, what's the risk? (Pointers
to discussions would be nice too.)

statement_timeout is your friend.

I know, but unfortunately I can't use it. I did try to use
statement_timeout and it worked out quite badly (due to our usage
scenario).

Some of the web requests which time out on the web should still go
through... and we have activities which should not observe statement
timeout at all, i.e. they must finish however long that takes.

I know it would be possible to use a different user with its own
statement timeout for those requests, but that means we would have to rewrite
a lot of code, which is not possible immediately, and our admins would
resist adding even more configuration (additional users = additional
connection pool + caches, all to be configured). We could also fix the
queries so no timeout happens in the first place, but that will take us
even more time.

Cheers,
Csaba.

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Pflug (#5)
Re: pg_terminate_backend

Andreas Pflug <pgadmin@pse-consulting.de> writes:

Tom Lane wrote:

No, you have that backwards. The burden of proof is on those who want
it to show that it's now safe.

If the backend's stuck, I'll have to SIGTERM it, whether there's
pg_terminate_backend or not.

"Stuck?" You have not shown us a case where SIGTERM rather than SIGINT
is necessary or appropriate. It seems to me the above is assuming the
existence of unknown backend bugs, exactly the same thing you think
I shouldn't be assuming ...

regards, tom lane

#12Csaba Nagy
nagy@ecircle-ag.com
In reply to: Csaba Nagy (#10)
Re: pg_terminate_backend

On Thu, 2006-08-03 at 18:10, Csaba Nagy wrote:

You didn't answer the original question: is killing SIGTERM a backend

^^^^^^^^^^^^^^^
Never mind, I don't do that. I do 'kill backend_pid' without specifying
the signal, and I'm sufficiently unfamiliar with the unix signal names
to have confused them. Is a plain "kill" still dangerous?

Thanks,
Csaba.

#13Csaba Nagy
nagy@ecircle-ag.com
In reply to: Tom Lane (#11)
Re: pg_terminate_backend

"Stuck?" You have not shown us a case where SIGTERM rather than SIGINT
is necessary or appropriate. It seems to me the above is assuming the
existence of unknown backend bugs, exactly the same thing you think
I shouldn't be assuming ...

I do know a case where a plain kill will seem to be stuck: on vacuum
of a big table. I guess when it starts an index's cleanup scan it will
insist on finishing it before stopping. I'm not sure if that's the cause,
but I have seen delays of 30 minutes for killing a vacuum... it's true
that it always did die in the end... but it's also true that I have 'kill
-9'-ed it before because I thought it was stuck.

Cheers,
Csaba.

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Csaba Nagy (#13)
Re: pg_terminate_backend

Csaba Nagy <nagy@ecircle-ag.com> writes:

I do know a case where a plain kill will seem to be stuck: on vacuum
of a big table. I guess when it starts an index's cleanup scan it will
insist on finishing it before stopping.

We've fixed a few cases of missing CHECK_FOR_INTERRUPTS lately, and will
fix more if you can point them out. Note though that SIGTERM is just as
vulnerable to that as SIGINT.

regards, tom lane
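
The point that SIGTERM is just as exposed as SIGINT to a missing CHECK_FOR_INTERRUPTS can be illustrated with a toy model. This is a sketch in Python, not PostgreSQL code: the `Backend` class, its methods, and the `check_every` and `signal_at` parameters are all invented for illustration. It only assumes that a signal handler records the request and that the work loop notices it at explicit check points.

```python
class Backend:
    """Toy stand-in for a backend that notices signals only at check points."""

    def __init__(self):
        self.interrupt_pending = False  # set when a "signal" arrives

    def signal_arrives(self):
        # Plays the role of a SIGINT or SIGTERM handler: it only records the request.
        self.interrupt_pending = True

    def run(self, n_items, check_every, signal_at):
        """Process n_items and return how many were done before stopping.

        check_every=0 models a loop with no check point at all.
        """
        done = 0
        for i in range(n_items):
            if i == signal_at:
                self.signal_arrives()
            # The check point: the pending flag is only looked at here.
            if check_every and i % check_every == 0 and self.interrupt_pending:
                return done
            done += 1
        return done


if __name__ == "__main__":
    for every in (1, 100, 0):
        print(every, Backend().run(n_items=1000, check_every=every, signal_at=10))
```

Which signal set the flag makes no difference in this model: the delay before the loop stops depends only on how far away the next check point is.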

#15Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Tom Lane (#11)
Re: pg_terminate_backend

Tom Lane wrote:

Andreas Pflug <pgadmin@pse-consulting.de> writes:

Tom Lane wrote:

No, you have that backwards. The burden of proof is on those who want
it to show that it's now safe.

If the backend's stuck, I'll have to SIGTERM it, whether there's
pg_terminate_backend or not.

"Stuck?" You have not shown us a case where SIGTERM rather than SIGINT
is necessary or appropriate.

Last night, I had a long-running query I launched from pgAdmin. It was
happily running and completing on the server (took about 2 hours), and
the backend went back to <IDLE>. pgAdmin didn't get back a response,
assuming the query was still running. Apparently, the VPN router had
interrupted the connection silently without notifying either side of the
tcp connection. Since the backend is <IDLE>, there's no query to cancel
and SIGINT won't help. So "Stuck" for me means a backend *not*
responding to SIGINT.
BTW, there's another scenario where SIGINT won't help. Imagine an app
running wild hammering the server with queries regardless of query
cancels (maybe some retry mechanism). You'd like to interrupt that
connection, i.e. get rid of the backend.

Regards,
Andreas

#16Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Csaba Nagy (#12)
Re: pg_terminate_backend

Csaba Nagy wrote:

On Thu, 2006-08-03 at 18:10, Csaba Nagy wrote:

You didn't answer the original question: is killing SIGTERM a backend

^^^^^^^^^^^^^^^
Never mind, I don't do that. I do 'kill backend_pid' without specifying
the signal, and I'm sufficiently unfamiliar with the unix signal names
to have confused them. Is a plain "kill" still dangerous?

SIGTERM is the default kill parameter, so you do exactly what I'm
talking about.

Regards,
Andreas

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Csaba Nagy (#12)
Re: pg_terminate_backend

Csaba Nagy <nagy@ecircle-ag.com> writes:

On Thu, 2006-08-03 at 18:10, Csaba Nagy wrote:

You didn't answer the original question: is killing SIGTERM a backend

^^^^^^^^^^^^^^^
Never mind, I don't do that. I do 'kill backend_pid' without specifying
the signal,

"man kill" says the default is SIGTERM.

regards, tom lane
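
The default can also be checked empirically. The following is a minimal sketch, assuming a POSIX system with `sh` available; the helper name `exit_signal_of_plain_kill` is made up for this example, and the child is a throwaway Python process, not a PostgreSQL backend.

```python
import signal
import subprocess
import sys


def exit_signal_of_plain_kill():
    """Start a throwaway child, run `kill <pid>` with no signal option,
    and report which signal ended the child."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    # No -s or -SIGNAL argument: kill falls back to its default signal.
    subprocess.run(["sh", "-c", f"kill {child.pid}"], check=True)
    child.wait(timeout=10)
    # Popen reports death by signal as a negative return code.
    return signal.Signals(-child.returncode)


if __name__ == "__main__":
    print(exit_signal_of_plain_kill().name)
```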

#18Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Bruce Momjian (#6)
Re: pg_terminate_backend

Bruce Momjian wrote:

I am not sure how you prove the non-existence of a bug. Ideas?

Would be worth at least the Nobel prize :-)

Regards,
Andreas

#19Csaba Nagy
nagy@ecircle-ag.com
In reply to: Tom Lane (#17)
Re: pg_terminate_backend

"man kill" says the default is SIGTERM.

OK, so that means I do use it... is it known to be dangerous? I thought
till now that it is safe to use. What about "select pg_cancel_backend()"?

Thanks,
Csaba.

#20Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Csaba Nagy (#19)
Re: pg_terminate_backend

Csaba Nagy wrote:

"man kill" says the default is SIGTERM.

OK, so that means I do use it... is it known to be dangerous? I thought
till now that it is safe to use.

Apparently you never suffered any problems from that; neither did I.

What about "select pg_cancel_backend()"

That's the function wrapper around kill -SIGINT, which is probably the
way you could safely stop your queries most of the time.

Regards,
Andreas
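
The difference between the two signals discussed here (SIGINT cancels the current work and leaves the process running, SIGTERM ends the process) can be sketched with a toy worker. This is only an analogy in Python, not how a backend is implemented: the `WORKER` script, the `QueryCancelled` exception, and the `cancel_then_terminate` helper are hypothetical names, and a POSIX system is assumed.

```python
import signal
import subprocess
import sys

# Toy "backend": SIGINT abandons the current statement, SIGTERM ends the process.
WORKER = r"""
import signal, time

class QueryCancelled(Exception):
    pass

def on_sigint(signum, frame):
    raise QueryCancelled()

signal.signal(signal.SIGINT, on_sigint)
signal.signal(signal.SIGTERM, signal.SIG_DFL)

while True:
    try:
        print("idle", flush=True)
        time.sleep(60)              # stands in for a long-running query
    except QueryCancelled:
        print("cancelled", flush=True)
"""


def cancel_then_terminate():
    """Send SIGINT, then SIGTERM, to the toy worker and record what happens."""
    proc = subprocess.Popen([sys.executable, "-c", WORKER],
                            stdout=subprocess.PIPE, text=True)
    events = [proc.stdout.readline().strip()]      # "idle": handlers are installed
    proc.send_signal(signal.SIGINT)                # analogous to a query cancel
    events.append(proc.stdout.readline().strip())  # "cancelled"
    events.append(proc.stdout.readline().strip())  # "idle" again: still running
    alive_after_cancel = proc.poll() is None
    proc.send_signal(signal.SIGTERM)               # analogous to terminating the backend
    proc.wait(timeout=10)
    proc.stdout.close()
    return events, alive_after_cancel, proc.returncode


if __name__ == "__main__":
    print(cancel_then_terminate())
```

In this model a cancel sent to an idle worker would free nothing, which mirrors the dropped-connection case described earlier in the thread: only the terminate signal removes the process.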

#21korryd@enterprisedb.com
korryd@enterprisedb.com
In reply to: Bruce Momjian (#6)
Re: pg_terminate_backend

I am not sure how you prove the non-existence of a bug. Ideas?

I do that by deleting all of my code (usually by accident :-)

No code, no bugs!

-- Korry

#22Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#7)
Re: pg_terminate_backend

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

No, you have that backwards. The burden of proof is on those who want
it to show that it's now safe. The situation is not different than it
was before, except that we can now actually point to a specific bug that
did exist, whereas the original concern was just an unfocused one that
the code path hadn't been adequately exercised. That concern is now
even more pressing than it was.

I am not sure how you prove the non-existence of a bug. Ideas?

What I'm looking for is some concentrated testing. The fact that some
people once in a while SIGTERM a backend doesn't give me any confidence
in it.

OK, here is an opportunity for someone to run tests to get this into
8.2. The code already exists in CVS, but we need testing to enable it.
I would think running a huge workload and killing it over and over again
would be a good test.


#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#22)
Re: pg_terminate_backend

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

What I'm looking for is some concentrated testing. The fact that some
people once in a while SIGTERM a backend doesn't give me any confidence
in it.

OK, here is an opportunity for someone to run tests to get this into
8.2. The code already exists in CVS, but we need testing to enable it.
I would think running a huge workload and killing it over and over again
would be a good test.

Big multiprocess workload and you kill individual processes at random
while letting the rest run. It probably needs to be something that
stresses more of the code than pgbench would, too. (For instance,
it'd be a good idea if some of the workload involved having a few 2PC
transactions getting prepared and then either committed or rolled
back ... SIGTERM during a COMMIT PREPARED strikes me as the sort of
corner case that's probably never been exercised.)

regards, tom lane
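
The shape of such a test (many processes, random victims, survivors must be unaffected) might look like the sketch below. It is a skeleton under loud assumptions: the workers are dummy Python processes that block on stdin, not database sessions, and `kill_random_workers` is a made-up helper. A real run would replace the workers with clients driving a mixed SQL workload, including prepared transactions, and would send SIGTERM to the corresponding backend processes.

```python
import random
import signal
import subprocess
import sys

# Each worker blocks until its stdin is closed, standing in for a busy session.
WORKER = "import sys; sys.stdin.read()"


def kill_random_workers(n_workers, n_kills, seed=0):
    """Start n_workers, SIGTERM n_kills of them at random, let the rest finish."""
    rng = random.Random(seed)
    workers = [
        subprocess.Popen([sys.executable, "-c", WORKER], stdin=subprocess.PIPE)
        for _ in range(n_workers)
    ]
    victims = set(rng.sample(range(n_workers), n_kills))

    for i in victims:
        workers[i].send_signal(signal.SIGTERM)
        workers[i].wait(timeout=10)

    # The survivors must be unaffected: release them and collect their exit codes.
    for i, proc in enumerate(workers):
        proc.stdin.close()
        if i not in victims:
            proc.wait(timeout=10)

    return {i: proc.returncode for i, proc in enumerate(workers)}, victims


if __name__ == "__main__":
    codes, victims = kill_random_workers(n_workers=6, n_kills=2, seed=1)
    print(codes, sorted(victims))
```

Fixing the seed makes a failing run reproducible, which matters when the goal is to pin down a rarely exercised code path.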

#24Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#23)
Re: pg_terminate_backend

Thanks. Good plan.

---------------------------------------------------------------------------

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

What I'm looking for is some concentrated testing. The fact that some
people once in a while SIGTERM a backend doesn't give me any confidence
in it.

OK, here is an opportunity for someone to run tests to get this into
8.2. The code already exists in CVS, but we need testing to enable it.
I would think running a huge workload and killing it over and over again
would be a good test.

Big multiprocess workload and you kill individual processes at random
while letting the rest run. It probably needs to be something that
stresses more of the code than pgbench would, too. (For instance,
it'd be a good idea if some of the workload involved having a few 2PC
transactions getting prepared and then either committed or rolled
back ... SIGTERM during a COMMIT PREPARED strikes me as the sort of
corner case that's probably never been exercised.)

regards, tom lane



#25Magnus Hagander
mha@sollentuna.net
In reply to: Andreas Pflug (#4)
Re: pg_terminate_backend

Since I have a stuck backend without client again, I'll have to kill
-SIGTERM a backend. Fortunately, I do have console access to that
machine and it's not win32 but a decent OS.

You do know that on Windows you can use pg_ctl to send a pseudo
SIGTERM to a backend, don't you?

The main issue still is that console access is required, on any OS.

Yeah.
Though for the Windows case only, we could easily enough make it
possible to run pg_ctl kill remotely, since we use a named pipe. Does
this seem like a good or bad idea?

//Magnus

#26Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Magnus Hagander (#25)
Re: pg_terminate_backend

Magnus Hagander wrote:

Since I have a stuck backend without client again, I'll have to kill
-SIGTERM a backend. Fortunately, I do have console access to that
machine and it's not win32 but a decent OS.

You do know that on Windows you can use pg_ctl to send a pseudo
SIGTERM to a backend, don't you?

The main issue still is that console access is required, on any OS.

Yeah.
Though for the Windows case only, we could easily enough make it
possible to run pg_ctl kill remotely, since we use a named pipe. Does
this seem like a good or bad idea?

Not too helpful. How to kill a win32 backend from a linux workstation?
Additionally, NP requires an authenticated RPC connection. If you're not
allowed to access the console, you probably haven't got sufficient
access permissions to NP as well, or you'd need extra policy tweaking or
so. Nightmarish, just to avoid the easy and intuitive way.

Regards,
Andreas

#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#25)
Re: pg_terminate_backend

"Magnus Hagander" <mha@sollentuna.net> writes:

Though for the Windows case only, we could easily enough make it
possible to run pg_ctl kill remotely, since we use a named pipe. Does
this seem like a good or bad idea?

Seems like we'd be opening a can of security worms :-(

regards, tom lane

#28Magnus Hagander
mha@sollentuna.net
In reply to: Tom Lane (#27)
Re: pg_terminate_backend

Though for the Windows case only, we could easily enough make it
possible to run pg_ctl kill remotely, since we use a named pipe. Does
this seem like a good or bad idea?

Seems like we'd be opening a can of security worms :-(

Not really, standard windows ACL already applies to everything, so you
need to be an admin on the machine to make it work.

Anyhoo, I don't really see the gain in it, which also seems to be what
others think, so let's just drop that idea.

//Magnus