Re: possible bug not in open items

Started by Jeff Davisabout 17 years ago16 messagesbugs
Jump to latest
#1Jeff Davis
pgsql@j-davis.com

On Thu, 2009-03-26 at 21:45 -0400, Bruce Momjian wrote:

http://archives.postgresql.org/pgsql-bugs/2009-03/msg00062.php

It may or may not be a real bug, but I didn't receive any response. If
you think it might be a bug, can you please add it to the open items?

Hmm, odd I don't have it either; can you repost it?

The docs say:

"SIGINT -- The server disallows new connections and sends all existing
server processes SIGTERM, which will cause them to abort their current
transactions and exit promptly."

http://www.postgresql.org/docs/8.3/static/server-shutdown.html

If you have an open COPY and no data is moving, it simply won't
terminate it. You can terminate it with ctrl-C from psql, but not a
SIGINT to the postmaster or a SIGINT or SIGTERM to the backend.

Regards,
Jeff Davis

#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jeff Davis (#1)

Jeff Davis wrote:

The docs say:

"SIGINT -- The server disallows new connections and sends all existing
server processes SIGTERM, which will cause them to abort their current
transactions and exit promptly."

http://www.postgresql.org/docs/8.3/static/server-shutdown.html

If you have an open COPY and no data is moving, it simply won't
terminate it. You can terminate it with ctrl-C from psql, but not a
SIGINT to the postmaster or a SIGINT or SIGTERM to the backend.

I actually started to looked at this when you posted, but was pre-empted
with something else. Looking at the code, there's comments that actually
use COPY FROM STDIN as an example of things that should not be
interrupted by signals:

/*
* (2) Allow asynchronous signals to be executed immediately if they
* come in while we are waiting for client input. (This must be
* conditional since we don't want, say, reads on behalf of COPY FROM
* STDIN doing the same thing.)
*/
QueryCancelPending = false; /* forget any earlier CANCEL signal */
DoingCommandRead = true;

/*
* (3) read a command (loop blocks here)
*/
firstchar = ReadCommand(&input_message);

But in light of your bug report, that doesn't seem quite right. We don't
want to enable notify or catchup interrupts during COPY FROM, but we
should react to fast shutdown and query cancel.

I'm not too familiar with this code, but I think we could just enable
ImmediateInterruptOK in CopyGetData().

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jeff Davis (#1)

Jeff Davis wrote:

"SIGINT -- The server disallows new connections and sends all existing
server processes SIGTERM, which will cause them to abort their current
transactions and exit promptly."

http://www.postgresql.org/docs/8.3/static/server-shutdown.html

If you have an open COPY and no data is moving, it simply won't
terminate it. You can terminate it with ctrl-C from psql, but not a
SIGINT to the postmaster or a SIGINT or SIGTERM to the backend.

Tracking and grepping for pq_get* functions, there's one more place that
does a blocking read like that: reading the function oid and args in a
fastpath function call. Using v2 protocol. That has got to be deprecated
enough to not worry about :-). Then again, it wouldn't be hard to put
set ImmediateInterruptOK there as well, for the sake of completeness.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#2)

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

I'm not too familiar with this code, but I think we could just enable
ImmediateInterruptOK in CopyGetData().

Only if you are wanting to break things.

The reason we don't allow client read to be interrupted is the fear of
losing protocol sync on an incomplete message. For the SIGTERM case
this would (probably) be okay, since we aren't going to pay any more
attention to the client anyway, but accepting SIGINT there is right out.

regards, tom lane

#5Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#4)

On Fri, 2009-03-27 at 15:43 -0400, Tom Lane wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

I'm not too familiar with this code, but I think we could just enable
ImmediateInterruptOK in CopyGetData().

Only if you are wanting to break things.

The reason we don't allow client read to be interrupted is the fear of
losing protocol sync on an incomplete message. For the SIGTERM case
this would (probably) be okay, since we aren't going to pay any more
attention to the client anyway, but accepting SIGINT there is right out.

That's perfectly acceptable to me. I'm only concerned about the shutdown
case, and that's the only case that's in conflict with the docs.

Regards,
Jeff Davis

#6Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#4)

On Fri, 2009-03-27 at 15:43 -0400, Tom Lane wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

I'm not too familiar with this code, but I think we could just enable
ImmediateInterruptOK in CopyGetData().

Only if you are wanting to break things.

Doesn't DoingCommandRead protect us in the SIGINT case?

Regards,
Jeff Davis

#7Bruce Momjian
bruce@momjian.us
In reply to: Heikki Linnakangas (#3)

Where are we on this?

---------------------------------------------------------------------------

Heikki Linnakangas wrote:

Jeff Davis wrote:

"SIGINT -- The server disallows new connections and sends all existing
server processes SIGTERM, which will cause them to abort their current
transactions and exit promptly."

http://www.postgresql.org/docs/8.3/static/server-shutdown.html

If you have an open COPY and no data is moving, it simply won't
terminate it. You can terminate it with ctrl-C from psql, but not a
SIGINT to the postmaster or a SIGINT or SIGTERM to the backend.

Tracking and grepping for pq_get* functions, there's one more place that
does a blocking read like that: reading the function oid and args in a
fastpath function call. Using v2 protocol. That has got to be deprecated
enough to not worry about :-). Then again, it wouldn't be hard to put
set ImmediateInterruptOK there as well, for the sake of completeness.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#7)

Bruce Momjian <bruce@momjian.us> writes:

Where are we on this?

Pretty much nowhere --- there's no proposed patch, and I don't think
it's exactly trivial. Do you want to put it on TODO?

regards, tom lane

#9Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#8)

On Thu, 2009-04-09 at 12:59 -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Where are we on this?

Pretty much nowhere --- there's no proposed patch, and I don't think
it's exactly trivial. Do you want to put it on TODO?

Here is a patch that does what I think Heikki was suggesting. If a
proper fix is non-trivial, then I assume there's some problem with my
patch, but I'll post it for the archives anyway.

I don't see any obvious protocol synchronization problem, and it looks
like DoingCommandRead protects against that in the case of SIGINT to a
backend. And in the case of SIGTERM to a backend, the connection will be
terminated anyway.

Regards,
Jeff Davis

Attachments:

shutdown.patchtext/x-patch; charset=utf-8; name=shutdown.patchDownload+14-0
#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#9)

Jeff Davis <pgsql@j-davis.com> writes:

Here is a patch that does what I think Heikki was suggesting. If a
proper fix is non-trivial, then I assume there's some problem with my
patch, but I'll post it for the archives anyway.

This patch is so wrong that it's scary. You can't have
ImmediateInterruptOK true over the duration of any significant amount of
backend processing --- as an example, if you take control away in the
middle of a malloc call, you'll probably be left with a corrupt malloc
arena.

It doesn't even work to try to take control away while control is
inside, say, the OpenSSL code. The reason we have the
prepare_for_client_read/client_read_ended hooks is to allow the flag to
be turned on over a sufficiently narrow scope --- to wit, the recv()
kernel call and nothing else --- that it's safe.

AFAICS, a safe patch for this has got to involve teaching those hooks
about COPY mode.

regards, tom lane

#11Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#10)

On Fri, 2009-04-10 at 14:47 -0400, Tom Lane wrote:

This patch is so wrong that it's scary. You can't have
ImmediateInterruptOK true over the duration of any significant amount of
backend processing --- as an example, if you take control away in the
middle of a malloc call, you'll probably be left with a corrupt malloc
arena.

Thank you for the explanation. My initial thinking was that either
DoingCommandRead would protect us (for SIGINT to the backend), or we
were going to terminate the process anyway (for SIGTERM). But it sounds
like it leaves us in a state so unsafe that we can't even abort the
transaction nicely.

Regards,
Jeff Davis

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#11)

Jeff Davis <pgsql@j-davis.com> writes:

Thank you for the explanation. My initial thinking was that either
DoingCommandRead would protect us (for SIGINT to the backend), or we
were going to terminate the process anyway (for SIGTERM). But it sounds
like it leaves us in a state so unsafe that we can't even abort the
transaction nicely.

Well, we could presumably do exit(1) regardless. But if the idea is to
have a clean shutdown, you have to get through proc_exit(), and that
requires essentially all the backend subsystems to be alive and
undamaged.

regards, tom lane

#13Bruce Momjian
bruce@momjian.us
In reply to: Jeff Davis (#1)

Was this ever addressed?

---------------------------------------------------------------------------

Jeff Davis wrote:

On Thu, 2009-03-26 at 21:45 -0400, Bruce Momjian wrote:

http://archives.postgresql.org/pgsql-bugs/2009-03/msg00062.php

It may or may not be a real bug, but I didn't receive any response. If
you think it might be a bug, can you please add it to the open items?

Hmm, odd I don't have it either; can you repost it?

The docs say:

"SIGINT -- The server disallows new connections and sends all existing
server processes SIGTERM, which will cause them to abort their current
transactions and exit promptly."

http://www.postgresql.org/docs/8.3/static/server-shutdown.html

If you have an open COPY and no data is moving, it simply won't
terminate it. You can terminate it with ctrl-C from psql, but not a
SIGINT to the postmaster or a SIGINT or SIGTERM to the backend.

Regards,
Jeff Davis

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
  + If your life is a hard drive, Christ can be your backup. +
#14Jeff Davis
pgsql@j-davis.com
In reply to: Bruce Momjian (#13)

On Thu, 2010-02-25 at 23:15 -0500, Bruce Momjian wrote:

Was this ever addressed?

It doesn't appear to be fixed, and I don't see it on the TODO, either.
Should we add it there?

Regards,
Jeff Davis

Show quoted text

---------------------------------------------------------------------------

Jeff Davis wrote:

On Thu, 2009-03-26 at 21:45 -0400, Bruce Momjian wrote:

http://archives.postgresql.org/pgsql-bugs/2009-03/msg00062.php

It may or may not be a real bug, but I didn't receive any response. If
you think it might be a bug, can you please add it to the open items?

Hmm, odd I don't have it either; can you repost it?

The docs say:

"SIGINT -- The server disallows new connections and sends all existing
server processes SIGTERM, which will cause them to abort their current
transactions and exit promptly."

http://www.postgresql.org/docs/8.3/static/server-shutdown.html

If you have an open COPY and no data is moving, it simply won't
terminate it. You can terminate it with ctrl-C from psql, but not a
SIGINT to the postmaster or a SIGINT or SIGTERM to the backend.

Regards,
Jeff Davis

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#14)

Jeff Davis <pgsql@j-davis.com> writes:

On Thu, 2010-02-25 at 23:15 -0500, Bruce Momjian wrote:

Was this ever addressed?

It doesn't appear to be fixed, and I don't see it on the TODO, either.
Should we add it there?

+1. It likely wouldn't be real hard to fix, but given the lack of field
complaints I'm not thinking we need to treat it as urgent.

regards, tom lane

#16Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#15)

Tom Lane wrote:

Jeff Davis <pgsql@j-davis.com> writes:

On Thu, 2010-02-25 at 23:15 -0500, Bruce Momjian wrote:

Was this ever addressed?

It doesn't appear to be fixed, and I don't see it on the TODO, either.
Should we add it there?

+1. It likely wouldn't be real hard to fix, but given the lack of field
complaints I'm not thinking we need to treat it as urgent.

Added to TODO:

Allow a stalled COPY to exit if the backend is terminated

* http://archives.postgresql.org/pgsql-bugs/2009-04/msg00067.php

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do