[RFC] Should smgrtruncate() avoid sending sinval message for temp relations

Started by MauMauover 11 years ago30 messages
#1MauMau
maumau307@gmail.com

Hello,

I'm investigating a mysterious hang problem on PostgreSQL 9.2.8. If many
sessions use temporary tables whose rows are deleted on commit, the hang
occurs. I'd like to show you the stack trace, but I'm trying to figure out
how to reproduce the problem. IIRC, the stack trace was as follows. The
standby server was running normally.

...
SyncRepWaitForLSN
CommitTransaction
CommitTransactionCommand
ProcessCatchupEvent
HandleCatchupInterrupt
procsignal_sigusr1_handler
<SIGUSR1 received>
recv
...
ReadCommand
PostgresMain
...

Looking at smgrtruncate(), the sinval message is sent even when the
truncated relation is a temporary relation. However, I think the sinval
message is not necessary for temp relations, because each session doesn't
see the temp relations of other sessions. So, the following code seems
better. This avoids sinval queue overflow which leads to SIGUSR1. Is this
correct?

if (SmgrIsTemp(reln))
/* Do his on behalf of sinval message handler */
smgrclosenode(reln->smgr_rnode);
else
CacheInvalidateSmgr(reln->smgr_rnode);

Regards
MauMau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: MauMau (#1)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

"MauMau" <maumau307@gmail.com> writes:

Looking at smgrtruncate(), the sinval message is sent even when the
truncated relation is a temporary relation. However, I think the sinval
message is not necessary for temp relations, because each session doesn't
see the temp relations of other sessions.

This seems like a pretty unsafe suggestion, because the smgr level doesn't
know or care whether relations are temp; files are files. In any case it
would only paper over one specific instance of whatever problem you're
worried about, because sinval messages definitely do need to be sent in
general.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3MauMau
maumau307@gmail.com
In reply to: Tom Lane (#2)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

From: "Tom Lane" <tgl@sss.pgh.pa.us>

This seems like a pretty unsafe suggestion, because the smgr level doesn't
know or care whether relations are temp; files are files. In any case it
would only paper over one specific instance of whatever problem you're
worried about, because sinval messages definitely do need to be sent in
general.

I'm sorry I don't show the exact problem yet. Apart from that, I understood
that you insist it's not appropriate for smgr to be aware of whether the
file is a temporary relation, in terms of module layering. However, it
doesn't seem so in the current implementation. md.c, which is a layer under
or part of smgr, uses SmgrIsTemp(). In addition, as the name suggests,
SmgrIsTemp() is a function of smgr, which is defined in smgr.h. So, it's
not inappropriate for smgr to use it.

What I wanted to ask is whether and why sinval messages are really necessary
for session-private objects like temp relations. I thought shared inval is,
as the name indicates, for objects "shared" among sessions. If so, sinval
for session-private objects doesn't seem to match the concept.

Regards
MauMau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Robert Haas
robertmhaas@gmail.com
In reply to: MauMau (#3)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

On Wed, Jul 23, 2014 at 12:17 PM, MauMau <maumau307@gmail.com> wrote:

From: "Tom Lane" <tgl@sss.pgh.pa.us>

This seems like a pretty unsafe suggestion, because the smgr level doesn't
know or care whether relations are temp; files are files. In any case it
would only paper over one specific instance of whatever problem you're
worried about, because sinval messages definitely do need to be sent in
general.

I'm sorry I don't show the exact problem yet. Apart from that, I understood
that you insist it's not appropriate for smgr to be aware of whether the
file is a temporary relation, in terms of module layering. However, it
doesn't seem so in the current implementation. md.c, which is a layer under
or part of smgr, uses SmgrIsTemp(). In addition, as the name suggests,
SmgrIsTemp() is a function of smgr, which is defined in smgr.h. So, it's
not inappropriate for smgr to use it.

What I wanted to ask is whether and why sinval messages are really necessary
for session-private objects like temp relations. I thought shared inval is,
as the name indicates, for objects "shared" among sessions. If so, sinval
for session-private objects doesn't seem to match the concept.

I think the problem here is that it actually is possible for one
session to access the temporary objects of another session:

rhaas=# create temp table fructivore (a int);
CREATE TABLE
rhaas=# select 'fructivore'::regclass::oid;
oid
-------
24578
(1 row)

Switch windows:

rhaas=# select 24578::regclass;
regclass
----------------------
pg_temp_2.fructivore
(1 row)

rhaas=# alter table pg_temp_2.fructivore add column b int;
ALTER TABLE

Now, we could prohibit that specific thing. But at the very least, it
has to be possible for one session to drop another session's temporary
objects, because autovacuum does it eventually, and superusers will
want to do it sooner to shut autovacuum up. So it's hard to reason
about whether and to what extent it's safe to not send sinval messages
for temporary objects.

I think you might be approaching this problem from the wrong end,
though. The question in my mind is: why does the
StartTransactionCommand() / CommitTransactionCommand() pair in
ProcessCatchupEvent() end up writing a commit record? The obvious
possibility that occurs to me is that maybe rereading the invalidated
catalog entries causes a HOT prune, and maybe there ought to be some
way for a transaction that has only done HOT pruning to commit
asynchronously, just as we already do for transactions that only
modify temporary tables. Or, failing that, maybe there's a way to
suppress synchronous commit for this particular transaction.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#4)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

On 2014-07-24 11:17:15 -0400, Robert Haas wrote:

I think you might be approaching this problem from the wrong end,
though.

Yep.

The question in my mind is: why does the
StartTransactionCommand() / CommitTransactionCommand() pair in
ProcessCatchupEvent() end up writing a commit record? The obvious
possibility that occurs to me is that maybe rereading the invalidated
catalog entries causes a HOT prune, and maybe there ought to be some
way for a transaction that has only done HOT pruning to commit
asynchronously, just as we already do for transactions that only
modify temporary tables. Or, failing that, maybe there's a way to
suppress synchronous commit for this particular transaction.

I think we should do what the first paragraph in
http://archives.postgresql.org/message-id/20140707155113.GB1136%40alap3.anarazel.de
outlined. As Tom says somewhere downthread that requires some code
review, but other than that it should get rid of a fair amount of
problems.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6MauMau
maumau307@gmail.com
In reply to: Robert Haas (#4)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

From: "Robert Haas" <robertmhaas@gmail.com>

I think the problem here is that it actually is possible for one
session to access the temporary objects of another session:
Now, we could prohibit that specific thing. But at the very least, it
has to be possible for one session to drop another session's temporary
objects, because autovacuum does it eventually, and superusers will
want to do it sooner to shut autovacuum up. So it's hard to reason
about whether and to what extent it's safe to not send sinval messages
for temporary objects.

I was a bit surprised to know that one session can access the data of
another session's temporary tables. That implenentation nay be complicating
the situation -- extra sinval messages.

I think you might be approaching this problem from the wrong end,
though. The question in my mind is: why does the
StartTransactionCommand() / CommitTransactionCommand() pair in
ProcessCatchupEvent() end up writing a commit record? The obvious
possibility that occurs to me is that maybe rereading the invalidated
catalog entries causes a HOT prune, and maybe there ought to be some
way for a transaction that has only done HOT pruning to commit
asynchronously, just as we already do for transactions that only
modify temporary tables. Or, failing that, maybe there's a way to
suppress synchronous commit for this particular transaction.

I could figure out what log record was output in the transaction started in
ProcessCatchupEvent() by inserting elog() in XLogInsert(). The log record
was (RM_STANDBY_ID, XLOG_STANDBY_LOCK).

The cause of the hang turned out clear. It was caused as follows:

1. When a transaction commits which used a temporary table created with ON
COMMIT DELETE ROWS, the sinval catchup signal (SIGUSR1) was issued from
smgrtruncate(). This is because the temporary table is truncated at
transaction end.

2. Another session, which is waiting for a client request, receives SIGUSR1.
It calls ProcessCatchupEvent().

3. ProcessCatchupEvent() calls StartTransactionCommand(), emits the
XLOG_STANDBY_LOCK WAL record, and then calls CommitTransactionCommand().

4. It then calls SyncRepWaitForLSN(), which in turn waits on the latch.

5. But the WaitLatch() never returns, because the session is already running
inside the SIGUSR1 handler in step 2. WaitLatch() needs SIGUSR1 to
complete.

I think there is a problem with the latch and SIGUSR1 mechanism. How can we
fix this problem?

Regards
MauMau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7MauMau
maumau307@gmail.com
In reply to: Andres Freund (#5)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

From: "Andres Freund" <andres@2ndquadrant.com>

I think we should do what the first paragraph in
http://archives.postgresql.org/message-id/20140707155113.GB1136%40alap3.anarazel.de
outlined. As Tom says somewhere downthread that requires some code
review, but other than that it should get rid of a fair amount of
problems.

As mentioned in the mail I've just sent, there seems to be a problem around
the latch and/or sinval catchup implementation.

Or, is it bad that many things are done in SIGUSR1 handler? If some
processing in SIGUSR1 handler requires waiting on a latch, it hangs at
WaitLatch(). Currently, the only processing in the backend which requires a
latch may be to wait for the sync standby. However, in the future, the
latch may be used for more tasks.

Another problem is, who knows WaitLatch() can return prematurely (before the
actual waited-for event does SetLatch()) due to the SIGUSR1 issued for
sinval catchup event?

How should we tackle these problem?

Regards
MauMau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: MauMau (#6)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

"MauMau" <maumau307@gmail.com> writes:

[ sinval catchup signal -> ProcessCatchupEvent -> WaitLatch -> deadlock ]

Ugh.

One line of thought is that it's pretty unsafe to be doing anything
as complicated as transaction start/commit in a signal handler, even one
that is sure it's not interrupting anything else. The only excuse
ProcessCatchupEvent has for that is that it's "trying to ensure proper
cleanup from an error". Perhaps with closer analysis we could convince
ourselves that errors thrown from AcceptInvalidationMessages() wouldn't
need transaction abort cleanup.

For that matter, it's not exactly clear that an error thrown out of
there would result in good things even with the transaction
infrastructure. Presuming that we're waiting for client input, an
error longjmp out of ProcessCatchupEvent could mean taking control
away from OpenSSL, and I bet that won't end well. Maybe we should
just be doing

PG_TRY
AcceptInvalidationMessages();
PG_CATCH
elog(FATAL, "curl up and die");

ProcessIncomingNotify is also using
StartTransactionCommand()/CommitTransactionCommand(), so that would
need some thought too.

Or we could try to get rid of the need to do anything beyond setting
a flag in the interrupt handler itself. But I'm afraid that's probably
unworkable, especially now that we use SA_RESTART on all signals.

Another line of thought is that we've been way too uncritical about
shoving different kinds of events into the SIGUSR1 multiplexor.
It might be a good idea to separate "high level" interrupts from
"low level" ones, using say SIGUSR1 for the former and SIGUSR2
for the latter. However, that doesn't sound very back-patchable,
even assuming that we can come up with a clean division.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#8)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

On 2014-07-26 11:32:24 -0400, Tom Lane wrote:

"MauMau" <maumau307@gmail.com> writes:

[ sinval catchup signal -> ProcessCatchupEvent -> WaitLatch -> deadlock ]

Ugh.

One line of thought is that it's pretty unsafe to be doing anything
as complicated as transaction start/commit in a signal handler, even one
that is sure it's not interrupting anything else.

Yea, that's really not nice.

The only excuse
ProcessCatchupEvent has for that is that it's "trying to ensure proper
cleanup from an error". Perhaps with closer analysis we could convince
ourselves that errors thrown from AcceptInvalidationMessages() wouldn't
need transaction abort cleanup.

Hm. I'm not convinced that's going to be easy.

Wouldn't it be better to move the catchup interrupt processing out of
the signal handler? For normal backends we only enable when reading from
the client and DoingCommandRead is set. How about setting a variable in
the signal handler and doing the actual catchup processing after the
recv() returned EINTR? That'd require either renegging on SA_RESTART or
using WaitLatchOrSocket() and nonblocking send/recv. I think that'd be
a much safer model and after researching it a bit for the idle in
transaction timeout thing it doesn't look super hard. Even windows seems
to already support everything necessary.

Or we could try to get rid of the need to do anything beyond setting
a flag in the interrupt handler itself. But I'm afraid that's probably
unworkable, especially now that we use SA_RESTART on all signals.

Yea :(. Several platforms actually ignore SA_RESTART for
send/recv/select/... under some circumstances (notably linux), but it'd
probably be hard to get it right for all.

I don't think we can continue to use the blocking calls for much longer
unless we allow them to be interruptible. But I doubt that change would
be backpatchable.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#9)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

Andres Freund <andres@2ndquadrant.com> writes:

Wouldn't it be better to move the catchup interrupt processing out of
the signal handler? For normal backends we only enable when reading from
the client and DoingCommandRead is set. How about setting a variable in
the signal handler and doing the actual catchup processing after the
recv() returned EINTR?

Only it won't. See SA_RESTART. I think turning that off is a nonstarter,
as per previous discussions.

That'd require either renegging on SA_RESTART or
using WaitLatchOrSocket() and nonblocking send/recv.

Yeah, I was wondering about using WaitLatchOrSocket for client I/O too.
We already have a hook that lets us do the actual recv even when using
OpenSSL, and in principle that function could do interrupt-service-like
functions if it got kicked off the recv().

Anything in this line is going to be a bigger change than I'd want to
back-patch, though. Are we OK with not fixing the problem in the back
branches? Given the shortage of field complaints, that might be all
right.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#10)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

On 2014-07-26 13:58:38 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

That'd require either renegging on SA_RESTART or
using WaitLatchOrSocket() and nonblocking send/recv.

Yeah, I was wondering about using WaitLatchOrSocket for client I/O too.
We already have a hook that lets us do the actual recv even when using
OpenSSL, and in principle that function could do interrupt-service-like
functions if it got kicked off the recv().

I've started playing with this. Looks clearly worthwile.

I think if we do it right we pretty much can get rid of the whole
prepare_for_client_read() machinery and handle everything via
ProcessInterrupts(). EnableCatchupInterrupt() et al don't really fill me
with joy.

I'm not yet entirely sure where the interrupt processing should happen,
but I guess that'll fall out of the work at some point. The important
bit imo is to *not* not do anything but return with BIO_set_retry_*()
from my_sock_read/write(). That then allows us to implement stuff like
the idle transaction timeout with much fewer problems.

I probably won't finish doing this before leaving on holidays, so nobody
should hesitate to look themselves if interested. If not, I plan to pick
this up again. I think it's a prerequisite to getting rid of the FATAL
for recovery conflict interrupts which I really would like to do.

Anything in this line is going to be a bigger change than I'd want to
back-patch, though.

Agreed. I don't think it will, but it very well could have performance
implications. Besides the obvious risk of bugs...

Are we OK with not fixing the problem in the back
branches? Given the shortage of field complaints, that might be all
right.

I'm not really comfortable with that. How about simply flagging a couple
contexts to not do the SyncRepWaitForLsn() dance? Possibly just by doing
something ugly like
SetConfigOption('synchronous_commit', 'off', PGC_INTERNAL,
PGC_S_OVERRIDE, GUC_ACTION_LOCAL, true, ERROR)?
during startup, inval and similar transaction commands? Not pretty, but
it looks simple enough to be backpatchable.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12MauMau
maumau307@gmail.com
In reply to: Tom Lane (#8)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

From: "Tom Lane" <tgl@sss.pgh.pa.us>

[ sinval catchup signal -> ProcessCatchupEvent -> WaitLatch -> deadlock ]

I must add one thing. After some client processes closed the connection
without any hang, their server processes were stuck with a stack trace like
this (I'll look for and show the exact stack trace tomorrow):

WaitLatchOrSocket
SyncRepWaitForLSN
CommitTransaction
CommitTransactionCommand
ProcessCatchupEvent
...
shmem_exit
proc_exit_prepare
proc_exit
PostgresMain
...

The process appears to be hanging during session termination. So, it's not
the problem only during client request wait.

Another line of thought is that we've been way too uncritical about
shoving different kinds of events into the SIGUSR1 multiplexor.
It might be a good idea to separate "high level" interrupts from
"low level" ones, using say SIGUSR1 for the former and SIGUSR2
for the latter. However, that doesn't sound very back-patchable,
even assuming that we can come up with a clean division.

This seems to be one step in the right direction. There are two issues in
the current implementation:

[Issue 1]
[ sinval catchup signal -> ProcessCatchupEvent -> WaitLatch -> deadlock ]
This is (partly) because the latch wakeup and other processing use the same
SIGUSR1 in normal backend, autovacuum launcher/worker, and the background
worker with database access. On the other hand, other background daemon
processes properly use SIGUSR1 only for latch wakeup, and SIGUSR2 for other
tasks.

[Issue 2]
WaitLatch() returns prematurely due to the sinval catchup signal, even
though the target event (e.g. reply from standby) hasn't occurred and called
SetLatch() yet. This is because procsignal_sigusr1_handler() always calls
latch_sigusr1_handler().
This is probably not related to the cause of the hang.

So, as you suggest, I think it would be a good idea to separate
latch_sigusr1_handler() call into a different function solely for it like
other daemon processes, and leave the rest in procsignal_sigusr1_handler()
and rename function to procsignal_sigusr2_handler() or procsignal_handler().
Originally, it's not natural that the current procsignal_sigusr1_handler()
contains latch_sigusr1_handler() call, because latch processing is not based
on the procsignal mechanism (SetLatch() doesn't call SendProcSignal()).

I'll try the fix tomorrow if possible. What kind of problems do you hink of
for back-patching?

Regards
MauMau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13MauMau
maumau307@gmail.com
In reply to: MauMau (#12)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

From: "MauMau" <maumau307@gmail.com>

I must add one thing. After some client processes closed the connection
without any hang, their server processes were stuck with a stack trace
like this (I'll look for and show the exact stack trace tomorrow):

I found two kinds of stack traces:

#0 0x0000003199ec488f in poll () from /lib64/libc.so.6
#1 0x0000000000609f24 in WaitLatchOrSocket ()
#2 0x000000000063ad92 in SyncRepWaitForLSN ()
#3 0x00000000004ad474 in CommitTransaction ()
#4 0x00000000004aef53 in CommitTransactionCommand ()
#5 0x000000000064b547 in shmem_exit ()
#6 0x000000000064b625 in proc_exit_prepare ()
#7 0x000000000064b6a8 in proc_exit ()
#8 0x0000000000668a94 in PostgresMain ()
#9 0x0000000000617f2c in ServerLoop ()
#10 0x000000000061ae96 in PostmasterMain ()
#11 0x00000000005b2ccf in main ()

#0 0x0000003f4badf258 in poll () from /lib64/libc.so.6
#1 0x0000000000619b94 in WaitLatchOrSocket ()
#2 0x0000000000640c4c in SyncRepWaitForLSN ()
#3 0x0000000000491c18 in RecordTransactionCommit ()
#4 0x0000000000491d98 in CommitTransaction ()
#5 0x0000000000493135 in CommitTransactionCommand ()
#6 0x0000000000653fc5 in ProcessCatchupEvent ()
#7 0x00000000006540ed in HandleCatchupInterrupt ()
#8 0x00000000006533e3 in procsignal_sigusr1_handler ()
#9 <signal handler called>
#10 0x0000003f4bae96b0 in recv () from /lib64/libc.so.6
#11 0x00000000005b75f6 in secure_read ()
#12 0x00000000005c223b in pq_recvbuf ()
#13 0x00000000005c263b in pq_getbyte ()
#14 0x000000000066e081 in PostgresMain ()
#15 0x0000000000627d81 in PostmasterMain ()
#16 0x00000000005c4803 in main ()

I'll try the fix tomorrow if possible. What kind of problems do you hink
of for back-patching?

I could reproduce the problem with 9.2.8, but have not yet with 9.5dev.
I'll try with 9.2.9, and create the fix.

Regards
MauMau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Andres Freund
andres@2ndquadrant.com
In reply to: Andres Freund (#11)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

On 2014-07-26 20:20:05 +0200, Andres Freund wrote:

On 2014-07-26 13:58:38 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

That'd require either renegging on SA_RESTART or
using WaitLatchOrSocket() and nonblocking send/recv.

Yeah, I was wondering about using WaitLatchOrSocket for client I/O too.
We already have a hook that lets us do the actual recv even when using
OpenSSL, and in principle that function could do interrupt-service-like
functions if it got kicked off the recv().

I've started playing with this. Looks clearly worthwile.

I think if we do it right we pretty much can get rid of the whole
prepare_for_client_read() machinery and handle everything via
ProcessInterrupts(). EnableCatchupInterrupt() et al don't really fill me
with joy.

One thing I am wondering about around this is: Why are we only
processing catchup events when DoingCommandRead? There's other paths
where we can wait for data from the client for a long time. Obviously we
don't want to process async.c stuff from inside copy, but I don't see
why that's the case for sinval.c.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#14)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

Andres Freund <andres@2ndquadrant.com> writes:

One thing I am wondering about around this is: Why are we only
processing catchup events when DoingCommandRead? There's other paths
where we can wait for data from the client for a long time. Obviously we
don't want to process async.c stuff from inside copy, but I don't see
why that's the case for sinval.c.

It might be all right to do it during copy, but I'd just as soon treat
that as a separate issue. If you merge it into the basic patch then it
might be hard to get rid of if there are problems.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#15)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

On 2014-07-28 15:29:57 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

One thing I am wondering about around this is: Why are we only
processing catchup events when DoingCommandRead? There's other paths
where we can wait for data from the client for a long time. Obviously we
don't want to process async.c stuff from inside copy, but I don't see
why that's the case for sinval.c.

It might be all right to do it during copy, but I'd just as soon treat
that as a separate issue. If you merge it into the basic patch then it
might be hard to get rid of if there are problems.

Yea, not planning to merge it. Just wondering to make sure I understand
all the implications.

Another thing I'm wondering about - also not for the basic patch - is
accepting termination while writing to the client. It's rather annoying
that we currently don't allow to pg_terminate_backend() when writing to
the client.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Andres Freund
andres@2ndquadrant.com
In reply to: Andres Freund (#11)
1 attachment(s)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

On 2014-07-26 20:20:05 +0200, Andres Freund wrote:

On 2014-07-26 13:58:38 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

That'd require either renegging on SA_RESTART or
using WaitLatchOrSocket() and nonblocking send/recv.

Yeah, I was wondering about using WaitLatchOrSocket for client I/O too.
We already have a hook that lets us do the actual recv even when using
OpenSSL, and in principle that function could do interrupt-service-like
functions if it got kicked off the recv().

I've started playing with this. Looks clearly worthwile.

I think if we do it right we pretty much can get rid of the whole
prepare_for_client_read() machinery and handle everything via
ProcessInterrupts(). EnableCatchupInterrupt() et al don't really fill me
with joy.

I'm not yet entirely sure where the interrupt processing should happen,
but I guess that'll fall out of the work at some point. The important
bit imo is to *not* not do anything but return with BIO_set_retry_*()
from my_sock_read/write(). That then allows us to implement stuff like
the idle transaction timeout with much fewer problems.

I probably won't finish doing this before leaving on holidays, so nobody
should hesitate to look themselves if interested. If not, I plan to pick
this up again. I think it's a prerequisite to getting rid of the FATAL
for recovery conflict interrupts which I really would like to do.

I tried to get something reviewable before leaving, but alas, there's
far too many edgecases to consider. A good part of it has to do with my
decision to always operate the underlying socket in nonblocking mode and
do all the writing using latches. I think that's the right decision,
because it allows reliable interruptions everywhere and is easier than
duplicated codepaths. But I realize that's not guaranteed to be well
liked.

a) Latches aren't ready early enough. We need to read/write to the
socket long before MyProc is initialized. To have sane behaviour during
early startup we need to be canceleable there was well.

Right now I'm just busy looping in that case, but that's obviously not
acceptable. My best idea is to have another latch that we can use during
early startup. Say *MyProcLatch. Initially that points to a process
local latch and once initialized it points to MyProc->procLatch.

b) Latches don't support WL_WRITEABLE without WL_READABLE. I've simply
changed the code in both latch implementations to poll for errors in
both cases and set both readable/writeable if an error occurs. That
seems easy enough. Are there any bigger caveats for changing this?

c) There's a couple more or less undocumented
pgwin32_waitforsinglesocket() calls in be-secure.c. Afaics they're
likely partially already not required and definitely not required after
socket handling is baded on latches.

d) prepare_for_client_read(), client_read_ended() are now really quite a
misnomer. Because interrupts interrupt send/recv appropriately and
because we do *not* want to process them inside my_sock_read|write (so
we don't recursively enter openssl to send the FATAL to the client)
they're not used from within ssl anymore. But on a higher level, just
like:
prepare_for_client_read();
CHECK_FOR_INTERRUPTS();
client_read_ended();
Not sure if that's the right abstraction for just a:
EnableNotifyInterrupt();
EnableCatchupInterrupt();
CHECK_FOR_INTERRUPTS();
DisableNotifyInterrupt();
DisableCatchupInterrupt();
where the Enable* just set a variable so CHECK_FOR_INTERRUPTS can
process the events if occuring.

Doing it that way allows to throw away *large* chunks of code from
sinval.c and async.c because they don't have to care about doing
dangerous stuff from signal handlers anymore.

But the above needs a new name.

Even though it ends up being a bit more work than I anticipated, the
result still seems like a significant improvement over the current
code.

Will get back to this once I'm back (~10 days). Unless somebody else
wants to pick this up. I've at attached my *heavily WIP* patch.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

latch-based-be-secure-WIP.difftext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 92f2077..165c4aa 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -126,6 +126,7 @@
 #include "miscadmin.h"
 #include "storage/ipc.h"
 #include "storage/lmgr.h"
+#include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/procsignal.h"
 #include "storage/sinval.h"
@@ -341,10 +342,9 @@ static List *upperPendingNotifies = NIL;		/* list of upper-xact lists */
  *
  * NB: the "volatile" on these declarations is critical!  If your compiler
  * does not grok "volatile", you'd be best advised to compile this file
- * with all optimization turned off.
- */
-static volatile sig_atomic_t notifyInterruptEnabled = 0;
-static volatile sig_atomic_t notifyInterruptOccurred = 0;
+ * with all optimization turned off. */
+volatile sig_atomic_t notifyInterruptOK = 0;
+volatile sig_atomic_t notifyInterruptPending = 0;
 
 /* True if we've registered an on_shmem_exit cleanup */
 static bool unlistenExitRegistered = false;
@@ -1625,11 +1625,8 @@ AtSubAbort_Notify(void)
 /*
  * HandleNotifyInterrupt
  *
- *		This is called when PROCSIG_NOTIFY_INTERRUPT is received.
- *
- *		If we are idle (notifyInterruptEnabled is set), we can safely invoke
- *		ProcessIncomingNotify directly.  Otherwise, just set a flag
- *		to do it later.
+ *		This is called the next time ProcessInterrupts is run after
+ *		PROCSIG_NOTIFY_INTERRUPT has been received.
  */
 void
 HandleNotifyInterrupt(void)
@@ -1641,69 +1638,13 @@ HandleNotifyInterrupt(void)
 	 * they were ever turned on.
 	 */
 
-	/* Don't joggle the elbow of proc_exit */
-	if (proc_exit_inprogress)
-		return;
-
-	if (notifyInterruptEnabled)
-	{
-		bool		save_ImmediateInterruptOK = ImmediateInterruptOK;
-
-		/*
-		 * We may be called while ImmediateInterruptOK is true; turn it off
-		 * while messing with the NOTIFY state.  This prevents problems if
-		 * SIGINT or similar arrives while we're working.  Just to be real
-		 * sure, bump the interrupt holdoff counter as well.  That way, even
-		 * if something inside ProcessIncomingNotify() transiently sets
-		 * ImmediateInterruptOK (eg while waiting on a lock), we won't get
-		 * interrupted until we're done with the notify interrupt.
-		 */
-		ImmediateInterruptOK = false;
-		HOLD_INTERRUPTS();
+	/* signal that work needs to be done */
+	notifyInterruptPending = 1;
 
-		/*
-		 * I'm not sure whether some flavors of Unix might allow another
-		 * SIGUSR1 occurrence to recursively interrupt this routine. To cope
-		 * with the possibility, we do the same sort of dance that
-		 * EnableNotifyInterrupt must do --- see that routine for comments.
-		 */
-		notifyInterruptEnabled = 0;		/* disable any recursive signal */
-		notifyInterruptOccurred = 1;	/* do at least one iteration */
-		for (;;)
-		{
-			notifyInterruptEnabled = 1;
-			if (!notifyInterruptOccurred)
-				break;
-			notifyInterruptEnabled = 0;
-			if (notifyInterruptOccurred)
-			{
-				/* Here, it is finally safe to do stuff. */
-				if (Trace_notify)
-					elog(DEBUG1, "HandleNotifyInterrupt: perform async notify");
-
-				ProcessIncomingNotify();
-
-				if (Trace_notify)
-					elog(DEBUG1, "HandleNotifyInterrupt: done");
-			}
-		}
-
-		/*
-		 * Restore the holdoff level and ImmediateInterruptOK, and check for
-		 * interrupts if needed.
-		 */
-		RESUME_INTERRUPTS();
-		ImmediateInterruptOK = save_ImmediateInterruptOK;
-		if (save_ImmediateInterruptOK)
-			CHECK_FOR_INTERRUPTS();
-	}
-	else
-	{
-		/*
-		 * In this path it is NOT SAFE to do much of anything, except this:
-		 */
-		notifyInterruptOccurred = 1;
-	}
+	/* make sure the event is processed in due course */
+	/* XXX: arguably we should only do this if notifyInterruptOK */
+	if (MyProc != NULL)
+		SetLatch(&MyProc->procLatch);
 }
 
 /*
@@ -1723,44 +1664,11 @@ EnableNotifyInterrupt(void)
 	if (IsTransactionOrTransactionBlock())
 		return;					/* not really idle */
 
-	/*
-	 * This code is tricky because we are communicating with a signal handler
-	 * that could interrupt us at any point.  If we just checked
-	 * notifyInterruptOccurred and then set notifyInterruptEnabled, we could
-	 * fail to respond promptly to a signal that happens in between those two
-	 * steps.  (A very small time window, perhaps, but Murphy's Law says you
-	 * can hit it...)  Instead, we first set the enable flag, then test the
-	 * occurred flag.  If we see an unserviced interrupt has occurred, we
-	 * re-clear the enable flag before going off to do the service work. (That
-	 * prevents re-entrant invocation of ProcessIncomingNotify() if another
-	 * interrupt occurs.) If an interrupt comes in between the setting and
-	 * clearing of notifyInterruptEnabled, then it will have done the service
-	 * work and left notifyInterruptOccurred zero, so we have to check again
-	 * after clearing enable.  The whole thing has to be in a loop in case
-	 * another interrupt occurs while we're servicing the first. Once we get
-	 * out of the loop, enable is set and we know there is no unserviced
-	 * interrupt.
-	 *
-	 * NB: an overenthusiastic optimizing compiler could easily break this
-	 * code. Hopefully, they all understand what "volatile" means these days.
-	 */
-	for (;;)
+	while (notifyInterruptPending)
 	{
-		notifyInterruptEnabled = 1;
-		if (!notifyInterruptOccurred)
-			break;
-		notifyInterruptEnabled = 0;
-		if (notifyInterruptOccurred)
-		{
-			if (Trace_notify)
-				elog(DEBUG1, "EnableNotifyInterrupt: perform async notify");
-
-			ProcessIncomingNotify();
-
-			if (Trace_notify)
-				elog(DEBUG1, "EnableNotifyInterrupt: done");
-		}
+		ProcessIncomingNotify();
 	}
+	notifyInterruptOK = true;
 }
 
 /*
@@ -1777,13 +1685,23 @@ EnableNotifyInterrupt(void)
 bool
 DisableNotifyInterrupt(void)
 {
-	bool		result = (notifyInterruptEnabled != 0);
+	bool        result = (notifyInterruptOK != 0);
 
-	notifyInterruptEnabled = 0;
+	notifyInterruptOK = true;
 
 	return result;
 }
 
+void
+ProcessNotifyInterrupt(void)
+{
+	if (IsTransactionOrTransactionBlock())
+		return;					/* not really idle */
+
+	ProcessIncomingNotify();
+}
+
+
 /*
  * Read all pending notifications from the queue, and deliver appropriate
  * ones to my frontend.  Stop when we reach queue head or an uncommitted
@@ -2076,9 +1994,10 @@ asyncQueueAdvanceTail(void)
 /*
  * ProcessIncomingNotify
  *
- *		Deal with arriving NOTIFYs from other backends.
- *		This is called either directly from the PROCSIG_NOTIFY_INTERRUPT
- *		signal handler, or the next time control reaches the outer idle loop.
+ *		Deal with arriving NOTIFYs from other backends as soon as it's safe to
+ *		do so. This used to be called from the PROCSIG_NOTIFY_INTERRUPT
+ *		signal handler, but isn't anymore.
+ *
  *		Scan the queue for arriving notifications and report them to my front
  *		end.
  *
@@ -2087,18 +2006,13 @@ asyncQueueAdvanceTail(void)
 static void
 ProcessIncomingNotify(void)
 {
-	bool		catchup_enabled;
-
 	/* We *must* reset the flag */
-	notifyInterruptOccurred = 0;
+	notifyInterruptPending = 0;
 
 	/* Do nothing else if we aren't actively listening */
 	if (listenChannels == NIL)
 		return;
 
-	/* Must prevent catchup interrupt while I am running */
-	catchup_enabled = DisableCatchupInterrupt();
-
 	if (Trace_notify)
 		elog(DEBUG1, "ProcessIncomingNotify");
 
@@ -2123,9 +2037,6 @@ ProcessIncomingNotify(void)
 
 	if (Trace_notify)
 		elog(DEBUG1, "ProcessIncomingNotify: done");
-
-	if (catchup_enabled)
-		EnableCatchupInterrupt();
 }
 
 /*
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 59204cf..93e6c3b 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -74,9 +74,12 @@
 #endif
 #endif   /* USE_SSL */
 
+#include "miscadmin.h"
+
 #include "libpq/libpq.h"
 #include "tcop/tcopprot.h"
 #include "utils/memutils.h"
+#include "storage/proc.h"
 
 
 #ifdef USE_SSL
@@ -258,7 +261,7 @@ secure_read(Port *port, void *ptr, size_t len)
 	{
 		int			err;
 
-rloop:
+rloop_ssl:
 		errno = 0;
 		n = SSL_read(port->ssl, ptr, len);
 		err = SSL_get_error(port->ssl, n);
@@ -266,8 +269,23 @@ rloop:
 		{
 			case SSL_ERROR_NONE:
 				port->count += n;
+
+				prepare_for_client_read();
+				CHECK_FOR_INTERRUPTS();
+				client_read_ended();
+
 				break;
 			case SSL_ERROR_WANT_READ:
+				/*
+				 * We'll, among others, get here if the low level routine
+				 * doing the actual recv() via the socket got interrupted by a
+				 * interrupt. That's so we can handle interrupts once outside
+				 * openssl so we don't jump out from underneath its covers.
+				 */
+				prepare_for_client_read();
+				CHECK_FOR_INTERRUPTS();
+				client_read_ended();
+				/* fallthrough */
 			case SSL_ERROR_WANT_WRITE:
 				if (port->noblock)
 				{
@@ -275,13 +293,7 @@ rloop:
 					n = -1;
 					break;
 				}
-#ifdef WIN32
-				pgwin32_waitforsinglesocket(SSL_get_fd(port->ssl),
-											(err == SSL_ERROR_WANT_READ) ?
-									FD_READ | FD_CLOSE : FD_WRITE | FD_CLOSE,
-											INFINITE);
-#endif
-				goto rloop;
+				goto rloop_ssl;
 			case SSL_ERROR_SYSCALL:
 				/* leave it to caller to ereport the value of errno */
 				if (n != -1)
@@ -312,11 +324,33 @@ rloop:
 	else
 #endif
 	{
-		prepare_for_client_read();
-
+rloop_nossl:
+		/*
+		 * Try to read from the socket without blocking. If it suceeds we're
+		 * done, otherwise we'll wait for the socket using the latch mechanism.
+		 */
 		n = recv(port->sock, ptr, len, 0);
 
+		/* process interrupts that potentially happened while receiving */
+		prepare_for_client_read();
+		CHECK_FOR_INTERRUPTS();
 		client_read_ended();
+
+		if (!port->noblock && n <= 0 && errno == EWOULDBLOCK)
+		{
+			int w;
+
+			w = WaitLatchOrSocket(&MyProc->procLatch,
+								  WL_LATCH_SET | WL_SOCKET_READABLE,
+								  port->sock, 0);
+
+			if (w & (WL_SOCKET_READABLE | WL_LATCH_SET))
+			{
+				ResetLatch(&MyProc->procLatch);
+				goto rloop_nossl;
+			}
+		}
+
 	}
 
 	return n;
@@ -386,7 +420,7 @@ secure_write(Port *port, void *ptr, size_t len)
 			}
 		}
 
-wloop:
+wloop_ssl:
 		errno = 0;
 		n = SSL_write(port->ssl, ptr, len);
 		err = SSL_get_error(port->ssl, n);
@@ -397,13 +431,15 @@ wloop:
 				break;
 			case SSL_ERROR_WANT_READ:
 			case SSL_ERROR_WANT_WRITE:
-#ifdef WIN32
-				pgwin32_waitforsinglesocket(SSL_get_fd(port->ssl),
-											(err == SSL_ERROR_WANT_READ) ?
-									FD_READ | FD_CLOSE : FD_WRITE | FD_CLOSE,
-											INFINITE);
-#endif
-				goto wloop;
+				if (MyProc != NULL)
+				{
+					/* XXX: we probably want to process latch sets at some point here */
+					WaitLatchOrSocket(&MyProc->procLatch,
+									  err == SSL_ERROR_WANT_READ ?
+									  WL_SOCKET_READABLE :  WL_SOCKET_WRITEABLE,
+									  port->sock, 0);
+				}
+				goto wloop_ssl;
 			case SSL_ERROR_SYSCALL:
 				/* leave it to caller to ereport the value of errno */
 				if (n != -1)
@@ -455,8 +491,28 @@ wloop:
 	}
 	else
 #endif
+	{
+wloop_nossl:
 		n = send(port->sock, ptr, len, 0);
 
+		if (MyProc == NULL)
+			goto wloop_nossl;
+
+		if (n < 0 && errno == EWOULDBLOCK && MyProc != NULL)
+		{
+			int w;
+
+			/* XXX: we probably want to process latch sets at some point here */
+			w = WaitLatchOrSocket(&MyProc->procLatch,
+								  WL_SOCKET_WRITEABLE,
+								  port->sock, 0);
+			if (w & (WL_SOCKET_WRITEABLE | WL_LATCH_SET))
+			{
+				ResetLatch(&MyProc->procLatch);
+				goto wloop_nossl;
+			}
+		}
+	}
 	return n;
 }
 
@@ -487,13 +543,40 @@ my_sock_read(BIO *h, char *buf, int size)
 {
 	int			res = 0;
 
-	prepare_for_client_read();
-
 	if (buf != NULL)
 	{
-		res = recv(h->num, buf, size, 0);
 		BIO_clear_retry_flags(h);
-		if (res <= 0)
+retry:
+		res = recv(h->num, buf, size, 0);
+
+		if (MyProc == NULL && res < 0 && errno == EWOULDBLOCK)
+			goto retry;
+		else if (!MyProcPort->noblock && res < 0 && errno == EWOULDBLOCK)
+		{
+			int w;
+
+			w = WaitLatchOrSocket(&MyProc->procLatch,
+								  WL_LATCH_SET | WL_SOCKET_READABLE,
+								  h->num, 0);
+
+			/*
+			 * If the latch has been set while we were waiting for the socket
+			 * to become readable force a return to outside openssl's
+			 * control. There we can correctly do error handling and such
+			 * without disturbing openssl's internal state.
+			 */
+			if (w & WL_LATCH_SET)
+			{
+				ResetLatch(&MyProc->procLatch);
+				BIO_set_retry_read(h);
+				errno = EINTR;
+				return -1;
+			}
+
+			if (w & WL_SOCKET_READABLE)
+				goto retry;
+		}
+		else if (res <= 0)
 		{
 			/* If we were interrupted, tell caller to retry */
 			if (errno == EINTR)
@@ -503,8 +586,6 @@ my_sock_read(BIO *h, char *buf, int size)
 		}
 	}
 
-	client_read_ended();
-
 	return res;
 }
 
@@ -517,7 +598,7 @@ my_sock_write(BIO *h, const char *buf, int size)
 	BIO_clear_retry_flags(h);
 	if (res <= 0)
 	{
-		if (errno == EINTR)
+		if (errno == EINTR || errno == EWOULDBLOCK)
 		{
 			BIO_set_retry_write(h);
 		}
@@ -1012,12 +1093,12 @@ aloop:
 		{
 			case SSL_ERROR_WANT_READ:
 			case SSL_ERROR_WANT_WRITE:
-#ifdef WIN32
-				pgwin32_waitforsinglesocket(SSL_get_fd(port->ssl),
-											(err == SSL_ERROR_WANT_READ) ?
-						FD_READ | FD_CLOSE | FD_ACCEPT : FD_WRITE | FD_CLOSE,
-											INFINITE);
-#endif
+				/* FIXME: interrupt handling? */
+				if (MyProc != NULL)
+					WaitLatchOrSocket(&MyProc->procLatch,
+									  err == SSL_ERROR_WANT_READ ?
+									  WL_SOCKET_READABLE :  WL_SOCKET_WRITEABLE,
+									  port->sock, 0);
 				goto aloop;
 			case SSL_ERROR_SYSCALL:
 				if (r < 0)
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 605d891..cd4a941 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -158,6 +158,22 @@ pq_init(void)
 	PqCommBusy = false;
 	DoingCopyOut = false;
 	on_proc_exit(pq_close, 0);
+
+	/*
+	 * We now always operate the underlying socket in nonblocking mode and use
+	 * WaitLatchOrSocket to implement blocking semantics if needed.
+	 *
+	 * Use COMMERROR on failure, because ERROR would try to send the error to
+	 * the client, which might require changing the mode again, leading to
+	 * infinite recursion.
+	 */
+#ifdef WIN32
+	pgwin32_noblock = true;
+#else
+	if (!pg_set_noblock(MyProcPort->sock))
+		ereport(COMMERROR,
+				(errmsg("could not set socket to nonblocking mode: %m")));
+#endif
 }
 
 /* --------------------------------
@@ -792,31 +808,6 @@ TouchSocketFiles(void)
 static void
 pq_set_nonblocking(bool nonblocking)
 {
-	if (MyProcPort->noblock == nonblocking)
-		return;
-
-#ifdef WIN32
-	pgwin32_noblock = nonblocking ? 1 : 0;
-#else
-
-	/*
-	 * Use COMMERROR on failure, because ERROR would try to send the error to
-	 * the client, which might require changing the mode again, leading to
-	 * infinite recursion.
-	 */
-	if (nonblocking)
-	{
-		if (!pg_set_noblock(MyProcPort->sock))
-			ereport(COMMERROR,
-					(errmsg("could not set socket to nonblocking mode: %m")));
-	}
-	else
-	{
-		if (!pg_set_block(MyProcPort->sock))
-			ereport(COMMERROR,
-					(errmsg("could not set socket to blocking mode: %m")));
-	}
-#endif
 	MyProcPort->noblock = nonblocking;
 }
 
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index d0e928f..7d9bcca 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -199,10 +199,6 @@ WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 /*
  * Like WaitLatch, but with an extra socket argument for WL_SOCKET_*
  * conditions.
- *
- * When waiting on a socket, WL_SOCKET_READABLE *must* be included in
- * 'wakeEvents'; WL_SOCKET_WRITEABLE is optional.  The reason for this is
- * that EOF and error conditions are reported only via WL_SOCKET_READABLE.
  */
 int
 WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
@@ -230,8 +226,6 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 		wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);
 
 	Assert(wakeEvents != 0);	/* must have at least one wake event */
-	/* Cannot specify WL_SOCKET_WRITEABLE without WL_SOCKET_READABLE */
-	Assert((wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)) != WL_SOCKET_WRITEABLE);
 
 	if ((wakeEvents & WL_LATCH_SET) && latch->owner_pid != MyProcPid)
 		elog(ERROR, "cannot wait on a latch owned by another process");
@@ -352,7 +346,7 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 				result |= WL_SOCKET_READABLE;
 			}
 			if ((wakeEvents & WL_SOCKET_WRITEABLE) &&
-				(pfds[0].revents & POLLOUT))
+				(pfds[0].revents & (POLLOUT | POLLHUP | POLLERR | POLLNVAL)))
 			{
 				result |= WL_SOCKET_WRITEABLE;
 			}
diff --git a/src/backend/port/win32_latch.c b/src/backend/port/win32_latch.c
index 6c50dbb..95e4a16 100644
--- a/src/backend/port/win32_latch.c
+++ b/src/backend/port/win32_latch.c
@@ -117,8 +117,6 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 		wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);
 
 	Assert(wakeEvents != 0);	/* must have at least one wake event */
-	/* Cannot specify WL_SOCKET_WRITEABLE without WL_SOCKET_READABLE */
-	Assert((wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)) != WL_SOCKET_WRITEABLE);
 
 	if ((wakeEvents & WL_LATCH_SET) && latch->owner_pid != MyProcPid)
 		elog(ERROR, "cannot wait on a latch owned by another process");
@@ -152,10 +150,10 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 	if (wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE))
 	{
 		/* Need an event object to represent events on the socket */
-		int			flags = 0;
+		int			flags = FD_CLOSE;
 
 		if (wakeEvents & WL_SOCKET_READABLE)
-			flags |= (FD_READ | FD_CLOSE);
+			flags |= FD_READ;
 		if (wakeEvents & WL_SOCKET_WRITEABLE)
 			flags |= FD_WRITE;
 
@@ -232,7 +230,7 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 				elog(ERROR, "failed to enumerate network events: error code %u",
 					 WSAGetLastError());
 			if ((wakeEvents & WL_SOCKET_READABLE) &&
-				(resEvents.lNetworkEvents & (FD_READ | FD_CLOSE)))
+				(resEvents.lNetworkEvents & FD_READ))
 			{
 				result |= WL_SOCKET_READABLE;
 			}
@@ -241,6 +239,11 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 			{
 				result |= WL_SOCKET_WRITEABLE;
 			}
+			if (resEvents.lNetworkEvents & FD_CLOSE)
+			{
+				result |= WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE;
+			}
+
 		}
 		else if ((wakeEvents & WL_POSTMASTER_DEATH) &&
 				 rc == WAIT_OBJECT_0 + pmdeath_eventno)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index b53cfdb..1549cf0 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -614,6 +614,10 @@ AutoVacLauncherMain(int argc, char *argv[])
 
 		ResetLatch(&MyProc->procLatch);
 
+		/* process catchup interrupts and the like */
+		/* FIXME: Will be ignored by ProcessInterrupts right now */
+		CHECK_FOR_INTERRUPTS();
+
 		DisableCatchupInterrupt();
 
 		/*
diff --git a/src/backend/storage/ipc/sinval.c b/src/backend/storage/ipc/sinval.c
index d7d0406..a4c9c51 100644
--- a/src/backend/storage/ipc/sinval.c
+++ b/src/backend/storage/ipc/sinval.c
@@ -18,6 +18,7 @@
 #include "commands/async.h"
 #include "miscadmin.h"
 #include "storage/ipc.h"
+#include "storage/proc.h"
 #include "storage/sinvaladt.h"
 #include "utils/inval.h"
 
@@ -41,8 +42,8 @@ uint64		SharedInvalidMessageCounter;
  * does not grok "volatile", you'd be best advised to compile this file
  * with all optimization turned off.
  */
-static volatile int catchupInterruptEnabled = 0;
-static volatile int catchupInterruptOccurred = 0;
+volatile sig_atomic_t catchupInterruptOK = 0;
+volatile sig_atomic_t catchupInterruptPending = 0;
 
 static void ProcessCatchupEvent(void);
 
@@ -141,9 +142,9 @@ ReceiveSharedInvalidMessages(
 	 * catchup signal this way avoids creating spikes in system load for what
 	 * should be just a background maintenance activity.
 	 */
-	if (catchupInterruptOccurred)
+	if (catchupInterruptPending)
 	{
-		catchupInterruptOccurred = 0;
+		catchupInterruptPending = 0;
 		elog(DEBUG4, "sinval catchup complete, cleaning queue");
 		SICleanupQueue(false, 0);
 	}
@@ -155,12 +156,9 @@ ReceiveSharedInvalidMessages(
  *
  * This is called when PROCSIG_CATCHUP_INTERRUPT is received.
  *
- * If we are idle (catchupInterruptEnabled is set), we can safely
- * invoke ProcessCatchupEvent directly.  Otherwise, just set a flag
- * to do it later.  (Note that it's quite possible for normal processing
- * of the current transaction to cause ReceiveSharedInvalidMessages()
- * to be run later on; in that case the flag will get cleared again,
- * since there's no longer any reason to do anything.)
+ * We used to directly call ProcessCatchupEvent directly when idle. These days
+ * we just set a flag to do it later and notify the process of that fact by
+ * setting the processes latch.
  */
 void
 HandleCatchupInterrupt(void)
@@ -170,63 +168,19 @@ HandleCatchupInterrupt(void)
 	 * you do here.
 	 */
 
-	/* Don't joggle the elbow of proc_exit */
-	if (proc_exit_inprogress)
-		return;
+	catchupInterruptPending = 1;
 
-	if (catchupInterruptEnabled)
-	{
-		bool		save_ImmediateInterruptOK = ImmediateInterruptOK;
-
-		/*
-		 * We may be called while ImmediateInterruptOK is true; turn it off
-		 * while messing with the catchup state.  This prevents problems if
-		 * SIGINT or similar arrives while we're working.  Just to be real
-		 * sure, bump the interrupt holdoff counter as well.  That way, even
-		 * if something inside ProcessCatchupEvent() transiently sets
-		 * ImmediateInterruptOK (eg while waiting on a lock), we won't get
-		 * interrupted until we're done with the catchup interrupt.
-		 */
-		ImmediateInterruptOK = false;
-		HOLD_INTERRUPTS();
-
-		/*
-		 * I'm not sure whether some flavors of Unix might allow another
-		 * SIGUSR1 occurrence to recursively interrupt this routine. To cope
-		 * with the possibility, we do the same sort of dance that
-		 * EnableCatchupInterrupt must do --- see that routine for comments.
-		 */
-		catchupInterruptEnabled = 0;	/* disable any recursive signal */
-		catchupInterruptOccurred = 1;	/* do at least one iteration */
-		for (;;)
-		{
-			catchupInterruptEnabled = 1;
-			if (!catchupInterruptOccurred)
-				break;
-			catchupInterruptEnabled = 0;
-			if (catchupInterruptOccurred)
-			{
-				/* Here, it is finally safe to do stuff. */
-				ProcessCatchupEvent();
-			}
-		}
+	/* make sure the event is processed in due course */
+	/* XXX: arguably we should only do this if notifyInterruptEnabled */
+	if (MyProc != NULL)
+		SetLatch(&MyProc->procLatch);
+}
 
-		/*
-		 * Restore the holdoff level and ImmediateInterruptOK, and check for
-		 * interrupts if needed.
-		 */
-		RESUME_INTERRUPTS();
-		ImmediateInterruptOK = save_ImmediateInterruptOK;
-		if (save_ImmediateInterruptOK)
-			CHECK_FOR_INTERRUPTS();
-	}
-	else
-	{
-		/*
-		 * In this path it is NOT SAFE to do much of anything, except this:
-		 */
-		catchupInterruptOccurred = 1;
-	}
+void
+ProcessCatchupInterrupt(void)
+{
+	/* Here, it is finally safe to do stuff. */
+	ProcessCatchupEvent();
 }
 
 /*
@@ -242,35 +196,13 @@ HandleCatchupInterrupt(void)
 void
 EnableCatchupInterrupt(void)
 {
-	/*
-	 * This code is tricky because we are communicating with a signal handler
-	 * that could interrupt us at any point.  If we just checked
-	 * catchupInterruptOccurred and then set catchupInterruptEnabled, we could
-	 * fail to respond promptly to a signal that happens in between those two
-	 * steps.  (A very small time window, perhaps, but Murphy's Law says you
-	 * can hit it...)  Instead, we first set the enable flag, then test the
-	 * occurred flag.  If we see an unserviced interrupt has occurred, we
-	 * re-clear the enable flag before going off to do the service work. (That
-	 * prevents re-entrant invocation of ProcessCatchupEvent() if another
-	 * interrupt occurs.) If an interrupt comes in between the setting and
-	 * clearing of catchupInterruptEnabled, then it will have done the service
-	 * work and left catchupInterruptOccurred zero, so we have to check again
-	 * after clearing enable.  The whole thing has to be in a loop in case
-	 * another interrupt occurs while we're servicing the first. Once we get
-	 * out of the loop, enable is set and we know there is no unserviced
-	 * interrupt.
-	 *
-	 * NB: an overenthusiastic optimizing compiler could easily break this
-	 * code. Hopefully, they all understand what "volatile" means these days.
-	 */
+	catchupInterruptOK = true;
+
 	for (;;)
 	{
-		catchupInterruptEnabled = 1;
-		if (!catchupInterruptOccurred)
+		if (!catchupInterruptPending)
 			break;
-		catchupInterruptEnabled = 0;
-		if (catchupInterruptOccurred)
-			ProcessCatchupEvent();
+		ProcessCatchupEvent();
 	}
 }
 
@@ -288,9 +220,9 @@ EnableCatchupInterrupt(void)
 bool
 DisableCatchupInterrupt(void)
 {
-	bool		result = (catchupInterruptEnabled != 0);
+	bool        result = (catchupInterruptOK != 0);
 
-	catchupInterruptEnabled = 0;
+	catchupInterruptOK = false;
 
 	return result;
 }
@@ -299,24 +231,15 @@ DisableCatchupInterrupt(void)
  * ProcessCatchupEvent
  *
  * Respond to a catchup event (PROCSIG_CATCHUP_INTERRUPT) from another
- * backend.
- *
- * This is called either directly from the PROCSIG_CATCHUP_INTERRUPT
- * signal handler, or the next time control reaches the outer idle loop
- * (assuming there's still anything to do by then).
+ * backend once it's safe to do so.
  */
 static void
 ProcessCatchupEvent(void)
 {
-	bool		notify_enabled;
-
-	/* Must prevent notify interrupt while I am running */
-	notify_enabled = DisableNotifyInterrupt();
-
 	/*
 	 * What we need to do here is cause ReceiveSharedInvalidMessages() to run,
 	 * which will do the necessary work and also reset the
-	 * catchupInterruptOccurred flag.  If we are inside a transaction we can
+	 * catchupInterruptPending flag.  If we are inside a transaction we can
 	 * just call AcceptInvalidationMessages() to do this.  If we aren't, we
 	 * start and immediately end a transaction; the call to
 	 * AcceptInvalidationMessages() happens down inside transaction start.
@@ -337,7 +260,4 @@ ProcessCatchupEvent(void)
 		StartTransactionCommand();
 		CommitTransactionCommand();
 	}
-
-	if (notify_enabled)
-		EnableNotifyInterrupt();
 }
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7b5480f..d9d5ddf 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -490,13 +490,11 @@ ReadCommand(StringInfo inBuf)
  * prepare_for_client_read -- set up to possibly block on client input
  *
  * This must be called immediately before any low-level read from the
- * client connection.  It is necessary to do it at a sufficiently low level
- * that there won't be any other operations except the read kernel call
- * itself between this call and the subsequent client_read_ended() call.
- * In particular there mustn't be use of malloc() or other potentially
- * non-reentrant libc functions.  This restriction makes it safe for us
- * to allow interrupt service routines to execute nontrivial code while
- * we are waiting for input.
+ * client connection.
+ *
+ * NB: We used to allow direct execution of interrupt handlers
+ * (i.e. ImmediateInterruptOK = true) once within, but that proved ot be
+ * fragile and complicated.
  */
 void
 prepare_for_client_read(void)
@@ -507,9 +505,6 @@ prepare_for_client_read(void)
 		EnableNotifyInterrupt();
 		EnableCatchupInterrupt();
 
-		/* Allow cancel/die interrupts to be processed while waiting */
-		ImmediateInterruptOK = true;
-
 		/* And don't forget to detect one that already arrived */
 		CHECK_FOR_INTERRUPTS();
 	}
@@ -2588,8 +2583,8 @@ die(SIGNAL_ARGS)
 		ProcDiePending = true;
 
 		/*
-		 * If it's safe to interrupt, and we're waiting for input or a lock,
-		 * service the interrupt immediately
+		 * If it's safe to interrupt, and we're waiting for a lock, service
+		 * the interrupt immediately
 		 */
 		if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
 			CritSectionCount == 0)
@@ -2630,8 +2625,8 @@ StatementCancelHandler(SIGNAL_ARGS)
 		QueryCancelPending = true;
 
 		/*
-		 * If it's safe to interrupt, and we're waiting for input or a lock,
-		 * service the interrupt immediately
+		 * If it's safe to interrupt, and we're waiting for a lock, service
+		 * the interrupt immediately
 		 */
 		if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
 			CritSectionCount == 0)
@@ -2789,8 +2784,8 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
 			RecoveryConflictRetryable = false;
 
 		/*
-		 * If it's safe to interrupt, and we're waiting for input or a lock,
-		 * service the interrupt immediately
+		 * If it's safe to interrupt, and we're waiting for a lock, service
+		 * the interrupt immediately
 		 */
 		if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
 			CritSectionCount == 0)
@@ -2966,6 +2961,22 @@ ProcessInterrupts(void)
 					 errmsg("canceling statement due to user request")));
 		}
 	}
+
+	/*
+	 * Process catchup/notify events if we're up for doing so right now. If
+	 * not they'll get run the next time the respective interrupt processing
+	 * is enabled. Thus it's okay to consume the interrupt despite not
+	 * processing these events.
+	 */
+	if (catchupInterruptPending && catchupInterruptOK)
+	{
+		ProcessCatchupInterrupt();
+	}
+	if (notifyInterruptPending && notifyInterruptOK)
+	{
+		ProcessNotifyInterrupt();
+	}
+
 	/* If we get here, do nothing (probably, QueryCancelPending was reset) */
 }
 
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 0650e65..cd77fda 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -13,6 +13,8 @@
 #ifndef ASYNC_H
 #define ASYNC_H
 
+#include <signal.h>
+
 #include "fmgr.h"
 
 /*
@@ -21,6 +23,8 @@
 #define NUM_ASYNC_BUFFERS	8
 
 extern bool Trace_notify;
+extern volatile sig_atomic_t notifyInterruptOK;
+extern volatile sig_atomic_t notifyInterruptPending;
 
 extern Size AsyncShmemSize(void);
 extern void AsyncShmemInit(void);
@@ -55,5 +59,6 @@ extern void HandleNotifyInterrupt(void);
  */
 extern void EnableNotifyInterrupt(void);
 extern bool DisableNotifyInterrupt(void);
+extern void ProcessNotifyInterrupt(void);
 
 #endif   /* ASYNC_H */
diff --git a/src/include/storage/sinval.h b/src/include/storage/sinval.h
index 812ea95..b36be97 100644
--- a/src/include/storage/sinval.h
+++ b/src/include/storage/sinval.h
@@ -14,8 +14,9 @@
 #ifndef SINVAL_H
 #define SINVAL_H
 
-#include "storage/relfilenode.h"
+#include <signal.h>
 
+#include "storage/relfilenode.h"
 
 /*
  * We support several types of shared-invalidation messages:
@@ -123,6 +124,8 @@ typedef union
 /* Counter of messages processed; don't worry about overflow. */
 extern uint64 SharedInvalidMessageCounter;
 
+extern volatile sig_atomic_t catchupInterruptOK;
+extern volatile sig_atomic_t catchupInterruptPending;
 
 extern void SendSharedInvalidMessages(const SharedInvalidationMessage *msgs,
 						  int n);
@@ -140,6 +143,7 @@ extern void HandleCatchupInterrupt(void);
  */
 extern void EnableCatchupInterrupt(void);
 extern bool DisableCatchupInterrupt(void);
+extern void ProcessCatchupInterrupt(void);
 
 extern int xactGetCommittedInvalidationMessages(SharedInvalidationMessage **msgs,
 									 bool *RelcacheInitFileInval);
#18MauMau
maumau307@gmail.com
In reply to: MauMau (#13)
3 attachment(s)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

From: "MauMau" <maumau307@gmail.com>

I'll try the fix tomorrow if possible. What kind of problems do you hink
of for back-patching?

I could reproduce the problem with 9.2.8, but have not yet with 9.5dev.
I'll try with 9.2.9, and create the fix.

I could also reproduce the problem with 9.2.9, but I couldn't with 9.5devel.

However, I could confirm that the attached patch solves the problem. The
patch is based on 9.2.9. To adjust this patch for 9.3 and later, set the
background worker's SIGUSR2 handler to procsignal_sigusr2_handler like
normal backends. Could you review and commit this? We wish the fix for 9.1
and above.

To reproduce the problem, you can do as follows with the attached files.
Originally, test.sql had many columns the customer is actually using, but we
cannot show the real DDL as it's the customer's asset.

$ createdb test
$ pgbench -c 20 -j 20 -T 600 -s -n -f test.pgbench

Synchronous streaming replication and hot standby need to be used.

When I ran this on a 4-core RHEL5 box, the problem arose within a few
minutes. pgbench continues to run emitting a lot of messages like below.
Over time, the number of normal backends will increases, and dozens of which
remain stuck with the stack trace I showed before.

CREATE TABLE
psql:test.sql:1: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit
index "test_table_pkey" for table "test_table"

I have no idea why 9.5devel didn't show the problem. One difference I
noticed is that pgbench didn't output the message of implicit index creation
for the primary key.

Regards
MauMau

Attachments:

sinval_catchup_hang.patchapplication/octet-stream; name=sinval_catchup_hang.patchDownload
diff -rpcd a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
*** a/src/backend/postmaster/autovacuum.c	2014-07-22 04:12:31.000000000 +0900
--- b/src/backend/postmaster/autovacuum.c	2014-07-29 17:56:47.000000000 +0900
*************** avl_sigusr2_handler(SIGNAL_ARGS)
*** 1342,1347 ****
--- 1342,1349 ----
  {
  	int			save_errno = errno;
  
+ 	procsignal_sigusr2_handler(SIGUSR2);
+ 
  	got_SIGUSR2 = true;
  	if (MyProc)
  		SetLatch(&MyProc->procLatch);
*************** AutoVacWorkerMain(int argc, char *argv[]
*** 1496,1502 ****
  
  	pqsignal(SIGPIPE, SIG_IGN);
  	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
! 	pqsignal(SIGUSR2, SIG_IGN);
  	pqsignal(SIGFPE, FloatExceptionHandler);
  	pqsignal(SIGCHLD, SIG_DFL);
  
--- 1498,1504 ----
  
  	pqsignal(SIGPIPE, SIG_IGN);
  	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
! 	pqsignal(SIGUSR2, procsignal_sigusr2_handler);
  	pqsignal(SIGFPE, FloatExceptionHandler);
  	pqsignal(SIGCHLD, SIG_DFL);
  
diff -rpcd a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
*** a/src/backend/storage/ipc/procsignal.c	2014-07-22 04:12:31.000000000 +0900
--- b/src/backend/storage/ipc/procsignal.c	2014-07-29 17:56:47.000000000 +0900
*************** procsignal_sigusr1_handler(SIGNAL_ARGS)
*** 258,263 ****
--- 258,276 ----
  {
  	int			save_errno = errno;
  
+ 	latch_sigusr1_handler();
+ 
+ 	errno = save_errno;
+ }
+ 
+ /*
+  * procsignal_sigusr2_handler - handle SIGUSR2 signal.
+  */
+ void
+ procsignal_sigusr2_handler(SIGNAL_ARGS)
+ {
+ 	int			save_errno = errno;
+ 
  	if (CheckProcSignal(PROCSIG_CATCHUP_INTERRUPT))
  		HandleCatchupInterrupt();
  
*************** procsignal_sigusr1_handler(SIGNAL_ARGS)
*** 282,288 ****
  	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
  		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
  
- 	latch_sigusr1_handler();
- 
  	errno = save_errno;
  }
--- 295,299 ----
diff -rpcd a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
*** a/src/backend/tcop/postgres.c	2014-07-22 04:12:31.000000000 +0900
--- b/src/backend/tcop/postgres.c	2014-07-29 17:56:47.000000000 +0900
*************** PostgresMain(int argc, char *argv[],
*** 3624,3630 ****
  		 */
  		pqsignal(SIGPIPE, SIG_IGN);
  		pqsignal(SIGUSR1, procsignal_sigusr1_handler);
! 		pqsignal(SIGUSR2, SIG_IGN);
  		pqsignal(SIGFPE, FloatExceptionHandler);
  
  		/*
--- 3624,3630 ----
  		 */
  		pqsignal(SIGPIPE, SIG_IGN);
  		pqsignal(SIGUSR1, procsignal_sigusr1_handler);
! 		pqsignal(SIGUSR2, procsignal_sigusr2_handler);
  		pqsignal(SIGFPE, FloatExceptionHandler);
  
  		/*
diff -rpcd a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
*** a/src/include/storage/procsignal.h	2014-07-22 04:12:31.000000000 +0900
--- b/src/include/storage/procsignal.h	2014-07-29 17:56:47.000000000 +0900
*************** extern int SendProcSignal(pid_t pid, Pro
*** 54,58 ****
--- 54,59 ----
  			   BackendId backendId);
  
  extern void procsignal_sigusr1_handler(SIGNAL_ARGS);
+ extern void procsignal_sigusr2_handler(SIGNAL_ARGS);
  
  #endif   /* PROCSIGNAL_H */
test.pgbenchapplication/octet-stream; name=test.pgbenchDownload
test.sqlapplication/octet-stream; name=test.sqlDownload
#19MauMau
maumau307@gmail.com
In reply to: MauMau (#18)
1 attachment(s)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

Please find attached the revised patch. I failed to change SIGUSR1 to
SIGUSR2 when sending sinval catchup signal.

In addition, I changed wal sender to not receive SIGUSR1/2 from
SendProcSignal(). Without this, wal sender will excessively wake up or
terminate by the signals. This is an existing problem.

Regards
MauMau

Attachments:

sinval_catchup_hang_v2.patchapplication/octet-stream; name=sinval_catchup_hang_v2.patchDownload
diff -rpcd a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
*** a/src/backend/postmaster/autovacuum.c	2014-07-22 04:12:31.000000000 +0900
--- b/src/backend/postmaster/autovacuum.c	2014-07-31 12:30:09.000000000 +0900
*************** avl_sigusr2_handler(SIGNAL_ARGS)
*** 1342,1347 ****
--- 1342,1349 ----
  {
  	int			save_errno = errno;
  
+ 	procsignal_sigusr2_handler(SIGUSR2);
+ 
  	got_SIGUSR2 = true;
  	if (MyProc)
  		SetLatch(&MyProc->procLatch);
*************** AutoVacWorkerMain(int argc, char *argv[]
*** 1496,1502 ****
  
  	pqsignal(SIGPIPE, SIG_IGN);
  	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
! 	pqsignal(SIGUSR2, SIG_IGN);
  	pqsignal(SIGFPE, FloatExceptionHandler);
  	pqsignal(SIGCHLD, SIG_DFL);
  
--- 1498,1504 ----
  
  	pqsignal(SIGPIPE, SIG_IGN);
  	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
! 	pqsignal(SIGUSR2, procsignal_sigusr2_handler);
  	pqsignal(SIGFPE, FloatExceptionHandler);
  	pqsignal(SIGCHLD, SIG_DFL);
  
diff -rpcd a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
*** a/src/backend/storage/ipc/procsignal.c	2014-07-22 04:12:31.000000000 +0900
--- b/src/backend/storage/ipc/procsignal.c	2014-07-31 12:30:09.000000000 +0900
***************
*** 26,32 ****
  
  
  /*
!  * The SIGUSR1 signal is multiplexed to support signalling multiple event
   * types. The specific reason is communicated via flags in shared memory.
   * We keep a boolean flag for each possible "reason", so that different
   * reasons can be signaled to a process concurrently.  (However, if the same
--- 26,32 ----
  
  
  /*
!  * The SIGUSR2 signal is multiplexed to support signalling multiple event
   * types. The specific reason is communicated via flags in shared memory.
   * We keep a boolean flag for each possible "reason", so that different
   * reasons can be signaled to a process concurrently.  (However, if the same
*************** CleanupProcSignalState(int status, Datum
*** 140,146 ****
  	Assert(slot == MyProcSignalSlot);
  
  	/*
! 	 * Clear MyProcSignalSlot, so that a SIGUSR1 received after this point
  	 * won't try to access it after it's no longer ours (and perhaps even
  	 * after we've unmapped the shared memory segment).
  	 */
--- 140,146 ----
  	Assert(slot == MyProcSignalSlot);
  
  	/*
! 	 * Clear MyProcSignalSlot, so that a SIGUSR2 received after this point
  	 * won't try to access it after it's no longer ours (and perhaps even
  	 * after we've unmapped the shared memory segment).
  	 */
*************** SendProcSignal(pid_t pid, ProcSignalReas
*** 194,200 ****
  			/* Atomically set the proper flag */
  			slot->pss_signalFlags[reason] = true;
  			/* Send signal */
! 			return kill(pid, SIGUSR1);
  		}
  	}
  	else
--- 194,200 ----
  			/* Atomically set the proper flag */
  			slot->pss_signalFlags[reason] = true;
  			/* Send signal */
! 			return kill(pid, SIGUSR2);
  		}
  	}
  	else
*************** SendProcSignal(pid_t pid, ProcSignalReas
*** 218,224 ****
  				/* Atomically set the proper flag */
  				slot->pss_signalFlags[reason] = true;
  				/* Send signal */
! 				return kill(pid, SIGUSR1);
  			}
  		}
  	}
--- 218,224 ----
  				/* Atomically set the proper flag */
  				slot->pss_signalFlags[reason] = true;
  				/* Send signal */
! 				return kill(pid, SIGUSR2);
  			}
  		}
  	}
*************** SendProcSignal(pid_t pid, ProcSignalReas
*** 230,236 ****
  /*
   * CheckProcSignal - check to see if a particular reason has been
   * signaled, and clear the signal flag.  Should be called after receiving
!  * SIGUSR1.
   */
  static bool
  CheckProcSignal(ProcSignalReason reason)
--- 230,236 ----
  /*
   * CheckProcSignal - check to see if a particular reason has been
   * signaled, and clear the signal flag.  Should be called after receiving
!  * SIGUSR2.
   */
  static bool
  CheckProcSignal(ProcSignalReason reason)
*************** procsignal_sigusr1_handler(SIGNAL_ARGS)
*** 258,263 ****
--- 258,276 ----
  {
  	int			save_errno = errno;
  
+ 	latch_sigusr1_handler();
+ 
+ 	errno = save_errno;
+ }
+ 
+ /*
+  * procsignal_sigusr2_handler - handle SIGUSR2 signal.
+  */
+ void
+ procsignal_sigusr2_handler(SIGNAL_ARGS)
+ {
+ 	int			save_errno = errno;
+ 
  	if (CheckProcSignal(PROCSIG_CATCHUP_INTERRUPT))
  		HandleCatchupInterrupt();
  
*************** procsignal_sigusr1_handler(SIGNAL_ARGS)
*** 282,288 ****
  	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
  		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
  
- 	latch_sigusr1_handler();
- 
  	errno = save_errno;
  }
--- 295,299 ----
diff -rpcd a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
*** a/src/backend/tcop/postgres.c	2014-07-22 04:12:31.000000000 +0900
--- b/src/backend/tcop/postgres.c	2014-07-31 12:30:09.000000000 +0900
*************** PostgresMain(int argc, char *argv[],
*** 3624,3630 ****
  		 */
  		pqsignal(SIGPIPE, SIG_IGN);
  		pqsignal(SIGUSR1, procsignal_sigusr1_handler);
! 		pqsignal(SIGUSR2, SIG_IGN);
  		pqsignal(SIGFPE, FloatExceptionHandler);
  
  		/*
--- 3624,3630 ----
  		 */
  		pqsignal(SIGPIPE, SIG_IGN);
  		pqsignal(SIGUSR1, procsignal_sigusr1_handler);
! 		pqsignal(SIGUSR2, procsignal_sigusr2_handler);
  		pqsignal(SIGFPE, FloatExceptionHandler);
  
  		/*
diff -rpcd a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
*** a/src/backend/utils/init/postinit.c	2014-07-22 04:12:31.000000000 +0900
--- b/src/backend/utils/init/postinit.c	2014-07-31 12:30:09.000000000 +0900
*************** InitPostgres(const char *in_dbname, Oid 
*** 486,492 ****
  	 */
  	MyBackendId = InvalidBackendId;
  
! 	SharedInvalBackendInit(false);
  
  	if (MyBackendId > MaxBackends || MyBackendId <= 0)
  		elog(FATAL, "bad backend ID: %d", MyBackendId);
--- 486,496 ----
  	 */
  	MyBackendId = InvalidBackendId;
  
! 	/* walsender doesn't need sinval messages */
! 	if (am_walsender)
! 		SharedInvalBackendInit(true);
! 	else
! 		SharedInvalBackendInit(false);
  
  	if (MyBackendId > MaxBackends || MyBackendId <= 0)
  		elog(FATAL, "bad backend ID: %d", MyBackendId);
diff -rpcd a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
*** a/src/include/storage/procsignal.h	2014-07-22 04:12:31.000000000 +0900
--- b/src/include/storage/procsignal.h	2014-07-31 12:30:09.000000000 +0900
*************** extern int SendProcSignal(pid_t pid, Pro
*** 54,58 ****
--- 54,59 ----
  			   BackendId backendId);
  
  extern void procsignal_sigusr1_handler(SIGNAL_ARGS);
+ extern void procsignal_sigusr2_handler(SIGNAL_ARGS);
  
  #endif   /* PROCSIGNAL_H */
#20MauMau
maumau307@gmail.com
In reply to: MauMau (#19)
1 attachment(s)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

I've tracked down the real root cause. The fix is very simple. Please
check the attached one-liner patch.

The cause is that the temporary relations are truncated unconditionally
regardless of whether they are accessed in the transaction or not. That is,
the following sequence of steps result in the hang:

1. A session creates a temporary table with ON COMMIT DELETE ROWS. It adds
the temporary table to the list of relations that should be truncated at
transaction commit.

2. The session receives a sinval catchup signal (SIGUSR1) from another
session. It starts a transaction and processes sinval messages in the
SIGUSR1 signal handler. No WAL is output while processing the sinval
messages.

3. When the transaction commits, the list of temporary relations are checked
to see if they need to be truncated.

4. The temporary table created in step 1 is truncated. To truncate a
relation, Access Exclusive lock is acquired on it. When hot standby is
used, acquiring an Access Exclusive lock generates a WAL record
(RM_STANDBY_ID, XLOG_STANDBY_LOCK).

5. The transaction waits on a latch for a reply from a synchronous standby,
because it wrote some WAL. But the latch wait never returns, because the
latch needs to receive SIGUSR1 but the SIGUSR1 handler is already in
progress from step 2.

The correct behavior is for the transaction not to truncate the temporary
table in step 4, because the transaction didn't use the temporary table.

I confirmed that the fix is already in 9.3 and 9.5devel, so I just copied
the code fragment from 9.5devel to 9.2.9. The attached patch is for 9.2.9.
I didn't check 9.4 and other versions. Why wasn't the fix applied to 9.2?

Finally, I found a very easy way to reproduce the problem:

1. On terminal session 1, start psql and run:
CREATE TEMPORARY TABLE t (c int) ON COMMIT DELETE ROWS;
Leave the psql session open.

2. On terminal session 2, run:
pgbench -c8 -t500 -s1 -n -f test.sql dbname
[test.sql]
CREATE TEMPORARY TABLE t (c int) ON COMMIT DELETE ROWS;
DROP TABLE t;

3. On the psql session on terminal session 1, run any SQL statement. It
doesn't reply. The backend is stuck at SyncRepWaitForLSN().

Regards
MauMau

Attachments:

sinval_catchup_hang_v3.patchapplication/octet-stream; name=sinval_catchup_hang_v3.patchDownload
diff -rpcd a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
*** a/src/backend/commands/tablecmds.c	Tue Jul 22 04:12:31 2014
--- b/src/backend/commands/tablecmds.c	Fri Aug  8 17:28:44 2014
*************** PreCommit_on_commit_actions(void)
*** 10167,10173 ****
  				/* Do nothing (there shouldn't be such entries, actually) */
  				break;
  			case ONCOMMIT_DELETE_ROWS:
! 				oids_to_truncate = lappend_oid(oids_to_truncate, oc->relid);
  				break;
  			case ONCOMMIT_DROP:
  				{
--- 10167,10180 ----
  				/* Do nothing (there shouldn't be such entries, actually) */
  				break;
  			case ONCOMMIT_DELETE_ROWS:
! 
! 				/*
! 				 * If this transaction hasn't accessed any temporary
! 				 * relations, we can skip truncating ON COMMIT DELETE ROWS
! 				 * tables, as they must still be empty.
! 				 */
! 				if (MyXactAccessedTempRel)
! 					oids_to_truncate = lappend_oid(oids_to_truncate, oc->relid);
  				break;
  			case ONCOMMIT_DROP:
  				{
#21Robert Haas
robertmhaas@gmail.com
In reply to: MauMau (#20)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

On Fri, Aug 8, 2014 at 10:30 AM, MauMau <maumau307@gmail.com> wrote:

I've tracked down the real root cause. The fix is very simple. Please
check the attached one-liner patch.

I confirmed that the fix is already in 9.3 and 9.5devel, so I just copied
the code fragment from 9.5devel to 9.2.9. The attached patch is for 9.2.9.
I didn't check 9.4 and other versions. Why wasn't the fix applied to 9.2?

It was considered a performance improvement - i.e. a feature - rather
than a bug fix, so it was only added to master. That was after the
release of 9.2 and before the release of 9.3, so it's in newer
branches but not older ones.

Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Branch: master Release: REL9_3_BR [c9d7dbacd] 2013-01-29 10:43:33 +0200

Skip truncating ON COMMIT DELETE ROWS temp tables, if the transaction hasn't
touched any temporary tables.

We could try harder, and keep track of whether we've inserted to any temp
tables, rather than accessed them, and which temp tables have been inserted
to. But this is dead simple, and already covers many interesting scenarios.

I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
problem. As the commit message says, it's dead simple.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#21)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Aug 8, 2014 at 10:30 AM, MauMau <maumau307@gmail.com> wrote:

I've tracked down the real root cause. The fix is very simple. Please
check the attached one-liner patch.

I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
problem. As the commit message says, it's dead simple.

While I have no great objection to back-porting Heikki's patch, it seems
like a very large stretch to call this a root-cause fix. At best it's
band-aiding one symptom in a rather fragile way.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23MauMau
maumau307@gmail.com
In reply to: Tom Lane (#22)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

Robert Haas <robertmhaas@gmail.com> writes:

I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
problem. As the commit message says, it's dead simple.

From: "Tom Lane" <tgl@sss.pgh.pa.us>

While I have no great objection to back-porting Heikki's patch, it seems
like a very large stretch to call this a root-cause fix. At best it's
band-aiding one symptom in a rather fragile way.

Thank you, Robert san. I'll be waiting for it to be back-ported to the next
9.1/9.2 release.

Yes, I think this failure is only one potential symptom caused by the
implemnentation mistake -- handling both latch wakeup and other tasks that
wait on a latch in the SIGUSR1 handler. Although there may be no such tasks
now, I'd like to correct and clean up the implementation as follows to avoid
similar problems in the future. I think it's enough to do this only for
9.5. Please correct me before I go deeper in the wrong direction.

* The SIGUSR1 handler only does latch wakeup. Any other task is done in
other signal handlers such as SIGUSR2. Many daemon postgres processes
follow this style, but the normal backend, autovacuum daemons, and
background workers don't now.

* InitializeLatchSupport() in unix_latch.c calls pqsignal(SIGUSR1,
latch_sigusr1_handler). Change the argument of latch_sigusr1_handler()
accordingly.

* Remove SIGUSR1 handler registration and process-specific SIGUSR1 handler
functions from all processes. We can eliminate many SIGUSR1 handler
functions which have the same contents.

Regards
MauMau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#22)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Aug 8, 2014 at 10:30 AM, MauMau <maumau307@gmail.com> wrote:

I've tracked down the real root cause. The fix is very simple. Please
check the attached one-liner patch.

I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
problem. As the commit message says, it's dead simple.

While I have no great objection to back-porting Heikki's patch, it seems
like a very large stretch to call this a root-cause fix. At best it's
band-aiding one symptom in a rather fragile way.

Yeah, that's true, but I'm not clear that we have another
back-patchable fix, so doing something almost-certainly-harmless to
alleviate the immediate pain seems worthwhile.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#24)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

On 2014-08-12 11:04:00 -0400, Robert Haas wrote:

On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Aug 8, 2014 at 10:30 AM, MauMau <maumau307@gmail.com> wrote:

I've tracked down the real root cause. The fix is very simple. Please
check the attached one-liner patch.

I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
problem. As the commit message says, it's dead simple.

While I have no great objection to back-porting Heikki's patch, it seems
like a very large stretch to call this a root-cause fix. At best it's
band-aiding one symptom in a rather fragile way.

Yeah, that's true, but I'm not clear that we have another
back-patchable fix, so doing something almost-certainly-harmless to
alleviate the immediate pain seems worthwhile.

Isn't that still leaving the very related issue of waits due to hot
pruning open?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#25)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

On Tue, Aug 12, 2014 at 11:06 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-12 11:04:00 -0400, Robert Haas wrote:

On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Aug 8, 2014 at 10:30 AM, MauMau <maumau307@gmail.com> wrote:

I've tracked down the real root cause. The fix is very simple. Please
check the attached one-liner patch.

I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
problem. As the commit message says, it's dead simple.

While I have no great objection to back-porting Heikki's patch, it seems
like a very large stretch to call this a root-cause fix. At best it's
band-aiding one symptom in a rather fragile way.

Yeah, that's true, but I'm not clear that we have another
back-patchable fix, so doing something almost-certainly-harmless to
alleviate the immediate pain seems worthwhile.

Isn't that still leaving the very related issue of waits due to hot
pruning open?

Yes. Do you have a back-patchable solution for that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#26)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

On 2014-08-12 11:56:41 -0400, Robert Haas wrote:

On Tue, Aug 12, 2014 at 11:06 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-12 11:04:00 -0400, Robert Haas wrote:

On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Aug 8, 2014 at 10:30 AM, MauMau <maumau307@gmail.com> wrote:

I've tracked down the real root cause. The fix is very simple. Please
check the attached one-liner patch.

I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
problem. As the commit message says, it's dead simple.

While I have no great objection to back-porting Heikki's patch, it seems
like a very large stretch to call this a root-cause fix. At best it's
band-aiding one symptom in a rather fragile way.

Yeah, that's true, but I'm not clear that we have another
back-patchable fix, so doing something almost-certainly-harmless to
alleviate the immediate pain seems worthwhile.

Isn't that still leaving the very related issue of waits due to hot
pruning open?

Yes. Do you have a back-patchable solution for that?

The easiest thing I can think of is sprinkling a few
SetConfigOption('synchronous_commit', 'off',
PGC_INTERNAL, PGC_S_OVERRIDE,
GUC_ACTION_LOCAL, true, ERROR);
in some strategic places. From a quick look:
* all of autovacuum
* locally in ProcessCompletedNotifies
* locally in ProcessIncomingNotify
* locally in ProcessCatchupEvent
* locally in InitPostgres

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#27)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-08-12 11:56:41 -0400, Robert Haas wrote:

Yes. Do you have a back-patchable solution for that?

The easiest thing I can think of is sprinkling a few
SetConfigOption('synchronous_commit', 'off',
PGC_INTERNAL, PGC_S_OVERRIDE,
GUC_ACTION_LOCAL, true, ERROR);

This still seems to me to be applying a band-aid that covers over some
symptoms; it's not dealing with the root cause that we've overloaded
the signal handling mechanism too much. What reason is there to think
that there are no other symptoms of that?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#28)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

On 2014-08-12 13:11:55 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-08-12 11:56:41 -0400, Robert Haas wrote:

Yes. Do you have a back-patchable solution for that?

The easiest thing I can think of is sprinkling a few
SetConfigOption('synchronous_commit', 'off',
PGC_INTERNAL, PGC_S_OVERRIDE,
GUC_ACTION_LOCAL, true, ERROR);

This still seems to me to be applying a band-aid that covers over some
symptoms; it's not dealing with the root cause that we've overloaded
the signal handling mechanism too much. What reason is there to think
that there are no other symptoms of that?

I'm not arguing against fixing that. I think we need to do both,
although I am wary of fixing the signal handling in the back
branches. Fixing the signal handling won't get rid of the problem that
one e.g. might not be able to log in anymore if all synchronous standbys
are down and login caused hot pruning.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30Andres Freund
andres@2ndquadrant.com
In reply to: Andres Freund (#9)
Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

Hi,

On 2014-07-26 18:16:01 +0200, Andres Freund wrote:

On 2014-07-26 11:32:24 -0400, Tom Lane wrote:

"MauMau" <maumau307@gmail.com> writes:

[ sinval catchup signal -> ProcessCatchupEvent -> WaitLatch -> deadlock ]

Ugh.

One line of thought is that it's pretty unsafe to be doing anything
as complicated as transaction start/commit in a signal handler, even one
that is sure it's not interrupting anything else.

Yea, that's really not nice.

MauMau, we don't do this anymore. Could you verify that the issue is
fixed for you?

I'd completely forgotten that this thread made me work on moving
everything complicated out of signal handlers...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers