Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

Started by Craig Ringerabout 9 years ago8 messages
#1Craig Ringer
craig@2ndquadrant.com
1 attachment(s)

Hi all

While adding support for logical decoding on standby I noticed that
the walsender doesn't respect
SIGUSR1 with PROCSIG_RECOVERY_CONFLICT_DATABASE set.

It blindly assumes that it means there's new WAL:

WalSndSignals(void)
{
...
pqsignal(SIGUSR1, WalSndXLogSendHandler); /* request WAL
sending */
...
}

since all WalSndXLogSendHandler does is set the latch.

Handling recovery conflicts in the walsender is neccessary for logical
decoding on standby, so that we can replay drop database.

All the recovery conflict machinery is currently contained in
postgres.c and not used by, or accessible to, the walsender. It
actually works to just set procsignal_sigusr1_handler as walsender's
SIGUSR1 handler, but I'm not convinced it's safe:

Most of the procsignals don't make sense for walsender and could
possibly attempts things that use state the walsender doesn't have set
up. The comments on procsignal say that callers should tolerate
getting the wrong signal due to possible races:

* Also, because of race conditions, it's important that all the signals be
* defined so that no harm is done if a process mistakenly receives one.

(procsignal.h)

I'm wondering about adding a new state flag IsWalSender and have
RecoveryConflictInterrupt() ignore most conflict reasons if
IsWalSender is true. Though it strikes me that during logical decoding
on standby, the walsender could quite possibly conflict with other
things too, so it'd be better to make it safe to handle all the
conflict cases within the walsender.

Anyway, this PoC passes regression tests and allows drop database on a
standby to succeed when a slot is in-use. Not for commit as-is.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-Allow-walsender-to-exit-on-conflict-with-recovery.patchtext/x-patch; charset=US-ASCII; name=0001-Allow-walsender-to-exit-on-conflict-with-recovery.patchDownload
From 7b19d2375657bf9b360e4c8feeeaf5f24b7f615a Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig@2ndquadrant.com>
Date: Fri, 18 Nov 2016 10:24:55 +0800
Subject: [PATCH 1/2] Allow walsender to exit on conflict with recovery

Now that logical decoding on standby is supported, the walsender needs to be
able to exit in response to conflict with recovery so that it can terminate to
allow replay of a DROP DATABASE to proceed.

WIP. The comments on RecoveryConflictInterrupt() still say it's only called
by normal user backends. There's no safeguard to stop walsender from invoking
other recovery conflict clauses that may be unsafe for it to call.
---
 src/backend/replication/walsender.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 327dbb2..65b38a2 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -187,7 +187,6 @@ static XLogRecPtr logical_startptr = InvalidXLogRecPtr;
 
 /* Signal handlers */
 static void WalSndSigHupHandler(SIGNAL_ARGS);
-static void WalSndXLogSendHandler(SIGNAL_ARGS);
 static void WalSndLastCycleHandler(SIGNAL_ARGS);
 
 /* Prototypes for private functions */
@@ -2666,17 +2665,6 @@ WalSndSigHupHandler(SIGNAL_ARGS)
 	errno = save_errno;
 }
 
-/* SIGUSR1: set flag to send WAL records */
-static void
-WalSndXLogSendHandler(SIGNAL_ARGS)
-{
-	int			save_errno = errno;
-
-	latch_sigusr1_handler();
-
-	errno = save_errno;
-}
-
 /* SIGUSR2: set flag to do a last cycle and shut down afterwards */
 static void
 WalSndLastCycleHandler(SIGNAL_ARGS)
@@ -2710,7 +2698,7 @@ WalSndSignals(void)
 	pqsignal(SIGQUIT, quickdie);	/* hard crash time */
 	InitializeTimeouts();		/* establishes SIGALRM handler */
 	pqsignal(SIGPIPE, SIG_IGN);
-	pqsignal(SIGUSR1, WalSndXLogSendHandler);	/* request WAL sending */
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
 	pqsignal(SIGUSR2, WalSndLastCycleHandler);	/* request a last cycle and
 												 * shutdown */
 
-- 
2.5.5

#2Craig Ringer
craig@2ndquadrant.com
In reply to: Craig Ringer (#1)
Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

On 18 November 2016 at 10:57, Craig Ringer <craig@2ndquadrant.com> wrote:

Hi all

While adding support for logical decoding on standby I noticed that
the walsender doesn't respect
SIGUSR1 with PROCSIG_RECOVERY_CONFLICT_DATABASE set.

I have an update on this after further study.

Surprisingly I haven't found a way to crash the walsender with it, but
as expected it is also not really correct as-is.

It's fine for terminating walsender sessions on database drop, or for
holding a lock on an object too long. Termination due to conflict with
vacuum is more problematic because:

* it'll try to terminate a logical decoding walsender even if there's
no real conflict, since only normal relation blocks were affected, not
user catalogs or system catalogs; and

* most of the time its attempts to terminate won't do anything because
there's no xact running on the walsender at the time it checks for
termination.

So that's definitely going to need more work.

I'm still interested in hearing comments from experienced folks about
whether it's sane to do this at all, rather than have similar
duplicate signal handling for the walsender.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Robert Haas
robertmhaas@gmail.com
In reply to: Craig Ringer (#2)
Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

I'm still interested in hearing comments from experienced folks about
whether it's sane to do this at all, rather than have similar
duplicate signal handling for the walsender.

Well, I mean, if it's reasonable to share code in a given situation,
that is generally better than NOT sharing code...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Robert Haas (#3)
Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

Hello,

At Mon, 21 Nov 2016 21:39:07 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZh6M5bTmtZZm1vBpFGHmeNgESe4UrJ3-OwKnu56SKB7g@mail.gmail.com>

On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

I'm still interested in hearing comments from experienced folks about
whether it's sane to do this at all, rather than have similar
duplicate signal handling for the walsender.

Well, I mean, if it's reasonable to share code in a given situation,
that is generally better than NOT sharing code...

Walsender handles SIGUSR1 completely different way from normal
backends. The procsignal_sigusr1_handler is designed to work as
the peer of SendProcSignal (not ProcSendSignal:). Walsender is
just using a latch to be woken up. It has nothing to do with
SendProcSignal.

IMHO, I don't think walsender is allowed to just share the
backends' handler for a practical reason that pss_signalFlags can
harm.

If you need to expand the function of SIGUSR1 of walsender, more
thought would be needed.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Craig Ringer
craig@2ndquadrant.com
In reply to: Kyotaro HORIGUCHI (#4)
Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

On 22 November 2016 at 11:35, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello,

At Mon, 21 Nov 2016 21:39:07 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZh6M5bTmtZZm1vBpFGHmeNgESe4UrJ3-OwKnu56SKB7g@mail.gmail.com>

On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

I'm still interested in hearing comments from experienced folks about
whether it's sane to do this at all, rather than have similar
duplicate signal handling for the walsender.

Well, I mean, if it's reasonable to share code in a given situation,
that is generally better than NOT sharing code...

Walsender handles SIGUSR1 completely different way from normal
backends. The procsignal_sigusr1_handler is designed to work as
the peer of SendProcSignal (not ProcSendSignal:). Walsender is
just using a latch to be woken up. It has nothing to do with
SendProcSignal.

Indeed, at the moment it doesn't. I'm proposing to change that, since
walsenders doing logical decoding on a standby need to be notified of
conflicts with recovery, many of which are the same as for normal user
backends and bgworkers.

The main exception is snapshot conflicts, where logical decoding
walsenders don't care about conflicts with specific tables, they want
to be terminated if the effective catalog_xmin on the upstream
increases past their needed catalog_xmin. They don't care about
non-catalog (or non-heap-catalog) tables. So I expect we'd just want
to ignore PROCSIG_RECOVERY_CONFLICT_SNAPSHOT when running a walsender
and handle conflict with catalog_xmin increases separately.

IMHO, I don't think walsender is allowed to just share the
backends' handler for a practical reason that pss_signalFlags can
harm.

I'm not sure I see the problem. The ProcSignalReason argument to
RecoveryConflictInterrupt states that:

* Also, because of race conditions, it's important that all the signals be
* defined so that no harm is done if a process mistakenly receives one.

If you need to expand the function of SIGUSR1 of walsender, more
thought would be needed.

Yeah, I definitely don't think it's as simple as just using
procsignal_sigusr1_handler as-is. I expect we'd likely create a new
global IsWalSender and ignore some RecoveryConflictInterrupt cases
when in a walsender, at least PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, and
probably add a new case for catalog_xmin conflicts that's only acted
on when IsWalSender.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Craig Ringer (#5)
Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

Hello,

At Tue, 22 Nov 2016 12:35:56 +0800, Craig Ringer <craig@2ndquadrant.com> wrote in <CAMsr+YF9-pojdPukTO7XGi+G1w=aRkv=tV7xBG0OPQdD+xX+-Q@mail.gmail.com>

On 22 November 2016 at 11:35, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello,

At Mon, 21 Nov 2016 21:39:07 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZh6M5bTmtZZm1vBpFGHmeNgESe4UrJ3-OwKnu56SKB7g@mail.gmail.com>

On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

I'm still interested in hearing comments from experienced folks about
whether it's sane to do this at all, rather than have similar
duplicate signal handling for the walsender.

Well, I mean, if it's reasonable to share code in a given situation,
that is generally better than NOT sharing code...

Walsender handles SIGUSR1 completely different way from normal
backends. The procsignal_sigusr1_handler is designed to work as
the peer of SendProcSignal (not ProcSendSignal:). Walsender is
just using a latch to be woken up. It has nothing to do with
SendProcSignal.

Indeed, at the moment it doesn't. I'm proposing to change that, since
walsenders doing logical decoding on a standby need to be notified of
conflicts with recovery, many of which are the same as for normal user
backends and bgworkers.

The main exception is snapshot conflicts, where logical decoding
walsenders don't care about conflicts with specific tables, they want
to be terminated if the effective catalog_xmin on the upstream
increases past their needed catalog_xmin. They don't care about
non-catalog (or non-heap-catalog) tables. So I expect we'd just want
to ignore PROCSIG_RECOVERY_CONFLICT_SNAPSHOT when running a walsender
and handle conflict with catalog_xmin increases separately.

Thank you for the explanation. It seems to be reasonable for
walsender to have the same mechanism with
procsignal_sigusr1_handler. Sharing a handler or having another
one is a matter of design. (But sharing it might not be better if
walsender handles only two reasons.)

IMHO, I don't think walsender is allowed to just share the
backends' handler for a practical reason that pss_signalFlags can
harm.

I'm not sure I see the problem. The ProcSignalReason argument to
RecoveryConflictInterrupt states that:

* Also, because of race conditions, it's important that all the signals be
* defined so that no harm is done if a process mistakenly receives one.

It won't cause actual problem, but it is not managed on the
*current* walsender. I meant that taking any action based on
unmanaged variable seems to be a flaw of design. But anyway no
problem if it is redesigned to be used on walsender.

If you need to expand the function of SIGUSR1 of walsender, more
thought would be needed.

Yeah, I definitely don't think it's as simple as just using
procsignal_sigusr1_handler as-is. I expect we'd likely create a new
global IsWalSender and ignore some RecoveryConflictInterrupt cases
when in a walsender, at least PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, and
probably add a new case for catalog_xmin conflicts that's only acted
on when IsWalSender.

The global is unncecessary if walsender have a different handler
from normal backends. If there's at least one or few additional
reasons for signal, sharing SendProcSignal and having dedicate
handler might be better.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#6)
Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

No that's not right.

At Tue, 22 Nov 2016 17:45:34 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20161122.174534.266086549.horiguchi.kyotaro@lab.ntt.co.jp>

Hello,

At Tue, 22 Nov 2016 12:35:56 +0800, Craig Ringer <craig@2ndquadrant.com> wrote in <CAMsr+YF9-pojdPukTO7XGi+G1w=aRkv=tV7xBG0OPQdD+xX+-Q@mail.gmail.com>

On 22 November 2016 at 11:35, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello,

At Mon, 21 Nov 2016 21:39:07 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZh6M5bTmtZZm1vBpFGHmeNgESe4UrJ3-OwKnu56SKB7g@mail.gmail.com>

On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

I'm still interested in hearing comments from experienced folks about
whether it's sane to do this at all, rather than have similar
duplicate signal handling for the walsender.

Well, I mean, if it's reasonable to share code in a given situation,
that is generally better than NOT sharing code...

Walsender handles SIGUSR1 completely different way from normal
backends. The procsignal_sigusr1_handler is designed to work as
the peer of SendProcSignal (not ProcSendSignal:). Walsender is
just using a latch to be woken up. It has nothing to do with
SendProcSignal.

Indeed, at the moment it doesn't. I'm proposing to change that, since
walsenders doing logical decoding on a standby need to be notified of
conflicts with recovery, many of which are the same as for normal user
backends and bgworkers.

The main exception is snapshot conflicts, where logical decoding
walsenders don't care about conflicts with specific tables, they want
to be terminated if the effective catalog_xmin on the upstream
increases past their needed catalog_xmin. They don't care about
non-catalog (or non-heap-catalog) tables. So I expect we'd just want
to ignore PROCSIG_RECOVERY_CONFLICT_SNAPSHOT when running a walsender
and handle conflict with catalog_xmin increases separately.

Thank you for the explanation. It seems to be reasonable for
walsender to have the same mechanism with
procsignal_sigusr1_handler. Sharing a handler or having another
one is a matter of design. (But sharing it might not be better if
walsender handles only two reasons.)

...

If you need to expand the function of SIGUSR1 of walsender, more
thought would be needed.

Yeah, I definitely don't think it's as simple as just using
procsignal_sigusr1_handler as-is. I expect we'd likely create a new
global IsWalSender and ignore some RecoveryConflictInterrupt cases
when in a walsender, at least PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, and
probably add a new case for catalog_xmin conflicts that's only acted
on when IsWalSender.

The global is unncecessary if walsender have a different handler
from normal backends. If there's at least one or few additional
reasons for signal, sharing SendProcSignal and having dedicate
handler might be better.

If no behavior is shared among normal backend and walsender, it
would be a good reason not to share the handler function. What
you are willing to do seems so.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Craig Ringer
craig@2ndquadrant.com
In reply to: Kyotaro HORIGUCHI (#7)
Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

On 22 November 2016 at 17:49, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Yeah, I definitely don't think it's as simple as just using
procsignal_sigusr1_handler as-is. I expect we'd likely create a new
global IsWalSender and ignore some RecoveryConflictInterrupt cases
when in a walsender, at least PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, and
probably add a new case for catalog_xmin conflicts that's only acted
on when IsWalSender.

The global is unncecessary if walsender have a different handler
from normal backends. If there's at least one or few additional
reasons for signal, sharing SendProcSignal and having dedicate
handler might be better.

If no behavior is shared among normal backend and walsender, it
would be a good reason not to share the handler function. What
you are willing to do seems so.

I've explored this some more, and it looks like using
procsignal_sigusr1_handler for handling recovery conflicts in the
walsender during logical decoding actually makes a lot of sense.
Almost all behaviour is shared, and so far I haven't needed any
special cases at all. I needed to add a new recovery signal for
conflict with catalog_xmin advance on upstream, but that was it.

Many of the cases make no sense for physical walsenders, so it
probably makes sense to bail out early if it's a physical walsender,
but for a walsender doing logical replication the only one that I
don't think makes sense is conflict with snapshot, which won't get
sent and is harmless if received.

(The comment on it is slightly wrong anyway; it claims it's only used
by normal user backends in transactions, but database conflicts are
fired even when not in an xact.)

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers