logical replication and PANIC during shutdown checkpoint in publisher
Hi,
When I shut down the publisher while I repeated creating and dropping
the subscription in the subscriber, the publisher emitted the following
PANIC error during shutdown checkpoint.
PANIC: concurrent transaction log activity while database system is
shutting down
The cause of this problem is that walsender for logical replication can
generate WAL records even during shutdown checkpoint.
Firstly walsender keeps running until shutdown checkpoint finishes
so that all the WAL including shutdown checkpoint record can be
replicated to the standby. This was safe because previously walsender
could not generate WAL records. However this assumption became
invalid because of logical replication. That is, currently walsender for
logical replication can generate WAL records, for example, by executing
CREATE_REPLICATION_SLOT command. This is an oversight in
logical replication patch, I think.
To fix this issue, we should terminate walsender for logical replication
before shutdown checkpoint starts. Of course walsender for physical
replication still needs to keep running until shutdown checkpoint ends,
though.
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 4/12/17 09:55, Fujii Masao wrote:
> To fix this issue, we should terminate walsender for logical replication
> before shutdown checkpoint starts. Of course walsender for physical
> replication still needs to keep running until shutdown checkpoint ends,
> though.
Can we turn it into a kind of read-only or no-new-commands mode instead,
so it can keep streaming but not accept any new actions?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Apr 13, 2017 at 5:25 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 4/12/17 09:55, Fujii Masao wrote:
>> To fix this issue, we should terminate walsender for logical replication
>> before shutdown checkpoint starts. Of course walsender for physical
>> replication still needs to keep running until shutdown checkpoint ends,
>> though.
> Can we turn it into a kind of read-only or no-new-commands mode instead,
> so it can keep streaming but not accept any new actions?
So we'd make walsenders switch to that mode, wait for all their
already-running "write" commands to finish, and then start a shutdown checkpoint?
This is an idea, but seems a bit complicated. ISTM that it's simpler to
terminate only walsenders for logical rep before shutdown checkpoint.
Regards,
--
Fujii Masao
On Thu, Apr 13, 2017 at 12:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Apr 13, 2017 at 5:25 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> On 4/12/17 09:55, Fujii Masao wrote:
>>> To fix this issue, we should terminate walsender for logical replication
>>> before shutdown checkpoint starts. Of course walsender for physical
>>> replication still needs to keep running until shutdown checkpoint ends,
>>> though.
>> Can we turn it into a kind of read-only or no-new-commands mode instead,
>> so it can keep streaming but not accept any new actions?
> So we'd make walsenders switch to that mode, wait for all their
> already-running "write" commands to finish, and then start a shutdown
> checkpoint? This is an idea, but seems a bit complicated. ISTM that it's
> simpler to terminate only walsenders for logical rep before shutdown
> checkpoint.
Perhaps my memory is failing me here... But in clean shutdowns we do
shut down WAL senders after the checkpoint has completed, so that we are
sure that they have flushed the LSN corresponding to the checkpoint
ending, right? Why introduce an inconsistency for logical workers?
It seems to me that logical workers should fall under the same rules.
--
Michael
On Thu, Apr 13, 2017 at 12:36 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Apr 13, 2017 at 12:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> This is an idea, but seems a bit complicated. ISTM that it's simpler to
>> terminate only walsenders for logical rep before shutdown checkpoint.
> Perhaps my memory is failing me here... But in clean shutdowns we do
> shut down WAL senders after the checkpoint has completed, so that we are
> sure that they have flushed the LSN corresponding to the checkpoint
> ending, right?
Yes.
> Why introduce an inconsistency for logical workers?
> It seems to me that logical workers should fall under the same rules.
Could you tell me why? Do you think that even walsender for logical rep
needs to stream the shutdown checkpoint WAL record to the subscriber?
I don't think that's true.
Regards,
--
Fujii Masao
On Fri, Apr 14, 2017 at 3:03 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Apr 13, 2017 at 12:36 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> Perhaps my memory is failing me here... But in clean shutdowns we do
>> shut down WAL senders after the checkpoint has completed, so that we are
>> sure that they have flushed the LSN corresponding to the checkpoint
>> ending, right?
> Yes.
>> Why introduce an inconsistency for logical workers?
>> It seems to me that logical workers should fall under the same rules.
> Could you tell me why? Do you think that even walsender for logical rep
> needs to stream the shutdown checkpoint WAL record to the subscriber?
> I don't think that's true.
For physical replication, waiting until standbys have flushed the LSN of
the shutdown checkpoint can be important for switchovers. For example,
with a primary and a standby, it is possible to stop the master cleanly,
promote the standby, and then reconnect the old primary as a standby of
the now-new primary, with the guarantee that both are in a consistent
state. It seems to me that having similar guarantees for logical
replication is important. Now, I have not reviewed the logical
replication code in detail at the level of Peter, Petr or yourself...
--
Michael
On 12/04/17 15:55, Fujii Masao wrote:
> Hi,
>
> When I shut down the publisher while I repeated creating and dropping
> the subscription in the subscriber, the publisher emitted the following
> PANIC error during shutdown checkpoint.
>
> PANIC: concurrent transaction log activity while database system is
> shutting down
>
> The cause of this problem is that walsender for logical replication can
> generate WAL records even during shutdown checkpoint.
>
> Firstly walsender keeps running until shutdown checkpoint finishes
> so that all the WAL including shutdown checkpoint record can be
> replicated to the standby. This was safe because previously walsender
> could not generate WAL records. However this assumption became
> invalid because of logical replication. That is, currently walsender for
> logical replication can generate WAL records, for example, by executing
> CREATE_REPLICATION_SLOT command. This is an oversight in
> logical replication patch, I think.
Hmm, but CREATE_REPLICATION_SLOT should not generate WAL afaik. I agree
that the issue with walsender still exists (since we now allow normal SQL
to run there), but I think it's important to identify what exactly causes
the WAL activity in your case - if it's one of the replication commands
and not SQL then we'll need to backpatch any solution we come up with to
9.4; if it's not a replication command, we only need to fix 10.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Apr 14, 2017 at 10:33 PM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> On 12/04/17 15:55, Fujii Masao wrote:
>> When I shut down the publisher while I repeated creating and dropping
>> the subscription in the subscriber, the publisher emitted the following
>> PANIC error during shutdown checkpoint.
>>
>> PANIC: concurrent transaction log activity while database system is
>> shutting down
> Hmm, but CREATE_REPLICATION_SLOT should not generate WAL afaik. I agree
> that the issue with walsender still exists (since we now allow normal SQL
> to run there), but I think it's important to identify what exactly causes
> the WAL activity in your case
At least in my case, the following CREATE_REPLICATION_SLOT command
generated a WAL record.

BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ;
CREATE_REPLICATION_SLOT testslot TEMPORARY LOGICAL pgoutput USE_SNAPSHOT;

Here is the pg_waldump output of the WAL record that CREATE_REPLICATION_SLOT
generated.

rmgr: Standby len (rec/tot): 24/ 50, tx: 0,
lsn: 0/01601438, prev 0/01601400, desc: RUNNING_XACTS nextXid 692
latestCompletedXid 691 oldestRunningXid 692

So I guess that the CREATE_REPLICATION_SLOT code calls LogStandbySnapshot(),
which generates a WAL record containing a snapshot of running transactions.
Regards,
--
Fujii Masao
On 14/04/17 19:33, Fujii Masao wrote:
> On Fri, Apr 14, 2017 at 10:33 PM, Petr Jelinek
> <petr.jelinek@2ndquadrant.com> wrote:
>> Hmm, but CREATE_REPLICATION_SLOT should not generate WAL afaik.
> At least in my case, the following CREATE_REPLICATION_SLOT command
> generated a WAL record.
>
> BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ;
> CREATE_REPLICATION_SLOT testslot TEMPORARY LOGICAL pgoutput USE_SNAPSHOT;
>
> Here is the pg_waldump output of the WAL record that CREATE_REPLICATION_SLOT
> generated.
>
> rmgr: Standby len (rec/tot): 24/ 50, tx: 0,
> lsn: 0/01601438, prev 0/01601400, desc: RUNNING_XACTS nextXid 692
> latestCompletedXid 691 oldestRunningXid 692
>
> So I guess that the CREATE_REPLICATION_SLOT code calls LogStandbySnapshot(),
> which generates a WAL record containing a snapshot of running transactions.
Ah yes looking at the code, it does exactly that (on master only). Means
that backport will be necessary.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 4/14/17 14:23, Petr Jelinek wrote:
> Ah yes looking at the code, it does exactly that (on master only). Means
> that backport will be necessary.
I think these two sentences are contradicting each other.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 14/04/17 21:05, Peter Eisentraut wrote:
> On 4/14/17 14:23, Petr Jelinek wrote:
>> Ah yes looking at the code, it does exactly that (on master only). Means
>> that backport will be necessary.
> I think these two sentences are contradicting each other.
Hehe, I didn't realize "master" would be taken as the master branch; I
meant master as in "not standby" :)
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Apr 12, 2017 at 10:55:08PM +0900, Fujii Masao wrote:
> When I shut down the publisher while I repeated creating and dropping
> the subscription in the subscriber, the publisher emitted the following
> PANIC error during shutdown checkpoint.
>
> PANIC: concurrent transaction log activity while database system is
> shutting down
>
> [...]
>
> To fix this issue, we should terminate walsender for logical replication
> before shutdown checkpoint starts. Of course walsender for physical
> replication still needs to keep running until shutdown checkpoint ends,
> though.
[Action required within three days. This is a generic notification.]
The above-described topic is currently a PostgreSQL 10 open item. Peter,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.
[1]: /messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
On 16/04/17 08:12, Noah Misch wrote:
> On Wed, Apr 12, 2017 at 10:55:08PM +0900, Fujii Masao wrote:
>> [...]
>
> [Action required within three days. This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 10 open item. Peter,
> since you committed the patch believed to have created it, you own this open
> item.
Just FYI, this is not new in 10; the issue has existed since the 9.4
introduction of logical replication slots.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2017-04-15 02:33:59 +0900, Fujii Masao wrote:
> On Fri, Apr 14, 2017 at 10:33 PM, Petr Jelinek
> <petr.jelinek@2ndquadrant.com> wrote:
>> [...]
>
> So I guess that the CREATE_REPLICATION_SLOT code calls LogStandbySnapshot(),
> which generates a WAL record containing a snapshot of running transactions.
Erroring out in these cases sounds easy enough. Wonder if there's not a
bigger problem with WAL records generated e.g. by HOT pruning or such,
during decoding. Not super likely, but would probably hit exactly the
same, no?
- Andres
On 17/04/17 18:02, Andres Freund wrote:
> On 2017-04-15 02:33:59 +0900, Fujii Masao wrote:
>> So I guess that the CREATE_REPLICATION_SLOT code calls LogStandbySnapshot(),
>> which generates a WAL record containing a snapshot of running transactions.
> Erroring out in these cases sounds easy enough. Wonder if there's not a
> bigger problem with WAL records generated e.g. by HOT pruning or such,
> during decoding. Not super likely, but would probably hit exactly the
> same, no?
Sounds possible, yes. Sounds like that's going to be nontrivial to fix
though.

Another problem is that queries can run on walsender now. But that
should be possible to detect and shut down just like a backend.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2017-04-17 18:28:16 +0200, Petr Jelinek wrote:
> On 17/04/17 18:02, Andres Freund wrote:
>> Erroring out in these cases sounds easy enough. Wonder if there's not a
>> bigger problem with WAL records generated e.g. by HOT pruning or such,
>> during decoding. Not super likely, but would probably hit exactly the
>> same, no?
> Sounds possible, yes. Sounds like that's going to be nontrivial to fix
> though.
>
> Another problem is that queries can run on walsender now. But that
> should be possible to detect and shut down just like a backend.
This sounds like a case for s/PANIC/ERROR|FATAL/ to me...
On 4/17/17 12:30, Andres Freund wrote:
>> Another problem is that queries can run on walsender now. But that
>> should be possible to detect and shut down just like a backend.
> This sounds like a case for s/PANIC/ERROR|FATAL/ to me...
I'd imagine the postmaster would tell the walsender that it has started
shutdown, and then the walsender would reject $certain_things. But I
don't see an existing way for the walsender to know that shutdown has
been initiated. SIGINT is still free ...
The alternative of shutting down logical walsenders earlier also doesn't
look straightforward, since the postmaster doesn't know directly what
kind of walsender a certain process is. So you'd also need additional
signal types or something there.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Apr 18, 2017 at 3:27 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> I'd imagine the postmaster would tell the walsender that it has started
> shutdown, and then the walsender would reject $certain_things. But I
> don't see an existing way for the walsender to know that shutdown has
> been initiated. SIGINT is still free ...
The WAL sender receives SIGUSR2 from the postmaster when shutdown is
initiated, so why not just rely on that and issue an ERROR when a
client attempts to create or drop a slot, setting
walsender_ready_to_stop unconditionally? It seems to me that the issue
here is the delay between the moment SIGTERM is acknowledged by the
WAL sender and the moment CREATE_SLOT is treated. An idea with the
attached...
> The alternative of shutting down logical walsenders earlier also doesn't
> look straightforward, since the postmaster doesn't know directly what
> kind of walsender a certain process is. So you'd also need additional
> signal types or something there.
Yup, but is a switchover between a publisher and a subscriber
something that can happen?
--
Michael
Attachments:
walsender-shutdown-fix.patch (application/octet-stream)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index dbb10c7b00..0fc3a14765 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -844,6 +844,14 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
Assert(!MyReplicationSlot);
+ /*
+ * If WAL sender is shutting down, prevent CREATE_REPLICATION_SLOT as it
+ * could result in the generation of new WAL data.
+ */
+ if (walsender_ready_to_stop)
+ ereport(ERROR,
+ (errmsg("CREATE_REPLICATION_SLOT cannot be called during WAL sender shutdown")));
+
parseCreateReplSlotOptions(cmd, &reserve_wal, &snapshot_action);
/* setup state for XLogReadPage */
@@ -1019,6 +1027,14 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
static void
DropReplicationSlot(DropReplicationSlotCmd *cmd)
{
+ /*
+ * If WAL sender is shutting down, prevent DROP_REPLICATION_SLOT as it
+ * could result in the generation of new WAL data.
+ */
+ if (walsender_ready_to_stop)
+ ereport(ERROR,
+ (errmsg("DROP_REPLICATION_SLOT cannot be called during WAL sender shutdown")));
+
ReplicationSlotDrop(cmd->slotname);
EndCommand("DROP_REPLICATION_SLOT", DestRemote);
}
@@ -2840,6 +2856,8 @@ WalSndLastCycleHandler(SIGNAL_ARGS)
{
int save_errno = errno;
+ walsender_ready_to_stop = true;
+
/*
* If replication has not yet started, die like with SIGTERM. If
* replication is active, only set a flag and wake up the main loop. It
@@ -2849,7 +2867,6 @@ WalSndLastCycleHandler(SIGNAL_ARGS)
if (!replication_active)
kill(MyProcPid, SIGTERM);
- walsender_ready_to_stop = true;
SetLatch(MyLatch);
errno = save_errno;
On 4/19/17 01:45, Michael Paquier wrote:
> On Tue, Apr 18, 2017 at 3:27 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> I'd imagine the postmaster would tell the walsender that it has started
>> shutdown, and then the walsender would reject $certain_things. But I
>> don't see an existing way for the walsender to know that shutdown has
>> been initiated. SIGINT is still free ...
> The WAL sender receives SIGUSR2 from the postmaster when shutdown is
> initiated, so why not just rely on that and issue an ERROR when a
> client attempts to create or drop a slot, setting
> walsender_ready_to_stop unconditionally? It seems to me that the issue
> here is the delay between the moment SIGTERM is acknowledged by the
> WAL sender and the moment CREATE_SLOT is treated. An idea with the
> attached...
I think the problem with a signal-based solution is that there is no
feedback. Ideally, you would wait for all walsenders to acknowledge the
receipt of SIGUSR2 (or similar) and only then proceed with the shutdown
checkpoint.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Apr 16, 2017 at 06:12:58AM +0000, Noah Misch wrote:
On Wed, Apr 12, 2017 at 10:55:08PM +0900, Fujii Masao wrote:
When I shut down the publisher while I repeated creating and dropping
the subscription in the subscriber, the publisher emitted the following
PANIC error during shutdown checkpoint.
PANIC: concurrent transaction log activity while database system is
shutting down
The cause of this problem is that walsender for logical replication can
generate WAL records even during shutdown checkpoint.
Firstly walsender keeps running until shutdown checkpoint finishes
so that all the WAL including shutdown checkpoint record can be
replicated to the standby. This was safe because previously walsender
could not generate WAL records. However this assumption became
invalid because of logical replication. That is, currently walsender for
logical replication can generate WAL records, for example, by executing
CREATE_REPLICATION_SLOT command. This is an oversight in
the logical replication patch, I think.
To fix this issue, we should terminate walsender for logical replication
before shutdown checkpoint starts. Of course walsender for physical
replication still needs to keep running until shutdown checkpoint ends,
though.
[Action required within three days. This is a generic notification.]
The above-described topic is currently a PostgreSQL 10 open item. Peter,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.
[1] /messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
This PostgreSQL 10 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
On Thu, Apr 20, 2017 at 4:57 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 4/19/17 01:45, Michael Paquier wrote:
On Tue, Apr 18, 2017 at 3:27 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
I'd imagine the postmaster would tell the walsender that it has started
shutdown, and then the walsender would reject $certain_things. But I
don't see an existing way for the walsender to know that shutdown has
been initiated. SIGINT is still free ...
The WAL sender receives SIGUSR2 from the postmaster when shutdown is
initiated, so why not just rely on that and issue an ERROR when a
client attempts to create or drop a new slot, setting up
walsender_ready_to_stop unconditionally? It seems to me that the issue
here is the delay between the moment SIGTERM is acknowledged by the
WAL sender and the moment CREATE_SLOT is treated. An idea with the
attached...
I think the problem with a signal-based solution is that there is no
feedback. Ideally, you would wait for all walsenders to acknowledge the
receipt of SIGUSR2 (or similar) and only then proceed with the shutdown
checkpoint.
Are you sure that it is necessary to go to such extent? Why wouldn't
it be enough to prevent any replication commands generating WAL to run
when the WAL sender knows that the postmaster is in shutdown mode?
--
Michael
VMware vCenter Server
www.vmware.com
On Thu, Apr 20, 2017 at 12:40 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Thu, Apr 20, 2017 at 4:57 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
I think the problem with a signal-based solution is that there is no
feedback. Ideally, you would wait for all walsenders to acknowledge the
receipt of SIGUSR2 (or similar) and only then proceed with the shutdown
checkpoint.
Are you sure that it is necessary to go to such extent? Why wouldn't
it be enough to prevent any replication commands generating WAL to run
when the WAL sender knows that the postmaster is in shutdown mode?
2nd thoughts here... Ah now I see your point. True that there is no
way to ensure that an unwanted command is not running when SIGUSR2 is
received as the shutdown checkpoint may have already begun. Here is an
idea: add a new state in WalSndState, say WALSNDSTATE_STOPPING, and
the shutdown checkpoint does not run as long as all WAL senders still
running do not reach such a state.
--
Michael
On 20/04/17 05:57, Michael Paquier wrote:
On Thu, Apr 20, 2017 at 12:40 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Thu, Apr 20, 2017 at 4:57 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
I think the problem with a signal-based solution is that there is no
feedback. Ideally, you would wait for all walsenders to acknowledge the
receipt of SIGUSR2 (or similar) and only then proceed with the shutdown
checkpoint.
Are you sure that it is necessary to go to such extent? Why wouldn't
it be enough to prevent any replication commands generating WAL to run
when the WAL sender knows that the postmaster is in shutdown mode?
2nd thoughts here... Ah now I see your point. True that there is no
way to ensure that an unwanted command is not running when SIGUSR2 is
received as the shutdown checkpoint may have already begun. Here is an
idea: add a new state in WalSndState, say WALSNDSTATE_STOPPING, and
the shutdown checkpoint does not run as long as all WAL senders still
running do not reach such a state.
+1 to this solution
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 4/20/17 07:52, Petr Jelinek wrote:
On 20/04/17 05:57, Michael Paquier wrote:
2nd thoughts here... Ah now I see your point. True that there is no
way to ensure that an unwanted command is not running when SIGUSR2 is
received as the shutdown checkpoint may have already begun. Here is an
idea: add a new state in WalSndState, say WALSNDSTATE_STOPPING, and
the shutdown checkpoint does not run as long as all WAL senders still
running do not reach such a state.
+1 to this solution
Michael, can you attempt to supply a patch?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 4/19/17 23:04, Noah Misch wrote:
This PostgreSQL 10 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
We have a possible solution but need to work out a patch. Let's say
next check-in on Monday.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Apr 21, 2017 at 12:29 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 4/20/17 07:52, Petr Jelinek wrote:
On 20/04/17 05:57, Michael Paquier wrote:
2nd thoughts here... Ah now I see your point. True that there is no
way to ensure that an unwanted command is not running when SIGUSR2 is
received as the shutdown checkpoint may have already begun. Here is an
idea: add a new state in WalSndState, say WALSNDSTATE_STOPPING, and
the shutdown checkpoint does not run as long as all WAL senders still
running do not reach such a state.
+1 to this solution
Michael, can you attempt to supply a patch?
Hmm. I have been actually looking at this solution and I am having
doubts regarding its robustness. In short this would need to be
roughly a two-step process:
- In PostmasterStateMachine(), SIGUSR2 is sent to the checkpointer to
make it call ShutdownXLOG(). Prior to doing that, a first signal should
be sent to all the WAL senders with
SignalSomeChildren(BACKEND_TYPE_WALSND). SIGUSR2 or SIGINT could be
used.
- At reception of this signal, all WAL senders switch to a stopping
state, refusing commands that can generate WAL.
- Checkpointer looks at the state of all WAL senders, looping with a
sleep call of a couple of ms, refusing to launch the shutdown
checkpoint as long as all WAL senders have not switched to the
stopping state.
- In reaper(), once checkpointer is confirmed as stopped, signal again
the WAL senders, and tell them to perform the last loop.
After that, I got a second, more simple idea.
CheckpointerShmem->ckpt_flags holds the information about checkpoints
currently running, so we could have the WAL senders look at this data
and prevent any commands generating WAL. The checkpointer may be
already stopped at the moment the WAL senders finish their loop, so we
need also to check if the checkpointer is running or not on those code
paths. Such safeguards may actually be enough for the problem of this
thread. Thoughts?
--
Michael
On 21/04/17 06:11, Michael Paquier wrote:
On Fri, Apr 21, 2017 at 12:29 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 4/20/17 07:52, Petr Jelinek wrote:
On 20/04/17 05:57, Michael Paquier wrote:
2nd thoughts here... Ah now I see your point. True that there is no
way to ensure that an unwanted command is not running when SIGUSR2 is
received as the shutdown checkpoint may have already begun. Here is an
idea: add a new state in WalSndState, say WALSNDSTATE_STOPPING, and
the shutdown checkpoint does not run as long as all WAL senders still
running do not reach such a state.
+1 to this solution
Michael, can you attempt to supply a patch?
Hmm. I have been actually looking at this solution and I am having
doubts regarding its robustness. In short this would need to be
roughly a two-step process:
- In PostmasterStateMachine(), SIGUSR2 is sent to the checkpointer to
make it call ShutdownXLOG(). Prior to doing that, a first signal should
be sent to all the WAL senders with
SignalSomeChildren(BACKEND_TYPE_WALSND). SIGUSR2 or SIGINT could be
used.
- At reception of this signal, all WAL senders switch to a stopping
state, refusing commands that can generate WAL.
- Checkpointer looks at the state of all WAL senders, looping with a
sleep call of a couple of ms, refusing to launch the shutdown
checkpoint as long as all WAL senders have not switched to the
stopping state.
- In reaper(), once checkpointer is confirmed as stopped, signal again
the WAL senders, and tell them to perform the last loop.
After that, I got a second, more simple idea.
CheckpointerShmem->ckpt_flags holds the information about checkpoints
currently running, so we could have the WAL senders look at this data
and prevent any commands generating WAL. The checkpointer may be
already stopped at the moment the WAL senders finish their loop, so we
need also to check if the checkpointer is running or not on those code
paths. Such safeguards may actually be enough for the problem of this
thread. Thoughts?
Hmm but how do we handle statements that are already in progress by the
time ckpt_flags changes?
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Apr 23, 2017 at 10:15 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
On 21/04/17 06:11, Michael Paquier wrote:
On Fri, Apr 21, 2017 at 12:29 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
Hmm. I have been actually looking at this solution and I am having
doubts regarding its robustness. In short this would need to be
roughly a two-step process:
- In PostmasterStateMachine(), SIGUSR2 is sent to the checkpointer to
make it call ShutdownXLOG(). Prior to doing that, a first signal should
be sent to all the WAL senders with
SignalSomeChildren(BACKEND_TYPE_WALSND). SIGUSR2 or SIGINT could be
used.
- At reception of this signal, all WAL senders switch to a stopping
state, refusing commands that can generate WAL.
- Checkpointer looks at the state of all WAL senders, looping with a
sleep call of a couple of ms, refusing to launch the shutdown
checkpoint as long as all WAL senders have not switched to the
stopping state.
- In reaper(), once checkpointer is confirmed as stopped, signal again
the WAL senders, and tell them to perform the last loop.
OK, I have been hacking that, finishing with the attached. In the
attached I am using SIGUSR2 to instruct the WAL senders to prepare for
stopping, and SIGINT to handle the last WAL flush loop. The shutdown
checkpoint moves on only if all active WAL senders are marked with a
STOPPING state. Reviews are welcome.
After that, I got a second, more simple idea.
CheckpointerShmem->ckpt_flags holds the information about checkpoints
currently running, so we could have the WAL senders look at this data
and prevent any commands generating WAL. The checkpointer may be
already stopped at the moment the WAL senders finish their loop, so we
need also to check if the checkpointer is running or not on those code
paths. Such safeguards may actually be enough for the problem of this
thread. Thoughts?
Hmm but how do we handle statements that are already in progress by the
time ckpt_flags changes?
Yup, this does not handle well race conditions.
--
Michael
Attachments:
walsender-chkpt-v1.patch
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 2a83671b53..80d12b26d7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1690,6 +1690,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<literal>backup</>: This WAL sender is sending a backup.
</para>
</listitem>
+ <listitem>
+ <para>
+ <literal>stopping</>: This WAL sender is stopping.
+ </para>
+ </listitem>
</itemizedlist>
</entry>
</row>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c7667879c6..4b64c460c3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8325,6 +8325,12 @@ ShutdownXLOG(int code, Datum arg)
ereport(IsPostmasterEnvironment ? LOG : NOTICE,
(errmsg("shutting down")));
+ /*
+ * If there are any WAL senders active, wait to get the confirmation that
+ * they are in a stopping state before moving on to next steps.
+ */
+ WalSndWaitStop();
+
if (RecoveryInProgress())
CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
else
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index c38234527f..d998739f92 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2904,7 +2904,7 @@ reaper(SIGNAL_ARGS)
* Waken walsenders for the last time. No regular backends
* should be around anymore.
*/
- SignalChildren(SIGUSR2);
+ SignalChildren(SIGINT);
pmState = PM_SHUTDOWN_2;
@@ -3642,7 +3642,9 @@ PostmasterStateMachine(void)
/*
* If we get here, we are proceeding with normal shutdown. All
* the regular children are gone, and it's time to tell the
- * checkpointer to do a shutdown checkpoint.
+ * checkpointer to do a shutdown checkpoint. All WAL senders
+ * are told to switch to a stopping state so that the shutdown
+ * checkpoint can progress.
*/
Assert(Shutdown > NoShutdown);
/* Start the checkpointer if not running */
@@ -3651,6 +3653,7 @@ PostmasterStateMachine(void)
/* And tell it to shut down */
if (CheckpointerPID != 0)
{
+ SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
signal_child(CheckpointerPID, SIGUSR2);
pmState = PM_SHUTDOWN;
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 064cf5ee28..7951629acb 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -24,11 +24,17 @@
* are treated as not a crash but approximately normal termination;
* the walsender will exit quickly without sending any more XLOG records.
*
- * If the server is shut down, postmaster sends us SIGUSR2 after all
- * regular backends have exited and the shutdown checkpoint has been written.
- * This instructs walsender to send any outstanding WAL, including the
- * shutdown checkpoint record, wait for it to be replicated to the standby,
- * and then exit.
+ * If the server is shut down, the postmaster sends SIGUSR2 to all WAL
+ * senders, switching them to a stopping state, before telling the
+ * checkpointer to issue the shutdown checkpoint. Once this state is
+ * reached, WAL senders reject any replication command that may generate
+ * WAL activity. The checkpointer checks the state of each WAL sender and
+ * begins the shutdown checkpoint only once all of them are confirmed as
+ * stopping. When the shutdown checkpoint has been written and all regular
+ * backends have exited, the postmaster sends SIGINT to the WAL senders.
+ * This instructs each walsender to send any outstanding WAL, including
+ * the shutdown checkpoint record, wait for it to be replicated to the
+ * standby, and then exit.
*
*
* Portions Copyright (c) 2010-2017, PostgreSQL Global Development Group
@@ -177,13 +183,14 @@ static bool WalSndCaughtUp = false;
/* Flags set by signal handlers for later service in main loop */
static volatile sig_atomic_t got_SIGHUP = false;
-static volatile sig_atomic_t walsender_ready_to_stop = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+static volatile sig_atomic_t got_SIGINT = false;
/*
- * This is set while we are streaming. When not set, SIGUSR2 signal will be
+ * This is set while we are streaming. When not set, SIGINT signal will be
* handled like SIGTERM. When set, the main loop is responsible for checking
- * walsender_ready_to_stop and terminating when it's set (after streaming any
- * remaining WAL).
+ * got_SIGINT and terminating when it's set (after streaming any remaining
+ * WAL).
*/
static volatile sig_atomic_t replication_active = false;
@@ -213,6 +220,7 @@ static struct
/* Signal handlers */
static void WalSndSigHupHandler(SIGNAL_ARGS);
static void WalSndXLogSendHandler(SIGNAL_ARGS);
+static void WalSndSwitchStopping(SIGNAL_ARGS);
static void WalSndLastCycleHandler(SIGNAL_ARGS);
/* Prototypes for private functions */
@@ -299,11 +307,14 @@ WalSndErrorCleanup(void)
ReplicationSlotCleanup();
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
/* Revert back to startup state */
WalSndSetState(WALSNDSTATE_STARTUP);
+
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
}
/*
@@ -676,7 +687,7 @@ StartReplication(StartReplicationCmd *cmd)
WalSndLoop(XLogSendPhysical);
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT || got_SIGUSR2)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -844,6 +855,14 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
Assert(!MyReplicationSlot);
+ /*
+ * If WAL sender is shutting down, prevent CREATE_REPLICATION_SLOT as it
+ * could result in the generation of new WAL data.
+ */
+ if (MyWalSnd->state == WALSNDSTATE_STOPPING)
+ ereport(ERROR,
+ (errmsg("CREATE_REPLICATION_SLOT cannot be called during WAL sender shutdown")));
+
parseCreateReplSlotOptions(cmd, &reserve_wal, &snapshot_action);
/* setup state for XLogReadPage */
@@ -1019,6 +1038,14 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
static void
DropReplicationSlot(DropReplicationSlotCmd *cmd)
{
+ /*
+ * If WAL sender is shutting down, prevent DROP_REPLICATION_SLOT as it
+ * could result in the generation of new WAL data.
+ */
+ if (MyWalSnd->state == WALSNDSTATE_STOPPING)
+ ereport(ERROR,
+ (errmsg("DROP_REPLICATION_SLOT cannot be called during WAL sender shutdown")));
+
ReplicationSlotDrop(cmd->slotname);
EndCommand("DROP_REPLICATION_SLOT", DestRemote);
}
@@ -1048,7 +1075,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
{
ereport(LOG,
(errmsg("terminating walsender process after promotion")));
- walsender_ready_to_stop = true;
+ got_SIGINT = true;
}
WalSndSetState(WALSNDSTATE_CATCHUP);
@@ -1098,7 +1125,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
ReplicationSlotRelease();
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT || got_SIGUSR2)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -1286,6 +1313,14 @@ WalSndWaitForWal(XLogRecPtr loc)
RecentFlushPtr = GetXLogReplayRecPtr(NULL);
/*
+ * If postmaster asked us to switch to a stopping state, do so.
+ * Shutdown is in progress and this will allow the checkpointer to
+ * move on with the shutdown checkpoint.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
* If postmaster asked us to stop, don't wait here anymore. This will
* cause the xlogreader to return without reading a full record, which
* is the fastest way to reach the mainloop which then can quit.
@@ -1294,7 +1329,7 @@ WalSndWaitForWal(XLogRecPtr loc)
* RecentFlushPtr, so we can send all remaining data before shutting
* down.
*/
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
break;
/*
@@ -1369,6 +1404,13 @@ exec_replication_command(const char *cmd_string)
MemoryContext old_context;
/*
+ * If WAL sender has been told that shutdown is getting close, switch its
+ * status accordingly to handle the next replication commands correctly.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
* CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
* command arrives. Clean up the old stuff if there's anything.
*/
@@ -2090,13 +2132,20 @@ WalSndLoop(WalSndSendDataCallback send_data)
}
/*
- * When SIGUSR2 arrives, we send any outstanding logs up to the
+ * At the reception of SIGUSR2, switch the WAL sender to a stopping
+ * mode.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
+ * When SIGINT arrives, we send any outstanding logs up to the
* shutdown checkpoint record (i.e., the latest record), wait for
* them to be replicated to the standby, and exit. This may be a
* normal termination at shutdown, or a promotion, the walsender
* is not sure which.
*/
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
WalSndDone(send_data);
}
@@ -2836,7 +2885,24 @@ WalSndXLogSendHandler(SIGNAL_ARGS)
errno = save_errno;
}
-/* SIGUSR2: set flag to do a last cycle and shut down afterwards */
+/* SIGUSR2: set flag to switch to stopping state */
+static void
+WalSndSwitchStopping(SIGNAL_ARGS)
+{
+ int save_errno = errno;
+
+ got_SIGUSR2 = true;
+
+ SetLatch(MyLatch);
+
+ errno = save_errno;
+}
+
+/*
+ * SIGINT: set flag to do a last cycle and shut down afterwards. The WAL
+ * sender should already have been switched to WALSNDSTATE_STOPPING at
+ * this point.
+ */
static void
WalSndLastCycleHandler(SIGNAL_ARGS)
{
@@ -2851,7 +2917,7 @@ WalSndLastCycleHandler(SIGNAL_ARGS)
if (!replication_active)
kill(MyProcPid, SIGTERM);
- walsender_ready_to_stop = true;
+ got_SIGINT = true;
SetLatch(MyLatch);
errno = save_errno;
@@ -2864,14 +2930,14 @@ WalSndSignals(void)
/* Set up signal handlers */
pqsignal(SIGHUP, WalSndSigHupHandler); /* set flag to read config
* file */
- pqsignal(SIGINT, SIG_IGN); /* not used */
+ pqsignal(SIGINT, WalSndLastCycleHandler); /* request a last cycle and
+ * shutdown */
pqsignal(SIGTERM, die); /* request shutdown */
pqsignal(SIGQUIT, quickdie); /* hard crash time */
InitializeTimeouts(); /* establishes SIGALRM handler */
pqsignal(SIGPIPE, SIG_IGN);
pqsignal(SIGUSR1, WalSndXLogSendHandler); /* request WAL sending */
- pqsignal(SIGUSR2, WalSndLastCycleHandler); /* request a last cycle and
- * shutdown */
+ pqsignal(SIGUSR2, WalSndSwitchStopping); /* switch to stopping state */
/* Reset some signals that are accepted by postmaster but not here */
pqsignal(SIGCHLD, SIG_DFL);
@@ -2949,6 +3015,51 @@ WalSndWakeup(void)
}
}
+/*
+ * Wait until all the WAL senders have reached a stopping state. This is
+ * used by the checkpointer to control when the shutdown checkpoint can
+ * safely begin.
+ */
+void
+WalSndWaitStop(void)
+{
+ int i;
+
+ for (;;)
+ {
+ bool all_stopped = true;
+
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ WalSndState state;
+ WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+ SpinLockAcquire(&walsnd->mutex);
+
+ if (walsnd->pid == 0)
+ {
+ SpinLockRelease(&walsnd->mutex);
+ continue;
+ }
+
+ state = walsnd->state;
+ SpinLockRelease(&walsnd->mutex);
+
+ if (state != WALSNDSTATE_STOPPING)
+ {
+ all_stopped = false;
+ break;
+ }
+ }
+
+ /* safe to leave if confirmation is done for all WAL senders */
+ if (all_stopped)
+ return;
+
+ pg_usleep(10000L); /* wait for 10 msec */
+ }
+}
+
/* Set state for current walsender (only called in walsender) */
void
WalSndSetState(WalSndState state)
@@ -2982,6 +3093,8 @@ WalSndGetStateString(WalSndState state)
return "catchup";
case WALSNDSTATE_STREAMING:
return "streaming";
+ case WALSNDSTATE_STOPPING:
+ return "stopping";
}
return "UNKNOWN";
}
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 2ca903872e..eb9cf0b0dc 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -44,6 +44,7 @@ extern void WalSndSignals(void);
extern Size WalSndShmemSize(void);
extern void WalSndShmemInit(void);
extern void WalSndWakeup(void);
+extern void WalSndWaitStop(void);
extern void WalSndRqstFileReload(void);
/*
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 2c59056cef..36311e124c 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -24,7 +24,8 @@ typedef enum WalSndState
WALSNDSTATE_STARTUP = 0,
WALSNDSTATE_BACKUP,
WALSNDSTATE_CATCHUP,
- WALSNDSTATE_STREAMING
+ WALSNDSTATE_STREAMING,
+ WALSNDSTATE_STOPPING
} WalSndState;
/*
On 4/21/17 00:11, Michael Paquier wrote:
Hmm. I have been actually looking at this solution and I am having
doubts regarding its robustness. In short this would need to be
roughly a two-step process:
- In PostmasterStateMachine(), SIGUSR2 is sent to the checkpointer to
make it call ShutdownXLOG(). Prior to doing that, a first signal should
be sent to all the WAL senders with
SignalSomeChildren(BACKEND_TYPE_WALSND). SIGUSR2 or SIGINT could be
used.
- At reception of this signal, all WAL senders switch to a stopping
state, refusing commands that can generate WAL.
- Checkpointer looks at the state of all WAL senders, looping with a
sleep call of a couple of ms, refusing to launch the shutdown
checkpoint as long as all WAL senders have not switched to the
stopping state.
- In reaper(), once checkpointer is confirmed as stopped, signal again
the WAL senders, and tell them to perform the last loop.
Yeah that looks like a reasonable approach.
I'm not sure why in your patch you process got_SIGUSR2 in
WalSndErrorCleanup() instead of in the main loop.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 4/20/17 11:30, Peter Eisentraut wrote:
On 4/19/17 23:04, Noah Misch wrote:
This PostgreSQL 10 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
We have a possible solution but need to work out a patch. Let's say
next check-in on Monday.
Update: We have a patch that looks promising, but we haven't made much
progress in reviewing the details. I'll work on it this week, and
perhaps Michael also has time to work on it this week. We could use
more eyes. Next check-in Friday.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 26, 2017 at 4:26 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 4/20/17 11:30, Peter Eisentraut wrote:
We have a possible solution but need to work out a patch. Let's say
next check-in on Monday.
Update: We have a patch that looks promising, but we haven't made much
progress in reviewing the details. I'll work on it this week, and
perhaps Michael also has time to work on it this week.
Next week is Golden Week in Japan so I'll have limited access to
electronic devices. This week should be fine.
We could use more eyes. Next check-in Friday.
Reviews always welcome.
--
Michael
On Wed, Apr 26, 2017 at 3:17 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 4/21/17 00:11, Michael Paquier wrote:
Hmm. I have been actually looking at this solution and I am having
doubts regarding its robustness. In short this would need to be
roughly a two-step process:
- In PostmasterStateMachine(), SIGUSR2 is sent to the checkpointer to
make it call ShutdownXLOG(). Prior to doing that, a first signal should
be sent to all the WAL senders with
SignalSomeChildren(BACKEND_TYPE_WALSND). SIGUSR2 or SIGINT could be
used.
- At reception of this signal, all WAL senders switch to a stopping
state, refusing commands that can generate WAL.
- Checkpointer looks at the state of all WAL senders, looping with a
sleep call of a couple of ms, refusing to launch the shutdown
checkpoint as long as all WAL senders have not switched to the
stopping state.
- In reaper(), once checkpointer is confirmed as stopped, signal again
the WAL senders, and tell them to perform the last loop.
Yeah that looks like a reasonable approach.
I'm not sure why in your patch you process got_SIGUSR2 in
WalSndErrorCleanup() instead of in the main loop.
Yes I was hesitating about this one when hacking it. Thinking an extra
time, the similar check in StartReplication() should also not use
got_SIGUSR2 to give the WAL sender a chance to do more work while the
shutdown checkpoint is running as it could take minutes.
Attached is an updated patch to reflect that.
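To make the intended sequencing easier to follow, here is a tiny
self-contained simulation of the handshake. The names mirror the patch
(WALSNDSTATE_*, the SIGUSR2/SIGINT phases), but this is illustrative toy
code, not the actual backend:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy simulation of the two-phase WAL sender shutdown discussed above.
 * Illustrative only: the real backend uses signals, spinlocks, and a
 * polling loop in the checkpointer. */

typedef enum
{
    WALSNDSTATE_STARTUP,
    WALSNDSTATE_STREAMING,
    WALSNDSTATE_STOPPING
} WalSndState;

#define MAX_WAL_SENDERS 4

typedef struct
{
    int         pid;        /* 0 means the slot is unused */
    WalSndState state;
} WalSnd;

static WalSnd walsnds[MAX_WAL_SENDERS];

/* Phase 1 (postmaster): tell every active WAL sender to switch to the
 * stopping state before waking the checkpointer. */
static void
signal_stopping(void)
{
    for (int i = 0; i < MAX_WAL_SENDERS; i++)
        if (walsnds[i].pid != 0)
            walsnds[i].state = WALSNDSTATE_STOPPING;
}

/* Checkpointer side: true once all active senders are stopping, i.e.
 * it is safe to write the shutdown checkpoint.  The patch polls this
 * condition in a loop with a short sleep. */
static bool
all_stopping(void)
{
    for (int i = 0; i < MAX_WAL_SENDERS; i++)
        if (walsnds[i].pid != 0 && walsnds[i].state != WALSNDSTATE_STOPPING)
            return false;
    return true;
}

/* WAL sender side: a stopping sender must refuse commands that could
 * generate WAL. */
static bool
command_allowed(const WalSnd *snd)
{
    return snd->state != WALSNDSTATE_STOPPING;
}
```

In this model, phase 2 (SIGINT, flush remaining WAL, exit) would only be
triggered once all_stopping() has held and the shutdown checkpoint record
has been written.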
--
Michael
Attachments:
walsender-chkpt-v2.patch
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 2a83671b53..80d12b26d7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1690,6 +1690,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<literal>backup</>: This WAL sender is sending a backup.
</para>
</listitem>
+ <listitem>
+ <para>
+ <literal>stopping</>: This WAL sender is stopping.
+ </para>
+ </listitem>
</itemizedlist>
</entry>
</row>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c7667879c6..4b64c460c3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8325,6 +8325,12 @@ ShutdownXLOG(int code, Datum arg)
ereport(IsPostmasterEnvironment ? LOG : NOTICE,
(errmsg("shutting down")));
+ /*
+ * If there are any WAL senders active, wait to get the confirmation that
+ * they are in a stopping state before moving on to next steps.
+ */
+ WalSndWaitStop();
+
if (RecoveryInProgress())
CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
else
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 2bb4380533..992fb3ac98 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2918,7 +2918,7 @@ reaper(SIGNAL_ARGS)
* Waken walsenders for the last time. No regular backends
* should be around anymore.
*/
- SignalChildren(SIGUSR2);
+ SignalChildren(SIGINT);
pmState = PM_SHUTDOWN_2;
@@ -3656,7 +3656,9 @@ PostmasterStateMachine(void)
/*
* If we get here, we are proceeding with normal shutdown. All
* the regular children are gone, and it's time to tell the
- * checkpointer to do a shutdown checkpoint.
+ * checkpointer to do a shutdown checkpoint. All WAL senders
+ * are told to switch to a stopping state so that the shutdown
+ * checkpoint can progress.
*/
Assert(Shutdown > NoShutdown);
/* Start the checkpointer if not running */
@@ -3665,6 +3667,7 @@ PostmasterStateMachine(void)
/* And tell it to shut down */
if (CheckpointerPID != 0)
{
+ SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
signal_child(CheckpointerPID, SIGUSR2);
pmState = PM_SHUTDOWN;
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 064cf5ee28..dda3368168 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -24,11 +24,17 @@
* are treated as not a crash but approximately normal termination;
* the walsender will exit quickly without sending any more XLOG records.
*
- * If the server is shut down, postmaster sends us SIGUSR2 after all
- * regular backends have exited and the shutdown checkpoint has been written.
- * This instructs walsender to send any outstanding WAL, including the
- * shutdown checkpoint record, wait for it to be replicated to the standby,
- * and then exit.
+ * If the server is shut down, postmaster sends SIGUSR2 before telling the
+ * checkpointer to issue the shutdown checkpoint to switch all the WAL
+ * senders to a stopping state. Once this state is reached WAL senders will
+ * block any replication command that may generate WAL activity. The
+ * checkpointer checks the state of each WAL sender, and begins the shutdown
+ * checkpoint once all the WAL senders are confirmed as stopping. When the
+ * shutdown checkpoint finishes, the postmaster sends SIGINT to all the WAL
+ * senders once all the regular backends have exited and the shutdown
+ * checkpoint has been written. This instructs walsender to send any
+ * outstanding WAL, including the shutdown checkpoint record, wait for it to
+ * be replicated to the standby, and then exit.
*
*
* Portions Copyright (c) 2010-2017, PostgreSQL Global Development Group
@@ -177,13 +183,14 @@ static bool WalSndCaughtUp = false;
/* Flags set by signal handlers for later service in main loop */
static volatile sig_atomic_t got_SIGHUP = false;
-static volatile sig_atomic_t walsender_ready_to_stop = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+static volatile sig_atomic_t got_SIGINT = false;
/*
- * This is set while we are streaming. When not set, SIGUSR2 signal will be
+ * This is set while we are streaming. When not set, SIGINT signal will be
* handled like SIGTERM. When set, the main loop is responsible for checking
- * walsender_ready_to_stop and terminating when it's set (after streaming any
- * remaining WAL).
+ * got_SIGINT and terminating when it's set (after streaming any remaining
+ * WAL).
*/
static volatile sig_atomic_t replication_active = false;
@@ -213,6 +220,7 @@ static struct
/* Signal handlers */
static void WalSndSigHupHandler(SIGNAL_ARGS);
static void WalSndXLogSendHandler(SIGNAL_ARGS);
+static void WalSndSwitchStopping(SIGNAL_ARGS);
static void WalSndLastCycleHandler(SIGNAL_ARGS);
/* Prototypes for private functions */
@@ -299,11 +307,14 @@ WalSndErrorCleanup(void)
ReplicationSlotCleanup();
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
/* Revert back to startup state */
WalSndSetState(WALSNDSTATE_STARTUP);
+
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
}
/*
@@ -676,7 +687,7 @@ StartReplication(StartReplicationCmd *cmd)
WalSndLoop(XLogSendPhysical);
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -844,6 +855,14 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
Assert(!MyReplicationSlot);
+ /*
+ * If WAL sender is shutting down, prevent CREATE_REPLICATION_SLOT as it
+ * could result in the generation of new WAL data.
+ */
+ if (MyWalSnd->state == WALSNDSTATE_STOPPING)
+ ereport(ERROR,
+ (errmsg("CREATE_REPLICATION_SLOT cannot be called during WAL sender shutdown")));
+
parseCreateReplSlotOptions(cmd, &reserve_wal, &snapshot_action);
/* setup state for XLogReadPage */
@@ -1019,6 +1038,14 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
static void
DropReplicationSlot(DropReplicationSlotCmd *cmd)
{
+ /*
+ * If WAL sender is shutting down, prevent DROP_REPLICATION_SLOT as it
+ * could result in the generation of new WAL data.
+ */
+ if (MyWalSnd->state == WALSNDSTATE_STOPPING)
+ ereport(ERROR,
+ (errmsg("DROP_REPLICATION_SLOT cannot be called during WAL sender shutdown")));
+
ReplicationSlotDrop(cmd->slotname);
EndCommand("DROP_REPLICATION_SLOT", DestRemote);
}
@@ -1048,7 +1075,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
{
ereport(LOG,
(errmsg("terminating walsender process after promotion")));
- walsender_ready_to_stop = true;
+ got_SIGINT = true;
}
WalSndSetState(WALSNDSTATE_CATCHUP);
@@ -1098,7 +1125,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
ReplicationSlotRelease();
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -1286,6 +1313,14 @@ WalSndWaitForWal(XLogRecPtr loc)
RecentFlushPtr = GetXLogReplayRecPtr(NULL);
/*
+ * If postmaster asked us to switch to a stopping state, do so.
+ * Shutdown is in progress and this will allow the checkpointer to
+ * move on with the shutdown checkpoint.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
* If postmaster asked us to stop, don't wait here anymore. This will
* cause the xlogreader to return without reading a full record, which
* is the fastest way to reach the mainloop which then can quit.
@@ -1294,7 +1329,7 @@ WalSndWaitForWal(XLogRecPtr loc)
* RecentFlushPtr, so we can send all remaining data before shutting
* down.
*/
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
break;
/*
@@ -1369,6 +1404,13 @@ exec_replication_command(const char *cmd_string)
MemoryContext old_context;
/*
+ * If WAL sender has been told that shutdown is getting close, switch its
+ * status accordingly to handle the next replication commands correctly.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
* CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
* command arrives. Clean up the old stuff if there's anything.
*/
@@ -2090,13 +2132,20 @@ WalSndLoop(WalSndSendDataCallback send_data)
}
/*
- * When SIGUSR2 arrives, we send any outstanding logs up to the
+ * At the reception of SIGUSR2, switch the WAL sender to a stopping
+ * mode.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
+ * When SIGINT arrives, we send any outstanding logs up to the
* shutdown checkpoint record (i.e., the latest record), wait for
* them to be replicated to the standby, and exit. This may be a
* normal termination at shutdown, or a promotion, the walsender
* is not sure which.
*/
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
WalSndDone(send_data);
}
@@ -2836,7 +2885,24 @@ WalSndXLogSendHandler(SIGNAL_ARGS)
errno = save_errno;
}
-/* SIGUSR2: set flag to do a last cycle and shut down afterwards */
+/* SIGUSR2: set flag to switch to stopping state */
+static void
+WalSndSwitchStopping(SIGNAL_ARGS)
+{
+ int save_errno = errno;
+
+ got_SIGUSR2 = true;
+
+ SetLatch(MyLatch);
+
+ errno = save_errno;
+}
+
+/*
+ * SIGINT: set flag to do a last cycle and shut down afterwards. The WAL
+ * sender should already have been switched to WALSNDSTATE_STOPPING at
+ * this point.
+ */
static void
WalSndLastCycleHandler(SIGNAL_ARGS)
{
@@ -2851,7 +2917,7 @@ WalSndLastCycleHandler(SIGNAL_ARGS)
if (!replication_active)
kill(MyProcPid, SIGTERM);
- walsender_ready_to_stop = true;
+ got_SIGINT = true;
SetLatch(MyLatch);
errno = save_errno;
@@ -2864,14 +2930,14 @@ WalSndSignals(void)
/* Set up signal handlers */
pqsignal(SIGHUP, WalSndSigHupHandler); /* set flag to read config
* file */
- pqsignal(SIGINT, SIG_IGN); /* not used */
+ pqsignal(SIGINT, WalSndLastCycleHandler); /* request a last cycle and
+ * shutdown */
pqsignal(SIGTERM, die); /* request shutdown */
pqsignal(SIGQUIT, quickdie); /* hard crash time */
InitializeTimeouts(); /* establishes SIGALRM handler */
pqsignal(SIGPIPE, SIG_IGN);
pqsignal(SIGUSR1, WalSndXLogSendHandler); /* request WAL sending */
- pqsignal(SIGUSR2, WalSndLastCycleHandler); /* request a last cycle and
- * shutdown */
+ pqsignal(SIGUSR2, WalSndSwitchStopping); /* switch to stopping state */
/* Reset some signals that are accepted by postmaster but not here */
pqsignal(SIGCHLD, SIG_DFL);
@@ -2949,6 +3015,51 @@ WalSndWakeup(void)
}
}
+/*
+ * Wait until all the WAL senders have reached a stopping state. This is
+ * used by the checkpointer to control when shutdown checkpoints can
+ * safely begin.
+ */
+void
+WalSndWaitStop(void)
+{
+ int i;
+
+ for (;;)
+ {
+ bool all_stopped = true;
+
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ WalSndState state;
+ WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+ SpinLockAcquire(&walsnd->mutex);
+
+ if (walsnd->pid == 0)
+ {
+ SpinLockRelease(&walsnd->mutex);
+ continue;
+ }
+
+ state = walsnd->state;
+ SpinLockRelease(&walsnd->mutex);
+
+ if (state != WALSNDSTATE_STOPPING)
+ {
+ all_stopped = false;
+ break;
+ }
+ }
+
+ /* safe to leave if confirmation is done for all WAL senders */
+ if (all_stopped)
+ return;
+
+ pg_usleep(10000L); /* wait for 10 msec */
+ }
+}
+
/* Set state for current walsender (only called in walsender) */
void
WalSndSetState(WalSndState state)
@@ -2982,6 +3093,8 @@ WalSndGetStateString(WalSndState state)
return "catchup";
case WALSNDSTATE_STREAMING:
return "streaming";
+ case WALSNDSTATE_STOPPING:
+ return "stopping";
}
return "UNKNOWN";
}
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 2ca903872e..eb9cf0b0dc 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -44,6 +44,7 @@ extern void WalSndSignals(void);
extern Size WalSndShmemSize(void);
extern void WalSndShmemInit(void);
extern void WalSndWakeup(void);
+extern void WalSndWaitStop(void);
extern void WalSndRqstFileReload(void);
/*
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 2c59056cef..36311e124c 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -24,7 +24,8 @@ typedef enum WalSndState
WALSNDSTATE_STARTUP = 0,
WALSNDSTATE_BACKUP,
WALSNDSTATE_CATCHUP,
- WALSNDSTATE_STREAMING
+ WALSNDSTATE_STREAMING,
+ WALSNDSTATE_STOPPING
} WalSndState;
/*
On Tue, Apr 25, 2017 at 03:26:06PM -0400, Peter Eisentraut wrote:
On 4/20/17 11:30, Peter Eisentraut wrote:
On 4/19/17 23:04, Noah Misch wrote:
This PostgreSQL 10 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
We have a possible solution but need to work out a patch. Let's say
next check-in on Monday.
Update: We have a patch that looks promising, but we haven't made much
progress in reviewing the details. I'll work on it this week, and
perhaps Michael also has time to work on it this week. We could use
more eyes. Next check-in Friday.
This PostgreSQL 10 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
On Sat, Apr 29, 2017 at 11:18:45AM -0700, Noah Misch wrote:
On Tue, Apr 25, 2017 at 03:26:06PM -0400, Peter Eisentraut wrote:
On 4/20/17 11:30, Peter Eisentraut wrote:
On 4/19/17 23:04, Noah Misch wrote:
This PostgreSQL 10 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
We have a possible solution but need to work out a patch. Let's say
next check-in on Monday.
Update: We have a patch that looks promising, but we haven't made much
progress in reviewing the details. I'll work on it this week, and
perhaps Michael also has time to work on it this week. We could use
more eyes. Next check-in Friday.
This PostgreSQL 10 open item is past due for your status update. Kindly send
a status update within 24 hours, and include a date for your subsequent status
update. Refer to the policy on open item ownership:
/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 10 open item is long past due
for your status update. Please thoroughly reacquaint yourself with the policy
on open item ownership[1] and then reply immediately. If I do not hear from
you by 2017-05-02 01:00 UTC, I will transfer this item to release management
team ownership without further notice.
[1]: /messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
On 4/30/17 20:52, Noah Misch wrote:
IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 10 open item is long past due
for your status update. Please thoroughly reacquaint yourself with the policy
on open item ownership[1] and then reply immediately. If I do not hear from
you by 2017-05-02 01:00 UTC, I will transfer this item to release management
team ownership without further notice.
I'm reviewing this now and will report tomorrow.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 4/25/17 21:47, Michael Paquier wrote:
Attached is an updated patch to reflect that.
I edited this a bit, here is a new version.
A variant approach would be to prohibit *all* new commands after
entering the "stopping" state, just let running commands run. That way
we don't have to pick which individual commands are at risk. I'm not
sure that we have covered everything here.
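That variant would amount to a single blanket guard at the top of the
replication command dispatcher instead of per-command checks; roughly
(an illustrative sketch, not part of the attached patch):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the "prohibit all new commands" variant: once the
 * sender is in the stopping state, every new command is refused, so no
 * list of WAL-generating commands has to be maintained.  The real
 * backend would ereport(ERROR) here rather than return a boolean. */

typedef enum
{
    SND_STREAMING,
    SND_STOPPING
} SndState;

static bool
replication_command_allowed(SndState state)
{
    /* Already-running commands are unaffected; only new dispatches
     * reach this check. */
    return state != SND_STOPPING;
}
```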
More reviews please. Also, this is a possible backpatching candidate.
Also, if someone has a test script for reproducing the original issue,
please share it, or run it against this patch.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v3-0001-Prevent-panic-during-shutdown-checkpoint.patch
From a58f96305a7228aaa2da06813bfb30f293c77095 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 1 May 2017 15:09:06 -0400
Subject: [PATCH v3] Prevent panic during shutdown checkpoint
When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and throws
a PANIC if so. At that point, only walsenders are still active, so one
might think this could not happen, since walsenders cannot generate WAL,
but for instance certain variants of the CREATE_REPLICATION_SLOT
replication command do generate WAL and can trigger this if run while
the shutdown checkpoint is being written.
To fix this, divide the walsender shutdown into two phases. First, the
postmaster sends a SIGUSR2 signal to all walsenders. The walsenders
then put themselves into the "stopping" state. In this state, they
reject any commands that might generate WAL. The checkpointer waits for
all walsenders to reach this state before proceeding with the shutdown
checkpoint. After the shutdown checkpoint is done, the postmaster sends
SIGINT (previously unused) to the walsenders. This triggers the
existing shutdown behavior of sending out the shutdown checkpoint record
and then terminating.
Author: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
---
doc/src/sgml/monitoring.sgml | 5 +
src/backend/access/transam/xlog.c | 6 ++
src/backend/postmaster/postmaster.c | 7 +-
src/backend/replication/walsender.c | 152 ++++++++++++++++++++++++----
src/include/replication/walsender.h | 1 +
src/include/replication/walsender_private.h | 3 +-
6 files changed, 150 insertions(+), 24 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 2a83671b53..80d12b26d7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1690,6 +1690,11 @@ <title><structname>pg_stat_replication</structname> View</title>
<literal>backup</>: This WAL sender is sending a backup.
</para>
</listitem>
+ <listitem>
+ <para>
+ <literal>stopping</>: This WAL sender is stopping.
+ </para>
+ </listitem>
</itemizedlist>
</entry>
</row>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a89d99838a..5d6f8b75b8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8325,6 +8325,12 @@ ShutdownXLOG(int code, Datum arg)
ereport(IsPostmasterEnvironment ? LOG : NOTICE,
(errmsg("shutting down")));
+ /*
+ * Wait for WAL senders to be in stopping state. This prevents commands
+ * from writing new WAL.
+ */
+ WalSndWaitStopping();
+
if (RecoveryInProgress())
CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
else
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 4a25ed8f5b..01f1c2805f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2918,7 +2918,7 @@ reaper(SIGNAL_ARGS)
* Waken walsenders for the last time. No regular backends
* should be around anymore.
*/
- SignalChildren(SIGUSR2);
+ SignalChildren(SIGINT);
pmState = PM_SHUTDOWN_2;
@@ -3656,7 +3656,9 @@ PostmasterStateMachine(void)
/*
* If we get here, we are proceeding with normal shutdown. All
* the regular children are gone, and it's time to tell the
- * checkpointer to do a shutdown checkpoint.
+ * checkpointer to do a shutdown checkpoint. All WAL senders
+ * are told to switch to a stopping state so that the shutdown
+ * checkpoint can go ahead.
*/
Assert(Shutdown > NoShutdown);
/* Start the checkpointer if not running */
@@ -3665,6 +3667,7 @@ PostmasterStateMachine(void)
/* And tell it to shut down */
if (CheckpointerPID != 0)
{
+ SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
signal_child(CheckpointerPID, SIGUSR2);
pmState = PM_SHUTDOWN;
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2a6c8bb62d..7ac6ca514c 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -24,11 +24,14 @@
* are treated as not a crash but approximately normal termination;
* the walsender will exit quickly without sending any more XLOG records.
*
- * If the server is shut down, postmaster sends us SIGUSR2 after all
- * regular backends have exited and the shutdown checkpoint has been written.
- * This instructs walsender to send any outstanding WAL, including the
- * shutdown checkpoint record, wait for it to be replicated to the standby,
- * and then exit.
+ * If the server is shut down, postmaster sends us SIGUSR2 after all regular
+ * backends have exited. This causes the walsender to switch to the "stopping"
+ * state. In this state, the walsender will reject any replication command
+ * that may generate WAL activity. The checkpointer begins the shutdown
+ * checkpoint once all walsenders are confirmed as stopping. When the shutdown
+ * checkpoint finishes, the postmaster sends us SIGINT. This instructs
+ * walsender to send any outstanding WAL, including the shutdown checkpoint
+ * record, wait for it to be replicated to the standby, and then exit.
*
*
* Portions Copyright (c) 2010-2017, PostgreSQL Global Development Group
@@ -177,13 +180,14 @@ static bool WalSndCaughtUp = false;
/* Flags set by signal handlers for later service in main loop */
static volatile sig_atomic_t got_SIGHUP = false;
-static volatile sig_atomic_t walsender_ready_to_stop = false;
+static volatile sig_atomic_t got_SIGINT = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
/*
- * This is set while we are streaming. When not set, SIGUSR2 signal will be
+ * This is set while we are streaming. When not set, SIGINT signal will be
* handled like SIGTERM. When set, the main loop is responsible for checking
- * walsender_ready_to_stop and terminating when it's set (after streaming any
- * remaining WAL).
+ * got_SIGINT and terminating when it's set (after streaming any remaining
+ * WAL).
*/
static volatile sig_atomic_t replication_active = false;
@@ -213,6 +217,7 @@ static struct
/* Signal handlers */
static void WalSndSigHupHandler(SIGNAL_ARGS);
static void WalSndXLogSendHandler(SIGNAL_ARGS);
+static void WalSndSwitchStopping(SIGNAL_ARGS);
static void WalSndLastCycleHandler(SIGNAL_ARGS);
/* Prototypes for private functions */
@@ -299,11 +304,14 @@ WalSndErrorCleanup(void)
ReplicationSlotCleanup();
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
/* Revert back to startup state */
WalSndSetState(WALSNDSTATE_STARTUP);
+
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
}
/*
@@ -324,6 +332,21 @@ WalSndShutdown(void)
}
/*
+ * Throw error if in stopping mode.
+ *
+ * This is useful to prevent commands that could generate WAL while the
+ * shutdown checkpoint is being written.
+ */
+static void
+PreventCommandIfStopping(const char *cmdname)
+{
+ if (MyWalSnd->state == WALSNDSTATE_STOPPING)
+ ereport(ERROR,
+ (errmsg("cannot execute %s while WAL sender is in stopping mode",
+ cmdname)));
+}
+
+/*
* Handle the IDENTIFY_SYSTEM command.
*/
static void
@@ -676,7 +699,7 @@ StartReplication(StartReplicationCmd *cmd)
WalSndLoop(XLogSendPhysical);
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -844,6 +867,8 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
Assert(!MyReplicationSlot);
+ PreventCommandIfStopping("CREATE_REPLICATION_SLOT");
+
parseCreateReplSlotOptions(cmd, &reserve_wal, &snapshot_action);
/* setup state for XLogReadPage */
@@ -1024,6 +1049,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
static void
DropReplicationSlot(DropReplicationSlotCmd *cmd)
{
+ PreventCommandIfStopping("DROP_REPLICATION_SLOT");
ReplicationSlotDrop(cmd->slotname);
EndCommand("DROP_REPLICATION_SLOT", DestRemote);
}
@@ -1053,7 +1079,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
{
ereport(LOG,
(errmsg("terminating walsender process after promotion")));
- walsender_ready_to_stop = true;
+ got_SIGINT = true;
}
WalSndSetState(WALSNDSTATE_CATCHUP);
@@ -1103,7 +1129,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
ReplicationSlotRelease();
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -1291,6 +1317,14 @@ WalSndWaitForWal(XLogRecPtr loc)
RecentFlushPtr = GetXLogReplayRecPtr(NULL);
/*
+ * If postmaster asked us to switch to the stopping state, do so.
+ * Shutdown is in progress and this will allow the checkpointer to
+ * move on with the shutdown checkpoint.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
* If postmaster asked us to stop, don't wait here anymore. This will
* cause the xlogreader to return without reading a full record, which
* is the fastest way to reach the mainloop which then can quit.
@@ -1299,7 +1333,7 @@ WalSndWaitForWal(XLogRecPtr loc)
* RecentFlushPtr, so we can send all remaining data before shutting
* down.
*/
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
break;
/*
@@ -1374,6 +1408,13 @@ exec_replication_command(const char *cmd_string)
MemoryContext old_context;
/*
+ * If WAL sender has been told that shutdown is getting close, switch its
+ * status accordingly to handle the next replication commands correctly.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
* CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
* command arrives. Clean up the old stuff if there's anything.
*/
@@ -2095,13 +2136,20 @@ WalSndLoop(WalSndSendDataCallback send_data)
}
/*
- * When SIGUSR2 arrives, we send any outstanding logs up to the
+ * At the reception of SIGUSR2, switch the WAL sender to the stopping
+ * state.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
+ * When SIGINT arrives, we send any outstanding logs up to the
* shutdown checkpoint record (i.e., the latest record), wait for
* them to be replicated to the standby, and exit. This may be a
* normal termination at shutdown, or a promotion, the walsender
* is not sure which.
*/
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
WalSndDone(send_data);
}
@@ -2841,7 +2889,23 @@ WalSndXLogSendHandler(SIGNAL_ARGS)
errno = save_errno;
}
-/* SIGUSR2: set flag to do a last cycle and shut down afterwards */
+/* SIGUSR2: set flag to switch to stopping state */
+static void
+WalSndSwitchStopping(SIGNAL_ARGS)
+{
+ int save_errno = errno;
+
+ got_SIGUSR2 = true;
+ SetLatch(MyLatch);
+
+ errno = save_errno;
+}
+
+/*
+ * SIGINT: set flag to do a last cycle and shut down afterwards. The WAL
+ * sender should already have been switched to WALSNDSTATE_STOPPING at
+ * this point.
+ */
static void
WalSndLastCycleHandler(SIGNAL_ARGS)
{
@@ -2856,7 +2920,7 @@ WalSndLastCycleHandler(SIGNAL_ARGS)
if (!replication_active)
kill(MyProcPid, SIGTERM);
- walsender_ready_to_stop = true;
+ got_SIGINT = true;
SetLatch(MyLatch);
errno = save_errno;
@@ -2869,14 +2933,14 @@ WalSndSignals(void)
/* Set up signal handlers */
pqsignal(SIGHUP, WalSndSigHupHandler); /* set flag to read config
* file */
- pqsignal(SIGINT, SIG_IGN); /* not used */
+ pqsignal(SIGINT, WalSndLastCycleHandler); /* request a last cycle and
+ * shutdown */
pqsignal(SIGTERM, die); /* request shutdown */
pqsignal(SIGQUIT, quickdie); /* hard crash time */
InitializeTimeouts(); /* establishes SIGALRM handler */
pqsignal(SIGPIPE, SIG_IGN);
pqsignal(SIGUSR1, WalSndXLogSendHandler); /* request WAL sending */
- pqsignal(SIGUSR2, WalSndLastCycleHandler); /* request a last cycle and
- * shutdown */
+ pqsignal(SIGUSR2, WalSndSwitchStopping); /* switch to stopping state */
/* Reset some signals that are accepted by postmaster but not here */
pqsignal(SIGCHLD, SIG_DFL);
@@ -2954,6 +3018,50 @@ WalSndWakeup(void)
}
}
+/*
+ * Wait until all the WAL senders have reached the stopping state. This is
+ * used by the checkpointer to control when shutdown checkpoints can
+ * safely begin.
+ */
+void
+WalSndWaitStopping(void)
+{
+ for (;;)
+ {
+ int i;
+ bool all_stopped = true;
+
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ WalSndState state;
+ WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+ SpinLockAcquire(&walsnd->mutex);
+
+ if (walsnd->pid == 0)
+ {
+ SpinLockRelease(&walsnd->mutex);
+ continue;
+ }
+
+ state = walsnd->state;
+ SpinLockRelease(&walsnd->mutex);
+
+ if (state != WALSNDSTATE_STOPPING)
+ {
+ all_stopped = false;
+ break;
+ }
+ }
+
+ /* safe to leave if confirmation is done for all WAL senders */
+ if (all_stopped)
+ return;
+
+ pg_usleep(10000L); /* wait for 10 msec */
+ }
+}
+
/* Set state for current walsender (only called in walsender) */
void
WalSndSetState(WalSndState state)
@@ -2987,6 +3095,8 @@ WalSndGetStateString(WalSndState state)
return "catchup";
case WALSNDSTATE_STREAMING:
return "streaming";
+ case WALSNDSTATE_STOPPING:
+ return "stopping";
}
return "UNKNOWN";
}
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 2ca903872e..99f12377e0 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -44,6 +44,7 @@ extern void WalSndSignals(void);
extern Size WalSndShmemSize(void);
extern void WalSndShmemInit(void);
extern void WalSndWakeup(void);
+extern void WalSndWaitStopping(void);
extern void WalSndRqstFileReload(void);
/*
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 2c59056cef..36311e124c 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -24,7 +24,8 @@ typedef enum WalSndState
WALSNDSTATE_STARTUP = 0,
WALSNDSTATE_BACKUP,
WALSNDSTATE_CATCHUP,
- WALSNDSTATE_STREAMING
+ WALSNDSTATE_STREAMING,
+ WALSNDSTATE_STOPPING
} WalSndState;
/*
--
2.12.2
On Tue, May 2, 2017 at 7:07 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 4/25/17 21:47, Michael Paquier wrote:
Attached is an updated patch to reflect that.
I edited this a bit, here is a new version.
Thanks, looks fine to me.
A variant approach would be to prohibit *all* new commands after
entering the "stopping" state, just let running commands run. That way
we don't have to pick which individual commands are at risk. I'm not
sure that we have covered everything here.
It seems to me that everything is covered. We are talking about
creation and dropping of slots here, where standby snapshots can be
created and SQL queries can be run when doing a tablesync, meaning that
FPWs could be taken in the context of the WAL sender. Blocking all
commands would surely be safer, I agree, but I see no reason to block
things more than necessary.
--
Michael
On 02/05/17 05:35, Michael Paquier wrote:
On Tue, May 2, 2017 at 7:07 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 4/25/17 21:47, Michael Paquier wrote:
Attached is an updated patch to reflect that.
I edited this a bit, here is a new version.
Thanks, looks fine for me.
A variant approach would be to prohibit *all* new commands after
entering the "stopping" state, just let running commands run. That way
we don't have to pick which individual commands are at risk. I'm not
sure that we have covered everything here.
It seems to me that everything is covered. We are talking about
creation and dropping of slots here, where standby snapshots can be
created and SQL queries can be run when doing a tablesync, meaning that
FPWs could be taken in the context of the WAL sender. Blocking all
commands would surely be safer, I agree, but I see no reason to block
things more than necessary.
I don't think the code covers all because a) the SQL queries are not
covered at all that I can see and b) logical decoding can theoretically
do HOT pruning (even if the chance is really small) so it's not safe to
start logical replication either.
I wonder if this whole prevention should just be applied
unconditionally to any walsender that's connected to a database
(am_db_walsender), because WAL logging will only happen there -
CREATE_REPLICATION_SLOT PHYSICAL will not write WAL, and
CREATE_REPLICATION_SLOT LOGICAL can't be run without being connected to
a db, nor can logical decoding and SQL queries, so that covers all.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, May 2, 2017 at 4:11 PM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
On 02/05/17 05:35, Michael Paquier wrote:
On Tue, May 2, 2017 at 7:07 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 4/25/17 21:47, Michael Paquier wrote:
Attached is an updated patch to reflect that.
I edited this a bit, here is a new version.
Thanks, looks fine for me.
A variant approach would be to prohibit *all* new commands after
entering the "stopping" state, just let running commands run. That way
we don't have to pick which individual commands are at risk. I'm not
sure that we have covered everything here.
It seems to me that everything is covered. We are talking about
creation and dropping of slots here, where standby snapshots can be
created and SQL queries can be run when doing a tablesync, meaning that
FPWs could be taken in the context of the WAL sender. Blocking all
commands would surely be safer, I agree, but I see no reason to block
things more than necessary.
I don't think the code covers all because a) the SQL queries are not
covered at all that I can see and b) logical decoding can theoretically
do HOT pruning (even if the chance is really small) so it's not safe to
start logical replication either.
Ahhh. So START_REPLICATION can also now generate WAL. Good to know.
--
Michael
On 5/2/17 03:43, Michael Paquier wrote:
I don't think the code covers all because a) the SQL queries are not
covered at all that I can see and b) logical decoding can theoretically
do HOT pruning (even if the chance is really small) so it's not safe to
start logical replication either.
Ahhh. So START_REPLICATION can also now generate WAL. Good to know.
And just looking at pg_current_wal_location(), running BASE_BACKUP also
advances the WAL.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 5/2/17 03:11, Petr Jelinek wrote:
logical decoding can theoretically
do HOT pruning (even if the chance is really small) so it's not safe to
start logical replication either.
This seems a bit impossible to resolve. On the one hand, we want to
allow streaming until after the shutdown checkpoint. On the other hand,
streaming itself might produce new WAL.
Can we prevent HOT pruning during logical decoding?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, May 2, 2017 at 9:27 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 5/2/17 03:43, Michael Paquier wrote:
I don't think the code covers all because a) the SQL queries are not
covered at all that I can see and b) logical decoding can theoretically
do HOT pruning (even if the chance is really small) so it's not safe to
start logical replication either.
Ahhh. So START_REPLICATION can also now generate WAL. Good to know.
And just looking at pg_current_wal_location(), running BASE_BACKUP also
advances the WAL.
Indeed. I forgot the backup end record and the segment switch. We are
good for a backpatch down to 9.2 here.
--
Michael
On Tue, May 2, 2017 at 9:30 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 5/2/17 03:11, Petr Jelinek wrote:
logical decoding can theoretically
do HOT pruning (even if the chance is really small) so it's not safe to
start logical replication either.
This seems a bit impossible to resolve. On the one hand, we want to
allow streaming until after the shutdown checkpoint. On the other hand,
streaming itself might produce new WAL.
It would be nice to split things into two:
- patch 1 adding the signal handling that wins a backpatch.
- patch 2 fixing the side cases with logical decoding.
Can we prevent HOT pruning during logical decoding?
It does not sound too difficult to do, couldn't you just make it a
no-op with am_walsender?
--
Michael
On 5/2/17 10:08, Michael Paquier wrote:
On Tue, May 2, 2017 at 9:30 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 5/2/17 03:11, Petr Jelinek wrote:
logical decoding can theoretically
do HOT pruning (even if the chance is really small) so it's not safe to
start logical replication either.
This seems a bit impossible to resolve. On the one hand, we want to
allow streaming until after the shutdown checkpoint. On the other hand,
streaming itself might produce new WAL.
It would be nice to split things into two:
- patch 1 adding the signal handling that wins a backpatch.
- patch 2 fixing the side cases with logical decoding.
The side cases with logical decoding are also not new and would need
backpatching, AIUI.
Can we prevent HOT pruning during logical decoding?
It does not sound too difficult to do, couldn't you just make it a
no-op with am_walsender?
That's my hope.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 5/1/17 13:43, Peter Eisentraut wrote:
On 4/30/17 20:52, Noah Misch wrote:
IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 10 open item is long past due
for your status update. Please thoroughly reacquaint yourself with the policy
on open item ownership[1] and then reply immediately. If I do not hear from
you by 2017-05-02 01:00 UTC, I will transfer this item to release management
team ownership without further notice.
I'm reviewing this now and will report tomorrow.
We have consensus around a patch, but we are still discussing and
discovering some new details.
There is also the question of whether to backpatch and how far. (Could
be all the way to 9.2.)
I propose, if there are no new insights by Friday, I will commit the
current patch to master, which will fix the reported problem for PG10,
and punt the remaining side issues to "Older Bugs".
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, May 3, 2017 at 12:25 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 5/2/17 10:08, Michael Paquier wrote:
On Tue, May 2, 2017 at 9:30 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 5/2/17 03:11, Petr Jelinek wrote:
logical decoding can theoretically
do HOT pruning (even if the chance is really small) so it's not safe to
start logical replication either.
This seems a bit impossible to resolve. On the one hand, we want to
allow streaming until after the shutdown checkpoint. On the other hand,
streaming itself might produce new WAL.
It would be nice to split things into two:
- patch 1 adding the signal handling that wins a backpatch.
- patch 2 fixing the side cases with logical decoding.
The side cases with logical decoding are also not new and would need
backpatching, AIUI.
Okay, I thought this was a new issue introduced as part of logical
replication.
Can we prevent HOT pruning during logical decoding?
It does not sound too difficult to do, couldn't you just make it a
no-op with am_walsender?
That's my hope.
The only code path doing HOT-pruning and generating WAL is
heap_page_prune(). Do you think that we need to worry about FPWs as
well?
Attached is an updated patch, which also forbids running any
replication command once the stopping state is reached.
--
Michael
Attachments:
v4-0001-Prevent-panic-during-shutdown-checkpoint.patch (text/x-patch)
From c8a44b6f84926712b7b6f2b36b4f13b0d1b41977 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Fri, 5 May 2017 14:10:16 +0900
Subject: [PATCH] Prevent panic during shutdown checkpoint
When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and throws
a PANIC if so. At that point, only walsenders are still active, so one
might think this could not happen, but WAL senders can generate WAL
in the context of certain replication commands that can be run during
the shutdown checkpoint:
- certain variants of CREATE_REPLICATION_SLOT.
- BASE_BACKUP and the backup-end WAL record.
- START_REPLICATION and logical decoding, able to do HOT-pruning.
To fix this, divide the walsender shutdown into two phases. First, the
postmaster sends a SIGUSR2 signal to all walsenders. The walsenders
then put themselves into the "stopping" state. In this state, they
reject any commands that might generate WAL. The checkpointer waits for
all walsenders to reach this state before proceeding with the shutdown
checkpoint. After the shutdown checkpoint is done, the postmaster sends
SIGINT (previously unused) to the walsenders. This triggers the
existing shutdown behavior of sending out the shutdown checkpoint
record and then terminating.
Author: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
---
doc/src/sgml/monitoring.sgml | 5 +
src/backend/access/heap/pruneheap.c | 9 ++
src/backend/access/transam/xlog.c | 6 ++
src/backend/postmaster/postmaster.c | 7 +-
src/backend/replication/walsender.c | 143 ++++++++++++++++++++++++----
src/include/replication/walsender.h | 1 +
src/include/replication/walsender_private.h | 3 +-
7 files changed, 150 insertions(+), 24 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 2a83671b53..80d12b26d7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1690,6 +1690,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<literal>backup</>: This WAL sender is sending a backup.
</para>
</listitem>
+ <listitem>
+ <para>
+ <literal>stopping</>: This WAL sender is stopping.
+ </para>
+ </listitem>
</itemizedlist>
</entry>
</row>
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d69a266c36..d510649b18 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -22,6 +22,7 @@
#include "catalog/catalog.h"
#include "miscadmin.h"
#include "pgstat.h"
+#include "replication/walsender.h"
#include "storage/bufmgr.h"
#include "utils/snapmgr.h"
#include "utils/rel.h"
@@ -189,6 +190,14 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
PruneState prstate;
/*
+ * Do nothing in the presence of a WAL sender. This code path can be
+ * taken when doing logical decoding, and it is better to avoid WAL
+ * generation as this interferes with shutdown checkpoints.
+ */
+ if (am_walsender)
+ return ndeleted;
+
+ /*
* Our strategy is to scan the page and make lists of items to change,
* then apply the changes within a critical section. This keeps as much
* logic as possible out of the critical section, and also ensures that
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a89d99838a..5d6f8b75b8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8325,6 +8325,12 @@ ShutdownXLOG(int code, Datum arg)
ereport(IsPostmasterEnvironment ? LOG : NOTICE,
(errmsg("shutting down")));
+ /*
+ * Wait for WAL senders to be in stopping state. This prevents commands
+ * from writing new WAL.
+ */
+ WalSndWaitStopping();
+
if (RecoveryInProgress())
CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
else
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 4a25ed8f5b..01f1c2805f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2918,7 +2918,7 @@ reaper(SIGNAL_ARGS)
* Waken walsenders for the last time. No regular backends
* should be around anymore.
*/
- SignalChildren(SIGUSR2);
+ SignalChildren(SIGINT);
pmState = PM_SHUTDOWN_2;
@@ -3656,7 +3656,9 @@ PostmasterStateMachine(void)
/*
* If we get here, we are proceeding with normal shutdown. All
* the regular children are gone, and it's time to tell the
- * checkpointer to do a shutdown checkpoint.
+ * checkpointer to do a shutdown checkpoint. All WAL senders
+ * are told to switch to a stopping state so that the shutdown
+ * checkpoint can go ahead.
*/
Assert(Shutdown > NoShutdown);
/* Start the checkpointer if not running */
@@ -3665,6 +3667,7 @@ PostmasterStateMachine(void)
/* And tell it to shut down */
if (CheckpointerPID != 0)
{
+ SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
signal_child(CheckpointerPID, SIGUSR2);
pmState = PM_SHUTDOWN;
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2a6c8bb62d..bcc024fec2 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -24,11 +24,14 @@
* are treated as not a crash but approximately normal termination;
* the walsender will exit quickly without sending any more XLOG records.
*
- * If the server is shut down, postmaster sends us SIGUSR2 after all
- * regular backends have exited and the shutdown checkpoint has been written.
- * This instructs walsender to send any outstanding WAL, including the
- * shutdown checkpoint record, wait for it to be replicated to the standby,
- * and then exit.
+ * If the server is shut down, postmaster sends us SIGUSR2 after all regular
+ * backends have exited. This causes the walsender to switch to the "stopping"
+ * state. In this state, the walsender will reject any replication command
+ * that may generate WAL activity. The checkpointer begins the shutdown
+ * checkpoint once all walsenders are confirmed as stopping. When the shutdown
+ * checkpoint finishes, the postmaster sends us SIGINT. This instructs
+ * walsender to send any outstanding WAL, including the shutdown checkpoint
+ * record, wait for it to be replicated to the standby, and then exit.
*
*
* Portions Copyright (c) 2010-2017, PostgreSQL Global Development Group
@@ -177,13 +180,14 @@ static bool WalSndCaughtUp = false;
/* Flags set by signal handlers for later service in main loop */
static volatile sig_atomic_t got_SIGHUP = false;
-static volatile sig_atomic_t walsender_ready_to_stop = false;
+static volatile sig_atomic_t got_SIGINT = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
/*
- * This is set while we are streaming. When not set, SIGUSR2 signal will be
+ * This is set while we are streaming. When not set, SIGINT signal will be
* handled like SIGTERM. When set, the main loop is responsible for checking
- * walsender_ready_to_stop and terminating when it's set (after streaming any
- * remaining WAL).
+ * got_SIGINT and terminating when it's set (after streaming any remaining
+ * WAL).
*/
static volatile sig_atomic_t replication_active = false;
@@ -213,6 +217,7 @@ static struct
/* Signal handlers */
static void WalSndSigHupHandler(SIGNAL_ARGS);
static void WalSndXLogSendHandler(SIGNAL_ARGS);
+static void WalSndSwitchStopping(SIGNAL_ARGS);
static void WalSndLastCycleHandler(SIGNAL_ARGS);
/* Prototypes for private functions */
@@ -299,11 +304,14 @@ WalSndErrorCleanup(void)
ReplicationSlotCleanup();
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
/* Revert back to startup state */
WalSndSetState(WALSNDSTATE_STARTUP);
+
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
}
/*
@@ -323,6 +331,7 @@ WalSndShutdown(void)
abort(); /* keep the compiler quiet */
}
+
/*
* Handle the IDENTIFY_SYSTEM command.
*/
@@ -676,7 +685,7 @@ StartReplication(StartReplicationCmd *cmd)
WalSndLoop(XLogSendPhysical);
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -1053,7 +1062,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
{
ereport(LOG,
(errmsg("terminating walsender process after promotion")));
- walsender_ready_to_stop = true;
+ got_SIGINT = true;
}
WalSndSetState(WALSNDSTATE_CATCHUP);
@@ -1103,7 +1112,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
ReplicationSlotRelease();
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -1291,6 +1300,14 @@ WalSndWaitForWal(XLogRecPtr loc)
RecentFlushPtr = GetXLogReplayRecPtr(NULL);
/*
+ * If postmaster asked us to switch to the stopping state, do so.
+ * Shutdown is in progress and this will allow the checkpointer to
+ * move on with the shutdown checkpoint.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
* If postmaster asked us to stop, don't wait here anymore. This will
* cause the xlogreader to return without reading a full record, which
* is the fastest way to reach the mainloop which then can quit.
@@ -1299,7 +1316,7 @@ WalSndWaitForWal(XLogRecPtr loc)
* RecentFlushPtr, so we can send all remaining data before shutting
* down.
*/
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
break;
/*
@@ -1374,6 +1391,21 @@ exec_replication_command(const char *cmd_string)
MemoryContext old_context;
/*
+ * If WAL sender has been told that shutdown is getting close, switch its
+ * status accordingly to handle the next replication commands correctly.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
+ * Throw error if in stopping mode. This is useful to prevent commands
+ * that could generate WAL while the shutdown checkpoint is being written.
+ */
+ if (MyWalSnd->state == WALSNDSTATE_STOPPING)
+ ereport(ERROR,
+ (errmsg("cannot execute replication command while WAL sender is in stopping mode")));
+
+ /*
* CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
* command arrives. Clean up the old stuff if there's anything.
*/
@@ -2095,13 +2127,20 @@ WalSndLoop(WalSndSendDataCallback send_data)
}
/*
- * When SIGUSR2 arrives, we send any outstanding logs up to the
+ * At the reception of SIGUSR2, switch the WAL sender to the stopping
+ * state.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
+ * When SIGINT arrives, we send any outstanding logs up to the
* shutdown checkpoint record (i.e., the latest record), wait for
* them to be replicated to the standby, and exit. This may be a
* normal termination at shutdown, or a promotion, the walsender
* is not sure which.
*/
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
WalSndDone(send_data);
}
@@ -2841,7 +2880,23 @@ WalSndXLogSendHandler(SIGNAL_ARGS)
errno = save_errno;
}
-/* SIGUSR2: set flag to do a last cycle and shut down afterwards */
+/* SIGUSR2: set flag to switch to stopping state */
+static void
+WalSndSwitchStopping(SIGNAL_ARGS)
+{
+ int save_errno = errno;
+
+ got_SIGUSR2 = true;
+ SetLatch(MyLatch);
+
+ errno = save_errno;
+}
+
+/*
+ * SIGINT: set flag to do a last cycle and shut down afterwards. The WAL
+ * sender should already have been switched to WALSNDSTATE_STOPPING at
+ * this point.
+ */
static void
WalSndLastCycleHandler(SIGNAL_ARGS)
{
@@ -2856,7 +2911,7 @@ WalSndLastCycleHandler(SIGNAL_ARGS)
if (!replication_active)
kill(MyProcPid, SIGTERM);
- walsender_ready_to_stop = true;
+ got_SIGINT = true;
SetLatch(MyLatch);
errno = save_errno;
@@ -2869,14 +2924,14 @@ WalSndSignals(void)
/* Set up signal handlers */
pqsignal(SIGHUP, WalSndSigHupHandler); /* set flag to read config
* file */
- pqsignal(SIGINT, SIG_IGN); /* not used */
+ pqsignal(SIGINT, WalSndLastCycleHandler); /* request a last cycle and
+ * shutdown */
pqsignal(SIGTERM, die); /* request shutdown */
pqsignal(SIGQUIT, quickdie); /* hard crash time */
InitializeTimeouts(); /* establishes SIGALRM handler */
pqsignal(SIGPIPE, SIG_IGN);
pqsignal(SIGUSR1, WalSndXLogSendHandler); /* request WAL sending */
- pqsignal(SIGUSR2, WalSndLastCycleHandler); /* request a last cycle and
- * shutdown */
+ pqsignal(SIGUSR2, WalSndSwitchStopping); /* switch to stopping state */
/* Reset some signals that are accepted by postmaster but not here */
pqsignal(SIGCHLD, SIG_DFL);
@@ -2954,6 +3009,50 @@ WalSndWakeup(void)
}
}
+/*
+ * Wait until all the WAL senders have reached the stopping state. This is
+ * used by the checkpointer to control when shutdown checkpoints can
+ * safely begin.
+ */
+void
+WalSndWaitStopping(void)
+{
+ for (;;)
+ {
+ int i;
+ bool all_stopped = true;
+
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ WalSndState state;
+ WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+ SpinLockAcquire(&walsnd->mutex);
+
+ if (walsnd->pid == 0)
+ {
+ SpinLockRelease(&walsnd->mutex);
+ continue;
+ }
+
+ state = walsnd->state;
+ SpinLockRelease(&walsnd->mutex);
+
+ if (state != WALSNDSTATE_STOPPING)
+ {
+ all_stopped = false;
+ break;
+ }
+ }
+
+ /* safe to leave if confirmation is done for all WAL senders */
+ if (all_stopped)
+ return;
+
+ pg_usleep(10000L); /* wait for 10 msec */
+ }
+}
+
/* Set state for current walsender (only called in walsender) */
void
WalSndSetState(WalSndState state)
@@ -2987,6 +3086,8 @@ WalSndGetStateString(WalSndState state)
return "catchup";
case WALSNDSTATE_STREAMING:
return "streaming";
+ case WALSNDSTATE_STOPPING:
+ return "stopping";
}
return "UNKNOWN";
}
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 2ca903872e..99f12377e0 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -44,6 +44,7 @@ extern void WalSndSignals(void);
extern Size WalSndShmemSize(void);
extern void WalSndShmemInit(void);
extern void WalSndWakeup(void);
+extern void WalSndWaitStopping(void);
extern void WalSndRqstFileReload(void);
/*
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 2c59056cef..36311e124c 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -24,7 +24,8 @@ typedef enum WalSndState
WALSNDSTATE_STARTUP = 0,
WALSNDSTATE_BACKUP,
WALSNDSTATE_CATCHUP,
- WALSNDSTATE_STREAMING
+ WALSNDSTATE_STREAMING,
+ WALSNDSTATE_STOPPING
} WalSndState;
/*
--
2.12.2
On Fri, May 5, 2017 at 10:56 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:
On Wed, May 3, 2017 at 12:25 AM, Peter Eisentraut
Can we prevent HOT pruning during logical decoding?
It does not sound too difficult to do, couldn't you just make it a
no-op with am_walsender?
That's my hope.
The only code path doing HOT-pruning and generating WAL is
heap_page_prune(). Do you think that we need to worry about FPWs as
well?
IMO the check should go inside heap_page_prune_opt(). Do we need to worry
about wal_log_hints or checksums producing WAL because of hint bit updates?
While I haven't read the thread, I am assuming if HOT pruning can happen,
surely hint bits can get set too.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, May 5, 2017 at 5:33 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
On Fri, May 5, 2017 at 10:56 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:
On Wed, May 3, 2017 at 12:25 AM, Peter Eisentraut
Can we prevent HOT pruning during logical decoding?
It does not sound too difficult to do, couldn't you just make it a
no-op with am_walsender?
That's my hope.
The only code path doing HOT-pruning and generating WAL is
heap_page_prune(). Do you think that we need to worry about FPWs as
well?
IMO the check should go inside heap_page_prune_opt(). Do we need to worry
about wal_log_hints or checksums producing WAL because of hint bit updates?
While I haven't read the thread, I am assuming if HOT pruning can happen,
surely hint bits can get set too.
Yeah, that's what I am worried about as well. Experts of logical
decoding will correct me, but it seems to me that we have to cover all
the cases where heap scans can generate WAL.
--
Michael
On 5/5/17 01:26, Michael Paquier wrote:
The only code path doing HOT-pruning and generating WAL is
heap_page_prune(). Do you think that we need to worry about FPWs as
well?
Attached is an updated patch, which also forbids running any
replication command once the stopping state is reached.
I have committed this without the HOT pruning change. That can be
considered separately, and I think it could use another round of
thinking about it.
I will move the open item to "Older Bugs" now, because the user
experience regression, so to speak, in version 10 has been addressed.
(This could be a backpatching candidate, but I am not planning on it for
next week's releases in any case.)
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, May 5, 2017 at 11:50 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 5/5/17 01:26, Michael Paquier wrote:
The only code path doing HOT-pruning and generating WAL is
heap_page_prune(). Do you think that we need to worry about FPWs as
well?
Attached is an updated patch, which also forbids running any
replication command once the stopping state is reached.
I have committed this without the HOT pruning change. That can be
considered separately, and I think it could use another round of
thinking about it.
Agreed. Just adding an ERROR message in XLogInsert() is not going to
help much, as this also leads to a PANIC inside critical sections :(
So a patch really needs to be a no-op for all WAL-related operations
within the WAL sender, and that will be quite invasive, I am afraid.
I will move the open item to "Older Bugs" now, because the user
experience regression, so to speak, in version 10 has been addressed.
(This could be a backpatching candidate, but I am not planning on it for
next week's releases in any case.)
No issues with all that.
--
Michael
On Sat, May 6, 2017 at 6:40 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Agreed. Just adding an ERROR message in XLogInsert() is not going to
help much as this leads also to PANIC for critical sections :(
So a patch really needs to be a no-op for all WAL-related operations
within the WAL sender, and that will be quite invasive, I am afraid.
I will move the open item to "Older Bugs" now, because the user
experience regression, so to speak, in version 10 has been addressed.
(This could be a backpatching candidate, but I am not planning on it for
next week's releases in any case.)
No issues with all that.
So, now that the last round of minor releases has happened and that
some dust has settled on this patch, shouldn't there be a backpatch?
If yes, do you need patches for all branches? This problem goes down
to 9.2 anyway as BASE_BACKUP can generate end-of-backup records.
--
Michael
On 5/26/17 14:16, Michael Paquier wrote:
So, now that the last round of minor releases has happened and that
some dust has settled on this patch, shouldn't there be a backpatch?
If yes, do you need patches for all branches? This problem goes down
to 9.2 anyway as BASE_BACKUP can generate end-of-backup records.
Yes, this could be backpatched now. It looks like it will need a bit of
fiddling to get it into all the backbranches. If you want to give it a
closer look, go ahead please.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, May 26, 2017 at 4:47 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 5/26/17 14:16, Michael Paquier wrote:
So, now that the last round of minor releases has happened and that
some dust has settled on this patch, shouldn't there be a backpatch?
If yes, do you need patches for all branches? This problem goes down
to 9.2 anyway as BASE_BACKUP can generate end-of-backup records.
Yes, this could be backpatched now. It looks like it will need a bit of
fiddling to get it into all the backbranches. If you want to give it a
closer look, go ahead please.
Attached are patches for 9.2~9.6. There are a couple of conflicts
across each version. In particular, for 9.2 I chose not to rename
walsender_ready_to_stop to got_SIGINT, as that flag is also used in
basebackup.c and keeping the existing name makes the code clearer
there. In 9.3 and later the flag is used only within walsender.c.
--
Michael
Attachments:
walsender-shutdown-96.patch (application/octet-stream)
From 2781015088b73d580218ef4a2aa743c4227c1801 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 1 May 2017 15:09:06 -0400
Subject: [PATCH] Prevent panic during shutdown checkpoint
When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and throws
a PANIC if so. At that point, only walsenders are still active, so one
might think this could not happen, but walsenders can also generate WAL,
for instance in BASE_BACKUP and certain variants of
CREATE_REPLICATION_SLOT. So they can trigger this panic if such a
command is run while the shutdown checkpoint is being written.
To fix this, divide the walsender shutdown into two phases. First, the
postmaster sends a SIGUSR2 signal to all walsenders. The walsenders
then put themselves into the "stopping" state. In this state, they
reject any new commands. (For simplicity, we reject all new commands,
so that in the future we do not have to track meticulously which
commands might generate WAL.) The checkpointer waits for all walsenders
to reach this state before proceeding with the shutdown checkpoint.
After the shutdown checkpoint is done, the postmaster sends
SIGINT (previously unused) to the walsenders. This triggers the
existing shutdown behavior of sending out the shutdown checkpoint record
and then terminating.
Author: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
---
src/backend/access/transam/xlog.c | 6 ++
src/backend/postmaster/postmaster.c | 7 +-
src/backend/replication/walsender.c | 143 ++++++++++++++++++++++++----
src/include/replication/walsender.h | 1 +
src/include/replication/walsender_private.h | 3 +-
5 files changed, 136 insertions(+), 24 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b5aa9158be..7085d9dcf1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8006,6 +8006,12 @@ ShutdownXLOG(int code, Datum arg)
ereport(IsPostmasterEnvironment ? LOG : NOTICE,
(errmsg("shutting down")));
+ /*
+ * Wait for WAL senders to be in stopping state. This prevents commands
+ * from writing new WAL.
+ */
+ WalSndWaitStopping();
+
if (RecoveryInProgress())
CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
else
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6f923bdbdb..a639462d31 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2882,7 +2882,7 @@ reaper(SIGNAL_ARGS)
* Waken walsenders for the last time. No regular backends
* should be around anymore.
*/
- SignalChildren(SIGUSR2);
+ SignalChildren(SIGINT);
pmState = PM_SHUTDOWN_2;
@@ -3620,7 +3620,9 @@ PostmasterStateMachine(void)
/*
* If we get here, we are proceeding with normal shutdown. All
* the regular children are gone, and it's time to tell the
- * checkpointer to do a shutdown checkpoint.
+ * checkpointer to do a shutdown checkpoint. All WAL senders
+ * are told to switch to a stopping state so that the shutdown
+ * checkpoint can go ahead.
*/
Assert(Shutdown > NoShutdown);
/* Start the checkpointer if not running */
@@ -3629,6 +3631,7 @@ PostmasterStateMachine(void)
/* And tell it to shut down */
if (CheckpointerPID != 0)
{
+ SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
signal_child(CheckpointerPID, SIGUSR2);
pmState = PM_SHUTDOWN;
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 834bf947a3..f1ae681254 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -24,11 +24,14 @@
* are treated as not a crash but approximately normal termination;
* the walsender will exit quickly without sending any more XLOG records.
*
- * If the server is shut down, postmaster sends us SIGUSR2 after all
- * regular backends have exited and the shutdown checkpoint has been written.
- * This instructs walsender to send any outstanding WAL, including the
- * shutdown checkpoint record, wait for it to be replicated to the standby,
- * and then exit.
+ * If the server is shut down, postmaster sends us SIGUSR2 after all regular
+ * backends have exited. This causes the walsender to switch to the "stopping"
+ * state. In this state, the walsender will reject any replication command
+ * that may generate WAL activity. The checkpointer begins the shutdown
+ * checkpoint once all walsenders are confirmed as stopping. When the shutdown
+ * checkpoint finishes, the postmaster sends us SIGINT. This instructs
+ * walsender to send any outstanding WAL, including the shutdown checkpoint
+ * record, wait for it to be replicated to the standby, and then exit.
*
*
* Portions Copyright (c) 2010-2016, PostgreSQL Global Development Group
@@ -171,13 +174,14 @@ static bool WalSndCaughtUp = false;
/* Flags set by signal handlers for later service in main loop */
static volatile sig_atomic_t got_SIGHUP = false;
-static volatile sig_atomic_t walsender_ready_to_stop = false;
+static volatile sig_atomic_t got_SIGINT = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
/*
- * This is set while we are streaming. When not set, SIGUSR2 signal will be
+ * This is set while we are streaming. When not set, SIGINT signal will be
* handled like SIGTERM. When set, the main loop is responsible for checking
- * walsender_ready_to_stop and terminating when it's set (after streaming any
- * remaining WAL).
+ * got_SIGINT and terminating when it's set (after streaming any remaining
+ * WAL).
*/
static volatile sig_atomic_t replication_active = false;
@@ -187,6 +191,7 @@ static XLogRecPtr logical_startptr = InvalidXLogRecPtr;
/* Signal handlers */
static void WalSndSigHupHandler(SIGNAL_ARGS);
static void WalSndXLogSendHandler(SIGNAL_ARGS);
+static void WalSndSwitchStopping(SIGNAL_ARGS);
static void WalSndLastCycleHandler(SIGNAL_ARGS);
/* Prototypes for private functions */
@@ -265,11 +270,14 @@ WalSndErrorCleanup(void)
ReplicationSlotRelease();
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
/* Revert back to startup state */
WalSndSetState(WALSNDSTATE_STARTUP);
+
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
}
/*
@@ -672,7 +680,7 @@ StartReplication(StartReplicationCmd *cmd)
WalSndLoop(XLogSendPhysical);
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -971,7 +979,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
{
ereport(LOG,
(errmsg("terminating walsender process after promotion")));
- walsender_ready_to_stop = true;
+ got_SIGINT = true;
}
WalSndSetState(WALSNDSTATE_CATCHUP);
@@ -1025,7 +1033,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
ReplicationSlotRelease();
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -1212,6 +1220,14 @@ WalSndWaitForWal(XLogRecPtr loc)
RecentFlushPtr = GetXLogReplayRecPtr(NULL);
/*
+ * If postmaster asked us to switch to the stopping state, do so.
+ * Shutdown is in progress and this will allow the checkpointer to
+ * move on with the shutdown checkpoint.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
* If postmaster asked us to stop, don't wait here anymore. This will
* cause the xlogreader to return without reading a full record, which
* is the fastest way to reach the mainloop which then can quit.
@@ -1220,7 +1236,7 @@ WalSndWaitForWal(XLogRecPtr loc)
* RecentFlushPtr, so we can send all remaining data before shutting
* down.
*/
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
break;
/*
@@ -1291,6 +1307,22 @@ exec_replication_command(const char *cmd_string)
MemoryContext old_context;
/*
+ * If WAL sender has been told that shutdown is getting close, switch its
+ * status accordingly to handle the next replication commands correctly.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
+ * Throw error if in stopping mode. We need to prevent commands that could
+ * generate WAL while the shutdown checkpoint is being written. To be
+ * safe, we just prohibit all new commands.
+ */
+ if (MyWalSnd->state == WALSNDSTATE_STOPPING)
+ ereport(ERROR,
+ (errmsg("cannot execute new commands while WAL sender is in stopping mode")));
+
+ /*
* Log replication command if log_replication_commands is enabled. Even
* when it's disabled, log the command with DEBUG1 level for backward
* compatibility.
@@ -1879,13 +1911,20 @@ WalSndLoop(WalSndSendDataCallback send_data)
}
/*
- * When SIGUSR2 arrives, we send any outstanding logs up to the
+ * At the reception of SIGUSR2, switch the WAL sender to the stopping
+ * state.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
+ * When SIGINT arrives, we send any outstanding logs up to the
* shutdown checkpoint record (i.e., the latest record), wait for
* them to be replicated to the standby, and exit. This may be a
* normal termination at shutdown, or a promotion, the walsender
* is not sure which.
*/
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
WalSndDone(send_data);
}
@@ -2588,7 +2627,23 @@ WalSndXLogSendHandler(SIGNAL_ARGS)
errno = save_errno;
}
-/* SIGUSR2: set flag to do a last cycle and shut down afterwards */
+/* SIGUSR2: set flag to switch to stopping state */
+static void
+WalSndSwitchStopping(SIGNAL_ARGS)
+{
+ int save_errno = errno;
+
+ got_SIGUSR2 = true;
+ SetLatch(MyLatch);
+
+ errno = save_errno;
+}
+
+/*
+ * SIGINT: set flag to do a last cycle and shut down afterwards. The WAL
+ * sender should already have been switched to WALSNDSTATE_STOPPING at
+ * this point.
+ */
static void
WalSndLastCycleHandler(SIGNAL_ARGS)
{
@@ -2603,7 +2658,7 @@ WalSndLastCycleHandler(SIGNAL_ARGS)
if (!replication_active)
kill(MyProcPid, SIGTERM);
- walsender_ready_to_stop = true;
+ got_SIGINT = true;
SetLatch(MyLatch);
errno = save_errno;
@@ -2616,14 +2671,14 @@ WalSndSignals(void)
/* Set up signal handlers */
pqsignal(SIGHUP, WalSndSigHupHandler); /* set flag to read config
* file */
- pqsignal(SIGINT, SIG_IGN); /* not used */
+ pqsignal(SIGINT, WalSndLastCycleHandler); /* request a last cycle and
+ * shutdown */
pqsignal(SIGTERM, die); /* request shutdown */
pqsignal(SIGQUIT, quickdie); /* hard crash time */
InitializeTimeouts(); /* establishes SIGALRM handler */
pqsignal(SIGPIPE, SIG_IGN);
pqsignal(SIGUSR1, WalSndXLogSendHandler); /* request WAL sending */
- pqsignal(SIGUSR2, WalSndLastCycleHandler); /* request a last cycle and
- * shutdown */
+ pqsignal(SIGUSR2, WalSndSwitchStopping); /* switch to stopping state */
/* Reset some signals that are accepted by postmaster but not here */
pqsignal(SIGCHLD, SIG_DFL);
@@ -2701,6 +2756,50 @@ WalSndWakeup(void)
}
}
+/*
+ * Wait until all WAL senders have reached the stopping state. This is
+ * used by the checkpointer to control when shutdown checkpoints can
+ * safely begin.
+ */
+void
+WalSndWaitStopping(void)
+{
+ for (;;)
+ {
+ int i;
+ bool all_stopped = true;
+
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ WalSndState state;
+ WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+ SpinLockAcquire(&walsnd->mutex);
+
+ if (walsnd->pid == 0)
+ {
+ SpinLockRelease(&walsnd->mutex);
+ continue;
+ }
+
+ state = walsnd->state;
+ SpinLockRelease(&walsnd->mutex);
+
+ if (state != WALSNDSTATE_STOPPING)
+ {
+ all_stopped = false;
+ break;
+ }
+ }
+
+ /* safe to leave if confirmation is done for all WAL senders */
+ if (all_stopped)
+ return;
+
+ pg_usleep(10000L); /* wait for 10 msec */
+ }
+}
+
/* Set state for current walsender (only called in walsender) */
void
WalSndSetState(WalSndState state)
@@ -2734,6 +2833,8 @@ WalSndGetStateString(WalSndState state)
return "catchup";
case WALSNDSTATE_STREAMING:
return "streaming";
+ case WALSNDSTATE_STOPPING:
+ return "stopping";
}
return "UNKNOWN";
}
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index de7338638e..d61d03e66c 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -34,6 +34,7 @@ extern void WalSndSignals(void);
extern Size WalSndShmemSize(void);
extern void WalSndShmemInit(void);
extern void WalSndWakeup(void);
+extern void WalSndWaitStopping(void);
extern void WalSndRqstFileReload(void);
extern Datum pg_stat_get_wal_senders(PG_FUNCTION_ARGS);
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 7794aa567e..188a3ce9ed 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -24,7 +24,8 @@ typedef enum WalSndState
WALSNDSTATE_STARTUP = 0,
WALSNDSTATE_BACKUP,
WALSNDSTATE_CATCHUP,
- WALSNDSTATE_STREAMING
+ WALSNDSTATE_STREAMING,
+ WALSNDSTATE_STOPPING
} WalSndState;
/*
--
2.13.0
walsender-shutdown-95.patch
From 1cc3c542b53033ec3b0ad68f2a709f565dd87a3f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 1 May 2017 15:09:06 -0400
Subject: [PATCH] Prevent panic during shutdown checkpoint
When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and throws
a PANIC if so. At that point, only walsenders are still active, so one
might think this could not happen, but walsenders can also generate WAL,
for instance in BASE_BACKUP and certain variants of
CREATE_REPLICATION_SLOT. So they can trigger this panic if such a
command is run while the shutdown checkpoint is being written.
To fix this, divide the walsender shutdown into two phases. First, the
postmaster sends a SIGUSR2 signal to all walsenders. The walsenders
then put themselves into the "stopping" state. In this state, they
reject any new commands. (For simplicity, we reject all new commands,
so that in the future we do not have to track meticulously which
commands might generate WAL.) The checkpointer waits for all walsenders
to reach this state before proceeding with the shutdown checkpoint.
After the shutdown checkpoint is done, the postmaster sends
SIGINT (previously unused) to the walsenders. This triggers the
existing shutdown behavior of sending out the shutdown checkpoint record
and then terminating.
Author: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
---
src/backend/access/transam/xlog.c | 6 ++
src/backend/postmaster/postmaster.c | 7 +-
src/backend/replication/walsender.c | 143 ++++++++++++++++++++++++----
src/include/replication/walsender.h | 1 +
src/include/replication/walsender_private.h | 3 +-
5 files changed, 136 insertions(+), 24 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3cc74bfe2e..8ae6f092a3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7949,6 +7949,12 @@ ShutdownXLOG(int code, Datum arg)
ereport(IsPostmasterEnvironment ? LOG : NOTICE,
(errmsg("shutting down")));
+ /*
+ * Wait for WAL senders to be in stopping state. This prevents commands
+ * from writing new WAL.
+ */
+ WalSndWaitStopping();
+
if (RecoveryInProgress())
CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
else
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 80d3c52241..259a441941 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2852,7 +2852,7 @@ reaper(SIGNAL_ARGS)
* Waken walsenders for the last time. No regular backends
* should be around anymore.
*/
- SignalChildren(SIGUSR2);
+ SignalChildren(SIGINT);
pmState = PM_SHUTDOWN_2;
@@ -3590,7 +3590,9 @@ PostmasterStateMachine(void)
/*
* If we get here, we are proceeding with normal shutdown. All
* the regular children are gone, and it's time to tell the
- * checkpointer to do a shutdown checkpoint.
+ * checkpointer to do a shutdown checkpoint. All WAL senders
+ * are told to switch to a stopping state so that the shutdown
+ * checkpoint can go ahead.
*/
Assert(Shutdown > NoShutdown);
/* Start the checkpointer if not running */
@@ -3599,6 +3601,7 @@ PostmasterStateMachine(void)
/* And tell it to shut down */
if (CheckpointerPID != 0)
{
+ SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
signal_child(CheckpointerPID, SIGUSR2);
pmState = PM_SHUTDOWN;
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 31e9c8ad6d..de76d41030 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -24,11 +24,14 @@
* are treated as not a crash but approximately normal termination;
* the walsender will exit quickly without sending any more XLOG records.
*
- * If the server is shut down, postmaster sends us SIGUSR2 after all
- * regular backends have exited and the shutdown checkpoint has been written.
- * This instructs walsender to send any outstanding WAL, including the
- * shutdown checkpoint record, wait for it to be replicated to the standby,
- * and then exit.
+ * If the server is shut down, postmaster sends us SIGUSR2 after all regular
+ * backends have exited. This causes the walsender to switch to the "stopping"
+ * state. In this state, the walsender will reject any replication command
+ * that may generate WAL activity. The checkpointer begins the shutdown
+ * checkpoint once all walsenders are confirmed as stopping. When the shutdown
+ * checkpoint finishes, the postmaster sends us SIGINT. This instructs
+ * walsender to send any outstanding WAL, including the shutdown checkpoint
+ * record, wait for it to be replicated to the standby, and then exit.
*
*
* Portions Copyright (c) 2010-2015, PostgreSQL Global Development Group
@@ -170,13 +173,14 @@ static bool WalSndCaughtUp = false;
/* Flags set by signal handlers for later service in main loop */
static volatile sig_atomic_t got_SIGHUP = false;
-static volatile sig_atomic_t walsender_ready_to_stop = false;
+static volatile sig_atomic_t got_SIGINT = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
/*
- * This is set while we are streaming. When not set, SIGUSR2 signal will be
+ * This is set while we are streaming. When not set, SIGINT signal will be
* handled like SIGTERM. When set, the main loop is responsible for checking
- * walsender_ready_to_stop and terminating when it's set (after streaming any
- * remaining WAL).
+ * got_SIGINT and terminating when it's set (after streaming any remaining
+ * WAL).
*/
static volatile sig_atomic_t replication_active = false;
@@ -186,6 +190,7 @@ static XLogRecPtr logical_startptr = InvalidXLogRecPtr;
/* Signal handlers */
static void WalSndSigHupHandler(SIGNAL_ARGS);
static void WalSndXLogSendHandler(SIGNAL_ARGS);
+static void WalSndSwitchStopping(SIGNAL_ARGS);
static void WalSndLastCycleHandler(SIGNAL_ARGS);
/* Prototypes for private functions */
@@ -263,11 +268,14 @@ WalSndErrorCleanup()
ReplicationSlotRelease();
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
/* Revert back to startup state */
WalSndSetState(WALSNDSTATE_STARTUP);
+
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
}
/*
@@ -671,7 +679,7 @@ StartReplication(StartReplicationCmd *cmd)
WalSndLoop(XLogSendPhysical);
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -962,7 +970,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
{
ereport(LOG,
(errmsg("terminating walsender process after promotion")));
- walsender_ready_to_stop = true;
+ got_SIGINT = true;
}
WalSndSetState(WALSNDSTATE_CATCHUP);
@@ -1017,7 +1025,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
ReplicationSlotRelease();
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -1204,6 +1212,14 @@ WalSndWaitForWal(XLogRecPtr loc)
RecentFlushPtr = GetXLogReplayRecPtr(NULL);
/*
+ * If postmaster asked us to switch to the stopping state, do so.
+ * Shutdown is in progress and this will allow the checkpointer to
+ * move on with the shutdown checkpoint.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
* If postmaster asked us to stop, don't wait here anymore. This will
* cause the xlogreader to return without reading a full record, which
* is the fastest way to reach the mainloop which then can quit.
@@ -1212,7 +1228,7 @@ WalSndWaitForWal(XLogRecPtr loc)
* RecentFlushPtr, so we can send all remaining data before shutting
* down.
*/
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
break;
/*
@@ -1283,6 +1299,22 @@ exec_replication_command(const char *cmd_string)
MemoryContext old_context;
/*
+ * If WAL sender has been told that shutdown is getting close, switch its
+ * status accordingly to handle the next replication commands correctly.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
+ * Throw error if in stopping mode. We need to prevent commands that could
+ * generate WAL while the shutdown checkpoint is being written. To be
+ * safe, we just prohibit all new commands.
+ */
+ if (MyWalSnd->state == WALSNDSTATE_STOPPING)
+ ereport(ERROR,
+ (errmsg("cannot execute new commands while WAL sender is in stopping mode")));
+
+ /*
* Log replication command if log_replication_commands is enabled. Even
* when it's disabled, log the command with DEBUG1 level for backward
* compatibility.
@@ -1876,13 +1908,20 @@ WalSndLoop(WalSndSendDataCallback send_data)
}
/*
- * When SIGUSR2 arrives, we send any outstanding logs up to the
+ * At the reception of SIGUSR2, switch the WAL sender to the stopping
+ * state.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
+ * When SIGINT arrives, we send any outstanding logs up to the
* shutdown checkpoint record (i.e., the latest record), wait for
* them to be replicated to the standby, and exit. This may be a
* normal termination at shutdown, or a promotion, the walsender
* is not sure which.
*/
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
WalSndDone(send_data);
}
@@ -2590,7 +2629,23 @@ WalSndXLogSendHandler(SIGNAL_ARGS)
errno = save_errno;
}
-/* SIGUSR2: set flag to do a last cycle and shut down afterwards */
+/* SIGUSR2: set flag to switch to stopping state */
+static void
+WalSndSwitchStopping(SIGNAL_ARGS)
+{
+ int save_errno = errno;
+
+ got_SIGUSR2 = true;
+ SetLatch(MyLatch);
+
+ errno = save_errno;
+}
+
+/*
+ * SIGINT: set flag to do a last cycle and shut down afterwards. The WAL
+ * sender should already have been switched to WALSNDSTATE_STOPPING at
+ * this point.
+ */
static void
WalSndLastCycleHandler(SIGNAL_ARGS)
{
@@ -2605,7 +2660,7 @@ WalSndLastCycleHandler(SIGNAL_ARGS)
if (!replication_active)
kill(MyProcPid, SIGTERM);
- walsender_ready_to_stop = true;
+ got_SIGINT = true;
SetLatch(MyLatch);
errno = save_errno;
@@ -2618,14 +2673,14 @@ WalSndSignals(void)
/* Set up signal handlers */
pqsignal(SIGHUP, WalSndSigHupHandler); /* set flag to read config
* file */
- pqsignal(SIGINT, SIG_IGN); /* not used */
+ pqsignal(SIGINT, WalSndLastCycleHandler); /* request a last cycle and
+ * shutdown */
pqsignal(SIGTERM, die); /* request shutdown */
pqsignal(SIGQUIT, quickdie); /* hard crash time */
InitializeTimeouts(); /* establishes SIGALRM handler */
pqsignal(SIGPIPE, SIG_IGN);
pqsignal(SIGUSR1, WalSndXLogSendHandler); /* request WAL sending */
- pqsignal(SIGUSR2, WalSndLastCycleHandler); /* request a last cycle and
- * shutdown */
+ pqsignal(SIGUSR2, WalSndSwitchStopping); /* switch to stopping state */
/* Reset some signals that are accepted by postmaster but not here */
pqsignal(SIGCHLD, SIG_DFL);
@@ -2703,6 +2758,50 @@ WalSndWakeup(void)
}
}
+/*
+ * Wait until all WAL senders have reached the stopping state. This is
+ * used by the checkpointer to control when shutdown checkpoints can
+ * safely begin.
+ */
+void
+WalSndWaitStopping(void)
+{
+ for (;;)
+ {
+ int i;
+ bool all_stopped = true;
+
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ WalSndState state;
+ WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+ SpinLockAcquire(&walsnd->mutex);
+
+ if (walsnd->pid == 0)
+ {
+ SpinLockRelease(&walsnd->mutex);
+ continue;
+ }
+
+ state = walsnd->state;
+ SpinLockRelease(&walsnd->mutex);
+
+ if (state != WALSNDSTATE_STOPPING)
+ {
+ all_stopped = false;
+ break;
+ }
+ }
+
+ /* safe to leave if confirmation is done for all WAL senders */
+ if (all_stopped)
+ return;
+
+ pg_usleep(10000L); /* wait for 10 msec */
+ }
+}
+
/* Set state for current walsender (only called in walsender) */
void
WalSndSetState(WalSndState state)
@@ -2737,6 +2836,8 @@ WalSndGetStateString(WalSndState state)
return "catchup";
case WALSNDSTATE_STREAMING:
return "streaming";
+ case WALSNDSTATE_STOPPING:
+ return "stopping";
}
return "UNKNOWN";
}
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index cb3f6bd21f..71857bebf0 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -34,6 +34,7 @@ extern void WalSndSignals(void);
extern Size WalSndShmemSize(void);
extern void WalSndShmemInit(void);
extern void WalSndWakeup(void);
+extern void WalSndWaitStopping(void);
extern void WalSndRqstFileReload(void);
extern Datum pg_stat_get_wal_senders(PG_FUNCTION_ARGS);
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 6dae480285..ab541da916 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -24,7 +24,8 @@ typedef enum WalSndState
WALSNDSTATE_STARTUP = 0,
WALSNDSTATE_BACKUP,
WALSNDSTATE_CATCHUP,
- WALSNDSTATE_STREAMING
+ WALSNDSTATE_STREAMING,
+ WALSNDSTATE_STOPPING
} WalSndState;
/*
--
2.13.0
walsender-shutdown-93.patch
From a24394dc940540f4df57a4d4af09eeaeeaf4a32b Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Sat, 27 May 2017 15:25:01 -0400
Subject: [PATCH] Prevent panic during shutdown checkpoint
When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and throws
a PANIC if so. At that point, only walsenders are still active, so one
might think this could not happen, but walsenders can also generate WAL,
for instance in BASE_BACKUP and certain variants of
CREATE_REPLICATION_SLOT. So they can trigger this panic if such a
command is run while the shutdown checkpoint is being written.
To fix this, divide the walsender shutdown into two phases. First, the
postmaster sends a SIGUSR2 signal to all walsenders. The walsenders
then put themselves into the "stopping" state. In this state, they
reject any new commands. (For simplicity, we reject all new commands,
so that in the future we do not have to track meticulously which
commands might generate WAL.) The checkpointer waits for all walsenders
to reach this state before proceeding with the shutdown checkpoint.
After the shutdown checkpoint is done, the postmaster sends
SIGINT (previously unused) to the walsenders. This triggers the
existing shutdown behavior of sending out the shutdown checkpoint record
and then terminating.
Author: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
---
src/backend/access/transam/xlog.c | 6 ++
src/backend/postmaster/postmaster.c | 7 +-
src/backend/replication/walsender.c | 130 ++++++++++++++++++++++++----
src/include/replication/walsender.h | 1 +
src/include/replication/walsender_private.h | 3 +-
5 files changed, 126 insertions(+), 21 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 33af2f6291..4812966bb6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6779,6 +6779,12 @@ ShutdownXLOG(int code, Datum arg)
ereport(IsPostmasterEnvironment ? LOG : NOTICE,
(errmsg("shutting down")));
+ /*
+ * Wait for WAL senders to be in stopping state. This prevents commands
+ * from writing new WAL.
+ */
+ WalSndWaitStopping();
+
if (RecoveryInProgress())
CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
else
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 2c3c3a44e7..bc88eaa4bc 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2807,7 +2807,7 @@ reaper(SIGNAL_ARGS)
* Waken walsenders for the last time. No regular backends
* should be around anymore.
*/
- SignalChildren(SIGUSR2);
+ SignalChildren(SIGINT);
pmState = PM_SHUTDOWN_2;
@@ -3504,7 +3504,9 @@ PostmasterStateMachine(void)
/*
* If we get here, we are proceeding with normal shutdown. All
* the regular children are gone, and it's time to tell the
- * checkpointer to do a shutdown checkpoint.
+ * checkpointer to do a shutdown checkpoint. All WAL senders
+ * are told to switch to a stopping state so that the shutdown
+ * checkpoint can go ahead.
*/
Assert(Shutdown > NoShutdown);
/* Start the checkpointer if not running */
@@ -3513,6 +3515,7 @@ PostmasterStateMachine(void)
/* And tell it to shut down */
if (CheckpointerPID != 0)
{
+ SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
signal_child(CheckpointerPID, SIGUSR2);
pmState = PM_SHUTDOWN;
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 9b455eef58..429ff04751 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -24,11 +24,14 @@
* are treated as not a crash but approximately normal termination;
* the walsender will exit quickly without sending any more XLOG records.
*
- * If the server is shut down, postmaster sends us SIGUSR2 after all
- * regular backends have exited and the shutdown checkpoint has been written.
- * This instruct walsender to send any outstanding WAL, including the
- * shutdown checkpoint record, wait for it to be replicated to the standby,
- * and then exit.
+ * If the server is shut down, postmaster sends us SIGUSR2 after all regular
+ * backends have exited. This causes the walsender to switch to the "stopping"
+ * state. In this state, the walsender will reject any replication command
+ * that may generate WAL activity. The checkpointer begins the shutdown
+ * checkpoint once all walsenders are confirmed as stopping. When the shutdown
+ * checkpoint finishes, the postmaster sends us SIGINT. This instructs
+ * walsender to send any outstanding WAL, including the shutdown checkpoint
+ * record, wait for it to be replicated to the standby, and then exit.
*
*
* Portions Copyright (c) 2010-2013, PostgreSQL Global Development Group
@@ -155,19 +158,21 @@ static bool streamingDoneReceiving;
/* Flags set by signal handlers for later service in main loop */
static volatile sig_atomic_t got_SIGHUP = false;
-static volatile sig_atomic_t walsender_ready_to_stop = false;
+static volatile sig_atomic_t got_SIGINT = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
/*
- * This is set while we are streaming. When not set, SIGUSR2 signal will be
+ * This is set while we are streaming. When not set, SIGINT signal will be
* handled like SIGTERM. When set, the main loop is responsible for checking
- * walsender_ready_to_stop and terminating when it's set (after streaming any
- * remaining WAL).
+ * got_SIGINT and terminating when it's set (after streaming any remaining
+ * WAL).
*/
static volatile sig_atomic_t replication_active = false;
/* Signal handlers */
static void WalSndSigHupHandler(SIGNAL_ARGS);
static void WalSndXLogSendHandler(SIGNAL_ARGS);
+static void WalSndSwitchStopping(SIGNAL_ARGS);
static void WalSndLastCycleHandler(SIGNAL_ARGS);
/* Prototypes for private functions */
@@ -225,11 +230,14 @@ WalSndErrorCleanup()
}
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
/* Revert back to startup state */
WalSndSetState(WALSNDSTATE_STARTUP);
+
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
}
/*
@@ -558,7 +566,7 @@ StartReplication(StartReplicationCmd *cmd)
WalSndLoop();
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -633,6 +641,22 @@ exec_replication_command(const char *cmd_string)
MemoryContext cmd_context;
MemoryContext old_context;
+ /*
+ * If WAL sender has been told that shutdown is getting close, switch its
+ * status accordingly to handle the next replication commands correctly.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
+ * Throw error if in stopping mode. We need to prevent commands that could
+ * generate WAL while the shutdown checkpoint is being written. To be
+ * safe, we just prohibit all new commands.
+ */
+ if (MyWalSnd->state == WALSNDSTATE_STOPPING)
+ ereport(ERROR,
+ (errmsg("cannot execute new commands while WAL sender is in stopping mode")));
+
elog(DEBUG1, "received replication command: %s", cmd_string);
CHECK_FOR_INTERRUPTS();
@@ -1034,13 +1058,20 @@ WalSndLoop(void)
}
/*
- * When SIGUSR2 arrives, we send any outstanding logs up to the
+ * At the reception of SIGUSR2, switch the WAL sender to the stopping
+ * state.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
+ * When SIGINT arrives, we send any outstanding logs up to the
* shutdown checkpoint record (i.e., the latest record), wait
* for them to be replicated to the standby, and exit.
* This may be a normal termination at shutdown, or a promotion,
* the walsender is not sure which.
*/
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
{
XLogRecPtr replicatedPtr;
@@ -1736,7 +1767,24 @@ WalSndXLogSendHandler(SIGNAL_ARGS)
errno = save_errno;
}
-/* SIGUSR2: set flag to do a last cycle and shut down afterwards */
+/* SIGUSR2: set flag to switch to stopping state */
+static void
+WalSndSwitchStopping(SIGNAL_ARGS)
+{
+ int save_errno = errno;
+
+ got_SIGUSR2 = true;
+ if (MyWalSnd)
+ SetLatch(&MyWalSnd->latch);
+
+ errno = save_errno;
+}
+
+/*
+ * SIGINT: set flag to do a last cycle and shut down afterwards. The WAL
+ * sender should already have been switched to WALSNDSTATE_STOPPING at
+ * this point.
+ */
static void
WalSndLastCycleHandler(SIGNAL_ARGS)
{
@@ -1751,7 +1799,7 @@ WalSndLastCycleHandler(SIGNAL_ARGS)
if (!replication_active)
kill(MyProcPid, SIGTERM);
- walsender_ready_to_stop = true;
+ got_SIGINT = true;
if (MyWalSnd)
SetLatch(&MyWalSnd->latch);
@@ -1765,14 +1813,14 @@ WalSndSignals(void)
/* Set up signal handlers */
pqsignal(SIGHUP, WalSndSigHupHandler); /* set flag to read config
* file */
- pqsignal(SIGINT, SIG_IGN); /* not used */
+ pqsignal(SIGINT, WalSndLastCycleHandler); /* request a last cycle and
+ * shutdown */
pqsignal(SIGTERM, die); /* request shutdown */
pqsignal(SIGQUIT, quickdie); /* hard crash time */
InitializeTimeouts(); /* establishes SIGALRM handler */
pqsignal(SIGPIPE, SIG_IGN);
pqsignal(SIGUSR1, WalSndXLogSendHandler); /* request WAL sending */
- pqsignal(SIGUSR2, WalSndLastCycleHandler); /* request a last cycle and
- * shutdown */
+ pqsignal(SIGUSR2, WalSndSwitchStopping); /* switch to stopping state */
/* Reset some signals that are accepted by postmaster but not here */
pqsignal(SIGCHLD, SIG_DFL);
@@ -1837,6 +1885,50 @@ WalSndWakeup(void)
SetLatch(&WalSndCtl->walsnds[i].latch);
}
+/*
+ * Wait until all the WAL senders have reached the stopping state. This is
+ * used by the checkpointer to control when shutdown checkpoints can
+ * safely begin.
+ */
+void
+WalSndWaitStopping(void)
+{
+ for (;;)
+ {
+ int i;
+ bool all_stopped = true;
+
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ WalSndState state;
+ WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+ SpinLockAcquire(&walsnd->mutex);
+
+ if (walsnd->pid == 0)
+ {
+ SpinLockRelease(&walsnd->mutex);
+ continue;
+ }
+
+ state = walsnd->state;
+ SpinLockRelease(&walsnd->mutex);
+
+ if (state != WALSNDSTATE_STOPPING)
+ {
+ all_stopped = false;
+ break;
+ }
+ }
+
+ /* safe to leave if confirmation is done for all WAL senders */
+ if (all_stopped)
+ return;
+
+ pg_usleep(10000L); /* wait for 10 msec */
+ }
+}
+
/* Set state for current walsender (only called in walsender) */
void
WalSndSetState(WalSndState state)
@@ -1871,6 +1963,8 @@ WalSndGetStateString(WalSndState state)
return "catchup";
case WALSNDSTATE_STREAMING:
return "streaming";
+ case WALSNDSTATE_STOPPING:
+ return "stopping";
}
return "UNKNOWN";
}
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 2cc7ddfa37..210793a7db 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -32,6 +32,7 @@ extern void WalSndSignals(void);
extern Size WalSndShmemSize(void);
extern void WalSndShmemInit(void);
extern void WalSndWakeup(void);
+extern void WalSndWaitStopping(void);
extern void WalSndRqstFileReload(void);
extern Datum pg_stat_get_wal_senders(PG_FUNCTION_ARGS);
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 7eaa21b9f7..5df78b9aa8 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -24,7 +24,8 @@ typedef enum WalSndState
WALSNDSTATE_STARTUP = 0,
WALSNDSTATE_BACKUP,
WALSNDSTATE_CATCHUP,
- WALSNDSTATE_STREAMING
+ WALSNDSTATE_STREAMING,
+ WALSNDSTATE_STOPPING
} WalSndState;
/*
--
2.13.0
Attachment: walsender-shutdown-94.patch (application/octet-stream)
From 268671816c5290625b58845a63bc7e8d77c80b8e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 1 May 2017 15:09:06 -0400
Subject: [PATCH] Prevent panic during shutdown checkpoint
When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and throws
a PANIC if so. At that point, only walsenders are still active, so one
might think this could not happen, but walsenders can also generate WAL,
for instance in BASE_BACKUP and certain variants of
CREATE_REPLICATION_SLOT. So they can trigger this panic if such a
command is run while the shutdown checkpoint is being written.
To fix this, divide the walsender shutdown into two phases. First, the
postmaster sends a SIGUSR2 signal to all walsenders. The walsenders
then put themselves into the "stopping" state. In this state, they
reject any new commands. (For simplicity, we reject all new commands,
so that in the future we do not have to track meticulously which
commands might generate WAL.) The checkpointer waits for all walsenders
to reach this state before proceeding with the shutdown checkpoint.
After the shutdown checkpoint is done, the postmaster sends
SIGINT (previously unused) to the walsenders. This triggers the
existing shutdown behavior of sending out the shutdown checkpoint record
and then terminating.
Author: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
---
src/backend/access/transam/xlog.c | 6 ++
src/backend/postmaster/postmaster.c | 7 +-
src/backend/replication/walsender.c | 144 ++++++++++++++++++++++++----
src/include/replication/walsender.h | 1 +
src/include/replication/walsender_private.h | 3 +-
5 files changed, 137 insertions(+), 24 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 17a2692659..c301de2792 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7948,6 +7948,12 @@ ShutdownXLOG(int code, Datum arg)
ereport(IsPostmasterEnvironment ? LOG : NOTICE,
(errmsg("shutting down")));
+ /*
+ * Wait for WAL senders to be in stopping state. This prevents commands
+ * from writing new WAL.
+ */
+ WalSndWaitStopping();
+
if (RecoveryInProgress())
CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
else
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 42d4444b81..a6fb3a1a00 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2835,7 +2835,7 @@ reaper(SIGNAL_ARGS)
* Waken walsenders for the last time. No regular backends
* should be around anymore.
*/
- SignalChildren(SIGUSR2);
+ SignalChildren(SIGINT);
pmState = PM_SHUTDOWN_2;
@@ -3580,7 +3580,9 @@ PostmasterStateMachine(void)
/*
* If we get here, we are proceeding with normal shutdown. All
* the regular children are gone, and it's time to tell the
- * checkpointer to do a shutdown checkpoint.
+ * checkpointer to do a shutdown checkpoint. All WAL senders
+ * are told to switch to a stopping state so that the shutdown
+ * checkpoint can go ahead.
*/
Assert(Shutdown > NoShutdown);
/* Start the checkpointer if not running */
@@ -3589,6 +3591,7 @@ PostmasterStateMachine(void)
/* And tell it to shut down */
if (CheckpointerPID != 0)
{
+ SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
signal_child(CheckpointerPID, SIGUSR2);
pmState = PM_SHUTDOWN;
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 96ad33ea83..3e096a2d67 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -24,11 +24,14 @@
* are treated as not a crash but approximately normal termination;
* the walsender will exit quickly without sending any more XLOG records.
*
- * If the server is shut down, postmaster sends us SIGUSR2 after all
- * regular backends have exited and the shutdown checkpoint has been written.
- * This instructs walsender to send any outstanding WAL, including the
- * shutdown checkpoint record, wait for it to be replicated to the standby,
- * and then exit.
+ * If the server is shut down, postmaster sends us SIGUSR2 after all regular
+ * backends have exited. This causes the walsender to switch to the "stopping"
+ * state. In this state, the walsender will reject any replication command
+ * that may generate WAL activity. The checkpointer begins the shutdown
+ * checkpoint once all walsenders are confirmed as stopping. When the shutdown
+ * checkpoint finishes, the postmaster sends us SIGINT. This instructs
+ * walsender to send any outstanding WAL, including the shutdown checkpoint
+ * record, wait for it to be replicated to the standby, and then exit.
*
*
* Portions Copyright (c) 2010-2014, PostgreSQL Global Development Group
@@ -169,13 +172,14 @@ static bool WalSndCaughtUp = false;
/* Flags set by signal handlers for later service in main loop */
static volatile sig_atomic_t got_SIGHUP = false;
-static volatile sig_atomic_t walsender_ready_to_stop = false;
+static volatile sig_atomic_t got_SIGINT = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
/*
- * This is set while we are streaming. When not set, SIGUSR2 signal will be
+ * This is set while we are streaming. When not set, SIGINT signal will be
* handled like SIGTERM. When set, the main loop is responsible for checking
- * walsender_ready_to_stop and terminating when it's set (after streaming any
- * remaining WAL).
+ * got_SIGINT and terminating when it's set (after streaming any remaining
+ * WAL).
*/
static volatile sig_atomic_t replication_active = false;
@@ -185,6 +189,7 @@ static XLogRecPtr logical_startptr = InvalidXLogRecPtr;
/* Signal handlers */
static void WalSndSigHupHandler(SIGNAL_ARGS);
static void WalSndXLogSendHandler(SIGNAL_ARGS);
+static void WalSndSwitchStopping(SIGNAL_ARGS);
static void WalSndLastCycleHandler(SIGNAL_ARGS);
/* Prototypes for private functions */
@@ -262,11 +267,14 @@ WalSndErrorCleanup()
ReplicationSlotRelease();
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
/* Revert back to startup state */
WalSndSetState(WALSNDSTATE_STARTUP);
+
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
}
/*
@@ -670,7 +678,7 @@ StartReplication(StartReplicationCmd *cmd)
WalSndLoop(XLogSendPhysical);
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -960,7 +968,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
{
ereport(LOG,
(errmsg("terminating walsender process after promotion")));
- walsender_ready_to_stop = true;
+ got_SIGINT = true;
}
WalSndSetState(WALSNDSTATE_CATCHUP);
@@ -1015,7 +1023,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
ReplicationSlotRelease();
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -1201,6 +1209,14 @@ WalSndWaitForWal(XLogRecPtr loc)
RecentFlushPtr = GetXLogReplayRecPtr(NULL);
/*
+ * If postmaster asked us to switch to the stopping state, do so.
+ * Shutdown is in progress and this will allow the checkpointer to
+ * move on with the shutdown checkpoint.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
* If postmaster asked us to stop, don't wait here anymore. This will
* cause the xlogreader to return without reading a full record, which
* is the fastest way to reach the mainloop which then can quit.
@@ -1209,7 +1225,7 @@ WalSndWaitForWal(XLogRecPtr loc)
* RecentFlushPtr, so we can send all remaining data before shutting
* down.
*/
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
break;
/*
@@ -1283,6 +1299,22 @@ exec_replication_command(const char *cmd_string)
MemoryContext old_context;
/*
+ * If WAL sender has been told that shutdown is getting close, switch its
+ * status accordingly to handle the next replication commands correctly.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
+ * Throw error if in stopping mode. We need to prevent commands that could
+ * generate WAL while the shutdown checkpoint is being written. To be
+ * safe, we just prohibit all new commands.
+ */
+ if (MyWalSnd->state == WALSNDSTATE_STOPPING)
+ ereport(ERROR,
+ (errmsg("cannot execute new commands while WAL sender is in stopping mode")));
+
+ /*
* CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
* command arrives. Clean up the old stuff if there's anything.
*/
@@ -1870,13 +1902,20 @@ WalSndLoop(WalSndSendDataCallback send_data)
}
/*
- * When SIGUSR2 arrives, we send any outstanding logs up to the
+ * At the reception of SIGUSR2, switch the WAL sender to the stopping
+ * state.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
+ * When SIGINT arrives, we send any outstanding logs up to the
* shutdown checkpoint record (i.e., the latest record), wait for
* them to be replicated to the standby, and exit. This may be a
* normal termination at shutdown, or a promotion, the walsender
* is not sure which.
*/
- if (walsender_ready_to_stop)
+ if (got_SIGINT)
WalSndDone(send_data);
}
@@ -2592,7 +2631,24 @@ WalSndXLogSendHandler(SIGNAL_ARGS)
errno = save_errno;
}
-/* SIGUSR2: set flag to do a last cycle and shut down afterwards */
+/* SIGUSR2: set flag to switch to stopping state */
+static void
+WalSndSwitchStopping(SIGNAL_ARGS)
+{
+ int save_errno = errno;
+
+ got_SIGUSR2 = true;
+ if (MyWalSnd)
+ SetLatch(&MyWalSnd->latch);
+
+ errno = save_errno;
+}
+
+/*
+ * SIGINT: set flag to do a last cycle and shut down afterwards. The WAL
+ * sender should already have been switched to WALSNDSTATE_STOPPING at
+ * this point.
+ */
static void
WalSndLastCycleHandler(SIGNAL_ARGS)
{
@@ -2607,7 +2663,7 @@ WalSndLastCycleHandler(SIGNAL_ARGS)
if (!replication_active)
kill(MyProcPid, SIGTERM);
- walsender_ready_to_stop = true;
+ got_SIGINT = true;
if (MyWalSnd)
SetLatch(&MyWalSnd->latch);
@@ -2621,14 +2677,14 @@ WalSndSignals(void)
/* Set up signal handlers */
pqsignal(SIGHUP, WalSndSigHupHandler); /* set flag to read config
* file */
- pqsignal(SIGINT, SIG_IGN); /* not used */
+ pqsignal(SIGINT, WalSndLastCycleHandler); /* request a last cycle and
+ * shutdown */
pqsignal(SIGTERM, die); /* request shutdown */
pqsignal(SIGQUIT, quickdie); /* hard crash time */
InitializeTimeouts(); /* establishes SIGALRM handler */
pqsignal(SIGPIPE, SIG_IGN);
pqsignal(SIGUSR1, WalSndXLogSendHandler); /* request WAL sending */
- pqsignal(SIGUSR2, WalSndLastCycleHandler); /* request a last cycle and
- * shutdown */
+ pqsignal(SIGUSR2, WalSndSwitchStopping); /* switch to stopping state */
/* Reset some signals that are accepted by postmaster but not here */
pqsignal(SIGCHLD, SIG_DFL);
@@ -2693,6 +2749,50 @@ WalSndWakeup(void)
SetLatch(&WalSndCtl->walsnds[i].latch);
}
+/*
+ * Wait until all the WAL senders have reached the stopping state. This is
+ * used by the checkpointer to control when shutdown checkpoints can
+ * safely begin.
+ */
+void
+WalSndWaitStopping(void)
+{
+ for (;;)
+ {
+ int i;
+ bool all_stopped = true;
+
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ WalSndState state;
+ WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+ SpinLockAcquire(&walsnd->mutex);
+
+ if (walsnd->pid == 0)
+ {
+ SpinLockRelease(&walsnd->mutex);
+ continue;
+ }
+
+ state = walsnd->state;
+ SpinLockRelease(&walsnd->mutex);
+
+ if (state != WALSNDSTATE_STOPPING)
+ {
+ all_stopped = false;
+ break;
+ }
+ }
+
+ /* safe to leave if confirmation is done for all WAL senders */
+ if (all_stopped)
+ return;
+
+ pg_usleep(10000L); /* wait for 10 msec */
+ }
+}
+
/* Set state for current walsender (only called in walsender) */
void
WalSndSetState(WalSndState state)
@@ -2727,6 +2827,8 @@ WalSndGetStateString(WalSndState state)
return "catchup";
case WALSNDSTATE_STREAMING:
return "streaming";
+ case WALSNDSTATE_STOPPING:
+ return "stopping";
}
return "UNKNOWN";
}
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index cff2be6d8f..0038d35946 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -33,6 +33,7 @@ extern void WalSndSignals(void);
extern Size WalSndShmemSize(void);
extern void WalSndShmemInit(void);
extern void WalSndWakeup(void);
+extern void WalSndWaitStopping(void);
extern void WalSndRqstFileReload(void);
extern Datum pg_stat_get_wal_senders(PG_FUNCTION_ARGS);
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index dff33549c8..a78f9b9fdb 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -24,7 +24,8 @@ typedef enum WalSndState
WALSNDSTATE_STARTUP = 0,
WALSNDSTATE_BACKUP,
WALSNDSTATE_CATCHUP,
- WALSNDSTATE_STREAMING
+ WALSNDSTATE_STREAMING,
+ WALSNDSTATE_STOPPING
} WalSndState;
/*
--
2.13.0
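The two-phase shutdown described in the commit message can be condensed into a small standalone state machine. This is an illustrative sketch, not PostgreSQL code: `FakeWalSnd`, `main_loop_step()`, `checkpoint_may_begin()`, and `run_shutdown_protocol()` are hypothetical names invented here; only the WalSndState values come from the patch.

```c
/* Mirror of WalSndState with the new STOPPING member added by the patch. */
typedef enum
{
    WALSNDSTATE_STARTUP = 0,
    WALSNDSTATE_BACKUP,
    WALSNDSTATE_CATCHUP,
    WALSNDSTATE_STREAMING,
    WALSNDSTATE_STOPPING
} WalSndState;

/* Hypothetical stand-in for one WAL sender and its signal flags. */
typedef struct
{
    WalSndState state;
    int         got_sigusr2;    /* phase 1: switch to stopping state */
    int         got_sigint;     /* phase 2: last cycle, then exit */
    int         exited;
} FakeWalSnd;

/* One iteration of the main loop, mimicking WalSndLoop after the patch. */
static void
main_loop_step(FakeWalSnd *ws)
{
    if (ws->got_sigusr2)
        ws->state = WALSNDSTATE_STOPPING;
    if (ws->got_sigint)
        ws->exited = 1;         /* stands in for WalSndDone()/proc_exit() */
}

/* Checkpointer side: the shutdown checkpoint may begin only once the
 * sender reports the stopping state (cf. WalSndWaitStopping). */
static int
checkpoint_may_begin(const FakeWalSnd *ws)
{
    return ws->state == WALSNDSTATE_STOPPING;
}

/* Before any signal arrives, the checkpoint must not be allowed to start. */
int
checkpoint_starts_too_early(void)
{
    FakeWalSnd  ws = {WALSNDSTATE_STREAMING, 0, 0, 0};

    return checkpoint_may_begin(&ws);
}

/* Drive the whole protocol once; returns 1 if the ordering held. */
int
run_shutdown_protocol(void)
{
    FakeWalSnd  ws = {WALSNDSTATE_STREAMING, 0, 0, 0};

    ws.got_sigusr2 = 1;         /* postmaster: phase 1 (SIGUSR2) */
    main_loop_step(&ws);
    if (!checkpoint_may_begin(&ws))
        return 0;               /* checkpoint must not have started early */

    ws.got_sigint = 1;          /* postmaster: phase 2 (SIGINT), after checkpoint */
    main_loop_step(&ws);
    return ws.exited;
}
```

The point of the split is the ordering: the checkpoint predicate only becomes true after phase 1, and the exit only happens after phase 2.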
Attachment: walsender-shutdown-92.patch (application/octet-stream)
From 6e5406b82d4a20c88df5bff8aeb2dbe903b40e7b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 1 May 2017 15:09:06 -0400
Subject: [PATCH] Prevent panic during shutdown checkpoint
When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and throws
a PANIC if so. At that point, only walsenders are still active, so one
might think this could not happen, but walsenders can also generate WAL,
for instance in BASE_BACKUP and certain variants of
CREATE_REPLICATION_SLOT. So they can trigger this panic if such a
command is run while the shutdown checkpoint is being written.
To fix this, divide the walsender shutdown into two phases. First, the
postmaster sends a SIGUSR2 signal to all walsenders. The walsenders
then put themselves into the "stopping" state. In this state, they
reject any new commands. (For simplicity, we reject all new commands,
so that in the future we do not have to track meticulously which
commands might generate WAL.) The checkpointer waits for all walsenders
to reach this state before proceeding with the shutdown checkpoint.
After the shutdown checkpoint is done, the postmaster sends
SIGINT (previously unused) to the walsenders. This triggers the
existing shutdown behavior of sending out the shutdown checkpoint record
and then terminating.
Author: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Fujii Masao <masao.fujii@gmail.com>
---
src/backend/access/transam/xlog.c | 6 ++
src/backend/postmaster/postmaster.c | 9 ++-
src/backend/replication/basebackup.c | 2 +-
src/backend/replication/walsender.c | 121 ++++++++++++++++++++++++----
src/include/replication/walsender.h | 1 +
src/include/replication/walsender_private.h | 3 +-
6 files changed, 122 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8c7a239160..82bc216a84 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7936,6 +7936,12 @@ ShutdownXLOG(int code, Datum arg)
ereport(LOG,
(errmsg("shutting down")));
+ /*
+ * Wait for WAL senders to be in stopping state. This prevents commands
+ * from writing new WAL.
+ */
+ WalSndWaitStopping();
+
if (RecoveryInProgress())
CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
else
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 95b2ce3c6b..148d1deeb2 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2516,7 +2516,7 @@ reaper(SIGNAL_ARGS)
ereport(LOG,
(errmsg("terminating all walsender processes to force cascaded "
"standby(s) to update timeline and reconnect")));
- SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
+ SignalSomeChildren(SIGINT, BACKEND_TYPE_WALSND);
}
/*
@@ -2595,7 +2595,7 @@ reaper(SIGNAL_ARGS)
* Waken walsenders for the last time. No regular backends
* should be around anymore.
*/
- SignalChildren(SIGUSR2);
+ SignalChildren(SIGINT);
pmState = PM_SHUTDOWN_2;
@@ -3138,7 +3138,9 @@ PostmasterStateMachine(void)
/*
* If we get here, we are proceeding with normal shutdown. All
* the regular children are gone, and it's time to tell the
- * checkpointer to do a shutdown checkpoint.
+ * checkpointer to do a shutdown checkpoint. All WAL senders
+ * are told to switch to a stopping state so that the shutdown
+ * checkpoint can go ahead.
*/
Assert(Shutdown > NoShutdown);
/* Start the checkpointer if not running */
@@ -3147,6 +3149,7 @@ PostmasterStateMachine(void)
/* And tell it to shut down */
if (CheckpointerPID != 0)
{
+ SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
signal_child(CheckpointerPID, SIGUSR2);
pmState = PM_SHUTDOWN;
}
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index f9a48d9cca..caa07ebeb7 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -444,7 +444,7 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
* file is required for recovery, and even that only if there happens
* to be a timeline switch in the first WAL segment that contains the
* checkpoint record, or if we're taking a base backup from a standby
- * server and the target timeline changes while the backup is taken.
+ * server and the target timeline changes while the backup is taken.
* But they are small and highly useful for debugging purposes, so
* better include them all, always.
*/
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 94da622f05..bf5d0bcb04 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -19,11 +19,14 @@
* are treated as not a crash but approximately normal termination;
* the walsender will exit quickly without sending any more XLOG records.
*
- * If the server is shut down, postmaster sends us SIGUSR2 after all
- * regular backends have exited and the shutdown checkpoint has been written.
- * This instruct walsender to send any outstanding WAL, including the
- * shutdown checkpoint record, wait for it to be replicated to the standby,
- * and then exit.
+ * If the server is shut down, postmaster sends us SIGUSR2 after all regular
+ * backends have exited. This causes the walsender to switch to the "stopping"
+ * state. In this state, the walsender will reject any replication command
+ * that may generate WAL activity. The checkpointer begins the shutdown
+ * checkpoint once all walsenders are confirmed as stopping. When the shutdown
+ * checkpoint finishes, the postmaster sends us SIGINT. This instructs
+ * walsender to send any outstanding WAL, including the shutdown checkpoint
+ * record, wait for it to be replicated to the standby, and then exit.
*
*
* Portions Copyright (c) 2010-2012, PostgreSQL Global Development Group
@@ -110,6 +113,7 @@ static TimestampTz last_reply_timestamp;
/* Flags set by signal handlers for later service in main loop */
static volatile sig_atomic_t got_SIGHUP = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
volatile sig_atomic_t walsender_shutdown_requested = false;
volatile sig_atomic_t walsender_ready_to_stop = false;
@@ -118,6 +122,7 @@ static void WalSndSigHupHandler(SIGNAL_ARGS);
static void WalSndShutdownHandler(SIGNAL_ARGS);
static void WalSndQuickDieHandler(SIGNAL_ARGS);
static void WalSndXLogSendHandler(SIGNAL_ARGS);
+static void WalSndSwitchStopping(SIGNAL_ARGS);
static void WalSndLastCycleHandler(SIGNAL_ARGS);
/* Prototypes for private functions */
@@ -372,15 +377,15 @@ StartReplication(StartReplicationCmd *cmd)
SendPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE);
/*
- * When promoting a cascading standby, postmaster sends SIGUSR2 to any
+ * When promoting a cascading standby, postmaster sends SIGINT to any
* cascading walsenders to kill them. But there is a corner-case where
- * such walsender fails to receive SIGUSR2 and survives a standby
- * promotion unexpectedly. This happens when postmaster sends SIGUSR2
+ * such walsender fails to receive SIGINT and survives a standby
+ * promotion unexpectedly. This happens when postmaster sends SIGINT
* before the walsender marks itself as a WAL sender, because postmaster
- * sends SIGUSR2 to only the processes marked as a WAL sender.
+ * sends SIGINT to only the processes marked as a WAL sender.
*
* To avoid this corner-case, if recovery is NOT in progress even though
- * the walsender is cascading one, we do the same thing as SIGUSR2 signal
+ * the walsender is cascading one, we do the same thing as SIGINT signal
* handler does, i.e., set walsender_ready_to_stop to true. Which causes
* the walsender to end later.
*
@@ -445,6 +450,22 @@ HandleReplicationCommand(const char *cmd_string)
MemoryContext cmd_context;
MemoryContext old_context;
+ /*
+ * If WAL sender has been told that shutdown is getting close, switch its
+ * status accordingly to handle the next replication commands correctly.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
+ * Throw error if in stopping mode. We need to prevent commands that could
+ * generate WAL while the shutdown checkpoint is being written. To be
+ * safe, we just prohibit all new commands.
+ */
+ if (MyWalSnd->state == WALSNDSTATE_STOPPING)
+ ereport(ERROR,
+ (errmsg("cannot execute new commands while WAL sender is in stopping mode")));
+
elog(DEBUG1, "received replication command: %s", cmd_string);
cmd_context = AllocSetContextCreate(CurrentMemoryContext,
@@ -800,7 +821,14 @@ WalSndLoop(void)
}
/*
- * When SIGUSR2 arrives, we send any outstanding logs up to the
+ * At the reception of SIGUSR2, switch the WAL sender to the stopping
+ * state.
+ */
+ if (got_SIGUSR2)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
+ * When SIGINT arrives, we send any outstanding logs up to the
* shutdown checkpoint record (i.e., the latest record), wait
* for them to be replicated to the standby, and exit.
* This may be a normal termination at shutdown, or a promotion,
@@ -1378,7 +1406,24 @@ WalSndXLogSendHandler(SIGNAL_ARGS)
errno = save_errno;
}
-/* SIGUSR2: set flag to do a last cycle and shut down afterwards */
+/* SIGUSR2: set flag to switch to stopping state */
+static void
+WalSndSwitchStopping(SIGNAL_ARGS)
+{
+ int save_errno = errno;
+
+ got_SIGUSR2 = true;
+ if (MyWalSnd)
+ SetLatch(&MyWalSnd->latch);
+
+ errno = save_errno;
+}
+
+/*
+ * SIGINT: set flag to do a last cycle and shut down afterwards. The WAL
+ * sender should already have been switched to WALSNDSTATE_STOPPING at
+ * this point.
+ */
static void
WalSndLastCycleHandler(SIGNAL_ARGS)
{
@@ -1398,14 +1443,14 @@ WalSndSignals(void)
/* Set up signal handlers */
pqsignal(SIGHUP, WalSndSigHupHandler); /* set flag to read config
* file */
- pqsignal(SIGINT, SIG_IGN); /* not used */
+ pqsignal(SIGINT, WalSndLastCycleHandler); /* request a last cycle and
+ * shutdown */
pqsignal(SIGTERM, WalSndShutdownHandler); /* request shutdown */
pqsignal(SIGQUIT, WalSndQuickDieHandler); /* hard crash time */
pqsignal(SIGALRM, handle_sig_alarm);
pqsignal(SIGPIPE, SIG_IGN);
pqsignal(SIGUSR1, WalSndXLogSendHandler); /* request WAL sending */
- pqsignal(SIGUSR2, WalSndLastCycleHandler); /* request a last cycle and
- * shutdown */
+ pqsignal(SIGUSR2, WalSndSwitchStopping); /* switch to stopping state */
/* Reset some signals that are accepted by postmaster but not here */
pqsignal(SIGCHLD, SIG_DFL);
@@ -1465,6 +1510,50 @@ WalSndWakeup(void)
SetLatch(&WalSndCtl->walsnds[i].latch);
}
+/*
+ * Wait until all the WAL senders have reached the stopping state. This is
+ * used by the checkpointer to control when shutdown checkpoints can
+ * safely begin.
+ */
+void
+WalSndWaitStopping(void)
+{
+ for (;;)
+ {
+ int i;
+ bool all_stopped = true;
+
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ WalSndState state;
+ WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+ SpinLockAcquire(&walsnd->mutex);
+
+ if (walsnd->pid == 0)
+ {
+ SpinLockRelease(&walsnd->mutex);
+ continue;
+ }
+
+ state = walsnd->state;
+ SpinLockRelease(&walsnd->mutex);
+
+ if (state != WALSNDSTATE_STOPPING)
+ {
+ all_stopped = false;
+ break;
+ }
+ }
+
+ /* safe to leave if confirmation is done for all WAL senders */
+ if (all_stopped)
+ return;
+
+ pg_usleep(10000L); /* wait for 10 msec */
+ }
+}
+
/* Set state for current walsender (only called in walsender) */
void
WalSndSetState(WalSndState state)
@@ -1499,6 +1588,8 @@ WalSndGetStateString(WalSndState state)
return "catchup";
case WALSNDSTATE_STREAMING:
return "streaming";
+ case WALSNDSTATE_STOPPING:
+ return "stopping";
}
return "UNKNOWN";
}
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 128d2dbf59..de9d1a4597 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -31,6 +31,7 @@ extern void WalSndSignals(void);
extern Size WalSndShmemSize(void);
extern void WalSndShmemInit(void);
extern void WalSndWakeup(void);
+extern void WalSndWaitStopping(void);
extern void WalSndRqstFileReload(void);
extern Datum pg_stat_get_wal_senders(PG_FUNCTION_ARGS);
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 45cd7444cd..212a73d8f9 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -24,7 +24,8 @@ typedef enum WalSndState
WALSNDSTATE_STARTUP = 0,
WALSNDSTATE_BACKUP,
WALSNDSTATE_CATCHUP,
- WALSNDSTATE_STREAMING
+ WALSNDSTATE_STREAMING,
+ WALSNDSTATE_STOPPING
} WalSndState;
/*
--
2.13.0
On 2017-05-05 10:50:11 -0400, Peter Eisentraut wrote:
On 5/5/17 01:26, Michael Paquier wrote:
The only code path doing HOT-pruning and generating WAL is
heap_page_prune(). Do you think that we need to worry about FPWs as
well?

Attached is an updated patch, which also forbids the run of any
replication commands when the stopping state is reached.

I have committed this without the HOT pruning change. That can be
considered separately, and I think it could use another round of
thinking about it.
I'm unhappy about how this is reusing SIGINT for WalSndLastCycleHandler.
Normally INT is used for query cancel interrupts, and since walsender is
now also working as a normal backend, this overlap is bad. Even for plain
walsender backends this seems bad, because now non-superuser replication
users will terminate replication connections when they do
pg_cancel_backend(). For replication=dbname users it's especially bad
because there can be completely legitimate uses of pg_cancel_backend().
- Andres
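Both of the handlers under discussion follow the same flag-plus-latch pattern: the handler only sets a volatile sig_atomic_t and the main loop reacts later. A minimal standalone sketch of that pattern follows; `simulate_cancel()` and `got_stop_request` are hypothetical names invented here, and the real handlers additionally call SetLatch() to wake the loop.

```c
#include <signal.h>

/* Stand-in for the walsender's got_SIGINT flag. */
static volatile sig_atomic_t got_stop_request = 0;

/* Async-signal-safe handler: only set a flag; the main loop acts on it. */
static void
stop_handler(int signo)
{
    (void) signo;
    got_stop_request = 1;
}

/* Hypothetical driver: install the handler, then deliver SIGINT to
 * ourselves, which is what pg_cancel_backend() effectively does via
 * kill(pid, SIGINT) -- hence the overlap Andres objects to. */
int
simulate_cancel(void)
{
    signal(SIGINT, stop_handler);
    raise(SIGINT);              /* handler runs before raise() returns */
    return (int) got_stop_request;
}
```

This also makes the objection concrete: once SIGINT means "do a last cycle and shut down", any client allowed to call pg_cancel_backend() on a walsender can trigger exactly this path.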
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jun 2, 2017 at 7:05 AM, Andres Freund <andres@anarazel.de> wrote:
I'm unhappy about how this is reusing SIGINT for WalSndLastCycleHandler.
Normally INT is used for query cancel interrupts, and since walsender is
now also working as a normal backend, this overlap is bad. Even for plain
walsender backends this seems bad, because now non-superuser replication
users will terminate replication connections when they do
pg_cancel_backend(). For replication=dbname users it's especially bad
because there can be completely legitimate uses of pg_cancel_backend().
Signals for WAL senders are set in WalSndSignals() which uses SIG_IGN
for SIGINT now in ~9.6, and StatementCancelHandler does not get set up
for a non-am_walsender backend. Am I missing something?
--
Michael
On 2017-06-02 08:38:51 +0900, Michael Paquier wrote:
On Fri, Jun 2, 2017 at 7:05 AM, Andres Freund <andres@anarazel.de> wrote:
I'm unhappy about how this is reusing SIGINT for WalSndLastCycleHandler.
Normally INT is used for query cancel interrupts, and since walsender is
now also working as a normal backend, this overlap is bad. Even for plain
walsender backends this seems bad, because now non-superuser replication
users will terminate replication connections when they do
pg_cancel_backend(). For replication=dbname users it's especially bad
because there can be completely legitimate uses of pg_cancel_backend().

Signals for WAL senders are set in WalSndSignals(), which uses SIG_IGN
for SIGINT now in ~9.6, and StatementCancelHandler does not get set up
for a non-am_walsender backend. Am I missing something?
Yes, but nothing in those observations actually addresses my point?
Some points:
1) 086221cf6b1727c2baed4703c582f657b7c5350e changes things so walsender
backends use SIGINT for WalSndLastCycleHandler(), which is now
triggerable by pg_cancel_backend(). Especially for logical rep
walsenders it's not absurd to send that.
2) Walsenders now can run normal queries.
3) Afaict 086221cf6b1727c2baed4703c582f657b7c5350e doesn't really
address the PANIC problem for database connected walsenders at all,
because it doesn't even cancel non-replication commands. I.e. an
already running query can just continue to run. Which afaict just
entirely breaks shutdown. If the connection is idle, or running a
query, we'll just wait forever.
4) The whole logic introduced in the above commit doesn't actually
appear to handle logical decoding senders properly - wasn't the whole
issue at hand that those can write WAL in some cases? But
nevertheless WalSndWaitForWal() does a
WalSndSetState(WALSNDSTATE_STOPPING); *and then continues decoding
and waiting* - which seems to obviate the entire point of that commit.
I'm working on a patch rejiggering things so:
a) upon shutdown, checkpointer (so we can use procsignal), not
postmaster, sends PROCSIG_WALSND_INIT_STOPPING to all walsenders (so
we don't have to use up a normal signal handler)
b) Upon reception walsenders immediately exit if !replication_active,
otherwise set got_STOPPING
c) Logical decoding will finish if got_STOPPING is set, *WITHOUT*, as
currently done, moving to WALSNDSTATE_STOPPING. Not yet quite sure
how to best handle the WALSNDSTATE_STOPPING transition in WalSndLoop().
d) Once all remaining walsenders are in the stopping state, postmaster sends
SIGUSR2 to trigger shutdown (basically as before)
Does that seem to make sense?
- Andres
On Fri, Jun 2, 2017 at 9:29 AM, Andres Freund <andres@anarazel.de> wrote:
On 2017-06-02 08:38:51 +0900, Michael Paquier wrote:
On Fri, Jun 2, 2017 at 7:05 AM, Andres Freund <andres@anarazel.de> wrote:
I'm unhappy about how this is reusing SIGINT for WalSndLastCycleHandler.
Normally INT is used to cancel interrupts, and since walsender is now also
working as a normal backend, this overlap is bad. Even for a plain
walsender backend this seems bad, because now non-superuser replication
users will terminate replication connections when they do
pg_cancel_backend(). For replication=dbname users it's especially bad
because there can be completely legitimate uses of pg_cancel_backend().
Signals for WAL senders are set in WalSndSignals() which uses SIG_IGN
for SIGINT now in ~9.6, and StatementCancelHandler does not get set up
for a non-am_walsender backend. Am I missing something?
Yes, but nothing in those observations actually addresses my point?
I am still confused by your previous email, which, it seems to me at
least, implies that logical WAL senders have been working correctly
with query cancellations. Now SIGINT is just ignored, which means that
pg_cancel_backend() has never worked for WAL senders until now, and
this behavior has not changed with 086221c. So there is no new
breakage introduced by this commit. I do get your point about reusing
SIGINT for query cancellations, but that's a new feature.
Some points:
1) 086221cf6b1727c2baed4703c582f657b7c5350e changes things so walsender
backends use SIGINT for WalSndLastCycleHandler(), which is now
triggerable by pg_cancel_backend(). Especially for logical rep
walsenders it's not absurd to send that.
2) Walsenders now can run normal queries.
3) Afaict 086221cf6b1727c2baed4703c582f657b7c5350e doesn't really
address the PANIC problem for database connected walsenders at all,
because it doesn't even cancel non-replication commands. I.e. an
already running query can just continue to run. Which afaict just
entirely breaks shutdown. If the connection is idle, or running a
query, we'll just wait forever.
4) The whole logic introduced in the above commit doesn't actually
appear to handle logical decoding senders properly - wasn't the whole
issue at hand that those can write WAL in some cases? But
nevertheless WalSndWaitForWal() does a
WalSndSetState(WALSNDSTATE_STOPPING); *and then continues decoding
and waiting* - which seems to obviate the entire point of that commit.
I'm working on a patch rejiggering things so:
a) upon shutdown checkpointer (so we can use procsignal), not
postmaster, sends PROCSIG_WALSND_INIT_STOPPING to all walsenders (so
we don't have to use up a normal signal handler)
You'll need a second one that wakes up the latch of the WAL senders to
send more WAL records.
b) Upon reception walsenders immediately exit if !replication_active,
otherwise sets got_STOPPING
Okay, that's what happens now anyway, any new replication command
received results in an error. I actually prefer the way of doing in
HEAD, which at least reports an error.
c) Logical decoding will finish if got_STOPPING is sent, *WITHOUT*, as
currently done, moving to WALSNDSTATE_STOPPING. Not yet quite sure
how to best handle the WALSNDSTATE_STOPPING transition in WalSndLoop().
Wouldn't it make sense to have the logical receivers be able to
receive WAL up to the end of checkpoint record?
d) Once all remaining walsenders are in stopping state, postmaster sends
SIGUSR2 to trigger shutdown (basically as before)
OK.
--
Michael
On 2017-06-02 10:05:21 +0900, Michael Paquier wrote:
On Fri, Jun 2, 2017 at 9:29 AM, Andres Freund <andres@anarazel.de> wrote:
On 2017-06-02 08:38:51 +0900, Michael Paquier wrote:
On Fri, Jun 2, 2017 at 7:05 AM, Andres Freund <andres@anarazel.de> wrote:
I'm unhappy about how this is reusing SIGINT for WalSndLastCycleHandler.
Normally INT is used to cancel interrupts, and since walsender is now also
working as a normal backend, this overlap is bad. Even for a plain
walsender backend this seems bad, because now non-superuser replication
users will terminate replication connections when they do
pg_cancel_backend(). For replication=dbname users it's especially bad
because there can be completely legitimate uses of pg_cancel_backend().
Signals for WAL senders are set in WalSndSignals() which uses SIG_IGN
for SIGINT now in ~9.6, and StatementCancelHandler does not get set up
for a non-am_walsender backend. Am I missing something?
Yes, but nothing in those observations actually addresses my point?
I am still confused by your previous email, which, it seems to me at
least, implies that logical WAL senders have been working correctly
with query cancellations. Now SIGINT is just ignored, which means that
pg_cancel_backend() has never worked for WAL senders until now, and
this behavior has not changed with 086221c. So there is no new
breakage introduced by this commit. I do get your point about reusing
SIGINT for query cancellations, but that's a new feature.
The issue is that the commit made a non-existent feature
(pg_cancel_backend() to walsenders) into a broken one (pg_cancel_backend
terminates walsenders). Additionally v10 added something new
(walsenders executing SQL), and that will need at least some signal
handling fixes - hard to do if e.g. SIGINT is reused for something else.
a) upon shutdown checkpointer (so we can use procsignal), not
postmaster, sends PROCSIG_WALSND_INIT_STOPPING to all walsenders (so
we don't have to use up a normal signal handler)
You'll need a second one that wakes up the latch of the WAL senders to
send more WAL records.
Don't think so, procsignal_sigusr1_handler serves quite well for that.
There's nearby discussion that we need to do so anyway, to fix recovery
conflict interrupts, parallelism interrupts and such.
b) Upon reception walsenders immediately exit if !replication_active,
otherwise sets got_STOPPING
Okay, that's what happens now anyway: any new replication command
received results in an error. I actually prefer the way it's done in
HEAD, which at least reports an error.
Err, no. What happens right now is that plainly nothing is done if a
connection is idle or busy executing things. Only if a new command is
sent do we error out - that makes very little sense.
c) Logical decoding will finish if got_STOPPING is sent, *WITHOUT*, as
currently done, moving to WALSNDSTATE_STOPPING. Not yet quite sure
how to best handle the WALSNDSTATE_STOPPING transition in WalSndLoop().
Wouldn't it make sense to have the logical receivers be able to
receive WAL up to the end of checkpoint record?
Yea, that's what I'm doing. For that we really only need to change the
WalSndWaitForWal() check of got_SIGINT to got_STOPPING, and add
an XLogSendLogical() check in the WalSndCaughtUp if() that sets
got_SIGUSR2 *without* setting WALSNDSTATE_STOPPING (otherwise we'd
possibly continue to trigger WAL records until the send buffer is
emptied).
- Andres
On Thu, Jun 1, 2017 at 6:05 PM, Andres Freund <andres@anarazel.de> wrote:
I'm unhappy about how this is reusing SIGINT for WalSndLastCycleHandler.
Normally INT is used to cancel interrupts, and since walsender is now also
working as a normal backend, this overlap is bad.
Yep, that's bad.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2017-06-01 17:29:12 -0700, Andres Freund wrote:
On 2017-06-02 08:38:51 +0900, Michael Paquier wrote:
On Fri, Jun 2, 2017 at 7:05 AM, Andres Freund <andres@anarazel.de> wrote:
I'm unhappy about how this is reusing SIGINT for WalSndLastCycleHandler.
Normally INT is used to cancel interrupts, and since walsender is now also
working as a normal backend, this overlap is bad. Even for a plain
walsender backend this seems bad, because now non-superuser replication
users will terminate replication connections when they do
pg_cancel_backend(). For replication=dbname users it's especially bad
because there can be completely legitimate uses of pg_cancel_backend().
Signals for WAL senders are set in WalSndSignals() which uses SIG_IGN
for SIGINT now in ~9.6, and StatementCancelHandler does not get set up
for a non-am_walsender backend. Am I missing something?
Yes, but nothing in those observations actually addresses my point?
Some points:
1) 086221cf6b1727c2baed4703c582f657b7c5350e changes things so walsender
backends use SIGINT for WalSndLastCycleHandler(), which is now
triggerable by pg_cancel_backend(). Especially for logical rep
walsenders it's not absurd to send that.
2) Walsenders now can run normal queries.
3) Afaict 086221cf6b1727c2baed4703c582f657b7c5350e doesn't really
address the PANIC problem for database connected walsenders at all,
because it doesn't even cancel non-replication commands. I.e. an
already running query can just continue to run. Which afaict just
entirely breaks shutdown. If the connection is idle, or running a
query, we'll just wait forever.
4) The whole logic introduced in the above commit doesn't actually
appear to handle logical decoding senders properly - wasn't the whole
issue at hand that those can write WAL in some cases? But
nevertheless WalSndWaitForWal() does a
WalSndSetState(WALSNDSTATE_STOPPING); *and then continues decoding
and waiting* - which seems to obviate the entire point of that commit.
I'm working on a patch rejiggering things so:
a) upon shutdown checkpointer (so we can use procsignal), not
postmaster, sends PROCSIG_WALSND_INIT_STOPPING to all walsenders (so
we don't have to use up a normal signal handler)
b) Upon reception walsenders immediately exit if !replication_active,
otherwise sets got_STOPPING
c) Logical decoding will finish if got_STOPPING is sent, *WITHOUT*, as
currently done, moving to WALSNDSTATE_STOPPING. Not yet quite sure
how to best handle the WALSNDSTATE_STOPPING transition in WalSndLoop().
d) Once all remaining walsenders are in stopping state, postmaster sends
SIGUSR2 to trigger shutdown (basically as before)
Does that seem to make sense?
Attached is a *preliminary* patch series implementing this. I've first
reverted the previous patch, as otherwise backpatchable versions of the
necessary patches would get too complicated, due to the signals used and
such.
This also fixes several of the issues from the somewhat related thread at
http://archives.postgresql.org/message-id/20170421014030.fdzvvvbrz4nckrow%40alap3.anarazel.de
I'm not perfectly happy with the use of XLogBackgroundFlush() but we
don't currently expose anything else to flush all pending WAL afaics -
it's not too bad either. Without that we can end up waiting forever
if the last XLogInserts are done by an asynchronously committing
backend, or the relevant backends exited before getting to flush out
their records, because walwriter has already been shut down at that
point.
Comments?
- Andres
Attachments:
0001-Revert-Prevent-panic-during-shutdown-checkpoint.patch (text/x-patch; charset=us-ascii)
From 187f99cdf98886be954cd1edda275c51b83da5ef Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 2 Jun 2017 14:14:34 -0700
Subject: [PATCH 1/4] Revert "Prevent panic during shutdown checkpoint"
This reverts commit 086221cf6b1727c2baed4703c582f657b7c5350e, which
was made to master only.
The approach implemented in the above commit has some issues. While
this could easily be fixed incrementally, doing so would make
backpatching considerably harder, so instead first revert this patch.
Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
---
doc/src/sgml/monitoring.sgml | 5 -
src/backend/access/transam/xlog.c | 6 --
src/backend/postmaster/postmaster.c | 7 +-
src/backend/replication/walsender.c | 143 ++++------------------------
src/include/replication/walsender.h | 1 -
src/include/replication/walsender_private.h | 3 +-
6 files changed, 24 insertions(+), 141 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 79ca45a156..5640c0d84a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1690,11 +1690,6 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<literal>backup</>: This WAL sender is sending a backup.
</para>
</listitem>
- <listitem>
- <para>
- <literal>stopping</>: This WAL sender is stopping.
- </para>
- </listitem>
</itemizedlist>
</entry>
</row>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 399822d3fe..35ee7d1cb6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8324,12 +8324,6 @@ ShutdownXLOG(int code, Datum arg)
ereport(IsPostmasterEnvironment ? LOG : NOTICE,
(errmsg("shutting down")));
- /*
- * Wait for WAL senders to be in stopping state. This prevents commands
- * from writing new WAL.
- */
- WalSndWaitStopping();
-
if (RecoveryInProgress())
CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
else
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 35b4ec88d3..5c79b1e40d 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2918,7 +2918,7 @@ reaper(SIGNAL_ARGS)
* Waken walsenders for the last time. No regular backends
* should be around anymore.
*/
- SignalChildren(SIGINT);
+ SignalChildren(SIGUSR2);
pmState = PM_SHUTDOWN_2;
@@ -3656,9 +3656,7 @@ PostmasterStateMachine(void)
/*
* If we get here, we are proceeding with normal shutdown. All
* the regular children are gone, and it's time to tell the
- * checkpointer to do a shutdown checkpoint. All WAL senders
- * are told to switch to a stopping state so that the shutdown
- * checkpoint can go ahead.
+ * checkpointer to do a shutdown checkpoint.
*/
Assert(Shutdown > NoShutdown);
/* Start the checkpointer if not running */
@@ -3667,7 +3665,6 @@ PostmasterStateMachine(void)
/* And tell it to shut down */
if (CheckpointerPID != 0)
{
- SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
signal_child(CheckpointerPID, SIGUSR2);
pmState = PM_SHUTDOWN;
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 49cce38880..aa705e5b35 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -24,14 +24,11 @@
* are treated as not a crash but approximately normal termination;
* the walsender will exit quickly without sending any more XLOG records.
*
- * If the server is shut down, postmaster sends us SIGUSR2 after all regular
- * backends have exited. This causes the walsender to switch to the "stopping"
- * state. In this state, the walsender will reject any replication command
- * that may generate WAL activity. The checkpointer begins the shutdown
- * checkpoint once all walsenders are confirmed as stopping. When the shutdown
- * checkpoint finishes, the postmaster sends us SIGINT. This instructs
- * walsender to send any outstanding WAL, including the shutdown checkpoint
- * record, wait for it to be replicated to the standby, and then exit.
+ * If the server is shut down, postmaster sends us SIGUSR2 after all
+ * regular backends have exited and the shutdown checkpoint has been written.
+ * This instructs walsender to send any outstanding WAL, including the
+ * shutdown checkpoint record, wait for it to be replicated to the standby,
+ * and then exit.
*
*
* Portions Copyright (c) 2010-2017, PostgreSQL Global Development Group
@@ -180,14 +177,13 @@ static bool WalSndCaughtUp = false;
/* Flags set by signal handlers for later service in main loop */
static volatile sig_atomic_t got_SIGHUP = false;
-static volatile sig_atomic_t got_SIGINT = false;
-static volatile sig_atomic_t got_SIGUSR2 = false;
+static volatile sig_atomic_t walsender_ready_to_stop = false;
/*
- * This is set while we are streaming. When not set, SIGINT signal will be
+ * This is set while we are streaming. When not set, SIGUSR2 signal will be
* handled like SIGTERM. When set, the main loop is responsible for checking
- * got_SIGINT and terminating when it's set (after streaming any remaining
- * WAL).
+ * walsender_ready_to_stop and terminating when it's set (after streaming any
+ * remaining WAL).
*/
static volatile sig_atomic_t replication_active = false;
@@ -217,7 +213,6 @@ static struct
/* Signal handlers */
static void WalSndSigHupHandler(SIGNAL_ARGS);
static void WalSndXLogSendHandler(SIGNAL_ARGS);
-static void WalSndSwitchStopping(SIGNAL_ARGS);
static void WalSndLastCycleHandler(SIGNAL_ARGS);
/* Prototypes for private functions */
@@ -306,14 +301,11 @@ WalSndErrorCleanup(void)
ReplicationSlotCleanup();
replication_active = false;
- if (got_SIGINT)
+ if (walsender_ready_to_stop)
proc_exit(0);
/* Revert back to startup state */
WalSndSetState(WALSNDSTATE_STARTUP);
-
- if (got_SIGUSR2)
- WalSndSetState(WALSNDSTATE_STOPPING);
}
/*
@@ -686,7 +678,7 @@ StartReplication(StartReplicationCmd *cmd)
WalSndLoop(XLogSendPhysical);
replication_active = false;
- if (got_SIGINT)
+ if (walsender_ready_to_stop)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -1064,7 +1056,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
{
ereport(LOG,
(errmsg("terminating walsender process after promotion")));
- got_SIGINT = true;
+ walsender_ready_to_stop = true;
}
WalSndSetState(WALSNDSTATE_CATCHUP);
@@ -1115,7 +1107,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
ReplicationSlotRelease();
replication_active = false;
- if (got_SIGINT)
+ if (walsender_ready_to_stop)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -1327,14 +1319,6 @@ WalSndWaitForWal(XLogRecPtr loc)
RecentFlushPtr = GetXLogReplayRecPtr(NULL);
/*
- * If postmaster asked us to switch to the stopping state, do so.
- * Shutdown is in progress and this will allow the checkpointer to
- * move on with the shutdown checkpoint.
- */
- if (got_SIGUSR2)
- WalSndSetState(WALSNDSTATE_STOPPING);
-
- /*
* If postmaster asked us to stop, don't wait here anymore. This will
* cause the xlogreader to return without reading a full record, which
* is the fastest way to reach the mainloop which then can quit.
@@ -1343,7 +1327,7 @@ WalSndWaitForWal(XLogRecPtr loc)
* RecentFlushPtr, so we can send all remaining data before shutting
* down.
*/
- if (got_SIGINT)
+ if (walsender_ready_to_stop)
break;
/*
@@ -1418,22 +1402,6 @@ exec_replication_command(const char *cmd_string)
MemoryContext old_context;
/*
- * If WAL sender has been told that shutdown is getting close, switch its
- * status accordingly to handle the next replication commands correctly.
- */
- if (got_SIGUSR2)
- WalSndSetState(WALSNDSTATE_STOPPING);
-
- /*
- * Throw error if in stopping mode. We need prevent commands that could
- * generate WAL while the shutdown checkpoint is being written. To be
- * safe, we just prohibit all new commands.
- */
- if (MyWalSnd->state == WALSNDSTATE_STOPPING)
- ereport(ERROR,
- (errmsg("cannot execute new commands while WAL sender is in stopping mode")));
-
- /*
* CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
* command arrives. Clean up the old stuff if there's anything.
*/
@@ -2155,20 +2123,13 @@ WalSndLoop(WalSndSendDataCallback send_data)
}
/*
- * At the reception of SIGUSR2, switch the WAL sender to the
- * stopping state.
- */
- if (got_SIGUSR2)
- WalSndSetState(WALSNDSTATE_STOPPING);
-
- /*
- * When SIGINT arrives, we send any outstanding logs up to the
+ * When SIGUSR2 arrives, we send any outstanding logs up to the
* shutdown checkpoint record (i.e., the latest record), wait for
* them to be replicated to the standby, and exit. This may be a
* normal termination at shutdown, or a promotion, the walsender
* is not sure which.
*/
- if (got_SIGINT)
+ if (walsender_ready_to_stop)
WalSndDone(send_data);
}
@@ -2907,23 +2868,7 @@ WalSndXLogSendHandler(SIGNAL_ARGS)
errno = save_errno;
}
-/* SIGUSR2: set flag to switch to stopping state */
-static void
-WalSndSwitchStopping(SIGNAL_ARGS)
-{
- int save_errno = errno;
-
- got_SIGUSR2 = true;
- SetLatch(MyLatch);
-
- errno = save_errno;
-}
-
-/*
- * SIGINT: set flag to do a last cycle and shut down afterwards. The WAL
- * sender should already have been switched to WALSNDSTATE_STOPPING at
- * this point.
- */
+/* SIGUSR2: set flag to do a last cycle and shut down afterwards */
static void
WalSndLastCycleHandler(SIGNAL_ARGS)
{
@@ -2938,7 +2883,7 @@ WalSndLastCycleHandler(SIGNAL_ARGS)
if (!replication_active)
kill(MyProcPid, SIGTERM);
- got_SIGINT = true;
+ walsender_ready_to_stop = true;
SetLatch(MyLatch);
errno = save_errno;
@@ -2951,14 +2896,14 @@ WalSndSignals(void)
/* Set up signal handlers */
pqsignal(SIGHUP, WalSndSigHupHandler); /* set flag to read config
* file */
- pqsignal(SIGINT, WalSndLastCycleHandler); /* request a last cycle and
- * shutdown */
+ pqsignal(SIGINT, SIG_IGN); /* not used */
pqsignal(SIGTERM, die); /* request shutdown */
pqsignal(SIGQUIT, quickdie); /* hard crash time */
InitializeTimeouts(); /* establishes SIGALRM handler */
pqsignal(SIGPIPE, SIG_IGN);
pqsignal(SIGUSR1, WalSndXLogSendHandler); /* request WAL sending */
- pqsignal(SIGUSR2, WalSndSwitchStopping); /* switch to stopping state */
+ pqsignal(SIGUSR2, WalSndLastCycleHandler); /* request a last cycle and
+ * shutdown */
/* Reset some signals that are accepted by postmaster but not here */
pqsignal(SIGCHLD, SIG_DFL);
@@ -3036,50 +2981,6 @@ WalSndWakeup(void)
}
}
-/*
- * Wait that all the WAL senders have reached the stopping state. This is
- * used by the checkpointer to control when shutdown checkpoints can
- * safely begin.
- */
-void
-WalSndWaitStopping(void)
-{
- for (;;)
- {
- int i;
- bool all_stopped = true;
-
- for (i = 0; i < max_wal_senders; i++)
- {
- WalSndState state;
- WalSnd *walsnd = &WalSndCtl->walsnds[i];
-
- SpinLockAcquire(&walsnd->mutex);
-
- if (walsnd->pid == 0)
- {
- SpinLockRelease(&walsnd->mutex);
- continue;
- }
-
- state = walsnd->state;
- SpinLockRelease(&walsnd->mutex);
-
- if (state != WALSNDSTATE_STOPPING)
- {
- all_stopped = false;
- break;
- }
- }
-
- /* safe to leave if confirmation is done for all WAL senders */
- if (all_stopped)
- return;
-
- pg_usleep(10000L); /* wait for 10 msec */
- }
-}
-
/* Set state for current walsender (only called in walsender) */
void
WalSndSetState(WalSndState state)
@@ -3113,8 +3014,6 @@ WalSndGetStateString(WalSndState state)
return "catchup";
case WALSNDSTATE_STREAMING:
return "streaming";
- case WALSNDSTATE_STOPPING:
- return "stopping";
}
return "UNKNOWN";
}
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 99f12377e0..2ca903872e 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -44,7 +44,6 @@ extern void WalSndSignals(void);
extern Size WalSndShmemSize(void);
extern void WalSndShmemInit(void);
extern void WalSndWakeup(void);
-extern void WalSndWaitStopping(void);
extern void WalSndRqstFileReload(void);
/*
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 36311e124c..2c59056cef 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -24,8 +24,7 @@ typedef enum WalSndState
WALSNDSTATE_STARTUP = 0,
WALSNDSTATE_BACKUP,
WALSNDSTATE_CATCHUP,
- WALSNDSTATE_STREAMING,
- WALSNDSTATE_STOPPING
+ WALSNDSTATE_STREAMING
} WalSndState;
/*
--
2.12.0.264.gd6db3f2165.dirty
0002-Have-walsenders-participate-in-procsignal-infrastruc.patch (text/x-patch; charset=us-ascii)
From fc7a0a3227fbd2ad71e07ee22fa3e9f2de861d3e Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 2 Jun 2017 16:54:51 -0700
Subject: [PATCH 2/4] Have walsenders participate in procsignal infrastructure.
The non-participation in procsignal was a problem both for changes in
master, e.g. parallelism not working for normal statements run in
walsender backends, and for older branches, e.g. recovery conflicts and
catchup interrupts not working for logical decoding walsenders.
This commit thus replaces the previous WalSndXLogSendHandler with
procsignal_sigusr1_handler. In branches since db0f6cad48 that can
lead to additional SetLatch calls, but that only rarely seems to make
a difference.
Author: Andres Freund
Discussion: https://postgr.es/m/20170421014030.fdzvvvbrz4nckrow@alap3.anarazel.de
Backpatch: 9.4, earlier commits don't seem to benefit sufficiently
---
src/backend/replication/walsender.c | 14 +-------------
1 file changed, 1 insertion(+), 13 deletions(-)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index aa705e5b35..27aa3e6bc7 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -212,7 +212,6 @@ static struct
/* Signal handlers */
static void WalSndSigHupHandler(SIGNAL_ARGS);
-static void WalSndXLogSendHandler(SIGNAL_ARGS);
static void WalSndLastCycleHandler(SIGNAL_ARGS);
/* Prototypes for private functions */
@@ -2857,17 +2856,6 @@ WalSndSigHupHandler(SIGNAL_ARGS)
errno = save_errno;
}
-/* SIGUSR1: set flag to send WAL records */
-static void
-WalSndXLogSendHandler(SIGNAL_ARGS)
-{
- int save_errno = errno;
-
- latch_sigusr1_handler();
-
- errno = save_errno;
-}
-
/* SIGUSR2: set flag to do a last cycle and shut down afterwards */
static void
WalSndLastCycleHandler(SIGNAL_ARGS)
@@ -2901,7 +2889,7 @@ WalSndSignals(void)
pqsignal(SIGQUIT, quickdie); /* hard crash time */
InitializeTimeouts(); /* establishes SIGALRM handler */
pqsignal(SIGPIPE, SIG_IGN);
- pqsignal(SIGUSR1, WalSndXLogSendHandler); /* request WAL sending */
+ pqsignal(SIGUSR1, procsignal_sigusr1_handler);
pqsignal(SIGUSR2, WalSndLastCycleHandler); /* request a last cycle and
* shutdown */
--
2.12.0.264.gd6db3f2165.dirty
0003-Prevent-possibility-of-panics-during-shutdown-checkp.patch (text/x-patch; charset=us-ascii)
From f8c2a9b96778dca5e6c94af68316f608651d79e2 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 2 Jun 2017 14:15:34 -0700
Subject: [PATCH 3/4] Prevent possibility of panics during shutdown checkpoint.
When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and
throws a PANIC if so. At that point, only walsenders are still
active, so one might think this could not happen, but walsenders can
also generate WAL, for instance in BASE_BACKUP and logical decoding
related commands (e.g. via hint bits). So they can trigger this panic
if such a command is run while the shutdown checkpoint is being
written.
To fix this, divide the walsender shutdown into two phases. First,
checkpointer, itself triggered by postmaster, sends a
PROCSIG_WALSND_INIT_STOPPING signal to all walsenders. If the backend
is idle or runs an SQL query this causes the backend to shutdown, if
logical replication is in progress all existing WAL records are
processed followed by a shutdown. Otherwise this causes the walsender
to switch to the "stopping" state. In this state, the walsender will
reject any further replication commands. The checkpointer begins the
shutdown checkpoint once all walsenders are confirmed as
stopping. When the shutdown checkpoint finishes, the postmaster sends
us SIGUSR2. This instructs walsender to send any outstanding WAL,
including the shutdown checkpoint record, wait for it to be replicated
to the standby, and then exit.
Author: Andres Freund, based on an earlier patch by Michael Paquier
Reported-By: Fujii Masao, Andres Freund
Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
Backpatch: 9.4, where logical decoding was introduced
---
doc/src/sgml/monitoring.sgml | 5 +
src/backend/access/transam/xlog.c | 11 ++
src/backend/replication/walsender.c | 188 ++++++++++++++++++++++++----
src/backend/storage/ipc/procsignal.c | 4 +
src/include/replication/walsender.h | 4 +
src/include/replication/walsender_private.h | 3 +-
src/include/storage/procsignal.h | 1 +
7 files changed, 188 insertions(+), 28 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 5640c0d84a..79ca45a156 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1690,6 +1690,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<literal>backup</>: This WAL sender is sending a backup.
</para>
</listitem>
+ <listitem>
+ <para>
+ <literal>stopping</>: This WAL sender is stopping.
+ </para>
+ </listitem>
</itemizedlist>
</entry>
</row>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 35ee7d1cb6..70d2570dc2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8324,6 +8324,17 @@ ShutdownXLOG(int code, Datum arg)
ereport(IsPostmasterEnvironment ? LOG : NOTICE,
(errmsg("shutting down")));
+ /*
+ * Signal walsenders to move to stopping state.
+ */
+ WalSndInitStopping();
+
+ /*
+ * Wait for WAL senders to be in stopping state. This prevents commands
+ * from writing new WAL.
+ */
+ WalSndWaitStopping();
+
if (RecoveryInProgress())
CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
else
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 27aa3e6bc7..ff5aff496d 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -24,11 +24,17 @@
* are treated as not a crash but approximately normal termination;
* the walsender will exit quickly without sending any more XLOG records.
*
- * If the server is shut down, postmaster sends us SIGUSR2 after all
- * regular backends have exited and the shutdown checkpoint has been written.
- * This instructs walsender to send any outstanding WAL, including the
- * shutdown checkpoint record, wait for it to be replicated to the standby,
- * and then exit.
+ * If the server is shut down, checkpointer sends us
+ * PROCSIG_WALSND_INIT_STOPPING after all regular backends have exited. If
+ * the walsender is idle or runs an SQL query, this causes it to shut
+ * down; if logical replication is in progress, all existing WAL records
+ * are processed, followed by a shutdown. Otherwise this causes the walsender
+ * to switch to the "stopping" state. In this state, the walsender will reject
+ * any further replication commands. The checkpointer begins the shutdown
+ * checkpoint once all walsenders are confirmed as stopping. When the shutdown
+ * checkpoint finishes, the postmaster sends us SIGUSR2. This instructs
+ * walsender to send any outstanding WAL, including the shutdown checkpoint
+ * record, wait for it to be replicated to the standby, and then exit.
*
*
* Portions Copyright (c) 2010-2017, PostgreSQL Global Development Group
@@ -177,13 +183,14 @@ static bool WalSndCaughtUp = false;
/* Flags set by signal handlers for later service in main loop */
static volatile sig_atomic_t got_SIGHUP = false;
-static volatile sig_atomic_t walsender_ready_to_stop = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+static volatile sig_atomic_t got_STOPPING = false;
/*
- * This is set while we are streaming. When not set, SIGUSR2 signal will be
- * handled like SIGTERM. When set, the main loop is responsible for checking
- * walsender_ready_to_stop and terminating when it's set (after streaming any
- * remaining WAL).
+ * This is set while we are streaming. When not set
+ * PROCSIG_WALSND_INIT_STOPPING signal will be handled like SIGTERM. When set,
+ * the main loop is responsible for checking got_STOPPING and terminating when
+ * it's set (after streaming any remaining WAL).
*/
static volatile sig_atomic_t replication_active = false;
@@ -300,7 +307,8 @@ WalSndErrorCleanup(void)
ReplicationSlotCleanup();
replication_active = false;
- if (walsender_ready_to_stop)
+
+ if (got_STOPPING || got_SIGUSR2)
proc_exit(0);
/* Revert back to startup state */
@@ -677,7 +685,7 @@ StartReplication(StartReplicationCmd *cmd)
WalSndLoop(XLogSendPhysical);
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_STOPPING)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -1055,7 +1063,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
{
ereport(LOG,
(errmsg("terminating walsender process after promotion")));
- walsender_ready_to_stop = true;
+ got_STOPPING = true;
}
WalSndSetState(WALSNDSTATE_CATCHUP);
@@ -1106,7 +1114,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
ReplicationSlotRelease();
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_STOPPING)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -1311,6 +1319,14 @@ WalSndWaitForWal(XLogRecPtr loc)
/* Check for input from the client */
ProcessRepliesIfAny();
+ /*
+ * If we're shutting down, trigger pending WAL to be written out,
+ * otherwise we'd possibly end up waiting for WAL that never gets
+ * written, because walwriter has shut down already.
+ */
+ if (got_STOPPING)
+ XLogBackgroundFlush();
+
/* Update our idea of the currently flushed position. */
if (!RecoveryInProgress())
RecentFlushPtr = GetFlushRecPtr();
@@ -1326,7 +1342,7 @@ WalSndWaitForWal(XLogRecPtr loc)
* RecentFlushPtr, so we can send all remaining data before shutting
* down.
*/
- if (walsender_ready_to_stop)
+ if (got_STOPPING)
break;
/*
@@ -1401,6 +1417,22 @@ exec_replication_command(const char *cmd_string)
MemoryContext old_context;
/*
+ * If WAL sender has been told that shutdown is getting close, switch its
+ * status accordingly to handle the next replication commands correctly.
+ */
+ if (got_STOPPING)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
+ * Throw error if in stopping mode. We need to prevent commands that could
+ * generate WAL while the shutdown checkpoint is being written. To be
+ * safe, we just prohibit all new commands.
+ */
+ if (MyWalSnd->state == WALSNDSTATE_STOPPING)
+ ereport(ERROR,
+ (errmsg("cannot execute new commands while WAL sender is in stopping mode")));
+
+ /*
* CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
* command arrives. Clean up the old stuff if there's anything.
*/
@@ -2128,7 +2160,7 @@ WalSndLoop(WalSndSendDataCallback send_data)
* normal termination at shutdown, or a promotion, the walsender
* is not sure which.
*/
- if (walsender_ready_to_stop)
+ if (got_SIGUSR2)
WalSndDone(send_data);
}
@@ -2443,6 +2475,10 @@ XLogSendPhysical(void)
XLogRecPtr endptr;
Size nbytes;
+ /* If requested switch the WAL sender to the stopping state. */
+ if (got_STOPPING)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
if (streamingDoneSending)
{
WalSndCaughtUp = true;
@@ -2733,7 +2769,16 @@ XLogSendLogical(void)
* point, then we're caught up.
*/
if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
+ {
WalSndCaughtUp = true;
+
+ /*
+ * Have WalSndLoop() terminate the connection in an orderly
+ * manner, after writing out all the pending data.
+ */
+ if (got_STOPPING)
+ got_SIGUSR2 = true;
+ }
}
/* Update shared memory status */
@@ -2843,6 +2888,26 @@ WalSndRqstFileReload(void)
}
}
+/*
+ * Handle PROCSIG_WALSND_INIT_STOPPING signal.
+ */
+void
+HandleWalSndInitStopping(void)
+{
+ Assert(am_walsender);
+
+ /*
+ * If replication has not yet started, die like with SIGTERM. If
+ * replication is active, only set a flag and wake up the main loop. It
+ * will send any outstanding WAL, wait for it to be replicated to the
+ * standby, and then exit gracefully.
+ */
+ if (!replication_active)
+ kill(MyProcPid, SIGTERM);
+ else
+ got_STOPPING = true;
+}
+
/* SIGHUP: set flag to re-read config file at next convenient time */
static void
WalSndSigHupHandler(SIGNAL_ARGS)
@@ -2856,22 +2921,17 @@ WalSndSigHupHandler(SIGNAL_ARGS)
errno = save_errno;
}
-/* SIGUSR2: set flag to do a last cycle and shut down afterwards */
+/*
+ * SIGUSR2: set flag to do a last cycle and shut down afterwards. The WAL
+ * sender should already have been switched to WALSNDSTATE_STOPPING at
+ * this point.
+ */
static void
WalSndLastCycleHandler(SIGNAL_ARGS)
{
int save_errno = errno;
- /*
- * If replication has not yet started, die like with SIGTERM. If
- * replication is active, only set a flag and wake up the main loop. It
- * will send any outstanding WAL, wait for it to be replicated to the
- * standby, and then exit gracefully.
- */
- if (!replication_active)
- kill(MyProcPid, SIGTERM);
-
- walsender_ready_to_stop = true;
+ got_SIGUSR2 = true;
SetLatch(MyLatch);
errno = save_errno;
@@ -2969,6 +3029,78 @@ WalSndWakeup(void)
}
}
+/*
+ * Signal all walsenders to move to stopping state.
+ *
+ * This will trigger walsenders to send the remaining WAL and prevent them
+ * from accepting further commands. After that they'll wait till the last WAL is
+ * written.
+ */
+void
+WalSndInitStopping(void)
+{
+ int i;
+
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ WalSnd *walsnd = &WalSndCtl->walsnds[i];
+ pid_t pid;
+
+ SpinLockAcquire(&walsnd->mutex);
+ pid = walsnd->pid;
+ SpinLockRelease(&walsnd->mutex);
+
+ if (pid == 0)
+ continue;
+
+ SendProcSignal(pid, PROCSIG_WALSND_INIT_STOPPING, InvalidBackendId);
+ }
+}
+
+/*
+ * Wait until all the WAL senders have reached the stopping state. This is
+ * used by the checkpointer to control when shutdown checkpoints can
+ * safely begin.
+ */
+void
+WalSndWaitStopping(void)
+{
+ for (;;)
+ {
+ int i;
+ bool all_stopped = true;
+
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ WalSndState state;
+ WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+ SpinLockAcquire(&walsnd->mutex);
+
+ if (walsnd->pid == 0)
+ {
+ SpinLockRelease(&walsnd->mutex);
+ continue;
+ }
+
+ state = walsnd->state;
+ SpinLockRelease(&walsnd->mutex);
+
+ if (state != WALSNDSTATE_STOPPING)
+ {
+ all_stopped = false;
+ break;
+ }
+ }
+
+ /* safe to leave if confirmation is done for all WAL senders */
+ if (all_stopped)
+ return;
+
+ pg_usleep(10000L); /* wait for 10 msec */
+ }
+}
+
/* Set state for current walsender (only called in walsender) */
void
WalSndSetState(WalSndState state)
@@ -3002,6 +3134,8 @@ WalSndGetStateString(WalSndState state)
return "catchup";
case WALSNDSTATE_STREAMING:
return "streaming";
+ case WALSNDSTATE_STOPPING:
+ return "stopping";
}
return "UNKNOWN";
}
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 4a21d5512d..b9302ac630 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -20,6 +20,7 @@
#include "access/parallel.h"
#include "commands/async.h"
#include "miscadmin.h"
+#include "replication/walsender.h"
#include "storage/latch.h"
#include "storage/ipc.h"
#include "storage/proc.h"
@@ -270,6 +271,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_PARALLEL_MESSAGE))
HandleParallelMessageInterrupt();
+ if (CheckProcSignal(PROCSIG_WALSND_INIT_STOPPING))
+ HandleWalSndInitStopping();
+
if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_DATABASE))
RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 2ca903872e..edc497f91c 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -44,6 +44,10 @@ extern void WalSndSignals(void);
extern Size WalSndShmemSize(void);
extern void WalSndShmemInit(void);
extern void WalSndWakeup(void);
+extern void WalSndInitStopping(void);
+extern void WalSndWaitStopping(void);
+extern void HandleWalSndInitStopping(void);
+extern void WalSndRqstStopping(void);
extern void WalSndRqstFileReload(void);
/*
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 2c59056cef..36311e124c 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -24,7 +24,8 @@ typedef enum WalSndState
WALSNDSTATE_STARTUP = 0,
WALSNDSTATE_BACKUP,
WALSNDSTATE_CATCHUP,
- WALSNDSTATE_STREAMING
+ WALSNDSTATE_STREAMING,
+ WALSNDSTATE_STOPPING
} WalSndState;
/*
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index d068dde5d7..553f0f43f7 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -32,6 +32,7 @@ typedef enum
PROCSIG_CATCHUP_INTERRUPT, /* sinval catchup interrupt */
PROCSIG_NOTIFY_INTERRUPT, /* listen/notify interrupt */
PROCSIG_PARALLEL_MESSAGE, /* message from cooperating parallel backend */
+ PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
/* Recovery conflict reasons */
PROCSIG_RECOVERY_CONFLICT_DATABASE,
--
2.12.0.264.gd6db3f2165.dirty
0004-Wire-up-query-cancel-interrupt-for-walsender-backend.patch (text/x-patch)
From d5364e4503e43bfb919298132817ec608fa61519 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 2 Jun 2017 16:07:08 -0700
Subject: [PATCH 4/4] Wire up query cancel interrupt for walsender backends.
This makes it possible to cancel commands run over replication connections. While
it might have had some use before v10, it has become important now that
normal SQL commands are allowed in database-connected walsender
connections.
Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/7966f454-7cd7-2b0c-8b70-cdca9d5a8c97@2ndquadrant.com
---
src/backend/replication/walsender.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index ff5aff496d..94a0b34389 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2944,7 +2944,7 @@ WalSndSignals(void)
/* Set up signal handlers */
pqsignal(SIGHUP, WalSndSigHupHandler); /* set flag to read config
* file */
- pqsignal(SIGINT, SIG_IGN); /* not used */
+ pqsignal(SIGINT, StatementCancelHandler); /* query cancel */
pqsignal(SIGTERM, die); /* request shutdown */
pqsignal(SIGQUIT, quickdie); /* hard crash time */
InitializeTimeouts(); /* establishes SIGALRM handler */
--
2.12.0.264.gd6db3f2165.dirty
On 2017-06-02 17:20:23 -0700, Andres Freund wrote:
Attached is a *preliminary* patch series implementing this. I've first
reverted the previous patch, as otherwise backpatchable versions of the
necessary patches would get too complicated, due to the signals used and
such.
I went through this again, and the only real issue I found was a
leftover prototype in walsender.h. In the interim I've worked on
backpatch versions of the series; there were annoying conflicts, but nothing
really problematic. The only real difference is adding SetLatch() calls
to HandleWalSndInitStopping() < 9.6, and guarding SetLatch with an if <
9.5.
As an additional patch (based on one by Petr), even though it belongs
more to
http://archives.postgresql.org/message-id/20170421014030.fdzvvvbrz4nckrow%40alap3.anarazel.de
I've attached a patch unifying SIGHUP handling between normal and walsender
backends. This needs to be backpatched all the way. I've also attached
a second patch, again based on Petr's, that unifies SIGHUP handling
across all the remaining backends, but that's probably more
appropriate for v11, although I'm still tempted to commit it
earlier.
Michael, Peter, Fujii, are any of you planning to review this? I'm
planning to commit it tomorrow morning PST, unless somebody protests
before then.
- Andres
Attachments:
0001-Revert-Prevent-panic-during-shutdown-checkpoint.patch (text/x-patch)
From 39f95c9e85811d6759a29b293adc97567d895d69 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 2 Jun 2017 14:14:34 -0700
Subject: [PATCH 1/6] Revert "Prevent panic during shutdown checkpoint"
This reverts commit 086221cf6b1727c2baed4703c582f657b7c5350e, which
was made to master only.
The approach implemented in the above commit has some issues. While
those could easily be fixed incrementally, doing so would make
backpatching considerably harder, so instead first revert this patch.
Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
---
doc/src/sgml/monitoring.sgml | 5 -
src/backend/access/transam/xlog.c | 6 --
src/backend/postmaster/postmaster.c | 7 +-
src/backend/replication/walsender.c | 143 ++++------------------------
src/include/replication/walsender.h | 1 -
src/include/replication/walsender_private.h | 3 +-
6 files changed, 24 insertions(+), 141 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 79ca45a156..5640c0d84a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1690,11 +1690,6 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<literal>backup</>: This WAL sender is sending a backup.
</para>
</listitem>
- <listitem>
- <para>
- <literal>stopping</>: This WAL sender is stopping.
- </para>
- </listitem>
</itemizedlist>
</entry>
</row>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 399822d3fe..35ee7d1cb6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8324,12 +8324,6 @@ ShutdownXLOG(int code, Datum arg)
ereport(IsPostmasterEnvironment ? LOG : NOTICE,
(errmsg("shutting down")));
- /*
- * Wait for WAL senders to be in stopping state. This prevents commands
- * from writing new WAL.
- */
- WalSndWaitStopping();
-
if (RecoveryInProgress())
CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
else
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 35b4ec88d3..5c79b1e40d 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2918,7 +2918,7 @@ reaper(SIGNAL_ARGS)
* Waken walsenders for the last time. No regular backends
* should be around anymore.
*/
- SignalChildren(SIGINT);
+ SignalChildren(SIGUSR2);
pmState = PM_SHUTDOWN_2;
@@ -3656,9 +3656,7 @@ PostmasterStateMachine(void)
/*
* If we get here, we are proceeding with normal shutdown. All
* the regular children are gone, and it's time to tell the
- * checkpointer to do a shutdown checkpoint. All WAL senders
- * are told to switch to a stopping state so that the shutdown
- * checkpoint can go ahead.
+ * checkpointer to do a shutdown checkpoint.
*/
Assert(Shutdown > NoShutdown);
/* Start the checkpointer if not running */
@@ -3667,7 +3665,6 @@ PostmasterStateMachine(void)
/* And tell it to shut down */
if (CheckpointerPID != 0)
{
- SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
signal_child(CheckpointerPID, SIGUSR2);
pmState = PM_SHUTDOWN;
}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 49cce38880..aa705e5b35 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -24,14 +24,11 @@
* are treated as not a crash but approximately normal termination;
* the walsender will exit quickly without sending any more XLOG records.
*
- * If the server is shut down, postmaster sends us SIGUSR2 after all regular
- * backends have exited. This causes the walsender to switch to the "stopping"
- * state. In this state, the walsender will reject any replication command
- * that may generate WAL activity. The checkpointer begins the shutdown
- * checkpoint once all walsenders are confirmed as stopping. When the shutdown
- * checkpoint finishes, the postmaster sends us SIGINT. This instructs
- * walsender to send any outstanding WAL, including the shutdown checkpoint
- * record, wait for it to be replicated to the standby, and then exit.
+ * If the server is shut down, postmaster sends us SIGUSR2 after all
+ * regular backends have exited and the shutdown checkpoint has been written.
+ * This instructs walsender to send any outstanding WAL, including the
+ * shutdown checkpoint record, wait for it to be replicated to the standby,
+ * and then exit.
*
*
* Portions Copyright (c) 2010-2017, PostgreSQL Global Development Group
@@ -180,14 +177,13 @@ static bool WalSndCaughtUp = false;
/* Flags set by signal handlers for later service in main loop */
static volatile sig_atomic_t got_SIGHUP = false;
-static volatile sig_atomic_t got_SIGINT = false;
-static volatile sig_atomic_t got_SIGUSR2 = false;
+static volatile sig_atomic_t walsender_ready_to_stop = false;
/*
- * This is set while we are streaming. When not set, SIGINT signal will be
+ * This is set while we are streaming. When not set, SIGUSR2 signal will be
* handled like SIGTERM. When set, the main loop is responsible for checking
- * got_SIGINT and terminating when it's set (after streaming any remaining
- * WAL).
+ * walsender_ready_to_stop and terminating when it's set (after streaming any
+ * remaining WAL).
*/
static volatile sig_atomic_t replication_active = false;
@@ -217,7 +213,6 @@ static struct
/* Signal handlers */
static void WalSndSigHupHandler(SIGNAL_ARGS);
static void WalSndXLogSendHandler(SIGNAL_ARGS);
-static void WalSndSwitchStopping(SIGNAL_ARGS);
static void WalSndLastCycleHandler(SIGNAL_ARGS);
/* Prototypes for private functions */
@@ -306,14 +301,11 @@ WalSndErrorCleanup(void)
ReplicationSlotCleanup();
replication_active = false;
- if (got_SIGINT)
+ if (walsender_ready_to_stop)
proc_exit(0);
/* Revert back to startup state */
WalSndSetState(WALSNDSTATE_STARTUP);
-
- if (got_SIGUSR2)
- WalSndSetState(WALSNDSTATE_STOPPING);
}
/*
@@ -686,7 +678,7 @@ StartReplication(StartReplicationCmd *cmd)
WalSndLoop(XLogSendPhysical);
replication_active = false;
- if (got_SIGINT)
+ if (walsender_ready_to_stop)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -1064,7 +1056,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
{
ereport(LOG,
(errmsg("terminating walsender process after promotion")));
- got_SIGINT = true;
+ walsender_ready_to_stop = true;
}
WalSndSetState(WALSNDSTATE_CATCHUP);
@@ -1115,7 +1107,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
ReplicationSlotRelease();
replication_active = false;
- if (got_SIGINT)
+ if (walsender_ready_to_stop)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -1327,14 +1319,6 @@ WalSndWaitForWal(XLogRecPtr loc)
RecentFlushPtr = GetXLogReplayRecPtr(NULL);
/*
- * If postmaster asked us to switch to the stopping state, do so.
- * Shutdown is in progress and this will allow the checkpointer to
- * move on with the shutdown checkpoint.
- */
- if (got_SIGUSR2)
- WalSndSetState(WALSNDSTATE_STOPPING);
-
- /*
* If postmaster asked us to stop, don't wait here anymore. This will
* cause the xlogreader to return without reading a full record, which
* is the fastest way to reach the mainloop which then can quit.
@@ -1343,7 +1327,7 @@ WalSndWaitForWal(XLogRecPtr loc)
* RecentFlushPtr, so we can send all remaining data before shutting
* down.
*/
- if (got_SIGINT)
+ if (walsender_ready_to_stop)
break;
/*
@@ -1418,22 +1402,6 @@ exec_replication_command(const char *cmd_string)
MemoryContext old_context;
/*
- * If WAL sender has been told that shutdown is getting close, switch its
- * status accordingly to handle the next replication commands correctly.
- */
- if (got_SIGUSR2)
- WalSndSetState(WALSNDSTATE_STOPPING);
-
- /*
- * Throw error if in stopping mode. We need prevent commands that could
- * generate WAL while the shutdown checkpoint is being written. To be
- * safe, we just prohibit all new commands.
- */
- if (MyWalSnd->state == WALSNDSTATE_STOPPING)
- ereport(ERROR,
- (errmsg("cannot execute new commands while WAL sender is in stopping mode")));
-
- /*
* CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
* command arrives. Clean up the old stuff if there's anything.
*/
@@ -2155,20 +2123,13 @@ WalSndLoop(WalSndSendDataCallback send_data)
}
/*
- * At the reception of SIGUSR2, switch the WAL sender to the
- * stopping state.
- */
- if (got_SIGUSR2)
- WalSndSetState(WALSNDSTATE_STOPPING);
-
- /*
- * When SIGINT arrives, we send any outstanding logs up to the
+ * When SIGUSR2 arrives, we send any outstanding logs up to the
* shutdown checkpoint record (i.e., the latest record), wait for
* them to be replicated to the standby, and exit. This may be a
* normal termination at shutdown, or a promotion, the walsender
* is not sure which.
*/
- if (got_SIGINT)
+ if (walsender_ready_to_stop)
WalSndDone(send_data);
}
@@ -2907,23 +2868,7 @@ WalSndXLogSendHandler(SIGNAL_ARGS)
errno = save_errno;
}
-/* SIGUSR2: set flag to switch to stopping state */
-static void
-WalSndSwitchStopping(SIGNAL_ARGS)
-{
- int save_errno = errno;
-
- got_SIGUSR2 = true;
- SetLatch(MyLatch);
-
- errno = save_errno;
-}
-
-/*
- * SIGINT: set flag to do a last cycle and shut down afterwards. The WAL
- * sender should already have been switched to WALSNDSTATE_STOPPING at
- * this point.
- */
+/* SIGUSR2: set flag to do a last cycle and shut down afterwards */
static void
WalSndLastCycleHandler(SIGNAL_ARGS)
{
@@ -2938,7 +2883,7 @@ WalSndLastCycleHandler(SIGNAL_ARGS)
if (!replication_active)
kill(MyProcPid, SIGTERM);
- got_SIGINT = true;
+ walsender_ready_to_stop = true;
SetLatch(MyLatch);
errno = save_errno;
@@ -2951,14 +2896,14 @@ WalSndSignals(void)
/* Set up signal handlers */
pqsignal(SIGHUP, WalSndSigHupHandler); /* set flag to read config
* file */
- pqsignal(SIGINT, WalSndLastCycleHandler); /* request a last cycle and
- * shutdown */
+ pqsignal(SIGINT, SIG_IGN); /* not used */
pqsignal(SIGTERM, die); /* request shutdown */
pqsignal(SIGQUIT, quickdie); /* hard crash time */
InitializeTimeouts(); /* establishes SIGALRM handler */
pqsignal(SIGPIPE, SIG_IGN);
pqsignal(SIGUSR1, WalSndXLogSendHandler); /* request WAL sending */
- pqsignal(SIGUSR2, WalSndSwitchStopping); /* switch to stopping state */
+ pqsignal(SIGUSR2, WalSndLastCycleHandler); /* request a last cycle and
+ * shutdown */
/* Reset some signals that are accepted by postmaster but not here */
pqsignal(SIGCHLD, SIG_DFL);
@@ -3036,50 +2981,6 @@ WalSndWakeup(void)
}
}
-/*
- * Wait that all the WAL senders have reached the stopping state. This is
- * used by the checkpointer to control when shutdown checkpoints can
- * safely begin.
- */
-void
-WalSndWaitStopping(void)
-{
- for (;;)
- {
- int i;
- bool all_stopped = true;
-
- for (i = 0; i < max_wal_senders; i++)
- {
- WalSndState state;
- WalSnd *walsnd = &WalSndCtl->walsnds[i];
-
- SpinLockAcquire(&walsnd->mutex);
-
- if (walsnd->pid == 0)
- {
- SpinLockRelease(&walsnd->mutex);
- continue;
- }
-
- state = walsnd->state;
- SpinLockRelease(&walsnd->mutex);
-
- if (state != WALSNDSTATE_STOPPING)
- {
- all_stopped = false;
- break;
- }
- }
-
- /* safe to leave if confirmation is done for all WAL senders */
- if (all_stopped)
- return;
-
- pg_usleep(10000L); /* wait for 10 msec */
- }
-}
-
/* Set state for current walsender (only called in walsender) */
void
WalSndSetState(WalSndState state)
@@ -3113,8 +3014,6 @@ WalSndGetStateString(WalSndState state)
return "catchup";
case WALSNDSTATE_STREAMING:
return "streaming";
- case WALSNDSTATE_STOPPING:
- return "stopping";
}
return "UNKNOWN";
}
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 99f12377e0..2ca903872e 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -44,7 +44,6 @@ extern void WalSndSignals(void);
extern Size WalSndShmemSize(void);
extern void WalSndShmemInit(void);
extern void WalSndWakeup(void);
-extern void WalSndWaitStopping(void);
extern void WalSndRqstFileReload(void);
/*
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 36311e124c..2c59056cef 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -24,8 +24,7 @@ typedef enum WalSndState
WALSNDSTATE_STARTUP = 0,
WALSNDSTATE_BACKUP,
WALSNDSTATE_CATCHUP,
- WALSNDSTATE_STREAMING,
- WALSNDSTATE_STOPPING
+ WALSNDSTATE_STREAMING
} WalSndState;
/*
--
2.12.0.264.gd6db3f2165.dirty
0002-Have-walsenders-participate-in-procsignal-infrastruc.patch (text/x-patch)
From 7e252e3213eaadafcdcd451eaa08c6da7f8ef804 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 2 Jun 2017 16:54:51 -0700
Subject: [PATCH 2/6] Have walsenders participate in procsignal infrastructure.
The non-participation in procsignal was a problem both for changes in
master, e.g. parallelism not working for normal statements run in
walsender backends, and for older branches, e.g. recovery conflicts and
catchup interrupts not working for logical decoding walsenders.
This commit thus replaces the previous WalSndXLogSendHandler with
procsignal_sigusr1_handler. In branches since db0f6cad48 that can
lead to additional SetLatch calls, but that only rarely seems to make
a difference.
Author: Andres Freund
Discussion: https://postgr.es/m/20170421014030.fdzvvvbrz4nckrow@alap3.anarazel.de
Backpatch: 9.4, earlier commits don't seem to benefit sufficiently
---
src/backend/replication/walsender.c | 14 +-------------
1 file changed, 1 insertion(+), 13 deletions(-)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index aa705e5b35..27aa3e6bc7 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -212,7 +212,6 @@ static struct
/* Signal handlers */
static void WalSndSigHupHandler(SIGNAL_ARGS);
-static void WalSndXLogSendHandler(SIGNAL_ARGS);
static void WalSndLastCycleHandler(SIGNAL_ARGS);
/* Prototypes for private functions */
@@ -2857,17 +2856,6 @@ WalSndSigHupHandler(SIGNAL_ARGS)
errno = save_errno;
}
-/* SIGUSR1: set flag to send WAL records */
-static void
-WalSndXLogSendHandler(SIGNAL_ARGS)
-{
- int save_errno = errno;
-
- latch_sigusr1_handler();
-
- errno = save_errno;
-}
-
/* SIGUSR2: set flag to do a last cycle and shut down afterwards */
static void
WalSndLastCycleHandler(SIGNAL_ARGS)
@@ -2901,7 +2889,7 @@ WalSndSignals(void)
pqsignal(SIGQUIT, quickdie); /* hard crash time */
InitializeTimeouts(); /* establishes SIGALRM handler */
pqsignal(SIGPIPE, SIG_IGN);
- pqsignal(SIGUSR1, WalSndXLogSendHandler); /* request WAL sending */
+ pqsignal(SIGUSR1, procsignal_sigusr1_handler);
pqsignal(SIGUSR2, WalSndLastCycleHandler); /* request a last cycle and
* shutdown */
--
2.12.0.264.gd6db3f2165.dirty
0003-Prevent-possibility-of-panics-during-shutdown-checkp.patch (text/x-patch)
From 01647701515b688ef6bf488430f7785f6ab50414 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 2 Jun 2017 14:15:34 -0700
Subject: [PATCH 3/6] Prevent possibility of panics during shutdown checkpoint.
When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and
throws a PANIC if so. At that point, only walsenders are still
active, so one might think this could not happen, but walsenders can
also generate WAL, for instance in BASE_BACKUP and logical decoding
related commands (e.g. via hint bits). So they can trigger this panic
if such a command is run while the shutdown checkpoint is being
written.
To fix this, divide the walsender shutdown into two phases. First,
checkpointer, itself triggered by postmaster, sends a
PROCSIG_WALSND_INIT_STOPPING signal to all walsenders. If the backend
is idle or runs an SQL query this causes the backend to shutdown, if
logical replication is in progress all existing WAL records are
processed followed by a shutdown. Otherwise this causes the walsender
to switch to the "stopping" state. In this state, the walsender will
reject any further replication commands. The checkpointer begins the
shutdown checkpoint once all walsenders are confirmed as
stopping. When the shutdown checkpoint finishes, the postmaster sends
us SIGUSR2. This instructs walsender to send any outstanding WAL,
including the shutdown checkpoint record, wait for it to be replicated
to the standby, and then exit.
Author: Andres Freund, based on an earlier patch by Michael Paquier
Reported-By: Fujii Masao, Andres Freund
Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
Backpatch: 9.4, where logical decoding was introduced
---
doc/src/sgml/monitoring.sgml | 5 +
src/backend/access/transam/xlog.c | 11 ++
src/backend/replication/walsender.c | 188 ++++++++++++++++++++++++----
src/backend/storage/ipc/procsignal.c | 4 +
src/include/replication/walsender.h | 3 +
src/include/replication/walsender_private.h | 3 +-
src/include/storage/procsignal.h | 1 +
7 files changed, 187 insertions(+), 28 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 5640c0d84a..79ca45a156 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1690,6 +1690,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<literal>backup</>: This WAL sender is sending a backup.
</para>
</listitem>
+ <listitem>
+ <para>
+ <literal>stopping</>: This WAL sender is stopping.
+ </para>
+ </listitem>
</itemizedlist>
</entry>
</row>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 35ee7d1cb6..70d2570dc2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8324,6 +8324,17 @@ ShutdownXLOG(int code, Datum arg)
ereport(IsPostmasterEnvironment ? LOG : NOTICE,
(errmsg("shutting down")));
+ /*
+ * Signal walsenders to move to stopping state.
+ */
+ WalSndInitStopping();
+
+ /*
+ * Wait for WAL senders to be in stopping state. This prevents commands
+ * from writing new WAL.
+ */
+ WalSndWaitStopping();
+
if (RecoveryInProgress())
CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
else
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 27aa3e6bc7..ff5aff496d 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -24,11 +24,17 @@
* are treated as not a crash but approximately normal termination;
* the walsender will exit quickly without sending any more XLOG records.
*
- * If the server is shut down, postmaster sends us SIGUSR2 after all
- * regular backends have exited and the shutdown checkpoint has been written.
- * This instructs walsender to send any outstanding WAL, including the
- * shutdown checkpoint record, wait for it to be replicated to the standby,
- * and then exit.
+ * If the server is shut down, the checkpointer sends us
+ * PROCSIG_WALSND_INIT_STOPPING after all regular backends have exited. If
+ * the walsender is idle or running an SQL query, this causes it to shut
+ * down; if logical replication is in progress, all remaining WAL records
+ * are processed, followed by a shutdown. Otherwise this causes the walsender
+ * to switch to the "stopping" state. In this state, the walsender will reject
+ * any further replication commands. The checkpointer begins the shutdown
+ * checkpoint once all walsenders are confirmed as stopping. When the shutdown
+ * checkpoint finishes, the postmaster sends us SIGUSR2. This instructs
+ * walsender to send any outstanding WAL, including the shutdown checkpoint
+ * record, wait for it to be replicated to the standby, and then exit.
*
*
* Portions Copyright (c) 2010-2017, PostgreSQL Global Development Group
@@ -177,13 +183,14 @@ static bool WalSndCaughtUp = false;
/* Flags set by signal handlers for later service in main loop */
static volatile sig_atomic_t got_SIGHUP = false;
-static volatile sig_atomic_t walsender_ready_to_stop = false;
+static volatile sig_atomic_t got_SIGUSR2 = false;
+static volatile sig_atomic_t got_STOPPING = false;
/*
- * This is set while we are streaming. When not set, SIGUSR2 signal will be
- * handled like SIGTERM. When set, the main loop is responsible for checking
- * walsender_ready_to_stop and terminating when it's set (after streaming any
- * remaining WAL).
+ * This is set while we are streaming. When not set
+ * PROCSIG_WALSND_INIT_STOPPING signal will be handled like SIGTERM. When set,
+ * the main loop is responsible for checking got_STOPPING and terminating when
+ * it's set (after streaming any remaining WAL).
*/
static volatile sig_atomic_t replication_active = false;
@@ -300,7 +307,8 @@ WalSndErrorCleanup(void)
ReplicationSlotCleanup();
replication_active = false;
- if (walsender_ready_to_stop)
+
+ if (got_STOPPING || got_SIGUSR2)
proc_exit(0);
/* Revert back to startup state */
@@ -677,7 +685,7 @@ StartReplication(StartReplicationCmd *cmd)
WalSndLoop(XLogSendPhysical);
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_STOPPING)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -1055,7 +1063,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
{
ereport(LOG,
(errmsg("terminating walsender process after promotion")));
- walsender_ready_to_stop = true;
+ got_STOPPING = true;
}
WalSndSetState(WALSNDSTATE_CATCHUP);
@@ -1106,7 +1114,7 @@ StartLogicalReplication(StartReplicationCmd *cmd)
ReplicationSlotRelease();
replication_active = false;
- if (walsender_ready_to_stop)
+ if (got_STOPPING)
proc_exit(0);
WalSndSetState(WALSNDSTATE_STARTUP);
@@ -1311,6 +1319,14 @@ WalSndWaitForWal(XLogRecPtr loc)
/* Check for input from the client */
ProcessRepliesIfAny();
+ /*
+ * If we're shutting down, trigger pending WAL to be written out,
+ * otherwise we'd possibly end up waiting for WAL that never gets
+ * written, because walwriter has shut down already.
+ */
+ if (got_STOPPING)
+ XLogBackgroundFlush();
+
/* Update our idea of the currently flushed position. */
if (!RecoveryInProgress())
RecentFlushPtr = GetFlushRecPtr();
@@ -1326,7 +1342,7 @@ WalSndWaitForWal(XLogRecPtr loc)
* RecentFlushPtr, so we can send all remaining data before shutting
* down.
*/
- if (walsender_ready_to_stop)
+ if (got_STOPPING)
break;
/*
@@ -1401,6 +1417,22 @@ exec_replication_command(const char *cmd_string)
MemoryContext old_context;
/*
+ * If the WAL sender has been told that shutdown is getting close, switch
+ * its state accordingly so that subsequent replication commands are
+ * handled correctly.
+ */
+ if (got_STOPPING)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
+ /*
+ * Throw error if in stopping mode. We need to prevent commands that
+ * could generate WAL while the shutdown checkpoint is being written. To be
+ * safe, we just prohibit all new commands.
+ */
+ if (MyWalSnd->state == WALSNDSTATE_STOPPING)
+ ereport(ERROR,
+ (errmsg("cannot execute new commands while WAL sender is in stopping mode")));
+
+ /*
* CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
* command arrives. Clean up the old stuff if there's anything.
*/
@@ -2128,7 +2160,7 @@ WalSndLoop(WalSndSendDataCallback send_data)
* normal termination at shutdown, or a promotion, the walsender
* is not sure which.
*/
- if (walsender_ready_to_stop)
+ if (got_SIGUSR2)
WalSndDone(send_data);
}
@@ -2443,6 +2475,10 @@ XLogSendPhysical(void)
XLogRecPtr endptr;
Size nbytes;
+ /* If requested switch the WAL sender to the stopping state. */
+ if (got_STOPPING)
+ WalSndSetState(WALSNDSTATE_STOPPING);
+
if (streamingDoneSending)
{
WalSndCaughtUp = true;
@@ -2733,7 +2769,16 @@ XLogSendLogical(void)
* point, then we're caught up.
*/
if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
+ {
WalSndCaughtUp = true;
+
+ /*
+ * Have WalSndLoop() terminate the connection in an orderly
+ * manner, after writing out all the pending data.
+ */
+ if (got_STOPPING)
+ got_SIGUSR2 = true;
+ }
}
/* Update shared memory status */
@@ -2843,6 +2888,26 @@ WalSndRqstFileReload(void)
}
}
+/*
+ * Handle PROCSIG_WALSND_INIT_STOPPING signal.
+ */
+void
+HandleWalSndInitStopping(void)
+{
+ Assert(am_walsender);
+
+ /*
+ * If replication has not yet started, die like with SIGTERM. If
+ * replication is active, only set a flag and wake up the main loop. It
+ * will send any outstanding WAL, wait for it to be replicated to the
+ * standby, and then exit gracefully.
+ */
+ if (!replication_active)
+ kill(MyProcPid, SIGTERM);
+ else
+ got_STOPPING = true;
+}
+
/* SIGHUP: set flag to re-read config file at next convenient time */
static void
WalSndSigHupHandler(SIGNAL_ARGS)
@@ -2856,22 +2921,17 @@ WalSndSigHupHandler(SIGNAL_ARGS)
errno = save_errno;
}
-/* SIGUSR2: set flag to do a last cycle and shut down afterwards */
+/*
+ * SIGUSR2: set flag to do a last cycle and shut down afterwards. The WAL
+ * sender should already have been switched to WALSNDSTATE_STOPPING at
+ * this point.
+ */
static void
WalSndLastCycleHandler(SIGNAL_ARGS)
{
int save_errno = errno;
- /*
- * If replication has not yet started, die like with SIGTERM. If
- * replication is active, only set a flag and wake up the main loop. It
- * will send any outstanding WAL, wait for it to be replicated to the
- * standby, and then exit gracefully.
- */
- if (!replication_active)
- kill(MyProcPid, SIGTERM);
-
- walsender_ready_to_stop = true;
+ got_SIGUSR2 = true;
SetLatch(MyLatch);
errno = save_errno;
@@ -2969,6 +3029,78 @@ WalSndWakeup(void)
}
}
+/*
+ * Signal all walsenders to move to stopping state.
+ *
+ * This will trigger walsenders to send any remaining WAL and prevent them
+ * from accepting further replication commands. After that they will wait
+ * until the last WAL record is written.
+ */
+void
+WalSndInitStopping(void)
+{
+ int i;
+
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ WalSnd *walsnd = &WalSndCtl->walsnds[i];
+ pid_t pid;
+
+ SpinLockAcquire(&walsnd->mutex);
+ pid = walsnd->pid;
+ SpinLockRelease(&walsnd->mutex);
+
+ if (pid == 0)
+ continue;
+
+ SendProcSignal(pid, PROCSIG_WALSND_INIT_STOPPING, InvalidBackendId);
+ }
+}
+
+/*
+ * Wait until all the WAL senders have reached the stopping state. This is
+ * used by the checkpointer to control when shutdown checkpoints can
+ * safely begin.
+ */
+void
+WalSndWaitStopping(void)
+{
+ for (;;)
+ {
+ int i;
+ bool all_stopped = true;
+
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ WalSndState state;
+ WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+ SpinLockAcquire(&walsnd->mutex);
+
+ if (walsnd->pid == 0)
+ {
+ SpinLockRelease(&walsnd->mutex);
+ continue;
+ }
+
+ state = walsnd->state;
+ SpinLockRelease(&walsnd->mutex);
+
+ if (state != WALSNDSTATE_STOPPING)
+ {
+ all_stopped = false;
+ break;
+ }
+ }
+
+ /* safe to leave if confirmation is done for all WAL senders */
+ if (all_stopped)
+ return;
+
+ pg_usleep(10000L); /* wait for 10 msec */
+ }
+}
+
/* Set state for current walsender (only called in walsender) */
void
WalSndSetState(WalSndState state)
@@ -3002,6 +3134,8 @@ WalSndGetStateString(WalSndState state)
return "catchup";
case WALSNDSTATE_STREAMING:
return "streaming";
+ case WALSNDSTATE_STOPPING:
+ return "stopping";
}
return "UNKNOWN";
}
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 4a21d5512d..b9302ac630 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -20,6 +20,7 @@
#include "access/parallel.h"
#include "commands/async.h"
#include "miscadmin.h"
+#include "replication/walsender.h"
#include "storage/latch.h"
#include "storage/ipc.h"
#include "storage/proc.h"
@@ -270,6 +271,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_PARALLEL_MESSAGE))
HandleParallelMessageInterrupt();
+ if (CheckProcSignal(PROCSIG_WALSND_INIT_STOPPING))
+ HandleWalSndInitStopping();
+
if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_DATABASE))
RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index 2ca903872e..c50e450ec2 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -44,6 +44,9 @@ extern void WalSndSignals(void);
extern Size WalSndShmemSize(void);
extern void WalSndShmemInit(void);
extern void WalSndWakeup(void);
+extern void WalSndInitStopping(void);
+extern void WalSndWaitStopping(void);
+extern void HandleWalSndInitStopping(void);
extern void WalSndRqstFileReload(void);
/*
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 2c59056cef..36311e124c 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -24,7 +24,8 @@ typedef enum WalSndState
WALSNDSTATE_STARTUP = 0,
WALSNDSTATE_BACKUP,
WALSNDSTATE_CATCHUP,
- WALSNDSTATE_STREAMING
+ WALSNDSTATE_STREAMING,
+ WALSNDSTATE_STOPPING
} WalSndState;
/*
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index d068dde5d7..553f0f43f7 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -32,6 +32,7 @@ typedef enum
PROCSIG_CATCHUP_INTERRUPT, /* sinval catchup interrupt */
PROCSIG_NOTIFY_INTERRUPT, /* listen/notify interrupt */
PROCSIG_PARALLEL_MESSAGE, /* message from cooperating parallel backend */
+ PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
/* Recovery conflict reasons */
PROCSIG_RECOVERY_CONFLICT_DATABASE,
--
2.12.0.264.gd6db3f2165.dirty
Attachment: 0004-Unify-SIGHUP-handling-between-normal-and-walsender-b.patch (text/x-patch; charset=us-ascii)
From b109a13bc9ded7dede6e391040dd4966b65d6b9f Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 4 Jun 2017 16:14:52 -0700
Subject: [PATCH 4/6] Unify SIGHUP handling between normal and walsender
backends.
Because walsender and normal backends share the same main loop it's
problematic to have two different flag variables, set in signal
handlers, indicating a pending configuration reload. Only certain
walsender commands reach code paths checking for the
variable (START_[LOGICAL_]REPLICATION, CREATE_REPLICATION_SLOT
... LOGICAL, notably not base backups).
This is a bug present since the introduction of walsender, but has
gotten worse in releases since then which allow walsender to do more.
A later patch, not slated for v10, will similarly unify SIGHUP
handling in other types of processes as well.
Author: Petr Jelinek, Andres Freund
Discussion: https://postgr.es/m/20170423235941.qosiuoyqprq4nu7v@alap3.anarazel.de
Backpatch: 9.2-, bug is present since 9.0
---
src/backend/replication/walsender.c | 29 +++++++----------------------
src/backend/tcop/postgres.c | 30 ++++++++++++++----------------
src/backend/utils/init/globals.c | 1 +
src/include/miscadmin.h | 5 +++++
4 files changed, 27 insertions(+), 38 deletions(-)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index ff5aff496d..a4f754a518 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -182,7 +182,6 @@ static bool streamingDoneReceiving;
static bool WalSndCaughtUp = false;
/* Flags set by signal handlers for later service in main loop */
-static volatile sig_atomic_t got_SIGHUP = false;
static volatile sig_atomic_t got_SIGUSR2 = false;
static volatile sig_atomic_t got_STOPPING = false;
@@ -218,7 +217,6 @@ static struct
} LagTracker;
/* Signal handlers */
-static void WalSndSigHupHandler(SIGNAL_ARGS);
static void WalSndLastCycleHandler(SIGNAL_ARGS);
/* Prototypes for private functions */
@@ -1201,9 +1199,9 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
CHECK_FOR_INTERRUPTS();
/* Process any requests or signals received recently */
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
SyncRepInitConfig();
}
@@ -1309,9 +1307,9 @@ WalSndWaitForWal(XLogRecPtr loc)
CHECK_FOR_INTERRUPTS();
/* Process any requests or signals received recently */
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
SyncRepInitConfig();
}
@@ -2101,9 +2099,9 @@ WalSndLoop(WalSndSendDataCallback send_data)
CHECK_FOR_INTERRUPTS();
/* Process any requests or signals received recently */
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
SyncRepInitConfig();
}
@@ -2908,19 +2906,6 @@ HandleWalSndInitStopping(void)
got_STOPPING = true;
}
-/* SIGHUP: set flag to re-read config file at next convenient time */
-static void
-WalSndSigHupHandler(SIGNAL_ARGS)
-{
- int save_errno = errno;
-
- got_SIGHUP = true;
-
- SetLatch(MyLatch);
-
- errno = save_errno;
-}
-
/*
* SIGUSR2: set flag to do a last cycle and shut down afterwards. The WAL
* sender should already have been switched to WALSNDSTATE_STOPPING at
@@ -2942,7 +2927,7 @@ void
WalSndSignals(void)
{
/* Set up signal handlers */
- pqsignal(SIGHUP, WalSndSigHupHandler); /* set flag to read config
+ pqsignal(SIGHUP, PostgresSigHupHandler); /* set flag to read config
* file */
pqsignal(SIGINT, SIG_IGN); /* not used */
pqsignal(SIGTERM, die); /* request shutdown */
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 1357769150..70c9f8db59 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -123,13 +123,6 @@ char *register_stack_base_ptr = NULL;
#endif
/*
- * Flag to mark SIGHUP. Whenever the main loop comes around it
- * will reread the configuration file. (Better than doing the
- * reading in the signal handler, ey?)
- */
-static volatile sig_atomic_t got_SIGHUP = false;
-
-/*
* Flag to keep track of whether we have started a transaction.
* For extended query protocol this has to be remembered across messages.
*/
@@ -187,7 +180,6 @@ static bool IsTransactionExitStmt(Node *parsetree);
static bool IsTransactionExitStmtList(List *pstmts);
static bool IsTransactionStmtList(List *pstmts);
static void drop_unnamed_stmt(void);
-static void SigHupHandler(SIGNAL_ARGS);
static void log_disconnections(int code, Datum arg);
@@ -2684,13 +2676,19 @@ FloatExceptionHandler(SIGNAL_ARGS)
"invalid operation, such as division by zero.")));
}
-/* SIGHUP: set flag to re-read config file at next convenient time */
-static void
-SigHupHandler(SIGNAL_ARGS)
+/*
+ * SIGHUP: set flag to re-read config file at next convenient time.
+ *
+ * Sets the ConfigRereadPending flag, which should be checked at convenient
+ * places inside main loops. (Better than doing the reading in the signal
+ * handler, ey?)
+ */
+void
+PostgresSigHupHandler(SIGNAL_ARGS)
{
int save_errno = errno;
- got_SIGHUP = true;
+ ConfigRereadPending = true;
SetLatch(MyLatch);
errno = save_errno;
@@ -3632,8 +3630,8 @@ PostgresMain(int argc, char *argv[],
WalSndSignals();
else
{
- pqsignal(SIGHUP, SigHupHandler); /* set flag to read config
- * file */
+ pqsignal(SIGHUP, PostgresSigHupHandler); /* set flag to read config
+ * file */
pqsignal(SIGINT, StatementCancelHandler); /* cancel current query */
pqsignal(SIGTERM, die); /* cancel current query and exit */
@@ -4046,9 +4044,9 @@ PostgresMain(int argc, char *argv[],
* (6) check for any other interesting events that happened while we
* slept.
*/
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 08b6030a64..f758a94b2f 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -31,6 +31,7 @@ volatile bool QueryCancelPending = false;
volatile bool ProcDiePending = false;
volatile bool ClientConnectionLost = false;
volatile bool IdleInTransactionSessionTimeoutPending = false;
+volatile sig_atomic_t ConfigRereadPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 4c607b299c..1cd24fd761 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -23,6 +23,8 @@
#ifndef MISCADMIN_H
#define MISCADMIN_H
+#include <signal.h>
+
#include "pgtime.h" /* for pg_time_t */
@@ -81,6 +83,7 @@ extern PGDLLIMPORT volatile bool InterruptPending;
extern PGDLLIMPORT volatile bool QueryCancelPending;
extern PGDLLIMPORT volatile bool ProcDiePending;
extern PGDLLIMPORT volatile bool IdleInTransactionSessionTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t ConfigRereadPending;
extern volatile bool ClientConnectionLost;
@@ -273,6 +276,8 @@ extern void restore_stack_base(pg_stack_base_t base);
extern void check_stack_depth(void);
extern bool stack_is_too_deep(void);
+extern void PostgresSigHupHandler(SIGNAL_ARGS);
+
/* in tcop/utility.c */
extern void PreventCommandIfReadOnly(const char *cmdname);
extern void PreventCommandIfParallelMode(const char *cmdname);
--
2.12.0.264.gd6db3f2165.dirty
Attachment: 0005-Wire-up-query-cancel-interrupt-for-walsender-backend.patch (text/x-patch; charset=us-ascii)
From 92d5958f9fda344606a6c453123e70a17e3e671d Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 2 Jun 2017 16:07:08 -0700
Subject: [PATCH 5/6] Wire up query cancel interrupt for walsender backends.
This allows canceling commands run over replication connections. While
it might have some use before v10, it has become important now that
normal SQL commands are allowed in database connected walsender
connections.
Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/7966f454-7cd7-2b0c-8b70-cdca9d5a8c97@2ndquadrant.com
---
src/backend/replication/walsender.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index a4f754a518..e132374d13 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2929,7 +2929,7 @@ WalSndSignals(void)
/* Set up signal handlers */
pqsignal(SIGHUP, PostgresSigHupHandler); /* set flag to read config
* file */
- pqsignal(SIGINT, SIG_IGN); /* not used */
+ pqsignal(SIGINT, StatementCancelHandler); /* query cancel */
pqsignal(SIGTERM, die); /* request shutdown */
pqsignal(SIGQUIT, quickdie); /* hard crash time */
InitializeTimeouts(); /* establishes SIGALRM handler */
--
2.12.0.264.gd6db3f2165.dirty
Attachment: 0006-Use-PostgresSigHupHandler-everywhere-SIGHUP-is-handl.patch (text/x-patch; charset=us-ascii)
From dd18489d82b656ba3fca0e1be6cab7fd9f2ed429 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 4 Jun 2017 16:36:39 -0700
Subject: [PATCH 6/6] Use PostgresSigHupHandler everywhere SIGHUP is handled.
---
src/backend/postmaster/autovacuum.c | 30 ++++++++----------------------
src/backend/postmaster/bgwriter.c | 20 +++-----------------
src/backend/postmaster/checkpointer.c | 25 +++++--------------------
src/backend/postmaster/pgarch.c | 25 +++++--------------------
src/backend/postmaster/pgstat.c | 28 +++++++---------------------
src/backend/postmaster/startup.c | 7 +++----
src/backend/postmaster/syslogger.c | 20 +++-----------------
src/backend/postmaster/walwriter.c | 20 +++-----------------
src/backend/replication/logical/launcher.c | 21 +++------------------
src/backend/replication/logical/worker.c | 23 +++--------------------
src/backend/replication/walreceiver.c | 7 +++----
src/backend/utils/misc/guc.c | 9 +++++----
12 files changed, 51 insertions(+), 184 deletions(-)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 89dd3b321b..e11d353576 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -137,7 +137,6 @@ static bool am_autovacuum_launcher = false;
static bool am_autovacuum_worker = false;
/* Flags set by signal handlers */
-static volatile sig_atomic_t got_SIGHUP = false;
static volatile sig_atomic_t got_SIGUSR2 = false;
static volatile sig_atomic_t got_SIGTERM = false;
@@ -351,7 +350,6 @@ static void perform_work_item(AutoVacuumWorkItem *workitem);
static void autovac_report_activity(autovac_table *tab);
static void autovac_report_workitem(AutoVacuumWorkItem *workitem,
const char *nspname, const char *relname);
-static void av_sighup_handler(SIGNAL_ARGS);
static void avl_sigusr2_handler(SIGNAL_ARGS);
static void avl_sigterm_handler(SIGNAL_ARGS);
static void autovac_refresh_stats(void);
@@ -461,7 +459,7 @@ AutoVacLauncherMain(int argc, char *argv[])
* backend, so we use the same signal handling. See equivalent code in
* tcop/postgres.c.
*/
- pqsignal(SIGHUP, av_sighup_handler);
+ pqsignal(SIGHUP, PostgresSigHupHandler);
pqsignal(SIGINT, StatementCancelHandler);
pqsignal(SIGTERM, avl_sigterm_handler);
@@ -675,9 +673,9 @@ AutoVacLauncherMain(int argc, char *argv[])
if (got_SIGTERM)
break;
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
/* shutdown requested in config file? */
@@ -1406,18 +1404,6 @@ AutoVacWorkerFailed(void)
AutoVacuumShmem->av_signal[AutoVacForkFailed] = true;
}
-/* SIGHUP: set flag to re-read config file at next convenient time */
-static void
-av_sighup_handler(SIGNAL_ARGS)
-{
- int save_errno = errno;
-
- got_SIGHUP = true;
- SetLatch(MyLatch);
-
- errno = save_errno;
-}
-
/* SIGUSR2: a worker is up and running, or just finished, or failed to fork */
static void
avl_sigusr2_handler(SIGNAL_ARGS)
@@ -1540,7 +1526,7 @@ AutoVacWorkerMain(int argc, char *argv[])
* backend, so we use the same signal handling. See equivalent code in
* tcop/postgres.c.
*/
- pqsignal(SIGHUP, av_sighup_handler);
+ pqsignal(SIGHUP, PostgresSigHupHandler);
/*
* SIGINT is used to signal canceling the current table's vacuum; SIGTERM
@@ -2333,9 +2319,9 @@ do_autovacuum(void)
/*
* Check for config changes before processing each collected table.
*/
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
/*
@@ -2573,9 +2559,9 @@ deleted:
* jobs.
*/
CHECK_FOR_INTERRUPTS();
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
}
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 2674bb49ba..09a97b912b 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -89,13 +89,11 @@ static XLogRecPtr last_snapshot_lsn = InvalidXLogRecPtr;
/*
* Flags set by interrupt handlers for later service in the main loop.
*/
-static volatile sig_atomic_t got_SIGHUP = false;
static volatile sig_atomic_t shutdown_requested = false;
/* Signal handlers */
static void bg_quickdie(SIGNAL_ARGS);
-static void BgSigHupHandler(SIGNAL_ARGS);
static void ReqShutdownHandler(SIGNAL_ARGS);
static void bgwriter_sigusr1_handler(SIGNAL_ARGS);
@@ -120,7 +118,7 @@ BackgroundWriterMain(void)
* bgwriter doesn't participate in ProcSignal signalling, but a SIGUSR1
* handler is still needed for latch wakeups.
*/
- pqsignal(SIGHUP, BgSigHupHandler); /* set flag to read config file */
+ pqsignal(SIGHUP, PostgresSigHupHandler); /* set flag to read config file */
pqsignal(SIGINT, SIG_IGN);
pqsignal(SIGTERM, ReqShutdownHandler); /* shutdown */
pqsignal(SIGQUIT, bg_quickdie); /* hard crash time */
@@ -259,9 +257,9 @@ BackgroundWriterMain(void)
/* Clear any already-pending wakeups */
ResetLatch(MyLatch);
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
}
if (shutdown_requested)
@@ -432,18 +430,6 @@ bg_quickdie(SIGNAL_ARGS)
exit(2);
}
-/* SIGHUP: set flag to re-read config file at next convenient time */
-static void
-BgSigHupHandler(SIGNAL_ARGS)
-{
- int save_errno = errno;
-
- got_SIGHUP = true;
- SetLatch(MyLatch);
-
- errno = save_errno;
-}
-
/* SIGTERM: set flag to shutdown and exit */
static void
ReqShutdownHandler(SIGNAL_ARGS)
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index a55071900d..726c1c2a1d 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -149,7 +149,6 @@ double CheckPointCompletionTarget = 0.5;
/*
* Flags set by interrupt handlers for later service in the main loop.
*/
-static volatile sig_atomic_t got_SIGHUP = false;
static volatile sig_atomic_t checkpoint_requested = false;
static volatile sig_atomic_t shutdown_requested = false;
@@ -177,7 +176,6 @@ static void UpdateSharedMemoryConfig(void);
/* Signal handlers */
static void chkpt_quickdie(SIGNAL_ARGS);
-static void ChkptSigHupHandler(SIGNAL_ARGS);
static void ReqCheckpointHandler(SIGNAL_ARGS);
static void chkpt_sigusr1_handler(SIGNAL_ARGS);
static void ReqShutdownHandler(SIGNAL_ARGS);
@@ -205,8 +203,7 @@ CheckpointerMain(void)
* want to wait for the backends to exit, whereupon the postmaster will
* tell us it's okay to shut down (via SIGUSR2).
*/
- pqsignal(SIGHUP, ChkptSigHupHandler); /* set flag to read config
- * file */
+ pqsignal(SIGHUP, PostgresSigHupHandler); /* set flag to read config file */
pqsignal(SIGINT, ReqCheckpointHandler); /* request checkpoint */
pqsignal(SIGTERM, SIG_IGN); /* ignore SIGTERM */
pqsignal(SIGQUIT, chkpt_quickdie); /* hard crash time */
@@ -365,9 +362,9 @@ CheckpointerMain(void)
*/
AbsorbFsyncRequests();
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
/*
@@ -691,9 +688,9 @@ CheckpointWriteDelay(int flags, double progress)
!ImmediateCheckpointRequested() &&
IsCheckpointOnSchedule(progress))
{
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
/* update shmem copies of config variables */
UpdateSharedMemoryConfig();
@@ -846,18 +843,6 @@ chkpt_quickdie(SIGNAL_ARGS)
exit(2);
}
-/* SIGHUP: set flag to re-read config file at next convenient time */
-static void
-ChkptSigHupHandler(SIGNAL_ARGS)
-{
- int save_errno = errno;
-
- got_SIGHUP = true;
- SetLatch(MyLatch);
-
- errno = save_errno;
-}
-
/* SIGINT: set flag to run a normal checkpoint right away */
static void
ReqCheckpointHandler(SIGNAL_ARGS)
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 2dce39fdef..9b407abe41 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -73,7 +73,6 @@ static time_t last_sigterm_time = 0;
/*
* Flags set by interrupt handlers for later service in the main loop.
*/
-static volatile sig_atomic_t got_SIGHUP = false;
static volatile sig_atomic_t got_SIGTERM = false;
static volatile sig_atomic_t wakened = false;
static volatile sig_atomic_t ready_to_stop = false;
@@ -88,7 +87,6 @@ static pid_t pgarch_forkexec(void);
NON_EXEC_STATIC void PgArchiverMain(int argc, char *argv[]) pg_attribute_noreturn();
static void pgarch_exit(SIGNAL_ARGS);
-static void ArchSigHupHandler(SIGNAL_ARGS);
static void ArchSigTermHandler(SIGNAL_ARGS);
static void pgarch_waken(SIGNAL_ARGS);
static void pgarch_waken_stop(SIGNAL_ARGS);
@@ -219,7 +217,7 @@ PgArchiverMain(int argc, char *argv[])
* Ignore all signals usually bound to some action in the postmaster,
* except for SIGHUP, SIGTERM, SIGUSR1, SIGUSR2, and SIGQUIT.
*/
- pqsignal(SIGHUP, ArchSigHupHandler);
+ pqsignal(SIGHUP, PostgresSigHupHandler);
pqsignal(SIGINT, SIG_IGN);
pqsignal(SIGTERM, ArchSigTermHandler);
pqsignal(SIGQUIT, pgarch_exit);
@@ -252,19 +250,6 @@ pgarch_exit(SIGNAL_ARGS)
exit(1);
}
-/* SIGHUP signal handler for archiver process */
-static void
-ArchSigHupHandler(SIGNAL_ARGS)
-{
- int save_errno = errno;
-
- /* set flag to re-read config file at next convenient time */
- got_SIGHUP = true;
- SetLatch(MyLatch);
-
- errno = save_errno;
-}
-
/* SIGTERM signal handler for archiver process */
static void
ArchSigTermHandler(SIGNAL_ARGS)
@@ -341,9 +326,9 @@ pgarch_MainLoop(void)
time_to_stop = ready_to_stop;
/* Check for config update */
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
}
@@ -444,9 +429,9 @@ pgarch_ArchiverCopyLoop(void)
* setting for archive_command as soon as possible, even if there
* is a backlog of files to be archived.
*/
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f453dade6c..1176dca62e 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -266,7 +266,6 @@ static List *pending_write_requests = NIL;
/* Signal handler flags */
static volatile bool need_exit = false;
-static volatile bool got_SIGHUP = false;
/*
* Total time charged to functions so far in the current backend.
@@ -287,7 +286,6 @@ static pid_t pgstat_forkexec(void);
NON_EXEC_STATIC void PgstatCollectorMain(int argc, char *argv[]) pg_attribute_noreturn();
static void pgstat_exit(SIGNAL_ARGS);
static void pgstat_beshutdown_hook(int code, Datum arg);
-static void pgstat_sighup_handler(SIGNAL_ARGS);
static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
@@ -4186,7 +4184,7 @@ PgstatCollectorMain(int argc, char *argv[])
* except SIGHUP and SIGQUIT. Note we don't need a SIGUSR1 handler to
* support latch operations, because we only use a local latch.
*/
- pqsignal(SIGHUP, pgstat_sighup_handler);
+ pqsignal(SIGHUP, PostgresSigHupHandler);
pqsignal(SIGINT, SIG_IGN);
pqsignal(SIGTERM, SIG_IGN);
pqsignal(SIGQUIT, pgstat_exit);
@@ -4221,10 +4219,10 @@ PgstatCollectorMain(int argc, char *argv[])
* message. (This effectively means that if backends are sending us stuff
* like mad, we won't notice postmaster death until things slack off a
* bit; which seems fine.) To do that, we have an inner loop that
- * iterates as long as recv() succeeds. We do recognize got_SIGHUP inside
- * the inner loop, which means that such interrupts will get serviced but
- * the latch won't get cleared until next time there is a break in the
- * action.
+ * iterates as long as recv() succeeds. We do recognize
+ * ConfigRereadPending inside the inner loop, which means that such
+ * interrupts will get serviced but the latch won't get cleared until next
+ * time there is a break in the action.
*/
for (;;)
{
@@ -4246,9 +4244,9 @@ PgstatCollectorMain(int argc, char *argv[])
/*
* Reload configuration if we got SIGHUP from the postmaster.
*/
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
}
@@ -4439,18 +4437,6 @@ pgstat_exit(SIGNAL_ARGS)
errno = save_errno;
}
-/* SIGHUP handler for collector process */
-static void
-pgstat_sighup_handler(SIGNAL_ARGS)
-{
- int save_errno = errno;
-
- got_SIGHUP = true;
- SetLatch(MyLatch);
-
- errno = save_errno;
-}
-
/*
* Subroutine to clear stats in a database entry
*
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index b623252475..f57f7970fe 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -38,7 +38,6 @@
/*
* Flags set by interrupt handlers for later service in the redo loop.
*/
-static volatile sig_atomic_t got_SIGHUP = false;
static volatile sig_atomic_t shutdown_requested = false;
static volatile sig_atomic_t promote_triggered = false;
@@ -122,7 +121,7 @@ StartupProcSigHupHandler(SIGNAL_ARGS)
{
int save_errno = errno;
- got_SIGHUP = true;
+ ConfigRereadPending = true;
WakeupRecovery();
errno = save_errno;
@@ -150,9 +149,9 @@ HandleStartupProcInterrupts(void)
/*
* Check if we were requested to re-read config file.
*/
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
}
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 9f5ca5cac0..8ee318faee 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -122,7 +122,6 @@ static CRITICAL_SECTION sysloggerSection;
/*
* Flags set by interrupt handlers for later service in the main loop.
*/
-static volatile sig_atomic_t got_SIGHUP = false;
static volatile sig_atomic_t rotation_requested = false;
@@ -144,7 +143,6 @@ static unsigned int __stdcall pipeThread(void *arg);
static void logfile_rotate(bool time_based_rotation, int size_rotation_for);
static char *logfile_getname(pg_time_t timestamp, const char *suffix);
static void set_next_rotation_time(void);
-static void sigHupHandler(SIGNAL_ARGS);
static void sigUsr1Handler(SIGNAL_ARGS);
static void update_metainfo_datafile(void);
@@ -240,7 +238,7 @@ SysLoggerMain(int argc, char *argv[])
* broken backends...
*/
- pqsignal(SIGHUP, sigHupHandler); /* set flag to read config file */
+ pqsignal(SIGHUP, PostgresSigHupHandler); /* set flag to read config file */
pqsignal(SIGINT, SIG_IGN);
pqsignal(SIGTERM, SIG_IGN);
pqsignal(SIGQUIT, SIG_IGN);
@@ -303,9 +301,9 @@ SysLoggerMain(int argc, char *argv[])
/*
* Process any requests or signals received recently.
*/
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
/*
@@ -1421,18 +1419,6 @@ update_metainfo_datafile(void)
* --------------------------------
*/
-/* SIGHUP: set flag to reload config file */
-static void
-sigHupHandler(SIGNAL_ARGS)
-{
- int save_errno = errno;
-
- got_SIGHUP = true;
- SetLatch(MyLatch);
-
- errno = save_errno;
-}
-
/* SIGUSR1: set flag to rotate logfile */
static void
sigUsr1Handler(SIGNAL_ARGS)
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index a575d8f953..29e00f2890 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -79,12 +79,10 @@ int WalWriterFlushAfter = 128;
/*
* Flags set by interrupt handlers for later service in the main loop.
*/
-static volatile sig_atomic_t got_SIGHUP = false;
static volatile sig_atomic_t shutdown_requested = false;
/* Signal handlers */
static void wal_quickdie(SIGNAL_ARGS);
-static void WalSigHupHandler(SIGNAL_ARGS);
static void WalShutdownHandler(SIGNAL_ARGS);
static void walwriter_sigusr1_handler(SIGNAL_ARGS);
@@ -108,7 +106,7 @@ WalWriterMain(void)
* We have no particular use for SIGINT at the moment, but seems
* reasonable to treat like SIGTERM.
*/
- pqsignal(SIGHUP, WalSigHupHandler); /* set flag to read config file */
+ pqsignal(SIGHUP, PostgresSigHupHandler); /* set flag to read config file */
pqsignal(SIGINT, WalShutdownHandler); /* request shutdown */
pqsignal(SIGTERM, WalShutdownHandler); /* request shutdown */
pqsignal(SIGQUIT, wal_quickdie); /* hard crash time */
@@ -260,9 +258,9 @@ WalWriterMain(void)
/*
* Process any requests or signals received recently.
*/
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
}
if (shutdown_requested)
@@ -342,18 +340,6 @@ wal_quickdie(SIGNAL_ARGS)
exit(2);
}
-/* SIGHUP: set flag to re-read config file at next convenient time */
-static void
-WalSigHupHandler(SIGNAL_ARGS)
-{
- int save_errno = errno;
-
- got_SIGHUP = true;
- SetLatch(MyLatch);
-
- errno = save_errno;
-}
-
/* SIGTERM: set flag to exit normally */
static void
WalShutdownHandler(SIGNAL_ARGS)
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 345a415212..d92ee3d3a6 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -80,7 +80,6 @@ static void logicalrep_worker_detach(void);
static void logicalrep_worker_cleanup(LogicalRepWorker *worker);
/* Flags set by signal handlers */
-static volatile sig_atomic_t got_SIGHUP = false;
static volatile sig_atomic_t got_SIGTERM = false;
static bool on_commit_launcher_wakeup = false;
@@ -637,20 +636,6 @@ logicalrep_launcher_sigterm(SIGNAL_ARGS)
errno = save_errno;
}
-/* SIGHUP: set flag to reload configuration at next convenient time */
-static void
-logicalrep_launcher_sighup(SIGNAL_ARGS)
-{
- int save_errno = errno;
-
- got_SIGHUP = true;
-
- /* Waken anything waiting on the process latch */
- SetLatch(MyLatch);
-
- errno = save_errno;
-}
-
/*
* Count the number of registered (not necessarily running) sync workers
* for a subscription.
@@ -799,7 +784,7 @@ ApplyLauncherMain(Datum main_arg)
before_shmem_exit(logicalrep_launcher_onexit, (Datum) 0);
/* Establish signal handlers. */
- pqsignal(SIGHUP, logicalrep_launcher_sighup);
+ pqsignal(SIGHUP, PostgresSigHupHandler);
pqsignal(SIGTERM, logicalrep_launcher_sigterm);
BackgroundWorkerUnblockSignals();
@@ -889,9 +874,9 @@ ApplyLauncherMain(Datum main_arg)
if (rc & WL_POSTMASTER_DEATH)
proc_exit(1);
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
}
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index a570900a42..16d3a6f5df 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -120,9 +120,6 @@ static void store_flush_position(XLogRecPtr remote_lsn);
static void maybe_reread_subscription(void);
-/* Flags set by signal handlers */
-static volatile sig_atomic_t got_SIGHUP = false;
-
/*
* Should this worker apply changes for given relation.
*
@@ -1156,10 +1153,10 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
if (rc & WL_POSTMASTER_DEATH)
proc_exit(1);
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
}
if (rc & WL_TIMEOUT)
@@ -1451,20 +1448,6 @@ subscription_change_cb(Datum arg, int cacheid, uint32 hashvalue)
MySubscriptionValid = false;
}
-/* SIGHUP: set flag to reload configuration at next convenient time */
-static void
-logicalrep_worker_sighup(SIGNAL_ARGS)
-{
- int save_errno = errno;
-
- got_SIGHUP = true;
-
- /* Waken anything waiting on the process latch */
- SetLatch(MyLatch);
-
- errno = save_errno;
-}
-
/* Logical Replication Apply worker entry point */
void
ApplyWorkerMain(Datum main_arg)
@@ -1480,7 +1463,7 @@ ApplyWorkerMain(Datum main_arg)
logicalrep_worker_attach(worker_slot);
/* Setup signal handling */
- pqsignal(SIGHUP, logicalrep_worker_sighup);
+ pqsignal(SIGHUP, PostgresSigHupHandler);
pqsignal(SIGTERM, die);
BackgroundWorkerUnblockSignals();
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 2723612718..3c7bb49d5c 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -95,7 +95,6 @@ static uint32 recvOff = 0;
* Flags set by interrupt handlers of walreceiver for later service in the
* main loop.
*/
-static volatile sig_atomic_t got_SIGHUP = false;
static volatile sig_atomic_t got_SIGTERM = false;
/*
@@ -424,9 +423,9 @@ WalReceiverMain(void)
/* Process any requests or signals received recently */
ProcessWalRcvInterrupts();
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
ProcessConfigFile(PGC_SIGHUP);
XLogWalRcvSendHSFeedback(true);
}
@@ -799,7 +798,7 @@ WalRcvDie(int code, Datum arg)
static void
WalRcvSigHupHandler(SIGNAL_ARGS)
{
- got_SIGHUP = true;
+ ConfigRereadPending = true;
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 92e1d63b2f..dd20ceab30 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -8965,10 +8965,11 @@ read_nondefault_variables(void)
* value before processing serialized values.
*
* A PGC_S_DEFAULT setting on the serialize side will typically match new
- * postmaster children, but that can be false when got_SIGHUP == true and the
- * pending configuration change modifies this setting. Nonetheless, we omit
- * PGC_S_DEFAULT settings from serialization and make up for that by restoring
- * defaults before applying serialized values.
+ * postmaster children, but that can be false when
+ * ConfigRereadPending == true and the pending configuration change
+ * modifies this setting. Nonetheless, we omit PGC_S_DEFAULT settings from
+ * serialization and make up for that by restoring defaults before applying
+ * serialized values.
*
* PGC_POSTMASTER variables always have the same value in every child of a
* particular postmaster. Most PGC_INTERNAL variables are compile-time
--
2.12.0.264.gd6db3f2165.dirty
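The mechanics of the consolidated handler are simple enough to demonstrate standalone. Below is a minimal, hedged sketch, not the actual PostgreSQL code: the flag and handler names follow the patch, but SetLatch() is omitted and ProcessConfigFile() is replaced by a stub counter, since both need backend infrastructure. Note the main-loop idiom of clearing the flag before the reload, so a SIGHUP arriving during ProcessConfigFile() is serviced on the next iteration rather than lost.

```c
#include <errno.h>
#include <signal.h>

/*
 * Standalone sketch of the pattern the patch consolidates: one
 * async-signal-safe SIGHUP handler sets a flag (and would wake the
 * process latch), and the main loop clears the flag before re-reading
 * the configuration.
 */
static volatile sig_atomic_t ConfigRereadPending = 0;
static int	reload_count = 0;	/* stands in for ProcessConfigFile() */

static void
PostgresSigHupHandler(int signum)
{
	int			save_errno = errno; /* handlers must preserve errno */

	(void) signum;
	ConfigRereadPending = 1;
	/* SetLatch(MyLatch) would go here to wake the main loop */
	errno = save_errno;
}

static void
ProcessConfigFileStub(void)
{
	reload_count++;
}

static void
service_pending_interrupts(void)
{
	if (ConfigRereadPending)
	{
		/* Clear first: a SIGHUP arriving mid-reload sets it again. */
		ConfigRereadPending = 0;
		ProcessConfigFileStub();
	}
}
```

The same handler can then be installed with pqsignal() in every process that previously carried its own copy, which is exactly the mechanical change the diff above makes.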
On Mon, Jun 5, 2017 at 10:29 AM, Andres Freund <andres@anarazel.de> wrote:
Michael, Peter, Fujii, is either of you planning to review this? I'm
planning to commit this tomorrow morning PST, unless somebody protests
till then.
Yes, I am. It would be nice if you could give me 24 hours to look at it
in detail.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
On 2017-06-05 10:31:12 +0900, Michael Paquier wrote:
On Mon, Jun 5, 2017 at 10:29 AM, Andres Freund <andres@anarazel.de> wrote:
Michael, Peter, Fujii, is either of you planning to review this? I'm
planning to commit this tomorrow morning PST, unless somebody protests
till then.
Yes, I am. It would be nice if you could give me 24 hours to look at it
in detail.
Sure. Could you let me know when you're done?
Noah, I might thus not be able to resolve most of "Query handling in
Walsender is somewhat broken" by tomorrow, but it might end up being
Tuesday. Even after that there'll be a remaining item after all these.
- Andres
On Mon, Jun 5, 2017 at 10:29 AM, Andres Freund <andres@anarazel.de> wrote:
On 2017-06-02 17:20:23 -0700, Andres Freund wrote:
Attached is a *preliminary* patch series implementing this. I've first
reverted the previous patch, as otherwise backpatchable versions of the
necessary patches would get too complicated, due to the signals used and
such.
That makes sense.
I went through this again, and the only real thing I found was a
leftover prototype in walsender.h. I've in the interim worked on
backpatch versions of that series; there are annoying conflicts, but
nothing really problematic. The only real difference is adding
SetLatch() calls to HandleWalSndInitStopping() for < 9.6, and guarding
SetLatch with an if for < 9.5.
As an additional patch (based on one by Petr), even though it more
properly belongs to
http://archives.postgresql.org/message-id/20170421014030.fdzvvvbrz4nckrow%40alap3.anarazel.de
attached is a patch unifying SIGHUP between normal and walsender
backends. This needs to be backpatched all the way. I've also attached
a second patch, again based on Petr's, that unifies SIGHUP handling
across all the remaining backends, but that's something that probably
more appropriate for v11, although I'm still tempted to commit it
earlier.
I have looked at all those patches. The set looks solid to me.
0001 and 0002 are straight-forward things. It makes sense to unify the
SIGUSR1 handling.
Here are some comments about 0003.
+ * This will trigger walsenders to send the remaining WAL, prevent them from
+ * accepting further commands. After that they'll wait till the last WAL is
+ * written.
s/prevent/preventing/?
I would rephrase the last sentence a bit:
"After that each WAL sender will wait until the end-of-checkpoint
record has been flushed on the receiver side."
+ /*
+ * Have WalSndLoop() terminate the connection in an orderly
+ * manner, after writing out all the pending data.
+ */
+ if (got_STOPPING)
+ got_SIGUSR2 = true;
I think that for correctness the state of the WAL sender should be
switched to WALSNDSTATE_STOPPING in XLogSendLogical() as well.
About 0004... This definitely merits a backpatch; PostgresMain() is
used by WAL senders as well when executing queries.
- if (got_SIGHUP)
+ if (ConfigRereadPending)
{
- got_SIGHUP = false;
+ ConfigRereadPending = false;
A more appropriate name would be ConfigReloadPending perhaps?
0005 looks like a fine one-liner to me.
For 0006, you could include as well the removal of worker_spi_sighup()
in the refactoring. I think that it would be interesting to be able to
trigger a feedback message using SIGHUP in WAL receivers, refactoring
at the same time SIGHUP handling for WAL receivers. It is possible for
example to abuse SIGHUP in autovacuum for cost parameters.
--
Michael
On 2017-06-05 15:30:38 +0900, Michael Paquier wrote:
I have looked at all those patches. The set looks solid to me.
Thanks!
Here are some comments about 0003.
+ /*
+ * Have WalSndLoop() terminate the connection in an orderly
+ * manner, after writing out all the pending data.
+ */
+ if (got_STOPPING)
+ got_SIGUSR2 = true;
I think that for correctness the state of the WAL sender should be
switched to WALSNDSTATE_STOPPING in XLogSendLogical() as well.
No, that would be wrong. If we switched here, checkpointer would finish
waiting, even though XLogSendLogical() might get called again. That
could happen, e.g., if the TCP socket was full, and XLogSendLogical()
gets called again.
A more appropriate name would be ConfigReloadPending perhaps?
Hm, ok.
0005 looks like a fine one-liner to me.
For 0006, you could include as well the removal of worker_spi_sighup()
in the refactoring.
Ok. I'll leave that patch for now, since I think it's probably better
to apply it only to master once v10 branched off.
I think that it would be interesting to be able to
trigger a feedback message using SIGHUP in WAL receivers, refactoring
at the same time SIGHUP handling for WAL receivers. It is possible for
example to abuse SIGHUP in autovacuum for cost parameters.
Could you clarify a bit here? I can't follow. Do you think it's
actually a good idea to combine that with the largely mechanical patch?
- Andres
On Tue, Jun 6, 2017 at 9:47 AM, Andres Freund <andres@anarazel.de> wrote:
On 2017-06-05 15:30:38 +0900, Michael Paquier wrote:
I think that it would be interesting to be able to
trigger a feedback message using SIGHUP in WAL receivers, refactoring
at the same time SIGHUP handling for WAL receivers. It is possible for
example to abuse SIGHUP in autovacuum for cost parameters.
Could you clarify a bit here? I can't follow. Do you think it's
actually a good idea to combine that with the largely mechanical patch?
Sort of. The thought here is to be able to trigger
XLogWalRcvSendReply() using a SIGHUP, even if force_reply is not
enforced. But looking again at the code, XLogWalRcvSendReply() is
processed only when data is received, so sending the same message to
the server multiple times would be pointless. Still, don't you think it
would be helpful to wake up the WAL receiver at will on SIGHUP by
setting its latch? XLogWalRcvSendHSFeedback() could be triggered at
will using that.
ProcessWalRcvInterrupts() could include the checks for SIGHUP by the way...
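For illustration, here is a hypothetical sketch of that idea, with PostgreSQL's latch approximated by a self-pipe (write() is async-signal-safe, so it can stand in for SetLatch() in the handler). walrcv_sighup_handler and walrcv_wait are invented names for this sketch, not the actual walreceiver API:

```c
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

/*
 * Sketch: have the WAL receiver's SIGHUP handler wake the process so
 * the main loop can act immediately (e.g. send a feedback message)
 * instead of waiting for new data to arrive.  The self-pipe plays the
 * role of the process latch.
 */
static int	wakeup_pipe[2];
static volatile sig_atomic_t ConfigRereadPending = 0;

static void
walrcv_sighup_handler(int signum)
{
	int			save_errno = errno;
	char		c = 0;

	(void) signum;
	ConfigRereadPending = 1;
	/* write() is async-signal-safe; this is the "SetLatch" */
	(void) write(wakeup_pipe[1], &c, 1);
	errno = save_errno;
}

/* Wait until either the timeout expires or the handler wakes us. */
static int
walrcv_wait(int timeout_ms)
{
	struct pollfd pfd = {.fd = wakeup_pipe[0], .events = POLLIN};
	int			rc = poll(&pfd, 1, timeout_ms);

	if (rc > 0 && (pfd.revents & POLLIN))
	{
		char		buf[16];

		(void) read(wakeup_pipe[0], buf, sizeof(buf)); /* reset "latch" */
	}
	return rc;
}
```

In the real code this would presumably just be a SetLatch(MyLatch) call in WalRcvSigHupHandler(), with the main loop's existing socket wait doubling as the wakeup point.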
--
Michael
Hi,
On 2017-06-05 15:30:38 +0900, Michael Paquier wrote:
+ * This will trigger walsenders to send the remaining WAL, prevent them from
+ * accepting further commands. After that they'll wait till the last WAL is
+ * written.
s/prevent/preventing/?
I would rephrase the last sentence a bit:
"After that each WAL sender will wait until the end-of-checkpoint
record has been flushed on the receiver side."
I didn't like your proposed phrasing much, but I agree that what I had
wasn't good either. Tried to improve it.
Thanks for the review.
I pushed this series, this should resolve the issue in this thread
entirely, and should fix a good chunk of the issues in the 'walsender
and parallelism' thread.
Greetings,
Andres Freund