Regression stopping PostgreSQL 9.4.13 if a walsender is running

Started by Marco Nenciarini over 8 years ago, 4 messages
#1 Marco Nenciarini
marco.nenciarini@2ndquadrant.it

I have noticed that since the 9.4.13 release PostgreSQL reliably fails
to shut down with the smart and fast methods if a walsender is running.

The postmaster continues waiting forever for the walsender termination.

It works perfectly with all the other major releases.

I bisected the issue to commit 1cdc0ab9c180222a94e1ea11402e728688ddc37d.

After some investigation I discovered that the statement that sets
got_SIGUSR2 in the WalSndLastCycleHandler function was lost during the
backpatch.

The trivial patch is the following:

~~~
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index a0601b3..b24f9a1 100644
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
*************** WalSndLastCycleHandler(SIGNAL_ARGS)
*** 2658,2663 ****
--- 2658,2664 ----
  {
      int         save_errno = errno;
  
+     got_SIGUSR2 = true;
      if (MyWalSnd)
          SetLatch(&MyWalSnd->latch);
  
~~~

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it

#2 Michael Paquier
michael.paquier@gmail.com
In reply to: Marco Nenciarini (#1)
Re: Regression stopping PostgreSQL 9.4.13 if a walsender is running

On Wed, Aug 23, 2017 at 2:28 AM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:

> I have noticed that since the 9.4.13 release PostgreSQL reliably fails
> to shut down with the smart and fast methods if a walsender is running.
>
> The postmaster continues waiting forever for the walsender termination.
>
> It works perfectly with all the other major releases.

Right. A similar issue was reported yesterday:
/messages/by-id/CAA5_DuD0O1XyM8OnOzhRepyPU-t8nZKLzs1pT2JpzP0NS+vVNA@mail.gmail.com
Thanks for digging into the origin of the problem; I lacked the time
to look at it yesterday.

> I bisected the issue to commit 1cdc0ab9c180222a94e1ea11402e728688ddc37d.
>
> After some investigation I discovered that the statement that sets
> got_SIGUSR2 in the WalSndLastCycleHandler function was lost during the
> backpatch.

That looks correct to me; only REL9_4_STABLE is affected. This bug
breaks many use cases, such as failovers :(
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#2)
Re: Regression stopping PostgreSQL 9.4.13 if a walsender is running

On 2017-08-23 09:52:45 +0900, Michael Paquier wrote:

> On Wed, Aug 23, 2017 at 2:28 AM, Marco Nenciarini
> <marco.nenciarini@2ndquadrant.it> wrote:
>
>> I have noticed that since the 9.4.13 release PostgreSQL reliably fails
>> to shut down with the smart and fast methods if a walsender is running.
>>
>> The postmaster continues waiting forever for the walsender termination.
>>
>> It works perfectly with all the other major releases.
>
> Right. A similar issue was reported yesterday:
> /messages/by-id/CAA5_DuD0O1XyM8OnOzhRepyPU-t8nZKLzs1pT2JpzP0NS+vVNA@mail.gmail.com
> Thanks for digging into the origin of the problem; I lacked the time
> to look at it yesterday.
>
>> I bisected the issue to commit 1cdc0ab9c180222a94e1ea11402e728688ddc37d.
>>
>> After some investigation I discovered that the statement that sets
>> got_SIGUSR2 in the WalSndLastCycleHandler function was lost during the
>> backpatch.

Yeah, that's an annoying screwup (by me): there were merge conflicts on
every single version, so apparently I got at least one of them wrong
:(. Sorry about that.

Will fix tomorrow.

> That looks correct to me; only REL9_4_STABLE is affected. This bug
> breaks many use cases, such as failovers :(

Well, scheduled failovers, that is.

- Andres


#4 Andres Freund
andres@anarazel.de
In reply to: Marco Nenciarini (#1)
Re: Regression stopping PostgreSQL 9.4.13 if a walsender is running

Hi,

On 2017-08-22 19:28:22 +0200, Marco Nenciarini wrote:

> I have noticed that since the 9.4.13 release PostgreSQL reliably fails
> to shut down with the smart and fast methods if a walsender is running.
>
> The postmaster continues waiting forever for the walsender termination.
>
> It works perfectly with all the other major releases.
>
> I bisected the issue to commit 1cdc0ab9c180222a94e1ea11402e728688ddc37d.
>
> After some investigation I discovered that the statement that sets
> got_SIGUSR2 in the WalSndLastCycleHandler function was lost during the
> backpatch.
>
> The trivial patch is the following:

Pushed, thanks! And sorry again.

Greetings,

Andres Freund
