Infinite loop on master shutdown

Started by Kyotaro Horiguchi, almost 8 years ago, 3 messages, hackers
#1 Kyotaro Horiguchi
horikyota.ntt@gmail.com

Hello, as reported on the pgsql-bugs ML:

/messages/by-id/20180517.170021.24356216.horiguchi.kyotaro@lab.ntt.co.jp

Master can go into an infinite loop on shutdown. However, it is caused by
a broken database, such as one whose storage was rolled back. (The steps to
reproduce this are shown in the above mail.)

I think this can be avoided by rejecting a standby if it reports
that its write LSN is smaller than its flush LSN after catching up.

Is it worth fixing?

# The patch is slightly different from the one I posted to -bugs.

It is enough to check for the invalid state just once, but the
patch continues to check.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

reject_invalid_standby.patch (text/x-patch; charset=us-ascii, +13-0)
#2 Andres Freund
andres@anarazel.de
In reply to: Kyotaro Horiguchi (#1)
Re: Infinite loop on master shutdown

Hi,

On 2018-05-17 17:19:00 +0900, Kyotaro HORIGUCHI wrote:

> Hello, as reported on the pgsql-bugs ML:
>
> /messages/by-id/20180517.170021.24356216.horiguchi.kyotaro@lab.ntt.co.jp
>
> Master can go into an infinite loop on shutdown. However, it is caused by
> a broken database, such as one whose storage was rolled back. (The steps to
> reproduce this are shown in the above mail.)
>
> I think this can be avoided by rejecting a standby if it reports
> that its write LSN is smaller than its flush LSN after catching up.
>
> Is it worth fixing?

I'm very doubtful. If you do bad stuff to a standby, bad things can
happen...

Greetings,

Andres Freund

#3 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Andres Freund (#2)
Re: Infinite loop on master shutdown

At Thu, 17 May 2018 09:20:01 -0700, Andres Freund <andres@anarazel.de> wrote in <20180517162001.rzd7l6g2h66hvzvd@alap3.anarazel.de>

> Hi,
>
> On 2018-05-17 17:19:00 +0900, Kyotaro HORIGUCHI wrote:
> > Hello, as reported on the pgsql-bugs ML:
> >
> > /messages/by-id/20180517.170021.24356216.horiguchi.kyotaro@lab.ntt.co.jp
> >
> > Master can go into an infinite loop on shutdown. However, it is caused by
> > a broken database, such as one whose storage was rolled back. (The steps to
> > reproduce this are shown in the above mail.)
> >
> > I think this can be avoided by rejecting a standby if it reports
> > that its write LSN is smaller than its flush LSN after catching up.
> >
> > Is it worth fixing?
>
> I'm very doubtful. If you do bad stuff to a standby, bad things can
> happen...

Yes, I doubted it was worth fixing, since I couldn't find a more
natural way to cause it.

Thanks for the opinion.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center