Problem while setting the fpw with SIGHUP

Started by Dilip Kumarabout 8 years ago75 messageshackers
Jump to latest
#1Dilip Kumar
dilipbalaut@gmail.com

While setting the full_page_write with SIGHUP I hit an assert in checkpoint
process. And, that is because inside a CRITICAL section we are calling
RecoveryInProgress which intern allocates memory. So I have moved
RecoveryInProgress call out of the CRITICAL section and the problem got
solved.

command executed: killall -SIGHUP postgres
Crash call stack:
#0 0x00007fa19560d5d7 in raise () from /lib64/libc.so.6
#1 0x00007fa19560ecc8 in abort () from /lib64/libc.so.6
#2 0x00000000009fc991 in ExceptionalCondition (conditionName=0xc5ab1c
"!(CritSectionCount == 0)", errorType=0xc5a739 "FailedAssertion",
fileName=0xc5a8a5 "mcxt.c", lineNumber=635) at assert.c:54
#3 0x0000000000a34e56 in MemoryContextCreate (node=0x192edc0,
tag=T_AllocSetContext, size=216, nameoffset=216, methods=0xc58620
<AllocSetMethods>,
parent=0x18fe1b0, name=0xac1137 "WAL record construction", flags=0) at
mcxt.c:635
#4 0x0000000000a2aaa1 in AllocSetContextCreateExtended (parent=0x18fe1b0,
name=0xac1137 "WAL record construction", flags=0, minContextSize=0,
initBlockSize=8192, maxBlockSize=8388608) at aset.c:463
#5 0x000000000055983c in InitXLogInsert () at xloginsert.c:1033
#6 0x000000000054e4e5 in InitXLOGAccess () at xlog.c:8183
#7 0x000000000054df71 in RecoveryInProgress () at xlog.c:7952
#8 0x00000000005507f6 in UpdateFullPageWrites () at xlog.c:9566
#9 0x00000000007ea821 in UpdateSharedMemoryConfig () at checkpointer.c:1366
#10 0x00000000007e95a1 in CheckpointerMain () at checkpointer.c:383

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

fix_fpwupdate.patchapplication/octet-stream; name=fix_fpwupdate.patchDownload+4-1
#2Michael Paquier
michael@paquier.xyz
In reply to: Dilip Kumar (#1)
Re: Problem while setting the fpw with SIGHUP

On Fri, Mar 09, 2018 at 01:42:04PM +0530, Dilip Kumar wrote:

While setting the full_page_write with SIGHUP I hit an assert in checkpoint
process. And, that is because inside a CRITICAL section we are calling
RecoveryInProgress which intern allocates memory. So I have moved
RecoveryInProgress call out of the CRITICAL section and the problem got
solved.

Indeed, I can see how this is possible.

If you apply the following you can also have way more fun, but that's
overdoing it:
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7918,6 +7918,8 @@ CheckRecoveryConsistency(void)
bool
RecoveryInProgress(void)
{
+   Assert(CritSectionCount == 0);

Anyway, it seems to me that you are not taking care of all possible race
conditions here. RecoveryInProgress() could as well be called in
XLogFlush(), and that's a code path taken during redo.

Instead of doing what you are suggesting, why not moving
InitXLogInsert() out of InitXLOGAccess() and change InitPostgres() so as
the allocations for WAL inserts is done unconditionally? This has
the cost of also making this allocation even for backends which are
started during recovery, still we are talking about allocating a couple
of bytes in exchange of addressing completely all race conditions in
this area. InitXLogInsert() does not depend on any post-recovery data
like ThisTimeLineId, so a split is possible.

Thoughts?
--
Michael

#3Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#2)
Re: Problem while setting the fpw with SIGHUP

On Tue, Mar 13, 2018 at 05:04:04PM +0900, Michael Paquier wrote:

Instead of doing what you are suggesting, why not moving
InitXLogInsert() out of InitXLOGAccess() and change InitPostgres() so as
the allocations for WAL inserts is done unconditionally? This has
the cost of also making this allocation even for backends which are
started during recovery, still we are talking about allocating a couple
of bytes in exchange of addressing completely all race conditions in
this area. InitXLogInsert() does not depend on any post-recovery data
like ThisTimeLineId, so a split is possible.

I have been hacking things this way, and it seems to me that it takes
care of all this class of problems. CreateCheckPoint() actually
mentions that InitXLogInsert() cannot be called in a critical section,
so the comments don't match the code. I also think that we still want
to be able to use RecoveryInProgress() in critical sections to do
decision-making for the generation of WAL records.
--
Michael

Attachments:

xlog-insert-critical-v1.patchtext/x-diff; charset=us-asciiDownload+18-8
#4Dilip Kumar
dilipbalaut@gmail.com
In reply to: Michael Paquier (#3)
Re: Problem while setting the fpw with SIGHUP

On Fri, Mar 16, 2018 at 10:53 AM, Michael Paquier <michael@paquier.xyz>
wrote:

On Tue, Mar 13, 2018 at 05:04:04PM +0900, Michael Paquier wrote:

Instead of doing what you are suggesting, why not moving
InitXLogInsert() out of InitXLOGAccess() and change InitPostgres() so as
the allocations for WAL inserts is done unconditionally? This has
the cost of also making this allocation even for backends which are
started during recovery, still we are talking about allocating a couple
of bytes in exchange of addressing completely all race conditions in
this area. InitXLogInsert() does not depend on any post-recovery data
like ThisTimeLineId, so a split is possible.

I have been hacking things this way, and it seems to me that it takes
care of all this class of problems. CreateCheckPoint() actually
mentions that InitXLogInsert() cannot be called in a critical section,
so the comments don't match the code. I also think that we still want
to be able to use RecoveryInProgress() in critical sections to do
decision-making for the generation of WAL records

Thanks for the patch, the idea looks good to me. Please find some comments
and updated patch.

I think like WALWriterProcess, we need to call InitXLogInsert for the
CheckpointerProcess as well as for the BgWriterProcess
because earlier they were calling InitXLogInsert while check
RecoveryInProgress before inserting the WAL.

see below crash:
#0 0x00007f89273a65d7 in raise () from /lib64/libc.so.6
#1 0x00007f89273a7cc8 in abort () from /lib64/libc.so.6
#2 0x00000000009fd24e in errfinish (dummy=0) at elog.c:556
#3 0x00000000009ff70b in elog_finish (elevel=20, fmt=0xac0d1d "too much
WAL data") at elog.c:1378
#4 0x0000000000558766 in XLogRegisterData (data=0xf3efac <fullPageWrites>
"\001", len=1) at xloginsert.c:330
#5 0x000000000055080e in UpdateFullPageWrites () at xlog.c:9569
#6 0x00000000007ea831 in UpdateSharedMemoryConfig () at checkpointer.c:1360
#7 0x00000000007e95b1 in CheckpointerMain () at checkpointer.c:370
#8 0x0000000000561680 in AuxiliaryProcessMain (argc=2,
argv=0x7fffcfd4bec0) at bootstrap.c:447

I have modified you patch and called InitXLogInsert for CheckpointerProcess
and BgWriterProcess also. After that the
issue is solved and fpw is getting set properly.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

xlog-insert-critical-v2.patchapplication/octet-stream; name=xlog-insert-critical-v2.patchDownload+22-8
#5Michael Paquier
michael@paquier.xyz
In reply to: Dilip Kumar (#4)
Re: Problem while setting the fpw with SIGHUP

On Tue, Mar 20, 2018 at 10:43:55AM +0530, Dilip Kumar wrote:

I think like WALWriterProcess, we need to call InitXLogInsert for the
CheckpointerProcess as well as for the BgWriterProcess
because earlier they were calling InitXLogInsert while check
RecoveryInProgress before inserting the WAL.

/* don't set signals, bgwriter has its own agenda */
+                       InitXLOGAccess();
+                       InitXLogInsert()

This is wrong, as the checkpointer is started as well on standbys, and
that InitXLOGAccess initializes things for WAL generation like
ThisTimeLineID. So you should just call InitXLogInsert(), and a comment
would be welcome for both the bgwriter and the checkpointer.
--
Michael

#6Dilip Kumar
dilipbalaut@gmail.com
In reply to: Michael Paquier (#5)
Re: Problem while setting the fpw with SIGHUP

On Tue, Mar 20, 2018 at 11:26 AM, Michael Paquier <michael@paquier.xyz>
wrote:

On Tue, Mar 20, 2018 at 10:43:55AM +0530, Dilip Kumar wrote:

I think like WALWriterProcess, we need to call InitXLogInsert for the
CheckpointerProcess as well as for the BgWriterProcess
because earlier they were calling InitXLogInsert while check
RecoveryInProgress before inserting the WAL.

/* don't set signals, bgwriter has its own agenda */
+                       InitXLOGAccess();
+                       InitXLogInsert()

This is wrong, as the checkpointer is started as well on standbys, and
that InitXLOGAccess initializes things for WAL generation like
ThisTimeLineID. So you should just call InitXLogInsert(), and a comment
would be welcome for both the bgwriter and the checkpointer.

Yeah, you are right. Fixed.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

xlog-insert-critical-v3.patchapplication/octet-stream; name=xlog-insert-critical-v3.patchDownload+24-8
#7Michael Paquier
michael@paquier.xyz
In reply to: Dilip Kumar (#6)
Re: Problem while setting the fpw with SIGHUP

On Tue, Mar 20, 2018 at 12:00:47PM +0530, Dilip Kumar wrote:

Yeah, you are right. Fixed.

So I have been spending a couple of hours playing with your patch, and
tested various configurations manually, like switch the fpw switch to on
and off while running in parallel pgbench. I have also tested
promotions, etc.

While the patch does its job, it is possible to simplify a bit more the
calls to InitXLogInsert(). Particularly, the one in CreateCheckpoint()
is basically useless for the checkpointer, still it is useful for the
startup process when trigerring an end-in-recovery checkpoint. So one
additional cleanup would be to move the call in CreateCheckpoint() to
bootstrap.c for the startup process. In order to test that, please make
sure to create fallback_promote at the root of the data folder before
sending SIGUSR2 to the startup process which would trigger the pre-9.3
promotion where the end-of-recovery checkpoint is triggered by the
startup process itself.

+   /* Initialize the working areas for constructing WAL records. */
+   InitXLogInsert();
Instead of having the same comment for each process calling
InitXLogInsert() multiple times, I think that it would be better to
complete the comment in bootstrap.c where is written "XLOG operations".

Here is a suggestion:
/*
* Initialize WAL-related operations and enter the main loop of each
* process. InitXLogInsert is called for each process which can
* generate WAL. While this is wasteful for processes started on
* a standby, this gives the guarantee that initialization of WAL
* insertion areas is able to happen in a consistent way and out of
* any critical sections so as the facility is usable when a promotion
* is triggered.
*/

What do you think?
--
Michael

#8Dilip Kumar
dilipbalaut@gmail.com
In reply to: Michael Paquier (#7)
Re: Problem while setting the fpw with SIGHUP

On Fri, Mar 23, 2018 at 1:19 PM, Michael Paquier <michael@paquier.xyz>
wrote:

On Tue, Mar 20, 2018 at 12:00:47PM +0530, Dilip Kumar wrote:

Yeah, you are right. Fixed.

So I have been spending a couple of hours playing with your patch, and
tested various configurations manually, like switch the fpw switch to on
and off while running in parallel pgbench. I have also tested
promotions, etc.

While the patch does its job, it is possible to simplify a bit more the
calls to InitXLogInsert(). Particularly, the one in CreateCheckpoint()
is basically useless for the checkpointer, still it is useful for the
startup process when trigerring an end-in-recovery checkpoint. So one
additional cleanup would be to move the call in CreateCheckpoint() to
bootstrap.c for the startup process.

In StarupXLOG, just before the CreateCheckPoint() call, we are calling
LocalSetXLogInsertAllowed(). So, I am thinking can we just remove
InitXLogInsert
from CreateCheckpoint. And, we don't need to move it to bootstrap.c. Or am
I missing something?

In order to test that, please make
sure to create fallback_promote at the root of the data folder before
sending SIGUSR2 to the startup process which would trigger the pre-9.3
promotion where the end-of-recovery checkpoint is triggered by the
startup process itself.

+   /* Initialize the working areas for constructing WAL records. */
+   InitXLogInsert();
Instead of having the same comment for each process calling
InitXLogInsert() multiple times, I think that it would be better to
complete the comment in bootstrap.c where is written "XLOG operations".

Here is a suggestion:
/*
* Initialize WAL-related operations and enter the main loop of each
* process. InitXLogInsert is called for each process which can
* generate WAL. While this is wasteful for processes started on
* a standby, this gives the guarantee that initialization of WAL
* insertion areas is able to happen in a consistent way and out of
* any critical sections so as the facility is usable when a promotion
* is triggered.
*/

What do you think?

Looks good to me.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#9Michael Paquier
michael@paquier.xyz
In reply to: Dilip Kumar (#8)
Re: Problem while setting the fpw with SIGHUP

On Mon, Mar 26, 2018 at 02:32:22PM +0530, Dilip Kumar wrote:

On Fri, Mar 23, 2018 at 1:19 PM, Michael Paquier <michael@paquier.xyz>
wrote:
In StartupXLOG, just before the CreateCheckPoint() call, we are calling
LocalSetXLogInsertAllowed(). So, I am thinking can we just remove
InitXLogInsert from CreateCheckpoint. And, we don't need to move it to
bootstrap.c. Or am I missing something?

I have finally been able to spend more time on this issue, and checked
for a couple of hours all the calls to RecoveryInProgress() that could
be triggered within a critical section to see if the move I suggested
previously is worth it ot not as this would cost some memory for
standbys all the time, which would suck for many read-only sessions.

There are a couple of calls potentially happening in critical sections,
however except for the one in UpdateFullPageWrites() those are used for
sanity checks in code paths that should never trigger it, take
XLogInsertBegin() for example. So with assertions this would trigger
a failure before the real elog(ERROR) message shows up.

Hence, I am changing opinion still I think that instead of the patch you
proposed first we could just do a call to InitXLogInsert() before
entering the critical section. This is also more consistent with what
CreateCheckpoint() does. That's also less risky for a backpatch as your
patch increases the window between the beginning of the critical section
and the real moment where the check for RecoveryInProgress is needed. A
comment explaining why the initialization needs to happen is also
essential.

All in all, this would give the simple patch attached.

Thoughts?
--
Michael

Attachments:

0001-Fix-WAL-insert-when-updating-full_page_writes-for-ch.patchtext/x-diff; charset=us-asciiDownload+10-1
#10Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#9)
Re: Problem while setting the fpw with SIGHUP

At Tue, 27 Mar 2018 16:46:30 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180327074630.GD9940@paquier.xyz>

I have finally been able to spend more time on this issue, and checked
for a couple of hours all the calls to RecoveryInProgress() that could
be triggered within a critical section to see if the move I suggested
previously is worth it ot not as this would cost some memory for
standbys all the time, which would suck for many read-only sessions.

There are a couple of calls potentially happening in critical sections,
however except for the one in UpdateFullPageWrites() those are used for
sanity checks in code paths that should never trigger it, take
XLogInsertBegin() for example. So with assertions this would trigger
a failure before the real elog(ERROR) message shows up.

Hence, I am changing opinion still I think that instead of the patch you
proposed first we could just do a call to InitXLogInsert() before
entering the critical section. This is also more consistent with what
CreateCheckpoint() does. That's also less risky for a backpatch as your
patch increases the window between the beginning of the critical section
and the real moment where the check for RecoveryInProgress is needed. A
comment explaining why the initialization needs to happen is also
essential.

All in all, this would give the simple patch attached.

Thoughts?

At the first look I was uneasy that the patch initializes xlog
working area earlier than required.

The current UpdateFullPageWrites is safe on standby and promotion
so what we should consider is only the non-standby case. I think
what we should do is just calling RecoveryInProgress() at the
beginning of CheckPointerMain, which is just the same thing with
InitPostgres, but before setting up signal handler to avoid
processing SIGHUP before being ready to insert xlog.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

setup_xlog_workarea_when_not_standby.patchtext/x-patch; charset=us-asciiDownload+7-0
#11Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#10)
Re: Problem while setting the fpw with SIGHUP

On Tue, Mar 27, 2018 at 09:01:20PM +0900, Kyotaro HORIGUCHI wrote:

The current UpdateFullPageWrites is safe on standby and promotion
so what we should consider is only the non-standby case. I think
what we should do is just calling RecoveryInProgress() at the
beginning of CheckPointerMain, which is just the same thing with
InitPostgres, but before setting up signal handler to avoid
processing SIGHUP before being ready to insert xlog.

Your proposal does not fix the issue for a checkpointer process started
on a standby. After a promotion, if SIGHUP is issued with a change in
full_page_writes, then the initialization of InitXLogInsert() would
happen again in the critical section of UpdateFullPageWrites(). The
window is rather small for normal promotions as the startup process
requests a checkpoint which would do the initialization, and much larger
for fallback_promote where the startup process is in charge of doing the
end-of-recovery checkpoint.
--
Michael

#12Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#11)
Re: Problem while setting the fpw with SIGHUP

At Tue, 27 Mar 2018 22:02:26 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180327130226.GA1105@paquier.xyz>

On Tue, Mar 27, 2018 at 09:01:20PM +0900, Kyotaro HORIGUCHI wrote:

The current UpdateFullPageWrites is safe on standby and promotion
so what we should consider is only the non-standby case. I think
what we should do is just calling RecoveryInProgress() at the
beginning of CheckPointerMain, which is just the same thing with
InitPostgres, but before setting up signal handler to avoid
processing SIGHUP before being ready to insert xlog.

Your proposal does not fix the issue for a checkpointer process started
on a standby. After a promotion, if SIGHUP is issued with a change in
full_page_writes, then the initialization of InitXLogInsert() would
happen again in the critical section of UpdateFullPageWrites(). The
window is rather small for normal promotions as the startup process
requests a checkpoint which would do the initialization, and much larger
for fallback_promote where the startup process is in charge of doing the
end-of-recovery checkpoint.

Yeah. I realized that after sending the mail.

I looked closer and found several problems there.

- On standby, StartupXLOG calls UpdateFullPageWrites and
checkpointer can call the same function simultaneously, but it
doesn't assume concurrent call.

- StartupXLOG can make a concurrent write to
Insert->fullPageWrite so it needs to be locked.

- At the time of the very end of recovery, the startup process
ignores possible change of full_page_writes GUC. It sticks with
the startup value. It leads to loss of
XLOG_CHANGE_FPW. (StartXLOG is not considering GUC changes by
reload)

- If checkpointer calls UpdateFullPageWrites concurrently with
change of SharedRecoveryInProgress to false in StartupXLOG the
change may lose corresponding XLOG_CHANGE_FPW.

So, if we don't accept the current behavior, what I think we
should do are all of the follows.

A. In StartupXLOG, protect write to Insert->fullPageWrites with
wal insert exlusive lock. Do the same thing for read in
UpdateFullPageWrites.

B. Surround the whole UpdateFullPageWrites with any kind of lock
to exclude concurrent calls. The attached uses ControlFileLock.
This also exludes the function with chaging of
SharedRecoveryInProgress to avoid loss of XLOG_CHANGE_FPW.

C. After exiting recovery mode, call UpdateFullPageWrites from
StartupXLOG if shared fullPageWrites is found changed from the
last known value. If checkponiter did the same thing at the
same time, one of them completes the work.

D. Call RecoveryInProgress to set up xlog working area.

The attached does that. I don't like that it uses ControlFileLock
to exlucde concurrent UpdateFullPageWrites and StartupXLOG but
WALInsertLock cannot be used since UpdateFullPageWrites may take
the same lock.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#13Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#12)
Re: Problem while setting the fpw with SIGHUP

On Wed, Mar 28, 2018 at 03:40:59PM +0900, Kyotaro HORIGUCHI wrote:

The attached does that. I don't like that it uses ControlFileLock
to exlucde concurrent UpdateFullPageWrites and StartupXLOG but
WALInsertLock cannot be used since UpdateFullPageWrites may take
the same lock.

You visibly forgot your patch.
--
Michael

#14Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#13)
Re: Problem while setting the fpw with SIGHUP

At Wed, 28 Mar 2018 15:59:48 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180328065948.GM1105@paquier.xyz>

On Wed, Mar 28, 2018 at 03:40:59PM +0900, Kyotaro HORIGUCHI wrote:

The attached does that. I don't like that it uses ControlFileLock
to exlucde concurrent UpdateFullPageWrites and StartupXLOG but
WALInsertLock cannot be used since UpdateFullPageWrites may take
the same lock.

You visibly forgot your patch.

Mmm, someone must have eaten that. I'm sure it is attached this
time.

I don't like UpdateFullPageWrites is using ControlFileLock to
exclusion..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

fix_updatefullpagewrites_concurrency.patchtext/x-patch; charset=us-asciiDownload+41-6
#15Amit Kapila
amit.kapila16@gmail.com
In reply to: Kyotaro Horiguchi (#12)
Re: Problem while setting the fpw with SIGHUP

On Wed, Mar 28, 2018 at 12:10 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Tue, 27 Mar 2018 22:02:26 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180327130226.GA1105@paquier.xyz>

On Tue, Mar 27, 2018 at 09:01:20PM +0900, Kyotaro HORIGUCHI wrote:

The current UpdateFullPageWrites is safe on standby and promotion
so what we should consider is only the non-standby case. I think
what we should do is just calling RecoveryInProgress() at the
beginning of CheckPointerMain, which is just the same thing with
InitPostgres, but before setting up signal handler to avoid
processing SIGHUP before being ready to insert xlog.

Your proposal does not fix the issue for a checkpointer process started
on a standby. After a promotion, if SIGHUP is issued with a change in
full_page_writes, then the initialization of InitXLogInsert() would
happen again in the critical section of UpdateFullPageWrites(). The
window is rather small for normal promotions as the startup process
requests a checkpoint which would do the initialization, and much larger
for fallback_promote where the startup process is in charge of doing the
end-of-recovery checkpoint.

Yeah. I realized that after sending the mail.

I looked closer and found several problems there.

- On standby, StartupXLOG calls UpdateFullPageWrites and
checkpointer can call the same function simultaneously, but it
doesn't assume concurrent call.

- StartupXLOG can make a concurrent write to
Insert->fullPageWrite so it needs to be locked.

- At the time of the very end of recovery, the startup process
ignores possible change of full_page_writes GUC. It sticks with
the startup value. It leads to loss of
XLOG_CHANGE_FPW. (StartXLOG is not considering GUC changes by
reload)

Won't this be covered by checkpointer process? Basically, the next
time checkpointer sees that it has received the sighup, it will call
UpdateFullPageWrites which will log the record if required.

In general, I was wondering why in the first place this variable
(full_page_writes) is a SIGHUP variable? I think if the user tries to
switch it to 'on' from 'off', it won't guarantee the recovery from
torn pages. Yeah, one can turn it to 'off' from 'on' without any
problem, but as the reverse doesn't guarantee anything, it can confuse
users. What do you think?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#16Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Amit Kapila (#15)
Re: Problem while setting the fpw with SIGHUP

At Sat, 31 Mar 2018 17:43:58 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1JSHSPhkHZXj5q2yf3x1MgBN0oYHb9JvcoVh9ZYqB5g+w@mail.gmail.com>

On Wed, Mar 28, 2018 at 12:10 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Tue, 27 Mar 2018 22:02:26 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180327130226.GA1105@paquier.xyz>

On Tue, Mar 27, 2018 at 09:01:20PM +0900, Kyotaro HORIGUCHI wrote:

The current UpdateFullPageWrites is safe on standby and promotion
so what we should consider is only the non-standby case. I think
what we should do is just calling RecoveryInProgress() at the
beginning of CheckPointerMain, which is just the same thing with
InitPostgres, but before setting up signal handler to avoid
processing SIGHUP before being ready to insert xlog.

Your proposal does not fix the issue for a checkpointer process started
on a standby. After a promotion, if SIGHUP is issued with a change in
full_page_writes, then the initialization of InitXLogInsert() would
happen again in the critical section of UpdateFullPageWrites(). The
window is rather small for normal promotions as the startup process
requests a checkpoint which would do the initialization, and much larger
for fallback_promote where the startup process is in charge of doing the
end-of-recovery checkpoint.

Yeah. I realized that after sending the mail.

I looked closer and found several problems there.

- On standby, StartupXLOG calls UpdateFullPageWrites and
checkpointer can call the same function simultaneously, but it
doesn't assume concurrent call.

- StartupXLOG can make a concurrent write to
Insert->fullPageWrite so it needs to be locked.

- At the time of the very end of recovery, the startup process
ignores possible change of full_page_writes GUC. It sticks with
the startup value. It leads to loss of
XLOG_CHANGE_FPW. (StartXLOG is not considering GUC changes by
reload)

Won't this be covered by checkpointer process? Basically, the next
time checkpointer sees that it has received the sighup, it will call
UpdateFullPageWrites which will log the record if required.

Right. Checkpointer is doing the right thing, but without
writing XLOG_FPW_CHANGE records during recovery.

The problem is in StartupXLOG. It doesn't read shared FPW flag
during recovery and updates local flag according to WAL
records. Then it tries to issue XLOG_FPW_CHANGE if its local
status and shared flag are different. This is correct.

But after that, checkpointer still cannot write XLOG (before
SharedRecovoeryInProgress becomes false) but checkpointer can
change shared fullPagesWrites without writing WAL and the WAL
record is eventually lost.

In general, I was wondering why in the first place this variable
(full_page_writes) is a SIGHUP variable? I think if the user tries to
switch it to 'on' from 'off', it won't guarantee the recovery from
torn pages. Yeah, one can turn it to 'off' from 'on' without any
problem, but as the reverse doesn't guarantee anything, it can confuse
users. What do you think?

I tend to agree with you. It works as expected after the next
checkpoint. So define the variable as "it can be changed any time
but has an effect at the next checkpoint time", then remove
XLOG_FPW_CHANGE record. And that eliminates the problem of
concurrent calls since the checkpointer becomes the only modifier
of the variable. And the problematic function
UpdateFullPageWrites also no longer needs to write a WAL
record. The information is conveyed only by checkpoint records.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#17Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#16)
Re: Problem while setting the fpw with SIGHUP

Hello.

At Wed, 04 Apr 2018 17:26:46 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20180404.172646.238325981.horiguchi.kyotaro@lab.ntt.co.jp>

In general, I was wondering why in the first place this variable
(full_page_writes) is a SIGHUP variable? I think if the user tries to
switch it to 'on' from 'off', it won't guarantee the recovery from
torn pages. Yeah, one can turn it to 'off' from 'on' without any
problem, but as the reverse doesn't guarantee anything, it can confuse
users. What do you think?

I tend to agree with you. It works as expected after the next
checkpoint. So define the variable as "it can be changed any time
but has an effect at the next checkpoint time", then remove
XLOG_FPW_CHANGE record. And that eliminates the problem of
concurrent calls since the checkpointer becomes the only modifier
of the variable. And the problematic function
UpdateFullPageWrites also no longer needs to write a WAL
record. The information is conveyed only by checkpoint records.

I noticed that XLOG_FPW_CHANGE(fpw=false) is still required by
pg_start/stop_backup to know FPW's turning-off without waiting
for the next checkpoint record. But XLOG_FPW_CHANGE(true) is not
required since no one uses the information. It seems even harmful
when it is written at the incorrect place.

In the attached patch, shared fullPageWrites is updated only at
REDO point and XLOG_FPW_CHANGE is written only for FPW's turning
off before REDO point.

Disabling FPW at any time makes sense when we need to slow down
the speed of WAL urgently, but...

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Change-FPW-handling.patchtext/x-patch; charset=us-asciiDownload+31-93
#18Amit Kapila
amit.kapila16@gmail.com
In reply to: Kyotaro Horiguchi (#17)
Re: Problem while setting the fpw with SIGHUP

On Fri, Apr 6, 2018 at 1:50 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello.

At Wed, 04 Apr 2018 17:26:46 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20180404.172646.238325981.horiguchi.kyotaro@lab.ntt.co.jp>

In general, I was wondering why in the first place this variable
(full_page_writes) is a SIGHUP variable? I think if the user tries to
switch it to 'on' from 'off', it won't guarantee the recovery from
torn pages. Yeah, one can turn it to 'off' from 'on' without any
problem, but as the reverse doesn't guarantee anything, it can confuse
users. What do you think?

I tend to agree with you. It works as expected after the next
checkpoint. So define the variable as "it can be changed any time
but has an effect at the next checkpoint time", then remove
XLOG_FPW_CHANGE record. And that eliminates the problem of
concurrent calls since the checkpointer becomes the only modifier
of the variable. And the problematic function
UpdateFullPageWrites also no longer needs to write a WAL
record. The information is conveyed only by checkpoint records.

I noticed that XLOG_FPW_CHANGE(fpw=false) is still required by
pg_start/stop_backup to know FPW's turning-off without waiting
for the next checkpoint record. But XLOG_FPW_CHANGE(true) is not
required since no one uses the information. It seems even harmful
when it is written at the incorrect place.

In the attached patch, shared fullPageWrites is updated only at
REDO point

I am not completely sure if that is the right option because this
would mean that we are defining the new scope for a GUC variable. I
guess we should take the input of others as well. I am not sure what
is the right way to do that, but maybe we can start a new thread with
a proper subject and description rather than discussing this under
some related bug fix patch discussion. I guess we can try that after
CF unless some other people pitch in and share their feedback.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#19Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Amit Kapila (#18)
Re: Problem while setting the fpw with SIGHUP

At Fri, 6 Apr 2018 17:59:58 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1+1zULC52G_EyNcrrxFCmBi4NUuA1CoQAKu2FFPai_Teg@mail.gmail.com>

On Fri, Apr 6, 2018 at 1:50 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello.

At Wed, 04 Apr 2018 17:26:46 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20180404.172646.238325981.horiguchi.kyotaro@lab.ntt.co.jp>

In general, I was wondering why in the first place this variable
(full_page_writes) is a SIGHUP variable? I think if the user tries to
switch it to 'on' from 'off', it won't guarantee the recovery from
torn pages. Yeah, one can turn it to 'off' from 'on' without any
problem, but as the reverse doesn't guarantee anything, it can confuse
users. What do you think?

I tend to agree with you. It works as expected after the next
checkpoint. So define the variable as "it can be changed any time
but has an effect at the next checkpoint time", then remove
XLOG_FPW_CHANGE record. And that eliminates the problem of
concurrent calls since the checkpointer becomes the only modifier
of the variable. And the problematic function
UpdateFullPageWrites also no longer needs to write a WAL
record. The information is conveyed only by checkpoint records.

I noticed that XLOG_FPW_CHANGE(fpw=false) is still required by
pg_start/stop_backup to know FPW's turning-off without waiting
for the next checkpoint record. But XLOG_FPW_CHANGE(true) is not
required since no one uses the information. It seems even harmful
when it is written at the incorrect place.

In the attached patch, shared fullPageWrites is updated only at
REDO point

I am not completely sure if that is the right option because this
would mean that we are defining the new scope for a GUC variable. I
guess we should take the input of others as well. I am not sure what
is the right way to do that, but maybe we can start a new thread with
a proper subject and description rather than discussing this under
some related bug fix patch discussion. I guess we can try that after
CF unless some other people pitch in and share their feedback.

I'd like to refrain from making a new thread since this topic is
registered as an open item (in Old Bugs section). Or making a new
thread then relinking it from the page is preferable?

I'm surprised a bit that this got confilcted so soon. On the way
rebasing, for anyone's information, I considered comment and
documentation fix but all comments and documentation can be read
as both old and new behavior. That being said, the patch contains
a small addtion about the new behavior.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Change-FPW-handling.patchtext/x-patch; charset=us-asciiDownload+34-94
#20Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kyotaro Horiguchi (#19)
Re: Problem while setting the fpw with SIGHUP

On 09/04/18 11:13, Kyotaro HORIGUCHI wrote:

At Fri, 6 Apr 2018 17:59:58 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1+1zULC52G_EyNcrrxFCmBi4NUuA1CoQAKu2FFPai_Teg@mail.gmail.com>

On Fri, Apr 6, 2018 at 1:50 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello.

At Wed, 04 Apr 2018 17:26:46 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20180404.172646.238325981.horiguchi.kyotaro@lab.ntt.co.jp>

In general, I was wondering why in the first place this variable
(full_page_writes) is a SIGHUP variable? I think if the user tries to
switch it to 'on' from 'off', it won't guarantee the recovery from
torn pages. Yeah, one can turn it to 'off' from 'on' without any
problem, but as the reverse doesn't guarantee anything, it can confuse
users. What do you think?

I tend to agree with you. It works as expected after the next
checkpoint. So define the variable as "it can be changed any time
but has an effect at the next checkpoint time", then remove
XLOG_FPW_CHANGE record. And that eliminates the problem of
concurrent calls since the checkpointer becomes the only modifier
of the variable. And the problematic function
UpdateFullPageWrites also no longer needs to write a WAL
record. The information is conveyed only by checkpoint records.

I noticed that XLOG_FPW_CHANGE(fpw=false) is still required by
pg_start/stop_backup to know FPW's turning-off without waiting
for the next checkpoint record. But XLOG_FPW_CHANGE(true) is not
required since no one uses the information. It seems even harmful
when it is written at the incorrect place.

In the attached patch, shared fullPageWrites is updated only at
REDO point

I am not completely sure if that is the right option because this
would mean that we are defining the new scope for a GUC variable. I
guess we should take the input of others as well. I am not sure what
is the right way to do that, but maybe we can start a new thread with
a proper subject and description rather than discussing this under
some related bug fix patch discussion. I guess we can try that after
CF unless some other people pitch in and share their feedback.

I think the new behavior where the GUC only takes effect at next
checkpoint is OK. It seems quite intuitive.

[rebased patch version]

Looks good at a quick glance. Assuming no objections from others, I'll
take a closer look and commit tomorrow. Thanks!

- Heikki

#21Michael Paquier
michael@paquier.xyz
In reply to: Heikki Linnakangas (#20)
#22Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#21)
#23Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#22)
#24Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#23)
#25Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#24)
#26Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#20)
#27Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#26)
#28Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#27)
#29Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Amit Kapila (#28)
#30Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#29)
#31Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#30)
#32Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#27)
#33Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#32)
#34Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#33)
#35Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#34)
#36Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#35)
#37Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#36)
#38Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#35)
#39Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Amit Kapila (#36)
#40Amit Kapila
amit.kapila16@gmail.com
In reply to: Kyotaro Horiguchi (#38)
#41Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#35)
#42Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#41)
#43Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#42)
#44Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#40)
#45Amit Kapila
amit.kapila16@gmail.com
In reply to: Kyotaro Horiguchi (#43)
#46Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#45)
#47Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#46)
#48Amit Kapila
amit.kapila16@gmail.com
In reply to: Kyotaro Horiguchi (#47)
#49Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Amit Kapila (#48)
#50Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#49)
#51Amit Kapila
amit.kapila16@gmail.com
In reply to: Kyotaro Horiguchi (#49)
#52Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#51)
#53Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#52)
#54Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#50)
#55Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#54)
#56Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#50)
#57Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#56)
#58Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#55)
#59Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#57)
#60Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Amit Kapila (#55)
#61Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#60)
#62Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#61)
#63Amit Kapila
amit.kapila16@gmail.com
In reply to: Kyotaro Horiguchi (#60)
#64Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#61)
#65Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#64)
#66Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#65)
#67Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#66)
#68Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#66)
#69Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#67)
#70Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#68)
#71Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#70)
#72Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#71)
#73Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#72)
#74Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#73)
#75Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#74)