Checkpoint start logging is done inside critical section
The LogCheckpointStart() call inside CreateCheckPoint() is done while
inside a critical section. The elog call could trigger errors due to
memory allocations or from a logging hook, resulting in a panic. It
seems better to postpone the logging until after the critical section
is done. It's only a few lwlock acquisitions away and shouldn't make
any material difference. Patch to do so is attached.
Regards,
Ants Aasma
Attachments:
log-checkpoint-after-critical-section.patchtext/x-patch; charset=US-ASCII; name=log-checkpoint-after-critical-section.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7375a78ffc..faa9690e48 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8907,15 +8907,6 @@ CreateCheckPoint(int flags)
XLogCtl->RedoRecPtr = checkPoint.redo;
SpinLockRelease(&XLogCtl->info_lck);
- /*
- * If enabled, log checkpoint start. We postpone this until now so as not
- * to log anything if we decided to skip the checkpoint.
- */
- if (log_checkpoints)
- LogCheckpointStart(flags, false);
-
- TRACE_POSTGRESQL_CHECKPOINT_START(flags);
-
/*
* Get the other info we need for the checkpoint record.
*
@@ -8962,6 +8953,15 @@ CreateCheckPoint(int flags)
*/
END_CRIT_SECTION();
+ /*
+ * If enabled, log checkpoint start. We postpone this until now so as not
+ * to log anything if we decided to skip the checkpoint.
+ */
+ if (log_checkpoints)
+ LogCheckpointStart(flags, false);
+
+ TRACE_POSTGRESQL_CHECKPOINT_START(flags);
+
/*
* In some cases there are groups of actions that must all occur on one
* side or the other of a checkpoint record. Before flushing the
On Thu, Oct 18, 2018 at 4:44 AM Ants Aasma <ants.aasma@eesti.ee> wrote:
The LogCheckpointStart() call inside CreateCheckPoint() is done while
inside a critical section. The elog call could trigger errors due to
memory allocations or from a logging hook, resulting in a panic.
Yeah, but we use logging hook inside LWLockAcquire as well which is
also called inside critical section in the nearby code, not sure if we
can do anything about it.
It
seems better to postpone the logging until after the critical section
is done. It's only a few lwlock acquisitions away and shouldn't make
any material difference. Patch to do so is attached.
+1. I think we should backpatch this as well. I can take it forward
unless people have objections to it.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Hi,
On 2018-10-18 10:21:39 +0530, Amit Kapila wrote:
On Thu, Oct 18, 2018 at 4:44 AM Ants Aasma <ants.aasma@eesti.ee> wrote:
The LogCheckpointStart() call inside CreateCheckPoint() is done while
inside a critical section. The elog call could trigger errors due to
memory allocations or from a logging hook, resulting in a panic.Yeah, but we use logging hook inside LWLockAcquire as well which is
also called inside critical section in the nearby code, not sure if we
can do anything about it.
It
seems better to postpone the logging until after the critical section
is done. It's only a few lwlock acquisitions away and shouldn't make
any material difference. Patch to do so is attached.+1. I think we should backpatch this as well. I can take it forward
unless people have objections to it.
We do a fair bit of allocations inside a critical section during a
checkpoint (that's why we mark the ctx as being ok with that). I've a
very hard time seing this small allocation be a problem. I definitely
don't think there's a need to backpatch that.
Greetings,
Andres Freund
On Thu, Oct 18, 2018 at 10:27 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-10-18 10:21:39 +0530, Amit Kapila wrote:
On Thu, Oct 18, 2018 at 4:44 AM Ants Aasma <ants.aasma@eesti.ee> wrote:
The LogCheckpointStart() call inside CreateCheckPoint() is done while
inside a critical section. The elog call could trigger errors due to
memory allocations or from a logging hook, resulting in a panic.Yeah, but we use logging hook inside LWLockAcquire as well which is
also called inside critical section in the nearby code, not sure if we
can do anything about it.It
seems better to postpone the logging until after the critical section
is done. It's only a few lwlock acquisitions away and shouldn't make
any material difference. Patch to do so is attached.+1. I think we should backpatch this as well. I can take it forward
unless people have objections to it.We do a fair bit of allocations inside a critical section during a
checkpoint
The other possibility of allocation is during
LocalSetXLogInsertAllowed(), but we make provision for that before
starting critical section. Which other cases do you see where
allocation is possible? I think in general, we take care that we
don't allocate inside a critical section, otherwise, it will lead to
at the very least assertion failure unless we have marked the memory
context as "allowed in crit section".
(that's why we mark the ctx as being ok with that).
Yeah, as the palloc for log message would be called in an ErrorContext
where it is safe to do the allocation, so ideally this shouldn't be a
problem. So, it seems to me that this is not a problem, Ants, do you
see any problem in any particular scenario or was this based on
theoretical analysis?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 18, 2018 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Oct 18, 2018 at 10:27 AM Andres Freund <andres@anarazel.de> wrote:
(that's why we mark the ctx as being ok with that).
Yeah, as the palloc for log message would be called in an ErrorContext
where it is safe to do the allocation, so ideally this shouldn't be a
problem. So, it seems to me that this is not a problem, Ants, do you
see any problem in any particular scenario or was this based on
theoretical analysis?
This was purely theoretical, as also evidenced by lack of complaints
even though the code has been like that for a very long time. I was
actually mostly worried about extension code run by logging hook
causing the panic.
Regards,
Ants Aasma