CheckpointLock needed in CreateCheckPoint()?

Started by Amul Sulabout 5 years ago12 messages
#1Amul Sul
sulamul@gmail.com

Hi ALL,

Snip from CreateCheckPoint():
--
/*
* Acquire CheckpointLock to ensure only one checkpoint happens at a time.
* (This is just pro forma, since in the present system structure there is
* only one process that is allowed to issue checkpoints at any given
* time.)
*/
LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
--

As per this comment, it seems to be that we don't really need this LW lock. We
could have something else instead if we are afraid of having multiple
checkpoints at any given time which isn't possible, btw.

This CheckpointLock is acquired almost at the start of CreateCheckPoint() and
released at the end. The in-between code can take a while to be complete. All
that time interrupt will be on hold which doesn't seem to be a great idea but
that wasn't problematic until now. I am having trouble due to this interrupt
hold for a long since I was trying to add code changes where the checkpointer
process is suppose to respond to the system read-write state change request as
soon as possible[1]. That cannot be done if the interrupt is disabled
since read-write state change uses the global barrier mechanism to convey system
state changes to other running processes.

So my question is, can we completely get rid of the CheckpointLock need in
CreateCheckPoint()?

Thoughts/Comments/Suggestions?

Thank you!

Regards,
Amul

1]/messages/by-id/CA+TgmoYexwDQjdd1=15KMz+7VfHVx8VHNL2qjRRK92P=CSZDxg@mail.gmail.com

#2Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amul Sul (#1)
Re: CheckpointLock needed in CreateCheckPoint()?

On Thu, Jan 7, 2021 at 11:32 AM Amul Sul <sulamul@gmail.com> wrote:

Snip from CreateCheckPoint():
--
/*
* Acquire CheckpointLock to ensure only one checkpoint happens at a time.
* (This is just pro forma, since in the present system structure there is
* only one process that is allowed to issue checkpoints at any given
* time.)
*/
LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
--

As per this comment, it seems to be that we don't really need this LW lock. We
could have something else instead if we are afraid of having multiple
checkpoints at any given time which isn't possible, btw.

Looks like CheckpointLock() can also be called in standalone backends
(RequestCheckpoint->CreateCheckPoint) along with the checkpointer
process. Having said that, I'm not quite sure whether it can get
called at the same time from a backend(may be with checkpoint;
command) and the checkpointer process.

/*
* If in a standalone backend, just do it ourselves.
*/
if (!IsPostmasterEnvironment)
{
/*
* There's no point in doing slow checkpoints in a standalone backend,
* because there's no other backends the checkpoint could disrupt.
*/
CreateCheckPoint(flags | CHECKPOINT_IMMEDIATE);

See the below comment for IsPostmasterEnvironment:

/*
* IsPostmasterEnvironment is true in a postmaster process and any postmaster
* child process; it is false in a standalone process (bootstrap or
* standalone backend).

This CheckpointLock is acquired almost at the start of CreateCheckPoint() and
released at the end. The in-between code can take a while to be complete. All
that time interrupt will be on hold which doesn't seem to be a great idea but
that wasn't problematic until now. I am having trouble due to this interrupt
hold for a long since I was trying to add code changes where the checkpointer
process is suppose to respond to the system read-write state change request as
soon as possible[1]. That cannot be done if the interrupt is disabled
since read-write state change uses the global barrier mechanism to convey system
state changes to other running processes.

So my question is, can we completely get rid of the CheckpointLock need in
CreateCheckPoint()?

Thoughts/Comments/Suggestions?

I don't think we can completely get rid of CheckpointLock in
CreateCheckPoint given the fact that it can get called from multiple
processes.

How about serving that ASRO barrier request alone for checkpointer
process in ProcessInterrupts even though the CheckpointLock is held by
the checkpointer process? And also while the checkpointing is
happening, may be we should call CHECK_FOR_INTERRUPTS to see if the
ASRO barrier has arrived. This may sound bit hacky, but just a
thought. I'm thinking that in ProcessInterrupts we can get to know the
checkpointer process type via MyAuxProcType, and whether the lock is
held or not using CheckpointLock, and then we can handle the ASRO
global barrier request.

I may be missing something.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#3Dilip Kumar
dilipbalaut@gmail.com
In reply to: Bharath Rupireddy (#2)
Re: CheckpointLock needed in CreateCheckPoint()?

On Thu, Jan 7, 2021 at 12:45 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Jan 7, 2021 at 11:32 AM Amul Sul <sulamul@gmail.com> wrote:

Snip from CreateCheckPoint():
--
/*
* Acquire CheckpointLock to ensure only one checkpoint happens at a time.
* (This is just pro forma, since in the present system structure there is
* only one process that is allowed to issue checkpoints at any given
* time.)
*/
LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
--

As per this comment, it seems to be that we don't really need this LW lock. We
could have something else instead if we are afraid of having multiple
checkpoints at any given time which isn't possible, btw.

Looks like CheckpointLock() can also be called in standalone backends
(RequestCheckpoint->CreateCheckPoint) along with the checkpointer
process. Having said that, I'm not quite sure whether it can get
called at the same time from a backend(may be with checkpoint;
command) and the checkpointer process.

If it is a standalone backend then there will be no postmaster and no
other process i.e. no checkpoint process also.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#4Amul Sul
sulamul@gmail.com
In reply to: Bharath Rupireddy (#2)
Re: CheckpointLock needed in CreateCheckPoint()?

On Thu, Jan 7, 2021 at 12:45 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Jan 7, 2021 at 11:32 AM Amul Sul <sulamul@gmail.com> wrote:

Snip from CreateCheckPoint():
--
/*
* Acquire CheckpointLock to ensure only one checkpoint happens at a time.
* (This is just pro forma, since in the present system structure there is
* only one process that is allowed to issue checkpoints at any given
* time.)
*/
LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
--

As per this comment, it seems to be that we don't really need this LW lock. We
could have something else instead if we are afraid of having multiple
checkpoints at any given time which isn't possible, btw.

Looks like CheckpointLock() can also be called in standalone backends
(RequestCheckpoint->CreateCheckPoint) along with the checkpointer
process. Having said that, I'm not quite sure whether it can get
called at the same time from a backend(may be with checkpoint;
command) and the checkpointer process.

/*
* If in a standalone backend, just do it ourselves.
*/
if (!IsPostmasterEnvironment)
{
/*
* There's no point in doing slow checkpoints in a standalone backend,
* because there's no other backends the checkpoint could disrupt.
*/
CreateCheckPoint(flags | CHECKPOINT_IMMEDIATE);

See the below comment for IsPostmasterEnvironment:

/*
* IsPostmasterEnvironment is true in a postmaster process and any postmaster
* child process; it is false in a standalone process (bootstrap or
* standalone backend).

I am not sure I understood your point completely. Do you mean there could be
multiple bootstrap or standalone backends that could exist at a time and can
perform checkpoint?

This CheckpointLock is acquired almost at the start of CreateCheckPoint() and
released at the end. The in-between code can take a while to be complete. All
that time interrupt will be on hold which doesn't seem to be a great idea but
that wasn't problematic until now. I am having trouble due to this interrupt
hold for a long since I was trying to add code changes where the checkpointer
process is suppose to respond to the system read-write state change request as
soon as possible[1]. That cannot be done if the interrupt is disabled
since read-write state change uses the global barrier mechanism to convey system
state changes to other running processes.

So my question is, can we completely get rid of the CheckpointLock need in
CreateCheckPoint()?

Thoughts/Comments/Suggestions?

I don't think we can completely get rid of CheckpointLock in
CreateCheckPoint given the fact that it can get called from multiple
processes.

How?

How about serving that ASRO barrier request alone for checkpointer
process in ProcessInterrupts even though the CheckpointLock is held by
the checkpointer process? And also while the checkpointing is
happening, may be we should call CHECK_FOR_INTERRUPTS to see if the
ASRO barrier has arrived. This may sound bit hacky, but just a
thought. I'm thinking that in ProcessInterrupts we can get to know the
checkpointer process type via MyAuxProcType, and whether the lock is
held or not using CheckpointLock, and then we can handle the ASRO
global barrier request.

I am afraid that this will easily get rejected by the community. Note that this
is a very crucial code path of the database server. There are other options
too, but don't want to propose them until clarity on the CheckpointLock
necessity.

Regards,
Amul

#5Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Amul Sul (#4)
Re: CheckpointLock needed in CreateCheckPoint()?

On Thu, Jan 7, 2021 at 1:22 PM Amul Sul <sulamul@gmail.com> wrote:

I am not sure I understood your point completely. Do you mean there could be
multiple bootstrap or standalone backends that could exist at a time and can
perform checkpoint?

Actually, my understanding of the standalone backend was wrong
initially. Now, I'm clear with that.

I don't think we can completely get rid of CheckpointLock in
CreateCheckPoint given the fact that it can get called from multiple
processes.

How?

When the server is run in a standalone backend, it seems like there
can be only one process running in single user mode, as pointed by
Dilip upthread, so there will be no simultaneous calls to
CreateCheckPoint.

How about serving that ASRO barrier request alone for checkpointer
process in ProcessInterrupts even though the CheckpointLock is held by
the checkpointer process? And also while the checkpointing is
happening, may be we should call CHECK_FOR_INTERRUPTS to see if the
ASRO barrier has arrived. This may sound bit hacky, but just a
thought. I'm thinking that in ProcessInterrupts we can get to know the
checkpointer process type via MyAuxProcType, and whether the lock is
held or not using CheckpointLock, and then we can handle the ASRO
global barrier request.

I am afraid that this will easily get rejected by the community. Note that this
is a very crucial code path of the database server. There are other options
too, but don't want to propose them until clarity on the CheckpointLock
necessity.

I understand that.

I'm sorry, if I have sidetracked the main point here.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#6Robert Haas
robertmhaas@gmail.com
In reply to: Amul Sul (#1)
Re: CheckpointLock needed in CreateCheckPoint()?

On Thu, Jan 7, 2021 at 1:02 AM Amul Sul <sulamul@gmail.com> wrote:

As per this comment, it seems to be that we don't really need this LW lock. We
could have something else instead if we are afraid of having multiple
checkpoints at any given time which isn't possible, btw.

Yeah, I think this lock is useless. In fact, I think it's harmful,
because it makes large sections of the checkpointer code, basically
all of it really, run with interrupts held off for no reason.

It's not impossible that removing the lock could reveal bugs
elsewhere: suppose we have segments of code that actually aren't safe
to interrupt, but because of this LWLock, it never happens. But, if
that happens to be true, I think we should just view those cases as
bugs to be fixed.

One thing that blunts the impact of this quite a bit is that the
checkpointer doesn't use rely much on CHECK_FOR_INTERRUPTS() in the
first place. It has its own machinery: HandleCheckpointerInterrupts().
Possibly that's something we should be looking to refactor somehow as
well.

--
Robert Haas
EDB: http://www.enterprisedb.com

#7Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#6)
1 attachment(s)
Re: CheckpointLock needed in CreateCheckPoint()?

On Thu, Jan 14, 2021 at 10:21 AM Robert Haas <robertmhaas@gmail.com> wrote:

Yeah, I think this lock is useless. In fact, I think it's harmful,
because it makes large sections of the checkpointer code, basically
all of it really, run with interrupts held off for no reason.

It's not impossible that removing the lock could reveal bugs
elsewhere: suppose we have segments of code that actually aren't safe
to interrupt, but because of this LWLock, it never happens. But, if
that happens to be true, I think we should just view those cases as
bugs to be fixed.

One thing that blunts the impact of this quite a bit is that the
checkpointer doesn't use rely much on CHECK_FOR_INTERRUPTS() in the
first place. It has its own machinery: HandleCheckpointerInterrupts().
Possibly that's something we should be looking to refactor somehow as
well.

Here's a patch to remove CheckpointLock completely. For
ProcessInterrupts() to do anything, one of the following things would
have to be true:

1. ProcDiePending = true. Normally this is set by die(), but the
checkpointer does not use die(). Perhaps someday it should, or maybe
not, but that's an issue for another day.

2. ClientConnectionLost = true. This gets set in pqcomm.c's
internal_flush(), but the checkpointer has no client connection.

3. DoingCommandRead = true. Can't happen; only set in PostgresMain().

4. QueryCancelPending = true. Can be set by recovery conflicts, but
those shouldn't happen in the checkpointer. Normally set by
StatementCancelHandler, which the checkpointer doesn't use.

5. IdleInTransactionSessionTimeoutPending = true. Set up by
InitPostgres(), which is not used by auxiliary processes.

6. ProcSignalBarrierPending = true. This is the case we actually care
about allowing for the ASRO patch, so if this occurs, it's good.

7. ParallelMessagePending = true. The checkpointer definitely never
starts parallel workers.

So I don't see any problem with just ripping this out entirely, but
I'd like to know if anybody else does.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

v1-0001-Remove-CheckpointLock.patchapplication/octet-stream; name=v1-0001-Remove-CheckpointLock.patchDownload
From b3d1478d1ac0c0df6c36b0cd097d23b9b77d4243 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 18 Jan 2021 13:14:39 -0500
Subject: [PATCH v1] Remove CheckpointLock.

Up until now, we've held this lock when performing a checkpoint or
restartpoint, but this isn't actually necessary, because only one
process can be performing a checkpoint at a time: either the
checkpointer, during normal operation, or the postmaster, during
single-user operation. So, we don't need the lock.

One possible concern in making this change is that it means that
a substantial amount of code where HOLD_INTERRUPTS() was previously
in effect due to the preceding LWLockAcquire() would now be
running without that. This could mean that ProcessInterrupts()
gets called in places from which it didn't before. However, this
seems unlikely to do very much, because the checkpointer doesn't
have any signal mapped to die(), so it's not clear how,
for example, ProcDiePending = true could happen in the first
place. Similarly with ClientConnectionLost and recovery conflicts.
---
 doc/src/sgml/monitoring.sgml             |  4 ----
 src/backend/access/heap/rewriteheap.c    |  4 ++--
 src/backend/access/transam/xlog.c        | 28 +-----------------------
 src/backend/replication/logical/origin.c |  4 ++--
 src/backend/storage/lmgr/lwlocknames.txt |  2 +-
 5 files changed, 6 insertions(+), 36 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index f05140dd42..9496f76b1f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1880,10 +1880,6 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       <entry>Waiting to associate a data block with a buffer in the buffer
        pool.</entry>
      </row>
-     <row>
-      <entry><literal>Checkpoint</literal></entry>
-      <entry>Waiting to begin a checkpoint.</entry>
-     </row>
      <row>
       <entry><literal>CheckpointerComm</literal></entry>
       <entry>Waiting to manage fsync requests.</entry>
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 29ffe40670..fcaad9ba0b 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -1256,8 +1256,8 @@ CheckPointLogicalRewriteHeap(void)
 
 			/*
 			 * The file cannot vanish due to concurrency since this function
-			 * is the only one removing logical mappings and it's run while
-			 * CheckpointLock is held exclusively.
+			 * is the only one removing logical mappings and only one
+			 * checkpoint can be in progress at a time.
 			 */
 			if (fd < 0)
 				ereport(ERROR,
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 470e113b33..cc007b8963 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -430,10 +430,6 @@ static XLogRecPtr RedoStartLSN = InvalidXLogRecPtr;
  * ControlFileLock: must be held to read/update control file or create
  * new log file.
  *
- * CheckpointLock: must be held to do a checkpoint or restartpoint (ensures
- * only one checkpointer at a time; currently, with all checkpoints done by
- * the checkpointer, this is just pro forma).
- *
  *----------
  */
 
@@ -8864,14 +8860,6 @@ CreateCheckPoint(int flags)
 	 */
 	InitXLogInsert();
 
-	/*
-	 * Acquire CheckpointLock to ensure only one checkpoint happens at a time.
-	 * (This is just pro forma, since in the present system structure there is
-	 * only one process that is allowed to issue checkpoints at any given
-	 * time.)
-	 */
-	LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
-
 	/*
 	 * Prepare to accumulate statistics.
 	 *
@@ -8941,7 +8929,6 @@ CreateCheckPoint(int flags)
 		if (last_important_lsn == ControlFile->checkPoint)
 		{
 			WALInsertLockRelease();
-			LWLockRelease(CheckpointLock);
 			END_CRIT_SECTION();
 			ereport(DEBUG1,
 					(errmsg("checkpoint skipped because system is idle")));
@@ -9241,15 +9228,12 @@ CreateCheckPoint(int flags)
 									 CheckpointStats.ckpt_segs_added,
 									 CheckpointStats.ckpt_segs_removed,
 									 CheckpointStats.ckpt_segs_recycled);
-
-	LWLockRelease(CheckpointLock);
 }
 
 /*
  * Mark the end of recovery in WAL though without running a full checkpoint.
  * We can expect that a restartpoint is likely to be in progress as we
- * do this, though we are unwilling to wait for it to complete. So be
- * careful to avoid taking the CheckpointLock anywhere here.
+ * do this, though we are unwilling to wait for it to complete.
  *
  * CreateRestartPoint() allows for the case where recovery may end before
  * the restartpoint completes so there is no concern of concurrent behaviour.
@@ -9399,12 +9383,6 @@ CreateRestartPoint(int flags)
 	XLogSegNo	_logSegNo;
 	TimestampTz xtime;
 
-	/*
-	 * Acquire CheckpointLock to ensure only one restartpoint or checkpoint
-	 * happens at a time.
-	 */
-	LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
-
 	/* Get a local copy of the last safe checkpoint record. */
 	SpinLockAcquire(&XLogCtl->info_lck);
 	lastCheckPointRecPtr = XLogCtl->lastCheckPointRecPtr;
@@ -9420,7 +9398,6 @@ CreateRestartPoint(int flags)
 	{
 		ereport(DEBUG2,
 				(errmsg("skipping restartpoint, recovery has already ended")));
-		LWLockRelease(CheckpointLock);
 		return false;
 	}
 
@@ -9455,7 +9432,6 @@ CreateRestartPoint(int flags)
 			UpdateControlFile();
 			LWLockRelease(ControlFileLock);
 		}
-		LWLockRelease(CheckpointLock);
 		return false;
 	}
 
@@ -9621,8 +9597,6 @@ CreateRestartPoint(int flags)
 			 xtime ? errdetail("Last completed transaction was at log time %s.",
 							   timestamptz_to_str(xtime)) : 0));
 
-	LWLockRelease(CheckpointLock);
-
 	/*
 	 * Finally, execute archive_cleanup_command, if any.
 	 */
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 77781d059d..9bd761a426 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -559,8 +559,8 @@ CheckPointReplicationOrigin(void)
 						tmppath)));
 
 	/*
-	 * no other backend can perform this at the same time, we're protected by
-	 * CheckpointLock.
+	 * no other backend can perform this at the same time; only one
+	 * checkpoint can happen at a time.
 	 */
 	tmpfd = OpenTransientFile(tmppath,
 							  O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..6c7cf6c295 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -15,7 +15,7 @@ SInvalWriteLock						6
 WALBufMappingLock					7
 WALWriteLock						8
 ControlFileLock						9
-CheckpointLock						10
+# 10 was CheckpointLock
 XactSLRULock						11
 SubtransSLRULock					12
 MultiXactGenLock					13
-- 
2.24.3 (Apple Git-128)

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#7)
Re: CheckpointLock needed in CreateCheckPoint()?

Robert Haas <robertmhaas@gmail.com> writes:

Here's a patch to remove CheckpointLock completely. ...
So I don't see any problem with just ripping this out entirely, but
I'd like to know if anybody else does.

If memory serves, the reason for the lock was that the CHECKPOINT
command used to run the checkpointing code directly in the calling
backend, so we needed it to keep more than one process from doing
that at once. AFAICS, it's no longer possible for more than one
process to try to run that code concurrently, so we shouldn't need
the lock anymore.

regards, tom lane

#9Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#8)
Re: CheckpointLock needed in CreateCheckPoint()?

On Mon, Jan 18, 2021 at 3:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

If memory serves, the reason for the lock was that the CHECKPOINT
command used to run the checkpointing code directly in the calling
backend, so we needed it to keep more than one process from doing
that at once. AFAICS, it's no longer possible for more than one
process to try to run that code concurrently, so we shouldn't need
the lock anymore.

Interesting. I think that must have been a *very* long time ago.
Perhaps 076a055acf3c55314de267c62b03191586d79cf6 from 2004?

--
Robert Haas
EDB: http://www.enterprisedb.com

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#9)
Re: CheckpointLock needed in CreateCheckPoint()?

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Jan 18, 2021 at 3:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

If memory serves, the reason for the lock was that the CHECKPOINT
command used to run the checkpointing code directly in the calling
backend, so we needed it to keep more than one process from doing
that at once. AFAICS, it's no longer possible for more than one
process to try to run that code concurrently, so we shouldn't need
the lock anymore.

Interesting. I think that must have been a *very* long time ago.

Yeah. Digging further, it looks like I oversimplified things above:
we once launched special background-worker-like processes for checkpoints,
and there could be more than one at a time. That seems to have changed
here:

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master Release: REL8_0_BR [076a055ac] 2004-05-29 22:48:23 +0000

Separate out bgwriter code into a logically separate module, rather
than being random pieces of other files. Give bgwriter responsibility
for all checkpoint activity (other than a post-recovery checkpoint);
so this child process absorbs the functionality of the former transient
checkpoint and shutdown subprocesses.

At the time I thought the CheckpointLock had become totally pro forma,
but there are later references to having to prevent a checkpoint from
running concurrently with a restartpoint, which was originally done
within the startup/WAL-replay process. It looks like that changed here:

Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Branch: master Release: REL8_4_BR [7e48b77b1] 2009-06-25 21:36:00 +0000

Fix some serious bugs in archive recovery, now that bgwriter is active
during it:

When bgwriter is active, the startup process can't perform mdsync() correctly
because it won't see the fsync requests accumulated in bgwriter's private
pendingOpsTable. Therefore make bgwriter responsible for the end-of-recovery
checkpoint as well, when it's active.

(The modern checkpointer process wasn't split out of bgwriter until
806a2aee3 of 2011-11-01.)

regards, tom lane

#11Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#7)
Re: CheckpointLock needed in CreateCheckPoint()?

On Mon, Jan 18, 2021 at 02:36:48PM -0500, Robert Haas wrote:

Here's a patch to remove CheckpointLock completely. For
ProcessInterrupts() to do anything, one of the following things would
have to be true:

[...]

So I don't see any problem with just ripping this out entirely, but
I'd like to know if anybody else does.

Agreed, +1.
--
Michael

#12Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#10)
Re: CheckpointLock needed in CreateCheckPoint()?

On Mon, Jan 18, 2021 at 5:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah. Digging further, it looks like I oversimplified things above:
we once launched special background-worker-like processes for checkpoints,
and there could be more than one at a time.

Thanks. I updated the commit message to mention some of the relevant
commits, and then committed this to master.

--
Robert Haas
EDB: http://www.enterprisedb.com