May be BUG. Periodic burst growth of the checkpoint_req counter on replica.
Hello!
Found a periodic spike growth of the checkpoint_req counter on replica by 20-30 units
after large insert (~350Mb) on master.
Reproduction on master and replica with default conf:
1) execute the command "insert into test values (generate_series(1,1E7));".
This leads to the table's growth by about 350Mb during about 15 secs (on my pc).
2)The wal records start coming to the replica, and when their number exceeds a certain limit, a request is emitted to the checkpointer process to create restartpoint on the replica and checkpoint_req is incremented. With default settings, this limit is 42 segments.
3) Restartpoint creation fails because a new restartpoint can only be created if the replica has received new WAL records about the checkpoint from the moment of the previous restartpoint. But there were no such records.
4) When the next WAL segment is received by replica, the next request is generated to create a restartpoint on the replica, and so on.
5) Finally, a WAL record about the checkpoint arrives on the replica, restartpoint is created and the growth of checkpoint_req stops.
The described process can be observed in the log with additional debugging. See insert_1E7_once.log attached. This
log is for v13 but master has the same behavior.
Can we treat such behavior as a bug?
If so it seems possible to check if a creating of restartpoint is obviously impossible before sending request and don't send it at all if so.
The patch applied tries to fix it.
With best regards.
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
insert_1E7_once.logtext/x-log; charset=UTF-8; name=insert_1E7_once.logDownload
022-09-04 21:09:45.160 MSK [2424110] LOG: Start CFS version 0.54 supported compression algorithms pglz,zlib encryption disabled GC enabled
2022-09-04 21:09:45.168 MSK [2424110] LOG: starting PostgreSQL 13.6 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
2022-09-04 21:09:45.168 MSK [2424110] LOG: listening on IPv4 address "0.0.0.0", port 54131
2022-09-04 21:09:45.168 MSK [2424110] LOG: listening on IPv6 address "::", port 54131
2022-09-04 21:09:45.177 MSK [2424110] LOG: listening on Unix socket "/tmp/.s.PGSQL.54131"
2022-09-04 21:09:45.187 MSK [2424150] LOG: database system was interrupted; last known up at 2022-09-04 21:09:44 MSK
2022-09-04 21:09:45.282 MSK [2424150] LOG: entering standby mode
2022-09-04 21:09:45.292 MSK [2424150] LOG: redo starts at 0/2000028
2022-09-04 21:09:45.296 MSK [2424150] LOG: consistent recovery state reached at 0/2000198
2022-09-04 21:09:45.296 MSK [2424110] LOG: database system is ready to accept read only connections
2022-09-04 21:09:45.307 MSK [2424233] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
2022-09-04 21:09:45.341 MSK [2424259] LOG: Start 1 background garbage collection workers for CFS
2022-09-04 21:10:26.653 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 43, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:10:26.653 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 0, done: 0, failed: 0.
Now: 41. Last_chkpt_time: 0. Elapsed secs: 41. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:10:26.653 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:10:26.669 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:10:28.926 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 44, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:10:28.926 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 1, done: 1, failed: 0.
Now: 43. Last_chkpt_time: -244. Elapsed secs: 287. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:10:28.926 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:10:29.176 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:10:30.014 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 45, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:10:30.014 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 2, done: 2, failed: 0.
Now: 45. Last_chkpt_time: -242. Elapsed secs: 287. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:10:30.014 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:10:30.058 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:10:45.072 MSK [2424224] LOG: @@@@@@@@@@@@Checkpoint requested by time! Now: 60. Last_chkpt_time: -240. Elapsed secs: 300. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:10:45.072 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:10:45.077 MSK [2424224] LOG: !!!Checkpoint NOT performed!!!
2022-09-04 21:11:00.092 MSK [2424224] LOG: @@@@@@@@@@@@Checkpoint requested by time! Now: 75. Last_chkpt_time: -225. Elapsed secs: 300. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:00.092 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:00.092 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:15.104 MSK [2424224] LOG: @@@@@@@@@@@@Checkpoint requested by time! Now: 90. Last_chkpt_time: -210. Elapsed secs: 300. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:15.104 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:15.113 MSK [2424224] LOG: !!!Checkpoint NOT performed!!!
2022-09-04 21:11:16.187 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 46, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:16.187 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 6, done: 6, failed: 0.
Now: 91. Last_chkpt_time: -195. Elapsed secs: 286. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:16.188 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:16.192 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:17.455 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 47, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:17.455 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 7, done: 7, failed: 0.
Now: 92. Last_chkpt_time: -194. Elapsed secs: 286. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:17.455 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:17.459 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:18.672 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 48, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:18.672 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 8, done: 8, failed: 0.
Now: 93. Last_chkpt_time: -193. Elapsed secs: 286. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:18.672 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:18.676 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:20.049 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 49, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:20.049 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 9, done: 9, failed: 0.
Now: 95. Last_chkpt_time: -192. Elapsed secs: 287. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:20.049 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:20.054 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:21.405 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 50, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:21.405 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 10, done: 10, failed: 0.
Now: 96. Last_chkpt_time: -190. Elapsed secs: 286. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:21.405 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:21.411 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:22.939 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 51, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:22.939 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 11, done: 11, failed: 0.
Now: 97. Last_chkpt_time: -189. Elapsed secs: 286. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:22.939 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:22.944 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:24.395 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 52, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:24.395 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 12, done: 12, failed: 0.
Now: 99. Last_chkpt_time: -188. Elapsed secs: 287. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:24.395 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:24.400 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:25.860 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 53, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:25.861 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 13, done: 13, failed: 0.
Now: 100. Last_chkpt_time: -186. Elapsed secs: 286. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:25.861 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:25.865 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:27.336 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 54, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:27.336 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 14, done: 14, failed: 0.
Now: 102. Last_chkpt_time: -185. Elapsed secs: 287. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:27.336 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:27.343 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:28.797 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 55, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:28.798 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 15, done: 15, failed: 0.
Now: 103. Last_chkpt_time: -183. Elapsed secs: 286. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:28.798 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:28.802 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:31.371 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 56, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:31.372 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 16, done: 16, failed: 0.
Now: 106. Last_chkpt_time: -182. Elapsed secs: 288. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:31.372 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:31.378 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:32.836 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 57, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:32.836 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 17, done: 17, failed: 0.
Now: 107. Last_chkpt_time: -179. Elapsed secs: 286. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:32.836 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:32.841 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:34.323 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 58, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:34.324 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 18, done: 18, failed: 0.
Now: 109. Last_chkpt_time: -178. Elapsed secs: 287. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:34.324 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:34.328 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:35.754 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 59, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:35.754 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 19, done: 19, failed: 0.
Now: 110. Last_chkpt_time: -176. Elapsed secs: 286. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:35.754 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:35.761 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:36.737 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 60, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:36.737 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 20, done: 20, failed: 0.
Now: 111. Last_chkpt_time: -175. Elapsed secs: 286. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:36.737 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:36.741 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:37.750 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 61, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:37.750 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 21, done: 21, failed: 0.
Now: 112. Last_chkpt_time: -174. Elapsed secs: 286. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:37.750 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:37.756 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:38.799 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 62, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:38.799 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 22, done: 22, failed: 0.
Now: 113. Last_chkpt_time: -173. Elapsed secs: 286. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:38.799 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:38.807 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:39.749 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 63, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:39.749 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 23, done: 23, failed: 0.
Now: 114. Last_chkpt_time: -172. Elapsed secs: 286. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:39.749 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:39.755 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:40.839 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 64, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:40.839 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 24, done: 24, failed: 0.
Now: 115. Last_chkpt_time: -171. Elapsed secs: 286. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:40.839 MSK [2424224] LOG: !!!This is restartpoint!!!
2022-09-04 21:11:40.844 MSK [2424224] LOG: !!!Restartpoint NOT performed!!!
2022-09-04 21:11:42.205 MSK [2424150] LOG: &&&&&&&&&&&&&&&&!!!Request a restartpoint if we've replayed too much xlog!!!
readSegNo: 65, old_segno: 2, CheckPointSegments: 42
2022-09-04 21:11:42.205 MSK [2424224] LOG: Checkpoint requested by wal. Checkpoints already started: 25, done: 25, failed: 0.
Now: 117. Last_chkpt_time: -170. Elapsed secs: 287. CheckPointTimeout: 300. CheckPointWarning: 30
2022-09-04 21:11:42.205 MSK [2424224] LOG: !!!This is restartpoint!!!
v1-0001-Fix-burst-checkpoint_req-growth.patchtext/x-patch; charset=UTF-8; name=v1-0001-Fix-burst-checkpoint_req-growth.patchDownload
commit 62b447282d7436642984005627f966f93f4a2439
Author: Anton A. Melnikov <a.melnikov@postgrespro.ru>
Date: Tue Sep 6 12:18:56 2022 +0300
Remove burst growth of the checkpoint_req on replica.
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7a710e6490..0c7510bca5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8880,3 +8880,24 @@ SetWalWriterSleeping(bool sleeping)
XLogCtl->WalWriterSleeping = sleeping;
SpinLockRelease(&XLogCtl->info_lck);
}
+
+/*
+ * Update the WalWriterSleeping flag.
+ */
+bool IsNewCheckpointWALRecs(void)
+{
+ bool result = false;
+ /*
+ * Get the last safe checkpoint record and check if
+ * there is a new checkpoint WAL records since the
+ * last restartpoint.
+ */
+ SpinLockAcquire(&XLogCtl->info_lck);
+ if (!XLogRecPtrIsInvalid(XLogCtl->lastCheckPointRecPtr)
+ && XLogCtl->lastCheckPoint.redo >
+ ControlFile->checkPointCopy.redo)
+ result = true;
+ SpinLockRelease(&XLogCtl->info_lck);
+
+ return result;
+}
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index ae2af5ae3d..2502e4f726 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3194,7 +3194,15 @@ XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
{
(void) GetRedoRecPtr();
if (XLogCheckpointNeeded(readSegNo))
- RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
+ {
+ /*
+ * If there is no new checkpoint WAL records since the
+ * last restartpoint the creation of new one
+ * will certainly fail, so skip it.
+ */
+ if (IsNewCheckpointWALRecs())
+ RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
+ }
}
}
diff --git a/src/include/access/xlogrecovery.h b/src/include/access/xlogrecovery.h
index 0aa85d90e8..d1431ab9b0 100644
--- a/src/include/access/xlogrecovery.h
+++ b/src/include/access/xlogrecovery.h
@@ -82,6 +82,7 @@ extern void XLogRecoveryShmemInit(void);
extern void InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdownPtr, bool *haveBackupLabel, bool *haveTblspcMap);
extern void PerformWalRecovery(void);
+extern bool IsNewCheckpointWALRecs(void);
/*
* FinishWalRecovery() returns this. It contains information about the point
At Tue, 6 Sep 2022 14:02:53 +0300, "Anton A. Melnikov" <aamelnikov@inbox.ru> wrote in
Can we treat such behavior as a bug?
Depends on how we see the counter value. I think this can be annoying
but not a bug. CreateRestartPoint already handles that case.
While standby is well catching up, startup may make requests once per
segment switch while primary is running the latest checkpoint since
standby won't receive a checkpoint record until the primary ends the
last checkpoint.
If so it seems possible to check if a creating of restartpoint is
obviously impossible before sending request and don't send it at all
if so.The patch applied tries to fix it.
It lets XLogPageRead run the same check with what CreateRestartPoint
does, so it basically works (it is forgetting a lock, though. The
reason for omitting the lock in CreateRestartPoint is that it knows
checkopinter is the only updator of the shared variable.). I'm not
sure I like that for the code duplication.
I'm not sure we need to fix that but if we do that, I would impletent
IsNewCheckPointWALRecs() using XLogCtl->RedoRecPtr and
XLogCtl->lastCheckPoint.redo instead since they are protected by the
same lock, and they work more correct way, that is, that can avoid
restartpoint requests while the last checkpoint is running. And I
would rename it as RestartPointAvailable() or something like that.
Or I might want to add XLogRestartpointNeeded(readSegNo) to reduce the
required number of info_lck by reading XLogCtl members at once.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello!
Thank you very much for your feedback and essential remarks.
On 07.09.2022 10:39, Kyotaro Horiguchi wrote:
It lets XLogPageRead run the same check with what CreateRestartPoint
does, so it basically works (it is forgetting a lock, though. The
reason for omitting the lock in CreateRestartPoint is that it knows
checkopinter is the only updator of the shared variable.). I'm not
sure I like that for the code duplication.I'm not sure we need to fix that but if we do that, I would impletent
IsNewCheckPointWALRecs() using XLogCtl->RedoRecPtr and
XLogCtl->lastCheckPoint.redo instead since they are protected by the
same lock, and they work more correct way, that is, that can avoid
restartpoint requests while the last checkpoint is running. And I
would rename it as RestartPointAvailable() or something like that.
Corrected patch is attached (v2-0001-Fix-burst-checkpoint_req-growth.patch).
The access to Controlfile was removed so lwlock seems to be not needed.
Some logic duplication is still present and and i'm not quite sure if
it's possible to get rid of it. Would be glad to any suggestions.
Or I might want to add XLogRestartpointNeeded(readSegNo) to reduce the
required number of info_lck by reading XLogCtl members at once.
If we place this check into the XLogCheckpointNeeded() this will lead to a double
take of info_lck in XLogPageRead() when the restartpoint request is forming.
As it's done now taking of info_lck will be more rarely.
It seems i probably didn't understand your idea, please clarify it for me.
Depends on how we see the counter value. I think this can be annoying
but not a bug. CreateRestartPoint already handles that case.
Yes! It is in fact annoying as docs says that checkpoint_req counts
"the number of requested checkpoints that have been performed".
But really checkpoints_req counts both the number of checkpoints requests
and restartpoint ones which may not be performed and resources are not spent.
The second frightening factor is the several times faster growth
of the checkpoints_timed counter on the replica vs primary due to scheduling
replays in 15 second if an attempt to create the restartpoint failed.
Here is a patch that leaves all logic as is, but adds a stats about
restartpoints. (v1-0001-Add-restartpoint-stats.patch)
.
For instance, for the same period on primary with this patch:
# SELECT CURRENT_TIME; select * from pg_stat_bgwriter \gx
current_time
--------------------
00:19:15.794561+03
(1 row)
-[ RECORD 1 ]---------+-----------------------------
checkpoints_timed | 4
checkpoints_req | 10
restartpoints_timed | 0
restartpoints_req | 0
restartpoints_done | 0
On replica:
# SELECT CURRENT_TIME; select * from pg_stat_bgwriter \gx
current_time
--------------------
00:19:11.363009+03
(1 row)
-[ RECORD 1 ]---------+------------------------------
checkpoints_timed | 0
checkpoints_req | 0
restartpoints_timed | 42
restartpoints_req | 67
restartpoints_done | 10
Only the counters checkpoints_timed, checkpoints_req and restartpoints_done give
the indication of resource-intensive operations.
Without this patch, the user would see on the replica something like this:
checkpoints_timed | 42
checkpoints_req | 109
With best regards,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v1-0001-Add-restartpoint-stats.patchtext/x-patch; charset=UTF-8; name=v1-0001-Add-restartpoint-stats.patchDownload
commit 9a155dd54634aa83dbab0852849e0a69288b0b9f
Author: Anton A. Melnikov <a.melnikov@postgrespro.ru>
Date: Thu Sep 15 01:53:17 2022 +0300
Add statistic about restartpoint into pg_stat_bgwriter
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 55f7ec79e0..5ef2ce0c89 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1104,6 +1104,9 @@ CREATE VIEW pg_stat_bgwriter AS
SELECT
pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed,
pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req,
+ pg_stat_get_bgwriter_timed_restartpoints() AS restartpoints_timed,
+ pg_stat_get_bgwriter_requested_restartpoints() AS restartpoints_req,
+ pg_stat_get_bgwriter_performed_restartpoints() AS restartpoints_done,
pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 5fc076fc14..2296701ddf 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -350,6 +350,8 @@ CheckpointerMain(void)
pg_time_t now;
int elapsed_secs;
int cur_timeout;
+ bool chkpt_or_rstpt_requested = false;
+ bool chkpt_or_rstpt_timed = false;
/* Clear any already-pending wakeups */
ResetLatch(MyLatch);
@@ -368,7 +370,7 @@ CheckpointerMain(void)
if (((volatile CheckpointerShmemStruct *) CheckpointerShmem)->ckpt_flags)
{
do_checkpoint = true;
- PendingCheckpointerStats.requested_checkpoints++;
+ chkpt_or_rstpt_requested = true;
}
/*
@@ -382,7 +384,7 @@ CheckpointerMain(void)
if (elapsed_secs >= CheckPointTimeout)
{
if (!do_checkpoint)
- PendingCheckpointerStats.timed_checkpoints++;
+ chkpt_or_rstpt_timed = true;
do_checkpoint = true;
flags |= CHECKPOINT_CAUSE_TIME;
}
@@ -418,6 +420,24 @@ CheckpointerMain(void)
if (flags & CHECKPOINT_END_OF_RECOVERY)
do_restartpoint = false;
+ if (chkpt_or_rstpt_requested)
+ {
+ chkpt_or_rstpt_requested = false;
+ if (do_restartpoint)
+ PendingCheckpointerStats.requested_restartpoints++;
+ else
+ PendingCheckpointerStats.requested_checkpoints++;
+ }
+
+ if (chkpt_or_rstpt_timed)
+ {
+ chkpt_or_rstpt_timed = false;
+ if (do_restartpoint)
+ PendingCheckpointerStats.timed_restartpoints++;
+ else
+ PendingCheckpointerStats.timed_checkpoints++;
+ }
+
/*
* We will warn if (a) too soon since last checkpoint (whatever
* caused it) and (b) somebody set the CHECKPOINT_CAUSE_XLOG flag
@@ -481,6 +501,9 @@ CheckpointerMain(void)
* checkpoints happen at a predictable spacing.
*/
last_checkpoint_time = now;
+
+ if (do_restartpoint)
+ PendingCheckpointerStats.performed_restartpoints++;
}
else
{
diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c
index af8d513e7b..9de99a2cf1 100644
--- a/src/backend/utils/activity/pgstat_checkpointer.c
+++ b/src/backend/utils/activity/pgstat_checkpointer.c
@@ -49,6 +49,9 @@ pgstat_report_checkpointer(void)
#define CHECKPOINTER_ACC(fld) stats_shmem->stats.fld += PendingCheckpointerStats.fld
CHECKPOINTER_ACC(timed_checkpoints);
CHECKPOINTER_ACC(requested_checkpoints);
+ CHECKPOINTER_ACC(timed_restartpoints);
+ CHECKPOINTER_ACC(requested_restartpoints);
+ CHECKPOINTER_ACC(performed_restartpoints);
CHECKPOINTER_ACC(checkpoint_write_time);
CHECKPOINTER_ACC(checkpoint_sync_time);
CHECKPOINTER_ACC(buf_written_checkpoints);
@@ -112,6 +115,9 @@ pgstat_checkpointer_snapshot_cb(void)
#define CHECKPOINTER_COMP(fld) pgStatLocal.snapshot.checkpointer.fld -= reset.fld;
CHECKPOINTER_COMP(timed_checkpoints);
CHECKPOINTER_COMP(requested_checkpoints);
+ CHECKPOINTER_COMP(timed_restartpoints);
+ CHECKPOINTER_COMP(requested_restartpoints);
+ CHECKPOINTER_COMP(performed_restartpoints);
CHECKPOINTER_COMP(checkpoint_write_time);
CHECKPOINTER_COMP(checkpoint_sync_time);
CHECKPOINTER_COMP(buf_written_checkpoints);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index be15b4b2e5..f76448eba6 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1668,6 +1668,24 @@ pg_stat_get_bgwriter_requested_checkpoints(PG_FUNCTION_ARGS)
PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->requested_checkpoints);
}
+Datum
+pg_stat_get_bgwriter_timed_restartpoints(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->timed_restartpoints);
+}
+
+Datum
+pg_stat_get_bgwriter_requested_restartpoints(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->requested_restartpoints);
+}
+
+Datum
+pg_stat_get_bgwriter_performed_restartpoints(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->performed_restartpoints);
+}
+
Datum
pg_stat_get_bgwriter_buf_written_checkpoints(PG_FUNCTION_ARGS)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index a07e737a33..a1d1fe6796 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5604,6 +5604,21 @@
proname => 'pg_stat_get_bgwriter_requested_checkpoints', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_bgwriter_requested_checkpoints' },
+{ oid => '9938',
+ descr => 'statistics: number of timed restartpoints started by the bgwriter',
+ proname => 'pg_stat_get_bgwriter_timed_restartpoints', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => '',
+ prosrc => 'pg_stat_get_bgwriter_timed_restartpoints' },
+{ oid => '9939',
+ descr => 'statistics: number of backend requested restartpoints started by the bgwriter',
+ proname => 'pg_stat_get_bgwriter_requested_restartpoints', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => '',
+ prosrc => 'pg_stat_get_bgwriter_requested_restartpoints' },
+{ oid => '9940',
+ descr => 'statistics: number of backend performed restartpoints started by the bgwriter',
+ proname => 'pg_stat_get_bgwriter_performed_restartpoints', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => '',
+ prosrc => 'pg_stat_get_bgwriter_performed_restartpoints' },
{ oid => '2771',
descr => 'statistics: number of buffers written by the bgwriter during checkpoints',
proname => 'pg_stat_get_bgwriter_buf_written_checkpoints', provolatile => 's',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index ac28f813b4..680ff6f39b 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -269,6 +269,9 @@ typedef struct PgStat_CheckpointerStats
{
PgStat_Counter timed_checkpoints;
PgStat_Counter requested_checkpoints;
+ PgStat_Counter timed_restartpoints;
+ PgStat_Counter requested_restartpoints;
+ PgStat_Counter performed_restartpoints;
PgStat_Counter checkpoint_write_time; /* times in milliseconds */
PgStat_Counter checkpoint_sync_time;
PgStat_Counter buf_written_checkpoints;
v2-0001-Fix-burst-checkpoint_req-growth.patchtext/x-patch; charset=UTF-8; name=v2-0001-Fix-burst-checkpoint_req-growth.patchDownload
commit d5a58d8585692be0d24db9414859088e3e7f7dad
Author: Anton A. Melnikov <a.melnikov@postgrespro.ru>
Date: Tue Sep 6 12:18:56 2022 +0300
Remove burst growth of the checkpoint_req on replica.
with correcttions according to comment:
https://www.postgresql.org/message-id/20220907.163946.2214338139097492905.horikyota.ntt%40gmail.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 81d339d57d..3ed1a87943 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9054,3 +9054,20 @@ SetWalWriterSleeping(bool sleeping)
XLogCtl->WalWriterSleeping = sleeping;
SpinLockRelease(&XLogCtl->info_lck);
}
+
+/*
+ * Check if a new checkpoint WAL record has been received since the
+ * last restartpoint. So it is possible to create a new one.
+ */
+bool RestartPointAvailable(void)
+{
+ bool result = false;
+
+ SpinLockAcquire(&XLogCtl->info_lck);
+ if (!XLogRecPtrIsInvalid(XLogCtl->lastCheckPointRecPtr)
+ && XLogCtl->lastCheckPoint.redo > XLogCtl->RedoRecPtr)
+ result = true;
+ SpinLockRelease(&XLogCtl->info_lck);
+
+ return result;
+}
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index b41e682664..7236e0f0a4 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3199,7 +3199,15 @@ XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
{
(void) GetRedoRecPtr();
if (XLogCheckpointNeeded(readSegNo))
- RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
+ {
+ /*
+ * If there is no new checkpoint WAL records since the
+ * last restartpoint the creation of new one
+ * will certainly fail, so skip it.
+ */
+ if (RestartPointAvailable())
+ RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
+ }
}
}
diff --git a/src/include/access/xlogrecovery.h b/src/include/access/xlogrecovery.h
index 0aa85d90e8..a248f2251e 100644
--- a/src/include/access/xlogrecovery.h
+++ b/src/include/access/xlogrecovery.h
@@ -82,6 +82,7 @@ extern void XLogRecoveryShmemInit(void);
extern void InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdownPtr, bool *haveBackupLabel, bool *haveTblspcMap);
extern void PerformWalRecovery(void);
+extern bool RestartPointAvailable(void);
/*
* FinishWalRecovery() returns this. It contains information about the point
Hi,
On 2022-09-19 01:29:21 +0300, Anton A. Melnikov wrote:
Corrected patch is attached (v2-0001-Fix-burst-checkpoint_req-growth.patch).
This patch doesn't pass the main regression tests tests successfully:
https://cirrus-ci.com/task/5502700019253248
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/rules.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/rules.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/rules.out 2022-12-06 05:49:53.687424000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/rules.out 2022-12-06 05:53:04.642690000 +0000
@@ -1816,6 +1816,9 @@
FROM pg_stat_get_archiver() s(archived_count, last_archived_wal, last_archived_time, failed_count, last_failed_wal, last_failed_time, stats_reset);
pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed,
pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req,
+ pg_stat_get_bgwriter_timed_restartpoints() AS restartpoints_timed,
+ pg_stat_get_bgwriter_requested_restartpoints() AS restartpoints_req,
+ pg_stat_get_bgwriter_performed_restartpoints() AS restartpoints_done,
pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,
Greetings,
Andres Freund
Hello!
On 06.12.2022 21:44, Andres Freund wrote:
Hi,
On 2022-09-19 01:29:21 +0300, Anton A. Melnikov wrote:
Corrected patch is attached (v2-0001-Fix-burst-checkpoint_req-growth.patch).
This patch doesn't pass the main regression tests tests successfully:
https://cirrus-ci.com/task/5502700019253248
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/rules.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/rules.out --- /tmp/cirrus-ci-build/src/test/regress/expected/rules.out 2022-12-06 05:49:53.687424000 +0000 +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/rules.out 2022-12-06 05:53:04.642690000 +0000 @@ -1816,6 +1816,9 @@ FROM pg_stat_get_archiver() s(archived_count, last_archived_wal, last_archived_time, failed_count, last_failed_wal, last_failed_time, stats_reset); pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, + pg_stat_get_bgwriter_timed_restartpoints() AS restartpoints_timed, + pg_stat_get_bgwriter_requested_restartpoints() AS restartpoints_req, + pg_stat_get_bgwriter_performed_restartpoints() AS restartpoints_done, pg_stat_get_checkpoint_write_time() AS checkpoint_write_time, pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time, pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,Greetings,
Andres Freund
Thank you for pointing!
I didn't think that the patch tester would apply both patch variants simultaneously,
assuming that these are two different possible solutions of the problem.
But it's even good, let it check both at once!
There was an error in the second variant (Add-restartpoint-stats), i forgot to correct the rules.out.
So fixed the second variant and rebased the first one (Fix-burst-checkpoint_req-growth)
to the current master.
With the best wishes,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v3-0001-Fix-burst-checkpoint_req-growth.patchtext/x-patch; charset=UTF-8; name=v3-0001-Fix-burst-checkpoint_req-growth.patchDownload
commit 6f3ab7b6b5562aaf68ccfdd6dde17c27c2d2869e
Author: Anton A. Melnikov <a.melnikov@postgrespro.ru>
Date: Wed Dec 7 10:10:58 2022 +0300
Remove burst growth of the checkpoint_req on replica.
with correcttions according to comment: https://www.postgresql.org/message-id/20220907.163946.2214338139097492905.horikyota.ntt%40gmail.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a31fbbff78..02d86bf370 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8935,3 +8935,20 @@ SetWalWriterSleeping(bool sleeping)
XLogCtl->WalWriterSleeping = sleeping;
SpinLockRelease(&XLogCtl->info_lck);
}
+
+/*
+ * Check if a new checkpoint WAL record has been received since the
+ * last restartpoint. So it is possible to create a new one.
+ */
+bool RestartPointAvailable(void)
+{
+ bool result = false;
+
+ SpinLockAcquire(&XLogCtl->info_lck);
+ if (!XLogRecPtrIsInvalid(XLogCtl->lastCheckPointRecPtr)
+ && XLogCtl->lastCheckPoint.redo > XLogCtl->RedoRecPtr)
+ result = true;
+ SpinLockRelease(&XLogCtl->info_lck);
+
+ return result;
+}
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 97b882564f..a88c716197 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3196,7 +3196,15 @@ XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
{
(void) GetRedoRecPtr();
if (XLogCheckpointNeeded(readSegNo))
- RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
+ {
+ /*
+ * If there is no new checkpoint WAL records since the
+ * last restartpoint the creation of new one
+ * will certainly fail, so skip it.
+ */
+ if (RestartPointAvailable())
+ RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
+ }
}
}
diff --git a/src/include/access/xlogrecovery.h b/src/include/access/xlogrecovery.h
index f3398425d8..dcc2e64ca7 100644
--- a/src/include/access/xlogrecovery.h
+++ b/src/include/access/xlogrecovery.h
@@ -83,6 +83,7 @@ extern void InitWalRecovery(ControlFileData *ControlFile,
bool *wasShutdown_ptr, bool *haveBackupLabel_ptr,
bool *haveTblspcMap_ptr);
extern void PerformWalRecovery(void);
+extern bool RestartPointAvailable(void);
/*
* FinishWalRecovery() returns this. It contains information about the point
v2-0001-Add-restartpoint-stats.patchtext/x-patch; charset=UTF-8; name=v2-0001-Add-restartpoint-stats.patchDownload
commit 9ec1f90bbe407da94f8e940084e8815b4f8e874b
Author: Anton A. Melnikov <a.melnikov@postgrespro.ru>
Date: Wed Dec 7 10:49:19 2022 +0300
Add statistic about restartpoint into pg_stat_bgwriter
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2d8104b090..658cafdc03 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1107,6 +1107,9 @@ CREATE VIEW pg_stat_bgwriter AS
SELECT
pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed,
pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req,
+ pg_stat_get_bgwriter_timed_restartpoints() AS restartpoints_timed,
+ pg_stat_get_bgwriter_requested_restartpoints() AS restartpoints_req,
+ pg_stat_get_bgwriter_performed_restartpoints() AS restartpoints_done,
pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 5fc076fc14..2296701ddf 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -350,6 +350,8 @@ CheckpointerMain(void)
pg_time_t now;
int elapsed_secs;
int cur_timeout;
+ bool chkpt_or_rstpt_requested = false;
+ bool chkpt_or_rstpt_timed = false;
/* Clear any already-pending wakeups */
ResetLatch(MyLatch);
@@ -368,7 +370,7 @@ CheckpointerMain(void)
if (((volatile CheckpointerShmemStruct *) CheckpointerShmem)->ckpt_flags)
{
do_checkpoint = true;
- PendingCheckpointerStats.requested_checkpoints++;
+ chkpt_or_rstpt_requested = true;
}
/*
@@ -382,7 +384,7 @@ CheckpointerMain(void)
if (elapsed_secs >= CheckPointTimeout)
{
if (!do_checkpoint)
- PendingCheckpointerStats.timed_checkpoints++;
+ chkpt_or_rstpt_timed = true;
do_checkpoint = true;
flags |= CHECKPOINT_CAUSE_TIME;
}
@@ -418,6 +420,24 @@ CheckpointerMain(void)
if (flags & CHECKPOINT_END_OF_RECOVERY)
do_restartpoint = false;
+ if (chkpt_or_rstpt_requested)
+ {
+ chkpt_or_rstpt_requested = false;
+ if (do_restartpoint)
+ PendingCheckpointerStats.requested_restartpoints++;
+ else
+ PendingCheckpointerStats.requested_checkpoints++;
+ }
+
+ if (chkpt_or_rstpt_timed)
+ {
+ chkpt_or_rstpt_timed = false;
+ if (do_restartpoint)
+ PendingCheckpointerStats.timed_restartpoints++;
+ else
+ PendingCheckpointerStats.timed_checkpoints++;
+ }
+
/*
* We will warn if (a) too soon since last checkpoint (whatever
* caused it) and (b) somebody set the CHECKPOINT_CAUSE_XLOG flag
@@ -481,6 +501,9 @@ CheckpointerMain(void)
* checkpoints happen at a predictable spacing.
*/
last_checkpoint_time = now;
+
+ if (do_restartpoint)
+ PendingCheckpointerStats.performed_restartpoints++;
}
else
{
diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c
index af8d513e7b..9de99a2cf1 100644
--- a/src/backend/utils/activity/pgstat_checkpointer.c
+++ b/src/backend/utils/activity/pgstat_checkpointer.c
@@ -49,6 +49,9 @@ pgstat_report_checkpointer(void)
#define CHECKPOINTER_ACC(fld) stats_shmem->stats.fld += PendingCheckpointerStats.fld
CHECKPOINTER_ACC(timed_checkpoints);
CHECKPOINTER_ACC(requested_checkpoints);
+ CHECKPOINTER_ACC(timed_restartpoints);
+ CHECKPOINTER_ACC(requested_restartpoints);
+ CHECKPOINTER_ACC(performed_restartpoints);
CHECKPOINTER_ACC(checkpoint_write_time);
CHECKPOINTER_ACC(checkpoint_sync_time);
CHECKPOINTER_ACC(buf_written_checkpoints);
@@ -112,6 +115,9 @@ pgstat_checkpointer_snapshot_cb(void)
#define CHECKPOINTER_COMP(fld) pgStatLocal.snapshot.checkpointer.fld -= reset.fld;
CHECKPOINTER_COMP(timed_checkpoints);
CHECKPOINTER_COMP(requested_checkpoints);
+ CHECKPOINTER_COMP(timed_restartpoints);
+ CHECKPOINTER_COMP(requested_restartpoints);
+ CHECKPOINTER_COMP(performed_restartpoints);
CHECKPOINTER_COMP(checkpoint_write_time);
CHECKPOINTER_COMP(checkpoint_sync_time);
CHECKPOINTER_COMP(buf_written_checkpoints);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 25a159b5e5..4c6b98dcce 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1138,6 +1138,24 @@ pg_stat_get_bgwriter_requested_checkpoints(PG_FUNCTION_ARGS)
PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->requested_checkpoints);
}
+Datum
+pg_stat_get_bgwriter_timed_restartpoints(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->timed_restartpoints);
+}
+
+Datum
+pg_stat_get_bgwriter_requested_restartpoints(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->requested_restartpoints);
+}
+
+Datum
+pg_stat_get_bgwriter_performed_restartpoints(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->performed_restartpoints);
+}
+
Datum
pg_stat_get_bgwriter_buf_written_checkpoints(PG_FUNCTION_ARGS)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f9301b2627..5d31ce40f7 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5637,6 +5637,21 @@
proname => 'pg_stat_get_bgwriter_requested_checkpoints', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_bgwriter_requested_checkpoints' },
+{ oid => '9938',
+ descr => 'statistics: number of timed restartpoints started by the bgwriter',
+ proname => 'pg_stat_get_bgwriter_timed_restartpoints', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => '',
+ prosrc => 'pg_stat_get_bgwriter_timed_restartpoints' },
+{ oid => '9939',
+ descr => 'statistics: number of backend requested restartpoints started by the bgwriter',
+ proname => 'pg_stat_get_bgwriter_requested_restartpoints', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => '',
+ prosrc => 'pg_stat_get_bgwriter_requested_restartpoints' },
+{ oid => '9940',
+ descr => 'statistics: number of backend performed restartpoints started by the bgwriter',
+ proname => 'pg_stat_get_bgwriter_performed_restartpoints', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => '',
+ prosrc => 'pg_stat_get_bgwriter_performed_restartpoints' },
{ oid => '2771',
descr => 'statistics: number of buffers written by the bgwriter during checkpoints',
proname => 'pg_stat_get_bgwriter_buf_written_checkpoints', provolatile => 's',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a3df8d27c3..cfc384ec5b 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -269,6 +269,9 @@ typedef struct PgStat_CheckpointerStats
{
PgStat_Counter timed_checkpoints;
PgStat_Counter requested_checkpoints;
+ PgStat_Counter timed_restartpoints;
+ PgStat_Counter requested_restartpoints;
+ PgStat_Counter performed_restartpoints;
PgStat_Counter checkpoint_write_time; /* times in milliseconds */
PgStat_Counter checkpoint_sync_time;
PgStat_Counter buf_written_checkpoints;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index fb9f936d43..0809264ad5 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1816,6 +1816,9 @@ pg_stat_archiver| SELECT s.archived_count,
FROM pg_stat_get_archiver() s(archived_count, last_archived_wal, last_archived_time, failed_count, last_failed_wal, last_failed_time, stats_reset);
pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed,
pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req,
+ pg_stat_get_bgwriter_timed_restartpoints() AS restartpoints_timed,
+ pg_stat_get_bgwriter_requested_restartpoints() AS restartpoints_req,
+ pg_stat_get_bgwriter_performed_restartpoints() AS restartpoints_done,
pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,
Hello!
On 15.03.2023 21:29, Gregory Stark (as CFM) wrote:
These patches that are "Needs Review" and have received no comments at
all since before March 1st are these. If your patch is amongst this
list I would suggest any of:1) Move it yourself to the next CF (or withdraw it)
2) Post to the list with any pending questions asking for specific
feedback -- it's much more likely to get feedback than just a generic
"here's a patch plz review"...
3) Mark it Ready for Committer and possibly post explaining the
resolution to any earlier questions to make it easier for a committer
to understand the state
There are two different patch variants and some discussion expected.
So moved them to the next CF.
With the best wishes!
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v2-0001-Add-restartpoint-stats.patchtext/x-patch; charset=UTF-8; name=v2-0001-Add-restartpoint-stats.patchDownload
commit 9ec1f90bbe407da94f8e940084e8815b4f8e874b
Author: Anton A. Melnikov <a.melnikov@postgrespro.ru>
Date: Wed Dec 7 10:49:19 2022 +0300
Add statistic about restartpoint into pg_stat_bgwriter
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2d8104b090..658cafdc03 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1107,6 +1107,9 @@ CREATE VIEW pg_stat_bgwriter AS
SELECT
pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed,
pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req,
+ pg_stat_get_bgwriter_timed_restartpoints() AS restartpoints_timed,
+ pg_stat_get_bgwriter_requested_restartpoints() AS restartpoints_req,
+ pg_stat_get_bgwriter_performed_restartpoints() AS restartpoints_done,
pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 5fc076fc14..2296701ddf 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -350,6 +350,8 @@ CheckpointerMain(void)
pg_time_t now;
int elapsed_secs;
int cur_timeout;
+ bool chkpt_or_rstpt_requested = false;
+ bool chkpt_or_rstpt_timed = false;
/* Clear any already-pending wakeups */
ResetLatch(MyLatch);
@@ -368,7 +370,7 @@ CheckpointerMain(void)
if (((volatile CheckpointerShmemStruct *) CheckpointerShmem)->ckpt_flags)
{
do_checkpoint = true;
- PendingCheckpointerStats.requested_checkpoints++;
+ chkpt_or_rstpt_requested = true;
}
/*
@@ -382,7 +384,7 @@ CheckpointerMain(void)
if (elapsed_secs >= CheckPointTimeout)
{
if (!do_checkpoint)
- PendingCheckpointerStats.timed_checkpoints++;
+ chkpt_or_rstpt_timed = true;
do_checkpoint = true;
flags |= CHECKPOINT_CAUSE_TIME;
}
@@ -418,6 +420,24 @@ CheckpointerMain(void)
if (flags & CHECKPOINT_END_OF_RECOVERY)
do_restartpoint = false;
+ if (chkpt_or_rstpt_requested)
+ {
+ chkpt_or_rstpt_requested = false;
+ if (do_restartpoint)
+ PendingCheckpointerStats.requested_restartpoints++;
+ else
+ PendingCheckpointerStats.requested_checkpoints++;
+ }
+
+ if (chkpt_or_rstpt_timed)
+ {
+ chkpt_or_rstpt_timed = false;
+ if (do_restartpoint)
+ PendingCheckpointerStats.timed_restartpoints++;
+ else
+ PendingCheckpointerStats.timed_checkpoints++;
+ }
+
/*
* We will warn if (a) too soon since last checkpoint (whatever
* caused it) and (b) somebody set the CHECKPOINT_CAUSE_XLOG flag
@@ -481,6 +501,9 @@ CheckpointerMain(void)
* checkpoints happen at a predictable spacing.
*/
last_checkpoint_time = now;
+
+ if (do_restartpoint)
+ PendingCheckpointerStats.performed_restartpoints++;
}
else
{
diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c
index af8d513e7b..9de99a2cf1 100644
--- a/src/backend/utils/activity/pgstat_checkpointer.c
+++ b/src/backend/utils/activity/pgstat_checkpointer.c
@@ -49,6 +49,9 @@ pgstat_report_checkpointer(void)
#define CHECKPOINTER_ACC(fld) stats_shmem->stats.fld += PendingCheckpointerStats.fld
CHECKPOINTER_ACC(timed_checkpoints);
CHECKPOINTER_ACC(requested_checkpoints);
+ CHECKPOINTER_ACC(timed_restartpoints);
+ CHECKPOINTER_ACC(requested_restartpoints);
+ CHECKPOINTER_ACC(performed_restartpoints);
CHECKPOINTER_ACC(checkpoint_write_time);
CHECKPOINTER_ACC(checkpoint_sync_time);
CHECKPOINTER_ACC(buf_written_checkpoints);
@@ -112,6 +115,9 @@ pgstat_checkpointer_snapshot_cb(void)
#define CHECKPOINTER_COMP(fld) pgStatLocal.snapshot.checkpointer.fld -= reset.fld;
CHECKPOINTER_COMP(timed_checkpoints);
CHECKPOINTER_COMP(requested_checkpoints);
+ CHECKPOINTER_COMP(timed_restartpoints);
+ CHECKPOINTER_COMP(requested_restartpoints);
+ CHECKPOINTER_COMP(performed_restartpoints);
CHECKPOINTER_COMP(checkpoint_write_time);
CHECKPOINTER_COMP(checkpoint_sync_time);
CHECKPOINTER_COMP(buf_written_checkpoints);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 25a159b5e5..4c6b98dcce 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1138,6 +1138,24 @@ pg_stat_get_bgwriter_requested_checkpoints(PG_FUNCTION_ARGS)
PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->requested_checkpoints);
}
+Datum
+pg_stat_get_bgwriter_timed_restartpoints(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->timed_restartpoints);
+}
+
+Datum
+pg_stat_get_bgwriter_requested_restartpoints(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->requested_restartpoints);
+}
+
+Datum
+pg_stat_get_bgwriter_performed_restartpoints(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->performed_restartpoints);
+}
+
Datum
pg_stat_get_bgwriter_buf_written_checkpoints(PG_FUNCTION_ARGS)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f9301b2627..5d31ce40f7 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5637,6 +5637,21 @@
proname => 'pg_stat_get_bgwriter_requested_checkpoints', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_bgwriter_requested_checkpoints' },
+{ oid => '9938',
+ descr => 'statistics: number of timed restartpoints started by the bgwriter',
+ proname => 'pg_stat_get_bgwriter_timed_restartpoints', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => '',
+ prosrc => 'pg_stat_get_bgwriter_timed_restartpoints' },
+{ oid => '9939',
+ descr => 'statistics: number of backend requested restartpoints started by the bgwriter',
+ proname => 'pg_stat_get_bgwriter_requested_restartpoints', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => '',
+ prosrc => 'pg_stat_get_bgwriter_requested_restartpoints' },
+{ oid => '9940',
+ descr => 'statistics: number of backend performed restartpoints started by the bgwriter',
+ proname => 'pg_stat_get_bgwriter_performed_restartpoints', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => '',
+ prosrc => 'pg_stat_get_bgwriter_performed_restartpoints' },
{ oid => '2771',
descr => 'statistics: number of buffers written by the bgwriter during checkpoints',
proname => 'pg_stat_get_bgwriter_buf_written_checkpoints', provolatile => 's',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a3df8d27c3..cfc384ec5b 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -269,6 +269,9 @@ typedef struct PgStat_CheckpointerStats
{
PgStat_Counter timed_checkpoints;
PgStat_Counter requested_checkpoints;
+ PgStat_Counter timed_restartpoints;
+ PgStat_Counter requested_restartpoints;
+ PgStat_Counter performed_restartpoints;
PgStat_Counter checkpoint_write_time; /* times in milliseconds */
PgStat_Counter checkpoint_sync_time;
PgStat_Counter buf_written_checkpoints;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index fb9f936d43..0809264ad5 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1816,6 +1816,9 @@ pg_stat_archiver| SELECT s.archived_count,
FROM pg_stat_get_archiver() s(archived_count, last_archived_wal, last_archived_time, failed_count, last_failed_wal, last_failed_time, stats_reset);
pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed,
pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req,
+ pg_stat_get_bgwriter_timed_restartpoints() AS restartpoints_timed,
+ pg_stat_get_bgwriter_requested_restartpoints() AS restartpoints_req,
+ pg_stat_get_bgwriter_performed_restartpoints() AS restartpoints_done,
pg_stat_get_checkpoint_write_time() AS checkpoint_write_time,
pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time,
pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,
v3-0001-Fix-burst-checkpoint_req-growth.patchtext/x-patch; charset=UTF-8; name=v3-0001-Fix-burst-checkpoint_req-growth.patchDownload
commit 6f3ab7b6b5562aaf68ccfdd6dde17c27c2d2869e
Author: Anton A. Melnikov <a.melnikov@postgrespro.ru>
Date: Wed Dec 7 10:10:58 2022 +0300
Remove burst growth of the checkpoint_req on replica.
with correcttions according to comment: https://www.postgresql.org/message-id/20220907.163946.2214338139097492905.horikyota.ntt%40gmail.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a31fbbff78..02d86bf370 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8935,3 +8935,20 @@ SetWalWriterSleeping(bool sleeping)
XLogCtl->WalWriterSleeping = sleeping;
SpinLockRelease(&XLogCtl->info_lck);
}
+
+/*
+ * Check if a new checkpoint WAL record has been received since the
+ * last restartpoint. So it is possible to create a new one.
+ */
+bool RestartPointAvailable(void)
+{
+ bool result = false;
+
+ SpinLockAcquire(&XLogCtl->info_lck);
+ if (!XLogRecPtrIsInvalid(XLogCtl->lastCheckPointRecPtr)
+ && XLogCtl->lastCheckPoint.redo > XLogCtl->RedoRecPtr)
+ result = true;
+ SpinLockRelease(&XLogCtl->info_lck);
+
+ return result;
+}
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 97b882564f..a88c716197 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3196,7 +3196,15 @@ XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
{
(void) GetRedoRecPtr();
if (XLogCheckpointNeeded(readSegNo))
- RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
+ {
+ /*
+ * If there is no new checkpoint WAL records since the
+ * last restartpoint the creation of new one
+ * will certainly fail, so skip it.
+ */
+ if (RestartPointAvailable())
+ RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
+ }
}
}
diff --git a/src/include/access/xlogrecovery.h b/src/include/access/xlogrecovery.h
index f3398425d8..dcc2e64ca7 100644
--- a/src/include/access/xlogrecovery.h
+++ b/src/include/access/xlogrecovery.h
@@ -83,6 +83,7 @@ extern void InitWalRecovery(ControlFileData *ControlFile,
bool *wasShutdown_ptr, bool *haveBackupLabel_ptr,
bool *haveTblspcMap_ptr);
extern void PerformWalRecovery(void);
+extern bool RestartPointAvailable(void);
/*
* FinishWalRecovery() returns this. It contains information about the point
Hi, Anton!
On Thu, Mar 16, 2023 at 2:39 PM Anton A. Melnikov <aamelnikov@inbox.ru> wrote:
On 15.03.2023 21:29, Gregory Stark (as CFM) wrote:
These patches that are "Needs Review" and have received no comments at
all since before March 1st are these. If your patch is amongst this
list I would suggest any of:1) Move it yourself to the next CF (or withdraw it)
2) Post to the list with any pending questions asking for specific
feedback -- it's much more likely to get feedback than just a generic
"here's a patch plz review"...
3) Mark it Ready for Committer and possibly post explaining the
resolution to any earlier questions to make it easier for a committer
to understand the stateThere are two different patch variants and some discussion expected.
So moved them to the next CF.
Thank you for your detailed observation regarding the spike growth of
the checkpoint_req counter on the replica following a large insert
operation on the master. After reviewing your description and the
log, I agree with Kyotaro Horiguchi that the behavior you've outlined,
though potentially perceived as annoying, does not constitute a bug in
the PostgreSQL.
After examining the second patch
("v2-0001-Add-restartpoint-stats.patch"), it appears that adding
additional statistics as outlined in the patch is the most suitable
approach to address the concerns raised. This solution provides more
visibility into the system's behavior without altering its core
mechanics. However, it's essential that this additional functionality
is accompanied by comprehensive documentation to ensure clear
understanding and ease of use by the PostgreSQL community.
Please consider expanding the documentation to include detailed
explanations of the new statistics and their implications in various
scenarios.
------
Regards,
Alexander Korotkov
Thanks for remarks!
On 28.11.2023 21:34, Alexander Korotkov wrote:
After examining the second patch
("v2-0001-Add-restartpoint-stats.patch"), it appears that adding
additional statistics as outlined in the patch is the most suitable
approach to address the concerns raised. This solution provides more
visibility into the system's behavior without altering its core
mechanics.
Agreed. I left only this variant of the patch and rework it due to commit 96f05261.
So the new counters is in the pg_stat_checkpointer view now.
Please see the v3-0001-add-restartpoints-stats.patch attached.
However, it's essential that this additional functionality
is accompanied by comprehensive documentation to ensure clear
understanding and ease of use by the PostgreSQL community.Please consider expanding the documentation to include detailed
explanations of the new statistics and their implications in various
scenarios.
In the separate v3-0002-doc-for-restartpoints-stats.patch i added the definitions
of the new counters into the "28.2.15. pg_stat_checkpointer" section
and explanation of them with examples into the "30.5.WAL Configuration" one.
Would be glad for any comments and and concerns.
With the best wishes,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v3-0001-add-restartpoints-stats.patchtext/x-patch; charset=UTF-8; name=v3-0001-add-restartpoints-stats.patchDownload
commit 5711c09dbfbe02586a247a98e0ae41cd71a221a3
Author: Anton A. Melnikov <a.melnikov@postgrespro.ru>
Date: Sun Dec 3 12:49:11 2023 +0300
Add statistic about restartpoints into pg_stat_checkpointer
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 11d18ed9dd..058fc47c91 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1141,6 +1141,9 @@ CREATE VIEW pg_stat_checkpointer AS
SELECT
pg_stat_get_checkpointer_num_timed() AS num_timed,
pg_stat_get_checkpointer_num_requested() AS num_requested,
+ pg_stat_get_checkpointer_restartpoints_timed() AS restartpoints_timed,
+ pg_stat_get_checkpointer_restartpoints_requested() AS restartpoints_req,
+ pg_stat_get_checkpointer_restartpoints_performed() AS restartpoints_done,
pg_stat_get_checkpointer_write_time() AS write_time,
pg_stat_get_checkpointer_sync_time() AS sync_time,
pg_stat_get_checkpointer_buffers_written() AS buffers_written,
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index dc2da5a2cd..67ecb177e7 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -340,6 +340,8 @@ CheckpointerMain(void)
pg_time_t now;
int elapsed_secs;
int cur_timeout;
+ bool chkpt_or_rstpt_requested = false;
+ bool chkpt_or_rstpt_timed = false;
/* Clear any already-pending wakeups */
ResetLatch(MyLatch);
@@ -358,7 +360,7 @@ CheckpointerMain(void)
if (((volatile CheckpointerShmemStruct *) CheckpointerShmem)->ckpt_flags)
{
do_checkpoint = true;
- PendingCheckpointerStats.num_requested++;
+ chkpt_or_rstpt_requested = true;
}
/*
@@ -372,7 +374,7 @@ CheckpointerMain(void)
if (elapsed_secs >= CheckPointTimeout)
{
if (!do_checkpoint)
- PendingCheckpointerStats.num_timed++;
+ chkpt_or_rstpt_timed = true;
do_checkpoint = true;
flags |= CHECKPOINT_CAUSE_TIME;
}
@@ -408,6 +410,24 @@ CheckpointerMain(void)
if (flags & CHECKPOINT_END_OF_RECOVERY)
do_restartpoint = false;
+ if (chkpt_or_rstpt_timed)
+ {
+ chkpt_or_rstpt_timed = false;
+ if (do_restartpoint)
+ PendingCheckpointerStats.restartpoints_timed++;
+ else
+ PendingCheckpointerStats.num_timed++;
+ }
+
+ if (chkpt_or_rstpt_requested)
+ {
+ chkpt_or_rstpt_requested = false;
+ if (do_restartpoint)
+ PendingCheckpointerStats.restartpoints_requested++;
+ else
+ PendingCheckpointerStats.num_requested++;
+ }
+
/*
* We will warn if (a) too soon since last checkpoint (whatever
* caused it) and (b) somebody set the CHECKPOINT_CAUSE_XLOG flag
@@ -471,6 +491,9 @@ CheckpointerMain(void)
* checkpoints happen at a predictable spacing.
*/
last_checkpoint_time = now;
+
+ if (do_restartpoint)
+ PendingCheckpointerStats.restartpoints_performed++;
}
else
{
diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c
index 301a0bc7bd..6ee258f240 100644
--- a/src/backend/utils/activity/pgstat_checkpointer.c
+++ b/src/backend/utils/activity/pgstat_checkpointer.c
@@ -49,6 +49,9 @@ pgstat_report_checkpointer(void)
#define CHECKPOINTER_ACC(fld) stats_shmem->stats.fld += PendingCheckpointerStats.fld
CHECKPOINTER_ACC(num_timed);
CHECKPOINTER_ACC(num_requested);
+ CHECKPOINTER_ACC(restartpoints_timed);
+ CHECKPOINTER_ACC(restartpoints_requested);
+ CHECKPOINTER_ACC(restartpoints_performed);
CHECKPOINTER_ACC(write_time);
CHECKPOINTER_ACC(sync_time);
CHECKPOINTER_ACC(buffers_written);
@@ -116,6 +119,9 @@ pgstat_checkpointer_snapshot_cb(void)
#define CHECKPOINTER_COMP(fld) pgStatLocal.snapshot.checkpointer.fld -= reset.fld;
CHECKPOINTER_COMP(num_timed);
CHECKPOINTER_COMP(num_requested);
+ CHECKPOINTER_COMP(restartpoints_timed);
+ CHECKPOINTER_COMP(restartpoints_requested);
+ CHECKPOINTER_COMP(restartpoints_performed);
CHECKPOINTER_COMP(write_time);
CHECKPOINTER_COMP(sync_time);
CHECKPOINTER_COMP(buffers_written);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0cea320c00..e65cbf41e9 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1193,6 +1193,24 @@ pg_stat_get_checkpointer_num_requested(PG_FUNCTION_ARGS)
PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->num_requested);
}
+Datum
+pg_stat_get_checkpointer_restartpoints_timed(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->restartpoints_timed);
+}
+
+Datum
+pg_stat_get_checkpointer_restartpoints_requested(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->restartpoints_requested);
+}
+
+Datum
+pg_stat_get_checkpointer_restartpoints_performed(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->restartpoints_performed);
+}
+
Datum
pg_stat_get_checkpointer_buffers_written(PG_FUNCTION_ARGS)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fb58dee3bc..f6926c864d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5721,6 +5721,21 @@
proname => 'pg_stat_get_checkpointer_num_requested', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_num_requested' },
+{ oid => '8743',
+ descr => 'statistics: number of timed restartpoints started by the checkpointer',
+ proname => 'pg_stat_get_checkpointer_restartpoints_timed', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => '',
+ prosrc => 'pg_stat_get_checkpointer_restartpoints_timed' },
+{ oid => '8744',
+ descr => 'statistics: number of backend requested restartpoints started by the checkpointer',
+ proname => 'pg_stat_get_checkpointer_restartpoints_requested', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => '',
+ prosrc => 'pg_stat_get_checkpointer_restartpoints_requested' },
+{ oid => '8745',
+ descr => 'statistics: number of backend performed restartpoints',
+ proname => 'pg_stat_get_checkpointer_restartpoints_performed', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => '',
+ prosrc => 'pg_stat_get_checkpointer_restartpoints_performed' },
{ oid => '2771',
descr => 'statistics: number of buffers written by the checkpointer',
proname => 'pg_stat_get_checkpointer_buffers_written', provolatile => 's',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index f95d8db0c4..3cc2762ef5 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -262,6 +262,9 @@ typedef struct PgStat_CheckpointerStats
{
PgStat_Counter num_timed;
PgStat_Counter num_requested;
+ PgStat_Counter restartpoints_timed;
+ PgStat_Counter restartpoints_requested;
+ PgStat_Counter restartpoints_performed;
PgStat_Counter write_time; /* times in milliseconds */
PgStat_Counter sync_time;
PgStat_Counter buffers_written;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 05070393b9..f645e8486b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1822,6 +1822,9 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_buf_written_clean() AS buffers_cle
pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
pg_stat_checkpointer| SELECT pg_stat_get_checkpointer_num_timed() AS num_timed,
pg_stat_get_checkpointer_num_requested() AS num_requested,
+ pg_stat_get_checkpointer_restartpoints_timed() AS restartpoints_timed,
+ pg_stat_get_checkpointer_restartpoints_requested() AS restartpoints_req,
+ pg_stat_get_checkpointer_restartpoints_performed() AS restartpoints_done,
pg_stat_get_checkpointer_write_time() AS write_time,
pg_stat_get_checkpointer_sync_time() AS sync_time,
pg_stat_get_checkpointer_buffers_written() AS buffers_written,
v3-0002-doc-for-restartpoints-stats.patchtext/x-patch; charset=UTF-8; name=v3-0002-doc-for-restartpoints-stats.patchDownload
commit 4acf7f57f1476611b70f027b4fddd7cc276204af
Author: Anton A. Melnikov <a.melnikov@postgrespro.ru>
Date: Mon Dec 4 04:19:47 2023 +0300
Add docs about restartpoints related counters in the pg_stat_checkpointer view.
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 42509042ad..45cc091ea7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2982,6 +2982,33 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>restartpoints_timed</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of scheduled restartpoints due to timeout or after failed attempt to perform it
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>restartpoints_req</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of requested restartpoints
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>restartpoints_done</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of restartpoints that have been performed
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>write_time</structfield> <type>double precision</type>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 2ed4eb659d..678b0489d3 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -655,14 +655,41 @@
directory.
Restartpoints can't be performed more frequently than checkpoints on the
primary because restartpoints can only be performed at checkpoint records.
- A restartpoint is triggered when a checkpoint record is reached if at
- least <varname>checkpoint_timeout</varname> seconds have passed since the last
- restartpoint, or if WAL size is about to exceed
- <varname>max_wal_size</varname>. However, because of limitations on when a
- restartpoint can be performed, <varname>max_wal_size</varname> is often exceeded
- during recovery, by up to one checkpoint cycle's worth of WAL.
+ A restartpoint can be demanded by a shedule or by an external request.
+ The <structfield>restartpoints_timed</structfield> counter in the
+ <link linkend="monitoring-pg-stat-checkpointer-view"><structname>pg_stat_checkpointer</structname></link>
+ view counts the first ones while the <structfield>restartpoints_req</structfield>
+ the second.
+ A restartpoint is triggered by shedule when a checkpoint record is reached
+ if at least <xref linkend="guc-checkpoint-timeout"/> seconds have passed since
+ the last performed restartpoint or when the previous attempt to perform
+ the restartpoint have been failed. In the last case the next restartpoint
+ will be scheduled in 15s.
+ A restartpoint is triggered by request due to similar reasons like checkpoint
+ but mostly if WAL size is about to exceed <xref linkend="guc-max-wal-size"/>
+ However, because of limitations on when a restartpoint can be performed,
+ <varname>max_wal_size</varname> is often exceeded during recovery,
+ by up to one checkpoint cycle's worth of WAL.
(<varname>max_wal_size</varname> is never a hard limit anyway, so you should
always leave plenty of headroom to avoid running out of disk space.)
+ The <structfield>restartpoints_done</structfield> counter in the
+ <link linkend="monitoring-pg-stat-checkpointer-view"><structname>pg_stat_checkpointer</structname></link>
+ view counts the restartpoints that have really been performed.
+ </para>
+
+ <para>
+ In some cases, when the WAL size on the primary increases quickly,
+ for instance during massive INSERT,
+ the <structfield>restartpoints_req</structfield> counter on the standby
+ may demonstarte a spike growth.
+ This occurs since requests to create a new restartpoint due to increased
+ XLOG consumption cannot be performed because the safe checkpoint record
+ since last restartpoint has not yet been replayed on the standby.
+ This behavior is normal does not lead to increase in system resources
+ consumption.
+ Only the <structfield>restartpoints_done</structfield>
+ counter among the restartpoint related ones indicates that noticable system
+ resources have been spent.
</para>
<para>
Hi, Anton!
On Mon, Dec 4, 2023 at 3:50 AM Anton A. Melnikov <a.melnikov@postgrespro.ru>
wrote:
Thanks for remarks!
On 28.11.2023 21:34, Alexander Korotkov wrote:
After examining the second patch
("v2-0001-Add-restartpoint-stats.patch"), it appears that adding
additional statistics as outlined in the patch is the most suitable
approach to address the concerns raised. This solution provides more
visibility into the system's behavior without altering its core
mechanics.Agreed. I left only this variant of the patch and rework it due to commit
96f05261.
So the new counters is in the pg_stat_checkpointer view now.
Please see the v3-0001-add-restartpoints-stats.patch attached.However, it's essential that this additional functionality
is accompanied by comprehensive documentation to ensure clear
understanding and ease of use by the PostgreSQL community.Please consider expanding the documentation to include detailed
explanations of the new statistics and their implications in various
scenarios.In the separate v3-0002-doc-for-restartpoints-stats.patch i added the
definitions
of the new counters into the "28.2.15. pg_stat_checkpointer" section
and explanation of them with examples into the "30.5.WAL Configuration"
one.Would be glad for any comments and and concerns.
I made some grammar corrections to the docs and have written the commit
message.
I think this patch now looks good. I'm going to push this if there are no
objections.
------
Regards,
Alexander Korotkov
Attachments:
0001-Enhance-checkpointer-restartpoint-statistics-v4.patchapplication/octet-stream; name=0001-Enhance-checkpointer-restartpoint-statistics-v4.patchDownload
From 5c1c62320a01586ecf8c041d3f3bc24fac994abd Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Fri, 22 Dec 2023 23:31:51 +0200
Subject: [PATCH] Enhance checkpointer restartpoint statistics
Bhis commit introduces enhancements to the pg_stat_checkpointer view by adding
three new columns: restartpoints_timed, restartpoints_req, and
restartpoints_done. These additions aim to improve the visibility and
monitoring of restartpoint processes on replicas.
Previously, it was challenging to differentiate between successful and failed
restartpoint requests. This limitation arises because restartpoints on replicas
are dependent on checkpoint records from the primary, and cannot occur more
frequently than these checkpoints.
The new columns allow for clear distinction and tracking of restartpoint
requests, their triggers, and successful completions. This enhancement aids
database administrators and developers in better understanding and diagnosing
issues related to restartpoint behavior, particularly in scenarios where
restartpoint requests may fail.
System catalog is changed. Catversion is bumped.
Discussion: https://postgr.es/m/99b2ccd1-a77a-962a-0837-191cdf56c2b9%40inbox.ru
Author: Anton A. Melnikov
Reviewed-by: Kyotaro Horiguchi, Alexander Korotkov
---
doc/src/sgml/monitoring.sgml | 27 +++++++++++++
doc/src/sgml/wal.sgml | 39 ++++++++++++++++---
src/backend/catalog/system_views.sql | 3 ++
src/backend/postmaster/checkpointer.c | 27 ++++++++++++-
.../utils/activity/pgstat_checkpointer.c | 6 +++
src/backend/utils/adt/pgstatfuncs.c | 18 +++++++++
src/include/catalog/pg_proc.dat | 15 +++++++
src/include/pgstat.h | 3 ++
src/test/regress/expected/rules.out | 3 ++
9 files changed, 133 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4f8058d8b1b..b804eb8b5ef 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2982,6 +2982,33 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>restartpoints_timed</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of scheduled restartpoints due to timeout or after a failed attempt to perform it
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>restartpoints_req</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of requested restartpoints
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>restartpoints_done</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of restartpoints that have been performed
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>write_time</structfield> <type>double precision</type>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 2ed4eb659db..05e2a8f8be9 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -655,14 +655,41 @@
directory.
Restartpoints can't be performed more frequently than checkpoints on the
primary because restartpoints can only be performed at checkpoint records.
- A restartpoint is triggered when a checkpoint record is reached if at
- least <varname>checkpoint_timeout</varname> seconds have passed since the last
- restartpoint, or if WAL size is about to exceed
- <varname>max_wal_size</varname>. However, because of limitations on when a
- restartpoint can be performed, <varname>max_wal_size</varname> is often exceeded
- during recovery, by up to one checkpoint cycle's worth of WAL.
+ A restartpoint can be demanded by a schedule or by an external request.
+ The <structfield>restartpoints_timed</structfield> counter in the
+ <link linkend="monitoring-pg-stat-checkpointer-view"><structname>pg_stat_checkpointer</structname></link>
+ view counts the first ones while the <structfield>restartpoints_req</structfield>
+ the second.
+ A restartpoint is triggered by schedule when a checkpoint record is reached
+ if at least <xref linkend="guc-checkpoint-timeout"/> seconds have passed since
+ the last performed restartpoint or when the previous attempt to perform
+ the restartpoint has failed. In the last case, the next restartpoint
+ will be scheduled in 15 seconds.
+ A restartpoint is triggered by request due to similar reasons like checkpoint
+ but mostly if WAL size is about to exceed <xref linkend="guc-max-wal-size"/>
+ However, because of limitations on when a restartpoint can be performed,
+ <varname>max_wal_size</varname> is often exceeded during recovery,
+ by up to one checkpoint cycle's worth of WAL.
(<varname>max_wal_size</varname> is never a hard limit anyway, so you should
always leave plenty of headroom to avoid running out of disk space.)
+ The <structfield>restartpoints_done</structfield> counter in the
+ <link linkend="monitoring-pg-stat-checkpointer-view"><structname>pg_stat_checkpointer</structname></link>
+ view counts the restartpoints that have really been performed.
+ </para>
+
+ <para>
+ In some cases, when the WAL size on the primary increases quickly,
+ for instance during massive INSERT,
+ the <structfield>restartpoints_req</structfield> counter on the standby
+ may demonstrate a peak growth.
+ This occurs because requests to create a new restartpoint due to increased
+ XLOG consumption cannot be performed because the safe checkpoint record
+ since the last restartpoint has not yet been replayed on the standby.
+ This behavior is normal and does not lead to an increase in system resource
+ consumption.
+ Only the <structfield>restartpoints_done</structfield>
+ counter among the restartpoint-related ones indicates that noticeable system
+ resources have been spent.
</para>
<para>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 11d18ed9dd6..058fc47c919 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1141,6 +1141,9 @@ CREATE VIEW pg_stat_checkpointer AS
SELECT
pg_stat_get_checkpointer_num_timed() AS num_timed,
pg_stat_get_checkpointer_num_requested() AS num_requested,
+ pg_stat_get_checkpointer_restartpoints_timed() AS restartpoints_timed,
+ pg_stat_get_checkpointer_restartpoints_requested() AS restartpoints_req,
+ pg_stat_get_checkpointer_restartpoints_performed() AS restartpoints_done,
pg_stat_get_checkpointer_write_time() AS write_time,
pg_stat_get_checkpointer_sync_time() AS sync_time,
pg_stat_get_checkpointer_buffers_written() AS buffers_written,
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index dc2da5a2cd8..67ecb177e7e 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -340,6 +340,8 @@ CheckpointerMain(void)
pg_time_t now;
int elapsed_secs;
int cur_timeout;
+ bool chkpt_or_rstpt_requested = false;
+ bool chkpt_or_rstpt_timed = false;
/* Clear any already-pending wakeups */
ResetLatch(MyLatch);
@@ -358,7 +360,7 @@ CheckpointerMain(void)
if (((volatile CheckpointerShmemStruct *) CheckpointerShmem)->ckpt_flags)
{
do_checkpoint = true;
- PendingCheckpointerStats.num_requested++;
+ chkpt_or_rstpt_requested = true;
}
/*
@@ -372,7 +374,7 @@ CheckpointerMain(void)
if (elapsed_secs >= CheckPointTimeout)
{
if (!do_checkpoint)
- PendingCheckpointerStats.num_timed++;
+ chkpt_or_rstpt_timed = true;
do_checkpoint = true;
flags |= CHECKPOINT_CAUSE_TIME;
}
@@ -408,6 +410,24 @@ CheckpointerMain(void)
if (flags & CHECKPOINT_END_OF_RECOVERY)
do_restartpoint = false;
+ if (chkpt_or_rstpt_timed)
+ {
+ chkpt_or_rstpt_timed = false;
+ if (do_restartpoint)
+ PendingCheckpointerStats.restartpoints_timed++;
+ else
+ PendingCheckpointerStats.num_timed++;
+ }
+
+ if (chkpt_or_rstpt_requested)
+ {
+ chkpt_or_rstpt_requested = false;
+ if (do_restartpoint)
+ PendingCheckpointerStats.restartpoints_requested++;
+ else
+ PendingCheckpointerStats.num_requested++;
+ }
+
/*
* We will warn if (a) too soon since last checkpoint (whatever
* caused it) and (b) somebody set the CHECKPOINT_CAUSE_XLOG flag
@@ -471,6 +491,9 @@ CheckpointerMain(void)
* checkpoints happen at a predictable spacing.
*/
last_checkpoint_time = now;
+
+ if (do_restartpoint)
+ PendingCheckpointerStats.restartpoints_performed++;
}
else
{
diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c
index 301a0bc7bd3..6ee258f2402 100644
--- a/src/backend/utils/activity/pgstat_checkpointer.c
+++ b/src/backend/utils/activity/pgstat_checkpointer.c
@@ -49,6 +49,9 @@ pgstat_report_checkpointer(void)
#define CHECKPOINTER_ACC(fld) stats_shmem->stats.fld += PendingCheckpointerStats.fld
CHECKPOINTER_ACC(num_timed);
CHECKPOINTER_ACC(num_requested);
+ CHECKPOINTER_ACC(restartpoints_timed);
+ CHECKPOINTER_ACC(restartpoints_requested);
+ CHECKPOINTER_ACC(restartpoints_performed);
CHECKPOINTER_ACC(write_time);
CHECKPOINTER_ACC(sync_time);
CHECKPOINTER_ACC(buffers_written);
@@ -116,6 +119,9 @@ pgstat_checkpointer_snapshot_cb(void)
#define CHECKPOINTER_COMP(fld) pgStatLocal.snapshot.checkpointer.fld -= reset.fld;
CHECKPOINTER_COMP(num_timed);
CHECKPOINTER_COMP(num_requested);
+ CHECKPOINTER_COMP(restartpoints_timed);
+ CHECKPOINTER_COMP(restartpoints_requested);
+ CHECKPOINTER_COMP(restartpoints_performed);
CHECKPOINTER_COMP(write_time);
CHECKPOINTER_COMP(sync_time);
CHECKPOINTER_COMP(buffers_written);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0cea320c00e..e65cbf41e9f 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1193,6 +1193,24 @@ pg_stat_get_checkpointer_num_requested(PG_FUNCTION_ARGS)
PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->num_requested);
}
+Datum
+pg_stat_get_checkpointer_restartpoints_timed(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->restartpoints_timed);
+}
+
+Datum
+pg_stat_get_checkpointer_restartpoints_requested(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->restartpoints_requested);
+}
+
+Datum
+pg_stat_get_checkpointer_restartpoints_performed(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->restartpoints_performed);
+}
+
Datum
pg_stat_get_checkpointer_buffers_written(PG_FUNCTION_ARGS)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b8b26c263db..9052f5262a2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5721,6 +5721,21 @@
proname => 'pg_stat_get_checkpointer_num_requested', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_num_requested' },
+{ oid => '8743',
+ descr => 'statistics: number of timed restartpoints started by the checkpointer',
+ proname => 'pg_stat_get_checkpointer_restartpoints_timed', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => '',
+ prosrc => 'pg_stat_get_checkpointer_restartpoints_timed' },
+{ oid => '8744',
+ descr => 'statistics: number of backend requested restartpoints started by the checkpointer',
+ proname => 'pg_stat_get_checkpointer_restartpoints_requested', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => '',
+ prosrc => 'pg_stat_get_checkpointer_restartpoints_requested' },
+{ oid => '8745',
+ descr => 'statistics: number of backend performed restartpoints',
+ proname => 'pg_stat_get_checkpointer_restartpoints_performed', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => '',
+ prosrc => 'pg_stat_get_checkpointer_restartpoints_performed' },
{ oid => '2771',
descr => 'statistics: number of buffers written by the checkpointer',
proname => 'pg_stat_get_checkpointer_buffers_written', provolatile => 's',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fc93d0d731d..ab91b3b367d 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -262,6 +262,9 @@ typedef struct PgStat_CheckpointerStats
{
PgStat_Counter num_timed;
PgStat_Counter num_requested;
+ PgStat_Counter restartpoints_timed;
+ PgStat_Counter restartpoints_requested;
+ PgStat_Counter restartpoints_performed;
PgStat_Counter write_time; /* times in milliseconds */
PgStat_Counter sync_time;
PgStat_Counter buffers_written;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 05070393b99..f645e8486bf 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1822,6 +1822,9 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_buf_written_clean() AS buffers_cle
pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
pg_stat_checkpointer| SELECT pg_stat_get_checkpointer_num_timed() AS num_timed,
pg_stat_get_checkpointer_num_requested() AS num_requested,
+ pg_stat_get_checkpointer_restartpoints_timed() AS restartpoints_timed,
+ pg_stat_get_checkpointer_restartpoints_requested() AS restartpoints_req,
+ pg_stat_get_checkpointer_restartpoints_performed() AS restartpoints_done,
pg_stat_get_checkpointer_write_time() AS write_time,
pg_stat_get_checkpointer_sync_time() AS sync_time,
pg_stat_get_checkpointer_buffers_written() AS buffers_written,
--
2.39.3 (Apple Git-145)
On Sat, Dec 23, 2023 at 12:04 AM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
On Mon, Dec 4, 2023 at 3:50 AM Anton A. Melnikov <a.melnikov@postgrespro.ru> wrote:
Thanks for remarks!
On 28.11.2023 21:34, Alexander Korotkov wrote:
After examining the second patch
("v2-0001-Add-restartpoint-stats.patch"), it appears that adding
additional statistics as outlined in the patch is the most suitable
approach to address the concerns raised. This solution provides more
visibility into the system's behavior without altering its core
mechanics.Agreed. I left only this variant of the patch and rework it due to commit 96f05261.
So the new counters is in the pg_stat_checkpointer view now.
Please see the v3-0001-add-restartpoints-stats.patch attached.However, it's essential that this additional functionality
is accompanied by comprehensive documentation to ensure clear
understanding and ease of use by the PostgreSQL community.Please consider expanding the documentation to include detailed
explanations of the new statistics and their implications in various
scenarios.In the separate v3-0002-doc-for-restartpoints-stats.patch i added the definitions
of the new counters into the "28.2.15. pg_stat_checkpointer" section
and explanation of them with examples into the "30.5.WAL Configuration" one.Would be glad for any comments and and concerns.
I made some grammar corrections to the docs and have written the commit message.
I think this patch now looks good. I'm going to push this if there are no objections.
Pushed!
------
Regards,
Alexander Korotkov
On 25.12.2023 02:38, Alexander Korotkov wrote:
Pushed!
Thanks a lot!
With the best regards!
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Fri, Dec 22, 2023 at 11:04 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
Hi, Anton!
On Mon, Dec 4, 2023 at 3:50 AM Anton A. Melnikov <a.melnikov@postgrespro.ru> wrote:
Thanks for remarks!
On 28.11.2023 21:34, Alexander Korotkov wrote:
After examining the second patch
("v2-0001-Add-restartpoint-stats.patch"), it appears that adding
additional statistics as outlined in the patch is the most suitable
approach to address the concerns raised. This solution provides more
visibility into the system's behavior without altering its core
mechanics.Agreed. I left only this variant of the patch and rework it due to commit 96f05261.
So the new counters is in the pg_stat_checkpointer view now.
Please see the v3-0001-add-restartpoints-stats.patch attached.However, it's essential that this additional functionality
is accompanied by comprehensive documentation to ensure clear
understanding and ease of use by the PostgreSQL community.Please consider expanding the documentation to include detailed
explanations of the new statistics and their implications in various
scenarios.In the separate v3-0002-doc-for-restartpoints-stats.patch i added the definitions
of the new counters into the "28.2.15. pg_stat_checkpointer" section
and explanation of them with examples into the "30.5.WAL Configuration" one.Would be glad for any comments and and concerns.
I made some grammar corrections to the docs and have written the commit message.
I think this patch now looks good. I'm going to push this if there are no objections.
Per the docs, the sync_time, write_time and buffers_written only apply
to checkpoints, not restartpoints. Is this correct? AFAICT from a
quick look at the code they include both checkpoints and restartpoints
in which case I think the docs should be clarified to indicate that?
(Or if I'm wrong, and it doesn't include them, then shouldn't we have
separate counters for them?)
//Magnus
On Sat, Mar 9, 2024 at 4:38 PM Magnus Hagander <magnus@hagander.net> wrote:
Per the docs, the sync_time, write_time and buffers_written only apply
to checkpoints, not restartpoints. Is this correct? AFAICT from a
quick look at the code they include both checkpoints and restartpoints
in which case I think the docs should be clarified to indicate that?
Right, these fields work as before reflecting both checkpoints and
restartpoints. Documentation said checkpoints implying restartpoints
as well. Now that we distinguish stats for checkpoints and
restartpoints, we need to update the docs. Please, check the patch
attached.
------
Regards,
Alexander Korotkov
Attachments:
rename_pg_stat_get_checkpointer_fields.patchapplication/octet-stream; name=rename_pg_stat_get_checkpointer_fields.patchDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 8aca08140ea..8736eac2841 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3016,7 +3016,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para>
<para>
Total amount of time that has been spent in the portion of
- checkpoint processing where files are written to disk, in milliseconds
+ processing checkpoints and restartpoints where files are written to disk,
+ in milliseconds
</para></entry>
</row>
@@ -3026,8 +3027,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para>
<para>
Total amount of time that has been spent in the portion of
- checkpoint processing where files are synchronized to disk, in
- milliseconds
+ processing checkpoints and restartpoints where files are synchronized to
+ disk, in milliseconds
</para></entry>
</row>
@@ -3036,7 +3037,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
<structfield>buffers_written</structfield> <type>bigint</type>
</para>
<para>
- Number of buffers written during checkpoints
+ Number of buffers written during checkpoints and restartpoints
</para></entry>
</row>
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 291ed876fca..7da1c3b1765 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5760,12 +5760,12 @@
proparallel => 'r', prorettype => 'timestamptz', proargtypes => '',
prosrc => 'pg_stat_get_bgwriter_stat_reset_time' },
{ oid => '3160',
- descr => 'statistics: checkpoint time spent writing buffers to disk, in milliseconds',
+ descr => 'statistics: checkpoint/restartpoint time spent writing buffers to disk, in milliseconds',
proname => 'pg_stat_get_checkpointer_write_time', provolatile => 's',
proparallel => 'r', prorettype => 'float8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_write_time' },
{ oid => '3161',
- descr => 'statistics: checkpoint time spent synchronizing buffers to disk, in milliseconds',
+ descr => 'statistics: checkpoint/restartpoint time spent synchronizing buffers to disk, in milliseconds',
proname => 'pg_stat_get_checkpointer_sync_time', provolatile => 's',
proparallel => 'r', prorettype => 'float8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_sync_time' },
On 11.03.2024 03:39, Alexander Korotkov wrote:
Now that we distinguish stats for checkpoints and
restartpoints, we need to update the docs. Please, check the patch
attached.
Maybe bring the pg_stat_get_checkpointer_buffers_written() description in consistent with these changes?
Like that:
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5740 +5740 @@
- descr => 'statistics: number of buffers written by the checkpointer',
+ descr => 'statistics: number of buffers written during checkpoints and restartpoints',
And after i took a fresh look at this patch i noticed a simple way to extract
write_time and sync_time counters for restartpoints too.
What do you think, is there a sense to do this?
With the best wishes,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Mon, Mar 11, 2024 at 5:43 AM Anton A. Melnikov
<a.melnikov@postgrespro.ru> wrote:
On 11.03.2024 03:39, Alexander Korotkov wrote:
Now that we distinguish stats for checkpoints and
restartpoints, we need to update the docs. Please, check the patch
attached.Maybe bring the pg_stat_get_checkpointer_buffers_written() description in consistent with these changes?
Like that:--- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5740 +5740 @@ - descr => 'statistics: number of buffers written by the checkpointer', + descr => 'statistics: number of buffers written during checkpoints and restartpoints',
This makes sense. I've included this into the revised patch.
And after i took a fresh look at this patch i noticed a simple way to extract
write_time and sync_time counters for restartpoints too.What do you think, is there a sense to do this?
I'm not sure we need this. The ways we trigger checkpoints and
restartpoints are different. This is why we needed separate
statistics. But the process of writing buffers is the same.
------
Regards,
Alexander Korotkov
Attachments:
rename_pg_stat_get_checkpointer_fields_v2.patchapplication/octet-stream; name=rename_pg_stat_get_checkpointer_fields_v2.patchDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 8aca08140ea..8736eac2841 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3016,7 +3016,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para>
<para>
Total amount of time that has been spent in the portion of
- checkpoint processing where files are written to disk, in milliseconds
+ processing checkpoints and restartpoints where files are written to disk,
+ in milliseconds
</para></entry>
</row>
@@ -3026,8 +3027,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para>
<para>
Total amount of time that has been spent in the portion of
- checkpoint processing where files are synchronized to disk, in
- milliseconds
+ processing checkpoints and restartpoints where files are synchronized to
+ disk, in milliseconds
</para></entry>
</row>
@@ -3036,7 +3037,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
<structfield>buffers_written</structfield> <type>bigint</type>
</para>
<para>
- Number of buffers written during checkpoints
+ Number of buffers written during checkpoints and restartpoints
</para></entry>
</row>
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 291ed876fca..4af5c2e8470 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5737,7 +5737,7 @@
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_restartpoints_performed' },
{ oid => '2771',
- descr => 'statistics: number of buffers written by the checkpointer',
+ descr => 'statistics: number of buffers written during checkpoints and restartpoints',
proname => 'pg_stat_get_checkpointer_buffers_written', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_buffers_written' },
@@ -5760,12 +5760,12 @@
proparallel => 'r', prorettype => 'timestamptz', proargtypes => '',
prosrc => 'pg_stat_get_bgwriter_stat_reset_time' },
{ oid => '3160',
- descr => 'statistics: checkpoint time spent writing buffers to disk, in milliseconds',
+ descr => 'statistics: checkpoint/restartpoint time spent writing buffers to disk, in milliseconds',
proname => 'pg_stat_get_checkpointer_write_time', provolatile => 's',
proparallel => 'r', prorettype => 'float8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_write_time' },
{ oid => '3161',
- descr => 'statistics: checkpoint time spent synchronizing buffers to disk, in milliseconds',
+ descr => 'statistics: checkpoint/restartpoint time spent synchronizing buffers to disk, in milliseconds',
proname => 'pg_stat_get_checkpointer_sync_time', provolatile => 's',
proparallel => 'r', prorettype => 'float8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_sync_time' },
On Mon, Mar 11, 2024 at 11:48 AM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
On Mon, Mar 11, 2024 at 5:43 AM Anton A. Melnikov
<a.melnikov@postgrespro.ru> wrote:On 11.03.2024 03:39, Alexander Korotkov wrote:
Now that we distinguish stats for checkpoints and
restartpoints, we need to update the docs. Please, check the patch
attached.Maybe bring the pg_stat_get_checkpointer_buffers_written() description in consistent with these changes?
Like that:--- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5740 +5740 @@ - descr => 'statistics: number of buffers written by the checkpointer', + descr => 'statistics: number of buffers written during checkpoints and restartpoints',This makes sense. I've included this into the revised patch.
Pushed.
------
Regards,
Alexander Korotkov
On 14.03.2024 03:19, Alexander Korotkov wrote:
Pushed.
Thanks!
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2024/03/14 9:19, Alexander Korotkov wrote:
On Mon, Mar 11, 2024 at 11:48 AM Alexander Korotkov
<aekorotkov@gmail.com> wrote:On Mon, Mar 11, 2024 at 5:43 AM Anton A. Melnikov
<a.melnikov@postgrespro.ru> wrote:On 11.03.2024 03:39, Alexander Korotkov wrote:
Now that we distinguish stats for checkpoints and
restartpoints, we need to update the docs. Please, check the patch
attached.Maybe bring the pg_stat_get_checkpointer_buffers_written() description in consistent with these changes?
Like that:--- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5740 +5740 @@ - descr => 'statistics: number of buffers written by the checkpointer', + descr => 'statistics: number of buffers written during checkpoints and restartpoints',This makes sense. I've included this into the revised patch.
Pushed.
If I understand correctly, restartpoints_timed and restartpoints_done were
separated because a restartpoint can be skipped. restartpoints_timed counts
when a restartpoint is triggered by a timeout, whether it runs or not,
while restartpoints_done only tracks completed restartpoints.
Similarly, I believe checkpoints should be handled the same way.
Checkpoints can also be skipped when the system is idle, but currently,
num_timed counts even the skipped ones, despite its documentation stating
it's the "Number of scheduled checkpoints that have been performed."
Why not separate num_timed into something like checkpoints_timed and
checkpoints_done to reflect these different counters?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Hi!
On 13.09.2024 18:20, Fujii Masao wrote:
If I understand correctly, restartpoints_timed and restartpoints_done were
separated because a restartpoint can be skipped. restartpoints_timed counts
when a restartpoint is triggered by a timeout, whether it runs or not,
while restartpoints_done only tracks completed restartpoints.Similarly, I believe checkpoints should be handled the same way.
Checkpoints can also be skipped when the system is idle, but currently,
num_timed counts even the skipped ones, despite its documentation stating
it's the "Number of scheduled checkpoints that have been performed."Why not separate num_timed into something like checkpoints_timed and
checkpoints_done to reflect these different counters?
+1
This idea seems quite tenable to me.
There is a small clarification. Now if there were no skipped restartpoints then
restartpoints_done will be equal to restartpoints_timed + restartpoints_req.
Similar for checkpoints.
So i tried to introduce num_done counter for checkpoints in the patch attached.
I'm not sure should we include testing for the case when num_done is less than
num_timed + num_requested to the regress tests. I haven't been able to get it in a short time yet.
E.g. such a case may be obtained when an a error "checkpoints are
occurring too frequently" as follows:
-set checkpoint_timeout = 30 and checkpoint_warning = 40 in the postgresql.conf
-start server
-do periodically bulk insertions in the 1st client (e.g. insert into test values (generate_series(1,1E7));)
-watch for pg_stat_checkpointer in the 2nd one:
# SELECT CURRENT_TIME; select * from pg_stat_checkpointer;
# \watch
After some time, in the log will appear:
2024-09-16 16:38:47.888 MSK [193733] LOG: checkpoints are occurring too frequently (13 seconds apart)
2024-09-16 16:38:47.888 MSK [193733] HINT: Consider increasing the configuration parameter "max_wal_size".
And num_timed + num_requested will become greater than num_done.
Would be nice to find some simpler and faster way.
With the best regards,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v1-0001-Introduce-num_done-counter-in-the-pg_stat_checkpointer.patchtext/x-patch; charset=UTF-8; name=v1-0001-Introduce-num_done-counter-in-the-pg_stat_checkpointer.patchDownload
From fcd0b61d1f1718dbf664cb3509aad16543d65375 Mon Sep 17 00:00:00 2001
From: "Anton A. Melnikov" <a.melnikov@postgrespro.ru>
Date: Mon, 16 Sep 2024 16:12:07 +0300
Subject: [PATCH] Introduce num_done counter in the pg_stat_checkpointer view
that reflects number of really performed checkpoints.
---
doc/src/sgml/monitoring.sgml | 13 +++++-
src/backend/catalog/system_views.sql | 1 +
src/backend/postmaster/checkpointer.c | 2 +
.../utils/activity/pgstat_checkpointer.c | 2 +
src/backend/utils/adt/pgstatfuncs.c | 6 +++
src/include/catalog/pg_proc.dat | 40 +++++++++++--------
src/include/pgstat.h | 1 +
src/test/regress/expected/rules.out | 1 +
8 files changed, 47 insertions(+), 19 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 933de6fe07f..dad7e236a43 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3051,7 +3051,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
<structfield>num_timed</structfield> <type>bigint</type>
</para>
<para>
- Number of scheduled checkpoints that have been performed
+ Number of scheduled checkpoints
</para></entry>
</row>
@@ -3060,7 +3060,16 @@ description | Waiting for a newly initialized WAL file to reach durable storage
<structfield>num_requested</structfield> <type>bigint</type>
</para>
<para>
- Number of requested checkpoints that have been performed
+ Number of backend requested checkpoints
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>num_done</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of checkpoints that have been performed
</para></entry>
</row>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7fd5d256a18..49109dbdc86 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1138,6 +1138,7 @@ CREATE VIEW pg_stat_checkpointer AS
SELECT
pg_stat_get_checkpointer_num_timed() AS num_timed,
pg_stat_get_checkpointer_num_requested() AS num_requested,
+ pg_stat_get_checkpointer_num_performed() AS num_done,
pg_stat_get_checkpointer_restartpoints_timed() AS restartpoints_timed,
pg_stat_get_checkpointer_restartpoints_requested() AS restartpoints_req,
pg_stat_get_checkpointer_restartpoints_performed() AS restartpoints_done,
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index eeb73c85726..06ad2f52f27 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -495,6 +495,8 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
if (do_restartpoint)
PendingCheckpointerStats.restartpoints_performed++;
+ else
+ PendingCheckpointerStats.num_performed++;
}
else
{
diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c
index bbfc9c7e183..4a0a2d1493a 100644
--- a/src/backend/utils/activity/pgstat_checkpointer.c
+++ b/src/backend/utils/activity/pgstat_checkpointer.c
@@ -49,6 +49,7 @@ pgstat_report_checkpointer(void)
#define CHECKPOINTER_ACC(fld) stats_shmem->stats.fld += PendingCheckpointerStats.fld
CHECKPOINTER_ACC(num_timed);
CHECKPOINTER_ACC(num_requested);
+ CHECKPOINTER_ACC(num_performed);
CHECKPOINTER_ACC(restartpoints_timed);
CHECKPOINTER_ACC(restartpoints_requested);
CHECKPOINTER_ACC(restartpoints_performed);
@@ -127,6 +128,7 @@ pgstat_checkpointer_snapshot_cb(void)
#define CHECKPOINTER_COMP(fld) pgStatLocal.snapshot.checkpointer.fld -= reset.fld;
CHECKPOINTER_COMP(num_timed);
CHECKPOINTER_COMP(num_requested);
+ CHECKPOINTER_COMP(num_performed);
CHECKPOINTER_COMP(restartpoints_timed);
CHECKPOINTER_COMP(restartpoints_requested);
CHECKPOINTER_COMP(restartpoints_performed);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 33c7b25560b..5624bef18e7 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1191,6 +1191,12 @@ pg_stat_get_checkpointer_num_requested(PG_FUNCTION_ARGS)
PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->num_requested);
}
+Datum
+pg_stat_get_checkpointer_num_performed(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->num_performed);
+}
+
Datum
pg_stat_get_checkpointer_restartpoints_timed(PG_FUNCTION_ARGS)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 53a081ed886..c8c1b79ff63 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5807,42 +5807,58 @@
proargmodes => '{o,o,o,o,o,o,o}',
proargnames => '{archived_count,last_archived_wal,last_archived_time,failed_count,last_failed_wal,last_failed_time,stats_reset}',
prosrc => 'pg_stat_get_archiver' },
-{ oid => '2769',
+{ oid => '6347',
descr => 'statistics: number of timed checkpoints started by the checkpointer',
proname => 'pg_stat_get_checkpointer_num_timed', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_num_timed' },
-{ oid => '2770',
+{ oid => '6348',
descr => 'statistics: number of backend requested checkpoints started by the checkpointer',
proname => 'pg_stat_get_checkpointer_num_requested', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_num_requested' },
-{ oid => '6327',
+{ oid => '6349',
+ descr => 'statistics: number of checkpoints performed by the checkpointer',
+ proname => 'pg_stat_get_checkpointer_num_performed',
+ provolatile => 's', proparallel => 'r', prorettype => 'int8',
+ proargtypes => '',
+ prosrc => 'pg_stat_get_checkpointer_num_performed' },
+{ oid => '6350',
descr => 'statistics: number of timed restartpoints started by the checkpointer',
proname => 'pg_stat_get_checkpointer_restartpoints_timed', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_restartpoints_timed' },
-{ oid => '6328',
+{ oid => '6351',
descr => 'statistics: number of backend requested restartpoints started by the checkpointer',
proname => 'pg_stat_get_checkpointer_restartpoints_requested',
provolatile => 's', proparallel => 'r', prorettype => 'int8',
proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_restartpoints_requested' },
-{ oid => '6329',
+{ oid => '6352',
descr => 'statistics: number of backend performed restartpoints',
proname => 'pg_stat_get_checkpointer_restartpoints_performed',
provolatile => 's', proparallel => 'r', prorettype => 'int8',
proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_restartpoints_performed' },
-{ oid => '2771',
+{ oid => '6353',
descr => 'statistics: number of buffers written during checkpoints and restartpoints',
proname => 'pg_stat_get_checkpointer_buffers_written', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_buffers_written' },
-{ oid => '6314', descr => 'statistics: last reset for the checkpointer',
+{ oid => '6354', descr => 'statistics: last reset for the checkpointer',
proname => 'pg_stat_get_checkpointer_stat_reset_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_stat_reset_time' },
+{ oid => '6355',
+ descr => 'statistics: checkpoint/restartpoint time spent writing buffers to disk, in milliseconds',
+ proname => 'pg_stat_get_checkpointer_write_time', provolatile => 's',
+ proparallel => 'r', prorettype => 'float8', proargtypes => '',
+ prosrc => 'pg_stat_get_checkpointer_write_time' },
+{ oid => '6356',
+ descr => 'statistics: checkpoint/restartpoint time spent synchronizing buffers to disk, in milliseconds',
+ proname => 'pg_stat_get_checkpointer_sync_time', provolatile => 's',
+ proparallel => 'r', prorettype => 'float8', proargtypes => '',
+ prosrc => 'pg_stat_get_checkpointer_sync_time' },
{ oid => '2772',
descr => 'statistics: number of buffers written by the bgwriter for cleaning dirty buffers',
proname => 'pg_stat_get_bgwriter_buf_written_clean', provolatile => 's',
@@ -5857,16 +5873,6 @@
proname => 'pg_stat_get_bgwriter_stat_reset_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => '',
prosrc => 'pg_stat_get_bgwriter_stat_reset_time' },
-{ oid => '3160',
- descr => 'statistics: checkpoint/restartpoint time spent writing buffers to disk, in milliseconds',
- proname => 'pg_stat_get_checkpointer_write_time', provolatile => 's',
- proparallel => 'r', prorettype => 'float8', proargtypes => '',
- prosrc => 'pg_stat_get_checkpointer_write_time' },
-{ oid => '3161',
- descr => 'statistics: checkpoint/restartpoint time spent synchronizing buffers to disk, in milliseconds',
- proname => 'pg_stat_get_checkpointer_sync_time', provolatile => 's',
- proparallel => 'r', prorettype => 'float8', proargtypes => '',
- prosrc => 'pg_stat_get_checkpointer_sync_time' },
{ oid => '2859', descr => 'statistics: number of buffer allocations',
proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2c91168a1..39850d8f14f 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -294,6 +294,7 @@ typedef struct PgStat_CheckpointerStats
{
PgStat_Counter num_timed;
PgStat_Counter num_requested;
+ PgStat_Counter num_performed;
PgStat_Counter restartpoints_timed;
PgStat_Counter restartpoints_requested;
PgStat_Counter restartpoints_performed;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a1626f3fae9..f5434d8365c 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1824,6 +1824,7 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_buf_written_clean() AS buffers_cle
pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
pg_stat_checkpointer| SELECT pg_stat_get_checkpointer_num_timed() AS num_timed,
pg_stat_get_checkpointer_num_requested() AS num_requested,
+ pg_stat_get_checkpointer_num_performed() AS num_done,
pg_stat_get_checkpointer_restartpoints_timed() AS restartpoints_timed,
pg_stat_get_checkpointer_restartpoints_requested() AS restartpoints_req,
pg_stat_get_checkpointer_restartpoints_performed() AS restartpoints_done,
--
2.46.0
On 2024/09/16 23:30, Anton A. Melnikov wrote:
+1
This idea seems quite tenable to me.There is a small clarification. Now if there were no skipped restartpoints then
restartpoints_done will be equal to restartpoints_timed + restartpoints_req.
Similar for checkpoints.
So i tried to introduce num_done counter for checkpoints in the patch attached.
Thanks for the patch! I believe this change is targeted for v18. For v17, however,
we should update the description of num_timed in the documentation. Thought?
Here's a suggestion:
"Number of scheduled checkpoints due to timeout. Note that checkpoints may be
skipped if the server has been idle since the last one, and this value counts
both completed and skipped checkpoints."
Regarding the patch:
if (do_restartpoint)
PendingCheckpointerStats.restartpoints_performed++;
+ else
+ PendingCheckpointerStats.num_performed++;
I expected the counter not to be incremented when a checkpoint is skipped,
but in this code, when a checkpoint is skipped, ckpt_performed is set to true,
triggering the counter increment. This seems wrong.
I'm not sure should we include testing for the case when num_done is less than
num_timed + num_requested to the regress tests. I haven't been able to get it in a short time yet.
I'm not sure if that test is really necessary...
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2024/09/17 11:47, Fujii Masao wrote:
On 2024/09/16 23:30, Anton A. Melnikov wrote:
+1
This idea seems quite tenable to me.There is a small clarification. Now if there were no skipped restartpoints then
restartpoints_done will be equal to restartpoints_timed + restartpoints_req.
Similar for checkpoints.
So i tried to introduce num_done counter for checkpoints in the patch attached.Thanks for the patch! I believe this change is targeted for v18. For v17, however,
we should update the description of num_timed in the documentation. Thought?
Here's a suggestion:"Number of scheduled checkpoints due to timeout. Note that checkpoints may be
skipped if the server has been idle since the last one, and this value counts
both completed and skipped checkpoints."
Patch attached.
Unless there are any objections, I plan to commit this and back-patch it to v17.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
v1-0001-docs-Improve-the-description-of-num_timed-column-.patchtext/plain; charset=UTF-8; name=v1-0001-docs-Improve-the-description-of-num_timed-column-.patchDownload
From 96dd9e0f923aef44ee1d73ea2bafea8d9db805bf Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Wed, 18 Sep 2024 19:03:53 +0900
Subject: [PATCH v1] docs: Improve the description of num_timed column in
pg_stat_checkpointer.
The previous documentation stated that num_timed reflects the number of
scheduled checkpoints performed. However, checkpoints may be skipped
if the server has been idle, and num_timed counts both skipped and completed
checkpoints. This commit clarifies the description to make it clear that
the counter includes both skipped and completed checkpoints.
Back-patch to v17 where pg_stat_checkpointer was added.
Discussion: https://postgr.es/m/9ea77f40-818d-4841-9dee-158ac8f6e690@oss.nttdata.com
---
doc/src/sgml/monitoring.sgml | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 933de6fe07..a2fda4677d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3051,7 +3051,10 @@ description | Waiting for a newly initialized WAL file to reach durable storage
<structfield>num_timed</structfield> <type>bigint</type>
</para>
<para>
- Number of scheduled checkpoints that have been performed
+ Number of scheduled checkpoints due to timeout.
+ Note that checkpoints may be skipped if the server has been idle
+ since the last one, and this value counts both completed and
+ skipped checkpoints
</para></entry>
</row>
--
2.45.2
On Wed, Sep 18, 2024 at 1:21 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2024/09/17 11:47, Fujii Masao wrote:
On 2024/09/16 23:30, Anton A. Melnikov wrote:
+1
This idea seems quite tenable to me.There is a small clarification. Now if there were no skipped restartpoints then
restartpoints_done will be equal to restartpoints_timed + restartpoints_req.
Similar for checkpoints.
So i tried to introduce num_done counter for checkpoints in the patch attached.Thanks for the patch! I believe this change is targeted for v18. For v17, however,
we should update the description of num_timed in the documentation. Thought?
Here's a suggestion:"Number of scheduled checkpoints due to timeout. Note that checkpoints may be
skipped if the server has been idle since the last one, and this value counts
both completed and skipped checkpoints."Patch attached.
Unless there are any objections, I plan to commit this and back-patch it to v17.
I've checked this patch, it looks good to me.
Generally, it looks like I should be in charge for this, given I've
committed previous patch by Anton. Thank you for reacting here faster
than me. Please, go ahead with the patch.
------
Regards,
Alexander Korotkov
Supabase
Fujii, Alexander thanks a lot!
On 17.09.2024 05:47, Fujii Masao wrote:
Regarding the patch: if (do_restartpoint) PendingCheckpointerStats.restartpoints_performed++; + else + PendingCheckpointerStats.num_performed++;I expected the counter not to be incremented when a checkpoint is skipped,
but in this code, when a checkpoint is skipped, ckpt_performed is set to true,
triggering the counter increment. This seems wrong.
Tried to fix it via returning bool value from the CreateCheckPoint()
similarly to the CreateRestartPoint().
And slightly adjusted the patch so that it could be applied after yours.
With the best wishes,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v2-0002-Introduce-num_done-counter-in-the-pg_stat_checkpoint.patchtext/x-patch; charset=UTF-8; name=v2-0002-Introduce-num_done-counter-in-the-pg_stat_checkpoint.patchDownload
From 97595f65cb12eb2243e1b7391e1bc77bd161f41c Mon Sep 17 00:00:00 2001
From: "Anton A. Melnikov" <a.melnikov@postgrespro.ru>
Date: Mon, 16 Sep 2024 16:12:07 +0300
Subject: [PATCH] Introduce num_done counter in the pg_stat_checkpointer view
that reflects number of really performed checkpoints.
---
doc/src/sgml/monitoring.sgml | 11 ++++-
src/backend/access/transam/xlog.c | 6 ++-
src/backend/catalog/system_views.sql | 1 +
src/backend/postmaster/checkpointer.c | 5 ++-
.../utils/activity/pgstat_checkpointer.c | 2 +
src/backend/utils/adt/pgstatfuncs.c | 6 +++
src/include/access/xlog.h | 2 +-
src/include/catalog/pg_proc.dat | 40 +++++++++++--------
src/include/pgstat.h | 1 +
src/test/regress/expected/rules.out | 1 +
10 files changed, 52 insertions(+), 23 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a2fda4677d7..19bf0164f1c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3063,7 +3063,16 @@ description | Waiting for a newly initialized WAL file to reach durable storage
<structfield>num_requested</structfield> <type>bigint</type>
</para>
<para>
- Number of requested checkpoints that have been performed
+ Number of backend requested checkpoints
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>num_done</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of checkpoints that have been performed
</para></entry>
</row>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 853ab06812b..ca1155567dc 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6879,7 +6879,7 @@ update_checkpoint_display(int flags, bool restartpoint, bool reset)
* both the record marking the completion of the checkpoint and the location
* from which WAL replay would begin if needed.
*/
-void
+bool
CreateCheckPoint(int flags)
{
bool shutdown;
@@ -6971,7 +6971,7 @@ CreateCheckPoint(int flags)
END_CRIT_SECTION();
ereport(DEBUG1,
(errmsg_internal("checkpoint skipped because system is idle")));
- return;
+ return false;
}
}
@@ -7353,6 +7353,8 @@ CreateCheckPoint(int flags)
CheckpointStats.ckpt_segs_added,
CheckpointStats.ckpt_segs_removed,
CheckpointStats.ckpt_segs_recycled);
+
+ return true;
}
/*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7fd5d256a18..49109dbdc86 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1138,6 +1138,7 @@ CREATE VIEW pg_stat_checkpointer AS
SELECT
pg_stat_get_checkpointer_num_timed() AS num_timed,
pg_stat_get_checkpointer_num_requested() AS num_requested,
+ pg_stat_get_checkpointer_num_performed() AS num_done,
pg_stat_get_checkpointer_restartpoints_timed() AS restartpoints_timed,
pg_stat_get_checkpointer_restartpoints_requested() AS restartpoints_req,
pg_stat_get_checkpointer_restartpoints_performed() AS restartpoints_done,
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index eeb73c85726..ef29cb439b2 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -461,8 +461,7 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
*/
if (!do_restartpoint)
{
- CreateCheckPoint(flags);
- ckpt_performed = true;
+ ckpt_performed = CreateCheckPoint(flags);
}
else
ckpt_performed = CreateRestartPoint(flags);
@@ -495,6 +494,8 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
if (do_restartpoint)
PendingCheckpointerStats.restartpoints_performed++;
+ else
+ PendingCheckpointerStats.num_performed++;
}
else
{
diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c
index bbfc9c7e183..4a0a2d1493a 100644
--- a/src/backend/utils/activity/pgstat_checkpointer.c
+++ b/src/backend/utils/activity/pgstat_checkpointer.c
@@ -49,6 +49,7 @@ pgstat_report_checkpointer(void)
#define CHECKPOINTER_ACC(fld) stats_shmem->stats.fld += PendingCheckpointerStats.fld
CHECKPOINTER_ACC(num_timed);
CHECKPOINTER_ACC(num_requested);
+ CHECKPOINTER_ACC(num_performed);
CHECKPOINTER_ACC(restartpoints_timed);
CHECKPOINTER_ACC(restartpoints_requested);
CHECKPOINTER_ACC(restartpoints_performed);
@@ -127,6 +128,7 @@ pgstat_checkpointer_snapshot_cb(void)
#define CHECKPOINTER_COMP(fld) pgStatLocal.snapshot.checkpointer.fld -= reset.fld;
CHECKPOINTER_COMP(num_timed);
CHECKPOINTER_COMP(num_requested);
+ CHECKPOINTER_COMP(num_performed);
CHECKPOINTER_COMP(restartpoints_timed);
CHECKPOINTER_COMP(restartpoints_requested);
CHECKPOINTER_COMP(restartpoints_performed);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9c23ac7c8c8..17b0fc02ef0 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1191,6 +1191,12 @@ pg_stat_get_checkpointer_num_requested(PG_FUNCTION_ARGS)
PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->num_requested);
}
+Datum
+pg_stat_get_checkpointer_num_performed(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->num_performed);
+}
+
Datum
pg_stat_get_checkpointer_restartpoints_timed(PG_FUNCTION_ARGS)
{
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 083810f5b4c..36f6e4e4b4e 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -239,7 +239,7 @@ extern void LocalProcessControlFile(bool reset);
extern WalLevel GetActiveWalLevelOnStandby(void);
extern void StartupXLOG(void);
extern void ShutdownXLOG(int code, Datum arg);
-extern void CreateCheckPoint(int flags);
+extern bool CreateCheckPoint(int flags);
extern bool CreateRestartPoint(int flags);
extern WALAvailability GetWALAvailability(XLogRecPtr targetLSN);
extern void XLogPutNextOid(Oid nextOid);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 43f608d7a0a..83c96140b01 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5807,42 +5807,58 @@
proargmodes => '{o,o,o,o,o,o,o}',
proargnames => '{archived_count,last_archived_wal,last_archived_time,failed_count,last_failed_wal,last_failed_time,stats_reset}',
prosrc => 'pg_stat_get_archiver' },
-{ oid => '2769',
+{ oid => '6347',
descr => 'statistics: number of timed checkpoints started by the checkpointer',
proname => 'pg_stat_get_checkpointer_num_timed', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_num_timed' },
-{ oid => '2770',
+{ oid => '6348',
descr => 'statistics: number of backend requested checkpoints started by the checkpointer',
proname => 'pg_stat_get_checkpointer_num_requested', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_num_requested' },
-{ oid => '6327',
+{ oid => '6349',
+ descr => 'statistics: number of checkpoints performed by the checkpointer',
+ proname => 'pg_stat_get_checkpointer_num_performed',
+ provolatile => 's', proparallel => 'r', prorettype => 'int8',
+ proargtypes => '',
+ prosrc => 'pg_stat_get_checkpointer_num_performed' },
+{ oid => '6350',
descr => 'statistics: number of timed restartpoints started by the checkpointer',
proname => 'pg_stat_get_checkpointer_restartpoints_timed', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_restartpoints_timed' },
-{ oid => '6328',
+{ oid => '6351',
descr => 'statistics: number of backend requested restartpoints started by the checkpointer',
proname => 'pg_stat_get_checkpointer_restartpoints_requested',
provolatile => 's', proparallel => 'r', prorettype => 'int8',
proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_restartpoints_requested' },
-{ oid => '6329',
+{ oid => '6352',
descr => 'statistics: number of backend performed restartpoints',
proname => 'pg_stat_get_checkpointer_restartpoints_performed',
provolatile => 's', proparallel => 'r', prorettype => 'int8',
proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_restartpoints_performed' },
-{ oid => '2771',
+{ oid => '6353',
descr => 'statistics: number of buffers written during checkpoints and restartpoints',
proname => 'pg_stat_get_checkpointer_buffers_written', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_buffers_written' },
-{ oid => '6314', descr => 'statistics: last reset for the checkpointer',
+{ oid => '6354', descr => 'statistics: last reset for the checkpointer',
proname => 'pg_stat_get_checkpointer_stat_reset_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_stat_reset_time' },
+{ oid => '6355',
+ descr => 'statistics: checkpoint/restartpoint time spent writing buffers to disk, in milliseconds',
+ proname => 'pg_stat_get_checkpointer_write_time', provolatile => 's',
+ proparallel => 'r', prorettype => 'float8', proargtypes => '',
+ prosrc => 'pg_stat_get_checkpointer_write_time' },
+{ oid => '6356',
+ descr => 'statistics: checkpoint/restartpoint time spent synchronizing buffers to disk, in milliseconds',
+ proname => 'pg_stat_get_checkpointer_sync_time', provolatile => 's',
+ proparallel => 'r', prorettype => 'float8', proargtypes => '',
+ prosrc => 'pg_stat_get_checkpointer_sync_time' },
{ oid => '2772',
descr => 'statistics: number of buffers written by the bgwriter for cleaning dirty buffers',
proname => 'pg_stat_get_bgwriter_buf_written_clean', provolatile => 's',
@@ -5857,16 +5873,6 @@
proname => 'pg_stat_get_bgwriter_stat_reset_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => '',
prosrc => 'pg_stat_get_bgwriter_stat_reset_time' },
-{ oid => '3160',
- descr => 'statistics: checkpoint/restartpoint time spent writing buffers to disk, in milliseconds',
- proname => 'pg_stat_get_checkpointer_write_time', provolatile => 's',
- proparallel => 'r', prorettype => 'float8', proargtypes => '',
- prosrc => 'pg_stat_get_checkpointer_write_time' },
-{ oid => '3161',
- descr => 'statistics: checkpoint/restartpoint time spent synchronizing buffers to disk, in milliseconds',
- proname => 'pg_stat_get_checkpointer_sync_time', provolatile => 's',
- proparallel => 'r', prorettype => 'float8', proargtypes => '',
- prosrc => 'pg_stat_get_checkpointer_sync_time' },
{ oid => '2859', descr => 'statistics: number of buffer allocations',
proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 4752dfe7197..476acd680c0 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -294,6 +294,7 @@ typedef struct PgStat_CheckpointerStats
{
PgStat_Counter num_timed;
PgStat_Counter num_requested;
+ PgStat_Counter num_performed;
PgStat_Counter restartpoints_timed;
PgStat_Counter restartpoints_requested;
PgStat_Counter restartpoints_performed;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a1626f3fae9..f5434d8365c 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1824,6 +1824,7 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_buf_written_clean() AS buffers_cle
pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
pg_stat_checkpointer| SELECT pg_stat_get_checkpointer_num_timed() AS num_timed,
pg_stat_get_checkpointer_num_requested() AS num_requested,
+ pg_stat_get_checkpointer_num_performed() AS num_done,
pg_stat_get_checkpointer_restartpoints_timed() AS restartpoints_timed,
pg_stat_get_checkpointer_restartpoints_requested() AS restartpoints_req,
pg_stat_get_checkpointer_restartpoints_performed() AS restartpoints_done,
--
2.46.0
On 2024/09/18 21:22, Alexander Korotkov wrote:
Patch attached.
Unless there are any objections, I plan to commit this and back-patch it to v17.I've checked this patch, it looks good to me.
Generally, it looks like I should be in charge for this, given I've
committed previous patch by Anton. Thank you for reacting here faster
than me. Please, go ahead with the patch.
Thanks for the review! Pushed!
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2024/09/18 23:35, Anton A. Melnikov wrote:
Fujii, Alexander thanks a lot!
On 17.09.2024 05:47, Fujii Masao wrote:
Regarding the patch: if (do_restartpoint) PendingCheckpointerStats.restartpoints_performed++; + else + PendingCheckpointerStats.num_performed++;I expected the counter not to be incremented when a checkpoint is skipped,
but in this code, when a checkpoint is skipped, ckpt_performed is set to true,
triggering the counter increment. This seems wrong.Tried to fix it via returning bool value from the CreateCheckPoint()
similarly to the CreateRestartPoint().And slightly adjusted the patch so that it could be applied after yours.
Thanks for updating the patch!
-void
+bool
CreateCheckPoint(int flags)
It would be helpful to explain the new return value in the comment
at the top of this function.
- CreateCheckPoint(flags);
- ckpt_performed = true;
+ ckpt_performed = CreateCheckPoint(flags);
This change could result in the next scheduled checkpoint being
triggered in 15 seconds if a checkpoint is skipped, which isn’t
the intended behavior.
-{ oid => '2769',
+{ oid => '6347',
I don't think that the existing functions need to be reassigned new OIDs.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 18.09.2024 21:04, Fujii Masao wrote:
- CreateCheckPoint(flags); - ckpt_performed = true; + ckpt_performed = CreateCheckPoint(flags);This change could result in the next scheduled checkpoint being
triggered in 15 seconds if a checkpoint is skipped, which isn’t
the intended behavior.
Thanks for pointing this out! This is really bug.
Rearranged the logic a bit to save the previous behavior
in the v3 attached.
-void +bool CreateCheckPoint(int flags)It would be helpful to explain the new return value in the comment
at the top of this function.
Sure. Added an info about return value to the comment.
-{ oid => '2769', +{ oid => '6347',I don't think that the existing functions need to be reassigned new OIDs.
Ok. Left oids as is in the v3. Just added a new one for
pg_stat_get_checkpointer_num_performed().
With the best regards!
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v3-0001-Introduce-num_done-counter-in-the-pg_stat_checkpoint.patchtext/x-patch; charset=UTF-8; name=v3-0001-Introduce-num_done-counter-in-the-pg_stat_checkpoint.patchDownload
From 5832b1beb453b96a6ccbe72c404f2ad5373d0497 Mon Sep 17 00:00:00 2001
From: "Anton A. Melnikov" <a.melnikov@postgrespro.ru>
Date: Thu, 19 Sep 2024 12:51:17 +0300
Subject: [PATCH] Introduce num_done counter in the pg_stat_checkpointer view
that reflects number of really performed checkpoints.
---
doc/src/sgml/monitoring.sgml | 11 ++++-
src/backend/access/transam/xlog.c | 9 +++-
src/backend/catalog/system_views.sql | 1 +
src/backend/postmaster/checkpointer.c | 45 ++++++++++++-------
.../utils/activity/pgstat_checkpointer.c | 2 +
src/backend/utils/adt/pgstatfuncs.c | 6 +++
src/include/access/xlog.h | 2 +-
src/include/catalog/pg_proc.dat | 26 ++++++-----
src/include/pgstat.h | 1 +
src/test/regress/expected/rules.out | 1 +
10 files changed, 74 insertions(+), 30 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a2fda4677d7..19bf0164f1c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3063,7 +3063,16 @@ description | Waiting for a newly initialized WAL file to reach durable storage
<structfield>num_requested</structfield> <type>bigint</type>
</para>
<para>
- Number of requested checkpoints that have been performed
+ Number of backend requested checkpoints
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>num_done</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of checkpoints that have been performed
</para></entry>
</row>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 853ab06812b..c9d37cba453 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6878,8 +6878,11 @@ update_checkpoint_display(int flags, bool restartpoint, bool reset)
* In this case, we only insert an XLOG_CHECKPOINT_SHUTDOWN record, and it's
* both the record marking the completion of the checkpoint and the location
* from which WAL replay would begin if needed.
+ *
+ * Returns true if a new checkpoint was performed or false if checkpoint
+ * was skipped because system is idle.
*/
-void
+bool
CreateCheckPoint(int flags)
{
bool shutdown;
@@ -6971,7 +6974,7 @@ CreateCheckPoint(int flags)
END_CRIT_SECTION();
ereport(DEBUG1,
(errmsg_internal("checkpoint skipped because system is idle")));
- return;
+ return false;
}
}
@@ -7353,6 +7356,8 @@ CreateCheckPoint(int flags)
CheckpointStats.ckpt_segs_added,
CheckpointStats.ckpt_segs_removed,
CheckpointStats.ckpt_segs_recycled);
+
+ return true;
}
/*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7fd5d256a18..49109dbdc86 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1138,6 +1138,7 @@ CREATE VIEW pg_stat_checkpointer AS
SELECT
pg_stat_get_checkpointer_num_timed() AS num_timed,
pg_stat_get_checkpointer_num_requested() AS num_requested,
+ pg_stat_get_checkpointer_num_performed() AS num_done,
pg_stat_get_checkpointer_restartpoints_timed() AS restartpoints_timed,
pg_stat_get_checkpointer_restartpoints_requested() AS restartpoints_req,
pg_stat_get_checkpointer_restartpoints_performed() AS restartpoints_done,
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index eeb73c85726..8d610a3f1a7 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -461,8 +461,7 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
*/
if (!do_restartpoint)
{
- CreateCheckPoint(flags);
- ckpt_performed = true;
+ ckpt_performed = CreateCheckPoint(flags);
}
else
ckpt_performed = CreateRestartPoint(flags);
@@ -484,27 +483,41 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
ConditionVariableBroadcast(&CheckpointerShmem->done_cv);
- if (ckpt_performed)
+ if (!do_restartpoint)
{
/*
- * Note we record the checkpoint start time not end time as
- * last_checkpoint_time. This is so that time-driven
- * checkpoints happen at a predictable spacing.
- */
+ * Note we record the checkpoint start time not end time as
+ * last_checkpoint_time. This is so that time-driven
+ * checkpoints happen at a predictable spacing.
+ */
last_checkpoint_time = now;
- if (do_restartpoint)
- PendingCheckpointerStats.restartpoints_performed++;
+ if(ckpt_performed)
+ PendingCheckpointerStats.num_performed++;
+
}
else
{
- /*
- * We were not able to perform the restartpoint (checkpoints
- * throw an ERROR in case of error). Most likely because we
- * have not received any new checkpoint WAL records since the
- * last restartpoint. Try again in 15 s.
- */
- last_checkpoint_time = now - CheckPointTimeout + 15;
+ if (ckpt_performed)
+ {
+ /*
+ * The same as for checkpoint.
+ * Please see the corresponding comment.
+ */
+ last_checkpoint_time = now;
+
+ PendingCheckpointerStats.restartpoints_performed++;
+ }
+ else
+ {
+ /*
+ * We were not able to perform the restartpoint (checkpoints
+ * throw an ERROR in case of error). Most likely because we
+ * have not received any new checkpoint WAL records since the
+ * last restartpoint. Try again in 15 s.
+ */
+ last_checkpoint_time = now - CheckPointTimeout + 15;
+ }
}
ckpt_active = false;
diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c
index bbfc9c7e183..4a0a2d1493a 100644
--- a/src/backend/utils/activity/pgstat_checkpointer.c
+++ b/src/backend/utils/activity/pgstat_checkpointer.c
@@ -49,6 +49,7 @@ pgstat_report_checkpointer(void)
#define CHECKPOINTER_ACC(fld) stats_shmem->stats.fld += PendingCheckpointerStats.fld
CHECKPOINTER_ACC(num_timed);
CHECKPOINTER_ACC(num_requested);
+ CHECKPOINTER_ACC(num_performed);
CHECKPOINTER_ACC(restartpoints_timed);
CHECKPOINTER_ACC(restartpoints_requested);
CHECKPOINTER_ACC(restartpoints_performed);
@@ -127,6 +128,7 @@ pgstat_checkpointer_snapshot_cb(void)
#define CHECKPOINTER_COMP(fld) pgStatLocal.snapshot.checkpointer.fld -= reset.fld;
CHECKPOINTER_COMP(num_timed);
CHECKPOINTER_COMP(num_requested);
+ CHECKPOINTER_COMP(num_performed);
CHECKPOINTER_COMP(restartpoints_timed);
CHECKPOINTER_COMP(restartpoints_requested);
CHECKPOINTER_COMP(restartpoints_performed);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9c23ac7c8c8..17b0fc02ef0 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1191,6 +1191,12 @@ pg_stat_get_checkpointer_num_requested(PG_FUNCTION_ARGS)
PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->num_requested);
}
+Datum
+pg_stat_get_checkpointer_num_performed(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->num_performed);
+}
+
Datum
pg_stat_get_checkpointer_restartpoints_timed(PG_FUNCTION_ARGS)
{
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 083810f5b4c..36f6e4e4b4e 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -239,7 +239,7 @@ extern void LocalProcessControlFile(bool reset);
extern WalLevel GetActiveWalLevelOnStandby(void);
extern void StartupXLOG(void);
extern void ShutdownXLOG(int code, Datum arg);
-extern void CreateCheckPoint(int flags);
+extern bool CreateCheckPoint(int flags);
extern bool CreateRestartPoint(int flags);
extern WALAvailability GetWALAvailability(XLogRecPtr targetLSN);
extern void XLogPutNextOid(Oid nextOid);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 43f608d7a0a..baa6120e664 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5817,6 +5817,12 @@
proname => 'pg_stat_get_checkpointer_num_requested', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_num_requested' },
+{ oid => '2775',
+ descr => 'statistics: number of checkpoints performed by the checkpointer',
+ proname => 'pg_stat_get_checkpointer_num_performed',
+ provolatile => 's', proparallel => 'r', prorettype => 'int8',
+ proargtypes => '',
+ prosrc => 'pg_stat_get_checkpointer_num_performed' },
{ oid => '6327',
descr => 'statistics: number of timed restartpoints started by the checkpointer',
proname => 'pg_stat_get_checkpointer_restartpoints_timed', provolatile => 's',
@@ -5843,6 +5849,16 @@
proname => 'pg_stat_get_checkpointer_stat_reset_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_stat_reset_time' },
+ { oid => '3160',
+ descr => 'statistics: checkpoint/restartpoint time spent writing buffers to disk, in milliseconds',
+ proname => 'pg_stat_get_checkpointer_write_time', provolatile => 's',
+ proparallel => 'r', prorettype => 'float8', proargtypes => '',
+ prosrc => 'pg_stat_get_checkpointer_write_time' },
+{ oid => '3161',
+ descr => 'statistics: checkpoint/restartpoint time spent synchronizing buffers to disk, in milliseconds',
+ proname => 'pg_stat_get_checkpointer_sync_time', provolatile => 's',
+ proparallel => 'r', prorettype => 'float8', proargtypes => '',
+ prosrc => 'pg_stat_get_checkpointer_sync_time' },
{ oid => '2772',
descr => 'statistics: number of buffers written by the bgwriter for cleaning dirty buffers',
proname => 'pg_stat_get_bgwriter_buf_written_clean', provolatile => 's',
@@ -5857,16 +5873,6 @@
proname => 'pg_stat_get_bgwriter_stat_reset_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => '',
prosrc => 'pg_stat_get_bgwriter_stat_reset_time' },
-{ oid => '3160',
- descr => 'statistics: checkpoint/restartpoint time spent writing buffers to disk, in milliseconds',
- proname => 'pg_stat_get_checkpointer_write_time', provolatile => 's',
- proparallel => 'r', prorettype => 'float8', proargtypes => '',
- prosrc => 'pg_stat_get_checkpointer_write_time' },
-{ oid => '3161',
- descr => 'statistics: checkpoint/restartpoint time spent synchronizing buffers to disk, in milliseconds',
- proname => 'pg_stat_get_checkpointer_sync_time', provolatile => 's',
- proparallel => 'r', prorettype => 'float8', proargtypes => '',
- prosrc => 'pg_stat_get_checkpointer_sync_time' },
{ oid => '2859', descr => 'statistics: number of buffer allocations',
proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 4752dfe7197..476acd680c0 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -294,6 +294,7 @@ typedef struct PgStat_CheckpointerStats
{
PgStat_Counter num_timed;
PgStat_Counter num_requested;
+ PgStat_Counter num_performed;
PgStat_Counter restartpoints_timed;
PgStat_Counter restartpoints_requested;
PgStat_Counter restartpoints_performed;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a1626f3fae9..f5434d8365c 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1824,6 +1824,7 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_buf_written_clean() AS buffers_cle
pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
pg_stat_checkpointer| SELECT pg_stat_get_checkpointer_num_timed() AS num_timed,
pg_stat_get_checkpointer_num_requested() AS num_requested,
+ pg_stat_get_checkpointer_num_performed() AS num_done,
pg_stat_get_checkpointer_restartpoints_timed() AS restartpoints_timed,
pg_stat_get_checkpointer_restartpoints_requested() AS restartpoints_req,
pg_stat_get_checkpointer_restartpoints_performed() AS restartpoints_done,
--
2.46.0
On 2024/09/19 19:16, Anton A. Melnikov wrote:
On 18.09.2024 21:04, Fujii Masao wrote:
- CreateCheckPoint(flags); - ckpt_performed = true; + ckpt_performed = CreateCheckPoint(flags);This change could result in the next scheduled checkpoint being
triggered in 15 seconds if a checkpoint is skipped, which isn’t
the intended behavior.Thanks for pointing this out! This is really bug.
Rearranged the logic a bit to save the previous behavior
in the v3 attached.
Thanks for updating the patch!
I've attached the updated version (0001.patch). I made some cosmetic changes,
including reverting the switch in the entries for pg_stat_get_checkpointer_write_time
and pg_stat_get_checkpointer_sync_time in pg_proc.dat, as I didn’t think
that change was necessary. Could you please review the latest version?
After we commit 0001.patch, how about applying 0002.patch, which updates
the documentation for the pg_stat_checkpointer view to clarify what types
of checkpoints and restartpoints each counter tracks?
In 0002.patch, I also modified the description of num_requested from
"Number of backend requested checkpoints" to remove "backend," as it can
be confusing since num_requested includes requests from sources other than
the backend. Thought?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
v4-0001-Add-num_done-counter-to-the-pg_stat_checkpointer-.patchtext/plain; charset=UTF-8; name=v4-0001-Add-num_done-counter-to-the-pg_stat_checkpointer-.patchDownload
From 859f3fecb4fb4900b6ce12f6346c5d9565fbc072 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Fri, 20 Sep 2024 11:33:07 +0900
Subject: [PATCH v4 1/2] Add num_done counter to the pg_stat_checkpointer view.
Checkpoints can be skipped when the server is idle. The existing num_timed and
num_requested counters in pg_stat_checkpointer track both completed and
skipped checkpoints, but there was no way to count only the completed ones.
This commit introduces the num_done counter, which tracks only completed
checkpoints, making it easier to see how many were actually performed.
Bump catalog version.
Author: Anton A. Melnikov
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/9ea77f40-818d-4841-9dee-158ac8f6e690@oss.nttdata.com
---
doc/src/sgml/monitoring.sgml | 11 +++++-
src/backend/access/transam/xlog.c | 9 ++++-
src/backend/catalog/system_views.sql | 1 +
src/backend/postmaster/checkpointer.c | 39 ++++++++++++-------
.../utils/activity/pgstat_checkpointer.c | 2 +
src/backend/utils/adt/pgstatfuncs.c | 6 +++
src/include/access/xlog.h | 2 +-
src/include/catalog/pg_proc.dat | 6 +++
src/include/pgstat.h | 1 +
src/test/regress/expected/rules.out | 1 +
10 files changed, 60 insertions(+), 18 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a2fda4677d..19bf0164f1 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3063,7 +3063,16 @@ description | Waiting for a newly initialized WAL file to reach durable storage
<structfield>num_requested</structfield> <type>bigint</type>
</para>
<para>
- Number of requested checkpoints that have been performed
+ Number of backend requested checkpoints
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>num_done</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of checkpoints that have been performed
</para></entry>
</row>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 853ab06812..64304d77d3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6878,8 +6878,11 @@ update_checkpoint_display(int flags, bool restartpoint, bool reset)
* In this case, we only insert an XLOG_CHECKPOINT_SHUTDOWN record, and it's
* both the record marking the completion of the checkpoint and the location
* from which WAL replay would begin if needed.
+ *
+ * Returns true if a new checkpoint was performed, or false if it was skipped
+ * because the system was idle.
*/
-void
+bool
CreateCheckPoint(int flags)
{
bool shutdown;
@@ -6971,7 +6974,7 @@ CreateCheckPoint(int flags)
END_CRIT_SECTION();
ereport(DEBUG1,
(errmsg_internal("checkpoint skipped because system is idle")));
- return;
+ return false;
}
}
@@ -7353,6 +7356,8 @@ CreateCheckPoint(int flags)
CheckpointStats.ckpt_segs_added,
CheckpointStats.ckpt_segs_removed,
CheckpointStats.ckpt_segs_recycled);
+
+ return true;
}
/*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7fd5d256a1..49109dbdc8 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1138,6 +1138,7 @@ CREATE VIEW pg_stat_checkpointer AS
SELECT
pg_stat_get_checkpointer_num_timed() AS num_timed,
pg_stat_get_checkpointer_num_requested() AS num_requested,
+ pg_stat_get_checkpointer_num_performed() AS num_done,
pg_stat_get_checkpointer_restartpoints_timed() AS restartpoints_timed,
pg_stat_get_checkpointer_restartpoints_requested() AS restartpoints_req,
pg_stat_get_checkpointer_restartpoints_performed() AS restartpoints_done,
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index eeb73c8572..9087e3f8db 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -460,10 +460,7 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
* Do the checkpoint.
*/
if (!do_restartpoint)
- {
- CreateCheckPoint(flags);
- ckpt_performed = true;
- }
+ ckpt_performed = CreateCheckPoint(flags);
else
ckpt_performed = CreateRestartPoint(flags);
@@ -484,7 +481,7 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
ConditionVariableBroadcast(&CheckpointerShmem->done_cv);
- if (ckpt_performed)
+ if (!do_restartpoint)
{
/*
* Note we record the checkpoint start time not end time as
@@ -493,18 +490,32 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
*/
last_checkpoint_time = now;
- if (do_restartpoint)
- PendingCheckpointerStats.restartpoints_performed++;
+ if (ckpt_performed)
+ PendingCheckpointerStats.num_performed++;
}
else
{
- /*
- * We were not able to perform the restartpoint (checkpoints
- * throw an ERROR in case of error). Most likely because we
- * have not received any new checkpoint WAL records since the
- * last restartpoint. Try again in 15 s.
- */
- last_checkpoint_time = now - CheckPointTimeout + 15;
+ if (ckpt_performed)
+ {
+ /*
+ * The same as for checkpoint. Please see the
+ * corresponding comment.
+ */
+ last_checkpoint_time = now;
+
+ PendingCheckpointerStats.restartpoints_performed++;
+ }
+ else
+ {
+ /*
+ * We were not able to perform the restartpoint
+ * (checkpoints throw an ERROR in case of error). Most
+ * likely because we have not received any new checkpoint
+ * WAL records since the last restartpoint. Try again in
+ * 15 s.
+ */
+ last_checkpoint_time = now - CheckPointTimeout + 15;
+ }
}
ckpt_active = false;
diff --git a/src/backend/utils/activity/pgstat_checkpointer.c b/src/backend/utils/activity/pgstat_checkpointer.c
index bbfc9c7e18..4a0a2d1493 100644
--- a/src/backend/utils/activity/pgstat_checkpointer.c
+++ b/src/backend/utils/activity/pgstat_checkpointer.c
@@ -49,6 +49,7 @@ pgstat_report_checkpointer(void)
#define CHECKPOINTER_ACC(fld) stats_shmem->stats.fld += PendingCheckpointerStats.fld
CHECKPOINTER_ACC(num_timed);
CHECKPOINTER_ACC(num_requested);
+ CHECKPOINTER_ACC(num_performed);
CHECKPOINTER_ACC(restartpoints_timed);
CHECKPOINTER_ACC(restartpoints_requested);
CHECKPOINTER_ACC(restartpoints_performed);
@@ -127,6 +128,7 @@ pgstat_checkpointer_snapshot_cb(void)
#define CHECKPOINTER_COMP(fld) pgStatLocal.snapshot.checkpointer.fld -= reset.fld;
CHECKPOINTER_COMP(num_timed);
CHECKPOINTER_COMP(num_requested);
+ CHECKPOINTER_COMP(num_performed);
CHECKPOINTER_COMP(restartpoints_timed);
CHECKPOINTER_COMP(restartpoints_requested);
CHECKPOINTER_COMP(restartpoints_performed);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9c23ac7c8c..17b0fc02ef 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1191,6 +1191,12 @@ pg_stat_get_checkpointer_num_requested(PG_FUNCTION_ARGS)
PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->num_requested);
}
+Datum
+pg_stat_get_checkpointer_num_performed(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT64(pgstat_fetch_stat_checkpointer()->num_performed);
+}
+
Datum
pg_stat_get_checkpointer_restartpoints_timed(PG_FUNCTION_ARGS)
{
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 083810f5b4..36f6e4e4b4 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -239,7 +239,7 @@ extern void LocalProcessControlFile(bool reset);
extern WalLevel GetActiveWalLevelOnStandby(void);
extern void StartupXLOG(void);
extern void ShutdownXLOG(int code, Datum arg);
-extern void CreateCheckPoint(int flags);
+extern bool CreateCheckPoint(int flags);
extern bool CreateRestartPoint(int flags);
extern WALAvailability GetWALAvailability(XLogRecPtr targetLSN);
extern void XLogPutNextOid(Oid nextOid);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 43f608d7a0..52fca54f3a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5817,6 +5817,12 @@
proname => 'pg_stat_get_checkpointer_num_requested', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_num_requested' },
+{ oid => '2775',
+ descr => 'statistics: number of checkpoints performed by the checkpointer',
+ proname => 'pg_stat_get_checkpointer_num_performed',
+ provolatile => 's', proparallel => 'r', prorettype => 'int8',
+ proargtypes => '',
+ prosrc => 'pg_stat_get_checkpointer_num_performed' },
{ oid => '6327',
descr => 'statistics: number of timed restartpoints started by the checkpointer',
proname => 'pg_stat_get_checkpointer_restartpoints_timed', provolatile => 's',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 4752dfe719..476acd680c 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -294,6 +294,7 @@ typedef struct PgStat_CheckpointerStats
{
PgStat_Counter num_timed;
PgStat_Counter num_requested;
+ PgStat_Counter num_performed;
PgStat_Counter restartpoints_timed;
PgStat_Counter restartpoints_requested;
PgStat_Counter restartpoints_performed;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a1626f3fae..f5434d8365 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1824,6 +1824,7 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_buf_written_clean() AS buffers_cle
pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
pg_stat_checkpointer| SELECT pg_stat_get_checkpointer_num_timed() AS num_timed,
pg_stat_get_checkpointer_num_requested() AS num_requested,
+ pg_stat_get_checkpointer_num_performed() AS num_done,
pg_stat_get_checkpointer_restartpoints_timed() AS restartpoints_timed,
pg_stat_get_checkpointer_restartpoints_requested() AS restartpoints_req,
pg_stat_get_checkpointer_restartpoints_performed() AS restartpoints_done,
--
2.45.2
v4-0002-docs-Enhance-the-pg_stat_checkpointer-view-docume.patchtext/plain; charset=UTF-8; name=v4-0002-docs-Enhance-the-pg_stat_checkpointer-view-docume.patchDownload
From d9311aee5a0e7665eb0ea32928f728a8cd01fab5 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Sat, 21 Sep 2024 00:47:12 +0900
Subject: [PATCH v4 2/2] docs: Enhance the pg_stat_checkpointer view
documentation.
This commit updates the documentation for the pg_stat_checkpointer view
to clarify what kind of checkpoints or restartpoints each counter tracks.
This makes it easier to understand the meaning of each counter.
---
doc/src/sgml/monitoring.sgml | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 19bf0164f1..3484a0490a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3051,10 +3051,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
<structfield>num_timed</structfield> <type>bigint</type>
</para>
<para>
- Number of scheduled checkpoints due to timeout.
- Note that checkpoints may be skipped if the server has been idle
- since the last one, and this value counts both completed and
- skipped checkpoints
+ Number of scheduled checkpoints due to timeout
</para></entry>
</row>
@@ -3063,7 +3060,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
<structfield>num_requested</structfield> <type>bigint</type>
</para>
<para>
- Number of backend requested checkpoints
+ Number of requested checkpoints
</para></entry>
</row>
@@ -3146,6 +3143,18 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</tgroup>
</table>
+ <para>
+ Checkpoints may be skipped if the server has been idle since the last one.
+ <structfield>num_timed</structfield> and
+ <structfield>num_requested</structfield> count both completed and skipped
+ checkpoints, while <structfield>num_done</structfield> tracks only
+ the completed ones. Similarly, restartpoints may be skipped
+ if the last replayed checkpoint record is already the last restartpoint.
+ <structfield>restartpoints_timed</structfield> and
+ <structfield>restartpoints_req</structfield> count both completed and
+ skipped restartpoints, while <structfield>restartpoints_done</structfield>
+ tracks only the completed ones.
+ </para>
</sect2>
<sect2 id="monitoring-pg-stat-wal-view">
--
2.45.2
On 20.09.2024 19:19, Fujii Masao wrote:
I've attached the updated version (0001.patch). I made some cosmetic changes,
including reverting the switch in the entries for pg_stat_get_checkpointer_write_time
and pg_stat_get_checkpointer_sync_time in pg_proc.dat, as I didn’t think
that change was necessary. Could you please review the latest version?
Thanks for corrections!
All looks good for me.
As for switching in the pg_proc.dat entries the idea was to put them in order
so that the pg_stat_get_checkpointer* functions were grouped together.
I don't know if this is the common and accepted practice. Simply i like it better this way.
Sure, if you think it's unnecessary, let it stay as is with minimal diff.
After we commit 0001.patch, how about applying 0002.patch, which updates
the documentation for the pg_stat_checkpointer view to clarify what types
of checkpoints and restartpoints each counter tracks?
I liked that the short definitions of the counters are now separated from
the description of its work features which are combined into one paragraph.
It seems to me that is much more logical and easier to understand.
In addition, checkpoints may be skipped due to "checkpoints are occurring
too frequently" error. Not sure, but maybe add this information to
the new description?
In 0002.patch, I also modified the description of num_requested from
"Number of backend requested checkpoints" to remove "backend," as it can
be confusing since num_requested includes requests from sources other than
the backend. Thought?
Agreed. E.g. from xlog. Then maybe changed it also in the function
descriptions in the pg_proc.dat? For pg_stat_get_checkpointer_num_requested()
and pg_stat_get_checkpointer_restartpoints_requested().
Also checked v4 with the travis patch-tester. All is ok.
With the best wishes!
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2024/09/22 13:55, Anton A. Melnikov wrote:
On 20.09.2024 19:19, Fujii Masao wrote:
I've attached the updated version (0001.patch). I made some cosmetic changes,
including reverting the switch in the entries for pg_stat_get_checkpointer_write_time
and pg_stat_get_checkpointer_sync_time in pg_proc.dat, as I didn’t think
that change was necessary. Could you please review the latest version?Thanks for corrections!
All looks good for me.
Thanks for the review! I've pushed the 0001 patch.
As for switching in the pg_proc.dat entries the idea was to put them in order
so that the pg_stat_get_checkpointer* functions were grouped together.
I don't know if this is the common and accepted practice. Simply i like it better this way.
Sure, if you think it's unnecessary, let it stay as is with minimal diff.
I understand your point, but I didn't made that change to keep the diff minimal,
which should make future back-patching easier.
After we commit 0001.patch, how about applying 0002.patch, which updates
the documentation for the pg_stat_checkpointer view to clarify what types
of checkpoints and restartpoints each counter tracks?I liked that the short definitions of the counters are now separated from
the description of its work features which are combined into one paragraph.
It seems to me that is much more logical and easier to understand.
Thanks for the review!
In addition, checkpoints may be skipped due to "checkpoints are occurring
too frequently" error. Not sure, but maybe add this information to
the new description?
From what I can see in the code, that error message doesn’t seem to indicate
the checkpoint is being skipped. In fact, checkpoints are still happening
actually when that message appears. Am I misunderstanding something?
In 0002.patch, I also modified the description of num_requested from
"Number of backend requested checkpoints" to remove "backend," as it can
be confusing since num_requested includes requests from sources other than
the backend. Thought?Agreed. E.g. from xlog. Then maybe changed it also in the function
descriptions in the pg_proc.dat? For pg_stat_get_checkpointer_num_requested()
and pg_stat_get_checkpointer_restartpoints_requested().
Yes, good catch!
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 30.09.2024 06:26, Fujii Masao wrote:
Thanks for the review! I've pushed the 0001 patch.
Thanks a lot!
As for switching in the pg_proc.dat entries the idea was to put them in order
so that the pg_stat_get_checkpointer* functions were grouped together.
I don't know if this is the common and accepted practice. Simply i like it better this way.
Sure, if you think it's unnecessary, let it stay as is with minimal diff.I understand your point, but I didn't made that change to keep the diff minimal,
which should make future back-patching easier.
Agreed. Its quite reasonable. I've not take into account the backporting
possibility at all. This is of course wrong.
In addition, checkpoints may be skipped due to "checkpoints are occurring
too frequently" error. Not sure, but maybe add this information to
the new description?From what I can see in the code, that error message doesn’t seem to indicate
the checkpoint is being skipped. In fact, checkpoints are still happening
actually when that message appears. Am I misunderstanding something?
No, you are right! This is my oversight. I didn't notice that elevel is just a log
not a error. Thanks!
With the best wishes,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2024/09/30 16:00, Anton A. Melnikov wrote:
On 30.09.2024 06:26, Fujii Masao wrote:
Thanks for the review! I've pushed the 0001 patch.
Thanks a lot!
As for switching in the pg_proc.dat entries the idea was to put them in order
so that the pg_stat_get_checkpointer* functions were grouped together.
I don't know if this is the common and accepted practice. Simply i like it better this way.
Sure, if you think it's unnecessary, let it stay as is with minimal diff.I understand your point, but I didn't made that change to keep the diff minimal,
which should make future back-patching easier.Agreed. Its quite reasonable. I've not take into account the backporting
possibility at all. This is of course wrong.In addition, checkpoints may be skipped due to "checkpoints are occurring
too frequently" error. Not sure, but maybe add this information to
the new description?From what I can see in the code, that error message doesn’t seem to indicate
the checkpoint is being skipped. In fact, checkpoints are still happening
actually when that message appears. Am I misunderstanding something?No, you are right! This is my oversight. I didn't notice that elevel is just a log
not a error. Thanks!
Ok, so I pushed 0002.patch. Thanks for the review!
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2024/09/30 12:26, Fujii Masao wrote:
In 0002.patch, I also modified the description of num_requested from
"Number of backend requested checkpoints" to remove "backend," as it can
be confusing since num_requested includes requests from sources other than
the backend. Thought?Agreed. E.g. from xlog. Then maybe changed it also in the function
descriptions in the pg_proc.dat? For pg_stat_get_checkpointer_num_requested()
and pg_stat_get_checkpointer_restartpoints_requested().Yes, good catch!
Patch attached.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
v1-0001-Improve-descriptions-of-some-pg_stat_checkpoints-.patchtext/plain; charset=UTF-8; name=v1-0001-Improve-descriptions-of-some-pg_stat_checkpoints-.patchDownload
From 5c935f5263fc4d516ceaf8d46c2a06daf035f1f7 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Tue, 8 Oct 2024 21:12:48 +0900
Subject: [PATCH v1] Improve descriptions of some pg_stat_checkpoints functions
in pg_proc.dat.
Previously, the descriptions of pg_stat_get_checkpointer_num_requested(),
pg_stat_get_checkpointer_restartpoints_requested(),
and pg_stat_get_checkpointer_restartpoints_performed() in pg_proc.dat
referred to "backend". This was misleading because these functions report
the number of checkpoints or restartpoints requested or performed
by other than backends as well.
This commit removes "backend" from these descriptions to avoid confusion.
---
src/include/catalog/pg_proc.dat | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 77f54a79e6..2ba144a7aa 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5816,7 +5816,7 @@
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_num_timed' },
{ oid => '2770',
- descr => 'statistics: number of backend requested checkpoints started by the checkpointer',
+ descr => 'statistics: number of requested checkpoints started by the checkpointer',
proname => 'pg_stat_get_checkpointer_num_requested', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_num_requested' },
@@ -5831,13 +5831,13 @@
proparallel => 'r', prorettype => 'int8', proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_restartpoints_timed' },
{ oid => '6328',
- descr => 'statistics: number of backend requested restartpoints started by the checkpointer',
+ descr => 'statistics: number of requested restartpoints started by the checkpointer',
proname => 'pg_stat_get_checkpointer_restartpoints_requested',
provolatile => 's', proparallel => 'r', prorettype => 'int8',
proargtypes => '',
prosrc => 'pg_stat_get_checkpointer_restartpoints_requested' },
{ oid => '6329',
- descr => 'statistics: number of backend performed restartpoints',
+ descr => 'statistics: number of restartpoints performed by the checkpointer',
proname => 'pg_stat_get_checkpointer_restartpoints_performed',
provolatile => 's', proparallel => 'r', prorettype => 'int8',
proargtypes => '',
--
2.46.2
On 08.10.2024 15:42, Fujii Masao wrote:
On 2024/09/30 12:26, Fujii Masao wrote:
In 0002.patch, I also modified the description of num_requested from
"Number of backend requested checkpoints" to remove "backend," as it can
be confusing since num_requested includes requests from sources other than
the backend. Thought?Agreed. E.g. from xlog. Then maybe changed it also in the function
descriptions in the pg_proc.dat? For pg_stat_get_checkpointer_num_requested()
and pg_stat_get_checkpointer_restartpoints_requested().Yes, good catch!
Patch attached.
Looked at the patch. Just in case, checked that neither
“backend completed” nor “backend requested” were found anywhere else.
All is ok for me.
Thanks a lot!
With the best wishes,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2024/10/08 23:16, Anton A. Melnikov wrote:
On 08.10.2024 15:42, Fujii Masao wrote:
On 2024/09/30 12:26, Fujii Masao wrote:
In 0002.patch, I also modified the description of num_requested from
"Number of backend requested checkpoints" to remove "backend," as it can
be confusing since num_requested includes requests from sources other than
the backend. Thought?Agreed. E.g. from xlog. Then maybe changed it also in the function
descriptions in the pg_proc.dat? For pg_stat_get_checkpointer_num_requested()
and pg_stat_get_checkpointer_restartpoints_requested().Yes, good catch!
Patch attached.
Looked at the patch. Just in case, checked that neither
“backend completed” nor “backend requested” were found anywhere else.
All is ok for me.
Thanks for the review! Pushed.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 10.10.2024 18:14, Fujii Masao wrote:
Thanks for the review! Pushed.
Thanks a lot!
With the best regards,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company