Wait events for delayed checkpoints
Hi,
You can't tell if your checkpointer is spending a lot of time waiting
around for flags in delayChkptFlags to clear. Trivial patch to add
that. I've managed to see it a few times when checkpointing
repeatedly with a heavy pgbench workload.
I had to stop and think for a moment about whether these events belong
under "WaitEventIPC", "waiting for notification from another process"
or under "WaitEventTimeout", "waiting for a timeout to expire". I
mean, both? It's using sleep-and-poll instead of (say) a CV due to
the economics, we want to make the other side as cheap as possible, so
we don't care about making the checkpointer take some micro-naps in
this case. I feel like the key point here is that it's waiting for
another process to do stuff and unblock it.
Attachments:
0001-Add-wait-events-for-checkpoint-delay-mechanism.patchtext/x-patch; charset=US-ASCII; name=0001-Add-wait-events-for-checkpoint-delay-mechanism.patchDownload+6-1
On Wed, Oct 11, 2023 at 9:13 PM Thomas Munro <thomas.munro@gmail.com> wrote:
You can't tell if your checkpointer is spending a lot of time waiting
around for flags in delayChkptFlags to clear. Trivial patch to add
that. I've managed to see it a few times when checkpointing
repeatedly with a heavy pgbench workload.I had to stop and think for a moment about whether these events belong
under "WaitEventIPC", "waiting for notification from another process"
or under "WaitEventTimeout", "waiting for a timeout to expire". I
mean, both? It's using sleep-and-poll instead of (say) a CV due to
the economics, we want to make the other side as cheap as possible, so
we don't care about making the checkpointer take some micro-naps in
this case. I feel like the key point here is that it's waiting for
another process to do stuff and unblock it.
IPC seems right to me. Yeah, a timeout is being used, but as you say,
that's an implementation detail.
+1 for the idea, too.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Oct 12, 2023 at 01:32:29PM -0400, Robert Haas wrote:
IPC seems right to me. Yeah, a timeout is being used, but as you say,
that's an implementation detail.+1 for the idea, too.
Agreed that timeout makes little sense in this context, and IPC looks
correct.
+ pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_START);
do
{
pg_usleep(10000L); /* wait for 10 msec */
} while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids,
DELAY_CHKPT_START));
+ pgstat_report_wait_end();
HaveVirtualXIDsDelayingChkpt() does immediately a LWLockAcquire()
which would itself report a wait event for ProcArrayLock, overwriting
this new one, no?
--
Michael
On Thu, Oct 12, 2023 at 7:09 PM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Oct 12, 2023 at 01:32:29PM -0400, Robert Haas wrote:
IPC seems right to me. Yeah, a timeout is being used, but as you say,
that's an implementation detail.+1 for the idea, too.
Agreed that timeout makes little sense in this context, and IPC looks
correct.+ pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_START); do { pg_usleep(10000L); /* wait for 10 msec */ } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids, DELAY_CHKPT_START)); + pgstat_report_wait_end();HaveVirtualXIDsDelayingChkpt() does immediately a LWLockAcquire()
which would itself report a wait event for ProcArrayLock, overwriting
this new one, no?
Ah, right: the wait event should be set and cleared around pg_usleep,
not the whole loop.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Oct 13, 2023 at 2:19 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Oct 12, 2023 at 7:09 PM Michael Paquier <michael@paquier.xyz> wrote:
HaveVirtualXIDsDelayingChkpt() does immediately a LWLockAcquire()
which would itself report a wait event for ProcArrayLock, overwriting
this new one, no?Ah, right: the wait event should be set and cleared around pg_usleep,
not the whole loop.
Duh. Yeah. Pushed like that. Thanks both.