Wait events for delayed checkpoints

Started by Thomas Munroover 2 years ago5 messageshackers
Jump to latest
#1Thomas Munro
thomas.munro@gmail.com

Hi,

You can't tell if your checkpointer is spending a lot of time waiting
around for flags in delayChkptFlags to clear. Trivial patch to add
that. I've managed to see it a few times when checkpointing
repeatedly with a heavy pgbench workload.

I had to stop and think for a moment about whether these events belong
under "WaitEventIPC", "waiting for notification from another process"
or under "WaitEventTimeout", "waiting for a timeout to expire". I
mean, both? It's using sleep-and-poll instead of (say) a CV due to
the economics, we want to make the other side as cheap as possible, so
we don't care about making the checkpointer take some micro-naps in
this case. I feel like the key point here is that it's waiting for
another process to do stuff and unblock it.

Attachments:

0001-Add-wait-events-for-checkpoint-delay-mechanism.patchtext/x-patch; charset=US-ASCII; name=0001-Add-wait-events-for-checkpoint-delay-mechanism.patchDownload+6-1
#2Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#1)
Re: Wait events for delayed checkpoints

On Wed, Oct 11, 2023 at 9:13 PM Thomas Munro <thomas.munro@gmail.com> wrote:

You can't tell if your checkpointer is spending a lot of time waiting
around for flags in delayChkptFlags to clear. Trivial patch to add
that. I've managed to see it a few times when checkpointing
repeatedly with a heavy pgbench workload.

I had to stop and think for a moment about whether these events belong
under "WaitEventIPC", "waiting for notification from another process"
or under "WaitEventTimeout", "waiting for a timeout to expire". I
mean, both? It's using sleep-and-poll instead of (say) a CV due to
the economics, we want to make the other side as cheap as possible, so
we don't care about making the checkpointer take some micro-naps in
this case. I feel like the key point here is that it's waiting for
another process to do stuff and unblock it.

IPC seems right to me. Yeah, a timeout is being used, but as you say,
that's an implementation detail.

+1 for the idea, too.

--
Robert Haas
EDB: http://www.enterprisedb.com

#3Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#2)
Re: Wait events for delayed checkpoints

On Thu, Oct 12, 2023 at 01:32:29PM -0400, Robert Haas wrote:

IPC seems right to me. Yeah, a timeout is being used, but as you say,
that's an implementation detail.

+1 for the idea, too.

Agreed that timeout makes little sense in this context, and IPC looks
correct.

+        pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_START);
         do
         {
             pg_usleep(10000L);    /* wait for 10 msec */
         } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids,
                                               DELAY_CHKPT_START));
+        pgstat_report_wait_end();

HaveVirtualXIDsDelayingChkpt() does immediately a LWLockAcquire()
which would itself report a wait event for ProcArrayLock, overwriting
this new one, no?
--
Michael

#4Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#3)
Re: Wait events for delayed checkpoints

On Thu, Oct 12, 2023 at 7:09 PM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Oct 12, 2023 at 01:32:29PM -0400, Robert Haas wrote:

IPC seems right to me. Yeah, a timeout is being used, but as you say,
that's an implementation detail.

+1 for the idea, too.

Agreed that timeout makes little sense in this context, and IPC looks
correct.

+        pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_START);
do
{
pg_usleep(10000L);    /* wait for 10 msec */
} while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids,
DELAY_CHKPT_START));
+        pgstat_report_wait_end();

HaveVirtualXIDsDelayingChkpt() does immediately a LWLockAcquire()
which would itself report a wait event for ProcArrayLock, overwriting
this new one, no?

Ah, right: the wait event should be set and cleared around pg_usleep,
not the whole loop.

--
Robert Haas
EDB: http://www.enterprisedb.com

#5Thomas Munro
thomas.munro@gmail.com
In reply to: Robert Haas (#4)
Re: Wait events for delayed checkpoints

On Fri, Oct 13, 2023 at 2:19 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Oct 12, 2023 at 7:09 PM Michael Paquier <michael@paquier.xyz> wrote:

HaveVirtualXIDsDelayingChkpt() does immediately a LWLockAcquire()
which would itself report a wait event for ProcArrayLock, overwriting
this new one, no?

Ah, right: the wait event should be set and cleared around pg_usleep,
not the whole loop.

Duh. Yeah. Pushed like that. Thanks both.