Re: Flush some statistics within running transactions

Started by Alvaro Herrera, 4 months ago, 4 messages, hackers

alvherre@2ndquadrant.com

4 months ago

On 2026-Jan-30, Álvaro Herrera wrote:

So apparently the first function to do this in postinit.c was added by
commit c6dda1f48e57 -- and apparently it was mimicking
CheckDeadLockAlert(), which at this time looked like this:

I'm now wondering if CheckDeadLockAlert() really needed to have this in
the first place, or it was just an exercise in paranoia ... it was added
by commit 6753333f55e1, with the discussion in [1], and it's not clear
to me that there was any theoretical or experimental evidence that it
was necessary; the thread didn't discuss it, and the commit message
doesn't either. Added Andres to CC as committer to this thread, maybe
he remembers.

[1]: /messages/by-id/20150115020335.GZ5245@awork2.anarazel.de

Just for laughs I moved the SetLatch call in handle_sig_alarm() to the
bottom, and removed the ones in the handlers, on the theory that by the time
the SetLatch call is reached, all the handlers have already run and thus
the flag variables are set. Everything seems to continue to work:
https://cirrus-ci.com/build/5758839359799296

(Though to be honest, it's not clear to me why it would matter at which
point in handle_sig_alarm we call SetLatch relative to the variables
being set, given that these variables are only going to matter once the
signal handler returns to the original code and the next
CHECK_FOR_INTERRUPTS is hit.)

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

4 months ago

In reply to: Alvaro Herrera (#1)

Hi,

On Fri, Jan 30, 2026 at 03:37:57PM +0100, Álvaro Herrera wrote:

On 2026-Jan-30, Álvaro Herrera wrote:

So apparently the first function to do this in postinit.c was added by
commit c6dda1f48e57 -- and apparently it was mimicking
CheckDeadLockAlert(), which at this time looked like this:

I'm now wondering if CheckDeadLockAlert() really needed to have this in
the first place, or it was just an exercise in paranoia ... it was added
by commit 6753333f55e1, with the discussion in [1], and it's not clear
to me that there was any theoretical or experimental evidence that it
was necessary; the thread didn't discuss it, and the commit message
doesn't either. Added Andres to CC as committer to this thread, maybe
he remembers.

[1] /messages/by-id/20150115020335.GZ5245@awork2.anarazel.de

Just for laughs I moved the SetLatch call in handle_sig_alarm() to the
bottom, and removed the ones in the handlers, on the theory that by the time
the SetLatch call is reached, all the handlers have already run and thus
the flag variables are set. Everything seems to continue to work:
https://cirrus-ci.com/build/5758839359799296

Thanks for having looked at this!

(Though to be honest, it's not clear to me why it would matter at which
point in handle_sig_alarm we call SetLatch relative to the variables
being set, given that these variables are only going to matter once the
signal handler returns to the original code and the next
CHECK_FOR_INTERRUPTS is hit.)

Yeah, I think that we could keep the SetLatch() at the top of handle_sig_alarm().

My understanding is that the signal handler runs to completion without being interrupted
by the code it interrupted. So, by the time the interrupted code (like epoll_wait())
resumes and can check the latch, the entire handler has finished. So, if my understanding
is correct, having SetLatch() at the top or the bottom should not change anything
(as long as we don't have nested signal handlers).
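
To make that reasoning concrete, here is a toy sketch (not PostgreSQL code; toy_latch, toy_flag and observe_after_signal() are hypothetical stand-ins for the latch and a timeout flag). It assumes no nested handlers, and uses raise() so that delivery is deterministic:

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Toy stand-ins; none of this is PostgreSQL's real latch or timeout code. */
static volatile sig_atomic_t toy_latch = 0;   /* models the process latch */
static volatile sig_atomic_t toy_flag = 0;    /* models a timeout-pending flag */

/* Variant 1: set the latch first, then the flag. */
void handler_latch_at_top(int signo)
{
    (void) signo;
    toy_latch = 1;
    toy_flag = 1;
}

/* Variant 2: set the flag first, then the latch. */
void handler_latch_at_bottom(int signo)
{
    (void) signo;
    toy_flag = 1;
    toy_latch = 1;
}

/*
 * Install the handler, send SIGALRM to ourselves, and report what the
 * interrupted code sees afterwards: bit 0 = latch, bit 1 = flag.
 * Returns -1 if the handler could not be installed.
 */
int observe_after_signal(void (*handler)(int))
{
    assert(handler != NULL);
    toy_latch = 0;
    toy_flag = 0;
    if (signal(SIGALRM, handler) == SIG_ERR)
        return -1;
    raise(SIGALRM);   /* does not return until the handler has returned */
    return (int) toy_latch | ((int) toy_flag << 1);
}
```

Both variants leave the interrupted code looking at the same state, because it only gets to look after the handler has returned. This says nothing about the nested-handler case.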

Out of curiosity, I also removed the ones in the handlers (and kept the one in handle_sig_alarm()
at the top), and everything seems to work fine:

https://cirrus-ci.com/build/6277169619402752

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Andres Freund

andres@anarazel.de

4 months ago

In reply to: Bertrand Drouvot (#2)

Hi,

On 2026-02-03 06:19:13 +0000, Bertrand Drouvot wrote:

On Fri, Jan 30, 2026 at 03:37:57PM +0100, Álvaro Herrera wrote:

(Though to be honest, it's not clear to me why it would matter at which
point in handle_sig_alarm we call SetLatch relative to the variables
being set, given that these variables are only going to matter once the
signal handler returns to the original code and the next
CHECK_FOR_INTERRUPTS is hit.)

Yeah, I think that we could keep the SetLatch() at the top of handle_sig_alarm().

Why at the top, rather than at the bottom? I don't think today's signal
handlers rely on it (or at least I hope they don't), but in the past we had
cases where some signal handlers ran code that could lead to a ResetLatch()
being done.

Why does it matter for your patch whether SetLatch() is done multiple times as
part of various timeout handlers? I don't see how repeated SetLatch() calls
could trigger more interference with ProcSleep(). Once the latch is set it is
set (and indeed SetLatch() just returns immediately if it already is set).
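
As a rough illustration of both points, here is a toy model (ToyLatch, wakeups_sent and toy_misbehaving_handler() are hypothetical, not PostgreSQL's real latch implementation). It shows that repeated sets after the first are no-ops, and that a set done at the top can be undone if some handler resets the latch afterwards:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy latch; the wakeup counter is an assumption added for illustration. */
typedef struct ToyLatch
{
    bool is_set;
    int  wakeups_sent;   /* how many times a waiter would have been woken */
} ToyLatch;

void toy_set_latch(ToyLatch *latch)
{
    assert(latch != NULL);
    if (latch->is_set)
        return;              /* already set: nothing more to do */
    latch->is_set = true;
    latch->wakeups_sent++;
}

void toy_reset_latch(ToyLatch *latch)
{
    assert(latch != NULL);
    latch->is_set = false;
}

/* Hypothetical handler whose code path ends up resetting the latch. */
static void toy_misbehaving_handler(ToyLatch *latch)
{
    toy_reset_latch(latch);
}

/*
 * Model a SIGALRM handler that sets the latch either before or after
 * running the per-timeout handlers; report whether the latch is still
 * set once everything has run.
 */
bool latch_survives(bool set_at_top)
{
    ToyLatch latch = {false, 0};

    if (set_at_top)
        toy_set_latch(&latch);
    toy_misbehaving_handler(&latch);
    if (!set_at_top)
        toy_set_latch(&latch);
    return latch.is_set;
}
```

In this model only the set-at-bottom ordering survives a handler that resets the latch, which is the reason for preferring the bottom if any such handler could exist.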

My understanding is that the signal handler runs to completion without being interrupted
by the code it interrupted.

Right. But they can, on some platforms at least, be interrupted by *other*
signal handlers. I don't see any reason to believe that is not happening at
the moment.
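
A minimal POSIX sketch of that nesting, under stated assumptions: the handler names, the depth counters and max_nesting() are made up for illustration, and this is not how PostgreSQL installs its handlers. A second signal is raised while the first handler is still running, and whether it nests depends on the sa_mask passed to sigaction():

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t depth = 0;      /* handlers currently running */
static volatile sig_atomic_t max_depth = 0;  /* deepest nesting observed */

static void enter_handler(void)
{
    depth++;
    if (depth > max_depth)
        max_depth = depth;
}

static void inner_handler(int signo)
{
    (void) signo;
    enter_handler();
    depth--;
}

static void outer_handler(int signo)
{
    (void) signo;
    enter_handler();
    raise(SIGUSR2);   /* another signal arrives while we are still running */
    depth--;
}

/*
 * Deliver SIGUSR1 to ourselves and return the deepest handler nesting seen.
 * If block_inner is true, SIGUSR2 is blocked while outer_handler runs.
 */
int max_nesting(bool block_inner)
{
    struct sigaction outer, inner;
    int rc;

    memset(&outer, 0, sizeof(outer));
    outer.sa_handler = outer_handler;
    sigemptyset(&outer.sa_mask);
    if (block_inner)
        sigaddset(&outer.sa_mask, SIGUSR2);

    memset(&inner, 0, sizeof(inner));
    inner.sa_handler = inner_handler;
    sigemptyset(&inner.sa_mask);

    rc = sigaction(SIGUSR1, &outer, NULL);
    assert(rc == 0);
    rc = sigaction(SIGUSR2, &inner, NULL);
    assert(rc == 0);
    (void) rc;

    depth = 0;
    max_depth = 0;
    raise(SIGUSR1);
    return (int) max_depth;
}
```

With an empty mask the second handler runs on top of the first one. With SIGUSR2 in the mask it is deferred until the first handler has returned, so the two never overlap.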

Out of curiosity, I also removed the ones in the handlers (and kept the one in handle_sig_alarm()
at the top), and everything seems to work fine:

https://cirrus-ci.com/build/6277169619402752

That doesn't tell you very much, I think. Our coverage of the relevant edge
cases isn't that good.

Greetings,

Andres Freund

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

4 months ago

In reply to: Andres Freund (#3)

Hi,

On Tue, Feb 03, 2026 at 12:09:31PM -0500, Andres Freund wrote:

Hi,

On 2026-02-03 06:19:13 +0000, Bertrand Drouvot wrote:

On Fri, Jan 30, 2026 at 03:37:57PM +0100, Álvaro Herrera wrote:

(Though to be honest, it's not clear to me why it would matter at which
point in handle_sig_alarm we call SetLatch relative to the variables
being set, given that these variables are only going to matter once the
signal handler returns to the original code and the next
CHECK_FOR_INTERRUPTS is hit.)

Why does it matter for your patch whether SetLatch() is done multiple times as
part of various timeout handlers? I don't see how repeated SetLatch() calls
could trigger more interference with ProcSleep(). Once the latch is set it is
set (and indeed SetLatch() just returns immediately if it already is set).

Yeah, this was just a finding while diagnosing the ProcSleep() "issue". This
discussion is not relevant in this thread anymore (especially since v5 where the
design changed in such a way that the ProcSleep() "issue" does not appear
anymore).

We could open a dedicated thread if we think it's worth continuing the discussion
about removing the SetLatch() in those handlers (but they are probably harmless
to keep after all). Thanks to you and Álvaro for having shared your thoughts on
it.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com