Add wait events for server logging destination writes

Started by 신성준11 days ago7 messageshackers
Jump to latest
#1신성준
shinsj4653@gmail.com

Hi hackers,

The write(2) calls that flush server log output aren't covered by wait
events. When a backend logs something, the writes go out in:

- write_pipe_chunks(): write(2) to the syslogger pipe
- write_console(): write(2) to stderr (WriteConsoleW() on Windows)

If one of those blocks -- syslogger pipe full, slow console, slow log
device -- pg_stat_activity just shows wait_event = NULL until it
returns. Since NULL usually reads as "on CPU", a backend stuck writing
logs looks like it's doing work, so logging-related stalls are easy to
miss.

Attached is a short series that adds two WaitEventIO events and reports
them around those writes:

IO / SysloggerWrite - write(2) to the syslogger pipe
IO / StderrWrite - write(2) to stderr, and WriteConsoleW()

0001 adds the events and covers the write(2) paths. 0002 does the
Windows WriteConsoleW() path, split out since it's platform-specific.

It only wraps the leaf write call and uses the existing
pgstat_report_wait_start()/end() helpers, so it stays allocation-free
and safe to call from inside the error-reporting path.

I did a quick before/after to make sure the events show up: 8 backends
each emitting large RAISE LOG lines, sampling wait_event from
pg_stat_activity every 50 ms for 20 s.

- logging_collector = on (syslogger pipe):
master: NULL 100.0% (2184/2184)
patched: IO/SysloggerWrite 99.1% (2204/2224), NULL 0.9%

- logging_collector = off (stderr):
master: NULL 100.0% (2144/2144)
patched: IO/StderrWrite 90.7% (1952/2152), NULL 9.3%

On master that wait time is just invisible; with the patch it lands on
the new events. I can send the scripts and raw samples if anyone wants
to reproduce it.

Applies on current master. A couple of things I'm unsure about and
would appreciate input on: whether the event names fit the surrounding
conventions, and whether splitting the Windows path into its own patch
is the right call.

Thanks,
Seongjun Shin

Attachments:

v1-0001-Add-wait-events-for-server-logging-destination-wr.patchapplication/octet-stream; name=v1-0001-Add-wait-events-for-server-logging-destination-wr.patchDownload+8-1
v1-0002-Report-StderrWrite-wait-event-around-WriteConsole.patchapplication/octet-stream; name=v1-0002-Report-StderrWrite-wait-event-around-WriteConsole.patchDownload+3-1
#2신성준
shinsj4653@gmail.com
In reply to: 신성준 (#1)
Re: Add wait events for server logging destination writes

Hi,

cfbot caught a build failure on v1, in the SanityCheck task on Linux
and Windows: elog.c uses pgstat_report_wait_start()/end() and the
WAIT_EVENT_* constants but didn't include utils/wait_event.h. It only
built here because of an accidental transitive include on my machine;
on the CI images the declarations weren't visible.

v2 fixes that by adding the missing #include "utils/wait_event.h" to
elog.c, folded into 0001 so that patch builds on its own. No other
changes; the wait events and the reported write paths are the same as
in v1.

v2-0001 adds the two events and covers the write(2) paths.
v2-0002 covers the Windows WriteConsoleW() path, split out as before.

Applies cleanly on current master; full build passes locally.

Thanks,
Seongjun Shin

2026년 5월 31일 (일) 오후 5:50, 신성준 <shinsj4653@gmail.com>님이 작성:

Show quoted text

Hi hackers,

The write(2) calls that flush server log output aren't covered by wait
events. When a backend logs something, the writes go out in:

- write_pipe_chunks(): write(2) to the syslogger pipe
- write_console(): write(2) to stderr (WriteConsoleW() on Windows)

If one of those blocks -- syslogger pipe full, slow console, slow log
device -- pg_stat_activity just shows wait_event = NULL until it
returns. Since NULL usually reads as "on CPU", a backend stuck writing
logs looks like it's doing work, so logging-related stalls are easy to
miss.

Attached is a short series that adds two WaitEventIO events and reports
them around those writes:

IO / SysloggerWrite - write(2) to the syslogger pipe
IO / StderrWrite - write(2) to stderr, and WriteConsoleW()

0001 adds the events and covers the write(2) paths. 0002 does the
Windows WriteConsoleW() path, split out since it's platform-specific.

It only wraps the leaf write call and uses the existing
pgstat_report_wait_start()/end() helpers, so it stays allocation-free
and safe to call from inside the error-reporting path.

I did a quick before/after to make sure the events show up: 8 backends
each emitting large RAISE LOG lines, sampling wait_event from
pg_stat_activity every 50 ms for 20 s.

- logging_collector = on (syslogger pipe):
master: NULL 100.0% (2184/2184)
patched: IO/SysloggerWrite 99.1% (2204/2224), NULL 0.9%

- logging_collector = off (stderr):
master: NULL 100.0% (2144/2144)
patched: IO/StderrWrite 90.7% (1952/2152), NULL 9.3%

On master that wait time is just invisible; with the patch it lands on
the new events. I can send the scripts and raw samples if anyone wants
to reproduce it.

Applies on current master. A couple of things I'm unsure about and
would appreciate input on: whether the event names fit the surrounding
conventions, and whether splitting the Windows path into its own patch
is the right call.

Thanks,
Seongjun Shin

Attachments:

v2-0001-Add-wait-events-for-server-logging-destination-wr.patchapplication/octet-stream; name=v2-0001-Add-wait-events-for-server-logging-destination-wr.patchDownload+9-1
v2-0002-Report-StderrWrite-wait-event-around-WriteConsole.patchapplication/octet-stream; name=v2-0002-Report-StderrWrite-wait-event-around-WriteConsole.patchDownload+3-1
#3Kirk Wolak
wolakk@gmail.com
In reply to: 신성준 (#1)
Re: Add wait events for server logging destination writes

On Sun, May 31, 2026 at 4:50 AM 신성준 <shinsj4653@gmail.com> wrote:

Hi hackers,

The write(2) calls that flush server log output aren't covered by wait
events. When a backend logs something, the writes go out in:

- write_pipe_chunks(): write(2) to the syslogger pipe
- write_console(): write(2) to stderr (WriteConsoleW() on Windows)

If one of those blocks -- syslogger pipe full, slow console, slow log
device -- pg_stat_activity just shows wait_event = NULL until it
returns. Since NULL usually reads as "on CPU", a backend stuck writing
logs looks like it's doing work, so logging-related stalls are easy to
miss.

Attached is a short series that adds two WaitEventIO events and reports
them around those writes:

IO / SysloggerWrite - write(2) to the syslogger pipe
IO / StderrWrite - write(2) to stderr, and WriteConsoleW()

0001 adds the events and covers the write(2) paths. 0002 does the
Windows WriteConsoleW() path, split out since it's platform-specific.

It only wraps the leaf write call and uses the existing
pgstat_report_wait_start()/end() helpers, so it stays allocation-free
and safe to call from inside the error-reporting path.

I did a quick before/after to make sure the events show up: 8 backends
each emitting large RAISE LOG lines, sampling wait_event from
pg_stat_activity every 50 ms for 20 s.

- logging_collector = on (syslogger pipe):
master: NULL 100.0% (2184/2184)
patched: IO/SysloggerWrite 99.1% (2204/2224), NULL 0.9%

- logging_collector = off (stderr):
master: NULL 100.0% (2144/2144)
patched: IO/StderrWrite 90.7% (1952/2152), NULL 9.3%

On master that wait time is just invisible; with the patch it lands on
the new events. I can send the scripts and raw samples if anyone wants
to reproduce it.

+1
Nice. We have too many waits that are registered as CPU.

Show quoted text
#4Michael Paquier
michael@paquier.xyz
In reply to: 신성준 (#2)
Re: Add wait events for server logging destination writes

On Sun, May 31, 2026 at 07:42:41PM +0900, 신성준 wrote:

cfbot caught a build failure on v1, in the SanityCheck task on Linux
and Windows: elog.c uses pgstat_report_wait_start()/end() and the
WAIT_EVENT_* constants but didn't include utils/wait_event.h. It only
built here because of an accidental transitive include on my machine;
on the CI images the declarations weren't visible.

v2 fixes that by adding the missing #include "utils/wait_event.h" to
elog.c, folded into 0001 so that patch builds on its own. No other
changes; the wait events and the reported write paths are the same as
in v1.

v2-0001 adds the two events and covers the write(2) paths.
v2-0002 covers the Windows WriteConsoleW() path, split out as before.

Applies cleanly on current master; full build passes locally.

Hmm. Usually we split the event numbers so as there is one for each
code path, but here we are just dealing with the same routine that
sends chunks. Using the same numbers seem fine by me.

If others have any thoughts or comments, feel free.
--
Michael

#5Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: 신성준 (#1)
Re: Add wait events for server logging destination writes

Hello.

At Sun, 31 May 2026 17:50:08 +0900, 신성준 <shinsj4653@gmail.com> wrote in

Attached is a short series that adds two WaitEventIO events and reports
them around those writes:

IO / SysloggerWrite - write(2) to the syslogger pipe
IO / StderrWrite - write(2) to stderr, and WriteConsoleW()

0001 adds the events and covers the write(2) paths. 0002 does the
Windows WriteConsoleW() path, split out since it's platform-specific.

Should we also consider instrumenting ReportEventW()/ReportEventA()?
They seem to be another Windows-specific logging output path.

Also, if the intention is to cover all places where logging output can
block, I wonder whether the syslog() calls should be covered as
well. If they are intentionally excluded, perhaps a short comment
explaining the rationale would be useful.

Regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#6신성준
shinsj4653@gmail.com
In reply to: Kyotaro Horiguchi (#5)
Re: Add wait events for server logging destination writes

Hi,

Thanks Kirk, glad it's useful.

Kyotaro Horiguchi wrote:

Should we also consider instrumenting ReportEventW()/ReportEventA()?
They seem to be another Windows-specific logging output path.

Also, if the intention is to cover all places where logging output
can block, I wonder whether the syslog() calls should be covered as
well.

Good points -- both are blocking output paths and there's no real
reason to leave them out, so v3 instruments them rather than excluding
them with a comment. The intent is exactly to cover the places where
logging output can block, so this makes the series consistent.

v3 adds two more WaitEventIO events:

IO / SyslogWrite - syslog(3) in write_syslog()
IO / EventlogWrite - ReportEventW()/ReportEventA() in write_eventlog()

Same approach as before: the wait is reported only around the leaf
call, using the existing pgstat_report_wait_start()/end() helpers, so
it stays allocation-free and safe on the error-reporting path, and the
series still touches just elog.c and wait_event_names.txt.

This also matches Michael's point on v2 -- each event covers a routine
rather than a single call site, so SyslogWrite wraps the syslog(3)
calls in write_syslog() and EventlogWrite wraps both ReportEvent
variants in write_eventlog(), the same way SysloggerWrite already
covers the two writes in write_pipe_chunks().

As before, 0001 is the portable part and 0002 is the Windows part
(WriteConsoleW plus the new EventlogWrite).

One caveat: EventlogWrite is Windows-only, so I couldn't get a runtime
before/after for it here -- I've only confirmed it builds (cfbot's
Windows task should cover that). The other events still show up in the
sampling I posted earlier. If someone on Windows can exercise the
event-log path I'd appreciate a confirmation.

Applies cleanly on current master; full build passes locally on both
Autoconf and Meson, with no new warnings.

Thanks,
Seongjun Shin

2026년 6월 1일 (월) 오후 2:49, Kyotaro Horiguchi <horikyota.ntt@gmail.com>님이 작성:

Show quoted text

Hello.

At Sun, 31 May 2026 17:50:08 +0900, 신성준 <shinsj4653@gmail.com> wrote in

Attached is a short series that adds two WaitEventIO events and reports
them around those writes:

IO / SysloggerWrite - write(2) to the syslogger pipe
IO / StderrWrite - write(2) to stderr, and WriteConsoleW()

0001 adds the events and covers the write(2) paths. 0002 does the
Windows WriteConsoleW() path, split out since it's platform-specific.

Should we also consider instrumenting ReportEventW()/ReportEventA()?
They seem to be another Windows-specific logging output path.

Also, if the intention is to cover all places where logging output can
block, I wonder whether the syslog() calls should be covered as
well. If they are intentionally excluded, perhaps a short comment
explaining the rationale would be useful.

Regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

v3-0001-Add-wait-events-for-server-logging-destination-wr.patchapplication/octet-stream; name=v3-0001-Add-wait-events-for-server-logging-destination-wr.patchDownload+14-1
v3-0002-Add-wait-events-for-Windows-specific-logging-outp.patchapplication/octet-stream; name=v3-0002-Add-wait-events-for-Windows-specific-logging-outp.patchDownload+8-1
#7신성준
shinsj4653@gmail.com
In reply to: 신성준 (#6)
Re: Add wait events for server logging destination writes

Hi,

cfbot flagged v3 as needing a rebase -- it stopped applying after the
recent changes to elog.c (the switch to a const WCHAR pointer in
write_eventlog(), pgindent, etc.) and the new COPY pipe wait events in
wait_event_names.txt.

v4 is the same change rebased over current master, no functional
difference from v3. The only real conflict was in write_eventlog(),
where the EventlogWrite wrapping now sits on top of the const
utf16_const pointer; everything else merged cleanly.

Still applies as two patches:
v4-0001 - portable part (SysloggerWrite, StderrWrite, SyslogWrite)
v4-0002 - Windows part (WriteConsoleW plus EventlogWrite)

Builds clean on current master with both Autoconf and Meson, no new
warnings.

Thanks,
Seongjun Shin

2026년 6월 7일 (일) 오전 1:25, 신성준 <shinsj4653@gmail.com>님이 작성:

Show quoted text

Hi,

Thanks Kirk, glad it's useful.

Kyotaro Horiguchi wrote:

Should we also consider instrumenting ReportEventW()/ReportEventA()?
They seem to be another Windows-specific logging output path.

Also, if the intention is to cover all places where logging output
can block, I wonder whether the syslog() calls should be covered as
well.

Good points -- both are blocking output paths and there's no real
reason to leave them out, so v3 instruments them rather than excluding
them with a comment. The intent is exactly to cover the places where
logging output can block, so this makes the series consistent.

v3 adds two more WaitEventIO events:

IO / SyslogWrite - syslog(3) in write_syslog()
IO / EventlogWrite - ReportEventW()/ReportEventA() in write_eventlog()

Same approach as before: the wait is reported only around the leaf
call, using the existing pgstat_report_wait_start()/end() helpers, so
it stays allocation-free and safe on the error-reporting path, and the
series still touches just elog.c and wait_event_names.txt.

This also matches Michael's point on v2 -- each event covers a routine
rather than a single call site, so SyslogWrite wraps the syslog(3)
calls in write_syslog() and EventlogWrite wraps both ReportEvent
variants in write_eventlog(), the same way SysloggerWrite already
covers the two writes in write_pipe_chunks().

As before, 0001 is the portable part and 0002 is the Windows part
(WriteConsoleW plus the new EventlogWrite).

One caveat: EventlogWrite is Windows-only, so I couldn't get a runtime
before/after for it here -- I've only confirmed it builds (cfbot's
Windows task should cover that). The other events still show up in the
sampling I posted earlier. If someone on Windows can exercise the
event-log path I'd appreciate a confirmation.

Applies cleanly on current master; full build passes locally on both
Autoconf and Meson, with no new warnings.

Thanks,
Seongjun Shin

2026년 6월 1일 (월) 오후 2:49, Kyotaro Horiguchi <horikyota.ntt@gmail.com>님이 작성:

Hello.

At Sun, 31 May 2026 17:50:08 +0900, 신성준 <shinsj4653@gmail.com> wrote in

Attached is a short series that adds two WaitEventIO events and reports
them around those writes:

IO / SysloggerWrite - write(2) to the syslogger pipe
IO / StderrWrite - write(2) to stderr, and WriteConsoleW()

0001 adds the events and covers the write(2) paths. 0002 does the
Windows WriteConsoleW() path, split out since it's platform-specific.

Should we also consider instrumenting ReportEventW()/ReportEventA()?
They seem to be another Windows-specific logging output path.

Also, if the intention is to cover all places where logging output can
block, I wonder whether the syslog() calls should be covered as
well. If they are intentionally excluded, perhaps a short comment
explaining the rationale would be useful.

Regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

v4-0001-Add-wait-events-for-server-logging-destination-wr.patchapplication/octet-stream; name=v4-0001-Add-wait-events-for-server-logging-destination-wr.patchDownload+14-1
v4-0002-Add-wait-events-for-Windows-specific-logging-outp.patchapplication/octet-stream; name=v4-0002-Add-wait-events-for-Windows-specific-logging-outp.patchDownload+8-1