Add wait events for server logging destination writes

Started by 신성준about 20 hours ago3 messageshackers
Jump to latest
#1신성준
shinsj4653@gmail.com

Hi hackers,

The write(2) calls that flush server log output aren't covered by wait
events. When a backend logs something, the writes go out in:

- write_pipe_chunks(): write(2) to the syslogger pipe
- write_console(): write(2) to stderr (WriteConsoleW() on Windows)

If one of those blocks -- syslogger pipe full, slow console, slow log
device -- pg_stat_activity just shows wait_event = NULL until it
returns. Since NULL usually reads as "on CPU", a backend stuck writing
logs looks like it's doing work, so logging-related stalls are easy to
miss.

Attached is a short series that adds two WaitEventIO events and reports
them around those writes:

IO / SysloggerWrite - write(2) to the syslogger pipe
IO / StderrWrite - write(2) to stderr, and WriteConsoleW()

0001 adds the events and covers the write(2) paths. 0002 does the
Windows WriteConsoleW() path, split out since it's platform-specific.

It only wraps the leaf write call and uses the existing
pgstat_report_wait_start()/end() helpers, so it stays allocation-free
and safe to call from inside the error-reporting path.

I did a quick before/after to make sure the events show up: 8 backends
each emitting large RAISE LOG lines, sampling wait_event from
pg_stat_activity every 50 ms for 20 s.

- logging_collector = on (syslogger pipe):
master: NULL 100.0% (2184/2184)
patched: IO/SysloggerWrite 99.1% (2204/2224), NULL 0.9%

- logging_collector = off (stderr):
master: NULL 100.0% (2144/2144)
patched: IO/StderrWrite 90.7% (1952/2152), NULL 9.3%

On master that wait time is just invisible; with the patch it lands on
the new events. I can send the scripts and raw samples if anyone wants
to reproduce it.

Applies on current master. A couple of things I'm unsure about and
would appreciate input on: whether the event names fit the surrounding
conventions, and whether splitting the Windows path into its own patch
is the right call.

Thanks,
Seongjun Shin

Attachments:

v1-0001-Add-wait-events-for-server-logging-destination-wr.patchapplication/octet-stream; name=v1-0001-Add-wait-events-for-server-logging-destination-wr.patchDownload+8-1
v1-0002-Report-StderrWrite-wait-event-around-WriteConsole.patchapplication/octet-stream; name=v1-0002-Report-StderrWrite-wait-event-around-WriteConsole.patchDownload+3-1
#2신성준
shinsj4653@gmail.com
In reply to: 신성준 (#1)
Re: Add wait events for server logging destination writes

Hi,

cfbot caught a build failure on v1, in the SanityCheck task on Linux
and Windows: elog.c uses pgstat_report_wait_start()/end() and the
WAIT_EVENT_* constants but didn't include utils/wait_event.h. It only
built here because of an accidental transitive include on my machine;
on the CI images the declarations weren't visible.

v2 fixes that by adding the missing #include "utils/wait_event.h" to
elog.c, folded into 0001 so that patch builds on its own. No other
changes; the wait events and the reported write paths are the same as
in v1.

v2-0001 adds the two events and covers the write(2) paths.
v2-0002 covers the Windows WriteConsoleW() path, split out as before.

Applies cleanly on current master; full build passes locally.

Thanks,
Seongjun Shin

2026년 5월 31일 (일) 오후 5:50, 신성준 <shinsj4653@gmail.com>님이 작성:

Show quoted text

Hi hackers,

The write(2) calls that flush server log output aren't covered by wait
events. When a backend logs something, the writes go out in:

- write_pipe_chunks(): write(2) to the syslogger pipe
- write_console(): write(2) to stderr (WriteConsoleW() on Windows)

If one of those blocks -- syslogger pipe full, slow console, slow log
device -- pg_stat_activity just shows wait_event = NULL until it
returns. Since NULL usually reads as "on CPU", a backend stuck writing
logs looks like it's doing work, so logging-related stalls are easy to
miss.

Attached is a short series that adds two WaitEventIO events and reports
them around those writes:

IO / SysloggerWrite - write(2) to the syslogger pipe
IO / StderrWrite - write(2) to stderr, and WriteConsoleW()

0001 adds the events and covers the write(2) paths. 0002 does the
Windows WriteConsoleW() path, split out since it's platform-specific.

It only wraps the leaf write call and uses the existing
pgstat_report_wait_start()/end() helpers, so it stays allocation-free
and safe to call from inside the error-reporting path.

I did a quick before/after to make sure the events show up: 8 backends
each emitting large RAISE LOG lines, sampling wait_event from
pg_stat_activity every 50 ms for 20 s.

- logging_collector = on (syslogger pipe):
master: NULL 100.0% (2184/2184)
patched: IO/SysloggerWrite 99.1% (2204/2224), NULL 0.9%

- logging_collector = off (stderr):
master: NULL 100.0% (2144/2144)
patched: IO/StderrWrite 90.7% (1952/2152), NULL 9.3%

On master that wait time is just invisible; with the patch it lands on
the new events. I can send the scripts and raw samples if anyone wants
to reproduce it.

Applies on current master. A couple of things I'm unsure about and
would appreciate input on: whether the event names fit the surrounding
conventions, and whether splitting the Windows path into its own patch
is the right call.

Thanks,
Seongjun Shin

Attachments:

v2-0001-Add-wait-events-for-server-logging-destination-wr.patchapplication/octet-stream; name=v2-0001-Add-wait-events-for-server-logging-destination-wr.patchDownload+9-1
v2-0002-Report-StderrWrite-wait-event-around-WriteConsole.patchapplication/octet-stream; name=v2-0002-Report-StderrWrite-wait-event-around-WriteConsole.patchDownload+3-1
#3Kirk Wolak
wolakk@gmail.com
In reply to: 신성준 (#1)
Re: Add wait events for server logging destination writes

On Sun, May 31, 2026 at 4:50 AM 신성준 <shinsj4653@gmail.com> wrote:

Hi hackers,

The write(2) calls that flush server log output aren't covered by wait
events. When a backend logs something, the writes go out in:

- write_pipe_chunks(): write(2) to the syslogger pipe
- write_console(): write(2) to stderr (WriteConsoleW() on Windows)

If one of those blocks -- syslogger pipe full, slow console, slow log
device -- pg_stat_activity just shows wait_event = NULL until it
returns. Since NULL usually reads as "on CPU", a backend stuck writing
logs looks like it's doing work, so logging-related stalls are easy to
miss.

Attached is a short series that adds two WaitEventIO events and reports
them around those writes:

IO / SysloggerWrite - write(2) to the syslogger pipe
IO / StderrWrite - write(2) to stderr, and WriteConsoleW()

0001 adds the events and covers the write(2) paths. 0002 does the
Windows WriteConsoleW() path, split out since it's platform-specific.

It only wraps the leaf write call and uses the existing
pgstat_report_wait_start()/end() helpers, so it stays allocation-free
and safe to call from inside the error-reporting path.

I did a quick before/after to make sure the events show up: 8 backends
each emitting large RAISE LOG lines, sampling wait_event from
pg_stat_activity every 50 ms for 20 s.

- logging_collector = on (syslogger pipe):
master: NULL 100.0% (2184/2184)
patched: IO/SysloggerWrite 99.1% (2204/2224), NULL 0.9%

- logging_collector = off (stderr):
master: NULL 100.0% (2144/2144)
patched: IO/StderrWrite 90.7% (1952/2152), NULL 9.3%

On master that wait time is just invisible; with the patch it lands on
the new events. I can send the scripts and raw samples if anyone wants
to reproduce it.

+1
Nice. We have too many waits that are registered as CPU.

Show quoted text