bgwriter doesn't flush WAL stats
Hi,
I was trying to add WAL stats to pg_stat_io. While doing that I was
comparing pg_stat_wal and pg_stat_io's WAL stats and there was some
inequality between the total number of WALs. I found that the difference
comes from bgwriter's WALs. bgwriter generates WAL but it doesn't flush
them because the pgstat_report_wal() function isn't called in bgwriter. I
attached a small patch for calling the pgstat_report_wal() function in
bgwriter.
bgwriter generates WAL by calling functions in this order:
bgwriter.c -> BackgroundWriterMain() -> BgBufferSync() -> SyncOneBuffer()
-> FlushBuffer() -> XLogFlush() -> XLogWrite()
I used a query like BEGIN; followed by lots of(3000 in my case) INSERT,
DELETE, or UPDATE, followed by a COMMIT while testing.
Example output before patch applied:
┌─────────────┬─────────────────┐
│ view_name │ total_wal_write │
├─────────────┼─────────────────┤
│ pg_stat_wal │ 10318 │
│ pg_stat_io │ 10321 │
└─────────────┴─────────────────┘
┌─────────────────────┬────────┬────────┐
│ backend_type │ object │ writes │
├─────────────────────┼────────┼────────┤
│ autovacuum launcher │ wal │ 0 │
│ autovacuum worker │ wal │ 691 │
│ client backend │ wal │ 8170 │
│ background worker │ wal │ 0 │
│ background writer │ wal │ 3 │
│ checkpointer │ wal │ 1 │
│ standalone backend │ wal │ 737 │
│ startup │ wal │ 0 │
│ walsender │ wal │ 0 │
│ walwriter │ wal │ 719 │
└─────────────────────┴────────┴────────┘
After the patch has been applied, there are no differences between
pg_stat_wal and pg_stat_io.
I appreciate any comment/feedback on this patch.
Regards,
Nazir Bilal Yavuz
Microsoft
Attachments:
v1-0001-Flush-WAL-in-bgwriter.patchtext/x-diff; charset=US-ASCII; name=v1-0001-Flush-WAL-in-bgwriter.patchDownload
From 9317e86e4b2b3ba202ce090f383ac48526b97955 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Wed, 21 Jun 2023 13:39:13 +0300
Subject: [PATCH v1] Flush WAL in bgwriter
bgwriter generates WAL but it doesn't flush them, so they can't be seen
in pg_stat_wal. Call pgstat_report_wal() to flush WAL in bgwriter.
---
src/backend/postmaster/bgwriter.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index caad642ec9..f2e4f23d9f 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -241,6 +241,7 @@ BackgroundWriterMain(void)
/* Report pending statistics to the cumulative stats system */
pgstat_report_bgwriter();
+ pgstat_report_wal(true);
if (FirstCallSinceLastCheckpoint())
{
--
2.40.1
On Wed, 21 Jun 2023 at 13:04, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
Hi,
I was trying to add WAL stats to pg_stat_io. While doing that I was comparing pg_stat_wal and pg_stat_io's WAL stats and there was some inequality between the total number of WALs. I found that the difference comes from bgwriter's WALs. bgwriter generates WAL but it doesn't flush them because the pgstat_report_wal() function isn't called in bgwriter. I attached a small patch for calling the pgstat_report_wal() function in bgwriter.
bgwriter generates WAL by calling functions in this order:
bgwriter.c -> BackgroundWriterMain() -> BgBufferSync() -> SyncOneBuffer() -> FlushBuffer() -> XLogFlush() -> XLogWrite()
I was quite confused here, as XLogWrite() does not generate any WAL;
it only writes existing WAL from buffers to disk.
In a running PostgreSQL instance, WAL is only generated through
XLogInsert(xloginsert.c) and serialized / written to buffers in its
call to XLogInsertRecord(xlog.c); XLogFlush and XLogWrite are only
responsible for writing those buffers to disk.
The only path that I see in XLogWrite() that could potentially put
anything into WAL is through RequestCheckpoint(), but that only writes
out a checkpoint when it is not in a postmaster environment - in all
other cases it will wake up the checkpointer and wait for that
checkpoint to finish.
I also got confused with your included views; they're not included in
the patch and the current master branch doesn't emit object=wal, so I
can't really check that the patch works as intended.
But on the topic of reporting the WAL stats in bgwriter; that seems
like a good idea to fix, yes.
+1
Kind regards,
Matthias van de Meent
Neon, Inc.
Hi,
Thanks for the explanation.
On Wed, 21 Jun 2023 at 18:03, Matthias van de Meent <
boekewurm+postgres@gmail.com> wrote:
On Wed, 21 Jun 2023 at 13:04, Nazir Bilal Yavuz <byavuz81@gmail.com>
wrote:
I was trying to add WAL stats to pg_stat_io. While doing that I was
comparing pg_stat_wal and pg_stat_io's WAL stats and there was some
inequality between the total number of WALs. I found that the difference
comes from bgwriter's WALs. bgwriter generates WAL but it doesn't flush
them because the pgstat_report_wal() function isn't called in bgwriter. I
attached a small patch for calling the pgstat_report_wal() function in
bgwriter.
bgwriter generates WAL by calling functions in this order:
bgwriter.c -> BackgroundWriterMain() -> BgBufferSync() ->
SyncOneBuffer() -> FlushBuffer() -> XLogFlush() -> XLogWrite()
I was quite confused here, as XLogWrite() does not generate any WAL;
it only writes existing WAL from buffers to disk.
In a running PostgreSQL instance, WAL is only generated through
XLogInsert(xloginsert.c) and serialized / written to buffers in its
call to XLogInsertRecord(xlog.c); XLogFlush and XLogWrite are only
responsible for writing those buffers to disk.
Yes, you are right. Correct explanation should be "bgwriter writes existing
WAL from buffers to disk but pg_stat_wal doesn't count them because
bgwriter doesn't call pgstat_report_wal() to update WAL statistics".
I also got confused with your included views; they're not included in
the patch and the current master branch doesn't emit object=wal, so I
can't really check that the patch works as intended.
I attached a WIP patch for showing WAL stats in pg_stat_io.
After applying patch, I used these queries for the getting views I shared
in the first mail;
Query for the first view:
SELECT
'pg_stat_wal' AS view_name,
SUM(wal_write) AS total_wal_write
FROM
pg_stat_wal
UNION ALL
SELECT
'pg_stat_io' AS view_name,
SUM(writes) AS total_wal_write
FROM
pg_stat_io
WHERE
object = 'wal';
Query for the second view:
SELECT backend_type, object, writes FROM pg_stat_io where object = 'wal';
I also changed the description on the patch file and attached it.
Regards,
Nazir Bilal Yavuz
Microsoft
Attachments:
v2-0002-Flush-WAL-in-bgwriter.patchtext/x-diff; charset=US-ASCII; name=v2-0002-Flush-WAL-in-bgwriter.patchDownload
From 674b54adc3a92093f0cc25cd9117bf32956c6f2c Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Wed, 21 Jun 2023 13:39:13 +0300
Subject: [PATCH v2 2/2] Flush WAL in bgwriter
bgwriter writes existing WAL from buffers to disk but pg_stat_wal
doesn't count them because bgwriter doesn't call pgstat_report_wal()
to update WAL statistics. Call pgstat_report_wal() to flush WAL in
bgwriter.
---
src/backend/postmaster/bgwriter.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index caad642ec9..f2e4f23d9f 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -241,6 +241,7 @@ BackgroundWriterMain(void)
/* Report pending statistics to the cumulative stats system */
pgstat_report_bgwriter();
+ pgstat_report_wal(true);
if (FirstCallSinceLastCheckpoint())
{
--
2.40.1
v2-0001-WIP-Show-WAL-stats-on-pg_stat_io.patchtext/x-diff; charset=US-ASCII; name=v2-0001-WIP-Show-WAL-stats-on-pg_stat_io.patchDownload
From 45f0dc4ee660baceb91a96f1e5aafab3adc9bf13 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Wed, 21 Jun 2023 18:36:25 +0300
Subject: [PATCH v2 1/2] [WIP] Show WAL stats on pg_stat_io
---
src/backend/access/transam/xlog.c | 11 +++++++++--
src/backend/utils/activity/pgstat_io.c | 9 ++++++++-
src/include/pgstat.h | 3 ++-
3 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8b0710abe6..37028429c7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2177,7 +2177,7 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
Size nbytes;
Size nleft;
int written;
- instr_time start;
+ instr_time start, io_start;
/* OK to write the page(s) */
from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
@@ -2185,6 +2185,7 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
nleft = nbytes;
do
{
+ io_start = pgstat_prepare_io_time();
errno = 0;
/* Measure I/O timing to write WAL data */
@@ -2209,6 +2210,8 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, duration, start);
}
+ pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_WRITE, io_start, 1);
+
PendingWalStats.wal_write++;
if (written <= 0)
@@ -8163,7 +8166,7 @@ void
issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
{
char *msg = NULL;
- instr_time start;
+ instr_time start, io_start;
Assert(tli != 0);
@@ -8176,6 +8179,8 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
sync_method == SYNC_METHOD_OPEN_DSYNC)
return;
+ io_start = pgstat_prepare_io_time();
+
/* Measure I/O timing to sync the WAL file */
if (track_wal_io_timing)
INSTR_TIME_SET_CURRENT(start);
@@ -8235,6 +8240,8 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, duration, start);
}
+ pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_FSYNC, io_start, 1);
+
PendingWalStats.wal_sync++;
}
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index eb7d35d422..f05c62973a 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -231,6 +231,8 @@ pgstat_get_io_object_name(IOObject io_object)
return "relation";
case IOOBJECT_TEMP_RELATION:
return "temp relation";
+ case IOOBJECT_WAL:
+ return "wal";
}
elog(ERROR, "unrecognized IOObject value: %d", io_object);
@@ -313,8 +315,8 @@ pgstat_tracks_io_bktype(BackendType bktype)
case B_ARCHIVER:
case B_LOGGER:
case B_WAL_RECEIVER:
- case B_WAL_WRITER:
return false;
+ case B_WAL_WRITER:
case B_AUTOVAC_LAUNCHER:
case B_AUTOVAC_WORKER:
@@ -350,6 +352,11 @@ pgstat_tracks_io_object(BackendType bktype, IOObject io_object,
if (!pgstat_tracks_io_bktype(bktype))
return false;
+
+ if (io_context != IOCONTEXT_NORMAL &&
+ io_object == IOOBJECT_WAL)
+ return false;
+
/*
* Currently, IO on temporary relations can only occur in the
* IOCONTEXT_NORMAL IOContext.
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 57a2c0866a..fb7d3abf22 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -277,9 +277,10 @@ typedef enum IOObject
{
IOOBJECT_RELATION,
IOOBJECT_TEMP_RELATION,
+ IOOBJECT_WAL,
} IOObject;
-#define IOOBJECT_NUM_TYPES (IOOBJECT_TEMP_RELATION + 1)
+#define IOOBJECT_NUM_TYPES (IOOBJECT_WAL + 1)
typedef enum IOContext
{
--
2.40.1
At Wed, 21 Jun 2023 18:52:26 +0300, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote in
I attached a WIP patch for showing WAL stats in pg_stat_io.
Yeah, your diagnosis appears accurate. I managed to trigger an
assertion failure quite easily when I added
"Assert(!pgstat_have_pending_wal()) just after the call to
pgstat_report_bgwriter(). Good find!
I slightly inclined to place the added call after smgrcloseall() but
it doesn't seem to cause any io-stats updates so the proposed first
patch as-is looks good to me.
Regarding the second patch, it introduces WAL IO time as a
IOCONTEXT_NORMAL/IOOBJECT_WAL, but it doesn't seem to follow the
convention or design of the pgstat_io component, which primarily
focuses on shared buffer IOs.
There was a brief mention about WAL IO during the development of
pgstat_io [1]/messages/by-id/CAAKRu_bM55pj3pPRW0nd_-paWHLRkOU69r816AeztBBa-N1HLA@mail.gmail.com.
It'd be different if we tracked WAL fsyncs more granularly - which would be
quite interesting - but that's something for another day^Wpatch.I do have a question about this.
So, if we were to start tracking WAL IO would it fit within this
paradigm to have a new IOPATH_WAL for WAL or would it add a separate
dimension?
[1]: /messages/by-id/CAAKRu_bM55pj3pPRW0nd_-paWHLRkOU69r816AeztBBa-N1HLA@mail.gmail.com
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Wed, Jun 21, 2023 at 9:49 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
Regarding the second patch, it introduces WAL IO time as a
IOCONTEXT_NORMAL/IOOBJECT_WAL, but it doesn't seem to follow the
convention or design of the pgstat_io component, which primarily
focuses on shared buffer IOs.
I haven't reviewed the patch yet, but in my opinion having an
IOOBJECT_WAL makes sense. I imagined that we would add WAL as an
IOObject along with others such as an IOOBJECT_BYPASS for "bypass" IO
(IO done through the smgr API directly) and an IOOBJECT_SPILL or
something like it for spill files from joins/aggregates/etc.
I do have a question about this.
So, if we were to start tracking WAL IO would it fit within this
paradigm to have a new IOPATH_WAL for WAL or would it add a separate
dimension?
Personally, I think WAL fits well as an IOObject. Then we can add
IOCONTEXT_INIT and use that for WAL file initialization and
IOCONTEXT_NORMAL for normal WAL writes/fysncs/etc. I don't think we
need a new dimension for it as it feels like an IO target just like
shared buffers and temporary buffers do. I think we should save adding
new dimensions for relationships that we can't express in the existing
paradigm.
- Melanie
Hi,
Created a commitfest entry for this.
Link: https://commitfest.postgresql.org/43/4405/
Regards,
Nazir Bilal Yavuz
Microsoft
On Thu, 22 Jun 2023 at 17:03, Melanie Plageman <melanieplageman@gmail.com>
wrote:
Show quoted text
On Wed, Jun 21, 2023 at 9:49 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:Regarding the second patch, it introduces WAL IO time as a
IOCONTEXT_NORMAL/IOOBJECT_WAL, but it doesn't seem to follow the
convention or design of the pgstat_io component, which primarily
focuses on shared buffer IOs.I haven't reviewed the patch yet, but in my opinion having an
IOOBJECT_WAL makes sense. I imagined that we would add WAL as an
IOObject along with others such as an IOOBJECT_BYPASS for "bypass" IO
(IO done through the smgr API directly) and an IOOBJECT_SPILL or
something like it for spill files from joins/aggregates/etc.I do have a question about this.
So, if we were to start tracking WAL IO would it fit within this
paradigm to have a new IOPATH_WAL for WAL or would it add a separate
dimension?Personally, I think WAL fits well as an IOObject. Then we can add
IOCONTEXT_INIT and use that for WAL file initialization and
IOCONTEXT_NORMAL for normal WAL writes/fysncs/etc. I don't think we
need a new dimension for it as it feels like an IO target just like
shared buffers and temporary buffers do. I think we should save adding
new dimensions for relationships that we can't express in the existing
paradigm.- Melanie
The first patch, to flush the bgwriter's WAL stats to the stats
collector, seems like a straightforward bug fix, so committed and
backpatched that. Thank you!
I didn't look at the second patch.
--
Heikki Linnakangas
Neon (https://neon.tech)
Hi,
On Mon, 2 Oct 2023 at 13:08, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
The first patch, to flush the bgwriter's WAL stats to the stats
collector, seems like a straightforward bug fix, so committed and
backpatched that. Thank you!I didn't look at the second patch.
Thanks for the push!
Actual commitfest entry for the second patch is:
https://commitfest.postgresql.org/45/4416/. I sent a second patch to
this thread just to show how I found this bug. There is no need to
review it, this commitfest entry could be closed as committed.
Regards,
Nazir Bilal Yavuz
Microsoft