bgwriter doesn't flush WAL stats

Started by Nazir Bilal Yavuzover 2 years ago8 messages
#1Nazir Bilal Yavuz
byavuz81@gmail.com
1 attachment(s)

Hi,

I was trying to add WAL stats to pg_stat_io. While doing that I was
comparing pg_stat_wal and pg_stat_io's WAL stats and there was some
inequality between the total number of WALs. I found that the difference
comes from bgwriter's WALs. bgwriter generates WAL but it doesn't flush
them because the pgstat_report_wal() function isn't called in bgwriter. I
attached a small patch for calling the pgstat_report_wal() function in
bgwriter.

bgwriter generates WAL by calling functions in this order:
bgwriter.c -> BackgroundWriterMain() -> BgBufferSync() -> SyncOneBuffer()
-> FlushBuffer() -> XLogFlush() -> XLogWrite()

I used a query like BEGIN; followed by lots of(3000 in my case) INSERT,
DELETE, or UPDATE, followed by a COMMIT while testing.

Example output before patch applied:

┌─────────────┬─────────────────┐
│ view_name │ total_wal_write │
├─────────────┼─────────────────┤
│ pg_stat_wal │ 10318 │
│ pg_stat_io │ 10321 │
└─────────────┴─────────────────┘

┌─────────────────────┬────────┬────────┐
│ backend_type │ object │ writes │
├─────────────────────┼────────┼────────┤
│ autovacuum launcher │ wal │ 0 │
│ autovacuum worker │ wal │ 691 │
│ client backend │ wal │ 8170 │
│ background worker │ wal │ 0 │
│ background writer │ wal │ 3 │
│ checkpointer │ wal │ 1 │
│ standalone backend │ wal │ 737 │
│ startup │ wal │ 0 │
│ walsender │ wal │ 0 │
│ walwriter │ wal │ 719 │
└─────────────────────┴────────┴────────┘

After the patch has been applied, there are no differences between
pg_stat_wal and pg_stat_io.

I appreciate any comment/feedback on this patch.

Regards,
Nazir Bilal Yavuz
Microsoft

Attachments:

v1-0001-Flush-WAL-in-bgwriter.patchtext/x-diff; charset=US-ASCII; name=v1-0001-Flush-WAL-in-bgwriter.patchDownload
From 9317e86e4b2b3ba202ce090f383ac48526b97955 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Wed, 21 Jun 2023 13:39:13 +0300
Subject: [PATCH v1] Flush WAL in bgwriter

bgwriter generates WAL but it doesn't flush them, so they can't be seen
in pg_stat_wal. Call pgstat_report_wal() to flush WAL in bgwriter.
---
 src/backend/postmaster/bgwriter.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index caad642ec9..f2e4f23d9f 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -241,6 +241,7 @@ BackgroundWriterMain(void)
 
 		/* Report pending statistics to the cumulative stats system */
 		pgstat_report_bgwriter();
+		pgstat_report_wal(true);
 
 		if (FirstCallSinceLastCheckpoint())
 		{
-- 
2.40.1

#2Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Nazir Bilal Yavuz (#1)
Re: bgwriter doesn't flush WAL stats

On Wed, 21 Jun 2023 at 13:04, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:

Hi,

I was trying to add WAL stats to pg_stat_io. While doing that I was comparing pg_stat_wal and pg_stat_io's WAL stats and there was some inequality between the total number of WALs. I found that the difference comes from bgwriter's WALs. bgwriter generates WAL but it doesn't flush them because the pgstat_report_wal() function isn't called in bgwriter. I attached a small patch for calling the pgstat_report_wal() function in bgwriter.

bgwriter generates WAL by calling functions in this order:
bgwriter.c -> BackgroundWriterMain() -> BgBufferSync() -> SyncOneBuffer() -> FlushBuffer() -> XLogFlush() -> XLogWrite()

I was quite confused here, as XLogWrite() does not generate any WAL;
it only writes existing WAL from buffers to disk.
In a running PostgreSQL instance, WAL is only generated through
XLogInsert(xloginsert.c) and serialized / written to buffers in its
call to XLogInsertRecord(xlog.c); XLogFlush and XLogWrite are only
responsible for writing those buffers to disk.

The only path that I see in XLogWrite() that could potentially put
anything into WAL is through RequestCheckpoint(), but that only writes
out a checkpoint when it is not in a postmaster environment - in all
other cases it will wake up the checkpointer and wait for that
checkpoint to finish.

I also got confused with your included views; they're not included in
the patch and the current master branch doesn't emit object=wal, so I
can't really check that the patch works as intended.

But on the topic of reporting the WAL stats in bgwriter; that seems
like a good idea to fix, yes.

+1

Kind regards,

Matthias van de Meent
Neon, Inc.

#3Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Matthias van de Meent (#2)
2 attachment(s)
Re: bgwriter doesn't flush WAL stats

Hi,

Thanks for the explanation.

On Wed, 21 Jun 2023 at 18:03, Matthias van de Meent <
boekewurm+postgres@gmail.com> wrote:

On Wed, 21 Jun 2023 at 13:04, Nazir Bilal Yavuz <byavuz81@gmail.com>

wrote:

I was trying to add WAL stats to pg_stat_io. While doing that I was

comparing pg_stat_wal and pg_stat_io's WAL stats and there was some
inequality between the total number of WALs. I found that the difference
comes from bgwriter's WALs. bgwriter generates WAL but it doesn't flush
them because the pgstat_report_wal() function isn't called in bgwriter. I
attached a small patch for calling the pgstat_report_wal() function in
bgwriter.

bgwriter generates WAL by calling functions in this order:
bgwriter.c -> BackgroundWriterMain() -> BgBufferSync() ->

SyncOneBuffer() -> FlushBuffer() -> XLogFlush() -> XLogWrite()

I was quite confused here, as XLogWrite() does not generate any WAL;
it only writes existing WAL from buffers to disk.
In a running PostgreSQL instance, WAL is only generated through
XLogInsert(xloginsert.c) and serialized / written to buffers in its
call to XLogInsertRecord(xlog.c); XLogFlush and XLogWrite are only
responsible for writing those buffers to disk.

Yes, you are right. Correct explanation should be "bgwriter writes existing
WAL from buffers to disk but pg_stat_wal doesn't count them because
bgwriter doesn't call pgstat_report_wal() to update WAL statistics".

I also got confused with your included views; they're not included in
the patch and the current master branch doesn't emit object=wal, so I
can't really check that the patch works as intended.

I attached a WIP patch for showing WAL stats in pg_stat_io.

After applying patch, I used these queries for the getting views I shared
in the first mail;

Query for the first view:
SELECT
'pg_stat_wal' AS view_name,
SUM(wal_write) AS total_wal_write
FROM
pg_stat_wal
UNION ALL
SELECT
'pg_stat_io' AS view_name,
SUM(writes) AS total_wal_write
FROM
pg_stat_io
WHERE
object = 'wal';

Query for the second view:
SELECT backend_type, object, writes FROM pg_stat_io where object = 'wal';

I also changed the description on the patch file and attached it.

Regards,
Nazir Bilal Yavuz
Microsoft

Attachments:

v2-0002-Flush-WAL-in-bgwriter.patchtext/x-diff; charset=US-ASCII; name=v2-0002-Flush-WAL-in-bgwriter.patchDownload
From 674b54adc3a92093f0cc25cd9117bf32956c6f2c Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Wed, 21 Jun 2023 13:39:13 +0300
Subject: [PATCH v2 2/2] Flush WAL in bgwriter

bgwriter writes existing WAL from buffers to disk but pg_stat_wal
doesn't count them because bgwriter doesn't call pgstat_report_wal()
to update WAL statistics. Call pgstat_report_wal() to flush WAL in
bgwriter.
---
 src/backend/postmaster/bgwriter.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index caad642ec9..f2e4f23d9f 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -241,6 +241,7 @@ BackgroundWriterMain(void)
 
 		/* Report pending statistics to the cumulative stats system */
 		pgstat_report_bgwriter();
+		pgstat_report_wal(true);
 
 		if (FirstCallSinceLastCheckpoint())
 		{
-- 
2.40.1

v2-0001-WIP-Show-WAL-stats-on-pg_stat_io.patchtext/x-diff; charset=US-ASCII; name=v2-0001-WIP-Show-WAL-stats-on-pg_stat_io.patchDownload
From 45f0dc4ee660baceb91a96f1e5aafab3adc9bf13 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Wed, 21 Jun 2023 18:36:25 +0300
Subject: [PATCH v2 1/2] [WIP] Show WAL stats on pg_stat_io

---
 src/backend/access/transam/xlog.c      | 11 +++++++++--
 src/backend/utils/activity/pgstat_io.c |  9 ++++++++-
 src/include/pgstat.h                   |  3 ++-
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8b0710abe6..37028429c7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2177,7 +2177,7 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 			Size		nbytes;
 			Size		nleft;
 			int			written;
-			instr_time	start;
+			instr_time	start, io_start;
 
 			/* OK to write the page(s) */
 			from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
@@ -2185,6 +2185,7 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 			nleft = nbytes;
 			do
 			{
+				io_start = pgstat_prepare_io_time();
 				errno = 0;
 
 				/* Measure I/O timing to write WAL data */
@@ -2209,6 +2210,8 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 					INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, duration, start);
 				}
 
+				pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_WRITE, io_start, 1);
+
 				PendingWalStats.wal_write++;
 
 				if (written <= 0)
@@ -8163,7 +8166,7 @@ void
 issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 {
 	char	   *msg = NULL;
-	instr_time	start;
+	instr_time	start, io_start;
 
 	Assert(tli != 0);
 
@@ -8176,6 +8179,8 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 		sync_method == SYNC_METHOD_OPEN_DSYNC)
 		return;
 
+	io_start = pgstat_prepare_io_time();
+
 	/* Measure I/O timing to sync the WAL file */
 	if (track_wal_io_timing)
 		INSTR_TIME_SET_CURRENT(start);
@@ -8235,6 +8240,8 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 		INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, duration, start);
 	}
 
+	pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_FSYNC, io_start, 1);
+
 	PendingWalStats.wal_sync++;
 }
 
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index eb7d35d422..f05c62973a 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -231,6 +231,8 @@ pgstat_get_io_object_name(IOObject io_object)
 			return "relation";
 		case IOOBJECT_TEMP_RELATION:
 			return "temp relation";
+		case IOOBJECT_WAL:
+			return "wal";
 	}
 
 	elog(ERROR, "unrecognized IOObject value: %d", io_object);
@@ -313,8 +315,8 @@ pgstat_tracks_io_bktype(BackendType bktype)
 		case B_ARCHIVER:
 		case B_LOGGER:
 		case B_WAL_RECEIVER:
-		case B_WAL_WRITER:
 			return false;
+		case B_WAL_WRITER:
 
 		case B_AUTOVAC_LAUNCHER:
 		case B_AUTOVAC_WORKER:
@@ -350,6 +352,11 @@ pgstat_tracks_io_object(BackendType bktype, IOObject io_object,
 	if (!pgstat_tracks_io_bktype(bktype))
 		return false;
 
+
+	if (io_context != IOCONTEXT_NORMAL &&
+		io_object == IOOBJECT_WAL)
+		return false;
+
 	/*
 	 * Currently, IO on temporary relations can only occur in the
 	 * IOCONTEXT_NORMAL IOContext.
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 57a2c0866a..fb7d3abf22 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -277,9 +277,10 @@ typedef enum IOObject
 {
 	IOOBJECT_RELATION,
 	IOOBJECT_TEMP_RELATION,
+	IOOBJECT_WAL,
 } IOObject;
 
-#define IOOBJECT_NUM_TYPES (IOOBJECT_TEMP_RELATION + 1)
+#define IOOBJECT_NUM_TYPES (IOOBJECT_WAL + 1)
 
 typedef enum IOContext
 {
-- 
2.40.1

#4Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nazir Bilal Yavuz (#3)
Re: bgwriter doesn't flush WAL stats

At Wed, 21 Jun 2023 18:52:26 +0300, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote in

I attached a WIP patch for showing WAL stats in pg_stat_io.

Yeah, your diagnosis appears accurate. I managed to trigger an
assertion failure quite easily when I added
"Assert(!pgstat_have_pending_wal()) just after the call to
pgstat_report_bgwriter(). Good find!

I slightly inclined to place the added call after smgrcloseall() but
it doesn't seem to cause any io-stats updates so the proposed first
patch as-is looks good to me.

Regarding the second patch, it introduces WAL IO time as a
IOCONTEXT_NORMAL/IOOBJECT_WAL, but it doesn't seem to follow the
convention or design of the pgstat_io component, which primarily
focuses on shared buffer IOs.

There was a brief mention about WAL IO during the development of
pgstat_io [1]/messages/by-id/CAAKRu_bM55pj3pPRW0nd_-paWHLRkOU69r816AeztBBa-N1HLA@mail.gmail.com.

It'd be different if we tracked WAL fsyncs more granularly - which would be
quite interesting - but that's something for another day^Wpatch.

I do have a question about this.
So, if we were to start tracking WAL IO would it fit within this
paradigm to have a new IOPATH_WAL for WAL or would it add a separate
dimension?

[1]: /messages/by-id/CAAKRu_bM55pj3pPRW0nd_-paWHLRkOU69r816AeztBBa-N1HLA@mail.gmail.com

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#5Melanie Plageman
melanieplageman@gmail.com
In reply to: Kyotaro Horiguchi (#4)
Re: bgwriter doesn't flush WAL stats

On Wed, Jun 21, 2023 at 9:49 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

Regarding the second patch, it introduces WAL IO time as a
IOCONTEXT_NORMAL/IOOBJECT_WAL, but it doesn't seem to follow the
convention or design of the pgstat_io component, which primarily
focuses on shared buffer IOs.

I haven't reviewed the patch yet, but in my opinion having an
IOOBJECT_WAL makes sense. I imagined that we would add WAL as an
IOObject along with others such as an IOOBJECT_BYPASS for "bypass" IO
(IO done through the smgr API directly) and an IOOBJECT_SPILL or
something like it for spill files from joins/aggregates/etc.

I do have a question about this.
So, if we were to start tracking WAL IO would it fit within this
paradigm to have a new IOPATH_WAL for WAL or would it add a separate
dimension?

Personally, I think WAL fits well as an IOObject. Then we can add
IOCONTEXT_INIT and use that for WAL file initialization and
IOCONTEXT_NORMAL for normal WAL writes/fysncs/etc. I don't think we
need a new dimension for it as it feels like an IO target just like
shared buffers and temporary buffers do. I think we should save adding
new dimensions for relationships that we can't express in the existing
paradigm.

- Melanie

#6Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Melanie Plageman (#5)
Re: bgwriter doesn't flush WAL stats

Hi,

Created a commitfest entry for this.
Link: https://commitfest.postgresql.org/43/4405/

Regards,
Nazir Bilal Yavuz
Microsoft

On Thu, 22 Jun 2023 at 17:03, Melanie Plageman <melanieplageman@gmail.com>
wrote:

Show quoted text

On Wed, Jun 21, 2023 at 9:49 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

Regarding the second patch, it introduces WAL IO time as a
IOCONTEXT_NORMAL/IOOBJECT_WAL, but it doesn't seem to follow the
convention or design of the pgstat_io component, which primarily
focuses on shared buffer IOs.

I haven't reviewed the patch yet, but in my opinion having an
IOOBJECT_WAL makes sense. I imagined that we would add WAL as an
IOObject along with others such as an IOOBJECT_BYPASS for "bypass" IO
(IO done through the smgr API directly) and an IOOBJECT_SPILL or
something like it for spill files from joins/aggregates/etc.

I do have a question about this.
So, if we were to start tracking WAL IO would it fit within this
paradigm to have a new IOPATH_WAL for WAL or would it add a separate
dimension?

Personally, I think WAL fits well as an IOObject. Then we can add
IOCONTEXT_INIT and use that for WAL file initialization and
IOCONTEXT_NORMAL for normal WAL writes/fysncs/etc. I don't think we
need a new dimension for it as it feels like an IO target just like
shared buffers and temporary buffers do. I think we should save adding
new dimensions for relationships that we can't express in the existing
paradigm.

- Melanie

#7Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Nazir Bilal Yavuz (#6)
Re: bgwriter doesn't flush WAL stats

The first patch, to flush the bgwriter's WAL stats to the stats
collector, seems like a straightforward bug fix, so committed and
backpatched that. Thank you!

I didn't look at the second patch.

--
Heikki Linnakangas
Neon (https://neon.tech)

#8Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Heikki Linnakangas (#7)
Re: bgwriter doesn't flush WAL stats

Hi,

On Mon, 2 Oct 2023 at 13:08, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

The first patch, to flush the bgwriter's WAL stats to the stats
collector, seems like a straightforward bug fix, so committed and
backpatched that. Thank you!

I didn't look at the second patch.

Thanks for the push!

Actual commitfest entry for the second patch is:
https://commitfest.postgresql.org/45/4416/. I sent a second patch to
this thread just to show how I found this bug. There is no need to
review it, this commitfest entry could be closed as committed.

Regards,
Nazir Bilal Yavuz
Microsoft