Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

Started by Thomas Munroover 7 years ago13 messages
#1Thomas Munro
thomas.munro@enterprisedb.com
1 attachment(s)

Hi,

In EXPLAIN (BUFFERS), there are two kinds of cache misses that show up
as "reads" when in fact they are not reads at all:

1. Relation extension, which in fact writes a zero-filled block.
2. The RBM_ZERO_* modes, which provoke neither read nor write.

Here's a suggested fix.

I noticed this because I have some patches in development that hit
these cases a bit more and the numbers didn't match my expectation. I
suppose someone might want a separate counter for zero-filled buffers
(they're still buffer interactions and cache misses) but it seems like
it's probably below the kind of thing we're interested in counting
(though in passing, I notice recently that some kernels keep some free
pages pre-zeroed so they can supply them faster, which is curious).

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

0001-Don-t-count-zero-filled-buffers-as-read-in-EXPLAIN.patchapplication/octet-stream; name=0001-Don-t-count-zero-filled-buffers-as-read-in-EXPLAIN.patchDownload
From 87037f4187f59ea2780ee51f764c4d37a0d6c0a3 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Mon, 16 Apr 2018 20:51:08 +1200
Subject: [PATCH] Don't count zero-filled buffers as 'read' in EXPLAIN.

If you extend a relation, it should count as a block written, not read (we
write a zero-filled block).  If you ask for a zero-filled buffer, it shouldn't
be counted as read or written.

Thomas Munro
---
 src/backend/storage/buffer/bufmgr.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 01eabe57063..6ea88712004 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -733,7 +733,9 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
 		if (found)
 			pgBufferUsage.local_blks_hit++;
-		else
+		else if (isExtend)
+			pgBufferUsage.local_blks_written++;
+		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG)
 			pgBufferUsage.local_blks_read++;
 	}
 	else
@@ -746,7 +748,9 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 							 strategy, &found);
 		if (found)
 			pgBufferUsage.shared_blks_hit++;
-		else
+		else if (isExtend)
+			pgBufferUsage.shared_blks_written++;
+		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG)
 			pgBufferUsage.shared_blks_read++;
 	}
 
-- 
2.17.0

#2Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#1)
Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

On 2018-04-30 14:59:31 +1200, Thomas Munro wrote:

Hi,

In EXPLAIN (BUFFERS), there are two kinds of cache misses that show up
as "reads" when in fact they are not reads at all:

1. Relation extension, which in fact writes a zero-filled block.
2. The RBM_ZERO_* modes, which provoke neither read nor write.

Just for understanding: 2) can never happen for buffers showing up in
EXPLAIN, right?

I'm not saying you shouldn't fix the accounting...

Greetings,

Andres Freund

#3Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Andres Freund (#2)
Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

On Mon, Apr 30, 2018 at 3:13 PM, Andres Freund <andres@anarazel.de> wrote:

On 2018-04-30 14:59:31 +1200, Thomas Munro wrote:

In EXPLAIN (BUFFERS), there are two kinds of cache misses that show up
as "reads" when in fact they are not reads at all:

1. Relation extension, which in fact writes a zero-filled block.
2. The RBM_ZERO_* modes, which provoke neither read nor write.

Just for understanding: 2) can never happen for buffers showing up in
EXPLAIN, right?

I'm not saying you shouldn't fix the accounting...

Maybe the hash AM can reach that in _hash_getinitbuf() while adding
overflow pages to bucket chains? Admittedly case 2 is obscure and
rare if not unreachable and probably no one would care too much about
that in practice (whereas case 1 can be seen by simply inserting stuff
into a new empty table). Other patches I'm working on for later
proposal do it more often (think accessing known-empty pages in a kind
of preallocated extent), and it occurred to me that it's clearly a bug
on principle, hence this patch.

--
Thomas Munro
http://www.enterprisedb.com

#4Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Thomas Munro (#3)
Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

On Mon, Apr 30, 2018 at 1:38 PM Thomas Munro <thomas.munro@enterprisedb.com>
wrote:

On Mon, Apr 30, 2018 at 3:13 PM, Andres Freund <andres@anarazel.de> wrote:

On 2018-04-30 14:59:31 +1200, Thomas Munro wrote:

In EXPLAIN (BUFFERS), there are two kinds of cache misses that show up
as "reads" when in fact they are not reads at all:

1. Relation extension, which in fact writes a zero-filled block.
2. The RBM_ZERO_* modes, which provoke neither read nor write.

Just for understanding: 2) can never happen for buffers showing up in
EXPLAIN, right?

I'm not saying you shouldn't fix the accounting...

Maybe the hash AM can reach that in _hash_getinitbuf() while adding
overflow pages to bucket chains? Admittedly case 2 is obscure and
rare if not unreachable and probably no one would care too much about
that in practice (whereas case 1 can be seen by simply inserting stuff
into a new empty table). Other patches I'm working on for later
proposal do it more often (think accessing known-empty pages in a kind
of preallocated extent), and it occurred to me that it's clearly a bug
on principle, hence this patch.

I checked the patch and I agree with the change 1). And regarding change 2)
whether it is zeroing the contents of the page or not, it does read?
because if it exists in the buffer pool, we are counting them as hits
irrespective
of the mode? Am I missing something?

Regards,
Haribabu Kommi
Fujitsu Australia

#5Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Haribabu Kommi (#4)
Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

On Thu, Jul 12, 2018 at 12:46 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

On 2018-04-30 14:59:31 +1200, Thomas Munro wrote:

In EXPLAIN (BUFFERS), there are two kinds of cache misses that show up
as "reads" when in fact they are not reads at all:

1. Relation extension, which in fact writes a zero-filled block.
2. The RBM_ZERO_* modes, which provoke neither read nor write.

I checked the patch and I agree with the change 1). And regarding change 2)
whether it is zeroing the contents of the page or not, it does read?
because if it exists in the buffer pool, we are counting them as hits
irrespective
of the mode? Am I missing something?

Further down in the function you can see that there is no read()
system call for the RBM_ZERO_* modes:

if (mode == RBM_ZERO_AND_LOCK || mode ==
RBM_ZERO_AND_CLEANUP_LOCK)
MemSet((char *) bufBlock, 0, BLCKSZ);
else
{
...
smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
...
}

I suppose someone might argue that even when it's not a hit and it's
not a read, we might still want to count this buffer interaction in
some other way. Perhaps there should be a separate counter? It may
technically be a kind of cache miss, but it's nowhere near as
expensive as a synchronous system call like read() so I didn't propose
that.

Some more on my motivation: In our zheap prototype, when the system
is working well and we have enough space, we constantly allocate
zeroed buffer pages at the insert point (= head) of an undo log and
drop pages at the discard point (= tail) in the background;
effectively a few pages just go round and round via the freelist and
no read() or write() syscalls happen. That's something I'm very happy
about and it's one of our claimed advantages over the traditional heap
(which tends to read and dirty more pages), but EXPLAIN (BUFFERS)
hides this virtuous behaviour when comparing with the traditional
heap: it falsely and slanderously reports that zheap is reading undo
pages when it is not. Of course I don't intent to litigate zheap
design in this thread, I just I figured that since this accounting is
wrong on principle and affects current PostgreSQL too (at least in
theory) I would propose this little patch independently. It's subtle
enough that I wouldn't bother to back-patch it though.

--
Thomas Munro
http://www.enterprisedb.com

#6Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Thomas Munro (#5)
Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

On Thu, Jul 12, 2018 at 8:32 AM Thomas Munro <thomas.munro@enterprisedb.com>
wrote:

On Thu, Jul 12, 2018 at 12:46 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

On 2018-04-30 14:59:31 +1200, Thomas Munro wrote:

In EXPLAIN (BUFFERS), there are two kinds of cache misses that show

up

as "reads" when in fact they are not reads at all:

1. Relation extension, which in fact writes a zero-filled block.
2. The RBM_ZERO_* modes, which provoke neither read nor write.

I checked the patch and I agree with the change 1). And regarding change

2)

whether it is zeroing the contents of the page or not, it does read?
because if it exists in the buffer pool, we are counting them as hits
irrespective
of the mode? Am I missing something?

Further down in the function you can see that there is no read()
system call for the RBM_ZERO_* modes:

if (mode == RBM_ZERO_AND_LOCK || mode ==
RBM_ZERO_AND_CLEANUP_LOCK)
MemSet((char *) bufBlock, 0, BLCKSZ);
else
{
...
smgrread(smgr, forkNum, blockNum, (char *)
bufBlock);
...
}

Thanks for the details. I got your point. But we need to include
RBM_ZERO_ON_ERROR case read operations, excluding others
are fine.

I suppose someone might argue that even when it's not a hit and it's
not a read, we might still want to count this buffer interaction in
some other way. Perhaps there should be a separate counter? It may
technically be a kind of cache miss, but it's nowhere near as
expensive as a synchronous system call like read() so I didn't propose
that.

Yes, I agree that we may need a new counter that counts the buffers that
are just allocated (no read or no write). But currently, may be the counter
value is very less, so people are not interested.

Some more on my motivation: In our zheap prototype, when the system
is working well and we have enough space, we constantly allocate
zeroed buffer pages at the insert point (= head) of an undo log and
drop pages at the discard point (= tail) in the background;
effectively a few pages just go round and round via the freelist and
no read() or write() syscalls happen. That's something I'm very happy
about and it's one of our claimed advantages over the traditional heap
(which tends to read and dirty more pages), but EXPLAIN (BUFFERS)
hides this virtuous behaviour when comparing with the traditional
heap: it falsely and slanderously reports that zheap is reading undo
pages when it is not. Of course I don't intent to litigate zheap
design in this thread, I just I figured that since this accounting is
wrong on principle and affects current PostgreSQL too (at least in
theory) I would propose this little patch independently. It's subtle
enough that I wouldn't bother to back-patch it though.

OK. May be it is better to implement the buffer allocate counter along with
zheap to provide better buffer results?

Regards,
Haribabu Kommi
Fujitsu Australia

#7David Rowley
david.rowley@2ndquadrant.com
In reply to: Haribabu Kommi (#6)
Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

On 12 July 2018 at 12:19, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

Yes, I agree that we may need a new counter that counts the buffers that
are just allocated (no read or no write). But currently, may be the counter
value is very less, so people are not interested.

The existing counters don't show up in EXPLAIN (ANALYZE, BUFFERS) when
they're zero. The new counter would surely work the same way, so the
users wouldn't be burdened by additional explain output it when
they're not affected by it.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#8Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Thomas Munro (#5)
Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

At Thu, 12 Jul 2018 10:31:46 +1200, Thomas Munro <thomas.munro@enterprisedb.com> wrote in <CAEepm=2yGnw6dJ5wx8tuSRmcpDZ5uziiH_qQ0ptdha8mMLSVqw@mail.gmail.com>

I suppose someone might argue that even when it's not a hit and it's
not a read, we might still want to count this buffer interaction in
some other way. Perhaps there should be a separate counter? It may
technically be a kind of cache miss, but it's nowhere near as
expensive as a synchronous system call like read() so I didn't propose
that.

At Thu, 12 Jul 2018 10:19:29 +1000, Haribabu Kommi <kommi.haribabu@gmail.com> wrote in <CAJrrPGduEpxwtu9VFxT21DNK=WRP=LUjK4GjPfm+4+PCjpcAxA@mail.gmail.com>

Thanks for the details. I got your point. But we need to include
RBM_ZERO_ON_ERROR case read operations, excluding others
are fine.

FWIW I agree to Haribabu's comment that the case of RBM_ZERO_*
other than ON_ERROR is a hit. It seems to show zheap's disk I /O
reduction by itself in certain extent.

At Thu, 12 Jul 2018 13:28:37 +1200, David Rowley <david.rowley@2ndquadrant.com> wrote in <CAKJS1f_kem=OzVpFADzC2=VpzcXuSQ+HCW=Hp67GAXrnQ-EVTw@mail.gmail.com>

On 12 July 2018 at 12:19, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

Yes, I agree that we may need a new counter that counts the buffers that
are just allocated (no read or no write). But currently, may be the counter
value is very less, so people are not interested.

The existing counters don't show up in EXPLAIN (ANALYZE, BUFFERS) when
they're zero. The new counter would surely work the same way, so the
users wouldn't be burdened by additional explain output it when
they're not affected by it.

I don't object strongly neither to the new counter but I'm not
sure it is enough distinctive from hits, in the view that "a hit
is where we found a buffer for the requested page".

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Kyotaro HORIGUCHI (#8)
Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

Is this patch committable now?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Alvaro Herrera (#9)
Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

At Sat, 17 Nov 2018 11:15:54 -0300, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in <20181117141554.4dkx2u4j6md3bqdh@alvherre.pgsql>

Is this patch committable now?

I don't think so. We should make a decision on a point.

I was a bit confused (sorry) but IIUIC Haribabu suggested that
the RBM_ZERO_ON_ERROR case should be included to read (not just
ignored). I'm on it. I think that RPM_ZERO_* other than ON_ERROR
cases could be a kind of hit but I don't insist on it.

So, I think we should decide on at least the ON_ERROR case before
this becomes commttable.

The another counter could be another issue. I don't object to add
the counter but I'm not sure what is the suitable name.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#11Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Kyotaro HORIGUCHI (#10)
1 attachment(s)
Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

On Mon, Nov 19, 2018 at 5:24 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Sat, 17 Nov 2018 11:15:54 -0300, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in <20181117141554.4dkx2u4j6md3bqdh@alvherre.pgsql>

Is this patch committable now?

I don't think so. We should make a decision on a point.

I was a bit confused (sorry) but IIUIC Haribabu suggested that
the RBM_ZERO_ON_ERROR case should be included to read (not just
ignored). I'm on it. I think that RPM_ZERO_* other than ON_ERROR
cases could be a kind of hit but I don't insist on it.

So, I think we should decide on at least the ON_ERROR case before
this becomes commttable.

Agreed, RBM_ZERO_ON_ERROR needs to be counted as a read. Here's a new
version like that.

The another counter could be another issue. I don't object to add
the counter but I'm not sure what is the suitable name.

I think that might be interesting information in the future, but let's
consider that for a later patch.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

0001-Don-t-count-zero-filled-buffers-as-read-in-EXPLAI-v2.patchapplication/octet-stream; name=0001-Don-t-count-zero-filled-buffers-as-read-in-EXPLAI-v2.patchDownload
From 400eaa29bcf27cbd6e560614f6fb8e46a2bf5842 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Mon, 16 Apr 2018 20:51:08 +1200
Subject: [PATCH] Don't count zero-filled buffers as 'read' in EXPLAIN.

If you extend a relation, it should count as a block written, not
read (we write a zero-filled block).  If you ask for a zero-filled
buffer, it shouldn't be counted as read or written.

Later we might consider counting zero-filled buffers with a separate
counter.

Author: Thomas Munro
Reviewed-by: Haribabu Kommi, Kyotaro Horiguchi, David Rowley
Discussion: https://postgr.es/m/CAEepm%3D3JytB3KPpvSwXzkY%2Bdwc5zC8P8Lk7Nedkoci81_0E9rA%40mail.gmail.com
---
 src/backend/storage/buffer/bufmgr.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 01eabe57063..9817770affc 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -733,7 +733,10 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
 		if (found)
 			pgBufferUsage.local_blks_hit++;
-		else
+		else if (isExtend)
+			pgBufferUsage.local_blks_written++;
+		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
+				 mode == RBM_ZERO_ON_ERROR)
 			pgBufferUsage.local_blks_read++;
 	}
 	else
@@ -746,7 +749,10 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 							 strategy, &found);
 		if (found)
 			pgBufferUsage.shared_blks_hit++;
-		else
+		else if (isExtend)
+			pgBufferUsage.shared_blks_written++;
+		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
+				 mode == RBM_ZERO_ON_ERROR)
 			pgBufferUsage.shared_blks_read++;
 	}
 
-- 
2.19.1

#12Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Thomas Munro (#11)
Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

On Fri, Nov 23, 2018 at 6:39 PM Thomas Munro <thomas.munro@enterprisedb.com>
wrote:

On Mon, Nov 19, 2018 at 5:24 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Sat, 17 Nov 2018 11:15:54 -0300, Alvaro Herrera <

alvherre@2ndquadrant.com> wrote in
<20181117141554.4dkx2u4j6md3bqdh@alvherre.pgsql>

Is this patch committable now?

I don't think so. We should make a decision on a point.

I was a bit confused (sorry) but IIUIC Haribabu suggested that
the RBM_ZERO_ON_ERROR case should be included to read (not just
ignored). I'm on it. I think that RPM_ZERO_* other than ON_ERROR
cases could be a kind of hit but I don't insist on it.

So, I think we should decide on at least the ON_ERROR case before
this becomes commttable.

Agreed, RBM_ZERO_ON_ERROR needs to be counted as a read. Here's a new
version like that.

The another counter could be another issue. I don't object to add
the counter but I'm not sure what is the suitable name.

I think that might be interesting information in the future, but let's
consider that for a later patch.

Thanks for the updated patch. It looks good.
I marked it in the commitfest as ready for committer.

Regards,
Haribabu Kommi
Fujitsu Australia

#13Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Haribabu Kommi (#12)
Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

On Tue, Nov 27, 2018 at 2:53 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

Thanks for the updated patch. It looks good.
I marked it in the commitfest as ready for committer.

Pushed. Thanks for the reviews!

--
Thomas Munro
http://www.enterprisedb.com