Buffer usage detailed by RelKind in EXPLAIN ANALYZE BUFFERS

Started by Andrey Borodinalmost 3 years ago6 messages

amborodin86@gmail.com

almost 3 years ago

1 attachment(s)

Hi hackers!

I was asked to prototype a feature that helps to distinguish shared
buffer usage between index reads and heap reads. Practically it looks
like this:

# explain (analyze,verbose,buffers) select nextval('s');
QUERY PLAN
------------------------------------------------------------------------------------------------
Result (cost=0.00..0.01 rows=1 width=8) (actual time=1.159..1.160
rows=1 loops=1)
Output: nextval('s'::regclass)
Buffers: shared hit=7(relation 3, index 4) read=6(relation 1, index
4, sequence 1) dirtied=1
Planning Time: 0.214 ms
Execution Time: 1.238 ms
(5 rows)

The change is in these parts "(relation 3, index 4)" and "(relation 1,
index 4, sequence 1)". Probably, it should help DBAs to better
understand complex queries.
I think cluttering output with more text is OK as long as "verbose" is
requested.

But there are some caveats:
1. Some more increments on hot paths. We have to add this tiny toll to
every single buffer hit, but it will be seldom of any use.
2. It's difficult to measure writes caused by query, and even dirties.
The patch adds "evictions" caused by query, but they have little
practical sense too.

All in all I do not have an opinion if this feature is a good tradeoff.
What do you think? Does the feature look useful? Do we want a more
polished implementation?

Thanks!

Best regards, Andrey Borodin.

Attachments:

v1-0001-Split-EXPLAIN-ANALYZE-BUFFERS-by-RelKind.patchapplication/octet-stream; name=v1-0001-Split-EXPLAIN-ANALYZE-BUFFERS-by-RelKind.patchDownload

From a64f89f9b681875939ae9a718cba3aaddd173536 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@flight.local>
Date: Sun, 12 Feb 2023 17:54:54 -0800
Subject: [PATCH v1] Split EXPLAIN ANALYZE BUFFERS by RelKind

---
 src/backend/access/common/relation.c | 93 ++++++++++++++++++++++++++++
 src/backend/commands/explain.c       | 47 ++++++++++++++
 src/backend/executor/instrument.c    | 14 +++++
 src/backend/storage/buffer/bufmgr.c  | 18 ++++--
 src/include/executor/instrument.h    |  9 +++
 src/include/utils/rel.h              |  5 ++
 6 files changed, 182 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/common/relation.c b/src/backend/access/common/relation.c
index 4017e175e3..bbffcabfdd 100644
--- a/src/backend/access/common/relation.c
+++ b/src/backend/access/common/relation.c
@@ -215,3 +215,96 @@ relation_close(Relation relation, LOCKMODE lockmode)
 	if (lockmode != NoLock)
 		UnlockRelationId(&relid, lockmode);
 }
+
+int
+relkind_id(unsigned char relkind)
+{
+	switch (relkind)
+	{
+		case RELKIND_RELATION:
+			return 0;
+		case RELKIND_INDEX:
+			return 1;
+		case RELKIND_SEQUENCE:
+			return 2;
+		case RELKIND_TOASTVALUE:
+			return 3;
+		case RELKIND_VIEW:
+			return 4;
+		case RELKIND_MATVIEW:
+			return 5;
+		case RELKIND_COMPOSITE_TYPE:
+			return 6;
+		case RELKIND_FOREIGN_TABLE:
+			return 7;
+		case RELKIND_PARTITIONED_TABLE:
+			return 8;
+		case RELKIND_PARTITIONED_INDEX:
+			return 9;
+		default:
+			/* map all unknown to RELKIND_RELATION */
+			return 0;
+	}
+}
+
+unsigned char
+relkind_by_id(int id)
+{
+	switch (id)
+	{
+		case 0:
+			return RELKIND_RELATION;
+		case 1:
+			return RELKIND_INDEX;
+		case 2:
+			return RELKIND_SEQUENCE;
+		case 3:
+			return RELKIND_TOASTVALUE;
+		case 4:
+			return RELKIND_VIEW;
+		case 5:
+			return RELKIND_MATVIEW;
+		case 6:
+			return RELKIND_COMPOSITE_TYPE;
+		case 7:
+			return RELKIND_FOREIGN_TABLE;
+		case 8:
+			return RELKIND_PARTITIONED_TABLE;
+		case 9:
+			return RELKIND_PARTITIONED_INDEX;
+		default:
+			elog(ERROR, "unrecognized relkind id: %d", id);
+			return 0;
+	}
+}
+
+char*
+relkind_name(unsigned char relkind)
+{
+	switch (relkind)
+	{
+		case RELKIND_RELATION:
+			return "relation";
+		case RELKIND_INDEX:
+			return "index";
+		case RELKIND_SEQUENCE:
+			return "sequence";
+		case RELKIND_TOASTVALUE:
+			return "toastvalue";
+		case RELKIND_VIEW:
+			return "view";
+		case RELKIND_MATVIEW:
+			return "matview";
+		case RELKIND_COMPOSITE_TYPE:
+			return "composite type";
+		case RELKIND_FOREIGN_TABLE:
+			return "foreign table";
+		case RELKIND_PARTITIONED_TABLE:
+			return "partitioned table";
+		case RELKIND_PARTITIONED_INDEX:
+			return "partitioned index";
+		default:
+			elog(ERROR, "unrecognized relkind: '%c'", relkind);
+			return 0;
+	}
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index fbbf28cf06..03ff2fa30f 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3563,19 +3563,66 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage, bool planning)
 
 			if (has_shared)
 			{
+				int i;
 				appendStringInfoString(es->str, " shared");
 				if (usage->shared_blks_hit > 0)
+				{
 					appendStringInfo(es->str, " hit=%lld",
 									 (long long) usage->shared_blks_hit);
+					if (es->verbose)
+					{
+						appendStringInfo(es->str, "(");
+						for (i = 0; i < MAX_RELKIND_ID; i++)
+						{
+							if (usage->relKindUsage[i].blks_hit > 0)
+							{
+								if (i)
+									appendStringInfo(es->str, ", ");
+								appendStringInfo(es->str, "%s %lld",
+											relkind_name(relkind_by_id(i)),
+											(long long) usage->relKindUsage[i].blks_hit);
+							}
+						}
+						appendStringInfo(es->str, ")");
+					}
+				}
 				if (usage->shared_blks_read > 0)
+				{
 					appendStringInfo(es->str, " read=%lld",
 									 (long long) usage->shared_blks_read);
+					if (es->verbose)
+					{
+						appendStringInfo(es->str, "(");
+						for (i = 0; i < MAX_RELKIND_ID; i++)
+						{
+							if (usage->relKindUsage[i].blks_read > 0)
+							{
+								if (i)
+									appendStringInfo(es->str, ", ");
+								appendStringInfo(es->str, "%s %lld",
+											relkind_name(relkind_by_id(i)),
+											(long long) usage->relKindUsage[i].blks_read);
+							}
+						}
+						appendStringInfo(es->str, ")");
+					}
+				}
 				if (usage->shared_blks_dirtied > 0)
 					appendStringInfo(es->str, " dirtied=%lld",
 									 (long long) usage->shared_blks_dirtied);
 				if (usage->shared_blks_written > 0)
 					appendStringInfo(es->str, " written=%lld",
 									 (long long) usage->shared_blks_written);
+				if (es->verbose)
+				{
+					for (i = 0; i < MAX_RELKIND_ID; i++)
+					{
+						if (usage->relKindUsage[i].blks_evicted > 0)
+							appendStringInfo(es->str, " %s evicted=%lld",
+										relkind_name(relkind_by_id(i)),
+										(long long) usage->relKindUsage[i].blks_evicted);
+					}
+				}
 				if (has_local || has_temp)
 					appendStringInfoChar(es->str, ',');
 			}
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index ee78a5749d..b7cb2b298c 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -225,10 +225,17 @@ InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
 static void
 BufferUsageAdd(BufferUsage *dst, const BufferUsage *add)
 {
+	int i;
 	dst->shared_blks_hit += add->shared_blks_hit;
 	dst->shared_blks_read += add->shared_blks_read;
 	dst->shared_blks_dirtied += add->shared_blks_dirtied;
 	dst->shared_blks_written += add->shared_blks_written;
+	for (i = 0; i < MAX_RELKIND_ID; i++)
+	{
+		dst->relKindUsage[i].blks_hit += add->relKindUsage[i].blks_hit;
+		dst->relKindUsage[i].blks_read += add->relKindUsage[i].blks_read;
+		dst->relKindUsage[i].blks_evicted += add->relKindUsage[i].blks_evicted;
+	}
 	dst->local_blks_hit += add->local_blks_hit;
 	dst->local_blks_read += add->local_blks_read;
 	dst->local_blks_dirtied += add->local_blks_dirtied;
@@ -247,10 +254,17 @@ BufferUsageAccumDiff(BufferUsage *dst,
 					 const BufferUsage *add,
 					 const BufferUsage *sub)
 {
+	int i;
 	dst->shared_blks_hit += add->shared_blks_hit - sub->shared_blks_hit;
 	dst->shared_blks_read += add->shared_blks_read - sub->shared_blks_read;
 	dst->shared_blks_dirtied += add->shared_blks_dirtied - sub->shared_blks_dirtied;
 	dst->shared_blks_written += add->shared_blks_written - sub->shared_blks_written;
+	for (i = 0; i < MAX_RELKIND_ID; i++)
+	{
+		dst->relKindUsage[i].blks_hit += add->relKindUsage[i].blks_hit - sub->relKindUsage[i].blks_hit;
+		dst->relKindUsage[i].blks_read += add->relKindUsage[i].blks_read - sub->relKindUsage[i].blks_read;
+		dst->relKindUsage[i].blks_evicted += add->relKindUsage[i].blks_evicted - sub->relKindUsage[i].blks_evicted;
+	}
 	dst->local_blks_hit += add->local_blks_hit - sub->local_blks_hit;
 	dst->local_blks_read += add->local_blks_read - sub->local_blks_read;
 	dst->local_blks_dirtied += add->local_blks_dirtied - sub->local_blks_dirtied;
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 2d6dbc6561..8bf011405f 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -453,7 +453,7 @@ ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref)
 static Buffer ReadBuffer_common(SMgrRelation smgr, char relpersistence,
 								ForkNumber forkNum, BlockNumber blockNum,
 								ReadBufferMode mode, BufferAccessStrategy strategy,
-								bool *hit);
+								bool *hit, unsigned char relKind);
 static bool PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy);
 static void PinBuffer_Locked(BufferDesc *buf);
 static void UnpinBuffer(BufferDesc *buf);
@@ -770,7 +770,7 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
 	 */
 	pgstat_count_buffer_read(reln);
 	buf = ReadBuffer_common(RelationGetSmgr(reln), reln->rd_rel->relpersistence,
-							forkNum, blockNum, mode, strategy, &hit);
+							forkNum, blockNum, mode, strategy, &hit, reln->rd_rel->relkind);
 	if (hit)
 		pgstat_count_buffer_hit(reln);
 	return buf;
@@ -798,7 +798,7 @@ ReadBufferWithoutRelcache(RelFileLocator rlocator, ForkNumber forkNum,
 
 	return ReadBuffer_common(smgr, permanent ? RELPERSISTENCE_PERMANENT :
 							 RELPERSISTENCE_UNLOGGED, forkNum, blockNum,
-							 mode, strategy, &hit);
+							 mode, strategy, &hit, 0);
 }
 
 
@@ -810,7 +810,7 @@ ReadBufferWithoutRelcache(RelFileLocator rlocator, ForkNumber forkNum,
 static Buffer
 ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 				  BlockNumber blockNum, ReadBufferMode mode,
-				  BufferAccessStrategy strategy, bool *hit)
+				  BufferAccessStrategy strategy, bool *hit, unsigned char relKind)
 {
 	BufferDesc *bufHdr;
 	Block		bufBlock;
@@ -873,13 +873,23 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		 */
 		bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
 							 strategy, &found, &io_context);
+		relKind = relkind_id(relKind);
 		if (found)
+		{
 			pgBufferUsage.shared_blks_hit++;
+			pgBufferUsage.relKindUsage[relKind].blks_hit++;
+		}
 		else if (isExtend)
+		{
 			pgBufferUsage.shared_blks_written++;
+			pgBufferUsage.relKindUsage[relKind].blks_evicted++;
+		}
 		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
 				 mode == RBM_ZERO_ON_ERROR)
+		{
 			pgBufferUsage.shared_blks_read++;
+			pgBufferUsage.relKindUsage[relKind].blks_read++;
+		 }
 	}
 
 	/* At this point we do NOT hold any locks. */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 87e5e2183b..869a70c662 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -14,8 +14,16 @@
 #define INSTRUMENT_H
 
 #include "portability/instr_time.h"
+#include "utils/rel.h"
 
 
+typedef struct RelKindBufferUsage
+{
+	int64	blks_hit;	/* # of shared buffer hits */
+	int64	blks_read;	/* # of shared disk blocks read */
+	int64	blks_evicted;	/* # of shared disk blocks evicted to accomodate */
+} RelKindBufferUsage;
+
 /*
  * BufferUsage and WalUsage counters keep being incremented infinitely,
  * i.e., must never be reset to zero, so that we can calculate how much
@@ -27,6 +35,7 @@ typedef struct BufferUsage
 	int64		shared_blks_read;	/* # of shared disk blocks read */
 	int64		shared_blks_dirtied;	/* # of shared blocks dirtied */
 	int64		shared_blks_written;	/* # of shared disk blocks written */
+	RelKindBufferUsage relKindUsage[MAX_RELKIND_ID];
 	int64		local_blks_hit; /* # of local buffer hits */
 	int64		local_blks_read;	/* # of local disk blocks read */
 	int64		local_blks_dirtied; /* # of local blocks dirtied */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index af9785038d..937f1d0430 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -706,4 +706,9 @@ RelationCloseSmgr(Relation relation)
 extern void RelationIncrementReferenceCount(Relation rel);
 extern void RelationDecrementReferenceCount(Relation rel);
 
+
+#define MAX_RELKIND_ID 10
+extern int relkind_id(unsigned char relkind);
+extern char* relkind_name(unsigned char relkind);
+extern unsigned char relkind_by_id(int id);
 #endif							/* REL_H */
-- 
2.32.0 (Apple Git-132)

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Andrey Borodin (#1)

Re: Buffer usage detailed by RelKind in EXPLAIN ANALYZE BUFFERS

Hi,

On 2023-02-13 16:23:30 -0800, Andrey Borodin wrote:

But there are some caveats:
1. Some more increments on hot paths. We have to add this tiny toll to
every single buffer hit, but it will be seldom of any use.

Additionally, I bet it slows down EXPLAIN (ANALYZE, BUFFERS) noticeably. It's
already quite expensive...

All in all I do not have an opinion if this feature is a good tradeoff.
What do you think? Does the feature look useful? Do we want a more
polished implementation?

Unless the above issues could be avoided, I don't think so.

Greetings,

Andres Freund

Andrey Borodin

amborodin86@gmail.com

almost 3 years ago

In reply to: Andres Freund (#2)

Re: Buffer usage detailed by RelKind in EXPLAIN ANALYZE BUFFERS

On Mon, Feb 13, 2023 at 4:29 PM Andres Freund <andres@anarazel.de> wrote:

1. Some more increments on hot paths. We have to add this tiny toll to
every single buffer hit, but it will be seldom of any use.

Additionally, I bet it slows down EXPLAIN (ANALYZE, BUFFERS) noticeably. It's
already quite expensive...

I think collection of instrumentation is done unconditionally.
We always do that
pgBufferUsage.shared_blks_hit++;
when the buffer is in shared_buffers.

Best regards, Andrey Borodin.

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Andrey Borodin (#3)

Re: Buffer usage detailed by RelKind in EXPLAIN ANALYZE BUFFERS

Hi,

On 2023-02-13 16:36:25 -0800, Andrey Borodin wrote:

On Mon, Feb 13, 2023 at 4:29 PM Andres Freund <andres@anarazel.de> wrote:

1. Some more increments on hot paths. We have to add this tiny toll to
every single buffer hit, but it will be seldom of any use.

Additionally, I bet it slows down EXPLAIN (ANALYZE, BUFFERS) noticeably. It's
already quite expensive...

I think collection of instrumentation is done unconditionally.
We always do that
pgBufferUsage.shared_blks_hit++;
when the buffer is in shared_buffers.

The problem I'm talking about is the increased overhead in InstrStopNode(),
due to BufferUsageAccumDiff() getting more expensive.

Andres

Andrey Borodin

amborodin86@gmail.com

almost 3 years ago

In reply to: Andres Freund (#4)

1 attachment(s)

Re: Buffer usage detailed by RelKind in EXPLAIN ANALYZE BUFFERS

On Mon, Feb 13, 2023 at 4:39 PM Andres Freund <andres@anarazel.de> wrote:

The problem I'm talking about is the increased overhead in InstrStopNode(),
due to BufferUsageAccumDiff() getting more expensive.

Thanks, now I understand the problem better. According to godbolt.com
my patch doubles the number of instructions in this function. Unless
we compute only tables\indexes\matviews.
Anyway, without regarding functionality of this particular patch,
BufferUsageAccumDiff() does not seem slow to me. It's just a
branchless bunch of simd instructions. Performance of this function
might matter only when called gazillion times per second.

Best regards, Andrey Borodin.

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Andrey Borodin (#5)

Re: Buffer usage detailed by RelKind in EXPLAIN ANALYZE BUFFERS

Hi,

On 2023-02-13 18:14:58 -0800, Andrey Borodin wrote:

On Mon, Feb 13, 2023 at 4:39 PM Andres Freund <andres@anarazel.de> wrote:

The problem I'm talking about is the increased overhead in InstrStopNode(),
due to BufferUsageAccumDiff() getting more expensive.

Thanks, now I understand the problem better. According to godbolt.com
my patch doubles the number of instructions in this function. Unless
we compute only tables\indexes\matviews.
Anyway, without regarding functionality of this particular patch,
BufferUsageAccumDiff() does not seem slow to me. It's just a
branchless bunch of simd instructions. Performance of this function
might matter only when called gazillion times per second.

It is called gazillions of times per second when you do an EXPLAIN (ANALYZE,
BUFFERS). Every invocation of an executor node calls it.

Here's a quick pgbench, showing todays code, with -O3, without assertions:
298.396 0 SELECT generate_series(1, 10000000) OFFSET 10000000;
397.400 0 EXPLAIN (ANALYZE, TIMING OFF) SELECT generate_series(1, 10000000) OFFSET 10000000;
717.238 0 EXPLAIN (ANALYZE, TIMING ON) SELECT generate_series(1, 10000000) OFFSET 10000000;
419.736 0 EXPLAIN (ANALYZE, BUFFERS, TIMING OFF) SELECT generate_series(1, 10000000) OFFSET 10000000;
761.575 0 EXPLAIN (ANALYZE, BUFFERS, TIMING ON) SELECT generate_series(1, 10000000) OFFSET 10000000;

The effect ends up a lot larger once you add in joins etc, because it ads
additional executor node that all have their instrumentation started/stopped.

Greetings,

Andres Freund