WAL usage calculation patch
Hello pgsql-hackers,
I am submitting a patch that enables gathering of per-statement WAL
generation statistics, similar to what is already done for buffer usage.
It collects the number of records added to WAL and the number of WAL
bytes written.
We found the collected data valuable when analyzing update-heavy loads
where WAL generation is the bottleneck.
The usage data is collected at a low level, after compression has been
applied to the WAL record. It is then exposed via pg_stat_statements and
could also be used in EXPLAIN ANALYZE if needed. The instrumentation is
similar to the one used for buffer stats. I didn't dare to unify the two
sets of usage metrics into a single struct, nor to rework the way both
are passed to parallel workers.
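
To illustrate the counting scheme, here is a minimal standalone sketch of
the snapshot-and-diff pattern. It is not the patch code: count_wal_record()
and measure_statement() are hypothetical stand-ins for the hook in
XLogInsertRecord() and for an instrumented statement, and int64_t replaces
the struct's field types purely for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the patch's WalUsage struct. */
typedef struct WalUsage
{
    int64_t wal_records;    /* # of WAL records produced */
    int64_t wal_bytes;      /* size of WAL records produced */
} WalUsage;

/* Backend-wide running totals, bumped on every WAL record insert. */
static WalUsage pgWalUsage;

/* Hypothetical stand-in for the counting added to XLogInsertRecord(). */
static void
count_wal_record(uint32_t total_len)
{
    pgWalUsage.wal_bytes += total_len;
    pgWalUsage.wal_records++;
}

/* dst += (now - start), the same shape as WalUsageAccumDiff(). */
static void
wal_usage_accum_diff(WalUsage *dst, const WalUsage *now, const WalUsage *start)
{
    /* Counters only grow, so the snapshot can never exceed the current value. */
    assert(now->wal_records >= start->wal_records);
    dst->wal_bytes += now->wal_bytes - start->wal_bytes;
    dst->wal_records += now->wal_records - start->wal_records;
}

/*
 * Hypothetical "statement": snapshot the global counters, emit some
 * records, then attribute only the difference to this statement.
 */
static WalUsage
measure_statement(const uint32_t *record_lens, int nrecords)
{
    WalUsage start = pgWalUsage;
    WalUsage result = {0, 0};

    for (int i = 0; i < nrecords; i++)
        count_wal_record(record_lens[i]);

    wal_usage_accum_diff(&result, &pgWalUsage, &start);
    return result;
}
```

The point of the snapshot is that WAL written before the statement started
stays in the global totals but is not charged to the statement.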
The performance impact is (supposed to be) very low: essentially two
integer operations and a memory access per WAL record insert, plus some
additional effort to allocate a shmem chunk for parallel workers. Shmem
usage for parallel workers grows by one struct of two longs per worker.
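
The parallel-worker part can be sketched as follows. This is a simplified
model under stated assumptions, not the patch code: calloc() stands in for
the shm_toc allocation, and WalUsageSlot, alloc_worker_slots(),
worker_report() and leader_accumulate() are hypothetical names chosen for
the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One slot per worker; mirrors the two counters in WalUsage. */
typedef struct WalUsageSlot
{
    int64_t wal_records;
    int64_t wal_bytes;
} WalUsageSlot;

/* Leader side: one zeroed slot per worker (calloc stands in for shmem). */
static WalUsageSlot *
alloc_worker_slots(int nworkers)
{
    assert(nworkers > 0);
    return calloc((size_t) nworkers, sizeof(WalUsageSlot));
}

/* Worker side: publish this worker's own delta into its slot at shutdown. */
static void
worker_report(WalUsageSlot *slots, int worker_number,
              int64_t records, int64_t bytes)
{
    slots[worker_number].wal_records = records;
    slots[worker_number].wal_bytes = bytes;
}

/* Leader side: once all workers have finished, fold their slots into the total. */
static void
leader_accumulate(WalUsageSlot *total, const WalUsageSlot *slots, int nworkers)
{
    for (int i = 0; i < nworkers; i++)
    {
        total->wal_records += slots[i].wal_records;
        total->wal_bytes += slots[i].wal_bytes;
    }
}
```

Each worker writes only its own slot, so no locking is needed; the leader
must wait for the workers to finish before reading, or it may see
incomplete data.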
The patch is split into two parts: core changes and pg_stat_statements
additions. The extension has its schema updated to add two more fields,
and the docs are updated to reflect the change. The patch is prepared
against the master branch.
Please provide your comments and/or code findings.
Attachments:
wal_stats.ext.patch (application/octet-stream)
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 5bbe054367..c2133d1ccb 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,11 +6,11 @@ OBJS = \
pg_stat_statements.o
EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.6--1.7.sql \
- pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
- pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
- pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql \
- pg_stat_statements--unpackaged--1.0.sql
+DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.7--1.8.sql \
+ pg_stat_statements--1.6--1.7.sql pg_stat_statements--1.5--1.6.sql \
+ pg_stat_statements--1.4--1.5.sql pg_stat_statements--1.3--1.4.sql \
+ pg_stat_statements--1.2--1.3.sql pg_stat_statements--1.1--1.2.sql \
+ pg_stat_statements--1.0--1.1.sql pg_stat_statements--unpackaged--1.0.sql
PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements"
LDFLAGS_SL += $(filter -lm, $(LIBS))
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 6787ec1efd..a49a6e1a3b 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -95,25 +95,25 @@ EXECUTE pgss_test(1);
(1 row)
DEALLOCATE pgss_test;
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
----------------------------------------------------+-------+------
- PREPARE pgss_test (int) AS SELECT $1, $2 LIMIT $3 | 1 | 1
- SELECT $1 +| 4 | 4
- +| |
- AS "text" | |
- SELECT $1 + $2 | 2 | 2
- SELECT $1 + $2 + $3 AS "add" | 3 | 3
- SELECT $1 AS "float" | 1 | 1
- SELECT $1 AS "int" | 2 | 2
- SELECT $1 AS i UNION SELECT $2 ORDER BY i | 1 | 2
- SELECT $1 || $2 | 1 | 1
- SELECT pg_stat_statements_reset() | 1 | 1
- WITH t(f) AS ( +| 1 | 2
- VALUES ($1), ($2) +| |
- ) +| |
- SELECT f FROM t ORDER BY f | |
- select $1::jsonb ? $2 | 1 | 1
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_write_bytes | wal_write_records
+---------------------------------------------------+-------+------+-----------------+-------------------
+ PREPARE pgss_test (int) AS SELECT $1, $2 LIMIT $3 | 1 | 1 | 0 | 0
+ SELECT $1 +| 4 | 4 | 0 | 0
+ +| | | |
+ AS "text" | | | |
+ SELECT $1 + $2 | 2 | 2 | 0 | 0
+ SELECT $1 + $2 + $3 AS "add" | 3 | 3 | 0 | 0
+ SELECT $1 AS "float" | 1 | 1 | 0 | 0
+ SELECT $1 AS "int" | 2 | 2 | 0 | 0
+ SELECT $1 AS i UNION SELECT $2 ORDER BY i | 1 | 2 | 0 | 0
+ SELECT $1 || $2 | 1 | 1 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
+ WITH t(f) AS ( +| 1 | 2 | 0 | 0
+ VALUES ($1), ($2) +| | | |
+ ) +| | | |
+ SELECT f FROM t ORDER BY f | | | |
+ select $1::jsonb ? $2 | 1 | 1 | 0 | 0
(11 rows)
--
@@ -195,18 +195,111 @@ SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
3 | c
(8 rows)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
--------------------------------------------------------------+-------+------
- DELETE FROM test WHERE a > $1 | 1 | 1
- INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3
- INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10
- SELECT * FROM test ORDER BY a | 1 | 12
- SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4
- SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8
- SELECT pg_stat_statements_reset() | 1 | 1
- UPDATE test SET b = $1 WHERE a = $2 | 6 | 6
- UPDATE test SET b = $1 WHERE a > $2 | 1 | 3
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_write_bytes | wal_write_records
+-------------------------------------------------------------+-------+------+-----------------+-------------------
+ DELETE FROM test WHERE a > $1 | 1 | 1 | 0 | 0
+ INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | 0 | 0
+ INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10 | 0 | 0
+ SELECT * FROM test ORDER BY a | 1 | 12 | 0 | 0
+ SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4 | 0 | 0
+ SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
+ UPDATE test SET b = $1 WHERE a = $2 | 6 | 6 | 0 | 0
+ UPDATE test SET b = $1 WHERE a > $2 | 1 | 3 | 0 | 0
+(9 rows)
+
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test table
+--
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset
+--------------------------
+
+(1 row)
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- explicit transaction
+BEGIN;
+UPDATE pgss_test SET b = '111' WHERE a = 1 ;
+COMMIT;
+BEGIN \;
+UPDATE pgss_test SET b = '222' WHERE a = 2 \;
+COMMIT ;
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
+BEGIN \;
+UPDATE pgss_test SET b = '555' WHERE a = 5 \;
+UPDATE pgss_test SET b = '666' WHERE a = 6 \;
+COMMIT ;
+-- many INSERT values
+INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+-- SELECT with constants
+SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ;
+ a | b
+---+----------------------
+ 6 | 666
+ 7 | aaa
+ 8 | bbb
+ 9 | bbb
+(4 rows)
+
+SELECT *
+ FROM pgss_test
+ WHERE a > 9
+ ORDER BY a ;
+ a | b
+---+---
+(0 rows)
+
+-- SELECT without constants
+SELECT * FROM pgss_test ORDER BY a;
+ a | b
+---+----------------------
+ 1 | a
+ 1 | 111
+ 2 | b
+ 2 | 222
+ 3 | c
+ 3 | 333
+ 4 | 444
+ 5 | 555
+ 6 | 666
+ 7 | aaa
+ 8 | bbb
+ 9 | bbb
+(12 rows)
+
+-- SELECT with IN clause
+SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5);
+ a | b
+---+----------------------
+ 1 | 111
+ 2 | 222
+ 3 | 333
+ 4 | 444
+ 5 | 555
+ 1 | a
+ 2 | b
+ 3 | c
+(8 rows)
+
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_write_bytes | wal_write_records
+------------------------------------------------------------------+-------+------+-----------------+-------------------
+ DELETE FROM pgss_test WHERE a > $1 | 1 | 1 | 54 | 1
+ INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | 240 | 3
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | 1 | 10 | 800 | 10
+ SELECT * FROM pgss_test ORDER BY a | 1 | 12 | 0 | 0
+ SELECT * FROM pgss_test WHERE a > $1 ORDER BY a | 2 | 4 | 0 | 0
+ SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
+ UPDATE pgss_test SET b = $1 WHERE a = $2 | 6 | 6 | 438 | 6
+ UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 | 3 | 219 | 3
(9 rows)
--
@@ -287,13 +380,13 @@ SELECT PLUS_ONE(10);
11
(1 row)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
------------------------------------+-------+------
- SELECT $1::TEXT | 1 | 1
- SELECT PLUS_ONE($1) | 2 | 2
- SELECT PLUS_TWO($1) | 2 | 2
- SELECT pg_stat_statements_reset() | 1 | 1
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_write_bytes | wal_write_records
+-----------------------------------+-------+------+-----------------+-------------------
+ SELECT $1::TEXT | 1 | 1 | 0 | 0
+ SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0
+ SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
(4 rows)
--
@@ -344,14 +437,14 @@ SELECT PLUS_ONE(1);
2
(1 row)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
------------------------------------+-------+------
- SELECT (i + $2 + $3)::INTEGER | 2 | 2
- SELECT (i + $2)::INTEGER LIMIT $3 | 2 | 2
- SELECT PLUS_ONE($1) | 2 | 2
- SELECT PLUS_TWO($1) | 2 | 2
- SELECT pg_stat_statements_reset() | 1 | 1
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_write_bytes | wal_write_records
+-----------------------------------+-------+------+-----------------+-------------------
+ SELECT (i + $2 + $3)::INTEGER | 2 | 2 | 0 | 0
+ SELECT (i + $2)::INTEGER LIMIT $3 | 2 | 2 | 0 | 0
+ SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0
+ SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
(5 rows)
--
@@ -469,18 +562,20 @@ NOTICE: table "test" does not exist, skipping
NOTICE: table "test" does not exist, skipping
NOTICE: function plus_one(pg_catalog.int4) does not exist, skipping
DROP FUNCTION PLUS_TWO(INTEGER);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
--------------------------------------------+-------+------
- CREATE INDEX test_b ON test(b) | 1 | 0
- DROP FUNCTION IF EXISTS PLUS_ONE(INTEGER) | 1 | 0
- DROP FUNCTION PLUS_ONE(INTEGER) | 1 | 0
- DROP FUNCTION PLUS_TWO(INTEGER) | 1 | 0
- DROP TABLE IF EXISTS test | 3 | 0
- DROP TABLE test | 1 | 0
- SELECT $1 | 1 | 1
- SELECT pg_stat_statements_reset() | 1 | 1
-(8 rows)
+DROP TABLE pgss_test;
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_write_bytes | wal_write_records
+-------------------------------------------+-------+------+-----------------+-------------------
+ CREATE INDEX test_b ON test(b) | 1 | 0 | 1673 | 16
+ DROP FUNCTION IF EXISTS PLUS_ONE(INTEGER) | 1 | 0 | 56 | 1
+ DROP FUNCTION PLUS_ONE(INTEGER) | 1 | 0 | 108 | 2
+ DROP FUNCTION PLUS_TWO(INTEGER) | 1 | 0 | 162 | 3
+ DROP TABLE IF EXISTS test | 3 | 0 | 0 | 0
+ DROP TABLE pgss_test | 1 | 0 | 798 | 15
+ DROP TABLE test | 1 | 0 | 1056 | 20
+ SELECT $1 | 1 | 1 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
+(9 rows)
--
-- Track user activity and reset them
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
new file mode 100644
index 0000000000..37bdf73571
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -0,0 +1,49 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.8'" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+ OUT userid oid,
+ OUT dbid oid,
+ OUT queryid bigint,
+ OUT query text,
+ OUT calls int8,
+ OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT mean_time float8,
+ OUT stddev_time float8,
+ OUT rows int8,
+ OUT shared_blks_hit int8,
+ OUT shared_blks_read int8,
+ OUT shared_blks_dirtied int8,
+ OUT shared_blks_written int8,
+ OUT local_blks_hit int8,
+ OUT local_blks_read int8,
+ OUT local_blks_dirtied int8,
+ OUT local_blks_written int8,
+ OUT temp_blks_read int8,
+ OUT temp_blks_written int8,
+ OUT blk_read_time float8,
+ OUT blk_write_time float8,
+ OUT wal_write_bytes int8,
+ OUT wal_write_records int8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4'
+LANGUAGE C STRICT VOLATILE;
+
+CREATE VIEW pg_stat_statements AS
+ SELECT * FROM pg_stat_statements(true);
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6f82a671ee..2dc617437b 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -120,7 +120,8 @@ typedef enum pgssVersion
PGSS_V1_0 = 0,
PGSS_V1_1,
PGSS_V1_2,
- PGSS_V1_3
+ PGSS_V1_3,
+ PGSS_V1_4
} pgssVersion;
/*
@@ -161,6 +162,8 @@ typedef struct Counters
int64 local_blks_written; /* # of local disk blocks written */
int64 temp_blks_read; /* # of temp blocks read */
int64 temp_blks_written; /* # of temp blocks written */
+ int64 wal_bytes_written; /* total amount of wal bytes written */
+ int64 wal_records_written; /* # of wal records written */
double blk_read_time; /* time spent reading, in msec */
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
@@ -293,6 +296,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_3);
+PG_FUNCTION_INFO_V1(pg_stat_statements_1_4);
PG_FUNCTION_INFO_V1(pg_stat_statements);
static void pgss_shmem_startup(void);
@@ -313,6 +317,7 @@ static void pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate);
static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
pgssVersion api_version,
@@ -841,6 +846,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
0,
0,
NULL,
+ NULL,
&jstate);
}
@@ -944,6 +950,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
queryDesc->totaltime->total * 1000.0, /* convert to msec */
queryDesc->estate->es_processed,
&queryDesc->totaltime->bufusage,
+ &queryDesc->totaltime->walusage,
NULL);
}
@@ -989,7 +996,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
BufferUsage bufusage_start,
bufusage;
+ WalUsage walusage_start,
+ walusage;
+
bufusage_start = pgBufferUsage;
+ walusage_start = pgWalUsage;
INSTR_TIME_SET_CURRENT(start);
nested_level++;
@@ -1046,6 +1057,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
bufusage.blk_write_time = pgBufferUsage.blk_write_time;
INSTR_TIME_SUBTRACT(bufusage.blk_write_time, bufusage_start.blk_write_time);
+ walusage.wal_bytes =
+ pgWalUsage.wal_bytes - walusage_start.wal_bytes;
+ walusage.wal_records =
+ pgWalUsage.wal_records - walusage_start.wal_records;
+
pgss_store(queryString,
0, /* signal that it's a utility stmt */
pstmt->stmt_location,
@@ -1053,6 +1069,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
INSTR_TIME_GET_MILLISEC(duration),
rows,
&bufusage,
+ &walusage,
NULL);
}
else
@@ -1088,13 +1105,14 @@ pgss_hash_string(const char *str, int len)
*
* If jstate is not NULL then we're trying to create an entry for which
* we have no statistics as yet; we just want to record the normalized
- * query string. total_time, rows, bufusage are ignored in this case.
+ * query string. Time and usage are ignored in this case.
*/
static void
pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate)
{
pgssHashKey key;
@@ -1286,6 +1304,8 @@ pgss_store(const char *query, uint64 queryId,
e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
+ e->counters.wal_bytes_written += walusage->wal_bytes;
+ e->counters.wal_records_written += walusage->wal_records;
SpinLockRelease(&e->mutex);
}
@@ -1333,7 +1353,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
#define PG_STAT_STATEMENTS_COLS_V1_1 18
#define PG_STAT_STATEMENTS_COLS_V1_2 19
#define PG_STAT_STATEMENTS_COLS_V1_3 23
-#define PG_STAT_STATEMENTS_COLS 23 /* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_4 25
+#define PG_STAT_STATEMENTS_COLS 25 /* maximum of above */
/*
* Retrieve statement statistics.
@@ -1345,6 +1366,15 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
* expected API version is identified by embedding it in the C name of the
* function. Unfortunately we weren't bright enough to do that for 1.1.
*/
+Datum pg_stat_statements_1_4(PG_FUNCTION_ARGS)
+{
+ bool showtext = PG_GETARG_BOOL(0);
+
+ pg_stat_statements_internal(fcinfo, PGSS_V1_4, showtext);
+
+ return (Datum)0;
+}
+
Datum
pg_stat_statements_1_3(PG_FUNCTION_ARGS)
{
@@ -1450,6 +1480,10 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
if (api_version != PGSS_V1_3)
elog(ERROR, "incorrect number of output arguments");
break;
+ case PG_STAT_STATEMENTS_COLS_V1_4:
+ if (api_version != PGSS_V1_4)
+ elog(ERROR, "incorrect number of output arguments");
+ break;
default:
elog(ERROR, "incorrect number of output arguments");
}
@@ -1646,11 +1680,17 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
values[i++] = Float8GetDatumFast(tmp.blk_read_time);
values[i++] = Float8GetDatumFast(tmp.blk_write_time);
}
+ if (api_version >= PGSS_V1_4)
+ {
+ values[i++] = Int64GetDatumFast(tmp.wal_bytes_written);
+ values[i++] = Int64GetDatumFast(tmp.wal_records_written);
+ }
Assert(i == (api_version == PGSS_V1_0 ? PG_STAT_STATEMENTS_COLS_V1_0 :
api_version == PGSS_V1_1 ? PG_STAT_STATEMENTS_COLS_V1_1 :
api_version == PGSS_V1_2 ? PG_STAT_STATEMENTS_COLS_V1_2 :
api_version == PGSS_V1_3 ? PG_STAT_STATEMENTS_COLS_V1_3 :
+ api_version == PGSS_V1_4 ? PG_STAT_STATEMENTS_COLS_V1_4 :
-1 /* fail if you forget to update this assert */ ));
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 14cb422354..7fb20df886 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
# pg_stat_statements extension
comment = 'track execution statistics of all SQL statements executed'
-default_version = '1.7'
+default_version = '1.8'
module_pathname = '$libdir/pg_stat_statements'
relocatable = true
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 8b527070d4..7bf1fa4e9f 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -51,7 +51,7 @@ PREPARE pgss_test (int) AS SELECT $1, 'test' LIMIT 1;
EXECUTE pgss_test(1);
DEALLOCATE pgss_test;
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- CRUD: INSERT SELECT UPDATE DELETE on test table
@@ -99,7 +99,56 @@ SELECT * FROM test ORDER BY a;
-- SELECT with IN clause
SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test table
+--
+SELECT pg_stat_statements_reset();
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+
+-- explicit transaction
+BEGIN;
+UPDATE pgss_test SET b = '111' WHERE a = 1 ;
+COMMIT;
+
+BEGIN \;
+UPDATE pgss_test SET b = '222' WHERE a = 2 \;
+COMMIT ;
+
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
+
+BEGIN \;
+UPDATE pgss_test SET b = '555' WHERE a = 5 \;
+UPDATE pgss_test SET b = '666' WHERE a = 6 \;
+COMMIT ;
+
+-- many INSERT values
+INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+
+-- SELECT with constants
+SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ;
+
+SELECT *
+ FROM pgss_test
+ WHERE a > 9
+ ORDER BY a ;
+
+-- SELECT without constants
+SELECT * FROM pgss_test ORDER BY a;
+
+-- SELECT with IN clause
+SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5);
+
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+
--
-- pg_stat_statements.track = none
@@ -144,7 +193,7 @@ $$ SELECT (i + 1.0)::INTEGER LIMIT 1 $$ LANGUAGE SQL;
SELECT PLUS_ONE(8);
SELECT PLUS_ONE(10);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- pg_stat_statements.track = all
@@ -175,7 +224,7 @@ $$ SELECT (i + 1.0)::INTEGER LIMIT 1 $$ LANGUAGE SQL;
SELECT PLUS_ONE(3);
SELECT PLUS_ONE(1);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- queries with locking clauses
@@ -223,8 +272,9 @@ DROP TABLE IF EXISTS test \;
DROP TABLE IF EXISTS test \;
DROP FUNCTION IF EXISTS PLUS_ONE(INTEGER);
DROP FUNCTION PLUS_TWO(INTEGER);
+DROP TABLE pgss_test;
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- Track user activity and reset them
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 26bb82da4a..f042b84f32 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -221,6 +221,24 @@
</entry>
</row>
+ <row>
+ <entry><structfield>wal_write_bytes</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Amount of WAL bytes added by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_write_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Count of WAL records added by the statement
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
wal_stats.core.patch (application/octet-stream)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7f4f784c0e..1491d8aaf9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -41,6 +41,7 @@
#include "catalog/pg_database.h"
#include "commands/tablespace.h"
#include "common/controldata_utils.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "pgstat.h"
@@ -1225,6 +1226,13 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
+ /* Provide WAL update data to the instrumentation */
+ if (inserted)
+ {
+ pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+ pgWalUsage.wal_records++;
+ }
+
return EndPos;
}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..8d3cabac6c 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -62,6 +62,7 @@
#define PARALLEL_KEY_DSA UINT64CONST(0xE000000000000007)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -573,6 +574,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
char *pstmt_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
+ WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
@@ -646,6 +648,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Same thing for WalUsage.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for tuple queues. */
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers));
@@ -728,6 +737,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufusage_space);
pei->buffer_usage = bufusage_space;
+ /* Same for WalUsage. */
+ walusage_space = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
+ pei->wal_usage = walusage_space;
+
/* Set up the tuple queues that the workers will write into. */
pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
@@ -1113,7 +1128,7 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
* finish, or we might get incomplete data.)
*/
for (i = 0; i < nworkers; i++)
- InstrAccumParallelQuery(&pei->buffer_usage[i]);
+ InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]);
pei->finished = true;
}
@@ -1333,6 +1348,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
{
FixedParallelExecutorState *fpes;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
DestReceiver *receiver;
QueryDesc *queryDesc;
SharedExecutorInstrumentation *instrumentation;
@@ -1408,7 +1424,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Report buffer usage during parallel execution. */
buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Report instrumentation data if any instrumentation options are set. */
if (instrumentation != NULL)
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index bc1d42bf64..771dba5dd5 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -24,6 +24,11 @@ static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void BufferUsageAccumDiff(BufferUsage *dst,
const BufferUsage *add, const BufferUsage *sub);
+WalUsage pgWalUsage;
+static WalUsage save_pgWalUsage;
+
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub);
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -33,15 +38,17 @@ InstrAlloc(int n, int instrument_options)
/* initialize all fields to zeroes, then modify as needed */
instr = palloc0(n * sizeof(Instrumentation));
- if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER))
+ if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER | INSTRUMENT_WAL))
{
bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
bool need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
int i;
for (i = 0; i < n; i++)
{
instr[i].need_bufusage = need_buffers;
+ instr[i].need_walusage = need_wal;
instr[i].need_timer = need_timer;
}
}
@@ -55,6 +62,7 @@ InstrInit(Instrumentation *instr, int instrument_options)
{
memset(instr, 0, sizeof(Instrumentation));
instr->need_bufusage = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ instr->need_walusage = (instrument_options & INSTRUMENT_WAL) != 0;
instr->need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
}
@@ -69,6 +77,9 @@ InstrStartNode(Instrumentation *instr)
/* save buffer usage totals at node entry, if needed */
if (instr->need_bufusage)
instr->bufusage_start = pgBufferUsage;
+
+ if (instr->need_walusage)
+ instr->walusage_start = pgWalUsage;
}
/* Exit from a plan node */
@@ -97,6 +108,10 @@ InstrStopNode(Instrumentation *instr, double nTuples)
BufferUsageAccumDiff(&instr->bufusage,
&pgBufferUsage, &instr->bufusage_start);
+ if (instr->need_walusage)
+ WalUsageAccumDiff(&instr->walusage,
+ &pgWalUsage, &instr->walusage_start);
+
/* Is this the first tuple of this cycle? */
if (!instr->running)
{
@@ -160,6 +175,9 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
/* Add delta of buffer usage since entry to node's totals */
if (dst->need_bufusage)
BufferUsageAdd(&dst->bufusage, &add->bufusage);
+
+ if (dst->need_walusage)
+ WalUsageAdd(&dst->walusage, &add->walusage);
}
/* note current values during parallel executor startup */
@@ -167,21 +185,25 @@ void
InstrStartParallelQuery(void)
{
save_pgBufferUsage = pgBufferUsage;
+ save_pgWalUsage = pgWalUsage;
}
/* report usage after parallel executor shutdown */
void
-InstrEndParallelQuery(BufferUsage *result)
+InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- memset(result, 0, sizeof(BufferUsage));
- BufferUsageAccumDiff(result, &pgBufferUsage, &save_pgBufferUsage);
+ memset(bufusage, 0, sizeof(BufferUsage));
+ memset(walusage, 0, sizeof(WalUsage));
+ BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
+ WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
}
/* accumulate work done by workers in leader's stats */
void
-InstrAccumParallelQuery(BufferUsage *result)
+InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- BufferUsageAdd(&pgBufferUsage, result);
+ BufferUsageAdd(&pgBufferUsage, bufusage);
+ WalUsageAdd(&pgWalUsage, walusage);
}
/* dst += add */
@@ -223,3 +245,18 @@ BufferUsageAccumDiff(BufferUsage *dst,
INSTR_TIME_ACCUM_DIFF(dst->blk_write_time,
add->blk_write_time, sub->blk_write_time);
}
+
+/* helper functions for WAL usage accumulation */
+static void
+WalUsageAdd(WalUsage *dst, WalUsage *add)
+{
+ dst->wal_bytes += add->wal_bytes;
+ dst->wal_records += add->wal_records;
+}
+
+static void
+WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
+{
+ dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
+ dst->wal_records += add->wal_records - sub->wal_records;
+}
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf020..1cc5b524fd 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -26,6 +26,7 @@ typedef struct ParallelExecutorInfo
PlanState *planstate; /* plan subtree we're running in parallel */
ParallelContext *pcxt; /* parallel context we're using */
BufferUsage *buffer_usage; /* points to bufusage area in DSM */
+ WalUsage *wal_usage; /* walusage area in DSM */
SharedExecutorInstrumentation *instrumentation; /* optional */
struct SharedJitInstrumentation *jit_instrumentation; /* optional */
dsa_area *area; /* points to DSA area in DSM */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index f48d46aede..3069b15cfa 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -32,12 +32,19 @@ typedef struct BufferUsage
instr_time blk_write_time; /* time spent writing */
} BufferUsage;
+typedef struct WalUsage
+{
+ long wal_records; /* # of wal records produced */
+ long wal_bytes; /* size of wal records produced */
+} WalUsage;
+
/* Flag bits included in InstrAlloc's instrument_options bitmask */
typedef enum InstrumentOption
{
INSTRUMENT_TIMER = 1 << 0, /* needs timer (and row counts) */
INSTRUMENT_BUFFERS = 1 << 1, /* needs buffer usage */
INSTRUMENT_ROWS = 1 << 2, /* needs row count */
+ INSTRUMENT_WAL = 1 << 3, /* needs wal usage */
INSTRUMENT_ALL = PG_INT32_MAX
} InstrumentOption;
@@ -46,6 +53,7 @@ typedef struct Instrumentation
/* Parameters set at node creation: */
bool need_timer; /* true if we need timer data */
bool need_bufusage; /* true if we need buffer usage data */
+ bool need_walusage; /* true if we need wal usage data */
/* Info about current plan cycle: */
bool running; /* true if we've completed first tuple */
instr_time starttime; /* start time of current iteration of node */
@@ -53,6 +61,7 @@ typedef struct Instrumentation
double firsttuple; /* time for first tuple of this cycle */
double tuplecount; /* # of tuples emitted so far this cycle */
BufferUsage bufusage_start; /* buffer usage at start */
+ WalUsage walusage_start; /* wal usage at start */
/* Accumulated statistics across all completed cycles: */
double startup; /* total startup time (in seconds) */
double total; /* total time (in seconds) */
@@ -62,6 +71,7 @@ typedef struct Instrumentation
double nfiltered1; /* # of tuples removed by scanqual or joinqual */
double nfiltered2; /* # of tuples removed by "other" quals */
BufferUsage bufusage; /* total buffer usage */
+ WalUsage walusage; /* total wal usage */
} Instrumentation;
typedef struct WorkerInstrumentation
@@ -71,6 +81,7 @@ typedef struct WorkerInstrumentation
} WorkerInstrumentation;
extern PGDLLIMPORT BufferUsage pgBufferUsage;
+extern PGDLLIMPORT WalUsage pgWalUsage;
extern Instrumentation *InstrAlloc(int n, int instrument_options);
extern void InstrInit(Instrumentation *instr, int instrument_options);
@@ -79,7 +90,7 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
-extern void InstrEndParallelQuery(BufferUsage *result);
-extern void InstrAccumParallelQuery(BufferUsage *result);
+extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
+extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
#endif /* INSTRUMENT_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index caf6b86f92..ea928aaa0a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2628,6 +2628,7 @@ WalSndCtlData
WalSndSendDataCallback
WalSndState
WalTimeSample
+WalUsage
WalWriteMethod
Walfile
WindowAgg
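The snapshot-and-difference accounting that WalUsageAccumDiff() performs in the patch above can be tried outside the server. What follows is a minimal standalone sketch, not code from the patch: WalUsageSketch and all sketch_* names are hypothetical, and the fixed-width int64_t fields are this sketch's choice (the patch declares the counters as long).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the patch's WalUsage struct. */
typedef struct WalUsageSketch
{
    int64_t wal_records;    /* # of WAL records produced */
    int64_t wal_bytes;      /* size of WAL records produced */
} WalUsageSketch;

/* Process-wide running totals, playing the role of pgWalUsage. */
static WalUsageSketch sketch_wal_usage;

/* Stand-in for the hook in XLogInsertRecord(): bump totals per record. */
static void
sketch_insert_record(uint32_t tot_len)
{
    sketch_wal_usage.wal_bytes += tot_len;
    sketch_wal_usage.wal_records++;
}

/* dst += add - sub, mirroring WalUsageAccumDiff() */
static void
sketch_accum_diff(WalUsageSketch *dst,
                  const WalUsageSketch *add, const WalUsageSketch *sub)
{
    assert(dst != NULL && add != NULL && sub != NULL);
    dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
    dst->wal_records += add->wal_records - sub->wal_records;
}

/*
 * Pretend statement: emit n records of len bytes each and return only the
 * WAL attributable to this call, regardless of earlier activity.
 */
static WalUsageSketch
sketch_measure_statement(int n, uint32_t len)
{
    WalUsageSketch start = sketch_wal_usage;    /* snapshot at entry */
    WalUsageSketch result;
    int         i;

    memset(&result, 0, sizeof(result));
    for (i = 0; i < n; i++)
        sketch_insert_record(len);
    sketch_accum_diff(&result, &sketch_wal_usage, &start);
    return result;
}
```

Because the result is a difference against the entry snapshot, WAL written before the measured call does not leak into its numbers, while the global totals keep growing.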
On Wed, 5 Feb 2020 at 21:36, Kirill Bychik <kirill.bychik@gmail.com> wrote:
Hello pgsql-hackers,
Submitting a patch that would enable gathering of per-statement WAL
generation statistics, similar to how it is done for buffer usage.
Collected is the number of records added to WAL and number of WAL
bytes written.
The data collected was found valuable to analyze update-heavy load,
with WAL generation being the bottleneck.
The usage data is collected at low level, after compression is done on
WAL record. Data is then exposed via pg_stat_statements, could also be
used in EXPLAIN ANALYZE if needed. Instrumentation is alike to the one
used for buffer stats. I didn't dare to unify both usage metric sets
into single struct, nor rework the way both are passed to parallel
workers.
Performance impact is (supposed to be) very low, essentially adding
two int operations and memory access on WAL record insert. Additional
efforts to allocate shmem chunk for parallel workers. Parallel workers
shmem usage is increased to fit in a struct of two longs.
Patch is separated in two parts: core changes and pg_stat_statements
additions. Essentially the extension has its schema updated to allow
two more fields, docs updated to reflect the change. Patch is prepared
against master branch.
Please provide your comments and/or code findings.
I like the concept, I'm a big fan of anything that affordably improves
visibility into Pg's I/O and activity.
To date I've been relying on tools like systemtap to do this sort of
thing. But that's a bit specialised, and Pg currently lacks useful
instrumentation for it so it can be a pain to match up activity by
parallel workers and that sort of thing. (I aim to find time to submit
a patch for that.)
I haven't yet reviewed the patch.
--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise
On Mon, Feb 10, 2020 at 8:20 PM Craig Ringer <craig@2ndquadrant.com> wrote:
On Wed, 5 Feb 2020 at 21:36, Kirill Bychik <kirill.bychik@gmail.com> wrote:
Patch is separated in two parts: core changes and pg_stat_statements
additions. Essentially the extension has its schema updated to allow
two more fields, docs updated to reflect the change. Patch is prepared
against master branch.
Please provide your comments and/or code findings.
I like the concept, I'm a big fan of anything that affordably improves
visibility into Pg's I/O and activity.
+1
To date I've been relying on tools like systemtap to do this sort of
thing. But that's a bit specialised, and Pg currently lacks useful
instrumentation for it so it can be a pain to match up activity by
parallel workers and that sort of thing. (I aim to find time to submit
a patch for that.)
(I'm interested in seeing your conference talk about that! I did a
bunch of stuff with static probes to measure PHJ behaviour around
barrier waits and so on but it was hard to figure out what stuff like
that to put in the actual tree, it was all a bit
use-once-to-test-a-theory-and-then-throw-away.)
Kirill, I noticed that you included a regression test that is failing. Can
this possibly be stable across machines or even on the same machine?
Does it still pass for you or did something change on the master
branch to add a new WAL record since you posted the patch?
  query                                     | calls | rows | wal_write_bytes | wal_write_records
 -------------------------------------------+-------+------+-----------------+-------------------
- CREATE INDEX test_b ON test(b)            |     1 |    0 |            1673 |                16
- DROP FUNCTION IF EXISTS PLUS_ONE(INTEGER) |     1 |    0 |              56 |                 1
+ CREATE INDEX test_b ON test(b)            |     1 |    0 |            1755 |                17
+ DROP FUNCTION IF EXISTS PLUS_ONE(INTEGER) |     1 |    0 |               0 |                 0
On Tue, 18 Feb 2020 at 06:23, Thomas Munro <thomas.munro@gmail.com> wrote:
On Mon, Feb 10, 2020 at 8:20 PM Craig Ringer <craig@2ndquadrant.com> wrote:
On Wed, 5 Feb 2020 at 21:36, Kirill Bychik <kirill.bychik@gmail.com> wrote:
Patch is separated in two parts: core changes and pg_stat_statements
additions. Essentially the extension has its schema updated to allow
two more fields, docs updated to reflect the change. Patch is prepared
against master branch.
Please provide your comments and/or code findings.
I like the concept, I'm a big fan of anything that affordably improves
visibility into Pg's I/O and activity.
+1
To date I've been relying on tools like systemtap to do this sort of
thing. But that's a bit specialised, and Pg currently lacks useful
instrumentation for it so it can be a pain to match up activity by
parallel workers and that sort of thing. (I aim to find time to submit
a patch for that.)
(I'm interested in seeing your conference talk about that! I did a
bunch of stuff with static probes to measure PHJ behaviour around
barrier waits and so on but it was hard to figure out what stuff like
that to put in the actual tree, it was all a bit
use-once-to-test-a-theory-and-then-throw-away.)
Kirill, I noticed that you included a regression test that is failing. Can
this possibly be stable across machines or even on the same machine?
Does it still pass for you or did something change on the master
branch to add a new WAL record since you posted the patch?
Thank you for testing the patch and running the extension checks. I assume
the patch applies without problems.
As for the regression test, it apparently requires some rework. I didn't pay
enough attention to make sure the data I check is actually meaningful
and isolated enough to be repeatable.
Please consider the extension part of the patch as WIP; I'll resubmit
the patch once I get a stable and meaningful test up. Thanks for
finding it!
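One way to make such a check repeatable, offered here as an assumption and not as what the patch's test does, is to verify properties of the counters instead of exact byte totals: a statement expected to write WAL reports at least one record and at least a minimum size per record, and a read-only statement reports zeros. The helper name wal_counters_plausible and its min_record_len parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether measured WAL counters are plausible
 * without comparing against exact byte counts.
 *
 * expect_wal      - whether the statement is expected to write WAL at all
 * min_record_len  - assumed lower bound on the size of one WAL record
 */
static bool
wal_counters_plausible(int64_t records, int64_t bytes,
                       bool expect_wal, int64_t min_record_len)
{
    assert(min_record_len > 0);

    if (!expect_wal)
        return records == 0 && bytes == 0;

    /* At least one record, and every record at least min_record_len long. */
    return records > 0 && bytes >= records * min_record_len;
}
```

With an assumed lower bound of 24 bytes per record (a value chosen for illustration, not taken from this thread), both CREATE INDEX results reported above, 16 records in 1673 bytes and 17 records in 1755 bytes, would pass the same check.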
On Tue, 18 Feb 2020 at 06:23, Thomas Munro <thomas.munro@gmail.com> wrote:
On Mon, Feb 10, 2020 at 8:20 PM Craig Ringer <craig@2ndquadrant.com> wrote:
On Wed, 5 Feb 2020 at 21:36, Kirill Bychik <kirill.bychik@gmail.com> wrote:
Patch is separated in two parts: core changes and pg_stat_statements
additions. Essentially the extension has its schema updated to allow
two more fields, docs updated to reflect the change. Patch is prepared
against master branch.
Please provide your comments and/or code findings.
I like the concept, I'm a big fan of anything that affordably improves
visibility into Pg's I/O and activity.
+1
To date I've been relying on tools like systemtap to do this sort of
thing. But that's a bit specialised, and Pg currently lacks useful
instrumentation for it so it can be a pain to match up activity by
parallel workers and that sort of thing. (I aim to find time to submit
a patch for that.)
(I'm interested in seeing your conference talk about that! I did a
bunch of stuff with static probes to measure PHJ behaviour around
barrier waits and so on but it was hard to figure out what stuff like
that to put in the actual tree, it was all a bit
use-once-to-test-a-theory-and-then-throw-away.)
Kirill, I noticed that you included a regression test that is failing. Can
this possibly be stable across machines or even on the same machine?
Does it still pass for you or did something change on the master
branch to add a new WAL record since you posted the patch?
Thank you for testing the patch and running the extension checks. I assume
the patch applies without problems.
As for the regression test, it apparently requires some rework. I didn't pay
enough attention to make sure the data I check is actually meaningful
and isolated enough to be repeatable.
Please consider the extension part of the patch as WIP; I'll resubmit
the patch once I get a stable and meaningful test up. Thanks for
finding it!
I have reworked the extension regression test to be more isolated.
Apparently, something merged into the master branch shifted my numbers.
Please find the new patch attached. The core part didn't change a bit; the
extension part has its regression test SQL and expected output changed.
Looking forward to new comments.
Attachments:
wal_stats.core.patch (application/octet-stream)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7f4f784c0e..1491d8aaf9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -41,6 +41,7 @@
#include "catalog/pg_database.h"
#include "commands/tablespace.h"
#include "common/controldata_utils.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "pgstat.h"
@@ -1225,6 +1226,13 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
+ /* Provide WAL update data to the instrumentation */
+ if (inserted)
+ {
+ pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+ pgWalUsage.wal_records++;
+ }
+
return EndPos;
}
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..8d3cabac6c 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -62,6 +62,7 @@
#define PARALLEL_KEY_DSA UINT64CONST(0xE000000000000007)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE000000000000010)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -573,6 +574,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
char *pstmt_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
+ WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
@@ -646,6 +648,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Same thing for WalUsage.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for tuple queues. */
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers));
@@ -728,6 +737,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufusage_space);
pei->buffer_usage = bufusage_space;
+ /* Same for WalUsage. */
+ walusage_space = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
+ pei->wal_usage = walusage_space;
+
/* Set up the tuple queues that the workers will write into. */
pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
@@ -1113,7 +1128,7 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
* finish, or we might get incomplete data.)
*/
for (i = 0; i < nworkers; i++)
- InstrAccumParallelQuery(&pei->buffer_usage[i]);
+ InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]);
pei->finished = true;
}
@@ -1333,6 +1348,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
{
FixedParallelExecutorState *fpes;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
DestReceiver *receiver;
QueryDesc *queryDesc;
SharedExecutorInstrumentation *instrumentation;
@@ -1408,7 +1424,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Report buffer usage during parallel execution. */
buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Report instrumentation data if any instrumentation options are set. */
if (instrumentation != NULL)
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index bc1d42bf64..771dba5dd5 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -24,6 +24,11 @@ static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void BufferUsageAccumDiff(BufferUsage *dst,
const BufferUsage *add, const BufferUsage *sub);
+WalUsage pgWalUsage;
+static WalUsage save_pgWalUsage;
+
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub);
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -33,15 +38,17 @@ InstrAlloc(int n, int instrument_options)
/* initialize all fields to zeroes, then modify as needed */
instr = palloc0(n * sizeof(Instrumentation));
- if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER))
+ if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER | INSTRUMENT_WAL))
{
bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
bool need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
int i;
for (i = 0; i < n; i++)
{
instr[i].need_bufusage = need_buffers;
+ instr[i].need_walusage = need_wal;
instr[i].need_timer = need_timer;
}
}
@@ -55,6 +62,7 @@ InstrInit(Instrumentation *instr, int instrument_options)
{
memset(instr, 0, sizeof(Instrumentation));
instr->need_bufusage = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ instr->need_walusage = (instrument_options & INSTRUMENT_WAL) != 0;
instr->need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
}
@@ -69,6 +77,9 @@ InstrStartNode(Instrumentation *instr)
/* save buffer usage totals at node entry, if needed */
if (instr->need_bufusage)
instr->bufusage_start = pgBufferUsage;
+
+ if (instr->need_walusage)
+ instr->walusage_start = pgWalUsage;
}
/* Exit from a plan node */
@@ -97,6 +108,10 @@ InstrStopNode(Instrumentation *instr, double nTuples)
BufferUsageAccumDiff(&instr->bufusage,
&pgBufferUsage, &instr->bufusage_start);
+ if (instr->need_walusage)
+ WalUsageAccumDiff(&instr->walusage,
+ &pgWalUsage, &instr->walusage_start);
+
/* Is this the first tuple of this cycle? */
if (!instr->running)
{
@@ -160,6 +175,9 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
/* Add delta of buffer usage since entry to node's totals */
if (dst->need_bufusage)
BufferUsageAdd(&dst->bufusage, &add->bufusage);
+
+ if (dst->need_walusage)
+ WalUsageAdd(&dst->walusage, &add->walusage);
}
/* note current values during parallel executor startup */
@@ -167,21 +185,25 @@ void
InstrStartParallelQuery(void)
{
save_pgBufferUsage = pgBufferUsage;
+ save_pgWalUsage = pgWalUsage;
}
/* report usage after parallel executor shutdown */
void
-InstrEndParallelQuery(BufferUsage *result)
+InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- memset(result, 0, sizeof(BufferUsage));
- BufferUsageAccumDiff(result, &pgBufferUsage, &save_pgBufferUsage);
+ memset(bufusage, 0, sizeof(BufferUsage));
+ memset(walusage, 0, sizeof(WalUsage));
+ BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
+ WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
}
/* accumulate work done by workers in leader's stats */
void
-InstrAccumParallelQuery(BufferUsage *result)
+InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- BufferUsageAdd(&pgBufferUsage, result);
+ BufferUsageAdd(&pgBufferUsage, bufusage);
+ WalUsageAdd(&pgWalUsage, walusage);
}
/* dst += add */
@@ -223,3 +245,18 @@ BufferUsageAccumDiff(BufferUsage *dst,
INSTR_TIME_ACCUM_DIFF(dst->blk_write_time,
add->blk_write_time, sub->blk_write_time);
}
+
+/* helper functions for WAL usage accumulation */
+static void
+WalUsageAdd(WalUsage *dst, WalUsage *add)
+{
+ dst->wal_bytes += add->wal_bytes;
+ dst->wal_records += add->wal_records;
+}
+
+static void
+WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
+{
+ dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
+ dst->wal_records += add->wal_records - sub->wal_records;
+}
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf020..1cc5b524fd 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -26,6 +26,7 @@ typedef struct ParallelExecutorInfo
PlanState *planstate; /* plan subtree we're running in parallel */
ParallelContext *pcxt; /* parallel context we're using */
BufferUsage *buffer_usage; /* points to bufusage area in DSM */
+ WalUsage *wal_usage; /* walusage area in DSM */
SharedExecutorInstrumentation *instrumentation; /* optional */
struct SharedJitInstrumentation *jit_instrumentation; /* optional */
dsa_area *area; /* points to DSA area in DSM */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index f48d46aede..3069b15cfa 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -32,12 +32,19 @@ typedef struct BufferUsage
instr_time blk_write_time; /* time spent writing */
} BufferUsage;
+typedef struct WalUsage
+{
+ long wal_records; /* # of wal records produced */
+ long wal_bytes; /* size of wal records produced */
+} WalUsage;
+
/* Flag bits included in InstrAlloc's instrument_options bitmask */
typedef enum InstrumentOption
{
INSTRUMENT_TIMER = 1 << 0, /* needs timer (and row counts) */
INSTRUMENT_BUFFERS = 1 << 1, /* needs buffer usage */
INSTRUMENT_ROWS = 1 << 2, /* needs row count */
+ INSTRUMENT_WAL = 1 << 3, /* needs wal usage */
INSTRUMENT_ALL = PG_INT32_MAX
} InstrumentOption;
@@ -46,6 +53,7 @@ typedef struct Instrumentation
/* Parameters set at node creation: */
bool need_timer; /* true if we need timer data */
bool need_bufusage; /* true if we need buffer usage data */
+ bool need_walusage; /* true if we need wal usage data */
/* Info about current plan cycle: */
bool running; /* true if we've completed first tuple */
instr_time starttime; /* start time of current iteration of node */
@@ -53,6 +61,7 @@ typedef struct Instrumentation
double firsttuple; /* time for first tuple of this cycle */
double tuplecount; /* # of tuples emitted so far this cycle */
BufferUsage bufusage_start; /* buffer usage at start */
+ WalUsage walusage_start; /* wal usage at start */
/* Accumulated statistics across all completed cycles: */
double startup; /* total startup time (in seconds) */
double total; /* total time (in seconds) */
@@ -62,6 +71,7 @@ typedef struct Instrumentation
double nfiltered1; /* # of tuples removed by scanqual or joinqual */
double nfiltered2; /* # of tuples removed by "other" quals */
BufferUsage bufusage; /* total buffer usage */
+ WalUsage walusage; /* total wal usage */
} Instrumentation;
typedef struct WorkerInstrumentation
@@ -71,6 +81,7 @@ typedef struct WorkerInstrumentation
} WorkerInstrumentation;
extern PGDLLIMPORT BufferUsage pgBufferUsage;
+extern PGDLLIMPORT WalUsage pgWalUsage;
extern Instrumentation *InstrAlloc(int n, int instrument_options);
extern void InstrInit(Instrumentation *instr, int instrument_options);
@@ -79,7 +90,7 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
-extern void InstrEndParallelQuery(BufferUsage *result);
-extern void InstrAccumParallelQuery(BufferUsage *result);
+extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
+extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
#endif /* INSTRUMENT_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index caf6b86f92..ea928aaa0a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2628,6 +2628,7 @@ WalSndCtlData
WalSndSendDataCallback
WalSndState
WalTimeSample
+WalUsage
WalWriteMethod
Walfile
WindowAgg
wal_stats.ext.patch (application/octet-stream)
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 5bbe054367..c2133d1ccb 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,11 +6,11 @@ OBJS = \
pg_stat_statements.o
EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.6--1.7.sql \
- pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
- pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
- pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql \
- pg_stat_statements--unpackaged--1.0.sql
+DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.7--1.8.sql \
+ pg_stat_statements--1.6--1.7.sql pg_stat_statements--1.5--1.6.sql \
+ pg_stat_statements--1.4--1.5.sql pg_stat_statements--1.3--1.4.sql \
+ pg_stat_statements--1.2--1.3.sql pg_stat_statements--1.1--1.2.sql \
+ pg_stat_statements--1.0--1.1.sql pg_stat_statements--unpackaged--1.0.sql
PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements"
LDFLAGS_SL += $(filter -lm, $(LIBS))
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 6787ec1efd..f3500070b4 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -195,18 +195,111 @@ SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
3 | c
(8 rows)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
--------------------------------------------------------------+-------+------
- DELETE FROM test WHERE a > $1 | 1 | 1
- INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3
- INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10
- SELECT * FROM test ORDER BY a | 1 | 12
- SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4
- SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8
- SELECT pg_stat_statements_reset() | 1 | 1
- UPDATE test SET b = $1 WHERE a = $2 | 6 | 6
- UPDATE test SET b = $1 WHERE a > $2 | 1 | 3
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_write_bytes | wal_write_records
+-------------------------------------------------------------+-------+------+-----------------+-------------------
+ DELETE FROM test WHERE a > $1 | 1 | 1 | 0 | 0
+ INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | 0 | 0
+ INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10 | 0 | 0
+ SELECT * FROM test ORDER BY a | 1 | 12 | 0 | 0
+ SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4 | 0 | 0
+ SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
+ UPDATE test SET b = $1 WHERE a = $2 | 6 | 6 | 0 | 0
+ UPDATE test SET b = $1 WHERE a > $2 | 1 | 3 | 0 | 0
+(9 rows)
+
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset
+--------------------------
+
+(1 row)
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- explicit transaction
+BEGIN;
+UPDATE pgss_test SET b = '111' WHERE a = 1 ;
+COMMIT;
+BEGIN \;
+UPDATE pgss_test SET b = '222' WHERE a = 2 \;
+COMMIT ;
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
+BEGIN \;
+UPDATE pgss_test SET b = '555' WHERE a = 5 \;
+UPDATE pgss_test SET b = '666' WHERE a = 6 \;
+COMMIT ;
+-- many INSERT values
+INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+-- SELECT with constants
+SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ;
+ a | b
+---+----------------------
+ 6 | 666
+ 7 | aaa
+ 8 | bbb
+ 9 | bbb
+(4 rows)
+
+SELECT *
+ FROM pgss_test
+ WHERE a > 9
+ ORDER BY a ;
+ a | b
+---+---
+(0 rows)
+
+-- SELECT without constants
+SELECT * FROM pgss_test ORDER BY a;
+ a | b
+---+----------------------
+ 1 | a
+ 1 | 111
+ 2 | b
+ 2 | 222
+ 3 | c
+ 3 | 333
+ 4 | 444
+ 5 | 555
+ 6 | 666
+ 7 | aaa
+ 8 | bbb
+ 9 | bbb
+(12 rows)
+
+-- SELECT with IN clause
+SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5);
+ a | b
+---+----------------------
+ 1 | 111
+ 2 | 222
+ 3 | 333
+ 4 | 444
+ 5 | 555
+ 1 | a
+ 2 | b
+ 3 | c
+(8 rows)
+
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_write_bytes | wal_write_records
+------------------------------------------------------------------+-------+------+-----------------+-------------------
+ DELETE FROM pgss_test WHERE a > $1 | 1 | 1 | 54 | 1
+ INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | 240 | 3
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | 1 | 10 | 800 | 10
+ SELECT * FROM pgss_test ORDER BY a | 1 | 12 | 0 | 0
+ SELECT * FROM pgss_test WHERE a > $1 ORDER BY a | 2 | 4 | 0 | 0
+ SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
+ UPDATE pgss_test SET b = $1 WHERE a = $2 | 6 | 6 | 438 | 6
+ UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 | 3 | 219 | 3
(9 rows)
--
@@ -287,13 +380,13 @@ SELECT PLUS_ONE(10);
11
(1 row)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
------------------------------------+-------+------
- SELECT $1::TEXT | 1 | 1
- SELECT PLUS_ONE($1) | 2 | 2
- SELECT PLUS_TWO($1) | 2 | 2
- SELECT pg_stat_statements_reset() | 1 | 1
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_write_bytes | wal_write_records
+-----------------------------------+-------+------+-----------------+-------------------
+ SELECT $1::TEXT | 1 | 1 | 0 | 0
+ SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0
+ SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
(4 rows)
--
@@ -469,6 +562,7 @@ NOTICE: table "test" does not exist, skipping
NOTICE: table "test" does not exist, skipping
NOTICE: function plus_one(pg_catalog.int4) does not exist, skipping
DROP FUNCTION PLUS_TWO(INTEGER);
+DROP TABLE pgss_test;
SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
query | calls | rows
-------------------------------------------+-------+------
@@ -477,10 +571,11 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
DROP FUNCTION PLUS_ONE(INTEGER) | 1 | 0
DROP FUNCTION PLUS_TWO(INTEGER) | 1 | 0
DROP TABLE IF EXISTS test | 3 | 0
+ DROP TABLE pgss_test | 1 | 0
DROP TABLE test | 1 | 0
SELECT $1 | 1 | 1
SELECT pg_stat_statements_reset() | 1 | 1
-(8 rows)
+(9 rows)
--
-- Track user activity and reset them
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
new file mode 100644
index 0000000000..37bdf73571
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -0,0 +1,49 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.8'" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+ OUT userid oid,
+ OUT dbid oid,
+ OUT queryid bigint,
+ OUT query text,
+ OUT calls int8,
+ OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT mean_time float8,
+ OUT stddev_time float8,
+ OUT rows int8,
+ OUT shared_blks_hit int8,
+ OUT shared_blks_read int8,
+ OUT shared_blks_dirtied int8,
+ OUT shared_blks_written int8,
+ OUT local_blks_hit int8,
+ OUT local_blks_read int8,
+ OUT local_blks_dirtied int8,
+ OUT local_blks_written int8,
+ OUT temp_blks_read int8,
+ OUT temp_blks_written int8,
+ OUT blk_read_time float8,
+ OUT blk_write_time float8,
+ OUT wal_write_bytes int8,
+ OUT wal_write_records int8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4'
+LANGUAGE C STRICT VOLATILE;
+
+CREATE VIEW pg_stat_statements AS
+ SELECT * FROM pg_stat_statements(true);
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6f82a671ee..2dc617437b 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -120,7 +120,8 @@ typedef enum pgssVersion
PGSS_V1_0 = 0,
PGSS_V1_1,
PGSS_V1_2,
- PGSS_V1_3
+ PGSS_V1_3,
+ PGSS_V1_4
} pgssVersion;
/*
@@ -161,6 +162,8 @@ typedef struct Counters
int64 local_blks_written; /* # of local disk blocks written */
int64 temp_blks_read; /* # of temp blocks read */
int64 temp_blks_written; /* # of temp blocks written */
+ int64 wal_bytes_written; /* total amount of wal bytes written */
+ int64 wal_records_written; /* # of wal records written */
double blk_read_time; /* time spent reading, in msec */
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
@@ -293,6 +296,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_3);
+PG_FUNCTION_INFO_V1(pg_stat_statements_1_4);
PG_FUNCTION_INFO_V1(pg_stat_statements);
static void pgss_shmem_startup(void);
@@ -313,6 +317,7 @@ static void pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage* walusage,
pgssJumbleState *jstate);
static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
pgssVersion api_version,
@@ -841,6 +846,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
0,
0,
NULL,
+ NULL,
&jstate);
}
@@ -944,6 +950,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
queryDesc->totaltime->total * 1000.0, /* convert to msec */
queryDesc->estate->es_processed,
&queryDesc->totaltime->bufusage,
+ &queryDesc->totaltime->walusage,
NULL);
}
@@ -989,7 +996,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
BufferUsage bufusage_start,
bufusage;
+ WalUsage walusage_start,
+ walusage;
+
bufusage_start = pgBufferUsage;
+ walusage_start = pgWalUsage;
INSTR_TIME_SET_CURRENT(start);
nested_level++;
@@ -1046,6 +1057,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
bufusage.blk_write_time = pgBufferUsage.blk_write_time;
INSTR_TIME_SUBTRACT(bufusage.blk_write_time, bufusage_start.blk_write_time);
+ walusage.wal_bytes =
+ pgWalUsage.wal_bytes - walusage_start.wal_bytes;
+ walusage.wal_records =
+ pgWalUsage.wal_records - walusage_start.wal_records;
+
pgss_store(queryString,
0, /* signal that it's a utility stmt */
pstmt->stmt_location,
@@ -1053,6 +1069,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
INSTR_TIME_GET_MILLISEC(duration),
rows,
&bufusage,
+ &walusage,
NULL);
}
else
@@ -1088,13 +1105,14 @@ pgss_hash_string(const char *str, int len)
*
* If jstate is not NULL then we're trying to create an entry for which
* we have no statistics as yet; we just want to record the normalized
- * query string. total_time, rows, bufusage are ignored in this case.
+ * query string. Time and usage are ignored in this case.
*/
static void
pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage* walusage,
pgssJumbleState *jstate)
{
pgssHashKey key;
@@ -1286,6 +1304,8 @@ pgss_store(const char *query, uint64 queryId,
e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
+ e->counters.wal_bytes_written += walusage->wal_bytes;
+ e->counters.wal_records_written += walusage->wal_records;
SpinLockRelease(&e->mutex);
}
@@ -1333,7 +1353,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
#define PG_STAT_STATEMENTS_COLS_V1_1 18
#define PG_STAT_STATEMENTS_COLS_V1_2 19
#define PG_STAT_STATEMENTS_COLS_V1_3 23
-#define PG_STAT_STATEMENTS_COLS 23 /* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_4 25
+#define PG_STAT_STATEMENTS_COLS 25 /* maximum of above */
/*
* Retrieve statement statistics.
@@ -1345,6 +1366,15 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
* expected API version is identified by embedding it in the C name of the
* function. Unfortunately we weren't bright enough to do that for 1.1.
*/
+Datum pg_stat_statements_1_4(PG_FUNCTION_ARGS)
+{
+ bool showtext = PG_GETARG_BOOL(0);
+
+ pg_stat_statements_internal(fcinfo, PGSS_V1_4, showtext);
+
+ return (Datum)0;
+}
+
Datum
pg_stat_statements_1_3(PG_FUNCTION_ARGS)
{
@@ -1450,6 +1480,10 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
if (api_version != PGSS_V1_3)
elog(ERROR, "incorrect number of output arguments");
break;
+ case PG_STAT_STATEMENTS_COLS_V1_4:
+ if (api_version != PGSS_V1_4)
+ elog(ERROR, "incorrect number of output arguments");
+ break;
default:
elog(ERROR, "incorrect number of output arguments");
}
@@ -1646,11 +1680,17 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
values[i++] = Float8GetDatumFast(tmp.blk_read_time);
values[i++] = Float8GetDatumFast(tmp.blk_write_time);
}
+ if (api_version >= PGSS_V1_4)
+ {
+ values[i++] = Int64GetDatumFast(tmp.wal_bytes_written);
+ values[i++] = Int64GetDatumFast(tmp.wal_records_written);
+ }
Assert(i == (api_version == PGSS_V1_0 ? PG_STAT_STATEMENTS_COLS_V1_0 :
api_version == PGSS_V1_1 ? PG_STAT_STATEMENTS_COLS_V1_1 :
api_version == PGSS_V1_2 ? PG_STAT_STATEMENTS_COLS_V1_2 :
api_version == PGSS_V1_3 ? PG_STAT_STATEMENTS_COLS_V1_3 :
+ api_version == PGSS_V1_4 ? PG_STAT_STATEMENTS_COLS_V1_4 :
-1 /* fail if you forget to update this assert */ ));
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 14cb422354..7fb20df886 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
# pg_stat_statements extension
comment = 'track execution statistics of all SQL statements executed'
-default_version = '1.7'
+default_version = '1.8'
module_pathname = '$libdir/pg_stat_statements'
relocatable = true
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 8b527070d4..634e0b3e7c 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -99,7 +99,61 @@ SELECT * FROM test ORDER BY a;
-- SELECT with IN clause
SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+
+-- explicit transaction
+BEGIN;
+UPDATE pgss_test SET b = '111' WHERE a = 1 ;
+COMMIT;
+
+BEGIN \;
+UPDATE pgss_test SET b = '222' WHERE a = 2 \;
+COMMIT ;
+
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
+
+BEGIN \;
+UPDATE pgss_test SET b = '555' WHERE a = 5 \;
+UPDATE pgss_test SET b = '666' WHERE a = 6 \;
+COMMIT ;
+
+-- many INSERT values
+INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+
+-- SELECT with constants
+SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ;
+
+SELECT *
+ FROM pgss_test
+ WHERE a > 9
+ ORDER BY a ;
+
+-- SELECT without constants
+SELECT * FROM pgss_test ORDER BY a;
+
+-- SELECT with IN clause
+SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5);
+
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+
--
-- pg_stat_statements.track = none
@@ -144,7 +198,7 @@ $$ SELECT (i + 1.0)::INTEGER LIMIT 1 $$ LANGUAGE SQL;
SELECT PLUS_ONE(8);
SELECT PLUS_ONE(10);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- pg_stat_statements.track = all
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 26bb82da4a..f042b84f32 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -221,6 +221,24 @@
</entry>
</row>
+ <row>
+ <entry><structfield>wal_write_bytes</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Amount of WAL bytes added by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_write_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Count of WAL records added by the statement
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
On Thu, Feb 20, 2020 at 06:56:27PM +0300, Kirill Bychik wrote:
On Tue, Feb 18, 2020 at 06:23 Thomas Munro <thomas.munro@gmail.com> wrote:
On Mon, Feb 10, 2020 at 8:20 PM Craig Ringer <craig@2ndquadrant.com> wrote:
On Wed, 5 Feb 2020 at 21:36, Kirill Bychik <kirill.bychik@gmail.com> wrote:
Patch is separated in two parts: core changes and pg_stat_statements
additions. Essentially the extension has its schema updated to allow
two more fields, docs updated to reflect the change. Patch is prepared
against master branch.
Please provide your comments and/or code findings.
I like the concept, I'm a big fan of anything that affordably improves
visibility into Pg's I/O and activity.
+1
Huge +1 too.
Thank you for testing the patch and running extension checks. I assume
the patch applies without problems.
As for the regr test, it apparently requires some rework. I didn't pay
enough attention to make sure the data I check is actually meaningful
and isolated enough to be repeatable.
Please consider the extension part of the patch as WIP, I'll resubmit
the patch once I get a stable and meaningful test up. Thanks for
finding it!
I have reworked the extension regression test to be more isolated.
Apparently, something merged into master branch shifted my numbers.
PFA the new patch. Core part didn't change a bit, the extension part
has regression test SQL and expected log changed.
I'm quite worried about the stability of those counters for regression tests.
Wouldn't a checkpoint happening during the test change them?
While at it, did you consider adding a full-page image counter in the WalUsage?
That's something I'd really like to have and it doesn't seem hard to integrate.
Another point is that this patch won't help to see autovacuum activity.
As an example, I did a quick test to store the information in pgstat, sending
the data in the PG_FINALLY part of vacuum():
rjuju=# create table t1(id integer, val text);
CREATE TABLE
rjuju=# insert into t1 select i, 'val ' || i from generate_series(1, 100000) i;
INSERT 0 100000
rjuju=# vacuum t1;
VACUUM
rjuju=# select datname, vac_wal_records, vac_wal_bytes, autovac_wal_records, autovac_wal_bytes
from pg_stat_database where datname = 'rjuju';
datname | vac_wal_records | vac_wal_bytes | autovac_wal_records | autovac_wal_bytes
---------+-----------------+---------------+---------------------+-------------------
rjuju | 547 | 65201 | 0 | 0
(1 row)
rjuju=# delete from t1 where id % 2 = 0;
DELETE 50000
rjuju=# select pg_sleep(60);
pg_sleep
----------
(1 row)
rjuju=# select datname, vac_wal_records, vac_wal_bytes, autovac_wal_records, autovac_wal_bytes
from pg_stat_database where datname = 'rjuju';
datname | vac_wal_records | vac_wal_bytes | autovac_wal_records | autovac_wal_bytes
---------+-----------------+---------------+---------------------+-------------------
rjuju | 547 | 65201 | 1631 | 323193
(1 row)
That seems like useful data (especially since I recently had to dig into a
problematic WAL consumption issue that was due to some autovacuum activity),
but that may seem strange to only account for (auto)vacuum activity, rather
than globally, grouping per RmgrId or CommandTag for instance. We could then
see the complete WAL usage per-database. What do you think?
Some minor points I noticed:
- the extension patch doesn't apply anymore, I guess since 70a7732007bc4689
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE000000000000010)
Shouldn't it be 0xA rather than 0x10?
- it would be better to add a version number to the patches, so we're sure
which one we're talking about.
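To illustrate the point about the key value: the existing keys advance by one per entry, so the key following 0x...09 in hexadecimal is 0x...0A, while 0x...10 is decimal 16 and skips six slots. A minimal standalone sketch, where the macro names and the key_gap helper are made up for illustration and are not PostgreSQL identifiers:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the keys in execParallel.c; values copied from the hunk above. */
#define KEY_JIT_INSTRUMENTATION UINT64_C(0xE000000000000009)
#define KEY_WAL_USAGE_PROPOSED  UINT64_C(0xE000000000000010)	/* as in the patch */
#define KEY_WAL_USAGE_NEXT      UINT64_C(0xE00000000000000A)	/* next in sequence */

/* Hypothetical helper: distance between two keys, counted in key slots. */
uint64_t
key_gap(uint64_t earlier, uint64_t later)
{
	assert(later >= earlier);
	return later - earlier;
}
```

With these values, key_gap(KEY_JIT_INSTRUMENTATION, KEY_WAL_USAGE_NEXT) is 1, while key_gap(KEY_JIT_INSTRUMENTATION, KEY_WAL_USAGE_PROPOSED) is 7.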
On Wed, Mar 04, 2020 at 05:02:25PM +0100, Julien Rouhaud wrote:
I'm quite worried about the stability of those counters for regression tests.
Wouldn't a checkpoint happening during the test change them?
Yep. One way to get around that would be to test whether this output is
non-zero. Still, I suspect at a quick glance that this won't be entirely
reliable either.
While at it, did you consider adding a full-page image counter in the WalUsage?
That's something I'd really like to have and it doesn't seem hard to integrate.
FWIW, one reason here is that we recently had some benchmark work done
internally where this would have been helpful in studying some spiky
WAL load patterns.
--
Michael
I'm quite worried about the stability of those counters for regression tests.
Wouldn't a checkpoint happening during the test change them?
Agreed, the stability of the test could be an issue; even a change of the
write format or compression method, or other compatible changes, could
break such a test. Frankly speaking, the expected numbers are not actually
calculated; my logic is better described as "these numbers
should be non-zero for real tables". I believe the test can be
modified to check that the numbers are above zero, both for bytes written
and for records stored.
Having a checkpoint in the middle of the test can be almost 100%
countered by triggering one before the test. I'll add a checkpoint
call to the test scenario, if there are no objections.
While at it, did you consider adding a full-page image counter in the WalUsage?
That's something I'd really like to have and it doesn't seem hard to integrate.
Well, not sure I understand you 100%, being new to Postgres dev. Do
you want a separate counter for pages written whenever doPageWrites is
true? I can do that, if needed. Please confirm.
Another point is that this patch won't help to see autovacuum activity.
As an example, I did a quick te.....
...LONG QUOTE...
but that may seem strange to only account for (auto)vacuum activity, rather
than globally, grouping per RmgrId or CommandTag for instance. We could then
see the complete WAL usage per-database. What do you think?
I wanted to keep the patch small and simple, and fit to practical
needs. This patch is supposed to provide tuning assistance, catching
an io heavy query in commit-bound situation.
Total WAL usage per DB can be assessed rather easily using other means.
Let's get this change into the codebase and then work on connecting
WAL usage to (auto)vacuum stats.
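For reference, the per-statement accounting this patch relies on is a snapshot-and-subtract scheme over a backend-global counter. A minimal standalone sketch follows; WalUsageSketch, global_wal_usage, count_wal_record and wal_usage_since are illustrative stand-ins (assumptions), not the names used in the patch, which works on WalUsage and pgWalUsage:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the patch's WalUsage struct; field names follow the patch. */
typedef struct WalUsageSketch
{
	int64_t		wal_records;	/* # of WAL records produced */
	int64_t		wal_bytes;		/* size of WAL records produced */
} WalUsageSketch;

/* Stand-in for the backend-global counter (pgWalUsage in the patch). */
WalUsageSketch global_wal_usage;

/* Hypothetical helper: the bookkeeping done once per inserted WAL record. */
void
count_wal_record(int64_t record_len)
{
	assert(record_len > 0);
	global_wal_usage.wal_bytes += record_len;
	global_wal_usage.wal_records++;
}

/* Per-statement usage: counters now minus the snapshot taken at start. */
WalUsageSketch
wal_usage_since(const WalUsageSketch *start)
{
	WalUsageSketch delta;

	delta.wal_records = global_wal_usage.wal_records - start->wal_records;
	delta.wal_bytes = global_wal_usage.wal_bytes - start->wal_bytes;
	return delta;
}
```

A caller copies global_wal_usage before running a statement and calls wal_usage_since() with that copy afterwards, so only records inserted in between are attributed to the statement.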
Some minor points I noticed:
- the extension patch doesn't apply anymore, I guess since 70a7732007bc4689
Will fix, thank you.
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE000000000000010)
Shouldn't it be 0xA rather than 0x10?
Oww, my bad, this is embarrassing! Will fix, thank you.
- it would be better to add a version number to the patches, so we're sure
which one we're talking about.
Noted, thank you.
Please comment on the proposed changes, I will cook up a new version
once all are agreed upon.
On Thu, Mar 5, 2020 at 8:55 PM Kirill Bychik <kirill.bychik@gmail.com> wrote:
While at it, did you consider adding a full-page image counter in the WalUsage?
That's something I'd really like to have and it doesn't seem hard to integrate.
Well, not sure I understand you 100%, being new to Postgres dev. Do
you want a separate counter for pages written whenever doPageWrites is
true? I can do that, if needed. Please confirm.
Yes, I meant a separate 3rd counter for the number of full page images
written. However after a quick look I think that a FPI should be
detected with (doPageWrites && fpw_lsn != InvalidXLogRecPtr && fpw_lsn
<= RedoRecPtr).
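Spelled out as a predicate, the condition above could look like the following sketch. It assumes stand-in definitions of XLogRecPtr and InvalidXLogRecPtr so that it compiles outside the PostgreSQL tree; counts_as_fpi is a hypothetical name, and the parameters only mirror the variables mentioned above rather than the actual code in xloginsert.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL's definitions, so the sketch compiles on its own. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

static_assert(sizeof(XLogRecPtr) == 8, "XLogRecPtr is assumed to be 64 bits");

/*
 * Hypothetical predicate: true when a block would be counted as a full-page
 * image under the condition suggested above.
 */
bool
counts_as_fpi(bool doPageWrites, XLogRecPtr fpw_lsn, XLogRecPtr RedoRecPtr)
{
	return doPageWrites &&
		fpw_lsn != InvalidXLogRecPtr &&
		fpw_lsn <= RedoRecPtr;
}
```

The predicate is false whenever page writes are disabled, when no LSN was recorded for the block, or when the recorded LSN is newer than the redo pointer.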
Another point is that this patch won't help to see autovacuum activity.
As an example, I did a quick te.....
...LONG QUOTE...
but that may seem strange to only account for (auto)vacuum activity, rather
than globally, grouping per RmgrId or CommandTag for instance. We could then
see the complete WAL usage per-database. What do you think?
I wanted to keep the patch small and simple, and fit to practical
needs. This patch is supposed to provide tuning assistance, catching
an io heavy query in commit-bound situation.
Total WAL usage per DB can be assessed rather easily using other means.
Let's get this change into the codebase and then work on connecting
WAL usage to (auto)vacuum stats.
I agree that having a view of the full activity is a way bigger scope,
so it could be done later (and at this point in pg14), but I'm still
hoping that we can get insight of other backend WAL activity, such as
autovacuum, in pg13.
On Fri, Mar 6, 2020 at 20:14 Julien Rouhaud <rjuju123@gmail.com> wrote:
On Thu, Mar 5, 2020 at 8:55 PM Kirill Bychik <kirill.bychik@gmail.com> wrote:
While at it, did you consider adding a full-page image counter in the WalUsage?
That's something I'd really like to have and it doesn't seem hard to integrate.
Well, not sure I understand you 100%, being new to Postgres dev. Do
you want a separate counter for pages written whenever doPageWrites is
true? I can do that, if needed. Please confirm.
Yes, I meant a separate 3rd counter for the number of full page images
written. However after a quick look I think that a FPI should be
detected with (doPageWrites && fpw_lsn != InvalidXLogRecPtr && fpw_lsn
<= RedoRecPtr).
This seems easy, will implement once I get some spare time.
Another point is that this patch won't help to see autovacuum activity.
As an example, I did a quick te.....
...LONG QUOTE...
but that may seem strange to only account for (auto)vacuum activity, rather
than globally, grouping per RmgrId or CommandTag for instance. We could then
see the complete WAL usage per-database. What do you think?
I wanted to keep the patch small and simple, and fit to practical
needs. This patch is supposed to provide tuning assistance, catching
an io heavy query in commit-bound situation.
Total WAL usage per DB can be assessed rather easily using other means.
Let's get this change into the codebase and then work on connecting
WAL usage to (auto)vacuum stats.
I agree that having a view of the full activity is a way bigger scope,
so it could be done later (and at this point in pg14), but I'm still
hoping that we can get insight of other backend WAL activity, such as
autovacuum, in pg13.
How do you think this information should be exposed? Via the pg_stat_statement?
Anyways, I believe this change could be bigger than FPI. I propose to
plan a separate patch for it, or even add it to the TODO after the
core patch of wal usage is merged.
Please expect a new patch version next week, with FPI counters added.
On Fri, Mar 6, 2020 at 6:59 PM Kirill Bychik <kirill.bychik@gmail.com> wrote:
On Fri, Mar 6, 2020 at 20:14 Julien Rouhaud <rjuju123@gmail.com> wrote:
On Thu, Mar 5, 2020 at 8:55 PM Kirill Bychik <kirill.bychik@gmail.com> wrote:
I wanted to keep the patch small and simple, and fit to practical
needs. This patch is supposed to provide tuning assistance, catching
an io heavy query in commit-bound situation.
Total WAL usage per DB can be assessed rather easily using other means.
Let's get this change into the codebase and then work on connecting
WAL usage to (auto)vacuum stats.
I agree that having a view of the full activity is a way bigger scope,
so it could be done later (and at this point in pg14), but I'm still
hoping that we can get insight of other backend WAL activity, such as
autovacuum, in pg13.
How do you think this information should be exposed? Via the pg_stat_statement?
That's unlikely, since autovacuum won't trigger any hook. I was
thinking on some new view for pgstats, similarly to the example I
showed previously. The implementation is straightforward, although
pg_stat_database is maybe not the best choice here.
Anyways, I believe this change could be bigger than FPI. I propose to
plan a separate patch for it, or even add it to the TODO after the
core patch of wal usage is merged.
Just in case, if the problem is a lack of time, I'd be happy to help
on that if needed. Otherwise, I'll definitely not try to block any
progress for the feature as proposed.
Please expect a new patch version next week, with FPI counters added.
Thanks!
On Thu, Mar 5, 2020 at 8:55 PM Kirill Bychik <kirill.bychik@gmail.com> wrote:
I wanted to keep the patch small and simple, and fit to practical
needs. This patch is supposed to provide tuning assistance, catching
an io heavy query in commit-bound situation.
Total WAL usage per DB can be assessed rather easily using other means.
Let's get this change into the codebase and then work on connecting
WAL usage to (auto)vacuum stats.
I agree that having a view of the full activity is a way bigger scope,
so it could be done later (and at this point in pg14), but I'm still
hoping that we can get insight of other backend WAL activity, such as
autovacuum, in pg13.
How do you think this information should be exposed? Via the pg_stat_statement?
That's unlikely, since autovacuum won't trigger any hook. I was
thinking on some new view for pgstats, similarly to the example I
showed previously. The implementation is straightforward, although
pg_stat_database is maybe not the best choice here.
After extensive thinking and some code diving, I did not manage to
come up with a sane idea on how to expose data about autovacuum WAL
usage. Must be the flu.
Anyways, I believe this change could be bigger than FPI. I propose to
plan a separate patch for it, or even add it to the TODO after the
core patch of wal usage is merged.
Just in case, if the problem is a lack of time, I'd be happy to help
on that if needed. Otherwise, I'll definitely not try to block any
progress for the feature as proposed.
Please feel free to work on any extension of this patch idea. I lack
both time and knowledge to do it all by myself.
Please expect a new patch version next week, with FPI counters added.
Please find attached patch version 003, with FP writes and minor
corrections. I hope I use attachment versioning as expected in this
group :)
The test has been reworked, and I believe it should be stable now, at
least the part which checks that WAL is written and that there is a
correlation between affected rows and WAL records. I still have no idea
how to test full-page writes against regular updates, it seems very unstable.
Please share ideas if any.
Thanks!
Attachments:
003.wal_stats.core.patch (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4fa446ffa4..c975aa0dc7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
#include "commands/progress.h"
#include "commands/tablespace.h"
#include "common/controldata_utils.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "pgstat.h"
@@ -1231,6 +1232,13 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
+ /* Provide WAL update data to the instrumentation */
+ if (inserted)
+ {
+ pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+ pgWalUsage.wal_records++;
+ }
+
return EndPos;
}
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 2fa0a7f667..1f71cc0a76 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -25,6 +25,7 @@
#include "access/xloginsert.h"
#include "catalog/pg_control.h"
#include "common/pg_lzcompress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "replication/origin.h"
@@ -635,6 +636,11 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
*/
bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
+ /*
+ * Report a full page image constructed for the WAL record
+ */
+ pgWalUsage.wal_fp_records++;
+
/*
* Construct XLogRecData entries for the page content.
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..017367878f 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -62,6 +62,7 @@
#define PARALLEL_KEY_DSA UINT64CONST(0xE000000000000007)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -573,6 +574,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
char *pstmt_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
+ WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
@@ -646,6 +648,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Same thing for WalUsage.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for tuple queues. */
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers));
@@ -728,6 +737,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufusage_space);
pei->buffer_usage = bufusage_space;
+ /* Same for WalUsage. */
+ walusage_space = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
+ pei->wal_usage = walusage_space;
+
/* Set up the tuple queues that the workers will write into. */
pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
@@ -1113,7 +1128,7 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
* finish, or we might get incomplete data.)
*/
for (i = 0; i < nworkers; i++)
- InstrAccumParallelQuery(&pei->buffer_usage[i]);
+ InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]);
pei->finished = true;
}
@@ -1333,6 +1348,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
{
FixedParallelExecutorState *fpes;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
DestReceiver *receiver;
QueryDesc *queryDesc;
SharedExecutorInstrumentation *instrumentation;
@@ -1408,7 +1424,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Report buffer usage during parallel execution. */
buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup (toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Report instrumentation data if any instrumentation options are set. */
if (instrumentation != NULL)
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index bc1d42bf64..4bcb06f6e1 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -24,6 +24,11 @@ static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void BufferUsageAccumDiff(BufferUsage *dst,
const BufferUsage *add, const BufferUsage *sub);
+WalUsage pgWalUsage;
+static WalUsage save_pgWalUsage;
+
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub);
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -33,15 +38,17 @@ InstrAlloc(int n, int instrument_options)
/* initialize all fields to zeroes, then modify as needed */
instr = palloc0(n * sizeof(Instrumentation));
- if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER))
+ if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER | INSTRUMENT_WAL))
{
bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
bool need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
int i;
for (i = 0; i < n; i++)
{
instr[i].need_bufusage = need_buffers;
+ instr[i].need_walusage = need_wal;
instr[i].need_timer = need_timer;
}
}
@@ -55,6 +62,7 @@ InstrInit(Instrumentation *instr, int instrument_options)
{
memset(instr, 0, sizeof(Instrumentation));
instr->need_bufusage = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ instr->need_walusage = (instrument_options & INSTRUMENT_WAL) != 0;
instr->need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
}
@@ -69,6 +77,9 @@ InstrStartNode(Instrumentation *instr)
/* save buffer usage totals at node entry, if needed */
if (instr->need_bufusage)
instr->bufusage_start = pgBufferUsage;
+
+ if (instr->need_walusage)
+ instr->walusage_start = pgWalUsage;
}
/* Exit from a plan node */
@@ -97,6 +108,10 @@ InstrStopNode(Instrumentation *instr, double nTuples)
BufferUsageAccumDiff(&instr->bufusage,
&pgBufferUsage, &instr->bufusage_start);
+ if (instr->need_walusage)
+ WalUsageAccumDiff(&instr->walusage,
+ &pgWalUsage, &instr->walusage_start);
+
/* Is this the first tuple of this cycle? */
if (!instr->running)
{
@@ -160,6 +175,9 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
/* Add delta of buffer usage since entry to node's totals */
if (dst->need_bufusage)
BufferUsageAdd(&dst->bufusage, &add->bufusage);
+
+ if (dst->need_walusage)
+ WalUsageAdd(&dst->walusage, &add->walusage);
}
/* note current values during parallel executor startup */
@@ -167,21 +185,25 @@ void
InstrStartParallelQuery(void)
{
save_pgBufferUsage = pgBufferUsage;
+ save_pgWalUsage = pgWalUsage;
}
/* report usage after parallel executor shutdown */
void
-InstrEndParallelQuery(BufferUsage *result)
+InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- memset(result, 0, sizeof(BufferUsage));
- BufferUsageAccumDiff(result, &pgBufferUsage, &save_pgBufferUsage);
+ memset(bufusage, 0, sizeof(BufferUsage));
+ memset(walusage, 0, sizeof(WalUsage));
+ BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
+ WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
}
/* accumulate work done by workers in leader's stats */
void
-InstrAccumParallelQuery(BufferUsage *result)
+InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- BufferUsageAdd(&pgBufferUsage, result);
+ BufferUsageAdd(&pgBufferUsage, bufusage);
+ WalUsageAdd(&pgWalUsage, walusage);
}
/* dst += add */
@@ -223,3 +245,20 @@ BufferUsageAccumDiff(BufferUsage *dst,
INSTR_TIME_ACCUM_DIFF(dst->blk_write_time,
add->blk_write_time, sub->blk_write_time);
}
+
+/* helper functions for WAL usage accumulation */
+static void
+WalUsageAdd(WalUsage *dst, WalUsage *add)
+{
+ dst->wal_bytes += add->wal_bytes;
+ dst->wal_records += add->wal_records;
+ dst->wal_fp_records += add->wal_fp_records;
+}
+
+static void
+WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
+{
+ dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
+ dst->wal_records += add->wal_records - sub->wal_records;
+ dst->wal_fp_records += add->wal_fp_records - sub->wal_fp_records;
+}
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf020..1cc5b524fd 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -26,6 +26,7 @@ typedef struct ParallelExecutorInfo
PlanState *planstate; /* plan subtree we're running in parallel */
ParallelContext *pcxt; /* parallel context we're using */
BufferUsage *buffer_usage; /* points to bufusage area in DSM */
+ WalUsage *wal_usage; /* walusage area in DSM */
SharedExecutorInstrumentation *instrumentation; /* optional */
struct SharedJitInstrumentation *jit_instrumentation; /* optional */
dsa_area *area; /* points to DSA area in DSM */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index f48d46aede..f79fac8f8c 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -32,12 +32,20 @@ typedef struct BufferUsage
instr_time blk_write_time; /* time spent writing */
} BufferUsage;
+typedef struct WalUsage
+{
+ long wal_records; /* # of wal records produced */
+ long wal_fp_records; /* # of full page wal records produced */
+ long wal_bytes; /* size of wal records produced */
+} WalUsage;
+
/* Flag bits included in InstrAlloc's instrument_options bitmask */
typedef enum InstrumentOption
{
INSTRUMENT_TIMER = 1 << 0, /* needs timer (and row counts) */
INSTRUMENT_BUFFERS = 1 << 1, /* needs buffer usage */
INSTRUMENT_ROWS = 1 << 2, /* needs row count */
+ INSTRUMENT_WAL = 1 << 3, /* needs wal usage */
INSTRUMENT_ALL = PG_INT32_MAX
} InstrumentOption;
@@ -46,6 +54,7 @@ typedef struct Instrumentation
/* Parameters set at node creation: */
bool need_timer; /* true if we need timer data */
bool need_bufusage; /* true if we need buffer usage data */
+ bool need_walusage; /* true if we need wal usage data */
/* Info about current plan cycle: */
bool running; /* true if we've completed first tuple */
instr_time starttime; /* start time of current iteration of node */
@@ -53,6 +62,7 @@ typedef struct Instrumentation
double firsttuple; /* time for first tuple of this cycle */
double tuplecount; /* # of tuples emitted so far this cycle */
BufferUsage bufusage_start; /* buffer usage at start */
+ WalUsage walusage_start; /* wal usage at start */
/* Accumulated statistics across all completed cycles: */
double startup; /* total startup time (in seconds) */
double total; /* total time (in seconds) */
@@ -62,6 +72,7 @@ typedef struct Instrumentation
double nfiltered1; /* # of tuples removed by scanqual or joinqual */
double nfiltered2; /* # of tuples removed by "other" quals */
BufferUsage bufusage; /* total buffer usage */
+ WalUsage walusage; /* total wal usage */
} Instrumentation;
typedef struct WorkerInstrumentation
@@ -71,6 +82,7 @@ typedef struct WorkerInstrumentation
} WorkerInstrumentation;
extern PGDLLIMPORT BufferUsage pgBufferUsage;
+extern PGDLLIMPORT WalUsage pgWalUsage;
extern Instrumentation *InstrAlloc(int n, int instrument_options);
extern void InstrInit(Instrumentation *instr, int instrument_options);
@@ -79,7 +91,7 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
-extern void InstrEndParallelQuery(BufferUsage *result);
-extern void InstrAccumParallelQuery(BufferUsage *result);
+extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
+extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
#endif /* INSTRUMENT_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e216de9570..88aed4c652 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2632,6 +2632,7 @@ WalSndCtlData
WalSndSendDataCallback
WalSndState
WalTimeSample
+WalUsage
WalWriteMethod
Walfile
WindowAgg
003.wal_stats.ext.patch (text/x-patch; charset=US-ASCII)
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 80042a0905..081f997d70 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,8 @@ OBJS = \
pg_stat_statements.o
EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.6--1.7.sql \
+DATA = pg_stat_statements--1.4.sql \
+ pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 6787ec1efd..8ed985b2b6 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -195,20 +195,123 @@ SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
3 | c
(8 rows)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
--------------------------------------------------------------+-------+------
- DELETE FROM test WHERE a > $1 | 1 | 1
- INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3
- INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10
- SELECT * FROM test ORDER BY a | 1 | 12
- SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4
- SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8
- SELECT pg_stat_statements_reset() | 1 | 1
- UPDATE test SET b = $1 WHERE a = $2 | 6 | 6
- UPDATE test SET b = $1 WHERE a > $2 | 1 | 3
+-- SELECT check WAL usage stats to confirm temp tables do not get stored in WAL
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_write_bytes | wal_write_records
+-------------------------------------------------------------+-------+------+-----------------+-------------------
+ DELETE FROM test WHERE a > $1 | 1 | 1 | 0 | 0
+ INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | 0 | 0
+ INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10 | 0 | 0
+ SELECT * FROM test ORDER BY a | 1 | 12 | 0 | 0
+ SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4 | 0 | 0
+ SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
+ UPDATE test SET b = $1 WHERE a = $2 | 6 | 6 | 0 | 0
+ UPDATE test SET b = $1 WHERE a > $2 | 1 | 3 | 0 | 0
(9 rows)
+-- CHECKPOINT before WAL tests to ensure test stability
+CHECKPOINT;
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset
+--------------------------
+
+(1 row)
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- explicit transaction
+BEGIN;
+UPDATE pgss_test SET b = '111' WHERE a = 1 ;
+COMMIT;
+BEGIN \;
+UPDATE pgss_test SET b = '222' WHERE a = 2 \;
+COMMIT ;
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
+BEGIN \;
+UPDATE pgss_test SET b = '555' WHERE a = 5 \;
+UPDATE pgss_test SET b = '666' WHERE a = 6 \;
+COMMIT ;
+-- many INSERT values
+INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+-- SELECT with constants
+SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ;
+ a | b
+---+----------------------
+ 6 | 666
+ 7 | aaa
+ 8 | bbb
+ 9 | bbb
+(4 rows)
+
+SELECT *
+ FROM pgss_test
+ WHERE a > 9
+ ORDER BY a ;
+ a | b
+---+---
+(0 rows)
+
+-- SELECT without constants
+SELECT * FROM pgss_test ORDER BY a;
+ a | b
+---+----------------------
+ 1 | a
+ 1 | 111
+ 2 | b
+ 2 | 222
+ 3 | c
+ 3 | 333
+ 4 | 444
+ 5 | 555
+ 6 | 666
+ 7 | aaa
+ 8 | bbb
+ 9 | bbb
+(12 rows)
+
+-- SELECT with IN clause
+SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5);
+ a | b
+---+----------------------
+ 1 | 111
+ 2 | 222
+ 3 | 333
+ 4 | 444
+ 5 | 555
+ 1 | a
+ 2 | b
+ 3 | c
+(8 rows)
+
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+-- SELCT usage data, check WAL usage is reported, wal_write_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows, wal_write_bytes > 0 as wal_bytes_written, wal_write_records > 0 as wal_records_written, wal_write_records = rows as wal_records_as_rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes_written | wal_records_written | wal_records_as_rows
+------------------------------------------------------------------+-------+------+-------------------+---------------------+---------------------
+ DELETE FROM pgss_test WHERE a > $1 | 1 | 1 | t | t | t
+ DROP TABLE pgss_test | 1 | 0 | t | t | f
+ INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | t | t | t
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | 1 | 10 | t | t | t
+ SELECT * FROM pgss_test ORDER BY a | 1 | 12 | f | f | f
+ SELECT * FROM pgss_test WHERE a > $1 ORDER BY a | 2 | 4 | f | f | f
+ SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | f | f | f
+ SELECT pg_stat_statements_reset() | 1 | 1 | f | f | f
+ SET pg_stat_statements.track_utility = FALSE | 1 | 0 | f | f | t
+ UPDATE pgss_test SET b = $1 WHERE a = $2 | 6 | 6 | t | t | t
+ UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 | 3 | t | t | t
+(11 rows)
+
--
-- pg_stat_statements.track = none
--
@@ -287,13 +390,13 @@ SELECT PLUS_ONE(10);
11
(1 row)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
------------------------------------+-------+------
- SELECT $1::TEXT | 1 | 1
- SELECT PLUS_ONE($1) | 2 | 2
- SELECT PLUS_TWO($1) | 2 | 2
- SELECT pg_stat_statements_reset() | 1 | 1
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_write_bytes | wal_write_records
+-----------------------------------+-------+------+-----------------+-------------------
+ SELECT $1::TEXT | 1 | 1 | 0 | 0
+ SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0
+ SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
(4 rows)
--
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
new file mode 100644
index 0000000000..f8b79f2277
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -0,0 +1,50 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.8'" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+ OUT userid oid,
+ OUT dbid oid,
+ OUT queryid bigint,
+ OUT query text,
+ OUT calls int8,
+ OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT mean_time float8,
+ OUT stddev_time float8,
+ OUT rows int8,
+ OUT shared_blks_hit int8,
+ OUT shared_blks_read int8,
+ OUT shared_blks_dirtied int8,
+ OUT shared_blks_written int8,
+ OUT local_blks_hit int8,
+ OUT local_blks_read int8,
+ OUT local_blks_dirtied int8,
+ OUT local_blks_written int8,
+ OUT temp_blks_read int8,
+ OUT temp_blks_written int8,
+ OUT blk_read_time float8,
+ OUT blk_write_time float8,
+ OUT wal_write_bytes int8,
+ OUT wal_write_records int8,
+ OUT wal_write_fp_records int8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4'
+LANGUAGE C STRICT VOLATILE;
+
+CREATE VIEW pg_stat_statements AS
+ SELECT * FROM pg_stat_statements(true);
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 20dc8c605b..1c256fc395 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -120,7 +120,8 @@ typedef enum pgssVersion
PGSS_V1_0 = 0,
PGSS_V1_1,
PGSS_V1_2,
- PGSS_V1_3
+ PGSS_V1_3,
+ PGSS_V1_4
} pgssVersion;
/*
@@ -161,6 +162,9 @@ typedef struct Counters
int64 local_blks_written; /* # of local disk blocks written */
int64 temp_blks_read; /* # of temp blocks read */
int64 temp_blks_written; /* # of temp blocks written */
+ int64 wal_bytes_written; /* total amount of wal bytes written */
+ int64 wal_records_written; /* # of wal records written */
+ int64 wal_fp_records_written; /* # of full page wal records written */
double blk_read_time; /* time spent reading, in msec */
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
@@ -293,6 +297,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_3);
+PG_FUNCTION_INFO_V1(pg_stat_statements_1_4);
PG_FUNCTION_INFO_V1(pg_stat_statements);
static void pgss_shmem_startup(void);
@@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage* walusage,
pgssJumbleState *jstate);
static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
pgssVersion api_version,
@@ -841,6 +847,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
0,
0,
NULL,
+ NULL,
&jstate);
}
@@ -944,6 +951,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
queryDesc->totaltime->total * 1000.0, /* convert to msec */
queryDesc->estate->es_processed,
&queryDesc->totaltime->bufusage,
+ &queryDesc->totaltime->walusage,
NULL);
}
@@ -989,7 +997,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
BufferUsage bufusage_start,
bufusage;
+ WalUsage walusage_start,
+ walusage;
+
bufusage_start = pgBufferUsage;
+ walusage_start = pgWalUsage;
INSTR_TIME_SET_CURRENT(start);
nested_level++;
@@ -1041,6 +1053,13 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
bufusage.blk_write_time = pgBufferUsage.blk_write_time;
INSTR_TIME_SUBTRACT(bufusage.blk_write_time, bufusage_start.blk_write_time);
+ walusage.wal_bytes =
+ pgWalUsage.wal_bytes - walusage_start.wal_bytes;
+ walusage.wal_records =
+ pgWalUsage.wal_records - walusage_start.wal_records;
+ walusage.wal_fp_records =
+ pgWalUsage.wal_fp_records - walusage_start.wal_fp_records;
+
pgss_store(queryString,
0, /* signal that it's a utility stmt */
pstmt->stmt_location,
@@ -1048,6 +1067,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
INSTR_TIME_GET_MILLISEC(duration),
rows,
&bufusage,
+ &walusage,
NULL);
}
else
@@ -1083,13 +1103,14 @@ pgss_hash_string(const char *str, int len)
*
* If jstate is not NULL then we're trying to create an entry for which
* we have no statistics as yet; we just want to record the normalized
- * query string. total_time, rows, bufusage are ignored in this case.
+ * query string. Time and usage are ignored in this case.
*/
static void
pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage* walusage,
pgssJumbleState *jstate)
{
pgssHashKey key;
@@ -1281,6 +1302,9 @@ pgss_store(const char *query, uint64 queryId,
e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
+ e->counters.wal_bytes_written += walusage->wal_bytes;
+ e->counters.wal_records_written += walusage->wal_records;
+ e->counters.wal_fp_records_written += walusage->wal_fp_records;
SpinLockRelease(&e->mutex);
}
@@ -1328,7 +1352,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
#define PG_STAT_STATEMENTS_COLS_V1_1 18
#define PG_STAT_STATEMENTS_COLS_V1_2 19
#define PG_STAT_STATEMENTS_COLS_V1_3 23
-#define PG_STAT_STATEMENTS_COLS 23 /* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_4 26
+#define PG_STAT_STATEMENTS_COLS 26 /* maximum of above */
/*
* Retrieve statement statistics.
@@ -1340,6 +1365,15 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
* expected API version is identified by embedding it in the C name of the
* function. Unfortunately we weren't bright enough to do that for 1.1.
*/
+Datum pg_stat_statements_1_4(PG_FUNCTION_ARGS)
+{
+ bool showtext = PG_GETARG_BOOL(0);
+
+ pg_stat_statements_internal(fcinfo, PGSS_V1_4, showtext);
+
+ return (Datum)0;
+}
+
Datum
pg_stat_statements_1_3(PG_FUNCTION_ARGS)
{
@@ -1445,6 +1479,10 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
if (api_version != PGSS_V1_3)
elog(ERROR, "incorrect number of output arguments");
break;
+ case PG_STAT_STATEMENTS_COLS_V1_4:
+ if (api_version != PGSS_V1_4)
+ elog(ERROR, "incorrect number of output arguments");
+ break;
default:
elog(ERROR, "incorrect number of output arguments");
}
@@ -1641,11 +1679,18 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
values[i++] = Float8GetDatumFast(tmp.blk_read_time);
values[i++] = Float8GetDatumFast(tmp.blk_write_time);
}
+ if (api_version >= PGSS_V1_4)
+ {
+ values[i++] = Int64GetDatumFast(tmp.wal_bytes_written);
+ values[i++] = Int64GetDatumFast(tmp.wal_records_written);
+ values[i++] = Int64GetDatumFast(tmp.wal_fp_records_written);
+ }
Assert(i == (api_version == PGSS_V1_0 ? PG_STAT_STATEMENTS_COLS_V1_0 :
api_version == PGSS_V1_1 ? PG_STAT_STATEMENTS_COLS_V1_1 :
api_version == PGSS_V1_2 ? PG_STAT_STATEMENTS_COLS_V1_2 :
api_version == PGSS_V1_3 ? PG_STAT_STATEMENTS_COLS_V1_3 :
+ api_version == PGSS_V1_4 ? PG_STAT_STATEMENTS_COLS_V1_4 :
-1 /* fail if you forget to update this assert */ ));
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 14cb422354..7fb20df886 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
# pg_stat_statements extension
comment = 'track execution statistics of all SQL statements executed'
-default_version = '1.7'
+default_version = '1.8'
module_pathname = '$libdir/pg_stat_statements'
relocatable = true
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 8b527070d4..5c45d85ba3 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -99,7 +99,65 @@ SELECT * FROM test ORDER BY a;
-- SELECT with IN clause
SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- SELECT check WAL usage stats to confirm temp tables do not get stored in WAL
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- CHECKPOINT before WAL tests to ensure test stability
+CHECKPOINT;
+
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+
+-- explicit transaction
+BEGIN;
+UPDATE pgss_test SET b = '111' WHERE a = 1 ;
+COMMIT;
+
+BEGIN \;
+UPDATE pgss_test SET b = '222' WHERE a = 2 \;
+COMMIT ;
+
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
+
+BEGIN \;
+UPDATE pgss_test SET b = '555' WHERE a = 5 \;
+UPDATE pgss_test SET b = '666' WHERE a = 6 \;
+COMMIT ;
+
+-- many INSERT values
+INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+
+-- SELECT with constants
+SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ;
+
+SELECT *
+ FROM pgss_test
+ WHERE a > 9
+ ORDER BY a ;
+
+-- SELECT without constants
+SELECT * FROM pgss_test ORDER BY a;
+
+-- SELECT with IN clause
+SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5);
+
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+
+-- SELCT usage data, check WAL usage is reported, wal_write_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows, wal_write_bytes > 0 as wal_bytes_written, wal_write_records > 0 as wal_records_written, wal_write_records = rows as wal_records_as_rows FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- pg_stat_statements.track = none
@@ -144,7 +202,7 @@ $$ SELECT (i + 1.0)::INTEGER LIMIT 1 $$ LANGUAGE SQL;
SELECT PLUS_ONE(8);
SELECT PLUS_ONE(10);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- pg_stat_statements.track = all
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 26bb82da4a..af1c276752 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -221,6 +221,42 @@
</entry>
</row>
+ <row>
+ <entry><structfield>wal_write_bytes</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Amount of WAL bytes added by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_write_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Count of WAL records added by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_write_bytes</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Amount of WAL bytes added by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_write_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Count of WAL records added by the statement
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
On Sun, Mar 15, 2020 at 09:52:18PM +0300, Kirill Bychik wrote:
On Thu, Mar 5, 2020 at 8:55 PM Kirill Bychik <kirill.bychik@gmail.com> wrote:
After extensive thinking and some code diving, I did not manage to
come up with a sane idea on how to expose data about autovacuum WAL
usage. Must be the flu.
Anyways, I believe this change could be bigger than FPI. I propose to
plan a separate patch for it, or even add it to the TODO after the
core patch of wal usage is merged.
Just in case, if the problem is a lack of time, I'd be happy to help
on that if needed. Otherwise, I'll definitely not try to block any
progress for the feature as proposed.
Please feel free to work on any extension of this patch idea. I lack
both time and knowledge to do it all by myself.
I'm adding a 3rd patch on top of yours to expose the new WAL counters in
pg_stat_database, for vacuum and autovacuum. I'm not really enthusiastic about
this approach, but I didn't find a better one, and maybe this will raise some
better ideas. The only sure thing is that we're not going to add a bunch of new
fields in pg_stat_all_tables anyway.
If no one's happy about it, we can also drop this 3rd patch entirely without
impacting the first two.
Please expect a new patch version next week, with FPI counters added.
Please find attached patch version 003, with FP writes and minor
corrections. I hope I use attachment versioning as expected in this
group :)
Thanks!
The test has been reworked, and I believe the part that checks that WAL
is written, and that affected rows correlate with WAL records, should be
stable now. I still have no idea how to test full-page writes against
regular updates; it seems very unstable.
Please share ideas if any.
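As a rough illustration of the property that check relies on, here is a standalone sketch (not PostgreSQL code). It assumes every modified row emits exactly one WAL record of a fixed, made-up length, which is what the `wal_write_records = rows` comparison expects; real statements can also emit other records. `model_total`, `model_modify_rows` and `model_run_statement` are hypothetical names; only WalUsage and WalUsageAccumDiff follow the patch.

```c
#include <assert.h>

/* Mirrors the struct added to instrument.h by the patch. */
typedef struct WalUsage
{
	long		wal_records;	/* # of WAL records produced */
	long		wal_fp_records; /* # of full page images produced */
	long		wal_bytes;		/* size of WAL records produced */
} WalUsage;

/* Backend-wide running total, playing the role of pgWalUsage. */
WalUsage	model_total;

/* dst += add - sub, same arithmetic as the helper in the patch */
void
WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
{
	dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
	dst->wal_records += add->wal_records - sub->wal_records;
	dst->wal_fp_records += add->wal_fp_records - sub->wal_fp_records;
}

/* Hypothetical: pretend to modify nrows rows, one record of rec_len bytes each. */
void
model_modify_rows(long nrows, long rec_len)
{
	assert(nrows >= 0 && rec_len >= 0);
	model_total.wal_records += nrows;
	model_total.wal_bytes += nrows * rec_len;
}

/* Hypothetical: run one statement and return only the WAL it generated. */
WalUsage
model_run_statement(long nrows, long rec_len)
{
	WalUsage	start = model_total;	/* snapshot at statement start */
	WalUsage	result = {0, 0, 0};

	model_modify_rows(nrows, rec_len);
	WalUsageAccumDiff(&result, &model_total, &start);
	return result;
}
```

Under these assumptions, a statement touching no rows (a plain SELECT, for instance) reports zero records and zero bytes, while a statement touching N rows reports N records, regardless of what earlier statements added to the running total.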
I just reviewed the patches, and overall they look good to me. The way to
detect full page images looks sensible, but I'm really not familiar with that
code so additional review would be useful.
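For readers who are also less familiar with that code path, the counting can be modelled roughly as below. This is a standalone sketch, not PostgreSQL code: `model_usage`, `model_register_fpi` and `model_insert_record` are hypothetical stand-ins for pgWalUsage and for the spots in XLogRecordAssemble() and XLogInsertRecord() where the patch bumps the counters. Only the WalUsage field names are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the struct added to instrument.h by the patch. */
typedef struct WalUsage
{
	long		wal_records;	/* # of WAL records produced */
	long		wal_fp_records; /* # of full page images produced */
	long		wal_bytes;		/* size of WAL records produced */
} WalUsage;

/* Backend-local running total, playing the role of pgWalUsage. */
WalUsage	model_usage;

/*
 * Hypothetical stand-in for record assembly: called once for every block
 * that gets a full page image attached to the record being built.
 */
void
model_register_fpi(void)
{
	model_usage.wal_fp_records++;
}

/*
 * Hypothetical stand-in for record insertion: bytes and record count are
 * only added when the record was actually inserted.
 */
void
model_insert_record(long total_len, bool inserted)
{
	assert(total_len >= 0);
	if (!inserted)
		return;
	model_usage.wal_bytes += total_len;
	model_usage.wal_records++;
}
```

One thing the model makes visible: the full page image counter is bumped at assembly time, while bytes and records are only counted after a successful insert. If assembly were repeated for the same record, the image would be counted again; whether that matters in practice is not something this sketch can answer.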
I noticed that the new wal_write_fp_records field in pg_stat_statements wasn't
used in the test. Since I have to add all the patches to make the cfbot happy,
I slightly adapted the tests to reference the fp column too. There was also a
minor issue in the documentation: the wal_write_bytes and wal_write_records
entries were copy/pasted twice while wal_write_fp_records wasn't documented,
so I also changed it.
Let me know if you're ok with those changes.
Attachments:
v4-0001-Track-WAL-usage.patch (text/plain; charset=us-ascii)
From 295f328e7ea9fd9207df789dd3db5bab458decf3 Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:41:50 +0100
Subject: [PATCH v4 1/3] Track WAL usage.
---
src/backend/access/transam/xlog.c | 8 ++++
src/backend/access/transam/xloginsert.c | 6 +++
src/backend/executor/execParallel.c | 22 ++++++++++-
src/backend/executor/instrument.c | 51 ++++++++++++++++++++++---
src/include/executor/execParallel.h | 1 +
src/include/executor/instrument.h | 16 +++++++-
src/tools/pgindent/typedefs.list | 1 +
7 files changed, 95 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index de2d4ee582..7cab00323d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
#include "commands/progress.h"
#include "commands/tablespace.h"
#include "common/controldata_utils.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "pgstat.h"
@@ -1231,6 +1232,13 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
+ /* Provide WAL update data to the instrumentation */
+ if (inserted)
+ {
+ pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+ pgWalUsage.wal_records++;
+ }
+
return EndPos;
}
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 2fa0a7f667..1f71cc0a76 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -25,6 +25,7 @@
#include "access/xloginsert.h"
#include "catalog/pg_control.h"
#include "common/pg_lzcompress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "replication/origin.h"
@@ -635,6 +636,11 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
*/
bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
+ /*
+ * Report a full page image constructed for the WAL record
+ */
+ pgWalUsage.wal_fp_records++;
+
/*
* Construct XLogRecData entries for the page content.
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..017367878f 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -62,6 +62,7 @@
#define PARALLEL_KEY_DSA UINT64CONST(0xE000000000000007)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -573,6 +574,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
char *pstmt_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
+ WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
@@ -646,6 +648,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Same thing for WalUsage.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for tuple queues. */
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers));
@@ -728,6 +737,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufusage_space);
pei->buffer_usage = bufusage_space;
+ /* Same for WalUsage. */
+ walusage_space = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
+ pei->wal_usage = walusage_space;
+
/* Set up the tuple queues that the workers will write into. */
pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
@@ -1113,7 +1128,7 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
* finish, or we might get incomplete data.)
*/
for (i = 0; i < nworkers; i++)
- InstrAccumParallelQuery(&pei->buffer_usage[i]);
+ InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]);
pei->finished = true;
}
@@ -1333,6 +1348,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
{
FixedParallelExecutorState *fpes;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
DestReceiver *receiver;
QueryDesc *queryDesc;
SharedExecutorInstrumentation *instrumentation;
@@ -1408,7 +1424,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Report buffer usage during parallel execution. */
buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Report instrumentation data if any instrumentation options are set. */
if (instrumentation != NULL)
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index bc1d42bf64..4bcb06f6e1 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -24,6 +24,11 @@ static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void BufferUsageAccumDiff(BufferUsage *dst,
const BufferUsage *add, const BufferUsage *sub);
+WalUsage pgWalUsage;
+static WalUsage save_pgWalUsage;
+
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub);
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -33,15 +38,17 @@ InstrAlloc(int n, int instrument_options)
/* initialize all fields to zeroes, then modify as needed */
instr = palloc0(n * sizeof(Instrumentation));
- if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER))
+ if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER | INSTRUMENT_WAL))
{
bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
bool need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
int i;
for (i = 0; i < n; i++)
{
instr[i].need_bufusage = need_buffers;
+ instr[i].need_walusage = need_wal;
instr[i].need_timer = need_timer;
}
}
@@ -55,6 +62,7 @@ InstrInit(Instrumentation *instr, int instrument_options)
{
memset(instr, 0, sizeof(Instrumentation));
instr->need_bufusage = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ instr->need_walusage = (instrument_options & INSTRUMENT_WAL) != 0;
instr->need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
}
@@ -69,6 +77,9 @@ InstrStartNode(Instrumentation *instr)
/* save buffer usage totals at node entry, if needed */
if (instr->need_bufusage)
instr->bufusage_start = pgBufferUsage;
+
+ if (instr->need_walusage)
+ instr->walusage_start = pgWalUsage;
}
/* Exit from a plan node */
@@ -97,6 +108,10 @@ InstrStopNode(Instrumentation *instr, double nTuples)
BufferUsageAccumDiff(&instr->bufusage,
&pgBufferUsage, &instr->bufusage_start);
+ if (instr->need_walusage)
+ WalUsageAccumDiff(&instr->walusage,
+ &pgWalUsage, &instr->walusage_start);
+
/* Is this the first tuple of this cycle? */
if (!instr->running)
{
@@ -160,6 +175,9 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
/* Add delta of buffer usage since entry to node's totals */
if (dst->need_bufusage)
BufferUsageAdd(&dst->bufusage, &add->bufusage);
+
+ if (dst->need_walusage)
+ WalUsageAdd(&dst->walusage, &add->walusage);
}
/* note current values during parallel executor startup */
@@ -167,21 +185,25 @@ void
InstrStartParallelQuery(void)
{
save_pgBufferUsage = pgBufferUsage;
+ save_pgWalUsage = pgWalUsage;
}
/* report usage after parallel executor shutdown */
void
-InstrEndParallelQuery(BufferUsage *result)
+InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- memset(result, 0, sizeof(BufferUsage));
- BufferUsageAccumDiff(result, &pgBufferUsage, &save_pgBufferUsage);
+ memset(bufusage, 0, sizeof(BufferUsage));
+ memset(walusage, 0, sizeof(WalUsage));
+ BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
+ WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
}
/* accumulate work done by workers in leader's stats */
void
-InstrAccumParallelQuery(BufferUsage *result)
+InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- BufferUsageAdd(&pgBufferUsage, result);
+ BufferUsageAdd(&pgBufferUsage, bufusage);
+ WalUsageAdd(&pgWalUsage, walusage);
}
/* dst += add */
@@ -223,3 +245,20 @@ BufferUsageAccumDiff(BufferUsage *dst,
INSTR_TIME_ACCUM_DIFF(dst->blk_write_time,
add->blk_write_time, sub->blk_write_time);
}
+
+/* helper functions for WAL usage accumulation */
+static void
+WalUsageAdd(WalUsage *dst, WalUsage *add)
+{
+ dst->wal_bytes += add->wal_bytes;
+ dst->wal_records += add->wal_records;
+ dst->wal_fp_records += add->wal_fp_records;
+}
+
+static void
+WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
+{
+ dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
+ dst->wal_records += add->wal_records - sub->wal_records;
+ dst->wal_fp_records += add->wal_fp_records - sub->wal_fp_records;
+}
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf020..1cc5b524fd 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -26,6 +26,7 @@ typedef struct ParallelExecutorInfo
PlanState *planstate; /* plan subtree we're running in parallel */
ParallelContext *pcxt; /* parallel context we're using */
BufferUsage *buffer_usage; /* points to bufusage area in DSM */
+ WalUsage *wal_usage; /* points to walusage area in DSM */
SharedExecutorInstrumentation *instrumentation; /* optional */
struct SharedJitInstrumentation *jit_instrumentation; /* optional */
dsa_area *area; /* points to DSA area in DSM */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index f48d46aede..f79fac8f8c 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -32,12 +32,20 @@ typedef struct BufferUsage
instr_time blk_write_time; /* time spent writing */
} BufferUsage;
+typedef struct WalUsage
+{
+ long wal_records; /* # of wal records produced */
+ long wal_fp_records; /* # of full page wal records produced */
+ long wal_bytes; /* size of wal records produced */
+} WalUsage;
+
/* Flag bits included in InstrAlloc's instrument_options bitmask */
typedef enum InstrumentOption
{
INSTRUMENT_TIMER = 1 << 0, /* needs timer (and row counts) */
INSTRUMENT_BUFFERS = 1 << 1, /* needs buffer usage */
INSTRUMENT_ROWS = 1 << 2, /* needs row count */
+ INSTRUMENT_WAL = 1 << 3, /* needs wal usage */
INSTRUMENT_ALL = PG_INT32_MAX
} InstrumentOption;
@@ -46,6 +54,7 @@ typedef struct Instrumentation
/* Parameters set at node creation: */
bool need_timer; /* true if we need timer data */
bool need_bufusage; /* true if we need buffer usage data */
+ bool need_walusage; /* true if we need wal usage data */
/* Info about current plan cycle: */
bool running; /* true if we've completed first tuple */
instr_time starttime; /* start time of current iteration of node */
@@ -53,6 +62,7 @@ typedef struct Instrumentation
double firsttuple; /* time for first tuple of this cycle */
double tuplecount; /* # of tuples emitted so far this cycle */
BufferUsage bufusage_start; /* buffer usage at start */
+ WalUsage walusage_start; /* wal usage at start */
/* Accumulated statistics across all completed cycles: */
double startup; /* total startup time (in seconds) */
double total; /* total time (in seconds) */
@@ -62,6 +72,7 @@ typedef struct Instrumentation
double nfiltered1; /* # of tuples removed by scanqual or joinqual */
double nfiltered2; /* # of tuples removed by "other" quals */
BufferUsage bufusage; /* total buffer usage */
+ WalUsage walusage; /* total wal usage */
} Instrumentation;
typedef struct WorkerInstrumentation
@@ -71,6 +82,7 @@ typedef struct WorkerInstrumentation
} WorkerInstrumentation;
extern PGDLLIMPORT BufferUsage pgBufferUsage;
+extern PGDLLIMPORT WalUsage pgWalUsage;
extern Instrumentation *InstrAlloc(int n, int instrument_options);
extern void InstrInit(Instrumentation *instr, int instrument_options);
@@ -79,7 +91,7 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
-extern void InstrEndParallelQuery(BufferUsage *result);
-extern void InstrAccumParallelQuery(BufferUsage *result);
+extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
+extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
#endif /* INSTRUMENT_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e216de9570..88aed4c652 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2632,6 +2632,7 @@ WalSndCtlData
WalSndSendDataCallback
WalSndState
WalTimeSample
+WalUsage
WalWriteMethod
Walfile
WindowAgg
--
2.20.1
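For reviewers who want to see the accounting scheme outside the executor, here is a minimal standalone sketch of the snapshot-and-diff technique that InstrStartNode/InstrStopNode apply to pgWalUsage. It is an illustration only, not code from the patch: demo_insert_record() and demo_measure() are hypothetical helpers, and int64_t is used where the patch uses long.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the patch's WalUsage (int64_t is an assumption) */
typedef struct WalUsage
{
	int64_t		wal_records;	/* # of WAL records produced */
	int64_t		wal_fp_records; /* # of full page WAL records produced */
	int64_t		wal_bytes;		/* size of WAL records produced */
} WalUsage;

/* Process-wide running totals, only ever incremented */
static WalUsage pgWalUsage;

/* Hypothetical stand-in for the WAL insert path bumping the counters */
static void
demo_insert_record(int64_t bytes, int is_full_page)
{
	pgWalUsage.wal_records++;
	if (is_full_page)
		pgWalUsage.wal_fp_records++;
	pgWalUsage.wal_bytes += bytes;
}

/* dst += add - sub, same shape as the helper in the patch */
static void
WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
{
	dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
	dst->wal_records += add->wal_records - sub->wal_records;
	dst->wal_fp_records += add->wal_fp_records - sub->wal_fp_records;
}

/* Hypothetical "statement": snapshot at entry, charge the delta at exit */
static WalUsage
demo_measure(void)
{
	WalUsage	start = pgWalUsage; /* snapshot at entry */
	WalUsage	used = {0, 0, 0};

	demo_insert_record(120, 0);		/* an ordinary record */
	demo_insert_record(8192, 1);	/* a full page record */

	WalUsageAccumDiff(&used, &pgWalUsage, &start);
	return used;
}
```

Because only the difference against the entry snapshot is charged, WAL written before the measured section is not attributed to it, and the global counters never need to be reset.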
v4-0002-Keep-track-of-WAL-usage-in-pg_stat_statements.patch (text/plain; charset=us-ascii)
From 1a39b8d43014a9bdb86b843bd3ee5aaffeda332e Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:42:02 +0100
Subject: [PATCH v4 2/3] Keep track of WAL usage in pg_stat_statements.
---
contrib/pg_stat_statements/Makefile | 3 +-
.../expected/pg_stat_statements.out | 148 +++++++++++++++---
.../pg_stat_statements--1.7--1.8.sql | 50 ++++++
.../pg_stat_statements/pg_stat_statements.c | 51 +++++-
.../pg_stat_statements.control | 2 +-
.../sql/pg_stat_statements.sql | 69 +++++++-
doc/src/sgml/pgstatstatements.sgml | 27 ++++
7 files changed, 324 insertions(+), 26 deletions(-)
create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 80042a0905..081f997d70 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,8 @@ OBJS = \
pg_stat_statements.o
EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.6--1.7.sql \
+DATA = pg_stat_statements--1.4.sql \
+ pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 6787ec1efd..aa2d656749 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -195,20 +195,130 @@ SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
3 | c
(8 rows)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
--------------------------------------------------------------+-------+------
- DELETE FROM test WHERE a > $1 | 1 | 1
- INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3
- INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10
- SELECT * FROM test ORDER BY a | 1 | 12
- SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4
- SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8
- SELECT pg_stat_statements_reset() | 1 | 1
- UPDATE test SET b = $1 WHERE a = $2 | 6 | 6
- UPDATE test SET b = $1 WHERE a > $2 | 1 | 3
+-- SELECT check WAL usage stats to confirm temp tables do not get stored in WAL
+SELECT query, calls, rows,
+wal_write_bytes, wal_write_records, wal_write_fp_records
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_write_bytes | wal_write_records | wal_write_fp_records
+-------------------------------------------------------------+-------+------+-----------------+-------------------+----------------------
+ DELETE FROM test WHERE a > $1 | 1 | 1 | 0 | 0 | 0
+ INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | 0 | 0 | 0
+ INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10 | 0 | 0 | 0
+ SELECT * FROM test ORDER BY a | 1 | 12 | 0 | 0 | 0
+ SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4 | 0 | 0 | 0
+ SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | 0 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0 | 0
+ UPDATE test SET b = $1 WHERE a = $2 | 6 | 6 | 0 | 0 | 0
+ UPDATE test SET b = $1 WHERE a > $2 | 1 | 3 | 0 | 0 | 0
(9 rows)
+-- CHECKPOINT before WAL tests to ensure test stability
+CHECKPOINT;
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset
+--------------------------
+
+(1 row)
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- explicit transaction
+BEGIN;
+UPDATE pgss_test SET b = '111' WHERE a = 1 ;
+COMMIT;
+BEGIN \;
+UPDATE pgss_test SET b = '222' WHERE a = 2 \;
+COMMIT ;
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
+BEGIN \;
+UPDATE pgss_test SET b = '555' WHERE a = 5 \;
+UPDATE pgss_test SET b = '666' WHERE a = 6 \;
+COMMIT ;
+-- many INSERT values
+INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+-- SELECT with constants
+SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ;
+ a | b
+---+----------------------
+ 6 | 666
+ 7 | aaa
+ 8 | bbb
+ 9 | bbb
+(4 rows)
+
+SELECT *
+ FROM pgss_test
+ WHERE a > 9
+ ORDER BY a ;
+ a | b
+---+---
+(0 rows)
+
+-- SELECT without constants
+SELECT * FROM pgss_test ORDER BY a;
+ a | b
+---+----------------------
+ 1 | a
+ 1 | 111
+ 2 | b
+ 2 | 222
+ 3 | c
+ 3 | 333
+ 4 | 444
+ 5 | 555
+ 6 | 666
+ 7 | aaa
+ 8 | bbb
+ 9 | bbb
+(12 rows)
+
+-- SELECT with IN clause
+SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5);
+ a | b
+---+----------------------
+ 1 | 111
+ 2 | 222
+ 3 | 333
+ 4 | 444
+ 5 | 555
+ 1 | a
+ 2 | b
+ 3 | c
+(8 rows)
+
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+-- SELECT usage data, check WAL usage is reported, wal_write_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_write_bytes > 0 as wal_bytes_written,
+wal_write_records > 0 as wal_records_written,
+wal_write_fp_records > 0 as wal_records_fp_written,
+wal_write_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes_written | wal_records_written | wal_records_fp_written | wal_records_as_rows
+------------------------------------------------------------------+-------+------+-------------------+---------------------+------------------------+---------------------
+ DELETE FROM pgss_test WHERE a > $1 | 1 | 1 | t | t | f | t
+ DROP TABLE pgss_test | 1 | 0 | t | t | f | f
+ INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | t | t | f | t
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | 1 | 10 | t | t | f | t
+ SELECT * FROM pgss_test ORDER BY a | 1 | 12 | f | f | f | f
+ SELECT * FROM pgss_test WHERE a > $1 ORDER BY a | 2 | 4 | f | f | f | f
+ SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | f | f | f | f
+ SELECT pg_stat_statements_reset() | 1 | 1 | f | f | f | f
+ SET pg_stat_statements.track_utility = FALSE | 1 | 0 | f | f | f | t
+ UPDATE pgss_test SET b = $1 WHERE a = $2 | 6 | 6 | t | t | f | t
+ UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 | 3 | t | t | f | t
+(11 rows)
+
--
-- pg_stat_statements.track = none
--
@@ -287,13 +397,13 @@ SELECT PLUS_ONE(10);
11
(1 row)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
------------------------------------+-------+------
- SELECT $1::TEXT | 1 | 1
- SELECT PLUS_ONE($1) | 2 | 2
- SELECT PLUS_TWO($1) | 2 | 2
- SELECT pg_stat_statements_reset() | 1 | 1
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_write_bytes | wal_write_records
+-----------------------------------+-------+------+-----------------+-------------------
+ SELECT $1::TEXT | 1 | 1 | 0 | 0
+ SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0
+ SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
(4 rows)
--
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
new file mode 100644
index 0000000000..f8b79f2277
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -0,0 +1,50 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.8'" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+ OUT userid oid,
+ OUT dbid oid,
+ OUT queryid bigint,
+ OUT query text,
+ OUT calls int8,
+ OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT mean_time float8,
+ OUT stddev_time float8,
+ OUT rows int8,
+ OUT shared_blks_hit int8,
+ OUT shared_blks_read int8,
+ OUT shared_blks_dirtied int8,
+ OUT shared_blks_written int8,
+ OUT local_blks_hit int8,
+ OUT local_blks_read int8,
+ OUT local_blks_dirtied int8,
+ OUT local_blks_written int8,
+ OUT temp_blks_read int8,
+ OUT temp_blks_written int8,
+ OUT blk_read_time float8,
+ OUT blk_write_time float8,
+ OUT wal_write_bytes int8,
+ OUT wal_write_records int8,
+ OUT wal_write_fp_records int8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4'
+LANGUAGE C STRICT VOLATILE;
+
+CREATE VIEW pg_stat_statements AS
+ SELECT * FROM pg_stat_statements(true);
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 20dc8c605b..1c256fc395 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -120,7 +120,8 @@ typedef enum pgssVersion
PGSS_V1_0 = 0,
PGSS_V1_1,
PGSS_V1_2,
- PGSS_V1_3
+ PGSS_V1_3,
+ PGSS_V1_4
} pgssVersion;
/*
@@ -161,6 +162,9 @@ typedef struct Counters
int64 local_blks_written; /* # of local disk blocks written */
int64 temp_blks_read; /* # of temp blocks read */
int64 temp_blks_written; /* # of temp blocks written */
+ int64 wal_bytes_written; /* total amount of wal bytes written */
+ int64 wal_records_written; /* # of wal records written */
+ int64 wal_fp_records_written; /* # of full page wal records written */
double blk_read_time; /* time spent reading, in msec */
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
@@ -293,6 +297,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_3);
+PG_FUNCTION_INFO_V1(pg_stat_statements_1_4);
PG_FUNCTION_INFO_V1(pg_stat_statements);
static void pgss_shmem_startup(void);
@@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate);
static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
pgssVersion api_version,
@@ -841,6 +847,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
0,
0,
NULL,
+ NULL,
&jstate);
}
@@ -944,6 +951,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
queryDesc->totaltime->total * 1000.0, /* convert to msec */
queryDesc->estate->es_processed,
&queryDesc->totaltime->bufusage,
+ &queryDesc->totaltime->walusage,
NULL);
}
@@ -989,7 +997,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
BufferUsage bufusage_start,
bufusage;
+ WalUsage walusage_start,
+ walusage;
+
bufusage_start = pgBufferUsage;
+ walusage_start = pgWalUsage;
INSTR_TIME_SET_CURRENT(start);
nested_level++;
@@ -1041,6 +1053,13 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
bufusage.blk_write_time = pgBufferUsage.blk_write_time;
INSTR_TIME_SUBTRACT(bufusage.blk_write_time, bufusage_start.blk_write_time);
+ walusage.wal_bytes =
+ pgWalUsage.wal_bytes - walusage_start.wal_bytes;
+ walusage.wal_records =
+ pgWalUsage.wal_records - walusage_start.wal_records;
+ walusage.wal_fp_records =
+ pgWalUsage.wal_fp_records - walusage_start.wal_fp_records;
+
pgss_store(queryString,
0, /* signal that it's a utility stmt */
pstmt->stmt_location,
@@ -1048,6 +1067,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
INSTR_TIME_GET_MILLISEC(duration),
rows,
&bufusage,
+ &walusage,
NULL);
}
else
@@ -1083,13 +1103,14 @@ pgss_hash_string(const char *str, int len)
*
* If jstate is not NULL then we're trying to create an entry for which
* we have no statistics as yet; we just want to record the normalized
- * query string. total_time, rows, bufusage are ignored in this case.
+ * query string. Timing, row count and usage data are ignored in this case.
*/
static void
pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate)
{
pgssHashKey key;
@@ -1281,6 +1302,9 @@ pgss_store(const char *query, uint64 queryId,
e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
+ e->counters.wal_bytes_written += walusage->wal_bytes;
+ e->counters.wal_records_written += walusage->wal_records;
+ e->counters.wal_fp_records_written += walusage->wal_fp_records;
SpinLockRelease(&e->mutex);
}
@@ -1328,7 +1352,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
#define PG_STAT_STATEMENTS_COLS_V1_1 18
#define PG_STAT_STATEMENTS_COLS_V1_2 19
#define PG_STAT_STATEMENTS_COLS_V1_3 23
-#define PG_STAT_STATEMENTS_COLS 23 /* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_4 26
+#define PG_STAT_STATEMENTS_COLS 26 /* maximum of above */
/*
* Retrieve statement statistics.
@@ -1340,6 +1365,15 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
* expected API version is identified by embedding it in the C name of the
* function. Unfortunately we weren't bright enough to do that for 1.1.
*/
+Datum pg_stat_statements_1_4(PG_FUNCTION_ARGS)
+{
+ bool showtext = PG_GETARG_BOOL(0);
+
+ pg_stat_statements_internal(fcinfo, PGSS_V1_4, showtext);
+
+ return (Datum) 0;
+}
+
Datum
pg_stat_statements_1_3(PG_FUNCTION_ARGS)
{
@@ -1445,6 +1479,10 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
if (api_version != PGSS_V1_3)
elog(ERROR, "incorrect number of output arguments");
break;
+ case PG_STAT_STATEMENTS_COLS_V1_4:
+ if (api_version != PGSS_V1_4)
+ elog(ERROR, "incorrect number of output arguments");
+ break;
default:
elog(ERROR, "incorrect number of output arguments");
}
@@ -1641,11 +1679,18 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
values[i++] = Float8GetDatumFast(tmp.blk_read_time);
values[i++] = Float8GetDatumFast(tmp.blk_write_time);
}
+ if (api_version >= PGSS_V1_4)
+ {
+ values[i++] = Int64GetDatumFast(tmp.wal_bytes_written);
+ values[i++] = Int64GetDatumFast(tmp.wal_records_written);
+ values[i++] = Int64GetDatumFast(tmp.wal_fp_records_written);
+ }
Assert(i == (api_version == PGSS_V1_0 ? PG_STAT_STATEMENTS_COLS_V1_0 :
api_version == PGSS_V1_1 ? PG_STAT_STATEMENTS_COLS_V1_1 :
api_version == PGSS_V1_2 ? PG_STAT_STATEMENTS_COLS_V1_2 :
api_version == PGSS_V1_3 ? PG_STAT_STATEMENTS_COLS_V1_3 :
+ api_version == PGSS_V1_4 ? PG_STAT_STATEMENTS_COLS_V1_4 :
-1 /* fail if you forget to update this assert */ ));
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 14cb422354..7fb20df886 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
# pg_stat_statements extension
comment = 'track execution statistics of all SQL statements executed'
-default_version = '1.7'
+default_version = '1.8'
module_pathname = '$libdir/pg_stat_statements'
relocatable = true
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 8b527070d4..9e83622b2a 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -99,7 +99,72 @@ SELECT * FROM test ORDER BY a;
-- SELECT with IN clause
SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- SELECT check WAL usage stats to confirm temp tables do not get stored in WAL
+SELECT query, calls, rows,
+wal_write_bytes, wal_write_records, wal_write_fp_records
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- CHECKPOINT before WAL tests to ensure test stability
+CHECKPOINT;
+
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+
+-- explicit transaction
+BEGIN;
+UPDATE pgss_test SET b = '111' WHERE a = 1 ;
+COMMIT;
+
+BEGIN \;
+UPDATE pgss_test SET b = '222' WHERE a = 2 \;
+COMMIT ;
+
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
+
+BEGIN \;
+UPDATE pgss_test SET b = '555' WHERE a = 5 \;
+UPDATE pgss_test SET b = '666' WHERE a = 6 \;
+COMMIT ;
+
+-- many INSERT values
+INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+
+-- SELECT with constants
+SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ;
+
+SELECT *
+ FROM pgss_test
+ WHERE a > 9
+ ORDER BY a ;
+
+-- SELECT without constants
+SELECT * FROM pgss_test ORDER BY a;
+
+-- SELECT with IN clause
+SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5);
+
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+
+-- SELECT usage data, check WAL usage is reported, wal_write_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_write_bytes > 0 as wal_bytes_written,
+wal_write_records > 0 as wal_records_written,
+wal_write_fp_records > 0 as wal_records_fp_written,
+wal_write_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- pg_stat_statements.track = none
@@ -144,7 +209,7 @@ $$ SELECT (i + 1.0)::INTEGER LIMIT 1 $$ LANGUAGE SQL;
SELECT PLUS_ONE(8);
SELECT PLUS_ONE(10);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- pg_stat_statements.track = all
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 26bb82da4a..40e79f1866 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -221,6 +221,33 @@
</entry>
</row>
+ <row>
+ <entry><structfield>wal_write_bytes</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total number of bytes of WAL generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_write_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total number of WAL records generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_write_fp_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total number of WAL full page images generated by the statement
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
--
2.20.1
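The way the new columns are gated on the API version can be hard to follow from the hunks alone, so here is a cut-down sketch of the same idea. All Demo* names are hypothetical; the real function deals with Datums and more versions, and the zero placeholders below stand in for the 23 pre-existing columns.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced set of API versions; the patch's enum has more */
typedef enum DemoVersion
{
	DEMO_V1_3,
	DEMO_V1_4
} DemoVersion;

#define DEMO_COLS_V1_3	23
#define DEMO_COLS_V1_4	26
#define DEMO_COLS		26		/* maximum of above */

typedef struct DemoCounters
{
	int64_t		wal_bytes_written;
	int64_t		wal_records_written;
	int64_t		wal_fp_records_written;
} DemoCounters;

/* Fill one output row; returns the number of columns written */
static int
demo_fill_row(DemoVersion version, const DemoCounters *c,
			  int64_t values[DEMO_COLS])
{
	int			i = 0;

	/* columns common to all versions (placeholders in this sketch) */
	while (i < DEMO_COLS_V1_3)
		values[i++] = 0;

	/* WAL columns are appended only for callers that declared them */
	if (version >= DEMO_V1_4)
	{
		values[i++] = c->wal_bytes_written;
		values[i++] = c->wal_records_written;
		values[i++] = c->wal_fp_records_written;
	}

	/* cross-check, in the spirit of the Assert at the end of the loop body */
	assert(i == (version == DEMO_V1_3 ? DEMO_COLS_V1_3 : DEMO_COLS_V1_4));
	return i;
}
```

An older SQL definition of the function keeps getting its 23 columns, while the 1.8 definition gets 26, so the shared library stays usable before ALTER EXTENSION ... UPDATE has been run.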
v4-0003-Keep-track-of-auto-vacuum-WAL-usage-in-pg_stat_da.patch (text/plain; charset=us-ascii)
From bf20522687d2ce9a8e87da9c4a35d4a62bb654a7 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 4 Mar 2020 20:09:22 +0100
Subject: [PATCH v4 3/3] Keep track of (auto)vacuum WAL usage in
pg_stat_database.
---
src/backend/catalog/system_views.sql | 6 ++
src/backend/commands/vacuum.c | 11 ++++
src/backend/postmaster/pgstat.c | 56 ++++++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 96 ++++++++++++++++++++++++++++
src/include/catalog/pg_proc.dat | 24 +++++++
src/include/pgstat.h | 27 +++++++-
src/test/regress/expected/rules.out | 6 ++
7 files changed, 225 insertions(+), 1 deletion(-)
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b8a3f46912..a3d1ac2523 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -892,6 +892,12 @@ CREATE VIEW pg_stat_database AS
pg_stat_get_db_checksum_last_failure(D.oid) AS checksum_last_failure,
pg_stat_get_db_blk_read_time(D.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(D.oid) AS blk_write_time,
+ pg_stat_get_db_vac_wal_records(D.oid) AS vac_wal_records,
+ pg_stat_get_db_vac_wal_fp_records(D.oid) AS vac_wal_fp_records,
+ pg_stat_get_db_vac_wal_bytes(D.oid) AS vac_wal_bytes,
+ pg_stat_get_db_autovac_wal_records(D.oid) AS autovac_wal_records,
+ pg_stat_get_db_autovac_wal_fp_records(D.oid) AS autovac_wal_fp_records,
+ pg_stat_get_db_autovac_wal_bytes(D.oid) AS autovac_wal_bytes,
pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
FROM (
SELECT 0 AS oid, NULL::name AS datname
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index d625d17bf4..14a235a2ed 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -275,6 +275,8 @@ vacuum(List *relations, VacuumParams *params,
BufferAccessStrategy bstrategy, bool isTopLevel)
{
static bool in_vacuum = false;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
const char *stmttype;
volatile bool in_outer_xact,
@@ -489,6 +491,15 @@ vacuum(List *relations, VacuumParams *params,
{
in_vacuum = false;
VacuumCostActive = false;
+ walusage.wal_records = pgWalUsage.wal_records -
+ walusage_start.wal_records;
+ walusage.wal_fp_records = pgWalUsage.wal_fp_records -
+ walusage_start.wal_fp_records;
+ walusage.wal_bytes = pgWalUsage.wal_bytes - walusage_start.wal_bytes;
+ pgstat_report_vac_wal_usage(walusage.wal_records,
+ walusage.wal_fp_records,
+ walusage.wal_bytes,
+ IsAutoVacuumWorkerProcess());
}
PG_END_TRY();
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f9287b7942..c9d65669ce 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -330,6 +330,7 @@ static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
static void pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len);
static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
+static void pgstat_recv_vac_walusage(PgStat_MsgVacWalUsage *msg, int len);
/* ------------------------------------------------------------
* Public functions called from postmaster follow
@@ -1572,6 +1573,30 @@ pgstat_report_tempfile(size_t filesize)
pgstat_send(&msg, sizeof(msg));
}
+/* --------
+ * pgstat_report_vac_wal_usage() -
+ *
+ * Tell the collector about (auto)vacuum WAL usage.
+ * --------
+ */
+void
+pgstat_report_vac_wal_usage(long wal_records, long wal_fp_records,
+ long wal_bytes, bool autovacuum)
+{
+ PgStat_MsgVacWalUsage msg;
+
+ if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+ return;
+
+ pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_VACWALUSAGE);
+ msg.m_databaseid = MyDatabaseId;
+ msg.m_wal_records = wal_records;
+ msg.m_wal_fp_records = wal_fp_records;
+ msg.m_wal_bytes = wal_bytes;
+ msg.m_autovacuum = autovacuum;
+ pgstat_send(&msg, sizeof(msg));
+}
+
/* ----------
* pgstat_ping() -
@@ -4525,6 +4550,10 @@ PgstatCollectorMain(int argc, char *argv[])
pgstat_recv_tempfile(&msg.msg_tempfile, len);
break;
+ case PGSTAT_MTYPE_VACWALUSAGE:
+ pgstat_recv_vac_walusage(&msg.msg_vac_walusage, len);
+ break;
+
case PGSTAT_MTYPE_CHECKSUMFAILURE:
pgstat_recv_checksum_failure(&msg.msg_checksumfailure,
len);
@@ -6282,6 +6311,33 @@ pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len)
dbentry->n_temp_files += 1;
}
+/* ----------
+ * pgstat_recv_vac_walusage() -
+ *
+ * Process a VACWALUSAGE message.
+ * ----------
+ */
+static void
+pgstat_recv_vac_walusage(PgStat_MsgVacWalUsage *msg, int len)
+{
+ PgStat_StatDBEntry *dbentry;
+
+ dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
+
+ if (msg->m_autovacuum)
+ {
+ dbentry->n_autovac_wal_records += msg->m_wal_records;
+ dbentry->n_autovac_wal_fp_records += msg->m_wal_fp_records;
+ dbentry->autovac_wal_bytes += msg->m_wal_bytes;
+ }
+ else
+ {
+ dbentry->n_vac_wal_records += msg->m_wal_records;
+ dbentry->n_vac_wal_fp_records += msg->m_wal_fp_records;
+ dbentry->vac_wal_bytes += msg->m_wal_bytes;
+ }
+}
+
/* ----------
* pgstat_recv_funcstat() -
*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index cea01534a5..1bc7291c2b 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1606,6 +1606,102 @@ pg_stat_get_db_blk_write_time(PG_FUNCTION_ARGS)
PG_RETURN_FLOAT8(result);
}
+Datum
+pg_stat_get_db_vac_wal_records(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatDBEntry *dbentry;
+
+ /* report zero when the database has no stats entry */
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (dbentry->n_vac_wal_records);
+
+ PG_RETURN_INT64(result);
+}
+
+Datum
+pg_stat_get_db_vac_wal_fp_records(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatDBEntry *dbentry;
+
+ /* report zero when the database has no stats entry */
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (dbentry->n_vac_wal_fp_records);
+
+ PG_RETURN_INT64(result);
+}
+
+Datum
+pg_stat_get_db_vac_wal_bytes(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatDBEntry *dbentry;
+
+ /* report zero when the database has no stats entry */
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (dbentry->vac_wal_bytes);
+
+ PG_RETURN_INT64(result);
+}
+
+Datum
+pg_stat_get_db_autovac_wal_records(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatDBEntry *dbentry;
+
+ /* report zero when no stats entry exists for this database */
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (dbentry->n_autovac_wal_records);
+
+ PG_RETURN_INT64(result);
+}
+
+Datum
+pg_stat_get_db_autovac_wal_fp_records(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatDBEntry *dbentry;
+
+ /* report zero when no stats entry exists for this database */
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (dbentry->n_autovac_wal_fp_records);
+
+ PG_RETURN_INT64(result);
+}
+
+Datum
+pg_stat_get_db_autovac_wal_bytes(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatDBEntry *dbentry;
+
+ /* report zero when no stats entry exists for this database */
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (dbentry->autovac_wal_bytes);
+
+ PG_RETURN_INT64(result);
+}
+
Datum
pg_stat_get_bgwriter_timed_checkpoints(PG_FUNCTION_ARGS)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7fb574f9dc..8b1c1487ca 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5372,6 +5372,30 @@
proname => 'pg_stat_get_db_blk_write_time', provolatile => 's',
proparallel => 'r', prorettype => 'float8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_blk_write_time' },
+{ oid => '8176', descr => 'statistics: number of vacuum wal records',
+ proname => 'pg_stat_get_db_vac_wal_records', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_vac_wal_records' },
+{ oid => '8177', descr => 'statistics: number of vacuum wal full page records',
+ proname => 'pg_stat_get_db_vac_wal_fp_records', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_vac_wal_fp_records' },
+{ oid => '8178', descr => 'statistics: number of vacuum wal bytes',
+ proname => 'pg_stat_get_db_vac_wal_bytes', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_vac_wal_bytes' },
+{ oid => '8179', descr => 'statistics: number of autovacuum wal records',
+ proname => 'pg_stat_get_db_autovac_wal_records', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_autovac_wal_records' },
+{ oid => '8180', descr => 'statistics: number of autovacuum wal full page records',
+ proname => 'pg_stat_get_db_autovac_wal_fp_records', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_autovac_wal_fp_records' },
+{ oid => '8181', descr => 'statistics: number of autovacuum wal bytes',
+ proname => 'pg_stat_get_db_autovac_wal_bytes', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_autovac_wal_bytes' },
{ oid => '3195', descr => 'statistics: information about WAL archiver',
proname => 'pg_stat_get_archiver', proisstrict => 'f', provolatile => 's',
proparallel => 'r', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 1a19921f80..40fb97bdc9 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -65,7 +65,8 @@ typedef enum StatMsgType
PGSTAT_MTYPE_RECOVERYCONFLICT,
PGSTAT_MTYPE_TEMPFILE,
PGSTAT_MTYPE_DEADLOCK,
- PGSTAT_MTYPE_CHECKSUMFAILURE
+ PGSTAT_MTYPE_CHECKSUMFAILURE,
+ PGSTAT_MTYPE_VACWALUSAGE
} StatMsgType;
/* ----------
@@ -544,6 +545,21 @@ typedef struct PgStat_MsgChecksumFailure
TimestampTz m_failure_time;
} PgStat_MsgChecksumFailure;
+/* ----------
+ * PgStat_MsgVacWalUsage Sent by the backend to tell the collector
+ * about (auto)vacuum WAL usage.
+ * ----------
+ */
+typedef struct PgStat_MsgVacWalUsage
+{
+ PgStat_MsgHdr m_hdr;
+ Oid m_databaseid;
+ long m_wal_records;
+ long m_wal_fp_records;
+ long m_wal_bytes;
+ bool m_autovacuum;
+} PgStat_MsgVacWalUsage;
+
/* ----------
* PgStat_Msg Union over all possible messages.
@@ -571,6 +587,7 @@ typedef union PgStat_Msg
PgStat_MsgDeadlock msg_deadlock;
PgStat_MsgTempFile msg_tempfile;
PgStat_MsgChecksumFailure msg_checksumfailure;
+ PgStat_MsgVacWalUsage msg_vac_walusage;
} PgStat_Msg;
@@ -613,6 +630,12 @@ typedef struct PgStat_StatDBEntry
TimestampTz last_checksum_failure;
PgStat_Counter n_block_read_time; /* times in microseconds */
PgStat_Counter n_block_write_time;
+ PgStat_Counter n_vac_wal_records;
+ PgStat_Counter n_vac_wal_fp_records;
+ PgStat_Counter vac_wal_bytes;
+ PgStat_Counter n_autovac_wal_records;
+ PgStat_Counter n_autovac_wal_fp_records;
+ PgStat_Counter autovac_wal_bytes;
TimestampTz stat_reset_timestamp;
TimestampTz stats_timestamp; /* time of db stats file update */
@@ -1261,6 +1284,8 @@ extern void pgstat_bestart(void);
extern void pgstat_report_activity(BackendState state, const char *cmd_str);
extern void pgstat_report_tempfile(size_t filesize);
+extern void pgstat_report_vac_wal_usage(long wal_records, long wal_fp_record,
+ long wal_bytes, bool autovacuum);
extern void pgstat_report_appname(const char *appname);
extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
extern const char *pgstat_get_wait_event(uint32 wait_event_info);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index c7304611c3..aa311cd594 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1833,6 +1833,12 @@ pg_stat_database| SELECT d.oid AS datid,
pg_stat_get_db_checksum_last_failure(d.oid) AS checksum_last_failure,
pg_stat_get_db_blk_read_time(d.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(d.oid) AS blk_write_time,
+ pg_stat_get_db_vac_wal_records(d.oid) AS vac_wal_records,
+ pg_stat_get_db_vac_wal_fp_records(d.oid) AS vac_wal_fp_records,
+ pg_stat_get_db_vac_wal_bytes(d.oid) AS vac_wal_bytes,
+ pg_stat_get_db_autovac_wal_records(d.oid) AS autovac_wal_records,
+ pg_stat_get_db_autovac_wal_fp_records(d.oid) AS autovac_wal_fp_records,
+ pg_stat_get_db_autovac_wal_bytes(d.oid) AS autovac_wal_bytes,
pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset
FROM ( SELECT 0 AS oid,
NULL::name AS datname
--
2.20.1
Please feel free to work on any extension of this patch idea. I lack
both time and knowledge to do it all by myself.
I'm adding a 3rd patch on top of yours to expose the new WAL counters in
pg_stat_database, for vacuum and autovacuum. I'm not really enthusiastic about
this approach but I didn't find a better one, and maybe this will raise some better
ideas. The only sure thing is that we're not going to add a bunch of new
fields in pg_stat_all_tables anyway.
We can also drop this 3rd patch entirely if no one's happy about it, without
impacting the first two.
No objections about 3rd on my side, unless we miss the CF completely.
As for the code, I believe:
+ walusage.wal_records = pgWalUsage.wal_records -
+ walusage_start.wal_records;
+ walusage.wal_fp_records = pgWalUsage.wal_fp_records -
+ walusage_start.wal_fp_records;
+ walusage.wal_bytes = pgWalUsage.wal_bytes - walusage_start.wal_bytes;
Could be done much simpler via the utility:
WalUsageAccumDiff(walusage, pgWalUsage, walusage_start);
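For reference, the snapshot-and-diff pattern being discussed can be sketched as a standalone program fragment. The struct and WalUsageAccumDiff below mirror the definitions in the attached patch; wal_usage_since is a hypothetical wrapper added only for illustration, and note that the patch version takes pointers rather than the by-value shorthand written above.

```c
#include <assert.h>
#include <stddef.h>

/* Same three counters as the WalUsage struct in the patch. */
typedef struct WalUsage
{
	long		wal_records;	/* # of WAL records produced */
	long		wal_fp_records; /* # of full page WAL records produced */
	long		wal_bytes;		/* size of WAL records produced */
} WalUsage;

/* dst += add - sub, following the static helper in instrument.c */
static void
WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
{
	assert(dst != NULL && add != NULL && sub != NULL);
	dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
	dst->wal_records += add->wal_records - sub->wal_records;
	dst->wal_fp_records += add->wal_fp_records - sub->wal_fp_records;
}

/*
 * Hypothetical wrapper (not part of the patch): return the WAL usage
 * accumulated between a saved snapshot and the current counters.
 */
static WalUsage
wal_usage_since(const WalUsage *current, const WalUsage *start)
{
	WalUsage	delta = {0, 0, 0};

	WalUsageAccumDiff(&delta, current, start);
	return delta;
}
```

A caller would copy the global counters before running a command and pass that copy as start afterwards. Because the helper accumulates into dst, the destination has to be zeroed first when a plain difference is wanted.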
On a side note, I agree API to the buf/wal usage is far from perfect.
Test had been reworked, and I believe it should be stable now, the
part which checks WAL is written and there is a correlation between
affected rows and WAL records. I still have no idea how to test
full-page writes against regular updates, it seems very unstable.
Please share ideas if any.
I just reviewed the patches, and overall they look good to me. The way to
detect full page images looks sensible, but I'm really not familiar with that
code so additional review would be useful.
I noticed that the new wal_write_fp_records field in pg_stat_statements wasn't
used in the test. Since I have to add all the patches to make the cfbot happy,
I slightly adapted the tests to reference the fp column too. There was also a
minor issue in the documentation, as wal_records and wal_bytes were copy/pasted
twice while wal_write_fp_records wasn't documented, so I also changed it.
Let me know if you're ok with those changes.
Sorry for not getting wal_fp_usage into the docs, my fault.
As for the tests, please get somebody else to review this. I strongly
believe checking full page writes here could be a source of
instability.
On Tue, Mar 17, 2020 at 10:27:05PM +0300, Kirill Bychik wrote:
Please feel free to work on any extension of this patch idea. I lack
both time and knowledge to do it all by myself.
I'm adding a 3rd patch on top of yours to expose the new WAL counters in
pg_stat_database, for vacuum and autovacuum. I'm not really enthusiastic about
this approach but I didn't find a better one, and maybe this will raise some better
ideas. The only sure thing is that we're not going to add a bunch of new
fields in pg_stat_all_tables anyway.
We can also drop this 3rd patch entirely if no one's happy about it, without
impacting the first two.
No objections about 3rd on my side, unless we miss the CF completely.
As for the code, I believe:
+ walusage.wal_records = pgWalUsage.wal_records -
+ walusage_start.wal_records;
+ walusage.wal_fp_records = pgWalUsage.wal_fp_records -
+ walusage_start.wal_fp_records;
+ walusage.wal_bytes = pgWalUsage.wal_bytes - walusage_start.wal_bytes;
Could be done much simpler via the utility:
WalUsageAccumDiff(walusage, pgWalUsage, walusage_start);
Indeed, but this function is private to instrument.c. AFAICT
pg_stat_statements is already duplicating similar code for buffers rather than
having BufferUsageAccumDiff being exported, so I chose the same approach.
I'd be in favor of exporting both functions though.
On a side note, I agree API to the buf/wal usage is far from perfect.
Yes clearly.
Test had been reworked, and I believe it should be stable now, the
part which checks WAL is written and there is a correlation between
affected rows and WAL records. I still have no idea how to test
full-page writes against regular updates, it seems very unstable.
Please share ideas if any.
I just reviewed the patches, and overall they look good to me. The way to
detect full page images looks sensible, but I'm really not familiar with that
code so additional review would be useful.
I noticed that the new wal_write_fp_records field in pg_stat_statements wasn't
used in the test. Since I have to add all the patches to make the cfbot happy,
I slightly adapted the tests to reference the fp column too. There was also a
minor issue in the documentation, as wal_records and wal_bytes were copy/pasted
twice while wal_write_fp_records wasn't documented, so I also changed it.
Let me know if you're ok with those changes.
Sorry for not getting wal_fp_usage into the docs, my fault.
As for the tests, please get somebody else to review this. I strongly
believe checking full page writes here could be a source of
instability.
I'm also a little bit dubious about it. The initial checkpoint should make
things stable (of course unless full_page_writes is disabled), and Cfbot also
seems happy about it. At least keeping it for the temporary tables test
shouldn't be a problem.
Please feel free to work on any extension of this patch idea. I lack
both time and knowledge to do it all by myself.
I'm adding a 3rd patch on top of yours to expose the new WAL counters in
pg_stat_database, for vacuum and autovacuum. I'm not really enthusiastic about
this approach but I didn't find a better one, and maybe this will raise some better
ideas. The only sure thing is that we're not going to add a bunch of new
fields in pg_stat_all_tables anyway.
We can also drop this 3rd patch entirely if no one's happy about it, without
impacting the first two.
No objections about 3rd on my side, unless we miss the CF completely.
As for the code, I believe:
+ walusage.wal_records = pgWalUsage.wal_records -
+ walusage_start.wal_records;
+ walusage.wal_fp_records = pgWalUsage.wal_fp_records -
+ walusage_start.wal_fp_records;
+ walusage.wal_bytes = pgWalUsage.wal_bytes - walusage_start.wal_bytes;
Could be done much simpler via the utility:
WalUsageAccumDiff(walusage, pgWalUsage, walusage_start);
Indeed, but this function is private to instrument.c. AFAICT
pg_stat_statements is already duplicating similar code for buffers rather than
having BufferUsageAccumDiff being exported, so I chose the same approach.
I'd be in favor of exporting both functions though.
On a side note, I agree API to the buf/wal usage is far from perfect.
Yes clearly.
There is a higher-level Instrumentation API that can be used with
INSTRUMENT_WAL flag to collect the wal usage information. I believe
the instrumentation is widely used in the executor code, so it should
not be a problem to collect instrumentation information on autovacuum
worker level.
Just a recommendation/chat, though. I am happy with the way the data
is collected now. If you commit this variant, please add a TODO to
rework wal usage to common instr API.
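As a rough sketch of what flag-driven collection looks like, the fragment below imitates the start/stop logic the patch adds to InstrStartNode and InstrStopNode. It is a simplified stand-in, not the real Instrumentation API: ToyInstrumentation, toyWalUsage, TOY_INSTRUMENT_WAL and the toy_* functions are hypothetical names, and only the WAL part is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct WalUsage
{
	long		wal_records;
	long		wal_fp_records;
	long		wal_bytes;
} WalUsage;

/* Hypothetical flag bit, playing the role of INSTRUMENT_WAL. */
enum
{
	TOY_INSTRUMENT_WAL = 1 << 3
};

typedef struct ToyInstrumentation
{
	bool		need_walusage;	/* true if WAL usage was requested */
	WalUsage	walusage_start; /* counters at start of the measured span */
	WalUsage	walusage;		/* accumulated WAL usage */
} ToyInstrumentation;

/* Stand-in for the backend-global pgWalUsage counters. */
static WalUsage toyWalUsage;

static void
toy_instr_init(ToyInstrumentation *instr, int options)
{
	memset(instr, 0, sizeof(*instr));
	instr->need_walusage = (options & TOY_INSTRUMENT_WAL) != 0;
}

/* Snapshot the global counters when the measured span begins. */
static void
toy_instr_start(ToyInstrumentation *instr)
{
	if (instr->need_walusage)
		instr->walusage_start = toyWalUsage;
}

/* Add the delta since the snapshot to the accumulated totals. */
static void
toy_instr_stop(ToyInstrumentation *instr)
{
	if (!instr->need_walusage)
		return;
	instr->walusage.wal_records +=
		toyWalUsage.wal_records - instr->walusage_start.wal_records;
	instr->walusage.wal_fp_records +=
		toyWalUsage.wal_fp_records - instr->walusage_start.wal_fp_records;
	instr->walusage.wal_bytes +=
		toyWalUsage.wal_bytes - instr->walusage_start.wal_bytes;
}

/* Simulate inserting one WAL record of the given length. */
static void
toy_insert_record(long len)
{
	toyWalUsage.wal_records++;
	toyWalUsage.wal_bytes += len;
}
```

Only records inserted between start and stop are attributed to the instrumented span. Whether wrapping a whole (auto)vacuum run this way is appropriate is exactly the open question in this exchange.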
Test had been reworked, and I believe it should be stable now, the
part which checks WAL is written and there is a correlation between
affected rows and WAL records. I still have no idea how to test
full-page writes against regular updates, it seems very unstable.
Please share ideas if any.
I just reviewed the patches, and overall they look good to me. The way to
detect full page images looks sensible, but I'm really not familiar with that
code so additional review would be useful.
I noticed that the new wal_write_fp_records field in pg_stat_statements wasn't
used in the test. Since I have to add all the patches to make the cfbot happy,
I slightly adapted the tests to reference the fp column too. There was also a
minor issue in the documentation, as wal_records and wal_bytes were copy/pasted
twice while wal_write_fp_records wasn't documented, so I also changed it.
Let me know if you're ok with those changes.
Sorry for not getting wal_fp_usage into the docs, my fault.
As for the tests, please get somebody else to review this. I strongly
believe checking full page writes here could be a source of
instability.
I'm also a little bit dubious about it. The initial checkpoint should make
things stable (of course unless full_page_writes is disabled), and Cfbot also
seems happy about it. At least keeping it for the temporary tables test
shouldn't be a problem.
Temp tables should show zero FPI WAL records, true :)
I have no objections to the patch.
On Wed, Mar 18, 2020 at 09:02:58AM +0300, Kirill Bychik wrote:
There is a higher-level Instrumentation API that can be used with
INSTRUMENT_WAL flag to collect the wal usage information. I believe
the instrumentation is widely used in the executor code, so it should
not be a problem to collect instrumentation information on autovacuum
worker level.
Just a recommendation/chat, though. I am happy with the way the data
is collected now. If you commit this variant, please add a TODO to
rework wal usage to common instr API.
The instrumentation is somewhat intended to be used with executor nodes, not
backend commands. I don't see real technical reason that would prevent that,
but I prefer to keep things as-is for now, as it sounds less controversial.
This is for the 3rd patch, which may not even be considered for this CF anyway.
As for the tests, please get somebody else to review this. I strongly
believe checking full page writes here could be a source of
instability.
I'm also a little bit dubious about it. The initial checkpoint should make
things stable (of course unless full_page_writes is disabled), and Cfbot also
seems happy about it. At least keeping it for the temporary tables test
shouldn't be a problem.
Temp tables should show zero FPI WAL records, true :)
I have no objections to the patch.
I'm attaching a v5 with fp records only for temp tables, so there's no risk of
instability. As I previously said I'm fine with your two patches, so unless
you have objections on the fpi test for temp tables or the documentation
changes, I believe those should be ready for committer.
Attachments:
v5-0001-Track-WAL-usage.patch (text/plain; charset=us-ascii)
From a41a58c51e15c31524ea28be8e31bccbf8d5b343 Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:41:50 +0100
Subject: [PATCH v5 1/3] Track WAL usage.
---
src/backend/access/transam/xlog.c | 8 ++++
src/backend/access/transam/xloginsert.c | 6 +++
src/backend/executor/execParallel.c | 22 ++++++++++-
src/backend/executor/instrument.c | 51 ++++++++++++++++++++++---
src/include/executor/execParallel.h | 1 +
src/include/executor/instrument.h | 16 +++++++-
src/tools/pgindent/typedefs.list | 1 +
7 files changed, 95 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index de2d4ee582..7cab00323d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
#include "commands/progress.h"
#include "commands/tablespace.h"
#include "common/controldata_utils.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "pgstat.h"
@@ -1231,6 +1232,13 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
+ /* Provide WAL update data to the instrumentation */
+ if (inserted)
+ {
+ pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+ pgWalUsage.wal_records++;
+ }
+
return EndPos;
}
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 2fa0a7f667..1f71cc0a76 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -25,6 +25,7 @@
#include "access/xloginsert.h"
#include "catalog/pg_control.h"
#include "common/pg_lzcompress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "replication/origin.h"
@@ -635,6 +636,11 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
*/
bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
+ /*
+ * Report a full page image constructed for the WAL record
+ */
+ pgWalUsage.wal_fp_records++;
+
/*
* Construct XLogRecData entries for the page content.
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..017367878f 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -62,6 +62,7 @@
#define PARALLEL_KEY_DSA UINT64CONST(0xE000000000000007)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -573,6 +574,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
char *pstmt_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
+ WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
@@ -646,6 +648,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Same thing for WalUsage.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for tuple queues. */
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers));
@@ -728,6 +737,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufusage_space);
pei->buffer_usage = bufusage_space;
+ /* Same for WalUsage. */
+ walusage_space = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
+ pei->wal_usage = walusage_space;
+
/* Set up the tuple queues that the workers will write into. */
pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
@@ -1113,7 +1128,7 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
* finish, or we might get incomplete data.)
*/
for (i = 0; i < nworkers; i++)
- InstrAccumParallelQuery(&pei->buffer_usage[i]);
+ InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]);
pei->finished = true;
}
@@ -1333,6 +1348,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
{
FixedParallelExecutorState *fpes;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
DestReceiver *receiver;
QueryDesc *queryDesc;
SharedExecutorInstrumentation *instrumentation;
@@ -1408,7 +1424,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Report buffer usage during parallel execution. */
buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Report instrumentation data if any instrumentation options are set. */
if (instrumentation != NULL)
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index bc1d42bf64..4bcb06f6e1 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -24,6 +24,11 @@ static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void BufferUsageAccumDiff(BufferUsage *dst,
const BufferUsage *add, const BufferUsage *sub);
+WalUsage pgWalUsage;
+static WalUsage save_pgWalUsage;
+
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub);
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -33,15 +38,17 @@ InstrAlloc(int n, int instrument_options)
/* initialize all fields to zeroes, then modify as needed */
instr = palloc0(n * sizeof(Instrumentation));
- if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER))
+ if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER | INSTRUMENT_WAL))
{
bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
bool need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
int i;
for (i = 0; i < n; i++)
{
instr[i].need_bufusage = need_buffers;
+ instr[i].need_walusage = need_wal;
instr[i].need_timer = need_timer;
}
}
@@ -55,6 +62,7 @@ InstrInit(Instrumentation *instr, int instrument_options)
{
memset(instr, 0, sizeof(Instrumentation));
instr->need_bufusage = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ instr->need_walusage = (instrument_options & INSTRUMENT_WAL) != 0;
instr->need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
}
@@ -69,6 +77,9 @@ InstrStartNode(Instrumentation *instr)
/* save buffer usage totals at node entry, if needed */
if (instr->need_bufusage)
instr->bufusage_start = pgBufferUsage;
+
+ if (instr->need_walusage)
+ instr->walusage_start = pgWalUsage;
}
/* Exit from a plan node */
@@ -97,6 +108,10 @@ InstrStopNode(Instrumentation *instr, double nTuples)
BufferUsageAccumDiff(&instr->bufusage,
&pgBufferUsage, &instr->bufusage_start);
+ if (instr->need_walusage)
+ WalUsageAccumDiff(&instr->walusage,
+ &pgWalUsage, &instr->walusage_start);
+
/* Is this the first tuple of this cycle? */
if (!instr->running)
{
@@ -160,6 +175,9 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
/* Add delta of buffer usage since entry to node's totals */
if (dst->need_bufusage)
BufferUsageAdd(&dst->bufusage, &add->bufusage);
+
+ if (dst->need_walusage)
+ WalUsageAdd(&dst->walusage, &add->walusage);
}
/* note current values during parallel executor startup */
@@ -167,21 +185,25 @@ void
InstrStartParallelQuery(void)
{
save_pgBufferUsage = pgBufferUsage;
+ save_pgWalUsage = pgWalUsage;
}
/* report usage after parallel executor shutdown */
void
-InstrEndParallelQuery(BufferUsage *result)
+InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- memset(result, 0, sizeof(BufferUsage));
- BufferUsageAccumDiff(result, &pgBufferUsage, &save_pgBufferUsage);
+ memset(bufusage, 0, sizeof(BufferUsage));
+ memset(walusage, 0, sizeof(WalUsage));
+ BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
+ WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
}
/* accumulate work done by workers in leader's stats */
void
-InstrAccumParallelQuery(BufferUsage *result)
+InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- BufferUsageAdd(&pgBufferUsage, result);
+ BufferUsageAdd(&pgBufferUsage, bufusage);
+ WalUsageAdd(&pgWalUsage, walusage);
}
/* dst += add */
@@ -223,3 +245,20 @@ BufferUsageAccumDiff(BufferUsage *dst,
INSTR_TIME_ACCUM_DIFF(dst->blk_write_time,
add->blk_write_time, sub->blk_write_time);
}
+
+/* helper functions for WAL usage accumulation */
+static void
+WalUsageAdd(WalUsage *dst, WalUsage *add)
+{
+ dst->wal_bytes += add->wal_bytes;
+ dst->wal_records += add->wal_records;
+ dst->wal_fp_records += add->wal_fp_records;
+}
+
+static void
+WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
+{
+ dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
+ dst->wal_records += add->wal_records - sub->wal_records;
+ dst->wal_fp_records += add->wal_fp_records - sub->wal_fp_records;
+}
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf020..1cc5b524fd 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -26,6 +26,7 @@ typedef struct ParallelExecutorInfo
PlanState *planstate; /* plan subtree we're running in parallel */
ParallelContext *pcxt; /* parallel context we're using */
BufferUsage *buffer_usage; /* points to bufusage area in DSM */
+ WalUsage *wal_usage; /* walusage area in DSM */
SharedExecutorInstrumentation *instrumentation; /* optional */
struct SharedJitInstrumentation *jit_instrumentation; /* optional */
dsa_area *area; /* points to DSA area in DSM */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index f48d46aede..f79fac8f8c 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -32,12 +32,20 @@ typedef struct BufferUsage
instr_time blk_write_time; /* time spent writing */
} BufferUsage;
+typedef struct WalUsage
+{
+ long wal_records; /* # of wal records produced */
+ long wal_fp_records; /* # of full page wal records produced */
+ long wal_bytes; /* size of wal records produced */
+} WalUsage;
+
/* Flag bits included in InstrAlloc's instrument_options bitmask */
typedef enum InstrumentOption
{
INSTRUMENT_TIMER = 1 << 0, /* needs timer (and row counts) */
INSTRUMENT_BUFFERS = 1 << 1, /* needs buffer usage */
INSTRUMENT_ROWS = 1 << 2, /* needs row count */
+ INSTRUMENT_WAL = 1 << 3, /* needs wal usage */
INSTRUMENT_ALL = PG_INT32_MAX
} InstrumentOption;
@@ -46,6 +54,7 @@ typedef struct Instrumentation
/* Parameters set at node creation: */
bool need_timer; /* true if we need timer data */
bool need_bufusage; /* true if we need buffer usage data */
+ bool need_walusage; /* true if we need wal usage data */
/* Info about current plan cycle: */
bool running; /* true if we've completed first tuple */
instr_time starttime; /* start time of current iteration of node */
@@ -53,6 +62,7 @@ typedef struct Instrumentation
double firsttuple; /* time for first tuple of this cycle */
double tuplecount; /* # of tuples emitted so far this cycle */
BufferUsage bufusage_start; /* buffer usage at start */
+ WalUsage walusage_start; /* wal usage at start */
/* Accumulated statistics across all completed cycles: */
double startup; /* total startup time (in seconds) */
double total; /* total time (in seconds) */
@@ -62,6 +72,7 @@ typedef struct Instrumentation
double nfiltered1; /* # of tuples removed by scanqual or joinqual */
double nfiltered2; /* # of tuples removed by "other" quals */
BufferUsage bufusage; /* total buffer usage */
+ WalUsage walusage; /* total wal usage */
} Instrumentation;
typedef struct WorkerInstrumentation
@@ -71,6 +82,7 @@ typedef struct WorkerInstrumentation
} WorkerInstrumentation;
extern PGDLLIMPORT BufferUsage pgBufferUsage;
+extern PGDLLIMPORT WalUsage pgWalUsage;
extern Instrumentation *InstrAlloc(int n, int instrument_options);
extern void InstrInit(Instrumentation *instr, int instrument_options);
@@ -79,7 +91,7 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
-extern void InstrEndParallelQuery(BufferUsage *result);
-extern void InstrAccumParallelQuery(BufferUsage *result);
+extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
+extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
#endif /* INSTRUMENT_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e216de9570..88aed4c652 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2632,6 +2632,7 @@ WalSndCtlData
WalSndSendDataCallback
WalSndState
WalTimeSample
+WalUsage
WalWriteMethod
Walfile
WindowAgg
--
2.20.1
v5-0002-Keep-track-of-WAL-usage-in-pg_stat_statements.patch (text/plain; charset=us-ascii)
From 892a6cb3ffb235a82c9dec8bb9e5c52a4250f853 Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:42:02 +0100
Subject: [PATCH v5 2/3] Keep track of WAL usage in pg_stat_statements.
---
contrib/pg_stat_statements/Makefile | 3 +-
.../expected/pg_stat_statements.out | 147 +++++++++++++++---
.../pg_stat_statements--1.7--1.8.sql | 50 ++++++
.../pg_stat_statements/pg_stat_statements.c | 51 +++++-
.../pg_stat_statements.control | 2 +-
.../sql/pg_stat_statements.sql | 68 +++++++-
doc/src/sgml/pgstatstatements.sgml | 27 ++++
7 files changed, 322 insertions(+), 26 deletions(-)
create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 80042a0905..081f997d70 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,8 @@ OBJS = \
pg_stat_statements.o
EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.6--1.7.sql \
+DATA = pg_stat_statements--1.4.sql \
+ pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 6787ec1efd..46b59f56c5 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -195,20 +195,129 @@ SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
3 | c
(8 rows)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
--------------------------------------------------------------+-------+------
- DELETE FROM test WHERE a > $1 | 1 | 1
- INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3
- INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10
- SELECT * FROM test ORDER BY a | 1 | 12
- SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4
- SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8
- SELECT pg_stat_statements_reset() | 1 | 1
- UPDATE test SET b = $1 WHERE a = $2 | 6 | 6
- UPDATE test SET b = $1 WHERE a > $2 | 1 | 3
+-- SELECT check WAL usage stats to confirm temp tables do not get stored in WAL
+SELECT query, calls, rows,
+wal_write_bytes, wal_write_records, wal_write_fp_records
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_write_bytes | wal_write_records | wal_write_fp_records
+-------------------------------------------------------------+-------+------+-----------------+-------------------+----------------------
+ DELETE FROM test WHERE a > $1 | 1 | 1 | 0 | 0 | 0
+ INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | 0 | 0 | 0
+ INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10 | 0 | 0 | 0
+ SELECT * FROM test ORDER BY a | 1 | 12 | 0 | 0 | 0
+ SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4 | 0 | 0 | 0
+ SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | 0 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0 | 0
+ UPDATE test SET b = $1 WHERE a = $2 | 6 | 6 | 0 | 0 | 0
+ UPDATE test SET b = $1 WHERE a > $2 | 1 | 3 | 0 | 0 | 0
(9 rows)
+-- CHECKPOINT before WAL tests to ensure test stability
+CHECKPOINT;
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset
+--------------------------
+
+(1 row)
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- explicit transaction
+BEGIN;
+UPDATE pgss_test SET b = '111' WHERE a = 1 ;
+COMMIT;
+BEGIN \;
+UPDATE pgss_test SET b = '222' WHERE a = 2 \;
+COMMIT ;
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
+BEGIN \;
+UPDATE pgss_test SET b = '555' WHERE a = 5 \;
+UPDATE pgss_test SET b = '666' WHERE a = 6 \;
+COMMIT ;
+-- many INSERT values
+INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+-- SELECT with constants
+SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ;
+ a | b
+---+----------------------
+ 6 | 666
+ 7 | aaa
+ 8 | bbb
+ 9 | bbb
+(4 rows)
+
+SELECT *
+ FROM pgss_test
+ WHERE a > 9
+ ORDER BY a ;
+ a | b
+---+---
+(0 rows)
+
+-- SELECT without constants
+SELECT * FROM pgss_test ORDER BY a;
+ a | b
+---+----------------------
+ 1 | a
+ 1 | 111
+ 2 | b
+ 2 | 222
+ 3 | c
+ 3 | 333
+ 4 | 444
+ 5 | 555
+ 6 | 666
+ 7 | aaa
+ 8 | bbb
+ 9 | bbb
+(12 rows)
+
+-- SELECT with IN clause
+SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5);
+ a | b
+---+----------------------
+ 1 | 111
+ 2 | 222
+ 3 | 333
+ 4 | 444
+ 5 | 555
+ 1 | a
+ 2 | b
+ 3 | c
+(8 rows)
+
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+-- SELECT usage data, check WAL usage is reported, wal_write_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_write_bytes > 0 as wal_bytes_written,
+wal_write_records > 0 as wal_records_written,
+wal_write_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes_written | wal_records_written | wal_records_as_rows
+------------------------------------------------------------------+-------+------+-------------------+---------------------+---------------------
+ DELETE FROM pgss_test WHERE a > $1 | 1 | 1 | t | t | t
+ DROP TABLE pgss_test | 1 | 0 | t | t | f
+ INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | t | t | t
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | 1 | 10 | t | t | t
+ SELECT * FROM pgss_test ORDER BY a | 1 | 12 | f | f | f
+ SELECT * FROM pgss_test WHERE a > $1 ORDER BY a | 2 | 4 | f | f | f
+ SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | f | f | f
+ SELECT pg_stat_statements_reset() | 1 | 1 | f | f | f
+ SET pg_stat_statements.track_utility = FALSE | 1 | 0 | f | f | t
+ UPDATE pgss_test SET b = $1 WHERE a = $2 | 6 | 6 | t | t | t
+ UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 | 3 | t | t | t
+(11 rows)
+
--
-- pg_stat_statements.track = none
--
@@ -287,13 +396,13 @@ SELECT PLUS_ONE(10);
11
(1 row)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
------------------------------------+-------+------
- SELECT $1::TEXT | 1 | 1
- SELECT PLUS_ONE($1) | 2 | 2
- SELECT PLUS_TWO($1) | 2 | 2
- SELECT pg_stat_statements_reset() | 1 | 1
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_write_bytes | wal_write_records
+-----------------------------------+-------+------+-----------------+-------------------
+ SELECT $1::TEXT | 1 | 1 | 0 | 0
+ SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0
+ SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
(4 rows)
--
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
new file mode 100644
index 0000000000..f8b79f2277
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -0,0 +1,50 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.8'" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+ OUT userid oid,
+ OUT dbid oid,
+ OUT queryid bigint,
+ OUT query text,
+ OUT calls int8,
+ OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT mean_time float8,
+ OUT stddev_time float8,
+ OUT rows int8,
+ OUT shared_blks_hit int8,
+ OUT shared_blks_read int8,
+ OUT shared_blks_dirtied int8,
+ OUT shared_blks_written int8,
+ OUT local_blks_hit int8,
+ OUT local_blks_read int8,
+ OUT local_blks_dirtied int8,
+ OUT local_blks_written int8,
+ OUT temp_blks_read int8,
+ OUT temp_blks_written int8,
+ OUT blk_read_time float8,
+ OUT blk_write_time float8,
+ OUT wal_write_bytes int8,
+ OUT wal_write_records int8,
+ OUT wal_write_fp_records int8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4'
+LANGUAGE C STRICT VOLATILE;
+
+CREATE VIEW pg_stat_statements AS
+ SELECT * FROM pg_stat_statements(true);
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 20dc8c605b..1c256fc395 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -120,7 +120,8 @@ typedef enum pgssVersion
PGSS_V1_0 = 0,
PGSS_V1_1,
PGSS_V1_2,
- PGSS_V1_3
+ PGSS_V1_3,
+ PGSS_V1_4
} pgssVersion;
/*
@@ -161,6 +162,9 @@ typedef struct Counters
int64 local_blks_written; /* # of local disk blocks written */
int64 temp_blks_read; /* # of temp blocks read */
int64 temp_blks_written; /* # of temp blocks written */
+ int64 wal_bytes_written; /* total amount of wal bytes written */
+ int64 wal_records_written; /* # of wal records written */
+ int64 wal_fp_records_written; /* # of full page wal records written */
double blk_read_time; /* time spent reading, in msec */
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
@@ -293,6 +297,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_3);
+PG_FUNCTION_INFO_V1(pg_stat_statements_1_4);
PG_FUNCTION_INFO_V1(pg_stat_statements);
static void pgss_shmem_startup(void);
@@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate);
static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
pgssVersion api_version,
@@ -841,6 +847,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
0,
0,
NULL,
+ NULL,
&jstate);
}
@@ -944,6 +951,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
queryDesc->totaltime->total * 1000.0, /* convert to msec */
queryDesc->estate->es_processed,
&queryDesc->totaltime->bufusage,
+ &queryDesc->totaltime->walusage,
NULL);
}
@@ -989,7 +997,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
BufferUsage bufusage_start,
bufusage;
+ WalUsage walusage_start,
+ walusage;
+
bufusage_start = pgBufferUsage;
+ walusage_start = pgWalUsage;
INSTR_TIME_SET_CURRENT(start);
nested_level++;
@@ -1041,6 +1053,13 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
bufusage.blk_write_time = pgBufferUsage.blk_write_time;
INSTR_TIME_SUBTRACT(bufusage.blk_write_time, bufusage_start.blk_write_time);
+ walusage.wal_bytes =
+ pgWalUsage.wal_bytes - walusage_start.wal_bytes;
+ walusage.wal_records =
+ pgWalUsage.wal_records - walusage_start.wal_records;
+ walusage.wal_fp_records =
+ pgWalUsage.wal_fp_records - walusage_start.wal_fp_records;
+
pgss_store(queryString,
0, /* signal that it's a utility stmt */
pstmt->stmt_location,
@@ -1048,6 +1067,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
INSTR_TIME_GET_MILLISEC(duration),
rows,
&bufusage,
+ &walusage,
NULL);
}
else
@@ -1083,13 +1103,14 @@ pgss_hash_string(const char *str, int len)
*
* If jstate is not NULL then we're trying to create an entry for which
* we have no statistics as yet; we just want to record the normalized
- * query string. total_time, rows, bufusage are ignored in this case.
+ * query string. Time and usage are ignored in this case.
*/
static void
pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate)
{
pgssHashKey key;
@@ -1281,6 +1302,9 @@ pgss_store(const char *query, uint64 queryId,
e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
+ e->counters.wal_bytes_written += walusage->wal_bytes;
+ e->counters.wal_records_written += walusage->wal_records;
+ e->counters.wal_fp_records_written += walusage->wal_fp_records;
SpinLockRelease(&e->mutex);
}
@@ -1328,7 +1352,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
#define PG_STAT_STATEMENTS_COLS_V1_1 18
#define PG_STAT_STATEMENTS_COLS_V1_2 19
#define PG_STAT_STATEMENTS_COLS_V1_3 23
-#define PG_STAT_STATEMENTS_COLS 23 /* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_4 26
+#define PG_STAT_STATEMENTS_COLS 26 /* maximum of above */
/*
* Retrieve statement statistics.
@@ -1340,6 +1365,15 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
* expected API version is identified by embedding it in the C name of the
* function. Unfortunately we weren't bright enough to do that for 1.1.
*/
+Datum pg_stat_statements_1_4(PG_FUNCTION_ARGS)
+{
+ bool showtext = PG_GETARG_BOOL(0);
+
+ pg_stat_statements_internal(fcinfo, PGSS_V1_4, showtext);
+
+ return (Datum) 0;
+}
+
Datum
pg_stat_statements_1_3(PG_FUNCTION_ARGS)
{
@@ -1445,6 +1479,10 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
if (api_version != PGSS_V1_3)
elog(ERROR, "incorrect number of output arguments");
break;
+ case PG_STAT_STATEMENTS_COLS_V1_4:
+ if (api_version != PGSS_V1_4)
+ elog(ERROR, "incorrect number of output arguments");
+ break;
default:
elog(ERROR, "incorrect number of output arguments");
}
@@ -1641,11 +1679,18 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
values[i++] = Float8GetDatumFast(tmp.blk_read_time);
values[i++] = Float8GetDatumFast(tmp.blk_write_time);
}
+ if (api_version >= PGSS_V1_4)
+ {
+ values[i++] = Int64GetDatumFast(tmp.wal_bytes_written);
+ values[i++] = Int64GetDatumFast(tmp.wal_records_written);
+ values[i++] = Int64GetDatumFast(tmp.wal_fp_records_written);
+ }
Assert(i == (api_version == PGSS_V1_0 ? PG_STAT_STATEMENTS_COLS_V1_0 :
api_version == PGSS_V1_1 ? PG_STAT_STATEMENTS_COLS_V1_1 :
api_version == PGSS_V1_2 ? PG_STAT_STATEMENTS_COLS_V1_2 :
api_version == PGSS_V1_3 ? PG_STAT_STATEMENTS_COLS_V1_3 :
+ api_version == PGSS_V1_4 ? PG_STAT_STATEMENTS_COLS_V1_4 :
-1 /* fail if you forget to update this assert */ ));
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 14cb422354..7fb20df886 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
# pg_stat_statements extension
comment = 'track execution statistics of all SQL statements executed'
-default_version = '1.7'
+default_version = '1.8'
module_pathname = '$libdir/pg_stat_statements'
relocatable = true
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 8b527070d4..5184a4bbb0 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -99,7 +99,71 @@ SELECT * FROM test ORDER BY a;
-- SELECT with IN clause
SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- SELECT check WAL usage stats to confirm temp tables do not get stored in WAL
+SELECT query, calls, rows,
+wal_write_bytes, wal_write_records, wal_write_fp_records
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- CHECKPOINT before WAL tests to ensure test stability
+CHECKPOINT;
+
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+
+-- explicit transaction
+BEGIN;
+UPDATE pgss_test SET b = '111' WHERE a = 1 ;
+COMMIT;
+
+BEGIN \;
+UPDATE pgss_test SET b = '222' WHERE a = 2 \;
+COMMIT ;
+
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
+
+BEGIN \;
+UPDATE pgss_test SET b = '555' WHERE a = 5 \;
+UPDATE pgss_test SET b = '666' WHERE a = 6 \;
+COMMIT ;
+
+-- many INSERT values
+INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+
+-- SELECT with constants
+SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ;
+
+SELECT *
+ FROM pgss_test
+ WHERE a > 9
+ ORDER BY a ;
+
+-- SELECT without constants
+SELECT * FROM pgss_test ORDER BY a;
+
+-- SELECT with IN clause
+SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5);
+
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+
+-- SELECT usage data, check WAL usage is reported, wal_write_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_write_bytes > 0 as wal_bytes_written,
+wal_write_records > 0 as wal_records_written,
+wal_write_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- pg_stat_statements.track = none
@@ -144,7 +208,7 @@ $$ SELECT (i + 1.0)::INTEGER LIMIT 1 $$ LANGUAGE SQL;
SELECT PLUS_ONE(8);
SELECT PLUS_ONE(10);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- pg_stat_statements.track = all
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 26bb82da4a..40e79f1866 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -221,6 +221,33 @@
</entry>
</row>
+ <row>
+ <entry><structfield>wal_write_bytes</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total amount of WAL bytes generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_write_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL records generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_write_fp_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL full page images generated by the statement
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
--
2.20.1
v5-0003-Keep-track-of-auto-vacuum-WAL-usage-in-pg_stat_da.patch (text/plain; charset=us-ascii)
From e1e1d6d38895151d287515dafb9d9ee8f8cc912e Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 4 Mar 2020 20:09:22 +0100
Subject: [PATCH v5 3/3] Keep track of (auto)vacuum WAL usage in
pg_stat_database.
---
src/backend/catalog/system_views.sql | 6 ++
src/backend/commands/vacuum.c | 11 ++++
src/backend/postmaster/pgstat.c | 56 ++++++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 96 ++++++++++++++++++++++++++++
src/include/catalog/pg_proc.dat | 24 +++++++
src/include/pgstat.h | 27 +++++++-
src/test/regress/expected/rules.out | 6 ++
7 files changed, 225 insertions(+), 1 deletion(-)
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b8a3f46912..a3d1ac2523 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -892,6 +892,12 @@ CREATE VIEW pg_stat_database AS
pg_stat_get_db_checksum_last_failure(D.oid) AS checksum_last_failure,
pg_stat_get_db_blk_read_time(D.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(D.oid) AS blk_write_time,
+ pg_stat_get_db_vac_wal_records(D.oid) AS vac_wal_records,
+ pg_stat_get_db_vac_wal_fp_records(D.oid) AS vac_wal_fp_records,
+ pg_stat_get_db_vac_wal_bytes(D.oid) AS vac_wal_bytes,
+ pg_stat_get_db_autovac_wal_records(D.oid) AS autovac_wal_records,
+ pg_stat_get_db_autovac_wal_fp_records(D.oid) AS autovac_wal_fp_records,
+ pg_stat_get_db_autovac_wal_bytes(D.oid) AS autovac_wal_bytes,
pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
FROM (
SELECT 0 AS oid, NULL::name AS datname
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index d625d17bf4..14a235a2ed 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -275,6 +275,8 @@ vacuum(List *relations, VacuumParams *params,
BufferAccessStrategy bstrategy, bool isTopLevel)
{
static bool in_vacuum = false;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
const char *stmttype;
volatile bool in_outer_xact,
@@ -489,6 +491,15 @@ vacuum(List *relations, VacuumParams *params,
{
in_vacuum = false;
VacuumCostActive = false;
+ walusage.wal_records = pgWalUsage.wal_records -
+ walusage_start.wal_records;
+ walusage.wal_fp_records = pgWalUsage.wal_fp_records -
+ walusage_start.wal_fp_records;
+ walusage.wal_bytes = pgWalUsage.wal_bytes - walusage_start.wal_bytes;
+ pgstat_report_vac_wal_usage(walusage.wal_records,
+ walusage.wal_fp_records,
+ walusage.wal_bytes,
+ IsAutoVacuumWorkerProcess());
}
PG_END_TRY();
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f9287b7942..c9d65669ce 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -330,6 +330,7 @@ static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
static void pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len);
static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
+static void pgstat_recv_vac_walusage(PgStat_MsgVacWalUsage *msg, int len);
/* ------------------------------------------------------------
* Public functions called from postmaster follow
@@ -1572,6 +1573,30 @@ pgstat_report_tempfile(size_t filesize)
pgstat_send(&msg, sizeof(msg));
}
+/* --------
+ * pgstat_report_vac_wal_usage() -
+ *
+ * Tell the collector about (auto)vacuum WAL usage.
+ * --------
+ */
+void
+pgstat_report_vac_wal_usage(long wal_records, long wal_fp_records,
+ long wal_bytes, bool autovacuum)
+{
+ PgStat_MsgVacWalUsage msg;
+
+ if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+ return;
+
+ pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_VACWALUSAGE);
+ msg.m_databaseid = MyDatabaseId;
+ msg.m_wal_records = wal_records;
+ msg.m_wal_fp_records = wal_fp_records;
+ msg.m_wal_bytes = wal_bytes;
+ msg.m_autovacuum = autovacuum;
+ pgstat_send(&msg, sizeof(msg));
+}
+
/* ----------
* pgstat_ping() -
@@ -4525,6 +4550,10 @@ PgstatCollectorMain(int argc, char *argv[])
pgstat_recv_tempfile(&msg.msg_tempfile, len);
break;
+ case PGSTAT_MTYPE_VACWALUSAGE:
+ pgstat_recv_vac_walusage(&msg.msg_vac_walusage, len);
+ break;
+
case PGSTAT_MTYPE_CHECKSUMFAILURE:
pgstat_recv_checksum_failure(&msg.msg_checksumfailure,
len);
@@ -6282,6 +6311,33 @@ pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len)
dbentry->n_temp_files += 1;
}
+/* ----------
+ * pgstat_recv_vac_walusage() -
+ *
+ * Process a VACWALUSAGE message.
+ * ----------
+ */
+static void
+pgstat_recv_vac_walusage(PgStat_MsgVacWalUsage *msg, int len)
+{
+ PgStat_StatDBEntry *dbentry;
+
+ dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
+
+ if (msg->m_autovacuum)
+ {
+ dbentry->n_autovac_wal_records += msg->m_wal_records;
+ dbentry->n_autovac_wal_fp_records += msg->m_wal_fp_records;
+ dbentry->autovac_wal_bytes += msg->m_wal_bytes;
+ }
+ else
+ {
+ dbentry->n_vac_wal_records += msg->m_wal_records;
+ dbentry->n_vac_wal_fp_records += msg->m_wal_fp_records;
+ dbentry->vac_wal_bytes += msg->m_wal_bytes;
+ }
+}
+
/* ----------
* pgstat_recv_funcstat() -
*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index cea01534a5..1bc7291c2b 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1606,6 +1606,102 @@ pg_stat_get_db_blk_write_time(PG_FUNCTION_ARGS)
PG_RETURN_FLOAT8(result);
}
+Datum
+pg_stat_get_db_vac_wal_records(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatDBEntry *dbentry;
+
+ /* report zero if the collector has no entry for this database */
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (dbentry->n_vac_wal_records);
+
+ PG_RETURN_INT64(result);
+}
+
+Datum
+pg_stat_get_db_vac_wal_fp_records(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatDBEntry *dbentry;
+
+ /* report zero if the collector has no entry for this database */
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (dbentry->n_vac_wal_fp_records);
+
+ PG_RETURN_INT64(result);
+}
+
+Datum
+pg_stat_get_db_vac_wal_bytes(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatDBEntry *dbentry;
+
+ /* report zero if the collector has no entry for this database */
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (dbentry->vac_wal_bytes);
+
+ PG_RETURN_INT64(result);
+}
+
+Datum
+pg_stat_get_db_autovac_wal_records(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatDBEntry *dbentry;
+
+ /* report zero if the collector has no entry for this database */
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (dbentry->n_autovac_wal_records);
+
+ PG_RETURN_INT64(result);
+}
+
+Datum
+pg_stat_get_db_autovac_wal_fp_records(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatDBEntry *dbentry;
+
+ /* report zero if the collector has no entry for this database */
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (dbentry->n_autovac_wal_fp_records);
+
+ PG_RETURN_INT64(result);
+}
+
+Datum
+pg_stat_get_db_autovac_wal_bytes(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatDBEntry *dbentry;
+
+ /* report zero if the collector has no entry for this database */
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (dbentry->autovac_wal_bytes);
+
+ PG_RETURN_INT64(result);
+}
+
Datum
pg_stat_get_bgwriter_timed_checkpoints(PG_FUNCTION_ARGS)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7fb574f9dc..8b1c1487ca 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5372,6 +5372,30 @@
proname => 'pg_stat_get_db_blk_write_time', provolatile => 's',
proparallel => 'r', prorettype => 'float8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_blk_write_time' },
+{ oid => '8176', descr => 'statistics: number of vacuum wal records',
+ proname => 'pg_stat_get_db_vac_wal_records', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_vac_wal_records' },
+{ oid => '8177', descr => 'statistics: number of vacuum wal full page records',
+ proname => 'pg_stat_get_db_vac_wal_fp_records', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_vac_wal_fp_records' },
+{ oid => '8178', descr => 'statistics: number of vacuum wal bytes',
+ proname => 'pg_stat_get_db_vac_wal_bytes', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_vac_wal_bytes' },
+{ oid => '8179', descr => 'statistics: number of autovacuum wal records',
+ proname => 'pg_stat_get_db_autovac_wal_records', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_autovac_wal_records' },
+{ oid => '8180', descr => 'statistics: number of autovacuum wal full page records',
+ proname => 'pg_stat_get_db_autovac_wal_fp_records', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_autovac_wal_fp_records' },
+{ oid => '8181', descr => 'statistics: number of autovacuum wal bytes',
+ proname => 'pg_stat_get_db_autovac_wal_bytes', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_autovac_wal_bytes' },
{ oid => '3195', descr => 'statistics: information about WAL archiver',
proname => 'pg_stat_get_archiver', proisstrict => 'f', provolatile => 's',
proparallel => 'r', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 1a19921f80..40fb97bdc9 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -65,7 +65,8 @@ typedef enum StatMsgType
PGSTAT_MTYPE_RECOVERYCONFLICT,
PGSTAT_MTYPE_TEMPFILE,
PGSTAT_MTYPE_DEADLOCK,
- PGSTAT_MTYPE_CHECKSUMFAILURE
+ PGSTAT_MTYPE_CHECKSUMFAILURE,
+ PGSTAT_MTYPE_VACWALUSAGE
} StatMsgType;
/* ----------
@@ -544,6 +545,21 @@ typedef struct PgStat_MsgChecksumFailure
TimestampTz m_failure_time;
} PgStat_MsgChecksumFailure;
+/* ----------
+ * PgStat_MsgVacWalUsage Sent by the backend to tell the collector
+ * about (auto)vacuum WAL usage.
+ * ----------
+ */
+typedef struct PgStat_MsgVacWalUsage
+{
+ PgStat_MsgHdr m_hdr;
+ Oid m_databaseid;
+ long m_wal_records;
+ long m_wal_fp_records;
+ long m_wal_bytes;
+ bool m_autovacuum;
+} PgStat_MsgVacWalUsage;
+
/* ----------
* PgStat_Msg Union over all possible messages.
@@ -571,6 +587,7 @@ typedef union PgStat_Msg
PgStat_MsgDeadlock msg_deadlock;
PgStat_MsgTempFile msg_tempfile;
PgStat_MsgChecksumFailure msg_checksumfailure;
+ PgStat_MsgVacWalUsage msg_vac_walusage;
} PgStat_Msg;
@@ -613,6 +630,12 @@ typedef struct PgStat_StatDBEntry
TimestampTz last_checksum_failure;
PgStat_Counter n_block_read_time; /* times in microseconds */
PgStat_Counter n_block_write_time;
+ PgStat_Counter n_vac_wal_records;
+ PgStat_Counter n_vac_wal_fp_records;
+ PgStat_Counter vac_wal_bytes;
+ PgStat_Counter n_autovac_wal_records;
+ PgStat_Counter n_autovac_wal_fp_records;
+ PgStat_Counter autovac_wal_bytes;
TimestampTz stat_reset_timestamp;
TimestampTz stats_timestamp; /* time of db stats file update */
@@ -1261,6 +1284,8 @@ extern void pgstat_bestart(void);
extern void pgstat_report_activity(BackendState state, const char *cmd_str);
extern void pgstat_report_tempfile(size_t filesize);
+extern void pgstat_report_vac_wal_usage(long wal_records, long wal_fp_records,
+ long wal_bytes, bool autovacuum);
extern void pgstat_report_appname(const char *appname);
extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
extern const char *pgstat_get_wait_event(uint32 wait_event_info);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index c7304611c3..aa311cd594 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1833,6 +1833,12 @@ pg_stat_database| SELECT d.oid AS datid,
pg_stat_get_db_checksum_last_failure(d.oid) AS checksum_last_failure,
pg_stat_get_db_blk_read_time(d.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(d.oid) AS blk_write_time,
+ pg_stat_get_db_vac_wal_records(d.oid) AS vac_wal_records,
+ pg_stat_get_db_vac_wal_fp_records(d.oid) AS vac_wal_fp_records,
+ pg_stat_get_db_vac_wal_bytes(d.oid) AS vac_wal_bytes,
+ pg_stat_get_db_autovac_wal_records(d.oid) AS autovac_wal_records,
+ pg_stat_get_db_autovac_wal_fp_records(d.oid) AS autovac_wal_fp_records,
+ pg_stat_get_db_autovac_wal_bytes(d.oid) AS autovac_wal_bytes,
pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset
FROM ( SELECT 0 AS oid,
NULL::name AS datname
--
2.20.1
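The collector-side step in the patch above (pgstat_recv_vac_walusage) boils down to adding each reported delta to one of two sets of per-database counters, chosen by the autovacuum flag. The following is only a standalone sketch of that idea, not the patch's code: DemoDbWalStats, DemoVacWalMsg and demo_recv_vac_walusage are hypothetical names, and fixed-width int64_t is an assumption made here because long is only 32 bits wide on some platforms (for example 64-bit Windows).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-database totals, loosely mirroring the new dbentry fields. */
typedef struct DemoDbWalStats
{
	int64_t		vac_wal_records;
	int64_t		vac_wal_fp_records;
	int64_t		vac_wal_bytes;
	int64_t		autovac_wal_records;
	int64_t		autovac_wal_fp_records;
	int64_t		autovac_wal_bytes;
} DemoDbWalStats;

/* Hypothetical message carrying the WAL usage of one (auto)vacuum run. */
typedef struct DemoVacWalMsg
{
	int64_t		wal_records;
	int64_t		wal_fp_records;
	int64_t		wal_bytes;
	bool		autovacuum;
} DemoVacWalMsg;

/* Add one run's WAL usage to the manual or autovacuum counters. */
static void
demo_recv_vac_walusage(DemoDbWalStats *db, const DemoVacWalMsg *msg)
{
	assert(db != NULL && msg != NULL);

	if (msg->autovacuum)
	{
		db->autovac_wal_records += msg->wal_records;
		db->autovac_wal_fp_records += msg->wal_fp_records;
		db->autovac_wal_bytes += msg->wal_bytes;
	}
	else
	{
		db->vac_wal_records += msg->wal_records;
		db->vac_wal_fp_records += msg->wal_fp_records;
		db->vac_wal_bytes += msg->wal_bytes;
	}
}
```

Keeping the two counter sets separate is what lets the view expose vac_* and autovac_* columns side by side.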
There is a higher-level Instrumentation API that can be used with
the INSTRUMENT_WAL flag to collect the WAL usage information. I believe
the instrumentation is widely used in the executor code, so it should
not be a problem to collect instrumentation information on autovacuum
worker level.
Just a recommendation/chat, though. I am happy with the way the data
is collected now. If you commit this variant, please add a TODO to
rework WAL usage to the common instr API.
The instrumentation is somewhat intended to be used with executor nodes, not
backend commands. I don't see any real technical reason that would prevent that,
but I prefer to keep things as-is for now, as it sounds less controversial.
This is for the 3rd patch, which may not even be considered for this CF anyway.
As for the tests, please get somebody else to review this. I strongly
believe checking full page writes here could be a source of
instability.
I'm also a little bit dubious about it. The initial checkpoint should make
things stable (of course unless full_page_writes is disabled), and Cfbot also
seems happy about it. At least keeping it for the temporary tables test
shouldn't be a problem.
Temp tables should show zero FPI WAL records, true :)
I have no objections to the patch.
I'm attaching a v5 with fp records only for temp tables, so there's no risk of
instability. As I previously said I'm fine with your two patches, so unless
you have objections on the fpi test for temp tables or the documentation
changes, I believe those should be ready for committer.
No objections on my side either. Thank you for your review, time and efforts!
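For reference, the way the data is collected in these patches (in pgss_ProcessUtility and vacuum()) is to save the running counters before the command and subtract that snapshot afterwards. Below is a minimal standalone sketch of that pattern, assuming hypothetical names (DemoWalUsage, demo_wal_usage, demo_insert_record, demo_wal_usage_since) that only imitate the patch's WalUsage struct and pgWalUsage global.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's WalUsage counters. */
typedef struct DemoWalUsage
{
	int64_t		wal_records;	/* # of WAL records inserted */
	int64_t		wal_fp_records; /* # of full page image records */
	int64_t		wal_bytes;		/* total WAL bytes */
} DemoWalUsage;

/* Process-wide running totals, playing the role of pgWalUsage. */
static DemoWalUsage demo_wal_usage = {0, 0, 0};

/* Pretend to insert one WAL record of the given size. */
static void
demo_insert_record(int64_t size, int is_full_page)
{
	demo_wal_usage.wal_records++;
	if (is_full_page)
		demo_wal_usage.wal_fp_records++;
	demo_wal_usage.wal_bytes += size;
}

/* Usage accrued between a saved snapshot and the current totals. */
static DemoWalUsage
demo_wal_usage_since(const DemoWalUsage *start)
{
	DemoWalUsage diff;

	/* the counters only ever grow, so the snapshot cannot be ahead */
	assert(demo_wal_usage.wal_records >= start->wal_records);

	diff.wal_records = demo_wal_usage.wal_records - start->wal_records;
	diff.wal_fp_records = demo_wal_usage.wal_fp_records - start->wal_fp_records;
	diff.wal_bytes = demo_wal_usage.wal_bytes - start->wal_bytes;
	return diff;
}
```

A caller would copy demo_wal_usage into a local variable before running the command and call demo_wal_usage_since() on that copy afterwards, so WAL written earlier in the session is not attributed to the command.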
On Wed, Mar 18, 2020 at 08:48:17PM +0300, Kirill Bychik wrote:
I'm attaching a v5 with fp records only for temp tables, so there's no risk of
instability. As I previously said I'm fine with your two patches, so unless
you have objections on the fpi test for temp tables or the documentation
changes, I believe those should be ready for committer.
No objections on my side either. Thank you for your review, time and efforts!
Great, thanks also for the patches and efforts! I'll mark the entry as RFC.
On 2020/03/19 2:19, Julien Rouhaud wrote:
On Wed, Mar 18, 2020 at 09:02:58AM +0300, Kirill Bychik wrote:
There is a higher-level Instrumentation API that can be used with
the INSTRUMENT_WAL flag to collect the WAL usage information. I believe
the instrumentation is widely used in the executor code, so it should
not be a problem to collect instrumentation information on autovacuum
worker level.
Just a recommendation/chat, though. I am happy with the way the data
is collected now. If you commit this variant, please add a TODO to
rework WAL usage to the common instr API.
The instrumentation is somewhat intended to be used with executor nodes, not
backend commands. I don't see any real technical reason that would prevent that,
but I prefer to keep things as-is for now, as it sounds less controversial.
This is for the 3rd patch, which may not even be considered for this CF anyway.
As for the tests, please get somebody else to review this. I strongly
believe checking full page writes here could be a source of
instability.
I'm also a little bit dubious about it. The initial checkpoint should make
things stable (of course unless full_page_writes is disabled), and Cfbot also
seems happy about it. At least keeping it for the temporary tables test
shouldn't be a problem.
Temp tables should show zero FPI WAL records, true :)
I have no objections to the patch.
I'm attaching a v5 with fp records only for temp tables, so there's no risk of
instability. As I previously said I'm fine with your two patches, so unless
you have objections on the fpi test for temp tables or the documentation
changes, I believe those should be ready for committer.
You added the columns into pg_stat_database, but seem to have forgotten to
update the documentation for pg_stat_database.
Is it really reasonable to add the columns for vacuum's WAL usage into
pg_stat_database? I'm not sure how much the information about
the amount of WAL generated by vacuum per database is useful.
Isn't it better to make VACUUM VERBOSE and autovacuum log include
that information, instead, to see how much each vacuum activity
generates the WAL? Sorry if this discussion has already been done
upthread.
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
On Thu, Mar 19, 2020 at 09:03:02PM +0900, Fujii Masao wrote:
On 2020/03/19 2:19, Julien Rouhaud wrote:
I'm attaching a v5 with fp records only for temp tables, so there's no risk of
instability. As I previously said I'm fine with your two patches, so unless
you have objections on the fpi test for temp tables or the documentation
changes, I believe those should be ready for committer.
You added the columns into pg_stat_database, but seem to have forgotten to
update the documentation for pg_stat_database.
Ah right, I totally missed that when I tried to clean up the original POC.
Is it really reasonable to add the columns for vacuum's WAL usage into
pg_stat_database? I'm not sure how much the information about
the amount of WAL generated by vacuum per database is useful.
The amount per database isn't really useful, but I didn't have a better idea on
how to expose (auto)vacuum WAL usage until this:
Isn't it better to make VACUUM VERBOSE and autovacuum log include
that information, instead, to see how much each vacuum activity
generates the WAL? Sorry if this discussion has already been done
upthread.
That's a way better idea! I'm attaching the full patchset with the 3rd patch
to use this approach instead. There's a bit of duplicated code for computing the
WalUsage, as I didn't find a way to avoid that without exposing
WalUsageAccumDiff().
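The duplicated computation mentioned above boils down to copying the backend-wide counter before the vacuum starts and subtracting that copy afterwards. A minimal stand-alone sketch, assuming a local copy of the WalUsage struct from the patch; wal_usage_since() is a hypothetical helper name used only for illustration and is not part of the patchset:

```c
#include <stdio.h>

/* Stand-alone copy of the struct the patch adds to instrument.h. */
typedef struct WalUsage
{
    long        wal_records;     /* # of wal records produced */
    long        wal_fp_records;  /* # of full page wal records produced */
    long        wal_bytes;       /* size of wal records produced */
} WalUsage;

/* Stand-in for the backend-wide running total kept in instrument.c. */
WalUsage    pgWalUsage;

/*
 * Hypothetical helper (assumption, not in the patch): WAL generated since
 * "start" was copied from pgWalUsage.  This is the same subtraction that
 * WalUsageAccumDiff() performs, written out by hand.
 */
static WalUsage
wal_usage_since(const WalUsage *start)
{
    WalUsage    diff;

    diff.wal_records = pgWalUsage.wal_records - start->wal_records;
    diff.wal_fp_records = pgWalUsage.wal_fp_records - start->wal_fp_records;
    diff.wal_bytes = pgWalUsage.wal_bytes - start->wal_bytes;
    return diff;
}
```

A caller would copy pgWalUsage into a local variable before the operation and call wal_usage_since() on that copy once the operation is done.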
Autovacuum log sample:
2020-03-19 15:49:05.708 CET [5843] LOG: automatic vacuum of table "rjuju.public.t1": index scans: 0
pages: 0 removed, 2213 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 250000 removed, 250000 remain, 0 are dead but not yet removable, oldest xmin: 502
buffer usage: 4448 hits, 4 misses, 4 dirtied
avg read rate: 0.160 MB/s, avg write rate: 0.160 MB/s
system usage: CPU: user: 0.13 s, system: 0.00 s, elapsed: 0.19 s
WAL usage: 6643 records, 4 full page records, 1402679 bytes
VACUUM log sample:
# vacuum VERBOSE t1;
INFO: vacuuming "public.t1"
INFO: "t1": removed 50000 row versions in 443 pages
INFO: "t1": found 50000 removable, 0 nonremovable row versions in 443 out of 443 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 512
There were 50000 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
1332 WAL records, 4 WAL full page records, 306901 WAL bytes
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
INFO: "t1": truncated 443 to 0 pages
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO: vacuuming "pg_toast.pg_toast_16385"
INFO: index "pg_toast_16385_index" now contains 0 row versions in 1 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: "pg_toast_16385": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 513
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
0 WAL records, 0 WAL full page records, 0 WAL bytes
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
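For illustration, the WAL line shown in the VACUUM VERBOSE sample above could be rendered from the three counters roughly as follows. This is only a sketch: format_wal_usage() is a hypothetical name, and it uses plain snprintf() rather than whatever string-building and translation calls the actual patch uses.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-alone copy of the struct the patch adds to instrument.h. */
typedef struct WalUsage
{
    long        wal_records;     /* # of wal records produced */
    long        wal_fp_records;  /* # of full page wal records produced */
    long        wal_bytes;       /* size of wal records produced */
} WalUsage;

/*
 * Hypothetical helper (assumption): write a VACUUM VERBOSE style summary
 * of "usage" into buf.  Returns the value of snprintf(), i.e. the number
 * of characters that the full line needs.
 */
static int
format_wal_usage(char *buf, size_t buflen, const WalUsage *usage)
{
    return snprintf(buf, buflen,
                    "%ld WAL records, %ld WAL full page records, %ld WAL bytes",
                    usage->wal_records,
                    usage->wal_fp_records,
                    usage->wal_bytes);
}
```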
Note that the 3rd patch is an addition on top of Kirill's original patch, as
this information would have been greatly helpful for some performance issues
I had to investigate recently. I'd be happy to have it land
in v13, but if that's controversial or too late I'm happy to postpone it to
v14 if the infrastructure added in Kirill's patches can make it to v13.
Attachments:
v6-0001-Track-WAL-usage.patch (text/plain; charset=us-ascii)
From 73c9827b4fde3830dd52e8d2ec423f05b27dbca4 Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:41:50 +0100
Subject: [PATCH v6 1/3] Track WAL usage.
---
src/backend/access/transam/xlog.c | 8 ++++
src/backend/access/transam/xloginsert.c | 6 +++
src/backend/executor/execParallel.c | 22 ++++++++++-
src/backend/executor/instrument.c | 51 ++++++++++++++++++++++---
src/include/executor/execParallel.h | 1 +
src/include/executor/instrument.h | 16 +++++++-
src/tools/pgindent/typedefs.list | 1 +
7 files changed, 95 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 793c076da6..80df3db87f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
#include "commands/progress.h"
#include "commands/tablespace.h"
#include "common/controldata_utils.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "pgstat.h"
@@ -1231,6 +1232,13 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
+ /* Provide WAL update data to the instrumentation */
+ if (inserted)
+ {
+ pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+ pgWalUsage.wal_records++;
+ }
+
return EndPos;
}
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 2fa0a7f667..1f71cc0a76 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -25,6 +25,7 @@
#include "access/xloginsert.h"
#include "catalog/pg_control.h"
#include "common/pg_lzcompress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "replication/origin.h"
@@ -635,6 +636,11 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
*/
bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
+ /*
+ * Report a full page image constructed for the WAL record
+ */
+ pgWalUsage.wal_fp_records++;
+
/*
* Construct XLogRecData entries for the page content.
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..017367878f 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -62,6 +62,7 @@
#define PARALLEL_KEY_DSA UINT64CONST(0xE000000000000007)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -573,6 +574,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
char *pstmt_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
+ WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
@@ -646,6 +648,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Same thing for WalUsage.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for tuple queues. */
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers));
@@ -728,6 +737,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufusage_space);
pei->buffer_usage = bufusage_space;
+ /* Same for WalUsage. */
+ walusage_space = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
+ pei->wal_usage = walusage_space;
+
/* Set up the tuple queues that the workers will write into. */
pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
@@ -1113,7 +1128,7 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
* finish, or we might get incomplete data.)
*/
for (i = 0; i < nworkers; i++)
- InstrAccumParallelQuery(&pei->buffer_usage[i]);
+ InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]);
pei->finished = true;
}
@@ -1333,6 +1348,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
{
FixedParallelExecutorState *fpes;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
DestReceiver *receiver;
QueryDesc *queryDesc;
SharedExecutorInstrumentation *instrumentation;
@@ -1408,7 +1424,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Report buffer usage during parallel execution. */
buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Report instrumentation data if any instrumentation options are set. */
if (instrumentation != NULL)
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index bc1d42bf64..4bcb06f6e1 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -24,6 +24,11 @@ static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void BufferUsageAccumDiff(BufferUsage *dst,
const BufferUsage *add, const BufferUsage *sub);
+WalUsage pgWalUsage;
+static WalUsage save_pgWalUsage;
+
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
+static void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub);
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -33,15 +38,17 @@ InstrAlloc(int n, int instrument_options)
/* initialize all fields to zeroes, then modify as needed */
instr = palloc0(n * sizeof(Instrumentation));
- if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER))
+ if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER | INSTRUMENT_WAL))
{
bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
bool need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
int i;
for (i = 0; i < n; i++)
{
instr[i].need_bufusage = need_buffers;
+ instr[i].need_walusage = need_wal;
instr[i].need_timer = need_timer;
}
}
@@ -55,6 +62,7 @@ InstrInit(Instrumentation *instr, int instrument_options)
{
memset(instr, 0, sizeof(Instrumentation));
instr->need_bufusage = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ instr->need_walusage = (instrument_options & INSTRUMENT_WAL) != 0;
instr->need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
}
@@ -69,6 +77,9 @@ InstrStartNode(Instrumentation *instr)
/* save buffer usage totals at node entry, if needed */
if (instr->need_bufusage)
instr->bufusage_start = pgBufferUsage;
+
+ if (instr->need_walusage)
+ instr->walusage_start = pgWalUsage;
}
/* Exit from a plan node */
@@ -97,6 +108,10 @@ InstrStopNode(Instrumentation *instr, double nTuples)
BufferUsageAccumDiff(&instr->bufusage,
&pgBufferUsage, &instr->bufusage_start);
+ if (instr->need_walusage)
+ WalUsageAccumDiff(&instr->walusage,
+ &pgWalUsage, &instr->walusage_start);
+
/* Is this the first tuple of this cycle? */
if (!instr->running)
{
@@ -160,6 +175,9 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
/* Add delta of buffer usage since entry to node's totals */
if (dst->need_bufusage)
BufferUsageAdd(&dst->bufusage, &add->bufusage);
+
+ if (dst->need_walusage)
+ WalUsageAdd(&dst->walusage, &add->walusage);
}
/* note current values during parallel executor startup */
@@ -167,21 +185,25 @@ void
InstrStartParallelQuery(void)
{
save_pgBufferUsage = pgBufferUsage;
+ save_pgWalUsage = pgWalUsage;
}
/* report usage after parallel executor shutdown */
void
-InstrEndParallelQuery(BufferUsage *result)
+InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- memset(result, 0, sizeof(BufferUsage));
- BufferUsageAccumDiff(result, &pgBufferUsage, &save_pgBufferUsage);
+ memset(bufusage, 0, sizeof(BufferUsage));
+ memset(walusage, 0, sizeof(WalUsage));
+ BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
+ WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
}
/* accumulate work done by workers in leader's stats */
void
-InstrAccumParallelQuery(BufferUsage *result)
+InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- BufferUsageAdd(&pgBufferUsage, result);
+ BufferUsageAdd(&pgBufferUsage, bufusage);
+ WalUsageAdd(&pgWalUsage, walusage);
}
/* dst += add */
@@ -223,3 +245,20 @@ BufferUsageAccumDiff(BufferUsage *dst,
INSTR_TIME_ACCUM_DIFF(dst->blk_write_time,
add->blk_write_time, sub->blk_write_time);
}
+
+/* helper functions for WAL usage accumulation */
+static void
+WalUsageAdd(WalUsage *dst, WalUsage *add)
+{
+ dst->wal_bytes += add->wal_bytes;
+ dst->wal_records += add->wal_records;
+ dst->wal_fp_records += add->wal_fp_records;
+}
+
+static void
+WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
+{
+ dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
+ dst->wal_records += add->wal_records - sub->wal_records;
+ dst->wal_fp_records += add->wal_fp_records - sub->wal_fp_records;
+}
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf020..1cc5b524fd 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -26,6 +26,7 @@ typedef struct ParallelExecutorInfo
PlanState *planstate; /* plan subtree we're running in parallel */
ParallelContext *pcxt; /* parallel context we're using */
BufferUsage *buffer_usage; /* points to bufusage area in DSM */
+ WalUsage *wal_usage; /* walusage area in DSM */
SharedExecutorInstrumentation *instrumentation; /* optional */
struct SharedJitInstrumentation *jit_instrumentation; /* optional */
dsa_area *area; /* points to DSA area in DSM */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index f48d46aede..f79fac8f8c 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -32,12 +32,20 @@ typedef struct BufferUsage
instr_time blk_write_time; /* time spent writing */
} BufferUsage;
+typedef struct WalUsage
+{
+ long wal_records; /* # of wal records produced */
+ long wal_fp_records; /* # of full page wal records produced */
+ long wal_bytes; /* size of wal records produced */
+} WalUsage;
+
/* Flag bits included in InstrAlloc's instrument_options bitmask */
typedef enum InstrumentOption
{
INSTRUMENT_TIMER = 1 << 0, /* needs timer (and row counts) */
INSTRUMENT_BUFFERS = 1 << 1, /* needs buffer usage */
INSTRUMENT_ROWS = 1 << 2, /* needs row count */
+ INSTRUMENT_WAL = 1 << 3, /* needs wal usage */
INSTRUMENT_ALL = PG_INT32_MAX
} InstrumentOption;
@@ -46,6 +54,7 @@ typedef struct Instrumentation
/* Parameters set at node creation: */
bool need_timer; /* true if we need timer data */
bool need_bufusage; /* true if we need buffer usage data */
+ bool need_walusage; /* true if we need wal usage data */
/* Info about current plan cycle: */
bool running; /* true if we've completed first tuple */
instr_time starttime; /* start time of current iteration of node */
@@ -53,6 +62,7 @@ typedef struct Instrumentation
double firsttuple; /* time for first tuple of this cycle */
double tuplecount; /* # of tuples emitted so far this cycle */
BufferUsage bufusage_start; /* buffer usage at start */
+ WalUsage walusage_start; /* wal usage at start */
/* Accumulated statistics across all completed cycles: */
double startup; /* total startup time (in seconds) */
double total; /* total time (in seconds) */
@@ -62,6 +72,7 @@ typedef struct Instrumentation
double nfiltered1; /* # of tuples removed by scanqual or joinqual */
double nfiltered2; /* # of tuples removed by "other" quals */
BufferUsage bufusage; /* total buffer usage */
+ WalUsage walusage; /* total wal usage */
} Instrumentation;
typedef struct WorkerInstrumentation
@@ -71,6 +82,7 @@ typedef struct WorkerInstrumentation
} WorkerInstrumentation;
extern PGDLLIMPORT BufferUsage pgBufferUsage;
+extern PGDLLIMPORT WalUsage pgWalUsage;
extern Instrumentation *InstrAlloc(int n, int instrument_options);
extern void InstrInit(Instrumentation *instr, int instrument_options);
@@ -79,7 +91,7 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
-extern void InstrEndParallelQuery(BufferUsage *result);
-extern void InstrAccumParallelQuery(BufferUsage *result);
+extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
+extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
#endif /* INSTRUMENT_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e216de9570..88aed4c652 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2632,6 +2632,7 @@ WalSndCtlData
WalSndSendDataCallback
WalSndState
WalTimeSample
+WalUsage
WalWriteMethod
Walfile
WindowAgg
--
2.20.1
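The per-worker reporting that the patch above adds to execParallel.c and instrument.c can be pictured without any shared memory machinery: each worker writes the delta of its own counters into its slot of an array, and the leader then folds every slot into its own running total. A minimal sketch under that simplification; the struct is copied from the patch, while worker_start(), worker_end() and leader_accumulate() are hypothetical stand-ins for InstrStartParallelQuery(), InstrEndParallelQuery() and the loop in ExecParallelFinish():

```c
#include <string.h>

/* Stand-alone copy of the struct the patch adds to instrument.h. */
typedef struct WalUsage
{
    long        wal_records;     /* # of wal records produced */
    long        wal_fp_records;  /* # of full page wal records produced */
    long        wal_bytes;       /* size of wal records produced */
} WalUsage;

WalUsage    pgWalUsage;             /* per-process running total */
static WalUsage save_pgWalUsage;    /* snapshot taken at worker start */

/* dst += add */
static void
wal_usage_add(WalUsage *dst, const WalUsage *add)
{
    dst->wal_records += add->wal_records;
    dst->wal_fp_records += add->wal_fp_records;
    dst->wal_bytes += add->wal_bytes;
}

/* dst += add - sub */
static void
wal_usage_accum_diff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
{
    dst->wal_records += add->wal_records - sub->wal_records;
    dst->wal_fp_records += add->wal_fp_records - sub->wal_fp_records;
    dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
}

/* Worker side: remember the counters at executor startup. */
static void
worker_start(void)
{
    save_pgWalUsage = pgWalUsage;
}

/* Worker side: store only the WAL generated since worker_start(). */
static void
worker_end(WalUsage *slot)
{
    memset(slot, 0, sizeof(WalUsage));
    wal_usage_accum_diff(slot, &pgWalUsage, &save_pgWalUsage);
}

/* Leader side: add every worker's slot to the leader's own total. */
static void
leader_accumulate(const WalUsage *slots, int nworkers)
{
    int         i;

    for (i = 0; i < nworkers; i++)
        wal_usage_add(&pgWalUsage, &slots[i]);
}
```

In the patch itself the slot array lives in the DSM segment under PARALLEL_KEY_WAL_USAGE and each worker indexes it with ParallelWorkerNumber; the sketch replaces that with a plain array.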
v6-0002-Keep-track-of-WAL-usage-in-pg_stat_statements.patch (text/plain; charset=us-ascii)
From a2fadbb98b6a94915e172ed0794fbffce2417490 Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:42:02 +0100
Subject: [PATCH v6 2/3] Keep track of WAL usage in pg_stat_statements.
---
contrib/pg_stat_statements/Makefile | 3 +-
.../expected/pg_stat_statements.out | 147 +++++++++++++++---
.../pg_stat_statements--1.7--1.8.sql | 50 ++++++
.../pg_stat_statements/pg_stat_statements.c | 51 +++++-
.../pg_stat_statements.control | 2 +-
.../sql/pg_stat_statements.sql | 68 +++++++-
doc/src/sgml/pgstatstatements.sgml | 27 ++++
7 files changed, 322 insertions(+), 26 deletions(-)
create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 80042a0905..081f997d70 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,8 @@ OBJS = \
pg_stat_statements.o
EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.6--1.7.sql \
+DATA = pg_stat_statements--1.4.sql \
+ pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 6787ec1efd..46b59f56c5 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -195,20 +195,129 @@ SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
3 | c
(8 rows)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
--------------------------------------------------------------+-------+------
- DELETE FROM test WHERE a > $1 | 1 | 1
- INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3
- INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10
- SELECT * FROM test ORDER BY a | 1 | 12
- SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4
- SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8
- SELECT pg_stat_statements_reset() | 1 | 1
- UPDATE test SET b = $1 WHERE a = $2 | 6 | 6
- UPDATE test SET b = $1 WHERE a > $2 | 1 | 3
+-- SELECT check WAL usage stats to confirm temp tables do not get stored in WAL
+SELECT query, calls, rows,
+wal_write_bytes, wal_write_records, wal_write_fp_records
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_write_bytes | wal_write_records | wal_write_fp_records
+-------------------------------------------------------------+-------+------+-----------------+-------------------+----------------------
+ DELETE FROM test WHERE a > $1 | 1 | 1 | 0 | 0 | 0
+ INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | 0 | 0 | 0
+ INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10 | 0 | 0 | 0
+ SELECT * FROM test ORDER BY a | 1 | 12 | 0 | 0 | 0
+ SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4 | 0 | 0 | 0
+ SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | 0 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0 | 0
+ UPDATE test SET b = $1 WHERE a = $2 | 6 | 6 | 0 | 0 | 0
+ UPDATE test SET b = $1 WHERE a > $2 | 1 | 3 | 0 | 0 | 0
(9 rows)
+-- CHECKPOINT before WAL tests to ensure test stability
+CHECKPOINT;
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset
+--------------------------
+
+(1 row)
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- explicit transaction
+BEGIN;
+UPDATE pgss_test SET b = '111' WHERE a = 1 ;
+COMMIT;
+BEGIN \;
+UPDATE pgss_test SET b = '222' WHERE a = 2 \;
+COMMIT ;
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
+BEGIN \;
+UPDATE pgss_test SET b = '555' WHERE a = 5 \;
+UPDATE pgss_test SET b = '666' WHERE a = 6 \;
+COMMIT ;
+-- many INSERT values
+INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+-- SELECT with constants
+SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ;
+ a | b
+---+----------------------
+ 6 | 666
+ 7 | aaa
+ 8 | bbb
+ 9 | bbb
+(4 rows)
+
+SELECT *
+ FROM pgss_test
+ WHERE a > 9
+ ORDER BY a ;
+ a | b
+---+---
+(0 rows)
+
+-- SELECT without constants
+SELECT * FROM pgss_test ORDER BY a;
+ a | b
+---+----------------------
+ 1 | a
+ 1 | 111
+ 2 | b
+ 2 | 222
+ 3 | c
+ 3 | 333
+ 4 | 444
+ 5 | 555
+ 6 | 666
+ 7 | aaa
+ 8 | bbb
+ 9 | bbb
+(12 rows)
+
+-- SELECT with IN clause
+SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5);
+ a | b
+---+----------------------
+ 1 | 111
+ 2 | 222
+ 3 | 333
+ 4 | 444
+ 5 | 555
+ 1 | a
+ 2 | b
+ 3 | c
+(8 rows)
+
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+-- SELECT usage data, check WAL usage is reported, wal_write_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_write_bytes > 0 as wal_bytes_written,
+wal_write_records > 0 as wal_records_written,
+wal_write_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes_written | wal_records_written | wal_records_as_rows
+------------------------------------------------------------------+-------+------+-------------------+---------------------+---------------------
+ DELETE FROM pgss_test WHERE a > $1 | 1 | 1 | t | t | t
+ DROP TABLE pgss_test | 1 | 0 | t | t | f
+ INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | t | t | t
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | 1 | 10 | t | t | t
+ SELECT * FROM pgss_test ORDER BY a | 1 | 12 | f | f | f
+ SELECT * FROM pgss_test WHERE a > $1 ORDER BY a | 2 | 4 | f | f | f
+ SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | f | f | f
+ SELECT pg_stat_statements_reset() | 1 | 1 | f | f | f
+ SET pg_stat_statements.track_utility = FALSE | 1 | 0 | f | f | t
+ UPDATE pgss_test SET b = $1 WHERE a = $2 | 6 | 6 | t | t | t
+ UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 | 3 | t | t | t
+(11 rows)
+
--
-- pg_stat_statements.track = none
--
@@ -287,13 +396,13 @@ SELECT PLUS_ONE(10);
11
(1 row)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
------------------------------------+-------+------
- SELECT $1::TEXT | 1 | 1
- SELECT PLUS_ONE($1) | 2 | 2
- SELECT PLUS_TWO($1) | 2 | 2
- SELECT pg_stat_statements_reset() | 1 | 1
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_write_bytes | wal_write_records
+-----------------------------------+-------+------+-----------------+-------------------
+ SELECT $1::TEXT | 1 | 1 | 0 | 0
+ SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0
+ SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
(4 rows)
--
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
new file mode 100644
index 0000000000..f8b79f2277
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -0,0 +1,50 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.8'" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+ OUT userid oid,
+ OUT dbid oid,
+ OUT queryid bigint,
+ OUT query text,
+ OUT calls int8,
+ OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT mean_time float8,
+ OUT stddev_time float8,
+ OUT rows int8,
+ OUT shared_blks_hit int8,
+ OUT shared_blks_read int8,
+ OUT shared_blks_dirtied int8,
+ OUT shared_blks_written int8,
+ OUT local_blks_hit int8,
+ OUT local_blks_read int8,
+ OUT local_blks_dirtied int8,
+ OUT local_blks_written int8,
+ OUT temp_blks_read int8,
+ OUT temp_blks_written int8,
+ OUT blk_read_time float8,
+ OUT blk_write_time float8,
+ OUT wal_write_bytes int8,
+ OUT wal_write_records int8,
+ OUT wal_write_fp_records int8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4'
+LANGUAGE C STRICT VOLATILE;
+
+CREATE VIEW pg_stat_statements AS
+ SELECT * FROM pg_stat_statements(true);
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 20dc8c605b..1c256fc395 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -120,7 +120,8 @@ typedef enum pgssVersion
PGSS_V1_0 = 0,
PGSS_V1_1,
PGSS_V1_2,
- PGSS_V1_3
+ PGSS_V1_3,
+ PGSS_V1_4
} pgssVersion;
/*
@@ -161,6 +162,9 @@ typedef struct Counters
int64 local_blks_written; /* # of local disk blocks written */
int64 temp_blks_read; /* # of temp blocks read */
int64 temp_blks_written; /* # of temp blocks written */
+ int64 wal_bytes_written; /* total amount of wal bytes written */
+ int64 wal_records_written; /* # of wal records written */
+ int64 wal_fp_records_written; /* # of full page wal records written */
double blk_read_time; /* time spent reading, in msec */
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
@@ -293,6 +297,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_3);
+PG_FUNCTION_INFO_V1(pg_stat_statements_1_4);
PG_FUNCTION_INFO_V1(pg_stat_statements);
static void pgss_shmem_startup(void);
@@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate);
static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
pgssVersion api_version,
@@ -841,6 +847,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
0,
0,
NULL,
+ NULL,
&jstate);
}
@@ -944,6 +951,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
queryDesc->totaltime->total * 1000.0, /* convert to msec */
queryDesc->estate->es_processed,
&queryDesc->totaltime->bufusage,
+ &queryDesc->totaltime->walusage,
NULL);
}
@@ -989,7 +997,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
BufferUsage bufusage_start,
bufusage;
+ WalUsage walusage_start,
+ walusage;
+
bufusage_start = pgBufferUsage;
+ walusage_start = pgWalUsage;
INSTR_TIME_SET_CURRENT(start);
nested_level++;
@@ -1041,6 +1053,13 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
bufusage.blk_write_time = pgBufferUsage.blk_write_time;
INSTR_TIME_SUBTRACT(bufusage.blk_write_time, bufusage_start.blk_write_time);
+ walusage.wal_bytes =
+ pgWalUsage.wal_bytes - walusage_start.wal_bytes;
+ walusage.wal_records =
+ pgWalUsage.wal_records - walusage_start.wal_records;
+ walusage.wal_fp_records =
+ pgWalUsage.wal_fp_records - walusage_start.wal_fp_records;
+
pgss_store(queryString,
0, /* signal that it's a utility stmt */
pstmt->stmt_location,
@@ -1048,6 +1067,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
INSTR_TIME_GET_MILLISEC(duration),
rows,
&bufusage,
+ &walusage,
NULL);
}
else
@@ -1083,13 +1103,14 @@ pgss_hash_string(const char *str, int len)
*
* If jstate is not NULL then we're trying to create an entry for which
* we have no statistics as yet; we just want to record the normalized
- * query string. total_time, rows, bufusage are ignored in this case.
+ * query string. Time and usage are ignored in this case.
*/
static void
pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate)
{
pgssHashKey key;
@@ -1281,6 +1302,9 @@ pgss_store(const char *query, uint64 queryId,
e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
+ e->counters.wal_bytes_written += walusage->wal_bytes;
+ e->counters.wal_records_written += walusage->wal_records;
+ e->counters.wal_fp_records_written += walusage->wal_fp_records;
SpinLockRelease(&e->mutex);
}
@@ -1328,7 +1352,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
#define PG_STAT_STATEMENTS_COLS_V1_1 18
#define PG_STAT_STATEMENTS_COLS_V1_2 19
#define PG_STAT_STATEMENTS_COLS_V1_3 23
-#define PG_STAT_STATEMENTS_COLS 23 /* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_4 26
+#define PG_STAT_STATEMENTS_COLS 26 /* maximum of above */
/*
* Retrieve statement statistics.
@@ -1340,6 +1365,15 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
* expected API version is identified by embedding it in the C name of the
* function. Unfortunately we weren't bright enough to do that for 1.1.
*/
+Datum pg_stat_statements_1_4(PG_FUNCTION_ARGS)
+{
+ bool showtext = PG_GETARG_BOOL(0);
+
+ pg_stat_statements_internal(fcinfo, PGSS_V1_4, showtext);
+
+ return (Datum)0;
+}
+
Datum
pg_stat_statements_1_3(PG_FUNCTION_ARGS)
{
@@ -1445,6 +1479,10 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
if (api_version != PGSS_V1_3)
elog(ERROR, "incorrect number of output arguments");
break;
+ case PG_STAT_STATEMENTS_COLS_V1_4:
+ if (api_version != PGSS_V1_4)
+ elog(ERROR, "incorrect number of output arguments");
+ break;
default:
elog(ERROR, "incorrect number of output arguments");
}
@@ -1641,11 +1679,18 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
values[i++] = Float8GetDatumFast(tmp.blk_read_time);
values[i++] = Float8GetDatumFast(tmp.blk_write_time);
}
+ if (api_version >= PGSS_V1_4)
+ {
+ values[i++] = Int64GetDatumFast(tmp.wal_bytes_written);
+ values[i++] = Int64GetDatumFast(tmp.wal_records_written);
+ values[i++] = Int64GetDatumFast(tmp.wal_fp_records_written);
+ }
Assert(i == (api_version == PGSS_V1_0 ? PG_STAT_STATEMENTS_COLS_V1_0 :
api_version == PGSS_V1_1 ? PG_STAT_STATEMENTS_COLS_V1_1 :
api_version == PGSS_V1_2 ? PG_STAT_STATEMENTS_COLS_V1_2 :
api_version == PGSS_V1_3 ? PG_STAT_STATEMENTS_COLS_V1_3 :
+ api_version == PGSS_V1_4 ? PG_STAT_STATEMENTS_COLS_V1_4 :
-1 /* fail if you forget to update this assert */ ));
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 14cb422354..7fb20df886 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
# pg_stat_statements extension
comment = 'track execution statistics of all SQL statements executed'
-default_version = '1.7'
+default_version = '1.8'
module_pathname = '$libdir/pg_stat_statements'
relocatable = true
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 8b527070d4..5184a4bbb0 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -99,7 +99,71 @@ SELECT * FROM test ORDER BY a;
-- SELECT with IN clause
SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- SELECT check WAL usage stats to confirm temp tables do not get stored in WAL
+SELECT query, calls, rows,
+wal_write_bytes, wal_write_records, wal_write_fp_records
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- CHECKPOINT before WAL tests to ensure test stability
+CHECKPOINT;
+
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+
+-- explicit transaction
+BEGIN;
+UPDATE pgss_test SET b = '111' WHERE a = 1 ;
+COMMIT;
+
+BEGIN \;
+UPDATE pgss_test SET b = '222' WHERE a = 2 \;
+COMMIT ;
+
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
+
+BEGIN \;
+UPDATE pgss_test SET b = '555' WHERE a = 5 \;
+UPDATE pgss_test SET b = '666' WHERE a = 6 \;
+COMMIT ;
+
+-- many INSERT values
+INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+
+-- SELECT with constants
+SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ;
+
+SELECT *
+ FROM pgss_test
+ WHERE a > 9
+ ORDER BY a ;
+
+-- SELECT without constants
+SELECT * FROM pgss_test ORDER BY a;
+
+-- SELECT with IN clause
+SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5);
+
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+
+-- SELECT usage data, check WAL usage is reported, wal_write_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_write_bytes > 0 as wal_bytes_written,
+wal_write_records > 0 as wal_records_written,
+wal_write_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- pg_stat_statements.track = none
@@ -144,7 +208,7 @@ $$ SELECT (i + 1.0)::INTEGER LIMIT 1 $$ LANGUAGE SQL;
SELECT PLUS_ONE(8);
SELECT PLUS_ONE(10);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- pg_stat_statements.track = all
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 26bb82da4a..40e79f1866 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -221,6 +221,33 @@
</entry>
</row>
+ <row>
+ <entry><structfield>wal_write_bytes</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total amount of WAL bytes generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_write_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL records generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_write_fp_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL full page images generated by the statement
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
--
2.20.1
v6-0003-Expose-WAL-usage-counters-in-verbose-auto-vacuum-.patch (text/plain; charset=us-ascii)
From d956378b4f464844d8fe8f6d8c99fab1c9f11243 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 19 Mar 2020 16:08:47 +0100
Subject: [PATCH v6 3/3] Expose WAL usage counters in verbose (auto)vacuum
output.
---
src/backend/access/heap/vacuumlazy.c | 29 +++++++++++++++++++++++++++-
1 file changed, 28 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..32e6023738 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -65,6 +65,7 @@
#include "commands/dbcommands.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "optimizer/paths.h"
#include "pgstat.h"
@@ -381,6 +382,8 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
int nindexes;
PGRUsage ru0;
TimestampTz starttime = 0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
long secs;
int usecs;
double read_rate,
@@ -569,6 +572,12 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
TimestampDifference(starttime, endtime, &secs, &usecs);
+ walusage.wal_records = pgWalUsage.wal_records -
+ walusage_start.wal_records;
+ walusage.wal_fp_records = pgWalUsage.wal_fp_records -
+ walusage_start.wal_fp_records;
+ walusage.wal_bytes = pgWalUsage.wal_bytes - walusage_start.wal_bytes;
+
read_rate = 0;
write_rate = 0;
if ((secs > 0) || (usecs > 0))
@@ -620,7 +629,12 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
(long long) VacuumPageDirty);
appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
read_rate, write_rate);
- appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf,
+ _("WAL usage: %ld records, %ld full page records, %ld bytes"),
+ walusage.wal_records,
+ walusage.wal_fp_records,
+ walusage.wal_bytes);
ereport(LOG,
(errmsg_internal("%s", buf.data)));
@@ -713,6 +727,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
IndexBulkDeleteResult **indstats;
int i;
PGRUsage ru0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
Buffer vmbuffer = InvalidBuffer;
BlockNumber next_unskippable_block;
bool skipping_blocks;
@@ -1690,6 +1706,17 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
"%u pages are entirely empty.\n",
empty_pages),
empty_pages);
+
+ walusage.wal_records = pgWalUsage.wal_records -
+ walusage_start.wal_records;
+ walusage.wal_fp_records = pgWalUsage.wal_fp_records -
+ walusage_start.wal_fp_records;
+ walusage.wal_bytes = pgWalUsage.wal_bytes - walusage_start.wal_bytes;
+ appendStringInfo(&buf, _("%ld WAL records, %ld WAL full page records, %ld WAL bytes\n"),
+ walusage.wal_records,
+ walusage.wal_fp_records,
+ walusage.wal_bytes);
+
appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
ereport(elevel,
--
2.20.1
I'm attaching a v5 with fp records only for temp tables, so there's no risk of
instability. As I previously said I'm fine with your two patches, so unless
you have objections on the fpi test for temp tables or the documentation
changes, I believe those should be ready for committer.
You added the columns into pg_stat_database, but seem to have forgotten to
update the documentation for pg_stat_database.
Ah right, I totally missed that when I tried to clean up the original POC.
Is it really reasonable to add the columns for vacuum's WAL usage into
pg_stat_database? I'm not sure how useful the information about
the amount of WAL generated by vacuum per database is.
The amount per database isn't really useful, but I didn't have a better idea on
how to expose (auto)vacuum WAL usage until this:
Isn't it better to make VACUUM VERBOSE and the autovacuum log include
that information instead, to see how much WAL each vacuum activity
generates? Sorry if this discussion has already been done
upthread.
That's a way better idea! I'm attaching the full patchset with the 3rd patch
to use this approach instead. There's a bit of duplicate code for computing the
WalUsage, as I didn't find a better way to avoid that without exposing
WalUsageAccumDiff().
Autovacuum log sample:
2020-03-19 15:49:05.708 CET [5843] LOG: automatic vacuum of table "rjuju.public.t1": index scans: 0
pages: 0 removed, 2213 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 250000 removed, 250000 remain, 0 are dead but not yet removable, oldest xmin: 502
buffer usage: 4448 hits, 4 misses, 4 dirtied
avg read rate: 0.160 MB/s, avg write rate: 0.160 MB/s
system usage: CPU: user: 0.13 s, system: 0.00 s, elapsed: 0.19 s
WAL usage: 6643 records, 4 full page records, 1402679 bytes
VACUUM log sample:
# vacuum VERBOSE t1;
INFO: vacuuming "public.t1"
INFO: "t1": removed 50000 row versions in 443 pages
INFO: "t1": found 50000 removable, 0 nonremovable row versions in 443 out of 443 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 512
There were 50000 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
1332 WAL records, 4 WAL full page records, 306901 WAL bytes
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
INFO: "t1": truncated 443 to 0 pages
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO: vacuuming "pg_toast.pg_toast_16385"
INFO: index "pg_toast_16385_index" now contains 0 row versions in 1 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: "pg_toast_16385": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 513
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
0 WAL records, 0 WAL full page records, 0 WAL bytes
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
Note that the 3rd patch is an addition on top of Kirill's original patch, as
this is information that would have been greatly helpful in some
performance issues I had to investigate recently. I'd be happy to have it land
in v13, but if that's controversial or too late I'm happy to postpone it to
v14 if the infrastructure added in Kirill's patches can make it to v13.
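The duplicated delta computation mentioned above could be factored into a single helper. What follows is only a minimal sketch: the struct is a stand-in for the patch's WalUsage with assumed field types, and `wal_usage_diff` is a hypothetical name (the thread mentions WalUsageAccumDiff(), whose actual signature is not shown here).

```c
#include <stdint.h>

/* Stand-in for the patch's WalUsage struct; the field types are assumed. */
typedef struct WalUsage
{
    int64_t wal_records;    /* number of WAL records produced */
    int64_t wal_fp_records; /* number of full page images produced */
    int64_t wal_bytes;      /* size of WAL records produced */
} WalUsage;

/* Hypothetical helper: return (end - start) for every counter. */
WalUsage
wal_usage_diff(const WalUsage *end, const WalUsage *start)
{
    WalUsage diff;

    diff.wal_records = end->wal_records - start->wal_records;
    diff.wal_fp_records = end->wal_fp_records - start->wal_fp_records;
    diff.wal_bytes = end->wal_bytes - start->wal_bytes;
    return diff;
}
```

Both heap_vacuum_rel() and lazy_scan_heap() could then take a snapshot of the global counters on entry and call such a helper once before building their log message.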
Dear all, can we please focus on getting the core patch committed?
Given the uncertainty regarding autovacuum stats, can we please get
parts 1 and 2 into the codebase, and think about exposing autovacuum
stats later?
On 2020/03/23 7:32, Kirill Bychik wrote:
Dear all, can we please focus on getting the core patch committed?
Given the uncertainty regarding autovacuum stats, can we please get
parts 1 and 2 into the codebase, and think about exposing autovacuum
stats later?
Here are the comments for 0001 patch.
+ /*
+ * Report a full page image constructed for the WAL record
+ */
+ pgWalUsage.wal_fp_records++;
Isn't it better to use "fpw" or "fpi" for the variable name rather than
"fp" here? In other places, "fpw" and "fpi" are used for full page
writes/image.
ISTM that this counter could be incorrect if XLogInsertRecord() decides that
whether an FPI is necessary must be calculated again. No? IOW, this issue could
happen if XLogInsert() calls XLogRecordAssemble() multiple times in
its do-while loop. Isn't this problematic?
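The concern can be illustrated with a toy model. This is a minimal sketch, not PostgreSQL code: `assemble_record`, `try_insert`, `insert_with_retries` and the two counters are hypothetical stand-ins for XLogRecordAssemble(), XLogInsertRecord(), the loop in XLogInsert() and the usage counter. It shows that incrementing inside the assemble step counts every retry, while incrementing only after a successful insert counts each image once.

```c
#include <stdbool.h>

/* Hypothetical counters: one bumped during assembly, one after success. */
long fpi_counted_in_assemble = 0;
long fpi_counted_after_insert = 0;

/* Stand-in for XLogRecordAssemble(): builds the record, reports FPI count. */
int
assemble_record(int pages_needing_fpi)
{
    fpi_counted_in_assemble += pages_needing_fpi;   /* counted on every call */
    return pages_needing_fpi;
}

/* Stand-in for XLogInsertRecord(): fails the first *retries attempts. */
bool
try_insert(int *retries)
{
    if (*retries > 0)
    {
        (*retries)--;
        return false;           /* caller must re-assemble and try again */
    }
    return true;
}

/* Stand-in for the do-while loop in XLogInsert(). */
void
insert_with_retries(int pages_needing_fpi, int retries)
{
    int nfpi;

    do
    {
        nfpi = assemble_record(pages_needing_fpi);
    } while (!try_insert(&retries));

    fpi_counted_after_insert += nfpi;   /* counted once per inserted record */
}
```

With one forced retry and two pages needing an image, the in-assemble counter ends at 4 while the after-insert counter ends at 2, which is the over-count being asked about.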
+ long wal_bytes; /* size of wal records produced */
Isn't it safer to use uint64 (i.e., XLogRecPtr) as the type of this variable
rather than long?
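The width issue can be shown in plain C. This is a minimal sketch, not PostgreSQL code, and both function names are hypothetical: `long` is only guaranteed to hold 32 bits and is 32 bits wide on some platforms (64-bit Windows, for example), whereas `uint64_t` has the same width everywhere it exists.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a byte count still fits in a signed 32-bit counter,
 * the minimum width the C standard guarantees for long. */
bool
fits_in_32bit_long(uint64_t bytes)
{
    return bytes <= (uint64_t) INT32_MAX;
}

/* Hypothetical accumulator using a fixed 64-bit type, as the review suggests. */
uint64_t
add_wal_bytes(uint64_t total, uint64_t record_len)
{
    return total + record_len;
}
```

A statement that generates 3 GiB of WAL already exceeds what a 32-bit `long` can represent, so a fixed-width 64-bit counter avoids platform-dependent overflow.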
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, bufusage_space);
bufusage_space should be walusage_space here?
/*
* Finish parallel execution. We wait for parallel workers to finish, and
* accumulate their buffer usage.
*/
There are some comments in execParallel.c that mention only buffer usage,
for example the top comment for ExecParallelFinish() shown above.
These should be updated.
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
On 2020/03/23 21:01, Fujii Masao wrote:
Here are the comments for 0001 patch.
Here are the comments for 0002 patch.
+ OUT wal_write_bytes int8,
+ OUT wal_write_records int8,
+ OUT wal_write_fp_records int8
Isn't "write" part in the column names confusing because it's WAL
*generated* (not written) by the statement?
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4'
+LANGUAGE C STRICT VOLATILE;
PARALLEL SAFE should be specified?
+/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */
ISTM it's good timing to also have pg_stat_statements--1.8.sql since
the definition of pg_stat_statements() is changed. Thoughts?
+-- CHECKPOINT before WAL tests to ensure test stability
+CHECKPOINT;
Is this true? I thought you added this because the number of FPI
should be larger than zero in the subsequent test. No? But there
seems to be no such test. I'm not excited about adding the test checking
the number of FPI because it looks fragile, though...
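Why an FPI count depends on checkpoint timing can be sketched with a toy model. This is a simplified illustration assuming full_page_writes is on, not PostgreSQL code; all names are hypothetical. A page needs a full page image when its last logged change is not newer than the start of the latest checkpoint.

```c
#include <stdbool.h>
#include <stdint.h>

#define NPAGES 4

/* Toy state: each page remembers the LSN of its last WAL-logged change. */
uint64_t page_lsn[NPAGES];
uint64_t redo_lsn = 0;      /* start of the latest checkpoint */
uint64_t next_lsn = 1;      /* next LSN to hand out */
long     fpi_count = 0;     /* full page images logged so far */

/* A checkpoint moves the redo point past every change logged so far. */
void
toy_checkpoint(void)
{
    redo_lsn = next_lsn++;
}

/* Modify a page; return true if a full page image had to be logged. */
bool
toy_modify_page(int page)
{
    bool needs_fpi = page_lsn[page] <= redo_lsn;

    if (needs_fpi)
        fpi_count++;
    page_lsn[page] = next_lsn++;
    return needs_fpi;
}
```

In this model the first change to a page after a checkpoint logs an image and later changes do not, so a test asserting an exact FPI count would change its result whenever a checkpoint happens to run in between.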
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
Could you tell me why several queries need to be run to test
the WAL usage? Isn't running a few queries enough for the test purpose?
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
On Mon, Mar 23, 2020 at 3:24 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Here are the comments for 0002 patch.
FTR I marked the commitfest entry as waiting on author.
Kirill, do you think you'll have time to address Fujii-san's review
shortly? The end of the commitfest is approaching quite fast :(
I'm attaching a v5 with fp records only for temp tables, so there's no risk of
instability. As I previously said I'm fine with your two patches, so unless
you have objections on the fpi test for temp tables or the documentation
changes, I believe those should be ready for committer.You added the columns into pg_stat_database, but seem to forget to
update the document for pg_stat_database.Ah right, I totally missed that when I tried to clean up the original POC.
Is it really reasonable to add the columns for vacuum's WAL usage into
pg_stat_database? I'm not sure how much the information about
the amount of WAL generated by vacuum per database is useful.The amount per database isn't really useful, but I didn't had a better idea on
how to expose (auto)vacuum WAL usage until this:Isn't it better to make VACUUM VERBOSE and autovacuum log include
that information, instead, to see how much each vacuum activity
generates the WAL? Sorry if this discussion has already been done
upthread.That's a way better idea! I'm attaching the full patchset with the 3rd patch
to use this approach instead. There's a bit a duplicate code for computing the
WalUsage, as I didn't find a better way to avoid that without exposing
WalUsageAccumDiff().Autovacuum log sample:
2020-03-19 15:49:05.708 CET [5843] LOG: automatic vacuum of table "rjuju.public.t1": index scans: 0
pages: 0 removed, 2213 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 250000 removed, 250000 remain, 0 are dead but not yet removable, oldest xmin: 502
buffer usage: 4448 hits, 4 misses, 4 dirtied
avg read rate: 0.160 MB/s, avg write rate: 0.160 MB/s
system usage: CPU: user: 0.13 s, system: 0.00 s, elapsed: 0.19 s
WAL usage: 6643 records, 4 full page records, 1402679 bytesVACUUM log sample:
# vacuum VERBOSE t1;
INFO: vacuuming "public.t1"
INFO: "t1": removed 50000 row versions in 443 pages
INFO: "t1": found 50000 removable, 0 nonremovable row versions in 443 out of 443 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 512
There were 50000 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
1332 WAL records, 4 WAL full page records, 306901 WAL bytes
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
INFO: "t1": truncated 443 to 0 pages
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO: vacuuming "pg_toast.pg_toast_16385"
INFO: index "pg_toast_16385_index" now contains 0 row versions in 1 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: "pg_toast_16385": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 513
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
0 WAL records, 0 WAL full page records, 0 WAL bytes
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
Note that the 3rd patch is an addition on top of Kirill's original patch, as
this is information that would have been greatly helpful for some
performance issues I had to investigate recently. I'd be happy to have it land
in v13, but if that's controversial or too late I'm happy to postpone it to
v14 if the infrastructure added in Kirill's patches can make it to v13.
Dear all, can we please focus on getting the core patch committed?
Given the uncertainty regarding autovacuum stats, can we please get
parts 1 and 2 into the codebase, and think about exposing autovacuum
stats later?
Here are the comments for 0001 patch.
+ /*
+  * Report a full page image constructed for the WAL record
+  */
+ pgWalUsage.wal_fp_records++;
Isn't it better to use "fpw" or "fpi" for the variable name rather than
"fp" here? In other places, "fpw" and "fpi" are used for full page
writes/image.
ISTM that this counter could be incorrect if XLogInsertRecord() determines to
calculate again whether FPI is necessary or not. No? IOW, this issue could
happen if XLogInsert() calls XLogRecordAssemble() multiple times in
its do-while loop. Isn't this problematic?
+ long wal_bytes; /* size of wal records produced */
Isn't it safer to use uint64 (i.e., XLogRecPtr) as the type of this variable
rather than long?
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, bufusage_space);
bufusage_space should be walusage_space here?
/*
* Finish parallel execution. We wait for parallel workers to finish, and
* accumulate their buffer usage.
*/
There are some comments mentioning buffer usage, in execParallel.c.
For example, the top comment for ExecParallelFinish(), as the above.
These should be updated.
Here are the comments for 0002 patch.
+ OUT wal_write_bytes int8,
+ OUT wal_write_records int8,
+ OUT wal_write_fp_records int8
Isn't "write" part in the column names confusing because it's WAL
*generated* (not written) by the statement?
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4'
+LANGUAGE C STRICT VOLATILE;
PARALLEL SAFE should be specified?
+/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */
ISTM it's good timing to have also pg_stat_statements--1.8.sql since
the definition of pg_stat_statements() is changed. Thoughts?
+-- CHECKPOINT before WAL tests to ensure test stability
+CHECKPOINT;
Is this true? I thought you added this because the number of FPI
should be larger than zero in the subsequent test. No? But there
seems to be no such test. I'm not excited about adding the test checking
the number of FPI because it looks fragile, though...
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
Could you tell me why several queries need to be run to test
the WAL usage? Isn't running a few queries enough for the test's purpose?
FTR I marked the commitfest entry as waiting on author.
Kirill do you think you'll have time to address Fuji-san's review
shortly? The end of the commitfest is approaching quite fast :(
All these are really valuable objections. Unfortunately, I won't be
able to get all sorted out soon, due to total lack of time. I would be
very glad if somebody could step in for this patch.
On Fri, Mar 27, 2020 at 8:21 PM Kirill Bychik <kirill.bychik@gmail.com> wrote:
All these are really valuable objections. Unfortunately, I won't be
able to get all sorted out soon, due to total lack of time. I would be
very glad if somebody could step in for this patch.
I'll try to do that tomorrow!
On Sat, Mar 28, 2020 at 12:54 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Fri, Mar 27, 2020 at 8:21 PM Kirill Bychik <kirill.bychik@gmail.com> wrote:
All these are really valuable objections. Unfortunately, I won't be
able to get all sorted out soon, due to total lack of time. I would be
very glad if somebody could step in for this patch.
I'll try to do that tomorrow!
I see some basic problems with the patch. The way it tries to compute
WAL usage for parallel stuff doesn't seem right to me. Can you share
or point me to any test done where we have computed WAL for parallel
operations like Parallel Vacuum or Parallel Create Index? Basically,
I don't know whether the changes done in ExecInitParallelPlan and friends allow us
to compute WAL for parallel operations. Those will primarily cover
parallel queries that won't write WAL. How have you tested those
changes?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote:
I see some basic problems with the patch. The way it tries to compute
WAL usage for parallel stuff doesn't seem right to me. Can you share
or point me to any test done where we have computed WAL for parallel
operations like Parallel Vacuum or Parallel Create Index?
Ah, that's indeed a good point and AFAICT WAL records from parallel utility
workers won't be accounted for. That being said, I think that an argument
could be made that proper infrastructure should have been added in the original
parallel utility patches, as pg_stat_statements is already broken wrt. buffer
usage in parallel utility, unless I'm missing something.
Basically,
I don't know whether the changes done in ExecInitParallelPlan and friends allow us
to compute WAL for parallel operations. Those will primarily cover
parallel queries that won't write WAL. How have you tested those
changes?
I didn't test those, and I'm not even sure how to properly and reliably test
that. Do you have any advice on how to achieve that?
However the patch is mimicking the buffer instrumentation that already exists,
and the approach also looks correct to me. Do you have a reason to believe
that the approach that works for buffer usage wouldn't work for WAL records? (I
of course agree that this should be tested anyway)
On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote:
Ah, that's indeed a good point and AFAICT WAL records from parallel utility
workers won't be accounted for. That being said, I think that an argument
could be made that proper infrastructure should have been added in the original
parallel utility patches, as pg_stat_statements is already broken wrt. buffer
usage in parallel utility, unless I'm missing something.
Just to be sure I did a quick test with pg_stat_statements behavior using
parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage
doesn't reflect parallel workers' activity.
I added an open item for that, and I'm adding Robert in Cc as 9da0cc352 is the first
commit adding parallel maintenance.
On Sat, Mar 28, 2020 at 8:47 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Just to be sure I did a quick test with pg_stat_statements behavior using
parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage
doesn't reflect parallel workers' activity.
Sawada-San would like to investigate this? If not, I will look into
this next week.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Mar 28, 2020 at 7:08 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
I didn't test those, and I'm not even sure how to properly and reliably test
that. Do you have any advice on how to achieve that?
However the patch is mimicking the buffer instrumentation that already exists,
and the approach also looks correct to me. Do you have a reason to believe
that the approach that works for buffer usage wouldn't work for WAL records? (I
of course agree that this should be tested anyway)
The buffer usage infrastructure is for read-only queries (for ex. for
stats like blks_hit, blks_read). As far as I can think, there is no
easy way to test the WAL usage via that API. It might or might not be
required in the future depending on whether we decide to use the same
infrastructure for parallel writes. I think for now we should remove
that part of changes and rather think how to get that for parallel
operations that can write WAL. For ex. we might need to do something
similar to what this patch has done in begin_parallel_vacuum and
end_parallel_vacuum. Would you like to attempt that?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sun, 29 Mar 2020 at 14:23, Amit Kapila <amit.kapila16@gmail.com> wrote:
Sawada-San would like to investigate this? If not, I will look into
this next week.
Sure, I'll investigate this issue today.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, 29 Mar 2020 at 15:19, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
Sure, I'll investigate this issue today.
I've run vacuum with and without parallel workers on a table having 5
indexes. The vacuum reads all blocks of the table and its indexes.
* VACUUM command with no parallel workers
=# select total_time, shared_blks_hit, shared_blks_read,
shared_blks_hit + shared_blks_read as total_read_blks,
shared_blks_dirtied, shared_blks_written from pg_stat_statements where
query ~ 'vacuum';
  total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
--------------+-----------------+------------------+-----------------+---------------------+---------------------
 19857.217207 |           45238 |           226944 |          272182 |              225943 |              225894
(1 row)
* VACUUM command with 4 parallel workers
=# select total_time, shared_blks_hit, shared_blks_read,
shared_blks_hit + shared_blks_read as total_read_blks,
shared_blks_dirtied, shared_blks_written from pg_stat_statements where
query ~ 'vacuum';
 total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
-------------+-----------------+------------------+-----------------+---------------------+---------------------
 6932.117365 |           45205 |            73079 |          118284 |               72403 |               72365
(1 row)
The table and its indexes total about 182243 blocks. As Julien
reported, the total number of read blocks reported for the parallel
vacuum (118284) is clearly much lower than the single-process vacuum's
result (272182).
Parallel create index has the same issue, but it doesn't exist in
parallel queries for SELECTs.
I think we need to change parallel maintenance commands so that they
report buffer usage the way ParallelQueryMain() does: call
InstrStartParallelQuery() to start tracking buffer usage before the
work begins, and InstrEndParallelQuery() to report it once the parallel
maintenance command is done. To report buffer usage of parallel
maintenance commands correctly, I'm thinking that we can (1) change
parallel create index and parallel vacuum so that each of them prepares
to gather buffer usage, or (2) have a common entry point for parallel
maintenance commands that is responsible for gathering buffer usage
and calling the entry functions for the individual maintenance commands.
I'll investigate it more in depth.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Mar 29, 2020 at 11:03:50AM +0530, Amit Kapila wrote:
The buffer usage infrastructure is for read-only queries (for ex. for
stats like blks_hit, blks_read). As far as I can think, there is no
easy way to test the WAL usage via that API. It might or might not be
required in the future depending on whether we decide to use the same
infrastructure for parallel writes.
I'm not sure that I get your point. I'm assuming that you meant
parallel-read-only queries, but surely buffer usage infrastructure for
parallel query relies on the same approach as non-parallel one (each node
computes the process-local pgBufferUsage diff) and sums all of that at the end
of the parallel query execution. I also don't see how whether the query is
read-only or not is relevant here as far as instrumentation is concerned,
especially since read-only query can definitely do writes and increase the
count of dirtied buffers, like a write query would. For instance a hint
bit change can be done in a parallel query AFAIK, and this can generate WAL
records if wal_log_hints is enabled, so that's probably one way to test it.
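To make the snapshot-and-diff accounting described above concrete, here is a
minimal C sketch applied to WAL counters. It is an assumption-laden
illustration, not the patch itself: the WalUsage fields are simplified (with
wal_bytes as a 64-bit unsigned value, as suggested in the review),
WalUsageAccumDiff() only mirrors the name mentioned upthread, and
simulate_wal_insert() is a hypothetical helper standing in for a real WAL
record insertion.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the patch's WAL usage counters (fields assumed). */
typedef struct WalUsage
{
    int64_t  wal_records;   /* number of WAL records produced */
    int64_t  wal_fpi;       /* number of full page images produced */
    uint64_t wal_bytes;     /* bytes of WAL produced */
} WalUsage;

/* Process-local running total, bumped on every WAL insert. */
static WalUsage pgWalUsage;

/* dst += (end - start): account for the work done between two snapshots. */
static void
WalUsageAccumDiff(WalUsage *dst, const WalUsage *end, const WalUsage *start)
{
    assert(end->wal_records >= start->wal_records);
    dst->wal_records += end->wal_records - start->wal_records;
    dst->wal_fpi += end->wal_fpi - start->wal_fpi;
    dst->wal_bytes += end->wal_bytes - start->wal_bytes;
}

/* Hypothetical helper: pretend one record of 'len' bytes was inserted. */
static void
simulate_wal_insert(uint64_t len, int has_fpi)
{
    pgWalUsage.wal_records++;
    pgWalUsage.wal_bytes += len;
    if (has_fpi)
        pgWalUsage.wal_fpi++;
}
```

Each process would take a snapshot of pgWalUsage before running a node or
statement, take another one afterwards, and accumulate the difference; a
parallel leader would then add the workers' accumulated values to its own.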
I now think that not adding support for WAL buffers in EXPLAIN output in the
initial patch scope was a mistake, as this is probably the best way to test the
WAL counters for parallel queries. This shouldn't be hard to add though, and I
can work on it quickly if there's still a chance to get this feature included
in pg13.
I think for now we should remove
that part of changes and rather think how to get that for parallel
operations that can write WAL. For ex. we might need to do something
similar to what this patch has done in begin_parallel_vacuum and
end_parallel_vacuum. Would you like to attempt that?
Do you mean removing WAL buffers instrumentation from parallel query
infrastructure?
For parallel utility that can do writes it's probably better to keep the
discussion in the other part of the thread. I tried to think a little bit
about that, but for now I don't have a better idea than adding something
similar to instrumentation for utility command to have a general infrastructure,
as building a workaround for specific utility looks like the wrong approach.
But this would require quite important changes in utility handling, which is maybe
not a good idea a couple of weeks before the feature freeze, and that is
definitely not backpatchable so that won't fix the issue for parallel index
build that exists since pg11.
On Sun, Mar 29, 2020 at 9:52 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
Sure, I'll investigate this issue today.
Thanks for looking at it!
I think we need to change parallel maintenance commands so that they
report buffer usage the way ParallelQueryMain() does: call
InstrStartParallelQuery() to start tracking buffer usage before the
work begins, and InstrEndParallelQuery() to report it once the parallel
maintenance command is done. To report buffer usage of parallel
maintenance commands correctly, I'm thinking that we can (1) change
parallel create index and parallel vacuum so that each of them prepares
to gather buffer usage, or (2) have a common entry point for parallel
maintenance commands that is responsible for gathering buffer usage
and calling the entry functions for the individual maintenance commands.
I'll investigate it more in depth.
As I just mentioned, (2) seems like a better design as it's quite
likely that the number of parallel-aware utilities will
continue to increase. One problem also is that parallel CREATE INDEX
has been introduced in pg11, so (2) probably won't be backpatchable
(and (1) seems problematic too).
On Sun, Mar 29, 2020 at 1:44 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
As I just mentioned, (2) seems like a better design as it's quite
likely that the number of parallel-aware utilities will
continue to increase. One problem also is that parallel CREATE INDEX
has been introduced in pg11, so (2) probably won't be backpatchable
(and (1) seems problematic too).
I am not sure if we can decide at this stage whether it is
back-patchable or not. Let's first see the patch and if it turns out
to be complex, then we can try to do some straight-forward fix for
back-branches. In general, I don't see why the fix here should be
complex?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sun, Mar 29, 2020 at 1:26 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
I'm not sure that I get your point. I'm assuming that you meant
parallel-read-only queries, but surely buffer usage infrastructure for
parallel query relies on the same approach as non-parallel one (each node
computes the process-local pgBufferUsage diff) and sums all of that at the end
of the parallel query execution. I also don't see how whether the query is
read-only or not is relevant here as far as instrumentation is concerned,
especially since read-only query can definitely do writes and increase the
count of dirtied buffers, like a write query would. For instance a hint
bit change can be done in a parallel query AFAIK, and this can generate WAL
records if wal_log_hints is enabled, so that's probably one way to test it.
Yeah, that way we can test it. Can you try that?
I now think that not adding support for WAL buffers in EXPLAIN output in the
initial patch scope was a mistake, as this is probably the best way to test the
WAL counters for parallel queries. This shouldn't be hard to add though, and I
can work on it quickly if there's still a chance to get this feature included
in pg13.
I am not sure we will add it in Explain or not (maybe we need inputs
from others in this regard), but if it helps in testing this part of
the patch, then it is a good idea to write a patch for it. You might
want to keep it separate from the main patch as we might not commit
it.
I think for now we should remove
that part of changes and rather think how to get that for parallel
operations that can write WAL. For ex. we might need to do something
similar to what this patch has done in begin_parallel_vacuum and
end_parallel_vacuum. Would you like to attempt that?
Do you mean removing WAL buffers instrumentation from parallel query
infrastructure?
Yes, I meant that but now I realize we need those and your proposed
way of testing it can help us in validating those changes.
For parallel utility that can do writes it's probably better to keep the
discussion in the other part of the thread.
Sure, I am fine with that but I am not sure if it is a good idea to
commit this patch without having a way to compute WAL utilization for
those commands.
I tried to think a little bit
about that, but for now I don't have a better idea than adding something
similar to instrumentation for utility command to have a general infrastructure,
as building a workaround for specific utility looks like the wrong approach.
I don't know what exactly you have in mind as I don't see why it
should be too complex. Let's wait for a patch from Sawada-San on
buffer usage stuff and in the meantime, we can work on other parts of
this patch.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sun, 29 Mar 2020 at 20:15, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sun, Mar 29, 2020 at 1:44 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sun, Mar 29, 2020 at 9:52 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
I've run vacuum with/without parallel workers on the table having 5
indexes. The vacuum reads all blocks of table and indexes.
* VACUUM command with no parallel workers
=# select total_time, shared_blks_hit, shared_blks_read,
shared_blks_hit + shared_blks_read as total_read_blks,
shared_blks_dirtied, shared_blks_written from pg_stat_statements where
query ~ 'vacuum';
  total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
--------------+-----------------+------------------+-----------------+---------------------+---------------------
 19857.217207 |           45238 |           226944 |          272182 |              225943 |              225894
(1 row)
* VACUUM command with 4 parallel workers
=# select total_time, shared_blks_hit, shared_blks_read,
shared_blks_hit + shared_blks_read as total_read_blks,
shared_blks_dirtied, shared_blks_written from pg_stat_statements where
query ~ 'vacuum';
 total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
-------------+-----------------+------------------+-----------------+---------------------+---------------------
 6932.117365 |           45205 |            73079 |          118284 |               72403 |               72365
(1 row)
The total number of blocks of table and indexes are about 182243
blocks. As Julien reported, obviously the total number of read blocks
during parallel vacuum is much less than single process vacuum's
result.
Parallel create index has the same issue but it doesn't exist in
parallel queries for SELECTs.
I think we need to change parallel maintenance commands so that they
report buffer usage like what ParallelQueryMain() does; prepare to
track buffer usage during query execution by
InstrStartParallelQuery(), and report it by InstrEndParallelQuery()
after parallel maintenance command. To report buffer usage of parallel
maintenance command correctly, I'm thinking that we can (1) change
parallel create index and parallel vacuum so that they prepare
gathering buffer usage, or (2) have a common entry point for parallel
maintenance commands that is responsible for gathering buffer usage
and calling the entry functions for individual maintenance command.
I'll investigate it more in depth.
As I just mentioned, (2) seems like a better design as it's quite
likely that the number of parallel-aware utilities will
continue to increase. One problem also is that parallel CREATE INDEX
has been introduced in pg11, so (2) probably won't be back-patchable
(and (1) seems problematic too).
I am not sure if we can decide at this stage whether it is
back-patchable or not. Let's first see the patch and if it turns out
to be complex, then we can try to do some straight-forward fix for
back-branches.
Agreed.
In general, I don't see why the fix here should be
complex?
Yeah, particularly the approach (1) will not be complex. I'll write a
patch tomorrow.
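For comparison, option (2) from the quoted text, a common entry point that gathers buffer usage and then calls the command-specific entry function, might look roughly like the sketch below. Every name here (DemoBufferUsage, demo_maintenance_worker_main, demo_vacuum_index) is hypothetical and only illustrates the shape of the idea; it is not taken from any patch in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for BufferUsage. */
typedef struct DemoBufferUsage
{
    long        blks_hit;
    long        blks_read;
} DemoBufferUsage;

/* Process-local running total, playing the role of pgBufferUsage. */
static DemoBufferUsage demo_buffer_usage;

/* Signature of a command-specific worker entry function. */
typedef void (*demo_worker_entry) (void *arg);

/*
 * Common entry point: snapshot the counters, run the command-specific
 * entry, then report the delta into this worker's shared slot.
 */
static void
demo_maintenance_worker_main(demo_worker_entry entry, void *arg,
                             DemoBufferUsage *slot)
{
    DemoBufferUsage start = demo_buffer_usage;

    assert(entry != NULL && slot != NULL);
    entry(arg);
    slot->blks_hit = demo_buffer_usage.blks_hit - start.blks_hit;
    slot->blks_read = demo_buffer_usage.blks_read - start.blks_read;
}

/* Example command-specific entry: pretends to read *arg blocks. */
static void
demo_vacuum_index(void *arg)
{
    demo_buffer_usage.blks_read += *(long *) arg;
}
```

With this shape, a new parallel-aware utility only supplies its own entry function and gets usage reporting for free; with option (1), each utility repeats the snapshot and report steps itself.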
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Mar 23, 2020 at 11:24:50PM +0900, Fujii Masao wrote:
Here are the comments for 0001 patch.
+            /*
+             * Report a full page image constructed for the WAL record
+             */
+            pgWalUsage.wal_fp_records++;
Isn't it better to use "fpw" or "fpi" for the variable name rather than
"fp" here? In other places, "fpw" and "fpi" are used for full page
writes/image.
Agreed, I went with fpw.
ISTM that this counter could be incorrect if XLogInsertRecord() determines to
calculate again whether FPI is necessary or not. No? IOW, this issue could
happen if XLogInsert() calls XLogRecordAssemble() multiple times in
its do-while loop. Isn't this problematic?
Yes probably. I also see while adding support for EXPLAIN/auto_explain that
the previous approach was incrementing both records and fpw_records, while it
should be only one of those for each record. I fixed this using the approach I
previously mentioned in [1], which seems to work just fine.
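The double-counting concern can be illustrated with a small simulation. This is a sketch under assumptions: the demo_* names are hypothetical and the loop only mimics the "assemble, try to insert, retry" shape discussed above, not the real XLogInsert() code.

```c
#include <assert.h>
#include <stdbool.h>

static long demo_wal_records;       /* hypothetical usage counter */
static int  demo_assemble_calls;    /* how many times the record was built */

/* Pretend to assemble a record; may run several times per insertion. */
static void
demo_assemble(void)
{
    demo_assemble_calls++;
}

/* Pretend to insert; fails for the first *failures attempts. */
static bool
demo_try_insert(int *failures)
{
    if (*failures > 0)
    {
        (*failures)--;
        return false;
    }
    return true;
}

/*
 * Assemble/insert loop.  The counter is bumped once, after the insertion
 * succeeded, so retries of the assembly step cannot inflate it.
 */
static void
demo_insert(int failures)
{
    bool        inserted;

    assert(failures >= 0);
    do
    {
        demo_assemble();
        inserted = demo_try_insert(&failures);
    } while (!inserted);

    demo_wal_records++;
}
```

Had the increment been placed inside demo_assemble(), two retries would have reported three records for a single insertion.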
+    long        wal_bytes;        /* size of wal records produced */
Isn't it safer to use uint64 (i.e., XLogRecPtr) as the type of this variable
rather than long?
Yes indeed. I switched to uint64, and modified everything accordingly (and
changed pgss to output numeric as there's no other way to handle unsigned int8)
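A small sketch of why an unsigned 64-bit counter does not fit the SQL int8 type, and how it can be rendered as decimal text instead (text being a form a numeric input routine can accept). The demo_* helpers are hypothetical and are not the pg_stat_statements code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format an unsigned 64-bit byte count as decimal text.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int
demo_format_wal_bytes(uint64_t wal_bytes, char *buf, size_t buflen)
{
    int         n;

    assert(buf != NULL);
    n = snprintf(buf, buflen, "%" PRIu64, wal_bytes);
    return (n < 0 || (size_t) n >= buflen) ? -1 : n;
}

/* True if the value is representable as a signed 64-bit integer (int8). */
static int
demo_fits_int8(uint64_t value)
{
    return value <= (uint64_t) INT64_MAX;
}
```

The upper half of the uint64 range has no int8 representation, which is the reason given above for exposing the column as numeric. Separately, long is only 32 bits wide on some platforms (64-bit Windows, for instance), so a byte counter declared as long could wrap far earlier.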
+    shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, bufusage_space);
bufusage_space should be walusage_space here?
Good catch, fixed.
/*
 * Finish parallel execution.  We wait for parallel workers to finish, and
 * accumulate their buffer usage.
 */
There are some comments mentioning buffer usage, in execParallel.c.
For example, the top comment for ExecParallelFinish(), as the above.
These should be updated.
I went through the whole file and quickly checked other places, and I think I
fixed all required comments.
Here are the comments for 0002 patch.
+    OUT wal_write_bytes int8,
+    OUT wal_write_records int8,
+    OUT wal_write_fp_records int8
Isn't "write" part in the column names confusing because it's WAL
*generated* (not written) by the statement?
Agreed, I simply dropped the "_write" part everywhere.
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4'
+LANGUAGE C STRICT VOLATILE;
PARALLEL SAFE should be specified?
Indeed, fixed.
+/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */
ISTM it's good timing to have also pg_stat_statements--1.8.sql since
the definition of pg_stat_statements() is changed. Thought?
As mentioned in the other pgss thread, I think the general agreement is to never
provide a full script anymore, so I didn't change that.
+-- CHECKPOINT before WAL tests to ensure test stability
+CHECKPOINT;
Is this true? I thought you added this because the number of FPI
should be larger than zero in the subsequent test. No? But there
seems no such test. I'm not excited about adding the test checking
the number of FPI because it looks fragile, though...
It should ensure an FPW for each newly touched block, but yes that's quite fragile.
Since I fixed the record / FPW record counters, I saw that this was actually
already broken as there was a mix of FPW and non-FPW, so I dropped the
checkpoint and just tested (wal_record + wal_fpw_record) instead.
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
Could you tell me why several queries need to be run to test
the WAL usage? Isn't running a few queries enough for the test purpose?
As far as I can see it's used to test multiple scenarios (single command /
multiple commands in or outside explicit transaction). It shouldn't add a lot
of overhead and since some commands are issued with "\;" it's also testing
proper query string isolation when a multi-command query string is provided,
which doesn't seem like a bad idea. I didn't change that but I'm not opposed
to removing some of the updates if needed.
Also, to answer Amit Kapila's comments about WAL records and parallel query, I
added support for both EXPLAIN and auto_explain (tab completion and
documentation are also updated), and using a simple table with an index, with
forced parallelism and no leader participation and concurrent update on the
same table, I could test that WAL usage is working as expected:
rjuju=# explain (analyze, wal, verbose) select * from t1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Gather (cost=0.00..8805.05 rows=100010 width=14) (actual time=8.695..47.592 rows=100010 loops=1)
Output: id, val
Workers Planned: 2
Workers Launched: 2
WAL: records=204 bytes=86198
-> Parallel Seq Scan on public.t1 (cost=0.00..8805.05 rows=50005 width=14) (actual time=0.056..29.112 rows=50005 loops
Output: id, val
WAL: records=204 bytes=86198
Worker 0: actual time=0.060..28.995 rows=49593 loops=1
WAL: records=105 bytes=44222
Worker 1: actual time=0.052..29.230 rows=50417 loops=1
WAL: records=99 bytes=41976
Planning Time: 0.038 ms
Execution Time: 53.957 ms
(14 rows)
and the same query when nothing ends up being modified:
rjuju=# explain (analyze, wal, verbose) select * from t1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Gather (cost=0.00..8805.05 rows=100010 width=14) (actual time=9.413..48.187 rows=100010 loops=1)
Output: id, val
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on public.t1 (cost=0.00..8805.05 rows=50005 width=14) (actual time=0.033..24.697 rows=50005 loops
Output: id, val
Worker 0: actual time=0.028..24.786 rows=50447 loops=1
Worker 1: actual time=0.038..24.609 rows=49563 loops=1
Planning Time: 0.282 ms
Execution Time: 55.643 ms
(10 rows)
So it seems to me that WAL usage infrastructure for parallel query is working
just fine. I added the EXPLAIN/auto_explain in a separate commit just in case.
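As a cross-check of the first EXPLAIN output above, the per-worker figures should add up to the node total once the leader accumulates the workers' slots. A minimal sketch, assuming a hypothetical DemoWalUsage struct and demo_* helpers modelled on the patch's "dst += add" accumulation; the full-page-write count is taken as 0 here because the output shows none:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's WalUsage struct. */
typedef struct DemoWalUsage
{
    long        wal_records;
    long        wal_fpw_records;
    uint64_t    wal_bytes;
} DemoWalUsage;

/* dst += add */
static void
demo_add(DemoWalUsage *dst, const DemoWalUsage *add)
{
    assert(dst != NULL && add != NULL);
    dst->wal_records += add->wal_records;
    dst->wal_fpw_records += add->wal_fpw_records;
    dst->wal_bytes += add->wal_bytes;
}

/* Sum the per-worker slots, as the leader does once all workers finished. */
static DemoWalUsage
demo_sum_workers(const DemoWalUsage *slots, size_t nworkers)
{
    DemoWalUsage total = {0, 0, 0};
    size_t      i;

    for (i = 0; i < nworkers; i++)
        demo_add(&total, &slots[i]);
    return total;
}
```

Feeding in the two workers shown above (105 records / 44222 bytes and 99 records / 41976 bytes) yields 204 records and 86198 bytes, matching the Gather node's "WAL: records=204 bytes=86198" line, which is expected since the leader did not participate.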
[1]: /messages/by-id/CAOBaU_aECK1Z7Nn+x=MhvEwrJzK8wyPsPtWAafjqtZN1fYjEmg@mail.gmail.com
Attachments:
v7-0001-Add-infrastructure-to-track-WAL-usage.patch (text/plain; charset=us-ascii)
From 9e90e1a41cb1cbeaa99528a9b7be75f9bf9294c4 Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:41:50 +0100
Subject: [PATCH v7 1/4] Add infrastructure to track WAL usage.
Author: Kirill Bychik, Julien Rouhaud
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
src/backend/access/transam/xlog.c | 11 ++++++
src/backend/access/transam/xloginsert.c | 1 +
src/backend/executor/execParallel.c | 38 ++++++++++++++-----
src/backend/executor/instrument.c | 50 ++++++++++++++++++++++---
src/include/executor/execParallel.h | 1 +
src/include/executor/instrument.h | 19 +++++++++-
src/tools/pgindent/typedefs.list | 1 +
7 files changed, 103 insertions(+), 18 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8fe92962b0..f270b4a0e5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
#include "commands/progress.h"
#include "commands/tablespace.h"
#include "common/controldata_utils.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "pgstat.h"
@@ -1249,6 +1250,16 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
+ /* Provide WAL update data to the instrumentation */
+ if (inserted)
+ {
+ pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+ if (doPageWrites && fpw_lsn <= RedoRecPtr)
+ pgWalUsage.wal_fpw_records++;
+ else
+ pgWalUsage.wal_records++;
+ }
+
return EndPos;
}
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index a618dec776..bb2290d636 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -25,6 +25,7 @@
#include "access/xloginsert.h"
#include "catalog/pg_control.h"
#include "common/pg_lzcompress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "replication/origin.h"
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..7d9ca66fc8 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -12,7 +12,7 @@
* workers and ensuring that their state generally matches that of the
* leader; see src/backend/access/transam/README.parallel for details.
* However, we must save and restore relevant executor state, such as
- * any ParamListInfo associated with the query, buffer usage info, and
+ * any ParamListInfo associated with the query, buffer/WAL usage info, and
* the actual plan to be passed down to the worker.
*
* IDENTIFICATION
@@ -62,6 +62,7 @@
#define PARALLEL_KEY_DSA UINT64CONST(0xE000000000000007)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -573,6 +574,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
char *pstmt_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
+ WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
@@ -646,6 +648,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Same thing for WalUsage.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for tuple queues. */
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers));
@@ -728,6 +737,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufusage_space);
pei->buffer_usage = bufusage_space;
+ /* Same for WalUsage. */
+ walusage_space = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
+ pei->wal_usage = walusage_space;
+
/* Set up the tuple queues that the workers will write into. */
pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
@@ -1069,7 +1084,7 @@ ExecParallelRetrieveJitInstrumentation(PlanState *planstate,
/*
* Finish parallel execution. We wait for parallel workers to finish, and
- * accumulate their buffer usage.
+ * accumulate their buffer/WAL usage.
*/
void
ExecParallelFinish(ParallelExecutorInfo *pei)
@@ -1109,11 +1124,11 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
WaitForParallelWorkersToFinish(pei->pcxt);
/*
- * Next, accumulate buffer usage. (This must wait for the workers to
- * finish, or we might get incomplete data.)
+ * Next, accumulate buffer/WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
*/
for (i = 0; i < nworkers; i++)
- InstrAccumParallelQuery(&pei->buffer_usage[i]);
+ InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]);
pei->finished = true;
}
@@ -1333,6 +1348,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
{
FixedParallelExecutorState *fpes;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
DestReceiver *receiver;
QueryDesc *queryDesc;
SharedExecutorInstrumentation *instrumentation;
@@ -1386,11 +1402,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
ExecSetTupleBound(fpes->tuples_needed, queryDesc->planstate);
/*
- * Prepare to track buffer usage during query execution.
+ * Prepare to track buffer/WAL usage during query execution.
*
* We do this after starting up the executor to match what happens in the
- * leader, which also doesn't count buffer accesses that occur during
- * executor startup.
+ * leader, which also doesn't count buffer accesses and WAL activity that
+ * occur during executor startup.
*/
InstrStartParallelQuery();
@@ -1406,9 +1422,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Shut down the executor */
ExecutorFinish(queryDesc);
- /* Report buffer usage during parallel execution. */
+ /* Report buffer/WAL usage during parallel execution. */
buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup (toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Report instrumentation data if any instrumentation options are set. */
if (instrumentation != NULL)
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index bc1d42bf64..d2515b0a4c 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -24,6 +24,10 @@ static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
static void BufferUsageAccumDiff(BufferUsage *dst,
const BufferUsage *add, const BufferUsage *sub);
+WalUsage pgWalUsage;
+static WalUsage save_pgWalUsage;
+
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -33,15 +37,17 @@ InstrAlloc(int n, int instrument_options)
/* initialize all fields to zeroes, then modify as needed */
instr = palloc0(n * sizeof(Instrumentation));
- if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER))
+ if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER | INSTRUMENT_WAL))
{
bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
bool need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
int i;
for (i = 0; i < n; i++)
{
instr[i].need_bufusage = need_buffers;
+ instr[i].need_walusage = need_wal;
instr[i].need_timer = need_timer;
}
}
@@ -55,6 +61,7 @@ InstrInit(Instrumentation *instr, int instrument_options)
{
memset(instr, 0, sizeof(Instrumentation));
instr->need_bufusage = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ instr->need_walusage = (instrument_options & INSTRUMENT_WAL) != 0;
instr->need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
}
@@ -69,6 +76,9 @@ InstrStartNode(Instrumentation *instr)
/* save buffer usage totals at node entry, if needed */
if (instr->need_bufusage)
instr->bufusage_start = pgBufferUsage;
+
+ if (instr->need_walusage)
+ instr->walusage_start = pgWalUsage;
}
/* Exit from a plan node */
@@ -97,6 +107,10 @@ InstrStopNode(Instrumentation *instr, double nTuples)
BufferUsageAccumDiff(&instr->bufusage,
&pgBufferUsage, &instr->bufusage_start);
+ if (instr->need_walusage)
+ WalUsageAccumDiff(&instr->walusage,
+ &pgWalUsage, &instr->walusage_start);
+
/* Is this the first tuple of this cycle? */
if (!instr->running)
{
@@ -160,6 +174,9 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
/* Add delta of buffer usage since entry to node's totals */
if (dst->need_bufusage)
BufferUsageAdd(&dst->bufusage, &add->bufusage);
+
+ if (dst->need_walusage)
+ WalUsageAdd(&dst->walusage, &add->walusage);
}
/* note current values during parallel executor startup */
@@ -167,21 +184,25 @@ void
InstrStartParallelQuery(void)
{
save_pgBufferUsage = pgBufferUsage;
+ save_pgWalUsage = pgWalUsage;
}
/* report usage after parallel executor shutdown */
void
-InstrEndParallelQuery(BufferUsage *result)
+InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- memset(result, 0, sizeof(BufferUsage));
- BufferUsageAccumDiff(result, &pgBufferUsage, &save_pgBufferUsage);
+ memset(bufusage, 0, sizeof(BufferUsage));
+ memset(walusage, 0, sizeof(WalUsage));
+ BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
+ WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
}
/* accumulate work done by workers in leader's stats */
void
-InstrAccumParallelQuery(BufferUsage *result)
+InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- BufferUsageAdd(&pgBufferUsage, result);
+ BufferUsageAdd(&pgBufferUsage, bufusage);
+ WalUsageAdd(&pgWalUsage, walusage);
}
/* dst += add */
@@ -223,3 +244,20 @@ BufferUsageAccumDiff(BufferUsage *dst,
INSTR_TIME_ACCUM_DIFF(dst->blk_write_time,
add->blk_write_time, sub->blk_write_time);
}
+
+/* helper functions for WAL usage accumulation */
+static void
+WalUsageAdd(WalUsage *dst, WalUsage *add)
+{
+ dst->wal_bytes += add->wal_bytes;
+ dst->wal_records += add->wal_records;
+ dst->wal_fpw_records += add->wal_fpw_records;
+}
+
+void
+WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
+{
+ dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
+ dst->wal_records += add->wal_records - sub->wal_records;
+ dst->wal_fpw_records += add->wal_fpw_records - sub->wal_fpw_records;
+}
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf020..1cc5b524fd 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -26,6 +26,7 @@ typedef struct ParallelExecutorInfo
PlanState *planstate; /* plan subtree we're running in parallel */
ParallelContext *pcxt; /* parallel context we're using */
BufferUsage *buffer_usage; /* points to bufusage area in DSM */
+ WalUsage *wal_usage; /* walusage area in DSM */
SharedExecutorInstrumentation *instrumentation; /* optional */
struct SharedJitInstrumentation *jit_instrumentation; /* optional */
dsa_area *area; /* points to DSA area in DSM */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index f48d46aede..23264dd396 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -32,12 +32,21 @@ typedef struct BufferUsage
instr_time blk_write_time; /* time spent writing */
} BufferUsage;
+typedef struct WalUsage
+{
+ long wal_records; /* # of WAL records produced */
+ long wal_fpw_records; /* # of full page write WAL records
+ * produced */
+ uint64 wal_bytes; /* size of WAL records produced */
+} WalUsage;
+
/* Flag bits included in InstrAlloc's instrument_options bitmask */
typedef enum InstrumentOption
{
INSTRUMENT_TIMER = 1 << 0, /* needs timer (and row counts) */
INSTRUMENT_BUFFERS = 1 << 1, /* needs buffer usage */
INSTRUMENT_ROWS = 1 << 2, /* needs row count */
+ INSTRUMENT_WAL = 1 << 3, /* needs WAL usage */
INSTRUMENT_ALL = PG_INT32_MAX
} InstrumentOption;
@@ -46,6 +55,7 @@ typedef struct Instrumentation
/* Parameters set at node creation: */
bool need_timer; /* true if we need timer data */
bool need_bufusage; /* true if we need buffer usage data */
+ bool need_walusage; /* true if we need WAL usage data */
/* Info about current plan cycle: */
bool running; /* true if we've completed first tuple */
instr_time starttime; /* start time of current iteration of node */
@@ -53,6 +63,7 @@ typedef struct Instrumentation
double firsttuple; /* time for first tuple of this cycle */
double tuplecount; /* # of tuples emitted so far this cycle */
BufferUsage bufusage_start; /* buffer usage at start */
+ WalUsage walusage_start; /* WAL usage at start */
/* Accumulated statistics across all completed cycles: */
double startup; /* total startup time (in seconds) */
double total; /* total time (in seconds) */
@@ -62,6 +73,7 @@ typedef struct Instrumentation
double nfiltered1; /* # of tuples removed by scanqual or joinqual */
double nfiltered2; /* # of tuples removed by "other" quals */
BufferUsage bufusage; /* total buffer usage */
+ WalUsage walusage; /* total WAL usage */
} Instrumentation;
typedef struct WorkerInstrumentation
@@ -71,6 +83,7 @@ typedef struct WorkerInstrumentation
} WorkerInstrumentation;
extern PGDLLIMPORT BufferUsage pgBufferUsage;
+extern PGDLLIMPORT WalUsage pgWalUsage;
extern Instrumentation *InstrAlloc(int n, int instrument_options);
extern void InstrInit(Instrumentation *instr, int instrument_options);
@@ -79,7 +92,9 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
-extern void InstrEndParallelQuery(BufferUsage *result);
-extern void InstrAccumParallelQuery(BufferUsage *result);
+extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
+extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
+extern void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add,
+ const WalUsage *sub);
#endif /* INSTRUMENT_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ca2d9ec8fb..1eef542f06 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2635,6 +2635,7 @@ WalSndCtlData
WalSndSendDataCallback
WalSndState
WalTimeSample
+WalUsage
WalWriteMethod
Walfile
WindowAgg
--
2.20.1
v7-0002-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut.patch (text/plain; charset=us-ascii)
From 20be7cc55954f1e5095fca7143a61ae6b036c41e Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Sun, 29 Mar 2020 12:38:14 +0200
Subject: [PATCH v7 2/4] Add option to report WAL usage in EXPLAIN and
auto_explain.
Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 15 ++++++
doc/src/sgml/auto-explain.sgml | 20 ++++++++
doc/src/sgml/ref/explain.sgml | 14 ++++++
src/backend/commands/explain.c | 74 +++++++++++++++++++++++++++--
src/bin/psql/tab-complete.c | 4 +-
src/include/commands/explain.h | 3 ++
6 files changed, 124 insertions(+), 6 deletions(-)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index f69dde876c..56c549d84c 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -27,6 +27,7 @@ static int auto_explain_log_min_duration = -1; /* msec or -1 */
static bool auto_explain_log_analyze = false;
static bool auto_explain_log_verbose = false;
static bool auto_explain_log_buffers = false;
+static bool auto_explain_log_wal = false;
static bool auto_explain_log_triggers = false;
static bool auto_explain_log_timing = true;
static bool auto_explain_log_settings = false;
@@ -148,6 +149,17 @@ _PG_init(void)
NULL,
NULL);
+ DefineCustomBoolVariable("auto_explain.log_wal",
+ "Log WAL usage.",
+ NULL,
+ &auto_explain_log_wal,
+ false,
+ PGC_SUSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
+
DefineCustomBoolVariable("auto_explain.log_triggers",
"Include trigger statistics in plans.",
"This has no effect unless log_analyze is also set.",
@@ -280,6 +292,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
queryDesc->instrument_options |= INSTRUMENT_ROWS;
if (auto_explain_log_buffers)
queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (auto_explain_log_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
}
}
@@ -374,6 +388,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
es->analyze = (queryDesc->instrument_options && auto_explain_log_analyze);
es->verbose = auto_explain_log_verbose;
es->buffers = (es->analyze && auto_explain_log_buffers);
+ es->wal = (es->analyze && auto_explain_log_wal);
es->timing = (es->analyze && auto_explain_log_timing);
es->summary = es->analyze;
es->format = auto_explain_log_format;
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index 3d619d4a3d..d4d29c4a64 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -109,6 +109,26 @@ LOAD 'auto_explain';
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>
+ <varname>auto_explain.log_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>auto_explain.log_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ <varname>auto_explain.log_wal</varname> controls whether WAL
+ usage statistics are printed when an execution plan is logged; it's
+ equivalent to the <literal>WAL</literal> option of <command>EXPLAIN</command>.
+ This parameter has no effect
+ unless <varname>auto_explain.log_analyze</varname> is enabled.
+ This parameter is off by default.
+ Only superusers can change this setting.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term>
<varname>auto_explain.log_timing</varname> (<type>boolean</type>)
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 385d10411f..e4661232b2 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -41,6 +41,7 @@ EXPLAIN [ ANALYZE ] [ VERBOSE ] <replaceable class="parameter">statement</replac
COSTS [ <replaceable class="parameter">boolean</replaceable> ]
SETTINGS [ <replaceable class="parameter">boolean</replaceable> ]
BUFFERS [ <replaceable class="parameter">boolean</replaceable> ]
+ WAL [ <replaceable class="parameter">boolean</replaceable> ]
TIMING [ <replaceable class="parameter">boolean</replaceable> ]
SUMMARY [ <replaceable class="parameter">boolean</replaceable> ]
FORMAT { TEXT | XML | JSON | YAML }
@@ -192,6 +193,19 @@ ROLLBACK;
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>WAL</literal></term>
+ <listitem>
+ <para>
+ Include information on WAL record generation. Specifically, include the
+ number of records, full page records and bytes generated. In text
+ format, only non-zero values are printed. This parameter may only be
+ used when <literal>ANALYZE</literal> is also enabled. It defaults to
+ <literal>FALSE</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>TIMING</literal></term>
<listitem>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ff2f45cfb2..f9d250a932 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -113,6 +113,7 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
+static void show_wal_usage(ExplainState *es, const WalUsage *usage);
static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
ExplainState *es);
static void ExplainScanTarget(Scan *plan, ExplainState *es);
@@ -175,6 +176,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
es->costs = defGetBoolean(opt);
else if (strcmp(opt->defname, "buffers") == 0)
es->buffers = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "wal") == 0)
+ es->wal = defGetBoolean(opt);
else if (strcmp(opt->defname, "settings") == 0)
es->settings = defGetBoolean(opt);
else if (strcmp(opt->defname, "timing") == 0)
@@ -219,6 +222,11 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("EXPLAIN option BUFFERS requires ANALYZE")));
+ if (es->wal && !es->analyze)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("EXPLAIN option WAL requires ANALYZE")));
+
/* if the timing was not set explicitly, set default value */
es->timing = (timing_set) ? es->timing : es->analyze;
@@ -494,6 +502,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
if (es->buffers)
instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
/*
* We always collect timing for the entire statement, even when node-level
@@ -1942,12 +1952,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
}
- /* Show buffer usage */
+ /* Show buffer/WAL usage */
if (es->buffers && planstate->instrument)
show_buffer_usage(es, &planstate->instrument->bufusage);
+ if (es->wal && planstate->instrument)
+ show_wal_usage(es, &planstate->instrument->walusage);
- /* Prepare per-worker buffer usage */
- if (es->workers_state && es->buffers && es->verbose)
+ /* Prepare per-worker buffer/WAL usage */
+ if (es->workers_state && (es->buffers || es->wal) && es->verbose)
{
WorkerInstrumentation *w = planstate->worker_instrument;
@@ -1960,7 +1972,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
continue;
ExplainOpenWorker(n, es);
- show_buffer_usage(es, &instrument->bufusage);
+ if (es->buffers)
+ show_buffer_usage(es, &instrument->bufusage);
+ if (es->wal)
+ show_wal_usage(es, &instrument->walusage);
ExplainCloseWorker(n, es);
}
}
@@ -3059,6 +3074,44 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
}
}
+/*
+ * Show WAL usage details.
+ */
+static void
+show_wal_usage(ExplainState *es, const WalUsage *usage)
+{
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ /* Show only positive counter values. */
+ if ((usage->wal_records > 0) || (usage->wal_fpw_records > 0) ||
+ (usage->wal_bytes > 0))
+ {
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "WAL:");
+
+ if (usage->wal_records > 0)
+ appendStringInfo(es->str, " records=%ld",
+ usage->wal_records);
+ if (usage->wal_fpw_records > 0)
+ appendStringInfo(es->str, " full page records=%ld",
+ usage->wal_fpw_records);
+ if (usage->wal_bytes > 0)
+ appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
+ usage->wal_bytes);
+ appendStringInfoChar(es->str, '\n');
+ }
+ }
+ else
+ {
+ ExplainPropertyInteger("WAL records", NULL,
+ usage->wal_records, es);
+ ExplainPropertyInteger("WAL full page records", NULL,
+ usage->wal_fpw_records, es);
+ ExplainPropertyUInteger("WAL bytes", NULL,
+ usage->wal_bytes, es);
+ }
+}
+
/*
* Add some additional details about an IndexScan or IndexOnlyScan
*/
@@ -3843,6 +3896,19 @@ ExplainPropertyInteger(const char *qlabel, const char *unit, int64 value,
ExplainProperty(qlabel, unit, buf, true, es);
}
+/*
+ * Explain an unsigned integer-valued property.
+ */
+void
+ExplainPropertyUInteger(const char *qlabel, const char *unit, uint64 value,
+ ExplainState *es)
+{
+ char buf[32];
+
+ snprintf(buf, sizeof(buf), UINT64_FORMAT, value);
+ ExplainProperty(qlabel, unit, buf, true, es);
+}
+
/*
* Explain a float-valued property, using the specified number of
* fractional digits.
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index ca8f0d75a6..fa61284248 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3045,8 +3045,8 @@ psql_completion(const char *text, int start, int end)
*/
if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
COMPLETE_WITH("ANALYZE", "VERBOSE", "COSTS", "SETTINGS",
- "BUFFERS", "TIMING", "SUMMARY", "FORMAT");
- else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|BUFFERS|TIMING|SUMMARY"))
+ "BUFFERS", "WAL", "TIMING", "SUMMARY", "FORMAT");
+ else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|BUFFERS|WAL|TIMING|SUMMARY"))
COMPLETE_WITH("ON", "OFF");
else if (TailMatches("FORMAT"))
COMPLETE_WITH("TEXT", "XML", "JSON", "YAML");
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 54f6240e5e..7b0b0a94a6 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -42,6 +42,7 @@ typedef struct ExplainState
bool analyze; /* print actual times */
bool costs; /* print estimated costs */
bool buffers; /* print buffer usage */
+ bool wal; /* print WAL usage */
bool timing; /* print detailed node timing */
bool summary; /* print total planning and execution timing */
bool settings; /* print modified settings */
@@ -110,6 +111,8 @@ extern void ExplainPropertyText(const char *qlabel, const char *value,
ExplainState *es);
extern void ExplainPropertyInteger(const char *qlabel, const char *unit,
int64 value, ExplainState *es);
+extern void ExplainPropertyUInteger(const char *qlabel, const char *unit,
+ uint64 value, ExplainState *es);
extern void ExplainPropertyFloat(const char *qlabel, const char *unit,
double value, int ndigits, ExplainState *es);
extern void ExplainPropertyBool(const char *qlabel, bool value,
--
2.20.1
v7-0003-Keep-track-of-WAL-usage-in-pg_stat_statements.patch (text/plain; charset=us-ascii)
From 8dd26b0e0bb485f2b8f1168716903e556aa23bf4 Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:42:02 +0100
Subject: [PATCH v7 3/4] Keep track of WAL usage in pg_stat_statements.
Author: Kirill Bychik
Reviewed-by: Julien Rouhaud, Fuji Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
contrib/pg_stat_statements/Makefile | 3 +-
.../expected/pg_stat_statements.out | 144 +++++++++++++++---
.../pg_stat_statements--1.7--1.8.sql | 50 ++++++
.../pg_stat_statements/pg_stat_statements.c | 58 ++++++-
.../pg_stat_statements.control | 2 +-
.../sql/pg_stat_statements.sql | 64 +++++++-
doc/src/sgml/pgstatstatements.sgml | 27 ++++
7 files changed, 322 insertions(+), 26 deletions(-)
create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 80042a0905..081f997d70 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,8 @@ OBJS = \
pg_stat_statements.o
EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.6--1.7.sql \
+DATA = pg_stat_statements--1.4.sql \
+ pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 6787ec1efd..7bfbeffa21 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -195,20 +195,126 @@ SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
3 | c
(8 rows)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
--------------------------------------------------------------+-------+------
- DELETE FROM test WHERE a > $1 | 1 | 1
- INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3
- INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10
- SELECT * FROM test ORDER BY a | 1 | 12
- SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4
- SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8
- SELECT pg_stat_statements_reset() | 1 | 1
- UPDATE test SET b = $1 WHERE a = $2 | 6 | 6
- UPDATE test SET b = $1 WHERE a > $2 | 1 | 3
+-- SELECT check WAL usage stats to confirm temp tables do not get stored in WAL
+SELECT query, calls, rows, wal_bytes, wal_records, wal_fpw_records
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes | wal_records | wal_fpw_records
+-------------------------------------------------------------+-------+------+-----------+-------------+-----------------
+ DELETE FROM test WHERE a > $1 | 1 | 1 | 0 | 0 | 0
+ INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | 0 | 0 | 0
+ INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10 | 0 | 0 | 0
+ SELECT * FROM test ORDER BY a | 1 | 12 | 0 | 0 | 0
+ SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4 | 0 | 0 | 0
+ SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | 0 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0 | 0
+ UPDATE test SET b = $1 WHERE a = $2 | 6 | 6 | 0 | 0 | 0
+ UPDATE test SET b = $1 WHERE a > $2 | 1 | 3 | 0 | 0 | 0
(9 rows)
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset
+--------------------------
+
+(1 row)
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- explicit transaction
+BEGIN;
+UPDATE pgss_test SET b = '111' WHERE a = 1 ;
+COMMIT;
+BEGIN \;
+UPDATE pgss_test SET b = '222' WHERE a = 2 \;
+COMMIT ;
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
+BEGIN \;
+UPDATE pgss_test SET b = '555' WHERE a = 5 \;
+UPDATE pgss_test SET b = '666' WHERE a = 6 \;
+COMMIT ;
+-- many INSERT values
+INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+-- SELECT with constants
+SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ;
+ a | b
+---+----------------------
+ 6 | 666
+ 7 | aaa
+ 8 | bbb
+ 9 | bbb
+(4 rows)
+
+SELECT *
+ FROM pgss_test
+ WHERE a > 9
+ ORDER BY a ;
+ a | b
+---+---
+(0 rows)
+
+-- SELECT without constants
+SELECT * FROM pgss_test ORDER BY a;
+ a | b
+---+----------------------
+ 1 | a
+ 1 | 111
+ 2 | b
+ 2 | 222
+ 3 | c
+ 3 | 333
+ 4 | 444
+ 5 | 555
+ 6 | 666
+ 7 | aaa
+ 8 | bbb
+ 9 | bbb
+(12 rows)
+
+-- SELECT with IN clause
+SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5);
+ a | b
+---+----------------------
+ 1 | 111
+ 2 | 222
+ 3 | 333
+ 4 | 444
+ 5 | 555
+ 1 | a
+ 2 | b
+ 3 | c
+(8 rows)
+
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+(wal_records + wal_fpw_records) > 0 as wal_records_generated,
+(wal_records + wal_fpw_records) = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_as_rows
+------------------------------------------------------------------+-------+------+---------------------+-----------------------+---------------------
+ DELETE FROM pgss_test WHERE a > $1 | 1 | 1 | t | t | t
+ DROP TABLE pgss_test | 1 | 0 | t | t | f
+ INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | t | t | t
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | 1 | 10 | t | t | t
+ SELECT * FROM pgss_test ORDER BY a | 1 | 12 | f | f | f
+ SELECT * FROM pgss_test WHERE a > $1 ORDER BY a | 2 | 4 | f | f | f
+ SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | f | f | f
+ SELECT pg_stat_statements_reset() | 1 | 1 | f | f | f
+ SET pg_stat_statements.track_utility = FALSE | 1 | 0 | f | f | t
+ UPDATE pgss_test SET b = $1 WHERE a = $2 | 6 | 6 | t | t | t
+ UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 | 3 | t | t | t
+(11 rows)
+
--
-- pg_stat_statements.track = none
--
@@ -287,13 +393,13 @@ SELECT PLUS_ONE(10);
11
(1 row)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
------------------------------------+-------+------
- SELECT $1::TEXT | 1 | 1
- SELECT PLUS_ONE($1) | 2 | 2
- SELECT PLUS_TWO($1) | 2 | 2
- SELECT pg_stat_statements_reset() | 1 | 1
+SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes | wal_records
+-----------------------------------+-------+------+-----------+-------------
+ SELECT $1::TEXT | 1 | 1 | 0 | 0
+ SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0
+ SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
(4 rows)
--
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
new file mode 100644
index 0000000000..27ac80cde0
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -0,0 +1,50 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.8'" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+ OUT userid oid,
+ OUT dbid oid,
+ OUT queryid bigint,
+ OUT query text,
+ OUT calls int8,
+ OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT mean_time float8,
+ OUT stddev_time float8,
+ OUT rows int8,
+ OUT shared_blks_hit int8,
+ OUT shared_blks_read int8,
+ OUT shared_blks_dirtied int8,
+ OUT shared_blks_written int8,
+ OUT local_blks_hit int8,
+ OUT local_blks_read int8,
+ OUT local_blks_dirtied int8,
+ OUT local_blks_written int8,
+ OUT temp_blks_read int8,
+ OUT temp_blks_written int8,
+ OUT blk_read_time float8,
+ OUT blk_write_time float8,
+ OUT wal_bytes numeric,
+ OUT wal_records int8,
+ OUT wal_fpw_records int8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4'
+LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
+
+CREATE VIEW pg_stat_statements AS
+ SELECT * FROM pg_stat_statements(true);
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 20dc8c605b..584e7a54ec 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -120,7 +120,8 @@ typedef enum pgssVersion
PGSS_V1_0 = 0,
PGSS_V1_1,
PGSS_V1_2,
- PGSS_V1_3
+ PGSS_V1_3,
+ PGSS_V1_4
} pgssVersion;
/*
@@ -161,6 +162,9 @@ typedef struct Counters
int64 local_blks_written; /* # of local disk blocks written */
int64 temp_blks_read; /* # of temp blocks read */
int64 temp_blks_written; /* # of temp blocks written */
+ uint64 wal_bytes; /* total amount of wal bytes written */
+ int64 wal_records; /* # of wal records written */
+ int64 wal_fpw_records; /* # of full page wal records written */
double blk_read_time; /* time spent reading, in msec */
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
@@ -293,6 +297,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_3);
+PG_FUNCTION_INFO_V1(pg_stat_statements_1_4);
PG_FUNCTION_INFO_V1(pg_stat_statements);
static void pgss_shmem_startup(void);
@@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate);
static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
pgssVersion api_version,
@@ -841,6 +847,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
0,
0,
NULL,
+ NULL,
&jstate);
}
@@ -944,6 +951,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
queryDesc->totaltime->total * 1000.0, /* convert to msec */
queryDesc->estate->es_processed,
&queryDesc->totaltime->bufusage,
+ &queryDesc->totaltime->walusage,
NULL);
}
@@ -989,7 +997,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
BufferUsage bufusage_start,
bufusage;
+ WalUsage walusage_start,
+ walusage;
+
bufusage_start = pgBufferUsage;
+ walusage_start = pgWalUsage;
INSTR_TIME_SET_CURRENT(start);
nested_level++;
@@ -1041,6 +1053,9 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
bufusage.blk_write_time = pgBufferUsage.blk_write_time;
INSTR_TIME_SUBTRACT(bufusage.blk_write_time, bufusage_start.blk_write_time);
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
pgss_store(queryString,
0, /* signal that it's a utility stmt */
pstmt->stmt_location,
@@ -1048,6 +1063,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
INSTR_TIME_GET_MILLISEC(duration),
rows,
&bufusage,
+ &walusage,
NULL);
}
else
@@ -1083,13 +1099,14 @@ pgss_hash_string(const char *str, int len)
*
* If jstate is not NULL then we're trying to create an entry for which
* we have no statistics as yet; we just want to record the normalized
- * query string. total_time, rows, bufusage are ignored in this case.
+ * query string. Time and usage are ignored in this case.
*/
static void
pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate)
{
pgssHashKey key;
@@ -1281,6 +1298,9 @@ pgss_store(const char *query, uint64 queryId,
e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
+ e->counters.wal_bytes += walusage->wal_bytes;
+ e->counters.wal_records += walusage->wal_records;
+ e->counters.wal_fpw_records += walusage->wal_fpw_records;
SpinLockRelease(&e->mutex);
}
@@ -1328,7 +1348,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
#define PG_STAT_STATEMENTS_COLS_V1_1 18
#define PG_STAT_STATEMENTS_COLS_V1_2 19
#define PG_STAT_STATEMENTS_COLS_V1_3 23
-#define PG_STAT_STATEMENTS_COLS 23 /* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_4 26
+#define PG_STAT_STATEMENTS_COLS 26 /* maximum of above */
/*
* Retrieve statement statistics.
@@ -1340,6 +1361,15 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
* expected API version is identified by embedding it in the C name of the
* function. Unfortunately we weren't bright enough to do that for 1.1.
*/
+Datum pg_stat_statements_1_4(PG_FUNCTION_ARGS)
+{
+ bool showtext = PG_GETARG_BOOL(0);
+
+ pg_stat_statements_internal(fcinfo, PGSS_V1_4, showtext);
+
+ return (Datum)0;
+}
+
Datum
pg_stat_statements_1_3(PG_FUNCTION_ARGS)
{
@@ -1445,6 +1475,10 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
if (api_version != PGSS_V1_3)
elog(ERROR, "incorrect number of output arguments");
break;
+ case PG_STAT_STATEMENTS_COLS_V1_4:
+ if (api_version != PGSS_V1_4)
+ elog(ERROR, "incorrect number of output arguments");
+ break;
default:
elog(ERROR, "incorrect number of output arguments");
}
@@ -1641,11 +1675,29 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
values[i++] = Float8GetDatumFast(tmp.blk_read_time);
values[i++] = Float8GetDatumFast(tmp.blk_write_time);
}
+ if (api_version >= PGSS_V1_4)
+ {
+ char buf[256];
+ Datum wal_bytes;
+
+ snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
+
+ /* Convert to numeric. */
+ wal_bytes = DirectFunctionCall3(numeric_in,
+ CStringGetDatum(buf),
+ ObjectIdGetDatum(0),
+ Int32GetDatum(-1));
+
+ values[i++] = wal_bytes;
+ values[i++] = Int64GetDatumFast(tmp.wal_records);
+ values[i++] = Int64GetDatumFast(tmp.wal_fpw_records);
+ }
Assert(i == (api_version == PGSS_V1_0 ? PG_STAT_STATEMENTS_COLS_V1_0 :
api_version == PGSS_V1_1 ? PG_STAT_STATEMENTS_COLS_V1_1 :
api_version == PGSS_V1_2 ? PG_STAT_STATEMENTS_COLS_V1_2 :
api_version == PGSS_V1_3 ? PG_STAT_STATEMENTS_COLS_V1_3 :
+ api_version == PGSS_V1_4 ? PG_STAT_STATEMENTS_COLS_V1_4 :
-1 /* fail if you forget to update this assert */ ));
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 14cb422354..7fb20df886 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
# pg_stat_statements extension
comment = 'track execution statistics of all SQL statements executed'
-default_version = '1.7'
+default_version = '1.8'
module_pathname = '$libdir/pg_stat_statements'
relocatable = true
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 8b527070d4..7e4ac4f77d 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -99,7 +99,67 @@ SELECT * FROM test ORDER BY a;
-- SELECT with IN clause
SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- SELECT check WAL usage stats to confirm temp tables do not get stored in WAL
+SELECT query, calls, rows, wal_bytes, wal_records, wal_fpw_records
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+
+-- explicit transaction
+BEGIN;
+UPDATE pgss_test SET b = '111' WHERE a = 1 ;
+COMMIT;
+
+BEGIN \;
+UPDATE pgss_test SET b = '222' WHERE a = 2 \;
+COMMIT ;
+
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
+
+BEGIN \;
+UPDATE pgss_test SET b = '555' WHERE a = 5 \;
+UPDATE pgss_test SET b = '666' WHERE a = 6 \;
+COMMIT ;
+
+-- many INSERT values
+INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+
+-- SELECT with constants
+SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ;
+
+SELECT *
+ FROM pgss_test
+ WHERE a > 9
+ ORDER BY a ;
+
+-- SELECT without constants
+SELECT * FROM pgss_test ORDER BY a;
+
+-- SELECT with IN clause
+SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5);
+
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+
+-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+(wal_records + wal_fpw_records) > 0 as wal_records_generated,
+(wal_records + wal_fpw_records) = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- pg_stat_statements.track = none
@@ -144,7 +204,7 @@ $$ SELECT (i + 1.0)::INTEGER LIMIT 1 $$ LANGUAGE SQL;
SELECT PLUS_ONE(8);
SELECT PLUS_ONE(10);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- pg_stat_statements.track = all
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 26bb82da4a..80ad03b3da 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -221,6 +221,33 @@
</entry>
</row>
+ <row>
+ <entry><structfield>wal_bytes</structfield></entry>
+ <entry><type>numeric</type></entry>
+ <entry></entry>
+ <entry>
+ Total amount of WAL bytes generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL records generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_fpw_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL full page images generated by the statement
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
--
2.20.1
v7-0004-Expose-WAL-usage-counters-in-verbose-auto-vacuum-.patch (text/plain; charset=us-ascii)
From 455787d3ebf76cbe756292a3eac045c9761c47fc Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 19 Mar 2020 16:08:47 +0100
Subject: [PATCH v7 4/4] Expose WAL usage counters in verbose (auto)vacuum
output.
Author: Julien Rouhaud
Reviewed-by: Fuji Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 25 ++++++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..ca4f03f551 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -65,6 +65,7 @@
#include "commands/dbcommands.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "optimizer/paths.h"
#include "pgstat.h"
@@ -381,6 +382,8 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
int nindexes;
PGRUsage ru0;
TimestampTz starttime = 0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
long secs;
int usecs;
double read_rate,
@@ -569,6 +572,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
TimestampDifference(starttime, endtime, &secs, &usecs);
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
read_rate = 0;
write_rate = 0;
if ((secs > 0) || (usecs > 0))
@@ -620,7 +626,13 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
(long long) VacuumPageDirty);
appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
read_rate, write_rate);
- appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf,
+ _("WAL usage: %ld records, %ld full page records, "
+ UINT64_FORMAT " bytes"),
+ walusage.wal_records,
+ walusage.wal_fpw_records,
+ walusage.wal_bytes);
ereport(LOG,
(errmsg_internal("%s", buf.data)));
@@ -713,6 +725,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
IndexBulkDeleteResult **indstats;
int i;
PGRUsage ru0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
Buffer vmbuffer = InvalidBuffer;
BlockNumber next_unskippable_block;
bool skipping_blocks;
@@ -1690,6 +1704,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
"%u pages are entirely empty.\n",
empty_pages),
empty_pages);
+
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+ appendStringInfo(&buf, _("%ld WAL records, %ld WAL full page records, "
+ UINT64_FORMAT " WAL bytes\n"),
+ walusage.wal_records,
+ walusage.wal_fpw_records,
+ walusage.wal_bytes);
+
appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
ereport(elevel,
--
2.20.1
Hi Amit,
Sorry I just noticed your mail.
On Sun, Mar 29, 2020 at 05:12:16PM +0530, Amit Kapila wrote:
On Sun, Mar 29, 2020 at 1:26 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
I'm not sure that I get your point. I'm assuming that you meant
parallel-read-only queries, but surely buffer usage infrastructure for
parallel query relies on the same approach as non-parallel one (each node
computes the process-local pgBufferUsage diff) and sums all of that at the end
of the parallel query execution. I also don't see how whether the query is
read-only or not is relevant here as far as instrumentation is concerned,
especially since read-only query can definitely do writes and increase the
count of dirtied buffers, like a write query would. For instance a hint
bit change can be done in a parallel query AFAIK, and this can generate WAL
records if wal_log_hints is enabled, so that's probably one way to test it.
Yeah, that way we can test it. Can you try that?
I now think that not adding support for WAL buffers in EXPLAIN output in the
initial patch scope was a mistake, as this is probably the best way to test the
WAL counters for parallel queries. This shouldn't be hard to add though, and I
can work on it quickly if there's still a chance to get this feature included
in pg13.
I am not sure we will add it in Explain or not (maybe we need inputs
from others in this regard), but if it helps in testing this part of
the patch, then it is a good idea to write a patch for it. You might
want to keep it separate from the main patch as we might not commit
it.
As I just wrote in [1] that's exactly what I did. Using parallel query and
concurrent update on a table I could see that WAL usage for parallel query
seems to be working as one could expect.
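The process-local accounting discussed above (snapshot the running counters, do the work, then accumulate the difference) can be sketched roughly as follows. This is only an illustration: DemoWalUsage, demo_wal_usage, demo_count_record() and demo_accum_diff() are hypothetical stand-ins for the patch's WalUsage, pgWalUsage and WalUsageAccumDiff(), and counting full page records separately from ordinary records is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for the patch's WalUsage struct. */
typedef struct DemoWalUsage
{
	long		wal_records;		/* ordinary records */
	long		wal_fpw_records;	/* full page records */
	uint64_t	wal_bytes;			/* bytes generated */
} DemoWalUsage;

/* Process-local running totals, playing the role of pgWalUsage. */
DemoWalUsage demo_wal_usage;

/*
 * Hypothetical hook called once per inserted record.  The sketch assumes
 * a full page record is counted only in wal_fpw_records.
 */
void
demo_count_record(uint64_t bytes, int is_full_page)
{
	if (is_full_page)
		demo_wal_usage.wal_fpw_records++;
	else
		demo_wal_usage.wal_records++;
	demo_wal_usage.wal_bytes += bytes;
}

/* dst += add - sub, field by field. */
void
demo_accum_diff(DemoWalUsage *dst, const DemoWalUsage *add,
				const DemoWalUsage *sub)
{
	assert(dst != NULL && add != NULL && sub != NULL);
	dst->wal_records += add->wal_records - sub->wal_records;
	dst->wal_fpw_records += add->wal_fpw_records - sub->wal_fpw_records;
	dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
}
```

A caller would copy demo_wal_usage into a local "start" variable before the work and call demo_accum_diff(&diff, &demo_wal_usage, &start) afterwards, so activity that happened before the snapshot is not attributed to the statement.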
Sure, I am fine with that but I am not sure if it is a good idea to
commit this patch without having a way to compute WAL utilization for
those commands.
I'm generally fine with waiting for a fix for the existing issue to be
committed. But as the feature freeze is approaching, I hope that it won't mean
postponing this feature to v14 because a related 2yo bug has just been
discovered, as it would seem a bit unfair.
On Sun, 29 Mar 2020 at 20:44, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Sun, 29 Mar 2020 at 20:15, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sun, Mar 29, 2020 at 1:44 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sun, Mar 29, 2020 at 9:52 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
I've run vacuum with/without parallel workers on the table having 5
indexes. The vacuum reads all blocks of table and indexes.
* VACUUM command with no parallel workers
=# select total_time, shared_blks_hit, shared_blks_read,
shared_blks_hit + shared_blks_read as total_read_blks,
shared_blks_dirtied, shared_blks_written from pg_stat_statements where
query ~ 'vacuum';
 total_time   | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
--------------+-----------------+------------------+-----------------+---------------------+---------------------
 19857.217207 |           45238 |           226944 |          272182 |              225943 |              225894
(1 row)
* VACUUM command with 4 parallel workers
=# select total_time, shared_blks_hit, shared_blks_read,
shared_blks_hit + shared_blks_read as total_read_blks,
shared_blks_dirtied, shared_blks_written from pg_stat_statements where
query ~ 'vacuum';
 total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
-------------+-----------------+------------------+-----------------+---------------------+---------------------
 6932.117365 |           45205 |            73079 |          118284 |               72403 |               72365
(1 row)
The total number of blocks of table and indexes is about 182243
blocks. As Julien reported, obviously the total number of read blocks
during parallel vacuum is much less than single process vacuum's
result.
Parallel create index has the same issue but it doesn't exist in
parallel queries for SELECTs.
I think we need to change parallel maintenance commands so that they
report buffer usage like what ParallelQueryMain() does; prepare to
track buffer usage during query execution by
InstrStartParallelQuery(), and report it by InstrEndParallelQuery()
after parallel maintenance command. To report buffer usage of parallel
maintenance command correctly, I'm thinking that we can (1) change
parallel create index and parallel vacuum so that they prepare
gathering buffer usage, or (2) have a common entry point for parallel
maintenance commands that is responsible for gathering buffer usage
and calling the entry functions for individual maintenance command.
I'll investigate it more in depth.
As I just mentioned, (2) seems like a better design as it's quite
likely that the number of parallel-aware utilities will
continue to increase. One problem also is that parallel CREATE INDEX
has been introduced in pg11, so (2) probably won't be back-patchable
(and (1) seems problematic too).
I am not sure if we can decide at this stage whether it is
back-patchable or not. Let's first see the patch and if it turns out
to be complex, then we can try to do some straight-forward fix for
back-branches.Agreed.
In general, I don't see why the fix here should be
complex?Yeah, particularly the approach (1) will not be complex. I'll write a
patch tomorrow.
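The start/end pair mentioned above follows a snapshot-and-diff pattern: remember the process-local counters when the measured section begins, and report only the difference when it ends. The following is a minimal standalone sketch of that pattern, not PostgreSQL code. All names (DemoBufferUsage, demo_start_tracking, demo_end_tracking, demo_touch_buffers) are hypothetical, and the struct is trimmed to two counters for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for a buffer usage struct. */
typedef struct DemoBufferUsage
{
    int64_t shared_blks_hit;
    int64_t shared_blks_read;
} DemoBufferUsage;

/* Process-local running totals (assumed role: like a global usage counter). */
static DemoBufferUsage demo_usage;
/* Snapshot of the totals taken when the measured section starts. */
static DemoBufferUsage demo_usage_at_start;

/* Remember the totals at the start of the measured section. */
void
demo_start_tracking(void)
{
    demo_usage_at_start = demo_usage;
}

/* Store into *result only what was consumed since demo_start_tracking(). */
void
demo_end_tracking(DemoBufferUsage *result)
{
    assert(result != NULL);
    result->shared_blks_hit =
        demo_usage.shared_blks_hit - demo_usage_at_start.shared_blks_hit;
    result->shared_blks_read =
        demo_usage.shared_blks_read - demo_usage_at_start.shared_blks_read;
}

/* Simulate buffer accesses bumping the running totals. */
void
demo_touch_buffers(int64_t hits, int64_t reads)
{
    demo_usage.shared_blks_hit += hits;
    demo_usage.shared_blks_read += reads;
}
```

In this sketch, activity that happens before demo_start_tracking() is excluded from the reported result, which is the property a worker needs so that it reports only the work done for the command it was launched for.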
I've attached two patches fixing this issue for parallel index
creation and parallel vacuum. Both patches take the same approach,
described as approach (1) above: we allocate DSM space to share buffer
usage, and the leader gathers it from the workers. I think this is a
straightforward approach for this issue. We could create a common entry
point for parallel maintenance commands that is responsible for
gathering buffer usage as well as sharing the query text etc., but that
would entail a relatively big change and might be overkill at this
stage. We can discuss that, and it will become an item for PG14.
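As a rough illustration of approach (1), the sketch below models the shared area as a plain array with one slot per worker: each worker writes only its own slot, and the leader folds the slots into its own totals once the workers have finished. This is a simplified single-process model under stated assumptions, not the patch itself. The names SlotUsage, slots_create, slot_report and slots_accumulate are hypothetical, and calloc stands in for the DSM allocation (it also zeroes the slots, which the patch does not do).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-worker counters; a real usage struct has more fields. */
typedef struct SlotUsage
{
    int64_t blks_hit;
    int64_t blks_read;
} SlotUsage;

/* Leader side: one slot per planned worker, standing in for the DSM chunk. */
SlotUsage *
slots_create(int nworkers)
{
    assert(nworkers > 0);
    return calloc((size_t) nworkers, sizeof(SlotUsage));
}

/* Worker side: each worker writes only to the slot at its own index. */
void
slot_report(SlotUsage *slots, int nworkers, int worker_number,
            int64_t nhit, int64_t nread)
{
    assert(worker_number >= 0 && worker_number < nworkers);
    slots[worker_number].blks_hit = nhit;
    slots[worker_number].blks_read = nread;
}

/* Leader side: call only after all workers are done, then add up the slots. */
void
slots_accumulate(SlotUsage *leader_total, const SlotUsage *slots, int nlaunched)
{
    for (int i = 0; i < nlaunched; i++)
    {
        leader_total->blks_hit += slots[i].blks_hit;
        leader_total->blks_read += slots[i].blks_read;
    }
}
```

Because every worker owns exactly one slot, no locking is needed in this model; the only ordering requirement is that the leader reads the slots after the workers have finished writing them.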
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
bufferusage_vacuum.patch (application/octet-stream)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..bc08651df1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -65,6 +65,7 @@
#include "commands/dbcommands.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "optimizer/paths.h"
#include "pgstat.h"
@@ -137,6 +138,7 @@
#define PARALLEL_VACUUM_KEY_SHARED 1
#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -259,6 +261,9 @@ typedef struct LVParallelState
/* Shared information among parallel vacuum workers */
LVShared *lvshared;
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
/*
* The number of indexes that support parallel index bulk-deletion and
* parallel index cleanup respectively.
@@ -1991,6 +1996,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
int nindexes)
{
int nworkers;
+ int i;
Assert(!IsParallelWorker());
Assert(ParallelVacuumIsActive(lps));
@@ -2088,6 +2094,13 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
/* Wait for all vacuum workers to finish */
WaitForParallelWorkersToFinish(lps->pcxt);
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < nworkers; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
+
/*
* Carry the shared balance value to heap scan and disable shared costing
*/
@@ -3071,6 +3084,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
ParallelContext *pcxt;
LVShared *shared;
LVDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
bool *can_parallel_vacuum;
long maxtuples;
char *sharedquery;
@@ -3154,6 +3168,18 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Estimate space for BufferUsage -- PARALLEL_VACUUM_KEY_BUFFER_USAGE.
+ *
+ * BufferUsage during executing maintenance command can be used by an
+ * extension that reports the buffer usage, such as pg_stat_statements.
+ * We have no way of knowing whether anyone's looking at pgBufferUsage,
+ * so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -3188,6 +3214,12 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
vacrelstats->dead_tuples = dead_tuples;
+ /* Allocate space for each worker's BufferUsage; no need to initialize */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ lps->buffer_usage = buffer_usage;
+
/* Store query string for workers */
sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
memcpy(sharedquery, debug_query_string, querylen + 1);
@@ -3317,6 +3349,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
Relation *indrels;
LVShared *lvshared;
LVDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
int nindexes;
char *sharedquery;
IndexBulkDeleteResult **stats;
@@ -3369,10 +3402,18 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
if (lvshared->maintenance_work_mem_worker > 0)
maintenance_work_mem = lvshared->maintenance_work_mem_worker;
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
/* Process indexes to perform vacuum/cleanup */
parallel_vacuum_index(indrels, stats, lvshared, dead_tuples, nindexes);
+ /* Report buffer usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+
vac_close_indexes(nindexes, indrels, RowExclusiveLock);
table_close(onerel, ShareUpdateExclusiveLock);
+
pfree(stats);
}
bufferusage_create_index.patch (application/octet-stream)
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index e66cd36dfa..13d6eb76c3 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -67,6 +67,7 @@
#include "access/xloginsert.h"
#include "catalog/index.h"
#include "commands/progress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/smgr.h"
@@ -81,6 +82,7 @@
#define PARALLEL_KEY_TUPLESORT UINT64CONST(0xA000000000000002)
#define PARALLEL_KEY_TUPLESORT_SPOOL2 UINT64CONST(0xA000000000000003)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xA000000000000004)
+#define PARALLEL_KEY_BUFFER_USAGE UINT64CONST(0xA000000000000005)
/*
* DISABLE_LEADER_PARTICIPATION disables the leader's participation in
@@ -203,6 +205,7 @@ typedef struct BTLeader
Sharedsort *sharedsort;
Sharedsort *sharedsort2;
Snapshot snapshot;
+ BufferUsage *bufferusage;
} BTLeader;
/*
@@ -1474,6 +1477,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
Sharedsort *sharedsort2;
BTSpool *btspool = buildstate->spool;
BTLeader *btleader = (BTLeader *) palloc0(sizeof(BTLeader));
+ BufferUsage *bufferusage;
bool leaderparticipates = true;
char *sharedquery;
int querylen;
@@ -1526,6 +1530,18 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
shm_toc_estimate_keys(&pcxt->estimator, 3);
}
+ /*
+ * Estimate space for BufferUsage -- PARALLEL_KEY_BUFFER_USAGE
+ *
+ * BufferUsage during executing maintenance command can be used by an
+ * extension that reports the buffer usage, such as pg_stat_statements.
+ * We have no way of knowing whether anyone's looking at pgBufferUsage,
+ * so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -1597,6 +1613,11 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
memcpy(sharedquery, debug_query_string, querylen + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, sharedquery);
+ /* Allocate space for each worker's BufferUsage; no need to initialize */
+ bufferusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufferusage);
+
/* Launch workers, saving status for leader/caller */
LaunchParallelWorkers(pcxt);
btleader->pcxt = pcxt;
@@ -1607,6 +1628,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
btleader->sharedsort = sharedsort;
btleader->sharedsort2 = sharedsort2;
btleader->snapshot = snapshot;
+ btleader->bufferusage = bufferusage;
/* If no workers were successfully launched, back out (do serial build) */
if (pcxt->nworkers_launched == 0)
@@ -1635,8 +1657,18 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
static void
_bt_end_parallel(BTLeader *btleader)
{
+ int i;
+
/* Shutdown worker processes */
WaitForParallelWorkersToFinish(btleader->pcxt);
+
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < btleader->pcxt->nworkers; i++)
+ InstrAccumParallelQuery(&btleader->bufferusage[i]);
+
/* Free last reference to MVCC snapshot, if one was used */
if (IsMVCCSnapshot(btleader->snapshot))
UnregisterSnapshot(btleader->snapshot);
@@ -1767,6 +1799,7 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
Relation indexRel;
LOCKMODE heapLockmode;
LOCKMODE indexLockmode;
+ BufferUsage *bufferusage;
int sortmem;
#ifdef BTREE_BUILD_STATS
@@ -1828,11 +1861,18 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
tuplesort_attach_shared(sharedsort2, seg);
}
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
/* Perform sorting of spool, and possibly a spool2 */
sortmem = maintenance_work_mem / btshared->scantuplesortstates;
_bt_parallel_scan_and_sort(btspool, btspool2, btshared, sharedsort,
sharedsort2, sortmem, false);
+ /* Report buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber]);
+
#ifdef BTREE_BUILD_STATS
if (log_btree_build_stats)
{
On Mon, 30 Mar 2020 at 15:46, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
I've attached two patches fixing this issue for parallel index
creation and parallel vacuum. Both patches take the same approach,
described as approach (1) above: we allocate DSM space to share buffer
usage, and the leader gathers it from the workers. I think this is a
straightforward approach for this issue. We could create a common entry
point for parallel maintenance commands that is responsible for
gathering buffer usage as well as sharing the query text etc., but that
would entail a relatively big change and might be overkill at this
stage. We can discuss that, and it will become an item for PG14.
The vacuum patch conflicts with recent changes in vacuum, so I've
attached a rebased one.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
bufferusage_vacuum_v2.patch (application/octet-stream)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9726f69629..e8e3be0cba 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -65,6 +65,7 @@
#include "commands/dbcommands.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "optimizer/paths.h"
#include "pgstat.h"
@@ -137,6 +138,7 @@
#define PARALLEL_VACUUM_KEY_SHARED 1
#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -270,6 +272,9 @@ typedef struct LVParallelState
/* Shared information among parallel vacuum workers */
LVShared *lvshared;
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
/*
* The number of indexes that support parallel index bulk-deletion and
* parallel index cleanup respectively.
@@ -2069,6 +2074,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
int nindexes)
{
int nworkers;
+ int i;
Assert(!IsParallelWorker());
Assert(ParallelVacuumIsActive(lps));
@@ -2166,6 +2172,13 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
/* Wait for all vacuum workers to finish */
WaitForParallelWorkersToFinish(lps->pcxt);
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < nworkers; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
+
/*
* Carry the shared balance value to heap scan and disable shared costing
*/
@@ -3179,6 +3192,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
ParallelContext *pcxt;
LVShared *shared;
LVDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
bool *can_parallel_vacuum;
long maxtuples;
char *sharedquery;
@@ -3262,6 +3276,18 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Estimate space for BufferUsage -- PARALLEL_VACUUM_KEY_BUFFER_USAGE.
+ *
+ * BufferUsage during executing maintenance command can be used by an
+ * extension that reports the buffer usage, such as pg_stat_statements.
+ * We have no way of knowing whether anyone's looking at pgBufferUsage,
+ * so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -3296,6 +3322,12 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
vacrelstats->dead_tuples = dead_tuples;
+ /* Allocate space for each worker's BufferUsage; no need to initialize */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ lps->buffer_usage = buffer_usage;
+
/* Store query string for workers */
sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
memcpy(sharedquery, debug_query_string, querylen + 1);
@@ -3425,6 +3457,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
Relation *indrels;
LVShared *lvshared;
LVDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
int nindexes;
char *sharedquery;
IndexBulkDeleteResult **stats;
@@ -3494,10 +3527,17 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
/* Process indexes to perform vacuum/cleanup */
parallel_vacuum_index(indrels, stats, lvshared, dead_tuples, nindexes,
&vacrelstats);
+ /* Report buffer usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+
/* Pop the error context stack */
error_context_stack = errcallback.previous;
On Mon, Mar 30, 2020 at 04:01:18PM +0900, Masahiko Sawada wrote:
On Mon, 30 Mar 2020 at 15:46, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Sun, 29 Mar 2020 at 20:44, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
I think we need to change parallel maintenance commands so that they
report buffer usage like what ParallelQueryMain() does; prepare to
track buffer usage during query execution by
InstrStartParallelQuery(), and report it by InstrEndParallelQuery()
after parallel maintenance command. To report buffer usage of parallel
maintenance command correctly, I'm thinking that we can (1) change
parallel create index and parallel vacuum so that they prepare
gathering buffer usage, or (2) have a common entry point for parallel
maintenance commands that is responsible for gathering buffer usage
and calling the entry functions for individual maintenance command.
I'll investigate it more in depth.
[...]
I've attached two patches fixing this issue for parallel index
creation and parallel vacuum. Both patches take the same approach,
described as approach (1) above: we allocate DSM space to share buffer
usage, and the leader gathers it from the workers. I think this is a
straightforward approach for this issue. We could create a common entry
point for parallel maintenance commands that is responsible for
gathering buffer usage as well as sharing the query text etc., but that
would entail a relatively big change and might be overkill at this
stage. We can discuss that, and it will become an item for PG14.
The vacuum patch conflicts with recent changes in vacuum, so I've
attached a rebased one.
Thanks Sawada-san!
Just minor nitpicking:
+ int i;
Assert(!IsParallelWorker());
Assert(ParallelVacuumIsActive(lps));
@@ -2166,6 +2172,13 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
/* Wait for all vacuum workers to finish */
WaitForParallelWorkersToFinish(lps->pcxt);
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < nworkers; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
We now allow declaring a variable in those loops, so it may be better to avoid
declaring i outside the for scope?
Other than that, both patches look good to me and are a good fit for
back-patching. I also did some testing on VACUUM and CREATE INDEX, and it
works as expected.
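For reference, the nitpick above refers to C99-style declarations inside the for statement. A minimal standalone sketch of that style follows; the function name sum_counts is hypothetical and unrelated to the patch. Note that this style assumes the target branch is compiled as C99, which may matter if the change is back-patched to older branches.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sum an array of counters. The loop counter is declared in the for
 * statement itself (C99), so it is not visible after the loop ends.
 */
long
sum_counts(const long *counts, size_t n)
{
    long total = 0;

    assert(counts != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += counts[i];
    return total;
}
```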
On Sun, Mar 29, 2020 at 5:49 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
@@ -1249,6 +1250,16 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
+ /* Provide WAL update data to the instrumentation */
+ if (inserted)
+ {
+ pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+ if (doPageWrites && fpw_lsn <= RedoRecPtr)
+ pgWalUsage.wal_fpw_records++;
+ else
+ pgWalUsage.wal_records++;
+ }
+
I think the above code has multiple problems. (a) fpw_lsn can be
InvalidXLogRecPtr and still there could be full-page image (for ex.
when REGBUF_FORCE_IMAGE flag for buffer is set). (b) There could be
multiple FPW records while inserting a record; consider when there are
multiple registered buffers. I think the right place to figure this
out is XLogRecordAssemble. (c) There are cases when we also attach the
record data even when we decide to write FPW (cf. REGBUF_KEEP_DATA),
so we might want to increment wal_fpw_records and wal_records for such
cases.
I think the right place to compute this information is
XLogRecordAssemble even though we update it at the place where you
have it in the patch. You can probably compute that in local
variables and then transfer to pgWalUsage in XLogInsertRecord. I am
fine if you can think of some other way but the current patch doesn't
seem correct to me.
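To make the suggestion concrete, here is a small standalone sketch of the idea: count full-page images per registered buffer while assembling the record, keep the count in a local variable, and transfer it to the usage counters only once the record has actually been inserted. It is a simplified model under stated assumptions, not the real XLogRecordAssemble logic. DemoRegBuf, DemoWalUsage, DEMO_FORCE_IMAGE, demo_assemble and demo_insert are all hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag, loosely modelled on a "force image" buffer flag. */
#define DEMO_FORCE_IMAGE 0x01

typedef struct DemoRegBuf
{
    int  flags;
    bool page_needs_backup;   /* stand-in for the page LSN vs. redo pointer check */
} DemoRegBuf;

typedef struct DemoWalUsage
{
    long wal_records;
    long wal_fpw_records;
} DemoWalUsage;

/* Assembly step: decide per buffer whether an image is included; count locally. */
int
demo_assemble(const DemoRegBuf *bufs, int nbufs, bool do_page_writes)
{
    int num_fpw = 0;

    for (int i = 0; i < nbufs; i++)
    {
        bool needs_image;

        if (bufs[i].flags & DEMO_FORCE_IMAGE)
            needs_image = true;     /* image is forced regardless of the LSN check */
        else
            needs_image = do_page_writes && bufs[i].page_needs_backup;

        if (needs_image)
            num_fpw++;
    }
    return num_fpw;
}

/* Insertion step: only an inserted record updates the usage counters. */
void
demo_insert(DemoWalUsage *usage, bool inserted, int num_fpw)
{
    assert(num_fpw >= 0);
    if (!inserted)
        return;
    usage->wal_records++;
    usage->wal_fpw_records += num_fpw;
}
```

In this model a single record can contribute several images (point b), a forced image is counted even when the LSN check would not require one (point a), and every inserted record bumps wal_records whether or not it carries images (point c).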
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Mar 30, 2020 at 03:52:38PM +0530, Amit Kapila wrote:
On Sun, Mar 29, 2020 at 5:49 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
@@ -1249,6 +1250,16 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
+ /* Provide WAL update data to the instrumentation */
+ if (inserted)
+ {
+ pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+ if (doPageWrites && fpw_lsn <= RedoRecPtr)
+ pgWalUsage.wal_fpw_records++;
+ else
+ pgWalUsage.wal_records++;
+ }
+
I think the above code has multiple problems. (a) fpw_lsn can be
InvalidXLogRecPtr and still there could be full-page image (for ex.
when REGBUF_FORCE_IMAGE flag for buffer is set). (b) There could be
multiple FPW records while inserting a record; consider when there are
multiple registered buffers. I think the right place to figure this
out is XLogRecordAssemble. (c) There are cases when we also attach the
record data even when we decide to write FPW (cf. REGBUF_KEEP_DATA),
so we might want to increment wal_fpw_records and wal_records for such
cases.
I think the right place to compute this information is
XLogRecordAssemble even though we update it at the place where you
have it in the patch. You can probably compute that in local
variables and then transfer to pgWalUsage in XLogInsertRecord. I am
fine if you can think of some other way but the current patch doesn't
seem correct to me.
My previous approach was indeed totally broken. I've attached v8, which will
hopefully be OK.
Attachments:
v8-0001-Add-infrastructure-to-track-WAL-usage.patch (text/plain; charset=us-ascii)
From 3b8757a46aff847e5b36bf30a5e1f8d6662d0465 Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:41:50 +0100
Subject: [PATCH v8 1/4] Add infrastructure to track WAL usage.
Author: Kirill Bychik, Julien Rouhaud
Reviewed-by: Fuji Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
src/backend/access/transam/xlog.c | 12 +++++-
src/backend/access/transam/xloginsert.c | 13 +++++--
src/backend/executor/execParallel.c | 38 ++++++++++++++-----
src/backend/executor/instrument.c | 50 ++++++++++++++++++++++---
src/include/access/xlog.h | 3 +-
src/include/executor/execParallel.h | 1 +
src/include/executor/instrument.h | 19 +++++++++-
src/tools/pgindent/typedefs.list | 1 +
8 files changed, 113 insertions(+), 24 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1951103b26..6fb82c6e6b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
#include "commands/progress.h"
#include "commands/tablespace.h"
#include "common/controldata_utils.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "pgstat.h"
@@ -995,7 +996,8 @@ static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
XLogRecPtr
XLogInsertRecord(XLogRecData *rdata,
XLogRecPtr fpw_lsn,
- uint8 flags)
+ uint8 flags,
+ int num_fpw)
{
XLogCtlInsert *Insert = &XLogCtl->Insert;
pg_crc32c rdata_crc;
@@ -1251,6 +1253,14 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
+ /* Provide WAL update data to the instrumentation */
+ if (inserted)
+ {
+ pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+ pgWalUsage.wal_records++;
+ pgWalUsage.wal_fpw_records += num_fpw;
+ }
+
return EndPos;
}
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index a618dec776..413750948b 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -25,6 +25,7 @@
#include "access/xloginsert.h"
#include "catalog/pg_control.h"
#include "common/pg_lzcompress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "replication/origin.h"
@@ -108,7 +109,7 @@ static MemoryContext xloginsert_cxt;
static XLogRecData *XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
- XLogRecPtr *fpw_lsn);
+ XLogRecPtr *fpw_lsn, int *num_fpw);
static bool XLogCompressBackupBlock(char *page, uint16 hole_offset,
uint16 hole_length, char *dest, uint16 *dlen);
@@ -448,6 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info)
bool doPageWrites;
XLogRecPtr fpw_lsn;
XLogRecData *rdt;
+ int num_fpw = 0;
/*
* Get values needed to decide whether to do full-page writes. Since
@@ -457,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info)
GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);
rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
- &fpw_lsn);
+ &fpw_lsn, &num_fpw);
- EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags);
+ EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw);
} while (EndPos == InvalidXLogRecPtr);
XLogResetInsertion();
@@ -482,7 +484,7 @@ XLogInsert(RmgrId rmid, uint8 info)
static XLogRecData *
XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
- XLogRecPtr *fpw_lsn)
+ XLogRecPtr *fpw_lsn, int *num_fpw)
{
XLogRecData *rdt;
uint32 total_len = 0;
@@ -635,6 +637,9 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
*/
bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
+ /* Report a full page image constructed for the WAL record */
+ *num_fpw += 1;
+
/*
* Construct XLogRecData entries for the page content.
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..7d9ca66fc8 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -12,7 +12,7 @@
* workers and ensuring that their state generally matches that of the
* leader; see src/backend/access/transam/README.parallel for details.
* However, we must save and restore relevant executor state, such as
- * any ParamListInfo associated with the query, buffer usage info, and
+ * any ParamListInfo associated with the query, buffer/WAL usage info, and
* the actual plan to be passed down to the worker.
*
* IDENTIFICATION
@@ -62,6 +62,7 @@
#define PARALLEL_KEY_DSA UINT64CONST(0xE000000000000007)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -573,6 +574,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
char *pstmt_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
+ WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
@@ -646,6 +648,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Same thing for WalUsage.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for tuple queues. */
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers));
@@ -728,6 +737,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufusage_space);
pei->buffer_usage = bufusage_space;
+ /* Same for WalUsage. */
+ walusage_space = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
+ pei->wal_usage = walusage_space;
+
/* Set up the tuple queues that the workers will write into. */
pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
@@ -1069,7 +1084,7 @@ ExecParallelRetrieveJitInstrumentation(PlanState *planstate,
/*
* Finish parallel execution. We wait for parallel workers to finish, and
- * accumulate their buffer usage.
+ * accumulate their buffer/WAL usage.
*/
void
ExecParallelFinish(ParallelExecutorInfo *pei)
@@ -1109,11 +1124,11 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
WaitForParallelWorkersToFinish(pei->pcxt);
/*
- * Next, accumulate buffer usage. (This must wait for the workers to
- * finish, or we might get incomplete data.)
+ * Next, accumulate buffer/WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
*/
for (i = 0; i < nworkers; i++)
- InstrAccumParallelQuery(&pei->buffer_usage[i]);
+ InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]);
pei->finished = true;
}
@@ -1333,6 +1348,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
{
FixedParallelExecutorState *fpes;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
DestReceiver *receiver;
QueryDesc *queryDesc;
SharedExecutorInstrumentation *instrumentation;
@@ -1386,11 +1402,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
ExecSetTupleBound(fpes->tuples_needed, queryDesc->planstate);
/*
- * Prepare to track buffer usage during query execution.
+ * Prepare to track buffer/WAL usage during query execution.
*
* We do this after starting up the executor to match what happens in the
- * leader, which also doesn't count buffer accesses that occur during
- * executor startup.
+ * leader, which also doesn't count buffer accesses and WAL activity that
+ * occur during executor startup.
*/
InstrStartParallelQuery();
@@ -1406,9 +1422,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Shut down the executor */
ExecutorFinish(queryDesc);
- /* Report buffer usage during parallel execution. */
+ /* Report buffer/WAL usage during parallel execution. */
buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Report instrumentation data if any instrumentation options are set. */
if (instrumentation != NULL)
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 042e10f96b..4289216a0f 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -22,6 +22,10 @@ static BufferUsage save_pgBufferUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
+WalUsage pgWalUsage;
+static WalUsage save_pgWalUsage;
+
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -31,15 +35,17 @@ InstrAlloc(int n, int instrument_options)
/* initialize all fields to zeroes, then modify as needed */
instr = palloc0(n * sizeof(Instrumentation));
- if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER))
+ if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER | INSTRUMENT_WAL))
{
bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
bool need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
int i;
for (i = 0; i < n; i++)
{
instr[i].need_bufusage = need_buffers;
+ instr[i].need_walusage = need_wal;
instr[i].need_timer = need_timer;
}
}
@@ -53,6 +59,7 @@ InstrInit(Instrumentation *instr, int instrument_options)
{
memset(instr, 0, sizeof(Instrumentation));
instr->need_bufusage = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ instr->need_walusage = (instrument_options & INSTRUMENT_WAL) != 0;
instr->need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
}
@@ -67,6 +74,9 @@ InstrStartNode(Instrumentation *instr)
/* save buffer usage totals at node entry, if needed */
if (instr->need_bufusage)
instr->bufusage_start = pgBufferUsage;
+
+ if (instr->need_walusage)
+ instr->walusage_start = pgWalUsage;
}
/* Exit from a plan node */
@@ -95,6 +105,10 @@ InstrStopNode(Instrumentation *instr, double nTuples)
BufferUsageAccumDiff(&instr->bufusage,
&pgBufferUsage, &instr->bufusage_start);
+ if (instr->need_walusage)
+ WalUsageAccumDiff(&instr->walusage,
+ &pgWalUsage, &instr->walusage_start);
+
/* Is this the first tuple of this cycle? */
if (!instr->running)
{
@@ -158,6 +172,9 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
/* Add delta of buffer usage since entry to node's totals */
if (dst->need_bufusage)
BufferUsageAdd(&dst->bufusage, &add->bufusage);
+
+ if (dst->need_walusage)
+ WalUsageAdd(&dst->walusage, &add->walusage);
}
/* note current values during parallel executor startup */
@@ -165,21 +182,25 @@ void
InstrStartParallelQuery(void)
{
save_pgBufferUsage = pgBufferUsage;
+ save_pgWalUsage = pgWalUsage;
}
/* report usage after parallel executor shutdown */
void
-InstrEndParallelQuery(BufferUsage *result)
+InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- memset(result, 0, sizeof(BufferUsage));
- BufferUsageAccumDiff(result, &pgBufferUsage, &save_pgBufferUsage);
+ memset(bufusage, 0, sizeof(BufferUsage));
+ memset(walusage, 0, sizeof(WalUsage));
+ BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
+ WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
}
/* accumulate work done by workers in leader's stats */
void
-InstrAccumParallelQuery(BufferUsage *result)
+InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- BufferUsageAdd(&pgBufferUsage, result);
+ BufferUsageAdd(&pgBufferUsage, bufusage);
+ WalUsageAdd(&pgWalUsage, walusage);
}
/* dst += add */
@@ -221,3 +242,20 @@ BufferUsageAccumDiff(BufferUsage *dst,
INSTR_TIME_ACCUM_DIFF(dst->blk_write_time,
add->blk_write_time, sub->blk_write_time);
}
+
+/* helper functions for WAL usage accumulation */
+static void
+WalUsageAdd(WalUsage *dst, WalUsage *add)
+{
+ dst->wal_bytes += add->wal_bytes;
+ dst->wal_records += add->wal_records;
+ dst->wal_fpw_records += add->wal_fpw_records;
+}
+
+void
+WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
+{
+ dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
+ dst->wal_records += add->wal_records - sub->wal_records;
+ dst->wal_fpw_records += add->wal_fpw_records - sub->wal_fpw_records;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 9ec7b31cce..b91e724b2d 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -259,7 +259,8 @@ struct XLogRecData;
extern XLogRecPtr XLogInsertRecord(struct XLogRecData *rdata,
XLogRecPtr fpw_lsn,
- uint8 flags);
+ uint8 flags,
+ int num_fpw);
extern void XLogFlush(XLogRecPtr RecPtr);
extern bool XLogBackgroundFlush(void);
extern bool XLogNeedsFlush(XLogRecPtr RecPtr);
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf020..1cc5b524fd 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -26,6 +26,7 @@ typedef struct ParallelExecutorInfo
PlanState *planstate; /* plan subtree we're running in parallel */
ParallelContext *pcxt; /* parallel context we're using */
BufferUsage *buffer_usage; /* points to bufusage area in DSM */
+ WalUsage *wal_usage; /* points to walusage area in DSM */
SharedExecutorInstrumentation *instrumentation; /* optional */
struct SharedJitInstrumentation *jit_instrumentation; /* optional */
dsa_area *area; /* points to DSA area in DSM */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 3825a5ac1f..a567ccb19e 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -32,12 +32,21 @@ typedef struct BufferUsage
instr_time blk_write_time; /* time spent writing */
} BufferUsage;
+typedef struct WalUsage
+{
+ long wal_records; /* # of WAL records produced */
+ long wal_fpw_records; /* # of full page write WAL records
+ * produced */
+ uint64 wal_bytes; /* size of WAL records produced */
+} WalUsage;
+
/* Flag bits included in InstrAlloc's instrument_options bitmask */
typedef enum InstrumentOption
{
INSTRUMENT_TIMER = 1 << 0, /* needs timer (and row counts) */
INSTRUMENT_BUFFERS = 1 << 1, /* needs buffer usage */
INSTRUMENT_ROWS = 1 << 2, /* needs row count */
+ INSTRUMENT_WAL = 1 << 3, /* needs WAL usage */
INSTRUMENT_ALL = PG_INT32_MAX
} InstrumentOption;
@@ -46,6 +55,7 @@ typedef struct Instrumentation
/* Parameters set at node creation: */
bool need_timer; /* true if we need timer data */
bool need_bufusage; /* true if we need buffer usage data */
+ bool need_walusage; /* true if we need WAL usage data */
/* Info about current plan cycle: */
bool running; /* true if we've completed first tuple */
instr_time starttime; /* start time of current iteration of node */
@@ -53,6 +63,7 @@ typedef struct Instrumentation
double firsttuple; /* time for first tuple of this cycle */
double tuplecount; /* # of tuples emitted so far this cycle */
BufferUsage bufusage_start; /* buffer usage at start */
+ WalUsage walusage_start; /* WAL usage at start */
/* Accumulated statistics across all completed cycles: */
double startup; /* total startup time (in seconds) */
double total; /* total time (in seconds) */
@@ -62,6 +73,7 @@ typedef struct Instrumentation
double nfiltered1; /* # of tuples removed by scanqual or joinqual */
double nfiltered2; /* # of tuples removed by "other" quals */
BufferUsage bufusage; /* total buffer usage */
+ WalUsage walusage; /* total WAL usage */
} Instrumentation;
typedef struct WorkerInstrumentation
@@ -71,6 +83,7 @@ typedef struct WorkerInstrumentation
} WorkerInstrumentation;
extern PGDLLIMPORT BufferUsage pgBufferUsage;
+extern PGDLLIMPORT WalUsage pgWalUsage;
extern Instrumentation *InstrAlloc(int n, int instrument_options);
extern void InstrInit(Instrumentation *instr, int instrument_options);
@@ -79,9 +92,11 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
-extern void InstrEndParallelQuery(BufferUsage *result);
-extern void InstrAccumParallelQuery(BufferUsage *result);
+extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
+extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
extern void BufferUsageAccumDiff(BufferUsage *dst,
const BufferUsage *add, const BufferUsage *sub);
+extern void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add,
+ const WalUsage *sub);
#endif /* INSTRUMENT_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ccc34ee2ac..9298ac663f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2636,6 +2636,7 @@ WalSndCtlData
WalSndSendDataCallback
WalSndState
WalTimeSample
+WalUsage
WalWriteMethod
Walfile
WindowAgg
--
2.20.1
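For readers skimming the first patch, the accounting in instrument.c comes down to a snapshot-and-diff pattern over one running counter. Below is a minimal stand-alone sketch of that pattern, not the backend code itself. The struct fields and the `dst += add - sub` arithmetic follow the patch; `demo_wal_usage`, `demo_emit()` and `demo_measure()` are hypothetical stand-ins for `pgWalUsage` and for the places where the real counter is bumped and sampled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the patch's WalUsage struct; field names follow the patch. */
typedef struct WalUsage
{
    long        wal_records;        /* # of WAL records produced */
    long        wal_fpw_records;    /* # of full page write WAL records */
    uint64_t    wal_bytes;          /* size of WAL records produced */
} WalUsage;

/* Hypothetical stand-in for the backend-wide running counter (pgWalUsage). */
static WalUsage demo_wal_usage;

/* Hypothetical helper: pretend some WAL was generated by the current backend. */
static void
demo_emit(long records, long fpw_records, uint64_t bytes)
{
    demo_wal_usage.wal_records += records;
    demo_wal_usage.wal_fpw_records += fpw_records;
    demo_wal_usage.wal_bytes += bytes;
}

/* dst += add, same arithmetic as WalUsageAdd() in the patch */
static void
wal_usage_add(WalUsage *dst, const WalUsage *add)
{
    assert(dst != NULL && add != NULL);
    dst->wal_bytes += add->wal_bytes;
    dst->wal_records += add->wal_records;
    dst->wal_fpw_records += add->wal_fpw_records;
}

/* dst += add - sub, same arithmetic as WalUsageAccumDiff() in the patch */
static void
wal_usage_accum_diff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
{
    assert(dst != NULL && add != NULL && sub != NULL);
    dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
    dst->wal_records += add->wal_records - sub->wal_records;
    dst->wal_fpw_records += add->wal_fpw_records - sub->wal_fpw_records;
}

/*
 * Hypothetical driver: snapshot the counter, do some work, then report only
 * the WAL produced in between. This mirrors the start/stop pairing used for
 * plan nodes and parallel workers.
 */
static WalUsage
demo_measure(long records, long fpw_records, uint64_t bytes)
{
    WalUsage    start = demo_wal_usage;     /* snapshot at entry */
    WalUsage    result;

    memset(&result, 0, sizeof(result));
    demo_emit(records, fpw_records, bytes); /* the "work" being measured */
    wal_usage_accum_diff(&result, &demo_wal_usage, &start);
    return result;
}
```

Calling `demo_measure()` after earlier, unrelated `demo_emit()` calls still returns only the delta for the measured work. Per-worker results can then be folded into a total with `wal_usage_add()`, which is roughly what the leader does after the workers finish.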
v8-0002-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut.patch (text/plain; charset=us-ascii)
From cfc0ccd255fe396da28d58dd73f18902d6182734 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Sun, 29 Mar 2020 12:38:14 +0200
Subject: [PATCH v8 2/4] Add option to report WAL usage in EXPLAIN and
auto_explain.
Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 15 ++++++
doc/src/sgml/auto-explain.sgml | 20 ++++++++
doc/src/sgml/ref/explain.sgml | 14 ++++++
src/backend/commands/explain.c | 74 +++++++++++++++++++++++++++--
src/bin/psql/tab-complete.c | 4 +-
src/include/commands/explain.h | 3 ++
6 files changed, 124 insertions(+), 6 deletions(-)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index f69dde876c..56c549d84c 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -27,6 +27,7 @@ static int auto_explain_log_min_duration = -1; /* msec or -1 */
static bool auto_explain_log_analyze = false;
static bool auto_explain_log_verbose = false;
static bool auto_explain_log_buffers = false;
+static bool auto_explain_log_wal = false;
static bool auto_explain_log_triggers = false;
static bool auto_explain_log_timing = true;
static bool auto_explain_log_settings = false;
@@ -148,6 +149,17 @@ _PG_init(void)
NULL,
NULL);
+ DefineCustomBoolVariable("auto_explain.log_wal",
+ "Log WAL usage.",
+ NULL,
+ &auto_explain_log_wal,
+ false,
+ PGC_SUSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
+
DefineCustomBoolVariable("auto_explain.log_triggers",
"Include trigger statistics in plans.",
"This has no effect unless log_analyze is also set.",
@@ -280,6 +292,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
queryDesc->instrument_options |= INSTRUMENT_ROWS;
if (auto_explain_log_buffers)
queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (auto_explain_log_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
}
}
@@ -374,6 +388,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
es->analyze = (queryDesc->instrument_options && auto_explain_log_analyze);
es->verbose = auto_explain_log_verbose;
es->buffers = (es->analyze && auto_explain_log_buffers);
+ es->wal = (es->analyze && auto_explain_log_wal);
es->timing = (es->analyze && auto_explain_log_timing);
es->summary = es->analyze;
es->format = auto_explain_log_format;
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index 3d619d4a3d..d4d29c4a64 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -109,6 +109,26 @@ LOAD 'auto_explain';
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>
+ <varname>auto_explain.log_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>auto_explain.log_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ <varname>auto_explain.log_wal</varname> controls whether WAL
+ usage statistics are printed when an execution plan is logged; it's
+ equivalent to the <literal>WAL</literal> option of <command>EXPLAIN</command>.
+ This parameter has no effect
+ unless <varname>auto_explain.log_analyze</varname> is enabled.
+ This parameter is off by default.
+ Only superusers can change this setting.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term>
<varname>auto_explain.log_timing</varname> (<type>boolean</type>)
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 385d10411f..e4661232b2 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -41,6 +41,7 @@ EXPLAIN [ ANALYZE ] [ VERBOSE ] <replaceable class="parameter">statement</replac
COSTS [ <replaceable class="parameter">boolean</replaceable> ]
SETTINGS [ <replaceable class="parameter">boolean</replaceable> ]
BUFFERS [ <replaceable class="parameter">boolean</replaceable> ]
+ WAL [ <replaceable class="parameter">boolean</replaceable> ]
TIMING [ <replaceable class="parameter">boolean</replaceable> ]
SUMMARY [ <replaceable class="parameter">boolean</replaceable> ]
FORMAT { TEXT | XML | JSON | YAML }
@@ -192,6 +193,19 @@ ROLLBACK;
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>WAL</literal></term>
+ <listitem>
+ <para>
+ Include information on WAL record generation. Specifically, include the
+ number of records, full page records and bytes generated. In text
+ format, only non-zero values are printed. This parameter may only be
+ used when <literal>ANALYZE</literal> is also enabled. It defaults to
+ <literal>FALSE</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>TIMING</literal></term>
<listitem>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ee0e638f33..cfb71e8e92 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -113,6 +113,7 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
+static void show_wal_usage(ExplainState *es, const WalUsage *usage);
static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
ExplainState *es);
static void ExplainScanTarget(Scan *plan, ExplainState *es);
@@ -175,6 +176,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
es->costs = defGetBoolean(opt);
else if (strcmp(opt->defname, "buffers") == 0)
es->buffers = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "wal") == 0)
+ es->wal = defGetBoolean(opt);
else if (strcmp(opt->defname, "settings") == 0)
es->settings = defGetBoolean(opt);
else if (strcmp(opt->defname, "timing") == 0)
@@ -219,6 +222,11 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("EXPLAIN option BUFFERS requires ANALYZE")));
+ if (es->wal && !es->analyze)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("EXPLAIN option WAL requires ANALYZE")));
+
/* if the timing was not set explicitly, set default value */
es->timing = (timing_set) ? es->timing : es->analyze;
@@ -494,6 +502,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
if (es->buffers)
instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
/*
* We always collect timing for the entire statement, even when node-level
@@ -1942,12 +1952,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
}
- /* Show buffer usage */
+ /* Show buffer/WAL usage */
if (es->buffers && planstate->instrument)
show_buffer_usage(es, &planstate->instrument->bufusage);
+ if (es->wal && planstate->instrument)
+ show_wal_usage(es, &planstate->instrument->walusage);
- /* Prepare per-worker buffer usage */
- if (es->workers_state && es->buffers && es->verbose)
+ /* Prepare per-worker buffer/WAL usage */
+ if (es->workers_state && (es->buffers || es->wal) && es->verbose)
{
WorkerInstrumentation *w = planstate->worker_instrument;
@@ -1960,7 +1972,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
continue;
ExplainOpenWorker(n, es);
- show_buffer_usage(es, &instrument->bufusage);
+ if (es->buffers)
+ show_buffer_usage(es, &instrument->bufusage);
+ if (es->wal)
+ show_wal_usage(es, &instrument->walusage);
ExplainCloseWorker(n, es);
}
}
@@ -3059,6 +3074,44 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
}
}
+/*
+ * Show WAL usage details.
+ */
+static void
+show_wal_usage(ExplainState *es, const WalUsage *usage)
+{
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ /* Show only positive counter values. */
+ if ((usage->wal_records > 0) || (usage->wal_fpw_records > 0) ||
+ (usage->wal_bytes > 0))
+ {
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "WAL:");
+
+ if (usage->wal_records > 0)
+ appendStringInfo(es->str, " records=%ld",
+ usage->wal_records);
+ if (usage->wal_fpw_records > 0)
+ appendStringInfo(es->str, " full page records=%ld",
+ usage->wal_fpw_records);
+ if (usage->wal_bytes > 0)
+ appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
+ usage->wal_bytes);
+ appendStringInfoChar(es->str, '\n');
+ }
+ }
+ else
+ {
+ ExplainPropertyInteger("WAL records", NULL,
+ usage->wal_records, es);
+ ExplainPropertyInteger("WAL full page records", NULL,
+ usage->wal_fpw_records, es);
+ ExplainPropertyUInteger("WAL bytes", NULL,
+ usage->wal_bytes, es);
+ }
+}
+
/*
* Add some additional details about an IndexScan or IndexOnlyScan
*/
@@ -3843,6 +3896,19 @@ ExplainPropertyInteger(const char *qlabel, const char *unit, int64 value,
ExplainProperty(qlabel, unit, buf, true, es);
}
+/*
+ * Explain an unsigned integer-valued property.
+ */
+void
+ExplainPropertyUInteger(const char *qlabel, const char *unit, uint64 value,
+ ExplainState *es)
+{
+ char buf[32];
+
+ snprintf(buf, sizeof(buf), UINT64_FORMAT, value);
+ ExplainProperty(qlabel, unit, buf, true, es);
+}
+
/*
* Explain a float-valued property, using the specified number of
* fractional digits.
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index ca8f0d75a6..fa61284248 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3045,8 +3045,8 @@ psql_completion(const char *text, int start, int end)
*/
if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
COMPLETE_WITH("ANALYZE", "VERBOSE", "COSTS", "SETTINGS",
- "BUFFERS", "TIMING", "SUMMARY", "FORMAT");
- else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|BUFFERS|TIMING|SUMMARY"))
+ "BUFFERS", "WAL", "TIMING", "SUMMARY", "FORMAT");
+ else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|BUFFERS|WAL|TIMING|SUMMARY"))
COMPLETE_WITH("ON", "OFF");
else if (TailMatches("FORMAT"))
COMPLETE_WITH("TEXT", "XML", "JSON", "YAML");
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 54f6240e5e..7b0b0a94a6 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -42,6 +42,7 @@ typedef struct ExplainState
bool analyze; /* print actual times */
bool costs; /* print estimated costs */
bool buffers; /* print buffer usage */
+ bool wal; /* print WAL usage */
bool timing; /* print detailed node timing */
bool summary; /* print total planning and execution timing */
bool settings; /* print modified settings */
@@ -110,6 +111,8 @@ extern void ExplainPropertyText(const char *qlabel, const char *value,
ExplainState *es);
extern void ExplainPropertyInteger(const char *qlabel, const char *unit,
int64 value, ExplainState *es);
+extern void ExplainPropertyUInteger(const char *qlabel, const char *unit,
+ uint64 value, ExplainState *es);
extern void ExplainPropertyFloat(const char *qlabel, const char *unit,
double value, int ndigits, ExplainState *es);
extern void ExplainPropertyBool(const char *qlabel, bool value,
--
2.20.1
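To make the text-format behaviour of the second patch easier to picture, here is a small stand-alone sketch of the "print only non-zero counters" rule from show_wal_usage(). It is an illustration under assumptions, not the EXPLAIN code: `demo_append()` and `demo_format_wal_line()` are hypothetical names, the output goes to a plain char buffer instead of a StringInfo, and the indentation and trailing newline that EXPLAIN adds are left out.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append formatted text at offset 'off', never writing
 * past 'buflen'. Returns the new offset, clamped to the terminator position
 * if the output had to be truncated.
 */
static size_t
demo_append(char *buf, size_t buflen, size_t off, const char *fmt, ...)
{
    va_list     ap;
    int         n;

    if (off >= buflen)
        return off;

    va_start(ap, fmt);
    n = vsnprintf(buf + off, buflen - off, fmt, ap);
    va_end(ap);

    if (n < 0)
        return off;             /* formatting error: leave offset unchanged */

    off += (size_t) n;
    return (off < buflen) ? off : buflen - 1;
}

/*
 * Hypothetical formatter following the TEXT branch of show_wal_usage():
 * emit nothing when every counter is zero, otherwise emit "WAL:" followed
 * by the non-zero counters only. Returns the length of the string written.
 */
static size_t
demo_format_wal_line(char *buf, size_t buflen,
                     long records, long fpw_records, uint64_t bytes)
{
    size_t      off = 0;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    if (records <= 0 && fpw_records <= 0 && bytes == 0)
        return 0;

    off = demo_append(buf, buflen, off, "WAL:");
    if (records > 0)
        off = demo_append(buf, buflen, off, " records=%ld", records);
    if (fpw_records > 0)
        off = demo_append(buf, buflen, off, " full page records=%ld",
                          fpw_records);
    if (bytes > 0)
        off = demo_append(buf, buflen, off, " bytes=%" PRIu64, bytes);

    return off;
}
```

With a 128-byte buffer, counters of (3, 0, 240) give `WAL: records=3 bytes=240`, while all-zero counters leave the buffer empty. That is why a read-only plan node shows no WAL line at all in text format, whereas the structured formats always emit all three properties.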
v8-0003-Keep-track-of-WAL-usage-in-pg_stat_statements.patch (text/plain; charset=us-ascii)
From c1c7c2561c89f035ae47d3322874fb2653ba2abc Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:42:02 +0100
Subject: [PATCH v8 3/4] Keep track of WAL usage in pg_stat_statements.
Author: Kirill Bychik
Reviewed-by: Julien Rouhaud, Fuji Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
contrib/pg_stat_statements/Makefile | 3 +-
.../expected/pg_stat_statements.out | 144 +++++++++++++++---
.../pg_stat_statements--1.7--1.8.sql | 50 ++++++
.../pg_stat_statements/pg_stat_statements.c | 58 ++++++-
.../pg_stat_statements.control | 2 +-
.../sql/pg_stat_statements.sql | 64 +++++++-
doc/src/sgml/pgstatstatements.sgml | 27 ++++
7 files changed, 322 insertions(+), 26 deletions(-)
create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 80042a0905..081f997d70 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,8 @@ OBJS = \
pg_stat_statements.o
EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.6--1.7.sql \
+DATA = pg_stat_statements--1.4.sql \
+ pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 6787ec1efd..ad7b1153ae 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -195,20 +195,126 @@ SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
3 | c
(8 rows)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
--------------------------------------------------------------+-------+------
- DELETE FROM test WHERE a > $1 | 1 | 1
- INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3
- INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10
- SELECT * FROM test ORDER BY a | 1 | 12
- SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4
- SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8
- SELECT pg_stat_statements_reset() | 1 | 1
- UPDATE test SET b = $1 WHERE a = $2 | 6 | 6
- UPDATE test SET b = $1 WHERE a > $2 | 1 | 3
+-- SELECT check WAL usage stats to confirm temp tables do not get stored in WAL
+SELECT query, calls, rows, wal_bytes, wal_records, wal_fpw_records
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes | wal_records | wal_fpw_records
+-------------------------------------------------------------+-------+------+-----------+-------------+-----------------
+ DELETE FROM test WHERE a > $1 | 1 | 1 | 0 | 0 | 0
+ INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | 0 | 0 | 0
+ INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10 | 0 | 0 | 0
+ SELECT * FROM test ORDER BY a | 1 | 12 | 0 | 0 | 0
+ SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4 | 0 | 0 | 0
+ SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | 0 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0 | 0
+ UPDATE test SET b = $1 WHERE a = $2 | 6 | 6 | 0 | 0 | 0
+ UPDATE test SET b = $1 WHERE a > $2 | 1 | 3 | 0 | 0 | 0
(9 rows)
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset
+--------------------------
+
+(1 row)
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- explicit transaction
+BEGIN;
+UPDATE pgss_test SET b = '111' WHERE a = 1 ;
+COMMIT;
+BEGIN \;
+UPDATE pgss_test SET b = '222' WHERE a = 2 \;
+COMMIT ;
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
+BEGIN \;
+UPDATE pgss_test SET b = '555' WHERE a = 5 \;
+UPDATE pgss_test SET b = '666' WHERE a = 6 \;
+COMMIT ;
+-- many INSERT values
+INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+-- SELECT with constants
+SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ;
+ a | b
+---+----------------------
+ 6 | 666
+ 7 | aaa
+ 8 | bbb
+ 9 | bbb
+(4 rows)
+
+SELECT *
+ FROM pgss_test
+ WHERE a > 9
+ ORDER BY a ;
+ a | b
+---+---
+(0 rows)
+
+-- SELECT without constants
+SELECT * FROM pgss_test ORDER BY a;
+ a | b
+---+----------------------
+ 1 | a
+ 1 | 111
+ 2 | b
+ 2 | 222
+ 3 | c
+ 3 | 333
+ 4 | 444
+ 5 | 555
+ 6 | 666
+ 7 | aaa
+ 8 | bbb
+ 9 | bbb
+(12 rows)
+
+-- SELECT with IN clause
+SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5);
+ a | b
+---+----------------------
+ 1 | 111
+ 2 | 222
+ 3 | 333
+ 4 | 444
+ 5 | 555
+ 1 | a
+ 2 | b
+ 3 | c
+(8 rows)
+
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_as_rows
+------------------------------------------------------------------+-------+------+---------------------+-----------------------+---------------------
+ DELETE FROM pgss_test WHERE a > $1 | 1 | 1 | t | t | t
+ DROP TABLE pgss_test | 1 | 0 | t | t | f
+ INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | t | t | t
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | 1 | 10 | t | t | t
+ SELECT * FROM pgss_test ORDER BY a | 1 | 12 | f | f | f
+ SELECT * FROM pgss_test WHERE a > $1 ORDER BY a | 2 | 4 | f | f | f
+ SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | f | f | f
+ SELECT pg_stat_statements_reset() | 1 | 1 | f | f | f
+ SET pg_stat_statements.track_utility = FALSE | 1 | 0 | f | f | t
+ UPDATE pgss_test SET b = $1 WHERE a = $2 | 6 | 6 | t | t | t
+ UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 | 3 | t | t | t
+(11 rows)
+
--
-- pg_stat_statements.track = none
--
@@ -287,13 +393,13 @@ SELECT PLUS_ONE(10);
11
(1 row)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
------------------------------------+-------+------
- SELECT $1::TEXT | 1 | 1
- SELECT PLUS_ONE($1) | 2 | 2
- SELECT PLUS_TWO($1) | 2 | 2
- SELECT pg_stat_statements_reset() | 1 | 1
+SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes | wal_records
+-----------------------------------+-------+------+-----------+-------------
+ SELECT $1::TEXT | 1 | 1 | 0 | 0
+ SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0
+ SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
(4 rows)
--
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
new file mode 100644
index 0000000000..27ac80cde0
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -0,0 +1,50 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.8'" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+ OUT userid oid,
+ OUT dbid oid,
+ OUT queryid bigint,
+ OUT query text,
+ OUT calls int8,
+ OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT mean_time float8,
+ OUT stddev_time float8,
+ OUT rows int8,
+ OUT shared_blks_hit int8,
+ OUT shared_blks_read int8,
+ OUT shared_blks_dirtied int8,
+ OUT shared_blks_written int8,
+ OUT local_blks_hit int8,
+ OUT local_blks_read int8,
+ OUT local_blks_dirtied int8,
+ OUT local_blks_written int8,
+ OUT temp_blks_read int8,
+ OUT temp_blks_written int8,
+ OUT blk_read_time float8,
+ OUT blk_write_time float8,
+ OUT wal_bytes numeric,
+ OUT wal_records int8,
+ OUT wal_fpw_records int8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4'
+LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
+
+CREATE VIEW pg_stat_statements AS
+ SELECT * FROM pg_stat_statements(true);
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 50c345378d..03b97a37cb 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -120,7 +120,8 @@ typedef enum pgssVersion
PGSS_V1_0 = 0,
PGSS_V1_1,
PGSS_V1_2,
- PGSS_V1_3
+ PGSS_V1_3,
+ PGSS_V1_4
} pgssVersion;
/*
@@ -161,6 +162,9 @@ typedef struct Counters
int64 local_blks_written; /* # of local disk blocks written */
int64 temp_blks_read; /* # of temp blocks read */
int64 temp_blks_written; /* # of temp blocks written */
+ uint64 wal_bytes; /* total amount of wal bytes written */
+ int64 wal_records; /* # of wal records written */
+ int64 wal_fpw_records; /* # of full page wal records written */
double blk_read_time; /* time spent reading, in msec */
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
@@ -293,6 +297,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_3);
+PG_FUNCTION_INFO_V1(pg_stat_statements_1_4);
PG_FUNCTION_INFO_V1(pg_stat_statements);
static void pgss_shmem_startup(void);
@@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate);
static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
pgssVersion api_version,
@@ -841,6 +847,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
0,
0,
NULL,
+ NULL,
&jstate);
}
@@ -944,6 +951,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
queryDesc->totaltime->total * 1000.0, /* convert to msec */
queryDesc->estate->es_processed,
&queryDesc->totaltime->bufusage,
+ &queryDesc->totaltime->walusage,
NULL);
}
@@ -989,7 +997,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
BufferUsage bufusage_start,
bufusage;
+ WalUsage walusage_start,
+ walusage;
+
bufusage_start = pgBufferUsage;
+ walusage_start = pgWalUsage;
INSTR_TIME_SET_CURRENT(start);
nested_level++;
@@ -1019,6 +1031,9 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
memset(&bufusage, 0, sizeof(BufferUsage));
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
pgss_store(queryString,
0, /* signal that it's a utility stmt */
pstmt->stmt_location,
@@ -1026,6 +1041,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
INSTR_TIME_GET_MILLISEC(duration),
rows,
&bufusage,
+ &walusage,
NULL);
}
else
@@ -1061,13 +1077,14 @@ pgss_hash_string(const char *str, int len)
*
* If jstate is not NULL then we're trying to create an entry for which
* we have no statistics as yet; we just want to record the normalized
- * query string. total_time, rows, bufusage are ignored in this case.
+ * query string. Time and usage are ignored in this case.
*/
static void
pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage* walusage,
pgssJumbleState *jstate)
{
pgssHashKey key;
@@ -1259,6 +1276,9 @@ pgss_store(const char *query, uint64 queryId,
e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
+ e->counters.wal_bytes += walusage->wal_bytes;
+ e->counters.wal_records += walusage->wal_records;
+ e->counters.wal_fpw_records += walusage->wal_fpw_records;
SpinLockRelease(&e->mutex);
}
@@ -1306,7 +1326,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
#define PG_STAT_STATEMENTS_COLS_V1_1 18
#define PG_STAT_STATEMENTS_COLS_V1_2 19
#define PG_STAT_STATEMENTS_COLS_V1_3 23
-#define PG_STAT_STATEMENTS_COLS 23 /* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_4 26
+#define PG_STAT_STATEMENTS_COLS 26 /* maximum of above */
/*
* Retrieve statement statistics.
@@ -1318,6 +1339,15 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
* expected API version is identified by embedding it in the C name of the
* function. Unfortunately we weren't bright enough to do that for 1.1.
*/
+Datum pg_stat_statements_1_4(PG_FUNCTION_ARGS)
+{
+ bool showtext = PG_GETARG_BOOL(0);
+
+ pg_stat_statements_internal(fcinfo, PGSS_V1_4, showtext);
+
+ return (Datum)0;
+}
+
Datum
pg_stat_statements_1_3(PG_FUNCTION_ARGS)
{
@@ -1423,6 +1453,10 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
if (api_version != PGSS_V1_3)
elog(ERROR, "incorrect number of output arguments");
break;
+ case PG_STAT_STATEMENTS_COLS_V1_4:
+ if (api_version != PGSS_V1_4)
+ elog(ERROR, "incorrect number of output arguments");
+ break;
default:
elog(ERROR, "incorrect number of output arguments");
}
@@ -1619,11 +1653,29 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
values[i++] = Float8GetDatumFast(tmp.blk_read_time);
values[i++] = Float8GetDatumFast(tmp.blk_write_time);
}
+ if (api_version >= PGSS_V1_4)
+ {
+ char buf[256];
+ Datum wal_bytes;
+
+ snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
+
+ /* Convert to numeric. */
+ wal_bytes = DirectFunctionCall3(numeric_in,
+ CStringGetDatum(buf),
+ ObjectIdGetDatum(0),
+ Int32GetDatum(-1));
+
+ values[i++] = wal_bytes;
+ values[i++] = Int64GetDatumFast(tmp.wal_records);
+ values[i++] = UInt64GetDatum(tmp.wal_fpw_records);
+ }
Assert(i == (api_version == PGSS_V1_0 ? PG_STAT_STATEMENTS_COLS_V1_0 :
api_version == PGSS_V1_1 ? PG_STAT_STATEMENTS_COLS_V1_1 :
api_version == PGSS_V1_2 ? PG_STAT_STATEMENTS_COLS_V1_2 :
api_version == PGSS_V1_3 ? PG_STAT_STATEMENTS_COLS_V1_3 :
+ api_version == PGSS_V1_4 ? PG_STAT_STATEMENTS_COLS_V1_4 :
-1 /* fail if you forget to update this assert */ ));
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 14cb422354..7fb20df886 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
# pg_stat_statements extension
comment = 'track execution statistics of all SQL statements executed'
-default_version = '1.7'
+default_version = '1.8'
module_pathname = '$libdir/pg_stat_statements'
relocatable = true
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 8b527070d4..a8c9b4428e 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -99,7 +99,67 @@ SELECT * FROM test ORDER BY a;
-- SELECT with IN clause
SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- SELECT check WAL usage stats to confirm temp tables do not get stored in WAL
+SELECT query, calls, rows, wal_bytes, wal_records, wal_fpw_records
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+
+-- explicit transaction
+BEGIN;
+UPDATE pgss_test SET b = '111' WHERE a = 1 ;
+COMMIT;
+
+BEGIN \;
+UPDATE pgss_test SET b = '222' WHERE a = 2 \;
+COMMIT ;
+
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
+
+BEGIN \;
+UPDATE pgss_test SET b = '555' WHERE a = 5 \;
+UPDATE pgss_test SET b = '666' WHERE a = 6 \;
+COMMIT ;
+
+-- many INSERT values
+INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+
+-- SELECT with constants
+SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ;
+
+SELECT *
+ FROM pgss_test
+ WHERE a > 9
+ ORDER BY a ;
+
+-- SELECT without constants
+SELECT * FROM pgss_test ORDER BY a;
+
+-- SELECT with IN clause
+SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5);
+
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+
+-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- pg_stat_statements.track = none
@@ -144,7 +204,7 @@ $$ SELECT (i + 1.0)::INTEGER LIMIT 1 $$ LANGUAGE SQL;
SELECT PLUS_ONE(8);
SELECT PLUS_ONE(10);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- pg_stat_statements.track = all
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 26bb82da4a..80ad03b3da 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -221,6 +221,33 @@
</entry>
</row>
+ <row>
+ <entry><structfield>wal_bytes</structfield></entry>
+ <entry><type>numeric</type></entry>
+ <entry></entry>
+ <entry>
+ Total amount of WAL bytes generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL records generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_fpw_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL full page images generated by the statement
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
--
2.20.1
v8-0004-Expose-WAL-usage-counters-in-verbose-auto-vacuum-.patch (text/plain; charset=us-ascii)
From 62bd1fa1c667d3bbd713072688d2bd9e9f9b15fc Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 19 Mar 2020 16:08:47 +0100
Subject: [PATCH v8 4/4] Expose WAL usage counters in verbose (auto)vacuum
output.
Author: Julien Rouhaud
Reviewed-by: Fuji Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 25 ++++++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9726f69629..55df857ff7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -65,6 +65,7 @@
#include "commands/dbcommands.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "optimizer/paths.h"
#include "pgstat.h"
@@ -401,6 +402,8 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
int nindexes;
PGRUsage ru0;
TimestampTz starttime = 0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
long secs;
int usecs;
double read_rate,
@@ -622,6 +625,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
TimestampDifference(starttime, endtime, &secs, &usecs);
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
read_rate = 0;
write_rate = 0;
if ((secs > 0) || (usecs > 0))
@@ -673,7 +679,13 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
(long long) VacuumPageDirty);
appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
read_rate, write_rate);
- appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf,
+ _("WAL usage: %ld records, %ld full page records, "
+ UINT64_FORMAT " bytes"),
+ walusage.wal_records,
+ walusage.wal_fpw_records,
+ walusage.wal_bytes);
ereport(LOG,
(errmsg_internal("%s", buf.data)));
@@ -765,6 +777,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
IndexBulkDeleteResult **indstats;
int i;
PGRUsage ru0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
Buffer vmbuffer = InvalidBuffer;
BlockNumber next_unskippable_block;
bool skipping_blocks;
@@ -1744,6 +1758,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
"%u pages are entirely empty.\n",
empty_pages),
empty_pages);
+
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+ appendStringInfo(&buf, _("%ld WAL records, %ld WAL full page records, "
+ UINT64_FORMAT " WAL bytes\n"),
+ walusage.wal_records,
+ walusage.wal_fpw_records,
+ walusage.wal_bytes);
+
appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
ereport(elevel,
--
2.20.1
On Mon, Mar 30, 2020 at 12:31 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
The patch for vacuum conflicts with recent changes in vacuum. So I've
attached rebased one.
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < nworkers; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
+
This should be done for launched workers aka
lps->pcxt->nworkers_launched. I think a similar problem exists in
create index related patch.
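To illustrate the point, here is a minimal standalone sketch of the leader-side loop. It is not code from the patch: LaunchDemoUsage and the launch_demo_* functions are hypothetical stand-ins for BufferUsage and InstrAccumParallelQuery, and it assumes the launched workers fill the first nworkers_launched slots, so the remaining slots hold whatever was in the shared memory chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for BufferUsage. */
typedef struct LaunchDemoUsage
{
    long        shared_blks_hit;
    long        shared_blks_read;
} LaunchDemoUsage;

/* Add one worker's counters into the leader's running total. */
void
launch_demo_accum(LaunchDemoUsage *total, const LaunchDemoUsage *worker)
{
    total->shared_blks_hit += worker->shared_blks_hit;
    total->shared_blks_read += worker->shared_blks_read;
}

/*
 * Sum the per-worker slots. Only slots belonging to workers that were
 * actually launched are read; slots for workers that were planned but
 * never started are skipped because nothing ever wrote to them.
 */
LaunchDemoUsage
launch_demo_accum_launched(const LaunchDemoUsage *slots,
                           int nworkers_planned,
                           int nworkers_launched)
{
    LaunchDemoUsage total = {0, 0};

    assert(slots != NULL);
    assert(nworkers_launched >= 0 && nworkers_launched <= nworkers_planned);

    for (int i = 0; i < nworkers_launched; i++)
        launch_demo_accum(&total, &slots[i]);

    return total;
}
```

Looping up to the planned count instead would fold the contents of the unused slots into the totals.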
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, 31 Mar 2020 at 12:58, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Mar 30, 2020 at 12:31 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
The patch for vacuum conflicts with recent changes in vacuum. So I've
attached rebased one.
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < nworkers; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
+
This should be done for launched workers aka
lps->pcxt->nworkers_launched. I think a similar problem exists in
create index related patch.
You're right. Fixed in the new patches.
On Mon, 30 Mar 2020 at 17:00, Julien Rouhaud <rjuju123@gmail.com> wrote:
Just minor nitpicking:
+ int i;
Assert(!IsParallelWorker());
Assert(ParallelVacuumIsActive(lps));
@@ -2166,6 +2172,13 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
/* Wait for all vacuum workers to finish */
WaitForParallelWorkersToFinish(lps->pcxt);
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < nworkers; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
We now allow declaring a variable in those loops, so it may be better to avoid
declaring i outside the for scope?
We can do that but I was not sure if it's good since other codes
around there don't use that. So I'd like to leave it for committers.
It's a trivial change.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
bufferusage_vacuum_v3.patch (application/octet-stream)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..777c7c9e70 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -65,6 +65,7 @@
#include "commands/dbcommands.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "optimizer/paths.h"
#include "pgstat.h"
@@ -137,6 +138,7 @@
#define PARALLEL_VACUUM_KEY_SHARED 1
#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -259,6 +261,9 @@ typedef struct LVParallelState
/* Shared information among parallel vacuum workers */
LVShared *lvshared;
+ /* Points to buffer usage are in DSM */
+ BufferUsage *buffer_usage;
+
/*
* The number of indexes that support parallel index bulk-deletion and
* parallel index cleanup respectively.
@@ -1991,6 +1996,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
int nindexes)
{
int nworkers;
+ int i;
Assert(!IsParallelWorker());
Assert(ParallelVacuumIsActive(lps));
@@ -2088,6 +2094,13 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
/* Wait for all vacuum workers to finish */
WaitForParallelWorkersToFinish(lps->pcxt);
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < lps->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
+
/*
* Carry the shared balance value to heap scan and disable shared costing
*/
@@ -3071,6 +3084,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
ParallelContext *pcxt;
LVShared *shared;
LVDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
bool *can_parallel_vacuum;
long maxtuples;
char *sharedquery;
@@ -3154,6 +3168,18 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Estimate space for BufferUsage -- PARALLEL_VACUUM_KEY_BUFFER_USAGE.
+ *
+ * BufferUsage during executing maintenance command can be used by an
+ * extension that reports the buffer usage, such as pg_stat_statements.
+ * We have no way of knowing whether anyone's looking at pgBufferUsage,
+ * so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -3188,6 +3214,12 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
vacrelstats->dead_tuples = dead_tuples;
+ /* Allocate space for each worker's BufferUsage; no need to initialize */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ lps->buffer_usage = buffer_usage;
+
/* Store query string for workers */
sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
memcpy(sharedquery, debug_query_string, querylen + 1);
@@ -3317,6 +3349,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
Relation *indrels;
LVShared *lvshared;
LVDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
int nindexes;
char *sharedquery;
IndexBulkDeleteResult **stats;
@@ -3369,9 +3402,16 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
if (lvshared->maintenance_work_mem_worker > 0)
maintenance_work_mem = lvshared->maintenance_work_mem_worker;
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
/* Process indexes to perform vacuum/cleanup */
parallel_vacuum_index(indrels, stats, lvshared, dead_tuples, nindexes);
+ /* Report buffer usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+
vac_close_indexes(nindexes, indrels, RowExclusiveLock);
table_close(onerel, ShareUpdateExclusiveLock);
pfree(stats);
bufferusage_create_index_v2.patch (application/octet-stream)
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index e66cd36dfa..a08a7df5cc 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -67,6 +67,7 @@
#include "access/xloginsert.h"
#include "catalog/index.h"
#include "commands/progress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/smgr.h"
@@ -81,6 +82,7 @@
#define PARALLEL_KEY_TUPLESORT UINT64CONST(0xA000000000000002)
#define PARALLEL_KEY_TUPLESORT_SPOOL2 UINT64CONST(0xA000000000000003)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xA000000000000004)
+#define PARALLEL_KEY_BUFFER_USAGE UINT64CONST(0xA000000000000005)
/*
* DISABLE_LEADER_PARTICIPATION disables the leader's participation in
@@ -203,6 +205,7 @@ typedef struct BTLeader
Sharedsort *sharedsort;
Sharedsort *sharedsort2;
Snapshot snapshot;
+ BufferUsage *bufferusage;
} BTLeader;
/*
@@ -1474,6 +1477,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
Sharedsort *sharedsort2;
BTSpool *btspool = buildstate->spool;
BTLeader *btleader = (BTLeader *) palloc0(sizeof(BTLeader));
+ BufferUsage *bufferusage;
bool leaderparticipates = true;
char *sharedquery;
int querylen;
@@ -1526,6 +1530,18 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
shm_toc_estimate_keys(&pcxt->estimator, 3);
}
+ /*
+ * Estimate space for BufferUsage -- PARALLEL_KEY_BUFFER_USAGE
+ *
+ * BufferUsage during executing maintenance command can be used by an
+ * extension that reports the buffer usage, such as pg_stat_statements.
+ * We have no way of knowing whether anyone's looking at pgBufferUsage,
+ * so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -1597,6 +1613,11 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
memcpy(sharedquery, debug_query_string, querylen + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, sharedquery);
+ /* Allocate space for each worker's BufferUsage; no need to initialize */
+ bufferusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufferusage);
+
/* Launch workers, saving status for leader/caller */
LaunchParallelWorkers(pcxt);
btleader->pcxt = pcxt;
@@ -1607,6 +1628,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
btleader->sharedsort = sharedsort;
btleader->sharedsort2 = sharedsort2;
btleader->snapshot = snapshot;
+ btleader->bufferusage = bufferusage;
/* If no workers were successfully launched, back out (do serial build) */
if (pcxt->nworkers_launched == 0)
@@ -1635,8 +1657,18 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
static void
_bt_end_parallel(BTLeader *btleader)
{
+ int i;
+
/* Shutdown worker processes */
WaitForParallelWorkersToFinish(btleader->pcxt);
+
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < btleader->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&btleader->bufferusage[i]);
+
/* Free last reference to MVCC snapshot, if one was used */
if (IsMVCCSnapshot(btleader->snapshot))
UnregisterSnapshot(btleader->snapshot);
@@ -1767,6 +1799,7 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
Relation indexRel;
LOCKMODE heapLockmode;
LOCKMODE indexLockmode;
+ BufferUsage *bufferusage;
int sortmem;
#ifdef BTREE_BUILD_STATS
@@ -1828,11 +1861,18 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
tuplesort_attach_shared(sharedsort2, seg);
}
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
/* Perform sorting of spool, and possibly a spool2 */
sortmem = maintenance_work_mem / btshared->scantuplesortstates;
_bt_parallel_scan_and_sort(btspool, btspool2, btshared, sharedsort,
sharedsort2, sortmem, false);
+ /* Report buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber]);
+
#ifdef BTREE_BUILD_STATS
if (log_btree_build_stats)
{
On Tue, Mar 31, 2020 at 10:44 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Tue, 31 Mar 2020 at 12:58, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Mar 30, 2020 at 12:31 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
The patch for vacuum conflicts with recent changes in vacuum. So I've
attached rebased one.
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < nworkers; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
+
This should be done for launched workers aka
lps->pcxt->nworkers_launched. I think a similar problem exists in
create index related patch.
You're right. Fixed in the new patches.
On Mon, 30 Mar 2020 at 17:00, Julien Rouhaud <rjuju123@gmail.com> wrote:
Just minor nitpicking:
+ int i;
Assert(!IsParallelWorker());
Assert(ParallelVacuumIsActive(lps));
@@ -2166,6 +2172,13 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
/* Wait for all vacuum workers to finish */
WaitForParallelWorkersToFinish(lps->pcxt);
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < nworkers; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
We now allow declaring a variable in those loops, so it may be better to avoid
declaring i outside the for scope?
We can do that but I was not sure if it's good since other codes
around there don't use that. So I'd like to leave it for committers.
It's a trivial change.
I have reviewed the patch and the patch looks fine to me.
One minor comment
+ /* Points to buffer usage are in DSM */
+ BufferUsage *buffer_usage;
+
s/buffer usage are in DSM/buffer usage area in DSM/
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Mon, Mar 30, 2020 at 03:52:38PM +0530, Amit Kapila wrote:
I think the right place to compute this information is
XLogRecordAssemble even though we update it at the place where you
have it in the patch. You can probably compute that in local
variables and then transfer to pgWalUsage in XLogInsertRecord. I am
fine if you can think of some other way but the current patch doesn't
seem correct to me.
My previous approach was indeed totally broken. v8 attached which hopefully
will be ok.
This is better. Few more comments:
1. The point (c) from my previous email doesn't seem to be fixed
properly. Basically, the record data is only attached with FPW in
some particular cases like where REGBUF_KEEP_DATA is set, but the
patch assumes it is always set.
2.
+ /* Report a full page imsage constructed for the WAL record */
+ *num_fpw += 1;
Typo. /imsage/image
3. We need to enhance the patch to cover WAL usage for parallel
vacuum and parallel create index based on Sawada-San's latest patch[1]
which fixed the case for buffer usage.
[1]: /messages/by-id/CA+fd4k5L4yVoWz0smymmqB4_SMHd2tyJExUgA_ACsL7k00B5XQ@mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 31, 2020 at 8:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Mon, Mar 30, 2020 at 03:52:38PM +0530, Amit Kapila wrote:
I think the right place to compute this information is
XLogRecordAssemble even though we update it at the place where you
have it in the patch. You can probably compute that in local
variables and then transfer to pgWalUsage in XLogInsertRecord. I am
fine if you can think of some other way but the current patch doesn't
seem correct to me.
My previous approach was indeed totally broken. v8 attached which hopefully
will be ok.
This is better. Few more comments:
1. The point (c) from my previous email doesn't seem to be fixed
properly. Basically, the record data is only attached with FPW in
some particular cases like where REGBUF_KEEP_DATA is set, but the
patch assumes it is always set.
As I mentioned multiple times already, I'm really not familiar with
the WAL code, so I'll be happy to be proven wrong but my reading is
that in XLogRecordAssemble(), there are 2 different things being done:
- an FPW is optionally added, iff include_image is true, which doesn't
take into account REGBUF_KEEP_DATA. Looking at that part of the code
I don't see any sign of the recorded FPW being skipped or discarded if
REGBUF_KEEP_DATA is not set, and useful variables such as total_len
are modified
- then data is also optionally added, iff needs_data is set.
IIUC an FPW can be added even if the WAL record doesn't contain data.
So the behavior looks ok to me, as what seems to be useful is to
distinguish 9KB of WAL for 1 record of 9KB from 9KB of WAL for a 1KB
record and 1 FPW.
What am I missing here?
2.
+ /* Report a full page imsage constructed for the WAL record */
+ *num_fpw += 1;
Typo. /imsage/image
Oops yes, will fix.
3. We need to enhance the patch to cover WAL usage for parallel
vacuum and parallel create index based on Sawada-San's latest patch[1]
which fixed the case for buffer usage.
I'm sorry but I'm not following. Do you mean adding regression tests
for that case?
On Tue, Mar 31, 2020 at 12:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Mon, Mar 30, 2020 at 03:52:38PM +0530, Amit Kapila wrote:
I think the right place to compute this information is
XLogRecordAssemble even though we update it at the place where you
have it in the patch. You can probably compute that in local
variables and then transfer to pgWalUsage in XLogInsertRecord. I am
fine if you can think of some other way but the current patch doesn't
seem correct to me.
My previous approach was indeed totally broken. v8 attached which hopefully
will be ok.
This is better. Few more comments:
1. The point (c) from my previous email doesn't seem to be fixed
properly. Basically, the record data is only attached with FPW in
some particular cases like where REGBUF_KEEP_DATA is set, but the
patch assumes it is always set.
2.
+ /* Report a full page imsage constructed for the WAL record */
+ *num_fpw += 1;
Typo. /imsage/image
3. We need to enhance the patch to cover WAL usage for parallel
vacuum and parallel create index based on Sawada-San's latest patch[1]
which fixed the case for buffer usage.
I have started reviewing this patch and I have some comments/questions.
1.
@@ -22,6 +22,10 @@ static BufferUsage save_pgBufferUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
+WalUsage pgWalUsage;
+static WalUsage save_pgWalUsage;
+
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
It would be better to move all variable declarations first, along with
the other variables, and then the function declarations along with the
other function declarations. That is the convention we follow.
2.
{
bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
I think you need to run pgindent, we should give only one space
between the variable name and '='.
so we need to change like below
bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
3.
+typedef struct WalUsage
+{
+ long wal_records; /* # of WAL records produced */
+ long wal_fpw_records; /* # of full page write WAL records
+ * produced */
IMHO, the name wal_fpw_records is a bit confusing. At first I thought it
was counting the number of WAL records that actually contain an FPW; then,
after seeing the code, I realized that it is actually counting the total
number of FPWs. Shouldn't we rename it to just wal_fpw, or wal_num_fpw, or
wal_fpw_count?
4. Currently, we are combining all full-page write
force/normal/consistency checks in one category. I am not sure
whether it will be good information to know how many are force_fpw and
how many are normal_fpw?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 31, 2020 at 2:51 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
4. Currently, we are combining all full-page write
force/normal/consistency checks in one category. I am not sure
whether it will be good information to know how many are force_fpw and
how many are normal_fpw?
We can do it if we want but I am not sure how useful it will be. I
think we can always enhance this information if people really need
this and have a clear use-case in mind.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 31, 2020 at 2:39 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Tue, Mar 31, 2020 at 8:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Mon, Mar 30, 2020 at 03:52:38PM +0530, Amit Kapila wrote:
I think the right place to compute this information is
XLogRecordAssemble even though we update it at the place where you
have it in the patch. You can probably compute that in local
variables and then transfer to pgWalUsage in XLogInsertRecord. I am
fine if you can think of some other way but the current patch doesn't
seem correct to me.
My previous approach was indeed totally broken. v8 attached which hopefully
will be ok.
This is better. Few more comments:
1. The point (c) from my previous email doesn't seem to be fixed
properly. Basically, the record data is only attached with FPW in
some particular cases like where REGBUF_KEEP_DATA is set, but the
patch assumes it is always set.
As I mentioned multiple times already, I'm really not familiar with
the WAL code, so I'll be happy to be proven wrong but my reading is
that in XLogRecordAssemble(), there are 2 different things being done:
- an FPW is optionally added, iff include_image is true, which doesn't
take into account REGBUF_KEEP_DATA. Looking at that part of the code
I don't see any sign of the recorded FPW being skipped or discarded if
REGBUF_KEEP_DATA is not set, and useful variables such as total_len
are modified
- then data is also optionally added, iff needs_data is set.
IIUC an FPW can be added even if the WAL record doesn't contain data.
So the behavior looks ok to me, as what seems to be useful is to
distinguish 9KB of WAL for 1 record of 9KB from 9KB of WAL for a 1KB
record and 1 FPW.
It is possible that we each have a different meaning in mind for the
two variables below:
+typedef struct WalUsage
+{
+ long wal_records; /* # of WAL records produced */
+ long wal_fpw_records; /* # of full page write WAL records
+ * produced */
Let me clarify my understanding. Say if the record is just an FPI
(ex. XLOG_FPI) and doesn't contain any data then do we want to add one
to each of wal_fpw_records and wal_records? My understanding was in
such a case we will just increment wal_fpw_records.
3. We need to enhance the patch to cover WAL usage for parallel
vacuum and parallel create index based on Sawada-San's latest patch[1]
which fixed the case for buffer usage.
I'm sorry but I'm not following. Do you mean adding regression tests
for that case?
No. I mean to say we should implement WAL usage calculation for those
two parallel commands. AFAICS, your patch doesn't cover those two
commands.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
@@ -448,6 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info)
bool doPageWrites;
XLogRecPtr fpw_lsn;
XLogRecData *rdt;
+ int num_fpw = 0;
/*
* Get values needed to decide whether to do full-page writes. Since
@@ -457,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info)
GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);
rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
- &fpw_lsn);
+ &fpw_lsn, &num_fpw);
- EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags);
+ EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw);
} while (EndPos == InvalidXLogRecPtr);
I think there are some issues in the num_fpw calculation. For some
cases, we have to return from XLogInsert without inserting a record.
Basically, we've to recompute/reassemble the same record. In those
cases, num_fpw should be reset. Thoughts?
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 31, 2020 at 11:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have started reviewing this patch and I have some comments/questions.
Thanks a lot!
1.
@@ -22,6 +22,10 @@ static BufferUsage save_pgBufferUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
+WalUsage pgWalUsage;
+static WalUsage save_pgWalUsage;
+
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
Better we move all variable declaration first along with other
variables and then function declaration along with other function
declaration. That is the convention we follow.
Agreed, fixed.
2.
{
bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
I think you need to run pgindent, we should give only one space
between the variable name and '='.
so we need to change like below
bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
Done.
3.
+typedef struct WalUsage
+{
+ long wal_records; /* # of WAL records produced */
+ long wal_fpw_records; /* # of full page write WAL records
+ * produced */
IMHO, the name wal_fpw_records is a bit confusing. First I thought it
is counting the number of wal records which actually has FPW, then
after seeing code, I realized that it is actually counting total FPW.
Shouldn't we rename it to just wal_fpw? or wal_num_fpw or
wal_fpw_count?
Yes I agree, the name was too confusing. I went with wal_num_fpw. I
also used the same for pg_stat_statements. Other fields are usually
named with a trailing "s" but wal_fpws just seems too weird. I can
change it if consistency is preferred here.
4. Currently, we are combining all full-page write
force/normal/consistency checks in one category. I am not sure
whether it will be good information to know how many are force_fpw and
how many are normal_fpw?
I agree with Amit's POV. For now a single counter seems like enough
to diagnose many behaviors.
I'll keep answering following mails before sending an updated patchset.
On Tue, Mar 31, 2020 at 12:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
It is possible that both of us are having different meanings for below
two variables:
+typedef struct WalUsage
+{
+ long wal_records; /* # of WAL records produced */
+ long wal_fpw_records; /* # of full page write WAL records
+ * produced */
Let me clarify my understanding. Say if the record is just an FPI
(ex. XLOG_FPI) and doesn't contain any data then do we want to add one
to each of wal_fpw_records and wal_records? My understanding was in
such a case we will just increment wal_fpw_records.
Yes, as Dilip just pointed out the misunderstanding is due to this
poor name. Indeed, in such case what I want is both counters to be
incremented. What I want is wal_records to reflect the total number
of records generated regardless of any content, and wal_num_fpw the
number of full page images, as it seems to make the most sense, and
the easiest way to estimate the ratio of data due to FPW.
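The counting rule described above can be sketched as follows. This is only an illustration under stated assumptions: `WalUsageSketch` and `wal_usage_count_record` are hypothetical names, not code from the patch, and the `wal_bytes` field is assumed from the byte counter mentioned in the original submission. The point is that every inserted record bumps wal_records by one, whatever it contains, while wal_num_fpw grows by the number of full page images the record carries.

```c
#include <stdint.h>

/* Hypothetical stand-in for the WalUsage struct discussed in this thread. */
typedef struct WalUsageSketch
{
	long		wal_records;	/* total records, regardless of content */
	long		wal_num_fpw;	/* total full page images */
	uint64_t	wal_bytes;		/* assumed byte counter, images included */
} WalUsageSketch;

/*
 * Account for one inserted record that carries num_fpw full page images
 * and occupies total_len bytes.  An FPI-only record (e.g. XLOG_FPI) still
 * counts as one record and as one (or more) full page images.
 */
static void
wal_usage_count_record(WalUsageSketch *usage, int num_fpw, uint32_t total_len)
{
	usage->wal_records += 1;
	usage->wal_num_fpw += num_fpw;
	usage->wal_bytes += total_len;
}
```

With this rule, an FPI-only record followed by a plain 1KB record yields wal_records = 2 and wal_num_fpw = 1, so the two counters can be compared directly.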
3. We need to enhance the patch to cover WAL usage for parallel
vacuum and parallel create index based on Sawada-San's latest patch[1]
which fixed the case for buffer usage.
I'm sorry but I'm not following. Do you mean adding regression tests
for that case?
No. I mean to say we should implement WAL usage calculation for those
two parallel commands. AFAICS, your patch doesn't cover those two
commands.
Oh I see. I just assumed that Sawada-san's patch would be committed
first and I'd then rebase the patchset on top of the newly added
infrastructure to also handle WAL counters, to avoid any conflict on
that bugfix while this new feature is being discussed. I'll rebase
the patchset against those patches then.
On Tue, Mar 31, 2020 at 12:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Tue, Mar 31, 2020 at 10:44 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Tue, 31 Mar 2020 at 12:58, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Mar 30, 2020 at 12:31 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
The patch for vacuum conflicts with recent changes in vacuum. So I've
attached rebased one.
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < nworkers; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
+
This should be done for launched workers aka
lps->pcxt->nworkers_launched. I think a similar problem exists in
create index related patch.
You're right. Fixed in the new patches.
On Mon, 30 Mar 2020 at 17:00, Julien Rouhaud <rjuju123@gmail.com> wrote:
Just minor nitpicking:
+ int i;
Assert(!IsParallelWorker());
Assert(ParallelVacuumIsActive(lps));
@@ -2166,6 +2172,13 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
/* Wait for all vacuum workers to finish */
WaitForParallelWorkersToFinish(lps->pcxt);
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < nworkers; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
We now allow declaring a variable in those loops, so it may be better to avoid
declaring i outside the for scope?
We can do that but I was not sure if it's good since other codes
around there don't use that. So I'd like to leave it for committers.
It's a trivial change.
I have reviewed the patch and the patch looks fine to me.
One minor comment
+ /* Points to buffer usage are in DSM */
+ BufferUsage *buffer_usage;
+
"buffer usage are in DSM" should be "buffer usage area in DSM".
While testing I have found one issue. Basically, during a parallel
vacuum, it was showing a higher number of
shared_blks_hit+shared_blks_read. After some investigation, I found
that during the cleanup phase nworkers is -1, and because of this we
didn't try to launch workers, but "lps->pcxt->nworkers_launched" had the
old launched worker count and shared memory also had old buffer read
data which was never updated as we did not try to launch the workers.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b97b678..5dfaf4d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2150,7 +2150,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
* Next, accumulate buffer usage. (This must wait for the workers to
* finish, or we might get incomplete data.)
*/
- for (i = 0; i < lps->pcxt->nworkers_launched; i++)
+ nworkers = Min(nworkers, lps->pcxt->nworkers_launched);
+ for (i = 0; i < nworkers; i++)
InstrAccumParallelQuery(&lps->buffer_usage[i]);
It worked after the above fix.
On Tue, Mar 31, 2020 at 12:21 PM Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
@@ -448,6 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info)
bool doPageWrites;
XLogRecPtr fpw_lsn;
XLogRecData *rdt;
+ int num_fpw = 0;
/*
* Get values needed to decide whether to do full-page writes. Since
@@ -457,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info)
GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);
rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
- &fpw_lsn);
+ &fpw_lsn, &num_fpw);
- EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags);
+ EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw);
} while (EndPos == InvalidXLogRecPtr);
I think there are some issues in the num_fpw calculation. For some
cases, we have to return from XLogInsert without inserting a record.
Basically, we've to recompute/reassemble the same record. In those
cases, num_fpw should be reset. Thoughts?
Mmm, yes, but since the same record is being recomputed from the
same RedoRecPtr, doesn't it mean that we need to reset the counter?
Otherwise we would count the same FPW multiple times.
On Tue, Mar 31, 2020 at 7:39 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Tue, Mar 31, 2020 at 12:21 PM Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
@@ -448,6 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info)
bool doPageWrites;
XLogRecPtr fpw_lsn;
XLogRecData *rdt;
+ int num_fpw = 0;
/*
* Get values needed to decide whether to do full-page writes. Since
@@ -457,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info)
GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);
rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
- &fpw_lsn);
+ &fpw_lsn, &num_fpw);
- EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags);
+ EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw);
} while (EndPos == InvalidXLogRecPtr);
I think there are some issues in the num_fpw calculation. For some
cases, we have to return from XLogInsert without inserting a record.
Basically, we've to recompute/reassemble the same record. In those
cases, num_fpw should be reset. Thoughts?
Mmm, yes, but since the same record is being recomputed from the
same RedoRecPtr, doesn't it mean that we need to reset the counter?
Otherwise we would count the same FPW multiple times.
Yes. That was my point as well. I missed the part that you're already
resetting the same inside the do-while loop before calling
XLogRecordAssemble. Sorry for the noise.
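The point settled above, that the counter has to start from zero on every pass through the retry loop, can be illustrated with a small sketch. The names `assemble_record_sketch` and `insert_with_retry_sketch` are hypothetical and only mimic the shape of the XLogInsert loop; the failed-attempt counter is an assumption used to simulate an insert that asks for the record to be reassembled.

```c
#include <stdbool.h>

/* Hypothetical assemble step: reports how many page images it attached. */
static void
assemble_record_sketch(int images_in_record, int *num_fpw)
{
	for (int i = 0; i < images_in_record; i++)
		(*num_fpw)++;
}

/*
 * Mimics the do/while retry loop: the first failed_attempts insert
 * attempts fail and force the same record to be assembled again.
 * Returns the FPW count that would be charged for the record.
 */
static int
insert_with_retry_sketch(int images_in_record, int failed_attempts)
{
	int			num_fpw;
	bool		inserted;

	do
	{
		/* Reset on every attempt, or retries would count images twice. */
		num_fpw = 0;
		assemble_record_sketch(images_in_record, &num_fpw);

		/* Simulated insert: succeeds once the failures are used up. */
		inserted = (failed_attempts-- <= 0);
	} while (!inserted);

	return num_fpw;
}
```

Without the reset inside the loop, a record with two page images that needed three retries would be charged eight images instead of two.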
On Tue, Mar 31, 2020 at 7:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
While testing I have found one issue. Basically, during a parallel
vacuum, it was showing a higher number of
shared_blks_hit+shared_blks_read. After some investigation, I found
that during the cleanup phase nworkers is -1, and because of this we
didn't try to launch workers, but "lps->pcxt->nworkers_launched" had the
old launched worker count and shared memory also had old buffer read
data which was never updated as we did not try to launch the workers.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b97b678..5dfaf4d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2150,7 +2150,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
* Next, accumulate buffer usage. (This must wait for the workers to
* finish, or we might get incomplete data.)
*/
- for (i = 0; i < lps->pcxt->nworkers_launched; i++)
+ nworkers = Min(nworkers, lps->pcxt->nworkers_launched);
+ for (i = 0; i < nworkers; i++)
InstrAccumParallelQuery(&lps->buffer_usage[i]);
It worked after the above fix.
Good catch. I think we should not even call
WaitForParallelWorkersToFinish for such a case. So, I guess the fix
could be,
if (workers > 0)
{
WaitForParallelWorkersToFinish();
for (i = 0; i < lps->pcxt->nworkers_launched; i++)
InstrAccumParallelQuery(&lps->buffer_usage[i]);
}
or something along those lines.
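To make the failure mode concrete, here is a minimal single-process sketch of the guarded accumulation suggested above. All types and function names are hypothetical stand-ins (the real code works on BufferUsage, ParallelContext and shared memory); the sketch only shows why a stale nworkers_launched is harmless once the whole wait-and-accumulate step is skipped when no workers were requested for the phase.

```c
/* Hypothetical stand-ins for BufferUsage and ParallelContext. */
typedef struct BufferUsageSketch
{
	long		shared_blks_hit;
	long		shared_blks_read;
} BufferUsageSketch;

typedef struct ParallelContextSketch
{
	int			nworkers_launched;	/* may be stale from an earlier phase */
} ParallelContextSketch;

static void
buffer_usage_add_sketch(BufferUsageSketch *dst, const BufferUsageSketch *add)
{
	dst->shared_blks_hit += add->shared_blks_hit;
	dst->shared_blks_read += add->shared_blks_read;
}

/*
 * End of one index vacuum or cleanup phase.  nworkers is the number of
 * workers requested for this phase; when it is not positive, nothing was
 * launched, so the per-worker slots still hold data from an earlier phase
 * and must not be added again.
 */
static void
finish_phase_sketch(int nworkers, const ParallelContextSketch *pcxt,
					const BufferUsageSketch *slots, BufferUsageSketch *leader)
{
	if (nworkers > 0)
	{
		/* The real code would wait for the workers to finish here. */
		for (int i = 0; i < pcxt->nworkers_launched; i++)
			buffer_usage_add_sketch(leader, &slots[i]);
	}
}
```

Calling finish_phase_sketch twice with the same slots, first with nworkers = 2 and then with nworkers = -1, adds the worker counters exactly once, which matches the behavior reported as correct after the fix.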
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 1, 2020 at 8:16 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Mar 31, 2020 at 7:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
While testing I have found one issue. Basically, during a parallel
vacuum, it was showing a higher number of
shared_blks_hit+shared_blks_read. After some investigation, I found
that during the cleanup phase nworkers is -1, and because of this we
didn't try to launch workers, but "lps->pcxt->nworkers_launched" had the
old launched worker count and shared memory also had old buffer read
data which was never updated as we did not try to launch the workers.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b97b678..5dfaf4d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2150,7 +2150,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
* Next, accumulate buffer usage. (This must wait for the workers to
* finish, or we might get incomplete data.)
*/
- for (i = 0; i < lps->pcxt->nworkers_launched; i++)
+ nworkers = Min(nworkers, lps->pcxt->nworkers_launched);
+ for (i = 0; i < nworkers; i++)
InstrAccumParallelQuery(&lps->buffer_usage[i]);
It worked after the above fix.
Good catch. I think we should not even call
WaitForParallelWorkersToFinish for such a case. So, I guess the fix
could be,
if (workers > 0)
{
WaitForParallelWorkersToFinish();
for (i = 0; i < lps->pcxt->nworkers_launched; i++)
InstrAccumParallelQuery(&lps->buffer_usage[i]);
}
or something along those lines.
Hmm, Right!
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, 1 Apr 2020 at 11:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Mar 31, 2020 at 7:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
While testing I have found one issue. Basically, during a parallel
vacuum, it was showing a higher number of
shared_blks_hit+shared_blks_read. After some investigation, I found
that during the cleanup phase nworkers is -1, and because of this we
didn't try to launch workers, but "lps->pcxt->nworkers_launched" had the
old launched worker count and shared memory also had old buffer read
data which was never updated as we did not try to launch the workers.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b97b678..5dfaf4d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2150,7 +2150,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
* Next, accumulate buffer usage. (This must wait for the workers to
* finish, or we might get incomplete data.)
*/
- for (i = 0; i < lps->pcxt->nworkers_launched; i++)
+ nworkers = Min(nworkers, lps->pcxt->nworkers_launched);
+ for (i = 0; i < nworkers; i++)
InstrAccumParallelQuery(&lps->buffer_usage[i]);
It worked after the above fix.
Good catch. I think we should not even call
WaitForParallelWorkersToFinish for such a case. So, I guess the fix
could be,
if (workers > 0)
{
WaitForParallelWorkersToFinish();
for (i = 0; i < lps->pcxt->nworkers_launched; i++)
InstrAccumParallelQuery(&lps->buffer_usage[i]);
}
Agreed. I've attached the updated patch.
Thank you for testing, Dilip!
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
bufferusage_vacuum_v4.patch (application/octet-stream)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..3a188574aa 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -65,6 +65,7 @@
#include "commands/dbcommands.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "optimizer/paths.h"
#include "pgstat.h"
@@ -137,6 +138,7 @@
#define PARALLEL_VACUUM_KEY_SHARED 1
#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -259,6 +261,9 @@ typedef struct LVParallelState
/* Shared information among parallel vacuum workers */
LVShared *lvshared;
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
/*
* The number of indexes that support parallel index bulk-deletion and
* parallel index cleanup respectively.
@@ -1991,6 +1996,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
int nindexes)
{
int nworkers;
+ int i;
Assert(!IsParallelWorker());
Assert(ParallelVacuumIsActive(lps));
@@ -2085,8 +2091,18 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
parallel_vacuum_index(Irel, stats, lps->lvshared,
vacrelstats->dead_tuples, nindexes);
- /* Wait for all vacuum workers to finish */
- WaitForParallelWorkersToFinish(lps->pcxt);
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ if (nworkers > 0)
+ {
+ /* Wait for all vacuum workers to finish */
+ WaitForParallelWorkersToFinish(lps->pcxt);
+
+ for (i = 0; i < lps->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
+ }
/*
* Carry the shared balance value to heap scan and disable shared costing
@@ -3071,6 +3087,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
ParallelContext *pcxt;
LVShared *shared;
LVDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
bool *can_parallel_vacuum;
long maxtuples;
char *sharedquery;
@@ -3154,6 +3171,18 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Estimate space for BufferUsage -- PARALLEL_VACUUM_KEY_BUFFER_USAGE.
+ *
+ * BufferUsage during executing maintenance command can be used by an
+ * extension that reports the buffer usage, such as pg_stat_statements.
+ * We have no way of knowing whether anyone's looking at pgBufferUsage,
+ * so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -3188,6 +3217,12 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
vacrelstats->dead_tuples = dead_tuples;
+ /* Allocate space for each worker's BufferUsage; no need to initialize */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ lps->buffer_usage = buffer_usage;
+
/* Store query string for workers */
sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
memcpy(sharedquery, debug_query_string, querylen + 1);
@@ -3317,6 +3352,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
Relation *indrels;
LVShared *lvshared;
LVDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
int nindexes;
char *sharedquery;
IndexBulkDeleteResult **stats;
@@ -3369,9 +3405,16 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
if (lvshared->maintenance_work_mem_worker > 0)
maintenance_work_mem = lvshared->maintenance_work_mem_worker;
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
/* Process indexes to perform vacuum/cleanup */
parallel_vacuum_index(indrels, stats, lvshared, dead_tuples, nindexes);
+ /* Report buffer usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+
vac_close_indexes(nindexes, indrels, RowExclusiveLock);
table_close(onerel, ShareUpdateExclusiveLock);
pfree(stats);
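The per-worker bookkeeping added by the patch above follows a snapshot-and-difference pattern: each worker remembers its counters when it starts, writes the difference into its own slot of a shared array when it finishes, and the leader later adds every launched worker's slot to its own counters. The sketch below models that flow in a single process; `ProcessSketch` and the three functions are hypothetical simplifications and do not reproduce the real InstrStartParallelQuery, InstrEndParallelQuery or InstrAccumParallelQuery signatures.

```c
/* Hypothetical stand-in for BufferUsage. */
typedef struct BufferUsageSketch
{
	long		shared_blks_hit;
	long		shared_blks_read;
} BufferUsageSketch;

/* Per-process state: running counters plus the snapshot taken at start. */
typedef struct ProcessSketch
{
	BufferUsageSketch current;
	BufferUsageSketch saved;
} ProcessSketch;

/* Worker side, before doing any work: remember where the counters stand. */
static void
worker_start_sketch(ProcessSketch *worker)
{
	worker->saved = worker->current;
}

/* Worker side, after the work: publish only what this run consumed. */
static void
worker_end_sketch(const ProcessSketch *worker, BufferUsageSketch *slot)
{
	slot->shared_blks_hit =
		worker->current.shared_blks_hit - worker->saved.shared_blks_hit;
	slot->shared_blks_read =
		worker->current.shared_blks_read - worker->saved.shared_blks_read;
}

/* Leader side, after waiting for the workers: fold one slot in. */
static void
leader_accum_sketch(ProcessSketch *leader, const BufferUsageSketch *slot)
{
	leader->current.shared_blks_hit += slot->shared_blks_hit;
	leader->current.shared_blks_read += slot->shared_blks_read;
}
```

Because the slot is overwritten with a fresh difference on every run, the slots need no initialization when the shared array is allocated, which is consistent with the "no need to initialize" comment in the patch.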
On Wed, Apr 1, 2020 at 8:26 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Wed, 1 Apr 2020 at 11:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Mar 31, 2020 at 7:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
While testing I have found one issue. Basically, during a parallel
vacuum, it was showing a higher number of
shared_blks_hit+shared_blks_read. After some investigation, I found
that during the cleanup phase nworkers is -1, and because of this we
didn't try to launch workers, but "lps->pcxt->nworkers_launched" had the
old launched worker count and shared memory also had old buffer read
data which was never updated as we did not try to launch the workers.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b97b678..5dfaf4d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2150,7 +2150,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
* Next, accumulate buffer usage. (This must wait for the workers to
* finish, or we might get incomplete data.)
*/
- for (i = 0; i < lps->pcxt->nworkers_launched; i++)
+ nworkers = Min(nworkers, lps->pcxt->nworkers_launched);
+ for (i = 0; i < nworkers; i++)
InstrAccumParallelQuery(&lps->buffer_usage[i]);
It worked after the above fix.
Good catch. I think we should not even call
WaitForParallelWorkersToFinish for such a case. So, I guess the fix
could be,
if (workers > 0)
{
WaitForParallelWorkersToFinish();
for (i = 0; i < lps->pcxt->nworkers_launched; i++)
InstrAccumParallelQuery(&lps->buffer_usage[i]);
}
Agreed. I've attached the updated patch.
Thank you for testing, Dilip!
Thanks! One hunk is failing on the latest head. I have rebased
the patch for my testing, so I am posting the rebased version. I have
also done some more testing to cover multi-pass vacuum.
postgres[114321]=# show maintenance_work_mem ;
maintenance_work_mem
----------------------
1MB
(1 row)
--Test case
select pg_stat_statements_reset();
drop table test;
CREATE TABLE test (a int, b int);
CREATE INDEX idx1 on test(a);
CREATE INDEX idx2 on test(b);
INSERT INTO test SELECT i, i FROM GENERATE_SERIES(1,2000000) as i;
DELETE FROM test where a%2=0;
VACUUM (PARALLEL n) test;
select query, total_time, shared_blks_hit, shared_blks_read,
shared_blks_hit + shared_blks_read as total_read_blks,
shared_blks_dirtied, shared_blks_written from pg_stat_statements where
query like 'VACUUM%';
          query           | total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
--------------------------+-------------+-----------------+------------------+-----------------+---------------------+---------------------
 VACUUM (PARALLEL 0) test | 5964.282408 |           92447 |                6 |           92453 |               19789 |                   0
          query           |     total_time     | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
--------------------------+--------------------+-----------------+------------------+-----------------+---------------------+---------------------
 VACUUM (PARALLEL 1) test | 3957.7658810000003 |           92447 |                6 |           92453 |               19789 |                   0
(1 row)
So I am getting correct results with the multi-pass vacuum.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
bufferusage_vacuum_v5.patch (application/octet-stream)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 04b1234..ec1a291 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -65,6 +65,7 @@
#include "commands/dbcommands.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "optimizer/paths.h"
#include "pgstat.h"
@@ -137,6 +138,7 @@
#define PARALLEL_VACUUM_KEY_SHARED 1
#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -270,6 +272,9 @@ typedef struct LVParallelState
/* Shared information among parallel vacuum workers */
LVShared *lvshared;
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
/*
* The number of indexes that support parallel index bulk-deletion and
* parallel index cleanup respectively.
@@ -2043,6 +2048,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
int nindexes)
{
int nworkers;
+ int i;
Assert(!IsParallelWorker());
Assert(ParallelVacuumIsActive(lps));
@@ -2137,8 +2143,18 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
parallel_vacuum_index(Irel, stats, lps->lvshared,
vacrelstats->dead_tuples, nindexes, vacrelstats);
- /* Wait for all vacuum workers to finish */
- WaitForParallelWorkersToFinish(lps->pcxt);
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ if (nworkers > 0)
+ {
+ /* Wait for all vacuum workers to finish */
+ WaitForParallelWorkersToFinish(lps->pcxt);
+
+ for (i = 0; i < lps->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
+ }
/*
* Carry the shared balance value to heap scan and disable shared costing
@@ -3153,6 +3169,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
ParallelContext *pcxt;
LVShared *shared;
LVDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
bool *can_parallel_vacuum;
long maxtuples;
char *sharedquery;
@@ -3236,6 +3253,18 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Estimate space for BufferUsage -- PARALLEL_VACUUM_KEY_BUFFER_USAGE.
+ *
+ * BufferUsage during executing maintenance command can be used by an
+ * extension that reports the buffer usage, such as pg_stat_statements.
+ * We have no way of knowing whether anyone's looking at pgBufferUsage,
+ * so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -3270,6 +3299,12 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
vacrelstats->dead_tuples = dead_tuples;
+ /* Allocate space for each worker's BufferUsage; no need to initialize */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ lps->buffer_usage = buffer_usage;
+
/* Store query string for workers */
sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
memcpy(sharedquery, debug_query_string, querylen + 1);
@@ -3399,6 +3434,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
Relation *indrels;
LVShared *lvshared;
LVDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
int nindexes;
char *sharedquery;
IndexBulkDeleteResult **stats;
@@ -3468,10 +3504,17 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
/* Process indexes to perform vacuum/cleanup */
parallel_vacuum_index(indrels, stats, lvshared, dead_tuples, nindexes,
&vacrelstats);
+ /* Report buffer usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+
/* Pop the error context stack */
error_context_stack = errcallback.previous;
On Wed, Apr 1, 2020 at 8:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Agreed. I've attached the updated patch.
Thank you for testing, Dilip!
Thanks! One hunk is failing on the latest head. And, I have rebased
the patch for my testing so posting the same. I have done some more
testing to test multi-pass vacuum.
The patch looks good to me. I have made a few minor modifications: (a)
moved the declaration of the variable closer to where it is used, (b)
changed a comment, (c) ran pgindent. I have also done some additional
testing with a larger number of indexes and found that vacuum and parallel
vacuum reported the same total_read_blks, which is what is
expected here.
Let me know what you think of the attached?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
bufferusage_vacuum_v6.patch (application/octet-stream)
From c2edea15f18e56ee6f26e3498d691ce22cc27676 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Wed, 1 Apr 2020 11:50:57 +0530
Subject: [PATCH v6] Allow parallel vacuum to accumulate buffer usage.
Commit 40d964ec99 allowed vacuum command to process indexes in parallel but
forgot to accumulate the buffer usage stats of parallel workers. This
allows leader backend to accumulate buffer usage stats of all the parallel
workers.
Reported-by: Julien Rouhaud
Author: Sawada Masahiko
Reviewed-by: Dilip Kumar, Amit Kapila and Julien Rouhaud
Discussion: https://postgr.es/m/20200328151721.GB12854@nol
---
src/backend/access/heap/vacuumlazy.c | 47 ++++++++++++++++++++++++++++++++++--
1 file changed, 45 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 04b1234..9f9596c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -65,6 +65,7 @@
#include "commands/dbcommands.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "optimizer/paths.h"
#include "pgstat.h"
@@ -137,6 +138,7 @@
#define PARALLEL_VACUUM_KEY_SHARED 1
#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -270,6 +272,9 @@ typedef struct LVParallelState
/* Shared information among parallel vacuum workers */
LVShared *lvshared;
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
/*
* The number of indexes that support parallel index bulk-deletion and
* parallel index cleanup respectively.
@@ -2137,8 +2142,20 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
parallel_vacuum_index(Irel, stats, lps->lvshared,
vacrelstats->dead_tuples, nindexes, vacrelstats);
- /* Wait for all vacuum workers to finish */
- WaitForParallelWorkersToFinish(lps->pcxt);
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ if (nworkers > 0)
+ {
+ int i;
+
+ /* Wait for all vacuum workers to finish */
+ WaitForParallelWorkersToFinish(lps->pcxt);
+
+ for (i = 0; i < lps->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
+ }
/*
* Carry the shared balance value to heap scan and disable shared costing
@@ -3153,6 +3170,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
ParallelContext *pcxt;
LVShared *shared;
LVDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
bool *can_parallel_vacuum;
long maxtuples;
char *sharedquery;
@@ -3236,6 +3254,17 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Estimate space for BufferUsage -- PARALLEL_VACUUM_KEY_BUFFER_USAGE.
+ *
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgBufferUsage, so do
+ * it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -3270,6 +3299,12 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
vacrelstats->dead_tuples = dead_tuples;
+ /* Allocate space for each worker's BufferUsage; no need to initialize */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ lps->buffer_usage = buffer_usage;
+
/* Store query string for workers */
sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
memcpy(sharedquery, debug_query_string, querylen + 1);
@@ -3399,6 +3434,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
Relation *indrels;
LVShared *lvshared;
LVDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
int nindexes;
char *sharedquery;
IndexBulkDeleteResult **stats;
@@ -3468,10 +3504,17 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
/* Process indexes to perform vacuum/cleanup */
parallel_vacuum_index(indrels, stats, lvshared, dead_tuples, nindexes,
&vacrelstats);
+ /* Report buffer usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+
/* Pop the error context stack */
error_context_stack = errcallback.previous;
--
1.8.3.1
On Wed, Apr 1, 2020 at 8:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Apr 1, 2020 at 8:26 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Wed, 1 Apr 2020 at 11:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Mar 31, 2020 at 7:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
While testing I found one issue. During a parallel vacuum, it was
showing a higher shared_blks_hit + shared_blks_read count. After some
investigation, I found that during the cleanup phase nworkers is -1, so
we did not try to launch workers, but "lps->pcxt->nworkers_launched"
still held the previously launched worker count, and shared memory still
held the old buffer read data, which was never updated because we did
not try to launch the workers.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b97b678..5dfaf4d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2150,7 +2150,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
* Next, accumulate buffer usage. (This must wait for the workers to
* finish, or we might get incomplete data.)
*/
- for (i = 0; i < lps->pcxt->nworkers_launched; i++)
+ nworkers = Min(nworkers, lps->pcxt->nworkers_launched);
+ for (i = 0; i < nworkers; i++)
InstrAccumParallelQuery(&lps->buffer_usage[i]);
It worked after the above fix.
Good catch. I think we should not even call
WaitForParallelWorkersToFinish for such a case. So, I guess the fix
could be,
if (workers > 0)
{
WaitForParallelWorkersToFinish();
for (i = 0; i < lps->pcxt->nworkers_launched; i++)
InstrAccumParallelQuery(&lps->buffer_usage[i]);
}
Agreed. I've attached the updated patch.
Thank you for testing, Dilip!
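To illustrate the failure mode discussed above, here is a minimal
standalone C sketch, not PostgreSQL code. DemoBufferUsage,
DemoParallelContext, demo_accum and demo_finish_phase are hypothetical
stand-ins for BufferUsage, ParallelContext and the accumulation loop in
lazy_parallel_vacuum_indexes. It assumes only what the thread describes:
nworkers_launched keeps its value from the previous launch, so the
per-worker slots should be read only when workers were requested for the
current phase.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BufferUsage; only two counters are modelled. */
typedef struct DemoBufferUsage
{
	long		shared_blks_hit;
	long		shared_blks_read;
} DemoBufferUsage;

/* Hypothetical stand-in for the parts of ParallelContext used here. */
typedef struct DemoParallelContext
{
	int			nworkers_launched;	/* left over from the most recent launch */
} DemoParallelContext;

/* Add one worker's counters into the leader's running totals. */
static void
demo_accum(DemoBufferUsage *dst, const DemoBufferUsage *src)
{
	dst->shared_blks_hit += src->shared_blks_hit;
	dst->shared_blks_read += src->shared_blks_read;
}

/*
 * Finish one index vacuum/cleanup phase.  The worker slots are read only
 * when workers were requested for this phase (nworkers > 0); otherwise the
 * slots still hold data from an earlier phase and must be skipped.
 */
static void
demo_finish_phase(DemoBufferUsage *leader,
				  const DemoParallelContext *pcxt,
				  const DemoBufferUsage *slots,
				  int nworkers)
{
	int			i;

	assert(leader != NULL && pcxt != NULL && slots != NULL);

	if (nworkers <= 0)
		return;					/* nothing launched in this phase */

	/* (The real code waits for the workers to finish at this point.) */
	for (i = 0; i < pcxt->nworkers_launched; i++)
		demo_accum(leader, &slots[i]);
}
```

Calling demo_finish_phase a second time with nworkers = 0 leaves the
leader's totals unchanged, whereas an unguarded loop over
nworkers_launched would add the stale slots again.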
Thanks! One hunk is failing on the latest head, so I rebased the patch
for my testing and am posting the same. I have done some more testing
to cover multi-pass vacuum.
postgres[114321]=# show maintenance_work_mem ;
maintenance_work_mem
----------------------
1MB
(1 row)
--Test case
select pg_stat_statements_reset();
drop table test;
CREATE TABLE test (a int, b int);
CREATE INDEX idx1 on test(a);
CREATE INDEX idx2 on test(b);
INSERT INTO test SELECT i, i FROM GENERATE_SERIES(1,2000000) as i;
DELETE FROM test where a%2=0;
VACUUM (PARALLEL n) test;
select query, total_time, shared_blks_hit, shared_blks_read,
shared_blks_hit + shared_blks_read as total_read_blks,
shared_blks_dirtied, shared_blks_written from pg_stat_statements where
query like 'VACUUM%';
          query           | total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
--------------------------+-------------+-----------------+------------------+-----------------+---------------------+---------------------
 VACUUM (PARALLEL 0) test | 5964.282408 |           92447 |                6 |           92453 |               19789 |                   0
          query           |     total_time     | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
--------------------------+--------------------+-----------------+------------------+-----------------+---------------------+---------------------
 VACUUM (PARALLEL 1) test | 3957.7658810000003 |           92447 |                6 |           92453 |               19789 |                   0
(1 row)
So I am getting correct results with the multi-pass vacuum.
I have done some testing for the parallel "create index".
postgres[99536]=# show maintenance_work_mem ;
maintenance_work_mem
----------------------
1MB
(1 row)
CREATE TABLE test (a int, b int);
INSERT INTO test SELECT i, i FROM GENERATE_SERIES(1,2000000) as i;
CREATE INDEX idx1 on test(a);
select query, total_time, shared_blks_hit, shared_blks_read,
shared_blks_hit + shared_blks_read as total_read_blks,
shared_blks_dirtied, shared_blks_written from pg_stat_statements where
query like 'CREATE INDEX%';
SET max_parallel_maintenance_workers TO 0;
            query             |     total_time     | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
------------------------------+--------------------+-----------------+------------------+-----------------+---------------------+---------------------
 CREATE INDEX idx1 on test(a) | 1947.4959979999999 |            8947 |               11 |            8958 |                   5 |                   0
SET max_parallel_maintenance_workers TO 2;
            query             |     total_time     | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
------------------------------+--------------------+-----------------+------------------+-----------------+---------------------+---------------------
 CREATE INDEX idx1 on test(a) | 1942.1426040000001 |            8960 |               14 |            8974 |                   5 |                   0
(1 row)
I have noticed that total_read_blks is higher for the parallel create
index than for the non-parallel one. I created a fresh database before
each run. I am not very familiar with the internals of parallel create
index, so I am not sure whether it is expected to read extra blocks in
the parallel case. My guess is that because multiple workers are
inserting into the btree, they might need to visit some btree nodes
multiple times while traversing the tree down. But it would be better
if someone who knows this code better could confirm this.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
So here's a v9, rebased on top of the latest versions of Sawada-san's bug fixes
(Amit's v6 for vacuum and Sawada-san's v2 for create index), with all
previously mentioned changes.
Note that I'm only attaching those patches for convenience and to make sure
that cfbot is happy.
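As a reading aid for the WAL usage patches, the accounting pattern they
extend can be sketched in standalone C. This is an illustration under
assumptions, not the patch itself: DemoWalUsage mirrors the three
counters updated in XLogInsertRecord (wal_records, wal_num_fpw,
wal_bytes) with field types chosen for the example, while DemoBackend
and the demo_* functions are hypothetical names modelled on my
understanding of InstrStartParallelQuery, InstrEndParallelQuery and
InstrAccumParallelQuery (snapshot at worker start, store the delta in
the worker's slot, add the slot into the leader's counters).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mirror of the WAL counters; the field types are assumptions. */
typedef struct DemoWalUsage
{
	long		wal_records;	/* number of WAL records inserted */
	long		wal_num_fpw;	/* number of full page images included */
	unsigned long long wal_bytes;	/* total size of inserted records */
} DemoWalUsage;

/* One process: its running counters plus the snapshot taken at start. */
typedef struct DemoBackend
{
	DemoWalUsage usage;
	DemoWalUsage saved;
} DemoBackend;

/* Count one inserted record, as the hook at the end of record insertion does. */
static void
demo_count_record(DemoBackend *be, unsigned int total_len, int num_fpw)
{
	be->usage.wal_bytes += total_len;
	be->usage.wal_records++;
	be->usage.wal_num_fpw += num_fpw;
}

/* Worker start: remember the counters so only later activity is reported. */
static void
demo_start_parallel(DemoBackend *worker)
{
	worker->saved = worker->usage;
}

/* Worker end: write the delta since the snapshot into the worker's slot. */
static void
demo_end_parallel(const DemoBackend *worker, DemoWalUsage *slot)
{
	assert(worker != NULL && slot != NULL);
	memset(slot, 0, sizeof(*slot));
	slot->wal_records = worker->usage.wal_records - worker->saved.wal_records;
	slot->wal_num_fpw = worker->usage.wal_num_fpw - worker->saved.wal_num_fpw;
	slot->wal_bytes = worker->usage.wal_bytes - worker->saved.wal_bytes;
}

/* Leader, after the workers have finished: fold one slot into its totals. */
static void
demo_accum_parallel(DemoBackend *leader, const DemoWalUsage *slot)
{
	leader->usage.wal_records += slot->wal_records;
	leader->usage.wal_num_fpw += slot->wal_num_fpw;
	leader->usage.wal_bytes += slot->wal_bytes;
}
```

The slot plays the role of one element of the per-worker array placed
in shared memory; activity that happened in the worker before the
snapshot is deliberately excluded from what the leader sees.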
Attachments:
v9-0001-Allow-parallel-vacuum-to-accumulate-buffer-usage.patch (text/plain; charset=us-ascii)
From a0fb471f9f498f7479eb4ae2182271374d694e46 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Wed, 1 Apr 2020 11:50:57 +0530
Subject: [PATCH v9 1/6] Allow parallel vacuum to accumulate buffer usage.
Commit 40d964ec99 allowed vacuum command to process indexes in parallel but
forgot to accumulate the buffer usage stats of parallel workers. This
allows leader backend to accumulate buffer usage stats of all the parallel
workers.
Reported-by: Julien Rouhaud
Author: Sawada Masahiko
Reviewed-by: Dilip Kumar, Amit Kapila and Julien Rouhaud
Discussion: https://postgr.es/m/20200328151721.GB12854@nol
---
src/backend/access/heap/vacuumlazy.c | 47 ++++++++++++++++++++++++++--
1 file changed, 45 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 04b12342b8..9f9596c718 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -65,6 +65,7 @@
#include "commands/dbcommands.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "optimizer/paths.h"
#include "pgstat.h"
@@ -137,6 +138,7 @@
#define PARALLEL_VACUUM_KEY_SHARED 1
#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -270,6 +272,9 @@ typedef struct LVParallelState
/* Shared information among parallel vacuum workers */
LVShared *lvshared;
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
/*
* The number of indexes that support parallel index bulk-deletion and
* parallel index cleanup respectively.
@@ -2137,8 +2142,20 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
parallel_vacuum_index(Irel, stats, lps->lvshared,
vacrelstats->dead_tuples, nindexes, vacrelstats);
- /* Wait for all vacuum workers to finish */
- WaitForParallelWorkersToFinish(lps->pcxt);
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ if (nworkers > 0)
+ {
+ int i;
+
+ /* Wait for all vacuum workers to finish */
+ WaitForParallelWorkersToFinish(lps->pcxt);
+
+ for (i = 0; i < lps->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
+ }
/*
* Carry the shared balance value to heap scan and disable shared costing
@@ -3153,6 +3170,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
ParallelContext *pcxt;
LVShared *shared;
LVDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
bool *can_parallel_vacuum;
long maxtuples;
char *sharedquery;
@@ -3236,6 +3254,17 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Estimate space for BufferUsage -- PARALLEL_VACUUM_KEY_BUFFER_USAGE.
+ *
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgBufferUsage, so do
+ * it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -3270,6 +3299,12 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
vacrelstats->dead_tuples = dead_tuples;
+ /* Allocate space for each worker's BufferUsage; no need to initialize */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ lps->buffer_usage = buffer_usage;
+
/* Store query string for workers */
sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
memcpy(sharedquery, debug_query_string, querylen + 1);
@@ -3399,6 +3434,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
Relation *indrels;
LVShared *lvshared;
LVDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
int nindexes;
char *sharedquery;
IndexBulkDeleteResult **stats;
@@ -3468,10 +3504,17 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
/* Process indexes to perform vacuum/cleanup */
parallel_vacuum_index(indrels, stats, lvshared, dead_tuples, nindexes,
&vacrelstats);
+ /* Report buffer usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+
/* Pop the error context stack */
error_context_stack = errcallback.previous;
--
2.20.1
v9-0002-Allow-parallel-index-creation-to-accumulate-buffe.patch (text/plain; charset=us-ascii)
From 03763db52f027d980353fcde12ffdf8d554c4035 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 1 Apr 2020 09:37:41 +0200
Subject: [PATCH v9 2/6] Allow parallel index creation to accumulate buffer
usage.
Reported-by: Julien Rouhaud
Author: Sawada Masahiko
Reviewed-by: Dilip Kumar, Amit Kapila and Julien Rouhaud
Discussion: https://postgr.es/m/20200328151721.GB12854@nol
---
src/backend/access/nbtree/nbtsort.c | 40 +++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 3924945664..ba48b7e9f9 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -67,6 +67,7 @@
#include "access/xloginsert.h"
#include "catalog/index.h"
#include "commands/progress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/smgr.h"
@@ -81,6 +82,7 @@
#define PARALLEL_KEY_TUPLESORT UINT64CONST(0xA000000000000002)
#define PARALLEL_KEY_TUPLESORT_SPOOL2 UINT64CONST(0xA000000000000003)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xA000000000000004)
+#define PARALLEL_KEY_BUFFER_USAGE UINT64CONST(0xA000000000000005)
/*
* DISABLE_LEADER_PARTICIPATION disables the leader's participation in
@@ -203,6 +205,7 @@ typedef struct BTLeader
Sharedsort *sharedsort;
Sharedsort *sharedsort2;
Snapshot snapshot;
+ BufferUsage *bufferusage;
} BTLeader;
/*
@@ -1476,6 +1479,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
Sharedsort *sharedsort2;
BTSpool *btspool = buildstate->spool;
BTLeader *btleader = (BTLeader *) palloc0(sizeof(BTLeader));
+ BufferUsage *bufferusage;
bool leaderparticipates = true;
char *sharedquery;
int querylen;
@@ -1528,6 +1532,18 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
shm_toc_estimate_keys(&pcxt->estimator, 3);
}
+ /*
+ * Estimate space for BufferUsage -- PARALLEL_KEY_BUFFER_USAGE
+ *
+ * BufferUsage during executing maintenance command can be used by an
+ * extension that reports the buffer usage, such as pg_stat_statements.
+ * We have no way of knowing whether anyone's looking at pgBufferUsage,
+ * so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -1599,6 +1615,11 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
memcpy(sharedquery, debug_query_string, querylen + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, sharedquery);
+ /* Allocate space for each worker's BufferUsage; no need to initialize */
+ bufferusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufferusage);
+
/* Launch workers, saving status for leader/caller */
LaunchParallelWorkers(pcxt);
btleader->pcxt = pcxt;
@@ -1609,6 +1630,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
btleader->sharedsort = sharedsort;
btleader->sharedsort2 = sharedsort2;
btleader->snapshot = snapshot;
+ btleader->bufferusage = bufferusage;
/* If no workers were successfully launched, back out (do serial build) */
if (pcxt->nworkers_launched == 0)
@@ -1637,8 +1659,18 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
static void
_bt_end_parallel(BTLeader *btleader)
{
+ int i;
+
/* Shutdown worker processes */
WaitForParallelWorkersToFinish(btleader->pcxt);
+
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < btleader->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&btleader->bufferusage[i]);
+
/* Free last reference to MVCC snapshot, if one was used */
if (IsMVCCSnapshot(btleader->snapshot))
UnregisterSnapshot(btleader->snapshot);
@@ -1769,6 +1801,7 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
Relation indexRel;
LOCKMODE heapLockmode;
LOCKMODE indexLockmode;
+ BufferUsage *bufferusage;
int sortmem;
#ifdef BTREE_BUILD_STATS
@@ -1830,11 +1863,18 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
tuplesort_attach_shared(sharedsort2, seg);
}
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
/* Perform sorting of spool, and possibly a spool2 */
sortmem = maintenance_work_mem / btshared->scantuplesortstates;
_bt_parallel_scan_and_sort(btspool, btspool2, btshared, sharedsort,
sharedsort2, sortmem, false);
+ /* Report buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber]);
+
#ifdef BTREE_BUILD_STATS
if (log_btree_build_stats)
{
--
2.20.1
v9-0003-Add-infrastructure-to-track-WAL-usage.patch (text/plain; charset=us-ascii)
From faa8656da138079873c5f2dba48b2c7568fea546 Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:41:50 +0100
Subject: [PATCH v9 3/6] Add infrastructure to track WAL usage.
Author: Kirill Bychik, Julien Rouhaud
Reviewed-by: Fuji Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 32 ++++++++++++----
src/backend/access/nbtree/nbtsort.c | 36 +++++++++++++-----
src/backend/access/transam/xlog.c | 12 +++++-
src/backend/access/transam/xloginsert.c | 13 +++++--
src/backend/executor/execParallel.c | 38 ++++++++++++++-----
src/backend/executor/instrument.c | 50 +++++++++++++++++++++----
src/include/access/xlog.h | 3 +-
src/include/executor/execParallel.h | 1 +
src/include/executor/instrument.h | 18 ++++++++-
src/tools/pgindent/typedefs.list | 1 +
10 files changed, 163 insertions(+), 41 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9f9596c718..27e163f5b3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -139,6 +139,7 @@
#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
+#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -274,6 +275,8 @@ typedef struct LVParallelState
/* Points to buffer usage area in DSM */
BufferUsage *buffer_usage;
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
/*
* The number of indexes that support parallel index bulk-deletion and
@@ -2154,7 +2157,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
WaitForParallelWorkersToFinish(lps->pcxt);
for (i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i]);
+ InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
/*
@@ -3171,6 +3174,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
LVShared *shared;
LVDeadTuples *dead_tuples;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
bool *can_parallel_vacuum;
long maxtuples;
char *sharedquery;
@@ -3255,15 +3259,19 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_estimate_keys(&pcxt->estimator, 1);
/*
- * Estimate space for BufferUsage -- PARALLEL_VACUUM_KEY_BUFFER_USAGE.
+ * Estimate space for BufferUsage and WalUsage --
+ * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
*
* If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgBufferUsage, so do
- * it unconditionally.
+ * have no way of knowing whether anyone's looking at pgBufferUsage or
+ * pgWalUsage, so do it unconditionally.
*/
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
/* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
@@ -3299,11 +3307,18 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
vacrelstats->dead_tuples = dead_tuples;
- /* Allocate space for each worker's BufferUsage; no need to initialize */
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
buffer_usage = shm_toc_allocate(pcxt->toc,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
lps->buffer_usage = buffer_usage;
+ wal_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
+ lps->wal_usage = wal_usage;
/* Store query string for workers */
sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
@@ -3435,6 +3450,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
LVShared *lvshared;
LVDeadTuples *dead_tuples;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
int nindexes;
char *sharedquery;
IndexBulkDeleteResult **stats;
@@ -3511,9 +3527,11 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
parallel_vacuum_index(indrels, stats, lvshared, dead_tuples, nindexes,
&vacrelstats);
- /* Report buffer usage during parallel execution */
+ /* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Pop the error context stack */
error_context_stack = errcallback.previous;
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index ba48b7e9f9..d7a1b95c9f 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -83,6 +83,7 @@
#define PARALLEL_KEY_TUPLESORT_SPOOL2 UINT64CONST(0xA000000000000003)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xA000000000000004)
#define PARALLEL_KEY_BUFFER_USAGE UINT64CONST(0xA000000000000005)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xA000000000000006)
/*
* DISABLE_LEADER_PARTICIPATION disables the leader's participation in
@@ -206,6 +207,7 @@ typedef struct BTLeader
Sharedsort *sharedsort2;
Snapshot snapshot;
BufferUsage *bufferusage;
+ WalUsage *walusage;
} BTLeader;
/*
@@ -1480,6 +1482,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
BTSpool *btspool = buildstate->spool;
BTLeader *btleader = (BTLeader *) palloc0(sizeof(BTLeader));
BufferUsage *bufferusage;
+ WalUsage *walusage;
bool leaderparticipates = true;
char *sharedquery;
int querylen;
@@ -1533,16 +1536,20 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
}
/*
- * Estimate space for BufferUsage -- PARALLEL_KEY_BUFFER_USAGE
+ * Estimate space for BufferUsage and WalUsage -- PARALLEL_KEY_BUFFER_USAGE
+ * and PARALLEL_KEY_WAL_USAGE.
*
- * BufferUsage during executing maintenance command can be used by an
- * extension that reports the buffer usage, such as pg_stat_statements.
- * We have no way of knowing whether anyone's looking at pgBufferUsage,
- * so do it unconditionally.
+ * BufferUsage and WalUsage during executing maintenance command can be
+ * used by an extension that reports the buffer or WAL usage, such as
+ * pg_stat_statements. We have no way of knowing whether anyone's looking
+ * at pgBufferUsage or pgWalUsage, so do it unconditionally.
*/
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
/* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
@@ -1615,10 +1622,16 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
memcpy(sharedquery, debug_query_string, querylen + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, sharedquery);
- /* Allocate space for each worker's BufferUsage; no need to initialize */
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
bufferusage = shm_toc_allocate(pcxt->toc,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufferusage);
+ walusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage);
/* Launch workers, saving status for leader/caller */
LaunchParallelWorkers(pcxt);
@@ -1631,6 +1644,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
btleader->sharedsort2 = sharedsort2;
btleader->snapshot = snapshot;
btleader->bufferusage = bufferusage;
+ btleader->walusage = walusage;
/* If no workers were successfully launched, back out (do serial build) */
if (pcxt->nworkers_launched == 0)
@@ -1669,7 +1683,8 @@ _bt_end_parallel(BTLeader *btleader)
* finish, or we might get incomplete data.)
*/
for (i = 0; i < btleader->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&btleader->bufferusage[i]);
+ InstrAccumParallelQuery(&btleader->bufferusage[i],
+ &btleader->walusage[i]);
/* Free last reference to MVCC snapshot, if one was used */
if (IsMVCCSnapshot(btleader->snapshot))
@@ -1802,6 +1817,7 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
LOCKMODE heapLockmode;
LOCKMODE indexLockmode;
BufferUsage *bufferusage;
+ WalUsage *walusage;
int sortmem;
#ifdef BTREE_BUILD_STATS
@@ -1871,9 +1887,11 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
_bt_parallel_scan_and_sort(btspool, btspool2, btshared, sharedsort,
sharedsort2, sortmem, false);
- /* Report buffer usage during parallel execution */
+ /* Report buffer and WAL usage during parallel execution */
bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber]);
+ walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
+ &walusage[ParallelWorkerNumber]);
#ifdef BTREE_BUILD_STATS
if (log_btree_build_stats)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 977d448f50..444886bf0c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -43,6 +43,7 @@
#include "commands/progress.h"
#include "commands/tablespace.h"
#include "common/controldata_utils.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "pgstat.h"
@@ -996,7 +997,8 @@ static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
XLogRecPtr
XLogInsertRecord(XLogRecData *rdata,
XLogRecPtr fpw_lsn,
- uint8 flags)
+ uint8 flags,
+ int num_fpw)
{
XLogCtlInsert *Insert = &XLogCtl->Insert;
pg_crc32c rdata_crc;
@@ -1252,6 +1254,14 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
+ /* Provide WAL update data to the instrumentation */
+ if (inserted)
+ {
+ pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+ pgWalUsage.wal_records++;
+ pgWalUsage.wal_num_fpw += num_fpw;
+ }
+
return EndPos;
}
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index a618dec776..413750948b 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -25,6 +25,7 @@
#include "access/xloginsert.h"
#include "catalog/pg_control.h"
#include "common/pg_lzcompress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "replication/origin.h"
@@ -108,7 +109,7 @@ static MemoryContext xloginsert_cxt;
static XLogRecData *XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
- XLogRecPtr *fpw_lsn);
+ XLogRecPtr *fpw_lsn, int *num_fpw);
static bool XLogCompressBackupBlock(char *page, uint16 hole_offset,
uint16 hole_length, char *dest, uint16 *dlen);
@@ -448,6 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info)
bool doPageWrites;
XLogRecPtr fpw_lsn;
XLogRecData *rdt;
+ int num_fpw = 0;
/*
* Get values needed to decide whether to do full-page writes. Since
@@ -457,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info)
GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);
rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
- &fpw_lsn);
+ &fpw_lsn, &num_fpw);
- EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags);
+ EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw);
} while (EndPos == InvalidXLogRecPtr);
XLogResetInsertion();
@@ -482,7 +484,7 @@ XLogInsert(RmgrId rmid, uint8 info)
static XLogRecData *
XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
- XLogRecPtr *fpw_lsn)
+ XLogRecPtr *fpw_lsn, int *num_fpw)
{
XLogRecData *rdt;
uint32 total_len = 0;
@@ -635,6 +637,9 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
*/
bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
+ /* Report a full page image constructed for the WAL record */
+ *num_fpw += 1;
+
/*
* Construct XLogRecData entries for the page content.
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..7d9ca66fc8 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -12,7 +12,7 @@
* workers and ensuring that their state generally matches that of the
* leader; see src/backend/access/transam/README.parallel for details.
* However, we must save and restore relevant executor state, such as
- * any ParamListInfo associated with the query, buffer usage info, and
+ * any ParamListInfo associated with the query, buffer/WAL usage info, and
* the actual plan to be passed down to the worker.
*
* IDENTIFICATION
@@ -62,6 +62,7 @@
#define PARALLEL_KEY_DSA UINT64CONST(0xE000000000000007)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -573,6 +574,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
char *pstmt_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
+ WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
@@ -646,6 +648,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Same thing for WalUsage.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for tuple queues. */
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers));
@@ -728,6 +737,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufusage_space);
pei->buffer_usage = bufusage_space;
+ /* Same for WalUsage. */
+ walusage_space = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
+ pei->wal_usage = walusage_space;
+
/* Set up the tuple queues that the workers will write into. */
pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
@@ -1069,7 +1084,7 @@ ExecParallelRetrieveJitInstrumentation(PlanState *planstate,
/*
* Finish parallel execution. We wait for parallel workers to finish, and
- * accumulate their buffer usage.
+ * accumulate their buffer/WAL usage.
*/
void
ExecParallelFinish(ParallelExecutorInfo *pei)
@@ -1109,11 +1124,11 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
WaitForParallelWorkersToFinish(pei->pcxt);
/*
- * Next, accumulate buffer usage. (This must wait for the workers to
- * finish, or we might get incomplete data.)
+ * Next, accumulate buffer/WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
*/
for (i = 0; i < nworkers; i++)
- InstrAccumParallelQuery(&pei->buffer_usage[i]);
+ InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]);
pei->finished = true;
}
@@ -1333,6 +1348,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
{
FixedParallelExecutorState *fpes;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
DestReceiver *receiver;
QueryDesc *queryDesc;
SharedExecutorInstrumentation *instrumentation;
@@ -1386,11 +1402,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
ExecSetTupleBound(fpes->tuples_needed, queryDesc->planstate);
/*
- * Prepare to track buffer usage during query execution.
+ * Prepare to track buffer/WAL usage during query execution.
*
* We do this after starting up the executor to match what happens in the
- * leader, which also doesn't count buffer accesses that occur during
- * executor startup.
+ * leader, which also doesn't count buffer accesses and WAL activity that
+ * occur during executor startup.
*/
InstrStartParallelQuery();
@@ -1406,9 +1422,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Shut down the executor */
ExecutorFinish(queryDesc);
- /* Report buffer usage during parallel execution. */
+ /* Report buffer/WAL usage during parallel execution. */
buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Report instrumentation data if any instrumentation options are set. */
if (instrumentation != NULL)
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 042e10f96b..e0cf458314 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -19,9 +19,11 @@
BufferUsage pgBufferUsage;
static BufferUsage save_pgBufferUsage;
+WalUsage pgWalUsage;
+static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
-
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
/* Allocate new instrumentation structure(s) */
Instrumentation *
@@ -31,15 +33,17 @@ InstrAlloc(int n, int instrument_options)
/* initialize all fields to zeroes, then modify as needed */
instr = palloc0(n * sizeof(Instrumentation));
- if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER))
+ if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER | INSTRUMENT_WAL))
{
bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
bool need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
int i;
for (i = 0; i < n; i++)
{
instr[i].need_bufusage = need_buffers;
+ instr[i].need_walusage = need_wal;
instr[i].need_timer = need_timer;
}
}
@@ -53,6 +57,7 @@ InstrInit(Instrumentation *instr, int instrument_options)
{
memset(instr, 0, sizeof(Instrumentation));
instr->need_bufusage = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ instr->need_walusage = (instrument_options & INSTRUMENT_WAL) != 0;
instr->need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
}
@@ -67,6 +72,9 @@ InstrStartNode(Instrumentation *instr)
/* save buffer usage totals at node entry, if needed */
if (instr->need_bufusage)
instr->bufusage_start = pgBufferUsage;
+
+ if (instr->need_walusage)
+ instr->walusage_start = pgWalUsage;
}
/* Exit from a plan node */
@@ -95,6 +103,10 @@ InstrStopNode(Instrumentation *instr, double nTuples)
BufferUsageAccumDiff(&instr->bufusage,
&pgBufferUsage, &instr->bufusage_start);
+ if (instr->need_walusage)
+ WalUsageAccumDiff(&instr->walusage,
+ &pgWalUsage, &instr->walusage_start);
+
/* Is this the first tuple of this cycle? */
if (!instr->running)
{
@@ -158,6 +170,9 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
/* Add delta of buffer usage since entry to node's totals */
if (dst->need_bufusage)
BufferUsageAdd(&dst->bufusage, &add->bufusage);
+
+ if (dst->need_walusage)
+ WalUsageAdd(&dst->walusage, &add->walusage);
}
/* note current values during parallel executor startup */
@@ -165,21 +180,25 @@ void
InstrStartParallelQuery(void)
{
save_pgBufferUsage = pgBufferUsage;
+ save_pgWalUsage = pgWalUsage;
}
/* report usage after parallel executor shutdown */
void
-InstrEndParallelQuery(BufferUsage *result)
+InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- memset(result, 0, sizeof(BufferUsage));
- BufferUsageAccumDiff(result, &pgBufferUsage, &save_pgBufferUsage);
+ memset(bufusage, 0, sizeof(BufferUsage));
+ memset(walusage, 0, sizeof(WalUsage));
+ BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
+ WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
}
/* accumulate work done by workers in leader's stats */
void
-InstrAccumParallelQuery(BufferUsage *result)
+InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- BufferUsageAdd(&pgBufferUsage, result);
+ BufferUsageAdd(&pgBufferUsage, bufusage);
+ WalUsageAdd(&pgWalUsage, walusage);
}
/* dst += add */
@@ -221,3 +240,20 @@ BufferUsageAccumDiff(BufferUsage *dst,
INSTR_TIME_ACCUM_DIFF(dst->blk_write_time,
add->blk_write_time, sub->blk_write_time);
}
+
+/* helper functions for WAL usage accumulation */
+static void
+WalUsageAdd(WalUsage *dst, WalUsage *add)
+{
+ dst->wal_bytes += add->wal_bytes;
+ dst->wal_records += add->wal_records;
+ dst->wal_num_fpw += add->wal_num_fpw;
+}
+
+void
+WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
+{
+ dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
+ dst->wal_records += add->wal_records - sub->wal_records;
+ dst->wal_num_fpw += add->wal_num_fpw - sub->wal_num_fpw;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 9ec7b31cce..b91e724b2d 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -259,7 +259,8 @@ struct XLogRecData;
extern XLogRecPtr XLogInsertRecord(struct XLogRecData *rdata,
XLogRecPtr fpw_lsn,
- uint8 flags);
+ uint8 flags,
+ int num_fpw);
extern void XLogFlush(XLogRecPtr RecPtr);
extern bool XLogBackgroundFlush(void);
extern bool XLogNeedsFlush(XLogRecPtr RecPtr);
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf020..1cc5b524fd 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -26,6 +26,7 @@ typedef struct ParallelExecutorInfo
PlanState *planstate; /* plan subtree we're running in parallel */
ParallelContext *pcxt; /* parallel context we're using */
BufferUsage *buffer_usage; /* points to bufusage area in DSM */
+ WalUsage *wal_usage; /* points to walusage area in DSM */
SharedExecutorInstrumentation *instrumentation; /* optional */
struct SharedJitInstrumentation *jit_instrumentation; /* optional */
dsa_area *area; /* points to DSA area in DSM */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 3825a5ac1f..e8875a8e9b 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -32,12 +32,20 @@ typedef struct BufferUsage
instr_time blk_write_time; /* time spent writing */
} BufferUsage;
+typedef struct WalUsage
+{
+ long wal_records; /* # of WAL records produced */
+ long wal_num_fpw; /* # of WAL full page images produced */
+ uint64 wal_bytes; /* size of WAL records produced */
+} WalUsage;
+
/* Flag bits included in InstrAlloc's instrument_options bitmask */
typedef enum InstrumentOption
{
INSTRUMENT_TIMER = 1 << 0, /* needs timer (and row counts) */
INSTRUMENT_BUFFERS = 1 << 1, /* needs buffer usage */
INSTRUMENT_ROWS = 1 << 2, /* needs row count */
+ INSTRUMENT_WAL = 1 << 3, /* needs WAL usage */
INSTRUMENT_ALL = PG_INT32_MAX
} InstrumentOption;
@@ -46,6 +54,7 @@ typedef struct Instrumentation
/* Parameters set at node creation: */
bool need_timer; /* true if we need timer data */
bool need_bufusage; /* true if we need buffer usage data */
+ bool need_walusage; /* true if we need WAL usage data */
/* Info about current plan cycle: */
bool running; /* true if we've completed first tuple */
instr_time starttime; /* start time of current iteration of node */
@@ -53,6 +62,7 @@ typedef struct Instrumentation
double firsttuple; /* time for first tuple of this cycle */
double tuplecount; /* # of tuples emitted so far this cycle */
BufferUsage bufusage_start; /* buffer usage at start */
+ WalUsage walusage_start; /* WAL usage at start */
/* Accumulated statistics across all completed cycles: */
double startup; /* total startup time (in seconds) */
double total; /* total time (in seconds) */
@@ -62,6 +72,7 @@ typedef struct Instrumentation
double nfiltered1; /* # of tuples removed by scanqual or joinqual */
double nfiltered2; /* # of tuples removed by "other" quals */
BufferUsage bufusage; /* total buffer usage */
+ WalUsage walusage; /* total WAL usage */
} Instrumentation;
typedef struct WorkerInstrumentation
@@ -71,6 +82,7 @@ typedef struct WorkerInstrumentation
} WorkerInstrumentation;
extern PGDLLIMPORT BufferUsage pgBufferUsage;
+extern PGDLLIMPORT WalUsage pgWalUsage;
extern Instrumentation *InstrAlloc(int n, int instrument_options);
extern void InstrInit(Instrumentation *instr, int instrument_options);
@@ -79,9 +91,11 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
-extern void InstrEndParallelQuery(BufferUsage *result);
-extern void InstrAccumParallelQuery(BufferUsage *result);
+extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
+extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
extern void BufferUsageAccumDiff(BufferUsage *dst,
const BufferUsage *add, const BufferUsage *sub);
+extern void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add,
+ const WalUsage *sub);
#endif /* INSTRUMENT_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 587b040532..64ae983661 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2643,6 +2643,7 @@ WalSndCtlData
WalSndSendDataCallback
WalSndState
WalTimeSample
+WalUsage
WalWriteMethod
Walfile
WindowAgg
--
2.20.1
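The instrumentation in the patch above follows a snapshot-and-diff pattern: a backend-wide counter only ever grows, and each measurement copies it at the start and later accumulates the difference. The following is a minimal standalone sketch of that pattern, not the patch's code. The `Demo*` and `demo_*` names are hypothetical, the fixed-width field types are an assumption, and `demo_record_insert` merely stands in for whatever the WAL insert path does to the counter.

```c
#include <stdint.h>

/* Simplified stand-in for the patch's WalUsage struct (hypothetical name). */
typedef struct DemoWalUsage
{
    int64_t  wal_records;   /* # of WAL records produced */
    int64_t  wal_num_fpw;   /* # of full page images produced */
    uint64_t wal_bytes;     /* size of WAL records produced */
} DemoWalUsage;

/* Plays the role of the backend-wide pgWalUsage counter. */
static DemoWalUsage demo_counters;

/* Hypothetical stand-in for the bookkeeping done on each WAL record insert. */
static void
demo_record_insert(uint64_t rec_len, int num_fpw)
{
    demo_counters.wal_records += 1;
    demo_counters.wal_num_fpw += num_fpw;
    demo_counters.wal_bytes += rec_len;
}

/* dst += add - sub, mirroring the shape of WalUsageAccumDiff. */
static void
demo_accum_diff(DemoWalUsage *dst, const DemoWalUsage *add,
                const DemoWalUsage *sub)
{
    dst->wal_records += add->wal_records - sub->wal_records;
    dst->wal_num_fpw += add->wal_num_fpw - sub->wal_num_fpw;
    dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
}

/* Measure the WAL attributed to one pretend statement. */
static DemoWalUsage
demo_measure_statement(void)
{
    DemoWalUsage start = demo_counters;   /* snapshot at entry */
    DemoWalUsage used = {0, 0, 0};

    demo_record_insert(120, 0);           /* a plain record */
    demo_record_insert(8300, 1);          /* a record carrying one page image */

    demo_accum_diff(&used, &demo_counters, &start);
    return used;
}
```

Because only differences are accumulated, activity that happened before the snapshot does not leak into the result, which is the property the per-node and per-worker accounting relies on.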
v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut.patch (text/plain; charset=us-ascii)
From aa130969d8645ee23f1b12c520f3ccdbb2c96e36 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Sun, 29 Mar 2020 12:38:14 +0200
Subject: [PATCH v9 4/6] Add option to report WAL usage in EXPLAIN and
auto_explain.
Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 15 ++++++
doc/src/sgml/auto-explain.sgml | 20 ++++++++
doc/src/sgml/ref/explain.sgml | 14 ++++++
src/backend/commands/explain.c | 74 +++++++++++++++++++++++++++--
src/bin/psql/tab-complete.c | 4 +-
src/include/commands/explain.h | 3 ++
6 files changed, 124 insertions(+), 6 deletions(-)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index f69dde876c..56c549d84c 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -27,6 +27,7 @@ static int auto_explain_log_min_duration = -1; /* msec or -1 */
static bool auto_explain_log_analyze = false;
static bool auto_explain_log_verbose = false;
static bool auto_explain_log_buffers = false;
+static bool auto_explain_log_wal = false;
static bool auto_explain_log_triggers = false;
static bool auto_explain_log_timing = true;
static bool auto_explain_log_settings = false;
@@ -148,6 +149,17 @@ _PG_init(void)
NULL,
NULL);
+ DefineCustomBoolVariable("auto_explain.log_wal",
+ "Log WAL usage.",
+ NULL,
+ &auto_explain_log_wal,
+ false,
+ PGC_SUSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
+
DefineCustomBoolVariable("auto_explain.log_triggers",
"Include trigger statistics in plans.",
"This has no effect unless log_analyze is also set.",
@@ -280,6 +292,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
queryDesc->instrument_options |= INSTRUMENT_ROWS;
if (auto_explain_log_buffers)
queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (auto_explain_log_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
}
}
@@ -374,6 +388,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
es->analyze = (queryDesc->instrument_options && auto_explain_log_analyze);
es->verbose = auto_explain_log_verbose;
es->buffers = (es->analyze && auto_explain_log_buffers);
+ es->wal = (es->analyze && auto_explain_log_wal);
es->timing = (es->analyze && auto_explain_log_timing);
es->summary = es->analyze;
es->format = auto_explain_log_format;
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index 3d619d4a3d..d4d29c4a64 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -109,6 +109,26 @@ LOAD 'auto_explain';
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>
+ <varname>auto_explain.log_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>auto_explain.log_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ <varname>auto_explain.log_wal</varname> controls whether WAL
+ usage statistics are printed when an execution plan is logged; it's
+ equivalent to the <literal>WAL</literal> option of <command>EXPLAIN</command>.
+ This parameter has no effect
+ unless <varname>auto_explain.log_analyze</varname> is enabled.
+ This parameter is off by default.
+ Only superusers can change this setting.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term>
<varname>auto_explain.log_timing</varname> (<type>boolean</type>)
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 385d10411f..e4661232b2 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -41,6 +41,7 @@ EXPLAIN [ ANALYZE ] [ VERBOSE ] <replaceable class="parameter">statement</replac
COSTS [ <replaceable class="parameter">boolean</replaceable> ]
SETTINGS [ <replaceable class="parameter">boolean</replaceable> ]
BUFFERS [ <replaceable class="parameter">boolean</replaceable> ]
+ WAL [ <replaceable class="parameter">boolean</replaceable> ]
TIMING [ <replaceable class="parameter">boolean</replaceable> ]
SUMMARY [ <replaceable class="parameter">boolean</replaceable> ]
FORMAT { TEXT | XML | JSON | YAML }
@@ -192,6 +193,19 @@ ROLLBACK;
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>WAL</literal></term>
+ <listitem>
+ <para>
+ Include information on WAL record generation. Specifically, include the
+ number of records, full page records and bytes generated. In text
+ format, only non-zero values are printed. This parameter may only be
+ used when <literal>ANALYZE</literal> is also enabled. It defaults to
+ <literal>FALSE</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>TIMING</literal></term>
<listitem>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ee0e638f33..b05b55979b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -113,6 +113,7 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
+static void show_wal_usage(ExplainState *es, const WalUsage *usage);
static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
ExplainState *es);
static void ExplainScanTarget(Scan *plan, ExplainState *es);
@@ -175,6 +176,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
es->costs = defGetBoolean(opt);
else if (strcmp(opt->defname, "buffers") == 0)
es->buffers = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "wal") == 0)
+ es->wal = defGetBoolean(opt);
else if (strcmp(opt->defname, "settings") == 0)
es->settings = defGetBoolean(opt);
else if (strcmp(opt->defname, "timing") == 0)
@@ -219,6 +222,11 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("EXPLAIN option BUFFERS requires ANALYZE")));
+ if (es->wal && !es->analyze)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("EXPLAIN option WAL requires ANALYZE")));
+
/* if the timing was not set explicitly, set default value */
es->timing = (timing_set) ? es->timing : es->analyze;
@@ -494,6 +502,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
if (es->buffers)
instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
/*
* We always collect timing for the entire statement, even when node-level
@@ -1942,12 +1952,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
}
- /* Show buffer usage */
+ /* Show buffer/WAL usage */
if (es->buffers && planstate->instrument)
show_buffer_usage(es, &planstate->instrument->bufusage);
+ if (es->wal && planstate->instrument)
+ show_wal_usage(es, &planstate->instrument->walusage);
- /* Prepare per-worker buffer usage */
- if (es->workers_state && es->buffers && es->verbose)
+ /* Prepare per-worker buffer/WAL usage */
+ if (es->workers_state && (es->buffers || es->wal) && es->verbose)
{
WorkerInstrumentation *w = planstate->worker_instrument;
@@ -1960,7 +1972,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
continue;
ExplainOpenWorker(n, es);
- show_buffer_usage(es, &instrument->bufusage);
+ if (es->buffers)
+ show_buffer_usage(es, &instrument->bufusage);
+ if (es->wal)
+ show_wal_usage(es, &instrument->walusage);
ExplainCloseWorker(n, es);
}
}
@@ -3059,6 +3074,44 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
}
}
+/*
+ * Show WAL usage details.
+ */
+static void
+show_wal_usage(ExplainState *es, const WalUsage *usage)
+{
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ /* Show only positive counter values. */
+ if ((usage->wal_records > 0) || (usage->wal_num_fpw > 0) ||
+ (usage->wal_bytes > 0))
+ {
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "WAL:");
+
+ if (usage->wal_records > 0)
+ appendStringInfo(es->str, " records=%ld",
+ usage->wal_records);
+ if (usage->wal_num_fpw > 0)
+ appendStringInfo(es->str, " full page records=%ld",
+ usage->wal_num_fpw);
+ if (usage->wal_bytes > 0)
+ appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
+ usage->wal_bytes);
+ appendStringInfoChar(es->str, '\n');
+ }
+ }
+ else
+ {
+ ExplainPropertyInteger("WAL records", NULL,
+ usage->wal_records, es);
+ ExplainPropertyInteger("WAL full page records", NULL,
+ usage->wal_num_fpw, es);
+ ExplainPropertyUInteger("WAL bytes", NULL,
+ usage->wal_bytes, es);
+ }
+}
+
/*
* Add some additional details about an IndexScan or IndexOnlyScan
*/
@@ -3843,6 +3896,19 @@ ExplainPropertyInteger(const char *qlabel, const char *unit, int64 value,
ExplainProperty(qlabel, unit, buf, true, es);
}
+/*
+ * Explain an unsigned integer-valued property.
+ */
+void
+ExplainPropertyUInteger(const char *qlabel, const char *unit, uint64 value,
+ ExplainState *es)
+{
+ char buf[32];
+
+ snprintf(buf, sizeof(buf), UINT64_FORMAT, value);
+ ExplainProperty(qlabel, unit, buf, true, es);
+}
+
/*
* Explain a float-valued property, using the specified number of
* fractional digits.
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5fec59723c..0e7a373caf 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3045,8 +3045,8 @@ psql_completion(const char *text, int start, int end)
*/
if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
COMPLETE_WITH("ANALYZE", "VERBOSE", "COSTS", "SETTINGS",
- "BUFFERS", "TIMING", "SUMMARY", "FORMAT");
- else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|BUFFERS|TIMING|SUMMARY"))
+ "BUFFERS", "WAL", "TIMING", "SUMMARY", "FORMAT");
+ else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|BUFFERS|WAL|TIMING|SUMMARY"))
COMPLETE_WITH("ON", "OFF");
else if (TailMatches("FORMAT"))
COMPLETE_WITH("TEXT", "XML", "JSON", "YAML");
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 54f6240e5e..7b0b0a94a6 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -42,6 +42,7 @@ typedef struct ExplainState
bool analyze; /* print actual times */
bool costs; /* print estimated costs */
bool buffers; /* print buffer usage */
+ bool wal; /* print WAL usage */
bool timing; /* print detailed node timing */
bool summary; /* print total planning and execution timing */
bool settings; /* print modified settings */
@@ -110,6 +111,8 @@ extern void ExplainPropertyText(const char *qlabel, const char *value,
ExplainState *es);
extern void ExplainPropertyInteger(const char *qlabel, const char *unit,
int64 value, ExplainState *es);
+extern void ExplainPropertyUInteger(const char *qlabel, const char *unit,
+ uint64 value, ExplainState *es);
extern void ExplainPropertyFloat(const char *qlabel, const char *unit,
double value, int ndigits, ExplainState *es);
extern void ExplainPropertyBool(const char *qlabel, bool value,
--
2.20.1
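The text-format branch of show_wal_usage in the patch above prints a counter only when it is non-zero and prints nothing at all when every counter is zero. The following is a rough standalone sketch of that rule using plain snprintf instead of PostgreSQL's StringInfo. The function name `demo_format_wal_usage` and the buffer sizes are assumptions made for illustration, and unlike the patch the sketch does not append a trailing newline or indentation.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: write a "WAL: ..." line into buf, listing only the
 * counters that are greater than zero.  Leaves buf empty and returns 0 when
 * there is nothing to report; otherwise returns snprintf's result.
 */
static int
demo_format_wal_usage(char *buf, size_t buflen,
                      int64_t records, int64_t num_fpw, uint64_t bytes)
{
    char rec_part[40] = "";
    char fpw_part[48] = "";
    char byte_part[40] = "";

    if (records <= 0 && num_fpw <= 0 && bytes == 0)
    {
        if (buflen > 0)
            buf[0] = '\0';
        return 0;
    }

    if (records > 0)
        snprintf(rec_part, sizeof(rec_part), " records=%" PRId64, records);
    if (num_fpw > 0)
        snprintf(fpw_part, sizeof(fpw_part),
                 " full page records=%" PRId64, num_fpw);
    if (bytes > 0)
        snprintf(byte_part, sizeof(byte_part), " bytes=%" PRIu64, bytes);

    /* snprintf truncates safely if the caller's buffer is too small. */
    return snprintf(buf, buflen, "WAL:%s%s%s", rec_part, fpw_part, byte_part);
}
```

The structured formats (XML, JSON, YAML) in the patch take the other branch and always emit all three properties, so consumers of those formats can rely on the keys being present.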
v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements.patch (text/plain; charset=us-ascii)
From ad2c14dc18f07d3ab973d8a715738827076f5e62 Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:42:02 +0100
Subject: [PATCH v9 5/6] Keep track of WAL usage in pg_stat_statements.
Author: Kirill Bychik
Reviewed-by: Julien Rouhaud, Fujii Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
contrib/pg_stat_statements/Makefile | 3 +-
.../expected/pg_stat_statements.out | 144 +++++++++++++++---
.../pg_stat_statements--1.7--1.8.sql | 50 ++++++
.../pg_stat_statements/pg_stat_statements.c | 58 ++++++-
.../pg_stat_statements.control | 2 +-
.../sql/pg_stat_statements.sql | 64 +++++++-
doc/src/sgml/pgstatstatements.sgml | 27 ++++
7 files changed, 322 insertions(+), 26 deletions(-)
create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 80042a0905..081f997d70 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,8 @@ OBJS = \
pg_stat_statements.o
EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.6--1.7.sql \
+DATA = pg_stat_statements--1.4.sql \
+ pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 6787ec1efd..20e44b2d75 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -195,20 +195,126 @@ SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
3 | c
(8 rows)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
--------------------------------------------------------------+-------+------
- DELETE FROM test WHERE a > $1 | 1 | 1
- INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3
- INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10
- SELECT * FROM test ORDER BY a | 1 | 12
- SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4
- SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8
- SELECT pg_stat_statements_reset() | 1 | 1
- UPDATE test SET b = $1 WHERE a = $2 | 6 | 6
- UPDATE test SET b = $1 WHERE a > $2 | 1 | 3
+-- SELECT check WAL usage stats to confirm temp tables do not get stored in WAL
+SELECT query, calls, rows, wal_bytes, wal_records, wal_num_fpw
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes | wal_records | wal_num_fpw
+-------------------------------------------------------------+-------+------+-----------+-------------+-------------
+ DELETE FROM test WHERE a > $1 | 1 | 1 | 0 | 0 | 0
+ INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | 0 | 0 | 0
+ INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10 | 0 | 0 | 0
+ SELECT * FROM test ORDER BY a | 1 | 12 | 0 | 0 | 0
+ SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4 | 0 | 0 | 0
+ SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | 0 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0 | 0
+ UPDATE test SET b = $1 WHERE a = $2 | 6 | 6 | 0 | 0 | 0
+ UPDATE test SET b = $1 WHERE a > $2 | 1 | 3 | 0 | 0 | 0
(9 rows)
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset
+--------------------------
+
+(1 row)
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- explicit transaction
+BEGIN;
+UPDATE pgss_test SET b = '111' WHERE a = 1 ;
+COMMIT;
+BEGIN \;
+UPDATE pgss_test SET b = '222' WHERE a = 2 \;
+COMMIT ;
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
+BEGIN \;
+UPDATE pgss_test SET b = '555' WHERE a = 5 \;
+UPDATE pgss_test SET b = '666' WHERE a = 6 \;
+COMMIT ;
+-- many INSERT values
+INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+-- SELECT with constants
+SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ;
+ a | b
+---+----------------------
+ 6 | 666
+ 7 | aaa
+ 8 | bbb
+ 9 | bbb
+(4 rows)
+
+SELECT *
+ FROM pgss_test
+ WHERE a > 9
+ ORDER BY a ;
+ a | b
+---+---
+(0 rows)
+
+-- SELECT without constants
+SELECT * FROM pgss_test ORDER BY a;
+ a | b
+---+----------------------
+ 1 | a
+ 1 | 111
+ 2 | b
+ 2 | 222
+ 3 | c
+ 3 | 333
+ 4 | 444
+ 5 | 555
+ 6 | 666
+ 7 | aaa
+ 8 | bbb
+ 9 | bbb
+(12 rows)
+
+-- SELECT with IN clause
+SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5);
+ a | b
+---+----------------------
+ 1 | 111
+ 2 | 222
+ 3 | 333
+ 4 | 444
+ 5 | 555
+ 1 | a
+ 2 | b
+ 3 | c
+(8 rows)
+
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_as_rows
+------------------------------------------------------------------+-------+------+---------------------+-----------------------+---------------------
+ DELETE FROM pgss_test WHERE a > $1 | 1 | 1 | t | t | t
+ DROP TABLE pgss_test | 1 | 0 | t | t | f
+ INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | t | t | t
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | 1 | 10 | t | t | t
+ SELECT * FROM pgss_test ORDER BY a | 1 | 12 | f | f | f
+ SELECT * FROM pgss_test WHERE a > $1 ORDER BY a | 2 | 4 | f | f | f
+ SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | f | f | f
+ SELECT pg_stat_statements_reset() | 1 | 1 | f | f | f
+ SET pg_stat_statements.track_utility = FALSE | 1 | 0 | f | f | t
+ UPDATE pgss_test SET b = $1 WHERE a = $2 | 6 | 6 | t | t | t
+ UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 | 3 | t | t | t
+(11 rows)
+
--
-- pg_stat_statements.track = none
--
@@ -287,13 +393,13 @@ SELECT PLUS_ONE(10);
11
(1 row)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
------------------------------------+-------+------
- SELECT $1::TEXT | 1 | 1
- SELECT PLUS_ONE($1) | 2 | 2
- SELECT PLUS_TWO($1) | 2 | 2
- SELECT pg_stat_statements_reset() | 1 | 1
+SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes | wal_records
+-----------------------------------+-------+------+-----------+-------------
+ SELECT $1::TEXT | 1 | 1 | 0 | 0
+ SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0
+ SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
(4 rows)
--
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
new file mode 100644
index 0000000000..21ca4726c6
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -0,0 +1,50 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.8'" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+ OUT userid oid,
+ OUT dbid oid,
+ OUT queryid bigint,
+ OUT query text,
+ OUT calls int8,
+ OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT mean_time float8,
+ OUT stddev_time float8,
+ OUT rows int8,
+ OUT shared_blks_hit int8,
+ OUT shared_blks_read int8,
+ OUT shared_blks_dirtied int8,
+ OUT shared_blks_written int8,
+ OUT local_blks_hit int8,
+ OUT local_blks_read int8,
+ OUT local_blks_dirtied int8,
+ OUT local_blks_written int8,
+ OUT temp_blks_read int8,
+ OUT temp_blks_written int8,
+ OUT blk_read_time float8,
+ OUT blk_write_time float8,
+ OUT wal_bytes numeric,
+ OUT wal_records int8,
+ OUT wal_num_fpw int8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4'
+LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
+
+CREATE VIEW pg_stat_statements AS
+ SELECT * FROM pg_stat_statements(true);
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 50c345378d..8c5461786b 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -120,7 +120,8 @@ typedef enum pgssVersion
PGSS_V1_0 = 0,
PGSS_V1_1,
PGSS_V1_2,
- PGSS_V1_3
+ PGSS_V1_3,
+ PGSS_V1_4
} pgssVersion;
/*
@@ -161,6 +162,9 @@ typedef struct Counters
int64 local_blks_written; /* # of local disk blocks written */
int64 temp_blks_read; /* # of temp blocks read */
int64 temp_blks_written; /* # of temp blocks written */
+ uint64 wal_bytes; /* total amount of wal bytes written */
+ int64 wal_records; /* # of wal records written */
+ int64 wal_num_fpw; /* # of full page wal records written */
double blk_read_time; /* time spent reading, in msec */
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
@@ -293,6 +297,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_3);
+PG_FUNCTION_INFO_V1(pg_stat_statements_1_4);
PG_FUNCTION_INFO_V1(pg_stat_statements);
static void pgss_shmem_startup(void);
@@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage* walusage,
pgssJumbleState *jstate);
static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
pgssVersion api_version,
@@ -841,6 +847,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
0,
0,
NULL,
+ NULL,
&jstate);
}
@@ -944,6 +951,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
queryDesc->totaltime->total * 1000.0, /* convert to msec */
queryDesc->estate->es_processed,
&queryDesc->totaltime->bufusage,
+ &queryDesc->totaltime->walusage,
NULL);
}
@@ -989,7 +997,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
BufferUsage bufusage_start,
bufusage;
+ WalUsage walusage_start,
+ walusage;
+
bufusage_start = pgBufferUsage;
+ walusage_start = pgWalUsage;
INSTR_TIME_SET_CURRENT(start);
nested_level++;
@@ -1019,6 +1031,9 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
memset(&bufusage, 0, sizeof(BufferUsage));
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
pgss_store(queryString,
0, /* signal that it's a utility stmt */
pstmt->stmt_location,
@@ -1026,6 +1041,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
INSTR_TIME_GET_MILLISEC(duration),
rows,
&bufusage,
+ &walusage,
NULL);
}
else
@@ -1061,13 +1077,14 @@ pgss_hash_string(const char *str, int len)
*
* If jstate is not NULL then we're trying to create an entry for which
* we have no statistics as yet; we just want to record the normalized
- * query string. total_time, rows, bufusage are ignored in this case.
+ * query string. Time and usage are ignored in this case.
*/
static void
pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage* walusage,
pgssJumbleState *jstate)
{
pgssHashKey key;
@@ -1259,6 +1276,9 @@ pgss_store(const char *query, uint64 queryId,
e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
+ e->counters.wal_bytes += walusage->wal_bytes;
+ e->counters.wal_records += walusage->wal_records;
+ e->counters.wal_num_fpw += walusage->wal_num_fpw;
SpinLockRelease(&e->mutex);
}
@@ -1306,7 +1326,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
#define PG_STAT_STATEMENTS_COLS_V1_1 18
#define PG_STAT_STATEMENTS_COLS_V1_2 19
#define PG_STAT_STATEMENTS_COLS_V1_3 23
-#define PG_STAT_STATEMENTS_COLS 23 /* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_4 26
+#define PG_STAT_STATEMENTS_COLS 26 /* maximum of above */
/*
* Retrieve statement statistics.
@@ -1318,6 +1339,15 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
* expected API version is identified by embedding it in the C name of the
* function. Unfortunately we weren't bright enough to do that for 1.1.
*/
+Datum pg_stat_statements_1_4(PG_FUNCTION_ARGS)
+{
+ bool showtext = PG_GETARG_BOOL(0);
+
+ pg_stat_statements_internal(fcinfo, PGSS_V1_4, showtext);
+
+ return (Datum)0;
+}
+
Datum
pg_stat_statements_1_3(PG_FUNCTION_ARGS)
{
@@ -1423,6 +1453,10 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
if (api_version != PGSS_V1_3)
elog(ERROR, "incorrect number of output arguments");
break;
+ case PG_STAT_STATEMENTS_COLS_V1_4:
+ if (api_version != PGSS_V1_4)
+ elog(ERROR, "incorrect number of output arguments");
+ break;
default:
elog(ERROR, "incorrect number of output arguments");
}
@@ -1619,11 +1653,29 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
values[i++] = Float8GetDatumFast(tmp.blk_read_time);
values[i++] = Float8GetDatumFast(tmp.blk_write_time);
}
+ if (api_version >= PGSS_V1_4)
+ {
+ char buf[256];
+ Datum wal_bytes;
+
+ snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
+
+ /* Convert to numeric. */
+ wal_bytes = DirectFunctionCall3(numeric_in,
+ CStringGetDatum(buf),
+ ObjectIdGetDatum(0),
+ Int32GetDatum(-1));
+
+ values[i++] = wal_bytes;
+ values[i++] = Int64GetDatumFast(tmp.wal_records);
+ values[i++] = UInt64GetDatum(tmp.wal_num_fpw);
+ }
Assert(i == (api_version == PGSS_V1_0 ? PG_STAT_STATEMENTS_COLS_V1_0 :
api_version == PGSS_V1_1 ? PG_STAT_STATEMENTS_COLS_V1_1 :
api_version == PGSS_V1_2 ? PG_STAT_STATEMENTS_COLS_V1_2 :
api_version == PGSS_V1_3 ? PG_STAT_STATEMENTS_COLS_V1_3 :
+ api_version == PGSS_V1_4 ? PG_STAT_STATEMENTS_COLS_V1_4 :
-1 /* fail if you forget to update this assert */ ));
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 14cb422354..7fb20df886 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
# pg_stat_statements extension
comment = 'track execution statistics of all SQL statements executed'
-default_version = '1.7'
+default_version = '1.8'
module_pathname = '$libdir/pg_stat_statements'
relocatable = true
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 8b527070d4..5df86d78a3 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -99,7 +99,67 @@ SELECT * FROM test ORDER BY a;
-- SELECT with IN clause
SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- SELECT check WAL usage stats to confirm temp tables do not get stored in WAL
+SELECT query, calls, rows, wal_bytes, wal_records, wal_num_fpw
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+
+-- explicit transaction
+BEGIN;
+UPDATE pgss_test SET b = '111' WHERE a = 1 ;
+COMMIT;
+
+BEGIN \;
+UPDATE pgss_test SET b = '222' WHERE a = 2 \;
+COMMIT ;
+
+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;
+
+BEGIN \;
+UPDATE pgss_test SET b = '555' WHERE a = 5 \;
+UPDATE pgss_test SET b = '666' WHERE a = 6 \;
+COMMIT ;
+
+-- many INSERT values
+INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+
+-- SELECT with constants
+SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ;
+
+SELECT *
+ FROM pgss_test
+ WHERE a > 9
+ ORDER BY a ;
+
+-- SELECT without constants
+SELECT * FROM pgss_test ORDER BY a;
+
+-- SELECT with IN clause
+SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5);
+
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+
+-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- pg_stat_statements.track = none
@@ -144,7 +204,7 @@ $$ SELECT (i + 1.0)::INTEGER LIMIT 1 $$ LANGUAGE SQL;
SELECT PLUS_ONE(8);
SELECT PLUS_ONE(10);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- pg_stat_statements.track = all
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 26bb82da4a..2ccaa30846 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -221,6 +221,33 @@
</entry>
</row>
+ <row>
+ <entry><structfield>wal_bytes</structfield></entry>
+ <entry><type>numeric</type></entry>
+ <entry></entry>
+ <entry>
+ Total amount of WAL bytes generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL records generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_num_fpw</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL full page writes generated by the statement
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
--
2.20.1
v9-0006-Expose-WAL-usage-counters-in-verbose-auto-vacuum-.patch (text/plain; charset=us-ascii)
From 562fa734c61a1998df69c0a5d22cc7c635e1d6f3 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 19 Mar 2020 16:08:47 +0100
Subject: [PATCH v9 6/6] Expose WAL usage counters in verbose (auto)vacuum
output.
Author: Julien Rouhaud
Reviewed-by: Fuji Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 27e163f5b3..0a74f63856 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -409,6 +409,8 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
int nindexes;
PGRUsage ru0;
TimestampTz starttime = 0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
long secs;
int usecs;
double read_rate,
@@ -613,6 +615,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
TimestampDifference(starttime, endtime, &secs, &usecs);
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
read_rate = 0;
write_rate = 0;
if ((secs > 0) || (usecs > 0))
@@ -665,7 +670,13 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
(long long) VacuumPageDirty);
appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
read_rate, write_rate);
- appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf,
+ _("WAL usage: %ld records, %ld full page writes, "
+ UINT64_FORMAT " bytes"),
+ walusage.wal_records,
+ walusage.wal_num_fpw,
+ walusage.wal_bytes);
ereport(LOG,
(errmsg_internal("%s", buf.data)));
@@ -757,6 +768,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
IndexBulkDeleteResult **indstats;
int i;
PGRUsage ru0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
Buffer vmbuffer = InvalidBuffer;
BlockNumber next_unskippable_block;
bool skipping_blocks;
@@ -1726,6 +1739,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
"%u pages are entirely empty.\n",
empty_pages),
empty_pages);
+
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+ appendStringInfo(&buf, _("%ld WAL records, %ld WAL full page writes, "
+ UINT64_FORMAT " WAL bytes\n"),
+ walusage.wal_records,
+ walusage.wal_num_fpw,
+ walusage.wal_bytes);
+
appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
ereport(elevel,
--
2.20.1
On Wed, Apr 1, 2020 at 1:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
So here's a v9, rebased on top of the latest versions of Sawada-san's bug fixes
(Amit's v6 for vacuum and Sawada-san's v2 for create index), with all
previously mentioned changes.
Few other comments:
v9-0003-Add-infrastructure-to-track-WAL-usage
1.
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
-
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
Looks like a spurious line removal
2.
+ /* Report a full page imsage constructed for the WAL record */
+ *num_fpw += 1;
Typo. /imsage/image
3. It would be great to do some testing with and without parallelism to
ensure that the WAL usage data is correct and, if possible, to share the results.
v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
4.
+-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+                              query                               | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_as_rows
+------------------------------------------------------------------+-------+------+---------------------+-----------------------+---------------------
+ DELETE FROM pgss_test WHERE a > $1                               |     1 |    1 | t                   | t                     | t
+ DROP TABLE pgss_test                                             |     1 |    0 | t                   | t                     | f
+ INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) |     1 |    3 | t                   | t                     | t
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3)        |     1 |   10 | t                   | t                     | t
+ SELECT * FROM pgss_test ORDER BY a                               |     1 |   12 | f                   | f                     | f
+ SELECT * FROM pgss_test WHERE a > $1 ORDER BY a                  |     2 |    4 | f                   | f                     | f
+ SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5)          |     1 |    8 | f                   | f                     | f
+ SELECT pg_stat_statements_reset()                                |     1 |    1 | f                   | f                     | f
+ SET pg_stat_statements.track_utility = FALSE                     |     1 |    0 | f                   | f                     | t
+ UPDATE pgss_test SET b = $1 WHERE a = $2                         |     6 |    6 | t                   | t                     | t
+ UPDATE pgss_test SET b = $1 WHERE a > $2                         |     1 |    3 | t                   | t                     | t
+(11 rows)
+
I am not sure if the above tests make much sense as they are just
testing whether WAL is generated for these commands. I understand it
is not easy to make these tests reliable but in that case, we can
think of some simple tests. It seems to me that the difficulty is due
to full_page_writes as that depends on the checkpoint. Can we make
full_page_writes = off for these tests and check some simple
Insert/Update/Delete cases? Alternatively, if you can present the
reason why such tests are unstable or tricky to write, then we can simply
get rid of these tests because I don't see tests for BufferUsage. Let's
not write tests for the sake of writing them unless they can detect bugs
in the future or meaningfully cover the new code added.
5.
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
-               query               | calls | rows
------------------------------------+-------+------
- SELECT $1::TEXT                   |     1 |    1
- SELECT PLUS_ONE($1)               |     2 |    2
- SELECT PLUS_TWO($1)               |     2 |    2
- SELECT pg_stat_statements_reset() |     1 |    1
+SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+               query               | calls | rows | wal_bytes | wal_records
+-----------------------------------+-------+------+-----------+-------------
+ SELECT $1::TEXT                   |     1 |    1 |         0 |           0
+ SELECT PLUS_ONE($1)               |     2 |    2 |         0 |           0
+ SELECT PLUS_TWO($1)               |     2 |    2 |         0 |           0
+ SELECT pg_stat_statements_reset() |     1 |    1 |         0 |           0
 (4 rows)
Again, I am not sure if these modifications make much sense?
6.
static void pgss_shmem_startup(void);
@@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage* walusage,
pgssJumbleState *jstate);
The alignment for walusage doesn't seem to be correct. Running
pgindent will fix this.
7.
+ values[i++] = Int64GetDatumFast(tmp.wal_records);
+ values[i++] = UInt64GetDatum(tmp.wal_num_fpw);
Why are they different? I think we should use the same *GetDatum API
(probably Int64GetDatumFast) for these.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 1, 2020 at 4:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
One more comment related to this patch.
+
+ snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
+
+ /* Convert to numeric. */
+ wal_bytes = DirectFunctionCall3(numeric_in,
+ CStringGetDatum(buf),
+ ObjectIdGetDatum(0),
+ Int32GetDatum(-1));
+
+ values[i++] = wal_bytes;
I see that other places that display uint64 values use BIGINT datatype
in SQL, so why can't we do the same here? See the usage of queryid in
pg_stat_statements or internal_pages, *_pages exposed via
pgstatindex.c.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 1, 2020 at 12:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Apr 1, 2020 at 8:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Agreed. I've attached the updated patch.
Thank you for testing, Dilip!
Thanks! One hunk is failing on the latest head. And, I have rebased
the patch for my testing so posting the same. I have done some more
testing to test multi-pass vacuum.
The patch looks good to me. I have done a few minor modifications: (a)
moved the declaration of variable closer to where it is used, (b)
changed a comment, (c) ran pgindent. I have also done some additional
testing with a larger number of indexes and found that vacuum and parallel
vacuum used the same number of total_read_blks, which is what is
expected here.
Let me know what you think of the attached?
The patch looks fine to me.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 1, 2020 at 5:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Apr 1, 2020 at 4:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
One more comment related to this patch.
+
+ snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
+
+ /* Convert to numeric. */
+ wal_bytes = DirectFunctionCall3(numeric_in,
+ CStringGetDatum(buf),
+ ObjectIdGetDatum(0),
+ Int32GetDatum(-1));
+
+ values[i++] = wal_bytes;
I see that other places that display uint64 values use BIGINT datatype
in SQL, so why can't we do the same here? See the usage of queryid in
pg_stat_statements or internal_pages, *_pages exposed via
pgstatindex.c.
I have reviewed 0003 and 0004, I have a few comments.
v9-0003-Add-infrastructure-to-track-WAL-usage
1.
/* Points to buffer usage area in DSM */
BufferUsage *buffer_usage;
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
Better to give one blank line between the previous statement/variable
declaration and the next comment line.
/* Points to buffer usage area in DSM */
BufferUsage *buffer_usage;
---------Empty line here--------------------
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
2.
@@ -2154,7 +2157,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel,
IndexBulkDeleteResult **stats,
WaitForParallelWorkersToFinish(lps->pcxt);
for (i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i]);
+ InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
The existing comment above this loop mentions only the buffer usage,
not the WAL usage, so I guess we need to change that.
" /*
* Next, accumulate buffer usage. (This must wait for the workers to
* finish, or we might get incomplete data.)
*/"
v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut
3.
+ if (usage->wal_num_fpw > 0)
+ appendStringInfo(es->str, " full page records=%ld",
+ usage->wal_num_fpw);
+ if (usage->wal_bytes > 0)
+ appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
+ usage->wal_bytes);
Shall we change to 'full page writes' or 'full page image' instead of
full page records?
Apart from this, I have done some testing to see the wal_usage with the
parallel vacuum and the results look fine.
postgres[104248]=# CREATE TABLE test (a int, b int);
CREATE TABLE
postgres[104248]=# INSERT INTO test SELECT i, i FROM
GENERATE_SERIES(1,2000000) as i;
INSERT 0 2000000
postgres[104248]=# CREATE INDEX idx1 on test(a);
CREATE INDEX
postgres[104248]=# VACUUM (PARALLEL 1) test;
VACUUM
postgres[104248]=# select query , wal_bytes, wal_records, wal_num_fpw
from pg_stat_statements where query like 'VACUUM%';
query | wal_bytes | wal_records | wal_num_fpw
--------------------------+-----------+-------------+-------------
VACUUM (PARALLEL 1) test | 72814331 | 8857 | 8855
postgres[106479]=# CREATE TABLE test (a int, b int);
CREATE TABLE
postgres[106479]=# INSERT INTO test SELECT i, i FROM
GENERATE_SERIES(1,2000000) as i;
INSERT 0 2000000
postgres[106479]=# CREATE INDEX idx1 on test(a);
CREATE INDEX
postgres[106479]=# VACUUM (PARALLEL 0) test;
VACUUM
postgres[106479]=# select query , wal_bytes, wal_records, wal_num_fpw
from pg_stat_statements where query like 'VACUUM%';
query | wal_bytes | wal_records | wal_num_fpw
--------------------------+-----------+-------------+-------------
VACUUM (PARALLEL 0) test | 72814331 | 8857 | 8855
By tomorrow, I will try to finish reviewing 0005 and 0006.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Hi,
I'm replying here to all reviews that have been sent, thanks a lot!
On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote:
On Wed, Apr 1, 2020 at 1:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
So here's a v9, rebased on top of the latest versions of Sawada-san's bug fixes
(Amit's v6 for vacuum and Sawada-san's v2 for create index), with all
previously mentioned changes.
Few other comments:
v9-0003-Add-infrastructure-to-track-WAL-usage
1.
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
-
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
Looks like a spurious line removal
Fixed.
2.
+ /* Report a full page imsage constructed for the WAL record */
+ *num_fpw += 1;
Typo. /imsage/image
Ah sorry, I thought I had fixed it previously. Fixed.
3. Doing some testing with and without parallelism to ensure WAL usage
data is correct would be great and if possible, share the results?
I just saw that Dilip did some testing, but just in case, here is some
additional testing:
- vacuum, after a truncate, loading 1M rows and an "UPDATE t1 SET id = id"
=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%';
query | calls | wal_bytes | wal_records | wal_num_fpw
------------------------+-------+-----------+-------------+-------------
vacuum (parallel 3) t1 | 1 | 20098962 | 34104 | 2
vacuum (parallel 0) t1 | 1 | 20098962 | 34104 | 2
(2 rows)
- create index, overriding t1's parallel_workers, using the 1M rows just
vacuumed:
=# alter table t1 set (parallel_workers = 2);
ALTER TABLE
=# create index t1_parallel_2 on t1(id);
CREATE INDEX
=# alter table t1 set (parallel_workers = 0);
ALTER TABLE
=# create index t1_parallel_0 on t1(id);
CREATE INDEX
=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
query | calls | wal_bytes | wal_records | wal_num_fpw
--------------------------------------+-------+-----------+-------------+-------------
create index t1_parallel_0 on t1(id) | 1 | 20355540 | 2762 | 2745
create index t1_parallel_2 on t1(id) | 1 | 20406811 | 2762 | 2758
(2 rows)
It all looks good to me.
v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
4.
+-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+                              query                               | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_as_rows
+------------------------------------------------------------------+-------+------+---------------------+-----------------------+---------------------
+ DELETE FROM pgss_test WHERE a > $1                               |     1 |    1 | t                   | t                     | t
+ DROP TABLE pgss_test                                             |     1 |    0 | t                   | t                     | f
+ INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) |     1 |    3 | t                   | t                     | t
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3)        |     1 |   10 | t                   | t                     | t
+ SELECT * FROM pgss_test ORDER BY a                               |     1 |   12 | f                   | f                     | f
+ SELECT * FROM pgss_test WHERE a > $1 ORDER BY a                  |     2 |    4 | f                   | f                     | f
+ SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5)          |     1 |    8 | f                   | f                     | f
+ SELECT pg_stat_statements_reset()                                |     1 |    1 | f                   | f                     | f
+ SET pg_stat_statements.track_utility = FALSE                     |     1 |    0 | f                   | f                     | t
+ UPDATE pgss_test SET b = $1 WHERE a = $2                         |     6 |    6 | t                   | t                     | t
+ UPDATE pgss_test SET b = $1 WHERE a > $2                         |     1 |    3 | t                   | t                     | t
+(11 rows)
+
I am not sure if the above tests make much sense as they are just
testing whether WAL is generated for these commands. I understand it
is not easy to make these tests reliable but in that case, we can
think of some simple tests. It seems to me that the difficulty is due
to full_page_writes as that depends on the checkpoint. Can we make
full_page_writes = off for these tests and check some simple
Insert/Update/Delete cases? Alternatively, if you can present the
reason why such tests are unstable or tricky to write, then we can simply
get rid of these tests because I don't see tests for BufferUsage. Let's
not write tests for the sake of writing them unless they can detect bugs
in the future or meaningfully cover the new code added.
I don't think that we can hope for a stable number of WAL bytes
generated, so testing for a positive number looks sensible to me. Testing
that each single-row write query generates one WAL record also looks sensible, so I
kept this. I realized that Kirill used an existing set of queries that were
previously added to validate the behavior of multi-query commands, so there's no
need to have all of them again. I just kept one of each (insert, update,
delete, select) to make sure that we do record WAL activity there, but I don't
think that more can really be done. I still think that this is better than
nothing, but if you disagree feel free to drop those tests.
5.
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
-               query               | calls | rows
------------------------------------+-------+------
- SELECT $1::TEXT                   |     1 |    1
- SELECT PLUS_ONE($1)               |     2 |    2
- SELECT PLUS_TWO($1)               |     2 |    2
- SELECT pg_stat_statements_reset() |     1 |    1
+SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+               query               | calls | rows | wal_bytes | wal_records
+-----------------------------------+-------+------+-----------+-------------
+ SELECT $1::TEXT                   |     1 |    1 |         0 |           0
+ SELECT PLUS_ONE($1)               |     2 |    2 |         0 |           0
+ SELECT PLUS_TWO($1)               |     2 |    2 |         0 |           0
+ SELECT pg_stat_statements_reset() |     1 |    1 |         0 |           0
 (4 rows)
Again, I am not sure if these modifications make much sense?
Those are queries that were previously executed. As those are read-only queries
that are pretty much guaranteed not to cause any WAL activity, I don't see how
it hurts to test at the same time that this is indeed what we record with
pg_stat_statements, just to be safe. Once again, feel free to drop the extra
wal_* columns from the output if you disagree.
6.
static void pgss_shmem_startup(void);
@@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage* walusage,
pgssJumbleState *jstate);
The alignment for walusage doesn't seem to be correct. Running
pgindent will fix this.
Indeed, fixed.
7.
+ values[i++] = Int64GetDatumFast(tmp.wal_records);
+ values[i++] = UInt64GetDatum(tmp.wal_num_fpw);
Why are they different? I think we should use the same *GetDatum API
(probably Int64GetDatumFast) for these.
Oops, that's a mistake from when I was working on the wal_bytes output, fixed.
v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
One more comment related to this patch.
+
+ snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
+
+ /* Convert to numeric. */
+ wal_bytes = DirectFunctionCall3(numeric_in,
+ CStringGetDatum(buf),
+ ObjectIdGetDatum(0),
+ Int32GetDatum(-1));
+
+ values[i++] = wal_bytes;
I see that other places that display uint64 values use BIGINT datatype
in SQL, so why can't we do the same here? See the usage of queryid in
pg_stat_statements or internal_pages, *_pages exposed via
pgstatindex.c.
That's because it's harmless to report a signed number for a hash (at least
compared to the overhead of having it unsigned), while we certainly don't
want to report a negative amount of WAL bytes generated if it goes beyond the
bigint limit. See the usage of pg_lsn_mi in pg_lsn.c for instance.
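To illustrate the point about the bigint limit, here is a minimal standalone sketch. It is not part of the patch: format_wal_bytes and fits_in_bigint are hypothetical helper names. It assumes the same approach as the quoted hunk, which prints the unsigned 64-bit counter as decimal text before handing that text to numeric_in.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Print an unsigned 64-bit byte counter as decimal text.  Returns the
 * number of characters written (excluding the terminating NUL).
 */
int
format_wal_bytes(char *buf, size_t buflen, uint64_t wal_bytes)
{
	int			n;

	assert(buf != NULL && buflen > 0);
	n = snprintf(buf, buflen, "%" PRIu64, wal_bytes);
	assert(n > 0 && (size_t) n < buflen);	/* output must not be truncated */
	return n;
}

/* Would this counter still be representable in a signed 64-bit (bigint) column? */
int
fits_in_bigint(uint64_t wal_bytes)
{
	return wal_bytes <= (uint64_t) INT64_MAX;
}
```

A counter just above INT64_MAX no longer fits a signed 64-bit column, while its decimal text form remains exact, and that is the case a numeric output column covers.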
On Wed, Apr 01, 2020 at 07:20:31PM +0530, Dilip Kumar wrote:
I have reviewed 0003 and 0004, I have a few comments.
v9-0003-Add-infrastructure-to-track-WAL-usage
1.
/* Points to buffer usage area in DSM */
BufferUsage *buffer_usage;
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
Better to give one blank line between the previous statement/variable
declaration and the next comment line.
Fixed.
2.
@@ -2154,7 +2157,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel,
IndexBulkDeleteResult **stats,
WaitForParallelWorkersToFinish(lps->pcxt);
for (i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i]);
+ InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
The existing comment above this loop mentions only the buffer usage,
not the WAL usage, so I guess we need to change that.
Ah indeed, I thought I caught all the comments but missed this one. Fixed.
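The accumulation step that comment describes can be sketched as follows. This is only a standalone illustration under stated assumptions: SlotWalUsage, slot_usage_add and leader_accumulate are made-up names, and a plain array stands in for the per-worker WAL usage area in DSM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-worker usage slot, as it would sit in shared memory. */
typedef struct SlotWalUsage
{
	int64_t		wal_records;
	int64_t		wal_num_fpw;
	uint64_t	wal_bytes;
} SlotWalUsage;

/* dst += add, the role WalUsageAdd plays in the patch. */
void
slot_usage_add(SlotWalUsage *dst, const SlotWalUsage *add)
{
	assert(dst != NULL && add != NULL);
	dst->wal_records += add->wal_records;
	dst->wal_num_fpw += add->wal_num_fpw;
	dst->wal_bytes += add->wal_bytes;
}

/*
 * Leader side: fold the slots of the workers that were actually launched
 * into the leader's own totals.  This must only run after all workers have
 * finished, otherwise the slots may hold incomplete data.
 */
void
leader_accumulate(SlotWalUsage *leader,
				  const SlotWalUsage *slots,
				  int nworkers_launched)
{
	assert(leader != NULL && (slots != NULL || nworkers_launched == 0));
	for (int i = 0; i < nworkers_launched; i++)
		slot_usage_add(leader, &slots[i]);
}
```

Only the slots of launched workers are read, mirroring the loop bound nworkers_launched in the quoted hunk. A slot reserved for a worker that never started is ignored.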
v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut
3.
+ if (usage->wal_num_fpw > 0)
+ appendStringInfo(es->str, " full page records=%ld",
+ usage->wal_num_fpw);
+ if (usage->wal_bytes > 0)
+ appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
+ usage->wal_bytes);
Shall we change to 'full page writes' or 'full page image' instead of
full page records?
Indeed, I changed it in the (auto)vacuum output but missed this one. Fixed.
Apart from this, I have done some testing to see the wal_usage with the
parallel vacuum and the results look fine.
postgres[104248]=# CREATE TABLE test (a int, b int);
CREATE TABLE
postgres[104248]=# INSERT INTO test SELECT i, i FROM
GENERATE_SERIES(1,2000000) as i;
INSERT 0 2000000
postgres[104248]=# CREATE INDEX idx1 on test(a);
CREATE INDEX
postgres[104248]=# VACUUM (PARALLEL 1) test;
VACUUM
postgres[104248]=# select query , wal_bytes, wal_records, wal_num_fpw
from pg_stat_statements where query like 'VACUUM%';
query | wal_bytes | wal_records | wal_num_fpw
--------------------------+-----------+-------------+-------------
 VACUUM (PARALLEL 1) test | 72814331 | 8857 | 8855
postgres[106479]=# CREATE TABLE test (a int, b int);
CREATE TABLE
postgres[106479]=# INSERT INTO test SELECT i, i FROM
GENERATE_SERIES(1,2000000) as i;
INSERT 0 2000000
postgres[106479]=# CREATE INDEX idx1 on test(a);
CREATE INDEX
postgres[106479]=# VACUUM (PARALLEL 0) test;
VACUUM
postgres[106479]=# select query , wal_bytes, wal_records, wal_num_fpw
from pg_stat_statements where query like 'VACUUM%';
query | wal_bytes | wal_records | wal_num_fpw
--------------------------+-----------+-------------+-------------
VACUUM (PARALLEL 0) test | 72814331 | 8857 | 8855
Thanks! I did some similar testing, also covering sequential and parallel
index creation, and got similar results.
By tomorrow, I will try to finish reviewing 0005 and 0006.
Thanks!
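As a rough illustration of the accounting that the tests above exercise, the following standalone model mirrors the snapshot, diff and accumulate steps that the patch implements in InstrStartParallelQuery, InstrEndParallelQuery and InstrAccumParallelQuery. It is a sketch under assumptions, not PostgreSQL code: all demo_* names are hypothetical, the "backends" are plain structs in one process, and every record is given an arbitrary length of 100 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the patch's WalUsage; field names follow the patch. */
typedef struct DemoWalUsage
{
	long		wal_records;
	long		wal_num_fpw;
	uint64_t	wal_bytes;
} DemoWalUsage;

/* One simulated backend (leader or worker). */
typedef struct DemoBackend
{
	DemoWalUsage current;		/* running totals, like pgWalUsage */
	DemoWalUsage saved;			/* snapshot, like save_pgWalUsage */
} DemoBackend;

/* Count one inserted record, as the patch does in XLogInsertRecord(). */
static void
demo_insert_record(DemoBackend *b, uint64_t len, int num_fpw)
{
	b->current.wal_bytes += len;
	b->current.wal_records++;
	b->current.wal_num_fpw += num_fpw;
}

/* Worker: remember the counters before the parallel work starts. */
static void
demo_start(DemoBackend *b)
{
	b->saved = b->current;
}

/* Worker: report only what was produced since demo_start(). */
static void
demo_end(const DemoBackend *b, DemoWalUsage *result)
{
	result->wal_bytes = b->current.wal_bytes - b->saved.wal_bytes;
	result->wal_records = b->current.wal_records - b->saved.wal_records;
	result->wal_num_fpw = b->current.wal_num_fpw - b->saved.wal_num_fpw;
}

/* Leader: fold a worker's delta into its own running totals. */
static void
demo_accum(DemoBackend *leader, const DemoWalUsage *delta)
{
	leader->current.wal_bytes += delta->wal_bytes;
	leader->current.wal_records += delta->wal_records;
	leader->current.wal_num_fpw += delta->wal_num_fpw;
}

/*
 * Run "total" inserts, handing the first "worker_share" of them to a
 * simulated worker, and return the leader's totals afterwards.
 */
static DemoWalUsage
demo_run(int total, int worker_share)
{
	DemoBackend leader = {{0, 0, 0}, {0, 0, 0}};
	DemoBackend worker = {{0, 0, 0}, {0, 0, 0}};
	DemoWalUsage delta;
	int			i;

	assert(worker_share >= 0 && worker_share <= total);

	/* Unrelated earlier activity in the worker must not be counted. */
	demo_insert_record(&worker, 999, 1);

	demo_start(&worker);
	for (i = 0; i < total; i++)
		demo_insert_record(i < worker_share ? &worker : &leader, 100, i % 2);
	demo_end(&worker, &delta);
	demo_accum(&leader, &delta);

	return leader.current;
}
```

In this model, demo_run(10, 0) and demo_run(10, 6) return the same totals, which is the property the PARALLEL 0 and PARALLEL 1 runs above were checking: work done in a worker is neither lost nor counted twice once the leader has accumulated it.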
Attachments:
v10-0001-Allow-parallel-vacuum-to-accumulate-buffer-usage.patch (text/plain; charset=us-ascii)
From f77405a24532d7aa53d9ce7c88148b3437b45734 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Wed, 1 Apr 2020 11:50:57 +0530
Subject: [PATCH v10 1/6] Allow parallel vacuum to accumulate buffer usage.
Commit 40d964ec99 allowed vacuum command to process indexes in parallel but
forgot to accumulate the buffer usage stats of parallel workers. This
allows leader backend to accumulate buffer usage stats of all the parallel
workers.
Reported-by: Julien Rouhaud
Author: Sawada Masahiko
Reviewed-by: Dilip Kumar, Amit Kapila and Julien Rouhaud
Discussion: https://postgr.es/m/20200328151721.GB12854@nol
---
src/backend/access/heap/vacuumlazy.c | 47 ++++++++++++++++++++++++++--
1 file changed, 45 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 04b12342b8..9f9596c718 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -65,6 +65,7 @@
#include "commands/dbcommands.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "optimizer/paths.h"
#include "pgstat.h"
@@ -137,6 +138,7 @@
#define PARALLEL_VACUUM_KEY_SHARED 1
#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -270,6 +272,9 @@ typedef struct LVParallelState
/* Shared information among parallel vacuum workers */
LVShared *lvshared;
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
/*
* The number of indexes that support parallel index bulk-deletion and
* parallel index cleanup respectively.
@@ -2137,8 +2142,20 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
parallel_vacuum_index(Irel, stats, lps->lvshared,
vacrelstats->dead_tuples, nindexes, vacrelstats);
- /* Wait for all vacuum workers to finish */
- WaitForParallelWorkersToFinish(lps->pcxt);
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ if (nworkers > 0)
+ {
+ int i;
+
+ /* Wait for all vacuum workers to finish */
+ WaitForParallelWorkersToFinish(lps->pcxt);
+
+ for (i = 0; i < lps->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
+ }
/*
* Carry the shared balance value to heap scan and disable shared costing
@@ -3153,6 +3170,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
ParallelContext *pcxt;
LVShared *shared;
LVDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
bool *can_parallel_vacuum;
long maxtuples;
char *sharedquery;
@@ -3236,6 +3254,17 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Estimate space for BufferUsage -- PARALLEL_VACUUM_KEY_BUFFER_USAGE.
+ *
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgBufferUsage, so do
+ * it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -3270,6 +3299,12 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
vacrelstats->dead_tuples = dead_tuples;
+ /* Allocate space for each worker's BufferUsage; no need to initialize */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ lps->buffer_usage = buffer_usage;
+
/* Store query string for workers */
sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
memcpy(sharedquery, debug_query_string, querylen + 1);
@@ -3399,6 +3434,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
Relation *indrels;
LVShared *lvshared;
LVDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
int nindexes;
char *sharedquery;
IndexBulkDeleteResult **stats;
@@ -3468,10 +3504,17 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
/* Process indexes to perform vacuum/cleanup */
parallel_vacuum_index(indrels, stats, lvshared, dead_tuples, nindexes,
&vacrelstats);
+ /* Report buffer usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+
/* Pop the error context stack */
error_context_stack = errcallback.previous;
--
2.20.1
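The patch above sizes one BufferUsage slot per planned worker, lets each worker fill the slot at its own worker number, and has the leader read only the slots of workers that were actually launched. A minimal standalone model of that layout is sketched below. It makes several assumptions: malloc stands in for the DSM segment, DemoUsage is a simplified stand-in for BufferUsage with made-up field names, demo_mul_size reports overflow through its return value (the real mul_size raises an error instead), and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for BufferUsage; the field names are illustrative. */
typedef struct DemoUsage
{
	long		blocks_hit;
	long		blocks_read;
} DemoUsage;

/* Overflow-checked multiply; returns 1 on success, 0 on overflow. */
static int
demo_mul_size(size_t a, size_t b, size_t *result)
{
	if (a != 0 && b > SIZE_MAX / a)
		return 0;
	*result = a * b;
	return 1;
}

/*
 * Allocate one slot per planned worker.  The memory is deliberately left
 * uninitialized, matching the "no need to initialize" comment in the patch.
 */
static DemoUsage *
demo_alloc_slots(int nworkers)
{
	size_t		size;

	if (nworkers <= 0 ||
		!demo_mul_size(sizeof(DemoUsage), (size_t) nworkers, &size))
		return NULL;
	return malloc(size);
}

/* Worker side: zero the worker's own slot, then store its delta there. */
static void
demo_worker_report(DemoUsage *slots, int nworkers, int worker_number,
				   const DemoUsage *delta)
{
	assert(worker_number >= 0 && worker_number < nworkers);

	memset(&slots[worker_number], 0, sizeof(DemoUsage));
	slots[worker_number].blocks_hit += delta->blocks_hit;
	slots[worker_number].blocks_read += delta->blocks_read;
}

/* Leader side: add up the slots of the workers that actually ran. */
static void
demo_leader_accumulate(DemoUsage *total, const DemoUsage *slots,
					   int nworkers_launched)
{
	int			i;

	for (i = 0; i < nworkers_launched; i++)
	{
		total->blocks_hit += slots[i].blocks_hit;
		total->blocks_read += slots[i].blocks_read;
	}
}
```

Leaving the array uninitialized is safe in this model for the same reason it appears to be in the patch: every slot that is read was first zeroed and filled by its worker, and slots of workers that never launched are never read.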
v10-0002-Allow-parallel-index-creation-to-accumulate-buff.patch (text/plain; charset=us-ascii)
From de052fede091558801c54a3dc0ac3444c44ee06b Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 1 Apr 2020 09:37:41 +0200
Subject: [PATCH v10 2/6] Allow parallel index creation to accumulate buffer
usage.
Reported-by: Julien Rouhaud
Author: Sawada Masahiko
Reviewed-by: Dilip Kumar, Amit Kapila and Julien Rouhaud
Discussion: https://postgr.es/m/20200328151721.GB12854@nol
---
src/backend/access/nbtree/nbtsort.c | 40 +++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 3924945664..ba48b7e9f9 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -67,6 +67,7 @@
#include "access/xloginsert.h"
#include "catalog/index.h"
#include "commands/progress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/smgr.h"
@@ -81,6 +82,7 @@
#define PARALLEL_KEY_TUPLESORT UINT64CONST(0xA000000000000002)
#define PARALLEL_KEY_TUPLESORT_SPOOL2 UINT64CONST(0xA000000000000003)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xA000000000000004)
+#define PARALLEL_KEY_BUFFER_USAGE UINT64CONST(0xA000000000000005)
/*
* DISABLE_LEADER_PARTICIPATION disables the leader's participation in
@@ -203,6 +205,7 @@ typedef struct BTLeader
Sharedsort *sharedsort;
Sharedsort *sharedsort2;
Snapshot snapshot;
+ BufferUsage *bufferusage;
} BTLeader;
/*
@@ -1476,6 +1479,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
Sharedsort *sharedsort2;
BTSpool *btspool = buildstate->spool;
BTLeader *btleader = (BTLeader *) palloc0(sizeof(BTLeader));
+ BufferUsage *bufferusage;
bool leaderparticipates = true;
char *sharedquery;
int querylen;
@@ -1528,6 +1532,18 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
shm_toc_estimate_keys(&pcxt->estimator, 3);
}
+ /*
+ * Estimate space for BufferUsage -- PARALLEL_KEY_BUFFER_USAGE
+ *
+ * BufferUsage during executing maintenance command can be used by an
+ * extension that reports the buffer usage, such as pg_stat_statements.
+ * We have no way of knowing whether anyone's looking at pgBufferUsage,
+ * so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -1599,6 +1615,11 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
memcpy(sharedquery, debug_query_string, querylen + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, sharedquery);
+ /* Allocate space for each worker's BufferUsage; no need to initialize */
+ bufferusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufferusage);
+
/* Launch workers, saving status for leader/caller */
LaunchParallelWorkers(pcxt);
btleader->pcxt = pcxt;
@@ -1609,6 +1630,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
btleader->sharedsort = sharedsort;
btleader->sharedsort2 = sharedsort2;
btleader->snapshot = snapshot;
+ btleader->bufferusage = bufferusage;
/* If no workers were successfully launched, back out (do serial build) */
if (pcxt->nworkers_launched == 0)
@@ -1637,8 +1659,18 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
static void
_bt_end_parallel(BTLeader *btleader)
{
+ int i;
+
/* Shutdown worker processes */
WaitForParallelWorkersToFinish(btleader->pcxt);
+
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < btleader->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&btleader->bufferusage[i]);
+
/* Free last reference to MVCC snapshot, if one was used */
if (IsMVCCSnapshot(btleader->snapshot))
UnregisterSnapshot(btleader->snapshot);
@@ -1769,6 +1801,7 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
Relation indexRel;
LOCKMODE heapLockmode;
LOCKMODE indexLockmode;
+ BufferUsage *bufferusage;
int sortmem;
#ifdef BTREE_BUILD_STATS
@@ -1830,11 +1863,18 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
tuplesort_attach_shared(sharedsort2, seg);
}
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
/* Perform sorting of spool, and possibly a spool2 */
sortmem = maintenance_work_mem / btshared->scantuplesortstates;
_bt_parallel_scan_and_sort(btspool, btspool2, btshared, sharedsort,
sharedsort2, sortmem, false);
+ /* Report buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber]);
+
#ifdef BTREE_BUILD_STATS
if (log_btree_build_stats)
{
--
2.20.1
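Both patches above rely on the same handshake: the leader registers a chunk under a 64-bit key (shm_toc_insert), and each worker later finds it by that key (shm_toc_lookup). The toy table of contents below sketches only that idea. It is not the real shm_toc implementation: the demo_* names are hypothetical, the table lives in ordinary memory with a fixed capacity, and a missing key simply yields NULL, whereas the patches pass noError = false so that a missing key is reported as an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TOC_MAX_ENTRIES 8

/* Key values copied from the nbtsort.c hunk above, for illustration only. */
#define DEMO_KEY_QUERY_TEXT		UINT64_C(0xA000000000000004)
#define DEMO_KEY_BUFFER_USAGE	UINT64_C(0xA000000000000005)

typedef struct DemoTocEntry
{
	uint64_t	key;			/* caller-chosen identifier */
	void	   *address;		/* where the chunk lives */
} DemoTocEntry;

typedef struct DemoToc
{
	int			nentries;
	DemoTocEntry entries[DEMO_TOC_MAX_ENTRIES];
} DemoToc;

/* Leader side: publish a chunk under a key.  Returns 1 on success. */
static int
demo_toc_insert(DemoToc *toc, uint64_t key, void *address)
{
	assert(toc != NULL);

	if (toc->nentries >= DEMO_TOC_MAX_ENTRIES)
		return 0;				/* table is full */

	toc->entries[toc->nentries].key = key;
	toc->entries[toc->nentries].address = address;
	toc->nentries++;
	return 1;
}

/* Worker side: find a chunk by key, or NULL if it was never published. */
static void *
demo_toc_lookup(const DemoToc *toc, uint64_t key)
{
	int			i;

	assert(toc != NULL);

	for (i = 0; i < toc->nentries; i++)
	{
		if (toc->entries[i].key == key)
			return toc->entries[i].address;
	}
	return NULL;
}
```

Because lookups go purely by key, adding a new chunk such as the per-worker usage array only requires a new key constant that does not collide with the existing ones in the same parallel context, which is why each patch adds one #define next to the others.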
v10-0003-Add-infrastructure-to-track-WAL-usage.patch (text/plain; charset=us-ascii)
From 00b400334987217e99a8bf115537e26534d38088 Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:41:50 +0100
Subject: [PATCH v10 3/6] Add infrastructure to track WAL usage.
Author: Kirill Bychik, Julien Rouhaud
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 37 ++++++++++++++-----
src/backend/access/nbtree/nbtsort.c | 36 +++++++++++++-----
src/backend/access/transam/xlog.c | 12 +++++-
src/backend/access/transam/xloginsert.c | 13 +++++--
src/backend/executor/execParallel.c | 38 ++++++++++++++-----
src/backend/executor/instrument.c | 49 ++++++++++++++++++++++---
src/include/access/xlog.h | 3 +-
src/include/executor/execParallel.h | 1 +
src/include/executor/instrument.h | 18 ++++++++-
src/tools/pgindent/typedefs.list | 1 +
10 files changed, 166 insertions(+), 42 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9f9596c718..cc7e8521a5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -139,6 +139,7 @@
#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
+#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -275,6 +276,9 @@ typedef struct LVParallelState
/* Points to buffer usage area in DSM */
BufferUsage *buffer_usage;
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
+
/*
* The number of indexes that support parallel index bulk-deletion and
* parallel index cleanup respectively.
@@ -2143,8 +2147,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
vacrelstats->dead_tuples, nindexes, vacrelstats);
/*
- * Next, accumulate buffer usage. (This must wait for the workers to
- * finish, or we might get incomplete data.)
+ * Next, accumulate buffer and WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
*/
if (nworkers > 0)
{
@@ -2154,7 +2158,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
WaitForParallelWorkersToFinish(lps->pcxt);
for (i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i]);
+ InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
/*
@@ -3171,6 +3175,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
LVShared *shared;
LVDeadTuples *dead_tuples;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
bool *can_parallel_vacuum;
long maxtuples;
char *sharedquery;
@@ -3255,15 +3260,19 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_estimate_keys(&pcxt->estimator, 1);
/*
- * Estimate space for BufferUsage -- PARALLEL_VACUUM_KEY_BUFFER_USAGE.
+ * Estimate space for BufferUsage and WalUsage --
+ * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
*
* If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgBufferUsage, so do
- * it unconditionally.
+ * have no way of knowing whether anyone's looking at pgBufferUsage or
+ * pgWalUsage, so do it unconditionally.
*/
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
/* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
@@ -3299,11 +3308,18 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
vacrelstats->dead_tuples = dead_tuples;
- /* Allocate space for each worker's BufferUsage; no need to initialize */
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
buffer_usage = shm_toc_allocate(pcxt->toc,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
lps->buffer_usage = buffer_usage;
+ wal_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
+ lps->wal_usage = wal_usage;
/* Store query string for workers */
sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
@@ -3435,6 +3451,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
LVShared *lvshared;
LVDeadTuples *dead_tuples;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
int nindexes;
char *sharedquery;
IndexBulkDeleteResult **stats;
@@ -3511,9 +3528,11 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
parallel_vacuum_index(indrels, stats, lvshared, dead_tuples, nindexes,
&vacrelstats);
- /* Report buffer usage during parallel execution */
+ /* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Pop the error context stack */
error_context_stack = errcallback.previous;
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index ba48b7e9f9..d7a1b95c9f 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -83,6 +83,7 @@
#define PARALLEL_KEY_TUPLESORT_SPOOL2 UINT64CONST(0xA000000000000003)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xA000000000000004)
#define PARALLEL_KEY_BUFFER_USAGE UINT64CONST(0xA000000000000005)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xA000000000000006)
/*
* DISABLE_LEADER_PARTICIPATION disables the leader's participation in
@@ -206,6 +207,7 @@ typedef struct BTLeader
Sharedsort *sharedsort2;
Snapshot snapshot;
BufferUsage *bufferusage;
+ WalUsage *walusage;
} BTLeader;
/*
@@ -1480,6 +1482,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
BTSpool *btspool = buildstate->spool;
BTLeader *btleader = (BTLeader *) palloc0(sizeof(BTLeader));
BufferUsage *bufferusage;
+ WalUsage *walusage;
bool leaderparticipates = true;
char *sharedquery;
int querylen;
@@ -1533,16 +1536,20 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
}
/*
- * Estimate space for BufferUsage -- PARALLEL_KEY_BUFFER_USAGE
+ * Estimate space for BufferUsage and WalUsage -- PARALLEL_KEY_BUFFER_USAGE
+ * and PARALLEL_KEY_WAL_USAGE.
*
- * BufferUsage during executing maintenance command can be used by an
- * extension that reports the buffer usage, such as pg_stat_statements.
- * We have no way of knowing whether anyone's looking at pgBufferUsage,
- * so do it unconditionally.
+ * BufferUsage and WalUsage during executing maintenance command can be
+ * used by an extension that reports the buffer or WAL usage, such as
+ * pg_stat_statements. We have no way of knowing whether anyone's looking
+ * at pgBufferUsage or pgWalUsage, so do it unconditionally.
*/
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
/* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
@@ -1615,10 +1622,16 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
memcpy(sharedquery, debug_query_string, querylen + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, sharedquery);
- /* Allocate space for each worker's BufferUsage; no need to initialize */
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
bufferusage = shm_toc_allocate(pcxt->toc,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufferusage);
+ walusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage);
/* Launch workers, saving status for leader/caller */
LaunchParallelWorkers(pcxt);
@@ -1631,6 +1644,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
btleader->sharedsort2 = sharedsort2;
btleader->snapshot = snapshot;
btleader->bufferusage = bufferusage;
+ btleader->walusage = walusage;
/* If no workers were successfully launched, back out (do serial build) */
if (pcxt->nworkers_launched == 0)
@@ -1669,7 +1683,8 @@ _bt_end_parallel(BTLeader *btleader)
* finish, or we might get incomplete data.)
*/
for (i = 0; i < btleader->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&btleader->bufferusage[i]);
+ InstrAccumParallelQuery(&btleader->bufferusage[i],
+ &btleader->walusage[i]);
/* Free last reference to MVCC snapshot, if one was used */
if (IsMVCCSnapshot(btleader->snapshot))
@@ -1802,6 +1817,7 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
LOCKMODE heapLockmode;
LOCKMODE indexLockmode;
BufferUsage *bufferusage;
+ WalUsage *walusage;
int sortmem;
#ifdef BTREE_BUILD_STATS
@@ -1871,9 +1887,11 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
_bt_parallel_scan_and_sort(btspool, btspool2, btshared, sharedsort,
sharedsort2, sortmem, false);
- /* Report buffer usage during parallel execution */
+ /* Report buffer and WAL usage during parallel execution */
bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber]);
+ walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
+ &walusage[ParallelWorkerNumber]);
#ifdef BTREE_BUILD_STATS
if (log_btree_build_stats)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 977d448f50..444886bf0c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -43,6 +43,7 @@
#include "commands/progress.h"
#include "commands/tablespace.h"
#include "common/controldata_utils.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "pgstat.h"
@@ -996,7 +997,8 @@ static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
XLogRecPtr
XLogInsertRecord(XLogRecData *rdata,
XLogRecPtr fpw_lsn,
- uint8 flags)
+ uint8 flags,
+ int num_fpw)
{
XLogCtlInsert *Insert = &XLogCtl->Insert;
pg_crc32c rdata_crc;
@@ -1252,6 +1254,14 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
+ /* Provide WAL update data to the instrumentation */
+ if (inserted)
+ {
+ pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+ pgWalUsage.wal_records++;
+ pgWalUsage.wal_num_fpw += num_fpw;
+ }
+
return EndPos;
}
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index a618dec776..5e032e7042 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -25,6 +25,7 @@
#include "access/xloginsert.h"
#include "catalog/pg_control.h"
#include "common/pg_lzcompress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "replication/origin.h"
@@ -108,7 +109,7 @@ static MemoryContext xloginsert_cxt;
static XLogRecData *XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
- XLogRecPtr *fpw_lsn);
+ XLogRecPtr *fpw_lsn, int *num_fpw);
static bool XLogCompressBackupBlock(char *page, uint16 hole_offset,
uint16 hole_length, char *dest, uint16 *dlen);
@@ -448,6 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info)
bool doPageWrites;
XLogRecPtr fpw_lsn;
XLogRecData *rdt;
+ int num_fpw = 0;
/*
* Get values needed to decide whether to do full-page writes. Since
@@ -457,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info)
GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);
rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
- &fpw_lsn);
+ &fpw_lsn, &num_fpw);
- EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags);
+ EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw);
} while (EndPos == InvalidXLogRecPtr);
XLogResetInsertion();
@@ -482,7 +484,7 @@ XLogInsert(RmgrId rmid, uint8 info)
static XLogRecData *
XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
- XLogRecPtr *fpw_lsn)
+ XLogRecPtr *fpw_lsn, int *num_fpw)
{
XLogRecData *rdt;
uint32 total_len = 0;
@@ -635,6 +637,9 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
*/
bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
+ /* Report a full page image constructed for the WAL record */
+ *num_fpw += 1;
+
/*
* Construct XLogRecData entries for the page content.
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..7d9ca66fc8 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -12,7 +12,7 @@
* workers and ensuring that their state generally matches that of the
* leader; see src/backend/access/transam/README.parallel for details.
* However, we must save and restore relevant executor state, such as
- * any ParamListInfo associated with the query, buffer usage info, and
+ * any ParamListInfo associated with the query, buffer/WAL usage info, and
* the actual plan to be passed down to the worker.
*
* IDENTIFICATION
@@ -62,6 +62,7 @@
#define PARALLEL_KEY_DSA UINT64CONST(0xE000000000000007)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -573,6 +574,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
char *pstmt_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
+ WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
@@ -646,6 +648,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Same thing for WalUsage.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for tuple queues. */
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers));
@@ -728,6 +737,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufusage_space);
pei->buffer_usage = bufusage_space;
+ /* Same for WalUsage. */
+ walusage_space = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
+ pei->wal_usage = walusage_space;
+
/* Set up the tuple queues that the workers will write into. */
pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
@@ -1069,7 +1084,7 @@ ExecParallelRetrieveJitInstrumentation(PlanState *planstate,
/*
* Finish parallel execution. We wait for parallel workers to finish, and
- * accumulate their buffer usage.
+ * accumulate their buffer/WAL usage.
*/
void
ExecParallelFinish(ParallelExecutorInfo *pei)
@@ -1109,11 +1124,11 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
WaitForParallelWorkersToFinish(pei->pcxt);
/*
- * Next, accumulate buffer usage. (This must wait for the workers to
- * finish, or we might get incomplete data.)
+ * Next, accumulate buffer/WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
*/
for (i = 0; i < nworkers; i++)
- InstrAccumParallelQuery(&pei->buffer_usage[i]);
+ InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]);
pei->finished = true;
}
@@ -1333,6 +1348,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
{
FixedParallelExecutorState *fpes;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
DestReceiver *receiver;
QueryDesc *queryDesc;
SharedExecutorInstrumentation *instrumentation;
@@ -1386,11 +1402,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
ExecSetTupleBound(fpes->tuples_needed, queryDesc->planstate);
/*
- * Prepare to track buffer usage during query execution.
+ * Prepare to track buffer/WAL usage during query execution.
*
* We do this after starting up the executor to match what happens in the
- * leader, which also doesn't count buffer accesses that occur during
- * executor startup.
+ * leader, which also doesn't count buffer accesses and WAL activity that
+ * occur during executor startup.
*/
InstrStartParallelQuery();
@@ -1406,9 +1422,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Shut down the executor */
ExecutorFinish(queryDesc);
- /* Report buffer usage during parallel execution. */
+ /* Report buffer/WAL usage during parallel execution. */
buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Report instrumentation data if any instrumentation options are set. */
if (instrumentation != NULL)
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 042e10f96b..dd615581ac 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -19,8 +19,11 @@
BufferUsage pgBufferUsage;
static BufferUsage save_pgBufferUsage;
+WalUsage pgWalUsage;
+static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
/* Allocate new instrumentation structure(s) */
@@ -31,15 +34,17 @@ InstrAlloc(int n, int instrument_options)
/* initialize all fields to zeroes, then modify as needed */
instr = palloc0(n * sizeof(Instrumentation));
- if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER))
+ if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER | INSTRUMENT_WAL))
{
bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
bool need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
int i;
for (i = 0; i < n; i++)
{
instr[i].need_bufusage = need_buffers;
+ instr[i].need_walusage = need_wal;
instr[i].need_timer = need_timer;
}
}
@@ -53,6 +58,7 @@ InstrInit(Instrumentation *instr, int instrument_options)
{
memset(instr, 0, sizeof(Instrumentation));
instr->need_bufusage = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ instr->need_walusage = (instrument_options & INSTRUMENT_WAL) != 0;
instr->need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
}
@@ -67,6 +73,9 @@ InstrStartNode(Instrumentation *instr)
/* save buffer usage totals at node entry, if needed */
if (instr->need_bufusage)
instr->bufusage_start = pgBufferUsage;
+
+ if (instr->need_walusage)
+ instr->walusage_start = pgWalUsage;
}
/* Exit from a plan node */
@@ -95,6 +104,10 @@ InstrStopNode(Instrumentation *instr, double nTuples)
BufferUsageAccumDiff(&instr->bufusage,
&pgBufferUsage, &instr->bufusage_start);
+ if (instr->need_walusage)
+ WalUsageAccumDiff(&instr->walusage,
+ &pgWalUsage, &instr->walusage_start);
+
/* Is this the first tuple of this cycle? */
if (!instr->running)
{
@@ -158,6 +171,9 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
/* Add delta of buffer usage since entry to node's totals */
if (dst->need_bufusage)
BufferUsageAdd(&dst->bufusage, &add->bufusage);
+
+ if (dst->need_walusage)
+ WalUsageAdd(&dst->walusage, &add->walusage);
}
/* note current values during parallel executor startup */
@@ -165,21 +181,25 @@ void
InstrStartParallelQuery(void)
{
save_pgBufferUsage = pgBufferUsage;
+ save_pgWalUsage = pgWalUsage;
}
/* report usage after parallel executor shutdown */
void
-InstrEndParallelQuery(BufferUsage *result)
+InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- memset(result, 0, sizeof(BufferUsage));
- BufferUsageAccumDiff(result, &pgBufferUsage, &save_pgBufferUsage);
+ memset(bufusage, 0, sizeof(BufferUsage));
+ memset(walusage, 0, sizeof(WalUsage));
+ BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
+ WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
}
/* accumulate work done by workers in leader's stats */
void
-InstrAccumParallelQuery(BufferUsage *result)
+InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- BufferUsageAdd(&pgBufferUsage, result);
+ BufferUsageAdd(&pgBufferUsage, bufusage);
+ WalUsageAdd(&pgWalUsage, walusage);
}
/* dst += add */
@@ -221,3 +241,20 @@ BufferUsageAccumDiff(BufferUsage *dst,
INSTR_TIME_ACCUM_DIFF(dst->blk_write_time,
add->blk_write_time, sub->blk_write_time);
}
+
+/* helper functions for WAL usage accumulation */
+static void
+WalUsageAdd(WalUsage *dst, WalUsage *add)
+{
+ dst->wal_bytes += add->wal_bytes;
+ dst->wal_records += add->wal_records;
+ dst->wal_num_fpw += add->wal_num_fpw;
+}
+
+void
+WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
+{
+ dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
+ dst->wal_records += add->wal_records - sub->wal_records;
+ dst->wal_num_fpw += add->wal_num_fpw - sub->wal_num_fpw;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 9ec7b31cce..b91e724b2d 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -259,7 +259,8 @@ struct XLogRecData;
extern XLogRecPtr XLogInsertRecord(struct XLogRecData *rdata,
XLogRecPtr fpw_lsn,
- uint8 flags);
+ uint8 flags,
+ int num_fpw);
extern void XLogFlush(XLogRecPtr RecPtr);
extern bool XLogBackgroundFlush(void);
extern bool XLogNeedsFlush(XLogRecPtr RecPtr);
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf020..1cc5b524fd 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -26,6 +26,7 @@ typedef struct ParallelExecutorInfo
PlanState *planstate; /* plan subtree we're running in parallel */
ParallelContext *pcxt; /* parallel context we're using */
BufferUsage *buffer_usage; /* points to bufusage area in DSM */
+ WalUsage *wal_usage; /* walusage area in DSM */
SharedExecutorInstrumentation *instrumentation; /* optional */
struct SharedJitInstrumentation *jit_instrumentation; /* optional */
dsa_area *area; /* points to DSA area in DSM */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 3825a5ac1f..e8875a8e9b 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -32,12 +32,20 @@ typedef struct BufferUsage
instr_time blk_write_time; /* time spent writing */
} BufferUsage;
+typedef struct WalUsage
+{
+ long wal_records; /* # of WAL records produced */
+ long wal_num_fpw; /* # of WAL full page images produced */
+ uint64 wal_bytes; /* size of WAL records produced */
+} WalUsage;
+
/* Flag bits included in InstrAlloc's instrument_options bitmask */
typedef enum InstrumentOption
{
INSTRUMENT_TIMER = 1 << 0, /* needs timer (and row counts) */
INSTRUMENT_BUFFERS = 1 << 1, /* needs buffer usage */
INSTRUMENT_ROWS = 1 << 2, /* needs row count */
+ INSTRUMENT_WAL = 1 << 3, /* needs WAL usage */
INSTRUMENT_ALL = PG_INT32_MAX
} InstrumentOption;
@@ -46,6 +54,7 @@ typedef struct Instrumentation
/* Parameters set at node creation: */
bool need_timer; /* true if we need timer data */
bool need_bufusage; /* true if we need buffer usage data */
+ bool need_walusage; /* true if we need WAL usage data */
/* Info about current plan cycle: */
bool running; /* true if we've completed first tuple */
instr_time starttime; /* start time of current iteration of node */
@@ -53,6 +62,7 @@ typedef struct Instrumentation
double firsttuple; /* time for first tuple of this cycle */
double tuplecount; /* # of tuples emitted so far this cycle */
BufferUsage bufusage_start; /* buffer usage at start */
+ WalUsage walusage_start; /* WAL usage at start */
/* Accumulated statistics across all completed cycles: */
double startup; /* total startup time (in seconds) */
double total; /* total time (in seconds) */
@@ -62,6 +72,7 @@ typedef struct Instrumentation
double nfiltered1; /* # of tuples removed by scanqual or joinqual */
double nfiltered2; /* # of tuples removed by "other" quals */
BufferUsage bufusage; /* total buffer usage */
+ WalUsage walusage; /* total WAL usage */
} Instrumentation;
typedef struct WorkerInstrumentation
@@ -71,6 +82,7 @@ typedef struct WorkerInstrumentation
} WorkerInstrumentation;
extern PGDLLIMPORT BufferUsage pgBufferUsage;
+extern PGDLLIMPORT WalUsage pgWalUsage;
extern Instrumentation *InstrAlloc(int n, int instrument_options);
extern void InstrInit(Instrumentation *instr, int instrument_options);
@@ -79,9 +91,11 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
-extern void InstrEndParallelQuery(BufferUsage *result);
-extern void InstrAccumParallelQuery(BufferUsage *result);
+extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
+extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
extern void BufferUsageAccumDiff(BufferUsage *dst,
const BufferUsage *add, const BufferUsage *sub);
+extern void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add,
+ const WalUsage *sub);
#endif /* INSTRUMENT_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 587b040532..64ae983661 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2643,6 +2643,7 @@ WalSndCtlData
WalSndSendDataCallback
WalSndState
WalTimeSample
+WalUsage
WalWriteMethod
Walfile
WindowAgg
--
2.20.1
v10-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-au.patch (text/plain; charset=us-ascii)
From 7405d9f06e7b5ddafa8e09244e1ebab058d7713b Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Sun, 29 Mar 2020 12:38:14 +0200
Subject: [PATCH v10 4/6] Add option to report WAL usage in EXPLAIN and
auto_explain.
Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 15 ++++++
doc/src/sgml/auto-explain.sgml | 20 ++++++++
doc/src/sgml/ref/explain.sgml | 14 ++++++
src/backend/commands/explain.c | 74 +++++++++++++++++++++++++++--
src/bin/psql/tab-complete.c | 4 +-
src/include/commands/explain.h | 3 ++
6 files changed, 124 insertions(+), 6 deletions(-)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index f69dde876c..56c549d84c 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -27,6 +27,7 @@ static int auto_explain_log_min_duration = -1; /* msec or -1 */
static bool auto_explain_log_analyze = false;
static bool auto_explain_log_verbose = false;
static bool auto_explain_log_buffers = false;
+static bool auto_explain_log_wal = false;
static bool auto_explain_log_triggers = false;
static bool auto_explain_log_timing = true;
static bool auto_explain_log_settings = false;
@@ -148,6 +149,17 @@ _PG_init(void)
NULL,
NULL);
+ DefineCustomBoolVariable("auto_explain.log_wal",
+ "Log WAL usage.",
+ NULL,
+ &auto_explain_log_wal,
+ false,
+ PGC_SUSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
+
DefineCustomBoolVariable("auto_explain.log_triggers",
"Include trigger statistics in plans.",
"This has no effect unless log_analyze is also set.",
@@ -280,6 +292,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
queryDesc->instrument_options |= INSTRUMENT_ROWS;
if (auto_explain_log_buffers)
queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (auto_explain_log_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
}
}
@@ -374,6 +388,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
es->analyze = (queryDesc->instrument_options && auto_explain_log_analyze);
es->verbose = auto_explain_log_verbose;
es->buffers = (es->analyze && auto_explain_log_buffers);
+ es->wal = (es->analyze && auto_explain_log_wal);
es->timing = (es->analyze && auto_explain_log_timing);
es->summary = es->analyze;
es->format = auto_explain_log_format;
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index 3d619d4a3d..d4d29c4a64 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -109,6 +109,26 @@ LOAD 'auto_explain';
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>
+ <varname>auto_explain.log_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>auto_explain.log_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ <varname>auto_explain.log_wal</varname> controls whether WAL
+ usage statistics are printed when an execution plan is logged; it's
+ equivalent to the <literal>WAL</literal> option of <command>EXPLAIN</command>.
+ This parameter has no effect
+ unless <varname>auto_explain.log_analyze</varname> is enabled.
+ This parameter is off by default.
+ Only superusers can change this setting.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term>
<varname>auto_explain.log_timing</varname> (<type>boolean</type>)
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 385d10411f..e4661232b2 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -41,6 +41,7 @@ EXPLAIN [ ANALYZE ] [ VERBOSE ] <replaceable class="parameter">statement</replac
COSTS [ <replaceable class="parameter">boolean</replaceable> ]
SETTINGS [ <replaceable class="parameter">boolean</replaceable> ]
BUFFERS [ <replaceable class="parameter">boolean</replaceable> ]
+ WAL [ <replaceable class="parameter">boolean</replaceable> ]
TIMING [ <replaceable class="parameter">boolean</replaceable> ]
SUMMARY [ <replaceable class="parameter">boolean</replaceable> ]
FORMAT { TEXT | XML | JSON | YAML }
@@ -192,6 +193,19 @@ ROLLBACK;
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>WAL</literal></term>
+ <listitem>
+ <para>
+ Include information on WAL record generation. Specifically, include the
+ number of records, full page records and bytes generated. In text
+ format, only non-zero values are printed. This parameter may only be
+ used when <literal>ANALYZE</literal> is also enabled. It defaults to
+ <literal>FALSE</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>TIMING</literal></term>
<listitem>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ee0e638f33..b05b55979b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -113,6 +113,7 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
+static void show_wal_usage(ExplainState *es, const WalUsage *usage);
static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
ExplainState *es);
static void ExplainScanTarget(Scan *plan, ExplainState *es);
@@ -175,6 +176,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
es->costs = defGetBoolean(opt);
else if (strcmp(opt->defname, "buffers") == 0)
es->buffers = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "wal") == 0)
+ es->wal = defGetBoolean(opt);
else if (strcmp(opt->defname, "settings") == 0)
es->settings = defGetBoolean(opt);
else if (strcmp(opt->defname, "timing") == 0)
@@ -219,6 +222,11 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("EXPLAIN option BUFFERS requires ANALYZE")));
+ if (es->wal && !es->analyze)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("EXPLAIN option WAL requires ANALYZE")));
+
/* if the timing was not set explicitly, set default value */
es->timing = (timing_set) ? es->timing : es->analyze;
@@ -494,6 +502,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
if (es->buffers)
instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
/*
* We always collect timing for the entire statement, even when node-level
@@ -1942,12 +1952,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
}
- /* Show buffer usage */
+ /* Show buffer/WAL usage */
if (es->buffers && planstate->instrument)
show_buffer_usage(es, &planstate->instrument->bufusage);
+ if (es->wal && planstate->instrument)
+ show_wal_usage(es, &planstate->instrument->walusage);
- /* Prepare per-worker buffer usage */
- if (es->workers_state && es->buffers && es->verbose)
+ /* Prepare per-worker buffer/WAL usage */
+ if (es->workers_state && (es->buffers || es->wal) && es->verbose)
{
WorkerInstrumentation *w = planstate->worker_instrument;
@@ -1960,7 +1972,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
continue;
ExplainOpenWorker(n, es);
- show_buffer_usage(es, &instrument->bufusage);
+ if (es->buffers)
+ show_buffer_usage(es, &instrument->bufusage);
+ if (es->wal)
+ show_wal_usage(es, &instrument->walusage);
ExplainCloseWorker(n, es);
}
}
@@ -3059,6 +3074,44 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
}
}
+/*
+ * Show WAL usage details.
+ */
+static void
+show_wal_usage(ExplainState *es, const WalUsage *usage)
+{
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ /* Show only positive counter values. */
+ if ((usage->wal_records > 0) || (usage->wal_num_fpw > 0) ||
+ (usage->wal_bytes > 0))
+ {
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "WAL:");
+
+ if (usage->wal_records > 0)
+ appendStringInfo(es->str, " records=%ld",
+ usage->wal_records);
+ if (usage->wal_num_fpw > 0)
+ appendStringInfo(es->str, " full page records=%ld",
+ usage->wal_num_fpw);
+ if (usage->wal_bytes > 0)
+ appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
+ usage->wal_bytes);
+ appendStringInfoChar(es->str, '\n');
+ }
+ }
+ else
+ {
+ ExplainPropertyInteger("WAL records", NULL,
+ usage->wal_records, es);
+ ExplainPropertyInteger("WAL full page records", NULL,
+ usage->wal_num_fpw, es);
+ ExplainPropertyUInteger("WAL bytes", NULL,
+ usage->wal_bytes, es);
+ }
+}
+
/*
* Add some additional details about an IndexScan or IndexOnlyScan
*/
@@ -3843,6 +3896,19 @@ ExplainPropertyInteger(const char *qlabel, const char *unit, int64 value,
ExplainProperty(qlabel, unit, buf, true, es);
}
+/*
+ * Explain an unsigned integer-valued property.
+ */
+void
+ExplainPropertyUInteger(const char *qlabel, const char *unit, uint64 value,
+ ExplainState *es)
+{
+ char buf[32];
+
+ snprintf(buf, sizeof(buf), UINT64_FORMAT, value);
+ ExplainProperty(qlabel, unit, buf, true, es);
+}
+
/*
* Explain a float-valued property, using the specified number of
* fractional digits.
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5fec59723c..0e7a373caf 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3045,8 +3045,8 @@ psql_completion(const char *text, int start, int end)
*/
if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
COMPLETE_WITH("ANALYZE", "VERBOSE", "COSTS", "SETTINGS",
- "BUFFERS", "TIMING", "SUMMARY", "FORMAT");
- else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|BUFFERS|TIMING|SUMMARY"))
+ "BUFFERS", "WAL", "TIMING", "SUMMARY", "FORMAT");
+ else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|BUFFERS|WAL|TIMING|SUMMARY"))
COMPLETE_WITH("ON", "OFF");
else if (TailMatches("FORMAT"))
COMPLETE_WITH("TEXT", "XML", "JSON", "YAML");
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 54f6240e5e..7b0b0a94a6 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -42,6 +42,7 @@ typedef struct ExplainState
bool analyze; /* print actual times */
bool costs; /* print estimated costs */
bool buffers; /* print buffer usage */
+ bool wal; /* print WAL usage */
bool timing; /* print detailed node timing */
bool summary; /* print total planning and execution timing */
bool settings; /* print modified settings */
@@ -110,6 +111,8 @@ extern void ExplainPropertyText(const char *qlabel, const char *value,
ExplainState *es);
extern void ExplainPropertyInteger(const char *qlabel, const char *unit,
int64 value, ExplainState *es);
+extern void ExplainPropertyUInteger(const char *qlabel, const char *unit,
+ uint64 value, ExplainState *es);
extern void ExplainPropertyFloat(const char *qlabel, const char *unit,
double value, int ndigits, ExplainState *es);
extern void ExplainPropertyBool(const char *qlabel, bool value,
--
2.20.1
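
For readers who want to see the text-format rule from show_wal_usage() in isolation (print the "WAL:" line only if at least one counter is positive, and list only the positive counters), here is a rough standalone sketch. It uses plain snprintf-style formatting instead of StringInfo; format_wal_usage() and append() are hypothetical names made up for this example and do not exist in the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for the WalUsage struct introduced by the patch. */
typedef struct WalUsage
{
    long     wal_records;
    long     wal_num_fpw;
    uint64_t wal_bytes;
} WalUsage;

/* Hypothetical helper: append formatted text, truncating safely at buflen. */
static void
append(char *buf, size_t buflen, size_t *len, const char *fmt, ...)
{
    va_list ap;
    int     n;

    if (*len >= buflen)
        return;
    va_start(ap, fmt);
    n = vsnprintf(buf + *len, buflen - *len, fmt, ap);
    va_end(ap);
    if (n > 0)
        *len += (size_t) n;
    if (*len >= buflen)
        *len = buflen - 1;      /* output was truncated */
}

/*
 * Format WAL usage the way the text EXPLAIN format does in the patch:
 * nothing at all when every counter is zero, otherwise one line listing
 * only the non-zero counters.  Returns the length of the text written.
 */
static size_t
format_wal_usage(char *buf, size_t buflen, const WalUsage *usage)
{
    size_t len = 0;

    assert(buf != NULL && buflen > 0 && usage != NULL);
    buf[0] = '\0';

    if (usage->wal_records <= 0 && usage->wal_num_fpw <= 0 &&
        usage->wal_bytes == 0)
        return 0;

    append(buf, buflen, &len, "WAL:");
    if (usage->wal_records > 0)
        append(buf, buflen, &len, " records=%ld", usage->wal_records);
    if (usage->wal_num_fpw > 0)
        append(buf, buflen, &len, " full page records=%ld", usage->wal_num_fpw);
    if (usage->wal_bytes > 0)
        append(buf, buflen, &len, " bytes=%" PRIu64, usage->wal_bytes);
    append(buf, buflen, &len, "\n");

    return strlen(buf);
}
```

The structured formats (XML, JSON, YAML) in the patch take the other branch and always emit all three properties, zero or not, so that consumers see a stable set of keys.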
v10-0005-Keep-track-of-WAL-usage-in-pg_stat_statements.patch (text/plain; charset=us-ascii)
From 9d12c05b181417f30fcc495ccd34f057e9e23bea Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:42:02 +0100
Subject: [PATCH v10 5/6] Keep track of WAL usage in pg_stat_statements.
Author: Kirill Bychik
Reviewed-by: Julien Rouhaud, Fuji Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
contrib/pg_stat_statements/Makefile | 3 +-
.../expected/pg_stat_statements.out | 74 ++++++++++++++-----
.../pg_stat_statements--1.7--1.8.sql | 50 +++++++++++++
.../pg_stat_statements/pg_stat_statements.c | 61 ++++++++++++++-
.../pg_stat_statements.control | 2 +-
.../sql/pg_stat_statements.sql | 29 +++++++-
doc/src/sgml/pgstatstatements.sgml | 27 +++++++
7 files changed, 219 insertions(+), 27 deletions(-)
create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 80042a0905..081f997d70 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,8 @@ OBJS = \
pg_stat_statements.o
EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.6--1.7.sql \
+DATA = pg_stat_statements--1.4.sql \
+ pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 6787ec1efd..1a3ac2af12 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -195,20 +195,56 @@ SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
3 | c
(8 rows)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
--------------------------------------------------------------+-------+------
- DELETE FROM test WHERE a > $1 | 1 | 1
- INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3
- INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10
- SELECT * FROM test ORDER BY a | 1 | 12
- SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4
- SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8
- SELECT pg_stat_statements_reset() | 1 | 1
- UPDATE test SET b = $1 WHERE a = $2 | 6 | 6
- UPDATE test SET b = $1 WHERE a > $2 | 1 | 3
+-- SELECT check WAL usage stats to confirm temp tables do not get stored in WAL
+SELECT query, calls, rows, wal_bytes, wal_records, wal_num_fpw
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes | wal_records | wal_num_fpw
+-------------------------------------------------------------+-------+------+-----------+-------------+-------------
+ DELETE FROM test WHERE a > $1 | 1 | 1 | 0 | 0 | 0
+ INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | 0 | 0 | 0
+ INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10 | 0 | 0 | 0
+ SELECT * FROM test ORDER BY a | 1 | 12 | 0 | 0 | 0
+ SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4 | 0 | 0 | 0
+ SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | 0 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0 | 0
+ UPDATE test SET b = $1 WHERE a = $2 | 6 | 6 | 0 | 0 | 0
+ UPDATE test SET b = $1 WHERE a > $2 | 1 | 3 | 0 | 0 | 0
(9 rows)
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset
+--------------------------
+
+(1 row)
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_as_rows
+-----------------------------------------------------------+-------+------+---------------------+-----------------------+---------------------
+ DELETE FROM pgss_test WHERE a > $1 | 1 | 1 | t | t | t
+ DROP TABLE pgss_test | 1 | 0 | t | t | f
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | 1 | 10 | t | t | t
+ SELECT pg_stat_statements_reset() | 1 | 1 | f | f | f
+ SET pg_stat_statements.track_utility = FALSE | 1 | 0 | f | f | t
+ UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 | 3 | t | t | t
+(6 rows)
+
--
-- pg_stat_statements.track = none
--
@@ -287,13 +323,13 @@ SELECT PLUS_ONE(10);
11
(1 row)
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
------------------------------------+-------+------
- SELECT $1::TEXT | 1 | 1
- SELECT PLUS_ONE($1) | 2 | 2
- SELECT PLUS_TWO($1) | 2 | 2
- SELECT pg_stat_statements_reset() | 1 | 1
+SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes | wal_records
+-----------------------------------+-------+------+-----------+-------------
+ SELECT $1::TEXT | 1 | 1 | 0 | 0
+ SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0
+ SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
(4 rows)
--
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
new file mode 100644
index 0000000000..21ca4726c6
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -0,0 +1,50 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.8'" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+ OUT userid oid,
+ OUT dbid oid,
+ OUT queryid bigint,
+ OUT query text,
+ OUT calls int8,
+ OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT mean_time float8,
+ OUT stddev_time float8,
+ OUT rows int8,
+ OUT shared_blks_hit int8,
+ OUT shared_blks_read int8,
+ OUT shared_blks_dirtied int8,
+ OUT shared_blks_written int8,
+ OUT local_blks_hit int8,
+ OUT local_blks_read int8,
+ OUT local_blks_dirtied int8,
+ OUT local_blks_written int8,
+ OUT temp_blks_read int8,
+ OUT temp_blks_written int8,
+ OUT blk_read_time float8,
+ OUT blk_write_time float8,
+ OUT wal_bytes numeric,
+ OUT wal_records int8,
+ OUT wal_num_fpw int8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4'
+LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
+
+CREATE VIEW pg_stat_statements AS
+ SELECT * FROM pg_stat_statements(true);
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 50c345378d..295729625b 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -120,7 +120,8 @@ typedef enum pgssVersion
PGSS_V1_0 = 0,
PGSS_V1_1,
PGSS_V1_2,
- PGSS_V1_3
+ PGSS_V1_3,
+ PGSS_V1_4
} pgssVersion;
/*
@@ -161,6 +162,9 @@ typedef struct Counters
int64 local_blks_written; /* # of local disk blocks written */
int64 temp_blks_read; /* # of temp blocks read */
int64 temp_blks_written; /* # of temp blocks written */
+ uint64 wal_bytes; /* total amount of WAL bytes written */
+ int64 wal_records; /* # of WAL records written */
+ int64 wal_num_fpw; /* # of WAL full page images written */
double blk_read_time; /* time spent reading, in msec */
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
@@ -293,6 +297,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
PG_FUNCTION_INFO_V1(pg_stat_statements_1_3);
+PG_FUNCTION_INFO_V1(pg_stat_statements_1_4);
PG_FUNCTION_INFO_V1(pg_stat_statements);
static void pgss_shmem_startup(void);
@@ -307,12 +312,13 @@ static void pgss_ExecutorEnd(QueryDesc *queryDesc);
static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
ProcessUtilityContext context, ParamListInfo params,
QueryEnvironment *queryEnv,
- DestReceiver *dest, QueryCompletion *qc);
+ DestReceiver *dest, QueryCompletion * qc);
static uint64 pgss_hash_string(const char *str, int len);
static void pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate);
static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
pgssVersion api_version,
@@ -841,6 +847,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
0,
0,
NULL,
+ NULL,
&jstate);
}
@@ -944,6 +951,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
queryDesc->totaltime->total * 1000.0, /* convert to msec */
queryDesc->estate->es_processed,
&queryDesc->totaltime->bufusage,
+ &queryDesc->totaltime->walusage,
NULL);
}
@@ -989,7 +997,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
BufferUsage bufusage_start,
bufusage;
+ WalUsage walusage_start,
+ walusage;
+
bufusage_start = pgBufferUsage;
+ walusage_start = pgWalUsage;
INSTR_TIME_SET_CURRENT(start);
nested_level++;
@@ -1019,6 +1031,9 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
memset(&bufusage, 0, sizeof(BufferUsage));
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
pgss_store(queryString,
0, /* signal that it's a utility stmt */
pstmt->stmt_location,
@@ -1026,6 +1041,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
INSTR_TIME_GET_MILLISEC(duration),
rows,
&bufusage,
+ &walusage,
NULL);
}
else
@@ -1061,13 +1077,14 @@ pgss_hash_string(const char *str, int len)
*
* If jstate is not NULL then we're trying to create an entry for which
* we have no statistics as yet; we just want to record the normalized
- * query string. total_time, rows, bufusage are ignored in this case.
+ * query string. Time and usage are ignored in this case.
*/
static void
pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate)
{
pgssHashKey key;
@@ -1259,6 +1276,9 @@ pgss_store(const char *query, uint64 queryId,
e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
+ e->counters.wal_bytes += walusage->wal_bytes;
+ e->counters.wal_records += walusage->wal_records;
+ e->counters.wal_num_fpw += walusage->wal_num_fpw;
SpinLockRelease(&e->mutex);
}
@@ -1306,7 +1326,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
#define PG_STAT_STATEMENTS_COLS_V1_1 18
#define PG_STAT_STATEMENTS_COLS_V1_2 19
#define PG_STAT_STATEMENTS_COLS_V1_3 23
-#define PG_STAT_STATEMENTS_COLS 23 /* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_4 26
+#define PG_STAT_STATEMENTS_COLS 26 /* maximum of above */
/*
* Retrieve statement statistics.
@@ -1318,6 +1339,16 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
* expected API version is identified by embedding it in the C name of the
* function. Unfortunately we weren't bright enough to do that for 1.1.
*/
+Datum
+pg_stat_statements_1_4(PG_FUNCTION_ARGS)
+{
+ bool showtext = PG_GETARG_BOOL(0);
+
+ pg_stat_statements_internal(fcinfo, PGSS_V1_4, showtext);
+
+ return (Datum) 0;
+}
+
Datum
pg_stat_statements_1_3(PG_FUNCTION_ARGS)
{
@@ -1423,6 +1454,10 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
if (api_version != PGSS_V1_3)
elog(ERROR, "incorrect number of output arguments");
break;
+ case PG_STAT_STATEMENTS_COLS_V1_4:
+ if (api_version != PGSS_V1_4)
+ elog(ERROR, "incorrect number of output arguments");
+ break;
default:
elog(ERROR, "incorrect number of output arguments");
}
@@ -1619,11 +1654,29 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
values[i++] = Float8GetDatumFast(tmp.blk_read_time);
values[i++] = Float8GetDatumFast(tmp.blk_write_time);
}
+ if (api_version >= PGSS_V1_4)
+ {
+ char buf[256];
+ Datum wal_bytes;
+
+ snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
+
+ /* Convert to numeric. */
+ wal_bytes = DirectFunctionCall3(numeric_in,
+ CStringGetDatum(buf),
+ ObjectIdGetDatum(0),
+ Int32GetDatum(-1));
+
+ values[i++] = wal_bytes;
+ values[i++] = Int64GetDatumFast(tmp.wal_records);
+ values[i++] = Int64GetDatumFast(tmp.wal_num_fpw);
+ }
Assert(i == (api_version == PGSS_V1_0 ? PG_STAT_STATEMENTS_COLS_V1_0 :
api_version == PGSS_V1_1 ? PG_STAT_STATEMENTS_COLS_V1_1 :
api_version == PGSS_V1_2 ? PG_STAT_STATEMENTS_COLS_V1_2 :
api_version == PGSS_V1_3 ? PG_STAT_STATEMENTS_COLS_V1_3 :
+ api_version == PGSS_V1_4 ? PG_STAT_STATEMENTS_COLS_V1_4 :
-1 /* fail if you forget to update this assert */ ));
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 14cb422354..7fb20df886 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
# pg_stat_statements extension
comment = 'track execution statistics of all SQL statements executed'
-default_version = '1.7'
+default_version = '1.8'
module_pathname = '$libdir/pg_stat_statements'
relocatable = true
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 8b527070d4..6e1c80e05e 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -99,7 +99,32 @@ SELECT * FROM test ORDER BY a;
-- SELECT with IN clause
SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+-- SELECT check WAL usage stats to confirm temp tables do not get stored in WAL
+SELECT query, calls, rows, wal_bytes, wal_records, wal_num_fpw
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+
+-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- pg_stat_statements.track = none
@@ -144,7 +169,7 @@ $$ SELECT (i + 1.0)::INTEGER LIMIT 1 $$ LANGUAGE SQL;
SELECT PLUS_ONE(8);
SELECT PLUS_ONE(10);
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C";
--
-- pg_stat_statements.track = all
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 26bb82da4a..2ccaa30846 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -221,6 +221,33 @@
</entry>
</row>
+ <row>
+ <entry><structfield>wal_bytes</structfield></entry>
+ <entry><type>numeric</type></entry>
+ <entry></entry>
+ <entry>
+ Total amount of WAL bytes generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL records generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_num_fpw</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL full page writes generated by the statement
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
--
2.20.1
v10-0006-Expose-WAL-usage-counters-in-verbose-auto-vacuum.patch (text/plain; charset=us-ascii)
From 29ae918310dc31bf28fc28e4e0e1546d7a406fdc Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 19 Mar 2020 16:08:47 +0100
Subject: [PATCH v10 6/6] Expose WAL usage counters in verbose (auto)vacuum
output.
Author: Julien Rouhaud
Reviewed-by: Fuji Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 24 +++++++++++++++++++++++-
src/backend/commands/explain.c | 2 +-
2 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cc7e8521a5..735087dd74 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -410,6 +410,8 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
int nindexes;
PGRUsage ru0;
TimestampTz starttime = 0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
long secs;
int usecs;
double read_rate,
@@ -614,6 +616,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
TimestampDifference(starttime, endtime, &secs, &usecs);
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
read_rate = 0;
write_rate = 0;
if ((secs > 0) || (usecs > 0))
@@ -666,7 +671,13 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
(long long) VacuumPageDirty);
appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
read_rate, write_rate);
- appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf,
+ _("WAL usage: %ld records, %ld full page writes, "
+ UINT64_FORMAT " bytes"),
+ walusage.wal_records,
+ walusage.wal_num_fpw,
+ walusage.wal_bytes);
ereport(LOG,
(errmsg_internal("%s", buf.data)));
@@ -758,6 +769,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
IndexBulkDeleteResult **indstats;
int i;
PGRUsage ru0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
Buffer vmbuffer = InvalidBuffer;
BlockNumber next_unskippable_block;
bool skipping_blocks;
@@ -1727,6 +1740,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
"%u pages are entirely empty.\n",
empty_pages),
empty_pages);
+
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+ appendStringInfo(&buf, _("%ld WAL records, %ld WAL full page writes, "
+ UINT64_FORMAT " WAL bytes\n"),
+ walusage.wal_records,
+ walusage.wal_num_fpw,
+ walusage.wal_bytes);
+
appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
ereport(elevel,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index b05b55979b..f7f1c3efce 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3105,7 +3105,7 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
{
ExplainPropertyInteger("WAL records", NULL,
usage->wal_records, es);
- ExplainPropertyInteger("WAL full page records", NULL,
+ ExplainPropertyInteger("WAL full page writes", NULL,
usage->wal_num_fpw, es);
ExplainPropertyUInteger("WAL bytes", NULL,
usage->wal_bytes, es);
--
2.20.1
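The snapshot-and-subtract pattern used in the vacuum patch above (copy pgWalUsage at the start, then call WalUsageAccumDiff at the end) can be sketched in standalone C. This is only an illustration under stated assumptions: WalUsageSketch, wal_usage_accum_diff and format_wal_usage are hypothetical stand-ins rather than the patch's code, and the <inttypes.h> format macros are used here only to keep the sketch portable.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the patch's WalUsage struct. The field names
 * follow the patch; the type name and helpers are assumptions.
 */
typedef struct WalUsageSketch
{
	int64_t		wal_records;	/* # of WAL records produced */
	int64_t		wal_num_fpw;	/* # of WAL full page writes produced */
	uint64_t	wal_bytes;		/* amount of WAL bytes produced */
} WalUsageSketch;

/* dst += (now - start), field by field; counters are assumed to only grow. */
static void
wal_usage_accum_diff(WalUsageSketch *dst, const WalUsageSketch *now,
					 const WalUsageSketch *start)
{
	assert(now->wal_records >= start->wal_records);
	assert(now->wal_num_fpw >= start->wal_num_fpw);
	assert(now->wal_bytes >= start->wal_bytes);

	dst->wal_records += now->wal_records - start->wal_records;
	dst->wal_num_fpw += now->wal_num_fpw - start->wal_num_fpw;
	dst->wal_bytes += now->wal_bytes - start->wal_bytes;
}

/* Format a delta in the style of the verbose vacuum line; returns snprintf's result. */
static int
format_wal_usage(char *buf, size_t len, const WalUsageSketch *u)
{
	return snprintf(buf, len,
					"WAL usage: %" PRId64 " records, %" PRId64
					" full page writes, %" PRIu64 " bytes",
					u->wal_records, u->wal_num_fpw, u->wal_bytes);
}
```

A caller would take a copy of the running counters before the operation, run it, and then accumulate the difference into a zeroed struct before formatting it.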
Adding Peter G.
On Wed, Apr 1, 2020 at 12:41 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have done some testing for the parallel "create index".
postgres[99536]=# show maintenance_work_mem ;
maintenance_work_mem
----------------------
1MB
(1 row)
CREATE TABLE test (a int, b int);
INSERT INTO test SELECT i, i FROM GENERATE_SERIES(1,2000000) as i;
CREATE INDEX idx1 on test(a);
select query, total_time, shared_blks_hit, shared_blks_read,
shared_blks_hit + shared_blks_read as total_read_blks,
shared_blks_dirtied, shared_blks_written from pg_stat_statements where
query like 'CREATE INDEX%';
SET max_parallel_maintenance_workers TO 0;
            query             |     total_time     | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
------------------------------+--------------------+-----------------+------------------+-----------------+---------------------+---------------------
 CREATE INDEX idx1 on test(a) | 1947.4959979999999 |            8947 |               11 |            8958 |                   5 |                   0
SET max_parallel_maintenance_workers TO 2;
            query             |     total_time     | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
------------------------------+--------------------+-----------------+------------------+-----------------+---------------------+---------------------
 CREATE INDEX idx1 on test(a) | 1942.1426040000001 |            8960 |               14 |            8974 |                   5 |                   0
(1 row)
I have noticed that total_read_blks is higher with parallel create
index than with the non-parallel one. I created a fresh database
before each run. I am not very familiar with the internals of
parallel create index, so I am not sure whether it is expected to
read extra blocks in the parallel case. My guess is that, because
multiple workers are inserting into the btree, they might need
to visit some btree nodes multiple times while traversing the tree
down. But it would be better if someone who knows this code well
could confirm this.
Peter, Is this behavior expected?
Let me summarize the situation so that it would be easier for Peter to
comment. Julien has noticed that parallel vacuum and parallel create
index don't seem to report correct values for buffer usage stats.
Sawada-San wrote a patch to fix the problem for both cases. We
expect that 'total_read_blks' as reported in pg_stat_statements should
give the same value for parallel and non-parallel operations. We see
that this holds for parallel vacuum, and we previously made the same
observation for parallel query. For parallel create index, however,
this doesn't seem to be true, as Dilip's test results show. We
have two possibilities here: (a) there is some bug in Sawada-San's
patch, or (b) this is expected behavior for parallel create index.
What do you think?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 1, 2020 at 7:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Peter, Is this behavior expected?
Let me summarize the situation so that it would be easier for Peter to
comment. Julien has noticed that parallel vacuum and parallel create
index doesn't seem to report correct values for buffer usage stats.
Sawada-San wrote a patch to fix the problem for both the cases. We
expect that 'total_read_blks' as reported in pg_stat_statements should
give the same value for parallel and non-parallel operations. We see
that is true for parallel vacuum and previously we have the same
observation for the parallel query. Now, for parallel create index
this doesn't seem to be true as test results by Dilip show that. We
have two possibilities here (a) there is some bug in Sawada-San's
patch or (b) this is expected behavior for parallel create index.
What do you think?
nbtree CREATE INDEX doesn't even go through the buffer manager. The
difference that Dilip showed is probably due to extra catalog accesses
in the two parallel workers -- pg_amproc lookups, and the like. Those
are rather small differences, overall.
Can Dilip demonstrate that the "extra" buffer accesses are
proportionate to the number of workers launched in some constant,
predictable way?
--
Peter Geoghegan
On Thu, Apr 2, 2020 at 8:34 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, Apr 1, 2020 at 7:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Peter, Is this behavior expected?
Let me summarize the situation so that it would be easier for Peter to
comment. Julien has noticed that parallel vacuum and parallel create
index doesn't seem to report correct values for buffer usage stats.
Sawada-San wrote a patch to fix the problem for both the cases. We
expect that 'total_read_blks' as reported in pg_stat_statements should
give the same value for parallel and non-parallel operations. We see
that is true for parallel vacuum and previously we have the same
observation for the parallel query. Now, for parallel create index
this doesn't seem to be true as test results by Dilip show that. We
have two possibilities here (a) there is some bug in Sawada-San's
patch or (b) this is expected behavior for parallel create index.
What do you think?
nbtree CREATE INDEX doesn't even go through the buffer manager.
Thanks for clarifying. So IIUC, it will not go through the buffer
manager for the index pages, but for the heap pages, it will still go
through the buffer manager.
The
difference that Dilip showed is probably due to extra catalog accesses
in the two parallel workers -- pg_amproc lookups, and the like. Those
are rather small differences, overall.
Can Dilip demonstrate that the "extra" buffer accesses are
proportionate to the number of workers launched in some constant,
predictable way?
Okay, I will test this.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 2, 2020 at 9:13 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Apr 2, 2020 at 8:34 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, Apr 1, 2020 at 7:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Peter, Is this behavior expected?
Let me summarize the situation so that it would be easier for Peter to
comment. Julien has noticed that parallel vacuum and parallel create
index doesn't seem to report correct values for buffer usage stats.
Sawada-San wrote a patch to fix the problem for both the cases. We
expect that 'total_read_blks' as reported in pg_stat_statements should
give the same value for parallel and non-parallel operations. We see
that is true for parallel vacuum and previously we have the same
observation for the parallel query. Now, for parallel create index
this doesn't seem to be true as test results by Dilip show that. We
have two possibilities here (a) there is some bug in Sawada-San's
patch or (b) this is expected behavior for parallel create index.
What do you think?
nbtree CREATE INDEX doesn't even go through the buffer manager.
Thanks for clarifying. So IIUC, it will not go through the buffer
manager for the index pages, but for the heap pages, it will still go
through the buffer manager.
The difference that Dilip showed is probably due to extra catalog accesses
in the two parallel workers -- pg_amproc lookups, and the like. Those
are rather small differences, overall.
Can Dilip demonstrate that the "extra" buffer accesses are
proportionate to the number of workers launched in some constant,
predictable way?
Okay, I will test this.
0-worker
            query             | total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
------------------------------+-------------+-----------------+------------------+-----------------+---------------------+---------------------
 CREATE INDEX idx1 on test(a) | 1228.895057 |            8947 |               11 |            8971 |                   5 |                   0
1-worker
            query             | total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
------------------------------+-------------+-----------------+------------------+-----------------+---------------------+---------------------
 CREATE INDEX idx1 on test(a) | 1006.157231 |            8962 |               12 |            8974 |                   5 |                   0
2-workers
            query             | total_time | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
------------------------------+------------+-----------------+------------------+-----------------+---------------------+---------------------
 CREATE INDEX idx1 on test(a) |  949.44663 |            8965 |               12 |            8977 |                   5 |                   0
3-workers
            query             | total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
------------------------------+-------------+-----------------+------------------+-----------------+---------------------+---------------------
 CREATE INDEX idx1 on test(a) | 1037.297196 |            8968 |               12 |            8980 |                   5 |                   0
4-workers
            query             | total_time | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
------------------------------+------------+-----------------+------------------+-----------------+---------------------+---------------------
 CREATE INDEX idx1 on test(a) | 889.332782 |            8971 |               12 |            8983 |                   6 |                   0
You are right, it is increasing with some constant factor.
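As a quick check of the "constant factor" observation, the shared_blks_hit values from the five runs above can be differenced run to run. The following is a minimal sketch: per_worker_increase is a hypothetical helper, and the array simply copies the numbers reported in the results.

```c
#include <assert.h>
#include <stddef.h>

/* shared_blks_hit for 0, 1, 2, 3 and 4 workers, copied from the results above. */
static const long shared_blks_hit[] = {8947, 8962, 8965, 8968, 8971};

/*
 * Store counts[i + 1] - counts[i] into out[i] for every adjacent pair.
 * The caller must provide room for n - 1 results.
 */
static void
per_worker_increase(const long *counts, size_t n, long *out)
{
	assert(n >= 2);
	for (size_t i = 0; i + 1 < n; i++)
		out[i] = counts[i + 1] - counts[i];
}
```

With these inputs the differences are 15, 3, 3 and 3: the first worker adds 15 buffer hits and each additional worker adds 3.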
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote:
3. Doing some testing with and without parallelism to ensure WAL usage
data is correct would be great and if possible, share the results?
I just saw that Dilip did some testing, but just in case here is some
additional testing:
- vacuum, after a truncate, loading 1M rows and an "UPDATE t1 SET id = id"
=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%';
query | calls | wal_bytes | wal_records | wal_num_fpw
------------------------+-------+-----------+-------------+-------------
vacuum (parallel 3) t1 | 1 | 20098962 | 34104 | 2
vacuum (parallel 0) t1 | 1 | 20098962 | 34104 | 2
(2 rows)
- create index, overriding t1's parallel_workers, using the 1M rows just
vacuumed:
=# alter table t1 set (parallel_workers = 2);
ALTER TABLE
=# create index t1_parallel_2 on t1(id);
CREATE INDEX
=# alter table t1 set (parallel_workers = 0);
ALTER TABLE
=# create index t1_parallel_0 on t1(id);
CREATE INDEX
=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
query | calls | wal_bytes | wal_records | wal_num_fpw
--------------------------------------+-------+-----------+-------------+-------------
create index t1_parallel_0 on t1(id) | 1 | 20355540 | 2762 | 2745
create index t1_parallel_2 on t1(id) | 1 | 20406811 | 2762 | 2758
(2 rows)
It all looks good to me.
Here the wal_num_fpw and wal_bytes are different between parallel and
non-parallel versions. Is it due to checkpoint or something else? We
can probably rule out checkpoint by increasing checkpoint_timeout and
other checkpoint related parameters.
5.
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
------------------------------------+-------+------
- SELECT $1::TEXT | 1 | 1
- SELECT PLUS_ONE($1) | 2 | 2
- SELECT PLUS_TWO($1) | 2 | 2
- SELECT pg_stat_statements_reset() | 1 | 1
+SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes | wal_records
+-----------------------------------+-------+------+-----------+-------------
+ SELECT $1::TEXT | 1 | 1 | 0 | 0
+ SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0
+ SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
(4 rows)
Again, I am not sure if these modifications make much sense?
Those are queries that were previously executed. As those are read-only queries,
which are pretty much guaranteed not to cause any WAL activity, I don't see how
it hurts to test at the same time that this is indeed what we record with
pg_stat_statements, just to be safe.
On a similar theory, one could have checked bufferusage stats as well.
The statements are using some expressions, so I don't see any value in
checking all usage data for such statements.
Once again, feel free to drop the extra
wal_* columns from the output if you disagree.
Right now, that particular patch is not getting applied (probably due
to recent commit 17e0328224). Can you rebase it?
v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut
3.
+ if (usage->wal_num_fpw > 0)
+ appendStringInfo(es->str, " full page records=%ld",
+ usage->wal_num_fpw);
+ if (usage->wal_bytes > 0)
+ appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
+ usage->wal_bytes);
Shall we change to 'full page writes' or 'full page image' instead of
full page records?
Indeed, I changed it in the (auto)vacuum output but missed this one. Fixed.
I don't see this change in the patch.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 2, 2020 at 11:07 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Also, I forgot to mention that let's not base this on buffer usage
patch for create index
(v10-0002-Allow-parallel-index-creation-to-accumulate-buff) because as
per recent discussion I am not sure about its usefulness. I think we
can proceed with this patch without
v10-0002-Allow-parallel-index-creation-to-accumulate-buff as well.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,
I'm replying here to all reviews that have been sent, thanks a lot!
On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote:
On Wed, Apr 1, 2020 at 1:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
So here's a v9, rebased on top of the latest versions of Sawada-san's bug fixes
(Amit's v6 for vacuum and Sawada-san's v2 for create index), with all
previously mentioned changes.
Few other comments:
v9-0003-Add-infrastructure-to-track-WAL-usage
1.
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
-
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
Looks like a spurious line removal
Fixed.
2.
+ /* Report a full page imsage constructed for the WAL record */
+ *num_fpw += 1;
Typo. /imsage/image
Ah sorry, I thought I fixed it previously, fixed.
3. Doing some testing with and without parallelism to ensure WAL usage
data is correct would be great and if possible, share the results?
I just saw that Dilip did some testing, but just in case here is some
additional testing:
- vacuum, after a truncate, loading 1M rows and an "UPDATE t1 SET id = id"
=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%';
query | calls | wal_bytes | wal_records | wal_num_fpw
------------------------+-------+-----------+-------------+-------------
vacuum (parallel 3) t1 | 1 | 20098962 | 34104 | 2
vacuum (parallel 0) t1 | 1 | 20098962 | 34104 | 2
(2 rows)
- create index, overriding t1's parallel_workers, using the 1M rows just
vacuumed:
=# alter table t1 set (parallel_workers = 2);
ALTER TABLE
=# create index t1_parallel_2 on t1(id);
CREATE INDEX
=# alter table t1 set (parallel_workers = 0);
ALTER TABLE
=# create index t1_parallel_0 on t1(id);
CREATE INDEX
=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
query | calls | wal_bytes | wal_records | wal_num_fpw
--------------------------------------+-------+-----------+-------------+-------------
create index t1_parallel_0 on t1(id) | 1 | 20355540 | 2762 | 2745
create index t1_parallel_2 on t1(id) | 1 | 20406811 | 2762 | 2758
(2 rows)
It all looks good to me.
v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
4.
+-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_as_rows
+------------------------------------------------------------------+-------+------+---------------------+-----------------------+---------------------
+ DELETE FROM pgss_test WHERE a > $1 | 1 | 1 | t | t | t
+ DROP TABLE pgss_test | 1 | 0 | t | t | f
+ INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | t | t | t
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | 1 | 10 | t | t | t
+ SELECT * FROM pgss_test ORDER BY a | 1 | 12 | f | f | f
+ SELECT * FROM pgss_test WHERE a > $1 ORDER BY a | 2 | 4 | f | f | f
+ SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | f | f | f
+ SELECT pg_stat_statements_reset() | 1 | 1 | f | f | f
+ SET pg_stat_statements.track_utility = FALSE | 1 | 0 | f | f | t
+ UPDATE pgss_test SET b = $1 WHERE a = $2 | 6 | 6 | t | t | t
+ UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 | 3 | t | t | t
+(11 rows)
+
I am not sure if the above tests make much sense as they are just
testing whether WAL is generated for these commands. I understand it
is not easy to make these tests reliable but in that case, we can
think of some simple tests. It seems to me that the difficulty is due
to full_page_writes as that depends on the checkpoint. Can we make
full_page_writes = off for these tests and check some simple
Insert/Update/Delete cases? Alternatively, if you can present the
reason why that is unstable or tricky to write, then we can simply
get rid of these tests because I don't see tests for BufferUsage. Let's
not write tests for the sake of writing them unless they can detect bugs
in the future or meaningfully cover the new code added.
I don't think that we can have any hope of a stable amount of WAL bytes
generated, so testing for a positive number looks sensible to me. Then testing
that each 1-line-write query generates a WAL record also looks sensible, so I
kept this. I realized that Kirill used an existing set of queries that were
previously added to validate the multi queries commands behavior, so there's no
need to have all of them again. I just kept one of each (insert, update,
delete, select) to make sure that we do record WAL activity there, but I don't
think that more can really be done. I still think that this is better than
nothing, but if you disagree feel free to drop those tests.
5.
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
- query | calls | rows
------------------------------------+-------+------
- SELECT $1::TEXT | 1 | 1
- SELECT PLUS_ONE($1) | 2 | 2
- SELECT PLUS_TWO($1) | 2 | 2
- SELECT pg_stat_statements_reset() | 1 | 1
+SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes | wal_records
+-----------------------------------+-------+------+-----------+-------------
+ SELECT $1::TEXT | 1 | 1 | 0 | 0
+ SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0
+ SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0
+ SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0
(4 rows)
Again, I am not sure if these modifications make much sense?
Those are queries that were previously executed. As those are read-only queries,
which are pretty much guaranteed not to cause any WAL activity, I don't see how
it hurts to test at the same time that this is indeed what we record with
pg_stat_statements, just to be safe. Once again, feel free to drop the extra
wal_* columns from the output if you disagree.
6.
static void pgss_shmem_startup(void);
@@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId,
int query_location, int query_len,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage* walusage,
pgssJumbleState *jstate);
The alignment for walusage doesn't seem to be correct. Running
pgindent will fix this.
Indeed, fixed.
7.
+ values[i++] = Int64GetDatumFast(tmp.wal_records);
+ values[i++] = UInt64GetDatum(tmp.wal_num_fpw);
Why are they different? I think we should use the same *GetDatum API
(probably Int64GetDatumFast) for these.
Oops, that's a mistake from when I was working on the wal_bytes output, fixed.
v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
One more comment related to this patch.
+
+ snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
+
+ /* Convert to numeric. */
+ wal_bytes = DirectFunctionCall3(numeric_in,
+ CStringGetDatum(buf),
+ ObjectIdGetDatum(0),
+ Int32GetDatum(-1));
+
+ values[i++] = wal_bytes;
I see that other places that display uint64 values use BIGINT datatype
in SQL, so why can't we do the same here? See the usage of queryid in
pg_stat_statements or internal_pages, *_pages exposed via
pgstatindex.c.
That's because it's harmless to report a signed number for a hash (at least
compared to the overhead of having it unsigned), while it's certainly not
desirable to report a negative amount of WAL bytes generated if it goes beyond
the bigint limit. See the usage of pg_lsn_mi in pg_lsn.c for instance.
On Wed, Apr 01, 2020 at 07:20:31PM +0530, Dilip Kumar wrote:
I have reviewed 0003 and 0004, I have a few comments.
v9-0003-Add-infrastructure-to-track-WAL-usage
1.
/* Points to buffer usage area in DSM */
BufferUsage *buffer_usage;
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
Better to give one blank line between the previous statement/variable
declaration and the next comment line.
Fixed.
2.
@@ -2154,7 +2157,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel,
IndexBulkDeleteResult **stats,
WaitForParallelWorkersToFinish(lps->pcxt);
for (i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i]);
+ InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
The existing comment above this loop mentions only the buffer
usage, not the WAL usage, so I guess we need to change that.
Ah indeed, I thought I caught all the comments but missed this one. Fixed.
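The leader-side loop quoted in point 2 can be pictured with a small standalone sketch. It assumes that accumulating a worker's usage means adding each of its counters to the leader's totals; the thread does not show the body of InstrAccumParallelQuery, so that is an assumption, and WalUsageSketch, wal_usage_add and accumulate_worker_usage are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for WalUsage; field names follow the patch. */
typedef struct WalUsageSketch
{
	int64_t		wal_records;	/* # of WAL records produced */
	int64_t		wal_num_fpw;	/* # of WAL full page writes produced */
	uint64_t	wal_bytes;		/* amount of WAL bytes produced */
} WalUsageSketch;

/* dst += add, field by field. */
static void
wal_usage_add(WalUsageSketch *dst, const WalUsageSketch *add)
{
	dst->wal_records += add->wal_records;
	dst->wal_num_fpw += add->wal_num_fpw;
	dst->wal_bytes += add->wal_bytes;
}

/*
 * Fold the counters of every launched worker into the leader's totals,
 * mirroring the shape of the for loop in the hunk above.
 */
static void
accumulate_worker_usage(WalUsageSketch *leader,
						const WalUsageSketch *workers,
						int nworkers_launched)
{
	assert(nworkers_launched >= 0);
	for (int i = 0; i < nworkers_launched; i++)
		wal_usage_add(leader, &workers[i]);
}
```

Only workers that were actually launched are folded in, which matches the loop bound nworkers_launched in the quoted code.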
v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut
3.
+ if (usage->wal_num_fpw > 0)
+ appendStringInfo(es->str, " full page records=%ld",
+ usage->wal_num_fpw);
+ if (usage->wal_bytes > 0)
+ appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
+ usage->wal_bytes);
Shall we change to 'full page writes' or 'full page image' instead of
full page records?
Indeed, I changed it in the (auto)vacuum output but missed this one. Fixed.
Apart from this, I have done some testing to see the wal_usage with the
parallel vacuum and the results look fine.
postgres[104248]=# CREATE TABLE test (a int, b int);
CREATE TABLE
postgres[104248]=# INSERT INTO test SELECT i, i FROM
GENERATE_SERIES(1,2000000) as i;
INSERT 0 2000000
postgres[104248]=# CREATE INDEX idx1 on test(a);
CREATE INDEX
postgres[104248]=# VACUUM (PARALLEL 1) test;
VACUUM
postgres[104248]=# select query , wal_bytes, wal_records, wal_num_fpw
from pg_stat_statements where query like 'VACUUM%';
query | wal_bytes | wal_records | wal_num_fpw
--------------------------+-----------+-------------+-------------
VACUUM (PARALLEL 1) test | 72814331 | 8857 | 8855
postgres[106479]=# CREATE TABLE test (a int, b int);
CREATE TABLE
postgres[106479]=# INSERT INTO test SELECT i, i FROM
GENERATE_SERIES(1,2000000) as i;
INSERT 0 2000000
postgres[106479]=# CREATE INDEX idx1 on test(a);
CREATE INDEX
postgres[106479]=# VACUUM (PARALLEL 0) test;
VACUUM
postgres[106479]=# select query , wal_bytes, wal_records, wal_num_fpw
from pg_stat_statements where query like 'VACUUM%';
query | wal_bytes | wal_records | wal_num_fpw
--------------------------+-----------+-------------+-------------
VACUUM (PARALLEL 0) test | 72814331 | 8857 | 8855
Thanks! I did some similar testing, also with sequential/parallel index creation, and
got similar results.
By tomorrow, I will try to finish reviewing 0005 and 0006.
I have reviewed these patches and I have a few cosmetic comments.
v10-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
1.
+ uint64 wal_bytes; /* total amount of wal bytes written */
+ int64 wal_records; /* # of wal records written */
+ int64 wal_num_fpw; /* # of full page wal records written */
/s/# of full page wal records written / /* # of WAL full page image produced */
2.
static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
ProcessUtilityContext context, ParamListInfo params,
QueryEnvironment *queryEnv,
- DestReceiver *dest, QueryCompletion *qc);
+ DestReceiver *dest, QueryCompletion * qc);
Useless hunk.
3.
v10-0006-Expose-WAL-usage-counters-in-verbose-auto-vacuum
@@ -3105,7 +3105,7 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
{
ExplainPropertyInteger("WAL records", NULL,
usage->wal_records, es);
- ExplainPropertyInteger("WAL full page records", NULL,
+ ExplainPropertyInteger("WAL full page writes", NULL,
usage->wal_num_fpw, es);
Just noticed that in 0004 you have first added "WAL full page
records", which is later corrected to "WAL full page writes" in 0006.
I think we better keep this proper in 0004 itself and avoid this hunk
in 0006, otherwise, it creates confusion while reviewing.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 02, 2020 at 11:07:29AM +0530, Amit Kapila wrote:
On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote:
3. Doing some testing with and without parallelism to ensure WAL usage
data is correct would be great and if possible, share the results?
I just saw that Dilip did some testing, but just in case here is some
additional testing:
- vacuum, after a truncate, loading 1M rows and an "UPDATE t1 SET id = id"
=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%';
query | calls | wal_bytes | wal_records | wal_num_fpw
------------------------+-------+-----------+-------------+-------------
vacuum (parallel 3) t1 | 1 | 20098962 | 34104 | 2
vacuum (parallel 0) t1 | 1 | 20098962 | 34104 | 2
(2 rows)
- create index, overriding t1's parallel_workers, using the 1M rows just
vacuumed:
=# alter table t1 set (parallel_workers = 2);
ALTER TABLE
=# create index t1_parallel_2 on t1(id);
CREATE INDEX
=# alter table t1 set (parallel_workers = 0);
ALTER TABLE
=# create index t1_parallel_0 on t1(id);
CREATE INDEX
=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
query | calls | wal_bytes | wal_records | wal_num_fpw
--------------------------------------+-------+-----------+-------------+-------------
create index t1_parallel_0 on t1(id) | 1 | 20355540 | 2762 | 2745
create index t1_parallel_2 on t1(id) | 1 | 20406811 | 2762 | 2758
(2 rows)
It all looks good to me.
Here the wal_num_fpw and wal_bytes are different between parallel and
non-parallel versions. Is it due to checkpoint or something else? We
can probably rule out checkpoint by increasing checkpoint_timeout and
other checkpoint related parameters.
I think this is because I did a checkpoint after the VACUUM tests, so the 1st
CREATE INDEX (with parallelism) induced some FPW on the catalog blocks. I
didn't try to investigate more since:
On Thu, Apr 02, 2020 at 11:22:16AM +0530, Amit Kapila wrote:
Also, I forgot to mention that let's not base this on buffer usage
patch for create index
(v10-0002-Allow-parallel-index-creation-to-accumulate-buff) because as
per recent discussion I am not sure about its usefulness. I think we
can proceed with this patch without
v10-0002-Allow-parallel-index-creation-to-accumulate-buff as well.
Which is done in attached v11.
5. -SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; - query | calls | rows ------------------------------------+-------+------ - SELECT $1::TEXT | 1 | 1 - SELECT PLUS_ONE($1) | 2 | 2 - SELECT PLUS_TWO($1) | 2 | 2 - SELECT pg_stat_statements_reset() | 1 | 1 +SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls | rows | wal_bytes | wal_records +-----------------------------------+-------+------+-----------+------------- + SELECT $1::TEXT | 1 | 1 | 0 | 0 + SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0 + SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0 + SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0 (4 rows)Again, I am not sure if these modifications make much sense?
Those are queries that were previously executed. As those are read-only
queries that are pretty much guaranteed not to cause any WAL activity, I don't
see how it hurts to test at the same time that that's what we indeed record
with pg_stat_statements, just to be safe.

On a similar theory, one could have checked bufferusage stats as well.
The statements are using some expressions, so I don't see any value in
checking all usage data for such statements.
Dropped.
Right now, that particular patch is not getting applied (probably due
to recent commit 17e0328224). Can you rebase it?
Done.
v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut
3.
+ if (usage->wal_num_fpw > 0)
+     appendStringInfo(es->str, " full page records=%ld",
+                      usage->wal_num_fpw);
+ if (usage->wal_bytes > 0)
+     appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
+                      usage->wal_bytes);

Shall we change to 'full page writes' or 'full page image' instead of
full page records?

Indeed, I changed it in the (auto)vacuum output but missed this one. Fixed.
I don't see this change in the patch.
Yes, as Dilip reported, I fixed up the wrong commit; sorry about that. This
version should now be OK.
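
To illustrate the text-format behaviour being discussed here (only non-zero counters are printed, and the second counter is labelled "full page writes"), here is a minimal standalone sketch. It is not the patch code: DemoWalUsage, DEMO_WAL_LINE_MAX and demo_format_wal_usage are hypothetical names, and plain snprintf stands in for the StringInfo calls used in explain.c.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed upper bound for one formatted line; three 20-digit values fit. */
#define DEMO_WAL_LINE_MAX 128

/* Hypothetical stand-in for the WalUsage struct added by the patch. */
typedef struct DemoWalUsage
{
	long		wal_records;	/* # of WAL records produced */
	long		wal_num_fpw;	/* # of WAL full page images produced */
	uint64_t	wal_bytes;		/* size of WAL records produced */
} DemoWalUsage;

/*
 * Write a "WAL: ..." line into buf, skipping counters that are zero.
 * Returns the number of characters written, or 0 when nothing is shown.
 * The caller must pass a buffer of at least DEMO_WAL_LINE_MAX bytes.
 */
static size_t
demo_format_wal_usage(char *buf, size_t buflen, const DemoWalUsage *usage)
{
	size_t		len = 0;

	assert(buf != NULL && buflen >= DEMO_WAL_LINE_MAX);
	buf[0] = '\0';

	/* Nothing to report: emit no line at all. */
	if (usage->wal_records <= 0 && usage->wal_num_fpw <= 0 &&
		usage->wal_bytes == 0)
		return 0;

	len += (size_t) snprintf(buf + len, buflen - len, "WAL:");
	if (usage->wal_records > 0)
		len += (size_t) snprintf(buf + len, buflen - len,
								 " records=%ld", usage->wal_records);
	if (usage->wal_num_fpw > 0)
		len += (size_t) snprintf(buf + len, buflen - len,
								 " full page writes=%ld", usage->wal_num_fpw);
	if (usage->wal_bytes > 0)
		len += (size_t) snprintf(buf + len, buflen - len,
								 " bytes=%" PRIu64, usage->wal_bytes);
	return len;
}
```

With records=2762, no full page images and bytes=20355540, the helper produces a line with only the records and bytes fields; with all counters at zero it produces nothing.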
On Thu, Apr 02, 2020 at 12:04:32PM +0530, Dilip Kumar wrote:
On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
By tomorrow, I will try to finish reviewing 0005 and 0006.
I have reviewed these patches and I have a few cosmetic comments.
v10-0005-Keep-track-of-WAL-usage-in-pg_stat_statements

1.
+ uint64 wal_bytes; /* total amount of wal bytes written */
+ int64 wal_records; /* # of wal records written */
+ int64 wal_num_fpw; /* # of full page wal records written */

/s/# of full page wal records written / /* # of WAL full page image produced */
Done, I also consistently s/wal/WAL/.
2.
 static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 ProcessUtilityContext context, ParamListInfo params,
 QueryEnvironment *queryEnv,
- DestReceiver *dest, QueryCompletion *qc);
+ DestReceiver *dest, QueryCompletion * qc);

Useless hunk.
Oops, leftover of a pgindent as QueryCompletion isn't in the typedefs yet. I
thought I discarded all the useless hunks but missed this one. Thanks, fixed.
3.
v10-0006-Expose-WAL-usage-counters-in-verbose-auto-vacuum
@@ -3105,7 +3105,7 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
 {
 ExplainPropertyInteger("WAL records", NULL, usage->wal_records, es);
- ExplainPropertyInteger("WAL full page records", NULL,
+ ExplainPropertyInteger("WAL full page writes", NULL,
 usage->wal_num_fpw, es);

Just noticed that in 0004 you have first added "WAL full page records",
which is later corrected to "WAL full page writes" in 0006. I think we
better keep this proper in 0004 itself and avoid this hunk in 0006,
otherwise, it creates confusion while reviewing.
Oh, I didn't realize that I had fixed up the wrong commit. Fixed.
I also adapted the documentation that mentioned full page records instead of
full page images, and integrated Justin's comment:
In 0003:
+ /* Provide WAL update data to the instrumentation */
Remove "data" ??
so changed to "Report WAL traffic to the instrumentation."
I didn't change the (auto)vacuum output yet (except fixing the s/full page
records/full page writes/ that I previously missed), as it's not clear what the
consensus is yet. I'll take care of that as soon as we reach a consensus.
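
For anyone following the thread without reading the whole patch, the three counters (wal_records, wal_num_fpw, wal_bytes) are global running totals that are snapshotted before a unit of work and diffed afterwards. The sketch below is a simplified standalone model of that idea, not the patch itself: DemoWalUsage, demo_wal_counters, demo_count_record and demo_accum_diff are hypothetical names that loosely mirror pgWalUsage, the accounting added to XLogInsertRecord and WalUsageAccumDiff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the WalUsage struct added by the patch. */
typedef struct DemoWalUsage
{
	long		wal_records;	/* # of WAL records produced */
	long		wal_num_fpw;	/* # of WAL full page images produced */
	uint64_t	wal_bytes;		/* size of WAL records produced */
} DemoWalUsage;

/* Process-wide running totals; they only ever grow. */
static DemoWalUsage demo_wal_counters;

/* Account for one inserted record of rec_len bytes with num_fpw images. */
static void
demo_count_record(uint32_t rec_len, int num_fpw)
{
	assert(num_fpw >= 0);
	demo_wal_counters.wal_bytes += rec_len;
	demo_wal_counters.wal_records++;
	demo_wal_counters.wal_num_fpw += num_fpw;
}

/* dst += add - sub, i.e. the activity between two snapshots. */
static void
demo_accum_diff(DemoWalUsage *dst, const DemoWalUsage *add,
				const DemoWalUsage *sub)
{
	dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
	dst->wal_records += add->wal_records - sub->wal_records;
	dst->wal_num_fpw += add->wal_num_fpw - sub->wal_num_fpw;
}
```

A caller would copy demo_wal_counters into a local snapshot, run the statement, and then call demo_accum_diff with the current totals and the snapshot to obtain the per-statement usage. Since the totals are never reset, nested measurements can each keep their own snapshot.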
Attachments:
v11-0001-Add-infrastructure-to-track-WAL-usage.patch (text/plain; charset=us-ascii)
From 05776f11ad1fac45dc390cda4df03f5402e214be Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:41:50 +0100
Subject: [PATCH v11 1/4] Add infrastructure to track WAL usage.
Author: Kirill Bychik, Julien Rouhaud
Reviewed-by: Fuji Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 37 ++++++++++++++-----
src/backend/access/transam/xlog.c | 12 +++++-
src/backend/access/transam/xloginsert.c | 13 +++++--
src/backend/executor/execParallel.c | 38 ++++++++++++++-----
src/backend/executor/instrument.c | 49 ++++++++++++++++++++++---
src/include/access/xlog.h | 3 +-
src/include/executor/execParallel.h | 1 +
src/include/executor/instrument.h | 18 ++++++++-
src/tools/pgindent/typedefs.list | 1 +
9 files changed, 139 insertions(+), 33 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9f9596c718..cc7e8521a5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -139,6 +139,7 @@
#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
+#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -275,6 +276,9 @@ typedef struct LVParallelState
/* Points to buffer usage area in DSM */
BufferUsage *buffer_usage;
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
+
/*
* The number of indexes that support parallel index bulk-deletion and
* parallel index cleanup respectively.
@@ -2143,8 +2147,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
vacrelstats->dead_tuples, nindexes, vacrelstats);
/*
- * Next, accumulate buffer usage. (This must wait for the workers to
- * finish, or we might get incomplete data.)
+ * Next, accumulate buffer and WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
*/
if (nworkers > 0)
{
@@ -2154,7 +2158,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
WaitForParallelWorkersToFinish(lps->pcxt);
for (i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i]);
+ InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
/*
@@ -3171,6 +3175,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
LVShared *shared;
LVDeadTuples *dead_tuples;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
bool *can_parallel_vacuum;
long maxtuples;
char *sharedquery;
@@ -3255,15 +3260,19 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_estimate_keys(&pcxt->estimator, 1);
/*
- * Estimate space for BufferUsage -- PARALLEL_VACUUM_KEY_BUFFER_USAGE.
+ * Estimate space for BufferUsage and WalUsage --
+ * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
*
* If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgBufferUsage, so do
- * it unconditionally.
+ * have no way of knowing whether anyone's looking at pgBufferUsage or
+ * pgWalUsage, so do it unconditionally.
*/
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
/* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
@@ -3299,11 +3308,18 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
vacrelstats->dead_tuples = dead_tuples;
- /* Allocate space for each worker's BufferUsage; no need to initialize */
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
buffer_usage = shm_toc_allocate(pcxt->toc,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
lps->buffer_usage = buffer_usage;
+ wal_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
+ lps->wal_usage = wal_usage;
/* Store query string for workers */
sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
@@ -3435,6 +3451,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
LVShared *lvshared;
LVDeadTuples *dead_tuples;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
int nindexes;
char *sharedquery;
IndexBulkDeleteResult **stats;
@@ -3511,9 +3528,11 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
parallel_vacuum_index(indrels, stats, lvshared, dead_tuples, nindexes,
&vacrelstats);
- /* Report buffer usage during parallel execution */
+ /* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Pop the error context stack */
error_context_stack = errcallback.previous;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 977d448f50..50b78f3143 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -43,6 +43,7 @@
#include "commands/progress.h"
#include "commands/tablespace.h"
#include "common/controldata_utils.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "pgstat.h"
@@ -996,7 +997,8 @@ static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
XLogRecPtr
XLogInsertRecord(XLogRecData *rdata,
XLogRecPtr fpw_lsn,
- uint8 flags)
+ uint8 flags,
+ int num_fpw)
{
XLogCtlInsert *Insert = &XLogCtl->Insert;
pg_crc32c rdata_crc;
@@ -1252,6 +1254,14 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
+ /* Report WAL traffic to the instrumentation. */
+ if (inserted)
+ {
+ pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+ pgWalUsage.wal_records++;
+ pgWalUsage.wal_num_fpw += num_fpw;
+ }
+
return EndPos;
}
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index a618dec776..5e032e7042 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -25,6 +25,7 @@
#include "access/xloginsert.h"
#include "catalog/pg_control.h"
#include "common/pg_lzcompress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "replication/origin.h"
@@ -108,7 +109,7 @@ static MemoryContext xloginsert_cxt;
static XLogRecData *XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
- XLogRecPtr *fpw_lsn);
+ XLogRecPtr *fpw_lsn, int *num_fpw);
static bool XLogCompressBackupBlock(char *page, uint16 hole_offset,
uint16 hole_length, char *dest, uint16 *dlen);
@@ -448,6 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info)
bool doPageWrites;
XLogRecPtr fpw_lsn;
XLogRecData *rdt;
+ int num_fpw = 0;
/*
* Get values needed to decide whether to do full-page writes. Since
@@ -457,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info)
GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);
rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
- &fpw_lsn);
+ &fpw_lsn, &num_fpw);
- EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags);
+ EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw);
} while (EndPos == InvalidXLogRecPtr);
XLogResetInsertion();
@@ -482,7 +484,7 @@ XLogInsert(RmgrId rmid, uint8 info)
static XLogRecData *
XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
- XLogRecPtr *fpw_lsn)
+ XLogRecPtr *fpw_lsn, int *num_fpw)
{
XLogRecData *rdt;
uint32 total_len = 0;
@@ -635,6 +637,9 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
*/
bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
+ /* Report a full page image constructed for the WAL record */
+ *num_fpw += 1;
+
/*
* Construct XLogRecData entries for the page content.
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..7d9ca66fc8 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -12,7 +12,7 @@
* workers and ensuring that their state generally matches that of the
* leader; see src/backend/access/transam/README.parallel for details.
* However, we must save and restore relevant executor state, such as
- * any ParamListInfo associated with the query, buffer usage info, and
+ * any ParamListInfo associated with the query, buffer/WAL usage info, and
* the actual plan to be passed down to the worker.
*
* IDENTIFICATION
@@ -62,6 +62,7 @@
#define PARALLEL_KEY_DSA UINT64CONST(0xE000000000000007)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -573,6 +574,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
char *pstmt_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
+ WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
@@ -646,6 +648,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Same thing for WalUsage.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for tuple queues. */
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers));
@@ -728,6 +737,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufusage_space);
pei->buffer_usage = bufusage_space;
+ /* Same for WalUsage. */
+ walusage_space = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
+ pei->wal_usage = walusage_space;
+
/* Set up the tuple queues that the workers will write into. */
pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
@@ -1069,7 +1084,7 @@ ExecParallelRetrieveJitInstrumentation(PlanState *planstate,
/*
* Finish parallel execution. We wait for parallel workers to finish, and
- * accumulate their buffer usage.
+ * accumulate their buffer/WAL usage.
*/
void
ExecParallelFinish(ParallelExecutorInfo *pei)
@@ -1109,11 +1124,11 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
WaitForParallelWorkersToFinish(pei->pcxt);
/*
- * Next, accumulate buffer usage. (This must wait for the workers to
- * finish, or we might get incomplete data.)
+ * Next, accumulate buffer/WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
*/
for (i = 0; i < nworkers; i++)
- InstrAccumParallelQuery(&pei->buffer_usage[i]);
+ InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]);
pei->finished = true;
}
@@ -1333,6 +1348,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
{
FixedParallelExecutorState *fpes;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
DestReceiver *receiver;
QueryDesc *queryDesc;
SharedExecutorInstrumentation *instrumentation;
@@ -1386,11 +1402,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
ExecSetTupleBound(fpes->tuples_needed, queryDesc->planstate);
/*
- * Prepare to track buffer usage during query execution.
+ * Prepare to track buffer/WAL usage during query execution.
*
* We do this after starting up the executor to match what happens in the
- * leader, which also doesn't count buffer accesses that occur during
- * executor startup.
+ * leader, which also doesn't count buffer accesses and WAL activity that
+ * occur during executor startup.
*/
InstrStartParallelQuery();
@@ -1406,9 +1422,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Shut down the executor */
ExecutorFinish(queryDesc);
- /* Report buffer usage during parallel execution. */
+ /* Report buffer/WAL usage during parallel execution. */
buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup (toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Report instrumentation data if any instrumentation options are set. */
if (instrumentation != NULL)
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 042e10f96b..dd615581ac 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -19,8 +19,11 @@
BufferUsage pgBufferUsage;
static BufferUsage save_pgBufferUsage;
+WalUsage pgWalUsage;
+static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
/* Allocate new instrumentation structure(s) */
@@ -31,15 +34,17 @@ InstrAlloc(int n, int instrument_options)
/* initialize all fields to zeroes, then modify as needed */
instr = palloc0(n * sizeof(Instrumentation));
- if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER))
+ if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER | INSTRUMENT_WAL))
{
bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
bool need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
int i;
for (i = 0; i < n; i++)
{
instr[i].need_bufusage = need_buffers;
+ instr[i].need_walusage = need_wal;
instr[i].need_timer = need_timer;
}
}
@@ -53,6 +58,7 @@ InstrInit(Instrumentation *instr, int instrument_options)
{
memset(instr, 0, sizeof(Instrumentation));
instr->need_bufusage = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ instr->need_walusage = (instrument_options & INSTRUMENT_WAL) != 0;
instr->need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
}
@@ -67,6 +73,9 @@ InstrStartNode(Instrumentation *instr)
/* save buffer usage totals at node entry, if needed */
if (instr->need_bufusage)
instr->bufusage_start = pgBufferUsage;
+
+ if (instr->need_walusage)
+ instr->walusage_start = pgWalUsage;
}
/* Exit from a plan node */
@@ -95,6 +104,10 @@ InstrStopNode(Instrumentation *instr, double nTuples)
BufferUsageAccumDiff(&instr->bufusage,
&pgBufferUsage, &instr->bufusage_start);
+ if (instr->need_walusage)
+ WalUsageAccumDiff(&instr->walusage,
+ &pgWalUsage, &instr->walusage_start);
+
/* Is this the first tuple of this cycle? */
if (!instr->running)
{
@@ -158,6 +171,9 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
/* Add delta of buffer usage since entry to node's totals */
if (dst->need_bufusage)
BufferUsageAdd(&dst->bufusage, &add->bufusage);
+
+ if (dst->need_walusage)
+ WalUsageAdd(&dst->walusage, &add->walusage);
}
/* note current values during parallel executor startup */
@@ -165,21 +181,25 @@ void
InstrStartParallelQuery(void)
{
save_pgBufferUsage = pgBufferUsage;
+ save_pgWalUsage = pgWalUsage;
}
/* report usage after parallel executor shutdown */
void
-InstrEndParallelQuery(BufferUsage *result)
+InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- memset(result, 0, sizeof(BufferUsage));
- BufferUsageAccumDiff(result, &pgBufferUsage, &save_pgBufferUsage);
+ memset(bufusage, 0, sizeof(BufferUsage));
+ memset(walusage, 0, sizeof(WalUsage));
+ BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
+ WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
}
/* accumulate work done by workers in leader's stats */
void
-InstrAccumParallelQuery(BufferUsage *result)
+InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- BufferUsageAdd(&pgBufferUsage, result);
+ BufferUsageAdd(&pgBufferUsage, bufusage);
+ WalUsageAdd(&pgWalUsage, walusage);
}
/* dst += add */
@@ -221,3 +241,20 @@ BufferUsageAccumDiff(BufferUsage *dst,
INSTR_TIME_ACCUM_DIFF(dst->blk_write_time,
add->blk_write_time, sub->blk_write_time);
}
+
+/* helper functions for WAL usage accumulation */
+static void
+WalUsageAdd(WalUsage *dst, WalUsage *add)
+{
+ dst->wal_bytes += add->wal_bytes;
+ dst->wal_records += add->wal_records;
+ dst->wal_num_fpw += add->wal_num_fpw;
+}
+
+void
+WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
+{
+ dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
+ dst->wal_records += add->wal_records - sub->wal_records;
+ dst->wal_num_fpw += add->wal_num_fpw - sub->wal_num_fpw;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 9ec7b31cce..b91e724b2d 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -259,7 +259,8 @@ struct XLogRecData;
extern XLogRecPtr XLogInsertRecord(struct XLogRecData *rdata,
XLogRecPtr fpw_lsn,
- uint8 flags);
+ uint8 flags,
+ int num_fpw);
extern void XLogFlush(XLogRecPtr RecPtr);
extern bool XLogBackgroundFlush(void);
extern bool XLogNeedsFlush(XLogRecPtr RecPtr);
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf020..1cc5b524fd 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -26,6 +26,7 @@ typedef struct ParallelExecutorInfo
PlanState *planstate; /* plan subtree we're running in parallel */
ParallelContext *pcxt; /* parallel context we're using */
BufferUsage *buffer_usage; /* points to bufusage area in DSM */
+ WalUsage *wal_usage; /* walusage area in DSM */
SharedExecutorInstrumentation *instrumentation; /* optional */
struct SharedJitInstrumentation *jit_instrumentation; /* optional */
dsa_area *area; /* points to DSA area in DSM */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 3825a5ac1f..e8875a8e9b 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -32,12 +32,20 @@ typedef struct BufferUsage
instr_time blk_write_time; /* time spent writing */
} BufferUsage;
+typedef struct WalUsage
+{
+ long wal_records; /* # of WAL records produced */
+ long wal_num_fpw; /* # of WAL full page image produced */
+ uint64 wal_bytes; /* size of WAL records produced */
+} WalUsage;
+
/* Flag bits included in InstrAlloc's instrument_options bitmask */
typedef enum InstrumentOption
{
INSTRUMENT_TIMER = 1 << 0, /* needs timer (and row counts) */
INSTRUMENT_BUFFERS = 1 << 1, /* needs buffer usage */
INSTRUMENT_ROWS = 1 << 2, /* needs row count */
+ INSTRUMENT_WAL = 1 << 3, /* needs WAL usage */
INSTRUMENT_ALL = PG_INT32_MAX
} InstrumentOption;
@@ -46,6 +54,7 @@ typedef struct Instrumentation
/* Parameters set at node creation: */
bool need_timer; /* true if we need timer data */
bool need_bufusage; /* true if we need buffer usage data */
+ bool need_walusage; /* true if we need WAL usage data */
/* Info about current plan cycle: */
bool running; /* true if we've completed first tuple */
instr_time starttime; /* start time of current iteration of node */
@@ -53,6 +62,7 @@ typedef struct Instrumentation
double firsttuple; /* time for first tuple of this cycle */
double tuplecount; /* # of tuples emitted so far this cycle */
BufferUsage bufusage_start; /* buffer usage at start */
+ WalUsage walusage_start; /* WAL usage at start */
/* Accumulated statistics across all completed cycles: */
double startup; /* total startup time (in seconds) */
double total; /* total time (in seconds) */
@@ -62,6 +72,7 @@ typedef struct Instrumentation
double nfiltered1; /* # of tuples removed by scanqual or joinqual */
double nfiltered2; /* # of tuples removed by "other" quals */
BufferUsage bufusage; /* total buffer usage */
+ WalUsage walusage; /* total WAL usage */
} Instrumentation;
typedef struct WorkerInstrumentation
@@ -71,6 +82,7 @@ typedef struct WorkerInstrumentation
} WorkerInstrumentation;
extern PGDLLIMPORT BufferUsage pgBufferUsage;
+extern PGDLLIMPORT WalUsage pgWalUsage;
extern Instrumentation *InstrAlloc(int n, int instrument_options);
extern void InstrInit(Instrumentation *instr, int instrument_options);
@@ -79,9 +91,11 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
-extern void InstrEndParallelQuery(BufferUsage *result);
-extern void InstrAccumParallelQuery(BufferUsage *result);
+extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
+extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
extern void BufferUsageAccumDiff(BufferUsage *dst,
const BufferUsage *add, const BufferUsage *sub);
+extern void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add,
+ const WalUsage *sub);
#endif /* INSTRUMENT_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 939de985d3..34623523a7 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2643,6 +2643,7 @@ WalSndCtlData
WalSndSendDataCallback
WalSndState
WalTimeSample
+WalUsage
WalWriteMethod
Walfile
WindowAgg
--
2.20.1
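
As a reading aid for the parallel-worker part of the patch above: each worker writes its own usage delta into a private slot of a shared array, and the leader adds the slots of the launched workers into its totals once they have finished. The following is a simplified standalone model under those assumptions. SketchWalUsage, sketch_worker_report and sketch_leader_accumulate are hypothetical names, and an ordinary array stands in for the DSM area allocated through shm_toc.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the WalUsage struct added by the patch. */
typedef struct SketchWalUsage
{
	long		wal_records;	/* # of WAL records produced */
	long		wal_num_fpw;	/* # of WAL full page images produced */
	uint64_t	wal_bytes;		/* size of WAL records produced */
} SketchWalUsage;

/*
 * Worker side: overwrite this worker's slot with the activity between the
 * snapshot taken at start ("saved") and the current totals ("current").
 * The slot is zeroed first, so the shared area needs no initialization.
 */
static void
sketch_worker_report(SketchWalUsage *slot, const SketchWalUsage *current,
					 const SketchWalUsage *saved)
{
	memset(slot, 0, sizeof(SketchWalUsage));
	slot->wal_bytes += current->wal_bytes - saved->wal_bytes;
	slot->wal_records += current->wal_records - saved->wal_records;
	slot->wal_num_fpw += current->wal_num_fpw - saved->wal_num_fpw;
}

/*
 * Leader side: after all workers have finished, add every launched
 * worker's slot to the leader's running totals.
 */
static void
sketch_leader_accumulate(SketchWalUsage *total, const SketchWalUsage *slots,
						 int nworkers_launched)
{
	int			i;

	assert(nworkers_launched >= 0);
	for (i = 0; i < nworkers_launched; i++)
	{
		total->wal_bytes += slots[i].wal_bytes;
		total->wal_records += slots[i].wal_records;
		total->wal_num_fpw += slots[i].wal_num_fpw;
	}
}
```

Only the slots of workers that were actually launched are summed here, since a slot that no worker wrote to would hold uninitialized data.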
v11-0002-Add-option-to-report-WAL-usage-in-EXPLAIN-and-au.patch (text/plain; charset=us-ascii)
From 14bb2d37bf74f1cdc0d78fb0cab4b49525d34e20 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Sun, 29 Mar 2020 12:38:14 +0200
Subject: [PATCH v11 2/4] Add option to report WAL usage in EXPLAIN and
auto_explain.
Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 15 ++++++
doc/src/sgml/auto-explain.sgml | 20 ++++++++
doc/src/sgml/ref/explain.sgml | 14 ++++++
src/backend/commands/explain.c | 74 +++++++++++++++++++++++++++--
src/bin/psql/tab-complete.c | 4 +-
src/include/commands/explain.h | 3 ++
6 files changed, 124 insertions(+), 6 deletions(-)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index f69dde876c..56c549d84c 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -27,6 +27,7 @@ static int auto_explain_log_min_duration = -1; /* msec or -1 */
static bool auto_explain_log_analyze = false;
static bool auto_explain_log_verbose = false;
static bool auto_explain_log_buffers = false;
+static bool auto_explain_log_wal = false;
static bool auto_explain_log_triggers = false;
static bool auto_explain_log_timing = true;
static bool auto_explain_log_settings = false;
@@ -148,6 +149,17 @@ _PG_init(void)
NULL,
NULL);
+ DefineCustomBoolVariable("auto_explain.log_wal",
+ "Log WAL usage.",
+ NULL,
+ &auto_explain_log_wal,
+ false,
+ PGC_SUSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
+
DefineCustomBoolVariable("auto_explain.log_triggers",
"Include trigger statistics in plans.",
"This has no effect unless log_analyze is also set.",
@@ -280,6 +292,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
queryDesc->instrument_options |= INSTRUMENT_ROWS;
if (auto_explain_log_buffers)
queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (auto_explain_log_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
}
}
@@ -374,6 +388,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
es->analyze = (queryDesc->instrument_options && auto_explain_log_analyze);
es->verbose = auto_explain_log_verbose;
es->buffers = (es->analyze && auto_explain_log_buffers);
+ es->wal = (es->analyze && auto_explain_log_wal);
es->timing = (es->analyze && auto_explain_log_timing);
es->summary = es->analyze;
es->format = auto_explain_log_format;
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index 3d619d4a3d..d4d29c4a64 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -109,6 +109,26 @@ LOAD 'auto_explain';
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>
+ <varname>auto_explain.log_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>auto_explain.log_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ <varname>auto_explain.log_wal</varname> controls whether WAL
+ usage statistics are printed when an execution plan is logged; it's
+ equivalent to the <literal>WAL</literal> option of <command>EXPLAIN</command>.
+ This parameter has no effect
+ unless <varname>auto_explain.log_analyze</varname> is enabled.
+ This parameter is off by default.
+ Only superusers can change this setting.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term>
<varname>auto_explain.log_timing</varname> (<type>boolean</type>)
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 385d10411f..494e60ecc9 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -41,6 +41,7 @@ EXPLAIN [ ANALYZE ] [ VERBOSE ] <replaceable class="parameter">statement</replac
COSTS [ <replaceable class="parameter">boolean</replaceable> ]
SETTINGS [ <replaceable class="parameter">boolean</replaceable> ]
BUFFERS [ <replaceable class="parameter">boolean</replaceable> ]
+ WAL [ <replaceable class="parameter">boolean</replaceable> ]
TIMING [ <replaceable class="parameter">boolean</replaceable> ]
SUMMARY [ <replaceable class="parameter">boolean</replaceable> ]
FORMAT { TEXT | XML | JSON | YAML }
@@ -192,6 +193,19 @@ ROLLBACK;
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>WAL</literal></term>
+ <listitem>
+ <para>
+ Include information on WAL record generation. Specifically, include the
+ number of records, full page images and bytes generated. In text
+ format, only non-zero values are printed. This parameter may only be
+ used when <literal>ANALYZE</literal> is also enabled. It defaults to
+ <literal>FALSE</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>TIMING</literal></term>
<listitem>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ee0e638f33..cefe2144e5 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -113,6 +113,7 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
+static void show_wal_usage(ExplainState *es, const WalUsage *usage);
static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
ExplainState *es);
static void ExplainScanTarget(Scan *plan, ExplainState *es);
@@ -175,6 +176,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
es->costs = defGetBoolean(opt);
else if (strcmp(opt->defname, "buffers") == 0)
es->buffers = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "wal") == 0)
+ es->wal = defGetBoolean(opt);
else if (strcmp(opt->defname, "settings") == 0)
es->settings = defGetBoolean(opt);
else if (strcmp(opt->defname, "timing") == 0)
@@ -219,6 +222,11 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("EXPLAIN option BUFFERS requires ANALYZE")));
+ if (es->wal && !es->analyze)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("EXPLAIN option WAL requires ANALYZE")));
+
/* if the timing was not set explicitly, set default value */
es->timing = (timing_set) ? es->timing : es->analyze;
@@ -494,6 +502,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
if (es->buffers)
instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
/*
* We always collect timing for the entire statement, even when node-level
@@ -1942,12 +1952,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
}
- /* Show buffer usage */
+ /* Show buffer/WAL usage */
if (es->buffers && planstate->instrument)
show_buffer_usage(es, &planstate->instrument->bufusage);
+ if (es->wal && planstate->instrument)
+ show_wal_usage(es, &planstate->instrument->walusage);
- /* Prepare per-worker buffer usage */
- if (es->workers_state && es->buffers && es->verbose)
+ /* Prepare per-worker buffer/WAL usage */
+ if (es->workers_state && (es->buffers || es->wal) && es->verbose)
{
WorkerInstrumentation *w = planstate->worker_instrument;
@@ -1960,7 +1972,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
continue;
ExplainOpenWorker(n, es);
- show_buffer_usage(es, &instrument->bufusage);
+ if (es->buffers)
+ show_buffer_usage(es, &instrument->bufusage);
+ if (es->wal)
+ show_wal_usage(es, &instrument->walusage);
ExplainCloseWorker(n, es);
}
}
@@ -3059,6 +3074,44 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
}
}
+/*
+ * Show WAL usage details.
+ */
+static void
+show_wal_usage(ExplainState *es, const WalUsage *usage)
+{
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ /* Show only positive counter values. */
+ if ((usage->wal_records > 0) || (usage->wal_num_fpw > 0) ||
+ (usage->wal_bytes > 0))
+ {
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "WAL:");
+
+ if (usage->wal_records > 0)
+ appendStringInfo(es->str, " records=%ld",
+ usage->wal_records);
+ if (usage->wal_num_fpw > 0)
+ appendStringInfo(es->str, " full page writes=%ld",
+ usage->wal_num_fpw);
+ if (usage->wal_bytes > 0)
+ appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
+ usage->wal_bytes);
+ appendStringInfoChar(es->str, '\n');
+ }
+ }
+ else
+ {
+ ExplainPropertyInteger("WAL records", NULL,
+ usage->wal_records, es);
+ ExplainPropertyInteger("WAL full page writes", NULL,
+ usage->wal_num_fpw, es);
+ ExplainPropertyUInteger("WAL bytes", NULL,
+ usage->wal_bytes, es);
+ }
+}
+
/*
* Add some additional details about an IndexScan or IndexOnlyScan
*/
@@ -3843,6 +3896,19 @@ ExplainPropertyInteger(const char *qlabel, const char *unit, int64 value,
ExplainProperty(qlabel, unit, buf, true, es);
}
+/*
+ * Explain an unsigned integer-valued property.
+ */
+void
+ExplainPropertyUInteger(const char *qlabel, const char *unit, uint64 value,
+ ExplainState *es)
+{
+ char buf[32];
+
+ snprintf(buf, sizeof(buf), UINT64_FORMAT, value);
+ ExplainProperty(qlabel, unit, buf, true, es);
+}
+
/*
* Explain a float-valued property, using the specified number of
* fractional digits.
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5fec59723c..0e7a373caf 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3045,8 +3045,8 @@ psql_completion(const char *text, int start, int end)
*/
if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
COMPLETE_WITH("ANALYZE", "VERBOSE", "COSTS", "SETTINGS",
- "BUFFERS", "TIMING", "SUMMARY", "FORMAT");
- else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|BUFFERS|TIMING|SUMMARY"))
+ "BUFFERS", "WAL", "TIMING", "SUMMARY", "FORMAT");
+ else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|BUFFERS|WAL|TIMING|SUMMARY"))
COMPLETE_WITH("ON", "OFF");
else if (TailMatches("FORMAT"))
COMPLETE_WITH("TEXT", "XML", "JSON", "YAML");
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 54f6240e5e..7b0b0a94a6 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -42,6 +42,7 @@ typedef struct ExplainState
bool analyze; /* print actual times */
bool costs; /* print estimated costs */
bool buffers; /* print buffer usage */
+ bool wal; /* print WAL usage */
bool timing; /* print detailed node timing */
bool summary; /* print total planning and execution timing */
bool settings; /* print modified settings */
@@ -110,6 +111,8 @@ extern void ExplainPropertyText(const char *qlabel, const char *value,
ExplainState *es);
extern void ExplainPropertyInteger(const char *qlabel, const char *unit,
int64 value, ExplainState *es);
+extern void ExplainPropertyUInteger(const char *qlabel, const char *unit,
+ uint64 value, ExplainState *es);
extern void ExplainPropertyFloat(const char *qlabel, const char *unit,
double value, int ndigits, ExplainState *es);
extern void ExplainPropertyBool(const char *qlabel, bool value,
--
2.20.1
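As a reading aid for the text-format branch of show_wal_usage() in the patch above, here is a minimal standalone sketch of the same rule: print a counter only when it is positive, and print nothing at all when all three are zero. WalUsageSketch and format_wal_usage_text() are hypothetical stand-ins, not backend code, and the field types are assumed from the %ld and UINT64_FORMAT format strings in the diff.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for WalUsage; types assumed from the diff's format strings. */
typedef struct WalUsageSketch
{
    long     wal_records;   /* number of WAL records */
    long     wal_num_fpw;   /* number of full page writes */
    uint64_t wal_bytes;     /* number of WAL bytes */
} WalUsageSketch;

/*
 * Build the one-line text form into buf, skipping counters that are zero.
 * Returns the length of the string stored in buf (0 if nothing was emitted).
 */
int
format_wal_usage_text(char *buf, size_t buflen, const WalUsageSketch *usage)
{
    char rec[48] = "";
    char fpw[48] = "";
    char bytes[48] = "";

    assert(buf != NULL && buflen > 0 && usage != NULL);
    buf[0] = '\0';

    /* Show only positive counter values, as the patch does. */
    if (usage->wal_records <= 0 && usage->wal_num_fpw <= 0 &&
        usage->wal_bytes == 0)
        return 0;

    if (usage->wal_records > 0)
        snprintf(rec, sizeof(rec), " records=%ld", usage->wal_records);
    if (usage->wal_num_fpw > 0)
        snprintf(fpw, sizeof(fpw), " full page writes=%ld", usage->wal_num_fpw);
    if (usage->wal_bytes > 0)
        snprintf(bytes, sizeof(bytes), " bytes=%" PRIu64, usage->wal_bytes);

    snprintf(buf, buflen, "WAL:%s%s%s\n", rec, fpw, bytes);
    return (int) strlen(buf);
}
```

Fed the vacuum figures quoted later in this thread (34104 records, 2 full page writes, 20098962 bytes), the sketch produces "WAL: records=34104 full page writes=2 bytes=20098962".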
v11-0003-Keep-track-of-WAL-usage-in-pg_stat_statements.patch (text/plain; charset=us-ascii)
From 4fe82b4d09b068041bdca91628733b48845acc16 Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:42:02 +0100
Subject: [PATCH v11 3/4] Keep track of WAL usage in pg_stat_statements.
Author: Kirill Bychik
Reviewed-by: Julien Rouhaud, Fuji Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
.../expected/pg_stat_statements.out | 39 +++++++++++++
.../pg_stat_statements--1.7--1.8.sql | 7 ++-
.../pg_stat_statements/pg_stat_statements.c | 55 +++++++++++++++++--
.../sql/pg_stat_statements.sql | 23 ++++++++
doc/src/sgml/pgstatstatements.sgml | 27 +++++++++
5 files changed, 144 insertions(+), 7 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 45dbe9e677..02da7245b4 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -211,6 +211,45 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
UPDATE test SET b = $1 WHERE a > $2 | 1 | 3
(10 rows)
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset
+--------------------------
+
+(1 row)
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_as_rows
+-----------------------------------------------------------+-------+------+---------------------+-----------------------+---------------------
+ DELETE FROM pgss_test WHERE a > $1 | 1 | 1 | t | t | t
+ DROP TABLE pgss_test | 1 | 0 | t | t | f
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | 1 | 10 | t | t | t
+ SELECT pg_stat_statements_reset() | 1 | 1 | f | f | f
+ SELECT query, calls, rows, +| 0 | 0 | f | f | t
+ wal_bytes > $1 as wal_bytes_generated, +| | | | |
+ wal_records > $2 as wal_records_generated, +| | | | |
+ wal_records = rows as wal_records_as_rows +| | | | |
+ FROM pg_stat_statements ORDER BY query COLLATE "C" | | | | |
+ SET pg_stat_statements.track_utility = FALSE | 1 | 0 | f | f | t
+ UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 | 3 | t | t | t
+(7 rows)
+
--
-- pg_stat_statements.track = none
--
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
index aa5cc3c77b..2fcf7aee01 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -41,7 +41,10 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean,
OUT temp_blks_read int8,
OUT temp_blks_written int8,
OUT blk_read_time float8,
- OUT blk_write_time float8
+ OUT blk_write_time float8,
+ OUT wal_bytes numeric,
+ OUT wal_records int8,
+ OUT wal_num_fpw int8
)
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'pg_stat_statements_1_8'
@@ -51,5 +54,3 @@ CREATE VIEW pg_stat_statements AS
SELECT * FROM pg_stat_statements(true);
GRANT SELECT ON pg_stat_statements TO PUBLIC;
-
-
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 942922b01f..f8bf4f852a 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -185,6 +185,9 @@ typedef struct Counters
int64 local_blks_written; /* # of local disk blocks written */
int64 temp_blks_read; /* # of temp blocks read */
int64 temp_blks_written; /* # of temp blocks written */
+ uint64 wal_bytes; /* total amount of WAL bytes generated */
+ int64 wal_records; /* # of WAL records generated */
+ int64 wal_num_fpw; /* # of WAL full page images generated */
double blk_read_time; /* time spent reading, in msec */
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
@@ -348,6 +351,7 @@ static void pgss_store(const char *query, uint64 queryId,
pgssStoreKind kind,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate);
static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
pgssVersion api_version,
@@ -891,6 +895,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
0,
0,
NULL,
+ NULL,
&jstate);
}
@@ -926,9 +931,16 @@ pgss_planner(Query *parse,
instr_time duration;
BufferUsage bufusage_start,
bufusage;
+ WalUsage walusage_start,
+ walusage;
/* We need to track buffer usage as the planner can access them. */
bufusage_start = pgBufferUsage;
+ /*
+ * Similarly the planner could write some WAL records in some cases
+ * (e.g. setting a hint bit with those being WAL-logged)
+ */
+ walusage_start = pgWalUsage;
INSTR_TIME_SET_CURRENT(start);
plan_nested_level++;
@@ -954,6 +966,10 @@ pgss_planner(Query *parse,
memset(&bufusage, 0, sizeof(BufferUsage));
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+ /* calc differences of WAL counters. */
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
pgss_store(query_string,
parse->queryId,
parse->stmt_location,
@@ -962,6 +978,7 @@ pgss_planner(Query *parse,
INSTR_TIME_GET_MILLISEC(duration),
0,
&bufusage,
+ &walusage,
NULL);
}
else
@@ -1079,6 +1096,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
queryDesc->totaltime->total * 1000.0, /* convert to msec */
queryDesc->estate->es_processed,
&queryDesc->totaltime->bufusage,
+ &queryDesc->totaltime->walusage,
NULL);
}
@@ -1123,8 +1141,10 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
uint64 rows;
BufferUsage bufusage_start,
bufusage;
-
+ WalUsage walusage_start,
+ walusage;
bufusage_start = pgBufferUsage;
+ walusage_start = pgWalUsage;
INSTR_TIME_SET_CURRENT(start);
exec_nested_level++;
@@ -1154,6 +1174,10 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
memset(&bufusage, 0, sizeof(BufferUsage));
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+ /* calc differences of WAL counters. */
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
pgss_store(queryString,
0, /* signal that it's a utility stmt */
pstmt->stmt_location,
@@ -1162,6 +1186,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
INSTR_TIME_GET_MILLISEC(duration),
rows,
&bufusage,
+ &walusage,
NULL);
}
else
@@ -1197,7 +1222,8 @@ pgss_hash_string(const char *str, int len)
*
* If jstate is not NULL then we're trying to create an entry for which
* we have no statistics as yet; we just want to record the normalized
- * query string. total_time, rows, bufusage are ignored in this case.
+ * query string. total_time, rows, bufusage and walusage are ignored in this
+ * case.
*
* If kind is PGSS_PLAN or PGSS_EXEC, its value is used as the array position
* for the arrays in the Counters field.
@@ -1208,6 +1234,7 @@ pgss_store(const char *query, uint64 queryId,
pgssStoreKind kind,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate)
{
pgssHashKey key;
@@ -1402,6 +1429,9 @@ pgss_store(const char *query, uint64 queryId,
e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
+ e->counters.wal_bytes += walusage->wal_bytes;
+ e->counters.wal_records += walusage->wal_records;
+ e->counters.wal_num_fpw += walusage->wal_num_fpw;
SpinLockRelease(&e->mutex);
}
@@ -1449,8 +1479,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
#define PG_STAT_STATEMENTS_COLS_V1_1 18
#define PG_STAT_STATEMENTS_COLS_V1_2 19
#define PG_STAT_STATEMENTS_COLS_V1_3 23
-#define PG_STAT_STATEMENTS_COLS_V1_8 29
-#define PG_STAT_STATEMENTS_COLS 29 /* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_8 32
+#define PG_STAT_STATEMENTS_COLS 32 /* maximum of above */
/*
* Retrieve statement statistics.
@@ -1786,6 +1816,23 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
values[i++] = Float8GetDatumFast(tmp.blk_read_time);
values[i++] = Float8GetDatumFast(tmp.blk_write_time);
}
+ if (api_version >= PGSS_V1_8)
+ {
+ char buf[256];
+ Datum wal_bytes;
+
+ snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
+
+ /* Convert to numeric. */
+ wal_bytes = DirectFunctionCall3(numeric_in,
+ CStringGetDatum(buf),
+ ObjectIdGetDatum(0),
+ Int32GetDatum(-1));
+
+ values[i++] = wal_bytes;
+ values[i++] = Int64GetDatumFast(tmp.wal_records);
+ values[i++] = Int64GetDatumFast(tmp.wal_num_fpw);
+ }
Assert(i == (api_version == PGSS_V1_0 ? PG_STAT_STATEMENTS_COLS_V1_0 :
api_version == PGSS_V1_1 ? PG_STAT_STATEMENTS_COLS_V1_1 :
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 435d51008f..eaacd4021a 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -101,6 +101,29 @@ SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+
+-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+
--
-- pg_stat_statements.track = none
--
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index b4df84c60b..3d26108649 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -264,6 +264,33 @@
</entry>
</row>
+ <row>
+ <entry><structfield>wal_bytes</structfield></entry>
+ <entry><type>numeric</type></entry>
+ <entry></entry>
+ <entry>
+ Total amount of WAL bytes generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL records generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_num_fpw</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL full page writes generated by the statement
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
--
2.20.1
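The pgss_planner() and pgss_ProcessUtility() hunks above both use the same pattern: copy the running counter before the work, then store "after minus before" into a zeroed struct. A minimal standalone model of that pattern follows. All names here (WalUsageSketch, running_usage, fake_wal_insert, sample_statement, measure_statement) are hypothetical, and wal_usage_accum_diff() only mirrors how WalUsageAccumDiff() is called in the diff; it is not the backend implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for WalUsage. */
typedef struct WalUsageSketch
{
    long     wal_records;
    long     wal_num_fpw;
    uint64_t wal_bytes;
} WalUsageSketch;

/* Stand-in for the backend-wide running counter (pgWalUsage in the patch). */
WalUsageSketch running_usage;

/* dst += add - sub, field by field. */
void
wal_usage_accum_diff(WalUsageSketch *dst, const WalUsageSketch *add,
                     const WalUsageSketch *sub)
{
    assert(dst != NULL && add != NULL && sub != NULL);
    dst->wal_records += add->wal_records - sub->wal_records;
    dst->wal_num_fpw += add->wal_num_fpw - sub->wal_num_fpw;
    dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
}

/* Pretend one WAL record of `size` bytes was inserted. */
void
fake_wal_insert(uint64_t size, int is_full_page_write)
{
    running_usage.wal_records++;
    running_usage.wal_bytes += size;
    if (is_full_page_write)
        running_usage.wal_num_fpw++;
}

/* A made-up "statement": one full page write plus two small records. */
void
sample_statement(void)
{
    fake_wal_insert(8192, 1);
    fake_wal_insert(64, 0);
    fake_wal_insert(64, 0);
}

/* Attribute to one statement only the WAL generated while it ran. */
WalUsageSketch
measure_statement(void (*stmt) (void))
{
    WalUsageSketch start = running_usage;   /* snapshot before */
    WalUsageSketch delta;

    stmt();

    memset(&delta, 0, sizeof(delta));
    wal_usage_accum_diff(&delta, &running_usage, &start);
    return delta;
}
```

Because the running counter is never reset, earlier activity in the same backend does not leak into the per-statement figures; only the difference between the two snapshots is stored.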
v11-0004-Expose-WAL-usage-counters-in-verbose-auto-vacuum.patch (text/plain; charset=us-ascii)
From aff0b78f60bfe182d6795531f74615810631f0e0 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 19 Mar 2020 16:08:47 +0100
Subject: [PATCH v11 4/4] Expose WAL usage counters in verbose (auto)vacuum
output.
Author: Julien Rouhaud
Reviewed-by: Fuji Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cc7e8521a5..735087dd74 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -410,6 +410,8 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
int nindexes;
PGRUsage ru0;
TimestampTz starttime = 0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
long secs;
int usecs;
double read_rate,
@@ -614,6 +616,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
TimestampDifference(starttime, endtime, &secs, &usecs);
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
read_rate = 0;
write_rate = 0;
if ((secs > 0) || (usecs > 0))
@@ -666,7 +671,13 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
(long long) VacuumPageDirty);
appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
read_rate, write_rate);
- appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf,
+ _("WAL usage: %ld records, %ld full page writes, "
+ UINT64_FORMAT " bytes"),
+ walusage.wal_records,
+ walusage.wal_num_fpw,
+ walusage.wal_bytes);
ereport(LOG,
(errmsg_internal("%s", buf.data)));
@@ -758,6 +769,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
IndexBulkDeleteResult **indstats;
int i;
PGRUsage ru0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
Buffer vmbuffer = InvalidBuffer;
BlockNumber next_unskippable_block;
bool skipping_blocks;
@@ -1727,6 +1740,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
"%u pages are entirely empty.\n",
empty_pages),
empty_pages);
+
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+ appendStringInfo(&buf, _("%ld WAL records, %ld WAL full page writes, "
+ UINT64_FORMAT " WAL bytes\n"),
+ walusage.wal_records,
+ walusage.wal_num_fpw,
+ walusage.wal_bytes);
+
appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
ereport(elevel,
--
2.20.1
On Thu, Apr 2, 2020 at 2:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Thu, Apr 02, 2020 at 11:07:29AM +0530, Amit Kapila wrote:
On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote:
3. Doing some testing with and without parallelism to ensure WAL usage
data is correct would be great and if possible, share the results?
I just saw that Dilip did some testing, but just in case here is some
additional one
- vacuum, after a truncate, loading 1M row and a "UPDATE t1 SET id = id"
=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%';
query | calls | wal_bytes | wal_records | wal_num_fpw
------------------------+-------+-----------+-------------+-------------
vacuum (parallel 3) t1 | 1 | 20098962 | 34104 | 2
vacuum (parallel 0) t1 | 1 | 20098962 | 34104 | 2
(2 rows)
- create index, overload t1's parallel_workers, using the 1M line just
vacuumed:
=# alter table t1 set (parallel_workers = 2);
ALTER TABLE
=# create index t1_parallel_2 on t1(id);
CREATE INDEX
=# alter table t1 set (parallel_workers = 0);
ALTER TABLE
=# create index t1_parallel_0 on t1(id);
CREATE INDEX
=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
query | calls | wal_bytes | wal_records | wal_num_fpw
--------------------------------------+-------+-----------+-------------+-------------
create index t1_parallel_0 on t1(id) | 1 | 20355540 | 2762 | 2745
create index t1_parallel_2 on t1(id) | 1 | 20406811 | 2762 | 2758
(2 rows)
It all looks good to me.
Here the wal_num_fpw and wal_bytes are different between parallel and
non-parallel versions. Is it due to checkpoint or something else? We
can probably rule out checkpoint by increasing checkpoint_timeout and
other checkpoint related parameters.
I think this is because I did a checkpoint after the VACUUM tests, so the 1st
CREATE INDEX (with parallelism) induced some FPW on the catalog blocks. I
didn't try to investigate more since:
We need to do this.
On Thu, Apr 02, 2020 at 11:22:16AM +0530, Amit Kapila wrote:
Also, I forgot to mention that let's not base this on buffer usage
patch for create index
(v10-0002-Allow-parallel-index-creation-to-accumulate-buff) because as
per recent discussion I am not sure about its usefulness. I think we
can proceed with this patch without
v10-0002-Allow-parallel-index-creation-to-accumulate-buff as well.
Which is done in attached v11.
Hmm, I haven't suggested removing the WAL usage from the parallel
create index. I just said not to use the infrastructure of another
patch. We bypass the buffer manager but do write WAL. See
_bt_blwritepage->log_newpage. So we need to accumulate WAL usage even
if we decide not to do anything about BufferUsage, which means we need
to investigate the above inconsistency in wal_num_fpw and wal_bytes
between the parallel and non-parallel versions.
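To make the accumulation requirement concrete, here is a small standalone model of what the leader has to do once the workers are done: each worker reports its own counters in a per-worker slot, and the leader folds every launched worker's slot into its total. WalUsageSketch, wal_usage_add() and accumulate_worker_usage() are hypothetical names for illustration only; the real patch routes this through InstrEndParallelQuery() and InstrAccumParallelQuery() over a shared-memory array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for WalUsage. */
typedef struct WalUsageSketch
{
    long     wal_records;
    long     wal_num_fpw;
    uint64_t wal_bytes;
} WalUsageSketch;

/* dst += add, field by field. */
void
wal_usage_add(WalUsageSketch *dst, const WalUsageSketch *add)
{
    assert(dst != NULL && add != NULL);
    dst->wal_records += add->wal_records;
    dst->wal_num_fpw += add->wal_num_fpw;
    dst->wal_bytes += add->wal_bytes;
}

/*
 * Leader side: fold the slots of all launched workers into the leader's
 * total.  This must only run after the workers have finished, otherwise
 * the slots may hold incomplete data.
 */
void
accumulate_worker_usage(WalUsageSketch *leader_total,
                        const WalUsageSketch *worker_slots,
                        int nworkers_launched)
{
    int i;

    assert(leader_total != NULL);
    assert(nworkers_launched == 0 || worker_slots != NULL);

    for (i = 0; i < nworkers_launched; i++)
        wal_usage_add(leader_total, &worker_slots[i]);
}
```

If this step is skipped, any WAL written by the workers is missing from the statement's totals, so the parallel figures would come out lower than the non-parallel ones for the same work.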
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 02, 2020 at 02:32:07PM +0530, Amit Kapila wrote:
Hmm, I haven't suggested removing the WAL usage from the parallel
create index. I just said not to use the infrastructure of another
patch. We bypass the buffer manager but do write WAL. See
_bt_blwritepage->log_newpage. So we need to accumulate WAL usage even
if we decide not to do anything about BufferUsage, which means we need
to investigate the above inconsistency in wal_num_fpw and wal_bytes
between the parallel and non-parallel versions.
Oh, I thought that you wanted to wait on that part, as we'll probably change
the parallel create index to report buffer access eventually.
v12 is attached, with an adaptation of Sawada-san's original patch that deals
only with WAL activity.
I did some more experiments, ensuring as much stability as possible:
=# create table t1(id integer);
CREATE TABLE
=# insert into t1 select * from generate_series(1, 1000000);
INSERT 0 1000000
=# select * from pg_stat_statements_reset() ;
pg_stat_statements_reset
--------------------------
(1 row)
=# alter table t1 set (parallel_workers = 0);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_0 ON t1(id);
CREATE INDEX
=# alter table t1 set (parallel_workers = 1);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_1 ON t1(id);
CREATE INDEX
=# alter table t1 set (parallel_workers = 2);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_2 ON t1(id);
CREATE INDEX
=# alter table t1 set (parallel_workers = 3);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_3 ON t1(id);
CREATE INDEX
=# alter table t1 set (parallel_workers = 4);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_4 ON t1(id);
CREATE INDEX
=# alter table t1 set (parallel_workers = 5);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_5 ON t1(id);
CREATE INDEX
=# alter table t1 set (parallel_workers = 6);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_6 ON t1(id);
CREATE INDEX
=# alter table t1 set (parallel_workers = 7);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_7 ON t1(id);
CREATE INDEX
=# alter table t1 set (parallel_workers = 8);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_8 ON t1(id);
CREATE INDEX
=# alter table t1 set (parallel_workers = 0);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_0_bis ON t1(id);
CREATE INDEX
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_0_ter ON t1(id);
CREATE INDEX
=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
query | calls | wal_bytes | wal_records | wal_num_fpw
----------------------------------------------+-------+-----------+-------------+-------------
create index t1_idx_parallel_0 ON t1(id) | 1 | 20389743 | 2762 | 2758
create index t1_idx_parallel_0_bis ON t1(id) | 1 | 20394391 | 2762 | 2758
create index t1_idx_parallel_0_ter ON t1(id) | 1 | 20395155 | 2762 | 2758
create index t1_idx_parallel_1 ON t1(id) | 1 | 20388335 | 2762 | 2758
create index t1_idx_parallel_2 ON t1(id) | 1 | 20389091 | 2762 | 2758
create index t1_idx_parallel_3 ON t1(id) | 1 | 20389847 | 2762 | 2758
create index t1_idx_parallel_4 ON t1(id) | 1 | 20390603 | 2762 | 2758
create index t1_idx_parallel_5 ON t1(id) | 1 | 20391359 | 2762 | 2758
create index t1_idx_parallel_6 ON t1(id) | 1 | 20392115 | 2762 | 2758
create index t1_idx_parallel_7 ON t1(id) | 1 | 20392871 | 2762 | 2758
create index t1_idx_parallel_8 ON t1(id) | 1 | 20393627 | 2762 | 2758
(11 rows)
=# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%';
relname | pg_relation_size
-----------------------+------------------
t1_idx_parallel_0 | 22487040
t1_idx_parallel_0_bis | 22487040
t1_idx_parallel_0_ter | 22487040
t1_idx_parallel_2 | 22487040
t1_idx_parallel_1 | 22487040
t1_idx_parallel_4 | 22487040
t1_idx_parallel_3 | 22487040
t1_idx_parallel_5 | 22487040
t1_idx_parallel_6 | 22487040
t1_idx_parallel_7 | 22487040
t1_idx_parallel_8 | 22487040
(11 rows)
So while the number of WAL records and full page images stays constant, we can
see some small fluctuations in the total amount of generated WAL data, even for
multiple executions of the sequential create index. I'm wondering if the
fluctuations are due to some other internal details or if the WalUsage support
is just completely broken (although I don't see any obvious issue ATM).
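To put numbers on the fluctuation, the wal_bytes column can be reordered by the sequence in which the CREATE INDEX statements were run (parallel_0, parallel_1 through parallel_8, then 0_bis and 0_ter) and differenced. The sketch below is only arithmetic on the figures reported above; the names wal_bytes_by_run, successive_deltas and spread are made up for the illustration, and it does not say anything about the cause.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* wal_bytes per CREATE INDEX, in the order the statements were executed. */
const int64_t wal_bytes_by_run[] = {
    20389743,   /* t1_idx_parallel_0 */
    20388335,   /* t1_idx_parallel_1 */
    20389091,   /* t1_idx_parallel_2 */
    20389847,   /* t1_idx_parallel_3 */
    20390603,   /* t1_idx_parallel_4 */
    20391359,   /* t1_idx_parallel_5 */
    20392115,   /* t1_idx_parallel_6 */
    20392871,   /* t1_idx_parallel_7 */
    20393627,   /* t1_idx_parallel_8 */
    20394391,   /* t1_idx_parallel_0_bis */
    20395155    /* t1_idx_parallel_0_ter */
};

#define NUM_RUNS (sizeof(wal_bytes_by_run) / sizeof(wal_bytes_by_run[0]))

/* out[i] = runs[i + 1] - runs[i]; out must have room for n - 1 entries. */
void
successive_deltas(const int64_t *runs, size_t n, int64_t *out)
{
    size_t i;

    assert(runs != NULL && out != NULL && n >= 2);
    for (i = 0; i + 1 < n; i++)
        out[i] = runs[i + 1] - runs[i];
}

/* Difference between the largest and smallest value. */
int64_t
spread(const int64_t *runs, size_t n)
{
    int64_t min, max;
    size_t i;

    assert(runs != NULL && n >= 1);
    min = max = runs[0];
    for (i = 1; i < n; i++)
    {
        if (runs[i] < min)
            min = runs[i];
        if (runs[i] > max)
            max = runs[i];
    }
    return max - min;
}
```

With these inputs the first delta is -1408 bytes, the next seven are 756 bytes each, and the last two are 764 bytes each, for an overall spread of 6820 bytes on roughly 20 MB. So after the first run the growth is steady from one run to the next rather than random, whatever the number of workers.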
Attachments:
v12-0001-Add-infrastructure-to-track-WAL-usage.patch (text/plain; charset=us-ascii)
From 694fe49a9973679ecda4a76b274ed135b753887e Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:41:50 +0100
Subject: [PATCH v12 1/4] Add infrastructure to track WAL usage.
Author: Kirill Bychik, Julien Rouhaud
Reviewed-by: Fuji Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 37 ++++++++++++-----
src/backend/access/nbtree/nbtsort.c | 40 +++++++++++++++++++
src/backend/access/transam/xlog.c | 12 +++++-
src/backend/access/transam/xloginsert.c | 13 ++++--
src/backend/executor/execParallel.c | 38 +++++++++++++-----
src/backend/executor/instrument.c | 53 ++++++++++++++++++++++---
src/include/access/xlog.h | 3 +-
src/include/executor/execParallel.h | 1 +
src/include/executor/instrument.h | 18 ++++++++-
src/tools/pgindent/typedefs.list | 1 +
10 files changed, 183 insertions(+), 33 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9f9596c718..cc7e8521a5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -139,6 +139,7 @@
#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
+#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -275,6 +276,9 @@ typedef struct LVParallelState
/* Points to buffer usage area in DSM */
BufferUsage *buffer_usage;
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
+
/*
* The number of indexes that support parallel index bulk-deletion and
* parallel index cleanup respectively.
@@ -2143,8 +2147,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
vacrelstats->dead_tuples, nindexes, vacrelstats);
/*
- * Next, accumulate buffer usage. (This must wait for the workers to
- * finish, or we might get incomplete data.)
+ * Next, accumulate buffer and WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
*/
if (nworkers > 0)
{
@@ -2154,7 +2158,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
WaitForParallelWorkersToFinish(lps->pcxt);
for (i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i]);
+ InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
/*
@@ -3171,6 +3175,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
LVShared *shared;
LVDeadTuples *dead_tuples;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
bool *can_parallel_vacuum;
long maxtuples;
char *sharedquery;
@@ -3255,15 +3260,19 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_estimate_keys(&pcxt->estimator, 1);
/*
- * Estimate space for BufferUsage -- PARALLEL_VACUUM_KEY_BUFFER_USAGE.
+ * Estimate space for BufferUsage and WalUsage --
+ * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
*
* If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgBufferUsage, so do
- * it unconditionally.
+ * have no way of knowing whether anyone's looking at pgBufferUsage or
+ * pgWalUsage, so do it unconditionally.
*/
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
/* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
@@ -3299,11 +3308,18 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
vacrelstats->dead_tuples = dead_tuples;
- /* Allocate space for each worker's BufferUsage; no need to initialize */
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
buffer_usage = shm_toc_allocate(pcxt->toc,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
lps->buffer_usage = buffer_usage;
+ wal_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
+ lps->wal_usage = wal_usage;
/* Store query string for workers */
sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
@@ -3435,6 +3451,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
LVShared *lvshared;
LVDeadTuples *dead_tuples;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
int nindexes;
char *sharedquery;
IndexBulkDeleteResult **stats;
@@ -3511,9 +3528,11 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
parallel_vacuum_index(indrels, stats, lvshared, dead_tuples, nindexes,
&vacrelstats);
- /* Report buffer usage during parallel execution */
+ /* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Pop the error context stack */
error_context_stack = errcallback.previous;
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 3924945664..3f4cb7d39e 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -67,6 +67,7 @@
#include "access/xloginsert.h"
#include "catalog/index.h"
#include "commands/progress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/smgr.h"
@@ -81,6 +82,7 @@
#define PARALLEL_KEY_TUPLESORT UINT64CONST(0xA000000000000002)
#define PARALLEL_KEY_TUPLESORT_SPOOL2 UINT64CONST(0xA000000000000003)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xA000000000000004)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xA000000000000005)
/*
* DISABLE_LEADER_PARTICIPATION disables the leader's participation in
@@ -203,6 +205,7 @@ typedef struct BTLeader
Sharedsort *sharedsort;
Sharedsort *sharedsort2;
Snapshot snapshot;
+ WalUsage *walusage;
} BTLeader;
/*
@@ -1476,6 +1479,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
Sharedsort *sharedsort2;
BTSpool *btspool = buildstate->spool;
BTLeader *btleader = (BTLeader *) palloc0(sizeof(BTLeader));
+ WalUsage *walusage;
bool leaderparticipates = true;
char *sharedquery;
int querylen;
@@ -1528,6 +1532,18 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
shm_toc_estimate_keys(&pcxt->estimator, 3);
}
+ /*
+ * Estimate space for WalUsage -- PARALLEL_KEY_WAL_USAGE
+ *
+ * WalUsage during execution of maintenance command can be used by an
+ * extension that reports the WAL usage, such as pg_stat_statements.
+ * We have no way of knowing whether anyone's looking at pgWalUsage,
+ * so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -1599,6 +1615,11 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
memcpy(sharedquery, debug_query_string, querylen + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, sharedquery);
+ /* Allocate space for each worker's WalUsage; no need to initialize */
+ walusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage);
+
/* Launch workers, saving status for leader/caller */
LaunchParallelWorkers(pcxt);
btleader->pcxt = pcxt;
@@ -1609,6 +1630,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
btleader->sharedsort = sharedsort;
btleader->sharedsort2 = sharedsort2;
btleader->snapshot = snapshot;
+ btleader->walusage = walusage;
/* If no workers were successfully launched, back out (do serial build) */
if (pcxt->nworkers_launched == 0)
@@ -1637,8 +1659,18 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
static void
_bt_end_parallel(BTLeader *btleader)
{
+ int i;
+
/* Shutdown worker processes */
WaitForParallelWorkersToFinish(btleader->pcxt);
+
+ /*
+ * Next, accumulate WAL usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < btleader->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(NULL, &btleader->walusage[i]);
+
/* Free last reference to MVCC snapshot, if one was used */
if (IsMVCCSnapshot(btleader->snapshot))
UnregisterSnapshot(btleader->snapshot);
@@ -1769,6 +1801,7 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
Relation indexRel;
LOCKMODE heapLockmode;
LOCKMODE indexLockmode;
+ WalUsage *walusage;
int sortmem;
#ifdef BTREE_BUILD_STATS
@@ -1830,11 +1863,18 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
tuplesort_attach_shared(sharedsort2, seg);
}
+ /* Prepare to track WAL usage during parallel execution */
+ InstrStartParallelQuery();
+
/* Perform sorting of spool, and possibly a spool2 */
sortmem = maintenance_work_mem / btshared->scantuplesortstates;
_bt_parallel_scan_and_sort(btspool, btspool2, btshared, sharedsort,
sharedsort2, sortmem, false);
+ /* Report WAL usage during parallel execution */
+ walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(NULL, &walusage[ParallelWorkerNumber]);
+
#ifdef BTREE_BUILD_STATS
if (log_btree_build_stats)
{
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 977d448f50..50b78f3143 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -43,6 +43,7 @@
#include "commands/progress.h"
#include "commands/tablespace.h"
#include "common/controldata_utils.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "pgstat.h"
@@ -996,7 +997,8 @@ static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
XLogRecPtr
XLogInsertRecord(XLogRecData *rdata,
XLogRecPtr fpw_lsn,
- uint8 flags)
+ uint8 flags,
+ int num_fpw)
{
XLogCtlInsert *Insert = &XLogCtl->Insert;
pg_crc32c rdata_crc;
@@ -1252,6 +1254,14 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
+ /* Report WAL traffic to the instrumentation. */
+ if (inserted)
+ {
+ pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+ pgWalUsage.wal_records++;
+ pgWalUsage.wal_num_fpw += num_fpw;
+ }
+
return EndPos;
}
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index a618dec776..5e032e7042 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -25,6 +25,7 @@
#include "access/xloginsert.h"
#include "catalog/pg_control.h"
#include "common/pg_lzcompress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "replication/origin.h"
@@ -108,7 +109,7 @@ static MemoryContext xloginsert_cxt;
static XLogRecData *XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
- XLogRecPtr *fpw_lsn);
+ XLogRecPtr *fpw_lsn, int *num_fpw);
static bool XLogCompressBackupBlock(char *page, uint16 hole_offset,
uint16 hole_length, char *dest, uint16 *dlen);
@@ -448,6 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info)
bool doPageWrites;
XLogRecPtr fpw_lsn;
XLogRecData *rdt;
+ int num_fpw = 0;
/*
* Get values needed to decide whether to do full-page writes. Since
@@ -457,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info)
GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);
rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
- &fpw_lsn);
+ &fpw_lsn, &num_fpw);
- EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags);
+ EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw);
} while (EndPos == InvalidXLogRecPtr);
XLogResetInsertion();
@@ -482,7 +484,7 @@ XLogInsert(RmgrId rmid, uint8 info)
static XLogRecData *
XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
- XLogRecPtr *fpw_lsn)
+ XLogRecPtr *fpw_lsn, int *num_fpw)
{
XLogRecData *rdt;
uint32 total_len = 0;
@@ -635,6 +637,9 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
*/
bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
+ /* Report a full page image constructed for the WAL record */
+ *num_fpw += 1;
+
/*
* Construct XLogRecData entries for the page content.
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..7d9ca66fc8 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -12,7 +12,7 @@
* workers and ensuring that their state generally matches that of the
* leader; see src/backend/access/transam/README.parallel for details.
* However, we must save and restore relevant executor state, such as
- * any ParamListInfo associated with the query, buffer usage info, and
+ * any ParamListInfo associated with the query, buffer/WAL usage info, and
* the actual plan to be passed down to the worker.
*
* IDENTIFICATION
@@ -62,6 +62,7 @@
#define PARALLEL_KEY_DSA UINT64CONST(0xE000000000000007)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -573,6 +574,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
char *pstmt_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
+ WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
@@ -646,6 +648,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Same thing for WalUsage.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for tuple queues. */
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers));
@@ -728,6 +737,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufusage_space);
pei->buffer_usage = bufusage_space;
+ /* Same for WalUsage. */
+ walusage_space = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
+ pei->wal_usage = walusage_space;
+
/* Set up the tuple queues that the workers will write into. */
pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
@@ -1069,7 +1084,7 @@ ExecParallelRetrieveJitInstrumentation(PlanState *planstate,
/*
* Finish parallel execution. We wait for parallel workers to finish, and
- * accumulate their buffer usage.
+ * accumulate their buffer/WAL usage.
*/
void
ExecParallelFinish(ParallelExecutorInfo *pei)
@@ -1109,11 +1124,11 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
WaitForParallelWorkersToFinish(pei->pcxt);
/*
- * Next, accumulate buffer usage. (This must wait for the workers to
- * finish, or we might get incomplete data.)
+ * Next, accumulate buffer/WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
*/
for (i = 0; i < nworkers; i++)
- InstrAccumParallelQuery(&pei->buffer_usage[i]);
+ InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]);
pei->finished = true;
}
@@ -1333,6 +1348,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
{
FixedParallelExecutorState *fpes;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
DestReceiver *receiver;
QueryDesc *queryDesc;
SharedExecutorInstrumentation *instrumentation;
@@ -1386,11 +1402,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
ExecSetTupleBound(fpes->tuples_needed, queryDesc->planstate);
/*
- * Prepare to track buffer usage during query execution.
+ * Prepare to track buffer/WAL usage during query execution.
*
* We do this after starting up the executor to match what happens in the
- * leader, which also doesn't count buffer accesses that occur during
- * executor startup.
+ * leader, which also doesn't count buffer accesses and WAL activity that
+ * occur during executor startup.
*/
InstrStartParallelQuery();
@@ -1406,9 +1422,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Shut down the executor */
ExecutorFinish(queryDesc);
- /* Report buffer usage during parallel execution. */
+ /* Report buffer/WAL usage during parallel execution. */
buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Report instrumentation data if any instrumentation options are set. */
if (instrumentation != NULL)
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 042e10f96b..a77571a895 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -19,8 +19,11 @@
BufferUsage pgBufferUsage;
static BufferUsage save_pgBufferUsage;
+WalUsage pgWalUsage;
+static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
/* Allocate new instrumentation structure(s) */
@@ -31,15 +34,17 @@ InstrAlloc(int n, int instrument_options)
/* initialize all fields to zeroes, then modify as needed */
instr = palloc0(n * sizeof(Instrumentation));
- if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER))
+ if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER | INSTRUMENT_WAL))
{
bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
bool need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
int i;
for (i = 0; i < n; i++)
{
instr[i].need_bufusage = need_buffers;
+ instr[i].need_walusage = need_wal;
instr[i].need_timer = need_timer;
}
}
@@ -53,6 +58,7 @@ InstrInit(Instrumentation *instr, int instrument_options)
{
memset(instr, 0, sizeof(Instrumentation));
instr->need_bufusage = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ instr->need_walusage = (instrument_options & INSTRUMENT_WAL) != 0;
instr->need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
}
@@ -67,6 +73,9 @@ InstrStartNode(Instrumentation *instr)
/* save buffer usage totals at node entry, if needed */
if (instr->need_bufusage)
instr->bufusage_start = pgBufferUsage;
+
+ if (instr->need_walusage)
+ instr->walusage_start = pgWalUsage;
}
/* Exit from a plan node */
@@ -95,6 +104,10 @@ InstrStopNode(Instrumentation *instr, double nTuples)
BufferUsageAccumDiff(&instr->bufusage,
&pgBufferUsage, &instr->bufusage_start);
+ if (instr->need_walusage)
+ WalUsageAccumDiff(&instr->walusage,
+ &pgWalUsage, &instr->walusage_start);
+
/* Is this the first tuple of this cycle? */
if (!instr->running)
{
@@ -158,6 +171,9 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
/* Add delta of buffer usage since entry to node's totals */
if (dst->need_bufusage)
BufferUsageAdd(&dst->bufusage, &add->bufusage);
+
+ if (dst->need_walusage)
+ WalUsageAdd(&dst->walusage, &add->walusage);
}
/* note current values during parallel executor startup */
@@ -165,21 +181,29 @@ void
InstrStartParallelQuery(void)
{
save_pgBufferUsage = pgBufferUsage;
+ save_pgWalUsage = pgWalUsage;
}
/* report usage after parallel executor shutdown */
void
-InstrEndParallelQuery(BufferUsage *result)
+InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- memset(result, 0, sizeof(BufferUsage));
- BufferUsageAccumDiff(result, &pgBufferUsage, &save_pgBufferUsage);
+ if (bufusage)
+ {
+ memset(bufusage, 0, sizeof(BufferUsage));
+ BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
+ }
+ memset(walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
}
/* accumulate work done by workers in leader's stats */
void
-InstrAccumParallelQuery(BufferUsage *result)
+InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- BufferUsageAdd(&pgBufferUsage, result);
+ if (bufusage)
+ BufferUsageAdd(&pgBufferUsage, bufusage);
+ WalUsageAdd(&pgWalUsage, walusage);
}
/* dst += add */
@@ -221,3 +245,20 @@ BufferUsageAccumDiff(BufferUsage *dst,
INSTR_TIME_ACCUM_DIFF(dst->blk_write_time,
add->blk_write_time, sub->blk_write_time);
}
+
+/* helper functions for WAL usage accumulation */
+static void
+WalUsageAdd(WalUsage *dst, WalUsage *add)
+{
+ dst->wal_bytes += add->wal_bytes;
+ dst->wal_records += add->wal_records;
+ dst->wal_num_fpw += add->wal_num_fpw;
+}
+
+void
+WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
+{
+ dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
+ dst->wal_records += add->wal_records - sub->wal_records;
+ dst->wal_num_fpw += add->wal_num_fpw - sub->wal_num_fpw;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 9ec7b31cce..b91e724b2d 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -259,7 +259,8 @@ struct XLogRecData;
extern XLogRecPtr XLogInsertRecord(struct XLogRecData *rdata,
XLogRecPtr fpw_lsn,
- uint8 flags);
+ uint8 flags,
+ int num_fpw);
extern void XLogFlush(XLogRecPtr RecPtr);
extern bool XLogBackgroundFlush(void);
extern bool XLogNeedsFlush(XLogRecPtr RecPtr);
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf020..1cc5b524fd 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -26,6 +26,7 @@ typedef struct ParallelExecutorInfo
PlanState *planstate; /* plan subtree we're running in parallel */
ParallelContext *pcxt; /* parallel context we're using */
BufferUsage *buffer_usage; /* points to bufusage area in DSM */
+ WalUsage *wal_usage; /* points to walusage area in DSM */
SharedExecutorInstrumentation *instrumentation; /* optional */
struct SharedJitInstrumentation *jit_instrumentation; /* optional */
dsa_area *area; /* points to DSA area in DSM */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 3825a5ac1f..e8875a8e9b 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -32,12 +32,20 @@ typedef struct BufferUsage
instr_time blk_write_time; /* time spent writing */
} BufferUsage;
+typedef struct WalUsage
+{
+ long wal_records; /* # of WAL records produced */
+ long wal_num_fpw; /* # of WAL full page images produced */
+ uint64 wal_bytes; /* size of WAL records produced */
+} WalUsage;
+
/* Flag bits included in InstrAlloc's instrument_options bitmask */
typedef enum InstrumentOption
{
INSTRUMENT_TIMER = 1 << 0, /* needs timer (and row counts) */
INSTRUMENT_BUFFERS = 1 << 1, /* needs buffer usage */
INSTRUMENT_ROWS = 1 << 2, /* needs row count */
+ INSTRUMENT_WAL = 1 << 3, /* needs WAL usage */
INSTRUMENT_ALL = PG_INT32_MAX
} InstrumentOption;
@@ -46,6 +54,7 @@ typedef struct Instrumentation
/* Parameters set at node creation: */
bool need_timer; /* true if we need timer data */
bool need_bufusage; /* true if we need buffer usage data */
+ bool need_walusage; /* true if we need WAL usage data */
/* Info about current plan cycle: */
bool running; /* true if we've completed first tuple */
instr_time starttime; /* start time of current iteration of node */
@@ -53,6 +62,7 @@ typedef struct Instrumentation
double firsttuple; /* time for first tuple of this cycle */
double tuplecount; /* # of tuples emitted so far this cycle */
BufferUsage bufusage_start; /* buffer usage at start */
+ WalUsage walusage_start; /* WAL usage at start */
/* Accumulated statistics across all completed cycles: */
double startup; /* total startup time (in seconds) */
double total; /* total time (in seconds) */
@@ -62,6 +72,7 @@ typedef struct Instrumentation
double nfiltered1; /* # of tuples removed by scanqual or joinqual */
double nfiltered2; /* # of tuples removed by "other" quals */
BufferUsage bufusage; /* total buffer usage */
+ WalUsage walusage; /* total WAL usage */
} Instrumentation;
typedef struct WorkerInstrumentation
@@ -71,6 +82,7 @@ typedef struct WorkerInstrumentation
} WorkerInstrumentation;
extern PGDLLIMPORT BufferUsage pgBufferUsage;
+extern PGDLLIMPORT WalUsage pgWalUsage;
extern Instrumentation *InstrAlloc(int n, int instrument_options);
extern void InstrInit(Instrumentation *instr, int instrument_options);
@@ -79,9 +91,11 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
-extern void InstrEndParallelQuery(BufferUsage *result);
-extern void InstrAccumParallelQuery(BufferUsage *result);
+extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
+extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
extern void BufferUsageAccumDiff(BufferUsage *dst,
const BufferUsage *add, const BufferUsage *sub);
+extern void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add,
+ const WalUsage *sub);
#endif /* INSTRUMENT_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 939de985d3..34623523a7 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2643,6 +2643,7 @@ WalSndCtlData
WalSndSendDataCallback
WalSndState
WalTimeSample
+WalUsage
WalWriteMethod
Walfile
WindowAgg
--
2.20.1
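For reviewers who want to see the accounting idea in isolation, here is a minimal standalone sketch of the snapshot-and-diff pattern that the patch applies in InstrStartNode/InstrStopNode. It is not backend code: the struct and WalUsageAccumDiff mirror the patch, but the file-local pgWalUsage and the demo_* helpers are hypothetical stand-ins made up for illustration.

```c
#include <stdint.h>

/* Simplified stand-in for the patch's WalUsage struct (same field names). */
typedef struct WalUsage
{
	long		wal_records;	/* # of WAL records produced */
	long		wal_num_fpw;	/* # of WAL full page images produced */
	uint64_t	wal_bytes;		/* size of WAL records produced */
} WalUsage;

/* Stand-in for the backend-global counter; in the patch this lives in instrument.c. */
static WalUsage pgWalUsage;

/* dst += add - sub, same arithmetic as WalUsageAccumDiff() in the patch. */
static void
WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
{
	dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
	dst->wal_records += add->wal_records - sub->wal_records;
	dst->wal_num_fpw += add->wal_num_fpw - sub->wal_num_fpw;
}

/* Hypothetical helper: bump the counters the way XLogInsertRecord() does. */
static void
demo_insert_record(uint32_t tot_len, int num_fpw)
{
	pgWalUsage.wal_bytes += tot_len;
	pgWalUsage.wal_records++;
	pgWalUsage.wal_num_fpw += num_fpw;
}

/* Hypothetical helper: attribute WAL to one "node" that inserts two records. */
static WalUsage
demo_measure_node(void)
{
	WalUsage	start = pgWalUsage; /* snapshot at node entry */
	WalUsage	total = {0, 0, 0};

	demo_insert_record(120, 0);		/* plain record */
	demo_insert_record(8300, 1);	/* record carrying one full page image */

	/* at node exit, add only what happened since the snapshot */
	WalUsageAccumDiff(&total, &pgWalUsage, &start);
	return total;
}
```

Because only the difference against the saved snapshot is accumulated, WAL written before the node started is not charged to it. The parallel-worker path appears to rely on the same arithmetic, with save_pgWalUsage playing the role of the snapshot.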
v12-0002-Add-option-to-report-WAL-usage-in-EXPLAIN-and-au.patch (text/plain; charset=us-ascii)
From 07cdbb8dd2118ee9d4cfce29a3b596c9def476e8 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Sun, 29 Mar 2020 12:38:14 +0200
Subject: [PATCH v12 2/4] Add option to report WAL usage in EXPLAIN and
auto_explain.
Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 15 ++++++
doc/src/sgml/auto-explain.sgml | 20 ++++++++
doc/src/sgml/ref/explain.sgml | 14 ++++++
src/backend/commands/explain.c | 74 +++++++++++++++++++++++++++--
src/bin/psql/tab-complete.c | 4 +-
src/include/commands/explain.h | 3 ++
6 files changed, 124 insertions(+), 6 deletions(-)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index f69dde876c..56c549d84c 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -27,6 +27,7 @@ static int auto_explain_log_min_duration = -1; /* msec or -1 */
static bool auto_explain_log_analyze = false;
static bool auto_explain_log_verbose = false;
static bool auto_explain_log_buffers = false;
+static bool auto_explain_log_wal = false;
static bool auto_explain_log_triggers = false;
static bool auto_explain_log_timing = true;
static bool auto_explain_log_settings = false;
@@ -148,6 +149,17 @@ _PG_init(void)
NULL,
NULL);
+ DefineCustomBoolVariable("auto_explain.log_wal",
+ "Log WAL usage.",
+ NULL,
+ &auto_explain_log_wal,
+ false,
+ PGC_SUSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
+
DefineCustomBoolVariable("auto_explain.log_triggers",
"Include trigger statistics in plans.",
"This has no effect unless log_analyze is also set.",
@@ -280,6 +292,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
queryDesc->instrument_options |= INSTRUMENT_ROWS;
if (auto_explain_log_buffers)
queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (auto_explain_log_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
}
}
@@ -374,6 +388,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
es->analyze = (queryDesc->instrument_options && auto_explain_log_analyze);
es->verbose = auto_explain_log_verbose;
es->buffers = (es->analyze && auto_explain_log_buffers);
+ es->wal = (es->analyze && auto_explain_log_wal);
es->timing = (es->analyze && auto_explain_log_timing);
es->summary = es->analyze;
es->format = auto_explain_log_format;
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index 3d619d4a3d..d4d29c4a64 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -109,6 +109,26 @@ LOAD 'auto_explain';
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>
+ <varname>auto_explain.log_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>auto_explain.log_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ <varname>auto_explain.log_wal</varname> controls whether WAL
+ usage statistics are printed when an execution plan is logged; it's
+ equivalent to the <literal>WAL</literal> option of <command>EXPLAIN</command>.
+ This parameter has no effect
+ unless <varname>auto_explain.log_analyze</varname> is enabled.
+ This parameter is off by default.
+ Only superusers can change this setting.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term>
<varname>auto_explain.log_timing</varname> (<type>boolean</type>)
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 385d10411f..494e60ecc9 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -41,6 +41,7 @@ EXPLAIN [ ANALYZE ] [ VERBOSE ] <replaceable class="parameter">statement</replac
COSTS [ <replaceable class="parameter">boolean</replaceable> ]
SETTINGS [ <replaceable class="parameter">boolean</replaceable> ]
BUFFERS [ <replaceable class="parameter">boolean</replaceable> ]
+ WAL [ <replaceable class="parameter">boolean</replaceable> ]
TIMING [ <replaceable class="parameter">boolean</replaceable> ]
SUMMARY [ <replaceable class="parameter">boolean</replaceable> ]
FORMAT { TEXT | XML | JSON | YAML }
@@ -192,6 +193,19 @@ ROLLBACK;
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>WAL</literal></term>
+ <listitem>
+ <para>
+ Include information on WAL record generation. Specifically, include the
+ number of records, number of full page images and amount of WAL bytes
+ generated. In text format, only non-zero values are printed. This
+ parameter may only be used when <literal>ANALYZE</literal> is also
+ enabled. It defaults to <literal>FALSE</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>TIMING</literal></term>
<listitem>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ee0e638f33..cefe2144e5 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -113,6 +113,7 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
+static void show_wal_usage(ExplainState *es, const WalUsage *usage);
static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
ExplainState *es);
static void ExplainScanTarget(Scan *plan, ExplainState *es);
@@ -175,6 +176,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
es->costs = defGetBoolean(opt);
else if (strcmp(opt->defname, "buffers") == 0)
es->buffers = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "wal") == 0)
+ es->wal = defGetBoolean(opt);
else if (strcmp(opt->defname, "settings") == 0)
es->settings = defGetBoolean(opt);
else if (strcmp(opt->defname, "timing") == 0)
@@ -219,6 +222,11 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("EXPLAIN option BUFFERS requires ANALYZE")));
+ if (es->wal && !es->analyze)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("EXPLAIN option WAL requires ANALYZE")));
+
/* if the timing was not set explicitly, set default value */
es->timing = (timing_set) ? es->timing : es->analyze;
@@ -494,6 +502,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
if (es->buffers)
instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
/*
* We always collect timing for the entire statement, even when node-level
@@ -1942,12 +1952,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
}
- /* Show buffer usage */
+ /* Show buffer/WAL usage */
if (es->buffers && planstate->instrument)
show_buffer_usage(es, &planstate->instrument->bufusage);
+ if (es->wal && planstate->instrument)
+ show_wal_usage(es, &planstate->instrument->walusage);
- /* Prepare per-worker buffer usage */
- if (es->workers_state && es->buffers && es->verbose)
+ /* Prepare per-worker buffer/WAL usage */
+ if (es->workers_state && (es->buffers || es->wal) && es->verbose)
{
WorkerInstrumentation *w = planstate->worker_instrument;
@@ -1960,7 +1972,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
continue;
ExplainOpenWorker(n, es);
- show_buffer_usage(es, &instrument->bufusage);
+ if (es->buffers)
+ show_buffer_usage(es, &instrument->bufusage);
+ if (es->wal)
+ show_wal_usage(es, &instrument->walusage);
ExplainCloseWorker(n, es);
}
}
@@ -3059,6 +3074,44 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
}
}
+/*
+ * Show WAL usage details.
+ */
+static void
+show_wal_usage(ExplainState *es, const WalUsage *usage)
+{
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ /* Show only positive counter values. */
+ if ((usage->wal_records > 0) || (usage->wal_num_fpw > 0) ||
+ (usage->wal_bytes > 0))
+ {
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "WAL:");
+
+ if (usage->wal_records > 0)
+ appendStringInfo(es->str, " records=%ld",
+ usage->wal_records);
+ if (usage->wal_num_fpw > 0)
+ appendStringInfo(es->str, " full page writes=%ld",
+ usage->wal_num_fpw);
+ if (usage->wal_bytes > 0)
+ appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
+ usage->wal_bytes);
+ appendStringInfoChar(es->str, '\n');
+ }
+ }
+ else
+ {
+ ExplainPropertyInteger("WAL records", NULL,
+ usage->wal_records, es);
+ ExplainPropertyInteger("WAL full page writes", NULL,
+ usage->wal_num_fpw, es);
+ ExplainPropertyUInteger("WAL bytes", NULL,
+ usage->wal_bytes, es);
+ }
+}
+
/*
* Add some additional details about an IndexScan or IndexOnlyScan
*/
@@ -3843,6 +3896,19 @@ ExplainPropertyInteger(const char *qlabel, const char *unit, int64 value,
ExplainProperty(qlabel, unit, buf, true, es);
}
+/*
+ * Explain an unsigned integer-valued property.
+ */
+void
+ExplainPropertyUInteger(const char *qlabel, const char *unit, uint64 value,
+ ExplainState *es)
+{
+ char buf[32];
+
+ snprintf(buf, sizeof(buf), UINT64_FORMAT, value);
+ ExplainProperty(qlabel, unit, buf, true, es);
+}
+
/*
* Explain a float-valued property, using the specified number of
* fractional digits.
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5fec59723c..0e7a373caf 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3045,8 +3045,8 @@ psql_completion(const char *text, int start, int end)
*/
if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
COMPLETE_WITH("ANALYZE", "VERBOSE", "COSTS", "SETTINGS",
- "BUFFERS", "TIMING", "SUMMARY", "FORMAT");
- else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|BUFFERS|TIMING|SUMMARY"))
+ "BUFFERS", "WAL", "TIMING", "SUMMARY", "FORMAT");
+ else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|BUFFERS|WAL|TIMING|SUMMARY"))
COMPLETE_WITH("ON", "OFF");
else if (TailMatches("FORMAT"))
COMPLETE_WITH("TEXT", "XML", "JSON", "YAML");
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 54f6240e5e..7b0b0a94a6 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -42,6 +42,7 @@ typedef struct ExplainState
bool analyze; /* print actual times */
bool costs; /* print estimated costs */
bool buffers; /* print buffer usage */
+ bool wal; /* print WAL usage */
bool timing; /* print detailed node timing */
bool summary; /* print total planning and execution timing */
bool settings; /* print modified settings */
@@ -110,6 +111,8 @@ extern void ExplainPropertyText(const char *qlabel, const char *value,
ExplainState *es);
extern void ExplainPropertyInteger(const char *qlabel, const char *unit,
int64 value, ExplainState *es);
+extern void ExplainPropertyUInteger(const char *qlabel, const char *unit,
+ uint64 value, ExplainState *es);
extern void ExplainPropertyFloat(const char *qlabel, const char *unit,
double value, int ndigits, ExplainState *es);
extern void ExplainPropertyBool(const char *qlabel, bool value,
--
2.20.1
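For readers skimming the diff, the text-format rule in show_wal_usage above (emit the "WAL:" line only when at least one counter is positive, and then only the positive counters) can be sketched outside the backend. This is a minimal standalone sketch, not the patch's code: DemoWalUsage, DEMO_WAL_LINE_MAX and format_wal_line are hypothetical names, plain snprintf stands in for appendStringInfo, PRIu64 stands in for UINT64_FORMAT, and the trailing newline is left out.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the patch's WalUsage struct. */
typedef struct DemoWalUsage
{
    long        wal_records;    /* # of WAL records */
    long        wal_num_fpw;    /* # of full page writes */
    uint64_t    wal_bytes;      /* WAL bytes generated */
} DemoWalUsage;

/* Large enough for the label plus three 64-bit counters. */
#define DEMO_WAL_LINE_MAX 128

/*
 * Build the text-format line in buf, printing only positive counters.
 * Returns the number of characters written, or 0 if nothing is printed
 * (all counters zero, or buffer smaller than DEMO_WAL_LINE_MAX).
 */
static int
format_wal_line(char *buf, size_t size, const DemoWalUsage *usage)
{
    int         len = 0;

    assert(buf != NULL && usage != NULL);
    if (size > 0)
        buf[0] = '\0';
    if (size < DEMO_WAL_LINE_MAX)
        return 0;
    if (usage->wal_records <= 0 && usage->wal_num_fpw <= 0 &&
        usage->wal_bytes == 0)
        return 0;

    len += snprintf(buf + len, size - len, "WAL:");
    if (usage->wal_records > 0)
        len += snprintf(buf + len, size - len, " records=%ld",
                        usage->wal_records);
    if (usage->wal_num_fpw > 0)
        len += snprintf(buf + len, size - len, " full page writes=%ld",
                        usage->wal_num_fpw);
    if (usage->wal_bytes > 0)
        len += snprintf(buf + len, size - len, " bytes=%" PRIu64,
                        usage->wal_bytes);
    return len;
}
```

The non-text formats in the patch take the other branch and always emit all three properties, so the zero-suppression only applies to the text output.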
v12-0003-Keep-track-of-WAL-usage-in-pg_stat_statements.patch (text/plain; charset=us-ascii)
From ab9698684cc31df460b0e6993636fa4359d78402 Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:42:02 +0100
Subject: [PATCH v12 3/4] Keep track of WAL usage in pg_stat_statements.
Author: Kirill Bychik
Reviewed-by: Julien Rouhaud, Fuji Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
.../expected/pg_stat_statements.out | 39 +++++++++++++
.../pg_stat_statements--1.7--1.8.sql | 5 +-
.../pg_stat_statements/pg_stat_statements.c | 55 +++++++++++++++++--
.../sql/pg_stat_statements.sql | 23 ++++++++
doc/src/sgml/pgstatstatements.sgml | 27 +++++++++
5 files changed, 144 insertions(+), 5 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 45dbe9e677..02da7245b4 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -211,6 +211,45 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
UPDATE test SET b = $1 WHERE a > $2 | 1 | 3
(10 rows)
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset
+--------------------------
+
+(1 row)
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_as_rows
+-----------------------------------------------------------+-------+------+---------------------+-----------------------+---------------------
+ DELETE FROM pgss_test WHERE a > $1 | 1 | 1 | t | t | t
+ DROP TABLE pgss_test | 1 | 0 | t | t | f
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | 1 | 10 | t | t | t
+ SELECT pg_stat_statements_reset() | 1 | 1 | f | f | f
+ SELECT query, calls, rows, +| 0 | 0 | f | f | t
+ wal_bytes > $1 as wal_bytes_generated, +| | | | |
+ wal_records > $2 as wal_records_generated, +| | | | |
+ wal_records = rows as wal_records_as_rows +| | | | |
+ FROM pg_stat_statements ORDER BY query COLLATE "C" | | | | |
+ SET pg_stat_statements.track_utility = FALSE | 1 | 0 | f | f | t
+ UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 | 3 | t | t | t
+(7 rows)
+
--
-- pg_stat_statements.track = none
--
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
index 60d454db7f..2fcf7aee01 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -41,7 +41,10 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean,
OUT temp_blks_read int8,
OUT temp_blks_written int8,
OUT blk_read_time float8,
- OUT blk_write_time float8
+ OUT blk_write_time float8,
+ OUT wal_bytes numeric,
+ OUT wal_records int8,
+ OUT wal_num_fpw int8
)
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'pg_stat_statements_1_8'
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 942922b01f..f8bf4f852a 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -185,6 +185,9 @@ typedef struct Counters
int64 local_blks_written; /* # of local disk blocks written */
int64 temp_blks_read; /* # of temp blocks read */
int64 temp_blks_written; /* # of temp blocks written */
+ uint64 wal_bytes; /* total amount of WAL bytes generated */
+ int64 wal_records; /* # of WAL records generated */
+ int64 wal_num_fpw; /* # of WAL full page image generated */
double blk_read_time; /* time spent reading, in msec */
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
@@ -348,6 +351,7 @@ static void pgss_store(const char *query, uint64 queryId,
pgssStoreKind kind,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate);
static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
pgssVersion api_version,
@@ -891,6 +895,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
0,
0,
NULL,
+ NULL,
&jstate);
}
@@ -926,9 +931,16 @@ pgss_planner(Query *parse,
instr_time duration;
BufferUsage bufusage_start,
bufusage;
+ WalUsage walusage_start,
+ walusage;
/* We need to track buffer usage as the planner can access them. */
bufusage_start = pgBufferUsage;
+ /*
+ * Similarly the planner could write some WAL records in some cases
+ * (e.g. setting a hint bit with those being WAL-logged)
+ */
+ walusage_start = pgWalUsage;
INSTR_TIME_SET_CURRENT(start);
plan_nested_level++;
@@ -954,6 +966,10 @@ pgss_planner(Query *parse,
memset(&bufusage, 0, sizeof(BufferUsage));
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+ /* calc differences of WAL counters. */
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
pgss_store(query_string,
parse->queryId,
parse->stmt_location,
@@ -962,6 +978,7 @@ pgss_planner(Query *parse,
INSTR_TIME_GET_MILLISEC(duration),
0,
&bufusage,
+ &walusage,
NULL);
}
else
@@ -1079,6 +1096,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
queryDesc->totaltime->total * 1000.0, /* convert to msec */
queryDesc->estate->es_processed,
&queryDesc->totaltime->bufusage,
+ &queryDesc->totaltime->walusage,
NULL);
}
@@ -1123,8 +1141,10 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
uint64 rows;
BufferUsage bufusage_start,
bufusage;
-
+ WalUsage walusage_start,
+ walusage;
bufusage_start = pgBufferUsage;
+ walusage_start = pgWalUsage;
INSTR_TIME_SET_CURRENT(start);
exec_nested_level++;
@@ -1154,6 +1174,10 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
memset(&bufusage, 0, sizeof(BufferUsage));
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+ /* calc differences of WAL counters. */
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
pgss_store(queryString,
0, /* signal that it's a utility stmt */
pstmt->stmt_location,
@@ -1162,6 +1186,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
INSTR_TIME_GET_MILLISEC(duration),
rows,
&bufusage,
+ &walusage,
NULL);
}
else
@@ -1197,7 +1222,8 @@ pgss_hash_string(const char *str, int len)
*
* If jstate is not NULL then we're trying to create an entry for which
* we have no statistics as yet; we just want to record the normalized
- * query string. total_time, rows, bufusage are ignored in this case.
+ * query string. total_time, rows, bufusage and walusage are ignored in this
+ * case.
*
* If kind is PGSS_PLAN or PGSS_EXEC, its value is used as the array position
* for the arrays in the Counters field.
@@ -1208,6 +1234,7 @@ pgss_store(const char *query, uint64 queryId,
pgssStoreKind kind,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate)
{
pgssHashKey key;
@@ -1402,6 +1429,9 @@ pgss_store(const char *query, uint64 queryId,
e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
+ e->counters.wal_bytes += walusage->wal_bytes;
+ e->counters.wal_records += walusage->wal_records;
+ e->counters.wal_num_fpw += walusage->wal_num_fpw;
SpinLockRelease(&e->mutex);
}
@@ -1449,8 +1479,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
#define PG_STAT_STATEMENTS_COLS_V1_1 18
#define PG_STAT_STATEMENTS_COLS_V1_2 19
#define PG_STAT_STATEMENTS_COLS_V1_3 23
-#define PG_STAT_STATEMENTS_COLS_V1_8 29
-#define PG_STAT_STATEMENTS_COLS 29 /* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_8 32
+#define PG_STAT_STATEMENTS_COLS 32 /* maximum of above */
/*
* Retrieve statement statistics.
@@ -1786,6 +1816,23 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
values[i++] = Float8GetDatumFast(tmp.blk_read_time);
values[i++] = Float8GetDatumFast(tmp.blk_write_time);
}
+ if (api_version >= PGSS_V1_8)
+ {
+ char buf[256];
+ Datum wal_bytes;
+
+ snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
+
+ /* Convert to numeric. */
+ wal_bytes = DirectFunctionCall3(numeric_in,
+ CStringGetDatum(buf),
+ ObjectIdGetDatum(0),
+ Int32GetDatum(-1));
+
+ values[i++] = wal_bytes;
+ values[i++] = Int64GetDatumFast(tmp.wal_records);
+ values[i++] = Int64GetDatumFast(tmp.wal_num_fpw);
+ }
Assert(i == (api_version == PGSS_V1_0 ? PG_STAT_STATEMENTS_COLS_V1_0 :
api_version == PGSS_V1_1 ? PG_STAT_STATEMENTS_COLS_V1_1 :
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 435d51008f..eaacd4021a 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -101,6 +101,29 @@ SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+
+-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+
--
-- pg_stat_statements.track = none
--
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index b4df84c60b..3d26108649 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -264,6 +264,33 @@
</entry>
</row>
+ <row>
+ <entry><structfield>wal_bytes</structfield></entry>
+ <entry><type>numeric</type></entry>
+ <entry></entry>
+ <entry>
+ Total amount of WAL bytes generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL records generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_num_fpw</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL full page writes generated by the statement
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
--
2.20.1
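The pg_stat_statements change above exposes wal_bytes as numeric by first printing the uint64 counter into a decimal string and then handing that string to numeric_in. The sketch below illustrates only the first half, plus the reason a signed 64-bit column would not cover the full range. It is a standalone illustration under stated assumptions: uint64_to_decimal and exceeds_int64 are hypothetical helpers, and PRIu64 stands in for UINT64_FORMAT.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format value as decimal text into buf.
 * Returns 1 on success, 0 if the buffer was too small.
 */
static int
uint64_to_decimal(char *buf, size_t size, uint64_t value)
{
    int         n;

    assert(buf != NULL);
    n = snprintf(buf, size, "%" PRIu64, value);
    return n >= 0 && (size_t) n < size;
}

/* True if value cannot be represented as a signed 64-bit integer. */
static int
exceeds_int64(uint64_t value)
{
    return value > (uint64_t) INT64_MAX;
}
```

A uint64 needs at most 20 decimal digits, so the 256-byte buffer used in the patch is more than sufficient.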
v12-0004-Expose-WAL-usage-counters-in-verbose-auto-vacuum.patch (text/plain; charset=us-ascii)
From 5adcd8bf0af96363974e93a3f03c2e509cf92fcb Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 19 Mar 2020 16:08:47 +0100
Subject: [PATCH v12 4/4] Expose WAL usage counters in verbose (auto)vacuum
output.
Author: Julien Rouhaud
Reviewed-by: Fuji Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cc7e8521a5..735087dd74 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -410,6 +410,8 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
int nindexes;
PGRUsage ru0;
TimestampTz starttime = 0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
long secs;
int usecs;
double read_rate,
@@ -614,6 +616,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
TimestampDifference(starttime, endtime, &secs, &usecs);
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
read_rate = 0;
write_rate = 0;
if ((secs > 0) || (usecs > 0))
@@ -666,7 +671,13 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
(long long) VacuumPageDirty);
appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
read_rate, write_rate);
- appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf,
+ _("WAL usage: %ld records, %ld full page writes, "
+ UINT64_FORMAT " bytes"),
+ walusage.wal_records,
+ walusage.wal_num_fpw,
+ walusage.wal_bytes);
ereport(LOG,
(errmsg_internal("%s", buf.data)));
@@ -758,6 +769,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
IndexBulkDeleteResult **indstats;
int i;
PGRUsage ru0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
Buffer vmbuffer = InvalidBuffer;
BlockNumber next_unskippable_block;
bool skipping_blocks;
@@ -1727,6 +1740,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
"%u pages are entirely empty.\n",
empty_pages),
empty_pages);
+
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+ appendStringInfo(&buf, _("%ld WAL records, %ld WAL full page writes, "
+ UINT64_FORMAT " WAL bytes\n"),
+ walusage.wal_records,
+ walusage.wal_num_fpw,
+ walusage.wal_bytes);
+
appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
ereport(elevel,
--
2.20.1
On Thu, Apr 2, 2020 at 6:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
query | calls | wal_bytes | wal_records | wal_num_fpw
----------------------------------------------+-------+-----------+-------------+-------------
create index t1_idx_parallel_0 ON t1(id) | 1 | 20389743 | 2762 | 2758
create index t1_idx_parallel_0_bis ON t1(id) | 1 | 20394391 | 2762 | 2758
create index t1_idx_parallel_0_ter ON t1(id) | 1 | 20395155 | 2762 | 2758
create index t1_idx_parallel_1 ON t1(id) | 1 | 20388335 | 2762 | 2758
create index t1_idx_parallel_2 ON t1(id) | 1 | 20389091 | 2762 | 2758
create index t1_idx_parallel_3 ON t1(id) | 1 | 20389847 | 2762 | 2758
create index t1_idx_parallel_4 ON t1(id) | 1 | 20390603 | 2762 | 2758
create index t1_idx_parallel_5 ON t1(id) | 1 | 20391359 | 2762 | 2758
create index t1_idx_parallel_6 ON t1(id) | 1 | 20392115 | 2762 | 2758
create index t1_idx_parallel_7 ON t1(id) | 1 | 20392871 | 2762 | 2758
create index t1_idx_parallel_8 ON t1(id) | 1 | 20393627 | 2762 | 2758
(11 rows)
=# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%';
relname | pg_relation_size
-----------------------+------------------
t1_idx_parallel_0 | 22487040
t1_idx_parallel_0_bis | 22487040
t1_idx_parallel_0_ter | 22487040
t1_idx_parallel_2 | 22487040
t1_idx_parallel_1 | 22487040
t1_idx_parallel_4 | 22487040
t1_idx_parallel_3 | 22487040
t1_idx_parallel_5 | 22487040
t1_idx_parallel_6 | 22487040
t1_idx_parallel_7 | 22487040
t1_idx_parallel_8 | 22487040
(9 rows)
So while the number of WAL records and full page images stays constant, we can
see some small fluctuations in the total amount of generated WAL data, even for
multiple executions of the sequential create index. I'm wondering if the
fluctuations are due to some other internal details or if the WalUsage support
is just completely broken (although I don't see any obvious issue ATM).
I think we need to know the reason for this. Can you try with small
size indexes and see if the problem is reproducible? If it is, then it
will be easier to debug the same.
Few other minor comments
------------------------------------
pg_stat_statements patch
1.
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to
validate WAL generation metrics
+--
The word 'non-temp' in the above comment appears out of place. We
don't need to specify it.
2.
+-- SELECT usage data, check WAL usage is reported, wal_records equal
rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
The comment doesn't seem to match what we are doing in the statement.
I think we can simplify it to something like "check WAL is generated
for above statements".
3.
@@ -185,6 +185,9 @@ typedef struct Counters
int64 local_blks_written; /* # of local disk blocks written */
int64 temp_blks_read; /* # of temp blocks read */
int64 temp_blks_written; /* # of temp blocks written */
+ uint64 wal_bytes; /* total amount of WAL bytes generated */
+ int64 wal_records; /* # of WAL records generated */
+ int64 wal_num_fpw; /* # of WAL full page image generated */
double blk_read_time; /* time spent reading, in msec */
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
It is better to keep wal_bytes after wal_num_fpw, as it is in
the main patch. Also, consider changing this at the other places in this
patch. I think we should add these new fields after blk_write_time or
at the end after usage.
4.
/* # of WAL full page image generated */
Can we change it to "/* # of WAL full page image records generated */"?
If you agree, then a similar comment exists in
v11-0001-Add-infrastructure-to-track-WAL-usage, consider changing that
as well.
v11-0002-Add-option-to-report-WAL-usage-in-EXPLAIN-and-au
5.
Specifically, include the
+ number of records, full page images and bytes generated.
How about making the above slightly clearer? "Specifically, include the
number of records, number of full page image records and amount of WAL
bytes generated."
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Apr 2, 2020 at 6:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
query | calls | wal_bytes | wal_records | wal_num_fpw
----------------------------------------------+-------+-----------+-------------+-------------
create index t1_idx_parallel_0 ON t1(id) | 1 | 20389743 | 2762 | 2758
create index t1_idx_parallel_0_bis ON t1(id) | 1 | 20394391 | 2762 | 2758
create index t1_idx_parallel_0_ter ON t1(id) | 1 | 20395155 | 2762 | 2758
create index t1_idx_parallel_1 ON t1(id) | 1 | 20388335 | 2762 | 2758
create index t1_idx_parallel_2 ON t1(id) | 1 | 20389091 | 2762 | 2758
create index t1_idx_parallel_3 ON t1(id) | 1 | 20389847 | 2762 | 2758
create index t1_idx_parallel_4 ON t1(id) | 1 | 20390603 | 2762 | 2758
create index t1_idx_parallel_5 ON t1(id) | 1 | 20391359 | 2762 | 2758
create index t1_idx_parallel_6 ON t1(id) | 1 | 20392115 | 2762 | 2758
create index t1_idx_parallel_7 ON t1(id) | 1 | 20392871 | 2762 | 2758
create index t1_idx_parallel_8 ON t1(id) | 1 | 20393627 | 2762 | 2758
(11 rows)
=# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%';
relname | pg_relation_size
-----------------------+------------------
t1_idx_parallel_0 | 22487040
t1_idx_parallel_0_bis | 22487040
t1_idx_parallel_0_ter | 22487040
t1_idx_parallel_2 | 22487040
t1_idx_parallel_1 | 22487040
t1_idx_parallel_4 | 22487040
t1_idx_parallel_3 | 22487040
t1_idx_parallel_5 | 22487040
t1_idx_parallel_6 | 22487040
t1_idx_parallel_7 | 22487040
t1_idx_parallel_8 | 22487040
(9 rows)
So while the number of WAL records and full page images stays constant, we can
see some small fluctuations in the total amount of generated WAL data, even for
multiple executions of the sequential create index. I'm wondering if the
fluctuations are due to some other internal details or if the WalUsage support
is just completely broken (although I don't see any obvious issue ATM).
I think we need to know the reason for this. Can you try with small
size indexes and see if the problem is reproducible? If it is, then it
will be easier to debug the same.
Few other minor comments
------------------------------------
pg_stat_statements patch
1.
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
The word 'non-temp' in the above comment appears out of place. We
don't need to specify it.
2.
+-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
The comment doesn't seem to match what we are doing in the statement.
I think we can simplify it to something like "check WAL is generated
for above statements".
3.
@@ -185,6 +185,9 @@ typedef struct Counters
int64 local_blks_written; /* # of local disk blocks written */
int64 temp_blks_read; /* # of temp blocks read */
int64 temp_blks_written; /* # of temp blocks written */
+ uint64 wal_bytes; /* total amount of WAL bytes generated */
+ int64 wal_records; /* # of WAL records generated */
+ int64 wal_num_fpw; /* # of WAL full page image generated */
double blk_read_time; /* time spent reading, in msec */
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
It is better to keep wal_bytes after wal_num_fpw, as it is in
the main patch. Also, consider changing this at the other places in this
patch. I think we should add these new fields after blk_write_time or
at the end after usage.
4.
/* # of WAL full page image generated */
Can we change it to "/* # of WAL full page image records generated */"?
IMHO, "# of WAL full-page image records" reads like the number of WAL
records that contain a full-page image. But this counter is actually the
total number of full-page images, not the number of records that
have a full-page image.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
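The distinction drawn above can be made concrete with a toy model, assuming a single WAL record may reference several blocks and each referenced block may or may not carry a full-page image. The sketch is illustrative only: DemoRecord, DEMO_MAX_BLOCKS, count_images and count_records_with_image are hypothetical and do not reflect how the backend represents records.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_BLOCKS 4

/* Hypothetical record: up to DEMO_MAX_BLOCKS referenced blocks. */
typedef struct DemoRecord
{
    int         nblocks;                    /* blocks referenced */
    int         has_image[DEMO_MAX_BLOCKS]; /* 1 if block carries an image */
} DemoRecord;

/* Total number of full-page images across all records. */
static long
count_images(const DemoRecord *recs, size_t n)
{
    long        total = 0;
    size_t      i;
    int         b;

    assert(recs != NULL);
    for (i = 0; i < n; i++)
        for (b = 0; b < recs[i].nblocks; b++)
            if (recs[i].has_image[b])
                total++;
    return total;
}

/* Number of records that carry at least one full-page image. */
static long
count_records_with_image(const DemoRecord *recs, size_t n)
{
    long        total = 0;
    size_t      i;
    int         b;

    assert(recs != NULL);
    for (i = 0; i < n; i++)
    {
        for (b = 0; b < recs[i].nblocks; b++)
        {
            if (recs[i].has_image[b])
            {
                total++;
                break;          /* count each record once */
            }
        }
    }
    return total;
}
```

With one record carrying two images, one carrying none and one carrying one, the first function gives 3 and the second gives 2, which is why the wording of the comment matters.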
On Thu, Apr 02, 2020 at 06:40:51PM +0530, Amit Kapila wrote:
On Thu, Apr 2, 2020 at 6:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
query | calls | wal_bytes | wal_records | wal_num_fpw
----------------------------------------------+-------+-----------+-------------+-------------
create index t1_idx_parallel_0 ON t1(id) | 1 | 20389743 | 2762 | 2758
create index t1_idx_parallel_0_bis ON t1(id) | 1 | 20394391 | 2762 | 2758
create index t1_idx_parallel_0_ter ON t1(id) | 1 | 20395155 | 2762 | 2758
create index t1_idx_parallel_1 ON t1(id) | 1 | 20388335 | 2762 | 2758
create index t1_idx_parallel_2 ON t1(id) | 1 | 20389091 | 2762 | 2758
create index t1_idx_parallel_3 ON t1(id) | 1 | 20389847 | 2762 | 2758
create index t1_idx_parallel_4 ON t1(id) | 1 | 20390603 | 2762 | 2758
create index t1_idx_parallel_5 ON t1(id) | 1 | 20391359 | 2762 | 2758
create index t1_idx_parallel_6 ON t1(id) | 1 | 20392115 | 2762 | 2758
create index t1_idx_parallel_7 ON t1(id) | 1 | 20392871 | 2762 | 2758
create index t1_idx_parallel_8 ON t1(id) | 1 | 20393627 | 2762 | 2758
(11 rows)
=# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%';
relname | pg_relation_size
-----------------------+------------------
t1_idx_parallel_0 | 22487040
t1_idx_parallel_0_bis | 22487040
t1_idx_parallel_0_ter | 22487040
t1_idx_parallel_2 | 22487040
t1_idx_parallel_1 | 22487040
t1_idx_parallel_4 | 22487040
t1_idx_parallel_3 | 22487040
t1_idx_parallel_5 | 22487040
t1_idx_parallel_6 | 22487040
t1_idx_parallel_7 | 22487040
t1_idx_parallel_8 | 22487040
(9 rows)
So while the number of WAL records and full page images stays constant, we can
see some small fluctuations in the total amount of generated WAL data, even for
multiple executions of the sequential create index. I'm wondering if the
fluctuations are due to some other internal details or if the WalUsage support
is just completely broken (although I don't see any obvious issue ATM).
I think we need to know the reason for this. Can you try with small
size indexes and see if the problem is reproducible? If it is, then it
will be easier to debug the same.
I did some quick testing using the attached shell script:
- a base of 1k lines, at scales 1, 10, 100 and 1000 (suffix _s)
- parallel workers from 0 to 8 (suffix _w)
- each index created twice (suffix _pa and _pb)
- with a vacuum;checkpoint;pg_switch_wal executed each time
I get the following results:
query | wal_bytes | wal_records | wal_num_fpw
--------------------------------------------+-----------+-------------+-------------
CREATE INDEX t1_idx_s001_pa_w0 ON t1 (id) | 61871 | 22 | 18
CREATE INDEX t1_idx_s001_pa_w1 ON t1 (id) | 62394 | 21 | 18
CREATE INDEX t1_idx_s001_pa_w2 ON t1 (id) | 63150 | 21 | 18
CREATE INDEX t1_idx_s001_pa_w3 ON t1 (id) | 63906 | 21 | 18
CREATE INDEX t1_idx_s001_pa_w4 ON t1 (id) | 64662 | 21 | 18
CREATE INDEX t1_idx_s001_pa_w5 ON t1 (id) | 65418 | 21 | 18
CREATE INDEX t1_idx_s001_pa_w6 ON t1 (id) | 65450 | 21 | 18
CREATE INDEX t1_idx_s001_pa_w7 ON t1 (id) | 66206 | 21 | 18
CREATE INDEX t1_idx_s001_pa_w8 ON t1 (id) | 66962 | 21 | 18
CREATE INDEX t1_idx_s001_pb_w0 ON t1 (id) | 67718 | 21 | 18
CREATE INDEX t1_idx_s001_pb_w1 ON t1 (id) | 68474 | 21 | 18
CREATE INDEX t1_idx_s001_pb_w2 ON t1 (id) | 68418 | 21 | 18
CREATE INDEX t1_idx_s001_pb_w3 ON t1 (id) | 69174 | 21 | 18
CREATE INDEX t1_idx_s001_pb_w4 ON t1 (id) | 69930 | 21 | 18
CREATE INDEX t1_idx_s001_pb_w5 ON t1 (id) | 70686 | 21 | 18
CREATE INDEX t1_idx_s001_pb_w6 ON t1 (id) | 71442 | 21 | 18
CREATE INDEX t1_idx_s001_pb_w7 ON t1 (id) | 64922 | 21 | 18
CREATE INDEX t1_idx_s001_pb_w8 ON t1 (id) | 65682 | 21 | 18
CREATE INDEX t1_idx_s010_pa_w0 ON t1 (id) | 250460 | 47 | 44
CREATE INDEX t1_idx_s010_pa_w1 ON t1 (id) | 251216 | 47 | 44
CREATE INDEX t1_idx_s010_pa_w2 ON t1 (id) | 251972 | 47 | 44
CREATE INDEX t1_idx_s010_pa_w3 ON t1 (id) | 252728 | 47 | 44
CREATE INDEX t1_idx_s010_pa_w4 ON t1 (id) | 253484 | 47 | 44
CREATE INDEX t1_idx_s010_pa_w5 ON t1 (id) | 254240 | 47 | 44
CREATE INDEX t1_idx_s010_pa_w6 ON t1 (id) | 253552 | 47 | 44
CREATE INDEX t1_idx_s010_pa_w7 ON t1 (id) | 254308 | 47 | 44
CREATE INDEX t1_idx_s010_pa_w8 ON t1 (id) | 255064 | 47 | 44
CREATE INDEX t1_idx_s010_pb_w0 ON t1 (id) | 255820 | 47 | 44
CREATE INDEX t1_idx_s010_pb_w1 ON t1 (id) | 256576 | 47 | 44
CREATE INDEX t1_idx_s010_pb_w2 ON t1 (id) | 257332 | 47 | 44
CREATE INDEX t1_idx_s010_pb_w3 ON t1 (id) | 258088 | 47 | 44
CREATE INDEX t1_idx_s010_pb_w4 ON t1 (id) | 258844 | 47 | 44
CREATE INDEX t1_idx_s010_pb_w5 ON t1 (id) | 259600 | 47 | 44
CREATE INDEX t1_idx_s010_pb_w6 ON t1 (id) | 260356 | 47 | 44
CREATE INDEX t1_idx_s010_pb_w7 ON t1 (id) | 260012 | 47 | 44
CREATE INDEX t1_idx_s010_pb_w8 ON t1 (id) | 260768 | 47 | 44
CREATE INDEX t1_idx_s1000_pa_w0 ON t1 (id) | 20400595 | 2762 | 2759
CREATE INDEX t1_idx_s1000_pa_w1 ON t1 (id) | 20401351 | 2762 | 2759
CREATE INDEX t1_idx_s1000_pa_w2 ON t1 (id) | 20402107 | 2762 | 2759
CREATE INDEX t1_idx_s1000_pa_w3 ON t1 (id) | 20402863 | 2762 | 2759
CREATE INDEX t1_idx_s1000_pa_w4 ON t1 (id) | 20403619 | 2762 | 2759
CREATE INDEX t1_idx_s1000_pa_w5 ON t1 (id) | 20404375 | 2762 | 2759
CREATE INDEX t1_idx_s1000_pa_w6 ON t1 (id) | 20403687 | 2762 | 2759
CREATE INDEX t1_idx_s1000_pa_w7 ON t1 (id) | 20404443 | 2762 | 2759
CREATE INDEX t1_idx_s1000_pa_w8 ON t1 (id) | 20405199 | 2762 | 2759
CREATE INDEX t1_idx_s1000_pb_w0 ON t1 (id) | 20405955 | 2762 | 2759
CREATE INDEX t1_idx_s1000_pb_w1 ON t1 (id) | 20406711 | 2762 | 2759
CREATE INDEX t1_idx_s1000_pb_w2 ON t1 (id) | 20407467 | 2762 | 2759
CREATE INDEX t1_idx_s1000_pb_w3 ON t1 (id) | 20408223 | 2762 | 2759
CREATE INDEX t1_idx_s1000_pb_w4 ON t1 (id) | 20408979 | 2762 | 2759
CREATE INDEX t1_idx_s1000_pb_w5 ON t1 (id) | 20409735 | 2762 | 2759
CREATE INDEX t1_idx_s1000_pb_w6 ON t1 (id) | 20410491 | 2762 | 2759
CREATE INDEX t1_idx_s1000_pb_w7 ON t1 (id) | 20410147 | 2762 | 2759
CREATE INDEX t1_idx_s1000_pb_w8 ON t1 (id) | 20410903 | 2762 | 2759
CREATE INDEX t1_idx_s100_pa_w0 ON t1 (id) | 2082194 | 293 | 290
CREATE INDEX t1_idx_s100_pa_w1 ON t1 (id) | 2082950 | 293 | 290
CREATE INDEX t1_idx_s100_pa_w2 ON t1 (id) | 2083706 | 293 | 290
CREATE INDEX t1_idx_s100_pa_w3 ON t1 (id) | 2084462 | 293 | 290
CREATE INDEX t1_idx_s100_pa_w4 ON t1 (id) | 2085218 | 293 | 290
CREATE INDEX t1_idx_s100_pa_w5 ON t1 (id) | 2085974 | 293 | 290
CREATE INDEX t1_idx_s100_pa_w6 ON t1 (id) | 2085286 | 293 | 290
CREATE INDEX t1_idx_s100_pa_w7 ON t1 (id) | 2086042 | 293 | 290
CREATE INDEX t1_idx_s100_pa_w8 ON t1 (id) | 2086798 | 293 | 290
CREATE INDEX t1_idx_s100_pb_w0 ON t1 (id) | 2087554 | 293 | 290
CREATE INDEX t1_idx_s100_pb_w1 ON t1 (id) | 2088310 | 293 | 290
CREATE INDEX t1_idx_s100_pb_w2 ON t1 (id) | 2089066 | 293 | 290
CREATE INDEX t1_idx_s100_pb_w3 ON t1 (id) | 2089822 | 293 | 290
CREATE INDEX t1_idx_s100_pb_w4 ON t1 (id) | 2090578 | 293 | 290
CREATE INDEX t1_idx_s100_pb_w5 ON t1 (id) | 2091334 | 293 | 290
CREATE INDEX t1_idx_s100_pb_w6 ON t1 (id) | 2092090 | 293 | 290
CREATE INDEX t1_idx_s100_pb_w7 ON t1 (id) | 2091746 | 293 | 290
CREATE INDEX t1_idx_s100_pb_w8 ON t1 (id) | 2092502 | 293 | 290
(72 rows)
The fluctuations exist for all scales, but don't seem to depend on the input
size.
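One way to quantify these fluctuations is to take the difference between consecutive runs. The sketch below does that for the first six s010 rows of the table above (values copied from the table; the array and helper names are hypothetical). In this excerpt every run reports 756 bytes more than the previous one, and the same 756-byte step appears between the first s100 rows and between the first s1000 rows, which fits the observation that the variation is independent of the input size. The sketch only measures the step; it says nothing about its cause.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* wal_bytes for t1_idx_s010_pa_w0 .. w5, copied from the table above. */
static const uint64_t demo_s010_bytes[] = {
    250460, 251216, 251972, 252728, 253484, 254240
};

/* Returns 1 if every consecutive difference in v equals step, else 0. */
static int
all_steps_equal(const uint64_t *v, size_t n, int64_t step)
{
    size_t      i;

    assert(v != NULL);
    for (i = 1; i < n; i++)
    {
        if ((int64_t) (v[i] - v[i - 1]) != step)
            return 0;
    }
    return 1;
}
```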
Just to be sure I tried to measure the amount of WAL for various INSERT sizes
using roughly the same approach, and the results are stable:
query | wal_bytes | wal_records | wal_num_fpw
-----------------------------------------------------+-----------+-------------+-------------
INSERT INTO t_001_a SELECT generate_series($1, $2) | 59000 | 1000 | 0
INSERT INTO t_001_b SELECT generate_series($1, $2) | 59000 | 1000 | 0
INSERT INTO t_010_a SELECT generate_series($1, $2) | 590000 | 10000 | 0
INSERT INTO t_010_b SELECT generate_series($1, $2) | 590000 | 10000 | 0
INSERT INTO t_1000_a SELECT generate_series($1, $2) | 59000000 | 1000000 | 0
INSERT INTO t_1000_b SELECT generate_series($1, $2) | 59000000 | 1000000 | 0
INSERT INTO t_100_a SELECT generate_series($1, $2) | 5900000 | 100000 | 0
INSERT INTO t_100_b SELECT generate_series($1, $2) | 5900000 | 100000 | 0
(8 rows)
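As a quick sanity check on these numbers, a small sketch of the bytes-per-record arithmetic follows. The function name wal_bytes_per_record is hypothetical, and the division truncates toward zero.

```c
#include <stdint.h>

/*
 * Average WAL bytes per record, using integer division.
 * Returns 0 when no records were produced, to avoid dividing by zero.
 */
static uint64_t
wal_bytes_per_record(uint64_t wal_bytes, int64_t wal_records)
{
    if (wal_records <= 0)
        return 0;
    return wal_bytes / (uint64_t) wal_records;
}
```

Every INSERT row above comes out at exactly 59 bytes per record (59000 / 1000, 590000 / 10000, and so on), with no full page images. Assuming the CREATE INDEX table uses the same column order, its s100 rows average roughly 7106 bytes per record (2082194 / 293), which fits with almost all of those records carrying a full page image.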
At this point I tend to think that this is somehow due to btbuild-specific
behavior, or something nearby.
Few other minor comments
------------------------------------
pg_stat_statements patch
1.
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics
+--
The word 'non-temp' in the above comment appears out of place. We
don't need to specify it.
Fixed.
2.
+-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
The comment doesn't seem to match what we are doing in the statement.
I think we can simplify it to something like "check WAL is generated
for above statements:"
Done.
3.
@@ -185,6 +185,9 @@ typedef struct Counters
int64 local_blks_written; /* # of local disk blocks written */
int64 temp_blks_read; /* # of temp blocks read */
int64 temp_blks_written; /* # of temp blocks written */
+ uint64 wal_bytes; /* total amount of WAL bytes generated */
+ int64 wal_records; /* # of WAL records generated */
+ int64 wal_num_fpw; /* # of WAL full page image generated */
double blk_read_time; /* time spent reading, in msec */
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
It is better to keep wal_bytes after wal_num_fpw, as it is in
the main patch. Also, consider changing at other places in this
patch. I think we should add these new fields after blk_write_time or
at the end after usage.
Done.
4.
/* # of WAL full page image generated */
Can we change it to "/* # of WAL full page image records generated */"?
If you agree, then a similar comment exists in
v11-0001-Add-infrastructure-to-track-WAL-usage, consider changing that
as well.
Agreed, and fixed in both places.
v11-0002-Add-option-to-report-WAL-usage-in-EXPLAIN-and-au
5.
Specifically, include the
+ number of records, full page images and bytes generated.
How about making the above slightly clearer? "Specifically, include the
number of records, number of full page image records and amount of WAL
bytes generated."
Thanks, that's clearer. Done
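To make the three counters discussed above concrete, here is a minimal standalone sketch of how the attached 0001 patch bumps them when a record is inserted. WalUsageSketch and account_wal_record are hypothetical stand-ins, not names from the patch, and the field types are simplified to fixed-width integers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the WalUsage struct in the 0001 patch. */
typedef struct WalUsageSketch
{
    int64_t  wal_records;   /* # of WAL records produced */
    int64_t  wal_num_fpw;   /* # of full page images attached to them */
    uint64_t wal_bytes;     /* total size of those records, in bytes */
} WalUsageSketch;

/*
 * Account for one WAL record. Nothing is counted unless the record was
 * actually inserted, mirroring the "if (inserted)" guard in the patch.
 */
static void
account_wal_record(WalUsageSketch *usage, bool inserted,
                   uint32_t total_len, int num_fpw)
{
    if (!inserted)
        return;

    usage->wal_bytes += total_len;
    usage->wal_records++;
    usage->wal_num_fpw += num_fpw;
}
```

Note that in this sketch a single record carrying two block images adds one to wal_records and two to wal_num_fpw, so the two counters are related but not interchangeable.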
Attachments:
v13-0001-Add-infrastructure-to-track-WAL-usage.patch (text/plain; charset=us-ascii)
From 1ba0a76f3e071c239dd65fe06517ead816922957 Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:41:50 +0100
Subject: [PATCH v13 1/4] Add infrastructure to track WAL usage.
Author: Kirill Bychik, Julien Rouhaud
Reviewed-by: Fuji Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 37 ++++++++++++-----
src/backend/access/nbtree/nbtsort.c | 40 +++++++++++++++++++
src/backend/access/transam/xlog.c | 12 +++++-
src/backend/access/transam/xloginsert.c | 13 ++++--
src/backend/executor/execParallel.c | 38 +++++++++++++-----
src/backend/executor/instrument.c | 53 ++++++++++++++++++++++---
src/include/access/xlog.h | 3 +-
src/include/executor/execParallel.h | 1 +
src/include/executor/instrument.h | 18 ++++++++-
src/tools/pgindent/typedefs.list | 1 +
10 files changed, 183 insertions(+), 33 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9f9596c718..cc7e8521a5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -139,6 +139,7 @@
#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
+#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -275,6 +276,9 @@ typedef struct LVParallelState
/* Points to buffer usage area in DSM */
BufferUsage *buffer_usage;
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
+
/*
* The number of indexes that support parallel index bulk-deletion and
* parallel index cleanup respectively.
@@ -2143,8 +2147,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
vacrelstats->dead_tuples, nindexes, vacrelstats);
/*
- * Next, accumulate buffer usage. (This must wait for the workers to
- * finish, or we might get incomplete data.)
+ * Next, accumulate buffer and WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
*/
if (nworkers > 0)
{
@@ -2154,7 +2158,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
WaitForParallelWorkersToFinish(lps->pcxt);
for (i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i]);
+ InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
/*
@@ -3171,6 +3175,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
LVShared *shared;
LVDeadTuples *dead_tuples;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
bool *can_parallel_vacuum;
long maxtuples;
char *sharedquery;
@@ -3255,15 +3260,19 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_estimate_keys(&pcxt->estimator, 1);
/*
- * Estimate space for BufferUsage -- PARALLEL_VACUUM_KEY_BUFFER_USAGE.
+ * Estimate space for BufferUsage and WalUsage --
+ * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
*
* If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgBufferUsage, so do
- * it unconditionally.
+ * have no way of knowing whether anyone's looking at pgBufferUsage or
+ * pgWalUsage, so do it unconditionally.
*/
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
/* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
@@ -3299,11 +3308,18 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
vacrelstats->dead_tuples = dead_tuples;
- /* Allocate space for each worker's BufferUsage; no need to initialize */
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
buffer_usage = shm_toc_allocate(pcxt->toc,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
lps->buffer_usage = buffer_usage;
+ wal_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
+ lps->wal_usage = wal_usage;
/* Store query string for workers */
sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
@@ -3435,6 +3451,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
LVShared *lvshared;
LVDeadTuples *dead_tuples;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
int nindexes;
char *sharedquery;
IndexBulkDeleteResult **stats;
@@ -3511,9 +3528,11 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
parallel_vacuum_index(indrels, stats, lvshared, dead_tuples, nindexes,
&vacrelstats);
- /* Report buffer usage during parallel execution */
+ /* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Pop the error context stack */
error_context_stack = errcallback.previous;
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 3924945664..3f4cb7d39e 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -67,6 +67,7 @@
#include "access/xloginsert.h"
#include "catalog/index.h"
#include "commands/progress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/smgr.h"
@@ -81,6 +82,7 @@
#define PARALLEL_KEY_TUPLESORT UINT64CONST(0xA000000000000002)
#define PARALLEL_KEY_TUPLESORT_SPOOL2 UINT64CONST(0xA000000000000003)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xA000000000000004)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xA000000000000005)
/*
* DISABLE_LEADER_PARTICIPATION disables the leader's participation in
@@ -203,6 +205,7 @@ typedef struct BTLeader
Sharedsort *sharedsort;
Sharedsort *sharedsort2;
Snapshot snapshot;
+ WalUsage *walusage;
} BTLeader;
/*
@@ -1476,6 +1479,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
Sharedsort *sharedsort2;
BTSpool *btspool = buildstate->spool;
BTLeader *btleader = (BTLeader *) palloc0(sizeof(BTLeader));
+ WalUsage *walusage;
bool leaderparticipates = true;
char *sharedquery;
int querylen;
@@ -1528,6 +1532,18 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
shm_toc_estimate_keys(&pcxt->estimator, 3);
}
+ /*
+ * Estimate space for WalUsage -- PARALLEL_KEY_WAL_USAGE
+ *
+ * WalUsage during execution of maintenance command can be used by an
+ * extension that reports the WAL usage, such as pg_stat_statements.
+ * We have no way of knowing whether anyone's looking at pgWalUsage,
+ * so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -1599,6 +1615,11 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
memcpy(sharedquery, debug_query_string, querylen + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, sharedquery);
+ /* Allocate space for each worker's WalUsage; no need to initialize */
+ walusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage);
+
/* Launch workers, saving status for leader/caller */
LaunchParallelWorkers(pcxt);
btleader->pcxt = pcxt;
@@ -1609,6 +1630,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
btleader->sharedsort = sharedsort;
btleader->sharedsort2 = sharedsort2;
btleader->snapshot = snapshot;
+ btleader->walusage = walusage;
/* If no workers were successfully launched, back out (do serial build) */
if (pcxt->nworkers_launched == 0)
@@ -1637,8 +1659,18 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
static void
_bt_end_parallel(BTLeader *btleader)
{
+ int i;
+
/* Shutdown worker processes */
WaitForParallelWorkersToFinish(btleader->pcxt);
+
+ /*
+ * Next, accumulate WAL usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < btleader->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(NULL, &btleader->walusage[i]);
+
/* Free last reference to MVCC snapshot, if one was used */
if (IsMVCCSnapshot(btleader->snapshot))
UnregisterSnapshot(btleader->snapshot);
@@ -1769,6 +1801,7 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
Relation indexRel;
LOCKMODE heapLockmode;
LOCKMODE indexLockmode;
+ WalUsage *walusage;
int sortmem;
#ifdef BTREE_BUILD_STATS
@@ -1830,11 +1863,18 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
tuplesort_attach_shared(sharedsort2, seg);
}
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
/* Perform sorting of spool, and possibly a spool2 */
sortmem = maintenance_work_mem / btshared->scantuplesortstates;
_bt_parallel_scan_and_sort(btspool, btspool2, btshared, sharedsort,
sharedsort2, sortmem, false);
+ /* Report WAL usage during parallel execution */
+ walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(NULL, &walusage[ParallelWorkerNumber]);
+
#ifdef BTREE_BUILD_STATS
if (log_btree_build_stats)
{
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 977d448f50..50b78f3143 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -43,6 +43,7 @@
#include "commands/progress.h"
#include "commands/tablespace.h"
#include "common/controldata_utils.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "pgstat.h"
@@ -996,7 +997,8 @@ static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
XLogRecPtr
XLogInsertRecord(XLogRecData *rdata,
XLogRecPtr fpw_lsn,
- uint8 flags)
+ uint8 flags,
+ int num_fpw)
{
XLogCtlInsert *Insert = &XLogCtl->Insert;
pg_crc32c rdata_crc;
@@ -1252,6 +1254,14 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
+ /* Report WAL traffic to the instrumentation. */
+ if (inserted)
+ {
+ pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+ pgWalUsage.wal_records++;
+ pgWalUsage.wal_num_fpw += num_fpw;
+ }
+
return EndPos;
}
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index a618dec776..5e032e7042 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -25,6 +25,7 @@
#include "access/xloginsert.h"
#include "catalog/pg_control.h"
#include "common/pg_lzcompress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "replication/origin.h"
@@ -108,7 +109,7 @@ static MemoryContext xloginsert_cxt;
static XLogRecData *XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
- XLogRecPtr *fpw_lsn);
+ XLogRecPtr *fpw_lsn, int *num_fpw);
static bool XLogCompressBackupBlock(char *page, uint16 hole_offset,
uint16 hole_length, char *dest, uint16 *dlen);
@@ -448,6 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info)
bool doPageWrites;
XLogRecPtr fpw_lsn;
XLogRecData *rdt;
+ int num_fpw = 0;
/*
* Get values needed to decide whether to do full-page writes. Since
@@ -457,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info)
GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);
rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
- &fpw_lsn);
+ &fpw_lsn, &num_fpw);
- EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags);
+ EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw);
} while (EndPos == InvalidXLogRecPtr);
XLogResetInsertion();
@@ -482,7 +484,7 @@ XLogInsert(RmgrId rmid, uint8 info)
static XLogRecData *
XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
- XLogRecPtr *fpw_lsn)
+ XLogRecPtr *fpw_lsn, int *num_fpw)
{
XLogRecData *rdt;
uint32 total_len = 0;
@@ -635,6 +637,9 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
*/
bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
+ /* Report a full page image constructed for the WAL record */
+ *num_fpw += 1;
+
/*
* Construct XLogRecData entries for the page content.
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..7d9ca66fc8 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -12,7 +12,7 @@
* workers and ensuring that their state generally matches that of the
* leader; see src/backend/access/transam/README.parallel for details.
* However, we must save and restore relevant executor state, such as
- * any ParamListInfo associated with the query, buffer usage info, and
+ * any ParamListInfo associated with the query, buffer/WAL usage info, and
* the actual plan to be passed down to the worker.
*
* IDENTIFICATION
@@ -62,6 +62,7 @@
#define PARALLEL_KEY_DSA UINT64CONST(0xE000000000000007)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -573,6 +574,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
char *pstmt_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
+ WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
@@ -646,6 +648,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Same thing for WalUsage.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for tuple queues. */
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers));
@@ -728,6 +737,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufusage_space);
pei->buffer_usage = bufusage_space;
+ /* Same for WalUsage. */
+ walusage_space = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
+ pei->wal_usage = walusage_space;
+
/* Set up the tuple queues that the workers will write into. */
pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
@@ -1069,7 +1084,7 @@ ExecParallelRetrieveJitInstrumentation(PlanState *planstate,
/*
* Finish parallel execution. We wait for parallel workers to finish, and
- * accumulate their buffer usage.
+ * accumulate their buffer/WAL usage.
*/
void
ExecParallelFinish(ParallelExecutorInfo *pei)
@@ -1109,11 +1124,11 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
WaitForParallelWorkersToFinish(pei->pcxt);
/*
- * Next, accumulate buffer usage. (This must wait for the workers to
- * finish, or we might get incomplete data.)
+ * Next, accumulate buffer/WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
*/
for (i = 0; i < nworkers; i++)
- InstrAccumParallelQuery(&pei->buffer_usage[i]);
+ InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]);
pei->finished = true;
}
@@ -1333,6 +1348,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
{
FixedParallelExecutorState *fpes;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
DestReceiver *receiver;
QueryDesc *queryDesc;
SharedExecutorInstrumentation *instrumentation;
@@ -1386,11 +1402,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
ExecSetTupleBound(fpes->tuples_needed, queryDesc->planstate);
/*
- * Prepare to track buffer usage during query execution.
+ * Prepare to track buffer/WAL usage during query execution.
*
* We do this after starting up the executor to match what happens in the
- * leader, which also doesn't count buffer accesses that occur during
- * executor startup.
+ * leader, which also doesn't count buffer accesses and WAL activity that
+ * occur during executor startup.
*/
InstrStartParallelQuery();
@@ -1406,9 +1422,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Shut down the executor */
ExecutorFinish(queryDesc);
- /* Report buffer usage during parallel execution. */
+ /* Report buffer/WAL usage during parallel execution. */
buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup (toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Report instrumentation data if any instrumentation options are set. */
if (instrumentation != NULL)
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 042e10f96b..a77571a895 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -19,8 +19,11 @@
BufferUsage pgBufferUsage;
static BufferUsage save_pgBufferUsage;
+WalUsage pgWalUsage;
+static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
/* Allocate new instrumentation structure(s) */
@@ -31,15 +34,17 @@ InstrAlloc(int n, int instrument_options)
/* initialize all fields to zeroes, then modify as needed */
instr = palloc0(n * sizeof(Instrumentation));
- if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER))
+ if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER | INSTRUMENT_WAL))
{
bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
bool need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
int i;
for (i = 0; i < n; i++)
{
instr[i].need_bufusage = need_buffers;
+ instr[i].need_walusage = need_wal;
instr[i].need_timer = need_timer;
}
}
@@ -53,6 +58,7 @@ InstrInit(Instrumentation *instr, int instrument_options)
{
memset(instr, 0, sizeof(Instrumentation));
instr->need_bufusage = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ instr->need_walusage = (instrument_options & INSTRUMENT_WAL) != 0;
instr->need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
}
@@ -67,6 +73,9 @@ InstrStartNode(Instrumentation *instr)
/* save buffer usage totals at node entry, if needed */
if (instr->need_bufusage)
instr->bufusage_start = pgBufferUsage;
+
+ if (instr->need_walusage)
+ instr->walusage_start = pgWalUsage;
}
/* Exit from a plan node */
@@ -95,6 +104,10 @@ InstrStopNode(Instrumentation *instr, double nTuples)
BufferUsageAccumDiff(&instr->bufusage,
&pgBufferUsage, &instr->bufusage_start);
+ if (instr->need_walusage)
+ WalUsageAccumDiff(&instr->walusage,
+ &pgWalUsage, &instr->walusage_start);
+
/* Is this the first tuple of this cycle? */
if (!instr->running)
{
@@ -158,6 +171,9 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
/* Add delta of buffer usage since entry to node's totals */
if (dst->need_bufusage)
BufferUsageAdd(&dst->bufusage, &add->bufusage);
+
+ if (dst->need_walusage)
+ WalUsageAdd(&dst->walusage, &add->walusage);
}
/* note current values during parallel executor startup */
@@ -165,21 +181,29 @@ void
InstrStartParallelQuery(void)
{
save_pgBufferUsage = pgBufferUsage;
+ save_pgWalUsage = pgWalUsage;
}
/* report usage after parallel executor shutdown */
void
-InstrEndParallelQuery(BufferUsage *result)
+InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- memset(result, 0, sizeof(BufferUsage));
- BufferUsageAccumDiff(result, &pgBufferUsage, &save_pgBufferUsage);
+ if (bufusage)
+ {
+ memset(bufusage, 0, sizeof(BufferUsage));
+ BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
+ }
+ memset(walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
}
/* accumulate work done by workers in leader's stats */
void
-InstrAccumParallelQuery(BufferUsage *result)
+InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- BufferUsageAdd(&pgBufferUsage, result);
+ if (bufusage)
+ BufferUsageAdd(&pgBufferUsage, bufusage);
+ WalUsageAdd(&pgWalUsage, walusage);
}
/* dst += add */
@@ -221,3 +245,20 @@ BufferUsageAccumDiff(BufferUsage *dst,
INSTR_TIME_ACCUM_DIFF(dst->blk_write_time,
add->blk_write_time, sub->blk_write_time);
}
+
+/* helper functions for WAL usage accumulation */
+static void
+WalUsageAdd(WalUsage *dst, WalUsage *add)
+{
+ dst->wal_bytes += add->wal_bytes;
+ dst->wal_records += add->wal_records;
+ dst->wal_num_fpw += add->wal_num_fpw;
+}
+
+void
+WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
+{
+ dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
+ dst->wal_records += add->wal_records - sub->wal_records;
+ dst->wal_num_fpw += add->wal_num_fpw - sub->wal_num_fpw;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 9ec7b31cce..b91e724b2d 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -259,7 +259,8 @@ struct XLogRecData;
extern XLogRecPtr XLogInsertRecord(struct XLogRecData *rdata,
XLogRecPtr fpw_lsn,
- uint8 flags);
+ uint8 flags,
+ int num_fpw);
extern void XLogFlush(XLogRecPtr RecPtr);
extern bool XLogBackgroundFlush(void);
extern bool XLogNeedsFlush(XLogRecPtr RecPtr);
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf020..1cc5b524fd 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -26,6 +26,7 @@ typedef struct ParallelExecutorInfo
PlanState *planstate; /* plan subtree we're running in parallel */
ParallelContext *pcxt; /* parallel context we're using */
BufferUsage *buffer_usage; /* points to bufusage area in DSM */
+ WalUsage *wal_usage; /* walusage area in DSM */
SharedExecutorInstrumentation *instrumentation; /* optional */
struct SharedJitInstrumentation *jit_instrumentation; /* optional */
dsa_area *area; /* points to DSA area in DSM */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 3825a5ac1f..f2bf37ae82 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -32,12 +32,20 @@ typedef struct BufferUsage
instr_time blk_write_time; /* time spent writing */
} BufferUsage;
+typedef struct WalUsage
+{
+ long wal_records; /* # of WAL records produced */
+ long wal_num_fpw; /* # of WAL full page image records produced */
+ uint64 wal_bytes; /* size of WAL records produced */
+} WalUsage;
+
/* Flag bits included in InstrAlloc's instrument_options bitmask */
typedef enum InstrumentOption
{
INSTRUMENT_TIMER = 1 << 0, /* needs timer (and row counts) */
INSTRUMENT_BUFFERS = 1 << 1, /* needs buffer usage */
INSTRUMENT_ROWS = 1 << 2, /* needs row count */
+ INSTRUMENT_WAL = 1 << 3, /* needs WAL usage */
INSTRUMENT_ALL = PG_INT32_MAX
} InstrumentOption;
@@ -46,6 +54,7 @@ typedef struct Instrumentation
/* Parameters set at node creation: */
bool need_timer; /* true if we need timer data */
bool need_bufusage; /* true if we need buffer usage data */
+ bool need_walusage; /* true if we need WAL usage data */
/* Info about current plan cycle: */
bool running; /* true if we've completed first tuple */
instr_time starttime; /* start time of current iteration of node */
@@ -53,6 +62,7 @@ typedef struct Instrumentation
double firsttuple; /* time for first tuple of this cycle */
double tuplecount; /* # of tuples emitted so far this cycle */
BufferUsage bufusage_start; /* buffer usage at start */
+ WalUsage walusage_start; /* WAL usage at start */
/* Accumulated statistics across all completed cycles: */
double startup; /* total startup time (in seconds) */
double total; /* total time (in seconds) */
@@ -62,6 +72,7 @@ typedef struct Instrumentation
double nfiltered1; /* # of tuples removed by scanqual or joinqual */
double nfiltered2; /* # of tuples removed by "other" quals */
BufferUsage bufusage; /* total buffer usage */
+ WalUsage walusage; /* total WAL usage */
} Instrumentation;
typedef struct WorkerInstrumentation
@@ -71,6 +82,7 @@ typedef struct WorkerInstrumentation
} WorkerInstrumentation;
extern PGDLLIMPORT BufferUsage pgBufferUsage;
+extern PGDLLIMPORT WalUsage pgWalUsage;
extern Instrumentation *InstrAlloc(int n, int instrument_options);
extern void InstrInit(Instrumentation *instr, int instrument_options);
@@ -79,9 +91,11 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
-extern void InstrEndParallelQuery(BufferUsage *result);
-extern void InstrAccumParallelQuery(BufferUsage *result);
+extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
+extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
extern void BufferUsageAccumDiff(BufferUsage *dst,
const BufferUsage *add, const BufferUsage *sub);
+extern void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add,
+ const WalUsage *sub);
#endif /* INSTRUMENT_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 939de985d3..34623523a7 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2643,6 +2643,7 @@ WalSndCtlData
WalSndSendDataCallback
WalSndState
WalTimeSample
+WalUsage
WalWriteMethod
Walfile
WindowAgg
--
2.20.1
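The worker/leader accounting that the patch above wires in around InstrStartParallelQuery(), InstrEndParallelQuery() and InstrAccumParallelQuery() can be sketched standalone as follows. This is a simplified illustration, not the patch's code: WalCounters, worker_report and leader_accumulate are hypothetical names, and plain arrays stand in for the DSM slots.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the patch's WalUsage struct. */
typedef struct WalCounters
{
    int64_t  records;
    int64_t  num_fpw;
    uint64_t bytes;
} WalCounters;

/* dst += add */
static void
wal_counters_add(WalCounters *dst, const WalCounters *add)
{
    dst->bytes += add->bytes;
    dst->records += add->records;
    dst->num_fpw += add->num_fpw;
}

/* dst += add - sub */
static void
wal_counters_accum_diff(WalCounters *dst, const WalCounters *add,
                        const WalCounters *sub)
{
    dst->bytes += add->bytes - sub->bytes;
    dst->records += add->records - sub->records;
    dst->num_fpw += add->num_fpw - sub->num_fpw;
}

/*
 * Worker side: write into this worker's slot what it generated since the
 * snapshot taken at start. The slot is zeroed first because the shared
 * area is allocated without initialization.
 */
static void
worker_report(WalCounters *slot, const WalCounters *current,
              const WalCounters *at_start)
{
    memset(slot, 0, sizeof(*slot));
    wal_counters_accum_diff(slot, current, at_start);
}

/*
 * Leader side: after all workers have finished, fold each launched
 * worker's slot into the leader's running totals.
 */
static void
leader_accumulate(WalCounters *leader_total, const WalCounters *slots,
                  int nworkers_launched)
{
    int i;

    for (i = 0; i < nworkers_launched; i++)
        wal_counters_add(leader_total, &slots[i]);
}
```

The design point is that each worker reports only a delta against its own start snapshot, so whatever the worker's global counter held before the parallel section never leaks into the leader's totals.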
v13-0002-Add-option-to-report-WAL-usage-in-EXPLAIN-and-au.patch (text/plain; charset=us-ascii)
From e0cb25e07c59b4e3947ed781e8794bdd59a30903 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Sun, 29 Mar 2020 12:38:14 +0200
Subject: [PATCH v13 2/4] Add option to report WAL usage in EXPLAIN and
auto_explain.
Author: Julien Rouhaud
Reviewed-by:
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 15 ++++++
doc/src/sgml/auto-explain.sgml | 20 ++++++++
doc/src/sgml/ref/explain.sgml | 14 ++++++
src/backend/commands/explain.c | 74 +++++++++++++++++++++++++++--
src/bin/psql/tab-complete.c | 4 +-
src/include/commands/explain.h | 3 ++
6 files changed, 124 insertions(+), 6 deletions(-)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index f69dde876c..56c549d84c 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -27,6 +27,7 @@ static int auto_explain_log_min_duration = -1; /* msec or -1 */
static bool auto_explain_log_analyze = false;
static bool auto_explain_log_verbose = false;
static bool auto_explain_log_buffers = false;
+static bool auto_explain_log_wal = false;
static bool auto_explain_log_triggers = false;
static bool auto_explain_log_timing = true;
static bool auto_explain_log_settings = false;
@@ -148,6 +149,17 @@ _PG_init(void)
NULL,
NULL);
+ DefineCustomBoolVariable("auto_explain.log_wal",
+ "Log WAL usage.",
+ NULL,
+ &auto_explain_log_wal,
+ false,
+ PGC_SUSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
+
DefineCustomBoolVariable("auto_explain.log_triggers",
"Include trigger statistics in plans.",
"This has no effect unless log_analyze is also set.",
@@ -280,6 +292,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
queryDesc->instrument_options |= INSTRUMENT_ROWS;
if (auto_explain_log_buffers)
queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (auto_explain_log_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
}
}
@@ -374,6 +388,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
es->analyze = (queryDesc->instrument_options && auto_explain_log_analyze);
es->verbose = auto_explain_log_verbose;
es->buffers = (es->analyze && auto_explain_log_buffers);
+ es->wal = (es->analyze && auto_explain_log_wal);
es->timing = (es->analyze && auto_explain_log_timing);
es->summary = es->analyze;
es->format = auto_explain_log_format;
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index 3d619d4a3d..d4d29c4a64 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -109,6 +109,26 @@ LOAD 'auto_explain';
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>
+ <varname>auto_explain.log_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>auto_explain.log_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ <varname>auto_explain.log_wal</varname> controls whether WAL
+ usage statistics are printed when an execution plan is logged; it's
+ equivalent to the <literal>WAL</literal> option of <command>EXPLAIN</command>.
+ This parameter has no effect
+ unless <varname>auto_explain.log_analyze</varname> is enabled.
+ This parameter is off by default.
+ Only superusers can change this setting.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term>
<varname>auto_explain.log_timing</varname> (<type>boolean</type>)
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 385d10411f..024ede4a8d 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -41,6 +41,7 @@ EXPLAIN [ ANALYZE ] [ VERBOSE ] <replaceable class="parameter">statement</replac
COSTS [ <replaceable class="parameter">boolean</replaceable> ]
SETTINGS [ <replaceable class="parameter">boolean</replaceable> ]
BUFFERS [ <replaceable class="parameter">boolean</replaceable> ]
+ WAL [ <replaceable class="parameter">boolean</replaceable> ]
TIMING [ <replaceable class="parameter">boolean</replaceable> ]
SUMMARY [ <replaceable class="parameter">boolean</replaceable> ]
FORMAT { TEXT | XML | JSON | YAML }
@@ -192,6 +193,19 @@ ROLLBACK;
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>WAL</literal></term>
+ <listitem>
+ <para>
+ Include information on WAL record generation. Specifically, include the
+ number of records, number of full page image records and amount of WAL
+ bytes generated. In text format, only non-zero values are printed. This
+ parameter may only be used when <literal>ANALYZE</literal> is also
+ enabled. It defaults to <literal>FALSE</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>TIMING</literal></term>
<listitem>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ee0e638f33..cefe2144e5 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -113,6 +113,7 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
+static void show_wal_usage(ExplainState *es, const WalUsage *usage);
static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
ExplainState *es);
static void ExplainScanTarget(Scan *plan, ExplainState *es);
@@ -175,6 +176,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
es->costs = defGetBoolean(opt);
else if (strcmp(opt->defname, "buffers") == 0)
es->buffers = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "wal") == 0)
+ es->wal = defGetBoolean(opt);
else if (strcmp(opt->defname, "settings") == 0)
es->settings = defGetBoolean(opt);
else if (strcmp(opt->defname, "timing") == 0)
@@ -219,6 +222,11 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("EXPLAIN option BUFFERS requires ANALYZE")));
+ if (es->wal && !es->analyze)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("EXPLAIN option WAL requires ANALYZE")));
+
/* if the timing was not set explicitly, set default value */
es->timing = (timing_set) ? es->timing : es->analyze;
@@ -494,6 +502,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
if (es->buffers)
instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
/*
* We always collect timing for the entire statement, even when node-level
@@ -1942,12 +1952,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
}
- /* Show buffer usage */
+ /* Show buffer/WAL usage */
if (es->buffers && planstate->instrument)
show_buffer_usage(es, &planstate->instrument->bufusage);
+ if (es->wal && planstate->instrument)
+ show_wal_usage(es, &planstate->instrument->walusage);
- /* Prepare per-worker buffer usage */
- if (es->workers_state && es->buffers && es->verbose)
+ /* Prepare per-worker buffer/WAL usage */
+ if (es->workers_state && (es->buffers || es->wal) && es->verbose)
{
WorkerInstrumentation *w = planstate->worker_instrument;
@@ -1960,7 +1972,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
continue;
ExplainOpenWorker(n, es);
- show_buffer_usage(es, &instrument->bufusage);
+ if (es->buffers)
+ show_buffer_usage(es, &instrument->bufusage);
+ if (es->wal)
+ show_wal_usage(es, &instrument->walusage);
ExplainCloseWorker(n, es);
}
}
@@ -3059,6 +3074,44 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
}
}
+/*
+ * Show WAL usage details.
+ */
+static void
+show_wal_usage(ExplainState *es, const WalUsage *usage)
+{
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ /* Show only positive counter values. */
+ if ((usage->wal_records > 0) || (usage->wal_num_fpw > 0) ||
+ (usage->wal_bytes > 0))
+ {
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "WAL:");
+
+ if (usage->wal_records > 0)
+ appendStringInfo(es->str, " records=%ld",
+ usage->wal_records);
+ if (usage->wal_num_fpw > 0)
+ appendStringInfo(es->str, " full page writes=%ld",
+ usage->wal_num_fpw);
+ if (usage->wal_bytes > 0)
+ appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
+ usage->wal_bytes);
+ appendStringInfoChar(es->str, '\n');
+ }
+ }
+ else
+ {
+ ExplainPropertyInteger("WAL records", NULL,
+ usage->wal_records, es);
+ ExplainPropertyInteger("WAL full page writes", NULL,
+ usage->wal_num_fpw, es);
+ ExplainPropertyUInteger("WAL bytes", NULL,
+ usage->wal_bytes, es);
+ }
+}
+
/*
* Add some additional details about an IndexScan or IndexOnlyScan
*/
@@ -3843,6 +3896,19 @@ ExplainPropertyInteger(const char *qlabel, const char *unit, int64 value,
ExplainProperty(qlabel, unit, buf, true, es);
}
+/*
+ * Explain an unsigned integer-valued property.
+ */
+void
+ExplainPropertyUInteger(const char *qlabel, const char *unit, uint64 value,
+ ExplainState *es)
+{
+ char buf[32];
+
+ snprintf(buf, sizeof(buf), UINT64_FORMAT, value);
+ ExplainProperty(qlabel, unit, buf, true, es);
+}
+
/*
* Explain a float-valued property, using the specified number of
* fractional digits.
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5fec59723c..0e7a373caf 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3045,8 +3045,8 @@ psql_completion(const char *text, int start, int end)
*/
if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
COMPLETE_WITH("ANALYZE", "VERBOSE", "COSTS", "SETTINGS",
- "BUFFERS", "TIMING", "SUMMARY", "FORMAT");
- else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|BUFFERS|TIMING|SUMMARY"))
+ "BUFFERS", "WAL", "TIMING", "SUMMARY", "FORMAT");
+ else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|BUFFERS|WAL|TIMING|SUMMARY"))
COMPLETE_WITH("ON", "OFF");
else if (TailMatches("FORMAT"))
COMPLETE_WITH("TEXT", "XML", "JSON", "YAML");
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 54f6240e5e..7b0b0a94a6 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -42,6 +42,7 @@ typedef struct ExplainState
bool analyze; /* print actual times */
bool costs; /* print estimated costs */
bool buffers; /* print buffer usage */
+ bool wal; /* print WAL usage */
bool timing; /* print detailed node timing */
bool summary; /* print total planning and execution timing */
bool settings; /* print modified settings */
@@ -110,6 +111,8 @@ extern void ExplainPropertyText(const char *qlabel, const char *value,
ExplainState *es);
extern void ExplainPropertyInteger(const char *qlabel, const char *unit,
int64 value, ExplainState *es);
+extern void ExplainPropertyUInteger(const char *qlabel, const char *unit,
+ uint64 value, ExplainState *es);
extern void ExplainPropertyFloat(const char *qlabel, const char *unit,
double value, int ndigits, ExplainState *es);
extern void ExplainPropertyBool(const char *qlabel, bool value,
--
2.20.1
v13-0003-Keep-track-of-WAL-usage-in-pg_stat_statements.patch (text/plain; charset=us-ascii)
From 9cf6ff14f3e34b0cf8f95016ef4a2853e792eab9 Mon Sep 17 00:00:00 2001
From: Kirill Bychik <kirill.bychik@gmail.com>
Date: Tue, 17 Mar 2020 14:42:02 +0100
Subject: [PATCH v13 3/4] Keep track of WAL usage in pg_stat_statements.
Author: Kirill Bychik
Reviewed-by: Julien Rouhaud, Fuji Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
.../expected/pg_stat_statements.out | 39 +++++++++++++
.../pg_stat_statements--1.7--1.8.sql | 5 +-
.../pg_stat_statements/pg_stat_statements.c | 55 +++++++++++++++++--
.../sql/pg_stat_statements.sql | 23 ++++++++
doc/src/sgml/pgstatstatements.sgml | 27 +++++++++
5 files changed, 144 insertions(+), 5 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 45dbe9e677..f615f8c2bf 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -211,6 +211,45 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
UPDATE test SET b = $1 WHERE a > $2 | 1 | 3
(10 rows)
+--
+-- INSERT, UPDATE, DELETE on test table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset
+--------------------------
+
+(1 row)
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+-- Check WAL is generated for the above statements
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_as_rows
+-----------------------------------------------------------+-------+------+---------------------+-----------------------+---------------------
+ DELETE FROM pgss_test WHERE a > $1 | 1 | 1 | t | t | t
+ DROP TABLE pgss_test | 1 | 0 | t | t | f
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | 1 | 10 | t | t | t
+ SELECT pg_stat_statements_reset() | 1 | 1 | f | f | f
+ SELECT query, calls, rows, +| 0 | 0 | f | f | t
+ wal_bytes > $1 as wal_bytes_generated, +| | | | |
+ wal_records > $2 as wal_records_generated, +| | | | |
+ wal_records = rows as wal_records_as_rows +| | | | |
+ FROM pg_stat_statements ORDER BY query COLLATE "C" | | | | |
+ SET pg_stat_statements.track_utility = FALSE | 1 | 0 | f | f | t
+ UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 | 3 | t | t | t
+(7 rows)
+
--
-- pg_stat_statements.track = none
--
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
index 60d454db7f..30566574ab 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -41,7 +41,10 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean,
OUT temp_blks_read int8,
OUT temp_blks_written int8,
OUT blk_read_time float8,
- OUT blk_write_time float8
+ OUT blk_write_time float8,
+ OUT wal_records int8,
+ OUT wal_num_fpw int8,
+ OUT wal_bytes numeric
)
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'pg_stat_statements_1_8'
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 942922b01f..c42bf62fce 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -188,6 +188,9 @@ typedef struct Counters
double blk_read_time; /* time spent reading, in msec */
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
+ int64 wal_records; /* # of WAL records generated */
+ int64 wal_num_fpw; /* # of WAL full page image records generated */
+ uint64 wal_bytes; /* total amount of WAL bytes generated */
} Counters;
/*
@@ -348,6 +351,7 @@ static void pgss_store(const char *query, uint64 queryId,
pgssStoreKind kind,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate);
static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
pgssVersion api_version,
@@ -891,6 +895,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
0,
0,
NULL,
+ NULL,
&jstate);
}
@@ -926,9 +931,16 @@ pgss_planner(Query *parse,
instr_time duration;
BufferUsage bufusage_start,
bufusage;
+ WalUsage walusage_start,
+ walusage;
/* We need to track buffer usage as the planner can access them. */
bufusage_start = pgBufferUsage;
+ /*
+ * Similarly the planner could write some WAL records in some cases
+ * (e.g. setting a hint bit with those being WAL-logged)
+ */
+ walusage_start = pgWalUsage;
INSTR_TIME_SET_CURRENT(start);
plan_nested_level++;
@@ -954,6 +966,10 @@ pgss_planner(Query *parse,
memset(&bufusage, 0, sizeof(BufferUsage));
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+ /* calc differences of WAL counters. */
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
pgss_store(query_string,
parse->queryId,
parse->stmt_location,
@@ -962,6 +978,7 @@ pgss_planner(Query *parse,
INSTR_TIME_GET_MILLISEC(duration),
0,
&bufusage,
+ &walusage,
NULL);
}
else
@@ -1079,6 +1096,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
queryDesc->totaltime->total * 1000.0, /* convert to msec */
queryDesc->estate->es_processed,
&queryDesc->totaltime->bufusage,
+ &queryDesc->totaltime->walusage,
NULL);
}
@@ -1123,8 +1141,10 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
uint64 rows;
BufferUsage bufusage_start,
bufusage;
-
+ WalUsage walusage_start,
+ walusage;
bufusage_start = pgBufferUsage;
+ walusage_start = pgWalUsage;
INSTR_TIME_SET_CURRENT(start);
exec_nested_level++;
@@ -1154,6 +1174,10 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
memset(&bufusage, 0, sizeof(BufferUsage));
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+ /* calc differences of WAL counters. */
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
pgss_store(queryString,
0, /* signal that it's a utility stmt */
pstmt->stmt_location,
@@ -1162,6 +1186,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
INSTR_TIME_GET_MILLISEC(duration),
rows,
&bufusage,
+ &walusage,
NULL);
}
else
@@ -1197,7 +1222,8 @@ pgss_hash_string(const char *str, int len)
*
* If jstate is not NULL then we're trying to create an entry for which
* we have no statistics as yet; we just want to record the normalized
- * query string. total_time, rows, bufusage are ignored in this case.
+ * query string. total_time, rows, bufusage and walusage are ignored in this
+ * case.
*
* If kind is PGSS_PLAN or PGSS_EXEC, its value is used as the array position
* for the arrays in the Counters field.
@@ -1208,6 +1234,7 @@ pgss_store(const char *query, uint64 queryId,
pgssStoreKind kind,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate)
{
pgssHashKey key;
@@ -1402,6 +1429,9 @@ pgss_store(const char *query, uint64 queryId,
e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
+ e->counters.wal_records += walusage->wal_records;
+ e->counters.wal_num_fpw += walusage->wal_num_fpw;
+ e->counters.wal_bytes += walusage->wal_bytes;
SpinLockRelease(&e->mutex);
}
@@ -1449,8 +1479,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
#define PG_STAT_STATEMENTS_COLS_V1_1 18
#define PG_STAT_STATEMENTS_COLS_V1_2 19
#define PG_STAT_STATEMENTS_COLS_V1_3 23
-#define PG_STAT_STATEMENTS_COLS_V1_8 29
-#define PG_STAT_STATEMENTS_COLS 29 /* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_8 32
+#define PG_STAT_STATEMENTS_COLS 32 /* maximum of above */
/*
* Retrieve statement statistics.
@@ -1786,6 +1816,23 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
values[i++] = Float8GetDatumFast(tmp.blk_read_time);
values[i++] = Float8GetDatumFast(tmp.blk_write_time);
}
+ if (api_version >= PGSS_V1_8)
+ {
+ char buf[256];
+ Datum wal_bytes;
+
+ values[i++] = Int64GetDatumFast(tmp.wal_records);
+ values[i++] = Int64GetDatumFast(tmp.wal_num_fpw);
+
+ snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
+
+ /* Convert to numeric. */
+ wal_bytes = DirectFunctionCall3(numeric_in,
+ CStringGetDatum(buf),
+ ObjectIdGetDatum(0),
+ Int32GetDatum(-1));
+ values[i++] = wal_bytes;
+ }
Assert(i == (api_version == PGSS_V1_0 ? PG_STAT_STATEMENTS_COLS_V1_0 :
api_version == PGSS_V1_1 ? PG_STAT_STATEMENTS_COLS_V1_1 :
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 435d51008f..75c10554a8 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -101,6 +101,29 @@ SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+--
+-- INSERT, UPDATE, DELETE on test table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+
+-- Check WAL is generated for the above statements
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+
--
-- pg_stat_statements.track = none
--
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index b4df84c60b..3d26108649 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -264,6 +264,33 @@
</entry>
</row>
+ <row>
+ <entry><structfield>wal_bytes</structfield></entry>
+ <entry><type>numeric</type></entry>
+ <entry></entry>
+ <entry>
+ Total amount of WAL bytes generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL records generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_num_fpw</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL full page writes generated by the statement
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
--
2.20.1
v13-0004-Expose-WAL-usage-counters-in-verbose-auto-vacuum.patch (text/plain; charset=us-ascii)
From fc00f3bc115502395d5b10862be013970b3f29c8 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 19 Mar 2020 16:08:47 +0100
Subject: [PATCH v13 4/4] Expose WAL usage counters in verbose (auto)vacuum
output.
Author: Julien Rouhaud
Reviewed-by: Fuji Masao
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cc7e8521a5..735087dd74 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -410,6 +410,8 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
int nindexes;
PGRUsage ru0;
TimestampTz starttime = 0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
long secs;
int usecs;
double read_rate,
@@ -614,6 +616,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
TimestampDifference(starttime, endtime, &secs, &usecs);
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
read_rate = 0;
write_rate = 0;
if ((secs > 0) || (usecs > 0))
@@ -666,7 +671,13 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
(long long) VacuumPageDirty);
appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
read_rate, write_rate);
- appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf,
+ _("WAL usage: %ld records, %ld full page writes, "
+ UINT64_FORMAT " bytes"),
+ walusage.wal_records,
+ walusage.wal_num_fpw,
+ walusage.wal_bytes);
ereport(LOG,
(errmsg_internal("%s", buf.data)));
@@ -758,6 +769,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
IndexBulkDeleteResult **indstats;
int i;
PGRUsage ru0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
Buffer vmbuffer = InvalidBuffer;
BlockNumber next_unskippable_block;
bool skipping_blocks;
@@ -1727,6 +1740,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
"%u pages are entirely empty.\n",
empty_pages),
empty_pages);
+
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+ appendStringInfo(&buf, _("%ld WAL records, %ld WAL full page writes, "
+ UINT64_FORMAT " WAL bytes\n"),
+ walusage.wal_records,
+ walusage.wal_num_fpw,
+ walusage.wal_bytes);
+
appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
ereport(elevel,
--
2.20.1
On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Apr 2, 2020 at 6:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
query | calls | wal_bytes | wal_records | wal_num_fpw
----------------------------------------------+-------+-----------+-------------+-------------
create index t1_idx_parallel_0 ON t1(id) | 1 | 20389743 | 2762 | 2758
create index t1_idx_parallel_0_bis ON t1(id) | 1 | 20394391 | 2762 | 2758
create index t1_idx_parallel_0_ter ON t1(id) | 1 | 20395155 | 2762 | 2758
create index t1_idx_parallel_1 ON t1(id) | 1 | 20388335 | 2762 | 2758
create index t1_idx_parallel_2 ON t1(id) | 1 | 20389091 | 2762 | 2758
create index t1_idx_parallel_3 ON t1(id) | 1 | 20389847 | 2762 | 2758
create index t1_idx_parallel_4 ON t1(id) | 1 | 20390603 | 2762 | 2758
create index t1_idx_parallel_5 ON t1(id) | 1 | 20391359 | 2762 | 2758
create index t1_idx_parallel_6 ON t1(id) | 1 | 20392115 | 2762 | 2758
create index t1_idx_parallel_7 ON t1(id) | 1 | 20392871 | 2762 | 2758
create index t1_idx_parallel_8 ON t1(id) | 1 | 20393627 | 2762 | 2758
(11 rows)

=# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%';
relname | pg_relation_size
-----------------------+------------------
t1_idx_parallel_0 | 22487040
t1_idx_parallel_0_bis | 22487040
t1_idx_parallel_0_ter | 22487040
t1_idx_parallel_2 | 22487040
t1_idx_parallel_1 | 22487040
t1_idx_parallel_4 | 22487040
t1_idx_parallel_3 | 22487040
t1_idx_parallel_5 | 22487040
t1_idx_parallel_6 | 22487040
t1_idx_parallel_7 | 22487040
t1_idx_parallel_8 | 22487040
(9 rows)

So while the number of WAL records and full page images stay constant, we can
see some small fluctuations in the total amount of generated WAL data, even for
multiple execution of the sequential create index. I'm wondering if the
fluctuations are due to some other internal details or if the WalUsage support
is just completely broken (although I don't see any obvious issue ATM).

I think we need to know the reason for this. Can you try with small
size indexes and see if the problem is reproducible? If it is, then it
will be easier to debug the same.

I have done some testing to see where this extra WAL is coming
from. First, I tried creating a new database before every run, and then
the size is consistent. But when I repeated Julien's experiment on the
same server, I got a few extra WAL bytes from the next create index
onwards. The waldump output (attached to this mail) shows that the
difference is in a pg_class insert WAL record. I still have to check
why we need to write the extra WAL.
create extension pg_stat_statements;
drop table t1;
create table t1(id integer);
insert into t1 select * from generate_series(1, 10);
alter table t1 set (parallel_workers = 0);
vacuum;checkpoint;
select * from pg_stat_statements_reset() ;
create index t1_idx_parallel_0 ON t1(id);
select query, calls, wal_bytes, wal_records, wal_num_fpw from
pg_stat_statements where query ilike '%create index%';;
                  query                   | calls | wal_bytes | wal_records | wal_num_fpw
------------------------------------------+-------+-----------+-------------+-------------
 create index t1_idx_parallel_0 ON t1(id) |     1 |     49320 |          23 |          15
drop table t1;
create table t1(id integer);
insert into t1 select * from generate_series(1, 10);
--select * from pg_stat_statements_reset() ;
alter table t1 set (parallel_workers = 0);
vacuum;checkpoint;
create index t1_idx_parallel_1 ON t1(id);
select query, calls, wal_bytes, wal_records, wal_num_fpw from
pg_stat_statements where query ilike '%create index%';;
postgres[110383]=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements;
                  query                   | calls | wal_bytes | wal_records | wal_num_fpw
------------------------------------------+-------+-----------+-------------+-------------
 create index t1_idx_parallel_1 ON t1(id) |     1 |     50040 |          23 |          15
wal_bytes diff = 50040 - 49320 = 720
The WAL record below accounts for the 720-byte difference; all other WAL
records are the same size.
t1_idx_parallel_0:
rmgr: Heap len (rec/tot): 54/ 7498, tx: 489, lsn: 0/0167B9B0, prev 0/0167B970, desc: INSERT off 30 flags 0x01, blkref #0: rel 1663/13580/1249
t1_idx_parallel_1:
rmgr: Heap len (rec/tot): 54/ 8218, tx: 494, lsn: 0/016B84F8, prev 0/016B84B8, desc: INSERT off 30 flags 0x01, blkref #0: rel 1663/13580/1249
wal diff: 8218 - 7498 = 720
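
For readers following along, the snapshot-and-diff accounting behind these figures can be sketched in standalone C. This is a simplified model and not the PostgreSQL source: `WalUsageModel`, `wal_usage_accum_diff`, `model_insert_record` and `statement_wal_bytes` are hypothetical names that only loosely mirror how the patch uses `WalUsage`, `pgWalUsage` and `WalUsageAccumDiff`. It shows why a single record that is 720 bytes larger moves the per-statement `wal_bytes` by exactly 720 while the record count stays the same.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's WalUsage counters. */
typedef struct WalUsageModel
{
    long     wal_records;   /* number of WAL records */
    long     wal_num_fpw;   /* number of full-page images */
    uint64_t wal_bytes;     /* total WAL bytes */
} WalUsageModel;

/* dst += (add - sub), counter by counter. */
static void
wal_usage_accum_diff(WalUsageModel *dst,
                     const WalUsageModel *add,
                     const WalUsageModel *sub)
{
    /* The running total only grows, so the later snapshot is never smaller. */
    assert(add->wal_bytes >= sub->wal_bytes);

    dst->wal_records += add->wal_records - sub->wal_records;
    dst->wal_num_fpw += add->wal_num_fpw - sub->wal_num_fpw;
    dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
}

/* Account for one inserted record in a running, process-wide style counter. */
static void
model_insert_record(WalUsageModel *running, uint64_t total_len, long num_fpi)
{
    running->wal_records += 1;
    running->wal_num_fpw += num_fpi;
    running->wal_bytes += total_len;
}

/*
 * WAL bytes attributed to a statement that inserts one record of rec_len
 * bytes, after prior_bytes of unrelated WAL were already counted.
 * Full-page images are not modelled here (num_fpi = 0).
 */
static uint64_t
statement_wal_bytes(uint64_t prior_bytes, uint64_t rec_len)
{
    WalUsageModel running = {0, 0, 0};
    WalUsageModel start;
    WalUsageModel used = {0, 0, 0};

    model_insert_record(&running, prior_bytes, 0);  /* earlier activity */
    start = running;                                /* snapshot at statement start */
    model_insert_record(&running, rec_len, 0);      /* the statement's own record */
    wal_usage_accum_diff(&used, &running, &start);

    return used.wal_bytes;
}
```

With the two record sizes from the waldump output, `statement_wal_bytes(p, 8218) - statement_wal_bytes(p, 7498)` is 720 for any prior total `p`, which matches the difference seen in pg_stat_statements.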
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 2, 2020 at 8:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
4.
/* # of WAL full page image generated */
Can we change it to "/* # of WAL full page image records generated */"?

IMHO, "# of WAL full-page image records" seems like the number of WAL
records which contain the full-page image.

I think this resembles what you have written here.

But, actually, this is the
total number of the full-page images, not the number of records that
have a full-page image.

We count this when forming WAL records. As per my understanding, all
three counters are about WAL records. This counter tells how many
records have full page images and one of the purposes of having this
counter is to check what percentage of records contain full page
image.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Hello.
The v13 patch seems to fail to apply on master.
At Fri, 3 Apr 2020 06:37:21 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
On Thu, Apr 2, 2020 at 8:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
4.
/* # of WAL full page image generated */
Can we change it to "/* # of WAL full page image records generated */"?

IMHO, "# of WAL full-page image records" seems like the number of WAL
records which contain the full-page image.

I think this resembles what you have written here.

But, actually, this is the
total number of the full-page images, not the number of records that
have a full-page image.

We count this when forming WAL records. As per my understanding, all
three counters are about WAL records. This counter tells how many
records have full page images and one of the purposes of having this
counter is to check what percentage of records contain full page
image.
Regardless of which is more desirable or useful, XLogRecordAssemble
in v13-0001 actually counts the number of attached images, and
XLogInsertRecord then adds that number to pgWalUsage.wal_num_fpw.
FWIW, it seems to me that the main concern here is the source of WAL
size. If that is the case, I think that the number of full-page images
is more useful.
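
The two candidate meanings of the counter can be made concrete with a small standalone C sketch. It is not PostgreSQL code: `RecordModel`, `count_fpi_records` and `count_fpi_images` are hypothetical names, and the only assumption is that a single WAL record may carry more than one full-page image.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a WAL record: only its image count matters here. */
typedef struct RecordModel
{
    int num_images;     /* full-page images attached to this record */
} RecordModel;

/* Interpretation A: how many records contain at least one full-page image. */
static long
count_fpi_records(const RecordModel *recs, size_t n)
{
    long count = 0;

    assert(recs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (recs[i].num_images > 0)
            count++;
    }
    return count;
}

/* Interpretation B: how many full-page images were attached in total. */
static long
count_fpi_images(const RecordModel *recs, size_t n)
{
    long count = 0;

    assert(recs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        count += recs[i].num_images;
    return count;
}
```

For records carrying 0, 1, 2, 0 and 1 images, interpretation A gives 3 and interpretation B gives 4. The two agree only when no record carries more than one image, so the comment and the documentation need to say which one is meant.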
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Fri, Apr 3, 2020 at 6:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Apr 2, 2020 at 8:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
4.
/* # of WAL full page image generated */
Can we change it to "/* # of WAL full page image records generated */"?

IMHO, "# of WAL full-page image records" seems like the number of WAL
records which contain the full-page image.

I think this resembles what you have written here.

But, actually, this is the
total number of the full-page images, not the number of records that
have a full-page image.

We count this when forming WAL records. As per my understanding, all
three counters are about WAL records. This counter tells how many
records have full page images and one of the purposes of having this
counter is to check what percentage of records contain full page
image.
How about if we say "# of full-page writes generated" or "# of WAL
full-page writes generated"? I think I now understand your concern:
we want to display it as full page writes, and the comment
doesn't seem to indicate the same.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Fri, Apr 3, 2020 at 8:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Apr 3, 2020 at 6:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Apr 2, 2020 at 8:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
4.
/* # of WAL full page image generated */
Can we change it to "/* # of WAL full page image records generated */"?

IMHO, "# of WAL full-page image records" seems like the number of WAL
records which contain the full-page image.

I think this resembles what you have written here.

But, actually, this is the
total number of the full-page images, not the number of records that
have a full-page image.

We count this when forming WAL records. As per my understanding, all
three counters are about WAL records. This counter tells how many
records have full page images and one of the purposes of having this
counter is to check what percentage of records contain full page
image.

How about if we say "# of full-page writes generated" or "# of WAL
full-page writes generated"? I think now I understand your concern
because we want to display it as full page writes and the comment
doesn't seem to indicate the same.

Either of these seems good to me.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Apr 3, 2020 at 7:15 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
Hello.
The v13 patch seems failing to apply on the master.
It is probably due to the recent commit ed7a509571. I have briefly
studied that, and I think we should make this patch account for
plan-time WAL usage, if any, similar to what got committed for buffer
usage. The reason is that we might write WAL during planning due to
hint bits.
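
A minimal sketch of what accounting for plan-time WAL could look like, written as standalone C rather than against the PostgreSQL tree. Every name here (`WalCounters`, `running_total`, `emit_wal`, `plan_phase`, `exec_phase`, `run_and_account`) is hypothetical, and the record sizes are made-up illustration values. The point is only that the same snapshot-and-diff wrapper is applied around planning as well as execution, so WAL written while planning is attributed to the statement.

```c
#include <assert.h>
#include <stdint.h>

typedef struct WalCounters
{
    long     records;
    uint64_t bytes;
} WalCounters;

/* Hypothetical process-wide running total, playing the role of pgWalUsage. */
static WalCounters running_total = {0, 0};

/* Pretend to insert one WAL record of the given length. */
static void
emit_wal(uint64_t len)
{
    running_total.records++;
    running_total.bytes += len;
}

typedef void (*phase_fn) (void);

/* Hypothetical planner phase that happens to log one small record. */
static void
plan_phase(void)
{
    emit_wal(64);
}

/* Hypothetical executor phase that logs two records. */
static void
exec_phase(void)
{
    emit_wal(8192);
    emit_wal(120);
}

/* Run one phase and add the WAL it generated to *acc. */
static void
run_and_account(phase_fn phase, WalCounters *acc)
{
    WalCounters start = running_total;  /* snapshot before the phase */

    phase();

    /* The running total never decreases, so the differences are non-negative. */
    assert(running_total.bytes >= start.bytes);
    acc->records += running_total.records - start.records;
    acc->bytes += running_total.bytes - start.bytes;
}
```

Calling `run_and_account(plan_phase, &stmt)` followed by `run_and_account(exec_phase, &stmt)` leaves `stmt` holding the WAL from both phases. If only the executor call were wrapped, the record emitted during planning would not be attributed to the statement.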
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 2, 2020 at 9:28 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Apr 2, 2020 at 6:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
query | calls | wal_bytes | wal_records | wal_num_fpw
----------------------------------------------+-------+-----------+-------------+-------------
create index t1_idx_parallel_0 ON t1(id) | 1 | 20389743 | 2762 | 2758
create index t1_idx_parallel_0_bis ON t1(id) | 1 | 20394391 | 2762 | 2758
create index t1_idx_parallel_0_ter ON t1(id) | 1 | 20395155 | 2762 | 2758
create index t1_idx_parallel_1 ON t1(id) | 1 | 20388335 | 2762 | 2758
create index t1_idx_parallel_2 ON t1(id) | 1 | 20389091 | 2762 | 2758
create index t1_idx_parallel_3 ON t1(id) | 1 | 20389847 | 2762 | 2758
create index t1_idx_parallel_4 ON t1(id) | 1 | 20390603 | 2762 | 2758
create index t1_idx_parallel_5 ON t1(id) | 1 | 20391359 | 2762 | 2758
create index t1_idx_parallel_6 ON t1(id) | 1 | 20392115 | 2762 | 2758
create index t1_idx_parallel_7 ON t1(id) | 1 | 20392871 | 2762 | 2758
create index t1_idx_parallel_8 ON t1(id) | 1 | 20393627 | 2762 | 2758
(11 rows)
=# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%';
relname | pg_relation_size
-----------------------+------------------
t1_idx_parallel_0 | 22487040
t1_idx_parallel_0_bis | 22487040
t1_idx_parallel_0_ter | 22487040
t1_idx_parallel_2 | 22487040
t1_idx_parallel_1 | 22487040
t1_idx_parallel_4 | 22487040
t1_idx_parallel_3 | 22487040
t1_idx_parallel_5 | 22487040
t1_idx_parallel_6 | 22487040
t1_idx_parallel_7 | 22487040
t1_idx_parallel_8 | 22487040
(9 rows)
So while the number of WAL records and full page images stay constant, we can
see some small fluctuations in the total amount of generated WAL data, even for
multiple executions of the sequential create index. I'm wondering if the
fluctuations are due to some other internal details or if the WalUsage support
is just completely broken (although I don't see any obvious issue ATM).
I think we need to know the reason for this. Can you try with small
size indexes and see if the problem is reproducible? If it is, then it
will be easier to debug the same.
I have done some testing to see where this extra WAL is coming
from. First, I tried creating a new database before every run, and then the size
is consistent. But then, on the same server, I tried what Julien showed
in his experiment, and I get a few extra WAL bytes from the next
create index onwards. And the waldump (attached in the mail) shows
that it is a pg_class insert WAL record. I still have to check why we need
to write the extra WAL.
create extension pg_stat_statements;
drop table t1;
create table t1(id integer);
insert into t1 select * from generate_series(1, 10);
alter table t1 set (parallel_workers = 0);
vacuum;checkpoint;
select * from pg_stat_statements_reset() ;
create index t1_idx_parallel_0 ON t1(id);
select query, calls, wal_bytes, wal_records, wal_num_fpw from
pg_stat_statements where query ilike '%create index%';
query | calls | wal_bytes | wal_records | wal_num_fpw
------------------------------------------+-------+-----------+-------------+-------------
create index t1_idx_parallel_0 ON t1(id) | 1 | 49320 | 23 | 15
drop table t1;
create table t1(id integer);
insert into t1 select * from generate_series(1, 10);
--select * from pg_stat_statements_reset() ;
alter table t1 set (parallel_workers = 0);
vacuum;checkpoint;
create index t1_idx_parallel_1 ON t1(id);
select query, calls, wal_bytes, wal_records, wal_num_fpw from
pg_stat_statements where query ilike '%create index%';
postgres[110383]=# select query, calls, wal_bytes, wal_records,
wal_num_fpw from pg_stat_statements;
query | calls | wal_bytes | wal_records | wal_num_fpw
------------------------------------------+-------+-----------+-------------+-------------
create index t1_idx_parallel_1 ON t1(id) | 1 | 50040 | 23 | 15
wal_bytes diff = 50040 - 49320 = 720
The WAL record below accounts for the 720-byte difference; all other
WAL records are the same size.
t1_idx_parallel_0:
rmgr: Heap len (rec/tot): 54/ 7498, tx: 489, lsn: 0/0167B9B0, prev 0/0167B970, desc: INSERT off 30 flags 0x01, blkref #0: rel 1663/13580/1249
t1_idx_parallel_1:
rmgr: Heap len (rec/tot): 54/ 8218, tx: 494, lsn: 0/016B84F8, prev 0/016B84B8, desc: INSERT off 30 flags 0x01, blkref #0: rel 1663/13580/1249
wal diff: 8218 - 7498 = 720
I think I have found the reason. Basically, both of these records
store an FPW, and the FPW size can vary with the hole size on the
page. If the hole is smaller, the image is longer, since
image_len = BLCKSZ - hole_size. So in subsequent records, the image
size is bigger. You can refer to the code below in
XLogRecordAssemble
{
    ....
    bimg.length = BLCKSZ - cbimg.hole_length;
    if (cbimg.hole_length == 0)
    {
        ....
    }
    else
    {
        /* must skip the hole */
        rdt_datas_last->data = page;
        rdt_datas_last->len = bimg.hole_offset;
        rdt_datas_last->next = &regbuf->bkp_rdatas[1];
        rdt_datas_last = rdt_datas_last->next;
        rdt_datas_last->data =
            page + (bimg.hole_offset + cbimg.hole_length);
        rdt_datas_last->len =
            BLCKSZ - (bimg.hole_offset + cbimg.hole_length);
    }
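The relationship in the excerpt above can be reduced to a small sketch. This is an illustration only, not PostgreSQL code: fpw_image_length is a hypothetical helper, BLCKSZ is assumed to be the default 8192 bytes, and the case of a compressed image is ignored.

```c
#include <assert.h>

#define BLCKSZ 8192             /* default block size; an assumption here */

/* Bytes of page image stored when a hole of hole_length bytes is skipped. */
int
fpw_image_length(int hole_length)
{
    assert(hole_length >= 0 && hole_length <= BLCKSZ);
    return BLCKSZ - hole_length;
}
```

If the hole on the page shrinks by 720 bytes between two runs, the stored image grows by the same 720 bytes, which is consistent with the 8218 - 7498 difference between the two records shown earlier.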
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Apr 3, 2020 at 8:55 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I think I have found the reason. Basically, both of these records
store an FPW, and the FPW size can vary with the hole size on the
page. If the hole is smaller, the image is longer, since
image_len = BLCKSZ - hole_size. So in subsequent records, the image
size is bigger.
This means that if we always re-create the database, or maybe set
full_page_writes to off, we should get consistent WAL usage data
for all tests.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Fri, Apr 3, 2020 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Apr 3, 2020 at 8:55 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I think I have found the reason. Basically, both of these records
store an FPW, and the FPW size can vary with the hole size on the
page. If the hole is smaller, the image is longer, since
image_len = BLCKSZ - hole_size. So in subsequent records, the image
size is bigger.
This means that if we always re-create the database, or maybe set
full_page_writes to off, we should get consistent WAL usage data
for all tests.
With a new database, it is always the same. But, with full-page writes,
I could see that one of the create index statements writes extra WAL, and
if we change the order, then the create index now at that place writes
the extra WAL. I guess that could be due to a non-in-place update in some
of the system tables.
postgres[58554]=# create extension pg_stat_statements;
CREATE EXTENSION
postgres[58554]=#
postgres[58554]=# create table t1(id integer);
CREATE TABLE
postgres[58554]=# insert into t1 select * from generate_series(1, 1000000);
INSERT 0 1000000
postgres[58554]=# select * from pg_stat_statements_reset() ;
pg_stat_statements_reset
--------------------------
(1 row)
postgres[58554]=#
postgres[58554]=# alter table t1 set (parallel_workers = 0);
ALTER TABLE
postgres[58554]=# vacuum;checkpoint;
VACUUM
CHECKPOINT
postgres[58554]=# create index t1_idx_parallel_0 ON t1(id);
CREATE INDEX
postgres[58554]=#
postgres[58554]=# alter table t1 set (parallel_workers = 1);
ALTER TABLE
postgres[58554]=# vacuum;checkpoint;
VACUUM
CHECKPOINT
postgres[58554]=# create index t1_idx_parallel_1 ON t1(id);
CREATE INDEX
postgres[58554]=#
postgres[58554]=# alter table t1 set (parallel_workers = 2);
ALTER TABLE
postgres[58554]=# vacuum;checkpoint;
VACUUM
CHECKPOINT
postgres[58554]=# create index t1_idx_parallel_2 ON t1(id);
CREATE INDEX
postgres[58554]=#
postgres[58554]=# alter table t1 set (parallel_workers = 3);
ALTER TABLE
postgres[58554]=# vacuum;checkpoint;
VACUUM
CHECKPOINT
postgres[58554]=# create index t1_idx_parallel_3 ON t1(id);
CREATE INDEX
postgres[58554]=#
postgres[58554]=# alter table t1 set (parallel_workers = 4);
ALTER TABLE
postgres[58554]=# vacuum;checkpoint;
VACUUM
CHECKPOINT
postgres[58554]=# create index t1_idx_parallel_4 ON t1(id);
CREATE INDEX
postgres[58554]=#
postgres[58554]=# alter table t1 set (parallel_workers = 5);
ALTER TABLE
postgres[58554]=# vacuum;checkpoint;
VACUUM
CHECKPOINT
postgres[58554]=# create index t1_idx_parallel_5 ON t1(id);
CREATE INDEX
postgres[58554]=#
postgres[58554]=# alter table t1 set (parallel_workers = 6);
ALTER TABLE
postgres[58554]=# vacuum;checkpoint;
VACUUM
CHECKPOINT
postgres[58554]=# create index t1_idx_parallel_6 ON t1(id);
CREATE INDEX
postgres[58554]=#
postgres[58554]=# alter table t1 set (parallel_workers = 7);
ALTER TABLE
postgres[58554]=# vacuum;checkpoint;
VACUUM
CHECKPOINT
postgres[58554]=# create index t1_idx_parallel_7 ON t1(id);
CREATE INDEX
postgres[58554]=#
postgres[58554]=# alter table t1 set (parallel_workers = 8);
ALTER TABLE
postgres[58554]=# vacuum;checkpoint;
VACUUM
CHECKPOINT
postgres[58554]=# create index t1_idx_parallel_8 ON t1(id);
CREATE INDEX
postgres[58554]=#
postgres[58554]=# select query, calls, wal_bytes, wal_records,
wal_num_fpw from pg_stat_statements where query ilike '%create index%';
query | calls | wal_bytes | wal_records | wal_num_fpw
------------------------------------------+-------+-----------+-------------+-------------
create index t1_idx_parallel_0 ON t1(id) | 1 | 20355953 | 2766 | 2745
create index t1_idx_parallel_1 ON t1(id) | 1 | 20355953 | 2766 | 2745
create index t1_idx_parallel_3 ON t1(id) | 1 | 20355953 | 2766 | 2745
create index t1_idx_parallel_2 ON t1(id) | 1 | 20355953 | 2766 | 2745
create index t1_idx_parallel_4 ON t1(id) | 1 | 20355953 | 2766 | 2745
create index t1_idx_parallel_8 ON t1(id) | 1 | 20355953 | 2766 | 2745
create index t1_idx_parallel_6 ON t1(id) | 1 | 20355953 | 2766 | 2745
create index t1_idx_parallel_7 ON t1(id) | 1 | 20355953 | 2766 | 2745
create index t1_idx_parallel_5 ON t1(id) | 1 | 20359585 | 2767 | 2745
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Apr 3, 2020 at 9:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Apr 3, 2020 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Apr 3, 2020 at 8:55 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I think I have found the reason. Basically, both of these records
store an FPW, and the FPW size can vary with the hole size on the
page. If the hole is smaller, the image is longer, since
image_len = BLCKSZ - hole_size. So in subsequent records, the image
size is bigger.
This means that if we always re-create the database, or maybe set
full_page_writes to off, we should get consistent WAL usage data
for all tests.
With a new database, it is always the same. But, with full-page writes,
I could see that one of the create index statements writes extra WAL, and
if we change the order, then the create index now at that place writes
the extra WAL. I guess that could be due to a non-in-place update in some
of the system tables.
I have analyzed the WAL and there could be multiple reasons for the
same. With small data, I have noticed that while inserting into the
system index there was a page split, and that created extra WAL.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Apr 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Apr 3, 2020 at 9:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Apr 3, 2020 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Apr 3, 2020 at 8:55 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I think I have found the reason. Basically, both of these records
store an FPW, and the FPW size can vary with the hole size on the
page. If the hole is smaller, the image is longer, since
image_len = BLCKSZ - hole_size. So in subsequent records, the image
size is bigger.
This means that if we always re-create the database, or maybe set
full_page_writes to off, we should get consistent WAL usage data
for all tests.
With a new database, it is always the same. But, with full-page writes,
I could see that one of the create index statements writes extra WAL, and
if we change the order, then the create index now at that place writes
the extra WAL. I guess that could be due to a non-in-place update in some
of the system tables.
I have analyzed the WAL and there could be multiple reasons for the
same. With small data, I have noticed that while inserting into the
system index there was a page split, and that created extra WAL.
Thanks for the investigation. I think it is clear that we can't
expect the same WAL size even if we repeat the same operation unless
it is a fresh database.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Fri, Apr 3, 2020 at 9:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Apr 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have analyzed the WAL and there could be multiple reasons for the
same. With small data, I have noticed that while inserting into the
system index there was a page split, and that created extra WAL.
Thanks for the investigation. I think it is clear that we can't
expect the same WAL size even if we repeat the same operation unless
it is a fresh database.
Please find the latest patches attached. I have modified them based on our
discussion on the user interface thread [1], ran pgindent on all patches,
slightly modified one comment based on Dilip's input, and added commit
messages. I think the patches are in good shape. I would like to
commit the first patch in this series tomorrow unless I see more
comments or any other objections. Patch 2 might need to be
rebased if the other related patch [2] gets committed first, and we
might need to tweak it a bit based on the input from the other thread [1],
where we are discussing the user interface for it.
[1]: /messages/by-id/CAA4eK1+o1Vj4Rso09pKOaKhY8QWTA0gWwCL3TGCi1rCLBBf-QQ@mail.gmail.com
[2]: /messages/by-id/E1jKC4J-0007R3-Bo@gemulon.postgresql.org
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v14-0001-Add-infrastructure-to-track-WAL-usage.patch (application/octet-stream)
From 87d430aa6cc2e86e263bf45bf305158ec8caa459 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 3 Apr 2020 18:21:05 +0530
Subject: [PATCH v14 1/4] Add infrastructure to track WAL usage.
This allows gathering the WAL generation statistics for each statement
execution. The three statistics that we collect are number of WAL records,
the number of full page writes and the amount of WAL bytes generated.
This helps the users who have write-intensive workload to see the impact
of I/O due to WAL. This further enables us to see approximately what
percentage of overall WAL is due to full page writes.
In future, we can extend this functionality to allow us to compute the
exact amount of WAL data due to full page writes.
This patch in itself is just an infrastructure to compute WAL usage data.
The upcoming patches will expose this data via explain, auto_explain,
pg_stat_statements and verbose (auto)vacuum output.
Author: Kirill Bychik, Julien Rouhaud
Reviewed-by: Dilip Kumar, Fujii Masao and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 37 +++++++++++++++++------
src/backend/access/nbtree/nbtsort.c | 40 +++++++++++++++++++++++++
src/backend/access/transam/xlog.c | 12 +++++++-
src/backend/access/transam/xloginsert.c | 13 +++++---
src/backend/executor/execParallel.c | 36 ++++++++++++++++------
src/backend/executor/instrument.c | 53 +++++++++++++++++++++++++++++----
src/include/access/xlog.h | 3 +-
src/include/executor/execParallel.h | 1 +
src/include/executor/instrument.h | 18 +++++++++--
src/tools/pgindent/typedefs.list | 1 +
10 files changed, 182 insertions(+), 32 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9f9596c..3ca7f5d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -139,6 +139,7 @@
#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
+#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -275,6 +276,9 @@ typedef struct LVParallelState
/* Points to buffer usage area in DSM */
BufferUsage *buffer_usage;
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
+
/*
* The number of indexes that support parallel index bulk-deletion and
* parallel index cleanup respectively.
@@ -2143,8 +2147,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
vacrelstats->dead_tuples, nindexes, vacrelstats);
/*
- * Next, accumulate buffer usage. (This must wait for the workers to
- * finish, or we might get incomplete data.)
+ * Next, accumulate buffer and WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
*/
if (nworkers > 0)
{
@@ -2154,7 +2158,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
WaitForParallelWorkersToFinish(lps->pcxt);
for (i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i]);
+ InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
/*
@@ -3171,6 +3175,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
LVShared *shared;
LVDeadTuples *dead_tuples;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
bool *can_parallel_vacuum;
long maxtuples;
char *sharedquery;
@@ -3255,15 +3260,19 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_estimate_keys(&pcxt->estimator, 1);
/*
- * Estimate space for BufferUsage -- PARALLEL_VACUUM_KEY_BUFFER_USAGE.
+ * Estimate space for BufferUsage and WalUsage --
+ * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
*
* If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgBufferUsage, so do
- * it unconditionally.
+ * have no way of knowing whether anyone's looking at pgBufferUsage or
+ * pgWalUsage, so do it unconditionally.
*/
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
/* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
@@ -3299,11 +3308,18 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
vacrelstats->dead_tuples = dead_tuples;
- /* Allocate space for each worker's BufferUsage; no need to initialize */
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
buffer_usage = shm_toc_allocate(pcxt->toc,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
lps->buffer_usage = buffer_usage;
+ wal_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
+ lps->wal_usage = wal_usage;
/* Store query string for workers */
sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
@@ -3435,6 +3451,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
LVShared *lvshared;
LVDeadTuples *dead_tuples;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
int nindexes;
char *sharedquery;
IndexBulkDeleteResult **stats;
@@ -3511,9 +3528,11 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
parallel_vacuum_index(indrels, stats, lvshared, dead_tuples, nindexes,
&vacrelstats);
- /* Report buffer usage during parallel execution */
+ /* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Pop the error context stack */
error_context_stack = errcallback.previous;
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 3924945..4a85865 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -67,6 +67,7 @@
#include "access/xloginsert.h"
#include "catalog/index.h"
#include "commands/progress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/smgr.h"
@@ -81,6 +82,7 @@
#define PARALLEL_KEY_TUPLESORT UINT64CONST(0xA000000000000002)
#define PARALLEL_KEY_TUPLESORT_SPOOL2 UINT64CONST(0xA000000000000003)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xA000000000000004)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xA000000000000005)
/*
* DISABLE_LEADER_PARTICIPATION disables the leader's participation in
@@ -203,6 +205,7 @@ typedef struct BTLeader
Sharedsort *sharedsort;
Sharedsort *sharedsort2;
Snapshot snapshot;
+ WalUsage *walusage;
} BTLeader;
/*
@@ -1476,6 +1479,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
Sharedsort *sharedsort2;
BTSpool *btspool = buildstate->spool;
BTLeader *btleader = (BTLeader *) palloc0(sizeof(BTLeader));
+ WalUsage *walusage;
bool leaderparticipates = true;
char *sharedquery;
int querylen;
@@ -1528,6 +1532,18 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
shm_toc_estimate_keys(&pcxt->estimator, 3);
}
+ /*
+ * Estimate space for WalUsage -- PARALLEL_KEY_WAL_USAGE
+ *
+ * WalUsage during execution of maintenance command can be used by an
+ * extension that reports the WAL usage, such as pg_stat_statements. We
+ * have no way of knowing whether anyone's looking at pgWalUsage, so do it
+ * unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -1599,6 +1615,11 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
memcpy(sharedquery, debug_query_string, querylen + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, sharedquery);
+ /* Allocate space for each worker's WalUsage; no need to initialize */
+ walusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage);
+
/* Launch workers, saving status for leader/caller */
LaunchParallelWorkers(pcxt);
btleader->pcxt = pcxt;
@@ -1609,6 +1630,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
btleader->sharedsort = sharedsort;
btleader->sharedsort2 = sharedsort2;
btleader->snapshot = snapshot;
+ btleader->walusage = walusage;
/* If no workers were successfully launched, back out (do serial build) */
if (pcxt->nworkers_launched == 0)
@@ -1637,8 +1659,18 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
static void
_bt_end_parallel(BTLeader *btleader)
{
+ int i;
+
/* Shutdown worker processes */
WaitForParallelWorkersToFinish(btleader->pcxt);
+
+ /*
+ * Next, accumulate WAL usage. (This must wait for the workers to finish,
+ * or we might get incomplete data.)
+ */
+ for (i = 0; i < btleader->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(NULL, &btleader->walusage[i]);
+
/* Free last reference to MVCC snapshot, if one was used */
if (IsMVCCSnapshot(btleader->snapshot))
UnregisterSnapshot(btleader->snapshot);
@@ -1769,6 +1801,7 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
Relation indexRel;
LOCKMODE heapLockmode;
LOCKMODE indexLockmode;
+ WalUsage *walusage;
int sortmem;
#ifdef BTREE_BUILD_STATS
@@ -1830,11 +1863,18 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
tuplesort_attach_shared(sharedsort2, seg);
}
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
/* Perform sorting of spool, and possibly a spool2 */
sortmem = maintenance_work_mem / btshared->scantuplesortstates;
_bt_parallel_scan_and_sort(btspool, btspool2, btshared, sharedsort,
sharedsort2, sortmem, false);
+ /* Report WAL usage during parallel execution */
+ walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(NULL, &walusage[ParallelWorkerNumber]);
+
#ifdef BTREE_BUILD_STATS
if (log_btree_build_stats)
{
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 977d448..50b78f3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -43,6 +43,7 @@
#include "commands/progress.h"
#include "commands/tablespace.h"
#include "common/controldata_utils.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "pgstat.h"
@@ -996,7 +997,8 @@ static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
XLogRecPtr
XLogInsertRecord(XLogRecData *rdata,
XLogRecPtr fpw_lsn,
- uint8 flags)
+ uint8 flags,
+ int num_fpw)
{
XLogCtlInsert *Insert = &XLogCtl->Insert;
pg_crc32c rdata_crc;
@@ -1252,6 +1254,14 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
+ /* Report WAL traffic to the instrumentation. */
+ if (inserted)
+ {
+ pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+ pgWalUsage.wal_records++;
+ pgWalUsage.wal_num_fpw += num_fpw;
+ }
+
return EndPos;
}
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index a618dec..5e032e7 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -25,6 +25,7 @@
#include "access/xloginsert.h"
#include "catalog/pg_control.h"
#include "common/pg_lzcompress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "replication/origin.h"
@@ -108,7 +109,7 @@ static MemoryContext xloginsert_cxt;
static XLogRecData *XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
- XLogRecPtr *fpw_lsn);
+ XLogRecPtr *fpw_lsn, int *num_fpw);
static bool XLogCompressBackupBlock(char *page, uint16 hole_offset,
uint16 hole_length, char *dest, uint16 *dlen);
@@ -448,6 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info)
bool doPageWrites;
XLogRecPtr fpw_lsn;
XLogRecData *rdt;
+ int num_fpw = 0;
/*
* Get values needed to decide whether to do full-page writes. Since
@@ -457,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info)
GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);
rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
- &fpw_lsn);
+ &fpw_lsn, &num_fpw);
- EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags);
+ EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw);
} while (EndPos == InvalidXLogRecPtr);
XLogResetInsertion();
@@ -482,7 +484,7 @@ XLogInsert(RmgrId rmid, uint8 info)
static XLogRecData *
XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
- XLogRecPtr *fpw_lsn)
+ XLogRecPtr *fpw_lsn, int *num_fpw)
{
XLogRecData *rdt;
uint32 total_len = 0;
@@ -635,6 +637,9 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
*/
bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
+ /* Report a full page image constructed for the WAL record */
+ *num_fpw += 1;
+
/*
* Construct XLogRecData entries for the page content.
*/
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6e..b7d0719 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -12,7 +12,7 @@
* workers and ensuring that their state generally matches that of the
* leader; see src/backend/access/transam/README.parallel for details.
* However, we must save and restore relevant executor state, such as
- * any ParamListInfo associated with the query, buffer usage info, and
+ * any ParamListInfo associated with the query, buffer/WAL usage info, and
* the actual plan to be passed down to the worker.
*
* IDENTIFICATION
@@ -62,6 +62,7 @@
#define PARALLEL_KEY_DSA UINT64CONST(0xE000000000000007)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -573,6 +574,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
char *pstmt_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
+ WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
@@ -646,6 +648,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
mul_size(sizeof(BufferUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Same thing for WalUsage.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for tuple queues. */
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers));
@@ -728,6 +737,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufusage_space);
pei->buffer_usage = bufusage_space;
+ /* Same for WalUsage. */
+ walusage_space = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
+ pei->wal_usage = walusage_space;
+
/* Set up the tuple queues that the workers will write into. */
pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
@@ -1069,7 +1084,7 @@ ExecParallelRetrieveJitInstrumentation(PlanState *planstate,
/*
* Finish parallel execution. We wait for parallel workers to finish, and
- * accumulate their buffer usage.
+ * accumulate their buffer/WAL usage.
*/
void
ExecParallelFinish(ParallelExecutorInfo *pei)
@@ -1109,11 +1124,11 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
WaitForParallelWorkersToFinish(pei->pcxt);
/*
- * Next, accumulate buffer usage. (This must wait for the workers to
+ * Next, accumulate buffer/WAL usage. (This must wait for the workers to
* finish, or we might get incomplete data.)
*/
for (i = 0; i < nworkers; i++)
- InstrAccumParallelQuery(&pei->buffer_usage[i]);
+ InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]);
pei->finished = true;
}
@@ -1333,6 +1348,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
{
FixedParallelExecutorState *fpes;
BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
DestReceiver *receiver;
QueryDesc *queryDesc;
SharedExecutorInstrumentation *instrumentation;
@@ -1386,11 +1402,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
ExecSetTupleBound(fpes->tuples_needed, queryDesc->planstate);
/*
- * Prepare to track buffer usage during query execution.
+ * Prepare to track buffer/WAL usage during query execution.
*
* We do this after starting up the executor to match what happens in the
- * leader, which also doesn't count buffer accesses that occur during
- * executor startup.
+ * leader, which also doesn't count buffer accesses and WAL activity that
+ * occur during executor startup.
*/
InstrStartParallelQuery();
@@ -1406,9 +1422,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Shut down the executor */
ExecutorFinish(queryDesc);
- /* Report buffer usage during parallel execution. */
+ /* Report buffer/WAL usage during parallel execution. */
buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
/* Report instrumentation data if any instrumentation options are set. */
if (instrumentation != NULL)
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 042e10f..74ee480 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -19,8 +19,11 @@
BufferUsage pgBufferUsage;
static BufferUsage save_pgBufferUsage;
+WalUsage pgWalUsage;
+static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);
/* Allocate new instrumentation structure(s) */
@@ -31,15 +34,17 @@ InstrAlloc(int n, int instrument_options)
/* initialize all fields to zeroes, then modify as needed */
instr = palloc0(n * sizeof(Instrumentation));
- if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER))
+ if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER | INSTRUMENT_WAL))
{
bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
bool need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
int i;
for (i = 0; i < n; i++)
{
instr[i].need_bufusage = need_buffers;
+ instr[i].need_walusage = need_wal;
instr[i].need_timer = need_timer;
}
}
@@ -53,6 +58,7 @@ InstrInit(Instrumentation *instr, int instrument_options)
{
memset(instr, 0, sizeof(Instrumentation));
instr->need_bufusage = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ instr->need_walusage = (instrument_options & INSTRUMENT_WAL) != 0;
instr->need_timer = (instrument_options & INSTRUMENT_TIMER) != 0;
}
@@ -67,6 +73,9 @@ InstrStartNode(Instrumentation *instr)
/* save buffer usage totals at node entry, if needed */
if (instr->need_bufusage)
instr->bufusage_start = pgBufferUsage;
+
+ if (instr->need_walusage)
+ instr->walusage_start = pgWalUsage;
}
/* Exit from a plan node */
@@ -95,6 +104,10 @@ InstrStopNode(Instrumentation *instr, double nTuples)
BufferUsageAccumDiff(&instr->bufusage,
&pgBufferUsage, &instr->bufusage_start);
+ if (instr->need_walusage)
+ WalUsageAccumDiff(&instr->walusage,
+ &pgWalUsage, &instr->walusage_start);
+
/* Is this the first tuple of this cycle? */
if (!instr->running)
{
@@ -158,6 +171,9 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
/* Add delta of buffer usage since entry to node's totals */
if (dst->need_bufusage)
BufferUsageAdd(&dst->bufusage, &add->bufusage);
+
+ if (dst->need_walusage)
+ WalUsageAdd(&dst->walusage, &add->walusage);
}
/* note current values during parallel executor startup */
@@ -165,21 +181,29 @@ void
InstrStartParallelQuery(void)
{
save_pgBufferUsage = pgBufferUsage;
+ save_pgWalUsage = pgWalUsage;
}
/* report usage after parallel executor shutdown */
void
-InstrEndParallelQuery(BufferUsage *result)
+InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- memset(result, 0, sizeof(BufferUsage));
- BufferUsageAccumDiff(result, &pgBufferUsage, &save_pgBufferUsage);
+ if (bufusage)
+ {
+ memset(bufusage, 0, sizeof(BufferUsage));
+ BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
+ }
+ memset(walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
}
/* accumulate work done by workers in leader's stats */
void
-InstrAccumParallelQuery(BufferUsage *result)
+InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- BufferUsageAdd(&pgBufferUsage, result);
+ if (bufusage)
+ BufferUsageAdd(&pgBufferUsage, bufusage);
+ WalUsageAdd(&pgWalUsage, walusage);
}
/* dst += add */
@@ -221,3 +245,20 @@ BufferUsageAccumDiff(BufferUsage *dst,
INSTR_TIME_ACCUM_DIFF(dst->blk_write_time,
add->blk_write_time, sub->blk_write_time);
}
+
+/* helper functions for WAL usage accumulation */
+static void
+WalUsageAdd(WalUsage *dst, WalUsage *add)
+{
+ dst->wal_bytes += add->wal_bytes;
+ dst->wal_records += add->wal_records;
+ dst->wal_num_fpw += add->wal_num_fpw;
+}
+
+void
+WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
+{
+ dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
+ dst->wal_records += add->wal_records - sub->wal_records;
+ dst->wal_num_fpw += add->wal_num_fpw - sub->wal_num_fpw;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 9ec7b31..b91e724 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -259,7 +259,8 @@ struct XLogRecData;
extern XLogRecPtr XLogInsertRecord(struct XLogRecData *rdata,
XLogRecPtr fpw_lsn,
- uint8 flags);
+ uint8 flags,
+ int num_fpw);
extern void XLogFlush(XLogRecPtr RecPtr);
extern bool XLogBackgroundFlush(void);
extern bool XLogNeedsFlush(XLogRecPtr RecPtr);
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 17d07cf..5a39a5b 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -26,6 +26,7 @@ typedef struct ParallelExecutorInfo
PlanState *planstate; /* plan subtree we're running in parallel */
ParallelContext *pcxt; /* parallel context we're using */
BufferUsage *buffer_usage; /* points to bufusage area in DSM */
+ WalUsage *wal_usage; /* walusage area in DSM */
SharedExecutorInstrumentation *instrumentation; /* optional */
struct SharedJitInstrumentation *jit_instrumentation; /* optional */
dsa_area *area; /* points to DSA area in DSM */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 3825a5a..64439c6 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -32,12 +32,20 @@ typedef struct BufferUsage
instr_time blk_write_time; /* time spent writing */
} BufferUsage;
+typedef struct WalUsage
+{
+ long wal_records; /* # of WAL records produced */
+ long wal_num_fpw; /* # of WAL full page image writes produced */
+ uint64 wal_bytes; /* size of WAL records produced */
+} WalUsage;
+
/* Flag bits included in InstrAlloc's instrument_options bitmask */
typedef enum InstrumentOption
{
INSTRUMENT_TIMER = 1 << 0, /* needs timer (and row counts) */
INSTRUMENT_BUFFERS = 1 << 1, /* needs buffer usage */
INSTRUMENT_ROWS = 1 << 2, /* needs row count */
+ INSTRUMENT_WAL = 1 << 3, /* needs WAL usage */
INSTRUMENT_ALL = PG_INT32_MAX
} InstrumentOption;
@@ -46,6 +54,7 @@ typedef struct Instrumentation
/* Parameters set at node creation: */
bool need_timer; /* true if we need timer data */
bool need_bufusage; /* true if we need buffer usage data */
+ bool need_walusage; /* true if we need WAL usage data */
/* Info about current plan cycle: */
bool running; /* true if we've completed first tuple */
instr_time starttime; /* start time of current iteration of node */
@@ -53,6 +62,7 @@ typedef struct Instrumentation
double firsttuple; /* time for first tuple of this cycle */
double tuplecount; /* # of tuples emitted so far this cycle */
BufferUsage bufusage_start; /* buffer usage at start */
+ WalUsage walusage_start; /* WAL usage at start */
/* Accumulated statistics across all completed cycles: */
double startup; /* total startup time (in seconds) */
double total; /* total time (in seconds) */
@@ -62,6 +72,7 @@ typedef struct Instrumentation
double nfiltered1; /* # of tuples removed by scanqual or joinqual */
double nfiltered2; /* # of tuples removed by "other" quals */
BufferUsage bufusage; /* total buffer usage */
+ WalUsage walusage; /* total WAL usage */
} Instrumentation;
typedef struct WorkerInstrumentation
@@ -71,6 +82,7 @@ typedef struct WorkerInstrumentation
} WorkerInstrumentation;
extern PGDLLIMPORT BufferUsage pgBufferUsage;
+extern PGDLLIMPORT WalUsage pgWalUsage;
extern Instrumentation *InstrAlloc(int n, int instrument_options);
extern void InstrInit(Instrumentation *instr, int instrument_options);
@@ -79,9 +91,11 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
extern void InstrStartParallelQuery(void);
-extern void InstrEndParallelQuery(BufferUsage *result);
-extern void InstrAccumParallelQuery(BufferUsage *result);
+extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
+extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
extern void BufferUsageAccumDiff(BufferUsage *dst,
const BufferUsage *add, const BufferUsage *sub);
+extern void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add,
+ const WalUsage *sub);
#endif /* INSTRUMENT_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 939de98..3462352 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2643,6 +2643,7 @@ WalSndCtlData
WalSndSendDataCallback
WalSndState
WalTimeSample
+WalUsage
WalWriteMethod
Walfile
WindowAgg
--
1.8.3.1
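For readers following the instrument.c changes above, here is a minimal standalone sketch of the snapshot-and-diff pattern they rely on: copy the running counter at entry, do the work, then accumulate "current minus start" into the result. The struct mirrors the patch's WalUsage, but everything prefixed demo_ is hypothetical and exists only for illustration; in the backend the global is pgWalUsage and it is advanced by the WAL insertion code.

```c
#include <stdint.h>

/* Stand-in for the patch's WalUsage; field names follow instrument.h. */
typedef struct WalUsage
{
	long		wal_records;	/* # of WAL records produced */
	long		wal_num_fpw;	/* # of full page images produced */
	uint64_t	wal_bytes;		/* size of WAL records produced */
} WalUsage;

/* Hypothetical stand-in for the backend-global pgWalUsage counter. */
static WalUsage demo_wal_usage;

/* dst += add - sub, mirroring WalUsageAccumDiff() in the patch. */
static void
demo_accum_diff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
{
	dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
	dst->wal_records += add->wal_records - sub->wal_records;
	dst->wal_num_fpw += add->wal_num_fpw - sub->wal_num_fpw;
}

/* Hypothetical: pretend one WAL record of the given size was inserted. */
static void
demo_insert_record(uint64_t nbytes, int num_fpw)
{
	demo_wal_usage.wal_records++;
	demo_wal_usage.wal_num_fpw += num_fpw;
	demo_wal_usage.wal_bytes += nbytes;
}

/* Measure the WAL generated by a piece of work: snapshot, run, diff. */
static WalUsage
demo_measure(void (*work) (void))
{
	WalUsage	start = demo_wal_usage; /* snapshot at entry */
	WalUsage	delta = {0, 0, 0};

	work();
	demo_accum_diff(&delta, &demo_wal_usage, &start);
	return delta;
}

/* Hypothetical workload: one small record and one full page image. */
static void
demo_work(void)
{
	demo_insert_record(64, 0);
	demo_insert_record(8192, 1);
}
```

Because only the difference is accumulated, whatever was already in the global counter before the measured section does not leak into the result, which is the same reason InstrStartNode() and InstrStartParallelQuery() save a start value.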
v14-0002-Add-the-option-to-report-WAL-usage-in-EXPLAIN-an.patch
From efba4e29e063ca08e0e642d2ec422c7ed76ca5e4 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 3 Apr 2020 18:42:25 +0530
Subject: [PATCH v14 2/4] Add the option to report WAL usage in EXPLAIN and
auto_explain.
This commit adds a new option, WAL, similar to the existing BUFFERS option of
the EXPLAIN command. This option allows information on WAL record generation,
added by commit <>, to be included in EXPLAIN output.
This also allows the WAL usage information to be displayed via
the auto_explain module. A new parameter auto_explain.log_wal controls
whether WAL usage statistics are printed when an execution plan is logged.
This parameter has no effect unless auto_explain.log_analyze is enabled.
Author: Julien Rouhaud
Reviewed-by: Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
contrib/auto_explain/auto_explain.c | 15 ++++++++
doc/src/sgml/auto-explain.sgml | 20 ++++++++++
doc/src/sgml/ref/explain.sgml | 14 +++++++
src/backend/commands/explain.c | 74 +++++++++++++++++++++++++++++++++++--
src/bin/psql/tab-complete.c | 4 +-
src/include/commands/explain.h | 3 ++
6 files changed, 124 insertions(+), 6 deletions(-)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index f69dde8..56c549d 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -27,6 +27,7 @@ static int auto_explain_log_min_duration = -1; /* msec or -1 */
static bool auto_explain_log_analyze = false;
static bool auto_explain_log_verbose = false;
static bool auto_explain_log_buffers = false;
+static bool auto_explain_log_wal = false;
static bool auto_explain_log_triggers = false;
static bool auto_explain_log_timing = true;
static bool auto_explain_log_settings = false;
@@ -148,6 +149,17 @@ _PG_init(void)
NULL,
NULL);
+ DefineCustomBoolVariable("auto_explain.log_wal",
+ "Log WAL usage.",
+ NULL,
+ &auto_explain_log_wal,
+ false,
+ PGC_SUSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
+
DefineCustomBoolVariable("auto_explain.log_triggers",
"Include trigger statistics in plans.",
"This has no effect unless log_analyze is also set.",
@@ -280,6 +292,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
queryDesc->instrument_options |= INSTRUMENT_ROWS;
if (auto_explain_log_buffers)
queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
+ if (auto_explain_log_wal)
+ queryDesc->instrument_options |= INSTRUMENT_WAL;
}
}
@@ -374,6 +388,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
es->analyze = (queryDesc->instrument_options && auto_explain_log_analyze);
es->verbose = auto_explain_log_verbose;
es->buffers = (es->analyze && auto_explain_log_buffers);
+ es->wal = (es->analyze && auto_explain_log_wal);
es->timing = (es->analyze && auto_explain_log_timing);
es->summary = es->analyze;
es->format = auto_explain_log_format;
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index 3d619d4..d4d29c4 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -111,6 +111,26 @@ LOAD 'auto_explain';
<varlistentry>
<term>
+ <varname>auto_explain.log_wal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>auto_explain.log_wal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ <varname>auto_explain.log_wal</varname> controls whether WAL
+ usage statistics are printed when an execution plan is logged; it's
+ equivalent to the <literal>WAL</literal> option of <command>EXPLAIN</command>.
+ This parameter has no effect
+ unless <varname>auto_explain.log_analyze</varname> is enabled.
+ This parameter is off by default.
+ Only superusers can change this setting.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
<varname>auto_explain.log_timing</varname> (<type>boolean</type>)
<indexterm>
<primary><varname>auto_explain.log_timing</varname> configuration parameter</primary>
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 385d104..024ede4 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -41,6 +41,7 @@ EXPLAIN [ ANALYZE ] [ VERBOSE ] <replaceable class="parameter">statement</replac
COSTS [ <replaceable class="parameter">boolean</replaceable> ]
SETTINGS [ <replaceable class="parameter">boolean</replaceable> ]
BUFFERS [ <replaceable class="parameter">boolean</replaceable> ]
+ WAL [ <replaceable class="parameter">boolean</replaceable> ]
TIMING [ <replaceable class="parameter">boolean</replaceable> ]
SUMMARY [ <replaceable class="parameter">boolean</replaceable> ]
FORMAT { TEXT | XML | JSON | YAML }
@@ -193,6 +194,19 @@ ROLLBACK;
</varlistentry>
<varlistentry>
+ <term><literal>WAL</literal></term>
+ <listitem>
+ <para>
+ Include information on WAL record generation. Specifically, include the
+ number of records, number of full page image records and amount of WAL
+ bytes generated. In text format, only non-zero values are printed. This
+ parameter may only be used when <literal>ANALYZE</literal> is also
+ enabled. It defaults to <literal>FALSE</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>TIMING</literal></term>
<listitem>
<para>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ee0e638..ba726df 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -113,6 +113,7 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
+static void show_wal_usage(ExplainState *es, const WalUsage *usage);
static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
ExplainState *es);
static void ExplainScanTarget(Scan *plan, ExplainState *es);
@@ -175,6 +176,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
es->costs = defGetBoolean(opt);
else if (strcmp(opt->defname, "buffers") == 0)
es->buffers = defGetBoolean(opt);
+ else if (strcmp(opt->defname, "wal") == 0)
+ es->wal = defGetBoolean(opt);
else if (strcmp(opt->defname, "settings") == 0)
es->settings = defGetBoolean(opt);
else if (strcmp(opt->defname, "timing") == 0)
@@ -219,6 +222,11 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("EXPLAIN option BUFFERS requires ANALYZE")));
+ if (es->wal && !es->analyze)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("EXPLAIN option WAL requires ANALYZE")));
+
/* if the timing was not set explicitly, set default value */
es->timing = (timing_set) ? es->timing : es->analyze;
@@ -494,6 +502,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
if (es->buffers)
instrument_option |= INSTRUMENT_BUFFERS;
+ if (es->wal)
+ instrument_option |= INSTRUMENT_WAL;
/*
* We always collect timing for the entire statement, even when node-level
@@ -1942,12 +1952,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
}
- /* Show buffer usage */
+ /* Show buffer/WAL usage */
if (es->buffers && planstate->instrument)
show_buffer_usage(es, &planstate->instrument->bufusage);
+ if (es->wal && planstate->instrument)
+ show_wal_usage(es, &planstate->instrument->walusage);
- /* Prepare per-worker buffer usage */
- if (es->workers_state && es->buffers && es->verbose)
+ /* Prepare per-worker buffer/WAL usage */
+ if (es->workers_state && (es->buffers || es->wal) && es->verbose)
{
WorkerInstrumentation *w = planstate->worker_instrument;
@@ -1960,7 +1972,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
continue;
ExplainOpenWorker(n, es);
- show_buffer_usage(es, &instrument->bufusage);
+ if (es->buffers)
+ show_buffer_usage(es, &instrument->bufusage);
+ if (es->wal)
+ show_wal_usage(es, &instrument->walusage);
ExplainCloseWorker(n, es);
}
}
@@ -3060,6 +3075,44 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
}
/*
+ * Show WAL usage details.
+ */
+static void
+show_wal_usage(ExplainState *es, const WalUsage *usage)
+{
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ /* Show only positive counter values. */
+ if ((usage->wal_records > 0) || (usage->wal_num_fpw > 0) ||
+ (usage->wal_bytes > 0))
+ {
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "WAL:");
+
+ if (usage->wal_records > 0)
+ appendStringInfo(es->str, " records=%ld",
+ usage->wal_records);
+ if (usage->wal_num_fpw > 0)
+ appendStringInfo(es->str, " full page writes=%ld",
+ usage->wal_num_fpw);
+ if (usage->wal_bytes > 0)
+ appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
+ usage->wal_bytes);
+ appendStringInfoChar(es->str, '\n');
+ }
+ }
+ else
+ {
+ ExplainPropertyInteger("WAL records", NULL,
+ usage->wal_records, es);
+ ExplainPropertyInteger("WAL full page writes", NULL,
+ usage->wal_num_fpw, es);
+ ExplainPropertyUInteger("WAL bytes", NULL,
+ usage->wal_bytes, es);
+ }
+}
+
+/*
* Add some additional details about an IndexScan or IndexOnlyScan
*/
static void
@@ -3844,6 +3897,19 @@ ExplainPropertyInteger(const char *qlabel, const char *unit, int64 value,
}
/*
+ * Explain an unsigned integer-valued property.
+ */
+void
+ExplainPropertyUInteger(const char *qlabel, const char *unit, uint64 value,
+ ExplainState *es)
+{
+ char buf[32];
+
+ snprintf(buf, sizeof(buf), UINT64_FORMAT, value);
+ ExplainProperty(qlabel, unit, buf, true, es);
+}
+
+/*
* Explain a float-valued property, using the specified number of
* fractional digits.
*/
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5fec597..0e7a373 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3045,8 +3045,8 @@ psql_completion(const char *text, int start, int end)
*/
if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
COMPLETE_WITH("ANALYZE", "VERBOSE", "COSTS", "SETTINGS",
- "BUFFERS", "TIMING", "SUMMARY", "FORMAT");
- else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|BUFFERS|TIMING|SUMMARY"))
+ "BUFFERS", "WAL", "TIMING", "SUMMARY", "FORMAT");
+ else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|BUFFERS|WAL|TIMING|SUMMARY"))
COMPLETE_WITH("ON", "OFF");
else if (TailMatches("FORMAT"))
COMPLETE_WITH("TEXT", "XML", "JSON", "YAML");
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 54f6240..7b0b0a9 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -42,6 +42,7 @@ typedef struct ExplainState
bool analyze; /* print actual times */
bool costs; /* print estimated costs */
bool buffers; /* print buffer usage */
+ bool wal; /* print WAL usage */
bool timing; /* print detailed node timing */
bool summary; /* print total planning and execution timing */
bool settings; /* print modified settings */
@@ -110,6 +111,8 @@ extern void ExplainPropertyText(const char *qlabel, const char *value,
ExplainState *es);
extern void ExplainPropertyInteger(const char *qlabel, const char *unit,
int64 value, ExplainState *es);
+extern void ExplainPropertyUInteger(const char *qlabel, const char *unit,
+ uint64 value, ExplainState *es);
extern void ExplainPropertyFloat(const char *qlabel, const char *unit,
double value, int ndigits, ExplainState *es);
extern void ExplainPropertyBool(const char *qlabel, bool value,
--
1.8.3.1
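As a reading aid for the text-format branch of show_wal_usage() in the patch above, here is a rough standalone sketch of its "print only non-zero counters" rule. It writes into a plain char buffer instead of a StringInfo and skips the indentation step; demo_append and demo_format_wal_line are hypothetical names, not part of the patch or of PostgreSQL.

```c
#include <inttypes.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the patch's WalUsage struct. */
typedef struct WalUsage
{
	long		wal_records;
	long		wal_num_fpw;
	uint64_t	wal_bytes;
} WalUsage;

/* Append formatted text at buf[used]; returns the new string length. */
static size_t
demo_append(char *buf, size_t buflen, size_t used, const char *fmt, ...)
{
	va_list		ap;
	int			n;

	if (buflen == 0 || used >= buflen - 1)
		return used;			/* no room left, drop the rest */
	va_start(ap, fmt);
	n = vsnprintf(buf + used, buflen - used, fmt, ap);
	va_end(ap);
	if (n < 0)
		return used;
	if (used + (size_t) n >= buflen)
		return buflen - 1;		/* output was truncated */
	return used + (size_t) n;
}

/*
 * Emit nothing when every counter is zero; otherwise emit a "WAL:" line
 * listing only the counters that are greater than zero.
 */
static size_t
demo_format_wal_line(char *buf, size_t buflen, const WalUsage *usage)
{
	size_t		used = 0;

	if (buflen > 0)
		buf[0] = '\0';
	if (usage->wal_records <= 0 && usage->wal_num_fpw <= 0 &&
		usage->wal_bytes == 0)
		return 0;

	used = demo_append(buf, buflen, used, "WAL:");
	if (usage->wal_records > 0)
		used = demo_append(buf, buflen, used, " records=%ld",
						   usage->wal_records);
	if (usage->wal_num_fpw > 0)
		used = demo_append(buf, buflen, used, " full page writes=%ld",
						   usage->wal_num_fpw);
	if (usage->wal_bytes > 0)
		used = demo_append(buf, buflen, used, " bytes=%" PRIu64,
						   usage->wal_bytes);
	return demo_append(buf, buflen, used, "\n");
}
```

The structured formats (XML, JSON, YAML) take the other branch in the patch and always emit all three properties, so the suppression of zero values applies to text output only.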
v14-0003-Allow-pg_stat_statements-to-track-WAL-usage-stat.patch
From efe635b934bd1e2ff280ccf66a7e6e15424566c9 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 3 Apr 2020 18:51:51 +0530
Subject: [PATCH v14 3/4] Allow pg_stat_statements to track WAL usage
statistics.
This commit adds three new columns in pg_stat_statements output to
display WAL usage statistics added by commit <>.
This commit doesn't bump the version of pg_stat_statements, as that was
already done for this release in commit 17e0328224.
Author: Kirill Bychik and Julien Rouhaud
Reviewed-by: Julien Rouhaud, Fujii Masao, Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
.../expected/pg_stat_statements.out | 39 +++++++++++++++
.../pg_stat_statements--1.7--1.8.sql | 5 +-
contrib/pg_stat_statements/pg_stat_statements.c | 55 ++++++++++++++++++++--
.../pg_stat_statements/sql/pg_stat_statements.sql | 23 +++++++++
doc/src/sgml/pgstatstatements.sgml | 27 +++++++++++
5 files changed, 145 insertions(+), 4 deletions(-)
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 45dbe9e..f615f8c 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -212,6 +212,45 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
(10 rows)
--
+-- INSERT, UPDATE, DELETE on test table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset
+--------------------------
+
+(1 row)
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+-- Check WAL is generated for the above statements
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_as_rows
+-----------------------------------------------------------+-------+------+---------------------+-----------------------+---------------------
+ DELETE FROM pgss_test WHERE a > $1 | 1 | 1 | t | t | t
+ DROP TABLE pgss_test | 1 | 0 | t | t | f
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | 1 | 10 | t | t | t
+ SELECT pg_stat_statements_reset() | 1 | 1 | f | f | f
+ SELECT query, calls, rows, +| 0 | 0 | f | f | t
+ wal_bytes > $1 as wal_bytes_generated, +| | | | |
+ wal_records > $2 as wal_records_generated, +| | | | |
+ wal_records = rows as wal_records_as_rows +| | | | |
+ FROM pg_stat_statements ORDER BY query COLLATE "C" | | | | |
+ SET pg_stat_statements.track_utility = FALSE | 1 | 0 | f | f | t
+ UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 | 3 | t | t | t
+(7 rows)
+
+--
-- pg_stat_statements.track = none
--
SET pg_stat_statements.track = 'none';
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
index 60d454d..3056657 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -41,7 +41,10 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean,
OUT temp_blks_read int8,
OUT temp_blks_written int8,
OUT blk_read_time float8,
- OUT blk_write_time float8
+ OUT blk_write_time float8,
+ OUT wal_records int8,
+ OUT wal_num_fpw int8,
+ OUT wal_bytes numeric
)
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'pg_stat_statements_1_8'
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 942922b..04abdab 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -188,6 +188,9 @@ typedef struct Counters
double blk_read_time; /* time spent reading, in msec */
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
+ int64 wal_records; /* # of WAL records generated */
+ int64 wal_num_fpw; /* # of WAL full page image records generated */
+ uint64 wal_bytes; /* total amount of WAL bytes generated */
} Counters;
/*
@@ -348,6 +351,7 @@ static void pgss_store(const char *query, uint64 queryId,
pgssStoreKind kind,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate);
static void pg_stat_statements_internal(FunctionCallInfo fcinfo,
pgssVersion api_version,
@@ -891,6 +895,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
0,
0,
NULL,
+ NULL,
&jstate);
}
@@ -926,9 +931,17 @@ pgss_planner(Query *parse,
instr_time duration;
BufferUsage bufusage_start,
bufusage;
+ WalUsage walusage_start,
+ walusage;
/* We need to track buffer usage as the planner can access them. */
bufusage_start = pgBufferUsage;
+
+ /*
+ * Similarly, the planner could write some WAL records in some cases
+ * (e.g. when setting a hint bit that is WAL-logged).
+ */
+ walusage_start = pgWalUsage;
INSTR_TIME_SET_CURRENT(start);
plan_nested_level++;
@@ -954,6 +967,10 @@ pgss_planner(Query *parse,
memset(&bufusage, 0, sizeof(BufferUsage));
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+ /* calc differences of WAL counters. */
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
pgss_store(query_string,
parse->queryId,
parse->stmt_location,
@@ -962,6 +979,7 @@ pgss_planner(Query *parse,
INSTR_TIME_GET_MILLISEC(duration),
0,
&bufusage,
+ &walusage,
NULL);
}
else
@@ -1079,6 +1097,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
queryDesc->totaltime->total * 1000.0, /* convert to msec */
queryDesc->estate->es_processed,
&queryDesc->totaltime->bufusage,
+ &queryDesc->totaltime->walusage,
NULL);
}
@@ -1123,8 +1142,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
uint64 rows;
BufferUsage bufusage_start,
bufusage;
+ WalUsage walusage_start,
+ walusage;
bufusage_start = pgBufferUsage;
+ walusage_start = pgWalUsage;
INSTR_TIME_SET_CURRENT(start);
exec_nested_level++;
@@ -1154,6 +1176,10 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
memset(&bufusage, 0, sizeof(BufferUsage));
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+ /* calc differences of WAL counters. */
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
pgss_store(queryString,
0, /* signal that it's a utility stmt */
pstmt->stmt_location,
@@ -1162,6 +1188,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
INSTR_TIME_GET_MILLISEC(duration),
rows,
&bufusage,
+ &walusage,
NULL);
}
else
@@ -1197,7 +1224,8 @@ pgss_hash_string(const char *str, int len)
*
* If jstate is not NULL then we're trying to create an entry for which
* we have no statistics as yet; we just want to record the normalized
- * query string. total_time, rows, bufusage are ignored in this case.
+ * query string. total_time, rows, bufusage and walusage are ignored in this
+ * case.
*
* If kind is PGSS_PLAN or PGSS_EXEC, its value is used as the array position
* for the arrays in the Counters field.
@@ -1208,6 +1236,7 @@ pgss_store(const char *query, uint64 queryId,
pgssStoreKind kind,
double total_time, uint64 rows,
const BufferUsage *bufusage,
+ const WalUsage *walusage,
pgssJumbleState *jstate)
{
pgssHashKey key;
@@ -1402,6 +1431,9 @@ pgss_store(const char *query, uint64 queryId,
e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
+ e->counters.wal_records += walusage->wal_records;
+ e->counters.wal_num_fpw += walusage->wal_num_fpw;
+ e->counters.wal_bytes += walusage->wal_bytes;
SpinLockRelease(&e->mutex);
}
@@ -1449,8 +1481,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
#define PG_STAT_STATEMENTS_COLS_V1_1 18
#define PG_STAT_STATEMENTS_COLS_V1_2 19
#define PG_STAT_STATEMENTS_COLS_V1_3 23
-#define PG_STAT_STATEMENTS_COLS_V1_8 29
-#define PG_STAT_STATEMENTS_COLS 29 /* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_8 32
+#define PG_STAT_STATEMENTS_COLS 32 /* maximum of above */
/*
* Retrieve statement statistics.
@@ -1786,6 +1818,23 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
values[i++] = Float8GetDatumFast(tmp.blk_read_time);
values[i++] = Float8GetDatumFast(tmp.blk_write_time);
}
+ if (api_version >= PGSS_V1_8)
+ {
+ char buf[256];
+ Datum wal_bytes;
+
+ values[i++] = Int64GetDatumFast(tmp.wal_records);
+ values[i++] = Int64GetDatumFast(tmp.wal_num_fpw);
+
+ snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
+
+ /* Convert to numeric. */
+ wal_bytes = DirectFunctionCall3(numeric_in,
+ CStringGetDatum(buf),
+ ObjectIdGetDatum(0),
+ Int32GetDatum(-1));
+ values[i++] = wal_bytes;
+ }
Assert(i == (api_version == PGSS_V1_0 ? PG_STAT_STATEMENTS_COLS_V1_0 :
api_version == PGSS_V1_1 ? PG_STAT_STATEMENTS_COLS_V1_1 :
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 435d510..75c1055 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -102,6 +102,29 @@ SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5);
SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
--
+-- INSERT, UPDATE, DELETE on test table to validate WAL generation metrics
+--
+SELECT pg_stat_statements_reset();
+
+-- utility "create table" should not be shown
+CREATE TABLE pgss_test (a int, b char(20));
+
+INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_test SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_test WHERE a > 9;
+-- DROP test table
+SET pg_stat_statements.track_utility = TRUE;
+DROP TABLE pgss_test;
+SET pg_stat_statements.track_utility = FALSE;
+
+-- Check WAL is generated for the above statements
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+--
-- pg_stat_statements.track = none
--
SET pg_stat_statements.track = 'none';
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index b4df84c..3d26108 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -264,6 +264,33 @@
</entry>
</row>
+ <row>
+ <entry><structfield>wal_bytes</structfield></entry>
+ <entry><type>numeric</type></entry>
+ <entry></entry>
+ <entry>
+ Total amount of WAL bytes generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_records</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL records generated by the statement
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>wal_num_fpw</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>
+ Total count of WAL full page writes generated by the statement
+ </entry>
+ </row>
+
</tbody>
</tgroup>
</table>
--
1.8.3.1
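A note on the wal_bytes column in the patch above: the counter is a uint64, while int8 is signed, so the patch prints the value as decimal text and feeds it to numeric_in. Presumably this is to avoid losing the top half of the range. The following standalone sketch shows only the text-rendering step and the range check; the demo_ helpers are hypothetical and do not call any PostgreSQL function.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Render an unsigned 64-bit counter as decimal text. The largest value
 * needs 20 digits, so a 21-byte buffer is always enough.
 * Returns the number of characters snprintf wanted to write.
 */
static int
demo_uint64_to_decimal(char *buf, size_t buflen, uint64_t value)
{
	return snprintf(buf, buflen, "%" PRIu64, value);
}

/* True if the counter would not survive a cast to a signed 64-bit int8. */
static int
demo_overflows_int8(uint64_t value)
{
	return value > (uint64_t) INT64_MAX;
}
```

In the patch the resulting string is what gets passed to numeric_in via DirectFunctionCall3, which is why the SQL function declares the column as numeric rather than int8.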
v14-0004-Allow-verbose-auto-vacuum-to-display-WAL-usage-s.patch
From f05e52e36e812092434b3831457da22e7fd0b04a Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 3 Apr 2020 19:03:55 +0530
Subject: [PATCH v14 4/4] Allow verbose (auto)vacuum to display WAL usage
statistics.
This commit allows verbose (auto)vacuum to display WAL usage statistics
added by commit <>.
Author: Julien Rouhaud
Reviewed-by: Fujii Masao, Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3ca7f5d..d26094c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -410,6 +410,8 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
int nindexes;
PGRUsage ru0;
TimestampTz starttime = 0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
long secs;
int usecs;
double read_rate,
@@ -614,6 +616,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
TimestampDifference(starttime, endtime, &secs, &usecs);
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
read_rate = 0;
write_rate = 0;
if ((secs > 0) || (usecs > 0))
@@ -666,7 +671,13 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
(long long) VacuumPageDirty);
appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
read_rate, write_rate);
- appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf,
+ _("WAL usage: %ld records, %ld full page writes, "
+ UINT64_FORMAT " bytes"),
+ walusage.wal_records,
+ walusage.wal_num_fpw,
+ walusage.wal_bytes);
ereport(LOG,
(errmsg_internal("%s", buf.data)));
@@ -758,6 +769,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
IndexBulkDeleteResult **indstats;
int i;
PGRUsage ru0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
Buffer vmbuffer = InvalidBuffer;
BlockNumber next_unskippable_block;
bool skipping_blocks;
@@ -1727,6 +1740,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
"%u pages are entirely empty.\n",
empty_pages),
empty_pages);
+
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+ appendStringInfo(&buf, _("%ld WAL records, %ld WAL full page writes, "
+ UINT64_FORMAT " WAL bytes\n"),
+ walusage.wal_records,
+ walusage.wal_num_fpw,
+ walusage.wal_bytes);
+
appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
ereport(elevel,
--
1.8.3.1
On Fri, Apr 3, 2020 at 7:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Apr 3, 2020 at 9:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Apr 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have analyzed the WAL and there could be multiple reasons for the
same. With small data, I have noticed that while inserting in the
system index there was a Page Split and that created extra WAL.
Thanks for the investigation. I think it is clear that we can't
expect the same WAL size even if we repeat the same operation unless
it is a fresh database.
Attached find the latest patches. I have modified based on our
discussion on user interface thread [1], ran pgindent on all patches,
slightly modified one comment based on Dilip's input and added commit
messages. I think the patches are in good shape. I would like to
commit the first patch in this series tomorrow unless I see more
comments or any other objections.
Pushed.
The patch-2 might need to be
rebased if the other related patch [2] got committed first and we
might need to tweak a bit based on the input from other thread [1]
where we are discussing user interface for it.
The primary question for patch-2 is whether we want to include WAL
usage information for the planning phase as we did for BUFFERS in
recent commit ce77abe63c (Include information on buffer usage during
planning phase, in EXPLAIN output, take two.). Initially, I thought
it might be a good idea to do the same for WAL but after reading the
thread that leads to commit, I am not sure if there is any pressing
need to include WAL information for the planning phase. Because
during planning we might not write much WAL (with the exception of WAL
due to setting of hint-bits) so users might not care much. What do
you think?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Apr 04, 2020 at 10:38:14AM +0530, Amit Kapila wrote:
On Fri, Apr 3, 2020 at 7:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Apr 3, 2020 at 9:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Apr 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have analyzed the WAL and there could be multiple reasons for the
same. With small data, I have noticed that while inserting in the
system index there was a Page Split and that created extra WAL.
Thanks for the investigation. I think it is clear that we can't
expect the same WAL size even if we repeat the same operation unless
it is a fresh database.
Attached find the latest patches. I have modified based on our
discussion on user interface thread [1], ran pgindent on all patches,
slightly modified one comment based on Dilip's input and added commit
messages. I think the patches are in good shape. I would like to
commit the first patch in this series tomorrow unless I see more
comments or any other objections.
Pushed.
Thanks!
The patch-2 might need to be
rebased if the other related patch [2] got committed first and we
might need to tweak a bit based on the input from other thread [1]
where we are discussing user interface for it.
The primary question for patch-2 is whether we want to include WAL
usage information for the planning phase as we did for BUFFERS in
recent commit ce77abe63c (Include information on buffer usage during
planning phase, in EXPLAIN output, take two.). Initially, I thought
it might be a good idea to do the same for WAL but after reading the
thread that leads to commit, I am not sure if there is any pressing
need to include WAL information for the planning phase. Because
during planning we might not write much WAL (with the exception of WAL
due to setting of hint-bits) so users might not care much. What do
you think?
I agree that WAL activity during planning shouldn't be very frequent, but it
might still be worthwhile to add it. I'm wondering how stable the normalized
WAL information would be in some regression tests, as the counters are only
shown if nonzero. Maybe it'd be better to remove them from the output, same
as the buffers?
On Sat, Apr 4, 2020 at 11:33 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sat, Apr 04, 2020 at 10:38:14AM +0530, Amit Kapila wrote:
The patch-2 might need to be
rebased if the other related patch [2] got committed first and we
might need to tweak a bit based on the input from other thread [1]
where we are discussing user interface for it.
The primary question for patch-2 is whether we want to include WAL
usage information for the planning phase as we did for BUFFERS in
recent commit ce77abe63c (Include information on buffer usage during
planning phase, in EXPLAIN output, take two.). Initially, I thought
it might be a good idea to do the same for WAL but after reading the
thread that leads to commit, I am not sure if there is any pressing
need to include WAL information for the planning phase. Because
during planning we might not write much WAL (with the exception of WAL
due to setting of hint-bits) so users might not care much. What do
you think?
I agree that WAL activity during planning shouldn't be very frequent, but it
might still be worthwhile to add it.
We can add if we want but I am not able to convince myself for that.
Do you have any use case in mind? I think in most of the cases
(except for hint-bit WAL) it will be zero. If we are not sure of this
we can also discuss it separately in a new thread once this
patch-series is committed and see if anybody else sees the value of it
and if so adding the code should be easy.
I'm wondering how stable the normalized
WAL information would be in some regression tests, as the counters are only
shown if nonzero. Maybe it'd be better to remove them from the output, same
as the buffers?
Which regression tests are you referring to? pg_stat_statements? If
so, why would it be unstable? It should always generate WAL although
the exact values may differ and we have already taken care of that in
the patch, no?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Apr 04, 2020 at 02:12:59PM +0530, Amit Kapila wrote:
On Sat, Apr 4, 2020 at 11:33 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sat, Apr 04, 2020 at 10:38:14AM +0530, Amit Kapila wrote:
The patch-2 might need to be
rebased if the other related patch [2] got committed first and we
might need to tweak a bit based on the input from other thread [1]
where we are discussing user interface for it.
The primary question for patch-2 is whether we want to include WAL
usage information for the planning phase as we did for BUFFERS in
recent commit ce77abe63c (Include information on buffer usage during
planning phase, in EXPLAIN output, take two.). Initially, I thought
it might be a good idea to do the same for WAL but after reading the
thread that leads to commit, I am not sure if there is any pressing
need to include WAL information for the planning phase. Because
during planning we might not write much WAL (with the exception of WAL
due to setting of hint-bits) so users might not care much. What do
you think?
I agree that WAL activity during planning shouldn't be very frequent, but it
might still be worthwhile to add it.
We can add if we want but I am not able to convince myself for that.
Do you have any use case in mind? I think in most of the cases
(except for hint-bit WAL) it will be zero. If we are not sure of this
we can also discuss it separately in a new thread once this
patch-series is committed and see if anybody else sees the value of it
and if so adding the code should be easy.
I'm mostly thinking of people trying to investigate possible slowdowns on a
hot-standby replica with a primary without wal_log_hints. If they explicitly
ask for WAL information, we should provide them, even if it's quite unlikely to
happen.
I'm wondering how stable the normalized
WAL information would be in some regression tests, as the counters are only
shown if nonzero. Maybe it'd be better to remove them from the output, same
as the buffers?
Which regression tests are you referring to? pg_stat_statements? If
so, why would it be unstable? It should always generate WAL although
the exact values may differ and we have already taken care of that in
the patch, no?
I'm talking about a hypothetical new EXPLAIN (ANALYZE, WAL) regression test,
which could be unstable for a similar reason to why the first attempt to add
BUFFERS in the planning part of EXPLAIN was unstable. I thought that's why you
were hesitant to add it.
On Sat, Apr 4, 2020 at 2:24 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
We can add if we want but I am not able to convince myself for that.
Do you have any use case in mind? I think in most of the cases
(except for hint-bit WAL) it will be zero. If we are not sure of this
we can also discuss it separately in a new thread once this
patch-series is committed and see if anybody else sees the value of it
and if so adding the code should be easy.
I'm mostly thinking of people trying to investigate possible slowdowns on a
hot-standby replica with a primary without wal_log_hints. If they explicitly
ask for WAL information, we should provide them, even if it's quite unlikely to
happen.
Yeah, possible but I am not completely sure. I would like to hear the
opinion of others if any before adding code for this. How about if we
first commit pg_stat_statements and wait for this till Monday and if
nobody responds we can commit the current patch but would start a new
thread and try to get the opinion of others?
I'm wondering how stable the normalized
WAL information would be in some regression tests, as the counters are only
shown if nonzero. Maybe it'd be better to remove them from the output, same
as the buffers?
Which regression tests are you referring to? pg_stat_statements? If
so, why would it be unstable? It should always generate WAL although
the exact values may differ and we have already taken care of that in
the patch, no?
I'm talking about a hypothetical new EXPLAIN (ANALYZE, WAL) regression test,
which could be unstable for a similar reason to why the first attempt to add
BUFFERS in the planning part of EXPLAIN was unstable.
oh, then leave it for now because I don't see much use of those as the
code path can anyway be hit by the tests added by pg_stat_statements
patch.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Apr 04, 2020 at 02:39:32PM +0530, Amit Kapila wrote:
On Sat, Apr 4, 2020 at 2:24 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
We can add if we want but I am not able to convince myself for that.
Do you have any use case in mind? I think in most of the cases
(except for hint-bit WAL) it will be zero. If we are not sure of this
we can also discuss it separately in a new thread once this
patch-series is committed and see if anybody else sees the value of it
and if so adding the code should be easy.
I'm mostly thinking of people trying to investigate possible slowdowns on a
hot-standby replica with a primary without wal_log_hints. If they explicitly
ask for WAL information, we should provide them, even if it's quite unlikely to
happen.
Yeah, possible but I am not completely sure. I would like to hear the
opinion of others if any before adding code for this. How about if we
first commit pg_stat_statements and wait for this till Monday and if
nobody responds we can commit the current patch but would start a new
thread and try to get the opinion of others?
I'm fine with it.
I'm wondering how stable the normalized
WAL information would be in some regression tests, as the counters are only
shown if nonzero. Maybe it'd be better to remove them from the output, same
as the buffers?
Which regression tests are you referring to? pg_stat_statements? If
so, why would it be unstable? It should always generate WAL although
the exact values may differ and we have already taken care of that in
the patch, no?
I'm talking about a hypothetical new EXPLAIN (ANALYZE, WAL) regression test,
which could be unstable for a similar reason to why the first attempt to add
BUFFERS in the planning part of EXPLAIN was unstable.
oh, then leave it for now because I don't see much use of those as the
code path can anyway be hit by the tests added by pg_stat_statements
patch.
Perfect then!
On Sat, Apr 4, 2020 at 2:50 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sat, Apr 04, 2020 at 02:39:32PM +0530, Amit Kapila wrote:
On Sat, Apr 4, 2020 at 2:24 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
We can add if we want but I am not able to convince myself for that.
Do you have any use case in mind? I think in most of the cases
(except for hint-bit WAL) it will be zero. If we are not sure of this
we can also discuss it separately in a new thread once this
patch-series is committed and see if anybody else sees the value of it
and if so adding the code should be easy.
I'm mostly thinking of people trying to investigate possible slowdowns on a
hot-standby replica with a primary without wal_log_hints. If they explicitly
ask for WAL information, we should provide them, even if it's quite unlikely to
happen.
Yeah, possible but I am not completely sure. I would like to hear the
opinion of others if any before adding code for this. How about if we
first commit pg_stat_statements and wait for this till Monday and if
nobody responds we can commit the current patch but would start a new
thread and try to get the opinion of others?
I'm fine with it.
I have pushed pg_stat_statements and Explain related patches. I am
now looking into (auto)vacuum patch and have few comments.
@@ -614,6 +616,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
TimestampDifference(starttime, endtime, &secs, &usecs);
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
read_rate = 0;
write_rate = 0;
if ((secs > 0) || (usecs > 0))
@@ -666,7 +671,13 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
(long long) VacuumPageDirty);
appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
read_rate, write_rate);
- appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf,
+ _("WAL usage: %ld records, %ld full page writes, "
+ UINT64_FORMAT " bytes"),
+ walusage.wal_records,
+ walusage.wal_num_fpw,
+ walusage.wal_bytes);
Here, we are not displaying Buffers related data, so why do we think
it is important to display WAL data? I see some point in displaying
Buffers and WAL data in a vacuum (verbose), but I feel it is better to
make a case for both the statistics together rather than just
displaying one and leaving other. I think the other change related to
autovacuum stats seems okay to me.
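For readers following the hunk above, here is a minimal standalone sketch of the snapshot-and-diff pattern it relies on: copy the running counters before the operation, then subtract that snapshot from the counters afterwards. This is not PostgreSQL code. `WalUsageSketch`, `sketch_wal_usage`, `sketch_insert_record`, `sketch_accum_diff` and `sketch_measure` are hypothetical stand-ins for `WalUsage`, `pgWalUsage` and `WalUsageAccumDiff`; only the three field names are taken from the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for WalUsage; field names follow the patch. */
typedef struct WalUsageSketch
{
	long		wal_records;
	long		wal_num_fpw;
	uint64_t	wal_bytes;
} WalUsageSketch;

/* Hypothetical stand-in for the backend-global pgWalUsage counter. */
static WalUsageSketch sketch_wal_usage;

/* Pretend that one WAL record of nbytes was inserted. */
static void
sketch_insert_record(uint64_t nbytes, int is_fpw)
{
	sketch_wal_usage.wal_records++;
	sketch_wal_usage.wal_bytes += nbytes;
	if (is_fpw)
		sketch_wal_usage.wal_num_fpw++;
}

/* dst += add - sub, the same shape as the WalUsageAccumDiff call above. */
static void
sketch_accum_diff(WalUsageSketch *dst, const WalUsageSketch *add,
				  const WalUsageSketch *sub)
{
	assert(dst != NULL && add != NULL && sub != NULL);
	dst->wal_records += add->wal_records - sub->wal_records;
	dst->wal_num_fpw += add->wal_num_fpw - sub->wal_num_fpw;
	dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
}

/* Measure a simulated operation and format it like the proposed log line. */
static void
sketch_measure(char *out, size_t outlen)
{
	WalUsageSketch start;
	WalUsageSketch used;

	sketch_insert_record(100, 0);	/* earlier, unrelated activity */
	start = sketch_wal_usage;		/* snapshot before the operation */

	sketch_insert_record(8192, 1);	/* the operation being measured */
	sketch_insert_record(64, 0);

	memset(&used, 0, sizeof(used));
	sketch_accum_diff(&used, &sketch_wal_usage, &start);
	snprintf(out, outlen,
			 "WAL usage: %ld records, %ld full page writes, %" PRIu64 " bytes",
			 used.wal_records, used.wal_num_fpw, used.wal_bytes);
}
```

Because the diff is taken against a snapshot, WAL generated before the snapshot (the 100-byte record here) does not show up in the reported figures.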
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, 31 Mar 2020 at 14:13, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Tue, 31 Mar 2020 at 12:58, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Mar 30, 2020 at 12:31 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
The patch for vacuum conflicts with recent changes in vacuum. So I've
attached rebased one.
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < nworkers; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
+
This should be done for launched workers aka
lps->pcxt->nworkers_launched. I think a similar problem exists in
create index related patch.
You're right. Fixed in the new patches.
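The point about launched workers can be illustrated with a small standalone sketch. It assumes, as the nbtsort.c comment elsewhere in this thread says ("no need to initialize"), that per-worker slots are allocated for every requested worker but only written by workers that actually started. `WorkerUsageSketch` and `sketch_accum_launched` are hypothetical names, not PostgreSQL APIs.

```c
#include <assert.h>

/* Hypothetical per-worker usage slot; the real structs have more fields. */
typedef struct WorkerUsageSketch
{
	long		blocks_hit;
	long		blocks_read;
} WorkerUsageSketch;

/*
 * Add up the slots of workers that were actually launched.  Slots at
 * index >= nworkers_launched were never written by any worker, so
 * reading them would fold garbage into the totals.
 */
static void
sketch_accum_launched(WorkerUsageSketch *total,
					  const WorkerUsageSketch *slots,
					  int nworkers, int nworkers_launched)
{
	assert(nworkers_launched >= 0 && nworkers_launched <= nworkers);

	/* The loop variable is declared in the for scope, per the nitpick. */
	for (int i = 0; i < nworkers_launched; i++)
	{
		total->blocks_hit += slots[i].blocks_hit;
		total->blocks_read += slots[i].blocks_read;
	}
}
```

Looping to `nworkers` instead would also read the slots of workers that failed to start, which is the problem described above.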
On Mon, 30 Mar 2020 at 17:00, Julien Rouhaud <rjuju123@gmail.com> wrote:
Just minor nitpicking:
+ int i;
Assert(!IsParallelWorker());
Assert(ParallelVacuumIsActive(lps));
@@ -2166,6 +2172,13 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
/* Wait for all vacuum workers to finish */
WaitForParallelWorkersToFinish(lps->pcxt);
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < nworkers; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
We now allow declaring a variable in those loops, so it may be better to avoid
declaring i outside the for scope?
We can do that but I was not sure if it's good since other codes
around there don't use that. So I'd like to leave it for committers.
It's a trivial change.
I've updated the buffer usage patch for parallel index creation as the
previous patch conflicts with commit df3b181499b40.
The following comment from commit df3b181499b40 seems to be the one that
Amit replaced with a better sentence when introducing buffer usage
to parallel vacuum.
+ /*
+ * Estimate space for WalUsage -- PARALLEL_KEY_WAL_USAGE
+ *
+ * WalUsage during execution of maintenance command can be used by an
+ * extension that reports the WAL usage, such as pg_stat_statements. We
+ * have no way of knowing whether anyone's looking at pgWalUsage, so do it
+ * unconditionally.
+ */
Would the following sentence from vacuumlazy.c also be better for
parallel create index?
* If there are no extensions loaded that care, we could skip this. We
* have no way of knowing whether anyone's looking at pgBufferUsage or
* pgWalUsage, so do it unconditionally.
The attached patch switches to the above comment and removes the code
that existed only so that buffer usage accumulation could be skipped.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
bufferusage_create_index_v3.patch (application/octet-stream)
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 29a6f5ade6..823d45f074 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -71,6 +71,7 @@
#define PARALLEL_KEY_TUPLESORT_SPOOL2 UINT64CONST(0xA000000000000003)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xA000000000000004)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xA000000000000005)
+#define PARALLEL_KEY_BUFFER_USAGE UINT64CONST(0xA000000000000006)
/*
* DISABLE_LEADER_PARTICIPATION disables the leader's participation in
@@ -194,6 +195,7 @@ typedef struct BTLeader
Sharedsort *sharedsort2;
Snapshot snapshot;
WalUsage *walusage;
+ BufferUsage *bufferusage;
} BTLeader;
/*
@@ -1457,6 +1459,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
BTSpool *btspool = buildstate->spool;
BTLeader *btleader = (BTLeader *) palloc0(sizeof(BTLeader));
WalUsage *walusage;
+ BufferUsage *bufferusage;
bool leaderparticipates = true;
char *sharedquery;
int querylen;
@@ -1510,16 +1513,19 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
}
/*
- * Estimate space for WalUsage -- PARALLEL_KEY_WAL_USAGE
+ * Estimate space for WalUsage and BufferUsage -- PARALLEL_KEY_WAL_USAGE
+ * and PARALLEL_KEY_BUFFER_USAGE.
*
- * WalUsage during execution of maintenance command can be used by an
- * extension that reports the WAL usage, such as pg_stat_statements. We
- * have no way of knowing whether anyone's looking at pgWalUsage, so do it
- * unconditionally.
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgWalUsage or
+ * pgBufferUsage, so do it unconditionally.
*/
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(sizeof(WalUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
/* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
@@ -1592,10 +1598,16 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
memcpy(sharedquery, debug_query_string, querylen + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, sharedquery);
- /* Allocate space for each worker's WalUsage; no need to initialize */
+ /*
+ * Allocate space for each worker's WalUsage and BufferUsage; no need to
+ * initialize.
+ */
walusage = shm_toc_allocate(pcxt->toc,
mul_size(sizeof(WalUsage), pcxt->nworkers));
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage);
+ bufferusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufferusage);
/* Launch workers, saving status for leader/caller */
LaunchParallelWorkers(pcxt);
@@ -1608,6 +1620,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
btleader->sharedsort2 = sharedsort2;
btleader->snapshot = snapshot;
btleader->walusage = walusage;
+ btleader->bufferusage = bufferusage;
/* If no workers were successfully launched, back out (do serial build) */
if (pcxt->nworkers_launched == 0)
@@ -1646,7 +1659,7 @@ _bt_end_parallel(BTLeader *btleader)
* or we might get incomplete data.)
*/
for (i = 0; i < btleader->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(NULL, &btleader->walusage[i]);
+ InstrAccumParallelQuery(&btleader->bufferusage[i], &btleader->walusage[i]);
/* Free last reference to MVCC snapshot, if one was used */
if (IsMVCCSnapshot(btleader->snapshot))
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 74ee4808e3..3b9c6aebb9 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -188,11 +188,8 @@ InstrStartParallelQuery(void)
void
InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- if (bufusage)
- {
- memset(bufusage, 0, sizeof(BufferUsage));
- BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
- }
+ memset(bufusage, 0, sizeof(BufferUsage));
+ BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
memset(walusage, 0, sizeof(WalUsage));
WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
}
@@ -201,8 +198,7 @@ InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
void
InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- if (bufusage)
- BufferUsageAdd(&pgBufferUsage, bufusage);
+ BufferUsageAdd(&pgBufferUsage, bufusage);
WalUsageAdd(&pgWalUsage, walusage);
}
On Mon, Apr 6, 2020 at 11:19 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
The attached patch changes to the above comment and removed the code
that is used to un-support only buffer usage accumulation.
So, IIUC, the purpose of this patch will be to count the buffer usage
due to the heap scan (in heapam_index_build_range_scan) we perform
while parallel create index? Because the index creation itself won't
use buffer manager.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, 6 Apr 2020 at 16:16, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 6, 2020 at 11:19 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
The attached patch changes to the above comment and removed the code
that is used to un-support only buffer usage accumulation.
So, IIUC, the purpose of this patch will be to count the buffer usage
due to the heap scan (in heapam_index_build_range_scan) we perform
while parallel create index? Because the index creation itself won't
use buffer manager.
Oops, I'd missed Peter's comment. Btree index doesn't use
heapam_index_build_range_scan so it's not necessary. Sorry for the
noise.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Apr 06, 2020 at 08:55:01AM +0530, Amit Kapila wrote:
On Sat, Apr 4, 2020 at 2:50 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
I have pushed pg_stat_statements and Explain related patches. I am
now looking into (auto)vacuum patch and have few comments.
Thanks!
@@ -614,6 +616,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
TimestampDifference(starttime, endtime, &secs, &usecs);
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
read_rate = 0;
write_rate = 0;
if ((secs > 0) || (usecs > 0))
@@ -666,7 +671,13 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
(long long) VacuumPageDirty);
appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
read_rate, write_rate);
- appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf,
+ _("WAL usage: %ld records, %ld full page writes, "
+ UINT64_FORMAT " bytes"),
+ walusage.wal_records,
+ walusage.wal_num_fpw,
+ walusage.wal_bytes);
Here, we are not displaying Buffers related data, so why do we think
it is important to display WAL data? I see some point in displaying
Buffers and WAL data in a vacuum (verbose), but I feel it is better to
make a case for both the statistics together rather than just
displaying one and leaving other. I think the other change related to
autovacuum stats seems okay to me.
One thing is that the amount of WAL, and more precisely FPW, is quite
unpredictable wrt. vacuum and even more anti-wraparound vacuum, so this is IMHO
a very useful metric. That being said I totally agree with you that both
should be displayed. Should I send a patch to also expose it?
On Mon, Apr 6, 2020 at 1:53 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Mon, Apr 06, 2020 at 08:55:01AM +0530, Amit Kapila wrote:
Here, we are not displaying Buffers related data, so why do we think
it is important to display WAL data? I see some point in displaying
Buffers and WAL data in a vacuum (verbose), but I feel it is better to
make a case for both the statistics together rather than just
displaying one and leaving other. I think the other change related to
autovacuum stats seems okay to me.
One thing is that the amount of WAL, and more precisely FPW, is quite
unpredictable wrt. vacuum and even more anti-wraparound vacuum, so this is IMHO
a very useful metric.
I agree but we already have a way via pg_stat_statements to find it if
the metric is so useful.
That being said I totally agree with you that both
should be displayed. Should I send a patch to also expose it?
I think this should be a separate proposal. Let's not add things
unless they are really essential. We can separately discuss
enhancing vacuum verbose for Buffer and WAL usage stats and see if
others also find that information useful. I think you can send a
patch by removing the code I mentioned above if you agree. Thanks for
working on this.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Apr 6, 2020 at 12:55 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Mon, 6 Apr 2020 at 16:16, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 6, 2020 at 11:19 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
The attached patch changes to the above comment and removed the code
that is used to un-support only buffer usage accumulation.
So, IIUC, the purpose of this patch will be to count the buffer usage
due to the heap scan (in heapam_index_build_range_scan) we perform
while parallel create index? Because the index creation itself won't
use buffer manager.
Oops, I'd missed Peter's comment. Btree index doesn't use
heapam_index_build_range_scan so it's not necessary.
AFAIU, it uses heapam_index_build_range_scan but for writing to index,
it doesn't use buffer manager. So, I guess probably we can accumulate
BufferUsage stats for parallel create index. What I wanted to know is
whether the extra lookup for pg_amproc or any other catalog access via
parallel workers is fine or we somehow want to eliminate that?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Apr 06, 2020 at 02:34:36PM +0530, Amit Kapila wrote:
On Mon, Apr 6, 2020 at 1:53 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Mon, Apr 06, 2020 at 08:55:01AM +0530, Amit Kapila wrote:
Here, we are not displaying Buffers related data, so why do we think
it is important to display WAL data? I see some point in displaying
Buffers and WAL data in a vacuum (verbose), but I feel it is better to
make a case for both the statistics together rather than just
displaying one and leaving other. I think the other change related to
autovacuum stats seems okay to me.
One thing is that the amount of WAL, and more precisely FPW, is quite
unpredictable wrt. vacuum and even more anti-wraparound vacuum, so this is IMHO
a very useful metric.
I agree but we already have a way via pg_stat_statements to find it if
the metric is so useful.
Agreed.
That being said I totally agree with you that both
should be displayed. Should I send a patch to also expose it?
I think this should be a separate proposal. Let's not add things
unless they are really essential. We can separately discuss
enhancing vacuum verbose for Buffer and WAL usage stats and see if
others also find that information useful. I think you can send a
patch by removing the code I mentioned above if you agree. Thanks for
working on this.
Thanks! v15 attached.
Attachments:
v15-0001-Expose-WAL-usage-counters-in-verbose-auto-vacuum.patch (text/plain; charset=us-ascii)
From 9f4cbb1817372628a3fc7b3fffe82ec2a3942f91 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 19 Mar 2020 16:08:47 +0100
Subject: [PATCH v15] Expose WAL usage counters in verbose (auto)vacuum output.
Author: Julien Rouhaud
Reviewed-by: Fuji Masao, Amit Kapila, Dilip Kumar
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3ca7f5d136..877512fae4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -410,6 +410,8 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
int nindexes;
PGRUsage ru0;
TimestampTz starttime = 0;
+ WalUsage walusage_start = pgWalUsage;
+ WalUsage walusage = {0, 0, 0};
long secs;
int usecs;
double read_rate,
@@ -614,6 +616,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
TimestampDifference(starttime, endtime, &secs, &usecs);
+ memset(&walusage, 0, sizeof(WalUsage));
+ WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
read_rate = 0;
write_rate = 0;
if ((secs > 0) || (usecs > 0))
@@ -666,7 +671,13 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
(long long) VacuumPageDirty);
appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
read_rate, write_rate);
- appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0));
+ appendStringInfo(&buf,
+ _("WAL usage: %ld records, %ld full page writes, "
+ UINT64_FORMAT " bytes"),
+ walusage.wal_records,
+ walusage.wal_num_fpw,
+ walusage.wal_bytes);
ereport(LOG,
(errmsg_internal("%s", buf.data)));
@@ -1727,6 +1738,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
"%u pages are entirely empty.\n",
empty_pages),
empty_pages);
+
appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
ereport(elevel,
--
2.20.1
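The patch above takes a snapshot of the global WAL counters when vacuum starts and reports the difference at the end. A minimal, self-contained sketch of that snapshot-and-diff pattern follows. The `DemoWalUsage` struct and the `demo_*` functions are hypothetical stand-ins written for this illustration (the field names and types follow the format string in the patch); this is not the PostgreSQL implementation of `WalUsageAccumDiff`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the WAL counters printed by the patch. */
typedef struct DemoWalUsage
{
    long        wal_records;    /* number of WAL records */
    long        wal_num_fpw;    /* number of full page writes */
    uint64_t    wal_bytes;      /* amount of WAL, in bytes */
} DemoWalUsage;

/* dst += add - sub, field by field (illustrative analogue only). */
void
demo_wal_usage_accum_diff(DemoWalUsage *dst,
                          const DemoWalUsage *add,
                          const DemoWalUsage *sub)
{
    assert(dst != NULL && add != NULL && sub != NULL);
    dst->wal_records += add->wal_records - sub->wal_records;
    dst->wal_num_fpw += add->wal_num_fpw - sub->wal_num_fpw;
    dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
}

/* Usage attributable to the work done between the snapshot and now. */
DemoWalUsage
demo_wal_usage_since(const DemoWalUsage *now, const DemoWalUsage *start)
{
    DemoWalUsage delta;

    memset(&delta, 0, sizeof(delta));
    demo_wal_usage_accum_diff(&delta, now, start);
    return delta;
}
```

Because the counters only ever grow, the difference between two snapshots covers exactly the operation that ran between them, regardless of what the backend did earlier.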
On Mon, 6 Apr 2020 at 00:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
I have pushed pg_stat_statements and Explain related patches. I am
now looking into (auto)vacuum patch and have few comments.
I wasn't paying much attention to this thread. May I suggest changing
wal_num_fpw to wal_fpw? wal_records and wal_bytes do not have a prefix
'num'. It seems inconsistent to me.
Regards,
--
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Apr 06, 2020 at 10:12:55AM -0300, Euler Taveira wrote:
On Mon, 6 Apr 2020 at 00:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
I have pushed pg_stat_statements and Explain related patches. I am
now looking into (auto)vacuum patch and have few comments.
I wasn't paying much attention to this thread. May I suggest changing
wal_num_fpw to wal_fpw? wal_records and wal_bytes do not have a prefix
'num'. It seems inconsistent to me.
If we want to be consistent shouldn't we rename it to wal_fpws? FTR I don't
like much either version.
On Mon, 6 Apr 2020 at 10:37, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Mon, Apr 06, 2020 at 10:12:55AM -0300, Euler Taveira wrote:
On Mon, 6 Apr 2020 at 00:25, Amit Kapila <amit.kapila16@gmail.com>
wrote:
I have pushed pg_stat_statements and Explain related patches. I am
now looking into (auto)vacuum patch and have few comments.
I wasn't paying much attention to this thread. May I suggest changing
wal_num_fpw to wal_fpw? wal_records and wal_bytes do not have a prefix
'num'. It seems inconsistent to me.
If we want to be consistent shouldn't we rename it to wal_fpws? FTR I don't
like much either version.
Since FPW is an acronym, plural form reads better when you are using
uppercase (such as FPWs or FPW's); thus, I prefer singular form because
parameter names are lowercase. Function description will clarify that this
is "number of WAL full page writes".
Regards,
--
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I noticed in some of the screenshots that were tweeted that for example in
WAL: records=1  bytes=56
there are two spaces between pieces of data. This doesn't match the
rest of the EXPLAIN output. Can that be adjusted?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Apr 06, 2020 at 05:01:30PM +0200, Peter Eisentraut wrote:
I noticed in some of the screenshots that were tweeted that for example in
WAL: records=1  bytes=56
there are two spaces between pieces of data. This doesn't match the rest of
the EXPLAIN output. Can that be adjusted?
We talked about that here:
/messages/by-id/20200402054120.GC14618@telsasoft.com
--
Justin
On Mon, Apr 6, 2020 at 2:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
AFAIU, it uses heapam_index_build_range_scan but for writing to index,
it doesn't use buffer manager.
Right. It doesn't need to use the buffer manager to write to the
index, unlike (say) GIN's CREATE INDEX.
--
Peter Geoghegan
On Mon, Apr 6, 2020 at 10:01 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Mon, Apr 06, 2020 at 05:01:30PM +0200, Peter Eisentraut wrote:
I noticed in some of the screenshots that were tweeted that for example in
WAL: records=1  bytes=56
there are two spaces between pieces of data. This doesn't match the rest of
the EXPLAIN output. Can that be adjusted?
We talked about that here:
/messages/by-id/20200402054120.GC14618@telsasoft.com
Yeah. Just to be brief here, the main reason was that one of the fields
(full page writes) already had a single space and then we had prior
cases as mentioned in Justin's email [1] where we use two spaces which
led us to decide using two spaces in this case.
Now, we can change back to one space as suggested by you but I am not
sure if that is an improvement over what we have done. Let me know if
you think otherwise.
[1]: /messages/by-id/20200402054120.GC14618@telsasoft.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Apr 6, 2020 at 7:58 PM Euler Taveira
<euler.taveira@2ndquadrant.com> wrote:
On Mon, 6 Apr 2020 at 10:37, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Mon, Apr 06, 2020 at 10:12:55AM -0300, Euler Taveira wrote:
On Mon, 6 Apr 2020 at 00:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
I have pushed pg_stat_statements and Explain related patches. I am
now looking into (auto)vacuum patch and have few comments.
I wasn't paying much attention to this thread. May I suggest changing
wal_num_fpw to wal_fpw? wal_records and wal_bytes do not have a prefix
'num'. It seems inconsistent to me.
If we want to be consistent shouldn't we rename it to wal_fpws? FTR I don't
like much either version.
Since FPW is an acronym, plural form reads better when you are using
uppercase (such as FPWs or FPW's); thus, I prefer singular form because
parameter names are lowercase. Function description will clarify that this
is "number of WAL full page writes".
I like Euler's suggestion to change wal_num_fpw to wal_fpw. It is
better if others who didn't like this name can also share their
opinion now because changing the same thing multiple times is not a
good idea.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, 7 Apr 2020 at 02:40, Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, Apr 6, 2020 at 2:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
AFAIU, it uses heapam_index_build_range_scan but for writing to index,
it doesn't use buffer manager.
Right. It doesn't need to use the buffer manager to write to the
index, unlike (say) GIN's CREATE INDEX.
Hmm, after more thought and testing, it seems to me that parallel
btree index creation uses the buffer manager while scanning the table in
parallel, i.e. in heapam_index_build_range_scan, which affects
shared_blks_xxx in pg_stat_statements. I've run some parallel create index
tests with the current HEAD and with the attached patch. The table has
44248 blocks.
HEAD, no workers:
-[ RECORD 1 ]-------+----------
total_plan_time | 0
total_plan_time | 0
shared_blks_hit | 148
shared_blks_read | 44281
total_read_blks | 44429
shared_blks_dirtied | 44261
shared_blks_written | 24644
wal_records | 71693
wal_num_fpw | 71682
wal_bytes | 566815038
HEAD, 4 workers:
-[ RECORD 1 ]-------+----------
total_plan_time | 0
total_plan_time | 0
shared_blks_hit | 160
shared_blks_read | 8892
total_read_blks | 9052
shared_blks_dirtied | 8871
shared_blks_written | 5342
wal_records | 71693
wal_num_fpw | 71682
wal_bytes | 566815038
The WAL usage statistics are good but the buffer usage statistics do
not seem correct.
Patched, no workers:
-[ RECORD 1 ]-------+----------
total_plan_time | 0
total_plan_time | 0
shared_blks_hit | 148
shared_blks_read | 44281
total_read_blks | 44429
shared_blks_dirtied | 44261
shared_blks_written | 24843
wal_records | 71693
wal_num_fpw | 71682
wal_bytes | 566815038
Patched, 4 workers:
-[ RECORD 1 ]-------+----------
total_plan_time | 0
total_plan_time | 0
shared_blks_hit | 172
shared_blks_read | 44282
total_read_blks | 44454
shared_blks_dirtied | 44261
shared_blks_written | 26968
wal_records | 71693
wal_num_fpw | 71682
wal_bytes | 566815038
Buffer usage statistics seem correct. The small differences would be
catalog lookups Peter mentioned.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
bufferusage_create_index_v4.patch (application/octet-stream)
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 29a6f5ade6..707dd0741f 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -71,6 +71,7 @@
#define PARALLEL_KEY_TUPLESORT_SPOOL2 UINT64CONST(0xA000000000000003)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xA000000000000004)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xA000000000000005)
+#define PARALLEL_KEY_BUFFER_USAGE UINT64CONST(0xA000000000000006)
/*
* DISABLE_LEADER_PARTICIPATION disables the leader's participation in
@@ -194,6 +195,7 @@ typedef struct BTLeader
Sharedsort *sharedsort2;
Snapshot snapshot;
WalUsage *walusage;
+ BufferUsage *bufferusage;
} BTLeader;
/*
@@ -1457,6 +1459,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
BTSpool *btspool = buildstate->spool;
BTLeader *btleader = (BTLeader *) palloc0(sizeof(BTLeader));
WalUsage *walusage;
+ BufferUsage *bufferusage;
bool leaderparticipates = true;
char *sharedquery;
int querylen;
@@ -1510,16 +1513,19 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
}
/*
- * Estimate space for WalUsage -- PARALLEL_KEY_WAL_USAGE
+ * Estimate space for WalUsage and BufferUsage -- PARALLEL_KEY_WAL_USAGE
+ * and PARALLEL_KEY_BUFFER_USAGE.
*
- * WalUsage during execution of maintenance command can be used by an
- * extension that reports the WAL usage, such as pg_stat_statements. We
- * have no way of knowing whether anyone's looking at pgWalUsage, so do it
- * unconditionally.
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgWalUsage or
+ * pgBufferUsage, so do it unconditionally.
*/
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(sizeof(WalUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
/* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
@@ -1592,10 +1598,16 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
memcpy(sharedquery, debug_query_string, querylen + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, sharedquery);
- /* Allocate space for each worker's WalUsage; no need to initialize */
+ /*
+ * Allocate space for each worker's WalUsage and BufferUsage; no need to
+ * initialize.
+ */
walusage = shm_toc_allocate(pcxt->toc,
mul_size(sizeof(WalUsage), pcxt->nworkers));
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage);
+ bufferusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufferusage);
/* Launch workers, saving status for leader/caller */
LaunchParallelWorkers(pcxt);
@@ -1608,6 +1620,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
btleader->sharedsort2 = sharedsort2;
btleader->snapshot = snapshot;
btleader->walusage = walusage;
+ btleader->bufferusage = bufferusage;
/* If no workers were successfully launched, back out (do serial build) */
if (pcxt->nworkers_launched == 0)
@@ -1646,7 +1659,7 @@ _bt_end_parallel(BTLeader *btleader)
* or we might get incomplete data.)
*/
for (i = 0; i < btleader->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(NULL, &btleader->walusage[i]);
+ InstrAccumParallelQuery(&btleader->bufferusage[i], &btleader->walusage[i]);
/* Free last reference to MVCC snapshot, if one was used */
if (IsMVCCSnapshot(btleader->snapshot))
@@ -1779,6 +1792,7 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
LOCKMODE heapLockmode;
LOCKMODE indexLockmode;
WalUsage *walusage;
+ BufferUsage *bufferusage;
int sortmem;
#ifdef BTREE_BUILD_STATS
@@ -1848,9 +1862,11 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
_bt_parallel_scan_and_sort(btspool, btspool2, btshared, sharedsort,
sharedsort2, sortmem, false);
- /* Report WAL usage during parallel execution */
+ /* Report WAL/buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
- InstrEndParallelQuery(NULL, &walusage[ParallelWorkerNumber]);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
+ &walusage[ParallelWorkerNumber]);
#ifdef BTREE_BUILD_STATS
if (log_btree_build_stats)
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 74ee4808e3..3b9c6aebb9 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -188,11 +188,8 @@ InstrStartParallelQuery(void)
void
InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- if (bufusage)
- {
- memset(bufusage, 0, sizeof(BufferUsage));
- BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
- }
+ memset(bufusage, 0, sizeof(BufferUsage));
+ BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
memset(walusage, 0, sizeof(WalUsage));
WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
}
@@ -201,8 +198,7 @@ InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
void
InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
- if (bufusage)
- BufferUsageAdd(&pgBufferUsage, bufusage);
+ BufferUsageAdd(&pgBufferUsage, bufusage);
WalUsageAdd(&pgWalUsage, walusage);
}
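The patch above gives each parallel worker its own slot in shared memory for buffer usage and has the leader fold all launched workers' slots into its own counters. The following is a rough, self-contained sketch of that per-worker-slot pattern under simplifying assumptions: `DemoBufferUsage` is a hypothetical two-field reduction of `BufferUsage`, a plain array stands in for the shared-memory chunk, and the `demo_*` functions are illustrative names, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for BufferUsage: only two counters. */
typedef struct DemoBufferUsage
{
    long        shared_blks_hit;
    long        shared_blks_read;
} DemoBufferUsage;

/* Worker side: publish this worker's counters into its own slot. */
void
demo_worker_report(DemoBufferUsage *slots, int worker_number,
                   const DemoBufferUsage *local)
{
    assert(slots != NULL && local != NULL && worker_number >= 0);
    slots[worker_number] = *local;
}

/* Leader side: fold every launched worker's slot into the leader's totals. */
void
demo_leader_accumulate(DemoBufferUsage *leader,
                       const DemoBufferUsage *slots, int nworkers_launched)
{
    assert(leader != NULL && slots != NULL);
    for (int i = 0; i < nworkers_launched; i++)
    {
        leader->shared_blks_hit += slots[i].shared_blks_hit;
        leader->shared_blks_read += slots[i].shared_blks_read;
    }
}
```

Since every worker writes only to its own slot, no locking is needed in this sketch; the leader must read the slots only after the workers have finished, which matches the comment in `_bt_end_parallel()` about otherwise getting incomplete data.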
On Tue, Apr 7, 2020 at 1:30 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
Buffer usage statistics seem correct. The small differences would be
catalog lookups Peter mentioned.
Agreed, but can you check which part of code does that lookup? I want
to see if we can avoid that from buffer usage stats or at least write
a comment about it, otherwise, we might have to face this question
again and again.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 7, 2020 at 4:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 6, 2020 at 7:58 PM Euler Taveira
<euler.taveira@2ndquadrant.com> wrote:
On Mon, 6 Apr 2020 at 10:37, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Mon, Apr 06, 2020 at 10:12:55AM -0300, Euler Taveira wrote:
On Mon, 6 Apr 2020 at 00:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
I have pushed pg_stat_statements and Explain related patches. I am
now looking into (auto)vacuum patch and have few comments.
I wasn't paying much attention to this thread. May I suggest changing
wal_num_fpw to wal_fpw? wal_records and wal_bytes do not have a prefix
'num'. It seems inconsistent to me.
If we want to be consistent shouldn't we rename it to wal_fpws? FTR I don't
like much either version.
Since FPW is an acronym, plural form reads better when you are using
uppercase (such as FPWs or FPW's); thus, I prefer singular form because
parameter names are lowercase. Function description will clarify that this
is "number of WAL full page writes".
I like Euler's suggestion to change wal_num_fpw to wal_fpw. It is
better if others who didn't like this name can also share their
opinion now because changing the same thing multiple times is not a
good idea.
+1
About Justin and your comments on the other thread:
On Tue, Apr 7, 2020 at 4:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 6, 2020 at 10:04 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Thu, Apr 02, 2020 at 08:29:31AM +0200, Julien Rouhaud wrote:
"full page records" seems to be showing the number of full page
images, not the record having full page images.
I am not sure what exactly is a difference but it is the records
having full page images. Julien correct me if I am wrong.
Obviously previous complaints about the meaning and parsability of
"full page writes" should be addressed here for consistency.
There's a couple places that say "full page image records" which I think is
language you were trying to avoid. It's the number of pages, not the number of
records, no ? I see explain and autovacuum say what I think is wanted, but
these say the wrong thing? Find attached slightly larger patch.
$ git grep 'image record'
contrib/pg_stat_statements/pg_stat_statements.c: int64 wal_num_fpw; /* # of WAL full page image records generated */
doc/src/sgml/ref/explain.sgml: number of records, number of full page image records and amount of WAL
Few comments:
1.
- int64 wal_num_fpw; /* # of WAL full page image records generated */
+ int64 wal_num_fpw; /* # of WAL full page images generated */
Let's change comment as " /* # of WAL full page writes generated */"
to be consistent with other places like instrument.h. Also, make a
similar change at other places if required.
Agreed. That's pg_stat_statements.c and instrument.h. I'll send a
patch once we reach consensus with the rest of the comments.
2.
<entry>
- Total amount of WAL bytes generated by the statement
+ Total number of WAL bytes generated by the statement
</entry>
I feel the previous text was better as this field can give us the size
of WAL with which we can answer "how much WAL data is generated by a
particular statement?". Julien, do you have any thoughts on this?
I also prefer "amount" as it feels more natural. I'm not a native
English speaker though, so maybe I'm just biased.
On Tue, 7 Apr 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Apr 7, 2020 at 1:30 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
Buffer usage statistics seem correct. The small differences would be
catalog lookups Peter mentioned.
Agreed, but can you check which part of code does that lookup? I want
to see if we can avoid that from buffer usage stats or at least write
a comment about it, otherwise, we might have to face this question
again and again.
Okay, I'll check it.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-04-07 04:12, Amit Kapila wrote:
On Mon, Apr 6, 2020 at 10:01 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Mon, Apr 06, 2020 at 05:01:30PM +0200, Peter Eisentraut wrote:
I noticed in some of the screenshots that were tweeted that for example in
WAL: records=1  bytes=56
there are two spaces between pieces of data. This doesn't match the rest of
the EXPLAIN output. Can that be adjusted?
We talked about that here:
/messages/by-id/20200402054120.GC14618@telsasoft.com
Yeah. Just to be brief here, the main reason was that one of the fields
(full page writes) already had a single space and then we had prior
cases as mentioned in Justin's email [1] where we use two spaces which
led us to decide using two spaces in this case.
We also have existing cases for the other way:
actual time=0.050..0.052
Buffers: shared hit=3 dirtied=1
The cases mentioned by Justin are not formatted in a key=value format,
so it's not quite the same, but it also raises the question why they are
not.
Let's figure out a way to consolidate this without making up a third format.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 7 Apr 2020 at 18:29, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Tue, 7 Apr 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Apr 7, 2020 at 1:30 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
Buffer usage statistics seem correct. The small differences would be
catalog lookups Peter mentioned.
Agreed, but can you check which part of code does that lookup? I want
to see if we can avoid that from buffer usage stats or at least write
a comment about it, otherwise, we might have to face this question
again and again.
Okay, I'll check it.
I've checked the buffer usage differences during parallel btree index creation.
TL;DR:
During tuple sorting individual parallel workers read blocks of
pg_amproc and pg_amproc_fam_proc_index to get the sort support
function. The call flow is like:
ParallelWorkerMain()
_bt_parallel_scan_and_sort()
tuplesort_begin_index_btree()
PrepareSortSupportFromIndexRel()
FinishSortSupportFunction()
get_opfamily_proc()
The details are as follows.
I populated the test table with the following script:
create table test (c int) with (autovacuum_enabled = off, parallel_workers = 8);
insert into test select generate_series(1,10000000);
and the CREATE INDEX DDL is:
create index test_idx on test (c);
Before executing the test script, I've put code at the following 4
places which checks the buffer usage at that point, and calculated the
difference between points: (a), (b) and (c). For example, (b) shows
the number of blocks read or hit while scanning the heap and
building the index.
1. Before executing CREATE INDEX command (at pgss_ProcessUtility())
(a)
2. Before parallel create index (at _bt_begin_parallel())
(b)
3. After parallel create index, after accumulating workers stats (at
_bt_end_parallel())
(c)
4. After executing CREATE INDEX command (at pgss_ProcessUtility())
And here are the results:
2 workers:
(a) hit: 107, read: 26
(b) hit: 12(=6+3+3), read: 44248(=15538+14453+14527)
(c) hit: 13, read: 2
total hit: 132, read:44276
4 workers:
(a) hit: 107, read: 26
(b) hit: 18(=6+3+3+3+3), read: 44248(=9368+8582+8544+9250+8504)
(c) hit: 13, read: 2
total hit: 138, read:44276
The table 'test' has 44248 blocks.
From the above results, the total number of blocks read (44248
blocks) during parallel index creation is stable and equals the
number of blocks of the test table. And we can see that three extra
blocks are accessed per worker (they show up in the hit counts). These
three blocks are two for pg_amproc_fam_proc_index and one for pg_amproc.
That is, individual parallel workers access these relations to get the
sort support function. The full backtrace is:
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x00007fff779c561a libsystem_kernel.dylib`__select + 10
frame #1: 0x000000010cc9f90d postgres`pg_usleep(microsec=20000000)
at pgsleep.c:56:10
frame #2: 0x000000010ca5a668
postgres`ReadBuffer_common(smgr=0x00007fe872848f70,
relpersistence='p', forkNum=MAIN_FORKNUM, blockNum=3, mode=RBM_NORMAL,
strategy=0x0000000000000000, hit=0x00007ffee363071b) at bufmgr.c:685:3
frame #3: 0x000000010ca5a4b6
postgres`ReadBufferExtended(reln=0x000000010d58f790,
forkNum=MAIN_FORKNUM, blockNum=3, mode=RBM_NORMAL,
strategy=0x0000000000000000) at bufmgr.c:628:8
frame #4: 0x000000010ca5a397
postgres`ReadBuffer(reln=0x000000010d58f790, blockNum=3) at
bufmgr.c:560:9
frame #5: 0x000000010c67187e
postgres`_bt_getbuf(rel=0x000000010d58f790, blkno=3, access=1) at
nbtpage.c:792:9
frame #6: 0x000000010c670507
postgres`_bt_getroot(rel=0x000000010d58f790, access=1) at
nbtpage.c:294:13
frame #7: 0x000000010c679393
postgres`_bt_search(rel=0x000000010d58f790, key=0x00007ffee36312d0,
bufP=0x00007ffee3631bec, access=1, snapshot=0x00007fe8728388e0) at
nbtsearch.c:107:10
frame #8: 0x000000010c67b489
postgres`_bt_first(scan=0x00007fe86f814998, dir=ForwardScanDirection)
at nbtsearch.c:1355:10
frame #9: 0x000000010c676869
postgres`btgettuple(scan=0x00007fe86f814998, dir=ForwardScanDirection)
at nbtree.c:253:10
frame #10: 0x000000010c6656ad
postgres`index_getnext_tid(scan=0x00007fe86f814998,
direction=ForwardScanDirection) at indexam.c:530:10
frame #11: 0x000000010c66585b
postgres`index_getnext_slot(scan=0x00007fe86f814998,
direction=ForwardScanDirection, slot=0x00007fe86f814880) at
indexam.c:622:10
frame #12: 0x000000010c663eac
postgres`systable_getnext(sysscan=0x00007fe86f814828) at genam.c:454:7
frame #13: 0x000000010cc0be41
postgres`SearchCatCacheMiss(cache=0x00007fe872818e80, nkeys=4,
hashValue=3052139574, hashIndex=6, v1=1976, v2=23, v3=23, v4=2) at
catcache.c:1368:9
frame #14: 0x000000010cc0bced
postgres`SearchCatCacheInternal(cache=0x00007fe872818e80, nkeys=4,
v1=1976, v2=23, v3=23, v4=2) at catcache.c:1299:9
frame #15: 0x000000010cc0baa8
postgres`SearchCatCache4(cache=0x00007fe872818e80, v1=1976, v2=23,
v3=23, v4=2) at catcache.c:1191:9
frame #16: 0x000000010cc27c82 postgres`SearchSysCache4(cacheId=5,
key1=1976, key2=23, key3=23, key4=2) at syscache.c:1156:9
frame #17: 0x000000010cc105dd
postgres`get_opfamily_proc(opfamily=1976, lefttype=23, righttype=23,
procnum=2) at lsyscache.c:751:7
frame #18: 0x000000010cc72e1d
postgres`FinishSortSupportFunction(opfamily=1976, opcintype=23,
ssup=0x00007fe86f8147d0) at sortsupport.c:99:24
frame #19: 0x000000010cc73100
postgres`PrepareSortSupportFromIndexRel(indexRel=0x000000010d5ced48,
strategy=1, ssup=0x00007fe86f8147d0) at sortsupport.c:176:2
frame #20: 0x000000010cc75463
postgres`tuplesort_begin_index_btree(heapRel=0x000000010d5cf808,
indexRel=0x000000010d5ced48, enforceUnique=false, workMem=21845,
coordinate=0x00007fe872839248, randomAccess=false) at
tuplesort.c:1114:3
frame #21: 0x000000010c681ffc
postgres`_bt_parallel_scan_and_sort(btspool=0x00007fe872839738,
btspool2=0x0000000000000000, btshared=0x000000010d56c4c0,
sharedsort=0x000000010d56c460, sharedsort2=0x0000000000000000,
sortmem=21845, progress=false) at nbtsort.c:1941:23
frame #22: 0x000000010c681eb2
postgres`_bt_parallel_build_main(seg=0x00007fe87280a058,
toc=0x000000010d56c000) at nbtsort.c:1889:2
frame #23: 0x000000010c6b7358
postgres`ParallelWorkerMain(main_arg=1169089032) at parallel.c:1471:2
frame #24: 0x000000010c9da86f postgres`StartBackgroundWorker at
bgworker.c:813:2
frame #25: 0x000000010c9efbc0
postgres`do_start_bgworker(rw=0x00007fe86f419290) at
postmaster.c:5852:4
frame #26: 0x000000010c9eff9f postgres`maybe_start_bgworkers at
postmaster.c:6078:9
frame #27: 0x000000010c9eee99
postgres`sigusr1_handler(postgres_signal_arg=30) at
postmaster.c:5247:3
frame #28: 0x00007fff77a74b5d libsystem_platform.dylib`_sigtramp + 29
frame #29: 0x00007fff779c561b libsystem_kernel.dylib`__select + 11
frame #30: 0x000000010c9ea48c postgres`ServerLoop at postmaster.c:1691:13
frame #31: 0x000000010c9e9e06 postgres`PostmasterMain(argc=5,
argv=0x00007fe86f4036f0) at postmaster.c:1400:11
frame #32: 0x000000010c8ee399 postgres`main(argc=<unavailable>,
argv=<unavailable>) at main.c:210:3
frame #33: 0x00007fff778893d5 libdyld.dylib`start + 1
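The arithmetic behind the results above can be re-checked with a few lines of C. This is only a sketch: the `demo_*` helpers are hypothetical, and reading the first term of each breakdown (the 6 in 6+3+3) as the leader's share is an assumption, not something the message states. The 2-worker read breakdown as printed (15538+14453+14527) does not appear to sum to 44248, so only the 4-worker breakdown is used here.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a per-process breakdown such as 6+3+3. */
long
demo_sum(const long *parts, int nparts)
{
    long        total = 0;

    assert(parts != NULL || nparts == 0);
    for (int i = 0; i < nparts; i++)
        total += parts[i];
    return total;
}

/*
 * Hits expected in phase (b) if one process contributes base_hits and each
 * of nworkers workers adds per_worker_hits for its own catalog lookups.
 */
long
demo_phase_b_hits(long base_hits, long per_worker_hits, int nworkers)
{
    return base_hits + per_worker_hits * nworkers;
}
```

With `base_hits = 6` and `per_worker_hits = 3`, the model gives 12 hits for 2 workers and 18 for 4 workers, matching (b), and adding phases (a) and (c) reproduces the reported totals of 132 and 138 hits.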
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Apr 7, 2020 at 12:00 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 2020-04-07 04:12, Amit Kapila wrote:
On Mon, Apr 6, 2020 at 10:01 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Mon, Apr 06, 2020 at 05:01:30PM +0200, Peter Eisentraut wrote:
I noticed in some of the screenshots that were tweeted that for example in
WAL: records=1  bytes=56
there are two spaces between pieces of data. This doesn't match the rest of
the EXPLAIN output. Can that be adjusted?
We talked about that here:
/messages/by-id/20200402054120.GC14618@telsasoft.com
Yeah. Just to be brief here, the main reason was that one of the fields
(full page writes) already had a single space and then we had prior
cases as mentioned in Justin's email [1] where we use two spaces which
led us to decide using two spaces in this case.
We also have existing cases for the other way:
actual time=0.050..0.052
Buffers: shared hit=3 dirtied=1
The cases mentioned by Justin are not formatted in a key=value format,
so it's not quite the same, but it also raises the question why they are
not.
Let's figure out a way to consolidate this without making up a third format.
The parsability problem Justin was mentioning is only due to "full
page writes", so we could use "full_page_writes" or "fpw" instead and
remove the extra spaces. There would be a small discrepancy with the
verbose autovacuum log, but there are other differences already.
I'd be slightly in favor of "fpw" to be more concise. Would that be ok?
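To illustrate the parsability argument: when no key contains a space, a consumer can split the line on whitespace and look for key=value tokens. The sketch below assumes a line such as "WAL: records=1 fpw=0 bytes=56"; the "fpw" key is only the name proposed in this message, and `demo_lookup_field` is a hypothetical helper written for this example, not part of PostgreSQL.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up "key=<integer>" among the space-separated tokens of line.
 * Returns 1 and stores the value on success, 0 if the key is absent.
 */
int
demo_lookup_field(const char *line, const char *key, long *value)
{
    size_t      keylen = strlen(key);
    const char *p = line;

    assert(line != NULL && value != NULL);
    while (*p != '\0')
    {
        /* Skip one or more separating spaces. */
        while (*p == ' ')
            p++;
        /* A token matches only if it starts with "key=". */
        if (strncmp(p, key, keylen) == 0 && p[keylen] == '=')
            return sscanf(p + keylen + 1, "%ld", value) == 1;
        /* Otherwise advance to the end of this token. */
        while (*p != '\0' && *p != ' ')
            p++;
    }
    return 0;
}
```

The same tokenizer handles one or two spaces between fields, so the separator width matters less to a parser than having keys without embedded spaces; a key like "full page writes" would be split into three tokens by this approach.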
On Tue, Apr 07, 2020 at 12:00:29PM +0200, Peter Eisentraut wrote:
We also have existing cases for the other way:
actual time=0.050..0.052
Buffers: shared hit=3 dirtied=1
The cases mentioned by Justin are not formatted in a key=value format, so
it's not quite the same, but it also raises the question why they are not.
Let's figure out a way to consolidate this without making up a third format.
So this re-raises my suggestion here to use colons, Title Case Field Names, and
"Size: ..kB" rather than "bytes=":
|/messages/by-id/20200403054451.GN14618@telsasoft.com
As I see it, the sort/hashjoin style is being used for cases with fields with
different units:
Sort Method: quicksort  Memory: 931kB
Buckets: 1024  Batches: 1  Memory Usage: 16kB
..which is distinguished from the case where the units are the same, like
buffers (hit=Npages read=Npages dirtied=Npages written=Npages).
Note, as of 1f39bce021, we have hashagg_disk, which looks like this:
template1=# explain analyze SELECT a, COUNT(1) FROM generate_series(1,99999) a GROUP BY 1 ORDER BY 1;
...
-> HashAggregate (cost=1499.99..1501.99 rows=200 width=12) (actual time=166.883..280.943 rows=99999 loops=1)
Group Key: a
Peak Memory Usage: 4913 kB
Disk Usage: 1848 kB
HashAgg Batches: 8
Incremental sort adds yet another variation, which I've mentioned in that thread.
I'm hoping to come to some resolution here, first.
/messages/by-id/20200407042521.GH2228@telsasoft.com
--
Justin
On Tue, Apr 7, 2020 at 3:30 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 2020-04-07 04:12, Amit Kapila wrote:
On Mon, Apr 6, 2020 at 10:01 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Mon, Apr 06, 2020 at 05:01:30PM +0200, Peter Eisentraut wrote:
I noticed in some of the screenshots that were tweeted that for example in
WAL: records=1  bytes=56
there are two spaces between pieces of data. This doesn't match the rest of
the EXPLAIN output. Can that be adjusted?
We talked about that here:
/messages/by-id/20200402054120.GC14618@telsasoft.com
Yeah. Just to be brief here, the main reason was that one of the fields
(full page writes) already had a single space and then we had prior
cases as mentioned in Justin's email [1] where we use two spaces which
led us to decide using two spaces in this case.
We also have existing cases for the other way:
actual time=0.050..0.052
Buffers: shared hit=3 dirtied=1
Buffers case is not the same because 'shared' is used for 'hit',
'read', 'dirtied', etc. However, I think it is arguable.
The cases mentioned by Justin are not formatted in a key=value format,
so it's not quite the same, but it also raises the question why they are
not.
Let's figure out a way to consolidate this without making up a third format.
Sure, I think my intention is to keep the format of WAL stats as close
to Buffers stats as possible because both depict I/O and users would
probably be interested to check/read both together. There is a point
to keep things in a format so that it is easier for someone to parse
but I guess as these are fixed 'words', it shouldn't be difficult
either way and we should give more weightage to consistency. Any
suggestions?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
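To make the formatting question above concrete, here is a minimal standalone sketch of a "WAL:" line written in the same single-space key=value style as the "Buffers:" line, printing only non-zero counters. It is not the code in explain.c: the struct WalUsageSketch, the helper names and the key "fpw" are assumptions made for illustration, since the thread has not settled on the final field label.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for WalUsage; not the real struct from instrument.h. */
typedef struct WalUsageSketch
{
	long		wal_records;
	long		wal_fpw;
	uint64_t	wal_bytes;
} WalUsageSketch;

/* Append formatted text to a NUL-terminated buffer, truncating if it is full. */
static void
append_field(char *buf, size_t buflen, const char *fmt, ...)
{
	size_t		used = strlen(buf);
	va_list		ap;

	if (used + 1 >= buflen)
		return;
	va_start(ap, fmt);
	vsnprintf(buf + used, buflen - used, fmt, ap);
	va_end(ap);
}

/*
 * Build a "WAL:" line with one space between key=value pairs, skipping
 * counters that are zero.  Returns the length of the line, or 0 when every
 * counter is zero (in which case buf is left empty).
 */
static size_t
format_wal_line(char *buf, size_t buflen, const WalUsageSketch *usage)
{
	assert(buf != NULL && buflen > 0);
	buf[0] = '\0';

	if (usage->wal_records <= 0 && usage->wal_fpw <= 0 &&
		usage->wal_bytes == 0)
		return 0;

	append_field(buf, buflen, "WAL:");
	if (usage->wal_records > 0)
		append_field(buf, buflen, " records=%ld", usage->wal_records);
	if (usage->wal_fpw > 0)
		append_field(buf, buflen, " fpw=%ld", usage->wal_fpw);	/* hypothetical key */
	if (usage->wal_bytes > 0)
		append_field(buf, buflen, " bytes=%" PRIu64, usage->wal_bytes);

	return strlen(buf);
}
```

With records=1, no full page writes and bytes=56, this produces a line with single spaces only, which is the shape Peter asked about.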
On Tue, Apr 7, 2020 at 5:17 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Tue, 7 Apr 2020 at 18:29, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Tue, 7 Apr 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Apr 7, 2020 at 1:30 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
Buffer usage statistics seem correct. The small differences would be
catalog lookups Peter mentioned.
Agreed, but can you check which part of code does that lookup? I want
to see if we can avoid that from buffer usage stats or at least write
a comment about it, otherwise, we might have to face this question
again and again.
Okay, I'll check it.
I've checked the buffer usage differences during parallel btree index creation.
TL;DR;
During tuple sorting individual parallel workers read blocks of
pg_amproc and pg_amproc_fam_proc_index to get the sort support
function. The call flow is like:
ParallelWorkerMain()
_bt_parallel_scan_and_sort()
tuplesort_begin_index_btree()
PrepareSortSupportFromIndexRel()
FinishSortSupportFunction()
get_opfamily_proc()
Thanks for the investigation. I don't see we can do anything special
about this. In an ideal world, this should be done once and not for
each worker but I guess it doesn't matter too much. I am not sure if
it is worth adding a comment for this, what do you think?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, 8 Apr 2020 at 14:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks for the investigation. I don't see we can do anything special
about this. In an ideal world, this should be done once and not for
each worker but I guess it doesn't matter too much. I am not sure if
it is worth adding a comment for this, what do you think?
I agree with you. If the differences were considerably large we would
probably do something, but I think we don't need to do anything at this
time.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 8, 2020 at 11:53 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Wed, 8 Apr 2020 at 14:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks for the investigation. I don't see we can do anything special
about this. In an ideal world, this should be done once and not for
each worker but I guess it doesn't matter too much. I am not sure if
it is worth adding a comment for this, what do you think?
I agree with you. If the differences were considerably large probably
we would do something but I think we don't need to do anything at this
time.
Fair enough, can you once check this in back-branches as this needs to
be backpatched? I will do that once by myself as well.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 8, 2020 at 8:23 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Wed, 8 Apr 2020 at 14:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks for the investigation. I don't see we can do anything special
about this. In an ideal world, this should be done once and not for
each worker but I guess it doesn't matter too much. I am not sure if
it is worth adding a comment for this, what do you think?
I agree with you. If the differences were considerably large probably
we would do something but I think we don't need to do anything at this
time.
+1
On Wed, 8 Apr 2020 at 16:04, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Apr 8, 2020 at 11:53 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Wed, 8 Apr 2020 at 14:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks for the investigation. I don't see we can do anything special
about this. In an ideal world, this should be done once and not for
each worker but I guess it doesn't matter too much. I am not sure if
it is worth adding a comment for this, what do you think?
I agree with you. If the differences were considerably large probably
we would do something but I think we don't need to do anything at this
time.
Fair enough, can you once check this in back-branches as this needs to
be backpatched? I will do that once by myself as well.
I've done the same test with HEAD of both REL_12_STABLE and
REL_11_STABLE. I think the patch needs to be backpatched to PG11 where
parallel index creation was introduced. I've attached the patches
for PG12 and PG11 I used for this test for reference.
Here are the results:
* PG12
With no worker:
-[ RECORD 1 ]-------+-------------
shared_blks_hit | 119
shared_blks_read | 44283
total_read_blks | 44402
shared_blks_dirtied | 44262
shared_blks_written | 24925
With 4 workers:
-[ RECORD 1 ]-------+------------
shared_blks_hit | 128
shared_blks_read | 8844
total_read_blks | 8972
shared_blks_dirtied | 8822
shared_blks_written | 5393
With 4 workers after patching:
-[ RECORD 1 ]-------+------------
shared_blks_hit | 140
shared_blks_read | 44284
total_read_blks | 44424
shared_blks_dirtied | 44262
shared_blks_written | 26574
* PG11
With no worker:
-[ RECORD 1 ]-------+------------
shared_blks_hit | 124
shared_blks_read | 44284
total_read_blks | 44408
shared_blks_dirtied | 44263
shared_blks_written | 24908
With 4 workers:
-[ RECORD 1 ]-------+-------------
shared_blks_hit | 132
shared_blks_read | 8910
total_read_blks | 9042
shared_blks_dirtied | 8888
shared_blks_written | 5370
With 4 workers after patching:
-[ RECORD 1 ]-------+-------------
shared_blks_hit | 144
shared_blks_read | 44285
total_read_blks | 44429
shared_blks_dirtied | 44263
shared_blks_written | 26861
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
bufferusage_create_index_pg12.patch (application/x-patch)
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index edc4a82b02..da5b39eb02 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -67,6 +67,7 @@
#include "access/xloginsert.h"
#include "catalog/index.h"
#include "commands/progress.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/smgr.h"
@@ -81,6 +82,7 @@
#define PARALLEL_KEY_TUPLESORT UINT64CONST(0xA000000000000002)
#define PARALLEL_KEY_TUPLESORT_SPOOL2 UINT64CONST(0xA000000000000003)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xA000000000000004)
+#define PARALLEL_KEY_BUFFER_USAGE UINT64CONST(0xA000000000000005)
/*
* DISABLE_LEADER_PARTICIPATION disables the leader's participation in
@@ -203,6 +205,7 @@ typedef struct BTLeader
Sharedsort *sharedsort;
Sharedsort *sharedsort2;
Snapshot snapshot;
+ BufferUsage *bufferusage;
} BTLeader;
/*
@@ -1336,6 +1339,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
Sharedsort *sharedsort2;
BTSpool *btspool = buildstate->spool;
BTLeader *btleader = (BTLeader *) palloc0(sizeof(BTLeader));
+ BufferUsage *bufferusage;
bool leaderparticipates = true;
char *sharedquery;
int querylen;
@@ -1388,6 +1392,17 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
shm_toc_estimate_keys(&pcxt->estimator, 3);
}
+ /*
+ * Estimate space for BufferUsage -- PARALLEL_KEY_BUFFER_USAGE.
+ *
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgBufferUsage,
+ * so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -1459,6 +1474,11 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
memcpy(sharedquery, debug_query_string, querylen + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, sharedquery);
+ /* Allocate space for each worker's BufferUsage; no need to initialize */
+ bufferusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufferusage);
+
/* Launch workers, saving status for leader/caller */
LaunchParallelWorkers(pcxt);
btleader->pcxt = pcxt;
@@ -1469,6 +1489,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
btleader->sharedsort = sharedsort;
btleader->sharedsort2 = sharedsort2;
btleader->snapshot = snapshot;
+ btleader->bufferusage = bufferusage;
/* If no workers were successfully launched, back out (do serial build) */
if (pcxt->nworkers_launched == 0)
@@ -1497,8 +1518,18 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
static void
_bt_end_parallel(BTLeader *btleader)
{
+ int i;
+
/* Shutdown worker processes */
WaitForParallelWorkersToFinish(btleader->pcxt);
+
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < btleader->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&btleader->bufferusage[i]);
+
/* Free last reference to MVCC snapshot, if one was used */
if (IsMVCCSnapshot(btleader->snapshot))
UnregisterSnapshot(btleader->snapshot);
@@ -1629,6 +1660,7 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
Relation indexRel;
LOCKMODE heapLockmode;
LOCKMODE indexLockmode;
+ BufferUsage *bufferusage;
int sortmem;
#ifdef BTREE_BUILD_STATS
@@ -1690,11 +1722,18 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
tuplesort_attach_shared(sharedsort2, seg);
}
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
/* Perform sorting of spool, and possibly a spool2 */
sortmem = maintenance_work_mem / btshared->scantuplesortstates;
_bt_parallel_scan_and_sort(btspool, btspool2, btshared, sharedsort,
sharedsort2, sortmem, false);
+ /* Report buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber]);
+
#ifdef BTREE_BUILD_STATS
if (log_btree_build_stats)
{
bufferusage_create_index_pg11.patch (application/x-patch)
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index dab41ea298..54627b786a 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -64,6 +64,7 @@
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "catalog/index.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/smgr.h"
@@ -78,6 +79,7 @@
#define PARALLEL_KEY_TUPLESORT UINT64CONST(0xA000000000000002)
#define PARALLEL_KEY_TUPLESORT_SPOOL2 UINT64CONST(0xA000000000000003)
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xA000000000000004)
+#define PARALLEL_KEY_BUFFER_USAGE UINT64CONST(0xA000000000000005)
/*
* DISABLE_LEADER_PARTICIPATION disables the leader's participation in
@@ -192,6 +194,7 @@ typedef struct BTLeader
Sharedsort *sharedsort;
Sharedsort *sharedsort2;
Snapshot snapshot;
+ BufferUsage *bufferusage;
} BTLeader;
/*
@@ -1240,6 +1243,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
Sharedsort *sharedsort2;
BTSpool *btspool = buildstate->spool;
BTLeader *btleader = (BTLeader *) palloc0(sizeof(BTLeader));
+ BufferUsage *bufferusage;
bool leaderparticipates = true;
char *sharedquery;
int querylen;
@@ -1292,6 +1296,17 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
shm_toc_estimate_keys(&pcxt->estimator, 3);
}
+ /*
+ * Estimate space for BufferUsage -- PARALLEL_KEY_BUFFER_USAGE.
+ *
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgBufferUsage,
+ * so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -1361,6 +1376,11 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
memcpy(sharedquery, debug_query_string, querylen + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, sharedquery);
+ /* Allocate space for each worker's BufferUsage; no need to initialize */
+ bufferusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufferusage);
+
/* Launch workers, saving status for leader/caller */
LaunchParallelWorkers(pcxt);
btleader->pcxt = pcxt;
@@ -1371,6 +1391,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
btleader->sharedsort = sharedsort;
btleader->sharedsort2 = sharedsort2;
btleader->snapshot = snapshot;
+ btleader->bufferusage = bufferusage;
/* If no workers were successfully launched, back out (do serial build) */
if (pcxt->nworkers_launched == 0)
@@ -1399,8 +1420,18 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
static void
_bt_end_parallel(BTLeader *btleader)
{
+ int i;
+
/* Shutdown worker processes */
WaitForParallelWorkersToFinish(btleader->pcxt);
+
+ /*
+ * Next, accumulate buffer usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < btleader->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&btleader->bufferusage[i]);
+
/* Free last reference to MVCC snapshot, if one was used */
if (IsMVCCSnapshot(btleader->snapshot))
UnregisterSnapshot(btleader->snapshot);
@@ -1537,6 +1568,7 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
Relation indexRel;
LOCKMODE heapLockmode;
LOCKMODE indexLockmode;
+ BufferUsage *bufferusage;
int sortmem;
#ifdef BTREE_BUILD_STATS
@@ -1598,11 +1630,18 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
tuplesort_attach_shared(sharedsort2, seg);
}
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
/* Perform sorting of spool, and possibly a spool2 */
sortmem = maintenance_work_mem / btshared->scantuplesortstates;
_bt_parallel_scan_and_sort(btspool, btspool2, btshared, sharedsort,
sharedsort2, sortmem);
+ /* Report buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber]);
+
#ifdef BTREE_BUILD_STATS
if (log_btree_build_stats)
{
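The pattern used in both patches (each worker writes its counters into its own slot, and the leader sums the slots once the workers have finished) can be sketched outside PostgreSQL as follows. This is only an illustration under simplifying assumptions: BufferUsageSketch, worker_report_usage and leader_accumulate_usage are hypothetical names, an ordinary array stands in for the shm_toc chunk, and only two of the BufferUsage counters are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Two counters standing in for the real BufferUsage struct. */
typedef struct BufferUsageSketch
{
	long		shared_blks_hit;
	long		shared_blks_read;
} BufferUsageSketch;

/*
 * Worker side: publish this worker's counters into its own slot, in the
 * spirit of InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber]).
 */
static void
worker_report_usage(BufferUsageSketch *slots, int worker_number,
					const BufferUsageSketch *worker_usage)
{
	slots[worker_number] = *worker_usage;
}

/*
 * Leader side: after all workers have finished, add every launched worker's
 * slot to the leader's own totals.  Slots of workers that were never
 * launched are not read, so they need no initialization.
 */
static void
leader_accumulate_usage(BufferUsageSketch *leader_total,
						const BufferUsageSketch *slots,
						int nworkers_launched)
{
	int			i;

	assert(leader_total != NULL && slots != NULL);
	for (i = 0; i < nworkers_launched; i++)
	{
		leader_total->shared_blks_hit += slots[i].shared_blks_hit;
		leader_total->shared_blks_read += slots[i].shared_blks_read;
	}
}
```

Without the accumulation step the leader reports only its own share of the work, which would be consistent with the "With 4 workers" figures before patching being much smaller than the serial ones.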
On Wed, Apr 8, 2020 at 1:49 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Wed, 8 Apr 2020 at 16:04, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Apr 8, 2020 at 11:53 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Wed, 8 Apr 2020 at 14:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks for the investigation. I don't see we can do anything special
about this. In an ideal world, this should be done once and not for
each worker but I guess it doesn't matter too much. I am not sure if
it is worth adding a comment for this, what do you think?
I agree with you. If the differences were considerably large probably
we would do something but I think we don't need to do anything at this
time.
Fair enough, can you once check this in back-branches as this needs to
be backpatched? I will do that once by myself as well.
I've done the same test with HEAD of both REL_12_STABLE and
REL_11_STABLE. I think the patch needs to be backpatched to PG11 where
parallel index creation was introduced. I've attached the patches
for PG12 and PG11 I used for this test for reference.
Thanks, I will once again verify and push this tomorrow if there are
no other comments.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 7, 2020 at 2:48 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Tue, Apr 7, 2020 at 4:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 6, 2020 at 7:58 PM Euler Taveira
<euler.taveira@2ndquadrant.com> wrote:
On Mon, 6 Apr 2020 at 10:37, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Mon, Apr 06, 2020 at 10:12:55AM -0300, Euler Taveira wrote:
On Mon, 6 Apr 2020 at 00:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
I have pushed pg_stat_statements and Explain related patches. I am
now looking into (auto)vacuum patch and have few comments.
I wasn't paying much attention to this thread. May I suggest changing
wal_num_fpw to wal_fpw? wal_records and wal_bytes do not have a prefix
'num'. It seems inconsistent to me.
If we want to be consistent shouldn't we rename it to wal_fpws? FTR I don't
like much either version.
Since FPW is an acronym, plural form reads better when you are using
uppercase (such as FPWs or FPW's); thus, I prefer singular form because
parameter names are lowercase. Function description will clarify that
this is "number of WAL full page writes".
I like Euler's suggestion to change wal_num_fpw to wal_fpw. It is
better if others who didn't like this name can also share their
opinion now because changing multiple times the same thing is not a
good idea.
+1
About Justin and your comments on the other thread:
On Tue, Apr 7, 2020 at 4:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 6, 2020 at 10:04 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Thu, Apr 02, 2020 at 08:29:31AM +0200, Julien Rouhaud wrote:
"full page records" seems to be showing the number of full page
images, not the record having full page images.
I am not sure what exactly is a difference but it is the records
having full page images. Julien correct me if I am wrong.
Obviously previous complaints about the meaning and parsability of
"full page writes" should be addressed here for consistency.
There's a couple places that say "full page image records" which I think is
language you were trying to avoid. It's the number of pages, not the number of
records, no? I see explain and autovacuum say what I think is wanted, but
these say the wrong thing? Find attached slightly larger patch.
$ git grep 'image record'
contrib/pg_stat_statements/pg_stat_statements.c: int64 wal_num_fpw; /* # of WAL full page image records generated */
doc/src/sgml/ref/explain.sgml: number of records, number of full page image records and amount of WAL
Few comments:
1.
- int64 wal_num_fpw; /* # of WAL full page image records generated */
+ int64 wal_num_fpw; /* # of WAL full page images generated */
Let's change comment as " /* # of WAL full page writes generated */"
to be consistent with other places like instrument.h. Also, make a
similar change at other places if required.
Agreed. That's pg_stat_statements.c and instrument.h. I'll send a
patch once we reach consensus with the rest of the comments.
Would you like to send a consolidated patch that includes Euler's
suggestion and Justin's patch (by making changes for points we
discussed.)? I think we can keep the point related to number of
spaces before each field open?
2.
<entry>
- Total amount of WAL bytes generated by the statement
+ Total number of WAL bytes generated by the statement
</entry>
I feel the previous text was better as this field can give us the size
of WAL with which we can answer "how much WAL data is generated by a
particular statement?". Julien, do you have any thoughts on this?
I also prefer "amount" as it feels more natural.
As we see no other opinion on this matter, we can use "amount" here.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Fri, Apr 10, 2020 at 8:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Apr 7, 2020 at 2:48 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Tue, Apr 7, 2020 at 4:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 6, 2020 at 7:58 PM Euler Taveira <euler.taveira@2ndquadrant.com> wrote:
Few comments:
1.
- int64 wal_num_fpw; /* # of WAL full page image records generated */
+ int64 wal_num_fpw; /* # of WAL full page images generated */
Let's change comment as " /* # of WAL full page writes generated */"
to be consistent with other places like instrument.h. Also, make a
similar change at other places if required.
Agreed. That's pg_stat_statements.c and instrument.h. I'll send a
patch once we reach consensus with the rest of the comments.
Would you like to send a consolidated patch that includes Euler's
suggestion and Justin's patch (by making changes for points we
discussed.)? I think we can keep the point related to number of
spaces before each field open?
Sure, I'll take care of that tomorrow!
2.
<entry>
- Total amount of WAL bytes generated by the statement
+ Total number of WAL bytes generated by the statement
</entry>
I feel the previous text was better as this field can give us the size
of WAL with which we can answer "how much WAL data is generated by a
particular statement?". Julien, do you have any thoughts on this?
I also prefer "amount" as it feels more natural.
As we see no other opinion on this matter, we can use "amount" here.
Ok.
On Fri, Apr 10, 2020 at 9:37 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Fri, Apr 10, 2020 at 8:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
Would you like to send a consolidated patch that includes Euler's
suggestion and Justin's patch (by making changes for points we
discussed.)? I think we can keep the point related to number of
spaces before each field open?
Sure, I'll take care of that tomorrow!
I tried to take into account all that has been discussed, but I have
to admit that I'm absolutely not sure of what was actually decided
here. I went with those changes:
- rename wal_num_fpw to wal_fpw for consistency, both in the pgss view
field name and everywhere in the code
- change comments to consistently mention "full page writes generated"
- changed pgss and explain documentation to mention "full page images
generated", from Justin's patch on another thread
- kept "amount" of WAL bytes
- no change to the explain output as I have no idea what is the
consensus (one or two spaces, use semicolon or equal, show unit or
not)
Attachments:
v1-wal_usage_fixup.diff (text/x-patch; charset=US-ASCII)
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
index 30566574ab..d0a6c3b4fc 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -43,7 +43,7 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean,
OUT blk_read_time float8,
OUT blk_write_time float8,
OUT wal_records int8,
- OUT wal_num_fpw int8,
+ OUT wal_fpw int8,
OUT wal_bytes numeric
)
RETURNS SETOF record
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 04abdab904..90bc6fd013 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -189,7 +189,7 @@ typedef struct Counters
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
int64 wal_records; /* # of WAL records generated */
- int64 wal_num_fpw; /* # of WAL full page image records generated */
+ int64 wal_fpw; /* # of WAL full page writes generated */
uint64 wal_bytes; /* total amount of WAL bytes generated */
} Counters;
@@ -1432,7 +1432,7 @@ pgss_store(const char *query, uint64 queryId,
e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
e->counters.wal_records += walusage->wal_records;
- e->counters.wal_num_fpw += walusage->wal_num_fpw;
+ e->counters.wal_fpw += walusage->wal_fpw;
e->counters.wal_bytes += walusage->wal_bytes;
SpinLockRelease(&e->mutex);
@@ -1824,7 +1824,7 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
Datum wal_bytes;
values[i++] = Int64GetDatumFast(tmp.wal_records);
- values[i++] = Int64GetDatumFast(tmp.wal_num_fpw);
+ values[i++] = Int64GetDatumFast(tmp.wal_fpw);
snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 5a962feb39..f0b769fad8 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -283,11 +283,11 @@
</row>
<row>
- <entry><structfield>wal_num_fpw</structfield></entry>
+ <entry><structfield>wal_fpw</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
<entry>
- Total count of WAL full page writes generated by the statement
+ Total count of WAL full page images generated by the statement
</entry>
</row>
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 024ede4a8d..1e12715a03 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -198,8 +198,8 @@ ROLLBACK;
<listitem>
<para>
Include information on WAL record generation. Specifically, include the
- number of records, number of full page image records and amount of WAL
- bytes generated. In text format, only non-zero values are printed. This
+ number of records, number of full page images and amount of WAL bytes
+ generated. In text format, only non-zero values are printed. This
parameter may only be used when <literal>ANALYZE</literal> is also
enabled. It defaults to <literal>FALSE</literal>.
</para>
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f3382d37a4..d8bc06fe0b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -676,7 +676,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
_("WAL usage: %ld records, %ld full page writes, "
UINT64_FORMAT " bytes"),
walusage.wal_records,
- walusage.wal_num_fpw,
+ walusage.wal_fpw,
walusage.wal_bytes);
ereport(LOG,
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c38bc1412d..11e32733c4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1258,7 +1258,7 @@ XLogInsertRecord(XLogRecData *rdata,
{
pgWalUsage.wal_bytes += rechdr->xl_tot_len;
pgWalUsage.wal_records++;
- pgWalUsage.wal_num_fpw += num_fpw;
+ pgWalUsage.wal_fpw += num_fpw;
}
return EndPos;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 455f54ef83..57ab676572 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3355,7 +3355,7 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
if (es->format == EXPLAIN_FORMAT_TEXT)
{
/* Show only positive counter values. */
- if ((usage->wal_records > 0) || (usage->wal_num_fpw > 0) ||
+ if ((usage->wal_records > 0) || (usage->wal_fpw > 0) ||
(usage->wal_bytes > 0))
{
ExplainIndentText(es);
@@ -3364,9 +3364,9 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
if (usage->wal_records > 0)
appendStringInfo(es->str, " records=%ld",
usage->wal_records);
- if (usage->wal_num_fpw > 0)
+ if (usage->wal_fpw > 0)
appendStringInfo(es->str, " full page writes=%ld",
- usage->wal_num_fpw);
+ usage->wal_fpw);
if (usage->wal_bytes > 0)
appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
usage->wal_bytes);
@@ -3378,7 +3378,7 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
ExplainPropertyInteger("WAL records", NULL,
usage->wal_records, es);
ExplainPropertyInteger("WAL full page writes", NULL,
- usage->wal_num_fpw, es);
+ usage->wal_fpw, es);
ExplainPropertyUInteger("WAL bytes", NULL,
usage->wal_bytes, es);
}
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 3b9c6aebb9..7c9d723552 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -248,7 +248,7 @@ WalUsageAdd(WalUsage *dst, WalUsage *add)
{
dst->wal_bytes += add->wal_bytes;
dst->wal_records += add->wal_records;
- dst->wal_num_fpw += add->wal_num_fpw;
+ dst->wal_fpw += add->wal_fpw;
}
void
@@ -256,5 +256,5 @@ WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
{
dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
dst->wal_records += add->wal_records - sub->wal_records;
- dst->wal_num_fpw += add->wal_num_fpw - sub->wal_num_fpw;
+ dst->wal_fpw += add->wal_fpw - sub->wal_fpw;
}
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 64439c6819..50d672b270 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -35,7 +35,7 @@ typedef struct BufferUsage
typedef struct WalUsage
{
long wal_records; /* # of WAL records produced */
- long wal_num_fpw; /* # of WAL full page image writes produced */
+ long wal_fpw; /* # of WAL full page writes produced */
uint64 wal_bytes; /* size of WAL records produced */
} WalUsage;
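As an illustration of how the renamed counter is used, here is a minimal standalone sketch of the snapshot-and-difference pattern behind WalUsageAccumDiff in the diff above: take a copy of the counters before a statement, another after it, and add the difference to a running total. The struct WalCounters and the function wal_usage_accum_diff are simplified stand-ins written for this example, not the definitions from instrument.h and instrument.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for WalUsage after the wal_num_fpw -> wal_fpw rename. */
typedef struct WalCounters
{
	long		wal_records;	/* # of WAL records produced */
	long		wal_fpw;		/* # of WAL full page writes produced */
	uint64_t	wal_bytes;		/* size of WAL records produced */
} WalCounters;

/*
 * Add (end - start) to dst, field by field.  The caller is expected to pass
 * an "end" snapshot taken after the "start" snapshot, so each difference is
 * non-negative.
 */
static void
wal_usage_accum_diff(WalCounters *dst, const WalCounters *end,
					 const WalCounters *start)
{
	assert(dst != NULL && end != NULL && start != NULL);
	dst->wal_bytes += end->wal_bytes - start->wal_bytes;
	dst->wal_records += end->wal_records - start->wal_records;
	dst->wal_fpw += end->wal_fpw - start->wal_fpw;
}
```

For example, with a start snapshot of 10 records, 1 full page write and 1000 bytes, and an end snapshot of 15, 3 and 1800, the statement is charged 5 records, 2 full page writes and 800 bytes.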
On Sat, Mar 28, 2020 at 04:17:21PM +0100, Julien Rouhaud wrote:
On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote:
On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote:
I see some basic problems with the patch. The way it tries to compute
WAL usage for parallel stuff doesn't seem right to me. Can you share
or point me to any test done where we have computed WAL for parallel
operations like Parallel Vacuum or Parallel Create Index?
Ah, that's indeed a good point and AFAICT WAL records from parallel utility
workers won't be accounted for. That being said, I think that an argument
could be made that proper infrastructure should have been added in the original
parallel utility patches, as pg_stat_statements is already broken wrt. buffer
usage in parallel utility, unless I'm missing something.
Just to be sure I did a quick test with pg_stat_statements behavior using
parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage
doesn't reflect parallel workers' activity.
I added an open item for that, and adding Robert in Cc as 9da0cc352 is the first
commit adding parallel maintenance.
I believe this is resolved for parallel vacuum in master and parallel create
index back to PG11.
I marked this as closed.
https://wiki.postgresql.org/index.php?title=PostgreSQL_13_Open_Items&diff=34802&oldid=34781
--
Justin
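To make the parallel accounting problem discussed above concrete, here is a minimal standalone sketch of what "accounting for workers" amounts to: each worker reports its own counters and the leader folds them into its totals. The struct mirrors the WalUsage fields from the patch, but leader_accumulate() is a hypothetical helper written for illustration only; it is not the code used by parallel vacuum or parallel CREATE INDEX, which pass usage through shared memory.

```c
#include <stdint.h>

/* Standalone copy of the counters, mirroring WalUsage from the patch. */
typedef struct WalUsage
{
	long		wal_records;	/* # of WAL records produced */
	long		wal_fpw;		/* # of WAL full page writes produced */
	uint64_t	wal_bytes;		/* size of WAL records produced */
} WalUsage;

/*
 * Hypothetical helper: add the counters reported by each worker to the
 * leader's totals.  If this step is skipped, the statement-level numbers
 * only reflect the leader's own activity.
 */
static void
leader_accumulate(WalUsage *leader, const WalUsage *workers, int nworkers)
{
	for (int i = 0; i < nworkers; i++)
	{
		leader->wal_records += workers[i].wal_records;
		leader->wal_fpw += workers[i].wal_fpw;
		leader->wal_bytes += workers[i].wal_bytes;
	}
}
```

With a leader at 10 records and two workers at 5 and 7 records, the statement total becomes 22; without the accumulation step it would stay at 10, which is the kind of undercount reported for buffer usage above.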
On Sun, Apr 12, 2020 at 00:33, Justin Pryzby <pryzby@telsasoft.com> wrote:
On Sat, Mar 28, 2020 at 04:17:21PM +0100, Julien Rouhaud wrote:
Just to be sure I did a quick test with pg_stat_statements behavior using
parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage
doesn't reflect parallel workers' activity.
I added an open item for that, and am adding Robert in Cc as 9da0cc352 is the first
commit adding parallel maintenance.
I believe this is resolved for parallel vacuum in master and parallel create
index back to PG11.
indeed, I was about to take care of this too
I marked this as closed.
https://wiki.postgresql.org/index.php?title=PostgreSQL_13_Open_Items&diff=34802&oldid=34781
thanks a lot!
On Sun, Apr 12, 2020 at 4:03 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Sat, Mar 28, 2020 at 04:17:21PM +0100, Julien Rouhaud wrote:
On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote:
On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote:
I see some basic problems with the patch. The way it tries to compute
WAL usage for parallel stuff doesn't seem right to me. Can you share
or point me to any test done where we have computed WAL for parallel
operations like Parallel Vacuum or Parallel Create Index?
Ah, that's indeed a good point and AFAICT WAL records from parallel utility
workers won't be accounted for. That being said, I think that an argument
could be made that proper infrastructure should have been added in the original
parallel utility patches, as pg_stat_statements is already broken wrt. buffer
usage in parallel utility, unless I'm missing something.
Just to be sure I did a quick test with pg_stat_statements behavior using
parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage
doesn't reflect parallel workers' activity.
I added an open item for that, and am adding Robert in Cc as 9da0cc352 is the first
commit adding parallel maintenance.
I believe this is resolved for parallel vacuum in master and parallel create
index back to PG11.
I marked this as closed.
https://wiki.postgresql.org/index.php?title=PostgreSQL_13_Open_Items&diff=34802&oldid=34781
Okay, thanks.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Apr 11, 2020 at 6:55 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Fri, Apr 10, 2020 at 9:37 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Fri, Apr 10, 2020 at 8:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
Would you like to send a consolidated patch that includes Euler's
suggestion and Justin's patch (by making changes for points we
discussed.)? I think we can keep the point related to number of
spaces before each field open?
Sure, I'll take care of that tomorrow!
I tried to take into account all that has been discussed, but I have
to admit that I'm absolutely not sure of what was actually decided
here. I went with those changes:
- rename wal_num_fpw to wal_fpw for consistency, both in pgss view
field name but also everywhere in the code
- change comments to consistently mention "full page writes generated"
- changed pgss and explain documentation to mention "full page images
generated", from Justin's patch on another thread
I think it is better to use "full page writes" to be consistent with
other places.
- kept "amount" of WAL bytes
Okay, but I would like to make another change suggested by Justin
which is to replace "count" with "number" at a few places.
I have made the above two changes in the attached. Let me know what
you think about attached?
- no change to the explain output as I have no idea what is the
consensus (one or two spaces, use semicolon or equal, show unit or
not)
Yeah, let's do this separately once we have consensus.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v2-wal_usage_fixup.patch (application/octet-stream)
From dbbb1636403028fc3d66c4a0929239464ff7caf0 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Mon, 13 Apr 2020 11:33:52 +0530
Subject: [PATCH] Cosmetic fixups for WAL usage work.
Reported-by: Justin Pryzby and Euler Taveira
Author: Justin Pryzby and Julien Rouhaud
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql | 2 +-
contrib/pg_stat_statements/pg_stat_statements.c | 6 +++---
doc/src/sgml/pgstatstatements.sgml | 6 +++---
doc/src/sgml/ref/explain.sgml | 4 ++--
src/backend/access/heap/vacuumlazy.c | 2 +-
src/backend/access/transam/xlog.c | 2 +-
src/backend/commands/explain.c | 8 ++++----
src/backend/executor/instrument.c | 4 ++--
src/include/executor/instrument.h | 2 +-
9 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
index 3056657..d0a6c3b 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -43,7 +43,7 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean,
OUT blk_read_time float8,
OUT blk_write_time float8,
OUT wal_records int8,
- OUT wal_num_fpw int8,
+ OUT wal_fpw int8,
OUT wal_bytes numeric
)
RETURNS SETOF record
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 04abdab..90bc6fd 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -189,7 +189,7 @@ typedef struct Counters
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
int64 wal_records; /* # of WAL records generated */
- int64 wal_num_fpw; /* # of WAL full page image records generated */
+ int64 wal_fpw; /* # of WAL full page writes generated */
uint64 wal_bytes; /* total amount of WAL bytes generated */
} Counters;
@@ -1432,7 +1432,7 @@ pgss_store(const char *query, uint64 queryId,
e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
e->counters.wal_records += walusage->wal_records;
- e->counters.wal_num_fpw += walusage->wal_num_fpw;
+ e->counters.wal_fpw += walusage->wal_fpw;
e->counters.wal_bytes += walusage->wal_bytes;
SpinLockRelease(&e->mutex);
@@ -1824,7 +1824,7 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
Datum wal_bytes;
values[i++] = Int64GetDatumFast(tmp.wal_records);
- values[i++] = Int64GetDatumFast(tmp.wal_num_fpw);
+ values[i++] = Int64GetDatumFast(tmp.wal_fpw);
snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 5a962fe..2120fb4 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -278,16 +278,16 @@
<entry><type>bigint</type></entry>
<entry></entry>
<entry>
- Total count of WAL records generated by the statement
+ Total number of WAL records generated by the statement
</entry>
</row>
<row>
- <entry><structfield>wal_num_fpw</structfield></entry>
+ <entry><structfield>wal_fpw</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
<entry>
- Total count of WAL full page writes generated by the statement
+ Total number of WAL full page writes generated by the statement
</entry>
</row>
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 024ede4..aedd70a 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -198,8 +198,8 @@ ROLLBACK;
<listitem>
<para>
Include information on WAL record generation. Specifically, include the
- number of records, number of full page image records and amount of WAL
- bytes generated. In text format, only non-zero values are printed. This
+ number of records, number of full page writes and amount of WAL bytes
+ generated. In text format, only non-zero values are printed. This
parameter may only be used when <literal>ANALYZE</literal> is also
enabled. It defaults to <literal>FALSE</literal>.
</para>
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f3382d3..d8bc06f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -676,7 +676,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
_("WAL usage: %ld records, %ld full page writes, "
UINT64_FORMAT " bytes"),
walusage.wal_records,
- walusage.wal_num_fpw,
+ walusage.wal_fpw,
walusage.wal_bytes);
ereport(LOG,
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c38bc14..11e3273 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1258,7 +1258,7 @@ XLogInsertRecord(XLogRecData *rdata,
{
pgWalUsage.wal_bytes += rechdr->xl_tot_len;
pgWalUsage.wal_records++;
- pgWalUsage.wal_num_fpw += num_fpw;
+ pgWalUsage.wal_fpw += num_fpw;
}
return EndPos;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f3c8da1..7ae6131 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3343,7 +3343,7 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
if (es->format == EXPLAIN_FORMAT_TEXT)
{
/* Show only positive counter values. */
- if ((usage->wal_records > 0) || (usage->wal_num_fpw > 0) ||
+ if ((usage->wal_records > 0) || (usage->wal_fpw > 0) ||
(usage->wal_bytes > 0))
{
ExplainIndentText(es);
@@ -3352,9 +3352,9 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
if (usage->wal_records > 0)
appendStringInfo(es->str, "  records=%ld",
usage->wal_records);
- if (usage->wal_num_fpw > 0)
+ if (usage->wal_fpw > 0)
appendStringInfo(es->str, "  full page writes=%ld",
- usage->wal_num_fpw);
+ usage->wal_fpw);
if (usage->wal_bytes > 0)
appendStringInfo(es->str, "  bytes=" UINT64_FORMAT,
usage->wal_bytes);
@@ -3366,7 +3366,7 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
ExplainPropertyInteger("WAL records", NULL,
usage->wal_records, es);
ExplainPropertyInteger("WAL full page writes", NULL,
- usage->wal_num_fpw, es);
+ usage->wal_fpw, es);
ExplainPropertyUInteger("WAL bytes", NULL,
usage->wal_bytes, es);
}
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 3b9c6ae..7c9d723 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -248,7 +248,7 @@ WalUsageAdd(WalUsage *dst, WalUsage *add)
{
dst->wal_bytes += add->wal_bytes;
dst->wal_records += add->wal_records;
- dst->wal_num_fpw += add->wal_num_fpw;
+ dst->wal_fpw += add->wal_fpw;
}
void
@@ -256,5 +256,5 @@ WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
{
dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
dst->wal_records += add->wal_records - sub->wal_records;
- dst->wal_num_fpw += add->wal_num_fpw - sub->wal_num_fpw;
+ dst->wal_fpw += add->wal_fpw - sub->wal_fpw;
}
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 64439c6..50d672b 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -35,7 +35,7 @@ typedef struct BufferUsage
typedef struct WalUsage
{
long wal_records; /* # of WAL records produced */
- long wal_num_fpw; /* # of WAL full page image writes produced */
+ long wal_fpw; /* # of WAL full page writes produced */
uint64 wal_bytes; /* size of WAL records produced */
} WalUsage;
--
1.8.3.1
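For readers following the rename, the way these counters are typically consumed can be sketched as follows: a backend-wide counter is bumped on every record insert, and a caller takes a snapshot before running a statement and accumulates the difference afterwards. This is a standalone approximation, assuming a simplified count_wal_record() in place of the accounting inside XLogInsertRecord; only the field names and the body of WalUsageAccumDiff are taken from the patch above.

```c
#include <stdint.h>

typedef struct WalUsage
{
	long		wal_records;	/* # of WAL records produced */
	long		wal_fpw;		/* # of WAL full page writes produced */
	uint64_t	wal_bytes;		/* size of WAL records produced */
} WalUsage;

/* Stand-in for the backend-global counter updated on each WAL insert. */
static WalUsage pgWalUsage;

/* dst += add - sub, as in the patched instrument.c */
static void
WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
{
	dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
	dst->wal_records += add->wal_records - sub->wal_records;
	dst->wal_fpw += add->wal_fpw - sub->wal_fpw;
}

/*
 * Hypothetical simplification of the accounting done at record insert:
 * total_len plays the role of rechdr->xl_tot_len.
 */
static void
count_wal_record(uint32_t total_len, int num_fpw)
{
	pgWalUsage.wal_bytes += total_len;
	pgWalUsage.wal_records++;
	pgWalUsage.wal_fpw += num_fpw;
}
```

A caller would copy pgWalUsage into a local "start" variable, execute the statement, and then call WalUsageAccumDiff(&stmt, &pgWalUsage, &start) to obtain per-statement numbers without ever resetting the global counter.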
On Mon, Apr 13, 2020 at 8:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Apr 11, 2020 at 6:55 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Fri, Apr 10, 2020 at 9:37 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
I tried to take into account all that has been discussed, but I have
to admit that I'm absolutely not sure of what was actually decided
here. I went with those changes:
- rename wal_num_fpw to wal_fpw for consistency, both in pgss view
field name but also everywhere in the code
- change comments to consistently mention "full page writes generated"
- changed pgss and explain documentation to mention "full page images
generated", from Justin's patch on another thread
I think it is better to use "full page writes" to be consistent with
other places.
- kept "amount" of WAL bytes
Okay, but I would like to make another change suggested by Justin
which is to replace "count" with "number" at a few places.
Ah sorry I missed this one. +1 it also sounds better.
I have made the above two changes in the attached. Let me know what
you think about attached?
It all looks good to me!
- no change to the explain output as I have no idea what is the
consensus (one or two spaces, use semicolon or equal, show unit or
not)
Yeah, let's do this separately once we have consensus.
Agreed.
On Mon, Apr 13, 2020 at 1:10 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Mon, Apr 13, 2020 at 8:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Apr 11, 2020 at 6:55 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Fri, Apr 10, 2020 at 9:37 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
I tried to take into account all that has been discussed, but I have
to admit that I'm absolutely not sure of what was actually decided
here. I went with those changes:
- rename wal_num_fpw to wal_fpw for consistency, both in pgss view
field name but also everywhere in the code
- change comments to consistently mention "full page writes generated"
- changed pgss and explain documentation to mention "full page images
generated", from Justin's patch on another thread
I think it is better to use "full page writes" to be consistent with
other places.
- kept "amount" of WAL bytes
Okay, but I would like to make another change suggested by Justin
which is to replace "count" with "number" at a few places.
Ah sorry I missed this one. +1 it also sounds better.
I have made the above two changes in the attached. Let me know what
you think about attached?
It all looks good to me!
Pushed.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Apr 13, 2020 at 13:47, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 13, 2020 at 1:10 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Mon, Apr 13, 2020 at 8:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Apr 11, 2020 at 6:55 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Fri, Apr 10, 2020 at 9:37 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
I tried to take into account all that has been discussed, but I have
to admit that I'm absolutely not sure of what was actually decided
here. I went with those changes:
- rename wal_num_fpw to wal_fpw for consistency, both in pgss view
field name but also everywhere in the code
- change comments to consistently mention "full page writes generated"
- changed pgss and explain documentation to mention "full page images
generated", from Justin's patch on another thread
I think it is better to use "full page writes" to be consistent with
other places.
- kept "amount" of WAL bytes
Okay, but I would like to make another change suggested by Justin
which is to replace "count" with "number" at a few places.
Ah sorry I missed this one. +1 it also sounds better.
I have made the above two changes in the attached. Let me know what
you think about attached?
It all looks good to me!
Pushed.
Thanks a lot Amit!
On Wed, Apr 8, 2020 at 8:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Apr 7, 2020 at 3:30 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
We also have existing cases for the other way:
actual time=0.050..0.052
Buffers: shared hit=3 dirtied=1
Buffers case is not the same because 'shared' is used for 'hit',
'read', 'dirtied', etc. However, I think it is arguable.
The cases mentioned by Justin are not formatted in a key=value format,
so it's not quite the same, but it also raises the question why they are
not.
Let's figure out a way to consolidate this without making up a third format.
Sure, I think my intention is to keep the format of WAL stats as close
to Buffers stats as possible because both depict I/O and users would
probably be interested to check/read both together. There is a point
to keep things in a format so that it is easier for someone to parse
but I guess as these are fixed 'words', it shouldn't be difficult
either way and we should give more weight to consistency. Any
suggestions?
Peter E, others, any suggestions on how to move forward? I think here
we should follow the rule "follow the style of nearby code" which in
this case would be to have one space after each field as we would like
it to be closer to the "Buffers" format. It would be good if we have
a unified format among all Explain stuff but we might not want to
change the existing things and even if we want to do that it might be
a broader/bigger change and we should do that as a PG14 change. What
do you think?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2020-04-14 05:57, Amit Kapila wrote:
Peter E, others, any suggestions on how to move forward? I think here
we should follow the rule "follow the style of nearby code" which in
this case would be to have one space after each field as we would like
it to be closer to the "Buffers" format. It would be good if we have
a unified format among all Explain stuff but we might not want to
change the existing things and even if we want to do that it might be
a broader/bigger change and we should do that as a PG14 change. What
do you think?
It looks like shortening to fpw= and using one space is the easiest way
to solve this issue.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Apr 17, 2020 at 6:45 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 2020-04-14 05:57, Amit Kapila wrote:
Peter E, others, any suggestions on how to move forward? I think here
we should follow the rule "follow the style of nearby code" which in
this case would be to have one space after each field as we would like
it to be closer to the "Buffers" format. It would be good if we have
a unified format among all Explain stuff but we might not want to
change the existing things and even if we want to do that it might be
a broader/bigger change and we should do that as a PG14 change. What
do you think?
It looks like shortening to fpw= and using one space is the easiest way
to solve this issue.
I am fine with this approach and will change accordingly. I will wait
for a few days (3-4 days) to see if someone shows up with either an
objection to this or with a better idea for the display of WAL usage
information.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Apr 18, 2020 at 6:16 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Apr 17, 2020 at 6:45 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 2020-04-14 05:57, Amit Kapila wrote:
Peter E, others, any suggestions on how to move forward? I think here
we should follow the rule "follow the style of nearby code" which in
this case would be to have one space after each field as we would like
it to be closer to the "Buffers" format. It would be good if we have
a unified format among all Explain stuff but we might not want to
change the existing things and even if we want to do that it might be
a broader/bigger change and we should do that as a PG14 change. What
do you think?
It looks like shortening to fpw= and using one space is the easiest way
to solve this issue.
I am fine with this approach and will change accordingly. I will wait
for a few days (3-4 days) to see if someone shows up with either an
objection to this or with a better idea for the display of WAL usage
information.
That was also my preferred alternative. PFA a patch for that. I also
changed to "fpw" for the non textual output for consistency.
Attachments:
v1-fix_explain_wal_output.diff (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7ae6131676..9cc1b13b76 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3350,13 +3350,13 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
appendStringInfoString(es->str, "WAL:");
if (usage->wal_records > 0)
- appendStringInfo(es->str, "  records=%ld",
+ appendStringInfo(es->str, " records=%ld",
usage->wal_records);
if (usage->wal_fpw > 0)
- appendStringInfo(es->str, "  full page writes=%ld",
+ appendStringInfo(es->str, " fpw=%ld",
usage->wal_fpw);
if (usage->wal_bytes > 0)
- appendStringInfo(es->str, "  bytes=" UINT64_FORMAT,
+ appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
usage->wal_bytes);
appendStringInfoChar(es->str, '\n');
}
@@ -3365,7 +3365,7 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
{
ExplainPropertyInteger("WAL records", NULL,
usage->wal_records, es);
- ExplainPropertyInteger("WAL full page writes", NULL,
+ ExplainPropertyInteger("WAL fpw", NULL,
usage->wal_fpw, es);
ExplainPropertyUInteger("WAL bytes", NULL,
usage->wal_bytes, es);
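The text-format behavior this patch settles on (a "WAL:" prefix, one space before each key=value field, fpw as the short label, and zero counters omitted) can be illustrated outside the backend with plain snprintf. This is only a sketch under stated assumptions: format_wal_usage() and WAL_USAGE_BUFLEN are hypothetical names, the real code appends to es->str via appendStringInfo and adds a trailing newline, and the fixed 128-byte buffer is assumed large enough for three 64-bit counters.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef struct WalUsage
{
	long		wal_records;	/* # of WAL records produced */
	long		wal_fpw;		/* # of WAL full page writes produced */
	uint64_t	wal_bytes;		/* size of WAL records produced */
} WalUsage;

/* Assumed buffer size; the longest possible line is well under this. */
#define WAL_USAGE_BUFLEN 128

/*
 * Hypothetical helper mimicking the text branch of show_wal_usage():
 * writes nothing when every counter is zero, otherwise "WAL:" followed by
 * one space and key=value for each positive counter.
 */
static void
format_wal_usage(char buf[WAL_USAGE_BUFLEN], const WalUsage *usage)
{
	size_t		len = 0;

	buf[0] = '\0';
	if (usage->wal_records <= 0 && usage->wal_fpw <= 0 &&
		usage->wal_bytes == 0)
		return;

	len += (size_t) snprintf(buf + len, WAL_USAGE_BUFLEN - len, "WAL:");
	if (usage->wal_records > 0)
		len += (size_t) snprintf(buf + len, WAL_USAGE_BUFLEN - len,
								 " records=%ld", usage->wal_records);
	if (usage->wal_fpw > 0)
		len += (size_t) snprintf(buf + len, WAL_USAGE_BUFLEN - len,
								 " fpw=%ld", usage->wal_fpw);
	if (usage->wal_bytes > 0)
		len += (size_t) snprintf(buf + len, WAL_USAGE_BUFLEN - len,
								 " bytes=%" PRIu64, usage->wal_bytes);
}
```

Keeping a single leading space per field is what makes the line read like the existing "Buffers: shared hit=3 dirtied=1" output, which is the consistency argument made earlier in the thread.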
On Sat, Apr 18, 2020 at 05:39:35PM +0200, Julien Rouhaud wrote:
On Sat, Apr 18, 2020 at 6:16 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Apr 17, 2020 at 6:45 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 2020-04-14 05:57, Amit Kapila wrote:
Peter E, others, any suggestions on how to move forward? I think here
we should follow the rule "follow the style of nearby code" which in
this case would be to have one space after each field as we would like
it to be closer to the "Buffers" format. It would be good if we have
a unified format among all Explain stuff but we might not want to
change the existing things and even if we want to do that it might be
a broader/bigger change and we should do that as a PG14 change. What
do you think?
It looks like shortening to fpw= and using one space is the easiest way
to solve this issue.
I am fine with this approach and will change accordingly. I will wait
for a few days (3-4 days) to see if someone shows up with either an
objection to this or with a better idea for the display of WAL usage
information.
That was also my preferred alternative. PFA a patch for that. I also
changed to "fpw" for the non textual output for consistency.
Should we capitalize at least the non-text one? And maybe the text one for
consistency?
+ ExplainPropertyInteger("WAL fpw", NULL,
And add the acronym to the docs:
$ git grep 'full page' '*/explain.sgml'
doc/src/sgml/ref/explain.sgml: number of records, number of full page writes and amount of WAL bytes
"..full page writes (FPW).."
Should we also change vacuumlazy.c for consistency?
+ _("WAL usage: %ld records, %ld full page writes, "
+ UINT64_FORMAT " bytes"),
--
Justin
Hi Justin,
Thanks for the review!
On Sat, Apr 18, 2020 at 10:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
Should we capitalize at least the non-text one? And maybe the text one for
consistency?
+ ExplainPropertyInteger("WAL fpw", NULL,
I think we should keep both versions consistent, whether lower or upper
case. The uppercase version is probably more correct, but it's a
little bit weird to have it being the only upper case label in all
output, so I kept it lower case.
And add the acronym to the docs:
$ git grep 'full page' '*/explain.sgml'
doc/src/sgml/ref/explain.sgml: number of records, number of full page writes and amount of WAL bytes
"..full page writes (FPW).."
Indeed! Fixed (using lowercase to match current output).
Should we also change vacuumlazy.c for consistency?
+ _("WAL usage: %ld records, %ld full page writes, "
+ UINT64_FORMAT " bytes"),
I don't think this one should be changed, vacuumlazy output is already
entirely different, and is way more verbose so keeping it as is makes
sense to me.
Attachments:
v2-fix_explain_wal_output.diff (text/x-patch; charset=US-ASCII)
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index aedd70a6ad..e5b50d3790 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -198,8 +198,8 @@ ROLLBACK;
<listitem>
<para>
Include information on WAL record generation. Specifically, include the
- number of records, number of full page writes and amount of WAL bytes
- generated. In text format, only non-zero values are printed. This
+ number of records, number of full page writes (fpw) and amount of WAL
+ bytes generated. In text format, only non-zero values are printed. This
parameter may only be used when <literal>ANALYZE</literal> is also
enabled. It defaults to <literal>FALSE</literal>.
</para>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7ae6131676..9cc1b13b76 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3350,13 +3350,13 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
appendStringInfoString(es->str, "WAL:");
if (usage->wal_records > 0)
- appendStringInfo(es->str, "  records=%ld",
+ appendStringInfo(es->str, " records=%ld",
usage->wal_records);
if (usage->wal_fpw > 0)
- appendStringInfo(es->str, "  full page writes=%ld",
+ appendStringInfo(es->str, " fpw=%ld",
usage->wal_fpw);
if (usage->wal_bytes > 0)
- appendStringInfo(es->str, "  bytes=" UINT64_FORMAT,
+ appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
usage->wal_bytes);
appendStringInfoChar(es->str, '\n');
}
@@ -3365,7 +3365,7 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
{
ExplainPropertyInteger("WAL records", NULL,
usage->wal_records, es);
- ExplainPropertyInteger("WAL full page writes", NULL,
+ ExplainPropertyInteger("WAL fpw", NULL,
usage->wal_fpw, es);
ExplainPropertyUInteger("WAL bytes", NULL,
usage->wal_bytes, es);
At Sun, 19 Apr 2020 16:22:26 +0200, Julien Rouhaud <rjuju123@gmail.com> wrote in
Hi Justin,
Thanks for the review!
On Sat, Apr 18, 2020 at 10:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
Should we capitalize at least the non-text one? And maybe the text one for
consistency?
+ ExplainPropertyInteger("WAL fpw", NULL,
I think we should keep both versions consistent, whether lower or upper
case. The uppercase version is probably more correct, but it's a
little bit weird to have it being the only upper case label in all
output, so I kept it lower case.
One space followed by an acronym looks perfect. I'd prefer capital
letters but lowercase letters also work well.
And add the acronym to the docs:
$ git grep 'full page' '*/explain.sgml'
doc/src/sgml/ref/explain.sgml: number of records, number of full page writes and amount of WAL bytes
"..full page writes (FPW).."
Indeed! Fixed (using lowercase to match current output).
I searched through the documentation and AFAICS most occurrences of
"full page" are followed by "image" and full_page_writes is used only
as the parameter name.
I'm fine with fpw as the acronym, but "fpw means the number of full
page images" looks odd..
Should we also change vacuumlazy.c for consistency?
+ _("WAL usage: %ld records, %ld full page writes, "
+ UINT64_FORMAT " bytes"),
I don't think this one should be changed, vacuumlazy output is already
entirely different, and is way more verbose so keeping it as is makes
sense to me.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Mon, Apr 20, 2020 at 1:17 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Sun, 19 Apr 2020 16:22:26 +0200, Julien Rouhaud <rjuju123@gmail.com> wrote in
Hi Justin,
Thanks for the review!
On Sat, Apr 18, 2020 at 10:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
Should we capitalize at least the non-text one? And maybe the text one for
consistency?
+ ExplainPropertyInteger("WAL fpw", NULL,
I think we should keep both versions consistent, whether lower or upper
case. The uppercase version is probably more correct, but it's a
little bit weird to have it being the only upper case label in all
output, so I kept it lower case.
I think we can keep upper-case for all non-text ones in case of WAL
usage, something like WAL Records, WAL FPW, WAL Bytes. The buffer
usage seems to be following a similar convention.
One space followed by an acronym looks perfect. I'd prefer capital
letters but lowercase letters also work well.
And add the acronym to the docs:
$ git grep 'full page' '*/explain.sgml'
doc/src/sgml/ref/explain.sgml: number of records, number of full page writes and amount of WAL bytes
"..full page writes (FPW).."
Indeed! Fixed (using lowercase to match current output).
I searched through the documentation and AFAICS most occurrences of
"full page" are followed by "image" and full_page_writes is used only
as the parameter name.
I'm fine with fpw as the acronym, but "fpw means the number of full
page images" looks odd..
I don't understand this. Where are we using such a description of fpw?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 22, 2020 at 09:15:08AM +0530, Amit Kapila wrote:
And add the acronym to the docs:
$ git grep 'full page' '*/explain.sgml'
doc/src/sgml/ref/explain.sgml: number of records, number of full page writes and amount of WAL bytes
"..full page writes (FPW).."
Indeed! Fixed (using lowercase to match current output).
I searched through the documentation and AFAICS most occurrences of
"full page" are followed by "image" and full_page_writes is used only
as the parameter name.
I'm fine with fpw as the acronym, but "fpw means the number of full
page images" looks odd..
I don't understand this. Where are we using such a description of fpw?
I suggested adding " (FPW)" to the new docs for "explain(wal)".
But, the documentation before this commit mostly refers to "full page images".
So the implication is that maybe we should use that language (and FPI acronym).
The only pre-existing use of "full page writes" seems to be here:
$ git grep -iC2 'full page write' origin doc
origin:doc/src/sgml/wal.sgml- Internal data structures such as <filename>pg_xact</filename>, <filename>pg_subtrans</filename>, <filename>pg_multixact</filename>,
origin:doc/src/sgml/wal.sgml- <filename>pg_serial</filename>, <filename>pg_notify</filename>, <filename>pg_stat</filename>, <filename>pg_snapshots</filename> are not directly
origin:doc/src/sgml/wal.sgml: checksummed, nor are pages protected by full page writes. However, where
And we're not using either acronym.
--
Justin
On Wed, Apr 22, 2020 at 9:25 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Wed, Apr 22, 2020 at 09:15:08AM +0530, Amit Kapila wrote:
And add the acronym to the docs:
$ git grep 'full page' '*/explain.sgml'
doc/src/sgml/ref/explain.sgml: number of records, number of full page writes and amount of WAL bytes
"..full page writes (FPW).."
Indeed! Fixed (using lowercase to match current output).
I searched through the documentation and AFAICS most occurrences of
"full page" are followed by "image" and full_page_writes is used only
as the parameter name.
I'm fine with fpw as the acronym, but "fpw means the number of full
page images" looks odd..
I don't understand this. Where are we using such a description of fpw?
I suggested adding " (FPW)" to the new docs for "explain(wal)".
But, the documentation before this commit mostly refers to "full page images".
So the implication is that maybe we should use that language (and FPI acronym).
I am not sure if it matters that much. I think we can use "full page
writes (FPW)" in this case but we should be consistent wherever we
refer to it in the WAL usage context, and I think we already are; if not,
then let's be consistent.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 22, 2020 at 9:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 20, 2020 at 1:17 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Sun, 19 Apr 2020 16:22:26 +0200, Julien Rouhaud <rjuju123@gmail.com> wrote in
Hi Justin,
Thanks for the review!
On Sat, Apr 18, 2020 at 10:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
Should capitalize at least the non-text one ? And maybe the text one for
consistency ?
+ ExplainPropertyInteger("WAL fpw", NULL,
I think we should keep both version consistent, whether lower or upper
case. The uppercase version is probably more correct, but it's a
little bit weird to have it being the only upper case label in all
output, so I kept it lower case.
I think we can keep upper-case for all non-text ones in case of WAL
usage, something like WAL Records, WAL FPW, WAL Bytes. The buffer
usage seems to be following a similar convention.
The attached patch changed the non-text display format as mentioned.
Let me know if you have any comments?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v3-fix_explain_wal_output.patch (application/octet-stream)
From d4d9d2dee145cafccc286093d0c3c8e5c68622e7 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Thu, 23 Apr 2020 10:43:04 +0530
Subject: [PATCH] Change the display of WAL usage statistics in Explain.
In commit 33e05f89c5, we have added the option to display WAL usage
statistics in Explain and auto_explain. The display format used two spaces
between each field which is inconsistent with Buffer usage statistics which
is using one space between each field. Change the format to make WAL usage
statistics consistent with Buffer usage statistics.
Author: Julien Rouhaud, Amit Kapila
Reviewed-by: Justin Pryzby, Kyotaro Horiguchi and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
doc/src/sgml/ref/explain.sgml | 4 ++--
src/backend/commands/explain.c | 12 ++++++------
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index aedd70a..e5b50d3 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -198,8 +198,8 @@ ROLLBACK;
<listitem>
<para>
Include information on WAL record generation. Specifically, include the
- number of records, number of full page writes and amount of WAL bytes
- generated. In text format, only non-zero values are printed. This
+ number of records, number of full page writes (fpw) and amount of WAL
+ bytes generated. In text format, only non-zero values are printed. This
parameter may only be used when <literal>ANALYZE</literal> is also
enabled. It defaults to <literal>FALSE</literal>.
</para>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7ae6131..cfd0df3 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3350,24 +3350,24 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
appendStringInfoString(es->str, "WAL:");
if (usage->wal_records > 0)
- appendStringInfo(es->str, "  records=%ld",
+ appendStringInfo(es->str, " records=%ld",
usage->wal_records);
if (usage->wal_fpw > 0)
- appendStringInfo(es->str, "  full page writes=%ld",
+ appendStringInfo(es->str, " fpw=%ld",
usage->wal_fpw);
if (usage->wal_bytes > 0)
- appendStringInfo(es->str, "  bytes=" UINT64_FORMAT,
+ appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
usage->wal_bytes);
appendStringInfoChar(es->str, '\n');
}
}
else
{
- ExplainPropertyInteger("WAL records", NULL,
+ ExplainPropertyInteger("WAL Records", NULL,
usage->wal_records, es);
- ExplainPropertyInteger("WAL full page writes", NULL,
+ ExplainPropertyInteger("WAL FPW", NULL,
usage->wal_fpw, es);
- ExplainPropertyUInteger("WAL bytes", NULL,
+ ExplainPropertyUInteger("WAL Bytes", NULL,
usage->wal_bytes, es);
}
}
--
1.8.3.1
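As a standalone illustration of the text-format behavior in the patch above (a counter is printed only when it is positive, with a single space between fields), here is a minimal C sketch. It does not use the server's StringInfo API; WalUsageSketch, append_field and format_wal_usage_text are hypothetical names invented for this example, and the "fpw" label follows this version of the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the WalUsage counters discussed in the thread. */
typedef struct WalUsageSketch
{
	long		wal_records;
	long		wal_fpw;
	uint64_t	wal_bytes;
} WalUsageSketch;

/* Append formatted text to a NUL-terminated buffer, truncating if it is full. */
static void
append_field(char *buf, size_t buflen, const char *fmt, ...)
{
	size_t		used = strlen(buf);
	va_list		ap;

	if (used + 1 >= buflen)
		return;
	va_start(ap, fmt);
	vsnprintf(buf + used, buflen - used, fmt, ap);
	va_end(ap);
}

/*
 * Build the text-format line. Returns false and leaves buf empty when
 * every counter is zero, mirroring "only non-zero values are printed".
 */
bool
format_wal_usage_text(char *buf, size_t buflen, const WalUsageSketch *usage)
{
	assert(buf != NULL && buflen > 0 && usage != NULL);
	buf[0] = '\0';

	if (usage->wal_records <= 0 && usage->wal_fpw <= 0 &&
		usage->wal_bytes == 0)
		return false;

	append_field(buf, buflen, "WAL:");
	if (usage->wal_records > 0)
		append_field(buf, buflen, " records=%ld", usage->wal_records);
	if (usage->wal_fpw > 0)
		append_field(buf, buflen, " fpw=%ld", usage->wal_fpw);
	if (usage->wal_bytes > 0)
		append_field(buf, buflen, " bytes=%" PRIu64, usage->wal_bytes);
	append_field(buf, buflen, "\n");
	return true;
}
```

Each field carries its own single leading space, so skipping a zero counter never leaves a double space behind, which is the property the patch is after.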
On Wed, Apr 22, 2020 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Apr 22, 2020 at 9:25 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Wed, Apr 22, 2020 at 09:15:08AM +0530, Amit Kapila wrote:
And add the acronym to the docs:
$ git grep 'full page' '*/explain.sgml'
doc/src/sgml/ref/explain.sgml: number of records, number of full page writes and amount of WAL bytes
"..full page writes (FPW).."
Indeed! Fixed (using lowercase to match current output).
I searched through the documentation and AFAICS most occurrences of
"full page" are followed by "image" and full_page_writes is used only
as the parameter name.
I'm fine with fpw as the acronym, but "fpw means the number of full
page images" looks odd..
I don't understand this. Where are we using such a description of fpw?
I suggested to add " (FPW)" to the new docs for "explain(wal)"
But, the documentation before this commit mostly refers to "full page images".
So the implication is that maybe we should use that language (and FPI acronym).
I am not sure if it matters that much. I think we can use "full page
writes (FPW)" in this case but we should be consistent wherever we
refer it in the WAL usage context and I think we already are, if not
then let's be consistent.
I agree that full page writes can be used in this case, but I'm
wondering if that can be misleading for some reader which might e.g.
confuse with the full_page_writes GUC. And as Justin pointed out, the
documentation for now usually mentions "full page image(s)" in such
cases.
On Thu, Apr 23, 2020 at 7:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Apr 22, 2020 at 9:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 20, 2020 at 1:17 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Sun, 19 Apr 2020 16:22:26 +0200, Julien Rouhaud <rjuju123@gmail.com> wrote in
Hi Justin,
Thanks for the review!
On Sat, Apr 18, 2020 at 10:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
Should capitalize at least the non-text one ? And maybe the text one for
consistency ?
+ ExplainPropertyInteger("WAL fpw", NULL,
I think we should keep both version consistent, whether lower or upper
case. The uppercase version is probably more correct, but it's a
little bit weird to have it being the only upper case label in all
output, so I kept it lower case.
I think we can keep upper-case for all non-text ones in case of WAL
usage, something like WAL Records, WAL FPW, WAL Bytes. The buffer
usage seems to be following a similar convention.
The attached patch changed the non-text display format as mentioned.
Let me know if you have any comments?
Assuming that we're fine using full page write(s) / FPW rather than
full page image(s) / FPI (see previous mail), I'm fine with this
patch.
At Thu, 23 Apr 2020 07:33:13 +0200, Julien Rouhaud <rjuju123@gmail.com> wrote in
I think we should keep both version consistent, whether lower or upper
case. The uppercase version is probably more correct, but it's a
little bit weird to have it being the only upper case label in all
output, so I kept it lower case.
I think we can keep upper-case for all non-text ones in case of WAL
usage, something like WAL Records, WAL FPW, WAL Bytes. The buffer
usage seems to be following a similar convention.
The attached patch changed the non-text display format as mentioned.
Let me know if you have any comments?
Assuming that we're fine using full page write(s) / FPW rather than
full page image(s) / FPI (see previous mail), I'm fine with this
patch.
FWIW, I like FPW, and the patch looks good to me. The index in the
documentation has the entry for full_page_writes (having underscores)
and it would work.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2020-04-23 07:31, Julien Rouhaud wrote:
I agree that full page writes can be used in this case, but I'm
wondering if that can be misleading for some reader which might e.g.
confuse with the full_page_writes GUC. And as Justin pointed out, the
documentation for now usually mentions "full page image(s)" in such
cases.
ISTM that in the context of this patch, "full-page image" is correct. A
"full-page write" is what you do to a table or index page when you are
recovering a full-page image. The internal symbol for the WAL record is
XLOG_FPI and xlogdesc.c prints it as "FPI".
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Apr 23, 2020 at 12:16 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 2020-04-23 07:31, Julien Rouhaud wrote:
I agree that full page writes can be used in this case, but I'm
wondering if that can be misleading for some reader which might e.g.
confuse with the full_page_writes GUC. And as Justin pointed out, the
documentation for now usually mentions "full page image(s)" in such
cases.
ISTM that in the context of this patch, "full-page image" is correct. A
"full-page write" is what you do to a table or index page when you are
recovering a full-page image.
So what do we call it when we log the page after it is touched after a
checkpoint? I thought we call that a full-page write.
The internal symbol for the WAL record is
XLOG_FPI and xlogdesc.c prints it as "FPI".
That is just one way/reason we log the page. There are others as
well. I thought here we are computing the number of full-page writes
happened in the system due to various reasons like (a) a page is
operated upon first time after the checkpoint, (b) log the XLOG_FPI
record, (c) Guc for WAL consistency checker is on, etc. If we see in
XLogRecordAssemble where we decide to log this information, there is a
comment " .... log a full-page write for the current block." and there
was an existing variable with 'fpw_lsn' which indicates to an extent
that what we are computing in this patch is full-page writes. But
there is a reference to full-page image as well. I think as
full_page_writes is an exposed variable that is well understood so
exposing information with similar name via this patch doesn't sound
illogical to me. Whatever we use here we need to be consistent all
throughout, even pg_stat_statements need to name exposed variable as
wal_fpi instead of wal_fpw.
To me, full-page writes sound more appealing with other WAL usage
variables like records and bytes. I might be more used to this term as
'fpw' that is why it occurred better to me. OTOH, if most of us think
that a full-page image is better suited here, I am fine with changing
it at all places.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 23, 2020 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Apr 23, 2020 at 12:16 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
The internal symbol for the WAL record is
XLOG_FPI and xlogdesc.c prints it as "FPI".
That is just one way/reason we log the page. There are others as
well. I thought here we are computing the number of full-page writes
happened in the system due to various reasons like (a) a page is
operated upon first time after the checkpoint, (b) log the XLOG_FPI
record, (c) Guc for WAL consistency checker is on, etc. If we see in
XLogRecordAssemble where we decide to log this information, there is a
comment " .... log a full-page write for the current block." and there
was an existing variable with 'fpw_lsn' which indicates to an extent
that what we are computing in this patch is full-page writes. But
there is a reference to full-page image as well. I think as
full_page_writes is an exposed variable that is well understood so
exposing information with similar name via this patch doesn't sound
illogical to me. Whatever we use here we need to be consistent all
throughout, even pg_stat_statements need to name exposed variable as
wal_fpi instead of wal_fpw.
To me, full-page writes sound more appealing with other WAL usage
variables like records and bytes. I might be more used to this term as
'fpw' that is why it occurred better to me. OTOH, if most of us think
that a full-page image is better suited here, I am fine with changing
it at all places.
Julien, Peter, others do you have any opinion here? I think it is
better if we decide on one of FPW or FPI and make the changes at all
places for this patch.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Apr 27, 2020 at 08:35:51AM +0530, Amit Kapila wrote:
On Thu, Apr 23, 2020 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Apr 23, 2020 at 12:16 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
The internal symbol for the WAL record is
XLOG_FPI and xlogdesc.c prints it as "FPI".
Julien, Peter, others do you have any opinion here? I think it is
better if we decide on one of FPW or FPI and make the changes at all
places for this patch.
It seems to me that Peter is right here. A full-page write is the
action to write a full-page image, so if you consider only a way to
define the static data of a full-page and/or a quantity associated to
it, we should talk about full-page images.
--
Michael
On Mon, Apr 27, 2020 at 8:12 AM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Apr 27, 2020 at 08:35:51AM +0530, Amit Kapila wrote:
On Thu, Apr 23, 2020 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Apr 23, 2020 at 12:16 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
The internal symbol for the WAL record is
XLOG_FPI and xlogdesc.c prints it as "FPI".
Julien, Peter, others do you have any opinion here? I think it is
better if we decide on one of FPW or FPI and make the changes at all
places for this patch.
It seems to me that Peter is right here. A full-page write is the
action to write a full-page image, so if you consider only a way to
define the static data of a full-page and/or a quantity associated to
it, we should talk about full-page images.
I agree with that definition. I can send a cleanup patch if there's
no objection.
On Mon, Apr 27, 2020 at 1:22 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Mon, Apr 27, 2020 at 8:12 AM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Apr 27, 2020 at 08:35:51AM +0530, Amit Kapila wrote:
On Thu, Apr 23, 2020 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Apr 23, 2020 at 12:16 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
The internal symbol for the WAL record is
XLOG_FPI and xlogdesc.c prints it as "FPI".
Julien, Peter, others do you have any opinion here? I think it is
better if we decide on one of FPW or FPI and make the changes at all
places for this patch.
It seems to me that Peter is right here. A full-page write is the
action to write a full-page image, so if you consider only a way to
define the static data of a full-page and/or a quantity associated to
it, we should talk about full-page images.
Fair enough, if more people want full-page image terminology in this
context then we can do that.
I agree with that definition. I can send a cleanup patch if there's
no objection.
Okay, feel free to send the patch. Thanks for taking the initiative
to write a patch for this.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 28, 2020 at 7:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 27, 2020 at 1:22 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
I agree with that definition. I can send a cleanup patch if there's
no objection.
Okay, feel free to send the patch. Thanks for taking the initiative
to write a patch for this.
Julien, are you planning to write a cleanup patch for this open item?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 30, 2020 at 5:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Apr 28, 2020 at 7:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 27, 2020 at 1:22 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
I agree with that definition. I can send a cleanup patch if there's
no objection.
Okay, feel free to send the patch. Thanks for taking the initiative
to write a patch for this.
Julien, are you planning to write a cleanup patch for this open item?
Sorry Amit, I've been quite busy at work for the last couple of days.
I'll take care of that this morning for sure!
On Thu, Apr 30, 2020 at 9:18 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Thu, Apr 30, 2020 at 5:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Apr 28, 2020 at 7:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 27, 2020 at 1:22 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
I agree with that definition. I can send a cleanup patch if there's
no objection.
Okay, feel free to send the patch. Thanks for taking the initiative
to write a patch for this.
Julien, are you planning to write a cleanup patch for this open item?
Sorry Amit, I've been quite busy at work for the last couple of days.
I'll take care of that this morning for sure!
Here's the patch. I included the content of
v3-fix_explain_wal_output.patch you provided before, and tried to
consistently replace full page writes/fpw with full page images/fpi
everywhere on top of it (so documentation, command output, variable
names and comments).
Attachments:
v4-fix_wal_usage.diff (application/octet-stream)
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
index d0a6c3b4fc..0f63f08f7e 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -43,7 +43,7 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean,
OUT blk_read_time float8,
OUT blk_write_time float8,
OUT wal_records int8,
- OUT wal_fpw int8,
+ OUT wal_fpi int8,
OUT wal_bytes numeric
)
RETURNS SETOF record
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 90bc6fd013..8b7983301d 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -189,7 +189,7 @@ typedef struct Counters
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
int64 wal_records; /* # of WAL records generated */
- int64 wal_fpw; /* # of WAL full page writes generated */
+ int64 wal_fpi; /* # of WAL full page images generated */
uint64 wal_bytes; /* total amount of WAL bytes generated */
} Counters;
@@ -1432,7 +1432,7 @@ pgss_store(const char *query, uint64 queryId,
e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
e->counters.wal_records += walusage->wal_records;
- e->counters.wal_fpw += walusage->wal_fpw;
+ e->counters.wal_fpi += walusage->wal_fpi;
e->counters.wal_bytes += walusage->wal_bytes;
SpinLockRelease(&e->mutex);
@@ -1824,7 +1824,7 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
Datum wal_bytes;
values[i++] = Int64GetDatumFast(tmp.wal_records);
- values[i++] = Int64GetDatumFast(tmp.wal_fpw);
+ values[i++] = Int64GetDatumFast(tmp.wal_fpi);
snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 2120fb4c3f..b1d5f3d1dc 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -283,11 +283,11 @@
</row>
<row>
- <entry><structfield>wal_fpw</structfield></entry>
+ <entry><structfield>wal_fpi</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
<entry>
- Total number of WAL full page writes generated by the statement
+ Total number of WAL full page images generated by the statement
</entry>
</row>
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index aedd70a6ad..c6f333c3c9 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -198,8 +198,8 @@ ROLLBACK;
<listitem>
<para>
Include information on WAL record generation. Specifically, include the
- number of records, number of full page writes and amount of WAL bytes
- generated. In text format, only non-zero values are printed. This
+ number of records, number of full page images (fpi) and amount of WAL
+ bytes generated. In text format, only non-zero values are printed. This
parameter may only be used when <literal>ANALYZE</literal> is also
enabled. It defaults to <literal>FALSE</literal>.
</para>
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3c18db29f1..3bef0e124b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -673,10 +673,10 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
read_rate, write_rate);
appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0));
appendStringInfo(&buf,
- _("WAL usage: %ld records, %ld full page writes, "
+ _("WAL usage: %ld records, %ld full page images, "
UINT64_FORMAT " bytes"),
walusage.wal_records,
- walusage.wal_fpw,
+ walusage.wal_fpi,
walusage.wal_bytes);
ereport(LOG,
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 065eb275b1..0d3d670928 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -998,7 +998,7 @@ XLogRecPtr
XLogInsertRecord(XLogRecData *rdata,
XLogRecPtr fpw_lsn,
uint8 flags,
- int num_fpw)
+ int num_fpi)
{
XLogCtlInsert *Insert = &XLogCtl->Insert;
pg_crc32c rdata_crc;
@@ -1259,7 +1259,7 @@ XLogInsertRecord(XLogRecData *rdata,
{
pgWalUsage.wal_bytes += rechdr->xl_tot_len;
pgWalUsage.wal_records++;
- pgWalUsage.wal_fpw += num_fpw;
+ pgWalUsage.wal_fpi += num_fpi;
}
return EndPos;
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 4259309dba..b21679f09e 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -109,7 +109,7 @@ static MemoryContext xloginsert_cxt;
static XLogRecData *XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
- XLogRecPtr *fpw_lsn, int *num_fpw);
+ XLogRecPtr *fpw_lsn, int *num_fpi);
static bool XLogCompressBackupBlock(char *page, uint16 hole_offset,
uint16 hole_length, char *dest, uint16 *dlen);
@@ -449,7 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info)
bool doPageWrites;
XLogRecPtr fpw_lsn;
XLogRecData *rdt;
- int num_fpw = 0;
+ int num_fpi = 0;
/*
* Get values needed to decide whether to do full-page writes. Since
@@ -459,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info)
GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);
rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
- &fpw_lsn, &num_fpw);
+ &fpw_lsn, &num_fpi);
- EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw);
+ EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpi);
} while (EndPos == InvalidXLogRecPtr);
XLogResetInsertion();
@@ -484,7 +484,7 @@ XLogInsert(RmgrId rmid, uint8 info)
static XLogRecData *
XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
- XLogRecPtr *fpw_lsn, int *num_fpw)
+ XLogRecPtr *fpw_lsn, int *num_fpi)
{
XLogRecData *rdt;
uint32 total_len = 0;
@@ -638,7 +638,7 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
/* Report a full page image constructed for the WAL record */
- *num_fpw += 1;
+ *num_fpi += 1;
/*
* Construct XLogRecData entries for the page content.
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7ae6131676..09256cef19 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3343,20 +3343,20 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
if (es->format == EXPLAIN_FORMAT_TEXT)
{
/* Show only positive counter values. */
- if ((usage->wal_records > 0) || (usage->wal_fpw > 0) ||
+ if ((usage->wal_records > 0) || (usage->wal_fpi > 0) ||
(usage->wal_bytes > 0))
{
ExplainIndentText(es);
appendStringInfoString(es->str, "WAL:");
if (usage->wal_records > 0)
- appendStringInfo(es->str, "  records=%ld",
+ appendStringInfo(es->str, " records=%ld",
usage->wal_records);
- if (usage->wal_fpw > 0)
- appendStringInfo(es->str, "  full page writes=%ld",
- usage->wal_fpw);
+ if (usage->wal_fpi > 0)
+ appendStringInfo(es->str, " fpi=%ld",
+ usage->wal_fpi);
if (usage->wal_bytes > 0)
- appendStringInfo(es->str, "  bytes=" UINT64_FORMAT,
+ appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
usage->wal_bytes);
appendStringInfoChar(es->str, '\n');
}
@@ -3365,8 +3365,8 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
{
ExplainPropertyInteger("WAL records", NULL,
usage->wal_records, es);
- ExplainPropertyInteger("WAL full page writes", NULL,
- usage->wal_fpw, es);
+ ExplainPropertyInteger("WAL FPI", NULL,
+ usage->wal_fpi, es);
ExplainPropertyUInteger("WAL bytes", NULL,
usage->wal_bytes, es);
}
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 7c9d723552..fbedb5aaf6 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -248,7 +248,7 @@ WalUsageAdd(WalUsage *dst, WalUsage *add)
{
dst->wal_bytes += add->wal_bytes;
dst->wal_records += add->wal_records;
- dst->wal_fpw += add->wal_fpw;
+ dst->wal_fpi += add->wal_fpi;
}
void
@@ -256,5 +256,5 @@ WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
{
dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
dst->wal_records += add->wal_records - sub->wal_records;
- dst->wal_fpw += add->wal_fpw - sub->wal_fpw;
+ dst->wal_fpi += add->wal_fpi - sub->wal_fpi;
}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 0a12afb59e..e917dfe92d 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -280,7 +280,7 @@ struct XLogRecData;
extern XLogRecPtr XLogInsertRecord(struct XLogRecData *rdata,
XLogRecPtr fpw_lsn,
uint8 flags,
- int num_fpw);
+ int num_fpi);
extern void XLogFlush(XLogRecPtr RecPtr);
extern bool XLogBackgroundFlush(void);
extern bool XLogNeedsFlush(XLogRecPtr RecPtr);
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 50d672b270..464172e696 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -35,7 +35,7 @@ typedef struct BufferUsage
typedef struct WalUsage
{
long wal_records; /* # of WAL records produced */
- long wal_fpw; /* # of WAL full page writes produced */
+ long wal_fpi; /* # of WAL full page images produced */
uint64 wal_bytes; /* size of WAL records produced */
} WalUsage;
On Thu, Apr 30, 2020 at 2:19 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Thu, Apr 30, 2020 at 9:18 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Thu, Apr 30, 2020 at 5:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
Julien, are you planning to write a cleanup patch for this open item?
Sorry Amit, I've been quite busy at work for the last couple of days.
I'll take care of that this morning for sure!
Here's the patch.
Thanks for the patch. I will look into it early next week.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 30, 2020 at 2:19 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Here's the patch. I included the content of
v3-fix_explain_wal_output.patch you provided before, and tried to
consistently replace full page writes/fpw with full page images/fpi
everywhere on top of it (so documentation, command output, variable
names and comments).
Your patch looks mostly good to me. I have made slight modifications
which include changing the non-text format in show_wal_usage to use a
capital letter for the second word, which makes it similar to Buffer
usage stats, and additionally, ran pgindent.
Let me know what you think of the attached.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v5-fix_wal_usage.patch (application/octet-stream)
From 812952e8a330164800bf6368b4eb1a5960dcca2f Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Mon, 4 May 2020 09:30:13 +0530
Subject: [PATCH] Change the display of WAL usage statistics in Explain.
In commit 33e05f89c5, we have added the option to display WAL usage
statistics in Explain and auto_explain. The display format used two spaces
between each field which is inconsistent with Buffer usage statistics which
is using one space between each field. Change the format to make WAL usage
statistics consistent with Buffer usage statistics.
This commit also changed the usage of "full page writes" to
"full page images" for WAL usage statistics to make it consistent with
other parts of code and docs.
Author: Julien Rouhaud, Amit Kapila
Reviewed-by: Justin Pryzby, Kyotaro Horiguchi and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
---
.../pg_stat_statements--1.7--1.8.sql | 2 +-
contrib/pg_stat_statements/pg_stat_statements.c | 6 +++---
doc/src/sgml/pgstatstatements.sgml | 4 ++--
doc/src/sgml/ref/explain.sgml | 4 ++--
src/backend/access/heap/vacuumlazy.c | 4 ++--
src/backend/access/transam/xlog.c | 4 ++--
src/backend/access/transam/xloginsert.c | 12 ++++++------
src/backend/commands/explain.c | 20 ++++++++++----------
src/backend/executor/instrument.c | 4 ++--
src/include/access/xlog.h | 2 +-
src/include/executor/instrument.h | 2 +-
11 files changed, 32 insertions(+), 32 deletions(-)
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
index d0a6c3b..0f63f08 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -43,7 +43,7 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean,
OUT blk_read_time float8,
OUT blk_write_time float8,
OUT wal_records int8,
- OUT wal_fpw int8,
+ OUT wal_fpi int8,
OUT wal_bytes numeric
)
RETURNS SETOF record
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 90bc6fd..4ce25fb 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -189,7 +189,7 @@ typedef struct Counters
double blk_write_time; /* time spent writing, in msec */
double usage; /* usage factor */
int64 wal_records; /* # of WAL records generated */
- int64 wal_fpw; /* # of WAL full page writes generated */
+ int64 wal_fpi; /* # of WAL full page images generated */
uint64 wal_bytes; /* total amount of WAL bytes generated */
} Counters;
@@ -1432,7 +1432,7 @@ pgss_store(const char *query, uint64 queryId,
e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
e->counters.usage += USAGE_EXEC(total_time);
e->counters.wal_records += walusage->wal_records;
- e->counters.wal_fpw += walusage->wal_fpw;
+ e->counters.wal_fpi += walusage->wal_fpi;
e->counters.wal_bytes += walusage->wal_bytes;
SpinLockRelease(&e->mutex);
@@ -1824,7 +1824,7 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
Datum wal_bytes;
values[i++] = Int64GetDatumFast(tmp.wal_records);
- values[i++] = Int64GetDatumFast(tmp.wal_fpw);
+ values[i++] = Int64GetDatumFast(tmp.wal_fpi);
snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 2120fb4..b1d5f3d 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -283,11 +283,11 @@
</row>
<row>
- <entry><structfield>wal_fpw</structfield></entry>
+ <entry><structfield>wal_fpi</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
<entry>
- Total number of WAL full page writes generated by the statement
+ Total number of WAL full page images generated by the statement
</entry>
</row>
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index aedd70a..c6f333c 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -198,8 +198,8 @@ ROLLBACK;
<listitem>
<para>
Include information on WAL record generation. Specifically, include the
- number of records, number of full page writes and amount of WAL bytes
- generated. In text format, only non-zero values are printed. This
+ number of records, number of full page images (fpi) and amount of WAL
+ bytes generated. In text format, only non-zero values are printed. This
parameter may only be used when <literal>ANALYZE</literal> is also
enabled. It defaults to <literal>FALSE</literal>.
</para>
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3c18db2..3bef0e1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -673,10 +673,10 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
read_rate, write_rate);
appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0));
appendStringInfo(&buf,
- _("WAL usage: %ld records, %ld full page writes, "
+ _("WAL usage: %ld records, %ld full page images, "
UINT64_FORMAT " bytes"),
walusage.wal_records,
- walusage.wal_fpw,
+ walusage.wal_fpi,
walusage.wal_bytes);
ereport(LOG,
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 065eb27..0d3d670 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -998,7 +998,7 @@ XLogRecPtr
XLogInsertRecord(XLogRecData *rdata,
XLogRecPtr fpw_lsn,
uint8 flags,
- int num_fpw)
+ int num_fpi)
{
XLogCtlInsert *Insert = &XLogCtl->Insert;
pg_crc32c rdata_crc;
@@ -1259,7 +1259,7 @@ XLogInsertRecord(XLogRecData *rdata,
{
pgWalUsage.wal_bytes += rechdr->xl_tot_len;
pgWalUsage.wal_records++;
- pgWalUsage.wal_fpw += num_fpw;
+ pgWalUsage.wal_fpi += num_fpi;
}
return EndPos;
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 4259309..b21679f 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -109,7 +109,7 @@ static MemoryContext xloginsert_cxt;
static XLogRecData *XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
- XLogRecPtr *fpw_lsn, int *num_fpw);
+ XLogRecPtr *fpw_lsn, int *num_fpi);
static bool XLogCompressBackupBlock(char *page, uint16 hole_offset,
uint16 hole_length, char *dest, uint16 *dlen);
@@ -449,7 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info)
bool doPageWrites;
XLogRecPtr fpw_lsn;
XLogRecData *rdt;
- int num_fpw = 0;
+ int num_fpi = 0;
/*
* Get values needed to decide whether to do full-page writes. Since
@@ -459,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info)
GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);
rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
- &fpw_lsn, &num_fpw);
+ &fpw_lsn, &num_fpi);
- EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw);
+ EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpi);
} while (EndPos == InvalidXLogRecPtr);
XLogResetInsertion();
@@ -484,7 +484,7 @@ XLogInsert(RmgrId rmid, uint8 info)
static XLogRecData *
XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
- XLogRecPtr *fpw_lsn, int *num_fpw)
+ XLogRecPtr *fpw_lsn, int *num_fpi)
{
XLogRecData *rdt;
uint32 total_len = 0;
@@ -638,7 +638,7 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
/* Report a full page image constructed for the WAL record */
- *num_fpw += 1;
+ *num_fpi += 1;
/*
* Construct XLogRecData entries for the page content.
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7ae6131..1275bec 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3343,31 +3343,31 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
if (es->format == EXPLAIN_FORMAT_TEXT)
{
/* Show only positive counter values. */
- if ((usage->wal_records > 0) || (usage->wal_fpw > 0) ||
+ if ((usage->wal_records > 0) || (usage->wal_fpi > 0) ||
(usage->wal_bytes > 0))
{
ExplainIndentText(es);
appendStringInfoString(es->str, "WAL:");
if (usage->wal_records > 0)
- appendStringInfo(es->str, " records=%ld",
+ appendStringInfo(es->str, " records=%ld",
usage->wal_records);
- if (usage->wal_fpw > 0)
- appendStringInfo(es->str, " full page writes=%ld",
- usage->wal_fpw);
+ if (usage->wal_fpi > 0)
+ appendStringInfo(es->str, " fpi=%ld",
+ usage->wal_fpi);
if (usage->wal_bytes > 0)
- appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
+ appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
usage->wal_bytes);
appendStringInfoChar(es->str, '\n');
}
}
else
{
- ExplainPropertyInteger("WAL records", NULL,
+ ExplainPropertyInteger("WAL Records", NULL,
usage->wal_records, es);
- ExplainPropertyInteger("WAL full page writes", NULL,
- usage->wal_fpw, es);
- ExplainPropertyUInteger("WAL bytes", NULL,
+ ExplainPropertyInteger("WAL FPI", NULL,
+ usage->wal_fpi, es);
+ ExplainPropertyUInteger("WAL Bytes", NULL,
usage->wal_bytes, es);
}
}
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 7c9d723..fbedb5a 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -248,7 +248,7 @@ WalUsageAdd(WalUsage *dst, WalUsage *add)
{
dst->wal_bytes += add->wal_bytes;
dst->wal_records += add->wal_records;
- dst->wal_fpw += add->wal_fpw;
+ dst->wal_fpi += add->wal_fpi;
}
void
@@ -256,5 +256,5 @@ WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
{
dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
dst->wal_records += add->wal_records - sub->wal_records;
- dst->wal_fpw += add->wal_fpw - sub->wal_fpw;
+ dst->wal_fpi += add->wal_fpi - sub->wal_fpi;
}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 0a12afb..e917dfe 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -280,7 +280,7 @@ struct XLogRecData;
extern XLogRecPtr XLogInsertRecord(struct XLogRecData *rdata,
XLogRecPtr fpw_lsn,
uint8 flags,
- int num_fpw);
+ int num_fpi);
extern void XLogFlush(XLogRecPtr RecPtr);
extern bool XLogBackgroundFlush(void);
extern bool XLogNeedsFlush(XLogRecPtr RecPtr);
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 50d672b..a97562e 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -35,7 +35,7 @@ typedef struct BufferUsage
typedef struct WalUsage
{
long wal_records; /* # of WAL records produced */
- long wal_fpw; /* # of WAL full page writes produced */
+ long wal_fpi; /* # of WAL full page images produced */
uint64 wal_bytes; /* size of WAL records produced */
} WalUsage;
--
1.8.3.1
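For readers tracing the renamed counter through the patch, here is a minimal standalone sketch of how a per-statement figure can be derived from a running counter by taking a snapshot and subtracting it afterwards, which is the arithmetic WalUsageAccumDiff() performs above. The struct is a simplified copy of WalUsage; demo_wal_usage, demo_count_record(), demo_accum_diff() and demo_run_statement() are hypothetical names used only for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Simplified stand-in for the WalUsage struct in instrument.h. */
typedef struct WalUsage
{
	long		wal_records;	/* # of WAL records produced */
	long		wal_fpi;		/* # of WAL full page images produced */
	uint64_t	wal_bytes;		/* size of WAL records produced */
} WalUsage;

/* Hypothetical stand-in for the backend-wide pgWalUsage counter. */
WalUsage	demo_wal_usage;

/* Mimics the bookkeeping done at the end of XLogInsertRecord(). */
void
demo_count_record(uint32_t total_len, int num_fpi)
{
	demo_wal_usage.wal_bytes += total_len;
	demo_wal_usage.wal_records++;
	demo_wal_usage.wal_fpi += num_fpi;
}

/* Same arithmetic as WalUsageAccumDiff(): dst += add - sub. */
void
demo_accum_diff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
{
	dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
	dst->wal_records += add->wal_records - sub->wal_records;
	dst->wal_fpi += add->wal_fpi - sub->wal_fpi;
}

/* Pretend to run one statement and return the WAL usage attributed to it. */
WalUsage
demo_run_statement(void)
{
	WalUsage	start = demo_wal_usage; /* snapshot before execution */
	WalUsage	result = {0, 0, 0};

	demo_count_record(100, 0);	/* a plain record, no page image */
	demo_count_record(8300, 1); /* a record carrying one full page image */

	demo_accum_diff(&result, &demo_wal_usage, &start);
	return result;
}
```

Whatever the running counter held before the call, demo_run_statement() reports only the two records, one full page image and 8400 bytes that were added during the call.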
On Mon, May 4, 2020 at 6:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Apr 30, 2020 at 2:19 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Here's the patch. I included the content of
v3-fix_explain_wal_output.patch you provided before, and tried to
consistently replace full page writes/fpw with full page images/fpi
everywhere on top of it (so documentation, command output, variable
names and comments).
Your patch looks mostly good to me. I have made slight modifications
which include changing the non-text format in show_wal_usage to use a
capital letter for the second word, which makes it similar to Buffer
usage stats, and additionally, ran pgindent.
Let me know what you think of the attached.
Thanks a lot Amit. It looks perfect to me!
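To make the text-format behaviour discussed above concrete, here is a rough standalone sketch of the "only non-zero values are printed" rule from show_wal_usage(). It uses plain snprintf() instead of the backend's StringInfo API, and demo_format_wal_usage() and DEMO_WAL_LINE_MAX are hypothetical names, so treat it as an illustration under those assumptions rather than the committed code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for the WalUsage struct in instrument.h. */
typedef struct WalUsage
{
	long		wal_records;	/* # of WAL records produced */
	long		wal_fpi;		/* # of WAL full page images produced */
	uint64_t	wal_bytes;		/* size of WAL records produced */
} WalUsage;

/* Large enough for the label plus three 64-bit decimal values. */
#define DEMO_WAL_LINE_MAX 128

/*
 * Format the text-mode "WAL:" line into buf, which must hold at least
 * DEMO_WAL_LINE_MAX bytes.  Counters that are zero are skipped; if all
 * of them are zero, nothing is written and 0 is returned.  Otherwise
 * the number of characters written is returned.
 */
int
demo_format_wal_usage(char *buf, const WalUsage *usage)
{
	int			len = 0;

	buf[0] = '\0';

	if (usage->wal_records <= 0 && usage->wal_fpi <= 0 &&
		usage->wal_bytes == 0)
		return 0;

	len += snprintf(buf + len, DEMO_WAL_LINE_MAX - len, "WAL:");
	if (usage->wal_records > 0)
		len += snprintf(buf + len, DEMO_WAL_LINE_MAX - len,
						" records=%ld", usage->wal_records);
	if (usage->wal_fpi > 0)
		len += snprintf(buf + len, DEMO_WAL_LINE_MAX - len,
						" fpi=%ld", usage->wal_fpi);
	if (usage->wal_bytes > 0)
		len += snprintf(buf + len, DEMO_WAL_LINE_MAX - len,
						" bytes=%" PRIu64, usage->wal_bytes);
	len += snprintf(buf + len, DEMO_WAL_LINE_MAX - len, "\n");

	return len;
}
```

With wal_records = 3, wal_fpi = 0 and wal_bytes = 412, the buffer ends up holding "WAL: records=3 bytes=412" followed by a newline; the fpi field is omitted because it is zero. The non-text formats in the patch take the other route and always emit all three properties ("WAL Records", "WAL FPI", "WAL Bytes").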
On Mon, May 4, 2020 at 8:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Thanks a lot Amit. It looks perfect to me!
Pushed.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, May 5, 2020 at 12:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Pushed.
Thanks!
On Wed, May 6, 2020 at 12:19 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
Thanks!
I have updated the open items page to reflect this commit [1].
[1]: https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com