Is WAL_DEBUG related code still relevant today?
Hi,
I was recently looking at the code around the WAL_DEBUG macro and GUC.
When enabled, the code does the following:
1. Creates a memory context that allows pallocs within critical sections.
2. Decodes (not logical decoding but DecodeXLogRecord()) every WAL
record using the above memory context that's generated in the server
and emits a LOG message.
3. Emits messages at DEBUG level in AdvanceXLInsertBuffer(), at LOG
level in XLogFlush(), at LOG level in XLogBackgroundFlush().
4. Emits messages at LOG level for every record that the server
replays/applies in the main redo loop.
I enabled this code by compiling with the WAL_DEBUG macro and setting
wal_debug GUC to on. Firstly, the compilation on Windows failed
because XL_ROUTINE was passed inappropriately for XLogReaderAllocate()
used. After fixing the compilation issue [1]diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index ca7100d4db..52633793d4 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -1023,8 +1023,12 @@ XLogInsertRecord(XLogRecData *rdata,, the TAP tests started to
fail [2]src/test/subscription/t/029_on_error.pl because the test gets LSN from an error context message emitted to server logs which the new WAL_DEBUG LOG messages flood the server logs with. src/bin/initdb/t/001_initdb.pl because the WAL_DEBUG LOG messages are emitted to the console while initdb. which I'm sure we can fix.
I started to think if this code is needed at all in production. How
about we do either of the following?
a) Remove the WAL_DEBUG macro and move all the code under the
wal_debug GUC? Since the GUC is already marked as DEVELOPER_OPTION,
the users will know the consequences of enabling it in production.
b) Remove both the WAL_DEBUG macro and the wal_debug GUC. I don't
think (2) is needed to be in core especially when tools like
pg_walinspect and pg_waldump can do the same job. And, the messages in
(3) and (4) can be turned to some DEBUGX level without being under the
WAL_DEBUG macro.
I have no idea if anyone uses WAL_DEBUG macro and wal_debug GUCs in
production, if we have somebody using it, I think we need to fix the
compilation and test failure issues, and start testing this code
(perhaps I can think of setting up a buildfarm member to help here).
I'm in favour of option (b), but I'd like to hear more thoughts on this.
[1]
diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index ca7100d4db..52633793d4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1023,8 +1023,12 @@ XLogInsertRecord(XLogRecData *rdata,
palloc(DecodeXLogRecordRequiredSpace(record->xl_tot_len));
if (!debug_reader)
- debug_reader =
XLogReaderAllocate(wal_segment_size, NULL,
-
XL_ROUTINE(), NULL);
+ debug_reader = XLogReaderAllocate(wal_segment_size,
+
NULL,
+
XL_ROUTINE(.page_read = NULL,
+
.segment_open = NULL,
+
.segment_close = NULL),
+
NULL);
[2]: src/test/subscription/t/029_on_error.pl because the test gets LSN from an error context message emitted to server logs which the new WAL_DEBUG LOG messages flood the server logs with. src/bin/initdb/t/001_initdb.pl because the WAL_DEBUG LOG messages are emitted to the console while initdb.
src/test/subscription/t/029_on_error.pl because the test gets LSN from
an error context message emitted to server logs which the new
WAL_DEBUG LOG messages flood the server logs with.
src/bin/initdb/t/001_initdb.pl because the WAL_DEBUG LOG messages are
emitted to the console while initdb.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Sat, Dec 02, 2023 at 07:36:29PM +0530, Bharath Rupireddy wrote:
I started to think if this code is needed at all in production. How
about we do either of the following?
Well, the fact that this code is hidden behind an off-by-default macro
seems like a pretty strong indicator that it is not intended for
production. But that doesn't mean we should remove it.
a) Remove the WAL_DEBUG macro and move all the code under the
wal_debug GUC? Since the GUC is already marked as DEVELOPER_OPTION,
the users will know the consequences of enabling it in production.
I think the key to this option is verifying there's no measurable
performance impact.
b) Remove both the WAL_DEBUG macro and the wal_debug GUC. I don't
think (2) is needed to be in core especially when tools like
pg_walinspect and pg_waldump can do the same job. And, the messages in
(3) and (4) can be turned to some DEBUGX level without being under the
WAL_DEBUG macro.
Is there anything provided by wal_debug that can't be found via
pg_walinspect/pg_waldump?
I have no idea if anyone uses WAL_DEBUG macro and wal_debug GUCs in
production, if we have somebody using it, I think we need to fix the
compilation and test failure issues, and start testing this code
(perhaps I can think of setting up a buildfarm member to help here).
+1 for at least fixing the code and tests, provided we decide to keep it.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Nathan Bossart <nathandbossart@gmail.com> writes:
On Sat, Dec 02, 2023 at 07:36:29PM +0530, Bharath Rupireddy wrote:
I started to think if this code is needed at all in production. How
about we do either of the following?
Well, the fact that this code is hidden behind an off-by-default macro
seems like a pretty strong indicator that it is not intended for
production. But that doesn't mean we should remove it.
Agreed, production is not the question here. The question is whether
it's of any use to developers either. It looks to me that the code's
been broken since v13, if not before, which very strongly suggests
that nobody is using it. Think I'd vote for nuking it rather than
putting effort into fixing it.
regards, tom lane
On Sun, Dec 3, 2023 at 4:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nathan Bossart <nathandbossart@gmail.com> writes:
On Sat, Dec 02, 2023 at 07:36:29PM +0530, Bharath Rupireddy wrote:
I started to think if this code is needed at all in production. How
about we do either of the following?Well, the fact that this code is hidden behind an off-by-default macro
seems like a pretty strong indicator that it is not intended for
production. But that doesn't mean we should remove it.Agreed, production is not the question here. The question is whether
it's of any use to developers either. It looks to me that the code's
been broken since v13, if not before, which very strongly suggests
that nobody is using it. Think I'd vote for nuking it rather than
putting effort into fixing it.
How about something like the attached? Please see the commit message
for more detailed information.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-Remove-WAL_DEBUG-related-code.patchapplication/x-patch; name=v1-0001-Remove-WAL_DEBUG-related-code.patchDownload
From 9963f25be8c8400365f592c0e0b125f4dd62d3d6 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Sun, 3 Dec 2023 14:05:02 +0000
Subject: [PATCH v1] Remove WAL_DEBUG related code
At a high-level the code under WAL_DEBUG macro does the following:
1. Decodes (not logical decoding but DecodeXLogRecord()) every
generated WAL record using a special memory context and emits the
decoded info as a LOG message.
2. Emits messages at DEBUG level in AdvanceXLInsertBuffer(), at
LOG level in XLogFlush(), at LOG level in XLogBackgroundFlush().
3. Emits messages at LOG level for every record that the server
replays in the main redo loop.
Decoding of generated WAL records (1) if needed can be done by
either pg_walinspect or pg_waldump. The main redo apply loop is
anyway generating a LOG record with LSN of the record being
applied at certain intervals, so (3) is not needed.
Before this commit, compilation of the source code was failing
with WAL_DEBUG enabled. If compilation issues were fixed, some of
the TAP tests were failing.
With no real use of this code in production, instead of fixing
compilation issues and TAP test failures to maintain the code,
this commit removes code related to both WAL_DEBUG macro and
wal_debug GUC, converts the messages (2) to DEBUG2 level and
moves before and after page comparison check in generic_xlog.c
under USE_ASSERT_CHECKING macro.
---
doc/src/sgml/config.sgml | 16 ---
src/backend/access/transam/generic_xlog.c | 8 +-
src/backend/access/transam/xlog.c | 128 +++-------------------
src/backend/access/transam/xlogrecovery.c | 38 -------
src/backend/utils/misc/guc_tables.c | 13 ---
src/include/access/xlog.h | 4 -
src/include/pg_config_manual.h | 6 -
src/test/recovery/t/018_wal_optimize.pl | 1 -
8 files changed, 20 insertions(+), 194 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 94d1eb2b81..3c4c93cf1b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -11446,22 +11446,6 @@ LOG: CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
</listitem>
</varlistentry>
- <varlistentry id="guc-wal-debug" xreflabel="wal_debug">
- <term><varname>wal_debug</varname> (<type>boolean</type>)
- <indexterm>
- <primary><varname>wal_debug</varname> configuration parameter</primary>
- </indexterm>
- </term>
- <listitem>
- <para>
- If on, emit WAL-related debugging output. This parameter is
- only available if the <symbol>WAL_DEBUG</symbol> macro was
- defined when <productname>PostgreSQL</productname> was
- compiled.
- </para>
- </listitem>
- </varlistentry>
-
<varlistentry id="guc-ignore-checksum-failure" xreflabel="ignore_checksum_failure">
<term><varname>ignore_checksum_failure</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c
index abd9e1c749..75f951daff 100644
--- a/src/backend/access/transam/generic_xlog.c
+++ b/src/backend/access/transam/generic_xlog.c
@@ -244,12 +244,12 @@ computeDelta(PageData *pageData, Page curpage, Page targetpage)
targetUpper, BLCKSZ,
curUpper, BLCKSZ);
+#ifdef USE_ASSERT_CHECKING
+
/*
- * If xlog debug is enabled, then check produced delta. Result of delta
- * application to curpage should be equivalent to targetpage.
+ * Check produced delta. Result of delta application to curpage should be
+ * equivalent to targetpage.
*/
-#ifdef WAL_DEBUG
- if (XLOG_DEBUG)
{
PGAlignedBlock tmp;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6526bd4f43..e8d529b609 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -138,11 +138,6 @@ int wal_retrieve_retry_interval = 5000;
int max_slot_wal_keep_size_mb = -1;
int wal_decode_buffer_size = 512 * 1024;
bool track_wal_io_timing = false;
-
-#ifdef WAL_DEBUG
-bool XLOG_DEBUG = false;
-#endif
-
int wal_segment_size = DEFAULT_XLOG_SEG_SIZE;
/*
@@ -644,10 +639,6 @@ static bool updateMinRecoveryPoint = true;
static int MyLockNo = 0;
static bool holdingAllLocks = false;
-#ifdef WAL_DEBUG
-static MemoryContext walDebugCxt = NULL;
-#endif
-
static void CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI,
XLogRecPtr EndOfLog,
TimeLineID newTLI);
@@ -997,70 +988,6 @@ XLogInsertRecord(XLogRecData *rdata,
}
}
-#ifdef WAL_DEBUG
- if (XLOG_DEBUG)
- {
- static XLogReaderState *debug_reader = NULL;
- XLogRecord *record;
- DecodedXLogRecord *decoded;
- StringInfoData buf;
- StringInfoData recordBuf;
- char *errormsg = NULL;
- MemoryContext oldCxt;
-
- oldCxt = MemoryContextSwitchTo(walDebugCxt);
-
- initStringInfo(&buf);
- appendStringInfo(&buf, "INSERT @ %X/%X: ", LSN_FORMAT_ARGS(EndPos));
-
- /*
- * We have to piece together the WAL record data from the XLogRecData
- * entries, so that we can pass it to the rm_desc function as one
- * contiguous chunk.
- */
- initStringInfo(&recordBuf);
- for (; rdata != NULL; rdata = rdata->next)
- appendBinaryStringInfo(&recordBuf, rdata->data, rdata->len);
-
- /* We also need temporary space to decode the record. */
- record = (XLogRecord *) recordBuf.data;
- decoded = (DecodedXLogRecord *)
- palloc(DecodeXLogRecordRequiredSpace(record->xl_tot_len));
-
- if (!debug_reader)
- debug_reader = XLogReaderAllocate(wal_segment_size, NULL,
- XL_ROUTINE(), NULL);
-
- if (!debug_reader)
- {
- appendStringInfoString(&buf, "error decoding record: out of memory while allocating a WAL reading processor");
- }
- else if (!DecodeXLogRecord(debug_reader,
- decoded,
- record,
- EndPos,
- &errormsg))
- {
- appendStringInfo(&buf, "error decoding record: %s",
- errormsg ? errormsg : "no error message");
- }
- else
- {
- appendStringInfoString(&buf, " - ");
-
- debug_reader->record = decoded;
- xlog_outdesc(&buf, debug_reader);
- debug_reader->record = NULL;
- }
- elog(LOG, "%s", buf.data);
-
- pfree(decoded);
- pfree(buf.data);
- pfree(recordBuf.data);
- MemoryContextSwitchTo(oldCxt);
- }
-#endif
-
/*
* Update our global variables
*/
@@ -1995,13 +1922,11 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
}
LWLockRelease(WALBufMappingLock);
-#ifdef WAL_DEBUG
- if (XLOG_DEBUG && npages > 0)
- {
- elog(DEBUG1, "initialized %d pages, up to %X/%X",
- npages, LSN_FORMAT_ARGS(NewPageEndPtr));
- }
-#endif
+ if (npages > 0)
+ ereport(DEBUG2,
+ (errmsg_internal("initialized %d pages, up to %X/%X",
+ npages,
+ LSN_FORMAT_ARGS(NewPageEndPtr))));
}
/*
@@ -2639,13 +2564,11 @@ XLogFlush(XLogRecPtr record)
if (record <= LogwrtResult.Flush)
return;
-#ifdef WAL_DEBUG
- if (XLOG_DEBUG)
- elog(LOG, "xlog flush request %X/%X; write %X/%X; flush %X/%X",
- LSN_FORMAT_ARGS(record),
- LSN_FORMAT_ARGS(LogwrtResult.Write),
- LSN_FORMAT_ARGS(LogwrtResult.Flush));
-#endif
+ ereport(DEBUG2,
+ (errmsg_internal("xlog flush request %X/%X; current write position %X/%X, flush position %X/%X",
+ LSN_FORMAT_ARGS(record),
+ LSN_FORMAT_ARGS(LogwrtResult.Write),
+ LSN_FORMAT_ARGS(LogwrtResult.Flush))));
START_CRIT_SECTION();
@@ -2901,14 +2824,12 @@ XLogBackgroundFlush(void)
WriteRqst.Flush = 0;
}
-#ifdef WAL_DEBUG
- if (XLOG_DEBUG)
- elog(LOG, "xlog bg flush request write %X/%X; flush: %X/%X, current is write %X/%X; flush %X/%X",
- LSN_FORMAT_ARGS(WriteRqst.Write),
- LSN_FORMAT_ARGS(WriteRqst.Flush),
- LSN_FORMAT_ARGS(LogwrtResult.Write),
- LSN_FORMAT_ARGS(LogwrtResult.Flush));
-#endif
+ ereport(DEBUG2,
+ (errmsg_internal("xlog background flush request write position %X/%X, flush position %X/%X; current write position %X/%X, flush position %X/%X",
+ LSN_FORMAT_ARGS(WriteRqst.Write),
+ LSN_FORMAT_ARGS(WriteRqst.Flush),
+ LSN_FORMAT_ARGS(LogwrtResult.Write),
+ LSN_FORMAT_ARGS(LogwrtResult.Flush))));
START_CRIT_SECTION();
@@ -4654,23 +4575,6 @@ XLOGShmemInit(void)
int i;
ControlFileData *localControlFile;
-#ifdef WAL_DEBUG
-
- /*
- * Create a memory context for WAL debugging that's exempt from the normal
- * "no pallocs in critical section" rule. Yes, that can lead to a PANIC if
- * an allocation fails, but wal_debug is not for production use anyway.
- */
- if (walDebugCxt == NULL)
- {
- walDebugCxt = AllocSetContextCreate(TopMemoryContext,
- "WAL Debug",
- ALLOCSET_DEFAULT_SIZES);
- MemoryContextAllowInCriticalSection(walDebugCxt, true);
- }
-#endif
-
-
XLogCtl = (XLogCtlData *)
ShmemInitStruct("XLOG Ctl", XLOGShmemSize(), &foundXLog);
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index c61566666a..c0b6dc67fa 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -396,9 +396,6 @@ static bool read_tablespace_map(List **tablespaces);
static void xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI);
static void CheckRecoveryConsistency(void);
static void rm_redo_error_callback(void *arg);
-#ifdef WAL_DEBUG
-static void xlog_outrec(StringInfo buf, XLogReaderState *record);
-#endif
static void xlog_block_info(StringInfo buf, XLogReaderState *record);
static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
TimeLineID prevTLI, TimeLineID replayTLI);
@@ -1703,25 +1700,6 @@ PerformWalRecovery(void)
ereport_startup_progress("redo in progress, elapsed time: %ld.%02d s, current LSN: %X/%X",
LSN_FORMAT_ARGS(xlogreader->ReadRecPtr));
-#ifdef WAL_DEBUG
- if (XLOG_DEBUG ||
- (record->xl_rmid == RM_XACT_ID && trace_recovery_messages <= DEBUG2) ||
- (record->xl_rmid != RM_XACT_ID && trace_recovery_messages <= DEBUG3))
- {
- StringInfoData buf;
-
- initStringInfo(&buf);
- appendStringInfo(&buf, "REDO @ %X/%X; LSN %X/%X: ",
- LSN_FORMAT_ARGS(xlogreader->ReadRecPtr),
- LSN_FORMAT_ARGS(xlogreader->EndRecPtr));
- xlog_outrec(&buf, xlogreader);
- appendStringInfoString(&buf, " - ");
- xlog_outdesc(&buf, xlogreader);
- elog(LOG, "%s", buf.data);
- pfree(buf.data);
- }
-#endif
-
/* Handle interrupt signals of startup process */
HandleStartupProcInterrupts();
@@ -2256,22 +2234,6 @@ xlog_outdesc(StringInfo buf, XLogReaderState *record)
rmgr.rm_desc(buf, record);
}
-#ifdef WAL_DEBUG
-
-static void
-xlog_outrec(StringInfo buf, XLogReaderState *record)
-{
- appendStringInfo(buf, "prev %X/%X; xid %u",
- LSN_FORMAT_ARGS(XLogRecGetPrev(record)),
- XLogRecGetXid(record));
-
- appendStringInfo(buf, "; len %u",
- XLogRecGetDataLen(record));
-
- xlog_block_info(buf, record);
-}
-#endif /* WAL_DEBUG */
-
/*
* Returns a string giving information about all the blocks in an
* XLogRecord.
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 6474e35ec0..db332a06ba 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1702,19 +1702,6 @@ struct config_bool ConfigureNamesBool[] =
},
#endif
-#ifdef WAL_DEBUG
- {
- {"wal_debug", PGC_SUSET, DEVELOPER_OPTIONS,
- gettext_noop("Emit WAL-related debugging output."),
- NULL,
- GUC_NOT_IN_SAMPLE
- },
- &XLOG_DEBUG,
- false,
- NULL, NULL, NULL
- },
-#endif
-
{
{"integer_datetimes", PGC_INTERNAL, PRESET_OPTIONS,
gettext_noop("Shows whether datetimes are integer based."),
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index a14126d164..50dc6140a5 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -123,10 +123,6 @@ extern PGDLLIMPORT int wal_level;
/* Do we need to WAL-log information required only for logical replication? */
#define XLogLogicalInfoActive() (wal_level >= WAL_LEVEL_LOGICAL)
-#ifdef WAL_DEBUG
-extern PGDLLIMPORT bool XLOG_DEBUG;
-#endif
-
/*
* OR-able request flag bits for checkpoints. The "cause" bits are used only
* for logging purposes. Note: the flags must be defined so that it's
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 8a6e67a445..5b9b01f4b7 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -354,12 +354,6 @@
*/
/* #define LOCK_DEBUG */
-/*
- * Enable debugging print statements for WAL-related operations; see
- * also the wal_debug GUC var.
- */
-/* #define WAL_DEBUG */
-
/*
* Enable tracing of resource consumption during sort operations;
* see also the trace_sort GUC var. For 8.1 this is enabled by default.
diff --git a/src/test/recovery/t/018_wal_optimize.pl b/src/test/recovery/t/018_wal_optimize.pl
index 1d613eaede..bb6082f956 100644
--- a/src/test/recovery/t/018_wal_optimize.pl
+++ b/src/test/recovery/t/018_wal_optimize.pl
@@ -53,7 +53,6 @@ wal_level = $wal_level
max_prepared_transactions = 1
wal_log_hints = on
wal_skip_threshold = 0
-#wal_debug = on
));
$node->start;
--
2.34.1
On Sun, Dec 3, 2023 at 4:00 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Sat, Dec 02, 2023 at 07:36:29PM +0530, Bharath Rupireddy wrote:
I started to think if this code is needed at all in production. How
about we do either of the following?Well, the fact that this code is hidden behind an off-by-default macro
seems like a pretty strong indicator that it is not intended for
production. But that doesn't mean we should remove it.
I think all that the WAL_DEBUG code offers can be achieved with other
means after adjusting a few pieces. Please see the commit message in
the v1 patch posted in this thread at
/messages/by-id/CALj2ACW5zPMT09eqXvh2Ve7Kz02HShTwyjG+xTzkrzeKtQMnQQ@mail.gmail.com.
a) Remove the WAL_DEBUG macro and move all the code under the
wal_debug GUC? Since the GUC is already marked as DEVELOPER_OPTION,
the users will know the consequences of enabling it in production.I think the key to this option is verifying there's no measurable
performance impact.
FWIW, enabling this has a huge impact in production. For instance,
recovery TAP tests are ~10% slower with the WAL_DEBUG macro enabled. I
don't think we go the route of keeping this code.
WAL_DEBUG macro enabled:
All tests successful.
Files=38, Tests=531, 157 wallclock secs ( 0.18 usr 0.05 sys + 14.96
cusr 16.11 csys = 31.30 CPU)
Result: PASS
HEAD:
All tests successful.
Files=38, Tests=531, 143 wallclock secs ( 0.15 usr 0.06 sys + 14.24
cusr 15.62 csys = 30.07 CPU)
Result: PASS
b) Remove both the WAL_DEBUG macro and the wal_debug GUC. I don't
think (2) is needed to be in core especially when tools like
pg_walinspect and pg_waldump can do the same job. And, the messages in
(3) and (4) can be turned to some DEBUGX level without being under the
WAL_DEBUG macro.Is there anything provided by wal_debug that can't be found via
pg_walinspect/pg_waldump?
I don't think so. The WAL record decoding can be achieved with
pg_walinspect or pg_waldump. The page comparison check in
generic_xlog.c can be moved under USE_ASSERT_CHECKING macro. PSA v1
patch posted in this thread.
I have no idea if anyone uses WAL_DEBUG macro and wal_debug GUCs in
production, if we have somebody using it, I think we need to fix the
compilation and test failure issues, and start testing this code
(perhaps I can think of setting up a buildfarm member to help here).+1 for at least fixing the code and tests, provided we decide to keep it.
With no real use of this code in production, instead of fixing
compilation issues and TAP test failures to maintain the code, I think
it's better to adjust a few pieces and remove the other stuff like in
the attached v1 patch.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Sun, Dec 03, 2023 at 08:30:24PM +0530, Bharath Rupireddy wrote:
On Sun, Dec 3, 2023 at 4:00 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Sat, Dec 02, 2023 at 07:36:29PM +0530, Bharath Rupireddy wrote:
b) Remove both the WAL_DEBUG macro and the wal_debug GUC. I don't
think (2) is needed to be in core especially when tools like
pg_walinspect and pg_waldump can do the same job. And, the messages in
(3) and (4) can be turned to some DEBUGX level without being under the
WAL_DEBUG macro.Is there anything provided by wal_debug that can't be found via
pg_walinspect/pg_waldump?I don't think so. The WAL record decoding can be achieved with
pg_walinspect or pg_waldump.
Can be, but the WAL_DEBUG model is mighty convenient:
- Cooperates with backtrace_functions
- Change log_line_prefix to correlate any log_line_prefix fact with WAL records
- See WAL records interleaved with non-WAL log messages
I have no idea if anyone uses WAL_DEBUG macro and wal_debug GUCs in
production, if we have somebody using it, I think we need to fix the
I don't use it in production, but I use it more than any other of our many
DEBUG macros.
compilation and test failure issues, and start testing this code
(perhaps I can think of setting up a buildfarm member to help here).
WAL_DEBUG compiles and works just fine on GNU/Linux. I'm not surprised the
failure to compile on Windows has escaped notice, because Windows-specific WAL
behaviors are so rare. We consistently do our WAL-related development on
non-Windows. Needless to say, I wouldn't object to fixing WAL_DEBUG for
Windows.
Fixing tests is less valuable, especially since it's clear when a test fails
through extra messages the test didn't expect. I bet other DEBUG macros make
some tests fail that way, which doesn't devalue those macros. A test patch
might be okay nonetheless, but a buildfarm member is more likely to have
negative value. It would create urgent work. In the hypothetical buildfarm
member's absence, the project would be just fine if that work never happens.
A buildfarm member that compiles but doesn't test could be okay.
On Sun, Dec 03, 2023 at 11:07:05AM -0800, Noah Misch wrote:
Can be, but the WAL_DEBUG model is mighty convenient:
- Cooperates with backtrace_functions
- Change log_line_prefix to correlate any log_line_prefix fact with WAL records
- See WAL records interleaved with non-WAL log messagesI don't use it in production, but I use it more than any other of our many
DEBUG macros.
So do I as a quick workaround to check the validity of records
generated without having to spawn a standby replaying the records.
Since 027_stream_regress.pl exists, I agree that its value has
decreased and that all patches should have queries to check their
records anyway, but it does not make it useless for developers.
Fixing tests is less valuable, especially since it's clear when a test fails
through extra messages the test didn't expect. I bet other DEBUG macros make
some tests fail that way, which doesn't devalue those macros. A test patch
might be okay nonetheless, but a buildfarm member is more likely to have
negative value. It would create urgent work. In the hypothetical buildfarm
member's absence, the project would be just fine if that work never happens.
A buildfarm member that compiles but doesn't test could be okay.
I can add the flag in one of my nix animals if we don't have any to
provide minimal coverage, that's not an issue for me. I'd suggest to
just fix the build on Windows, this flag is a low maintenance burden.
--
Michael
On Mon, Dec 04, 2023 at 10:14:36AM +0900, Michael Paquier wrote:
I can add the flag in one of my nix animals if we don't have any to
provide minimal coverage, that's not an issue for me. I'd suggest to
just fix the build on Windows, this flag is a low maintenance burden.
Hearing nothing about that, I've reproduced the failure, checked that
the proposed fix is OK, and applied it down to 13 where this was
introduced.
Regarding the tests, like Noah, I am not really sure that it is worth
spending resources on fixing as they'd require wal_debug = on to
break.
--
Michael
On 02.12.23 15:06, Bharath Rupireddy wrote:
I enabled this code by compiling with the WAL_DEBUG macro and setting
wal_debug GUC to on. Firstly, the compilation on Windows failed
because XL_ROUTINE was passed inappropriately for XLogReaderAllocate()
used.
This kind of thing could be mostly avoided if we didn't hide all the
WAL_DEBUG behind #ifdefs. For example, in the attached patch, I instead
changed it so that
if (XLOG_DEBUG)
resolves to
if (false)
in the normal case. That way, we don't need to wrap that in #ifdef
WAL_DEBUG, and the compiler can see the disabled code and make sure it
continues to build.
Attachments:
0001-Make-WAL_DEBUG-code-harder-to-break-accidentally.patchtext/plain; charset=UTF-8; name=0001-Make-WAL_DEBUG-code-harder-to-break-accidentally.patchDownload
From 93c74462c3e253f3ce9fd1beba1f09290988de8c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Wed, 6 Dec 2023 12:22:38 +0100
Subject: [PATCH] Make WAL_DEBUG code harder to break accidentally
---
src/backend/access/transam/generic_xlog.c | 2 --
src/backend/access/transam/xlog.c | 10 ----------
src/backend/access/transam/xlogrecovery.c | 7 -------
src/include/access/xlog.h | 2 ++
4 files changed, 2 insertions(+), 19 deletions(-)
diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c
index abd9e1c749..5b910f1024 100644
--- a/src/backend/access/transam/generic_xlog.c
+++ b/src/backend/access/transam/generic_xlog.c
@@ -248,7 +248,6 @@ computeDelta(PageData *pageData, Page curpage, Page targetpage)
* If xlog debug is enabled, then check produced delta. Result of delta
* application to curpage should be equivalent to targetpage.
*/
-#ifdef WAL_DEBUG
if (XLOG_DEBUG)
{
PGAlignedBlock tmp;
@@ -260,7 +259,6 @@ computeDelta(PageData *pageData, Page curpage, Page targetpage)
BLCKSZ - targetUpper) != 0)
elog(ERROR, "result of generic xlog apply does not match");
}
-#endif
}
/*
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2d603d8dee..3a7a38a100 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -644,9 +644,7 @@ static bool updateMinRecoveryPoint = true;
static int MyLockNo = 0;
static bool holdingAllLocks = false;
-#ifdef WAL_DEBUG
static MemoryContext walDebugCxt = NULL;
-#endif
static void CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI,
XLogRecPtr EndOfLog,
@@ -997,7 +995,6 @@ XLogInsertRecord(XLogRecData *rdata,
}
}
-#ifdef WAL_DEBUG
if (XLOG_DEBUG)
{
static XLogReaderState *debug_reader = NULL;
@@ -1061,7 +1058,6 @@ XLogInsertRecord(XLogRecData *rdata,
pfree(recordBuf.data);
MemoryContextSwitchTo(oldCxt);
}
-#endif
/*
* Update our global variables
@@ -1997,13 +1993,11 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
}
LWLockRelease(WALBufMappingLock);
-#ifdef WAL_DEBUG
if (XLOG_DEBUG && npages > 0)
{
elog(DEBUG1, "initialized %d pages, up to %X/%X",
npages, LSN_FORMAT_ARGS(NewPageEndPtr));
}
-#endif
}
/*
@@ -2641,13 +2635,11 @@ XLogFlush(XLogRecPtr record)
if (record <= LogwrtResult.Flush)
return;
-#ifdef WAL_DEBUG
if (XLOG_DEBUG)
elog(LOG, "xlog flush request %X/%X; write %X/%X; flush %X/%X",
LSN_FORMAT_ARGS(record),
LSN_FORMAT_ARGS(LogwrtResult.Write),
LSN_FORMAT_ARGS(LogwrtResult.Flush));
-#endif
START_CRIT_SECTION();
@@ -2903,14 +2895,12 @@ XLogBackgroundFlush(void)
WriteRqst.Flush = 0;
}
-#ifdef WAL_DEBUG
if (XLOG_DEBUG)
elog(LOG, "xlog bg flush request write %X/%X; flush: %X/%X, current is write %X/%X; flush %X/%X",
LSN_FORMAT_ARGS(WriteRqst.Write),
LSN_FORMAT_ARGS(WriteRqst.Flush),
LSN_FORMAT_ARGS(LogwrtResult.Write),
LSN_FORMAT_ARGS(LogwrtResult.Flush));
-#endif
START_CRIT_SECTION();
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index c61566666a..26ba990b71 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -396,9 +396,7 @@ static bool read_tablespace_map(List **tablespaces);
static void xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI);
static void CheckRecoveryConsistency(void);
static void rm_redo_error_callback(void *arg);
-#ifdef WAL_DEBUG
static void xlog_outrec(StringInfo buf, XLogReaderState *record);
-#endif
static void xlog_block_info(StringInfo buf, XLogReaderState *record);
static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
TimeLineID prevTLI, TimeLineID replayTLI);
@@ -1703,7 +1701,6 @@ PerformWalRecovery(void)
ereport_startup_progress("redo in progress, elapsed time: %ld.%02d s, current LSN: %X/%X",
LSN_FORMAT_ARGS(xlogreader->ReadRecPtr));
-#ifdef WAL_DEBUG
if (XLOG_DEBUG ||
(record->xl_rmid == RM_XACT_ID && trace_recovery_messages <= DEBUG2) ||
(record->xl_rmid != RM_XACT_ID && trace_recovery_messages <= DEBUG3))
@@ -1720,7 +1717,6 @@ PerformWalRecovery(void)
elog(LOG, "%s", buf.data);
pfree(buf.data);
}
-#endif
/* Handle interrupt signals of startup process */
HandleStartupProcInterrupts();
@@ -2256,8 +2252,6 @@ xlog_outdesc(StringInfo buf, XLogReaderState *record)
rmgr.rm_desc(buf, record);
}
-#ifdef WAL_DEBUG
-
static void
xlog_outrec(StringInfo buf, XLogReaderState *record)
{
@@ -2270,7 +2264,6 @@ xlog_outrec(StringInfo buf, XLogReaderState *record)
xlog_block_info(buf, record);
}
-#endif /* WAL_DEBUG */
/*
* Returns a string giving information about all the blocks in an
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index a14126d164..698dd79c2d 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -125,6 +125,8 @@ extern PGDLLIMPORT int wal_level;
#ifdef WAL_DEBUG
extern PGDLLIMPORT bool XLOG_DEBUG;
+#else
+#define XLOG_DEBUG false
#endif
/*
--
2.43.0
On Wed, Dec 6, 2023, at 8:27 AM, Peter Eisentraut wrote:
On 02.12.23 15:06, Bharath Rupireddy wrote:
I enabled this code by compiling with the WAL_DEBUG macro and setting
wal_debug GUC to on. Firstly, the compilation on Windows failed
because XL_ROUTINE was passed inappropriately for XLogReaderAllocate()
used.This kind of thing could be mostly avoided if we didn't hide all the
WAL_DEBUG behind #ifdefs.
AFAICS LOCK_DEBUG also hides its GUCs behind #ifdefs. The fact that XLOG_DEBUG
is a variable but seems like a constant surprises me. I would rename it to
XLogDebug or xlog_debug.
in the normal case. That way, we don't need to wrap that in #ifdef
WAL_DEBUG, and the compiler can see the disabled code and make sure it
continues to build.
I didn't check the LOCK_DEBUG code path to make sure it fits in the same
category as WAL_DEBUG. If it does, maybe it is worth to apply the same logic
there.
--
Euler Taveira
EDB https://www.enterprisedb.com/
Peter Eisentraut <peter@eisentraut.org> writes:
This kind of thing could be mostly avoided if we didn't hide all the
WAL_DEBUG behind #ifdefs. For example, in the attached patch, I instead
changed it so that
if (XLOG_DEBUG)
resolves to
if (false)
in the normal case. That way, we don't need to wrap that in #ifdef
WAL_DEBUG, and the compiler can see the disabled code and make sure it
continues to build.
Hmm, maybe, but I'm not sure this would be an unalloyed good.
The main concern I have is compilers and static analyzers starting
to bleat about unreachable code (warnings like "variable set but
never used", or the like, seem plausible). The dead code would
also decrease our code coverage statistics, not that those are
wonderful now.
regards, tom lane
On Wed, Dec 06, 2023 at 09:46:09AM -0300, Euler Taveira wrote:
On Wed, Dec 6, 2023, at 8:27 AM, Peter Eisentraut wrote:
This kind of thing could be mostly avoided if we didn't hide all the
WAL_DEBUG behind #ifdefs.AFAICS LOCK_DEBUG also hides its GUCs behind #ifdefs. The fact that XLOG_DEBUG
is a variable but seems like a constant surprises me. I would rename it to
XLogDebug or xlog_debug.
+1. Or just wal_debug for greppability.
in the normal case. That way, we don't need to wrap that in #ifdef
WAL_DEBUG, and the compiler can see the disabled code and make sure it
continues to build.I didn't check the LOCK_DEBUG code path to make sure it fits in the same
category as WAL_DEBUG. If it does, maybe it is worth to apply the same logic
there.
PerformWalRecovery() with its log for RM_XACT_ID is something that
stresses me a bit though because this is in the main redo loop which
is never free. The same can be said about GenericXLogFinish() because
the extra computation happens while holding a buffer and marking it
dirty. The ones in xlog.c are free of charge as they are called
outside any critical portions.
This makes me wonder how much we need to care about
trace_recovery_messages, actually, and I've never used it.
--
Michael
On Wed, Dec 6, 2023, at 9:51 PM, Michael Paquier wrote:
PerformWalRecovery() with its log for RM_XACT_ID is something that
stresses me a bit though because this is in the main redo loop which
is never free. The same can be said about GenericXLogFinish() because
the extra computation happens while holding a buffer and marking it
dirty. The ones in xlog.c are free of charge as they are called
outside any critical portions.This makes me wonder how much we need to care about
trace_recovery_messages, actually, and I've never used it.
IIUC trace_recovery_messages was a debugging aid in the 9.0 era when the HS was
introduced. I'm also wondering if anyone used it in the past years.
elog.c:
* Intention is to keep this for at least the whole of the 9.0 production
* release, so we can more easily diagnose production problems in the field.
* It should go away eventually, though, because it's an ugly and
* hard-to-explain kluge.
*/
int
trace_recovery(int trace_level)
{
if (trace_level < LOG &&
trace_level >= trace_recovery_messages)
return LOG;
return trace_level;
}
--
Euler Taveira
EDB https://www.enterprisedb.com/
On Wed, Dec 06, 2023 at 11:32:19PM -0300, Euler Taveira wrote:
IIUC trace_recovery_messages was a debugging aid in the 9.0 era when the HS was
introduced. I'm also wondering if anyone used it in the past years.
FWIW, I'd be +1 for getting rid of entirely, with its conditional
block in PerformWalRecovery(), as it does not bring any additional
value now that it is possible to achieve much more with pg_waldump
(pg_xlogdump before that) introduced a couple of years later in 9.3.
--
Michael
On Mon, Dec 4, 2023 at 12:37 AM Noah Misch <noah@leadboat.com> wrote:
On Sun, Dec 03, 2023 at 08:30:24PM +0530, Bharath Rupireddy wrote:
On Sun, Dec 3, 2023 at 4:00 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Sat, Dec 02, 2023 at 07:36:29PM +0530, Bharath Rupireddy wrote:
b) Remove both the WAL_DEBUG macro and the wal_debug GUC. I don't
think (2) is needed to be in core especially when tools like
pg_walinspect and pg_waldump can do the same job. And, the messages in
(3) and (4) can be turned to some DEBUGX level without being under the
WAL_DEBUG macro.Is there anything provided by wal_debug that can't be found via
pg_walinspect/pg_waldump?I don't think so. The WAL record decoding can be achieved with
pg_walinspect or pg_waldump.Can be, but the WAL_DEBUG model is mighty convenient:
- Cooperates with backtrace_functions
- Change log_line_prefix to correlate any log_line_prefix fact with WAL records
- See WAL records interleaved with non-WAL log messages
Agree it helps in all of the above situations, but I'm curious to know
what sorts of problems it helps debug with.
The interesting pieces that WAL_DEBUG code does are the following:
1. Decodes the WAL record right after it's written to WAL buffers in
XLogInsertRecord. What problem does it help to detect?
2. Emits a log message for every WAL record applied in the main redo
apply loop. Enabling this isn't cheap for sure even for developer
environments; I've observed a 10% increase in recovery test time)
3. Emits log messages for WAL writes/flushes and WAL buffer page
initializations. These messages don't have to be hiding under a macro,
but a DEBUGX level is sufficient.
I have no idea if anyone uses WAL_DEBUG macro and wal_debug GUCs in
production, if we have somebody using it, I think we need to fix theI don't use it in production, but I use it more than any other of our many
DEBUG macros.
I'm just curious to know what sorts of problems WAL_DEBUG code helps
debug with. Is the WAL_DEBUG code (1) or (2) or (3) that helped you
the most? Is converting the LOG messages (3) to DEBUGX level going to
help in your case? Can you please throw some light on this?
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Thu, Dec 7, 2023 at 8:10 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Dec 06, 2023 at 11:32:19PM -0300, Euler Taveira wrote:
IIUC trace_recovery_messages was a debugging aid in the 9.0 era when the HS was
introduced. I'm also wondering if anyone used it in the past years.FWIW, I'd be +1 for getting rid of entirely, with
+1 for removing trace_recovery_messages. Firstly, it doesn't cover all
the recovery related messages as it promises, so it's an incomplete
feature. Secondly, it needs a bit of understanding as to how it gels
with client_min_messages and log_min_messages.
its conditional
block in PerformWalRecovery(), as it does not bring any additional
value now that it is possible to achieve much more with pg_waldump
(pg_xlogdump before that) introduced a couple of years later in 9.3.
And, I agree that the functionality (description of the WAL record
being applied) of conditional trace_recovery_messages code under
WAL_DEBUG macro in PerformWalRecovery's main redo apply loop can more
easily be achieved with either pg_walinspect or pg_waldump. That's my
point as well for getting rid of WAL_DEBUG macro related code after
converting a few messages to DEBUGX level.
The comment atop trace_recovery [1]* Intention is to keep this for at least the whole of the 9.0 production * release, so we can more easily diagnose production problems in the field. * It should go away eventually, though, because it's an ugly and * hard-to-explain kluge. */ int trace_recovery(int trace_level) function says it should go away
eventually and seems to have served the purpose when the recovery
related code was introduced in PG 9.0.
FWIW, the attached patch is what I've left with after removing
trace_recovery_messages related code, 9 files changed, 19
insertions(+), 97 deletions(-).
[1]: * Intention is to keep this for at least the whole of the 9.0 production * release, so we can more easily diagnose production problems in the field. * It should go away eventually, though, because it's an ugly and * hard-to-explain kluge. */ int trace_recovery(int trace_level)
* Intention is to keep this for at least the whole of the 9.0 production
* release, so we can more easily diagnose production problems in the field.
* It should go away eventually, though, because it's an ugly and
* hard-to-explain kluge.
*/
int
trace_recovery(int trace_level)
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-Remove-trace_recovery_messages-related-code.patchapplication/x-patch; name=v1-0001-Remove-trace_recovery_messages-related-code.patchDownload
From 8b804df3959e5e4e2fdc4f147b8309bece92fba4 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Thu, 7 Dec 2023 11:49:53 +0000
Subject: [PATCH v1] Remove trace_recovery_messages related code
---
doc/src/sgml/config.sgml | 26 -----------------------
src/backend/access/transam/xlog.c | 2 +-
src/backend/access/transam/xlogrecovery.c | 4 +---
src/backend/storage/ipc/procarray.c | 20 ++++++++---------
src/backend/storage/ipc/standby.c | 11 +++++-----
src/backend/utils/cache/inval.c | 5 ++---
src/backend/utils/error/elog.c | 26 -----------------------
src/backend/utils/misc/guc_tables.c | 18 ----------------
src/include/miscadmin.h | 4 ----
9 files changed, 19 insertions(+), 97 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 94d1eb2b81..44cada2b40 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -11216,32 +11216,6 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</listitem>
</varlistentry>
- <varlistentry id="guc-trace-recovery-messages" xreflabel="trace_recovery_messages">
- <term><varname>trace_recovery_messages</varname> (<type>enum</type>)
- <indexterm>
- <primary><varname>trace_recovery_messages</varname> configuration parameter</primary>
- </indexterm>
- </term>
- <listitem>
- <para>
- Enables logging of recovery-related debugging output that otherwise
- would not be logged. This parameter allows the user to override the
- normal setting of <xref linkend="guc-log-min-messages"/>, but only for
- specific messages. This is intended for use in debugging hot standby.
- Valid values are <literal>DEBUG5</literal>, <literal>DEBUG4</literal>,
- <literal>DEBUG3</literal>, <literal>DEBUG2</literal>, <literal>DEBUG1</literal>, and
- <literal>LOG</literal>. The default, <literal>LOG</literal>, does not affect
- logging decisions at all. The other values cause recovery-related
- debug messages of that priority or higher to be logged as though they
- had <literal>LOG</literal> priority; for common settings of
- <varname>log_min_messages</varname> this results in unconditionally sending
- them to the server log.
- This parameter can only be set in the <filename>postgresql.conf</filename>
- file or on the server command line.
- </para>
- </listitem>
- </varlistentry>
-
<varlistentry id="guc-trace-sort" xreflabel="trace_sort">
<term><varname>trace_sort</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2d603d8dee..14342a16bf 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7213,7 +7213,7 @@ RecoveryRestartPoint(const CheckPoint *checkPoint, XLogReaderState *record)
*/
if (XLogHaveInvalidPages())
{
- elog(trace_recovery(DEBUG2),
+ elog(DEBUG2,
"could not record restart point at %X/%X because there "
"are unresolved references to invalid pages",
LSN_FORMAT_ARGS(checkPoint->redo));
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index c61566666a..0aed9f5ccb 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1704,9 +1704,7 @@ PerformWalRecovery(void)
LSN_FORMAT_ARGS(xlogreader->ReadRecPtr));
#ifdef WAL_DEBUG
- if (XLOG_DEBUG ||
- (record->xl_rmid == RM_XACT_ID && trace_recovery_messages <= DEBUG2) ||
- (record->xl_rmid != RM_XACT_ID && trace_recovery_messages <= DEBUG3))
+ if (XLOG_DEBUG)
{
StringInfoData buf;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 80ab026bf5..084d3e1db2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1109,11 +1109,11 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
running->oldestRunningXid))
{
standbyState = STANDBY_SNAPSHOT_READY;
- elog(trace_recovery(DEBUG1),
+ elog(DEBUG1,
"recovery snapshots are now enabled");
}
else
- elog(trace_recovery(DEBUG1),
+ elog(DEBUG1,
"recovery snapshot waiting for non-overflowed snapshot or "
"until oldest active xid on standby is at least %u (now %u)",
standbySnapshotPendingXmin,
@@ -1208,7 +1208,7 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
KnownAssignedXidsAdd(xids[i], xids[i], true);
}
- KnownAssignedXidsDisplay(trace_recovery(DEBUG3));
+ KnownAssignedXidsDisplay(DEBUG3);
}
pfree(xids);
@@ -1280,11 +1280,11 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
Assert(FullTransactionIdIsValid(ShmemVariableCache->nextXid));
- KnownAssignedXidsDisplay(trace_recovery(DEBUG3));
+ KnownAssignedXidsDisplay(DEBUG3);
if (standbyState == STANDBY_SNAPSHOT_READY)
- elog(trace_recovery(DEBUG1), "recovery snapshots are now enabled");
+ elog(DEBUG1, "recovery snapshots are now enabled");
else
- elog(trace_recovery(DEBUG1),
+ elog(DEBUG1,
"recovery snapshot waiting for non-overflowed snapshot or "
"until oldest active xid on standby is at least %u (now %u)",
standbySnapshotPendingXmin,
@@ -4339,7 +4339,7 @@ RecordKnownAssignedTransactionIds(TransactionId xid)
Assert(TransactionIdIsValid(xid));
Assert(TransactionIdIsValid(latestObservedXid));
- elog(trace_recovery(DEBUG4), "record known xact %u latestObservedXid %u",
+ elog(DEBUG4, "record known xact %u latestObservedXid %u",
xid, latestObservedXid);
/*
@@ -4897,7 +4897,7 @@ KnownAssignedXidsRemove(TransactionId xid)
{
Assert(TransactionIdIsValid(xid));
- elog(trace_recovery(DEBUG4), "remove KnownAssignedXid %u", xid);
+ elog(DEBUG4, "remove KnownAssignedXid %u", xid);
/*
* Note: we cannot consider it an error to remove an XID that's not
@@ -4951,13 +4951,13 @@ KnownAssignedXidsRemovePreceding(TransactionId removeXid)
if (!TransactionIdIsValid(removeXid))
{
- elog(trace_recovery(DEBUG4), "removing all KnownAssignedXids");
+ elog(DEBUG4, "removing all KnownAssignedXids");
pArray->numKnownAssignedXids = 0;
pArray->headKnownAssignedXids = pArray->tailKnownAssignedXids = 0;
return;
}
- elog(trace_recovery(DEBUG4), "prune KnownAssignedXids to %u", removeXid);
+ elog(DEBUG4, "prune KnownAssignedXids to %u", removeXid);
/*
* Mark entries invalid starting at the tail. Since array is sorted, we
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index cc22d2e87c..cea47bf6e5 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -997,8 +997,7 @@ StandbyAcquireAccessExclusiveLock(TransactionId xid, Oid dbOid, Oid relOid)
TransactionIdDidAbort(xid))
return;
- elog(trace_recovery(DEBUG4),
- "adding recovery lock: db %u rel %u", dbOid, relOid);
+ elog(DEBUG4, "adding recovery lock: db %u rel %u", dbOid, relOid);
/* dbOid is InvalidOid when we are locking a shared relation. */
Assert(OidIsValid(relOid));
@@ -1042,7 +1041,7 @@ StandbyReleaseXidEntryLocks(RecoveryLockXidEntry *xidentry)
{
LOCKTAG locktag;
- elog(trace_recovery(DEBUG4),
+ elog(DEBUG4,
"releasing recovery lock: xid %u db %u rel %u",
entry->key.xid, entry->key.dbOid, entry->key.relOid);
/* Release the lock ... */
@@ -1109,7 +1108,7 @@ StandbyReleaseAllLocks(void)
HASH_SEQ_STATUS status;
RecoveryLockXidEntry *entry;
- elog(trace_recovery(DEBUG2), "release all standby locks");
+ elog(DEBUG2, "release all standby locks");
hash_seq_init(&status, RecoveryLockXidHash);
while ((entry = hash_seq_search(&status)))
@@ -1369,7 +1368,7 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
recptr = XLogInsert(RM_STANDBY_ID, XLOG_RUNNING_XACTS);
if (CurrRunningXacts->subxid_overflow)
- elog(trace_recovery(DEBUG2),
+ elog(DEBUG2,
"snapshot of %d running transactions overflowed (lsn %X/%X oldest xid %u latest complete %u next xid %u)",
CurrRunningXacts->xcnt,
LSN_FORMAT_ARGS(recptr),
@@ -1377,7 +1376,7 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
CurrRunningXacts->latestCompletedXid,
CurrRunningXacts->nextXid);
else
- elog(trace_recovery(DEBUG2),
+ elog(DEBUG2,
"snapshot of %d+%d running transaction ids (lsn %X/%X oldest xid %u latest complete %u next xid %u)",
CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt,
LSN_FORMAT_ARGS(recptr),
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index a041d7d604..30d872dfde 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -966,13 +966,12 @@ ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs,
if (nmsgs <= 0)
return;
- elog(trace_recovery(DEBUG4), "replaying commit with %d messages%s", nmsgs,
+ elog(DEBUG4, "replaying commit with %d messages%s", nmsgs,
(RelcacheInitFileInval ? " and relcache file invalidation" : ""));
if (RelcacheInitFileInval)
{
- elog(trace_recovery(DEBUG4), "removing relcache init files for database %u",
- dbid);
+ elog(DEBUG4, "removing relcache init files for database %u", dbid);
/*
* RelationCacheInitFilePreInvalidate, when the invalidation message
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 6aeb855e49..6ea575a53b 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -3759,29 +3759,3 @@ write_stderr_signal_safe(const char *str)
nwritten += rc;
}
}
-
-
-/*
- * Adjust the level of a recovery-related message per trace_recovery_messages.
- *
- * The argument is the default log level of the message, eg, DEBUG2. (This
- * should only be applied to DEBUGn log messages, otherwise it's a no-op.)
- * If the level is >= trace_recovery_messages, we return LOG, causing the
- * message to be logged unconditionally (for most settings of
- * log_min_messages). Otherwise, we return the argument unchanged.
- * The message will then be shown based on the setting of log_min_messages.
- *
- * Intention is to keep this for at least the whole of the 9.0 production
- * release, so we can more easily diagnose production problems in the field.
- * It should go away eventually, though, because it's an ugly and
- * hard-to-explain kluge.
- */
-int
-trace_recovery(int trace_level)
-{
- if (trace_level < LOG &&
- trace_level >= trace_recovery_messages)
- return LOG;
-
- return trace_level;
-}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 6474e35ec0..f7c9882f7c 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -525,7 +525,6 @@ int log_parameter_max_length_on_error = 0;
int log_temp_files = -1;
double log_statement_sample_rate = 1.0;
double log_xact_sample_rate = 0;
-int trace_recovery_messages = LOG;
char *backtrace_functions;
int temp_file_limit = -1;
@@ -4780,23 +4779,6 @@ struct config_enum ConfigureNamesEnum[] =
NULL, NULL, NULL
},
- {
- {"trace_recovery_messages", PGC_SIGHUP, DEVELOPER_OPTIONS,
- gettext_noop("Enables logging of recovery-related debugging information."),
- gettext_noop("Each level includes all the levels that follow it. The later"
- " the level, the fewer messages are sent."),
- GUC_NOT_IN_SAMPLE,
- },
- &trace_recovery_messages,
-
- /*
- * client_message_level_options allows too many values, really, but
- * it's not worth having a separate options array for this.
- */
- LOG, client_message_level_options,
- NULL, NULL, NULL
- },
-
{
{"track_functions", PGC_SUSET, STATS_CUMULATIVE,
gettext_noop("Collects function-level statistics on database activity."),
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index f0cc651435..1043a4d782 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -302,10 +302,6 @@ extern void PreventCommandIfReadOnly(const char *cmdname);
extern void PreventCommandIfParallelMode(const char *cmdname);
extern void PreventCommandDuringRecovery(const char *cmdname);
-/* in utils/misc/guc_tables.c */
-extern PGDLLIMPORT int trace_recovery_messages;
-extern int trace_recovery(int trace_level);
-
/*****************************************************************************
* pdir.h -- *
* POSTGRES directory path definitions. *
--
2.34.1
On Thu, Dec 7, 2023 at 6:20 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
I'm just curious to know what sorts of problems WAL_DEBUG code helps
debug with. Is the WAL_DEBUG code (1) or (2) or (3) that helped you
the most? Is converting the LOG messages (3) to DEBUGX level going to
help in your case? Can you please throw some light on this?
I don't like the idea of removing WAL_DEBUG. I haven't used it in a
while, but I have used it, and it's been very helpful. You can for
example run a psql session and easily see what each command you type
generates in terms of WAL, without having to run pg_waldump over and
over and figure out which output is new. That's not something that I
need to do super-commonly, but I have wanted to do it for certain
projects, and I don't think that maintaining the WAL_DEBUG code in
tree is really causing us very much hassle. In fact, I'd argue that of
all of the various debugging bits that are part of the tree, WAL_DEBUG
is the most useful by a good margin. Things like OPTIMIZER_DEBUG and
LOCK_DEBUG seem to me to have much less utility. LOCK_DEBUG for
example produces a completely unworkable volume of output even for
very simple operations.
I've never been much of a believer in trace_recovery_messages, either,
but I'm somewhat sympathetic to the problem it's trying to solve. My
incremental backup patch set adds a new background process, and what
are you supposed to do if you have a problem with that process? You
can crank up the server debugging level overall, but what if you just
want the walsummarizer process, or say the walsummarizer and any
processes trying to take an incremental backup, to do super-detailed
logging? We don't have a model for that. I thought about adding
trace_walsummarizer_messages or debug_walsummarizer or something for
this exact reason, but I didn't because the existing
trace_recovery_messages setting seems like a kludge I don't want to
propagate. But that leaves me with nothing other than boosting up the
debug level for the whole server, which is an annoying thing to have
to do if you really only care about one subsystem.
I don't know what the right answer is exactly, but there should be
some way of telling the system something more fine-grained than "do
more logging" or "do a lot more logging" or "do really a lot more
logging" ...
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Dec 07, 2023 at 04:50:30PM +0530, Bharath Rupireddy wrote:
On Mon, Dec 4, 2023 at 12:37 AM Noah Misch <noah@leadboat.com> wrote:
On Sun, Dec 03, 2023 at 08:30:24PM +0530, Bharath Rupireddy wrote:
On Sun, Dec 3, 2023 at 4:00 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Sat, Dec 02, 2023 at 07:36:29PM +0530, Bharath Rupireddy wrote:
The interesting pieces that WAL_DEBUG code does are the following:
1. Decodes the WAL record right after it's written to WAL buffers in
XLogInsertRecord. What problem does it help to detect?
I think it helped me understand why a test case I was writing didn't reach the
bug I expected it to reach.
2. Emits a log message for every WAL record applied in the main redo
apply loop. Enabling this isn't cheap for sure even for developer
environments; I've observed a 10% increase in recovery test time)
3. Emits log messages for WAL writes/flushes and WAL buffer page
initializations. These messages don't have to be hiding under a macro,
but a DEBUGX level is sufficient.I have no idea if anyone uses WAL_DEBUG macro and wal_debug GUCs in
production, if we have somebody using it, I think we need to fix theI don't use it in production, but I use it more than any other of our many
DEBUG macros.I'm just curious to know what sorts of problems WAL_DEBUG code helps
debug with. Is the WAL_DEBUG code (1) or (2) or (3) that helped you
the most?
For me, (1) and (2) came up several times, and (3) came up once. I don't
remember which of (1) or (2) helped more.
Is converting the LOG messages (3) to DEBUGX level going to
help in your case?
Not in my case.
On Thu, Dec 07, 2023 at 05:29:55PM +0530, Bharath Rupireddy wrote:
The comment atop trace_recovery [1] function says it should go away
eventually and seems to have served the purpose when the recovery
related code was introduced in PG 9.0.FWIW, the attached patch is what I've left with after removing
trace_recovery_messages related code, 9 files changed, 19
insertions(+), 97 deletions(-).
Looks acceptable to me. Does somebody object to this removal?
--
Michael