Using XLogFileNameP in critical section

Started by Masahiko Sawadaabout 6 years ago12 messages
#1Masahiko Sawada
masahiko.sawada@2ndquadrant.com
1 attachment(s)

Hi,

I encountered that the assertion error is reported instead of a proper
PANIC message when failed to fsync WAL. The cause is that there are
multiple places where we call XLogFileNameP function that calls palloc
during critical section, for example XLogWrite function.

TRAP: FailedAssertion("CritSectionCount == 0 ||
(context)->allowInCritSection", File: "mcxt.c", Line: 956)

As far as I can see there are five places we need to fix.I've attached a patch.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

fix_XLogFileNameP.patchapplication/x-patch; name=fix_XLogFileNameP.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5f0ee50092..436409bca1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2499,14 +2499,21 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 				pgstat_report_wait_end();
 				if (written <= 0)
 				{
+					char xlogfname[MAXFNAMELEN];
+					int	save_errno;
+
 					if (errno == EINTR)
 						continue;
+
+					save_errno = errno;
+					XLogFileName(xlogfname, ThisTimeLineID, openLogSegNo,
+								 wal_segment_size);
+					errno = save_errno;
 					ereport(PANIC,
 							(errcode_for_file_access(),
 							 errmsg("could not write to log file %s "
 									"at offset %u, length %zu: %m",
-									XLogFileNameP(ThisTimeLineID, openLogSegNo),
-									startoffset, nleft)));
+									xlogfname, startoffset, nleft)));
 				}
 				nleft -= written;
 				from += written;
@@ -3792,10 +3799,17 @@ XLogFileClose(void)
 #endif
 
 	if (close(openLogFile) != 0)
+	{
+		char xlogfname[MAXFNAMELEN];
+		int	save_errno = errno;
+
+		XLogFileName(xlogfname, ThisTimeLineID, openLogSegNo);
+		errno = save_errno;
 		ereport(PANIC,
 				(errcode_for_file_access(),
-				 errmsg("could not close file \"%s\": %m",
-						XLogFileNameP(ThisTimeLineID, openLogSegNo))));
+				 errmsg("could not close file \"%s\": %m", xlogfname)));
+	}
+
 	openLogFile = -1;
 }
 
@@ -10100,32 +10114,25 @@ assign_xlog_sync_method(int new_sync_method, void *extra)
 void
 issue_xlog_fsync(int fd, XLogSegNo segno)
 {
+	char *msg = NULL;
+
 	pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
 	switch (sync_method)
 	{
 		case SYNC_METHOD_FSYNC:
 			if (pg_fsync_no_writethrough(fd) != 0)
-				ereport(PANIC,
-						(errcode_for_file_access(),
-						 errmsg("could not fsync file \"%s\": %m",
-								XLogFileNameP(ThisTimeLineID, segno))));
+				msg = "could not fsync file \"%s\": %m";
 			break;
 #ifdef HAVE_FSYNC_WRITETHROUGH
 		case SYNC_METHOD_FSYNC_WRITETHROUGH:
 			if (pg_fsync_writethrough(fd) != 0)
-				ereport(PANIC,
-						(errcode_for_file_access(),
-						 errmsg("could not fsync write-through file \"%s\": %m",
-								XLogFileNameP(ThisTimeLineID, segno))));
+				msg = "could not fsync write-through file \"%s\": %m";
 			break;
 #endif
 #ifdef HAVE_FDATASYNC
 		case SYNC_METHOD_FDATASYNC:
 			if (pg_fdatasync(fd) != 0)
-				ereport(PANIC,
-						(errcode_for_file_access(),
-						 errmsg("could not fdatasync file \"%s\": %m",
-								XLogFileNameP(ThisTimeLineID, segno))));
+				msg = "could not fdatasync file \"%s\": %m";
 			break;
 #endif
 		case SYNC_METHOD_OPEN:
@@ -10136,6 +10143,21 @@ issue_xlog_fsync(int fd, XLogSegNo segno)
 			elog(PANIC, "unrecognized wal_sync_method: %d", sync_method);
 			break;
 	}
+
+	/* PANIC if failed to fsync */
+	if (msg)
+	{
+		char	xlogfname[MAXFNAMELEN];
+		int		save_errno = errno;
+
+		XLogFileName(xlogfname, ThisTimeLineID, openLogSegNo,
+					 wal_segment_size);
+		errno = save_errno;
+		ereport(PANIC,
+				(errcode_for_file_access(),
+				 errmsg(msg, xlogfname)));
+	}
+
 	pgstat_report_wait_end();
 }
 
#2Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#1)
Re: Using XLogFileNameP in critical section

On Fri, Nov 29, 2019 at 06:44:58PM +0100, Masahiko Sawada wrote:

I encountered that the assertion error is reported instead of a proper
PANIC message when failed to fsync WAL. The cause is that there are
multiple places where we call XLogFileNameP function that calls palloc
during critical section, for example XLogWrite function.

TRAP: FailedAssertion("CritSectionCount == 0 ||
(context)->allowInCritSection", File: "mcxt.c", Line: 956)

As far as I can see there are five places we need to fix. I've
attached a patch.

+ msg = "could not fdatasync file \"%s\": %m";
Missing some translations, no?

You are missing a couple of code paths in walreceiver.c,
XLogWalRcvWrite(), where XLogFileNameP is used on a PANIC. This
brings me the following points:
1) If you look closely, all the callers of XLogFileNameP() are used
for the generation of error strings.
2) I am ready to bet that we'll have the same discussion in the future
because somebody will make the same mistake for a new code path.

I think that we had better just get rid of XLogFileNameP() (on HEAD)
and just change those code paths so as they use a buffer of size
MAXFNAMELEN, with XLogFileName() generating the file name. This leads
actually to some simplifications, see for example XLogWalRcvWrite..
--
Michael

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#2)
Re: Using XLogFileNameP in critical section

On 2019-Dec-02, Michael Paquier wrote:

I think that we had better just get rid of XLogFileNameP() (on HEAD)
and just change those code paths so as they use a buffer of size
MAXFNAMELEN, with XLogFileName() generating the file name. This leads
actually to some simplifications, see for example XLogWalRcvWrite..

+1, that function is a footgun and there's no possible way to fix it so
that it's never misused(*). Also, put a comment atop XLogFileName
indicating never to create such a "helper" function again.

(*) unless there was a way to have a StaticAssert() that the function is
not called in a critical section -- have the compiler work for us. I
don't think it's possible to write such a thing.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Alvaro Herrera (#3)
1 attachment(s)
Re: Using XLogFileNameP in critical section

On Mon, 2 Dec 2019 at 16:56, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2019-Dec-02, Michael Paquier wrote:

I think that we had better just get rid of XLogFileNameP() (on HEAD)
and just change those code paths so as they use a buffer of size
MAXFNAMELEN, with XLogFileName() generating the file name. This leads
actually to some simplifications, see for example XLogWalRcvWrite..

+1, that function is a footgun and there's no possible way to fix it so
that it's never misused(*). Also, put a comment atop XLogFileName
indicating never to create such a "helper" function again.

(*) unless there was a way to have a StaticAssert() that the function is
not called in a critical section -- have the compiler work for us. I
don't think it's possible to write such a thing.

Agreed. I've attached the updated version patch. Please review it.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

fix_XLogFileNameP_v2.patchapplication/octet-stream; name=fix_XLogFileNameP_v2.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5f0ee50092..fb1262a175 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2499,14 +2499,21 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 				pgstat_report_wait_end();
 				if (written <= 0)
 				{
+					char xlogfname[MAXFNAMELEN];
+					int	save_errno;
+
 					if (errno == EINTR)
 						continue;
+
+					save_errno = errno;
+					XLogFileName(xlogfname, ThisTimeLineID, openLogSegNo,
+								 wal_segment_size);
+					errno = save_errno;
 					ereport(PANIC,
 							(errcode_for_file_access(),
 							 errmsg("could not write to log file %s "
 									"at offset %u, length %zu: %m",
-									XLogFileNameP(ThisTimeLineID, openLogSegNo),
-									startoffset, nleft)));
+									xlogfname, startoffset, nleft)));
 				}
 				nleft -= written;
 				from += written;
@@ -3792,10 +3799,17 @@ XLogFileClose(void)
 #endif
 
 	if (close(openLogFile) != 0)
+	{
+		char xlogfname[MAXFNAMELEN];
+		int	save_errno = errno;
+
+		XLogFileName(xlogfname, ThisTimeLineID, openLogSegNo, wal_segment_size);
+		errno = save_errno;
 		ereport(PANIC,
 				(errcode_for_file_access(),
-				 errmsg("could not close file \"%s\": %m",
-						XLogFileNameP(ThisTimeLineID, openLogSegNo))));
+				 errmsg("could not close file \"%s\": %m", xlogfname)));
+	}
+
 	openLogFile = -1;
 }
 
@@ -5510,10 +5524,17 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 		fd = XLogFileInit(startLogSegNo, &use_existent, true);
 
 		if (close(fd) != 0)
+		{
+			char xlogfname[MAXFNAMELEN];
+			int	save_errno = errno;
+
+			XLogFileName(xlogfname, ThisTimeLineID, openLogSegNo,
+						 wal_segment_size);
+			errno = save_errno;
 			ereport(ERROR,
 					(errcode_for_file_access(),
-					 errmsg("could not close file \"%s\": %m",
-							XLogFileNameP(ThisTimeLineID, startLogSegNo))));
+					 errmsg("could not close file \"%s\": %m", xlogfname)));
+		}
 	}
 
 	/*
@@ -10079,10 +10100,18 @@ assign_xlog_sync_method(int new_sync_method, void *extra)
 		{
 			pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC_METHOD_ASSIGN);
 			if (pg_fsync(openLogFile) != 0)
+			{
+				char xlogfname[MAXFNAMELEN];
+				int	save_errno;
+
+				save_errno = errno;
+				XLogFileName(xlogfname, ThisTimeLineID, openLogSegNo,
+							 wal_segment_size);
 				ereport(PANIC,
 						(errcode_for_file_access(),
-						 errmsg("could not fsync file \"%s\": %m",
-								XLogFileNameP(ThisTimeLineID, openLogSegNo))));
+						 errmsg("could not fsync file \"%s\": %m", xlogfname)));
+			}
+
 			pgstat_report_wait_end();
 			if (get_sync_bit(sync_method) != get_sync_bit(new_sync_method))
 				XLogFileClose();
@@ -10100,32 +10129,25 @@ assign_xlog_sync_method(int new_sync_method, void *extra)
 void
 issue_xlog_fsync(int fd, XLogSegNo segno)
 {
+	char *msg = NULL;
+
 	pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
 	switch (sync_method)
 	{
 		case SYNC_METHOD_FSYNC:
 			if (pg_fsync_no_writethrough(fd) != 0)
-				ereport(PANIC,
-						(errcode_for_file_access(),
-						 errmsg("could not fsync file \"%s\": %m",
-								XLogFileNameP(ThisTimeLineID, segno))));
+				msg = _("could not fsync file \"%s\": %m");
 			break;
 #ifdef HAVE_FSYNC_WRITETHROUGH
 		case SYNC_METHOD_FSYNC_WRITETHROUGH:
 			if (pg_fsync_writethrough(fd) != 0)
-				ereport(PANIC,
-						(errcode_for_file_access(),
-						 errmsg("could not fsync write-through file \"%s\": %m",
-								XLogFileNameP(ThisTimeLineID, segno))));
+				msg = _("could not fsync write-through file \"%s\": %m");
 			break;
 #endif
 #ifdef HAVE_FDATASYNC
 		case SYNC_METHOD_FDATASYNC:
 			if (pg_fdatasync(fd) != 0)
-				ereport(PANIC,
-						(errcode_for_file_access(),
-						 errmsg("could not fdatasync file \"%s\": %m",
-								XLogFileNameP(ThisTimeLineID, segno))));
+				msg = _("could not fdatasync file \"%s\": %m");
 			break;
 #endif
 		case SYNC_METHOD_OPEN:
@@ -10136,19 +10158,22 @@ issue_xlog_fsync(int fd, XLogSegNo segno)
 			elog(PANIC, "unrecognized wal_sync_method: %d", sync_method);
 			break;
 	}
-	pgstat_report_wait_end();
-}
 
-/*
- * Return the filename of given log segment, as a palloc'd string.
- */
-char *
-XLogFileNameP(TimeLineID tli, XLogSegNo segno)
-{
-	char	   *result = palloc(MAXFNAMELEN);
+	/* PANIC if failed to fsync */
+	if (msg)
+	{
+		char	xlogfname[MAXFNAMELEN];
+		int		save_errno = errno;
 
-	XLogFileName(result, tli, segno, wal_segment_size);
-	return result;
+		XLogFileName(xlogfname, ThisTimeLineID, openLogSegNo,
+					 wal_segment_size);
+		errno = save_errno;
+		ereport(PANIC,
+				(errcode_for_file_access(),
+				 errmsg(msg, xlogfname)));
+	}
+
+	pgstat_report_wait_end();
 }
 
 /*
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 446760ed6e..ca73ef5131 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -944,7 +944,9 @@ void
 WALReadRaiseError(WALReadError *errinfo)
 {
 	WALOpenSegment *seg = &errinfo->wre_seg;
-	char	   *fname = XLogFileNameP(seg->ws_tli, seg->ws_segno);
+	char	   fname[MAXFNAMELEN];
+
+	XLogFileName(fname, seg->ws_tli, seg->ws_segno, wal_segment_size);
 
 	if (errinfo->wre_read < 0)
 	{
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 72acb10767..9214ca6526 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -576,17 +576,17 @@ WalReceiverMain(void)
 			char		xlogfname[MAXFNAMELEN];
 
 			XLogWalRcvFlush(false);
+			XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
 			if (close(recvFile) != 0)
 				ereport(PANIC,
 						(errcode_for_file_access(),
 						 errmsg("could not close log segment %s: %m",
-								XLogFileNameP(recvFileTLI, recvSegNo))));
+								xlogfname)));
 
 			/*
 			 * Create .done file forcibly to prevent the streamed segment from
 			 * being archived later.
 			 */
-			XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
 			if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS)
 				XLogArchiveForceDone(xlogfname);
 			else
@@ -900,6 +900,8 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 
 				XLogWalRcvFlush(false);
 
+				XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
+
 				/*
 				 * XLOG segment files will be re-read by recovery in startup
 				 * process soon, so we don't advise the OS to release cache
@@ -909,13 +911,12 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 					ereport(PANIC,
 							(errcode_for_file_access(),
 							 errmsg("could not close log segment %s: %m",
-									XLogFileNameP(recvFileTLI, recvSegNo))));
+									xlogfname)));
 
 				/*
 				 * Create .done file forcibly to prevent the streamed segment
 				 * from being archived later.
 				 */
-				XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
 				if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS)
 					XLogArchiveForceDone(xlogfname);
 				else
@@ -943,11 +944,18 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 		if (recvOff != startoff)
 		{
 			if (lseek(recvFile, (off_t) startoff, SEEK_SET) < 0)
+			{
+				char xlogfname[MAXFNAMELEN];
+				int	save_errno = errno;
+
+				XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
+				errno = save_errno;
 				ereport(PANIC,
 						(errcode_for_file_access(),
 						 errmsg("could not seek in log segment %s to offset %u: %m",
-								XLogFileNameP(recvFileTLI, recvSegNo),
-								startoff)));
+								xlogfname, startoff)));
+			}
+
 			recvOff = startoff;
 		}
 
@@ -957,15 +965,21 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 		byteswritten = write(recvFile, buf, segbytes);
 		if (byteswritten <= 0)
 		{
+			char xlogfname[MAXFNAMELEN];
+			int	save_errno;
+
 			/* if write didn't set errno, assume no disk space */
 			if (errno == 0)
 				errno = ENOSPC;
+
+			save_errno = errno;
+			XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
+			errno = save_errno;
 			ereport(PANIC,
 					(errcode_for_file_access(),
 					 errmsg("could not write to log segment %s "
 							"at offset %u, length %lu: %m",
-							XLogFileNameP(recvFileTLI, recvSegNo),
-							recvOff, (unsigned long) segbytes)));
+							xlogfname, recvOff, (unsigned long) segbytes)));
 		}
 
 		/* Update state for write */
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 8bafa65e50..5fcdec5f21 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2434,10 +2434,17 @@ WalSndSegmentOpen(XLogSegNo nextSegNo, WALSegmentContext *segcxt,
 	 * too old WAL segment that has already been removed or recycled.
 	 */
 	if (errno == ENOENT)
+	{
+		char xlogfname[MAXFNAMELEN];
+		int	save_errno = errno;
+
+		XLogFileName(xlogfname, *tli_p, nextSegNo, wal_segment_size);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("requested WAL segment %s has already been removed",
-						XLogFileNameP(*tli_p, nextSegNo))));
+						xlogfname)));
+	}
 	else
 		ereport(ERROR,
 				(errcode_for_file_access(),
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..9b588c87a5 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -288,7 +288,6 @@ extern bool RecoveryIsPaused(void);
 extern void SetRecoveryPause(bool recoveryPause);
 extern TimestampTz GetLatestXTime(void);
 extern TimestampTz GetCurrentChunkReplayStartTime(void);
-extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index e7e10beb4c..ba8e0f8d9f 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -152,6 +152,12 @@ typedef XLogLongPageHeaderData *XLogLongPageHeader;
 /* Length of XLog file name */
 #define XLOG_FNAME_LEN	   24
 
+/*
+ * Write xlog file name to fname. We used to have a helper function that returns
+ * the xlog file name as a palloc'd string. But we will never need such a function
+ * because it cannot be used during the critical section and it is hard to fix so
+ * that it is never misused.
+ */
 #define XLogFileName(fname, tli, logSegNo, wal_segsz_bytes)	\
 	snprintf(fname, MAXFNAMELEN, "%08X%08X%08X", tli,		\
 			 (uint32) ((logSegNo) / XLogSegmentsPerXLogId(wal_segsz_bytes)), \
#5Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#4)
Re: Using XLogFileNameP in critical section

On Mon, Dec 02, 2019 at 10:14:58PM +0100, Masahiko Sawada wrote:

Agreed. I've attached the updated version patch. Please review it.

Thanks, applied on HEAD after a few edits. gettext() does not set
errno, so the new style of issue_xlog_fsync() is actually fine.
Please note that there was one mistake in the patch: you forgot to
assign back errno in assign_xlog_sync_method() after generating the
file name.

Per the low probability of the failures, I did not backpatch that
stuff. I quickly looked at applying that further down, and attached
is a version for v12 FWIW, and I suspect much more conflicts the more
you go down (wal segment size added in 11, different code paths for
replication, etc.).
--
Michael

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#5)
Re: Using XLogFileNameP in critical section

On 2019-Dec-03, Michael Paquier wrote:

Per the low probability of the failures, I did not backpatch that
stuff. I quickly looked at applying that further down, and attached
is a version for v12 FWIW, and I suspect much more conflicts the more
you go down (wal segment size added in 11, different code paths for
replication, etc.).

You didn't attach anything, but I concur about the low probability
aspect: the assertion failure does not occur in production builds
(obviously); and only an out-of-memory situation is a real problem when
an fsync fails. Anyway this should be a very localized fix, right?

I'm not sure that the internationalization stuff in issue_xlog_fsync is
correct. I think the _() should be gettext_noop(), or alternatively the
errmsg() should be errmsg_internal(); otherwise the translation is
invoked twice. (I didn't verify this.)

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#6)
Re: Using XLogFileNameP in critical section

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I'm not sure that the internationalization stuff in issue_xlog_fsync is
correct. I think the _() should be gettext_noop(), or alternatively the
errmsg() should be errmsg_internal(); otherwise the translation is
invoked twice. (I didn't verify this.)

Also, buildfarm member drongo is not happy:

postgres.def : error LNK2001: unresolved external symbol XLogFileNameP [C:\prog\bf\root\HEAD\pgsql.build\postgres.vcxproj]
Release/postgres/postgres.lib : fatal error LNK1120: 1 unresolved externals [C:\prog\bf\root\HEAD\pgsql.build\postgres.vcxproj]

I'm guessing you missed a reference someplace.

regards, tom lane

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#7)
Re: Using XLogFileNameP in critical section

I wrote:

Also, buildfarm member drongo is not happy:
postgres.def : error LNK2001: unresolved external symbol XLogFileNameP [C:\prog\bf\root\HEAD\pgsql.build\postgres.vcxproj]
Release/postgres/postgres.lib : fatal error LNK1120: 1 unresolved externals [C:\prog\bf\root\HEAD\pgsql.build\postgres.vcxproj]
I'm guessing you missed a reference someplace.

Hm ... grep swears up and down that there is no remaining instance
of the string "XLogFileNameP" anywhere in the tree. So this doesn't
seem to be the fault of 9989d37d1 per se. What my eye now falls on
is this, a bit further up in the build log [1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&amp;dt=2019-12-03%2007%3A30%3A01:

...
PreLinkEvent:
Generate DEF file
perl src\tools\msvc\gendef.pl Release\postgres x64
:VCEnd
Not re-generating POSTGRES.DEF, file already exists.
Link:
...

So it seems that the problem might really be a faulty rule in our
MSVC build script about when postgres.def needs to be regenerated?
Or else it's some weird caching problem on drongo --- the lack of
complaints from other Windows critters might point the finger
that way.

regards, tom lane

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&amp;dt=2019-12-03%2007%3A30%3A01

#9Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Michael Paquier (#5)
Re: Using XLogFileNameP in critical section

On Tue, 3 Dec 2019 at 07:09, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Dec 02, 2019 at 10:14:58PM +0100, Masahiko Sawada wrote:

Agreed. I've attached the updated version patch. Please review it.

Thanks, applied on HEAD after a few edits. gettext() does not set
errno, so the new style of issue_xlog_fsync() is actually fine.

Thanks!

Please note that there was one mistake in the patch: you forgot to
assign back errno in assign_xlog_sync_method() after generating the
file name.

My bad. Thank you for fixing it.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#8)
Re: Using XLogFileNameP in critical section

On Tue, Dec 03, 2019 at 11:24:57AM -0500, Tom Lane wrote:

So it seems that the problem might really be a faulty rule in our
MSVC build script about when postgres.def needs to be regenerated?
Or else it's some weird caching problem on drongo --- the lack of
complaints from other Windows critters might point the finger
that way.

Yes, I saw the failure from the buildfarm logs, but I got to the
conclusion that the animal just got crazy with a portion of its
caching because there are no more references to routine removed. So I
did not bother much.

FWIW, I have seen sometimes similar warnings regarding conflicting
projects when doing a compilation on MSVC, applying a patch and then
recompiling (because that's just faster than recompiling the whole).
Perhaps we could do better here. I am not completely sure what
though, it's not a problem I have spent much brain-time on.
--
Michael

#11Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#6)
2 attachment(s)
Re: Using XLogFileNameP in critical section

On Tue, Dec 03, 2019 at 09:35:00AM -0300, Alvaro Herrera wrote:

You didn't attach anything, but I concur about the low probability
aspect: the assertion failure does not occur in production builds
(obviously); and only an out-of-memory situation is a real problem
when
an fsync fails. Anyway this should be a very localized fix, right?

Sorry. You get something like the attached. The recent refactoring
work you committed in this area causes already conflicts on
REL_12_STABLE.

I'm not sure that the internationalization stuff in issue_xlog_fsync
is correct. I think the _() should be gettext_noop(), or alternatively
the errmsg() should be errmsg_internal(); otherwise the translation is
invoked twice. (I didn't verify this.)

Hmm. We actually do both in tablecmds.c:ATWrongRelkindError(), and
that's the code I was looking at yesterday when thinking about the
problem.. However, parse_agg.c, parse_expr.c and parse_func.c among
others like vacuumlazy.c use directly errmsg_internal() without
translating the string first. So there is indeed duplicated work for
both. Does the attached patch look correct to you?
--
Michael

Attachments:

gettext-fix.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6bc1a6b46d..01ca07b552 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10171,7 +10171,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno)
 		errno = save_errno;
 		ereport(PANIC,
 				(errcode_for_file_access(),
-				 errmsg(msg, xlogfname)));
+				 errmsg_internal(msg, xlogfname)));
 	}
 
 	pgstat_report_wait_end();
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 5440eb9015..3d98af47b1 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5230,7 +5230,7 @@ ATWrongRelkindError(Relation rel, int allowed_targets)
 
 	ereport(ERROR,
 			(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-			 errmsg(msg, RelationGetRelationName(rel))));
+			 errmsg_internal(msg, RelationGetRelationName(rel))));
 }
 
 /*
0001-Remove-use-of-XLogFileNameP-in-error-string-generati-v12.patchtext/x-diff; charset=iso-8859-1Download
From 491faa2e92b5f637bb8b76eb1a9c2e93a6eec3a6 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 4 Dec 2019 13:57:18 +0900
Subject: [PATCH] Remove use of XLogFileNameP() in error string generation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

XLogFileNameP() is a wrapper routine able to build a palloc'd string for
a WAL segment name, which is used in error strings.  There were several
code paths where it is called in a critical section, where memory
allocation is not allowed, triggering an assertion failure on PANIC
instead of generating a proper message.  Note that this could be a
problem on OOM when calling the routine, as any failure would be
escalated to a PANIC as well.  This removes all the callers of this
routine, to fix existing mistakes and prevent future ones, replacing the
incorrect locations with a logic using a fixed-size buffer.

Author: Masahiko Sawada
Reviewed-by: Michael Paquier, Álvaro Herrera
Discussion: https://postgr.es/m/CA+fd4k5gC9H4uoWMLg9K_QfNrnkkdEw+-AFveob9YX7z8JnKTA@mail.gmail.com
---
 src/backend/access/transam/xlog.c     | 78 ++++++++++++++++++++-------
 src/backend/replication/walreceiver.c | 30 ++++++++---
 src/backend/replication/walsender.c   | 39 +++++++++++---
 3 files changed, 112 insertions(+), 35 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1a87042223..58c4ed9083 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2499,14 +2499,21 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 				pgstat_report_wait_end();
 				if (written <= 0)
 				{
+					char		xlogfname[MAXFNAMELEN];
+					int			save_errno;
+
 					if (errno == EINTR)
 						continue;
+
+					save_errno = errno;
+					XLogFileName(xlogfname, ThisTimeLineID, openLogSegNo,
+								 wal_segment_size);
+					errno = save_errno;
 					ereport(PANIC,
 							(errcode_for_file_access(),
 							 errmsg("could not write to log file %s "
 									"at offset %u, length %zu: %m",
-									XLogFileNameP(ThisTimeLineID, openLogSegNo),
-									startoffset, nleft)));
+									xlogfname, startoffset, nleft)));
 				}
 				nleft -= written;
 				from += written;
@@ -3792,10 +3799,17 @@ XLogFileClose(void)
 #endif
 
 	if (close(openLogFile))
+	{
+		char		xlogfname[MAXFNAMELEN];
+		int			save_errno = errno;
+
+		XLogFileName(xlogfname, ThisTimeLineID, openLogSegNo, wal_segment_size);
+		errno = save_errno;
 		ereport(PANIC,
 				(errcode_for_file_access(),
-				 errmsg("could not close file \"%s\": %m",
-						XLogFileNameP(ThisTimeLineID, openLogSegNo))));
+				 errmsg("could not close file \"%s\": %m", xlogfname)));
+	}
+
 	openLogFile = -1;
 }
 
@@ -5527,10 +5541,17 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 		fd = XLogFileInit(startLogSegNo, &use_existent, true);
 
 		if (close(fd))
+		{
+			char		xlogfname[MAXFNAMELEN];
+			int			save_errno = errno;
+
+			XLogFileName(xlogfname, ThisTimeLineID, startLogSegNo,
+						 wal_segment_size);
+			errno = save_errno;
 			ereport(ERROR,
 					(errcode_for_file_access(),
-					 errmsg("could not close file \"%s\": %m",
-							XLogFileNameP(ThisTimeLineID, startLogSegNo))));
+					 errmsg("could not close file \"%s\": %m", xlogfname)));
+		}
 	}
 
 	/*
@@ -10078,10 +10099,19 @@ assign_xlog_sync_method(int new_sync_method, void *extra)
 		{
 			pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC_METHOD_ASSIGN);
 			if (pg_fsync(openLogFile) != 0)
+			{
+				char		xlogfname[MAXFNAMELEN];
+				int			save_errno;
+
+				save_errno = errno;
+				XLogFileName(xlogfname, ThisTimeLineID, openLogSegNo,
+							 wal_segment_size);
+				errno = save_errno;
 				ereport(PANIC,
 						(errcode_for_file_access(),
-						 errmsg("could not fsync file \"%s\": %m",
-								XLogFileNameP(ThisTimeLineID, openLogSegNo))));
+						 errmsg("could not fsync file \"%s\": %m", xlogfname)));
+			}
+
 			pgstat_report_wait_end();
 			if (get_sync_bit(sync_method) != get_sync_bit(new_sync_method))
 				XLogFileClose();
@@ -10099,32 +10129,25 @@ assign_xlog_sync_method(int new_sync_method, void *extra)
 void
 issue_xlog_fsync(int fd, XLogSegNo segno)
 {
+	char	   *msg = NULL;
+
 	pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
 	switch (sync_method)
 	{
 		case SYNC_METHOD_FSYNC:
 			if (pg_fsync_no_writethrough(fd) != 0)
-				ereport(PANIC,
-						(errcode_for_file_access(),
-						 errmsg("could not fsync file \"%s\": %m",
-								XLogFileNameP(ThisTimeLineID, segno))));
+				msg = _("could not fsync file \"%s\": %m");
 			break;
 #ifdef HAVE_FSYNC_WRITETHROUGH
 		case SYNC_METHOD_FSYNC_WRITETHROUGH:
 			if (pg_fsync_writethrough(fd) != 0)
-				ereport(PANIC,
-						(errcode_for_file_access(),
-						 errmsg("could not fsync write-through file \"%s\": %m",
-								XLogFileNameP(ThisTimeLineID, segno))));
+				msg = _("could not fsync write-through file \"%s\": %m");
 			break;
 #endif
 #ifdef HAVE_FDATASYNC
 		case SYNC_METHOD_FDATASYNC:
 			if (pg_fdatasync(fd) != 0)
-				ereport(PANIC,
-						(errcode_for_file_access(),
-						 errmsg("could not fdatasync file \"%s\": %m",
-								XLogFileNameP(ThisTimeLineID, segno))));
+				msg = _("could not fdatasync file \"%s\": %m");
 			break;
 #endif
 		case SYNC_METHOD_OPEN:
@@ -10135,6 +10158,21 @@ issue_xlog_fsync(int fd, XLogSegNo segno)
 			elog(PANIC, "unrecognized wal_sync_method: %d", sync_method);
 			break;
 	}
+
+	/* PANIC if failed to fsync */
+	if (msg)
+	{
+		char		xlogfname[MAXFNAMELEN];
+		int			save_errno = errno;
+
+		XLogFileName(xlogfname, ThisTimeLineID, segno,
+					 wal_segment_size);
+		errno = save_errno;
+		ereport(PANIC,
+				(errcode_for_file_access(),
+				 errmsg(msg, xlogfname)));
+	}
+
 	pgstat_report_wait_end();
 }
 
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 6abc780778..dbba96af6c 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -576,17 +576,17 @@ WalReceiverMain(void)
 			char		xlogfname[MAXFNAMELEN];
 
 			XLogWalRcvFlush(false);
+			XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
 			if (close(recvFile) != 0)
 				ereport(PANIC,
 						(errcode_for_file_access(),
 						 errmsg("could not close log segment %s: %m",
-								XLogFileNameP(recvFileTLI, recvSegNo))));
+								xlogfname)));
 
 			/*
 			 * Create .done file forcibly to prevent the streamed segment from
 			 * being archived later.
 			 */
-			XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
 			if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS)
 				XLogArchiveForceDone(xlogfname);
 			else
@@ -911,6 +911,8 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 
 				XLogWalRcvFlush(false);
 
+				XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
+
 				/*
 				 * XLOG segment files will be re-read by recovery in startup
 				 * process soon, so we don't advise the OS to release cache
@@ -920,13 +922,12 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 					ereport(PANIC,
 							(errcode_for_file_access(),
 							 errmsg("could not close log segment %s: %m",
-									XLogFileNameP(recvFileTLI, recvSegNo))));
+									xlogfname)));
 
 				/*
 				 * Create .done file forcibly to prevent the streamed segment
 				 * from being archived later.
 				 */
-				XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
 				if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS)
 					XLogArchiveForceDone(xlogfname);
 				else
@@ -954,11 +955,18 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 		if (recvOff != startoff)
 		{
 			if (lseek(recvFile, (off_t) startoff, SEEK_SET) < 0)
+			{
+				char		xlogfname[MAXFNAMELEN];
+				int			save_errno = errno;
+
+				XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
+				errno = save_errno;
 				ereport(PANIC,
 						(errcode_for_file_access(),
 						 errmsg("could not seek in log segment %s to offset %u: %m",
-								XLogFileNameP(recvFileTLI, recvSegNo),
-								startoff)));
+								xlogfname, startoff)));
+			}
+
 			recvOff = startoff;
 		}
 
@@ -968,15 +976,21 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 		byteswritten = write(recvFile, buf, segbytes);
 		if (byteswritten <= 0)
 		{
+			char		xlogfname[MAXFNAMELEN];
+			int			save_errno;
+
 			/* if write didn't set errno, assume no disk space */
 			if (errno == 0)
 				errno = ENOSPC;
+
+			save_errno = errno;
+			XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
+			errno = save_errno;
 			ereport(PANIC,
 					(errcode_for_file_access(),
 					 errmsg("could not write to log segment %s "
 							"at offset %u, length %lu: %m",
-							XLogFileNameP(recvFileTLI, recvSegNo),
-							recvOff, (unsigned long) segbytes)));
+							xlogfname, recvOff, (unsigned long) segbytes)));
 		}
 
 		/* Update state for write */
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 356073ed4a..813d7d92cc 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2440,10 +2440,19 @@ retry:
 				 * removed or recycled.
 				 */
 				if (errno == ENOENT)
+				{
+					char		xlogfname[MAXFNAMELEN];
+					int			save_errno = errno;
+
+					XLogFileName(xlogfname, curFileTimeLine, sendSegNo,
+								 wal_segment_size);
+					errno = save_errno;
+
 					ereport(ERROR,
 							(errcode_for_file_access(),
 							 errmsg("requested WAL segment %s has already been removed",
-									XLogFileNameP(curFileTimeLine, sendSegNo))));
+									xlogfname)));
+				}
 				else
 					ereport(ERROR,
 							(errcode_for_file_access(),
@@ -2457,11 +2466,18 @@ retry:
 		if (sendOff != startoff)
 		{
 			if (lseek(sendFile, (off_t) startoff, SEEK_SET) < 0)
+			{
+				char		xlogfname[MAXFNAMELEN];
+				int			save_errno = errno;
+
+				XLogFileName(xlogfname, curFileTimeLine, sendSegNo,
+							 wal_segment_size);
+				errno = save_errno;
 				ereport(ERROR,
 						(errcode_for_file_access(),
 						 errmsg("could not seek in log segment %s to offset %u: %m",
-								XLogFileNameP(curFileTimeLine, sendSegNo),
-								startoff)));
+								xlogfname, startoff)));
+			}
 			sendOff = startoff;
 		}
 
@@ -2476,19 +2492,28 @@ retry:
 		pgstat_report_wait_end();
 		if (readbytes < 0)
 		{
+			char		xlogfname[MAXFNAMELEN];
+			int			save_errno = errno;
+
+			XLogFileName(xlogfname, curFileTimeLine, sendSegNo,
+						 wal_segment_size);
+			errno = save_errno;
+
 			ereport(ERROR,
 					(errcode_for_file_access(),
 					 errmsg("could not read from log segment %s, offset %u, length %zu: %m",
-							XLogFileNameP(curFileTimeLine, sendSegNo),
-							sendOff, (Size) segbytes)));
+							xlogfname, sendOff, (Size) segbytes)));
 		}
 		else if (readbytes == 0)
 		{
+			char		xlogfname[MAXFNAMELEN];
+
+			XLogFileName(xlogfname, curFileTimeLine, sendSegNo,
+						 wal_segment_size);
 			ereport(ERROR,
 					(errcode(ERRCODE_DATA_CORRUPTED),
 					 errmsg("could not read from log segment %s, offset %u: read %d of %zu",
-							XLogFileNameP(curFileTimeLine, sendSegNo),
-							sendOff, readbytes, (Size) segbytes)));
+							xlogfname, sendOff, readbytes, (Size) segbytes)));
 		}
 
 		/* Update state for read */
-- 
2.24.0

#12Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#8)
Re: Using XLogFileNameP in critical section

On 12/3/19 11:24 AM, Tom Lane wrote:

I wrote:

Also, buildfarm member drongo is not happy:
postgres.def : error LNK2001: unresolved external symbol XLogFileNameP [C:\prog\bf\root\HEAD\pgsql.build\postgres.vcxproj]
Release/postgres/postgres.lib : fatal error LNK1120: 1 unresolved externals [C:\prog\bf\root\HEAD\pgsql.build\postgres.vcxproj]
I'm guessing you missed a reference someplace.

Hm ... grep swears up and down that there is no remaining instance
of the string "XLogFileNameP" anywhere in the tree. So this doesn't
seem to be the fault of 9989d37d1 per se. What my eye now falls on
is this, a bit further up in the build log [1]:

...
PreLinkEvent:
Generate DEF file
perl src\tools\msvc\gendef.pl Release\postgres x64
:VCEnd
Not re-generating POSTGRES.DEF, file already exists.
Link:
...

So it seems that the problem might really be a faulty rule in our
MSVC build script about when postgres.def needs to be regenerated?
Or else it's some weird caching problem on drongo --- the lack of
complaints from other Windows critters might point the finger
that way.

regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&amp;dt=2019-12-03%2007%3A30%3A01

this was pilot error on my part. Should be fixed now.

cheers

andrew