Unified File API

Started by John Morrisover 2 years ago3 messages
#1John Morris
john.morris@crunchydata.com
2 attachment(s)

Background
==========
PostgreSQL has an amazing variety of routines for accessing files. Consider just the “open file” routines.
PathNameOpenFile, OpenTemporaryFile, BasicOpenFile, open, fopen, BufFileCreateFileSet,
BufFileOpenFileSet, AllocateFile, OpenTransientFile, FileSetCreate, FileSetOpen, mdcreate, mdopen,
Smgr_open,

On the downside, “amazing variety” also means somewhat confusing and difficult to add new features.
Someday, we’d like to add encryption or compression to the various PostgreSql files.
To do that, we need to bring all the relevant files into a common file API where we can implement
the new features.

Goals of Patch
=============
1)Unify file access so most of “the other” files can go through a common interface, allowing new features
like checksums, encryption or compression to be added transparently. 2) Do it in a way which doesn’t
change the logic of current code. 3)Convert a reasonable set of callers to use the new interface.

Note the focus is on the “other” files. The buffer cache and the WAL have similar needs,
but they are being done in a separate project. (yes, the two projects are coordinating)

Patch 0001. Create a common file API.
===============================
Currrently, PostgreSQL files feed into three funnels. 1) system file descriptors (read/write/open),
2) C library buffered files (fread/fwri;te/fopn), and 3) virtual file descriptors (FileRead/FileWrite/PathNameOpenFile).
Of these three, virtual file descriptors (VFDs) are the most common. They are also the
only funnel which is implemented by PostgresSql.

Decision: Choose VFDs as the common interface.

Problem: VFDs are random access only.
Solution: Add sequential read/write code on top of VFDs. (FileReadSeq, FileWriteSeq, FileSeek, FileTell, O_APPEND)

Problem: VFDs have minimal error handling (based on errno.)
Solution: Add an “ferror” style interface (FileError, FileEof, FileErrorCode, FileErrorMsg)

Problem: Must maintain compatibility with existing error handling code.
Solution: save and restore errno to minimize changes to existing code.

Patch 0002. Update code to use the common file API
===========================================
The second patch alters callers so they use VFDs rather than system or C library files.
It doesn’t modify all callers, but it does capture many of the files which need
to be encrypted or compressed. This is definitely WIP.

Future (not too far away)
=====================
Looking ahead, there will be another set of patches which inject buffering and encryption into
the VFD interface. The future patches will build on the current work and introduce new “oflags”
to enable encryption and buffering.

Compression is also a possibility, but currently lower priority and a bit tricky for random access files.
Let us know if you have a use case.

Attachments:

0001-UpdateFileAPI.patchapplication/octet-stream; name=0001-UpdateFileAPI.patchDownload
From 0812565a8c0a4c5e6680a59bdf0da0ba3a53e748 Mon Sep 17 00:00:00 2001
From: John Morris <john.morris@crunchydata.com>
Date: Wed, 28 Jun 2023 17:56:14 -0700
Subject: [PATCH 1/2] Added new APIs to Virtual File Descriptors.

---
 src/backend/storage/file/fd.c | 388 +++++++++++++++++++++++++++++++++-
 src/include/storage/fd.h      |  46 +++-
 2 files changed, 421 insertions(+), 13 deletions(-)

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 3c2a2fbef7..ff37ca41fd 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -206,6 +206,11 @@ typedef struct vfd
 	/* NB: fileName is malloc'd, and must be free'd when closing the VFD */
 	int			fileFlags;		/* open(2) flags for (re)opening the file */
 	mode_t		fileMode;		/* mode to pass to open(2) */
+
+	int 	    errorCode;        /* Code of the most recent error */
+	char        errorMsg[121];	 /* The most recent error message */
+	bool		eof;	         /* Result of last read */
+	off_t 		offset; 		 /* Current position in file */
 } Vfd;
 
 /*
@@ -1538,8 +1543,8 @@ PathNameOpenFile(const char *fileName, int fileFlags)
  * it will be interpreted relative to the process' working directory
  * (which should always be $PGDATA when this code is running).
  */
-File
-PathNameOpenFilePerm(const char *fileName, int fileFlags, mode_t fileMode)
+static File
+PathNameOpenFilePerm_Internal(const char *fileName, int fileFlags, mode_t fileMode)
 {
 	char	   *fnamecopy;
 	File		file;
@@ -1928,10 +1933,11 @@ PathNameDeleteTemporaryFile(const char *path, bool error_on_failure)
 /*
  * close a file when done with it
  */
-void
-FileClose(File file)
+static int
+FileClose_Internal(File file)
 {
 	Vfd		   *vfdP;
+	int 	   save_errno = 0;
 
 	Assert(FileIsValid(file));
 
@@ -1945,6 +1951,7 @@ FileClose(File file)
 		/* close the file */
 		if (close(vfdP->fd) != 0)
 		{
+			save_errno = errno;
 			/*
 			 * We may need to panic on failure to close non-temporary files;
 			 * see LruDelete.
@@ -1993,16 +2000,19 @@ FileClose(File file)
 
 		/* in any case do the unlink */
 		if (unlink(vfdP->fileName))
+		{
+			save_errno = errno;
 			ereport(LOG,
 					(errcode_for_file_access(),
-					 errmsg("could not delete file \"%s\": %m", vfdP->fileName)));
+						errmsg("could not delete file \"%s\": %m", vfdP->fileName)));
+		}
 
 		/* and last report the stat results */
 		if (stat_errno == 0)
 			ReportTemporaryFileUsage(vfdP->fileName, filestats.st_size);
 		else
 		{
-			errno = stat_errno;
+			save_errno = errno = stat_errno;
 			ereport(LOG,
 					(errcode_for_file_access(),
 					 errmsg("could not stat file \"%s\": %m", vfdP->fileName)));
@@ -2017,6 +2027,14 @@ FileClose(File file)
 	 * Return the Vfd slot to the free list
 	 */
 	FreeVfd(file);
+
+	if (save_errno != 0)
+	{
+		errno = save_errno;
+		return -1;
+	}
+
+	return 0;
 }
 
 /*
@@ -2086,8 +2104,8 @@ FileWriteback(File file, off_t offset, off_t nbytes, uint32 wait_event_info)
 	pgstat_report_wait_end();
 }
 
-int
-FileRead(File file, void *buffer, size_t amount, off_t offset,
+static ssize_t
+FileRead_Internal(File file, void *buffer, size_t amount, off_t offset,
 		 uint32 wait_event_info)
 {
 	int			returnCode;
@@ -2142,8 +2160,8 @@ retry:
 	return returnCode;
 }
 
-int
-FileWrite(File file, const void *buffer, size_t amount, off_t offset,
+static ssize_t
+FileWrite_Internal(File file, const void *buffer, size_t amount, off_t offset,
 		  uint32 wait_event_info)
 {
 	int			returnCode;
@@ -3974,3 +3992,353 @@ assign_io_direct(const char *newval, void *extra)
 
 	io_direct_flags = *flags;
 }
+
+
+/*******************************************************************************
+* The following functions add sequential and error handling to the VFD routines.
+* They are mostly wrappers around the original VFD routines,
+* which have been renamed by appending "_internal".
+*/
+
+/* Point to the Vfd struct for the given file descriptor */
+static inline Vfd* getVfd(File file)
+{
+	Assert(file >= 0 && file < MAXIMUM_VFD);
+	return &VfdCache[file];
+}
+
+/* Point to the Vfd struct, or a dummy Vfd if the file descriptor is -1 */
+static inline Vfd* getVfdErr(File file)
+{
+	/* Allocate a static Vfd to handle the file = -1 case */
+	static Vfd dummyVfd[1] = {{.fileName = "dummy(-1)"}};
+
+	if (file == -1)
+		return dummyVfd;
+	else
+		return getVfd(file);
+}
+
+static const char *getName(File file)
+{
+	return getVfdErr(file)->fileName;
+}
+
+/*
+ * Open a file
+ * If an error occurs, returns -1 and set up error information
+ * so FileError(-1) will return true. Note errno is set for compatibility.
+ *
+ * We must be sure to release *all* resources if we fail to open the file.
+ * It should be the same as though never opened.
+ */
+File PathNameOpenFilePerm(const char *fileName, int fileFlags, mode_t fileMode)
+{
+	File file = -1;
+	bool append;
+	off_t position = 0;
+
+	debug("FileOpenPerm: fileName=%s fileFlags=0x%x fileMode=0x%x\n", fileName, fileFlags, fileMode);
+
+	/* VFDs don't implement O_APPEND. We will position to FileSize instead. */
+	append = (fileFlags & O_APPEND) != 0;
+	fileFlags &= ~O_APPEND;
+
+	/* Clear any previous error information */
+	FileClearError(-1);
+
+	/* Open the VFD */
+	file = PathNameOpenFilePerm_Internal(fileName, fileFlags, fileMode);
+	if (file == -1)
+		return setFileError(-1, errno, "Unable to open file: %s", fileName);
+
+	/* Position at end of file if appending. This only impacts WriteSeq and ReadSeq. */
+	if (append) {
+
+		/* Get the size of the file */
+		position = FileSize(file);
+		if (position == -1) {
+			FileClose(file);
+			return setFileError(-1, errno, "Unable to O_APPEND to file: %s", fileName);
+		}
+	}
+
+	/* Success!. Save the desired position */
+	getVfd(file)->offset = position;
+	FileClearError(file);
+
+	return file;
+}
+
+/*
+ * Close a file. Like FileOpen(), the error information is saved in the dummy "-1" file,
+ * but it can also be accessed using the closed virtual file descriptor.
+ *
+ * Close has special error handling. If the vfd already has an error, we don't
+ * overwrite it.  This is because the error may have been set by a previous
+ * operation on the file, and we don't want to lose that information.
+ * However, the return value will always be 0 if we closed the file successfully.
+ */
+int FileClose(File file)
+{
+	debug("FileClose: name=%s, file=%d\n", getName(file), file);
+
+	/* Save any existing error information */
+	copyFileError(-1, file);
+
+	/* Close the file */
+	if (FileClose_Internal(file) == -1)
+	    return updateFileError(-1, errno, "Unable to close file: %s", getName(file));
+
+	debug("FileClose(done): file=%d\n", file);
+
+	return 0;
+}
+
+
+ssize_t FileRead(File file, void *buffer, size_t amount, off_t offset, uint32 wait_event_info)
+{
+	ssize_t actual;
+
+	debug("FileRead: name=%s file=%d  amount=%zd offset=%lld\n", getName(file), file, amount, offset);
+	Assert(offset >= 0);
+	Assert((ssize_t)amount > 0);
+
+	/* Read the data as requested */
+	actual = FileRead_Internal(file, buffer, amount, offset, wait_event_info);
+	getVfd(file)->eof = (actual == 0);
+
+	/* If successful, update the file offset */
+	if (actual >= 0)
+		getVfd(file)->offset = offset + actual;
+
+	debug("FileRead(done): file=%d  name=%s  actual=%zd\n", file, getName(file), actual);
+	return actual;
+}
+
+
+ssize_t FileWrite(File file, const void *buffer, size_t amount, off_t offset, uint32 wait_event_info)
+{
+	ssize_t actual;
+
+	debug("FileWrite: name=%s file=%d  amount=%zd offset=%lld\n", getName(file), file, amount, offset);
+	Assert(offset >= 0 && (ssize_t)amount > 0);
+
+	/* Write the data as requested */
+	actual = FileWrite_Internal(file, buffer, amount, offset, wait_event_info);
+
+	/* If successful, update the file offset */
+	if (actual >= 0)
+		getVfd(file)->offset = offset + actual;
+
+	debug("FileWrite(done): file=%d  name=%s  actual=%zd\n", file, getName(file), actual);
+
+	return actual;
+}
+
+
+
+
+/*========================================================================================
+ * Routines to emuulate C library FILE routines (fgetc, fprintf, ...)
+ */
+
+/*
+ * Similar to fgetc. Probably best if used with buffered files.
+ */
+ssize_t
+FileGetc(File file)
+{
+	char c;
+	int ret = (int)FileReadSeq(file, &c, 1, 0);
+	if (ret <= 0)
+		ret = EOF;
+	return ret;
+}
+
+/* Similar to fputc */
+ssize_t FilePutc(int c, File file)
+{
+	char cbuf;
+	cbuf = c;
+	if (FileWriteSeq(file, &cbuf, 1, 0) <= 0)
+		c = EOF;
+	return c;
+}
+
+/*
+ * A temporary equivalent of fprintf.
+ * This version limits text to what fits in a local buffer.
+ * Ultimately, we need to update the internal snprintf.c (dopr) to spill
+ * to temporary files.
+ */
+ssize_t FilePrintf(File file, const char *format, ...)
+{
+	va_list args;
+	char buffer[4*1024]; /* arbitrary size, big enough? */
+	int size;
+
+	va_start(args, format);
+	size = vsnprintf(buffer, sizeof(buffer), format, args);
+	va_end(args);
+
+	if (size < 0 || size >= sizeof(buffer))
+		ereport(ERROR,
+				errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				errmsg("FilePrintf buffer overflow - %d characters exceeded %lu buffer", size, sizeof(buffer)));
+
+	return FileWriteSeq(file, buffer, Min(size, sizeof(buffer)-1), 0);
+}
+
+ssize_t FileScanf(File file, const char *format, ...)
+{
+	Assert(false); /* Not implemented */
+	return -1;
+}
+
+ssize_t FilePuts(File file, const char *string)
+{
+	return FileWriteSeq(file, string, strlen(string), 0);
+}
+
+/*
+ * Read sequentially from the file.
+ */
+ssize_t
+FileReadSeq(File file, void *buffer, size_t amount, uint32 wait_event_info)
+{
+	return FileRead(file, buffer, amount, getVfd(file)->offset, wait_event_info);
+}
+
+/*
+ * Write sequentially to the file.
+ */
+ssize_t
+FileWriteSeq(File file, const void *buffer, size_t amount, uint32 wait_event_info)
+{
+	return FileWrite(file, buffer, amount, getVfd(file)->offset, wait_event_info);
+}
+
+/*
+ * Seek to an absolute position within the file.
+ * Relative positions can be calculated using FileTell or FileSize.
+ */
+off_t
+FileSeek(File file, off_t offset)
+{
+	getVfd(file)->offset = offset;
+	return offset;
+}
+
+/*
+ * Tell us the current file position
+ */
+off_t
+FileTell(File file)
+{
+	return getVfd(file)->offset;
+}
+
+/* ===================================================================
+ * Error handling code.
+ * These functions are similar to ferror(), but can be accessed even when the file is closed or -1.
+ * This added feature allows error info to be fetched after a failed "open" or "close" call.
+ */
+
+/* True if an error occurred on the file.  (EOF is not an error) */
+bool FileError(File file)
+{
+	return FileErrorCode(file) != 0;
+}
+
+/* True if the last read generated an EOF */
+int FileEof(File file)
+{
+	return getVfd(file)->eof;
+}
+
+/* Clears an error, and is true if an error had been encountered */
+bool FileClearError(File file)
+{
+	Vfd *vfd = getVfdErr(file);
+	bool hasError = vfd->errorCode != 0;
+	if (hasError)
+	{
+		vfd->errorCode = 0;
+		vfd->errorMsg[0] = '\0';
+	}
+	return hasError;
+}
+
+/* Get a pointer to the error message */
+const char *FileErrorMsg(File file)
+{
+	return getVfdErr(file)->errorMsg;
+}
+
+/*
+ * Get the errno associated the file.
+ * As a side effect, restores errno to the value it had when the error occurred.
+ */
+int FileErrorCode(File file)
+{
+	errno = getVfdErr(file)->errorCode;
+	return errno;
+}
+
+int setFileError(File file, int err, const char *fmt, ...)
+{
+	va_list args;
+	Vfd *vfd = getVfdErr(file);
+
+	/* Save the errno */
+	vfd->errorCode = errno;
+
+	/* Format the error message */
+	va_start(args, fmt);
+	vsnprintf(vfd->errorMsg, sizeof(vfd->errorMsg), fmt, args);
+	va_end(args);
+
+	/* Restore the error code for compatibility */
+	errno = vfd->errorCode;
+
+	/* Return -1 to indicate an error */
+	return -1;
+}
+
+int updateFileError(File file, int err, const char *fmt, ...)
+{
+    va_list args;
+	Vfd *vfd = getVfdErr(file);
+
+	/* if we already have an error, don't overwrite it */
+	if (vfd->errorCode != 0)
+		return -1;
+
+	/* Save the errno */
+	vfd->errorCode = errno;
+
+	/* Format the error message */
+	va_start(args, fmt);
+	vsnprintf(vfd->errorMsg, sizeof(vfd->errorMsg), fmt, args);
+	va_end(args);
+
+	/* Restore the error code for compatibility */
+	errno = vfd->errorCode;
+
+	/* Return -1 to indicate an error */
+	return -1;
+}
+
+int copyFileError(File dst, File src)
+{
+	Vfd *vfdDst = getVfdErr(dst);
+	Vfd *vfdSrc = getVfdErr(src);
+
+	/* Copy the error code and message */
+	vfdDst->errorCode = vfdSrc->errorCode;
+	strncpy(vfdDst->errorMsg, vfdSrc->errorMsg, sizeof(vfdDst->errorMsg));
+
+	/* Return -1 to indicate an error */
+	return -1;
+}
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index 6791a406fc..4b26b501d2 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -109,10 +109,10 @@ extern PGDLLIMPORT int max_safe_fds;
 extern File PathNameOpenFile(const char *fileName, int fileFlags);
 extern File PathNameOpenFilePerm(const char *fileName, int fileFlags, mode_t fileMode);
 extern File OpenTemporaryFile(bool interXact);
-extern void FileClose(File file);
+extern int  FileClose(File file);
 extern int	FilePrefetch(File file, off_t offset, off_t amount, uint32 wait_event_info);
-extern int	FileRead(File file, void *buffer, size_t amount, off_t offset, uint32 wait_event_info);
-extern int	FileWrite(File file, const void *buffer, size_t amount, off_t offset, uint32 wait_event_info);
+extern ssize_t	FileRead(File file, void *buffer, size_t amount, off_t offset, uint32 wait_event_info);
+extern ssize_t	FileWrite(File file, const void *buffer, size_t amount, off_t offset, uint32 wait_event_info);
 extern int	FileSync(File file, uint32 wait_event_info);
 extern int	FileZero(File file, off_t offset, off_t amount, uint32 wait_event_info);
 extern int	FileFallocate(File file, off_t offset, off_t amount, uint32 wait_event_info);
@@ -199,4 +199,44 @@ extern int	data_sync_elevel(int elevel);
 #define PG_TEMP_FILES_DIR "pgsql_tmp"
 #define PG_TEMP_FILE_PREFIX "pgsql_tmp"
 
+/* Operations on virtual files -- Sequential I/O */
+extern ssize_t FileWriteSeq(File file, const void *buffer, size_t amount, uint32 wait_event_info);
+extern ssize_t FileReadSeq(File file, void *buffer, size_t amount, uint32 wait_event_info);
+extern off_t FileTell(File file);
+extern off_t FileSeek(File file, off_t offset);
+
+/* Operations on virtual files --- similar to fread/fwrite */
+extern ssize_t FilePrintf(File file, const char *format, ...) pg_attribute_printf(2,3);
+extern ssize_t FileScanf(File file, const char *format, ...) pg_attribute_printf(2,3);
+extern ssize_t FilePuts(File, const char *string);
+extern ssize_t FileGetc(File file);
+extern ssize_t FilePutc(int c, File file);
+
+/* Error handling on virtual files -- similar to feof/ferror */
+extern int FileEof(File file);           /* Reset on every read */
+extern bool FileError(File file);        /* Persists until cleared */
+extern bool FileClearError(File file);   /* Clears both Eof and error */
+
+extern const char *FileErrorMsg(File file);
+extern int FileErrorCode(File file);
+
+/* Internal helpers for error handling */
+extern int setFileError(File file, int err, const char *format, ...);
+extern int updateFileError(File file, int err, const char *format, ...);
+extern int copyFileError(File dst, File src);
+
+/* Declare a "debug" macro */
+#ifdef DEBUG
+#define debug(...) \
+    do {  \
+	    int save_errno = errno; \
+		/* fprintf(stderr, __VA_ARGS__); */ \
+		elog(DEBUG2, args);   \
+        errno = save_errno;  \
+	while (0)
+
+#else
+#define debug(...) ((void)0)
+#endif
+
 #endif							/* FD_H */
-- 
2.33.0


From 9387e8fe8ed9064bcfdc6da7d51df432d00c2586 Mon Sep 17 00:00:00 2001
From: John Morris <john.morris@crunchydata.com>
Date: Wed, 28 Jun 2023 19:59:47 -0700
Subject: [PATCH 2/2] Added sequential unit test and fixed error reporting bugs

---
 src/backend/storage/file/fd.c                 |  25 +-
 src/include/storage/fd.h                      |  13 +-
 src/test/Makefile                             |   2 +-
 src/test/meson.build                          |   1 +
 src/test/storage/framework/fileFramework.c    | 473 ++++++++++++++++++
 src/test/storage/framework/fileFramework.h    |  25 +
 src/test/storage/framework/unitTest.h         |  20 +
 src/test/storage/framework/unitTestInternal.h |  84 ++++
 src/test/storage/meson.build                  |  22 +
 src/test/storage/vfdTest.c                    |  23 +
 10 files changed, 672 insertions(+), 16 deletions(-)
 create mode 100644 src/test/storage/framework/fileFramework.c
 create mode 100644 src/test/storage/framework/fileFramework.h
 create mode 100644 src/test/storage/framework/unitTest.h
 create mode 100644 src/test/storage/framework/unitTestInternal.h
 create mode 100644 src/test/storage/meson.build
 create mode 100644 src/test/storage/vfdTest.c

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index ff37ca41fd..cd7441eb37 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -4083,6 +4083,10 @@ int FileClose(File file)
 {
 	debug("FileClose: name=%s, file=%d\n", getName(file), file);
 
+	/* If invalid vfd or if already closed, then EBADF */
+	if (file < 0 || file >= SizeVfdCache || getVfd(file)->fd == -1)
+		return setFileError(-1, EBADF, "FileClose: invalid file descriptor %d", file);
+
 	/* Save any existing error information */
 	copyFileError(-1, file);
 
@@ -4262,11 +4266,9 @@ bool FileClearError(File file)
 {
 	Vfd *vfd = getVfdErr(file);
 	bool hasError = vfd->errorCode != 0;
-	if (hasError)
-	{
-		vfd->errorCode = 0;
-		vfd->errorMsg[0] = '\0';
-	}
+	vfd->errorCode = 0;
+	vfd->errorMsg[0] = '\0';
+	vfd->eof = false;
 	return hasError;
 }
 
@@ -4282,17 +4284,18 @@ const char *FileErrorMsg(File file)
  */
 int FileErrorCode(File file)
 {
-	errno = getVfdErr(file)->errorCode;
-	return errno;
+	int errorCode = getVfdErr(file)->errorCode;
+	errno = errorCode;
+	return errorCode;
 }
 
-int setFileError(File file, int err, const char *fmt, ...)
+int setFileError(File file, int errorCode, const char *fmt, ...)
 {
 	va_list args;
 	Vfd *vfd = getVfdErr(file);
 
 	/* Save the errno */
-	vfd->errorCode = errno;
+	vfd->errorCode = errorCode;
 
 	/* Format the error message */
 	va_start(args, fmt);
@@ -4306,7 +4309,7 @@ int setFileError(File file, int err, const char *fmt, ...)
 	return -1;
 }
 
-int updateFileError(File file, int err, const char *fmt, ...)
+int updateFileError(File file, int errorCode, const char *fmt, ...)
 {
     va_list args;
 	Vfd *vfd = getVfdErr(file);
@@ -4316,7 +4319,7 @@ int updateFileError(File file, int err, const char *fmt, ...)
 		return -1;
 
 	/* Save the errno */
-	vfd->errorCode = errno;
+	vfd->errorCode = errorCode;
 
 	/* Format the error message */
 	va_start(args, fmt);
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index 4b26b501d2..69b6ce7111 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -12,6 +12,8 @@
  *-------------------------------------------------------------------------
  */
 
+//#define DEBUG
+
 /*
  * calls:
  *
@@ -225,15 +227,18 @@ extern int setFileError(File file, int err, const char *format, ...);
 extern int updateFileError(File file, int err, const char *format, ...);
 extern int copyFileError(File dst, File src);
 
+/* Some nicer names */
+static inline File FileOpen(const char *name, int fileFlags) {return PathNameOpenFile(name, fileFlags);}
+
 /* Declare a "debug" macro */
 #ifdef DEBUG
 #define debug(...) \
     do {  \
-	    int save_errno = errno; \
-		/* fprintf(stderr, __VA_ARGS__); */ \
-		elog(DEBUG2, args);   \
+        int save_errno = errno; \
+        fprintf(stderr, __VA_ARGS__); \
+        /* elog(DEBUG2, __VA_ARGS__);  */ \
         errno = save_errno;  \
-	while (0)
+    } while (0)
 
 #else
 #define debug(...) ((void)0)
diff --git a/src/test/Makefile b/src/test/Makefile
index dbd3192874..ed56a2de86 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,7 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription storage
 
 ifeq ($(with_icu),yes)
 SUBDIRS += icu
diff --git a/src/test/meson.build b/src/test/meson.build
index 5f3c9c2ba2..f582e6c497 100644
--- a/src/test/meson.build
+++ b/src/test/meson.build
@@ -5,6 +5,7 @@ subdir('isolation')
 
 subdir('authentication')
 subdir('recovery')
+subdir('storage')
 subdir('subscription')
 subdir('modules')
 
diff --git a/src/test/storage/framework/fileFramework.c b/src/test/storage/framework/fileFramework.c
new file mode 100644
index 0000000000..405859b27b
--- /dev/null
+++ b/src/test/storage/framework/fileFramework.c
@@ -0,0 +1,473 @@
+/**/
+
+#include <errno.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/fcntl.h>
+#include <unistd.h>
+#include <stdbool.h>
+
+#include "c.h"
+//#include "storage/iostack_internal.h"
+#include "storage/fd.h"
+//#include "storage/vfd.h"
+#include "utils/wait_event.h"
+
+#include "fileFramework.h"
+#include "unitTestInternal.h"
+
+typedef uint8_t Byte;
+
+typedef void IoStack;
+typedef IoStack *(*CreateTestStackFn)(size_t blockSize);
+void setTestStack(CreateTestStackFn fn, size_t blockSize);
+#define MIN(a,b) ((a) < (b) ? (a) : (b))
+#define ROUNDUP(a,b) (((a) + (b) - 1) / (b) * (b))
+#define ROUNDDOWN(a,b) ((a) / (b) * (b))
+#define PATH_MAX 1024
+
+#define countof(array) (sizeof(array)/sizeof(array[0]))
+
+/* Matrix of file and block sizes for testing. */
+off_t fileSize[] = {0, 1024, 1, 64, 1027, 7*1024, 32*1024 + 127, 6*1024*1024+153};
+size_t blockSize[] = {1024, 4 * 1024, 3 * 1024 + 357, 1024 - 237, 64, 1};
+
+
+/* Given the position in the seek, generate one byte of data for that position. */
+static inline Byte generateByte(size_t position)
+{
+    static char data[] = "The cat in the hat jumped over the quick brown fox while the dog ran away with the spoon.\n";
+    size_t idx = position % (sizeof(data)-1);    // Skip the nil character.
+    return data[idx];
+}
+
+/* Fill a buffer with data appropriate to that position in the seek */
+static void generateBuffer(size_t position, Byte *buf, size_t size)
+{
+    for (size_t i = 0; i < size; i++)
+        buf[i] = generateByte(position+i);
+}
+
+/* Verify a buffer has appropriate data for that position in the test file. */
+static bool verifyBuffer(size_t position, Byte *buf, size_t size)
+{
+    for (size_t i = 0; i < size; i++)
+    {
+        Byte expected = generateByte(position + i);
+		if (expected != buf[i])
+			debug("verifyBuffer: i=%zu position=%zu  buf[i]=%c expected=%c\n", i, position, buf[i], expected);
+        PG_ASSERT_EQ(expected, buf[i]);
+    }
+    return true;
+}
+
+/*
+ * Create a file and fill it with known data.
+ * The file contains the same line of text repeated over and over, which
+ *   - makes it easy to verify with a text editor,
+ *   - doesn't align with typical block sizes, and
+ *   - is compressible.
+ */
+static void generateFile(char *path, off_t size, size_t bufferSize)
+{
+    Byte *buf;
+    off_t position;
+    File file;
+
+    debug("generateFile: path=%s\n", path);
+    file = FileOpen( path, O_WRONLY|O_CREAT|O_TRUNC);
+	PG_ASSERT(file != -1);
+    buf = malloc(bufferSize); /* TODO: make buf be at end of struct */
+
+    for (position = 0; position < size; position += bufferSize)
+    {
+        ssize_t actual;
+        off_t expected = MIN(bufferSize, size-position);
+        generateBuffer(position, buf, expected);
+        actual = FileWriteSeq(file, buf, expected, 0);
+        PG_ASSERT_EQ(expected, actual);
+    }
+
+    free(buf);
+    PG_ASSERT(FileClose(file) == 0);
+}
+
+/* Verify a iostack has the correct data */
+static void verifyFile(char *path, off_t fileSize, ssize_t bufferSize)
+{
+    File file;
+    Byte *buf;
+
+    debug("verifyFile: path=%s\n", path);
+    file = FileOpen(path, O_RDONLY|PG_TESTSTACK);
+	PG_ASSERT(file >= 0);
+	PG_ASSERT(!FileEof(file));
+	PG_ASSERT(!FileError(file));
+    buf = malloc(bufferSize);
+
+    for (off_t actual, position = 0; position < fileSize; position += actual)
+    {
+        size_t expected = MIN(bufferSize, fileSize - position);
+        actual = FileReadSeq(file, buf, bufferSize, 0);
+        PG_ASSERT_EQ(expected, actual);
+        PG_ASSERT(verifyBuffer(position, buf, actual));
+		PG_ASSERT(!FileEof(file));
+		PG_ASSERT(!FileError(file));
+    }
+
+    // Read a final EOF.
+	PG_ASSERT(!FileEof(file));
+    FileReadSeq(file, buf, 1, 0);
+    PG_ASSERT(FileEof(file));
+
+    PG_ASSERT(FileClose(file) == 0);
+}
+
+/*
+ * Create a file and fill it with known data using random seeks.
+ * The file contains the same line of text repeated over and over, which
+ *   - makes it easy to verify output with a text editor,
+ *   - doesn't align with typical block sizes, and
+ *   - is compressible.
+ */
+static void allocateFile(char *path, off_t size, ssize_t bufferSize)
+{
+    File file;
+    Byte *buf;
+    off_t position;
+
+    debug("allocateFile: path=%s\n", path);
+    /* Start out by allocating space and filling the file with "X"s. */
+    file = FileOpen(path, O_WRONLY|O_CREAT|O_TRUNC|PG_TESTSTACK);
+    buf = malloc(bufferSize);
+    memset(buf, 'X', bufferSize);
+
+    for (position = 0; position < size; position += bufferSize)
+    {
+        size_t expected = (size_t)MIN(bufferSize, size-position);
+        size_t actual = FileWrite(file, buf, expected, position, 0);
+        PG_ASSERT_EQ(actual, expected);
+    }
+
+    PG_ASSERT(FileClose(file) == 0);
+    free(buf);
+}
+
+static const int prime = 3197;
+
+static void generateRandomFile(char *path, off_t size, size_t blockSize)
+{
+    size_t nrBlocks;
+    File file;
+    Byte *buf;
+
+    debug("generateRandomFile: path=%s\n", path);
+    /* The nr of blocks must be relatively prime to "prime", otherwise we won't visit all the blocks. */
+    nrBlocks = (size + blockSize - 1) / blockSize;
+    PG_ASSERT( nrBlocks == 0 || (nrBlocks % prime) != 0);
+
+    file = FileOpen(path, O_RDWR|PG_TESTSTACK);
+	PG_ASSERT(file >= 0);
+    buf = malloc(blockSize);
+
+
+    for (off_t idx = 0; idx < nrBlocks; idx++)
+    {
+        ssize_t actual, expected;
+        /* Pick a pseudo-random block and seek to it */
+        off_t position = ((idx * prime) % nrBlocks) * blockSize;
+        //printf("fileSeek - idx = %u  blockNr=%u nrBlocks=%u\n", idx, (idx*prime)%nrBlocks, nrBlocks);
+
+        /* Generate data appropriate for that block. */
+        expected = (size_t)MIN(blockSize, size - position);
+        generateBuffer(position, buf, expected);
+
+        /* Write the block */
+        actual = FileWrite(file, buf, expected, position, 0);
+        PG_ASSERT_EQ(expected,actual);
+    }
+
+    PG_ASSERT(FileClose(file) == 0);
+}
+
+static void appendFile(char *path, off_t fileSize, size_t bufferSize)
+{
+    File file;
+    Byte *buf;
+    ssize_t blockSize;
+    ssize_t lastSize;
+    off_t lastBlock;
+    ssize_t remaining, actual;
+
+    debug("appendFile: path=%s\n", path);
+    file = FileOpen(path, O_RDWR|O_APPEND|PG_TESTSTACK);
+	PG_ASSERT(file >= 0);
+    buf = malloc(bufferSize);
+
+    /* Since we are appending, we are at the end of file - should match file size */
+	PG_ASSERT_EQ(fileSize, FileTell(file));
+
+	/* The requested buffer size should be a multiple of the underlying block size. (Property of unit test) */
+	blockSize = 1; //FileBlockSize(file);
+	PG_ASSERT_EQ(0, bufferSize % blockSize);
+
+	/* If the last block is a partial block, ... */
+	lastBlock = ROUNDDOWN(fileSize, blockSize);
+	lastSize = fileSize - lastBlock;
+	if (lastSize > 0)
+	{
+		/* Rewrite the last block, which is now full */
+		generateBuffer(lastBlock, buf, blockSize);
+		PG_ASSERT_EQ(blockSize, FileWrite(file, buf, blockSize, lastBlock, 0));
+
+		/* Adjust the write parameters to write whatever is left. It is now block aligned. */
+		lastBlock += blockSize;
+	}
+
+    /* Write whatever remains to the end of file */
+	remaining = (fileSize + bufferSize) - lastBlock;
+    generateBuffer(lastBlock, buf, remaining);
+    actual = FileWriteSeq(file, buf, remaining, 0);
+    PG_ASSERT_EQ(remaining, actual);
+
+    /* Close the file and verify it is correct. */
+    PG_ASSERT_EQ(0, FileClose(file));
+    verifyFile(path, fileSize+bufferSize, bufferSize);
+}
+
+/*
+ * Verify an ioStack has the correct data through random seeks.
+ * This should do a complete verification - examining every byte of the file.
+ */
+static void verifyRandomFile(char *path, off_t size, size_t blockSize)
+{
+    File file;
+    Byte *buf;
+    size_t nrBlocks;
+
+    debug("verifyRandomFile: path=%s\n", path);
+	file = FileOpen(path, O_RDONLY|PG_TESTSTACK);
+	PG_ASSERT(file >= 0);
+    buf = malloc(blockSize);
+
+    nrBlocks = (size + blockSize -1) / blockSize;
+    PG_ASSERT(nrBlocks == 0 || (nrBlocks % prime) != 0);
+    for (size_t idx = 0;  idx < nrBlocks; idx++)
+    {
+        ssize_t actual, expected;
+
+        /* Pick a pseudo-random block and read it */
+        off_t position = ((idx * prime) % nrBlocks) * blockSize;
+
+        actual = FileRead(file, buf, blockSize, position, 0);
+
+        /* Verify we read the correct data */
+        expected = MIN(blockSize, size-position);
+        PG_ASSERT_EQ(expected, actual);
+        PG_ASSERT(verifyBuffer(position, buf, actual));
+    }
+
+    PG_ASSERT_EQ(0, FileClose(file));
+}
+
+
+static void deleteFile(char *name)
+{
+	unlink(name);
+}
+
+
+static void regression(char *name, size_t blockSize)
+{
+    File file;
+    Byte buf[128];
+    Byte *block;
+
+    deleteFile(name);
+
+	/* Shouldn't open a non-existent file - various modes) */
+	file = FileOpen(name, O_RDWR|PG_TESTSTACK);
+	PG_ASSERT_EQ(-1, file);
+	PG_ASSERT_EQ(ENOENT, errno);
+
+	file = FileOpen(name, O_RDONLY|PG_TESTSTACK);
+	PG_ASSERT(file == -1);
+	PG_ASSERT_EQ(errno, ENOENT);
+
+	/* OK to create a file and reopen readonly */
+	file = FileOpen(name, O_CREAT | O_WRONLY | O_TRUNC|PG_TESTSTACK);
+	PG_ASSERT(file >= 0);
+	PG_ASSERT_EQ(FileClose(file), 0);
+
+	file = FileOpen(name, O_CREAT | O_WRONLY | O_TRUNC | PG_TESTSTACK);
+	PG_ASSERT(file >= 0);
+	PG_ASSERT_EQ(FileClose(file), 0);
+
+	/* EBADF if closing an already closed file */
+	PG_ASSERT(FileClose(file) != 0 && errno == EBADF);
+
+	/* Should read EOF on empty file */
+	file = FileOpen(name, O_RDONLY|PG_TESTSTACK);
+	PG_ASSERT(0 == FileRead(file, buf, sizeof(buf), 0, 0));
+	PG_ASSERT(FileEof(file));
+	PG_ASSERT(!FileError(file));
+	PG_ASSERT(FileClose(file) == 0);
+
+	/* Should write a block and then read EOF */
+	block = calloc(blockSize, 1);
+	file = FileOpen(name, O_RDWR|PG_TESTSTACK);
+	PG_ASSERT_EQ(blockSize, FileWriteSeq(file, block, blockSize, 0));
+	PG_ASSERT_EQ(0, FileReadSeq(file, block, blockSize,  0));
+	PG_ASSERT(FileEof(file));
+	PG_ASSERT(!FileError(file));
+	PG_ASSERT(FileClose(file) == 0);
+	free(block);
+
+	deleteFile(name);
+}
+
+
+/*
+ * Run a test on a single configuration determined by file size and buffer size
+ */
+void singleSeekTest(CreateTestStackFn testStack, char *nameFmt, off_t size, size_t bufferSize)
+{
+    char fileName[PATH_MAX];
+    snprintf(fileName, sizeof(fileName), nameFmt, size, bufferSize);
+    beginTest(fileName);
+
+	/* Inject the procedure to create an I/O Stack */
+	setTestStack(testStack, bufferSize);
+
+    /* create and read back as a stream */
+    generateFile(fileName, size, bufferSize);
+    verifyFile(fileName, size, bufferSize);
+
+    /* Fill in the file with garbage, then write it out as random writes */
+    allocateFile(fileName, size, bufferSize);
+    generateRandomFile(fileName, size, bufferSize);
+    verifyFile(fileName, size, bufferSize);
+
+    /* append to the file */
+    appendFile(fileName, size, bufferSize);
+    verifyFile(fileName, size+bufferSize, ROUNDDOWN(16*1024, bufferSize));  /* larger buffer */
+
+    /* Read back as random reads */
+    verifyRandomFile(fileName, size+bufferSize, bufferSize);
+
+	regression(fileName, bufferSize);
+
+    /* Clean things up */
+    deleteFile(fileName);
+}
+
+/* run a matrix of tests for various file sizes and I/O sizes.  All will use a 1K block size. */
+void seekTest(CreateTestStackFn testStack, char *nameFmt)
+{
+    for (int fileIdx = 0; fileIdx<countof(fileSize); fileIdx++)
+        for (int bufIdx = 0; bufIdx<countof(blockSize); bufIdx++)
+            if  (fileSize[fileIdx] / blockSize[bufIdx] < 4 * 1024 * 1024)  // Keep nr blocks under 4M to complete in reasonable time.
+                singleSeekTest(testStack, nameFmt, fileSize[fileIdx], blockSize[bufIdx]);
+}
+
+
+
+/* Run a test on a single configuration determined by file size and buffer size */
+void singleStreamTest(CreateTestStackFn testStack, char *nameFmt, off_t size, size_t bufferSize)
+{
+    char fileName[PATH_MAX];
+    snprintf(fileName, sizeof(fileName), nameFmt, size, bufferSize);
+
+    beginTest(fileName);
+
+	/* Inject the procedure to create an I/O Stack */
+	setTestStack(testStack, bufferSize);
+
+	generateFile(fileName, size, bufferSize);
+    verifyFile(fileName, size, bufferSize);
+
+    appendFile(fileName, size, bufferSize);
+    verifyFile(fileName, size + bufferSize, 16 * 1024);
+
+	regression(fileName, bufferSize);
+
+    /* Clean things up */
+    deleteFile(fileName);
+}
+
+
+/* run a matrix of tests for various file sizes and buffer sizes */
+void streamTest(CreateTestStackFn testStack, char *nameFmt)
+{
+    for (int fileIdx = 0; fileIdx<countof(fileSize); fileIdx++)
+        for (int bufIdx = 0; bufIdx<countof(blockSize); bufIdx++)
+			if  (fileSize[fileIdx] / blockSize[bufIdx] < 4 * 1024 * 1024)  // Keep nr blocks under 4M to complete in reasonable time.
+                singleStreamTest(testStack, nameFmt, fileSize[fileIdx], blockSize[bufIdx]);
+}
+
+
+
+/* Run a test on a single configuration determined by file size and buffer size */
+void singleReadSeekTest(CreateTestStackFn testStack, char *nameFmt, off_t fileSize, size_t bufferSize)
+{
+    char fileName[PATH_MAX];
+    snprintf(fileName, sizeof(fileName), nameFmt, fileSize, bufferSize);
+
+    beginTest(fileName);
+
+	/* Set up the I/O stack we want to test */
+	setTestStack(testStack, bufferSize);
+
+	generateFile(fileName, fileSize, bufferSize);
+    verifyFile(fileName, fileSize, bufferSize);
+
+    verifyRandomFile(fileName, fileSize, bufferSize);
+
+    appendFile(fileName, fileSize, bufferSize);
+    verifyRandomFile(fileName, fileSize + bufferSize, bufferSize);
+
+	regression(fileName, bufferSize);
+
+    /* Clean things up */
+    deleteFile(fileName);
+}
+
+/* Create a test stack with a certain blockSize */
+typedef IoStack *(*CreateTestStackFn)(size_t blockSize);
+
+/* run a matrix of tests for various file sizes and buffer sizes */
+void readSeekTest(CreateTestStackFn testStack, char *nameFmt)
+{
+    for (int fileIdx = 0; fileIdx<countof(fileSize); fileIdx++)
+        for (int bufIdx = 0; bufIdx<countof(blockSize); bufIdx++)
+			if  (fileSize[fileIdx] / blockSize[bufIdx] < 4 * 1024 * 1024)  // Keep nr blocks under 4M to complete in reasonable time.
+                singleReadSeekTest(testStack, nameFmt, fileSize[fileIdx], blockSize[bufIdx]);
+}
+
+/*
+ * Here is a nuisance problem for testing I/O Stacks.
+ * PG_TESTSTACK requires a fully built IoStack as a prototype,
+ * but the test framework wants a "createTestStack(blockSize) function which accepts a blockSize parameter.
+ * In functional programing the solution would be easy - simply create a new function
+ * by binding blockSize in a lambda expression.
+ * Implementing in C is awkward. Our solution is to save blockSize and createTestStack(blockSize)
+ * in global variables and implement "createStack()" on top of them. Not elegant, but it
+ * is "good enough" for unit testing.
+ */
+
+/* Create boundFunction() by binding blockSize */
+static size_t boundBlockSize;
+static CreateTestStackFn boundTestStackFn;
+
+static inline IoStack *boundFunction()
+{
+	return boundTestStackFn(boundBlockSize);
+}
+
+/*
+ * Setup up test stack for PG_TESTSTACK.
+ */
+void setTestStack(CreateTestStackFn fn, size_t blockSize)
+{
+	//ioStackTest = fn(blockSize);
+}
diff --git a/src/test/storage/framework/fileFramework.h b/src/test/storage/framework/fileFramework.h
new file mode 100644
index 0000000000..13f1ea26d4
--- /dev/null
+++ b/src/test/storage/framework/fileFramework.h
@@ -0,0 +1,25 @@
+/* */
+#ifndef FILTER_FILEFRAMEWORK_H
+#define FILTER_FILEFRAMEWORK_H
+
+//#include "storage/iostack.h"
+
+
+#define PG_TESTSTACK 0
+
+/* Function type to create an IoStack with the given block size */
+typedef void *(*CreateStackFn)(size_t blockSize);
+
+void seekTest(CreateStackFn createStack, char *nameFmt);
+void singleSeekTest(CreateStackFn createStack, char *nameFmt, off_t fileSize, size_t bufSize);
+
+void streamTest(CreateStackFn createStack, char *nameFmt);
+void singleStreamTest(CreateStackFn createStack, char *nameFmt, off_t fileSize, size_t bufSize);
+
+void readSeekTest(CreateStackFn createStack, char *nameFmt);
+void singleReadSeekTest(CreateStackFn createStack, char *nameFmt, off_t fileSize, size_t bufSize);
+
+#include "unitTestInternal.h"
+
+
+#endif //FILTER_FILEFRAMEWORK_H
diff --git a/src/test/storage/framework/unitTest.h b/src/test/storage/framework/unitTest.h
new file mode 100644
index 0000000000..0dad661600
--- /dev/null
+++ b/src/test/storage/framework/unitTest.h
@@ -0,0 +1,20 @@
+//
+// Created by John Morris on 10/20/22.
+//
+
+#ifndef FILTER_UNITTESTFRAMEWORK_H
+#define FILTER_UNITTESTFRAMEWORK_H
+#include "unitTestInternal.h"
+void testMain(void);
+char *progname;
+extern void InitFileAccess(void);
+int main(int argc, char **argv)
+{
+	progname = argv[0];
+	MemoryContextInit();
+	InitFileAccess();
+    testMain();
+}
+
+
+#endif //FILTER_UNITTESTFRAMEWORK_H
diff --git a/src/test/storage/framework/unitTestInternal.h b/src/test/storage/framework/unitTestInternal.h
new file mode 100644
index 0000000000..f95a72b8f7
--- /dev/null
+++ b/src/test/storage/framework/unitTestInternal.h
@@ -0,0 +1,84 @@
+/*
+ * A set of macros for creating simple unit tests.
+ */
+#ifndef FILTER_UNITTESTINTERNAL_H
+#define FILTER_UNITTESTINTERNAL_H
+
+#include <stdio.h>
+#include <stdlib.h>
+extern void MemoryContextInit(void);
+
+#define TEST_DIR "/tmp/pgtest/"
+
+#define BEGIN do {
+#define END   } while (0)
+
+// static const char *expectFmt = "Expected '%s' but got '%s'";
+#define expectFmt "Expected '%s' but got '%s'"
+
+/* Is an integer expression signed? Note _Generic is a C11 feature.  Define as "false" if c99. */
+#define isSigned(x) _Generic(x,  \
+     char: true,                 \
+	 int: true,                     \
+	 long: true,                    \
+	 long long: true,               \
+	 float: (void)0,                   \
+	 double: (void)0,                  \
+	 default: false)
+
+
+/* Verify two scaler values are equal */
+#define PG_ASSERT_EQ(a,b)                                                                                              \
+   BEGIN                                                                                                               \
+       uint64_t _a=a; uint64_t _b = b;                                                                                 \
+       char _bufa[16], _bufb[16];                                                                                      \
+       if ( _a != _b || (isSigned(a) && (int64_t)_a < 0) != (isSigned(b) && (int64_t)_b < 0))                          \
+           PG_ASSERT_FMT(expectFmt, PG_INT_TO_STR(a, _a, _bufa), PG_INT_TO_STR(b, _b, _bufb));                         \
+   END
+
+
+/* Verify two strings are equal */
+#define PG_ASSERT_EQ_STR(stra, strb)                                                                                   \
+    BEGIN                                                                                                              \
+        if (strcmp(stra, strb) != 0)                                                                                   \
+            PG_ASSERT_FMT(expectFmt, stra, strb);                                                                      \
+    END
+
+
+/*
+ * Format a signed/unsigned integer as a string.
+ * Since we only want to evalate an expression once, accepts both expression (for type) and value.
+ */
+#define PG_INT_TO_STR(expr, val, buf)                                                                                      \
+    isSigned(expr)? (snprintf(buf, sizeof(buf), "%lld", (long long)(val)), buf)                                     \
+           : (snprintf(buf, sizeof(buf), "%llu",   (unsigned long long)(val)), buf)
+
+
+#define PG_ASSERT_ERRNO(_expectedErrno) \
+    BEGIN                             \
+		PG_ASSERT_EQ(_expectedErrno, errno) ; \
+    END
+
+/* Display a formatted message and exit */
+#define PG_ASSERT_FMT(fmt, ...)                                                                                        \
+    BEGIN                                                                                                              \
+        char _buf[256];                                                                                                \
+        snprintf(_buf, sizeof(_buf), fmt, __VA_ARGS__);                                                                \
+        PG_ASSERT_MSG(_buf);                                                                                           \
+    END
+
+/* Display an unformatted message and exit */
+#define PG_ASSERT_MSG(msg)                                                                                             \
+    (fprintf(stderr, "FAILED: %s (%s:%d) %s\n", __func__, __FILE__, __LINE__, msg ), abort())
+
+/* Verify the expression is true */
+#define PG_ASSERT(expr)                                                                                                \
+    BEGIN                                                                                                              \
+        if (!(expr))                                                                                                   \
+            PG_ASSERT_MSG("'" #expr "' is false");                                                                     \
+    END
+
+static inline void beginTestGroup(char *name) {fprintf(stderr, "Begin Testgroup %s\n", name);}
+static inline void beginTest(char *name) {fprintf(stderr, "    Test %s\n", name);}
+
+#endif //FILTER_UNITTESTINTERNAL_H
diff --git a/src/test/storage/meson.build b/src/test/storage/meson.build
new file mode 100644
index 0000000000..732363dd4d
--- /dev/null
+++ b/src/test/storage/meson.build
@@ -0,0 +1,22 @@
+####################################################################
+# Create an archive library containing all the backend object files.
+####################################################################
+test_lib = static_library('test_lib',
+   objects: backend_objs,
+   )
+
+
+###################################################
+# Individual unit tests in the 'iostack' test suite.
+#  TODO: refactor so each is a single line?
+###################################################
+
+# Test the basic File layer
+storage_vfdtest = executable('storage_vfdtest',
+  files('vfdTest.c',  'framework/fileFramework.c', ),
+  include_directories: [include_directories('.'), postgres_inc],
+  link_args: backend_link_args,
+  link_with: [test_lib, backend_link_with],
+  link_depends: backend_link_depends,
+)
+test('storage/vfd', storage_vfdtest, suite: 'iostack')
diff --git a/src/test/storage/vfdTest.c b/src/test/storage/vfdTest.c
new file mode 100644
index 0000000000..b01ae19ad1
--- /dev/null
+++ b/src/test/storage/vfdTest.c
@@ -0,0 +1,23 @@
+/*
+ *
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/fcntl.h>
+#include "./framework/fileFramework.h"
+#include "./framework/unitTest.h"
+
+
+#define createStack NULL
+
+void testMain()
+{
+    system("rm -rf " TEST_DIR "vfd; mkdir -p " TEST_DIR "vfd");
+
+
+
+	beginTest("Storage");
+    beginTestGroup("Vfd Stack");
+	singleSeekTest(createStack, TEST_DIR "vfd/testfile_%u_%u.dat", 1024, 4096);
+    seekTest(createStack, TEST_DIR "vfd/testfile_%u_%u.dat");
+}
-- 
2.33.0

0002-UseNewFileAPI.patchapplication/octet-stream; name=0002-UseNewFileAPI.patchDownload
From 3e1ef58a06fe6d1521e47535c335c5f5622f6ab9 Mon Sep 17 00:00:00 2001
From: John Morris <john.morris@crunchydata.com>
Date: Wed, 28 Jun 2023 20:27:17 -0700
Subject: [PATCH 1/6] pg_stat_statement5 passes, including initializing
 postmaster

---
 .../pg_stat_statements/pg_stat_statements.c   | 139 +++++++++---------
 src/backend/postmaster/postmaster.c           |  63 ++++----
 2 files changed, 101 insertions(+), 101 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 8dcb2ddd64..be7f023aef 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -500,8 +500,8 @@ pgss_shmem_startup(void)
 {
 	bool		found;
 	HASHCTL		info;
-	FILE	   *file = NULL;
-	FILE	   *qfile = NULL;
+	File	    file = -1;
+	File	    qfile = -1;
 	uint32		header;
 	int32		num;
 	int32		pgver;
@@ -570,8 +570,8 @@ pgss_shmem_startup(void)
 	unlink(PGSS_TEXT_FILE);
 
 	/* Allocate new query text temp file */
-	qfile = AllocateFile(PGSS_TEXT_FILE, PG_BINARY_W);
-	if (qfile == NULL)
+	qfile = PathNameOpenFile(PGSS_TEXT_FILE, O_WRONLY | O_CREAT | O_TRUNC | PG_BINARY);
+	if (qfile < 0)
 		goto write_error;
 
 	/*
@@ -581,29 +581,29 @@ pgss_shmem_startup(void)
 	 */
 	if (!pgss_save)
 	{
-		FreeFile(qfile);
+		FileClose(qfile);
 		return;
 	}
 
 	/*
 	 * Attempt to load old statistics from the dump file.
 	 */
-	file = AllocateFile(PGSS_DUMP_FILE, PG_BINARY_R);
-	if (file == NULL)
+	file = PathNameOpenFile(PGSS_DUMP_FILE, O_RDONLY|PG_BINARY);
+	if (file < 0)
 	{
 		if (errno != ENOENT)
 			goto read_error;
 		/* No existing persisted stats file, so we're done */
-		FreeFile(qfile);
+		FileClose(qfile);
 		return;
 	}
 
 	buffer_size = 2048;
 	buffer = (char *) palloc(buffer_size);
 
-	if (fread(&header, sizeof(uint32), 1, file) != 1 ||
-		fread(&pgver, sizeof(uint32), 1, file) != 1 ||
-		fread(&num, sizeof(int32), 1, file) != 1)
+	if (FileReadSeq(file, &header, sizeof(uint32), 0) != sizeof(uint32) ||
+		FileReadSeq(file, &pgver, sizeof(uint32), 0) != sizeof(uint32) ||
+		FileReadSeq(file, &num, sizeof(int32), 0) != sizeof(uint32))
 		goto read_error;
 
 	if (header != PGSS_FILE_HEADER ||
@@ -616,7 +616,7 @@ pgss_shmem_startup(void)
 		pgssEntry  *entry;
 		Size		query_offset;
 
-		if (fread(&temp, sizeof(pgssEntry), 1, file) != 1)
+		if (FileReadSeq(file, &temp, sizeof(pgssEntry), 0 != sizeof(pgssEntry)))
 			goto read_error;
 
 		/* Encoding is the only field we can easily sanity-check */
@@ -630,7 +630,7 @@ pgss_shmem_startup(void)
 			buffer = repalloc(buffer, buffer_size);
 		}
 
-		if (fread(buffer, 1, temp.query_len + 1, file) != temp.query_len + 1)
+		if (FileReadSeq(file, buffer, temp.query_len + 1, 0) != temp.query_len + 1)
 			goto read_error;
 
 		/* Should have a trailing null, but let's make sure */
@@ -642,7 +642,7 @@ pgss_shmem_startup(void)
 
 		/* Store the query text */
 		query_offset = pgss->extent;
-		if (fwrite(buffer, 1, temp.query_len + 1, qfile) != temp.query_len + 1)
+		if (FileWriteSeq(qfile, buffer, temp.query_len + 1, 0) != temp.query_len + 1)
 			goto write_error;
 		pgss->extent += temp.query_len + 1;
 
@@ -656,12 +656,12 @@ pgss_shmem_startup(void)
 	}
 
 	/* Read global statistics for pg_stat_statements */
-	if (fread(&pgss->stats, sizeof(pgssGlobalStats), 1, file) != 1)
+	if (FileReadSeq(file, &pgss->stats, sizeof(pgssGlobalStats), 0) != 1)
 		goto read_error;
 
 	pfree(buffer);
-	FreeFile(file);
-	FreeFile(qfile);
+	FileClose(file);
+	FileClose(qfile);
 
 	/*
 	 * Remove the persisted stats file so it's not included in
@@ -700,10 +700,10 @@ write_error:
 fail:
 	if (buffer)
 		pfree(buffer);
-	if (file)
-		FreeFile(file);
-	if (qfile)
-		FreeFile(qfile);
+	if (file >= 0)
+		FileClose(file);
+	if (qfile >= 0)
+		FileClose(qfile);
 	/* If possible, throw away the bogus file; ignore any error */
 	unlink(PGSS_DUMP_FILE);
 
@@ -722,7 +722,7 @@ fail:
 static void
 pgss_shmem_shutdown(int code, Datum arg)
 {
-	FILE	   *file;
+	File	   file;
 	char	   *qbuffer = NULL;
 	Size		qbuffer_size = 0;
 	HASH_SEQ_STATUS hash_seq;
@@ -741,16 +741,16 @@ pgss_shmem_shutdown(int code, Datum arg)
 	if (!pgss_save)
 		return;
 
-	file = AllocateFile(PGSS_DUMP_FILE ".tmp", PG_BINARY_W);
-	if (file == NULL)
+	file = PathNameOpenFile(PGSS_DUMP_FILE ".tmp", O_WRONLY | O_CREAT | O_TRUNC | PG_BINARY);
+	if (file < 0)
 		goto error;
 
-	if (fwrite(&PGSS_FILE_HEADER, sizeof(uint32), 1, file) != 1)
+	if (FileWriteSeq(file, &PGSS_FILE_HEADER, sizeof(uint32), 0) != sizeof(uint32))
 		goto error;
-	if (fwrite(&PGSS_PG_MAJOR_VERSION, sizeof(uint32), 1, file) != 1)
+	if (FileWriteSeq(file, &PGSS_PG_MAJOR_VERSION, sizeof(uint32), 0) != sizeof(uint32))
 		goto error;
 	num_entries = hash_get_num_entries(pgss_hash);
-	if (fwrite(&num_entries, sizeof(int32), 1, file) != 1)
+	if (FileWriteSeq(file, &num_entries, sizeof(int32), 0) != 1)
 		goto error;
 
 	qbuffer = qtext_load_file(&qbuffer_size);
@@ -771,8 +771,8 @@ pgss_shmem_shutdown(int code, Datum arg)
 		if (qstr == NULL)
 			continue;			/* Ignore any entries with bogus texts */
 
-		if (fwrite(entry, sizeof(pgssEntry), 1, file) != 1 ||
-			fwrite(qstr, 1, len + 1, file) != len + 1)
+		if (FileWriteSeq(file, entry, sizeof(pgssEntry), 0) != sizeof(pgssEntry) ||
+			FileWriteSeq(file, &qstr, len + 1, 0) != len + 1)
 		{
 			/* note: we assume hash_seq_term won't change errno */
 			hash_seq_term(&hash_seq);
@@ -781,15 +781,15 @@ pgss_shmem_shutdown(int code, Datum arg)
 	}
 
 	/* Dump global statistics for pg_stat_statements */
-	if (fwrite(&pgss->stats, sizeof(pgssGlobalStats), 1, file) != 1)
+	if (FileWriteSeq(file, &pgss->stats, sizeof(pgssGlobalStats), 0) != sizeof(pgssGlobalStats))
 		goto error;
 
 	free(qbuffer);
 	qbuffer = NULL;
 
-	if (FreeFile(file))
+	if (FileClose(file) < 0)
 	{
-		file = NULL;
+		file = -1;
 		goto error;
 	}
 
@@ -809,8 +809,8 @@ error:
 			 errmsg("could not write file \"%s\": %m",
 					PGSS_DUMP_FILE ".tmp")));
 	free(qbuffer);
-	if (file)
-		FreeFile(file);
+	if (file >= 0)
+		FileClose(file);
 	unlink(PGSS_DUMP_FILE ".tmp");
 	unlink(PGSS_TEXT_FILE);
 }
@@ -2081,7 +2081,7 @@ qtext_store(const char *query, int query_len,
 			Size *query_offset, int *gc_count)
 {
 	Size		off;
-	int			fd;
+	File		fd;
 
 	/*
 	 * We use a spinlock to protect extent/n_writers/gc_count, so that
@@ -2114,16 +2114,16 @@ qtext_store(const char *query, int query_len,
 	}
 
 	/* Now write the data into the successfully-reserved part of the file */
-	fd = OpenTransientFile(PGSS_TEXT_FILE, O_RDWR | O_CREAT | PG_BINARY);
+	fd = PathNameOpenFile(PGSS_TEXT_FILE, O_RDWR | O_CREAT | PG_BINARY);
 	if (fd < 0)
 		goto error;
 
-	if (pg_pwrite(fd, query, query_len, off) != query_len)
+	if (FileWrite(fd, query, query_len, off, 0) != query_len)
 		goto error;
-	if (pg_pwrite(fd, "\0", 1, off + query_len) != 1)
+	if (FileWrite(fd, "\0", 1, off + query_len, 0) != 1)
 		goto error;
 
-	CloseTransientFile(fd);
+	FileClose(fd);
 
 	/* Mark our write complete */
 	{
@@ -2172,11 +2172,11 @@ static char *
 qtext_load_file(Size *buffer_size)
 {
 	char	   *buf;
-	int			fd;
-	struct stat stat;
+	File		fd;
 	Size		nread;
+    off_t       length;
 
-	fd = OpenTransientFile(PGSS_TEXT_FILE, O_RDONLY | PG_BINARY);
+	fd = PathNameOpenFile(PGSS_TEXT_FILE, O_RDONLY | PG_BINARY);
 	if (fd < 0)
 	{
 		if (errno != ENOENT)
@@ -2188,19 +2188,20 @@ qtext_load_file(Size *buffer_size)
 	}
 
 	/* Get file length */
-	if (fstat(fd, &stat))
+	length = FileSize(fd);
+	if (length < 0)
 	{
 		ereport(LOG,
 				(errcode_for_file_access(),
 				 errmsg("could not stat file \"%s\": %m",
 						PGSS_TEXT_FILE)));
-		CloseTransientFile(fd);
+		FileClose(fd);
 		return NULL;
 	}
 
 	/* Allocate buffer; beware that off_t might be wider than size_t */
-	if (stat.st_size <= MaxAllocHugeSize)
-		buf = (char *) malloc(stat.st_size);
+	if (length <= MaxAllocHugeSize)
+		buf = (char *) malloc(length);
 	else
 		buf = NULL;
 	if (buf == NULL)
@@ -2210,7 +2211,7 @@ qtext_load_file(Size *buffer_size)
 				 errmsg("out of memory"),
 				 errdetail("Could not allocate enough memory to read file \"%s\".",
 						   PGSS_TEXT_FILE)));
-		CloseTransientFile(fd);
+		FileClose(fd);
 		return NULL;
 	}
 
@@ -2220,9 +2221,9 @@ qtext_load_file(Size *buffer_size)
 	 * so read a very large file in 1GB segments.
 	 */
 	nread = 0;
-	while (nread < stat.st_size)
+	while (nread < length)
 	{
-		int			toread = Min(1024 * 1024 * 1024, stat.st_size - nread);
+		int			toread = Min(1024 * 1024 * 1024, length - nread);
 
 		/*
 		 * If we get a short read and errno doesn't get set, the reason is
@@ -2232,7 +2233,7 @@ qtext_load_file(Size *buffer_size)
 		 * writes from garbage collection.
 		 */
 		errno = 0;
-		if (read(fd, buf + nread, toread) != toread)
+		if (FileReadSeq(fd, buf + nread, toread, 0) != toread)
 		{
 			if (errno)
 				ereport(LOG,
@@ -2240,13 +2241,13 @@ qtext_load_file(Size *buffer_size)
 						 errmsg("could not read file \"%s\": %m",
 								PGSS_TEXT_FILE)));
 			free(buf);
-			CloseTransientFile(fd);
+			FileClose(fd);
 			return NULL;
 		}
 		nread += toread;
 	}
 
-	if (CloseTransientFile(fd) != 0)
+	if (FileClose(fd) != 0)
 		ereport(LOG,
 				(errcode_for_file_access(),
 				 errmsg("could not close file \"%s\": %m", PGSS_TEXT_FILE)));
@@ -2342,7 +2343,7 @@ gc_qtexts(void)
 {
 	char	   *qbuffer;
 	Size		qbuffer_size;
-	FILE	   *qfile = NULL;
+	File	   qfile = -1;
 	HASH_SEQ_STATUS hash_seq;
 	pgssEntry  *entry;
 	Size		extent;
@@ -2373,8 +2374,8 @@ gc_qtexts(void)
 	 * larger, this should always work on traditional filesystems; though we
 	 * could still lose on copy-on-write filesystems.
 	 */
-	qfile = AllocateFile(PGSS_TEXT_FILE, PG_BINARY_W);
-	if (qfile == NULL)
+	qfile = PathNameOpenFile(PGSS_TEXT_FILE, O_WRONLY | O_CREAT | O_TRUNC | PG_BINARY);
+	if (qfile < 0)
 	{
 		ereport(LOG,
 				(errcode_for_file_access(),
@@ -2404,7 +2405,7 @@ gc_qtexts(void)
 			continue;
 		}
 
-		if (fwrite(qry, 1, query_len + 1, qfile) != query_len + 1)
+		if (FileWriteSeq(qfile, qry, query_len + 1, 0) != query_len + 1)
 		{
 			ereport(LOG,
 					(errcode_for_file_access(),
@@ -2423,19 +2424,19 @@ gc_qtexts(void)
 	 * Truncate away any now-unused space.  If this fails for some odd reason,
 	 * we log it, but there's no need to fail.
 	 */
-	if (ftruncate(fileno(qfile), extent) != 0)
+	if (FileTruncate(qfile, extent, 0) != 0)
 		ereport(LOG,
 				(errcode_for_file_access(),
 				 errmsg("could not truncate file \"%s\": %m",
 						PGSS_TEXT_FILE)));
 
-	if (FreeFile(qfile))
+	if (FileClose(qfile) != 0)
 	{
 		ereport(LOG,
 				(errcode_for_file_access(),
 				 errmsg("could not write file \"%s\": %m",
 						PGSS_TEXT_FILE)));
-		qfile = NULL;
+		qfile = -1;
 		goto gc_fail;
 	}
 
@@ -2469,8 +2470,8 @@ gc_qtexts(void)
 
 gc_fail:
 	/* clean up resources */
-	if (qfile)
-		FreeFile(qfile);
+	if (qfile < 0)
+		FileClose(qfile);
 	free(qbuffer);
 
 	/*
@@ -2488,14 +2489,14 @@ gc_fail:
 	 * Destroy the query text file and create a new, empty one
 	 */
 	(void) unlink(PGSS_TEXT_FILE);
-	qfile = AllocateFile(PGSS_TEXT_FILE, PG_BINARY_W);
-	if (qfile == NULL)
+	qfile = PathNameOpenFile(PGSS_TEXT_FILE, O_WRONLY | O_CREAT | O_TRUNC | PG_BINARY);
+	if (qfile < 0)
 		ereport(LOG,
 				(errcode_for_file_access(),
 				 errmsg("could not recreate file \"%s\": %m",
 						PGSS_TEXT_FILE)));
 	else
-		FreeFile(qfile);
+		FileClose(qfile);
 
 	/* Reset the shared extent pointer */
 	pgss->extent = 0;
@@ -2525,7 +2526,7 @@ entry_reset(Oid userid, Oid dbid, uint64 queryid)
 {
 	HASH_SEQ_STATUS hash_seq;
 	pgssEntry  *entry;
-	FILE	   *qfile;
+	File	   qfile;
 	long		num_entries;
 	long		num_remove = 0;
 	pgssHashKey key;
@@ -2608,8 +2609,8 @@ entry_reset(Oid userid, Oid dbid, uint64 queryid)
 	 * Write new empty query file, perhaps even creating a new one to recover
 	 * if the file was missing.
 	 */
-	qfile = AllocateFile(PGSS_TEXT_FILE, PG_BINARY_W);
-	if (qfile == NULL)
+	qfile = PathNameOpenFile(PGSS_TEXT_FILE, O_WRONLY | O_CREAT | O_TRUNC | PG_BINARY);
+	if (qfile < 0)
 	{
 		ereport(LOG,
 				(errcode_for_file_access(),
@@ -2619,13 +2620,13 @@ entry_reset(Oid userid, Oid dbid, uint64 queryid)
 	}
 
 	/* If ftruncate fails, log it, but it's not a fatal problem */
-	if (ftruncate(fileno(qfile), 0) != 0)
+	if (FileTruncate(qfile, 0, 0) != 0)
 		ereport(LOG,
 				(errcode_for_file_access(),
 				 errmsg("could not truncate file \"%s\": %m",
 						PGSS_TEXT_FILE)));
 
-	FreeFile(qfile);
+	FileClose(qfile);
 
 done:
 	pgss->extent = 0;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 4c49393fc5..56dc016d3c 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -859,6 +859,9 @@ PostmasterMain(int argc, char *argv[])
 		ExitPostmaster(1);
 	}
 
+	/* Enable I/O to virtual files */
+	InitFileAccess();
+
 	/*
 	 * Locate the proper configuration files and data directory, and read
 	 * postgresql.conf for the first time.
@@ -1361,12 +1364,12 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	if (external_pid_file)
 	{
-		FILE	   *fpidfile = fopen(external_pid_file, "w");
+		File	   fpidfile = PathNameOpenFile(external_pid_file, O_WRONLY|O_CREAT|O_TRUNC);
 
-		if (fpidfile)
+		if (fpidfile >= 0)
 		{
-			fprintf(fpidfile, "%d\n", MyProcPid);
-			fclose(fpidfile);
+			FilePrintf(fpidfile, "%d\n", MyProcPid);
+			FileClose(fpidfile);
 
 			/* Make PID file world readable */
 			if (chmod(external_pid_file, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH) != 0)
@@ -1580,12 +1583,12 @@ static void
 checkControlFile(void)
 {
 	char		path[MAXPGPATH];
-	FILE	   *fp;
+	File	   file;
 
 	snprintf(path, sizeof(path), "%s/global/pg_control", DataDir);
 
-	fp = AllocateFile(path, PG_BINARY_R);
-	if (fp == NULL)
+	file = PathNameOpenFile(path, O_RDONLY | PG_BINARY);
+	if (file < 0)
 	{
 		write_stderr("%s: could not find the database system\n"
 					 "Expected to find it in the directory \"%s\",\n"
@@ -1593,7 +1596,7 @@ checkControlFile(void)
 					 progname, DataDir, path, strerror(errno));
 		ExitPostmaster(2);
 	}
-	FreeFile(fp);
+	FileClose(file);
 }
 
 /*
@@ -4526,7 +4529,7 @@ internal_forkexec(int argc, char *argv[], Port *port)
 	pid_t		pid;
 	char		tmpfilename[MAXPGPATH];
 	BackendParameters param;
-	FILE	   *fp;
+	File	   file;
 
 	if (!save_backend_variables(&param, port))
 		return -1;				/* log made by save_backend_variables */
@@ -4537,8 +4540,8 @@ internal_forkexec(int argc, char *argv[], Port *port)
 			 MyProcPid, ++tmpBackendFileNum);
 
 	/* Open file */
-	fp = AllocateFile(tmpfilename, PG_BINARY_W);
-	if (!fp)
+	file  = PathNameOpenFile(tmpfilename, O_WRONLY|O_CREAT|O_TRUNC|PG_BINARY);
+	if (file < 0)
 	{
 		/*
 		 * As in OpenTemporaryFileInTablespace, try to make the temp-file
@@ -4546,8 +4549,8 @@ internal_forkexec(int argc, char *argv[], Port *port)
 		 */
 		(void) MakePGDirectory(PG_TEMP_FILES_DIR);
 
-		fp = AllocateFile(tmpfilename, PG_BINARY_W);
-		if (!fp)
+		file  = PathNameOpenFile(tmpfilename, O_WRONLY|O_CREAT|O_TRUNC|PG_BINARY);
+		if (file < 0)
 		{
 			ereport(LOG,
 					(errcode_for_file_access(),
@@ -4557,17 +4560,17 @@ internal_forkexec(int argc, char *argv[], Port *port)
 		}
 	}
 
-	if (fwrite(&param, sizeof(param), 1, fp) != 1)
+	if (FileWriteSeq(file, &param, sizeof(param), 0) != 1)
 	{
 		ereport(LOG,
 				(errcode_for_file_access(),
 				 errmsg("could not write to file \"%s\": %m", tmpfilename)));
-		FreeFile(fp);
+		FileClose(file);
 		return -1;
 	}
 
 	/* Release file */
-	if (FreeFile(fp))
+	if (FileClose(file))
 	{
 		ereport(LOG,
 				(errcode_for_file_access(),
@@ -5545,12 +5548,13 @@ MaybeStartWalReceiver(void)
 static bool
 CreateOptsFile(int argc, char *argv[], char *fullprogname)
 {
-	FILE	   *fp;
+	File	   file;
 	int			i;
 
 #define OPTS_FILE	"postmaster.opts"
 
-	if ((fp = fopen(OPTS_FILE, "w")) == NULL)
+	file = PathNameOpenFile(OPTS_FILE, O_WRONLY | O_CREAT | O_TRUNC);
+	if (file < 0)
 	{
 		ereport(LOG,
 				(errcode_for_file_access(),
@@ -5558,18 +5562,13 @@ CreateOptsFile(int argc, char *argv[], char *fullprogname)
 		return false;
 	}
 
-	fprintf(fp, "%s", fullprogname);
+	FilePrintf(file, "%s", fullprogname);
 	for (i = 1; i < argc; i++)
-		fprintf(fp, " \"%s\"", argv[i]);
-	fputs("\n", fp);
+		FilePrintf(file, " \"%s\"", argv[i]);
+	FilePuts(file, "\n");
 
-	if (fclose(fp))
-	{
-		ereport(LOG,
-				(errcode_for_file_access(),
-				 errmsg("could not write file \"%s\": %m", OPTS_FILE)));
+	if (FileClose(file))
 		return false;
-	}
 
 	return true;
 }
@@ -6226,18 +6225,18 @@ read_backend_variables(char *id, Port *port)
 
 #ifndef WIN32
 	/* Non-win32 implementation reads from file */
-	FILE	   *fp;
+	File	   file;
 
 	/* Open file */
-	fp = AllocateFile(id, PG_BINARY_R);
-	if (!fp)
+	file = PathNameOpenFile(id, O_RDONLY|PG_BINARY);
+	if (file < 0)
 	{
 		write_stderr("could not open backend variables file \"%s\": %s\n",
 					 id, strerror(errno));
 		exit(1);
 	}
 
-	if (fread(&param, sizeof(param), 1, fp) != 1)
+	if (FileReadSeq(file, &param, sizeof(param), 0 != 1)
 	{
 		write_stderr("could not read from backend variables file \"%s\": %s\n",
 					 id, strerror(errno));
@@ -6245,7 +6244,7 @@ read_backend_variables(char *id, Port *port)
 	}
 
 	/* Release file */
-	FreeFile(fp);
+	FileClose(file);
 	if (unlink(id) != 0)
 	{
 		write_stderr("could not remove file \"%s\": %s\n",
-- 
2.33.0


From ed9572251aba5739fae3c7c7799f9cb7e8779d52 Mon Sep 17 00:00:00 2001
From: John Morris <john.morris@crunchydata.com>
Date: Wed, 28 Jun 2023 21:01:02 -0700
Subject: [PATCH 2/6] Rewrite heap, with new sync routine

---
 src/backend/access/heap/rewriteheap.c | 57 ++++++---------------------
 src/backend/storage/file/fd.c         | 22 +++++++++++
 src/include/storage/fd.h              |  1 +
 3 files changed, 36 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 424958912c..1d8f317d48 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -1107,7 +1107,7 @@ void
 heap_xlog_logical_rewrite(XLogReaderState *r)
 {
 	char		path[MAXPGPATH];
-	int			fd;
+	File		file;
 	xl_heap_rewrite_mapping *xlrec;
 	uint32		len;
 	char	   *data;
@@ -1120,9 +1120,9 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
 			 LSN_FORMAT_ARGS(xlrec->start_lsn),
 			 xlrec->mapped_xid, XLogRecGetXid(r));
 
-	fd = OpenTransientFile(path,
+	file = PathNameOpenTemporaryFile(path,
 						   O_CREAT | O_WRONLY | PG_BINARY);
-	if (fd < 0)
+	if (file < 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not create file \"%s\": %m", path)));
@@ -1131,48 +1131,33 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
 	 * Truncate all data that's not guaranteed to have been safely fsynced (by
 	 * previous record or by the last checkpoint).
 	 */
-	pgstat_report_wait_start(WAIT_EVENT_LOGICAL_REWRITE_TRUNCATE);
-	if (ftruncate(fd, xlrec->offset) != 0)
+	if (FileTruncate(file, xlrec->offset, WAIT_EVENT_LOGICAL_REWRITE_TRUNCATE) != 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not truncate file \"%s\" to %u: %m",
 						path, (uint32) xlrec->offset)));
-	pgstat_report_wait_end();
 
 	data = XLogRecGetData(r) + sizeof(*xlrec);
 
 	len = xlrec->num_mappings * sizeof(LogicalRewriteMappingData);
 
 	/* write out tail end of mapping file (again) */
-	errno = 0;
-	pgstat_report_wait_start(WAIT_EVENT_LOGICAL_REWRITE_MAPPING_WRITE);
-	if (pg_pwrite(fd, data, len, xlrec->offset) != len)
-	{
-		/* if write didn't set errno, assume problem is no disk space */
-		if (errno == 0)
-			errno = ENOSPC;
+	if (FileWrite(file, data, len, xlrec->offset, WAIT_EVENT_LOGICAL_REWRITE_MAPPING_WRITE) != len)
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write to file \"%s\": %m", path)));
-	}
-	pgstat_report_wait_end();
 
 	/*
 	 * Now fsync all previously written data. We could improve things and only
 	 * do this for the last write to a file, but the required bookkeeping
 	 * doesn't seem worth the trouble.
 	 */
-	pgstat_report_wait_start(WAIT_EVENT_LOGICAL_REWRITE_MAPPING_SYNC);
-	if (pg_fsync(fd) != 0)
+	if (FileSync(file, WAIT_EVENT_LOGICAL_REWRITE_MAPPING_SYNC) != 0)
 		ereport(data_sync_elevel(ERROR),
 				(errcode_for_file_access(),
 				 errmsg("could not fsync file \"%s\": %m", path)));
-	pgstat_report_wait_end();
 
-	if (CloseTransientFile(fd) != 0)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not close file \"%s\": %m", path)));
+	FileClose(file);
 }
 
 /* ---
@@ -1249,39 +1234,23 @@ CheckPointLogicalRewriteHeap(void)
 		}
 		else
 		{
-			/* on some operating systems fsyncing a file requires O_RDWR */
-			int			fd = OpenTransientFile(path, O_RDWR | PG_BINARY);
-
 			/*
 			 * The file cannot vanish due to concurrency since this function
 			 * is the only one removing logical mappings and only one
 			 * checkpoint can be in progress at a time.
-			 */
-			if (fd < 0)
-				ereport(ERROR,
-						(errcode_for_file_access(),
-						 errmsg("could not open file \"%s\": %m", path)));
-
-			/*
-			 * We could try to avoid fsyncing files that either haven't
+			 * We could try to avoid syncing files that either haven't
 			 * changed or have only been created since the checkpoint's start,
 			 * but it's currently not deemed worth the effort.
 			 */
-			pgstat_report_wait_start(WAIT_EVENT_LOGICAL_REWRITE_CHECKPOINT_SYNC);
-			if (pg_fsync(fd) != 0)
+			if (PathNameFileSync(path, WAIT_EVENT_LOGICAL_REWRITE_CHECKPOINT_SYNC) < 0)
 				ereport(data_sync_elevel(ERROR),
 						(errcode_for_file_access(),
-						 errmsg("could not fsync file \"%s\": %m", path)));
-			pgstat_report_wait_end();
-
-			if (CloseTransientFile(fd) != 0)
-				ereport(ERROR,
-						(errcode_for_file_access(),
-						 errmsg("could not close file \"%s\": %m", path)));
+							errmsg("could not fsync file \"%s\": %m", path)));
 		}
 	}
-	FreeDir(mappings_dir);
 
-	/* persist directory entries to disk */
+
+	/* Delete the directory and persist the directory changes */
+	FreeDir(mappings_dir);
 	fsync_fname("pg_logical/mappings", true);
 }
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index cd7441eb37..488e279595 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -2416,6 +2416,28 @@ FileTruncate(File file, off_t offset, uint32 wait_event_info)
 	return returnCode;
 }
 
+int
+PathNameFileSync(const char *pathName, uint32 wait_event_info)
+{
+	int ret, save_errno;
+
+	/* Open the file, returning immediately if unable */
+	File file = PathNameOpenFile(pathName, O_RDWR | PG_BINARY);
+	if (file < 0)
+		return file;
+
+	/* Sync the now opened file, remembering if error occurred. */
+	ret = FileSync(file, wait_event_info);
+	if (ret == -1)
+	    setFileError(file, errno, "Error while syncing file: %s", pathName);
+
+	/* Close the file. */
+	FileClose(file);
+
+	/* Done, remembering the sync error */
+	return ret;
+}
+
 /*
  * Return the pathname associated with an open file.
  *
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index 69b6ce7111..343b12f90e 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -118,6 +118,7 @@ extern ssize_t	FileWrite(File file, const void *buffer, size_t amount, off_t off
 extern int	FileSync(File file, uint32 wait_event_info);
 extern int	FileZero(File file, off_t offset, off_t amount, uint32 wait_event_info);
 extern int	FileFallocate(File file, off_t offset, off_t amount, uint32 wait_event_info);
+extern int PathNameFileSync(const char *pathName, uint32 wait_event_info);
 
 extern off_t FileSize(File file);
 extern int	FileTruncate(File file, off_t offset, uint32 wait_event_info);
-- 
2.33.0


From f5d8636e6be041c30cb10393282298d020280832 Mon Sep 17 00:00:00 2001
From: John Morris <john.morris@crunchydata.com>
Date: Wed, 28 Jun 2023 21:01:27 -0700
Subject: [PATCH 3/6] Two phase uses new api

---
 src/backend/access/transam/twophase.c | 82 +++++++++------------------
 1 file changed, 26 insertions(+), 56 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 068e59bec0..b6db37c872 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1282,25 +1282,20 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok)
 	char		path[MAXPGPATH];
 	char	   *buf;
 	TwoPhaseFileHeader *hdr;
-	int			fd;
-	struct stat stat;
+	File		file;
 	uint32		crc_offset;
 	pg_crc32c	calc_crc,
 				file_crc;
 	int			r;
+    off_t       size;
 
 	TwoPhaseFilePath(path, xid);
 
-	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
-	if (fd < 0)
-	{
-		if (missing_ok && errno == ENOENT)
-			return NULL;
-
+	file = PathNameOpenTemporaryFile(path, O_RDONLY | PG_BINARY);
+	if (file < 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not open file \"%s\": %m", path)));
-	}
 
 	/*
 	 * Check file length.  We can determine a lower bound pretty easily. We
@@ -1308,37 +1303,35 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok)
 	 * we can't guarantee that we won't get an out of memory error anyway,
 	 * even on a valid file.
 	 */
-	if (fstat(fd, &stat))
+	size = FileSize(file);
+	if (size < 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not stat file \"%s\": %m", path)));
 
-	if (stat.st_size < (MAXALIGN(sizeof(TwoPhaseFileHeader)) +
+	if (size < (MAXALIGN(sizeof(TwoPhaseFileHeader)) +
 						MAXALIGN(sizeof(TwoPhaseRecordOnDisk)) +
 						sizeof(pg_crc32c)) ||
-		stat.st_size > MaxAllocSize)
+		size > MaxAllocSize)
 		ereport(ERROR,
 				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg_plural("incorrect size of file \"%s\": %lld byte",
-							   "incorrect size of file \"%s\": %lld bytes",
-							   (long long int) stat.st_size, path,
-							   (long long int) stat.st_size)));
+				 errmsg("incorrect size of file \"%s\": %lld byte",
+							   path, (long long int) size)));
 
-	crc_offset = stat.st_size - sizeof(pg_crc32c);
+	crc_offset = size - sizeof(pg_crc32c);
 	if (crc_offset != MAXALIGN(crc_offset))
 		ereport(ERROR,
 				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("incorrect alignment of CRC offset for file \"%s\"",
+                errmsg("incorrect alignment of CRC offset for file \"%s\"",
 						path)));
 
 	/*
 	 * OK, slurp in the file.
 	 */
-	buf = (char *) palloc(stat.st_size);
+	buf = (char *) palloc(size);
 
-	pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_READ);
-	r = read(fd, buf, stat.st_size);
-	if (r != stat.st_size)
+	r = FileReadSeq(file, buf, size, WAIT_EVENT_TWOPHASE_FILE_READ);
+	if (r != size)
 	{
 		if (r < 0)
 			ereport(ERROR,
@@ -1347,15 +1340,10 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok)
 		else
 			ereport(ERROR,
 					(errmsg("could not read file \"%s\": read %d of %lld",
-							path, r, (long long int) stat.st_size)));
+							path, r, (long long int) size)));
 	}
 
-	pgstat_report_wait_end();
-
-	if (CloseTransientFile(fd) != 0)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not close file \"%s\": %m", path)));
+	FileClose(file);
 
 	hdr = (TwoPhaseFileHeader *) buf;
 	if (hdr->magic != TWOPHASE_MAGIC)
@@ -1364,7 +1352,7 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok)
 				 errmsg("invalid magic number stored in file \"%s\"",
 						path)));
 
-	if (hdr->total_len != stat.st_size)
+	if (hdr->total_len != size)
 		ereport(ERROR,
 				(errcode(ERRCODE_DATA_CORRUPTED),
 				 errmsg("invalid size stored in file \"%s\"",
@@ -1714,7 +1702,7 @@ RecreateTwoPhaseFile(TransactionId xid, void *content, int len)
 {
 	char		path[MAXPGPATH];
 	pg_crc32c	statefile_crc;
-	int			fd;
+	File		file;
 
 	/* Recompute CRC */
 	INIT_CRC32C(statefile_crc);
@@ -1723,51 +1711,33 @@ RecreateTwoPhaseFile(TransactionId xid, void *content, int len)
 
 	TwoPhaseFilePath(path, xid);
 
-	fd = OpenTransientFile(path,
-						   O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
-	if (fd < 0)
+	file = PathNameOpenTemporaryFile(path, O_CREAT | O_TRUNC | O_WRONLY | PG_BINARY);
+	if (file < 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not recreate file \"%s\": %m", path)));
 
 	/* Write content and CRC */
-	errno = 0;
-	pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_WRITE);
-	if (write(fd, content, len) != len)
-	{
-		/* if write didn't set errno, assume problem is no disk space */
-		if (errno == 0)
-			errno = ENOSPC;
+	if (FileWriteSeq(file, content, len, WAIT_EVENT_TWOPHASE_FILE_WRITE) != len)
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write file \"%s\": %m", path)));
-	}
-	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
-	{
-		/* if write didn't set errno, assume problem is no disk space */
-		if (errno == 0)
-			errno = ENOSPC;
+
+	if (FileWriteSeq(file, &statefile_crc, sizeof(pg_crc32c), WAIT_EVENT_TWOPHASE_FILE_WRITE) != sizeof(pg_crc32c))
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write file \"%s\": %m", path)));
-	}
-	pgstat_report_wait_end();
 
 	/*
 	 * We must fsync the file because the end-of-replay checkpoint will not do
 	 * so, there being no GXACT in shared memory yet to tell it to.
 	 */
-	pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_SYNC);
-	if (pg_fsync(fd) != 0)
+	if (FileSync(file, WAIT_EVENT_TWOPHASE_FILE_SYNC) != 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not fsync file \"%s\": %m", path)));
-	pgstat_report_wait_end();
 
-	if (CloseTransientFile(fd) != 0)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not close file \"%s\": %m", path)));
+	FileClose(file);
 }
 
 /*
-- 
2.33.0


From 83492a36c892fc2b6071a73b467a13ea03de3500 Mon Sep 17 00:00:00 2001
From: John Morris <john.morris@crunchydata.com>
Date: Wed, 28 Jun 2023 21:05:32 -0700
Subject: [PATCH 4/6] commands and extension use new api

---
 src/backend/commands/dbcommands.c | 22 +++++++---------------
 src/backend/commands/extension.c  | 28 ++++++++++++----------------
 2 files changed, 19 insertions(+), 31 deletions(-)

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 09f1ab41ad..8b65bc69e1 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -456,7 +456,7 @@ ScanSourceDatabasePgClassTuple(HeapTupleData *tuple, Oid tbid, Oid dbid,
 static void
 CreateDirAndVersionFile(char *dbpath, Oid dbid, Oid tsid, bool isRedo)
 {
-	int			fd;
+	File		file;
 	int			nbytes;
 	char		versionfile[MAXPGPATH];
 	char		buf[16];
@@ -508,31 +508,23 @@ CreateDirAndVersionFile(char *dbpath, Oid dbid, Oid tsid, bool isRedo)
 	 */
 	snprintf(versionfile, sizeof(versionfile), "%s/%s", dbpath, "PG_VERSION");
 
-	fd = OpenTransientFile(versionfile, O_WRONLY | O_CREAT | O_EXCL | PG_BINARY);
-	if (fd < 0 && errno == EEXIST && isRedo)
-		fd = OpenTransientFile(versionfile, O_WRONLY | O_TRUNC | PG_BINARY);
+	file = PathNameOpenTemporaryFile(versionfile, O_WRONLY | O_CREAT | O_EXCL | PG_BINARY);
+	if (file < 0 && errno == EEXIST && isRedo)
+		file = PathNameOpenTemporaryFile(versionfile, O_WRONLY | O_TRUNC | PG_BINARY);
 
-	if (fd < 0)
+	if (file < 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not create file \"%s\": %m", versionfile)));
 
 	/* Write PG_MAJORVERSION in the PG_VERSION file. */
-	pgstat_report_wait_start(WAIT_EVENT_VERSION_FILE_WRITE);
-	errno = 0;
-	if ((int) write(fd, buf, nbytes) != nbytes)
-	{
-		/* If write didn't set errno, assume problem is no disk space. */
-		if (errno == 0)
-			errno = ENOSPC;
+	if (FileWriteSeq(file, buf, nbytes, WAIT_EVENT_VERSION_FILE_WRITE) != nbytes)
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write to file \"%s\": %m", versionfile)));
-	}
-	pgstat_report_wait_end();
 
 	/* Close the version file. */
-	CloseTransientFile(fd);
+	FileClose(file);
 
 	/* Critical section done. */
 	if (!isRedo)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 0eabe18335..a1220a530b 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -67,6 +67,7 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 #include "utils/varlena.h"
+#include "utils/wait_event.h"
 
 
 /* Globally visible state variables */
@@ -3435,37 +3436,32 @@ static char *
 read_whole_file(const char *filename, int *length)
 {
 	char	   *buf;
-	FILE	   *file;
+	File	   file;
 	size_t		bytes_to_read;
-	struct stat fst;
 
-	if (stat(filename, &fst) < 0)
+	file = PathNameOpenTemporaryFile(filename, O_RDONLY);
+	if (file < 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not stat file \"%s\": %m", filename)));
+				 errmsg("could not open file \"%s\" for reading: %m",
+						filename)));
 
-	if (fst.st_size > (MaxAllocSize - 1))
+	bytes_to_read = FileSize(file);
+	if (bytes_to_read > (MaxAllocSize -1))
 		ereport(ERROR,
 				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-				 errmsg("file \"%s\" is too large", filename)));
-	bytes_to_read = (size_t) fst.st_size;
+					errmsg("file \"%s\" is too large", filename)));
 
-	if ((file = AllocateFile(filename, PG_BINARY_R)) == NULL)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not open file \"%s\" for reading: %m",
-						filename)));
 
 	buf = (char *) palloc(bytes_to_read + 1);
 
-	*length = fread(buf, 1, bytes_to_read, file);
-
-	if (ferror(file))
+	*length = FileReadSeq(file, buf, bytes_to_read, 0);
+	if (*length != bytes_to_read)
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not read file \"%s\": %m", filename)));
 
-	FreeFile(file);
+	FileClose(file);
 
 	buf[*length] = '\0';
 	return buf;
-- 
2.33.0


From 5971a3715156613484670aa6ccb6a0f145d2828e Mon Sep 17 00:00:00 2001
From: John Morris <john.morris@crunchydata.com>
Date: Wed, 28 Jun 2023 21:11:22 -0700
Subject: [PATCH 5/6] Replication uses new api

---
 src/backend/replication/logical/origin.c      | 56 ++++-------
 .../replication/logical/reorderbuffer.c       | 94 +++++++------------
 src/backend/replication/logical/snapbuild.c   | 69 +++++---------
 src/backend/replication/slot.c                | 47 ++++------
 4 files changed, 93 insertions(+), 173 deletions(-)

diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index b0255ffd25..0f8dd4b134 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -574,7 +574,7 @@ CheckPointReplicationOrigin(void)
 {
 	const char *tmppath = "pg_logical/replorigin_checkpoint.tmp";
 	const char *path = "pg_logical/replorigin_checkpoint";
-	int			tmpfd;
+	File		file;
 	int			i;
 	uint32		magic = REPLICATION_STATE_MAGIC;
 	pg_crc32c	crc;
@@ -595,9 +595,9 @@ CheckPointReplicationOrigin(void)
 	 * no other backend can perform this at the same time; only one checkpoint
 	 * can happen at a time.
 	 */
-	tmpfd = OpenTransientFile(tmppath,
+	file = PathNameOpenTemporaryFile(tmppath,
 							  O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
-	if (tmpfd < 0)
+	if (file < 0)
 		ereport(PANIC,
 				(errcode_for_file_access(),
 				 errmsg("could not create file \"%s\": %m",
@@ -605,16 +605,12 @@ CheckPointReplicationOrigin(void)
 
 	/* write magic */
 	errno = 0;
-	if ((write(tmpfd, &magic, sizeof(magic))) != sizeof(magic))
-	{
-		/* if write didn't set errno, assume problem is no disk space */
-		if (errno == 0)
-			errno = ENOSPC;
+	if ((FileWriteSeq(file, &magic, sizeof(magic), 0)) != sizeof(magic))
 		ereport(PANIC,
 				(errcode_for_file_access(),
 				 errmsg("could not write to file \"%s\": %m",
 						tmppath)));
-	}
+	
 	COMP_CRC32C(crc, &magic, sizeof(magic));
 
 	/* prevent concurrent creations/drops */
@@ -644,19 +640,13 @@ CheckPointReplicationOrigin(void)
 
 		/* make sure we only write out a commit that's persistent */
 		XLogFlush(local_lsn);
-
-		errno = 0;
-		if ((write(tmpfd, &disk_state, sizeof(disk_state))) !=
+		
+		if ((FileWriteSeq(file, &disk_state, sizeof(disk_state), 0)) !=
 			sizeof(disk_state))
-		{
-			/* if write didn't set errno, assume problem is no disk space */
-			if (errno == 0)
-				errno = ENOSPC;
 			ereport(PANIC,
 					(errcode_for_file_access(),
 					 errmsg("could not write to file \"%s\": %m",
 							tmppath)));
-		}
 
 		COMP_CRC32C(crc, &disk_state, sizeof(disk_state));
 	}
@@ -665,23 +655,13 @@ CheckPointReplicationOrigin(void)
 
 	/* write out the CRC */
 	FIN_CRC32C(crc);
-	errno = 0;
-	if ((write(tmpfd, &crc, sizeof(crc))) != sizeof(crc))
-	{
-		/* if write didn't set errno, assume problem is no disk space */
-		if (errno == 0)
-			errno = ENOSPC;
+	if ((FileWriteSeq(file, &crc, sizeof(crc), 0)) != sizeof(crc))
 		ereport(PANIC,
 				(errcode_for_file_access(),
 				 errmsg("could not write to file \"%s\": %m",
 						tmppath)));
-	}
 
-	if (CloseTransientFile(tmpfd) != 0)
-		ereport(PANIC,
-				(errcode_for_file_access(),
-				 errmsg("could not close file \"%s\": %m",
-						tmppath)));
+	FileClose(file);
 
 	/* fsync, rename to permanent file, fsync file and directory */
 	durable_rename(tmppath, path, PANIC);
@@ -699,7 +679,7 @@ void
 StartupReplicationOrigin(void)
 {
 	const char *path = "pg_logical/replorigin_checkpoint";
-	int			fd;
+	File		file;
 	int			readBytes;
 	uint32		magic = REPLICATION_STATE_MAGIC;
 	int			last_state = 0;
@@ -721,22 +701,22 @@ StartupReplicationOrigin(void)
 
 	elog(DEBUG2, "starting up replication origin progress state");
 
-	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
+	file = PathNameOpenTemporaryFile(path, O_RDONLY | PG_BINARY);
 
 	/*
 	 * might have had max_replication_slots == 0 last run, or we just brought
 	 * up a standby.
 	 */
-	if (fd < 0 && errno == ENOENT)
+	if (file < 0 && errno == ENOENT)
 		return;
-	else if (fd < 0)
+	else if (file < 0)
 		ereport(PANIC,
 				(errcode_for_file_access(),
 				 errmsg("could not open file \"%s\": %m",
 						path)));
 
 	/* verify magic, that is written even if nothing was active */
-	readBytes = read(fd, &magic, sizeof(magic));
+	readBytes = FileReadSeq(file, &magic, sizeof(magic), 0);
 	if (readBytes != sizeof(magic))
 	{
 		if (readBytes < 0)
@@ -764,7 +744,7 @@ StartupReplicationOrigin(void)
 	{
 		ReplicationStateOnDisk disk_state;
 
-		readBytes = read(fd, &disk_state, sizeof(disk_state));
+		readBytes = FileReadSeq(file, &disk_state, sizeof(disk_state), 0);
 
 		/* no further data */
 		if (readBytes == sizeof(crc))
@@ -816,11 +796,7 @@ StartupReplicationOrigin(void)
 				 errmsg("replication slot checkpoint has wrong checksum %u, expected %u",
 						crc, file_crc)));
 
-	if (CloseTransientFile(fd) != 0)
-		ereport(PANIC,
-				(errcode_for_file_access(),
-				 errmsg("could not close file \"%s\": %m",
-						path)));
+	FileClose(file);
 }
 
 void
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 26d252bd87..995318534d 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -131,21 +131,13 @@ typedef struct ReorderBufferTupleCidEnt
 	CommandId	combocid;		/* just for debugging */
 } ReorderBufferTupleCidEnt;
 
-/* Virtual file descriptor with file offset tracking */
-typedef struct TXNEntryFile
-{
-	File		vfd;			/* -1 when the file is closed */
-	off_t		curOffset;		/* offset for next write or read. Reset to 0
-								 * when vfd is opened. */
-} TXNEntryFile;
-
 /* k-way in-order change iteration support structures */
 typedef struct ReorderBufferIterTXNEntry
 {
 	XLogRecPtr	lsn;
 	ReorderBufferChange *change;
 	ReorderBufferTXN *txn;
-	TXNEntryFile file;
+	File file;
 	XLogSegNo	segno;
 } ReorderBufferIterTXNEntry;
 
@@ -249,9 +241,9 @@ static void ReorderBufferExecuteInvalidations(uint32 nmsgs, SharedInvalidationMe
 static void ReorderBufferCheckMemoryLimit(ReorderBuffer *rb);
 static void ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
 static void ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
-										 int fd, ReorderBufferChange *change);
+										 int file, ReorderBufferChange *change);
 static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
-										TXNEntryFile *file, XLogSegNo *segno);
+										File *fileP, XLogSegNo *segno);
 static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 									   char *data);
 static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
@@ -1291,7 +1283,7 @@ ReorderBufferIterTXNInit(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 	for (off = 0; off < state->nr_txns; off++)
 	{
-		state->entries[off].file.vfd = -1;
+		state->entries[off].file = -1;
 		state->entries[off].segno = 0;
 	}
 
@@ -1473,8 +1465,8 @@ ReorderBufferIterTXNFinish(ReorderBuffer *rb,
 
 	for (off = 0; off < state->nr_txns; off++)
 	{
-		if (state->entries[off].file.vfd != -1)
-			FileClose(state->entries[off].file.vfd);
+		if (state->entries[off].file != -1)
+			FileClose(state->entries[off].file);
 	}
 
 	/* free memory we might have "leaked" in the last *Next call */
@@ -3651,7 +3643,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 {
 	dlist_iter	subtxn_i;
 	dlist_mutable_iter change_i;
-	int			fd = -1;
+	File		file = -1;
 	XLogSegNo	curOpenSegNo = 0;
 	Size		spilled = 0;
 	Size		size = txn->size;
@@ -3679,13 +3671,13 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		 * store in segment in which it belongs by start lsn, don't split over
 		 * multiple segments tho
 		 */
-		if (fd == -1 ||
+		if (file == -1 ||
 			!XLByteInSeg(change->lsn, curOpenSegNo, wal_segment_size))
 		{
 			char		path[MAXPGPATH];
 
-			if (fd != -1)
-				CloseTransientFile(fd);
+			if (file != -1)
+				FileClose(file);
 
 			XLByteToSeg(change->lsn, curOpenSegNo, wal_segment_size);
 
@@ -3697,16 +3689,16 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 										curOpenSegNo);
 
 			/* open segment, create it if necessary */
-			fd = OpenTransientFile(path,
+			file = PathNameOpenTemporaryFile(path,
 								   O_CREAT | O_WRONLY | O_APPEND | PG_BINARY);
 
-			if (fd < 0)
+			if (file < 0)
 				ereport(ERROR,
 						(errcode_for_file_access(),
 						 errmsg("could not open file \"%s\": %m", path)));
 		}
 
-		ReorderBufferSerializeChange(rb, txn, fd, change);
+		ReorderBufferSerializeChange(rb, txn, file, change);
 		dlist_delete(&change->node);
 		ReorderBufferReturnChange(rb, change, true);
 
@@ -3731,8 +3723,8 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	txn->nentries_mem = 0;
 	txn->txn_flags |= RBTXN_IS_SERIALIZED;
 
-	if (fd != -1)
-		CloseTransientFile(fd);
+	if (file != -1)
+		FileClose(file);
 }
 
 /*
@@ -3740,7 +3732,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
  */
 static void
 ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
-							 int fd, ReorderBufferChange *change)
+							 File file, ReorderBufferChange *change)
 {
 	ReorderBufferDiskChange *ondisk;
 	Size		sz = sizeof(ReorderBufferDiskChange);
@@ -3922,12 +3914,11 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	ondisk->size = sz;
 
 	errno = 0;
-	pgstat_report_wait_start(WAIT_EVENT_REORDER_BUFFER_WRITE);
-	if (write(fd, rb->outbuf, ondisk->size) != ondisk->size)
+	if (FileWriteSeq(file, rb->outbuf, ondisk->size, WAIT_EVENT_REORDER_BUFFER_WRITE) != ondisk->size)
 	{
 		int			save_errno = errno;
 
-		CloseTransientFile(fd);
+		FileClose(file);
 
 		/* if write didn't set errno, assume problem is no disk space */
 		errno = save_errno ? save_errno : ENOSPC;
@@ -3936,7 +3927,6 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				 errmsg("could not write to data file for XID %u: %m",
 						txn->xid)));
 	}
-	pgstat_report_wait_end();
 
 	/*
 	 * Keep the transaction's final_lsn up to date with each change we send to
@@ -4192,12 +4182,11 @@ ReorderBufferChangeSize(ReorderBufferChange *change)
  */
 static Size
 ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
-							TXNEntryFile *file, XLogSegNo *segno)
+							File *fileP, XLogSegNo *segno)
 {
 	Size		restored = 0;
 	XLogSegNo	last_segno;
 	dlist_mutable_iter cleanup_iter;
-	File	   *fd = &file->vfd;
 
 	Assert(txn->first_lsn != InvalidXLogRecPtr);
 	Assert(txn->final_lsn != InvalidXLogRecPtr);
@@ -4223,7 +4212,7 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 		CHECK_FOR_INTERRUPTS();
 
-		if (*fd == -1)
+		if (*fileP == -1)
 		{
 			char		path[MAXPGPATH];
 
@@ -4240,18 +4229,15 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			ReorderBufferSerializedPath(path, MyReplicationSlot, txn->xid,
 										*segno);
 
-			*fd = PathNameOpenFile(path, O_RDONLY | PG_BINARY);
+			*fileP = PathNameOpenFile(path, O_RDONLY | PG_BINARY);
 
-			/* No harm in resetting the offset even in case of failure */
-			file->curOffset = 0;
-
-			if (*fd < 0 && errno == ENOENT)
+			if (*fileP < 0 && errno == ENOENT)
 			{
-				*fd = -1;
+				*fileP = -1;
 				(*segno)++;
 				continue;
 			}
-			else if (*fd < 0)
+			else if (*fileP < 0)
 				ereport(ERROR,
 						(errcode_for_file_access(),
 						 errmsg("could not open file \"%s\": %m",
@@ -4264,15 +4250,15 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		 * end of this file.
 		 */
 		ReorderBufferSerializeReserve(rb, sizeof(ReorderBufferDiskChange));
-		readBytes = FileRead(file->vfd, rb->outbuf,
-							 sizeof(ReorderBufferDiskChange),
-							 file->curOffset, WAIT_EVENT_REORDER_BUFFER_READ);
+		readBytes = FileReadSeq(*fileP, rb->outbuf,
+								sizeof(ReorderBufferDiskChange),
+								WAIT_EVENT_REORDER_BUFFER_READ);
 
 		/* eof */
 		if (readBytes == 0)
 		{
-			FileClose(*fd);
-			*fd = -1;
+			FileClose(*fileP);
+			*fileP = -1;
 			(*segno)++;
 			continue;
 		}
@@ -4287,18 +4273,15 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 							readBytes,
 							(uint32) sizeof(ReorderBufferDiskChange))));
 
-		file->curOffset += readBytes;
-
 		ondisk = (ReorderBufferDiskChange *) rb->outbuf;
 
 		ReorderBufferSerializeReserve(rb,
 									  sizeof(ReorderBufferDiskChange) + ondisk->size);
 		ondisk = (ReorderBufferDiskChange *) rb->outbuf;
 
-		readBytes = FileRead(file->vfd,
+		readBytes = FileReadSeq(*fileP,
 							 rb->outbuf + sizeof(ReorderBufferDiskChange),
 							 ondisk->size - sizeof(ReorderBufferDiskChange),
-							 file->curOffset,
 							 WAIT_EVENT_REORDER_BUFFER_READ);
 
 		if (readBytes < 0)
@@ -4312,8 +4295,6 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 							readBytes,
 							(uint32) (ondisk->size - sizeof(ReorderBufferDiskChange)))));
 
-		file->curOffset += readBytes;
-
 		/*
 		 * ok, read a full change from disk, now restore it into proper
 		 * in-memory format
@@ -5018,13 +4999,13 @@ static void
 ApplyLogicalMappingFile(HTAB *tuplecid_data, Oid relid, const char *fname)
 {
 	char		path[MAXPGPATH];
-	int			fd;
+	File		file;
 	int			readBytes;
 	LogicalRewriteMappingData map;
 
 	sprintf(path, "pg_logical/mappings/%s", fname);
-	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
-	if (fd < 0)
+	file = PathNameOpenTemporaryFile(path, O_RDONLY | PG_BINARY);
+	if (file < 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not open file \"%s\": %m", path)));
@@ -5040,9 +5021,7 @@ ApplyLogicalMappingFile(HTAB *tuplecid_data, Oid relid, const char *fname)
 		memset(&key, 0, sizeof(ReorderBufferTupleCidKey));
 
 		/* read all mappings till the end of the file */
-		pgstat_report_wait_start(WAIT_EVENT_REORDER_LOGICAL_MAPPING_READ);
-		readBytes = read(fd, &map, sizeof(LogicalRewriteMappingData));
-		pgstat_report_wait_end();
+		readBytes = FileReadSeq(file, &map, sizeof(LogicalRewriteMappingData), WAIT_EVENT_REORDER_LOGICAL_MAPPING_READ);
 
 		if (readBytes < 0)
 			ereport(ERROR,
@@ -5096,10 +5075,7 @@ ApplyLogicalMappingFile(HTAB *tuplecid_data, Oid relid, const char *fname)
 		}
 	}
 
-	if (CloseTransientFile(fd) != 0)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not close file \"%s\": %m", path)));
+	FileClose(file);
 }
 
 
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 0786bb0ab7..945a80fd4b 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -300,7 +300,7 @@ static void SnapBuildWaitSnapshot(xl_running_xacts *running, TransactionId cutof
 /* serialization functions */
 static void SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn);
 static bool SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn);
-static void SnapBuildRestoreContents(int fd, char *dest, Size size, const char *path);
+static void SnapBuildRestoreContents(File file, char *dest, Size size, const char *path);
 
 /*
  * Allocate a new snapshot builder.
@@ -1606,7 +1606,7 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
 	MemoryContext old_ctx;
 	size_t		catchange_xcnt;
 	char	   *ondisk_c;
-	int			fd;
+	int			file;
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
 	int			ret;
@@ -1745,23 +1745,16 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
 	FIN_CRC32C(ondisk->checksum);
 
 	/* we have valid data now, open tempfile and write it there */
-	fd = OpenTransientFile(tmppath,
+	file = PathNameOpenTemporaryFile(tmppath,
 						   O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
-	if (fd < 0)
+	if (file < 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not open file \"%s\": %m", tmppath)));
 
-	errno = 0;
-	pgstat_report_wait_start(WAIT_EVENT_SNAPBUILD_WRITE);
-	if ((write(fd, ondisk, needed_length)) != needed_length)
+	if ((FileWriteSeq(file, ondisk, needed_length, WAIT_EVENT_SNAPBUILD_WRITE)) != needed_length)
 	{
-		int			save_errno = errno;
-
-		CloseTransientFile(fd);
-
-		/* if write didn't set errno, assume problem is no disk space */
-		errno = save_errno ? save_errno : ENOSPC;
+		FileClose(file);
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write to file \"%s\": %m", tmppath)));
@@ -1779,24 +1772,20 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
 	 * some noticeable overhead since it's performed synchronously during
 	 * decoding?
 	 */
-	pgstat_report_wait_start(WAIT_EVENT_SNAPBUILD_SYNC);
-	if (pg_fsync(fd) != 0)
+	if (FileSync(file, WAIT_EVENT_SNAPBUILD_SYNC) != 0)
 	{
 		int			save_errno = errno;
-
-		CloseTransientFile(fd);
+		FileClose(file);
 		errno = save_errno;
+
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not fsync file \"%s\": %m", tmppath)));
 	}
-	pgstat_report_wait_end();
 
-	if (CloseTransientFile(fd) != 0)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not close file \"%s\": %m", tmppath)));
+	FileClose(file);
 
+	/* Sync the directory as well */
 	fsync_fname("pg_logical/snapshots", true);
 
 	/*
@@ -1841,7 +1830,7 @@ static bool
 SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 {
 	SnapBuildOnDisk ondisk;
-	int			fd;
+	File			file;
 	char		path[MAXPGPATH];
 	Size		sz;
 	pg_crc32c	checksum;
@@ -1853,11 +1842,11 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	sprintf(path, "pg_logical/snapshots/%X-%X.snap",
 			LSN_FORMAT_ARGS(lsn));
 
-	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
-
-	if (fd < 0 && errno == ENOENT)
+	file = PathNameOpenTemporaryFile(path, O_RDONLY | PG_BINARY);
+    if (file < 0 && errno == ENOENT)
 		return false;
-	else if (fd < 0)
+
+	else if (file < 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not open file \"%s\": %m", path)));
@@ -1875,7 +1864,7 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 
 
 	/* read statically sized portion of snapshot */
-	SnapBuildRestoreContents(fd, (char *) &ondisk, SnapBuildOnDiskConstantSize, path);
+	SnapBuildRestoreContents(file, (char *) &ondisk, SnapBuildOnDiskConstantSize, path);
 
 	if (ondisk.magic != SNAPBUILD_MAGIC)
 		ereport(ERROR,
@@ -1895,7 +1884,7 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 				SnapBuildOnDiskConstantSize - SnapBuildOnDiskNotChecksummedSize);
 
 	/* read SnapBuild */
-	SnapBuildRestoreContents(fd, (char *) &ondisk.builder, sizeof(SnapBuild), path);
+	SnapBuildRestoreContents(file, (char *) &ondisk.builder, sizeof(SnapBuild), path);
 	COMP_CRC32C(checksum, &ondisk.builder, sizeof(SnapBuild));
 
 	/* restore committed xacts information */
@@ -1903,7 +1892,7 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	{
 		sz = sizeof(TransactionId) * ondisk.builder.committed.xcnt;
 		ondisk.builder.committed.xip = MemoryContextAllocZero(builder->context, sz);
-		SnapBuildRestoreContents(fd, (char *) ondisk.builder.committed.xip, sz, path);
+		SnapBuildRestoreContents(file, (char *) ondisk.builder.committed.xip, sz, path);
 		COMP_CRC32C(checksum, ondisk.builder.committed.xip, sz);
 	}
 
@@ -1912,14 +1901,11 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	{
 		sz = sizeof(TransactionId) * ondisk.builder.catchange.xcnt;
 		ondisk.builder.catchange.xip = MemoryContextAllocZero(builder->context, sz);
-		SnapBuildRestoreContents(fd, (char *) ondisk.builder.catchange.xip, sz, path);
+		SnapBuildRestoreContents(file, (char *) ondisk.builder.catchange.xip, sz, path);
 		COMP_CRC32C(checksum, ondisk.builder.catchange.xip, sz);
 	}
 
-	if (CloseTransientFile(fd) != 0)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not close file \"%s\": %m", path)));
+	FileClose(file);
 
 	FIN_CRC32C(checksum);
 
@@ -2006,26 +1992,19 @@ snapshot_not_interesting:
  * Read the contents of the serialized snapshot to 'dest'.
  */
 static void
-SnapBuildRestoreContents(int fd, char *dest, Size size, const char *path)
+SnapBuildRestoreContents(File file, char *dest, Size size, const char *path)
 {
 	int			readBytes;
 
-	pgstat_report_wait_start(WAIT_EVENT_SNAPBUILD_READ);
-	readBytes = read(fd, dest, size);
-	pgstat_report_wait_end();
+	readBytes = FileReadSeq(file, dest, size, WAIT_EVENT_SNAPBUILD_READ);
 	if (readBytes != size)
 	{
-		int			save_errno = errno;
-
-		CloseTransientFile(fd);
+		FileClose(file);
 
 		if (readBytes < 0)
-		{
-			errno = save_errno;
 			ereport(ERROR,
 					(errcode_for_file_access(),
 					 errmsg("could not read file \"%s\": %m", path)));
-		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_DATA_CORRUPTED),
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 1dc27264f6..209b18dfc3 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1730,7 +1730,7 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 {
 	char		tmppath[MAXPGPATH];
 	char		path[MAXPGPATH];
-	int			fd;
+	File		file;
 	ReplicationSlotOnDisk cp;
 	bool		was_dirty;
 
@@ -1752,8 +1752,8 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	sprintf(tmppath, "%s/state.tmp", dir);
 	sprintf(path, "%s/state", dir);
 
-	fd = OpenTransientFile(tmppath, O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
-	if (fd < 0)
+	file = PathNameOpenTemporaryFile(tmppath, O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
+	if (file < 0)
 	{
 		/*
 		 * If not an ERROR, then release the lock before returning.  In case
@@ -1789,33 +1789,28 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 	FIN_CRC32C(cp.checksum);
 
 	errno = 0;
-	pgstat_report_wait_start(WAIT_EVENT_REPLICATION_SLOT_WRITE);
-	if ((write(fd, &cp, sizeof(cp))) != sizeof(cp))
+	if (FileWriteSeq(file, &cp, sizeof(cp), WAIT_EVENT_REPLICATION_SLOT_WRITE) != sizeof(cp))
 	{
 		int			save_errno = errno;
 
-		pgstat_report_wait_end();
-		CloseTransientFile(fd);
+		FileClose(file);
 		LWLockRelease(&slot->io_in_progress_lock);
 
 		/* if write didn't set errno, assume problem is no disk space */
-		errno = save_errno ? save_errno : ENOSPC;
+		errno = save_errno;
 		ereport(elevel,
 				(errcode_for_file_access(),
 				 errmsg("could not write to file \"%s\": %m",
 						tmppath)));
 		return;
 	}
-	pgstat_report_wait_end();
 
 	/* fsync the temporary file */
-	pgstat_report_wait_start(WAIT_EVENT_REPLICATION_SLOT_SYNC);
-	if (pg_fsync(fd) != 0)
+	if (FileSync(file, WAIT_EVENT_REPLICATION_SLOT_SYNC) != 0)
 	{
 		int			save_errno = errno;
 
-		pgstat_report_wait_end();
-		CloseTransientFile(fd);
+		FileClose(file);
 		LWLockRelease(&slot->io_in_progress_lock);
 		errno = save_errno;
 		ereport(elevel,
@@ -1824,9 +1819,8 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 						tmppath)));
 		return;
 	}
-	pgstat_report_wait_end();
 
-	if (CloseTransientFile(fd) != 0)
+	if (FileClose(file) != 0)
 	{
 		int			save_errno = errno;
 
@@ -1886,7 +1880,7 @@ RestoreSlotFromDisk(const char *name)
 	int			i;
 	char		slotdir[MAXPGPATH + 12];
 	char		path[MAXPGPATH + 22];
-	int			fd;
+	File		file;
 	bool		restored = false;
 	int			readBytes;
 	pg_crc32c	checksum;
@@ -1906,13 +1900,13 @@ RestoreSlotFromDisk(const char *name)
 	elog(DEBUG1, "restoring replication slot from \"%s\"", path);
 
 	/* on some operating systems fsyncing a file requires O_RDWR */
-	fd = OpenTransientFile(path, O_RDWR | PG_BINARY);
+	file = PathNameOpenTemporaryFile(path, O_RDWR | PG_BINARY);
 
 	/*
 	 * We do not need to handle this as we are rename()ing the directory into
 	 * place only after we fsync()ed the state file.
 	 */
-	if (fd < 0)
+	if (file < 0)
 		ereport(PANIC,
 				(errcode_for_file_access(),
 				 errmsg("could not open file \"%s\": %m", path)));
@@ -1921,13 +1915,11 @@ RestoreSlotFromDisk(const char *name)
 	 * Sync state file before we're reading from it. We might have crashed
 	 * while it wasn't synced yet and we shouldn't continue on that basis.
 	 */
-	pgstat_report_wait_start(WAIT_EVENT_REPLICATION_SLOT_RESTORE_SYNC);
-	if (pg_fsync(fd) != 0)
+	if (FileSync(file, WAIT_EVENT_REPLICATION_SLOT_RESTORE_SYNC) != 0)
 		ereport(PANIC,
 				(errcode_for_file_access(),
 				 errmsg("could not fsync file \"%s\": %m",
 						path)));
-	pgstat_report_wait_end();
 
 	/* Also sync the parent directory */
 	START_CRIT_SECTION();
@@ -1935,9 +1927,7 @@ RestoreSlotFromDisk(const char *name)
 	END_CRIT_SECTION();
 
 	/* read part of statefile that's guaranteed to be version independent */
-	pgstat_report_wait_start(WAIT_EVENT_REPLICATION_SLOT_READ);
-	readBytes = read(fd, &cp, ReplicationSlotOnDiskConstantSize);
-	pgstat_report_wait_end();
+	readBytes = FileReadSeq(file, &cp, ReplicationSlotOnDiskConstantSize, WAIT_EVENT_REPLICATION_SLOT_READ);
 	if (readBytes != ReplicationSlotOnDiskConstantSize)
 	{
 		if (readBytes < 0)
@@ -1974,11 +1964,10 @@ RestoreSlotFromDisk(const char *name)
 						path, cp.length)));
 
 	/* Now that we know the size, read the entire file */
-	pgstat_report_wait_start(WAIT_EVENT_REPLICATION_SLOT_READ);
-	readBytes = read(fd,
+	readBytes = FileReadSeq(file,
 					 (char *) &cp + ReplicationSlotOnDiskConstantSize,
-					 cp.length);
-	pgstat_report_wait_end();
+					 cp.length,
+					 WAIT_EVENT_REPLICATION_SLOT_READ);
 	if (readBytes != cp.length)
 	{
 		if (readBytes < 0)
@@ -1992,7 +1981,7 @@ RestoreSlotFromDisk(const char *name)
 							path, readBytes, (Size) cp.length)));
 	}
 
-	if (CloseTransientFile(fd) != 0)
+	if (FileClose(file) != 0)
 		ereport(PANIC,
 				(errcode_for_file_access(),
 				 errmsg("could not close file \"%s\": %m", path)));
-- 
2.33.0


From 0709ea41ff7b4fca95e3e37446123d27a4b7c655 Mon Sep 17 00:00:00 2001
From: John Morris <john.morris@crunchydata.com>
Date: Wed, 28 Jun 2023 21:21:41 -0700
Subject: [PATCH 6/6] utils uses new api

---
 src/backend/utils/activity/pgstat.c   | 102 ++++++++++++--------------
 src/backend/utils/cache/relmapper.c   |  54 +++++++-------
 src/backend/utils/resowner/resowner.c |   3 +-
 3 files changed, 71 insertions(+), 88 deletions(-)

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index d743fc0b28..f5759fac1a 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -1288,14 +1288,9 @@ pgstat_assert_is_up(void)
 
 /* helpers for pgstat_write_statsfile() */
 static void
-write_chunk(FILE *fpout, void *ptr, size_t len)
+write_chunk(File file, void *ptr, size_t len)
 {
-	int			rc;
-
-	rc = fwrite(ptr, len, 1, fpout);
-
-	/* we'll check for errors with ferror once at the end */
-	(void) rc;
+	FileWriteSeq(file, ptr, len, 0);
 }
 
 #define write_chunk_s(fpout, ptr) write_chunk(fpout, ptr, sizeof(*ptr))
@@ -1307,7 +1302,7 @@ write_chunk(FILE *fpout, void *ptr, size_t len)
 static void
 pgstat_write_statsfile(void)
 {
-	FILE	   *fpout;
+	File file;
 	int32		format_id;
 	const char *tmpfile = PGSTAT_STAT_PERMANENT_TMPFILE;
 	const char *statfile = PGSTAT_STAT_PERMANENT_FILENAME;
@@ -1324,8 +1319,8 @@ pgstat_write_statsfile(void)
 	/*
 	 * Open the statistics temp file to write out the current values.
 	 */
-	fpout = AllocateFile(tmpfile, PG_BINARY_W);
-	if (fpout == NULL)
+	file = PathNameOpenTemporaryFile(tmpfile, O_WRONLY | O_CREAT | O_TRUNC | PG_BINARY);
+	if (file < 0)
 	{
 		ereport(LOG,
 				(errcode_for_file_access(),
@@ -1338,7 +1333,7 @@ pgstat_write_statsfile(void)
 	 * Write the file header --- currently just a format ID.
 	 */
 	format_id = PGSTAT_FILE_FORMAT_ID;
-	write_chunk_s(fpout, &format_id);
+	write_chunk_s(file, &format_id);
 
 	/*
 	 * XXX: The following could now be generalized to just iterate over
@@ -1350,37 +1345,37 @@ pgstat_write_statsfile(void)
 	 * Write archiver stats struct
 	 */
 	pgstat_build_snapshot_fixed(PGSTAT_KIND_ARCHIVER);
-	write_chunk_s(fpout, &pgStatLocal.snapshot.archiver);
+	write_chunk_s(file, &pgStatLocal.snapshot.archiver);
 
 	/*
 	 * Write bgwriter stats struct
 	 */
 	pgstat_build_snapshot_fixed(PGSTAT_KIND_BGWRITER);
-	write_chunk_s(fpout, &pgStatLocal.snapshot.bgwriter);
+	write_chunk_s(file, &pgStatLocal.snapshot.bgwriter);
 
 	/*
 	 * Write checkpointer stats struct
 	 */
 	pgstat_build_snapshot_fixed(PGSTAT_KIND_CHECKPOINTER);
-	write_chunk_s(fpout, &pgStatLocal.snapshot.checkpointer);
+	write_chunk_s(file, &pgStatLocal.snapshot.checkpointer);
 
 	/*
 	 * Write IO stats struct
 	 */
 	pgstat_build_snapshot_fixed(PGSTAT_KIND_IO);
-	write_chunk_s(fpout, &pgStatLocal.snapshot.io);
+	write_chunk_s(file, &pgStatLocal.snapshot.io);
 
 	/*
 	 * Write SLRU stats struct
 	 */
 	pgstat_build_snapshot_fixed(PGSTAT_KIND_SLRU);
-	write_chunk_s(fpout, &pgStatLocal.snapshot.slru);
+	write_chunk_s(file, &pgStatLocal.snapshot.slru);
 
 	/*
 	 * Write WAL stats struct
 	 */
 	pgstat_build_snapshot_fixed(PGSTAT_KIND_WAL);
-	write_chunk_s(fpout, &pgStatLocal.snapshot.wal);
+	write_chunk_s(file, &pgStatLocal.snapshot.wal);
 
 	/*
 	 * Walk through the stats entries
@@ -1408,8 +1403,8 @@ pgstat_write_statsfile(void)
 		if (!kind_info->to_serialized_name)
 		{
 			/* normal stats entry, identified by PgStat_HashKey */
-			fputc('S', fpout);
-			write_chunk_s(fpout, &ps->key);
+			FilePutc('S', file);
+			write_chunk_s(file, &ps->key);
 		}
 		else
 		{
@@ -1418,13 +1413,13 @@ pgstat_write_statsfile(void)
 
 			kind_info->to_serialized_name(&ps->key, shstats, &name);
 
-			fputc('N', fpout);
-			write_chunk_s(fpout, &ps->key.kind);
-			write_chunk_s(fpout, &name);
+			FilePutc('N', file);
+			write_chunk_s(file, &ps->key.kind);
+			write_chunk_s(file, &name);
 		}
 
 		/* Write except the header part of the entry */
-		write_chunk(fpout,
+		write_chunk(file,
 					pgstat_get_entry_data(ps->key.kind, shstats),
 					pgstat_get_entry_len(ps->key.kind));
 	}
@@ -1435,25 +1430,17 @@ pgstat_write_statsfile(void)
 	 * pgstat.stat with it.  The ferror() check replaces testing for error
 	 * after each individual fputc or fwrite (in write_chunk()) above.
 	 */
-	fputc('E', fpout);
-
-	if (ferror(fpout))
+	FilePutc('E', file);
+	FileClose(file);
+	if (FileError(file))
 	{
 		ereport(LOG,
 				(errcode_for_file_access(),
-				 errmsg("could not write temporary statistics file \"%s\": %m",
-						tmpfile)));
-		FreeFile(fpout);
-		unlink(tmpfile);
-	}
-	else if (FreeFile(fpout) < 0)
-	{
-		ereport(LOG,
-				(errcode_for_file_access(),
-				 errmsg("could not close temporary statistics file \"%s\": %m",
-						tmpfile)));
+					errmsg("could not write temporary statistics file \"%s\": %m",
+						   tmpfile)));
 		unlink(tmpfile);
 	}
+
 	else if (rename(tmpfile, statfile) < 0)
 	{
 		ereport(LOG,
@@ -1466,12 +1453,12 @@ pgstat_write_statsfile(void)
 
 /* helpers for pgstat_read_statsfile() */
 static bool
-read_chunk(FILE *fpin, void *ptr, size_t len)
+read_chunk(File file, void *ptr, size_t len)
 {
-	return fread(ptr, 1, len, fpin) == len;
+	return FileReadSeq(file, ptr, len, 0) == len;
 }
 
-#define read_chunk_s(fpin, ptr) read_chunk(fpin, ptr, sizeof(*ptr))
+#define read_chunk_s(file, ptr) read_chunk(file, ptr, sizeof(*ptr))
 
 /*
  * Reads in existing statistics file into the shared stats hash.
@@ -1482,7 +1469,7 @@ read_chunk(FILE *fpin, void *ptr, size_t len)
 static void
 pgstat_read_statsfile(void)
 {
-	FILE	   *fpin;
+	File	    file;
 	int32		format_id;
 	bool		found;
 	const char *statfile = PGSTAT_STAT_PERMANENT_FILENAME;
@@ -1502,7 +1489,8 @@ pgstat_read_statsfile(void)
 	 * has not yet written the stats file for the first time.  Any other
 	 * failure condition is suspicious.
 	 */
-	if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
+	file = PathNameOpenTemporaryFile(statfile, O_RDONLY | PG_BINARY);
+	if (file < 0)
 	{
 		if (errno != ENOENT)
 			ereport(LOG,
@@ -1516,7 +1504,7 @@ pgstat_read_statsfile(void)
 	/*
 	 * Verify it's of the expected format.
 	 */
-	if (!read_chunk_s(fpin, &format_id) ||
+	if (!read_chunk_s(file, &format_id) ||
 		format_id != PGSTAT_FILE_FORMAT_ID)
 		goto error;
 
@@ -1529,37 +1517,37 @@ pgstat_read_statsfile(void)
 	/*
 	 * Read archiver stats struct
 	 */
-	if (!read_chunk_s(fpin, &shmem->archiver.stats))
+	if (!read_chunk_s(file, &shmem->archiver.stats))
 		goto error;
 
 	/*
 	 * Read bgwriter stats struct
 	 */
-	if (!read_chunk_s(fpin, &shmem->bgwriter.stats))
+	if (!read_chunk_s(file, &shmem->bgwriter.stats))
 		goto error;
 
 	/*
 	 * Read checkpointer stats struct
 	 */
-	if (!read_chunk_s(fpin, &shmem->checkpointer.stats))
+	if (!read_chunk_s(file, &shmem->checkpointer.stats))
 		goto error;
 
 	/*
 	 * Read IO stats struct
 	 */
-	if (!read_chunk_s(fpin, &shmem->io.stats))
+	if (!read_chunk_s(file, &shmem->io.stats))
 		goto error;
 
 	/*
 	 * Read SLRU stats struct
 	 */
-	if (!read_chunk_s(fpin, &shmem->slru.stats))
+	if (!read_chunk_s(file, &shmem->slru.stats))
 		goto error;
 
 	/*
 	 * Read WAL stats struct
 	 */
-	if (!read_chunk_s(fpin, &shmem->wal.stats))
+	if (!read_chunk_s(file, &shmem->wal.stats))
 		goto error;
 
 	/*
@@ -1568,7 +1556,7 @@ pgstat_read_statsfile(void)
 	 */
 	for (;;)
 	{
-		int			t = fgetc(fpin);
+		int			t = FileGetc(file);
 
 		switch (t)
 		{
@@ -1584,7 +1572,7 @@ pgstat_read_statsfile(void)
 					if (t == 'S')
 					{
 						/* normal stats entry, identified by PgStat_HashKey */
-						if (!read_chunk_s(fpin, &key))
+						if (!read_chunk_s(file, &key))
 							goto error;
 
 						if (!pgstat_is_kind_valid(key.kind))
@@ -1597,9 +1585,9 @@ pgstat_read_statsfile(void)
 						PgStat_Kind kind;
 						NameData	name;
 
-						if (!read_chunk_s(fpin, &kind))
+						if (!read_chunk_s(file, &kind))
 							goto error;
-						if (!read_chunk_s(fpin, &name))
+						if (!read_chunk_s(file, &name))
 							goto error;
 						if (!pgstat_is_kind_valid(kind))
 							goto error;
@@ -1612,7 +1600,7 @@ pgstat_read_statsfile(void)
 						if (!kind_info->from_serialized_name(&name, &key))
 						{
 							/* skip over data for entry we don't care about */
-							if (fseek(fpin, pgstat_get_entry_len(kind), SEEK_CUR) != 0)
+							if (FileSeek(file, FileTell(file) + pgstat_get_entry_len(kind)) < 0)
 								goto error;
 
 							continue;
@@ -1640,7 +1628,7 @@ pgstat_read_statsfile(void)
 					header = pgstat_init_entry(key.kind, p);
 					dshash_release_lock(pgStatLocal.shared_hash, p);
 
-					if (!read_chunk(fpin,
+					if (!read_chunk(file,
 									pgstat_get_entry_data(key.kind, header),
 									pgstat_get_entry_len(key.kind)))
 						goto error;
@@ -1649,7 +1637,7 @@ pgstat_read_statsfile(void)
 				}
 			case 'E':
 				/* check that 'E' actually signals end of file */
-				if (fgetc(fpin) != EOF)
+				if (FileGetc(file) != EOF)
 					goto error;
 
 				goto done;
@@ -1660,7 +1648,7 @@ pgstat_read_statsfile(void)
 	}
 
 done:
-	FreeFile(fpin);
+	FileClose(file);
 
 	elog(DEBUG2, "removing permanent stats file \"%s\"", statfile);
 	unlink(statfile);
diff --git a/src/backend/utils/cache/relmapper.c b/src/backend/utils/cache/relmapper.c
index 26575cae6c..de81bfa7f3 100644
--- a/src/backend/utils/cache/relmapper.c
+++ b/src/backend/utils/cache/relmapper.c
@@ -783,7 +783,7 @@ read_relmap_file(RelMapFile *map, char *dbpath, bool lock_held, int elevel)
 {
 	char		mapfilename[MAXPGPATH];
 	pg_crc32c	crc;
-	int			fd;
+	File		file;
 	int			r;
 
 	Assert(elevel >= ERROR);
@@ -809,18 +809,21 @@ read_relmap_file(RelMapFile *map, char *dbpath, bool lock_held, int elevel)
 	 */
 	snprintf(mapfilename, sizeof(mapfilename), "%s/%s", dbpath,
 			 RELMAPPER_FILENAME);
-	fd = OpenTransientFile(mapfilename, O_RDONLY | PG_BINARY);
-	if (fd < 0)
+	file = PathNameOpenFile(mapfilename, O_RDONLY | PG_BINARY);
+	if (file < 0)
 		ereport(elevel,
 				(errcode_for_file_access(),
 				 errmsg("could not open file \"%s\": %m",
 						mapfilename)));
 
 	/* Now read the data. */
-	pgstat_report_wait_start(WAIT_EVENT_RELATION_MAP_READ);
-	r = read(fd, map, sizeof(RelMapFile));
+	r = FileReadSeq(file, map, sizeof(RelMapFile), WAIT_EVENT_RELATION_MAP_READ);
 	if (r != sizeof(RelMapFile))
 	{
+		int save_errno = errno;
+		FileClose(file);
+		errno = save_errno;
+
 		if (r < 0)
 			ereport(elevel,
 					(errcode_for_file_access(),
@@ -831,13 +834,8 @@ read_relmap_file(RelMapFile *map, char *dbpath, bool lock_held, int elevel)
 					 errmsg("could not read file \"%s\": read %d of %zu",
 							mapfilename, r, sizeof(RelMapFile))));
 	}
-	pgstat_report_wait_end();
 
-	if (CloseTransientFile(fd) != 0)
-		ereport(elevel,
-				(errcode_for_file_access(),
-				 errmsg("could not close file \"%s\": %m",
-						mapfilename)));
+	FileClose(file);
 
 	if (!lock_held)
 		LWLockRelease(RelationMappingLock);
@@ -887,7 +885,7 @@ static void
 write_relmap_file(RelMapFile *newmap, bool write_wal, bool send_sinval,
 				  bool preserve_files, Oid dbid, Oid tsid, const char *dbpath)
 {
-	int			fd;
+	File		file;
 	char		mapfilename[MAXPGPATH];
 	char		maptempfilename[MAXPGPATH];
 
@@ -912,38 +910,36 @@ write_relmap_file(RelMapFile *newmap, bool write_wal, bool send_sinval,
 			 dbpath, RELMAPPER_TEMP_FILENAME);
 
 	/*
-	 * Open a temporary file. If a file already exists with this name, it must
+	 * Open a virtual file. If a file already exists with this name, it must
 	 * be left over from a previous crash, so we can overwrite it. Concurrent
 	 * calls to this function are not allowed.
 	 */
-	fd = OpenTransientFile(maptempfilename,
+	file = PathNameOpenFile(maptempfilename,
 						   O_WRONLY | O_CREAT | O_TRUNC | PG_BINARY);
-	if (fd < 0)
+	if (file < 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not open file \"%s\": %m",
 						maptempfilename)));
 
-	/* Write new data to the file. */
-	pgstat_report_wait_start(WAIT_EVENT_RELATION_MAP_WRITE);
-	if (write(fd, newmap, sizeof(RelMapFile)) != sizeof(RelMapFile))
+	/*
+	 * Write new data to the file.
+	 * We may be invoked during bootstrap, so we do our own cleanup
+	 * rather than depending on the resource owner to close the file.
+	 */
+	if (FileWriteSeq(file, newmap, sizeof(RelMapFile), WAIT_EVENT_RELATION_MAP_WRITE) != sizeof(RelMapFile))
 	{
-		/* if write didn't set errno, assume problem is no disk space */
-		if (errno == 0)
-			errno = ENOSPC;
+		int save_errno = errno;
+		FileClose(file);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not write file \"%s\": %m",
-						maptempfilename)));
+					errmsg("could not write file \"%s\": %m",
+						   maptempfilename)));
 	}
-	pgstat_report_wait_end();
 
 	/* And close the file. */
-	if (CloseTransientFile(fd) != 0)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not close file \"%s\": %m",
-						maptempfilename)));
+	FileClose(file);
 
 	if (write_wal)
 	{
diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index f926f1faad..1eb37aecbb 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -33,7 +33,6 @@
 #include "utils/resowner_private.h"
 #include "utils/snapmgr.h"
 
-
 /*
  * All resource IDs managed by this code are required to fit into a Datum,
  * which is fine since they are generally pointers or integers.
@@ -714,7 +713,7 @@ ResourceOwnerReleaseInternal(ResourceOwner owner,
 		while (ResourceArrayGetAny(&(owner->filearr), &foundres))
 		{
 			File		res = DatumGetFile(foundres);
-
+            
 			if (isCommit)
 				PrintFileLeakWarning(res);
 			FileClose(res);
-- 
2.33.0

#2vignesh C
vignesh21@gmail.com
In reply to: John Morris (#1)
Re: Unified File API

On Thu, 29 Jun 2023 at 13:20, John Morris <john.morris@crunchydata.com> wrote:

Background

==========

PostgreSQL has an amazing variety of routines for accessing files. Consider just the “open file” routines.
PathNameOpenFile, OpenTemporaryFile, BasicOpenFile, open, fopen, BufFileCreateFileSet,

BufFileOpenFileSet, AllocateFile, OpenTransientFile, FileSetCreate, FileSetOpen, mdcreate, mdopen,

Smgr_open,

On the downside, “amazing variety” also means somewhat confusing and difficult to add new features.
Someday, we’d like to add encryption or compression to the various PostgreSql files.
To do that, we need to bring all the relevant files into a common file API where we can implement
the new features.

Goals of Patch

=============

1)Unify file access so most of “the other” files can go through a common interface, allowing new features
like checksums, encryption or compression to be added transparently. 2) Do it in a way which doesn’t
change the logic of current code. 3)Convert a reasonable set of callers to use the new interface.

Note the focus is on the “other” files. The buffer cache and the WAL have similar needs,
but they are being done in a separate project. (yes, the two projects are coordinating)

Patch 0001. Create a common file API.

===============================

Currrently, PostgreSQL files feed into three funnels. 1) system file descriptors (read/write/open),
2) C library buffered files (fread/fwri;te/fopn), and 3) virtual file descriptors (FileRead/FileWrite/PathNameOpenFile).
Of these three, virtual file descriptors (VFDs) are the most common. They are also the
only funnel which is implemented by PostgresSql.

Decision: Choose VFDs as the common interface.

Problem: VFDs are random access only.

Solution: Add sequential read/write code on top of VFDs. (FileReadSeq, FileWriteSeq, FileSeek, FileTell, O_APPEND)

Problem: VFDs have minimal error handling (based on errno.)

Solution: Add an “ferror” style interface (FileError, FileEof, FileErrorCode, FileErrorMsg)

Problem: Must maintain compatibility with existing error handling code.

Solution: save and restore errno to minimize changes to existing code.

Patch 0002. Update code to use the common file API

===========================================

The second patch alters callers so they use VFDs rather than system or C library files.
It doesn’t modify all callers, but it does capture many of the files which need
to be encrypted or compressed. This is definitely WIP.

Future (not too far away)

=====================

Looking ahead, there will be another set of patches which inject buffering and encryption into
the VFD interface. The future patches will build on the current work and introduce new “oflags”

to enable encryption and buffering.

Compression is also a possibility, but currently lower priority and a bit tricky for random access files.
Let us know if you have a use case.

CFbot shows few compilation warnings/error at [1]https://cirrus-ci.com/task/6552527404007424:
[15:54:06.825] ../src/backend/storage/file/fd.c:2420:11: warning:
unused variable 'save_errno' [-Wunused-variable]
[15:54:06.825] int ret, save_errno;
[15:54:06.825] ^
[15:54:06.825] ../src/backend/storage/file/fd.c:4026:29: error: use of
undeclared identifier 'MAXIMUM_VFD'
[15:54:06.825] Assert(file >= 0 && file < MAXIMUM_VFD);
[15:54:06.825] ^
[15:54:06.825] 1 warning and 1 error generated.

[1]: https://cirrus-ci.com/task/6552527404007424

Regards,
Vignesh

#3vignesh C
vignesh21@gmail.com
In reply to: vignesh C (#2)
Re: Unified File API

On Sat, 6 Jan 2024 at 22:58, vignesh C <vignesh21@gmail.com> wrote:

On Thu, 29 Jun 2023 at 13:20, John Morris <john.morris@crunchydata.com> wrote:

Background

==========

PostgreSQL has an amazing variety of routines for accessing files. Consider just the “open file” routines.
PathNameOpenFile, OpenTemporaryFile, BasicOpenFile, open, fopen, BufFileCreateFileSet,

BufFileOpenFileSet, AllocateFile, OpenTransientFile, FileSetCreate, FileSetOpen, mdcreate, mdopen,

Smgr_open,

On the downside, “amazing variety” also means somewhat confusing and difficult to add new features.
Someday, we’d like to add encryption or compression to the various PostgreSql files.
To do that, we need to bring all the relevant files into a common file API where we can implement
the new features.

Goals of Patch

=============

1)Unify file access so most of “the other” files can go through a common interface, allowing new features
like checksums, encryption or compression to be added transparently. 2) Do it in a way which doesn’t
change the logic of current code. 3)Convert a reasonable set of callers to use the new interface.

Note the focus is on the “other” files. The buffer cache and the WAL have similar needs,
but they are being done in a separate project. (yes, the two projects are coordinating)

Patch 0001. Create a common file API.

===============================

Currrently, PostgreSQL files feed into three funnels. 1) system file descriptors (read/write/open),
2) C library buffered files (fread/fwri;te/fopn), and 3) virtual file descriptors (FileRead/FileWrite/PathNameOpenFile).
Of these three, virtual file descriptors (VFDs) are the most common. They are also the
only funnel which is implemented by PostgresSql.

Decision: Choose VFDs as the common interface.

Problem: VFDs are random access only.

Solution: Add sequential read/write code on top of VFDs. (FileReadSeq, FileWriteSeq, FileSeek, FileTell, O_APPEND)

Problem: VFDs have minimal error handling (based on errno.)

Solution: Add an “ferror” style interface (FileError, FileEof, FileErrorCode, FileErrorMsg)

Problem: Must maintain compatibility with existing error handling code.

Solution: save and restore errno to minimize changes to existing code.

Patch 0002. Update code to use the common file API

===========================================

The second patch alters callers so they use VFDs rather than system or C library files.
It doesn’t modify all callers, but it does capture many of the files which need
to be encrypted or compressed. This is definitely WIP.

Future (not too far away)

=====================

Looking ahead, there will be another set of patches which inject buffering and encryption into
the VFD interface. The future patches will build on the current work and introduce new “oflags”

to enable encryption and buffering.

Compression is also a possibility, but currently lower priority and a bit tricky for random access files.
Let us know if you have a use case.

CFbot shows few compilation warnings/error at [1]:
[15:54:06.825] ../src/backend/storage/file/fd.c:2420:11: warning:
unused variable 'save_errno' [-Wunused-variable]
[15:54:06.825] int ret, save_errno;
[15:54:06.825] ^
[15:54:06.825] ../src/backend/storage/file/fd.c:4026:29: error: use of
undeclared identifier 'MAXIMUM_VFD'
[15:54:06.825] Assert(file >= 0 && file < MAXIMUM_VFD);
[15:54:06.825] ^
[15:54:06.825] 1 warning and 1 error generated.

With no update to the thread and the compilation still failing I'm
marking this as returned with feedback. Please feel free to resubmit
to the next CF when there is a new version of the patch.

Regards,
Vignesh