Incorrect errno used with %m for backend code

Started by Michael Paquierover 7 years ago11 messages

michael@paquier.xyz

over 7 years ago

1 attachment(s)

Hi all,

I have been reviewing the use of errno in the backend code when %m is
used in the logs, and found more places than when I looked at improving
the error messages for read() calls which bloat the errno value because
of intermediate system calls. As the problem is separate and should be
back-patched, I have preferred beginning a new thread.

A couple of places use CloseTransientFile without bothering much that
this can overwrite errno. I was tempted in keeping errno saved and kept
if set to a non-zero value, but refrained from doing so as some callers
may rely on the existing behavior, and the attached shows better the
intention.

Attached is a patch with everything I have spotted. Any opinions or
thoughts in getting this back-patched?

Thanks,
--
Michael

Attachments:

errno-m-logs-v1.patchtext/x-diff; charset=us-asciiDownload

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 65194db70e..e5571a67d6 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1241,12 +1241,17 @@ ReadTwoPhaseFile(TransactionId xid, bool give_warnings)
 	 */
 	if (fstat(fd, &stat))
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
 		if (give_warnings)
+		{
+			errno = save_errno;
 			ereport(WARNING,
 					(errcode_for_file_access(),
 					 errmsg("could not stat two-phase state file \"%s\": %m",
 							path)));
+		}
 		return NULL;
 	}
 
@@ -1274,13 +1279,18 @@ ReadTwoPhaseFile(TransactionId xid, bool give_warnings)
 	pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_READ);
 	if (read(fd, buf, stat.st_size) != stat.st_size)
 	{
+		int			save_errno = errno;
+
 		pgstat_report_wait_end();
 		CloseTransientFile(fd);
 		if (give_warnings)
+		{
+			errno = save_errno;
 			ereport(WARNING,
 					(errcode_for_file_access(),
 					 errmsg("could not read two-phase state file \"%s\": %m",
 							path)));
+		}
 		pfree(buf);
 		return NULL;
 	}
@@ -1663,16 +1673,22 @@ RecreateTwoPhaseFile(TransactionId xid, void *content, int len)
 	pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_WRITE);
 	if (write(fd, content, len) != len)
 	{
+		int			save_errno = errno;
+
 		pgstat_report_wait_end();
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write two-phase state file: %m")));
 	}
 	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
 	{
+		int			save_errno = errno;
+
 		pgstat_report_wait_end();
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write two-phase state file: %m")));
@@ -1686,7 +1702,10 @@ RecreateTwoPhaseFile(TransactionId xid, void *content, int len)
 	pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_SYNC);
 	if (pg_fsync(fd) != 0)
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not fsync two-phase state file: %m")));
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index adbd6a2126..1a419aa49b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3268,7 +3268,10 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
 	if (pg_fsync(fd) != 0)
 	{
+		int			save_errno = errno;
+
 		close(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not fsync file \"%s\": %m", tmppath)));
@@ -11675,8 +11678,10 @@ retry:
 	if (lseek(readFile, (off_t) readOff, SEEK_SET) < 0)
 	{
 		char		fname[MAXFNAMELEN];
+		int			save_errno = errno;
 
 		XLogFileName(fname, curFileTLI, readSegNo, wal_segment_size);
+		errno = save_errno;
 		ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
 				(errcode_for_file_access(),
 				 errmsg("could not seek in log segment %s to offset %u: %m",
@@ -11688,9 +11693,11 @@ retry:
 	if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
 	{
 		char		fname[MAXFNAMELEN];
+		int			save_errno = errno;
 
 		pgstat_report_wait_end();
 		XLogFileName(fname, curFileTLI, readSegNo, wal_segment_size);
+		errno = save_errno;
 		ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
 				(errcode_for_file_access(),
 				 errmsg("could not read from log segment %s, offset %u: %m",
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 52fe55e2af..4ecdc9220f 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -718,9 +718,10 @@ XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr,
 			if (lseek(sendFile, (off_t) startoff, SEEK_SET) < 0)
 			{
 				char		path[MAXPGPATH];
+				int			save_errno = errno;
 
 				XLogFilePath(path, tli, sendSegNo, segsize);
-
+				errno = save_errno;
 				ereport(ERROR,
 						(errcode_for_file_access(),
 						 errmsg("could not seek in log segment %s to offset %u: %m",
@@ -741,9 +742,10 @@ XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr,
 		if (readbytes <= 0)
 		{
 			char		path[MAXPGPATH];
+			int			save_errno = errno;
 
 			XLogFilePath(path, tli, sendSegNo, segsize);
-
+			errno = save_errno;
 			ereport(ERROR,
 					(errcode_for_file_access(),
 					 errmsg("could not read from log segment %s, offset %u, length %lu: %m",
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 5688cbe2e9..3f1eae38a9 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -495,6 +495,8 @@ perform_base_backup(basebackup_options *opt)
 			fp = AllocateFile(pathbuf, "rb");
 			if (fp == NULL)
 			{
+				int			save_errno = errno;
+
 				/*
 				 * Most likely reason for this is that the file was already
 				 * removed by a checkpoint, so check for that to get a better
@@ -502,6 +504,7 @@ perform_base_backup(basebackup_options *opt)
 				 */
 				CheckXLogRemoved(segno, tli);
 
+				errno = save_errno;
 				ereport(ERROR,
 						(errcode_for_file_access(),
 						 errmsg("could not open file \"%s\": %m", pathbuf)));
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 963878a5d8..ddcff3b643 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -578,7 +578,10 @@ CheckPointReplicationOrigin(void)
 	/* write magic */
 	if ((write(tmpfd, &magic, sizeof(magic))) != sizeof(magic))
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(tmpfd);
+		errno = save_errno;
 		ereport(PANIC,
 				(errcode_for_file_access(),
 				 errmsg("could not write to file \"%s\": %m",
@@ -617,7 +620,10 @@ CheckPointReplicationOrigin(void)
 		if ((write(tmpfd, &disk_state, sizeof(disk_state))) !=
 			sizeof(disk_state))
 		{
+			int			save_errno = errno;
+
 			CloseTransientFile(tmpfd);
+			errno = save_errno;
 			ereport(PANIC,
 					(errcode_for_file_access(),
 					 errmsg("could not write to file \"%s\": %m",
@@ -633,7 +639,10 @@ CheckPointReplicationOrigin(void)
 	FIN_CRC32C(crc);
 	if ((write(tmpfd, &crc, sizeof(crc))) != sizeof(crc))
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(tmpfd);
+		errno = save_errno;
 		ereport(PANIC,
 				(errcode_for_file_access(),
 				 errmsg("could not write to file \"%s\": %m",
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 4123cdebcf..b0f82391b4 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1605,7 +1605,10 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
 	pgstat_report_wait_start(WAIT_EVENT_SNAPBUILD_WRITE);
 	if ((write(fd, ondisk, needed_length)) != needed_length)
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write to file \"%s\": %m", tmppath)));
@@ -1623,7 +1626,10 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
 	pgstat_report_wait_start(WAIT_EVENT_SNAPBUILD_SYNC);
 	if (pg_fsync(fd) != 0)
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not fsync file \"%s\": %m", tmppath)));
@@ -1708,7 +1714,10 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	pgstat_report_wait_end();
 	if (readBytes != SnapBuildOnDiskConstantSize)
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not read file \"%s\", read %d of %d: %m",
@@ -1736,7 +1745,10 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	pgstat_report_wait_end();
 	if (readBytes != sizeof(SnapBuild))
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not read file \"%s\", read %d of %d: %m",
@@ -1753,7 +1765,10 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	pgstat_report_wait_end();
 	if (readBytes != sz)
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not read file \"%s\", read %d of %d: %m",
@@ -1769,7 +1784,10 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	pgstat_report_wait_end();
 	if (readBytes != sz)
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not read file \"%s\", read %d of %d: %m",
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index e8b76b2f20..5b7116838f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1378,7 +1378,10 @@ RestoreSlotFromDisk(const char *name)
 	pgstat_report_wait_start(WAIT_EVENT_REPLICATION_SLOT_RESTORE_SYNC);
 	if (pg_fsync(fd) != 0)
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(PANIC,
 				(errcode_for_file_access(),
 				 errmsg("could not fsync file \"%s\": %m",

Ashutosh Sharma

ashu.coek88@gmail.com

over 7 years ago

In reply to: Michael Paquier (#1)

Re: Incorrect errno used with %m for backend code

Hi,

It seems like in case of few system calls for e.g. write system call,
errno is not set even if the number of bytes written is smaller than
the bytes requested and for such cases we explicitly set an errno to
ENOSPC. Something like this,

/*
* if write didn't set errno, assume problem is no disk space
*/
errno = save_errno ? save_errno : ENOSPC;

Shouldn't we do similar handling in your patch as well or please let
me know if i am missing something here.

Show quoted text

On Fri, Jun 22, 2018 at 11:45 AM, Michael Paquier <michael@paquier.xyz> wrote:

Hi all,

I have been reviewing the use of errno in the backend code when %m is
used in the logs, and found more places than when I looked at improving
the error messages for read() calls which bloat the errno value because
of intermediate system calls. As the problem is separate and should be
back-patched, I have preferred beginning a new thread.

A couple of places use CloseTransientFile without bothering much that
this can overwrite errno. I was tempted in keeping errno saved and kept
if set to a non-zero value, but refrained from doing so as some callers
may rely on the existing behavior, and the attached shows better the
intention.

Attached is a patch with everything I have spotted. Any opinions or
thoughts in getting this back-patched?

Thanks,
--
Michael

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Ashutosh Sharma (#2)

Re: Incorrect errno used with %m for backend code

On Fri, Jun 22, 2018 at 01:00:45PM +0530, Ashutosh Sharma wrote:

It seems like in case of few system calls for e.g. write system call,
errno is not set even if the number of bytes written is smaller than
the bytes requested and for such cases we explicitly set an errno to
ENOSPC. Something like this,

/*
* if write didn't set errno, assume problem is no disk space
*/
errno = save_errno ? save_errno : ENOSPC;

Shouldn't we do similar handling in your patch as well or please let
me know if i am missing something here.

Yeah, I can see at least three of them in twophase.c. Let's fix those
as well at the same time.
--
Michael

Ashutosh Sharma

ashu.coek88@gmail.com

over 7 years ago

In reply to: Michael Paquier (#3)

Re: Incorrect errno used with %m for backend code

On Fri, Jun 22, 2018 at 2:44 PM, Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Jun 22, 2018 at 01:00:45PM +0530, Ashutosh Sharma wrote:

It seems like in case of few system calls for e.g. write system call,
errno is not set even if the number of bytes written is smaller than
the bytes requested and for such cases we explicitly set an errno to
ENOSPC. Something like this,

/*
* if write didn't set errno, assume problem is no disk space
*/
errno = save_errno ? save_errno : ENOSPC;

Shouldn't we do similar handling in your patch as well or please let
me know if i am missing something here.

Yeah, I can see at least three of them in twophase.c. Let's fix those
as well at the same time.

Okay, thanks for the confirmation. Few of them are also there in
origin.c and snapbuild.c files.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

Alvaro Herrera

alvherre@2ndquadrant.com

over 7 years ago

In reply to: Michael Paquier (#1)

Re: Incorrect errno used with %m for backend code

On 2018-Jun-22, Michael Paquier wrote:

A couple of places use CloseTransientFile without bothering much that
this can overwrite errno. I was tempted in keeping errno saved and kept
if set to a non-zero value, but refrained from doing so as some callers
may rely on the existing behavior, and the attached shows better the
intention.

I wondered for a bit if it would be a better idea to have
CloseTransientFile restore the existing errno if there is any and close
works fine -- when I noticed that that function itself says that caller
should check errno for close() errors. Most callers seem to do it
correctly, but a few fail to check the return value.

Quite some places open files O_RDONLY, so lack of error checking after
closing seems ok. (Unless there's some funny interaction with the fsync
fiasco, of course.)

A bunch of other places use CloseTransientFile while closing shop after
some other syscall failure, which is what your patch fixes. I didn't
review those; checking for close failure seems pointless.

In some cases, we fsync the file and check that return value, then close
the file and do *not* check CloseTransientFile's return value --
examples are CheckPointLogicalRewriteHeap, heap_xlog_logical_rewrite,
SnapBuildSerialize, SaveSlotToPath, durable_rename. I don't know if
it's reasonable for close() to fail after successfully fsyncing a file;
maybe this is not a problem. I would patch those nonetheless.

be_lo_export() is certainly missing a check: it writes and closes,
without intervening fsync.

I don't understand the logic in RestoreSlotFromDisk that fsync the file
prior to reading it. What are we protecting against?

Anyway, while I think it would be nice to have CloseTransientFile
restore the original errno if there was one and close goes well, it
probably unduly complicates its API contract.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Alvaro Herrera (#5)

Re: Incorrect errno used with %m for backend code

Hi,

On 2018-06-22 11:15:53 -0400, Alvaro Herrera wrote:

I don't understand the logic in RestoreSlotFromDisk that fsync the file
prior to reading it. What are we protecting against?

we could previously have crashed before fsyncing. But we can only rely
on slot contents being durable if we fsync them. Therefore we fsync
before reading, after a crash.

Anyway, while I think it would be nice to have CloseTransientFile
restore the original errno if there was one and close goes well, it
probably unduly complicates its API contract.

Yea, agreed.

Greetings,

Andres Freund

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Ashutosh Sharma (#4)

1 attachment(s)

Re: Incorrect errno used with %m for backend code

On Fri, Jun 22, 2018 at 03:45:33PM +0530, Ashutosh Sharma wrote:

Okay, thanks for the confirmation. Few of them are also there in
origin.c and snapbuild.c files.

Thanks Ashutosh. I have been reviewing the whole tree and I found more
places where this is missing, like rewriteheap.c, reorderbuffer.c or
pg_basebackup, which gives the attached.
--
Michael

Attachments:

errno-m-logs-v2.patchtext/x-diff; charset=us-asciiDownload

diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 8d3c861a33..ed7ba181c7 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -1168,9 +1168,14 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
 	/* write out tail end of mapping file (again) */
 	pgstat_report_wait_start(WAIT_EVENT_LOGICAL_REWRITE_MAPPING_WRITE);
 	if (write(fd, data, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write to file \"%s\": %m", path)));
+	}
 	pgstat_report_wait_end();
 
 	/*
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 65194db70e..a9ef1b3d73 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1241,12 +1241,17 @@ ReadTwoPhaseFile(TransactionId xid, bool give_warnings)
 	 */
 	if (fstat(fd, &stat))
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
 		if (give_warnings)
+		{
+			errno = save_errno;
 			ereport(WARNING,
 					(errcode_for_file_access(),
 					 errmsg("could not stat two-phase state file \"%s\": %m",
 							path)));
+		}
 		return NULL;
 	}
 
@@ -1274,13 +1279,18 @@ ReadTwoPhaseFile(TransactionId xid, bool give_warnings)
 	pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_READ);
 	if (read(fd, buf, stat.st_size) != stat.st_size)
 	{
+		int			save_errno = errno;
+
 		pgstat_report_wait_end();
 		CloseTransientFile(fd);
 		if (give_warnings)
+		{
+			errno = save_errno;
 			ereport(WARNING,
 					(errcode_for_file_access(),
 					 errmsg("could not read two-phase state file \"%s\": %m",
 							path)));
+		}
 		pfree(buf);
 		return NULL;
 	}
@@ -1663,16 +1673,26 @@ RecreateTwoPhaseFile(TransactionId xid, void *content, int len)
 	pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_WRITE);
 	if (write(fd, content, len) != len)
 	{
+		int			save_errno = errno;
+
 		pgstat_report_wait_end();
 		CloseTransientFile(fd);
+
+		/* if write didn't set errno, assume problem is no disk space */
+		errno = save_errno ? save_errno : ENOSPC;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write two-phase state file: %m")));
 	}
 	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
 	{
+		int			save_errno = errno;
+
 		pgstat_report_wait_end();
 		CloseTransientFile(fd);
+
+		/* if write didn't set errno, assume problem is no disk space */
+		errno = save_errno ? save_errno : ENOSPC;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write two-phase state file: %m")));
@@ -1686,7 +1706,10 @@ RecreateTwoPhaseFile(TransactionId xid, void *content, int len)
 	pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_SYNC);
 	if (pg_fsync(fd) != 0)
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not fsync two-phase state file: %m")));
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index adbd6a2126..1a419aa49b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3268,7 +3268,10 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
 	if (pg_fsync(fd) != 0)
 	{
+		int			save_errno = errno;
+
 		close(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not fsync file \"%s\": %m", tmppath)));
@@ -11675,8 +11678,10 @@ retry:
 	if (lseek(readFile, (off_t) readOff, SEEK_SET) < 0)
 	{
 		char		fname[MAXFNAMELEN];
+		int			save_errno = errno;
 
 		XLogFileName(fname, curFileTLI, readSegNo, wal_segment_size);
+		errno = save_errno;
 		ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
 				(errcode_for_file_access(),
 				 errmsg("could not seek in log segment %s to offset %u: %m",
@@ -11688,9 +11693,11 @@ retry:
 	if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
 	{
 		char		fname[MAXFNAMELEN];
+		int			save_errno = errno;
 
 		pgstat_report_wait_end();
 		XLogFileName(fname, curFileTLI, readSegNo, wal_segment_size);
+		errno = save_errno;
 		ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
 				(errcode_for_file_access(),
 				 errmsg("could not read from log segment %s, offset %u: %m",
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 52fe55e2af..4ecdc9220f 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -718,9 +718,10 @@ XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr,
 			if (lseek(sendFile, (off_t) startoff, SEEK_SET) < 0)
 			{
 				char		path[MAXPGPATH];
+				int			save_errno = errno;
 
 				XLogFilePath(path, tli, sendSegNo, segsize);
-
+				errno = save_errno;
 				ereport(ERROR,
 						(errcode_for_file_access(),
 						 errmsg("could not seek in log segment %s to offset %u: %m",
@@ -741,9 +742,10 @@ XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr,
 		if (readbytes <= 0)
 		{
 			char		path[MAXPGPATH];
+			int			save_errno = errno;
 
 			XLogFilePath(path, tli, sendSegNo, segsize);
-
+			errno = save_errno;
 			ereport(ERROR,
 					(errcode_for_file_access(),
 					 errmsg("could not read from log segment %s, offset %u, length %lu: %m",
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 5688cbe2e9..3f1eae38a9 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -495,6 +495,8 @@ perform_base_backup(basebackup_options *opt)
 			fp = AllocateFile(pathbuf, "rb");
 			if (fp == NULL)
 			{
+				int			save_errno = errno;
+
 				/*
 				 * Most likely reason for this is that the file was already
 				 * removed by a checkpoint, so check for that to get a better
@@ -502,6 +504,7 @@ perform_base_backup(basebackup_options *opt)
 				 */
 				CheckXLogRemoved(segno, tli);
 
+				errno = save_errno;
 				ereport(ERROR,
 						(errcode_for_file_access(),
 						 errmsg("could not open file \"%s\": %m", pathbuf)));
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 963878a5d8..3d3f6dff1b 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -578,7 +578,12 @@ CheckPointReplicationOrigin(void)
 	/* write magic */
 	if ((write(tmpfd, &magic, sizeof(magic))) != sizeof(magic))
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(tmpfd);
+
+		/* if write didn't set errno, assume problem is no disk space */
+		errno = save_errno ? save_errno : ENOSPC;
 		ereport(PANIC,
 				(errcode_for_file_access(),
 				 errmsg("could not write to file \"%s\": %m",
@@ -617,7 +622,12 @@ CheckPointReplicationOrigin(void)
 		if ((write(tmpfd, &disk_state, sizeof(disk_state))) !=
 			sizeof(disk_state))
 		{
+			int			save_errno = errno;
+
 			CloseTransientFile(tmpfd);
+
+			/* if write didn't set errno, assume problem is no disk space */
+			errno = save_errno ? save_errno : ENOSPC;
 			ereport(PANIC,
 					(errcode_for_file_access(),
 					 errmsg("could not write to file \"%s\": %m",
@@ -633,7 +643,12 @@ CheckPointReplicationOrigin(void)
 	FIN_CRC32C(crc);
 	if ((write(tmpfd, &crc, sizeof(crc))) != sizeof(crc))
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(tmpfd);
+
+		/* if write didn't set errno, assume problem is no disk space */
+		errno = save_errno ? save_errno : ENOSPC;
 		ereport(PANIC,
 				(errcode_for_file_access(),
 				 errmsg("could not write to file \"%s\": %m",
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index e2f59bf580..3799ad4011 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2304,7 +2304,9 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		int			save_errno = errno;
 
 		CloseTransientFile(fd);
-		errno = save_errno;
+
+		/* if write didn't set errno, assume problem is no disk space */
+		errno = save_errno ? save_errno : ENOSPC;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write to data file for XID %u: %m",
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 4123cdebcf..2c4a1bab4b 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1605,7 +1605,12 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
 	pgstat_report_wait_start(WAIT_EVENT_SNAPBUILD_WRITE);
 	if ((write(fd, ondisk, needed_length)) != needed_length)
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+
+		/* if write didn't set errno, assume problem is no disk space */
+		errno = save_errno ? save_errno : ENOSPC;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write to file \"%s\": %m", tmppath)));
@@ -1623,7 +1628,10 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
 	pgstat_report_wait_start(WAIT_EVENT_SNAPBUILD_SYNC);
 	if (pg_fsync(fd) != 0)
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not fsync file \"%s\": %m", tmppath)));
@@ -1708,7 +1716,10 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	pgstat_report_wait_end();
 	if (readBytes != SnapBuildOnDiskConstantSize)
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not read file \"%s\", read %d of %d: %m",
@@ -1736,7 +1747,10 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	pgstat_report_wait_end();
 	if (readBytes != sizeof(SnapBuild))
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not read file \"%s\", read %d of %d: %m",
@@ -1753,7 +1767,10 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	pgstat_report_wait_end();
 	if (readBytes != sz)
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not read file \"%s\", read %d of %d: %m",
@@ -1769,7 +1786,10 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	pgstat_report_wait_end();
 	if (readBytes != sz)
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not read file \"%s\", read %d of %d: %m",
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index e8b76b2f20..f5927b4d1d 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1274,7 +1274,9 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 
 		pgstat_report_wait_end();
 		CloseTransientFile(fd);
-		errno = save_errno;
+
+		/* if write didn't set errno, assume problem is no disk space */
+		errno = save_errno ? save_errno : ENOSPC;
 		ereport(elevel,
 				(errcode_for_file_access(),
 				 errmsg("could not write to file \"%s\": %m",
@@ -1378,7 +1380,10 @@ RestoreSlotFromDisk(const char *name)
 	pgstat_report_wait_start(WAIT_EVENT_REPLICATION_SLOT_RESTORE_SYNC);
 	if (pg_fsync(fd) != 0)
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(PANIC,
 				(errcode_for_file_access(),
 				 errmsg("could not fsync file \"%s\": %m",
diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c
index 267a40debb..32a963a189 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -128,7 +128,12 @@ dir_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
 
 				pg_free(zerobuf);
 				close(fd);
-				errno = save_errno;
+
+				/*
+				 * If write didn't set errno, assume problem is no disk
+				 * space.
+				 */
+				errno = save_errno ? save_errno : ENOSPC;
 				return NULL;
 			}
 		}
@@ -442,7 +447,15 @@ tar_write_compressed_data(void *buf, size_t count, bool flush)
 			size_t		len = ZLIB_OUT_SIZE - tar_data->zp->avail_out;
 
 			if (write(tar_data->fd, tar_data->zlibOut, len) != len)
+			{
+				/*
+				 * If write didn't set errno, assume problem is no disk
+				 * space.
+				 */
+				if (errno == 0)
+					errno = ENOSPC;
 				return false;
+			}
 
 			tar_data->zp->next_out = tar_data->zlibOut;
 			tar_data->zp->avail_out = ZLIB_OUT_SIZE;
@@ -623,7 +636,8 @@ tar_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
 			save_errno = errno;
 			pg_free(tar_data->currentfile);
 			tar_data->currentfile = NULL;
-			errno = save_errno;
+			/* if write didn't set errno, assume problem is no disk space */
+			errno = save_errno ? save_errno : ENOSPC;
 			return NULL;
 		}
 	}
@@ -818,7 +832,12 @@ tar_close(Walfile f, WalCloseMethod method)
 	if (!tar_data->compression)
 	{
 		if (write(tar_data->fd, tf->header, 512) != 512)
+		{
+			/* if write didn't set errno, assume problem is no disk space */
+			if (errno == 0)
+				errno = ENOSPC;
 			return -1;
+		}
 	}
 #ifdef HAVE_LIBZ
 	else
@@ -884,7 +903,12 @@ tar_finish(void)
 	if (!tar_data->compression)
 	{
 		if (write(tar_data->fd, zerobuf, sizeof(zerobuf)) != sizeof(zerobuf))
+		{
+			/* if write didn't set errno, assume problem is no disk space */
+			if (errno == 0)
+				errno = ENOSPC;
 			return false;
+		}
 	}
 #ifdef HAVE_LIBZ
 	else
@@ -911,7 +935,15 @@ tar_finish(void)
 				size_t		len = ZLIB_OUT_SIZE - tar_data->zp->avail_out;
 
 				if (write(tar_data->fd, tar_data->zlibOut, len) != len)
+				{
+					/*
+					 * If write didn't set errno, assume problem is no disk
+					 * space.
+					 */
+					if (errno == 0)
+						errno = ENOSPC;
 					return false;
+				}
 			}
 			if (r == Z_STREAM_END)
 				break;

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Alvaro Herrera (#5)

Re: Incorrect errno used with %m for backend code

On Fri, Jun 22, 2018 at 11:15:53AM -0400, Alvaro Herrera wrote:

I wondered for a bit if it would be a better idea to have
CloseTransientFile restore the existing errno if there is any and close
works fine -- when I noticed that that function itself says that caller
should check errno for close() errors. Most callers seem to do it
correctly, but a few fail to check the return value.

Yeah, the API in its current shape is simpler to understand. Once you
get used to the errno stanza of course...

A bunch of other places use CloseTransientFile while closing shop after
some other syscall failure, which is what your patch fixes. I didn't
review those; checking for close failure seems pointless.

Agreed.

In some cases, we fsync the file and check that return value, then close
the file and do *not* check CloseTransientFile's return value --
examples are CheckPointLogicalRewriteHeap, heap_xlog_logical_rewrite,
SnapBuildSerialize, SaveSlotToPath, durable_rename. I don't know if
it's reasonable for close() to fail after successfully fsyncing a file;
maybe this is not a problem. I would patch those nonetheless.

And writeTimeLineHistory.

be_lo_export() is certainly missing a check: it writes and closes,
without intervening fsync.

One problem at the same time if possible :) I think that those
adjustments should be a separate patch.
--
Michael

Ashutosh Sharma

ashu.coek88@gmail.com

over 7 years ago

In reply to: Michael Paquier (#7)

Re: Incorrect errno used with %m for backend code

On Sat, Jun 23, 2018 at 6:43 PM, Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Jun 22, 2018 at 03:45:33PM +0530, Ashutosh Sharma wrote:

Okay, thanks for the confirmation. Few of them are also there in
origin.c and snapbuild.c files.

Thanks Ashutosh. I have been reviewing the whole tree and I found more
places where this is missing, like rewriteheap.c, reorderbuffer.c or
pg_basebackup, which gives the attached.
--

Okay, I too had a quick look into the source code to see if there are
still some places where we could have missed to set an errno to ENOSPC
in case of write system call failure but, couldn't find any such place
in the code. The v2 version of patch looks good to me.

So, to conclude, now, v2 patch fixes two things - 1) It makes ereport
to print a correct error number (the error number that matches with
the error message), 2) It sets the errno to ENOSPC (assuming that the
problem is no disk space) if write system call fails to set an errno.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

#10

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Ashutosh Sharma (#9)

Re: Incorrect errno used with %m for backend code

On Sun, Jun 24, 2018 at 09:22:01AM +0530, Ashutosh Sharma wrote:

Okay, I too had a quick look into the source code to see if there are
still some places where we could have missed to set an errno to ENOSPC
in case of write system call failure but, couldn't find any such place
in the code. The v2 version of patch looks good to me.

Thanks for the review. I'll try to wrap that tomorrow with proper
patches for back-branches as things diverge a bit here and there.

So, to conclude, now, v2 patch fixes two things - 1) It makes ereport
to print a correct error number (the error number that matches with
the error message), 2) It sets the errno to ENOSPC (assuming that the
problem is no disk space) if write system call fails to set an errno.

Yes, 1) and 2) and not completely exclusive either, there are some code
paths where both problems happen, like RecreateTwoPhaseFile.
--
Michael

#11

Michael Paquier

michael@paquier.xyz

over 7 years ago

In reply to: Michael Paquier (#10)

Re: Incorrect errno used with %m for backend code

On Sun, Jun 24, 2018 at 09:23:50PM +0900, Michael Paquier wrote:

On Sun, Jun 24, 2018 at 09:22:01AM +0530, Ashutosh Sharma wrote:

Okay, I too had a quick look into the source code to see if there are
still some places where we could have missed to set an errno to ENOSPC
in case of write system call failure but, couldn't find any such place
in the code. The v2 version of patch looks good to me.

Thanks for the review. I'll try to wrap that tomorrow with proper
patches for back-branches as things diverge a bit here and there.

Pushed down to 9.3. All branches conflicted, and particularly in
pg_basebackup some write() calls have been moved around.
--
Michael