BufFileRead() error signalling

Started by Thomas Munroabout 6 years ago28 messages
#1Thomas Munro
thomas.munro@gmail.com

Hi,

As noted by Amit Khandhekar yesterday[1]/messages/by-id/CAJ3gD9emnEys=R+T3OVt_87DuMpMfS4KvoRV6e=iSi5PT+9f3w@mail.gmail.com, BufFileLoad() silently eats
pread()'s error and makes them indistinguishable from EOF.

Some observations:

1. BufFileRead() returns size_t (not ssize_t), so it's an
fread()-style interface, not a read()-style interface that could use
-1 to signal errors. Unlike fread(), it doesn't seem to have anything
corresponding to feof()/ferror()/clearerr() that lets the caller
distinguish between EOF and error, so our hash and tuplestore spill
code simply trusts that if there is a 0 size read where it expects a
tuple boundary, it must be EOF.

2. BufFileWrite() is the same, but just like fwrite(), a short write
must always mean error, so there is no problem here.

3. The calling code assumes that unexpected short read or write sets
errno, which isn't the documented behaviour of fwrite() and fread(),
so there we're doing something a bit different (which is fine, just
pointing it out; we're sort of somewhere between the <stdio.h> and
<unistd.h> functions, in terms of error reporting).

I think the choices are: (1) switch to ssize_t and return -1, (2) add
at least one of BufFileEof(), BufFileError(), (3) have BufFileRead()
raise errors with elog(). I lean towards (2), and I doubt we need
BufFileClear() because the only thing we ever do in client code on
error is immediately burn the world down.

If we simply added an error flag to track if FileRead() had ever
signalled an error, we could change nodeHashjoin.c to do something
along these lines:

-    if (nread == 0)             /* end of file */
+    if (!BufFileError(file) && nread == 0)             /* end of file */

... and likewise for tuplestore.c:

-    if (nbytes != 0 || !eofOK)
+    if (BufFileError(file) || (nbytes == 0 && !eofOK))
         ereport(ERROR,

About the only advantage to the current design I can see if that you
can probably make your query finish faster by pulling out your temp
tablespace USB stick at the right time. Or am I missing some
complication?

[1]: /messages/by-id/CAJ3gD9emnEys=R+T3OVt_87DuMpMfS4KvoRV6e=iSi5PT+9f3w@mail.gmail.com

#2Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#1)
Re: BufFileRead() error signalling

On Thu, Nov 21, 2019 at 10:50 AM Thomas Munro <thomas.munro@gmail.com> wrote:

As noted by Amit Khandhekar yesterday[1], BufFileLoad() silently eats

Erm, Khandekar, sorry for the extra h!

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#1)
Re: BufFileRead() error signalling

Thomas Munro <thomas.munro@gmail.com> writes:

As noted by Amit Khandhekar yesterday[1], BufFileLoad() silently eats
pread()'s error and makes them indistinguishable from EOF.

That's definitely bad.

I think the choices are: (1) switch to ssize_t and return -1, (2) add
at least one of BufFileEof(), BufFileError(), (3) have BufFileRead()
raise errors with elog(). I lean towards (2), and I doubt we need
BufFileClear() because the only thing we ever do in client code on
error is immediately burn the world down.

I'd vote for (3), I think. Making the callers responsible for error
checks just leaves us with a permanent hazard of errors-of-omission,
and as you say, there's really no use-case where we'd be trying to
recover from the error.

I think that the motivation for making the caller do it might've
been an idea that the caller could provide a more useful error
message, but I'm not real sure that that's true --- the caller
doesn't know the physical file's name, and it doesn't necessarily
have the right errno either.

Possibly we could address any loss of usefulness by requiring callers
to pass some sort of context identification to BufFileCreate?

regards, tom lane

#4Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#3)
3 attachment(s)
Re: BufFileRead() error signalling

On Thu, Nov 21, 2019 at 11:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thomas Munro <thomas.munro@gmail.com> writes:

I think the choices are: (1) switch to ssize_t and return -1, (2) add
at least one of BufFileEof(), BufFileError(), (3) have BufFileRead()
raise errors with elog(). I lean towards (2), and I doubt we need
BufFileClear() because the only thing we ever do in client code on
error is immediately burn the world down.

I'd vote for (3), I think. Making the callers responsible for error
checks just leaves us with a permanent hazard of errors-of-omission,
and as you say, there's really no use-case where we'd be trying to
recover from the error.

Ok. Here is a first attempt at that. I also noticed that some
callers of BufFileFlush() eat or disguise I/O errors too, so here's a
patch for that, though I'm a little confused about the exact meaning
of EOF from BufFileSeek().

I think that the motivation for making the caller do it might've
been an idea that the caller could provide a more useful error
message, but I'm not real sure that that's true --- the caller
doesn't know the physical file's name, and it doesn't necessarily
have the right errno either.

Yeah, the errno is undefined right now since we don't know if there
was an error.

Possibly we could address any loss of usefulness by requiring callers
to pass some sort of context identification to BufFileCreate?

Hmm. It's an idea. While thinking about the cohesion of this
module's API, I thought it seemed pretty strange to have
BufFileWrite() using a different style of error reporting, so here's
an experimental 0003 patch to make it consistent. I realise that an
API change might affect extensions, so I'm not sure if it's a good
idea even for master (obviously it's not back-patchable). You could
be right that more context would be good at least in the case of
ENOSPC: knowing that (say) a hash join or a sort or CTE is implicated
would be helpful.

Attachments:

0001-Raise-errors-for-I-O-errors-during-BufFileRead.patchapplication/octet-stream; name=0001-Raise-errors-for-I-O-errors-during-BufFileRead.patchDownload
From 00b2918e1bb18601cd78ac4eebb2f1882a011bd3 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 30 Nov 2019 12:44:47 +1300
Subject: [PATCH 1/3] Raise errors for I/O errors during BufFileRead().

Previously, we returned 0 in case of I/O errors, which was
indistinguishable from hitting the end of the file.  Remove %m from
error messages about short reads, since BufFileRead()t now only
returns control if there was no error.

Author: Thomas Munro
Reported-by: Amit Khandekar
Discussion: https://postgr.es/m/CA%2BhUKGJE04G%3D8TLK0DLypT_27D9dR8F1RQgNp0jK6qR0tZGWOw%40mail.gmail.com
---
 src/backend/access/gist/gistbuildbuffers.c |  2 +-
 src/backend/executor/nodeHashjoin.c        |  4 ++--
 src/backend/storage/file/buffile.c         | 10 +++++++++-
 src/backend/utils/sort/logtape.c           |  2 +-
 src/backend/utils/sort/tuplestore.c        |  6 +++---
 5 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/gist/gistbuildbuffers.c b/src/backend/access/gist/gistbuildbuffers.c
index 38f786848d..2feec247cf 100644
--- a/src/backend/access/gist/gistbuildbuffers.c
+++ b/src/backend/access/gist/gistbuildbuffers.c
@@ -760,7 +760,7 @@ ReadTempFileBlock(BufFile *file, long blknum, void *ptr)
 	if (BufFileSeekBlock(file, blknum) != 0)
 		elog(ERROR, "could not seek temporary file: %m");
 	if (BufFileRead(file, ptr, BLCKSZ) != BLCKSZ)
-		elog(ERROR, "could not read temporary file: %m");
+		elog(ERROR, "could not read temporary file");
 }
 
 static void
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index ec37558c12..35a181632b 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -1260,7 +1260,7 @@ ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
 	if (nread != sizeof(header))
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read from hash-join temporary file: %m")));
+				 errmsg("could not read from hash-join temporary file")));
 	*hashvalue = header[0];
 	tuple = (MinimalTuple) palloc(header[1]);
 	tuple->t_len = header[1];
@@ -1270,7 +1270,7 @@ ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
 	if (nread != header[1] - sizeof(uint32))
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read from hash-join temporary file: %m")));
+				 errmsg("could not read from hash-join temporary file")));
 	ExecForceStoreMinimalTuple(tuple, tupleSlot, true);
 	return tupleSlot;
 }
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 440ff77e1f..136f80c64d 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -434,7 +434,14 @@ BufFileLoadBuffer(BufFile *file)
 							file->curOffset,
 							WAIT_EVENT_BUFFILE_READ);
 	if (file->nbytes < 0)
+	{
 		file->nbytes = 0;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read file \"%s\": %m",
+						FilePathName(thisfile))));
+	}
+
 	/* we choose not to advance curOffset here */
 
 	if (file->nbytes > 0)
@@ -522,7 +529,8 @@ BufFileDumpBuffer(BufFile *file)
 /*
  * BufFileRead
  *
- * Like fread() except we assume 1-byte element size.
+ * Like fread() except we assume 1-byte element size and report I/O errors via
+ * ereport().
  */
 size_t
 BufFileRead(BufFile *file, void *ptr, size_t size)
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index 8985b9e095..a410ff82f8 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -273,7 +273,7 @@ ltsReadBlock(LogicalTapeSet *lts, long blocknum, void *buffer)
 		BufFileRead(lts->pfile, buffer, BLCKSZ) != BLCKSZ)
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read block %ld of temporary file: %m",
+				 errmsg("could not read block %ld of temporary file",
 						blocknum)));
 }
 
diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c
index 3fc7f92182..2761b063a6 100644
--- a/src/backend/utils/sort/tuplestore.c
+++ b/src/backend/utils/sort/tuplestore.c
@@ -1474,7 +1474,7 @@ getlen(Tuplestorestate *state, bool eofOK)
 	if (nbytes != 0 || !eofOK)
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read from tuplestore temporary file: %m")));
+				 errmsg("could not read from tuplestore temporary file")));
 	return 0;
 }
 
@@ -1547,12 +1547,12 @@ readtup_heap(Tuplestorestate *state, unsigned int len)
 					tupbodylen) != (size_t) tupbodylen)
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read from tuplestore temporary file: %m")));
+				 errmsg("could not read from tuplestore temporary file")));
 	if (state->backward)		/* need trailing length word? */
 		if (BufFileRead(state->myfile, (void *) &tuplen,
 						sizeof(tuplen)) != sizeof(tuplen))
 			ereport(ERROR,
 					(errcode_for_file_access(),
-					 errmsg("could not read from tuplestore temporary file: %m")));
+					 errmsg("could not read from tuplestore temporary file")));
 	return (void *) tuple;
 }
-- 
2.23.0

0002-Report-I-O-errors-from-BufFileFlush-via-ereport.patchapplication/octet-stream; name=0002-Report-I-O-errors-from-BufFileFlush-via-ereport.patchDownload
From 348920f0898fd51f97b73830ec0673dc9d4c48b9 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 30 Nov 2019 15:16:32 +1300
Subject: [PATCH 2/3] Report I/O errors from BufFileFlush() via ereport().

Previously we ignored the I/O errors while flushing data in BufFileRead()
(following BufFileWrite()), BufFileClose() and BufFileSeek().
---
 src/backend/storage/file/buffile.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 136f80c64d..1706f9d4d5 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -99,7 +99,7 @@ static BufFile *makeBufFile(File firstfile);
 static void extendBufFile(BufFile *file);
 static void BufFileLoadBuffer(BufFile *file);
 static void BufFileDumpBuffer(BufFile *file);
-static int	BufFileFlush(BufFile *file);
+static void BufFileFlush(BufFile *file);
 static File MakeNewSharedSegment(BufFile *file, int segment);
 
 /*
@@ -540,8 +540,7 @@ BufFileRead(BufFile *file, void *ptr, size_t size)
 
 	if (file->dirty)
 	{
-		if (BufFileFlush(file) != 0)
-			return 0;			/* could not flush... */
+		BufFileFlush(file);
 		Assert(!file->dirty);
 	}
 
@@ -629,19 +628,19 @@ BufFileWrite(BufFile *file, void *ptr, size_t size)
 /*
  * BufFileFlush
  *
- * Like fflush()
+ * Like fflush(), except that I/O errors are reported with ereport().
  */
-static int
+static void
 BufFileFlush(BufFile *file)
 {
 	if (file->dirty)
 	{
 		BufFileDumpBuffer(file);
 		if (file->dirty)
-			return EOF;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to temporary file: %m")));
 	}
-
-	return 0;
 }
 
 /*
@@ -707,8 +706,7 @@ BufFileSeek(BufFile *file, int fileno, off_t offset, int whence)
 		return 0;
 	}
 	/* Otherwise, must reposition buffer, so flush any dirty data */
-	if (BufFileFlush(file) != 0)
-		return EOF;
+	BufFileFlush(file);
 
 	/*
 	 * At this point and no sooner, check for seek past last segment. The
-- 
2.23.0

0003-Align-BufFileWrite-s-error-reporting-with-BufFileRea.patchapplication/octet-stream; name=0003-Align-BufFileWrite-s-error-reporting-with-BufFileRea.patchDownload
From b198a3d98ce754c265096c6cbdf67f0700558afb Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 30 Nov 2019 13:49:01 +1300
Subject: [PATCH 3/3] Align BufFileWrite()'s error reporting with
 BufFileRead()'s.

BufFileRead() now uses ereport() for I/O errors, and otherwise returns
the number of bytes read.  Let's use ereport for BufFileWrite() too,
for symmetry.

While this removes the ability of the caller to report a more
informative error message, in practice the messages were all pretty
generic anyway.   Since error detection was the only use for the return
value, change the return type to void.
---
 src/backend/access/gist/gistbuildbuffers.c | 12 +----------
 src/backend/executor/nodeHashjoin.c        | 14 ++----------
 src/backend/storage/file/buffile.c         | 25 +++++++++-------------
 src/backend/utils/sort/logtape.c           |  6 +++---
 src/backend/utils/sort/sharedtuplestore.c  |  7 +-----
 src/backend/utils/sort/tuplestore.c        | 18 +++-------------
 src/include/storage/buffile.h              |  2 +-
 7 files changed, 21 insertions(+), 63 deletions(-)

diff --git a/src/backend/access/gist/gistbuildbuffers.c b/src/backend/access/gist/gistbuildbuffers.c
index 2feec247cf..5d376258fa 100644
--- a/src/backend/access/gist/gistbuildbuffers.c
+++ b/src/backend/access/gist/gistbuildbuffers.c
@@ -768,15 +768,5 @@ WriteTempFileBlock(BufFile *file, long blknum, void *ptr)
 {
 	if (BufFileSeekBlock(file, blknum) != 0)
 		elog(ERROR, "could not seek temporary file: %m");
-	if (BufFileWrite(file, ptr, BLCKSZ) != BLCKSZ)
-	{
-		/*
-		 * the other errors in Read/WriteTempFileBlock shouldn't happen, but
-		 * an error at write can easily happen if you run out of disk space.
-		 */
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write block %ld of temporary file: %m",
-						blknum)));
-	}
+	BufFileWrite(file, ptr, BLCKSZ);
 }
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 35a181632b..d62b6e8954 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -1200,7 +1200,6 @@ ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
 					  BufFile **fileptr)
 {
 	BufFile    *file = *fileptr;
-	size_t		written;
 
 	if (file == NULL)
 	{
@@ -1209,17 +1208,8 @@ ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
 		*fileptr = file;
 	}
 
-	written = BufFileWrite(file, (void *) &hashvalue, sizeof(uint32));
-	if (written != sizeof(uint32))
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to hash-join temporary file: %m")));
-
-	written = BufFileWrite(file, (void *) tuple, tuple->t_len);
-	if (written != tuple->t_len)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to hash-join temporary file: %m")));
+	BufFileWrite(file, (void *) &hashvalue, sizeof(uint32));
+	BufFileWrite(file, (void *) tuple, tuple->t_len);
 }
 
 /*
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 1706f9d4d5..e7f7c9e253 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -497,7 +497,10 @@ BufFileDumpBuffer(BufFile *file)
 								 file->curOffset,
 								 WAIT_EVENT_BUFFILE_WRITE);
 		if (bytestowrite <= 0)
-			return;				/* failed to write */
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to \"%s\" : %m",
+							FilePathName(thisfile))));
 		file->curOffset += bytestowrite;
 		wpos += bytestowrite;
 
@@ -576,9 +579,10 @@ BufFileRead(BufFile *file, void *ptr, size_t size)
 /*
  * BufFileWrite
  *
- * Like fwrite() except we assume 1-byte element size.
+ * Like fwrite() except we assume 1-byte element size and report errors via
+ * ereport().
  */
-size_t
+void
 BufFileWrite(BufFile *file, void *ptr, size_t size)
 {
 	size_t		nwritten = 0;
@@ -592,11 +596,7 @@ BufFileWrite(BufFile *file, void *ptr, size_t size)
 		{
 			/* Buffer full, dump it out */
 			if (file->dirty)
-			{
 				BufFileDumpBuffer(file);
-				if (file->dirty)
-					break;		/* I/O error */
-			}
 			else
 			{
 				/* Hmm, went directly from reading to writing? */
@@ -621,8 +621,6 @@ BufFileWrite(BufFile *file, void *ptr, size_t size)
 		size -= nthistime;
 		nwritten += nthistime;
 	}
-
-	return nwritten;
 }
 
 /*
@@ -634,13 +632,9 @@ static void
 BufFileFlush(BufFile *file)
 {
 	if (file->dirty)
-	{
 		BufFileDumpBuffer(file);
-		if (file->dirty)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not write to temporary file: %m")));
-	}
+
+	Assert(!file->dirty);
 }
 
 /*
@@ -649,6 +643,7 @@ BufFileFlush(BufFile *file)
  * Like fseek(), except that target position needs two values in order to
  * work when logical filesize exceeds maximum value representable by off_t.
  * We do not support relative seeks across more than that, however.
+ * I/O errors are reported by ereport().
  *
  * Result is 0 if OK, EOF if not.  Logical position is not moved if an
  * impossible seek is attempted.
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index a410ff82f8..085c203944 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -248,12 +248,12 @@ ltsWriteBlock(LogicalTapeSet *lts, long blocknum, void *buffer)
 	}
 
 	/* Write the requested block */
-	if (BufFileSeekBlock(lts->pfile, blocknum) != 0 ||
-		BufFileWrite(lts->pfile, buffer, BLCKSZ) != BLCKSZ)
+	if (BufFileSeekBlock(lts->pfile, blocknum) != 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not write block %ld of temporary file: %m",
+				 errmsg("could not seek to block %ld of temporary file: %m",
 						blocknum)));
+	BufFileWrite(lts->pfile, buffer, BLCKSZ);
 
 	/* Update nBlocksWritten, if we extended the file */
 	if (blocknum == lts->nBlocksWritten)
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index 7765a445c0..c6d09b118f 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -196,14 +196,9 @@ static void
 sts_flush_chunk(SharedTuplestoreAccessor *accessor)
 {
 	size_t		size;
-	size_t		written;
 
 	size = STS_CHUNK_PAGES * BLCKSZ;
-	written = BufFileWrite(accessor->write_file, accessor->write_chunk, size);
-	if (written != size)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to temporary file: %m")));
+	BufFileWrite(accessor->write_file, accessor->write_chunk, size);
 	memset(accessor->write_chunk, 0, size);
 	accessor->write_pointer = &accessor->write_chunk->data[0];
 	accessor->sts->participants[accessor->participant].npages +=
diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c
index 2761b063a6..b57d90c53b 100644
--- a/src/backend/utils/sort/tuplestore.c
+++ b/src/backend/utils/sort/tuplestore.c
@@ -1511,22 +1511,10 @@ writetup_heap(Tuplestorestate *state, void *tup)
 	/* total on-disk footprint: */
 	unsigned int tuplen = tupbodylen + sizeof(int);
 
-	if (BufFileWrite(state->myfile, (void *) &tuplen,
-					 sizeof(tuplen)) != sizeof(tuplen))
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to tuplestore temporary file: %m")));
-	if (BufFileWrite(state->myfile, (void *) tupbody,
-					 tupbodylen) != (size_t) tupbodylen)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to tuplestore temporary file: %m")));
+	BufFileWrite(state->myfile, (void *) &tuplen, sizeof(tuplen));
+	BufFileWrite(state->myfile, (void *) tupbody, tupbodylen);
 	if (state->backward)		/* need trailing length word? */
-		if (BufFileWrite(state->myfile, (void *) &tuplen,
-						 sizeof(tuplen)) != sizeof(tuplen))
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not write to tuplestore temporary file: %m")));
+		BufFileWrite(state->myfile, (void *) &tuplen, sizeof(tuplen));
 
 	FREEMEM(state, GetMemoryChunkSpace(tuple));
 	heap_free_minimal_tuple(tuple);
diff --git a/src/include/storage/buffile.h b/src/include/storage/buffile.h
index 1fba404fe2..bfb41f5bd6 100644
--- a/src/include/storage/buffile.h
+++ b/src/include/storage/buffile.h
@@ -39,7 +39,7 @@ typedef struct BufFile BufFile;
 extern BufFile *BufFileCreateTemp(bool interXact);
 extern void BufFileClose(BufFile *file);
 extern size_t BufFileRead(BufFile *file, void *ptr, size_t size);
-extern size_t BufFileWrite(BufFile *file, void *ptr, size_t size);
+extern void BufFileWrite(BufFile *file, void *ptr, size_t size);
 extern int	BufFileSeek(BufFile *file, int fileno, off_t offset, int whence);
 extern void BufFileTell(BufFile *file, int *fileno, off_t *offset);
 extern int	BufFileSeekBlock(BufFile *file, long blknum);
-- 
2.23.0

#5Ibrar Ahmed
ibrar.ahmad@gmail.com
In reply to: Thomas Munro (#4)
Re: BufFileRead() error signalling

You are checking file->dirty twice, first before calling the function and within the function too. Same for the Assert. For example.
size_t
BufFileRead(BufFile *file, void *ptr, size_t size)
{  
     size_t      nread = 0;
     size_t      nthistime;
     if (file->dirty)
     {  
         BufFileFlush(file);
         Assert(!file->dirty);
     }
static void
 BufFileFlush(BufFile *file)
 {
     if (file->dirty)
         BufFileDumpBuffer(file);
     Assert(!file->dirty);

The new status of this patch is: Waiting on Author

#6Thomas Munro
thomas.munro@gmail.com
In reply to: Ibrar Ahmed (#5)
Re: BufFileRead() error signalling

On Wed, Dec 11, 2019 at 2:07 AM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:

You are checking file->dirty twice, first before calling the function and within the function too. Same for the Assert. For example.

True. Thanks for the review. Before I post a new version, any
opinions on whether to back-patch, and whether to do 0003 in master
only, or at all?

#7Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#6)
Re: BufFileRead() error signalling

On Fri, Jan 24, 2020 at 11:12 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Wed, Dec 11, 2019 at 2:07 AM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:

You are checking file->dirty twice, first before calling the function and within the function too. Same for the Assert. For example.

True. Thanks for the review. Before I post a new version, any
opinions on whether to back-patch, and whether to do 0003 in master
only, or at all?

Rather than answering your actual question, I'd like to complain about this:

  if (BufFileRead(file, ptr, BLCKSZ) != BLCKSZ)
- elog(ERROR, "could not read temporary file: %m");
+ elog(ERROR, "could not read temporary file");

I recognize that your commit message explains this change by saying
that this code will now never be reached except as a result of a short
read, but I don't think that will necessarily be clear to future
readers of those code, or people who get the error message. It seems
like if we're going to do do this, the error messages ought to be
changed to complain about a short read rather than an inability to
read for unspecified reasons. However, I wonder why we don't make
BufFileRead() throw all of the errors including complaining about
short reads. I would like to confess my undying (and probably
unrequited) love for the following code from md.c:

errmsg("could not read block
%u in file \"%s\": read only %d of %d bytes",

Now that is an error message! I am not confused! I don't know why that
happened, but I sure know what happened!

I think we should aim for that kind of style everywhere, so that
complaints about reading and writing files are typically reported as
either of these:

could not read file "%s": %m
could not read file "%s": read only %d of %d bytes

There is an existing precedent in twophase.c which works almost this way:

if (r != stat.st_size)
{
if (r < 0)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read file
\"%s\": %m", path)));
else
ereport(ERROR,
(errmsg("could not read file
\"%s\": read %d of %zu",
path, r,
(Size) stat.st_size)));
}

I'd advocate for a couple more words in the latter message ("only" and
"bytes") but it's still excellent.

OK, now that I've waxed eloquent on that topic, let me have a go at
your actual questions. Regarding back-patching, I don't mind
back-patching error handling patches like this, but I don't feel it's
necessary if we have no evidence that data is actually getting
corrupted as a result of the problem and the chances of it actually
happening seems remote. It's worth keeping in mind that changes to
message strings will tend to degrade translatability unless the new
strings are copies of existing strings. Regarding 0003, it seems good
to me to make BufFileRead() and BufFileWrite() consistent in terms of
error-handling behavior, so +1 for the concept (but I haven't reviewed
the code).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#7)
Re: BufFileRead() error signalling

On Mon, Jan 27, 2020 at 10:09:30AM -0500, Robert Haas wrote:

I recognize that your commit message explains this change by saying
that this code will now never be reached except as a result of a short
read, but I don't think that will necessarily be clear to future
readers of those code, or people who get the error message. It seems
like if we're going to do do this, the error messages ought to be
changed to complain about a short read rather than an inability to
read for unspecified reasons. However, I wonder why we don't make
BufFileRead() throw all of the errors including complaining about
short reads. I would like to confess my undying (and probably
unrequited) love for the following code from md.c:

errmsg("could not read block
%u in file \"%s\": read only %d of %d bytes",

Now that is an error message! I am not confused! I don't know why that
happened, but I sure know what happened!

I was briefly looking at 0001, and count -1 from me for the
formulation of the error messages used in those patches.

I think we should aim for that kind of style everywhere, so that
complaints about reading and writing files are typically reported as
either of these:

could not read file "%s": %m
could not read file "%s": read only %d of %d bytes

That's actually not the best fit, because this does not take care of
the pluralization of the second message if you have only 1 byte to
read ;)

A second point to take into account is that the unification of error
messages makes for less translation work, which is always welcome.
Those points have been discussed on this thread:
/messages/by-id/20180520000522.GB1603@paquier.xyz

The related commit is 811b6e3, and the pattern we agreed on for a
partial read was:
"could not read file \"%s\": read %d of %zu"

Then, if the syscall had an error we'd fall down to that:
"could not read file \"%s\": %m"
--
Michael

#9Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#8)
Re: BufFileRead() error signalling

On Mon, Jan 27, 2020 at 9:03 PM Michael Paquier <michael@paquier.xyz> wrote:

That's actually not the best fit, because this does not take care of
the pluralization of the second message if you have only 1 byte to
read ;)

But ... if you have only one byte to read, you cannot have a short read.

A second point to take into account is that the unification of error
messages makes for less translation work, which is always welcome.
Those points have been discussed on this thread:
/messages/by-id/20180520000522.GB1603@paquier.xyz

I quickly reread that thread and I don't see that there's any firm
consensus there in favor of "read %d of %zu" over "read only %d of %zu
bytes". Now, if most people prefer the former, so be it, but I don't
think that's clear from that thread.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#9)
Re: BufFileRead() error signalling

On Tue, Jan 28, 2020 at 03:51:54PM -0500, Robert Haas wrote:

I quickly reread that thread and I don't see that there's any firm
consensus there in favor of "read %d of %zu" over "read only %d of %zu
bytes". Now, if most people prefer the former, so be it, but I don't
think that's clear from that thread.

The argument of consistency falls in favor of the former on HEAD:
$ git grep "could not read" | grep "read %d of %zu" | wc -l
59
$ git grep "could not read" | grep "read only %d of %zu" | wc -l
0
--
Michael

#11Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#10)
Re: BufFileRead() error signalling

On Wed, Jan 29, 2020 at 1:26 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Jan 28, 2020 at 03:51:54PM -0500, Robert Haas wrote:

I quickly reread that thread and I don't see that there's any firm
consensus there in favor of "read %d of %zu" over "read only %d of %zu
bytes". Now, if most people prefer the former, so be it, but I don't
think that's clear from that thread.

The argument of consistency falls in favor of the former on HEAD:
$ git grep "could not read" | grep "read %d of %zu" | wc -l
59
$ git grep "could not read" | grep "read only %d of %zu" | wc -l
0

True. I didn't realize that 'read %d of %zu' was so widely used.

Your grep misses one instance of 'read only %d of %d bytes' because
you grepped for %zu specifically, but that doesn't really change the
overall picture.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#11)
Re: BufFileRead() error signalling

On Wed, Jan 29, 2020 at 10:01:31AM -0500, Robert Haas wrote:

Your grep misses one instance of 'read only %d of %d bytes' because
you grepped for %zu specifically, but that doesn't really change the
overall picture.

Yes, the one in pg_checksums.c. That could actually be changed with a
cast to Size. (Note that there is a second one related to writes but
there is a precedent in md.c, and a similar one in rewriteheap.c..)

Sorry for the digression.
--
Michael

#13Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#7)
Re: BufFileRead() error signalling

On Mon, Jan 27, 2020 at 7:09 AM Robert Haas <robertmhaas@gmail.com> wrote:

Rather than answering your actual question, I'd like to complain about
this:

if (BufFileRead(file, ptr, BLCKSZ) != BLCKSZ)
- elog(ERROR, "could not read temporary file: %m");
+ elog(ERROR, "could not read temporary file");

I recognize that your commit message explains this change by saying
that this code will now never be reached except as a result of a short
read, but I don't think that will necessarily be clear to future
readers of those code, or people who get the error message. It seems
like if we're going to do do this, the error messages ought to be
changed to complain about a short read rather than an inability to
read for unspecified reasons. However, I wonder why we don't make
BufFileRead() throw all of the errors including complaining about
short reads. I would like to confess my undying (and probably
unrequited) love for the following code from md.c:

errmsg("could not read block
%u in file \"%s\": read only %d of %d bytes",

It would be cool to have the block number in more cases in error
messages. For example, in sts_parallel_scan_next()

/* Seek and load the chunk header. */
if (BufFileSeekBlock(accessor->read_file, read_page) != 0)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read from shared tuplestore temporary
file"),
errdetail_internal("Could not seek to next block.")));

I'm actually in favor of having the block number in this error
message. I think it would be helpful for multi-batch parallel
hashjoin. If a worker reading one SharedTuplestoreChunk encounters an
error and another worker on a different SharedTuplstoreChunk of the
same file does not, I would find it useful to know more about which
block it was on when the error was encountered.

--
Melanie Plageman

#14David Steele
david@pgmasters.net
In reply to: Thomas Munro (#4)
Re: BufFileRead() error signalling

Hi Thomas,

On 11/29/19 9:46 PM, Thomas Munro wrote:

Ok. Here is a first attempt at that.

It's been a few CFs since this patch received an update, though there
has been plenty of discussion.

Perhaps it would be best to mark it RwF until you have a chance to
produce an update patch?

Regards,
--
-David
david@pgmasters.net

#15Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#10)
Re: BufFileRead() error signalling

On 2020-Jan-29, Michael Paquier wrote:

On Tue, Jan 28, 2020 at 03:51:54PM -0500, Robert Haas wrote:

I quickly reread that thread and I don't see that there's any firm
consensus there in favor of "read %d of %zu" over "read only %d of %zu
bytes". Now, if most people prefer the former, so be it, but I don't
think that's clear from that thread.

The argument of consistency falls in favor of the former on HEAD:
$ git grep "could not read" | grep "read %d of %zu" | wc -l
59
$ git grep "could not read" | grep "read only %d of %zu" | wc -l
0

In the discussion that led to 811b6e36a9e2 I did suggest to use "read
only M of N" instead, but there wasn't enough discussion on that fine
point so we settled on what you now call prevalent without a lot of
support specifically on that. I guess it was enough of an improvement
over what was there. But like Robert, I too prefer the wording that
includes "only" and "bytes" over the wording that doesn't.

I'll let it be known that from a translator's point of view, it's a
ten-seconds job to update a fuzzy string from not including "only" and
"bytes" to one that does. So let's not make that an argument for not
changing.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#16Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#7)
Re: BufFileRead() error signalling

On 2020-Jan-27, Robert Haas wrote:

OK, now that I've waxed eloquent on that topic, let me have a go at
your actual questions. Regarding back-patching, I don't mind
back-patching error handling patches like this, but I don't feel it's
necessary if we have no evidence that data is actually getting
corrupted as a result of the problem and the chances of it actually
happening seems remote.

I do have evidence of postgres crashes because of a problem that could
be explained by this bug, so I +1 backpatching this to all supported
branches.

(The problem I saw is a hash-join spilling data to temp tablespace,
which fills up but somehow goes undetected, then when reading the data
back it causes heap_fill_tuple to crash.)

Thomas, if you're no longer interested in seeing this done, please let
me know and I can see to it.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#17Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#16)
Re: BufFileRead() error signalling

On Wed, May 27, 2020 at 12:16 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I do have evidence of postgres crashes because of a problem that could
be explained by this bug, so I +1 backpatching this to all supported
branches.

(The problem I saw is a hash-join spilling data to temp tablespace,
which fills up but somehow goes undetected, then when reading the data
back it causes heap_fill_tuple to crash.)

FWIW, that seems like a plenty good enough reason for back-patching to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Thomas Munro
thomas.munro@gmail.com
In reply to: Alvaro Herrera (#16)
Re: BufFileRead() error signalling

On Thu, May 28, 2020 at 4:16 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Jan-27, Robert Haas wrote:

OK, now that I've waxed eloquent on that topic, let me have a go at
your actual questions. Regarding back-patching, I don't mind
back-patching error handling patches like this, but I don't feel it's
necessary if we have no evidence that data is actually getting
corrupted as a result of the problem and the chances of it actually
happening seems remote.

I do have evidence of postgres crashes because of a problem that could
be explained by this bug, so I +1 backpatching this to all supported
branches.

(The problem I saw is a hash-join spilling data to temp tablespace,
which fills up but somehow goes undetected, then when reading the data
back it causes heap_fill_tuple to crash.)

Ooh.

Thomas, if you're no longer interested in seeing this done, please let
me know and I can see to it.

My indecision on the back-patching question has been resolved by your
crash report and a search on codesearch.debian.org. BufFileRead() and
BufFileWrite() aren't referenced by any of the extensions they
package, so by that standard it's OK to change this stuff in back
branches. I'll post a rebased a patch with Robert and Ibrar's changes
for last reviews later today.

#19Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Thomas Munro (#18)
Re: BufFileRead() error signalling

On 2020-May-28, Thomas Munro wrote:

My indecision on the back-patching question has been resolved by your
crash report and a search on codesearch.debian.org.

Great news!

BufFileRead() and BufFileWrite() aren't referenced by any of the
extensions they package, so by that standard it's OK to change this
stuff in back branches.

This makes me a bit uncomfortable. For example,
https://inst.eecs.berkeley.edu/~cs186/fa03/hwk5/assign5.html (admittedly
a very old class) encourages students to use this API to create an
aggregate. It might not be the smartest thing in the world, but I'd
prefer not to break such things if they exist proprietarily. Can we
keep the API unchanged in stable branches and just ereport the errors?

I'll post a rebased a patch with Robert and Ibrar's changes
for last reviews later today.

... walks away wondering about BufFileSeekBlock's API ...

(BufFileSeek seems harder to change, due to tuplestore.c)

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#20Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#15)
Re: BufFileRead() error signalling

On Wed, May 27, 2020 at 11:59:59AM -0400, Alvaro Herrera wrote:

In the discussion that led to 811b6e36a9e2 I did suggest to use "read
only M of N" instead, but there wasn't enough discussion on that fine
point so we settled on what you now call prevalent without a lot of
support specifically on that. I guess it was enough of an improvement
over what was there. But like Robert, I too prefer the wording that
includes "only" and "bytes" over the wording that doesn't.

I'll let it be known that from a translator's point of view, it's a
ten-seconds job to update a fuzzy string from not including "only" and
"bytes" to one that does. So let's not make that an argument for not
changing.

Using "only" would be fine by me, though I tend to prefer the existing
one. Now I think that we should avoid "bytes" to not have to worry
about pluralization of error messages. This has been a concern in the
past (see e5d11b9 and the likes).
--
Michael

#21Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#20)
1 attachment(s)
Re: BufFileRead() error signalling

On Thu, May 28, 2020 at 7:10 PM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, May 27, 2020 at 11:59:59AM -0400, Alvaro Herrera wrote:

In the discussion that led to 811b6e36a9e2 I did suggest to use "read
only M of N" instead, but there wasn't enough discussion on that fine
point so we settled on what you now call prevalent without a lot of
support specifically on that. I guess it was enough of an improvement
over what was there. But like Robert, I too prefer the wording that
includes "only" and "bytes" over the wording that doesn't.

I'll let it be known that from a translator's point of view, it's a
ten-seconds job to update a fuzzy string from not including "only" and
"bytes" to one that does. So let's not make that an argument for not
changing.

Using "only" would be fine by me, though I tend to prefer the existing
one. Now I think that we should avoid "bytes" to not have to worry
about pluralization of error messages. This has been a concern in the
past (see e5d11b9 and the likes).

Sorry for the delay in producing a new patch. Here's one that tries
to take into account the various feedback in this thread:

Ibrar said:

You are checking file->dirty twice, first before calling the function
and within the function too. Same for the Assert.

Fixed.

Robert, Melanie, Michael and Alvaro put forward various arguments
about the form of "short read" and block seek error message. While
removing bogus cases of "%m", I changed them all to say "read only %zu
of %zu bytes" in the same place. I removed the bogus "%m" from
BufFileSeek() and BufFileSeekBlock() callers (its call to
BufFileFlush() already throws). I added block numbers to the error
messages about failure to seek by block.

Following existing practice, I made write failure assume ENOSPC if
errno is 0, so that %m would make sense (I am not sure what platform
reports out-of-space that way, but there are certainly a lot of copies
of that code in the tree so it seemed to be missing here).

I didn't change BufFileWrite() to be void, to be friendly to existing
callers outside the tree (if there are any), though I removed all the
code that checks the return code. We can make it void later.

For the future: it feels a bit like we're missing a one line way to
say "read this many bytes and error out if you run out".

Attachments:

0001-Redesign-error-handling-in-buffile.c.patchtext/x-patch; charset=US-ASCII; name=0001-Redesign-error-handling-in-buffile.c.patchDownload
From ab6f90f5e5f3b6e70de28a4ee734e732c2a8c30b Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 30 Nov 2019 12:44:47 +1300
Subject: [PATCH] Redesign error handling in buffile.c.

Convert error handling in buffile.c functions to raise errors rather
than expecting callers to check the return value, and improve the error
messages.

This fixes various cases that forgot to check for errors, or couldn't
check for errors because EOF was indistinguishable from error, or
checked for errors unless they fell on certain boundaries which they
assumed to be EOF.

This had been identified by code inspection, but was later identified as
the probable cause of a crash report involving a corrupted hash join
spill file when disk space ran out.

Back-patch to all releases.  For now, BufFileWrite() still returns a
redundant value so as not to change the function prototype for the
benefit of any extensions that might be using it.

Reported-by: Amit Khandekar <amitdkhan.pg@gmail.com>
Reported-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGJE04G%3D8TLK0DLypT_27D9dR8F1RQgNp0jK6qR0tZGWOw%40mail.gmail.com
---
 src/backend/access/gist/gistbuildbuffers.c | 23 ++++-----
 src/backend/executor/nodeHashjoin.c        | 24 ++++------
 src/backend/replication/backup_manifest.c  |  2 +-
 src/backend/storage/file/buffile.c         | 51 +++++++++++---------
 src/backend/utils/sort/logtape.c           | 18 ++++---
 src/backend/utils/sort/sharedtuplestore.c  | 21 ++++----
 src/backend/utils/sort/tuplestore.c        | 56 ++++++++++------------
 7 files changed, 91 insertions(+), 104 deletions(-)

diff --git a/src/backend/access/gist/gistbuildbuffers.c b/src/backend/access/gist/gistbuildbuffers.c
index 4b562d8d3f..7ff513a5fe 100644
--- a/src/backend/access/gist/gistbuildbuffers.c
+++ b/src/backend/access/gist/gistbuildbuffers.c
@@ -757,26 +757,19 @@ gistRelocateBuildBuffersOnSplit(GISTBuildBuffers *gfbb, GISTSTATE *giststate,
 static void
 ReadTempFileBlock(BufFile *file, long blknum, void *ptr)
 {
+	size_t		nread;
+
 	if (BufFileSeekBlock(file, blknum) != 0)
-		elog(ERROR, "could not seek temporary file: %m");
-	if (BufFileRead(file, ptr, BLCKSZ) != BLCKSZ)
-		elog(ERROR, "could not read temporary file: %m");
+		elog(ERROR, "could not seek block %ld in temporary file", blknum);
+	if ((nread = BufFileRead(file, ptr, BLCKSZ)) != BLCKSZ)
+		elog(ERROR, "could not read temporary file: read only %zu of %zu bytes",
+			 nread, (size_t) BLCKSZ);
 }
 
 static void
 WriteTempFileBlock(BufFile *file, long blknum, void *ptr)
 {
 	if (BufFileSeekBlock(file, blknum) != 0)
-		elog(ERROR, "could not seek temporary file: %m");
-	if (BufFileWrite(file, ptr, BLCKSZ) != BLCKSZ)
-	{
-		/*
-		 * the other errors in Read/WriteTempFileBlock shouldn't happen, but
-		 * an error at write can easily happen if you run out of disk space.
-		 */
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write block %ld of temporary file: %m",
-						blknum)));
-	}
+		elog(ERROR, "could not seek block %ld temporary file", blknum);
+	BufFileWrite(file, ptr, BLCKSZ);
 }
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 2cdc38a601..9bb23fef1a 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -1043,7 +1043,7 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 		if (BufFileSeek(innerFile, 0, 0L, SEEK_SET))
 			ereport(ERROR,
 					(errcode_for_file_access(),
-					 errmsg("could not rewind hash-join temporary file: %m")));
+					 errmsg("could not rewind hash-join temporary file")));
 
 		while ((slot = ExecHashJoinGetSavedTuple(hjstate,
 												 innerFile,
@@ -1073,7 +1073,7 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 		if (BufFileSeek(hashtable->outerBatchFile[curbatch], 0, 0L, SEEK_SET))
 			ereport(ERROR,
 					(errcode_for_file_access(),
-					 errmsg("could not rewind hash-join temporary file: %m")));
+					 errmsg("could not rewind hash-join temporary file")));
 	}
 
 	return true;
@@ -1219,7 +1219,6 @@ ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
 					  BufFile **fileptr)
 {
 	BufFile    *file = *fileptr;
-	size_t		written;
 
 	if (file == NULL)
 	{
@@ -1228,17 +1227,8 @@ ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
 		*fileptr = file;
 	}
 
-	written = BufFileWrite(file, (void *) &hashvalue, sizeof(uint32));
-	if (written != sizeof(uint32))
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to hash-join temporary file: %m")));
-
-	written = BufFileWrite(file, (void *) tuple, tuple->t_len);
-	if (written != tuple->t_len)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to hash-join temporary file: %m")));
+	BufFileWrite(file, (void *) &hashvalue, sizeof(uint32));
+	BufFileWrite(file, (void *) tuple, tuple->t_len);
 }
 
 /*
@@ -1279,7 +1269,8 @@ ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
 	if (nread != sizeof(header))
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read from hash-join temporary file: %m")));
+				 errmsg("could not read from hash-join temporary file: read only %zu of %zu bytes",
+						nread, sizeof(header))));
 	*hashvalue = header[0];
 	tuple = (MinimalTuple) palloc(header[1]);
 	tuple->t_len = header[1];
@@ -1289,7 +1280,8 @@ ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
 	if (nread != header[1] - sizeof(uint32))
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read from hash-join temporary file: %m")));
+				 errmsg("could not read from hash-join temporary file: read only %zu of %zu bytes",
+						nread, header[1] - sizeof(uint32))));
 	ExecForceStoreMinimalTuple(tuple, tupleSlot, true);
 	return tupleSlot;
 }
diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c
index 807c5f16b4..261ef31f28 100644
--- a/src/backend/replication/backup_manifest.c
+++ b/src/backend/replication/backup_manifest.c
@@ -319,7 +319,7 @@ SendBackupManifest(backup_manifest_info *manifest)
 	if (BufFileSeek(manifest->buffile, 0, 0L, SEEK_SET))
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not rewind temporary file: %m")));
+				 errmsg("could not rewind temporary file")));
 
 	/* Send CopyOutResponse message */
 	pq_beginmessage(&protobuf, 'H');
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 35e8f12e62..dab748535e 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -99,7 +99,7 @@ static BufFile *makeBufFile(File firstfile);
 static void extendBufFile(BufFile *file);
 static void BufFileLoadBuffer(BufFile *file);
 static void BufFileDumpBuffer(BufFile *file);
-static int	BufFileFlush(BufFile *file);
+static void BufFileFlush(BufFile *file);
 static File MakeNewSharedSegment(BufFile *file, int segment);
 
 /*
@@ -434,7 +434,14 @@ BufFileLoadBuffer(BufFile *file)
 							file->curOffset,
 							WAIT_EVENT_BUFFILE_READ);
 	if (file->nbytes < 0)
+	{
 		file->nbytes = 0;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read file \"%s\": %m",
+						FilePathName(thisfile))));
+	}
+
 	/* we choose not to advance curOffset here */
 
 	if (file->nbytes > 0)
@@ -484,13 +491,22 @@ BufFileDumpBuffer(BufFile *file)
 			bytestowrite = (int) availbytes;
 
 		thisfile = file->files[file->curFile];
+		errno = 0;
 		bytestowrite = FileWrite(thisfile,
 								 file->buffer.data + wpos,
 								 bytestowrite,
 								 file->curOffset,
 								 WAIT_EVENT_BUFFILE_WRITE);
 		if (bytestowrite <= 0)
-			return;				/* failed to write */
+		{
+			/* if write didn't set errno, assume no disk space */
+			if (errno == 0)
+				errno = ENOSPC;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to \"%s\" : %m",
+							FilePathName(thisfile))));
+		}
 		file->curOffset += bytestowrite;
 		wpos += bytestowrite;
 
@@ -522,7 +538,8 @@ BufFileDumpBuffer(BufFile *file)
 /*
  * BufFileRead
  *
- * Like fread() except we assume 1-byte element size.
+ * Like fread() except we assume 1-byte element size and report I/O errors via
+ * ereport().
  */
 size_t
 BufFileRead(BufFile *file, void *ptr, size_t size)
@@ -530,12 +547,7 @@ BufFileRead(BufFile *file, void *ptr, size_t size)
 	size_t		nread = 0;
 	size_t		nthistime;
 
-	if (file->dirty)
-	{
-		if (BufFileFlush(file) != 0)
-			return 0;			/* could not flush... */
-		Assert(!file->dirty);
-	}
+	BufFileFlush(file);
 
 	while (size > 0)
 	{
@@ -569,7 +581,8 @@ BufFileRead(BufFile *file, void *ptr, size_t size)
 /*
  * BufFileWrite
  *
- * Like fwrite() except we assume 1-byte element size.
+ * Like fwrite() except we assume 1-byte element size and report errors via
+ * ereport().
  */
 size_t
 BufFileWrite(BufFile *file, void *ptr, size_t size)
@@ -585,11 +598,7 @@ BufFileWrite(BufFile *file, void *ptr, size_t size)
 		{
 			/* Buffer full, dump it out */
 			if (file->dirty)
-			{
 				BufFileDumpBuffer(file);
-				if (file->dirty)
-					break;		/* I/O error */
-			}
 			else
 			{
 				/* Hmm, went directly from reading to writing? */
@@ -621,19 +630,15 @@ BufFileWrite(BufFile *file, void *ptr, size_t size)
 /*
  * BufFileFlush
  *
- * Like fflush()
+ * Like fflush(), except that I/O errors are reported with ereport().
  */
-static int
+static void
 BufFileFlush(BufFile *file)
 {
 	if (file->dirty)
-	{
 		BufFileDumpBuffer(file);
-		if (file->dirty)
-			return EOF;
-	}
 
-	return 0;
+	Assert(!file->dirty);
 }
 
 /*
@@ -642,6 +647,7 @@ BufFileFlush(BufFile *file)
  * Like fseek(), except that target position needs two values in order to
  * work when logical filesize exceeds maximum value representable by off_t.
  * We do not support relative seeks across more than that, however.
+ * I/O errors are reported by ereport().
  *
  * Result is 0 if OK, EOF if not.  Logical position is not moved if an
  * impossible seek is attempted.
@@ -699,8 +705,7 @@ BufFileSeek(BufFile *file, int fileno, off_t offset, int whence)
 		return 0;
 	}
 	/* Otherwise, must reposition buffer, so flush any dirty data */
-	if (BufFileFlush(file) != 0)
-		return EOF;
+	BufFileFlush(file);
 
 	/*
 	 * At this point and no sooner, check for seek past last segment. The
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index 666a7c0e81..db5357793b 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -262,12 +262,12 @@ ltsWriteBlock(LogicalTapeSet *lts, long blocknum, void *buffer)
 	}
 
 	/* Write the requested block */
-	if (BufFileSeekBlock(lts->pfile, blocknum) != 0 ||
-		BufFileWrite(lts->pfile, buffer, BLCKSZ) != BLCKSZ)
+	if (BufFileSeekBlock(lts->pfile, blocknum) != 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not write block %ld of temporary file: %m",
+				 errmsg("could not seek block %ld of temporary file",
 						blocknum)));
+	BufFileWrite(lts->pfile, buffer, BLCKSZ);
 
 	/* Update nBlocksWritten, if we extended the file */
 	if (blocknum == lts->nBlocksWritten)
@@ -283,12 +283,18 @@ ltsWriteBlock(LogicalTapeSet *lts, long blocknum, void *buffer)
 static void
 ltsReadBlock(LogicalTapeSet *lts, long blocknum, void *buffer)
 {
-	if (BufFileSeekBlock(lts->pfile, blocknum) != 0 ||
-		BufFileRead(lts->pfile, buffer, BLCKSZ) != BLCKSZ)
+	size_t		nread;
+
+	if (BufFileSeekBlock(lts->pfile, blocknum) != 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read block %ld of temporary file: %m",
+				 errmsg("could not seek to block %ld of temporary file",
 						blocknum)));
+	if ((nread = BufFileRead(lts->pfile, buffer, BLCKSZ)) != BLCKSZ)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read block %ld of temporary file: read only %zu of %zu bytes",
+						blocknum, nread, (size_t) BLCKSZ)));
 }
 
 /*
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index c3ab494a45..6537a4303b 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -196,14 +196,9 @@ static void
 sts_flush_chunk(SharedTuplestoreAccessor *accessor)
 {
 	size_t		size;
-	size_t		written;
 
 	size = STS_CHUNK_PAGES * BLCKSZ;
-	written = BufFileWrite(accessor->write_file, accessor->write_chunk, size);
-	if (written != size)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to temporary file: %m")));
+	BufFileWrite(accessor->write_file, accessor->write_chunk, size);
 	memset(accessor->write_chunk, 0, size);
 	accessor->write_pointer = &accessor->write_chunk->data[0];
 	accessor->sts->participants[accessor->participant].npages +=
@@ -555,6 +550,7 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
 		if (!eof)
 		{
 			SharedTuplestoreChunk chunk_header;
+			size_t		nread;
 
 			/* Make sure we have the file open. */
 			if (accessor->read_file == NULL)
@@ -570,14 +566,15 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
 			if (BufFileSeekBlock(accessor->read_file, read_page) != 0)
 				ereport(ERROR,
 						(errcode_for_file_access(),
-						 errmsg("could not read from shared tuplestore temporary file"),
-						 errdetail_internal("Could not seek to next block.")));
-			if (BufFileRead(accessor->read_file, &chunk_header,
-							STS_CHUNK_HEADER_SIZE) != STS_CHUNK_HEADER_SIZE)
+						 errmsg("could not seek block %u in shared tuplestore temporary file",
+								read_page)));
+			nread = BufFileRead(accessor->read_file, &chunk_header,
+								STS_CHUNK_HEADER_SIZE);
+			if (nread != STS_CHUNK_HEADER_SIZE)
 				ereport(ERROR,
 						(errcode_for_file_access(),
-						 errmsg("could not read from shared tuplestore temporary file"),
-						 errdetail_internal("Short read while reading chunk header.")));
+						 errmsg("could not read from shared tuplestore temporary file: read only %zu of %zu bytes",
+								nread, STS_CHUNK_HEADER_SIZE)));
 
 			/*
 			 * If this is an overflow chunk, we skip it and any following
diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c
index ebb1da0746..452a85a423 100644
--- a/src/backend/utils/sort/tuplestore.c
+++ b/src/backend/utils/sort/tuplestore.c
@@ -515,7 +515,7 @@ tuplestore_select_read_pointer(Tuplestorestate *state, int ptr)
 								SEEK_SET) != 0)
 					ereport(ERROR,
 							(errcode_for_file_access(),
-							 errmsg("could not seek in tuplestore temporary file: %m")));
+							 errmsg("could not seek in tuplestore temporary file")));
 			}
 			else
 			{
@@ -525,7 +525,7 @@ tuplestore_select_read_pointer(Tuplestorestate *state, int ptr)
 								SEEK_SET) != 0)
 					ereport(ERROR,
 							(errcode_for_file_access(),
-							 errmsg("could not seek in tuplestore temporary file: %m")));
+							 errmsg("could not seek in tuplestore temporary file")));
 			}
 			break;
 		default:
@@ -866,7 +866,7 @@ tuplestore_puttuple_common(Tuplestorestate *state, void *tuple)
 							SEEK_SET) != 0)
 				ereport(ERROR,
 						(errcode_for_file_access(),
-						 errmsg("could not seek in tuplestore temporary file: %m")));
+						 errmsg("could not seek in tuplestore temporary file")));
 			state->status = TSS_WRITEFILE;
 
 			/*
@@ -970,7 +970,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 								SEEK_SET) != 0)
 					ereport(ERROR,
 							(errcode_for_file_access(),
-							 errmsg("could not seek in tuplestore temporary file: %m")));
+							 errmsg("could not seek in tuplestore temporary file")));
 			state->status = TSS_READFILE;
 			/* FALLTHROUGH */
 
@@ -1034,7 +1034,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 									SEEK_CUR) != 0)
 						ereport(ERROR,
 								(errcode_for_file_access(),
-								 errmsg("could not seek in tuplestore temporary file: %m")));
+								 errmsg("could not seek in tuplestore temporary file")));
 					Assert(!state->truncated);
 					return NULL;
 				}
@@ -1051,7 +1051,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 							SEEK_CUR) != 0)
 				ereport(ERROR,
 						(errcode_for_file_access(),
-						 errmsg("could not seek in tuplestore temporary file: %m")));
+						 errmsg("could not seek in tuplestore temporary file")));
 			tup = READTUP(state, tuplen);
 			return tup;
 
@@ -1253,7 +1253,7 @@ tuplestore_rescan(Tuplestorestate *state)
 			if (BufFileSeek(state->myfile, 0, 0L, SEEK_SET) != 0)
 				ereport(ERROR,
 						(errcode_for_file_access(),
-						 errmsg("could not seek in tuplestore temporary file: %m")));
+						 errmsg("could not seek in tuplestore temporary file")));
 			break;
 		default:
 			elog(ERROR, "invalid tuplestore state");
@@ -1318,7 +1318,7 @@ tuplestore_copy_read_pointer(Tuplestorestate *state,
 									SEEK_SET) != 0)
 						ereport(ERROR,
 								(errcode_for_file_access(),
-								 errmsg("could not seek in tuplestore temporary file: %m")));
+								 errmsg("could not seek in tuplestore temporary file")));
 				}
 				else
 				{
@@ -1327,7 +1327,7 @@ tuplestore_copy_read_pointer(Tuplestorestate *state,
 									SEEK_SET) != 0)
 						ereport(ERROR,
 								(errcode_for_file_access(),
-								 errmsg("could not seek in tuplestore temporary file: %m")));
+								 errmsg("could not seek in tuplestore temporary file")));
 				}
 			}
 			else if (srcptr == state->activeptr)
@@ -1474,7 +1474,8 @@ getlen(Tuplestorestate *state, bool eofOK)
 	if (nbytes != 0 || !eofOK)
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read from tuplestore temporary file: %m")));
+				 errmsg("could not read from tuplestore temporary file: read only %zu of %zu bytes",
+						nbytes, sizeof(len))));
 	return 0;
 }
 
@@ -1511,22 +1512,10 @@ writetup_heap(Tuplestorestate *state, void *tup)
 	/* total on-disk footprint: */
 	unsigned int tuplen = tupbodylen + sizeof(int);
 
-	if (BufFileWrite(state->myfile, (void *) &tuplen,
-					 sizeof(tuplen)) != sizeof(tuplen))
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to tuplestore temporary file: %m")));
-	if (BufFileWrite(state->myfile, (void *) tupbody,
-					 tupbodylen) != (size_t) tupbodylen)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to tuplestore temporary file: %m")));
+	BufFileWrite(state->myfile, (void *) &tuplen, sizeof(tuplen));
+	BufFileWrite(state->myfile, (void *) tupbody, tupbodylen);
 	if (state->backward)		/* need trailing length word? */
-		if (BufFileWrite(state->myfile, (void *) &tuplen,
-						 sizeof(tuplen)) != sizeof(tuplen))
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not write to tuplestore temporary file: %m")));
+		BufFileWrite(state->myfile, (void *) &tuplen, sizeof(tuplen));
 
 	FREEMEM(state, GetMemoryChunkSpace(tuple));
 	heap_free_minimal_tuple(tuple);
@@ -1539,20 +1528,25 @@ readtup_heap(Tuplestorestate *state, unsigned int len)
 	unsigned int tuplen = tupbodylen + MINIMAL_TUPLE_DATA_OFFSET;
 	MinimalTuple tuple = (MinimalTuple) palloc(tuplen);
 	char	   *tupbody = (char *) tuple + MINIMAL_TUPLE_DATA_OFFSET;
+	size_t		nread;
 
 	USEMEM(state, GetMemoryChunkSpace(tuple));
 	/* read in the tuple proper */
 	tuple->t_len = tuplen;
-	if (BufFileRead(state->myfile, (void *) tupbody,
-					tupbodylen) != (size_t) tupbodylen)
+	nread = BufFileRead(state->myfile, (void *) tupbody, tupbodylen);
+	if (nread != (size_t) tupbodylen)
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read from tuplestore temporary file: %m")));
+				 errmsg("could not read from tuplestore temporary file: read only %zu of %zu bytes",
+						nread, (size_t) tupbodylen)));
 	if (state->backward)		/* need trailing length word? */
-		if (BufFileRead(state->myfile, (void *) &tuplen,
-						sizeof(tuplen)) != sizeof(tuplen))
+	{
+		nread = BufFileRead(state->myfile, (void *) &tuplen, sizeof(tuplen));
+		if (nread != sizeof(tuplen))
 			ereport(ERROR,
 					(errcode_for_file_access(),
-					 errmsg("could not read from tuplestore temporary file: %m")));
+					 errmsg("could not read from tuplestore temporary file: read only %zu of %zu bytes",
+							nread, sizeof(tuplen))));
+	}
 	return (void *) tuple;
 }
-- 
2.20.1

#22Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#21)
Re: BufFileRead() error signalling

On Fri, Jun 05, 2020 at 06:03:59PM +1200, Thomas Munro wrote:

I didn't change BufFileWrite() to be void, to be friendly to existing
callers outside the tree (if there are any), though I removed all the
code that checks the return code. We can make it void later.

Missing one entry in AppendStringToManifest(). It sounds right to not
change the signature of the routine on back-branches to any ABI
breakages. It think that it could change on HEAD.

Anyway, why are we sure that it is fine to not complain even if
BufFileWrite() does a partial write? fwrite() is mentioned at the top
of the thread, but why is that OK?

For the future: it feels a bit like we're missing a one line way to
say "read this many bytes and error out if you run out".

-       ereport(ERROR,
-               (errcode_for_file_access(),
-                errmsg("could not write block %ld of temporary file:
-               %m",
-                       blknum)));
-   }
+       elog(ERROR, "could not seek block %ld temporary file", blknum);

You mean "in temporary file" in the new message, no?

+           ereport(ERROR,
+                   (errcode_for_file_access(),
+                    errmsg("could not write to \"%s\" : %m",
+                           FilePathName(thisfile))));

Nit: "could not write [to] file \"%s\": %m" is a more common error
string.
--
Michael

#23Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#22)
1 attachment(s)
Re: BufFileRead() error signalling

On Fri, Jun 5, 2020 at 8:14 PM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Jun 05, 2020 at 06:03:59PM +1200, Thomas Munro wrote:

I didn't change BufFileWrite() to be void, to be friendly to existing
callers outside the tree (if there are any), though I removed all the
code that checks the return code. We can make it void later.

Missing one entry in AppendStringToManifest().

Fixed.

Anyway, why are we sure that it is fine to not complain even if
BufFileWrite() does a partial write? fwrite() is mentioned at the top
of the thread, but why is that OK?

It's not OK. If any system call fails, we'll now ereport()
immediately. Now there can't be unhandled or unreported errors, and
it's no longer possible for the caller to confuse EOF with errors.
Hence the change in descriptions:

- * Like fread() except we assume 1-byte element size.
+ * Like fread() except we assume 1-byte element size and report I/O errors via
+ * ereport().
- * Like fwrite() except we assume 1-byte element size.
+ * Like fwrite() except we assume 1-byte element size and report errors via
+ * ereport().

Stepping back a bit, one of the problems here is that we tried to
model the functions on <stdio.h> fread(), but we didn't provide the
companion ferror() and feof() functions, and then we were a bit fuzzy
on when errno is set, even though the <stdio.h> functions don't
document that. There were various ways forward, but the one that this
patch follows is to switch to our regular error reporting system. The
only thing that really costs us is marginally more vague error
messages. Perhaps that could eventually be fixed by passing in some
more context into calls like BufFileCreateTemp(), for use in error
messages and perhaps also path names.

Does this make sense?

+ elog(ERROR, "could not seek block %ld temporary file", blknum);

You mean "in temporary file" in the new message, no?

Fixed.

+           ereport(ERROR,
+                   (errcode_for_file_access(),
+                    errmsg("could not write to \"%s\" : %m",
+                           FilePathName(thisfile))));

Nit: "could not write [to] file \"%s\": %m" is a more common error
string.

Fixed.

Attachments:

v2-0001-Redesign-error-handling-in-buffile.c.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Redesign-error-handling-in-buffile.c.patchDownload
From 11a09e65b521966821283a41f49b73adf6a0e40b Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 8 Jun 2020 17:01:06 +1200
Subject: [PATCH v2] Redesign error handling in buffile.c.

Convert error handling in buffile.c functions to raise errors rather
than expecting callers to check the return value, and improve the error
messages.

This fixes various cases that forgot to check for errors, or couldn't
check for errors because EOF was indistinguishable from error, or
checked for errors unless they fell on certain boundaries which they
assumed to be EOF, or reported errno when it's not clear that it's been
set.

The problems were initially identified by code inspection, and then
reported as the probable cause of a crash report involving a corrupted
hash join spill file when disk space ran out.

Back-patch to all releases.  For now, BufFileWrite() still returns a
redundant value so as not to change the function prototype for the
benefit of any extensions that might be using it.

Reported-by: Amit Khandekar <amitdkhan.pg@gmail.com>
Reported-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGJE04G%3D8TLK0DLypT_27D9dR8F1RQgNp0jK6qR0tZGWOw%40mail.gmail.com
---
 src/backend/access/gist/gistbuildbuffers.c | 23 ++++-----
 src/backend/executor/nodeHashjoin.c        | 24 ++++------
 src/backend/replication/backup_manifest.c  |  8 +---
 src/backend/storage/file/buffile.c         | 51 +++++++++++---------
 src/backend/utils/sort/logtape.c           | 18 ++++---
 src/backend/utils/sort/sharedtuplestore.c  | 21 ++++----
 src/backend/utils/sort/tuplestore.c        | 56 ++++++++++------------
 7 files changed, 92 insertions(+), 109 deletions(-)

diff --git a/src/backend/access/gist/gistbuildbuffers.c b/src/backend/access/gist/gistbuildbuffers.c
index 4b562d8d3f..4436d1c71d 100644
--- a/src/backend/access/gist/gistbuildbuffers.c
+++ b/src/backend/access/gist/gistbuildbuffers.c
@@ -757,26 +757,19 @@ gistRelocateBuildBuffersOnSplit(GISTBuildBuffers *gfbb, GISTSTATE *giststate,
 static void
 ReadTempFileBlock(BufFile *file, long blknum, void *ptr)
 {
+	size_t		nread;
+
 	if (BufFileSeekBlock(file, blknum) != 0)
-		elog(ERROR, "could not seek temporary file: %m");
-	if (BufFileRead(file, ptr, BLCKSZ) != BLCKSZ)
-		elog(ERROR, "could not read temporary file: %m");
+		elog(ERROR, "could not seek block %ld in temporary file", blknum);
+	if ((nread = BufFileRead(file, ptr, BLCKSZ)) != BLCKSZ)
+		elog(ERROR, "could not read temporary file: read only %zu of %zu bytes",
+			 nread, (size_t) BLCKSZ);
 }
 
 static void
 WriteTempFileBlock(BufFile *file, long blknum, void *ptr)
 {
 	if (BufFileSeekBlock(file, blknum) != 0)
-		elog(ERROR, "could not seek temporary file: %m");
-	if (BufFileWrite(file, ptr, BLCKSZ) != BLCKSZ)
-	{
-		/*
-		 * the other errors in Read/WriteTempFileBlock shouldn't happen, but
-		 * an error at write can easily happen if you run out of disk space.
-		 */
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write block %ld of temporary file: %m",
-						blknum)));
-	}
+		elog(ERROR, "could not seek block %ld in temporary file", blknum);
+	BufFileWrite(file, ptr, BLCKSZ);
 }
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 2cdc38a601..9bb23fef1a 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -1043,7 +1043,7 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 		if (BufFileSeek(innerFile, 0, 0L, SEEK_SET))
 			ereport(ERROR,
 					(errcode_for_file_access(),
-					 errmsg("could not rewind hash-join temporary file: %m")));
+					 errmsg("could not rewind hash-join temporary file")));
 
 		while ((slot = ExecHashJoinGetSavedTuple(hjstate,
 												 innerFile,
@@ -1073,7 +1073,7 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 		if (BufFileSeek(hashtable->outerBatchFile[curbatch], 0, 0L, SEEK_SET))
 			ereport(ERROR,
 					(errcode_for_file_access(),
-					 errmsg("could not rewind hash-join temporary file: %m")));
+					 errmsg("could not rewind hash-join temporary file")));
 	}
 
 	return true;
@@ -1219,7 +1219,6 @@ ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
 					  BufFile **fileptr)
 {
 	BufFile    *file = *fileptr;
-	size_t		written;
 
 	if (file == NULL)
 	{
@@ -1228,17 +1227,8 @@ ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
 		*fileptr = file;
 	}
 
-	written = BufFileWrite(file, (void *) &hashvalue, sizeof(uint32));
-	if (written != sizeof(uint32))
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to hash-join temporary file: %m")));
-
-	written = BufFileWrite(file, (void *) tuple, tuple->t_len);
-	if (written != tuple->t_len)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to hash-join temporary file: %m")));
+	BufFileWrite(file, (void *) &hashvalue, sizeof(uint32));
+	BufFileWrite(file, (void *) tuple, tuple->t_len);
 }
 
 /*
@@ -1279,7 +1269,8 @@ ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
 	if (nread != sizeof(header))
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read from hash-join temporary file: %m")));
+				 errmsg("could not read from hash-join temporary file: read only %zu of %zu bytes",
+						nread, sizeof(header))));
 	*hashvalue = header[0];
 	tuple = (MinimalTuple) palloc(header[1]);
 	tuple->t_len = header[1];
@@ -1289,7 +1280,8 @@ ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
 	if (nread != header[1] - sizeof(uint32))
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read from hash-join temporary file: %m")));
+				 errmsg("could not read from hash-join temporary file: read only %zu of %zu bytes",
+						nread, header[1] - sizeof(uint32))));
 	ExecForceStoreMinimalTuple(tuple, tupleSlot, true);
 	return tupleSlot;
 }
diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c
index 807c5f16b4..070d97ee6b 100644
--- a/src/backend/replication/backup_manifest.c
+++ b/src/backend/replication/backup_manifest.c
@@ -319,7 +319,7 @@ SendBackupManifest(backup_manifest_info *manifest)
 	if (BufFileSeek(manifest->buffile, 0, 0L, SEEK_SET))
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not rewind temporary file: %m")));
+				 errmsg("could not rewind temporary file")));
 
 	/* Send CopyOutResponse message */
 	pq_beginmessage(&protobuf, 'H');
@@ -370,10 +370,6 @@ AppendStringToManifest(backup_manifest_info *manifest, char *s)
 	Assert(manifest != NULL);
 	if (manifest->still_checksumming)
 		pg_sha256_update(&manifest->manifest_ctx, (uint8 *) s, len);
-	written = BufFileWrite(manifest->buffile, s, len);
-	if (written != len)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to temporary file: %m")));
+	BufFileWrite(manifest->buffile, s, len);
 	manifest->manifest_size += len;
 }
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 35e8f12e62..a7806103fd 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -99,7 +99,7 @@ static BufFile *makeBufFile(File firstfile);
 static void extendBufFile(BufFile *file);
 static void BufFileLoadBuffer(BufFile *file);
 static void BufFileDumpBuffer(BufFile *file);
-static int	BufFileFlush(BufFile *file);
+static void BufFileFlush(BufFile *file);
 static File MakeNewSharedSegment(BufFile *file, int segment);
 
 /*
@@ -434,7 +434,14 @@ BufFileLoadBuffer(BufFile *file)
 							file->curOffset,
 							WAIT_EVENT_BUFFILE_READ);
 	if (file->nbytes < 0)
+	{
 		file->nbytes = 0;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read file \"%s\": %m",
+						FilePathName(thisfile))));
+	}
+
 	/* we choose not to advance curOffset here */
 
 	if (file->nbytes > 0)
@@ -484,13 +491,22 @@ BufFileDumpBuffer(BufFile *file)
 			bytestowrite = (int) availbytes;
 
 		thisfile = file->files[file->curFile];
+		errno = 0;
 		bytestowrite = FileWrite(thisfile,
 								 file->buffer.data + wpos,
 								 bytestowrite,
 								 file->curOffset,
 								 WAIT_EVENT_BUFFILE_WRITE);
 		if (bytestowrite <= 0)
-			return;				/* failed to write */
+		{
+			/* if write didn't set errno, assume no disk space */
+			if (errno == 0)
+				errno = ENOSPC;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to file \"%s\" : %m",
+							FilePathName(thisfile))));
+		}
 		file->curOffset += bytestowrite;
 		wpos += bytestowrite;
 
@@ -522,7 +538,8 @@ BufFileDumpBuffer(BufFile *file)
 /*
  * BufFileRead
  *
- * Like fread() except we assume 1-byte element size.
+ * Like fread() except we assume 1-byte element size and report I/O errors via
+ * ereport().
  */
 size_t
 BufFileRead(BufFile *file, void *ptr, size_t size)
@@ -530,12 +547,7 @@ BufFileRead(BufFile *file, void *ptr, size_t size)
 	size_t		nread = 0;
 	size_t		nthistime;
 
-	if (file->dirty)
-	{
-		if (BufFileFlush(file) != 0)
-			return 0;			/* could not flush... */
-		Assert(!file->dirty);
-	}
+	BufFileFlush(file);
 
 	while (size > 0)
 	{
@@ -569,7 +581,8 @@ BufFileRead(BufFile *file, void *ptr, size_t size)
 /*
  * BufFileWrite
  *
- * Like fwrite() except we assume 1-byte element size.
+ * Like fwrite() except we assume 1-byte element size and report errors via
+ * ereport().
  */
 size_t
 BufFileWrite(BufFile *file, void *ptr, size_t size)
@@ -585,11 +598,7 @@ BufFileWrite(BufFile *file, void *ptr, size_t size)
 		{
 			/* Buffer full, dump it out */
 			if (file->dirty)
-			{
 				BufFileDumpBuffer(file);
-				if (file->dirty)
-					break;		/* I/O error */
-			}
 			else
 			{
 				/* Hmm, went directly from reading to writing? */
@@ -621,19 +630,15 @@ BufFileWrite(BufFile *file, void *ptr, size_t size)
 /*
  * BufFileFlush
  *
- * Like fflush()
+ * Like fflush(), except that I/O errors are reported with ereport().
  */
-static int
+static void
 BufFileFlush(BufFile *file)
 {
 	if (file->dirty)
-	{
 		BufFileDumpBuffer(file);
-		if (file->dirty)
-			return EOF;
-	}
 
-	return 0;
+	Assert(!file->dirty);
 }
 
 /*
@@ -642,6 +647,7 @@ BufFileFlush(BufFile *file)
  * Like fseek(), except that target position needs two values in order to
  * work when logical filesize exceeds maximum value representable by off_t.
  * We do not support relative seeks across more than that, however.
+ * I/O errors are reported by ereport().
  *
  * Result is 0 if OK, EOF if not.  Logical position is not moved if an
  * impossible seek is attempted.
@@ -699,8 +705,7 @@ BufFileSeek(BufFile *file, int fileno, off_t offset, int whence)
 		return 0;
 	}
 	/* Otherwise, must reposition buffer, so flush any dirty data */
-	if (BufFileFlush(file) != 0)
-		return EOF;
+	BufFileFlush(file);
 
 	/*
 	 * At this point and no sooner, check for seek past last segment. The
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index 138da0c1b4..27c0dbf8c3 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -262,12 +262,12 @@ ltsWriteBlock(LogicalTapeSet *lts, long blocknum, void *buffer)
 	}
 
 	/* Write the requested block */
-	if (BufFileSeekBlock(lts->pfile, blocknum) != 0 ||
-		BufFileWrite(lts->pfile, buffer, BLCKSZ) != BLCKSZ)
+	if (BufFileSeekBlock(lts->pfile, blocknum) != 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not write block %ld of temporary file: %m",
+				 errmsg("could not seek block %ld of temporary file",
 						blocknum)));
+	BufFileWrite(lts->pfile, buffer, BLCKSZ);
 
 	/* Update nBlocksWritten, if we extended the file */
 	if (blocknum == lts->nBlocksWritten)
@@ -283,12 +283,18 @@ ltsWriteBlock(LogicalTapeSet *lts, long blocknum, void *buffer)
 static void
 ltsReadBlock(LogicalTapeSet *lts, long blocknum, void *buffer)
 {
-	if (BufFileSeekBlock(lts->pfile, blocknum) != 0 ||
-		BufFileRead(lts->pfile, buffer, BLCKSZ) != BLCKSZ)
+	size_t		nread;
+
+	if (BufFileSeekBlock(lts->pfile, blocknum) != 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read block %ld of temporary file: %m",
+				 errmsg("could not seek to block %ld of temporary file",
 						blocknum)));
+	if ((nread = BufFileRead(lts->pfile, buffer, BLCKSZ)) != BLCKSZ)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read block %ld of temporary file: read only %zu of %zu bytes",
+						blocknum, nread, (size_t) BLCKSZ)));
 }
 
 /*
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index c3ab494a45..6537a4303b 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -196,14 +196,9 @@ static void
 sts_flush_chunk(SharedTuplestoreAccessor *accessor)
 {
 	size_t		size;
-	size_t		written;
 
 	size = STS_CHUNK_PAGES * BLCKSZ;
-	written = BufFileWrite(accessor->write_file, accessor->write_chunk, size);
-	if (written != size)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to temporary file: %m")));
+	BufFileWrite(accessor->write_file, accessor->write_chunk, size);
 	memset(accessor->write_chunk, 0, size);
 	accessor->write_pointer = &accessor->write_chunk->data[0];
 	accessor->sts->participants[accessor->participant].npages +=
@@ -555,6 +550,7 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
 		if (!eof)
 		{
 			SharedTuplestoreChunk chunk_header;
+			size_t		nread;
 
 			/* Make sure we have the file open. */
 			if (accessor->read_file == NULL)
@@ -570,14 +566,15 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
 			if (BufFileSeekBlock(accessor->read_file, read_page) != 0)
 				ereport(ERROR,
 						(errcode_for_file_access(),
-						 errmsg("could not read from shared tuplestore temporary file"),
-						 errdetail_internal("Could not seek to next block.")));
-			if (BufFileRead(accessor->read_file, &chunk_header,
-							STS_CHUNK_HEADER_SIZE) != STS_CHUNK_HEADER_SIZE)
+						 errmsg("could not seek block %u in shared tuplestore temporary file",
+								read_page)));
+			nread = BufFileRead(accessor->read_file, &chunk_header,
+								STS_CHUNK_HEADER_SIZE);
+			if (nread != STS_CHUNK_HEADER_SIZE)
 				ereport(ERROR,
 						(errcode_for_file_access(),
-						 errmsg("could not read from shared tuplestore temporary file"),
-						 errdetail_internal("Short read while reading chunk header.")));
+						 errmsg("could not read from shared tuplestore temporary file: read only %zu of %zu bytes",
+								nread, STS_CHUNK_HEADER_SIZE)));
 
 			/*
 			 * If this is an overflow chunk, we skip it and any following
diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c
index ebb1da0746..452a85a423 100644
--- a/src/backend/utils/sort/tuplestore.c
+++ b/src/backend/utils/sort/tuplestore.c
@@ -515,7 +515,7 @@ tuplestore_select_read_pointer(Tuplestorestate *state, int ptr)
 								SEEK_SET) != 0)
 					ereport(ERROR,
 							(errcode_for_file_access(),
-							 errmsg("could not seek in tuplestore temporary file: %m")));
+							 errmsg("could not seek in tuplestore temporary file")));
 			}
 			else
 			{
@@ -525,7 +525,7 @@ tuplestore_select_read_pointer(Tuplestorestate *state, int ptr)
 								SEEK_SET) != 0)
 					ereport(ERROR,
 							(errcode_for_file_access(),
-							 errmsg("could not seek in tuplestore temporary file: %m")));
+							 errmsg("could not seek in tuplestore temporary file")));
 			}
 			break;
 		default:
@@ -866,7 +866,7 @@ tuplestore_puttuple_common(Tuplestorestate *state, void *tuple)
 							SEEK_SET) != 0)
 				ereport(ERROR,
 						(errcode_for_file_access(),
-						 errmsg("could not seek in tuplestore temporary file: %m")));
+						 errmsg("could not seek in tuplestore temporary file")));
 			state->status = TSS_WRITEFILE;
 
 			/*
@@ -970,7 +970,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 								SEEK_SET) != 0)
 					ereport(ERROR,
 							(errcode_for_file_access(),
-							 errmsg("could not seek in tuplestore temporary file: %m")));
+							 errmsg("could not seek in tuplestore temporary file")));
 			state->status = TSS_READFILE;
 			/* FALLTHROUGH */
 
@@ -1034,7 +1034,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 									SEEK_CUR) != 0)
 						ereport(ERROR,
 								(errcode_for_file_access(),
-								 errmsg("could not seek in tuplestore temporary file: %m")));
+								 errmsg("could not seek in tuplestore temporary file")));
 					Assert(!state->truncated);
 					return NULL;
 				}
@@ -1051,7 +1051,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 							SEEK_CUR) != 0)
 				ereport(ERROR,
 						(errcode_for_file_access(),
-						 errmsg("could not seek in tuplestore temporary file: %m")));
+						 errmsg("could not seek in tuplestore temporary file")));
 			tup = READTUP(state, tuplen);
 			return tup;
 
@@ -1253,7 +1253,7 @@ tuplestore_rescan(Tuplestorestate *state)
 			if (BufFileSeek(state->myfile, 0, 0L, SEEK_SET) != 0)
 				ereport(ERROR,
 						(errcode_for_file_access(),
-						 errmsg("could not seek in tuplestore temporary file: %m")));
+						 errmsg("could not seek in tuplestore temporary file")));
 			break;
 		default:
 			elog(ERROR, "invalid tuplestore state");
@@ -1318,7 +1318,7 @@ tuplestore_copy_read_pointer(Tuplestorestate *state,
 									SEEK_SET) != 0)
 						ereport(ERROR,
 								(errcode_for_file_access(),
-								 errmsg("could not seek in tuplestore temporary file: %m")));
+								 errmsg("could not seek in tuplestore temporary file")));
 				}
 				else
 				{
@@ -1327,7 +1327,7 @@ tuplestore_copy_read_pointer(Tuplestorestate *state,
 									SEEK_SET) != 0)
 						ereport(ERROR,
 								(errcode_for_file_access(),
-								 errmsg("could not seek in tuplestore temporary file: %m")));
+								 errmsg("could not seek in tuplestore temporary file")));
 				}
 			}
 			else if (srcptr == state->activeptr)
@@ -1474,7 +1474,8 @@ getlen(Tuplestorestate *state, bool eofOK)
 	if (nbytes != 0 || !eofOK)
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read from tuplestore temporary file: %m")));
+				 errmsg("could not read from tuplestore temporary file: read only %zu of %zu bytes",
+						nbytes, sizeof(len))));
 	return 0;
 }
 
@@ -1511,22 +1512,10 @@ writetup_heap(Tuplestorestate *state, void *tup)
 	/* total on-disk footprint: */
 	unsigned int tuplen = tupbodylen + sizeof(int);
 
-	if (BufFileWrite(state->myfile, (void *) &tuplen,
-					 sizeof(tuplen)) != sizeof(tuplen))
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to tuplestore temporary file: %m")));
-	if (BufFileWrite(state->myfile, (void *) tupbody,
-					 tupbodylen) != (size_t) tupbodylen)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to tuplestore temporary file: %m")));
+	BufFileWrite(state->myfile, (void *) &tuplen, sizeof(tuplen));
+	BufFileWrite(state->myfile, (void *) tupbody, tupbodylen);
 	if (state->backward)		/* need trailing length word? */
-		if (BufFileWrite(state->myfile, (void *) &tuplen,
-						 sizeof(tuplen)) != sizeof(tuplen))
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not write to tuplestore temporary file: %m")));
+		BufFileWrite(state->myfile, (void *) &tuplen, sizeof(tuplen));
 
 	FREEMEM(state, GetMemoryChunkSpace(tuple));
 	heap_free_minimal_tuple(tuple);
@@ -1539,20 +1528,25 @@ readtup_heap(Tuplestorestate *state, unsigned int len)
 	unsigned int tuplen = tupbodylen + MINIMAL_TUPLE_DATA_OFFSET;
 	MinimalTuple tuple = (MinimalTuple) palloc(tuplen);
 	char	   *tupbody = (char *) tuple + MINIMAL_TUPLE_DATA_OFFSET;
+	size_t		nread;
 
 	USEMEM(state, GetMemoryChunkSpace(tuple));
 	/* read in the tuple proper */
 	tuple->t_len = tuplen;
-	if (BufFileRead(state->myfile, (void *) tupbody,
-					tupbodylen) != (size_t) tupbodylen)
+	nread = BufFileRead(state->myfile, (void *) tupbody, tupbodylen);
+	if (nread != (size_t) tupbodylen)
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read from tuplestore temporary file: %m")));
+				 errmsg("could not read from tuplestore temporary file: read only %zu of %zu bytes",
+						nread, (size_t) tupbodylen)));
 	if (state->backward)		/* need trailing length word? */
-		if (BufFileRead(state->myfile, (void *) &tuplen,
-						sizeof(tuplen)) != sizeof(tuplen))
+	{
+		nread = BufFileRead(state->myfile, (void *) &tuplen, sizeof(tuplen));
+		if (nread != sizeof(tuplen))
 			ereport(ERROR,
 					(errcode_for_file_access(),
-					 errmsg("could not read from tuplestore temporary file: %m")));
+					 errmsg("could not read from tuplestore temporary file: read only %zu of %zu bytes",
+							nread, sizeof(tuplen))));
+	}
 	return (void *) tuple;
 }
-- 
2.20.1

#24Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Thomas Munro (#23)
Re: BufFileRead() error signalling

On 2020-Jun-08, Thomas Munro wrote:

Stepping back a bit, one of the problems here is that we tried to
model the functions on <stdio.h> fread(), but we didn't provide the
companion ferror() and feof() functions, and then we were a bit fuzzy
on when errno is set, even though the <stdio.h> functions don't
document that. There were various ways forward, but the one that this
patch follows is to switch to our regular error reporting system. The
only thing that really costs us is marginally more vague error
messages. Perhaps that could eventually be fixed by passing in some
more context into calls like BufFileCreateTemp(), for use in error
messages and perhaps also path names.

I think using our standard "exception" mechanism makes sense. As for
additional context, I think usefulness of the error messages would be
improved by showing the file path (because then user knows which
filesystem/tablespace was full, for example), but IMO any additional
context on top of that is of marginal additional benefit. If we really
cared, we could have errcontext() callbacks in the sites of interest,
but that would be a separate patch and perhaps not backpatchable.

+ elog(ERROR, "could not seek block %ld temporary file", blknum);

You mean "in temporary file" in the new message, no?

Fixed.

The wording we use is "could not seek TO block N". (Or used to use,
before switching to pread/pwrite in most places, it seems).

Thanks!

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#25Thomas Munro
thomas.munro@gmail.com
In reply to: Alvaro Herrera (#24)
1 attachment(s)
Re: BufFileRead() error signalling

On Tue, Jun 9, 2020 at 2:49 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I think using our standard "exception" mechanism makes sense. As for
additional context, I think usefulness of the error messages would be
improved by showing the file path (because then user knows which
filesystem/tablespace was full, for example), but IMO any additional
context on top of that is of marginal additional benefit. If we really
cared, we could have errcontext() callbacks in the sites of interest,
but that would be a separate patch and perhaps not backpatchable.

Cool. It does show the path, so that'll tell you which file system is
full or broken.

I thought a bit about the ENOSPC thing, and took that change out.
Since commit 1173344e we handle write() returning a positive number
less than the full size by predicting that a follow-up call to write()
would surely return ENOSPC, without the hassle of trying to write
more, or having a separate error message sans %m. But
BufFileDumpBuffer() does try again, and only raises an error if the
system call returns < 0 (well, it says <= 0, but 0 is impossible
according to POSIX, at least assuming you didn't try to write zero
bytes, and we already exclude that). So ENOSPC-prediction is
unnecessary here.

The wording we use is "could not seek TO block N". (Or used to use,
before switching to pread/pwrite in most places, it seems).

Fixed in a couple of places.

Attachments:

v3-0001-Redesign-error-handling-in-buffile.c.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Redesign-error-handling-in-buffile.c.patchDownload
From 3632531a2e6f15bd4cd28fd1eb1dcc0fd341f014 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 8 Jun 2020 17:01:06 +1200
Subject: [PATCH v3] Redesign error handling in buffile.c.

Convert error handling in buffile.c functions to raise errors rather
than expecting callers to check the return value, and improve the error
messages.

This fixes various cases that forgot to check for errors, or couldn't
check for errors because EOF was indistinguishable from error, or
checked for errors unless they fell on certain boundaries which they
assumed to be EOF, or reported errno when it's not clear that it's been
set.

The problems were initially identified by code inspection, and then
reported as the probable cause of a crash report involving a corrupted
hash join spill file when disk space ran out.

Back-patch to all releases.  For now, BufFileWrite() still returns a
redundant value so as not to change the function prototype for the
benefit of any extensions that might be using it.

Reported-by: Amit Khandekar <amitdkhan.pg@gmail.com>
Reported-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGJE04G%3D8TLK0DLypT_27D9dR8F1RQgNp0jK6qR0tZGWOw%40mail.gmail.com
---
 src/backend/access/gist/gistbuildbuffers.c | 23 ++++-----
 src/backend/executor/nodeHashjoin.c        | 24 ++++------
 src/backend/replication/backup_manifest.c  |  9 +---
 src/backend/storage/file/buffile.c         | 45 +++++++++--------
 src/backend/utils/sort/logtape.c           | 18 ++++---
 src/backend/utils/sort/sharedtuplestore.c  | 21 ++++----
 src/backend/utils/sort/tuplestore.c        | 56 ++++++++++------------
 7 files changed, 86 insertions(+), 110 deletions(-)

diff --git a/src/backend/access/gist/gistbuildbuffers.c b/src/backend/access/gist/gistbuildbuffers.c
index 4b562d8d3f..26bbc9b316 100644
--- a/src/backend/access/gist/gistbuildbuffers.c
+++ b/src/backend/access/gist/gistbuildbuffers.c
@@ -757,26 +757,19 @@ gistRelocateBuildBuffersOnSplit(GISTBuildBuffers *gfbb, GISTSTATE *giststate,
 static void
 ReadTempFileBlock(BufFile *file, long blknum, void *ptr)
 {
+	size_t		nread;
+
 	if (BufFileSeekBlock(file, blknum) != 0)
-		elog(ERROR, "could not seek temporary file: %m");
-	if (BufFileRead(file, ptr, BLCKSZ) != BLCKSZ)
-		elog(ERROR, "could not read temporary file: %m");
+		elog(ERROR, "could not seek to block %ld in temporary file", blknum);
+	if ((nread = BufFileRead(file, ptr, BLCKSZ)) != BLCKSZ)
+		elog(ERROR, "could not read temporary file: read only %zu of %zu bytes",
+			 nread, (size_t) BLCKSZ);
 }
 
 static void
 WriteTempFileBlock(BufFile *file, long blknum, void *ptr)
 {
 	if (BufFileSeekBlock(file, blknum) != 0)
-		elog(ERROR, "could not seek temporary file: %m");
-	if (BufFileWrite(file, ptr, BLCKSZ) != BLCKSZ)
-	{
-		/*
-		 * the other errors in Read/WriteTempFileBlock shouldn't happen, but
-		 * an error at write can easily happen if you run out of disk space.
-		 */
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write block %ld of temporary file: %m",
-						blknum)));
-	}
+		elog(ERROR, "could not seek to block %ld in temporary file", blknum);
+	BufFileWrite(file, ptr, BLCKSZ);
 }
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 2cdc38a601..9bb23fef1a 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -1043,7 +1043,7 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 		if (BufFileSeek(innerFile, 0, 0L, SEEK_SET))
 			ereport(ERROR,
 					(errcode_for_file_access(),
-					 errmsg("could not rewind hash-join temporary file: %m")));
+					 errmsg("could not rewind hash-join temporary file")));
 
 		while ((slot = ExecHashJoinGetSavedTuple(hjstate,
 												 innerFile,
@@ -1073,7 +1073,7 @@ ExecHashJoinNewBatch(HashJoinState *hjstate)
 		if (BufFileSeek(hashtable->outerBatchFile[curbatch], 0, 0L, SEEK_SET))
 			ereport(ERROR,
 					(errcode_for_file_access(),
-					 errmsg("could not rewind hash-join temporary file: %m")));
+					 errmsg("could not rewind hash-join temporary file")));
 	}
 
 	return true;
@@ -1219,7 +1219,6 @@ ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
 					  BufFile **fileptr)
 {
 	BufFile    *file = *fileptr;
-	size_t		written;
 
 	if (file == NULL)
 	{
@@ -1228,17 +1227,8 @@ ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
 		*fileptr = file;
 	}
 
-	written = BufFileWrite(file, (void *) &hashvalue, sizeof(uint32));
-	if (written != sizeof(uint32))
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to hash-join temporary file: %m")));
-
-	written = BufFileWrite(file, (void *) tuple, tuple->t_len);
-	if (written != tuple->t_len)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to hash-join temporary file: %m")));
+	BufFileWrite(file, (void *) &hashvalue, sizeof(uint32));
+	BufFileWrite(file, (void *) tuple, tuple->t_len);
 }
 
 /*
@@ -1279,7 +1269,8 @@ ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
 	if (nread != sizeof(header))
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read from hash-join temporary file: %m")));
+				 errmsg("could not read from hash-join temporary file: read only %zu of %zu bytes",
+						nread, sizeof(header))));
 	*hashvalue = header[0];
 	tuple = (MinimalTuple) palloc(header[1]);
 	tuple->t_len = header[1];
@@ -1289,7 +1280,8 @@ ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
 	if (nread != header[1] - sizeof(uint32))
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read from hash-join temporary file: %m")));
+				 errmsg("could not read from hash-join temporary file: read only %zu of %zu bytes",
+						nread, header[1] - sizeof(uint32))));
 	ExecForceStoreMinimalTuple(tuple, tupleSlot, true);
 	return tupleSlot;
 }
diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c
index 807c5f16b4..b626004927 100644
--- a/src/backend/replication/backup_manifest.c
+++ b/src/backend/replication/backup_manifest.c
@@ -319,7 +319,7 @@ SendBackupManifest(backup_manifest_info *manifest)
 	if (BufFileSeek(manifest->buffile, 0, 0L, SEEK_SET))
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not rewind temporary file: %m")));
+				 errmsg("could not rewind temporary file")));
 
 	/* Send CopyOutResponse message */
 	pq_beginmessage(&protobuf, 'H');
@@ -365,15 +365,10 @@ static void
 AppendStringToManifest(backup_manifest_info *manifest, char *s)
 {
 	int			len = strlen(s);
-	size_t		written;
 
 	Assert(manifest != NULL);
 	if (manifest->still_checksumming)
 		pg_sha256_update(&manifest->manifest_ctx, (uint8 *) s, len);
-	written = BufFileWrite(manifest->buffile, s, len);
-	if (written != len)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to temporary file: %m")));
+	BufFileWrite(manifest->buffile, s, len);
 	manifest->manifest_size += len;
 }
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 35e8f12e62..298b10b543 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -99,7 +99,7 @@ static BufFile *makeBufFile(File firstfile);
 static void extendBufFile(BufFile *file);
 static void BufFileLoadBuffer(BufFile *file);
 static void BufFileDumpBuffer(BufFile *file);
-static int	BufFileFlush(BufFile *file);
+static void BufFileFlush(BufFile *file);
 static File MakeNewSharedSegment(BufFile *file, int segment);
 
 /*
@@ -434,7 +434,14 @@ BufFileLoadBuffer(BufFile *file)
 							file->curOffset,
 							WAIT_EVENT_BUFFILE_READ);
 	if (file->nbytes < 0)
+	{
 		file->nbytes = 0;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read file \"%s\": %m",
+						FilePathName(thisfile))));
+	}
+
 	/* we choose not to advance curOffset here */
 
 	if (file->nbytes > 0)
@@ -490,7 +497,10 @@ BufFileDumpBuffer(BufFile *file)
 								 file->curOffset,
 								 WAIT_EVENT_BUFFILE_WRITE);
 		if (bytestowrite <= 0)
-			return;				/* failed to write */
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to file \"%s\" : %m",
+							FilePathName(thisfile))));
 		file->curOffset += bytestowrite;
 		wpos += bytestowrite;
 
@@ -522,7 +532,8 @@ BufFileDumpBuffer(BufFile *file)
 /*
  * BufFileRead
  *
- * Like fread() except we assume 1-byte element size.
+ * Like fread() except we assume 1-byte element size and report I/O errors via
+ * ereport().
  */
 size_t
 BufFileRead(BufFile *file, void *ptr, size_t size)
@@ -530,12 +541,7 @@ BufFileRead(BufFile *file, void *ptr, size_t size)
 	size_t		nread = 0;
 	size_t		nthistime;
 
-	if (file->dirty)
-	{
-		if (BufFileFlush(file) != 0)
-			return 0;			/* could not flush... */
-		Assert(!file->dirty);
-	}
+	BufFileFlush(file);
 
 	while (size > 0)
 	{
@@ -569,7 +575,8 @@ BufFileRead(BufFile *file, void *ptr, size_t size)
 /*
  * BufFileWrite
  *
- * Like fwrite() except we assume 1-byte element size.
+ * Like fwrite() except we assume 1-byte element size and report errors via
+ * ereport().
  */
 size_t
 BufFileWrite(BufFile *file, void *ptr, size_t size)
@@ -585,11 +592,7 @@ BufFileWrite(BufFile *file, void *ptr, size_t size)
 		{
 			/* Buffer full, dump it out */
 			if (file->dirty)
-			{
 				BufFileDumpBuffer(file);
-				if (file->dirty)
-					break;		/* I/O error */
-			}
 			else
 			{
 				/* Hmm, went directly from reading to writing? */
@@ -621,19 +624,15 @@ BufFileWrite(BufFile *file, void *ptr, size_t size)
 /*
  * BufFileFlush
  *
- * Like fflush()
+ * Like fflush(), except that I/O errors are reported with ereport().
  */
-static int
+static void
 BufFileFlush(BufFile *file)
 {
 	if (file->dirty)
-	{
 		BufFileDumpBuffer(file);
-		if (file->dirty)
-			return EOF;
-	}
 
-	return 0;
+	Assert(!file->dirty);
 }
 
 /*
@@ -642,6 +641,7 @@ BufFileFlush(BufFile *file)
  * Like fseek(), except that target position needs two values in order to
  * work when logical filesize exceeds maximum value representable by off_t.
  * We do not support relative seeks across more than that, however.
+ * I/O errors are reported by ereport().
  *
  * Result is 0 if OK, EOF if not.  Logical position is not moved if an
  * impossible seek is attempted.
@@ -699,8 +699,7 @@ BufFileSeek(BufFile *file, int fileno, off_t offset, int whence)
 		return 0;
 	}
 	/* Otherwise, must reposition buffer, so flush any dirty data */
-	if (BufFileFlush(file) != 0)
-		return EOF;
+	BufFileFlush(file);
 
 	/*
 	 * At this point and no sooner, check for seek past last segment. The
diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index 138da0c1b4..2aa747da1e 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -262,12 +262,12 @@ ltsWriteBlock(LogicalTapeSet *lts, long blocknum, void *buffer)
 	}
 
 	/* Write the requested block */
-	if (BufFileSeekBlock(lts->pfile, blocknum) != 0 ||
-		BufFileWrite(lts->pfile, buffer, BLCKSZ) != BLCKSZ)
+	if (BufFileSeekBlock(lts->pfile, blocknum) != 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not write block %ld of temporary file: %m",
+				 errmsg("could not seek to block %ld of temporary file",
 						blocknum)));
+	BufFileWrite(lts->pfile, buffer, BLCKSZ);
 
 	/* Update nBlocksWritten, if we extended the file */
 	if (blocknum == lts->nBlocksWritten)
@@ -283,12 +283,18 @@ ltsWriteBlock(LogicalTapeSet *lts, long blocknum, void *buffer)
 static void
 ltsReadBlock(LogicalTapeSet *lts, long blocknum, void *buffer)
 {
-	if (BufFileSeekBlock(lts->pfile, blocknum) != 0 ||
-		BufFileRead(lts->pfile, buffer, BLCKSZ) != BLCKSZ)
+	size_t		nread;
+
+	if (BufFileSeekBlock(lts->pfile, blocknum) != 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read block %ld of temporary file: %m",
+				 errmsg("could not seek to block %ld of temporary file",
 						blocknum)));
+	if ((nread = BufFileRead(lts->pfile, buffer, BLCKSZ)) != BLCKSZ)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read block %ld of temporary file: read only %zu of %zu bytes",
+						blocknum, nread, (size_t) BLCKSZ)));
 }
 
 /*
diff --git a/src/backend/utils/sort/sharedtuplestore.c b/src/backend/utils/sort/sharedtuplestore.c
index c3ab494a45..6537a4303b 100644
--- a/src/backend/utils/sort/sharedtuplestore.c
+++ b/src/backend/utils/sort/sharedtuplestore.c
@@ -196,14 +196,9 @@ static void
 sts_flush_chunk(SharedTuplestoreAccessor *accessor)
 {
 	size_t		size;
-	size_t		written;
 
 	size = STS_CHUNK_PAGES * BLCKSZ;
-	written = BufFileWrite(accessor->write_file, accessor->write_chunk, size);
-	if (written != size)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to temporary file: %m")));
+	BufFileWrite(accessor->write_file, accessor->write_chunk, size);
 	memset(accessor->write_chunk, 0, size);
 	accessor->write_pointer = &accessor->write_chunk->data[0];
 	accessor->sts->participants[accessor->participant].npages +=
@@ -555,6 +550,7 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
 		if (!eof)
 		{
 			SharedTuplestoreChunk chunk_header;
+			size_t		nread;
 
 			/* Make sure we have the file open. */
 			if (accessor->read_file == NULL)
@@ -570,14 +566,15 @@ sts_parallel_scan_next(SharedTuplestoreAccessor *accessor, void *meta_data)
 			if (BufFileSeekBlock(accessor->read_file, read_page) != 0)
 				ereport(ERROR,
 						(errcode_for_file_access(),
-						 errmsg("could not read from shared tuplestore temporary file"),
-						 errdetail_internal("Could not seek to next block.")));
-			if (BufFileRead(accessor->read_file, &chunk_header,
-							STS_CHUNK_HEADER_SIZE) != STS_CHUNK_HEADER_SIZE)
+						 errmsg("could not seek block %u in shared tuplestore temporary file",
+								read_page)));
+			nread = BufFileRead(accessor->read_file, &chunk_header,
+								STS_CHUNK_HEADER_SIZE);
+			if (nread != STS_CHUNK_HEADER_SIZE)
 				ereport(ERROR,
 						(errcode_for_file_access(),
-						 errmsg("could not read from shared tuplestore temporary file"),
-						 errdetail_internal("Short read while reading chunk header.")));
+						 errmsg("could not read from shared tuplestore temporary file: read only %zu of %zu bytes",
+								nread, STS_CHUNK_HEADER_SIZE)));
 
 			/*
 			 * If this is an overflow chunk, we skip it and any following
diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c
index ebb1da0746..452a85a423 100644
--- a/src/backend/utils/sort/tuplestore.c
+++ b/src/backend/utils/sort/tuplestore.c
@@ -515,7 +515,7 @@ tuplestore_select_read_pointer(Tuplestorestate *state, int ptr)
 								SEEK_SET) != 0)
 					ereport(ERROR,
 							(errcode_for_file_access(),
-							 errmsg("could not seek in tuplestore temporary file: %m")));
+							 errmsg("could not seek in tuplestore temporary file")));
 			}
 			else
 			{
@@ -525,7 +525,7 @@ tuplestore_select_read_pointer(Tuplestorestate *state, int ptr)
 								SEEK_SET) != 0)
 					ereport(ERROR,
 							(errcode_for_file_access(),
-							 errmsg("could not seek in tuplestore temporary file: %m")));
+							 errmsg("could not seek in tuplestore temporary file")));
 			}
 			break;
 		default:
@@ -866,7 +866,7 @@ tuplestore_puttuple_common(Tuplestorestate *state, void *tuple)
 							SEEK_SET) != 0)
 				ereport(ERROR,
 						(errcode_for_file_access(),
-						 errmsg("could not seek in tuplestore temporary file: %m")));
+						 errmsg("could not seek in tuplestore temporary file")));
 			state->status = TSS_WRITEFILE;
 
 			/*
@@ -970,7 +970,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 								SEEK_SET) != 0)
 					ereport(ERROR,
 							(errcode_for_file_access(),
-							 errmsg("could not seek in tuplestore temporary file: %m")));
+							 errmsg("could not seek in tuplestore temporary file")));
 			state->status = TSS_READFILE;
 			/* FALLTHROUGH */
 
@@ -1034,7 +1034,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 									SEEK_CUR) != 0)
 						ereport(ERROR,
 								(errcode_for_file_access(),
-								 errmsg("could not seek in tuplestore temporary file: %m")));
+								 errmsg("could not seek in tuplestore temporary file")));
 					Assert(!state->truncated);
 					return NULL;
 				}
@@ -1051,7 +1051,7 @@ tuplestore_gettuple(Tuplestorestate *state, bool forward,
 							SEEK_CUR) != 0)
 				ereport(ERROR,
 						(errcode_for_file_access(),
-						 errmsg("could not seek in tuplestore temporary file: %m")));
+						 errmsg("could not seek in tuplestore temporary file")));
 			tup = READTUP(state, tuplen);
 			return tup;
 
@@ -1253,7 +1253,7 @@ tuplestore_rescan(Tuplestorestate *state)
 			if (BufFileSeek(state->myfile, 0, 0L, SEEK_SET) != 0)
 				ereport(ERROR,
 						(errcode_for_file_access(),
-						 errmsg("could not seek in tuplestore temporary file: %m")));
+						 errmsg("could not seek in tuplestore temporary file")));
 			break;
 		default:
 			elog(ERROR, "invalid tuplestore state");
@@ -1318,7 +1318,7 @@ tuplestore_copy_read_pointer(Tuplestorestate *state,
 									SEEK_SET) != 0)
 						ereport(ERROR,
 								(errcode_for_file_access(),
-								 errmsg("could not seek in tuplestore temporary file: %m")));
+								 errmsg("could not seek in tuplestore temporary file")));
 				}
 				else
 				{
@@ -1327,7 +1327,7 @@ tuplestore_copy_read_pointer(Tuplestorestate *state,
 									SEEK_SET) != 0)
 						ereport(ERROR,
 								(errcode_for_file_access(),
-								 errmsg("could not seek in tuplestore temporary file: %m")));
+								 errmsg("could not seek in tuplestore temporary file")));
 				}
 			}
 			else if (srcptr == state->activeptr)
@@ -1474,7 +1474,8 @@ getlen(Tuplestorestate *state, bool eofOK)
 	if (nbytes != 0 || !eofOK)
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read from tuplestore temporary file: %m")));
+				 errmsg("could not read from tuplestore temporary file: read only %zu of %zu bytes",
+						nbytes, sizeof(len))));
 	return 0;
 }
 
@@ -1511,22 +1512,10 @@ writetup_heap(Tuplestorestate *state, void *tup)
 	/* total on-disk footprint: */
 	unsigned int tuplen = tupbodylen + sizeof(int);
 
-	if (BufFileWrite(state->myfile, (void *) &tuplen,
-					 sizeof(tuplen)) != sizeof(tuplen))
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to tuplestore temporary file: %m")));
-	if (BufFileWrite(state->myfile, (void *) tupbody,
-					 tupbodylen) != (size_t) tupbodylen)
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not write to tuplestore temporary file: %m")));
+	BufFileWrite(state->myfile, (void *) &tuplen, sizeof(tuplen));
+	BufFileWrite(state->myfile, (void *) tupbody, tupbodylen);
 	if (state->backward)		/* need trailing length word? */
-		if (BufFileWrite(state->myfile, (void *) &tuplen,
-						 sizeof(tuplen)) != sizeof(tuplen))
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not write to tuplestore temporary file: %m")));
+		BufFileWrite(state->myfile, (void *) &tuplen, sizeof(tuplen));
 
 	FREEMEM(state, GetMemoryChunkSpace(tuple));
 	heap_free_minimal_tuple(tuple);
@@ -1539,20 +1528,25 @@ readtup_heap(Tuplestorestate *state, unsigned int len)
 	unsigned int tuplen = tupbodylen + MINIMAL_TUPLE_DATA_OFFSET;
 	MinimalTuple tuple = (MinimalTuple) palloc(tuplen);
 	char	   *tupbody = (char *) tuple + MINIMAL_TUPLE_DATA_OFFSET;
+	size_t		nread;
 
 	USEMEM(state, GetMemoryChunkSpace(tuple));
 	/* read in the tuple proper */
 	tuple->t_len = tuplen;
-	if (BufFileRead(state->myfile, (void *) tupbody,
-					tupbodylen) != (size_t) tupbodylen)
+	nread = BufFileRead(state->myfile, (void *) tupbody, tupbodylen);
+	if (nread != (size_t) tupbodylen)
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not read from tuplestore temporary file: %m")));
+				 errmsg("could not read from tuplestore temporary file: read only %zu of %zu bytes",
+						nread, (size_t) tupbodylen)));
 	if (state->backward)		/* need trailing length word? */
-		if (BufFileRead(state->myfile, (void *) &tuplen,
-						sizeof(tuplen)) != sizeof(tuplen))
+	{
+		nread = BufFileRead(state->myfile, (void *) &tuplen, sizeof(tuplen));
+		if (nread != sizeof(tuplen))
 			ereport(ERROR,
 					(errcode_for_file_access(),
-					 errmsg("could not read from tuplestore temporary file: %m")));
+					 errmsg("could not read from tuplestore temporary file: read only %zu of %zu bytes",
+							nread, sizeof(tuplen))));
+	}
 	return (void *) tuple;
 }
-- 
2.20.1

#26Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#25)
Re: BufFileRead() error signalling

On Mon, Jun 08, 2020 at 05:50:44PM +1200, Thomas Munro wrote:

On Fri, Jun 5, 2020 at 8:14 PM Michael Paquier <michael@paquier.xyz> wrote:\

Anyway, why are we sure that it is fine to not complain even if
BufFileWrite() does a partial write? fwrite() is mentioned at the
top
of the thread, but why is that OK?

It's not OK. If any system call fails, we'll now ereport()
immediately. Now there can't be unhandled or unreported errors, and
it's no longer possible for the caller to confuse EOF with errors.
Hence the change in descriptions:

Oh, OK. I looked at that again this morning and I see your point now.
I was wondering if it could be possible to have BufFileWrite() write
less data than what is expected with errno=0. The code of HEAD would
issue a confusing error message like "could not write: Success" in
such a case, still it would fail on ERROR. And I thought that your
patch would do a different thing and would cause this code path to not
fail in such a case, but the point I missed on the first read of your
patch is that BufFileWrite() is written is such a way that we would
actually just keep looping until the amount of data to write is
written, meaning that we should never see anymore the case where
BufFileWrite() returns a size that does not match with the expected
size to write.

On Tue, Jun 09, 2020 at 12:21:53PM +1200, Thomas Munro wrote:

On Tue, Jun 9, 2020 at 2:49 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I think using our standard "exception" mechanism makes sense. As for
additional context, I think usefulness of the error messages would be
improved by showing the file path (because then user knows which
filesystem/tablespace was full, for example), but IMO any additional
context on top of that is of marginal additional benefit. If we really
cared, we could have errcontext() callbacks in the sites of interest,
but that would be a separate patch and perhaps not backpatchable.

Cool. It does show the path, so that'll tell you which file system is
full or broken.

There are some places in logtape.c, *tuplestore.c and gist where there
is no file path. That would be nice to have, but that's not really
the problem of this patch.

I thought a bit about the ENOSPC thing, and took that change out.
Since commit 1173344e we handle write() returning a positive number
less than the full size by predicting that a follow-up call to write()
would surely return ENOSPC, without the hassle of trying to write
more, or having a separate error message sans %m. But
BufFileDumpBuffer() does try again, and only raises an error if the
system call returns < 0 (well, it says <= 0, but 0 is impossible
according to POSIX, at least assuming you didn't try to write zero
bytes, and we already exclude that). So ENOSPC-prediction is
unnecessary here.

+1. Makes sense.

The wording we use is "could not seek TO block N". (Or used to use,
before switching to pread/pwrite in most places, it seems).

Fixed in a couple of places.

Looks fine to me. Are you planning to send an extra patch to switch
BufFileWrite() to void for 14~?
--
Michael

#27Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#26)
Re: BufFileRead() error signalling

On Tue, Jun 9, 2020 at 2:32 PM Michael Paquier <michael@paquier.xyz> wrote:

Looks fine to me. Are you planning to send an extra patch to switch
BufFileWrite() to void for 14~?

Thanks! Pushed. I went ahead and made it void in master only.

#28Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#27)
Re: BufFileRead() error signalling

On Tue, Jun 16, 2020 at 05:41:31PM +1200, Thomas Munro wrote:

Thanks! Pushed. I went ahead and made it void in master only.

Thanks.
--
Michael