BufFileRead() error signalling

Started by Thomas Munroover 6 years ago28 messageshackers
Jump to latest
#1Thomas Munro
thomas.munro@gmail.com

Hi,

As noted by Amit Khandhekar yesterday[1]/messages/by-id/CAJ3gD9emnEys=R+T3OVt_87DuMpMfS4KvoRV6e=iSi5PT+9f3w@mail.gmail.com, BufFileLoad() silently eats
pread()'s error and makes them indistinguishable from EOF.

Some observations:

1. BufFileRead() returns size_t (not ssize_t), so it's an
fread()-style interface, not a read()-style interface that could use
-1 to signal errors. Unlike fread(), it doesn't seem to have anything
corresponding to feof()/ferror()/clearerr() that lets the caller
distinguish between EOF and error, so our hash and tuplestore spill
code simply trusts that if there is a 0 size read where it expects a
tuple boundary, it must be EOF.

2. BufFileWrite() is the same, but just like fwrite(), a short write
must always mean error, so there is no problem here.

3. The calling code assumes that unexpected short read or write sets
errno, which isn't the documented behaviour of fwrite() and fread(),
so there we're doing something a bit different (which is fine, just
pointing it out; we're sort of somewhere between the <stdio.h> and
<unistd.h> functions, in terms of error reporting).

I think the choices are: (1) switch to ssize_t and return -1, (2) add
at least one of BufFileEof(), BufFileError(), (3) have BufFileRead()
raise errors with elog(). I lean towards (2), and I doubt we need
BufFileClear() because the only thing we ever do in client code on
error is immediately burn the world down.

If we simply added an error flag to track if FileRead() had ever
signalled an error, we could change nodeHashjoin.c to do something
along these lines:

-    if (nread == 0)             /* end of file */
+    if (!BufFileError(file) && nread == 0)             /* end of file */

... and likewise for tuplestore.c:

-    if (nbytes != 0 || !eofOK)
+    if (BufFileError(file) || (nbytes == 0 && !eofOK))
         ereport(ERROR,

About the only advantage to the current design I can see if that you
can probably make your query finish faster by pulling out your temp
tablespace USB stick at the right time. Or am I missing some
complication?

[1]: /messages/by-id/CAJ3gD9emnEys=R+T3OVt_87DuMpMfS4KvoRV6e=iSi5PT+9f3w@mail.gmail.com

#2Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#1)
Re: BufFileRead() error signalling

On Thu, Nov 21, 2019 at 10:50 AM Thomas Munro <thomas.munro@gmail.com> wrote:

As noted by Amit Khandhekar yesterday[1], BufFileLoad() silently eats

Erm, Khandekar, sorry for the extra h!

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#1)
Re: BufFileRead() error signalling

Thomas Munro <thomas.munro@gmail.com> writes:

As noted by Amit Khandhekar yesterday[1], BufFileLoad() silently eats
pread()'s error and makes them indistinguishable from EOF.

That's definitely bad.

I think the choices are: (1) switch to ssize_t and return -1, (2) add
at least one of BufFileEof(), BufFileError(), (3) have BufFileRead()
raise errors with elog(). I lean towards (2), and I doubt we need
BufFileClear() because the only thing we ever do in client code on
error is immediately burn the world down.

I'd vote for (3), I think. Making the callers responsible for error
checks just leaves us with a permanent hazard of errors-of-omission,
and as you say, there's really no use-case where we'd be trying to
recover from the error.

I think that the motivation for making the caller do it might've
been an idea that the caller could provide a more useful error
message, but I'm not real sure that that's true --- the caller
doesn't know the physical file's name, and it doesn't necessarily
have the right errno either.

Possibly we could address any loss of usefulness by requiring callers
to pass some sort of context identification to BufFileCreate?

regards, tom lane

#4Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#3)
Re: BufFileRead() error signalling

On Thu, Nov 21, 2019 at 11:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thomas Munro <thomas.munro@gmail.com> writes:

I think the choices are: (1) switch to ssize_t and return -1, (2) add
at least one of BufFileEof(), BufFileError(), (3) have BufFileRead()
raise errors with elog(). I lean towards (2), and I doubt we need
BufFileClear() because the only thing we ever do in client code on
error is immediately burn the world down.

I'd vote for (3), I think. Making the callers responsible for error
checks just leaves us with a permanent hazard of errors-of-omission,
and as you say, there's really no use-case where we'd be trying to
recover from the error.

Ok. Here is a first attempt at that. I also noticed that some
callers of BufFileFlush() eat or disguise I/O errors too, so here's a
patch for that, though I'm a little confused about the exact meaning
of EOF from BufFileSeek().

I think that the motivation for making the caller do it might've
been an idea that the caller could provide a more useful error
message, but I'm not real sure that that's true --- the caller
doesn't know the physical file's name, and it doesn't necessarily
have the right errno either.

Yeah, the errno is undefined right now since we don't know if there
was an error.

Possibly we could address any loss of usefulness by requiring callers
to pass some sort of context identification to BufFileCreate?

Hmm. It's an idea. While thinking about the cohesion of this
module's API, I thought it seemed pretty strange to have
BufFileWrite() using a different style of error reporting, so here's
an experimental 0003 patch to make it consistent. I realise that an
API change might affect extensions, so I'm not sure if it's a good
idea even for master (obviously it's not back-patchable). You could
be right that more context would be good at least in the case of
ENOSPC: knowing that (say) a hash join or a sort or CTE is implicated
would be helpful.

Attachments:

0001-Raise-errors-for-I-O-errors-during-BufFileRead.patchapplication/octet-stream; name=0001-Raise-errors-for-I-O-errors-during-BufFileRead.patchDownload+16-9
0002-Report-I-O-errors-from-BufFileFlush-via-ereport.patchapplication/octet-stream; name=0002-Report-I-O-errors-from-BufFileFlush-via-ereport.patchDownload+8-11
0003-Align-BufFileWrite-s-error-reporting-with-BufFileRea.patchapplication/octet-stream; name=0003-Align-BufFileWrite-s-error-reporting-with-BufFileRea.patchDownload+21-64
#5Ibrar Ahmed
ibrar.ahmad@gmail.com
In reply to: Thomas Munro (#4)
Re: BufFileRead() error signalling

You are checking file->dirty twice, first before calling the function and within the function too. Same for the Assert. For example.
size_t
BufFileRead(BufFile *file, void *ptr, size_t size)
{  
     size_t      nread = 0;
     size_t      nthistime;
     if (file->dirty)
     {  
         BufFileFlush(file);
         Assert(!file->dirty);
     }
static void
 BufFileFlush(BufFile *file)
 {
     if (file->dirty)
         BufFileDumpBuffer(file);
     Assert(!file->dirty);

The new status of this patch is: Waiting on Author

#6Thomas Munro
thomas.munro@gmail.com
In reply to: Ibrar Ahmed (#5)
Re: BufFileRead() error signalling

On Wed, Dec 11, 2019 at 2:07 AM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:

You are checking file->dirty twice, first before calling the function and within the function too. Same for the Assert. For example.

True. Thanks for the review. Before I post a new version, any
opinions on whether to back-patch, and whether to do 0003 in master
only, or at all?

#7Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#6)
Re: BufFileRead() error signalling

On Fri, Jan 24, 2020 at 11:12 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Wed, Dec 11, 2019 at 2:07 AM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:

You are checking file->dirty twice, first before calling the function and within the function too. Same for the Assert. For example.

True. Thanks for the review. Before I post a new version, any
opinions on whether to back-patch, and whether to do 0003 in master
only, or at all?

Rather than answering your actual question, I'd like to complain about this:

  if (BufFileRead(file, ptr, BLCKSZ) != BLCKSZ)
- elog(ERROR, "could not read temporary file: %m");
+ elog(ERROR, "could not read temporary file");

I recognize that your commit message explains this change by saying
that this code will now never be reached except as a result of a short
read, but I don't think that will necessarily be clear to future
readers of those code, or people who get the error message. It seems
like if we're going to do do this, the error messages ought to be
changed to complain about a short read rather than an inability to
read for unspecified reasons. However, I wonder why we don't make
BufFileRead() throw all of the errors including complaining about
short reads. I would like to confess my undying (and probably
unrequited) love for the following code from md.c:

errmsg("could not read block
%u in file \"%s\": read only %d of %d bytes",

Now that is an error message! I am not confused! I don't know why that
happened, but I sure know what happened!

I think we should aim for that kind of style everywhere, so that
complaints about reading and writing files are typically reported as
either of these:

could not read file "%s": %m
could not read file "%s": read only %d of %d bytes

There is an existing precedent in twophase.c which works almost this way:

if (r != stat.st_size)
{
if (r < 0)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read file
\"%s\": %m", path)));
else
ereport(ERROR,
(errmsg("could not read file
\"%s\": read %d of %zu",
path, r,
(Size) stat.st_size)));
}

I'd advocate for a couple more words in the latter message ("only" and
"bytes") but it's still excellent.

OK, now that I've waxed eloquent on that topic, let me have a go at
your actual questions. Regarding back-patching, I don't mind
back-patching error handling patches like this, but I don't feel it's
necessary if we have no evidence that data is actually getting
corrupted as a result of the problem and the chances of it actually
happening seems remote. It's worth keeping in mind that changes to
message strings will tend to degrade translatability unless the new
strings are copies of existing strings. Regarding 0003, it seems good
to me to make BufFileRead() and BufFileWrite() consistent in terms of
error-handling behavior, so +1 for the concept (but I haven't reviewed
the code).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#7)
Re: BufFileRead() error signalling

On Mon, Jan 27, 2020 at 10:09:30AM -0500, Robert Haas wrote:

I recognize that your commit message explains this change by saying
that this code will now never be reached except as a result of a short
read, but I don't think that will necessarily be clear to future
readers of those code, or people who get the error message. It seems
like if we're going to do do this, the error messages ought to be
changed to complain about a short read rather than an inability to
read for unspecified reasons. However, I wonder why we don't make
BufFileRead() throw all of the errors including complaining about
short reads. I would like to confess my undying (and probably
unrequited) love for the following code from md.c:

errmsg("could not read block
%u in file \"%s\": read only %d of %d bytes",

Now that is an error message! I am not confused! I don't know why that
happened, but I sure know what happened!

I was briefly looking at 0001, and count -1 from me for the
formulation of the error messages used in those patches.

I think we should aim for that kind of style everywhere, so that
complaints about reading and writing files are typically reported as
either of these:

could not read file "%s": %m
could not read file "%s": read only %d of %d bytes

That's actually not the best fit, because this does not take care of
the pluralization of the second message if you have only 1 byte to
read ;)

A second point to take into account is that the unification of error
messages makes for less translation work, which is always welcome.
Those points have been discussed on this thread:
/messages/by-id/20180520000522.GB1603@paquier.xyz

The related commit is 811b6e3, and the pattern we agreed on for a
partial read was:
"could not read file \"%s\": read %d of %zu"

Then, if the syscall had an error we'd fall down to that:
"could not read file \"%s\": %m"
--
Michael

#9Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#8)
Re: BufFileRead() error signalling

On Mon, Jan 27, 2020 at 9:03 PM Michael Paquier <michael@paquier.xyz> wrote:

That's actually not the best fit, because this does not take care of
the pluralization of the second message if you have only 1 byte to
read ;)

But ... if you have only one byte to read, you cannot have a short read.

A second point to take into account is that the unification of error
messages makes for less translation work, which is always welcome.
Those points have been discussed on this thread:
/messages/by-id/20180520000522.GB1603@paquier.xyz

I quickly reread that thread and I don't see that there's any firm
consensus there in favor of "read %d of %zu" over "read only %d of %zu
bytes". Now, if most people prefer the former, so be it, but I don't
think that's clear from that thread.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#9)
Re: BufFileRead() error signalling

On Tue, Jan 28, 2020 at 03:51:54PM -0500, Robert Haas wrote:

I quickly reread that thread and I don't see that there's any firm
consensus there in favor of "read %d of %zu" over "read only %d of %zu
bytes". Now, if most people prefer the former, so be it, but I don't
think that's clear from that thread.

The argument of consistency falls in favor of the former on HEAD:
$ git grep "could not read" | grep "read %d of %zu" | wc -l
59
$ git grep "could not read" | grep "read only %d of %zu" | wc -l
0
--
Michael

#11Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#10)
Re: BufFileRead() error signalling

On Wed, Jan 29, 2020 at 1:26 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Jan 28, 2020 at 03:51:54PM -0500, Robert Haas wrote:

I quickly reread that thread and I don't see that there's any firm
consensus there in favor of "read %d of %zu" over "read only %d of %zu
bytes". Now, if most people prefer the former, so be it, but I don't
think that's clear from that thread.

The argument of consistency falls in favor of the former on HEAD:
$ git grep "could not read" | grep "read %d of %zu" | wc -l
59
$ git grep "could not read" | grep "read only %d of %zu" | wc -l
0

True. I didn't realize that 'read %d of %zu' was so widely used.

Your grep misses one instance of 'read only %d of %d bytes' because
you grepped for %zu specifically, but that doesn't really change the
overall picture.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#11)
Re: BufFileRead() error signalling

On Wed, Jan 29, 2020 at 10:01:31AM -0500, Robert Haas wrote:

Your grep misses one instance of 'read only %d of %d bytes' because
you grepped for %zu specifically, but that doesn't really change the
overall picture.

Yes, the one in pg_checksums.c. That could actually be changed with a
cast to Size. (Note that there is a second one related to writes but
there is a precedent in md.c, and a similar one in rewriteheap.c..)

Sorry for the digression.
--
Michael

#13Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#7)
Re: BufFileRead() error signalling

On Mon, Jan 27, 2020 at 7:09 AM Robert Haas <robertmhaas@gmail.com> wrote:

Rather than answering your actual question, I'd like to complain about
this:

if (BufFileRead(file, ptr, BLCKSZ) != BLCKSZ)
- elog(ERROR, "could not read temporary file: %m");
+ elog(ERROR, "could not read temporary file");

I recognize that your commit message explains this change by saying
that this code will now never be reached except as a result of a short
read, but I don't think that will necessarily be clear to future
readers of those code, or people who get the error message. It seems
like if we're going to do do this, the error messages ought to be
changed to complain about a short read rather than an inability to
read for unspecified reasons. However, I wonder why we don't make
BufFileRead() throw all of the errors including complaining about
short reads. I would like to confess my undying (and probably
unrequited) love for the following code from md.c:

errmsg("could not read block
%u in file \"%s\": read only %d of %d bytes",

It would be cool to have the block number in more cases in error
messages. For example, in sts_parallel_scan_next()

/* Seek and load the chunk header. */
if (BufFileSeekBlock(accessor->read_file, read_page) != 0)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read from shared tuplestore temporary
file"),
errdetail_internal("Could not seek to next block.")));

I'm actually in favor of having the block number in this error
message. I think it would be helpful for multi-batch parallel
hashjoin. If a worker reading one SharedTuplestoreChunk encounters an
error and another worker on a different SharedTuplstoreChunk of the
same file does not, I would find it useful to know more about which
block it was on when the error was encountered.

--
Melanie Plageman

#14David Steele
david@pgmasters.net
In reply to: Thomas Munro (#4)
Re: BufFileRead() error signalling

Hi Thomas,

On 11/29/19 9:46 PM, Thomas Munro wrote:

Ok. Here is a first attempt at that.

It's been a few CFs since this patch received an update, though there
has been plenty of discussion.

Perhaps it would be best to mark it RwF until you have a chance to
produce an update patch?

Regards,
--
-David
david@pgmasters.net

#15Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#10)
Re: BufFileRead() error signalling

On 2020-Jan-29, Michael Paquier wrote:

On Tue, Jan 28, 2020 at 03:51:54PM -0500, Robert Haas wrote:

I quickly reread that thread and I don't see that there's any firm
consensus there in favor of "read %d of %zu" over "read only %d of %zu
bytes". Now, if most people prefer the former, so be it, but I don't
think that's clear from that thread.

The argument of consistency falls in favor of the former on HEAD:
$ git grep "could not read" | grep "read %d of %zu" | wc -l
59
$ git grep "could not read" | grep "read only %d of %zu" | wc -l
0

In the discussion that led to 811b6e36a9e2 I did suggest to use "read
only M of N" instead, but there wasn't enough discussion on that fine
point so we settled on what you now call prevalent without a lot of
support specifically on that. I guess it was enough of an improvement
over what was there. But like Robert, I too prefer the wording that
includes "only" and "bytes" over the wording that doesn't.

I'll let it be known that from a translator's point of view, it's a
ten-seconds job to update a fuzzy string from not including "only" and
"bytes" to one that does. So let's not make that an argument for not
changing.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#16Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#7)
Re: BufFileRead() error signalling

On 2020-Jan-27, Robert Haas wrote:

OK, now that I've waxed eloquent on that topic, let me have a go at
your actual questions. Regarding back-patching, I don't mind
back-patching error handling patches like this, but I don't feel it's
necessary if we have no evidence that data is actually getting
corrupted as a result of the problem and the chances of it actually
happening seems remote.

I do have evidence of postgres crashes because of a problem that could
be explained by this bug, so I +1 backpatching this to all supported
branches.

(The problem I saw is a hash-join spilling data to temp tablespace,
which fills up but somehow goes undetected, then when reading the data
back it causes heap_fill_tuple to crash.)

Thomas, if you're no longer interested in seeing this done, please let
me know and I can see to it.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#17Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#16)
Re: BufFileRead() error signalling

On Wed, May 27, 2020 at 12:16 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I do have evidence of postgres crashes because of a problem that could
be explained by this bug, so I +1 backpatching this to all supported
branches.

(The problem I saw is a hash-join spilling data to temp tablespace,
which fills up but somehow goes undetected, then when reading the data
back it causes heap_fill_tuple to crash.)

FWIW, that seems like a plenty good enough reason for back-patching to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Thomas Munro
thomas.munro@gmail.com
In reply to: Alvaro Herrera (#16)
Re: BufFileRead() error signalling

On Thu, May 28, 2020 at 4:16 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Jan-27, Robert Haas wrote:

OK, now that I've waxed eloquent on that topic, let me have a go at
your actual questions. Regarding back-patching, I don't mind
back-patching error handling patches like this, but I don't feel it's
necessary if we have no evidence that data is actually getting
corrupted as a result of the problem and the chances of it actually
happening seems remote.

I do have evidence of postgres crashes because of a problem that could
be explained by this bug, so I +1 backpatching this to all supported
branches.

(The problem I saw is a hash-join spilling data to temp tablespace,
which fills up but somehow goes undetected, then when reading the data
back it causes heap_fill_tuple to crash.)

Ooh.

Thomas, if you're no longer interested in seeing this done, please let
me know and I can see to it.

My indecision on the back-patching question has been resolved by your
crash report and a search on codesearch.debian.org. BufFileRead() and
BufFileWrite() aren't referenced by any of the extensions they
package, so by that standard it's OK to change this stuff in back
branches. I'll post a rebased a patch with Robert and Ibrar's changes
for last reviews later today.

#19Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Thomas Munro (#18)
Re: BufFileRead() error signalling

On 2020-May-28, Thomas Munro wrote:

My indecision on the back-patching question has been resolved by your
crash report and a search on codesearch.debian.org.

Great news!

BufFileRead() and BufFileWrite() aren't referenced by any of the
extensions they package, so by that standard it's OK to change this
stuff in back branches.

This makes me a bit uncomfortable. For example,
https://inst.eecs.berkeley.edu/~cs186/fa03/hwk5/assign5.html (admittedly
a very old class) encourages students to use this API to create an
aggregate. It might not be the smartest thing in the world, but I'd
prefer not to break such things if they exist proprietarily. Can we
keep the API unchanged in stable branches and just ereport the errors?

I'll post a rebased a patch with Robert and Ibrar's changes
for last reviews later today.

... walks away wondering about BufFileSeekBlock's API ...

(BufFileSeek seems harder to change, due to tuplestore.c)

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#20Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#15)
Re: BufFileRead() error signalling

On Wed, May 27, 2020 at 11:59:59AM -0400, Alvaro Herrera wrote:

In the discussion that led to 811b6e36a9e2 I did suggest to use "read
only M of N" instead, but there wasn't enough discussion on that fine
point so we settled on what you now call prevalent without a lot of
support specifically on that. I guess it was enough of an improvement
over what was there. But like Robert, I too prefer the wording that
includes "only" and "bytes" over the wording that doesn't.

I'll let it be known that from a translator's point of view, it's a
ten-seconds job to update a fuzzy string from not including "only" and
"bytes" to one that does. So let's not make that an argument for not
changing.

Using "only" would be fine by me, though I tend to prefer the existing
one. Now I think that we should avoid "bytes" to not have to worry
about pluralization of error messages. This has been a concern in the
past (see e5d11b9 and the likes).
--
Michael

#21Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#20)
#22Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#21)
#23Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#22)
#24Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Thomas Munro (#23)
#25Thomas Munro
thomas.munro@gmail.com
In reply to: Alvaro Herrera (#24)
#26Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#25)
#27Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#26)
#28Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#27)