pread, pwrite, etc return ssize_t not int
Coverity whinged this morning about the following bit in
the new pg_combinebackup code:
644 unsigned rb;
645
646 /* Read the block from the correct source, except if dry-run. */
647 rb = pg_pread(s->fd, buffer, BLCKSZ, offsetmap[i]);
648 if (rb != BLCKSZ)
649 {
CID 1559912: Control flow issues (NO_EFFECT)
This less-than-zero comparison of an unsigned value is never true. "rb < 0U".
650 if (rb < 0)
651 pg_fatal("could not read file \"%s\": %m", s->filename);
It's dead right to complain of course. (I kind of think that the
majority of places where reconstruct.c is using "unsigned" variables
are poorly-thought-through; many of them look like they should be
size_t, and I suspect some other ones beside this one are flat wrong
or at least unnecessarily fragile. But I digress.)
While looking around for other places that might've made comparable
mistakes, I noted that we have places that are storing the result of
pg_pread[v]/pg_pwrite[v] into an "int" variable even though they are
passing a size_t count argument that there is no obvious reason to
believe must fit in int. This seems like trouble waiting to happen,
so I fixed some of these in the attached. The major remaining place
that I think we ought to change is the newly-minted
FileRead[V]/FileWrite[V] functions, which are declared to return int
but really should be returning ssize_t IMO. I didn't do that here
though.
We could go further by insisting that *all* uses of pg_pread/pg_pwrite
use ssize_t result variables. I think that's probably overkill --- in
the example above, which is only asking to write BLCKSZ worth of data,
surely an int is sufficient. But you could argue that allowing this
pattern at all creates risk of copy/paste errors.
Of course the real elephant in the room is that plain old read(2)
and write(2) also return ssize_t. I've not attempted to vet every
call of those, and I think it'd likely be a waste of effort, as
we're unlikely to ever try to shove more than INT_MAX worth of
data through them. But it's a bit harder to make that argument
for the iovec-based file APIs. I think we ought to try to keep
our uses of those functions clean on this point.
Thoughts?
regards, tom lane
Attachments:
v1-fix-wrong-datatype-for-pread-and-pwrite.patchtext/x-diff; charset=us-ascii; name=v1-fix-wrong-datatype-for-pread-and-pwrite.patchDownload+15-15
On Mon, Dec 25, 2023 at 7:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Coverity whinged this morning about the following bit in
the new pg_combinebackup code:644 unsigned rb;
645
646 /* Read the block from the correct source, except if dry-run. */
647 rb = pg_pread(s->fd, buffer, BLCKSZ, offsetmap[i]);
648 if (rb != BLCKSZ)
649 {CID 1559912: Control flow issues (NO_EFFECT)
This less-than-zero comparison of an unsigned value is never true. "rb < 0U".650 if (rb < 0)
651 pg_fatal("could not read file \"%s\": %m", s->filename);It's dead right to complain of course. (I kind of think that the
majority of places where reconstruct.c is using "unsigned" variables
are poorly-thought-through; many of them look like they should be
size_t, and I suspect some other ones beside this one are flat wrong
or at least unnecessarily fragile. But I digress.)
Yeah.
While looking around for other places that might've made comparable
mistakes, I noted that we have places that are storing the result of
pg_pread[v]/pg_pwrite[v] into an "int" variable even though they are
passing a size_t count argument that there is no obvious reason to
believe must fit in int. This seems like trouble waiting to happen,
so I fixed some of these in the attached. The major remaining place
that I think we ought to change is the newly-minted
FileRead[V]/FileWrite[V] functions, which are declared to return int
but really should be returning ssize_t IMO. I didn't do that here
though.
Agreed in theory. Note that we've only been using size_t in fd.c
functions since:
commit 2d4f1ba6cfc2f0a977f1c30bda9848041343e248
Author: Peter Eisentraut <peter@eisentraut.org>
Date: Thu Dec 8 08:51:38 2022 +0100
Update types in File API
Make the argument types of the File API match stdio better:
- Change the data buffer to void *, from char *.
- Change FileWrite() data buffer to const on top of that.
- Change amounts to size_t, from int.
I guess it was an oversight not to change the return type to match at
the same time. That said, I think that would only be for tidiness.
Some assorted observations:
1. We don't yet require "large file" support, meaning that we use
off_t our fd.c and lseek()/p*() replacement functions, but we know it
is only 32 bits on Windows, and we avoid creating large files. I
think that means that a hypothetical very large write would break that
assumption, creating data whose position cannot be named in those
calls. We could fix that on Windows by adjusting our wrappers, either
to work with pgoff_t instead off off_t (yuck) or redefining off_t
(yuck), but given that both options are gross, so far we have imagined
that we should move towards using large files only conditionally, on
sizeof(off_t) >= 8.
2. Windows' native read() and write() functions still have prototypes
like 1988 POSIX, eg int read(int filedes, void *buf, unsigned int
nbyte). It was 2001 POSIX that changed them to ssize_t read(int
filedesc, void *buf, size_t nbytes). I doubt there is much that can
be done about that, except perhaps adding a wrapper that caps it.
Seems like overkill for a hypothetical sort of a problem...
3. Windows' native ReadFile() and WriteFile() functions, which our
'positionified' pg_p*() wrapper functions use, work in terms of DWORD
= unsigned long which is 32 bit on that cursed ABI. Our wrappers
should probably cap.
I think a number of Unixoid systems implemented the POSIX interface
change by capping internally, which doesn't matter much in practice
because no one really tries to transfer gigabytes at once, and any
non-trivial transfer size probably requires handling short transfers.
For example, man read on Linux:
On Linux, read() (and similar system calls) will transfer at most
0x7ffff000 (2,147,479,552) bytes, returning the number of bytes
actually transferred. (This is true on both 32-bit and 64-bit
systems.)
We could go further by insisting that *all* uses of pg_pread/pg_pwrite
use ssize_t result variables. I think that's probably overkill --- in
the example above, which is only asking to write BLCKSZ worth of data,
surely an int is sufficient. But you could argue that allowing this
pattern at all creates risk of copy/paste errors.
Yeah.
Of course the real elephant in the room is that plain old read(2)
and write(2) also return ssize_t. I've not attempted to vet every
call of those, and I think it'd likely be a waste of effort, as
we're unlikely to ever try to shove more than INT_MAX worth of
data through them. But it's a bit harder to make that argument
for the iovec-based file APIs. I think we ought to try to keep
our uses of those functions clean on this point.
Yeah I think it's OK for a caller that knows it's passing in an int
value to (implicitly) cast the return to int. But it'd be nice to
make our I/O functions look and feel like standard functions and
return ssize_t.
Patches attached.
PS Correction to my earlier statement about POSIX: the traditional K&R
interfaces were indeed in the original POSIX.1 1988 but it was the
1990 edition (approximately coinciding with standard C) that adopted
void, size_t, const and invented ssize_t.
Attachments:
0001-Return-ssize_t-in-fd.c-I-O-functions.patchapplication/octet-stream; name=0001-Return-ssize_t-in-fd.c-I-O-functions.patchDownload+8-9
0002-Fix-theoretical-overflow-in-Windows-pg_pread-pg_pwri.patchapplication/octet-stream; name=0002-Fix-theoretical-overflow-in-Windows-pg_pread-pg_pwri.patchDownload+6-1
On 27.02.24 12:21, Thomas Munro wrote:
Patches attached.
PS Correction to my earlier statement about POSIX: the traditional K&R
interfaces were indeed in the original POSIX.1 1988 but it was the
1990 edition (approximately coinciding with standard C) that adopted
void, size_t, const and invented ssize_t.
0001-Return-ssize_t-in-fd.c-I-O-functions.patch
This patch looks correct to me.
0002-Fix-theoretical-overflow-in-Windows-pg_pread-pg_pwri.patch
I have two comments on that:
For the overflow of the input length (size_t -> DWORD), I don't think we
actually need to do anything. The size argument would be truncated, but
the callers would just repeat the calls with the remaining size, so in
effect they will read the data in chunks of rest + N * DWORD_MAX. The
patch just changes this to chunks of N * 1GB + rest.
The other issue, the possible overflow of size_t -> ssize_t is not
specific to Windows. We could install some protection against that on
some other layer, but it's unclear how widespread that issue is or what
the appropriate fix is. POSIX says that passing in a size larger than
SSIZE_MAX has implementation-defined effect. The FreeBSD man page says
that this will result in an EINVAL error. So if we here truncate
instead of error, we'd introduce a divergence.
On Sat, Mar 2, 2024 at 3:12 AM Peter Eisentraut <peter@eisentraut.org> wrote:
0001-Return-ssize_t-in-fd.c-I-O-functions.patch
This patch looks correct to me.
Thanks, I'll push this one.
0002-Fix-theoretical-overflow-in-Windows-pg_pread-pg_pwri.patch
I have two comments on that:
For the overflow of the input length (size_t -> DWORD), I don't think we
actually need to do anything. The size argument would be truncated, but
the callers would just repeat the calls with the remaining size, so in
effect they will read the data in chunks of rest + N * DWORD_MAX. The
patch just changes this to chunks of N * 1GB + rest.
But implicit conversion size_t -> DWORD doesn't convert large numbers
to DWORD_MAX, it just cuts off the high bits, and that might leave you
with zero. Zero has a special meaning (if we assume that kernel
doesn't reject a zero size argument outright, I dunno): if returned by
reads it indicates EOF, and if returned by writes a typical caller
would either loop forever making no progress or (in some of our code)
conjure up a fake ENOSPC. Hence desire to impose a cap.
I'm on the fence about whether it's worth wasting any more energy on
this, I mean we aren't really going to read/write 4GB, so I'd be OK if
we just left this as an observation in the archives...
The other issue, the possible overflow of size_t -> ssize_t is not
specific to Windows. We could install some protection against that on
some other layer, but it's unclear how widespread that issue is or what
the appropriate fix is. POSIX says that passing in a size larger than
SSIZE_MAX has implementation-defined effect. The FreeBSD man page says
that this will result in an EINVAL error. So if we here truncate
instead of error, we'd introduce a divergence.
Yeah, right, that's the caller's job to worry about on all platforms
so I was wrong to mention ssize_t in the comment.
On 01.03.24 22:23, Thomas Munro wrote:
For the overflow of the input length (size_t -> DWORD), I don't think we
actually need to do anything. The size argument would be truncated, but
the callers would just repeat the calls with the remaining size, so in
effect they will read the data in chunks of rest + N * DWORD_MAX. The
patch just changes this to chunks of N * 1GB + rest.But implicit conversion size_t -> DWORD doesn't convert large numbers
to DWORD_MAX, it just cuts off the high bits, and that might leave you
with zero. Zero has a special meaning (if we assume that kernel
doesn't reject a zero size argument outright, I dunno): if returned by
reads it indicates EOF, and if returned by writes a typical caller
would either loop forever making no progress or (in some of our code)
conjure up a fake ENOSPC. Hence desire to impose a cap.
Right, my thinko. Your patch is correct then.