No error checking when reading from file using zstd in pg_dump

Started by Evgeniy Gorbanev9 months ago29 messages

gorbanyoves@basealt.ru

9 months ago

Hello.

In src/bin/pg_dump/compress_zstd.c, the Zstd_read function always
returns true. But if you look at the Zstd_gets and Zstd_getc functions,
where Zstd_read is called via CFH->read_func, it is expected that
the Zstd_read function can also return false. In case of
a read-from-file error, the process is expected to terminate, but
pg_dump will continue the process.
I assume that after checking
if (cnt == 0)
should be
return false;

Patch attached.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Reporter: Evgeniy Gorbanev (gorbanyoves@basealt.ru).

Organization: BaseALT (org@basealt.ru).

Best regards,
Evgeniy

Daniel Gustafsson

daniel@yesql.se

9 months ago

In reply to: Evgeniy Gorbanev (#1)

Re: No error checking when reading from file using zstd in pg_dump

On 16 Jun 2025, at 10:14, Evgeniy Gorbanev <gorbanyoves@basealt.ru> wrote:

In src/bin/pg_dump/compress_zstd.c, the Zstd_read function always
returns true. But if you look at the Zstd_gets and Zstd_getc functions,
where Zstd_read is called via CFH->read_func, it is expected that
the Zstd_read function can also return false. In case of
a read-from-file error, the process is expected to terminate, but
pg_dump will continue the process.
I assume that after checking
if (cnt == 0)
should be
return false;

 	if (cnt == 0)
-		break;
+		return false;

As cnt is coming from fread() returning false here would be wrong as you cannot
from 0 alone know if it's EOF or an error. Instead it needs to inspect the
stream with feof() and ferror() to know how to proceed.

--
Daniel Gustafsson

Evgeniy Gorbanev

gorbanyoves@basealt.ru

9 months ago

In reply to: Daniel Gustafsson (#2)

Re: No error checking when reading from file using zstd in pg_dump

16.06.2025 14:25, Daniel Gustafsson пишет:

On 16 Jun 2025, at 10:14, Evgeniy Gorbanev <gorbanyoves@basealt.ru> wrote:
In src/bin/pg_dump/compress_zstd.c, the Zstd_read function always
returns true. But if you look at the Zstd_gets and Zstd_getc functions,
where Zstd_read is called via CFH->read_func, it is expected that
the Zstd_read function can also return false. In case of
a read-from-file error, the process is expected to terminate, but
pg_dump will continue the process.
I assume that after checking
if (cnt == 0)
should be
return false;
if (cnt == 0)
-		break;
+		return false;
As cnt is coming from fread() returning false here would be wrong as you cannot
from 0 alone know if it's EOF or an error. Instead it needs to inspect the
stream with feof() and ferror() to know how to proceed.

--
Daniel Gustafsson

The feof() check is in the calling functions, e.g. in the Zstd_getc
function.

Regards, Evgeniy

Daniel Gustafsson

daniel@yesql.se

9 months ago

In reply to: Evgeniy Gorbanev (#3)

Re: No error checking when reading from file using zstd in pg_dump

On 16 Jun 2025, at 10:52, Evgeniy Gorbanev <gorbanyoves@basealt.ru> wrote:

16.06.2025 14:25, Daniel Gustafsson пишет:
On 16 Jun 2025, at 10:14, Evgeniy Gorbanev <gorbanyoves@basealt.ru> wrote:
In src/bin/pg_dump/compress_zstd.c, the Zstd_read function always
returns true. But if you look at the Zstd_gets and Zstd_getc functions,
where Zstd_read is called via CFH->read_func, it is expected that
the Zstd_read function can also return false. In case of
a read-from-file error, the process is expected to terminate, but
pg_dump will continue the process.
I assume that after checking
if (cnt == 0)
should be
return false;
if (cnt == 0)
- break;
+ return false;
As cnt is coming from fread() returning false here would be wrong as you cannot
from 0 alone know if it's EOF or an error. Instead it needs to inspect the
stream with feof() and ferror() to know how to proceed.
The feof() check is in the calling functions, e.g. in the Zstd_getc
function.

That function is using feof/ferror to log an appropriate error message, what
you are proposing is to consider all cases of EOF as an error. If you test
that you will see lots of test starting to fail.

--
Daniel Gustafsson

Evgeniy Gorbanev

gorbanyoves@basealt.ru

9 months ago

In reply to: Daniel Gustafsson (#4)

Re: No error checking when reading from file using zstd in pg_dump

16.06.2025 15:00, Daniel Gustafsson пишет:

On 16 Jun 2025, at 10:52, Evgeniy Gorbanev <gorbanyoves@basealt.ru> wrote:

16.06.2025 14:25, Daniel Gustafsson пишет:
On 16 Jun 2025, at 10:14, Evgeniy Gorbanev <gorbanyoves@basealt.ru> wrote:
In src/bin/pg_dump/compress_zstd.c, the Zstd_read function always
returns true. But if you look at the Zstd_gets and Zstd_getc functions,
where Zstd_read is called via CFH->read_func, it is expected that
the Zstd_read function can also return false. In case of
a read-from-file error, the process is expected to terminate, but
pg_dump will continue the process.
I assume that after checking
if (cnt == 0)
should be
return false;
if (cnt == 0)
- break;
+ return false;
As cnt is coming from fread() returning false here would be wrong as you cannot
from 0 alone know if it's EOF or an error. Instead it needs to inspect the
stream with feof() and ferror() to know how to proceed.
The feof() check is in the calling functions, e.g. in the Zstd_getc
function.
That function is using feof/ferror to log an appropriate error message, what
you are proposing is to consider all cases of EOF as an error. If you test
that you will see lots of test starting to fail.

--
Daniel Gustafsson

I ran tests for pg_dump and they all passed. Logs attached.
Please check.

Daniel Gustafsson

daniel@yesql.se

9 months ago

In reply to: Evgeniy Gorbanev (#1)

Re: No error checking when reading from file using zstd in pg_dump

On 16 Jun 2025, at 11:26, Evgeniy Gorbanev <gorbanyoves@basealt.ru> wrote:

I ran tests for pg_dump and they all passed. Logs attached.

Files=7, Tests=11918, 35 wallclock secs ( 0.59 usr 0.07 sys + 7.92 cusr 4.32 csys = 12.90 CPU)

That seems like a low number of tests executed, with ZStd enabled I see over
13200 tests in the log. Are you sure you are building with ZStd compression
enabled?

--
Daniel Gustafsson

Import Notes

Reply to msg id not found: b653a2f4-1dcb-45ba-b899-fc6aa80794f2@basealt.ru

Evgeniy Gorbanev

gorbanyoves@basealt.ru

9 months ago

In reply to: Daniel Gustafsson (#6)

Re: No error checking when reading from file using zstd in pg_dump

16.06.2025 15:43, Daniel Gustafsson пишет:

On 16 Jun 2025, at 11:26, Evgeniy Gorbanev <gorbanyoves@basealt.ru> wrote:
I ran tests for pg_dump and they all passed. Logs attached.

Files=7, Tests=11918, 35 wallclock secs ( 0.59 usr 0.07 sys + 7.92 cusr 4.32 csys = 12.90 CPU)

That seems like a low number of tests executed, with ZStd enabled I see over
13200 tests in the log. Are you sure you are building with ZStd compression
enabled?

--
Daniel Gustafsson

Yes, you are right. Now I see the errors.

Tom Lane

tgl@sss.pgh.pa.us

9 months ago

In reply to: Evgeniy Gorbanev (#7)

Re: No error checking when reading from file using zstd in pg_dump

I think the real problem here is that what the code is doing is
precisely not what is documented in compress_io.h:

/*
* Read 'size' bytes of data from the file and store them into 'ptr'.
* Optionally it will store the number of bytes read in 'rsize'.
*
* Returns true on success and throws an internal error otherwise.
*/
bool (*read_func) (void *ptr, size_t size, size_t *rsize,
CompressFileHandle *CFH);

I've not checked to see what the other users of this API do, but
if they're all like this then we need to fix that comment.
Otherwise we probably need to make them all alike; even if there's
not a bug today, one will soon appear from someone believing the
comment.

regards, tom lane

Daniel Gustafsson

daniel@yesql.se

9 months ago

In reply to: Tom Lane (#8)

Re: No error checking when reading from file using zstd in pg_dump

On 16 Jun 2025, at 15:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I've not checked to see what the other users of this API do, but
if they're all like this then we need to fix that comment.

AFAICT all other callers of this API are throwing an error with pg_fatal, and
so does the function in question for ZStd decompression errors. If we handle
the case of fread() returning 0 to indicate an error like the below *untested
sketch* (with a better error message) this function is fully API compliant as
well.

                /* If we have no more input to consume, we're done */
                if (cnt == 0)
+               {
+                       if (ferror(unconstify(void *, input->src)))
+                               pg_fatal("could not read data to decompress: %m");
+
                        break;
+               }

If this seems like a good approach then Zstd_getc can be simplified as well as
it no longer needs to call ferror, it still needs to check feof though.

--
Daniel Gustafsson

#10

Tom Lane

tgl@sss.pgh.pa.us

9 months ago

In reply to: Daniel Gustafsson (#9)

Re: No error checking when reading from file using zstd in pg_dump

Daniel Gustafsson <daniel@yesql.se> writes:

On 16 Jun 2025, at 15:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I've not checked to see what the other users of this API do, but
if they're all like this then we need to fix that comment.

AFAICT all other callers of this API are throwing an error with pg_fatal, and
so does the function in question for ZStd decompression errors.

I think I agree that we need to handle the ferror() case inside the
read_func for consistency. But there is another problem, which is
that neither of its callers are paying attention to the API: as
defined, the read_func can never return anything but "true",
which is how Zstd_read behaves. But both the callers in
compress_zstd.c seem to think they should test its result to
detect EOF. Apparently, our tests do not cover the case of an
unexpected EOF.

This API is actually quite bizarrely defined, if you ask me.
Returning the byte count is optional, but if you don't pay
attention to the byte count you cannot know if you got any
data. At best, that's bug-encouraging.

regards, tom lane

#11

Daniel Gustafsson

daniel@yesql.se

9 months ago

In reply to: Tom Lane (#10)

Re: No error checking when reading from file using zstd in pg_dump

On 16 Jun 2025, at 16:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Daniel Gustafsson <daniel@yesql.se> writes:

On 16 Jun 2025, at 15:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I've not checked to see what the other users of this API do, but
if they're all like this then we need to fix that comment.

AFAICT all other callers of this API are throwing an error with pg_fatal, and
so does the function in question for ZStd decompression errors.

I think I agree that we need to handle the ferror() case inside the
read_func for consistency. But there is another problem, which is
that neither of its callers are paying attention to the API: as
defined, the read_func can never return anything but "true",
which is how Zstd_read behaves. But both the callers in
compress_zstd.c seem to think they should test its result to
detect EOF.

Attached is a stab at fixing both the error handling in read_func as well as
the callers misuse of the API.

Apparently, our tests do not cover the case of an
unexpected EOF.

I admittedly ran out of steam when looking at adding something like this to our
pg_dump tests.

This API is actually quite bizarrely defined, if you ask me.
Returning the byte count is optional, but if you don't pay
attention to the byte count you cannot know if you got any
data. At best, that's bug-encouraging.

Agreed. Given that many of implementations in the gzip support code directly
return the gzXXX function I suspect it was modeled around GZip but thats an
unresearched guess. The fact that this returns Z_NULL where the API defines
NULL is unlikely to ever be an issue, but it also doesn't seem entirely neat..

--
Daniel Gustafsson

#12

Tom Lane

tgl@sss.pgh.pa.us

9 months ago

In reply to: Daniel Gustafsson (#11)

Re: No error checking when reading from file using zstd in pg_dump

Daniel Gustafsson <daniel@yesql.se> writes:

On 16 Jun 2025, at 16:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:

This API is actually quite bizarrely defined, if you ask me.
Returning the byte count is optional, but if you don't pay
attention to the byte count you cannot know if you got any
data. At best, that's bug-encouraging.

Agreed. Given that many of implementations in the gzip support code directly
return the gzXXX function I suspect it was modeled around GZip but thats an
unresearched guess. The fact that this returns Z_NULL where the API defines
NULL is unlikely to ever be an issue, but it also doesn't seem entirely neat..

After looking around a bit more I realized that this API is a complete
disaster: it's not only bug-prone, but there are actual bugs in
multiple callers, eg _ReadBuf() totally fails to detect early EOF as
it intends to. We need to fix it, not slap some band-aids on.
Draft patch attached.

The write_func side is a bit of a mess too. I fixed some obvious API
violations in the attached, but I think we need to do more, because
it seems that zero thought was given to reporting errors sanely.
The callers all assume that strerror() is what to use to report an
incomplete write, but this is not appropriate in every case, for
instance we need to pay attention to gzerror() in gzip cases.
I'm inclined to think that the best answer is to move error reporting
into the write_funcs, and just make the API spec be "write or die".
I've not tackled that here though.

I was depressed to find that Zstd_getc reads one byte into
an otherwise-uninitialized int and then returns the whole int.
Even if we're lucky enough for the variable to start off zero,
this would fail utterly on big-endian machines. The only
reason we've not noticed is that the regression tests do not
reach Zstd_getc, nor most of the other getc_func implementations.

Now I'm afraid to look at the rest of the compress_io.h API.
If it's as badly written as these parts, there are more bugs
to be found.

regards, tom lane

#13

Tomas Vondra

tomas.vondra@2ndquadrant.com

9 months ago

In reply to: Daniel Gustafsson (#11)

Re: No error checking when reading from file using zstd in pg_dump

On 6/16/25 17:41, Daniel Gustafsson wrote:

On 16 Jun 2025, at 16:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Daniel Gustafsson <daniel@yesql.se> writes:

On 16 Jun 2025, at 15:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I've not checked to see what the other users of this API do, but
if they're all like this then we need to fix that comment.

AFAICT all other callers of this API are throwing an error with pg_fatal, and
so does the function in question for ZStd decompression errors.

I think I agree that we need to handle the ferror() case inside the
read_func for consistency. But there is another problem, which is
that neither of its callers are paying attention to the API: as
defined, the read_func can never return anything but "true",
which is how Zstd_read behaves. But both the callers in
compress_zstd.c seem to think they should test its result to
detect EOF.

Attached is a stab at fixing both the error handling in read_func as well as
the callers misuse of the API.

Apparently, our tests do not cover the case of an
unexpected EOF.

I admittedly ran out of steam when looking at adding something like this to our
pg_dump tests.

This API is actually quite bizarrely defined, if you ask me.
Returning the byte count is optional, but if you don't pay
attention to the byte count you cannot know if you got any
data. At best, that's bug-encouraging.

Agreed. Given that many of implementations in the gzip support code directly
return the gzXXX function I suspect it was modeled around GZip but thats an
unresearched guess. The fact that this returns Z_NULL where the API defines
NULL is unlikely to ever be an issue, but it also doesn't seem entirely neat..

Yes. It's been a while since this commit, but I recall we started with a
gzip-only implementation. 16 introduced this API, but it's definitely
the case it was based on the initial gzip code.

Regarding the Z_NULL, I believe it has always been ignored like this, at
least since 9.1. The code simply returns what gzgets() returns, and then
compares that to NULL, etc. Is there there's a better way to deal with
Z_NULL? I suppose we could explicitly check/translate Z_NULL to NULL,
although Z_NULL is simply defined as 0. I don't recall if NULL has some
additional magic.

regards

--
Tomas Vondra

#14

Tomas Vondra

tomas.vondra@2ndquadrant.com

9 months ago

In reply to: Tom Lane (#12)

Re: No error checking when reading from file using zstd in pg_dump

On 6/16/25 19:56, Tom Lane wrote:

...

After looking around a bit more I realized that this API is a complete
disaster: it's not only bug-prone, but there are actual bugs in
multiple callers, eg _ReadBuf() totally fails to detect early EOF as
it intends to. We need to fix it, not slap some band-aids on.
Draft patch attached.

The write_func side is a bit of a mess too. I fixed some obvious API
violations in the attached, but I think we need to do more, because
it seems that zero thought was given to reporting errors sanely.
The callers all assume that strerror() is what to use to report an
incomplete write, but this is not appropriate in every case, for
instance we need to pay attention to gzerror() in gzip cases.
I'm inclined to think that the best answer is to move error reporting
into the write_funcs, and just make the API spec be "write or die".
I've not tackled that here though.

I was depressed to find that Zstd_getc reads one byte into
an otherwise-uninitialized int and then returns the whole int.
Even if we're lucky enough for the variable to start off zero,
this would fail utterly on big-endian machines. The only
reason we've not noticed is that the regression tests do not
reach Zstd_getc, nor most of the other getc_func implementations.

Now I'm afraid to look at the rest of the compress_io.h API.
If it's as badly written as these parts, there are more bugs
to be found.

Well, that doesn't sound great :-( Most of this is likely my fault, as
the one who pushed e9960732a961 (and some commits that built on it).
Sorry about that. I'll take a fresh look at the commits this week, but
considering I missed the issues before commit ...

For a moment I was worried about breaking ABI when fixing this in the
backbranches, but I guess that's not an issue for tools like pg_dump.

regards

--
Tomas Vondra

#15

Tom Lane

tgl@sss.pgh.pa.us

9 months ago

In reply to: Tomas Vondra (#14)

Re: No error checking when reading from file using zstd in pg_dump

Tomas Vondra <tomas@vondra.me> writes:

For a moment I was worried about breaking ABI when fixing this in the
backbranches, but I guess that's not an issue for tools like pg_dump.

Yeah, I think it'd be okay to change compress_io.h APIs in the back
branches; it's hard to see how anything outside pg_dump+pg_restore
would be depending on that.

regards, tom lane

#16

Daniel Gustafsson

daniel@yesql.se

9 months ago

In reply to: Tomas Vondra (#13)

Re: No error checking when reading from file using zstd in pg_dump

On 16 Jun 2025, at 21:45, Tomas Vondra <tomas@vondra.me> wrote:

Regarding the Z_NULL, I believe it has always been ignored like this, at
least since 9.1. The code simply returns what gzgets() returns, and then
compares that to NULL, etc. Is there there's a better way to deal with
Z_NULL? I suppose we could explicitly check/translate Z_NULL to NULL,
although Z_NULL is simply defined as 0. I don't recall if NULL has some
additional magic.

Right, to be clear, I don't think there is a bug here (or the risk one going
forward). It's just my own preference of not mixing API concepts

--
Daniel Gustafsson

#17

Daniel Gustafsson

daniel@yesql.se

9 months ago

In reply to: Tom Lane (#15)

Re: No error checking when reading from file using zstd in pg_dump

On 16 Jun 2025, at 22:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Tomas Vondra <tomas@vondra.me> writes:

For a moment I was worried about breaking ABI when fixing this in the
backbranches, but I guess that's not an issue for tools like pg_dump.

Yeah, I think it'd be okay to change compress_io.h APIs in the back
branches; it's hard to see how anything outside pg_dump+pg_restore
would be depending on that.

Agreed, and the attached patchset takes that one step further by also changing
the signature of write_func to make errorhandling easier. More on that below.

I spent a little bit of time reading over all the implementations and cross
referencing the API for conformity, and came up with the attached. The 0001
patch is the one from upstream, and each subsequent commit is fixing one
function for all the implementations. Before pushing it should all be squashed
into a single commit IMHO.

Each patch has a commitmessage which describes the what/why so I won't go into
full detail here, but the main changes introduced are:

* Make write_func throw an error on all error conditions
* Ensure that functions return an error when assumed to, and call pg_fatal
when assumed to not return on error
* Try to capture more errors and ensure that errno has been reset

The reason for the jump in version number is that I've traded patches off-list
with Tomas over the past few days, some of which are omitted here due to being
v19 material. I might well have missed some changes which aren't required to
backpatch, and if so we need to pull those out.

One thing this doesn't address is the lack of testing, which also should be
tackled (having fault injection capabilities for client utils would be neat).

--
Daniel Gustafsson

#18

Tom Lane

tgl@sss.pgh.pa.us

9 months ago

In reply to: Daniel Gustafsson (#17)

Re: No error checking when reading from file using zstd in pg_dump

Daniel Gustafsson <daniel@yesql.se> writes:

I spent a little bit of time reading over all the implementations and cross
referencing the API for conformity, and came up with the attached. The 0001
patch is the one from upstream, and each subsequent commit is fixing one
function for all the implementations. Before pushing it should all be squashed
into a single commit IMHO.

Thanks for tackling this!

I looked over this patchset briefly, and found a couple of nits:

v5-0002, in compress_io.h:

+ * Returns true on success and throws error for all error conditions.

It doesn't return true anymore. Should be more like

+ * Returns nothing. Exits via pg_fatal for all error conditions.

In LZ4Stream_write: you dropped the bit about

- errno = (errno) ? errno : ENOSPC;

but I think that's still necessary: we must assume ENOSPC if fwrite
doesn't set errno. Other fwrite callers (write_none, Zstd_write) need
this too. v5-0004 has an instance too, in Zstd_close. I did not check
to see if other fwrite calls are OK, but it'd be good to verify
that they all follow the pattern of presetting errno to 0 and then
replacing that with ENOSPC.

regards, tom lane

#19

Daniel Gustafsson

daniel@yesql.se

9 months ago

In reply to: Tom Lane (#18)

Re: No error checking when reading from file using zstd in pg_dump

On 25 Jun 2025, at 17:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I looked over this patchset briefly, and found a couple of nits:

Thanks for looking!

v5-0002, in compress_io.h:

+ * Returns true on success and throws error for all error conditions.

It doesn't return true anymore. Should be more like

+ * Returns nothing. Exits via pg_fatal for all error conditions.

Instead of this I changed the write_func signature to return the number of
bytes written as size_t. The API is loosely modelled around the libc stream API so
going to void seemed less justifiable than size_t, even if the actual
returnvalue is fairly useless due to erroring out via pg_fatal.

In LZ4Stream_write: you dropped the bit about

- errno = (errno) ? errno : ENOSPC;

but I think that's still necessary: we must assume ENOSPC if fwrite
doesn't set errno.

Doh, of course, fixed in the attached.

Other fwrite callers (write_none, Zstd_write) need
this too. v5-0004 has an instance too, in Zstd_close. I did not check
to see if other fwrite calls are OK, but it'd be good to verify
that they all follow the pattern of presetting errno to 0 and then
replacing that with ENOSPC.

I went over all the fwrite callsites *in this patchset* and made sure they
follow the pattern. There are a number of more fwrite calls in pg_dump (and
elsewhere) which might need the same treatment but I left those for a separate
patch to keep this focused on the compression API.

--
Daniel Gustafsson

#20

Tom Lane

tgl@sss.pgh.pa.us

9 months ago

In reply to: Daniel Gustafsson (#19)

Re: No error checking when reading from file using zstd in pg_dump

Daniel Gustafsson <daniel@yesql.se> writes:

On 25 Jun 2025, at 17:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It doesn't return true anymore. Should be more like
+ * Returns nothing. Exits via pg_fatal for all error conditions.

Instead of this I changed the write_func signature to return the number of
bytes written as size_t. The API is loosely modelled around the libc stream API so
going to void seemed less justifiable than size_t, even if the actual
returnvalue is fairly useless due to erroring out via pg_fatal.

Hm. My one concern about that is that using "void" ensures that the
compiler will catch any write_funcs or callsites that we missed
updating, whereas replacing bool with size_t probably prevents that
detection.

Of course this is now moot for the in-core code since we presumably
caught everything already. But I wonder about patches (say to
support an additional compression method) that might be in the
pipeline, or in use in some local fork somewhere. There's no
certainty that they'd get the word, especially since any tests
that didn't exercise failure conditions would still work.

So on the whole I prefer the "void" approach. I'm not dead
set on that though, it's just a niggling worry.

regards, tom lane

#21

Daniel Gustafsson

daniel@yesql.se

9 months ago

In reply to: Tom Lane (#20)

#22

Daniel Gustafsson

daniel@yesql.se

8 months ago

In reply to: Daniel Gustafsson (#21)