[PATCH] `pg_dump -Fd` doesn't check write return status...

Started by Sean Chittendenabout 12 years ago5 messageshackers
Jump to latest
#1Sean Chittenden
sean@chittenden.org

The attached patch fixes the case when `pg_dump -Fd …` is called on a partition where write(2) fails for some reason or another. In this case, backup jobs were returning with a successful exit code even though most of the files in the dump directory were all zero length.

I haven’t tested this patch’s failure conditions but the fix seems simple enough: cfwrite() needs to have its return status checked everywhere and exit_horribly() upon any failure. In this case, callers of _WriteData() were not checking the return status and were discarding the negative return status (e.g. ENOSPC).

I made a cursory pass over the code and found one other instance where write status wasn’t being checked and also included that.

-sc

Attachments:

pg_dump_check_write.patchapplication/octet-stream; name=pg_dump_check_write.patchDownload+14-2
#2Bruce Momjian
bruce@momjian.us
In reply to: Sean Chittenden (#1)
Re: [PATCH] `pg_dump -Fd` doesn't check write return status...

On Sat, Mar 1, 2014 at 12:27:19PM -0800, Sean Chittenden wrote:

The attached patch fixes the case when `pg_dump -Fd …` is called
on a partition where write(2) fails for some reason or another. In
this case, backup jobs were returning with a successful exit code even
though most of the files in the dump directory were all zero length.

I haven’t tested this patch’s failure conditions but the fix seems
simple enough: cfwrite() needs to have its return status checked
everywhere and exit_horribly() upon any failure. In this case,
callers of _WriteData() were not checking the return status and were
discarding the negative return status (e.g. ENOSPC).

I made a cursory pass over the code and found one other instance where
write status wasn’t being checked and also included that.

As is often the case with pg_dump, the problems you saw are a small part
of a larger set of problems in that code --- there is general ignoring of
read and write errors.

I have developed a comprehensive patch that addresses all the issues I
could find. The use of function pointers and the calling of functions
directly and through function pointers makes the fix quite complex. I
ended up placing checks at the lowest level and removing checks at
higher layers where they were sporadically placed.

Patch attached. I have tested dump/restore of all four dump output
formats with the 9.3 regression database and all tests passed. I
believe this patch is too complex to backpatch, and I don't know how it
could be fixed more simply.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

Attachments:

pg_dump.difftext/x-diff; charset=us-asciiDownload+373-341
#3Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#2)
Re: [PATCH] `pg_dump -Fd` doesn't check write return status...

On Tue, Apr 22, 2014 at 03:19:15PM -0400, Bruce Momjian wrote:

On Sat, Mar 1, 2014 at 12:27:19PM -0800, Sean Chittenden wrote:

The attached patch fixes the case when `pg_dump -Fd …` is called
on a partition where write(2) fails for some reason or another. In
this case, backup jobs were returning with a successful exit code even
though most of the files in the dump directory were all zero length.

I haven’t tested this patch’s failure conditions but the fix seems
simple enough: cfwrite() needs to have its return status checked
everywhere and exit_horribly() upon any failure. In this case,
callers of _WriteData() were not checking the return status and were
discarding the negative return status (e.g. ENOSPC).

I made a cursory pass over the code and found one other instance where
write status wasn’t being checked and also included that.

As is often the case with pg_dump, the problems you saw are a small part
of a larger set of problems in that code --- there is general ignoring of
read and write errors.

I have developed a comprehensive patch that addresses all the issues I
could find. The use of function pointers and the calling of functions
directly and through function pointers makes the fix quite complex. I
ended up placing checks at the lowest level and removing checks at
higher layers where they were sporadically placed.

Patch attached. I have tested dump/restore of all four dump output
formats with the 9.3 regression database and all tests passed. I
believe this patch is too complex to backpatch, and I don't know how it
could be fixed more simply.

Patch applied. Thanks for the report.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Noah Misch
noah@leadboat.com
In reply to: Bruce Momjian (#3)
Re: [PATCH] `pg_dump -Fd` doesn't check write return status...

On Mon, May 05, 2014 at 08:29:56PM -0400, Bruce Momjian wrote:

On Tue, Apr 22, 2014 at 03:19:15PM -0400, Bruce Momjian wrote:

As is often the case with pg_dump, the problems you saw are a small part
of a larger set of problems in that code --- there is general ignoring of
read and write errors.

I have developed a comprehensive patch that addresses all the issues I
could find. The use of function pointers and the calling of functions
directly and through function pointers makes the fix quite complex. I
ended up placing checks at the lowest level and removing checks at
higher layers where they were sporadically placed.

Patch attached. I have tested dump/restore of all four dump output
formats with the 9.3 regression database and all tests passed. I
believe this patch is too complex to backpatch, and I don't know how it
could be fixed more simply.

Patch applied. Thanks for the report.

This broke automatic detection of tar-format dumps:

$ pg_dump -Ft -f tardump
$ pg_restore tardump
pg_restore: [archiver] could not read from input file: Success
$ pg_restore -Ft tardump | wc -c
736

Stack trace:

#0 vwrite_msg (modulename=0x4166c9 "archiver", fmt=0x41aa08 "could not read from input file: %s\n", ap=0x7fffffffde38) at pg_backup_utils.c:85
#1 0x000000000040f820 in exit_horribly (modulename=0x4166c9 "archiver", fmt=0x41aa08 "could not read from input file: %s\n") at parallel.c:190
#2 0x000000000040557d in _discoverArchiveFormat (AH=0x425340) at pg_backup_archiver.c:2040
#3 0x0000000000405756 in _allocAH (FileSpec=0x7fffffffecbb "tardump", fmt=archUnknown, compression=0, mode=archModeRead, setupWorkerPtr=0x4044f0 <setupRestoreWorker>) at pg_backup_archiver.c:2160
#4 0x0000000000405819 in OpenArchive (FileSpec=<optimized out>, fmt=<optimized out>) at pg_backup_archiver.c:148
#5 0x0000000000404331 in main (argc=2, argv=0x7fffffffea28) at pg_restore.c:375

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Bruce Momjian
bruce@momjian.us
In reply to: Noah Misch (#4)
Re: [PATCH] `pg_dump -Fd` doesn't check write return status...

On Tue, May 6, 2014 at 09:22:20AM -0400, Noah Misch wrote:

On Mon, May 05, 2014 at 08:29:56PM -0400, Bruce Momjian wrote:

On Tue, Apr 22, 2014 at 03:19:15PM -0400, Bruce Momjian wrote:

As is often the case with pg_dump, the problems you saw are a small part
of a larger set of problems in that code --- there is general ignoring of
read and write errors.

I have developed a comprehensive patch that addresses all the issues I
could find. The use of function pointers and the calling of functions
directly and through function pointers makes the fix quite complex. I
ended up placing checks at the lowest level and removing checks at
higher layers where they were sporadically placed.

Patch attached. I have tested dump/restore of all four dump output
formats with the 9.3 regression database and all tests passed. I
believe this patch is too complex to backpatch, and I don't know how it
could be fixed more simply.

Patch applied. Thanks for the report.

This broke automatic detection of tar-format dumps:

$ pg_dump -Ft -f tardump
$ pg_restore tardump
pg_restore: [archiver] could not read from input file: Success
$ pg_restore -Ft tardump | wc -c
736

OK, thanks for the report. Fixed.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers