pg_dump, pg_dumpall and data durability
Hi all,
In my quest of making the backup tools more compliant to data
durability, here is a thread for pg_dump and pg_dumpall. Here is in a
couple of lines my proposal:
- Addition in _archiveHandle of a field to track if the dump generated
should be synced or not.
- This is effective for all modes, when the user specifies an output
file. In short that's when fileSpec is not NULL.
- Actually do the the sync in _EndData and _EndBlob[s] if appropriate.
There is for example nothing to do for pg_backup_null.c
- Addition of --nosync option to allow users to disable it. By default
it is enabled.
Note that to make the data durable, the file need to be sync'ed as
well as its parent folder. So with pg_dump we can only make that
really durable with -Fd. I think that in the case where the user
specifies an output file for the other modes we should sync it, that's
the best we can do. This last statement applies as well for
pg_dumpall.
Of course, if no output file is specified, that's up to the user to
deal with the sync phases.
Or more simply, as truly durability can just be achieved if each file
and their parent directory are fsync'd, we just support the operation
for -Fd. Still I'd like to think htat we had better do the best we can
here and do things as well for the other modes.
Thoughts? I'd like to prepare a patch according to those lines for the next CF.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Oct 13, 2016 at 2:49 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
- This is effective for all modes, when the user specifies an output
file. In short that's when fileSpec is not NULL.
I have sent the email too early here... In this case the sync is a
no-op. It is missing a sentence.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Oct 13, 2016 at 2:49 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
In my quest of making the backup tools more compliant to data
durability, here is a thread for pg_dump and pg_dumpall. Here is in a
couple of lines my proposal:
- Addition in _archiveHandle of a field to track if the dump generated
should be synced or not.
- This is effective for all modes, when the user specifies an output
file. In short that's when fileSpec is not NULL.
- Actually do the the sync in _EndData and _EndBlob[s] if appropriate.
There is for example nothing to do for pg_backup_null.c
- Addition of --nosync option to allow users to disable it. By default
it is enabled.
Note that to make the data durable, the file need to be sync'ed as
well as its parent folder. So with pg_dump we can only make that
really durable with -Fd. I think that in the case where the user
specifies an output file for the other modes we should sync it, that's
the best we can do. This last statement applies as well for
pg_dumpall.Thoughts? I'd like to prepare a patch according to those lines for the next CF.
Okay, here is a patch doing the above. I have added a new --nosync
option to pg_dump and pg_dumpall to switch to the pre-10 behavior. I
have arrived at the conclusion that it is better not to touch at
_EndData and _EndBlob, and just issue the fsync in CloseArchive when
all the write operations are done. In the case of the directory
format, the fsync is done on all the entries recursively. This makes
as well the patch more simple. The regression tests calling pg_dump
don't use --nosync yet in this patch, that's a move that could be done
afterwards.
I have added that to next CF:
https://commitfest.postgresql.org/11/823/
--
Michael
Attachments:
pgdump-sync-v1.patchapplication/x-download; name=pgdump-sync-v1.patchDownload+105-9
Michael Paquier wrote:
In my quest of making the backup tools more compliant to data
durability, here is a thread for pg_dump and pg_dumpall.Okay, here is a patch doing the above. I have added a new --nosync
option to pg_dump and pg_dumpall to switch to the pre-10 behavior. I
have arrived at the conclusion that it is better not to touch at
_EndData and _EndBlob, and just issue the fsync in CloseArchive when
all the write operations are done. In the case of the directory
format, the fsync is done on all the entries recursively. This makes
as well the patch more simple. The regression tests calling pg_dump
don't use --nosync yet in this patch, that's a move that could be done
afterwards.
The patch does not apply, I had to change the hunk for
src/include/common/file_utils.h.
Also, compilation fails because function "pre_fsync_fname" cannot be
resolved during linking. Is that a typo for "pre_fsync_fname" or is
the patch incomplete?
Yours,
Laurenz Albe
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Nov 8, 2016 at 1:24 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
The patch does not apply, I had to change the hunk for
src/include/common/file_utils.h.
Yes, the patch has rotten a bit because of f82ec32a. 5d58c07a has also
made the --noxxx option names appearing as --no-xxx.
Also, compilation fails because function "pre_fsync_fname" cannot be
resolved during linking. Is that a typo for "pre_fsync_fname" or is
the patch incomplete?
A typo s/pre_fsync_fname/pre_sync_fname, and a mistake from me because
I did not compile with -DPG_FLUSH_DATA_WORKS to check this code.
v2 is attached, fixing those issues.
--
Michael
Attachments:
pgdump-sync-v2.patchtext/x-patch; charset=US-ASCII; name=pgdump-sync-v2.patchDownload+105-9
On Mon, Nov 7, 2016 at 7:52 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Tue, Nov 8, 2016 at 1:24 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
The patch does not apply, I had to change the hunk for
src/include/common/file_utils.h.Yes, the patch has rotten a bit because of f82ec32a. 5d58c07a has also
made the --noxxx option names appearing as --no-xxx.Also, compilation fails because function "pre_fsync_fname" cannot be
resolved during linking. Is that a typo for "pre_fsync_fname" or is
the patch incomplete?A typo s/pre_fsync_fname/pre_sync_fname, and a mistake from me because
I did not compile with -DPG_FLUSH_DATA_WORKS to check this code.v2 is attached, fixing those issues.
Kyotaro Horiguchi set this patch to "Ready for Committer" in the
CommitFest application, but that seems entirely premature to me
considering that v2 has had no review at all, or at least not that I
see on this thread. I'm setting it back to "Needs Review".
First question: Do we even want this? Generally, when a program
writes to a file, we rely on the operating system to decide when that
data should be written back to disk. We have to override that
distinction for things internal to PostgreSQL because we need certain
bits of data to reach the disk in a certain order, but it's unclear to
me how far outside the core database system we want to extend that.
Are we going to have psql fsync() data it writes to a file when \o is
used, for example? That would seem to me to be beyond insane, because
we have no idea whether the user actually needs that file to be
durable. It is a better bet that a pg_dump command's output needs
durability, of course, but I feel that we shouldn't just go disabling
the filesystem cache one program at a time without some guiding
principle.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
First question: Do we even want this? Generally, when a program
writes to a file, we rely on the operating system to decide when that
data should be written back to disk. We have to override that
distinction for things internal to PostgreSQL because we need certain
bits of data to reach the disk in a certain order, but it's unclear to
me how far outside the core database system we want to extend that.
Are we going to have psql fsync() data it writes to a file when \o is
used, for example? That would seem to me to be beyond insane, because
we have no idea whether the user actually needs that file to be
durable. It is a better bet that a pg_dump command's output needs
durability, of course, but I feel that we shouldn't just go disabling
the filesystem cache one program at a time without some guiding
principle.
FWIW, I find the premise pretty dubious. People don't normally expect
programs to fsync their standard output, and the argument that pg_dump's
output is always critical data doesn't withstand inspection. Also,
I don't understand what pg_dump should do if it fails to fsync. There
are too many cases where that would fail (eg, output piped to a program)
for it to be treated as an error condition. But if it isn't reported as
an error, then how much durability guarantee are we really adding?
I think this might be better addressed by adding something to backup.sgml
pointing out that you'd better fsync or sync your backups before assuming
that they can't be lost.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Nov 9, 2016 at 5:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Kyotaro Horiguchi set this patch to "Ready for Committer" in the
CommitFest application, but that seems entirely premature to me
considering that v2 has had no review at all, or at least not that I
see on this thread. I'm setting it back to "Needs Review".
That's indeed too early. Thanks for correcting.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Nov 9, 2016 at 8:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
First question: Do we even want this? Generally, when a program
writes to a file, we rely on the operating system to decide when that
data should be written back to disk. We have to override that
distinction for things internal to PostgreSQL because we need certain
bits of data to reach the disk in a certain order, but it's unclear to
me how far outside the core database system we want to extend that.
Are we going to have psql fsync() data it writes to a file when \o is
used, for example? That would seem to me to be beyond insane, because
we have no idea whether the user actually needs that file to be
durable. It is a better bet that a pg_dump command's output needs
durability, of course, but I feel that we shouldn't just go disabling
the filesystem cache one program at a time without some guiding
principle.FWIW, I find the premise pretty dubious. People don't normally expect
programs to fsync their standard output, and the argument that pg_dump's
output is always critical data doesn't withstand inspection. Also,
I don't understand what pg_dump should do if it fails to fsync. There
are too many cases where that would fail (eg, output piped to a program)
for it to be treated as an error condition. But if it isn't reported as
an error, then how much durability guarantee are we really adding?
If the output is piped to a program, there is no way to guarantee that
the data will be flushed and the user is responsible for that. We
cannot control all the use cases. The same applies for example with
pg_basebackup where the data is sent to stdout. IMO, the limit set is
that tools aimed at taking physical and logical backups should do a
better effort in the cases where they can do it. That's a cheap
insurance.
Based on this past thread, it seems to me that Magnus, Andres and Jim
Nasby are of the opinion that making things is useful:
/messages/by-id/20160327233033.GD20662@awork2.anarazel.de
And so do I.
I think this might be better addressed by adding something to backup.sgml
pointing out that you'd better fsync or sync your backups before assuming
that they can't be lost.
Perhaps. That would be better than nothing at least, but that won't
help for cases where we can help a bit.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier wrote:
A typo s/pre_fsync_fname/pre_sync_fname, and a mistake from me because
I did not compile with -DPG_FLUSH_DATA_WORKS to check this code.v2 is attached, fixing those issues.
The patch applies and compiles fine.
I have tested it on Linux and MinGW and could see the fsync(2) and
FlushFileBuffers calls I expected.
This adds crash safety for a reasonable price, and I think we should have that.
The documentation additions are sufficient.
Looking through the patch, I had two questions that are more about
style and consistency than anything else:
- In pg_dumpall.c, the result of fsync_fname() is cast to "void" to show that
the return code is ignored, but not anywhere else. Is that by design?
- For pg_dumpall, a short option "-N" is added for "--no-sync", but not for
pg_dump (because -N is already taken there).
I'd opt for either using the same short option for both or (IMO better)
only offering a long option for both.
This would avoid confusion, and we expect that few people will want to use
this option anyway, right?
Yours,
Laurenz Albe
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Nov 11, 2016 at 11:03 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
- In pg_dumpall.c, the result of fsync_fname() is cast to "void" to show that
the return code is ignored, but not anywhere else. Is that by design?
Right. The patch is lacking consistency in this area. The main thought
regarding this design is to not consider a fsync failure as critical,
and just issue a warning in stderr. I have updated the two other call
sites with a (void) cast.
- For pg_dumpall, a short option "-N" is added for "--no-sync", but not for
pg_dump (because -N is already taken there).
I'd opt for either using the same short option for both or (IMO better)
only offering a long option for both.
Okay. For consistency's sake let's do that. I was a bit hesitant
regarding that to be honest.
This would avoid confusion, and we expect that few people will want to use
this option anyway, right?
Definitely a good point.
--
Michael
Attachments:
pgdump-sync-v3.patchtext/plain; charset=US-ASCII; name=pgdump-sync-v3.patchDownload+105-9
On Sat, Nov 12, 2016 at 1:46 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Fri, Nov 11, 2016 at 11:03 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
This would avoid confusion, and we expect that few people will want to use
this option anyway, right?Definitely a good point.
Meh. I forgot docs and --help output updates.
--
Michael
Attachments:
pgdump-sync-v4.patchapplication/x-patch; name=pgdump-sync-v4.patchDownload+104-9
On Sat, Nov 12, 2016 at 10:16 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
- For pg_dumpall, a short option "-N" is added for "--no-sync", but not for
pg_dump (because -N is already taken there).
I'd opt for either using the same short option for both or (IMO better)
only offering a long option for both.Okay. For consistency's sake let's do that. I was a bit hesitant
regarding that to be honest.
Seems like you have missed to remove -N at some places, as mentioned below.
1.
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -365,6 +365,21 @@ PostgreSQL documentation
</varlistentry>
<varlistentry>
+ <term><option>-N</option></term>
@@ -543,6 +557,7 @@ help(void)
2.
printf(_("\nGeneral options:\n"));
printf(_(" -f, --file=FILENAME output file name\n"));
+ printf(_(" -N, --no-sync do not wait for changes to
be written safely to disk\n"));
3.
- while ((c = getopt_long(argc, argv, "acd:f:gh:l:oOp:rsS:tU:vwWx",
long_options, &optindex)) != -1)
+ while ((c = getopt_long(argc, argv, "acd:f:gh:l:NoOp:rsS:tU:vwWx",
long_options, &optindex)) != -1)
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Nov 13, 2016 at 12:17 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Sat, Nov 12, 2016 at 10:16 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:- For pg_dumpall, a short option "-N" is added for "--no-sync", but not for
pg_dump (because -N is already taken there).
I'd opt for either using the same short option for both or (IMO better)
only offering a long option for both.Okay. For consistency's sake let's do that. I was a bit hesitant
regarding that to be honest.Seems like you have missed to remove -N at some places, as mentioned below.
1. --- a/doc/src/sgml/ref/pg_dumpall.sgml +++ b/doc/src/sgml/ref/pg_dumpall.sgml @@ -365,6 +365,21 @@ PostgreSQL documentation </varlistentry><varlistentry>
+ <term><option>-N</option></term>@@ -543,6 +557,7 @@ help(void)
2.
printf(_("\nGeneral options:\n"));
printf(_(" -f, --file=FILENAME output file name\n"));
+ printf(_(" -N, --no-sync do not wait for changes to
be written safely to disk\n"));
v4 fixed those two places.
3. - while ((c = getopt_long(argc, argv, "acd:f:gh:l:oOp:rsS:tU:vwWx", long_options, &optindex)) != -1) + while ((c = getopt_long(argc, argv, "acd:f:gh:l:NoOp:rsS:tU:vwWx", long_options, &optindex)) != -1)
But not this one...
--
Michael
Attachments:
pgdump-sync-v5.patchapplication/x-patch; name=pgdump-sync-v5.patchDownload+103-8
On 11/8/16 3:48 PM, Robert Haas wrote:
First question: Do we even want this? Generally, when a program
writes to a file, we rely on the operating system to decide when that
data should be written back to disk. We have to override that
distinction for things internal to PostgreSQL because we need certain
bits of data to reach the disk in a certain order, but it's unclear to
me how far outside the core database system we want to extend that.
I had voiced a similar concern previously:
/messages/by-id/f8dff810-f5f4-77c3-933b-127df4ed94e5@2ndquadrant.com
At the time, there were no other comments, so we went ahead with it,
which presumably gave encouragement to pursue the current patch.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
On 2016-11-08 18:18:01 -0500, Tom Lane wrote:
I think this might be better addressed by adding something to backup.sgml
pointing out that you'd better fsync or sync your backups before assuming
that they can't be lost.
How does a normal user do that? I don't think there's a cross-platform
advice we can give, and even on *nix the answer basically is 'sync;
sync;' which is a pretty big hammer, and might be completely
unacceptable on a busy server.
Regards,
Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier wrote:
Meh. I forgot docs and --help output updates.
Looks good, except that you left the "N" option in the getopt_long
call for pg_dumpall. I fixed that in the attached patch.
I'll mark the patch "ready for committer".
Yours,
Laurenz Albe
Attachments:
pgdump-sync-v5.patchapplication/octet-stream; name=pgdump-sync-v5.patchDownload+103-8
On Mon, Nov 14, 2016 at 6:04 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Michael Paquier wrote:
Meh. I forgot docs and --help output updates.
Looks good, except that you left the "N" option in the getopt_long
call for pg_dumpall. I fixed that in the attached patch.
No, v5 has removed it, but it does not matter much now...
I'll mark the patch "ready for committer".
Thanks!
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Nov 13, 2016 at 4:18 AM, Andres Freund <andres@anarazel.de> wrote:
On 2016-11-08 18:18:01 -0500, Tom Lane wrote:
I think this might be better addressed by adding something to backup.sgml
pointing out that you'd better fsync or sync your backups before assuming
that they can't be lost.How does a normal user do that? I don't think there's a cross-platform
advice we can give, and even on *nix the answer basically is 'sync;
sync;' which is a pretty big hammer, and might be completely
unacceptable on a busy server.
Yeah, that's a pretty fair point. I see the point of this patch
pretty clearly but somehow it makes me nervous anyway. I'm not sure
there's any better alternative to what's being proposed, though.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Nov 14, 2016 at 6:07 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Mon, Nov 14, 2016 at 6:04 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Michael Paquier wrote:
Meh. I forgot docs and --help output updates.
Looks good, except that you left the "N" option in the getopt_long
call for pg_dumpall. I fixed that in the attached patch.No, v5 has removed it, but it does not matter much now...
I'll mark the patch "ready for committer".
Thanks!
Moved to CF 2017-01.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers