should frontend tools use syncfs() ?

Started by Justin Pryzbyover 4 years ago50 messageshackers
Jump to latest
#1Justin Pryzby
pryzby@telsasoft.com

Forking this thread in which Thomas implemented syncfs for the startup process
(61752afb2).
/messages/by-id/CA+hUKG+SG9jSW3ekwib0cSdC0yD-jReJ21X4bZAmqxoWTLTc2A@mail.gmail.com

Is there any reason that initdb/pg_basebackup/pg_checksums/pg_rewind shouldn't
use syncfs() ?

do_syncfs() is in src/backend/ so would need to be duplicated^Wimplemented in
common.

They can't use the GUC, so need to add an cmdline option or look at an
environment variable.

Attachments:

0001-Allow-use-of-syncfs-in-frontend-tools.patchtext/x-diff; charset=us-asciiDownload+83-9
#2Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#1)
Re: should frontend tools use syncfs() ?

On Wed, Sep 29, 2021 at 07:43:41PM -0500, Justin Pryzby wrote:

Forking this thread in which Thomas implemented syncfs for the startup process
(61752afb2).
/messages/by-id/CA+hUKG+SG9jSW3ekwib0cSdC0yD-jReJ21X4bZAmqxoWTLTc2A@mail.gmail.com

Is there any reason that initdb/pg_basebackup/pg_checksums/pg_rewind shouldn't
use syncfs() ?

That makes sense.

do_syncfs() is in src/backend/ so would need to be duplicated^Wimplemented in
common.

The fd handling in the backend makes things tricky if trying to plug
in a common interface, so I'd rather do that as this is frontend-only
code.

They can't use the GUC, so need to add an cmdline option or look at an
environment variable.

fsync_pgdata() is going to manipulate many inodes anyway, because
that's a code path designed to do so. If we know that syncfs() is
just going to be better, I'd rather just call it by default if
available and not add new switches to all the frontend tools in need
of flushing the data folder, switches that are not documented in your
patch.
--
Michael

#3Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#2)
Re: should frontend tools use syncfs() ?

On Thu, Sep 30, 2021 at 4:49 PM Michael Paquier <michael@paquier.xyz> wrote:

fsync_pgdata() is going to manipulate many inodes anyway, because
that's a code path designed to do so. If we know that syncfs() is
just going to be better, I'd rather just call it by default if
available and not add new switches to all the frontend tools in need
of flushing the data folder, switches that are not documented in your
patch.

If we want this it should be an option, because it flushes out data
other than the pgdata dir, and it doesn't report errors on old
kernels.

#4Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#3)
Re: should frontend tools use syncfs() ?

On Thu, Sep 30, 2021 at 05:08:24PM +1300, Thomas Munro wrote:

If we want this it should be an option, because it flushes out data
other than the pgdata dir, and it doesn't report errors on old
kernels.

Oh, OK, thanks. That's the part about 5.8. The only option
controlling if sync is used now in those binaries is --no-sync.
Should we use a different design for the option rather than a
--syncfs? Something like --sync={on,off,syncfs,fsync} could be a
possibility, for example.
--
Michael

#5Justin Pryzby
pryzby@telsasoft.com
In reply to: Thomas Munro (#3)
Re: should frontend tools use syncfs() ?

On Thu, Sep 30, 2021 at 05:08:24PM +1300, Thomas Munro wrote:

On Thu, Sep 30, 2021 at 4:49 PM Michael Paquier <michael@paquier.xyz> wrote:

fsync_pgdata() is going to manipulate many inodes anyway, because
that's a code path designed to do so. If we know that syncfs() is
just going to be better, I'd rather just call it by default if
available and not add new switches to all the frontend tools in need
of flushing the data folder, switches that are not documented in your
patch.

If we want this it should be an option, because it flushes out data
other than the pgdata dir, and it doesn't report errors on old
kernels.

I ran into bad performance of initdb --sync-only shortly after adding it to my
db migration script, so added initdb --syncfs.

I found that with sufficiently recent coreutils, I can do what's wanted by calling
/bin/sync -f /datadir

Since it's not integrated into initdb, it's necessary to include each
tablespace and wal.

--
Justin

#6Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#2)
Re: should frontend tools use syncfs() ?

On Thu, Sep 30, 2021 at 12:49:36PM +0900, Michael Paquier wrote:

On Wed, Sep 29, 2021 at 07:43:41PM -0500, Justin Pryzby wrote:

Forking this thread in which Thomas implemented syncfs for the startup process
(61752afb2).
/messages/by-id/CA+hUKG+SG9jSW3ekwib0cSdC0yD-jReJ21X4bZAmqxoWTLTc2A@mail.gmail.com

Is there any reason that initdb/pg_basebackup/pg_checksums/pg_rewind shouldn't
use syncfs() ?

That makes sense.

do_syncfs() is in src/backend/ so would need to be duplicated^Wimplemented in
common.

The fd handling in the backend makes things tricky if trying to plug
in a common interface, so I'd rather do that as this is frontend-only
code.

They can't use the GUC, so need to add an cmdline option or look at an
environment variable.

fsync_pgdata() is going to manipulate many inodes anyway, because
that's a code path designed to do so. If we know that syncfs() is
just going to be better, I'd rather just call it by default if
available and not add new switches to all the frontend tools in need
of flushing the data folder, switches that are not documented in your
patch.

It is a draft/POC, after all.

The argument against using syncfs by default is that it could be worse than
recursive fsync if a tiny 200MB postgres instance lives on a shared filesystem
along with other, larger applications (maybe a larger postgres instance).

There's also an argument that syncfs might be unreliable in the case of a write
error. (But I agreed with Thomas' earlier assessment: that claim caries little
weight since fsync() itself wasn't reliable for 20some years).

I didn't pursue this patch, as it's easier for me to use /bin/sync -f. Someone
should adopt it if interested.

--
Justin

#7Nathan Bossart
nathandbossart@gmail.com
In reply to: Justin Pryzby (#6)
Re: should frontend tools use syncfs() ?

On Wed, Apr 13, 2022 at 06:54:12AM -0500, Justin Pryzby wrote:

I didn't pursue this patch, as it's easier for me to use /bin/sync -f. Someone
should adopt it if interested.

I was about to start a new thread, but I found this one with some good
preliminary discussion. I came to the same conclusion about introducing a
new option instead of using syncfs() by default wherever it is available.
The attached patch is still a work-in-progress, but it seems to behave as
expected. I began investigating this because I noticed that the
sync-data-directory step on pg_upgrade takes quite a while when there are
many files, and I am looking for ways to reduce the amount of downtime
required for pg_upgrade.

The attached patch adds a new --sync-method option to the relevant frontend
utilities, but I am not wedded to that name/approach.

Thoughts?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-allow-syncfs-in-frontend-utilities.patchtext/x-diff; charset=us-asciiDownload+168-11
#8Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#7)
Re: should frontend tools use syncfs() ?

On Sat, Jul 29, 2023 at 02:40:10PM -0700, Nathan Bossart wrote:

I was about to start a new thread, but I found this one with some good
preliminary discussion. I came to the same conclusion about introducing a
new option instead of using syncfs() by default wherever it is available.
The attached patch is still a work-in-progress, but it seems to behave as
expected. I began investigating this because I noticed that the
sync-data-directory step on pg_upgrade takes quite a while when there are
many files, and I am looking for ways to reduce the amount of downtime
required for pg_upgrade.

The attached patch adds a new --sync-method option to the relevant frontend
utilities, but I am not wedded to that name/approach.

Here is a new version of the patch with documentation updates and a couple
other small improvements.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v2-0001-allow-syncfs-in-frontend-utilities.patchtext/x-diff; charset=us-asciiDownload+369-22
#9Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#8)
Re: should frontend tools use syncfs() ?

On Mon, Jul 31, 2023 at 10:51:38AM -0700, Nathan Bossart wrote:

Here is a new version of the patch with documentation updates and a couple
other small improvements.

I just realized I forgot to update the --help output for these utilities.
I'll do that in the next version of the patch.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#10Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#9)
Re: should frontend tools use syncfs() ?

On Mon, Jul 31, 2023 at 11:39:46AM -0700, Nathan Bossart wrote:

I just realized I forgot to update the --help output for these utilities.
I'll do that in the next version of the patch.

Done in v3. Sorry for the noise.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v3-0001-allow-syncfs-in-frontend-utilities.patchtext/x-diff; charset=us-asciiDownload+376-22
#11Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#10)
Re: should frontend tools use syncfs() ?

I ran a couple of tests for pg_upgrade with 100k tables (created using the
script here [0]/messages/by-id/3612876.1689443232@sss.pgh.pa.us) in order to demonstrate the potential benefits of this
patch.

pg_upgrade --sync-method fsync
real 5m50.072s
user 0m10.606s
sys 0m40.298s

pg_upgrade --sync-method syncfs
real 3m44.096s
user 0m8.906s
sys 0m26.398s

pg_upgrade --no-sync
real 3m27.697s
user 0m9.056s
sys 0m26.605s

[0]: /messages/by-id/3612876.1689443232@sss.pgh.pa.us

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#12Michael Paquier
michael@paquier.xyz
In reply to: Nathan Bossart (#11)
Re: should frontend tools use syncfs() ?

On Tue, Aug 08, 2023 at 01:06:06PM -0700, Nathan Bossart wrote:

I ran a couple of tests for pg_upgrade with 100k tables (created using the
script here [0]) in order to demonstrate the potential benefits of this
patch.

That shows some nice numbers with many files, indeed. How does the
size of each file influence the difference in time?

+        else
+        {
+            while (errno = 0, (de = readdir(dir)) != NULL)
+            {
+                char        subpath[MAXPGPATH * 2];
+
+                if (strcmp(de->d_name, ".") == 0 ||
+                    strcmp(de->d_name, "..") == 0)
+                    continue;

It seems to me that there is no need to do that for in-place
tablespaces. There are relative paths in pg_tblspc/, so they would be
taken care of by the syncfs() done on the main data folder.

This does not really check if the mount points of each tablespace is
different, as well. For example, imagine that you have two
tablespaces within the same disk, syncfs() twice. Perhaps, the
current logic is OK anyway as long as the behavior is optional, but it
should be explained in the docs, at least.

I'm finding a bit confusing that fsync_pgdata() is coded in such a way
that it does a silent fallback to the cascading syncs through
walkdir() when syncfs is specified but not available in the build.
Perhaps an error is more helpful because one would then know that they
are trying something that's not around?

+       pg_log_error("could not synchronize file system for file \"%s\": %m", path);
+       (void) close(fd);
+       exit(EXIT_FAILURE);

walkdir() reports errors and does not consider these fatal. Why the
early exit()?

I am a bit concerned about the amount of duplication this patch
introduces in the docs. Perhaps this had better be moved into a new
section of the docs to explain the tradeoffs, with each tool linking
to it?

Do we actually need --no-sync at all if --sync-method is around? We
could have an extra --sync-method=none at option level with --no-sync
still around mainly for compatibility? Or perhaps that's just
over-designing things?
--
Michael

#13Nathan Bossart
nathandbossart@gmail.com
In reply to: Michael Paquier (#12)
Re: should frontend tools use syncfs() ?

Thanks for taking a look.

On Wed, Aug 16, 2023 at 08:10:10AM +0900, Michael Paquier wrote:

On Tue, Aug 08, 2023 at 01:06:06PM -0700, Nathan Bossart wrote:

I ran a couple of tests for pg_upgrade with 100k tables (created using the
script here [0]) in order to demonstrate the potential benefits of this
patch.

That shows some nice numbers with many files, indeed. How does the
size of each file influence the difference in time?

IME the number of files tends to influence the duration much more than the
size. I assume this is because most files are already sync'd in these code
paths that loop through every file.

+        else
+        {
+            while (errno = 0, (de = readdir(dir)) != NULL)
+            {
+                char        subpath[MAXPGPATH * 2];
+
+                if (strcmp(de->d_name, ".") == 0 ||
+                    strcmp(de->d_name, "..") == 0)
+                    continue;

It seems to me that there is no need to do that for in-place
tablespaces. There are relative paths in pg_tblspc/, so they would be
taken care of by the syncfs() done on the main data folder.

This does not really check if the mount points of each tablespace is
different, as well. For example, imagine that you have two
tablespaces within the same disk, syncfs() twice. Perhaps, the
current logic is OK anyway as long as the behavior is optional, but it
should be explained in the docs, at least.

True. But I don't know if checking the mount point of each tablespace is
worth the complexity. In the worst case, we'll call syncfs() on the same
file system a few times, which is probably still much faster in most cases.
FWIW this is what recovery_init_sync_method does today, and I'm not aware
of any complaints about this behavior.

The patch does have the following note:

+        On Linux, <literal>syncfs</literal> may be used instead to ask the
+        operating system to synchronize the whole file systems that contain the
+        data directory, the WAL files, and each tablespace.

Do you think that is sufficient, or do you think we should really clearly
explain that you could end up calling syncfs() on the same file system a
few times if your tablespaces are on the same disk? I personally feel
like that'd be a bit too verbose for the already lengthy descriptions of
this setting.

I'm finding a bit confusing that fsync_pgdata() is coded in such a way
that it does a silent fallback to the cascading syncs through
walkdir() when syncfs is specified but not available in the build.
Perhaps an error is more helpful because one would then know that they
are trying something that's not around?

If syncfs() is not available, SYNC_METHOD_SYNCFS won't even be defined, and
parse_sync_method() should fail if "syncfs" is specified. Furthermore, the
relevant part of fsync_pgdata() won't be compiled in whenever HAVE_SYNCFS
is not defined.

+       pg_log_error("could not synchronize file system for file \"%s\": %m", path);
+       (void) close(fd);
+       exit(EXIT_FAILURE);

walkdir() reports errors and does not consider these fatal. Why the
early exit()?

I know it claims to, but fsync_fname() does exit when fsync() fails:

returncode = fsync(fd);

/*
* Some OSes don't allow us to fsync directories at all, so we can ignore
* those errors. Anything else needs to be reported.
*/
if (returncode != 0 && !(isdir && (errno == EBADF || errno == EINVAL)))
{
pg_log_error("could not fsync file \"%s\": %m", fname);
(void) close(fd);
exit(EXIT_FAILURE);
}

I suspect that the current code does not treat failures for things like
open() as fatal because it's likely due to a lack of permissions on the
file, but it does treat failures to fsync() as fatal because it is more
likely to indicate that ѕomething is very wrong. I don't know whether this
reasoning is sound, but I tried to match the current convention in the
syncfs() code.

I am a bit concerned about the amount of duplication this patch
introduces in the docs. Perhaps this had better be moved into a new
section of the docs to explain the tradeoffs, with each tool linking
to it?

Yeah, this crossed my mind. Do you know of any existing examples of
options with links to a common section? One problem with this approach is
that there are small differences in the wording for some of the frontend
utilities, so it might be difficult to cleanly unite these sections.

Do we actually need --no-sync at all if --sync-method is around? We
could have an extra --sync-method=none at option level with --no-sync
still around mainly for compatibility? Or perhaps that's just
over-designing things?

I don't have a strong opinion. We could take up deprecating --no-sync in a
follow-up thread, though. Like you said, we'll probably need to keep it
around for backward compatibility, so it might not be worth the trouble.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#14Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#13)
Re: should frontend tools use syncfs() ?

On Wed, Aug 16, 2023 at 08:17:05AM -0700, Nathan Bossart wrote:

On Wed, Aug 16, 2023 at 08:10:10AM +0900, Michael Paquier wrote:

+       pg_log_error("could not synchronize file system for file \"%s\": %m", path);
+       (void) close(fd);
+       exit(EXIT_FAILURE);

walkdir() reports errors and does not consider these fatal. Why the
early exit()?

I know it claims to, but fsync_fname() does exit when fsync() fails:

returncode = fsync(fd);

/*
* Some OSes don't allow us to fsync directories at all, so we can ignore
* those errors. Anything else needs to be reported.
*/
if (returncode != 0 && !(isdir && (errno == EBADF || errno == EINVAL)))
{
pg_log_error("could not fsync file \"%s\": %m", fname);
(void) close(fd);
exit(EXIT_FAILURE);
}

I suspect that the current code does not treat failures for things like
open() as fatal because it's likely due to a lack of permissions on the
file, but it does treat failures to fsync() as fatal because it is more
likely to indicate that ѕomething is very wrong. I don't know whether this
reasoning is sound, but I tried to match the current convention in the
syncfs() code.

Ah, it looks like this code used to treat fsync() errors as non-fatal, but
it was changed in commit 1420617. I still find it a bit strange that some
errors that prevent a file from being sync'd are non-fatal while others
_are_ fatal, but that is probably a topic for another thread.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#15Michael Paquier
michael@paquier.xyz
In reply to: Nathan Bossart (#14)
Re: should frontend tools use syncfs() ?

On Wed, Aug 16, 2023 at 08:23:25AM -0700, Nathan Bossart wrote:

Ah, it looks like this code used to treat fsync() errors as non-fatal, but
it was changed in commit 1420617. I still find it a bit strange that some
errors that prevent a file from being sync'd are non-fatal while others
_are_ fatal, but that is probably a topic for another thread.

Right. That rings a bell.
--
Michael

#16Michael Paquier
michael@paquier.xyz
In reply to: Nathan Bossart (#13)
Re: should frontend tools use syncfs() ?

On Wed, Aug 16, 2023 at 08:17:05AM -0700, Nathan Bossart wrote:

On Wed, Aug 16, 2023 at 08:10:10AM +0900, Michael Paquier wrote:

On Tue, Aug 08, 2023 at 01:06:06PM -0700, Nathan Bossart wrote:
+        else
+        {
+            while (errno = 0, (de = readdir(dir)) != NULL)
+            {
+                char        subpath[MAXPGPATH * 2];
+
+                if (strcmp(de->d_name, ".") == 0 ||
+                    strcmp(de->d_name, "..") == 0)
+                    continue;

It seems to me that there is no need to do that for in-place
tablespaces. There are relative paths in pg_tblspc/, so they would be
taken care of by the syncfs() done on the main data folder.

This does not really check if the mount points of each tablespace is
different, as well. For example, imagine that you have two
tablespaces within the same disk, syncfs() twice. Perhaps, the
current logic is OK anyway as long as the behavior is optional, but it
should be explained in the docs, at least.

True. But I don't know if checking the mount point of each tablespace is
worth the complexity.

Perhaps worth a note, this would depend on statvfs(), which is not
that portable the last time I looked at it (NetBSD, some BSD-ish? And
of course WIN32).

In the worst case, we'll call syncfs() on the same
file system a few times, which is probably still much faster in most cases.
FWIW this is what recovery_init_sync_method does today, and I'm not aware
of any complaints about this behavior.

Hmm. Okay.

The patch does have the following note:

+        On Linux, <literal>syncfs</literal> may be used instead to ask the
+        operating system to synchronize the whole file systems that contain the
+        data directory, the WAL files, and each tablespace.

Do you think that is sufficient, or do you think we should really clearly
explain that you could end up calling syncfs() on the same file system a
few times if your tablespaces are on the same disk? I personally feel
like that'd be a bit too verbose for the already lengthy descriptions of
this setting.

It does not hurt to mention that the code syncfs()-es each tablespace
path (not in-place tablespaces), ignoring locations that share the
same mounting point, IMO. For that, we'd better rely on
get_dirent_type() like the normal sync path.

I'm finding a bit confusing that fsync_pgdata() is coded in such a way
that it does a silent fallback to the cascading syncs through
walkdir() when syncfs is specified but not available in the build.
Perhaps an error is more helpful because one would then know that they
are trying something that's not around?

If syncfs() is not available, SYNC_METHOD_SYNCFS won't even be defined, and
parse_sync_method() should fail if "syncfs" is specified. Furthermore, the
relevant part of fsync_pgdata() won't be compiled in whenever HAVE_SYNCFS
is not defined.

That feels structurally inconsistent with what we do with other
option sets that have library dependencies. For example, look at
compression.h and what happens for pg_compress_algorithm. So, it
seems to me that it would be more friendly to list SYNC_METHOD_SYNCFS
all the time in SyncMethod even if HAVE_SYNCFS is not around, and at
least generate a warning rather than having a platform-dependent set
of options?

SyncMethod may be a bit too generic as name for the option structure.
How about a PGSyncMethod or pg_sync_method?

I am a bit concerned about the amount of duplication this patch
introduces in the docs. Perhaps this had better be moved into a new
section of the docs to explain the tradeoffs, with each tool linking
to it?

Yeah, this crossed my mind. Do you know of any existing examples of
options with links to a common section? One problem with this approach is
that there are small differences in the wording for some of the frontend
utilities, so it might be difficult to cleanly unite these sections.

The closest thing I can think of is Color Support in section
Appendixes, that describes something shared across a lot of binaries
(that would be 6 tools with this patch).

Do we actually need --no-sync at all if --sync-method is around? We
could have an extra --sync-method=none at option level with --no-sync
still around mainly for compatibility? Or perhaps that's just
over-designing things?

I don't have a strong opinion. We could take up deprecating --no-sync in a
follow-up thread, though. Like you said, we'll probably need to keep it
around for backward compatibility, so it might not be worth the trouble.

Okay, maybe that's not worth it.
--
Michael

#17Nathan Bossart
nathandbossart@gmail.com
In reply to: Michael Paquier (#16)
Re: should frontend tools use syncfs() ?

On Thu, Aug 17, 2023 at 12:50:31PM +0900, Michael Paquier wrote:

On Wed, Aug 16, 2023 at 08:17:05AM -0700, Nathan Bossart wrote:

The patch does have the following note:

+        On Linux, <literal>syncfs</literal> may be used instead to ask the
+        operating system to synchronize the whole file systems that contain the
+        data directory, the WAL files, and each tablespace.

Do you think that is sufficient, or do you think we should really clearly
explain that you could end up calling syncfs() on the same file system a
few times if your tablespaces are on the same disk? I personally feel
like that'd be a bit too verbose for the already lengthy descriptions of
this setting.

It does not hurt to mention that the code syncfs()-es each tablespace
path (not in-place tablespaces), ignoring locations that share the
same mounting point, IMO. For that, we'd better rely on
get_dirent_type() like the normal sync path.

But it doesn't ignore tablespace locations that share the same mount point.
It simply calls syncfs() for each tablespace path, just like
recovery_init_sync_method.

If syncfs() is not available, SYNC_METHOD_SYNCFS won't even be defined, and
parse_sync_method() should fail if "syncfs" is specified. Furthermore, the
relevant part of fsync_pgdata() won't be compiled in whenever HAVE_SYNCFS
is not defined.

That feels structurally inconsistent with what we do with other
option sets that have library dependencies. For example, look at
compression.h and what happens for pg_compress_algorithm. So, it
seems to me that it would be more friendly to list SYNC_METHOD_SYNCFS
all the time in SyncMethod even if HAVE_SYNCFS is not around, and at
least generate a warning rather than having a platform-dependent set
of options?

Done.

SyncMethod may be a bit too generic as name for the option structure.
How about a PGSyncMethod or pg_sync_method?

In v4, I renamed this to DataDirSyncMethod and merged it with
RecoveryInitSyncMethod. I'm not wedded to the name, but that seemed
generic enough for both use-cases. As an aside, we need to be careful to
distinguish these options from those for wal_sync_method.

Yeah, this crossed my mind. Do you know of any existing examples of
options with links to a common section? One problem with this approach is
that there are small differences in the wording for some of the frontend
utilities, so it might be difficult to cleanly unite these sections.

The closest thing I can think of is Color Support in section
Appendixes, that describes something shared across a lot of binaries
(that would be 6 tools with this patch).

If I added a "syncfs() Caveats" appendix for the common parts of the docs,
it would only say something like the following:

Using syncfs may be a lot faster than using fsync, because it doesn't
need to open each file one by one. On the other hand, it may be slower
if a file system is shared by other applications that modify a lot of
files, since those files will also be written to disk. Furthermore, on
versions of Linux before 5.8, I/O errors encountered while writing data
to disk may not be reported to the calling program, and relevant error
messages may appear only in kernel logs.

Does that seem reasonable? It would reduce the duplication a little bit,
but I'm not sure it's really much of an improvement in this case.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v4-0001-allow-syncfs-in-frontend-utilities.patchtext/x-diff; charset=us-asciiDownload+388-33
#18Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#16)
Re: should frontend tools use syncfs() ?

On Wed, Aug 16, 2023 at 11:50 PM Michael Paquier <michael@paquier.xyz> wrote:

Do we actually need --no-sync at all if --sync-method is around? We
could have an extra --sync-method=none at option level with --no-sync
still around mainly for compatibility? Or perhaps that's just
over-designing things?

I don't have a strong opinion. We could take up deprecating --no-sync in a
follow-up thread, though. Like you said, we'll probably need to keep it
around for backward compatibility, so it might not be worth the trouble.

Okay, maybe that's not worth it.

Doesn't seem worth it to me. I think --no-sync is more intuitive than
--sync-method=none, it's certainly shorter, and it's a pretty
important setting because we use it when running the regression tests.

--
Robert Haas
EDB: http://www.enterprisedb.com

#19Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#18)
Re: should frontend tools use syncfs() ?

On Mon, Aug 21, 2023 at 04:08:46PM -0400, Robert Haas wrote:

Doesn't seem worth it to me. I think --no-sync is more intuitive than
--sync-method=none, it's certainly shorter, and it's a pretty
important setting because we use it when running the regression tests.

No arguments against that ;)
--
Michael

#20Michael Paquier
michael@paquier.xyz
In reply to: Nathan Bossart (#17)
Re: should frontend tools use syncfs() ?

On Fri, Aug 18, 2023 at 09:01:11AM -0700, Nathan Bossart wrote:

On Thu, Aug 17, 2023 at 12:50:31PM +0900, Michael Paquier wrote:

SyncMethod may be a bit too generic as name for the option structure.
How about a PGSyncMethod or pg_sync_method?

In v4, I renamed this to DataDirSyncMethod and merged it with
RecoveryInitSyncMethod. I'm not wedded to the name, but that seemed
generic enough for both use-cases. As an aside, we need to be careful to
distinguish these options from those for wal_sync_method.

Okay.

Yeah, this crossed my mind. Do you know of any existing examples of
options with links to a common section? One problem with this approach is
that there are small differences in the wording for some of the frontend
utilities, so it might be difficult to cleanly unite these sections.

The closest thing I can think of is Color Support in section
Appendixes, that describes something shared across a lot of binaries
(that would be 6 tools with this patch).

If I added a "syncfs() Caveats" appendix for the common parts of the docs,
it would only say something like the following:

Using syncfs may be a lot faster than using fsync, because it doesn't
need to open each file one by one. On the other hand, it may be slower
if a file system is shared by other applications that modify a lot of
files, since those files will also be written to disk. Furthermore, on
versions of Linux before 5.8, I/O errors encountered while writing data
to disk may not be reported to the calling program, and relevant error
messages may appear only in kernel logs.

Does that seem reasonable? It would reduce the duplication a little bit,
but I'm not sure it's really much of an improvement in this case.

This would cut 60% (?) of the documentation added by the patch for
these six tools, so that looks like an improvement to me. Perhaps
other may disagree, so more opinions are welcome.

--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -43,15 +43,11 @@
 #ifndef FD_H
 #define FD_H
+#ifndef FRONTEND
+
 #include <dirent.h>
 #include <fcntl.h>

Ugh. So you need this part because pg_rewind's filemap.c includes
fd.h, and pg_rewind also needs file_utils.h. This is not the fault of
your patch, but this does not make the situation better, either.. It
looks like we need to think harder about this layer. An improvement
would be to split file_utils.c so as its frontend-only code is moved
to OBJS_FRONTEND in a new file with a new header? It should be OK to
keep DataDirSyncMethod in file_utils.h as long as the split is clean.
--
Michael

#21Nathan Bossart
nathandbossart@gmail.com
In reply to: Michael Paquier (#20)
#22Michael Paquier
michael@paquier.xyz
In reply to: Nathan Bossart (#21)
#23Nathan Bossart
nathandbossart@gmail.com
In reply to: Michael Paquier (#22)
#24Michael Paquier
michael@paquier.xyz
In reply to: Nathan Bossart (#23)
#25Nathan Bossart
nathandbossart@gmail.com
In reply to: Michael Paquier (#24)
#26Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#25)
#27Michael Paquier
michael@paquier.xyz
In reply to: Nathan Bossart (#26)
#28Nathan Bossart
nathandbossart@gmail.com
In reply to: Michael Paquier (#27)
#29Michael Paquier
michael@paquier.xyz
In reply to: Nathan Bossart (#28)
#30Nathan Bossart
nathandbossart@gmail.com
In reply to: Michael Paquier (#29)
#31Michael Paquier
michael@paquier.xyz
In reply to: Nathan Bossart (#30)
#32Nathan Bossart
nathandbossart@gmail.com
In reply to: Michael Paquier (#31)
#33Justin Pryzby
pryzby@telsasoft.com
In reply to: Nathan Bossart (#7)
#34Nathan Bossart
nathandbossart@gmail.com
In reply to: Justin Pryzby (#33)
#35Justin Pryzby
pryzby@telsasoft.com
In reply to: Nathan Bossart (#34)
#36Nathan Bossart
nathandbossart@gmail.com
In reply to: Justin Pryzby (#35)
#37Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#36)
#38Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#37)
#39Maxim Orlov
orlovmg@gmail.com
In reply to: Nathan Bossart (#38)
#40Nathan Bossart
nathandbossart@gmail.com
In reply to: Maxim Orlov (#39)
#41Maxim Orlov
orlovmg@gmail.com
In reply to: Nathan Bossart (#40)
#42Peter Eisentraut
peter_e@gmx.net
In reply to: Michael Paquier (#16)
#43Nathan Bossart
nathandbossart@gmail.com
In reply to: Peter Eisentraut (#42)
#44Maxim Orlov
orlovmg@gmail.com
In reply to: Nathan Bossart (#43)
#45Nathan Bossart
nathandbossart@gmail.com
In reply to: Maxim Orlov (#44)
#46Maxim Orlov
orlovmg@gmail.com
In reply to: Nathan Bossart (#45)
#47Nathan Bossart
nathandbossart@gmail.com
In reply to: Maxim Orlov (#46)
#48Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#47)
#49Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#43)
#50Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#48)