remove more archiving overhead
Hi hackers,
With the addition of archive modules (5ef1eef) and other archiving
improvements (e.g., beb4e9b), the amount of archiving overhead has been
reduced. I spent some time measuring the remaining overhead, and the
following function stood out:
/*
 * pgarch_archiveDone
 *
 * Emit notification that an xlog file has been successfully archived.
 * We do this by renaming the status file from NNN.ready to NNN.done.
 * Eventually, a checkpoint process will notice this and delete both the
 * NNN.done file and the xlog file itself.
 */
static void
pgarch_archiveDone(char *xlog)
{
	char		rlogready[MAXPGPATH];
	char		rlogdone[MAXPGPATH];

	StatusFilePath(rlogready, xlog, ".ready");
	StatusFilePath(rlogdone, xlog, ".done");
	(void) durable_rename(rlogready, rlogdone, WARNING);
}
Specifically, the call to durable_rename(), which calls fsync() a few
times, can cause a decent delay. On my machine, archiving ~12k WAL files
using a bare-bones archive module that did nothing but return true took
over a minute. When I changed this to rename() (i.e., reverting the change
to pgarch.c in 1d4a0ab), the same test took a second or less.
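To make the cost concrete, here is a rough sketch of what a durable rename
involves on POSIX systems (my simplification; the real durable_rename() in
fd.c has full error handling and also syncs a pre-existing target file):

#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* fsync a file or directory by path; returns 0 on success, -1 on error */
static int
fsync_path(const char *path)
{
	int			fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fsync(fd) < 0)
	{
		close(fd);
		return -1;
	}
	return close(fd);
}

/* simplified sketch of a durable rename; not the actual fd.c code */
static int
durable_rename_sketch(const char *oldfile, const char *newfile)
{
	char		dir[1024];

	if (fsync_path(oldfile) < 0)		/* flush the source file's contents */
		return -1;
	if (rename(oldfile, newfile) < 0)	/* the rename itself */
		return -1;

	/* persist the directory entry change; this is the costly extra step */
	strncpy(dir, newfile, sizeof(dir) - 1);
	dir[sizeof(dir) - 1] = '\0';
	return fsync_path(dirname(dir));
}

The fsync() calls, particularly the directory sync at the end, are what a
plain rename() avoids.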
I also spent some time investigating whether durably renaming the archive
status files was even necessary. In theory, it shouldn't be. If a crash
happens before the rename is persisted, we might try to re-archive files,
but your archive_command/archive_library should handle that. If the file
was already recycled or removed, the archiver will skip it (thanks to
6d8727f). But when digging further, I found that WAL file recycling uses
durable_rename_excl(), which has the following note:
* Note that a crash in an unfortunate moment can leave you with two links to
* the target file.
IIUC this means that in theory, a crash at an unfortunate moment could
leave you with a .ready file, the file to archive, and another link to that
file with a "future" WAL filename. If you re-archive the file after it has
been reused, you could end up with corrupt WAL archives. I think this
might already be possible today. Syncing the directory after every rename
probably reduces the likelihood quite substantially, but if
RemoveOldXlogFiles() quickly picks up the .done file and attempts
recycling before durable_rename() calls fsync() on the directory,
presumably the same problem could occur.
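To illustrate that window, the link()-then-unlink() approach that
durable_rename_excl() uses looks roughly like this (a sketch, not the actual
fd.c code):

#include <stdio.h>
#include <unistd.h>

/*
 * Sketch of an "exclusive" rename built from link() and unlink().  A crash
 * between the two calls leaves both oldpath and newpath pointing at the
 * same inode, which is the two-link hazard described above.
 */
static int
rename_excl_sketch(const char *oldpath, const char *newpath)
{
	if (link(oldpath, newpath) < 0)	/* fails with EEXIST if newpath exists */
		return -1;

	/* a crash here leaves two links to the same file */

	if (unlink(oldpath) < 0)
		return -1;
	return 0;
}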
I believe the current recommendation is to fail if the archive file already
exists. The basic_archive contrib module fails if the archive already
exists and has different contents, and it succeeds without re-archiving if
it already exists with identical contents. Since something along these
lines is recommended as a best practice, perhaps this is okay. But I think
we might be able to do better. If we ensure that the archive_status
directory is sync'd prior to recycling/removing a WAL file, I believe we
are safe from the aforementioned problem.
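As a rough illustration of that policy (a hypothetical sketch, not the
actual basic_archive code), the content check could look like:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if files a and b both exist and have identical contents.
 * Hypothetical helper for a "succeed if identical, fail otherwise" policy.
 */
static bool
files_identical(const char *a, const char *b)
{
	FILE	   *fa = fopen(a, "rb");
	FILE	   *fb = fopen(b, "rb");
	bool		same = (fa != NULL && fb != NULL);

	while (same)
	{
		char		bufa[8192], bufb[8192];
		size_t		na = fread(bufa, 1, sizeof(bufa), fa);
		size_t		nb = fread(bufb, 1, sizeof(bufb), fb);

		if (na != nb || memcmp(bufa, bufb, na) != 0)
			same = false;
		else if (na == 0)
			break;				/* both at EOF with matching contents */
	}

	if (fa)
		fclose(fa);
	if (fb)
		fclose(fb);
	return same;
}

An archive callback built on such a helper would return true for an
already-archived identical file and false (requiring manual intervention)
when the contents differ.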
The drawback of this is that we are essentially offloading more work to the
checkpointer process. In addition to everything else it must do, it will
also need to fsync() the archive_status directory many times. I'm not sure
how big of a problem this is. It should be good to ensure that archiving
keeps up, and there are far more expensive tasks that the checkpointer must
perform that could make the added fsync() calls relatively insignificant.
I've attached a patch that makes the changes discussed above. Thoughts?
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-remove-more-archiving-overhead.patch (text/x-diff; charset=us-ascii)
From a6d033aff90a6218345cba41847ccfdfbe6447d7 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandbossart@gmail.com>
Date: Mon, 21 Feb 2022 16:29:14 -0800
Subject: [PATCH v1 1/1] remove more archiving overhead
---
src/backend/access/transam/xlogarchive.c | 15 +++++++++++++--
src/backend/postmaster/pgarch.c | 13 ++++++++++++-
2 files changed, 25 insertions(+), 3 deletions(-)
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index a2657a2005..d6823c9c38 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -585,8 +585,13 @@ XLogArchiveForceDone(const char *xlog)
* If it is not time to delete it, make sure a .ready file exists, and return
* false.
*
- * If <XLOG>.done exists, then return true; else if <XLOG>.ready exists,
- * then return false; else create <XLOG>.ready and return false.
+ * If <XLOG>.done exists, then sync the archive_status directory to disk and
+ * return true. Syncing the directory ensures that the rename from
+ * <XLOG>.ready to <XLOG>.done is persisted so that we won't attempt to re-
+ * archive the file after a crash.
+ *
+ * If <XLOG>.ready exists, then return false. If neither <XLOG>.ready nor
+ * <XLOG>.done exist, create <XLOG>.ready and return false.
*
* The reason we do things this way is so that if the original attempt to
* create <XLOG>.ready fails, we'll retry during subsequent checkpoints.
@@ -618,7 +623,10 @@ XLogArchiveCheckDone(const char *xlog)
/* First check for .done --- this means archiver is done with it */
StatusFilePath(archiveStatusPath, xlog, ".done");
if (stat(archiveStatusPath, &stat_buf) == 0)
+ {
+ fsync_fname(XLOGDIR "/archive_status", true);
return true;
+ }
/* check for .ready --- this means archiver is still busy with it */
StatusFilePath(archiveStatusPath, xlog, ".ready");
@@ -628,7 +636,10 @@ XLogArchiveCheckDone(const char *xlog)
/* Race condition --- maybe archiver just finished, so recheck */
StatusFilePath(archiveStatusPath, xlog, ".done");
if (stat(archiveStatusPath, &stat_buf) == 0)
+ {
+ fsync_fname(XLOGDIR "/archive_status", true);
return true;
+ }
/* Retry creation of the .ready file */
XLogArchiveNotify(xlog);
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index d916ed39a8..44cb58e1f4 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -746,7 +746,18 @@ pgarch_archiveDone(char *xlog)
StatusFilePath(rlogready, xlog, ".ready");
StatusFilePath(rlogdone, xlog, ".done");
- (void) durable_rename(rlogready, rlogdone, WARNING);
+
+ /*
+ * To reduce overhead, we don't durably rename the .ready file to .done
+ * here. Whoever recycles or removes the WAL file is responsible for first
+ * persisting the archive_status directory to disk so that we won't attempt
+ * to re-archive the file after a crash.
+ */
+ if (rename(rlogready, rlogdone) < 0)
+ ereport(WARNING,
+ (errcode_for_file_access(),
+ errmsg("could not rename file \"%s\" to \"%s\": %m",
+ rlogready, rlogdone)));
}
--
2.25.1
On Mon, Feb 21, 2022 at 05:19:48PM -0800, Nathan Bossart wrote:
I also spent some time investigating whether durably renaming the archive
status files was even necessary. In theory, it shouldn't be. If a crash
happens before the rename is persisted, we might try to re-archive files,
but your archive_command/archive_library should handle that. If the file
was already recycled or removed, the archiver will skip it (thanks to
6d8727f). But when digging further, I found that WAL file recycling uses
durable_rename_excl(), which has the following note:

* Note that a crash in an unfortunate moment can leave you with two links to
* the target file.

IIUC this means that in theory, a crash at an unfortunate moment could
leave you with a .ready file, the file to archive, and another link to that
file with a "future" WAL filename. If you re-archive the file after it has
been reused, you could end up with corrupt WAL archives. I think this
might already be possible today. Syncing the directory after every rename
probably reduces the likelihood quite substantially, but if
RemoveOldXlogFiles() quickly picks up the .done file and attempts
recycling before durable_rename() calls fsync() on the directory,
presumably the same problem could occur.
In my testing, I found that when I killed the server just before unlink()
during WAL recycling, I ended up with links to the same file in pg_wal after
restarting. My latest test produced links to the same file for the current
WAL file and the next one. Maybe WAL recycling should use durable_rename()
instead of durable_rename_excl().
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Tue, Feb 22, 2022 at 09:37:11AM -0800, Nathan Bossart wrote:
In my testing, I found that when I killed the server just before unlink()
during WAL recycling, I ended up with links to the same file in pg_wal after
restarting. My latest test produced links to the same file for the current
WAL file and the next one. Maybe WAL recycling should use durable_rename()
instead of durable_rename_excl().
Here is an updated patch.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
v2-0001-reduce-archiving-overhead.patch (text/x-diff; charset=us-ascii)
From e080d493985cf0a203122c1e07952b0f765019e4 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandbossart@gmail.com>
Date: Tue, 22 Feb 2022 11:39:28 -0800
Subject: [PATCH v2 1/1] reduce archiving overhead
---
src/backend/access/transam/xlog.c | 10 ++++++----
src/backend/postmaster/pgarch.c | 14 +++++++++++++-
2 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0d2bd7a357..2ad047052f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3334,13 +3334,15 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
}
/*
- * Perform the rename using link if available, paranoidly trying to avoid
- * overwriting an existing file (there shouldn't be one).
+ * Perform the rename. Ideally, we'd use link() and unlink() to avoid
+ * overwriting an existing file (there shouldn't be one). However, that
+ * approach opens up the possibility that pg_wal will contain multiple hard
+ * links to the same WAL file after a crash.
*/
- if (durable_rename_excl(tmppath, path, LOG) != 0)
+ if (durable_rename(tmppath, path, LOG) != 0)
{
LWLockRelease(ControlFileLock);
- /* durable_rename_excl already emitted log message */
+ /* durable_rename already emitted log message */
return false;
}
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index d916ed39a8..641297e9f5 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -746,7 +746,19 @@ pgarch_archiveDone(char *xlog)
StatusFilePath(rlogready, xlog, ".ready");
StatusFilePath(rlogdone, xlog, ".done");
- (void) durable_rename(rlogready, rlogdone, WARNING);
+
+ /*
+ * To avoid extra overhead, we don't durably rename the .ready file to
+ * .done. Archive commands and libraries must already gracefully handle
+ * attempts to re-archive files (e.g., if the server crashes just before
+ * this function is called), so it should be okay if the .ready file
+ * reappears after a crash.
+ */
+ if (rename(rlogready, rlogdone) < 0)
+ ereport(WARNING,
+ (errcode_for_file_access(),
+ errmsg("could not rename file \"%s\" to \"%s\": %m",
+ rlogready, rlogdone)));
}
--
2.25.1
At Mon, 21 Feb 2022 17:19:48 -0800, Nathan Bossart <nathandbossart@gmail.com> wrote in
I also spent some time investigating whether durably renaming the archive
status files was even necessary. In theory, it shouldn't be. If a crash
happens before the rename is persisted, we might try to re-archive files,
but your archive_command/archive_library should handle that. If the file
I believe we're trying to archive a WAL segment once and only once, even
though in practice we fail to achieve that in certain cases.
https://www.postgresql.org/docs/14/continuous-archiving.html
| The archive command should generally be designed to refuse to
| overwrite any pre-existing archive file. This is an important safety
| feature to preserve the integrity of your archive in case of
| administrator error (such as sending the output of two different
| servers to the same archive directory).
The recommended behavior of archive_command above is to avoid this
kind of corruption. If we officially admit that a WAL segment can be
archived twice, we need to revise that part (and the following part) of
the doc.
was already recycled or removed, the archiver will skip it (thanks to
6d8727f). But when digging further, I found that WAL file recycling uses
durable_rename_excl(), which has the following note:

* Note that a crash in an unfortunate moment can leave you with two links to
* the target file.

IIUC this means that in theory, a crash at an unfortunate moment could
leave you with a .ready file, the file to archive, and another link to that
file with a "future" WAL filename. If you re-archive the file after it has
been reused, you could end up with corrupt WAL archives. I think this
IMHO the difference would be that the single system call (maybe) cannot be
interrupted by SIGKILL, so I'm not sure this is worth considering. The
SIGKILL window could be eliminated if we used renameat2(RENAME_NOREPLACE),
but ultimately a power loss could still cause it.
might already be possible today. Syncing the directory after every rename
probably reduces the likelihood quite substantially, but if
RemoveOldXlogFiles() quickly picks up the .done file and attempts
recycling before durable_rename() calls fsync() on the directory,
presumably the same problem could occur.
If we don't fsync at every rename, RemoveOldXlogFiles doesn't even need to
pick up the .done file very quickly for this to happen. Isn't that a big
difference?
I believe the current recommendation is to fail if the archive file already
exists. The basic_archive contrib module fails if the archive already
exists and has different contents, and it succeeds without re-archiving if
it already exists with identical contents. Since something along these
lines is recommended as a best practice, perhaps this is okay. But I think
we might be able to do better. If we ensure that the archive_status
directory is sync'd prior to recycling/removing a WAL file, I believe we
are safe from the aforementioned problem.

The drawback of this is that we are essentially offloading more work to the
checkpointer process. In addition to everything else it must do, it will
also need to fsync() the archive_status directory many times. I'm not sure
how big of a problem this is. It should be good to ensure that archiving
keeps up, and there are far more expensive tasks that the checkpointer must
perform that could make the added fsync() calls relatively insignificant.
I don't think we want to make the checkpointer slower in exchange for making
the archiver faster. Perhaps we need to work from condensed information
rather than repeatedly scanning over distributed information like
archive_status?
At Tue, 22 Feb 2022 11:52:29 -0800, Nathan Bossart <nathandbossart@gmail.com> wrote in
On Tue, Feb 22, 2022 at 09:37:11AM -0800, Nathan Bossart wrote:
In my testing, I found that when I killed the server just before unlink()
during WAL recycling, I ended up with links to the same file in pg_wal after
restarting. My latest test produced links to the same file for the current
WAL file and the next one. Maybe WAL recycling should use durable_rename()
instead of durable_rename_excl().

Here is an updated patch.
Is it intentional that v2 loses the xlogarchive.c part?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Thanks for taking a look!
On Thu, Feb 24, 2022 at 02:13:49PM +0900, Kyotaro Horiguchi wrote:
https://www.postgresql.org/docs/14/continuous-archiving.html
| The archive command should generally be designed to refuse to
| overwrite any pre-existing archive file. This is an important safety
| feature to preserve the integrity of your archive in case of
| administrator error (such as sending the output of two different
servers to the same archive directory).

The recommended behavior of archive_command above is to avoid this
kind of corruption. If we officially admit that a WAL segment can be
archived twice, we need to revise that part (and the following part) of
the doc.
Yeah, we might want to recommend succeeding if the archive already exists
with identical contents (like basic_archive does). IMO this would best be
handled in a separate thread that is solely focused on documentation
updates. AFAICT this is an existing problem.
IIUC this means that in theory, a crash at an unfortunate moment could
leave you with a .ready file, the file to archive, and another link to that
file with a "future" WAL filename. If you re-archive the file after it has
been reused, you could end up with corrupt WAL archives. I think this

IMHO the difference would be that the single system call (maybe) cannot be
interrupted by SIGKILL, so I'm not sure this is worth considering. The
SIGKILL window could be eliminated if we used renameat2(RENAME_NOREPLACE),
but ultimately a power loss could still cause it.
Perhaps durable_rename_excl() could use renameat2() for systems that
support it. However, the problem would still exist for systems that have
link() but not renameat2(). In this particular case, there really
shouldn't be an existing file in pg_wal.
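For illustration, something like the following (a sketch only; renameat2()
requires Linux 3.15+ and glibc 2.28+, so a fallback would still be needed
elsewhere):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>

/*
 * Sketch: rename oldpath to newpath, failing with EEXIST if newpath already
 * exists, with no window in which two links to the file are visible.
 */
static int
rename_noreplace(const char *oldpath, const char *newpath)
{
	return renameat2(AT_FDCWD, oldpath, AT_FDCWD, newpath, RENAME_NOREPLACE);
}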
might already be possible today. Syncing the directory after every rename
probably reduces the likelihood quite substantially, but if
RemoveOldXlogFiles() quickly picks up the .done file and attempts
recycling before durable_rename() calls fsync() on the directory,
presumably the same problem could occur.

If we don't fsync at every rename, RemoveOldXlogFiles doesn't even need to
pick up the .done file very quickly for this to happen. Isn't that a big
difference?
Yeah, it probably increases the chances quite a bit.
I don't think we want to make the checkpointer slower in exchange for making
the archiver faster. Perhaps we need to work from condensed information
rather than repeatedly scanning over distributed information like
archive_status?
v2 avoids giving the checkpointer more work.
At Tue, 22 Feb 2022 11:52:29 -0800, Nathan Bossart <nathandbossart@gmail.com> wrote in
On Tue, Feb 22, 2022 at 09:37:11AM -0800, Nathan Bossart wrote:
In my testing, I found that when I killed the server just before unlink()
during WAL recycling, I ended up with links to the same file in pg_wal after
restarting. My latest test produced links to the same file for the current
WAL file and the next one. Maybe WAL recycling should use durable_rename()
instead of durable_rename_excl().

Here is an updated patch.
Is it intentional that v2 loses the xlogarchive.c part?
Yes. I found that a crash at an unfortunate moment can produce multiple
links to the same file in pg_wal, which seemed bad independent of archival.
By fixing that (i.e., switching from durable_rename_excl() to
durable_rename()), we not only avoid this problem, but we also avoid trying
to archive a file the server is concurrently writing. Then, after a crash,
the WAL file to archive should either not exist (which is handled by the
archiver) or contain the same contents as any preexisting archives.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
I tested this again with many more WAL files and a much larger machine
(r5d.24xlarge with data directory on an NVMe SSD instance store volume).
As before, I am using a bare-bones archive module that does nothing but
return true. Without the patch, archiving ~120k WAL files took about 109
seconds. With the patch, it took around 62 seconds, which amounts to a
~43% reduction in overhead.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Feb 24, 2022 at 09:55:53AM -0800, Nathan Bossart wrote:
Yes. I found that a crash at an unfortunate moment can produce multiple
links to the same file in pg_wal, which seemed bad independent of archival.
By fixing that (i.e., switching from durable_rename_excl() to
durable_rename()), we not only avoid this problem, but we also avoid trying
to archive a file the server is concurrently writing. Then, after a crash,
the WAL file to archive should either not exist (which is handled by the
archiver) or contain the same contents as any preexisting archives.
I moved the fix for this to a new thread [0] since I think it should be
back-patched. I've attached a new patch that only contains the part
related to reducing archiving overhead.
[0]: /messages/by-id/20220407182954.GA1231544@nathanxps13
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
v3-0001-Reduce-overhead-of-renaming-archive-status-files.patch (text/x-diff; charset=us-ascii)
From 6d8972412179fd3c82bb7e471966d4b7862bcbff Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandbossart@gmail.com>
Date: Thu, 7 Apr 2022 14:11:59 -0700
Subject: [PATCH v3 1/1] Reduce overhead of renaming archive status files.
Presently, archive status files are durably renamed from .ready to
.done to indicate that a file has been archived. Persisting this
rename to disk accounts for a significant amount of the overhead
associated with archiving. While durably renaming the file
prevents re-archiving in most cases, archive commands and libraries
must already gracefully handle attempts to re-archive the last
archived file after a crash (e.g., a crash immediately after
archive_command exits but before the server renames the status
file).
This change reduces the amount of overhead associated with
archiving by using rename() instead of durable_rename() to rename
the archive status files. As a consequence, the server is more
likely to attempt to re-archive files after a crash, but as noted
above, archive commands and modules are already expected to handle
this. It is also possible that the server will attempt to re-
archive files that have been removed or recycled, but the archiver
already handles this, too.
Author: Nathan Bossart
---
src/backend/postmaster/pgarch.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 0c8ca29f73..17a4151c01 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -746,7 +746,19 @@ pgarch_archiveDone(char *xlog)
StatusFilePath(rlogready, xlog, ".ready");
StatusFilePath(rlogdone, xlog, ".done");
- (void) durable_rename(rlogready, rlogdone, WARNING);
+
+ /*
+ * To avoid extra overhead, we don't durably rename the .ready file to
+ * .done. Archive commands and libraries must gracefully handle attempts
+ * to re-archive files (e.g., if the server crashes just before this
+ * function is called), so it should be okay if the .ready file reappears
+ * after a crash.
+ */
+ if (rename(rlogready, rlogdone) < 0)
+ ereport(WARNING,
+ (errcode_for_file_access(),
+ errmsg("could not rename file \"%s\" to \"%s\": %m",
+ rlogready, rlogdone)));
}
--
2.25.1
On Thu, Apr 7, 2022 at 6:23 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Thu, Feb 24, 2022 at 09:55:53AM -0800, Nathan Bossart wrote:
Yes. I found that a crash at an unfortunate moment can produce multiple
links to the same file in pg_wal, which seemed bad independent of archival.
By fixing that (i.e., switching from durable_rename_excl() to
durable_rename()), we not only avoid this problem, but we also avoid trying
to archive a file the server is concurrently writing. Then, after a crash,
the WAL file to archive should either not exist (which is handled by the
archiver) or contain the same contents as any preexisting archives.

I moved the fix for this to a new thread [0] since I think it should be
back-patched. I've attached a new patch that only contains the part
related to reducing archiving overhead.
While we're now after the feature freeze and thus this will need to
wait for v16, it looks like a reasonable change to me.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Apr 08, 2022 at 10:20:27AM -0400, Robert Haas wrote:
On Thu, Apr 7, 2022 at 6:23 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Thu, Feb 24, 2022 at 09:55:53AM -0800, Nathan Bossart wrote:
Yes. I found that a crash at an unfortunate moment can produce multiple
links to the same file in pg_wal, which seemed bad independent of archival.
By fixing that (i.e., switching from durable_rename_excl() to
durable_rename()), we not only avoid this problem, but we also avoid trying
to archive a file the server is concurrently writing. Then, after a crash,
the WAL file to archive should either not exist (which is handled by the
archiver) or contain the same contents as any preexisting archives.

I moved the fix for this to a new thread [0] since I think it should be
back-patched. I've attached a new patch that only contains the part
related to reducing archiving overhead.

While we're now after the feature freeze and thus this will need to
wait for v16, it looks like a reasonable change to me.
Dang, just missed it. Thanks for taking a look.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On 2022/04/08 7:23, Nathan Bossart wrote:
On Thu, Feb 24, 2022 at 09:55:53AM -0800, Nathan Bossart wrote:
Yes. I found that a crash at an unfortunate moment can produce multiple
links to the same file in pg_wal, which seemed bad independent of archival.
By fixing that (i.e., switching from durable_rename_excl() to
durable_rename()), we not only avoid this problem, but we also avoid trying
to archive a file the server is concurrently writing. Then, after a crash,
the WAL file to archive should either not exist (which is handled by the
archiver) or contain the same contents as any preexisting archives.

I moved the fix for this to a new thread [0] since I think it should be
back-patched. I've attached a new patch that only contains the part
related to reducing archiving overhead.
Thanks for updating the patch. It looks good to me.
Barring any objection, I'm thinking to commit it.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Thu, Jul 7, 2022 at 10:03 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Thanks for updating the patch. It looks good to me.
Barring any objection, I'm thinking to commit it.
I don't object, but I just started to wonder whether the need to
handle re-archiving of the same file cleanly is as well-documented as
it ought to be.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 7/7/22 10:37, Robert Haas wrote:
On Thu, Jul 7, 2022 at 10:03 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Thanks for updating the patch. It looks good to me.
Barring any objection, I'm thinking to commit it.

I don't object, but I just started to wonder whether the need to
handle re-archiving of the same file cleanly is as well-documented as
it ought to be.
+1, but I don't think that needs to stand in the way of this patch,
which looks sensible to me as-is. I think that's what you meant, but
just wanted to be sure.
There are plenty of ways that already-archived WAL might get archived
again and this is just one of them.
Thoughts, Nathan?
Regards,
-David
On Thu, Jul 07, 2022 at 10:46:23AM -0400, David Steele wrote:
On 7/7/22 10:37, Robert Haas wrote:
On Thu, Jul 7, 2022 at 10:03 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Thanks for updating the patch. It looks good to me.
Barring any objection, I'm thinking to commit it.

I don't object, but I just started to wonder whether the need to
handle re-archiving of the same file cleanly is as well-documented as
it ought to be.

+1, but I don't think that needs to stand in the way of this patch, which
looks sensible to me as-is. I think that's what you meant, but just wanted
to be sure.
Yeah, this seems like something that should be documented. I can pick this
up. I believe this is an existing problem, but this patch could make it
more likely.
There are plenty of ways that already-archived WAL might get archived again
and this is just one of them.
What are some of the others? I was aware of the case that was fixed in
ff9f111, where we might try to re-archive a file with different contents,
but I'm curious what other ways you've seen this happen.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Jul 07, 2022 at 09:18:25AM -0700, Nathan Bossart wrote:
On Thu, Jul 07, 2022 at 10:46:23AM -0400, David Steele wrote:
On 7/7/22 10:37, Robert Haas wrote:
I don't object, but I just started to wonder whether the need to
handle re-archiving of the same file cleanly is as well-documented as
it ought to be.

+1, but I don't think that needs to stand in the way of this patch, which
looks sensible to me as-is. I think that's what you meant, but just wanted
to be sure.

Yeah, this seems like something that should be documented. I can pick this
up. I believe this is an existing problem, but this patch could make it
more likely.
Here is a first try at documenting this. I'm not thrilled about the
placement, since it feels a bit buried in the backup docs, but this is
where this sort of thing lives today. It also seems odd to stress the
importance of avoiding overwriting pre-existing archives in case multiple
servers are archiving to the same place while only offering solutions with
obvious race conditions. Even basic_archive is subject to this now that
durable_rename_excl() no longer exists. Perhaps we should make a note of
that, too.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-add-note-about-re-archiving-in-docs.patch (text/x-diff; charset=us-ascii)
From 2b563823ca8dc57ea386832ea0b8c4c472645117 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandbossart@gmail.com>
Date: Thu, 7 Jul 2022 10:43:30 -0700
Subject: [PATCH v1 1/1] add note about re-archiving in docs
---
doc/src/sgml/backup.sgml | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 73a774d3d7..dd1c360455 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -685,10 +685,24 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0
</para>
<para>
- It is advisable to test your proposed archive library to ensure that it
- indeed does not overwrite an existing file, <emphasis>and that it returns
- <literal>false</literal> in this case</emphasis>.
- The example command above for Unix ensures this by including a separate
+ In rare cases, <productname>PostgreSQL</productname> may attempt to
+ re-archive a WAL file that was previously archived. For example, if the
+ system crashes before the server makes a durable record of archival success,
+ the server will attempt to archive the file again after restarting (provided
+ archiving is still enabled). It is advisable to test your proposed archive
+ library to ensure that it indeed does not overwrite an existing file. When
+ a pre-existing file is encountered, the archive library may return
+ <literal>true</literal> if the WAL file has identical contents to the
+ pre-existing archive. Alternatively, the archive library can return
+ <literal>false</literal> anytime a pre-existing file is encountered, but
+ this will require manual action by an administrator to resolve. If a
+ pre-existing file contains different contents than the WAL file being
+ archived, the archive library <emphasis>must</emphasis> return false.
+ </para>
+
+ <para>
+ The example command above for Unix avoids overwriting a pre-existing archive
+ by including a separate
<command>test</command> step. On some Unix platforms, <command>cp</command> has
switches such as <option>-i</option> that can be used to do the same thing
less verbosely, but you should not rely on these without verifying that
--
2.25.1
On 7/7/22 12:18, Nathan Bossart wrote:
On Thu, Jul 07, 2022 at 10:46:23AM -0400, David Steele wrote:
There are plenty of ways that already-archived WAL might get archived again
and this is just one of them.

What are some of the others? I was aware of the case that was fixed in
ff9f111, where we might try to re-archive a file with different contents,
but I'm curious what other ways you've seen this happen.
On the PG side, crashes and (IIRC) immediate shutdown.
In general, any failure in the archiver itself. Storage, memory,
network, etc. There are plenty of ways that the file might make it to
storage but postgres never gets notified, so it will retry.
Any archiver that is not tolerant of this fact is not going to be very
useful and this patch only makes it slightly more true.
Regards,
-David
On Thu, Jul 07, 2022 at 02:07:26PM -0400, David Steele wrote:
On 7/7/22 12:18, Nathan Bossart wrote:
On Thu, Jul 07, 2022 at 10:46:23AM -0400, David Steele wrote:
There are plenty of ways that already-archived WAL might get archived again
and this is just one of them.

What are some of the others? I was aware of the case that was fixed in
ff9f111, where we might try to re-archive a file with different contents,
but I'm curious what other ways you've seen this happen.

On the PG side, crashes and (IIRC) immediate shutdown.
In general, any failure in the archiver itself. Storage, memory, network,
etc. There are plenty of ways that the file might make it to storage but
postgres never gets notified, so it will retry.

Any archiver that is not tolerant of this fact is not going to be very
useful and this patch only makes it slightly more true.
Ah, got it, makes sense.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Thu, Jul 07, 2022 at 10:51:42AM -0700, Nathan Bossart wrote:
+ library to ensure that it indeed does not overwrite an existing file. When
+ a pre-existing file is encountered, the archive library may return
+ <literal>true</literal> if the WAL file has identical contents to the
+ pre-existing archive. Alternatively, the archive library can return
This should likely say something about ensuring the pre-existing file is
persisted to disk before returning true.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On 7/7/22 14:22, Nathan Bossart wrote:
On Thu, Jul 07, 2022 at 10:51:42AM -0700, Nathan Bossart wrote:
+ library to ensure that it indeed does not overwrite an existing file. When
+ a pre-existing file is encountered, the archive library may return
+ <literal>true</literal> if the WAL file has identical contents to the
+ pre-existing archive. Alternatively, the archive library can return

This should likely say something about ensuring the pre-existing file is
persisted to disk before returning true.
Agreed.
Also, I'd prefer to remove "indeed" from this sentence (yes, I see that
was in the original text):
+ library to ensure that it indeed does not overwrite an existing
Regards,
-David
On Thu, Jul 07, 2022 at 05:06:13PM -0400, David Steele wrote:
On 7/7/22 14:22, Nathan Bossart wrote:
On Thu, Jul 07, 2022 at 10:51:42AM -0700, Nathan Bossart wrote:
+ library to ensure that it indeed does not overwrite an existing file. When
+ a pre-existing file is encountered, the archive library may return
+ <literal>true</literal> if the WAL file has identical contents to the
+ pre-existing archive. Alternatively, the archive library can return

This should likely say something about ensuring the pre-existing file is
persisted to disk before returning true.

Agreed.

Also, I'd prefer to remove "indeed" from this sentence (yes, I see that was
in the original text):

+ library to ensure that it indeed does not overwrite an existing
Here's an updated patch.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
v2-0001-add-note-about-re-archiving-in-docs.patch (text/x-diff; charset=us-ascii)
From 044962852ff86ffd768e124bff9883b6f7591b62 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandbossart@gmail.com>
Date: Thu, 7 Jul 2022 10:43:30 -0700
Subject: [PATCH v2 1/1] add note about re-archiving in docs
---
doc/src/sgml/backup.sgml | 24 +++++++++++++++++++-----
1 file changed, 19 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 73a774d3d7..4462bfe7a9 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -681,14 +681,28 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0
any pre-existing archive file. This is an important safety feature to
preserve the integrity of your archive in case of administrator error
(such as sending the output of two different servers to the same archive
- directory).
+ directory). It is advisable to test your proposed archive library to ensure
+ that it does not overwrite an existing file.
</para>
<para>
- It is advisable to test your proposed archive library to ensure that it
- indeed does not overwrite an existing file, <emphasis>and that it returns
- <literal>false</literal> in this case</emphasis>.
- The example command above for Unix ensures this by including a separate
+ In rare cases, <productname>PostgreSQL</productname> may attempt to
+ re-archive a WAL file that was previously archived. For example, if the
+ system crashes before the server makes a durable record of archival success,
+ the server will attempt to archive the file again after restarting (provided
+ archiving is still enabled). When an archive library encounters a
+ pre-existing file, it may return <literal>true</literal> if the WAL file has
+ identical contents to the pre-existing archive and the pre-existing archive
+ is fully persisted to storage. Alternatively, the archive library can
+ return <literal>false</literal> anytime a pre-existing file is encountered,
+ but this will require manual action by an administrator to resolve. If a
+ pre-existing file contains different contents than the WAL file being
+ archived, the archive library <emphasis>must</emphasis> return false.
+ </para>
+
+ <para>
+ The example command above for Unix avoids overwriting a pre-existing archive
+ by including a separate
<command>test</command> step. On some Unix platforms, <command>cp</command> has
switches such as <option>-i</option> that can be used to do the same thing
less verbosely, but you should not rely on these without verifying that
--
2.25.1
At Thu, 7 Jul 2022 15:07:16 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in
Here's an updated patch.
Thinking RFC'ish, the meaning of "may" and "must" is significant in
this description. On the other hand, it uses both "may" and "can", but
I think the difference between them is not significant, or the "can" there is
somewhat confusing. I think the "can" should be "may" here.
And since "must" is emphasized, doesn't "may" also need emphasis?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 7/7/22 21:56, Kyotaro Horiguchi wrote:
At Thu, 7 Jul 2022 15:07:16 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in
Here's an updated patch.
Thinking RFC'ish, the meaning of "may" and "must" is significant in
this description. On the other hand, it uses both "may" and "can", but
I think the difference between them is not significant, or the "can" there is
somewhat confusing. I think the "can" should be "may" here.
+1.
And since "must" is emphasized, doesn't "may" also needs emphasis?
I think emphasis only on must is fine.
Nathan, I don't see the language about being sure to persist to storage
here?
Regards,
-David
On Fri, Jul 08, 2022 at 08:20:09AM -0400, David Steele wrote:
On 7/7/22 21:56, Kyotaro Horiguchi wrote:
Thinking RFC'ish, the meaning of "may" and "must" is significant in
this description. On the other hand, it uses both "may" and "can", but
I think the difference between them is not significant, or the "can" there is
somewhat confusing. I think the "can" should be "may" here.

+1.
Done.
And since "must" is emphasized, doesn't "may" also needs emphasis?
I think emphasis only on must is fine.
Yeah, I wanted to emphasize the importance of returning false in this case.
Since it's okay to return true or false in the identical/persisted file
case, I didn't think it deserved emphasis.
Nathan, I don't see the language about being sure to persist to storage
here?
It's here:
When an archive library encounters a pre-existing file, it may return
true if the WAL file has identical contents to the pre-existing archive
and the pre-existing archive is fully persisted to storage.
Since you didn't catch it, I wonder if it needs improvement. At the very
least, perhaps we should note that one way to do the latter is to persist
it yourself before returning true, and we could point to basic_archive.c as
an example. However, I'm hesitant to make these docs too much more
complicated than they already are. WDYT?
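For instance, a module could do something like this before returning true
for an identical pre-existing archive (a hypothetical sketch, not the actual
basic_archive.c code):

#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Sketch: fsync an already-existing archive file before reporting success,
 * so an earlier non-durable copy isn't mistaken for a persisted one.  A
 * real module would likely sync the containing directory as well.
 */
static bool
persist_existing_archive(const char *archive_path)
{
	int			fd = open(archive_path, O_RDONLY);
	bool		ok;

	if (fd < 0)
		return false;
	ok = (fsync(fd) == 0);
	close(fd);
	return ok;
}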
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
v4-0001-Reduce-overhead-of-renaming-archive-status-files.patch (text/x-diff; charset=us-ascii)
From a32f95d3b129d386ce03c194bd6cddad2f92b76b Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandbossart@gmail.com>
Date: Thu, 7 Apr 2022 14:11:59 -0700
Subject: [PATCH v4 1/2] Reduce overhead of renaming archive status files.
Presently, archive status files are durably renamed from .ready to
.done to indicate that a file has been archived. Persisting this
rename to disk accounts for a significant amount of the overhead
associated with archiving. While durably renaming the file
prevents re-archiving in most cases, archive commands and libraries
must already gracefully handle attempts to re-archive the last
archived file after a crash (e.g., a crash immediately after
archive_command exits but before the server renames the status
file).
This change reduces the amount of overhead associated with
archiving by using rename() instead of durable_rename() to rename
the archive status files. As a consequence, the server is more
likely to attempt to re-archive files after a crash, but as noted
above, archive commands and modules are already expected to handle
this. It is also possible that the server will attempt to re-
archive files that have been removed or recycled, but the archiver
already handles this, too.
Author: Nathan Bossart
---
src/backend/postmaster/pgarch.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 25e31c42e1..6ce361707d 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -745,7 +745,19 @@ pgarch_archiveDone(char *xlog)
StatusFilePath(rlogready, xlog, ".ready");
StatusFilePath(rlogdone, xlog, ".done");
- (void) durable_rename(rlogready, rlogdone, WARNING);
+
+ /*
+ * To avoid extra overhead, we don't durably rename the .ready file to
+ * .done. Archive commands and libraries must gracefully handle attempts
+ * to re-archive files (e.g., if the server crashes just before this
+ * function is called), so it should be okay if the .ready file reappears
+ * after a crash.
+ */
+ if (rename(rlogready, rlogdone) < 0)
+ ereport(WARNING,
+ (errcode_for_file_access(),
+ errmsg("could not rename file \"%s\" to \"%s\": %m",
+ rlogready, rlogdone)));
}
--
2.25.1
v4-0002-add-note-about-re-archiving-in-docs.patch (text/x-diff; charset=us-ascii)
From 54e1225173211062940ed4c8130ba97c192a99ec Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandbossart@gmail.com>
Date: Thu, 7 Jul 2022 10:43:30 -0700
Subject: [PATCH v4 2/2] add note about re-archiving in docs
---
doc/src/sgml/backup.sgml | 24 +++++++++++++++++++-----
1 file changed, 19 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 73a774d3d7..04a1f94ad0 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -681,14 +681,28 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0
any pre-existing archive file. This is an important safety feature to
preserve the integrity of your archive in case of administrator error
(such as sending the output of two different servers to the same archive
- directory).
+ directory). It is advisable to test your proposed archive library to ensure
+ that it does not overwrite an existing file.
</para>
<para>
- It is advisable to test your proposed archive library to ensure that it
- indeed does not overwrite an existing file, <emphasis>and that it returns
- <literal>false</literal> in this case</emphasis>.
- The example command above for Unix ensures this by including a separate
+ In rare cases, <productname>PostgreSQL</productname> may attempt to
+ re-archive a WAL file that was previously archived. For example, if the
+ system crashes before the server makes a durable record of archival success,
+ the server will attempt to archive the file again after restarting (provided
+ archiving is still enabled). When an archive library encounters a
+ pre-existing file, it may return <literal>true</literal> if the WAL file has
+ identical contents to the pre-existing archive and the pre-existing archive
+ is fully persisted to storage. Alternatively, the archive library may
+ return <literal>false</literal> anytime a pre-existing file is encountered,
+ but this will require manual action by an administrator to resolve. If a
+ pre-existing file contains different contents than the WAL file being
+ archived, the archive library <emphasis>must</emphasis> return false.
+ </para>
+
+ <para>
+ The example command above for Unix avoids overwriting a pre-existing archive
+ by including a separate
<command>test</command> step. On some Unix platforms, <command>cp</command> has
switches such as <option>-i</option> that can be used to do the same thing
less verbosely, but you should not rely on these without verifying that
--
2.25.1
On 7/8/22 12:54, Nathan Bossart wrote:
On Fri, Jul 08, 2022 at 08:20:09AM -0400, David Steele wrote:
Nathan, I don't see the language about being sure to persist to storage
here?

It's here:
When an archive library encounters a pre-existing file, it may return
true if the WAL file has identical contents to the pre-existing archive
and the pre-existing archive is fully persisted to storage.

Since you didn't catch it, I wonder if it needs improvement. At the very
least, perhaps we should note that one way to do the latter is to persist
it yourself before returning true, and we could point to basic_archive.c as
an example. However, I'm hesitant to make these docs too much more
complicated than they already are. WDYT?
I think I wrote this before I'd had enough coffee. "fully persisted to
storage" can mean many things depending on the storage (Posix, CIFS, S3,
etc.) so I think this is fine. The basic_archive module is there for
people who would like implementation details for Posix.
Regards,
-David
On Fri, Jul 08, 2022 at 01:02:51PM -0400, David Steele wrote:
I think I wrote this before I'd had enough coffee. "fully persisted to
storage" can mean many things depending on the storage (Posix, CIFS, S3,
etc.) so I think this is fine. The basic_archive module is there for people
who would like implementation details for Posix.
Perhaps this section should warn against reading without sufficient
caffeination. :)
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On 2022/07/09 2:18, Nathan Bossart wrote:
On Fri, Jul 08, 2022 at 01:02:51PM -0400, David Steele wrote:
I think I wrote this before I'd had enough coffee. "fully persisted to
storage" can mean many things depending on the storage (Posix, CIFS, S3,
etc.) so I think this is fine. The basic_archive module is there for people
who would like implementation details for Posix.
Yes. But some users might want to use basic_archive without any changes.
For them, how about documenting how basic_archive works in basic_archive docs?
I'm just thinking that it's worth exposing the information in the comment at
the top of basic_archive.c.
Anyway, since the patches look good to me, I pushed them. Thanks a lot!
If necessary, we can keep improving the docs later.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Tue, Jul 26, 2022 at 04:26:23PM +0900, Fujii Masao wrote:
Anyway, since the patches look good to me, I pushed them. Thanks a lot!
If necessary, we can keep improving the docs later.
Thanks!
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Fri, Jul 08, 2022 at 09:54:50AM -0700, Nathan Bossart wrote:
Since it's okay to return true or false in the identical/persisted file
case, I didn't think it deserved emphasis.
I think returning false is not-okay:
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -681,14 +681,28 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0
any pre-existing archive file. This is an important safety feature to
preserve the integrity of your archive in case of administrator error
(such as sending the output of two different servers to the same archive
- directory).
+ directory). It is advisable to test your proposed archive library to ensure
+ that it does not overwrite an existing file.
</para>
<para>
- It is advisable to test your proposed archive library to ensure that it
- indeed does not overwrite an existing file, <emphasis>and that it returns
- <literal>false</literal> in this case</emphasis>.
- The example command above for Unix ensures this by including a separate
+ In rare cases, <productname>PostgreSQL</productname> may attempt to
+ re-archive a WAL file that was previously archived. For example, if the
+ system crashes before the server makes a durable record of archival success,
+ the server will attempt to archive the file again after restarting (provided
+ archiving is still enabled). When an archive library encounters a
+ pre-existing file, it may return <literal>true</literal> if the WAL file has
+ identical contents to the pre-existing archive and the pre-existing archive
+ is fully persisted to storage. Alternatively, the archive library may
+ return <literal>false</literal> anytime a pre-existing file is encountered,
+ but this will require manual action by an administrator to resolve. If a
Inviting the administrator to resolve things is more dangerous than just
returning true. I recommend making this text more opinionated and simpler:
libraries must return true. Alternately, if some library has found a good
reason to return false, this paragraph could give the reason. I don't know of
such a reason, though.
+ pre-existing file contains different contents than the WAL file being
+ archived, the archive library <emphasis>must</emphasis> return false.
+ </para>
On Sat, Jul 30, 2022 at 11:51:56PM -0700, Noah Misch wrote:
Inviting the administrator to resolve things is more dangerous than just
returning true. I recommend making this text more opinionated and simpler:
libraries must return true. Alternately, if some library has found a good
reason to return false, this paragraph could give the reason. I don't know of
such a reason, though.
Your suggestion seems reasonable to me. I've attached a small patch.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
adjust_archive_docs.patchtext/x-diff; charset=us-asciiDownload
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index e432bb015a..cba7292f98 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -691,11 +691,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0
system crashes before the server makes a durable record of archival success,
the server will attempt to archive the file again after restarting (provided
archiving is still enabled). When an archive library encounters a
- pre-existing file, it may return <literal>true</literal> if the WAL file has
+ pre-existing file, it should return <literal>true</literal> if the WAL file has
identical contents to the pre-existing archive and the pre-existing archive
- is fully persisted to storage. Alternatively, the archive library may
- return <literal>false</literal> anytime a pre-existing file is encountered,
- but this will require manual action by an administrator to resolve. If a
+ is fully persisted to storage. If a
pre-existing file contains different contents than the WAL file being
archived, the archive library <emphasis>must</emphasis> return
<literal>false</literal>.
On 8/2/22 01:02, Nathan Bossart wrote:
On Sat, Jul 30, 2022 at 11:51:56PM -0700, Noah Misch wrote:
Inviting the administrator to resolve things is more dangerous than just
returning true. I recommend making this text more opinionated and simpler:
libraries must return true. Alternately, if some library has found a good
reason to return false, this paragraph could give the reason. I don't know of
such a reason, though.

Your suggestion seems reasonable to me. I've attached a small patch.
+1. I think this is less confusing overall.
Regards,
-David
On Mon, Aug 01, 2022 at 10:02:19PM -0700, Nathan Bossart wrote:
On Sat, Jul 30, 2022 at 11:51:56PM -0700, Noah Misch wrote:
Inviting the administrator to resolve things is more dangerous than just
returning true. I recommend making this text more opinionated and simpler:
libraries must return true. Alternately, if some library has found a good
reason to return false, this paragraph could give the reason. I don't know of
such a reason, though.

Your suggestion seems reasonable to me. I've attached a small patch.

--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -691,11 +691,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0
system crashes before the server makes a durable record of archival success,
the server will attempt to archive the file again after restarting (provided
archiving is still enabled). When an archive library encounters a
- pre-existing file, it may return <literal>true</literal> if the WAL file has
+ pre-existing file, it should return <literal>true</literal> if the WAL file has
identical contents to the pre-existing archive and the pre-existing archive
- is fully persisted to storage. Alternatively, the archive library may
- return <literal>false</literal> anytime a pre-existing file is encountered,
- but this will require manual action by an administrator to resolve. If a
+ is fully persisted to storage. If a
pre-existing file contains different contents than the WAL file being
archived, the archive library <emphasis>must</emphasis> return
<literal>false</literal>.
Works for me. Thanks.
On 03.08.22 09:16, Noah Misch wrote:
On Mon, Aug 01, 2022 at 10:02:19PM -0700, Nathan Bossart wrote:
On Sat, Jul 30, 2022 at 11:51:56PM -0700, Noah Misch wrote:
Inviting the administrator to resolve things is more dangerous than just
returning true. I recommend making this text more opinionated and simpler:
libraries must return true. Alternately, if some library has found a good
reason to return false, this paragraph could give the reason. I don't know of
such a reason, though.

Your suggestion seems reasonable to me. I've attached a small patch.

--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -691,11 +691,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0
system crashes before the server makes a durable record of archival success,
the server will attempt to archive the file again after restarting (provided
archiving is still enabled). When an archive library encounters a
- pre-existing file, it may return <literal>true</literal> if the WAL file has
+ pre-existing file, it should return <literal>true</literal> if the WAL file has
identical contents to the pre-existing archive and the pre-existing archive
- is fully persisted to storage. Alternatively, the archive library may
- return <literal>false</literal> anytime a pre-existing file is encountered,
- but this will require manual action by an administrator to resolve. If a
+ is fully persisted to storage. If a
pre-existing file contains different contents than the WAL file being
archived, the archive library <emphasis>must</emphasis> return
<literal>false</literal>.

Works for me. Thanks.
This documentation change only covers archive_library. How are users of
archive_command supposed to handle this?
On Sat, Sep 17, 2022 at 11:46:39AM +0200, Peter Eisentraut wrote:
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -691,11 +691,9 @@ test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/0
system crashes before the server makes a durable record of archival success,
the server will attempt to archive the file again after restarting (provided
archiving is still enabled). When an archive library encounters a
- pre-existing file, it may return <literal>true</literal> if the WAL file has
+ pre-existing file, it should return <literal>true</literal> if the WAL file has
identical contents to the pre-existing archive and the pre-existing archive
- is fully persisted to storage. Alternatively, the archive library may
- return <literal>false</literal> anytime a pre-existing file is encountered,
- but this will require manual action by an administrator to resolve. If a
+ is fully persisted to storage. If a
pre-existing file contains different contents than the WAL file being
archived, the archive library <emphasis>must</emphasis> return
<literal>false</literal>.

Works for me. Thanks.
This documentation change only covers archive_library. How are users of
archive_command supposed to handle this?
I believe users of archive_command need to do something similar to what is
described here. However, it might be more reasonable to expect
archive_command users to simply return false when there is a pre-existing
file, as the deleted text notes. IIRC that is why I added that sentence
originally.
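For illustration only, a bare-bones archive_command in that "fail on any
pre-existing file" style might look roughly like the sketch below (the script
name and archive directory are made up for the example, and durability
handling is simplified):

#!/bin/sh
# Hypothetical wrapper script, e.g. archive_command = '/usr/local/bin/archive_wal.sh %p %f'
set -e
src="$1"                          # %p: path of the WAL segment to archive
dst="/mnt/server/archivedir/$2"   # %f: file name of the WAL segment

if [ -e "$dst" ]; then
    # Pre-existing archive file: exit nonzero so the server keeps the
    # .ready file and retries; an operator has to resolve the conflict.
    echo "WAL archive $dst already exists" >&2
    exit 1
fi

cp "$src" "$dst"
# GNU coreutils sync(1) accepts file arguments; other platforms may need a
# different way to flush the copy to durable storage.
sync "$dst"

Returning nonzero here means WAL will pile up in pg_wal until someone
intervenes, which is exactly the manual action the deleted sentence was
describing.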
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Sat, Sep 17, 2022 at 02:54:27PM -0700, Nathan Bossart wrote:
On Sat, Sep 17, 2022 at 11:46:39AM +0200, Peter Eisentraut wrote:
This documentation change only covers archive_library. How are users of
archive_command supposed to handle this?
I believe users of archive_command need to do something similar to what is
described here. However, it might be more reasonable to expect
archive_command users to simply return false when there is a pre-existing
file, as the deleted text notes. IIRC that is why I added that sentence
originally.
What makes the answer for archive_command diverge from the answer for
archive_library?
On 18.09.22 09:13, Noah Misch wrote:
This documentation change only covers archive_library. How are users of
archive_command supposed to handle this?
I believe users of archive_command need to do something similar to what is
described here. However, it might be more reasonable to expect
archive_command users to simply return false when there is a pre-existing
file, as the deleted text notes. IIRC that is why I added that sentence
originally.
What makes the answer for archive_command diverge from the answer for
archive_library?
I suspect what we are really trying to say here is
===
Archiving setups (using either archive_command or archive_library)
should be prepared for the rare case that an identical archive file is
being archived a second time. In such a case, they should compare that
the source and the target file are identical and proceed without error
if so.
In some cases, it is difficult or impossible to configure
archive_command or archive_library to do this. In such cases, the
archiving command or library should error like in the case for any
pre-existing target file, and operators need to be prepared to resolve
such cases manually.
===
Is that correct?
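For what it's worth, the first paragraph roughly corresponds to an
archive_command sketch like the one below (paths and script name are
hypothetical, and it glosses over verifying that the pre-existing copy is
itself durable):

#!/bin/sh
# Hypothetical wrapper script, e.g. archive_command = '/usr/local/bin/archive_wal.sh %p %f'
set -e
src="$1"                          # %p: path of the WAL segment to archive
dst="/mnt/server/archivedir/$2"   # %f: file name of the WAL segment

if [ -e "$dst" ]; then
    if cmp -s "$src" "$dst"; then
        # An identical copy was already archived (e.g. the server retried
        # after a crash before archival success was recorded): report success.
        exit 0
    fi
    # Same name, different contents: this must fail.
    echo "WAL archive $dst exists with different contents" >&2
    exit 1
fi

cp "$src" "$dst"
# GNU coreutils sync(1) accepts file arguments; adjust for other platforms.
sync "$dst"

The cmp check is what lets a retry after a crash succeed without re-copying
anything.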
On Mon, Sep 19, 2022 at 06:08:29AM -0400, Peter Eisentraut wrote:
On 18.09.22 09:13, Noah Misch wrote:
This documentation change only covers archive_library. How are users of
archive_command supposed to handle this?
I believe users of archive_command need to do something similar to what is
described here. However, it might be more reasonable to expect
archive_command users to simply return false when there is a pre-existing
file, as the deleted text notes. IIRC that is why I added that sentence
originally.
What makes the answer for archive_command diverge from the answer for
archive_library?
I suspect what we are really trying to say here is
===
Archiving setups (using either archive_command or archive_library) should be
prepared for the rare case that an identical archive file is being archived
a second time. In such a case, they should compare that the source and the
target file are identical and proceed without error if so.
In some cases, it is difficult or impossible to configure archive_command or
archive_library to do this. In such cases, the archiving command or library
should error like in the case for any pre-existing target file, and
operators need to be prepared to resolve such cases manually.
===
Is that correct?
I wanted it to stop saying anything like the second paragraph, hence commit
d263ced. Implementing a proper archiving setup is not especially difficult,
and inviting the operator to work around a wrong implementation invites
damaging mistakes under time pressure.
On 9/19/22 07:39, Noah Misch wrote:
On Mon, Sep 19, 2022 at 06:08:29AM -0400, Peter Eisentraut wrote:
On 18.09.22 09:13, Noah Misch wrote:
This documentation change only covers archive_library. How are users of
archive_command supposed to handle this?
I believe users of archive_command need to do something similar to what is
described here. However, it might be more reasonable to expect
archive_command users to simply return false when there is a pre-existing
file, as the deleted text notes. IIRC that is why I added that sentence
originally.
What makes the answer for archive_command diverge from the answer for
archive_library?
I suspect what we are really trying to say here is
===
Archiving setups (using either archive_command or archive_library) should be
prepared for the rare case that an identical archive file is being archived
a second time. In such a case, they should compare that the source and the
target file are identical and proceed without error if so.
In some cases, it is difficult or impossible to configure archive_command or
archive_library to do this. In such cases, the archiving command or library
should error like in the case for any pre-existing target file, and
operators need to be prepared to resolve such cases manually.
===
Is that correct?
I wanted it to stop saying anything like the second paragraph, hence commit
d263ced. Implementing a proper archiving setup is not especially difficult,
and inviting the operator to work around a wrong implementation invites
damaging mistakes under time pressure.
I would also not want to state that duplicate WAL is rare. In practice
it is pretty common when things are going wrong. Also, implying it is
rare might lead a user to decide they don't need to worry about it.
Regards,
-David
On Mon, Sep 19, 2022 at 6:08 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
I suspect what we are really trying to say here is
===
Archiving setups (using either archive_command or archive_library)
should be prepared for the rare case that an identical archive file is
being archived a second time. In such a case, they should compare that
the source and the target file are identical and proceed without error
if so.
In some cases, it is difficult or impossible to configure
archive_command or archive_library to do this. In such cases, the
archiving command or library should error like in the case for any
pre-existing target file, and operators need to be prepared to resolve
such cases manually.
===
Is that correct?
I think it is. I think commit d263ced225bffe2c340175125b0270d1869138fe
made the documentation in this area substantially clearer than what we
had before, and I think that we could adjust that text to apply to
both archive libraries and archive commands relatively easily, so I'm
not really sure that we need to add text like what you propose here.
However, I also don't think it would be particularly bad if we did. An
advantage of that language is that it makes it clear what the
consequences are if you fail to follow the recommendation about
handling identical files. That may be helpful to users in
understanding the behavior of the system, and might even make it more
likely that they will include proper handling of that case in their
archive_command or archive_library.
"impossible" does seem like a bit of a strong word, though. I'd
recommend leaving that out.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Sep 19, 2022 at 10:39 AM Noah Misch <noah@leadboat.com> wrote:
I wanted it to stop saying anything like the second paragraph, hence commit
d263ced. Implementing a proper archiving setup is not especially difficult,
and inviting the operator to work around a wrong implementation invites
damaging mistakes under time pressure.
I agree wholeheartedly with the second sentence. I do think that we
need to be a little careful not to be too prescriptive. Telling people
what they have to do when they don't really have to do it often leads
to non-compliance. It can be more productive to spell out the precise
consequences of non-compliance, so I think Peter's proposal has some
merit. On the other hand, I don't see any real problem with the
current language, either.
Unfortunately, no matter what we do here, it is not likely that we'll
soon be able to eliminate the phenomenon where people use a buggy
home-grown archive_command, both because we've been encouraging that
in our documentation for a very long time, and also because the people
who are most likely to do something half-baked here are probably the
least likely to actually read the updated documentation.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Sep 19, 2022 at 12:49:58PM -0400, Robert Haas wrote:
On Mon, Sep 19, 2022 at 10:39 AM Noah Misch <noah@leadboat.com> wrote:
I wanted it to stop saying anything like the second paragraph, hence commit
d263ced. Implementing a proper archiving setup is not especially difficult,
and inviting the operator to work around a wrong implementation invites
damaging mistakes under time pressure.
I agree wholeheartedly with the second sentence. I do think that we
need to be a little careful not to be too prescriptive. Telling people
what they have to do when they don't really have to do it often leads
to non-compliance. It can be more productive to spell out the precise
consequences of non-compliance, so I think Peter's proposal has some
merit.
Good point. We could say it from a perspective like, "if you don't do this,
your setup will appear to work most of the time, but your operator will be
stuck with dangerous manual fixes at times."
On the other hand, I don't see any real problem with the
current language, either.
Unfortunately, no matter what we do here, it is not likely that we'll
soon be able to eliminate the phenomenon where people use a buggy
home-grown archive_command, both because we've been encouraging that
in our documentation for a very long time, and also because the people
who are most likely to do something half-baked here are probably the
least likely to actually read the updated documentation.
True. If I wanted to eliminate such setups, I would make the server
double-archive one WAL file each time the archive setup changes.