basic_archive lost archive_directory

Started by Олег Самойлов2 months ago20 messagesbugs
Jump to latest

<div>Debian 13.3</div><div><div>Postgresql 17.7-3.pgdg13+1 from PGDG</div><div> </div><div>PostgreSQL was configured to wrote WAL archive into NFS server. After some time, suddenly, WAL writing stopped. In the log:<div><span style="font-family:'courier new' , monospace">2026-01-23 18:46:49.970 MSK,,,12085,,697397e9.2f35,1,,2026-01-23 18:46:49 MSK,,0,WARNING,22023,"invalid value for parameter ""basic_archive.archive_directory"": ""/mnt/ocean/postgres/stars/WAL""","Specified archive directory does not exist.",,,,,,,,"","archiver",,0</span></div><div><span style="font-family:'courier new' , monospace">2026-01-23 18:57:23.589 MSK,,,12085,,697397e9.2f35,2,,2026-01-23 18:46:49 MSK,,0,WARNING,01000,"""archive_mode"" enabled, yet archiving is not configured","basic_archive.archive_directory is not set.",,,,,,,,"","archiver",,0</span><br />And the last row repeated several days, pg_wal raised up to 4 TiB. I checked, the <span style="font-family:'courier new' , monospace">basic_archive.archive_directory </span><span style="font-family:'arial' , sans-serif">was configured:</span></div><div><div><span style="font-family:'courier new' , monospace">postgres=# show basic_archive.archive_directory;</span></div><div><span style="font-family:'courier new' , monospace">     basic_archive.archive_directory</span></div><div><span style="font-family:'courier new' , monospace">-----------------------------------------</span></div><div><span style="font-family:'courier new' , monospace"> /mnt/ocean/postgres/stars/WAL</span></div><div><span style="font-family:'courier new' , monospace">(1 row)</span></div><div>And directory existed and it has old WAL files (before error raised).</div><div>The basic_archive is unstable, but I don't know how to repeat this. Workaround: use the oldschool archive_command .</div></div></div></div>

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Олег Самойлов (#1)
Re: basic_archive lost archive_directory

On 2026-Jan-29, Олег Самойлов wrote:

2026-01-23 18:46:49.970 MSK,,,12085,,697397e9.2f35,1,,2026-01-23 18:46:49
MSK,,0,WARNING,22023,"invalid value for parameter
""basic_archive.archive_directory"": ""/mnt/ocean/postgres/stars/
WAL""","Specified archive directory does not exist.",,,,,,,,"","archiver",,0
2026-01-23 18:57:23.589 MSK,,,12085,,697397e9.2f35,2,,2026-01-23 18:46:49
MSK,,0,WARNING,01000,"""archive_mode"" enabled, yet archiving is not
configured","basic_archive.archive_directory is not
set.",,,,,,,,"","archiver",,0

Most likely, the NFS mount was lost for some reason. Did you check the
kernel logs? Did you ask your colleagues if they unmounted the
filesystem for laughs? Did you search for any funny logs on the NFS
server side?

And directory existed and it has old WAL files (before error raised).
The basic_archive is unstable, but I don't know how to repeat this.

It's difficult to believe that it was basic_archive's fault; it's just
giving you whatever the kernel reported to a stat() call for the
directory.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"¿Qué importan los años? Lo que realmente importa es comprobar que
a fin de cuentas la mejor edad de la vida es estar vivo" (Mafalda)

In reply to: Alvaro Herrera (#2)
Re: basic_archive lost archive_directory

31.01.2026 19:08, Álvaro Herrera пишет:

On 2026-Jan-29, Олег Самойлов wrote:

2026-01-23 18:46:49.970 MSK,,,12085,,697397e9.2f35,1,,2026-01-23 18:46:49
MSK,,0,WARNING,22023,"invalid value for parameter
""basic_archive.archive_directory"": ""/mnt/ocean/postgres/stars/
WAL""","Specified archive directory does not exist.",,,,,,,,"","archiver",,0

Most likely, the NFS mount was lost for some reason. Did you check the
kernel logs? Did you ask your colleagues if they unmounted the
filesystem for laughs? Did you search for any funny logs on the NFS
server side?

May be this is the reason for the first message and this is not a bug.
This can be with NFS.

2026-01-23 18:57:23.589 MSK,,,12085,,697397e9.2f35,2,,2026-01-23 18:46:49
MSK,,0,WARNING,01000,"""archive_mode"" enabled, yet archiving is not
configured","basic_archive.archive_directory is not
set.",,,,,,,,"","archiver",,0
And directory existed and it has old WAL files (before error raised).
The basic_archive is unstable, but I don't know how to repeat this.

It's difficult to believe that it was basic_archive's fault; it's just
giving you whatever the kernel reported to a stat() call for the
directory.

The bug is here. basic_archive lost the value and was empty. But show
all; show that the value is.

In reply to: Олег Самойлов (#3)
Re: basic_archive lost archive_directory

Hello

How to reproduce:

1) configure

archive_mode = on
archive_library = 'basic_archive'
basic_archive.archive_directory = '/some/path/'

2) start postgres and verify archive works
3) make this directory temporary inaccessible. NFS will give you many ways to achieve this, here just mv /some/ /some_moved/ is enough.
4) basic_archive will complain ERROR: could not create file ... No such file or directory for new WAL archive attempts
5) restart archiver process with any reason: kill it or restart postgres
6) make archive_directory accessible again: archiver process will not check the directory's existence again and continue to complain about unconfigured archive_directory

Maybe it makes sense to move the directory existence check from check_archive_directory (guc check callback) to basic_archive_configured? (attached)

regards, Sergei

Attachments:

basic_archive_move_is_dir.patchtext/x-diff; name=basic_archive_move_is_dir.patchDownload+15-17
In reply to: Sergei Kornilov (#4)
Re: basic_archive lost archive_directory

06.02.2026 17:25, Sergei Kornilov пишет

Hello

How to reproduce:

1) configure

archive_mode = on
archive_library = 'basic_archive'
basic_archive.archive_directory = '/some/path/'

2) start postgres and verify archive works
3) make this directory temporary inaccessible. NFS will give you many ways to achieve this, here just mv /some/ /some_moved/ is enough.
4) basic_archive will complain ERROR: could not create file ... No such file or directory for new WAL archive attempts
5) restart archiver process with any reason: kill it or restart postgres
6) make archive_directory accessible again: archiver process will not check the directory's existence again and continue to complain about unconfigured archive_directory

Maybe it makes sense to move the directory existence check from check_archive_directory (guc check callback) to basic_archive_configured? (attached)

regards, Sergei

You described, may be, different bug or depended. I'll try again to explain.

First error was:

"invalid value for parameter
""basic_archive.archive_directory"": ""/mnt/ocean/postgres/stars/
WAL""","Specified archive directory does not exist."

And this is, may be, correct. But second, just after the first:
"""archive_mode"" enabled, yet archiving is not
configured","basic_archive.archive_directory is not
set."

The value of basic_archive.archive_directory
was erased after the first error. But it was erased only inside archive_library 'basic_archive', postgresql itself it was still
basic_archive.archive_directory=/mnt/ocean/postgres/stars/

#6Fujii Masao
masao.fujii@gmail.com
In reply to: Sergei Kornilov (#4)
Re: basic_archive lost archive_directory

On Fri, Feb 6, 2026 at 11:25 PM Sergei Kornilov <sk@zsrv.org> wrote:

Hello

How to reproduce:

1) configure

archive_mode = on
archive_library = 'basic_archive'
basic_archive.archive_directory = '/some/path/'

2) start postgres and verify archive works
3) make this directory temporary inaccessible. NFS will give you many ways to achieve this, here just mv /some/ /some_moved/ is enough.
4) basic_archive will complain ERROR: could not create file ... No such file or directory for new WAL archive attempts
5) restart archiver process with any reason: kill it or restart postgres
6) make archive_directory accessible again: archiver process will not check the directory's existence again and continue to complain about unconfigured archive_directory

Yes, this issue can last until, for example, the configuration is reloaded and
archive_directory is set again. I agree this needs to be addressed.

Maybe it makes sense to move the directory existence check from check_archive_directory (guc check callback) to basic_archive_configured? (attached)

Basically I like the idea of moving the checks for archive_directory from
check_archive_directory() to basic_archive_configured(). This would not only
address this issue, but also other problems caused by performing these checks
in the GUC check hook.

basic_archive is usually loaded only by the archiver via archive_library.
In that case, errors reported by check_archive_directory() are not logged
by default, since GUC check hook errors are normally emitted only by
the postmaster. As a result, misconfigurations (e.g., a non-existent
archive_directory) may go unnoticed, which is problematic for users.

Moving the checks currently done in check_archive_directory() to
basic_archive_configured() would resolve this, which is one reason
I like your proposal.

In your patch, only the existence check is moved to basic_archive_configured().
Would it also make sense to move the filename length check there? If so,
we could potentially remove the check_archive_directory GUC check hook entirely.

Regards,

--
Fujii Masao

In reply to: Fujii Masao (#6)
Re: basic_archive lost archive_directory

Hello!

In your patch, only the existence check is moved to basic_archive_configured().
Would it also make sense to move the filename length check there? If so,
we could potentially remove the check_archive_directory GUC check hook entirely.

Indeed.

I'm not sure, do I need to keep the check_archive_directory declaration for backport branches? I will split the patch into two parts.

regards, Sergei

Attachments:

001-bug_basic_archive_lost_archive_directory.patchtext/x-diff; name=001-bug_basic_archive_lost_archive_directory.patchDownload+22-32
002-basic_archive_remove_check_archive_directory.patchtext/x-diff; name=002-basic_archive_remove_check_archive_directory.patchDownload+1-13
#8Nathan Bossart
nathandbossart@gmail.com
In reply to: Fujii Masao (#6)
Re: basic_archive lost archive_directory

On Tue, Feb 10, 2026 at 02:46:39AM +0900, Fujii Masao wrote:

Basically I like the idea of moving the checks for archive_directory from
check_archive_directory() to basic_archive_configured(). This would not only
address this issue, but also other problems caused by performing these checks
in the GUC check hook.

Note that the check_configured_cb is called for every segment to archive.
That means we'd be calling stat() much more, which seems like unnecessary
overhead to me. And we still need to be prepared for the archive directory
to disappear at any time. I'm wondering if it would be better to simply
remove this archive directory existence check from the check_configured_cb.

--
nathan

#9Nathan Bossart
nathandbossart@gmail.com
In reply to: Fujii Masao (#6)
Re: basic_archive lost archive_directory

On Tue, Feb 10, 2026 at 02:46:39AM +0900, Fujii Masao wrote:

basic_archive is usually loaded only by the archiver via archive_library.
In that case, errors reported by check_archive_directory() are not logged
by default, since GUC check hook errors are normally emitted only by
the postmaster. As a result, misconfigurations (e.g., a non-existent
archive_directory) may go unnoticed, which is problematic for users.

I don't think this is true. With default parameters, I see the following
in my logs with a misconfigured archive directory setting:

2026-02-09 15:53:10.372 CST [12803] WARNING: invalid value for parameter "basic_archive.archive_directory": "/does/not/exist"
2026-02-09 15:53:10.372 CST [12803] DETAIL: Specified archive directory does not exist.

--
nathan

#10Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#8)
Re: basic_archive lost archive_directory

On Mon, Feb 09, 2026 at 03:48:05PM -0600, Nathan Bossart wrote:

I'm wondering if it would be better to simply remove this archive
directory existence check from the check_configured_cb.

s/check_configured_cb/check_archive_directory

--
nathan

#11Fujii Masao
masao.fujii@gmail.com
In reply to: Nathan Bossart (#9)
Re: basic_archive lost archive_directory

On Tue, Feb 10, 2026 at 6:58 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Tue, Feb 10, 2026 at 02:46:39AM +0900, Fujii Masao wrote:

basic_archive is usually loaded only by the archiver via archive_library.
In that case, errors reported by check_archive_directory() are not logged
by default, since GUC check hook errors are normally emitted only by
the postmaster. As a result, misconfigurations (e.g., a non-existent
archive_directory) may go unnoticed, which is problematic for users.

I don't think this is true. With default parameters, I see the following
in my logs with a misconfigured archive directory setting:

2026-02-09 15:53:10.372 CST [12803] WARNING: invalid value for parameter "basic_archive.archive_directory": "/does/not/exist"
2026-02-09 15:53:10.372 CST [12803] DETAIL: Specified archive directory does not exist.

You're right if an invalid value for basic_archive.archive_directory is detected
at server startup. However, when the setting is changed and the configuration
is reloaded, the default behavior does not emit an error log.
Please see the steps below.

-----------------------------------
initdb -D data
mkdir arch
cat <<EOF >> data/postgresql.conf
archive_mode = on
archive_library = 'basic_archive'
basic_archive.archive_directory = '../arch'
EOF
pg_ctl -D data start
echo "basic_archive.archive_directory = 'not_exists'" >> data/postgresql.conf
pg_ctl -D data reload
-----------------------------------

With these steps, the only log messages I see are:

-----------------------------------
LOG: received SIGHUP, reloading configuration files
LOG: parameter "basic_archive.archive_directory" changed to "not_exists"
-----------------------------------

BTW, if basic_archive is specified in shared_preload_libraries, the same steps
produce:

-----------------------------------
LOG: invalid value for parameter "basic_archive.archive_directory":
"not_exists"
DETAIL: Specified archive directory does not exist.
LOG: configuration file "/hoge/data/postgresql.conf" contains errors;
unaffected changes were applied
-----------------------------------

Similarly, lowering the archiver log level to DEBUG3 (for example,
via log_min_messages = 'warning,archiver:debug3') also results in:

-----------------------------------
DEBUG: invalid value for parameter "basic_archive.archive_directory":
"not_exists"
DETAIL: Specified archive directory does not exist.
DEBUG: configuration file
"/System/Volumes/Data/dav/head-pgsql/data/postgresql.conf" contains
errors; unaffected changes were applied
-----------------------------------

This illustrates that, with default settings, the error can go unnoticed
on reload.

Regards,

--
Fujii Masao

#12Nathan Bossart
nathandbossart@gmail.com
In reply to: Fujii Masao (#11)
Re: basic_archive lost archive_directory

On Tue, Feb 10, 2026 at 10:23:02AM +0900, Fujii Masao wrote:

You're right if an invalid value for basic_archive.archive_directory is detected
at server startup. However, when the setting is changed and the configuration
is reloaded, the default behavior does not emit an error log.

Oh, I see. Even so, I don't think this problem is best fixed by moving
code from the GUC check hook to the archive module check callback. For
starters, since the archive module check callback is called for each file
to archive, this adds unnecessary overhead to repeatedly verify things that
really shouldn't change as long as the GUC's value stays the same. In
practice, the check callback is mostly meant to ensure the user hasn't
temporarily halted archiving, as per the following note in basic_archive's
documentation:

The default is an empty string, which effectively halts WAL archiving,
but if archive_mode is enabled, the server will accumulate WAL segment
files in the expectation that a value will soon be provided.

Furthermore, fixing this problem in basic_archive doesn't fix it for other
archive modules that use GUC check hooks. I think it should be fixed more
generally, perhaps by bumping up the log level in the archiver when the
GUC's prefix matches the archive_library.

I also want to push back a bit on the idea that basic_archive not working
when the directory is missing at startup is a bug. The documentation for
basic_archive clearly states that the directory must already exist. That
being said, I have no objection to making basic_archive more lenient or
even to back-patching such a change. As I mentioned upthread, IMHO we
should simply remove the existence check from the GUC check hook.
basic_archive must already be written to handle the archive directory
disappearing at any moment, so we should be able to rely on it without the
extra stat().

--
nathan

#13Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#12)
Re: basic_archive lost archive_directory

On Tue, Feb 10, 2026 at 10:06:25AM -0600, Nathan Bossart wrote:

As I mentioned upthread, IMHO we should simply remove the existence check
from the GUC check hook. basic_archive must already be written to handle
the archive directory disappearing at any moment, so we should be able to
rely on it without the extra stat().

Concretely, like the attached.

--
nathan

Attachments:

v1-0001-basic_archive-Allow-archive-directory-to-be-missi.patchtext/plain; charset=us-asciiDownload+1-15
#14Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#13)
Re: basic_archive lost archive_directory

On Wed, Feb 11, 2026 at 03:14:15PM -0600, Nathan Bossart wrote:

On Tue, Feb 10, 2026 at 10:06:25AM -0600, Nathan Bossart wrote:

As I mentioned upthread, IMHO we should simply remove the existence check
from the GUC check hook. basic_archive must already be written to handle
the archive directory disappearing at any moment, so we should be able to
rely on it without the extra stat().

Concretely, like the attached.

Any thoughts? I'll wait another week or so, but then will proceed with
committing/back-patching this.

--
nathan

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#14)
Re: basic_archive lost archive_directory

Nathan Bossart <nathandbossart@gmail.com> writes:

On Wed, Feb 11, 2026 at 03:14:15PM -0600, Nathan Bossart wrote:

On Tue, Feb 10, 2026 at 10:06:25AM -0600, Nathan Bossart wrote:

As I mentioned upthread, IMHO we should simply remove the existence check
from the GUC check hook. basic_archive must already be written to handle
the archive directory disappearing at any moment, so we should be able to
rely on it without the extra stat().

Any thoughts? I'll wait another week or so, but then will proceed with
committing/back-patching this.

Agreed --- this seems like a pretty inappropriate check to be making
in a check_hook.

regards, tom lane

#16Fujii Masao
masao.fujii@gmail.com
In reply to: Nathan Bossart (#12)
Re: basic_archive lost archive_directory

On Wed, Feb 11, 2026 at 1:06 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Tue, Feb 10, 2026 at 10:23:02AM +0900, Fujii Masao wrote:

You're right if an invalid value for basic_archive.archive_directory is detected
at server startup. However, when the setting is changed and the configuration
is reloaded, the default behavior does not emit an error log.

Oh, I see. Even so, I don't think this problem is best fixed by moving
code from the GUC check hook to the archive module check callback. For
starters, since the archive module check callback is called for each file
to archive, this adds unnecessary overhead to repeatedly verify things that
really shouldn't change as long as the GUC's value stays the same.

Yes, so if we had an archive module callback invoked just after
ProcessConfigFile(), we could use it to check the archive directory only
when needed. However, introducing a new callback isn't suitable for
back branches.

I also want to push back a bit on the idea that basic_archive not working
when the directory is missing at startup is a bug. The documentation for
basic_archive clearly states that the directory must already exist. That
being said, I have no objection to making basic_archive more lenient or
even to back-patching such a change. As I mentioned upthread, IMHO we
should simply remove the existence check from the GUC check hook.

I agree with removing the check from the GUC check hook. My only concern is
that simply removing it (rather than relocating it) changes the error message
users see when archive_directory is misconfigured, which may make
troubleshooting WAL archiving failures slightly harder.

[Current]
WARNING: invalid value for parameter
"basic_archive.archive_directory": "not_exists"
DETAIL: Specified archive directory does not exist.

[With the patch]
ERROR: could not create file
"not_exists/archtemp.00000001000000000000000E.80107.1771905339058": No
such file or directory

One option would be to perform the check in check_configured_cb(),
but as you noted, that would add an extra stat() call for each WAL archiving.
If that overhead is unacceptable, another approach would be to wrap
copy_file() in basic_archive_file() with PG_TRY() / PG_CATCH(). On error,
we could stat() the archive directory and report a clearer reason if it does
not exist.

That said, if users are generally fine with the "could not create file" error,
I'm ok with the proposed patch (i.e., just removing the check).

Regards,

--
Fujii Masao

In reply to: Fujii Masao (#16)
Re: basic_archive lost archive_directory

Hello

Honestly, I don't think that only 8 stat calls per each 1Gbps of WAL writes is too much. I doubt basic_archive will be used in systems where there's a lot of writing. Although there is little benefit from this stat call, I agree.

That said, if users are generally fine with the "could not create file" error,
I'm ok with the proposed patch (i.e., just removing the check).

I'm ok with this error message.

regards, Sergei

#18Nathan Bossart
nathandbossart@gmail.com
In reply to: Fujii Masao (#16)
Re: basic_archive lost archive_directory

On Tue, Feb 24, 2026 at 01:28:50PM +0900, Fujii Masao wrote:

I agree with removing the check from the GUC check hook. My only concern is
that simply removing it (rather than relocating it) changes the error message
users see when archive_directory is misconfigured, which may make
troubleshooting WAL archiving failures slightly harder.

[Current]
WARNING: invalid value for parameter
"basic_archive.archive_directory": "not_exists"
DETAIL: Specified archive directory does not exist.

[With the patch]
ERROR: could not create file
"not_exists/archtemp.00000001000000000000000E.80107.1771905339058": No
such file or directory

One option would be to perform the check in check_configured_cb(),
but as you noted, that would add an extra stat() call for each WAL archiving.
If that overhead is unacceptable, another approach would be to wrap
copy_file() in basic_archive_file() with PG_TRY() / PG_CATCH(). On error,
we could stat() the archive directory and report a clearer reason if it does
not exist.

That said, if users are generally fine with the "could not create file" error,
I'm ok with the proposed patch (i.e., just removing the check).

I think it's fine as-is. If something is wrong with the error message,
IMHO we should fix the error message. Adding extra stat() calls to try to
give a nicer message might work most of the time, but there are still race
conditions where users will see the original one. But in any case, I
believe the current message style is used in many places, so I don't see a
strong reason to do anything different here.

--
nathan

#19Fujii Masao
masao.fujii@gmail.com
In reply to: Nathan Bossart (#18)
Re: basic_archive lost archive_directory

On Wed, Feb 25, 2026 at 2:34 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Tue, Feb 24, 2026 at 01:28:50PM +0900, Fujii Masao wrote:

I agree with removing the check from the GUC check hook. My only concern is
that simply removing it (rather than relocating it) changes the error message
users see when archive_directory is misconfigured, which may make
troubleshooting WAL archiving failures slightly harder.

[Current]
WARNING: invalid value for parameter
"basic_archive.archive_directory": "not_exists"
DETAIL: Specified archive directory does not exist.

[With the patch]
ERROR: could not create file
"not_exists/archtemp.00000001000000000000000E.80107.1771905339058": No
such file or directory

One option would be to perform the check in check_configured_cb(),
but as you noted, that would add an extra stat() call for each WAL archiving.
If that overhead is unacceptable, another approach would be to wrap
copy_file() in basic_archive_file() with PG_TRY() / PG_CATCH(). On error,
we could stat() the archive directory and report a clearer reason if it does
not exist.

That said, if users are generally fine with the "could not create file" error,
I'm ok with the proposed patch (i.e., just removing the check).

I think it's fine as-is. If something is wrong with the error message,
IMHO we should fix the error message. Adding extra stat() calls to try to
give a nicer message might work most of the time, but there are still race
conditions where users will see the original one. But in any case, I
believe the current message style is used in many places, so I don't see a
strong reason to do anything different here.

Ok, so since I seem to be the only one with some reservations about
the message, I'm fine with proceeding with the proposed change.

Regards,

--
Fujii Masao

#20Nathan Bossart
nathandbossart@gmail.com
In reply to: Fujii Masao (#19)
Re: basic_archive lost archive_directory

Committed.

--
nathan