Add --system-identifier / -s option to pg_resetwal
hi hackers,
I was just involved in a DR case, where we needed to change system ID in
control data, and noticed that there is no such capability, officially – so
I thought, it would be good to have it
the attached patch adds a new -s / --system-identifier option to
pg_resetwal that allows users to change the database cluster's system
identifier; it can be useful when you need to make a restored cluster
distinct from the original, or when cloning for testing
some aspects about the patch:
- accepts positive 64-bit integers only (zero not allowed)
- requires interactive confirmation or --force flag for safety
- detects non-TTY environments and requires --force in scripts
- included input validation and error handling
- updated docs with clear warnings about compatibility
- added tests, including some edge cases
looking forward to hearing feedback!
Nik
Attachments:
0001-pg_resetwal--system-identifier.patchapplication/octet-stream; name=0001-pg_resetwal--system-identifier.patchDownload+279-8
On Sat, May 31, 2025 at 2:52 PM Nikolay Samokhvalov <nik@postgres.ai> wrote:
I was just involved in a DR case, where we needed to change system ID in
control data, and noticed that there is no such capability, officially – so
I thought, it would be good to have itthe attached patch adds a new -s / --system-identifier option to
pg_resetwal that allows users to change the database cluster's system
identifier; it can be useful when you need to make a restored cluster
distinct from the original, or when cloning for testingsome aspects about the patch:
- accepts positive 64-bit integers only (zero not allowed)
- requires interactive confirmation or --force flag for safety
- detects non-TTY environments and requires --force in scripts
- included input validation and error handling
- updated docs with clear warnings about compatibility
- added tests, including some edge caseslooking forward to hearing feedback!
+1
Like the idea. I did a manual code review. It's a small patch, very
focused, and includes the proper docs.
It's very well thought out (I Iike the --force requirement for scripts).
The only thing I saw was that the code basically accepts Y or y, and all
else is considered "N". for that reason,
I suggest that this line:
printf(_("Continue? (y/n) "));
Would be more clear as:
printf(_("Continue? (y/n) [n]"));
indicating the N is the default answer. if you put anything other than Y
or y... N will be picked.
But this is almost trivial, but I wanted to be thorough.
Regards!
On 31.05.25 20:52, Nikolay Samokhvalov wrote:
the attached patch adds a new -s / --system-identifier option to
pg_resetwal that allows users to change the database cluster's system
identifier; it can be useful when you need to make a restored cluster
distinct from the original, or when cloning for testing
Maybe we can stop eating up short options? A long option seems
sufficient for this.
- requires interactive confirmation or --force flag for safety
I don't see the need for interactive confirmation here. The user would
have explicitly chosen the option, so they get what they asked for.
On Wed, Jun 04, 2025 at 06:49:31AM +0200, Peter Eisentraut wrote:
Maybe we can stop eating up short options? A long option seems sufficient
for this.
Agreed.
I don't see the need for interactive confirmation here. The user would have
explicitly chosen the option, so they get what they asked for.
Yes, let's remove that. The patch is making --force hold a different
meaning that it currently has. And the tests will have less bloat
with their TTY parts gone.
Honestly, I don't think that we need that many tests in the scope of
this patch. Let's drop the hexa test, the empty test, the test with a
value of 1, the very-large-but-valid-value, and the second update.
These overlap with other tests or just rely on internals of getopt()
or of strtou64(), which eat cycles without really any added value.
+ The system identifier is a unique 64-bit number that identifies the
+ database cluster. It is used by replication systems and backup tools
+ to ensure they are working with the correct cluster. Changing the
+ system identifier makes the cluster incompatible with existing
+ backups, standby servers, and replication setups.
The first sentence of this paragraph is a close copy-paste of what's
in protocol.sgml. It would be nice to avoid these duplications.
Note the report generated by `git diff --check`, while on it, with a
couple of whitespaces reported.
FWIW, I've wanted that in the past when err.. Performing some
chirurgy on clusters.
--
Michael
Thank you, Peter and Michael, for the reviews
Attached is v2, simplified as suggested:
- Removed short option -s
- Removed interactive confirmation and --force
- Simplified tests leaving only essential ones
Additionally, there was an off-list review done by Andrey Borodin, so his
comments also addressed:
- Simplified logic, getting rid of '-' check (negative numbers) -- decided
to accept negative input values (they wrap to valid positive uint64)
Nik
Attachments:
0002-pg_resetwal--system-identifier.patchapplication/octet-stream; name=0002-pg_resetwal--system-identifier.patchDownload+77-1
On 04.06.25 20:46, Nikolay Samokhvalov wrote:
- Simplified logic, getting rid of '-' check (negative numbers) --
decided to accept negative input values (they wrap to valid positive uint64)
That doesn't seem like a good idea to me.
On 2025/06/05 3:46, Nikolay Samokhvalov wrote:
Thank you, Peter and Michael, for the reviews
Attached is v2, simplified as suggested:
- Removed short option -s
- Removed interactive confirmation and --force
- Simplified tests leaving only essential ones
In my environment, the test failed with the following output:
# Failed test 'system identifier was changed correctly'
# at t/001_basic.pl line 203.
# got: '-8570200862721897295'
# expected: '9876543210987654321'
# Looks like you failed 1 test of 84.
t/001_basic.pl ......
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/84 subtests
I think this happens because the system_identifier returned
by pg_control_system() is a bigint, not an unsigned one.
So either we need to use a different method to retrieve
the value correctly, or we should fix pg_control_system() to
report the system identifier properly (though that would be
a separate issue).
+ if (set_sysid != 0)
+ {
+ printf(_("System identifier: " UINT64_FORMAT "\n"),
+ ControlFile.system_identifier);
+ }
For consistency with PrintControlValues() and pg_controldata,
the label should be "Database system identifier", and we should
use PRIu64 instead of UINT64_FORMAT for gettext()?
+ {"system-identifier", required_argument, NULL, 3},
{"next-transaction-id", required_argument, NULL, 'x'},
{"wal-segsize", required_argument, NULL, 1},
{"char-signedness", required_argument, NULL, 2},
It would be more consistent to place the "system-identifier"
entry just after "char-signedness".
One question about this feature: why do we need to allow
explicitly setting the system identifier? If the goal is
simply to ensure it's different from the original,
wouldn't it be sufficient to let pg_resetwal generate
a new (hopefully unique) value using existing logic,
like what's done in GuessControlValues() or BootStrapXLOG()?
Regards,
--
Fujii Masao
NTT DATA Japan Corporation
Thank you for the review!
Attached is v3 addressing your feedback:
- Changed label to "Database system identifier" for consistency
- Use PRIu64 instead of UINT64_FORMAT for gettext
- Moved option entry after "char-signedness" in getopt array
- Fixed test to use pg_controldata instead of pg_control_system()
On Wed, Jun 4, 2025 at 4:56 PM Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:
One question about this feature: why do we need to allow
explicitly setting the system identifier? If the goal is
simply to ensure it's different from the original,
wouldn't it be sufficient to let pg_resetwal generate
a new (hopefully unique) value using existing logic,
like what's done in GuessControlValues() or BootStrapXLOG()?
Auto-generation would cover most cases. But explicit setting is a superset
-- users who want random can generate one themselves. Explicit control
costs nothing extra and gives full flexibility for edge cases (eg,
reproducible test environments, coordinating multiple clones). pg_resetwal
users are expected to be experts, so giving more control can cover more
possible scenarios.
Nik
Attachments:
v3-0001-Add-system-identifier-option-to-pg_resetwal.patchtext/plain; charset=US-ASCII; name=v3-0001-Add-system-identifier-option-to-pg_resetwal.patchDownload+80-4
On Sun, Feb 1, 2026 at 6:33 AM Nikolay Samokhvalov <nik@postgres.ai> wrote:
Thank you for the review!
Attached is v3 addressing your feedback:
Thanks for updating the patch!
With this patch, the test failed on my laptop as follows.
Could you please take a look and fix it?
$ make -s check -C src/bin/pg_resetwal/ PROVE_TESTS="t/001*"
# +++ tap check in src/bin/pg_resetwal +++
t/001_basic.pl .. 15/? # Tests were run but no plan was declared and
done_testing() was not seen.
# Looks like your test exited with 4 just after 82.
t/001_basic.pl .. Dubious, test returned 4 (wstat 1024, 0x400)
All 82 subtests passed
Test Summary Report
-------------------
t/001_basic.pl (Wstat: 1024 Tests: 82 Failed: 0)
Non-zero exit status: 4
Parse errors: No plan found in TAP output
Files=1, Tests=82, 4 wallclock secs ( 0.03 usr 0.01 sys + 0.31 cusr
0.65 csys = 1.00 CPU)
Result: FAIL
make: *** [check] Error 1
Regards,
--
Fujii Masao
On Wed, Feb 4, 2026 at 8:21 AM Fujii Masao <masao.fujii@gmail.com> wrote:
On Sun, Feb 1, 2026 at 6:33 AM Nikolay Samokhvalov <nik@postgres.ai>
wrote:Thank you for the review!
Attached is v3 addressing your feedback:
Thanks for updating the patch!
With this patch, the test failed on my laptop as follows.
Could you please take a look and fix it?$ make -s check -C src/bin/pg_resetwal/ PROVE_TESTS="t/001*"
# +++ tap check in src/bin/pg_resetwal +++
t/001_basic.pl .. 15/? # Tests were run but no plan was declared and
done_testing() was not seen.
# Looks like your test exited with 4 just after 82.
t/001_basic.pl .. Dubious, test returned 4 (wstat 1024, 0x400)
All 82 subtests passedTest Summary Report
-------------------
t/001_basic.pl (Wstat: 1024 Tests: 82 Failed: 0)
Non-zero exit status: 4
Parse errors: No plan found in TAP output
Files=1, Tests=82, 4 wallclock secs ( 0.03 usr 0.01 sys + 0.31 cusr
0.65 csys = 1.00 CPU)
Result: FAIL
make: *** [check] Error 1
v4 attached: rebased, fixed the failing test + added coverage
for --dry-run.