pg_upgrade to clusters with a different WAL segment size
Hi hackers,
The thread for the recent change to allow setting the WAL segment size at
initdb time [0]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=fc49e24fa69a15efacd5b8958115ed9c43c48f9a included a bit of discussion regarding pg_upgrade [1]/messages/by-id/20160826043926.pwkz45ksxpmfn4g6@alap3.anarazel.de,
where it was suggested that relaxing an error check (presumably in
check_control_data()) might be enough to upgrade servers to a different WAL
segment size.
We've had success with our initial testing of upgrades to larger WAL
segment sizes, including post-upgrade pgbench runs. Beyond adjusting
check_control_data(), it looks like the 'pg_resetwal -l' call in
copy_xact_xlog_xid() may need to be adjusted to ensure that the WAL
starting address is set to a valid value.
Allowing changes to the WAL segment size during pg_upgrade seems like a
nice way to avoid needing a dump and load, so I would like to propose
adding support for this. I'd be happy to submit patches for this in the
next commitfest.
Nathan
[0]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=fc49e24fa69a15efacd5b8958115ed9c43c48f9a
[1]: /messages/by-id/20160826043926.pwkz45ksxpmfn4g6@alap3.anarazel.de
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Nov 11, 2017 at 12:46 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
Allowing changes to the WAL segment size during pg_upgrade seems like a
nice way to avoid needing a dump and load, so I would like to propose
adding support for this. I'd be happy to submit patches for this in the
next commitfest.
That's a worthy goal.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Nov 10, 2017 at 4:04 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Sat, Nov 11, 2017 at 12:46 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
Allowing changes to the WAL segment size during pg_upgrade seems like a
nice way to avoid needing a dump and load, so I would like to propose
adding support for this. I'd be happy to submit patches for this in the
next commitfest.That's a worthy goal.
I'm also interested in this item and I helped Nathan with a little of
the initial testing. Also, having changed redo sizes on other
database platforms a couple times (a simple & safe runtime operation
there), it seems to me that a feature like this would benefit
PostgreSQL.
I would add that we increased the max segment size in pg10 - but the
handful of users who are in the most pain with very high activity
rates on running systems are still limited to logical upgrades or
dump-and-load to get the benefit of larger WAL segment sizes. From a
technical perspective, it doesn't seem like it should be too
complicated to implement this in pg_upgrade since you're moving into a
new cluster anyway.
On Fri, Nov 10, 2017 at 7:46 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
We've had success with our initial testing of upgrades to larger WAL
segment sizes, including post-upgrade pgbench runs.
Just to fill this out a little; our very first test was to take a
9.6.5 16mb-wal post-pgbench db and pg_upgrade it to 10.0 128mb-wal
with no changes except removing the WAL size from check_control_data()
then doing more pgbench runs on the same db post-upgrade. Checked for
errors or problematic variation in TPS. More of a smoke-screen than a
thorough test, but everything looks good so far.
On Fri, Nov 10, 2017 at 7:46 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
Beyond adjusting
check_control_data(), it looks like the 'pg_resetwal -l' call in
copy_xact_xlog_xid() may need to be adjusted to ensure that the WAL
starting address is set to a valid value.
This was related to one interesting quirk we observed. The pg_upgrade
tried to call pg_resetwal on the *new* database with a log sequence
number that assumes the *old* wal size. In our test, it called
"pg_resetwal -l 000000010000000200000071" which is an invalid filename
with 128mb wal segment. In order to get a sensible filename,
PostgreSQL took the "71" and wrapped three times and added to get a
new WAL filename of "000000010000000500000011".
This actually raises a really interesting concern with pg_upgrade and
different WAL segment sizes. We have WAL filenames and then we have
XLogSegNo. If pg_upgrade just chooses the next valid filename, then
XLogSegNo will decrease and overlap when the WAL segment size goes up.
If pg_upgrade calculates the next XLogSegNo then the WAL segment
filename will decrease and overlap when the WAL segment size goes
down.
from xlog_internal.h:
#define XLogFileName(fname, tli, logSegNo, wal_segsz_bytes) \
snprintf(fname, MAXFNAMELEN, "%08X%08X%08X", tli, \
(uint32) ((logSegNo) / XLogSegmentsPerXLogId(wal_segsz_bytes)), \
(uint32) ((logSegNo) % XLogSegmentsPerXLogId(wal_segsz_bytes)))
...
#define XLogFromFileName(fname, tli, logSegNo, wal_segsz_bytes) \
do { \
uint32 log; \
uint32 seg; \
sscanf(fname, "%08X%08X%08X", tli, &log, &seg); \
*logSegNo = (uint64) log *
XLogSegmentsPerXLogId(wal_segsz_bytes) + seg; \
} while (0)
If there's an archive_command script that simply copies WAL files
somewhere then it might overwrite old logs when filenames overlap. I
haven't surveyed all the postgres backup tools & scripts out there but
it also seems conceivable that some tools will do the equivalent of
XLogFromFileName() so that they can be aware of there are missing logs
in a recovery scenario. Those tools could conceivably get broken by
an overlapping/decremented XLogSegNo.
I haven't fully thought through replication to consider whether
anything could break there, but that's another open question.
There are a few different approaches that could be taken to determine
the next WAL sequence number.
1) simplest: increment filename's middle digit by 1, zero out the
right digit. no filename overlap, don't need to know WAL segment
size. has XLogSegNo overlap.
2) use the next valid WAL filename with segment size awareness. no
filename overlap, has XLogSegNo overlap.
3) translate old DB filename to XLogSegNo, XLogSegNo++, translate to
new DB filename. no XLogSegNo overlap, has filename overlap.
4) most complex: XLogSegNo++, translate to new DB filename, then
increase filename until it's greater than last used filename in old
db. Always has gaps, never overlaps.
I'm thinking option 4 sounds the most correct. Any thoughts from
others to the contrary?
Anything else that is worth testing to look for potential problems
after pg_upgrade with different WAL segment sizes?
-Jeremy
Hi,
On 2017-11-10 15:46:11 +0000, Bossart, Nathan wrote:
The thread for the recent change to allow setting the WAL segment size at
initdb time [0] included a bit of discussion regarding pg_upgrade [1],
where it was suggested that relaxing an error check (presumably in
check_control_data()) might be enough to upgrade servers to a different WAL
segment size.We've had success with our initial testing of upgrades to larger WAL
segment sizes, including post-upgrade pgbench runs. Beyond adjusting
check_control_data(), it looks like the 'pg_resetwal -l' call in
copy_xact_xlog_xid() may need to be adjusted to ensure that the WAL
starting address is set to a valid value.Allowing changes to the WAL segment size during pg_upgrade seems like a
nice way to avoid needing a dump and load, so I would like to propose
adding support for this. I'd be happy to submit patches for this in the
next commitfest.
Hm. I'm not really on-board with doing this in pg_upgrade. A more
logical place seems to be pg_resetwal or something - there's no need to
force a pg_upgrade cycle (which is pretty expensive on clusters with a
significant number of objects) for somebody that wants to change the
segment size of a cluster without changing the major version.
pg_resetwal or a new tool seems like a more appropriate places for this.
Greetings,
Andres Freund
On Tue, Nov 14, 2017 at 6:11 AM, Andres Freund <andres@anarazel.de> wrote:
Hm. I'm not really on-board with doing this in pg_upgrade. A more
logical place seems to be pg_resetwal or something - there's no need to
force a pg_upgrade cycle (which is pretty expensive on clusters with a
significant number of objects) for somebody that wants to change the
segment size of a cluster without changing the major version.
pg_resetwal or a new tool seems like a more appropriate places for this.
pg_upgrade makes use of pg_resetwal, so I am assuming that what Nathan
means is actually what you mean, so as pg_upgrade gains as well an
option with the segment size which will wrap the pg_resetwal's call.
--
Michael
On 2017-11-14 07:26:22 +0900, Michael Paquier wrote:
On Tue, Nov 14, 2017 at 6:11 AM, Andres Freund <andres@anarazel.de> wrote:
Hm. I'm not really on-board with doing this in pg_upgrade. A more
logical place seems to be pg_resetwal or something - there's no need to
force a pg_upgrade cycle (which is pretty expensive on clusters with a
significant number of objects) for somebody that wants to change the
segment size of a cluster without changing the major version.
pg_resetwal or a new tool seems like a more appropriate places for this.pg_upgrade makes use of pg_resetwal, so I am assuming that what Nathan
means is actually what you mean, so as pg_upgrade gains as well an
option with the segment size which will wrap the pg_resetwal's call.
Even if that's the case, I fail to see why it'd be a good idea to have
any sort of pg_upgrade integration here. We should make pg_resetwal's
checks for this good enough, and not conflate something unrelated with
pg_upgrade goals.
Greetings,
Andres Freund
On Tue, Nov 14, 2017 at 7:32 AM, Andres Freund <andres@anarazel.de> wrote:
On 2017-11-14 07:26:22 +0900, Michael Paquier wrote:
On Tue, Nov 14, 2017 at 6:11 AM, Andres Freund <andres@anarazel.de> wrote:
Hm. I'm not really on-board with doing this in pg_upgrade. A more
logical place seems to be pg_resetwal or something - there's no need to
force a pg_upgrade cycle (which is pretty expensive on clusters with a
significant number of objects) for somebody that wants to change the
segment size of a cluster without changing the major version.
pg_resetwal or a new tool seems like a more appropriate places for this.pg_upgrade makes use of pg_resetwal, so I am assuming that what Nathan
means is actually what you mean, so as pg_upgrade gains as well an
option with the segment size which will wrap the pg_resetwal's call.Even if that's the case, I fail to see why it'd be a good idea to have
any sort of pg_upgrade integration here. We should make pg_resetwal's
checks for this good enough, and not conflate something unrelated with
pg_upgrade goals.
Both positions can be defended. Note that some users like to have the
upgrade experience within one single command do as much as possible if
possible, and this may include the possibility to switch segment size
to make the tool more friendly. I definitely agree with your point to
make the low-level magic happen in pg_resetwal though. Having
pg_upgrade call that at will could be argued afterwards.
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
On Tue, Nov 14, 2017 at 7:32 AM, Andres Freund <andres@anarazel.de> wrote:
Even if that's the case, I fail to see why it'd be a good idea to have
any sort of pg_upgrade integration here. We should make pg_resetwal's
checks for this good enough, and not conflate something unrelated with
pg_upgrade goals.
Both positions can be defended. Note that some users like to have the
upgrade experience within one single command do as much as possible if
possible, and this may include the possibility to switch segment size
to make the tool more friendly. I definitely agree with your point to
make the low-level magic happen in pg_resetwal though. Having
pg_upgrade call that at will could be argued afterwards.
FWIW, I agree with Andres' position here. I think the charter of
pg_upgrade is to duplicate the old cluster as closely as it can,
not to modify its configuration. A close analogy is that it does not
attempt to upgrade extension versions while migrating the cluster.
regards, tom lane
On 11/13/17, 5:09 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael.paquier@gmail.com> writes:
On Tue, Nov 14, 2017 at 7:32 AM, Andres Freund <andres@anarazel.de> wrote:
Even if that's the case, I fail to see why it'd be a good idea to have
any sort of pg_upgrade integration here. We should make pg_resetwal's
checks for this good enough, and not conflate something unrelated with
pg_upgrade goals.Both positions can be defended. Note that some users like to have the
upgrade experience within one single command do as much as possible if
possible, and this may include the possibility to switch segment size
to make the tool more friendly. I definitely agree with your point to
make the low-level magic happen in pg_resetwal though. Having
pg_upgrade call that at will could be argued afterwards.FWIW, I agree with Andres' position here. I think the charter of
pg_upgrade is to duplicate the old cluster as closely as it can,
not to modify its configuration. A close analogy is that it does not
attempt to upgrade extension versions while migrating the cluster.
Fair points. If we added an option to pg_resetwal, should we bother
trying to handle the WAL filename overlap that Jeremy mentioned? The
-l option gives us the ability to set the WAL starting address
manually, but it might not be terribly clear to end users that this is
something to watch out for.
Nathan
On Tue, Nov 14, 2017 at 8:38 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
Fair points. If we added an option to pg_resetwal, should we bother
trying to handle the WAL filename overlap that Jeremy mentioned? The
-l option gives us the ability to set the WAL starting address
manually, but it might not be terribly clear to end users that this is
something to watch out for.
After running an upgrade the previous WAL segments become useless, and
that's the same when changing a segment size. Even if you take care of
having the pg_resetwal-ed cluster using a segment name ahead of what
the previous cluster is using, you still run after the risk of having
other nodes with the previous segment size overwrite the WAL segments
of the new size. In short, that's only a matter of being careful with
your archive locations.
--
Michael
On Mon, Nov 13, 2017 at 3:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
On Tue, Nov 14, 2017 at 7:32 AM, Andres Freund <andres@anarazel.de> wrote:
Even if that's the case, I fail to see why it'd be a good idea to have
any sort of pg_upgrade integration here. We should make pg_resetwal's
checks for this good enough, and not conflate something unrelated with
pg_upgrade goals.FWIW, I agree with Andres' position here. I think the charter of
pg_upgrade is to duplicate the old cluster as closely as it can,
not to modify its configuration. A close analogy is that it does not
attempt to upgrade extension versions while migrating the cluster.
Having pg_upgrade simply allow an upgrade where the WAL sizes don't match
between the old cluster and the new cluster seems fairly trivial.
pg_upgrade isn't
changing anything; it's just *not* requiring the new and old clusters
to match in this
way. I'll admit I'm much newer to postgresql than everyone else on
this list, but I
haven't yet thought of any technical reason this check is actually
needed. (Just the
WAL sequencing/naming concern I outlined earlier.)
Writing code to change the WAL size on an existing cluster will be a little more
complex. We will need to delete all WAL files and recreate them at the new
sizes. We will need to either (1) be absolutely sure that there was a
clean shutdown
or (2) copy WAL data from the old files to the new files. We will
need to think harder
through the issue of gaps being introduced in the sequence of WAL filenames and
effects on backup/recovery.
These two ideas don't seem mutually exclusive to me. If pg_upgrade
allows working
with different WAL sizes, it doesn't mean we can't still introduce a
utility to change the
WAL size on running clusters.
So yeah - having a utility command to change the WAL size on a running cluster
is a great idea. But why are we wanting to maintain a check in pg_upgrade
which doesn't actually seem technically necessary? Or am I missing something
that would break if the WAL sizes didn't match across two clusters when
pg_upgrade moved the data between them?
-Jeremy
--
http://about.me/jeremy_schneider
(Disclaimer: I work for AWS and my opinions are my own.)
On Fri, Nov 17, 2017 at 5:46 PM, Jeremy Schneider
<schneider@ardentperf.com> wrote:
Having pg_upgrade simply allow an upgrade where the WAL sizes don't match
between the old cluster and the new cluster seems fairly trivial.
...
Writing code to change the WAL size on an existing cluster will be a little more
complex. We will need to delete all WAL files and recreate them at the new
sizes. We will need to either (1) be absolutely sure that there was a
clean shutdown
or (2) copy WAL data from the old files to the new files.
I think pg_resetwal has most of this logic already.
These two ideas don't seem mutually exclusive to me. If pg_upgrade
allows working
with different WAL sizes, it doesn't mean we can't still introduce a
utility to change the
WAL size on running clusters.
I don't think anyone is talking about doing this on a running cluster.
But it seems simple enough on a cluster that has been shut down.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Apologies for the delay.
Here is a patch to allow users to change the WAL segment size of a cluster with
pg_resetwal. Like the '--wal-segize' option in initdb, the new '-s' option
accepts segment sizes in megabytes. This patch also fixes two small issues that
I noticed.
The first is a division-by-zero error caused when the control data must be
guessed. If the control data is damaged, WalSegSz will not be set to the
guessed WAL segment size, resulting in an error during XLogFromFileName(). The
attached patch resolves this issue by ensuring WalSegSz is set to a valid value
after either reading or guessing the control values.
The second issue is with the automatically determined new WAL starting address
calculation. Currently, the starting address is determined by looking at the
last WAL file name, rounding the WAL byte position up to the next segment of the
new size, and then incrementing to the next WAL segment for the new size. I
believe this can cause the WAL byte position to go backwards in some cases (e.g.
WAL position is towards the end of a 1024 MB segment and pg_resetwal is used to
switch the segment size to 1 MB). Since the "current" WAL byte position
calculated in pg_resetwal is always the beginning of the latest log, the current
strategy may not yield a byte position sufficiently far ahead. The attached
patch changes this logic to determine the last byte position in the latest log
(for the old WAL segment size) prior to aligning to the new segment size and
incrementing.
Note that some segment size changes will cause old WAL file names to be reused.
The new documentation for '-s' contains a note warning of this side effect. I
considered a couple of strategies to prevent this, but I chose not to include
any in the patch based on previous correspondence in this thread. If file name
overlap is an issue, users can already use the '-l' option to set a safe WAL
starting address.
Nathan
Attachments:
pg_resetwal_segsize_v1.patchapplication/octet-stream; name=pg_resetwal_segsize_v1.patchDownload
diff --git a/doc/src/sgml/ref/pg_resetwal.sgml b/doc/src/sgml/ref/pg_resetwal.sgml
index 43b58a4..2bffb59 100644
--- a/doc/src/sgml/ref/pg_resetwal.sgml
+++ b/doc/src/sgml/ref/pg_resetwal.sgml
@@ -248,6 +248,30 @@ PostgreSQL documentation
</varlistentry>
<varlistentry>
+ <term><option>-s</option> <replaceable class="parameter">wal_segment_size</replaceable></term>
+ <listitem>
+ <para>
+ Manually set the WAL segment size (in megabytes).
+ </para>
+
+ <para>
+ The WAL segment size must be set to a power of 2 between 1 and 1024 (megabytes).
+ </para>
+
+ <note>
+ <para>
+ While <command>pg_resetwal</command> will set the WAL starting address
+ beyond the latest existing WAL segment file, some segment size changes
+ can cause previous WAL file names to be reused. It is recommended to use
+ <option>-l</option> together with <option>-s</option> to manually set the
+ WAL starting address if WAL file name overlap will cause problems with
+ your archiving strategy.
+ </para>
+ </note>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-x</option> <replaceable class="parameter">xid</replaceable></term>
<listitem>
<para>
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index a132cf2..4bb7cd7 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -71,6 +71,7 @@ static MultiXactOffset set_mxoff = (MultiXactOffset) -1;
static uint32 minXlogTli = 0;
static XLogSegNo minXlogSegNo = 0;
static int WalSegSz;
+static bool walSegSizeChanged = false;
static void CheckDataVersion(void);
static bool ReadControlFile(void);
@@ -117,7 +118,7 @@ main(int argc, char *argv[])
}
- while ((c = getopt(argc, argv, "c:D:e:fl:m:no:O:x:")) != -1)
+ while ((c = getopt(argc, argv, "c:D:e:fl:m:no:O:x:s:")) != -1)
{
switch (c)
{
@@ -275,6 +276,15 @@ main(int argc, char *argv[])
log_fname = pg_strdup(optarg);
break;
+ case 's':
+ WalSegSz = strtol(optarg, &endptr, 10) * 1024 * 1024;
+ if (*endptr != '\0' || !IsValidWalSegSize(WalSegSz))
+ {
+ fprintf(stderr, _("%s: WAL segment size (-s) must be a power of 2 between 1 and 1024 (megabytes)\n"), progname);
+ exit(1);
+ }
+ break;
+
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
@@ -357,6 +367,9 @@ main(int argc, char *argv[])
if (!ReadControlFile())
GuessControlValues();
+ if (WalSegSz == 0)
+ WalSegSz = ControlFile.xlog_seg_size;
+
if (log_fname != NULL)
XLogFromFileName(log_fname, &minXlogTli, &minXlogSegNo, WalSegSz);
@@ -423,6 +436,12 @@ main(int argc, char *argv[])
ControlFile.checkPointCopy.PrevTimeLineID = minXlogTli;
}
+ if (WalSegSz != ControlFile.xlog_seg_size)
+ {
+ ControlFile.xlog_seg_size = WalSegSz;
+ walSegSizeChanged = true;
+ }
+
if (minXlogSegNo > newXlogSegNo)
newXlogSegNo = minXlogSegNo;
@@ -593,14 +612,13 @@ ReadControlFile(void)
}
memcpy(&ControlFile, buffer, sizeof(ControlFile));
- WalSegSz = ControlFile.xlog_seg_size;
- /* return false if WalSegSz is not valid */
- if (!IsValidWalSegSize(WalSegSz))
+ /* return false if segment size is not valid */
+ if (!IsValidWalSegSize(ControlFile.xlog_seg_size))
{
fprintf(stderr,
_("%s: pg_control specifies invalid WAL segment size (%d bytes); proceed with caution \n"),
- progname, WalSegSz);
+ progname, ControlFile.xlog_seg_size);
guessed = true;
}
@@ -844,6 +862,12 @@ PrintNewControlValues(void)
printf(_("newestCommitTsXid: %u\n"),
ControlFile.checkPointCopy.newestCommitTsXid);
}
+
+ if (walSegSizeChanged)
+ {
+ printf(_("Bytes per WAL segment: %u\n"),
+ ControlFile.xlog_seg_size);
+ }
}
@@ -895,9 +919,6 @@ RewriteControlFile(void)
ControlFile.max_prepared_xacts = 0;
ControlFile.max_locks_per_xact = 64;
- /* Now we can force the recorded xlog seg size to the right thing. */
- ControlFile.xlog_seg_size = WalSegSz;
-
/* Contents are protected with a CRC */
INIT_CRC32C(ControlFile.crc);
COMP_CRC32C(ControlFile.crc,
@@ -1033,7 +1054,7 @@ FindEndOfXLOG(void)
* are in virgin territory.
*/
xlogbytepos = newXlogSegNo * ControlFile.xlog_seg_size;
- newXlogSegNo = (xlogbytepos + WalSegSz - 1) / WalSegSz;
+ newXlogSegNo = (xlogbytepos + ControlFile.xlog_seg_size - 1) / WalSegSz;
newXlogSegNo++;
}
@@ -1263,6 +1284,7 @@ usage(void)
printf(_(" -O OFFSET set next multitransaction offset\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -x XID set next transaction ID\n"));
+ printf(_(" -s SEGMENTSIZE set WAL segment size (in megabytes)\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
}
On 2/7/18 13:34, Bossart, Nathan wrote:
Here is a patch to allow users to change the WAL segment size of a cluster with
pg_resetwal. Like the '--wal-segize' option in initdb, the new '-s' option
accepts segment sizes in megabytes.
Or how about we call the new option, um, --wal-segsize?
The first is a division-by-zero error caused when the control data must be
guessed. If the control data is damaged, WalSegSz will not be set to the
guessed WAL segment size, resulting in an error during XLogFromFileName(). The
attached patch resolves this issue by ensuring WalSegSz is set to a valid value
after either reading or guessing the control values.
I noticed this issue in pg_controldata too. Maybe that should be
combined in a separate patch.
Note that some segment size changes will cause old WAL file names to be reused.
The new documentation for '-s' contains a note warning of this side effect. I
considered a couple of strategies to prevent this, but I chose not to include
any in the patch based on previous correspondence in this thread. If file name
overlap is an issue, users can already use the '-l' option to set a safe WAL
starting address.
The patch "Configurable file mode mask" contains the beginning of a test
suite for pg_resetwal. It would be great if we could get a test case
for this new functionality implemented.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thanks for taking a look.
On 3/3/18, 12:22 PM, "Peter Eisentraut" <peter.eisentraut@2ndquadrant.com> wrote:
On 2/7/18 13:34, Bossart, Nathan wrote:
Here is a patch to allow users to change the WAL segment size of a cluster with
pg_resetwal. Like the '--wal-segize' option in initdb, the new '-s' option
accepts segment sizes in megabytes.Or how about we call the new option, um, --wal-segsize?
It doesn't look like pg_resetwal takes any long options except for
--help and --version, although I suppose those could be added as part
of this effort.
The first is a division-by-zero error caused when the control data must be
guessed. If the control data is damaged, WalSegSz will not be set to the
guessed WAL segment size, resulting in an error during XLogFromFileName(). The
attached patch resolves this issue by ensuring WalSegSz is set to a valid value
after either reading or guessing the control values.I noticed this issue in pg_controldata too. Maybe that should be
combined in a separate patch.
IIUC you are talking about the following code in pg_controldata:
/*
* Calculate name of the WAL file containing the latest checkpoint's REDO
* start point.
*/
XLByteToSeg(ControlFile->checkPointCopy.redo, segno, WalSegSz);
XLogFileName(xlogfilename, ControlFile->checkPointCopy.ThisTimeLineID,
segno, WalSegSz);
If the control file says that the WAL segment size is 0 bytes, then
XLByteToSeg() will indeed fail. I think a similar issue is still
present for XLogFromFileName() in pg_resetwal even with this patch.
For pg_controldata, perhaps the affected output ("Latest checkpoint's
REDO WAL file") should be marked unknown if the control file specifies
a 0-byte WAL segment size. For pg_resetwal, perhaps the WAL segment
size should be "guessed" in this case.
The patch "Configurable file mode mask" contains the beginning of a test
suite for pg_resetwal. It would be great if we could get a test case
for this new functionality implemented.
I agree. I will take a look at that patch set.
Nathan
Here is a new set of patches that addresses most of Peter's feedback.
I've split it into four pieces:
0001: Fix division-by-zero error in pg_controldata
0002: Fix division-by-zero error in pg_resetwal
0003: Allow users to specify WAL segment size in pg_resetwal
0004: Add long options to pg_resetwal (including --wal-segsize)
On 3/3/18, 12:22 PM, "Peter Eisentraut" <peter.eisentraut@2ndquadrant.com> wrote:
The patch "Configurable file mode mask" contains the beginning of a test
suite for pg_resetwal. It would be great if we could get a test case
for this new functionality implemented.
I haven't added a test case yet, as this thread [0]/messages/by-id/ad346fe6-b23e-59f1-ecb7-0e08390ad629@pgmasters.net seems to still be
in-progress.
Nathan
[0]: /messages/by-id/ad346fe6-b23e-59f1-ecb7-0e08390ad629@pgmasters.net
Attachments:
v2-0001-Prevent-division-by-zero-errors-in-pg_controldata.patchapplication/octet-stream; name=v2-0001-Prevent-division-by-zero-errors-in-pg_controldata.patchDownload
From c07e622b8b47d249608d34bd334db5532b051b6b Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Thu, 8 Mar 2018 16:48:47 +0000
Subject: [PATCH v2 1/4] Prevent division-by-zero errors in pg_controldata.
If the control file is corrupted and specifies the WAL segment size
to be 0 bytes, calculating the latest checkpoint's REDO WAL file
will fail with a division-by-zero error. This change marks the WAL
file name as "unknown" in this case.
---
src/bin/pg_controldata/pg_controldata.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index cc73b7d..709418e 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -193,10 +193,19 @@ main(int argc, char *argv[])
/*
* Calculate name of the WAL file containing the latest checkpoint's REDO
* start point.
+ *
+ * If the control file indicates that the WAL segment size is 0 bytes, we
+ * cannot calculate the latest checkpoint's REDO WAL file, so mark it as
+ * unknown.
*/
- XLByteToSeg(ControlFile->checkPointCopy.redo, segno, WalSegSz);
- XLogFileName(xlogfilename, ControlFile->checkPointCopy.ThisTimeLineID,
- segno, WalSegSz);
+ if (WalSegSz == 0)
+ strcpy(xlogfilename, "unknown");
+ else
+ {
+ XLByteToSeg(ControlFile->checkPointCopy.redo, segno, WalSegSz);
+ XLogFileName(xlogfilename, ControlFile->checkPointCopy.ThisTimeLineID,
+ segno, WalSegSz);
+ }
/*
* Format system_identifier and mock_authentication_nonce separately to
--
2.7.3.AMZN
v2-0002-Prevent-division-by-zero-errors-in-pg_resetwal.patchapplication/octet-stream; name=v2-0002-Prevent-division-by-zero-errors-in-pg_resetwal.patchDownload
From e00b10052ec7fd42296401491875e209c6ae0489 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Tue, 13 Mar 2018 23:36:37 +0000
Subject: [PATCH v2 2/4] Prevent division-by-zero errors in pg_resetwal.
Currently, if the control file data cannot be read, the WalSegSz
will stay set to 0, which will cause division-by-zero errors later
on. This change fixes this by ensuring that the WalSegSz variable
is set when the control file data must be guessed.
Even so, the control file may be corrupted and specify the WAL
segment size to be 0 bytes. With this change, pg_resetwal defaults
to a "guessed" segment size (the default WAL segment size) for this
case.
While the WalSegSz variable could presumably be removed entirely as
part of this change, it is likely to prove useful in a follow-up
change to allow changing the WAL segment size of a cluster with
pg_resetwal.
---
src/bin/pg_resetwal/pg_resetwal.c | 29 ++++++++++++++++++++++-------
1 file changed, 22 insertions(+), 7 deletions(-)
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index a132cf2..a5df0fe 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -71,6 +71,7 @@ static MultiXactOffset set_mxoff = (MultiXactOffset) -1;
static uint32 minXlogTli = 0;
static XLogSegNo minXlogSegNo = 0;
static int WalSegSz;
+static bool walSegSizeChanged = false;
static void CheckDataVersion(void);
static bool ReadControlFile(void);
@@ -357,6 +358,19 @@ main(int argc, char *argv[])
if (!ReadControlFile())
GuessControlValues();
+ /*
+ * Set WalSegSz.
+ *
+ * If the control file specifies the WAL segment size to be 0 bytes, guess
+ * that it should be the default WAL segment size.
+ */
+ if (ControlFile.xlog_seg_size == 0)
+ {
+ ControlFile.xlog_seg_size = DEFAULT_XLOG_SEG_SIZE;
+ walSegSizeChanged = true;
+ }
+ WalSegSz = ControlFile.xlog_seg_size;
+
if (log_fname != NULL)
XLogFromFileName(log_fname, &minXlogTli, &minXlogSegNo, WalSegSz);
@@ -593,14 +607,12 @@ ReadControlFile(void)
}
memcpy(&ControlFile, buffer, sizeof(ControlFile));
- WalSegSz = ControlFile.xlog_seg_size;
- /* return false if WalSegSz is not valid */
- if (!IsValidWalSegSize(WalSegSz))
+ if (!IsValidWalSegSize(ControlFile.xlog_seg_size))
{
fprintf(stderr,
_("%s: pg_control specifies invalid WAL segment size (%d bytes); proceed with caution \n"),
- progname, WalSegSz);
+ progname, ControlFile.xlog_seg_size);
guessed = true;
}
@@ -844,6 +856,12 @@ PrintNewControlValues(void)
printf(_("newestCommitTsXid: %u\n"),
ControlFile.checkPointCopy.newestCommitTsXid);
}
+
+ if (walSegSizeChanged)
+ {
+ printf(_("Bytes per WAL segment: %u\n"),
+ ControlFile.xlog_seg_size);
+ }
}
@@ -895,9 +913,6 @@ RewriteControlFile(void)
ControlFile.max_prepared_xacts = 0;
ControlFile.max_locks_per_xact = 64;
- /* Now we can force the recorded xlog seg size to the right thing. */
- ControlFile.xlog_seg_size = WalSegSz;
-
/* Contents are protected with a CRC */
INIT_CRC32C(ControlFile.crc);
COMP_CRC32C(ControlFile.crc,
--
2.7.3.AMZN
v2-0003-Allow-users-to-change-the-WAL-segment-size-with-p.patchapplication/octet-stream; name=v2-0003-Allow-users-to-change-the-WAL-segment-size-with-p.patchDownload
From b9a440ba4892f43e1f65c79c7b4da86d00053530 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Tue, 13 Mar 2018 23:37:28 +0000
Subject: [PATCH v2 3/4] Allow users to change the WAL segment size with
pg_resetwal.
Like the '--wal-segsize' option in initdb, the new 's' option
accepts segment sizes in megabytes. This patch also fixes a small
issue with the WAL starting address calculation that would cause
the WAL byte position to go backwards in some cases.
Note that some segment size changes will cause old WAL file names
to be reused. The new documentation for '-s' contains a note
warning of this side effect and how to avoid it.
---
doc/src/sgml/ref/pg_resetwal.sgml | 24 ++++++++++++++++++++++++
src/bin/pg_resetwal/pg_resetwal.c | 25 ++++++++++++++++++++++---
2 files changed, 46 insertions(+), 3 deletions(-)
diff --git a/doc/src/sgml/ref/pg_resetwal.sgml b/doc/src/sgml/ref/pg_resetwal.sgml
index 43b58a4..2bffb59 100644
--- a/doc/src/sgml/ref/pg_resetwal.sgml
+++ b/doc/src/sgml/ref/pg_resetwal.sgml
@@ -248,6 +248,30 @@ PostgreSQL documentation
</varlistentry>
<varlistentry>
+ <term><option>-s</option> <replaceable class="parameter">wal_segment_size</replaceable></term>
+ <listitem>
+ <para>
+ Manually set the WAL segment size (in megabytes).
+ </para>
+
+ <para>
+ The WAL segment size must be set to a power of 2 between 1 and 1024 (megabytes).
+ </para>
+
+ <note>
+ <para>
+ While <command>pg_resetwal</command> will set the WAL starting address
+ beyond the latest existing WAL segment file, some segment size changes
+ can cause previous WAL file names to be reused. It is recommended to use
+ <option>-l</option> together with <option>-s</option> to manually set the
+ WAL starting address if WAL file name overlap will cause problems with
+ your archiving strategy.
+ </para>
+ </note>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-x</option> <replaceable class="parameter">xid</replaceable></term>
<listitem>
<para>
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index a5df0fe..604a8d2 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -118,7 +118,7 @@ main(int argc, char *argv[])
}
- while ((c = getopt(argc, argv, "c:D:e:fl:m:no:O:x:")) != -1)
+ while ((c = getopt(argc, argv, "c:D:e:fl:m:no:O:x:s:")) != -1)
{
switch (c)
{
@@ -276,6 +276,15 @@ main(int argc, char *argv[])
log_fname = pg_strdup(optarg);
break;
+ case 's':
+ WalSegSz = strtol(optarg, &endptr, 10) * 1024 * 1024;
+ if (*endptr != '\0' || !IsValidWalSegSize(WalSegSz))
+ {
+ fprintf(stderr, _("%s: WAL segment size (-s) must be a power of 2 between 1 and 1024 (megabytes)\n"), progname);
+ exit(1);
+ }
+ break;
+
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
@@ -363,13 +372,16 @@ main(int argc, char *argv[])
*
* If the control file specifies the WAL segment size to be 0 bytes, guess
* that it should be the default WAL segment size.
+ *
+ * If no new WAL segment size was set, use the control file value.
*/
if (ControlFile.xlog_seg_size == 0)
{
ControlFile.xlog_seg_size = DEFAULT_XLOG_SEG_SIZE;
walSegSizeChanged = true;
}
- WalSegSz = ControlFile.xlog_seg_size;
+ if (WalSegSz == 0)
+ WalSegSz = ControlFile.xlog_seg_size;
if (log_fname != NULL)
XLogFromFileName(log_fname, &minXlogTli, &minXlogSegNo, WalSegSz);
@@ -437,6 +449,12 @@ main(int argc, char *argv[])
ControlFile.checkPointCopy.PrevTimeLineID = minXlogTli;
}
+ if (WalSegSz != ControlFile.xlog_seg_size)
+ {
+ ControlFile.xlog_seg_size = WalSegSz;
+ walSegSizeChanged = true;
+ }
+
if (minXlogSegNo > newXlogSegNo)
newXlogSegNo = minXlogSegNo;
@@ -1048,7 +1066,7 @@ FindEndOfXLOG(void)
* are in virgin territory.
*/
xlogbytepos = newXlogSegNo * ControlFile.xlog_seg_size;
- newXlogSegNo = (xlogbytepos + WalSegSz - 1) / WalSegSz;
+ newXlogSegNo = (xlogbytepos + ControlFile.xlog_seg_size - 1) / WalSegSz;
newXlogSegNo++;
}
@@ -1276,6 +1294,7 @@ usage(void)
printf(_(" -n no update, just show what would be done (for testing)\n"));
printf(_(" -o OID set next OID\n"));
printf(_(" -O OFFSET set next multitransaction offset\n"));
+ printf(_(" -s SIZE set WAL segment size (in megabytes)\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -x XID set next transaction ID\n"));
printf(_(" -?, --help show this help, then exit\n"));
--
2.7.3.AMZN
v2-0004-Add-long-options-to-pg_resetwal.patchapplication/octet-stream; name=v2-0004-Add-long-options-to-pg_resetwal.patchDownload
From df8cfdf848de9f6b49964cc776617a6cbc34a237 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Wed, 14 Mar 2018 00:45:40 +0000
Subject: [PATCH v2 4/4] Add long options to pg_resetwal.
---
doc/src/sgml/ref/pg_resetwal.sgml | 28 +++++++++++++++++++++---
src/bin/pg_resetwal/pg_resetwal.c | 45 ++++++++++++++++++++++++++-------------
2 files changed, 55 insertions(+), 18 deletions(-)
diff --git a/doc/src/sgml/ref/pg_resetwal.sgml b/doc/src/sgml/ref/pg_resetwal.sgml
index 2bffb59..ce87403 100644
--- a/doc/src/sgml/ref/pg_resetwal.sgml
+++ b/doc/src/sgml/ref/pg_resetwal.sgml
@@ -22,10 +22,22 @@ PostgreSQL documentation
<refsynopsisdiv>
<cmdsynopsis>
<command>pg_resetwal</command>
- <arg choice="opt"><option>-f</option></arg>
- <arg choice="opt"><option>-n</option></arg>
+ <group choice="opt">
+ <arg choice="plain"><option>--force</option></arg>
+ <arg choice="plain"><option>-f</option></arg>
+ </group>
+ <group choice="opt">
+ <arg choice="plain"><option>--no-update</option></arg>
+ <arg choice="plain"><option>-n</option></arg>
+ </group>
<arg rep="repeat"><replaceable>option</replaceable></arg>
- <arg choice="req"><arg choice="opt"><option>-D</option></arg> <replaceable class="parameter">datadir</replaceable></arg>
+ <arg choice="req">
+ <group choice="opt">
+ <arg choice="plain"><option>--pgdata</option></arg>
+ <arg choice="plain"><option>-D</option></arg>
+ </group>
+ <replaceable class="parameter"> datadir</replaceable>
+ </arg>
</cmdsynopsis>
</refsynopsisdiv>
@@ -78,6 +90,7 @@ PostgreSQL documentation
<variablelist>
<varlistentry>
<term><option>-f</option></term>
+ <term><option>--force</option></term>
<listitem>
<para>
Force <command>pg_resetwal</command> to proceed even if it cannot determine
@@ -88,6 +101,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-n</option></term>
+ <term><option>--no-update</option></term>
<listitem>
<para>
The <option>-n</option> (no operation) option instructs
@@ -124,6 +138,7 @@ PostgreSQL documentation
<variablelist>
<varlistentry>
<term><option>-c</option> <replaceable class="parameter">xid</replaceable>,<replaceable class="parameter">xid</replaceable></term>
+ <term><option>--xact-ids=</option><replaceable class="parameter">xid</replaceable>,<replaceable class="parameter">xid</replaceable></term>
<listitem>
<para>
Manually set the oldest and newest transaction IDs for which the commit
@@ -145,6 +160,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-e</option> <replaceable class="parameter">xid_epoch</replaceable></term>
+ <term><option>--epoch=</option><replaceable class="parameter">xid_epoch</replaceable></term>
<listitem>
<para>
Manually set the next transaction ID's epoch.
@@ -165,6 +181,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-l</option> <replaceable class="parameter">walfile</replaceable></term>
+ <term><option>--wal-address=</option><replaceable class="parameter">walfile</replaceable></term>
<listitem>
<para>
Manually set the WAL starting address.
@@ -196,6 +213,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-m</option> <replaceable class="parameter">mxid</replaceable>,<replaceable class="parameter">mxid</replaceable></term>
+ <term><option>--multixact-ids=</option><replaceable class="parameter">mxid</replaceable>,<replaceable class="parameter">mxid</replaceable></term>
<listitem>
<para>
Manually set the next and oldest multitransaction ID.
@@ -217,6 +235,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-o</option> <replaceable class="parameter">oid</replaceable></term>
+ <term><option>--next-oid=</option><replaceable class="parameter">oid</replaceable></term>
<listitem>
<para>
Manually set the next OID.
@@ -232,6 +251,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-O</option> <replaceable class="parameter">mxoff</replaceable></term>
+ <term><option>--multixact-offset=</option><replaceable class="parameter">mxoff</replaceable></term>
<listitem>
<para>
Manually set the next multitransaction offset.
@@ -249,6 +269,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-s</option> <replaceable class="parameter">wal_segment_size</replaceable></term>
+ <term><option>--wal-segsize=</option><replaceable class="parameter">wal_segment_size</replaceable></term>
<listitem>
<para>
Manually set the WAL segment size (in megabytes).
@@ -273,6 +294,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-x</option> <replaceable class="parameter">xid</replaceable></term>
+ <term><option>--next-xact-id=</option><replaceable class="parameter">xid</replaceable></term>
<listitem>
<para>
Manually set the next transaction ID.
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 604a8d2..b3c1d8d 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -89,6 +89,21 @@ static void usage(void);
int
main(int argc, char *argv[])
{
+ static struct option long_options[] = {
+ {"pgdata", required_argument, NULL, 'D'},
+ {"force", no_argument, NULL, 'f'},
+ {"no-update", no_argument, NULL, 'n'},
+ {"epoch", required_argument, NULL, 'e'},
+ {"next-xact-id", required_argument, NULL, 'x'},
+ {"xact-ids", required_argument, NULL, 'c'},
+ {"next-oid", required_argument, NULL, 'o'},
+ {"multixact-ids", required_argument, NULL, 'm'},
+ {"multixact-offset", required_argument, NULL, 'O'},
+ {"wal-address", required_argument, NULL, 'l'},
+ {"wal-segsize", required_argument, NULL, 's'},
+ {NULL, 0, NULL, 0}
+ };
+
int c;
bool force = false;
bool noupdate = false;
@@ -118,7 +133,7 @@ main(int argc, char *argv[])
}
- while ((c = getopt(argc, argv, "c:D:e:fl:m:no:O:x:s:")) != -1)
+ while ((c = getopt_long(argc, argv, "c:D:e:fl:m:no:O:x:s:", long_options, NULL)) != -1)
{
switch (c)
{
@@ -1284,19 +1299,19 @@ usage(void)
printf(_("%s resets the PostgreSQL write-ahead log.\n\n"), progname);
printf(_("Usage:\n %s [OPTION]... DATADIR\n\n"), progname);
printf(_("Options:\n"));
- printf(_(" -c XID,XID set oldest and newest transactions bearing commit timestamp\n"));
- printf(_(" (zero in either value means no change)\n"));
- printf(_(" [-D] DATADIR data directory\n"));
- printf(_(" -e XIDEPOCH set next transaction ID epoch\n"));
- printf(_(" -f force update to be done\n"));
- printf(_(" -l WALFILE force minimum WAL starting location for new write-ahead log\n"));
- printf(_(" -m MXID,MXID set next and oldest multitransaction ID\n"));
- printf(_(" -n no update, just show what would be done (for testing)\n"));
- printf(_(" -o OID set next OID\n"));
- printf(_(" -O OFFSET set next multitransaction offset\n"));
- printf(_(" -s SIZE set WAL segment size (in megabytes)\n"));
- printf(_(" -V, --version output version information, then exit\n"));
- printf(_(" -x XID set next transaction ID\n"));
- printf(_(" -?, --help show this help, then exit\n"));
+ printf(_(" -c, --xact-ids=XID,XID set oldest and newest transactions bearing commit timestamp\n"));
+ printf(_(" (zero in either value means no change)\n"));
+ printf(_(" [-D, --pgdata=]DATADIR data directory\n"));
+ printf(_(" -e, --epoch=XIDEPOCH set next transaction ID epoch\n"));
+ printf(_(" -f, --force force update to be done\n"));
+ printf(_(" -l, --wal-address=WALFILE force minimum WAL starting location for new write-ahead log\n"));
+ printf(_(" -m, --multixact-ids=MXID,MXID set next and oldest multitransaction ID\n"));
+ printf(_(" -n, --no-update no update, just show what would be done (for testing)\n"));
+ printf(_(" -o, --next-oid=OID set next OID\n"));
+ printf(_(" -O, --multixact-offset=OFFSET set next multitransaction offset\n"));
+ printf(_(" -s, --wal-segsize=SIZE set WAL segment size (in megabytes)\n"));
+ printf(_(" -V, --version output version information, then exit\n"));
+ printf(_(" -x, --next-xact-id=XID set next transaction ID\n"));
+ printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
}
--
2.7.3.AMZN
On 3/13/18 20:53, Bossart, Nathan wrote:
Here is a new set of patches that addresses most of Peter's feedback.
I've split it into four pieces:0001: Fix division-by-zero error in pg_controldata
committed that
0002: Fix division-by-zero error in pg_resetwal
This looks a bit more complicated than necessary to me. I think there
is a mistake in the original patch fc49e24fa69: In ReadControlFile(),
it says
/* return false if WalSegSz is not valid */
but then it doesn't actually do that.
If we make that change, then a wrong WAL segment size in the control
file would send us to GuessControlValues(). There, we need to set
WalSegSz, and everything would work.
diff --git a/src/bin/pg_resetwal/pg_resetwal.c
b/src/bin/pg_resetwal/pg_resetwal.c
index a132cf2e9f..c99e7a8db1 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -601,7 +601,7 @@ ReadControlFile(void)
fprintf(stderr,
_("%s: pg_control specifies invalid WAL segment size
(%d bytes); proceed with caution \n"),
progname, WalSegSz);
- guessed = true;
+ return false;
}
return true;
@@ -678,7 +678,7 @@ GuessControlValues(void)
ControlFile.floatFormat = FLOATFORMAT_VALUE;
ControlFile.blcksz = BLCKSZ;
ControlFile.relseg_size = RELSEG_SIZE;
- ControlFile.xlog_blcksz = XLOG_BLCKSZ;
+ WalSegSz = ControlFile.xlog_blcksz = XLOG_BLCKSZ;
ControlFile.xlog_seg_size = DEFAULT_XLOG_SEG_SIZE;
ControlFile.nameDataLen = NAMEDATALEN;
ControlFile.indexMaxKeys = INDEX_MAX_KEYS;
What do you think?
Does your patch aim to do something different?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/21/18, 12:57 PM, "Peter Eisentraut" <peter.eisentraut@2ndquadrant.com> wrote:
On 3/13/18 20:53, Bossart, Nathan wrote:
0001: Fix division-by-zero error in pg_controldata
committed that
Thanks!
0002: Fix division-by-zero error in pg_resetwal
This looks a bit more complicated than necessary to me. I think there
is a mistake in the original patch fc49e24fa69: In ReadControlFile(),
it says/* return false if WalSegSz is not valid */
but then it doesn't actually do that.
If we make that change, then a wrong WAL segment size in the control
file would send us to GuessControlValues(). There, we need to set
WalSegSz, and everything would work.diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c index a132cf2e9f..c99e7a8db1 100644 --- a/src/bin/pg_resetwal/pg_resetwal.c +++ b/src/bin/pg_resetwal/pg_resetwal.c @@ -601,7 +601,7 @@ ReadControlFile(void) fprintf(stderr, _("%s: pg_control specifies invalid WAL segment size (%d bytes); proceed with caution \n"), progname, WalSegSz); - guessed = true; + return false; }return true; @@ -678,7 +678,7 @@ GuessControlValues(void) ControlFile.floatFormat = FLOATFORMAT_VALUE; ControlFile.blcksz = BLCKSZ; ControlFile.relseg_size = RELSEG_SIZE; - ControlFile.xlog_blcksz = XLOG_BLCKSZ; + WalSegSz = ControlFile.xlog_blcksz = XLOG_BLCKSZ; ControlFile.xlog_seg_size = DEFAULT_XLOG_SEG_SIZE; ControlFile.nameDataLen = NAMEDATALEN; ControlFile.indexMaxKeys = INDEX_MAX_KEYS;What do you think?
Does your patch aim to do something different?
With my patch, I was trying to still use the rest of the values in the
control file, altering only the WAL segment size. Also, I was doing a
bit of setup for 0003, where WalSegSz is used as the "new" WAL segment
size.
However, I think your approach is better. In addition to being much
simpler, it might make more sense to treat the control file as
corrupted if the WAL segment size is invalid, anyway. Any setup for
0003 can be easily moved, too.
Nathan
Here is an updated set of patches that use the newly proposed approach
for avoiding division-by-zero errors in pg_resetwal.
Nathan
Attachments:
v3-0001-Prevent-division-by-zero-errors-in-pg_resetwal.patchapplication/octet-stream; name=v3-0001-Prevent-division-by-zero-errors-in-pg_resetwal.patchDownload
From 313734e050c391c13016befe7e57d3d5a87d778d Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Wed, 21 Mar 2018 19:18:40 +0000
Subject: [PATCH v3 1/3] Prevent division-by-zero errors in pg_resetwal.
---
src/bin/pg_resetwal/pg_resetwal.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index a132cf2..387ce51 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -601,7 +601,7 @@ ReadControlFile(void)
fprintf(stderr,
_("%s: pg_control specifies invalid WAL segment size (%d bytes); proceed with caution \n"),
progname, WalSegSz);
- guessed = true;
+ return false;
}
return true;
@@ -679,7 +679,7 @@ GuessControlValues(void)
ControlFile.blcksz = BLCKSZ;
ControlFile.relseg_size = RELSEG_SIZE;
ControlFile.xlog_blcksz = XLOG_BLCKSZ;
- ControlFile.xlog_seg_size = DEFAULT_XLOG_SEG_SIZE;
+ WalSegSz = ControlFile.xlog_seg_size = DEFAULT_XLOG_SEG_SIZE;
ControlFile.nameDataLen = NAMEDATALEN;
ControlFile.indexMaxKeys = INDEX_MAX_KEYS;
ControlFile.toast_max_chunk_size = TOAST_MAX_CHUNK_SIZE;
--
2.7.3.AMZN
v3-0002-Allow-users-to-change-the-WAL-segment-size-with-p.patchapplication/octet-stream; name=v3-0002-Allow-users-to-change-the-WAL-segment-size-with-p.patchDownload
From b3aaf8f4a79f3b66a9ca5e1a058f155f95937fef Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Wed, 21 Mar 2018 20:02:27 +0000
Subject: [PATCH v3 2/3] Allow users to change the WAL segment size with
pg_resetwal.
---
doc/src/sgml/ref/pg_resetwal.sgml | 24 +++++++++++++++++++++
src/bin/pg_resetwal/pg_resetwal.c | 45 ++++++++++++++++++++++++++++++---------
2 files changed, 59 insertions(+), 10 deletions(-)
diff --git a/doc/src/sgml/ref/pg_resetwal.sgml b/doc/src/sgml/ref/pg_resetwal.sgml
index 43b58a4..2bffb59 100644
--- a/doc/src/sgml/ref/pg_resetwal.sgml
+++ b/doc/src/sgml/ref/pg_resetwal.sgml
@@ -248,6 +248,30 @@ PostgreSQL documentation
</varlistentry>
<varlistentry>
+ <term><option>-s</option> <replaceable class="parameter">wal_segment_size</replaceable></term>
+ <listitem>
+ <para>
+ Manually set the WAL segment size (in megabytes).
+ </para>
+
+ <para>
+ The WAL segment size must be set to a power of 2 between 1 and 1024 (megabytes).
+ </para>
+
+ <note>
+ <para>
+ While <command>pg_resetwal</command> will set the WAL starting address
+ beyond the latest existing WAL segment file, some segment size changes
+ can cause previous WAL file names to be reused. It is recommended to use
+ <option>-l</option> together with <option>-s</option> to manually set the
+ WAL starting address if WAL file name overlap will cause problems with
+ your archiving strategy.
+ </para>
+ </note>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><option>-x</option> <replaceable class="parameter">xid</replaceable></term>
<listitem>
<para>
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 387ce51..506fdda 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -71,6 +71,7 @@ static MultiXactOffset set_mxoff = (MultiXactOffset) -1;
static uint32 minXlogTli = 0;
static XLogSegNo minXlogSegNo = 0;
static int WalSegSz;
+static bool walSegSzChanged = false;
static void CheckDataVersion(void);
static bool ReadControlFile(void);
@@ -117,7 +118,7 @@ main(int argc, char *argv[])
}
- while ((c = getopt(argc, argv, "c:D:e:fl:m:no:O:x:")) != -1)
+ while ((c = getopt(argc, argv, "c:D:e:fl:m:no:O:x:s:")) != -1)
{
switch (c)
{
@@ -275,6 +276,15 @@ main(int argc, char *argv[])
log_fname = pg_strdup(optarg);
break;
+ case 's':
+ WalSegSz = strtol(optarg, &endptr, 10) * 1024 * 1024;
+ if (endptr == optarg || *endptr != '\0' || !IsValidWalSegSize(WalSegSz))
+ {
+ fprintf(stderr, _("%s: WAL segment size (-s) must be a power of 2 between 1 and 1024 (megabytes)\n"), progname);
+ exit(1);
+ }
+ break;
+
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
@@ -357,6 +367,12 @@ main(int argc, char *argv[])
if (!ReadControlFile())
GuessControlValues();
+ /*
+ * If no new WAL segment size was specified, use the control file value.
+ */
+ if (WalSegSz == 0)
+ WalSegSz = ControlFile.xlog_seg_size;
+
if (log_fname != NULL)
XLogFromFileName(log_fname, &minXlogTli, &minXlogSegNo, WalSegSz);
@@ -423,6 +439,12 @@ main(int argc, char *argv[])
ControlFile.checkPointCopy.PrevTimeLineID = minXlogTli;
}
+ if (WalSegSz != ControlFile.xlog_seg_size)
+ {
+ ControlFile.xlog_seg_size = WalSegSz;
+ walSegSzChanged = true;
+ }
+
if (minXlogSegNo > newXlogSegNo)
newXlogSegNo = minXlogSegNo;
@@ -593,14 +615,13 @@ ReadControlFile(void)
}
memcpy(&ControlFile, buffer, sizeof(ControlFile));
- WalSegSz = ControlFile.xlog_seg_size;
- /* return false if WalSegSz is not valid */
- if (!IsValidWalSegSize(WalSegSz))
+ /* return false if WAL segment size is not valid */
+ if (!IsValidWalSegSize(ControlFile.xlog_seg_size))
{
fprintf(stderr,
_("%s: pg_control specifies invalid WAL segment size (%d bytes); proceed with caution \n"),
- progname, WalSegSz);
+ progname, ControlFile.xlog_seg_size);
return false;
}
@@ -679,7 +700,7 @@ GuessControlValues(void)
ControlFile.blcksz = BLCKSZ;
ControlFile.relseg_size = RELSEG_SIZE;
ControlFile.xlog_blcksz = XLOG_BLCKSZ;
- WalSegSz = ControlFile.xlog_seg_size = DEFAULT_XLOG_SEG_SIZE;
+ ControlFile.xlog_seg_size = DEFAULT_XLOG_SEG_SIZE;
ControlFile.nameDataLen = NAMEDATALEN;
ControlFile.indexMaxKeys = INDEX_MAX_KEYS;
ControlFile.toast_max_chunk_size = TOAST_MAX_CHUNK_SIZE;
@@ -844,6 +865,12 @@ PrintNewControlValues(void)
printf(_("newestCommitTsXid: %u\n"),
ControlFile.checkPointCopy.newestCommitTsXid);
}
+
+ if (walSegSzChanged)
+ {
+ printf(_("Bytes per WAL segment: %u\n"),
+ ControlFile.xlog_seg_size);
+ }
}
@@ -895,9 +922,6 @@ RewriteControlFile(void)
ControlFile.max_prepared_xacts = 0;
ControlFile.max_locks_per_xact = 64;
- /* Now we can force the recorded xlog seg size to the right thing. */
- ControlFile.xlog_seg_size = WalSegSz;
-
/* Contents are protected with a CRC */
INIT_CRC32C(ControlFile.crc);
COMP_CRC32C(ControlFile.crc,
@@ -1033,7 +1057,7 @@ FindEndOfXLOG(void)
* are in virgin territory.
*/
xlogbytepos = newXlogSegNo * ControlFile.xlog_seg_size;
- newXlogSegNo = (xlogbytepos + WalSegSz - 1) / WalSegSz;
+ newXlogSegNo = (xlogbytepos + ControlFile.xlog_seg_size - 1) / WalSegSz;
newXlogSegNo++;
}
@@ -1261,6 +1285,7 @@ usage(void)
printf(_(" -n no update, just show what would be done (for testing)\n"));
printf(_(" -o OID set next OID\n"));
printf(_(" -O OFFSET set next multitransaction offset\n"));
+ printf(_(" -s SIZE set WAL segment size (in megabytes)\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -x XID set next transaction ID\n"));
printf(_(" -?, --help show this help, then exit\n"));
--
2.7.3.AMZN
v3-0003-Add-long-options-to-pg_resetwal.patchapplication/octet-stream; name=v3-0003-Add-long-options-to-pg_resetwal.patchDownload
From a6042d6a769d4c40b8384b6bf9c533ee293a93df Mon Sep 17 00:00:00 2001
From: Nathan Bossart <bossartn@amazon.com>
Date: Wed, 21 Mar 2018 20:03:56 +0000
Subject: [PATCH v3 3/3] Add long options to pg_resetwal.
---
doc/src/sgml/ref/pg_resetwal.sgml | 28 +++++++++++++++++++++---
src/bin/pg_resetwal/pg_resetwal.c | 45 ++++++++++++++++++++++++++-------------
2 files changed, 55 insertions(+), 18 deletions(-)
diff --git a/doc/src/sgml/ref/pg_resetwal.sgml b/doc/src/sgml/ref/pg_resetwal.sgml
index 2bffb59..ce87403 100644
--- a/doc/src/sgml/ref/pg_resetwal.sgml
+++ b/doc/src/sgml/ref/pg_resetwal.sgml
@@ -22,10 +22,22 @@ PostgreSQL documentation
<refsynopsisdiv>
<cmdsynopsis>
<command>pg_resetwal</command>
- <arg choice="opt"><option>-f</option></arg>
- <arg choice="opt"><option>-n</option></arg>
+ <group choice="opt">
+ <arg choice="plain"><option>--force</option></arg>
+ <arg choice="plain"><option>-f</option></arg>
+ </group>
+ <group choice="opt">
+ <arg choice="plain"><option>--no-update</option></arg>
+ <arg choice="plain"><option>-n</option></arg>
+ </group>
<arg rep="repeat"><replaceable>option</replaceable></arg>
- <arg choice="req"><arg choice="opt"><option>-D</option></arg> <replaceable class="parameter">datadir</replaceable></arg>
+ <arg choice="req">
+ <group choice="opt">
+ <arg choice="plain"><option>--pgdata</option></arg>
+ <arg choice="plain"><option>-D</option></arg>
+ </group>
+ <replaceable class="parameter"> datadir</replaceable>
+ </arg>
</cmdsynopsis>
</refsynopsisdiv>
@@ -78,6 +90,7 @@ PostgreSQL documentation
<variablelist>
<varlistentry>
<term><option>-f</option></term>
+ <term><option>--force</option></term>
<listitem>
<para>
Force <command>pg_resetwal</command> to proceed even if it cannot determine
@@ -88,6 +101,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-n</option></term>
+ <term><option>--no-update</option></term>
<listitem>
<para>
The <option>-n</option> (no operation) option instructs
@@ -124,6 +138,7 @@ PostgreSQL documentation
<variablelist>
<varlistentry>
<term><option>-c</option> <replaceable class="parameter">xid</replaceable>,<replaceable class="parameter">xid</replaceable></term>
+ <term><option>--xact-ids=</option><replaceable class="parameter">xid</replaceable>,<replaceable class="parameter">xid</replaceable></term>
<listitem>
<para>
Manually set the oldest and newest transaction IDs for which the commit
@@ -145,6 +160,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-e</option> <replaceable class="parameter">xid_epoch</replaceable></term>
+ <term><option>--epoch=</option><replaceable class="parameter">xid_epoch</replaceable></term>
<listitem>
<para>
Manually set the next transaction ID's epoch.
@@ -165,6 +181,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-l</option> <replaceable class="parameter">walfile</replaceable></term>
+ <term><option>--wal-address=</option><replaceable class="parameter">walfile</replaceable></term>
<listitem>
<para>
Manually set the WAL starting address.
@@ -196,6 +213,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-m</option> <replaceable class="parameter">mxid</replaceable>,<replaceable class="parameter">mxid</replaceable></term>
+ <term><option>--multixact-ids=</option><replaceable class="parameter">mxid</replaceable>,<replaceable class="parameter">mxid</replaceable></term>
<listitem>
<para>
Manually set the next and oldest multitransaction ID.
@@ -217,6 +235,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-o</option> <replaceable class="parameter">oid</replaceable></term>
+ <term><option>--next-oid=</option><replaceable class="parameter">oid</replaceable></term>
<listitem>
<para>
Manually set the next OID.
@@ -232,6 +251,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-O</option> <replaceable class="parameter">mxoff</replaceable></term>
+ <term><option>--multixact-offset=</option><replaceable class="parameter">mxoff</replaceable></term>
<listitem>
<para>
Manually set the next multitransaction offset.
@@ -249,6 +269,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-s</option> <replaceable class="parameter">wal_segment_size</replaceable></term>
+ <term><option>--wal-segsize=</option><replaceable class="parameter">wal_segment_size</replaceable></term>
<listitem>
<para>
Manually set the WAL segment size (in megabytes).
@@ -273,6 +294,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-x</option> <replaceable class="parameter">xid</replaceable></term>
+ <term><option>--next-xact-id=</option><replaceable class="parameter">xid</replaceable></term>
<listitem>
<para>
Manually set the next transaction ID.
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 506fdda..06125f6 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -89,6 +89,21 @@ static void usage(void);
int
main(int argc, char *argv[])
{
+ static struct option long_options[] = {
+ {"pgdata", required_argument, NULL, 'D'},
+ {"force", no_argument, NULL, 'f'},
+ {"no-update", no_argument, NULL, 'n'},
+ {"epoch", required_argument, NULL, 'e'},
+ {"next-xact-id", required_argument, NULL, 'x'},
+ {"xact-ids", required_argument, NULL, 'c'},
+ {"next-oid", required_argument, NULL, 'o'},
+ {"multixact-ids", required_argument, NULL, 'm'},
+ {"multixact-offset", required_argument, NULL, 'O'},
+ {"wal-address", required_argument, NULL, 'l'},
+ {"wal-segsize", required_argument, NULL, 's'},
+ {NULL, 0, NULL, 0}
+ };
+
int c;
bool force = false;
bool noupdate = false;
@@ -118,7 +133,7 @@ main(int argc, char *argv[])
}
- while ((c = getopt(argc, argv, "c:D:e:fl:m:no:O:x:s:")) != -1)
+ while ((c = getopt_long(argc, argv, "c:D:e:fl:m:no:O:x:s:", long_options, NULL)) != -1)
{
switch (c)
{
@@ -1275,19 +1290,19 @@ usage(void)
printf(_("%s resets the PostgreSQL write-ahead log.\n\n"), progname);
printf(_("Usage:\n %s [OPTION]... DATADIR\n\n"), progname);
printf(_("Options:\n"));
- printf(_(" -c XID,XID set oldest and newest transactions bearing commit timestamp\n"));
- printf(_(" (zero in either value means no change)\n"));
- printf(_(" [-D] DATADIR data directory\n"));
- printf(_(" -e XIDEPOCH set next transaction ID epoch\n"));
- printf(_(" -f force update to be done\n"));
- printf(_(" -l WALFILE force minimum WAL starting location for new write-ahead log\n"));
- printf(_(" -m MXID,MXID set next and oldest multitransaction ID\n"));
- printf(_(" -n no update, just show what would be done (for testing)\n"));
- printf(_(" -o OID set next OID\n"));
- printf(_(" -O OFFSET set next multitransaction offset\n"));
- printf(_(" -s SIZE set WAL segment size (in megabytes)\n"));
- printf(_(" -V, --version output version information, then exit\n"));
- printf(_(" -x XID set next transaction ID\n"));
- printf(_(" -?, --help show this help, then exit\n"));
+ printf(_(" -c, --xact-ids=XID,XID set oldest and newest transactions bearing commit timestamp\n"));
+ printf(_(" (zero in either value means no change)\n"));
+ printf(_(" [-D, --pgdata=]DATADIR data directory\n"));
+ printf(_(" -e, --epoch=XIDEPOCH set next transaction ID epoch\n"));
+ printf(_(" -f, --force force update to be done\n"));
+ printf(_(" -l, --wal-address=WALFILE force minimum WAL starting location for new write-ahead log\n"));
+ printf(_(" -m, --multixact-ids=MXID,MXID set next and oldest multitransaction ID\n"));
+ printf(_(" -n, --no-update no update, just show what would be done (for testing)\n"));
+ printf(_(" -o, --next-oid=OID set next OID\n"));
+ printf(_(" -O, --multixact-offset=OFFSET set next multitransaction offset\n"));
+ printf(_(" -s, --wal-segsize=SIZE set WAL segment size (in megabytes)\n"));
+ printf(_(" -V, --version output version information, then exit\n"));
+ printf(_(" -x, --next-xact-id=XID set next transaction ID\n"));
+ printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
}
--
2.7.3.AMZN
On 3/21/18 17:14, Bossart, Nathan wrote:
Here is an updated set of patches that use the newly proposed approach
for avoiding division-by-zero errors in pg_resetwal.
committed
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services