pg_rewind copies
If a file is modified and becomes larger in the source system while
pg_rewind is running, pg_rewind can leave behind a partial copy of file.
That's by design, and it's OK for relation files because they're
replayed from WAL. But it can cause trouble for configuration files.
I ran into this while playing with pg_auto_failover. After failover,
pg_auto_failover would often launch pg_rewind, and run ALTER SYSTEM on
the primary while pg_rewind was running. The resulting rewound system
would fail to start up:
Nov 13 09:24:42 pg-node-a pg_autoctl[2217]: 09:24:42 2220 ERROR
2020-11-13 09:24:32.547 GMT [2246] LOG: syntax error in file
"/data/pgdata/postgresql.auto.conf" line 4, near token "'"
Nov 13 09:24:42 pg-node-a pg_autoctl[2217]: 09:24:42 2220 ERROR
2020-11-13 09:24:32.547 GMT [2246] FATAL: configuration file
"postgresql.auto.conf" contains errors
Attached is a patch to mitigate that. It changes pg_rewind so that when
it copies a whole file, it ignores the original file size. It's not a
complete cure: it still believes the original size for files larger than
1 MB. That limit was just expedient given the way the chunking logic in
libpq_source.c works, but should be enough for configuration files.
There's another race condition that this doesn't try to fix: If a file
is modified while it's being copied, you can have a torn file with one
half of the file from the old version, and one half from the new. That's
a much more narrow window, though, and pg_basebackup has the same problem.
- Heikki
Attachments:
0001-pg_rewind-Fetch-small-files-according-to-new-size.patchtext/x-patch; charset=UTF-8; name=0001-pg_rewind-Fetch-small-files-according-to-new-size.patchDownload+112-15
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested
Hello
The patch seems to do as described and the regression and tap tests are passing
+ /*
+ * A local source is not expected to change while we're rewinding, so check
+ * that we size of the file matches our earlier expectation.
+ */
Is this a tyoo?
thanks
Cary
On 16/12/2020 00:08, Cary Huang wrote:
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not testedHello
The patch seems to do as described and the regression and tap tests are passing + /* + * A local source is not expected to change while we're rewinding, so check + * that we size of the file matches our earlier expectation. + */ Is this a tyoo?
Yep, thanks! Attached is a new patch version, with that fixed and
rebased. No other changes.
- Heikki
Attachments:
v2-0001-pg_rewind-Fetch-small-files-according-to-new-size.patchtext/x-patch; charset=UTF-8; name=v2-0001-pg_rewind-Fetch-small-files-according-to-new-size.patchDownload+112-15
On 1/22/21 8:26 AM, Heikki Linnakangas wrote:
On 16/12/2020 00:08, Cary Huang wrote:
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not testedHello
The patch seems to do as described and the regression and tap tests are passing + /* + * A local source is not expected to change while we're rewinding, so check + * that we size of the file matches our earlier expectation. + */ Is this a tyoo?Yep, thanks! Attached is a new patch version, with that fixed and
rebased. No other changes.
Cary, does this patch look ready to commit? If so, please change the
state in the CF entry to "Ready for Committer".
Regards,
--
-David
david@pgmasters.net
On 25 Mar 2021, at 14:32, David Steele <david@pgmasters.net> wrote:
On 1/22/21 8:26 AM, Heikki Linnakangas wrote:
On 16/12/2020 00:08, Cary Huang wrote:
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not testedHello
The patch seems to do as described and the regression and tap tests are passing + /* + * A local source is not expected to change while we're rewinding, so check + * that we size of the file matches our earlier expectation. + */ Is this a tyoo?Yep, thanks! Attached is a new patch version, with that fixed and rebased. No other changes.
Cary, does this patch look ready to commit? If so, please change the state in the CF entry to "Ready for Committer".
Reading the patch I think it definitely qualifies for RFC state. Heikki, do
you have plans to address till patch in the ongoing CF?
--
Daniel Gustafsson https://vmware.com/
I took another look at this patch, and I think it's ready to go in, it clearly
fixes a bug that isn't too hard to hit in production settings. To ensure we
don't break this I've added a testcase which pipes the pg_rewind --verbose
output to a file it's asked to copy, which then guarantees that the file is
growing in size during the operation without need for synchronizing two
processes with IPC::Run (it also passes on Windows in the CI setup).
One small comment on the patch:
+ snprintf(srcpath, sizeof(srcpath), "%s/%s", datadir, path);
This should IMO check the returnvalue of snprintf to ensure it wasn't
truncated. While the risk is exceedingly small, a truncated filename might
match another existing filename and the error not getting caught. There is
another instance just like this one in open_target_file() to which I think we
should apply the same belts-and-suspenders treatment. I've fixed this in the
attached version which also have had a pg_indent run on top of a fresh rebase.
Thoughts on this version?
--
Daniel Gustafsson https://vmware.com/
Attachments:
v3-0001-pg_rewind-Fetch-small-files-according-to-new-size.patchapplication/octet-stream; name=v3-0001-pg_rewind-Fetch-small-files-according-to-new-size.patch; x-unix-mode=0644Download+195-16
On 01/04/2022 12:00, Daniel Gustafsson wrote:
I took another look at this patch, and I think it's ready to go in, it clearly
fixes a bug that isn't too hard to hit in production settings. To ensure we
don't break this I've added a testcase which pipes the pg_rewind --verbose
output to a file it's asked to copy, which then guarantees that the file is
growing in size during the operation without need for synchronizing two
processes with IPC::Run (it also passes on Windows in the CI setup).One small comment on the patch:
+ snprintf(srcpath, sizeof(srcpath), "%s/%s", datadir, path);
This should IMO check the returnvalue of snprintf to ensure it wasn't
truncated. While the risk is exceedingly small, a truncated filename might
match another existing filename and the error not getting caught. There is
another instance just like this one in open_target_file() to which I think we
should apply the same belts-and-suspenders treatment. I've fixed this in the
attached version which also have had a pg_indent run on top of a fresh rebase.
+ if (len >= sizeof(dstpath)) + pg_fatal("filepath buffer too small"); /* shouldn't happen */
Makes sense. I would remove the "shouldn't happen"; it's not very hard
to make it happen, you just need a very long target datadir path. And
rephrase the error message as "datadir path too long".
One typo in the commit message: s/update/updates/.
Thanks!
- Heikki
On 1 Apr 2022, at 12:46, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
+ if (len >= sizeof(dstpath)) + pg_fatal("filepath buffer too small"); /* shouldn't happen */Makes sense. I would remove the "shouldn't happen"; it's not very hard to make it happen, you just need a very long target datadir path. And rephrase the error message as "datadir path too long".
Right, good point.
One typo in the commit message: s/update/updates/.
Will fix.
--
Daniel Gustafsson https://vmware.com/
On 01.04.22 11:00, Daniel Gustafsson wrote:
One small comment on the patch:
+ snprintf(srcpath, sizeof(srcpath), "%s/%s", datadir, path);
This should IMO check the returnvalue of snprintf to ensure it wasn't
truncated. While the risk is exceedingly small, a truncated filename might
match another existing filename and the error not getting caught. There is
another instance just like this one in open_target_file() to which I think we
should apply the same belts-and-suspenders treatment. I've fixed this in the
attached version which also have had a pg_indent run on top of a fresh rebase.
We use snprintf() like that countless times, and approximately none of
them check for overflow. So while you are right, this might not be the
place to start a new policy.
If you don't like this approach, use psprintf() perhaps.
On 4 Apr 2022, at 15:08, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
We use snprintf() like that countless times, and approximately none of them check for overflow. So while you are right, this might not be the place to start a new policy.
Fair enough, I'll remove these hunks before committing and will look a bigger
picture patch to address these across the codebase later.
--
Daniel Gustafsson https://vmware.com/