pg_rewind and xlogtemp files
Hi all,
I just bumped into this report regarding pg_rewind, which also affects
the version shipped in src/bin/pg_rewind:
https://github.com/vmware/pg_rewind/issues/45
In short, if the source server's filemap includes xlogtemp files,
pg_rewind will fail with something like the following error:
error reading xlog record: record with zero length at 1/D5000090
unexpected result while fetching remote files: ERROR: could not open
file "pg_xlog/xlogtemp.23056" for reading: No such file or directory
The servers diverged at WAL position 1/D4A081B0 on timeline 174.
Rewinding from Last common checkpoint at 1/D30A5650 on timeline 174
As pointed out by dev1ant in the original bug report, process_remote_file
should ignore files named pg_xlog/xlogtemp.*, and I think that this
is the right thing to do. Any objections to a patch that at the same
time makes "xlogtemp." a #define declaration in xlog_internal.h?
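For illustration only, the check being proposed could look roughly like the sketch below. This is not the attached patch: XLOG_TEMP_PREFIX and is_xlogtemp_path are hypothetical names, and the only facts taken from the report are the "pg_xlog/" directory and the "xlogtemp." prefix.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical macro name; the proposal is only to stop hard-coding the string. */
#define XLOG_TEMP_PREFIX "xlogtemp."

/*
 * Return true if "path" (relative to the data directory) names a
 * temporary WAL file such as pg_xlog/xlogtemp.23056.
 * Hypothetical helper, not an existing pg_rewind function.
 */
static bool
is_xlogtemp_path(const char *path)
{
    static const char dir[] = "pg_xlog/";
    size_t      dirlen = sizeof(dir) - 1;

    /* The file must live directly under pg_xlog/ ... */
    if (strncmp(path, dir, dirlen) != 0)
        return false;

    /* ... and its name must start with the temporary-file prefix. */
    return strncmp(path + dirlen, XLOG_TEMP_PREFIX,
                   strlen(XLOG_TEMP_PREFIX)) == 0;
}
```

A caller building the filemap would simply not add an entry when this returns true.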
Regards,
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Jun 17, 2015 at 3:17 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> As pointed out by dev1ant in the original bug report, process_remote_file
> should ignore files named pg_xlog/xlogtemp.*, and I think that this
> is the right thing to do. Any objections to a patch that at the same
> time makes "xlogtemp." a #define declaration in xlog_internal.h?
And attached is a patch following those lines.
--
Michael
Attachments:
20150617_rewind_xlogtemp.patch (application/x-patch, +11 -4)
On June 17, 2015, at 9:48, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Wed, Jun 17, 2015 at 3:17 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> As pointed out by dev1ant in the original bug report, process_remote_file
>> should ignore files named pg_xlog/xlogtemp.*, and I think that this
>> is the right thing to do. Any objections to a patch that at the same
>> time makes "xlogtemp." a #define declaration in xlog_internal.h?
A declaration seems to be the right thing.
Another problem I’ve caught twice already in the same test:
error reading xlog record: record with zero length at 0/78000090
unexpected result while fetching remote files: ERROR: could not open file "base/13003/t6_2424967" for reading: No such file or directory
The servers diverged at WAL position 0/76BADD50 on timeline 303.
Rewinding from Last common checkpoint at 0/7651F870 on timeline 303
I don’t know if this problem could be solved the same way (by skipping such files)… Should I start a new thread for that?
--
May the force be with you…
https://simply.name
On Wed, Jun 17, 2015 at 4:57 PM, Vladimir Borodin <root@simply.name> wrote:
> Another problem I’ve caught twice already in the same test:
> error reading xlog record: record with zero length at 0/78000090
> unexpected result while fetching remote files: ERROR: could not open file
> "base/13003/t6_2424967" for reading: No such file or directory
> The servers diverged at WAL position 0/76BADD50 on timeline 303.
> Rewinding from Last common checkpoint at 0/7651F870 on timeline 303
> I don’t know if this problem could be solved the same way (by skipping such
> files)… Should I start a new thread for that?
That's a temporary table's file, so there is no need to copy it
from the source server. I think pg_rewind can safely skip such files.
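As a rough sketch of what skipping such files could mean, the function below recognizes names shaped like the one in the error above (t6_2424967), that is, "t", digits, "_", digits. Treating that shape as the marker of a temporary relation file is an assumption drawn from this example, and looks_like_temp_relfile is a hypothetical helper, not an existing pg_rewind or backend function.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the last component of "path" starts with
 * 't' <digits> '_' <digit>, e.g. base/13003/t6_2424967.
 * Anything after that (fork suffix, segment number) is not inspected.
 */
static bool
looks_like_temp_relfile(const char *path)
{
    const char *name = strrchr(path, '/');
    const char *p;

    /* Work on the file name only, not on the directory part. */
    name = (name != NULL) ? name + 1 : path;

    if (name[0] != 't' || !isdigit((unsigned char) name[1]))
        return false;

    /* Skip the first run of digits. */
    p = name + 1;
    while (isdigit((unsigned char) *p))
        p++;

    /* Expect an underscore followed by at least one digit. */
    return *p == '_' && isdigit((unsigned char) p[1]);
}
```

A name-based test like this only avoids listing the file; it does nothing for files that vanish after the listing was taken.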
But even if we make pg_rewind skip such files, we would still hit a
similar problem. You can see the problem that I reported in another thread.
In order to address this type of problem completely, we would need
to apply the fix that is being discussed in that thread.
/messages/by-id/CAHGQGwEdsNgeNZo+GyrzZtjW_TkC=XC6XxrjuAZ7=X_cj1aHHg@mail.gmail.com
BTW, even pg_basebackup doesn't skip temporary tables' files.
But maybe we should change this, too.
Also, pg_rewind doesn't skip the files that pg_basebackup skips. ISTM
that pg_rewind can safely skip any file that pg_basebackup skips.
So we probably need to reconsider which files pg_rewind should skip.
Regards,
--
Fujii Masao
On Wed, Jun 17, 2015 at 9:07 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Jun 17, 2015 at 4:57 PM, Vladimir Borodin <root@simply.name> wrote:
>> Another problem I’ve caught twice already in the same test:
>> error reading xlog record: record with zero length at 0/78000090
>> unexpected result while fetching remote files: ERROR: could not open file
>> "base/13003/t6_2424967" for reading: No such file or directory
>> The servers diverged at WAL position 0/76BADD50 on timeline 303.
>> Rewinding from Last common checkpoint at 0/7651F870 on timeline 303
>> I don’t know if this problem could be solved the same way (by skipping such
>> files)… Should I start a new thread for that?
> That's a temporary table's file, so there is no need to copy it
> from the source server. I think pg_rewind can safely skip such files.
Yes. It is actually recommended to copy them manually from the
archive if needed (per the docs).
> But even if we make pg_rewind skip such files, we would still hit a
> similar problem. You can see the problem that I reported in another thread.
> In order to address this type of problem completely, we would need
> to apply the fix that is being discussed in that thread.
> /messages/by-id/CAHGQGwEdsNgeNZo+GyrzZtjW_TkC=XC6XxrjuAZ7=X_cj1aHHg@mail.gmail.com
There are two things to take into account here, in my opinion:
1) Ignore files that should not be added to the filemap, like
postmaster.pid, xlogtemp, etc.
2) Bypass files that can be added to the filemap, for example a
relation file or an FSM file, and avoid erroring out if they are
missing.
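To make point 2) concrete, here is a minimal sketch of the "do not error out if missing" idea for a locally opened file. It is not pg_rewind code: fetch_result and probe_source_file are hypothetical names, and it relies on fopen() setting errno to ENOENT for a nonexistent path, which POSIX specifies but ISO C does not guarantee.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical result codes for this sketch. */
typedef enum
{
    FETCH_OK,               /* file exists and could be opened */
    FETCH_SKIPPED_MISSING,  /* file vanished since it was listed: not an error */
    FETCH_ERROR             /* any other failure is still reported */
} fetch_result;

/*
 * Try to open "path" for reading. A file that no longer exists is
 * reported as skipped instead of as a hard failure.
 */
static fetch_result
probe_source_file(const char *path)
{
    FILE       *fp;

    errno = 0;
    fp = fopen(path, "rb");
    if (fp != NULL)
    {
        fclose(fp);
        return FETCH_OK;
    }

    /* Assumption: errno is set by fopen(), as on POSIX systems. */
    return (errno == ENOENT) ? FETCH_SKIPPED_MISSING : FETCH_ERROR;
}
```

Only the missing-file case is tolerated here; permission problems and I/O errors still surface as FETCH_ERROR.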
> BTW, even pg_basebackup doesn't skip temporary tables' files.
> But maybe we should change this, too.
> Also, pg_rewind doesn't skip the files that pg_basebackup skips. ISTM
> that pg_rewind can safely skip any file that pg_basebackup skips.
> So we probably need to reconsider which files pg_rewind should skip.
pg_rewind and basebackup.c are beginning to share many things in this
area. Perhaps we should consider a common routine in, let's say,
libpgcommon that decides whether a file can be safely skipped based on
its path name in PGDATA.
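A shared routine along those lines might be table-driven, as in the sketch below. The names skip_rule, skip_rules and path_is_skippable are hypothetical, and the table holds only the two examples mentioned in this thread; it is not the list that basebackup.c actually uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One rule: either an exact path or a path prefix, relative to PGDATA. */
typedef struct
{
    const char *pattern;
    bool        is_prefix;  /* true: match leading bytes; false: exact match */
} skip_rule;

/* Illustrative entries only, taken from the examples in this thread. */
static const skip_rule skip_rules[] = {
    {"postmaster.pid", false},
    {"pg_xlog/xlogtemp.", true},
};

/*
 * Return true if "path" (relative to PGDATA) matches any skip rule.
 */
static bool
path_is_skippable(const char *path)
{
    size_t      i;

    for (i = 0; i < sizeof(skip_rules) / sizeof(skip_rules[0]); i++)
    {
        const skip_rule *rule = &skip_rules[i];
        bool        match;

        if (rule->is_prefix)
            match = strncmp(path, rule->pattern, strlen(rule->pattern)) == 0;
        else
            match = strcmp(path, rule->pattern) == 0;

        if (match)
            return true;
    }
    return false;
}
```

Keeping the rules in one table would let both tools add or drop entries in a single place.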
--
Michael