pg_basebackup fails if a data file is removed
When pg_basebackup copies data files, it does basically this:

    if (lstat(pathbuf, &statbuf) != 0)
    {
        if (errno != ENOENT)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not stat file or directory \"%s\": %m",
                            pathbuf)));

        /* If the file went away while scanning, it's no error. */
        continue;
    }
    ...
    sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf);

There's a race condition there. If the file is removed after the lstat
call, and before sendFile opens the file, the backup fails with an
error. It's a fairly tight window, so it's difficult to run into by
accident, but by putting a breakpoint with a debugger there it's quite
easy to reproduce, by e.g. doing a VACUUM FULL on the table about to be
copied.
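
To illustrate the window outside PostgreSQL, here is a small C sketch. It is not code from the backend: the function name race_window_errno, the temporary path, and the use of unlink() as a stand-in for a concurrent VACUUM FULL removing the file are all assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL): stat a file, remove it,
 * then try to open it, mimicking the lstat()/sendFile() sequence.
 *
 * Returns the errno reported by fopen(), 0 if the open unexpectedly
 * succeeded, or -1 if the setup itself failed.
 */
int
race_window_errno(void)
{
    char        path[] = "/tmp/basebackup_race_XXXXXX";
    struct stat statbuf;
    FILE       *fp;
    int         fd = mkstemp(path);

    if (fd < 0)
        return -1;
    close(fd);

    /* Step 1: the directory scan sees the file, so no error is raised. */
    if (lstat(path, &statbuf) != 0)
    {
        unlink(path);
        return -1;
    }

    /* Step 2: stand-in for another backend removing the file. */
    if (unlink(path) != 0)
        return -1;

    /* Step 3: the later open no longer finds the file. */
    errno = 0;
    fp = fopen(path, "rb");
    if (fp != NULL)
    {
        fclose(fp);
        return 0;
    }
    return errno;
}
```

On a POSIX system the function should return ENOENT, which is the case the code above currently turns into a backup failure.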
A straightforward fix is to allow sendFile() to ignore ENOENT. Patch
attached.
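
As a rough sketch of the idea (not the attached patch itself), a send routine can report a missing file to its caller instead of raising an error. The names send_file_sketch and missing_ok are hypothetical, the output goes to a plain FILE * rather than the replication stream, and exit(1) is only a stand-in for ereport(ERROR).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for sendFile(), for illustration only.
 *
 * Returns true if the file was copied to "out". Returns false if the
 * file does not exist and missing_ok is set, so the caller can skip it.
 * Any other open failure is treated as fatal.
 */
bool
send_file_sketch(const char *path, bool missing_ok, FILE *out)
{
    char        buf[8192];
    size_t      nread;
    FILE       *fp = fopen(path, "rb");

    if (fp == NULL)
    {
        /* The file went away after it was stat'ed: not an error. */
        if (errno == ENOENT && missing_ok)
            return false;

        fprintf(stderr, "could not open file \"%s\": %s\n",
                path, strerror(errno));
        exit(1);                /* stand-in for ereport(ERROR, ...) */
    }

    /* Copy the file contents in fixed-size chunks. */
    while ((nread = fread(buf, 1, sizeof(buf), fp)) > 0)
        fwrite(buf, 1, nread, out);

    fclose(fp);
    return true;
}
```

A caller in the scan loop would then treat a false return the same way as the ENOENT case after lstat, i.e. continue with the next directory entry.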
- Heikki
Attachments:
fix-enoent-in-basebackup.patch (text/x-diff, +31 -10)
On Fri, Dec 21, 2012 at 2:28 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> When pg_basebackup copies data files, it does basically this:
>
>     if (lstat(pathbuf, &statbuf) != 0)
>     {
>         if (errno != ENOENT)
>             ereport(ERROR,
>                     (errcode_for_file_access(),
>                      errmsg("could not stat file or directory \"%s\": %m",
>                             pathbuf)));
>
>         /* If the file went away while scanning, it's no error. */
>         continue;
>     }
>     ...
>     sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf);
>
> There's a race condition there. If the file is removed after the lstat call,
> and before sendFile opens the file, the backup fails with an error. It's a
> fairly tight window, so it's difficult to run into by accident, but by
> putting a breakpoint with a debugger there it's quite easy to reproduce, by
> e.g. doing a VACUUM FULL on the table about to be copied.
>
> A straightforward fix is to allow sendFile() to ignore ENOENT. Patch
> attached.

Looks good to me. Nice spot - don't tell me you actually ran into it
during testing? :)
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On 21.12.2012 15:30, Magnus Hagander wrote:
> On Fri, Dec 21, 2012 at 2:28 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> When pg_basebackup copies data files, it does basically this:
>>
>>     if (lstat(pathbuf, &statbuf) != 0)
>>     {
>>         if (errno != ENOENT)
>>             ereport(ERROR,
>>                     (errcode_for_file_access(),
>>                      errmsg("could not stat file or directory \"%s\": %m",
>>                             pathbuf)));
>>
>>         /* If the file went away while scanning, it's no error. */
>>         continue;
>>     }
>>     ...
>>     sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf);
>>
>> There's a race condition there. If the file is removed after the lstat call,
>> and before sendFile opens the file, the backup fails with an error. It's a
>> fairly tight window, so it's difficult to run into by accident, but by
>> putting a breakpoint with a debugger there it's quite easy to reproduce, by
>> e.g. doing a VACUUM FULL on the table about to be copied.
>>
>> A straightforward fix is to allow sendFile() to ignore ENOENT. Patch
>> attached.
>
> Looks good to me.

Ok, committed.

> Nice spot - don't tell me you actually ran into it
> during testing? :)

Heh, no, eyeballing the code.
- Heikki