pg_basebackup fails if a data file is removed

Started by Heikki Linnakangas over 13 years ago, 3 messages (pgsql-bugs)
#1Heikki Linnakangas
heikki.linnakangas@enterprisedb.com

When pg_basebackup copies data files, it does basically this:

    if (lstat(pathbuf, &statbuf) != 0)
    {
        if (errno != ENOENT)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not stat file or directory \"%s\": %m",
                            pathbuf)));

        /* If the file went away while scanning, it's no error. */
        continue;
    }
    ...
    sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf);

There's a race condition there. If the file is removed after the lstat
call, and before sendFile opens the file, the backup fails with an
error. It's a fairly tight window, so it's difficult to run into by
accident, but by putting a breakpoint with a debugger there it's quite
easy to reproduce, e.g. by doing a VACUUM FULL on the table about to be
copied.
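The window can also be hit deterministically without a debugger. The following is a minimal standalone sketch for a POSIX system, not PostgreSQL code: it creates a file, lstat()s it, then calls unlink() as a stand-in for the concurrent VACUUM FULL, and only afterwards opens the file the way sendFile() would. The function name reproduce_stat_open_race() and the scratch path are illustrative assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical demo: reproduce the lstat()/open() window on purpose.
 * Returns the errno of the final open() (ENOENT is expected), or 0 if
 * the open unexpectedly succeeded. Earlier failures return their errno.
 */
int reproduce_stat_open_race(const char *path)
{
    struct stat statbuf;
    int fd;

    assert(path != NULL);

    /* Create the "data file" so that the first lstat() succeeds. */
    fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0)
        return errno;
    close(fd);

    /* Step 1: the directory scan stats the file; it exists. */
    if (lstat(path, &statbuf) != 0)
        return errno;

    /* The window: another process removes the file after the stat. */
    if (unlink(path) != 0)
        return errno;

    /* Step 2: the equivalent of sendFile() opening the file. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;           /* ENOENT: the error that aborted the backup */

    close(fd);
    return 0;
}
```

Called with a writable scratch path such as /tmp/pgbb_race_demo (an arbitrary choice), it should return ENOENT: the stat succeeded, yet the later open fails, which is exactly the case that previously turned into a hard backup error.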

A straightforward fix is to allow sendFile() to ignore ENOENT. Patch
attached.

- Heikki

Attachments:

fix-enoent-in-basebackup.patch (text/x-diff, +31 -10)
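The patch itself is not reproduced in this archive. As a rough sketch of the idea only, here is a hypothetical sendFile()-like routine that reports a vanished file to its caller instead of raising an error. The names send_file_sketch(), SendResult and the missing_ok flag are illustrative assumptions, not necessarily what the committed patch uses, and stdio streams stand in for the real tar output.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes; not PostgreSQL's actual API. */
typedef enum
{
    SEND_OK,        /* file copied into the output stream */
    SEND_SKIPPED,   /* file vanished before it could be opened */
    SEND_ERROR      /* any other failure: the backup should abort */
} SendResult;

/*
 * Copy the file at 'path' into 'out'. When missing_ok is set, ENOENT
 * from the open is treated as "removed concurrently", mirroring how the
 * caller already treats ENOENT from lstat().
 */
SendResult send_file_sketch(const char *path, FILE *out, bool missing_ok)
{
    char buf[8192];
    size_t n;
    FILE *in;

    assert(path != NULL && out != NULL);

    in = fopen(path, "rb");
    if (in == NULL)
    {
        if (missing_ok && errno == ENOENT)
            return SEND_SKIPPED;        /* same policy as the lstat() branch */
        fprintf(stderr, "could not open file \"%s\": %s\n",
                path, strerror(errno));
        return SEND_ERROR;
    }

    /* Stream the contents in fixed-size chunks. */
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
    {
        if (fwrite(buf, 1, n, out) != n)
        {
            fclose(in);
            return SEND_ERROR;
        }
    }

    if (ferror(in))
    {
        fclose(in);
        return SEND_ERROR;
    }
    fclose(in);
    return SEND_OK;
}
```

In the directory-scanning loop, a caller would treat SEND_SKIPPED like the existing ENOENT branch after lstat() and simply continue with the next entry, while SEND_ERROR would still be reported as an error. Tolerating only ENOENT keeps genuine permission or I/O problems fatal.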
#2Magnus Hagander
magnus@hagander.net
In reply to: Heikki Linnakangas (#1)
Re: pg_basebackup fails if a data file is removed

On Fri, Dec 21, 2012 at 2:28 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

> When pg_basebackup copies data files, it does basically this:
>
>     if (lstat(pathbuf, &statbuf) != 0)
>     {
>         if (errno != ENOENT)
>             ereport(ERROR,
>                     (errcode_for_file_access(),
>                      errmsg("could not stat file or directory \"%s\": %m",
>                             pathbuf)));
>
>         /* If the file went away while scanning, it's no error. */
>         continue;
>     }
>
>     ...
>     sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf);
>
> There's a race condition there. If the file is removed after the lstat call,
> and before sendFile opens the file, the backup fails with an error. It's a
> fairly tight window, so it's difficult to run into by accident, but by
> putting a breakpoint with a debugger there it's quite easy to reproduce,
> e.g. by doing a VACUUM FULL on the table about to be copied.
>
> A straightforward fix is to allow sendFile() to ignore ENOENT. Patch
> attached.

Looks good to me. Nice spot - don't tell me you actually ran into it
during testing? :)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Magnus Hagander (#2)
Re: pg_basebackup fails if a data file is removed

On 21.12.2012 15:30, Magnus Hagander wrote:

> On Fri, Dec 21, 2012 at 2:28 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>
>> When pg_basebackup copies data files, it does basically this:
>>
>>     if (lstat(pathbuf, &statbuf) != 0)
>>     {
>>         if (errno != ENOENT)
>>             ereport(ERROR,
>>                     (errcode_for_file_access(),
>>                      errmsg("could not stat file or directory \"%s\": %m",
>>                             pathbuf)));
>>
>>         /* If the file went away while scanning, it's no error. */
>>         continue;
>>     }
>>
>>     ...
>>     sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf);
>>
>> There's a race condition there. If the file is removed after the lstat call,
>> and before sendFile opens the file, the backup fails with an error. It's a
>> fairly tight window, so it's difficult to run into by accident, but by
>> putting a breakpoint with a debugger there it's quite easy to reproduce,
>> e.g. by doing a VACUUM FULL on the table about to be copied.
>>
>> A straightforward fix is to allow sendFile() to ignore ENOENT. Patch
>> attached.
>
> Looks good to me.

Ok, committed.

> Nice spot - don't tell me you actually ran into it
> during testing? :)

Heh, no, eyeballing the code.

- Heikki
