Race condition in pg_database_size()

Started by Michael Fuhr, almost 19 years ago, 6 messages
#1 Michael Fuhr
mike@fuhr.org

I'm occasionally seeing calls to pg_database_size() fail with

ERROR: could not stat file "/var/lib/pgsql/data/base/16404/1738343": No such file or directory

So far I haven't noticed any other problems that might be related
to this error. This database frequently uses temporary tables so
I'm wondering if the error might be due to a race condition in
db_dir_size(), which does the following:

    while ((direntry = ReadDir(dirdesc, path)) != NULL)
    {
        struct stat fst;

        if (strcmp(direntry->d_name, ".") == 0 ||
            strcmp(direntry->d_name, "..") == 0)
            continue;

        snprintf(filename, MAXPGPATH, "%s/%s", path, direntry->d_name);

        if (stat(filename, &fst) < 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not stat file \"%s\": %m", filename)));

        dirsize += fst.st_size;
    }

I'm wondering if the code should check for ENOENT if stat() fails
and either skip this entry silently under the assumption that the
file had been deleted since the call to ReadDir(), or issue a warning
without failing.
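
The "skip silently" option can be sketched outside the backend as follows. This is only an illustration under stated assumptions, not the patch discussed in this thread: it uses plain POSIX opendir()/readdir()/stat() instead of ReadDir() and ereport(), and the names add_entry_size and dir_size_skip_missing, the 4096-byte path buffer, and the -1 error return are all hypothetical.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Add the size of one file to *total (hypothetical helper).
 * Returns 0 if the size was added or the file no longer exists (ENOENT),
 * and -1 on any other stat() failure.
 */
int
add_entry_size(const char *filename, long long *total)
{
    struct stat fst;

    assert(filename != NULL && total != NULL);

    if (stat(filename, &fst) < 0)
    {
        if (errno == ENOENT)
            return 0;           /* unlinked after readdir() saw it: skip */
        return -1;              /* any other failure is still an error */
    }

    *total += (long long) fst.st_size;
    return 0;
}

/*
 * Sum st_size over the entries of a directory, tolerating entries that
 * disappear between readdir() and stat().  Returns the total in bytes,
 * or -1 on error.
 */
long long
dir_size_skip_missing(const char *path)
{
    DIR        *dirdesc;
    struct dirent *direntry;
    char        filename[4096]; /* assumed buffer size, stands in for MAXPGPATH */
    long long   dirsize = 0;

    dirdesc = opendir(path);
    if (dirdesc == NULL)
        return -1;

    while ((direntry = readdir(dirdesc)) != NULL)
    {
        int         n;

        if (strcmp(direntry->d_name, ".") == 0 ||
            strcmp(direntry->d_name, "..") == 0)
            continue;

        n = snprintf(filename, sizeof(filename), "%s/%s",
                     path, direntry->d_name);

        /* Fail on a truncated path or on a non-ENOENT stat() error. */
        if (n < 0 || (size_t) n >= sizeof(filename) ||
            add_entry_size(filename, &dirsize) < 0)
        {
            closedir(dirdesc);
            return -1;
        }
    }

    closedir(dirdesc);
    return dirsize;
}
```

Only ENOENT is treated as benign here; other errno values (EACCES, EIO, and so on) still fail, since they do not indicate a file that was merely removed in the meantime.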

--
Michael Fuhr

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Fuhr (#1)
Re: Race condition in pg_database_size()

Michael Fuhr <mike@fuhr.org> writes:

> I'm wondering if the code should check for ENOENT if stat() fails
> and either skip this entry silently under the assumption that the
> file had been deleted since the call to ReadDir(),

Probably. Want to look through the rest of that module for similar
problems?

regards, tom lane

#3 Michael Fuhr
mike@fuhr.org
In reply to: Tom Lane (#2)
Re: Race condition in pg_database_size()

On Sat, Mar 10, 2007 at 12:32:04PM -0500, Tom Lane wrote:

> Michael Fuhr <mike@fuhr.org> writes:
>
> > I'm wondering if the code should check for ENOENT if stat() fails
> > and either skip this entry silently under the assumption that the
> > file had been deleted since the call to ReadDir(),
>
> Probably. Want to look through the rest of that module for similar
> problems?

I think only db_dir_size() and calculate_tablespace_size() are
affected by this particular failure (ReadDir followed by stat).
I'll submit a patch -- any preferences for silent continuation vs.
continuation with a notice or warning?

--
Michael Fuhr

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Fuhr (#3)
Re: Race condition in pg_database_size()

Michael Fuhr <mike@fuhr.org> writes:

> I'll submit a patch -- any preferences for silent continuation vs.
> continuation with a notice or warning?

I think silent is fine for ENOENT cases. We know the file had been
there at ReadDir time, so the only possible conclusion is that it was
just unlinked, and I see no reason to complain about that.

regards, tom lane

#5 Michael Fuhr
mike@fuhr.org
In reply to: Tom Lane (#4)
Re: Race condition in pg_database_size()

On Sat, Mar 10, 2007 at 05:39:37PM -0500, Tom Lane wrote:

> Michael Fuhr <mike@fuhr.org> writes:
>
> > I'll submit a patch -- any preferences for silent continuation vs.
> > continuation with a notice or warning?
>
> I think silent is fine for ENOENT cases. We know the file had been
> there at ReadDir time, so the only possible conclusion is that it was
> just unlinked, and I see no reason to complain about that.

Patch submitted.

--
Michael Fuhr

#6 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Michael Fuhr (#5)
Re: Race condition in pg_database_size()

Michael Fuhr wrote:

> On Sat, Mar 10, 2007 at 05:39:37PM -0500, Tom Lane wrote:
>
> > Michael Fuhr <mike@fuhr.org> writes:
> >
> > > I'll submit a patch -- any preferences for silent continuation vs.
> > > continuation with a notice or warning?
> >
> > I think silent is fine for ENOENT cases. We know the file had been
> > there at ReadDir time, so the only possible conclusion is that it was
> > just unlinked, and I see no reason to complain about that.
>
> Patch submitted.

Applied, thanks.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.