Possible race condition in pg_mkdir_p()?

Started by Ning Yuover 6 years ago18 messages
#1Ning Yu
nyu@pivotal.io

Hi Hackers,

We found a race condition in pg_mkdir_p(), here is a simple reproducer:

#!/bin/sh

basedir=pgdata
datadir=$basedir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z
logdir=$basedir/logs
n=2

rm -rf $basedir
mkdir -p $logdir

# init databases concurrently, they will all try to create the parent
dirs
for i in `seq $n`; do
initdb -D $datadir/$i >$logdir/$i.log 2>&1 &
done

wait

# there is a chance one of the initdb commands failed to create the
datadir
grep 'could not create directory' $logdir/*

The logic in pg_mkdir_p() is as below:

/* check for pre-existing directory */
if (stat(path, &sb) == 0)
{
if (!S_ISDIR(sb.st_mode))
{
if (last)
errno = EEXIST;
else
errno = ENOTDIR;
retval = -1;
break;
}
}
else if (mkdir(path, last ? omode : S_IRWXU | S_IRWXG | S_IRWXO) <
0)
{
retval = -1;
break;
}

This seems buggy as it first checks the existence of the dir and makes the
dir if it does not exist yet, however when executing concurrently a
possible race condition can be as below:

A: does a/ exists? no
B: does a/ exists? no
A: try to create a/, succeed
B: try to create a/, failed as it already exists

To prevent the race condition we could mkdir() directly, if it returns -1
and errno is EEXIST then check whether it's really a dir with stat(). In
fact this is what is done in the `mkdir -p` command:
https://github.com/coreutils/gnulib/blob/b5a9fa677847081c9b4f26908272f122b15df8b9/lib/mkdir-p.c#L130-L164

By the way, some callers of pg_mkdir_p() checks for EEXIST explicitly, such
as in pg_basebackup.c:

if (pg_mkdir_p(statusdir, pg_dir_create_mode) != 0 && errno !=
EEXIST)
{
pg_log_error("could not create directory \"%s\": %m",
statusdir);
exit(1);
}

This is still wrong with current code logic, because when the statusdir is
a file the errno is also EEXIST, but it can pass the check here. Even if
we fix pg_mkdir_p() by following the `mkdir -p` way the errno check here is
still wrong.

Best Regards
Ning

#2Michael Paquier
michael@paquier.xyz
In reply to: Ning Yu (#1)
Re: Possible race condition in pg_mkdir_p()?

On Thu, Jul 18, 2019 at 04:17:22PM +0800, Ning Yu wrote:

This is still wrong with current code logic, because when the statusdir is
a file the errno is also EEXIST, but it can pass the check here. Even if
we fix pg_mkdir_p() by following the `mkdir -p` way the errno check here is
still wrong.

Would you like to send a patch?
--
Michael

#3Paul Guo
pguo@pivotal.io
In reply to: Michael Paquier (#2)
Re: Possible race condition in pg_mkdir_p()?

On Thu, Jul 18, 2019 at 4:57 PM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Jul 18, 2019 at 04:17:22PM +0800, Ning Yu wrote:

This is still wrong with current code logic, because when the statusdir

is

a file the errno is also EEXIST, but it can pass the check here. Even if
we fix pg_mkdir_p() by following the `mkdir -p` way the errno check here

is

still wrong.

Would you like to send a patch?

Michael, we'll send out the patch later. Checked code, it seems that there
is another related mkdir() issue.

MakePGDirectory() is actually a syscall mkdir(), and manpage says the errno
meaning of EEXIST,

EEXIST pathname already exists (not necessarily as a directory).
This includes the case where pathname is a symbolic link, dangling or not.

However it looks like some callers do not use that correctly, e.g.

if (MakePGDirectory(directory) < 0)
{
if (errno == EEXIST)
return;

OR

if (MakePGDirectory(parentdir) < 0 && errno != EEXIST)

i.e. we should better use stat(path) && S_ISDIR(buf) && errno == EEXIST to
replace errno == EEXIST.

One possible fix is to add an argument like ignore_created (in case some
callers want to fail if the path has been created) in MakePGDirectory() and
then add that code logic into it.

#4Ning Yu
nyu@pivotal.io
In reply to: Michael Paquier (#2)
4 attachment(s)
Re: Possible race condition in pg_mkdir_p()?

Hi Michael,

The patches are attached. To make reviewing easier we spilt them into small
pieces:

- v1-0001-Fix-race-condition-in-pg_mkdir_p.patch: the fix to pg_mkdir_p()
itself, basically we are following the `mkdir -p` logic;
- v1-0002-Test-concurrent-call-to-pg_mkdir_p.patch: the tests for
pg_mkdir_p(),
we could see how it fails by reverting the first patch, and a reproducer
with
initdb is also provided in the README; as we do not know how to create a
unit
test in postgresql we have to employ a test module to do the job, not
sure if
this is a proper solution;
- v1-0003-Fix-callers-of-pg_mkdir_p.patch &
v1-0004-Fix-callers-of-MakePGDirectory.patch: fix callers of pg_mkdir_p()
and
MakePGDirectory(), tests are not provided for these changes;

MakePGDirectory() is also called in TablespaceCreateDbspace(), EEXIST is
considered as non-error for parent directories, however as it considers
EEXIST
as a failure for the last level of the path so the logic is still correct,
we
do not add a final stat() check for it.

Best Regards
Ning

On Thu, Jul 18, 2019 at 4:57 PM Michael Paquier <michael@paquier.xyz> wrote:

Show quoted text

On Thu, Jul 18, 2019 at 04:17:22PM +0800, Ning Yu wrote:

This is still wrong with current code logic, because when the statusdir

is

a file the errno is also EEXIST, but it can pass the check here. Even if
we fix pg_mkdir_p() by following the `mkdir -p` way the errno check here

is

still wrong.

Would you like to send a patch?
--
Michael

Attachments:

v1-0003-Fix-callers-of-pg_mkdir_p.patchapplication/octet-stream; name=v1-0003-Fix-callers-of-pg_mkdir_p.patchDownload
From eadc6f84fd1f0165c5625b0de8aae18f5505c507 Mon Sep 17 00:00:00 2001
From: Ning Yu <nyu@pivotal.io>
Date: Tue, 23 Jul 2019 13:32:30 +0800
Subject: [PATCH v1 3/4] Fix callers of pg_mkdir_p()

Some callers of pg_mkdir_p() consider EEXIST as a non-error case, this
is incorrect.  When pg_mkdir_p() sets errno to EEXIST it means that the
target path already exists but is not a directory.  So as long as
pg_mkdir_p() returns -1 we should consider it as a failure.

Co-authored-by: Paul Guo <pguo@pivotal.io>
Co-authored-by: Ning Yu <nyu@pivotal.io>
---
 src/bin/pg_basebackup/pg_basebackup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 15f43f9432..d02200b3ff 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -610,7 +610,7 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
 				 PQserverVersion(conn) < MINIMUM_VERSION_FOR_PG_WAL ?
 				 "pg_xlog" : "pg_wal");
 
-		if (pg_mkdir_p(statusdir, pg_dir_create_mode) != 0 && errno != EEXIST)
+		if (pg_mkdir_p(statusdir, pg_dir_create_mode) != 0)
 		{
 			pg_log_error("could not create directory \"%s\": %m", statusdir);
 			exit(1);
-- 
2.20.1

v1-0004-Fix-callers-of-MakePGDirectory.patchapplication/octet-stream; name=v1-0004-Fix-callers-of-MakePGDirectory.patchDownload
From b8c4094cd512dd48015255052170af337db32fc9 Mon Sep 17 00:00:00 2001
From: Ning Yu <nyu@pivotal.io>
Date: Tue, 23 Jul 2019 14:27:20 +0800
Subject: [PATCH v1 4/4] Fix callers of MakePGDirectory()

MakePGDirectory() is a wrapper of mkdir(), when it returns -1 and sets
errno to EEXIST it means that the path already exists, but that does not
means it is a directory, we need to double check with stat().

Co-authored-by: Paul Guo <pguo@pivotal.io>
Co-authored-by: Ning Yu <nyu@pivotal.io>
---
 src/backend/storage/file/fd.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 315c74c745..5059802c7f 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -1399,11 +1399,10 @@ PathNameOpenFilePerm(const char *fileName, int fileFlags, mode_t fileMode)
 void
 PathNameCreateTemporaryDir(const char *basedir, const char *directory)
 {
+	struct stat statbuf;
+
 	if (MakePGDirectory(directory) < 0)
 	{
-		if (errno == EEXIST)
-			return;
-
 		/*
 		 * Failed.  Try to create basedir first in case it's missing. Tolerate
 		 * EEXIST to close a race against another process following the same
@@ -1422,6 +1421,17 @@ PathNameCreateTemporaryDir(const char *basedir, const char *directory)
 					 errmsg("cannot create temporary subdirectory \"%s\": %m",
 							directory)));
 	}
+
+	if (stat(directory, &statbuf) < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("cannot stat temporary subdirectory \"%s\": %m",
+						directory)));
+	else if (!S_ISDIR(statbuf.st_mode))
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("\"%s\" is not a directory",
+						directory)));
 }
 
 /*
-- 
2.20.1

v1-0002-Test-concurrent-call-to-pg_mkdir_p.patchapplication/octet-stream; name=v1-0002-Test-concurrent-call-to-pg_mkdir_p.patchDownload
From 3f0252b89bb62c2f40b414d4051e869f25f635e1 Mon Sep 17 00:00:00 2001
From: Ning Yu <nyu@pivotal.io>
Date: Tue, 23 Jul 2019 11:29:04 +0800
Subject: [PATCH v1 2/4] Test concurrent call to pg_mkdir_p()

---
 src/test/modules/Makefile                     |   1 +
 src/test/modules/test_pg_mkdir_p/.gitignore   |   2 +
 src/test/modules/test_pg_mkdir_p/Makefile     |  23 ++++
 src/test/modules/test_pg_mkdir_p/README       |  44 ++++++++
 .../expected/test_pg_mkdir_p.out              |  67 +++++++++++
 .../test_pg_mkdir_p/sql/test_pg_mkdir_p.sql   |  18 +++
 .../test_pg_mkdir_p/test_pg_mkdir_p--1.0.sql  |   9 ++
 .../modules/test_pg_mkdir_p/test_pg_mkdir_p.c | 106 ++++++++++++++++++
 .../test_pg_mkdir_p/test_pg_mkdir_p.control   |   5 +
 9 files changed, 275 insertions(+)
 create mode 100644 src/test/modules/test_pg_mkdir_p/.gitignore
 create mode 100644 src/test/modules/test_pg_mkdir_p/Makefile
 create mode 100644 src/test/modules/test_pg_mkdir_p/README
 create mode 100644 src/test/modules/test_pg_mkdir_p/expected/test_pg_mkdir_p.out
 create mode 100644 src/test/modules/test_pg_mkdir_p/sql/test_pg_mkdir_p.sql
 create mode 100644 src/test/modules/test_pg_mkdir_p/test_pg_mkdir_p--1.0.sql
 create mode 100644 src/test/modules/test_pg_mkdir_p/test_pg_mkdir_p.c
 create mode 100644 src/test/modules/test_pg_mkdir_p/test_pg_mkdir_p.control

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 60d6d7be1b..ff7847747d 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
 		  test_integerset \
 		  test_parser \
 		  test_pg_dump \
+		  test_pg_mkdir_p \
 		  test_predtest \
 		  test_rbtree \
 		  test_rls_hooks \
diff --git a/src/test/modules/test_pg_mkdir_p/.gitignore b/src/test/modules/test_pg_mkdir_p/.gitignore
new file mode 100644
index 0000000000..19b6c5ba42
--- /dev/null
+++ b/src/test/modules/test_pg_mkdir_p/.gitignore
@@ -0,0 +1,2 @@
+# Generated subdirectories
+/results/
diff --git a/src/test/modules/test_pg_mkdir_p/Makefile b/src/test/modules/test_pg_mkdir_p/Makefile
new file mode 100644
index 0000000000..0005653d6f
--- /dev/null
+++ b/src/test/modules/test_pg_mkdir_p/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_pg_mkdir_p/Makefile
+
+MODULE_big = test_pg_mkdir_p
+OBJS = test_pg_mkdir_p.o $(WIN32RES)
+PGFILEDESC = "test_pg_mkdir_p - helper function to test concurrent call to pg_mkdir_p"
+
+EXTENSION = test_pg_mkdir_p
+DATA = test_pg_mkdir_p--1.0.sql
+
+REGRESS = test_pg_mkdir_p
+
+SHLIB_LINK += -L$(top_builddir)/src/port -lpgport
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_pg_mkdir_p
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_pg_mkdir_p/README b/src/test/modules/test_pg_mkdir_p/README
new file mode 100644
index 0000000000..0ef705f4f0
--- /dev/null
+++ b/src/test/modules/test_pg_mkdir_p/README
@@ -0,0 +1,44 @@
+test_pg_mkdir_p is a test module to validate a race condition issue in
+pg_mkdir_p().
+
+pg_mkdir_p() is used by initdb and pg_basebackup to create a directory as well
+as its parent directories.  The logic used to be like below:
+
+    if (stat(path) < 0)
+    {
+        if (mkdir(path) < 0)
+            retval = -1;
+    }
+
+It first checks for the existence of the path, if path does not pre-exist then
+it creates the directory.  However if two processes try to create path
+concurrently, then one possible execution order is as below:
+
+    A: stat(path) returns -1, so decide to create it;
+    B: stat(path) returns -1, so decide to create it;
+    B: mkdir(path) returns 0, successfully created path;
+    A: mkdir(path) returns -1, fail to create path as it already exist;
+
+It could be triggered easily with initdb:
+
+    testdir=/tmp/testdir
+    datadir=$testdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z
+
+    rm -rf $testdir
+    mkdir $testdir
+
+    # init two databases with common parent directories
+    initdb -D $datadir/db1 >$testdir/db1.log 2>&1 &
+    initdb -D $datadir/db2 >$testdir/db2.log 2>&1 &
+
+    # wait for them to finish and check for the error
+    wait
+    grep 'could not create directory' $testdir/*.log
+
+The fail rate is not 100% but should be large enough to happen in 5 tries.
+
+This race condition could be fixed by swapping the order of stat() and
+mkdir(), this is also what the "mkdir -p" command does.
+
+In this test module we test concurrent calls to pg_mkdir_p() to ensure the
+race condition does not happen.
diff --git a/src/test/modules/test_pg_mkdir_p/expected/test_pg_mkdir_p.out b/src/test/modules/test_pg_mkdir_p/expected/test_pg_mkdir_p.out
new file mode 100644
index 0000000000..701c316926
--- /dev/null
+++ b/src/test/modules/test_pg_mkdir_p/expected/test_pg_mkdir_p.out
@@ -0,0 +1,67 @@
+CREATE EXTENSION test_pg_mkdir_p;
+select * from test_pg_mkdir_p(1);
+ test_pg_mkdir_p 
+-----------------
+ t
+(1 row)
+
+select * from test_pg_mkdir_p(2);
+ test_pg_mkdir_p 
+-----------------
+ t
+(1 row)
+
+select * from test_pg_mkdir_p(2);
+ test_pg_mkdir_p 
+-----------------
+ t
+(1 row)
+
+select * from test_pg_mkdir_p(4);
+ test_pg_mkdir_p 
+-----------------
+ t
+(1 row)
+
+select * from test_pg_mkdir_p(4);
+ test_pg_mkdir_p 
+-----------------
+ t
+(1 row)
+
+select * from test_pg_mkdir_p(8);
+ test_pg_mkdir_p 
+-----------------
+ t
+(1 row)
+
+select * from test_pg_mkdir_p(8);
+ test_pg_mkdir_p 
+-----------------
+ t
+(1 row)
+
+select * from test_pg_mkdir_p(16);
+ test_pg_mkdir_p 
+-----------------
+ t
+(1 row)
+
+select * from test_pg_mkdir_p(16);
+ test_pg_mkdir_p 
+-----------------
+ t
+(1 row)
+
+select * from test_pg_mkdir_p(32);
+ test_pg_mkdir_p 
+-----------------
+ t
+(1 row)
+
+select * from test_pg_mkdir_p(32);
+ test_pg_mkdir_p 
+-----------------
+ t
+(1 row)
+
diff --git a/src/test/modules/test_pg_mkdir_p/sql/test_pg_mkdir_p.sql b/src/test/modules/test_pg_mkdir_p/sql/test_pg_mkdir_p.sql
new file mode 100644
index 0000000000..a637d7d75f
--- /dev/null
+++ b/src/test/modules/test_pg_mkdir_p/sql/test_pg_mkdir_p.sql
@@ -0,0 +1,18 @@
+CREATE EXTENSION test_pg_mkdir_p;
+
+select * from test_pg_mkdir_p(1);
+
+select * from test_pg_mkdir_p(2);
+select * from test_pg_mkdir_p(2);
+
+select * from test_pg_mkdir_p(4);
+select * from test_pg_mkdir_p(4);
+
+select * from test_pg_mkdir_p(8);
+select * from test_pg_mkdir_p(8);
+
+select * from test_pg_mkdir_p(16);
+select * from test_pg_mkdir_p(16);
+
+select * from test_pg_mkdir_p(32);
+select * from test_pg_mkdir_p(32);
diff --git a/src/test/modules/test_pg_mkdir_p/test_pg_mkdir_p--1.0.sql b/src/test/modules/test_pg_mkdir_p/test_pg_mkdir_p--1.0.sql
new file mode 100644
index 0000000000..044c4e8f38
--- /dev/null
+++ b/src/test/modules/test_pg_mkdir_p/test_pg_mkdir_p--1.0.sql
@@ -0,0 +1,9 @@
+/* src/test/modules/test_pg_mkdir_p/test_pg_mkdir_p--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_pg_mkdir_p" to load this file. \quit
+
+CREATE FUNCTION test_pg_mkdir_p(integer)
+RETURNS bool
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_pg_mkdir_p/test_pg_mkdir_p.c b/src/test/modules/test_pg_mkdir_p/test_pg_mkdir_p.c
new file mode 100644
index 0000000000..59cc4efbaf
--- /dev/null
+++ b/src/test/modules/test_pg_mkdir_p/test_pg_mkdir_p.c
@@ -0,0 +1,106 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_pg_mkdir_p.c
+ *    Helper function to test concurrent call to pg_mkdir_p()
+ *
+ * Copyright (c) 2007-2019, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/test/modules/test_pg_mkdir_p/test_pg_mkdir_p.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <errno.h>
+#include <pthread.h>
+
+#include "common/file_perm.h"
+#include "fmgr.h"
+#include "port.h"
+#include "utils/elog.h"
+
+PG_MODULE_MAGIC;
+
+#define TESTDIR "/tmp/testdir_pg_mkdir_p"
+#define DATADIR TESTDIR "/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z"
+
+/*
+ * A struct to pass arguments to the thread and return the results.
+ */
+typedef struct
+{
+	pthread_t	tid;				/* thread id */
+	char		path[MAXPGPATH];	/* the path to create */
+	int			retcode;			/* return code of pg_mkdir_p() */
+	int			error;				/* errno */
+} Job;
+
+PG_FUNCTION_INFO_V1(test_pg_mkdir_p);
+
+static void *
+job_thread(void *arg)
+{
+	Job		   *job = (Job *) arg;
+
+	errno = 0;
+
+	job->retcode = pg_mkdir_p(job->path, pg_dir_create_mode);
+	job->error = errno;
+
+	return NULL;
+}
+
+/*
+ * This function accepts one int32 argument n, it will launch n concurrent
+ * threads to call pg_mkdir_p() to create the same dir and check for errors
+ * from them.
+ *
+ * Return true if all the calls to pg_mkdir_p() succeed, otherwise false is
+ * returned.
+ */
+Datum
+test_pg_mkdir_p(PG_FUNCTION_ARGS)
+{
+	int			n = PG_GETARG_INT32(0);
+	int			failed = 0;
+	int			i;
+	Job		   *jobs;
+
+	if (n <= 0)
+		elog(ERROR, "invalid argument: %d", n);
+
+	jobs = palloc(sizeof(Job) * n);
+
+	rmtree(TESTDIR, true);
+
+	/* Create concurrent threads to execute pg_mkdir_p() */
+	for (i = 0; i < n; i++)
+	{
+		Job		   *job = &jobs[i];
+
+		strncpy(job->path, DATADIR, sizeof(job->path));
+		pthread_create(&job->tid, NULL, job_thread, job);
+	}
+
+	/* Check for the results */
+	for (i = 0; i < n; i++)
+	{
+		Job		   *job = &jobs[i];
+
+		pthread_join(job->tid, NULL);
+
+		if (job->retcode < 0)
+		{
+			elog(NOTICE,
+				 "job %d: could not create directory \"%s\": %s",
+				 i, job->path, strerror(job->error));
+
+			failed++;
+		}
+	}
+
+	pfree(jobs);
+
+	PG_RETURN_BOOL(failed == 0);
+}
diff --git a/src/test/modules/test_pg_mkdir_p/test_pg_mkdir_p.control b/src/test/modules/test_pg_mkdir_p/test_pg_mkdir_p.control
new file mode 100644
index 0000000000..2b8b738b6e
--- /dev/null
+++ b/src/test/modules/test_pg_mkdir_p/test_pg_mkdir_p.control
@@ -0,0 +1,5 @@
+# test_pg_mkdir_p extension
+comment = 'Test concurrent call to pg_mkdir_p()'
+default_version = '1.0'
+module_pathname = '$libdir/test_pg_mkdir_p'
+relocatable = true
-- 
2.20.1

v1-0001-Fix-race-condition-in-pg_mkdir_p.patchapplication/octet-stream; name=v1-0001-Fix-race-condition-in-pg_mkdir_p.patchDownload
From 12a033e0b2485740d74985bed0882f87a5ad05ad Mon Sep 17 00:00:00 2001
From: Ning Yu <nyu@pivotal.io>
Date: Tue, 23 Jul 2019 11:21:19 +0800
Subject: [PATCH v1 1/4] Fix race condition in pg_mkdir_p()

pg_mkdir_p() is used by initdb and pg_basebackup to create a directory
as well as its parent directories.  The logic used to be like below:

    if (stat(path) < 0)
    {
        if (mkdir(path) < 0)
            retval = -1;
    }

It first checks for the existence of the path, if path does not
pre-exist then it creates the directory.  However if two processes try
to create path concurrently, then one possible execution order is as
below:

    A: stat(path) returns -1, so decide to create it;
    B: stat(path) returns -1, so decide to create it;
    B: mkdir(path) returns 0, successfully created path;
    A: mkdir(path) returns -1, fail to create path as it already exist;

This race condition could be fixed by swapping the order of stat() and
mkdir(), this is also what the "mkdir -p" command does.

Co-authored-by: Paul Guo <pguo@pivotal.io>
Co-authored-by: Ning Yu <nyu@pivotal.io>
---
 src/port/pgmkdirp.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/port/pgmkdirp.c b/src/port/pgmkdirp.c
index d943559760..e0cc06260b 100644
--- a/src/port/pgmkdirp.c
+++ b/src/port/pgmkdirp.c
@@ -119,11 +119,17 @@ pg_mkdir_p(char *path, int omode)
 		if (last)
 			(void) umask(oumask);
 
-		/* check for pre-existing directory */
-		if (stat(path, &sb) == 0)
+		if (mkdir(path, last ? omode : S_IRWXU | S_IRWXG | S_IRWXO) == 0)
 		{
-			if (!S_ISDIR(sb.st_mode))
+			/* path does not pre-exist and is successfully created */
+		}
+		else if (errno == EEXIST)
+		{
+			/* path exists, double check it is a dir */
+
+			if (stat(path, &sb) < 0 || !S_ISDIR(sb.st_mode))
 			{
+				/* path is not a dir at all */
 				if (last)
 					errno = EEXIST;
 				else
@@ -131,8 +137,10 @@ pg_mkdir_p(char *path, int omode)
 				retval = -1;
 				break;
 			}
+
+			/* now we know that path does is a dir */
 		}
-		else if (mkdir(path, last ? omode : S_IRWXU | S_IRWXG | S_IRWXO) < 0)
+		else
 		{
 			retval = -1;
 			break;
-- 
2.20.1

#5Michael Paquier
michael@paquier.xyz
In reply to: Ning Yu (#4)
Re: Possible race condition in pg_mkdir_p()?

On Tue, Jul 23, 2019 at 02:54:20PM +0800, Ning Yu wrote:

MakePGDirectory() is also called in TablespaceCreateDbspace(), EEXIST is
considered as non-error for parent directories, however as it considers
EEXIST as a failure for the last level of the path so the logic is
still correct,

So the complains here are about two things:
- In some code paths calling mkdir, we don't care about the fact that
EEXIST can happen for something else than a folder. This could be a
problem if we have conflicts in the backend related to the naming of
the files/folders created. I find a bit surprising to not perform
the sanity checks in MakePGDirectory() in your patch. What of all the
existing callers of this routine?
- pg_mkdir_p is pretty bad at detecting problems with concurrent
creation of parent directories, leading to random failures where these
should not happen.

I may be missing something, but your patch does not actually fix
problem 2, no? Trying to do an initdb with a set of N folders using
the same parent folders not created still results in random failures.
--
Michael

#6Ning Yu
nyu@pivotal.io
In reply to: Michael Paquier (#5)
Re: Possible race condition in pg_mkdir_p()?

On Tue, Jul 30, 2019 at 4:04 PM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Jul 23, 2019 at 02:54:20PM +0800, Ning Yu wrote:

MakePGDirectory() is also called in TablespaceCreateDbspace(), EEXIST is
considered as non-error for parent directories, however as it considers
EEXIST as a failure for the last level of the path so the logic is
still correct,

So the complains here are about two things:
- In some code paths calling mkdir, we don't care about the fact that
EEXIST can happen for something else than a folder. This could be a
problem if we have conflicts in the backend related to the naming of
the files/folders created. I find a bit surprising to not perform
the sanity checks in MakePGDirectory() in your patch. What of all the
existing callers of this routine?

Thanks for the reply. There are several callers of MakePGDirectory(), most of
them already treats EEXIST as an error; TablespaceCreateDbspace() already has
a proper checking for the target dir, it has the chance to fail on a
concurrently created dir, but at least it will not be confused by a file with
the same name; PathNameCreateTemporaryDir() is fixed by our patch 0004, we
will call stat() after mkdir() to double check the result.

In fact personally I'm thinking that whether could we replace all uses of
MakePGDirectory() with pg_mkdir_p(), so we could simplify
TablespaceCreateDbspace() and PathNameCreateTemporaryDir() and other callers
significantly.

- pg_mkdir_p is pretty bad at detecting problems with concurrent
creation of parent directories, leading to random failures where these
should not happen.

I may be missing something, but your patch does not actually fix
problem 2, no? Trying to do an initdb with a set of N folders using
the same parent folders not created still results in random failures.

Well, we should have fixed problem 2, this is our major purpose of the patch
0001, it performs sanity check with stat() after mkdir() at each part of the
path.

The initdb test was just the one used by us to verify our fix, here is our
script:

n=4
testdir=testdir
datadir=$testdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z
logdir=$testdir/logs

rm -rf $testdir
mkdir -p $logdir

for i in `seq $n`; do
initdb -D $datadir/$i >$logdir/$i.log 2>&1 &
done

wait

# check for failures
grep 'could not create directory' $logdir/*

We have provided a test module in patch 0002 to perform a similar test, it
calls pg_mkdir_p() concurrently to trigger the issue, which has higher fail
rate than initdb. With the patch 0001 both the initdb test and the test
module will always succeed in our local environment.

Thanks
Ning

Show quoted text

--
Michael

#7Michael Paquier
michael@paquier.xyz
In reply to: Ning Yu (#6)
Re: Possible race condition in pg_mkdir_p()?

On Tue, Jul 30, 2019 at 06:22:59PM +0800, Ning Yu wrote:

In fact personally I'm thinking that whether could we replace all uses of
MakePGDirectory() with pg_mkdir_p(), so we could simplify
TablespaceCreateDbspace() and PathNameCreateTemporaryDir() and other callers
significantly.

I would still keep the wrapper, but I think that as a final result we
should be able to get the code in PathNameCreateTemporaryDir() shaped
in such a way that there are no multiple attempts at calling
MakePGDirectory() on EEXIST. This has been introduced by dc6c4c9d to
allow sharing temporary files between backends, which is rather recent
but a fixed set of two retries is not a deterministic method of
resolution.

Well, we should have fixed problem 2, this is our major purpose of the patch
0001, it performs sanity check with stat() after mkdir() at each part of the
path.

I just reuse the script presented at the top of the thread with n=2,
and I get that:
pgdata/logs/1.log:creating directory
pgdata/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z/1
... initdb: error: could not create directory "pgdata/a": File exists

But the result expected is that all the paths should be created with
no complains about the parents existing, no? This reproduces on my
Debian box 100% of the time, for different sub-paths. So something
looks wrong in your solution. The code comes originally from FreeBSD,
how do things happen there. Do we get failures if doing something
like that? I would expect this sequence to not fail:
for i in `seq 1 100`; do mkdir -p b/c/d/f/g/h/j/$i; done
--
Michael

#8Andres Freund
andres@anarazel.de
In reply to: Ning Yu (#1)
Re: Possible race condition in pg_mkdir_p()?

Hi,

On 2019-07-18 16:17:22 +0800, Ning Yu wrote:

This seems buggy as it first checks the existence of the dir and makes the
dir if it does not exist yet, however when executing concurrently a
possible race condition can be as below:

A: does a/ exists? no
B: does a/ exists? no
A: try to create a/, succeed
B: try to create a/, failed as it already exists

Hm. I'm not really seing much of a point in making mkdir_p safe against
all of this. What's the scenario for pg where this matters? I assume
you're using it for somewhat different purposes, and that's why it is
problematic for you?

Greetings,

Andres Freund

#9Ning Yu
nyu@pivotal.io
In reply to: Michael Paquier (#7)
Re: Possible race condition in pg_mkdir_p()?

On Wed, Jul 31, 2019 at 12:04 PM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Jul 30, 2019 at 06:22:59PM +0800, Ning Yu wrote:

In fact personally I'm thinking that whether could we replace all uses of
MakePGDirectory() with pg_mkdir_p(), so we could simplify
TablespaceCreateDbspace() and PathNameCreateTemporaryDir() and other callers
significantly.

I would still keep the wrapper, but I think that as a final result we
should be able to get the code in PathNameCreateTemporaryDir() shaped
in such a way that there are no multiple attempts at calling
MakePGDirectory() on EEXIST. This has been introduced by dc6c4c9d to
allow sharing temporary files between backends, which is rather recent
but a fixed set of two retries is not a deterministic method of
resolution.

Well, we should have fixed problem 2, this is our major purpose of the patch
0001, it performs sanity check with stat() after mkdir() at each part of the
path.

I just reuse the script presented at the top of the thread with n=2,
and I get that:
pgdata/logs/1.log:creating directory
pgdata/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z/1
... initdb: error: could not create directory "pgdata/a": File exists

Could I double confirm with you that you made a clean rebuild after
applying the patches? pg_mkdir_p() is compiled as part of libpgport.a,
and the postgres makefile will not relink the initdb binary
automatically, for myself I must 'make clean' and 'make' to ensure
initdb gets relinked.

Show quoted text

But the result expected is that all the paths should be created with
no complains about the parents existing, no? This reproduces on my
Debian box 100% of the time, for different sub-paths. So something
looks wrong in your solution. The code comes originally from FreeBSD,
how do things happen there. Do we get failures if doing something
like that? I would expect this sequence to not fail:
for i in `seq 1 100`; do mkdir -p b/c/d/f/g/h/j/$i; done
--
Michael

#10Ning Yu
nyu@pivotal.io
In reply to: Andres Freund (#8)
Re: Possible race condition in pg_mkdir_p()?

On Wed, Jul 31, 2019 at 12:11 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-07-18 16:17:22 +0800, Ning Yu wrote:

This seems buggy as it first checks the existence of the dir and makes the
dir if it does not exist yet, however when executing concurrently a
possible race condition can be as below:

A: does a/ exists? no
B: does a/ exists? no
A: try to create a/, succeed
B: try to create a/, failed as it already exists

Hm. I'm not really seing much of a point in making mkdir_p safe against
all of this. What's the scenario for pg where this matters? I assume
you're using it for somewhat different purposes, and that's why it is
problematic for you?

Yes, you are right, postgres itself might not run into such kind of race
condition issue. The problem we encountered was on a downstream product
of postgres, where multiple postgres clusters are deployed on the same
machine with common parent dirs.

Best Regards
Ning

Show quoted text

Greetings,

Andres Freund

#11Michael Paquier
michael@paquier.xyz
In reply to: Ning Yu (#9)
Re: Possible race condition in pg_mkdir_p()?

On Wed, Jul 31, 2019 at 12:26:30PM +0800, Ning Yu wrote:

Could I double confirm with you that you made a clean rebuild after
applying the patches? pg_mkdir_p() is compiled as part of libpgport.a,
and the postgres makefile will not relink the initdb binary
automatically, for myself I must 'make clean' and 'make' to ensure
initdb gets relinked.

For any patch I test, I just do a "git clean -d -x -f" before building
as I switch a lot across stable branches as well. It looks that you
are right on this one though, I have just rebuilt from scratch and I
don't see the failures anymore.
--
Michael

#12Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#8)
Re: Possible race condition in pg_mkdir_p()?

On Tue, Jul 30, 2019 at 09:11:44PM -0700, Andres Freund wrote:

Hm. I'm not really seing much of a point in making mkdir_p safe against
all of this. What's the scenario for pg where this matters? I assume
you're using it for somewhat different purposes, and that's why it is
problematic for you?

I don't see why it is a problem to make our APIs more stable if we
have ways to do so. I actually fixed one recently as of 754b90f for a
problem that involved a tool linking to our version of readdir() that
we ship. Even with that, the retries for mkdir() on the base
directory in PathNameCreateTemporaryDir() are basically caused by that
same limitation with the parent paths from this report, no? So we
could actually remove the dependency to the base directory in this
code path and just rely on pg_mkdir_p() to do the right thing for all
the parent paths. That's also a point raised by Ning upthread.
--
Michael

#13Ning Yu
nyu@pivotal.io
In reply to: Michael Paquier (#11)
Re: Possible race condition in pg_mkdir_p()?

On Wed, Jul 31, 2019 at 12:41 PM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Jul 31, 2019 at 12:26:30PM +0800, Ning Yu wrote:

Could I double confirm with you that you made a clean rebuild after
applying the patches? pg_mkdir_p() is compiled as part of libpgport.a,
and the postgres makefile will not relink the initdb binary
automatically, for myself I must 'make clean' and 'make' to ensure
initdb gets relinked.

For any patch I test, I just do a "git clean -d -x -f" before building
as I switch a lot across stable branches as well. It looks that you
are right on this one though, I have just rebuilt from scratch and I
don't see the failures anymore.

Cool, glad to know that it works.

Best Regards
Ning

Show quoted text

--
Michael

#14Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#12)
Re: Possible race condition in pg_mkdir_p()?

Hi,

On 2019-07-31 13:48:23 +0900, Michael Paquier wrote:

On Tue, Jul 30, 2019 at 09:11:44PM -0700, Andres Freund wrote:

Hm. I'm not really seing much of a point in making mkdir_p safe against
all of this. What's the scenario for pg where this matters? I assume
you're using it for somewhat different purposes, and that's why it is
problematic for you?

I don't see why it is a problem to make our APIs more stable if we
have ways to do so.

Well, wanting to support additional use-cases often isn't free. There's
additional code for the new usecase, there's review & commit time,
there's additional test time, there's bug fixes for the new code etc.

We're not developing a general application support library...

I don't really have a problem fixing this case if we think it's
useful. But I'm a bit bothered by all the "fixes" being submitted that
don't matter for PG itself. They do eat up resources.

And sorry, adding in-backend threading to test testing mkdir_p doesn't
make me inclined to believe that this is all well considered. There's
minor issues like us not supporting threads in the backend, pthread not
being portable, and also it being entirely out of proportion to the
issue.

Greetings,

Andres Freund

#15Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#14)
Re: Possible race condition in pg_mkdir_p()?

On Tue, Jul 30, 2019 at 10:19:45PM -0700, Andres Freund wrote:

I don't really have a problem fixing this case if we think it's
useful. But I'm a bit bothered by all the "fixes" being submitted that
don't matter for PG itself. They do eat up resources.

Sure. In this particular case, we can simplify at least one code path
in the backend though for temporary path creation. Such cleanup rings
like a sufficient argument to me.

And sorry, adding in-backend threading to test testing mkdir_p doesn't
make me inclined to believe that this is all well considered. There's
minor issues like us not supporting threads in the backend, pthread not
being portable, and also it being entirely out of proportion to the
issue.

Agreed on this one. The test case may be useful for the purpose of
testing the presented patches, but it does not have enough value to be
merged.
--
Michael

#16Ning Yu
nyu@pivotal.io
In reply to: Michael Paquier (#15)
Re: Possible race condition in pg_mkdir_p()?

On Wed, Jul 31, 2019 at 1:31 PM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Jul 30, 2019 at 10:19:45PM -0700, Andres Freund wrote:

I don't really have a problem fixing this case if we think it's
useful. But I'm a bit bothered by all the "fixes" being submitted that
don't matter for PG itself. They do eat up resources.

Sure. In this particular case, we can simplify at least one code path
in the backend though for temporary path creation. Such cleanup rings
like a sufficient argument to me.

Yes, in current postgres source code there are several wrappers of
mkdir() that do similar jobs. If we could have a safe mkdir_p()
implementation then we could use it directly in all these wrappers, that
could save a lot of maintenance effort in the long run. I'm not saying
that our patches are enough to make it safe and reliable, and I agree
that any patches may introduce new bugs, but I think that having a safe
and unified mkdir_p() is a good direction to go.

And sorry, adding in-backend threading to test testing mkdir_p doesn't
make me inclined to believe that this is all well considered. There's
minor issues like us not supporting threads in the backend, pthread not
being portable, and also it being entirely out of proportion to the
issue.

Agreed on this one. The test case may be useful for the purpose of
testing the presented patches, but it does not have enough value to be
merged.

Yes, that's why we put the testing module in a separate patch from the
fix, feel free to ignore it. In fact ourselves have concerns about it ;)

Best Regards
Ning

Show quoted text

--
Michael

#17Michael Paquier
michael@paquier.xyz
In reply to: Ning Yu (#16)
Re: Possible race condition in pg_mkdir_p()?

On Wed, Jul 31, 2019 at 02:02:03PM +0800, Ning Yu wrote:

Yes, in current postgres source code there are several wrappers of
mkdir() that do similar jobs. If we could have a safe mkdir_p()
implementation then we could use it directly in all these wrappers, that
could save a lot of maintenance effort in the long run. I'm not saying
that our patches are enough to make it safe and reliable, and I agree
that any patches may introduce new bugs, but I think that having a safe
and unified mkdir_p() is a good direction to go.

That's my impression as well. Please note that this does not involve
an actual bug in Postgres and that this is rather invasive, so this
does not really qualify for a back-patch. No objections to simplify
the backend on HEAD though. It would be good if you could actually
register a patch to the commit fest app [1]https://commitfest.postgresql.org/24/ -- Michael and also rework the patch
set so as at least PathNameCreateTemporaryDir wins its simplifications
for the first problem (double-checking the other code paths would be
nice as well). The EEXIST handling, and the confusion about EEXIST
showing for both a path and a file need some separate handling (not
sure what to do on these parts yet).

Yes, that's why we put the testing module in a separate patch from the
fix, feel free to ignore it. In fact ourselves have concerns about it ;)

I think that it is nice that you took the time to do so as you get
yourself more familiar with the TAP infrastructure in the tree and
prove your point. For this case, I would not have gone to do this
much though ;p

[1]: https://commitfest.postgresql.org/24/ -- Michael
--
Michael

#18Ning Yu
nyu@pivotal.io
In reply to: Michael Paquier (#17)
Re: Possible race condition in pg_mkdir_p()?

On Wed, Jul 31, 2019 at 2:21 PM Michael Paquier <michael@paquier.xyz> wrote:

That's my impression as well. Please note that this does not involve
an actual bug in Postgres and that this is rather invasive, so this
does not really qualify for a back-patch. No objections to simplify
the backend on HEAD though. It would be good if you could actually
register a patch to the commit fest app [1] and also rework the patch
set so as at least PathNameCreateTemporaryDir wins its simplifications
for the first problem (double-checking the other code paths would be
nice as well). The EEXIST handling, and the confusion about EEXIST
showing for both a path and a file need some separate handling (not
sure what to do on these parts yet).

Thanks for the suggestion and information, we will rework the patch and
register it. The planned changes are: 1) remove the tests (cool!), 2)
simplify PathNameCreateTemporaryDir() and other callers. The EEXIST
handling will be put in a separate patch so depends on the reviews we
could accept or drop it easily.

Best Regards
Ning