Bug in slot.c and are replication slots ever used at Window?
Hi hackers,
I am really confused. If my conclusions are correct, then nobody ever
tried to use replication slots at Windows!
The function RestoreSlotFromDisk in slot.c contains the following code:
static void
RestoreSlotFromDisk(const char *name)
{
ReplicationSlotOnDisk cp;
int i;
char path[MAXPGPATH + 22];
int fd;
bool restored = false;
int readBytes;
pg_crc32c checksum;
/* no need to lock here, no concurrent access allowed yet */
/* delete temp file if it exists */
sprintf(path, "pg_replslot/%s/state.tmp", name);
if (unlink(path) < 0 && errno != ENOENT)
ereport(PANIC,
(errcode_for_file_access(),
errmsg("could not remove file \"%s\": %m", path)));
sprintf(path, "pg_replslot/%s/state", name);
elog(DEBUG1, "restoring replication slot from \"%s\"", path);
fd = OpenTransientFile(path, O_RDWR | PG_BINARY);
/*
* We do not need to handle this as we are rename()ing the
directory into
* place only after we fsync()ed the state file.
*/
if (fd < 0)
ereport(PANIC,
(errcode_for_file_access(),
errmsg("could not open file \"%s\": %m", path)));
/*
* Sync state file before we're reading from it. We might have crashed
* while it wasn't synced yet and we shouldn't continue on that basis.
*/
pgstat_report_wait_start(WAIT_EVENT_REPLICATION_SLOT_RESTORE_SYNC);
if (pg_fsync(fd) != 0)
{
int save_errno = errno;
CloseTransientFile(fd);
errno = save_errno;
ereport(PANIC,
(errcode_for_file_access(),
errmsg("could not fsync file \"%s\": %m",
path)));
}
pgstat_report_wait_end();
/* Also sync the parent directory */
START_CRIT_SECTION();
fsync_fname(path, true);
END_CRIT_SECTION();
...
Please notice that fsync_fname with comment "also sync parent directory"
is called for path of the file!
fsync_fname in turn does the following:
/*
* fsync_fname -- Try to fsync a file or directory
*
* Ignores errors trying to open unreadable files, or trying to fsync
* directories on systems where that isn't allowed/required. Reports
* other errors non-fatally.
*/
int
fsync_fname(const char *fname, bool isdir, const char *progname)
{
int fd;
int flags;
int returncode;
/*
* Some OSs require directories to be opened read-only whereas other
* systems don't allow us to fsync files opened read-only; so we
need both
* cases here. Using O_RDWR will cause us to fail to fsync files
that are
* not writable by our userid, but we assume that's OK.
*/
flags = PG_BINARY;
if (!isdir)
flags |= O_RDWR;
else
flags |= O_RDONLY;
/*
* Open the file, silently ignoring errors about unreadable files (or
* unsupported operations, e.g. opening a directory under Windows), and
* logging others.
*/
fd = open(fname, flags);
if (fd < 0)
{
if (errno == EACCES || (isdir && errno == EISDIR))
return 0;
fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
progname, fname, strerror(errno));
return -1;
}
returncode = fsync(fd);
So if "isdir" is true (and it is true in this case), it sets O_RDONLY flag.
Then fsync_fname successfully opens slot file in readonly mode and calls
fsync() which at windows
is substituted with _commit() which in turn is wrapper for FlushFileBuffers.
Finally FlushFileBuffers returns ERROR_ACCESS_DENINED which cause
assertion failure in _commit:
if ( !FlushFileBuffers((HANDLE)_get_osfhandle(filedes)) ) {
retval = GetLastError();
}
else {
retval = 0; /* return success */
}
/* map the OS return code to C errno value and return code */
if (retval == 0)
goto good;
_doserrno = retval;
}
errno = EBADF;
retval = -1;
_ASSERTE(("Invalid file descriptor. File possibly closed by a different thread",0));
I think that the problem happen only with debug version of postgres.
Release version will just return error in this case which is silently ignored by RestoreSlotFromDisk function.
I think that bug fix is trivial: we just need to use fsync_parent_path instead of fsync_fname in RestoreSlotFromDisk.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Thu, Aug 30, 2018 at 11:00:43AM +0300, Konstantin Knizhnik wrote:
So if "isdir" is true (and it is true in this case), it sets O_RDONLY
flag. Then fsync_fname successfully opens slot file in readonly mode
and calls fsync() which at windows is substituted with _commit() which
in turn is wrapper for FlushFileBuffers.
It seems to me that you are right here, "path" points to
pg_replslot/$SLOTNAME/state which is a file so the fsync is incorrect.
I am not sure that we'd want to publish fsync_parent_path out of fd.c
though, so perhaps we could just save the slot path in a different
variable and use it?
--
Michael
Hi Konstantin,
On Thu, Aug 30, 2018 at 11:27:31AM -0700, Michael Paquier wrote:
It seems to me that you are right here, "path" points to
pg_replslot/$SLOTNAME/state which is a file so the fsync is incorrect.
I am not sure that we'd want to publish fsync_parent_path out of fd.c
though, so perhaps we could just save the slot path in a different
variable and use it?
I have spent more time on this bug, and the code path you have pointed
at is the only one having such an issue. Attached is a patch to fix the
problem. It includes the sanity checks I have used to check all code
paths calling fsync_fname() for both the frontend and the backend code.
The checks will not be included in the final fix, still they look useful
so I am planning to start a new thread on the matter as perhaps other
folks have more and/or better ideas.
--
Michael
Attachments:
slotpath_sync.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 19978d9a9e..4f30904141 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1352,6 +1352,7 @@ RestoreSlotFromDisk(const char *name)
{
ReplicationSlotOnDisk cp;
int i;
+ char slotdir[MAXPGPATH];
char path[MAXPGPATH + 22];
int fd;
bool restored = false;
@@ -1361,13 +1362,14 @@ RestoreSlotFromDisk(const char *name)
/* no need to lock here, no concurrent access allowed yet */
/* delete temp file if it exists */
- sprintf(path, "pg_replslot/%s/state.tmp", name);
+ sprintf(slotdir, "pg_replslot/%s", name);
+ sprintf(path, "%s/state.tmp", slotdir);
if (unlink(path) < 0 && errno != ENOENT)
ereport(PANIC,
(errcode_for_file_access(),
errmsg("could not remove file \"%s\": %m", path)));
- sprintf(path, "pg_replslot/%s/state", name);
+ sprintf(path, "%s/state", slotdir);
elog(DEBUG1, "restoring replication slot from \"%s\"", path);
@@ -1402,7 +1404,7 @@ RestoreSlotFromDisk(const char *name)
/* Also sync the parent directory */
START_CRIT_SECTION();
- fsync_fname(path, true);
+ fsync_fname(slotdir, true);
END_CRIT_SECTION();
/* read part of statefile that's guaranteed to be version independent */
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 8dd51f1767..49a5640c61 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -574,6 +574,14 @@ pg_flush_data(int fd, off_t offset, off_t nbytes)
void
fsync_fname(const char *fname, bool isdir)
{
+#ifdef USE_ASSERT_CHECKING
+ struct stat statbuf;
+
+ stat(fname, &statbuf);
+ Assert((isdir && S_ISDIR(statbuf.st_mode)) ||
+ (!isdir && !S_ISDIR(statbuf.st_mode)));
+#endif
+
fsync_fname_ext(fname, isdir, false, ERROR);
}
diff --git a/src/common/file_utils.c b/src/common/file_utils.c
index 48876061c3..36095b01af 100644
--- a/src/common/file_utils.c
+++ b/src/common/file_utils.c
@@ -266,6 +266,14 @@ fsync_fname(const char *fname, bool isdir, const char *progname)
int flags;
int returncode;
+#ifdef USE_ASSERT_CHECKING
+ struct stat statbuf;
+
+ stat(fname, &statbuf);
+ Assert((isdir && S_ISDIR(statbuf.st_mode)) ||
+ (!isdir && !S_ISDIR(statbuf.st_mode)));
+#endif
+
/*
* Some OSs require directories to be opened read-only whereas other
* systems don't allow us to fsync files opened read-only; so we need both
On Fri, Aug 31, 2018 at 11:46:47AM -0700, Michael Paquier wrote:
The checks will not be included in the final fix, still they look useful
so I am planning to start a new thread on the matter as perhaps other
folks have more and/or better ideas.
Pushed the fix down to 9.4, without the extra sanity checks.
--
Michael