get_controlfile() can leak fds in the backend
Hi all,
(CC-ing Joe as of dc7d70e)
I was just looking at the offline checksum patch, and noticed some
sloppy coding in controldata_utils.c. The control file gets opened in
get_controlfile(), and if it generates an error then the file
descriptor remains open. As a basic code rule in the backend we should
only use BasicOpenFile() when opening files, so I think that the issue
should be fixed as attached, even if this requires including fd.h for
the backend compilation, which is kind of ugly.
Another possibility would be to just close the file descriptor before
any error, saving errno appropriately before issuing any %m portion,
but that still does not respect the backend policy regarding open().
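For reference, that alternative would be a minimal change like the
following sketch (untested, frontend path left alone, error message kept
simple):
8<--------------------
	/* on a read failure, for example */
	if (r != sizeof(ControlFileData))
	{
		int			save_errno = errno;	/* close() could clobber errno */

		close(fd);
		errno = save_errno;
#ifndef FRONTEND
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not read file \"%s\": %m", ControlFilePath)));
#endif
	}
8<--------------------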
We also do not have a wait event for the read() call, maybe we should
have one, but knowing that this gets called only for the SQL-level
functions accessing the control file, I don't really think that's
worth it.
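If somebody felt otherwise, I would expect it to look more or less like
what ReadControlFile() does in xlog.c, reusing the existing
WAIT_EVENT_CONTROL_FILE_READ (sketch only):
8<--------------------
#ifndef FRONTEND
	pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_READ);
#endif
	r = read(fd, ControlFile, sizeof(ControlFileData));
#ifndef FRONTEND
	pgstat_report_wait_end();
#endif
8<--------------------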
Thoughts?
--
Michael
Attachments:
controlfile-leak.patch (text/x-diff; charset=us-ascii)
diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index abfe7065f5..0b011741e3 100644
--- a/src/common/controldata_utils.c
+++ b/src/common/controldata_utils.c
@@ -27,6 +27,9 @@
#include "catalog/pg_control.h"
#include "common/controldata_utils.h"
#include "port/pg_crc32c.h"
+#ifndef FRONTEND
+#include "storage/fd.h"
+#endif
/*
* get_controlfile(char *DataDir, const char *progname, bool *crc_ok_p)
@@ -45,13 +48,20 @@ get_controlfile(const char *DataDir, const char *progname, bool *crc_ok_p)
char ControlFilePath[MAXPGPATH];
pg_crc32c crc;
int r;
+ int flags = O_RDONLY | PG_BINARY;
AssertArg(crc_ok_p);
ControlFile = palloc(sizeof(ControlFileData));
snprintf(ControlFilePath, MAXPGPATH, "%s/global/pg_control", DataDir);
- if ((fd = open(ControlFilePath, O_RDONLY | PG_BINARY, 0)) == -1)
+#ifndef FRONTEND
+ fd = BasicOpenFile(ControlFilePath, flags);
+#else
+ fd = open(ControlFilePath, flags, 0);
+#endif
+
+ if (fd < 0)
#ifndef FRONTEND
ereport(ERROR,
(errcode_for_file_access(),
Bonjour Michaël,
Thoughts?
None.
However, while at it, there is also the question of whether the control
file should be locked when updated, eg with flock(2) to avoid race
conditions between concurrent commands. ISTM that there is currently no
such thing in the code, but that it would be desirable.
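To illustrate, I'm thinking of something like the sketch below around the
read-modify-write cycle on pg_control (error handling elided, and no claim
about portability):
8<--------------------
#include <sys/file.h>

	fd = open(ControlFilePath, O_RDWR | PG_BINARY, 0);

	/* take an exclusive advisory lock for the whole update */
	if (flock(fd, LOCK_EX) < 0)
		/* report error */ ;

	/* ... read, modify and write back the control file ... */

	flock(fd, LOCK_UN);
	close(fd);
8<--------------------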
--
Fabien.
Hi,
On 2019-02-27 10:23:45 +0100, Fabien COELHO wrote:
However, while at it, there is also the question of whether the control file
should be locked when updated, eg with flock(2) to avoid race conditions
between concurrent commands. ISTM that there is currently no such thing in
the code, but that it would be desirable.
Shouldn't be necessary - the control file fits into a single page, and
writes of that size ought to always be atomic. And I also think
introducing flock usage for this would be quite disproportional.
Greetings,
Andres Freund
However, while at it, there is also the question of whether the control file
should be locked when updated, eg with flock(2) to avoid race conditions
between concurrent commands. ISTM that there is currently no such thing in
the code, but that it would be desirable.
Shouldn't be necessary - the control file fits into a single page, and
writes of that size ought to always be atomic. And I also think
introducing flock usage for this would be quite disproportional.
Ok, fine.
Note that my concern is not about the page size, but rather that as more
commands may change the cluster status by editing the control file, it
would be better that a postmaster does not start while a pg_rewind or
enable checksum or whatever is in progress, and currently there is a
possible race condition between the read and write that can induce an
issue, at least theoretically.
--
Fabien.
On 2/27/19 2:47 AM, Michael Paquier wrote:
Hi all,
(CC-ing Joe as of dc7d70e)
I was just looking at the offline checksum patch, and noticed some
sloppy coding in controldata_utils.c. The control file gets opened in
get_controlfile(), and if it generates an error then the file
descriptor remains open. As a basic code rule in the backend we should
only use BasicOpenFile() when opening files, so I think that the issue
should be fixed as attached, even if this requires including fd.h for
the backend compilation, which is kind of ugly.
Another possibility would be to just close the file descriptor before
any error, saving errno appropriately before issuing any %m portion,
but that still does not respect the backend policy regarding open().
In fd.c I see:
8<--------------------
* AllocateFile, AllocateDir, OpenPipeStream and OpenTransientFile are
* wrappers around fopen(3), opendir(3), popen(3) and open(2),
* respectively. They behave like the corresponding native functions,
* except that the handle is registered with the current subtransaction,
* and will be automatically closed at abort. These are intended mainly
* for short operations like reading a configuration file; there is a
* limit on the number of files that can be opened using these functions
* at any one time.
*
* Finally, BasicOpenFile is just a thin wrapper around open() that can
* release file descriptors in use by the virtual file descriptors if
* necessary. There is no automatic cleanup of file descriptors returned
* by BasicOpenFile, it is solely the caller's responsibility to close
* the file descriptor by calling close(2).
8<--------------------
According to that comment BasicOpenFile does not seem to solve the issue
you are pointing out (leaking of file descriptor on ERROR). Perhaps
OpenTransientFile() is more appropriate? I am on the road at the moment
so have not looked very deeply at this yet though.
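I.e. the pattern in the backend case would presumably become roughly this
(just a sketch, not tested):
8<--------------------
	fd = OpenTransientFile(ControlFilePath, O_RDONLY | PG_BINARY);
	if (fd < 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not open file \"%s\" for reading: %m",
						ControlFilePath)));

	/*
	 * If anything between here and the close throws ERROR, the descriptor
	 * is registered with the current (sub)transaction and gets cleaned up
	 * automatically at abort, per the fd.c comment above.
	 */
	r = read(fd, ControlFile, sizeof(ControlFileData));
	...

	CloseTransientFile(fd);
8<--------------------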
Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
On 2/27/19 10:26 AM, Joe Conway wrote:
On 2/27/19 2:47 AM, Michael Paquier wrote:
Hi all,
(CC-ing Joe as of dc7d70e)
According to that comment BasicOpenFile does not seem to solve the issue
you are pointing out (leaking of file descriptor on ERROR). Perhaps
OpenTransientFile() is more appropriate? I am on the road at the moment
so have not looked very deeply at this yet though.
It seems to me that OpenTransientFile() is more appropriate. Patch done
that way attached.
Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
Attachments:
controldata_utils-fdfix-01.diff (text/x-patch)
diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index abfe706..c3b1934 100644
*** a/src/common/controldata_utils.c
--- b/src/common/controldata_utils.c
***************
*** 27,32 ****
--- 27,35 ----
#include "catalog/pg_control.h"
#include "common/controldata_utils.h"
#include "port/pg_crc32c.h"
+ #ifndef FRONTEND
+ #include "storage/fd.h"
+ #endif
/*
* get_controlfile(char *DataDir, const char *progname, bool *crc_ok_p)
*************** get_controlfile(const char *DataDir, con
*** 51,63 ****
ControlFile = palloc(sizeof(ControlFileData));
snprintf(ControlFilePath, MAXPGPATH, "%s/global/pg_control", DataDir);
- if ((fd = open(ControlFilePath, O_RDONLY | PG_BINARY, 0)) == -1)
#ifndef FRONTEND
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not open file \"%s\" for reading: %m",
ControlFilePath)));
#else
{
fprintf(stderr, _("%s: could not open file \"%s\" for reading: %s\n"),
progname, ControlFilePath, strerror(errno));
--- 54,67 ----
ControlFile = palloc(sizeof(ControlFileData));
snprintf(ControlFilePath, MAXPGPATH, "%s/global/pg_control", DataDir);
#ifndef FRONTEND
+ if ((fd = OpenTransientFile(ControlFilePath, O_RDONLY | PG_BINARY)) == -1)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not open file \"%s\" for reading: %m",
ControlFilePath)));
#else
+ if ((fd = open(ControlFilePath, O_RDONLY | PG_BINARY, 0)) == -1)
{
fprintf(stderr, _("%s: could not open file \"%s\" for reading: %s\n"),
progname, ControlFilePath, strerror(errno));
*************** get_controlfile(const char *DataDir, con
*** 95,101 ****
--- 99,109 ----
#endif
}
+ #ifndef FRONTEND
+ CloseTransientFile(fd);
+ #else
close(fd);
+ #endif
/* Check the CRC. */
INIT_CRC32C(crc);
On Wed, Feb 27, 2019 at 11:50:17AM +0100, Fabien COELHO wrote:
Shouldn't be necessary - the control file fits into a single page, and
writes of that size ought to always be atomic. And I also think
introducing flock usage for this would be quite disproportional.
There are static assertions to make sure that the size of the control file
data never gets higher than 512 bytes for this purpose.
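(If memory serves, that's the StaticAssertStmt() in WriteControlFile(),
roughly along these lines:)
8<--------------------
	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
					 "pg_control is too large for atomic disk writes");
8<--------------------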
Note that my concern is not about the page size, but rather that as more
commands may change the cluster status by editing the control file, it would
be better that a postmaster does not start while a pg_rewind or enable
checksum or whatever is in progress, and currently there is a possible race
condition between the read and write that can induce an issue, at least
theoretically.
Something that I think we could have instead is a special flag in the
control file to mark the postmaster as being in maintenance mode. This
would be useful to prevent the postmaster from starting if it sees this flag
in the control file, as well as to find out that a host has crashed in
the middle of a maintenance operation. We don't give this insurance
now when running pg_rewind, which is bad. That's also separate from
the checksum-related patches and pg_rewind.
flock() can be hard to live with for cross-platform compatibility,
for example on Windows (LockFileEx) or other fancy platforms. And note
that we don't use it anywhere in the tree yet. And flock() would help in the
first case I am mentioning, but not in the second.
--
Michael
Hi,
On 2019-02-27 11:50:17 +0100, Fabien COELHO wrote:
Note that my concern is not about the page size, but rather that as more
commands may change the cluster status by editing the control file, it would
be better that a postmaster does not start while a pg_rewind or enable
checksum or whatever is in progress, and currently there is a possible race
condition between the read and write that can induce an issue, at least
theoretically.
Seems odd to bring this up in this thread, it really has nothing to do
with the topic. If we were to want to do more here, ISTM the right
approach would use the postmaster pid file, not the control file.
Greetings,
Andres Freund
On Wed, Feb 27, 2019 at 07:45:11PM -0500, Joe Conway wrote:
It seems to me that OpenTransientFile() is more appropriate. Patch done
that way attached.
Works for me, thanks for sending a patch! While on it, could you
clean up the comment on top of get_controlfile()? "char" is mentioned
instead of "const char" for DataDir which is incorrect. I would
remove the complete set of arguments from the description and just
keep the routine name.
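In short, something like this for the header comment (rest of it left as
it is):
8<--------------------
 /*
- * get_controlfile(char *DataDir, const char *progname, bool *crc_ok_p)
+ * get_controlfile()
  *
  * Get controlfile values. [...]
  */
8<--------------------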
--
Michael
Hello Andres,
Note that my concern is not about the page size, but rather that as more
commands may change the cluster status by editing the control file, it would
be better that a postmaster does not start while a pg_rewind or enable
checksum or whatever is in progress, and currently there is a possible race
condition between the read and write that can induce an issue, at least
theoretically.
Seems odd to bring this up in this thread, it really has nothing to do
with the topic.
Indeed. I raised it here because it is in the same area of code and
Michaël was looking at it.
If we were to want to do more here, ISTM the right approach would use
the postmaster pid file, not the control file.
ISTM that this just means re-inventing a manual poor-featured
race-condition-prone lock API around another file, which seems to be
created more or less only by "pg_ctl", while some other commands use the
control file (eg pg_rewind, AFAICS).
--
Fabien.
On 2/27/19 7:54 PM, Michael Paquier wrote:
On Wed, Feb 27, 2019 at 07:45:11PM -0500, Joe Conway wrote:
It seems to me that OpenTransientFile() is more appropriate. Patch done
that way attached.
Works for me, thanks for sending a patch! While on it, could you
clean up the comment on top of get_controlfile()? "char" is mentioned
instead of "const char" for DataDir which is incorrect. I would
remove the complete set of arguments from the description and just
keep the routine name.
Sure, will do. What are your thoughts on backpatching? This seems
unlikely to be a practical concern in the field, so my inclination is a
master only fix.
Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
On Thu, Feb 28, 2019 at 07:11:04AM -0500, Joe Conway wrote:
Sure, will do. What are your thoughts on backpatching? This seems
unlikely to be a practical concern in the field, so my inclination is a
master only fix.
I agree that this is unlikely to become an issue, as an error on the
control file would most likely be a PANIC when updating it, so fixing
only HEAD sounds fine to me. Thanks for asking!
--
Michael
Hi,
On 2019-02-28 09:54:48 +0100, Fabien COELHO wrote:
If we were to want to do more here, ISTM the right approach would use
the postmaster pid file, not the control file.
ISTM that this just means re-inventing a manual poor-featured
race-condition-prone lock API around another file, which seems to be created
more or less only by "pg_ctl", while some other commands use the control
file (eg pg_rewind, AFAICS).
Huh? Postmaster.pid is written by the backend, pg_ctl just checks it to
see if the backend has finished starting up etc. It's precisely what the
backend uses to prevent two postmasters from starting etc. It's also what, say,
pg_resetwal checks to protect against a concurrently running cluster
(albeit in a racy way). If we want to make things more bulletproof,
that's the place. The control file is constantly written to, sometimes
by different processes, it'd just not be a good file for such lockout
mechanisms.
Greetings,
Andres Freund
On 2/28/19 7:20 AM, Michael Paquier wrote:
On Thu, Feb 28, 2019 at 07:11:04AM -0500, Joe Conway wrote:
Sure, will do. What are your thoughts on backpatching? This seems
unlikely to be a practical concern in the field, so my inclination is a
master only fix.I agree that this would unlikely become an issue as an error on the
control file would most likely be a PANIC when updating it, so fixing
only HEAD sounds fine to me. Thanks for asking!
Committed and pushed that way.
By the way, while looking at this, I noted at least a couple of places
where OpenTransientFile() is being passed O_RDWR when the usage is
pretty clearly intended to be read-only. For example at least two
instances in slru.c -- SlruPhysicalReadPage() and
SimpleLruDoesPhysicalPageExist(). Is it worth while searching for and
fixing those instances?
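(The fix itself would presumably just be this kind of one-liner, e.g. in
SlruPhysicalReadPage(); untested:)
8<--------------------
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
-	fd = OpenTransientFile(path, O_RDWR | PG_BINARY);
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
8<--------------------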
Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
On Thu, Feb 28, 2019 at 04:09:32PM -0500, Joe Conway wrote:
Committed and push that way.
Thanks for committing a fix.
By the way, while looking at this, I noted at least a couple of places
where OpenTransientFile() is being passed O_RDWR when the usage is
pretty clearly intended to be read-only. For example at least two
instances in slru.c -- SlruPhysicalReadPage() and
SimpleLruDoesPhysicalPageExist(). Is it worth while searching for and
fixing those instances?
There are roughly 40~42 callers of OpenTransientFile(). Looking at
them I can see that RestoreSlotFromDisk() could also switch to RDONLY
instead of RDWR. I am also a bit tired of the lack of error handling
around CloseTransientFile(). While in some code paths the file
descriptors are closed on error, in some others we should report
something. I am going to send a patch after a lookup. Let's see all
that on a separate thread.
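(For the code paths which should complain, what I have in mind is something
like this sketch, with whatever file name the caller has at hand:)
8<--------------------
	if (CloseTransientFile(fd) != 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not close file \"%s\": %m", path)));
8<--------------------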
--
Michael
On Thu, Feb 28, 2019 at 01:07:23PM -0800, Andres Freund wrote:
Huh? Postmaster.pid is written by the backend, pg_ctl just checks it to
see if the backend has finished starting up etc. It's precisely what the
backend uses to prevent two postmasters from starting etc. It's also what, say,
pg_resetwal checks to protect against a concurrently running cluster
(albeit in a racy way). If we want to make things more bulletproof,
that's the place. The control file is constantly written to, sometimes
by different processes, it'd just not be a good file for such lockout
mechanisms.
Hijacking my own thread some more... I can do that, right? Or not.
One thing is that we don't protect a data folder from being started when it
is in the process of being worked on by an external tool, like
pg_rewind or pg_checksums. So having an extra flag in the control
file, which can be used by external tools to tell that the data folder
is being worked on, does not sound that crazy to me.
Having a tool write a fake postmaster.pid for this kind of task does
not sound right from a correctness point of view, because the instance
is not started.
And a lock API won't protect much if the host is unplugged as well if
its state is made durable...
Let's keep the discussions where they are by the way. Joe has just
closed the report of this thread with 4598a99, so I am moving on to
the correct places.
My apologies for the digressions.
--
Michael
Hi,
On 2019-03-01 10:11:53 +0900, Michael Paquier wrote:
One thing is that we don't protect a data folder from being started when it
is in the process of being worked on by an external tool, like
pg_rewind or pg_checksums. So having an extra flag in the control
file, which can be used by external tools to tell that the data folder
is being worked on, does not sound that crazy to me.
Having a tool write a fake postmaster.pid for this kind of task does
not sound right from a correctness point of view, because the instance
is not started.
I think putting this into the control file is a seriously bad
idea. Postmaster interlocks against other postmasters running via
postmaster.pid. Having a second interlock mechanism, in a different
file, doesn't make any sort of sense. Nor does it seem sane to have
external tools write over data as INTENSELY critical as the control file,
when they then have to understand CRCs etc.
Let's keep the discussions where they are by the way. Joe has just
closed the report of this thread with 4598a99, so I am moving on to
the correct places.
I don't know what that means, given you replied to the above in this
thread?
Greetings,
Andres Freund
Hello Andres,
I think putting this into the control file is a seriously bad
idea. Postmaster interlocks against other postmasters running via
postmaster.pid.
Having a second interlock mechanism, in a different file, doesn't make
any sort of sense. Nor does it seem sane to have external tools write
over data as INTENSELY critical as the control file, when they then have
to understand CRCs etc.
On this point, there are functions for that, get/update_controlfile, so
this should be factored out.
The initial incentive for raising the issue, probably in the wrong thread
and without a clear understanding of the matter, is that I've been
reviewing a patch to enable/disable checksums on a stopped cluster.
The patch updates all the checksums in all the files, and changes the
control file to tell that there are checksums. Currently it checks for and
creates a fake "postmaster.pid" file to attempt to prevent another tool from
running concurrently with this operation, with ISTM a procedure prone to
race conditions, so it does not guarantee that it would be the only tool
running on the cluster. This looked to me like a bad hack. Given that other
commands that operate on a cluster seemed to use the control file to signal
that they are doing something, I thought that it would be the way to go, but
then I noticed that the control file read/write procedure looks as bad as
the postmaster.pid hack for ensuring that only one command is active.
Nevertheless, I'm ranting in the wrong thread and it seems that there is
no real problem to solve, so I'll say fine to the to-me-unconvincing
"postmaster.pid" hack proposed in the patch.
--
Fabien.