too-many-open-files log file entries when vacuuming under Solaris
Dear all,
recently we have seen a lot of occurrences of "out of file descriptors: Too many open files; release and retry" in our postgres log files, every night when a "vacuum full analyze" is run. After some digging into the code we found that postgres potentially tries to open as many file descriptors as a pre-determined maximum when vacuuming. That number is the lesser of the configuration setting (max_files_per_process) and the number determined at start-up by "src/backend/storage/file/fd.c::count_usable_fds()".

Under Solaris, it would seem, determining that number via dup(0) is not sufficient, as the number that actually matters is the number of usable stdio stream descriptors (up until Solaris 10, at least). Closing the least recently used file descriptor therefore might not resolve a temporary shortage either, since what is needed is a descriptor below 256.

This can be worked around by setting (or leaving) the descriptor limit at 256, or by changing the postgresql.conf setting accordingly. Still, the function for determining the maximum does not appear to work as intended under Solaris. One might try using fopen() instead of dup(), or handle stream and normal file descriptors differently (including moving the standard file descriptors above 255 to leave room for stream ones). Maybe, though, all this is not worth the effort; in that case it might be a good idea to mention the limitation in the platform-specific notes (e.g. keep the ulimit at 256 maximum).
cheers
hardy
Hartmut Raschick
Network Management Solutions
-----------------------------------
KEYMILE GmbH
Wohlenbergstr. 3
D-30175 Hannover, Germany
Phone: +49 (0)511 6747-564
Fax: +49 (0)511 6747-777
mailto:Hartmut.Raschick@keymile.com
http://www.keymile.com
<< KEYMILE - because connectivity matters >>
Geschäftsführer/Managing Directors: Björn Claaßen, Michael Breyer, Axel Föry - Rechtsform der Gesellschaft/Legal structure: GmbH, Sitz/Registered office: Hannover HRB 61069, Amtsgericht/Local court Hannover, USt-Id. Nr./VAT-Reg.-No.: DE 812282795, WEEE-Reg.-No.: DE 59336750
"Raschick, Hartmut" <Hartmut.Raschick@keymile.com> writes:
> recently we have seen a lot of occurrences of "out of file descriptors:
> Too many open files; release and retry" in our postgres log files, every
> night when a "vacuum full analyze" is run. After some digging into the
> code we found that postgres potentially tries to open as many as a
> pre-determined maximum number of file descriptors when vacuuming. That
> number is the lesser of the one from the configuration file
> (max_files_per_process) and the one determined at start-up by
> "src/backend/storage/file/fd.c::count_usable_fds()". Under Solaris now,
> it would seem, finding out that number via dup(0) is not sufficient, as
> the actual number of interest might be/is the number of usable stream
> file descriptors (up until Solaris 10, at least). Also, closing the last
> recently used file descriptor might therefore not solve a temporary
> problem (as something below 256 is needed). Now, this can be fixed by
> setting/leaving the descriptor limit at 256 or changing the
> postgresql.conf setting accordingly. Still, the function for determining
> the max number is not working as intended under Solaris, it would
> appear. One might try using fopen() instead of dup() or have a different
> handling for stream and normal file descriptors (including moving
> standard file descriptors to above 255 to leave room for stream
> ones). Maybe though, all this is not worth the effort; then it might
> perhaps be a good idea to mention the limitations/specialties in the
> platform specific notes (e.g. have u/limit at 256 maximum).
TBH this sounds like unfounded speculation. AFAIK a Postgres backend will
not open anything but regular files after its initial startup. I'm not
sure what a "stream" is on Solaris, but guessing that it refers to pipes
or sockets, I don't think we have a problem with an OS restriction that
those be below FD 256. In any case, if we did, it would presumably show
up as errors not release-and-retry events.
Our usual experience is that you get release-and-retry log messages when
the OS is up against the system-wide open-file limit rather than the
per-process limit (ie, the underlying error code is ENFILE not EMFILE).
I don't know exactly how Solaris strerror() spells those codes so it's
difficult to tell from your reported log message which case is happening.
If it is the system-wide limit that's at issue, then of course the dup(0)
loop isn't likely to find it, and adjusting max_files_per_process (or
maybe better, reducing max_connections) is the expected solution.
regards, tom lane
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, March 5, 2014 9:17 PM
To: Raschick, Hartmut
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] too-many-open-files log file entries when vacuuming under Solaris

"Raschick, Hartmut" <Hartmut.Raschick@keymile.com> writes:
recently we have seen a lot of occurrences of "out of file
descriptors:
Too many open files; release and retry" in our postgres log files,
...
..
.
in the platform specific notes (e.g. have u/limit at 256 maximum).
TBH this sounds like unfounded speculation
...
..
.
regards, tom lane
Hmmm... FWIW, we've compiled some more info to illustrate the problem: starting with a snippet from the Solaris 10 "fopen" man page, test programs showing the difference between open() and fopen() calls under Solaris, a postgres log file with extended logging, and a diff of fd.c showing how the additional log entries were produced. I hope this makes it somewhat clearer: it is not a system-wide but a per-process issue, related to the Solaris 32-bit ABI backwards compatibility... surely not a world-wide problem, one would agree. Nevertheless, we thought it prudent to at least mention it.
Btw, here's what Oracle has to say: http://www.oracle.com/technetwork/server-storage/solaris10/stdio-256-136698.html
I hope the 15K attachment gets through...
cheers,
hardy