pgsql: Fix error handling of vacuumdb and reindexdb when running out of

Started by Michael Paquieralmost 7 years ago5 messagescomitters
Jump to latest
#1Michael Paquier
michael@paquier.xyz

Fix error handling of vacuumdb and reindexdb when running out of fds

When trying to use a high number of jobs, vacuumdb (and more recently
reindexdb) has only checked for a maximum number of jobs used, causing
confusing failures when running out of file descriptors when the jobs
open connections to Postgres. This commit changes the error handling so
as we do not check anymore for a maximum number of allowed jobs when
parsing the option value with FD_SETSIZE, but check instead if a file
descriptor is within the supported range when opening the connections
for the jobs so as this is detected at the earliest time possible.

Also, improve the error message to give a hint about the number of jobs
recommended, using a wording given by the reviewers of the patch.

Reported-by: Andres Freund
Author: Michael Paquier
Reviewed-by: Andres Freund, Álvaro Herrera, Tom Lane
Discussion: /messages/by-id/20190818001858.ho3ev4z57fqhs7a5@alap3.anarazel.de
Backpatch-through: 9.5

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/71d84efba714db3b8a330a54be15c4d385719ad6

Modified Files
--------------
src/bin/scripts/reindexdb.c | 6 ------
src/bin/scripts/scripts_parallel.c | 26 ++++++++++++--------------
src/bin/scripts/scripts_parallel.h | 2 --
src/bin/scripts/vacuumdb.c | 6 ------
4 files changed, 12 insertions(+), 28 deletions(-)

#2Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#1)
Re: pgsql: Fix error handling of vacuumdb and reindexdb when running out of

On Mon, Aug 26, 2019 at 02:17:06AM +0000, Michael Paquier wrote:

Fix error handling of vacuumdb and reindexdb when running out of fds

When trying to use a high number of jobs, vacuumdb (and more recently
reindexdb) has only checked for a maximum number of jobs used, causing
confusing failures when running out of file descriptors when the jobs
open connections to Postgres. This commit changes the error handling so
as we do not check anymore for a maximum number of allowed jobs when
parsing the option value with FD_SETSIZE, but check instead if a file
descriptor is within the supported range when opening the connections
for the jobs so as this is detected at the earliest time possible.

Also, improve the error message to give a hint about the number of jobs
recommended, using a wording given by the reviewers of the patch.

jacana does not like that, and has reported an error for 9.6:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-08-26 03%3A51%3A19
# Running: vacuumdb -zj2 postgres
vacuumdb: vacuuming database "postgres"
vacuumdb: too many jobs for this platform -- try 1

I am able to reproduce the problem locally, and the problem is that we
don't declare FD_SETSIZE on Windows before winsock2.h in
scripts_parallel.c (or vacuumdb.c in ~12) so all the Windows machines
running TAP are going to complain on that.

This matches with the same problem reported here
/messages/by-id/1248903114.6348.195.camel@lapdragon

We have never done that before in vacuumdb.c, and because of that I
think that with a high number of jobs we run into buffer overflows.

The patch attached does the job on my end, any objections? There
is an argument to do that in win32_port.h, but for now I'd rather take
a safe path, or just do for the change in win32_port.h on HEAD.

(Noticed the missing newline as well in the error string in
back-branches... I'll address it at the same time.)
--
Michael

Attachments:

windows-fdset.patchtext/x-diff; charset=us-asciiDownload+4-0
#3Andrew Dunstan
andrew@dunslane.net
In reply to: Michael Paquier (#2)
Re: pgsql: Fix error handling of vacuumdb and reindexdb when running out of

On 8/26/19 1:40 AM, Michael Paquier wrote:

On Mon, Aug 26, 2019 at 02:17:06AM +0000, Michael Paquier wrote:

Fix error handling of vacuumdb and reindexdb when running out of fds

When trying to use a high number of jobs, vacuumdb (and more recently
reindexdb) has only checked for a maximum number of jobs used, causing
confusing failures when running out of file descriptors when the jobs
open connections to Postgres. This commit changes the error handling so
as we do not check anymore for a maximum number of allowed jobs when
parsing the option value with FD_SETSIZE, but check instead if a file
descriptor is within the supported range when opening the connections
for the jobs so as this is detected at the earliest time possible.

Also, improve the error message to give a hint about the number of jobs
recommended, using a wording given by the reviewers of the patch.

jacana does not like that, and has reported an error for 9.6:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-08-26 03%3A51%3A19
# Running: vacuumdb -zj2 postgres
vacuumdb: vacuuming database "postgres"
vacuumdb: too many jobs for this platform -- try 1

I am able to reproduce the problem locally, and the problem is that we
don't declare FD_SETSIZE on Windows before winsock2.h in
scripts_parallel.c (or vacuumdb.c in ~12) so all the Windows machines
running TAP are going to complain on that.

This matches with the same problem reported here
/messages/by-id/1248903114.6348.195.camel@lapdragon

We have never done that before in vacuumdb.c, and because of that I
think that with a high number of jobs we run into buffer overflows.

The patch attached does the job on my end, any objections? There
is an argument to do that in win32_port.h, but for now I'd rather take
a safe path, or just do for the change in win32_port.h on HEAD.

(Noticed the missing newline as well in the error string in
back-branches... I'll address it at the same time.)

Looks OK.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Michael Paquier
michael@paquier.xyz
In reply to: Andrew Dunstan (#3)
Re: pgsql: Fix error handling of vacuumdb and reindexdb when running out of

On Mon, Aug 26, 2019 at 11:36:50AM -0400, Andrew Dunstan wrote:

Looks OK.

Thanks Andrew for looking at the patch. I have now committed it on
all the impacted branches, after running vcregress bincheck for all of
them with MSVC.
--
Michael

#5Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#4)
Re: pgsql: Fix error handling of vacuumdb and reindexdb when running out of

On Tue, Aug 27, 2019 at 09:18:27AM +0900, Michael Paquier wrote:

Thanks Andrew for looking at the patch. I have now committed it on
all the impacted branches, after running vcregress bincheck for all of
them with MSVC.

bowerbird has turned back to green on 9.5 and 9.6 now, so it seems
that we are fine.
--
Michael