optimizer cost calculation problem

Started by Tatsuo Ishii almost 23 years ago, 8 messages
#1 Tatsuo Ishii
t-ishii@sra.co.jp

Kenji Sugita has identified a problem with cost_sort() in costsize.c.

In the following code fragment, sortmembytes is defined as long. So

double nruns = nbytes / (sortmembytes * 2);

may cause an integer overflow if sortmembytes exceeds 2^30, which in
turn makes the optimizer produce a wrong query plan (this actually happened
in a large PostgreSQL installation which has tons of memory). Here is a
proposed fix against current:

*** optimizer/path/costsize.c	16 Feb 2003 02:30:38 -0000	1.107
--- optimizer/path/costsize.c	31 Mar 2003 22:25:06 -0000
***************
*** 490,496 ****
  	Cost		startup_cost = input_cost;
  	Cost		run_cost = 0;
  	double		nbytes = relation_byte_size(tuples, width);
! 	long		sortmembytes = SortMem * 1024L;
  
  	if (!enable_sort)
  		startup_cost += disable_cost;
--- 490,496 ----
  	Cost		startup_cost = input_cost;
  	Cost		run_cost = 0;
  	double		nbytes = relation_byte_size(tuples, width);
! 	double		sortmembytes = SortMem * 1024.0;
  
  	if (!enable_sort)
  		startup_cost += disable_cost;
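
For reference, a minimal standalone sketch of the failure mode, using made-up
values rather than the actual costsize.c code: with sort_mem set to 1 GB and a
32-bit long, sortmembytes * 2 wraps and nruns comes out nonsensical, which is
what the patch above avoids by declaring sortmembytes as double.

#include <stdio.h>

int
main(void)
{
	int		SortMem = 1048576;		/* sort_mem = 1 GB, expressed in kB */
	long	sortmembytes = SortMem * 1024L;	/* 2^30: still representable in a 32-bit long */
	double	nbytes = 8e9;			/* pretend an ~8 GB relation */

	/*
	 * On platforms with 32-bit longs, sortmembytes * 2 exceeds LONG_MAX;
	 * in practice it wraps to a large negative value (formally undefined
	 * behaviour), so nruns goes negative and the sort cost is garbage.
	 */
	double	nruns = nbytes / (sortmembytes * 2);

	printf("nruns = %g\n", nruns);
	return 0;
}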

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#1)
Re: optimizer cost calculation problem

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

Kenji Sugita has identified a problem with cost_sort() in costsize.c.
In the following code fragment, sortmembytes is defined as long. So
double nruns = nbytes / (sortmembytes * 2);
may cause an integer overflow if sortmembytes exceeds 2^30, which in
turn makes the optimizer produce a wrong query plan (this actually happened
in a large PostgreSQL installation which has tons of memory).

I find it really really hard to believe that it's wise to run with
sort_mem exceeding 2 gig ;-). Does that installation have so much
RAM that it can afford to run multiple many-Gb sorts concurrently?

This is far from being the only place that multiplies SortMem by 1024.
My inclination is that a safer fix is to alter guc.c's entry for
SortMem to establish a maximum value of INT_MAX/1024 for the variable.

Probably some of the other GUC variables like shared_buffers ought to
have overflow-related maxima established, too.

regards, tom lane
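
A rough illustration of the proposed cap, with a hypothetical helper
clamp_sort_mem and constant MAX_SORT_MEM_KB (the real change would go through
guc.c's variable table, which is not reproduced here): limiting sort_mem to
INT_MAX/1024 kilobytes guarantees that the common SortMem * 1024 calculation
still fits in a 32-bit integer.

#include <limits.h>
#include <assert.h>

/* Hypothetical upper bound for the sort_mem GUC, in kilobytes. */
#define MAX_SORT_MEM_KB (INT_MAX / 1024)	/* 2097151 kB, just under 2 GB */

static int
clamp_sort_mem(int requested_kb)
{
	/* Values at or below the cap cannot overflow when multiplied by 1024. */
	if (requested_kb > MAX_SORT_MEM_KB)
		requested_kb = MAX_SORT_MEM_KB;
	assert((long) requested_kb * 1024 <= INT_MAX);
	return requested_kb;
}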

#3 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#2)
Re: optimizer cost calculation problem

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

Kenji Sugita has identified a problem with cost_sort() in costsize.c.
In the following code fragment, sortmembytes is defined as long. So
double nruns = nbytes / (sortmembytes * 2);
may cause an integer overflow if sortmembytes exceeds 2^30, which in
turn makes the optimizer produce a wrong query plan (this actually happened
in a large PostgreSQL installation which has tons of memory).

I find it really really hard to believe that it's wise to run with
sort_mem exceeding 2 gig ;-). Does that installation have so much
RAM that it can afford to run multiple many-Gb sorts concurrently?

The process is assigned 1 gig of sort mem to speed up a batch job by
using a backend-process-only sort_mem setting, and they do not modify
postgresql.conf for ordinary users.

BTW it is not 2 gig, but 1 gig (remember that we do sortmembytes * 2).


This is far from being the only place that multiplies SortMem by 1024.
My inclination is that a safer fix is to alter guc.c's entry for
SortMem to establish a maximum value of INT_MAX/1024 for the variable.

Probably some of the other GUC variables like shared_buffers ought to
have overflow-related maxima established, too.

regards, tom lane

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#3)
Re: optimizer cost calculation problem

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

BTW it is not 2 gig, but 1 gig (remember that we do sortmembytes * 2).

Good point. Probably that particular calculation should be
"sortmembytes * 2.0" to force it to double before it can overflow.
But I still think we'd better limit SortMem so that the basic
SortMem*1024 calculation can't overflow (or even come close to overflow,
likely).

regards, tom lane
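
The distinction comes from C's usual arithmetic conversions: an integer
literal keeps the product in long arithmetic, while the floating literal 2.0
forces the long operand up to double before the multiply. A small illustration
with made-up values (not the committed change):

#include <stdio.h>

int
main(void)
{
	long	sortmembytes = 1073741824L;	/* 2^30 bytes: sort_mem = 1 GB */
	double	nbytes = 8e9;

	/* long * int stays in long arithmetic and can wrap on 32-bit longs ... */
	double	nruns_bad = nbytes / (sortmembytes * 2);

	/* ... but long * double converts sortmembytes to double first, so no overflow. */
	double	nruns_good = nbytes / (sortmembytes * 2.0);

	printf("bad = %g, good = %g\n", nruns_bad, nruns_good);
	return 0;
}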

#5 Ron Mayer
ron@intervideo.com
In reply to: Tom Lane (#2)
Re: optimizer cost calculation problem

Tom wrote:

I find it really really hard to believe that it's wise to run with
sort_mem exceeding 2 gig ;-). Does that installation have so much
RAM that it can afford to run multiple many-Gb sorts concurrently?

I don't do 2 gig... but I found 0.3 gig helped on a not-too-large system.

In a nightly load of a data warehousing application I have
a number of places where I do:

set sort_mem=300000;
create table new_whatevers as select distinct whatever from import_table;
set sort_mem=10000;

and

set sort_mem=100000;
select count(*),b from a group by b;
set sort_mem=10000;

when the optimizer likes hash aggregates.

It significantly (8 hours -> 6 hours) reduces the nightly processing
of my log-file analysis database. If my modest system benefited
from 1/3 gig, it wouldn't surprise me if a large system benefits
from 2 gig.

If more info's useful, I'd be happy to provide some.

Ron

#6 scott.marlowe
scott.marlowe@ihs.com
In reply to: Tom Lane (#4)
Re: optimizer cost calculation problem

On Mon, 31 Mar 2003, Tom Lane wrote:

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

BTW it is not 2 gig, but 1 gig (remember that we do sortmembytes * 2).

Good point. Probably that particular calculation should be
"sortmembytes * 2.0" to force it to double before it can overflow.
But I still think we'd better limit SortMem so that the basic
SortMem*1024 calculation can't overflow (or even come close to overflow,
likely).

This isn't really an issue for 64-bit hardware, is it?

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: scott.marlowe (#6)
Re: optimizer cost calculation problem

"scott.marlowe" <scott.marlowe@ihs.com> writes:

This isn't really an issue for 64-bit hardware, is it?

Is "int" 64 bits on such a machine? The ones I've dealt with chose to
set int = 32bits, long = 64bits. If they don't do that then they have
a problem with not having any native C 32bit type (and promoting short
int up to 32 bits just moves the problem...)

At some point we should probably try to regularize the backend code so
that all memory-size-related calcs are consistently done in size_t or
ssize_t arithmetic; but we're a long way from that at present. For now
I think it's prudent to keep sort_mem small enough to avoid overflow in
int32 arithmetic.

regards, tom lane
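
For context, on the common LP64 data model (typical 64-bit Unix compilers),
int stays 32 bits while long and size_t widen to 64, so int arithmetic can
overflow on 64-bit hardware exactly as it does on 32-bit machines. A quick
check, assuming an LP64 platform:

#include <stdio.h>
#include <stddef.h>

int
main(void)
{
	/* On LP64 this typically prints: int=4 long=8 size_t=8 */
	printf("int=%zu long=%zu size_t=%zu\n",
		   sizeof(int), sizeof(long), sizeof(size_t));
	return 0;
}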

#8 Andrew Sullivan
andrew@libertyrms.info
In reply to: Tom Lane (#7)
Re: optimizer cost calculation problem

On Tue, Apr 01, 2003 at 11:24:01AM -0500, Tom Lane wrote:

"scott.marlowe" <scott.marlowe@ihs.com> writes:

This isn't really an issue for 64-bit hardware, is it?

Is "int" 64 bits on such a machine? The ones I've dealt with chose to

It was for the regression tests we did recently on a 64-bit
gcc-3.2-compiled Solaris 8 binary. Piles of regression failures
because int was not 32 bits.

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110