Cause of intermittent rangetypes regression test failures

Started by Tom Lane about 14 years ago, 6 messages
#1 Tom Lane
tgl@sss.pgh.pa.us

Well, I was overthinking the question of why rangetypes sometimes fails
with

select count(*) from test_range_gist where ir << int4range(100,500);
! ERROR: input range is empty

Turns out that happens whenever auto-analyze has managed to process
test_range_gist before we get to this part of the test. jaguar
is more likely to see this because CLOBBER_CACHE_ALWAYS slows down the
rangetypes code to a really staggering extent, but obviously it can
happen anywhere. If the table has been analyzed, then the
most_common_values array for column ir will consist of
{empty}
which is entirely correct since that value accounts for 16% of the
table. And then, when mcv_selectivity tries to estimate the selectivity
of the << condition, it applies range_before to the empty range along
with the int4range(100,500) value, and range_before spits up.

I think this demonstrates that the current definition of range_before is
broken. It is not reasonable for it to throw an error on a perfectly
valid input ... at least, not unless you'd like to mark it VOLATILE so
that the planner will not risk calling it.

What shall we have it do instead?

regards, tom lane
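
The failure described in the message above can be modelled outside the server. The following is a minimal Python sketch, not PostgreSQL code: `IntRange`, `strict_range_before`, and `mcv_selectivity_sketch` are hypothetical names invented for illustration, and the real mcv_selectivity does considerably more. It only shows why an operator function that raises on the empty range breaks planning once `empty` appears among the most common values.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class IntRange:
    """Toy half-open integer range [lower, upper); empty when lower >= upper."""
    lower: int
    upper: int

    @property
    def is_empty(self) -> bool:
        return self.lower >= self.upper


def strict_range_before(a: IntRange, b: IntRange) -> bool:
    """Model of the old behaviour: refuse to compare an empty range."""
    if a.is_empty or b.is_empty:
        raise ValueError("input range is empty")
    return a.upper <= b.lower


def mcv_selectivity_sketch(
    op: Callable[[IntRange, IntRange], bool],
    mcv_values: Sequence[IntRange],
    mcv_freqs: Sequence[float],
    constant: IntRange,
) -> float:
    """Sum the frequencies of the most common values that satisfy the operator.

    The operator is applied to every stored value, so any value that makes
    it raise aborts the whole estimate.
    """
    return sum(f for v, f in zip(mcv_values, mcv_freqs) if op(v, constant))


if __name__ == "__main__":
    empty = IntRange(0, 0)
    try:
        mcv_selectivity_sketch(
            strict_range_before, [empty], [0.16], IntRange(100, 500)
        )
    except ValueError as exc:
        print("planner-side failure:", exc)
```

In this model the query fails during estimation, before any row is examined, which matches the observation that the error only shows up after auto-analyze has populated the statistics.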

#2 Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#1)
Re: Cause of intermittent rangetypes regression test failures

On Sun, 2011-11-13 at 15:38 -0500, Tom Lane wrote:

> If the table has been analyzed, then the
> most_common_values array for column ir will consist of
> {empty}
> which is entirely correct since that value accounts for 16% of the
> table. And then, when mcv_selectivity tries to estimate the selectivity
> of the << condition, it applies range_before to the empty range along
> with the int4range(100,500) value, and range_before spits up.

> I think this demonstrates that the current definition of range_before is
> broken. It is not reasonable for it to throw an error on a perfectly
> valid input ... at least, not unless you'd like to mark it VOLATILE so
> that the planner will not risk calling it.

> What shall we have it do instead?

We could have it return NULL, I suppose. I was worried that that would
lead to confusion between NULL and the empty range, but it might be
better than marking it VOLATILE.

Thoughts, other ideas?

Regards,
Jeff Davis

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#2)
Re: Cause of intermittent rangetypes regression test failures

Jeff Davis <pgsql@j-davis.com> writes:

> On Sun, 2011-11-13 at 15:38 -0500, Tom Lane wrote:

>> I think this demonstrates that the current definition of range_before is
>> broken. It is not reasonable for it to throw an error on a perfectly
>> valid input ... at least, not unless you'd like to mark it VOLATILE so
>> that the planner will not risk calling it.

>> What shall we have it do instead?

> We could have it return NULL, I suppose. I was worried that that would
> lead to confusion between NULL and the empty range, but it might be
> better than marking it VOLATILE.

It needs to return FALSE, actually. After further reading I realized
that you have that behavior hard-wired into the range GiST routines,
and it's silly to make the stand-alone versions of the function act
differently.

This doesn't seem terribly unreasonable: we just have to document
that the empty range is neither before nor after any other range.

regards, tom lane
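
The rule proposed in the message above (the empty range is neither before nor after any other range) can be sketched as follows. This is a toy Python model under stated assumptions, not the server's C implementation: `IntRange` is a hypothetical half-open integer range, and `range_before` / `range_after` here merely borrow the names from the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntRange:
    """Toy half-open integer range [lower, upper); empty when lower >= upper."""
    lower: int
    upper: int

    @property
    def is_empty(self) -> bool:
        return self.lower >= self.upper


def range_before(a: IntRange, b: IntRange) -> bool:
    """Model of <<: strictly left of. An empty operand yields False, never an error."""
    if a.is_empty or b.is_empty:
        return False
    return a.upper <= b.lower


def range_after(a: IntRange, b: IntRange) -> bool:
    """Model of >>: strictly right of. An empty operand yields False, never an error."""
    if a.is_empty or b.is_empty:
        return False
    return a.lower >= b.upper


if __name__ == "__main__":
    empty = IntRange(0, 0)
    probe = IntRange(100, 500)
    print(range_before(empty, probe), range_after(empty, probe))
    print(range_before(IntRange(1, 100), probe))
```

Returning False rather than NULL keeps the stand-alone functions consistent with what the thread says is already hard-wired into the range GiST routines, and it avoids conflating NULL with the empty range.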

#4 Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#3)
Re: Cause of intermittent rangetypes regression test failures

On Mon, 2011-11-14 at 08:11 -0500, Tom Lane wrote:

> It needs to return FALSE, actually. After further reading I realized
> that you have that behavior hard-wired into the range GiST routines,
> and it's silly to make the stand-alone versions of the function act
> differently.

Good point. That makes sense to me.

Regards,
Jeff Davis

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#4)
Re: Cause of intermittent rangetypes regression test failures

Jeff Davis <pgsql@j-davis.com> writes:

> On Mon, 2011-11-14 at 08:11 -0500, Tom Lane wrote:

>> It needs to return FALSE, actually. After further reading I realized
>> that you have that behavior hard-wired into the range GiST routines,
>> and it's silly to make the stand-alone versions of the function act
>> differently.

> Good point. That makes sense to me.

While thinking about this ... would it be sensible for range_lower and
range_upper to return NULL instead of throwing an exception for empty or
infinite ranges? As with these comparison functions, throwing an error
seems like a fairly unpleasant definition to work with in practice.

regards, tom lane
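
The question raised in the message above, whether range_lower and range_upper should return NULL for empty or infinite ranges, can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: `Range` is a hypothetical toy type in which `None` stands for an unbounded (infinite) bound and for SQL NULL in the return value. It shows the proposed behaviour only, not how the server implements it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    """Toy range: a bound of None means unbounded; `empty` marks the empty range."""
    lower: Optional[int] = None
    upper: Optional[int] = None
    empty: bool = False


def range_lower(r: Range) -> Optional[int]:
    """Return the lower bound, or None (modelling NULL) if empty or unbounded below."""
    if r.empty:
        return None
    return r.lower


def range_upper(r: Range) -> Optional[int]:
    """Return the upper bound, or None (modelling NULL) if empty or unbounded above."""
    if r.empty:
        return None
    return r.upper


if __name__ == "__main__":
    print(range_lower(Range(empty=True)))       # empty range
    print(range_lower(Range(lower=None, upper=5)))  # unbounded below
    print(range_upper(Range(lower=1, upper=5)))
```

A caller that needs to tell the empty case apart from the unbounded case would still have to test for emptiness separately, which is the trade-off of collapsing both into NULL.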

#6 Erik Rijkers
er@xs4all.nl
In reply to: Tom Lane (#5)
Re: Cause of intermittent rangetypes regression test failures

On Mon, November 14, 2011 19:43, Tom Lane wrote:

> Jeff Davis <pgsql@j-davis.com> writes:

>> On Mon, 2011-11-14 at 08:11 -0500, Tom Lane wrote:

> While thinking about this ... would it be sensible for range_lower and
> range_upper to return NULL instead of throwing an exception for empty or
> infinite ranges? As with these comparison functions, throwing an error
> seems like a fairly unpleasant definition to work with in practice.

+1

much better, IMHO.

Erik Rijkers