Cause of intermittent rangetypes regression test failures

Started by Tom Lane over 14 years ago; 6 messages on pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

Well, I was overthinking the question of why rangetypes sometimes fails
with

select count(*) from test_range_gist where ir << int4range(100,500);
! ERROR: input range is empty

Turns out that happens whenever auto-analyze has managed to process
test_range_gist before we get to this part of the test. jaguar
is more likely to see this because CLOBBER_CACHE_ALWAYS slows down the
rangetypes code to a really staggering extent, but obviously it can
happen anywhere. If the table has been analyzed, then the
most_common_values array for column ir will consist of
{empty}
which is entirely correct since that value accounts for 16% of the
table. And then, when mcv_selectivity tries to estimate the selectivity
of the << condition, it applies range_before to the empty range along
with the int4range(100,500) value, and range_before spits up.
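
The failure path described above can be modelled in a few lines. This is a simplified Python sketch, not PostgreSQL's C code. `IntRange`, `EmptyRangeError` and `range_before_strict` are hypothetical stand-ins, ranges are assumed to be canonical half-open integer ranges like int4range, and the Python `mcv_selectivity` only mimics the "sum the frequencies of the MCVs that satisfy the operator" step (the planner's full estimate involves more than this). It shows that an operator which raises on an empty input makes the estimate itself fail as soon as `empty` appears in the MCV list.

```python
from dataclasses import dataclass


class EmptyRangeError(Exception):
    """Stand-in for PostgreSQL's 'input range is empty' error."""


@dataclass(frozen=True)
class IntRange:
    # Canonical half-open integer range [lower, upper); empty=True models 'empty'.
    lower: int = 0
    upper: int = 0
    empty: bool = False


EMPTY = IntRange(empty=True)


def range_before_strict(a: IntRange, b: IntRange) -> bool:
    """a << b as described in the thread: refuses empty inputs."""
    if a.empty or b.empty:
        raise EmptyRangeError("input range is empty")
    # a lies strictly left of b when a ends at or before b begins.
    return a.upper <= b.lower


def mcv_selectivity(op, mcv_values, mcv_freqs, const) -> float:
    """Sum the frequencies of the MCVs for which op(mcv, const) is true."""
    total = 0.0
    for value, freq in zip(mcv_values, mcv_freqs):
        if op(value, const):  # any exception here aborts the whole estimate
            total += freq
    return total


# After ANALYZE, the only MCV of column ir is 'empty', at 16% of the rows.
try:
    mcv_selectivity(range_before_strict, [EMPTY], [0.16], IntRange(100, 500))
except EmptyRangeError as e:
    print("ERROR:", e)  # planning fails before the query even runs
```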

I think this demonstrates that the current definition of range_before is
broken. It is not reasonable for it to throw an error on a perfectly
valid input ... at least, not unless you'd like to mark it VOLATILE so
that the planner will not risk calling it.

What shall we have it do instead?

regards, tom lane

Jeff Davis

pgsql@j-davis.com

over 14 years ago

In reply to: Tom Lane (#1)

Re: Cause of intermittent rangetypes regression test failures

On Sun, 2011-11-13 at 15:38 -0500, Tom Lane wrote:

> If the table has been analyzed, then the
> most_common_values array for column ir will consist of
> {empty}
> which is entirely correct since that value accounts for 16% of the
> table. And then, when mcv_selectivity tries to estimate the selectivity
> of the << condition, it applies range_before to the empty range along
> with the int4range(100,500) value, and range_before spits up.
>
> I think this demonstrates that the current definition of range_before is
> broken. It is not reasonable for it to throw an error on a perfectly
> valid input ... at least, not unless you'd like to mark it VOLATILE so
> that the planner will not risk calling it.
>
> What shall we have it do instead?

We could have it return NULL, I suppose. I was worried that that would
lead to confusion between NULL and the empty range, but it might be
better than marking it VOLATILE.

Thoughts, other ideas?

Regards,
Jeff Davis

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Jeff Davis (#2)

Re: Cause of intermittent rangetypes regression test failures

Jeff Davis <pgsql@j-davis.com> writes:

> On Sun, 2011-11-13 at 15:38 -0500, Tom Lane wrote:
>
>> I think this demonstrates that the current definition of range_before is
>> broken. It is not reasonable for it to throw an error on a perfectly
>> valid input ... at least, not unless you'd like to mark it VOLATILE so
>> that the planner will not risk calling it.
>>
>> What shall we have it do instead?
>
> We could have it return NULL, I suppose. I was worried that that would
> lead to confusion between NULL and the empty range, but it might be
> better than marking it VOLATILE.

It needs to return FALSE, actually. After further reading I realized
that you have that behavior hard-wired into the range GiST routines,
and it's silly to make the stand-alone versions of the function act
differently.

This doesn't seem terribly unreasonable: we just have to document
that the empty range is neither before nor after any other range.
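
The proposed definition can be sketched by making the comparisons total: if either input is empty, `<<` and `>>` simply return false. This is again a simplified Python model with hypothetical names (`IntRange`, `range_before`, `range_after`), assuming canonical half-open integer ranges; it is not the actual C patch. With this definition the MCV loop just contributes nothing for `empty`, and a plain filter agrees with the "never before, never after" treatment that the GiST routines are said to apply already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntRange:
    # Canonical half-open integer range [lower, upper); empty=True models 'empty'.
    lower: int = 0
    upper: int = 0
    empty: bool = False


EMPTY = IntRange(empty=True)


def range_before(a: IntRange, b: IntRange) -> bool:
    """a << b: false whenever either side is empty."""
    if a.empty or b.empty:
        return False
    return a.upper <= b.lower


def range_after(a: IntRange, b: IntRange) -> bool:
    """a >> b: false whenever either side is empty."""
    if a.empty or b.empty:
        return False
    return a.lower >= b.upper


def mcv_selectivity(op, mcv_values, mcv_freqs, const) -> float:
    """Sum the frequencies of the MCVs for which op(mcv, const) is true."""
    return sum((f for v, f in zip(mcv_values, mcv_freqs) if op(v, const)), 0.0)


query = IntRange(100, 500)
rows = [EMPTY, IntRange(1, 50), IntRange(200, 300), IntRange(600, 700)]

print(mcv_selectivity(range_before, [EMPTY], [0.16], query))  # 0.0
print([r for r in rows if range_before(r, query)])  # only IntRange(1, 50)
print([r for r in rows if range_after(r, query)])   # only IntRange(600, 700)
```

Note that `empty` lands in neither result list, which is exactly the documented rule: the empty range is neither before nor after any other range.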

regards, tom lane

Jeff Davis

pgsql@j-davis.com

over 14 years ago

In reply to: Tom Lane (#3)

Re: Cause of intermittent rangetypes regression test failures

On Mon, 2011-11-14 at 08:11 -0500, Tom Lane wrote:

> It needs to return FALSE, actually. After further reading I realized
> that you have that behavior hard-wired into the range GiST routines,
> and it's silly to make the stand-alone versions of the function act
> differently.

Good point. That makes sense to me.

Regards,
Jeff Davis

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Jeff Davis (#4)

Re: Cause of intermittent rangetypes regression test failures

Jeff Davis <pgsql@j-davis.com> writes:

> On Mon, 2011-11-14 at 08:11 -0500, Tom Lane wrote:
>
>> It needs to return FALSE, actually. After further reading I realized
>> that you have that behavior hard-wired into the range GiST routines,
>> and it's silly to make the stand-alone versions of the function act
>> differently.
>
> Good point. That makes sense to me.

While thinking about this ... would it be sensible for range_lower and
range_upper to return NULL instead of throwing an exception for empty or
infinite ranges? As with these comparison functions, throwing an error
seems like a fairly unpleasant definition to work with in practice.
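
The suggested range_lower/range_upper semantics can be sketched the same way. In this Python model the names are hypothetical, `None` plays the role of SQL NULL, and an unbounded side is represented by a `None` bound. Callers then filter instead of trapping errors, much as a SQL `WHERE lower(r) >= 100` silently drops rows whose lower bound is NULL. (PostgreSQL's released `lower()` and `upper()` functions are documented as returning NULL for an empty range or a missing bound.)

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IntRange:
    # A bound of None means that side is unbounded (infinite).
    lower: Optional[int] = None
    upper: Optional[int] = None
    empty: bool = False


EMPTY = IntRange(empty=True)


def range_lower(r: IntRange) -> Optional[int]:
    """Proposed: None (NULL) for an empty range or an infinite lower bound."""
    if r.empty:
        return None
    return r.lower  # already None when unbounded below


def range_upper(r: IntRange) -> Optional[int]:
    """Proposed: None (NULL) for an empty range or an infinite upper bound."""
    if r.empty:
        return None
    return r.upper  # already None when unbounded above


def lower_at_least(rows: List[IntRange], n: int) -> List[IntRange]:
    """Mimic WHERE lower(r) >= n: a NULL comparison never passes the filter."""
    result = []
    for r in rows:
        lo = range_lower(r)
        if lo is not None and lo >= n:
            result.append(r)
    return result


rows = [EMPTY, IntRange(None, 10), IntRange(100, 200), IntRange(5, None)]
print(lower_at_least(rows, 50))  # only IntRange(100, 200) survives; no exception
```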

regards, tom lane

Erik Rijkers

er@xs4all.nl

over 14 years ago

In reply to: Tom Lane (#5)

Re: Cause of intermittent rangetypes regression test failures

On Mon, November 14, 2011 19:43, Tom Lane wrote:

> Jeff Davis <pgsql@j-davis.com> writes:
>
>> On Mon, 2011-11-14 at 08:11 -0500, Tom Lane wrote:
>
> While thinking about this ... would it be sensible for range_lower and
> range_upper to return NULL instead of throwing an exception for empty or
> infinite ranges? As with these comparison functions, throwing an error
> seems like a fairly unpleasant definition to work with in practice.

much better, IMHO.

Erik Rijkers