GEQO randomness?

Started by Eric Schwarzenbachover 17 years ago4 messagesgeneral
Jump to latest
#1Eric Schwarzenbach
Eric.J.Schwarzenbach.C88@alumni.upenn.edu

This is in a sense a followup to my post with subject "Wildly erratic
query performance". The more I think about it the only thing that makes
sense of my results is if the query planner really WAS choosing my join
order truly randomly each time. I went digging into the manual and
Section 49.3.1. "Generating Possible Plans with GEQO" says

"In the initial stage, the GEQO code simply generates some possible join
sequences at random."

Now ordinarily I would interpret this use of the word random loosely, to
mean "arbitrarily" or using some non-meaningful selection criteria. But
given what I am seeing, this leads me to consider that "random" is meant
literally, and that it uses a random number generate to pick paths. Can
someone confirm that this is the case?

Is this really a good idea? Is non-deterministic behavior really
acceptable? I would think it would be much more sensible to have it
operate deterministically (such as with some predetermined random
sequence of numbers used repeatedly).

Eric

#2Eric Schwarzenbach
subscriber@blackbrook.org
In reply to: Eric Schwarzenbach (#1)

This is in a sense a followup to my post with subject "Wildly erratic
query performance".

The more I think about it the only thing that makes sense of my results
is if the query planner really WAS choosing my join order truly randomly
each time. I went digging into the manual and Section 49.3.1.
"Generating Possible Plans with GEQO" says

"In the initial stage, the GEQO code simply generates some possible join
sequences at random."

Now ordinarily I would interpret this use of the word "random" loosely, to
mean "arbitrarily" or "using some non-meaningful selection criteria". But
given what I am seeing, this leads me to consider that "random" is meant
literally, and that it actually uses a random number generator to choose paths. Can
someone confirm that this really is the case?

If so, I is this really a good idea? Is non-deterministic behavior really
acceptable? I would think it would be much more sensible to have it
operate deterministically (such as with some predetermined random
sequence of numbers used repeatedly).

Eric

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Eric Schwarzenbach (#2)
Re: GEQO randomness?

Eric Schwarzenbach <subscriber@blackbrook.org> writes:

Now ordinarily I would interpret this use of the word "random" loosely, to
mean "arbitrarily" or "using some non-meaningful selection criteria". But
given what I am seeing, this leads me to consider that "random" is meant
literally, and that it actually uses a random number generator to choose paths. Can
someone confirm that this really is the case?

What it's doing is searching a subset of the space of all possible join
orders. It still picks the best (according to cost estimate) plan
within that subset, but if you're unlucky there may be no very good plan
in that subset. And yes, there is a random number generator in there.

If so, I is this really a good idea?

The alternatives are not very appealing either ...

I would think it would be much more sensible to have it
operate deterministically (such as with some predetermined random
sequence of numbers used repeatedly).

... in particular, that one's hardly a panacea. For one thing, a
not-unlikely outcome would be that you *never* get a good plan and thus
don't even get a hint that you might be missing something. For another,
the data values used in the query and the current ANALYZE statistics
also affect the search, which means that in the real world where those
things change, you'd still be exposed to getting the occasional
unexpectedly bad plan.

There are a number of alternatives you can consider though:

1. Disable geqo or bump up the threshold enough that it's not used for
your query. Whether this is a feasible answer is impossible to say with
the limited detail you've provided. (Remember that potentially
exponential search time.)

2. Increase geqo_effort to make the randomized search run a bit longer
and examine more plans. This just decreases the probability of losing,
but maybe it will do so enough that you won't care anymore.

3. Figure out what's a good join order, rewrite your query to explicitly
join in that order, and *decrease* join_collapse_limit to force the
planner to follow that order instead of searching. Permanent solution
but the initial development effort is high, especially if you have a lot
of different queries that need this treatment.

4. Write a better randomized-search algorithm and submit a patch ;-)
We have good reason to think that the GEQO code is not a really
intelligent approach to doing randomized plan searching --- it's based
on an algorithm designed to solve traveling-salesman problems, which is
not such a good match to join-order problems --- but no one's yet gotten
motivated to replace it.

regards, tom lane

#4Bruce Momjian
bruce@momjian.us
In reply to: Eric Schwarzenbach (#1)
Re: GEQO randomness?

Eric Schwarzenbach wrote:

This is in a sense a followup to my post with subject "Wildly erratic
query performance". The more I think about it the only thing that makes
sense of my results is if the query planner really WAS choosing my join
order truly randomly each time. I went digging into the manual and
Section 49.3.1. "Generating Possible Plans with GEQO" says

"In the initial stage, the GEQO code simply generates some possible join
sequences at random."

Now ordinarily I would interpret this use of the word random loosely, to
mean "arbitrarily" or using some non-meaningful selection criteria. But
given what I am seeing, this leads me to consider that "random" is meant
literally, and that it uses a random number generate to pick paths. Can
someone confirm that this is the case?

Yes, "random" means random.

Is this really a good idea? Is non-deterministic behavior really
acceptable? I would think it would be much more sensible to have it
operate deterministically (such as with some predetermined random
sequence of numbers used repeatedly).

Uh, no one has ever asked for that.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +