Analyze all plans

Started by Donald Dongalmost 7 years ago4 messages
#1Donald Dong
xdong@csumb.edu

Hi,

I'm working on an extension which analyzes all possible plans
generated by the planner. I believe this extension would become
useful for benchmarking the planner (e.g. the performance of the
estimation and the cost model) and better understanding the cases
where the planners would make a suboptimal move.

Here are my procedures:
1. Enumerate all the plans
1.1 Create a hook for add_path so that it won't discard the
expensive paths from the planner's point of view.
1.2 Collect all the paths from final_rel->pathlist, and turn them
into PlannedStmt nodes by patching standard_planner.
1.3 Mark the cheapest path from the planner's point of view.

2. Explain all the plans
2.1 First explain the cheapest plan
2.1.1 If analyzing, collect the execution time and use it to set
a timer interrupt.
2.2 Explain the remaining plans
2.2.1 The other plans can be disastrous; the plans may never
finish in a reasonable amount of time. If analyzing, the timer
interrupt shall stop the executor.
2.2.2 Move on to the next plan

Are those procedures reasonable?

I'm able to implement all the steps except for 2.2.1.

- Attempt 1
Our most common way of handling the timeouts is to kill the
process. However, it would terminate the entire explain statement,
yet there're still plans to be explained.

- Attempt 2
Fork before explain so it would possible to kill the child process
without disrupting the explain statement. However, simply
allocating shared memory for the QueryDesc would not work (PANIC:
failed to re-find shared lock object). To me, this plan is more
doable, but it seems there exist other mechanisms I need to be
aware, to execute a plan in the child process and report the
result in the parent process?

What do you think? I will appreciate any feedback.

Thank you,
Donald Dong

#2Oleksandr Shulgin
oleksandr.shulgin@zalando.de
In reply to: Donald Dong (#1)
Re: Analyze all plans

On Wed, Jan 23, 2019 at 9:44 AM Donald Dong <xdong@csumb.edu> wrote:

1. Enumerate all the plans

Not sure this is going to work. Because of the total number of possible
plans is somewhere
around O(n!), if I'm not mistaken, in terms of number of joined relations
times the available
access methods times the possible join strategies.

So enumerating all possible plans stops being practical for even slightly
complicated queries.

Regards,
--
Alex

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oleksandr Shulgin (#2)
Re: Analyze all plans

Oleksandr Shulgin <oleksandr.shulgin@zalando.de> writes:

On Wed, Jan 23, 2019 at 9:44 AM Donald Dong <xdong@csumb.edu> wrote:

1. Enumerate all the plans

So enumerating all possible plans stops being practical for even slightly
complicated queries.

Yes. You can *not* disable the planner's aggressive pruning of losing
paths and subpaths without ending up with something that's completely
impractical for anything beyond toy queries. That's just from the
standpoint of planner runtime. Adding on the cost of actually creating
a finished plan and then running it for long enough to get a reliable
runtime for each case would ... well, let's just say you'd be safely dead
before you got any interesting answers.

regards, tom lane

#4Donald Dong
xdong@csumb.edu
In reply to: Tom Lane (#3)
Re: Analyze all plans

On Jan 23, 2019, at 6:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Oleksandr Shulgin <oleksandr.shulgin@zalando.de> writes:

On Wed, Jan 23, 2019 at 9:44 AM Donald Dong <xdong@csumb.edu> wrote:

1. Enumerate all the plans

So enumerating all possible plans stops being practical for even slightly
complicated queries.

Yes. You can *not* disable the planner's aggressive pruning of losing
paths and subpaths without ending up with something that's completely
impractical for anything beyond toy queries. That's just from the
standpoint of planner runtime. Adding on the cost of actually creating
a finished plan and then running it for long enough to get a reliable
runtime for each case would ... well, let's just say you'd be safely dead
before you got any interesting answers.

Thank you for the feedback. I didn't think about this carefully. I
thought the planner would use GEQO when the number of joins is
large, but indeed it's still n! for normal path selection.

Now I keep top-k cheapest sub plans, where k is configured
by users. If the k is set up properly, setting a timeout may not
be necessary. However, if I do want to set a timeout, I would run
into the issues I had in step 2.

I'm looking forward to hearing more thoughts :)

Thanks again!