Speedups

Started by Nonamealmost 28 years ago11 messages
#1Noname
jwieck@debis.com

Hi,

just to let anyone know:

I did some analyzing and searched for areas that could gain
more speedups for 6.4. First I had something like an
optimizer cache in mind (planner remembers parsetree and if a
subsequent parsetree only differs in const values, substitute
consts by params and reuse saved plans instead of creating a
new plan all the time).

But this is what I got for the complete regression test (only
queries that went through the planner counted):

Parsing and rule rewriting 14 %
Optimizer and planning 6 %
Query execution 80 %
------
Total time in backend 100 %

It clearly shows that there's no need to speedup the
optimizer. The parser and the executor are the ones that
consume the time. Making the planner/optimizer smarter
resulting better plans faster to execute is the way.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #

#2Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Noname (#1)
Re: [HACKERS] Speedups

Parsing and rule rewriting 14 %
Optimizer and planning 6 %
Query execution 80 %
------
Total time in backend 100 %

Nice analysis. Certainly looks like Query Execution is the way to go.
profiling has shown quite a lot to help us. Usually it is not the
executor itself, but the subsystems it calls.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#3Noname
ocie@paracel.com
In reply to: Noname (#1)
Re: [HACKERS] Speedups

Jan Wieck wrote:

Hi,

just to let anyone know:

I did some analyzing and searched for areas that could gain
more speedups for 6.4. First I had something like an
optimizer cache in mind (planner remembers parsetree and if a
subsequent parsetree only differs in const values, substitute
consts by params and reuse saved plans instead of creating a
new plan all the time).

But this is what I got for the complete regression test (only
queries that went through the planner counted):

Parsing and rule rewriting 14 %
Optimizer and planning 6 %
Query execution 80 %
------
Total time in backend 100 %

It clearly shows that there's no need to speedup the
optimizer. The parser and the executor are the ones that
consume the time. Making the planner/optimizer smarter
resulting better plans faster to execute is the way.

This may sound like an obvious question, but if a user defines a
query, do we save the query plan? This would reduce the
communications between the client and server (a small gain), and allow
the server to start executing the query as soon as it recognized the
name of the stored query and parsed the arguments.

Ocie Mitchell

#4Peter T Mount
psqlhack@maidast.demon.co.uk
In reply to: Noname (#3)
Re: [HACKERS] Speedups

On Wed, 4 Mar 1998 ocie@paracel.com wrote:

This may sound like an obvious question, but if a user defines a
query, do we save the query plan? This would reduce the
communications between the client and server (a small gain), and allow
the server to start executing the query as soon as it recognized the
name of the stored query and parsed the arguments.

Not sure ofhand, but it would be useful for JDBC's PreparedStatement and
CallableStatement classes

--
Peter T Mount petermount@earthling.net or pmount@maidast.demon.co.uk
Main Homepage: http://www.demon.co.uk/finder
Work Homepage: http://www.maidstone.gov.uk Work EMail: peter@maidstone.gov.uk

#5Vadim B. Mikheev
vadim@sable.krasnoyarsk.su
In reply to: Peter T Mount (#4)
Re: [HACKERS] Speedups

Peter T Mount wrote:

On Wed, 4 Mar 1998 ocie@paracel.com wrote:

This may sound like an obvious question, but if a user defines a
query, do we save the query plan? This would reduce the
communications between the client and server (a small gain), and allow
the server to start executing the query as soon as it recognized the
name of the stored query and parsed the arguments.

Not sure ofhand, but it would be useful for JDBC's PreparedStatement and
CallableStatement classes

We can implement it very easy, and fast. Execution plan may be reused
many times. Is this feature in standard ?
What is proposed syntax if not ?

Vadim

#6Noname
ocie@paracel.com
In reply to: Vadim B. Mikheev (#5)
Re: [HACKERS] Speedups

Vadim B. Mikheev wrote:

Peter T Mount wrote:

On Wed, 4 Mar 1998 ocie@paracel.com wrote:

This may sound like an obvious question, but if a user defines a
query, do we save the query plan? This would reduce the
communications between the client and server (a small gain), and allow
the server to start executing the query as soon as it recognized the
name of the stored query and parsed the arguments.

Not sure ofhand, but it would be useful for JDBC's PreparedStatement and
CallableStatement classes

We can implement it very easy, and fast. Execution plan may be reused
many times. Is this feature in standard ?
What is proposed syntax if not ?

I don't think it is so much a question of syntax as it is a question
of what we do in the backend. Suppose I create a stored query in SQL.
We already store the SQL source for this in the database, right? So
when it comes time to execute the query, we take this SQL and execute
it as if the user had entered it directly. What I am proposing would
be to basically store the compiled query plan as well.

I do see a couple sticky points:

We would need some information about which variables are to be
substituted into this query plan, but this should be fairly
straightforward.

Some querys may not respond well to this, for example, if a table had
an index on an integer field f1, this would probably be the best way
to satisfy a select where f1<10. But if this were in a query as f1<x,
then a sufficiently high value of x might make this not such a good
way to run the query. I haven't looked into this, but I would assume
that the optimizer relies on the specific values in such cases.

We need to be able to handle changes to the structures and contents of
the tables. If the query plan is built and we add 10000 rows to a
table it references, the query should probably be recompiled. We
could probably do this at vacuum time. There is also a small chance
that a table or index that the query plan was using is dropped. We
could automatically rebuild the query if the table was created after
the query was compiled.

Boy, to look at this, you'd think I had already built one of these :)
I haven't but I'm willing to give it a shot.

Ocie

#7Vadim B. Mikheev
vadim@sable.krasnoyarsk.su
In reply to: Noname (#6)
Re: [HACKERS] Speedups

ocie@paracel.com wrote:

Not sure ofhand, but it would be useful for JDBC's PreparedStatement and
CallableStatement classes

We can implement it very easy, and fast. Execution plan may be reused
many times. Is this feature in standard ?
What is proposed syntax if not ?

I do see a couple sticky points:

We would need some information about which variables are to be
substituted into this query plan, but this should be fairly
straightforward.

Parser, Planner/Optimizer and Executor are able to handle parameters!
No problems with this.

Some querys may not respond well to this, for example, if a table had
an index on an integer field f1, this would probably be the best way
to satisfy a select where f1<10. But if this were in a query as f1<x,
then a sufficiently high value of x might make this not such a good
way to run the query. I haven't looked into this, but I would assume
that the optimizer relies on the specific values in such cases.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Unfortunately, no!
We have to add this feature of 'course.
I don't know how we could deal with pre-compiled plans after this :(
May be, we could prepare/store not single plan, but some number of
possible plans.

We need to be able to handle changes to the structures and contents of
the tables. If the query plan is built and we add 10000 rows to a
table it references, the query should probably be recompiled. We
could probably do this at vacuum time. There is also a small chance
that a table or index that the query plan was using is dropped. We
could automatically rebuild the query if the table was created after
the query was compiled.

We could mark stored plans as durty in such cases to force re-compiling
when an application tries to use this plan.

Vadim

#8Noname
jwieck@debis.com
In reply to: Vadim B. Mikheev (#7)
Re: [HACKERS] Speedups

Vadim wrote:

ocie@paracel.com wrote:

Not sure ofhand, but it would be useful for JDBC's PreparedStatement and
CallableStatement classes

We can implement it very easy, and fast. Execution plan may be reused
many times. Is this feature in standard ?
What is proposed syntax if not ?

I do see a couple sticky points:

We would need some information about which variables are to be
substituted into this query plan, but this should be fairly
straightforward.

Parser, Planner/Optimizer and Executor are able to handle parameters!
No problems with this.

Nice discussion - especially when looking at what I initially
posted.

I assume you think about using SPI's saved plan feature for
it. Right?

Some querys may not respond well to this, for example, if a table had
an index on an integer field f1, this would probably be the best way
to satisfy a select where f1<10. But if this were in a query as f1<x,
then a sufficiently high value of x might make this not such a good
way to run the query. I haven't looked into this, but I would assume
that the optimizer relies on the specific values in such cases.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Unfortunately, no!
We have to add this feature of 'course.
I don't know how we could deal with pre-compiled plans after this :(
May be, we could prepare/store not single plan, but some number of
possible plans.

That's something I thought about when I used the SPI
functions when I built PL/Tcl. Since the saved plan will be
executed via SPI_execp(), we could change some details there.
Currently SPI_prepare() and SPI_saveplan() return the plan
itself. But they could also return a little control struct
that contains the plan and other information. Since I don't
expect someone uses these plans for something else than
calling SPI_execp(), it wouldn't break anything.

SPI_execp() can do some timing calculations. For each
execution of a plan it collects the runtime in microseconds
(gettimeofday()). After the 5th or 10th call, it builds an
average and remembers that permanently. For all subsequent
calls it calculates the average time of the last 10 calls and
if that gets much higher than the initial average it wouldn't
hurt to silently prepare and save the plan again. Using
averages lowers the problem that differences in the
parameters can cause the runtime differences.

Another possible reason for the runtime differences is the
overall workload of the server. This could be very high
during the initial average calculation. So I think it could
be smart to rebuild the plan after e.g. 1000 calls ignoring
any runtimes.

We need to be able to handle changes to the structures and contents of
the tables. If the query plan is built and we add 10000 rows to a
table it references, the query should probably be recompiled. We
could probably do this at vacuum time. There is also a small chance
that a table or index that the query plan was using is dropped. We
could automatically rebuild the query if the table was created after
the query was compiled.

We could mark stored plans as durty in such cases to force re-compiling
when an application tries to use this plan.

Yep. SPI must remember all prepared and saved plans (and
forget about only prepared ones at transaction end). Things
like dropping an index or modifying a table structure cause
invalidations in the relcache, syscache and catcache (even if
another backend did it in some cases). I think it must be
possible to tell SPI from there that something happened and
which relations are affected. If a plans rangetable contains
the affected relation, the plan is marked durty.

Things like functions, operators and aggregates are also
objects that might change (drop/recreate function -> funcnode
in plan get's unusable).

I think the best would be that SPI_prepare() set's up a
collection of Oid's that cause plan invalidation in the
control structure. These are the Oid's of ALL objects
(relations, indices, functions etc.) used in the plan. Then
a call to SPI_invalidate(Oid) from the cache invalidation
handlers doesn't have to walk through the plan itself.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #

#9Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Noname (#6)
Re: [HACKERS] Speedups

I don't think it is so much a question of syntax as it is a question
of what we do in the backend. Suppose I create a stored query in SQL.
We already store the SQL source for this in the database, right? So
when it comes time to execute the query, we take this SQL and execute
it as if the user had entered it directly. What I am proposing would
be to basically store the compiled query plan as well.

I do see a couple sticky points:

We would need some information about which variables are to be
substituted into this query plan, but this should be fairly
straightforward.

Some querys may not respond well to this, for example, if a table had
an index on an integer field f1, this would probably be the best way
to satisfy a select where f1<10. But if this were in a query as f1<x,
then a sufficiently high value of x might make this not such a good
way to run the query. I haven't looked into this, but I would assume
that the optimizer relies on the specific values in such cases.

I have thought about this. If we take a query string, remove all quoted
constants and numeric constants, we can automatically split apart the
query from the parameters. We can then look up the non-parameter query
in our cache, and if it matches, replace the new contants with the old
and run the query.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#10Peter T Mount
psqlhack@maidast.demon.co.uk
In reply to: Vadim B. Mikheev (#7)
Re: [HACKERS] Speedups

On Thu, 5 Mar 1998, Vadim B. Mikheev wrote:

ocie@paracel.com wrote:

Not sure ofhand, but it would be useful for JDBC's PreparedStatement and
CallableStatement classes

We can implement it very easy, and fast. Execution plan may be reused
many times. Is this feature in standard ?
What is proposed syntax if not ?

I do see a couple sticky points:

We would need some information about which variables are to be
substituted into this query plan, but this should be fairly
straightforward.

Parser, Planner/Optimizer and Executor are able to handle parameters!
No problems with this.

Some querys may not respond well to this, for example, if a table had
an index on an integer field f1, this would probably be the best way
to satisfy a select where f1<10. But if this were in a query as f1<x,
then a sufficiently high value of x might make this not such a good
way to run the query. I haven't looked into this, but I would assume
that the optimizer relies on the specific values in such cases.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Unfortunately, no!
We have to add this feature of 'course.
I don't know how we could deal with pre-compiled plans after this :(
May be, we could prepare/store not single plan, but some number of
possible plans.

For inserts & updates I would have thought that the plan would be almost
identical (correct me here if I'm wrong here), and this is an area where
Java really crawls at the moment. Try some 40k inserts in JDBC... and go
out for the night. If we can improve things, it would be a big bonus.

--
Peter T Mount petermount@earthling.net or pmount@maidast.demon.co.uk
Main Homepage: http://www.demon.co.uk/finder
Work Homepage: http://www.maidstone.gov.uk Work EMail: peter@maidstone.gov.uk

#11David Hartwig
daveh@insightdist.com
In reply to: Peter T Mount (#10)
Re: [HACKERS] Speedups

Vadim B. Mikheev wrote:

Peter T Mount wrote:

On Wed, 4 Mar 1998 ocie@paracel.com wrote:

This may sound like an obvious question, but if a user defines a
query, do we save the query plan? This would reduce the
communications between the client and server (a small gain), and allow
the server to start executing the query as soon as it recognized the
name of the stored query and parsed the arguments.

Not sure ofhand, but it would be useful for JDBC's PreparedStatement and
CallableStatement classes

We can implement it very easy, and fast. Execution plan may be reused
many times. Is this feature in standard ?
What is proposed syntax if not ?

I don't think it is so much a question of syntax as it is a question
of what we do in the backend. Suppose I create a stored query in SQL.
We already store the SQL source for this in the database, right? So
when it comes time to execute the query, we take this SQL and execute
it as if the user had entered it directly. What I am proposing would
be to basically store the compiled query plan as well.

I do see a couple sticky points:

We would need some information about which variables are to be
substituted into this query plan, but this should be fairly
straightforward.

Some querys may not respond well to this, for example, if a table had
an index on an integer field f1, this would probably be the best way
to satisfy a select where f1<10. But if this were in a query as f1<x,
then a sufficiently high value of x might make this not such a good
way to run the query. I haven't looked into this, but I would assume
that the optimizer relies on the specific values in such cases.

We need to be able to handle changes to the structures and contents of
the tables. If the query plan is built and we add 10000 rows to a
table it references, the query should probably be recompiled. We
could probably do this at vacuum time. There is also a small chance
that a table or index that the query plan was using is dropped. We
could automatically rebuild the query if the table was created after
the query was compiled.

Boy, to look at this, you'd think I had already built one of these :)
I haven't but I'm willing to give it a shot.

Ocie

Not to pile on, but, I have a great interest in this subject. We do a
lot of work using off-the-shelf ODBC tools. And, we have observed that
these tools use PREPARE for two purposes.

One is to speed up iterative queries which join data from different
databases. You seem to be addressing this issue.

The other reason PREPARE is used is to retrieve a description of a
query's projection (target/result) with out actually running the
query. Currently, ODBC drivers must simulate the prepare statement by
submitting the full query and discard the data just to get the result
description. Obviously this slows response time greatly when the query
is a large data set. So if you haven't considered returning the the
results description, please do.

Thank Very Much