vacuum analyze feedback
I know this topic has been rehashed a million times, but I just wanted to
add one datapoint. I have a database (150 tables, less than 20K tuples
in any one table) which I 'vacuum analyze' *HOURLY*, blocking all access,
and I still see frequent situations where my query times bloat by roughly
300% (i.e., take about 4 times as long) in the intervening time between
vacuums. All this is to say that I think a more strategic implementation
of the functionality of vacuum analyze (specifically, non-batched,
automated, on-the-fly vacuuming/analyzing) would be a major "value add".
I haven't educated myself as to the history of it, but I do wonder why
the performance focus is not on this. I'd imagine it would be a
performance hit (which argues for making it optional), but I'd gladly
take a 10% performance hit over the current highly undesirable
degradation. You could do a whole lotta optimization on the
planner/parser/executor and not get close to the end-user-perceptible
gains from fixing this problem...
Regards,
Ed Loehr
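[For context, the hourly job described above boils down to the statement
below; a minimal sketch, not Ed's actual setup ('orders' is a
hypothetical table name):

    -- VACUUM reclaims space held by expired tuples; the ANALYZE option
    -- refreshes the planner statistics in the same pass. Run database-wide,
    -- it locks each of the 150 tables in turn, blocking other access:
    VACUUM ANALYZE;

    -- Per-table form, which shortens the window any one table is locked:
    VACUUM ANALYZE orders;

Scheduled hourly (e.g., from cron), this is exactly the blocking batch
job the post argues should instead be automatic and incremental.]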
Vadim is planning an overwrite storage manager for 7.2, which will allow
expired tuples to be reused without vacuum.
Or is the ANALYZE the issue for you? Do you need hourly statistics?
--
Bruce Momjian | http://www.op.net/~candle
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
Vadim is planning an overwrite storage manager for 7.2, which will allow
expired tuples to be reused without vacuum.
Sorry, I missed that in prior threads...that would be good.
Or is the ANALYZE the issue for you?
Both, actually. More specifically, blocking end-user access during
vacuum, and degraded end-user performance as pg_statistic diverges from
reality. Both are losses of service from the system.
You need hourly statistics?
My unstated point was that hourly stats have turned out *not* to be
nearly good enough in my case. Better would be if the system were smart
enough to recognize when the outcome of a query/plan was sufficiently
divergent from statistics to warrant a system-initiated analyze (or
whatever form it would take). I'll probably end up doing this detection
from the app/client side, but that's not the right place for it, IMO.
Regards,
Ed Loehr
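[A rough sketch of the client-side divergence check Ed mentions, assuming
a modern PostgreSQL where ANALYZE can run standalone; 'orders' and the
2x threshold are illustrative:

    -- pg_class.reltuples holds the row count the planner believes; it is
    -- only refreshed by VACUUM/ANALYZE, so it goes stale exactly when the
    -- statistics do. Compare it against the real count:
    SELECT c.reltuples AS estimated_rows,
           (SELECT count(*) FROM orders) AS actual_rows
    FROM pg_class c
    WHERE c.relname = 'orders';

    -- If the two diverge past the threshold (say 2x either way), the
    -- client refreshes statistics for just that table:
    ANALYZE orders;

Later PostgreSQL releases moved this detection into the server as
autovacuum, which triggers per-table vacuum/analyze from tracked
insert/update/delete counts - essentially what is being asked for here.]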
Yes, I think eventually, we need to feed information about actual query
results back into the optimizer for use in later queries.
--
Bruce Momjian | http://www.op.net/~candle
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
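[For a concrete picture of the estimated-versus-actual gap such a
feedback loop would consume: EXPLAIN ANALYZE, added in PostgreSQL 7.2
after this thread, reports both numbers per plan node ('orders' and the
query are hypothetical):

    -- Runs the query and prints, for each plan node, the optimizer's row
    -- estimate next to the row count actually produced:
    EXPLAIN ANALYZE
    SELECT * FROM orders WHERE customer_id = 42;

    -- A node such as
    --   Seq Scan on orders (cost=... rows=5 ...) (actual ... rows=18000 ...)
    -- shows the estimate off by orders of magnitude: the signal that the
    -- statistics are stale and a re-analyze (or plan change) is warranted.]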
At 15:54 25/05/00 -0400, Bruce Momjian wrote:
Yes, I think eventually, we need to feed information about actual query
results back into the optimizer for use in later queries.
You could be a little more ambitious and do what Dec/Rdb does - use the
results of current query execution to (possibly) cause a change in the
current strategy.
Yes.
--
Bruce Momjian | http://www.op.net/~candle
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
At 15:54 25/05/00 -0400, Bruce Momjian wrote:
Yes, I think eventually, we need to feed information about actual query
results back into the optimizer for use in later queries.
You could be a little more ambitious and do what Dec/Rdb does - use the
results of current query execution to (possibly) cause a change in the
current strategy.
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: +61-03-5367 7422 | _________ \
Fax: +61-03-5367 7430 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/