AW: AW: Call for alpha testing: planner statistics revisions

Started by Zeugswetter Andreas SB · almost 25 years ago · 5 messages · hackers
#1 Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at

> > Because we do not want the dba to decide which statistics are optimal,
> > there should probably be an analyze helper application that is invoked
> > with "vacuum analyze database optimal" or some such, that also decides
> > whether a table was sufficiently altered to justify new stats gathering
> > or vacuum.

> And on what are you going to base "sufficiently altered"?

Probably current table size vs. table size in the statistics, and maybe the
timestamp when the statistics were last updated. The active row count would
also be useful, but we don't have cheap access to the current value.
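A hypothetical sketch of such a heuristic, combining both signals (the `TableStats` record, the `needs_analyze` function, and the thresholds are all my own illustration, not anything in the backend):

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableStats:
    """Snapshot recorded at the last ANALYZE (a hypothetical record)."""
    pages: int          # physical table size, in pages, when stats were gathered
    analyzed_at: float  # Unix timestamp of that ANALYZE


def needs_analyze(stats: TableStats, current_pages: int,
                  now: Optional[float] = None,
                  growth_factor: float = 1.5,
                  max_age_secs: float = 7 * 86400) -> bool:
    """Re-gather stats if the table grew noticeably or the stats are simply old."""
    now = time.time() if now is None else now
    grown = current_pages > stats.pages * growth_factor
    stale = now - stats.analyzed_at > max_age_secs
    return grown or stale
```

The growth factor and maximum age would need tuning against real workloads; the point is only that both the size comparison and the timestamp are cheap to check.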

The point is that if the combined effort of all "hackers" (with the help of
some large-scale users) cannot come to a more or less generally adequate
answer, the field dba most certainly won't either.

Andreas

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB (#1)
Re: AW: AW: Call for alpha testing: planner statistics revisions

Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:

> > And on what are you going to base "sufficiently altered"?

> Probably current table size vs. table size in the statistics, and maybe
> the timestamp when the statistics were last updated. The active row count
> would also be useful, but we don't have cheap access to the current value.

Once we get done with online VACUUM and internal free space re-use
(which is next on my to-do list), growth of the physical file will be
a poor guide to number of updated tuples, too. So the above proposal
reduces to "time since last update", for which we do not need any
backend support: people already run VACUUM ANALYZE from cron tasks.

> The point is that if the combined effort of all "hackers" (with the
> help of some large-scale users) cannot come to a more or less
> generally adequate answer, the field dba most certainly won't either.

True, but I regard your "if" as unproven. The reason for this call for
alpha testing is to find out whether we have a good enough solution or
not. I feel no compulsion to assume that it's not good enough on the
basis of no evidence.

regards, tom lane

#3 Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Zeugswetter Andreas SB (#1)

> > 3. if at all, an automatic analyze should do the samples on small tables,
> > and accurate stats on large tables

> Other way 'round, surely? It already does that: if your table has fewer
> rows than the sampling target, they all get used.
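The sampling-target behavior described here falls out naturally of reservoir sampling, a standard one-pass technique (this is my own sketch, not the backend's actual sampling code):

```python
import random


def sample_rows(rows, target, rng=None):
    """One-pass reservoir sample of at most `target` rows.

    If the input has <= target rows, every row ends up in the sample,
    which is exactly the small-table behavior described above."""
    rng = rng or random.Random()
    reservoir = []
    for i, row in enumerate(rows):
        if i < target:
            reservoir.append(row)            # fill the reservoir first
        else:
            j = rng.randint(0, i)            # replace with decreasing probability
            if j < target:
                reservoir[j] = row
    return reservoir
```

The backend's actual sampling is more refined than this; the point is only that a table with fewer rows than the target is returned in full, so "sample vs. accurate" is not really a choice for small tables.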

I mean that it is probably not useful to maintain distribution statistics at all
for a table that small (e.g. <= 3000 rows and less than 512 kB in size).
So let me reword: do the samples for medium-sized tables.

> > When on the other hand the optimizer makes a "mistake" on a huge table,
> > the difference is easily a matter of hours, thus you want accurate stats.

> Not if it takes hours to get the stats. I'm more interested in keeping
> ANALYZE cheap and encouraging DBAs to run it frequently, so that the
> stats stay up-to-date. It doesn't matter how perfect the stats were
> when they were made, if the table has changed since then.

That is true, but this is certainly a tradeoff situation. For a huge table
that is quite static, you would certainly want the most accurate statistics,
even if they take hours to compute once a month.

My comments are based on practice, not theory :-) Of course, current
state-of-the-art optimizer implementations may well lag behind state-of-the-art
theory from ACM SIGMOD :-)

Andreas

#4 Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Tom Lane (#2)
AW: AW: AW: Call for alpha testing: planner statistics revisions

> > The point is that if the combined effort of all "hackers" (with the
> > help of some large-scale users) cannot come to a more or less
> > generally adequate answer, the field dba most certainly won't either.

> True, but I regard your "if" as unproven. The reason for this call for
> alpha testing is to find out whether we have a good enough solution or
> not. I feel no compulsion to assume that it's not good enough on the
> basis of no evidence.

Yes, sure, sorry. I certainly don't mean to be offensive. I am just
very interested in this area, and the reasoning behind your decisions.
Time to start reading all your code comments, and doing test cases :-)

Andreas

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB (#3)
Re: AW: AW: Call for alpha testing: planner statistics revisions

Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:

> I mean that it is probably not useful to maintain distribution statistics
> at all for a table that small (e.g. <= 3000 rows and less than 512 kB
> in size).

Actually, stats are quite interesting for smaller tables too. Maybe not
so much for the table itself (i.e., deciding between seq and index scan is
not so critical), but to estimate sizes of joins against other tables.
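Why per-table stats matter even for small tables: the textbook estimate for an equality join derives its selectivity from the number of distinct values on each side, so missing stats on either table (however small) degrades the join estimate. A sketch of that formula (the function name is my own):

```python
def estimate_equijoin_rows(n1: int, ndistinct1: int,
                           n2: int, ndistinct2: int) -> float:
    """Textbook estimate for |R1 JOIN R2 ON R1.a = R2.b|.

    Assumes uniformly distributed values and that the smaller set of
    distinct values is contained in the larger one, giving
    selectivity = 1 / max(ndistinct)."""
    selectivity = 1.0 / max(ndistinct1, ndistinct2)
    return n1 * n2 * selectivity
```

For example, a 64-row lookup table with 64 distinct keys joined to a million-row fact table with 64 distinct keys estimates to about a million result rows; without stats on the lookup table, the planner would have to guess its distinct count instead.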

> > Not if it takes hours to get the stats. I'm more interested in keeping
> > ANALYZE cheap and encouraging DBAs to run it frequently, so that the
> > stats stay up-to-date. It doesn't matter how perfect the stats were
> > when they were made, if the table has changed since then.

> That is true, but this is certainly a tradeoff situation. For a huge table
> that is quite static, you would certainly want the most accurate statistics,
> even if they take hours to compute once a month.

Sure. My thought is that one would do this by increasing the SET
STATISTICS targets for such tables, thus yielding more detailed stats
that take longer to compute. What we need now is experimentation to
find out how well this works in practice. It might well be that more
knobs will turn out to be useful, but let's not add complexity until
we've proven it to be necessary ...
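One way to picture what a larger statistics target buys is a toy equi-depth histogram: more buckets means each bucket covers a narrower value range, so range-selectivity estimates get correspondingly finer (a toy model of my own; the real histograms are more elaborate):

```python
def equi_depth_bounds(values, target):
    """Boundary values for an equi-depth histogram with `target` buckets.

    Returns target + 1 boundaries chosen at evenly spaced ranks in the
    sorted input, so each bucket holds roughly the same number of rows."""
    vals = sorted(values)
    n = len(vals)
    return [vals[min(n - 1, (i * n) // target)] for i in range(target + 1)]
```

In PostgreSQL the knob is `ALTER TABLE tab ALTER COLUMN col SET STATISTICS <target>` followed by `ANALYZE tab`: raising the target tenfold shrinks each bucket's share of the value range tenfold, at the cost of a larger sample and a longer ANALYZE.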

regards, tom lane