Query Hints? No thanks. Data hints?

Started by Dimitri Fontaineover 17 years ago3 messages

Dimitri Fontaine

dim@hi-media.com

over 17 years ago

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi -hackers,

In another thread about "GUC parameter cursors_tuple_fraction", the
debate seems to drift onto query hints. About which the consensus here
is pretty clear and strong, no query hints in PostgreSQL, thanks, or
we're never gonna have a perfect generic planner.

IIRC, I've read here in the past some attempts to begin a proposal on
the topic of data hints, allowing the user to describe his data in a
way ANALYZE can't figure out by itself, as e.g. "this column value is
tied to this other column value in this way".
This could be a materialized column, mutual-exclusive NOT NULLs, or
any multi-columns relationships, as well as "this table is a fact
table", etc.

What do you -hackers think about such a plan:
- assess cases where the planner is failing short of good statistics

- assess data properties SQL does not give us but would be of
interrest
to internals, and at the same time not so difficult to know about
by DBAs

- based on this, prepare a descriptive language of some sort tying
this all in

- implement it in a good way ;)

I'm thinking we could have a new set of commands to tell PostgreSQL
some "high-level" facts about the data, e.g. "there's a injective
function such as f(t.colA) = t.colB" or any useful thing to be found
in the firsts proposed step.

Is there a chance we're gonna improve the planner this way? And answer
Simon's (and many others here and there, -performance etc) concerns?

HTH, regards,
- --
dim
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iEYEARECAAYFAkgeEjoACgkQlBXRlnbh1bnlHwCfcHL5uOlCpptekwLBMp+E9kUn
4roAoMfwdITByHtxCi35l9jDCTSFw2Ho
=whVn
-----END PGP SIGNATURE-----

Martijn van Oosterhout

kleptog@svana.org

over 17 years ago

In reply to: Dimitri Fontaine (#1)

Re: Query Hints? No thanks. Data hints?

On Sun, May 04, 2008 at 09:44:58PM +0200, Dimitri Fontaine wrote:

IIRC, I've read here in the past some attempts to begin a proposal on
the topic of data hints, allowing the user to describe his data in a
way ANALYZE can't figure out by itself, as e.g. "this column value is
tied to this other column value in this way".

There has been talk about this in the past, things like being able to
store cross-table correlations which isn't know at all. The problem is
two-fold:

- How do you calculate the stats?
- How do you use them?

Cross-table correlations are easy for the second part, because it's
fairly simple to see where it could be used. However, no-one has come
up with an algorithm to produce a useful number to use. For others it's
harder.

I'm thinking we could have a new set of commands to tell PostgreSQL
some "high-level" facts about the data, e.g. "there's a injective
function such as f(t.colA) = t.colB" or any useful thing to be found
in the firsts proposed step.

In general postgres could use many bits of information not currently
available. For example: A=B implies lower(A)=lower(B), hence an index
on lower(A) could be used to optimise comparisons against A. Certain
operations preserve order, which may also be useful.

You can always propose something, see where it goes.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

Peter Eisentraut

peter_e@gmx.net

over 17 years ago

In reply to: Martijn van Oosterhout (#2)

Re: Query Hints? No thanks. Data hints?

Am Dienstag, 6. Mai 2008 schrieb Martijn van Oosterhout:

Cross-table correlations are easy for the second part, because it's
fairly simple to see where it could be used. However, no-one has come
up with an algorithm to produce a useful number to use. For others it's
harder.

For an algorithm, principal components analysis would be my guess. It is
designed to answer the question "this column value is tied to this other
column value in this way" [quote Simon], at least for the sort of data that a
B-tree would cover. For nonlinear data, it is of course harder.

In general postgres could use many bits of information not currently
available. For example: A=B implies lower(A)=lower(B), hence an index
on lower(A) could be used to optimise comparisons against A. Certain
operations preserve order, which may also be useful.

Locale horrors looming ... ;-)