get_relation_stats_hook()

Started by Simon Riggsover 17 years ago17 messages
#1Simon Riggs
simon@2ndquadrant.com

Currently we have a plugin capability for get_relation_info_hook(), but
no corresponding capability for statistics info.

So, all calls to SearchSysCache would be replaced with a call to
get_relation_info_hook(), if present.

Any objections, thoughts?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#2Alvaro Herrera
alvherre@commandprompt.com
In reply to: Simon Riggs (#1)
Re: get_relation_stats_hook()

Simon Riggs wrote:

Currently we have a plugin capability for get_relation_info_hook(), but
no corresponding capability for statistics info.

So, all calls to SearchSysCache would be replaced with a call to
get_relation_info_hook(), if present.

I assume you meant get_relation_stats_hook in the last paragraph, but I
can't understand what you mean with SearchSysCache (surely that's not
it.)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#1)
Re: get_relation_stats_hook()

Simon Riggs <simon@2ndquadrant.com> writes:

Currently we have a plugin capability for get_relation_info_hook(), but
no corresponding capability for statistics info.

So, all calls to SearchSysCache would be replaced with a call to
get_relation_info_hook(), if present.

Surely you didn't mean ALL calls. Please be more specific about what
you're proposing.

regards, tom lane

#4Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#3)
Re: get_relation_stats_hook()

On Thu, 2008-06-26 at 11:18 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

Currently we have a plugin capability for get_relation_info_hook(), but
no corresponding capability for statistics info.

So, all calls to SearchSysCache would be replaced with a call to
get_relation_info_hook(), if present.

Surely you didn't mean ALL calls. Please be more specific about what
you're proposing.

The statistics relation STATRELATT is accessed in a few places in the
planner. Since it is in the syscache it is accessed directly from there.
I would like to add hooks so that stats data can come from somewhere
else other than the syscache for tables, just as we can already do with
get_relation_stats_hook(). These new changes would complete the existing
feature to ensure it is fully usabled in the way originally intended.

In selfunc.c: There are 3 calls to SearchSysCache(STATRELATT,...).

In lsyscache.c: There is 1 call to SearchSysCache(STATRELATT...) in
get_attavgwidth() and 2 calls to SysCacheGetAttr(STATRELATT...) in
get_attstatsslot().

Calls to SearchSysCache(STATRELATT...) would be replaced by a call to an
external module, if present, with a function pointer to
get_relation_stats_hook(). This returns a tuple with stats info about
that relation.

Calls to SysCacheGetAttr(STATRELATT...) would be replaced by a call to
an external module, if present with a function pointer to
get_attribute_stats_hook(). This returns a stats slot.

The call in ANALYZE would not be touched.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#4)
Re: get_relation_stats_hook()

Simon Riggs <simon@2ndquadrant.com> writes:

Surely you didn't mean ALL calls. Please be more specific about what
you're proposing.

The statistics relation STATRELATT is accessed in a few places in the
planner. Since it is in the syscache it is accessed directly from there.
I would like to add hooks so that stats data can come from somewhere
else other than the syscache for tables, just as we can already do with
get_relation_stats_hook().

Well, defining the hooks as replacing STATRELATT lookups seems the wrong
level of abstraction. What if you want to insert stats that can't be
represented by the current pg_statistic definition? Even if there's no
functional limitation, cons'ing up a dummy pg_statistic tuple seems the
hard way to do it --- we didn't define get_relation_info_hook as working
by supplying made-up catalog rows. Furthermore, many of the interesting
cases that someone might want to hook into are code paths that don't
even try to look up a pg_statistic row because they know there won't be
one, such as two out of the three cases in examine_variable().

I think you need to move up a level, and perhaps refactor some of the
existing code to make it easier to inject made-up stats.

regards, tom lane

#6Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#5)
Re: get_relation_stats_hook()

On Thu, 2008-06-26 at 12:50 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

Surely you didn't mean ALL calls. Please be more specific about what
you're proposing.

The statistics relation STATRELATT is accessed in a few places in the
planner. Since it is in the syscache it is accessed directly from there.
I would like to add hooks so that stats data can come from somewhere
else other than the syscache for tables, just as we can already do with
get_relation_stats_hook().

Well, defining the hooks as replacing STATRELATT lookups seems the wrong
level of abstraction. What if you want to insert stats that can't be
represented by the current pg_statistic definition? Even if there's no
functional limitation, cons'ing up a dummy pg_statistic tuple seems the
hard way to do it --- we didn't define get_relation_info_hook as working
by supplying made-up catalog rows. Furthermore, many of the interesting
cases that someone might want to hook into are code paths that don't
even try to look up a pg_statistic row because they know there won't be
one, such as two out of the three cases in examine_variable().

The reason for doing it this way is I'm interested in using stats
literally copied from other servers. So the pg_statistic tuples will be
available for us directly. I'm building a tool to allow people to export
their production environment to a test system, so that SQL developers
can experiment with query tuning and optimizer developers can recreate
problems.

I think you need to move up a level, and perhaps refactor some of the
existing code to make it easier to inject made-up stats.

Both sound like good ideas. I wasn't really after ultimate flexibility,
but perhaps I should be.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#7Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#5)
2 attachment(s)
Re: [HACKERS] get_relation_stats_hook()

On Thu, 2008-06-26 at 12:50 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

Surely you didn't mean ALL calls. Please be more specific about what
you're proposing.

The statistics relation STATRELATT is accessed in a few places in the
planner. Since it is in the syscache it is accessed directly from there.
I would like to add hooks so that stats data can come from somewhere
else other than the syscache for tables, just as we can already do with
get_relation_stats_hook().

Well, defining the hooks as replacing STATRELATT lookups seems the wrong
level of abstraction. What if you want to insert stats that can't be
represented by the current pg_statistic definition? Even if there's no
functional limitation, cons'ing up a dummy pg_statistic tuple seems the
hard way to do it --- we didn't define get_relation_info_hook as working
by supplying made-up catalog rows. Furthermore, many of the interesting
cases that someone might want to hook into are code paths that don't
even try to look up a pg_statistic row because they know there won't be
one, such as two out of the three cases in examine_variable().

I think you need to move up a level, and perhaps refactor some of the
existing code to make it easier to inject made-up stats.

I've looked at ways of refactoring this, but the hooks provided here
complement the existing functionality fairly well.

We could add per column info to RelOptInfo, though toasted stats data is
always going to be difficult to handle.

There are other ways of doing this
1. replace selfuncs directly via the catalog
2. put in a hook for each selfunc at top level
3. decide an abstract API we would like to support

1 is possible now, 2 can still be done whether or not this patch is
accepted and I will likely submit a patch for that also, though 3 is too
open ended to pursue right now, IMHO.

Also, this patch can be backpatched more easily, to allow helping
current issues.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachments:

plan_hooks.v2.patchtext/x-patch; charset=UTF-8; name=plan_hooks.v2.patchDownload
Index: src/backend/optimizer/util/plancat.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/optimizer/util/plancat.c,v
retrieving revision 1.146
diff -c -r1.146 plancat.c
*** src/backend/optimizer/util/plancat.c	12 May 2008 00:00:49 -0000	1.146
--- src/backend/optimizer/util/plancat.c	30 Jun 2008 19:52:27 -0000
***************
*** 27,32 ****
--- 27,33 ----
  #include "nodes/makefuncs.h"
  #include "optimizer/clauses.h"
  #include "optimizer/plancat.h"
+ #include "optimizer/planhook.h"
  #include "optimizer/predtest.h"
  #include "optimizer/prep.h"
  #include "parser/parse_expr.h"
***************
*** 45,52 ****
  /* GUC parameter */
  bool		constraint_exclusion = false;
  
! /* Hook for plugins to get control in get_relation_info() */
! get_relation_info_hook_type get_relation_info_hook = NULL;
  
  
  static List *get_relation_constraints(PlannerInfo *root,
--- 46,55 ----
  /* GUC parameter */
  bool		constraint_exclusion = false;
  
! /* Hook for plugins to get planner info, defined in planhook.h */
! get_relation_info_hook_type 		get_relation_info_hook = NULL;
! get_relation_stats_hook_type 		get_relation_stats_hook = NULL;
! get_relation_avg_width_hook_type 	get_relation_avg_width_hook = NULL;
  
  
  static List *get_relation_constraints(PlannerInfo *root,
Index: src/backend/utils/adt/selfuncs.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/utils/adt/selfuncs.c,v
retrieving revision 1.249
diff -c -r1.249 selfuncs.c
*** src/backend/utils/adt/selfuncs.c	12 May 2008 00:00:51 -0000	1.249
--- src/backend/utils/adt/selfuncs.c	30 Jun 2008 19:52:52 -0000
***************
*** 87,92 ****
--- 87,93 ----
  #include "optimizer/pathnode.h"
  #include "optimizer/paths.h"
  #include "optimizer/plancat.h"
+ #include "optimizer/planhook.h"
  #include "optimizer/predtest.h"
  #include "optimizer/restrictinfo.h"
  #include "optimizer/var.h"
***************
*** 3749,3755 ****
  		}
  		else if (rte->rtekind == RTE_RELATION)
  		{
! 			vardata->statsTuple = SearchSysCache(STATRELATT,
  												 ObjectIdGetDatum(rte->relid),
  												 Int16GetDatum(var->varattno),
  												 0, 0);
--- 3750,3761 ----
  		}
  		else if (rte->rtekind == RTE_RELATION)
  		{
! 			if (get_relation_stats_hook)
! 				vardata->statsTuple = (*get_relation_stats_hook) 
! 												(ObjectIdGetDatum(rte->relid),
! 												 Int16GetDatum(var->varattno));
! 			else
! 				vardata->statsTuple = SearchSysCache(STATRELATT,
  												 ObjectIdGetDatum(rte->relid),
  												 Int16GetDatum(var->varattno),
  												 0, 0);
***************
*** 3869,3878 ****
  							index->indpred == NIL)
  							vardata->isunique = true;
  						/* Has it got stats? */
! 						vardata->statsTuple = SearchSysCache(STATRELATT,
! 										   ObjectIdGetDatum(index->indexoid),
! 													  Int16GetDatum(pos + 1),
! 															 0, 0);
  						if (vardata->statsTuple)
  							break;
  					}
--- 3875,3889 ----
  							index->indpred == NIL)
  							vardata->isunique = true;
  						/* Has it got stats? */
! 						if (get_relation_stats_hook)
! 							vardata->statsTuple = (*get_relation_stats_hook) 
! 														(ObjectIdGetDatum(index->indexoid),
! 														 Int16GetDatum(pos + 1));
! 						else
! 							vardata->statsTuple = SearchSysCache(STATRELATT,
! 														 ObjectIdGetDatum(index->indexoid),
! 														 Int16GetDatum(pos + 1),
! 														 0, 0);
  						if (vardata->statsTuple)
  							break;
  					}
***************
*** 5507,5516 ****
  		colnum = 1;
  	}
  
! 	tuple = SearchSysCache(STATRELATT,
! 						   ObjectIdGetDatum(relid),
! 						   Int16GetDatum(colnum),
! 						   0, 0);
  
  	if (HeapTupleIsValid(tuple))
  	{
--- 5518,5532 ----
  		colnum = 1;
  	}
  
! 	if (get_relation_stats_hook)
! 		tuple = (*get_relation_stats_hook) (
! 							ObjectIdGetDatum(relid),
! 							Int16GetDatum(colnum));
! 	else
! 		tuple = SearchSysCache(STATRELATT,
! 							ObjectIdGetDatum(relid),
! 							Int16GetDatum(colnum),
! 							0, 0);
  
  	if (HeapTupleIsValid(tuple))
  	{
Index: src/backend/utils/cache/lsyscache.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/utils/cache/lsyscache.c,v
retrieving revision 1.157
diff -c -r1.157 lsyscache.c
*** src/backend/utils/cache/lsyscache.c	13 Apr 2008 20:51:21 -0000	1.157
--- src/backend/utils/cache/lsyscache.c	30 Jun 2008 19:54:46 -0000
***************
*** 27,32 ****
--- 27,33 ----
  #include "catalog/pg_proc.h"
  #include "catalog/pg_statistic.h"
  #include "catalog/pg_type.h"
+ #include "optimizer/planhook.h"
  #include "miscadmin.h"
  #include "nodes/makefuncs.h"
  #include "utils/array.h"
***************
*** 2457,2462 ****
--- 2458,2468 ----
  {
  	HeapTuple	tp;
  
+ 	if (get_relation_avg_width_hook)
+ 		return (*get_relation_avg_width_hook) (
+ 						ObjectIdGetDatum(relid),
+ 						Int16GetDatum(attnum));
+ 
  	tp = SearchSysCache(STATRELATT,
  						ObjectIdGetDatum(relid),
  						Int16GetDatum(attnum),
Index: src/include/optimizer/plancat.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/optimizer/plancat.h,v
retrieving revision 1.49
diff -c -r1.49 plancat.h
*** src/include/optimizer/plancat.h	1 Apr 2008 00:48:33 -0000	1.49
--- src/include/optimizer/plancat.h	30 Jun 2008 13:41:48 -0000
***************
*** 17,30 ****
  #include "nodes/relation.h"
  #include "utils/rel.h"
  
- /* Hook for plugins to get control in get_relation_info() */
- typedef void (*get_relation_info_hook_type) (PlannerInfo *root,
- 														 Oid relationObjectId,
- 														 bool inhparent,
- 														 RelOptInfo *rel);
- extern PGDLLIMPORT get_relation_info_hook_type get_relation_info_hook;
- 
- 
  extern void get_relation_info(PlannerInfo *root, Oid relationObjectId,
  				  bool inhparent, RelOptInfo *rel);
  
--- 17,22 ----
planhook.htext/x-chdr; charset=UTF-8; name=planhook.hDownload
#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#7)
Re: [HACKERS] get_relation_stats_hook()

Simon Riggs <simon@2ndquadrant.com> writes:

On Thu, 2008-06-26 at 12:50 -0400, Tom Lane wrote:

I think you need to move up a level, and perhaps refactor some of the
existing code to make it easier to inject made-up stats.

I've looked at ways of refactoring this, but the hooks provided here
complement the existing functionality fairly well.

The hooks proposed here are completely useless --- it's impossible
to return anything except actual catalog data, because whatever is
returned is going to be subjected to ReleaseSysCache() later on.
I'd think you at least need to be able to turn that into a no-op,
or perhaps a pfree() call. Also it's clear that some calls of
examine_variable would still return regular syscache entries, so
the cleanup behavior has to be determinable on a per-call basis.

I also still think that you need to reconsider the level at which
the hook gets control. It's impossible given these hook definitions
to create "fake" data for an appendrel or sub-SELECT, which seems to
me to be an important requirement, at least till such time as the
core planner handles those cases better.

My feeling is that the hooks ought to be at more or less the same
semantic level as examine_variable and ReleaseVariableStats, and that
rather than making special-case hooks for the other couple of direct
accesses to STATRELATT, the right thing is to make sure that the
examine_variable hook will work for them too.

I do now concur that having the hook return a pg_statistic tuple
is appropriate --- after looking at the code that uses this stuff,
decoupling that representation would be more work than it's worth.

I think it might be a good idea to prepare an actual working example
of a plug-in that does something with the hook in order to verify
that you've got a useful definition.

regards, tom lane

#9Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#8)
Re: [HACKERS] get_relation_stats_hook()

On Thu, 2008-07-10 at 14:59 -0400, Tom Lane wrote:

I think it might be a good idea to prepare an actual working example
of a plug-in that does something with the hook in order to verify
that you've got a useful definition.

Thanks, will do. I was waiting for the API, but I see a concrete example
will help us test whether we got the API in the right place.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#10Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#8)
1 attachment(s)
Re: [HACKERS] get_relation_stats_hook()

On Thu, 2008-07-10 at 14:59 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

On Thu, 2008-06-26 at 12:50 -0400, Tom Lane wrote:

I think you need to move up a level, and perhaps refactor some of the
existing code to make it easier to inject made-up stats.

I've looked at ways of refactoring this, but the hooks provided here
complement the existing functionality fairly well.

The hooks proposed here are completely useless --- it's impossible
to return anything except actual catalog data, because whatever is
returned is going to be subjected to ReleaseSysCache() later on.
I'd think you at least need to be able to turn that into a no-op,
or perhaps a pfree() call. Also it's clear that some calls of
examine_variable would still return regular syscache entries, so
the cleanup behavior has to be determinable on a per-call basis.

I've introduced an additional plugin function

typedef void (*release_relation_stats_hook_type) (HeapTuple statstup);

Simply pfree-ing things would prevent the plugin from sometimes
returning a syscache entry or keeping its own cache internally.

I also still think that you need to reconsider the level at which
the hook gets control. It's impossible given these hook definitions
to create "fake" data for an appendrel or sub-SELECT, which seems to
me to be an important requirement, at least till such time as the
core planner handles those cases better.

I agree with this, especially with regard to join relations.

However, I'm having difficulty seeing how to identify these structures
externally to allow any meaningful lookups of data. So if we have a 7
table join, then how do we specifically identify the join between tables
4 and 7, say.

We can say, its "foo = foo", but that can easily occur multiple times in
a many way join, especially since join columns are frequently named the
same. How should we tell them apart? AFAICS we can't and yet we need to
if this is to be useful.

Overriding that is already reasonably easily possible anyway using
operators, so I'm not seeing this plugin as a way to introduce hints.
This is purely to provide access to alternate stats.

My feeling is that the hooks ought to be at more or less the same
semantic level as examine_variable and ReleaseVariableStats, and that
rather than making special-case hooks for the other couple of direct
accesses to STATRELATT, the right thing is to make sure that the
examine_variable hook will work for them too.

In general, I agree, though this does cause more work in the plugin and
in other areas of selfuncs.c

btcostestimate() searches syscache directly rather than going through
examine_variable. I suspect refactoring that would cause changes to
internals of examine_variable, possibly also changing the API to cater
for index stats.

I have tried to follow this through as you suggest, but the thought that
examine_variable is the right level doesn't seem as clean as it sounded
when you first suggested.

With those problems in mind, can we stick to the original plan? It's
reasonably clear how it works, and it does work correctly now with the
new release call mentioned above.

Or can you suggest the refactoring you would like to see in that area?
Or better still commit something for me to use as a base.

I think it might be a good idea to prepare an actual working example
of a plug-in that does something with the hook in order to verify
that you've got a useful definition.

I've tested this new patch with the plugin I showed you before, rewired
so it makes syscache calls via patch, as a "null" test case.
All works, no problems.

I'll submit the fully working plugin once we've stabilised the API. It's
designed as a contrib module, so it can go in pgfoundry or contrib.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachments:

stat_hooks.v3.patchtext/x-patch; charset=utf-8; name=stat_hooks.v3.patchDownload
Index: src/backend/utils/adt/selfuncs.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/utils/adt/selfuncs.c,v
retrieving revision 1.250
diff -c -r1.250 selfuncs.c
*** src/backend/utils/adt/selfuncs.c	7 Jul 2008 20:24:55 -0000	1.250
--- src/backend/utils/adt/selfuncs.c	6 Aug 2008 09:06:26 -0000
***************
*** 103,108 ****
--- 103,111 ----
  #include "utils/selfuncs.h"
  #include "utils/syscache.h"
  
+ /* Hook for plugins to get control when we ask for stats */
+ get_relation_stats_hook_type get_relation_stats_hook = NULL;
+ release_relation_stats_hook_type release_relation_stats_hook = NULL;
  
  static double var_eq_const(VariableStatData *vardata, Oid operator,
  			 Datum constval, bool constisnull,
***************
*** 3769,3775 ****
  		}
  		else if (rte->rtekind == RTE_RELATION)
  		{
! 			vardata->statsTuple = SearchSysCache(STATRELATT,
  												 ObjectIdGetDatum(rte->relid),
  												 Int16GetDatum(var->varattno),
  												 0, 0);
--- 3772,3783 ----
  		}
  		else if (rte->rtekind == RTE_RELATION)
  		{
! 			if (get_relation_stats_hook)
! 				vardata->statsTuple = (*get_relation_stats_hook) 
! 												(ObjectIdGetDatum(rte->relid),
! 												 Int16GetDatum(var->varattno));
! 			else
! 				vardata->statsTuple = SearchSysCache(STATRELATT,
  												 ObjectIdGetDatum(rte->relid),
  												 Int16GetDatum(var->varattno),
  												 0, 0);
***************
*** 3889,3898 ****
  							index->indpred == NIL)
  							vardata->isunique = true;
  						/* Has it got stats? */
! 						vardata->statsTuple = SearchSysCache(STATRELATT,
  										   ObjectIdGetDatum(index->indexoid),
! 													  Int16GetDatum(pos + 1),
! 															 0, 0);
  						if (vardata->statsTuple)
  							break;
  					}
--- 3897,3911 ----
  							index->indpred == NIL)
  							vardata->isunique = true;
  						/* Has it got stats? */
! 						if (get_relation_stats_hook)
! 							vardata->statsTuple = (*get_relation_stats_hook) 
! 											(ObjectIdGetDatum(index->indexoid),
! 														 Int16GetDatum(pos + 1));
! 						else
! 							vardata->statsTuple = SearchSysCache(STATRELATT,
  										   ObjectIdGetDatum(index->indexoid),
! 													  	 Int16GetDatum(pos + 1),
! 														 0, 0);
  						if (vardata->statsTuple)
  							break;
  					}
***************
*** 5527,5536 ****
  		colnum = 1;
  	}
  
! 	tuple = SearchSysCache(STATRELATT,
! 						   ObjectIdGetDatum(relid),
! 						   Int16GetDatum(colnum),
! 						   0, 0);
  
  	if (HeapTupleIsValid(tuple))
  	{
--- 5540,5554 ----
  		colnum = 1;
  	}
  
! 	if (get_relation_stats_hook)
! 		tuple = (*get_relation_stats_hook) (
! 							ObjectIdGetDatum(relid),
! 							Int16GetDatum(colnum));
! 	else
! 		tuple = SearchSysCache(STATRELATT,
! 							ObjectIdGetDatum(relid),
! 							Int16GetDatum(colnum),
! 							0, 0);
  
  	if (HeapTupleIsValid(tuple))
  	{
Index: src/backend/utils/cache/lsyscache.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/utils/cache/lsyscache.c,v
retrieving revision 1.157
diff -c -r1.157 lsyscache.c
*** src/backend/utils/cache/lsyscache.c	13 Apr 2008 20:51:21 -0000	1.157
--- src/backend/utils/cache/lsyscache.c	6 Aug 2008 09:10:41 -0000
***************
*** 27,32 ****
--- 27,33 ----
  #include "catalog/pg_proc.h"
  #include "catalog/pg_statistic.h"
  #include "catalog/pg_type.h"
+ #include "optimizer/plancat.h"
  #include "miscadmin.h"
  #include "nodes/makefuncs.h"
  #include "utils/array.h"
***************
*** 35,40 ****
--- 36,43 ----
  #include "utils/lsyscache.h"
  #include "utils/syscache.h"
  
+ /* Hook for plugins to get control in get_attavgwidth() */
+ get_attavgwidth_hook_type get_attavgwidth_hook = NULL;
  
  /*				---------- AMOP CACHES ----------						 */
  
***************
*** 2451,2466 ****
   *
   *	  Given the table and attribute number of a column, get the average
   *	  width of entries in the column.  Return zero if no data available.
   */
  int32
  get_attavgwidth(Oid relid, AttrNumber attnum)
  {
  	HeapTuple	tp;
  
! 	tp = SearchSysCache(STATRELATT,
  						ObjectIdGetDatum(relid),
  						Int16GetDatum(attnum),
  						0, 0);
  	if (HeapTupleIsValid(tp))
  	{
  		int32		stawidth = ((Form_pg_statistic) GETSTRUCT(tp))->stawidth;
--- 2454,2480 ----
   *
   *	  Given the table and attribute number of a column, get the average
   *	  width of entries in the column.  Return zero if no data available.
+  *
+  *	  Calling a hook at this point looks somewhat strange, but is required
+  * 	  because the optimizer handles inheritance relations by calling for
+  *	  the avg width later in the planner than get_relation_info_hook().
+  *	  So the APIs and call points of hooks must match the optimizer.
   */
  int32
  get_attavgwidth(Oid relid, AttrNumber attnum)
  {
  	HeapTuple	tp;
  
! 	if (get_attavgwidth_hook)
! 		return (*get_attavgwidth_hook) (
! 						ObjectIdGetDatum(relid),
! 						Int16GetDatum(attnum));
! 	else
! 		tp = SearchSysCache(STATRELATT,
  						ObjectIdGetDatum(relid),
  						Int16GetDatum(attnum),
  						0, 0);
+ 
  	if (HeapTupleIsValid(tp))
  	{
  		int32		stawidth = ((Form_pg_statistic) GETSTRUCT(tp))->stawidth;
Index: src/include/optimizer/plancat.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/optimizer/plancat.h,v
retrieving revision 1.50
diff -c -r1.50 plancat.h
*** src/include/optimizer/plancat.h	19 Jun 2008 00:46:06 -0000	1.50
--- src/include/optimizer/plancat.h	6 Aug 2008 09:05:19 -0000
***************
*** 14,19 ****
--- 14,20 ----
  #ifndef PLANCAT_H
  #define PLANCAT_H
  
+ #include "access/htup.h"
  #include "nodes/relation.h"
  #include "utils/relcache.h"
  
***************
*** 24,29 ****
--- 25,41 ----
  														 RelOptInfo *rel);
  extern PGDLLIMPORT get_relation_info_hook_type get_relation_info_hook;
  
+ /* Hooks for plugins to get control in lsyscache.c and selfuncs.c */
+ typedef HeapTuple (*get_relation_stats_hook_type) (Datum relid, Datum attnum);
+ extern PGDLLIMPORT get_relation_stats_hook_type get_relation_stats_hook;
+ 
+ typedef void (*release_relation_stats_hook_type) (HeapTuple statstup);
+ extern PGDLLIMPORT release_relation_stats_hook_type release_relation_stats_hook;
+ 
+ typedef int32 (*get_attavgwidth_hook_type) (Datum relid, Datum attnum);
+ extern PGDLLIMPORT get_attavgwidth_hook_type get_attavgwidth_hook;
+ 
+ 
  
  extern void get_relation_info(PlannerInfo *root, Oid relationObjectId,
  				  bool inhparent, RelOptInfo *rel);
Index: src/include/utils/selfuncs.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/utils/selfuncs.h,v
retrieving revision 1.44
diff -c -r1.44 selfuncs.h
*** src/include/utils/selfuncs.h	9 Mar 2008 00:32:09 -0000	1.44
--- src/include/utils/selfuncs.h	6 Aug 2008 05:07:09 -0000
***************
*** 79,85 ****
  #define ReleaseVariableStats(vardata)  \
  	do { \
  		if (HeapTupleIsValid((vardata).statsTuple)) \
! 			ReleaseSysCache((vardata).statsTuple); \
  	} while(0)
  
  
--- 79,90 ----
  #define ReleaseVariableStats(vardata)  \
  	do { \
  		if (HeapTupleIsValid((vardata).statsTuple)) \
! 		{ \
! 			if (release_relation_stats_hook) \
! 				(* release_relation_stats_hook)((vardata).statsTuple); \
! 			else \
! 				ReleaseSysCache((vardata).statsTuple); \
! 		} \
  	} while(0)
  
  
#11Simon Riggs
simon@2ndquadrant.com
In reply to: Simon Riggs (#10)
2 attachment(s)
Re: [HACKERS] get_relation_stats_hook()

On Wed, 2008-08-06 at 16:37 +0100, Simon Riggs wrote:

I'll submit the fully working plugin once we've stabilised the API. It's
designed as a contrib module, so it can go in pgfoundry or contrib.

OK, here's fully working plugin, plus API patch.

I expect to open a pgfoundry project for the plugin, but will wait for
the main patch review.

I've tried the APIs three different ways and this seems cleanest and
most widely applicable approach. It's possible to add calls in more
places, but I haven't done this for reasons discussed previously.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachments:

stat_hooks.v4.patchtext/x-patch; charset=utf-8; name=stat_hooks.v4.patchDownload
Index: src/backend/utils/adt/selfuncs.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/utils/adt/selfuncs.c,v
retrieving revision 1.250
diff -c -r1.250 selfuncs.c
*** src/backend/utils/adt/selfuncs.c	7 Jul 2008 20:24:55 -0000	1.250
--- src/backend/utils/adt/selfuncs.c	6 Aug 2008 22:24:08 -0000
***************
*** 103,108 ****
--- 103,111 ----
  #include "utils/selfuncs.h"
  #include "utils/syscache.h"
  
+ /* Hook for plugins to get control when we ask for stats */
+ get_relation_stats_hook_type get_relation_stats_hook = NULL;
+ release_relation_stats_hook_type release_relation_stats_hook = NULL;
  
  static double var_eq_const(VariableStatData *vardata, Oid operator,
  			 Datum constval, bool constisnull,
***************
*** 3769,3775 ****
  		}
  		else if (rte->rtekind == RTE_RELATION)
  		{
! 			vardata->statsTuple = SearchSysCache(STATRELATT,
  												 ObjectIdGetDatum(rte->relid),
  												 Int16GetDatum(var->varattno),
  												 0, 0);
--- 3772,3786 ----
  		}
  		else if (rte->rtekind == RTE_RELATION)
  		{
! 			if (get_relation_stats_hook)
! 				vardata->statsTuple = (*get_relation_stats_hook) 
! 												(rte->relid, 
! 												 var->varattno);
! 
! 			if (vardata->statsTuple)
! 				vardata->free_using_plugin = true;				
! 			else
! 				vardata->statsTuple = SearchSysCache(STATRELATT,
  												 ObjectIdGetDatum(rte->relid),
  												 Int16GetDatum(var->varattno),
  												 0, 0);
***************
*** 3889,3898 ****
  							index->indpred == NIL)
  							vardata->isunique = true;
  						/* Has it got stats? */
! 						vardata->statsTuple = SearchSysCache(STATRELATT,
  										   ObjectIdGetDatum(index->indexoid),
! 													  Int16GetDatum(pos + 1),
! 															 0, 0);
  						if (vardata->statsTuple)
  							break;
  					}
--- 3900,3918 ----
  							index->indpred == NIL)
  							vardata->isunique = true;
  						/* Has it got stats? */
! 						if (get_relation_stats_hook)
! 						{
! 							vardata->statsTuple = (*get_relation_stats_hook) 
! 													(index->indexoid, 
! 													 pos + 1);
! 							vardata->free_using_plugin = true;				
! 						}
! 
! 						if (!vardata->statsTuple)
! 							vardata->statsTuple = SearchSysCache(STATRELATT,
  										   ObjectIdGetDatum(index->indexoid),
! 													  	 Int16GetDatum(pos + 1),
! 														 0, 0);
  						if (vardata->statsTuple)
  							break;
  					}
***************
*** 5323,5329 ****
  	double	   *indexCorrelation = (double *) PG_GETARG_POINTER(7);
  	Oid			relid;
  	AttrNumber	colnum;
! 	HeapTuple	tuple;
  	double		numIndexTuples;
  	List	   *indexBoundQuals;
  	int			indexcol;
--- 5343,5349 ----
  	double	   *indexCorrelation = (double *) PG_GETARG_POINTER(7);
  	Oid			relid;
  	AttrNumber	colnum;
! 	HeapTuple	tuple = NULL;
  	double		numIndexTuples;
  	List	   *indexBoundQuals;
  	int			indexcol;
***************
*** 5332,5337 ****
--- 5352,5358 ----
  	bool		found_null_op;
  	double		num_sa_scans;
  	ListCell   *l;
+ 	bool		free_using_plugin = false;
  
  	/*
  	 * For a btree scan, only leading '=' quals plus inequality quals for the
***************
*** 5527,5536 ****
  		colnum = 1;
  	}
  
! 	tuple = SearchSysCache(STATRELATT,
! 						   ObjectIdGetDatum(relid),
! 						   Int16GetDatum(colnum),
! 						   0, 0);
  
  	if (HeapTupleIsValid(tuple))
  	{
--- 5548,5563 ----
  		colnum = 1;
  	}
  
! 	if (get_relation_stats_hook)
! 		tuple = (*get_relation_stats_hook) (relid, colnum);
! 
! 	if (tuple)
! 		free_using_plugin = true;				
! 	else
! 		tuple = SearchSysCache(STATRELATT,
! 							ObjectIdGetDatum(relid),
! 							Int16GetDatum(colnum),
! 							0, 0);
  
  	if (HeapTupleIsValid(tuple))
  	{
***************
*** 5571,5577 ****
  
  			free_attstatsslot(InvalidOid, NULL, 0, numbers, nnumbers);
  		}
! 		ReleaseSysCache(tuple);
  	}
  
  	PG_RETURN_VOID();
--- 5598,5605 ----
  
  			free_attstatsslot(InvalidOid, NULL, 0, numbers, nnumbers);
  		}
! 
! 		ReleaseStatsTuple(tuple, free_using_plugin);
  	}
  
  	PG_RETURN_VOID();
Index: src/backend/utils/cache/lsyscache.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/utils/cache/lsyscache.c,v
retrieving revision 1.157
diff -c -r1.157 lsyscache.c
*** src/backend/utils/cache/lsyscache.c	13 Apr 2008 20:51:21 -0000	1.157
--- src/backend/utils/cache/lsyscache.c	6 Aug 2008 17:45:35 -0000
***************
*** 27,32 ****
--- 27,33 ----
  #include "catalog/pg_proc.h"
  #include "catalog/pg_statistic.h"
  #include "catalog/pg_type.h"
+ #include "optimizer/plancat.h"
  #include "miscadmin.h"
  #include "nodes/makefuncs.h"
  #include "utils/array.h"
***************
*** 35,40 ****
--- 36,43 ----
  #include "utils/lsyscache.h"
  #include "utils/syscache.h"
  
+ /* Hook for plugins to get control in get_attavgwidth() */
+ get_attavgwidth_hook_type get_attavgwidth_hook = NULL;
  
  /*				---------- AMOP CACHES ----------						 */
  
***************
*** 2451,2466 ****
   *
   *	  Given the table and attribute number of a column, get the average
   *	  width of entries in the column.  Return zero if no data available.
   */
  int32
  get_attavgwidth(Oid relid, AttrNumber attnum)
  {
  	HeapTuple	tp;
  
  	tp = SearchSysCache(STATRELATT,
! 						ObjectIdGetDatum(relid),
! 						Int16GetDatum(attnum),
! 						0, 0);
  	if (HeapTupleIsValid(tp))
  	{
  		int32		stawidth = ((Form_pg_statistic) GETSTRUCT(tp))->stawidth;
--- 2454,2483 ----
   *
   *	  Given the table and attribute number of a column, get the average
   *	  width of entries in the column.  Return zero if no data available.
+  *
+  *	  Calling a hook at this point looks somewhat strange, but is required
+  * 	  because the optimizer handles inheritance relations by calling for
+  *	  the avg width later in the planner than get_relation_info_hook().
+  *	  So the APIs and call points of hooks must match the optimizer.
   */
  int32
  get_attavgwidth(Oid relid, AttrNumber attnum)
  {
  	HeapTuple	tp;
+ 	int32		stawidth;
+ 
+ 	if (get_attavgwidth_hook)
+ 	{
+ 		stawidth = (*get_attavgwidth_hook) (relid, attnum);
+ 		if (stawidth > 0)
+ 			return stawidth;
+ 	}
  
  	tp = SearchSysCache(STATRELATT,
! 					ObjectIdGetDatum(relid),
! 					Int16GetDatum(attnum),
! 					0, 0);
! 
  	if (HeapTupleIsValid(tp))
  	{
  		int32		stawidth = ((Form_pg_statistic) GETSTRUCT(tp))->stawidth;
Index: src/include/optimizer/plancat.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/optimizer/plancat.h,v
retrieving revision 1.50
diff -c -r1.50 plancat.h
*** src/include/optimizer/plancat.h	19 Jun 2008 00:46:06 -0000	1.50
--- src/include/optimizer/plancat.h	6 Aug 2008 21:51:09 -0000
***************
*** 14,19 ****
--- 14,20 ----
  #ifndef PLANCAT_H
  #define PLANCAT_H
  
+ #include "access/htup.h"
  #include "nodes/relation.h"
  #include "utils/relcache.h"
  
***************
*** 24,29 ****
--- 25,41 ----
  														 RelOptInfo *rel);
  extern PGDLLIMPORT get_relation_info_hook_type get_relation_info_hook;
  
+ /* Hooks for plugins to get control in lsyscache.c and selfuncs.c */
+ typedef HeapTuple (*get_relation_stats_hook_type) (Oid relid, AttrNumber attnum);
+ extern PGDLLIMPORT get_relation_stats_hook_type get_relation_stats_hook;
+ 
+ typedef void (*release_relation_stats_hook_type) (HeapTuple statstup);
+ extern PGDLLIMPORT release_relation_stats_hook_type release_relation_stats_hook;
+ 
+ typedef int32 (*get_attavgwidth_hook_type) (Oid relid, AttrNumber attnum);
+ extern PGDLLIMPORT get_attavgwidth_hook_type get_attavgwidth_hook;
+ 
+ 
  
  extern void get_relation_info(PlannerInfo *root, Oid relationObjectId,
  				  bool inhparent, RelOptInfo *rel);
Index: src/include/utils/selfuncs.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/utils/selfuncs.h,v
retrieving revision 1.44
diff -c -r1.44 selfuncs.h
*** src/include/utils/selfuncs.h	9 Mar 2008 00:32:09 -0000	1.44
--- src/include/utils/selfuncs.h	6 Aug 2008 21:53:53 -0000
***************
*** 74,85 ****
  	Oid			atttype;		/* type to pass to get_attstatsslot */
  	int32		atttypmod;		/* typmod to pass to get_attstatsslot */
  	bool		isunique;		/* true if matched to a unique index */
  } VariableStatData;
  
  #define ReleaseVariableStats(vardata)  \
  	do { \
  		if (HeapTupleIsValid((vardata).statsTuple)) \
! 			ReleaseSysCache((vardata).statsTuple); \
  	} while(0)
  
  
--- 74,112 ----
  	Oid			atttype;		/* type to pass to get_attstatsslot */
  	int32		atttypmod;		/* typmod to pass to get_attstatsslot */
  	bool		isunique;		/* true if matched to a unique index */
+ 	bool		free_using_plugin;
  } VariableStatData;
  
+ #define ReleaseStatsTuple(tuple, free_using_plugin)  \
+ 	do { \
+ 		if (HeapTupleIsValid(tuple)) \
+ 		{ \
+ 			if (free_using_plugin) \
+ 			{ \
+ 				if (release_relation_stats_hook) \
+ 					(* release_relation_stats_hook)(tuple); \
+ 				else \
+ 					elog(ERROR, "unable to release variable stats correctly"); \
+ 			} \
+ 			else \
+ 				ReleaseSysCache(tuple); \
+ 		} \
+ 	} while(0)
+ 
  #define ReleaseVariableStats(vardata)  \
  	do { \
  		if (HeapTupleIsValid((vardata).statsTuple)) \
! 		{ \
! 			if ((vardata).free_using_plugin) \
! 			{ \
! 				if (release_relation_stats_hook) \
! 					(* release_relation_stats_hook)((vardata).statsTuple); \
! 				else \
! 					elog(ERROR, "unable to release variable stats correctly"); \
! 			} \
! 			else \
! 				ReleaseSysCache((vardata).statsTuple); \
! 		} \
  	} while(0)
  
  
TOM.v2.tarapplication/x-tar; name=TOM.v2.tarDownload
#12Gregory Stark
stark@enterprisedb.com
In reply to: Simon Riggs (#11)
Re: get_relation_stats_hook()

Hm, I assume we want to be able to turn on and off plugins in a running
session? I think the "free_using_plugin" flag:

! if (get_relation_stats_hook)
! vardata->statsTuple = (*get_relation_stats_hook)
! (rte->relid,
! var->varattno);
!
! if (vardata->statsTuple)
! vardata->free_using_plugin = true;
! else
! vardata->statsTuple = SearchSysCache(STATRELATT,

is insufficient to handle this. vardata->free_using_plugin could be true but
by the time the variable is released the plugin pointer could have been
cleared. Or worse, set to a different plugin.

The easiest way to fix this seems like also the best way, instead of storing a
boolean store the pointer to the release function.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!

#13Simon Riggs
simon@2ndQuadrant.com
In reply to: Gregory Stark (#12)
Re: get_relation_stats_hook()

On Mon, 2008-09-22 at 18:41 +0100, Gregory Stark wrote:

The easiest way to fix this seems like also the best way, instead of storing a
boolean store the pointer to the release function.

OK, I like that better anyhow.

Hadn't thought about turning plugin off, but I can see the requirement
now your raise it.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#14Simon Riggs
simon@2ndQuadrant.com
In reply to: Gregory Stark (#12)
Re: get_relation_stats_hook()

On Mon, 2008-09-22 at 18:41 +0100, Gregory Stark wrote:

The easiest way

Did you have further review comments? If so, I'll wait for those before
making further mods. Thanks for ones so far.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#15Simon Riggs
simon@2ndQuadrant.com
In reply to: Gregory Stark (#12)
1 attachment(s)
Re: get_relation_stats_hook()

On Mon, 2008-09-22 at 18:41 +0100, Gregory Stark wrote:

The easiest way to fix this seems like also the best way, instead of storing a
boolean store the pointer to the release function.

Implemented as suggested. v5 enclosed.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachments:

stat_hooks.v5.patchtext/x-patch; charset=utf-8; name=stat_hooks.v5.patchDownload
Index: src/backend/utils/adt/selfuncs.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/utils/adt/selfuncs.c,v
retrieving revision 1.253
diff -c -r1.253 selfuncs.c
*** src/backend/utils/adt/selfuncs.c	25 Aug 2008 22:42:34 -0000	1.253
--- src/backend/utils/adt/selfuncs.c	25 Sep 2008 15:57:58 -0000
***************
*** 118,123 ****
--- 118,126 ----
  #include "utils/selfuncs.h"
  #include "utils/syscache.h"
  
+ /* Hook for plugins to get control when we ask for stats */
+ get_relation_stats_hook_type get_relation_stats_hook = NULL;
+ release_relation_stats_hook_type release_relation_stats_hook = NULL;
  
  static double var_eq_const(VariableStatData *vardata, Oid operator,
  			 Datum constval, bool constisnull,
***************
*** 3996,4005 ****
  		}
  		else if (rte->rtekind == RTE_RELATION)
  		{
! 			vardata->statsTuple = SearchSysCache(STATRELATT,
  												 ObjectIdGetDatum(rte->relid),
  												 Int16GetDatum(var->varattno),
  												 0, 0);
  		}
  		else
  		{
--- 3999,4017 ----
  		}
  		else if (rte->rtekind == RTE_RELATION)
  		{
! 			if (get_relation_stats_hook)
! 				vardata->statsTuple = (*get_relation_stats_hook) 
! 										(rte->relid, 
! 										 var->varattno,
! 										 vardata->freefunc);
! 			if (!vardata->statsTuple)
! 			{
! 				vardata->statsTuple = SearchSysCache(STATRELATT,
  												 ObjectIdGetDatum(rte->relid),
  												 Int16GetDatum(var->varattno),
  												 0, 0);
+ 				vardata->freefunc = ReleaseSysCache;
+ 			}
  		}
  		else
  		{
***************
*** 4116,4125 ****
  							index->indpred == NIL)
  							vardata->isunique = true;
  						/* Has it got stats? */
! 						vardata->statsTuple = SearchSysCache(STATRELATT,
  										   ObjectIdGetDatum(index->indexoid),
! 													  Int16GetDatum(pos + 1),
! 															 0, 0);
  						if (vardata->statsTuple)
  							break;
  					}
--- 4128,4147 ----
  							index->indpred == NIL)
  							vardata->isunique = true;
  						/* Has it got stats? */
! 						if (get_relation_stats_hook)
! 							vardata->statsTuple = (*get_relation_stats_hook) 
! 													(index->indexoid, 
! 													 pos + 1,
! 													 vardata->freefunc);
! 						if (!vardata->statsTuple)
! 						{
! 							vardata->statsTuple = SearchSysCache(STATRELATT,
  										   ObjectIdGetDatum(index->indexoid),
! 													  	 Int16GetDatum(pos + 1),
! 														 0, 0);
! 							vardata->freefunc = ReleaseSysCache;
! 						}
! 
  						if (vardata->statsTuple)
  							break;
  					}
***************
*** 5551,5557 ****
  	double	   *indexCorrelation = (double *) PG_GETARG_POINTER(7);
  	Oid			relid;
  	AttrNumber	colnum;
! 	HeapTuple	tuple;
  	double		numIndexTuples;
  	List	   *indexBoundQuals;
  	int			indexcol;
--- 5573,5580 ----
  	double	   *indexCorrelation = (double *) PG_GETARG_POINTER(7);
  	Oid			relid;
  	AttrNumber	colnum;
! 	HeapTuple	tuple = NULL;
! 	void		(*freefunc) (HeapTuple tuple) = NULL;
  	double		numIndexTuples;
  	List	   *indexBoundQuals;
  	int			indexcol;
***************
*** 5756,5765 ****
  		colnum = 1;
  	}
  
! 	tuple = SearchSysCache(STATRELATT,
! 						   ObjectIdGetDatum(relid),
! 						   Int16GetDatum(colnum),
! 						   0, 0);
  
  	if (HeapTupleIsValid(tuple))
  	{
--- 5779,5795 ----
  		colnum = 1;
  	}
  
! 	if (get_relation_stats_hook)
! 		tuple = (*get_relation_stats_hook) (relid, colnum, freefunc);
! 
! 	if (!tuple)
! 	{
! 		tuple = SearchSysCache(STATRELATT,
! 							ObjectIdGetDatum(relid),
! 							Int16GetDatum(colnum),
! 							0, 0);
! 		freefunc = ReleaseSysCache;
! 	}
  
  	if (HeapTupleIsValid(tuple))
  	{
***************
*** 5800,5806 ****
  
  			free_attstatsslot(InvalidOid, NULL, 0, numbers, nnumbers);
  		}
! 		ReleaseSysCache(tuple);
  	}
  
  	PG_RETURN_VOID();
--- 5830,5837 ----
  
  			free_attstatsslot(InvalidOid, NULL, 0, numbers, nnumbers);
  		}
! 
! 		ReleaseStatsTuple(tuple, freefunc);
  	}
  
  	PG_RETURN_VOID();
Index: src/backend/utils/cache/lsyscache.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/utils/cache/lsyscache.c,v
retrieving revision 1.159
diff -c -r1.159 lsyscache.c
*** src/backend/utils/cache/lsyscache.c	2 Aug 2008 21:32:00 -0000	1.159
--- src/backend/utils/cache/lsyscache.c	25 Sep 2008 14:16:01 -0000
***************
*** 27,32 ****
--- 27,33 ----
  #include "catalog/pg_proc.h"
  #include "catalog/pg_statistic.h"
  #include "catalog/pg_type.h"
+ #include "optimizer/plancat.h"
  #include "miscadmin.h"
  #include "nodes/makefuncs.h"
  #include "utils/array.h"
***************
*** 35,40 ****
--- 36,43 ----
  #include "utils/lsyscache.h"
  #include "utils/syscache.h"
  
+ /* Hook for plugins to get control in get_attavgwidth() */
+ get_attavgwidth_hook_type get_attavgwidth_hook = NULL;
  
  /*				---------- AMOP CACHES ----------						 */
  
***************
*** 2492,2507 ****
   *
   *	  Given the table and attribute number of a column, get the average
   *	  width of entries in the column.  Return zero if no data available.
   */
  int32
  get_attavgwidth(Oid relid, AttrNumber attnum)
  {
  	HeapTuple	tp;
  
  	tp = SearchSysCache(STATRELATT,
! 						ObjectIdGetDatum(relid),
! 						Int16GetDatum(attnum),
! 						0, 0);
  	if (HeapTupleIsValid(tp))
  	{
  		int32		stawidth = ((Form_pg_statistic) GETSTRUCT(tp))->stawidth;
--- 2495,2524 ----
   *
   *	  Given the table and attribute number of a column, get the average
   *	  width of entries in the column.  Return zero if no data available.
+  *
+  *	  Calling a hook at this point looks somewhat strange, but is required
+  * 	  because the optimizer handles inheritance relations by calling for
+  *	  the avg width later in the planner than get_relation_info_hook().
+  *	  So the APIs and call points of hooks must match the optimizer.
   */
  int32
  get_attavgwidth(Oid relid, AttrNumber attnum)
  {
  	HeapTuple	tp;
+ 	int32		stawidth;
+ 
+ 	if (get_attavgwidth_hook)
+ 	{
+ 		stawidth = (*get_attavgwidth_hook) (relid, attnum);
+ 		if (stawidth > 0)
+ 			return stawidth;
+ 	}
  
  	tp = SearchSysCache(STATRELATT,
! 					ObjectIdGetDatum(relid),
! 					Int16GetDatum(attnum),
! 					0, 0);
! 
  	if (HeapTupleIsValid(tp))
  	{
  		int32		stawidth = ((Form_pg_statistic) GETSTRUCT(tp))->stawidth;
Index: src/include/optimizer/plancat.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/optimizer/plancat.h,v
retrieving revision 1.51
diff -c -r1.51 plancat.h
*** src/include/optimizer/plancat.h	16 Aug 2008 00:01:38 -0000	1.51
--- src/include/optimizer/plancat.h	25 Sep 2008 15:57:06 -0000
***************
*** 14,19 ****
--- 14,20 ----
  #ifndef PLANCAT_H
  #define PLANCAT_H
  
+ #include "access/htup.h"
  #include "nodes/relation.h"
  #include "utils/relcache.h"
  
***************
*** 24,29 ****
--- 25,43 ----
  														 RelOptInfo *rel);
  extern PGDLLIMPORT get_relation_info_hook_type get_relation_info_hook;
  
+ /* Hooks for plugins to get control in lsyscache.c and selfuncs.c */
+ typedef HeapTuple (*get_relation_stats_hook_type) (Oid relid, AttrNumber attnum, 
+ 											void (*freefunc) (HeapTuple tuple));
+ extern PGDLLIMPORT get_relation_stats_hook_type get_relation_stats_hook;
+ 
+ /* must match ReleaseSysCache call signature */
+ typedef void (*release_relation_stats_hook_type) (HeapTuple tuple);
+ extern PGDLLIMPORT release_relation_stats_hook_type release_relation_stats_hook;
+ 
+ typedef int32 (*get_attavgwidth_hook_type) (Oid relid, AttrNumber attnum);
+ extern PGDLLIMPORT get_attavgwidth_hook_type get_attavgwidth_hook;
+ 
+ 
  
  extern void get_relation_info(PlannerInfo *root, Oid relationObjectId,
  				  bool inhparent, RelOptInfo *rel);
Index: src/include/utils/selfuncs.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/utils/selfuncs.h,v
retrieving revision 1.46
diff -c -r1.46 selfuncs.h
*** src/include/utils/selfuncs.h	16 Aug 2008 00:01:38 -0000	1.46
--- src/include/utils/selfuncs.h	25 Sep 2008 15:55:05 -0000
***************
*** 74,85 ****
  	Oid			atttype;		/* type to pass to get_attstatsslot */
  	int32		atttypmod;		/* typmod to pass to get_attstatsslot */
  	bool		isunique;		/* true if matched to a unique index */
  } VariableStatData;
  
  #define ReleaseVariableStats(vardata)  \
  	do { \
  		if (HeapTupleIsValid((vardata).statsTuple)) \
! 			ReleaseSysCache((vardata).statsTuple); \
  	} while(0)
  
  
--- 74,102 ----
  	Oid			atttype;		/* type to pass to get_attstatsslot */
  	int32		atttypmod;		/* typmod to pass to get_attstatsslot */
  	bool		isunique;		/* true if matched to a unique index */
+ 	void	(*freefunc) (HeapTuple tuple);	/* funct ptr to free statsTuple */
  } VariableStatData;
  
+ #define ReleaseStatsTuple(tuple, freefunc)  \
+ 	do { \
+ 		if (HeapTupleIsValid(tuple)) \
+ 		{ \
+ 			if (freefunc) \
+ 				freefunc(tuple); \
+ 			else \
+ 				elog(ERROR, "unable to release variable stats correctly"); \
+ 		} \
+ 	} while(0)
+ 
  #define ReleaseVariableStats(vardata)  \
  	do { \
  		if (HeapTupleIsValid((vardata).statsTuple)) \
! 		{ \
! 			if ((vardata).freefunc) \
! 				(vardata).freefunc((vardata).statsTuple); \
! 			else \
! 				elog(ERROR, "unable to release variable stats correctly"); \
! 		} \
  	} while(0)
  
  
#16Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#11)
2 attachment(s)
Re: [HACKERS] get_relation_stats_hook()

On Wed, 2008-08-06 at 23:38 +0100, Simon Riggs wrote:

On Wed, 2008-08-06 at 16:37 +0100, Simon Riggs wrote:

I'll submit the fully working plugin once we've stabilised the API. It's
designed as a contrib module, so it can go in pgfoundry or contrib.

OK, here's fully working plugin, plus API patch.

I expect to open a pgfoundry project for the plugin, but will wait for
the main patch review.

I've tried the APIs three different ways and this seems cleanest and
most widely applicable approach. It's possible to add calls in more
places, but I haven't done this for reasons discussed previously.

New version of Postgres patch, v5. Implements suggested changes.
Ready for review and apply.

New version of stats plugin, v3. Works with v5.
Corrected problems:
* now loads using preload_shared_libraries as well as LOAD
* example test script fix

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Attachments:

stat_hooks.v5.patchtext/x-patch; charset=UTF-8; name=stat_hooks.v5.patchDownload
Index: src/backend/utils/adt/selfuncs.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/utils/adt/selfuncs.c,v
retrieving revision 1.253
diff -c -r1.253 selfuncs.c
*** src/backend/utils/adt/selfuncs.c	25 Aug 2008 22:42:34 -0000	1.253
--- src/backend/utils/adt/selfuncs.c	25 Sep 2008 15:57:58 -0000
***************
*** 118,123 ****
--- 118,126 ----
  #include "utils/selfuncs.h"
  #include "utils/syscache.h"
  
+ /* Hook for plugins to get control when we ask for stats */
+ get_relation_stats_hook_type get_relation_stats_hook = NULL;
+ release_relation_stats_hook_type release_relation_stats_hook = NULL;
  
  static double var_eq_const(VariableStatData *vardata, Oid operator,
  			 Datum constval, bool constisnull,
***************
*** 3996,4005 ****
  		}
  		else if (rte->rtekind == RTE_RELATION)
  		{
! 			vardata->statsTuple = SearchSysCache(STATRELATT,
  												 ObjectIdGetDatum(rte->relid),
  												 Int16GetDatum(var->varattno),
  												 0, 0);
  		}
  		else
  		{
--- 3999,4017 ----
  		}
  		else if (rte->rtekind == RTE_RELATION)
  		{
! 			if (get_relation_stats_hook)
! 				vardata->statsTuple = (*get_relation_stats_hook) 
! 										(rte->relid, 
! 										 var->varattno,
! 										 vardata->freefunc);
! 			if (!vardata->statsTuple)
! 			{
! 				vardata->statsTuple = SearchSysCache(STATRELATT,
  												 ObjectIdGetDatum(rte->relid),
  												 Int16GetDatum(var->varattno),
  												 0, 0);
+ 				vardata->freefunc = ReleaseSysCache;
+ 			}
  		}
  		else
  		{
***************
*** 4116,4125 ****
  							index->indpred == NIL)
  							vardata->isunique = true;
  						/* Has it got stats? */
! 						vardata->statsTuple = SearchSysCache(STATRELATT,
  										   ObjectIdGetDatum(index->indexoid),
! 													  Int16GetDatum(pos + 1),
! 															 0, 0);
  						if (vardata->statsTuple)
  							break;
  					}
--- 4128,4147 ----
  							index->indpred == NIL)
  							vardata->isunique = true;
  						/* Has it got stats? */
! 						if (get_relation_stats_hook)
! 							vardata->statsTuple = (*get_relation_stats_hook) 
! 													(index->indexoid, 
! 													 pos + 1,
! 													 vardata->freefunc);
! 						if (!vardata->statsTuple)
! 						{
! 							vardata->statsTuple = SearchSysCache(STATRELATT,
  										   ObjectIdGetDatum(index->indexoid),
! 													  	 Int16GetDatum(pos + 1),
! 														 0, 0);
! 							vardata->freefunc = ReleaseSysCache;
! 						}
! 
  						if (vardata->statsTuple)
  							break;
  					}
***************
*** 5551,5557 ****
  	double	   *indexCorrelation = (double *) PG_GETARG_POINTER(7);
  	Oid			relid;
  	AttrNumber	colnum;
! 	HeapTuple	tuple;
  	double		numIndexTuples;
  	List	   *indexBoundQuals;
  	int			indexcol;
--- 5573,5580 ----
  	double	   *indexCorrelation = (double *) PG_GETARG_POINTER(7);
  	Oid			relid;
  	AttrNumber	colnum;
! 	HeapTuple	tuple = NULL;
! 	void		(*freefunc) (HeapTuple tuple) = NULL;
  	double		numIndexTuples;
  	List	   *indexBoundQuals;
  	int			indexcol;
***************
*** 5756,5765 ****
  		colnum = 1;
  	}
  
! 	tuple = SearchSysCache(STATRELATT,
! 						   ObjectIdGetDatum(relid),
! 						   Int16GetDatum(colnum),
! 						   0, 0);
  
  	if (HeapTupleIsValid(tuple))
  	{
--- 5779,5795 ----
  		colnum = 1;
  	}
  
! 	if (get_relation_stats_hook)
! 		tuple = (*get_relation_stats_hook) (relid, colnum, freefunc);
! 
! 	if (!tuple)
! 	{
! 		tuple = SearchSysCache(STATRELATT,
! 							ObjectIdGetDatum(relid),
! 							Int16GetDatum(colnum),
! 							0, 0);
! 		freefunc = ReleaseSysCache;
! 	}
  
  	if (HeapTupleIsValid(tuple))
  	{
***************
*** 5800,5806 ****
  
  			free_attstatsslot(InvalidOid, NULL, 0, numbers, nnumbers);
  		}
! 		ReleaseSysCache(tuple);
  	}
  
  	PG_RETURN_VOID();
--- 5830,5837 ----
  
  			free_attstatsslot(InvalidOid, NULL, 0, numbers, nnumbers);
  		}
! 
! 		ReleaseStatsTuple(tuple, freefunc);
  	}
  
  	PG_RETURN_VOID();
Index: src/backend/utils/cache/lsyscache.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/utils/cache/lsyscache.c,v
retrieving revision 1.159
diff -c -r1.159 lsyscache.c
*** src/backend/utils/cache/lsyscache.c	2 Aug 2008 21:32:00 -0000	1.159
--- src/backend/utils/cache/lsyscache.c	25 Sep 2008 14:16:01 -0000
***************
*** 27,32 ****
--- 27,33 ----
  #include "catalog/pg_proc.h"
  #include "catalog/pg_statistic.h"
  #include "catalog/pg_type.h"
+ #include "optimizer/plancat.h"
  #include "miscadmin.h"
  #include "nodes/makefuncs.h"
  #include "utils/array.h"
***************
*** 35,40 ****
--- 36,43 ----
  #include "utils/lsyscache.h"
  #include "utils/syscache.h"
  
+ /* Hook for plugins to get control in get_attavgwidth() */
+ get_attavgwidth_hook_type get_attavgwidth_hook = NULL;
  
  /*				---------- AMOP CACHES ----------						 */
  
***************
*** 2492,2507 ****
   *
   *	  Given the table and attribute number of a column, get the average
   *	  width of entries in the column.  Return zero if no data available.
   */
  int32
  get_attavgwidth(Oid relid, AttrNumber attnum)
  {
  	HeapTuple	tp;
  
  	tp = SearchSysCache(STATRELATT,
! 						ObjectIdGetDatum(relid),
! 						Int16GetDatum(attnum),
! 						0, 0);
  	if (HeapTupleIsValid(tp))
  	{
  		int32		stawidth = ((Form_pg_statistic) GETSTRUCT(tp))->stawidth;
--- 2495,2524 ----
   *
   *	  Given the table and attribute number of a column, get the average
   *	  width of entries in the column.  Return zero if no data available.
+  *
+  *	  Calling a hook at this point looks somewhat strange, but is required
+  * 	  because the optimizer handles inheritance relations by calling for
+  *	  the avg width later in the planner than get_relation_info_hook().
+  *	  So the APIs and call points of hooks must match the optimizer.
   */
  int32
  get_attavgwidth(Oid relid, AttrNumber attnum)
  {
  	HeapTuple	tp;
+ 	int32		stawidth;
+ 
+ 	if (get_attavgwidth_hook)
+ 	{
+ 		stawidth = (*get_attavgwidth_hook) (relid, attnum);
+ 		if (stawidth > 0)
+ 			return stawidth;
+ 	}
  
  	tp = SearchSysCache(STATRELATT,
! 					ObjectIdGetDatum(relid),
! 					Int16GetDatum(attnum),
! 					0, 0);
! 
  	if (HeapTupleIsValid(tp))
  	{
  		int32		stawidth = ((Form_pg_statistic) GETSTRUCT(tp))->stawidth;
Index: src/include/optimizer/plancat.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/optimizer/plancat.h,v
retrieving revision 1.51
diff -c -r1.51 plancat.h
*** src/include/optimizer/plancat.h	16 Aug 2008 00:01:38 -0000	1.51
--- src/include/optimizer/plancat.h	25 Sep 2008 15:57:06 -0000
***************
*** 14,19 ****
--- 14,20 ----
  #ifndef PLANCAT_H
  #define PLANCAT_H
  
+ #include "access/htup.h"
  #include "nodes/relation.h"
  #include "utils/relcache.h"
  
***************
*** 24,29 ****
--- 25,43 ----
  														 RelOptInfo *rel);
  extern PGDLLIMPORT get_relation_info_hook_type get_relation_info_hook;
  
+ /* Hooks for plugins to get control in lsyscache.c and selfuncs.c */
+ typedef HeapTuple (*get_relation_stats_hook_type) (Oid relid, AttrNumber attnum, 
+ 											void (*freefunc) (HeapTuple tuple));
+ extern PGDLLIMPORT get_relation_stats_hook_type get_relation_stats_hook;
+ 
+ /* must match ReleaseSysCache call signature */
+ typedef void (*release_relation_stats_hook_type) (HeapTuple tuple);
+ extern PGDLLIMPORT release_relation_stats_hook_type release_relation_stats_hook;
+ 
+ typedef int32 (*get_attavgwidth_hook_type) (Oid relid, AttrNumber attnum);
+ extern PGDLLIMPORT get_attavgwidth_hook_type get_attavgwidth_hook;
+ 
+ 
  
  extern void get_relation_info(PlannerInfo *root, Oid relationObjectId,
  				  bool inhparent, RelOptInfo *rel);
Index: src/include/utils/selfuncs.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/utils/selfuncs.h,v
retrieving revision 1.46
diff -c -r1.46 selfuncs.h
*** src/include/utils/selfuncs.h	16 Aug 2008 00:01:38 -0000	1.46
--- src/include/utils/selfuncs.h	25 Sep 2008 15:55:05 -0000
***************
*** 74,85 ****
  	Oid			atttype;		/* type to pass to get_attstatsslot */
  	int32		atttypmod;		/* typmod to pass to get_attstatsslot */
  	bool		isunique;		/* true if matched to a unique index */
  } VariableStatData;
  
  #define ReleaseVariableStats(vardata)  \
  	do { \
  		if (HeapTupleIsValid((vardata).statsTuple)) \
! 			ReleaseSysCache((vardata).statsTuple); \
  	} while(0)
  
  
--- 74,102 ----
  	Oid			atttype;		/* type to pass to get_attstatsslot */
  	int32		atttypmod;		/* typmod to pass to get_attstatsslot */
  	bool		isunique;		/* true if matched to a unique index */
+ 	void	(*freefunc) (HeapTuple tuple);	/* funct ptr to free statsTuple */
  } VariableStatData;
  
+ #define ReleaseStatsTuple(tuple, freefunc)  \
+ 	do { \
+ 		if (HeapTupleIsValid(tuple)) \
+ 		{ \
+ 			if (freefunc) \
+ 				freefunc(tuple); \
+ 			else \
+ 				elog(ERROR, "unable to release variable stats correctly"); \
+ 		} \
+ 	} while(0)
+ 
  #define ReleaseVariableStats(vardata)  \
  	do { \
  		if (HeapTupleIsValid((vardata).statsTuple)) \
! 		{ \
! 			if ((vardata).freefunc) \
! 				(vardata).freefunc((vardata).statsTuple); \
! 			else \
! 				elog(ERROR, "unable to release variable stats correctly"); \
! 		} \
  	} while(0)
  
  
TOM.v3.tarapplication/x-tar; name=TOM.v3.tarDownload
#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#16)
Re: [HACKERS] get_relation_stats_hook()

Simon Riggs <simon@2ndQuadrant.com> writes:

New version of Postgres patch, v5. Implements suggested changes.
Ready for review and apply.

Applied with some revisions. The method for passing back freefunc
didn't work, so I made it pass the whole VariableStatsData struct
instead; this might allow some additional flexibility by changing other
fields besides the intended statsTuple and freefunc. Also, I was still
unhappy about adding a hook in the midst of code that clearly needs
improvement, without making it possible for the hook to override the
adjacent broken code paths; so I refactored the API a bit for that too.

The plugin function would now be something like this:

static bool
plugin_get_relation_stats(PlannerInfo *root,
RangeTblEntry *rte,
AttrNumber attnum,
VariableStatData *vardata)
{
HeapTuple statstup = NULL;

/* For now, we only cover the simple-relation case */
if (rte->rtekind != RTE_RELATION || rte->inh)
return false;

if (!get_tom_stats_tupletable(rte->relid, attnum))
return false;

/*
* Get stats if present. We asked for only one row, so no need for loops.
*/
if (SPI_processed > 0)
statstup = SPI_copytuple(SPI_tuptable->vals[0]);

SPI_freetuptable(SPI_tuptable);
SPI_finish();

if (!statstup)
return false; /* should this happen? */

vardata->statsTuple = statstup;
/* define function to use when time to free the tuple */
vardata->freefunc = heap_freetuple;

return true;
}

and if you want to insert stats for expression indexes then there's a
separate get_index_stats_hook for that.

regards, tom lane