PL instrumentation plugin support (i.e. PL/pgSQL debugger infrastructure)

Started by korryd@enterprisedb.comover 19 years ago18 messages
#1korryd@enterprisedb.com
korryd@enterprisedb.com
1 attachment(s)

The attached patch adds support for loadable instrumentation plugins for
procedural languages (as discussed at the anniversary summit). It also
adds plugin support to the PL/pgSQL language handler.

We are using this plugin mechanism to load the PL/pgSQL debugger on
demand (the debugger is not part of this patch, the patch is just the
infrastructure that we need to load various instrumentation plugins).
We will also post some sample plugins (a PL/pgSQL profiler and a simple
tracer), probably at the edb-debugger pgFoundry project site so you can
see how to use the plugin mechanism.

A couple of notes: to use a plugin, you must define a custom GUC
variable that specifies the name of the shared-object that implements
the plugin. For example, to use the PL/pgSQL profiler (which is
implement in $libdir/plugin_profiler.so), you would add the following to
postgresql.conf

custom_variable_classes = 'plpgsql'

plpgsql.plugin = 'plugin_profiler'

The plpgsql.plugin variable is treated as an SUSET variable so you must
be a superuser to change plugins.

Also, we define a helper function (in fmgr.c) named load_pl_plugin()
that a language handler can use to load a plugin. The plugin requires
the name of the GUC variable (plplgsql.plugin, pljava.plugin,
plperl.pluhgN, etc.) and a pointer to a language-specific struct that
the plugin fills in with a set of function pointers (presumably, the
language handler will call those functions at appropriate points in
time).

-- Korry

Attachments:

plugin.patchtext/x-patch; charset=UTF-8; name=plugin.patchDownload
Index: src/backend/utils/fmgr/fmgr.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/fmgr/fmgr.c,v
retrieving revision 1.101
diff -w -c -r1.101 fmgr.c
*** src/backend/utils/fmgr/fmgr.c	30 May 2006 21:21:30 -0000	1.101
--- src/backend/utils/fmgr/fmgr.c	28 Jul 2006 13:22:15 -0000
***************
*** 25,30 ****
--- 25,31 ----
  #include "utils/fmgrtab.h"
  #include "utils/lsyscache.h"
  #include "utils/syscache.h"
+ #include "utils/guc.h"
  
  /*
   * Declaration for old-style function pointer type.  This is now used only
***************
*** 69,76 ****
  	const Pg_finfo_record *inforec;		/* address of its info record */
  } CFuncHashTabEntry;
  
! static HTAB *CFuncHash = NULL;
  
  
  static void fmgr_info_cxt_security(Oid functionId, FmgrInfo *finfo, MemoryContext mcxt,
  					   bool ignore_security);
--- 70,87 ----
  	const Pg_finfo_record *inforec;		/* address of its info record */
  } CFuncHashTabEntry;
  
! /*
!  * Hashtable for keeping track of language-handler plugins
!  */
! 
! typedef struct
! {
! 	char	varName[NAMEDATALEN];	/* Name of GUC variable for this handler      */
! 	char  * varValue;				/* A place to store the value of the variable */
! } pluginHashTabEntry;
  
+ static HTAB *CFuncHash = NULL;
+ static HTAB *pluginHash = NULL;
  
  static void fmgr_info_cxt_security(Oid functionId, FmgrInfo *finfo, MemoryContext mcxt,
  					   bool ignore_security);
***************
*** 81,87 ****
  			  PGFunction user_fn, const Pg_finfo_record *inforec);
  static Datum fmgr_oldstyle(PG_FUNCTION_ARGS);
  static Datum fmgr_security_definer(PG_FUNCTION_ARGS);
! 
  
  /*
   * Lookup routines for builtin-function table.	We can search by either Oid
--- 92,100 ----
  			  PGFunction user_fn, const Pg_finfo_record *inforec);
  static Datum fmgr_oldstyle(PG_FUNCTION_ARGS);
  static Datum fmgr_security_definer(PG_FUNCTION_ARGS);
! static char * findPluginName(const char * varName);
! static void createPluginHash(void);
! static const char * assign_plugin(const char *newval, bool doit, GucSource source);
  
  /*
   * Lookup routines for builtin-function table.	We can search by either Oid
***************
*** 2081,2083 ****
--- 2094,2286 ----
  
  	return argtype;
  }
+ 
+ /*--------------------------------------------------------------------------
+  * Support routines for loadable procedural-language instrumentation plugins
+  *--------------------------------------------------------------------------
+  */
+ 
+ /*
+  * Load a plugin for a procedural-language handler (or just about 
+  * anything else).
+  *
+  * A language handler (such as pl_handler.c) can call this function
+  * to load an instrumentation plugin (typically a structure that 
+  * contains a set of function pointers that the language handler 
+  * will invoke at appropriate points during the execution of a PL
+  * function).  
+  *
+  * 'hooks' will typically point to a language-specific structure
+  * that the plugin will populate with function pointers (and any
+  * other data required by the language handler).
+  *
+  * 'varName' is the name of a GUC variable (such as 'plpgsql.plugin')
+  * that determines the name of the plugin's shared-object library.
+  *
+  * For example, the PL/pgSQL language handler calls load_pl_plugin()
+  * like this:
+  *		PLpgSQL_plugin 	hookFunctions;
+  *
+  *		load_pl_plugin(&hookFunctions, "plpgsql.plugin" );
+  *
+  * Note that load_pl_plugin() allocates space (in pluginHash) for 
+  * the GUC variables so callers don't have to deal with that.
+  *
+  * Also note that there is (currently) no corresponding unload_pl_plugin()
+  * function - if you want to change plugins, you have to change the 
+  * value of the GUC variable, and then LOAD 'plpgsql' again.
+  */
+ 
+ void 
+ load_pl_plugin(void * hooks, char * varName)
+ {
+ 	plugin_loader   loader;
+ 	char		  * pluginName;
+ 
+ 	Assert(hooks != NULL);
+ 	Assert(varName != NULL);
+ 
+ 	/* Ensure that our variable name->plugin name hash exists */
+ 	createPluginHash();
+ 
+ 	/* See if we have a GUC setting for the given variable name */
+ 	if ((pluginName = findPluginName(varName)) == NULL )
+ 		return;
+ 
+ 	/* 
+ 	 * If the pluginName contains a ':' we assume that the user 
+ 	 * gave us the name of the plugin loader function (following
+ 	 * the ':') otherwise, we look for a function whose name is
+ 	 * 'load_plugin'.
+ 	 */
+ 
+ 	if (strstr(pluginName, ":") == NULL)
+ 	{
+ 		loader = (plugin_loader)load_external_function(pluginName, "load_plugin", true, NULL);
+ 	}
+ 	else
+ 	{
+ 		/* 
+ 		 * The user gave us a string of the form 'libName:funcName' - pick it apart,
+ 		 * load the given libName, and search for the given funcName within that 
+ 		 * library
+ 		 */
+ 
+ 		char * libName  = pstrdup(pluginName);
+ 		char * sep      = strstr(libName, ":");
+ 		char * funcName = sep+1;
+ 
+ 		*sep = '\0';
+ 
+ 		loader = (plugin_loader)load_external_function(libName, funcName, true, NULL);
+ 
+ 		pfree(libName);
+ 	}
+ 
+ 	/*
+ 	 * Call the loader function that we found (note - load_external_function()
+ 	 * throws an error if it can't find the loader function so its safe to call
+ 	 * *loader without checking for NULL.
+ 	 */
+ 
+ 	Assert(loader != NULL);
+ 
+ 	(*loader)(hooks);
+ }
+ 
+ /*
+  * Creates the hash table that keeps track of which PL plugins we have loaded.
+  * The key to this hash is the name of the GUC variable (such as 'plpgsql.plugin')
+  * and the data for each entry is simply a char pointer that points to the value
+  * of that variable.
+  *
+  * Note: you can call this function as many times as you like, it only does 
+  * interesting work the first time you call it.
+  */
+ 
+ static void
+ createPluginHash(void)
+ {
+ 	if (pluginHash == NULL)
+ 	{
+ 		HASHCTL	ctl;
+ 
+ 		ctl.keysize   = NAMEDATALEN;
+ 		ctl.entrysize = sizeof( pluginHashTabEntry );
+ 
+ 		pluginHash = hash_create( "PL Plugin Cache", 5, &ctl, HASH_ELEM);
+ 
+ 		Assert(pluginHash != NULL);
+ 	}
+ }
+ 
+ /*
+  * findPluginName() creates a new (custom classed) GUC variable named 'varName'
+  * and returns the value (possibly NULL) of that variable.  Presumably, varName
+  * is the name of a PL plugin library.
+  */
+ 
+ static char *
+ findPluginName(const char * varName)
+ {
+ 	char	             key[NAMEDATALEN] = {0};
+ 	pluginHashTabEntry * hentry;
+ 	bool				 found;
+ 
+ 	Assert(pluginHash != NULL);
+ 	
+ 	/* Add the variable to our pluginHash if it's not already there */
+ 	snprintf(key, NAMEDATALEN - 1, "%s", varName);
+ 
+ 	hentry = (pluginHashTabEntry *) hash_search(pluginHash, key, HASH_ENTER, &found);
+ 
+ 	if (!found)
+ 	{
+ 		/*
+ 		 * We didn't find 'varName' in our hash, tell the GUC mechanism about 
+ 		 * this variable.
+ 		 *
+ 		 * NOTE: We must define a USERSET variable, but we really want an SUSET 
+ 		 * variable.  Apparently, you can't really create SUSET custom-classed 
+ 		 * variables.  So, we create a USERSET variable instead and use an assign-
+ 		 * hook to enforce privileges.
+ 		 */
+ 
+ 		DefineCustomStringVariable( varName,
+ 		    "Name of instrumentation plugin to use when procedural-language function is invoked",
+ 			 NULL, &hentry->varValue, PGC_USERSET, assign_plugin, NULL );
+ 	}
+ 
+ 	return hentry->varValue;
+ }
+ 
+ /*
+  * This function is an assign-hook for custom-classed PL plugin 
+  * GUC variables. We want to prevent non-superusers from loading
+  * arbitrary PL plugins so we check privileges here.
+  */
+ 
+ static const char *
+ assign_plugin(const char *newval, bool doit, GucSource source)
+ {
+ 	if (doit)
+ 	{
+ 		/*
+ 		 * If this plugin name is being modified by a client 
+ 		 * application or by the user (i.e. a SET command),
+ 		 * make sure that we have superuser() privileges.
+ 		 */
+ 		if (source > PGC_S_USER)
+ 		{
+ 			if( !superuser())
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ 						 errmsg("must be superuser to modify plugin name")));
+ 		}
+ 
+ 		return strdup(newval);
+ 	}
+ 
+ 	return newval;
+ }
+ 
Index: src/include/fmgr.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/fmgr.h,v
retrieving revision 1.45
diff -w -c -r1.45 fmgr.h
*** src/include/fmgr.h	31 May 2006 20:58:09 -0000	1.45
--- src/include/fmgr.h	28 Jul 2006 13:22:19 -0000
***************
*** 340,345 ****
--- 340,347 ----
  	int		namedatalen;		/* NAMEDATALEN */
  } Pg_magic_struct;
  
+ typedef void (*plugin_loader)(void *);
+ 
  /* The actual data block contents */
  #define PG_MODULE_MAGIC_DATA \
  { \
***************
*** 480,485 ****
--- 482,489 ----
  extern Oid	get_fn_expr_argtype(FmgrInfo *flinfo, int argnum);
  extern Oid	get_call_expr_argtype(fmNodePtr expr, int argnum);
  
+ extern void load_pl_plugin(void * hooks, char * varName);
+ 
  /*
   * Routines in dfmgr.c
   */
Index: src/pl/plpgsql/src/pl_exec.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/pl/plpgsql/src/pl_exec.c,v
retrieving revision 1.174
diff -w -c -r1.174 pl_exec.c
*** src/pl/plpgsql/src/pl_exec.c	13 Jul 2006 16:49:20 -0000	1.174
--- src/pl/plpgsql/src/pl_exec.c	28 Jul 2006 13:22:22 -0000
***************
*** 255,260 ****
--- 255,266 ----
  	exec_set_found(&estate, false);
  
  	/*
+ 	 * Let the instrumentation plugin peek at this function
+ 	 */
+ 	if (plugin.func_beg)
+ 		plugin.func_beg(&estate, func);
+ 
+ 	/*
  	 * Now call the toplevel block of statements
  	 */
  	estate.err_text = NULL;
***************
*** 389,394 ****
--- 395,407 ----
  		}
  	}
  
+ 	/*
+ 	 * Tell the (optional) plugin that we've finished executing this 
+ 	 * function
+ 	 */
+ 	if (plugin.func_end)
+ 		plugin.func_end(&estate, func);
+ 
  	/* Clean up any leftover temporary memory */
  	FreeExprContext(estate.eval_econtext);
  	estate.eval_econtext = NULL;
***************
*** 583,588 ****
--- 596,607 ----
  	exec_set_found(&estate, false);
  
  	/*
+ 	 * Let the instrumentation plugin peek at this function
+ 	 */
+ 	if (plugin.func_beg)
+ 		plugin.func_beg(&estate, func);
+ 
+ 	/*
  	 * Now call the toplevel block of statements
  	 */
  	estate.err_text = NULL;
***************
*** 635,640 ****
--- 654,666 ----
  		rettup = SPI_copytuple((HeapTuple) (estate.retval));
  	}
  
+ 	/*
+ 	 * Tell the (optional) plugin that we've finished executing this 
+ 	 * function
+ 	 */
+ 	if (plugin.func_end)
+ 		plugin.func_end(&estate, func);
+ 
  	/* Clean up any leftover temporary memory */
  	FreeExprContext(estate.eval_econtext);
  	estate.eval_econtext = NULL;
***************
*** 1039,1044 ****
--- 1065,1074 ----
  	save_estmt = estate->err_stmt;
  	estate->err_stmt = stmt;
  
+ 	/* Let the plugin know that we are about to execute this statement */
+ 	if (plugin.stmt_beg)
+ 		plugin.stmt_beg(estate, stmt);
+ 
  	CHECK_FOR_INTERRUPTS();
  
  	switch (stmt->cmd_type)
***************
*** 1130,1135 ****
--- 1160,1169 ----
  
  	estate->err_stmt = save_estmt;
  
+ 	/* Let the plugin know that we have finished executing this statement */
+ 	if (plugin.stmt_end)
+ 		plugin.stmt_end(estate, stmt);
+ 
  	return rc;
  }
  
***************
*** 2204,2209 ****
--- 2238,2252 ----
  	 * child of simple_eval_estate.
  	 */
  	estate->eval_econtext = CreateExprContext(simple_eval_estate);
+ 
+ 	/*
+ 	 * Let the plugin see this function before we initialize any
+ 	 * local PL/pgSQL variables - note that we also give the plugin
+ 	 * a few function pointers so it can call back into the executor
+ 	 * for doing things like variable assignments and stack traces
+ 	 */
+ 	if (plugin.func_setup)
+ 		plugin.func_setup(estate, func, plpgsql_exec_error_callback, exec_assign_expr);
  }
  
  /* ----------
Index: src/pl/plpgsql/src/pl_handler.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/pl/plpgsql/src/pl_handler.c,v
retrieving revision 1.29
diff -w -c -r1.29 pl_handler.c
*** src/pl/plpgsql/src/pl_handler.c	30 May 2006 22:12:16 -0000	1.29
--- src/pl/plpgsql/src/pl_handler.c	28 Jul 2006 13:22:24 -0000
***************
*** 23,28 ****
--- 23,29 ----
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
  #include "utils/syscache.h"
+ #include "fmgr.h"
  
  extern DLLIMPORT bool check_function_bodies;
  
***************
*** 32,37 ****
--- 33,40 ----
  
  static void plpgsql_init_all(void);
  
+ PLpgSQL_plugin plugin = {0};	/* Hooks for (optional) instrumentation plugin */
+ 
  
  /*
   * plpgsql_init()			- postmaster-startup safe initialization
***************
*** 48,53 ****
--- 51,59 ----
  	plpgsql_HashTableInit();
  	RegisterXactCallback(plpgsql_xact_cb, NULL);
  	plpgsql_firstcall = false;
+ 
+ 	/* Load any plugins identified by the plpgsql.plugin GUC variable */
+ 	load_pl_plugin(&plugin, "plpgsql.plugin");
  }
  
  /*
Index: src/pl/plpgsql/src/plpgsql.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/pl/plpgsql/src/plpgsql.h,v
retrieving revision 1.77
diff -w -c -r1.77 plpgsql.h
*** src/pl/plpgsql/src/plpgsql.h	11 Jul 2006 17:26:59 -0000	1.77
--- src/pl/plpgsql/src/plpgsql.h	28 Jul 2006 13:22:24 -0000
***************
*** 621,629 ****
--- 621,661 ----
  	PLpgSQL_function *err_func; /* current func */
  	PLpgSQL_stmt *err_stmt;		/* current stmt */
  	const char *err_text;		/* additional state info */
+ 	void       *plugin_info;	/* reserved for use by optional plugin */
  } PLpgSQL_execstate;
  
  
+ /*
+  * A PLpgSQL_plugin structure represents an instrumentation plugin.
+  * We keep one of these structures (in pl_handler) and initialize 
+  * it by loading a plugin (if desired). This structure is basically
+  * a collection of function pointers - at various points in pl_exec.c,
+  * we call those functions (if they are non-NULL) to give the plugin
+  * a chance to watch what we are doing.
+  *
+  *	func_setup is called when we start a function, before we've initialized
+  *  the local variables (that is, the PL/pgSQL variables defined by the 
+  *  function).
+  *
+  *  func_beg is called when we start a function, after we've initialized
+  *  the local variables
+  *
+  *  func_end is called at the end of a function
+  *
+  *  stmt_beg and stmt_end are called before and after (respectively) each
+  *  statement
+  */
+ 
+ typedef struct
+ {
+ 	void (*func_setup)(PLpgSQL_execstate * estate, PLpgSQL_function * func, void (*error_callback)(void *arg), void (*assign_expr)( PLpgSQL_execstate *estate, PLpgSQL_datum *target, PLpgSQL_expr *expr));
+ 	void (*func_beg)(PLpgSQL_execstate * estate, PLpgSQL_function * func);
+ 	void (*func_end)(PLpgSQL_execstate * estate, PLpgSQL_function * func);
+ 	void (*stmt_beg)(PLpgSQL_execstate * estate, PLpgSQL_stmt * stmt);
+ 	void (*stmt_end)(PLpgSQL_execstate * estate, PLpgSQL_stmt * stmt);
+ } PLpgSQL_plugin;
+ 
+ 
  /**********************************************************************
   * Global variable declarations
   **********************************************************************/
***************
*** 645,650 ****
--- 677,684 ----
  extern bool plpgsql_check_syntax;
  extern MemoryContext compile_tmp_cxt;
  
+ extern PLpgSQL_plugin plugin;
+ 
  /**********************************************************************
   * Function declarations
   **********************************************************************/
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: korryd@enterprisedb.com (#1)
Re: PL instrumentation plugin support (i.e. PL/pgSQL debugger infrastructure)

"korryd@enterprisedb.com" <korryd@enterprisedb.com> writes:

The attached patch adds support for loadable instrumentation plugins for
procedural languages (as discussed at the anniversary summit). It also
adds plugin support to the PL/pgSQL language handler.

In view of the other patch submitted to support init/fini functions for
shared libraries, I'm inclined to change this one to depend on that;
in particular it seems like we could eliminate the necessity for users
to specify the correct setup-function names. Thoughts?

regards, tom lane

#3korryd@enterprisedb.com
korryd@enterprisedb.com
In reply to: Tom Lane (#2)
Re: PL instrumentation plugin support (i.e. PL/pgSQL

In view of the other patch submitted to support init/fini functions for
shared libraries, I'm inclined to change this one to depend on that;
in particular it seems like we could eliminate the necessity for users
to specify the correct setup-function names. Thoughts?

I think that would be great. Can you point me to the patch you're
referring to? I can convert my patch if you prefer.

-- Korry

--
Korry Douglas korryd@enterprisedb.com
EnterpriseDB http://www.enterprisedb.com

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: korryd@enterprisedb.com (#3)
Plugins redux (was Re: [PATCHES] PL instrumentation plugin support)

Yesterday I suggested that we should redesign the PL plugin patch:
http://archives.postgresql.org/pgsql-patches/2006-07/msg00291.php
in view of the recently-committed patch to add initialization/
finalization support for all dynamic libraries loaded into the backend:
http://archives.postgresql.org/pgsql-committers/2006-08/msg00176.php

That was just handwaving at the time, because I didn't have a concrete
proposal, but here is one.

Overview
--------

ISTM there are two separate bits of functionality that the core backend
needs to provide in support of PL plugins. The first is that we need a
"rendezvous" facility whereby two separately-loaded shared libraries can
connect and exchange data (a struct of function pointers for the immediate
plans for plpgsql, but in general it might be anything). The exchanged
data mustn't be limited to things known in advance to the core backend.
The facility should support loading of communicating libraries in either
order, and it should also allow for unloading of libraries. This seems
easy enough to design: the core backend should basically provide a
hashtable that maps names to "void *" pointer variables, and then two
libraries that want to communicate just agree on a name and the data
structure to be pointed to. (More details below.)

The other, probably more controversial bit of functionality is that there
needs to be a way to cause a backend to load a PL plugin shared library
without any cooperation from the connected client application. For
interactive use of a PL debugger it might be sufficient to tell people to
do "LOAD 'plpgsql_debugger'" before running their function-to-be-tested,
but if you're trying to debug some behavior that only manifests in a large
complicated application, it may be impractical to get the application to
issue such a command. We agreed while discussing this at the PG
Anniversary Conference that providing a GUC variable for this purpose is
probably sufficient, as there are multiple ways that such a variable can
be set on behalf of an application: ALTER USER SET, ALTER DATABASE SET,
"export PGOPTIONS=--var=val" before starting the app, etc. As submitted,
the plugin patch envisions defining a custom GUC variable for each PL and
having each PL add logic to force loading of any shared library mentioned
in its specific variable. I claim however that that is just duplicative,
and that a single backend-wide GUC variable should be sufficient.

Details
-------

I propose adding one function to dfmgr.c,

void ** find_rendezvous_variable(const char *varname)

Internally this function has a per-backend hashtable that associates names
with "void *" data fields. It merely looks up the given name and returns
a pointer to the data field; if the name is not already present then it's
added with a data field initialized to NULL. The hashtable entries
continue to exist for the life of the backend, and are never modified by
the core backend code.

A plugin such as a plpgsql debugger would do this in its _PG_init()
function:

static PLpgSQL_plugin my_plugin = { ... function addresses ... };

PLpgSQL_plugin **var_ptr;

var_ptr = (PLpgSQL_plugin **) find_rendezvous_variable("PLpgSQL_plugin");
*var_ptr = &my_plugin;

and this in its _PG_fini() function:

PLpgSQL_plugin **var_ptr;

var_ptr = (PLpgSQL_plugin **) find_rendezvous_variable("PLpgSQL_plugin");
*var_ptr = NULL;

Meanwhile, plpgsql itself would do this in its _PG_init() function:

static PLpgSQL_plugin **plugin_ptr = NULL;

plugin_ptr = (PLpgSQL_plugin **) find_rendezvous_variable("PLpgSQL_plugin");

and in the places where it wants to pass control to the plugin, it'd do

if (*plugin_ptr)
((*plugin_ptr)->function_field) (... args ...);

The principal difference between this proposal and the original patch
is that this way supports unloading of PL plugins; AFAICS, the original
patch would just crash if the plugin were unloaded and then use of plpgsql
continued. Also, this way doesn't depend on the user to correctly specify
an initialization function within the plugin: as long as you load the
right shared library, everything works.

As for forcing the library load to occur, I propose a new GUC variable
"backend_load_libraries" that is much like the postmaster's
preload_libraries, except that the requested library loads happen
at backend start time instead of in the postmaster. Then we need
write and document the code only once, and there are other possible
uses for it besides PL plugins.

We need to think about security issues in defining such a variable:
we don't want unprivileged users to be able to load arbitrary code
into the server. One solution to this is to make the variable SUSET,
ie, you must be superuser to set it. This would make it impossible
for people to debug their own functions without superuser privileges
... but IIRC actual use of the proposed plpgsql debugger requires
superuser privileges too, so this isn't necessarily a fatal objection.
Still, it'd be nice if the PGOPTIONS way of setting the variable could
be used without requiring that the client app itself run as superuser.
Plan B is to restrict which shared libraries can be loaded using the
variable --- there are different ways we could do that, but my suggestion
would be to insist that the shared library come from "$libdir/plugins/".
The DBA is then responsible for allowing only "safe" shared libraries
to be installed in that directory, and therefore we can allow unprivileged
users to set "backend_load_libraries". (We might as well also allow them
to execute LOAD for these libraries, so that plugins could be switched
intra-session without superuser privileges.)

BTW, is anyone up for renaming the existing "preload_libraries" variable
to "postmaster_load_libraries"? This would be more symmetrical with
"backend_load_libraries", and so perhaps clearer about which does what
when.

Comments?

regards, tom lane

#5Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Tom Lane (#4)
remote query debugging was: Plugins redux

Tom Lane wrote:

The other, probably more controversial bit of functionality is that there
needs to be a way to cause a backend to load a PL plugin shared library
without any cooperation from the connected client application. For
interactive use of a PL debugger it might be sufficient to tell people to
do "LOAD 'plpgsql_debugger'" before running their function-to-be-tested,
but if you're trying to debug some behavior that only manifests in a large
complicated application, it may be impractical to get the application to
issue such a command.

A similar issue applies to plain SQL that's not touching any PL: In the
past, I encountered numerous situations where I'd have liked to take a
peek at the current application's queries (on MSSQL, this can be done
with SQL Profiler), but not have constant statement logging. IMHO it
would be a good idea if

- debugging could be turned on and off on-the-fly, e.g. to skip the app
startup phase. Thus I question the statement "GUC variable is sufficient"
- the mechnism would cover plain SQL too.

Regards,
Andreas

#6Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#4)
Re: Plugins redux (was Re: [PATCHES] PL instrumentation

On Wed, 2006-08-09 at 12:44 -0400, Tom Lane wrote:

A plugin such as a plpgsql debugger would do this in its _PG_init()
function:

static PLpgSQL_plugin my_plugin = { ... function addresses ... };

PLpgSQL_plugin **var_ptr;

var_ptr = (PLpgSQL_plugin **) find_rendezvous_variable("PLpgSQL_plugin");
*var_ptr = &my_plugin;

and this in its _PG_fini() function:

PLpgSQL_plugin **var_ptr;

var_ptr = (PLpgSQL_plugin **) find_rendezvous_variable("PLpgSQL_plugin");
*var_ptr = NULL;

Meanwhile, plpgsql itself would do this in its _PG_init() function:

static PLpgSQL_plugin **plugin_ptr = NULL;

plugin_ptr = (PLpgSQL_plugin **) find_rendezvous_variable("PLpgSQL_plugin");

and in the places where it wants to pass control to the plugin, it'd do

if (*plugin_ptr)
((*plugin_ptr)->function_field) (... args ...);

I know this is a trivial question, but is there some kind of lock that
would prevent the PG_fini for the plpgsql debugger from executing after
"if(*plugin_ptr)" and before "((*plugin_ptr)->function_field)(...)"?

Regards,
Jeff Davis

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#6)
Re: Plugins redux (was Re: [PATCHES] PL instrumentation plugin support)

Jeff Davis <pgsql@j-davis.com> writes:

I know this is a trivial question, but is there some kind of lock that
would prevent the PG_fini for the plpgsql debugger from executing after
"if(*plugin_ptr)" and before "((*plugin_ptr)->function_field)(...)"?

Backends are not multi-threaded.

regards, tom lane

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Pflug (#5)
Re: remote query debugging was: Plugins redux

Andreas Pflug <pgadmin@pse-consulting.de> writes:

Tom Lane wrote:

The other, probably more controversial bit of functionality is that there
needs to be a way to cause a backend to load a PL plugin shared library
without any cooperation from the connected client application.

A similar issue applies to plain SQL that's not touching any PL: In the
past, I encountered numerous situations where I'd have liked to take a
peek at the current application's queries (on MSSQL, this can be done
with SQL Profiler), but not have constant statement logging. IMHO it
would be a good idea if

- debugging could be turned on and off on-the-fly, e.g. to skip the app
startup phase. Thus I question the statement "GUC variable is sufficient"
- the mechnism would cover plain SQL too.

I'd turn that around: I think you are arguing for a way to change GUC
settings on-the-fly for a single existing session, without cooperation
from the client. You can do that now if you're desperate enough (change
postgresql.conf, SIGHUP just that backend, revert postgresql.conf
change) but I agree something easier would be nice. In any case the
problem needs a generic solution not something that only works for PL
plugin loading, so solving it within GUC seems right to me.

regards, tom lane

#9Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Tom Lane (#8)
Re: remote query debugging was: Plugins redux

Tom Lane wrote:

I'd turn that around: I think you are arguing for a way to change GUC
settings on-the-fly for a single existing session, without cooperation
from the client.

Ok, implemented that way would solve it (partially)
Something like pg_set_guc(pid int4, varname text, value text) would be
fine to set GUC on-the-fly. Could probably be signaled to the target
backend with SIGHUP, but how should the individual parameter be
transmitted, and eventually be retrieved? What about multiple parameters
to be set atomically?

A different aproach: A system table pg_guc, that holds current GUC
settings for each backend.
- on SIGHUP, the backend reload postgresql.conf as usual and writes guc
into pg_guc, unless a "config file override" flag is set.
- if pg_guc.config_override is set, guc are read from the table instead,
and the flag is reset.
- truncate pg_guc on postmaster start/restart

Regards,
Andreas

PS the non-solved part for me is still that log_statement logging would
still go to the standard log, in a less machine-readable way, mixed with
other backend's data and possibly truncated. But that's a different story.

#10korryd@enterprisedb.com
korryd@enterprisedb.com
In reply to: Tom Lane (#4)
Re: Plugins redux (was Re: [PATCHES] PL instrumentation plugin

ISTM there are two separate bits of functionality that the core backend
needs to provide in support of PL plugins. The first is that we need a
"rendezvous" facility whereby two separately-loaded shared libraries can
connect and exchange data (a struct of function pointers for the immediate
plans for plpgsql, but in general it might be anything). The exchanged
data mustn't be limited to things known in advance to the core backend.
The facility should support loading of communicating libraries in either
order, and it should also allow for unloading of libraries. This seems
easy enough to design: the core backend should basically provide a
hashtable that maps names to "void *" pointer variables, and then two
libraries that want to communicate just agree on a name and the data
structure to be pointed to. (More details below.)

I like it so far...

The other, probably more controversial bit of functionality is that there
needs to be a way to cause a backend to load a PL plugin shared library
without any cooperation from the connected client application. For
interactive use of a PL debugger it might be sufficient to tell people to
do "LOAD 'plpgsql_debugger'" before running their function-to-be-tested,
but if you're trying to debug some behavior that only manifests in a large
complicated application, it may be impractical to get the application to
issue such a command.

Ok, but you should know that the PL/pgSQL debugger already handles this
without any cooperation from the client.

Loading the PL debugger means that your backend becomes debuggable, not
that it actually starts debugging.

If you want to debug a PL/pgSQL function invoked by some other process
(such as a Web server), you create a "global breakpoint". When a
debuggable process (that is, a process that has loaded the debugger
plugin) trips across the breakpoint, that's when it starts debugging.
(Note: a global breakpoint can identify a particular backend, by process
ID, or it can omit the process ID and trap the first debuggable backend
that trips across the breakpoint).

So, other than the (negligible) performance penalty, its safe to load
the debugger plugin into every backend. (If you've loaded the debugger,
the extra overhead is a single, lightweight-lock protected,
shared-memory hash lookup for each PL function invocation).

We agreed while discussing this at the PG
Anniversary Conference that providing a GUC variable for this purpose is
probably sufficient, as there are multiple ways that such a variable can
be set on behalf of an application: ALTER USER SET, ALTER DATABASE SET,
"export PGOPTIONS=--var=val" before starting the app, etc. As submitted,
the plugin patch envisions defining a custom GUC variable for each PL and
having each PL add logic to force loading of any shared library mentioned
in its specific variable. I claim however that that is just duplicative,
and that a single backend-wide GUC variable should be sufficient.

Ok, so the rendevous variables replace the per-language GUC variables
(from my patch). In effect, a GUC variable is something that a user
would manage, a rendevous variable is something that two (or more)
pieces of code manage.

GUC variables are visible to the user, rendevous variables are visible
to dynamically loaded libraries.

Oh, and rendevous variables are insensitve to load order too.

Nice design.

The principal difference between this proposal and the original patch
is that this way supports unloading of PL plugins; AFAICS, the original
patch would just crash if the plugin were unloaded and then use of plpgsql
continued.

I don't think it could crash because there's no way to unload a plugin
(there is no UNLOAD statement, is there?).

Which reminds me, you haven't proposed a way to unload a shared-library.

We need to think about security issues in defining such a variable:
we don't want unprivileged users to be able to load arbitrary code
into the server. One solution to this is to make the variable SUSET,
ie, you must be superuser to set it. This would make it impossible
for people to debug their own functions without superuser privileges
... but IIRC actual use of the proposed plpgsql debugger requires
superuser privileges too, so this isn't necessarily a fatal objection.

Correct (the debugger currently requires superuser privileges). We'll
probably want to relax that later.

Still, it'd be nice if the PGOPTIONS way of setting the variable could
be used without requiring that the client app itself run as superuser.
Plan B is to restrict which shared libraries can be loaded using the
variable --- there are different ways we could do that, but my suggestion
would be to insist that the shared library come from "$libdir/plugins/".
The DBA is then responsible for allowing only "safe" shared libraries
to be installed in that directory, and therefore we can allow unprivileged
users to set "backend_load_libraries". (We might as well also allow them
to execute LOAD for these libraries, so that plugins could be switched
intra-session without superuser privileges.)

How about a combination of plan A and plan B? Make
backend_load_libraries a USERSET variable, but you can't *add* libraries
outside of $libdir/plugins/ unless you are a superuser.

BTW, is anyone up for renaming the existing "preload_libraries" variable
to "postmaster_load_libraries"? This would be more symmetrical with
"backend_load_libraries", and so perhaps clearer about which does what

Makes sense to me, of course that breaks existing postgresql.conf files.

Do you want me to do any of this coding?

--
Korry Douglas korryd@enterprisedb.com
EnterpriseDB http://www.enterprisedb.com

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: korryd@enterprisedb.com (#10)
Re: Plugins redux (was Re: [PATCHES] PL instrumentation plugin

"korryd@enterprisedb.com" <korryd@enterprisedb.com> writes:

The other, probably more controversial bit of functionality is that there
needs to be a way to cause a backend to load a PL plugin shared library
without any cooperation from the connected client application.

Ok, but you should know that the PL/pgSQL debugger already handles this
without any cooperation from the client.
Loading the PL debugger means that your backend becomes debuggable, not
that it actually starts debugging.

Right, the question is how do you load it, assuming that you do not want
to enable it for every single session in your database.

So, other than the (negligible) performance penalty, its safe to load
the debugger plugin into every backend. (If you've loaded the debugger,
the extra overhead is a single, lightweight-lock protected,
shared-memory hash lookup for each PL function invocation).

I'm not sure you'll be able to convince everyone that the penalty is so
negligible --- any high-rate access to shared memory is potentially very
expensive, see nearby threads for examples. Even if this is affordable
for the debugger, what of more-invasive plugins such as the performance
monitor? I think a credible general-purpose plugin design has to
address the problem of enabling plugins on-the-fly.

I don't think it could crash because there's no way to unload a plugin
(there is no UNLOAD statement, is there?).

What we actually have at the moment is that you can LOAD a library
again, which causes an unload of the prior version and then loading the
new. I suppose this feature was intended to speed library development by
letting you recompile and then update into your existing backend session.
Not sure how many people are using it, but it's there ...

Which reminds me, you haven't proposed a way to unload a shared-library.

This depends on the semantics we want to assign to
backend_shared_libraries --- I could imagine defining it as "if you
remove an entry from the value then we'll unload that library".

How about a combination of plan A and plan B? Make
backend_load_libraries a USERSET variable, but you can't *add* libraries
outside of $libdir/plugins/ unless you are a superuser.

Not sure how easy that is (ie, can we track which part of the list came
from where), but if doable it'd be OK with me. We might have to split
it into two list variables to make it work, and I'm not sure it's worth
the complication.

BTW, is anyone up for renaming the existing "preload_libraries" variable
to "postmaster_load_libraries"? This would be more symmetrical with
"backend_load_libraries", and so perhaps clearer about which does what

Makes sense to me, of course that breaks existing postgresql.conf files.

We already broke them by removing the init-function name...

Do you want me to do any of this coding?

Up to you --- I can do it if you don't want to.

regards, tom lane

#12korryd@enterprisedb.com
korryd@enterprisedb.com
In reply to: Tom Lane (#11)
Re: Plugins redux (was Re: [PATCHES] PL instrumentation

I'm not sure you'll be able to convince everyone that the penalty is so
negligible --- any high-rate access to shared memory is potentially very
expensive, see nearby threads for examples. Even if this is affordable
for the debugger, what of more-invasive plugins such as the performance
monitor? I think a credible general-purpose plugin design has to
address the problem of enabling plugins on-the-fly.

I have no problem with your solution - I think its a very nice design.

I don't think it could crash because there's no way to unload a plugin
(there is no UNLOAD statement, is there?).

What we actually have at the moment is that you can LOAD a library
again, which causes an unload of the prior version and then loading the
new. I suppose this feature was intended to speed library development by
letting you recompile and then update into your existing backend session.
Not sure how many people are using it, but it's there ...

Right, but you still end up with a plugin loaded afterwards so no crash
(of course you could do something stupid like load a new plugin with the
same name that isn't really a plugin).

Which reminds me, you haven't proposed a way to unload a shared-library.

This depends on the semantics we want to assign to
backend_shared_libraries --- I could imagine defining it as "if you
remove an entry from the value then we'll unload that library".

That's what I was thinking too.

How about a combination of plan A and plan B? Make
backend_load_libraries a USERSET variable, but you can't *add* libraries
outside of $libdir/plugins/ unless you are a superuser.

Not sure how easy that is (ie, can we track which part of the list came
from where), but if doable it'd be OK with me. We might have to split
it into two list variables to make it work, and I'm not sure it's worth
the complication.

The GUC assign hook would parse through backend_load_libraries...

for each library,
{
if ( the library is already loaded )
{
// it's not a new library, doesn't matter where it lives,
doesn't matter if we're a superuser

load( library)
}
else
{
// it's a new entry in backed_load_libraries

if( library lives in $libdir/plugins )
load( library )
else
{
if( is_superuser())
load( library )
else
throw an error
}
}
}

We already broke them by removing the init-function name...

Right...

Do you want me to do any of this coding?

Up to you --- I can do it if you don't want to.

I'll take a stab at it... thanks for your help so far.

-- Korry

--
Korry Douglas korryd@enterprisedb.com
EnterpriseDB http://www.enterprisedb.com

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: korryd@enterprisedb.com (#12)
Re: Plugins redux (was Re: [PATCHES] PL instrumentation

"korryd@enterprisedb.com" <korryd@enterprisedb.com> writes:

(there is no UNLOAD statement, is there?).

What we actually have at the moment is that you can LOAD a library
again, which causes an unload of the prior version and then loading the
new.

Right, but you still end up with a plugin loaded afterwards so no crash
(of course you could do something stupid like load a new plugin with the
same name that isn't really a plugin).

That is only true given careful design and implementation of the
hooks. Right now for instance I think it's possible to crash the
backend by doing "LOAD 'plpgsql'" multiple times, because it hooks
into CallXactCallbacks and doesn't unhook. (Now that we have PG_fini
it should be possible to fix that...) Doesn't seem to crash
on the HPUX machine I just tried it on, but maybe HPUX is weird and
doesn't actually remove the old library from the address space?

Anyway I disagree with your proposal to let unprivileged users re-LOAD
random libraries. If they've not been modified to have clean unload
semantics this isn't safe.

regards, tom lane

#14Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#11)
Re: Plugins redux (was Re: [PATCHES] PL instrumentation

On Wed, 2006-08-09 at 18:04 -0400, Tom Lane wrote:

BTW, is anyone up for renaming the existing "preload_libraries"

variable

to "postmaster_load_libraries"? This would be more symmetrical

with

"backend_load_libraries", and so perhaps clearer about which does

what

Makes sense to me, of course that breaks existing postgresql.conf

files.

We already broke them by removing the init-function name...

How about:

shared_load_libraries (rather than postmaster_load_libraries)
local_load_libraries (rather than backend_load_libraries)

which is more informative for people that don't know/care what a
postmaster or a backend is and what implications those phrases carry.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#15korryd@enterprisedb.com
korryd@enterprisedb.com
In reply to: Tom Lane (#4)
Re: Plugins redux (was Re: [PATCHES] PL instrumentation plugin

As for forcing the library load to occur, I propose a new GUC variable
"backend_load_libraries" that is much like the postmaster's
preload_libraries, except that the requested library loads happen
at backend start time instead of in the postmaster. Then we need
write and document the code only once, and there are other possible
uses for it besides PL plugins.

Any thoughts about where to put the call to process_backend_libraries()
(the new function to handle backend_load_libraries)?

I'm thinking that it should go in PostgresMain(), just after (before?)
the call to BeginReportingGUCOptions() - by that time, we know whether
we are a superuser and we have processed all GUC options.

Also, should we create an on_proc_exit() handler that would unload all
dynamic libraries (specifically to call the _PG_fini() functions)?

-- Korry

--
Korry Douglas korryd@enterprisedb.com
EnterpriseDB http://www.enterprisedb.com

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: korryd@enterprisedb.com (#15)
Re: Plugins redux (was Re: [PATCHES] PL instrumentation plugin

"korryd@enterprisedb.com" <korryd@enterprisedb.com> writes:

Any thoughts about where to put the call to process_backend_libraries()
(the new function to handle backend_load_libraries)?
I'm thinking that it should go in PostgresMain(), just after (before?)
the call to BeginReportingGUCOptions() - by that time, we know whether
we are a superuser and we have processed all GUC options.

Sounds fine.

Also, should we create an on_proc_exit() handler that would unload all
dynamic libraries (specifically to call the _PG_fini() functions)?

Yeah, I thought about that too, but I'm inclined not to do it;
it seems like just excess cycles. The process is quitting anyway,
so the only reason this would be useful is if the library thinks it's
going to update external or shared state during _PG_fini ... and on
the whole that sounds like a bad idea. Besides, if a library really
needs this it can add its own on_proc_exit handler.

regards, tom lane

#17korryd@enterprisedb.com
korryd@enterprisedb.com
In reply to: Tom Lane (#16)
Re: Plugins redux (was Re: [PATCHES] PL instrumentation

Also, should we create an on_proc_exit() handler that would unload all
dynamic libraries (specifically to call the _PG_fini() functions)?

Yeah, I thought about that too, but I'm inclined not to do it;
it seems like just excess cycles. The process is quitting anyway,
so the only reason this would be useful is if the library thinks it's
going to update external or shared state during _PG_fini ... and on
the whole that sounds like a bad idea. Besides, if a library really
needs this it can add its own on_proc_exit handler.

It seems a little dangerous for a dynamic library to register an
on_proc_exit() handler. If we ever add support for unloading a dynamic
library, we'll have to add an unregister_on_proc_exit() too. Otherwise,
a dynamic library might register a function pointer with on_proc_exit()
and then leave a dangling pointer when it gets unloaded.

Given that, I assume you don't feel the need to unload old shared
libraries if the user (a superuser) removes an entry from
backend_load_libraries, right?

In fact, it looks _PG_fini() is only called if you *reload* a library,
unless I'm missing something somwhere.

--
Korry Douglas korryd@enterprisedb.com
EnterpriseDB http://www.enterprisedb.com

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: korryd@enterprisedb.com (#17)
Re: Plugins redux (was Re: [PATCHES] PL instrumentation

"korryd@enterprisedb.com" <korryd@enterprisedb.com> writes:

It seems a little dangerous for a dynamic library to register an
on_proc_exit() handler. If we ever add support for unloading a dynamic
library, we'll have to add an unregister_on_proc_exit() too.

True, but there might be other uses for such a thing anyway. I can't
see any objection to adding that routine.

In fact, it looks _PG_fini() is only called if you *reload* a library,
unless I'm missing something somwhere.

Sure, because we don't currently have a routine for unloading a library
without immediately reloading it.

regards, tom lane