PL/Python initialization cleanup

Started by Peter Eisentraut12 days ago3 messages
#1Peter Eisentraut
peter@eisentraut.org
4 attachment(s)

As I was working through steps to make PL/Python more thread-safe, I
noticed that the initialization code of PL/Python is pretty messy. I
think some of this has grown while both Python 2 and 3 were supported,
because they required different initialization steps, and we had some
defenses against accidentally running both at the same time. But that
is over, and right now a lot of this doesn't make sense anymore. For
example, the function PLy_init_interp() said "Initialize the Python
interpreter ..." but it didn't actually do this, and PLy_init_plpy()
said "initialize plpy module" but it didn't do that either (or at least
they used the term "initialize" in non-standard ways).

Here are some patches to clean this up. After this change, all the
global initialization is called directly from _PG_init(), and the plpy
module initialization is all called from its registered initialization
function PyInit_plpy(). (For the thread-safety job, the plpy module
initialization will need to be rewritten using a different API. That's
why I'm keen to have it clearly separated.) I also tried to add more
comments and make existing comments more precise. There was also some
apparently obsolete or redundant code that could be deleted.

Surely, all of this will need some more rounds of careful scrutiny, but
I think the overall code arrangement is correct and an improvement.

Attachments:

v1-0001-plpython-Remove-commented-out-code.patchtext/plain; charset=UTF-8; name=v1-0001-plpython-Remove-commented-out-code.patchDownload
From d07fa3f6d7f729ca457a4db1060ef934fa5ba2ef Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Tue, 30 Dec 2025 22:17:57 +0100
Subject: [PATCH v1 1/4] plpython: Remove commented out code

This code has been commented out since the first commit of plpython.
It doesn't seem worth keeping.
---
 src/pl/plpython/plpy_plpymodule.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/pl/plpython/plpy_plpymodule.c b/src/pl/plpython/plpy_plpymodule.c
index 89931612c5b..66781291519 100644
--- a/src/pl/plpython/plpy_plpymodule.c
+++ b/src/pl/plpython/plpy_plpymodule.c
@@ -153,8 +153,6 @@ PLy_init_plpy(void)
 
 	PyModule_Create(&PLy_module);
 
-	/* PyDict_SetItemString(plpy, "PlanType", (PyObject *) &PLy_PlanType); */
-
 	/*
 	 * initialize main module, and add plpy
 	 */

base-commit: ffdcc9c638eeb71d2e6990e27f91623736603b58
-- 
2.52.0

v1-0002-plpython-Clean-up-PyModule_AddObject-uses.patchtext/plain; charset=UTF-8; name=v1-0002-plpython-Clean-up-PyModule_AddObject-uses.patchDownload
From a174797efde3b5c0f7ead1fa2a89fde0358ceeed Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Tue, 30 Dec 2025 22:17:57 +0100
Subject: [PATCH v1 2/4] plpython: Clean up PyModule_AddObject() uses

The comments "PyModule_AddObject does not add a refcount to the
object, for some odd reason" seem distracting.  Arguably, this
behavior is expected, not odd.  Also, the additional references
created by the existing code do no seem necessary.  But we should
clean up the reference in the error case, as suggested by the Python
documentation.
---
 src/pl/plpython/plpy_plpymodule.c | 39 ++++++++++++++-----------------
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/src/pl/plpython/plpy_plpymodule.c b/src/pl/plpython/plpy_plpymodule.c
index 66781291519..194739c3db6 100644
--- a/src/pl/plpython/plpy_plpymodule.c
+++ b/src/pl/plpython/plpy_plpymodule.c
@@ -172,18 +172,6 @@ PLy_add_exceptions(PyObject *plpy)
 	PyObject   *excmod;
 	HASHCTL		hash_ctl;
 
-	excmod = PyModule_Create(&PLy_exc_module);
-	if (excmod == NULL)
-		PLy_elog(ERROR, "could not create the spiexceptions module");
-
-	/*
-	 * PyModule_AddObject does not add a refcount to the object, for some odd
-	 * reason; we must do that.
-	 */
-	Py_INCREF(excmod);
-	if (PyModule_AddObject(plpy, "spiexceptions", excmod) < 0)
-		PLy_elog(ERROR, "could not add the spiexceptions module");
-
 	PLy_exc_error = PLy_create_exception("plpy.Error", NULL, NULL,
 										 "Error", plpy);
 	PLy_exc_fatal = PLy_create_exception("plpy.Fatal", NULL, NULL,
@@ -191,12 +179,22 @@ PLy_add_exceptions(PyObject *plpy)
 	PLy_exc_spi_error = PLy_create_exception("plpy.SPIError", NULL, NULL,
 											 "SPIError", plpy);
 
+	excmod = PyModule_Create(&PLy_exc_module);
+	if (excmod == NULL)
+		PLy_elog(ERROR, "could not create the spiexceptions module");
+
 	hash_ctl.keysize = sizeof(int);
 	hash_ctl.entrysize = sizeof(PLyExceptionEntry);
 	PLy_spi_exceptions = hash_create("PL/Python SPI exceptions", 256,
 									 &hash_ctl, HASH_ELEM | HASH_BLOBS);
 
 	PLy_generate_spi_exceptions(excmod, PLy_exc_spi_error);
+
+	if (PyModule_AddObject(plpy, "spiexceptions", excmod) < 0)
+	{
+		Py_XDECREF(excmod);
+		PLy_elog(ERROR, "could not add the spiexceptions module");
+	}
 }
 
 /*
@@ -212,19 +210,18 @@ PLy_create_exception(char *name, PyObject *base, PyObject *dict,
 	if (exc == NULL)
 		PLy_elog(ERROR, NULL);
 
-	/*
-	 * PyModule_AddObject does not add a refcount to the object, for some odd
-	 * reason; we must do that.
-	 */
-	Py_INCREF(exc);
-	PyModule_AddObject(mod, modname, exc);
-
 	/*
 	 * The caller will also store a pointer to the exception object in some
-	 * permanent variable, so add another ref to account for that.  This is
-	 * probably excessively paranoid, but let's be sure.
+	 * permanent variable, so add another ref to account for that.
 	 */
 	Py_INCREF(exc);
+
+	if (PyModule_AddObject(mod, modname, exc) < 0)
+	{
+		Py_XDECREF(exc);
+		PLy_elog(ERROR, "could not add exceptions %s", name);
+	}
+
 	return exc;
 }
 
-- 
2.52.0

v1-0003-plpython-Remove-duplicate-PyModule_Create.patchtext/plain; charset=UTF-8; name=v1-0003-plpython-Remove-duplicate-PyModule_Create.patchDownload
From a9d8b839306be227a1adce0c4c3605fc3114736b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Tue, 30 Dec 2025 22:17:57 +0100
Subject: [PATCH v1 3/4] plpython: Remove duplicate PyModule_Create()

This seems to have existed like this since Python 3 support was
added (commit dd4cd55c158), but it's unclear what this second call is
supposed to accomplish.
---
 src/pl/plpython/plpy_plpymodule.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/pl/plpython/plpy_plpymodule.c b/src/pl/plpython/plpy_plpymodule.c
index 194739c3db6..2d76c6a5573 100644
--- a/src/pl/plpython/plpy_plpymodule.c
+++ b/src/pl/plpython/plpy_plpymodule.c
@@ -151,8 +151,6 @@ PLy_init_plpy(void)
 	PLy_subtransaction_init_type();
 	PLy_cursor_init_type();
 
-	PyModule_Create(&PLy_module);
-
 	/*
 	 * initialize main module, and add plpy
 	 */
-- 
2.52.0

v1-0004-plpython-Streamline-initialization.patchtext/plain; charset=UTF-8; name=v1-0004-plpython-Streamline-initialization.patchDownload
From 247ea7aa96df56253754e88c17974488ea09e39b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Tue, 30 Dec 2025 22:17:57 +0100
Subject: [PATCH v1 4/4] plpython: Streamline initialization

The initialization of PL/Python (the Python interpreter, the global
state, the plpy module) was arranged confusingly across different
functions with unclear and confusing boundaries.  For example,
PLy_init_interp() said "Initialize the Python interpreter ..." but it
didn't actually do this, and PLy_init_plpy() said "initialize plpy
module" but it didn't do that either.  After this change, all the
global initialization is called directly from _PG_init(), and the plpy
module initialization is all called from its registered initialization
function PyInit_plpy().
---
 src/pl/plpython/plpy_main.c       | 77 ++++++++++++++++---------------
 src/pl/plpython/plpy_plpymodule.c | 25 +---------
 src/pl/plpython/plpy_plpymodule.h |  1 -
 3 files changed, 40 insertions(+), 63 deletions(-)

diff --git a/src/pl/plpython/plpy_main.c b/src/pl/plpython/plpy_main.c
index 7673b5eca7b..482ca7a1ada 100644
--- a/src/pl/plpython/plpy_main.c
+++ b/src/pl/plpython/plpy_main.c
@@ -39,11 +39,9 @@ PG_FUNCTION_INFO_V1(plpython3_call_handler);
 PG_FUNCTION_INFO_V1(plpython3_inline_handler);
 
 
-static void PLy_initialize(void);
 static PLyTrigType PLy_procedure_is_trigger(Form_pg_proc procStruct);
 static void plpython_error_callback(void *arg);
 static void plpython_inline_error_callback(void *arg);
-static void PLy_init_interp(void);
 
 static PLyExecutionContext *PLy_push_execution_context(bool atomic_context);
 static void PLy_pop_execution_context(void);
@@ -58,25 +56,52 @@ static PLyExecutionContext *PLy_execution_contexts = NULL;
 void
 _PG_init(void)
 {
-	pg_bindtextdomain(TEXTDOMAIN);
+	PyObject   *main_mod;
+	PyObject   *main_dict;
+	PyObject   *GD;
+	PyObject   *plpy_mod;
 
-	PLy_initialize();
-}
+	pg_bindtextdomain(TEXTDOMAIN);
 
-/*
- * Perform one-time setup of PL/Python.
- */
-static void
-PLy_initialize(void)
-{
+	/* Add plpy to table of built-in modules. */
 	PyImport_AppendInittab("plpy", PyInit_plpy);
+
+	/* Initialize Python interpreter. */
 	Py_Initialize();
-	PyImport_ImportModule("plpy");
-	PLy_init_interp();
-	PLy_init_plpy();
+
+	main_mod = PyImport_AddModule("__main__");
+	if (main_mod == NULL || PyErr_Occurred())
+		PLy_elog(ERROR, "could not get \"__main__\" module");
+	Py_INCREF(main_mod);
+
+	main_dict = PyModule_GetDict(main_mod);
+
+	/*
+	 * Set up GD.
+	 */
+	GD = PyDict_New();
+	if (GD == NULL)
+		PLy_elog(ERROR, NULL);
+	PyDict_SetItemString(main_dict, "GD", GD);
+
+	/*
+	 * Import plpy.
+	 */
+	plpy_mod = PyImport_ImportModule("plpy");
+	if (plpy_mod == NULL)
+		PLy_elog(ERROR, "could not import \"plpy\" module");
+	PyDict_SetItemString(main_dict, "plpy", plpy_mod);
+	if (PyErr_Occurred())
+		PLy_elog(ERROR, "could not import \"plpy\" module");
+
+	Py_DECREF(main_mod);
+
 	if (PyErr_Occurred())
 		PLy_elog(FATAL, "untrapped error in initialization");
 
+	Py_INCREF(main_dict);
+	PLy_interp_globals = main_dict;
+
 	init_procedure_caches();
 
 	explicit_subtransactions = NIL;
@@ -84,30 +109,6 @@ PLy_initialize(void)
 	PLy_execution_contexts = NULL;
 }
 
-/*
- * This should be called only once, from PLy_initialize. Initialize the Python
- * interpreter and global data.
- */
-static void
-PLy_init_interp(void)
-{
-	static PyObject *PLy_interp_safe_globals = NULL;
-	PyObject   *mainmod;
-
-	mainmod = PyImport_AddModule("__main__");
-	if (mainmod == NULL || PyErr_Occurred())
-		PLy_elog(ERROR, "could not import \"__main__\" module");
-	Py_INCREF(mainmod);
-	PLy_interp_globals = PyModule_GetDict(mainmod);
-	PLy_interp_safe_globals = PyDict_New();
-	if (PLy_interp_safe_globals == NULL)
-		PLy_elog(ERROR, NULL);
-	PyDict_SetItemString(PLy_interp_globals, "GD", PLy_interp_safe_globals);
-	Py_DECREF(mainmod);
-	if (PLy_interp_globals == NULL || PyErr_Occurred())
-		PLy_elog(ERROR, "could not initialize globals");
-}
-
 Datum
 plpython3_validator(PG_FUNCTION_ARGS)
 {
diff --git a/src/pl/plpython/plpy_plpymodule.c b/src/pl/plpython/plpy_plpymodule.c
index 2d76c6a5573..e24ad02e086 100644
--- a/src/pl/plpython/plpy_plpymodule.c
+++ b/src/pl/plpython/plpy_plpymodule.c
@@ -133,35 +133,12 @@ PyInit_plpy(void)
 
 	PLy_add_exceptions(m);
 
-	return m;
-}
-
-void
-PLy_init_plpy(void)
-{
-	PyObject   *main_mod,
-			   *main_dict,
-			   *plpy_mod;
-
-	/*
-	 * initialize plpy module
-	 */
 	PLy_plan_init_type();
 	PLy_result_init_type();
 	PLy_subtransaction_init_type();
 	PLy_cursor_init_type();
 
-	/*
-	 * initialize main module, and add plpy
-	 */
-	main_mod = PyImport_AddModule("__main__");
-	main_dict = PyModule_GetDict(main_mod);
-	plpy_mod = PyImport_AddModule("plpy");
-	if (plpy_mod == NULL)
-		PLy_elog(ERROR, "could not import \"plpy\" module");
-	PyDict_SetItemString(main_dict, "plpy", plpy_mod);
-	if (PyErr_Occurred())
-		PLy_elog(ERROR, "could not import \"plpy\" module");
+	return m;
 }
 
 static void
diff --git a/src/pl/plpython/plpy_plpymodule.h b/src/pl/plpython/plpy_plpymodule.h
index 1ca3823daf2..88f902346a6 100644
--- a/src/pl/plpython/plpy_plpymodule.h
+++ b/src/pl/plpython/plpy_plpymodule.h
@@ -13,6 +13,5 @@ extern HTAB *PLy_spi_exceptions;
 
 
 PyMODINIT_FUNC PyInit_plpy(void);
-extern void PLy_init_plpy(void);
 
 #endif							/* PLPY_PLPYMODULE_H */
-- 
2.52.0

#2Chao Li
li.evan.chao@gmail.com
In reply to: Peter Eisentraut (#1)
Re: PL/Python initialization cleanup

On Dec 31, 2025, at 16:47, Peter Eisentraut <peter@eisentraut.org> wrote:

As I was working through steps to make PL/Python more thread-safe, I noticed that the initialization code of PL/Python is pretty messy. I think some of this has grown while both Python 2 and 3 were supported, because they required different initialization steps, and we had some defenses against accidentally running both at the same time. But that is over, and right now a lot of this doesn't make sense anymore. For example, the function PLy_init_interp() said "Initialize the Python interpreter ..." but it didn't actually do this, and PLy_init_plpy() said "initialize plpy module" but it didn't do that either (or at least they used the term "initialize" in non-standard ways).

Here are some patches to clean this up. After this change, all the global initialization is called directly from _PG_init(), and the plpy module initialization is all called from its registered initialization function PyInit_plpy(). (For the thread-safety job, the plpy module initialization will need to be rewritten using a different API. That's why I'm keen to have it clearly separated.) I also tried to add more comments and make existing comments more precise. There was also some apparently obsolete or redundant code that could be deleted.

Surely, all of this will need some more rounds of careful scrutiny, but I think the overall code arrangement is correct and an improvement.
<v1-0001-plpython-Remove-commented-out-code.patch><v1-0002-plpython-Clean-up-PyModule_AddObject-uses.patch><v1-0003-plpython-Remove-duplicate-PyModule_Create.patch><v1-0004-plpython-Streamline-initialization.patch>

I just did an eyeball review. Overall looks good to me. The cleanup, as explained in the patch email, makes sense to me. Only a nit comment on 0002:

1 - 0002
```
+	if (PyModule_AddObject(mod, modname, exc) < 0)
+	{
+		Py_XDECREF(exc);
+		PLy_elog(ERROR, "could not add exceptions %s", name);
+	}
```

Plural “exceptions” is a little confusing. What about “could not add exception object”?

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#3Matheus Alcantara
matheusssilv97@gmail.com
In reply to: Peter Eisentraut (#1)
Re: PL/Python initialization cleanup

On Wed Dec 31, 2025 at 5:47 AM -03, Peter Eisentraut wrote:

As I was working through steps to make PL/Python more thread-safe, I
noticed that the initialization code of PL/Python is pretty messy. I
think some of this has grown while both Python 2 and 3 were supported,
because they required different initialization steps, and we had some
defenses against accidentally running both at the same time. But that
is over, and right now a lot of this doesn't make sense anymore. For
example, the function PLy_init_interp() said "Initialize the Python
interpreter ..." but it didn't actually do this, and PLy_init_plpy()
said "initialize plpy module" but it didn't do that either (or at least
they used the term "initialize" in non-standard ways).

Here are some patches to clean this up. After this change, all the
global initialization is called directly from _PG_init(), and the plpy
module initialization is all called from its registered initialization
function PyInit_plpy(). (For the thread-safety job, the plpy module
initialization will need to be rewritten using a different API. That's
why I'm keen to have it clearly separated.) I also tried to add more
comments and make existing comments more precise. There was also some
apparently obsolete or redundant code that could be deleted.

Surely, all of this will need some more rounds of careful scrutiny, but
I think the overall code arrangement is correct and an improvement.

0001, 0003 and 0004 looks good to me, just a small comment on 0002:

-	/*
-	 * PyModule_AddObject does not add a refcount to the object, for some odd
-	 * reason; we must do that.
-	 */
-	Py_INCREF(exc);
-	PyModule_AddObject(mod, modname, exc);
-
 	/*
 	 * The caller will also store a pointer to the exception object in some
-	 * permanent variable, so add another ref to account for that.  This is
-	 * probably excessively paranoid, but let's be sure.
+	 * permanent variable, so add another ref to account for that.
 	 */
 	Py_INCREF(exc);

The later code comment say "so add another ref to account for that", but
you've removed the previous Py_INCREF() call. The returned object
PyErr_NewException() already have a refcount increased for usage? If
that's not the case I think that the "add another ref..." don't seems
correct because IIUC we are increasing the ref count for the first time,
so there is no "another" refcount being increased.

--
Matheus Alcantara
EDB: https://www.enterprisedb.com