Tightening behaviour for non-immutable behaviour in immutable functions
Recently I had someone complaining about a pg_restore failure and I
believe we semi-regularly get complaints that are similar -- though
I'm having trouble searching for them because the keywords "dump
restore failure" are pretty generic.
The root cause here -- and I believe for a lot of users -- are
functions that are declared IMMUTABLE but are not for reasons that
aren't obvious to the user. Indeed poking at this more carefully I
think it's downright challenging to write an IMMUTABLE function at
all. I suspect *most* users, perhaps even nearly all users, who write
functions intending them to be immutable are actually not really as
successful as they believe.
The biggest culprit is of course search_path. Afaics it's nigh
impossible to write any non-trivial immutable function without just
setting the search_path GUC on the function. And there's nothing
Postgres that requires that. I don't even see anything in the docs
recommending it.
Many users probably always run with the same search_path so in
practice they're probably mostly safe. But one day they could insert
data with a different search path with a different function definition
in their path and corrupt their index which would be.... poor... Or as
in my user they could discover the problem only in the middle of an
upgrade which is a terrible time to discover it.
I would suggest we should probably at the very least warn users if
they create an immutable function that doesn't have search_path set
for the function. I would actually prefer to make it an error but that
brings in compatibility issues. Perhaps it could be optional.
But putting a GUC on a function imposes a pretty heavy performance
cost. I'm not sure how bad it is compared to running plpgsql code let
alone other languages but IIUC it invalidates some catalog caches
which for something happening repeatedly in, e.g. a data load would be
pretty bad.
It would be nice to have a way to avoid the performance cost and I see
two options.
Thinking of plpgsql here, we already run the raw parser on all sql
when the function is defined. We could somehow check whether the
raw_parser found any non-schema-qualified references. This looks like
it would be awkward but doable. That would allow users to write
non-search_path-dependent code and if postgres doesn't warn they would
know they've done it properly. It would still put quite a burden on
users, especially when it comes to operators...
Or alternatively we could offer lexical scoping so that all objects
are looked up at parse time and the fully qualified reference is
stored instead of the non-qualified reference. That would be more
similar to how views and other object references are handled.
I suppose there's a third option that we could provide something which
instead of *setting* the guc when a function is entered just verifies
that guc is set as expected. That way the function would simply throw
an error if search_path is "incorrect" and not have to invalidate any
caches. That would at least avoid index corruption but not guarantee
dump/reload would work.
--
greg
On Jun 8, 2022, at 2:42 PM, Greg Stark <stark@mit.edu> wrote:
Thinking of plpgsql here, we already run the raw parser on all sql
when the function is defined. We could somehow check whether the
raw_parser found any non-schema-qualified references. This looks like
it would be awkward but doable. That would allow users to write
non-search_path-dependent code and if postgres doesn't warn they would
know they've done it properly. It would still put quite a burden on
users, especially when it comes to operators...Or alternatively we could offer lexical scoping so that all objects
are looked up at parse time and the fully qualified reference is
stored instead of the non-qualified reference. That would be more
similar to how views and other object references are handled.
I like the general idea, but I'm confused why you are limiting the analysis to search path resolution. The following is clearly wrong, but not for that reason:
create function public.identity () returns double precision as $$
select random()::integer;
$$
language sql
immutable
parallel safe
-- set search_path to 'pg_catalog'
;
Uncommenting that last bit wouldn't make it much better.
Isn't the more general approach to look for non-immutable (or non-stable) operations, with object resolution just one type of non-immutable operation? Perhaps raise an error when you can prove the given function's provolatile marking is wrong, and a warning when you cannot prove the marking is correct? That would tend to give warnings for polymorphic functions that use functions or operators over the polymorphic types, or which use dynamic sql, but maybe that's ok. Those functions probably deserve closer scrutiny anyway.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, 8 Jun 2022 at 19:39, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
I like the general idea, but I'm confused why you are limiting the analysis to search path resolution. The following is clearly wrong, but not for that reason:
create function public.identity () returns double precision as $$
select random()::integer;
Well.... I did originally think it would be necessary to consider
cases like this. (or even just cases where you call a user function
that is not immutable because of search_path dependencies).
But there are two problems:
Firstly, that would be a lot harder to implement. We don't actually do
any object lookups in plpgsql when defining plpgsql functions. So this
would be a much bigger change.
But secondly, there are a lot of cases where non-immutable functions
*are* immutable if they're used carefully. to_char() is obviously the
common example, but it's perfectly safe if you set the timezone or
other locale settings or if your format string doesn't actually depend
on any settings.
Similarly, a user function that is non-immutable only due to a
dependency on search_path *would* be safe to call from within an
immutable function if that function does set search_path. The
search_path would be inherited alleviating the problem.
Even something like random() could be safely used in an immutable
function as long as it doesn't actually change the output -- say if it
just logs diagnostic messages as a result?
Generally I think the idea is that the user *is* responsible for
writing immutable functions carefully to hide any non-deterministic
behaviour from the code they're calling. But that does raise the
question of why to focus on search_path.
I guess I'm just saying my goal isn't to *prove* the code is correct.
The user is still responsible for asserting it's correct. I just want
to detect cases where I can prove (or at least show it's likely that)
it's *not* correct.
--
greg
On Thu, Jun 9, 2022 at 12:39 PM Greg Stark <stark@mit.edu> wrote:
Generally I think the idea is that the user *is* responsible for
writing immutable functions carefully to hide any non-deterministic
behaviour from the code they're calling. But that does raise the
question of why to focus on search_path.I guess I'm just saying my goal isn't to *prove* the code is correct.
The user is still responsible for asserting it's correct. I just want
to detect cases where I can prove (or at least show it's likely that)
it's *not* correct.
Right. It's virtually impossible to prove that, for many reasons, so
the final responsibility must lie with the user-defined code.
Presumably there is still significant value in detecting cases where
some user-defined code provably does the wrong thing. Especially by
targeting mistakes that experience has shown are relatively common.
That's what the search_path case seems like to me.
If somebody else wants to write another patch that adds on that,
great. If not, then having this much still seems useful.
--
Peter Geoghegan
On Thu, 9 Jun 2022 at 16:12, Peter Geoghegan <pg@bowt.ie> wrote:
Presumably there is still significant value in detecting cases where
some user-defined code provably does the wrong thing. Especially by
targeting mistakes that experience has shown are relatively common.
That's what the search_path case seems like to me.
By "relatively common" I think we're talking "nigh universal". Afaics
there are no warnings in the docs about worrying about search_path on
IMMUTABLE functions. There is for SECURITY DEFINER but I have to admit
I wasn't aware myself of all the gotchas described there.
For that matter.... the gotchas are a bit .... convoluted....
If you leave out pg_catalog from search_path that's fine but if you
leave out pg_temp that's a security disaster. If you put pg_catalog in
it better be first or else it might be ok or might be a security issue
but when you put pg_temp in it better be last or else it's
*definitely* a disaster. $user is in search_path by default and that's
fine for SECURITY DEFINER functions but it's a disaster for IMMUTABLE
functions...
I kind of feel like perhaps all the implicit stuff is unnecessary
baroque frills. We should just put pg_temp and pg_catalog into the
default postgresql.conf search_path and assume users will keep them
there. And I'm not sure why we put pg_temp *first* -- I mean it sort
of seems superficially sensible but it doesn't seem like there's any
real reason to name your temporary tables the same as your permanent
ones so why not just always add it last?
I've attached a very WIP patch that implements the checks I'm leaning
towards making (as elogs currently). They cause a ton of regression
failures so probably we need to think about how to reduce the pain for
users upgrading...
Perhaps we should automatically fix up the current search patch and
attach it to functions where necessary for users instead of just
whingeing at them....
Attachments:
0001-Add-checks-on-search_path-for-IMMUTABLE-and-SECURITY.patchtext/x-patch; charset=US-ASCII; name=0001-Add-checks-on-search_path-for-IMMUTABLE-and-SECURITY.patchDownload
From 5cfccb714e7d9fd8f623700701e960abd54af25c Mon Sep 17 00:00:00 2001
From: Greg Stark <stark@mit.edu>
Date: Mon, 13 Jun 2022 16:47:49 -0400
Subject: [PATCH] Add checks on search_path for IMMUTABLE and SECURITY DEFINER
functions
---
src/backend/commands/functioncmds.c | 76 +++++++++++++++++++++++++++++
src/backend/utils/misc/guc.c | 59 +++++++++++++++++++++-
src/include/utils/guc.h | 1 +
3 files changed, 135 insertions(+), 1 deletion(-)
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index 00a6d282cf..13d5c668b7 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -675,6 +675,78 @@ update_proconfig_value(ArrayType *a, List *set_items)
return a;
}
+/* We only enforce constraints on IMMUTABLE functions (because they can be
+ * used in indexes and search_path hacking can corrupt indexes for everyone)
+ * or SECURITY DEFINER functions (for obvious reasons). For these functions we
+ * enforce that the proconfig GUC settings should include search_path and that
+ * that search path should follow the recommendations listed at
+ * https://www.postgresql.org/docs/current/sql-createfunction.html#SQL-CREATEFUNCTION-SECURITY
+ */
+static void
+verify_proconfig_value(ArrayType *a, char provolatile, bool prosecdef) {
+ const char *search_path = NULL;
+
+ if (provolatile != PROVOLATILE_IMMUTABLE && !prosecdef) {
+ return;
+ }
+
+ if (a != NULL) {
+ search_path = GUCArrayLookup(a, "search_path");
+ }
+
+ if (!search_path)
+ {
+ elog(WARNING, "IMMUTABLE and SECURITY DEFINER functions must have search_path set");
+ }
+
+ /* XXX This logic should perhaps be moved to namespace.c XXX */
+
+ char *rawname;
+ List *namelist;
+ ListCell *l;
+ /* Need a modifiable copy of namespace_search_path string */
+ rawname = pstrdup(namespace_search_path);
+
+ /* Parse string into list of identifiers */
+ if (!SplitIdentifierString(rawname, ',', &namelist))
+ {
+ /* syntax error in name list */
+ /* this should not happen if GUC checked check_search_path */
+ elog(ERROR, "invalid list syntax in proconfig");
+ }
+
+ bool has_dollar_user = false;
+ bool pg_catalog_first = true;
+ bool pg_temp_last = false;
+ bool is_first = true;
+ foreach(l, namelist)
+ {
+ char *curname = (char *) lfirst(l);
+ Oid namespaceId;
+
+ /* It's not last unless it's last */
+ pg_temp_last = false;
+
+ if (strcmp(curname, "$user") == 0)
+ has_dollar_user = true;
+ if (strcmp(curname, "pg_catalog") == 0 && !is_first)
+ pg_catalog_first = false;
+ if (strcmp(curname,"pg_temp") == 0)
+ pg_temp_last = true;
+
+ is_first = false;
+ }
+
+ if (has_dollar_user)
+ elog(WARNING, "IMMUTABLE or SECURITY DEFINER functions should not have a search path including $user");
+
+ if (!pg_catalog_first)
+ elog(WARNING, "IMMUTABLE or SECURITY DEFINER functions should have pg_catalog first in search_path (or omit it implicitly placing it first)");
+
+ if (!pg_temp_last)
+ elog(WARNING, "IMMUTABLE or SECURITY DEFINER functions should explicitly place pg_temp last or else it's implicitly first");
+}
+
static Oid
interpret_func_support(DefElem *defel)
{
@@ -1161,6 +1233,8 @@ CreateFunction(ParseState *pstate, CreateFunctionStmt *stmt)
}
}
+ verify_proconfig_value(proconfig, volatility, security);
+
/*
* Convert remaining parameters of CREATE to form wanted by
* ProcedureCreate.
@@ -1490,6 +1564,8 @@ AlterFunction(ParseState *pstate, AlterFunctionStmt *stmt)
/* update according to each SET or RESET item, left to right */
a = update_proconfig_value(a, set_items);
+ verify_proconfig_value(a, procForm->prosecdef, procForm->provolatile);
+
/* update the tuple */
memset(repl_repl, false, sizeof(repl_repl));
repl_repl[Anum_pg_proc_proconfig - 1] = true;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a7cc49898b..bcfe8447d8 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -11435,7 +11435,6 @@ ProcessGUCArray(ArrayType *array,
}
}
-
/*
* Add an entry to an option array. The array parameter may be NULL
* to indicate the current table entry is NULL.
@@ -11514,6 +11513,64 @@ GUCArrayAdd(ArrayType *array, const char *name, const char *value)
return a;
}
+const char *
+GUCArrayLookup(ArrayType *array, const char *search_name)
+{
+ int i;
+
+ Assert(array != NULL);
+ Assert(ARR_ELEMTYPE(array) == TEXTOID);
+ Assert(ARR_NDIM(array) == 1);
+ Assert(ARR_LBOUND(array)[0] == 1);
+
+ for (i = 1; i <= ARR_DIMS(array)[0]; i++)
+ {
+ Datum d;
+ bool isnull;
+ char *s;
+ char *name;
+ char *value;
+ char *valuecopy;
+
+ d = array_ref(array, 1, &i,
+ -1 /* varlenarray */ ,
+ -1 /* TEXT's typlen */ ,
+ false /* TEXT's typbyval */ ,
+ TYPALIGN_INT /* TEXT's typalign */ ,
+ &isnull);
+
+ if (isnull)
+ continue;
+
+ s = TextDatumGetCString(d);
+
+ ParseLongOption(s, &name, &value);
+ if (!name)
+ {
+ elog(WARNING, "parselongoption returned null name");
+ continue;
+ }
+ else if (strcmp(name, search_name))
+ {
+ continue;
+ }
+ else if (!value)
+ {
+ elog(WARNING, "parselongoption returned null value");
+ return NULL;
+ continue;
+ }
+ /* free malloc'd strings immediately to avoid leak upon error */
+ valuecopy = pstrdup(value);
+ free(name);
+ free(value);
+ pfree(s);
+
+ return valuecopy;
+ }
+ return NULL;
+}
+
/*
* Delete an entry from an option array. The array parameter may be NULL
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index d5976ecbfb..24dca3c7e6 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -404,6 +404,7 @@ extern char *ExtractSetVariableArgs(VariableSetStmt *stmt);
extern void ProcessGUCArray(ArrayType *array,
GucContext context, GucSource source, GucAction action);
extern ArrayType *GUCArrayAdd(ArrayType *array, const char *name, const char *value);
+extern const char * GUCArrayLookup(ArrayType *array, const char *search_name);
extern ArrayType *GUCArrayDelete(ArrayType *array, const char *name);
extern ArrayType *GUCArrayReset(ArrayType *array);
--
2.36.1
On Mon, Jun 13, 2022 at 1:51 PM Greg Stark <stark@mit.edu> wrote:
By "relatively common" I think we're talking "nigh universal". Afaics
there are no warnings in the docs about worrying about search_path on
IMMUTABLE functions. There is for SECURITY DEFINER but I have to admit
I wasn't aware myself of all the gotchas described there.
I didn't realize that it was that bad. Even if it's only 10% as bad as
you say, it would still be very valuable to do something about it
(ideally with an approach that is non-invasive).
--
Peter Geoghegan
On Mon, Jun 13, 2022 at 06:41:17PM -0700, Peter Geoghegan wrote:
On Mon, Jun 13, 2022 at 1:51 PM Greg Stark <stark@mit.edu> wrote:
By "relatively common" I think we're talking "nigh universal". Afaics
there are no warnings in the docs about worrying about search_path on
IMMUTABLE functions. There is for SECURITY DEFINER but I have to admit
I wasn't aware myself of all the gotchas described there.I didn't realize that it was that bad. Even if it's only 10% as bad as
you say, it would still be very valuable to do something about it
(ideally with an approach that is non-invasive).
Having checks implemented so as users cannot bite themselves back is a
good idea in the long term, but I have also seen cases where abusing
of immutable functions was useful:
- Enforce the push down of function expressions to remote server.
- Regression tests. Just a few weeks ago I have played with an
advisory lock within an index expression.
Perhaps I never should have done what the first point was doing
anyway, but having a way to disable any of that, be it just a
developer option for the purpose of some regression tests, would be
nice. Skimming quickly through the patch, any of the checks related
to search_path would not apply to the fancy cases I saw, though.
--
Michael
On Mon, 13 Jun 2022 at 16:50, Greg Stark <stark@mit.edu> wrote:
For that matter.... the gotchas are a bit .... convoluted....
Perhaps we should automatically fix up the current search patch and
attach it to functions where necessary for users instead of just
whingeing at them....
So I reviewed my own patch and.... it was completely broken... I fixed
it to actually check the right variables.
I also implemented the other idea above of actually fixing up
search_path in proconfig for the user by default. I think this is
going to be more practical.
The problem with expecting the user to provide search_path is that
they probably aren't today so the warnings would be firing for
everything...
Providing a fixed up search_path for users would give them a smoother
upgrade process where we can give a warning only if the search_path is
changed substantively which is much less likely.
I'm still quite concerned about the performance impact of having
search_path on so many functions. It causes a cache flush which could
be pretty painful on a frequently called function such as one in an
index expression during a data load....
The other issue is that having proconfig set does prevent these
functions from being inlined which can be seen in the regression test
as seen below. I'm not sure how big a problem this is for users.
Inlining is important for many use cases I think. Maybe we can go
ahead and inline things even if they have a proconfig if it matches
the proconfig on the caller? Or maybe even check if we get the same
objects from both search_paths?
Of course this patch is still very WIP. Only one or the other function
makes sense to keep. And I'm not opposed to having a GUC to
enable/disable the enforcement or warnings. And the code itself needs
to be cleaned up with parts of it moving to guc.c and/or namespace.c.
Example of regression tests noticing that immutable functions with
proconfig set become non-inlineable:
diff -U3 /home/stark/src/postgresql/src/test/regress/expected/rangefuncs.out
/home/stark/src/postgresql/src/test/regress/results/rangefuncs.out
--- /home/stark/src/postgresql/src/test/regress/expected/rangefuncs.out
2022-01-17 12:01:54.958628564 -0500
+++ /home/stark/src/postgresql/src/test/regress/results/rangefuncs.out
2022-06-16 02:16:47.589703966 -0400
@@ -1924,14 +1924,14 @@
select * from array_to_set(array['one', 'two']) as t(f1 point,f2 text);
ERROR: return type mismatch in function declared to return record
DETAIL: Final statement returns integer instead of point at column 1.
-CONTEXT: SQL function "array_to_set" during inlining
+CONTEXT: SQL function "array_to_set" during startup
explain (verbose, costs off)
select * from array_to_set(array['one', 'two']) as t(f1
numeric(4,2),f2 text);
- QUERY PLAN
---------------------------------------------------------------
- Function Scan on pg_catalog.generate_subscripts i
- Output: i.i, ('{one,two}'::text[])[i.i]
- Function Call: generate_subscripts('{one,two}'::text[], 1)
+ QUERY PLAN
+----------------------------------------------------
+ Function Scan on public.array_to_set t
+ Output: f1, f2
+ Function Call: array_to_set('{one,two}'::text[])
(3 rows)
create temp table rngfunc(f1 int8, f2 int8);
@@ -2064,11 +2064,12 @@
explain (verbose, costs off)
select * from testrngfunc();
- QUERY PLAN
---------------------------------------------------------
- Result
- Output: 7.136178::numeric(35,6), 7.14::numeric(35,2)
-(2 rows)
+ QUERY PLAN
+-------------------------------------
+ Function Scan on public.testrngfunc
+ Output: f1, f2
+ Function Call: testrngfunc()
+(3 rows)
select * from testrngfunc();
f1 | f2
--
greg
Attachments:
0002-Instead-of-checks-actually-force-search_path-on-immu.patchapplication/x-patch; name=0002-Instead-of-checks-actually-force-search_path-on-immu.patchDownload
From 02f16014c3458e549719aaaaf4bb6c338e55b6e2 Mon Sep 17 00:00:00 2001
From: Greg Stark <stark@mit.edu>
Date: Thu, 16 Jun 2022 01:40:49 -0400
Subject: [PATCH 2/2] Instead of checks actually force search_path on immutable
and security definer functions
---
src/backend/commands/functioncmds.c | 79 ++++++++++++++++++++++++++++-
1 file changed, 78 insertions(+), 1 deletion(-)
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index 9347ed70e2..20da751202 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -41,6 +41,7 @@
#include "catalog/indexing.h"
#include "catalog/objectaccess.h"
#include "catalog/pg_aggregate.h"
+#include "catalog/pg_authid.h"
#include "catalog/pg_cast.h"
#include "catalog/pg_language.h"
#include "catalog/pg_namespace.h"
@@ -683,9 +684,83 @@ update_proconfig_value(ArrayType *a, List *set_items)
* https://www.postgresql.org/docs/current/sql-createfunction.html#SQL-CREATEFUNCTION-SECURITY
*/
-/* XXX This logic should perhaps be moved to namespace.c XXX */
+/* XXX Some or all of this logic should perhaps be moved to namespace.c XXX */
#include "utils/varlena.h"
+static ArrayType *
+fixup_proconfig_value(ArrayType *a, char provolatile, bool prosecdef) {
+ char *search_path = NULL;
+
+ List *namelist;
+ ListCell *l;
+ StringInfoData string;
+
+ if (provolatile != PROVOLATILE_IMMUTABLE && !prosecdef) {
+ return a;
+ }
+ if (a != NULL) {
+ search_path = GUCArrayLookup(a, "search_path");
+ }
+ if (!search_path) {
+ search_path = pstrdup(namespace_search_path);
+ }
+
+ /* Parse string into list of identifiers
+ * GUCArrayLookup returns a pstrdup'd string so this is safe
+ */
+ if (!SplitIdentifierString(search_path, ',', &namelist))
+ {
+ /* syntax error in name list */
+ elog(ERROR, "invalid list syntax in search_path setting on function");
+ }
+
+ initStringInfo(&string);
+ appendStringInfoString(&string, "pg_catalog");
+ foreach(l, namelist)
+ {
+ char *curname = (char *) lfirst(l);
+
+ if (strcmp(curname, "pg_catalog") == 0)
+ {
+ if (foreach_current_index(l) != 0)
+ elog(WARNING, "moving pg_catalog to first position in search_path for IMMUTABLE or SECURITY DEFINER function");
+ continue;
+ }
+ if (strcmp(curname, "pg_temp") == 0)
+ {
+ if (foreach_current_index(l) != namelist->length-1)
+ elog(WARNING, "moving pg_temp to last position in search_path for IMMUTABLE or SECURITY DEFINER function");
+ continue;
+ }
+ if (strcmp(curname, "$user") == 0)
+ {
+ /* $user --- substitute namespace matching user name, if any */
+ HeapTuple tuple;
+
+ tuple = SearchSysCache1(AUTHOID, ObjectIdGetDatum(GetUserId()));
+ if (HeapTupleIsValid(tuple))
+ {
+ curname = NameStr(((Form_pg_authid) GETSTRUCT(tuple))->rolname);
+ ReleaseSysCache(tuple);
+ /*elog(WARNING, "IMMUTABLE or SECURITY DEFINER functions should not have a search path including $user");*/
+ } else {
+ /* No great options here */
+ ReleaseSysCache(tuple);
+ elog(WARNING, "removing invalid $user from search_path on IMMUTABLE or SECURITY DEFINER function");
+ continue;
+ }
+ }
+ appendStringInfoChar(&string, ',');
+ appendStringInfoString(&string, quote_identifier(curname));
+ }
+ appendStringInfoChar(&string, ',');
+ appendStringInfoString(&string, "pg_temp");
+
+ a = GUCArrayAdd(a, "search_path", string.data);
+ return a;
+}
+
+
static void
verify_proconfig_value(ArrayType *a, char provolatile, bool prosecdef) {
char *proconfig_search_path = NULL;
@@ -1233,6 +1308,7 @@ CreateFunction(ParseState *pstate, CreateFunctionStmt *stmt)
}
}
+ proconfig = fixup_proconfig_value(proconfig, volatility, security);
verify_proconfig_value(proconfig, volatility, security);
/*
@@ -1564,6 +1640,7 @@ AlterFunction(ParseState *pstate, AlterFunctionStmt *stmt)
/* update according to each SET or RESET item, left to right */
a = update_proconfig_value(a, set_items);
+ a = fixup_proconfig_value(a, procForm->provolatile, procForm->prosecdef);
verify_proconfig_value(a, procForm->provolatile, procForm->prosecdef);
/* update the tuple */
--
2.36.1
0001-Add-checks-on-search_path-for-IMMUTABLE-and-SECURITY.patchapplication/x-patch; name=0001-Add-checks-on-search_path-for-IMMUTABLE-and-SECURITY.patchDownload
From 94fcda77f88b55a4c6e06a2e4367c1bd322c07b4 Mon Sep 17 00:00:00 2001
From: Greg Stark <stark@mit.edu>
Date: Mon, 13 Jun 2022 16:47:49 -0400
Subject: [PATCH 1/2] Add checks on search_path for IMMUTABLE and SECURITY
DEFINER functions
---
src/backend/commands/functioncmds.c | 76 +++++++++++++++++++++++++++++
src/backend/utils/misc/guc.c | 57 +++++++++++++++++++++-
src/include/utils/guc.h | 1 +
3 files changed, 133 insertions(+), 1 deletion(-)
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index 00a6d282cf..9347ed70e2 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -675,6 +675,78 @@ update_proconfig_value(ArrayType *a, List *set_items)
return a;
}
+/* We only enforce constraints on IMMUTABLE functions (because they can be
+ * used in indexes and search_path hacking can corrupt indexes for everyone)
+ * or SECURITY DEFINER functions (for obvious reasons). For these functions we
+ * enforce that the proconfig GUC settings should include search_path and that
+ * that search path should follow the recommendations listed at
+ * https://www.postgresql.org/docs/current/sql-createfunction.html#SQL-CREATEFUNCTION-SECURITY
+ */
+
+/* XXX This logic should perhaps be moved to namespace.c XXX */
+#include "utils/varlena.h"
+
+static void
+verify_proconfig_value(ArrayType *a, char provolatile, bool prosecdef) {
+ char *proconfig_search_path = NULL;
+ List *namelist;
+ ListCell *l;
+ bool has_dollar_user = false;
+ bool pg_catalog_first = true;
+ bool pg_temp_last = false;
+ bool is_first = true;
+
+ if (provolatile != PROVOLATILE_IMMUTABLE && !prosecdef) {
+ return;
+ }
+
+ if (a != NULL) {
+ proconfig_search_path = GUCArrayLookup(a, "search_path");
+ }
+
+ if (!proconfig_search_path)
+ {
+ elog(WARNING, "IMMUTABLE and SECURITY DEFINER functions must have search_path set");
+ return;
+ }
+
+ /* Parse string into list of identifiers
+ * GUCArrayLookup returns a pstrdup'd string so this is safe
+ */
+ if (!SplitIdentifierString(proconfig_search_path, ',', &namelist))
+ {
+ /* syntax error in name list */
+ elog(ERROR, "invalid list syntax in search_path setting on function");
+ }
+
+ foreach(l, namelist)
+ {
+ char *curname = (char *) lfirst(l);
+
+ /* It's not last unless it's last... If it's missing pg_temp is
+ * implicitly added as first */
+ pg_temp_last = false;
+
+ if (strcmp(curname, "$user") == 0)
+ has_dollar_user = true;
+ if (strcmp(curname, "pg_catalog") == 0 && !is_first)
+ pg_catalog_first = false;
+ if (strcmp(curname,"pg_temp") == 0)
+ pg_temp_last = true;
+
+ is_first = false;
+ }
+
+ if (has_dollar_user)
+ elog(WARNING, "IMMUTABLE or SECURITY DEFINER functions should not have a search path including $user");
+
+ if (!pg_catalog_first)
+ elog(WARNING, "IMMUTABLE or SECURITY DEFINER functions should have pg_catalog first in search_path (or omit it implicitly placing it first)");
+
+ if (!pg_temp_last)
+ elog(WARNING, "IMMUTABLE or SECURITY DEFINER functions should explicitly place pg_temp last or else it's implicitly first");
+}
+
static Oid
interpret_func_support(DefElem *defel)
{
@@ -1161,6 +1233,8 @@ CreateFunction(ParseState *pstate, CreateFunctionStmt *stmt)
}
}
+ verify_proconfig_value(proconfig, volatility, security);
+
/*
* Convert remaining parameters of CREATE to form wanted by
* ProcedureCreate.
@@ -1490,6 +1564,8 @@ AlterFunction(ParseState *pstate, AlterFunctionStmt *stmt)
/* update according to each SET or RESET item, left to right */
a = update_proconfig_value(a, set_items);
+ verify_proconfig_value(a, procForm->provolatile, procForm->prosecdef);
+
/* update the tuple */
memset(repl_repl, false, sizeof(repl_repl));
repl_repl[Anum_pg_proc_proconfig - 1] = true;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a7cc49898b..55eed02851 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -11435,7 +11435,6 @@ ProcessGUCArray(ArrayType *array,
}
}
-
/*
* Add an entry to an option array. The array parameter may be NULL
* to indicate the current table entry is NULL.
@@ -11514,6 +11513,62 @@ GUCArrayAdd(ArrayType *array, const char *name, const char *value)
return a;
}
+/*
+ * Return palloced string with the value for a GUC in a GUCArray (i.e. array
+ * of foo=bar text datums) with name search_name. Returns NULL if the value of
+ * the GUC can not be parsed using ParseLongOption or if it's absent from the
+ * array.
+ */
+char *
+GUCArrayLookup(ArrayType *array, const char *search_name)
+{
+ int i;
+
+ Assert(array != NULL);
+ Assert(ARR_ELEMTYPE(array) == TEXTOID);
+ Assert(ARR_NDIM(array) == 1);
+ Assert(ARR_LBOUND(array)[0] == 1);
+
+ for (i = 1; i <= ARR_DIMS(array)[0]; i++)
+ {
+ Datum d;
+ bool isnull;
+ char *s;
+ char *name;
+ char *value;
+ char *valuecopy = NULL;
+ bool done = false;
+
+ d = array_ref(array, 1, &i,
+ -1 /* varlenarray */ ,
+ -1 /* TEXT's typlen */ ,
+ false /* TEXT's typbyval */ ,
+ TYPALIGN_INT /* TEXT's typalign */ ,
+ &isnull);
+
+ if (isnull)
+ continue;
+
+ s = TextDatumGetCString(d);
+
+ /* ParseLongOption returns malloced storage */
+ ParseLongOption(s, &name, &value);
+ done = (strcmp(name, search_name) == 0);
+
+ if (done && value) {
+ valuecopy = pstrdup(value);
+ }
+ if (name)
+ free(name);
+ if (value)
+ free(value);
+
+ if (done)
+ return valuecopy;
+ }
+ return NULL;
+}
+
/*
* Delete an entry from an option array. The array parameter may be NULL
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index d5976ecbfb..24aa13f9e6 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -404,6 +404,7 @@ extern char *ExtractSetVariableArgs(VariableSetStmt *stmt);
extern void ProcessGUCArray(ArrayType *array,
GucContext context, GucSource source, GucAction action);
extern ArrayType *GUCArrayAdd(ArrayType *array, const char *name, const char *value);
+extern char * GUCArrayLookup(ArrayType *array, const char *search_name);
extern ArrayType *GUCArrayDelete(ArrayType *array, const char *name);
extern ArrayType *GUCArrayReset(ArrayType *array);
--
2.36.1
On Thu, 16 Jun 2022 at 12:04, Greg Stark <stark@mit.edu> wrote:
Providing a fixed up search_path for users would give them a smoother
upgrade process where we can give a warning only if the search_path is
changed substantively which is much less likely.I'm still quite concerned about the performance impact of having
search_path on so many functions. It causes a cache flush which could
be pretty painful on a frequently called function such as one in an
index expression during a data load....
So it seems I missed a small change in Postgres SQL function world,
namely the SQL standard syntax and prosqlbody column from e717a9a18.
This feature is huge. It's awesome! It basically provides the lexical
scoping feature I was hoping to implement. Any sql immutable standard
syntax sql function can be safely used in indexes or elsewhere
regardless of your search_path as all the names are already resolved.
I'm now thinking we should just provide a LEXICAL option on Postgres
style functions to implement the same name path and store sqlbody for
them as well. They would have to be bound by the same restrictions
(notably no polymorphic parameters) but otherwise I think it should be
straightforward.
Functions defined this way would always be safe for pg_dump regardless
of the search_path used to define them and would also protect users
from accidentally corrupting indexes when users have different
search_paths.
This doesn't really address plpgsql functions of course, I doubt we
can do the same thing.
--
greg
Hi,
On 2022-06-16 12:04:12 -0400, Greg Stark wrote:
Of course this patch is still very WIP. Only one or the other function
makes sense to keep. And I'm not opposed to having a GUC to
enable/disable the enforcement or warnings. And the code itself needs
to be cleaned up with parts of it moving to guc.c and/or namespace.c.
This currently obviously doesn't pass tests - are you planning to work on this
further? As is I'm not really clear what the CF entry is for. Given the
current state it doesn't look like it's actually looking for review?
Greetings,
Andres Freund