CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }
The attached patch implements a new SEARCH clause for CREATE FUNCTION.
The SEARCH clause controls the search_path used when executing
functions that were created without a SET clause.
Background:
Controlling search_path is critical for the correctness and security of
functions. Right now, the author of a function without a SET clause has
little ability to control the function's behavior, because even basic
operators like "+" involve search_path. This is a big problem for, e.g.
functions used in expression indexes which are called by any user with
write privileges on the table.
Motivation:
I'd like to (eventually) get to safe-by-default behavior. In other
words, the simplest function declaration should be safe for the most
common use cases.
To get there, we need some way to explicitly specify the less common
cases. Right now there's no way for the function author to indicate
that a function intends to use the session's search path. We also need
an easier way to specify that the user wants a safe search_path ("SET
search_path = pg_catalog, pg_temp" is arcane).
And when we know more about the user's actual intent, then it will be
easier to either form a transition plan to push users into the safer
behavior, or at least warn strongly when the user is doing something
dangerous (i.e. using a function that depends on the session's search
path as part of an expression index).
Today, the only information we have about the user's intent is the
presence or absence of a "SET search_path" clause, which is not a
strong signal.
Proposal:
Add SEARCH { DEFAULT | SYSTEM | SESSION } clause to CREATE/ALTER
function.
* SEARCH DEFAULT is the same as no SEARCH clause at all, and ends up
stored in the catalog as prosearch='d'.
* SEARCH SYSTEM means that we switch to the safe search path of
"pg_catalog, pg_temp" when executing the function. Stored as
prosearch='y'.
* SEARCH SESSION means that we don't switch the search_path when
executing the function, and it's inherited from the session. Stored as
prosearch='e'.
Regardless of the SEARCH clause, a "SET search_path" clause will
override it. The SEARCH clause only matters when "SET search_path" is
not there.
Additionally provide a GUC, defaulting to false for compatibility, that
can interpret prosearch='d' as if it were prosearch='y'. It could help
provide a transition path. I know there's a strong reluctance to adding
these kinds of GUCs; I can remove it and I think the patch will still
be worthwhile. Perhaps there are alternatives that could help with
migration at pg_dump time instead?
Benefits:
1. The user can be more explicit about their actual intent. Do they
want safety and consistency? Or the flexibility of using the session's
search_path?
2. We can more accurately serve the user's intent. For instance, the
safe search_path of "pg_catalog, pg_temp" is arcane and seems to be
there just because we don't have a way to specify that pg_temp be
excluded entirely. But perhaps in the future we *do* want to exclude
pg_temp entirely. Knowing that the user just wants "SEARCH SYSTEM"
allows us some freedom to do that.
3. Users can be forward-compatible by specifying the functions that
really do need to use the session's search path as SEARCH SESSION, so
that they will never be broken in the future. That gives us a cleaner
path toward making the default behavior safe.
--
Jeff Davis
PostgreSQL Contributor Team - AWS
Attachments:
0001-CREATE-FUNCTION-.-SEARCH-DEFAULT-SYSTEM-SESSION.patchtext/x-patch; charset=UTF-8; name=0001-CREATE-FUNCTION-.-SEARCH-DEFAULT-SYSTEM-SESSION.patchDownload+305-10
On 8/11/23 22:35, Jeff Davis wrote:
The attached patch implements a new SEARCH clause for CREATE FUNCTION.
The SEARCH clause controls the search_path used when executing
functions that were created without a SET clause.Background:
Controlling search_path is critical for the correctness and security of
functions. Right now, the author of a function without a SET clause has
little ability to control the function's behavior, because even basic
operators like "+" involve search_path. This is a big problem for, e.g.
functions used in expression indexes which are called by any user with
write privileges on the table.Motivation:
I'd like to (eventually) get to safe-by-default behavior. In other
words, the simplest function declaration should be safe for the most
common use cases.
I agree with the general need.
Add SEARCH { DEFAULT | SYSTEM | SESSION } clause to CREATE/ALTER
function.* SEARCH DEFAULT is the same as no SEARCH clause at all, and ends up
stored in the catalog as prosearch='d'.
* SEARCH SYSTEM means that we switch to the safe search path of
"pg_catalog, pg_temp" when executing the function. Stored as
prosearch='y'.
* SEARCH SESSION means that we don't switch the search_path when
executing the function, and it's inherited from the session. Stored as
prosearch='e'.
It isn't clear to me what is the precise difference between DEFAULT and
SESSION
2. We can more accurately serve the user's intent. For instance, the
safe search_path of "pg_catalog, pg_temp" is arcane and seems to be
there just because we don't have a way to specify that pg_temp be
excluded entirely. But perhaps in the future we *do* want to exclude
pg_temp entirely. Knowing that the user just wants "SEARCH SYSTEM"
allows us some freedom to do that.
Personally I think having pg_temp in the SYSTEM search path makes sense
for temp tables, but I find it easy to forget that functions can be
created by unprivileged users in pg_temp, and therefore having pg_temp
in the search path for functions is dangerous.
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On 8/12/23 09:15, Joe Conway wrote:
On 8/11/23 22:35, Jeff Davis wrote:
2. We can more accurately serve the user's intent. For instance, the
safe search_path of "pg_catalog, pg_temp" is arcane and seems to be
there just because we don't have a way to specify that pg_temp be
excluded entirely. But perhaps in the future we *do* want to exclude
pg_temp entirely. Knowing that the user just wants "SEARCH SYSTEM"
allows us some freedom to do that.Personally I think having pg_temp in the SYSTEM search path makes sense
for temp tables, but I find it easy to forget that functions can be
created by unprivileged users in pg_temp, and therefore having pg_temp
in the search path for functions is dangerous.
Hmm, I guess I was too hasty -- seems we have some magic related to this
already.
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Sat, 2023-08-12 at 09:15 -0400, Joe Conway wrote:
It isn't clear to me what is the precise difference between DEFAULT
and
SESSION
The the current patch, SESSION always gets the search path from the
session, while DEFAULT is controlled by the GUC
safe_function_search_path. If the GUC is false (the default) then
DEFAULT and SESSION are the same. If the GUC is true, then DEFAULT and
SYSTEM are the same.
There are alternatives to using a GUC to differentiate them. The main
point of this patch is to capture what the user intends in a convenient
and forward-compatible way. If the user specifies nothing at all, they
get DEFAULT, and we could treat that specially in various ways to move
toward safety while minimizing breakage.
Personally I think having pg_temp in the SYSTEM search path makes
sense
for temp tables
The patch doesn't change this behavior -- SYSTEM (without any other
SET) gives you "pg_catalog, pg_temp" and there's no way to exclude
pg_temp entirely.
My point was that by capturing the user's intent with SEARCH SYSTEM, it
gives us a bit more freedom to have these kinds of discussions later.
And it's certainly easier for the user to specify SEARCH SYSTEM than
"SET search_path = pg_catalog, pg_temp".
Regards,
Jeff Davis
On Sat, 2023-08-12 at 09:50 -0400, Joe Conway wrote:
Hmm, I guess I was too hasty -- seems we have some magic related to
this
already.
I was worried after your first email. But yes, the magic is in
FuncnameGetCandidates(), which simply ignores functions in the temp
namespace.
It would be better if we were obviously safe rather than magically
safe, though.
Regards,
Jeff Davis
Hi,
On 2023-08-11 19:35:22 -0700, Jeff Davis wrote:
Controlling search_path is critical for the correctness and security of
functions. Right now, the author of a function without a SET clause has
little ability to control the function's behavior, because even basic
operators like "+" involve search_path. This is a big problem for, e.g.
functions used in expression indexes which are called by any user with
write privileges on the table.
Motivation:
I'd like to (eventually) get to safe-by-default behavior. In other
words, the simplest function declaration should be safe for the most
common use cases.
I'm not sure that anything based, directly or indirectly, on search_path
really is a realistic way to get there.
To get there, we need some way to explicitly specify the less common
cases. Right now there's no way for the function author to indicate
that a function intends to use the session's search path. We also need
an easier way to specify that the user wants a safe search_path ("SET
search_path = pg_catalog, pg_temp" is arcane).
No disagreement with that. Even if I don't yet agree that your proposal is a
convincing path to "easy security for PLs" - just making the search path stuff
less arcane is good.
And when we know more about the user's actual intent, then it will be
easier to either form a transition plan to push users into the safer
behavior, or at least warn strongly when the user is doing something
dangerous (i.e. using a function that depends on the session's search
path as part of an expression index).
I think that'd be pretty painful from a UX perspective. Having to write
e.g. operators as operator(schema, op) just sucks as an experience. And with
extensions plenty of operators will live outside of pg_catalog, so there is
plenty things that will need qualifying. And because of things like type
coercion search, which prefers "bettering fitting" coercions over search path
order, you can't just put "less important" things later in search path.
I wonder if we ought to work more on "fossilizing" the result of search path
resolutions at the time functions are created, rather than requiring the user
to do so explicitly. Most of the problem here comes down to the fact that if
a user creates a function like 'a + b' we'll not resolve the operator, the
potential type coercions etc, when the function is created - we do so when the
function is executed.
We can't just store the oids at the time, because that'd end up very fragile -
tables/functions/... might be dropped and recreated etc and thus change their
oid. But we could change the core PLs to rewrite all the queries (*) so that
they schema qualify absolutely everything, including operators and implicit
type casts.
That way objects referenced by functions can still be replaced, but search
path can't be used to "inject" objects in different schemas. Obviously it
could lead to errors on some schema changes - e.g. changing a column type
might mean that a relevant cast lives in a different place than with the old
type - but I think that'll be quite rare. Perhaps we could offer a ALTER
FUNCTION ... REFRESH REFERENCES; or such?
One obvious downside of such an approach is that it requires some work with
each PL. I'm not sure that's avoidable - and I suspect that most "security
sensitive" functions are written in just a few languages.
(*) Obviously the one thing that doesn't work for is use of EXECUTE in plpgsql
and similar constructs elsewhere. I'm not sure there's much that can be done
to make that safe, but it's worth thinking about more.
Greetings,
Andres Freund
On Sat, 2023-08-12 at 11:25 -0700, Andres Freund wrote:
I'm not sure that anything based, directly or indirectly, on
search_path
really is a realistic way to get there.
Can you explain a little more? I see what you mean generally, that
search_path is an imprecise thing, and that it leaves room for
ambiguity and mistakes.
But I also think we can do a lot better than we're doing today and
still retain the basic concept of search_path, which is good because
it's deeply integrated into postgres, and it's not clear that we're
going to get away from it any time soon.
I think that'd be pretty painful from a UX perspective. Having to
write
e.g. operators as operator(schema, op) just sucks as an experience.
I'm not suggesting that the user fully-qualify everything; I'm
suggesting that the include a "SET search_path" clause if they depend
on anything other than pg_catalog.
And with
extensions plenty of operators will live outside of pg_catalog, so
there is
plenty things that will need qualifying.
In my proposal, that would still involve a "SET search_path TO
myextension, pg_catalog, pg_temp".
The main reason that's bad is that adding pg_temp at the end is painful
UX -- just something that the user needs to remember to do with little
obvious reason or observable impact; but it has important security
implications. Perhaps we should just not implicitly include pg_temp for
a function's search_path (at least for the case of CREATE FUNCTION ...
SEARCH SYSTEM)?
And because of things like type
coercion search, which prefers "bettering fitting" coercions over
search path
order, you can't just put "less important" things later in search
path.
I understand this introduces some ambiguity, but you just can't include
schemas in the search_path that you don't trust, for similar reasons as
$PATH. If you have a few objects you'd like to access in another user's
schema, fully-qualify them.
We can't just store the oids at the time, because that'd end up very
fragile -
tables/functions/... might be dropped and recreated etc and thus
change their
oid.
Robert suggested something along those lines[1]/messages/by-id/CA+Tgmobd=eFRGWHhfG4mG2cA+dsVuA4jpBvD8N1NS=Vc9eHFQg@mail.gmail.com. I won't rule it out,
but I agree that there are quite a few things left to figure out.
But we could change the core PLs to rewrite all the queries (*) so
that
they schema qualify absolutely everything, including operators and
implicit
type casts.
So not quite like "SET search_path FROM CURRENT": you resolve it to a
specific "schemaname.objectname", but stop just short of resolving to a
specific OID?
An interesting compromise, but I'm not sure what the benefit is vs. SET
search_path FROM CURRENT (or some defined search_path).
That way objects referenced by functions can still be replaced, but
search
path can't be used to "inject" objects in different schemas.
Obviously it
could lead to errors on some schema changes - e.g. changing a column
type
might mean that a relevant cast lives in a different place than with
the old
type - but I think that'll be quite rare. Perhaps we could offer a
ALTER
FUNCTION ... REFRESH REFERENCES; or such?
Hmm. I feel like that's making things more complicated. I'd find it
more straightforward to use something like Robert's approach of fully
parsing something, and then have the REFRESH command reparse it when
something needs updating. Or perhaps just create all of the dependency
entries more like a view query and then auto-refresh.
(*) Obviously the one thing that doesn't work for is use of EXECUTE
in plpgsql
and similar constructs elsewhere. I'm not sure there's much that can
be done
to make that safe, but it's worth thinking about more.
I think it would be really nice to have some better control over the
search_path regardless, because it still helps with cases like this. A
lot of C functions build queries, and I don't think it's reasonable to
constantly worry about the ambiguity of the schema for "=".
Regards,
Jeff Davis
[1]: /messages/by-id/CA+Tgmobd=eFRGWHhfG4mG2cA+dsVuA4jpBvD8N1NS=Vc9eHFQg@mail.gmail.com
/messages/by-id/CA+Tgmobd=eFRGWHhfG4mG2cA+dsVuA4jpBvD8N1NS=Vc9eHFQg@mail.gmail.com
On 12.08.23 04:35, Jeff Davis wrote:
The attached patch implements a new SEARCH clause for CREATE FUNCTION.
The SEARCH clause controls the search_path used when executing
functions that were created without a SET clause.
I don't understand this. This adds a new option for cases where the
existing option wasn't specified. Why not specify the existing option
then? Is it not good enough? Can we improve it?
On Wed, 2023-08-16 at 08:51 +0200, Peter Eisentraut wrote:
On 12.08.23 04:35, Jeff Davis wrote:
The attached patch implements a new SEARCH clause for CREATE
FUNCTION.
The SEARCH clause controls the search_path used when executing
functions that were created without a SET clause.I don't understand this. This adds a new option for cases where the
existing option wasn't specified. Why not specify the existing
option
then? Is it not good enough? Can we improve it?
SET search_path = '...' not good enough in my opinion.
1. Not specifying a SET clause falls back to the session's search_path,
which is a bad default because it leads to all kinds of inconsistent
behavior and security concerns.
2. There's no way to explicitly request that you'd actually like to use
the session's search_path, so it makes it very hard to ever change the
default.
3. It's user-unfriendly. A safe search_path that would be suitable for
most functions is "SET search_path = pg_catalog, pg_temp", which is
arcane, and requires some explanation.
4. search_path for the session is conceptually different than for a
function. A session should be context-sensitive and the same query
should (quite reasonably) behave differently for different sessions and
users to sort out things like object name conflicts, etc. A function
should (ordinarily) be context-insensitive, especially when used in
something like an index expression or constraint. Having different
syntax helps separate those concepts.
5. There's no way to prevent pg_temp from being included in the
search_path. This is separately fixable, but having the proposed SEARCH
syntax is likely to make for a better user experience in the common
cases.
I'm open to suggestion about other ways to improve it, but SEARCH is
what I came up with.
Regards,
Jeff Davis
On 16.08.23 19:44, Jeff Davis wrote:
On Wed, 2023-08-16 at 08:51 +0200, Peter Eisentraut wrote:
On 12.08.23 04:35, Jeff Davis wrote:
The attached patch implements a new SEARCH clause for CREATE
FUNCTION.
The SEARCH clause controls the search_path used when executing
functions that were created without a SET clause.I don't understand this. This adds a new option for cases where the
existing option wasn't specified. Why not specify the existing
option
then? Is it not good enough? Can we improve it?SET search_path = '...' not good enough in my opinion.
1. Not specifying a SET clause falls back to the session's search_path,
which is a bad default because it leads to all kinds of inconsistent
behavior and security concerns.
Not specifying SEARCH would have the same issue?
2. There's no way to explicitly request that you'd actually like to use
the session's search_path, so it makes it very hard to ever change the
default.
That sounds like something that should be fixed independently. I could
see this being useful for other GUC settings, like I want to run a
function explicitly with the session's work_mem.
3. It's user-unfriendly. A safe search_path that would be suitable for
most functions is "SET search_path = pg_catalog, pg_temp", which is
arcane, and requires some explanation.
True, but is that specific to functions? Maybe I want a safe
search_path just in general, for a session or something.
4. search_path for the session is conceptually different than for a
function. A session should be context-sensitive and the same query
should (quite reasonably) behave differently for different sessions and
users to sort out things like object name conflicts, etc. A function
should (ordinarily) be context-insensitive, especially when used in
something like an index expression or constraint. Having different
syntax helps separate those concepts.
I'm not sure I follow that. When you say a function should be
context-insensitive, you could also say, a function should be
context-sensitive, but have a separate context. Which is kind of how it
works now. Maybe not well enough.
5. There's no way to prevent pg_temp from being included in the
search_path. This is separately fixable, but having the proposed SEARCH
syntax is likely to make for a better user experience in the common
cases.
seems related to #3
I'm open to suggestion about other ways to improve it, but SEARCH is
what I came up with.
Some extensions of the current mechanism, like search_path = safe,
search_path = session, search_path = inherit, etc. might work.
On Fri, 2023-08-18 at 14:25 +0200, Peter Eisentraut wrote:
Not specifying SEARCH would have the same issue?
Not specifying SEARCH is equivalent to SEARCH DEFAULT, and that gives
us some control over what happens. In the proposed patch, a GUC
determines whether it behaves like SEARCH SESSION (the default for
compatibility reasons) or SEARCH SYSTEM (safer).
2. There's no way to explicitly request that you'd actually like to
use
the session's search_path, so it makes it very hard to ever change
the
default.That sounds like something that should be fixed independently. I
could
see this being useful for other GUC settings, like I want to run a
function explicitly with the session's work_mem.
I'm confused about how this would work. It doesn't make sense to set a
GUC to be the session value in postgresql.conf, because there's no
session yet. And it doesn't really make sense in a top-level session,
because it would just be a no-op (right?). It maybe makes sense in a
function, but I'm still not totally clear on what that would mean.
True, but is that specific to functions? Maybe I want a safe
search_path just in general, for a session or something.
I agree this is a somewhat orthogonal problem and we should have a way
to keep pg_temp out of the search_path entirely. We just need to agree
on a string representation of a search path that omits pg_temp. One
idea would be to have special identifiers "!pg_temp" and "!pg_catalog"
that would cause those to be excluded entirely.
I'm not sure I follow that. When you say a function should be
context-insensitive, you could also say, a function should be
context-sensitive, but have a separate context. Which is kind of how
it
works now. Maybe not well enough.
For functions called from index expressions or constraints, you want
the function's result to only depend on its arguments; otherwise you
can easily violate a constraint or cause an index to return wrong
results.
You're right that there is some other context, like the database
default collation, but (a) that's mostly nailed down; and (b) if it
changes unexpectedly that also causes problems.
I'm open to suggestion about other ways to improve it, but SEARCH
is
what I came up with.Some extensions of the current mechanism, like search_path = safe,
search_path = session, search_path = inherit, etc. might work.
I had considered some new special names like this in search path, but I
didn't come up with a specific proposal that I liked. Do you have some
more details about how this would help get us to a safe default?
Regards,
Jeff Davis
Hi,
On 2023-08-14 12:25:30 -0700, Jeff Davis wrote:
On Sat, 2023-08-12 at 11:25 -0700, Andres Freund wrote:
I'm not sure that anything based, directly or indirectly, on
search_path
really is a realistic way to get there.Can you explain a little more? I see what you mean generally, that
search_path is an imprecise thing, and that it leaves room for
ambiguity and mistakes.
It just doesn't seem to provide enough control and it's really painful for
users to manage. If you install a bunch of extensions into public - very very
common from what I have seen - you can't really remove public from the search
path. Which then basically makes all approaches of resolving any of the
security issues via search path pretty toothless.
I think that'd be pretty painful from a UX perspective. Having to
write
e.g. operators as operator(schema, op) just sucks as an experience.I'm not suggesting that the user fully-qualify everything; I'm
suggesting that the include a "SET search_path" clause if they depend
on anything other than pg_catalog.
I don't think that really works in practice, due to the very common practice
of installing extensions into the same schema as the application. Then that
schema needs to be in search path (if you don't want to schema qualify
everything), which leaves you wide open.
And with
extensions plenty of operators will live outside of pg_catalog, so
there is
plenty things that will need qualifying.In my proposal, that would still involve a "SET search_path TO
myextension, pg_catalog, pg_temp".
myextension is typically public. Which means that there's zero protection due
to such a search path.
� And because of things like type
coercion search, which prefers "bettering fitting" coercions over
search path
order, you can't just put "less important" things later in search
path.I understand this introduces some ambiguity, but you just can't include
schemas in the search_path that you don't trust, for similar reasons as
$PATH. If you have a few objects you'd like to access in another user's
schema, fully-qualify them.
I think the more common attack paths are things like tricking extension
scripts into evaluating arbitrary code, to gain "real superuser" privileges.
We can't just store the oids at the time, because that'd end up very
fragile -
tables/functions/... might be dropped and recreated etc and thus
change their
oid.Robert suggested something along those lines[1]. I won't rule it out,
but I agree that there are quite a few things left to figure out.But we could change the core PLs to rewrite all the queries (*) so
that
they schema qualify absolutely everything, including operators and
implicit
type casts.So not quite like "SET search_path FROM CURRENT": you resolve it to a
specific "schemaname.objectname", but stop just short of resolving to a
specific OID?An interesting compromise, but I'm not sure what the benefit is vs. SET
search_path FROM CURRENT (or some defined search_path).
Search path does not reliably protect things involving "type matching". If you
have a better fitting cast, or a function call with parameters that won't need
coercion, later in search path, they'll win, even if there's another fit
earlier on.
IOW, search path is a bandaid for this kind of thing, at best.
If we instead store something that avoids the need for such search, the
"better fitting cast" logic wouldn't add these kind of security issues
anymore.
That way objects referenced by functions can still be replaced, but
search
path can't be used to "inject" objects in different schemas.
Obviously it
could lead to errors on some schema changes - e.g. changing a column
type
might mean that a relevant cast lives in a different place than with
the old
type - but I think that'll be quite rare. Perhaps we could offer a
ALTER
FUNCTION ... REFRESH REFERENCES; or such?Hmm. I feel like that's making things more complicated. I'd find it
more straightforward to use something like Robert's approach of fully
parsing something, and then have the REFRESH command reparse it when
something needs updating. Or perhaps just create all of the dependency
entries more like a view query and then auto-refresh.
Hm, I'm not quite sure I follow on what exactly you see as different here.
Greetings,
Andres Freund
On Wed, Aug 16, 2023 at 1:45 PM Jeff Davis <pgsql@j-davis.com> wrote:
On Wed, 2023-08-16 at 08:51 +0200, Peter Eisentraut wrote:
On 12.08.23 04:35, Jeff Davis wrote:
The attached patch implements a new SEARCH clause for CREATE
FUNCTION.
The SEARCH clause controls the search_path used when executing
functions that were created without a SET clause.I don't understand this. This adds a new option for cases where the
existing option wasn't specified. Why not specify the existing
option
then? Is it not good enough? Can we improve it?SET search_path = '...' not good enough in my opinion.
1. Not specifying a SET clause falls back to the session's search_path,
which is a bad default because it leads to all kinds of inconsistent
behavior and security concerns.2. There's no way to explicitly request that you'd actually like to use
the session's search_path, so it makes it very hard to ever change the
default.3. It's user-unfriendly. A safe search_path that would be suitable for
most functions is "SET search_path = pg_catalog, pg_temp", which is
arcane, and requires some explanation.4. search_path for the session is conceptually different than for a
function. A session should be context-sensitive and the same query
should (quite reasonably) behave differently for different sessions and
users to sort out things like object name conflicts, etc. A function
should (ordinarily) be context-insensitive, especially when used in
something like an index expression or constraint. Having different
syntax helps separate those concepts.5. There's no way to prevent pg_temp from being included in the
search_path. This is separately fixable, but having the proposed SEARCH
syntax is likely to make for a better user experience in the common
cases.I'm open to suggestion about other ways to improve it, but SEARCH is
what I came up with.
The one thing that I really like about your proposal is that you
explicitly included a way of specifying that the prevailing
search_path should be used. If we move to any kind of a system where
the default behavior is something other than that, then we need that
as an option. Another, related thing that I recently discovered would
be useful is a way to say "I'd like to switch the search_path to X,
but I'd also like to discover what the prevailing search_path was just
before entering this function." For example, if I have a function that
is SECURITY DEFINER which takes some executable code as an input, I
might want to arrange to eventually execute that code with the
caller's user ID and search_path, but I can't discover the caller's
search_path unless I don't set it, and that's a dangerous thing to do.
However, my overall concern here is that this feels like it's
reinventing the wheel. We already have a way of setting search_path;
this gives us a second one. If we had no existing mechanism for that,
I think this would definitely be an improvement, and quite possibly
better than the current mechanism. But given that we had a mechanism
already, if we added this, we'd then have two, which seems like the
wrong number.
I'm inclined to think that if there are semantics that we currently
lack, we should think of extending the current syntax to support them.
Right now you can SET search_path = 'specific value' or SET
search_path FROM CURRENT or leave it out. We could introduce a new way
of spelling "leave it out," like RESET search_path or whatever. We
could introduce a new setting that doesn't set the search_path at all
but reverts to the old value on function exit, like SET search_path
USING CALL or whatever. And we could think of making SET search_path
FROM CURRENT or any new semantics we introduce the default in a future
release, or even make the default behavior depend on an evil
behavior-changing GUC as you proposed. I'm not quite sure what we
should do here conceptually, but I don't see why having a completely
new syntax for doing it really helps.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, 2023-08-19 at 11:59 -0700, Andres Freund wrote:
If you install a bunch of extensions into public - very very
common from what I have seen - you can't really remove public from
the search
path. Which then basically makes all approaches of resolving any of
the
security issues via search path pretty toothless.
Toothless only if (a) untrusted users have CREATE privileges in the
public schema, which is no longer the default; and (b) you're writing a
function that accesses extension objects installed in the public
schema.
While those may be normal things to do, there are a lot of times when
those things aren't true. I speculate that it's far more common to
write functions that only use pg_catalog objects (e.g. the "+"
operator, some string manipulation, etc.) and basic control flow.
There's a lot of value in making those simple cases secure-by-default.
We are already moving users towards a readable-but-not-writable public
schema as a best practice, so if we also move to something like SEARCH
SYSTEM as a best practice, then that will help a LOT of users.
I don't think that really works in practice, due to the very common
practice
of installing extensions into the same schema as the application.
Then that
schema needs to be in search path (if you don't want to schema
qualify
everything), which leaves you wide open.
...
myextension is typically public. Which means that there's zero
protection due
to such a search path.
You mentioned this three times so I must be missing something. Why is
it "wide open" and "zero protection"? If the schema is not world-
writable, then aren't attacks a lot harder to pull off?
I think the more common attack paths are things like tricking
extension
scripts into evaluating arbitrary code, to gain "real superuser"
privileges.
Extension scripts are a separate beast. I do see some potential avenues
of attack, but I don't see how your approach of resolving schemas early
would help.
Search path does not reliably protect things involving "type
matching". If you
have a better fitting cast, or a function call with parameters that
won't need
coercion, later in search path, they'll win, even if there's another
fit
earlier on.
You need to trust the schemas in your search_path.
If we instead store something that avoids the need for such search,
the
"better fitting cast" logic wouldn't add these kind of security
issues
anymore.
I don't disagree, but I don't understand the approach in detail (i.e. I
couldn't write it up as a proposal). For instance, what would the
pg_dump output look like?
And even if we had that in place, I think we'd still want a better way
to control the search_path.
Hm, I'm not quite sure I follow on what exactly you see as different
here.
From what I understand, Robert's approach is to fully parse the
commands and resolve to specific OIDs (necessitating dependencies,
etc.); while your approach resolves to fully-qualified names but not
OIDs (and needing no dependencies).
I don't understand either proposal entirely, so perhaps I'm on the
wrong track here, but I feel like Robert's approach is more "normal"
and easy to document whereas your approach is more "creative" and
perhaps hard to document.
Both approaches (resolving to names and resolving to OIDs) seem pretty
far away, so I'm still very much inclined to nudge users toward safer
best practices with search_path. I think SEARCH SYSTEM is a good start
there and doable for 17.
Regards,
Jeff Davis
On Mon, 2023-08-21 at 15:14 -0400, Robert Haas wrote:
Another, related thing that I recently discovered would
be useful is a way to say "I'd like to switch the search_path to X,
but I'd also like to discover what the prevailing search_path was
just
before entering this function."
Interesting, that could probably be accommodated one way or another.
However, my overall concern here is that this feels like it's
reinventing the wheel. We already have a way of setting search_path;
this gives us a second one.
In one sense, you are obviously right. We have a way to set search_path
for a function already, just like any other GUC.
But I don't look at the search_path as "just another GUC" when it comes
to executing a function. The source of the initial value of search_path
is more like the IMMUTABLE marker.
We can also do something with the knowledge the SEARCH marker gives us.
For instance, issue WARNINGs or ERRORs when someone uses a SEARCH
SESSION function in an index expression or constraint, or perhaps when
they try to declare a function IMMUTABLE in the first place.
In other words, the SEARCH clause tells us where search_path comes
from, not so much what it is specifically. I believe that tells us
something fundamental about the kind of function it is. If I tell you
nothing about a function except whether the search path comes from the
system or the session, you can imagine how it should be used (or not
used, as the case may be).
I'm inclined to think that if there are semantics that we currently
lack, we should think of extending the current syntax to support
them.
Right now you can SET search_path = 'specific value' or SET
search_path FROM CURRENT or leave it out. We could introduce a new
way
of spelling "leave it out," like RESET search_path or whatever.
The thought occurred to me but any way I looked at it was messier and
less user-friendly. It feels like generalizing from search_path to all
GUCs, and then needing to specialize for search_path anyway.
For instance, if we want the default search_path to be the safe value
'pg_catalog, pg_temp', where would that default value come from? Or
instead, we could say that the default would be FROM CURRENT, which
would seem to generalize; but then we immediately run into the problem
that we don't want most GUCs to default to FROM CURRENT (because that
would capture the entire GUC state, which seems bad for several
reasons), and again we'd need to specialize for search_path.
In other words, search_path really *is* special. I don't think it's
great to generalize from it as though it were just like every other
GUC.
I do recognize that "SEARCH SYSTEM ... SET search_path = '...'" is
redundant, and that's not great. I just see the other options as worse,
but if I've misunderstood your approach then please clarify.
Regards,
Jeff Davis
On Mon, Aug 21, 2023 at 5:32 PM Jeff Davis <pgsql@j-davis.com> wrote:
But I don't look at the search_path as "just another GUC" when it comes
to executing a function. The source of the initial value of search_path
is more like the IMMUTABLE marker.
I mean I agree and I disagree.
Philosophically, I agree. Most functions are written with some
particular search_path in mind; the author imagines that the function
will be executed with, well, probably whatever search path the author
typically uses themselves. Now and then, someone may write a function
that's intended to run with various different search paths, e.g.
anything of the form customer_XXXX, pg_catalog, pg_temp. I think that
is a real thing that people actually do, intentionally varying the
search_path with the idea of rebinding some references. However, cases
where somebody sincerely intends for the caller to be able to make +
or || mean something different from normal probably do not exist in
practice. So, if we were designing a system from scratch, then I would
recommend against making search_path a GUC, because it's clearly
shouldn't behave in the same way as a session property like
debug_print_plan or enable_seqscan, where you could want to run the
same code with various values.
But practically, I disagree. As things stand today, search_path *is* a
GUC that dynamically changes the run-time properties of a session, and
your proposed patch wouldn't change that. What it would do is layer
another mechanism on top of that which, IMHO, makes something that is
already complicated and error-prone even more complicated. If we
wanted to really make seach_path behave like a property of the code
rather than the session, I think we'd need to change quite a bit more
stuff, and the price of that in terms of backward-compatibility might
be higher than we'd be willing to pay, but if, hypothetically, we
decided to pay that price, then at the end of it search_path as a GUC
would be gone, and we'd have one way of managing sarch_path that is
different from the one we have now.
But with the patch as you have proposed it that's not what happens. We
just end up with two interconnected mechanisms for managing what,
right now, is managed by a single mechanism. That mechanism is (and I
think we probably mostly all agree on this) bad. Like really really
bad. But having more than one mechanism, to me, still seems worse.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, 2023-09-18 at 12:01 -0400, Robert Haas wrote:
But with the patch as you have proposed it that's not what happens.
We
just end up with two interconnected mechanisms for managing what,
right now, is managed by a single mechanism. That mechanism is (and I
think we probably mostly all agree on this) bad. Like really really
bad. But having more than one mechanism, to me, still seems worse.
I don't want to make an argument of the form "the status quo is really
bad, and therefore my proposal is good". That line of argument is
suspect for good reason.
But if my proposal isn't good enough, and we don't have a clear
alternative, we need to think seriously about how much we've
collectively over-promised and under-delivered on the concept of
privilege separation.
Absent a better idea, we need to figure out a way to un-promise what we
can't do and somehow guide users towards safe practices. For instance,
don't grant the INSERT or UPDATE privilege if the table uses functions
in index expressions or constraints. Also don't touch any table unless
the onwer has SET ROLE privileges on your role already, or the
operation is part of a special carve out (logical replication or a
maintenance command). And don't use the predefined role
pg_write_all_data, because that's unsafe for most imaginable use cases.
Regards,
Jeff Davis
On Mon, Sep 18, 2023 at 4:51 PM Jeff Davis <pgsql@j-davis.com> wrote:
I don't want to make an argument of the form "the status quo is really
bad, and therefore my proposal is good". That line of argument is
suspect for good reason.
+1.
But if my proposal isn't good enough, and we don't have a clear
alternative, we need to think seriously about how much we've
collectively over-promised and under-delivered on the concept of
privilege separation.Absent a better idea, we need to figure out a way to un-promise what we
can't do and somehow guide users towards safe practices. For instance,
don't grant the INSERT or UPDATE privilege if the table uses functions
in index expressions or constraints. Also don't touch any table unless
the onwer has SET ROLE privileges on your role already, or the
operation is part of a special carve out (logical replication or a
maintenance command). And don't use the predefined role
pg_write_all_data, because that's unsafe for most imaginable use cases.
I agree this is a mess, and that documenting the mess better would be
good. But instead of saying not to do something, we need to say what
will happen if you do the thing. I'm regularly annoyed when somebody
reports that "I tried to do X and it didn't work," instead of saying
what happened when they tried, and this situation is another form of
the same thing. "If you do X, then Y will or can occur" is much better
than "do not do X". And I think better documentation of this area
would be useful regardless of any other improvements that we may or
may not make. Indeed, really good documentation of this area might
facilitate making further improvements by highlighting some of the
problems so that they can more easily be understood by a wider
audience. I fear it will be hard to come up with something that is
clear, that highlights the severity of the problems, and that does not
veer off into useless vitriol against the status quo, but if we can
get there, that would be good.
But, leaving that to one side, what technical options do we have on
the table, supposing that we want to do something that is useful but
not this exact thing?
I think one option is to somehow change the behavior around
search_path but in a different way than you've proposed. The most
radical option would be to make it not be a GUC any more. I think the
backward-compatibility implications of that would likely be
unpalatable to many, and the details of what we'd actually do are also
not clear, at least to me. For a function, I think there is a
reasonable argument that you just make it a function property, like
IMMUTABLE, as you said before. But for code that goes directly into
the session, where's the search_path supposed to come from? It's got
to be configured somewhere, and somehow that somewhere feels a lot
like a GUC. That leads to a second idea, which is having it continue
to be a GUC but only affect directly-entered SQL, with all
indirectly-entered SQL either being stored as a node tree or having a
search_path property attached somewhere. Or, as a third idea, suppose
we leave it a GUC but start breaking semantics around where and how
that GUC gets set, e.g. by changing CREATE FUNCTION to capture the
prevailing search_path by default unless instructed otherwise.
Personally I feel like we'd need pretty broad consensus for any of
these kinds of changes because it would break a lot of stuff for a lot
of people, but if we could get that then I think we could maybe emerge
in a better spot once the pain of the compatibility break receded.
Another option is something around sandboxing and/or function trust.
The idea here is to not care too much about the search_path behavior
itself, and instead focus on the consequences, namely what code is
getting executed as which user and perhaps what kinds of operations
it's performing. To me, this seems like a possibly easier answer way
forward at least in the short to medium term, because I think it will
break fewer things for fewer people, and if somebody doesn't like the
new behavior they can just say "well I trust everyone completely" and
it all goes back to the way it was. That said, I think there are
problems with my previous proposals on the other thread so I believe
some adjustments would be needed there, and then there's the problem
of actually implementing anything. I'll try to respond to your
comments on that thread soon.
Are there any other categories of things we can do? More specific
kinds of things we can do in each category? I don't really see an
option other than (1) "change something in the system design so that
people use search_path wrongly less often" or (2) "make it so that it
doesn't matter as much if people using the wrong search_path" but
maybe I'm missing a clever idea.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, 2023-09-19 at 11:41 -0400, Robert Haas wrote:
I agree this is a mess, and that documenting the mess better would be
good. But instead of saying not to do something, we need to say what
will happen if you do the thing. I'm regularly annoyed when somebody
reports that "I tried to do X and it didn't work," instead of saying
what happened when they tried, and this situation is another form of
the same thing. "If you do X, then Y will or can occur" is much
better
than "do not do X".
Good documentation includes some guidance. Sure, it should describe the
system behavior, but without anchoring it to some kind of expected use
case, it can be equally frustrating.
Allow me to pick on this example which came up in a recent thread:
"[password_required] Specifies whether connections to the publisher
made as a result of this subscription must use password authentication.
This setting is ignored when the subscription is owned by a superuser.
The default is true. Only superusers can set this value to false."
-- https://www.postgresql.org/docs/16/sql-createsubscription.html
Only superusers can set it, and it's ignored for superusers. That does
a good job of describing the actual behavior, but is a bit puzzling.
I guess what the user is supposed to do is one of:
1. Create a subscription as a superuser with the right connection
string (without a password) and password_required=false, then reassign
it to a non-superuser; or
2. Create a subscription as a non-superuser member of
pg_create_subscription using a bogus connection string, then a
superuser can alter it to set password_required=false, then alter the
connection string; or
3. Create a superuser, let the new superuser create a subscription
with password_required=false, and then remove their superuser status.
so why not just document one of those things as the expected thing to
do? Not a whole section or anything, but a sentence to suggest what
they should do or where else they should look.
I don't mean to set some major new standard in the documentation that
should apply to everything; but for the privilege system, even hackers
are having trouble keeping up (myself included). A bit of guidance
toward supported use cases helps a lot.
I fear it will be hard to come up with something that is
clear, that highlights the severity of the problems, and that does
not
veer off into useless vitriol against the status quo, but if we can
get there, that would be good.
I hope what I'm saying is not useless vitriol. I am offering the best
solutions I see in a bad situation. And I believe I've uncovered some
emergent behaviors that are not well-understood even among prominent
hackers.
That leads to a second idea, which is having it continue
to be a GUC but only affect directly-entered SQL, with all
indirectly-entered SQL either being stored as a node tree or having a
search_path property attached somewhere.
That's not too far from the proposed patch and I'd certainly be
interested to hear more and/or adapt my patch towards this idea.
Or, as a third idea, suppose
we leave it a GUC but start breaking semantics around where and how
that GUC gets set, e.g. by changing CREATE FUNCTION to capture the
prevailing search_path by default unless instructed otherwise.
How would one instruct otherwise?
Personally I feel like we'd need pretty broad consensus for any of
these kinds of changes
+1
because it would break a lot of stuff for a lot
of people, but if we could get that then I think we could maybe
emerge
in a better spot once the pain of the compatibility break receded.
Are there ways we can soften this a bit? I know compatibility GUCs are
not to be added lightly, but perhaps one is justified here?
Another option is something around sandboxing and/or function trust.
The idea here is to not care too much about the search_path behavior
itself, and instead focus on the consequences, namely what code is
getting executed as which user and perhaps what kinds of operations
it's performing.
I'm open to discussing that further, and it certainly may solve some
problems, but it does not seem to solve the fundamental problem with
search_path: that the caller can (intentionally or unintentionally)
cause a function to do unexpected things.
Sometimes an unexpected thing is not a the kind of thing that would be
caught by a sandbox, e.g. just an unexpected function result. But if
that function is used in a constraint or expression index, that
unexpected result can lead to a violated constraint or a bad index
(that will later cause wrong results). The owner of the table might
reasonably consider that a privilege problem, if the user who causes
the trouble had only INSERT privileges.
Are there any other categories of things we can do? More specific
kinds of things we can do in each category? I don't really see an
option other than (1) "change something in the system design so that
people use search_path wrongly less often" or (2) "make it so that it
doesn't matter as much if people using the wrong search_path" but
maybe I'm missing a clever idea.
Perhaps there are some clever ideas about maintaining compatibility
within the approaches (1) or (2), which might make one of them more
appealing.
Regards,
Jeff Davis
On Tue, Sep 19, 2023 at 5:56 PM Jeff Davis <pgsql@j-davis.com> wrote:
...
On Tue, 2023-09-19 at 11:41 -0400, Robert Haas wrote:That leads to a second idea, which is having it continue
to be a GUC but only affect directly-entered SQL, with all
indirectly-entered SQL either being stored as a node tree or having a
search_path property attached somewhere.That's not too far from the proposed patch and I'd certainly be
interested to hear more and/or adapt my patch towards this idea.
As an interested bystander, that's the same thing I was thinking when
reading this. I reread your original e-mail, Jeff, and I still think
that.
I wonder if something like CURRENT (i.e., the search path at function
creation time) might be a useful keyword addition. I can see some uses
(more forgiving than SYSTEM but not as loose as SESSION), but I don't
know if it would justify its presence.
Thanks for working on this.
Thanks,
Maciek