Facility for detecting insecure object naming
Commit e09144e documented the rules for writing safe qualified names, but
those rules are tedious to apply in practice. Spotting the defects in this
function definition (from an unpublished draft intended for
/messages/by-id/20180710014308.GA805781@rfd.leadboat.com) is, I think, too
hard:
CREATE FUNCTION latitude(earth)
RETURNS float8
LANGUAGE SQL
IMMUTABLE STRICT
PARALLEL SAFE
AS 'SELECT CASE
WHEN @cube_schema@.cube_ll_coord($1, 3) OPERATOR(pg_catalog./)
@extschema@.earth() OPERATOR(pg_catalog.<) -1 THEN -90::pg_catalog.float8
WHEN @cube_schema@.cube_ll_coord($1, 3) OPERATOR(pg_catalog./)
@extschema@.earth() OPERATOR(pg_catalog.>) 1 THEN 90::pg_catalog.float8
ELSE pg_catalog.degrees(pg_catalog.asin(@cube_schema@.cube_ll_coord($1, 3)
OPERATOR(pg_catalog./) @extschema@.earth()))
END';
If hackers and non-core extension authors are to write such code, let's make
it easier to check the work. Different classes of code need different checks.
In each class, qualified function and operator names referring to untrusted
schemas need an exact match of function parameters, including any VARIADIC.
Class-specific rules:
a. SQL intended to run under secure search_path. No class-specific rules.
src/bin code is an example of this class, and this is the only secure class
for end-user applications.
b. SQL intended for a particular search_path, possibly untrusted. Unqualified
names need an exact match. Use a qualified name for any object whose
schema appears in search_path later than some untrusted schema. Examples
of this class include extension scripts, pre-CVE-2018-1058 pg_dump, some
functions with "SET search_path", and many casual end-user applications.
c. SQL intended to work the same under every search_path setting. Do not use
any unqualified name. Most pg_catalog and contrib functions, but not those
having a "SET search_path" annotation, are examples of this class.
I believe PostgreSQL can apply each class's rules given a list of trusted
schemas and a flag to enable the checks. Class (b) naturally degenerates to
class (a) if every schema of search_path appears in the whitelist. To check
class-(c) code, issue "SET search_path = not_in_whitelist, pg_temp,
pg_catalog, ..." before the test queries. That's something of a hack, but I
didn't think of a non-hack that I liked better.
Should this feature warn about "SELECT 'earth()'::pg_catalog.regprocedure"
under the conditions that would make it warn about "SELECT earth()"? Should
it offer the option to warn or not warn? Some uses of reg*,
e.g. appendQualifiedRelation(), would consider those to be false positives.
Then there's the question of exact UI naming. Some possibilities:
SET lint_trusted_schemas = pg_catalog, admin
SET lint = reg*, exact_match, qualified_name
SET lint = all
SET lint = ''
SET lint_trusted_schemas = pg_catalog, admin
SET lint_name_security = on
SET name_security_warning_trusted_schemas = pg_catalog, admin
SET name_security_warning = on
SET name_security_warnings_trusted_schemas = pg_catalog, admin
SET warnings = reg*, exact_match, qualified_name
Preferences, other ideas?
On Sun, Aug 5, 2018 at 1:34 PM, Noah Misch <noah@leadboat.com> wrote:
If hackers and non-core extension authors are to write such code, let's make
it easier to check the work.
+1. Better still would be to invent a way to remove the need for such
onerous qualification, but I don't have a good idea.
a. SQL intended to run under secure search_path. No class-specific rules.
src/bin code is an example of this class, and this is the only secure class
for end-user applications.b. SQL intended for a particular search_path, possibly untrusted. Unqualified
names need an exact match. Use a qualified name for any object whose
schema appears in search_path later than some untrusted schema. Examples
of this class include extension scripts, pre-CVE-2018-1058 pg_dump, some
functions with "SET search_path", and many casual end-user applications.c. SQL intended to work the same under every search_path setting. Do not use
any unqualified name. Most pg_catalog and contrib functions, but not those
having a "SET search_path" annotation, are examples of this class.
If I understand correctly, the only difference between (b) and (c) is
that references to an object in the very first schema in the search
path could be unqualified; in most cases, that would be pg_catalog.
So I guess what we need is a UI that lets you say either:
- Don't tell me about anything, or
- Tell me about all unqualified references, or
- Tell me about all unqualified references except those that latch
onto an object in <list of schemas>
Personally, I'm not entirely sold on having that third capability. I
guess it's valuable, but the second one seems like the much more
valuable thing.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Aug 08, 2018 at 12:00:45PM +0530, Robert Haas wrote:
On Sun, Aug 5, 2018 at 1:34 PM, Noah Misch <noah@leadboat.com> wrote:
If hackers and non-core extension authors are to write such code, let's make
it easier to check the work.+1. Better still would be to invent a way to remove the need for such
onerous qualification, but I don't have a good idea.
Agreed. Thanks for thinking about it.
a. SQL intended to run under secure search_path. No class-specific rules.
src/bin code is an example of this class, and this is the only secure class
for end-user applications.b. SQL intended for a particular search_path, possibly untrusted. Unqualified
names need an exact match. Use a qualified name for any object whose
schema appears in search_path later than some untrusted schema. Examples
of this class include extension scripts, pre-CVE-2018-1058 pg_dump, some
functions with "SET search_path", and many casual end-user applications.c. SQL intended to work the same under every search_path setting. Do not use
any unqualified name. Most pg_catalog and contrib functions, but not those
having a "SET search_path" annotation, are examples of this class.If I understand correctly, the only difference between (b) and (c) is
that references to an object in the very first schema in the search
path could be unqualified; in most cases, that would be pg_catalog.
In (b), there's no bound on the number of schemas for which qualification is
optional. Consider search_path=pg_temp,pg_catalog,admin,"$user",public with
you trusting each of those schemas except "public". The rules of (b) then do
not require any qualified names, though unqualified names shall arrange an
exact match of argument types. On the other hand, with
search_path=public,equally_public, one shall qualify all names of objects
located in equally_public.
So I guess what we need is a UI that lets you say either:
- Don't tell me about anything, or
- Tell me about all unqualified references, or
- Tell me about all unqualified references except those that latch
onto an object in <list of schemas>
"SELECT exp(1.0)" merits a warning under search_path=public,pg_catalog but not
under search_path=pg_catalog,public. Thus, it's more subtle than your last
bullet. That's why I envisioned the user declaring which schemas to trust.
The system uses that and the actual search_path to choose warnings.
Personally, I'm not entirely sold on having that third capability. I
guess it's valuable, but the second one seems like the much more
valuable thing.
That's fair. Even without pursuing that, it's valuable to know the list of
trusted schemas so we can relax about exact argument type matches when the
function is in a trusted schema. For example, pg_catalog.exp(1) is safe
because no hostile user can create pg_catalog.exp(int4).
Robert Haas <robertmhaas@gmail.com> writes:
+1. Better still would be to invent a way to remove the need for such
onerous qualification, but I don't have a good idea.
Yeah.
So I guess what we need is a UI that lets you say either:
- Don't tell me about anything, or
- Tell me about all unqualified references, or
- Tell me about all unqualified references except those that latch
onto an object in <list of schemas>
Personally, I'm not entirely sold on having that third capability. I
guess it's valuable, but the second one seems like the much more
valuable thing.
I'm not sold on #2 either. That path leads to, for example,
s/=/OPERATOR(pg_catalog.=)/g everywhere, which is utterly catastrophic
to both readability and portability of your SQL code. We *must* find
a way to do better, not tell people that's what to do.
When the security team was discussing this issue before, we speculated
about ideas like inventing a function trust mechanism, so that attacks
based on search path manipulations would fail even if they managed to
capture an operator reference. I'd rather go down that path than
encourage people to do more schema qualification.
regards, tom lane
On 8 August 2018 at 09:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
When the security team was discussing this issue before, we speculated
about ideas like inventing a function trust mechanism, so that attacks
based on search path manipulations would fail even if they managed to
capture an operator reference. I'd rather go down that path than
encourage people to do more schema qualification.
I must be missing something. Aren't search_path manipulation problems
avoided by using "SET search_path FROM CURRENT"?
While I'm asking, does anybody know why this isn't the default, especially
for SECURITY DEFINER functions? It seems like in addition to being a more
secure default, it would be better for JIT compilation - right now it seems
you need to re-compile whenever the function is called with a different
search_path. The ability for a function's meaning to change dramatically
depending on the caller's search_path seems like an occasionally-useful
extra, not what one would expect as the default.
Isaac Morland <isaac.morland@gmail.com> writes:
On 8 August 2018 at 09:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
When the security team was discussing this issue before, we speculated
about ideas like inventing a function trust mechanism, so that attacks
based on search path manipulations would fail even if they managed to
capture an operator reference. I'd rather go down that path than
encourage people to do more schema qualification.
I must be missing something. Aren't search_path manipulation problems
avoided by using "SET search_path FROM CURRENT"?
No, not if you have any publicly-writable schemas anywhere in your
search path. The reason this business is so nasty is that our
historical default ("$user, public", with pg_catalog implicitly
searched before that) is actually insecure, even for references
intended to go to pg_catalog. There are too many cases where the
ambiguous-operator resolution rules will allow a reference to be
captured by a similarly-named operator later in the search path.
If you're using a secure search_path to start with, it's possible
that decorating your functions with SET search_path FROM CURRENT
would be helpful, but it's not like that's some kind of magic bullet.
While I'm asking, does anybody know why this isn't the default, especially
for SECURITY DEFINER functions?
It might fix some subset of security issues, but I think that changing
the default behavior like that would break a bunch of other use-cases.
It'd be especially surprising for such a thing to apply only to
SECURITY DEFINER functions.
The bigger picture here is that it's really hard to fix this class
of problems by trying to dictate changes in the way people have their
search paths set up. You're much more likely to break working
applications than help anybody. I'm still of the opinion that we're
going to be forced to provide loopholes in what we did to pg_dump
et al in CVE-2018-1058. We've been seeing an increasing number
of complaints about that since the patches came out, and I'm pretty
sure that most of the complainers are totally uninterested in
defending against share-my-database-with-a-hostile-user scenarios.
Why should they have to complicate their lives because of problems
that other people might have?
The advantage of a function trust mechanism is that it'd provide
security against calling functions you didn't intend to without
any visible changes in normal application behavior. The security
team gave up on that approach because it seemed too complicated to
pursue as a secretly-developed security patch, but I still think
it's the right long-term answer.
regards, tom lane
On Aug 8, 2018, at 7:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Isaac Morland <isaac.morland@gmail.com> writes:
On 8 August 2018 at 09:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
When the security team was discussing this issue before, we speculated
about ideas like inventing a function trust mechanism, so that attacks
based on search path manipulations would fail even if they managed to
capture an operator reference. I'd rather go down that path than
encourage people to do more schema qualification.I must be missing something. Aren't search_path manipulation problems
avoided by using "SET search_path FROM CURRENT"?No, not if you have any publicly-writable schemas anywhere in your
search path. The reason this business is so nasty is that our
historical default ("$user, public", with pg_catalog implicitly
searched before that) is actually insecure, even for references
intended to go to pg_catalog. There are too many cases where the
ambiguous-operator resolution rules will allow a reference to be
captured by a similarly-named operator later in the search path.If you're using a secure search_path to start with, it's possible
that decorating your functions with SET search_path FROM CURRENT
would be helpful, but it's not like that's some kind of magic bullet.While I'm asking, does anybody know why this isn't the default, especially
for SECURITY DEFINER functions?It might fix some subset of security issues, but I think that changing
the default behavior like that would break a bunch of other use-cases.
It'd be especially surprising for such a thing to apply only to
SECURITY DEFINER functions.The bigger picture here is that it's really hard to fix this class
of problems by trying to dictate changes in the way people have their
search paths set up. You're much more likely to break working
applications than help anybody. I'm still of the opinion that we're
going to be forced to provide loopholes in what we did to pg_dump
et al in CVE-2018-1058. We've been seeing an increasing number
of complaints about that since the patches came out, and I'm pretty
sure that most of the complainers are totally uninterested in
defending against share-my-database-with-a-hostile-user scenarios.
Why should they have to complicate their lives because of problems
that other people might have?The advantage of a function trust mechanism is that it'd provide
security against calling functions you didn't intend to without
any visible changes in normal application behavior. The security
team gave up on that approach because it seemed too complicated to
pursue as a secretly-developed security patch, but I still think
it's the right long-term answer.
Do you have a WIP patch partially developed for this? If it is no
longer secret, perhaps the rest of us could take a look?
mark
Mark Dilger <hornschnorter@gmail.com> writes:
On Aug 8, 2018, at 7:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The advantage of a function trust mechanism is that it'd provide
security against calling functions you didn't intend to without
any visible changes in normal application behavior. The security
team gave up on that approach because it seemed too complicated to
pursue as a secretly-developed security patch, but I still think
it's the right long-term answer.
Do you have a WIP patch partially developed for this? If it is no
longer secret, perhaps the rest of us could take a look?
Yeah, I do have a POC prototype, let me blow the dust off it ...
regards, tom lane
On Wed, Aug 08, 2018 at 09:58:38AM -0400, Tom Lane wrote:
I'm not sold on #2 either. That path leads to, for example,
s/=/OPERATOR(pg_catalog.=)/g everywhere, which is utterly catastrophic
to both readability and portability of your SQL code. We *must* find
a way to do better, not tell people that's what to do.When the security team was discussing this issue before, we speculated
about ideas like inventing a function trust mechanism, so that attacks
based on search path manipulations would fail even if they managed to
capture an operator reference. I'd rather go down that path than
encourage people to do more schema qualification.
Interesting. If we got a function trust mechanism, how much qualification
would you then like? Here are the levels I know about, along with their
implications:
-- (1) Use qualified references and exact match for all objects.
--
-- Always secure, even if schema usage does not conform to ddl-schemas-patterns
-- and function trust is disabled.
--
-- Subject to denial of service from anyone able to CREATE in cube schema or
-- earthdistance schema.
CREATE FUNCTION latitude(earth)
RETURNS float8
LANGUAGE SQL
IMMUTABLE STRICT
PARALLEL SAFE
AS $$SELECT CASE
WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3)
OPERATOR(pg_catalog./)
@extschema@.earth() OPERATOR(pg_catalog.<) -1 THEN -90::pg_catalog.float8
WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3)
OPERATOR(pg_catalog./)
@extschema@.earth() OPERATOR(pg_catalog.>) 1 THEN 90::pg_catalog.float8
ELSE pg_catalog.degrees(pg_catalog.asin(@cube_schema@.cube_ll_coord(
$1::@cube_schema@.cube, 3) OPERATOR(pg_catalog./) @extschema@.earth()))
END$$;
-- (2) Use qualified references for objects outside pg_catalog.
--
-- With function trust disabled, this would be subject to privilege escalation
-- from anyone able to CREATE in cube schema.
--
-- Subject to denial of service from anyone able to CREATE in cube schema or
-- earthdistance schema.
CREATE FUNCTION latitude(earth)
RETURNS float8
LANGUAGE SQL
IMMUTABLE STRICT
PARALLEL SAFE
AS $$SELECT CASE
WHEN @cube_schema@.cube_ll_coord($1, 3)
/
@extschema@.earth() < -1 THEN -90::float8
WHEN @cube_schema@.cube_ll_coord($1, 3)
/
@extschema@.earth() > 1 THEN 90::float8
ELSE degrees(asin(@cube_schema@.cube_ll_coord($1, 3) / @extschema@.earth()))
END$$;
-- (3) "SET search_path" with today's code.
--
-- Security and reliability considerations are the same as (2). Today, this
-- reduces performance by suppressing optimizations like inlining.
CREATE FUNCTION latitude(earth)
RETURNS float8
LANGUAGE SQL
IMMUTABLE STRICT
PARALLEL SAFE
SET search_path FROM CURRENT
AS $$SELECT CASE
WHEN cube_ll_coord($1, 3)
/
earth() < -1 THEN -90::float8
WHEN cube_ll_coord($1, 3)
/
earth() > 1 THEN 90::float8
ELSE degrees(asin(cube_ll_coord($1, 3) / earth()))
END$$;
-- (4) Today's code (reformatted).
--
-- Always secure if schema usage conforms to ddl-schemas-patterns, even if
-- function trust is disabled. If cube schema or earthdistance schema is not in
-- search_path, function doesn't work.
CREATE FUNCTION latitude(earth)
RETURNS float8
LANGUAGE SQL
IMMUTABLE STRICT
PARALLEL SAFE
AS $$SELECT CASE
WHEN cube_ll_coord($1, 3)
/
earth() < -1 THEN -90::float8
WHEN cube_ll_coord($1, 3)
/
earth() > 1 THEN 90::float8
ELSE degrees(asin(cube_ll_coord($1, 3) / earth()))
END$$;
On Sat, Aug 11, 2018 at 12:47:05PM -0700, Noah Misch wrote:
-- (3) "SET search_path" with today's code.
--
-- Security and reliability considerations are the same as (2). Today, this
-- reduces performance by suppressing optimizations like inlining.
Out of curiosity, why does this suppress inlining?
Anyways, my preference would be to have syntax by which to say: resolve
at declaration time using the then-in-effect search_path and store
as-qualified. This could just be SET search_path without an assignment.
CREATE FUNCTION ... AS $$ ... $$ SET search_path;
Another possibility would be to have a way to set a search_path for all
expressions in a given schema, something like:
SET SCHEMA my_schema DEFAULT search_path = ...;
which would apply to all expressions in schema elements in schema
"my_schema":
- CHECK expressions
- INDEX expressions
- VIEWs and MATERIALIZED VIEWs
- FUNCTION and STORED PROCEDURE bodies
- ...
CREATE SCHEMA IF NOT EXISTS my_schema;
SET SCHEMA my_schema DEFAULT search_path = my_schema, my_other_schema;
CREATE OR REPLACE FUNCTION foo() ... AS $$ ... $$;
...
Nico
--
On Wed, Aug 08, 2018 at 10:47:04AM -0400, Tom Lane wrote:
Isaac Morland <isaac.morland@gmail.com> writes:
While I'm asking, does anybody know why this isn't the default, especially
for SECURITY DEFINER functions?It might fix some subset of security issues, but I think that changing
the default behavior like that would break a bunch of other use-cases.
It'd be especially surprising for such a thing to apply only to
SECURITY DEFINER functions.
Some projects consider breaking backwards compatibility to fix security
problems (within reason, and with discussion) to be a fair thing to do.
Already people have to qualify their apps for every release of PG. I
think this problem very much deserves a good solution.
Nico
--
On Sat, Aug 11, 2018 at 03:32:23PM -0500, Nico Williams wrote:
On Sat, Aug 11, 2018 at 12:47:05PM -0700, Noah Misch wrote:
-- (3) "SET search_path" with today's code.
--
-- Security and reliability considerations are the same as (2). Today, this
-- reduces performance by suppressing optimizations like inlining.Out of curiosity, why does this suppress inlining?
The function call machinery, fmgr_security_definer(), is what actually applies
the setting. To inline such functions, one would need to develop a
representation that attaches the setting change to the nodes resulting from
the inlining. When evaluating said nodes, apply the attached setting.
Anyways, my preference would be to have syntax by which to say: resolve
at declaration time using the then-in-effect search_path and store
as-qualified.
Agreed; having that would be great. (I mentioned it as option (7) of
/messages/by-id/20180710014308.GA805781@rfd.leadboat.com.) It has
limitations, though:
- Does not help with inexact argument type matches.
- While the applicability to sql-language functions seems clear, other
languages don't benefit as much. You might extend it to a subset of
PL/pgSQL functions, excluding e.g. ones that contain EXECUTE. I see no
chance to help PL/Perl or PL/Python.
- Unlikely to be a good back-patch candidate.
Another possibility would be to have a way to set a search_path for all
expressions in a given schema, something like:SET SCHEMA my_schema DEFAULT search_path = ...;
which would apply to all expressions in schema elements in schema
"my_schema":- CHECK expressions
- INDEX expressions
- VIEWs and MATERIALIZED VIEWs
- FUNCTION and STORED PROCEDURE bodies
- ...CREATE SCHEMA IF NOT EXISTS my_schema;
SET SCHEMA my_schema DEFAULT search_path = my_schema, my_other_schema;
CREATE OR REPLACE FUNCTION foo() ... AS $$ ... $$;
That's interesting. I suspect we'd first want to make inlining possible.
On Wed, Aug 8, 2018 at 1:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm not sold on #2 either. That path leads to, for example,
s/=/OPERATOR(pg_catalog.=)/g everywhere, which is utterly catastrophic
to both readability and portability of your SQL code. We *must* find
a way to do better, not tell people that's what to do.When the security team was discussing this issue before, we speculated
about ideas like inventing a function trust mechanism, so that attacks
based on search path manipulations would fail even if they managed to
capture an operator reference. I'd rather go down that path than
encourage people to do more schema qualification.
The more I think about it, the more I think having a way to set a
lexically-scoped search path is probably the answer. Many years ago,
Perl had only "local" as a way of declaring local variables, which
used dynamic scoping, and then they invented "my", which uses lexical
scoping, and everybody abandoned "local" and started using "my,"
because otherwise a reference to a variable inside a procedure may
happen to refer to a local variable further up the call stack rather
than a global variable if the names happen to collide. It turns out
that not knowing to which object a reference will refer is awful, and
you should avoid it like the plague. This is exactly the same problem
we have with search_path. People define functions -- or index
expressions -- assuming that the function and operator references will
refer to the same thing at execution time that they do at definition
time.
That is, I think, an entirely *reasonable* expectation. Basically all
programming languages work that way. If you could call a C function
in such a way that the meaning of identifiers that appeared there was
dependent on some piece of state inherited from the caller's context
rather than from the declarations visible at the time that the C
function was compiled, it would be utter and complete chaos. It would
in fact greatly resemble the present state of affairs in PostgreSQL,
where making your code reliably mean what you intended requires either
forcing the search path everywhere or schema-qualifying the living
daylights out of everything. I think we should try to follow Perl's
example, acknowledge that we basically got this wrong on the first
try, and invent something new that works better.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Aug 14, 2018 at 03:00:55PM +0000, Robert Haas wrote:
The more I think about it, the more I think having a way to set a
lexically-scoped search path is probably the answer. Many years ago,
Perl had only "local" as a way of declaring local variables, which
used dynamic scoping, and then they invented "my", which uses lexical
scoping, and everybody abandoned "local" and started using "my,"
because otherwise a reference to a variable inside a procedure may
happen to refer to a local variable further up the call stack rather
than a global variable if the names happen to collide. It turns out
that not knowing to which object a reference will refer is awful, and
you should avoid it like the plague. This is exactly the same problem
we have with search_path. People define functions -- or index
expressions -- assuming that the function and operator references will
refer to the same thing at execution time that they do at definition
time.That is, I think, an entirely *reasonable* expectation. Basically all
programming languages work that way. If you could call a C function
in such a way that the meaning of identifiers that appeared there was
dependent on some piece of state inherited from the caller's context
rather than from the declarations visible at the time that the C
function was compiled, it would be utter and complete chaos. It would
in fact greatly resemble the present state of affairs in PostgreSQL,
where making your code reliably mean what you intended requires either
forcing the search path everywhere or schema-qualifying the living
daylights out of everything. I think we should try to follow Perl's
example, acknowledge that we basically got this wrong on the first
try, and invent something new that works better.
Well, in C, if the calling function is not in the current C file, you
really don't know what function you are linking to --- it depends on
what other object files are linked into the executable. I am unclear
how lexical scoping helps with this, or with Postgres.
With Postgres you are effectively adding functions into the executable
that didn't exist at link time, and potentially replacing compiled-in
functions with others that might match the call site better.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
On Tue, Aug 14, 2018 at 3:31 PM, Bruce Momjian <bruce@momjian.us> wrote:
Well, in C, if the calling function is not in the current C file, you
really don't know what function you are linking to --- it depends on
what other object files are linked into the executable.
True.
I am unclear how lexical scoping helps with this, or with Postgres.
Well, if you say sqrt(x) in C, and you don't have sqrt or x defined in
the current function, you know that you're going to call a global
function called sqrt and pass to it a global variable called x.
You're not going to latch onto a local variable called x in the
function that called your function, and if the calling function has a
local variable called sqrt, that doesn't matter either. The linker
can point every called to sqrt() in the program at a different place
than expected, but you can't monkey with the behavior of an individual
call by having the calling function declare identifiers that mask the
ones the function intended to reference.
On the other hand, when you call an SQL-language function, or even to
some extent a plpgsql-language function, from PostgreSQL, you can in
fact change the meanings of every identifier that appears in that
function body unless those references are all schema-qualified or the
function sets the search path. If an SQL-language function calls
sqrt(x), you can set the search_path to some schema you've created
where there is a completely different sqrt() function than that
function intended to call. That is a very good way to make stuff (a)
break or (b) be insecure.
The point of having a lexically-scoped search path is that it is
generally the case that when you write SQL code, you know the search
path under which you want it to execute. You can get that behavior
today by attaching a SET clause to the function, but that defeats
inlining, is slower, and the search path you configure applies not
only to the code to which is attached but to any code which it in turn
calls. In other words, the search path setting which you configure
has dynamic scope. I submit that's bad. Functions can reasonably be
expected to know the search path that should be used to run the code
they actually contain, but they cannot and should not need to know the
search path that should be used to run code in other functions which
they happen to call.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Aug 14, 2018 at 04:23:33PM -0400, Robert Haas wrote:
On Tue, Aug 14, 2018 at 3:31 PM, Bruce Momjian <bruce@momjian.us> wrote:
Well, in C, if the calling function is not in the current C file, you
really don't know what function you are linking to --- it depends on
what other object files are linked into the executable.True.
I am unclear how lexical scoping helps with this, or with Postgres.
Well, if you say sqrt(x) in C, and you don't have sqrt or x defined in
the current function, you know that you're going to call a global
function called sqrt and pass to it a global variable called x.
You're not going to latch onto a local variable called x in the
function that called your function, and if the calling function has a
local variable called sqrt, that doesn't matter either. The linker
can point every called to sqrt() in the program at a different place
than expected, but you can't monkey with the behavior of an individual
call by having the calling function declare identifiers that mask the
ones the function intended to reference.
What we are doing is more like C++ virtual functions, where the class
calling it can replace function calls in subfunctions:
https://www.geeksforgeeks.org/virtual-function-cpp/
On the other hand, when you call an SQL-language function, or even to
some extent a plpgsql-language function, from PostgreSQL, you can in
fact change the meanings of every identifier that appears in that
function body unless those references are all schema-qualified or the
function sets the search path. If an SQL-language function calls
I think we decide that search path alone for functions is insufficient
because of data type matching, unless the schema is secure.
sqrt(x), you can set the search_path to some schema you've created
where there is a completely different sqrt() function than that
function intended to call. That is a very good way to make stuff (a)
break or (b) be insecure.The point of having a lexically-scoped search path is that it is
generally the case that when you write SQL code, you know the search
path under which you want it to execute. You can get that behavior
today by attaching a SET clause to the function, but that defeats
inlining, is slower, and the search path you configure applies not
only to the code to which is attached but to any code which it in turn
calls. In other words, the search path setting which you configure
has dynamic scope. I submit that's bad. Functions can reasonably be
expected to know the search path that should be used to run the code
they actually contain, but they cannot and should not need to know the
search path that should be used to run code in other functions which
they happen to call.
So you are saying PG functions should lock down their search path at
function definition time, and use that for all function invocations?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
On Aug 14, 2018, at 8:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 8, 2018 at 1:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm not sold on #2 either. That path leads to, for example,
s/=/OPERATOR(pg_catalog.=)/g everywhere, which is utterly catastrophic
to both readability and portability of your SQL code. We *must* find
a way to do better, not tell people that's what to do.When the security team was discussing this issue before, we speculated
about ideas like inventing a function trust mechanism, so that attacks
based on search path manipulations would fail even if they managed to
capture an operator reference. I'd rather go down that path than
encourage people to do more schema qualification.The more I think about it, the more I think having a way to set a
lexically-scoped search path is probably the answer. Many years ago,
Perl had only "local" as a way of declaring local variables, which
used dynamic scoping, and then they invented "my", which uses lexical
scoping, and everybody abandoned "local" and started using "my,"
because otherwise a reference to a variable inside a procedure may
happen to refer to a local variable further up the call stack rather
than a global variable if the names happen to collide. It turns out
that not knowing to which object a reference will refer is awful, and
you should avoid it like the plague. This is exactly the same problem
we have with search_path. People define functions -- or index
expressions -- assuming that the function and operator references will
refer to the same thing at execution time that they do at definition
time.That is, I think, an entirely *reasonable* expectation. Basically all
programming languages work that way. If you could call a C function
in such a way that the meaning of identifiers that appeared there was
dependent on some piece of state inherited from the caller's context
rather than from the declarations visible at the time that the C
function was compiled, it would be utter and complete chaos. It would
in fact greatly resemble the present state of affairs in PostgreSQL,
where making your code reliably mean what you intended requires either
forcing the search path everywhere or schema-qualifying the living
daylights out of everything. I think we should try to follow Perl's
example, acknowledge that we basically got this wrong on the first
try, and invent something new that works better.
I think you are interpreting the problem too broadly. This is basically
just a privilege escalation attack vector. If there is a function that
gets run under different privileges than those of the caller, and if the
caller can coerce the function to do something with those elevated
privileges that the function author did not intend, then the caller can
do mischief. But if the caller is executing a function that operates
under the caller's own permissions, then there is no mischief possible,
since the caller can't trick the function to do anything that the caller
couldn't do herself directly.
For example, if there is a function that takes type anyarray and then
performs operations over the elements of that array using operator=,
there is no problem with me defining an operator= prior to calling that
function and having my operator be the one that gets used, assuming the
function doesn't escalate privileges.
To my mind, the search_path should get set whenever SetUserIdAndSecContext
or similar gets called and restored when SetUserIdAndSecContext gets called
to restore the saved uid. Currently coded stuff like:
GetUserIdAndSecContext(&saved_uid, &save_sec_context);
...
if (saved_uid != authid_uid)
SetUserIdAndSecContext(authid_uid,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE);
...
SetUserIdAndSecContext(saved_uid, save_sec_context);
need to have a third variable, saved_search_path or similar.
Thoughts?
mark
On Tue, Aug 14, 2018 at 03:00:55PM +0000, Robert Haas wrote:
The more I think about it, the more I think having a way to set a
lexically-scoped search path is probably the answer. [...]
Yes please!
This is what I want. Evaluate the search_path at function definition
time, and record code with fully-qualified symbols in the catalog.
Nico
--
On 08/14/2018 07:01 PM, Nico Williams wrote:
On Tue, Aug 14, 2018 at 03:00:55PM +0000, Robert Haas wrote:
The more I think about it, the more I think having a way to set a
lexically-scoped search path is probably the answer. [...]Yes please!
This is what I want. Evaluate the search_path at function definition
time, and record code with fully-qualified symbols in the catalog.
What would the interface to that look like for out-of-tree PL
maintainers?
-Chap
On Aug 14, 2018, at 4:01 PM, Nico Williams <nico@cryptonector.com> wrote:
On Tue, Aug 14, 2018 at 03:00:55PM +0000, Robert Haas wrote:
The more I think about it, the more I think having a way to set a
lexically-scoped search path is probably the answer. [...]Yes please!
This is what I want. Evaluate the search_path at function definition
time, and record code with fully-qualified symbols in the catalog.
Unfortunately, that falls apart for relocatable extensions.