leaky views, yet again
I think we pretty much have conceptual agreement on the shape of the
solution to this problem: when a view is created with CREATE SECURITY
VIEW, restrict functions and operators that might disclose tuple data
from being pushed down into view (unless, perhaps, the user invoking
the view has sufficient privileges to execute the underlying query
anyhow, e.g. superuser or view owner). What we have not resolved is
exactly how we're going to decide which functions and operators might
do such a dastardly thing. I think the viable options are as follows:
1. Adopt Heikki's proposal of treating indexable operators as non-leaky.
http://archives.postgresql.org/pgsql-hackers/2010-06/msg00291.php
Or, perhaps, and even more restrictively, treat only
hashable/mergeable operators as non-leaky.
2. Add an explicit flag to pg_proc, which can only be set by
superusers (and which is cleared if a non-superuser modifies it in any
way), allowing a function to be tagged as non-leaky. I believe that
it would be reasonable to set this flag for all of our built-in
indexable operators (though I'd read the code before doing it), but it
would remove the need for the assumption that third-party operators
are equally sane.
CREATE OR REPLACE FUNCTION doesnt_leak() RETURNS integer AS $$SELECT
42$$ IMMUTABLE SEAWORTHY; -- doesn't leak
This problem is not going away, so we should try to decide on something.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
(2010/07/08 22:08), Robert Haas wrote:
I think we pretty much have conceptual agreement on the shape of the
solution to this problem: when a view is created with CREATE SECURITY
VIEW, restrict functions and operators that might disclose tuple data
from being pushed down into view (unless, perhaps, the user invoking
the view has sufficient privileges to execute the underlying query
anyhow, e.g. superuser or view owner). What we have not resolved is
exactly how we're going to decide which functions and operators might
do such a dastardly thing. I think the viable options are as follows:1. Adopt Heikki's proposal of treating indexable operators as non-leaky.
http://archives.postgresql.org/pgsql-hackers/2010-06/msg00291.php
Or, perhaps, and even more restrictively, treat only
hashable/mergeable operators as non-leaky.2. Add an explicit flag to pg_proc, which can only be set by
superusers (and which is cleared if a non-superuser modifies it in any
way), allowing a function to be tagged as non-leaky. I believe that
it would be reasonable to set this flag for all of our built-in
indexable operators (though I'd read the code before doing it), but it
would remove the need for the assumption that third-party operators
are equally sane.CREATE OR REPLACE FUNCTION doesnt_leak() RETURNS integer AS $$SELECT
42$$ IMMUTABLE SEAWORTHY; -- doesn't leakThis problem is not going away, so we should try to decide on something.
I'd like to vote the second option, because this approach will be also
applied on another aspect of leaky views.
When leaky and non-leaky functions are chained within a WHERE clause,
it will be ordered by the cost of functions. So, we have possibility
that leaky functions are executed earlier than non-leaky functions.
It will not be an easy work to mark built-in functions as either leaky
or non-leaky, but it seems to me the most straight forward solution.
Thanks,
--
KaiGai Kohei <kaigai@ak.jp.nec.com>
2010/7/8 KaiGai Kohei <kaigai@ak.jp.nec.com>:
(2010/07/08 22:08), Robert Haas wrote:
I think we pretty much have conceptual agreement on the shape of the
solution to this problem: when a view is created with CREATE SECURITY
VIEW, restrict functions and operators that might disclose tuple data
from being pushed down into view (unless, perhaps, the user invoking
the view has sufficient privileges to execute the underlying query
anyhow, e.g. superuser or view owner). What we have not resolved is
exactly how we're going to decide which functions and operators might
do such a dastardly thing. I think the viable options are as follows:1. Adopt Heikki's proposal of treating indexable operators as non-leaky.
http://archives.postgresql.org/pgsql-hackers/2010-06/msg00291.php
Or, perhaps, and even more restrictively, treat only
hashable/mergeable operators as non-leaky.2. Add an explicit flag to pg_proc, which can only be set by
superusers (and which is cleared if a non-superuser modifies it in any
way), allowing a function to be tagged as non-leaky. I believe that
it would be reasonable to set this flag for all of our built-in
indexable operators (though I'd read the code before doing it), but it
would remove the need for the assumption that third-party operators
are equally sane.CREATE OR REPLACE FUNCTION doesnt_leak() RETURNS integer AS $$SELECT
42$$ IMMUTABLE SEAWORTHY; -- doesn't leakThis problem is not going away, so we should try to decide on something.
I'd like to vote the second option, because this approach will be also
applied on another aspect of leaky views.When leaky and non-leaky functions are chained within a WHERE clause,
it will be ordered by the cost of functions. So, we have possibility
that leaky functions are executed earlier than non-leaky functions.It will not be an easy work to mark built-in functions as either leaky
or non-leaky, but it seems to me the most straight forward solution.
Does anyone else have an opinion on this?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On 09/07/10 06:47, KaiGai Kohei wrote:
When leaky and non-leaky functions are chained within a WHERE clause,
it will be ordered by the cost of functions. So, we have possibility
that leaky functions are executed earlier than non-leaky functions.
No, that needs to be forbidden as part of the fix. Leaky functions must
not be executed before all the quals from the view are evaluated.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On 19/07/10 20:08, Robert Haas wrote:
2010/7/8 KaiGai Kohei<kaigai@ak.jp.nec.com>:
(2010/07/08 22:08), Robert Haas wrote:
I think we pretty much have conceptual agreement on the shape of the
solution to this problem: when a view is created with CREATE SECURITY
VIEW, restrict functions and operators that might disclose tuple data
from being pushed down into view (unless, perhaps, the user invoking
the view has sufficient privileges to execute the underlying query
anyhow, e.g. superuser or view owner). What we have not resolved is
exactly how we're going to decide which functions and operators might
do such a dastardly thing. I think the viable options are as follows:1. Adopt Heikki's proposal of treating indexable operators as non-leaky.
http://archives.postgresql.org/pgsql-hackers/2010-06/msg00291.php
Or, perhaps, and even more restrictively, treat only
hashable/mergeable operators as non-leaky.2. Add an explicit flag to pg_proc, which can only be set by
superusers (and which is cleared if a non-superuser modifies it in any
way), allowing a function to be tagged as non-leaky. I believe that
it would be reasonable to set this flag for all of our built-in
indexable operators (though I'd read the code before doing it), but it
would remove the need for the assumption that third-party operators
are equally sane.CREATE OR REPLACE FUNCTION doesnt_leak() RETURNS integer AS $$SELECT
42$$ IMMUTABLE SEAWORTHY; -- doesn't leakThis problem is not going away, so we should try to decide on something.
I'd like to vote the second option, because this approach will be also
applied on another aspect of leaky views.When leaky and non-leaky functions are chained within a WHERE clause,
it will be ordered by the cost of functions. So, we have possibility
that leaky functions are executed earlier than non-leaky functions.It will not be an easy work to mark built-in functions as either leaky
or non-leaky, but it seems to me the most straight forward solution.Does anyone else have an opinion on this?
I have a bad feeling that marking functions explicitly as seaworthy is
going to be hard to get right, and every time you get that wrong it's a
security issue. On the other hand, if it's enough from a performance
point of view to review and mark only a few built-in functions like
index operators, maybe it's ok.
It would be easier to assess this if we had a patch to play with that
contained all the planner changes to keep track of the seaworthiness of
functions and apply the quals in right order. You could then try out
different scenarios to see what the performance is like.
I guess what I'm saying is "write a patch, and I'll shoot it down when
you're done" :-/. But hopefully the planner changes don't depend much on
how we deduce which quals are leaky.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Mon, Jul 19, 2010 at 1:19 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
I guess what I'm saying is "write a patch, and I'll shoot it down when
you're done" :-/. But hopefully the planner changes don't depend much on how
we deduce which quals are leaky.
Oh, great... I love that.
/me rolls eyes
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
(2010/07/20 2:13), Heikki Linnakangas wrote:
On 09/07/10 06:47, KaiGai Kohei wrote:
When leaky and non-leaky functions are chained within a WHERE clause,
it will be ordered by the cost of functions. So, we have possibility
that leaky functions are executed earlier than non-leaky functions.No, that needs to be forbidden as part of the fix. Leaky functions must
not be executed before all the quals from the view are evaluated.
IIUC, a view is extracted to a subquery in the rewriter phase, then it
can be pulled up to join clause at pull_up_subqueries(). In this case,
WHERE clause may have the quals come from different origins, isn't it?
E.g)
SELECT * FROM v1 WHERE f_malicious(v1.a);
At the rewriter:
-> SELECT v1.* FROM (SELECT * FROM t1 WHERE f_policy(t1.b)) v1 WHERE f_malicious(v1.a);
At the pull_up_subqueries()
-> SELECT * FROM t1 WHERE f_policy(t1.b) AND f_malicious(t1.a);
^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^
cost = 100 cost = 0.0001
Apart from an idea of secure/leaky function mark, isn't it necessary any
mechanism to enforce f_policy() shall be executed earlier than f_malicious()?
Thanks,
--
KaiGai Kohei <kaigai@ak.jp.nec.com>
(2010/07/20 2:19), Heikki Linnakangas wrote:
On 19/07/10 20:08, Robert Haas wrote:
2010/7/8 KaiGai Kohei<kaigai@ak.jp.nec.com>:
(2010/07/08 22:08), Robert Haas wrote:
I think we pretty much have conceptual agreement on the shape of the
solution to this problem: when a view is created with CREATE SECURITY
VIEW, restrict functions and operators that might disclose tuple data
from being pushed down into view (unless, perhaps, the user invoking
the view has sufficient privileges to execute the underlying query
anyhow, e.g. superuser or view owner). What we have not resolved is
exactly how we're going to decide which functions and operators might
do such a dastardly thing. I think the viable options are as follows:1. Adopt Heikki's proposal of treating indexable operators as
non-leaky.http://archives.postgresql.org/pgsql-hackers/2010-06/msg00291.php
Or, perhaps, and even more restrictively, treat only
hashable/mergeable operators as non-leaky.2. Add an explicit flag to pg_proc, which can only be set by
superusers (and which is cleared if a non-superuser modifies it in any
way), allowing a function to be tagged as non-leaky. I believe that
it would be reasonable to set this flag for all of our built-in
indexable operators (though I'd read the code before doing it), but it
would remove the need for the assumption that third-party operators
are equally sane.CREATE OR REPLACE FUNCTION doesnt_leak() RETURNS integer AS $$SELECT
42$$ IMMUTABLE SEAWORTHY; -- doesn't leakThis problem is not going away, so we should try to decide on
something.I'd like to vote the second option, because this approach will be also
applied on another aspect of leaky views.When leaky and non-leaky functions are chained within a WHERE clause,
it will be ordered by the cost of functions. So, we have possibility
that leaky functions are executed earlier than non-leaky functions.It will not be an easy work to mark built-in functions as either leaky
or non-leaky, but it seems to me the most straight forward solution.Does anyone else have an opinion on this?
I have a bad feeling that marking functions explicitly as seaworthy is
going to be hard to get right, and every time you get that wrong it's a
security issue.
Yes, it is indeed a uphill work to mark either secure or leaky on the
functions correctly. :(
On the other hand, if it's enough from a performance
point of view to review and mark only a few built-in functions like
index operators, maybe it's ok.
I also think it is a worthful idea to try as a proof-of-concept.
If we only allow to push down indexable operators, do you have any idea
to distinguish between a purely indexable operator qual and others?
At the distribute_qual_to_rels() stage, how can we find out a qual which
is consist of only indexable operators?
It would be easier to assess this if we had a patch to play with that
contained all the planner changes to keep track of the seaworthiness of
functions and apply the quals in right order. You could then try out
different scenarios to see what the performance is like.I guess what I'm saying is "write a patch, and I'll shoot it down when
you're done" :-/. But hopefully the planner changes don't depend much on
how we deduce which quals are leaky.
I agree. The decision to mark either secure or leaky on functions should
be done independently from the planner code, like contain_volatile_functions().
Thanks,
--
KaiGai Kohei <kaigai@ak.jp.nec.com>
2010/7/21 KaiGai Kohei <kaigai@ak.jp.nec.com>:
(2010/07/20 2:13), Heikki Linnakangas wrote:
On 09/07/10 06:47, KaiGai Kohei wrote:
When leaky and non-leaky functions are chained within a WHERE clause,
it will be ordered by the cost of functions. So, we have possibility
that leaky functions are executed earlier than non-leaky functions.No, that needs to be forbidden as part of the fix. Leaky functions must
not be executed before all the quals from the view are evaluated.IIUC, a view is extracted to a subquery in the rewriter phase, then it
can be pulled up to join clause at pull_up_subqueries(). In this case,
WHERE clause may have the quals come from different origins, isn't it?E.g)
SELECT * FROM v1 WHERE f_malicious(v1.a);At the rewriter:
-> SELECT v1.* FROM (SELECT * FROM t1 WHERE f_policy(t1.b)) v1 WHERE f_malicious(v1.a);At the pull_up_subqueries()
-> SELECT * FROM t1 WHERE f_policy(t1.b) AND f_malicious(t1.a);
^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^
cost = 100 cost = 0.0001Apart from an idea of secure/leaky function mark, isn't it necessary any
mechanism to enforce f_policy() shall be executed earlier than f_malicious()?
I think you guys are in fact agreeing with each other.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
2010/7/21 KaiGai Kohei <kaigai@ak.jp.nec.com>:
On the other hand, if it's enough from a performance
point of view to review and mark only a few built-in functions like
index operators, maybe it's ok.I also think it is a worthful idea to try as a proof-of-concept.
Yeah. So, should we mark this patch as Returned with Feedback, and
you can submit a proof-of-concept patch for the next CF?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
(2010/07/21 19:26), Robert Haas wrote:
2010/7/21 KaiGai Kohei<kaigai@ak.jp.nec.com>:
On the other hand, if it's enough from a performance
point of view to review and mark only a few built-in functions like
index operators, maybe it's ok.I also think it is a worthful idea to try as a proof-of-concept.
Yeah. So, should we mark this patch as Returned with Feedback, and
you can submit a proof-of-concept patch for the next CF?
Yes, it's fair enough.
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
(2010/07/21 19:35), KaiGai Kohei wrote:
(2010/07/21 19:26), Robert Haas wrote:
2010/7/21 KaiGai Kohei<kaigai@ak.jp.nec.com>:
On the other hand, if it's enough from a performance
point of view to review and mark only a few built-in functions like
index operators, maybe it's ok.I also think it is a worthful idea to try as a proof-of-concept.
Yeah. So, should we mark this patch as Returned with Feedback, and
you can submit a proof-of-concept patch for the next CF?Yes, it's fair enough.
The attached patch is a proof-of-concept one according to the suggestion
from Heikki before.
Right now, it stands on a strict assumption that considers operators
implemented with built-in functions are safe; it does not have no
possibility to leak supplied arguments anywhere.
This patch modifies the logic that the planner decides where the
given qualifier clause should be distributed.
If the clause does not contain any "leakable" functions, nothing
were changed. In this case, the clause shall be pushed down into
inside of the join, if it depends on one-side of the join.
Elsewhere, if the clause contains any "leakable" functions, this
patch prevents to push down the clause into join loop, because
the clause need to be evaluated after the condition of join.
If it would not be prevented, the "leakable" function may expose
the contents to be invisible for users; due to the VIEWs for
row-level security purpose.
Example
--------------------------------------------
postgres=# CREATE OR REPLACE FUNCTION f_policy(int)
RETURNS bool LANGUAGE 'plpgsql'
AS 'begin return $1 % 2 = 0; end;';
CREATE FUNCTION
postgres=# CREATE OR REPLACE FUNCTION f_malicious(text)
RETURNS bool LANGUAGE 'plpgsql' COST 0.001
AS 'begin raise notice ''leak: %'', $1; return true; end';
CREATE FUNCTION
postgres=# CREATE TABLE t1 (a int primary key, b text);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t1_pkey" for table "t1"
CREATE TABLE
postgres=# CREATE TABLE t2 (x int primary key, y text);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t2_pkey" for table "t2"
CREATE TABLE
postgres=# INSERT INTO t1 VALUES (1,'aaa'), (2,'bbb'), (3,'ccc');
INSERT 0 3
postgres=# INSERT INTO t2 VALUES (2,'xxx'), (3,'yyy'), (4,'zzz');
INSERT 0 3
postgres=# CREATE VIEW v AS SELECT * FROM t1 JOIN t2 ON f_policy(a+x);
CREATE VIEW
* SELECT * FROM v WHERE f_malicious(b);
[without this patch]
postgres=# EXPLAIN SELECT * FROM v WHERE f_malicious(b);
QUERY PLAN
------------------------------------------------------------------
Nested Loop (cost=0.00..133685.13 rows=168100 width=72)
Join Filter: f_policy((t1.a + t2.x))
-> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36)
-> Materialize (cost=0.00..24.35 rows=410 width=36)
-> Seq Scan on t1 (cost=0.00..22.30 rows=410 width=36)
Filter: f_malicious(b)
(6 rows)
==> f_malicious() may be raise a notice about invisible tuples.
[with this patch]
postgres=# EXPLAIN SELECT * FROM v WHERE f_malicious(b);
QUERY PLAN
-------------------------------------------------------------------
Nested Loop (cost=0.00..400969.96 rows=168100 width=72)
Join Filter: (f_malicious(t1.b) AND f_policy((t1.a + t2.x)))
-> Seq Scan on t1 (cost=0.00..22.30 rows=1230 width=36)
-> Materialize (cost=0.00..28.45 rows=1230 width=36)
-> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36)
(5 rows)
==> f_malicious() is moved to outside of the join.
It is evaluated earlier than f_policy() in same level due to
the function cost, but it is another matter.
* SELECT * FROM v WHERE a = 2;
[without this patch]
postgres=# EXPLAIN SELECT * FROM v WHERE a = 2;
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop (cost=0.00..353.44 rows=410 width=72)
Join Filter: f_policy((t1.a + t2.x))
-> Index Scan using t1_pkey on t1 (cost=0.00..8.27 rows=1 width=36)
Index Cond: (a = 2)
-> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36)
(5 rows)
[with this patch]
postgres=# EXPLAIN SELECT * FROM v WHERE a = 2;
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop (cost=0.00..353.44 rows=410 width=72)
Join Filter: f_policy((t1.a + t2.x))
-> Index Scan using t1_pkey on t1 (cost=0.00..8.27 rows=1 width=36)
Index Cond: (a = 2)
-> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36)
(5 rows)
==> "a = 2" is a built-in operator, so we assume it is safe.
This clause was pushed down into the join loop, then utilized to
index scan.
* SELECT * FROM v WHERE a::text = 'a';
[without this patch]
postgres=# EXPLAIN SELECT * FROM v WHERE a::text = '2';
QUERY PLAN
----------------------------------------------------------------
Nested Loop (cost=0.00..2009.54 rows=2460 width=72)
Join Filter: f_policy((t1.a + t2.x))
-> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36)
-> Materialize (cost=0.00..31.55 rows=6 width=36)
-> Seq Scan on t1 (cost=0.00..31.52 rows=6 width=36)
Filter: ((a)::text = '2'::text)
(6 rows)
==> "a::text = 'a'" was pushed down into the join loop.
[with this patch]
postgres=# EXPLAIN SELECT * FROM v WHERE a::text = '2';
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop (cost=0.00..412312.92 rows=2521 width=72)
Join Filter: (((t1.a)::text = '2'::text) AND f_policy((t1.a + t2.x)))
-> Seq Scan on t1 (cost=0.00..22.30 rows=1230 width=36)
-> Materialize (cost=0.00..28.45 rows=1230 width=36)
-> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36)
(5 rows)
==> The "a::text = '2'" clause contains CoerceViaIO node, so this patch
assumes it is not safe.
Please note that this patch does not case about a case when
a function inside a view and a function outside a view are
distributed into same level and the later function has lower
cost value.
To fix this problem definitely, we also need to mark nest-level
of the clause being originally supplied, and need to order
the clauses by the nest-level with higher priority than cost.
If we found f_malicious() and f_policy() in same level at the
above example, f_policy() should be executed earlier because
it was originally supplied in the deeper nest level.
Thanks,
--
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachments:
pgsql-fix-leaky-join-view.4.patchtext/x-patch; name=pgsql-fix-leaky-join-view.4.patchDownload+151-1
2010/9/1 KaiGai Kohei <kaigai@ak.jp.nec.com>:
Right now, it stands on a strict assumption that considers operators
implemented with built-in functions are safe; it does not have no
possibility to leak supplied arguments anywhere.Please note that this patch does not case about a case when
a function inside a view and a function outside a view are
distributed into same level and the later function has lower
cost value.
Without making some attempt to address these two points, I don't see
the point of this patch.
Also, I believe we decided previously do this deoptimization only in
case the user requests it with CREATE SECURITY VIEW.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
(2010/09/02 11:57), Robert Haas wrote:
2010/9/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:
Right now, it stands on a strict assumption that considers operators
implemented with built-in functions are safe; it does not have no
possibility to leak supplied arguments anywhere.Please note that this patch does not case about a case when
a function inside a view and a function outside a view are
distributed into same level and the later function has lower
cost value.Without making some attempt to address these two points, I don't see
the point of this patch.Also, I believe we decided previously do this deoptimization only in
case the user requests it with CREATE SECURITY VIEW.
Perhaps, I remember the previous discussion incorrectly.
If we have a hint about whether the supplied view is intended to security
purpose, or not, it seems to me it is a reliable method to prevent pulling
up the subqueries come from security views.
Is it too much deoptimization?
If we keep security views as subqueries, the later point shall be solved
implicitly. Even if we try to optimize security views in a certain level,
it is not difficult to solve, as I demonstrated before.
Thanks,
--
KaiGai Kohei <kaigai@ak.jp.nec.com>
2010/9/1 KaiGai Kohei <kaigai@ak.jp.nec.com>:
(2010/09/02 11:57), Robert Haas wrote:
2010/9/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:
Right now, it stands on a strict assumption that considers operators
implemented with built-in functions are safe; it does not have no
possibility to leak supplied arguments anywhere.Please note that this patch does not case about a case when
a function inside a view and a function outside a view are
distributed into same level and the later function has lower
cost value.Without making some attempt to address these two points, I don't see
the point of this patch.Also, I believe we decided previously do this deoptimization only in
case the user requests it with CREATE SECURITY VIEW.Perhaps, I remember the previous discussion incorrectly.
If we have a hint about whether the supplied view is intended to security
purpose, or not, it seems to me it is a reliable method to prevent pulling
up the subqueries come from security views.
Is it too much deoptimization?
Well, that'd prevent something like id = 3 from getting pushed down,
which seems a bit harsh.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
(2010/09/02 12:38), Robert Haas wrote:
2010/9/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:
(2010/09/02 11:57), Robert Haas wrote:
2010/9/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:
Right now, it stands on a strict assumption that considers operators
implemented with built-in functions are safe; it does not have no
possibility to leak supplied arguments anywhere.Please note that this patch does not case about a case when
a function inside a view and a function outside a view are
distributed into same level and the later function has lower
cost value.Without making some attempt to address these two points, I don't see
the point of this patch.Also, I believe we decided previously do this deoptimization only in
case the user requests it with CREATE SECURITY VIEW.Perhaps, I remember the previous discussion incorrectly.
If we have a hint about whether the supplied view is intended to security
purpose, or not, it seems to me it is a reliable method to prevent pulling
up the subqueries come from security views.
Is it too much deoptimization?Well, that'd prevent something like id = 3 from getting pushed down,
which seems a bit harsh.
Hmm. If so, we need to remember what FromExpr was come from subqueries
of security views, and what were not. Then, we need to prevent leakable
clause will be distributed to inside of them.
In addition, we also need to care about the order of function calls in
same level, because it is not implicitly solved.
At first, let's consider top-half of the matter.
When views are expanded into subqueries in query-rewriter, Query tree
lost an information OID of the view, because RangeTblEntry does not have
OID of the relation when it is RTE_SUBQUERY. So, we need to patch here
to mark a flag whether the supplied view is security focused, or not.
Then, pull_up_simple_subquery() pulls up a supplied subquery into normal
join, if possible. In this case, FromExpr is chained into the upper level.
Of course, FromExpr does not have a flag to show its origin, so we also
need to copy the new flag in RangeTblEntry to FromExpr.
Then, when distribute_qual_to_rels() is called, the caller also provides
a Bitmapset of relation-Ids which are contained under the FromExpr with
the flag saying it came from the security views.
Even if the supplied clause references a part of the Bitmapset, we need
to prevent the clause being pushed down into the relations came from
security views, except for ones we can make sure these are safe.
Perhaps, it is not impossible....
Thanks,
--
KaiGai Kohei <kaigai@ak.jp.nec.com>
(2010/09/02 13:30), KaiGai Kohei wrote:
(2010/09/02 12:38), Robert Haas wrote:
2010/9/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:
(2010/09/02 11:57), Robert Haas wrote:
2010/9/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:
Right now, it stands on a strict assumption that considers operators
implemented with built-in functions are safe; it does not have no
possibility to leak supplied arguments anywhere.Please note that this patch does not case about a case when
a function inside a view and a function outside a view are
distributed into same level and the later function has lower
cost value.Without making some attempt to address these two points, I don't see
the point of this patch.Also, I believe we decided previously do this deoptimization only in
case the user requests it with CREATE SECURITY VIEW.Perhaps, I remember the previous discussion incorrectly.
If we have a hint about whether the supplied view is intended to security
purpose, or not, it seems to me it is a reliable method to prevent pulling
up the subqueries come from security views.
Is it too much deoptimization?Well, that'd prevent something like id = 3 from getting pushed down,
which seems a bit harsh.
I've tried to implement a proof of the concept patch according to the
following logic.
(For the quick hack, it does not include statement support. It just
considers view with name begun from "s" as security views.)
Hmm. If so, we need to remember what FromExpr was come from subqueries
of security views, and what were not. Then, we need to prevent leakable
clause will be distributed to inside of them.
In addition, we also need to care about the order of function calls in
same level, because it is not implicitly solved.At first, let's consider top-half of the matter.
When views are expanded into subqueries in query-rewriter, Query tree
lost an information OID of the view, because RangeTblEntry does not have
OID of the relation when it is RTE_SUBQUERY. So, we need to patch here
to mark a flag whether the supplied view is security focused, or not.
This patch added 'security_view' flag into RangeTblEntry.
It shall be set when the query rewriter expands security views.
Then, pull_up_simple_subquery() pulls up a supplied subquery into normal
join, if possible. In this case, FromExpr is chained into the upper level.
Of course, FromExpr does not have a flag to show its origin, so we also
need to copy the new flag in RangeTblEntry to FromExpr.
This patch also added 'security_view' flag into FromExpr.
It shall be set when the pull_up_simple_subquery() pulled up
a RangeTblEntry with security_view = true.
Then, when distribute_qual_to_rels() is called, the caller also provides
a Bitmapset of relation-Ids which are contained under the FromExpr with
the flag saying it came from the security views.
Even if the supplied clause references a part of the Bitmapset, we need
to prevent the clause being pushed down into the relations came from
security views, except for ones we can make sure these are safe.
Just before distribute_qual_to_rels(), deconstruct_recurse() is invoked.
It walks on the supplied join-tree to collect what relations are appeared
under the current FromExpr/JoinExpr.
The deconstruct_recurse() was modified to take two new arguments of
'bool below_sec_barriers' and 'Relids *sec_barriers'.
The first one means the current recursion is under the FromExpr with
security_view being true. At that time, it set appeared relations on
the sec_barriers, then returns.
In the result, the 'sec_barriers' shall become a bitmapset of relations
being under the FromExpr which is originated by security views.
Then, 'sec_barriers' shall be delivered to distribute_qual_to_rels().
If the supplied qualifier references a part of 'sec_barriers' and
contains possibly leakable functions, it appends whole of the sec_barriers
to the bitmapset of relations on which the clause is depending.
In the result, it shall not be pushed down into the security view.
Example)
testdb=# CREATE VIEW n_view AS SELECT * FROM t1 JOIN t2 ON t1.a = t2.x;
CREATE VIEW
testdb=# CREATE VIEW s_view AS SELECT * FROM t1 JOIN t2 ON t1.a = t2.x;
CREATE VIEW
testdb=# EXPLAIN SELECT * FROM n_view WHERE f_malicious(y);
QUERY PLAN
-------------------------------------------------------------------
Hash Join (cost=334.93..365.94 rows=410 width=72)
Hash Cond: (t1.a = t2.x)
-> Seq Scan on t1 (cost=0.00..22.30 rows=1230 width=36)
-> Hash (cost=329.80..329.80 rows=410 width=36)
-> Seq Scan on t2 (cost=0.00..329.80 rows=410 width=36)
Filter: f_malicious(y)
(6 rows)
testdb=# EXPLAIN SELECT * FROM s_view WHERE f_malicious(y);
QUERY PLAN
-------------------------------------------------------------------
Hash Join (cost=37.68..384.39 rows=410 width=72)
Hash Cond: (t1.a = t2.x)
Join Filter: f_malicious(t2.y)
-> Seq Scan on t1 (cost=0.00..22.30 rows=1230 width=36)
-> Hash (cost=22.30..22.30 rows=1230 width=36)
-> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36)
(6 rows)
==> f_malicious() was moved to outside of the join loop
testdb=# EXPLAIN SELECT * FROM n_view WHERE f_malicious(y) AND a = 2;
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop (cost=0.00..16.80 rows=1 width=72)
-> Index Scan using t1_pkey on t1 (cost=0.00..8.27 rows=1 width=36)
Index Cond: (a = 2)
-> Index Scan using t2_pkey on t2 (cost=0.00..8.52 rows=1 width=36)
Index Cond: (x = 2)
Filter: f_malicious(y)
(6 rows)
testdb=# EXPLAIN SELECT * FROM s_view WHERE f_malicious(y) AND a = 2;
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop (cost=0.00..16.80 rows=1 width=72)
Join Filter: f_malicious(t2.y)
-> Index Scan using t1_pkey on t1 (cost=0.00..8.27 rows=1 width=36)
Index Cond: (a = 2)
-> Index Scan using t2_pkey on t2 (cost=0.00..8.27 rows=1 width=36)
Index Cond: (x = 2)
(6 rows)
==> Because 'a = 2' is not leakable, this clause was pushed down into
the join loop. But f_malicious() is not in 's_view' example.
testdb=# CREATE VIEW n_view2 AS SELECT * FROM t1 WHERE f_policy(a);
CREATE VIEW
testdb=# CREATE VIEW s_view2 AS SELECT * FROM t1 WHERE f_policy(a);
CREATE VIEW
testdb=# EXPLAIN SELECT * FROM n_view2 WHERE f_malicious(b);
QUERY PLAN
-------------------------------------------------------
Seq Scan on t1 (cost=0.00..329.80 rows=137 width=36)
Filter: (f_malicious(b) AND f_policy(a))
(2 rows)
testdb=# EXPLAIN SELECT * FROM s_view2 WHERE f_malicious(b);
QUERY PLAN
-------------------------------------------------------
Seq Scan on t1 (cost=0.00..329.80 rows=137 width=36)
Filter: (f_malicious(b) AND f_policy(a))
(2 rows)
==> But it does not affect anything on the case when a leakable
function from outside of the view is chained to same level
with security policy function of the view.
In this case, we need to mark functions the original nest
level, then sort by the nest level rather than cost on the
order_qual_clauses().
Thanks,
--
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachments:
pgsql-fix-leaky-join-view.5.patchtext/x-patch; name=pgsql-fix-leaky-join-view.5.patchDownload+258-66
(2010/09/02 11:57), Robert Haas wrote:
2010/9/1 KaiGai Kohei<kaigai@ak.jp.nec.com>:
Right now, it stands on a strict assumption that considers operators
implemented with built-in functions are safe; it does not have no
possibility to leak supplied arguments anywhere.Please note that this patch does not case about a case when
a function inside a view and a function outside a view are
distributed into same level and the later function has lower
cost value.Without making some attempt to address these two points, I don't see
the point of this patch.Also, I believe we decided previously do this deoptimization only in
case the user requests it with CREATE SECURITY VIEW.
The attached patch tackles both of the above two points, and allows to
control this deoptimization when CREATE SECURITY VIEW is used.
Example
---------
CREATE FUNCTION f_policy(int) RETURNS bool LANGUAGE 'plpgsql'
AS 'begin return $1 % 2 = 0; end';
CREATE FUNCTION f_malicious(text) RETURNS bool COST 0.0001 LANGUAGE 'plpgsql'
AS 'begin raise notice ''leak: %'', $1; return true; end';
CREATE OR REPLACE VIEW v1
AS SELECT * FROM t1 JOIN t2 ON t1.a = t2.x WHERE f_policy(x);
CREATE OR REPLACE VIEW v2
AS SELECT * FROM t1 WHERE f_policy(a);
In this case, we don't want to invoke f_malicious() with argument come
from the tuples which should be filtered inside of the views, even if
people provides it on the WHERE clause.
Without this patch
====================
postgres=# EXPLAIN select * from v1 where f_malicious(b);
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop (cost=0.00..287.63 rows=410 width=72)
-> Seq Scan on t1 (cost=0.00..22.30 rows=410 width=36)
Filter: f_malicious(b)
-> Index Scan using t2_pkey on t2 (cost=0.00..0.63 rows=1 width=36)
Index Cond: (x = t1.a)
Filter: f_policy(x)
(6 rows)
==> f_policy(x) was chained inside of the join loop, so it enables to
something to be filtered out.
postgres=# select * from v1 where f_malicious(b);
NOTICE: leak: aaa
NOTICE: leak: bbb
NOTICE: leak: ccc
NOTICE: leak: ddd
a | b | x | y
---+-----+---+-----
2 | bbb | 2 | xxx
4 | ddd | 4 | zzz
(2 rows)
postgres=# EXPLAIN select * from v2 where f_malicious(b);
QUERY PLAN
-------------------------------------------------------
Seq Scan on t1 (cost=0.00..329.80 rows=137 width=36)
Filter: (f_malicious(b) AND f_policy(a))
(2 rows)
==> The view of v2 is pulled up to the upper level, because this subquery
is enough simple. In this case, f_malicious() and f_policy() are
handled in a same level, and ordered by its cost to execution.
Since f_malicious() has lower cost (0.001) the f_policy(), so it shall
be executed earlier, although f_policy() is used in more deep depth.
With this patch
=================
postgres=# EXPLAIN select * from v1 where f_malicious(b);
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop (cost=0.00..287.63 rows=410 width=72)
-> Seq Scan on t1 (cost=0.00..22.30 rows=410 width=36)
Filter: f_malicious(b)
-> Index Scan using t2_pkey on t2 (cost=0.00..0.63 rows=1 width=36)
Index Cond: (x = t1.a)
Filter: f_policy(x)
(6 rows)
==> It has same result of the case when without this patch,
because this view is declared without "SECURITY"
postgres=# CREATE OR REPLACE SECURITY VIEW v1
AS SELECT * FROM t1 JOIN t2 ON t1.a = t2.x WHERE f_policy(x);
CREATE VIEW
postgres=# EXPLAIN select * from v1 where f_malicious(b);
QUERY PLAN
-------------------------------------------------------------------
Hash Join (cost=37.68..384.39 rows=137 width=72)
Hash Cond: (t1.a = t2.x)
Join Filter: (f_policy(t2.x) AND f_malicious(t1.b))
-> Seq Scan on t1 (cost=0.00..22.30 rows=1230 width=36)
-> Hash (cost=22.30..22.30 rows=1230 width=36)
-> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36)
(6 rows)
==> After we replaced this view with SECURITY option, it prevents
f_malicious() being come from outside of the view into inside
of the join loop.
postgres=# EXPLAIN select * from v1 where f_malicious(b) and a = 2;
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop (cost=0.00..16.80 rows=1 width=72)
Join Filter: (f_policy(t2.x) AND f_malicious(t1.b))
-> Index Scan using t1_pkey on t1 (cost=0.00..8.27 rows=1 width=36)
Index Cond: (a = 2)
-> Index Scan using t2_pkey on t2 (cost=0.00..8.27 rows=1 width=36)
Index Cond: (x = 2)
(6 rows)
==> We assume operators implemented with built-in functions are safe.
So, we don't prevent this pushing-down, if the clause does not
contains something leakable.
postgres=# EXPLAIN select * from v2 where f_malicious(b);
QUERY PLAN
-------------------------------------------------------
Seq Scan on t1 (cost=0.00..329.80 rows=137 width=36)
Filter: (f_policy(a) AND f_malicious(b))
(2 rows)
==> In addition, when functions come from different nest levels
are distributed a certain scan filter, these are ordered by
its original nest level, not only function cost.
Introduction of the patch
===========================
This patch tries to tackle the two matters.
The first one is pushing down leakable functions into inside of join-loops
being originated from security view.
The distribute_qual_to_rels() distributes qualifiers of WHERE/JOIN ... ON
clauses to appropriate set of relations. Even if a qualifier is depending
on a part of join-loop inside of security view, we need to prevent it to
be pushed down.
This patch informs distribute_qual_to_rels() what relations being come from
security views using Bitmapset of 'sec_barriers'. If a qualifier overlaps
with any members of security_barriers, its dependency shall be expanded to
the union set of 'sec_barriers'. In this case, this qualifier depends on
whole of the relations inside of the security view, we cannot push it down
anymore.
Views are expanded to sub-queries at query rewriter.
This patch add a flag to FromExpr to distinguish whether this sub-query was
expanded from security view, or not. When deconstruct_recurse() walks on
the supplied join-tree, it construct a bitmapset of the relations under the
FromExpr with security_view = true, then, it shall be delivered to
the distribute_qual_to_rels().
Instead of modifying the structure of pg_class, this patch stores a flag of
security view on the reloption. So, it needed to modify routines about
reloptions because it is the first case to store reloptions of views.
The other matter is priority to order qualifiers of a certain scanning
filter. As I mentioned above, the planner pull up simple subqueries to
join, so a function originated from inside of view and outside of view are
distributed to same scan plan. Then, these are ordered by cost value.
It means functions come from outside of views (maybe leakable) are invoked
with arguments come from tuples to be filtered out with functions come from
inside of the views.
So, I added 'depth' field to FuncExpr and so on. It shall be set just after
query rewriter, then referenced in order_qual_clauses().
If the supplied plan multiple qualifiers, these are ordered by the depth of
qualifier first, then ordered by the cost when here are multiple qualifiers
with same depth.
It makes sure qualifiers being originated inside of views are executed
earlier than others.
Thanks,
--
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachments:
pgsql-fix-leaky-join-view.6.patchtext/x-patch; name=pgsql-fix-leaky-join-view.6.patchDownload+553-141
2010/9/6 KaiGai Kohei <kaigai@ak.jp.nec.com>:
The attached patch tackles both of the above two points, and allows to
control this deoptimization when CREATE SECURITY VIEW is used.
I'll send a review for the patch.
Contents & Purpose
==================
The patch adds a new SECURITY option for VIEWs. Views defined with
CREATE SECURITY
VIEW command are not merged with external WHERE-clauses including
security-leakable
functions in queries. However, since internal functions and operators are not
considered as leakable functions, the planner still works well for
security views
unless we use user-defined functions in the WHERE-clause.
Initial Run
===========
* Because of the delayed review, the patch has one hunk:
1 out of 5 hunks FAILED -- saving rejects to file
src/backend/commands/tablecmds.c.rej
but it is not serious at all, and can be easily fixed.
* It can be compiled, but has two warnings:
rewriteHandler.c: In function 'MarkFuncExprDepthWalker':
rewriteHandler.c:1155: warning: cast from pointer to integer of different size
rewriteHandler.c:1227: warning: cast to pointer from integer of different size
They should be harmless, but casting intptr_t is often used to avoid warnings:
- L1155: (int) (intptr_t) context;
- L1227: (void *) (intptr_t) (depth + 1)
* It passes all regression tests, but doesn't have own new tests.
Remaining works
===============
The patch might include only core code, but I'll let you know additional
sub-works are still required to complete the feature. Also, I've not measured
the performance yet, though performance might not be so critical for the patch.
* Regression tests and documentation for the feature are required.
* pg_dump must support to dump SECURITY VIEWs.
They are dumped as normal views for now.
* pg_views can have "security" column.
* psql's \dv can show security attributes of views.
Questions
=========
postgres=# EXPLAIN select * from v1 where f_malicious(b) and a = 2;
==> We assume operators implemented with built-in functions are safe.
So, we don't prevent this pushing-down, if the clause does not
contains something leakable.
The term "built-in functions" means functions written in INTERNAL language
here. But we also have SQL functions by default. Some of them are just a
wrapper to internal functions. I'm not sure the checking of INTERNAL language
is the best way for the purpose. Did you compare it with other methods?
For example, "oid < FirstNormalObjectId" looks workable for me.
Instead of modifying the structure of pg_class, this patch stores a flag of
security view on the reloption. So, it needed to modify routines about
reloptions because it is the first case to store reloptions of views.
Why did you need to extend StdRdOptions strucuture? Since other fields in
the structure are not used by views at all, I think adding a new structure
struct ViewOptions { vl_len, security_view }
would be better than extending StdRdOptions.
--
Itagaki Takahiro
(2010/10/05 18:01), Itagaki Takahiro wrote:
2010/9/6 KaiGai Kohei<kaigai@ak.jp.nec.com>:
The attached patch tackles both of the above two points, and allows to
control this deoptimization when CREATE SECURITY VIEW is used.I'll send a review for the patch.
Thanks for your reviewing.
Contents& Purpose
==================
The patch adds a new SECURITY option for VIEWs. Views defined with
CREATE SECURITY
VIEW command are not merged with external WHERE-clauses including
security-leakable
functions in queries. However, since internal functions and operators are not
considered as leakable functions, the planner still works well for
security views
unless we use user-defined functions in the WHERE-clause.Initial Run
===========
* Because of the delayed review, the patch has one hunk:
1 out of 5 hunks FAILED -- saving rejects to file
src/backend/commands/tablecmds.c.rej
but it is not serious at all, and can be easily fixed.
The patch was submitted a month ago. I'll fix it.
* It can be compiled, but has two warnings:
rewriteHandler.c: In function 'MarkFuncExprDepthWalker':
rewriteHandler.c:1155: warning: cast from pointer to integer of different size
rewriteHandler.c:1227: warning: cast to pointer from integer of different sizeThey should be harmless, but casting intptr_t is often used to avoid warnings:
- L1155: (int) (intptr_t) context;
- L1227: (void *) (intptr_t) (depth + 1)
Thanks for the good idea.
* It passes all regression tests, but doesn't have own new tests.
OK, I'll add it. Perhaps, EXPLAIN enables to show us differences between
security views and others.
Remaining works
===============
The patch might include only core code, but I'll let you know additional
sub-works are still required to complete the feature. Also, I've not measured
the performance yet, though performance might not be so critical for the patch.* Regression tests and documentation for the feature are required.
* pg_dump must support to dump SECURITY VIEWs.
They are dumped as normal views for now.
* pg_views can have "security" column.
* psql's \dv can show security attributes of views.
All are fair enough for me.
Questions
=========postgres=# EXPLAIN select * from v1 where f_malicious(b) and a = 2;
==> We assume operators implemented with built-in functions are safe.
So, we don't prevent this pushing-down, if the clause does not
contains something leakable.The term "built-in functions" means functions written in INTERNAL language
here. But we also have SQL functions by default. Some of them are just a
wrapper to internal functions. I'm not sure the checking of INTERNAL language
is the best way for the purpose. Did you compare it with other methods?
For example, "oid< FirstNormalObjectId" looks workable for me.
Hmm. I'm not sure they can be used for index-scans. If these operators are not
corresponding to index-scans, I want to keep the logic to check INTERNAL language,
because these have obviously no side effects (= not leakable anything).
kaigai=# select oprname, oprleft::regtype, oprright::regtype, prosrc
from pg_operator o join pg_proc p on o.oprcode = p.oid where prolang <> 12;
oprname | oprleft | oprright | prosrc
---------+------------------------+-----------------------------+------------------------------------
@> | path | point | select pg_catalog.on_ppath($2, $1)
+ | time without time zone | date | select ($2 + $1)
+ | time with time zone | date | select ($2 + $1)
+ | bigint | inet | select $2 + $1
+ | interval | time without time zone | select $2 + $1
+ | interval | date | select $2 + $1
+ | interval | time with time zone | select $2 + $1
+ | interval | timestamp without time zone | select $2 + $1
+ | interval | timestamp with time zone | select $2 + $1
+ | integer | date | select $2 + $1
|| | text | anynonarray | select $1 || $2::pg_catalog.text
|| | anynonarray | text | select $1::pg_catalog.text || $2
~ | path | point | select pg_catalog.on_ppath($2, $1)
(13 rows)
Instead of modifying the structure of pg_class, this patch stores a flag of
security view on the reloption. So, it needed to modify routines about
reloptions because it is the first case to store reloptions of views.Why did you need to extend StdRdOptions strucuture? Since other fields in
the structure are not used by views at all, I think adding a new structure
struct ViewOptions { vl_len, security_view }
would be better than extending StdRdOptions.
Hmm. It might be better than ad-hoc enhancement of StdRdOptions.
BTW, which is more preference to store the flag of security view:
reloption of the view or a new bool variable in the pg_class?
I tried to store this flag within reloptions to minimize the patch
size, but it seems to me reloption support for views makes the patch
size larger in the result.
Thanks,
--
KaiGai Kohei <kaigai@ak.jp.nec.com>