A question about leakproof

Started by qiumingchengover 3 years ago14 messagesgeneral
Jump to latest
#1qiumingcheng
qiumingcheng@aliyun.com

Hello, My questions are as follows:
Problem description
After testing, we don't find the difference between functions of proleakproof=true and functions of proleakproof=false (the function is described in pg_proc). Can you give specific examples to show that functions of proleakproof=true are more secure or can prevent data disclosure than functions of proleakproof=false. My related testing process is as follows (the rsp_user and wumk used below are the two database users that have been created).
1. Operation under user rsp_user
1.1 create table
drop table if exists tb_a cascade;
create table tb_a(id int4,c1 int, c2 int, pad text, effective_date timestamp without time zone NOT NULL) ;
CREATE INDEX tb_a_t_idx_id ON tb_a USING btree (id) TABLESPACE pg_default;
CREATE INDEX tb_a_t_idx_ed ON tb_a USING btree (effective_date) TABLESPACE pg_default;
CREATE INDEX tb_a_t_idx_c2 ON tb_a USING btree (c2) TABLESPACE pg_default;
1.2 insert data
insert into tb_a select id, id %200, id%1000, 'ss', current_date - floor((random() * 10000))::int from (select generate_series(1,10000) id) tb_a;
analyze;
1.3 create view
a. condition of view:effective_date > now() - TIME'23:00', in particular, type of effective_date is ‘timestamp without time zone NOT NULL’,but the type of the result of now() - TIME'23:00' is:timestamp with time zone,the related SQL statements are as follows:
drop view if exists tb_a_date_v1;
CREATE VIEW tb_a_date_v1 AS select * from tb_a where effective_date > now() - TIME'23:00';
b. the condition of the view is id=183. Note that type of id field is int4. the relevant SQL statements are as follows:
drop view if exists tb_a_int4_v1;
CREATE VIEW tb_a_int4_v1 AS select * from tb_a where id=183;
1.4 Authorize the view to user wumk
GRANT ALL ON SCHEMA public TO wumk;
GRANT ALL ON table tb_a_date_v1 TO wumk;
GRANT ALL ON table tb_a_int4_v1 TO wumk;
1.5 test SQL
Execute the following SQL statements respectively:explain select * from tb_a_int4_v1; explain select * from tb_a_date_v1。the results are as follows, the plans for both SQL statements are indexscan。
2. Operation under user wumk
1.1. test SQL,looking the plans
Execute the following SQL statements respectively:explain select * from tb_a_int4_v1; explain select * from tb_a_date_v1。The results are as follows:
As shown above, the plan of view tb_a_date_ v1 is seqscan (under the user rsp_user, its plan is indexscan),After analyzing the source code of postgresql, it is found that this problem is related to the following code:
In the above figure, func_oid=2523, the condition (effective_date > now() - TIME'23:00' in view b_a_data_v1 ) will use the function(oid=2523) for calculation, the function name is timestamp_gt_timestamptz,as shown in the figure below:
The proleakproof of the function is false,so,for the selectivity calculation of effective_date > now() - TIME'23:00 don’t use statistical information
,so the selectivity calculation of indexscan is higher than the actual,So finally, seqscan is selected.
While condition id=183 in in view tb_a_int4_v1 uses a comparison function whose proleakproof is true, so tb_a_int4_v1 view will normally use statistics info to calculate the selectivity, so that the correct indexscan is finally selected
So check the official postgresql document about leakproof:
According to my understanding, if the proleakproof is true, the function will not cause data leakage, and if the proleakproof is false, the function may cause data leakage. So I had tested the proleakproof about data leakage in sections 2.2 and 2.3.
1.2. test the data leakage of seqscan
a. Execute the following SQL and the results are as follows:
create or replace function leak_date(timestamp with time zone) returns bool as $$begin raise notice 'abc:%', $1; return true; end$$ language plpgsql cost 0.0000000000000001;
create or replace function leak_int4(int4) returns bool as $$begin raise notice 'abc:%', $1; return true; end$$ language plpgsql cost 0.0000000000000001;
set enable_indexscan=off;
set enable_bitmapscan=off;
select * from tb_a_date_v1 where leak_date(effective_date);
select * from tb_a_int4_v1 where leak_int4(id);
Analysis conclusion:
It is found that whether the proleakproof is true or false, there is a problem of data leakage under the seqscan plan.
1.3. test the data leakage of indexscan
To ensure that the plan must be indexscan, you need to set the following parameters:
set enable_seqscan=off; set enable_indexscan=on; set enable_bitmapscan=on;
Then execute the sqls:
select * from tb_a_date_v1 where leak_date(effective_date);
select * from tb_a_int4_v1 where leak_int4(id);
The results are as follows:
Analysis conclusion:
It is found that whether the proleakproof is true or false, there is no problem of data leakage under the indexscan plan

Attachments:

temp4cj.pngapplication/octet-streamDownload+1-0
temp4cj.pngapplication/octet-streamDownload+1-1
temp4cj.pngapplication/octet-streamDownload+1-1
temp4cj.pngapplication/octet-streamDownload
temp4cj.pngapplication/octet-streamDownload
temp4cj.pngapplication/octet-streamDownload+2-0
temp4cj.pngapplication/octet-streamDownload+0-1
temp4cj.pngapplication/octet-streamDownload+1-1
#2Julien Rouhaud
rjuju123@gmail.com
In reply to: qiumingcheng (#1)
Re: A question about leakproof

Hi,

On Mon, Oct 17, 2022 at 09:15:20AM +0800, qiumingcheng wrote:

Hello, My questions are as follows:
Problem description
After testing, we don't find the difference between functions of
proleakproof=true and functions of proleakproof=false (the function is
described in pg_proc). Can you give specific examples to show that functions
of proleakproof=true are more secure or can prevent data disclosure than
functions of proleakproof=false. My related testing process is as follows
(the rsp_user and wumk used below are the two database users that have been
created).

Have you looked at
https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.2#Security_barriers_and_Leakproof?

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Julien Rouhaud (#2)
Re: A question about leakproof

Julien Rouhaud <rjuju123@gmail.com> writes:

On Mon, Oct 17, 2022 at 09:15:20AM +0800, qiumingcheng wrote:

After testing, we don't find the difference between functions of
proleakproof=true and functions of proleakproof=false (the function is
described in pg_proc).

Have you looked at
https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.2#Security_barriers_and_Leakproof?

Also: the fact that a built-in function is not marked leakproof
doesn't mean that it isn't leakproof. It could just mean that
we haven't looked at it closely, or that there's too much code
involved to have much confidence that it would stay leakproof.

regards, tom lane

#4qiumingcheng
qiumingcheng@aliyun.com
In reply to: Tom Lane (#3)
回复:A question about leakproof

Have you looked at
https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.2#Security_barriers_and_Leakproof <https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.2#Security_barriers_and_Leakproof >?

Yes, if I use securtiy_barrierys, it do work, but it still can't use index, I guess it may cause performance problems, right ?

Also: the fact that a built-in function is not marked leakproof
doesn't mean that it isn't leakproof. It could just mean that
we haven't looked at it closely, or that there's too much code
involved to have much confidence that it would stay leakproof.

1. In the test example I gave, the in4eq function's proleakproof=true, but its actual test result is leaking. Does that mean you will adjust it to proleakproof=false later?
2. What basis do you set proleakproof of in4eq function to true? How should I judge whether a function should be marked as proleakproof.Can you give a function that will not leak?
------------------------------------------------------------------
发件人:Tom Lane <tgl@sss.pgh.pa.us>
发送时间:2022年10月17日(星期一) 09:54
收件人:Julien Rouhaud <rjuju123@gmail.com>
抄 送:qiumingcheng <qiumingcheng@aliyun.com>; pgsql-general <pgsql-general@lists.postgresql.org>
主 题:Re: A question about leakproof
Julien Rouhaud <rjuju123@gmail.com> writes:

On Mon, Oct 17, 2022 at 09:15:20AM +0800, qiumingcheng wrote:

After testing, we don't find the difference between functions of
proleakproof=true and functions of proleakproof=false (the function is
described in pg_proc).

Have you looked at
https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.2#Security_barriers_and_Leakproof <https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.2#Security_barriers_and_Leakproof >?

Also: the fact that a built-in function is not marked leakproof
doesn't mean that it isn't leakproof. It could just mean that
we haven't looked at it closely, or that there's too much code
involved to have much confidence that it would stay leakproof.
regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: qiumingcheng (#4)
Re: 回复:A question about leakproof

"qiumingcheng" <qiumingcheng@aliyun.com> writes:

1. In the test example I gave, the in4eq function's proleakproof=true, but its actual test result is leaking. Does that mean you will adjust it to proleakproof=false later?

int4eq is about as leakproof as a function could possibly be: it does
not leak, it's plain from the code of the function that it does not
leak, and it calls no other code that might accidentally introduce
a leak in future.

I think you do not understand what that property actually means.
Per the CREATE FUNCTION man page:

LEAKPROOF indicates that the function has no side
effects. It reveals no information about its arguments other than by
its return value. For example, a function which throws an error message
for some argument values but not others, or which includes the argument
values in any error message, is not leakproof.

Please note that this definition talks only about the behavior
of the function itself. Re-reading your email, you seem to be
imagining that changes in a query's plan on the basis of changes in
collected statistics have something to do with this. They do not.

regards, tom lane

#6David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#5)
Re: 回复:A question about leakproof

On Sun, Oct 16, 2022 at 8:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

"qiumingcheng" <qiumingcheng@aliyun.com> writes:

1. In the test example I gave, the in4eq function's proleakproof=true,

but its actual test result is leaking. Does that mean you will adjust it to
proleakproof=false later?

int4eq is about as leakproof as a function could possibly be: it does
not leak, it's plain from the code of the function that it does not
leak, and it calls no other code that might accidentally introduce
a leak in future.

I think you do not understand what that property actually means.
Per the CREATE FUNCTION man page:

LEAKPROOF indicates that the function has no side
effects. It reveals no information about its arguments other than
by
its return value. For example, a function which throws an error
message
for some argument values but not others, or which includes the
argument
values in any error message, is not leakproof.

Please note that this definition talks only about the behavior
of the function itself. Re-reading your email, you seem to be
imagining that changes in a query's plan on the basis of changes in
collected statistics have something to do with this. They do not.

You omitted including the part about when the system even has to care about
leakproof:

"This affects how the system executes queries against views created with
the security_barrier option or tables with row level security enabled."

A non-leakproof function must not be called with inputs that the current
user is not allowed to see. Allowed being the operative word, if they
could see the value if not for other non-security-related conditions in the
query it is acceptable to call the function with those inputs even if the
end result is simply going to be thrown away later (while inefficient, this
is why function cost is tunable). None of your examples prohibit any row
from being processed by any function and so your spy function may see any
and all rows present in the table.

A leakproof function is allowed to process data that the current user is
not allowed to see - because if the input row ends up being filtered out
the values of the input arguments will never be viewable by the user. They
will neither be in the query output, nor able to be read or inferred by
some kind of side-effect. Your spy function, which is not leakproof,
should never see such prohibited rows - which you seem to have confirmed.
The planner indeed must ensure other security-related filters are applied
first.

David J.

#7qiumingcheng
qiumingcheng@aliyun.com
In reply to: Tom Lane (#5)
回复:回复:A question about leakproof

you seem to be imagining that changes in a query's plan on the basis of changes in collected statistics have something to do with this. They do not.

Sorry, I may not fully understand what you mean. I mean that after my tests, the execution results of this SQL (explain select * from tb_a_date_v1) execution plan are different under different users, which is really related to the parameter proleakproof.
If the 2 below is changed to 'return true' , the execution plan will be indexscan
------------------------------------------------------------------
发件人:Tom Lane <tgl@sss.pgh.pa.us>
发送时间:2022年10月17日(星期一) 11:33
收件人:qiumingcheng <qiumingcheng@aliyun.com>
抄 送:Julien Rouhaud <rjuju123@gmail.com>; pgsql-general <pgsql-general@lists.postgresql.org>
主 题:Re: 回复:A question about leakproof
"qiumingcheng" <qiumingcheng@aliyun.com> writes:

1. In the test example I gave, the in4eq function's proleakproof=true, but its actual test result is leaking. Does that mean you will adjust it to proleakproof=false later?

int4eq is about as leakproof as a function could possibly be: it does
not leak, it's plain from the code of the function that it does not
leak, and it calls no other code that might accidentally introduce
a leak in future.
I think you do not understand what that property actually means.
Per the CREATE FUNCTION man page:
LEAKPROOF indicates that the function has no side
effects. It reveals no information about its arguments other than by
its return value. For example, a function which throws an error message
for some argument values but not others, or which includes the argument
values in any error message, is not leakproof.
Please note that this definition talks only about the behavior
of the function itself. Re-reading your email, you seem to be
imagining that changes in a query's plan on the basis of changes in
collected statistics have something to do with this. They do not.
regards, tom lane

Attachments:

temp4cj.pngapplication/octet-streamDownload
#8Laurenz Albe
laurenz.albe@cybertec.at
In reply to: qiumingcheng (#7)
Re: 回复:回复:A question about leakproof

On Mon, 2022-10-17 at 13:17 +0800, qiumingcheng wrote:

you seem to be imagining that changes in a query's plan on the basis of changes
in collected statistics have something to do with this.  They do not.

Sorry, I may not fully understand what you mean. I mean that after my tests,
the execution results of this SQL (explain select * from tb_a_date_v1) execution plan
are different under different users, which is really related to the parameter proleakproof.

That's the idea behind leakproof: if a function is not leakproof, the optimizer
will not move it "inside" the view definition. Then the function is evaluated only
after the view definition. That may very well lead to a slower execution plan,
because it cannot use certain indexes on the underlying tables.

It is the price you have to pay for good security.

Yours,
Laurenz Albe

#9qiumingcheng
qiumingcheng@aliyun.com
In reply to: Laurenz Albe (#8)
回复:回复:回复:A question about leakproof

"you seem to be imagining that changes in a query's plan on the basis of changes
in collected statistics have something to do with this. They do not."

1. My understanding of the above paragraph is that for the same view and different users, the proleakproof=false attribute of the function will not lead to inconsistent plans, but my actual test result is that proleakproof=false will lead to inconsistent plans。
2. What's the reason about the function timestamp_gt_timestampz may cause data leakage? Can you explain how it causes data leakage?
------------------------------------------------------------------
发件人:Laurenz Albe <laurenz.albe@cybertec.at>
发送时间:2022年10月17日(星期一) 15:20
收件人:qiumingcheng <qiumingcheng@aliyun.com>; Tom Lane <tgl@sss.pgh.pa.us>
抄 送:Julien Rouhaud <rjuju123@gmail.com>; pgsql-general <pgsql-general@lists.postgresql.org>
主 题:Re: 回复:回复:A question about leakproof
On Mon, 2022-10-17 at 13:17 +0800, qiumingcheng wrote:

you seem to be imagining that changes in a query's plan on the basis of changes
in collected statistics have something to do with this. They do not.

Sorry, I may not fully understand what you mean. I mean that after my tests,
the execution results of this SQL (explain select * from tb_a_date_v1) execution plan
are different under different users, which is really related to the parameter proleakproof.

That's the idea behind leakproof: if a function is not leakproof, the optimizer
will not move it "inside" the view definition. Then the function is evaluated only
after the view definition. That may very well lead to a slower execution plan,
because it cannot use certain indexes on the underlying tables.
It is the price you have to pay for good security.
Yours,
Laurenz Albe

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: qiumingcheng (#9)
Re: 回复:回复:回复:A question about leakproof

"qiumingcheng" <qiumingcheng@aliyun.com> writes:

2. What's the reason about the function timestamp_gt_timestampz may cause data leakage? Can you explain how it causes data leakage?

It's capable of throwing an error (see
timestamp2timestamptz_opt_overflow). Now, maybe that's unreachable, or
maybe we could rerrange things to remove it. But there's still enough
code underneath the timezone conversion requirement that I'd be very
hesitant to apply a leakproof marking. In short: it might be leakproof
in practice, but we don't wish to offer a guarantee.

regards, tom lane

#11Laurenz Albe
laurenz.albe@cybertec.at
In reply to: qiumingcheng (#9)
Re: 回复:回复:回复:A question about leakproof

On Mon, 2022-10-17 at 16:24 +0800, qiumingcheng wrote:

"you seem to be imagining that changes in a query's plan on the basis of changes
in collected statistics have something to do with this.  They do not."

1. My understanding of the above paragraph is that for the same view and different users,
the proleakproof=false attribute of the function will not lead to inconsistent plans,
but my actual test result is that proleakproof=false will lead to inconsistent plans.

The above says "on the basis of changes in collected statistics". The different execution
you see is not because the statistics are different, but because the permissions of the
users are different.

2. What's the reason about the function timestamp_gt_timestampz  may  cause data leakage?
Can you explain how it causes data leakage?

I don't know the reason in this case. You could look at the source code, perhaps it is
possible to cause error messages that can give you some clue as to the value that you
compare with. But perhaps, as Tome said, it is just that nobody scrutinized the function
hard enough to exclude that something like that can happen.

Yours,
Laurenz Albe

#12qiumingcheng
qiumingcheng@aliyun.com
In reply to: Tom Lane (#10)
回复:回复:回复:回复:A question about leakproof

It's capable of throwing an error (see timestamp2timestamptz_opt_overflow).

Yes, It's capable of throwing an error(timestamp out of range) , but the message "timestamp out of range" is not sensitive information. Only from this function(timestamp_gt_timestamptz), can it be marked as leakproof?
------------------------------------------------------------------
发件人:Tom Lane <tgl@sss.pgh.pa.us>
发送时间:2022年10月17日(星期一) 22:07
收件人:qiumingcheng <qiumingcheng@aliyun.com>
抄 送:Laurenz Albe <laurenz.albe@cybertec.at>; Julien Rouhaud <rjuju123@gmail.com>; pgsql-general <pgsql-general@lists.postgresql.org>; yuexingzhi <yuexingzhi@hotmail.com>
主 题:Re: 回复:回复:回复:A question about leakproof
"qiumingcheng" <qiumingcheng@aliyun.com> writes:

2. What's the reason about the function timestamp_gt_timestampz may cause data leakage? Can you explain how it causes data leakage?

It's capable of throwing an error (see
timestamp2timestamptz_opt_overflow). Now, maybe that's unreachable, or
maybe we could rerrange things to remove it. But there's still enough
code underneath the timezone conversion requirement that I'd be very
hesitant to apply a leakproof marking. In short: it might be leakproof
in practice, but we don't wish to offer a guarantee.
regards, tom lane

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: qiumingcheng (#12)
Re: 回复:回复:回复:回复:A question about leakproof

"qiumingcheng" <qiumingcheng@aliyun.com> writes:

Yes, It's capable of throwing an error(timestamp out of range) , but the
message "timestamp out of range" is not sensitive information.

Really? Whether that's true at all is a matter of opinion. There's
also the prospect that somebody could determine the value of a
supposedly-unreadable timestamp by seeing how big an interval could
be added to it without overflow. Maybe that's infeasible because of
timestamp_pl_interval not being marked leakproof, but then we're
getting into precisely the sort of conditional-on-other-assumptions
reasoning that we don't want to indulge in.

Only from this function(timestamp_gt_timestamptz), can it be marked as leakproof?

Project policy is that we will not mark a function as leakproof unless
it's evident from the text of the function that it can't throw errors.
I don't see a good argument for making a exception for this one.

regards, tom lane

#14qiumingcheng
qiumingcheng@aliyun.com
In reply to: Tom Lane (#13)
回复:回复:回复:回复:回复:A question about leakproof

OK, got it. Thank you very much.
------------------------------------------------------------------
发件人:Tom Lane <tgl@sss.pgh.pa.us>
发送时间:2022年10月18日(星期二) 00:27
收件人:qiumingcheng <qiumingcheng@aliyun.com>
抄 送:Laurenz Albe <laurenz.albe@cybertec.at>; Julien Rouhaud <rjuju123@gmail.com>; pgsql-general <pgsql-general@lists.postgresql.org>; yuexingzhi <yuexingzhi@hotmail.com>
主 题:Re: 回复:回复:回复:回复:A question about leakproof
"qiumingcheng" <qiumingcheng@aliyun.com> writes:

Yes, It's capable of throwing an error(timestamp out of range) , but the
message "timestamp out of range" is not sensitive information.

Really? Whether that's true at all is a matter of opinion. There's
also the prospect that somebody could determine the value of a
supposedly-unreadable timestamp by seeing how big an interval could
be added to it without overflow. Maybe that's infeasible because of
timestamp_pl_interval not being marked leakproof, but then we're
getting into precisely the sort of conditional-on-other-assumptions
reasoning that we don't want to indulge in.

Only from this function(timestamp_gt_timestamptz), can it be marked as leakproof?

Project policy is that we will not mark a function as leakproof unless
it's evident from the text of the function that it can't throw errors.
I don't see a good argument for making a exception for this one.
regards, tom lane