Proposal for enhancements of privilege system
Some of you may remember some mumblings about some work on access
privileges, so this is the idea. Comments welcome.
* Goals
The goal of the first stage is to provide a fully SQL92 compliant
solution. That comprises particularly column level granularity, the
REFERENCES[*] privilege for use by constraints, and the USAGE
privilege. We currently don't have any of the things that USAGE
pertains to -- domains, character sets, collations, translations --
but at least we shouldn't have to start all over when we do. Also
GRANT OPTION needs to be supported.
[*] -- now with that RI snafu unveiled that goal seems optimistic
The second stage would be adopting all specifications made by SQL3 if
they are applicable. This includes particularly the privilege types
TRIGGER and UNDER (for table inheritance, which should probably work
well first). Also we could think about EXECUTE for functions and some
`setuid'-like features.
Stage two isn't necessarily anticipated for 7.1 but I'd like to have a
framework which adapts well.
* User management
One thing I'd like to see resolved first is the issue of
pg_shadow.usesysid. This field is fully functionally dependent on
pg_shadow.oid so there's little theoretical need to keep it around.
Secondly, the system happily reassigns previously used sysids, which
is a pretty dangerous thing to do as we all know, since there might
still be old database objects hanging around that the new users
shouldn't necessarily have access to. (And connecting to all databases
in turn to remove any dangling objects when a user is dropped isn't
really an option.) So the answer is to not recycle sysids. But then
why not use the oid?
Some arguments for user sysids I have heard in the past were that some
people want to keep them the same as the Unix uid. While I'm at a loss
as to how this would matter in practice (aren't names enough) I grant
that that's an argument (albeit one that doesn't scale well because
not every database user is a Unix user and two identically numbered
Unix users from different machines would presumably map to different
database users). But if you look closer then this thinking is
primarily caused by the fact that there is a usesysid field at all --
if there wasn't, you wouldn't have to keep it in sync.
Another reason why an oid based arrangement would be nicer is that if
we did the same thing for groups why could refer to both users and
groups through one attribute. See `Implementation' below.
* Implementation
The central idea in this proposal is a new system table to store
permissions:
pg_privilege (
priobj oid,
prigrantor oid,
prigrantee oid,
priaction char,
priisgrantable boolean,
primary key (priobj, prigrantee, priaction)
)
"priobj" would store the oid of the object being described, for
example a table or function or type. "prigrantor" is the user who
granted the privilege. (It is necessary to store this to get grant
options to work correctly.) "prigrantee" is obviously the user to
which you grant the privilege or a group. We could put 0 for "public".
"priaction" would be the encoding of the privilege type, such as
's'=select, 'u'=update, perhaps. And "prigrantable" is whether the
privilege is grantable.
The key advantages to this method over the old one are:
- Extensible beyond tables, in fact to any kind of object
- Easier to query, e.g., for what-if inquiries
- The old method would make grant options pretty tough without a major
rework
- A pg_privilege row would be almost exactly what SQL calls a
"privilege descriptor". So the implementation will be much easier
and verifyable because you can read the program code out of the
standard text. (in theory anyway)
Those that follow will see how simple-minded grant, revoke, and
privilege lookup will be in their core: simply insert, delete, or look
for a row. (Of course the devil is in the details.)
* Column privileges
There are two approaches I see to managing column privileges, one is a
little cleaner, the other faster. Note that granting access to a
table is different than granting access to all of its columns; the
difference is what happens when you add a new column.
The straightforward choice would be to store a single reference to
pg_class when the privilege describes the whole table, and
pg_attribute references when only specific columns are named. That
would mean the lookup routine will first look for a pg_class.oid entry
and, failing that, then for possible pg_attribute.oid entries for the
columns that it's interested in. This is of course suboptimal when no
privilege exists in the first place but that is not necessarily the case
we're optimizing for.
The second choice would be to always have an entry for the table, even
if it only says "I'm not the real privilege, but there are column
privileges, so you better keep looking." That would probably mean
another column in the pg_privilege. This way you have to maintain
redundant information but there is enough precedent for this sort of
thing in the other system catalogs.
* Groups
Handling groups efficiently is a bit tricky because it's essentially
equivalent to a join: scan all the privileges and all the groups and
look for matches between them and with the current user id. I suppose
one could simply run this query by hand once and see what the
optimizer thinks would be a good way to run it, but that isn't
facilitated by the way group information is stored right now.
I would do it like this: Looking up privileges granted to groups would
be done if the lookup based on the user id fails. Then you have to
scan pg_group anyway, so you might as well just scan it once
completely and record all the groups the user is in. Then you do a
privilege lookup for each group in a manner identical to individual
users.
This is different from the current implementation which looks through
all existing privileges on a table and if one is owned by a group then
scan pg_group to see if the user is in the group. That might be
suboptimal.
* Performance concerns
The fastest privilege system if of course one that does no checking at
all. Features always cost a price. I have no concern, however, that
this new implementation would cause any noticeable penalty at all. If
you consider how much reading the parser, planner, optimizer, and
rewriter do just to make sense of a query, this is really a minor
item.
If you're the table owner then no access checking is done at all. If
you don't use groups or column privileges then one syscache lookup
will tell you yes or no. If you do use groups then the new system
would potentially even be faster. If you want to use column privileges
you'd currently wait forever. :)
* Possibilities for extensions
One thing that has been thrown around is a LOCK privilege. Currently
everyone with write access can lock the table completely. It would
make sense to me to restrict locks of Share mode and higher to the
owner and owners of this privilege.
There is also demand for various CREATE privileges (one for each thing
you can create, one supposes). Once we have schemas we can easily fit
this into the above design. Since this is not covered by the standard
("implementation-defined"), a good round of discussion ought to take
place first.
--
Peter Eisentraut Sernanders v�g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden
Peter Eisentraut <peter_e@gmx.net> writes:
[ pretty good proposal ]
Just a couple of trivial comments ---
Some arguments for user sysids I have heard in the past were that some
people want to keep them the same as the Unix uid.
There may once have been a reason for that, but it's probably buried in
ancient Berkeley-specific admin practices. I sure can't see any good
reason to keep the extra number around now. As you say, it should be
OK to use the pg_shadow row OID to identify users.
BTW I believe most of the "owner" columns in the system tables are
declared as "int4" because they hold sysids ... don't forget to change
'em to be "Oid" when you do this.
Another reason why an oid based arrangement would be nicer is that if
we did the same thing for groups why could refer to both users and
groups through one attribute. See `Implementation' below.
"findoidjoins" will probably get unhappy with you if you do that.
Which is maybe not a big deal, but...
"prigrantee" is obviously the user to
which you grant the privilege or a group.
We could put 0 for "public".
I'd be inclined to provide an additional field that explicitly encodes
"grantee is user", "grantee is group", or "grantee is public". That
way you don't need to do a join to find out what you are looking at.
Really, having an OID column that might reference either users or groups
is the SQL equivalent of a type pun. An alternative representation that
would avoid that would be two OID columns, one to use if user and one
to use if group (if they're both 0 then it's grant to public).
regards, tom lane
Another reason why an oid based arrangement would be nicer is that if
we did the same thing for groups why could refer to both users and
groups through one attribute. See `Implementation' below."findoidjoins" will probably get unhappy with you if you do that.
Which is maybe not a big deal, but...
I think it will find it. It is not a big deal anyway.
--
Bruce Momjian | http://www.op.net/~candle
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Peter Eisentraut wrote:
pg_privilege (
priobj oid,
prigrantor oid,
prigrantee oid,
priaction char,
priisgrantable boolean,primary key (priobj, prigrantee, priaction)
)
I like it.
The straightforward choice would be to store a single reference to
pg_class when the privilege describes the whole table, and
pg_attribute references when only specific columns are named. That
would mean the lookup routine will first look for a pg_class.oid entry
and, failing that, then for possible pg_attribute.oid entries for the
columns that it's interested in. This is of course suboptimal when no
privilege exists in the first place but that is not necessarily the case
we're optimizing for.
Don't worry about performance for the access denied case. That is going
to be outweighed 1000:1 by the access allowed case. Go for the clean
solution.
Tom Lane writes:
Really, having an OID column that might reference either users or groups
is the SQL equivalent of a type pun.
Well, I don't really know what a type pun is but the priobj column would
do exactly the same thing by referring to tables, types, functions, etc.
by unadorned oid, which I thought would be pretty nice. Really, in normal
mode of operation there is never a question "Does this privilege apply to
a user or a group?" it's always "Given this object and this user/group id,
do I have access?" I don't see that as a practical problem, but I'll think
about it.
--
Peter Eisentraut Sernanders v�g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden
On Tue, 23 May 2000, Chris Bitmead wrote:
Peter Eisentraut wrote:
pg_privilege (
priobj oid,
prigrantor oid,
prigrantee oid,
priaction char,
priisgrantable boolean,primary key (priobj, prigrantee, priaction)
)I like it.
Imho this is an area where it does make sense to look at what other db's do,
because it makes the toolwriters life so much easier if pg behaves like some other
common db. Thus I do not really like a standalone design.
Other db's usually use a char array for priaction and don't have priisgrantable,
but code it into priaction. Or they use a bitfield. This has the advantage of only
producing one row per table.
Andreas
Andreas Zeugswetter writes:
Imho this is an area where it does make sense to look at what other
db's do, because it makes the toolwriters life so much easier if pg
behaves like some other common db.
The defined interface to the privilege system is GRANT, REVOKE, and
"access denied" (and a couple of INFORMATION_SCHEMA views, eventually).
I don't see how other db's play into this.
Other db's usually use a char array for priaction and don't have
priisgrantable, but code it into priaction. Or they use a bitfield.
This has the advantage of only producing one row per table.
That's the price I'm willing to pay for abstraction, extensibility, and
verifyability. But I'm open for better ideas.
--
Peter Eisentraut Sernanders v�g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden
Imho this is an area where it does make sense to look at what other
db's do, because it makes the toolwriters life so much easier if pg
behaves like some other common db.The defined interface to the privilege system is GRANT, REVOKE, and
"access denied" (and a couple of INFORMATION_SCHEMA views,
eventually).
I don't see how other db's play into this.
Of course the grant revoke is the same. But administrative tools usually
allow you to dump schema, all rights, triggers ... for an object and thus
need
access to the system tables containing the grants.
Other db's usually use a char array for priaction and don't have
priisgrantable, but code it into priaction. Or they use a bitfield.
This has the advantage of only producing one row per table.That's the price I'm willing to pay for abstraction,
extensibility, and
verifyability. But I'm open for better ideas.
Imho this is an area that is extremly sensitive to performance,
the rights have to be checked for each access.
Andreas
Import Notes
Resolved by subject fallback
On Tue, 30 May 2000, Zeugswetter Andreas SB wrote:
Other db's usually use a char array for priaction and don't have
priisgrantable, but code it into priaction. Or they use a bitfield.
This has the advantage of only producing one row per table.That's the price I'm willing to pay for abstraction,
extensibility, and
verifyability. But I'm open for better ideas.Imho this is an area that is extremly sensitive to performance,
the rights have to be checked for each access.
Yes, but I believe that Peter's idea is good. System tables are used for
each access not only for ACL, and performance problem is a problem for
system cache not primary for privilege system.
I look forward set privilege for columns and functions. Large multiuser
projects need it.
Karel
On Tue, 30 May 2000, Zeugswetter Andreas SB wrote:
Of course the grant revoke is the same. But administrative tools
usually allow you to dump schema, all rights, triggers ... for an
object and thus need access to the system tables containing the
grants.
That's what you use the information schema views for. Also, of course,
we're light years away from having anything like a portable pg_dump.
Imho this is an area that is extremly sensitive to performance, the
rights have to be checked for each access.
But using some sort of arrays is going to make it slower in any case since
you can't use indexes on those.
--
Peter Eisentraut Sernanders v�g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden
Of course the grant revoke is the same. But administrative tools
usually allow you to dump schema, all rights, triggers ... for an
object and thus need access to the system tables containing the
grants.That's what you use the information schema views for.
Ok.
Also, of course,
we're light years away from having anything like a portable pg_dump.
Hmm ? I am not talking about pg_dump, I am talking about some graphical tool
that shows the table structure and grants.
Imho this is an area that is extremly sensitive to performance, the
rights have to be checked for each access.But using some sort of arrays is going to make it slower in
any case since
you can't use indexes on those.
Again Hmm ? Are you going to do select * from <authtable> where pri="select"
or some such ? Usually you look up a users rights for a specific table,
and that needs to be fast.
Andreas
Import Notes
Resolved by subject fallback
Other db's usually use a char array for priaction and don't have
priisgrantable, but code it into priaction. Or they usea bitfield.
This has the advantage of only producing one row per table.
That's the price I'm willing to pay for abstraction,
extensibility, and
verifyability. But I'm open for better ideas.Imho this is an area that is extremly sensitive to performance,
the rights have to be checked for each access.Yes, but I believe that Peter's idea is good. System tables
are used for
each access not only for ACL, and performance problem is a problem for
system cache not primary for privilege system.
Yes I totally agree, that the basic idea is great, all I am saying is, that
I would
1. gather more than one priviledge per table into one row (all of: select,
insert, update ...)
2. try to look at some existing table structure from one biggie db and see
if it fits
Andreas
Import Notes
Resolved by subject fallback
On Tue, 30 May 2000, Zeugswetter Andreas SB wrote:
Yes, but I believe that Peter's idea is good. System tables
are used for
each access not only for ACL, and performance problem is a problem for
system cache not primary for privilege system.Yes I totally agree, that the basic idea is great, all I am saying is, that
I would
1. gather more than one priviledge per table into one row (all of: select,
insert, update ...)
I disccuse this idea with Peter some month ago via private mails (Peter
has big patience .. :-) and we already calculate about it.
* needful ACL data for one object will very small and not spend very memory
in cache,
* in one moment you need information about one object and one privilege
type. SELECT/UPDATE/etc in one row is not needful, if you run SELECT you
need information about priv. for select only.
* it is very easy extendible, is not defined some special pozition in some
string or some special column for (example) SELECT. You can in future add
new privilege element.
2. try to look at some existing table structure from one biggie db and see
if it fits
See pg_attribute --- here is very simular situation, but it is larger.
Karel
Zeugswetter Andreas SB writes:
Again Hmm ? Are you going to do select * from <authtable> where pri="select"
or some such ? Usually you look up a users rights for a specific table,
and that needs to be fast.
Exactly, that's why I have to do it like this. To interface a system
catalog to the shared cache you need a primary key, which would be
(object, user, action) in my proposal. With that setup I can easily make
queries of the sort "does user X have select right on table Y" as fast as
possible, no slower than, say, looking up an attribute definition in
pg_attribute.
With several privileges per row you make the table unnecessarily sparse,
you make interfacing to the catalog cache a nightmare, and you create all
sorts of funny implementation problems (for example, revoking a privilege
might be an update or a delete, depending on whether it was the last
privilege revoked).
--
Peter Eisentraut Sernanders v�g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden
Again Hmm ? Are you going to do select * from <authtable> where pri="select"
or some such ? Usually you look up a users rights for a specific table,
and that needs to be fast.Exactly, that's why I have to do it like this. To interface a system
catalog to the shared cache you need a primary key, which would be
(object, user, action) in my proposal. With that setup I can easily make
queries of the sort "does user X have select right on table Y" as fast as
possible, no slower than, say, looking up an attribute definition in
pg_attribute.
Ok, I see that you will somtimes want to do a select like that, only I do
not see the reason why this has to be the primary target for speed.
Remember that for each row in the db you have >30 bytes of overhead
(I forgot the exact number) plus table_oid + user_oid thus if a user has
all permissions on a table, that will take 300 bytes.
I also think that a key of object + {user|group} is imho selective enough,
you don't want a key whose only info is a boolean.
Andreas
"Zeugswetter Andreas" <andreas.zeugswetter@telecom.at> writes:
Exactly, that's why I have to do it like this. To interface a system
catalog to the shared cache you need a primary key, which would be
(object, user, action) in my proposal. With that setup I can easily make
queries of the sort "does user X have select right on table Y" as fast as
possible, no slower than, say, looking up an attribute definition in
pg_attribute.
Ok, I see that you will somtimes want to do a select like that, only I do
not see the reason why this has to be the primary target for speed.
Remember that for each row in the db you have >30 bytes of overhead
(I forgot the exact number) plus table_oid + user_oid thus if a user has
all permissions on a table, that will take 300 bytes.
I also think that a key of object + {user|group} is imho selective enough,
you don't want a key whose only info is a boolean.
I tend to agree with Andreas on this: having a separate tuple for each
individual kind of access right will consume an unreasonable amount of
space --- both on disk and in the syscache, if a cache is used for this
table. (In the cache, that translates to entries not living very long
before they fall off the LRU list.)
regards, tom lane
Tom Lane writes:
having a separate tuple for each individual kind of access right will
consume an unreasonable amount of space --- both on disk and in the
syscache, if a cache is used for this table.
That's a valid concern, but unfortunately things aren't that easy. For
each access right you also have to store what user granted that privilege
and whether it's grantable, and for SELECT also whether it includes the
"hierarchy option" (has to do with table inheritance somehow).
Say you store all privileges in an array, then you'd either need to encode
all 3 1/2 pieces of information into one single data type and make an
array thereof (like `array of record privtype; privgrantor;
privgrantable'), which doesn't really make things easier, or you have
three arrays per tuple, which makes things worse. Also querying arrays is
painful.
So the alternative is to have separate columns per privilege, like
pg_privilege ( priobj, prigrantee,
priupdate, priupdateisgrantable, priupdategrantor,
priselect, priselectisgrantable, priselectgrantor,
... /* delete, insert, references */
)
The originally proposed schema would be 14 bytes data plus overhead. This
new idea would cost 38 bytes of data. As I understand, the overhead is 40
bytes. So the break-even point for this new scheme is when users have on
average at least 1.4 privileges (78/54) granted to them on one object.
Considering that such objects as types and functions will in any case have
at most one privilege (USAGE or EXECUTE, resp.), that there are groups (or
roles), that column level privileges will probably tend to have sparse
tuples of this kind, and that object owners are short-circuited in any
case, then it is not at all clear whether that figure will be reached.
--
Peter Eisentraut Sernanders v�g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden