Changing the concept of a DATABASE
On 21 May 2012 20:40, Stephen Frost <sfrost@snowman.net> wrote:
This is important. I like the idea of breaking down the barriers
between databases to allow it to be an option for one backend to
access tables in multiple databases. The current mechanism doesn't
actually prevent looking at data from other databases using internal
APIs, so full security doesn't exist. It's a very common user
requirement to wish to join tables stored in different databases,
which ought to be possible more cleanly with correct privileges.That's really a whole different ball of wax and I don't believe what
Robert was proposing would actually allow that to happen due to the
other database-level things which are needed to keep everything
consistent... That's my understanding, anyway. I'd be happy as anyone
if we could actually make it work, but isn't like the SysCache stuff per
database? Also, cross-database queries would actually make it more
difficult to have per-database roles, which is one thing that I was
hoping we might be able to work into this, though perhaps we could have
a shared roles table and a per-database roles table and only 'global'
roles would be able to issue cross-database queries..
So collecting a few requirements from various places:
* Ability to have a Role that can only access one Database
* Allow user info to be dumped with a database, to make a db
completely self-consistent
* Allow databases to be transportable
* Allow users to access tables in >1 database easily, with appropriate rights.
I don't see any reasons why these things would be against each other.
The main objectives are to make a Database a more easily used
administrative grouping. At present, people who use multiple Databases
face many problems - they aren't as separate as you'd like, but
neither can they be ignored when required.
The idea of "one main database per session" is fine, but wiring it so
closely into the backend has a few disadvantages, many of them weird
internal things.
Are there arguments against those requirements before we spend time on
design/thinking?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 22/05/12 11:46, Simon Riggs wrote:
On 21 May 2012 20:40, Stephen Frost<sfrost@snowman.net> wrote:
This is important. I like the idea of breaking down the barriers
between databases to allow it to be an option for one backend to
access tables in multiple databases. The current mechanism doesn't
actually prevent looking at data from other databases using internal
APIs, so full security doesn't exist. It's a very common user
requirement to wish to join tables stored in different databases,
which ought to be possible more cleanly with correct privileges.That's really a whole different ball of wax and I don't believe what
Robert was proposing would actually allow that to happen due to the
other database-level things which are needed to keep everything
consistent... That's my understanding, anyway. I'd be happy as anyone
if we could actually make it work, but isn't like the SysCache stuff per
database? Also, cross-database queries would actually make it more
difficult to have per-database roles, which is one thing that I was
hoping we might be able to work into this, though perhaps we could have
a shared roles table and a per-database roles table and only 'global'
roles would be able to issue cross-database queries..
IMVHO: s/database/schema/g does resolve many of the problems that you
were referring to... and 'dblink' should solve the rest, right?
Please, feel free to point out what I am (most probably) not considering
-- not experienced enough yet :)
On the other hand, the separation of databases allows what otherwise
would only be possible by using multiple instances of the database
server (à la Oracle, AFAIK ) -- save for resource management, but that
is another question whatsoever.
So collecting a few requirements from various places:
* Ability to have a Role that can only access one Database
Yes, please
* Allow user info to be dumped with a database, to make a db
completely self-consistent
+1
* Allow databases to be transportable
+1. Ideally, the binary format could be make platform-independent, so
that a snapshot/rsync of the cluster can span architectures easily.
AFAIK, endianness-change is relatively cheap on current processors [1
ASM instruction?] and it's not like we are memory-mapping tuples anyway
(TOASTed values can certainly not be mapped), so it shouldn't be
noticeable performance-wise.
* Allow users to access tables in>1 database easily, with appropriate rights.
See above, but I am probably wrong ...
I don't see any reasons why these things would be against each other.
Look quite orthogonal to me.
The main objectives are to make a Database a more easily used
administrative grouping. At present, people who use multiple Databases
face many problems - they aren't as separate as you'd like, but
neither can they be ignored when required.The idea of "one main database per session" is fine, but wiring it so
closely into the backend has a few disadvantages, many of them weird
internal things.Are there arguments against those requirements before we spend time on
design/thinking?
OTOH, the postmaster/cluster - session/database coupling looks to me
clean, simple... and seems to make the code simpler. This is can only be
good (but again, I don't know enough yet to be sure)
Regards,
Jose Luis Tallon
On 22 May 2012 12:05, José Luis Tallón <jltallon@nosys.es> wrote:
IMVHO: s/database/schema/g does resolve many of the problems that you were
referring to... and 'dblink' should solve the rest, right?
Please, feel free to point out what I am (most probably) not considering --
not experienced enough yet :)
The choice of schema/database is an important one. If you get it
wrong, you are in major difficulty. In many cases schemas would be a
better choice, but not in all cases. So I'm interested in solving the
problems for people who have multiple databases on same server.
dblink is the only solution, but its very poor way to do this when we
have 2 databases on same server.
My thinking is that reaching out to multiple databases is actually
mostly easy, except in a few places where dbid is hardwired into the
backend.
On the other hand, the separation of databases allows what otherwise would
only be possible by using multiple instances of the database server (à la
Oracle, AFAIK ) -- save for resource management, but that is another
question whatsoever.
Separation of databases is fine. I have no intention to change that,
as long as the user wishes that.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On May22, 2012, at 11:46 , Simon Riggs wrote:
* Ability to have a Role that can only access one Database
* Allow user info to be dumped with a database, to make a db
completely self-consistent
These two could be achieved by having database-local roles I think.
* Allow databases to be transportable
That would be very useful, but extremely hard to do unless we switch to
per-database XID spaces. Or unless we're content with having databases
only be transportable after some special "MAKE TRANSPORTABLE" operation,
which would freeze all tuples and make the database read-only.
* Allow users to access tables in >1 database easily, with appropriate rights.
That one I'm very sceptical about. In the long run, I think we want better
separation of databases, not less, and this requirement carries a huge risk
of standing in the way of that. Also, I think that once we integrate the postgres
FDW into core (that's the plan in the long run, right?), we're going to get
a good approximation of that essentially for free.
best regards,
Florian Pflug
On 22 May 2012 12:35, Florian Pflug <fgp@phlo.org> wrote:
* Allow users to access tables in >1 database easily, with appropriate rights.
That one I'm very sceptical about. In the long run, I think we want better
separation of databases, not less, and this requirement carries a huge risk
of standing in the way of that. Also, I think that once we integrate the postgres
FDW into core (that's the plan in the long run, right?), we're going to get
a good approximation of that essentially for free.
It's a poor approximation of it, free or not.
If it actually worked well, I'd be happy. It doesn't. No proper
transaction support, no session pool, poor planning etc etc. At best
its a band-aid, not something sufficiently good to recommend for
general production use.
FDWs are very good but aren't good enough for intra-database usage.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, May 22, 2012 at 7:35 AM, Florian Pflug <fgp@phlo.org> wrote:
* Allow users to access tables in >1 database easily, with appropriate rights.
That one I'm very sceptical about. In the long run, I think we want better
separation of databases, not less, and this requirement carries a huge risk
of standing in the way of that. Also, I think that once we integrate the postgres
FDW into core (that's the plan in the long run, right?), we're going to get
a good approximation of that essentially for free.
+1.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 22/05/12 13:24, Simon Riggs wrote:
On 22 May 2012 12:05, José Luis Tallón<jltallon@nosys.es> wrote:
IMVHO: s/database/schema/g does resolve many of the problems that you were
referring to... and 'dblink' should solve the rest, right?
Please, feel free to point out what I am (most probably) not considering --
not experienced enough yet :)The choice of schema/database is an important one. If you get it
wrong, you are in major difficulty. In many cases schemas would be a
better choice, but not in all cases. So I'm interested in solving the
problems for people who have multiple databases on same server.
Ok. Understood.
Thank you for the clarification
dblink is the only solution, but its very poor way to do this when we
have 2 databases on same server.My thinking is that reaching out to multiple databases is actually
mostly easy, except in a few places where dbid is hardwired into the
backend.
The only drawback I see is that it might weaken the separation.
Even though arguably a kludge, dblink could have a "shortcut" added,
whereby connections to another database within the same cluster would be
serviced directly within the backend, as opossed to opening a new db
connection. This is effectively a fastpath within dblink, which
optimizes a relatively common case while at the same time not loosing
generality.
On the other hand, the separation of databases allows what otherwise would
only be possible by using multiple instances of the database server (à la
Oracle, AFAIK ) -- save for resource management, but that is another
question whatsoever.Separation of databases is fine. I have no intention to change that,
as long as the user wishes that.
Perfect.
Thanks,
Jose Luis Tallon
On 22/05/12 13:47, Simon Riggs wrote:
On 22 May 2012 12:35, Florian Pflug<fgp@phlo.org> wrote:
* Allow users to access tables in>1 database easily, with appropriate rights.
That one I'm very sceptical about. In the long run, I think we want better
separation of databases, not less, and this requirement carries a huge risk
of standing in the way of that. Also, I think that once we integrate the postgres
FDW into core (that's the plan in the long run, right?), we're going to get
a good approximation of that essentially for free.It's a poor approximation of it, free or not.
If it actually worked well, I'd be happy. It doesn't. No proper
transaction support, no session pool, poor planning etc etc. At best
its a band-aid, not something sufficiently good to recommend for
general production use.
- Transaction support: should be fixed, I guess.
- Session pool: is this really needed? I would it externally -- for
example, by specifying a connection string to a pgPool/pgBouncer as
opposed to directly to the db server.
- Planning: add a tunable specifying a higher cost (with a exception for
cluster-local dblinks, which would have a lower cost), and the rest
should be straightforward. Of course, planning would'nt be as accurate
---we can't access the other db statistics in order to potentially
rewrite conditions---, but I don't think that would be a proper approach
(separation concerns, again)
FDWs are very good but aren't good enough for intra-database usage.
The optimization I just proposed (plus the required fixes to FDW) might
very well solve this, while providing useful enhancements for all users,
whether they are accessing cluster-local databases or not (or even truly
foreign datasources)
Regards,
Jose Luis Tallon
On May22, 2012, at 13:47 , Simon Riggs wrote:
On 22 May 2012 12:35, Florian Pflug <fgp@phlo.org> wrote:
* Allow users to access tables in >1 database easily, with appropriate rights.
That one I'm very sceptical about. In the long run, I think we want better
separation of databases, not less, and this requirement carries a huge risk
of standing in the way of that. Also, I think that once we integrate the postgres
FDW into core (that's the plan in the long run, right?), we're going to get
a good approximation of that essentially for free.It's a poor approximation of it, free or not.
If it actually worked well, I'd be happy. It doesn't. No proper
transaction support, no session pool, poor planning etc etc. At best
its a band-aid, not something sufficiently good to recommend for
general production use.
These all sound fixable, though.
FDWs are very good but aren't good enough for intra-database usage.
OTOH, cross-database queries would by design be limited to databases
within one cluster, whereas a FDW-based solution would not. I don't really see
the different between telling people "put all your tables into one database
if you need to access them from within one session" and "put all your databases
on one server if you need to access them from within one session".
Plus, the more tightly different databases in the same cluster are coupled,
the more people will setup one cluster per database for performance reasons.
Then, when they discovered they need inter-database queries after all, we'll
again have to tell them "well, then don't set things up the way you have".
If we want to make it easier for people to migrate from multiple databases
to a single database with multiple schemas, maybe we should look into allowing
nested schemas? AFAIK, the main reason not to do that are ambiguities in the
meaning of identifiers, which cross-database queries would have to deal with
also.
best regards,
Florian Pflug
On 05/22/2012 07:56 AM, Robert Haas wrote:
On Tue, May 22, 2012 at 7:35 AM, Florian Pflug<fgp@phlo.org> wrote:
* Allow users to access tables in>1 database easily, with appropriate rights.
That one I'm very sceptical about. In the long run, I think we want better
separation of databases, not less, and this requirement carries a huge risk
of standing in the way of that. Also, I think that once we integrate the postgres
FDW into core (that's the plan in the long run, right?), we're going to get
a good approximation of that essentially for free.+1.
That seems to be leaving aside the fact that we don't currently have any
notion of how to allow FDWs to write the foreign tables.
What is more, isn't the postgres FDW about talking to any postgres
source? If so, does it have special smarts for when we are talking to
ourselves? And if it doesn't then it seems unlikely to be an acceptable
substitute for allowing talking direct to a sibling database.
I'm not at all yet sold on Simon's plan, but I'm skeptical that an FDW
would adequately meet the case if we wanted to go that way.
cheers
andrew
* Simon Riggs (simon@2ndQuadrant.com) wrote:
* Ability to have a Role that can only access one Database
Alright, I'd like to think about this one specifically and solicit
feedback on the idea that we keep the existing shared role tables but
add on additional tables for per-database roles.
In the past, I feel like we've been focused on the idea of moving all
roles to be per-database instead of per-cluster, which certainly has a
lot of problems associated with it, but in the end, I think people would
be really happy with some shared roles and some per-DB roles.
What would the semantics of that look like though? Which is "preferred"
when you do a 'grant select' or 'grant role'? Or do we just disallow
overlaps between per-DB roles and global roles? If we don't allow
duplicates, I suspect a lot of the other questions suddenly become a lot
easier to deal with, but would that be too much of a restriction? How
would you handle migrating an existing global role to a per-database
role?
Thanks,
Stephen
On Tue, May 22, 2012 at 8:40 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
That seems to be leaving aside the fact that we don't currently have any
notion of how to allow FDWs to write the foreign tables.What is more, isn't the postgres FDW about talking to any postgres source?
If so, does it have special smarts for when we are talking to ourselves? And
if it doesn't then it seems unlikely to be an acceptable substitute for
allowing talking direct to a sibling database.I'm not at all yet sold on Simon's plan, but I'm skeptical that an FDW would
adequately meet the case if we wanted to go that way.
Well, I don't think anyone is claiming that FDWs as they exist today
solve all of the problems in this area. But I think that improving
FDWs is a more promising line of attack than trying to make backends
talk to multiple databases. Doing the latter will require massive
surgery on the relcache, the catcache, most of the secondary catalog
caches, the ProcArray, and every portion of the backend that thinks an
OID uniquely identifies an SQL object. Basically, they'd all need
database OID as an additional key field, which is undesirable for
performance reasons even if there were no issue of code churn.
So I'd rather see us put the effort into pgsql_fdw, which, as Florian
says, will also let us talk to a completely separate server. If
you've got multiple databases in the same cluster and really need to
be doing queries across all of them, that's what schemas are
supposedly for. Now I know that people feel that doesn't work as well
as it needs to, but there again I think it would be easier to fix
schemas than to make cross-database queries work.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Simon Riggs wrote:
On 21 May 2012 20:40, Stephen Frost <sfrost@snowman.net> wrote:
This is important. I like the idea of breaking down the barriers
between databases to allow it to be an option for one backend to
access tables in multiple databases.
So collecting a few requirements from various places:
[...]
* Allow users to access tables in >1 database easily, with appropriate
rights.
The main objectives are to make a Database a more easily used
administrative grouping. At present, people who use multiple Databases
face many problems - they aren't as separate as you'd like, but
neither can they be ignored when required.The idea of "one main database per session" is fine, but wiring it so
closely into the backend has a few disadvantages, many of them weird
internal things.Are there arguments against those requirements before we spend time on
design/thinking?
From my perspective it is a great advantage that a user connected
to one database cannot access objects from a different database
without additional software, no matter what permissions he or she has
(short of superuser, who can do anything).
This enables us to have many different databases in one cluster
without having to worry that they can affect each other.
If you need different applications to routinely access each other's
tables, why not assign them to different schemas in one database?
For those cases where you absolutely need access to a different
database, you can use dblink or a foreign data wrapper (hopefully
in 9.3).
So -1 on that particular suggestion.
Yours,
Laurenz Albe
On 22 May 2012 14:04, Stephen Frost <sfrost@snowman.net> wrote:
What would the semantics of that look like though? Which is "preferred"
when you do a 'grant select' or 'grant role'? Or do we just disallow
overlaps between per-DB roles and global roles? If we don't allow
duplicates, I suspect a lot of the other questions suddenly become a lot
easier to deal with, but would that be too much of a restriction? How
would you handle migrating an existing global role to a per-database
role?
Perhaps:
CREATE [ GLOBAL | LOCAL ] ROLE name [ LIKE role_name ] [ [ WITH ]
option [ ... ] ]
Then:
CREATE LOCAL ROLE localrolename LIKE globalrolename;
REASSIGN OWNED BY globalrolename TO localrolename;
Conflicts would occur where localrolename matches an existing local
role name within the same database, or a global role name, but not a
local role name within another database. The problem with this,
however, is that creating global roles would need conflict checks
against local roles in every database, unless a manifest of all local
roles were registered globally.
--
Thom
Thom Brown <thom@linux.com> writes:
Conflicts would occur where localrolename matches an existing local
role name within the same database, or a global role name, but not a
local role name within another database. The problem with this,
however, is that creating global roles would need conflict checks
against local roles in every database, unless a manifest of all local
roles were registered globally.
Yeah. The same type of issue arises for the roles' OIDs. You'd really
want local and global roles to have nonconflicting OIDs, else it's
necessary to carry around an indication of which type each role is;
which would be more or less a show-stopper in terms of the number of
catalogs and internal APIs affected. But I don't currently see any
nice way to guarantee that if each database has a private table of
local roles.
You could possibly make it work if all roles, local and global, are
stored in a single shared catalog. But that seems pretty ugly.
BTW, I wonder whether this type of problem isn't also pretty fatal for
the sorts of hierarchical catalog structures we were speculating about
at PGCon. When we were talking about that I was sort of assuming that
the more-closely-nested levels could just hide conflicting objects at
outer levels, but on reflection that seems a bit scary.
regards, tom lane
On Tue, May 22, 2012 at 9:37 AM, Thom Brown <thom@linux.com> wrote:
On 22 May 2012 14:04, Stephen Frost <sfrost@snowman.net> wrote:
What would the semantics of that look like though? Which is "preferred"
when you do a 'grant select' or 'grant role'? Or do we just disallow
overlaps between per-DB roles and global roles? If we don't allow
duplicates, I suspect a lot of the other questions suddenly become a lot
easier to deal with, but would that be too much of a restriction? How
would you handle migrating an existing global role to a per-database
role?Perhaps:
CREATE [ GLOBAL | LOCAL ] ROLE name [ LIKE role_name ] [ [ WITH ]
option [ ... ] ]Then:
CREATE LOCAL ROLE localrolename LIKE globalrolename;
REASSIGN OWNED BY globalrolename TO localrolename;
Conflicts would occur where localrolename matches an existing local
role name within the same database, or a global role name, but not a
local role name within another database. The problem with this,
however, is that creating global roles would need conflict checks
against local roles in every database, unless a manifest of all local
roles were registered globally.
There are race conditions to worry about, too. In most cases, we rely
on the btree index machinery as a final backstop against duplicate
catalog entries. But that doesn't work unless everything's in one
catalog, nor for anything more complicated than "this set of columns
taken together should be unique over every role". Even if we were OK
with incurring the ugliness of storing per-database roles in a shared
catalog, the uniqueness constraint you'd want is something like "no
two roles can share the same name unless they have unequal database
OIDs neither of which is zero", which I don't believe we can enforce
via the btree machinery, at least not without an expression index that
won't work in a system catalog anyway.
In retrospect, I think the idea of shared catalogs was probably a bad
idea. I think we should have made roles and tablespaces database
objects rather than shared objects, and come up with some ad-hoc
method of representing the set of available databases. But that
decision seems to have been made sometime pre-1996, so the thought of
changing it now is pretty painful, but I can dream...
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
* Thom Brown (thom@linux.com) wrote:
Conflicts would occur where localrolename matches an existing local
role name within the same database, or a global role name, but not a
local role name within another database. The problem with this,
however, is that creating global roles would need conflict checks
against local roles in every database, unless a manifest of all local
roles were registered globally.
Hmm, right, that's a bit of a sticky point. wrt your suggestion- it
works great if we don't allow duplicates, but then people have to accept
their role name is getting changed. That said, perhaps that's not that
big of a deal, since you could rename it afterwards.
The issue with the constraints on other databases might actually be
cause enough to allow duplicates, just to avoid that issue.. We could
mirror the per-database roles into a shared space that isn't externally
visible, but at that point, maybe we should try to get RLS for the
catalogs instead, or just modify the views to only show roles which can
connect to this database. That's not going to make them completely
transparent, but it might be enough for some use cases.
Thanks,
Stephen
Am 22.05.2012 15:27, schrieb Albe Laurenz:
If you need different applications to routinely access each other's
tables, why not assign them to different schemas in one database?
The use case in my mind for accessing more databases is when you want to
access stuff different languages.
You only can set encoding / LC_Collate per database not per schema.
So for different languages you might need different databases to do
correct sorting / indexing.
Susanne
--
Dipl. Inf. Susanne Ebrecht - 2ndQuadrant
PostgreSQL Development, 24x7 Support, Training and Services
www.2ndQuadrant.com
Susanne Ebrecht <susanne@2ndquadrant.com> writes:
The use case in my mind for accessing more databases is when you want to
access stuff different languages.
You only can set encoding / LC_Collate per database not per schema.
So for different languages you might need different databases to do
correct sorting / indexing.
Encoding yes, but since 9.1 we have pretty fine-grained control of
collation. So I think this argument is a lot weaker than it used
to be. It would only really apply if you have one of the corner
cases where utf8 doesn't work for you.
regards, tom lane
On May22, 2012, at 16:09 , Tom Lane wrote:
Thom Brown <thom@linux.com> writes:
Conflicts would occur where localrolename matches an existing local
role name within the same database, or a global role name, but not a
local role name within another database. The problem with this,
however, is that creating global roles would need conflict checks
against local roles in every database, unless a manifest of all local
roles were registered globally.Yeah. The same type of issue arises for the roles' OIDs. You'd really
want local and global roles to have nonconflicting OIDs, else it's
necessary to carry around an indication of which type each role is;
which would be more or less a show-stopper in terms of the number of
catalogs and internal APIs affected. But I don't currently see any
nice way to guarantee that if each database has a private table of
local roles.
Maybe we could simply make all global role's OIDs even, and all local ones
odd, or something like that.
best regards,
Florian Pflug