SQL/MED compatible connection manager
Howdy,
Currently pl/proxy, dblink, DBI-link etc. each have their own remote connection
info management infrastructure (or none at all). It would certainly help if
they could use a common connection database -- with access control, pg_dump
support, etc. There have been hints that a SQL/MED compatible connection
manager would be desirable:
http://archives.postgresql.org/pgsql-hackers/2008-09/msg00314.php
http://archives.postgresql.org/pgsql-hackers/2008-09/msg00909.php
So the proposal is to implement a small subset of SQL/MED to cope with
connection info management -- connection manager. This will only manage the
connection metadata and provide the required system catalogs and commands for
maintaining them. The actual connection management (open/close etc.) is still
handled by the client modules.
I have put together a draft that describes a possible implementation:
http://wiki.postgresql.org/wiki/SqlMedConnectionManager
Tons of details have been omitted, but should be enough to start discussion.
What do you think, does this sound usable? Suggestions, objections?
thanks,
Martin
On Mon, Oct 27, 2008 at 10:06 AM, Martin Pihlak <martin.pihlak@gmail.com> wrote:
So the proposal is to implement a small subset of SQL/MED to cope with
connection info management -- connection manager. This will only manage the
connection metadata and provide the required system catalogs and commands for
maintaining them. The actual connection management (open/close etc.) is still
handled by the client modules.
Per SQL:2008, there are no expected changes to SQL/MED from SQL:2003.
As such, two weeks ago, I completed a full review of SQL/MED and am
planning to fully implement it for 8.5. Currently, I'm working on a
proof of concept and have created a SQL/MED access method (sqlmed) as
well as started implementing the FDW API and hooks into the optimizer
to support remote capabilities, costing, and predicate pushdown. The
first wrappers I intend to support are ODBC and
This is a large project, and I'm certainly open to assistance :)
--
Jonah H. Harris, Senior DBA
myYearbook.com
On Mon, Oct 27, 2008 at 10:35 AM, Jonah H. Harris
<jonah.harris@gmail.com> wrote:
The first wrappers I intend to support are ODBC and
Damn multiple windows :)
The first wrappers I intend to support are ODBC and CSV/fixed-width text.
--
Jonah H. Harris, Senior DBA
myYearbook.com
Per SQL:2008, there are no expected changes to SQL/MED from SQL:2003.
As such, two weeks ago, I completed a full review of SQL/MED and am
planning to fully implement it for 8.5. Currently, I'm working on a
proof of concept and have created a SQL/MED access method (sqlmed) as
well as started implementing the FDW API and hooks into the optimizer
to support remote capabilities, costing, and predicate pushdown. The
first wrappers I intend to support are ODBC and
Cool. Have you published some notes on it (wiki etc)?
This is a large project, and I'm certainly open to assistance :)
It certainly is an undertaking :) I'm mostly interested in the connection
management -- so hopefully I can help there.
regards,
Martin
On Mon, Oct 27, 2008 at 11:31 AM, Martin Pihlak <martin.pihlak@gmail.com> wrote:
Cool. Have you published some notes on it (wiki etc)?
Not yet. Discussed it a little on irc, but nothing substantial. I'll
look at updating the Wiki hopefully today.
It certainly is an undertaking :) I'm mostly interested in the connection
management -- so hopefully I can help there.
That would be awesome!
--
Jonah H. Harris, Senior DBA
myYearbook.com
martin.pihlak@gmail.com (Martin Pihlak) writes:
Tons of details have been omitted, but should be enough to start discussion.
What do you think, does this sound usable? Suggestions, objections?
Slony-I does some vaguely similar stuff in its handling of "connection paths"; here's the schema:
create table @NAMESPACE@.sl_path (
pa_server int4,
pa_client int4,
pa_conninfo text NOT NULL,
pa_connretry int4,
CONSTRAINT "sl_path-pkey"
PRIMARY KEY (pa_server, pa_client),
CONSTRAINT "pa_server-no_id-ref"
FOREIGN KEY (pa_server)
REFERENCES @NAMESPACE@.sl_node (no_id),
CONSTRAINT "pa_client-no_id-ref"
FOREIGN KEY (pa_client)
REFERENCES @NAMESPACE@.sl_node (no_id)
) WITHOUT OIDS;
comment on table @NAMESPACE@.sl_path is 'Holds connection information for the paths between nodes, and the synchronisation delay';
comment on column @NAMESPACE@.sl_path.pa_server is 'The Node ID # (from sl_node.no_id) of the data source';
comment on column @NAMESPACE@.sl_path.pa_client is 'The Node ID # (from sl_node.no_id) of the data target';
comment on column @NAMESPACE@.sl_path.pa_conninfo is 'The PostgreSQL connection string used to connect to the source node.';
comment on column @NAMESPACE@.sl_path.pa_connretry is 'The synchronisation delay, in seconds';
I wouldn't be surprised to find there being some value in using
something like SQL/MED.
One detail I'll point out, that I'm noticing from an application I'm
working on right now. We might want to have something like a "db
connection" data type; here's a prototype I put together:
slonyregress1=# create type dbconn as (port integer, dbname text, username text, password text, ssl boolean);
CREATE TYPE
slonyregress1=# create table dbconns (id serial primary key, db dbconn);
NOTICE: CREATE TABLE will create implicit sequence "dbconns_id_seq" for serial column "dbconns.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "dbconns_pkey" for table "dbconns"
CREATE TABLE
slonyregress1=# insert into dbconns (db) values ((5432, 'slonyregress1', 'slony', 'secret!', 'true'));
INSERT 0 1
slonyregress1=# select * from dbconns;
id | db
----+--------------------------------------
1 | (5432,slonyregress1,slony,secret!,t)
(1 row)
I'm not certain that this is forcibly the right representation, but I
think it is possible that we'd want a finer-grained representation
than merely a connection string.
--
(reverse (concatenate 'string "ofni.sesabatadxunil" "@" "enworbbc"))
http://linuxdatabases.info/info/finances.html
"DTDs are not common knowledge because programming students are not
taught markup. A markup language is not a programming language."
-- Peter Flynn <silmaril@m-net.arbornet.org>
Chris Browne wrote:
Slony-I does some vaguely similar stuff in its handling of "connection paths"; here's the schema:
create table @NAMESPACE@.sl_path (
pa_server int4,
pa_client int4,
pa_conninfo text NOT NULL,
pa_connretry int4,
[snip ...]
I wouldn't be surprised to find there being some value in using
something like SQL/MED.
Here the pa_conninfo could be replaced with the connection name (actually
SERVER). For the complete connection definition a USER MAPPING (eg. remote
username and password) is also needed. But that can be fetched by the
connection connection lookup function
One detail I'll point out, that I'm noticing from an application I'm
working on right now. We might want to have something like a "db
connection" data type; here's a prototype I put together:
slonyregress1=# create type dbconn as (port integer, dbname text, username text, password text, ssl boolean);
CREATE TYPE
[snip]
slonyregress1=# select * from dbconns;
id | db
----+--------------------------------------
1 | (5432,slonyregress1,slony,secret!,t)
(1 row)I'm not certain that this is forcibly the right representation, but I
think it is possible that we'd want a finer-grained representation
than merely a connection string.
Yes -- the server, user mapping and FDW all take generic options. Some of them
might be part of the connect string, others could specify some hints of how the
connection should be handled (driver options etc). DBD-Excel has a particularly
nasty example of those. A fixed type would not be able to cover all of them.
This is where the SQL/MED stuff can help - all of this complexity can be reduced
to a single name. Though it adds the additional step of doing the lookup.
The dbconns example could be written like this:
test=# create table dbconns (id serial primary key, db regserver);
...
test=# insert into dbconns (db) values ('test');
INSERT 0 1
test=# select * from dbconns;
id | db
----+-------------
1 | public.test
(1 row)
And then for the connection details:
test=# select * from pg_get_remote_connection_info('test');
option | value
----------+--------
host | /tmp
port | 6543
dbname | foo
username | bob
password | secret
(5 rows)
This assumes that there is a server "public.test" and a user mapping for
the session user. The option/value pairs are outputted by the "dummy" FDW
that just dumps the generic options for the server and user mapping. A
"smart" FDW could turn this into just a connection string. Still, there
probably should be a set of functions for accessing the raw options/value
pairs as well
regards,
Martin
On Mon, 2008-10-27 at 16:50 -0400, Chris Browne wrote:
martin.pihlak@gmail.com (Martin Pihlak) writes:
Tons of details have been omitted, but should be enough to start discussion.
What do you think, does this sound usable? Suggestions, objections?Slony-I does some vaguely similar stuff in its handling of "connection paths"; here's the schema:
I think the whole issue was initially raised by "insecurity", as dblink
conrib module exposed connection strings to all users, and SQL/MED was
seen as a "standard" way to hide it.
The simple credentials hiding could of course be achieved by having
something similar to pg_user/pg_shadow and some SECURITY DEFINER
functions for actually opening the connections, but probably it seemed
easier to at least base it on standards, so we can actually start with
pg_remote_server table public
pg_user_mapping_shadow table (restricted)/ pg_user_mapping view(public)
and some functions with proper grants to match the subset that Martin
outlined in http://wiki.postgresql.org/wiki/SqlMedConnectionManager
if we've got that working, then we could move to massaging it into the
parser to provide standard SQL/MED syntax.
so I think that first we should agree on functionality and get the few
system (?) tables and functions done, and worry about parser changes
once the actual functionality is field tested.
------------------------------------------
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training
I have put together a draft that describes a possible implementation:
http://wiki.postgresql.org/wiki/SqlMedConnectionManager
I'll update this with an experimental patch. This implements:
* System catalogs for FDW, SERVER and USER MAPPING
* regserver data type for servers (convert from text to oid etc).
* FDW placeholders as shared libraries -- currently provides dummy
and pgsql wrappers.
* Connection lookup via FDW.
* SQL interface to the lookup function.
There is an example of how all this works at:
http://wiki.postgresql.org/wiki/SqlMedConnectionManager#Current_implementation
I was hoping to get the internal features done by the start of November
Commitfest. But right now this seems unlikely. Nevertheless, I'm all ears
for suggestions and criticism.
PS. Jonah, does this somehow coincide with your approach to FDW-s?
regards,
Martin
Attachments:
Here is an updated patch. I'm also submitting this to November Commitfest -
I believe it is possible to get this into shape for 8.4. This is still a WIP
but I really need a review before moving on with the syntax, pg_dump support
etc.
Currently the system catalogs and user accessible connection lookup function
have been implemented. Ships with 2 FOREIGN DATA WRAPPERS -- dummy and pgsql.
It is possible to define connections by inserting directly into the catalogs:
insert into pg_catalog.pg_foreign_server
select 'foo', 2200, 10, oid, null,
'{host=/tmp,port=6543,dbname=foo}'::text[]
from pg_foreign_data_wrapper
where fdwname='default';
insert into pg_catalog.pg_foreign_user_mapping
select 10, oid, '{user=bob,password=secret}'::text[]
from pg_foreign_server
where srvname='foo' ;
select * from pg_get_remote_connection_info('foo');
option | value
----------+--------
host | /tmp
port | 6543
dbname | foo
user | bob
password | secret
(5 rows)
regards,
Martin
Attachments:
On Fri, 2008-10-31 at 17:49 +0200, Martin Pihlak wrote:
Here is an updated patch. I'm also submitting this to November Commitfest -
I believe it is possible to get this into shape for 8.4. This is still a WIP
but I really need a review before moving on with the syntax, pg_dump support
etc.
Any hope of getting pl/proxy using this submitted for 8.4 ?
------------------------------------------
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training
Attached is next revision of the connection manager, this is now nearly
feature complete -- the syntax is there, privileges are implemented.
Should apply cleanly to HEAD and pass regression. There are some usage
examples at: http://wiki.postgresql.org/wiki/SqlMedConnectionManager#Examples
Some issues that come to mind:
- Object naming: probably should drop "foreign_" prefix from user mapping
and server. This would leave us with pg_user_mapping and pg_server.
Maybe add a _shadow suffix to the hidden version of pg_user_mapping.
- MAPPING became a reserved keyword, this is not good (unreserved causes
shift/reduce conflicts).
- There is some more grammar hacking to be done - ALTER handling is not
complete (allows nop statements, impossible to reset options, cannot use
default as FDW name, ...).
- System catalog and connection lookup function privileges are revoked from
public in system_views.sql, is there an alternative?
- Worry about FDW library function pointers getting stale (library reloads)
- Do we need to support copyObject/equal for the fdw/server/user mapping
parse nodes?
Things to do:
- internal cleanup - add verbose commentary where needed, cleanup error
reporting.
- documentation
- more thorough regression tests
- psql support (tab completion, help, \dX commands)
- pg_dump support
- ecpg support?
regards,
Martin
Attachments:
Martin Pihlak <martin.pihlak@gmail.com> writes:
- Do we need to support copyObject/equal for the fdw/server/user mapping
parse nodes?
Yes.
regards, tom lane
Here's another revision of the connection manager. This adds:
* reference documentation
* psql, tab-completion, \dw, \dr, \dm commands.
* pg_dump support
Still todo:
* more comprehensive regression tests
* introductory documentation
* dblink support
I hope to finish these items during next week, and remove the WIP
status then.
regards,
Martin
Attachments:
connection-manager-1121.patch.gzapplication/x-gzip; name=connection-manager-1121.patch.gzDownload+0-1
Martin Pihlak wrote:
Here's another revision of the connection manager. This adds:
* reference documentation
* psql, tab-completion, \dw, \dr, \dm commands.
* pg_dump supportStill todo:
* more comprehensive regression tests
* introductory documentation
* dblink supportI hope to finish these items during next week, and remove the WIP
status then.
Looks very good, please continue.
Most of this is straight out of the standard, so there isn't much to
discuss on the interfaces. I have two small things to wonder about:
On your wiki page, you have this example:
CREATE SERVER userdb TYPE 'plproxy_cluster'
FOREIGN DATA WRAPPER plproxy
OPTIONS (
server='dbname=userdb_p0 host=127.0.0.1 port=6000',
server='dbname=userdb_p1 host=127.0.0.1 port=6000',
server='dbname=userdb_p2 host=127.0.0.1 port=6000',
server='dbname=userdb_p3 host=127.0.0.1 port=6000',
connection_lifetime=3600
);
If I read this right, SQL/MED requires option names to be unique for a
server. To this needs to be rethought.
Do we really need the function pg_get_remote_connection_info()? This
could be done directly with the information schema. E.g., your example
SELECT * FROM pg_get_remote_connection_info('userdb');
appears to be the same as
SELECT option_name, option_value
FROM information_schema.foreign_server_options
WHERE foreign_server_name = 'userdb';
This view also appears to have the necessary access control provisions.
And similarly, pg_get_user_mapping_options() is equivalent to
information_schema.user_mapping_options.
Peter Eisentraut wrote:
Looks very good, please continue.
Thanks, will do :)
On your wiki page, you have this example:
CREATE SERVER userdb TYPE 'plproxy_cluster'
FOREIGN DATA WRAPPER plproxy
OPTIONS (
server='dbname=userdb_p0 host=127.0.0.1 port=6000',
server='dbname=userdb_p1 host=127.0.0.1 port=6000',
[snip]
If I read this right, SQL/MED requires option names to be unique for a
server. To this needs to be rethought.
Indeed, seems that I somehow managed to miss that. Additionally, according
to the standard the options should be specified as <name> <value>, instead
of <name> = <value>. Plus the possibility to alter individual options.
I'll look into that.
Updated the wiki to use unique option names.
Do we really need the function pg_get_remote_connection_info()? This
could be done directly with the information schema. E.g., your exampleSELECT * FROM pg_get_remote_connection_info('userdb');
The purpouse of the connection lookup function is to compose the the actual
connection string from generic options (adds user mapping if needed). This
aims to make it easier for the clients to perform connection lookups. The
idea is that the connection string should be composed by the foreign data
wrapper, instead of letting each client have it's own interpretation of the
options. Though, it is still possible to query the options directly.
And similarly, pg_get_user_mapping_options() is equivalent to
information_schema.user_mapping_options.
Hmm, the options are stored as text[], these need to be transformed to be
usable in the views. Seems that additional functions for foreign data wrapper
and server options are also needed. Will add those, along with the
information_schema views.
regards,
Martin
Peter Eisentraut wrote:
If I read this right, SQL/MED requires option names to be unique for a
server. To this needs to be rethought.
Attached is another revision of the connection manager, notable changes:
* Generic options are now standard compliant -- no duplicate option names,
possibility to alter individual options.
* information_schema views, added to the end of the file, using chapter
numbers from part 5 of the standard. Qualified names are used in fdw
and server names unless directly visible.
* Added documentation for the connection lookup functions in "System
Administration Functions". Also documented the new system catalogs.
* Example dblink implementation. Added functions dblink_connect_s,
dblink_exec_s, dblink_s that operate on predefined foreign servers.
No documentation or regression tests at the moment. Some examples at:
http://wiki.postgresql.org/wiki/SqlMedConnectionManager#dblink
I'll also change the commitfest status to "Pending review" -- the features
are complete and I'm not actively working on the code.
regards,
Martin
Martin Pihlak wrote:
Peter Eisentraut wrote:
If I read this right, SQL/MED requires option names to be unique for a
server. To this needs to be rethought.Attached is another revision of the connection manager, notable changes:
Attached is my current patch after surgery. I have mainly worked on
making naming better and more consistent.
Problem: You have implemented foreign-data wrappers and foreign servers
as schema-qualified objects, but the standard has them outside schemas,
qualified only optionally by catalogs (a.k.a. databases). I think that
should be fixed.
Attachments:
connection-manager-1209-petere.patch.gzapplication/x-gzip; name=connection-manager-1209-petere.patch.gzDownload+1-2
Peter Eisentraut wrote:
Attached is my current patch after surgery. I have mainly worked on
making naming better and more consistent.
Thanks.
Problem: You have implemented foreign-data wrappers and foreign servers
as schema-qualified objects, but the standard has them outside schemas,
qualified only optionally by catalogs (a.k.a. databases). I think that
should be fixed.
Darn. At least it is a lot easier to root out the schema support
than to add it ... Will look into it.
regards,
Martin
Peter Eisentraut wrote:
Problem: You have implemented foreign-data wrappers and foreign servers
as schema-qualified objects, but the standard has them outside schemas,
qualified only optionally by catalogs (a.k.a. databases). I think that
should be fixed.
Attached. Removed schema support for foreign data wrapper and server,
updated documentation and regression tests.
Also added has_foreign_data_wrapper_privilege and has_server_privilege
functions. information_schema views use those to determine if user
has usage on the foreign data wrapper or server.
regards,
Martin