Notes on implementing URI syntax for libpq
Hello,
It was proposed a while ago for libpq to support URI syntax for specifying the connection information:
http://archives.postgresql.org/message-id/1302114698.23164.17.camel@jd-desktop
http://archives.postgresql.org/pgsql-hackers/2011-07/msg01144.php
It appears to me that the consensus was that:
1) this feature is indeed going to be useful,
and
2) that we would go by implementing a simple URI parser ourselves instead of adding dependency on any fancy external library.
Now we're going to actually implement this.
It is known that libpq (and, thus every utility using it to connect a database: psql, pg_dump, etc.) supports a way to specify some of the connection parameters (or all of them) via a single conninfo string, e.g:
psql -d "dbname=mydb host=example.net port=5433"
This, in my opinion, is very similar to what we would like to achieve with the URI syntax, so the above could also be specified using a URI parameter like this:
psql -d postgresql://example.net:5433/mydb
We can also support specifying extra parameters via the usual "?keyword=value&keyword2=other" syntax. As it was noted in the original discussion, sticking to what JDBC provides makes the most sense:
http://jdbc.postgresql.org/documentation/head/connect.html
So we should support 'user', 'password' and 'ssl' parameters (and probably just ignore the rest, at least for start.)
Upon libpq code inspection I come to think that the best place to plug this seems to be conninfo_array_parse function (where dbname keyword is currently being checked for '=' symbol and expanded):
We could similarly check for "postgresql:" designator and if present, extract the connection options from the dbname keyword thought to be a connection URI. The check should obviously go before the current check for '=', if we're going support the extra parameters, as outlined above.
I am going to sketch a work-in-progress patch in the background of a discussion here.
Your thoughts on this are very welcome!
--
Alex
It was proposed a while ago for libpq to support URI syntax for specifying the connection information:
...
Now we're going to actually implement this.
Do you know that we had this feature (more or less) in libpq for years but it
was removed quite a while ago. It should still be there in the archive, not
sure though if the old code fits the requirements for this feature completely.
Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
* Alexander Shulgin:
This, in my opinion, is very similar to what we would like to achieve with the URI syntax, so the above could also be specified using a URI parameter like this:
psql -d postgresql://example.net:5433/mydb
How would you specifiy a local port/UNIX domain socket?
Would it be possible to add something like
psql -d postgresql+ssh://fweimer@db5/var/run/postgresql/.s.PGSQL.5432
similar to what Subversion supports? (This might have security
implications when used from untrusted PHP scripts.)
--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
Excerpts from Florian Weimer's message of Wed Nov 23 13:04:47 +0200 2011:
* Alexander Shulgin:
This, in my opinion, is very similar to what we would like to achieve with the URI syntax, so the above could also be specified using a URI parameter like this:
psql -d postgresql://example.net:5433/mydb
How would you specifiy a local port/UNIX domain socket?
Would it be possible to add something like
psql -d postgresql+ssh://fweimer@db5/var/run/postgresql/.s.PGSQL.5432
similar to what Subversion supports? (This might have security
implications when used from untrusted PHP scripts.)
While it is really tempting to provide support for all that fancy stuff (or at least support "user:password@host" part instead of the ugly "?user=&password=") this will make psql URIs backward-incompatible with the JDBC syntax, which is exactly what we want to avoid.
The primary reason people even considering adding the syntax, IMO is compatibility and thus, it has to be compatible in both directions. If we support something that's more than JDBC provides, we're just adding to the soup of incompatible URI syntaxes out there.
--
Alex
Excerpts from Florian Weimer's message of Wed Nov 23 13:04:47 +0200 2011:
* Alexander Shulgin:
This, in my opinion, is very similar to what we would like to achieve with the URI syntax, so the above could also be specified using a URI parameter like this:
psql -d postgresql://example.net:5433/mydb
How would you specifiy a local port/UNIX domain socket?
Missed that in my previous reply.
If host part of the URI points to localhost, the UNIX domain socket would be considered by libpq just as if you would pass "-h localhost -p 5433".
Hey Alexander,
2011/11/24 Alexander Shulgin <ash@commandprompt.com>
Excerpts from Florian Weimer's message of Wed Nov 23 13:04:47 +0200 2011:
* Alexander Shulgin:
This, in my opinion, is very similar to what we would like to achieve
with the URI syntax, so the above could also be specified using a URI
parameter like this:psql -d postgresql://example.net:5433/mydb
How would you specifiy a local port/UNIX domain socket?
Missed that in my previous reply.
If host part of the URI points to localhost, the UNIX domain socket would
be considered by libpq just as if you would pass "-h localhost -p 5433".
But what if the user wants to connect exactly via socket or
TCP/IP ?
And what if the user needs to specify a socket file name extension?
--
// Dmitriy.
Excerpts from Dmitriy Igrishin's message of Thu Nov 24 09:19:02 +0200 2011:
If host part of the URI points to localhost, the UNIX domain socket would
be considered by libpq just as if you would pass "-h localhost -p 5433".But what if the user wants to connect exactly via socket or
TCP/IP ?
And what if the user needs to specify a socket file name extension?
How do you achieve that with the current psql set of command line options (and, possibly environment variables?)
I would think the same method will work with URI, as with the proposed approach the URI is just decomposed into host, port and dbname parts and the rest of the code works like if you've had specified "-h example.net -p 5433 -d mydb" instead of the URI parameter.
--
Alex
On Thu, Nov 24, 2011 at 08:59:56AM +0200, Alexander Shulgin wrote:
How would you specifiy a local port/UNIX domain socket?
Missed that in my previous reply.
If host part of the URI points to localhost, the UNIX domain socket would be considered by libpq just as if you would pass "-h localhost -p 5433".
Uh, no it doesn't. "-h localhost" uses TCP/IP (try it). This is one
piece of mysql magic we don't copy. If you want to use the socket you
need to specify "-h /tmp" or wherever you keep it. Leaving out the -h
parameter also uses UNIX domain sockets.
Which does raise the valid question of how to represent that in URI
syntax. SQLAlchemy (for example) doesn't try with it's URL syntax, to
connect to a non-default UNIX socket, you need to create the URL object
directly.
How about the "service" option, that's a nice way of handling
non-default socket options.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
He who writes carelessly confesses thereby at the very outset that he does
not attach much importance to his own thoughts.
-- Arthur Schopenhauer
2011/11/24 Alexander Shulgin <ash@commandprompt.com>
Excerpts from Dmitriy Igrishin's message of Thu Nov 24 09:19:02 +0200 2011:
If host part of the URI points to localhost, the UNIX domain socket
would
be considered by libpq just as if you would pass "-h localhost -p
5433".
But what if the user wants to connect exactly via socket or
TCP/IP ?
And what if the user needs to specify a socket file name extension?How do you achieve that with the current psql set of command line options
(and, possibly environment variables?)
For psql(1) see -h option and -p option
http://www.postgresql.org/docs/9.1/static/app-psql.html
For the libpq see host option and port option of PQconnectdbparams()
http://www.postgresql.org/docs/9.1/static/libpq-connect.html
In both cases:
If the value of host begins with a slash, it is used as the directory for
the Unix-domain socket.
Port specifies the TCP port or the local Unix-domain socket file extension.
I would think the same method will work with URI, as with the proposed
approach the URI is just decomposed into host, port and dbname parts and
the rest of the code works like if you've had specified "-h example.net-p 5433 -d mydb" instead of the URI parameter.
Thats great, but see above.
--
// Dmitriy.
On Nov 24, 2011, at 9:40 AM, Martijn van Oosterhout wrote:
On Thu, Nov 24, 2011 at 08:59:56AM +0200, Alexander Shulgin wrote:
How would you specifiy a local port/UNIX domain socket?
Missed that in my previous reply.
If host part of the URI points to localhost, the UNIX domain socket would be considered by libpq just as if you would pass "-h localhost -p 5433".
Uh, no it doesn't. "-h localhost" uses TCP/IP (try it). This is one
piece of mysql magic we don't copy. If you want to use the socket you
need to specify "-h /tmp" or wherever you keep it. Leaving out the -h
parameter also uses UNIX domain sockets.Which does raise the valid question of how to represent that in URI
syntax. SQLAlchemy (for example) doesn't try with it's URL syntax, to
connect to a non-default UNIX socket, you need to create the URL object
directly.How about the "service" option, that's a nice way of handling
non-default socket options.
Another idea is to use local:/dir/name for UNIX domain socket instead of hostname:port, like it's displayed in the psql prompt.
--
Alexey Klyukin http://www.commandprompt.com
The PostgreSQL Company – Command Prompt, Inc.
Excerpts from Martijn van Oosterhout's message of Thu Nov 24 09:40:42 +0200 2011:
On Thu, Nov 24, 2011 at 08:59:56AM +0200, Alexander Shulgin wrote:
How would you specifiy a local port/UNIX domain socket?
Missed that in my previous reply.
If host part of the URI points to localhost, the UNIX domain socket would be considered by libpq just as if you would pass "-h localhost -p 5433".
Uh, no it doesn't. "-h localhost" uses TCP/IP (try it). This is one
piece of mysql magic we don't copy. If you want to use the socket you
need to specify "-h /tmp" or wherever you keep it. Leaving out the -h
parameter also uses UNIX domain sockets.
Oh, you're right -- I was under wrong impression (hacking in the wrong local install, you know.)
Which does raise the valid question of how to represent that in URI
syntax. SQLAlchemy (for example) doesn't try with it's URL syntax, to
connect to a non-default UNIX socket, you need to create the URL object
directly.
Well, whatever syntax we're going to invent here: it is not supported by the JDBC driver.
"Because Java does not support using unix sockets the PostgreSQL™ server must be configured to allow TCP/IP connections."
http://jdbc.postgresql.org/documentation/head/prepare.html
Or, this has to be done not in the URI syntax itself, but with the use of some external option.
Or maybe we can just add &unixsocket=... and hope that JDBC simply ignores that? I think I will try the last option to see if that's the case. (Looking at libpq code, I think we will also need to verify that host/hostaddr parameter is pointing to the local host and reset it to NULL, to actually make libpq consider UNIX sockets.)
How about the "service" option, that's a nice way of handling
non-default socket options.
The service handling isn't going to be affected with the proposed approach. So, if PGSERVICE is given, the options from the service file are applied after the URI is parsed, filling any parameters not set using previous methods.
--
Alex
On Nov 24, 2011, at 1:57 AM, Alexander Shulgin <ash@commandprompt.com> wrote:
While it is really tempting to provide support for all that fancy stuff (or at least support "user:password@host" part instead of the ugly "?user=&password=") this will make psql URIs backward-incompatible with the JDBC syntax, which is exactly what we want to avoid.
I think it would be really weird not to support user:pw@host:port. You can presumably also support the JDBC style for backward compatibility, but I don't think we should adopt that syntax as project standard.
...Robert
Excerpts from Robert Haas's message of Thu Nov 24 13:57:17 +0200 2011:
I think it would be really weird not to support user:pw@host:port. You can presumably also support the JDBC style for backward compatibility, but I don't think we should adopt that syntax as project standard.
Well, I don't believe JDBC syntax is ideal either, but I don't recall any better option proposed in the original discussion: http://archives.postgresql.org/pgsql-hackers/2011-03/msg01945.php
Do you suggest that we should reconsider?
--
Alex
Excerpts from Alexander Shulgin's message of jue nov 24 05:58:57 -0300 2011:
Excerpts from Martijn van Oosterhout's message of Thu Nov 24 09:40:42 +0200 2011:
Which does raise the valid question of how to represent that in URI
syntax. SQLAlchemy (for example) doesn't try with it's URL syntax, to
connect to a non-default UNIX socket, you need to create the URL object
directly.Well, whatever syntax we're going to invent here: it is not supported by the JDBC driver.
"Because Java does not support using unix sockets the PostgreSQL™ server must be configured to allow TCP/IP connections."
http://jdbc.postgresql.org/documentation/head/prepare.html
Or, this has to be done not in the URI syntax itself, but with the use of some external option.
Or maybe we can just add &unixsocket=... and hope that JDBC simply ignores that?
I think this is misguided. We don't need to have a URL that specifies a
Unix socket to work on JDBC, because it's obviously not going to work;
if you just have it "ignore" the &unixsocket bit, then the URI is no
longer the same and you could have it connecting to a completely
different server.
I think we should just propose something that will not work in JDBC.
Surely if the user wants an URL that works both in JDBC and libpq, they
should just not use a Unix-domain-socket specifying URI in the first
place.
What about something like
postgresql://<path-to-dir>:port/database
where the < > are present, i.e. if you want to specify a different
socket directory,
postgresql://</var/run/postgresql>:5433/database
and if you just want to use the default location,
postgresql://<>:5433/database
A coworker also suggested using a different designator:
postgresqli:///path/to/socket:5433/database
postgresqli://:5433/database
How about the "service" option, that's a nice way of handling
non-default socket options.The service handling isn't going to be affected with the proposed approach. So, if PGSERVICE is given, the options from the service file are applied after the URI is parsed, filling any parameters not set using previous methods.
I think the question is allowing the URI to specify a service.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Thu, Nov 24, 2011 at 7:33 AM, Alexander Shulgin
<ash@commandprompt.com> wrote:
Excerpts from Robert Haas's message of Thu Nov 24 13:57:17 +0200 2011:
I think it would be really weird not to support user:pw@host:port. You can presumably also support the JDBC style for backward compatibility, but I don't think we should adopt that syntax as project standard.
Well, I don't believe JDBC syntax is ideal either, but I don't recall any better option proposed in the original discussion: http://archives.postgresql.org/pgsql-hackers/2011-03/msg01945.php
Do you suggest that we should reconsider?
I guess my feeling is that if we're going to have URLs, we ought to
try to adhere to the same conventions that are used for pretty much
every other service that supports URLs. user:pw@host:port is widely
supported by multiple protocols, so I think we would need a very good
reason to decide to go off in a completely different direction. It
would be nice to be compatible with whatever JDBC does (link?) but I'm
not prepared to put that ahead of general good design.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Excerpts from Alvaro Herrera's message of Thu Nov 24 15:21:49 +0200 2011:
I think the question is allowing the URI to specify a service.
Huh? The service definitions are read from a local pg_service.conf, and are specified by setting PGSERVICE (and PGSERVICEFILE) environment variables, no?
What would you do with such URI if you need to other people to connect to the same service? Send them URI along with the pg_service.conf?
Or are we talking about different things completely?
Excerpts from Robert Haas's message of Thu Nov 24 15:35:36 +0200 2011:
Do you suggest that we should reconsider?
I guess my feeling is that if we're going to have URLs, we ought to
try to adhere to the same conventions that are used for pretty much
every other service that supports URLs. user:pw@host:port is widely
supported by multiple protocols, so I think we would need a very good
reason to decide to go off in a completely different direction. It
would be nice to be compatible with whatever JDBC does (link?) but I'm
not prepared to put that ahead of general good design.
What JDBC supports is rather weird and far from being ideal: http://jdbc.postgresql.org/documentation/head/connect.html
The problem with supporting multiple syntaxes, IMO is that it makes libpq compatible in only one direction: from particular foreign syntax to libpq, but not from libqp to any other particular foreign syntax. So when you see psql -d <URL> you wouldn't know if you can copy that URL to JDBC or any other connection interface parameter, unless you check the docs thoroughly.
--
Alex
Excerpts from Robert Haas's message of jue nov 24 10:35:36 -0300 2011:
On Thu, Nov 24, 2011 at 7:33 AM, Alexander Shulgin
<ash@commandprompt.com> wrote:Excerpts from Robert Haas's message of Thu Nov 24 13:57:17 +0200 2011:
I think it would be really weird not to support user:pw@host:port. You can presumably also support the JDBC style for backward compatibility, but I don't think we should adopt that syntax as project standard.
Well, I don't believe JDBC syntax is ideal either, but I don't recall any better option proposed in the original discussion: http://archives.postgresql.org/pgsql-hackers/2011-03/msg01945.php
Do you suggest that we should reconsider?
I guess my feeling is that if we're going to have URLs, we ought to
try to adhere to the same conventions that are used for pretty much
every other service that supports URLs. user:pw@host:port is widely
supported by multiple protocols, so I think we would need a very good
reason to decide to go off in a completely different direction. It
would be nice to be compatible with whatever JDBC does (link?) but I'm
not prepared to put that ahead of general good design.
Apparently there's no standard:
http://www.petefreitag.com/articles/jdbc_urls/
Supporting the usual user:pw@host convention, _in addition to_ what our
own JDBC driver already supports, seems reasonable to me.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Excerpts from Martijn van Oosterhout's message of jue nov 24 04:40:42 -0300 2011:
How about the "service" option, that's a nice way of handling
non-default socket options.
What about it? Are you suggesting we should support some way to specify
a service name in the URI?
If so, consider this: if you set up a pg_service.conf file, and then
pass around a URI that specifies a service, no one else can use the URI
until you also pass around the service file.
So, in that light, do we still think that letting the user specify a
service name in the URI makes sense? (My personal opinion is yes).
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Thu, Nov 24, 2011 at 8:50 AM, Alexander Shulgin
<ash@commandprompt.com> wrote:
What JDBC supports is rather weird and far from being ideal: http://jdbc.postgresql.org/documentation/head/connect.html
The problem with supporting multiple syntaxes, IMO is that it makes libpq compatible in only one direction: from particular foreign syntax to libpq, but not from libqp to any other particular foreign syntax. So when you see psql -d <URL> you wouldn't know if you can copy that URL to JDBC or any other connection interface parameter, unless you check the docs thoroughly.
Well, based on that document, I think that trying to be bug-compatible
with the JDBC syntax is a, erm, doomed effort. I mean, what are you
going to do with things like loglevel or logUnclosedConnections that
change the behavior of JDBC, not PostgreSQL?
I think we could do something like:
postgresql://user:pw@host:port/database?param1=val1¶m2=val2¶m3=val3&...
...where the param and val bits are standard libpq connection
parameters. And for compatibility you could allow "user" and
"password" to be specified as connection parameters rather than
included in the host portion of the string. But you're still not
going to be 100% compatible with JDBC, because we're not going support
unknownLenghth=42 in libpq just because JDBC has chosen to implement
some weirdness in that area.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company