strange error reporting
I just made the mistake of trying to run pgbench without first running
createdb and got this:
pgbench: error: connection to database "" failed: could not connect to
socket "/tmp/.s.PGSQL.5432": FATAL: database "rhaas" does not exist
This looks pretty bogus because (1) I was not attempting to connect to
a database whose name is the empty string and (2) saying that it
couldn't connect to the socket is wrong, else it would not also be
showing a server message.
I haven't investigated why this is happening; apologies if this is a
known issue.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
I just made the mistake of trying to run pgbench without first running
createdb and got this:
pgbench: error: connection to database "" failed: could not connect to
socket "/tmp/.s.PGSQL.5432": FATAL: database "rhaas" does not exist
This looks pretty bogus because (1) I was not attempting to connect to
a database whose name is the empty string and (2) saying that it
couldn't connect to the socket is wrong, else it would not also be
showing a server message.
I'm not sure about the empty DB name in the first part (presumably
that's from pgbench, so what was your pgbench command exactly?).
But the 'could not connect to socket' part is a consequence of my
recent fiddling with libpq's connection failure reporting, see
52a10224e. We could discuss exactly how that ought to be spelled,
but the idea is to consistently identify the host that we were trying
to connect to. If you have a multi-host connection string, it's
conceivable that "rhaas" exists on some of those hosts and not others,
so I do not think the info is irrelevant.
Just looking at this, I wonder if we ought to drop pgbench's
contribution to the message entirely; it seems like libpq's
message is now fairly freestanding.
regards, tom lane
On Wed, Jan 20, 2021 at 12:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I just made the mistake of trying to run pgbench without first running
createdb and got this:
pgbench: error: connection to database "" failed: could not connect to
socket "/tmp/.s.PGSQL.5432": FATAL: database "rhaas" does not exist
This looks pretty bogus because (1) I was not attempting to connect to
a database whose name is the empty string and (2) saying that it
couldn't connect to the socket is wrong, else it would not also be
showing a server message.
I'm not sure about the empty DB name in the first part (presumably
that's from pgbench, so what was your pgbench command exactly?).
I think it was just 'pgbench -i 40'. For sure, I didn't specify a database name.
But the 'could not connect to socket' part is a consequence of my
recent fiddling with libpq's connection failure reporting, see
52a10224e. We could discuss exactly how that ought to be spelled,
but the idea is to consistently identify the host that we were trying
to connect to. If you have a multi-host connection string, it's
conceivable that "rhaas" exists on some of those hosts and not others,
so I do not think the info is irrelevant.
I'm not saying that which socket I used is totally irrelevant although
in most cases it's going to be a lot of detail. I'm just saying that,
at least for me, when you say you can't connect to a socket, I at
least think about the return value of connect(2), which was clearly 0
here.
Just looking at this, I wonder if we ought to drop pgbench's
contribution to the message entirely; it seems like libpq's
message is now fairly freestanding.
Maybe it would be better if it said:
connection to database at socket "/tmp/.s.PGSQL.5432" failed: FATAL:
database "rhaas" does not exist
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Wed, Jan 20, 2021 at 12:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
But the 'could not connect to socket' part is a consequence of my
recent fiddling with libpq's connection failure reporting, see
52a10224e. We could discuss exactly how that ought to be spelled,
but the idea is to consistently identify the host that we were trying
to connect to. If you have a multi-host connection string, it's
conceivable that "rhaas" exists on some of those hosts and not others,
so I do not think the info is irrelevant.
I'm not saying that which socket I used is totally irrelevant although
in most cases it's going to be a lot of detail. I'm just saying that,
at least for me, when you say you can't connect to a socket, I at
least think about the return value of connect(2), which was clearly 0
here.
Fair. One possibility, which'd take a few more cycles in libpq but
likely not anything significant, is to replace "could not connect to ..."
with "while connecting to ..." once we're past the connect() per se.
Maybe it would be better if it said:
connection to database at socket "/tmp/.s.PGSQL.5432" failed: FATAL:
database "rhaas" does not exist
I'd be inclined to spell it "connection to server at ... failed",
but that sort of wording is surely also possible.
regards, tom lane
On Wed, Jan 20, 2021 at 12:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Fair. One possibility, which'd take a few more cycles in libpq but
likely not anything significant, is to replace "could not connect to ..."
with "while connecting to ..." once we're past the connect() per se.
Yeah. I think this is kind of a client-side version of errcontext(),
except we don't really have that context formally, so we're trying to
figure out how to fake it in specific cases.
Maybe it would be better if it said:
connection to database at socket "/tmp/.s.PGSQL.5432" failed: FATAL:
database "rhaas" does not exist
I'd be inclined to spell it "connection to server at ... failed",
but that sort of wording is surely also possible.
"connection to server" rather than "connection to database" works for
me; in fact, I think I like it slightly better.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2021-Jan-20, Robert Haas wrote:
On Wed, Jan 20, 2021 at 12:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I just made the mistake of trying to run pgbench without first running
createdb and got this:
pgbench: error: connection to database "" failed: could not connect to
socket "/tmp/.s.PGSQL.5432": FATAL: database "rhaas" does not exist
This looks pretty bogus because (1) I was not attempting to connect to
a database whose name is the empty string [...]
I'm not sure about the empty DB name in the first part (presumably
that's from pgbench, so what was your pgbench command exactly?).
I think it was just 'pgbench -i 40'. For sure, I didn't specify a database name.
That's because pgbench reports the input argument dbname, but since you
didn't specify anything, then PQconnectdbParams() uses the libpq
behavior. I think we'd have to use PQdb() instead.
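To make the mismatch concrete, here is a minimal sketch that deliberately avoids linking against libpq so it stands alone. Both helper names (resolve_dbname, format_connect_error) are hypothetical and are not pgbench or libpq functions. resolve_dbname only imitates the libpq fallback of using the user name when no database name is given; it ignores PGDATABASE and service files, and on a real connection the resolved name is what PQdb() reports.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for libpq's defaulting: if the caller passed no
 * database name, fall back to the user name.  This is an assumption-laden
 * simplification; on a real connection PQdb(conn) gives the resolved name.
 */
static const char *
resolve_dbname(const char *requested, const char *user)
{
    if (requested == NULL || requested[0] == '\0')
        return user;
    return requested;
}

/* Build the client-side prefix in the style of the old pgbench message. */
static void
format_connect_error(char *buf, size_t buflen,
                     const char *dbname, const char *libpq_msg)
{
    snprintf(buf, buflen, "connection to database \"%s\" failed: %s",
             dbname, libpq_msg);
}
```

Passing the raw command-line argument (an empty string here) to format_connect_error reproduces the confusing "connection to database "" failed" text, while passing resolve_dbname("", "rhaas") names the database the connection attempt actually targeted.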
--
Álvaro Herrera       Valdivia, Chile
On Wed, Jan 20, 2021 at 1:25 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
That's because pgbench reports the input argument dbname, but since you
didn't specify anything, then PQconnectdbParams() uses the libpq
behavior. I think we'd have to use PQdb() instead.
I figured it was something like that. I don't know whether the right
thing is to use something like PQdb() to get the correct database
name, or whether we should go with Tom's suggestion and omit that
detail altogether, but I think showing the empty string when the user
relied on the default is too confusing.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2021-Jan-20, Robert Haas wrote:
On Wed, Jan 20, 2021 at 1:25 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
That's because pgbench reports the input argument dbname, but since you
didn't specify anything, then PQconnectdbParams() uses the libpq
behavior. I think we'd have to use PQdb() instead.
I figured it was something like that. I don't know whether the right
thing is to use something like PQdb() to get the correct database
name, or whether we should go with Tom's suggestion and omit that
detail altogether, but I think showing the empty string when the user
relied on the default is too confusing.
Well, the patch seems small enough, and I don't think it'll be in any
way helpful to omit that detail.
--
Álvaro Herrera        39°49'30"S 73°17'W
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today" (Mary Gardiner)
Attachments:
pgbench-db.patch (text/x-diff; charset=us-ascii), +1 -1
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
On 2021-Jan-20, Robert Haas wrote:
I figured it was something like that. I don't know whether the right
thing is to use something like PQdb() to get the correct database
name, or whether we should go with Tom's suggestion and omit that
detail altogether, but I think showing the empty string when the user
relied on the default is too confusing.
Well, the patch seems small enough, and I don't think it'll be in any
way helpful to omit that detail.
I'm +1 for applying and back-patching that. I still think we might
want to just drop the phrase altogether in HEAD, but we wouldn't do
that in the back branches, and the message is surely misleading as-is.
regards, tom lane
On Wed, Jan 20, 2021 at 1:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
On 2021-Jan-20, Robert Haas wrote:
I figured it was something like that. I don't know whether the right
thing is to use something like PQdb() to get the correct database
name, or whether we should go with Tom's suggestion and omit that
detail altogether, but I think showing the empty string when the user
relied on the default is too confusing.
Well, the patch seems small enough, and I don't think it'll be in any
way helpful to omit that detail.
I'm +1 for applying and back-patching that. I still think we might
want to just drop the phrase altogether in HEAD, but we wouldn't do
that in the back branches, and the message is surely misleading as-is.
Sure, that makes sense.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
Maybe it would be better if it said:
connection to database at socket "/tmp/.s.PGSQL.5432" failed: FATAL:
database "rhaas" does not exist
I'd be inclined to spell it "connection to server at ... failed",
but that sort of wording is surely also possible.
"connection to server" rather than "connection to database" works for
me; in fact, I think I like it slightly better.
If I don't hear any other opinions, I'll change these messages to
"connection to server at socket \"%s\" failed: "
"connection to server at \"%s\" (%s), port %s failed: "
(or maybe "server on socket"? "at" sounds right for the IP address
case, but it feels a little off in the socket pathname case.)
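As a rough sketch of how the two proposed prefixes could be selected: the helper below is hypothetical (format_server_prefix is not a libpq function, and its parameters are illustrative). Only the two format strings are taken from the proposal above.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: write the "connection to server ..." prefix into buf.
 * A non-NULL sockpath selects the Unix-socket form; otherwise the
 * host/address/port form is used.  Returns snprintf's result.
 */
static int
format_server_prefix(char *buf, size_t buflen,
                     const char *sockpath,
                     const char *host, const char *addr, const char *port)
{
    if (sockpath != NULL)
        return snprintf(buf, buflen,
                        "connection to server at socket \"%s\" failed: ",
                        sockpath);

    return snprintf(buf, buflen,
                    "connection to server at \"%s\" (%s), port %s failed: ",
                    host, addr, port);
}
```

The server's own message (for example the FATAL line) would then be appended after the prefix, so the identity of the server always comes first.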
regards, tom lane
I wrote:
If I don't hear any other opinions, I'll change these messages to
"connection to server at socket \"%s\" failed: "
"connection to server at \"%s\" (%s), port %s failed: "
Done. Also, here is a patch to remove the redundant-seeming prefixes
from our reports of connection failures. My feeling that this is the
right thing was greatly increased when I noticed that psql, as well as
a few other programs, already did it like this. (I still favor
Alvaro's patch for the back branches, though.)
regards, tom lane
Attachments:
make-connection-failure-messages-less-redundant.patch (text/x-diff; charset=us-ascii), +31 -49
On 2021-Jan-20, Robert Haas wrote:
On Wed, Jan 20, 2021 at 1:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Well, the patch seems small enough, and I don't think it'll be in any
way helpful to omit that detail.
I'm +1 for applying and back-patching that. I still think we might
want to just drop the phrase altogether in HEAD, but we wouldn't do
that in the back branches, and the message is surely misleading as-is.
Sure, that makes sense.
OK, I pushed it. Thanks,
pgbench has one occurrence of the old pattern in master, in line 6043.
However, since doConnect() returns NULL when it gets CONNECTION_BAD,
that seems dead code. This patch kills it.
--
Álvaro Herrera        39°49'30"S 73°17'W
"I can see support will not be a problem. 10 out of 10." (Simon Wittber)
(http://archives.postgresql.org/pgsql-general/2004-12/msg00159.php)
Attachments:
deadcode.patch (text/x-diff; charset=us-ascii), +0 -7
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
pgbench has one occurrence of the old pattern in master, in line 6043.
However, since doConnect() returns NULL when it gets CONNECTION_BAD,
that seems dead code. This patch kills it.
Oh ... I missed that because it wasn't adjacent to the PQconnectdbParams
call :-(. You're right, that's dead code and we should just delete it.
regards, tom lane
On 2021-Jan-26, Tom Lane wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
pgbench has one occurrence of the old pattern in master, in line 6043.
However, since doConnect() returns NULL when it gets CONNECTION_BAD,
that seems dead code. This patch kills it.
Oh ... I missed that because it wasn't adjacent to the PQconnectdbParams
call :-(. You're right, that's dead code and we should just delete it.
Pushed, thanks.
--
Álvaro Herrera        39°49'30"S 73°17'W
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, C.S.Lewis)
On 21.01.21 02:33, Tom Lane wrote:
I'd be inclined to spell it "connection to server at ... failed",
but that sort of wording is surely also possible.
"connection to server" rather than "connection to database" works for
me; in fact, I think I like it slightly better.
If I don't hear any other opinions, I'll change these messages to
"connection to server at socket \"%s\" failed:"
"connection to server at \"%s\" (%s), port %s failed:"
(or maybe "server on socket"? "at" sounds right for the IP address
case, but it feels a little off in the socket pathname case.)
I was just trying some stuff with PG14, which led me to this thread.
I find these new error messages to be more distracting than before in
some cases. For example:
PG13:
clusterdb: error: could not connect to database typo: FATAL: database
"typo" does not exist
PG14:
clusterdb: error: connection to server on socket "/tmp/.s.PGSQL.65432"
failed: FATAL: database "typo" does not exist
Throwing the socket address in there seems a bit distracting and
misleading, and it also pushes off the actual information very far to
the end. (Also, in some cases the socket path is very long, making the
actual information even harder to find.) By the time you get to this
error, you have already connected, so mentioning the server address
seems secondary at best.
On Mon, May 3, 2021 at 6:08 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
I find these new error messages to be more distracting than before in
some cases. For example:
PG13:
clusterdb: error: could not connect to database typo: FATAL: database
"typo" does not exist
PG14:
clusterdb: error: connection to server on socket "/tmp/.s.PGSQL.65432"
failed: FATAL: database "typo" does not exist
Throwing the socket address in there seems a bit distracting and
misleading, and it also pushes off the actual information very far to
the end. (Also, in some cases the socket path is very long, making the
actual information even harder to find.) By the time you get to this
error, you have already connected, so mentioning the server address
seems secondary at best.
It feels a little counterintuitive to me too but I am nevertheless
inclined to believe that it's an improvement. When multi-host
connection strings are used, the server address may not be clear. In
fact, even when they're not, it may not be clear to a new user that
socket communication is used, and it may not be clear where the socket
is located. New users may not even realize that there's a socket
involved; I certainly didn't know that for quite a while. It's a lot
harder for the database name to be unclear, because a particular
connection attempt will never try more than one, and also because when
it's relevant to understanding why the connection failed, the server
will hopefully include it in the message string anyway, as here. So
the PG13 message is really kind of silly: it tells us the same thing
twice, which we must already know, instead of telling us something
that we might not know.
It might be more intuitive in some ways if the socket information were
demoted to the end of the message, but I think we'd lose more than we
gained. The standard way of reporting someone else's error is
basically "what I have to say about the problem: %s" and that's
exactly what we're doing here. We could find some way of gluing the
information about the socket onto the end of the server message, but
it seems unclear how to do that in a way that looks natural, and it
would depart from our usual practice. So even though I also find this
to be a bit distracting, I think we should just live with it, because
everything else seems worse.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, May 3, 2021 at 6:08 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
Throwing the socket address in there seems a bit distracting and
misleading, and it also pushes off the actual information very far to
the end. (Also, in some cases the socket path is very long, making the
actual information even harder to find.) By the time you get to this
error, you have already connected, so mentioning the server address
seems secondary at best.
It feels a little counterintuitive to me too but I am nevertheless
inclined to believe that it's an improvement. When multi-host
connection strings are used, the server address may not be clear. In
fact, even when they're not, it may not be clear to a new user that
socket communication is used, and it may not be clear where the socket
is located.
Yeah. The specific problem I'm concerned about solving here is
"I wasn't connecting to the server I thought I was", which could be
a contributing factor in almost any connection-time failure. The
multi-host-connection-string feature made that issue noticeably worse,
but surely we've all seen trouble reports that boiled down to that
even before that feature came in.
As you say, we could perhaps redesign the messages to provide this
info in another order. But it'd be difficult, and I think it might
come out even more confusing in cases where libpq tried several
servers on the way to finally failing. The old code's error
reporting for such cases completely sucked, whereas now you get
a reasonably complete trace of the attempts. As a quick example,
for a case of bad hostname followed by wrong port:
$ psql -d "host=foo1,sss2 port=5432,5342"
psql: error: could not translate host name "foo1" to address: Name or service not known
connection to server at "sss2" (192.168.1.48), port 5342 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
v13 renders this as
$ psql -d "host=foo1,sss2 port=5432,5342"
psql: error: could not translate host name "foo1" to address: Name or service not known
could not connect to server: Connection refused
Is the server running on host "sss2" (192.168.1.48) and accepting
TCP/IP connections on port 5342?
Now, of course the big problem there is the lack of consistency about
how the two errors are laid out; but I'd argue that putting the
server identity info first is better than putting it later.
Also, if you experiment with other cases such as some of the servers
complaining about wrong user name, the old behavior is even harder
to follow about which server said what.
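For illustration only, here is a small sketch of the "one line per attempt, server identity first" layout shown in the newer output above. It is not libpq's implementation; the Attempt struct, its field names, and format_attempt_trace are all hypothetical, and the message strings are modeled on the example output.

```c
#include <stdio.h>
#include <string.h>

/* One failed attempt; every field name here is hypothetical. */
typedef struct
{
    const char *host;
    const char *addr;    /* NULL if the host name could not be resolved */
    const char *port;
    const char *reason;  /* OS or server message for this attempt */
} Attempt;

/*
 * Append one line per attempt to buf, leading with the server identity
 * whenever an address is known.  Returns the length of the text in buf;
 * output is truncated if buf is too small.
 */
static size_t
format_attempt_trace(char *buf, size_t buflen,
                     const Attempt *attempts, int nattempts)
{
    size_t used = 0;

    if (buflen > 0)
        buf[0] = '\0';

    for (int i = 0; i < nattempts && used < buflen; i++)
    {
        const Attempt *a = &attempts[i];
        int n;

        if (a->addr == NULL)
            n = snprintf(buf + used, buflen - used,
                         "could not translate host name \"%s\" to address: %s\n",
                         a->host, a->reason);
        else
            n = snprintf(buf + used, buflen - used,
                         "connection to server at \"%s\" (%s), port %s failed: %s\n",
                         a->host, a->addr, a->port, a->reason);
        if (n < 0)
            break;
        used += (size_t) n;
    }

    /* Clamp to what actually fits when snprintf had to truncate. */
    if (used >= buflen)
        return buflen > 0 ? buflen - 1 : 0;
    return used;
}
```

Because each line carries its own host, address, and port, it stays clear which server produced which complaint even when several attempts fail for different reasons.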
regards, tom lane