libpq: Which functions may hang due to network issues?

Started by Daniel Frey, over 4 years ago · 10 messages · general
#1Daniel Frey
d.frey@gmx.de

I need to know which functions of libpq may "hang", depending on network issues. For some functions it seems to be clear, as they only work locally; other functions are clearly documented to wait on some network interaction. But for some functions, it is unclear whether they are guaranteed to work locally without any possibility of hanging, e.g. PQfinish(), PQstatus(), PQtransactionStatus(), etc.

Is there a complete list of methods that might wait for network communication?

Some background: I'm writing a C++ wrapper for libpq <https://github.com/taocpp/taopq/> and our applications, which are going to use that library, should never hang, even when there is a network problem and network communication breaks down for a connection. For that reason I'm using only the asynchronous libpq calls, and I use timeouts when polling on the socket/FD. When a timeout occurs, I need to handle the situation in a reasonable manner. In my case, I currently close the connection by calling PQfinish(). Also, later I might call PQstatus() or PQtransactionStatus() in order to decide whether a connection is still valid and should be returned to the connection pool or whether it needs to be discarded.
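
A minimal sketch of the "poll the socket with a timeout" step described above, assuming a POSIX system. It is not taken from taopq or from libpq; the names `wait_readable`, `wait_result` and `wait_self_check` are hypothetical. In a libpq client the descriptor would typically be the one returned by PQsocket(conn); here a local socket pair stands in for it so that the example runs without a database.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Outcome of waiting on a descriptor (hypothetical names). */
enum wait_result { WAIT_ERROR = -1, WAIT_TIMEOUT = 0, WAIT_READY = 1 };

/*
 * Wait until fd becomes readable or timeout_ms milliseconds elapse.
 * Assumption: in a libpq client, fd would come from PQsocket(conn).
 * A retry after EINTR restarts the full timeout; a production version
 * would track an absolute deadline instead.
 */
enum wait_result wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return WAIT_ERROR;
    if (rc == 0)
        return WAIT_TIMEOUT;
    /* POLLERR/POLLHUP also land here; the next read reports the error. */
    return WAIT_READY;
}

/* Usage example on a local socket pair; returns 0 if both waits behave. */
int wait_self_check(void)
{
    int sv[2];
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Nothing has been written yet, so the wait must time out. */
    ok = (wait_readable(sv[0], 50) == WAIT_TIMEOUT);

    /* After one byte is written, the descriptor must be readable. */
    ok = ok && (write(sv[1], "x", 1) == 1);
    ok = ok && (wait_readable(sv[0], 50) == WAIT_READY);

    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

On WAIT_TIMEOUT the caller decides what to do with the connection; the rest of this thread is about whether the cleanup call itself can block.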

#2Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Daniel Frey (#1)
Re: libpq: Which functions may hang due to network issues?

On Fri, 2021-12-03 at 11:37 +0100, Daniel Frey wrote:

I need to know which functions of libpq may "hang", depending on network issues. For some functions it
seems to be clear, as they only work locally, other functions are clearly documented to wait on some
network interaction. But for some functions, it is unclear on whether they are guaranteed to work
locally without any possibility to hang or not, e.g. PQfinish(), PQstatus(), PQtransactionStatus(), etc.

Is there a complete list of methods that might wait for network communication?

No; you have to read the code.

For example, PQstatus is defined like this:

    ConnStatusType
    PQstatus(const PGconn *conn)
    {
        if (!conn)
            return CONNECTION_BAD;
        return conn->status;
    }

This only reads a field of the local connection object; it does not access the network.

Yours,
Laurenz Albe

#3Daniel Frey
d.frey@gmx.de
In reply to: Laurenz Albe (#2)
Re: libpq: Which functions may hang due to network issues?

On 3. Dec 2021, at 17:00, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Fri, 2021-12-03 at 11:37 +0100, Daniel Frey wrote:

I need to know which functions of libpq may "hang", depending on network issues. For some functions it
seems to be clear, as they only work locally, other functions are clearly documented to wait on some
network interaction. But for some functions, it is unclear on whether they are guaranteed to work
locally without any possibility to hang or not, e.g. PQfinish(), PQstatus(), PQtransactionStatus(), etc.

Is there a complete list of methods that might wait for network communication?

No; you have to read the code.

I feel that this is insufficient, as the code might change. And it might be simple enough for something like PQstatus(), but not all functions are that simple.

If this property of a function is not guaranteed by the documentation, how am I expected to write a library that doesn't depend on a specific version of libpq? Could these guarantees be added to the documentation, please?

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Frey (#3)
Re: libpq: Which functions may hang due to network issues?

Daniel Frey <d.frey@gmx.de> writes:

On 3. Dec 2021, at 17:00, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2021-12-03 at 11:37 +0100, Daniel Frey wrote:

Is there a complete list of methods that might wait for network communication?

No; you have to read the code.

I feel that this is insufficient, as the code might change. And it might be simple enough for something like PQstatus(), but not all functions are that simple.

If this property of a function is not guaranteed by the documentation, how am I expected to write a library that doesn't depend on a specific version of libpq? Could these guarantees be added to the documentation, please?

No. For one thing, we'd probably forget to maintain any such info.
In any case, I think you'd be best off to assume that anything that
isn't purely local state inspection might try to contact the server.
And it's not hard to see which ones are local state inspection.

regards, tom lane

#5Daniel Frey
d.frey@gmx.de
In reply to: Tom Lane (#4)
Re: libpq: Which functions may hang due to network issues?

On 3. Dec 2021, at 18:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Daniel Frey <d.frey@gmx.de> writes:

On 3. Dec 2021, at 17:00, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2021-12-03 at 11:37 +0100, Daniel Frey wrote:

Is there a complete list of methods that might wait for network communication?

No; you have to read the code.

I feel that this is insufficient, as the code might change. And it might be simple enough for something like PQstatus(), but not all functions are that simple.

If this property of a function is not guaranteed by the documentation, how am I expected to write a library that doesn't depend on a specific version of libpq? Could these guarantees be added to the documentation, please?

No. For one thing, we'd probably forget to maintain any such info.
In any case, I think you'd be best off to assume that anything that
isn't purely local state inspection might try to contact the server.
And it's not hard to see which ones are local state inspection.

It might be "easy" for *some* functions to figure out that they won't lead to any network communication, like PQstatus() or PQtransactionStatus(). But expecting a user of libpq to inspect the source code to figure that out and still have no guarantee for the future seems extremely weird to me. If you put that guarantee in the documentation and maybe add a comment into the source code, I don't see how that would lead to anyone forgetting about it.

But the real issue, at least for me, is PQfinish(). Considering that my application is not allowed to hang (or crash, leak, ...), what should I do in case of a timeout? I have existing connections and at some point the network connections stop working (e.g. due to a firewall issue or reboot). If I don't want a resource leak, I *must* call PQfinish(), correct? But I have no idea whether it might hang. If you don't want to guarantee that PQfinish() will not hang, then please advise how to use libpq properly in this situation. Is there some asynchronous version of PQfinish()? Or should I handle hanging connections differently?

#6Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Daniel Frey (#5)
Re: libpq: Which functions may hang due to network issues?

On Fri, 2021-12-03 at 21:33 +0100, Daniel Frey wrote:

But the real issue, at least for me, is PQfinish(). Considering that my application is not
allowed to hang (or crash, leak, ...), what should I do in case of a timeout?

I am tempted to say that you shouldn't use TCP with the requirement that it should not hang.

I have existing
connections and at some point the network connections stop working (e.g. due to a firewall
issue/reboot), etc. If I don't want a resource leak, I *must* call PQfinish(), correct?
But I have no idea whether it might hang. If you don't want to guarantee that PQfinish()
will not hang, then please advise how to use libpq properly in this situation. Is there
some asynchronous version of PQfinish()? Or should I handle hanging connections differently?

You could start a separate process that has your PostgreSQL connection and kill it if it
times out. But then you'd have a similar problem communicating with that process.

A normal thing to do when your database call times out or misbehaves in other ways is
to give up, report an error and die (after some retries perhaps).

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#7Daniel Frey
d.frey@gmx.de
In reply to: Laurenz Albe (#6)
Re: libpq: Which functions may hang due to network issues?

On 4. Dec 2021, at 22:43, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Fri, 2021-12-03 at 21:33 +0100, Daniel Frey wrote:

But the real issue, at least for me, is PQfinish(). Considering that my application is not
allowed to hang (or crash, leak, ...), what should I do in case of a timeout?

I am tempted to say that you shouldn't use TCP with the requirement that it should not hang.

We actually use UDP in a lot of places, specifically RADIUS. But the DB connection is supposed to be TCP, no?

I have existing
connections and at some point the network connections stop working (e.g. due to a firewall
issue/reboot), etc. If I don't want a resource leak, I *must* call PQfinish(), correct?
But I have no idea whether it might hang. If you don't want to guarantee that PQfinish()
will not hang, then please advise how to use libpq properly in this situation. Is there
some asynchronous version of PQfinish()? Or should I handle hanging connections differently?

You could start a separate process that has your PostgreSQL connection and kill it if it
times out. But then you'd have a similar problem communicating with that process.

Shifting the problem somewhere else (and adding even more complexity to the system) doesn't solve it.

A normal thing to do when your database call times out or misbehaves in other ways is
to give up, report an error and die (after some retries perhaps).

Our software is expected to run 24/7 without dying just because some other system has a (temporary) outage. When database connections die, we issue an alarm and we regularly check whether we can open new ones in a rate-limited manner, so we don't flood the network and the DB with connection requests. We then clear the alarm once DB connectivity comes up again. Our software includes fallback logic to minimize customer impact while DB connectivity is down or when another system is temporarily unavailable; this is a defined and controlled scenario. If we were to simply crash, what would the next system up the chain do? See that we are not responding, so it would also crash? (BTW, I'm working for a big telco company in Germany, just to give some idea/perspective of what kind of systems we are talking about.)

With all that said, I think that PostgreSQL/libpq should have a clear, documented way to get rid of a connection that is guaranteed to not hang. It has something similar for almost all other methods like opening connections, sending requests, retrieving results. Why stop there?

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Frey (#7)
Re: libpq: Which functions may hang due to network issues?

Daniel Frey <d.frey@gmx.de> writes:

With all that said, I think that PostgreSQL/libpq should have a clear, documented way to get rid of a connection that is guaranteed to not hang. It has something similar for almost all other methods like opening connections, sending requests, retrieving results. Why stop there?

AFAICS, PQfinish() already acts that way, at least up to the same level of
guarantee as you have for "all other methods". That is, if you previously
set the connection into nonblock mode, it won't block.

regards, tom lane
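
To illustrate what "nonblock mode" means at the plain socket level, here is a small sketch, assuming a POSIX system. It is not libpq code and says nothing about how PQfinish() is implemented; with libpq itself, non-blocking mode is requested with PQsetnonblocking(conn, 1). The helper names `set_nonblocking`, `fill_until_would_block` and `demo_nonblocking_send` are hypothetical. A peer that never reads stands in for a dead network path: once the kernel buffer is full, a blocking send() would wait, whereas a non-blocking send() fails immediately with EAGAIN/EWOULDBLOCK.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Keep sending to a peer that never reads.
 * Returns 1 once send() refuses to block (EAGAIN/EWOULDBLOCK),
 * 0 if the iteration limit is reached first, -1 on any other error.
 */
int fill_until_would_block(int fd)
{
    char chunk[4096] = {0};
    long i;

    for (i = 0; i < 1000000L; i++) {
        ssize_t n = send(fd, chunk, sizeof chunk, 0);

        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return 1;
            if (errno == EINTR)
                continue;
            return -1;
        }
    }
    return 0;
}

/* Usage example on a local socket pair; returns 1 if send() did not block. */
int demo_nonblocking_send(void)
{
    int sv[2];
    int rc;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (set_nonblocking(sv[0]) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    rc = fill_until_would_block(sv[0]);

    /* Discard both ends, including whatever is still buffered. */
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

The sketch only shows the kernel-level behaviour that a non-blocking descriptor gives a caller; whether a given libpq function relies on it is exactly the kind of guarantee being discussed in this thread.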

#9Daniel Frey
d.frey@gmx.de
In reply to: Tom Lane (#8)
Re: libpq: Which functions may hang due to network issues?

On 5. Dec 2021, at 17:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Daniel Frey <d.frey@gmx.de> writes:

With all that said, I think that PostgreSQL/libpq should have a clear, documented way to get rid of a connection that is guaranteed to not hang. It has something similar for almost all other methods like opening connections, sending requests, retrieving results. Why stop there?

AFAICS, PQfinish() already acts that way, at least up to the same level of
guarantee as you have for "all other methods". That is, if you previously
set the connection into nonblock mode, it won't block.

OK, thanks Tom, that is at least something. I would still like this to be documented/guaranteed in some form, especially if nonblocking mode is required for this behavior (which is given in my case). But I guess that's not up to me, so I'll drop the topic and I'll just have to accept the status quo.

Thanks, Daniel

#10Daniel Frey
d.frey@gmx.de
In reply to: Daniel Frey (#9)
Re: libpq: Which functions may hang due to network issues?

On 5. Dec 2021, at 21:32, Daniel Frey <d.frey@gmx.de> wrote:

On 5. Dec 2021, at 17:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Daniel Frey <d.frey@gmx.de> writes:

With all that said, I think that PostgreSQL/libpq should have a clear, documented way to get rid of a connection that is guaranteed to not hang. It has something similar for almost all other methods like opening connections, sending requests, retrieving results. Why stop there?

AFAICS, PQfinish() already acts that way, at least up to the same level of
guarantee as you have for "all other methods". That is, if you previously
set the connection into nonblock mode, it won't block.

One more question about this: What is the purpose of *not* using nonblocking mode with PQfinish()? Is there any benefit to the user in waiting for something? Or could it make sense for PQfinish() to always use nonblocking mode internally?