libpq v17 PQsocketPoll timeout is not granular enough

Started by Dominique Deviennealmost 2 years ago7 messagesgeneral
Jump to latest
#1Dominique Devienne
ddevienne@gmail.com

Hi. I've noticed [that libpq API in v17 beta1][1]https://www.postgresql.org/docs/17/libpq-connect.html#LIBPQ-PQSOCKETPOLL, and wanted to use
it to replace an existing Boost.ASIO-based async polling of the
connection's socket, waiting for notifications. The use case being
using PostgreSQL LISTEN/NOTIFY for a simple message queue. The code
needs to be cross-platform Windows and Linux. My goal would be to
eliminate that Boost.ASIO dependency for that, to use just libpq.

PQsocketPoll() being based on time_t, it has only second resolution, AFAIK.
Despite the [underlying implementation in fe-misc.c][2]https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-misc.c#L1086 supporting at
least milliseconds.

My use case is clients posting "requests" to the "queue" (i.e. a
PostgreSQL table), which trigger NOTIFY messages, waited on by
"servers"; and those servers informing back clients via further
notifications (on per-request channels) about the processing status of
their requests.

In that use case, second-resolution on long-lived servers is OK.
But OTOH, for the client side, waiting 1s or more to know whether a
server picked up their request is "too long / slow". I'd need
millisecond timeouts for that.

The doc for PQsocketPoll() mentions a different use case for that API.
But I think it would a pity if that unreleased API couldn't be made to
accomodate sub-second timeouts and more use-cases, like above.
Especially given that libpq v17 isn't out yet. I may come late to the
game, but hopefully it is not too late.

Thoughts? Thanks, --DD

[1]: https://www.postgresql.org/docs/17/libpq-connect.html#LIBPQ-PQSOCKETPOLL
[2]: https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-misc.c#L1086

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dominique Devienne (#1)
Re: libpq v17 PQsocketPoll timeout is not granular enough

Dominique Devienne <ddevienne@gmail.com> writes:

PQsocketPoll() being based on time_t, it has only second resolution, AFAIK.
Despite the [underlying implementation in fe-misc.c][2] supporting at
least milliseconds.
...
But I think it would a pity if that unreleased API couldn't be made to
accomodate sub-second timeouts and more use-cases, like above.
Especially given that libpq v17 isn't out yet. I may come late to the
game, but hopefully it is not too late.

This is an interesting suggestion, but I think yes it's too late.
We're already past beta1 and this'd require some fairly fundamental
rethinking, since there's no easy substitute for type time_t that has
millisecond resolution. (The callers do want to specify an end time
not a timeout interval, since some of them loop around PQsocketPoll
and don't want the end time to slip.)

I guess conceivably we could use gettimeofday() and struct timeval
instead of time() and time_t, but it'd touch a lot of places in
libpq and it'd make some of the calculations a lot more complex.

Maybe a better idea would be to convert to using our
src/include/portability/instr_time.h abstraction? But that
would be problematic for outside callers.

In any case this doesn't seem like a sane thing to be redesigning
post-beta. A few months ago maybe we'd have done it, but ...

regards, tom lane

#3Dominique Devienne
ddevienne@gmail.com
In reply to: Tom Lane (#2)
Re: libpq v17 PQsocketPoll timeout is not granular enough

Bummer… I didn’t presume to suggest an api before, but simply adding an
extra int with the milliseconds offset from the time_t is simple, and
trivial to plug into the implementation I saw. Callers who don’t care can
simply pass zero. while I could pass a computed time_t and ms offset using
<chrono>. No need for fancy types imho. Aren’t betas precisely for the
purpose of exposing apis to those like myself to vet them? This is also
beta1, I,e, the first one. My €0.02

On Mon, Jun 10, 2024 at 6:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Dominique Devienne <ddevienne@gmail.com> writes:

PQsocketPoll() being based on time_t, it has only second resolution,

AFAIK.

Despite the [underlying implementation in fe-misc.c][2] supporting at
least milliseconds.
...
But I think it would a pity if that unreleased API couldn't be made to
accomodate sub-second timeouts and more use-cases, like above.
Especially given that libpq v17 isn't out yet. I may come late to the
game, but hopefully it is not too late.

This is an interesting suggestion, but I think yes it's too late.
We're already past beta1 and this'd require some fairly fundamental
rethinking, since there's no easy substitute for type time_t that has
millisecond resolution. (The callers do want to specify an end time
not a timeout interval, since some of them loop around PQsocketPoll
and don't want the end time to slip.)

I guess conceivably we could use gettimeofday() and struct timeval
instead of time() and time_t, but it'd touch a lot of places in
libpq and it'd make some of the calculations a lot more complex.

Maybe a better idea would be to convert to using our
src/include/portability/instr_time.h abstraction? But that
would be problematic for outside callers.

In any case this doesn't seem like a sane thing to be redesigning
post-beta. A few months ago maybe we'd have done it, but ...

regards, tom lane

#4Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Dominique Devienne (#3)
Re: libpq v17 PQsocketPoll timeout is not granular enough

On 6/10/24 11:43, Dominique Devienne wrote:

Bummer… I didn’t presume to suggest an api before, but simply adding an
extra int with the milliseconds offset from the time_t is simple, and
trivial to plug into the implementation I saw. Callers who don’t care
can simply pass zero. while I could pass a computed time_t and ms offset
using <chrono>. No need for fancy types imho. Aren’t betas precisely for

https://www.postgresql.org/developer/beta/

"PostgreSQL beta and release candidate releases are pre-release testing
versions before the community makes a new release generally available.
They are feature-frozen (i.e. no new features are added), and we release
these to the public for testing before our final release. PostgreSQL
beta and release candidate release are not meant for use in production
systems."

the purpose of exposing apis to those like myself to vet them? This is
also beta1, I,e, the first one. My €0.02

--
Adrian Klaver
adrian.klaver@aklaver.com

#5Thomas Munro
thomas.munro@gmail.com
In reply to: Dominique Devienne (#1)
Re: libpq v17 PQsocketPoll timeout is not granular enough

On Tue, Jun 11, 2024 at 2:36 AM Dominique Devienne <ddevienne@gmail.com> wrote:

Hi. I've noticed [that libpq API in v17 beta1][1], and wanted to use
it to replace an existing Boost.ASIO-based async polling of the
connection's socket, waiting for notifications. The use case being
using PostgreSQL LISTEN/NOTIFY for a simple message queue. The code
needs to be cross-platform Windows and Linux. My goal would be to
eliminate that Boost.ASIO dependency for that, to use just libpq.

One idea I have wondered about is why you wouldn't just use poll()
directly, if you want a facility that works on all known operating
systems without extra portability libraries. Windows has WSApoll(),
which AFAIK was designed to be Unix-compatible and a drop-in
replacement, requiring just a rename but otherwise having the same
macros and struct etc.

For some period of time, people who had to deal with socket connection
events (that includes us) avoided it, with the Curl guys' blog being
the most often cited public explanation for why:

https://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/

However, as far as I know, that was fixed ~4 years ago:

https://learn.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsapoll

"Note As of Windows 10 version 2004, when a TCP socket fails to
connect, (POLLHUP \| POLLERR \| POLLWRNORM) is indicated."

I wonder if that means that it's now completely usable on all
reasonable versions of the OS. I think so? I don't use Windows
myself, my interest in this topic is that I have a slow moving
background project to figure out how and when to remove all remaining
uses of select() from our tree, and this one is on my hit list.

PQsocketPoll() being based on time_t, it has only second resolution, AFAIK.
Despite the [underlying implementation in fe-misc.c][2] supporting at
least milliseconds.

Yeah, that is not nice and your complaint is very reasonable, and we
should probably do something like what Tom suggested.

Hmm, but if what I speculated above is true, I wonder if the extern
function is even worth its bits... but I don't know how to determine
that completely.

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#5)
Re: libpq v17 PQsocketPoll timeout is not granular enough

Thomas Munro <thomas.munro@gmail.com> writes:

On Tue, Jun 11, 2024 at 2:36 AM Dominique Devienne <ddevienne@gmail.com> wrote:

PQsocketPoll() being based on time_t, it has only second resolution, AFAIK.
Despite the [underlying implementation in fe-misc.c][2] supporting at
least milliseconds.

Yeah, that is not nice and your complaint is very reasonable, and we
should probably do something like what Tom suggested.

Hmm, but if what I speculated above is true, I wonder if the extern
function is even worth its bits... but I don't know how to determine
that completely.

I think if we're going to change anything at all here, we should
define the external API in microseconds not milliseconds. The lesson
we need to be taking from this is that system calls come and go,
but libpq API is forever ;-). Somebody could conceivably want
sub-millisecond wait resolution within the lifespan of libpq.

regards, tom lane

#7Francisco Olarte
folarte@peoplecall.com
In reply to: Tom Lane (#6)
Re: libpq v17 PQsocketPoll timeout is not granular enough

Tom:

On Tue, 11 Jun 2024 at 01:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think if we're going to change anything at all here, we should
define the external API in microseconds not milliseconds. The lesson
we need to be taking from this is that system calls come and go,
but libpq API is forever ;-). Somebody could conceivably want
sub-millisecond wait resolution within the lifespan of libpq.

I was thinking on that, but since you need at least 20 bits, so 32,
for microseconds I would suggest nanoseconds, which fit in 32 too.
Sure, nanos seems too much for the current time but it pushes the
future problem further down, nearly forever for comms between
different machines until someone develops FTL networks.

Francisco Olarte.