Converting pqsignal to void return

Started by Paul Ramsey12 months ago9 messages
#1Paul Ramsey
pramsey@cleverelephant.ca

Nathan,

I just ran across this change when a bunch of CI’s I run barfed.

https://github.com/postgres/postgres/commit/d4a43b283751b23d32bbfa1ecc2cad2d16e3dde9

The problem we are having in extension land is that we often run functions in external libraries that take a long time to return, and we would like to ensure that PgSQL users can cancel their queries, even when control has passed into those functions.

The way we have done it, historically, has been to take the return value of pqsignal(SIGINT, extension_signint_handler) and remember it, and then, inside extension_signint_handler, call the pgsql handler once we have done our own business.

https://github.com/pramsey/pgsql-http/blob/master/http.c#L345

It is possible we have been Doing It Wrong all this time, and would love some pointers on the right way to do this.

Alternatively, if we have a valid use case, it would be nice to have the return value from pqsignal() back.

Thanks!

Paul

#2Nathan Bossart
nathandbossart@gmail.com
In reply to: Paul Ramsey (#1)
Re: Converting pqsignal to void return

On Wed, Jan 22, 2025 at 07:57:52AM -0800, Paul Ramsey wrote:

I just ran across this change when a bunch of CI�s I run barfed.

Oops, sorry.

The problem we are having in extension land is that we often run
functions in external libraries that take a long time to return, and we
would like to ensure that PgSQL users can cancel their queries, even when
control has passed into those functions.

The way we have done it, historically, has been to take the return value
of pqsignal(SIGINT, extension_signint_handler) and remember it, and then,
inside extension_signint_handler, call the pgsql handler once we have
done our own business..

I see a couple other projects that might be doing something similar [0]https://codesearch.debian.net/search?q=%3D+pqsignal&literal=1.

It is possible we have been Doing It Wrong all this time, and would love
some pointers on the right way to do this.

Alternatively, if we have a valid use case, it would be nice to have the
return value from pqsignal() back.

I don't think you were doing it wrong before, but commit 3b00fdb did add
some weird race conditions to the return value, but only in rare cases that
I don't think apply to the case you shared.

I was tempted to suggest that you try using sigaction() with the second
"act" argument set to NULL to retrieve the original signal handler, but I
think there's a good chance you'd get our wrapper_handler() function (we
store the actual handlers in a separate array), which is no good. So, I
should probably just revert d4a43b2. An alternative approach could be to
provide our own way to retrieve the current handler, but that comes with
its own race conditions, although perhaps those are more obvious than the
pqsignal() ones. WDYT?

[0]: https://codesearch.debian.net/search?q=%3D+pqsignal&literal=1

--
nathan

#3Andres Freund
andres@anarazel.de
In reply to: Paul Ramsey (#1)
Re: Converting pqsignal to void return

Hi,

On 2025-01-22 07:57:52 -0800, Paul Ramsey wrote:

I just ran across this change when a bunch of CI’s I run barfed.

https://github.com/postgres/postgres/commit/d4a43b283751b23d32bbfa1ecc2cad2d16e3dde9

The problem we are having in extension land is that we often run functions
in external libraries that take a long time to return, and we would like to
ensure that PgSQL users can cancel their queries, even when control has
passed into those functions.

The way we have done it, historically, has been to take the return value of
pqsignal(SIGINT, extension_signint_handler) and remember it, and then,
inside extension_signint_handler, call the pgsql handler once we have done
our own business.

I think that's a really bad idea. For one, postgres might use signals in
different process types for different things. We are also working on not using
plain signals in a bunch of places.

It's also really hard to write correct signal handlers. E.g.:

https://github.com/pramsey/pgsql-http/blob/master/http.c#L345

That signal handler is profoundly unsafe:

http_interrupt_handler(int sig)
{
/* Handle the signal here */
elog(DEBUG2, "http_interrupt_handler: sig=%d", sig);
http_interrupt_requested = sig;
pgsql_interrupt_handler(sig);
return;
}

You're definitely not allowed to elog in a signal handler unless you take
*extraordinary* precautions (basically converting the normal execution path to
only perform async-signal-safe work).

This signal handler can e.g. corrupt the memory context used by elog.c,
deadlock in the libc memory allocator, break the FE/BE protocol if any client
ever set client_min_messages=debug2, jump out of a signal handler in case of
an allocation failure and many other things.

It is possible we have been Doing It Wrong all this time, and would love some pointers on the right way to do this.

All your interrupt handler is doing "for you" is setting
http_interrupt_requested, right? Why do you need that variable? Seems you
could just check postgres' QueryCancelPending? And perhaps ProcDiePending, if
you want to handle termination too.

Greetings,

Andres Freund

#4Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#3)
Re: Converting pqsignal to void return

Hi,

On 2025-01-22 11:50:30 -0500, Andres Freund wrote:

On 2025-01-22 07:57:52 -0800, Paul Ramsey wrote:

It is possible we have been Doing It Wrong all this time, and would love some pointers on the right way to do this.

All your interrupt handler is doing "for you" is setting
http_interrupt_requested, right? Why do you need that variable? Seems you
could just check postgres' QueryCancelPending? And perhaps ProcDiePending, if
you want to handle termination too.

FWIW, the even better approach in this case probably would be to use curl in a
non-blocking manner. Then you can do all the blocking in a loop around
WaitLatchOrSocket (or, for slightly improved efficiency, create a wait event
and wait with WaitEventSetWait()) and react to interrupts with
CHECK_FOR_INTERRUPTS() like normal postgres code.

Greetings,

Andres Freund

#5Paul Ramsey
pramsey@cleverelephant.ca
In reply to: Andres Freund (#3)
Re: Converting pqsignal to void return

On Jan 22, 2025, at 8:50 AM, Andres Freund <andres@anarazel.de> wrote:

It is possible we have been Doing It Wrong all this time, and would love some pointers on the right way to do this.

All your interrupt handler is doing "for you" is setting
http_interrupt_requested, right? Why do you need that variable? Seems you
could just check postgres' QueryCancelPending? And perhaps ProcDiePending, if
you want to handle termination too.

This is the Knowledge that was sitting right in front of me, but I could not see.
From the road to Damascus,
Paul

#6Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#2)
Re: Converting pqsignal to void return

Hi,

On 2025-01-22 10:22:45 -0600, Nathan Bossart wrote:

On Wed, Jan 22, 2025 at 07:57:52AM -0800, Paul Ramsey wrote:

The problem we are having in extension land is that we often run
functions in external libraries that take a long time to return, and we
would like to ensure that PgSQL users can cancel their queries, even when
control has passed into those functions.

The way we have done it, historically, has been to take the return value
of pqsignal(SIGINT, extension_signint_handler) and remember it, and then,
inside extension_signint_handler, call the pgsql handler once we have
done our own business..

I see a couple other projects that might be doing something similar [0].

[0] https://codesearch.debian.net/search?q=%3D+pqsignal&amp;literal=1

Grouping by package I see three packages.

All these seem to not handle backend termination, which certainly doesn't seem
optimal - it e.g. means that a fast shutdown won't really work.

1) postgis

For at least some of the cases it doesn't look trivial to convert to
checking QueryCancelPending/ProcDiePending because the signal handler calls
into GEOS and lwgeom.

2) psql-http

Also doesn't handle termination.

Looks like it could trivially be converted to checking
QueryCancelPending/ProcDiePending.

3) timescaledb

This looks to be test only code where the signal handler only prints out a
message. I'd assume it's not critical to log that message.

IOW, the only worrisome case here is postgis.

Given that we are working towards *not* relying on signals for a lot of this,
I wonder if we ought to support registering callbacks to be called on receipt
of query cancellation and backend termination. That then could e.g. call
GEOS_interruptRequest() as postgis does. That'd have a chance of working in a
threaded postgres too - unfortunately it looks like neither lwgeom's nor
GEOS's interrupt mechanisms are thread safe at this point.

It's worth noting that postgis ends up with a bunch of postgres-internals
knowledge due to its desire to handle signals:
https://github.com/postgis/postgis/blob/master/postgis/postgis_module.c#L51-L56

#ifdef WIN32
static void interruptCallback() {
if (UNBLOCKED_SIGNAL_QUEUE())
pgwin32_dispatch_queued_signals();
}
#endif

Which seems really rather undesirable.

But given how windows signals are currently delivered via the equivalent code
in CHECK_FOR_INTERRUPTS(), I don't really see an alternative at this point :(.

Greetings,

Andres Freund

#7Paul Ramsey
pramsey@cleverelephant.ca
In reply to: Andres Freund (#6)
Re: Converting pqsignal to void return

On Jan 22, 2025, at 9:36 AM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2025-01-22 10:22:45 -0600, Nathan Bossart wrote:

On Wed, Jan 22, 2025 at 07:57:52AM -0800, Paul Ramsey wrote:

The problem we are having in extension land is that we often run
functions in external libraries that take a long time to return, and we
would like to ensure that PgSQL users can cancel their queries, even when
control has passed into those functions.

The way we have done it, historically, has been to take the return value
of pqsignal(SIGINT, extension_signint_handler) and remember it, and then,
inside extension_signint_handler, call the pgsql handler once we have
done our own business..

I see a couple other projects that might be doing something similar [0].

[0] https://codesearch.debian.net/search?q=%3D+pqsignal&amp;literal=1

Grouping by package I see three packages.

So many things for me to atone for! :)

All these seem to not handle backend termination, which certainly doesn't seem
optimal - it e.g. means that a fast shutdown won't really work.

1) postgis

For at least some of the cases it doesn't look trivial to convert to
checking QueryCancelPending/ProcDiePending because the signal handler calls
into GEOS and lwgeom.

We control both those libraries, so we just may need to change them up a little to maybe pass in an callback for their internal check-for-interrupt to call.

2) psql-http

Also doesn't handle termination.
Looks like it could trivially be converted to checking
QueryCancelPending/ProcDiePending.

Going to fix this one first.

3) timescaledb

Not me!

IOW, the only worrisome case here is postgis.
Given that we are working towards *not* relying on signals for a lot of this,
I wonder if we ought to support registering callbacks to be called on receipt
of query cancellation and backend termination. That then could e.g. call
GEOS_interruptRequest() as postgis does. That'd have a chance of working in a
threaded postgres too - unfortunately it looks like neither lwgeom's nor
GEOS's interrupt mechanisms are thread safe at this point.

I think callbacks are a good thing. Two of the libraries of interest to me (curl, GDAL) I end up using the progress meter callback to wedge myself in and force an interrupt as necessary. This seems like a common feature of libraries with long running work.

ATB,

P

Show quoted text

It's worth noting that postgis ends up with a bunch of postgres-internals
knowledge due to its desire to handle signals:
https://github.com/postgis/postgis/blob/master/postgis/postgis_module.c#L51-L56

#ifdef WIN32
static void interruptCallback() {
if (UNBLOCKED_SIGNAL_QUEUE())
pgwin32_dispatch_queued_signals();
}
#endif

Which seems really rather undesirable.

But given how windows signals are currently delivered via the equivalent code
in CHECK_FOR_INTERRUPTS(), I don't really see an alternative at this point :(.

Greetings,

Andres Freund

#8Andres Freund
andres@anarazel.de
In reply to: Paul Ramsey (#7)
Re: Converting pqsignal to void return

Hi,

On 2025-01-22 09:50:18 -0800, Paul Ramsey wrote:

On Jan 22, 2025, at 9:36 AM, Andres Freund <andres@anarazel.de> wrote:
On 2025-01-22 10:22:45 -0600, Nathan Bossart wrote:

On Wed, Jan 22, 2025 at 07:57:52AM -0800, Paul Ramsey wrote:

The problem we are having in extension land is that we often run
functions in external libraries that take a long time to return, and we
would like to ensure that PgSQL users can cancel their queries, even when
control has passed into those functions.

The way we have done it, historically, has been to take the return value
of pqsignal(SIGINT, extension_signint_handler) and remember it, and then,
inside extension_signint_handler, call the pgsql handler once we have
done our own business..

I see a couple other projects that might be doing something similar [0].

[0] https://codesearch.debian.net/search?q=%3D+pqsignal&amp;literal=1

Grouping by package I see three packages.

So many things for me to atone for! :)

Turns out creating good stuff almost invariably leeds to creating not so great
parts of those good things :)

All these seem to not handle backend termination, which certainly doesn't seem
optimal - it e.g. means that a fast shutdown won't really work.

1) postgis

For at least some of the cases it doesn't look trivial to convert to
checking QueryCancelPending/ProcDiePending because the signal handler calls
into GEOS and lwgeom.

We control both those libraries, so we just may need to change them up a
little to maybe pass in an callback for their internal check-for-interrupt
to call.

Yea, makes sense. Looking at the callsites for the interrupt checking, it
doesn't look like a function call during interrupt checking will have a
meaningful performance impact.

2) psql-http

Also doesn't handle termination.
Looks like it could trivially be converted to checking
QueryCancelPending/ProcDiePending.

Going to fix this one first.

Makes sense.

3) timescaledb

Not me!

:)

IOW, the only worrisome case here is postgis.
Given that we are working towards *not* relying on signals for a lot of this,
I wonder if we ought to support registering callbacks to be called on receipt
of query cancellation and backend termination. That then could e.g. call
GEOS_interruptRequest() as postgis does. That'd have a chance of working in a
threaded postgres too - unfortunately it looks like neither lwgeom's nor
GEOS's interrupt mechanisms are thread safe at this point.

I think callbacks are a good thing. Two of the libraries of interest to me
(curl, GDAL) I end up using the progress meter callback to wedge myself in
and force an interrupt as necessary. This seems like a common feature of
libraries with long running work.

As I mentioned earlier, for something like curl, where afaict you'd want to
accept interrupts while waiting for network IO, I think it's better to convert
to non-blocking use of such APIs and checking interrupt in the event
loop. It's been a while, but I have done that with curl, and it wasn't
particularly hard.

That approach imo is already cleaner today and I think it's what we'll
eventually require for at least a bunch of architectural improvements:

- faster query execution (so that execution can switch to another part of the
query tree while blocked on network IO)

- connection scalability (if there's no 1:1 mapping between sessions and
processes/threads, threads have to switch between executing queries for
different sessions)

- to benefit from asynchronous completion-based FE/BE network handling.

But of course all of these are all quite a bit away.

Obviously event based loops are not a viable approach for wanting to check for
interrupts in, what I assume to be CPU bound, cases like GDAL/lwgeom.

If you have a "progress meter callback", I don't think there's really a
benefit in postgres providing a way to register a callback when receiving a
cancellation / termination event?

Greetings,

Andres Freund

#9Michael Paquier
michael@paquier.xyz
In reply to: Paul Ramsey (#5)
Re: Converting pqsignal to void return

On Wed, Jan 22, 2025 at 09:01:09AM -0800, Paul Ramsey wrote:

This is the Knowledge that was sitting right in front of me, but I could not see.
From the road to Damascus,

Note also that there is some documentation showing how to write a
signal handler and what should not be done:
https://www.postgresql.org/docs/devel/source-conventions.html#SOURCE-CONVENTIONS-SIGNAL-HANDLERS

See also 0da096d78e1e, in the same spirit.
--
Michael