Make query cancellation keys longer
Currently, the cancel request key is a 32-bit token, which isn't very much
entropy. If you want to cancel another session's query, you can
brute-force it. In most environments, an unauthorized cancellation of a
query isn't very serious, but it nevertheless would be nice to have more
protection from it. The attached patch makes it longer. It is an
optional protocol feature, so it's fully backwards-compatible with
clients that don't support longer keys.
If the client requests the "_pq_.extended_query_cancel" protocol
feature, the server will generate a longer 256-bit cancellation key.
However, the new longer key length is no longer hardcoded in the
protocol. The client is expected to deal with variable length keys, up
to some reasonable upper limit (TODO: document the maximum). This
flexibility allows e.g. a connection pooler to add more information to
the cancel key, which could be useful. If the client doesn't request the
protocol feature, the server generates a 32-bit key like before.
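As a sketch of that idea (illustrative names only, not the patch itself; fill_random() stands in for a real CSPRNG, which rand() is not), the length selection might look like:

```c
#include <stdlib.h>

#define MAX_CANCEL_KEY_LENGTH 32    /* 256 bits */

/*
 * Illustrative sketch: pick the cancel key length based on whether the
 * client negotiated the longer-key protocol feature, then fill the key
 * with random bytes.  fill_random() is a placeholder for a proper
 * cryptographically secure random source.
 */
static void
fill_random(unsigned char *buf, int len)
{
	for (int i = 0; i < len; i++)
		buf[i] = (unsigned char) (rand() & 0xFF);
}

static int
generate_cancel_key(unsigned char *key, int extended)
{
	int			len = extended ? MAX_CANCEL_KEY_LENGTH : 4;

	fill_random(key, len);
	return len;
}
```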
One complication with this was that because we no longer know how long
the key should be, 4-bytes or something longer, until the backend has
performed the protocol negotiation, we cannot generate the key in the
postmaster before forking the process anymore. The first patch here
changes things so that the cancellation key is generated later, in the
backend, and the backend advertises the key in the PMSignalState array.
This is similar to how this has always worked in EXEC_BACKEND mode with
the ShmemBackendArray, but instead of having a separate array, I added
fields to the PMSignalState slots. This removes a bunch of
EXEC_BACKEND-specific code, which is nice.
Any thoughts on this? Documentation is still missing, and there's one
TODO on adding a portable time-constant memcmp() function; I'll add
those if there's agreement on this otherwise.
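For reference, a portable time-constant comparison along the lines of that TODO could look like this (a sketch, not the patch's actual code; the function name is hypothetical):

```c
#include <stddef.h>

/*
 * Sketch of a constant-time comparison for cancel keys: every byte is
 * examined regardless of where the first mismatch occurs, so response
 * timing does not leak the position of the difference.
 * Returns 0 if equal, 1 otherwise.
 */
static int
timingsafe_memcmp(const void *a, const void *b, size_t len)
{
	const unsigned char *pa = a;
	const unsigned char *pb = b;
	unsigned char diff = 0;

	for (size_t i = 0; i < len; i++)
		diff |= pa[i] ^ pb[i];	/* accumulate differences without branching */

	return diff != 0;
}
```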
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
0002-Make-cancel-request-keys-longer-as-an-optional-proto.patch (text/x-patch, +148 -73)
0001-Move-cancel-key-generation-to-after-forking-the-back.patch (text/x-patch, +116 -228)
This is a preliminary review. I'll look at this more closely soon.
On Thu, 29 Feb 2024 at 22:26, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
If the client requests the "_pq_.extended_query_cancel" protocol
feature, the server will generate a longer 256-bit cancellation key.
Huge +1 for this general idea. This is a problem I ran into with
PgBouncer, and had to make some concessions when fitting the
information I wanted into the available bits. Also from a security
perspective I don't think the current amount of bits have stood the
test of time.
+ ADD_STARTUP_OPTION("_pq_.extended_query_cancel", "");
Since this parameter doesn't actually take a value (just empty
string). I think this behaviour makes more sense as a minor protocol
version bump instead of a parameter.
+ if (strcmp(conn->workBuffer.data, "_pq_.extended_query_cancel") == 0)
+ {
+ /* that's ok */
+ continue;
+ }
Please see this thread[1], which in the first few patches makes
pqGetNegotiateProtocolVersion3 actually usable for extending the
protocol. I started that because every protocol change that's
proposed on the list has similar changes to
pqGetNegotiateProtocolVersion3, and I think we shouldn't hack these
changes ad-hoc into the current code, but actually do them once
in a way that makes sense for all protocol changes.
Documentation is still missing
I think at least protocol message type documentation would be very
helpful in reviewing, because that is really a core part of this
change. Based on the current code I think it should have a few
changes:
+ int cancel_key_len = 5 + msgLength - (conn->inCursor - conn->inStart);
+
+ conn->be_cancel_key = malloc(cancel_key_len);
+ if (conn->be_cancel_key == NULL)
This is using the message length to determine the length of the cancel
key in BackendKey. That is not something we generally do in the
protocol. It's even documented: "Notice that although each message
includes a byte count at the beginning, the message format is defined
so that the message end can be found without reference to the byte
count." So I think the patch should be changed to include the length
of the cancel key explicitly in the message.
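The framing being suggested could be sketched as follows (names and layout are illustrative only, not the final wire format): the cancel key carries its own big-endian length field, so a parser never has to infer the key length from the overall message length.

```c
#include <stdint.h>
#include <string.h>

/* Write a length-prefixed cancel key into buf; returns bytes written. */
static int
write_cancel_key(uint8_t *buf, const uint8_t *key, uint32_t key_len)
{
	buf[0] = (uint8_t) (key_len >> 24);	/* explicit length, big-endian */
	buf[1] = (uint8_t) (key_len >> 16);
	buf[2] = (uint8_t) (key_len >> 8);
	buf[3] = (uint8_t) key_len;
	memcpy(buf + 4, key, key_len);
	return 4 + key_len;
}

/* Parse the key back out; returns the key length. */
static uint32_t
read_cancel_key(const uint8_t *buf, uint8_t *key_out)
{
	uint32_t	key_len = ((uint32_t) buf[0] << 24) |
						  ((uint32_t) buf[1] << 16) |
						  ((uint32_t) buf[2] << 8) |
						  buf[3];

	memcpy(key_out, buf + 4, key_len);
	return key_len;
}
```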
[1]: /messages/by-id/CAGECzQSr2=JPJHNN06E_jTF2+0E60K=hotyBw5wY=q9Wvmt7DQ@mail.gmail.com
On 29.02.24 22:25, Heikki Linnakangas wrote:
Currently, the cancel request key is a 32-bit token, which isn't very much
entropy. If you want to cancel another session's query, you can
brute-force it. In most environments, an unauthorized cancellation of a
query isn't very serious, but it nevertheless would be nice to have more
protection from it. The attached patch makes it longer. It is an
optional protocol feature, so it's fully backwards-compatible with
clients that don't support longer keys.
My intuition would be to make this a protocol version bump, not an
optional feature. I think this is something that everyone should
eventually be using, not a niche feature that you explicitly want to
opt-in for.
One complication with this was that because we no longer know how long
the key should be, 4-bytes or something longer, until the backend has
performed the protocol negotiation, we cannot generate the key in the
postmaster before forking the process anymore.
Maybe this would be easier if it's a protocol version number change,
since that is sent earlier than protocol extensions?
On Fri, 1 Mar 2024 at 15:19, Peter Eisentraut <peter@eisentraut.org> wrote:
One complication with this was that because we no longer know how long
the key should be, 4-bytes or something longer, until the backend has
performed the protocol negotiation, we cannot generate the key in the
postmaster before forking the process anymore.
Maybe this would be easier if it's a protocol version number change,
since that is sent earlier than protocol extensions?
Protocol version and protocol extensions are both sent in the
StartupMessage, so the same complication applies. (But I do agree that
a protocol version bump is more appropriate for this type of change)
On Fri, 1 Mar 2024 at 06:19, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
This is a preliminary review. I'll look at this more closely soon.
Some more thoughts after looking some more at the proposed changes
+#define EXTENDED_CANCEL_REQUEST_CODE PG_PROTOCOL(1234,5677)
nit: I think the code should be 1234,5679 because it's the newer
message type, so to me it makes more sense if the code is a larger
number.
+ * FIXME: we used to use signal_child. I believe kill() is
+ * maybe even more correct, but verify that.
signal_child seems to be the correct one, not kill. signal_child has
this relevant comment explaining why it's better than plain kill:
* On systems that have setsid(), each child process sets itself up as a
* process group leader. For signals that are generally interpreted in the
* appropriate fashion, we signal the entire process group not just the
* direct child process. This allows us to, for example, SIGQUIT a blocked
* archive_recovery script, or SIGINT a script being run by a backend via
* system().
+SendCancelRequest(int backendPID, int32 cancelAuthCode)
I think this name of the function is quite confusing, it's not sending
a cancel request, it is processing one. It sends a SIGINT.
While we're at it, switch to using atomics in pmsignal.c for the state
field. That feels easier to reason about than volatile
pointers.
I feel like this refactor would benefit from being a separate commit.
That would make it easier to follow which change to pmsignal.c is
necessary for what.
+ MyCancelKeyLength = (MyProcPort != NULL &&
MyProcPort->extended_query_cancel) ? MAX_CANCEL_KEY_LENGTH : 4;
I think we should be doing this check the opposite way, i.e. only fall
back to the smaller key when explicitly requested:
+ MyCancelKeyLength = (MyProcPort != NULL &&
MyProcPort->old_query_cancel) ? 4 : MAX_CANCEL_KEY_LENGTH;
That way we'd get the more secure, longer key length for non-backend
processes such as background workers.
+ case EOF:
+ /* We'll come back when there is more data */
+ return PGRES_POLLING_READING;
Nice catch, I'll go steal this for my patchset which adds all the
necessary changes to be able to do a protocol bump[1].
+ int be_pid; /* PID of backend --- needed for XX cancels */
Seems like you accidentally added XX to the comment in this line.
[1]: /messages/by-id/CAGECzQSr2=JPJHNN06E_jTF2+0E60K=hotyBw5wY=q9Wvmt7DQ@mail.gmail.com
On Sun, 3 Mar 2024 at 15:27, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
+ case EOF:
+ /* We'll come back when there is more data */
+ return PGRES_POLLING_READING;
Nice catch, I'll go steal this for my patchset which adds all the
necessary changes to be able to do a protocol bump[1].
Actually, it turns out your change to return PGRES_POLLING_READING on
EOF is incorrect (afaict). A little bit above there is this code
comment above a check to see if the whole body was received:
* Can't process if message body isn't all here yet.
*
* After this check passes, any further EOF during parsing
* implies that the server sent a bad/truncated message.
* Reading more bytes won't help in that case, so don't return
* PGRES_POLLING_READING after this point.
So I'll leave my patchset as is.
On Fri, Mar 1, 2024 at 03:19:23PM +0100, Peter Eisentraut wrote:
On 29.02.24 22:25, Heikki Linnakangas wrote:
Currently, the cancel request key is a 32-bit token, which isn't very much
entropy. If you want to cancel another session's query, you can
brute-force it. In most environments, an unauthorized cancellation of a
query isn't very serious, but it nevertheless would be nice to have more
protection from it. The attached patch makes it longer. It is an
optional protocol feature, so it's fully backwards-compatible with
clients that don't support longer keys.
My intuition would be to make this a protocol version bump, not an optional
feature. I think this is something that everyone should eventually be
using, not a niche feature that you explicitly want to opt-in for.
Agreed.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Only you can decide what is important to you.
On 01/03/2024 07:19, Jelte Fennema-Nio wrote:
I think this behaviour makes more sense as a minor protocol version
bump instead of a parameter.
Ok, the consensus is to bump the minor protocol version for this. Works
for me.
+ if (strcmp(conn->workBuffer.data, "_pq_.extended_query_cancel") == 0)
+ {
+ /* that's ok */
+ continue;
+ }
Please see this thread[1], which in the first few patches makes
pqGetNegotiateProtocolVersion3 actually usable for extending the
protocol. I started that because every protocol change that's
proposed on the list has similar changes to
pqGetNegotiateProtocolVersion3, and I think we shouldn't hack these
changes ad-hoc into the current code, but actually do them once
in a way that makes sense for all protocol changes.
Thanks, I will take a look and respond on that thread.
Documentation is still missing
I think at least protocol message type documentation would be very
helpful in reviewing, because that is really a core part of this
change.
Added some documentation. There's more work to be done there, but at
least the message type descriptions are now up-to-date.
Based on the current code I think it should have a few changes:
+ int cancel_key_len = 5 + msgLength - (conn->inCursor - conn->inStart);
+
+ conn->be_cancel_key = malloc(cancel_key_len);
+ if (conn->be_cancel_key == NULL)
This is using the message length to determine the length of the cancel
key in BackendKey. That is not something we generally do in the
protocol. It's even documented: "Notice that although each message
includes a byte count at the beginning, the message format is defined
so that the message end can be found without reference to the byte
count." So I think the patch should be changed to include the length
of the cancel key explicitly in the message.
The nice thing about relying on the message length was that we could
just redefine the CancelRequest message to have a variable length
secret, and old CancelRequest with 4-byte secret was compatible with the
new definition too. But it doesn't matter much, so I added an explicit
length field.
FWIW I don't think that restriction makes sense. Any code that parses
the messages needs to have the message length available, and I don't
think it helps with sanity checking that much. I think the documentation
is a little anachronistic. The real reason that all the message types
include enough information to find the message end is that in protocol
version 2, there was no message length field. The only exception that
doesn't have that property is CopyData, and it's no coincidence that it
was added in protocol version 3.
On 03/03/2024 16:27, Jelte Fennema-Nio wrote:
On Fri, 1 Mar 2024 at 06:19, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
This is a preliminary review. I'll look at this more closely soon.
Some more thoughts after looking some more at the proposed changes
+#define EXTENDED_CANCEL_REQUEST_CODE PG_PROTOCOL(1234,5677)
nit: I think the code should be 1234,5679 because it's the newer
message type, so to me it makes more sense if the code is a larger
number.
Unfortunately 1234,5679 is already in use by NEGOTIATE_SSL_CODE. That's why
I went the other direction. If we want to add this to "the end", it
needs to be 1234,5681. But I wanted to keep the cancel requests together.
+ * FIXME: we used to use signal_child. I believe kill() is
+ * maybe even more correct, but verify that.
signal_child seems to be the correct one, not kill. signal_child has
this relevant comment explaining why it's better than plain kill:
* On systems that have setsid(), each child process sets itself up as a
* process group leader. For signals that are generally interpreted in the
* appropriate fashion, we signal the entire process group not just the
* direct child process. This allows us to, for example, SIGQUIT a blocked
* archive_recovery script, or SIGINT a script being run by a backend via
* system().
I changed it to signal the process group if HAVE_SETSID, like
pg_signal_backend() does. We don't need to signal both the process group
and the process itself like signal_child() does, because we don't have
the race condition with recently-forked children that signal_child()
talks about.
+SendCancelRequest(int backendPID, int32 cancelAuthCode)
I think this name of the function is quite confusing, it's not sending
a cancel request, it is processing one. It sends a SIGINT.
Heh, well, it's sending the cancel request signal to the right backend,
but I see your point. Renamed to ProcessCancelRequest.
While we're at it, switch to using atomics in pmsignal.c for the state
field. That feels easier to reason about than volatile
pointers.
I feel like this refactor would benefit from being a separate commit.
That would make it easier to follow which change to pmsignal.c is
necessary for what.
Point taken. I didn't do that yet, but it makes sense.
+ MyCancelKeyLength = (MyProcPort != NULL &&
MyProcPort->extended_query_cancel) ? MAX_CANCEL_KEY_LENGTH : 4;
I think we should be doing this check the opposite way, i.e. only fall
back to the smaller key when explicitly requested:
+ MyCancelKeyLength = (MyProcPort != NULL &&
MyProcPort->old_query_cancel) ? 4 : MAX_CANCEL_KEY_LENGTH;
That way we'd get the more secure, longer key length for non-backend
processes such as background workers.
+1, fixed.
On 03/03/2024 19:27, Jelte Fennema-Nio wrote:
On Sun, 3 Mar 2024 at 15:27, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
+ case EOF:
+ /* We'll come back when there is more data */
+ return PGRES_POLLING_READING;
Nice catch, I'll go steal this for my patchset which adds all the
necessary changes to be able to do a protocol bump[1].
Actually, it turns out your change to return PGRES_POLLING_READING on
EOF is incorrect (afaict). A little bit above there is this code
comment above a check to see if the whole body was received:
* Can't process if message body isn't all here yet.
*
* After this check passes, any further EOF during parsing
* implies that the server sent a bad/truncated message.
* Reading more bytes won't help in that case, so don't return
* PGRES_POLLING_READING after this point.
So I'll leave my patchset as is.
Yep, thanks.
One consequence of this patch that I didn't mention earlier is that it
makes libpq incompatible with server version 9.2 and below, because the
minor version negotiation was introduced in version 9.3. We could teach
libpq to disconnect and reconnect with minor protocol version 3.0, if
connecting with 3.1 fails, but IMHO it's better to not complicate this
and accept the break in backwards-compatibility. 9.3 was released in
2013. We dropped pg_dump support for versions older than 9.2 a few years
ago; this raises the bar for pg_dump to 9.3 as well.
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
v2-0001-Move-cancel-key-generation-to-after-forking-the-b.patch (text/x-patch, +116 -231)
v2-0002-Make-cancel-request-keys-longer-bump-minor-protoc.patch (text/x-patch, +366 -82)
On Fri, 8 Mar 2024 at 23:20, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Added some documentation. There's more work to be done there, but at
least the message type descriptions are now up-to-date.
Thanks, that's very helpful.
The nice thing about relying on the message length was that we could
just redefine the CancelRequest message to have a variable length
secret, and old CancelRequest with 4-byte secret was compatible with the
new definition too. But it doesn't matter much, so I added an explicit
length field.
FWIW I don't think that restriction makes sense. Any code that parses
the messages needs to have the message length available, and I don't
think it helps with sanity checking that much. I think the documentation
is a little anachronistic. The real reason that all the message types
include enough information to find the message end is that in protocol
version 2, there was no message length field. The only exception that
doesn't have that property is CopyData, and it's no coincidence that it
was added in protocol version 3.
Hmm, looking at the current code, I do agree that not introducing a
new message would simplify both client and server implementation. Now
clients need to do different things depending on if the server
supports 3.1 or 3.0 (sending CancelRequestExtended instead of
CancelRequest and having to parse BackendKeyData differently). And I
also agree that the extra length field doesn't add much in regards to
sanity checking (for the CancelRequest and BackendKeyData message
types at least). So, sorry for the back and forth on this, but I now
agree with you that we should not add the length field. I think one
reason I didn't see the benefit before was because the initial patch
0002 was still introducing a CancelRequestExtended message type. If we
can get rid of this message type by not adding a length, then I think
that's worth losing the sanity checking.
Unfortunately 1234,5679 is already in use by NEGOTIATE_SSL_CODE. That's why
I went the other direction. If we want to add this to "the end", it
needs to be 1234,5681. But I wanted to keep the cancel requests together.
Fair enough, I didn't realise that. This whole point is moot anyway if
we're not introducing CancelRequestExtended.
We could teach
libpq to disconnect and reconnect with minor protocol version 3.0, if
connecting with 3.1 fails, but IMHO it's better to not complicate this
and accept the break in backwards-compatibility.
Yeah, implementing automatic reconnection seems a bit overkill to me
too. But it might be nice to add a connection option that causes libpq
to use protocol 3.0. Having to install an old libpq to connect to an
old server seems quite annoying. Especially since I think that many
other types of servers that implement the postgres protocol have not
implemented the minor version negotiation.
I at least know PgBouncer[1] and pgcat[2] have not, but probably other
server implementations like CockroachDB and Google Spanner have this
problem too.
I'll try to add such a fallback connection option to my patchset soon.
[1]: https://github.com/pgbouncer/pgbouncer/pull/1007
[2]: https://github.com/postgresml/pgcat/issues/706
On Fri, Mar 8, 2024 at 5:20 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
The nice thing about relying on the message length was that we could
just redefine the CancelRequest message to have a variable length
secret, and old CancelRequest with 4-byte secret was compatible with the
new definition too. But it doesn't matter much, so I added an explicit
length field.
I think I liked your original idea better than this one.
One consequence of this patch that I didn't mention earlier is that it
makes libpq incompatible with server version 9.2 and below, because the
minor version negotiation was introduced in version 9.3. We could teach
libpq to disconnect and reconnect with minor protocol version 3.0, if
connecting with 3.1 fails, but IMHO it's better to not complicate this
and accept the break in backwards-compatibility. 9.3 was released in
2013. We dropped pg_dump support for versions older than 9.2 a few years
ago; this raises the bar for pg_dump to 9.3 as well.
I think we shouldn't underestimate the impact of bumping the minor
protocol version. Minor version negotiation is probably not supported
by all drivers and Jelte says that it's not supported by any poolers,
so for anybody but direct libpq users, there will be some breakage.
Now, on the one hand, as Jelte has said, there's little value in
having a protocol version if we're too afraid to make use of it. But
on the other hand, is this problem serious enough to justify the
breakage we'll cause? I'm not sure. It seems pretty silly to be using
a 32-bit value for this in 2024, but I'm sure some people aren't going
to like it, and they may not all have noticed this thread. On the
third hand, if we do this, it may help to unblock a bunch of other
pending patches that also want to do protocol-related things.
--
Robert Haas
EDB: http://www.enterprisedb.com
Here's a new version of the first patch. In the previous version, I
added the pid cancellation key to pmsignal.c, but on second thoughts, I
think procsignal.c is a better place. The ProcSignal array already
contains the pid, we just need to add the cancellation key there.
This first patch just refactors the current code, without changing the
protocol or length of the cancellation key. I'd like to get this
reviewed and committed first, and get back to the protocol changes after
that.
We currently don't do any locking on the ProcSignal array. For query
cancellations, that's good because a query cancel packet is processed
without having a PGPROC entry, so we cannot take LWLocks. We could use
spinlocks though. In this patch, I used memory barriers to ensure that
we load/store the pid and the cancellation key in a sensible order, so
that you cannot e.g. send a cancellation signal to a backend that's just
starting up and hasn't advertised its cancellation key in ProcSignal
yet. But I think this might be simpler and less error-prone by just
adding a spinlock to each ProcSignal slot. That would also fix the
existing race condition where we might set the pss_signalFlags flag for
a slot, when the process concurrently terminates and the slot is reused
for a different process. Because of that, we currently have this caveat:
"... all the signals are such that no harm is done if they're mistakenly
fired". With a spinlock, we could eliminate that race.
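A toy model of the publication ordering described above (field and function names are illustrative only, not the actual ProcSignal layout; shown here with C11 release/acquire atomics rather than the patch's explicit barriers):

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * The backend stores its cancel key first and only then publishes its
 * pid with release semantics.  A reader that observes the pid with
 * acquire semantics is therefore guaranteed to also see the key, so it
 * cannot signal a backend that hasn't advertised its key yet.
 */
typedef struct
{
	_Atomic int pid;			/* 0 means "slot not in use" */
	uint64_t	cancel_key;
} CancelSlot;

static void
publish_slot(CancelSlot *slot, int pid, uint64_t key)
{
	slot->cancel_key = key;		/* plain store... */
	atomic_store_explicit(&slot->pid, pid,
						  memory_order_release);	/* ...then publish */
}

static int
lookup_key(CancelSlot *slot, int pid, uint64_t *key)
{
	if (atomic_load_explicit(&slot->pid, memory_order_acquire) != pid)
		return 0;				/* slot empty or owned by someone else */
	*key = slot->cancel_key;
	return 1;
}
```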
Thoughts?
--
Heikki Linnakangas
Neon (https://neon.tech)
On 04/07/2024 13:32, Heikki Linnakangas wrote:
Here's a new version of the first patch.
Sorry, forgot attachment.
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
v3-0001-Move-cancel-key-generation-to-after-forking-the-b.patch (text/x-patch, +113 -226)
On Thu, 4 Jul 2024 at 12:35, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 04/07/2024 13:32, Heikki Linnakangas wrote:
Here's a new version of the first patch.
Sorry, forgot attachment.
It seems you undid the following earlier change. Was that on purpose?
If not, did you undo any other earlier changes by accident?
+SendCancelRequest(int backendPID, int32 cancelAuthCode)
I think this name of the function is quite confusing, it's not sending
a cancel request, it is processing one. It sends a SIGINT.
Heh, well, it's sending the cancel request signal to the right backend,
but I see your point. Renamed to ProcessCancelRequest.
On 04/07/2024 13:50, Jelte Fennema-Nio wrote:
On Thu, 4 Jul 2024 at 12:35, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 04/07/2024 13:32, Heikki Linnakangas wrote:
Here's a new version of the first patch.
Sorry, forgot attachment.
It seems you undid the following earlier change. Was that on purpose?
If not, did you undo any other earlier changes by accident?
+SendCancelRequest(int backendPID, int32 cancelAuthCode)
I think this name of the function is quite confusing, it's not sending
a cancel request, it is processing one. It sends a SIGINT.
Heh, well, it's sending the cancel request signal to the right backend,
but I see your point. Renamed to ProcessCancelRequest.
Ah, I made that change as part of the second patch earlier, so I didn't
consider it now.
I don't feel strongly about it, but I think SendCancelRequest() actually
feels a little better, in procsignal.c. It's more consistent with the
existing SendProcSignal() function.
There was indeed another change in the second patch that I missed:
+ /* If we have setsid(), signal the backend's whole process group */
+#ifdef HAVE_SETSID
+ kill(-backendPID, SIGINT);
+#else
kill(backendPID, SIGINT);
+#endif
I'm not sure that's really required, when sending SIGINT to a backend
process. A backend process shouldn't have any child processes, and if it
did, it's not clear what good SIGINT will do them. But I guess it makes
sense to do it anyway, for consistency with pg_cancel_backend(), which
also signals the whole process group.
--
Heikki Linnakangas
Neon (https://neon.tech)
On Thu, 4 Jul 2024 at 12:32, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
We currently don't do any locking on the ProcSignal array. For query
cancellations, that's good because a query cancel packet is processed
without having a PGPROC entry, so we cannot take LWLocks. We could use
spinlocks though. In this patch, I used memory barriers to ensure that
we load/store the pid and the cancellation key in a sensible order, so
that you cannot e.g. send a cancellation signal to a backend that's just
starting up and hasn't advertised its cancellation key in ProcSignal
yet. But I think this might be simpler and less error-prone by just
adding a spinlock to each ProcSignal slot. That would also fix the
existing race condition where we might set the pss_signalFlags flag for
a slot, when the process concurrently terminates and the slot is reused
for a different process. Because of that, we currently have this caveat:
"... all the signals are such that no harm is done if they're mistakenly
fired". With a spinlock, we could eliminate that race.
I think a spinlock would make this whole concurrency stuff a
lot easier to reason about.
+ slot->pss_cancel_key_valid = false;
+ slot->pss_cancel_key = 0;
If no spinlock is added, I think these accesses should still be made
atomic writes. Otherwise data-races on those fields are still
possible, resulting in undefined behaviour. The memory barriers you
added don't prevent that afaict. With atomic operations there are
still race conditions, but no data-races.
Actually it seems like that same argument applies to the already
existing reading/writing of pss_pid: it's written/read using
non-atomic operations so data-races can occur and thus undefined
behaviour too.
- volatile pid_t pss_pid;
+ pid_t pss_pid;
Why remove the volatile modifier?
Hi,
I don't have any immediate feedback regarding this patch, but I'm
wondering about one thing related to cancellations - we talk cancelling
a query, but we really target a PID (or a particular backend, no matter
how we identify it).
I occasionally want to only cancel a particular query, but I don't think
that's really possible - the query may complete meanwhile, and the
backend may even get used for a different user connection (e.g. with a
connection pool)? Or am I missing something important?
Anyway, I wonder if making the cancellation key longer (or variable
length) might be useful for this too - it would allow including some
sort of optional "query ID" in the request, not just the PID. (Maybe
st_xact_start_timestamp would work?)
Obviously, that's not up to this patch, but it's somewhat related.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, 4 Jul 2024 at 14:43, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
I don't have any immediate feedback regarding this patch, but I'm
wondering about one thing related to cancellations - we talk cancelling
a query, but we really target a PID (or a particular backend, no matter
how we identify it).I occasionally want to only cancel a particular query, but I don't think
that's really possible - the query may complete meanwhile, and the
backend may even get used for a different user connection (e.g. with a
connection pool)? Or am I missing something important?
No, you're not missing anything. Having the target of the cancel
request be the backend instead of a specific query is really annoying
and can cause all kinds of race conditions. I had to redesign and
complicate the cancellation logic in PgBouncer significantly, to make
sure that one client could not cancel a connection from another client
anymore: https://github.com/pgbouncer/pgbouncer/pull/717
Anyway, I wonder if making the cancellation key longer (or variable
length) might be useful for this too - it would allow including some
sort of optional "query ID" in the request, not just the PID. (Maybe
st_xact_start_timestamp would work?)
Yeah, some query ID would be necessary. I think we'd want a dedicated
field for it though. Instead of encoding it in the secret. Getting the
xact_start_timestamp would require communication with the server to
get it, which you would like to avoid since the server might be
unresponsive. So I think a command counter that both sides keep track
of would be better. This counter could then be incremented after every
Query and Sync message.
Obviously, that's not up to this patch, but it's somewhat related.
Yeah, let's postpone more discussion on this until we have the
currently proposed much simpler change in, which only changes the
secret length.
On 04/07/2024 15:20, Jelte Fennema-Nio wrote:
On Thu, 4 Jul 2024 at 12:32, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
We currently don't do any locking on the ProcSignal array. For query
cancellations, that's good because a query cancel packet is processed
without having a PGPROC entry, so we cannot take LWLocks. We could use
spinlocks though. In this patch, I used memory barriers to ensure that
we load/store the pid and the cancellation key in a sensible order, so
that you cannot e.g. send a cancellation signal to a backend that's just
starting up and hasn't advertised its cancellation key in ProcSignal
yet. But I think this might be simpler and less error-prone by just
adding a spinlock to each ProcSignal slot. That would also fix the
existing race condition where we might set the pss_signalFlags flag for
a slot, when the process concurrently terminates and the slot is reused
for a different process. Because of that, we currently have this caveat:
"... all the signals are such that no harm is done if they're mistakenly
fired". With a spinlock, we could eliminate that race.

I think a spinlock would make this whole concurrency stuff a lot easier
to reason about.

+ slot->pss_cancel_key_valid = false;
+ slot->pss_cancel_key = 0;

If no spinlock is added, I think these accesses should still be made
atomic writes. Otherwise data races on those fields are still possible,
resulting in undefined behaviour. The memory barriers you added don't
prevent that AFAICT. With atomic operations there are still race
conditions, but no data races.

Actually, it seems like the same argument applies to the already
existing reading/writing of pss_pid: it's written/read using non-atomic
operations, so data races can occur, and thus undefined behaviour too.
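For illustration, here's a self-contained sketch of the "atomic writes
instead of plain writes" point, using C11 atomics rather than
PostgreSQL's pg_atomic_* API; the struct and function names are made up
here, loosely modeled on the ProcSignal slot fields:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Plain concurrent reads/writes of fields like pss_pid or
 * pss_cancel_key are data races (undefined behaviour in C).  Relaxed
 * atomics still allow the benign race conditions discussed above, but
 * every individual load and store is well-defined.
 */
typedef struct
{
	_Atomic int32_t pid;		/* stands in for pss_pid */
	_Atomic bool	cancel_key_valid;
	_Atomic int32_t cancel_key;
} SlotSketch;

static void
slot_advertise(SlotSketch *slot, int32_t pid, int32_t key)
{
	atomic_store(&slot->cancel_key, key);
	atomic_store(&slot->cancel_key_valid, true);
	atomic_store(&slot->pid, pid);
}

static void
slot_clear(SlotSketch *slot)
{
	atomic_store(&slot->cancel_key_valid, false);
	atomic_store(&slot->cancel_key, 0);
	atomic_store(&slot->pid, 0);
}

static bool
slot_key_matches(SlotSketch *slot, int32_t key)
{
	return atomic_load(&slot->cancel_key_valid) &&
		atomic_load(&slot->cancel_key) == key;
}
```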
Ok, here's a version with spinlocks.
I went back and forth on what exactly is protected by the spinlock. I
kept the "volatile sig_atomic_t" type for pss_signalFlags, so that it
can still be safely read without holding the spinlock in
CheckProcSignal, but all the functions that set the flags now hold the
spinlock. That removes the race condition where you might set the flag
for the wrong slot, if the target backend exits and the slot is recycled.
The race condition was harmless and there were comments to note it, but
it doesn't occur anymore with the spinlock.
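As a rough illustration of that scheme (using a pthread mutex as a
portable stand-in for PostgreSQL's slock_t / SpinLockAcquire(), with
invented names), the setter checks the pid under the lock so it cannot
hit a recycled slot:

```c
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * The flags stay volatile sig_atomic_t so the signal-handler side can
 * read them without taking the lock; setters take the lock so the pid
 * check and the flag write happen atomically with respect to slot
 * recycling.
 */
typedef struct
{
	pthread_mutex_t mutex;		/* stand-in for a per-slot spinlock */
	int32_t		pid;			/* 0 = slot free */
	volatile sig_atomic_t signal_flags;
} LockedSlot;

/*
 * Set a flag only if the slot still belongs to the expected pid;
 * returns false if the slot was recycled in the meantime.
 */
static bool
set_flag_for_pid(LockedSlot *slot, int32_t pid, int flag)
{
	bool		ok = false;

	pthread_mutex_lock(&slot->mutex);
	if (slot->pid == pid)
	{
		slot->signal_flags |= (1 << flag);
		ok = true;
	}
	pthread_mutex_unlock(&slot->mutex);
	return ok;
}
```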
(Note: Thomas's "Interrupts vs signals" patch will remove pss_signalFlags
altogether. I'm looking forward to that.)
- volatile pid_t pss_pid;
+ pid_t pss_pid;

Why remove the volatile modifier?
Because I introduced a memory barrier to ensure that the reads/writes of
pss_pid become visible to other processes in the right order. That makes
the 'volatile' unnecessary, IIUC. With the spinlock, the point is moot,
however.
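For the record, the barrier-based ordering works roughly like this
sketch, written with C11 release/acquire semantics as a self-contained
stand-in for pg_write_barrier()/pg_read_barrier() (names invented):

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * The writer publishes the key before the pid; a reader that observes
 * the pid is then guaranteed to observe the key too, so it can never
 * see a slot with a pid but a stale/uninitialized cancel key.
 */
typedef struct
{
	int32_t		cancel_key;
	_Atomic int32_t pid;
} PublishSketch;

static void
advertise(PublishSketch *slot, int32_t pid, int32_t key)
{
	slot->cancel_key = key;
	/* release: the key store cannot be reordered after the pid store */
	atomic_store_explicit(&slot->pid, pid, memory_order_release);
}

/* Returns the key if the slot belongs to expected_pid, else -1. */
static int32_t
read_key_if_pid(PublishSketch *slot, int32_t expected_pid)
{
	/* acquire: pairs with the release store in advertise() */
	if (atomic_load_explicit(&slot->pid, memory_order_acquire) == expected_pid)
		return slot->cancel_key;
	return -1;
}
```

With the per-slot spinlock both the store side and the load side simply
hold the lock instead, which is easier to reason about.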
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
v4-0001-Move-cancel-key-generation-to-after-forking-the-b.patchtext/x-patch; charset=UTF-8; name=v4-0001-Move-cancel-key-generation-to-after-forking-the-b.patchDownload+175-261
On 24/07/2024 19:12, Heikki Linnakangas wrote:
I:
- fixed a few comments,
- fixed a straightforward failure with EXEC_BACKEND (ProcSignal needs to
be passed down in BackendParameters now),
- put back the snippet to signal the whole process group if supported,
which you pointed out earlier
and committed this refactoring patch.
Thanks for the review!
--
Heikki Linnakangas
Neon (https://neon.tech)
+ * See if we have a matching backend. Reading the pss_pid and
+ * pss_cancel_key fields is racy; a backend might die and remove itself
+ * from the array at any time. The probability of the cancellation key
+ * matching the wrong process is minuscule, however, so we can live with
+ * that. PIDs are reused too, so sending the signal based on PID is
+ * inherently racy anyway, although OSes avoid reusing PIDs too soon.
Just BTW, we know that Windows sometimes recycles PIDs very soon,
sometimes even immediately, to the surprise of us Unix hackers. It can
make for some very confusing build farm animal logs. My patch will
propose to change that particular thing to proc numbers anyway so I'm
not asking for a change here, I just didn't want that assumption to go
un-footnoted. I suppose that's actually (another) good reason to want
to widen the cancellation key, so that we don't have to worry about
proc number allocation order being less protective than traditional
Unix PID allocation...