Replay attack of query cancel

Started by Heikki Linnakangasover 17 years ago30 messageshackers
Jump to latest
#1Heikki Linnakangas
heikki.linnakangas@enterprisedb.com

It occurred to me a while ago that our query cancel messages are sent
unencrypted, even when SSL is otherwise used. That's not a big issue on
its own, because the cancellation message only contains the backend PID
and the cancellation key, but it does open us to a replay attack. After
the first query in a connection has been cancelled, an eavesdropper can
reuse the backend PID and cancellation key to cancel subsequent queries
on the same connection.

We discussed this on the security list, and the consensus was that this
isn't worth a quick fix and a security release, because
- it only affects applications that use query cancel, which is rare
- it only affects SSL encrypted connections (the point is moot
non-encrypted connections, as you can just snatch the cancel key from
the initial message)
- it only let's you cancel queries, IOW it's only a DOS attack.
- there's no simple fix.

However, it is something to keep in mind, and perhaps fix for the next
release.

One idea for fixing this is to make cancellation keys disposable, and
automatically issue a new one through the main connection when one is
used, but that's not completely trivial, and requires a change in both
the clients and the server. Another idea is to send the query cancel
message only after SSL authentication, but that is impractical for libpq
because we PQcancel needs to be callable from a signal handler.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Heikki Linnakangas (#1)
Re: Replay attack of query cancel

Heikki Linnakangas wrote:

One idea for fixing this is to make cancellation keys disposable, and
automatically issue a new one through the main connection when one is
used, but that's not completely trivial, and requires a change in both
the clients and the server. Another idea is to send the query cancel
message only after SSL authentication, but that is impractical for libpq
because we PQcancel needs to be callable from a signal handler.

I wonder if we can do something diffie-hellman'ish, where we have a
parameter exchanged in the initial SSL'ed handshake, which is later used
to generate new cancel keys each time the previous one is used.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#2)
Re: Replay attack of query cancel

Alvaro Herrera <alvherre@commandprompt.com> writes:

I wonder if we can do something diffie-hellman'ish, where we have a
parameter exchanged in the initial SSL'ed handshake, which is later used
to generate new cancel keys each time the previous one is used.

Seems like the risk of getting out of sync would outweigh any benefits.
Lose one cancel message in the network, you have no hope of getting any
more accepted.

regards, tom lane

#4Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Herrera (#2)
Re: Replay attack of query cancel

Alvaro Herrera wrote:

Heikki Linnakangas wrote:

One idea for fixing this is to make cancellation keys disposable, and
automatically issue a new one through the main connection when one is
used, but that's not completely trivial, and requires a change in both
the clients and the server. Another idea is to send the query cancel
message only after SSL authentication, but that is impractical for libpq
because we PQcancel needs to be callable from a signal handler.

I wonder if we can do something diffie-hellman'ish, where we have a
parameter exchanged in the initial SSL'ed handshake, which is later used
to generate new cancel keys each time the previous one is used.

Seems much more complex than needed.

IIRC, the protocol allows us (at least does not explicitly forbid) to
issue new cancel keys at any time. And libpq will, again IIRC, accept
them. So we could just change the server to issue a new cancel key
whenever it has used the previous one (since the new key will then be
sent over the encrypted channel, we're safe).

The problem was (third IIRC here :-P) in other clients, such as the JDBC
driver (I think that one was checked specifically) which currently only
accept the BackendKeyData message during startup. All drivers not based
on libpq would have to be checked and potentially updated, but it's
sitll a lot easier than DHing or so.

//Magnus

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#4)
Re: Replay attack of query cancel

Magnus Hagander <magnus@hagander.net> writes:

The problem was (third IIRC here :-P) in other clients, such as the JDBC
driver (I think that one was checked specifically) which currently only
accept the BackendKeyData message during startup. All drivers not based
on libpq would have to be checked and potentially updated, but it's
sitll a lot easier than DHing or so.

The other problem was getting the new cancel key from the postmaster to
the backend and thence to the client (hopefully in a timely manner),
recognizing that (a) we don't want the postmaster touching shared memory
very much, and certainly not engaging in any locking behavior; (b)
backends feel free to ignore SIGINT when they're not doing anything.

Certainly the prospect of a de facto protocol change is the bigger
problem, but there are nontrivial implementation issues in the server
too.

If we were going to make it a de jure protocol change (ie new version
number) instead of just hoping nobody notices the behavioral difference,
I'd be inclined to think about widening the cancel key, too. 32 bits
ain't that much randomness anymore.

regards, tom lane

#6Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Heikki Linnakangas (#1)
Re: Replay attack of query cancel

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Alvaro Herrera <alvherre@commandprompt.com> writes:

I wonder if we can do something diffie-hellman'ish, where we have
a parameter exchanged in the initial SSL'ed handshake, which is
later used to generate new cancel keys each time the previous one
is used.

Tom> Seems like the risk of getting out of sync would outweigh any
Tom> benefits. Lose one cancel message in the network, you have no
Tom> hope of getting any more accepted.

That's easily solved: when the client wants to do a cancel, have it
send, in place of the actual cancel key, an integer N and the value
HMAC(k,N) where k is the cancel key. Replay is prevented by requiring
the value of N to be strictly greater than any previous value
successfully used for this session. (Since we already have md5 code,
HMAC-MD5 would be the obvious choice.)

Migration to this could probably be handled without a version change
to the protocol, by defining a new SecureCancelRequest message and a
GUC to control whether the old CancelRequest message is accepted or
ignored. The key length for the cancel key can be increased with a
minor-version change to the protocol (if client asks for protocol 3.1,
send it a longer key, otherwise a shorter one).

--
Andrew (irc:RhodiumToad)

#7Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Tom Lane (#3)
Re: Replay attack of query cancel

Tom Lane napsal(a):

Alvaro Herrera <alvherre@commandprompt.com> writes:

I wonder if we can do something diffie-hellman'ish, where we have a
parameter exchanged in the initial SSL'ed handshake, which is later used
to generate new cancel keys each time the previous one is used.

Seems like the risk of getting out of sync would outweigh any benefits.
Lose one cancel message in the network, you have no hope of getting any
more accepted.

When cancellation key is used client should explicitly ask for a new regenerated
cancel key.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

#8Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#5)
Re: Replay attack of query cancel

Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

The problem was (third IIRC here :-P) in other clients, such as the JDBC
driver (I think that one was checked specifically) which currently only
accept the BackendKeyData message during startup. All drivers not based
on libpq would have to be checked and potentially updated, but it's
sitll a lot easier than DHing or so.

The other problem was getting the new cancel key from the postmaster to
the backend and thence to the client (hopefully in a timely manner),
recognizing that (a) we don't want the postmaster touching shared memory
very much, and certainly not engaging in any locking behavior; (b)
backends feel free to ignore SIGINT when they're not doing anything.

In EXEC_BACKEND, we already store this in shared memory. If we could
live with doing that for the non-exec case as well, it'd be a lot easier.

And we could then just have the backend update the array when it sends
out a "new key" message.

Certainly the prospect of a de facto protocol change is the bigger
problem, but there are nontrivial implementation issues in the server
too.

Yeah.

If we were going to make it a de jure protocol change (ie new version
number) instead of just hoping nobody notices the behavioral difference,
I'd be inclined to think about widening the cancel key, too. 32 bits
ain't that much randomness anymore.

That, or rely on something that's not just a simple shared secret
(something like what Andrew Gierth suggested). And AFAICS, his
suggestion allows us to manage without having to update the cancel key
in shared memory at all - but it does require a protocol modification.

//Magnus

#9Magnus Hagander
magnus@hagander.net
In reply to: Andrew Gierth (#6)
Re: Replay attack of query cancel

Andrew Gierth wrote:

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Alvaro Herrera <alvherre@commandprompt.com> writes:

I wonder if we can do something diffie-hellman'ish, where we have
a parameter exchanged in the initial SSL'ed handshake, which is
later used to generate new cancel keys each time the previous one
is used.

Tom> Seems like the risk of getting out of sync would outweigh any
Tom> benefits. Lose one cancel message in the network, you have no
Tom> hope of getting any more accepted.

That's easily solved: when the client wants to do a cancel, have it
send, in place of the actual cancel key, an integer N and the value
HMAC(k,N) where k is the cancel key. Replay is prevented by requiring
the value of N to be strictly greater than any previous value
successfully used for this session. (Since we already have md5 code,
HMAC-MD5 would be the obvious choice.)

I like this approach. It does away with the sharead memory hackery - we
just need to store one more number in the postmaster array, which
shouldn't be a problem.

Migration to this could probably be handled without a version change
to the protocol, by defining a new SecureCancelRequest message and a
GUC to control whether the old CancelRequest message is accepted or
ignored.

We could even have a third setting - accept, but write a warning to the log.

The key length for the cancel key can be increased with a
minor-version change to the protocol (if client asks for protocol 3.1,
send it a longer key, otherwise a shorter one).

Yeah, that seems easy enough. The question is how important is it to
increase the cancel key length? Should we do it now, or should we push
it off until whenever we have to bump the protocol version number anyway?

If we don't touch the protocol version, we could in theory at least
backpatch this as a fix for those who are really concerned about this
issue. But it's probably too big a change for that?

/Magnus

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#9)
Re: Replay attack of query cancel

Magnus Hagander <magnus@hagander.net> writes:

Andrew Gierth wrote:

That's easily solved: when the client wants to do a cancel, have it
send, in place of the actual cancel key, an integer N and the value
HMAC(k,N) where k is the cancel key. Replay is prevented by requiring
the value of N to be strictly greater than any previous value
successfully used for this session. (Since we already have md5 code,
HMAC-MD5 would be the obvious choice.)

I like this approach.

It's not a bad idea, if we are willing to change the protocol.

If we don't touch the protocol version, we could in theory at least
backpatch this as a fix for those who are really concerned about this
issue.

Huh? How can you argue this isn't a protocol change?

[ thinks for a bit... ] You could make it a change in the cancel
protocol, which is to some extent independent of the main FE/BE
protocol. The problem is: how can the client know whether it's okay to
use this new protocol for cancel?

regards, tom lane

#11Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#10)
Re: Replay attack of query cancel

Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

Andrew Gierth wrote:

That's easily solved: when the client wants to do a cancel, have it
send, in place of the actual cancel key, an integer N and the value
HMAC(k,N) where k is the cancel key. Replay is prevented by requiring
the value of N to be strictly greater than any previous value
successfully used for this session. (Since we already have md5 code,
HMAC-MD5 would be the obvious choice.)

I like this approach.

It's not a bad idea, if we are willing to change the protocol.

If we don't touch the protocol version, we could in theory at least
backpatch this as a fix for those who are really concerned about this
issue.

Huh? How can you argue this isn't a protocol change?

Um. By looking at it only from the backend perspective? *blush*

[ thinks for a bit... ] You could make it a change in the cancel
protocol, which is to some extent independent of the main FE/BE
protocol. The problem is: how can the client know whether it's okay to
use this new protocol for cancel?

Yeah, that's the point that will require a protocol bump, I think. Since
there is no response to the cancel packet, we can't even do things like
sending in a magic key and look at the response (which would be a rather
ugly hack, but doable if we had a success/fail response to the cancel
packet).

I guess bumping the protocol to 3.1 pretty much kills any chance for a
backpatch though :( Since a "new libpq" would no longer be able to talk
to an old server, if I remember the logic correctly?

//Magnus

In reply to: Tom Lane (#10)
Re: Replay attack of query cancel

Tom Lane wrote:

[ thinks for a bit... ] You could make it a change in the cancel
protocol, which is to some extent independent of the main FE/BE
protocol. The problem is: how can the client know whether it's okay to
use this new protocol for cancel?

Two options:
a. Send two cancelkeys in rapid succession at session startup, whereas
the first one is 0 or something. The client can detect the first
"special" cancelkey and then knows that the connection supports
cancelmethod 2.
b. At sessionstartup, advertise a new runtimeparameter:
cancelmethod=plainkey,hmaccoded
which the client can then chose from.

I'd prefer b over a.
--
Sincerely,
Stephen R. van den Berg.

"And now for something *completely* different!"

#13Bruce Momjian
bruce@momjian.us
In reply to: Magnus Hagander (#11)
Re: Replay attack of query cancel

"Magnus Hagander" <magnus@hagander.net> writes:

Yeah, that's the point that will require a protocol bump, I think. Since
there is no response to the cancel packet, we can't even do things like
sending in a magic key and look at the response (which would be a rather
ugly hack, but doable if we had a success/fail response to the cancel
packet).

From the server point of view we could accept either kind of cancel message
for the first cancel message and set a variable saying which to expect from
there forward. If the first cancel message is an old-style message then we
always expect old-style messages. If it's a new-style message then we require
new-style messages and keep track of the counter to require a monotically
increasing counter.

From the client point-of-view we have no way to know if the server is going to
accept new-style cancel messages though. We could try sending the new-style
message and see if we get an error (do we get an error if you send an invalid
cancel message?).

We could have the server indicate it's the new protocol by sending the initial
cancel key twice. If the client sees more than one cancel key it automatically
switches to new-style cancel messages.

Or we could just bump the protocol version.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!

#14Magnus Hagander
magnus@hagander.net
In reply to: Bruce Momjian (#13)
Re: Replay attack of query cancel

Gregory Stark wrote:

"Magnus Hagander" <magnus@hagander.net> writes:

Yeah, that's the point that will require a protocol bump, I think. Since
there is no response to the cancel packet, we can't even do things like
sending in a magic key and look at the response (which would be a rather
ugly hack, but doable if we had a success/fail response to the cancel
packet).

From the server point of view we could accept either kind of cancel message
for the first cancel message and set a variable saying which to expect from
there forward. If the first cancel message is an old-style message then we
always expect old-style messages. If it's a new-style message then we require
new-style messages and keep track of the counter to require a monotically
increasing counter.

From the client point-of-view we have no way to know if the server is going to
accept new-style cancel messages though. We could try sending the new-style
message and see if we get an error (do we get an error if you send an invalid
cancel message?).

No, that is the point I made above - we don't respond to the cancel
message *at all*.

We could have the server indicate it's the new protocol by sending the initial
cancel key twice. If the client sees more than one cancel key it automatically
switches to new-style cancel messages.

That will still break things like JDBC I think - they only expect one
cancel message, and then move on to expect other things.

Or we could just bump the protocol version.

Yeah, but that would kill backwards compatibility in that the new libpq
could no longer talk to old servers.

What would work is using a parameter field, per Stephen's suggestion
elsewhere in the thread. Older libpq versions should just ignore the
parameter if they don't know what it is. Question is, is that too ugly a
workaround, since we'll need to keep it around forever? (We have special
handling of a few other parameters already, so maybe not?)

//Magnus

In reply to: Magnus Hagander (#14)
Re: Replay attack of query cancel

Magnus Hagander wrote:

Gregory Stark wrote:

"Magnus Hagander" <magnus@hagander.net> writes:
We could have the server indicate it's the new protocol by sending the initial
cancel key twice. If the client sees more than one cancel key it automatically
switches to new-style cancel messages.

That will still break things like JDBC I think - they only expect one
cancel message, and then move on to expect other things.

Why didn't they follow recommended practice to process any message
anytime? Was/is there a specific reason to avoid that in that driver?
(Just curious).

What would work is using a parameter field, per Stephen's suggestion
elsewhere in the thread. Older libpq versions should just ignore the
parameter if they don't know what it is. Question is, is that too ugly a
workaround, since we'll need to keep it around forever? (We have special
handling of a few other parameters already, so maybe not?)

You only need to keep the runtimeparameter for as long as you don't bump
the protocol version.
Then again, runtimeparameters are cheap.
--
Sincerely,
Stephen R. van den Berg.

"And now for something *completely* different!"

#16Magnus Hagander
magnus@hagander.net
In reply to: Stephen R. van den Berg (#15)
Re: Replay attack of query cancel

Stephen R. van den Berg wrote:

Magnus Hagander wrote:

Gregory Stark wrote:

"Magnus Hagander" <magnus@hagander.net> writes:
We could have the server indicate it's the new protocol by sending the initial
cancel key twice. If the client sees more than one cancel key it automatically
switches to new-style cancel messages.

That will still break things like JDBC I think - they only expect one
cancel message, and then move on to expect other things.

Why didn't they follow recommended practice to process any message
anytime? Was/is there a specific reason to avoid that in that driver?
(Just curious).

No idea, but that's how it is IIRC. And there are other drivers to think
about as well.

What would work is using a parameter field, per Stephen's suggestion
elsewhere in the thread. Older libpq versions should just ignore the
parameter if they don't know what it is. Question is, is that too ugly a
workaround, since we'll need to keep it around forever? (We have special
handling of a few other parameters already, so maybe not?)

You only need to keep the runtimeparameter for as long as you don't bump
the protocol version.
Then again, runtimeparameters are cheap.

Yeah, I guess that's true. Once you break backwards compatibility once,
you can break a couple of things at the same time :)

//Magnus

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#14)
Re: Replay attack of query cancel

Magnus Hagander <magnus@hagander.net> writes:

What would work is using a parameter field, per Stephen's suggestion
elsewhere in the thread. Older libpq versions should just ignore the
parameter if they don't know what it is.

+1 for that one; we have done it already for several send-at-startup
parameters that didn't use to be there, eg standard_conforming_strings.

I'd go with defining it as the maximum cancel version number
supported, eg "cancel_version = 31", with the understanding that
if it's not reported then 3.0 should be assumed.

So the plan looks like this, I think:

* Main FE/BE protocol doesn't change, but there might be an additional
value reported in the initial ParameterStatus messages. We believe that
this will not break any existing clients.

* Server accepts two different styles of cancel messages, identified
by different protocol numbers.

* Client decides which type to send based on looking for the
cancel_version parameter.

This seems back-patchable since neither old clients nor old servers
are broken: the only consequence of using an old one is your cancels
aren't sent securely, which frankly a lot of people aren't going
to care about anyway.

Note after looking at the postmaster code: the contents of new-style
cancel keys ought to be the backend PID in clear, the sequence number in
clear, and the hash of backend PID + cancel key + sequence number.
If we don't do it that way, the postmaster will need to apply the hash
function for *each* backend to see if it matches, which seems like
a lot more computation than we want. The postmaster needs to be able to
identify which backend is the potential match before executing the hash.

The main drawback I can see to keeping this backward-compatible is that
keeping the cancel key to 32 bits could still leave us vulnerable to
brute force attacks: once you've seen a cancel message, just try all
the possible keys till you get a match, and then you can generate a
cancel that will work. Can we refine the HMAC idea to defeat that?
Otherwise we're assuming that 2^32 HMACs take longer than the useful
life of a cancel key, which doesn't seem like a very future-proof
assumption.

regards, tom lane

#18Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#17)
Re: Replay attack of query cancel

Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

What would work is using a parameter field, per Stephen's suggestion
elsewhere in the thread. Older libpq versions should just ignore the
parameter if they don't know what it is.

+1 for that one; we have done it already for several send-at-startup
parameters that didn't use to be there, eg standard_conforming_strings.

I'd go with defining it as the maximum cancel version number
supported, eg "cancel_version = 31", with the understanding that
if it's not reported then 3.0 should be assumed.

So the plan looks like this, I think:

* Main FE/BE protocol doesn't change, but there might be an additional
value reported in the initial ParameterStatus messages. We believe that
this will not break any existing clients.

* Server accepts two different styles of cancel messages, identified
by different protocol numbers.

With the additional point that there is a GUC variable to turn this off
or warn about it, right?

* Client decides which type to send based on looking for the
cancel_version parameter.

This seems back-patchable since neither old clients nor old servers
are broken: the only consequence of using an old one is your cancels
aren't sent securely, which frankly a lot of people aren't going
to care about anyway.

Right. All those people running without SSL in the first place, for
example...

Note after looking at the postmaster code: the contents of new-style
cancel keys ought to be the backend PID in clear, the sequence number in
clear, and the hash of backend PID + cancel key + sequence number.
If we don't do it that way, the postmaster will need to apply the hash
function for *each* backend to see if it matches, which seems like
a lot more computation than we want. The postmaster needs to be able to
identify which backend is the potential match before executing the hash.

Yes, certainly. Otherwise, the cost ends up a lot higher on the server
than the client, which is a really simple DOS opportunity.

The main drawback I can see to keeping this backward-compatible is that
keeping the cancel key to 32 bits could still leave us vulnerable to
brute force attacks: once you've seen a cancel message, just try all
the possible keys till you get a match, and then you can generate a
cancel that will work. Can we refine the HMAC idea to defeat that?
Otherwise we're assuming that 2^32 HMACs take longer than the useful
life of a cancel key, which doesn't seem like a very future-proof
assumption.

Well, you're also going to have to increment <n> every time. We could
just cap <n> at <arbitrary level>. Say you can only cancel queries on a
single connection a million times or so. It's not perfect, but it gets
you somewhere.

Another option would be to send a new, longer, cancel key as part of the
separate parameter we're sending during startup (when we're indicating
which version we support). Then we'll use the longer cancel key if we're
dealing with "new style cancel" but keep the old 32-bit one for
backwards compatibility.

//Magnus

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#18)
Re: Replay attack of query cancel

Magnus Hagander <magnus@hagander.net> writes:

Tom Lane wrote:

* Server accepts two different styles of cancel messages, identified
by different protocol numbers.

With the additional point that there is a GUC variable to turn this off
or warn about it, right?

I see pretty much no value in that.

The main drawback I can see to keeping this backward-compatible is that
keeping the cancel key to 32 bits could still leave us vulnerable to
brute force attacks: once you've seen a cancel message, just try all
the possible keys till you get a match, and then you can generate a
cancel that will work. Can we refine the HMAC idea to defeat that?
Otherwise we're assuming that 2^32 HMACs take longer than the useful
life of a cancel key, which doesn't seem like a very future-proof
assumption.

Well, you're also going to have to increment <n> every time. We could
just cap <n> at <arbitrary level>. Say you can only cancel queries on a
single connection a million times or so. It's not perfect, but it gets
you somewhere.

Once you've brute-forced the secret key, you can just use an <n> that's
somewhat more than the last one the client used, assuming you've been
sniffing the connection the whole time. Or use one that's just a bit
less than what you can predict the server will take.

Not only do you get to kill the current query, but you'll have prevented
the client from issuing legitimate cancels after that, since it won't
know you bumped the server's <n> value. So the idea still needs work.

Another option would be to send a new, longer, cancel key as part of the
separate parameter we're sending during startup (when we're indicating
which version we support). Then we'll use the longer cancel key if we're
dealing with "new style cancel" but keep the old 32-bit one for
backwards compatibility.

Yeah. So then we just need one added parameter: secure_cancel_key =
string.

BTW, should we make all of this conditional on the use of an SSL
connection? If the original sending of the cancel key isn't secure
against sniffing, it's hard to see what anyone is buying with all the
added computation.

regards, tom lane

#20Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#19)
Re: Replay attack of query cancel

Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

Tom Lane wrote:

* Server accepts two different styles of cancel messages, identified
by different protocol numbers.

With the additional point that there is a GUC variable to turn this off
or warn about it, right?

I see pretty much no value in that.

The server is the point where you enforce security policies. The server
is where you say that SSL is required, for example. The same way, this
would let you at the server level say that secure cancels are required.

The main drawback I can see to keeping this backward-compatible is that
keeping the cancel key to 32 bits could still leave us vulnerable to
brute force attacks: once you've seen a cancel message, just try all
the possible keys till you get a match, and then you can generate a
cancel that will work. Can we refine the HMAC idea to defeat that?
Otherwise we're assuming that 2^32 HMACs take longer than the useful
life of a cancel key, which doesn't seem like a very future-proof
assumption.

Well, you're also going to have to increment <n> every time. We could
just cap <n> at <arbitrary level>. Say you can only cancel queries on a
single connection a million times or so. It's not perfect, but it gets
you somewhere.

Once you've brute-forced the secret key, you can just use an <n> that's
somewhat more than the last one the client used, assuming you've been
sniffing the connection the whole time. Or use one that's just a bit
less than what you can predict the server will take.

Not only do you get to kill the current query, but you'll have prevented
the client from issuing legitimate cancels after that, since it won't
know you bumped the server's <n> value. So the idea still needs work.

Yeah, I like the idea below much better.

Another option would be to send a new, longer, cancel key as part of the
separate parameter we're sending during startup (when we're indicating
which version we support). Then we'll use the longer cancel key if we're
dealing with "new style cancel" but keep the old 32-bit one for
backwards compatibility.

Yeah. So then we just need one added parameter: secure_cancel_key =
string.

Yup. Seems a whole lot cleaner.

BTW, should we make all of this conditional on the use of an SSL
connection? If the original sending of the cancel key isn't secure
against sniffing, it's hard to see what anyone is buying with all the
added computation.

Not sure. In practice it makes no difference, but our code is more
readable with less #ifdefs and branches. I don't think the added
computation makes any noticable difference at all in the normal
non-attacker scenario, so I'd just as well leave it in.

//Magnus

#21Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#19)
#22Marko Kreen
markokr@gmail.com
In reply to: Heikki Linnakangas (#1)
#23Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Marko Kreen (#22)
#24Bruce Momjian
bruce@momjian.us
In reply to: Heikki Linnakangas (#1)
#25Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Magnus Hagander (#20)
#26Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andrew Gierth (#25)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#26)
#28Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#27)
#29Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#27)
#30Bruce Momjian
bruce@momjian.us
In reply to: Magnus Hagander (#29)