Query cancel seems to be broken in master since Oct 17
Hi,
In pgjdbc we have regular regression testing against "build from
master" PostgreSQL, and recent master builds fail for "statement cancel"
test.
The PostgreSQL as of Mon Oct 17 00:09:39 UTC 2016 was fine, then "statement
cancel" started to fail.
The test executes "select pg_sleep(10)" and tries to cancel it. In recent
master builds, cancel seems to be ignored, and the statement lasts for 10
seconds.
Exactly the same driver and test version works fine for 8.4, 9.1, 9.2, 9.3,
9.4, 9.5, and 9.6:
Here's a sample build report:
https://travis-ci.org/pgjdbc/pgjdbc/builds/168444341
Any hints what could be the issue?
Was the breakage intentional?
Vladimir
On Tue, Oct 18, 2016 at 1:00 AM, Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> wrote:
Hi,
In pgjdbc we have regular regression testing against "build from
master" PostgreSQL, and recent master builds fail for "statement cancel"
test.The PostgreSQL as of Mon Oct 17 00:09:39 UTC 2016 was fine,
then "statement cancel" started to fail.
The test executes "select pg_sleep(10)" and tries to cancel it. In recent
master builds, cancel seems to be ignored, and the statement lasts for 10
seconds.Exactly the same driver and test version works fine for 8.4, 9.1, 9.2,
9.3, 9.4, 9.5, and 9.6:
Here's a sample build report: https://travis-ci.org/
pgjdbc/pgjdbc/builds/168444341Any hints what could be the issue?
Was the breakage intentional?
My guess is it's related to this:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9e083fd4683294f41544e6d0d72f6e258ff3a77c
That's certainly not intended to break things, but that was changed on Oct
17 and it relates to cancel keys.
What platform does the postgres server run on? Can can you check if query
cancel works on libpq or if it's completely broken?
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
What platform does the postgres server run on?
Ubuntu
OS name: "linux", version: "3.19.0-66-generic", arch: "amd64", family:
"unix"
2016-10-18 11:05 GMT+03:00 Magnus Hagander <magnus@hagander.net>:
Show quoted text
On Tue, Oct 18, 2016 at 1:00 AM, Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> wrote:Hi,
In pgjdbc we have regular regression testing against "build from
master" PostgreSQL, and recent master builds fail for "statement cancel"
test.The PostgreSQL as of Mon Oct 17 00:09:39 UTC 2016 was fine,
then "statement cancel" started to fail.
The test executes "select pg_sleep(10)" and tries to cancel it. In recent
master builds, cancel seems to be ignored, and the statement lasts for 10
seconds.Exactly the same driver and test version works fine for 8.4, 9.1, 9.2,
9.3, 9.4, 9.5, and 9.6:
Here's a sample build report: https://travis-ci.org/
pgjdbc/pgjdbc/builds/168444341Any hints what could be the issue?
Was the breakage intentional?My guess is it's related to this: https://git.postgresql.org/
gitweb/?p=postgresql.git;a=commitdiff;h=9e083fd4683294f41544e6d0d72f6e
258ff3a77cThat's certainly not intended to break things, but that was changed on Oct
17 and it relates to cancel keys.What platform does the postgres server run on? Can can you check if query
cancel works on libpq or if it's completely broken?--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes:
On Tue, Oct 18, 2016 at 1:00 AM, Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> wrote:The test executes "select pg_sleep(10)" and tries to cancel it. In recent
master builds, cancel seems to be ignored, and the statement lasts for 10
seconds.
My guess is it's related to this:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9e083fd4683294f41544e6d0d72f6e258ff3a77c
That's certainly not intended to break things, but that was changed on Oct
17 and it relates to cancel keys.
What platform does the postgres server run on? Can can you check if query
cancel works on libpq or if it's completely broken?
I can confirm that query cancel is broken in HEAD on RHEL6.
regression=# select pg_sleep(10);
^CCancel request sent
... nothing happens for the balance of the 10 seconds ...
regression=#
There's a smoking gun in the postmaster log:
2016-10-18 09:10:34.547 EDT [18502] LOG: wrong key in cancel request for process 18491
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/18/2016 04:13 PM, Tom Lane wrote:
Magnus Hagander <magnus@hagander.net> writes:
On Tue, Oct 18, 2016 at 1:00 AM, Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> wrote:The test executes "select pg_sleep(10)" and tries to cancel it. In recent
master builds, cancel seems to be ignored, and the statement lasts for 10
seconds.My guess is it's related to this:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9e083fd4683294f41544e6d0d72f6e258ff3a77c
That's certainly not intended to break things, but that was changed on Oct
17 and it relates to cancel keys.
What platform does the postgres server run on? Can can you check if query
cancel works on libpq or if it's completely broken?I can confirm that query cancel is broken in HEAD on RHEL6.
regression=# select pg_sleep(10);
^CCancel request sent
... nothing happens for the balance of the 10 seconds ...
regression=#There's a smoking gun in the postmaster log:
2016-10-18 09:10:34.547 EDT [18502] LOG: wrong key in cancel request for process 18491
Ok, I've reverted that commit for now. It clearly needs more thought,
because of this, and the pademelon failure discussed on the other thread.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Heikki Linnakangas <hlinnaka@iki.fi> writes:
On 10/18/2016 04:13 PM, Tom Lane wrote:
There's a smoking gun in the postmaster log:
2016-10-18 09:10:34.547 EDT [18502] LOG: wrong key in cancel request for process 18491
Ok, I've reverted that commit for now. It clearly needs more thought,
because of this, and the pademelon failure discussed on the other thread.
I think that was an overreaction. The problem is pretty obvious after
adding some instrumentation:
2016-10-18 09:57:47.508 EDT [21229] LOG: wrong key (0x7B7E4D5E, expected 0xF0F804017B7E4D5E) in cancel request for process 21228
To wit, the various cancel_key backend variables are declared as "long",
and the new code
if (!pg_strong_random(&MyCancelKey, sizeof(MyCancelKey)))
is therefore computing an 8-byte random value on 64-bit-long machines.
But only 4 bytes go to the client and come back.
The cleanest fix might be to change those various "long" variables
to uint32. You'd have to think about how to handle the ntohl/htonl
calls that are used on them, though.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I wrote:
The cleanest fix might be to change those various "long" variables
to uint32. You'd have to think about how to handle the ntohl/htonl
calls that are used on them, though.
Or actually, no, you wouldn't have to think very hard. I was supposing
that those calls were declared to traffic in "long"s, but they aren't
and never have been, at least not since SUSv2:
uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);
So s/long/uint32/ would probably fix it just fine.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Oct 18, 2016 at 11:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
The cleanest fix might be to change those various "long" variables
to uint32. You'd have to think about how to handle the ntohl/htonl
calls that are used on them, though.Or actually, no, you wouldn't have to think very hard. I was supposing
that those calls were declared to traffic in "long"s, but they aren't
and never have been, at least not since SUSv2:uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);So s/long/uint32/ would probably fix it just fine.
The pg_strong_random patch has been reverted, what's done is done.
FWIW only doing that is needed to fix the problem as already mentioned
by Tom:
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index f232083..634578d 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -38,7 +38,7 @@ volatile uint32 CritSectionCount = 0;
int MyProcPid;
pg_time_t MyStartTime;
struct Port *MyProcPort;
-long MyCancelKey;
+uint32 MyCancelKey;
int MyPMChildSlot;
/*
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 78545da..5e623a1 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -162,7 +162,7 @@ extern PGDLLIMPORT int MyProcPid;
extern PGDLLIMPORT pg_time_t MyStartTime;
extern PGDLLIMPORT struct Port *MyProcPort;
extern PGDLLIMPORT struct Latch *MyLatch;
-extern long MyCancelKey;
+extern uint32 MyCancelKey;
extern int MyPMChildSlot;
extern char OutputFileName[];
I'll send an update for that, as well as a solution for the pademelon
problem on the SCRAM thread.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Oct 18, 2016 at 10:03:39AM -0400, Tom Lane wrote:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
On 10/18/2016 04:13 PM, Tom Lane wrote:
There's a smoking gun in the postmaster log:
2016-10-18 09:10:34.547 EDT [18502] LOG: wrong key in cancel request for process 18491Ok, I've reverted that commit for now. It clearly needs more thought,
because of this, and the pademelon failure discussed on the other thread.I think that was an overreaction. The problem is pretty obvious after
adding some instrumentation:
I endorse Heikki's revert.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers