12.1 not usable: clientlib fails after a dozen queries (GSSAPI?)
Hi folks,
with 12.1, after a couple of queries, at a random place, the clientlib
does produce a failed query without giving reason or error-message [1].
Then when retrying, the clientlib switches off signal handling and
sits inactive in memory (needs kill -9).
The server log shows no error or other hint.
The behaviour happens rarely with trust access, and almost always when
using Kerberos5 (Heimdal as included in FreeBSD).
11.5 clientlib has none of this behaviour and seems to work fine, like
10.10 did.
Environment:
OS FreeBSD 11.3
Applic. Ruby-on-Rails, ruby=2.5.7, gem 'pg'=1.2.2
(it makes no difference if that one is compiled with
the 12.1 or the 10.10 library)
Server 12.1
[1]: the message from ruby is
PG::ConnectionBad: PQconsumeInput() : <query>
rgds,
PMc
On 1/9/20 10:18 AM, Peter wrote:
Hi folks,
with 12.1, after a couple of queries, at a random place, the clientlib
does produce a failed query without giving reason or error-message [1].
Then when retrying, the clientlib switches off signal handling and
sits inactive in memory (needs kill -9).
The server log shows no error or other hint.
The behaviour happens rarely with trust access, and almost always when
using Kerberos5 (Heimdal as included in FreeBSD).
11.5 clientlib has none of this behaviour and seems to work fine, like
10.10 did.
Might want to take a look at below:
https://github.com/ged/ruby-pg/issues/311
Environment:
OS FreeBSD 11.3
Applic. Ruby-on-Rails, ruby=2.5.7, gem 'pg'=1.2.2
(it makes no difference if that one is compiled with
the 12.1 or the 10.10 library)
Server 12.1
[1] the message from ruby is
PG::ConnectionBad: PQconsumeInput() : <query>
rgds,
PMc
--
Adrian Klaver
adrian.klaver@aklaver.com
Peter <pmc@citylink.dinoex.sub.org> writes:
with 12.1, after a couple of queries, at a random place, the clientlib
does produce a failed query without giving reason or error-message [1].
Then when retrying, the clientlib switches off signal handling and
sits inactive in memory (needs kill -9).
Seems like you'd better raise this with the author(s) of the "pg"
Ruby gem. Perhaps they read this mailing list, but more likely
they have a specific bug reporting mechanism somewhere.
regards, tom lane
On Thu, Jan 09, 2020 at 10:47:00AM -0800, Adrian Klaver wrote:
!
! Might want to take at below:
!
! https://github.com/ged/ruby-pg/issues/311
Thanks a lot! That option
gssencmode: "disable"
seems to solve the issue.
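For anyone else bitten by this before a fix lands: `gssencmode` is a standard libpq connection parameter in PostgreSQL 12, so the same workaround can be expressed in any libpq client (the host and database names below are just the ones from this thread, substitute your own):

```
# libpq connection string
psql "host=myhost dbname=flowmdev gssencmode=disable"

# environment variable, honoured by any libpq-based client
PGGSSENCMODE=disable psql -h myhost flowmdev

# ruby-pg, which passes unknown keywords straight through to libpq
PG.connect(host: "myhost", dbname: "flowmdev", gssencmode: "disable")
```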
But I think the people there are concerned with a different issue: they
are worried about fork(), while my flaw also appears when I do *NOT*
fork. The picture is also slightly different; they get segfaults, I
get misbehaviour.
rgds,
PMc
On Thu, Jan 09, 2020 at 01:48:01PM -0500, Tom Lane wrote:
! Peter <pmc@citylink.dinoex.sub.org> writes:
! > with 12.1, after a couple of queries, at a random place, the clientlib
! > does produce a failed query without giving reason or error-message [1].
! > Then when retrying, the clientlib switches off signal handling and
! > sits inactive in memory (needs kill -9).
!
! Seems like you'd better raise this with the author(s) of the "pg"
! Ruby gem. Perhaps they read this mailing list, but more likely
! they have a specific bug reporting mechanism somewhere.
Tom,
I don't think this has anything to do with "pg". Just checked: I get
garbage and misbehaviour on the "psql" command line tool also:
$ psql -h myhost flowmdev
psql (12.1)
GSSAPI-encrypted connection
Type "help" for help.
flowmdev=> select * from flows;
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
[... the same line repeated for each remaining row ...]
flowmdev=> select * from flows;
server sent data ("D" message) without prior row description ("T" message)
flowmdev=> select * from flows;
message type 0x54 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
id | name | ... <here finally starts the data as expected>
To the contrary:
$ PGGSSENCMODE="disable" psql -h myhost flowmdev
psql (12.1)
Type "help" for help.
flowmdev=> select * from flows;
id | name | ... <all working as normal>
rgds,
PMc
Peter <pmc@citylink.dinoex.sub.org> writes:
I don't think this has anything to do with "pg". Just checked: I get
garbage and misbehaviour on the "psql" command line tool also:
$ psql -h myhost flowmdev
psql (12.1)
GSSAPI-encrypted connection
Type "help" for help.
flowmdev=> select * from flows;
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
message type 0x44 arrived from server while idle
Oh ... that does look pretty broken. However, we've had no other similar
reports, so there must be something unique to your configuration. Busted
GSSAPI library, or some ABI inconsistency, perhaps? What platform are you
on, and how did you build or obtain this Postgres code?
regards, tom lane
On Thu, Jan 09, 2020 at 04:31:44PM -0500, Tom Lane wrote:
! Peter <pmc@citylink.dinoex.sub.org> writes:
! > flowmdev=> select * from flows;
! > message type 0x44 arrived from server while idle
! > message type 0x44 arrived from server while idle
! > message type 0x44 arrived from server while idle
!
! Oh ... that does look pretty broken. However, we've had no other similar
! reports, so there must be something unique to your configuration. Busted
! GSSAPI library, or some ABI inconsistency, perhaps? What platform are you
! on, and how did you build or obtain this Postgres code?
This is a FreeBSD 11.3-p3 r351611 built from source. Postgres is built
from
https://svn0.eu.freebsd.org/ports/branches/2019Q4 (rel. 12r1) or
https://svn0.eu.freebsd.org/ports/branches/2020Q1 (rel. 12.1)
with "make package install".
I have a build environment for base&ports that forces recompiles on
any change and should make ABI inconsistencies quite hard to create.
All local patches are versioned and documented; there are none that
I could imagine influencing this.
There are no patches on postgres. Also no patches on the GSSAPI.
There are a couple of patches on the Heimdal, to fix broken
commandline parsing, broken pidfile handling and broken daemonization.
None of them touches the core functionality (like key handling).
But I just noticed something of interest (which I had taken for
granted when importing the database): the flaw does NOT appear when
accessing the database from the server's local system (with TCP and
GSSAPI encryption active). Only from a remote system.
But then, if I go on the local system, and change the mtu:
# ifconfig lo0 mtu 1500
and restart the server, then I get the exact same errors locally.
I can't make sense of that. With the default
lo0 mtu of 16384 the packets go on the network with the full 8256
bytes you send. With mtu 1500 they are split into 1448 byte pieces;
but TCP is supposed to handle this transparently. And what difference
would the encryption make with this?
net.inet.tcp.sendspace: 32768
net.inet.tcp.recvspace: 65536
These are also bigger. No, I don't understand that.
One more thing: these are all VIMAGE jails. VIMAGE was considered
'experimental' some time ago and became production-ready in FreeBSD
12.0, and 11.3 has a lower version number but is a later release than
12.0, whatever that implies.
Another thing I found out: the slower the network, the worse the
errors. So might it be that nobody complained just because the people
who use GSSAPI usually also have very fast machines and networks
nowadays?
When I go to packet-radio speed:
# ipfw pipe 4 config bw 10kbit/s
then I can see the query return empty as soon as the first bytes arrive:
flowmdev=# select * from flows;
flowmdev=#
without even waiting the 8 seconds for the first block to arrive.
rgds,
PMc
[ redirecting to -hackers ]
Peter <pmc@citylink.dinoex.sub.org> writes:
But I just noticed something of interest (which I had taken for
granted when importing the database): the flaw does NOT appear when
accessing the database from the server's local system (with TCP and
GSSAPI encryption active). Only from a remote system.
But then, if I go on the local system, and change the mtu:
# ifconfig lo0 mtu 1500
and restart the server, then I get the exact same errors locally.
Oh-ho, that is interesting.
Looking at our regression tests for gssenc, I observe that they
only try to transport a negligible amount of data (viz, single-row
boolean results). So we'd not notice any problem that involved
multiple-packet messages.
I modified the kerberos test so that it tries a query with a less
negligible amount of data, and what I find is:
* passes on Fedora 30, with either default or 1500 mtu
* passes on FreeBSD 12.0 with default mtu
* FAILS on FreeBSD 12.0 with mtu = 1500
I haven't run it further to ground than that, but there's definitely
something fishy here. Based on just these results one would be hard
pressed to say if it's our bug or FreeBSD's, but your report that you
don't see the failure with PG 11 makes it sound like our problem.
OTOH, I also find that there's some hysteresis in the behavior:
once it's failed, reverting the mtu to the default setting doesn't
necessarily make subsequent runs pass. It's really hard to explain
that behavior if it's our bug.
I tested today's HEAD of our code with up-to-date FreeBSD 12.0-RELEASE-p12
running on amd64 bare metal, no jails or emulators or VIMAGE or anything.
Attached are proposed test patch, as well as client-side regression log
output from a failure. (There's no evidence of distress in the
postmaster log, same as your report.)
regards, tom lane
I wrote:
I haven't run it further to ground than that, but there's definitely
something fishy here. Based on just these results one would be hard
pressed to say if it's our bug or FreeBSD's, but your report that you
don't see the failure with PG 11 makes it sound like our problem.
Ah, I have it: whoever wrote pg_GSS_read() failed to pay attention
to the fact that setting errno is a critical part of its API.
Sometimes it returns -1 while leaving errno in a state that causes
pqReadData to draw the wrong conclusions. In particular that can
happen when it reads an incomplete packet, and that's very timing
dependent, which is why this is so ticklish to reproduce.
I'll have a patch in a little while.
regards, tom lane
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
I wrote:
I haven't run it further to ground than that, but there's definitely
something fishy here. Based on just these results one would be hard
pressed to say if it's our bug or FreeBSD's, but your report that you
don't see the failure with PG 11 makes it sound like our problem.
Ah, I have it: whoever wrote pg_GSS_read() failed to pay attention
to the fact that setting errno is a critical part of its API.
Sometimes it returns -1 while leaving errno in a state that causes
pqReadData to draw the wrong conclusions. In particular that can
happen when it reads an incomplete packet, and that's very timing
dependent, which is why this is so ticklish to reproduce.
Ah-hah. Not sure if that was Robbie or myself (probably me, really,
since I rewrote a great deal of that code). I agree that the regression
tests don't test with very much data, but I tested pushing quite a bit
of data through and didn't see any issues with my testing. Apparently I
was getting pretty lucky. :/
I'll have a patch in a little while.
That's fantastic, thanks!
Stephen
Stephen Frost <sfrost@snowman.net> writes:
Ah-hah. Not sure if that was Robbie or myself (probably me, really,
since I rewrote a great deal of that code). I agree that the regression
tests don't test with very much data, but I tested pushing quite a bit
of data through and didn't see any issues with my testing. Apparently I
was getting pretty lucky. :/
You were *very* lucky, because this code is absolutely full of mistakes
related to incomplete reads, inadequate or outright wrong error handling,
etc.
I was nearly done cleaning that up, when it sank into me that
fe-secure-gssapi.c uses static buffers for partially-read or
partially-encoded data. That means that any client trying to use
multiple GSSAPI-encrypted connections is very likely to see breakage
due to different connections trying to use the same buffers concurrently.
I wonder whether that doesn't explain the complaints mentioned upthread
from the Ruby folks.
(be-secure-gssapi.c is coded identically, but there it's OK since
any backend only has one client connection to deal with.)
I'll have a patch in a little while.
That's fantastic, thanks!
This is gonna take longer than I thought.
regards, tom lane
On Fri, Jan 10, 2020 at 12:59:09PM -0500, Tom Lane wrote:
! [ redirecting to -hackers ]
! I modified the kerberos test so that it tries a query with a less
! negligible amount of data, and what I find is:
!
! * passes on Fedora 30, with either default or 1500 mtu
! * passes on FreeBSD 12.0 with default mtu
! * FAILS on FreeBSD 12.0 with mtu = 1500
So it is not related only to VIMAGE on 11.3, and, more important to
me, it is not happening only in my kitchen. Thank You very much :)
! OTOH, I also find that there's some hysteresis in the behavior:
! once it's failed, reverting the mtu to the default setting doesn't
! necessarily make subsequent runs pass. It's really hard to explain
! that behavior if it's our bug.
That's affirmative. Made me go astray a few times when trying to
isolate it.
On Fri, Jan 10, 2020 at 02:25:22PM -0500, Tom Lane wrote:
! Ah, I have it: whoever wrote pg_GSS_read() failed to pay attention
! to the fact that setting errno is a critical part of its API.
! Sometimes it returns -1 while leaving errno in a state that causes
Wow, that's fast. My guess this morning was either some hard-coded
8192-byte buffer, or something treating [EWOULDBLOCK] as success.
Then I decided not to look into the code, as You will be much
faster anyway, and there are other pieces of software where I do
not have such a competent peer to talk to...
Anyway, thanks a lot!
PMc
Greetings,
On Fri, Jan 10, 2020 at 15:58 Tom Lane <tgl@sss.pgh.pa.us> wrote:
Stephen Frost <sfrost@snowman.net> writes:
Ah-hah. Not sure if that was Robbie or myself (probably me, really,
since I rewrote a great deal of that code). I agree that the regression
tests don't test with very much data, but I tested pushing quite a bit
of data through and didn't see any issues with my testing. Apparently I
was getting pretty lucky. :/
You were *very* lucky, because this code is absolutely full of mistakes
related to incomplete reads, inadequate or outright wrong error handling,
etc.
I guess so... I specifically remember running into problems transferring
large data sets before I rewrote the code, but after doing so it was
reliable (for me anyway...).
I was nearly done cleaning that up, when it sank into me that
fe-secure-gssapi.c uses static buffers for partially-read or
partially-encoded data. That means that any client trying to use
multiple GSSAPI-encrypted connections is very likely to see breakage
due to different connections trying to use the same buffers concurrently.
Ughhh. That’s a completely valid point and one I should have thought of.
I wonder whether that doesn't explain the complaints mentioned upthread
from the Ruby folks.
No: the segfault issue has been demonstrated to reproduce without any
PG code involved at all, and it also involved threads with only one
connection, at least as I recall (on my phone atm).
(be-secure-gssapi.c is coded identically, but there it's OK since
any backend only has one client connection to deal with.)
Right... I actually wrote the backend code first and then largely copied
it to the front end, and then adjusted it, but obviously insufficiently as
I had been thinking of just the one connection that the backend has to deal
with.
Thanks!
Stephen
Here's a draft patch that cleans up all the logic errors I could find.
I also expanded the previous patch for the kerberos test so that it
verifies that we can upload a nontrivial amount of data to the server,
as well as download.
I also spent a fair amount of effort on removing cosmetic differences
between the comparable logic in be-secure-gssapi.c and fe-secure-gssapi.c,
such that the two files can now be diff'd to confirm that be_gssapi_write
and be_gssapi_read implement identical logic to pg_GSS_write and
pg_GSS_read. (They did not, before :-(.)
This does not deal with the problem that libpq shouldn't be using
static data space for this purpose. It seems reasonable to me to
leave that for a separate patch.
This passes tests for me, on my FreeBSD build with lo0 mtu = 1500.
It wouldn't hurt to get some more mileage on it though. Peter,
I didn't follow how to set up the "packet radio speed" environment
that you mentioned, but if you could beat on this with that setup
it would surely be useful.
regards, tom lane
Attachments:
fix-assorted-gssapi-bugs-1.patch (text/x-diff, +296 -214)
I wrote:
Here's a draft patch that cleans up all the logic errors I could find.
So last night I was assuming that this problem just requires more careful
attention to what to return in the error exit paths. In the light of
morning, though, I realize that the algorithms involved in
be-secure-gssapi.c and fe-secure-gssapi.c are just fundamentally wrong:
* On the read side, the code will keep looping until it gets a no-data
error from the underlying socket call. This is silly. In every or
almost every use, the caller's read length request corresponds to the
size of a buffer that's meant to be larger than typical messages, so
that betting that we're going to fill that buffer completely is the
wrong way to bet. Meanwhile, it's fairly likely that the incoming
encrypted packet's length *does* correspond to some actual message
boundary; that would only not happen if the sender is forced to break
up a message, which ought to be a minority situation, else our buffer
size choices are too small. So it's very likely that the looping just
results in doubling the number of read() calls that are made, with
half of them failing with EWOULDBLOCK. What we should do instead is
return to the caller whenever we finish handing back the decrypted
contents of a packet. We can do the read() on the next call, after
the caller's dealt with that data.
* On the write side, if the code encrypts some data and then gets
EWOULDBLOCK trying to write it, it will tell the caller that it
successfully wrote that data. If that was all the data the caller
had to write (again, not so unlikely) this is a catastrophic
mistake, because the caller will be satisfied and will go to sleep,
rather than calling again to attempt another write. What we *must*
do is to reflect the write failure verbatim whether or not we
encrypted some data. We must remember how much data we encrypted
and then discount that much of the caller's supplied data next time.
There are hints in the existing comments that somebody understood
this at one point, but the code isn't acting that way today.
I expect that I can prove point B by hot-wiring pqsecure_raw_write
to randomly return EWOULDBLOCK (instead of making any write attempt)
every so often. I think strace will be enough to confirm point A,
but haven't tried it yet.
regards, tom lane
I wrote:
So last night I was assuming that this problem just requires more careful
attention to what to return in the error exit paths. In the light of
morning, though, I realize that the algorithms involved in
be-secure-gssapi.c and fe-secure-gssapi.c are just fundamentally wrong:
Here's a revised patch that attempts to deal with those issues.
(Still doesn't touch the static-buffer issue, though.)
The 0002 patch isn't meant for commit, but testing with that gives me
a whole lot more confidence that the gssapi code deals with EWOULDBLOCK
correctly.
regards, tom lane
I wrote:
(Still doesn't touch the static-buffer issue, though.)
And here's a delta patch for that. This fixes an additional portability
hazard, which is that there were various places doing stuff like
input.length = ntohl(*(uint32 *) PqGSSRecvBuffer);
That's a SIGBUS hazard on alignment-picky hardware, because there is
no guarantee that a variable that's just declared "char ...[...]"
will have any particular alignment. But malloc'ing the space will
provide maxaligned storage.
My FreeBSD testing has now given me enough confidence in these patches
that I'm just going to go ahead and push them. But, if you'd like to
do some more testing in advance of 12.2 release, please do.
regards, tom lane
Attachments:
0003-buffer-allocation-1.patch (text/x-diff, +95 -39)
On Fri, Jan 10, 2020 at 10:51:50PM -0500, Tom Lane wrote:
! Here's a draft patch that cleans up all the logic errors I could find.
Okiee, thank You!
Let's see (I was a bit busy yesterday trying to upgrade pgadmin3, a
difficult matter); now let's sort this out:
With the first patch applied (as from Friday - applied only on the
client side), the application did appear to work well.
But then, when engaging bandwidth-limiting to some modem-speed, it did
not work: psql would receive all (or most of) the data from a SELECT,
but then somehow not recognize the end of it and sit there and wait for
whatever:
flowmdev=> select * from flows;
^CCancel request sent
^CCancel request sent
Now with the new patches 0001+0003 applied, on both server & client,
all now running 12.1 release, on a first run I did not perceive
a malfunction, bandwidth limited or not.
I'll leave them applied, but this here will not experience serious
loads; You'll need somebody else to test for that...
rgds,
PMc
Peter <pmc@citylink.dinoex.sub.org> writes:
With the first patch applied (as from Friday - applied only on the
client side), the application did appear to work well.
But then, when engaging bandwidth-limiting to some modem-speed, it did
not work: psql would receive all (or most of) the data from a SELECT,
but then somehow not recognize the end of it and sit there and wait for
whatever:
Yeah, that's just the behavior I'd expect (and was able to reproduce
here) because of the additional design problem.
Now with the new patches 0001+0003 applied, on both server & client,
all now running 12.1 release, on a first run I did not perceive
a malfunction, bandwidth limited or not.
I'll leave them applied, but this here will not experience serious
loads; You'll need somebody else to test for that...
Cool, let us know if you do see any problems.
regards, tom lane
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
I wrote:
Here's a draft patch that cleans up all the logic errors I could find.
So last night I was assuming that this problem just requires more careful
attention to what to return in the error exit paths. In the light of
morning, though, I realize that the algorithms involved in
be-secure-gssapi.c and fe-secure-gssapi.c are just fundamentally wrong:
* On the read side, the code will keep looping until it gets a no-data
error from the underlying socket call. This is silly. In every or
almost every use, the caller's read length request corresponds to the
size of a buffer that's meant to be larger than typical messages, so
that betting that we're going to fill that buffer completely is the
wrong way to bet. Meanwhile, it's fairly likely that the incoming
encrypted packet's length *does* correspond to some actual message
boundary; that would only not happen if the sender is forced to break
up a message, which ought to be a minority situation, else our buffer
size choices are too small. So it's very likely that the looping just
results in doubling the number of read() calls that are made, with
half of them failing with EWOULDBLOCK. What we should do instead is
return to the caller whenever we finish handing back the decrypted
contents of a packet. We can do the read() on the next call, after
the caller's dealt with that data.
Yeah, I agree that this is a better approach. Doing unnecessary
read()s certainly isn't ideal, but beyond being silly it doesn't sound
like this part was fundamentally broken...? (Yes, the error cases
certainly weren't properly handled, I understand that.)
* On the write side, if the code encrypts some data and then gets
EWOULDBLOCK trying to write it, it will tell the caller that it
successfully wrote that data. If that was all the data the caller
had to write (again, not so unlikely) this is a catastrophic
mistake, because the caller will be satisfied and will go to sleep,
rather than calling again to attempt another write. What we *must*
do is to reflect the write failure verbatim whether or not we
encrypted some data. We must remember how much data we encrypted
and then discount that much of the caller's supplied data next time.
There are hints in the existing comments that somebody understood
this at one point, but the code isn't acting that way today.
That's a case I hadn't considered, and you're right: the algorithm
certainly wouldn't work in such a case. I don't recall specifically
whether the code handled it better previously, but I do recall there
was previously something about being given a buffer and having the
API defined as "give me back the exact same buffer because I had to
stop"; I found that ugly at the time, but seeing this issue I get it
now. I'd certainly be happier if there were a better alternative, but
I don't know that there really is.
Thanks,
Stephen