Out-of-memory error reports in libpq

Started by Tom Laneover 4 years ago20 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

While cleaning out dead branches in my git repo, I came across an
early draft of what eventually became commit ffa2e4670 ("In libpq,
always append new error messages to conn->errorMessage"). I realized
that it contained a good idea that had gotten lost on the way to that
commit. Namely, let's reduce all of the 60-or-so "out of memory"
reports in libpq to calls to a common subroutine, and then let's teach
the common subroutine a recovery strategy for the not-unlikely
possibility that it fails to append the "out of memory" string to
conn->errorMessage. That recovery strategy of course is to reset the
errorMessage buffer to empty, hopefully regaining some space. We lose
whatever we'd had in the buffer before, but we have a better chance of
the "out of memory" message making its way to the user.

The first half of that just saves a few hundred bytes of repetitive
coding. However, I think that the addition of recovery logic is
important for robustness, because as things stand libpq may be
worse off than before for OOM handling. Before ffa2e4670, almost
all of these call sites did printfPQExpBuffer(..., "out of memory").
That would automatically clear the message buffer to empty, and
thereby be sure to report the out-of-memory failure if at all
possible. Now we might fail to report the thing that the user
really needs to know to make sense of what happened.

Therefore, I feel like this was an oversight in ffa2e4670,
and we ought to back-patch the attached into v14.

cc'ing the RMT in case they wish to object.

regards, tom lane

Attachments:

improve-libpq-OOM-reporting.patchtext/x-diff; charset=us-ascii; name=improve-libpq-OOM-reporting.patchDownload+110-124
#2Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#1)
Re: Out-of-memory error reports in libpq

On 7/27/21, 3:41 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

The first half of that just saves a few hundred bytes of repetitive
coding. However, I think that the addition of recovery logic is
important for robustness, because as things stand libpq may be
worse off than before for OOM handling. Before ffa2e4670, almost
all of these call sites did printfPQExpBuffer(..., "out of memory").
That would automatically clear the message buffer to empty, and
thereby be sure to report the out-of-memory failure if at all
possible. Now we might fail to report the thing that the user
really needs to know to make sense of what happened.

IIUC, before ffa2e4670, callers mainly used printfPQExpBuffer(), which
always cleared the buffer before attempting to append the OOM message.
With ffa2e4670 applied, callers always attempt to append the OOM
message without resetting the buffer first. With this new change,
callers will attempt to append the OOM message without resetting the
buffer first, but if that fails, we fall back to the original behavior
before ffa2e4670.

+	if (PQExpBufferBroken(errorMessage))
+	{
+		resetPQExpBuffer(errorMessage);
+		appendPQExpBufferStr(errorMessage, msg);
+	}

I see that appendPQExpBufferStr() checks whether the buffer is broken
by way of enlargePQExpBuffer(), so the fallback steps roughly match
the calls to printfPQExpBuffer() before ffa2e4670.

-			appendPQExpBuffer(&conn->errorMessage,
-							  libpq_gettext("out of memory allocating GSSAPI buffer (%d)\n"),
-							  payloadlen);
+			pqReportOOM(conn);

I see that some context is lost in a few places (e.g., the one above
points to a GSSAPI buffer). Perhaps this extra context could be
useful to identify problematic areas, but it might be unlikely to help
much in these parts of libpq. In any case, the vast majority of
existing callers don't provide any extra context.

Overall, the patch looks good to me.

Therefore, I feel like this was an oversight in ffa2e4670,
and we ought to back-patch the attached into v14.

Back-patching to v14 seems reasonable to me.

Nathan

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#2)
Re: Out-of-memory error reports in libpq

"Bossart, Nathan" <bossartn@amazon.com> writes:

-			appendPQExpBuffer(&conn->errorMessage,
-							  libpq_gettext("out of memory allocating GSSAPI buffer (%d)\n"),
-							  payloadlen);
+			pqReportOOM(conn);

I see that some context is lost in a few places (e.g., the one above
points to a GSSAPI buffer). Perhaps this extra context could be
useful to identify problematic areas, but it might be unlikely to help
much in these parts of libpq. In any case, the vast majority of
existing callers don't provide any extra context.

Yeah, there are half a dozen places that currently print something
more specific than "out of memory". I judged that the value of this
was not worth the complexity it'd add to support it in this scheme.
Different opinions welcome of course.

regards, tom lane

#4Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#3)
Re: Out-of-memory error reports in libpq

On Tue, Jul 27, 2021 at 10:31:25PM -0400, Tom Lane wrote:

Yeah, there are half a dozen places that currently print something
more specific than "out of memory". I judged that the value of this
was not worth the complexity it'd add to support it in this scheme.
Different opinions welcome of course.

I don't mind either that this removes a bit of context. For
unlikely-going-to-happen errors that's not worth the extra translation
cost. No objections from me for an integration into 14 as that's
straight-forward, and that would minimize conflicts between HEAD and
14 in the event of a back-patch

+pqReportOOM(PGconn *conn)
+{
+   pqReportOOMBuffer(&conn->errorMessage);
+}
+
+/*
+ * As above, but work with a bare error-message-buffer pointer.
+ */
+void
+pqReportOOMBuffer(PQExpBuffer errorMessage)
+{
Not much a fan of having two routines to do this job though.  I would
vote for keeping the one named pqReportOOM() with PQExpBuffer as
argument.
--
Michael
#5Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#1)
Re: Out-of-memory error reports in libpq

On 7/27/21 6:40 PM, Tom Lane wrote:

While cleaning out dead branches in my git repo, I came across an
early draft of what eventually became commit ffa2e4670 ("In libpq,
always append new error messages to conn->errorMessage"). I realized
that it contained a good idea that had gotten lost on the way to that
commit. Namely, let's reduce all of the 60-or-so "out of memory"
reports in libpq to calls to a common subroutine, and then let's teach
the common subroutine a recovery strategy for the not-unlikely
possibility that it fails to append the "out of memory" string to
conn->errorMessage. That recovery strategy of course is to reset the
errorMessage buffer to empty, hopefully regaining some space. We lose
whatever we'd had in the buffer before, but we have a better chance of
the "out of memory" message making its way to the user.

The first half of that just saves a few hundred bytes of repetitive
coding. However, I think that the addition of recovery logic is
important for robustness, because as things stand libpq may be
worse off than before for OOM handling. Before ffa2e4670, almost
all of these call sites did printfPQExpBuffer(..., "out of memory").
That would automatically clear the message buffer to empty, and
thereby be sure to report the out-of-memory failure if at all
possible. Now we might fail to report the thing that the user
really needs to know to make sense of what happened.

Therefore, I feel like this was an oversight in ffa2e4670,
and we ought to back-patch the attached into v14.

cc'ing the RMT in case they wish to object.

I'm honored you've confused me with Alvaro :-)

This seems sensible, and we certainly shouldn't be worse off than
before, so let's do it.

I'm fine with having two functions for call simplicity, but I don't feel
strongly about it.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#4)
Re: Out-of-memory error reports in libpq

Michael Paquier <michael@paquier.xyz> writes:

On Tue, Jul 27, 2021 at 10:31:25PM -0400, Tom Lane wrote:

Yeah, there are half a dozen places that currently print something
more specific than "out of memory". I judged that the value of this
was not worth the complexity it'd add to support it in this scheme.
Different opinions welcome of course.

I don't mind either that this removes a bit of context. For
unlikely-going-to-happen errors that's not worth the extra translation
cost.

Yeah, the extra translatable strings are the main concrete cost of
keeping this behavior. But I'm dubious that labeling a small number
of the possible OOM points is worth anything, especially if they're
not providing the failed allocation request size. You can't tell if
that request was unreasonable or if it was just an unlucky victim
of bloat elsewhere. Unifying the reports into a common function
could be a starting point for more consistent/detailed OOM reports,
if anyone cared to work on that. (I hasten to add that I don't.)

+ pqReportOOMBuffer(&conn->errorMessage);

Not much a fan of having two routines to do this job though. I would
vote for keeping the one named pqReportOOM() with PQExpBuffer as
argument.

Here I've got to disagree. We do need the form with a PQExpBuffer
argument, because there are some places where that isn't a pointer
to a PGconn's errorMessage. But the large majority of the calls
are "pqReportOOM(conn)", and I think having to write that as
"pqReportOOM(&conn->errorMessage)" is fairly ugly and perhaps
error-prone.

I'm not wedded to the name "pqReportOOMBuffer" though --- maybe
there's some better name for that one?

regards, tom lane

#7Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#6)
Re: Out-of-memory error reports in libpq

On 7/28/21 11:02 AM, Tom Lane wrote:

Here I've got to disagree. We do need the form with a PQExpBuffer
argument, because there are some places where that isn't a pointer
to a PGconn's errorMessage. But the large majority of the calls
are "pqReportOOM(conn)", and I think having to write that as
"pqReportOOM(&conn->errorMessage)" is fairly ugly and perhaps
error-prone.

I'm not wedded to the name "pqReportOOMBuffer" though --- maybe
there's some better name for that one?

Is it worth making the first one a macro?

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#7)
Re: Out-of-memory error reports in libpq

Andrew Dunstan <andrew@dunslane.net> writes:

Is it worth making the first one a macro?

It'd be the same from a source-code perspective, but probably a
shade bulkier in terms of object code.

regards, tom lane

#9Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#1)
Re: Out-of-memory error reports in libpq

Hi,

On 2021-07-27 18:40:48 -0400, Tom Lane wrote:

The first half of that just saves a few hundred bytes of repetitive
coding. However, I think that the addition of recovery logic is
important for robustness, because as things stand libpq may be
worse off than before for OOM handling.

Agreed.

Before ffa2e4670, almost all of these call sites did
printfPQExpBuffer(..., "out of memory"). That would automatically
clear the message buffer to empty, and thereby be sure to report the
out-of-memory failure if at all possible. Now we might fail to report
the thing that the user really needs to know to make sense of what
happened.

Hm. It seems we should be able to guarantee that the recovery path can print
something, at least in the PGconn case. Is it perhaps worth pre-sizing
PGConn->errorMessage so it'd fit an error like this?

But perhaps that's more effort than it's worth.

+void
+pqReportOOMBuffer(PQExpBuffer errorMessage)
+{
+	const char *msg = libpq_gettext("out of memory\n");

I should probably know this, but I don't. Nor did I quickly find an answer. I
assume gettext() reliably and reasonably deals with OOM?

Looking in the gettext code I'm again scared by the fact that it takes locks
during gettext (because of stuff like erroring out of signal handlers, not
OOMs).

It does look like it tries to always return the original string in case of
OOM. Although the code is quite maze-like, so it's not easy to tell..

Greetings,

Andres Freund

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#9)
Re: Out-of-memory error reports in libpq

Andres Freund <andres@anarazel.de> writes:

I should probably know this, but I don't. Nor did I quickly find an answer. I
assume gettext() reliably and reasonably deals with OOM?

I've always assumed that their fallback in cases of OOM, can't read
the message file, yadda yadda is to return the original string.
I admit I haven't gone and checked their code, but it'd be
unbelievably stupid to do otherwise.

Looking in the gettext code I'm again scared by the fact that it takes locks
during gettext (because of stuff like erroring out of signal handlers, not
OOMs).

Hm.

regards, tom lane

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#9)
Re: Out-of-memory error reports in libpq

Andres Freund <andres@anarazel.de> writes:

Hm. It seems we should be able to guarantee that the recovery path can print
something, at least in the PGconn case. Is it perhaps worth pre-sizing
PGConn->errorMessage so it'd fit an error like this?

Forgot to address this. Right now, the normal situation is that
PGConn->errorMessage is "pre sized" to 256 bytes, because that's
what pqexpbuffer.c does for all PQExpBuffers. So unless you've
overrun that, the question is moot. If you have, and you got
an OOM in trying to expand the PQExpBuffer, then pqexpbuffer.c
will release what it has and substitute the "oom_buffer" empty
string. If you're really unlucky you might then not be able
to allocate another 256-byte buffer, in which case we end up
with an empty-string result. I don't think it's probable,
but in a multithread program it could happen.

But perhaps that's more effort than it's worth.

Yeah. I considered changing things so that oom_buffer contains
"out of memory\n" rather than an empty string, but I'm afraid
that that's making unsupportable assumptions about what PQExpBuffers
are used for.

For now, I'm content if it's not worse than v13. We've not
heard a lot of complaints in this area.

regards, tom lane

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#11)
Re: Out-of-memory error reports in libpq

I wrote:

Andres Freund <andres@anarazel.de> writes:

Hm. It seems we should be able to guarantee that the recovery path can print
something, at least in the PGconn case. Is it perhaps worth pre-sizing
PGConn->errorMessage so it'd fit an error like this?
But perhaps that's more effort than it's worth.

Yeah. I considered changing things so that oom_buffer contains
"out of memory\n" rather than an empty string, but I'm afraid
that that's making unsupportable assumptions about what PQExpBuffers
are used for.

Actually, wait a minute. There are only a couple of places that ever
read out the value of conn->errorMessage, so let's make those places
responsible for dealing with OOM scenarios. That leads to a nicely
small patch, as attached, and it looks to me like it makes us quite
bulletproof against such scenarios.

It might still be worth doing the "pqReportOOM" changes to save a
few bytes of code space, but I'm less excited about that now.

regards, tom lane

Attachments:

improve-libpq-OOM-reporting-2.patchtext/x-diff; charset=us-ascii; name=improve-libpq-OOM-reporting-2.patchDownload+29-7
#13Ranier Vilela
ranier.vf@gmail.com
In reply to: Tom Lane (#12)
Re: Out-of-memory error reports in libpq

Em qua., 28 de jul. de 2021 às 21:25, Tom Lane <tgl@sss.pgh.pa.us> escreveu:

I wrote:

Andres Freund <andres@anarazel.de> writes:

Hm. It seems we should be able to guarantee that the recovery path can

print

something, at least in the PGconn case. Is it perhaps worth pre-sizing
PGConn->errorMessage so it'd fit an error like this?
But perhaps that's more effort than it's worth.

Yeah. I considered changing things so that oom_buffer contains
"out of memory\n" rather than an empty string, but I'm afraid
that that's making unsupportable assumptions about what PQExpBuffers
are used for.

Actually, wait a minute. There are only a couple of places that ever
read out the value of conn->errorMessage, so let's make those places
responsible for dealing with OOM scenarios. That leads to a nicely
small patch, as attached, and it looks to me like it makes us quite
bulletproof against such scenarios.

It might still be worth doing the "pqReportOOM" changes to save a
few bytes of code space, but I'm less excited about that now.

IMO, I think that "char *msg" is unnecessary, isn't it?

+ if (!PQExpBufferBroken(errorMessage))
+ res->errMsg = pqResultStrdup(res, errorMessage->data);
  else
- res->errMsg = NULL;
+ res->errMsg = libpq_gettext("out of memory\n");
Show quoted text

regards, tom lane

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ranier Vilela (#13)
Re: Out-of-memory error reports in libpq

Ranier Vilela <ranier.vf@gmail.com> writes:

IMO, I think that "char *msg" is unnecessary, isn't it?

+ if (!PQExpBufferBroken(errorMessage))
+ res->errMsg = pqResultStrdup(res, errorMessage->data);
else
- res->errMsg = NULL;
+ res->errMsg = libpq_gettext("out of memory\n");

Please read the comment.

regards, tom lane

#15Peter Smith
smithpb2250@gmail.com
In reply to: Tom Lane (#14)
Re: Out-of-memory error reports in libpq

(This is not a code review - this is just to satisfy my curiosity)

I've seen lots of code like this where I may have been tempted to use
a ternary operator for readability, so I was wondering is there a PG
convention to avoid such ternary operator assignments, or is it simply
a personal taste thing, or is there some other reason?

For example:

if (msg)
res->errMsg = msg;
else
res->errMsg = libpq_gettext("out of memory\n");

VERSUS:

res->errMsg = msg ? msg : libpq_gettext("out of memory\n");

------
Kind Regards,
Peter Smith.
Fujitsu Australia

#16Andrew Dunstan
andrew@dunslane.net
In reply to: Peter Smith (#15)
Re: Out-of-memory error reports in libpq

On 7/29/21 3:01 AM, Peter Smith wrote:

(This is not a code review - this is just to satisfy my curiosity)

I've seen lots of code like this where I may have been tempted to use
a ternary operator for readability, so I was wondering is there a PG
convention to avoid such ternary operator assignments, or is it simply
a personal taste thing, or is there some other reason?

For example:

if (msg)
res->errMsg = msg;
else
res->errMsg = libpq_gettext("out of memory\n");

VERSUS:

res->errMsg = msg ? msg : libpq_gettext("out of memory\n");

A simple grep on the sources should disabuse you of any idea that there
is such a convention. The code is littered with examples of the ?: operator.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#17Ranier Vilela
ranier.vf@gmail.com
In reply to: Tom Lane (#14)
Re: Out-of-memory error reports in libpq

Em qui., 29 de jul. de 2021 às 00:40, Tom Lane <tgl@sss.pgh.pa.us> escreveu:

Ranier Vilela <ranier.vf@gmail.com> writes:

IMO, I think that "char *msg" is unnecessary, isn't it?

+ if (!PQExpBufferBroken(errorMessage))
+ res->errMsg = pqResultStrdup(res, errorMessage->data);
else
- res->errMsg = NULL;
+ res->errMsg = libpq_gettext("out of memory\n");

Please read the comment.

You're right, I missed pqResultStrdup fail.

+1

regards,
Ranier Vilela

#18Ranier Vilela
ranier.vf@gmail.com
In reply to: Peter Smith (#15)
Re: Out-of-memory error reports in libpq

Em qui., 29 de jul. de 2021 às 04:02, Peter Smith <smithpb2250@gmail.com>
escreveu:

(This is not a code review - this is just to satisfy my curiosity)

I've seen lots of code like this where I may have been tempted to use
a ternary operator for readability, so I was wondering is there a PG
convention to avoid such ternary operator assignments, or is it simply
a personal taste thing, or is there some other reason?

For example:

if (msg)
res->errMsg = msg;
else
res->errMsg = libpq_gettext("out of memory\n");

The C compiler will expand:

res->errMsg = msg ? msg : libpq_gettext("out of memory\n");

to

if (msg)
res->errMsg = msg;
else
res->errMsg = libpq_gettext("out of memory\n");

What IMHO is much more readable.

regards,
Ranier Vilela

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#16)
Re: Out-of-memory error reports in libpq

Andrew Dunstan <andrew@dunslane.net> writes:

On 7/29/21 3:01 AM, Peter Smith wrote:

I've seen lots of code like this where I may have been tempted to use
a ternary operator for readability, so I was wondering is there a PG
convention to avoid such ternary operator assignments, or is it simply
a personal taste thing, or is there some other reason?

A simple grep on the sources should disabuse you of any idea that there
is such a convention. The code is littered with examples of the ?: operator.

Yeah. I happened not to write it that way here, but if I'd been reviewing
someone else's code and they'd done it that way, I'd not have objected.

In the case at hand, I'd personally avoid a ternary op for the first
assignment because then the line would run over 80 characters, and
you'd have to make decisions about where to break it. (We don't have
a standardized convention about that, and none of the alternatives
look very good to my eye.) Then it seemed to make sense to also
write the second step as an "if" not a ternary op.

regards, tom lane

#20Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#19)
Re: Out-of-memory error reports in libpq

On Thu, Jul 29, 2021 at 9:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

In the case at hand, I'd personally avoid a ternary op for the first
assignment because then the line would run over 80 characters, and
you'd have to make decisions about where to break it. (We don't have
a standardized convention about that, and none of the alternatives
look very good to my eye.)

This is exactly why I rarely use ?:

--
Robert Haas
EDB: http://www.enterprisedb.com