Make PQgetResult() not return NULL on out-of-memory error

Started by Yugo Nagata4 months ago12 messages

nagata@sraoss.co.jp

4 months ago

Hi,

Currently, PQgetResult() returns NULL not only when no results remain for
a sent query, but also when an out-of-memory error occurs, except when
PGconn itself is NULL. As a result, users cannot distinguish between query
completion and an out-of-memory error when PQgetResult() returns NULL.

The result returned by PQgetResult() is generated by either pqPipelineProcessQueue()
or getCopyResult(). While pqPipelineProcessQueue() never returns NULL, even in the
case of an out-of-memory error, getCopyResult() may return NULL.
Therefore, I propose modifying getCopyResult() so that it never returns NULL, but
instead returns OOM_result, as pqPipelineProcessQueue() does.

I’ve attached a patch for this.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>

Chao Li

li.evan.chao@gmail.com

4 months ago

In reply to: Yugo Nagata (#1)

Re: Make PQgetResult() not return NULL on out-of-memory error

On Nov 11, 2025, at 01:07, Yugo Nagata <nagata@sraoss.co.jp> wrote:

Hi,

Currently, PQgetResult() returns NULL not only when no results remain for
a sent query, but also when an out-of-memory error occurs, except when
PGconn itself is NULL. As a result, users cannot distinguish between query
completion and an out-of-memory error when PQgetResult() returns NULL.

The result returned by PQgetResult() is generated by either pqPipelineProcessQueue()
or getCopyResult(). While pqPipelineProcessQueue() never returns NULL, even in the
case of an out-of-memory error, getCopyResult() may return NULL.
Therefore, I propose modifying getCopyResult() so that it never returns NULL, but
instead returns OOM_result, as pqPipelineProcessQueue() does.

I’ve attached a patch for this.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
<0001-Make-PQgetResult-not-return-NULL-on-out-of-memory-er.patch>

This is a front-end OOM. As it changes the behavior of PGgetResult(), I am just worrying about if callers can handle the behavior change.

For example, pgbench:

* When pgbench calls readCommandResponse()
* If OOM happens, PQgetResult() returns OOM_reslt whose resultState is PGRES_FATAL_ERROR
* readCommandResponse() will goto the error tag, then discardAvailableResults() will be called
* discardAvailableResults() only returns when res is NULL, so that, would discardAvailableResults() fall into an infinite loop?

Otherwise the patch looks good to me. I agree that is a good idea to distinguish a client OOM error from a server side error.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Fujii Masao

masao.fujii@gmail.com

4 months ago

In reply to: Chao Li (#2)

Re: Make PQgetResult() not return NULL on out-of-memory error

On Tue, Nov 11, 2025 at 12:12 PM Chao Li <li.evan.chao@gmail.com> wrote:

For example, pgbench:

* When pgbench calls readCommandResponse()
* If OOM happens, PQgetResult() returns OOM_reslt whose resultState is PGRES_FATAL_ERROR
* readCommandResponse() will goto the error tag, then discardAvailableResults() will be called
* discardAvailableResults() only returns when res is NULL, so that, would discardAvailableResults() fall into an infinite loop?

Yes, that's a valid concern. If PQgetResult() keeps returning OOM_result due to
an out-of-memory error, some functions (e.g., discardAvailableResults()
in pgbench.c or PQexecFinish() in fe-exec.c) that expect PQgetResult() to
eventually return NULL could end up in an infinite loop.

To address this, callers need a way to distinguish between PGRES_FATAL_ERROR
and OOM. Functions that loop until PQgetResult() returns NULL should continue
if the result is PGRES_FATAL_ERROR, but should break if it's an OOM.
For example, one possible solution would be to add PGRES_OOM to
ExecStatusType, allowing callers to detect out-of-memory errors with
PQresultStatus() == PGRES_OOM.

Currently (even with the patch), OOM_result is returned only when
PQmakeEmptyPGresult() fails to allocate memory. But I guess there might be
other allocation failure paths. If we explicitly expose OOM_result,
we may also need to update PQgetResult() to return it in those other cases.
I'm not sure if that's the right direction, though... Probably we need to
look into the original commit that introduced OOM_result and its
related discussion.

Regards,

--
Fujii Masao

Tom Lane

tgl@sss.pgh.pa.us

4 months ago

In reply to: Fujii Masao (#3)

Re: Make PQgetResult() not return NULL on out-of-memory error

Fujii Masao <masao.fujii@gmail.com> writes:

To address this, callers need a way to distinguish between PGRES_FATAL_ERROR
and OOM. Functions that loop until PQgetResult() returns NULL should continue
if the result is PGRES_FATAL_ERROR, but should break if it's an OOM.

Not sure about that. We might or might not be able to make progress
if the caller keeps calling PQgetResult. But if it stops after an
OOM report then we might as well just close the connection, because
there is no hope of getting back in sync.

I'm inclined to think that it's correct that we should return
OOM_result not NULL if we couldn't make a PGresult, but further
analysis is needed to make sure that libpq can make progress
afterwards. I don't think we should expect applications to
involve themselves in that recovery logic, because they won't,
and most certainly won't test it carefully.

The whole project might be doomed to failure really. I think
that one very likely failure if we are approaching OOM is that
we can't enlarge libpq's input buffer enough to accept all of
a (long) incoming server message. What hope have we got of
getting out of that situation?

regards, tom lane

Fujii Masao

masao.fujii@gmail.com

4 months ago

In reply to: Fujii Masao (#3)

Re: Make PQgetResult() not return NULL on out-of-memory error

On Tue, Nov 11, 2025 at 2:20 PM Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Nov 11, 2025 at 12:12 PM Chao Li <li.evan.chao@gmail.com> wrote:

For example, pgbench:

* When pgbench calls readCommandResponse()
* If OOM happens, PQgetResult() returns OOM_reslt whose resultState is PGRES_FATAL_ERROR
* readCommandResponse() will goto the error tag, then discardAvailableResults() will be called
* discardAvailableResults() only returns when res is NULL, so that, would discardAvailableResults() fall into an infinite loop?

Yes, that's a valid concern. If PQgetResult() keeps returning OOM_result due to
an out-of-memory error, some functions (e.g., discardAvailableResults()
in pgbench.c or PQexecFinish() in fe-exec.c) that expect PQgetResult() to
eventually return NULL could end up in an infinite loop.

To address this, callers need a way to distinguish between PGRES_FATAL_ERROR
and OOM. Functions that loop until PQgetResult() returns NULL should continue
if the result is PGRES_FATAL_ERROR, but should break if it's an OOM.
For example, one possible solution would be to add PGRES_OOM to
ExecStatusType, allowing callers to detect out-of-memory errors with
PQresultStatus() == PGRES_OOM.

This approach might not be good. Many applications currently would expect
PQgetResult() to return PGRES_FATAL_ERROR for any fatal error,
including out-of-memory. Introducing a new result status like PGRES_OOM
could break those applications, since they would need to be updated to
handle this new status explicitly.

Regards,

--
Fujii Masao

Fujii Masao

masao.fujii@gmail.com

4 months ago

In reply to: Tom Lane (#4)

Re: Make PQgetResult() not return NULL on out-of-memory error

On Tue, Nov 11, 2025 at 2:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Fujii Masao <masao.fujii@gmail.com> writes:

To address this, callers need a way to distinguish between PGRES_FATAL_ERROR
and OOM. Functions that loop until PQgetResult() returns NULL should continue
if the result is PGRES_FATAL_ERROR, but should break if it's an OOM.

Not sure about that. We might or might not be able to make progress
if the caller keeps calling PQgetResult. But if it stops after an
OOM report then we might as well just close the connection, because
there is no hope of getting back in sync.

I'm inclined to think that it's correct that we should return
OOM_result not NULL if we couldn't make a PGresult, but further
analysis is needed to make sure that libpq can make progress
afterwards. I don't think we should expect applications to
involve themselves in that recovery logic, because they won't,
and most certainly won't test it carefully.

The whole project might be doomed to failure really. I think
that one very likely failure if we are approaching OOM is that
we can't enlarge libpq's input buffer enough to accept all of
a (long) incoming server message. What hope have we got of
getting out of that situation?

When pqCheckInBufferSpace() fails to enlarge the input buffer and PQgetResult()
returns PGRES_FATAL_ERROR, I noticed that PQgetResult() sets asyncStatus to
PGASYNC_IDLE. This means subsequent calls to PQgetResult() will return NULL
even in OOM case, so functions looping until NULL won't end up in an
infinite loop.
Of course, issuing another query on the same connection afterward could be
problematic, though.

I'm still not sure this behavior is correct, but I'm just wondering if
asyncStatus should be reset to PGASYNC_IDLE also when OOM_result is returned.

Regards,

--
Fujii Masao

Yugo Nagata

nagata@sraoss.co.jp

4 months ago

In reply to: Fujii Masao (#6)

Re: Make PQgetResult() not return NULL on out-of-memory error

On Tue, 11 Nov 2025 20:16:11 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Nov 11, 2025 at 2:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Fujii Masao <masao.fujii@gmail.com> writes:

To address this, callers need a way to distinguish between PGRES_FATAL_ERROR
and OOM. Functions that loop until PQgetResult() returns NULL should continue
if the result is PGRES_FATAL_ERROR, but should break if it's an OOM.

Not sure about that. We might or might not be able to make progress
if the caller keeps calling PQgetResult. But if it stops after an
OOM report then we might as well just close the connection, because
there is no hope of getting back in sync.

I'm inclined to think that it's correct that we should return
OOM_result not NULL if we couldn't make a PGresult, but further
analysis is needed to make sure that libpq can make progress
afterwards. I don't think we should expect applications to
involve themselves in that recovery logic, because they won't,
and most certainly won't test it carefully.

The whole project might be doomed to failure really. I think
that one very likely failure if we are approaching OOM is that
we can't enlarge libpq's input buffer enough to accept all of
a (long) incoming server message. What hope have we got of
getting out of that situation?

When pqCheckInBufferSpace() fails to enlarge the input buffer and PQgetResult()
returns PGRES_FATAL_ERROR, I noticed that PQgetResult() sets asyncStatus to
PGASYNC_IDLE. This means subsequent calls to PQgetResult() will return NULL
even in OOM case, so functions looping until NULL won't end up in an
infinite loop.
Of course, issuing another query on the same connection afterward could be
problematic, though.

I'm still not sure this behavior is correct, but I'm just wondering if
asyncStatus should be reset to PGASYNC_IDLE also when OOM_result is returned.

I agree that PQgetResult() should return NULL after returning OOM_result, and
resetting asyncStatus to PGASYNC_IDLE would work in the normal mode.

I also agree that continuing to use the same connection seems problematic,
especially in pipeline mode; the pipeline status remains PQ_PIPELINE_OK, PQgetResult()
never returns PGRES_PIPELINE_SYNC, and the sent commands are left in the command queue.

Therefore, I wonder about closing the connection and resetting the status
when OOM_result is retunred, by callling pqDropConnection() as handleFatalError() does.
In this case, the connection status should also be set to CONNECTINO_BAD.

This would prevent applications from continueing to use potentially problamatic
connection without providing the way to distinguish OOM from other errors.

What do you think?

I've attached an updated patch in this direction.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>

Yugo Nagata

nagata@sraoss.co.jp

4 months ago

In reply to: Yugo Nagata (#7)

Re: Make PQgetResult() not return NULL on out-of-memory error

On Wed, 12 Nov 2025 16:20:13 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

On Tue, 11 Nov 2025 20:16:11 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Nov 11, 2025 at 2:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Fujii Masao <masao.fujii@gmail.com> writes:

To address this, callers need a way to distinguish between PGRES_FATAL_ERROR
and OOM. Functions that loop until PQgetResult() returns NULL should continue
if the result is PGRES_FATAL_ERROR, but should break if it's an OOM.

Not sure about that. We might or might not be able to make progress
if the caller keeps calling PQgetResult. But if it stops after an
OOM report then we might as well just close the connection, because
there is no hope of getting back in sync.

I'm inclined to think that it's correct that we should return
OOM_result not NULL if we couldn't make a PGresult, but further
analysis is needed to make sure that libpq can make progress
afterwards. I don't think we should expect applications to
involve themselves in that recovery logic, because they won't,
and most certainly won't test it carefully.

The whole project might be doomed to failure really. I think
that one very likely failure if we are approaching OOM is that
we can't enlarge libpq's input buffer enough to accept all of
a (long) incoming server message. What hope have we got of
getting out of that situation?

When pqCheckInBufferSpace() fails to enlarge the input buffer and PQgetResult()
returns PGRES_FATAL_ERROR, I noticed that PQgetResult() sets asyncStatus to
PGASYNC_IDLE. This means subsequent calls to PQgetResult() will return NULL
even in OOM case, so functions looping until NULL won't end up in an
infinite loop.
Of course, issuing another query on the same connection afterward could be
problematic, though.

I'm still not sure this behavior is correct, but I'm just wondering if
asyncStatus should be reset to PGASYNC_IDLE also when OOM_result is returned.

I agree that PQgetResult() should return NULL after returning OOM_result, and
resetting asyncStatus to PGASYNC_IDLE would work in the normal mode.

I also agree that continuing to use the same connection seems problematic,
especially in pipeline mode; the pipeline status remains PQ_PIPELINE_OK, PQgetResult()
never returns PGRES_PIPELINE_SYNC, and the sent commands are left in the command queue.

Therefore, I wonder about closing the connection and resetting the status
when OOM_result is retunred, by callling pqDropConnection() as handleFatalError() does.
In this case, the connection status should also be set to CONNECTINO_BAD.

This would prevent applications from continueing to use potentially problamatic
connection without providing the way to distinguish OOM from other errors.

What do you think?

I've attached an updated patch in this direction.

I'm sorry, the patch attached to my previous post was incorrect.
I've attached the correct one.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>

Chao Li

li.evan.chao@gmail.com

4 months ago

In reply to: Yugo Nagata (#7)

Re: Make PQgetResult() not return NULL on out-of-memory error

On Nov 12, 2025, at 15:20, Yugo Nagata <nagata@sraoss.co.jp> wrote:

On Tue, 11 Nov 2025 20:16:11 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Nov 11, 2025 at 2:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Fujii Masao <masao.fujii@gmail.com> writes:

To address this, callers need a way to distinguish between PGRES_FATAL_ERROR
and OOM. Functions that loop until PQgetResult() returns NULL should continue
if the result is PGRES_FATAL_ERROR, but should break if it's an OOM.

Not sure about that. We might or might not be able to make progress
if the caller keeps calling PQgetResult. But if it stops after an
OOM report then we might as well just close the connection, because
there is no hope of getting back in sync.

I'm inclined to think that it's correct that we should return
OOM_result not NULL if we couldn't make a PGresult, but further
analysis is needed to make sure that libpq can make progress
afterwards. I don't think we should expect applications to
involve themselves in that recovery logic, because they won't,
and most certainly won't test it carefully.

The whole project might be doomed to failure really. I think
that one very likely failure if we are approaching OOM is that
we can't enlarge libpq's input buffer enough to accept all of
a (long) incoming server message. What hope have we got of
getting out of that situation?

When pqCheckInBufferSpace() fails to enlarge the input buffer and PQgetResult()
returns PGRES_FATAL_ERROR, I noticed that PQgetResult() sets asyncStatus to
PGASYNC_IDLE. This means subsequent calls to PQgetResult() will return NULL
even in OOM case, so functions looping until NULL won't end up in an
infinite loop.
Of course, issuing another query on the same connection afterward could be
problematic, though.

I'm still not sure this behavior is correct, but I'm just wondering if
asyncStatus should be reset to PGASYNC_IDLE also when OOM_result is returned.

I agree that PQgetResult() should return NULL after returning OOM_result, and
resetting asyncStatus to PGASYNC_IDLE would work in the normal mode.

I also agree that continuing to use the same connection seems problematic,
especially in pipeline mode; the pipeline status remains PQ_PIPELINE_OK, PQgetResult()
never returns PGRES_PIPELINE_SYNC, and the sent commands are left in the command queue.

Therefore, I wonder about closing the connection and resetting the status
when OOM_result is retunred, by callling pqDropConnection() as handleFatalError() does.
In this case, the connection status should also be set to CONNECTINO_BAD.

This would prevent applications from continueing to use potentially problamatic
connection without providing the way to distinguish OOM from other errors.

What do you think?

I've attached an updated patch in this direction.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
<v2-0001-Make-PQgetResult-not-return-NULL-on-out-of-memory.patch>

Here, the OOM condition is a client-side problem that’s generally beyond libpq’s control. When an out-of-memory error occurs, it usually isn’t caused by a memory leak inside libpq itself. After OOM, any operation that requires additional allocation may fail — even something as simple as writing a log message. Closing the current connection might release some memory, but that memory could be immediately consumed by other processes, so further actions might still fail. In such cases, simply aborting the process can be a reasonable and straightforward strategy.

However, in many cases the libpq client is an application server, where forcibly aborting the entire process is not acceptable. From that perspective, closing the affected connection is a more appropriate response. The application should detect the broken database connection and attempt to reconnect. If the OOM condition persists and reconnection continues to fail, the application’s own recovery and failover logic should take over.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#10

Fujii Masao

masao.fujii@gmail.com

4 months ago

In reply to: Chao Li (#9)

Re: Make PQgetResult() not return NULL on out-of-memory error

Therefore, I wonder about closing the connection and resetting the status
when OOM_result is retunred, by callling pqDropConnection() as handleFatalError() does.
In this case, the connection status should also be set to CONNECTINO_BAD.

There are many code paths in libpq that can trigger an out-of-memory error.
However, this patch makes libpq drop the connection when OOM_result is returned,
which likely covers only a subset of those cases. So I'm not sure dropping
the connection only for OOM_result would meaningfully improve things.

Perhaps it would be more consistent to drop the connection for any
internal libpq error
such as out-of-memory, though that might require invasive code changes and coul
break existing applications, I'm afraid.

Regards,

--
Fujii Masao

#11

Yugo Nagata

nagata@sraoss.co.jp

4 months ago

In reply to: Fujii Masao (#10)

Re: Make PQgetResult() not return NULL on out-of-memory error

On Wed, 12 Nov 2025 18:09:28 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:

Therefore, I wonder about closing the connection and resetting the status
when OOM_result is retunred, by callling pqDropConnection() as handleFatalError() does.
In this case, the connection status should also be set to CONNECTINO_BAD.

There are many code paths in libpq that can trigger an out-of-memory error.
However, this patch makes libpq drop the connection when OOM_result is returned,
which likely covers only a subset of those cases. So I'm not sure dropping
the connection only for OOM_result would meaningfully improve things.

Perhaps it would be more consistent to drop the connection for any
internal libpq error
such as out-of-memory, though that might require invasive code changes and coul
break existing applications, I'm afraid.

I see that some internal errors are handled by calling handleFatalError()
(i.e. dropping the connection), but for some OOM errors in libpq/fe-protocol3.c,
functions just fail silently. Therefore, I'm not sure if dropping the connection
for any internal libpq errors is good idea.

If an OOM during the preparation of a result of PGgetResult() is not considered
a fatal error should terminate the connection, then is it the application's
responsibility to determine whetehr the connection can continue to be used?
In that case, a way to detect OOM failure (e.g. PQresultIsOutOrMemory()
or something similar) should be provided.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>

#12

Yugo Nagata

nagata@sraoss.co.jp

4 months ago

In reply to: Chao Li (#9)

Re: Make PQgetResult() not return NULL on out-of-memory error

On Wed, 12 Nov 2025 16:40:34 +0800
Chao Li <li.evan.chao@gmail.com> wrote:

On Nov 12, 2025, at 15:20, Yugo Nagata <nagata@sraoss.co.jp> wrote:

On Tue, 11 Nov 2025 20:16:11 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Nov 11, 2025 at 2:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Fujii Masao <masao.fujii@gmail.com> writes:

To address this, callers need a way to distinguish between PGRES_FATAL_ERROR
and OOM. Functions that loop until PQgetResult() returns NULL should continue
if the result is PGRES_FATAL_ERROR, but should break if it's an OOM.

Not sure about that. We might or might not be able to make progress
if the caller keeps calling PQgetResult. But if it stops after an
OOM report then we might as well just close the connection, because
there is no hope of getting back in sync.

I'm inclined to think that it's correct that we should return
OOM_result not NULL if we couldn't make a PGresult, but further
analysis is needed to make sure that libpq can make progress
afterwards. I don't think we should expect applications to
involve themselves in that recovery logic, because they won't,
and most certainly won't test it carefully.

The whole project might be doomed to failure really. I think
that one very likely failure if we are approaching OOM is that
we can't enlarge libpq's input buffer enough to accept all of
a (long) incoming server message. What hope have we got of
getting out of that situation?

When pqCheckInBufferSpace() fails to enlarge the input buffer and PQgetResult()
returns PGRES_FATAL_ERROR, I noticed that PQgetResult() sets asyncStatus to
PGASYNC_IDLE. This means subsequent calls to PQgetResult() will return NULL
even in OOM case, so functions looping until NULL won't end up in an
infinite loop.
Of course, issuing another query on the same connection afterward could be
problematic, though.

I'm still not sure this behavior is correct, but I'm just wondering if
asyncStatus should be reset to PGASYNC_IDLE also when OOM_result is returned.

I agree that PQgetResult() should return NULL after returning OOM_result, and
resetting asyncStatus to PGASYNC_IDLE would work in the normal mode.

I also agree that continuing to use the same connection seems problematic,
especially in pipeline mode; the pipeline status remains PQ_PIPELINE_OK, PQgetResult()
never returns PGRES_PIPELINE_SYNC, and the sent commands are left in the command queue.

Therefore, I wonder about closing the connection and resetting the status
when OOM_result is retunred, by callling pqDropConnection() as handleFatalError() does.
In this case, the connection status should also be set to CONNECTINO_BAD.

This would prevent applications from continueing to use potentially problamatic
connection without providing the way to distinguish OOM from other errors.

What do you think?

I've attached an updated patch in this direction.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>
<v2-0001-Make-PQgetResult-not-return-NULL-on-out-of-memory.patch>

Here, the OOM condition is a client-side problem that’s generally beyond libpq’s control. When an out-of-memory error occurs, it usually isn’t caused by a memory leak inside libpq itself. After OOM, any operation that requires additional allocation may fail — even something as simple as writing a log message. Closing the current connection might release some memory, but that memory could be immediately consumed by other processes, so further actions might still fail. In such cases, simply aborting the process can be a reasonable and straightforward strategy.

However, in many cases the libpq client is an application server, where forcibly aborting the entire process is not acceptable. From that perspective, closing the affected connection is a more appropriate response. The application should detect the broken database connection and attempt to reconnect. If the OOM condition persists and reconnection continues to fail, the application’s own recovery and failover logic should take over.

What I meant is that closing the connection on OOM errors is intended to prevent
applications from continuing to use a potentially problematic connection, rather
than to release memory.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>

Make PQgetResult() not return NULL on out-of-memory error

Attachments:

Attachments:

Attachments: