Probable memory leak with ECPG and AIX

Started by Guillaume Lelargeover 4 years ago20 messageshackers
Jump to latest
#1Guillaume Lelarge
guillaume@lelarge.info

Hello,

Our customer thinks he has found a memory leak on ECPG and AIX.

The code is quite simple. It declares a cursor, opens it, and fetches the
only line available in the table many times. After some time, the client
crashes with a segfault error. According to him, it consumed around 256MB.
What's weird is that it works great on Linux, but crashed on AIX. One
coworker thought it could be the compiler. Our customer used cc, but he
also tried with gcc, and got the same error.

The test case is attached (testcase.pgc is the ECPG code, testcase.sh is
what our customer used to precompile and compile his code). Do you have any
idea why that happens on AIX?

Two queries to create the table and populate it with a single record:

CREATE TABLE foo(
key integer PRIMARY KEY,
value character varying(20)
);
INSERT INTO foo values (1, 'one');

Thanks.

Regards.

--
Guillaume.

Attachments:

testcase.pgcapplication/octet-stream; name=testcase.pgcDownload
testcase.shapplication/x-shellscript; name=testcase.shDownload
#2Justin Pryzby
pryzby@telsasoft.com
In reply to: Guillaume Lelarge (#1)
Re: Probable memory leak with ECPG and AIX

On Fri, Dec 10, 2021 at 03:40:50PM +0100, Guillaume Lelarge wrote:

Hello,

Our customer thinks he has found a memory leak on ECPG and AIX.

The code is quite simple. It declares a cursor, opens it, and fetches the
only line available in the table many times. After some time, the client
crashes with a segfault error. According to him, it consumed around 256MB.
What's weird is that it works great on Linux, but crashed on AIX. One
coworker thought it could be the compiler. Our customer used cc, but he
also tried with gcc, and got the same error.

A memory leak isn't the same as a segfault (although I don't know how AIX
responds to OOM).

Can you show that it's a memory leak ? Show RAM use increasing continuously
and linearly with loop count.

How many loops does it take to crash ?

Could you obtain a backtrace ?

--
Justin

#3Guillaume Lelarge
guillaume@lelarge.info
In reply to: Justin Pryzby (#2)
Re: Probable memory leak with ECPG and AIX

Le sam. 11 déc. 2021 à 07:52, Justin Pryzby <pryzby@telsasoft.com> a écrit :

On Fri, Dec 10, 2021 at 03:40:50PM +0100, Guillaume Lelarge wrote:

Hello,

Our customer thinks he has found a memory leak on ECPG and AIX.

The code is quite simple. It declares a cursor, opens it, and fetches the
only line available in the table many times. After some time, the client
crashes with a segfault error. According to him, it consumed around

256MB.

What's weird is that it works great on Linux, but crashed on AIX. One
coworker thought it could be the compiler. Our customer used cc, but he
also tried with gcc, and got the same error.

A memory leak isn't the same as a segfault (although I don't know how AIX
responds to OOM).

Can you show that it's a memory leak ? Show RAM use increasing
continuously
and linearly with loop count.

How many loops does it take to crash ?

Could you obtain a backtrace ?

Thanks. I'll try to get all these informations, but it won't be before
monday.

--
Guillaume.

#4Noah Misch
noah@leadboat.com
In reply to: Guillaume Lelarge (#1)
Re: Probable memory leak with ECPG and AIX

On Fri, Dec 10, 2021 at 03:40:50PM +0100, Guillaume Lelarge wrote:

After some time, the client
crashes with a segfault error. According to him, it consumed around 256MB.
What's weird is that it works great on Linux, but crashed on AIX.

That almost certainly means he's using a 32-bit binary with the default heap
size. To use more heap on AIX, build 64-bit or override the heap size. For
example, "env LDR_CNTRL=MAXDATA=0x80000000 ./a.out" gives 2GiB of heap. See
https://www.postgresql.org/docs/devel/installation-platform-notes.html#INSTALLATION-NOTES-AIX
for more ways to control heap size. While that documentation focuses on the
server, the same techniques apply to clients like your test program.

That said, I don't know why your test program reaches 256MB on AIX. On
GNU/Linux, it uses a lot less. What version of PostgreSQL provided your
client libraries?

#5talk to ben
blo.talkto@gmail.com
In reply to: Noah Misch (#4)
Re: Probable memory leak with ECPG and AIX

Hi,

(I work with Guillaume on this case.)

On Sun, Dec 12, 2021 at 8:34 AM Noah Misch <noah@leadboat.com> wrote:

That almost certainly means he's using a 32-bit binary with the default
heap
size. To use more heap on AIX, build 64-bit or override the heap size.
For
example, "env LDR_CNTRL=MAXDATA=0x80000000 ./a.out" gives 2GiB of heap.
See

https://www.postgresql.org/docs/devel/installation-platform-notes.html#INSTALLATION-NOTES-AIX
for more ways to control heap size. While that documentation focuses on
the
server, the same techniques apply to clients like your test program.

That said, I don't know why your test program reaches 256MB on AIX. On
GNU/Linux, it uses a lot less. What version of PostgreSQL provided your
client libraries?

They use a 12.3 in production but have also tested on a 12.9 with the same
result.
We relayed your suggestion and will get back to you with the results.

Thanks for the input !

#6Benoit Lobréau
benoit.lobreau@gmail.com
In reply to: talk to ben (#5)
Re: Probable memory leak with ECPG and AIX

Our client confirmed that the more he fetches the more memory is consumed.
The segfault was indeed caused by the absence of LDR_CNTRL.

The tests show that:

* without LDR_CNTRL, we reach 256Mb and segfault ;
* with LDR_CNTRL=MAXDATA=0x10000000, we reach 256Mo but there is no
segfault, the program just continues running ;
* with LDR_CNTRL=MAXDATA=0x80000000, we reach 2Go and there is no segfault
either, the program just continues running.

#7Noah Misch
noah@leadboat.com
In reply to: Benoit Lobréau (#6)
Re: Probable memory leak with ECPG and AIX

On Wed, Dec 15, 2021 at 04:20:42PM +0100, Benoit Lobr�au wrote:

* with LDR_CNTRL=MAXDATA=0x10000000, we reach 256Mo but there is no
segfault, the program just continues running ;
* with LDR_CNTRL=MAXDATA=0x80000000, we reach 2Go and there is no segfault
either, the program just continues running.

I get the same results. The leak arises because AIX freelocale() doesn't free
all memory allocated in newlocale(). The following program uses trivial
memory on GNU/Linux, but it leaks like you're seeing on AIX:

#include <locale.h>
int main(int argc, char **argv)
{
while (1)
freelocale(newlocale(LC_NUMERIC_MASK, "C", (locale_t) 0));
return 0;
}

If you have access to file an AIX bug, I recommend doing so. If we want
PostgreSQL to work around this, one idea is to have ECPG do this newlocale()
less often. For example, do it once per process or once per connection
instead of once per ecpg_do_prologue().

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#7)
Re: Probable memory leak with ECPG and AIX

Noah Misch <noah@leadboat.com> writes:

I get the same results. The leak arises because AIX freelocale() doesn't free
all memory allocated in newlocale(). The following program uses trivial
memory on GNU/Linux, but it leaks like you're seeing on AIX:

Bleah.

If you have access to file an AIX bug, I recommend doing so. If we want
PostgreSQL to work around this, one idea is to have ECPG do this newlocale()
less often. For example, do it once per process or once per connection
instead of once per ecpg_do_prologue().

It's worse than that: see also ECPGget_desc(). Seems like a case
could be made for doing something about this just on the basis
of cycles expended, never mind freelocale() bugs.

regards, tom lane

#9Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#8)
Re: Probable memory leak with ECPG and AIX

On Sat, Jan 01, 2022 at 11:35:02AM -0500, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

I get the same results. The leak arises because AIX freelocale() doesn't free
all memory allocated in newlocale(). The following program uses trivial
memory on GNU/Linux, but it leaks like you're seeing on AIX:

Bleah.

If you have access to file an AIX bug, I recommend doing so. If we want
PostgreSQL to work around this, one idea is to have ECPG do this newlocale()
less often. For example, do it once per process or once per connection
instead of once per ecpg_do_prologue().

It's worse than that: see also ECPGget_desc(). Seems like a case
could be made for doing something about this just on the basis
of cycles expended, never mind freelocale() bugs.

Agreed. Once per process seems best. I only hesitated before since it means
nothing will free this storage, which could be annoying in the context of
Valgrind and similar. However, ECPG already has bits of never-freed memory in
the form of pthread_key_create() calls having no pthread_key_delete(), so I
don't mind adding a bit more.

#10Guillaume Lelarge
guillaume@lelarge.info
In reply to: Noah Misch (#9)
Re: Probable memory leak with ECPG and AIX

Le dim. 2 janv. 2022 à 01:07, Noah Misch <noah@leadboat.com> a écrit :

On Sat, Jan 01, 2022 at 11:35:02AM -0500, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

I get the same results. The leak arises because AIX freelocale()

doesn't free

all memory allocated in newlocale(). The following program uses

trivial

memory on GNU/Linux, but it leaks like you're seeing on AIX:

Bleah.

If you have access to file an AIX bug, I recommend doing so. If we

want

PostgreSQL to work around this, one idea is to have ECPG do this

newlocale()

less often. For example, do it once per process or once per connection
instead of once per ecpg_do_prologue().

It's worse than that: see also ECPGget_desc(). Seems like a case
could be made for doing something about this just on the basis
of cycles expended, never mind freelocale() bugs.

Agreed. Once per process seems best. I only hesitated before since it
means
nothing will free this storage, which could be annoying in the context of
Valgrind and similar. However, ECPG already has bits of never-freed
memory in
the form of pthread_key_create() calls having no pthread_key_delete(), so I
don't mind adding a bit more.

Did this get anywhere? Is there something we could do to make this move
forward?

--
Guillaume.

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Guillaume Lelarge (#10)
Re: Probable memory leak with ECPG and AIX

Guillaume Lelarge <guillaume@lelarge.info> writes:

Did this get anywhere? Is there something we could do to make this move
forward?

No. Write a patch?

regards, tom lane

#12Guillaume Lelarge
guillaume@lelarge.info
In reply to: Tom Lane (#11)
Re: Probable memory leak with ECPG and AIX

Le ven. 25 mars 2022, 14:25, Tom Lane <tgl@sss.pgh.pa.us> a écrit :

Guillaume Lelarge <guillaume@lelarge.info> writes:

Did this get anywhere? Is there something we could do to make this move
forward?

No. Write a patch?

I wouldn't have asked if I could write such a patch :-)

#13Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#9)
Re: Probable memory leak with ECPG and AIX

On Sat, Jan 01, 2022 at 04:07:50PM -0800, Noah Misch wrote:

On Sat, Jan 01, 2022 at 11:35:02AM -0500, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

I get the same results. The leak arises because AIX freelocale() doesn't free
all memory allocated in newlocale(). The following program uses trivial
memory on GNU/Linux, but it leaks like you're seeing on AIX:

Bleah.

If you have access to file an AIX bug, I recommend doing so. If we want
PostgreSQL to work around this, one idea is to have ECPG do this newlocale()
less often. For example, do it once per process or once per connection
instead of once per ecpg_do_prologue().

It's worse than that: see also ECPGget_desc(). Seems like a case
could be made for doing something about this just on the basis
of cycles expended, never mind freelocale() bugs.

Agreed. Once per process seems best. I only hesitated before since it means
nothing will free this storage, which could be annoying in the context of
Valgrind and similar. However, ECPG already has bits of never-freed memory in
the form of pthread_key_create() calls having no pthread_key_delete(), so I
don't mind adding a bit more.

The comparison to pthread_key_create() wasn't completely fair. While POSIX
welcomes pthread_key_create() to fail with ENOMEM, the glibc implementation
appears not to allocate memory. Even so, I'm okay leaking one newlocale() per
process lifetime.

I had expected to use pthread_once() for the newlocale() call, but there would
be no useful way to report failure and try again later. Instead, I called
newlocale() while ECPGconnect() holds connections_mutex. See log message and
comments for details. I tested "./configure ac_cv_func_uselocale=no ..." and
tested the scenario of newlocale() failing every time.

Attachments:

newlocale-once-ecpg-v1.patchtext/plain; charset=us-asciiDownload+66-29
#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#13)
Re: Probable memory leak with ECPG and AIX

Noah Misch <noah@leadboat.com> writes:

I had expected to use pthread_once() for the newlocale() call, but there would
be no useful way to report failure and try again later. Instead, I called
newlocale() while ECPGconnect() holds connections_mutex. See log message and
comments for details. I tested "./configure ac_cv_func_uselocale=no ..." and
tested the scenario of newlocale() failing every time.

This looks solid to me. The only nit I can find to pick is that I'd
have added one more comment, along the lines of

diff --git a/src/interfaces/ecpg/ecpglib/connect.c b/src/interfaces/ecpg/ecpglib/connect.c
index 9f958b822c..96f99ae072 100644
--- a/src/interfaces/ecpg/ecpglib/connect.c
+++ b/src/interfaces/ecpg/ecpglib/connect.c
@@ -508,6 +508,11 @@ ECPGconnect(int lineno, int c, const char *name, const char *user, const char *p
 #ifdef ENABLE_THREAD_SAFETY
 	pthread_mutex_lock(&connections_mutex);
 #endif
+
+	/*
+	 * ... but first, make certain we have created ecpg_clocale.  Rely on
+	 * holding connections_mutex to ensure this is done by only one thread.
+	 */
 #ifdef HAVE_USELOCALE
 	if (!ecpg_clocale)
 	{

I've marked it RFC.

regards, tom lane

#15Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#14)
Re: Probable memory leak with ECPG and AIX

On Sat, Jul 02, 2022 at 02:53:34PM -0400, Tom Lane wrote:

This looks solid to me. The only nit I can find to pick is that I'd
have added one more comment, along the lines of

diff --git a/src/interfaces/ecpg/ecpglib/connect.c b/src/interfaces/ecpg/ecpglib/connect.c
index 9f958b822c..96f99ae072 100644
--- a/src/interfaces/ecpg/ecpglib/connect.c
+++ b/src/interfaces/ecpg/ecpglib/connect.c
@@ -508,6 +508,11 @@ ECPGconnect(int lineno, int c, const char *name, const char *user, const char *p
#ifdef ENABLE_THREAD_SAFETY
pthread_mutex_lock(&connections_mutex);
#endif
+
+	/*
+	 * ... but first, make certain we have created ecpg_clocale.  Rely on
+	 * holding connections_mutex to ensure this is done by only one thread.
+	 */
#ifdef HAVE_USELOCALE
if (!ecpg_clocale)
{

I've marked it RFC.

Thanks for reviewing. Pushed with that comment. prairiedog complains[1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&amp;dt=2022-07-03%2001%3A14%3A19:

ld: common symbols not allowed with MH_DYLIB output format with the -multi_module option
connect.o definition of common _ecpg_clocale (size 4)

I bet this would fix it:

--- a/src/interfaces/ecpg/ecpglib/connect.c
+++ b/src/interfaces/ecpg/ecpglib/connect.c
@@ -11,7 +11,7 @@
 #include "sqlca.h"
 #ifdef HAVE_USELOCALE
-locale_t	ecpg_clocale;
+locale_t	ecpg_clocale = (locale_t) 0;
 #endif

#ifdef ENABLE_THREAD_SAFETY

I hear[1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&amp;dt=2022-07-03%2001%3A14%3A19 adding -fno-common to compiler options would also fix that. Still,
in the absence of other opinions, I'll just add the no-op initialization.

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&amp;dt=2022-07-03%2001%3A14%3A19
[2]: https://gcc.gnu.org/legacy-ml/gcc/2005-06/msg00378.html

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#15)
Re: Probable memory leak with ECPG and AIX

Noah Misch <noah@leadboat.com> writes:

Thanks for reviewing. Pushed with that comment. prairiedog complains[1]:
ld: common symbols not allowed with MH_DYLIB output format with the -multi_module option
connect.o definition of common _ecpg_clocale (size 4)

Blah.

I bet this would fix it:

-locale_t	ecpg_clocale;
+locale_t	ecpg_clocale = (locale_t) 0;

Hmm, I was considering suggesting that just on stylistic grounds,
but decided it was too nitpicky even for me.
Do you want me to test it on prairiedog?

I hear[1] adding -fno-common to compiler options would also fix that.

I've got -fno-common turned on on my other macOS animals, but in
those cases I did it to detect bugs not fix them. I'm not sure
whether prairiedog's ancient toolchain has that switch at all,
or whether it behaves the same as in more recent platforms.
Still, that gcc.gnu.org message you cite is of the right era.

regards, tom lane

#17Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#16)
Re: Probable memory leak with ECPG and AIX

On Sat, Jul 02, 2022 at 11:37:08PM -0400, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

Thanks for reviewing. Pushed with that comment. prairiedog complains[1]:
ld: common symbols not allowed with MH_DYLIB output format with the -multi_module option
connect.o definition of common _ecpg_clocale (size 4)

Blah.

I bet this would fix it:

-locale_t	ecpg_clocale;
+locale_t	ecpg_clocale = (locale_t) 0;

Hmm, I was considering suggesting that just on stylistic grounds,
but decided it was too nitpicky even for me.
Do you want me to test it on prairiedog?

Sure, if it's easy enough. If not, I'm 87% sure it will suffice.

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#17)
Re: Probable memory leak with ECPG and AIX

Noah Misch <noah@leadboat.com> writes:

On Sat, Jul 02, 2022 at 11:37:08PM -0400, Tom Lane wrote:

Do you want me to test it on prairiedog?

Sure, if it's easy enough. If not, I'm 87% sure it will suffice.

On it now, but it'll take a few minutes :-(

regards, tom lane

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#18)
Re: Probable memory leak with ECPG and AIX

I wrote:

On it now, but it'll take a few minutes :-(

Confirmed that either initializing ecpg_clocale or adding -fno-common
allows the ecpglib build to succeed. (Testing it beyond that would
require another hour or so to build the rest of the system, so I won't.)

As I said, I was already leaning to the idea that initializing the
variable explicitly is better style, so I recommend we do that.

regards, tom lane

#20Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#19)
Re: Probable memory leak with ECPG and AIX

On Sat, Jul 02, 2022 at 11:59:58PM -0400, Tom Lane wrote:

Confirmed that either initializing ecpg_clocale or adding -fno-common
allows the ecpglib build to succeed. (Testing it beyond that would
require another hour or so to build the rest of the system, so I won't.)

As I said, I was already leaning to the idea that initializing the
variable explicitly is better style, so I recommend we do that.

Works for me. Pushed that way, and things have been clean.