--enable-thread-safety bug

Started by Steve Clark about 18 years ago, 16 messages, general
#1Steve Clark
sclark@netwolves.com

Hello List,

I am running PostgreSQL 8.3.1 on FreeBSD 6.2 patch-7.

The FreeBSD port turns on --enable-thread-safety when configuring
PostgreSQL.

After my app has been running for some time, it dumps core with
signal 11.

#0 0x28333b96 in memcpy () from /lib/libc.so.6
(gdb) bt
#0 0x28333b96 in memcpy () from /lib/libc.so.6
#1 0x280d0122 in ecpg_init_sqlca (sqlca=0x0) at misc.c:100
#2 0x280d0264 in ECPGget_sqlca () at misc.c:145
#3 0x280d056c in ecpg_log (
format=0x280d1d78 "free_params line %d: parameter %d = %s\n") at
misc.c:243
#4 0x280c9758 in free_params (paramValues=0x836fe00, nParams=104,
print=1 '\001',
lineno=3303) at execute.c:1045
#5 0x280c9f08 in ecpg_execute (stmt=0xa726f00) at execute.c:1298
#6 0x280ca978 in ECPGdo (lineno=3303, compat=0, force_indicator=1,
connection_name=0x0, questionmarks=0 '\0', st=0,
query=0x806023c "update T_UNIT_STATUS_LOG set ip_address = $1
:: inet , last_ip_address = $2 :: inet , unit_date = $3 ::
timestamp with time zone , unit_raw_time = $4 , status_date = now
() , unit_ac"...) at execute.c:1636
#7 0x08057a46 in UpdateTUSL (pCachedUnit=0x807b680, msg=0xbfbf8850 "",
p_threshold=80, p_actualIP=0xbfbfe880 "24.39.85.226")
at srm2_monitor_db.pgc:3303
#8 0x0804f174 in main (argc=3, argv=0xbfbf7fc0) at
srm2_monitor_server.c:3265
(gdb) f 2
#2 0x280d0264 in ECPGget_sqlca () at misc.c:145
145 ecpg_init_sqlca(sqlca);
(gdb) p sqlca
$1 = (struct sqlca_t *) 0x0

Looking at the code in misc.c, I see:

struct sqlca_t *
ECPGget_sqlca(void)
{
#ifdef ENABLE_THREAD_SAFETY
    struct sqlca_t *sqlca;

    pthread_once(&sqlca_key_once, ecpg_sqlca_key_init);

    sqlca = pthread_getspecific(sqlca_key);
    if (sqlca == NULL)
    {
        sqlca = malloc(sizeof(struct sqlca_t));
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        ecpg_init_sqlca(sqlca);
        pthread_setspecific(sqlca_key, sqlca);
    }
    return (sqlca);
#else
    return (&sqlca);
#endif
}

The return from malloc should be checked to make sure it succeeds -
right???

Steve

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Steve Clark (#1)
Re: --enable-thread-safety bug

Steve Clark <sclark@netwolves.com> writes:

The return from malloc should be checked to make sure it succeeds -
right???

Probably, but what do you expect the code to do if it doesn't succeed?
This function seems not to have any defined error-return convention.

regards, tom lane

#3Steve Clark
sclark@netwolves.com
In reply to: Tom Lane (#2)
Re: --enable-thread-safety bug

Retry the malloc (maybe there is a memory leak when
--enable-thread-safety is enabled), send an out-of-memory message to
the postgres log, abort the transaction. I don't know; I am not a
postgres developer, so I don't know all the issues. All I know is
that, as a user, having a program like postgres just sig 11 is
unacceptable! As a commercial developer of software for over 30 years,
I would never just do nothing.

My $.02
Steve

#4Martijn van Oosterhout
kleptog@svana.org
In reply to: Steve Clark (#3)
Re: --enable-thread-safety bug

On Sat, Mar 22, 2008 at 11:28:24AM -0400, Steve Clark wrote:

Retry the malloc (maybe there is a memory leak when
--enable-thread-safety is enabled), send an out-of-memory message to
the postgres log, abort the transaction. I don't know; I am not a
postgres developer, so I don't know all the issues. All I know is
that, as a user, having a program like postgres just sig 11 is
unacceptable! As a commercial developer of software for over 30 years,
I would never just do nothing.

Note this is in your application, not the server. Only your program
died. Of course the transaction got aborted, since the client (you)
disconnected. There is no way for this to write to the server log,
since it may be on another machine...

As to the issue at hand: it looks like your program ran out of memory.
Can you confirm the memory was running low? Even if it handled it by
returning NULL, the caller will die because it also needs memory.

Do you create and destroy a lot of threads since it seems this memory
won't be freed?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#4)
Re: --enable-thread-safety bug

Martijn van Oosterhout <kleptog@svana.org> writes:

Note this is in your application, not the server. Only your program
died. Of course the transaction got aborted, since the client (you)
disconnected. There is no way for this to write to the server log,
since it may be on another machine...

Right. And note that if we don't have enough memory for the struct
that was requested, we *certainly* don't have enough to do anything
interesting. We could try

fprintf(stderr, "out of memory\n");
exit(1);

but even that I would give only about 50-50 odds of success; and more
to the point, how is this any better for an application than a core
dump? It's still summary termination.

Do you create and destroy a lot of threads since it seems this memory
won't be freed?

The OP's program isn't threaded at all, since he was apparently running
with a non-threaded ecpg/libpq before. This means that the proposal of
looping till someone else frees memory is at least as silly as allowing
the core dump to happen.

regards, tom lane

#6Steve Clark
sclark@netwolves.com
In reply to: Martijn van Oosterhout (#4)
Re: --enable-thread-safety bug

My program has no threads. As I pointed out, if I change the default
Makefile in the FreeBSD ports system to not enable thread safety, my
program runs just fine for days on end. It appears to me, without any
close examination, that there is a memory leak in the ecpg library
when --enable-thread-safety is turned on.

I had an earlier problem in 8.2.6 where, if --enable-thread-safety was
turned on, sqlca would always be zero whether or not there was an
error.

This appears to me to be a problem in the ecpg library when thread
safety is enabled.

Have a nice day.

Steve

#7Steve Clark
sclark@netwolves.com
In reply to: Tom Lane (#5)
Re: --enable-thread-safety bug

I guess the real question is why we are running out of memory when
this option is enabled.
Since my app doesn't use threads, that points to a memory leak in the
ecpg library when --enable-thread-safety is turned on.

Steve

#8Martijn van Oosterhout
kleptog@svana.org
In reply to: Tom Lane (#5)
Re: --enable-thread-safety bug

On Sat, Mar 22, 2008 at 12:42:51PM -0400, Tom Lane wrote:

Do you create and destroy a lot of threads since it seems this memory
won't be freed?

The OP's program isn't threaded at all, since he was apparently running
with a non-threaded ecpg/libpq before. This means that the proposal of
looping till someone else frees memory is at least as silly as allowing
the core dump to happen.

I found an old report where someone found that the get/setspecific
wasn't working and it was allocating a new version of the structure
each time.

http://www.mail-archive.com/pgsql-general@postgresql.org/msg42918.html

That was on Solaris though. It would be instructive to test that by
calling that function multiple times in succession and checking that
it returns the same address each time.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#9Michael Meskes
meskes@postgresql.org
In reply to: Steve Clark (#6)
Re: --enable-thread-safety bug

On Sat, Mar 22, 2008 at 12:51:30PM -0400, Steve Clark wrote:

My program has no threads. As I pointed out, if I change the default
Makefile in the FreeBSD ports system to not enable thread safety, my
program runs just fine for days on end. It appears to me, without any
close examination, that there is a memory leak in the ecpg library
when --enable-thread-safety is turned on.

There are just a few variables covered by ENABLE_THREAD_SAFETY. I wonder
how the program manages to spend so much time allocating memory to eat
all of it. Could you give us some more info about your source code? Do
you use descriptors? Auto allocating?

Michael
--
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org
Go VfL Borussia! Go SF 49ers! Use Debian GNU/Linux! Use PostgreSQL!

#10Steve Clark
sclark@netwolves.com
In reply to: Michael Meskes (#9)
Re: --enable-thread-safety bug

Hi Michael,

I'm not exactly sure what you are asking about with descriptors and
auto allocating.

The program processes about 800000 packets a day, which can update
several tables.
It runs continuously, reading UDP packets coming in over the internet
from systems at remote locations.

It has a global
exec sql include sqlca;

then a number of functions that get called, each function having
its own

xxx( args,... )
{
    EXEC SQL BEGIN DECLARE SECTION;
    a bunch of variables
    EXEC SQL END DECLARE SECTION;

    with various EXEC SQL inserts, updates and selects,
    with checks of sqlca.sqlcode to determine if the SQL statement succeeded.
}

Steve

#11Steve Clark
sclark@netwolves.com
In reply to: Steve Clark (#10)
Re: --enable-thread-safety bug

To further illustrate our code, below is a typical exec sql statement:

exec sql insert into t_unit_event_log
    (event_log_no,
     unit_serial_no,
     event_type,
     event_category,
     event_mesg,
     event_severity,
     event_status,
     event_ref_log_no,
     event_logged_by,
     event_date,
     alarm,
     last_updated_by,
     last_updated_date)
values (nextval('seq_event_log_no'),
     :h_serial_no,
     'ALERT',
     :h_category,
     :h_mesg,
     :h_sev,
     3,
     NULL,
     current_user,
     now(),
     :h_alarm,
     current_user,
     now());

if (sqlca.sqlcode != 0)
{
    VARLOG(INFO, LOG_LEVEL_DBG4, "could not insert into T_UNIT_EVENT_LOG\n");
    VARLOG(INFO, LOG_LEVEL_DBG4, "insertTUEL returns %d\n", ret);
    return ret;
}

#12Craig Ringer
craig@2ndquadrant.com
In reply to: Steve Clark (#7)
Re: --enable-thread-safety bug

Steve Clark wrote:

I guess the real question is why we are running out of memory when
this option is enabled.
Since my app doesn't use threads, that points to a memory leak in the
ecpg library when --enable-thread-safety is turned on.

It might be worth building ecpg with debug symbols then running your
app, linked to that ecpg, under Valgrind. If you are able to produce
more specific information about how the leak occurs in the context of
your application people here may be more able to help you.

--
Craig Ringer

#13Steve Clark
sclark@netwolves.com
In reply to: Craig Ringer (#12)
Re: --enable-thread-safety bug

Hi Craig,

I could do that, but in my situation I am not using threads, so I
really don't need --enable-thread-safety turned on. The FreeBSD ports
maintainer for PostgreSQL decided everybody should have it whether
they needed it or not. I simply deleted the option from the FreeBSD
Makefile, rebuilt the port and relinked my app, and the problem went
away. I just thought the PostgreSQL developers would want to know
there was a bug. If they don't care to investigate or troubleshoot
the bug, that is fine by me.

I just find it interesting that a non-threaded program causes a
memory leak when used with postgres libraries that are compiled with
--enable-thread-safety. That doesn't seem too safe to me.

Have a nice day.

Steve

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Steve Clark (#13)
Re: --enable-thread-safety bug

Steve Clark <sclark@netwolves.com> writes:

I could do that, but in my situation I am not using threads, so I
really don't need --enable-thread-safety turned on. The FreeBSD ports
maintainer for PostgreSQL decided everybody should have it whether
they needed it or not. I simply deleted the option from the FreeBSD
Makefile, rebuilt the port and relinked my app, and the problem went
away. I just thought the PostgreSQL developers would want to know
there was a bug. If they don't care to investigate or troubleshoot
the bug, that is fine by me.

I don't think you grasp the situation, Steve. Having
enable-thread-safety turned on is standard across a wide swath of the
world, and yet nobody else has reported severe memory leaks in ecpg.
So there's something very specific to what your app is doing that
triggers the problem. There's little point in anyone else investigating
unless you can give them a test case that reproduces the misbehavior.

I can assure you we would like to fix the problem if we can find it.
But with no cooperation from you, we'll just have to wait until someone
else stumbles across it and can show us exactly how to make it happen.

regards, tom lane

#15Michael Meskes
meskes@postgresql.org
In reply to: Steve Clark (#10)
Re: --enable-thread-safety bug

On Sat, Mar 22, 2008 at 04:58:28PM -0400, Steve Clark wrote:

Not exactly sure what you are asking about - descriptors and auto
allocating.

So I guess you don't use either feature. :-)

The program processes about 800000 packets a day, which can update
several tables.
It runs continuously, reading UDP packets coming in over the internet
from systems at remote locations.

But the code for processing all those statements is the same, with and
without threading enabled.

One piece of code that differs is the allocation of sqlca, but this
structure is a mere 215 bytes or so. Even if it were allocated 800000
times, that would amount to a memory loss of only about 164MB. Which
brings up the question of how long the application runs before it
segfaults.

As Tom already pointed out, without more information there simply is no
way for us to find out what's going on. We are more than willing to dig
into it, but we need more to be able to.

Michael
--
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org
Go VfL Borussia! Go SF 49ers! Use Debian GNU/Linux! Use PostgreSQL!

#16Steve Clark
sclark@netwolves.com
In reply to: Michael Meskes (#15)
Re: --enable-thread-safety bug

OK, I tried valgrind, and after a while it died with a valgrind
assertion error before providing any useful data.

So I tried linking with -lc_r, and that appears to have stopped the
leak. Without -lc_r, according to "top", my app quickly climbed to
over 150 MB in memory size. It is now holding steady at about 8 MB,
which is about what it used when I compiled the ecpg lib without
--enable-thread-safety.

Now why does this make a difference in ecpg?

HTH,
Steve

If anyone cares, below is the valgrind assertion failure:
valgrind: vg_malloc2.c:1008 (vgPlain_arena_malloc): Assertion `new_sb
!= ((void*)0)' failed.
==4166== at 0xB802BE1F: (within /usr/local/lib/valgrind/stage2)
==4166== by 0xB802BE1E: (within /usr/local/lib/valgrind/stage2)
==4166== by 0xB802BE5D: vgPlain_core_assert_fail (in
/usr/local/lib/valgrind/stage2)
==4166== by 0xB8028091: vgPlain_arena_malloc (in
/usr/local/lib/valgrind/stage2)

sched status:

Thread 1: status = Runnable, associated_mx = 0x0, associated_cv = 0x0
==4166== at 0x3C03894B: calloc (in
/usr/local/lib/valgrind/vgpreload_memcheck.so)

Note: see also the FAQ.txt in the source distribution.
It contains workarounds to several common problems.

If that doesn't help, please report this bug to: valgrind.kde.org

In the bug report, send all the above text, the valgrind
version, and what Linux distro you are using. Thanks.