Re: pg_upgrade segfault (was: pg_migrator segfault)

Started by hernan gonzalez, over 15 years ago, 5 messages, general
#1hernan gonzalez
hgonzalez@gmail.com

2010/11/2 hernan gonzalez <hgonzalez@gmail.com>

2010/11/2 Grzegorz Jaśkiewicz <gryzman@gmail.com>

try gdb --args ./pg_upgrade -d /var/pgsql-8_4_3/data/ -D /var/pgsql-9_0_1/data/
-b /var/pgsql-8_4_3/bin/ -B /var/pgsql-9_0_1/bin/ --check -P 5433 -v -g -G debug
and when it fails, type in 'bt' and paste it here please.

--
GJ

I read somewhere that a program can segfault because of some allocation
problem that doesn't show up inside gdb (because more memory is allocated
there, or whatever).

Running gdb with the core generated by the segfault, it outputs this:

Program terminated with signal 11, Segmentation fault.
#0 0xb7df84ed in _int_realloc () from /lib/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.11.1-4.i686
(gdb) bt
#0 0xb7df84ed in _int_realloc () from /lib/libc.so.6
#1 0xb7df88a0 in realloc () from /lib/libc.so.6
#2 0xb7db2a5e in __add_to_environ () from /lib/libc.so.6
#3 0xb7db27b7 in putenv () from /lib/libc.so.6
#4 0x0804aa11 in putenv2 ()
#5 0x0804af93 in get_control_data ()
#6 0x08049801 in check_cluster_compatibility ()
#7 0x0804eb88 in main ()

Hernán J. González

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: hernan gonzalez (#1)

hernan gonzalez <hgonzalez@gmail.com> writes:

Running gdb with the core generated by the segfault, it outputs this:

Program terminated with signal 11, Segmentation fault.
#0 0xb7df84ed in _int_realloc () from /lib/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.11.1-4.i686
(gdb) bt
#0 0xb7df84ed in _int_realloc () from /lib/libc.so.6
#1 0xb7df88a0 in realloc () from /lib/libc.so.6
#2 0xb7db2a5e in __add_to_environ () from /lib/libc.so.6
#3 0xb7db27b7 in putenv () from /lib/libc.so.6
#4 0x0804aa11 in putenv2 ()
#5 0x0804af93 in get_control_data ()
#6 0x08049801 in check_cluster_compatibility ()
#7 0x0804eb88 in main ()

Hmm, this suggests that pg_upgrade has managed to clobber malloc's
internal data structures, probably by writing past the end of an
allocated chunk. You should be able to identify where if you can
run pg_upgrade under valgrind or ElectricFence.

regards, tom lane

#3hernan gonzalez
hgonzalez@gmail.com
In reply to: Tom Lane (#2)

In pg_upgrade/controldata.c, in the putenv2 function:

char *envstr = (char *) pg_malloc(ctx, strlen(var)
+ strlen(val) + 1);
sprintf(envstr, "%s=%s", var, val);

Shouldn't it be "+ 2" instead of "+ 1" (one for the '=', plus one for
the terminating null character)?

I think that fixes it.

Hernán J. González
http://hjg.com.ar/
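
For reference, a minimal sketch of the corrected buffer sizing discussed above. This is not the actual pg_upgrade patch: it assumes plain malloc() in place of pg_malloc(ctx, ...), and make_env_string is a hypothetical name used only for illustration. The buffer needs strlen(var) + strlen(val) bytes for the two strings, one byte for the '=' and one for the terminating null; snprintf() is used here so that a wrong size would truncate rather than write past the end of the chunk.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not the pg_upgrade source: builds "var=val" in a
 * freshly allocated buffer. Returns NULL if the allocation fails.
 */
static char *
make_env_string(const char *var, const char *val)
{
    /* one extra byte for '=', one for the terminating '\0' */
    size_t len = strlen(var) + strlen(val) + 2;
    char  *envstr = malloc(len);

    if (envstr == NULL)
        return NULL;

    /* snprintf() never writes more than len bytes, including the '\0' */
    int written = snprintf(envstr, len, "%s=%s", var, val);

    /* the formatted text must fill the buffer exactly, minus the '\0' */
    assert(written >= 0 && (size_t) written == len - 1);
    return envstr;
}
```

With "+ 1" instead of "+ 2", the original sprintf() call writes one byte beyond the allocation, which is the kind of overrun that can corrupt malloc's bookkeeping and only crash later, inside realloc().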

#4hernan gonzalez
hgonzalez@gmail.com
In reply to: hernan gonzalez (#3)

Replacing that 1 with 2 is enough to make it work for me, it seems.

But it's not enough to make valgrind happy (it still reports 4 "definitely
lost" blocks, all from that putenv2 function). Perhaps that's related to the
comment:

/*
* Do not free envstr because it becomes part of the environment
* on some operating systems. See port/unsetenv.c::unsetenv.
*/

Hernán J. González
http://hjg.com.ar/

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: hernan gonzalez (#3)

hernan gonzalez <hgonzalez@gmail.com> writes:

In pg_upgrade/controldata.c, in the putenv2 function:
char *envstr = (char *) pg_malloc(ctx, strlen(var)
+ strlen(val) + 1);
sprintf(envstr, "%s=%s", var, val);

Shouldn't it be "+ 2" instead of "+ 1"?

Yup, it sure should. So probably the reason you're the first one to see
it is that the problem would depend on the exact lengths of the strings
being used here :-(

But it's not enough to make valgrind happy (it still reports 4 "definitely
lost" blocks, all from that putenv2 function).

That's expected; those blocks aren't supposed to get freed.

regards, tom lane
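
To illustrate why those blocks are left allocated: POSIX putenv() makes the string you pass part of the environment instead of copying it, so freeing the buffer would leave the environment pointing at released memory. The sketch below is a hedged illustration of that pattern, not the pg_upgrade code; set_env_keep_buffer and the variable name PGUPGRADE_DEMO are hypothetical, and plain malloc() stands in for pg_malloc().

```c
/* Ask for the XSI interfaces so putenv() is declared under strict -std modes. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: sets var=val through putenv() and returns the
 * buffer that was handed over, or NULL on failure.
 */
static char *
set_env_keep_buffer(const char *var, const char *val)
{
    /* room for both strings, the '=' and the terminating '\0' */
    size_t len = strlen(var) + strlen(val) + 2;
    char  *envstr = malloc(len);

    if (envstr == NULL)
        return NULL;

    snprintf(envstr, len, "%s=%s", var, val);

    if (putenv(envstr) != 0)
    {
        /* putenv() failed, so the environment does not refer to envstr */
        free(envstr);
        return NULL;
    }

    /*
     * Deliberately not freed on success: the environment may now refer
     * to this buffer directly, so it has to outlive the call.
     */
    return envstr;
}
```

A leak checker reports such a buffer as lost once nothing else points to it, for example after the same variable has been set again, even though leaving it allocated is intentional.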