Mac OS X, PostgreSQL, PL/Tcl

Started by Scott Goodwin about 22 years ago · 9 messages · hackers, bugs
#1 Scott Goodwin
scott@scottg.net

I'm hoping someone can help me figure out why I can't load PL/Tcl
without crashing the backend on Mac OS X 10.3.2.

I compile Tcl and PostgreSQL, create the database, and then run the
following:

create function plpgsql_call_handler() RETURNS LANGUAGE_HANDLER
as 'plpgsql.so' language 'c';

create trusted procedural language 'plpgsql'
HANDLER plpgsql_call_handler
LANCOMPILER 'PL/pgSQL';

create function pltcl_call_handler() RETURNS LANGUAGE_HANDLER
as 'pltcl.so' language 'c';

create trusted procedural language 'pltcl'
HANDLER pltcl_call_handler
LANCOMPILER 'PL/Tcl';

The PL/pgSQL part loads fine. The PL/Tcl part crashes the server, and
psql reports this:

psql:/Users/scott/pgtest/add_languages.sql:12: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:/Users/scott/pgtest/add_languages.sql:12: connection to server was
lost

I have tried exactly the same procedure on Linux, using the same
scripts and setup, without any problems. I've tried both PG 7.4.1 and a
CVS copy from 11 Feb. I've used gcc 3.3, 3.1 and 2.85. I've tried
loading PL/Tcl without loading PL/pgSQL at all, with the same result. I
tried Tcl 8.4.3, 8.4.4 and 8.4.5. pgtclsh runs fine.

I used ktrace to attach to the PG process, and it shows the process
getting a SIGSEGV, preceded by several "File name too long" errors. The
problem is probably not with PG; it could be with Tcl and/or Mac OS X
loadable libraries. Here's the significant portion of the trace (the
whole output is at http://scottg.net/pgktrace.txt):

... stuff prior ...

27296 postgres 0.000021 NAMI "/usr/lib/libicucore.A.dylib"
27296 postgres 0.000019 RET open 114/0x72
27296 postgres 0.000009 CALL fstat(0x72,0xbfffdf50)
27296 postgres 0.000009 RET fstat 0
27296 postgres 0.000047 CALL load_shared_file(0x9019060c,0x605000,0x13b680,0xbfffdd60,0x4,0xbfffdcf0,0xbfffdd64)
27296 postgres 0.000053 NAMI "/usr/lib/libicucore.A.dylib"
27296 postgres 0.000135 RET load_shared_file 0
27296 postgres 0.000034 CALL close(0x72)
27296 postgres 0.000015 RET close 0
27296 postgres 0.000113 CALL stat(0x800200,0xbfffde20)
27296 postgres 0.000016 NAMI "

/libSystem.B.dylib"
27296 postgres 0.000023 RET stat -1 errno 2 No such file or directory
27296 postgres 0.000021 CALL stat(0x800200,0xbfffde20)
27296 postgres 0.000009 NAMI "

/libSystem.B.dylib"
27296 postgres 0.000017 RET stat -1 errno 2 No such file or directory
27296 postgres 0.004552 CALL stat(0x182ea00,0xbfffd430)
27296 postgres 0.000044 RET stat -1 errno 63 File name too long
27296 postgres 0.000019 CALL stat(0x182ea00,0xbfffd430)
27296 postgres 0.000008 RET stat -1 errno 63 File name too long
27296 postgres 0.000012 CALL stat(0x182ea00,0xbfffd430)
27296 postgres 0.000008 RET stat -1 errno 63 File name too long
27296 postgres 0.000013 CALL stat(0x182ea00,0xbfffd430)
27296 postgres 0.000008 RET stat -1 errno 63 File name too long
27296 postgres 0.000013 CALL stat(0x182ea00,0xbfffd430)
27296 postgres 0.000008 RET stat -1 errno 63 File name too long
27296 postgres 0.000013 CALL stat(0x182ea00,0xbfffd430)
27296 postgres 0.000008 RET stat -1 errno 63 File name too long
27296 postgres 0.000013 CALL stat(0x182ea00,0xbfffd430)
27296 postgres 0.000008 RET stat -1 errno 63 File name too long
27296 postgres 0.000013 CALL stat(0x182ea00,0xbfffd430)
27296 postgres 0.000009 RET stat -1 errno 63 File name too long
27296 postgres 0.000013 CALL stat(0x90104e34,0xbfffd3b0)
27296 postgres 0.000118 NAMI "/"
27296 postgres 0.000019 RET stat 0
27296 postgres 0.000012 CALL lstat(0x182f600,0xbfffd3b0)
27296 postgres 0.000007 NAMI "."
27296 postgres 0.000016 RET lstat 0
27296 postgres 0.000009 CALL stat(0x182f600,0xbfffd1a0)
27296 postgres 0.000006 NAMI ".."
27296 postgres 0.000018 RET stat 0
27296 postgres 0.000009 CALL open(0x182f600,0x4,0xfefefeff)

... more stuff ...

27296 postgres 0.000007 NAMI "../../../../../.."
27296 postgres 0.000021 RET stat 0
27296 postgres 0.000008 CALL open(0x182f600,0x4,0)
27296 postgres 0.000008 NAMI "../../../../../.."
27296 postgres 0.000016 RET open 114/0x72
27296 postgres 0.000009 CALL fstat(0x72,0xbfffd1a0)
27296 postgres 0.000007 RET fstat 0
27296 postgres 0.000007 CALL fcntl(0x72,0x2,0x1)
27296 postgres 0.000007 RET fcntl 0
27296 postgres 0.000008 CALL fstatfs(0x72,0xbfffd200)
27296 postgres 0.000007 RET fstatfs 0
27296 postgres 0.000009 CALL fstat(0x72,0xbfffd3b0)
27296 postgres 0.000007 RET fstat 0
27296 postgres 0.000008 CALL getdirentries(0x72,0x182fa00,0x1000,0x501b74)
27296 postgres 0.000065 RET getdirentries 640/0x280
27296 postgres 0.000015 CALL lseek(0x72,0,0,0)
27296 postgres 0.000007 RET lseek 0
27296 postgres 0.000009 CALL close(0x72)
27296 postgres 0.000009 RET close 0
27296 postgres 0.000007 CALL lstat(0x182f600,0xbfffd3b0)
27296 postgres 0.000007 NAMI "../../../../../../"
27296 postgres 0.000019 RET lstat 0
27296 postgres 0.000024 CALL stat(0xbfffd4f0,0xbfffd900)
27296 postgres 0.000009 RET stat -1 errno 63 File name too long
27296 postgres 0.140906 PSIG SIGSEGV SIG_DFL
26999 postgres 0.004582 CSW resume kernel
26999 postgres 0.000025 RET select -1 errno 4 Interrupted system call
26999 postgres 0.000010 PSIG SIGCHLD caught handler=0xe59ac mask=0x0 code=0x0
26999 postgres 0.000302 CALL sigprocmask(0x3,0x23fc74,0)
26999 postgres 0.000036 RET sigprocmask 0
26999 postgres 0.000037 CALL wait4(0xffffffff,0xbfffe670,0x1,0)
26999 postgres 0.000086 RET wait4 27296/0x6aa0
26999 postgres 0.000258 CALL write(0x2,0xbfffdd10,0x3d)
26999 postgres 0.000031 GIO fd 2 wrote 61 bytes
"LOG: server process (PID 27296) was terminated by signal 11
"
26999 postgres 0.000009 RET write 61/0x3d
26999 postgres 0.000020 CALL write(0x2,0xbfffdd10,0x34)
26999 postgres 0.000013 GIO fd 2 wrote 52 bytes
"LOG: terminating any other active server processes
"
26999 postgres 0.000008 RET write 52/0x34
26999 postgres 0.000032 CALL kill(0x6a35,0x3)
26999 postgres 0.000020 RET kill 0
26999 postgres 0.000011 CALL sendto(0x6e,0xbfffe5a0,0x18,0,0,0)

thanks,

/s.

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Scott Goodwin (#1)
Re: Mac OS X, PostgreSQL, PL/Tcl

Scott Goodwin <scott@scottg.net> writes:

> Hoping someone can help me figure out why I can't get PL/Tcl to load
> without crashing the backend on Mac OS 10.3.2.

FWIW, pltcl seems to work for me. Using up-to-date Darwin 10.3.2
and PG CVS tip, I did
configure --with-tcl --without-tk
then make, make install, etc. pltcl installs and passes its regression
test.

> psql:/Users/scott/pgtest/add_languages.sql:12: server closed the
> connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.

Can you provide a stack trace for this?

regards, tom lane

#3 Scott Goodwin
scott@scottg.net
In reply to: Tom Lane (#2)
Re: Mac OS X, PostgreSQL, PL/Tcl

OK, so it's something specific to my setup. I created a test account,
logged in, and compiled PostgreSQL there with a clean shell environment,
and it worked fine. So I'm shooting myself in the foot somewhere in my
login environment. *sigh*.

thanks,

/s.

On Feb 21, 2004, at 1:51 AM, Tom Lane wrote:


#4 Scott Goodwin
scott@scottg.net
In reply to: Tom Lane (#2)
Re: [HACKERS] Mac OS X, PostgreSQL, PL/Tcl

Found the problem. If I have a very long environment variable exported
when I start PG, PG crashes when I try to load PL/Tcl. In my case I use
color ls and have a very long LS_COLORS environment variable set.

I reproduced the problem as follows. I renamed my .bashrc and logged
back in; with this clean environment, I started PG and loaded PL/Tcl
without any problems. I then created the following environment variable
on the command line:

LONG_VAR=aaaaaaaaaaaaaaaaaa:bbbbbbbbbbbbbbbbbbb:cccccccccccccccccc:
ddddddddddddddddddd:eeeeeeeeeeeeeeeeeee:fffffffffffffff:
ggggggggggggggggg:hhhhhhhhhhhhhhhhhhhh:iiiiiiiiiiiiiiiiiii:
jjjjjjjjjjjjjjjjjjjjj:kkkkkkkkkkkkkkkkkkkkkk:llllllllllllllllllll:
mmmmmmmmmmmmmmmmmmmmmmm:nnnnnnnnnnnnnnnnnnnnnnnnn:
ooooooooooooooooooooooo:pppppppppppppppppppppp:qqqqqqqqqqqqqqqqqqqqqqq:
rrrrrrrrrrrrrrrrrrrrrrr:ssssssssssssssssssssssssss:
ttttttttttttttttttttttttttt:uuuuuuuuuuuuuuuuuuuuuuuuu:
vvvvvvvvvvvvvvvvvvvvvv:wwwwwwwwwwwwwwwwwwwwwwwwwwwwww:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:yyyyyyyyyyyyyyyyyyyyyyyyyyyyy:
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz

and exported it. (Obviously the line above is going to be broken into
multiple lines by the mailer...).

Then I stopped and restarted PG, loaded PL/Tcl, and PG crashed. You
*must* stop and restart PG for the problem to show up; otherwise the
server won't pick up the change in the environment. I suspect I'm
running into a buffer overflow.

OK: it fails consistently when LONG_VAR is 523 characters or longer,
and works consistently when it is 522 characters or shorter. The
threshold might not be the same on other systems.
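The reproduction above can also be scripted when hunting for the threshold. This C sketch (the helper `export_long_var` is hypothetical) sets a variable whose value is exactly `len` characters, shaped loosely like LS_COLORS with colon-separated runs of letters. The message does not say whether the 522/523 boundary counts the variable name, so the sketch counts only the value.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: set NAME to a value of exactly `len`
 * characters: runs of 19 letters ('a', then 'b', ...) each followed
 * by ':'. Returns 0 on success and -1 on failure, like setenv. */
int export_long_var(const char *name, size_t len)
{
    char *value = malloc(len + 1);
    if (value == NULL)
        return -1;
    for (size_t i = 0; i < len; i++)
        value[i] = (i % 20 == 19) ? ':' : (char) ('a' + (i / 20) % 26);
    value[len] = '\0';

    /* setenv copies the string, so the buffer can be freed here */
    int rc = setenv(name, value, 1);
    free(value);
    return rc;
}
```

Only processes started after the call inherit the variable, which fits the observation that the postmaster has to be restarted before the crash appears.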

/s.

On Feb 21, 2004, at 1:51 AM, Tom Lane wrote:


#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Scott Goodwin (#4)
Re: [HACKERS] Mac OS X, PostgreSQL, PL/Tcl

Scott Goodwin <scott@scottg.net> writes:

> Found the problem. If I have a very long environment variable exported
> and I start PG, PG crashes when I try to load PL/Tcl. In my case I use
> color ls and I have a very long LS_COLORS environment variable set.

Interesting. Did you check whether the limiting factor is the longest
variable length, or the total size of the environment? ("env|wc" would
probably do as an approximation for the latter.)
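Tom's question separates two quantities: the longest single entry and the total size of the environment. A minimal C sketch, with hypothetical names, computes both from a NULL-terminated environment array. The total counts one terminating NUL per entry, which roughly matches what `env | wc -c` reports, since `env` prints a newline where each NUL would be (multi-line values aside).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of an environment block. */
struct env_summary {
    size_t longest;   /* length of the longest NAME=value entry, no NUL */
    size_t total;     /* sum of (length + 1) over all entries */
    size_t count;     /* number of entries */
};

/* Walk a NULL-terminated array of "NAME=value" strings. */
struct env_summary summarize_env(char *const *envp)
{
    struct env_summary s = {0, 0, 0};
    for (; envp != NULL && *envp != NULL; envp++) {
        size_t len = strlen(*envp);
        if (len > s.longest)
            s.longest = len;
        s.total += len + 1;
        s.count++;
    }
    return s;
}
```

For the running process, declare `extern char **environ;` (POSIX leaves that declaration to the application) and call `summarize_env(environ)`.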

regards, tom lane

#6 Scott Goodwin
scott@scottg.net
In reply to: Tom Lane (#5)
Re: [HACKERS] Mac OS X, PostgreSQL, PL/Tcl

I'm certain that the length of a single env var is the only factor
involved, and not the size of the environment itself. If I log in to my
normal environment and unset LS_COLORS, everything works fine. If I
move my .bashrc out of the way, log in fresh, and create an env var
longer than 522 chars, it fails. My login environment is much larger
than the environment I get without .bashrc, yet setting a single env var
to more than 522 chars reproduces the problem in both, which leads me to
believe that environment size has no effect on this problem. I've now
set my PG startup script to 'unset LS_COLORS' before starting PG, and
this works great. Has anyone else tried to reproduce this problem? I'm
using Mac OS X 10.3.2, PG 7.4.1, Tcl 8.4.5.

/s.

On Feb 22, 2004, at 12:21 PM, Tom Lane wrote:


#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Scott Goodwin (#4)
Re: [HACKERS] Mac OS X, PostgreSQL, PL/Tcl

Scott Goodwin <scott@scottg.net> writes:

> Found the problem. If I have a very long environment variable exported
> and I start PG, PG crashes when I try to load PL/Tcl. In my case I use
> color ls and I have a very long LS_COLORS environment variable set.

I was able to duplicate this. I am not entirely sure why the problem is
dependent on the environment size, but I now know what causes it.
It seems Darwin's libc keeps its own copy of the argv pointer, and when
we move argv and then scribble on the original, it causes problems for
subsequent code that tries to look at argv[0] to determine the
executable's location. (It's a good thing Darwin is open source, 'cause
I'm not sure we'd have ever seen the connection if we hadn't been able
to look at the source code for their libc.)
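The mechanism Tom describes can be modeled portably. In the C sketch below, `libc_saved_argv` and `move_and_clobber_argv` are hypothetical names: the static pointer stands in for the argv copy that, per Tom, Darwin's libc keeps for itself. With `update_saved` set to 0, the saved pointer ends up reading the process title instead of the executable path; setting it to 1 corresponds to the repointing done by the fix Tom gives next.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the argv pointer Darwin's libc keeps for
 * itself (the real one is reached through *_NSGetArgv()). */
static char **libc_saved_argv;

/* Copy argv to fresh heap storage, optionally repoint the saved
 * pointer at the copy, then overwrite the original argv[0] with
 * `title` (truncated to argv[0]'s own length), the way a
 * process-title routine would. Returns the copy, or NULL if
 * allocation fails. */
char **move_and_clobber_argv(int argc, char **argv, const char *title,
                             int update_saved)
{
    char **new_argv = malloc((size_t) (argc + 1) * sizeof(char *));
    if (new_argv == NULL)
        return NULL;
    for (int i = 0; i < argc; i++) {
        size_t n = strlen(argv[i]) + 1;
        new_argv[i] = malloc(n);
        if (new_argv[i] == NULL)
            return NULL;            /* sketch: earlier copies leak */
        memcpy(new_argv[i], argv[i], n);
    }
    new_argv[argc] = NULL;

    if (update_saved)
        libc_saved_argv = new_argv; /* analogous to *_NSGetArgv() = new_argv */

    /* Scribble over the original storage; anything still holding the
     * old argv array now sees the title, not the executable path. */
    size_t room = strlen(argv[0]);
    strncpy(argv[0], title, room);
    argv[0][room] = '\0';
    return new_argv;
}
```

As we understand it, PostgreSQL's process-title code can also claim the environment strings that sit right after argv in memory (moving environ out of the way first); the sketch confines itself to argv[0] so it stays within valid memory. Whether that larger area explains the environment-size dependence is not settled in this thread.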

The fix is basically

+ #if defined(__darwin__)
+ #include <crt_externs.h>
+ #endif
+ #if defined(__darwin__)
+ 		*_NSGetArgv() = new_argv;
+ #endif

which you can stick into main.c if you need a workaround. I applied a
more extensive patch to HEAD that refactors this code into ps_status.c,
but I'm disinclined to apply that patch to stable branches...

regards, tom lane

#8 Scott Goodwin
scott@scottg.net
In reply to: Tom Lane (#7)
Re: [HACKERS] Mac OS X, PostgreSQL, PL/Tcl

I'll grab the CVS PG copy and try it out. Is this something the Darwin
folks should be notified about? It might cause problems with other
apps.

thanks,

/s.

On Feb 22, 2004, at 4:47 PM, Tom Lane wrote:


#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Scott Goodwin (#8)
Re: [HACKERS] Mac OS X, PostgreSQL, PL/Tcl

Scott Goodwin <scott@scottg.net> writes:

> I'll grab the CVS PG copy and try it out. Is this something the Darwin
> folks should be notified about? It might cause problems with other
> apps.

It's unlikely that they'll consider it their problem.

regards, tom lane