Detecting glibc getopt?

Started by Tom Lane about 24 years ago, 7 messages
#1Tom Lane
tgl@sss.pgh.pa.us

I have traced down the postmaster-option-processing failure that Thomas
reported this morning. It appears to be specific to systems running
glibc: the problem is that resetting optind to 1 is not enough to
put glibc's getopt() subroutine into a good state to process a fresh
set of options. (Internally it has a "nextchar" pointer that is still
pointing at the old argv list, and only if the pointer points to a null
character will it wake up enough to reexamine the argv pointer you give
it.) The reason we see this now, and didn't see it before, is that
I rearranged startup to set the ps process title as soon as possible
after forking a subprocess --- and at least on Linux machines, that
"nextchar" pointer is pointing into the argv array that's overwritten
by init_ps_display.

While I could revert that change, I don't want to. The idea was to be
sure that a postmaster child running its authentication cycle could be
identified, and I still think that's an important feature. So I want to
find a way to make it work.

Looking at the source code of glibc's getopt, it seems there are two
ways to force a reset:

* set __getopt_initialized to 0. I thought this was an ideal solution
since configure could check for the presence of __getopt_initialized.
Unfortunately it seems that glibc is built in such a way that that
symbol isn't exported :-(, even though it looks global in the source.

* set optind to 0, instead of the more usual 1. This will work, but
it requires us to know that we're dealing with glibc getopt and not
anyone else's getopt.

I have thought of two ways to detect glibc getopt: one is to assume that
if getopt_long() is available, we should set optind=0. The other is to
try a runtime test in configure and see if it works to set optind=0.
Runtime configure tests aren't very appealing, but I don't much care
for equating HAVE_GETOPT_LONG to how we should reset optind, either.

Opinions anyone? Better ideas?

regards, tom lane

#2Thomas Lockhart
lockhart@fourpalms.org
In reply to: Tom Lane (#1)
Re: Detecting glibc getopt?

(I still see the symptom btw; did a make distclean and configure after
updating my tree)

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Lockhart (#2)
Re: Detecting glibc getopt?

Thomas Lockhart <lockhart@fourpalms.org> writes:

> (I still see the symptom btw; did a make distclean and configure after
> updating my tree)

Yeah, it's still busted; my first try was wrong. I have confirmed the
"optind = 0" fix works on my LinuxPPC machine, but we need to decide
how to autoconfigure that hack.

regards, tom lane

#4Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#1)
Re: Detecting glibc getopt?

Tom Lane writes:

> The reason we see this now, and didn't see it before, is that
> I rearranged startup to set the ps process title as soon as possible
> after forking a subprocess --- and at least on Linux machines, that
> "nextchar" pointer is pointing into the argv array that's overwritten
> by init_ps_display.

How about copying the entire argv[] array to a new location before the
very first call to getopt(). Then you can use getopt() without hackery
and can do anything you want to the "real" argv area. That should be a
lot safer. (We don't know yet what other platforms might play
optimization tricks in getopt().)

--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#4)
Re: Detecting glibc getopt?

Peter Eisentraut <peter_e@gmx.net> writes:

> How about copying the entire argv[] array to a new location before the
> very first call to getopt(). Then you can use getopt() without hackery
> and can do anything you want to the "real" argv area. That should be a
> lot safer. (We don't know yet what other platforms might play
> optimization tricks in getopt().)

Well, mumble --- strictly speaking, there is *NO* way to use getopt
over multiple cycles "without hackery". The standard for getopt
(http://www.opengroup.org/onlinepubs/7908799/xsh/getopt.html)
doesn't say you're allowed to scribble on optind in the first place.
But you're probably right that having a read-only copy of the argv
vector will make things safer. Will do it that way.
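For illustration, copying the argument vector could be sketched as below. This is not the code that went into the tree; the helper name copy_argv is hypothetical, and the sketch assumes a plain deep copy with malloc is acceptable. getopt() would then be given the copy, and the original argv area is free to be overwritten by the ps-title code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a NULL-terminated deep copy of argv,
 * or NULL if memory runs out. The copy shares no storage with the
 * original, so overwriting the original strings does not affect it.
 */
char **copy_argv(int argc, char *const argv[])
{
    char **copy;
    int i;

    assert(argc >= 0);

    copy = malloc((size_t) (argc + 1) * sizeof *copy);
    if (copy == NULL)
        return NULL;

    for (i = 0; i < argc; i++)
    {
        size_t len = strlen(argv[i]) + 1;   /* include the terminating NUL */

        copy[i] = malloc(len);
        if (copy[i] == NULL)
        {
            /* Release what was allocated so far before giving up. */
            while (i-- > 0)
                free(copy[i]);
            free(copy);
            return NULL;
        }
        memcpy(copy[i], argv[i], len);
    }
    copy[argc] = NULL;
    return copy;
}
```

The copy has to be made before the first getopt() call and before anything scribbles on the original argv, otherwise it would just preserve the damaged strings.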

regards, tom lane

#6Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#1)
Re: Detecting glibc getopt?

Is this resolved?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#7Thomas Lockhart
lockhart@fourpalms.org
In reply to: Bruce Momjian (#6)
Re: Detecting glibc getopt?

> Is this resolved?

Sure. Within a day or two of the initial problem report.

- Thomas