Current CVS tip segfaulting

Started by Alvaro Herrera over 21 years ago, 19 messages
#1 Alvaro Herrera
alvherre@dcc.uchile.cl

Hackers,

In current (as of a couple hours ago) clean CVS tip sources, without any
of my local changes, I'm getting a postmaster segfault when trying to
connect to a nonexistent database. The generated core file does not
seem to contain any useful information. The first time I saw this I
managed to PANIC the system -- I can't seem to be able to reproduce that
part.

(Newly built on an empty vpath, so this should not be a case of needing
"make distclean" ...)

Core was generated by `postgres: alvherre asd [local] startup
'.
Program terminated with signal 11, Segmentation fault.

warning: current_sos: Can't read pathname for load map: Input/output error

Reading symbols from /lib/libz.so.1...done.
Loaded symbols for /lib/libz.so.1
Reading symbols from /lib/libreadline.so.4.3...done.
Loaded symbols for /lib/libreadline.so.4.3
Reading symbols from /lib/libncurses.so.5...done.
Loaded symbols for /lib/libncurses.so.5
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/libgpm.so.1...done.
Loaded symbols for /lib/libgpm.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /usr/lib/gconv/ISO8859-15.so...done.
Loaded symbols for /usr/lib/gconv/ISO8859-15.so
Reading symbols from /usr/lib/gconv/ISO8859-1.so...done.
Loaded symbols for /usr/lib/gconv/ISO8859-1.so
0x00000000 in ?? ()
(gdb) bt
#0 0x00000000 in ?? ()

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"The only difference is that Saddam would kill you on private, where the
Americans will kill you in public" (Mohammad Saleh, 39, a building contractor)

#2 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Alvaro Herrera (#1)
Re: Current CVS tip segfaulting

Please recompile with debug symbols and report back the stack trace.
See the faq on running debug.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#3 Alvaro Herrera Munoz
alvherre@dcc.uchile.cl
In reply to: Bruce Momjian (#2)
Re: Current CVS tip segfaulting

On Fri, Apr 23, 2004 at 07:00:05PM -0400, Bruce Momjian wrote:

Please recompile with debug symbols and report back the stack trace.
See the faq on running debug.

No, I already did that (all my builds are like that anyway, and I read
stack traces more frequently than I'd like). I don't understand the
"can't read pathname" message; I had never seen it before.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
The web brings people together because no matter what kind of sexual mutant
you are, you have millions of potential partners. Type "find people who have
sex with deer that are on fire", and the computer will say "specify the type
of deer" (Jason Alexander)

#4 Alvaro Herrera Munoz
alvherre@dcc.uchile.cl
In reply to: Alvaro Herrera Munoz (#3)
Re: Current CVS tip segfaulting

On Fri, Apr 23, 2004 at 08:38:29PM -0400, Alvaro Herrera Munoz wrote:

On Fri, Apr 23, 2004 at 07:00:05PM -0400, Bruce Momjian wrote:

Please recompile with debug symbols and report back the stack trace.
See the faq on running debug.

No, I already did that (all my builds are like that anyway and I read
stack traces more frequently than I'd like). The "can't read pathname"
message I don't understand, but I had never seen it.

strace'ing the postmaster suggested to me that the dbname string in
utils/init/postinit.c, the InitPostgres function, is the culprit.
In fact, if I apply the following patch to tcop/postgres.c the
whole thing stops happening. I don't know if this is the correct
fix, but it may suggest something. Maybe it's a problem with my
platform's argv handling (Mandrakelinux 10, kernel 2.6.3, glibc 2.3.3).

Index: postgres.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql-server/src/backend/tcop/postgres.c,v
retrieving revision 1.400
diff -c -r1.400 postgres.c
*** postgres.c  19 Apr 2004 17:42:58 -0000  1.400
--- postgres.c  24 Apr 2004 02:20:47 -0000
***************
*** 2686,2692 ****
                     errhint("Try \"%s --help\" for more information.", argv[0])));
        }
        else if (argc - optind == 1)
!           dbname = argv[optind];
        else if ((dbname = username) == NULL)
        {
            ereport(FATAL,
--- 2648,2654 ----
                     errhint("Try \"%s --help\" for more information.", argv[0])));
        }
        else if (argc - optind == 1)
!           dbname = pstrdup(argv[optind]);
        else if ((dbname = username) == NULL)
        {
            ereport(FATAL,

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"And yet it moves" (Galileo Galilei)

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#1)
Re: Current CVS tip segfaulting

Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

In current (as of a couple hours ago) clean CVS tip sources, without any
of my local changes, I'm getting a postmaster segfault when trying to
connect to a nonexistent database.

Hmm, works for me with this morning's sources. Bruce created a bug of
that ilk a few days ago but fixed it shortly thereafter. Is it possible
the anon-CVS server is out of date?

regards, tom lane

#6 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Alvaro Herrera Munoz (#3)
Re: Current CVS tip segfaulting

Alvaro Herrera Munoz wrote:

On Fri, Apr 23, 2004 at 07:00:05PM -0400, Bruce Momjian wrote:

Please recompile with debug symbols and report back the stack trace.
See the faq on running debug.

No, I already did that (all my builds are like that anyway and I read
stack traces more frequently than I'd like). The "can't read pathname"
message I don't understand, but I had never seen it.

Oh, you mean the line:

warning: current_sos: Can't read pathname for load map: Input/output error

That is strange. Does it happen if you call abort() from the C code?
That should dump a core on its own. The question is whether things are
getting corrupted because of the way it crashed or some other configure
problem.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
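
Bruce's suggestion of a deliberate abort() can be tried in isolation, outside the server. The following is a minimal sketch, assuming a POSIX system; the function names allow_core_dumps and crash_for_core are hypothetical, and none of this is PostgreSQL code. It raises the soft core-size limit to the hard limit and then offers a way to die with SIGABRT, so that the resulting core file can be loaded into gdb and compared against the one from the segfault.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>

/*
 * Raise the soft RLIMIT_CORE limit to the hard limit so the kernel is
 * allowed to write a core file. Returns 0 on success, -1 on failure.
 */
int allow_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;

    /* The soft limit can never legitimately exceed the hard limit. */
    assert(lim.rlim_cur <= lim.rlim_max);

    lim.rlim_cur = lim.rlim_max;   /* raising soft up to hard needs no privilege */
    return setrlimit(RLIMIT_CORE, &lim);
}

/* Terminate the process with SIGABRT; this function never returns. */
void crash_for_core(void)
{
    abort();
}
```

Calling allow_core_dumps() followed by crash_for_core() from a small test program should leave a core file (where it lands depends on the system's configuration). If gdb prints the same "Can't read pathname for load map" warning on that core, the warning is probably unrelated to how the backend crashed.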

#7 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#5)
Re: Current CVS tip segfaulting

Tom Lane wrote:

Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

In current (as of a couple hours ago) clean CVS tip sources, without any
of my local changes, I'm getting a postmaster segfault when trying to
connect to a nonexistent database.

Hmm, works for me with this morning's sources. Bruce created a bug of
that ilk a few days ago but fixed it shortly thereafter. Is it possible
the anon-CVS server is out of date?

The bug I fixed was related to a postmaster restart when connecting to a
nonexistent database, and the fix was to prevent the longjmp for
elog(FATAL) if the code hadn't reached the longjmp location yet.

It could be a bug, but if it is, it is a different fix than the one I
did, I think.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
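
The fix Bruce describes follows a general pattern: an error path must not jump to a recovery point that has not been set up yet. The following is a minimal sketch of that pattern using standard setjmp/longjmp; it is not the actual PostgreSQL change, and the names recovery_point, recovery_point_valid, raise_error and run_session are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

static jmp_buf recovery_point;             /* target for error recovery */
static bool recovery_point_valid = false;  /* true only after setjmp() has run */

enum outcome { OUTCOME_OK = 0, OUTCOME_RECOVERED = 1, OUTCOME_EXITED = 2 };

/*
 * Error path. Jumps to the recovery point only if it has been armed;
 * otherwise it returns so that the caller can shut down in an orderly way.
 */
void raise_error(void)
{
    if (recovery_point_valid)
        longjmp(recovery_point, 1);
    /* No valid jump target yet: fall through instead of jumping blindly. */
}

/* Simulates one session that fails either before or after arming. */
enum outcome run_session(bool fail_during_startup)
{
    recovery_point_valid = false;

    if (fail_during_startup) {
        raise_error();             /* returns, because nothing is armed */
        return OUTCOME_EXITED;     /* a real server process would exit here */
    }

    if (setjmp(recovery_point) != 0) {
        /* We arrived here via longjmp() from raise_error(). */
        recovery_point_valid = false;
        return OUTCOME_RECOVERED;
    }
    recovery_point_valid = true;   /* the jump target is now safe to use */

    raise_error();                 /* jumps back to the setjmp() above */
    return OUTCOME_OK;             /* not reached in this sketch */
}
```

With this guard, run_session(true) takes the exit path and run_session(false) takes the recovery path. Without the flag, the early failure would longjmp through an uninitialized buffer, which is undefined behavior.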

#8 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Alvaro Herrera Munoz (#4)
Re: Current CVS tip segfaulting

FYI, I just tried:

$ psql lkjasdf
psql: FATAL: database "lkjasdf" does not exist
(2) cat /u/pg/server.log
LOG: database system was shut down at 2004-04-23 15:23:20 EDT
LOG: checkpoint record is at 0/9DCCCC
LOG: redo record is at 0/9DCCCC; undo record is at 0/0; shutdown TRUE
LOG: next transaction ID: 457; next OID: 17208
LOG: database system is ready
FATAL: database "lkjasdf" does not exist

That looks OK to me on BSD/OS.

I can put a copy of CVS head on my ftp site for testing if you wish.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#7)
Re: Current CVS tip segfaulting

Bruce Momjian <pgman@candle.pha.pa.us> writes:

It could be a bug, but if it is, it is a different fix than the one I
did, I think.

Re-reading Alvaro's message, I wondered if cranking logging up to a
higher-than-default setting was needed to reproduce the bug. A quick
experiment in that line didn't show a problem, but maybe I missed the
critical setting. Alvaro, what postgresql.conf settings are you using?

regards, tom lane

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera Munoz (#4)
Re: Current CVS tip segfaulting

Alvaro Herrera Munoz <alvherre@dcc.uchile.cl> writes:

[ bug goes away if ]
! dbname = argv[optind];
[becomes]
! dbname = pstrdup(argv[optind]);

Hm, that's interesting. I could believe this would have something to do
with overwriting the argv area, but we have not touched any of that code
recently; so why would it break for you just now?

Which PS_USE_FOO option does your platform use? (See
src/backend/utils/misc/ps_status.c)

regards, tom lane
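
Tom's hypothesis about overwriting the argv area can be illustrated outside the server. The following is a minimal sketch, not the code in ps_status.c: clobber_args is a hypothetical stand-in for a process-title update that reuses the argument storage, and copy_string and demo_alias_vs_copy are likewise invented for the illustration. It shows why a pointer that aliases argv changes its contents after the clobber, while a private copy (the effect of the pstrdup() in the patch) does not.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Overwrite the bytes of every argument with successive characters of
 * `title`, padding with spaces. Never writes past an argument's terminator.
 */
void clobber_args(int argc, char **argv, const char *title)
{
    size_t pos = 0;
    size_t len = strlen(title);

    for (int i = 0; i < argc; i++) {
        size_t room = strlen(argv[i]);
        for (size_t j = 0; j < room; j++)
            argv[i][j] = (pos < len) ? title[pos++] : ' ';
    }
}

/* Portable equivalent of strdup(); returns NULL if allocation fails. */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/*
 * Keeps both an alias and a copy of the last argument, clobbers the
 * vector, and reports what is left. Returns a bitmask: bit 0 is set if
 * the alias still reads "asd", bit 1 if the copy still does.
 */
int demo_alias_vs_copy(void)
{
    char a0[] = "postgres";
    char a1[] = "-p";
    char a2[] = "asd";
    char *argv[] = { a0, a1, a2, NULL };
    int argc = 3;
    int result = 0;

    const char *alias = argv[2];         /* like dbname = argv[optind] */
    char *copy = copy_string(argv[2]);   /* like dbname = pstrdup(argv[optind]) */
    assert(copy != NULL);

    clobber_args(argc, argv, "postgres: alvherre asd [local] startup");

    if (strcmp(alias, "asd") == 0)
        result |= 1;
    if (strcmp(copy, "asd") == 0)
        result |= 2;

    free(copy);
    return result;
}
```

Here demo_alias_vs_copy() returns 2: only the copy survives. As later messages in the thread point out, the patched code path is not the one a normal backend takes, so this shows the mechanism Tom has in mind rather than a confirmed cause.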

#11 Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Tom Lane (#9)
Re: Current CVS tip segfaulting

On Sat, Apr 24, 2004 at 12:27:14AM -0400, Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

It could be a bug, but if it is, it is a different fix than the one I
did, I think.

Re-reading Alvaro's message, I wondered if cranking logging up to a
higher-than-default setting was needed to reproduce the bug. A quick
experiment in that line didn't show a problem, but maybe I missed the
critical setting. Alvaro, what postgresql.conf settings are you using?

I don't touch the standard settings ... log values are from the default
installation.

In another mail you asked:

Which PS_USE_FOO option does your platform use? (See
src/backend/utils/misc/ps_status.c)

PS_USE_CLOBBER_ARGV AFAICS (ugh, sure uppercase is ugly) ;-)

The relevant strace extract is this (3448 is the backend, 3443 is
postmaster):

3448 write(2, "FATAL: database \"asd\" does not exist\n", 38) = 38
3448 send(10, "R\0\0\0\10\0\0\0\0E\0\0\0\217SFATAL\0C3D000\0Mdatabase \"asd\" does not exist\0F/home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/init/postinit.c \0L264\0RInitPostgres\0\0", 153, 0) = 153
3448 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
3443 <... select resumed> ) = ? ERESTARTNOHAND (To be restarted)
3443 --- SIGCHLD (Child exited) @ 0 (0) ---

Note that the ereport() did get the line number, file and function name, the
correct database name, etc. I don't know if the code is changing the ps status
after that; it's difficult to attach a debugger to this ... huh wait, I'll try the
backend's developer switches.

... plays for a while ...

Heh, the -s switch to postmaster seems to behave oddly. The bgwriter process
appears in T status in ps (stopped), but the postmaster does not; if I then
send SIGCONT to the bgwriter it seems to continue and returns to S status, but
then the postmaster doesn't respond correctly to signals (INT or TERM don't
shut it down). Has it always been like this? I haven't used this switch before.

Anyway, this doesn't allow me to examine the dead backend. Trying
postmaster -o "-W 60"
allows me to attach gdb to the backend before it dies:

(gdb) bt
#0 0xffffe410 in ?? ()
#1 0xbfffeda8 in ?? ()
#2 0x4025f800 in ?? () from /lib/tls/libc.so.6
#3 0xbfffec04 in ?? ()
#4 0x401cb460 in nanosleep () from /lib/tls/libc.so.6
#5 0x401cb263 in sleep () from /lib/tls/libc.so.6
#6 0x0818791e in PostgresMain (argc=6, argv=0x82dff18,
username=0x82dfee0 "alvherre") at stdlib.h:382
#7 0x0815fab0 in BackendRun (port=0x82ed050)
at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:2664
#8 0x0815f371 in BackendStartup (port=0x82ed050)
at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:2297
#9 0x0815db6e in ServerLoop ()
at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:1167
#10 0x0815d157 in PostmasterMain (argc=3, argv=0x82deb80)
at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:928
#11 0x0812f030 in main (argc=3, argv=0x82deb80)
at /home/alvherre/CVS/pgsql/source/00orig/src/backend/main/main.c:257
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0 0x00000000 in ?? ()

Whoa! New backend, new gdb, try again:

(gdb) break InitPostgres
Breakpoint 1 at 0x81f3c3c: file /home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/init/postinit.c, line 230.
(gdb) cont
Continuing.

Breakpoint 1, InitPostgres (dbname=0xc <Address 0xc out of bounds>,
username=0x80e2540 "U\211�SP�\222���\200= �*\b")
at /home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/init/postinit.c:230
230 bool bootstrap = IsBootstrapProcessingMode();
(gdb)

This surely looks suspicious ...

(gdb) p dbname
$2 = 0xc <Address 0xc out of bounds>
(gdb) frame 1
#1 0x08187581 in PostgresMain (argc=6, argv=0x82dff18,
username=0x82dfee0 "alvherre")
at /home/alvherre/CVS/pgsql/source/00orig/src/backend/tcop/postgres.c:2745
2745 InitPostgres(dbname, username);
(gdb) p argv
$3 = (char **) 0x82dff18
(gdb) p argv[0]
$5 = 0x8265402 "postgres"
(gdb) p argv[1]
$6 = 0x82aa301 "-W"
(gdb) p argv[2]
$7 = 0x82aa304 "60"
(gdb) p argv[3]
$8 = 0xbfffee60 "-v196608"
(gdb) p argv[4]
$9 = 0x826d97a "-p"
(gdb) p argv[5]
$10 = 0x82dfefc "asd"
(gdb) p argv[6]
$11 = 0x0
(gdb) p dbname
$12 = 0x82ea848 "asd"

-- Note that this is not the same as argv[5], it's a copy, and as far as
I can see, it's set by the -p option in the switch/case, in tcop/postgres.c
line 2391, using strdup.

What else?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Syntax error: function hell() needs an argument.
Please choose what hell you want to involve.
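
The -W 60 approach in the message above works because the backend sleeps before doing anything interesting, which leaves time to attach a debugger (the first backtrace shows sleep() called from PostgresMain). The same idea can be added to any program under investigation. The following is a minimal sketch, assuming a POSIX system; pause_for_debugger is a hypothetical helper, not a PostgreSQL function.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Print this process's PID and pause so that a debugger can be attached
 * (for example with "gdb -p PID") before the code of interest runs.
 * Returns the number of seconds left unslept, which is nonzero only if
 * a signal interrupted the pause.
 */
unsigned int pause_for_debugger(unsigned int seconds)
{
    fprintf(stderr, "pid %ld: waiting %u s for a debugger to attach\n",
            (long) getpid(), seconds);
    return sleep(seconds);
}
```

Calling pause_for_debugger(60) at the top of the suspect code path gives roughly the same window as -W 60. Once attached, set breakpoints first and then continue, as done with "break InitPostgres" above.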

#12 Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Tom Lane (#5)
Re: Current CVS tip segfaulting

On Fri, Apr 23, 2004 at 10:31:46PM -0400, Tom Lane wrote:

Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

In current (as of a couple hours ago) clean CVS tip sources, without any
of my local changes, I'm getting a postmaster segfault when trying to
connect to a nonexistent database.

Hmm, works for me with this morning's sources. Bruce created a bug of
that ilk a few days ago but fixed it shortly thereafter. Is it possible
the anon-CVS server is out of date?

Did I already say that I use CVSup? It seems to be up to date with the
latest commits, so I don't think this is it.

I'm starting to think that this could be a problem with my glibc/kernel
combination ... This is linux-2.6.3-7mdk with glibc 2.3.3-10mdk.
Is anyone else using Mandrakelinux 10 official?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"None are so enslaved as those who believe themselves free when they are not" (Goethe)

#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#12)
Re: Current CVS tip segfaulting

Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

In current (as of a couple hours ago) clean CVS tip sources, without any
of my local changes, I'm getting a postmaster segfault when trying to
connect to a nonexistent database.

Alvaro, did you figure this out? I've been mostly distracted for the
past week ...

regards, tom lane

#14 Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Tom Lane (#13)
Re: Current CVS tip segfaulting

On Fri, Apr 30, 2004 at 12:52:10AM -0400, Tom Lane wrote:

Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

In current (as of a couple hours ago) clean CVS tip sources, without any
of my local changes, I'm getting a postmaster segfault when trying to
connect to a nonexistent database.

Alvaro, did you figure this out? I've been mostly distracted for the
past week ...

No. I still see the failure on my platform but I don't know what to
attribute it to.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Some people acquire the bad habit of being unhappy" (M. A. Evans)

#15 Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Alvaro Herrera (#14)
Re: Current CVS tip segfaulting

Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

In current (as of a couple hours ago) clean CVS tip sources, without any
of my local changes, I'm getting a postmaster segfault when trying to
connect to a nonexistent database.

Alvaro, did you figure this out? I've been mostly distracted for the
past week ...

No. I still see the failure on my platform but I don't know what to
attribute it to.

I also have that for a database installation from CVS on April 17.

It also leaves the server in some incoherent state:

Apr 30 17:58:22 sablons postgres[31629]: [31-1] FATAL: database "toto" does not exist
Apr 30 17:58:22 sablons postgres[31604]: [31-1] LOG: server process (PID 31629) was terminated by signal 11
Apr 30 17:58:22 sablons postgres[31604]: [32-1] LOG: terminating any other active server processes
Apr 30 17:58:22 sablons postgres[31532]: [31-1] WARNING: terminating connection because of crash of another server process
Apr 30 17:58:22 sablons postgres[31532]: [31-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server
Apr 30 17:58:22 sablons postgres[31532]: [31-3] process exited abnormally and possibly corrupted shared memory.
Apr 30 17:58:22 sablons postgres[31532]: [31-4] HINT: In a moment you should be able to reconnect to the database and repeat your command.
Apr 30 17:58:22 sablons postgres[31604]: [33-1] LOG: all server processes terminated; reinitializing
Apr 30 17:58:22 sablons postgres[31630]: [34-1] LOG: database system was interrupted at 2004-04-30 17:54:56 CEST
Apr 30 17:58:22 sablons postgres[31630]: [35-1] LOG: checkpoint record is at 0/B486F30
Apr 30 17:58:22 sablons postgres[31630]: [36-1] LOG: redo record is at 0/B486F30; undo record is at 0/0; shutdown TRUE
Apr 30 17:58:22 sablons postgres[31630]: [37-1] LOG: next transaction ID: 10769; next OID: 123703
Apr 30 17:58:22 sablons postgres[31630]: [38-1] LOG: database system was not properly shut down; automatic recovery in progress
Apr 30 17:58:22 sablons postgres[31630]: [39-1] LOG: redo starts at 0/B486F70
Apr 30 17:58:22 sablons postgres[31630]: [40-1] PANIC: could not create relation 123703/16660: No such file or directory
Apr 30 17:58:22 sablons postgres[31604]: [34-1] LOG: startup process (PID 31630) was terminated by signal 6
Apr 30 17:58:22 sablons postgres[31604]: [35-1] LOG: aborting startup due to startup process failure

So it is not a "clean" core dump, if there is such a thing ;-)

--
Fabien Coelho - coelho@cri.ensmp.fr

#16 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Fabien COELHO (#15)
Re: Current CVS tip segfaulting

I think we fixed it since then.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#17 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera Munoz (#4)
Re: Current CVS tip segfaulting

Alvaro Herrera Munoz <alvherre@dcc.uchile.cl> writes:

strace'ing the postmaster suggested to me that the dbname string in
utils/init/postinit.c, the InitPostgres function, is the culprit.
In fact, if I apply the following patch to tcop/postgres.c the
whole thing stops happening.

else if (argc - optind == 1)
! dbname = argv[optind];
...
else if (argc - optind == 1)
! dbname = pstrdup(argv[optind]);

Surely this is a red herring --- that code path does not even execute
except in the case of a standalone backend.

regards, tom lane

#18 Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Tom Lane (#17)
Re: Current CVS tip segfaulting

On Fri, Apr 30, 2004 at 11:36:36PM -0400, Tom Lane wrote:

Alvaro Herrera Munoz <alvherre@dcc.uchile.cl> writes:

strace'ing the postmaster suggested to me that the dbname string in
utils/init/postinit.c, the InitPostgres function, is the culprit.
In fact, if I apply the following patch to tcop/postgres.c the
whole thing stops happening.

else if (argc - optind == 1)
! dbname = argv[optind];
...
else if (argc - optind == 1)
! dbname = pstrdup(argv[optind]);

Surely this is a red herring --- that code path does not even execute
except in the case of a standalone backend.

Yes, I figured that out later (the normal path uses -p instead). In
fact I then took out the pstrdup() and the fault wasn't happening; so I
recompiled all over again, without the pstrdup and it was back.

I think maybe there's something clobbering argv. I thought about
tracing that with gdb but never got to it. I will do that now and
report back.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Attentive, foresighted fear is the mother of safety" (E. Burke)

#19 Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Alvaro Herrera (#1)
Re: Current CVS tip segfaulting

On Fri, Apr 23, 2004 at 05:10:34PM -0400, Alvaro Herrera wrote:

In current (as of a couple hours ago) clean CVS tip sources, without any
of my local changes, I'm getting a postmaster segfault when trying to
connect to a nonexistent database.

Just to follow up, I no longer see this problem in CVS tip. I don't
know if somebody fixed it on purpose, but my system is the same as
before and I can't reproduce the bug anymore.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"A man never knows what he is capable of until he tries" (C. Dickens)