Core dump in regression tests.

Started by Keith Parks over 27 years ago, 50 messages, hackers
#1Keith Parks
emkxp01@mtcc.demon.co.uk

For anyone who has been watching my trials and tribulations building
and running the latest CVS snapshot under S/Linux on a SUN
SPARCstation IPX, the latest position is:-

If I compile with optimization turned off (-O0 instead of -O2)
I get an almost clean run of the regression tests. Only the
"create_function" tests fail unexpectedly.

[postgres@sparclinux regress]$ psql regression
Welcome to the POSTGRESQL interactive sql monitor:
Please read the file COPYRIGHT for copyright terms of POSTGRESQL

type \? for help on slash commands
type \q to quit
type \g or terminate with semicolon to execute query
You are currently connected to the database: regression

regression=> CREATE FUNCTION widget_in(opaque)
regression-> RETURNS widget
regression-> AS '/usr/local/pgsql/src/test/regress/input/../regress.so'
regression-> LANGUAGE 'c';
NOTICE: ProcedureCreate: type 'widget' is not yet defined
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally before or while
processing the request.
We have lost the connection to the backend, so further processing is impossible.
Terminating.
[postgres@sparclinux regress]$

The backtrace shows:-

Program received signal SIGSEGV, Segmentation fault.
0x44744 in GetIndexValue (tuple=0x25e210, hTupDesc=0x25e26c, attOff=0,
attrNums=0x261944, fInfo=0x0,
attNull=0xefffcbcf "") at indexam.c:404
404 returnVal = heap_getattr(tuple, attrNums[attOff],
(gdb) bt
#0 0x44744 in GetIndexValue (tuple=0x25e210, hTupDesc=0x25e26c, attOff=0,
attrNums=0x261944, fInfo=0x0,
attNull=0xefffcbcf "") at indexam.c:404
#1 0x68e9c in FormIndexDatum (numberOfAttributes=1, attributeNumber=0x261944,
heapTuple=0x25e210,
heapDescriptor=0x25e26c, datum=0xefffccb8, nullv=0xefffcc58 " ",
fInfo=0x0) at index.c:1284
#2 0x69c38 in CatalogIndexInsert (idescs=0xefffcd30, nIndices=3,
heapRelation=0x213b90, heapTuple=0x25e210)
at indexing.c:154
#3 0x6f344 in ProcedureCreate (procedureName=0x208bb0 "widget_in", returnsSet=0
'\000',
returnTypeName=0x208b30 "widget", languageName=0xefffcec8 "C",
prosrc=0x18d7f8 "-",
probin=0x251b10 "/usr/local/pgsql/src/test/regress/input/../regress.so",
canCache=24 '\030',
trusted=1 '\001', byte_pct=100, perbyte_cpu=0, percall_cpu=0,
outin_ratio=100, argList=0x208b50,
dest=Remote) at pg_proc.c:275
#4 0x786c8 in CreateFunction (stmt=0x207650, dest=Remote) at define.c:329
#5 0x131694 in ProcessUtility (parsetree=0x207650, dest=Remote) at
utility.c:392
#6 0x12db18 in pg_exec_query_dest (
query_string=0xefffd130 " CREATE FUNCTION widget_in(opaque)\n RETURNS
widget\n AS '/usr/local/pgsql/src/test/regress/input/../regress.so'\n
LANGUAGE 'c';", dest=Remote, aclOverride=0 '\000') at postgres.c:720
#7 0x12d98c in pg_exec_query (
query_string=0xefffd130 " CREATE FUNCTION widget_in(opaque)\n RETURNS
widget\n AS '/usr/local/pgsql/src/test/regress/input/../regress.so'\n
LANGUAGE 'c';") at postgres.c:658
#8 0x12f81c in PostgresMain (argc=9, argv=0xeffff210, real_argc=10,
real_argv=0xeffffd84) at postgres.c:1578
#9 0x10183c in DoBackend (port=0x209000) at postmaster.c:1519
#10 0x100ffc in BackendStartup (port=0x209000) at postmaster.c:1291
#11 0xffed8 in ServerLoop () at postmaster.c:750
#12 0xff860 in PostmasterMain (argc=10, argv=0xeffffd84) at postmaster.c:556
#13 0xb02c0 in main (argc=10, argv=0xeffffd84) at main.c:93
(gdb)

Has anyone else seen anything similar?

Keith.

PS: Bruce, I still need to find which file breaks an -O2 compile but
will spend some time playing over this long weekend.

#2Thomas A. Szybist
szybist@boxhill.com
In reply to: Keith Parks (#1)
Re: [HACKERS] Core dump in regression tests.

Hi, yes, I'm having trouble as well.

I'm crashing anytime I create a table, (-O2). I just tried the 8/29 snapshot.
I've got an environment set up now to try a few things.

Without -O2 it seems to be better. I see the same problem with
create function as you. Also many failures seem to be the result
of some type not defined. Is that expected?

Sorry to jump in so late here.

Tom Szybist
szybist@boxhill.com


#3Keith Parks
emkxp01@mtcc.demon.co.uk
In reply to: Thomas A. Szybist (#2)
Re: [HACKERS] Core dump in regression tests.

Thomas A. Szybist <szybist@boxhill.com>

Hi, yes, I'm having trouble as well.

Sorry to hear that but I don't feel quite so alone now ;-)

I'm crashing anytime I create a table, (-O2). I just tried the 8/29 snapshot.
I've got an environment set up now to try a few things.

Same thing here with -O2, I think some of the other pg_user, pg_view etc.
problems all boil down to table creation as those are created as
tables after bootstrapping and subsequently converted to views.

Without -O2 it seems to be better. I see the same problem with
create function as you. Also many failures seem to be the result
of some type not defined. Is that expected?

Yes, there are many problems in the regression tests due to some
type removal, I think, by Bruce. We're just waiting for the
regression test "expected" results to catch up.

Sorry to jump in so late here.

Better late than never <grin>

Just out of interest, what platform are you running on?
(S/Linux on an SUN IPX here)

Keith.

#4Thomas A. Szybist
szybist@boxhill.com
In reply to: Keith Parks (#3)
Re: [HACKERS] Core dump in regression tests.

In message <199808292119.WAA27374@mtcc.demon.co.uk>, Keith Parks writes:

Thomas A. Szybist <szybist@boxhill.com>

Hi, yes, I'm having trouble as well.

Sorry to hear that but I don't feel quite so alone now ;-)

Misery loves company :).

I'm crashing anytime I create a table, (-O2). I just tried the 8/29 snapshot.
I've got an environment set up now to try a few things.

Same thing here with -O2, I think some of the other pg_user, pg_view etc.
problems all boil down to table creation as those are created as
tables after bootstrapping and subsequently converted to views.

Without -O2 it seems to be better. I see the same problem with
create function as you. Also many failures seem to be the result
of some type not defined. Is that expected?

Yes, there are many problems in the regression tests due to some
type removal, I think, by Bruce. We're just waiting for the
regression test "expected" results to catch up.

Sorry to jump in so late here.

Better late than never <grin>

Just out of interest, what platform are you running on?
(S/Linux on an SUN IPX here)

Keith.

This is on a Red Hat 4.1 (Yes 4.1) system. Thing is, it's a production
system, and I haven't reason to upgrade.

Kernel is 2.0.29 gcc 2.7.2.1. Sparc 20.

I was toying with the idea of tossing Red Hat 5.1 on a sparc classic,
to see if glibc on S/Linux hoses anything.

Tom Szybist
szybist@boxhill.com

#5Bruce Momjian
bruce@momjian.us
In reply to: Thomas A. Szybist (#4)
Re: [HACKERS] Core dump in regression tests.

In message <199808292119.WAA27374@mtcc.demon.co.uk>, Keith Parks writes:

Thomas A. Szybist <szybist@boxhill.com>

Hi, yes, I'm having trouble as well.

Sorry to hear that but I don't feel quite so alone now ;-)

Misery loves company :).

I'm crashing anytime I create a table, (-O2). I just tried the 8/29 snapshot.
I've got an environment set up now to try a few things.

Same thing here with -O2, I think some of the other pg_user, pg_view etc.
problems all boil down to table creation as those are created as
tables after bootstrapping and subsequently converted to views.

Without -O2 it seems to be better. I see the same problem with
create function as you. Also many failures seem to be the result
of some type not defined. Is that expected?

Yes, there are many problems in the regression tests due to some
type removal, I think, by Bruce. We're just waiting for the
regression test "expected" results to catch up.

Sorry to jump in so late here.

Better late than never <grin>

Just out of interest, what platform are you running on?
(S/Linux on an SUN IPX here)

Keith.

This is on a Red Hat 4.1 (Yes 4.1) system. Thing is, it's a production
system, and I haven't reason to upgrade.

Kernel is 2.0.29 gcc 2.7.2.1. Sparc 20.

I was toying with the idea of tossing Red Hat 5.1 on a sparc classic,
to see if glibc on S/Linux hoses anything.

Again, if someone wants to conditionally compile the directories to find
the offending file, I am sure we can get a fix for it.
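
The directory-by-directory search suggested here can be treated as a bisection: build one half of the units with -O2 and the rest with -O0, see whether the crash appears, and narrow down. The sketch below is a hypothetical illustration in C, not a script from the PostgreSQL tree. The names `find_culprit`, `crash_oracle` and `fake_oracle` are invented, and it assumes exactly one unit misbehaves under -O2.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical oracle: returns nonzero if a build crashes when the units
 * with indexes in [lo, hi) are compiled with -O2 and all others with -O0.
 * In practice this would be "rebuild, rerun the failing test, look for a
 * core file"; here it is only a callback.
 */
typedef int (*crash_oracle)(size_t lo, size_t hi, void *ctx);

/*
 * Bisect n units, assuming exactly one of them triggers the crash.
 * Returns the index of that unit.
 */
static size_t find_culprit(size_t n, crash_oracle crashes, void *ctx)
{
    size_t lo = 0;
    size_t hi = n;

    assert(n > 0);
    while (hi - lo > 1)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (crashes(lo, mid, ctx))
            hi = mid;           /* culprit is in the lower half */
        else
            lo = mid;           /* culprit must be in the upper half */
    }
    return lo;
}

/* Stand-in oracle for demonstration: crash iff the range contains *ctx. */
static int fake_oracle(size_t lo, size_t hi, void *ctx)
{
    size_t bad = *(size_t *) ctx;

    return lo <= bad && bad < hi;
}
```

With eight directories this needs three rebuilds rather than eight. If more than one unit is affected, the single-culprit assumption fails and the search would have to be repeated after each fix.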

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#6Thomas A. Szybist
szybist@boxhill.com
In reply to: Bruce Momjian (#5)
Re: [HACKERS] Core dump in regression tests.

In message <199808300008.UAA15993@candle.pha.pa.us>, Bruce Momjian writes:

In message <199808292119.WAA27374@mtcc.demon.co.uk>, Keith Parks writes:

Thomas A. Szybist <szybist@boxhill.com>

Hi, yes, I'm having trouble as well.

Sorry to hear that but I don't feel quite so alone now ;-)

Misery loves company :).

I'm crashing anytime I create a table, (-O2). I just tried the 8/29 snapshot.
I've got an environment set up now to try a few things.

Same thing here with -O2, I think some of the other pg_user, pg_view etc.
problems all boil down to table creation as those are created as
tables after bootstrapping and subsequently converted to views.

Without -O2 it seems to be better. I see the same problem with
create function as you. Also many failures seem to be the result
of some type not defined. Is that expected?

Yes, there are many problems in the regression tests due to some
type removal, I think, by Bruce. We're just waiting for the
regression test "expected" results to catch up.

Sorry to jump in so late here.

Better late than never <grin>

Just out of interest, what platform are you running on?
(S/Linux on an SUN IPX here)

Keith.

This is on a Red Hat 4.1 (Yes 4.1) system. Thing is, it's a production
system, and I haven't reason to upgrade.

Kernel is 2.0.29 gcc 2.7.2.1. Sparc 20.

I was toying with the idea of tossing Red Hat 5.1 on a sparc classic,
to see if glibc on S/Linux hoses anything.

Again, if someone wants to conditionally compile the directories to find
the offending file, I am sure we can get a fix for it.


At first look it seems to be: backend/catalog/indexing.c.
Maybe Keith can confirm?

Thanks,

Tom Szybist
szybist@boxhill.com

#7Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#5)
Re: [HACKERS] Core dump in regression tests.

I built and ran the regression tests from a clean cvs tree this morning
(1998-08-30 20:00UTC). I have removed the oidint2, oidint4, and oidname
tests since the types no longer exist, and have updated alter_table to
remove mention of these types. I've committed these changes to the cvs
tree.

There are some failures (itemized below). The only failure I saw before
the big OID patch-fest was select_views.

So, the current status on my system (i686, Linux 2.0.30, RedHat 4.2, -O2
optimization enabled) is that all tests pass except the following:

- constraints .. failed
Core dump.

- create_index .. failed
Fails on all create index statements after the first one with the
message:
QUERY: CREATE INDEX onek_unique2 ON onek USING btree(unique2 int4_ops);
ERROR: DefineIndex: onek relation not found
but, this statement executes just fine after the regression tests have
completed and I connect in from another process:
regression=> CREATE INDEX onek_unique2 ON onek USING btree(unique2 int4_ops);
CREATE
Is it a cache problem somewhere??

- sanity_check .. failed
NOTICE: Index pg_class_relname_index: NUMBER OF INDEX' TUPLES (144) IS NOT THE SAME AS HEAP' (135)
NOTICE: Index pg_class_oid_index: NUMBER OF INDEX' TUPLES (144) IS NOT THE SAME AS HEAP' (135)
These warnings weren't present before. Also sensitive to missing table
from constraints test failure.

- select_having .. failed
Core dump.

- select_views .. failed
Core dump. afaik this has been present for a month or two, and is a
failure on the last query in the test. EXPLAIN shows a valid result, so
the crash happens farther back.

- run_ruletest .. failed
Apparently not critical; the test depends on the name of the dba being
"pgsql" and my system has a dba named "postgres". The test should be
fixed for v6.4.

- Tom

#8Keith Parks
emkxp01@mtcc.demon.co.uk
In reply to: Thomas Lockhart (#7)
Re: [HACKERS] Core dump in regression tests.

"Thomas A. Szybist" <szybist@boxhill.com>

Bruce Momjian <maillist@candle.pha.pa.us>

Again, if someone wants to conditionally compile the directories to find
the offending file, I am sure we can get a fix for it.

At first look it seems to be: backend/catalog/indexing.c.
Maybe Keith can confirm?

Thanks,

Tom,

I recompiled the latest cvs with -O2 and found that the crash on
table creation was NOT now failing so I'm a little confused :-(

I'm just updating my cvs, and will do another build and see how
things go.

If only I had the same failures as before I'd be able to confirm
your suspicions on indexing.c

Til Later,
Keith.

#9Bruce Momjian
bruce@momjian.us
In reply to: Keith Parks (#8)
Re: [HACKERS] Core dump in regression tests.

"Thomas A. Szybist" <szybist@boxhill.com>

Bruce Momjian <maillist@candle.pha.pa.us>

Again, if someone wants to conditionally compile the directories to find
the offending file, I am sure we can get a fix for it.

At first look it seems to be: backend/catalog/indexing.c.
Maybe Keith can confirm?

Thanks,

Tom,

I recompiled the latest cvs with -O2 and found that the crash on
table creation was NOT now failing so I'm a little confused :-(

I'm just updating my cvs, and will do another build and see how
things go.

If only I had the same failures as before I'd be able to confirm
your suspicions on indexing.c

I have found a problem in indexing.c. In CatalogIndexFetchTuple(),
there is a particularly weird do..while loop, and in trying to clean it
up as part of the megapatch, I broke it and thought I had it fixed.

It appears it may still be broken. The ReleaseBuffer(buffer) call could
happen even if no valid tuple is returned because buffer has a random
value.

I am running a test now, and will post the fix as soon as I am sure it
works.
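
The hazard described above (releasing a buffer whose variable was never set) can be shown in isolation. The following is a minimal sketch with mock types, not the code from indexing.c: `Buffer`, `InvalidBuffer`, `mock_fetch`, `mock_release_buffer` and `lookup` are stand-ins defined here for illustration only. It assumes the fetch routine sets the buffer only on success, and shows one way to make the failure path safe: start from a known sentinel and release only when a tuple was actually returned.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins; these are NOT the real PostgreSQL definitions. */
typedef int Buffer;
#define InvalidBuffer 0

static int release_count = 0;   /* how many buffers the mock has released */

static void mock_release_buffer(Buffer buf)
{
    /* releasing a buffer that was never pinned is the bug being avoided */
    assert(buf != InvalidBuffer);
    release_count++;
}

/* Mock fetch: hands back a tuple and sets *buf only when found is nonzero. */
static const char *mock_fetch(int found, Buffer *buf)
{
    if (!found)
        return NULL;            /* *buf is deliberately left untouched */
    *buf = 7;
    return "tuple";
}

/* Returns 1 if a tuple was found, 0 otherwise. */
static int lookup(int found)
{
    Buffer      buf = InvalidBuffer;    /* known sentinel, not stack garbage */
    const char *tuple = mock_fetch(found, &buf);

    if (tuple == NULL)
        return 0;               /* nothing was pinned, so nothing to release */
    mock_release_buffer(buf);
    return 1;
}
```

Without the initialisation and the early return, the release on the not-found path would act on whatever happened to be on the stack, which is the kind of fault that can come and go with the optimization level.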

#10Bruce Momjian
bruce@momjian.us
In reply to: Keith Parks (#8)
Re: [HACKERS] Core dump in regression tests.

I recompiled the latest cvs with -O2 and found that the crash on
table creation was NOT now failing so I'm a little confused :-(

I'm just updating my cvs, and will do another build and see how
things go.

If only I had the same failures as before I'd be able to confirm
your suspicions on indexing.c

Til Later,
Keith.

OK, I am applying my patch now. It certainly fixes a potential problem,
so I suspect it will fix the problems you are seeing.

Thomas, perhaps it will fix the regression problems too. No way to
know.

Here is the new while loop. Much better.

---------------------------------------------------------------------------

sd = index_beginscan(idesc, false, num_keys, skey);
/* walk the index until an entry leads to a valid heap tuple */
while (indexRes = index_getnext(sd, ForwardScanDirection))
{
	ItemPointer iptr;

	iptr = &indexRes->heap_iptr;
	tuple = heap_fetch(heapRelation, SnapshotNow, iptr, &buffer);
	pfree(indexRes);	/* iptr has been used, so the index result can go */
	if (HeapTupleIsValid(tuple))
		break;
}
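
The shape of this loop (step through index entries, fetch the heap row each one points at, stop at the first valid one) can be tried outside the backend. The sketch below is a self-contained approximation with invented types: `IndexEntry`, `HeapRow` and `fetch_first_valid` are hypothetical and only mimic the control flow. One assumption beyond the posted loop is that the result starts out as NULL, so an empty scan has a defined outcome.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an index scan and a heap; not PostgreSQL types. */
typedef struct
{
    int heap_slot;              /* which heap row this index entry points at */
} IndexEntry;

typedef struct
{
    int valid;                  /* nonzero if the row is visible */
    int value;
} HeapRow;

/*
 * Return the first row that the index points at and that is still valid,
 * or NULL when the scan runs out of entries.
 */
static const HeapRow *fetch_first_valid(const IndexEntry *entries,
                                        size_t n_entries,
                                        const HeapRow *heap)
{
    const HeapRow *row = NULL;  /* defined result even for an empty scan */
    size_t      i;

    assert(entries != NULL || n_entries == 0);
    for (i = 0; i < n_entries; i++)
    {
        const HeapRow *candidate = &heap[entries[i].heap_slot];

        if (candidate->valid)
        {
            row = candidate;
            break;              /* first valid row wins, as in the loop above */
        }
    }
    return row;
}
```

A caller can then test the return value against NULL before touching the row or releasing anything associated with it.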

#11David Hartwig
daybee@bellatlantic.net
In reply to: Bruce Momjian (#5)
Re: [HACKERS] Core dump in regression tests.

Thomas G. Lockhart wrote:

select_having .. failed
Core dump.

My bad. It is caused by a known bug having to do with GROUP BY. It ain't a good bug, but it has
nothing to do with HAVING. For some reason the bug went away for a while, while I was building the test
script. It must have, because that is how I created the expected file. :(

A patch to the regression will be forthcoming.

#12Bruce Momjian
bruce@momjian.us
In reply to: Thomas Lockhart (#7)
Re: [HACKERS] Core dump in regression tests.

Thomas, I have just run the regression test here with my indexing.c
patch, and the only serious errors I see are the cases that have already
been mentioned like the having core dump, and the postgres/pgsql rule
difference.

Do you still see other differences? Let me know.

I am putting the two spaces after an elog message back in because I
don't think we want to change that format for no good reason.

I can see people expecting the string to look a certain way.

#13Thomas A. Szybist
szybist@boxhill.com
In reply to: Bruce Momjian (#10)
Re: [HACKERS] Core dump in regression tests.

In message <199808302324.TAA28018@candle.pha.pa.us>, Bruce Momjian writes:

I recompiled the latest cvs with -O2 and found that the crash on
table creation was NOT now failing so I'm a little confused :-(

I'm just updating my cvs, and will do another build and see how
things go.

If only I had the same failures as before I'd be able to confirm
your suspicions on indexing.c

Til Later,
Keith.

OK, I am applying my patch now. It certainly fixes a potential problem,
so I suspect it will fix the problems you are seeing.

Thomas, perhaps it will fix the regression problems too. No way to
know.

Here is the new while loop. Much better.

---------------------------------------------------------------------------

sd = index_beginscan(idesc, false, num_keys, skey);
/* walk the index until an entry leads to a valid heap tuple */
while (indexRes = index_getnext(sd, ForwardScanDirection))
{
	ItemPointer iptr;

	iptr = &indexRes->heap_iptr;
	tuple = heap_fetch(heapRelation, SnapshotNow, iptr, &buffer);
	pfree(indexRes);	/* iptr has been used, so the index result can go */
	if (HeapTupleIsValid(tuple))
		break;
}


I tried patching indexing.c with this new loop--no luck. I just
checked out a fresh copy, still no luck. I don't understand why it now
works for Keith.

Yesterday I tried this on Solaris, but I was bitten by not having
flock.

Tom Szybist
szybist@boxhill.com

#14Keith Parks
emkxp01@mtcc.demon.co.uk
In reply to: Thomas A. Szybist (#13)
Re: [HACKERS] Core dump in regression tests.

"Thomas A. Szybist" <szybist@boxhill.com>

Bruce Momjian

Again, if someone wants to conditionally compile the directories to find
the offending file, I am sure we can get a fix for it.

At first look it seems to be: backend/catalog/indexing.c.
Maybe Keith can confirm?

With the latest from cvs the core dump on "create table" is back
when compiled with -O2.

If I compile backend/catalog with -O2 then the table creation is
OK. So it looks like it may be indexing.c, even with Bruce's
recent fixes.

I'm still getting some regression test failures, the worst of which
is a core when creating a function.

Here's a backtrace from a "create function" immediately after an initdb
and using the template1 database.

Keith.

Program received signal SIGSEGV, Segmentation fault.
0x34264 in GetIndexValue (tuple=0x1ba810, hTupDesc=0x1ba86c, attOff=0,
attrNums=0x1bd544, fInfo=0x0,
attNull=0xefffcc97 "") at indexam.c:404
404 returnVal = heap_getattr(tuple, attrNums[attOff],
(gdb) bt
#0 0x34264 in GetIndexValue (tuple=0x1ba810, hTupDesc=0x1ba86c, attOff=0,
attrNums=0x1bd544, fInfo=0x0,
attNull=0xefffcc97 "") at indexam.c:404
#1 0x4974c in FormIndexDatum (numberOfAttributes=1, attributeNumber=0x1bd544,
heapTuple=0x1ba810,
heapDescriptor=0x1ba86c, datum=0xefffcd80, nullv=0xefffcd20 " \230",
fInfo=0x0) at index.c:1284
#2 0x4a4e8 in CatalogIndexInsert (idescs=0xefffcdf8, nIndices=3,
heapRelation=0x171b90, heapTuple=0x1ba810)
at indexing.c:154
#3 0x4fb2c in ProcedureCreate (procedureName=0x166bf0 "widget_in", returnsSet=0
'\000',
returnTypeName=0x166bb0 "widget", languageName=0xefffcf98 "C",
prosrc=0xeb800 "-",
probin=0x1aeb10 "/usr/local/pgsql/src/test/regress/input/../regress.so",
canCache=112 'p',
trusted=1 '\001', byte_pct=100, perbyte_cpu=0, percall_cpu=0,
outin_ratio=100, argList=0x166bd0,
dest=Remote) at pg_proc.c:275
#4 0x55674 in CreateFunction (stmt=0x165a50, dest=Remote) at define.c:329
#5 0xb8430 in ProcessUtility (parsetree=0x165a50, dest=Remote) at utility.c:392
#6 0xb60f8 in pg_exec_query_dest (
query_string=0xefffd1a0 "CREATE FUNCTION widget_in(opaque)\n RETURNS
widget\n AS '/usr/local/pgsql/src/test/regress/input/../regress.so'\n
LANGUAGE 'c';", dest=Remote, aclOverride=0 '\000') at postgres.c:749
#7 0xb5fec in pg_exec_query (
query_string=0xefffd1a0 "CREATE FUNCTION widget_in(opaque)\n RETURNS
widget\n AS '/usr/local/pgsql/src/test/regress/input/../regress.so'\n
LANGUAGE 'c';") at postgres.c:687
#8 0xb72ec in PostgresMain (argc=10, argv=0xeffff268, real_argc=10,
real_argv=0xeffffd84) at postgres.c:1609
#9 0x9f27c in DoBackend (port=0x107c00) at postmaster.c:1519
#10 0x9ecf4 in BackendStartup (port=0x167c00) at postmaster.c:1291
#11 0x9e16c in ServerLoop () at postmaster.c:750
#12 0x9dcc4 in PostmasterMain (argc=0, argv=0xeffffd84) at postmaster.c:556
#13 0x723b0 in main (argc=10, argv=0xeffffd84) at main.c:93

#15Thomas A. Szybist
szybist@boxhill.com
In reply to: Keith Parks (#14)
Re: [HACKERS] Core dump in regression tests.

In message <199808311703.SAA29362@mtcc.demon.co.uk>, Keith Parks writes:

"Thomas A. Szybist" <szybist@boxhill.com>

Bruce Momjian

Again, if someone wants to conditionally compile the directories to find
the offending file, I am sure we can get a fix for it.

At first look it seems to be: backend/catalog/indexing.c.
Maybe Keith can confirm?

With the latest from cvs the core dump on "create table" is back
when compiled with -O2.

If I compile backend/catalog with -O2 then the table creation is

^^^

OK. So it looks like it may be indexing.c, even with Bruce's
recent fixes.

Do you mean -O0 here?

I'm still getting some regression test failures, the worst of which
is a core when creating a function.

Here's a backtrace from a "create function" immediately after an initdb
and using the template1 database.

Keith.

Program received signal SIGSEGV, Segmentation fault.
0x34264 in GetIndexValue (tuple=0x1ba810, hTupDesc=0x1ba86c, attOff=0,
attrNums=0x1bd544, fInfo=0x0,
attNull=0xefffcc97 "") at indexam.c:404
404 returnVal = heap_getattr(tuple, attrNums[attOff],
(gdb) bt
#0 0x34264 in GetIndexValue (tuple=0x1ba810, hTupDesc=0x1ba86c, attOff=0,
attrNums=0x1bd544, fInfo=0x0,
attNull=0xefffcc97 "") at indexam.c:404
#1 0x4974c in FormIndexDatum (numberOfAttributes=1, attributeNumber=0x1bd544,
heapTuple=0x1ba810,
heapDescriptor=0x1ba86c, datum=0xefffcd80, nullv=0xefffcd20 " \230",
fInfo=0x0) at index.c:1284
#2 0x4a4e8 in CatalogIndexInsert (idescs=0xefffcdf8, nIndices=3,
heapRelation=0x171b90, heapTuple=0x1ba810)
at indexing.c:154
#3 0x4fb2c in ProcedureCreate (procedureName=0x166bf0 "widget_in", returnsSet=0
'\000',
returnTypeName=0x166bb0 "widget", languageName=0xefffcf98 "C",
prosrc=0xeb800 "-",
probin=0x1aeb10 "/usr/local/pgsql/src/test/regress/input/../regress.so",
canCache=112 'p',
trusted=1 '\001', byte_pct=100, perbyte_cpu=0, percall_cpu=0,
outin_ratio=100, argList=0x166bd0,
dest=Remote) at pg_proc.c:275
#4 0x55674 in CreateFunction (stmt=0x165a50, dest=Remote) at define.c:329
#5 0xb8430 in ProcessUtility (parsetree=0x165a50, dest=Remote) at utility.c:392
#6 0xb60f8 in pg_exec_query_dest (
query_string=0xefffd1a0 "CREATE FUNCTION widget_in(opaque)\n RETURNS
widget\n AS '/usr/local/pgsql/src/test/regress/input/../regress.so'\n
LANGUAGE 'c';", dest=Remote, aclOverride=0 '\000') at postgres.c:749
#7 0xb5fec in pg_exec_query (
query_string=0xefffd1a0 "CREATE FUNCTION widget_in(opaque)\n RETURNS
widget\n AS '/usr/local/pgsql/src/test/regress/input/../regress.so'\n
LANGUAGE 'c';") at postgres.c:687
#8 0xb72ec in PostgresMain (argc=10, argv=0xeffff268, real_argc=10,
real_argv=0xeffffd84) at postgres.c:1609
#9 0x9f27c in DoBackend (port=0x107c00) at postmaster.c:1519
#10 0x9ecf4 in BackendStartup (port=0x167c00) at postmaster.c:1291
#11 0x9e16c in ServerLoop () at postmaster.c:750
#12 0x9dcc4 in PostmasterMain (argc=0, argv=0xeffffd84) at postmaster.c:556
#13 0x723b0 in main (argc=10, argv=0xeffffd84) at main.c:93

I managed to get this running on a Solaris box. -O2 was not included
by default (wonder why :)). I got a core dump when running initdb
with -O2. I recompiled indexing.c without -O2, and it is much better.
(I basically get the same results as under Linux.) I get the same
core dumps that Keith is seeing with create function.

So, both my Sparc boxes are behaving the same.

Tom Szybist
szybist@boxhill.com

#16Bruce Momjian
bruce@momjian.us
In reply to: Keith Parks (#14)
Re: [HACKERS] Core dump in regression tests.

"Thomas A. Szybist" <szybist@boxhill.com>

Bruce Momjian

Again, if someone wants to conditionally compile the directories to find
the offending file, I am sure we can get a fix for it.

At first look it seems to be: backend/catalog/indexing.c.
Maybe Keith can confirm?

With the latest from cvs the core dump on "create table" is back
when compiled with -O2.

If I compile backend/catalog with -O2 then the table creation is
OK. So it looks like it may be indexing.c, even with Bruce's
recent fixes.

Do you mean -O0 here?

I'm still getting some regression test failures, the worst of which
is a core when creating a function.

Here's a backtrace from a "create function" immediately after an initdb
and using the template1 database.

I have looked over indexing.c and can still see nothing strange. I do
remember that ProcedureSrcIndexScan/PROSRC cache call never worked in
the old code, so this call is now working, but that is the only
functional difference I remember.

The old code in this section was somewhat mangled.

Can I telnet into this machine?

Keith.

Program received signal SIGSEGV, Segmentation fault.
0x34264 in GetIndexValue (tuple=0x1ba810, hTupDesc=0x1ba86c, attOff=0,
attrNums=0x1bd544, fInfo=0x0,
attNull=0xefffcc97 "") at indexam.c:404
404 returnVal = heap_getattr(tuple, attrNums[attOff],
(gdb) bt
#0 0x34264 in GetIndexValue (tuple=0x1ba810, hTupDesc=0x1ba86c, attOff=0,
attrNums=0x1bd544, fInfo=0x0,

My guess is that hTupDesc is badly formed, not having the proper
attributes for the relation. Can you do a print of *hTupDesc, and
attrNums[0]?

attNull=0xefffcc97 "") at indexam.c:404
#1 0x4974c in FormIndexDatum (numberOfAttributes=1, attributeNumber=0x1bd544,
heapTuple=0x1ba810,
heapDescriptor=0x1ba86c, datum=0xefffcd80, nullv=0xefffcd20 " \230",
fInfo=0x0) at index.c:1284
#2 0x4a4e8 in CatalogIndexInsert (idescs=0xefffcdf8, nIndices=3,
heapRelation=0x171b90, heapTuple=0x1ba810)
at indexing.c:154
#3 0x4fb2c in ProcedureCreate (procedureName=0x166bf0 "widget_in",
returnsSet=0 '\000',
returnTypeName=0x166bb0 "widget", languageName=0xefffcf98 "C",
prosrc=0xeb800 "-",
probin=0x1aeb10 "/usr/local/pgsql/src/test/regress/input/../regress.so",
canCache=112 'p',
trusted=1 '\001', byte_pct=100, perbyte_cpu=0, percall_cpu=0,
outin_ratio=100, argList=0x166bd0,
dest=Remote) at pg_proc.c:275
#4 0x55674 in CreateFunction (stmt=0x165a50, dest=Remote) at define.c:329
#5 0xb8430 in ProcessUtility (parsetree=0x165a50, dest=Remote) at utility.c:392
#6 0xb60f8 in pg_exec_query_dest (
query_string=0xefffd1a0 "CREATE FUNCTION widget_in(opaque)\n RETURNS
widget\n AS '/usr/local/pgsql/src/test/regress/input/../regress.so'\n
LANGUAGE 'c';", dest=Remote, aclOverride=0 '\000') at postgres.c:749
#7 0xb5fec in pg_exec_query (
query_string=0xefffd1a0 "CREATE FUNCTION widget_in(opaque)\n RETURNS
widget\n AS '/usr/local/pgsql/src/test/regress/input/../regress.so'\n
LANGUAGE 'c';") at postgres.c:687
#8 0xb72ec in PostgresMain (argc=10, argv=0xeffff268, real_argc=10,
real_argv=0xeffffd84) at postgres.c:1609
#9 0x9f27c in DoBackend (port=0x107c00) at postmaster.c:1519
#10 0x9ecf4 in BackendStartup (port=0x167c00) at postmaster.c:1291
#11 0x9e16c in ServerLoop () at postmaster.c:750
#12 0x9dcc4 in PostmasterMain (argc=0, argv=0xeffffd84) at postmaster.c:556
#13 0x723b0 in main (argc=10, argv=0xeffffd84) at main.c:93

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#17Keith Parks
emkxp01@mtcc.demon.co.uk
In reply to: Bruce Momjian (#16)
Re: [HACKERS] Core dump in regression tests.

Thomas A. Szybist <szybist@boxhill.com>

If I compile backend/catalog with -O2 then the table creation is

^^^

OK. So it looks like it may be indexing.c, even with Bruce's
recent fixes.

Do you mean -O0 here?

Yes, a typo, I used -O0 for this dir.

I managed to get this running on a Solaris box. -O2 was not included
by default (wonder why :)). I got a core dump when running initdb
with -O2. I recompiled indexing.c without -O2, and it is much better.
(I basically get the same results as under Linux.) I get the same
core dumps that Keith is seeing with create function.

So, both my Sparc boxes are behaving the same.

I've not got round to trying a build on my Solaris 2.6 box yet. I was
hoping that someone with something faster than a SPARC 2 would do
the biz and get the same results.

So we have at least two problems: some code that is tickling a gcc
optimiser bug (gcc 2.7.2.1 in my case), and an alignment bug in our
code that affects the SPARC architecture.

I've half a mind to see if there is a later version of gcc that
does the optimisation correctly. (rpm format for Redhat 4.2)

The "create function" problem is a little harder for me to see
a way forward. ( my debugging skills are very few.)

Keith.

#18Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Keith Parks (#17)
Re: [HACKERS] Core dump in regression tests.

The "create function" problem is a little harder for me to see
a way forward. ( my debugging skills are very few.)

Hmm. Bruce's most recent patches didn't fix my problems on Linux/i686
reported earlier. So I figured I'd try a full build with -O0 just to see
if it helped. Not only did it not help, but I got several other
regression tests failing, some with core dumps which did not crash with
-O2. Weird.

- Tom

#19Bruce Momjian
bruce@momjian.us
In reply to: Thomas Lockhart (#18)
Re: [HACKERS] Core dump in regression tests.

Bruce Momjian <maillist@candle.pha.pa.us>

The old code in this section was somewhat mangled.

Can I telnet into this machine?

I'm afraid I'm dialup only but would be willing to liaise with
you and be online for you to telnet in, if no alternative
permanently connected system can be found.

Program received signal SIGSEGV, Segmentation fault.
0x34264 in GetIndexValue (tuple=0x1ba810, hTupDesc=0x1ba86c, attOff=0,
attrNums=0x1bd544, fInfo=0x0,
attNull=0xefffcc97 "") at indexam.c:404
404 returnVal = heap_getattr(tuple, attrNums[attOff],
(gdb) bt
#0 0x34264 in GetIndexValue (tuple=0x1ba810, hTupDesc=0x1ba86c, attOff=0,
attrNums=0x1bd544, fInfo=0x0,

My guess is that hTupDesc is badly formed, not having the proper
attributes for the relation. Can you do a print of *hTupDesc, and
attrNums[0]?

Breakpoint 1, GetIndexValue (tuple=0x1bc210, hTupDesc=0x1bc26c, attOff=0,
attrNums=0x1bf944, fInfo=0x0,
attNull=0xefffcc97 "") at indexam.c:383
383 if (PointerIsValid(fInfo) && FIgetProcOid(fInfo) != InvalidOid)
(gdb) step
404 returnVal = heap_getattr(tuple, attrNums[attOff],
(gdb) list
399 pfree(attData);
400 *attNull = FALSE;
401 }
402 else
403 {
404 returnVal = heap_getattr(tuple, attrNums[attOff],
405 hTupDesc, attNull);
406 }
407 return returnVal;
408 }
(gdb) print tuple
$1 = (HeapTupleData *) 0x1bc210
(gdb) print *tuple
$2 = {t_len = 205, t_oid = 18241, t_cmin = 0, t_cmax = 0, t_xmin = 22018,
t_xmax = 0, t_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 18}, ip_posid = 31},
t_natts = 16, t_infomask = 2050, t_hoff = 40 '(',
t_bits = "\000\000\000"}
(gdb) print attrNums[attOff]
$3 = 15

This is a very interesting number. It is saying that it is the prosrc
field index that it is updating. This never used to work because of a
bug in the code for this index. Let me look at this some more.

(gdb) print *hTupDesc
$5 = {natts = 0, attrs = 0x0, constr = 0x0}
(gdb) print fInfo
$6 = (FuncIndexInfo *) 0x0

Hope this helps,
Keith.

#20Bruce Momjian
bruce@momjian.us
In reply to: Keith Parks (#17)
Re: [HACKERS] Core dump in regression tests.

Thomas A. Szybist <szybist@boxhill.com>

If I compile backend/catalog with -O2 then the table creation is

^^^

OK. So it looks like it may be indexing.c, even with Bruce's
recent fixes.

Do you mean -O0 here?

Yes, a typo, I used -O0 for this dir.

Can you try:

select * from pg_index;

Crashes here. Not good.

#21Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#20)
#22Bruce Momjian
bruce@momjian.us
In reply to: Keith Parks (#17)
#23Keith Parks
emkxp01@mtcc.demon.co.uk
In reply to: Bruce Momjian (#22)
#24Keith Parks
emkxp01@mtcc.demon.co.uk
In reply to: Keith Parks (#23)
#25Thomas A. Szybist
szybist@boxhill.com
In reply to: Bruce Momjian (#22)
#26Bruce Momjian
bruce@momjian.us
In reply to: Keith Parks (#17)
#27Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#26)
#28David Hartwig
daveh@insightdist.com
In reply to: Bruce Momjian (#26)
#29Bruce Momjian
bruce@momjian.us
In reply to: David Hartwig (#28)
#30Bruce Momjian
bruce@momjian.us
In reply to: Thomas Lockhart (#27)
#31Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#30)
#32Bruce Momjian
bruce@momjian.us
In reply to: Thomas Lockhart (#27)
#33Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#32)
#34Andreas Zeugswetter
andreas.zeugswetter@telecom.at
In reply to: Thomas Lockhart (#33)
#35Bruce Momjian
bruce@momjian.us
In reply to: Thomas Lockhart (#33)
#36Bruce Momjian
bruce@momjian.us
In reply to: Thomas Lockhart (#27)
#37Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#36)
#38Bruce Momjian
bruce@momjian.us
In reply to: Thomas Lockhart (#37)
#39David Hartwig
daveh@insightdist.com
In reply to: Bruce Momjian (#32)
#40David Hartwig
daveh@insightdist.com
In reply to: Andreas Zeugswetter (#34)
#41Thomas A. Szybist
szybist@boxhill.com
In reply to: Bruce Momjian (#38)
#42Bruce Momjian
bruce@momjian.us
In reply to: David Hartwig (#39)
#43David Hartwig
daybee@bellatlantic.net
In reply to: Bruce Momjian (#42)
#44Keith Parks
emkxp01@mtcc.demon.co.uk
In reply to: David Hartwig (#43)
#45Bruce Momjian
bruce@momjian.us
In reply to: Keith Parks (#44)
#46Bruce Momjian
bruce@momjian.us
In reply to: Keith Parks (#44)
#47Bruce Momjian
bruce@momjian.us
In reply to: David Hartwig (#43)
#48Bruce Momjian
bruce@momjian.us
In reply to: Andreas Zeugswetter (#34)
#49Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#45)
#50Bruce Momjian
bruce@momjian.us
In reply to: Thomas Lockhart (#49)