7.1 on DEC/Alpha

Started by Brent Verner, about 25 years ago, 33 messages
#1 Brent Verner
brent@rcfile.org

Hi,
I saw the thread from a few days ago about Linux/Alpha and 7.1. I
believe I'm seeing the same problems with DEC/Alpha (Tru64Unix 4.0D).

I noticed the following in the postmaster.log, which occurs, as the
Linux/Alpha bug report states, during the misc regression test.

DEBUG: copy: line 293, XLogWrite: had to create new log file - you probably should do checkpoints more often
Server process (pid 24954) exited with status 139 at Fri Dec 22 17:15:48 2000
Terminating any active server processes...
Server processes were terminated at Fri Dec 22 17:15:48 2000
Reinitializing shared memory and semaphores
DEBUG: starting up
DEBUG: database system was interrupted at 2000-12-22 17:15:47
DEBUG: CheckPoint record at (0, 316624)
DEBUG: Redo record at (0, 316624); Undo record at (0, 0); Shutdown TRUE

the full src/test/regress/log/postmaster.log can be snagged from
http://www.rcfile.org/postmaster.log

in addition to this, compiling on DEC/Alpha with gcc does not work,
without some shameful hackery :) as __INTERLOCKED_TESTBITSS_QUAD() is
a builtin that gcc does not know about. The DEC cc builds pg properly.
either way pg is built the test results are much the same, esp the
FAILURE of misc regression test.

If there is anything else I can do to help get this working, please
let me know.

Brent Verner

#2 Brent Verner
brent@rcfile.org
In reply to: Brent Verner (#1)
Re: 7.1 on DEC/Alpha

On 22 Dec 2000 at 20:27 (-0500), Brent Verner wrote:

observation:

commenting out the queries with 'FROM person* p' causes the misc
regression test to pass.

SELECT p.name, p.hobbies.name FROM person* p;

Brent

#3 Brent Verner
brent@rcfile.org
In reply to: Brent Verner (#2)
Re: 7.1 on DEC/Alpha

On 22 Dec 2000 at 21:58 (-0500), Brent Verner wrote:
| On 22 Dec 2000 at 20:27 (-0500), Brent Verner wrote:
|
| observation:
|
| commenting out the queries with 'FROM person* p' causes the misc
| regression test to pass.

that's not what I meant to say. the misc test still FAILS, but it
no longer causes pg to die.

b

#4 Brent Verner
brent@rcfile.org
In reply to: Brent Verner (#3)
Re: 7.1 on DEC/Alpha

here's a post-mortem.

#0 0x1200ce58c in ExecEvalFieldSelect (fselect=0x1401615c0,
econtext=0x14016a030, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1096
#1 0x1200ceafc in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1234
#2 0x1200cdd74 in ExecEvalFuncArgs (fcache=0x14016aa70, argList=0x14016a030,
econtext=0x14016a030) at execQual.c:603
#3 0x1200cde54 in ExecMakeFunctionResult (fcache=0x14016aa70,
arguments=0x1401616d0, econtext=0x14016a030, isNull=0x11fffdf88 "",
isDone=0x0) at execQual.c:654
#4 0x1200ce224 in ExecEvalOper (opClause=0x1401615f0, econtext=0x14016a030,
isNull=0x11fffdf88 "", isDone=0x0) at execQual.c:841
#5 0x1200cea24 in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1204
#6 0x1200cec54 in ExecQual (qual=0x14016a1a0, econtext=0x14016a030)
at execQual.c:1356
#7 0x1200cf2a8 in ExecScan (node=0x14016a1d0, accessMtd=0x1200d8320 <SeqNext>)
at execScan.c:129
#8 0x1200d846c in ExecSeqScan (node=0x1401615f0) at nodeSeqscan.c:138
#9 0x1200cc280 in ExecProcNode (node=0x14016a1d0, parent=0x14016a1d0)
at execProcnode.c:284
#10 0x1200ca8c0 in ExecutePlan (estate=0x14016a310, plan=0x14016a1d0,
numberTuples=1, direction=ForwardScanDirection, destfunc=0x140020c20)
at execMain.c:959
#11 0x1200c9b50 in ExecutorRun (queryDesc=0x1401615f0, estate=0x14016a310,
count=0) at execMain.c:199
#12 0x1200d1140 in postquel_getnext (es=0x140160630) at functions.c:324
#13 0x1200d1300 in postquel_execute (es=0x140160630, fcinfo=0x1401604a0,
fcache=0x140160590) at functions.c:417
#14 0x1200d14d8 in fmgr_sql (fcinfo=0x1401604a0) at functions.c:542
#15 0x1200ce09c in ExecMakeFunctionResult (fcache=0x140160480,
arguments=0x14015e810, econtext=0x140119cd0, isNull=0x140160350 "",
isDone=0x11fffe258) at execQual.c:712
#16 0x1200ce2c4 in ExecEvalFunc (funcClause=0x1401615f0, econtext=0x140119cd0,
isNull=0x140160350 "", isDone=0x11fffe258) at execQual.c:883
#17 0x1200cea3c in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1208
#18 0x1200c8e10 in ExecEvalIter (iterNode=0x1401615f0, econtext=0x14016a030,
isNull=0x1 <Error reading address 0x1: Invalid argument>, isDone=0x0)
at execFlatten.c:56
#19 0x1200ce9b0 in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1183
#20 0x1200cdd74 in ExecEvalFuncArgs (fcache=0x140160290, argList=0x14016a030,
econtext=0x140119cd0) at execQual.c:603
#21 0x1200cde54 in ExecMakeFunctionResult (fcache=0x140160290,
arguments=0x14015e840, econtext=0x140119cd0, isNull=0x11fffe3a0 "",
isDone=0x11fffe468) at execQual.c:654
#22 0x1200ce2c4 in ExecEvalFunc (funcClause=0x1401615f0, econtext=0x140119cd0,
isNull=0x11fffe3a0 "", isDone=0x11fffe468) at execQual.c:883
#23 0x1200cea3c in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1208
#24 0x1200ce574 in ExecEvalFieldSelect (fselect=0x14015e720,
econtext=0x14016a030, isNull=0x11fffe3a0 "", isDone=0x0) at execQual.c:1091
#25 0x1200ceafc in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1234
#26 0x1200c8e10 in ExecEvalIter (iterNode=0x1401615f0, econtext=0x14016a030,
isNull=0x1 <Error reading address 0x1: Invalid argument>, isDone=0x0)
at execFlatten.c:56
#27 0x1200ce9b0 in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1183
#28 0x1200ceea4 in ExecTargetList (targetlist=0x14015e870,
targettype=0x140160000, values=0x140160260, econtext=0x140119cd0,
isDone=0x11fffe5a8) at execQual.c:1528
#29 0x1200cf1a8 in ExecProject (projInfo=0x0, isDone=0x1) at execQual.c:1751
#30 0x1200d8074 in ExecResult (node=0x14015e5b0) at nodeResult.c:167
#31 0x1200cc238 in ExecProcNode (node=0x14015e5b0, parent=0x14015e5b0)
at execProcnode.c:272
#32 0x1200ca8c0 in ExecutePlan (estate=0x14015eab0, plan=0x14015e5b0,
numberTuples=0, direction=ForwardScanDirection, destfunc=0x1401603a0)
at execMain.c:959
#33 0x1200c9b50 in ExecutorRun (queryDesc=0x1401615f0, estate=0x14015eab0,
count=0) at execMain.c:199
#34 0x12013e5c0 in ProcessQuery (parsetree=0x14015ea80, plan=0x140160000)
at pquery.c:305
#35 0x12013c568 in pg_exec_query_string (
query_string=0x140115310 "SELECT p.hobbies.equipment.name, p.hobbies.name, p.name FROM person* p;", parse_context=0x1400c5c60) at postgres.c:817
#36 0x12013dd10 in PostgresMain (argv=0x11fffe9a8, real_argv=0x11ffffae8,
username=0x1400b72f9 "pgadmin") at postgres.c:1827
#37 0x12011aef0 in DoBackend (port=0x1400b7080) at postmaster.c:2021
#38 0x12011a888 in BackendStartup (port=0x1400b7080) at postmaster.c:1798
#39 0x12011938c in ServerLoop () at postmaster.c:957
#40 0x120118c10 in PostmasterMain (argv=0x11ffffae8) at postmaster.c:664
#41 0x1200e5980 in main (argv=0x11ffffae8) at main.c:138

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#4)
Re: Re: 7.1 on DEC/Alpha

Brent Verner <brent@rcfile.org> writes:

> here's a post-mortem.

> #0 0x1200ce58c in ExecEvalFieldSelect (fselect=0x1401615c0,
> econtext=0x14016a030, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1096

Looks reasonable as far as it goes. Evidently the crash is in the
heap_getattr macro call at line 1096 of src/backend/executor/execQual.c.
We need to look at the data structures that macro uses.
What do you get from

p *fselect

p *econtext

p *resSlot->val

p *resSlot->ttc_tupleDescriptor

BTW, if you didn't configure with --enable-cassert, it'd be a good idea
to go back and try it that way...

regards, tom lane

#6 Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#5)
Re: Re: 7.1 on DEC/Alpha

On 24 Dec 2000 at 01:00 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > here's a post-mortem.
|
| > #0 0x1200ce58c in ExecEvalFieldSelect (fselect=0x1401615c0,
| > econtext=0x14016a030, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1096
|
| Looks reasonable as far as it goes. Evidently the crash is in the
| heap_getattr macro call at line 1096 of src/backend/executor/execQual.c.
| We need to look at the data structures that macro uses.
| What do you get from
|
| p *fselect

$1 = {type = T_FieldSelect, arg = 0x140169d40, fieldnum = 1, resulttype = 25,
resulttypmod = -1}

| p *econtext

$2 = {type = T_ExprContext, ecxt_scantuple = 0x14016a568,
ecxt_innertuple = 0x0, ecxt_outertuple = 0x0,
ecxt_per_query_memory = 0x1400c5df0, ecxt_per_tuple_memory = 0x1400c6670,
ecxt_param_exec_vals = 0x0, ecxt_param_list_info = 0x140141760,
ecxt_aggvalues = 0x0, ecxt_aggnulls = 0x0}

| p *resSlot->val

Error accessing memory address 0x40141838: Invalid argument.

| p *resSlot->ttc_tupleDescriptor

Error accessing memory address 0x40141848: Invalid argument.

additionally:

(gdb) p result
$4 = 1075058736

(gdb) p *resSlot
Error accessing memory address 0x40141830: Invalid argument.

| BTW, if you didn't configure with --enable-cassert, it'd be a good idea
| to go back and try it that way...

will reconfig/rebuild shortly.

brent

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#6)
Re: 7.1 on DEC/Alpha

Brent Verner <brent@rcfile.org> writes:

> (gdb) p *resSlot
> Error accessing memory address 0x40141830: Invalid argument.

Oooh. resSlot has been truncated to 32 bits --- judging by the other
nearby pointer values, it almost certainly should have been 0x140141830.
Now we have a lead.

I am guessing that the truncation happened somewhere in
executor/functions.c, but don't see it right away...

regards, tom lane
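
The truncation diagnosed above can be reproduced in isolation. What follows is a minimal sketch, not code from the PostgreSQL tree: the helper name truncate_to_32 is made up for illustration, and it only mimics a 64-bit pointer value passing through a 32-bit field and being widened again.

```c
#include <stdint.h>

/*
 * Illustration only: squeeze a 64-bit pointer-sized value through a
 * 32-bit field and widen it again. truncate_to_32 is a hypothetical
 * name and does not appear in PostgreSQL.
 */
static uint64_t
truncate_to_32(uint64_t pointer_bits)
{
    uint32_t narrow = (uint32_t) pointer_bits;  /* upper 32 bits are dropped */

    return (uint64_t) narrow;                   /* zero-extended on the way back */
}
```

Feeding it 0x140141830 gives 0x40141830, which matches the address gdb could not read.
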

#8 Brent Verner
brent@rcfile.org
In reply to: Brent Verner (#1)
1 attachment(s)
Re: Re: 7.1 on DEC/Alpha

On 24 Dec 2000 at 00:47 (-0500), Tom Lane wrote:
|
| > I'll send the patch that allows me to
| > cleanly build with gcc. right now, s_lock.h does the wrong thing
| > when compiling on Alpha/OSF with gcc.
|
| Roger, we want to build with either.

The attached patch _seems_ to do the right thing. could someone
who knows Alpha assembly check it out (please).

for more info on Alpha assembly, this link may help.
http://tru64unix.compaq.com/faqs/publications/base_doc/DOCUMENTATION/V40D_HTML/APS31DTE/TITLE.HTM

brent 'who learned too much today'

Attachments:

gcc.s_lock.h (text/x-chdr; charset=us-ascii)
Index: src/include/storage/s_lock.h
===================================================================
RCS file: /home/projects/pgsql/cvsroot/pgsql/src/include/storage/s_lock.h,v
retrieving revision 1.75
diff -u -r1.75 s_lock.h
--- src/include/storage/s_lock.h	2000/12/03 14:41:42	1.75
+++ src/include/storage/s_lock.h	2000/12/24 20:04:59
@@ -79,6 +79,48 @@
  * All the gcc inlines
  */
 
+#if defined(__alpha)
+#define __HAVE_ALPHA_TAS__  
+/* avoid the __alpha && __osf__ stuff below */
+
+/*
+  ow... that hurt. perl hackers should not muck with assembly. I believe this
+  is correct. for more info on Alpha assembly, or to prove that this monkey 
+  hacked some Scr3WeDuP assembly, er, I mean clean up the mess below, see
+    http://tru64unix.compaq.com/faqs/publications/base_doc/DOCUMENTATION/V40D_HTML/APS31DTE/TITLE.HTM
+      (or for an unauthorized copy/mirror)
+    http://rcfile.org/alpha/asm/
+*/
+
+static __inline__ int
+tas(volatile slock_t *lock)
+{
+   slock_t  _res;
+   slock_t  temp;
+
+   __asm__
+   __volatile__
+   (
+   "1: ldq_l %0, %1       \n"
+   "   and   %0, 1, %2    \n"
+   "   bne   %2, 4f       \n"
+   "   xor   %0, 1, %0    \n"
+   "   stq_c %0, %1       \n"
+   "   beq   %0, 3f       \n"
+   "   mb                 \n"
+   "   br    4f           \n"
+   "3: br    1b           \n"
+   "4: cmpeq %2, 0, %2    \n"
+   "   xor   %2, 1, %2    \n"
+     : "=&r" (temp),
+       "=m"  (*lock),
+       "=&r" (_res)
+     : "m"   (*lock)
+   );
+   return (int) _res;
+}
+
+#endif /* __alpha */
 
 #if defined(__i386__)
 #define TAS(lock) tas(lock)
@@ -283,6 +325,7 @@
  * These are the platforms that have common code for gcc and non-gcc
  */
 
+#if !defined(__HAVE_ALPHA_TAS__)
 
 #if defined(__alpha)
 
@@ -333,6 +376,7 @@
 
 #endif /* __alpha */
 
+#endif /* __HAVE_ALPHA_TAS__ */
 
 #if defined(__hpux)
 /*
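
For readers who do not read Alpha assembly, the contract that the tas() routine in the patch is meant to satisfy (return 0 when the lock was free and has now been taken, non-zero when it was already held) can be sketched with C11 atomics. This is a swapped-in technique for illustration, not what the patch or s_lock.h uses; the names sketch_lock_t, sketch_tas and sketch_unlock are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical lock type for this sketch; PostgreSQL uses slock_t. */
typedef atomic_flag sketch_lock_t;

/*
 * Test-and-set: atomically mark the lock as held and report its prior
 * state. Returns 0 if the caller acquired the lock, 1 if it was already
 * held. The default sequentially consistent ordering plays the role of
 * the memory barrier in the hand-written version.
 */
static int
sketch_tas(sketch_lock_t *lock)
{
    return atomic_flag_test_and_set(lock) ? 1 : 0;
}

/* Release the lock so that the next sketch_tas() call succeeds. */
static void
sketch_unlock(sketch_lock_t *lock)
{
    atomic_flag_clear(lock);
}
```

A lock declared as `sketch_lock_t l = ATOMIC_FLAG_INIT;` yields 0 on the first sketch_tas(&l) and 1 on every further call until sketch_unlock(&l) is called.
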
#9 Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#7)
Re: 7.1 on DEC/Alpha

On 24 Dec 2000 at 01:19 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > (gdb) p *resSlot
| > Error accessing memory address 0x40141830: Invalid argument.
|
| Oooh. resSlot has been truncated to 32 bits --- judging by the other
| nearby pointer values, it almost certainly should have been 0x140141830.
| Now we have a lead.

FWIW, saying 'set econtext->ecxt_param_list_info->value 0x14014183' in
gdb allows the process to not SEGV where it _was_ destined to do so,
though it does SEGV in a later return to the function. I've tried to
determine where this value is originating, and where it is subsequently
modified, but have not been able to do so. lost in gdb.

Q: I tried doing 'watch <address>', but this (appeared) to just hang.
is there some trick to using 'watch' on addresses that I might be
overlooking?

| I am guessing that the truncation happened somewhere in
| executor/functions.c, but don't see it right away...

more observations WRT sql that blows up postgres on Alpha.

works:
SELECT p.hobbies.equipment.name, p.hobbies.name, p.name
FROM ONLY person p;

breaks:
SELECT p.hobbies.equipment.name, p.hobbies.name, p.name
FROM person p;
SELECT p.hobbies.equipment.name, p.hobbies.name, p.name
FROM person* p;

whatever it is that ONLY causes, avoids the breakage. I've spent the
past two days in a gdb-hole, going in circles. I just don't think I know
enough (about gdb or postgres) to make any further progress. anyway,
if someone could tell me what difference the ONLY keyword makes WRT
pg internally, it might help me quit running in circles.

thanks.
brent

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#9)
Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

Brent Verner <brent@rcfile.org> writes:

> more observations WRT sql that blows up postgres on Alpha.
> works:
> SELECT p.hobbies.equipment.name, p.hobbies.name, p.name
> FROM ONLY person p;
> breaks:
> SELECT p.hobbies.equipment.name, p.hobbies.name, p.name
> FROM person p;
> SELECT p.hobbies.equipment.name, p.hobbies.name, p.name
> FROM person* p;

OK, I see the problem. The breakage actually is present in 7.0.* and
prior versions as well, it just doesn't happen to be exposed by the
regress tests --- until now.

The trouble is the way that entire-tuple function arguments are handled.
Tuple types are declared in pg_type as being the same size as Oid, ie,
4 bytes. This reflects situations where a tuple value is represented by
an Oid reference to a row in a table. (I am not sure whether there is
any code left that depends on that ... in any case I'm nervous about
changing it during beta.) But the expression evaluator's implementation
of a tuple argument is that the Datum value contains a pointer to a
TupleTableSlot. This works fine as long as the Datum is just passed
around as a Datum, but if anyone tries to form a tuple containing that
Datum, only 4 bytes get stored into the tuple. Result: failure on
machines where pointers are wider than 4 bytes.

The reason this shows up in this particular regression test now, and
not before, is that 7.1 does the function evaluations at the top of
the Append plan that implements inheritance union, whereas 7.0 did it
at the bottom. That means that in 7.1, the TupleTableSlot Datum gets
inserted into a tuple that becomes part of the Append output before
it gets to the function execution. 7.0 would still show the bug
under the right circumstances --- a join would do it, for example.

I think that there may still be cases where an Oid is the correct
representation of a tuple type; anyway I'm afraid to foreclose that
possibility. What I'm thinking about doing is setting typmod of
an entire-tuple function argument to sizeof(Pointer), rather than
the default -1, to indicate that a pointer representation is being
used. Comments, hackers?

regards, tom lane
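
The typmod convention proposed above can be pictured as a small decision function. This is only a sketch of the idea as described in the message, under the assumption that the stored width is chosen from the declared type length and the typmod; the names tuple_arg_storage_len and pg_type_len are hypothetical and are not PostgreSQL identifiers.

```c
#include <stddef.h>

/*
 * Hypothetical helper: decide how many bytes a tuple-typed value
 * occupies when it is stored into a tuple.
 *
 *   pg_type_len  the length declared for the type (4 for tuple types,
 *                the same size as Oid, per the message above)
 *   typmod       -1 by default; sizeof(void *) flags the pointer
 *                representation under the proposed convention
 */
static int
tuple_arg_storage_len(int pg_type_len, int typmod)
{
    if (typmod == (int) sizeof(void *))
        return (int) sizeof(void *);    /* keep the whole pointer */

    return pg_type_len;                 /* Oid-style reference */
}
```

With the default typmod of -1 the function returns 4, so a pointer on a 64-bit machine loses its upper half; with typmod set to sizeof(void *) it returns the full pointer width.
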

#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#10)
Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

I wrote:

> ... What I'm thinking about doing is setting typmod of
> an entire-tuple function argument to sizeof(Pointer), rather than
> the default -1, to indicate that a pointer representation is being
> used. Comments, hackers?

Here is a patch to current sources along this line. I have not
committed it, since I'm not sure it does the job. It doesn't break
the regress tests on my machine, but does it fix them on Alphas?
Please apply it locally and let me know what you find.

regards, tom lane

#12 Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#11)
Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

On 26 Dec 2000 at 14:41 (-0500), Tom Lane wrote:
| I wrote:
| > ... What I'm thinking about doing is setting typmod of
| > an entire-tuple function argument to sizeof(Pointer), rather than
| > the default -1, to indicate that a pointer representation is being
| > used. Comments, hackers?
|
| Here is a patch to current sources along this line. I have not
| committed it, since I'm not sure it does the job. It doesn't break
| the regress tests on my machine, but does it fix them on Alphas?
| Please apply it locally and let me know what you find.

results _look_ the same from 'make check'. I'm gonna get back into
the debugger on this (I've learned a few tricks that I didn't know
when last I gdb'd on the Alpha).

brent

#13 Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#11)
Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

On 26 Dec 2000 at 14:41 (-0500), Tom Lane wrote:
| I wrote:
| > ... What I'm thinking about doing is setting typmod of
| > an entire-tuple function argument to sizeof(Pointer), rather than
| > the default -1, to indicate that a pointer representation is being
| > used. Comments, hackers?
|
| Here is a patch to current sources along this line. I have not
| committed it, since I'm not sure it does the job. It doesn't break
| the regress tests on my machine, but does it fix them on Alphas?
| Please apply it locally and let me know what you find.

what I'm seeing now is much the same. FWIW, it looks like we're picking
up the cruft around

functions.c:354 paramLI->value = fcinfo->arg[paramLI->id - 1];

(both of which are type Datum)

i've been in circles trying to figure out where fcinfo->arg is filled.
can you point me toward that?

thanks for your help.
brent

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#13)
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

Brent Verner <brent@rcfile.org> writes:

> | Please apply it locally and let me know what you find.

> what I'm seeing now is much the same.

Drat. More to do, then.

> i've been in circles trying to figure out where fcinfo->arg is filled.
> can you point me toward that?

See src/backend/utils/fmgr/README and src/backend/utils/fmgr/fmgr.c.
But fmgr is probably only the carrier of disease, not the source...

regards, tom lane

#15 Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#14)
Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

On 26 Dec 2000 at 23:41 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > | Please apply it locally and let me know what you find.
|
| > what I'm seeing now is much the same.

sorry, I sent the previous email w/o the details of the different
behavior. Inside ExecEvalFieldSelect(), result is now 303, instead
of 110599844 (...or whatever it was). I'm not sure if this gives
you any additional clues.

thanks.
brent

#16 Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#14)
Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

On 26 Dec 2000 at 23:41 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > | Please apply it locally and let me know what you find.
|
| > what I'm seeing now is much the same.
|
| Drat. More to do, then.
|
| > i've been in circles trying to figure out where fcinfo->arg is filled.
| > can you point me toward that?
|
| See src/backend/utils/fmgr/README and src/backend/utils/fmgr/fmgr.c.
| But fmgr is probably only the carrier of disease, not the source...

ok, I've tracked this further (in the right direction I hope:).

these are the steps leading up to the assignment of the fscked
fcache->fcinfo.arg[i] at execQual.c:603, which is what will eventually
blow up ExecEvalFieldSelect.

Breakpoint 4, ExecMakeFunctionResult (fcache=0x14014e700,
arguments=0x14014c850, econtext=0x140127ae0, isNull=0x14014e390 "",
isDone=0x11fffde78) at execQual.c:652
652 if (fcache->fcinfo.nargs > 0 && !fcache->argsValid)
(gdb) print fcache->fcinfo
$56 = {flinfo = 0x14014e700, context = 0x0, resultinfo = 0x14014e7d0,
isnull = 0 '\000', nargs = 1, arg = {0 <repeats 16 times>},
argnull = '\000' <repeats 15 times>}
(gdb) cont
Breakpoint 6, ExecEvalVar (variable=0x14014c820, econtext=0x140127ae0,
isNull=0x14014e7c0 "") at execQual.c:298
298 switch (variable->varno)
(gdb) print *variable
$57 = {type = T_Var, varno = 65001, varattno = 1, vartype = 21220,
vartypmod = 8, varlevelsup = 0, varnoold = 1, varoattno = 0}
(gdb) print *econtext
$58 = {type = T_ExprContext, ecxt_scantuple = 0x14014cc58,
ecxt_innertuple = 0x0, ecxt_outertuple = 0x14014cc58,
ecxt_per_query_memory = 0x1400e6370, ecxt_per_tuple_memory = 0x1400e66a0,
ecxt_param_exec_vals = 0x0, ecxt_param_list_info = 0x0,
ecxt_aggvalues = 0x0, ecxt_aggnulls = 0x0}
(gdb) break 313
(gdb) cont
(gdb) print *slot
$60 = {type = T_TupleTableSlot, val = 0x14014e430, ttc_shouldFree = 0 '\000',
ttc_descIsNew = 1 '\001', ttc_tupleDescriptor = 0x14014ded0, ttc_buffer = 0}
(gdb) break 353
(gdb) cont
(gdb) print *heapTuple
$73 = {t_len = 48, t_self = {ip_blkid = {bi_hi = 65535, bi_lo = 65535},
ip_posid = 0}, t_tableOid = 0, t_datamcxt = 0x1400e6370,
t_data = 0x14014e450}
(gdb) print attnum
$74 = 1
(gdb) print *tuple_type
$75 = {natts = 2, attrs = 0x14014df00, constr = 0x0}
(gdb) print isNull
$76 = (bool *) 0x14014e7c0 ""
(gdb) break 359
(gdb) cont
# after heap_getattr, we have the smashed value.
(gdb) print result
$79 = 303

is this nearing the problem, or still simply witnessing symptoms?

brent 'delirious from sleep dep.'

#17 Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#14)
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

On 26 Dec 2000 at 23:41 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > | Please apply it locally and let me know what you find.
|
| > what I'm seeing now is much the same.
|
| Drat. More to do, then.

after hours in the gdb-hole, I see this... maybe a clue? :)

src/backend/access/common/heaptuple.c:

    {

        /*
         * Fix me when going to a machine with more than a four-byte
         * word!
         */
        off = att_align(off, att[j]->attlen, att[j]->attalign);

        att[j]->attcacheoff = off;

        off = att_addlength(off, att[j]->attlen, tp + off);
    }

I'm pretty sure I don't know best how to fix this, but I've got some
randomly entered code compiling now :) If it passes the regression
tests I'll send it along.

brent 'glad the coffee shop in the backyard is open now :)'

#18 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#17)
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

Brent Verner <brent@rcfile.org> writes:

> after hours in the gdb-hole, I see this... maybe a clue? :)

I don't think that comment means anything. Possibly it's a leftover
from a time when there was something unportable there. But if att_align
were broken on Alphas, you'd have a lot worse problems than what you're
seeing.

regards, tom lane

#19 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#16)
Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

Brent Verner <brent@rcfile.org> writes:

> these are the steps leading up to the assignment of the fscked
> fcache->fcinfo.arg[i] at execQual.c:603, which is what will eventually
> blow up ExecEvalFieldSelect.

That looks OK as far as it goes. Inside ExecEvalVar, you need to look
at the tuple_type data structure in more detail, specifically
p *tuple_type->attrs[0]
p *tuple_type->attrs[1]
(I think the leading * is correct here, try omitting it if gdb gets
unhappy.)

> (gdb) print *variable
> $57 = {type = T_Var, varno = 65001, varattno = 1, vartype = 21220,
> vartypmod = 8, varlevelsup = 0, varnoold = 1, varoattno = 0}

That part looks promising --- vartypmod is sizeof(Pointer) not -1,
so the front-end part of my patch seems to be working. What I suspect
we'll find is that the tupledesc doesn't show sizeof the first field to
be 8 the way we want. Which would imply that I missed a place (or
multiple places :-() that needs to know about the convention for typmod
of a tuple datatype.

regards, tom lane

#20 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#16)
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

After further study, I realized that fetchatt() and a number of other
places were not prepared to cope with 8-byte pass-by-value datatypes.
Most of them weren't checking for cases they couldn't handle, either.

Here is a revised patch for you to try (this includes yesterday's patch
plus more changes, so you'll need to reverse out the prior patch before
applying this one). NOTE you will need to do a full reconfigure and
rebuild to make this fly --- I'd suggest "make distclean" to start.

regards, tom lane
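
The kind of gap described above (code that fetches a pass-by-value datum but only knows about widths up to 4 bytes, and does not check for anything else) can be illustrated with a width-aware fetch. This is a sketch under stated assumptions, not PostgreSQL's fetchatt() and not the revised patch: fetch_by_value is a hypothetical helper that handles 1, 2, 4 and 8 byte values and reports any other width instead of silently mishandling it.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read a pass-by-value attribute of attlen bytes
 * from src and widen it into *out. Returns 1 on success and 0 when the
 * width is one this routine cannot handle.
 */
static int
fetch_by_value(const void *src, int attlen, uint64_t *out)
{
    switch (attlen)
    {
        case 1:
        {
            uint8_t v;

            memcpy(&v, src, sizeof(v));
            *out = v;
            return 1;
        }
        case 2:
        {
            uint16_t v;

            memcpy(&v, src, sizeof(v));
            *out = v;
            return 1;
        }
        case 4:
        {
            uint32_t v;

            memcpy(&v, src, sizeof(v));
            *out = v;
            return 1;
        }
        case 8:
        {
            uint64_t v;         /* the case a 4-byte-only fetch misses */

            memcpy(&v, src, sizeof(v));
            *out = v;
            return 1;
        }
        default:
            return 0;           /* unsupported width: say so explicitly */
    }
}
```

Returning a status for unknown widths is a design choice of this sketch; it makes a missing case visible at the point of the fetch rather than as a crash further along.
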

#21 Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#20)
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

On 27 Dec 2000 at 16:50 (-0500), Tom Lane wrote:
| After further study, I realized that fetchatt() and a number of other
| places were not prepared to cope with 8-byte pass-by-value datatypes.
| Most of them weren't checking for cases they couldn't handle, either.
|
| Here is a revised patch for you to try (this includes yesterday's patch
| plus more changes, so you'll need to reverse out the prior patch before
| applying this one). NOTE you will need to do a full reconfigure and
| rebuild to make this fly --- I'd suggest "make distclean" to start.

excellent!

this patch fixes the SEGV problem in the regression tests. the only
remaining failures, which are not due to SEGV, are:

oid ... FAILED
float8 ... FAILED
geometry ... FAILED

initial comments WRT failures:
float8: fails only when building with gcc.
oid: I recall seeing a one-liner change to correct this. will try.

many thanks,
brent

#22 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#21)
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

Brent Verner <brent@rcfile.org> writes:

> this patch fixes the SEGV problem in the regression tests. the only
> remaining failures, which are not due to SEGV, are:
> oid ... FAILED
> float8 ... FAILED
> geometry ... FAILED

What are the regression diffs, exactly?

regards, tom lane

#23 Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#22)
1 attachment(s)
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

On 27 Dec 2000 at 18:10 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > this patch fixes the SEGV problem in the regression tests. the only
| > remaining failures, which are not due to SEGV, are:
| > oid ... FAILED
| > float8 ... FAILED
| > geometry ... FAILED
|
| What are the regression diffs, exactly?

see attachment.

brent
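
A note on reading the attached oid diff: the expected output lists 4294966256 for the input '-1040'. That number is what -1040 becomes when reduced to an unsigned 32-bit value, as the sketch below shows. The helper name wrap_to_oid_width is hypothetical, and the sketch only demonstrates the arithmetic; why the Alpha build rejects the input instead is not established at this point in the thread.

```c
#include <stdint.h>

/*
 * Hypothetical helper: reduce a signed value modulo 2^32, which is what
 * conversion to an unsigned 32-bit integer does in C.
 */
static uint32_t
wrap_to_oid_width(int64_t value)
{
    return (uint32_t) value;
}
```

wrap_to_oid_width(-1040) evaluates to 4294966256, that is, 4294967296 - 1040.
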

Attachments:

regression.diffs (text/plain; charset=us-ascii)
*** ./expected/oid.out	Mon Nov 20 22:23:20 2000
--- ./results/oid.out	Wed Dec 27 18:27:16 2000
***************
*** 6,11 ****
--- 6,12 ----
  INSERT INTO OID_TBL(f1) VALUES ('1235');
  INSERT INTO OID_TBL(f1) VALUES ('987');
  INSERT INTO OID_TBL(f1) VALUES ('-1040');
+ ERROR:  oidin: error reading "-1040": Error 0 occurred.
  INSERT INTO OID_TBL(f1) VALUES ('99999999');
  INSERT INTO OID_TBL(f1) VALUES ('');
  -- bad inputs 
***************
*** 15,28 ****
  ERROR:  oidin: error in "99asdfasd": can't parse "asdfasd"
  SELECT '' AS six, OID_TBL.*;
   six |     f1     
! -----+------------
       |       1234
       |       1235
       |        987
-      | 4294966256
       |   99999999
       |          0
! (6 rows)
  
  SELECT '' AS one, o.* FROM OID_TBL o WHERE o.f1 = 1234;
   one |  f1  
--- 16,28 ----
  ERROR:  oidin: error in "99asdfasd": can't parse "asdfasd"
  SELECT '' AS six, OID_TBL.*;
   six |    f1    
! -----+----------
       |     1234
       |     1235
       |      987
       | 99999999
       |        0
! (5 rows)
  
  SELECT '' AS one, o.* FROM OID_TBL o WHERE o.f1 = 1234;
   one |  f1  
***************
*** 32,44 ****
  
  SELECT '' AS five, o.* FROM OID_TBL o WHERE o.f1 <> '1234';
   five |     f1     
! ------+------------
        |       1235
        |        987
-       | 4294966256
        |   99999999
        |          0
! (5 rows)
  
  SELECT '' AS three, o.* FROM OID_TBL o WHERE o.f1 <= '1234';
   three |  f1  
--- 32,43 ----
  
  SELECT '' AS five, o.* FROM OID_TBL o WHERE o.f1 <> '1234';
   five |    f1    
! ------+----------
        |     1235
        |      987
        | 99999999
        |        0
! (4 rows)
  
  SELECT '' AS three, o.* FROM OID_TBL o WHERE o.f1 <= '1234';
   three |  f1  
***************
*** 57,75 ****
  
  SELECT '' AS four, o.* FROM OID_TBL o WHERE o.f1 >= '1234';
   four |     f1     
! ------+------------
        |       1234
        |       1235
-       | 4294966256
        |   99999999
! (4 rows)
  
  SELECT '' AS three, o.* FROM OID_TBL o WHERE o.f1 > '1234';
   three |     f1     
! -------+------------
         |       1235
-        | 4294966256
         |   99999999
! (3 rows)
  
  DROP TABLE OID_TBL;
--- 56,72 ----
  
  SELECT '' AS four, o.* FROM OID_TBL o WHERE o.f1 >= '1234';
   four |    f1    
! ------+----------
        |     1234
        |     1235
        | 99999999
! (3 rows)
  
  SELECT '' AS three, o.* FROM OID_TBL o WHERE o.f1 > '1234';
   three |    f1    
! -------+----------
         |     1235
         | 99999999
! (2 rows)
  
  DROP TABLE OID_TBL;

======================================================================

*** ./expected/float8-fp-exception.out	Thu Mar 30 02:46:00 2000
--- ./results/float8.out	Wed Dec 27 18:27:15 2000
***************
*** 214,220 ****
     SET f1 = FLOAT8_TBL.f1 * '-1'
     WHERE FLOAT8_TBL.f1 > '0.0';
  SELECT '' AS bad, f.f1 * '1e200' from FLOAT8_TBL f;
! ERROR:  floating point exception! The last floating point operation either exceeded legal ranges or was a divide by zero
  SELECT '' AS bad, f.f1 ^ '1e200' from FLOAT8_TBL f;
  ERROR:  pow() result is out of range
  SELECT '' AS bad, ln(f.f1) from FLOAT8_TBL f where f.f1 = '0.0' ;
--- 214,220 ----
     SET f1 = FLOAT8_TBL.f1 * '-1'
     WHERE FLOAT8_TBL.f1 > '0.0';
  SELECT '' AS bad, f.f1 * '1e200' from FLOAT8_TBL f;
! ERROR:  Bad float8 input format -- overflow
  SELECT '' AS bad, f.f1 ^ '1e200' from FLOAT8_TBL f;
  ERROR:  pow() result is out of range
  SELECT '' AS bad, ln(f.f1) from FLOAT8_TBL f where f.f1 = '0.0' ;

======================================================================

*** ./expected/geometry-alpha-precision.out	Mon Oct 16 18:37:37 2000
--- ./results/geometry.out	Wed Dec 27 18:28:10 2000
***************
*** 163,190 ****
   twentyfour |       translation       
  ------------+-------------------------
              | (2,2),(0,0)
-             | (3,3),(1,1)
-             | (2.5,3.5),(2.5,2.5)
-             | (3,3),(3,3)
              | (-8,2),(-10,0)
-             | (-7,3),(-9,1)
-             | (-7.5,3.5),(-7.5,2.5)
-             | (-7,3),(-7,3)
              | (-1,6),(-3,4)
-             | (0,7),(-2,5)
-             | (-0.5,7.5),(-0.5,6.5)
-             | (0,7),(0,7)
              | (7.1,36.5),(5.1,34.5)
-             | (8.1,37.5),(6.1,35.5)
-             | (7.6,38),(7.6,37)
-             | (8.1,37.5),(8.1,37.5)
              | (-3,-10),(-5,-12)
-             | (-2,-9),(-4,-11)
-             | (-2.5,-8.5),(-2.5,-9.5)
-             | (-2,-9),(-2,-9)
              | (12,12),(10,10)
              | (13,13),(11,11)
              | (12.5,13.5),(12.5,12.5)
              | (13,13),(13,13)
  (24 rows)
  
--- 163,190 ----
   twentyfour |       translation       
  ------------+-------------------------
              | (2,2),(0,0)
              | (-8,2),(-10,0)
              | (-1,6),(-3,4)
              | (7.1,36.5),(5.1,34.5)
              | (-3,-10),(-5,-12)
              | (12,12),(10,10)
+             | (3,3),(1,1)
+             | (-7,3),(-9,1)
+             | (0,7),(-2,5)
+             | (8.1,37.5),(6.1,35.5)
+             | (-2,-9),(-4,-11)
              | (13,13),(11,11)
+             | (2.5,3.5),(2.5,2.5)
+             | (-7.5,3.5),(-7.5,2.5)
+             | (-0.5,7.5),(-0.5,6.5)
+             | (7.6,38),(7.6,37)
+             | (-2.5,-8.5),(-2.5,-9.5)
              | (12.5,13.5),(12.5,12.5)
+             | (3,3),(3,3)
+             | (-7,3),(-7,3)
+             | (0,7),(0,7)
+             | (8.1,37.5),(8.1,37.5)
+             | (-2,-9),(-2,-9)
              | (13,13),(13,13)
  (24 rows)
  
***************
*** 193,220 ****
   twentyfour |        translation        
  ------------+---------------------------
              | (2,2),(0,0)
-             | (3,3),(1,1)
-             | (2.5,3.5),(2.5,2.5)
-             | (3,3),(3,3)
              | (12,2),(10,0)
-             | (13,3),(11,1)
-             | (12.5,3.5),(12.5,2.5)
-             | (13,3),(13,3)
              | (5,-2),(3,-4)
-             | (6,-1),(4,-3)
-             | (5.5,-0.5),(5.5,-1.5)
-             | (6,-1),(6,-1)
              | (-3.1,-32.5),(-5.1,-34.5)
-             | (-2.1,-31.5),(-4.1,-33.5)
-             | (-2.6,-31),(-2.6,-32)
-             | (-2.1,-31.5),(-2.1,-31.5)
              | (7,14),(5,12)
-             | (8,15),(6,13)
-             | (7.5,15.5),(7.5,14.5)
-             | (8,15),(8,15)
              | (-8,-8),(-10,-10)
              | (-7,-7),(-9,-9)
              | (-7.5,-6.5),(-7.5,-7.5)
              | (-7,-7),(-7,-7)
  (24 rows)
  
--- 193,220 ----
   twentyfour |        translation        
  ------------+---------------------------
              | (2,2),(0,0)
              | (12,2),(10,0)
              | (5,-2),(3,-4)
              | (-3.1,-32.5),(-5.1,-34.5)
              | (7,14),(5,12)
              | (-8,-8),(-10,-10)
+             | (3,3),(1,1)
+             | (13,3),(11,1)
+             | (6,-1),(4,-3)
+             | (-2.1,-31.5),(-4.1,-33.5)
+             | (8,15),(6,13)
              | (-7,-7),(-9,-9)
+             | (2.5,3.5),(2.5,2.5)
+             | (12.5,3.5),(12.5,2.5)
+             | (5.5,-0.5),(5.5,-1.5)
+             | (-2.6,-31),(-2.6,-32)
+             | (7.5,15.5),(7.5,14.5)
              | (-7.5,-6.5),(-7.5,-7.5)
+             | (3,3),(3,3)
+             | (13,3),(13,3)
+             | (6,-1),(6,-1)
+             | (-2.1,-31.5),(-2.1,-31.5)
+             | (8,15),(8,15)
              | (-7,-7),(-7,-7)
  (24 rows)
  
***************
*** 224,251 ****
   twentyfour |          rotation           
  ------------+-----------------------------
              | (0,0),(0,0)
-             | (0,0),(0,0)
-             | (0,0),(0,0)
-             | (0,0),(0,0)
              | (-0,0),(-20,-20)
-             | (-10,-10),(-30,-30)
-             | (-25,-25),(-25,-35)
-             | (-30,-30),(-30,-30)
              | (-0,2),(-14,0)
-             | (-7,3),(-21,1)
-             | (-17.5,2.5),(-21.5,-0.5)
-             | (-21,3),(-21,3)
              | (0,79.2),(-58.8,0)
-             | (-29.4,118.8),(-88.2,39.6)
-             | (-73.5,104.1),(-108,99)
-             | (-88.2,118.8),(-88.2,118.8)
              | (14,-0),(0,-34)
-             | (21,-17),(7,-51)
-             | (29.5,-42.5),(17.5,-47.5)
-             | (21,-51),(21,-51)
              | (0,40),(0,0)
              | (0,60),(0,20)
              | (0,60),(-10,50)
              | (0,60),(0,60)
  (24 rows)
  
--- 224,251 ----
   twentyfour |          rotation           
  ------------+-----------------------------
              | (0,0),(0,0)
              | (-0,0),(-20,-20)
              | (-0,2),(-14,0)
              | (0,79.2),(-58.8,0)
              | (14,-0),(0,-34)
              | (0,40),(0,0)
+             | (0,0),(0,0)
+             | (-10,-10),(-30,-30)
+             | (-7,3),(-21,1)
+             | (-29.4,118.8),(-88.2,39.6)
+             | (21,-17),(7,-51)
              | (0,60),(0,20)
+             | (0,0),(0,0)
+             | (-25,-25),(-25,-35)
+             | (-17.5,2.5),(-21.5,-0.5)
+             | (-73.5,104.1),(-108,99)
+             | (29.5,-42.5),(17.5,-47.5)
              | (0,60),(-10,50)
+             | (0,0),(0,0)
+             | (-30,-30),(-30,-30)
+             | (-21,3),(-21,3)
+             | (-88.2,118.8),(-88.2,118.8)
+             | (21,-51),(21,-51)
              | (0,60),(0,60)
  (24 rows)
  

======================================================================
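
A side note on the geometry hunks above: both sides report 24 rows, and the rows shown look like the same values in a different order rather than different values. A minimal sketch of how one might check that, assuming the rows have already been pulled out of the two files as strings (the helper name `same_rows_ignoring_order` is made up for illustration and is not part of the regression tooling):

```python
from collections import Counter


def same_rows_ignoring_order(expected_rows, actual_rows):
    """Return True when both outputs hold the same rows the same number of times."""
    return Counter(expected_rows) == Counter(actual_rows)


# A subset of rows from the first "translation" hunk, kept in the
# relative order in which each file lists them.
expected = [
    "(2,2),(0,0)",
    "(3,3),(1,1)",
    "(2.5,3.5),(2.5,2.5)",
    "(3,3),(3,3)",
    "(-8,2),(-10,0)",
]
actual = [
    "(2,2),(0,0)",
    "(-8,2),(-10,0)",
    "(3,3),(1,1)",
    "(2.5,3.5),(2.5,2.5)",
    "(3,3),(3,3)",
]

if __name__ == "__main__":
    print("identical line by line:", expected == actual)
    print("same rows, order ignored:", same_rows_ignoring_order(expected, actual))
```

If the full files compare equal this way, the failure is about row ordering only, which is a different kind of problem from a precision mismatch.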

#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#23)
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

Brent Verner <brent@rcfile.org> writes:

> INSERT INTO OID_TBL(f1) VALUES ('-1040');
> + ERROR: oidin: error reading "-1040": Error 0 occurred.

Hm. I thought I'd fixed that. Are you up to date on
src/backend/utils/adt/oid.c ? Current CVS has rev 1.42.

> *** ./expected/float8-fp-exception.out	Thu Mar 30 02:46:00 2000
> --- ./results/float8.out	Wed Dec 27 18:27:15 2000
> ***************
> *** 214,220 ****
>      SET f1 = FLOAT8_TBL.f1 * '-1'
>      WHERE FLOAT8_TBL.f1 > '0.0';
>   SELECT '' AS bad, f.f1 * '1e200' from FLOAT8_TBL f;
> ! ERROR:  floating point exception! The last floating point operation either exceeded legal ranges or was a divide by zero
>   SELECT '' AS bad, f.f1 ^ '1e200' from FLOAT8_TBL f;
>   ERROR:  pow() result is out of range
>   SELECT '' AS bad, ln(f.f1) from FLOAT8_TBL f where f.f1 = '0.0' ;
> --- 214,220 ----
>      SET f1 = FLOAT8_TBL.f1 * '-1'
>      WHERE FLOAT8_TBL.f1 > '0.0';
>   SELECT '' AS bad, f.f1 * '1e200' from FLOAT8_TBL f;
> ! ERROR:  Bad float8 input format -- overflow
>   SELECT '' AS bad, f.f1 ^ '1e200' from FLOAT8_TBL f;
>   ERROR:  pow() result is out of range
>   SELECT '' AS bad, ln(f.f1) from FLOAT8_TBL f where f.f1 = '0.0' ;

It would appear that Alpha no longer needs the special
float8-fp-exception.out comparison file. Try removing the line

float8/alpha.*-dec-osf=float8-fp-exception

from src/test/regress/resultmap.

The geometry diffs also look like Alpha may be more nearly in sync with
the rest of the world than it used to be. Do any of the other geometry
comparison files match what you are getting as results/geometry.out?

regards, tom lane

#25Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#24)
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

On 27 Dec 2000 at 18:44 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > INSERT INTO OID_TBL(f1) VALUES ('-1040');
| > + ERROR: oidin: error reading "-1040": Error 0 occurred.
|
| Hm. I thought I'd fixed that. Are you up to date on
| src/backend/utils/adt/oid.c ? Current CVS has rev 1.42.

yup. got that version -- 1.42 2000/12/22 21:36:09 tgl

| It would appear that Alpha no longer needs the special
| float8-fp-exception.out comparison file. Try removing the line
|
| float8/alpha.*-dec-osf=float8-fp-exception

cc w/o line above: FAIL
cc w/ line above: ok
gcc w/o line above: ??? (will retest later)
gcc w/ line above: FAIL

| The geometry diffs also look like Alpha may be more nearly in sync with
| the rest of the world than it used to be. Do any of the other geometry
| comparison files match what you are getting as results/geometry.out?

none match.

| regards, tom lane

#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#25)
Re: Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

Brent Verner <brent@rcfile.org> writes:

> | float8-fp-exception.out comparison file. Try removing the line
> |
> | float8/alpha.*-dec-osf=float8-fp-exception

> cc w/o line above: FAIL
> cc w/ line above: ok
> gcc w/o line above: ??? (will retest later)
> gcc w/ line above: FAIL

OK, then it should work for both cases if you do

float8/alpha.*-dec-osf.*:cc=float8-fp-exception

regards, tom lane
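
To see why the `:cc` form should cover both compilers, here is a rough sketch of the matching. It assumes each resultmap line has the shape `test/platform-pattern=substitute`, that the pattern is matched from the start of a platform string, and that the compiler name is appended to that string after a colon. The platform strings `alpha-dec-osf4.0d:cc` and `alpha-dec-osf4.0d:gcc` and both helper functions are assumptions made for illustration, not the real regression driver:

```python
import re


def parse_resultmap_line(line):
    """Split 'test/platform-pattern=substitute' into its three parts."""
    left, substitute = line.strip().split("=", 1)
    test, pattern = left.split("/", 1)
    return test, pattern, substitute


def expected_file(test, platform, resultmap_lines):
    """Pick the comparison file for a test; fall back to the default name."""
    for line in resultmap_lines:
        name, pattern, substitute = parse_resultmap_line(line)
        # re.match anchors at the start of the string only, so a pattern
        # may match just a prefix of the platform string.
        if name == test and re.match(pattern, platform):
            return substitute
    return test


OLD_MAP = ["float8/alpha.*-dec-osf=float8-fp-exception"]
NEW_MAP = ["float8/alpha.*-dec-osf.*:cc=float8-fp-exception"]

if __name__ == "__main__":
    for platform in ("alpha-dec-osf4.0d:cc", "alpha-dec-osf4.0d:gcc"):
        print(platform,
              "old:", expected_file("float8", platform, OLD_MAP),
              "new:", expected_file("float8", platform, NEW_MAP))
```

Under those assumptions the old line selects the special file for both compilers, while the `:cc` line selects it only for the DEC compiler and leaves gcc on the default float8 output, which is what the four results reported above call for.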

#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#25)
Re: [PATCHES] Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

Brent Verner <brent@rcfile.org> writes:

> | Hm. I thought I'd fixed that. Are you up to date on
> | src/backend/utils/adt/oid.c ? Current CVS has rev 1.42.

> yup. got that version -- 1.42 2000/12/22 21:36:09 tgl

You're right, it was still broken :-(. I think I've got it now, though.

Oliver Elphick was kind enough to arrange access to an Alpha running
Debian Linux, and I find that current-as-of-this-moment sources pass
all regression tests in either serial or parallel test mode on that
system. Curiously, however, the system fails when you try to shut
it down:

Smart Shutdown request at Thu Dec 28 02:41:49 2000
DEBUG: shutting down
FATAL 2: Checkpoint lock is busy while data base is shutting down
Shutdown failed - abort

I have no idea why this should be. Evidently there's something wrong
with the TAS() macro --- yet it seems to work fine elsewhere. Ideas
anyone?

regards, tom lane
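
For readers who have not looked at the spinlock code, a toy model of the test-and-set contract may help. It assumes that TAS() returns nonzero when the lock was already held and that the shutdown path makes a single attempt on the checkpoint lock. The `TasLock` class and `shutdown_checkpoint` function are invented for this sketch (a `threading.Lock` stands in for the atomic instruction) and are not the backend's code:

```python
import threading


class TasLock:
    """Toy spinlock: tas() returns 0 if the caller got the lock, 1 if it was already held."""

    def __init__(self):
        self._guard = threading.Lock()  # makes the read-modify-write step atomic
        self._held = False

    def tas(self):
        with self._guard:
            was_held = self._held
            self._held = True
            return 1 if was_held else 0

    def unlock(self):
        with self._guard:
            self._held = False


def shutdown_checkpoint(lock):
    """Hypothetical shutdown step: one attempt on the lock, no waiting."""
    if lock.tas():
        return "FATAL 2: Checkpoint lock is busy while data base is shutting down"
    try:
        return "DEBUG: database system is shut down"
    finally:
        lock.unlock()


if __name__ == "__main__":
    lock = TasLock()
    print(shutdown_checkpoint(lock))  # lock is free
    lock.tas()                        # someone else holds the lock
    print(shutdown_checkpoint(lock))
```

In a model like this, a TAS that wrongly reports "held" and a lock that an earlier holder never released both end in the same FATAL message, so the log line alone does not say which of the two happened.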

#28Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#27)
Re: [PATCHES] Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

On 27 Dec 2000 at 21:45 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > | Hm. I thought I'd fixed that. Are you up to date on
| > | src/backend/utils/adt/oid.c ? Current CVS has rev 1.42.
|
| > yup. got that version -- 1.42 2000/12/22 21:36:09 tgl
|
| You're right, it was still broken :-(. I think I've got it now, though.

i'll check it tomorrow.

| Oliver Elphick was kind enough to arrange access to an Alpha running
| Debian Linux, and I find that current-as-of-this-moment sources pass
| all regression tests in either serial or parallel test mode on that
| system. Curiously, however, the system fails when you try to shut
| it down:

good. I'm glad you guys linked up :)

| Smart Shutdown request at Thu Dec 28 02:41:49 2000
| DEBUG: shutting down
| FATAL 2: Checkpoint lock is busy while data base is shutting down
| Shutdown failed - abort

I'm not seeing this with my latest revision of the TAS() asm.

Smart Shutdown request at Wed Dec 27 19:25:45 2000
DEBUG: shutting down
DEBUG: MoveOfflineLogs: remove 0000000000000000
DEBUG: database system is shut down

| I have no idea why this should be. Evidently there's something wrong
| with the TAS() macro --- yet it seems to work fine elsewhere. Ideas
| anyone?

re-evaluating the asm stuff now.

thanks.
brent

#29Oliver Elphick
olly@lfix.co.uk
In reply to: Tom Lane (#19)
Re: [PATCHES] Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

Tom Lane wrote:
...

> system. Curiously, however, the system fails when you try to shut
> it down:

> Smart Shutdown request at Thu Dec 28 02:41:49 2000
> DEBUG: shutting down
> FATAL 2: Checkpoint lock is busy while data base is shutting down
> Shutdown failed - abort

> I have no idea why this should be. Evidently there's something wrong
> with the TAS() macro --- yet it seems to work fine elsewhere. Ideas
> anyone?

It's not just on Alpha; I've seen that on my i386 Linux system.

--
Oliver Elphick Oliver.Elphick@lfix.co.uk
Isle of Wight http://www.lfix.co.uk/oliver
PGP: 1024R/32B8FAA1: 97 EA 1D 47 72 3F 28 47 6B 7E 39 CC 56 E4 C1 47
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C
========================================
"For God shall bring every work into judgment,
with every secret thing, whether it be good, or
whether it be evil." Ecclesiastes 12:14

#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oliver Elphick (#29)
Re: [PATCHES] Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

"Oliver Elphick" <olly@lfix.co.uk> writes:

> > Smart Shutdown request at Thu Dec 28 02:41:49 2000
> > DEBUG: shutting down
> > FATAL 2: Checkpoint lock is busy while data base is shutting down
> > Shutdown failed - abort

> It's not just on Alpha; I've seen that on my i386 Linux system.

Oooh, that's interesting. I was just blindly assuming that it was
a problem with the Alpha spinlock code (we've sure heard plenty of
discussion of same). But maybe there's an actual logic bug in the
checkpoint code. I don't see one in a quick scan though.

FWIW, I do *not* see this behavior on HPUX. It seems perfectly
reproducible on the Debian Alpha box. Is it reproducible on your
i386 box, or only sometimes?

Vadim, any ideas?

regards, tom lane

#31Oliver Elphick
olly@lfix.co.uk
In reply to: Oliver Elphick (#29)
Re: [PATCHES] Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

Tom Lane wrote:

> "Oliver Elphick" <olly@lfix.co.uk> writes:

> > > FATAL 2: Checkpoint lock is busy while data base is shutting down

> > It's not just on Alpha; I've seen that on my i386 Linux system.

> FWIW, I do *not* see this behavior on HPUX. It seems perfectly
> reproducible on the Debian Alpha box. Is it reproducible on your
> i386 box, or only sometimes?

Hmm. I'm just waking up a bit more. Now I'm thinking slightly more
clearly, I saw the problem yesterday when I was doing an Alpha build
on faure.debian.org; so I think it was actually on Alpha, not i386 after
all. Sorry for the red herring.


#32Ryan Kirkpatrick
pgsql@rkirkpat.net
In reply to: Tom Lane (#20)
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

On Wed, 27 Dec 2000, Tom Lane wrote:

> After further study, I realized that fetchatt() and a number of other
> places were not prepared to cope with 8-byte pass-by-value datatypes.
> Most of them weren't checking for cases they couldn't handle, either.

> Here is a revised patch for you to try (this includes yesterday's patch
> plus more changes, so you'll need to reverse out the prior patch before
> applying this one). NOTE you will need to do a full reconfigure and
> rebuild to make this fly --- I'd suggest "make distclean" to start.

Good news is that it solves the 'misc' regression test failure. It
now passes with flying colors! The bad news is that the 'oid' regression
test is still broken (with the exact same problem as before). I think
Brent hit the same problem... I guess, verify that your oid fix actually
hit the CVS tree, and if it did, rethink the solution. :(

For testing I used the snapshot from ftp.postgresql.org:/pub/dev/
dated yesterday on my Alpha XLT366 running Debian GNU/Linux 2.2r0, kernel
2.2.17. Though I found the 'configure' file was actually a copy of
'config.in' and had to run the latter file through autoconf to get the
correct version of the former file. Weird.

Also, I tested a patched source tree on a Linux/x86 box, and it
passed all regression tests w/o problems. I can test the patched source
tree on a Linux/Sparc machine if you want (a bit more effort required to
do so).

Overall, it looks like we are making progress! Thanks to both you
and Brent for looking deeper into these problems. TTYL.

---------------------------------------------------------------------------
| "For to me to live is Christ, and to die is gain." |
| --- Philippians 1:21 (KJV) |
---------------------------------------------------------------------------
| Ryan Kirkpatrick | Boulder, Colorado | http://www.rkirkpat.net/ |
---------------------------------------------------------------------------
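
To make the quoted fetchatt() remark concrete: a by-value fetch has to know how many bytes to read, and a reader that only knows widths 1, 2 and 4 will either reject or quietly truncate an 8-byte value. The sketch below assumes little-endian storage and uses a made-up `fetch_by_value` helper; it only models the idea and is not the actual macro:

```python
import struct

# struct format codes for signed integers of each by-value width
FORMATS = {1: "b", 2: "h", 4: "i", 8: "q"}


def fetch_by_value(buf, offset, attlen, supported=(1, 2, 4, 8)):
    """Read a pass-by-value attribute of attlen bytes from buf at offset."""
    if attlen not in supported:
        # The missing check: fail loudly instead of reading the wrong width.
        raise ValueError(f"unsupported pass-by-value length {attlen}")
    return struct.unpack_from("<" + FORMATS[attlen], buf, offset)[0]


if __name__ == "__main__":
    # One 4-byte field followed by one 8-byte field, packed without padding.
    row = struct.pack("<iq", 7, 2**40 + 5)
    print(fetch_by_value(row, 0, 4))   # the 4-byte field
    print(fetch_by_value(row, 4, 8))   # the 8-byte field, read at full width
    print(fetch_by_value(row, 4, 4))   # same field read as 4 bytes: low half only
```

Reading the 8-byte field with a 4-byte width returns only its low half, which is the sort of silent wrong answer that a width check would turn into an immediate error.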

#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ryan Kirkpatrick (#32)
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

Ryan Kirkpatrick <pgsql@rkirkpat.net> writes:

> Good news is that it solves the 'misc' regression test failure. It
> now passes with flying colors! The bad news is that the 'oid' regression
> test is still broken (with the exact same problem as before). I think
> Brent hit the same problem... I guess, verify that your oid fix actually
> hit the CVS tree, and if it did, rethink the solution. :(

I believe that is fixed as of src/backend/utils/adt/oid.c v 1.43,
committed at Thu Dec 28 01:51:15 2000 UTC. It should have been in
Thursday morning's snapshot. If you've got 1.43 and it still fails
the regress test, let me know.

regards, tom lane
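
As background on the oid test: the expected output contains 4294966256 for the input '-1040', which is 2^32 - 1040. A small model of the arithmetic, assuming the input is converted the way C's strtoul() treats a leading minus sign (negation in the width of unsigned long). Both function names are made up, and it is only an assumption that this width difference is what the test tripped over on Alpha:

```python
def strtoul_model(text, long_bits):
    """Model strtoul() on a decimal string: a leading '-' negates modulo 2**long_bits.

    Overflow handling (ERANGE) is deliberately left out of this model.
    """
    return int(text) % (1 << long_bits)


def fits_in_oid(value):
    """An OID is a 32-bit unsigned quantity."""
    return 0 <= value <= 0xFFFFFFFF


if __name__ == "__main__":
    for bits in (32, 64):
        value = strtoul_model("-1040", bits)
        print(f"{bits}-bit long: {value}, fits in oid: {fits_in_oid(value)}")
```

With a 32-bit long the wrapped value is exactly the one in the expected file; with a 64-bit long it is far outside the 32-bit range, so any range check on the converted value would reject it.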