7.1 on DEC/Alpha

Started by Brent Verner over 25 years ago | 33 messages | hackers
#1Brent Verner
brent@rcfile.org

Hi,
I saw the thread from a few days ago about Linux/Alpha and 7.1. I
believe I'm seeing the same problems with DEC/Alpha (Tru64Unix 4.0D).

I noticed the following in the postmaster.log, which occurs, as the
Linux/Alpha bug report states, during the misc regression test.

DEBUG: copy: line 293, XLogWrite: had to create new log file - you probably should do checkpoints more often
Server process (pid 24954) exited with status 139 at Fri Dec 22 17:15:48 2000
Terminating any active server processes...
Server processes were terminated at Fri Dec 22 17:15:48 2000
Reinitializing shared memory and semaphores
DEBUG: starting up
DEBUG: database system was interrupted at 2000-12-22 17:15:47
DEBUG: CheckPoint record at (0, 316624)
DEBUG: Redo record at (0, 316624); Undo record at (0, 0); Shutdown TRUE

the full src/test/regress/log/postmaster.log can be snagged from
http://www.rcfile.org/postmaster.log

in addition to this, compiling on DEC/Alpha with gcc does not work,
without some shameful hackery :) as __INTERLOCKED_TESTBITSS_QUAD() is
a builtin that gcc does not know about. The DEC cc builds pg properly.
either way pg is built the test results are much the same, esp the
FAILURE of misc regression test.

If there is anything else I can do to help get this working, please
let me know.

Brent Verner

#2Brent Verner
brent@rcfile.org
In reply to: Brent Verner (#1)
Re: 7.1 on DEC/Alpha

On 22 Dec 2000 at 20:27 (-0500), Brent Verner wrote:

observation:

commenting out the queries with 'FROM person* p' causes the misc
regression test to pass.

SELECT p.name, p.hobbies.name FROM person* p;

Brent


#3Brent Verner
brent@rcfile.org
In reply to: Brent Verner (#2)
Re: 7.1 on DEC/Alpha

On 22 Dec 2000 at 21:58 (-0500), Brent Verner wrote:
| On 22 Dec 2000 at 20:27 (-0500), Brent Verner wrote:
|
| observation:
|
| commenting out the queries with 'FROM person* p' causes the misc
| regression test to pass.

that's not what I meant to say. the misc test still FAILS, but it
no longer causes pg to die.

b

#4Brent Verner
brent@rcfile.org
In reply to: Brent Verner (#3)
Re: 7.1 on DEC/Alpha

here's a post-mortem.

#0 0x1200ce58c in ExecEvalFieldSelect (fselect=0x1401615c0,
econtext=0x14016a030, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1096
#1 0x1200ceafc in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1234
#2 0x1200cdd74 in ExecEvalFuncArgs (fcache=0x14016aa70, argList=0x14016a030,
econtext=0x14016a030) at execQual.c:603
#3 0x1200cde54 in ExecMakeFunctionResult (fcache=0x14016aa70,
arguments=0x1401616d0, econtext=0x14016a030, isNull=0x11fffdf88 "",
isDone=0x0) at execQual.c:654
#4 0x1200ce224 in ExecEvalOper (opClause=0x1401615f0, econtext=0x14016a030,
isNull=0x11fffdf88 "", isDone=0x0) at execQual.c:841
#5 0x1200cea24 in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1204
#6 0x1200cec54 in ExecQual (qual=0x14016a1a0, econtext=0x14016a030)
at execQual.c:1356
#7 0x1200cf2a8 in ExecScan (node=0x14016a1d0, accessMtd=0x1200d8320 <SeqNext>)
at execScan.c:129
#8 0x1200d846c in ExecSeqScan (node=0x1401615f0) at nodeSeqscan.c:138
#9 0x1200cc280 in ExecProcNode (node=0x14016a1d0, parent=0x14016a1d0)
at execProcnode.c:284
#10 0x1200ca8c0 in ExecutePlan (estate=0x14016a310, plan=0x14016a1d0,
numberTuples=1, direction=ForwardScanDirection, destfunc=0x140020c20)
at execMain.c:959
#11 0x1200c9b50 in ExecutorRun (queryDesc=0x1401615f0, estate=0x14016a310,
count=0) at execMain.c:199
#12 0x1200d1140 in postquel_getnext (es=0x140160630) at functions.c:324
#13 0x1200d1300 in postquel_execute (es=0x140160630, fcinfo=0x1401604a0,
fcache=0x140160590) at functions.c:417
#14 0x1200d14d8 in fmgr_sql (fcinfo=0x1401604a0) at functions.c:542
#15 0x1200ce09c in ExecMakeFunctionResult (fcache=0x140160480,
arguments=0x14015e810, econtext=0x140119cd0, isNull=0x140160350 "",
isDone=0x11fffe258) at execQual.c:712
#16 0x1200ce2c4 in ExecEvalFunc (funcClause=0x1401615f0, econtext=0x140119cd0,
isNull=0x140160350 "", isDone=0x11fffe258) at execQual.c:883
#17 0x1200cea3c in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1208
#18 0x1200c8e10 in ExecEvalIter (iterNode=0x1401615f0, econtext=0x14016a030,
isNull=0x1 <Error reading address 0x1: Invalid argument>, isDone=0x0)
at execFlatten.c:56
#19 0x1200ce9b0 in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1183
#20 0x1200cdd74 in ExecEvalFuncArgs (fcache=0x140160290, argList=0x14016a030,
econtext=0x140119cd0) at execQual.c:603
#21 0x1200cde54 in ExecMakeFunctionResult (fcache=0x140160290,
arguments=0x14015e840, econtext=0x140119cd0, isNull=0x11fffe3a0 "",
isDone=0x11fffe468) at execQual.c:654
#22 0x1200ce2c4 in ExecEvalFunc (funcClause=0x1401615f0, econtext=0x140119cd0,
isNull=0x11fffe3a0 "", isDone=0x11fffe468) at execQual.c:883
#23 0x1200cea3c in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1208
#24 0x1200ce574 in ExecEvalFieldSelect (fselect=0x14015e720,
econtext=0x14016a030, isNull=0x11fffe3a0 "", isDone=0x0) at execQual.c:1091
#25 0x1200ceafc in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1234
#26 0x1200c8e10 in ExecEvalIter (iterNode=0x1401615f0, econtext=0x14016a030,
isNull=0x1 <Error reading address 0x1: Invalid argument>, isDone=0x0)
at execFlatten.c:56
#27 0x1200ce9b0 in ExecEvalExpr (expression=0x1401615f0, econtext=0x0,
isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1183
#28 0x1200ceea4 in ExecTargetList (targetlist=0x14015e870,
targettype=0x140160000, values=0x140160260, econtext=0x140119cd0,
isDone=0x11fffe5a8) at execQual.c:1528
#29 0x1200cf1a8 in ExecProject (projInfo=0x0, isDone=0x1) at execQual.c:1751
#30 0x1200d8074 in ExecResult (node=0x14015e5b0) at nodeResult.c:167
#31 0x1200cc238 in ExecProcNode (node=0x14015e5b0, parent=0x14015e5b0)
at execProcnode.c:272
#32 0x1200ca8c0 in ExecutePlan (estate=0x14015eab0, plan=0x14015e5b0,
numberTuples=0, direction=ForwardScanDirection, destfunc=0x1401603a0)
at execMain.c:959
#33 0x1200c9b50 in ExecutorRun (queryDesc=0x1401615f0, estate=0x14015eab0,
count=0) at execMain.c:199
#34 0x12013e5c0 in ProcessQuery (parsetree=0x14015ea80, plan=0x140160000)
at pquery.c:305
#35 0x12013c568 in pg_exec_query_string (
query_string=0x140115310 "SELECT p.hobbies.equipment.name, p.hobbies.name, p.name FROM person* p;", parse_context=0x1400c5c60) at postgres.c:817
#36 0x12013dd10 in PostgresMain (argv=0x11fffe9a8, real_argv=0x11ffffae8,
username=0x1400b72f9 "pgadmin") at postgres.c:1827
#37 0x12011aef0 in DoBackend (port=0x1400b7080) at postmaster.c:2021
#38 0x12011a888 in BackendStartup (port=0x1400b7080) at postmaster.c:1798
#39 0x12011938c in ServerLoop () at postmaster.c:957
#40 0x120118c10 in PostmasterMain (argv=0x11ffffae8) at postmaster.c:664
#41 0x1200e5980 in main (argv=0x11ffffae8) at main.c:138

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#4)
Re: Re: 7.1 on DEC/Alpha

Brent Verner <brent@rcfile.org> writes:

> here's a post-mortem.

> #0 0x1200ce58c in ExecEvalFieldSelect (fselect=0x1401615c0,
> econtext=0x14016a030, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1096

Looks reasonable as far as it goes. Evidently the crash is in the
heap_getattr macro call at line 1096 of src/backend/executor/execQual.c.
We need to look at the data structures that macro uses.
What do you get from

p *fselect

p *econtext

p *resSlot->val

p *resSlot->ttc_tupleDescriptor

BTW, if you didn't configure with --enable-cassert, it'd be a good idea
to go back and try it that way...

regards, tom lane

#6Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#5)
Re: Re: 7.1 on DEC/Alpha

On 24 Dec 2000 at 01:00 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > here's a post-mortem.
|
| > #0 0x1200ce58c in ExecEvalFieldSelect (fselect=0x1401615c0,
| > econtext=0x14016a030, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1096
|
| Looks reasonable as far as it goes. Evidently the crash is in the
| heap_getattr macro call at line 1096 of src/backend/executor/execQual.c.
| We need to look at the data structures that macro uses.
| What do you get from
|
| p *fselect

$1 = {type = T_FieldSelect, arg = 0x140169d40, fieldnum = 1, resulttype = 25,
resulttypmod = -1}

| p *econtext

$2 = {type = T_ExprContext, ecxt_scantuple = 0x14016a568,
ecxt_innertuple = 0x0, ecxt_outertuple = 0x0,
ecxt_per_query_memory = 0x1400c5df0, ecxt_per_tuple_memory = 0x1400c6670,
ecxt_param_exec_vals = 0x0, ecxt_param_list_info = 0x140141760,
ecxt_aggvalues = 0x0, ecxt_aggnulls = 0x0}

| p *resSlot->val

Error accessing memory address 0x40141838: Invalid argument.

| p *resSlot->ttc_tupleDescriptor

Error accessing memory address 0x40141848: Invalid argument.

additionally:

(gdb) p result
$4 = 1075058736

(gdb) p *resSlot
Error accessing memory address 0x40141830: Invalid argument.

| BTW, if you didn't configure with --enable-cassert, it'd be a good idea
| to go back and try it that way...

will reconfig/rebuild shortly.

brent

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#6)
Re: 7.1 on DEC/Alpha

Brent Verner <brent@rcfile.org> writes:

> (gdb) p *resSlot
> Error accessing memory address 0x40141830: Invalid argument.

Oooh. resSlot has been truncated to 32 bits --- judging by the other
nearby pointer values, it almost certainly should have been 0x140141830.
Now we have a lead.

I am guessing that the truncation happened somewhere in
executor/functions.c, but don't see it right away...

regards, tom lane
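
The truncation described above can be reproduced with plain integer arithmetic. This is a minimal sketch, not PostgreSQL code: the helper names `truncate_to_32` and `survives_32bit_roundtrip` are made up for illustration, and the only values taken from the thread are the suspected original pointer 0x140141830 and the address gdb complained about, 0x40141830 (which is also the `result` value gdb printed, 1075058736).

```c
#include <assert.h>   /* for checks like the ones in the note below */
#include <stdint.h>

/* Keep only the low 32 bits of a 64-bit pointer-sized value, which is
 * what happens when a pointer is squeezed through a 4-byte field.
 * (Hypothetical helper, for illustration only.) */
static uint32_t truncate_to_32(uint64_t wide)
{
    return (uint32_t) (wide & 0xFFFFFFFFu);
}

/* Report whether a value comes back unchanged after passing through a
 * 4-byte field.  Values below 2^32 survive; Alpha heap addresses such
 * as 0x140141830 do not. */
static int survives_32bit_roundtrip(uint64_t wide)
{
    return (uint64_t) truncate_to_32(wide) == wide;
}
```

With these helpers, `truncate_to_32(0x140141830ULL)` yields 0x40141830, and `survives_32bit_roundtrip` returns 0 for the 64-bit address, which would explain why the same code never misbehaved on machines with 4-byte pointers.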

#8Brent Verner
brent@rcfile.org
In reply to: Brent Verner (#1)
Re: Re: 7.1 on DEC/Alpha

On 24 Dec 2000 at 00:47 (-0500), Tom Lane wrote:
|
| > I'll send the patch that allows me to
| > cleanly build with gcc. right now, s_lock.h does the wrong thing
| > when compiling on Alpha/OSF with gcc.
|
| Roger, we want to build with either.

The attached patch _seems_ to do the right thing. could someone
who knows Alpha assembly check it out (please).

for more info on Alpha assembly, this link may help.
http://tru64unix.compaq.com/faqs/publications/base_doc/DOCUMENTATION/V40D_HTML/APS31DTE/TITLE.HTM

brent 'who learned too much today'

Attachments:

gcc.s_lock.h (text/x-chdr; charset=us-ascii), +44 -0

#9Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#7)
Re: 7.1 on DEC/Alpha

On 24 Dec 2000 at 01:19 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > (gdb) p *resSlot
| > Error accessing memory address 0x40141830: Invalid argument.
|
| Oooh. resSlot has been truncated to 32 bits --- judging by the other
| nearby pointer values, it almost certainly should have been 0x140141830.
| Now we have a lead.

FWIW, saying 'set econtext->ecxt_param_list_info->value 0x14014183' in
gdb allows the process to not SEGV where it _was_ destined to do so,
though it does SEGV in a later return to the function. I've tried to
determine where this value is originating, and where it is subsequently
modified, but have not been able to do so. lost in gdb.

Q: I tried doing 'watch <address>', but this (appeared) to just hang.
is there some trick to using 'watch' on addresses that I might be
overlooking?

| I am guessing that the truncation happened somewhere in
| executor/functions.c, but don't see it right away...

more observations WRT sql that blows up postgres on Alpha.

works:
SELECT p.hobbies.equipment.name, p.hobbies.name, p.name
FROM ONLY person p;

breaks:
SELECT p.hobbies.equipment.name, p.hobbies.name, p.name
FROM person p;
SELECT p.hobbies.equipment.name, p.hobbies.name, p.name
FROM person* p;

whatever it is that ONLY causes, avoids the breakage. I've spent the
past two days in a gdb-hole, going in circles. I just don't think I know
enough (about gdb or postgres) to make any further progress. anyway,
if someone could tell me what difference the ONLY keyword makes WRT
pg internally, it might help me quit running in circles.

thanks.
brent

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#9)
Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

Brent Verner <brent@rcfile.org> writes:

> more observations WRT sql that blows up postgres on Alpha.
> works:
> SELECT p.hobbies.equipment.name, p.hobbies.name, p.name
> FROM ONLY person p;
> breaks:
> SELECT p.hobbies.equipment.name, p.hobbies.name, p.name
> FROM person p;
> SELECT p.hobbies.equipment.name, p.hobbies.name, p.name
> FROM person* p;

OK, I see the problem. The breakage actually is present in 7.0.* and
prior versions as well, it just doesn't happen to be exposed by the
regress tests --- until now.

The trouble is the way that entire-tuple function arguments are handled.
Tuple types are declared in pg_type as being the same size as Oid, ie,
4 bytes. This reflects situations where a tuple value is represented by
an Oid reference to a row in a table. (I am not sure whether there is
any code left that depends on that ... in any case I'm nervous about
changing it during beta.) But the expression evaluator's implementation
of a tuple argument is that the Datum value contains a pointer to a
TupleTableSlot. This works fine as long as the Datum is just passed
around as a Datum, but if anyone tries to form a tuple containing that
Datum, only 4 bytes get stored into the tuple. Result: failure on
machines where pointers are wider than 4 bytes.

The reason this shows up in this particular regression test now, and
not before, is that 7.1 does the function evaluations at the top of
the Append plan that implements inheritance union, whereas 7.0 did it
at the bottom. That means that in 7.1, the TupleTableSlot Datum gets
inserted into a tuple that becomes part of the Append output before
it gets to the function execution. 7.0 would still show the bug
under the right circumstances --- a join would do it, for example.

I think that there may still be cases where an Oid is the correct
representation of a tuple type; anyway I'm afraid to foreclose that
possibility. What I'm thinking about doing is setting typmod of
an entire-tuple function argument to sizeof(Pointer), rather than
the default -1, to indicate that a pointer representation is being
used. Comments, hackers?

regards, tom lane
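
A small simulation may make the failure mode concrete. This is a sketch under stated assumptions, not the actual tuple-forming code: `Datum` is modelled as a plain 8-byte integer, and `store_byval`, `fetch_byval` and `roundtrip` are hypothetical helpers that copy a pass-by-value datum into a tuple's data area using the length declared for its type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for an 8-byte Datum, as on Alpha (assumption for this sketch). */
typedef uint64_t Datum;

/* Store a pass-by-value datum into a tuple data area, honouring the
 * declared attribute length.  With attlen == 4 the upper half of the
 * Datum is silently dropped. */
static void store_byval(unsigned char *dst, Datum value, int attlen)
{
    assert(attlen == 4 || attlen == 8);
    if (attlen == 4) {
        uint32_t narrow = (uint32_t) value;   /* high 32 bits lost here */
        memcpy(dst, &narrow, sizeof narrow);
    } else {
        memcpy(dst, &value, sizeof value);
    }
}

/* Read the datum back out of the tuple data area. */
static Datum fetch_byval(const unsigned char *src, int attlen)
{
    assert(attlen == 4 || attlen == 8);
    if (attlen == 4) {
        uint32_t narrow;
        memcpy(&narrow, src, sizeof narrow);
        return (Datum) narrow;
    }
    Datum wide;
    memcpy(&wide, src, sizeof wide);
    return wide;
}

/* Form a one-field "tuple" and extract the field again. */
static Datum roundtrip(Datum value, int attlen)
{
    unsigned char buf[sizeof(Datum)];
    store_byval(buf, value, attlen);
    return fetch_byval(buf, attlen);
}
```

Passing a pointer-like value such as 0x140141830 through `roundtrip(value, 4)` returns 0x40141830, while `roundtrip(value, 8)` returns it intact. That mirrors the situation described above: the Datum is fine while it is only passed around, and breaks once it is stored in a tuple whose type claims to be 4 bytes wide.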

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#10)
Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

I wrote:

> ... What I'm thinking about doing is setting typmod of
> an entire-tuple function argument to sizeof(Pointer), rather than
> the default -1, to indicate that a pointer representation is being
> used. Comments, hackers?

Here is a patch to current sources along this line. I have not
committed it, since I'm not sure it does the job. It doesn't break
the regress tests on my machine, but does it fix them on Alphas?
Please apply it locally and let me know what you find.

regards, tom lane

#12Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#11)
Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

On 26 Dec 2000 at 14:41 (-0500), Tom Lane wrote:
| I wrote:
| > ... What I'm thinking about doing is setting typmod of
| > an entire-tuple function argument to sizeof(Pointer), rather than
| > the default -1, to indicate that a pointer representation is being
| > used. Comments, hackers?
|
| Here is a patch to current sources along this line. I have not
| committed it, since I'm not sure it does the job. It doesn't break
| the regress tests on my machine, but does it fix them on Alphas?
| Please apply it locally and let me know what you find.

results _look_ the same from 'make check'. I'm gonna get back into
the debugger on this (I've learned a few tricks that I didn't know
when last I gdb'd on the Alpha).

brent

#13Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#11)
Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

On 26 Dec 2000 at 14:41 (-0500), Tom Lane wrote:
| I wrote:
| > ... What I'm thinking about doing is setting typmod of
| > an entire-tuple function argument to sizeof(Pointer), rather than
| > the default -1, to indicate that a pointer representation is being
| > used. Comments, hackers?
|
| Here is a patch to current sources along this line. I have not
| committed it, since I'm not sure it does the job. It doesn't break
| the regress tests on my machine, but does it fix them on Alphas?
| Please apply it locally and let me know what you find.

what I'm seeing now is much the same. FWIW, it looks like we're picking
up the cruft around

functions.c:354 paramLI->value = fcinfo->arg[paramLI->id - 1];

(both of which are type Datum)

i've been in circles trying to figure out where fcinfo->arg is filled.
can you point me toward that?

thanks for your help.
brent

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#13)
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

Brent Verner <brent@rcfile.org> writes:

> | Please apply it locally and let me know what you find.

> what I'm seeing now is much the same.

Drat. More to do, then.

> i've been in circles trying to figure out where fcinfo->arg is filled.
> can you point me toward that?

See src/backend/utils/fmgr/README and src/backend/utils/fmgr/fmgr.c.
But fmgr is probably only the carrier of disease, not the source...

regards, tom lane

#15Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#14)
Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

On 26 Dec 2000 at 23:41 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > | Please apply it locally and let me know what you find.
|
| > what I'm seeing now is much the same.

sorry, I sent the previous email w/o the details of the different
behavior. Inside ExecEvalFieldSelect(), result is now 303, instead
of 110599844 (...or whatever it was). I'm not sure if this gives
you any additional clues.

thanks.
brent

#16Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#14)
Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

On 26 Dec 2000 at 23:41 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > | Please apply it locally and let me know what you find.
|
| > what I'm seeing now is much the same.
|
| Drat. More to do, then.
|
| > i've been in circles trying to figure out where fcinfo->arg is filled.
| > can you point me toward that?
|
| See src/backend/utils/fmgr/README and src/backend/utils/fmgr/fmgr.c.
| But fmgr is probably only the carrier of disease, not the source...

ok, I've tracked this further (in the right direction I hope:).

these are the steps leading up to the assignment of the fscked
fcache->fcinfo.arg[i] at execQual.c:603, which is what will eventually
blow up ExecEvalFieldSelect.

Breakpoint 4, ExecMakeFunctionResult (fcache=0x14014e700,
arguments=0x14014c850, econtext=0x140127ae0, isNull=0x14014e390 "",
isDone=0x11fffde78) at execQual.c:652
652 if (fcache->fcinfo.nargs > 0 && !fcache->argsValid)
(gdb) print fcache->fcinfo
$56 = {flinfo = 0x14014e700, context = 0x0, resultinfo = 0x14014e7d0,
isnull = 0 '\000', nargs = 1, arg = {0 <repeats 16 times>},
argnull = '\000' <repeats 15 times>}
(gdb) cont
Breakpoint 6, ExecEvalVar (variable=0x14014c820, econtext=0x140127ae0,
isNull=0x14014e7c0 "") at execQual.c:298
298 switch (variable->varno)
(gdb) print *variable
$57 = {type = T_Var, varno = 65001, varattno = 1, vartype = 21220,
vartypmod = 8, varlevelsup = 0, varnoold = 1, varoattno = 0}
(gdb) print *econtext
$58 = {type = T_ExprContext, ecxt_scantuple = 0x14014cc58,
ecxt_innertuple = 0x0, ecxt_outertuple = 0x14014cc58,
ecxt_per_query_memory = 0x1400e6370, ecxt_per_tuple_memory = 0x1400e66a0,
ecxt_param_exec_vals = 0x0, ecxt_param_list_info = 0x0,
ecxt_aggvalues = 0x0, ecxt_aggnulls = 0x0}
(gdb) break 313
(gdb) cont
(gdb) print *slot
$60 = {type = T_TupleTableSlot, val = 0x14014e430, ttc_shouldFree = 0 '\000',
ttc_descIsNew = 1 '\001', ttc_tupleDescriptor = 0x14014ded0, ttc_buffer = 0}
(gdb) break 353
(gdb) cont
(gdb) print *heapTuple
$73 = {t_len = 48, t_self = {ip_blkid = {bi_hi = 65535, bi_lo = 65535},
ip_posid = 0}, t_tableOid = 0, t_datamcxt = 0x1400e6370,
t_data = 0x14014e450}
(gdb) print attnum
$74 = 1
(gdb) print *tuple_type
$75 = {natts = 2, attrs = 0x14014df00, constr = 0x0}
(gdb) print isNull
$76 = (bool *) 0x14014e7c0 ""
(gdb) break 359
(gdb) cont
# after heap_getattr, we have the smashed value.
(gdb) print result
$79 = 303

is this nearing the problem, or still simply witnessing symptoms?

brent 'delirious from sleep dep.'

#17Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#14)
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

On 26 Dec 2000 at 23:41 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > | Please apply it locally and let me know what you find.
|
| > what I'm seeing now is much the same.
|
| Drat. More to do, then.

after hours in the gdb-hole, I see this... maybe a clue? :)

src/backend/access/common/heaptuple.c:

    {

        /*
         * Fix me when going to a machine with more than a four-byte
         * word!
         */
        off = att_align(off, att[j]->attlen, att[j]->attalign);

        att[j]->attcacheoff = off;

        off = att_addlength(off, att[j]->attlen, tp + off);
    }

I'm pretty sure I don't know best how to fix this, but I've got some
randomly entered code compiling now :) If it passes the regression
tests I'll send it along.

brent 'glad the coffee shop in the backyard is open now :)'

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#17)
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

Brent Verner <brent@rcfile.org> writes:

> after hours in the gdb-hole, I see this... maybe a clue? :)

I don't think that comment means anything. Possibly it's a leftover
from a time when there was something unportable there. But if att_align
were broken on Alphas, you'd have a lot worse problems than what you're
seeing.

regards, tom lane
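
For readers unfamiliar with the loop Brent quoted, the idea is to round the running offset up to each attribute's alignment, cache it, and then advance by the attribute's length. The following is a minimal sketch of that pattern for fixed-length attributes only. `align_up` and `compute_offsets` are hypothetical names, and this is not the definition of PostgreSQL's `att_align` or `att_addlength` macros.

```c
#include <assert.h>
#include <stddef.h>

/* Round off up to the next multiple of alignment (a power of two). */
static size_t align_up(size_t off, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (off + alignment - 1) & ~(alignment - 1);
}

/* Compute the cached start offset of each fixed-length attribute:
 * align first, record the offset, then step past the attribute. */
static void compute_offsets(const size_t *lens, const size_t *aligns,
                            size_t natts, size_t *offsets)
{
    size_t off = 0;
    for (size_t j = 0; j < natts; j++) {
        off = align_up(off, aligns[j]);
        offsets[j] = off;
        off += lens[j];
    }
}
```

Nothing in this arithmetic depends on the machine word being four bytes, which is consistent with Tom's point that a broken alignment step would cause far more widespread failures than the one seen here.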

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#16)
Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

Brent Verner <brent@rcfile.org> writes:

> these are the steps leading up to the assignment of the fscked
> fcache->fcinfo.arg[i] at execQual.c:603, which is what will eventually
> blow up ExecEvalFieldSelect.

That looks OK as far as it goes. Inside ExecEvalVar, you need to look
at the tuple_type data structure in more detail, specifically
p *tuple_type->attrs[0]
p *tuple_type->attrs[1]
(I think the leading * is correct here, try omitting it if gdb gets
unhappy.)

> (gdb) print *variable
> $57 = {type = T_Var, varno = 65001, varattno = 1, vartype = 21220,
> vartypmod = 8, varlevelsup = 0, varnoold = 1, varoattno = 0}

That part looks promising --- vartypmod is sizeof(Pointer) not -1,
so the front-end part of my patch seems to be working. What I suspect
we'll find is that the tupledesc doesn't show sizeof the first field to
be 8 the way we want. Which would imply that I missed a place (or
multiple places :-() that needs to know about the convention for typmod
of a tuple datatype.

regards, tom lane

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#16)
Re: [HACKERS] Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)

After further study, I realized that fetchatt() and a number of other
places were not prepared to cope with 8-byte pass-by-value datatypes.
Most of them weren't checking for cases they couldn't handle, either.

Here is a revised patch for you to try (this includes yesterday's patch
plus more changes, so you'll need to reverse out the prior patch before
applying this one). NOTE you will need to do a full reconfigure and
rebuild to make this fly --- I'd suggest "make distclean" to start.

regards, tom lane
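
The kind of fix described above can be sketched as a fetch routine that handles every supported pass-by-value width, including 8 bytes, and explicitly rejects anything else instead of silently misreading it. This is an illustration only: `fetch_att` below is a hypothetical function, not the real `fetchatt()` macro from the patch, and `Datum` is again modelled as an 8-byte integer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for an 8-byte Datum (assumption for this sketch). */
typedef uint64_t Datum;

/* Fetch a pass-by-value attribute of the given width from src.
 * Returns 0 on success, or -1 if the width is one this routine does
 * not know how to handle. */
static int fetch_att(const void *src, int attlen, Datum *out)
{
    assert(src != NULL && out != NULL);
    switch (attlen) {
        case 1: {
            uint8_t v;
            memcpy(&v, src, sizeof v);
            *out = v;
            return 0;
        }
        case 2: {
            uint16_t v;
            memcpy(&v, src, sizeof v);
            *out = v;
            return 0;
        }
        case 4: {
            uint32_t v;
            memcpy(&v, src, sizeof v);
            *out = v;
            return 0;
        }
        case 8: {
            uint64_t v;   /* the case older code was missing */
            memcpy(&v, src, sizeof v);
            *out = v;
            return 0;
        }
        default:
            return -1;    /* refuse widths we cannot handle */
    }
}
```

With an 8-byte case in place, a stored value such as 0x140141830 is returned whole, and an unexpected width such as 3 is reported to the caller rather than being read as garbage.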

#21Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#21)
#23Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#23)
#25Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#24)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#25)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#25)
#28Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#27)
#29Oliver Elphick
olly@lfix.co.uk
In reply to: Tom Lane (#19)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oliver Elphick (#29)
#31Oliver Elphick
olly@lfix.co.uk
In reply to: Oliver Elphick (#29)
#32Ryan Kirkpatrick
pgsql@rkirkpat.net
In reply to: Tom Lane (#20)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ryan Kirkpatrick (#32)