latest snapshot crashes backend

Started by Oliver Elphickabout 27 years ago8 messages
#1Oliver Elphick
olly@lfix.co.uk
1 attachment(s)

The attached commands crash the backend, which exits with status 11. The
backend log (debug level 255) does not show anything that is meaningful to
me. It shows a query, which seems to relate to the final constraint, and
then terminates without further explanation:

...
query: select 1 from individual where NOT ( surname IS NULL AND forenames IS
NUL
L )
parser outputs:
{ QUERY
...
}
after rewriting:
{ QUERY
...
}
/usr/lib/postgresql/bin/postmaster: reaping dead processes...
/usr/lib/postgresql/bin/postmaster: CleanupProc: pid 8970 exited with status 11
...

I don't know where to start looking. I can trace the backend with the
debugger if you will tell me which routine to break at.

Attachments:

tempaudio/basic; name=tempDownload
#2Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Oliver Elphick (#1)
Re: [HACKERS] latest snapshot crashes backend

The attached commands crash the backend, which exits with status 11.

tgl=> select * from x where not (i is null or c is null);
i|c
-+-
1|T
2|A
0|T
(3 rows)

tgl=> select * from x where not (i is null and c is null);
pqReadData() -- backend closed the channel unexpectedly.

So, let's try rewriting it as a workaround:

tgl=> select * from x where (not i is null) or (not c is null);
pqReadData() -- backend closed the channel unexpectedly.

Oops. For some reason the NOT/AND is fatal, while NOT/OR is OK. That's
not so good. The good news is that it will certainly be repairable with
patches.

- Tom

#3Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#2)
Re: [HACKERS] latest snapshot crashes backend

The attached commands crash the backend, which exits with status 11.

tgl=> select * from x where not (i is null or c is null);
i|c
-+-
1|T
2|A
0|T
(3 rows)

tgl=> select * from x where not (i is null and c is null);
pqReadData() -- backend closed the channel unexpectedly.

So, let's try rewriting it as a workaround:

tgl=> select * from x where (not i is null) or (not c is null);
pqReadData() -- backend closed the channel unexpectedly.

Oops. For some reason the NOT/AND is fatal, while NOT/OR is OK. That's
not so good. The good news is that it will certainly be repairable with
patches.

Care to give us a table:

select * from pg_shadow where (usesysid is null and oid is null)\g

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#4Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#3)
Re: [HACKERS] latest snapshot crashes backend

Care to give us a table:
select * from pg_shadow where (usesysid is null and oid is null);

Sure, that's easy. Just negate your where clause:

tgl=> select * from pg_shadow
tgl-> where not (usesysid is null and oid is null);
pqReadData() -- backend closed the channel unexpectedly.

If you can reproduce this, will you have a chance to look at it? I don't
know where the problem is, but suspect that it is farther back than the
parser transformations. Rewrite system, optimizer, or executor??

- Tom

#5Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#4)
Re: [HACKERS] latest snapshot crashes backend

Care to give us a table:
select * from pg_shadow where (usesysid is null and oid is null);

Sure, that's easy. Just negate your where clause:

tgl=> select * from pg_shadow
tgl-> where not (usesysid is null and oid is null);
pqReadData() -- backend closed the channel unexpectedly.

If you can reproduce this, will you have a chance to look at it? I don't
know where the problem is, but suspect that it is farther back than the
parser transformations. Rewrite system, optimizer, or executor??

I will look at it. I can reproduce it here.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#6Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#5)
Re: [HACKERS] latest snapshot crashes backend

tgl=> select * from pg_shadow
tgl-> where not (usesysid is null and oid is null);
pqReadData() -- backend closed the channel unexpectedly.

btw, you may already have noticed that this doesn't crash if the same
clause is changed to be a target:

tgl=> select not (usesysid is null and oid is null) from pg_shadow;
?column?
--------
t
t
(2 rows)

- Thomas

#7Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#4)
Re: [HACKERS] latest snapshot crashes backend

Care to give us a table:
select * from pg_shadow where (usesysid is null and oid is null);

Sure, that's easy. Just negate your where clause:

tgl=> select * from pg_shadow
tgl-> where not (usesysid is null and oid is null);
pqReadData() -- backend closed the channel unexpectedly.

If you can reproduce this, will you have a chance to look at it? I don't
know where the problem is, but suspect that it is farther back than the
parser transformations. Rewrite system, optimizer, or executor??

Here is the problem:

(gdb) print ((Expr *) clause)->oper
$1 = (Node *) 0x0

We have dealt with this before in relation to NOT. The system assumes
NOT is an OP_EXPR oper, while it really isn't.

I fixed another query about a month ago relating to this. In that case,
flatten_tlistentry() was loosing information about EXPR nodes.

In this case, the code:

Oid opno = ((Oper *) ((Expr *) clause)->oper)->opno;

clearly is making an assumption it should not be making.

In this case, clause is:

(gdb) print ((Expr *)clause)[0]
$3 = {type = T_Expr, typeOid = 0, opType = NOT_EXPR, oper = 0x0,
args = 0x8307f90}

where opType is not OP_EXPR, but NOT_EXPR.

I am inclined to check for NOT_EXPR, let the selectivity be computed on
the subclause, and negate(NOT) the resulting selectivity. Not sure on
an exact fix yet, but this is a 6.4.1 issue, anyway.

Problem is in the optimizer:

---------------------------------------------------------------------------

#0 0x80bb1c4 in compute_selec (root=0x82d8590, clauses=0x8307fb0,
or_selectivities=0x0) at clausesel.c:274
#1 0x80bb0cb in compute_clause_selec (root=0x82d8590, clause=0x830ea50,
or_selectivities=0x0) at clausesel.c:182
#2 0x80bfc9c in add_clause_to_rels (root=0x82d8590, clause=0x830ea50)
at initsplan.c:211
#3 0x80bfbea in init_base_rels_qual (root=0x82d8590, clauses=0x830f050)
at initsplan.c:155
#4 0x80c01bb in subplanner (root=0x82d8590, flat_tlist=0x830f170,
qual=0x830f050) at planmain.c:262
#5 0x80c00f8 in query_planner (root=0x82d8590, command_type=1,
tlist=0x83077d0, qual=0x83063d0) at planmain.c:176
#6 0x80c085d in union_planner (parse=0x82d8590) at planner.c:151
#7 0x80c06d5 in planner (parse=0x82d8590) at planner.c:72
#8 0x80f56b4 in pg_parse_and_plan (
query_string=0x80455e4 "select * from pg_shadow where not (usesysid is null and oid is null);\n", typev=0x0, nargs=0, queryListP=0x8045594, dest=Debug,
aclOverride=0 '\000') at postgres.c:628
#9 0x80f57f3 in pg_exec_query_dest (
query_string=0x80455e4 "select * from pg_shadow where not (usesysid is null and oid is null);\n", dest=Debug, aclOverride=0) at postgres.c:722

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#8Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Thomas G. Lockhart (#4)
Re: [HACKERS] latest snapshot crashes backend

Care to give us a table:
select * from pg_shadow where (usesysid is null and oid is null);

Sure, that's easy. Just negate your where clause:

tgl=> select * from pg_shadow
tgl-> where not (usesysid is null and oid is null);
pqReadData() -- backend closed the channel unexpectedly.

If you can reproduce this, will you have a chance to look at it? I don't
know where the problem is, but suspect that it is farther back than the
parser transformations. Rewrite system, optimizer, or executor??

OK, here is the fix. I am not applying it yet until 6.4 is released by
Marc, OK? In fact, I am not sure how we are going to do patch
application after 6.4, so would someone else please apply this?

---------------------------------------------------------------------------

*** ./backend/optimizer/path/clausesel.c.orig	Wed Nov  4 16:49:35 1998
--- ./backend/optimizer/path/clausesel.c	Wed Nov  4 17:11:02 1998
***************
*** 254,259 ****
--- 254,264 ----
  		 */
  		s1 = 0.1;
  	}
+ 	else if (not_clause((Node *) clause))
+ 	{
+ 		/* negate this baby */
+ 		return 1 - compute_selec(root, ((Expr *)clause)->args, or_selectivities);
+ 	}
  	else if (is_subplan((Node *) clause))
  	{
-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026