Re: I might be getting closer?

Started by Bruce Momjian over 22 years ago, 21 messages
#1Bruce Momjian
pgman@candle.pha.pa.us

[ cc to hackers]

It certainly looks closer, particularly because the failure is a simple
domain constraint failure and not a more internal error.

Have you tried moving ahead a few days to see if the bug was fixed in
CVS?

---------------------------------------------------------------------------

Robert Creager wrote:

Hey Bruce,

I can get version 2003-02-01 to only fail one test, and sporadically at
that (2 out of 50 runs):

*** ./expected/domain.out	Sat Jul 26 12:24:18 2003
--- ./results/domain.out	Sat Jul 26 12:56:01 2003
***************
*** 263,269 ****
insert into domcontest values (5);
alter domain con drop constraint t;
insert into domcontest values (-5); --fails
! ERROR:  ExecEvalConstraintTest: Domain con constraint $1 failed
insert into domcontest values (42);
-- cleanup
drop domain ddef1 restrict;
--- 263,269 ----
insert into domcontest values (5);
alter domain con drop constraint t;
insert into domcontest values (-5); --fails
! ERROR:  ExecEvalConstraintTest: Domain con constraint  failed
insert into domcontest values (42);
-- cleanup
drop domain ddef1 restrict;

======================================================================

--
13:04:42 up 8 days, 17:05, 2 users, load average: 1.84, 1.24, 1.34

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#2Robert Creager
Robert_Creager@LogicalChaos.org
In reply to: Bruce Momjian (#1)

On Sat, 26 Jul 2003 16:49:27 -0400 (EDT)
Bruce Momjian <pgman@candle.pha.pa.us> said something like:

[ cc to hackers]

It certainly looks closer, particularly because the failure is a
simple domain constraint failure and not a more internal error.

Have you tried moving ahead a few days to see if the bug was fixed in
CVS?

No. I'll run 2003-02-15 next.

I just got the domain failure on 2003-01-26 after 42 passes.

--
15:03:30 up 8 days, 19:04, 2 users, load average: 2.40, 2.15, 2.31

#3Robert Creager
Robert_Creager@LogicalChaos.org
In reply to: Robert Creager (#2)
Regression test failure date.

I found it (I think)...

Looks like something was done after the 15th...

2003-02-15 passes 50/50 and 33/33 on second pass (so far)
2003-02-16 fails 6/50
    vacuum failed 1 times
    misc failed 3 times
    sanity_check failed 3 times
    inherit failed 1 times
    triggers failed 4 times
2003-02-18 fails 11/50
    constraints failed 5 times
    sanity_check failed 3 times
    misc failed 8 times
    inherit failed 2 times
    rules failed 1 times
    triggers failed 5 times
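
Per-test tallies like the ones above can be produced by scanning each run's regression.diffs for the context-diff file headers. The following is a minimal Python sketch, not Robert's actual script: the function names and the sample input are illustrative, and it assumes each run's diffs text has been kept separately.

```python
import re
from collections import Counter

# Context-diff file header written for each failed test, for example:
# "*** ./expected/domain.out<TAB>Sat Jul 26 12:24:18 2003"
HEADER = re.compile(r"^\*\*\* \./expected/(\w+)\.out\b", re.MULTILINE)

def failed_tests(diffs_text):
    """Return the set of test names that have a diff in one run."""
    return set(HEADER.findall(diffs_text))

def tally(runs):
    """Return (number of failing runs, Counter of failures per test)."""
    per_test = Counter()
    failing_runs = 0
    for text in runs:
        names = failed_tests(text)
        if names:
            failing_runs += 1
        per_test.update(names)
    return failing_runs, per_test

# Illustrative input: one clean run (empty diffs) and one failing run.
sample_runs = [
    "",
    "*** ./expected/triggers.out\tSat Nov 23 11:13:22 2002\n"
    "--- ./results/triggers.out\tSat Jul 26 20:10:18 2003\n"
    "*** 87,92 ****\n"
    "*** ./expected/misc.out\tSat Jul 26 20:03:48 2003\n",
]
failing, counts = tally(sample_runs)
print(f"fails {failing}/{len(sample_runs)}")
for name, n in sorted(counts.items()):
    print(f"{name} failed {n} times")
```

Hunk headers such as "*** 87,92 ****" are not counted, because only lines naming a file under ./expected/ match the pattern.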

Cheers,
Rob

--
17:42:41 up 8 days, 21:43, 2 users, load average: 3.62, 2.69, 2.35

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Creager (#3)
Re: Regression test failure date.

Robert Creager <Robert_Creager@LogicalChaos.org> writes:

Looks like something was done after the 15th...

2003-02-15 passes 50/50 and 33/33 on second pass (so far)
2003-02-16 fails 6/50

As far back as that! Okay, many thanks for the info --- that will help.

I'm buried in error message editing right now but will look at the diffs
in that timeframe tomorrow, unless someone beats me to it.

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Creager (#3)
Re: Regression test failure date.

Robert Creager <Robert_Creager@LogicalChaos.org> writes:

2003-02-15 passes 50/50 and 33/33 on second pass (so far)
2003-02-16 fails 6/50

I looked in the CVS logs while waiting for a compile, and the only patch
I see that goes anywhere near the locking or cache code around that time
is this one:

2003-02-17 21:13 momjian

* src/: backend/storage/lmgr/deadlock.c,
backend/storage/lmgr/lock.c, backend/storage/lmgr/proc.c,
backend/utils/adt/lockfuncs.c, include/storage/lock.h,
include/storage/proc.h: Rename 'holder' references to 'proclock'
for PROCLOCK references, for consistency.

which seems like a safe change (I assume it was just a
search-and-replace; do you recall, Bruce?) and anyway the time is not
quite right.

What time of day did your successive pulls correspond to, anyway?
(I believe my cvs2cl printout above is showing me EST.)

regards, tom lane

#6Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#5)
Re: Regression test failure date.

Tom Lane wrote:

Robert Creager <Robert_Creager@LogicalChaos.org> writes:

2003-02-15 passes 50/50 and 33/33 on second pass (so far)
2003-02-16 fails 6/50

I looked in the CVS logs while waiting for a compile, and the only patch
I see that goes anywhere near the locking or cache code around that time
is this one:

2003-02-17 21:13 momjian

* src/: backend/storage/lmgr/deadlock.c,
backend/storage/lmgr/lock.c, backend/storage/lmgr/proc.c,
backend/utils/adt/lockfuncs.c, include/storage/lock.h,
include/storage/proc.h: Rename 'holder' references to 'proclock'
for PROCLOCK references, for consistency.

which seems like a safe change (I assume it was just a
search-and-replace; do you recall, Bruce?) and anyway the time is not
quite right.

Yes, just a rename operation.

What time of day did your successive pulls correspond to, anyway?
(I believe my cvs2cl printout above is showing me EST.)

For the date range:

pgcvs log -d'2003-02-15 00:00:00 GMT<2003-02-18 00:00:00 GMT' -rHEAD

I see:

---------------------------------------------------------------------------

/src/include/optimizer/pathnode.h

tgl
Teach planner how to propagate pathkeys from sub-SELECTs in FROM up to
the outer query. (The implementation is a bit klugy, but it would take
nontrivial restructuring to make it nicer, which this is probably not
worth.) This avoids unnecessary sort steps in examples like
SELECT foo,count(*) FROM (SELECT ... ORDER BY foo,bar) sub GROUP BY foo
which means there is now a reasonable technique for controlling the
order of inputs to custom aggregates, even in the grouping case.

---
/src/test/regress/expected/case.out

tgl
COALESCE() and NULLIF() are now first-class expressions, not macros
that turn into CASE expressions. They evaluate their arguments at most
once. Patch by Kris Jurka, review and (very light) editorializing by
me.

---
/doc/TODO.detail/exists

momjian
Remove IN/EXISTS TODO.detail item.

#7Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Robert Creager (#3)
Re: Regression test failure date.

I am seeing repeatable success from a CVS of 2003-05-01, and repeatable
failure from current CVS.

I have only been running nightly parallel regression runs since June 27,
so it is possible that the parallel regression was broken in February,
fixed in May, then broken some time after that.

I will test June 1 now.

#8Robert Creager
Robert_Creager@LogicalChaos.org
In reply to: Tom Lane (#5)
Re: Regression test failure date.

On Sat, 26 Jul 2003 20:24:56 -0400
Tom Lane <tgl@sss.pgh.pa.us> said something like:

What time of day did your successive pulls correspond to, anyway?
(I believe my cvs2cl printout above is showing me EST.)

regards, tom lane

I'm on MST, and I did not specify a timezone on the cvs updates, just <cvs
update -D 2003-02-16>.

I can re-do with a specific time/date if you tell me what you want. Or
give me a range. It takes a few minutes to do a complete cvs download.
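
To see why the timezone matters here, the pull date and the commit timestamp can be put on a common clock. This is a small Python sketch under stated assumptions: that a bare "-D 2003-02-16" was taken as local midnight (MST, UTC-7), and that the cvs2cl timestamp "2003-02-17 21:13" is EST (UTC-5), as Tom believes. The fixed offsets are written out by hand rather than looked up.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zones; winter dates, so no daylight saving to consider.
MST = timezone(timedelta(hours=-7), "MST")
EST = timezone(timedelta(hours=-5), "EST")
GMT = timezone.utc

# Assumed interpretation of "cvs update -D 2003-02-16" run in MST.
pull = datetime(2003, 2, 16, 0, 0, tzinfo=MST)

# The lmgr rename commit as shown in the cvs2cl output (assumed EST).
commit = datetime(2003, 2, 17, 21, 13, tzinfo=EST)

print("pull   in GMT:", pull.astimezone(GMT))
print("commit in GMT:", commit.astimezone(GMT))
print("commit is after the 2003-02-16 pull:", commit > pull)
```

Under these assumptions the rename commit lands well after the 2003-02-16 snapshot that already fails, which is consistent with the remark that its time "is not quite right".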

Later,
Rob

--
19:10:13 up 8 days, 23:10, 2 users, load average: 0.00, 0.00, 0.00

#9Robert Creager
Robert_Creager@LogicalChaos.org
In reply to: Bruce Momjian (#7)
Re: Regression test failure date.

On Sat, 26 Jul 2003 21:08:46 -0400 (EDT)
Bruce Momjian <pgman@candle.pha.pa.us> said something like:

I am seeing repeatable success from a CVS of 2003-05-01, and
repeatable failure from current CVS.

I have only been running nightly parallel regression runs since June
27, so it is possible that the parallel regression was broken in
February, fixed in May, then broken some time after that.

I will test June 1 now.

I don't know about that, Bruce. When I grabbed 2003-05-01, I have 2
failures in 15 runs so far. One item I did have to change was to move
from bison 1.5 to bison 1.875.

I've included the first failure.

*** ./expected/triggers.out	Sat Nov 23 11:13:22 2002
--- ./results/triggers.out	Sat Jul 26 20:10:18 2003
***************
*** 87,92 ****
--- 87,93 ----
  NOTICE:  check_pkeys_fkey_cascade: 1 tuple(s) of fkeys are deleted
  NOTICE:  check_pkeys_fkey_cascade: 1 tuple(s) of fkeys2 are deleted
  DROP TABLE pkeys;
+ ERROR:  cache lookup of relation 129432 failed
  DROP TABLE fkeys;
  DROP TABLE fkeys2;
  -- -- I've disabled the funny_dup17 test because the new semantics

======================================================================

*** ./expected/sanity_check.out	Mon Aug 19 13:33:36 2002
--- ./results/sanity_check.out	Sat Jul 26 20:10:20 2003
***************
*** 58,68 ****
   pg_statistic        | t
   pg_trigger          | t
   pg_type             | t
   road                | t
   shighway            | t
   tenk1               | t
   tenk2               | t
! (52 rows)
  --
  -- another sanity check: every system catalog that has OIDs should have
--- 58,69 ----
   pg_statistic        | t
   pg_trigger          | t
   pg_type             | t
+  pkeys               | t
   road                | t
   shighway            | t
   tenk1               | t
   tenk2               | t
! (53 rows)

  --
  -- another sanity check: every system catalog that has OIDs should have

======================================================================

*** ./expected/misc.out	Sat Jul 26 20:03:48 2003
--- ./results/misc.out	Sat Jul 26 20:10:22 2003
***************
*** 633,638 ****
--- 633,639 ----
   onek2
   path_tbl
   person
+  pkeys
   point_tbl
   polygon_tbl
   ramp
***************
*** 657,663 ****
   toyemp
   varchar_tbl
   xacttest
! (93 rows)
  --SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer'))) AS equip_name;
  SELECT hobbies_by_name('basketball');
--- 658,664 ----
   toyemp
   varchar_tbl
   xacttest
! (94 rows)

  --SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer'))) AS equip_name;
  SELECT hobbies_by_name('basketball');

======================================================================

--
20:11:31 up 9 days, 12 min, 2 users, load average: 2.86, 2.30, 1.52

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#7)
Re: Regression test failure date.

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I have only been running nightly parallel regression runs since June 27,
so it is possible that the parallel regression was broken in February,
fixed in May, then broken some time after that.

Any further progress on this?

My best theory at the moment is that we have a problem with relcache
entry creation failing if it's interrupted by an SI inval message at
just the right time. I don't much want to grovel through six months
worth of changelog entries looking for candidate mistakes, though.

regards, tom lane

#11Robert Creager
Robert_Creager@LogicalChaos.org
In reply to: Tom Lane (#10)
Re: Regression test failure date.

I will stand by the fact that I cannot generate failures from
2003-02-15 (200+ runs), and I can from 2003-02-16. Just to make sure I
didn't screw up the cvs usage, I'll try again tonight if I get the
chance and re-download and re-test these two days.

I can set up a script that will step through weekly dates starting from
'now' and see if the 02-16 problem might have been fixed and then
re-introduced if you like.
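
The weekly stepping proposed above could be driven by something like the following Python sketch. It only generates the snapshot dates and prints the corresponding checkout commands; the start date (taken from the date of this message), the helper names, and the explicit GMT suffix are assumptions, and building and running the regression tests at each step is left out.

```python
from datetime import date, timedelta

def weekly_dates(start, stop, step_days=7):
    """Yield dates from start backwards in fixed steps, never before stop."""
    current = start
    while current >= stop:
        yield current
        current -= timedelta(days=step_days)

def checkout_command(day):
    # An explicit GMT time keeps the snapshot independent of the local zone.
    return f'cvs update -D "{day.isoformat()} 00:00:00 GMT"'

# Step back weekly from 'now' (assumed 2003-07-28) to the first failing date.
for day in weekly_dates(date(2003, 7, 28), date(2003, 2, 16)):
    print(checkout_command(day))
```

Because the failures are sporadic, each date would need many test runs before it can be called clean.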

2003-02-16 fails 6/50
    vacuum failed 1 times
    misc failed 3 times
    sanity_check failed 3 times
    inherit failed 1 times
    triggers failed 4 times

Cheers,
Rob

--
06:57:40 up 10 days, 10:57, 2 users, load average: 2.17, 2.08, 1.83

#12Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Robert Creager (#11)
Re: Regression test failure date.

I am testing this today. I found 2003-03-03 to not generate a failure
in 20 tests, so I am moving forward to April/May.

#13Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Bruce Momjian (#12)
Re: Regression test failure date.

I am now seeing this error in 2003-03-03.

CREATE TABLE INSERT_CHILD (cx INT default 42,
cy INT CHECK (cy > x))
INHERITS (INSERT_TBL);
+ ERROR: RelationClearRelation: relation 130996 deleted while still in use

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#13)
Re: Regression test failure date.

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I am now seeing this error in 2003-03-03.

CREATE TABLE INSERT_CHILD (cx INT default 42,
cy INT CHECK (cy > x))
INHERITS (INSERT_TBL);
+ ERROR: RelationClearRelation: relation 130996 deleted while still in use

Define "now seeing". Did you change something? Did you just run more
test cycles and it happened one time? Did it suddenly start to happen a
lot?

regards, tom lane

#15Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#14)
Re: Regression test failure date.

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I am now seeing this error in 2003-03-03.

CREATE TABLE INSERT_CHILD (cx INT default 42,
cy INT CHECK (cy > x))
INHERITS (INSERT_TBL);
+ ERROR: RelationClearRelation: relation 130996 deleted while still in use

Define "now seeing". Did you change something? Did you just run more
test cycles and it happened one time? Did it suddenly start to happen a
lot?

Ran more cycles, that's all. I had reported 2003-03-03 was fine, but
only ran a few tests that previous time. I am looking at the
mid-February date range now.
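
A rough calculation shows why a short clean streak says little for a sporadic failure. The Python sketch below assumes that runs are independent and that the per-run failure rate is close to the 6/50 reported earlier in the thread for 2003-02-16; both are assumptions for illustration, not measurements of the 2003-03-03 snapshot.

```python
import math

def prob_all_pass(fail_rate, runs):
    """Chance that `runs` independent runs all pass despite the bug."""
    return (1.0 - fail_rate) ** runs

def runs_needed(fail_rate, max_miss_prob):
    """Smallest run count whose all-pass chance is at most max_miss_prob."""
    return math.ceil(math.log(max_miss_prob) / math.log(1.0 - fail_rate))

rate = 6 / 50  # assumed per-run failure rate, from the 2003-02-16 report

print("chance of 20 clean runs with the bug present:",
      round(prob_all_pass(rate, 20), 3))
print("runs needed to get that chance below 1%:",
      runs_needed(rate, 0.01))
```

With these numbers, 20 clean runs would still occur several percent of the time on a broken snapshot, so a date should only be treated as good after a considerably longer streak.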

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#15)
Re: Regression test failure date.

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I am now seeing this error in 2003-03-03.

CREATE TABLE INSERT_CHILD (cx INT default 42,
cy INT CHECK (cy > x))
INHERITS (INSERT_TBL);
+ ERROR: RelationClearRelation: relation 130996 deleted while still in use

I have a theory about the failures that occur while creating tables.
If a relcache flush were to occur due to SI buffer overrun between
creation of the new rel's relcache entry by RelationBuildLocalRelation
and completion of the command, then you'd see an error exactly like the
above, because the relcache would try to rebuild the cache entry by
reading the pg_class and pg_attribute rows for the relation. Which
would possibly not exist yet, and even if they did exist they'd be
invisible under SnapshotNow rules.

However this bug is of long standing, and it doesn't seem all that
probable as an explanation for your difficulties. It would be worth
running the tests with log_min_messages set to DEBUG4 (along with the
verbosity setting, please) and see if you observe "cache state reset"
log entries just before the failures.
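
Checking a verbose log for that pattern can be automated. The Python sketch below looks only for the two message fragments quoted in this thread ("cache state reset" and "deleted while still in use"); the function name, the 50-line look-back window, and the sample log lines are hypothetical, and real server log lines will carry prefixes that this substring match ignores.

```python
def errors_preceded_by_reset(log_lines, window=50):
    """For each 'deleted while still in use' error, report whether a
    'cache state reset' message appears in the preceding `window` lines.

    Returns a list of (line_index, saw_reset) pairs.
    """
    results = []
    for i, line in enumerate(log_lines):
        if "deleted while still in use" in line:
            recent = log_lines[max(0, i - window):i]
            saw_reset = any("cache state reset" in prev for prev in recent)
            results.append((i, saw_reset))
    return results

# Hypothetical log excerpt, for illustration only.
sample = [
    "DEBUG:  cache state reset",
    "LOG:  statement: CREATE TABLE CHECK2_TBL (x int, y text, z int);",
    "ERROR:  relation 126581 deleted while still in use",
]
for index, saw_reset in errors_preceded_by_reset(sample):
    print(f"error at line {index}: preceded by reset = {saw_reset}")
```

A reset shortly before the error would support the SI-overrun theory; its absence would point elsewhere.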

In any case this would not explain failures during DROP TABLE, so
there's another issue to look for.

regards, tom lane

#17Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#16)
1 attachment(s)
Re: Regression test failure date.

Tom, is the attached regression diff considered normal? This was
generated by current CVS.

I am trying to determine what is a normal error and what is something to
be concerned about.

Also, I am up to Feb 25 with no errors, but am still testing.

Attachments:

/pg/test/regress/regression.diffs (text/plain)
*** ./expected/constraints.out	Mon Jul 28 13:50:13 2003
--- ./results/constraints.out	Mon Jul 28 18:32:55 2003
***************
*** 80,102 ****
  CREATE TABLE CHECK2_TBL (x int, y text, z int,
  	CONSTRAINT SEQUENCE_CON
  	CHECK (x > 3 and y <> 'check failed' and z < 8));
  INSERT INTO CHECK2_TBL VALUES (4, 'check ok', -2);
  INSERT INTO CHECK2_TBL VALUES (1, 'x check failed', -2);
! ERROR:  new row for relation "check2_tbl" violates CHECK constraint "sequence_con"
  INSERT INTO CHECK2_TBL VALUES (5, 'z check failed', 10);
! ERROR:  new row for relation "check2_tbl" violates CHECK constraint "sequence_con"
  INSERT INTO CHECK2_TBL VALUES (0, 'check failed', -2);
! ERROR:  new row for relation "check2_tbl" violates CHECK constraint "sequence_con"
  INSERT INTO CHECK2_TBL VALUES (6, 'check failed', 11);
! ERROR:  new row for relation "check2_tbl" violates CHECK constraint "sequence_con"
  INSERT INTO CHECK2_TBL VALUES (7, 'check ok', 7);
  SELECT '' AS two, * from CHECK2_TBL;
!  two | x |    y     | z  
! -----+---+----------+----
!      | 4 | check ok | -2
!      | 7 | check ok |  7
! (2 rows)
! 
  --
  -- Check constraints on INSERT
  --
--- 80,100 ----
  CREATE TABLE CHECK2_TBL (x int, y text, z int,
  	CONSTRAINT SEQUENCE_CON
  	CHECK (x > 3 and y <> 'check failed' and z < 8));
+ ERROR:  relation 126581 deleted while still in use
  INSERT INTO CHECK2_TBL VALUES (4, 'check ok', -2);
+ ERROR:  relation "check2_tbl" does not exist
  INSERT INTO CHECK2_TBL VALUES (1, 'x check failed', -2);
! ERROR:  relation "check2_tbl" does not exist
  INSERT INTO CHECK2_TBL VALUES (5, 'z check failed', 10);
! ERROR:  relation "check2_tbl" does not exist
  INSERT INTO CHECK2_TBL VALUES (0, 'check failed', -2);
! ERROR:  relation "check2_tbl" does not exist
  INSERT INTO CHECK2_TBL VALUES (6, 'check failed', 11);
! ERROR:  relation "check2_tbl" does not exist
  INSERT INTO CHECK2_TBL VALUES (7, 'check ok', 7);
+ ERROR:  relation "check2_tbl" does not exist
  SELECT '' AS two, * from CHECK2_TBL;
! ERROR:  relation "check2_tbl" does not exist
  --
  -- Check constraints on INSERT
  --

======================================================================

*** ./expected/misc.out	Mon Jul 28 13:50:13 2003
--- ./results/misc.out	Mon Jul 28 18:33:04 2003
***************
*** 580,586 ****
   c
   c_star
   char_tbl
-  check2_tbl
   check_seq
   check_tbl
   circle_tbl
--- 580,585 ----
***************
*** 660,666 ****
   toyemp
   varchar_tbl
   xacttest
! (96 rows)
  
  --SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer'))) AS equip_name;
  SELECT hobbies_by_name('basketball');
--- 659,665 ----
   toyemp
   varchar_tbl
   xacttest
! (95 rows)
  
  --SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer'))) AS equip_name;
  SELECT hobbies_by_name('basketball');

======================================================================

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#17)
Re: Regression test failure date.

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom, is the attached regression diff considered normal? This was
generated by current CVS.

Well, this *looks* like it could be an example of the SI-overrun-
during-create behavior I was talking about. But if you weren't running
a verbose log to show whether a cache flush occurred just before the
error, there's no way to know for sure.

Right at the moment I am more interested in the other cases though
(cache lookup failure during DROP) since I have no plausible
explanation for them.

regards, tom lane

#19Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#18)
Re: Regression test failure date.

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom, is the attached regression diff considered normal? This was
generated by current CVS.

Well, this *looks* like it could be an example of the SI-overrun-
during-create behavior I was talking about. But if you weren't running
a verbose log to show whether a cache flush occurred just before the
error, there's no way to know for sure.

OK.

Right at the moment I am more interested in the other cases though
(cache lookup failure during DROP) since I have no plausible
explanation for them.

Thanks. That's what I need to know.

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#17)
Re: Regression test failure date.

I said:

I have a theory about the failures that occur while creating tables.
If a relcache flush were to occur due to SI buffer overrun between
creation of the new rel's relcache entry by RelationBuildLocalRelation
and completion of the command, then you'd see an error exactly like the
above, because the relcache would try to rebuild the cache entry by
reading the pg_class and pg_attribute rows for the relation.

After further study, though, the above theory falls flat on its face:
the relcache does *not* attempt to rebuild new relcache entries after
an SI overrun (see the comments to RelationCacheInvalidate). So I'm
back to wondering what the heck is causing any of these messages.

I think we really need to see a stack trace from one of the failures.
Could you try running CVS tip with an "abort()" call replacing the
"relation %u deleted while still in use" elog? (It's line 1797
in src/backend/utils/cache/relcache.c in CVS tip.) Then when you
get the failure, get a stack trace with gdb from the core dump.

regards, tom lane

#21Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#20)
Re: Regression test failure date.

OK, on it now!