Re: I might be getting closer?
[ cc to hackers]
It certainly looks closer, particularly because the failure is a simple
domain constraint failure and not a more internal error.
Have you tried moving ahead a few days to see if the bug was fixed in
CVS?
---------------------------------------------------------------------------
Robert Creager wrote:
-- Start of PGP signed section.
Hey Bruce,
I can get version 2003-02-01 to only fail one test, and sporadically at
that (2 out of 50 runs):
*** ./expected/domain.out Sat Jul 26 12:24:18 2003
--- ./results/domain.out Sat Jul 26 12:56:01 2003
***************
*** 263,269 ****
insert into domcontest values (5);
alter domain con drop constraint t;
insert into domcontest values (-5); --fails
! ERROR: ExecEvalConstraintTest: Domain con constraint $1 failed
insert into domcontest values (42);
-- cleanup
drop domain ddef1 restrict;
--- 263,269 ----
insert into domcontest values (5);
alter domain con drop constraint t;
insert into domcontest values (-5); --fails
! ERROR: ExecEvalConstraintTest: Domain con constraint failed
insert into domcontest values (42);
-- cleanup
drop domain ddef1 restrict;
======================================================================
--
13:04:42 up 8 days, 17:05, 2 users, load average: 1.84, 1.24, 1.34
-- End of PGP section, PGP failed!
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Sat, 26 Jul 2003 16:49:27 -0400 (EDT)
Bruce Momjian <pgman@candle.pha.pa.us> said something like:
[ cc to hackers]
It certainly looks closer, particularly because the failure is a
simple domain constraint failure and not a more internal error.
Have you tried moving ahead a few days to see if the bug was fixed in
CVS?
No. I'll run 2003-02-15 next.
I just got the domain failure on 2003-01-26 after 42 passes.
--
15:03:30 up 8 days, 19:04, 2 users, load average: 2.40, 2.15, 2.31
I found it (I think)...
Looks like something was done after the 15th...
2003-02-15 passes 50/50 and 33/33 on second pass (so far)
2003-02-16 fails 6/50
vacuum failed 1 times
misc failed 3 times
sanity_check failed 3 times
inherit failed 1 times
triggers failed 4 times
2003-02-18 fails 11/50
constraints failed 5 times
sanity_check failed 3 times
misc failed 8 times
inherit failed 2 times
rules failed 1 times
triggers failed 5 times
Cheers,
Rob
--
17:42:41 up 8 days, 21:43, 2 users, load average: 3.62, 2.69, 2.35
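Per-test counts like the ones above can be produced mechanically. The following is a minimal sketch, not the script Rob actually used. It assumes each run leaves a regression.diffs text whose hunks begin with a `*** ./expected/NAME.out` header, as in the diffs quoted in this thread; the function names `failed_tests` and `tally` are made up for illustration.

```python
import re
from collections import Counter

# File header of a context diff as it appears in regression.diffs, e.g.
# "*** ./expected/domain.out Sat Jul 26 12:24:18 2003"
HEADER = re.compile(r"^\*\*\* \./expected/(\w+)\.out\b", re.MULTILINE)

def failed_tests(diff_text):
    """Return the set of test names that have a diff in one regression.diffs."""
    return set(HEADER.findall(diff_text))

def tally(diff_texts):
    """Summarize many runs: (failed runs, total runs, per-test failure counts)."""
    per_test = Counter()
    failed_runs = 0
    for text in diff_texts:
        names = failed_tests(text)
        if names:
            failed_runs += 1
        # Each test counts at most once per run.
        per_test.update(names)
    return failed_runs, len(diff_texts), per_test
```

An empty string stands for a clean run, so a list of 50 texts with 6 non-empty entries would report 6 failed runs out of 50.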
Robert Creager <Robert_Creager@LogicalChaos.org> writes:
Looks like something was done after the 15th...
2003-02-15 passes 50/50 and 33/33 on second pass (so far)
2003-02-16 fails 6/50
As far back as that! Okay, many thanks for the info --- that will help.
I'm buried in error message editing right now but will look at the diffs
in that timeframe tomorrow, unless someone beats me to it.
regards, tom lane
Robert Creager <Robert_Creager@LogicalChaos.org> writes:
2003-02-15 passes 50/50 and 33/33 on second pass (so far)
2003-02-16 fails 6/50
I looked in the CVS logs while waiting for a compile, and the only patch
I see that goes anywhere near the locking or cache code around that time
is this one:
2003-02-17 21:13 momjian
* src/: backend/storage/lmgr/deadlock.c,
backend/storage/lmgr/lock.c, backend/storage/lmgr/proc.c,
backend/utils/adt/lockfuncs.c, include/storage/lock.h,
include/storage/proc.h: Rename 'holder' references to 'proclock'
for PROCLOCK references, for consistency.
which seems like a safe change (I assume it was just a
search-and-replace; do you recall, Bruce?) and anyway the time is not
quite right.
What time of day did your successive pulls correspond to, anyway?
(I believe my cvs2cl printout above is showing me EST.)
regards, tom lane
Tom Lane wrote:
Robert Creager <Robert_Creager@LogicalChaos.org> writes:
2003-02-15 passes 50/50 and 33/33 on second pass (so far)
2003-02-16 fails 6/50
I looked in the CVS logs while waiting for a compile, and the only patch
I see that goes anywhere near the locking or cache code around that time
is this one:
2003-02-17 21:13 momjian
* src/: backend/storage/lmgr/deadlock.c,
backend/storage/lmgr/lock.c, backend/storage/lmgr/proc.c,
backend/utils/adt/lockfuncs.c, include/storage/lock.h,
include/storage/proc.h: Rename 'holder' references to 'proclock'
for PROCLOCK references, for consistency.
which seems like a safe change (I assume it was just a
search-and-replace; do you recall, Bruce?) and anyway the time is not
quite right.
Yes, just a rename operation.
What time of day did your successive pulls correspond to, anyway?
(I believe my cvs2cl printout above is showing me EST.)
For the date range:
pgcvs log -d'2003-02-15 00:00:00 GMT<2003-02-18 00:00:00 GMT' -rHEAD
I see:
---------------------------------------------------------------------------
/src/include/optimizer/pathnode.h
tgl
Teach planner how to propagate pathkeys from sub-SELECTs in FROM up to
the outer query. (The implementation is a bit klugy, but it would take
nontrivial restructuring to make it nicer, which this is probably not
worth.) This avoids unnecessary sort steps in examples like
SELECT foo,count(*) FROM (SELECT ... ORDER BY foo,bar) sub GROUP BY foo
which means there is now a reasonable technique for controlling the
order of inputs to custom aggregates, even in the grouping case.
---
/src/test/regress/expected/case.out
tgl
COALESCE() and NULLIF() are now first-class expressions, not macros
that turn into CASE expressions. They evaluate their arguments at most
once. Patch by Kris Jurka, review and (very light) editorializing by
me.
---
/doc/TODO.detail/exists
momjian
Remove IN/EXISTS TODO.detail item.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
I am seeing repeatable success from a CVS of 2003-05-01, and repeatable
failure from current CVS.
I have only been running nightly parallel regression runs since June 27,
so it is possible that the parallel regression was broken in February,
fixed in May, then broken some time after that.
I will test June 1 now.
---------------------------------------------------------------------------
Robert Creager wrote:
-- Start of PGP signed section.
I found it (I think)...
Looks like something was done after the 15th...
2003-02-15 passes 50/50 and 33/33 on second pass (so far)
2003-02-16 fails 6/50
vacuum failed 1 times
misc failed 3 times
sanity_check failed 3 times
inherit failed 1 times
triggers failed 4 times
2003-02-18 fails 11/50
constraints failed 5 times
sanity_check failed 3 times
misc failed 8 times
inherit failed 2 times
rules failed 1 times
triggers failed 5 times
Cheers,
Rob
--
17:42:41 up 8 days, 21:43, 2 users, load average: 3.62, 2.69, 2.35
-- End of PGP section, PGP failed!
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Sat, 26 Jul 2003 20:24:56 -0400
Tom Lane <tgl@sss.pgh.pa.us> said something like:
What time of day did your successive pulls correspond to, anyway?
(I believe my cvs2cl printout above is showing me EST.)
regards, tom lane
I'm in MST, and I did not specify a time zone on the cvs updates, just <cvs
update -D 2003-02-16>
I can redo them with a specific date and time if you tell me what you want, or
give me a range. It takes a few minutes to do a complete cvs download.
Later,
Rob
--
19:10:13 up 8 days, 23:10, 2 users, load average: 0.00, 0.00, 0.00
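The time zone matters because a bare date such as 2003-02-16 names a different instant depending on whose midnight is meant. The sketch below is illustrative only. It assumes CVS resolves a date without a zone as midnight in the client's local zone (my understanding of its behavior, not something verified in this thread), that the 21:13 commit stamp is in EST as Tom believes, and it uses fixed offsets of UTC-7 for MST and UTC-5 for EST. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets; no daylight saving applies in February.
MST = timezone(timedelta(hours=-7), "MST")
EST = timezone(timedelta(hours=-5), "EST")

def as_utc(date_str, tz):
    """Interpret 'YYYY-MM-DD' as midnight in tz and return the UTC instant."""
    local_midnight = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=tz)
    return local_midnight.astimezone(timezone.utc)

def commit_in_pull(commit_str, commit_tz, pull_date, pull_tz):
    """True if a commit stamped 'YYYY-MM-DD HH:MM' is no later than the pull cutoff."""
    commit = datetime.strptime(commit_str, "%Y-%m-%d %H:%M").replace(tzinfo=commit_tz)
    return commit <= as_utc(pull_date, pull_tz)

# Cutoff of a 2003-02-16 pull made from an MST machine.
print(as_utc("2003-02-16", MST))
# Is the 'holder' -> 'proclock' rename in the 02-16 and 02-18 pulls?
print(commit_in_pull("2003-02-17 21:13", EST, "2003-02-16", MST))
print(commit_in_pull("2003-02-17 21:13", EST, "2003-02-18", MST))
```

Under these assumptions the rename patch is absent from the 02-16 tree and present in the 02-18 one, which fits the remark that its timing is not quite right for the first failures. Giving an explicit zone with the date, as the GMT suffix in the cvs log range later in this thread does, should remove the ambiguity.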
On Sat, 26 Jul 2003 21:08:46 -0400 (EDT)
Bruce Momjian <pgman@candle.pha.pa.us> said something like:
I am seeing repeatable success from a CVS of 2003-05-01, and
repeatable failure from current CVS.
I have only been running nightly parallel regression runs since June
27, so it is possible that the parallel regression was broken in
February, fixed in May, then broken some time after that.
I will test June 1 now.
I don't know about that, Bruce. With 2003-05-01 I have 2 failures in
15 runs so far. One thing I did have to change was to move from
bison 1.5 to bison 1.875.
I've included the first failure.
*** ./expected/triggers.out Sat Nov 23 11:13:22 2002
--- ./results/triggers.out Sat Jul 26 20:10:18 2003
***************
*** 87,92 ****
--- 87,93 ----
NOTICE: check_pkeys_fkey_cascade: 1 tuple(s) of fkeys are deleted
NOTICE: check_pkeys_fkey_cascade: 1 tuple(s) of fkeys2 are deleted
DROP TABLE pkeys;
+ ERROR: cache lookup of relation 129432 failed
DROP TABLE fkeys;
DROP TABLE fkeys2;
-- -- I've disabled the funny_dup17 test because the new semantics
======================================================================
*** ./expected/sanity_check.out Mon Aug 19 13:33:36 2002
--- ./results/sanity_check.out Sat Jul 26 20:10:20 2003
***************
*** 58,68 ****
pg_statistic | t
pg_trigger | t
pg_type | t
road | t
shighway | t
tenk1 | t
tenk2 | t
! (52 rows)
--
-- another sanity check: every system catalog that has OIDs should have
--- 58,69 ----
pg_statistic | t
pg_trigger | t
pg_type | t
+ pkeys | t
road | t
shighway | t
tenk1 | t
tenk2 | t
! (53 rows)
--
-- another sanity check: every system catalog that has OIDs should have
======================================================================
*** ./expected/misc.out Sat Jul 26 20:03:48 2003
--- ./results/misc.out Sat Jul 26 20:10:22 2003
***************
*** 633,638 ****
--- 633,639 ----
onek2
path_tbl
person
+ pkeys
point_tbl
polygon_tbl
ramp
***************
*** 657,663 ****
toyemp
varchar_tbl
xacttest
! (93 rows)
--SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer'))) AS equip_name;
SELECT hobbies_by_name('basketball');
--- 658,664 ----
toyemp
varchar_tbl
xacttest
! (94 rows)
--SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer'))) AS equip_name;
SELECT hobbies_by_name('basketball');
======================================================================
--
20:11:31 up 9 days, 12 min, 2 users, load average: 2.86, 2.30, 1.52
Bruce Momjian <pgman@candle.pha.pa.us> writes:
I have only been running nightly parallel regression runs since June 27,
so it is possible that the parallel regression was broken in February,
fixed in May, then broken some time after that.
Any further progress on this?
My best theory at the moment is that we have a problem with relcache
entry creation failing if it's interrupted by an SI inval message at
just the right time. I don't much want to grovel through six months
worth of changelog entries looking for candidate mistakes, though.
regards, tom lane
I will stand by the fact that I cannot generate failures from
2003-02-15 (200+ runs), and I can from 2003-02-16. Just to make sure I
didn't screw up the cvs usage, I'll try again tonight if I get the
chance and re-download and re-test these two days.
I can set up a script that will step through weekly dates starting from
'now' and see if the 02-16 problem might have been fixed and then
re-introduced if you like.
2003-02-16 fails 6/50
vacuum failed 1 times
misc failed 3 times
sanity_check failed 3 times
inherit failed 1 times
triggers failed 4 times
Cheers,
Rob
On Mon, 28 Jul 2003 02:14:32 -0400
Tom Lane <tgl@sss.pgh.pa.us> said something like:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
I have only been running nightly parallel regression runs since June
27, so it is possible that the parallel regression was broken in
February, fixed in May, then broken some time after that.
Any further progress on this?
My best theory at the moment is that we have a problem with relcache
entry creation failing if it's interrupted by an SI inval message at
just the right time. I don't much want to grovel through six months
worth of changelog entries looking for candidate mistakes, though.
regards, tom lane
--
06:57:40 up 10 days, 10:57, 2 users, load average: 2.17, 2.08, 1.83
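The weekly stepping proposed in this message could look roughly like the sketch below. It only generates the dates and the command lines; the start date of 2003-07-28 is an assumption standing in for 'now', and the helper names are hypothetical. The command uses the same form as the `cvs update -D 2003-02-16` invocation quoted earlier in the thread.

```python
from datetime import date, timedelta

def weekly_dates(start, stop, step_days=7):
    """Yield dates from start backwards in fixed steps, stopping before passing stop."""
    current = start
    while current >= stop:
        yield current
        current -= timedelta(days=step_days)

def cvs_update_command(day):
    """Build the argument list for one dated pull, in the form used in this thread."""
    return ["cvs", "update", "-D", day.isoformat()]

# Assumed range: from 'now' (taken as 2003-07-28) back to the first failing date.
for day in weekly_dates(date(2003, 7, 28), date(2003, 2, 16)):
    print(" ".join(cvs_update_command(day)))
```

Because the failure is intermittent, each date would still need enough regression runs to be trusted before it is marked as passing.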
I am testing this today. I found 2003-03-03 to not generate a failure
in 20 tests, so I am moving forward to April/May.
---------------------------------------------------------------------------
Robert Creager wrote:
-- Start of PGP signed section.
I will stand by the fact that I cannot generate failures from
2003-02-15 (200+ runs), and I can from 2003-02-16. Just to make sure I
didn't screw up the cvs usage, I'll try again tonight if I get the
chance and re-download and re-test these two days.
I can set up a script that will step through weekly dates starting from
'now' and see if the 02-16 problem might have been fixed and then
re-introduced if you like.
2003-02-16 fails 6/50
vacuum failed 1 times
misc failed 3 times
sanity_check failed 3 times
inherit failed 1 times
triggers failed 4 times
Cheers,
Rob
On Mon, 28 Jul 2003 02:14:32 -0400
Tom Lane <tgl@sss.pgh.pa.us> said something like:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
I have only been running nightly parallel regression runs since June
27, so it is possible that the parallel regression was broken in
February, fixed in May, then broken some time after that.
Any further progress on this?
My best theory at the moment is that we have a problem with relcache
entry creation failing if it's interrupted by an SI inval message at
just the right time. I don't much want to grovel through six months
worth of changelog entries looking for candidate mistakes, though.
regards, tom lane
--
06:57:40 up 10 days, 10:57, 2 users, load average: 2.17, 2.08, 1.83
-- End of PGP section, PGP failed!
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
I am now seeing this error in 2003-03-03.
CREATE TABLE INSERT_CHILD (cx INT default 42,
cy INT CHECK (cy > x))
INHERITS (INSERT_TBL);
+ ERROR: RelationClearRelation: relation 130996 deleted while still in use
---------------------------------------------------------------------------
Bruce Momjian wrote:
I am testing this today. I found 2003-03-03 to not generate a failure
in 20 tests, so I am moving forward to April/May.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
I am now seeing this error in 2003-03-03.
CREATE TABLE INSERT_CHILD (cx INT default 42,
cy INT CHECK (cy > x))
INHERITS (INSERT_TBL);
+ ERROR: RelationClearRelation: relation 130996 deleted while still in use
Define "now seeing". Did you change something? Did you just run more
test cycles and it happened one time? Did it suddenly start to happen a
lot?
regards, tom lane
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
I am now seeing this error in 2003-03-03.
CREATE TABLE INSERT_CHILD (cx INT default 42,
cy INT CHECK (cy > x))
INHERITS (INSERT_TBL);
+ ERROR: RelationClearRelation: relation 130996 deleted while still in use
Define "now seeing". Did you change something? Did you just run more
test cycles and it happened one time? Did it suddenly start to happen a
lot?
Ran more cycles, that's all. I had reported 2003-03-03 was fine, but
only ran a few tests that previous time. I am looking at the
mid-February date range now.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
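A rough calculation shows why a short clean streak says little about an intermittent failure. This is a sketch under loud assumptions: runs are treated as independent, and the per-run failure rate is taken to be the 6/50 reported for 2003-02-16 earlier in the thread, which need not hold for other dates. The function names are hypothetical.

```python
import math

def p_all_clean(fail_rate, runs):
    """Probability that `runs` independent runs all pass at the given per-run failure rate."""
    return (1.0 - fail_rate) ** runs

def runs_needed(fail_rate, miss_chance=0.05):
    """Smallest run count for which an all-clean streak has probability <= miss_chance."""
    return math.ceil(math.log(miss_chance) / math.log(1.0 - fail_rate))

rate = 6 / 50  # assumed per-run failure rate, from the 2003-02-16 report
print(round(p_all_clean(rate, 20), 3))  # chance that 20 runs all pass despite the bug
print(runs_needed(rate))                # runs needed before a clean streak is convincing
```

At that rate, 20 clean runs would still happen by chance roughly 8% of the time, so a first pass of that size could easily miss the problem, while a streak of 200 or more clean runs is very unlikely unless the rate is far lower.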
Bruce Momjian <pgman@candle.pha.pa.us> writes:
I am now seeing this error in 2003-03-03.
CREATE TABLE INSERT_CHILD (cx INT default 42,
cy INT CHECK (cy > x))
INHERITS (INSERT_TBL);
+ ERROR: RelationClearRelation: relation 130996 deleted while still in use
I have a theory about the failures that occur while creating tables.
If a relcache flush were to occur due to SI buffer overrun between
creation of the new rel's relcache entry by RelationBuildLocalRelation
and completion of the command, then you'd see an error exactly like the
above, because the relcache would try to rebuild the cache entry by
reading the pg_class and pg_attribute rows for the relation. Which
would possibly not exist yet, and even if they did exist they'd be
invisible under SnapshotNow rules.
However this bug is of long standing, and it doesn't seem all that
probable as an explanation for your difficulties. It would be worth
running the tests with log_min_messages set to DEBUG4 (along with the
verbosity setting, please) and see if you observe "cache state reset"
log entries just before the failures.
In any case this would not explain failures during DROP TABLE, so
there's another issue to look for.
regards, tom lane
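Checking a verbose log for "cache state reset" entries just before a failure can be partly automated. The sketch below is only a starting point. It assumes the log is plain text, that error lines contain the substring "ERROR", and that "just before" means within an arbitrary window of 20 lines; none of these details come from the thread, and the function name is hypothetical.

```python
def resets_before_errors(log_lines, window=20,
                         reset_marker="cache state reset",
                         error_marker="ERROR"):
    """For each error line, report whether a reset line appeared within `window` lines before it.

    Returns a list of (line index, line text, reset_seen_nearby) tuples.
    """
    findings = []
    last_reset = None  # index of the most recent reset line, if any
    for index, line in enumerate(log_lines):
        if reset_marker in line:
            last_reset = index
        elif error_marker in line:
            near = last_reset is not None and index - last_reset <= window
            findings.append((index, line.rstrip("\n"), near))
    return findings
```

With several backends writing to one log during a parallel run, closeness in the file is only a hint; a reset and an error should be confirmed to belong to the same backend before drawing conclusions.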
Tom, is the attached regression diff considered normal? This was
generated by current CVS.
I am trying to determine what is a normal error and what is something to
be concerned about.
Also, I am up to Feb 25 with no errors, but am still testing.
---------------------------------------------------------------------------
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
I am now seeing this error in 2003-03-03.
CREATE TABLE INSERT_CHILD (cx INT default 42,
cy INT CHECK (cy > x))
INHERITS (INSERT_TBL);
+ ERROR: RelationClearRelation: relation 130996 deleted while still in use
I have a theory about the failures that occur while creating tables.
If a relcache flush were to occur due to SI buffer overrun between
creation of the new rel's relcache entry by RelationBuildLocalRelation
and completion of the command, then you'd see an error exactly like the
above, because the relcache would try to rebuild the cache entry by
reading the pg_class and pg_attribute rows for the relation. Which
would possibly not exist yet, and even if they did exist they'd be
invisible under SnapshotNow rules.
However this bug is of long standing, and it doesn't seem all that
probable as an explanation for your difficulties. It would be worth
running the tests with log_min_messages set to DEBUG4 (along with the
verbosity setting, please) and see if you observe "cache state reset"
log entries just before the failures.
In any case this would not explain failures during DROP TABLE, so
there's another issue to look for.
regards, tom lane
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Attachments:
/pg/test/regress/regression.diffs (text/plain)
*** ./expected/constraints.out Mon Jul 28 13:50:13 2003
--- ./results/constraints.out Mon Jul 28 18:32:55 2003
***************
*** 80,102 ****
CREATE TABLE CHECK2_TBL (x int, y text, z int,
CONSTRAINT SEQUENCE_CON
CHECK (x > 3 and y <> 'check failed' and z < 8));
INSERT INTO CHECK2_TBL VALUES (4, 'check ok', -2);
INSERT INTO CHECK2_TBL VALUES (1, 'x check failed', -2);
! ERROR: new row for relation "check2_tbl" violates CHECK constraint "sequence_con"
INSERT INTO CHECK2_TBL VALUES (5, 'z check failed', 10);
! ERROR: new row for relation "check2_tbl" violates CHECK constraint "sequence_con"
INSERT INTO CHECK2_TBL VALUES (0, 'check failed', -2);
! ERROR: new row for relation "check2_tbl" violates CHECK constraint "sequence_con"
INSERT INTO CHECK2_TBL VALUES (6, 'check failed', 11);
! ERROR: new row for relation "check2_tbl" violates CHECK constraint "sequence_con"
INSERT INTO CHECK2_TBL VALUES (7, 'check ok', 7);
SELECT '' AS two, * from CHECK2_TBL;
! two | x | y | z
! -----+---+----------+----
! | 4 | check ok | -2
! | 7 | check ok | 7
! (2 rows)
!
--
-- Check constraints on INSERT
--
--- 80,100 ----
CREATE TABLE CHECK2_TBL (x int, y text, z int,
CONSTRAINT SEQUENCE_CON
CHECK (x > 3 and y <> 'check failed' and z < 8));
+ ERROR: relation 126581 deleted while still in use
INSERT INTO CHECK2_TBL VALUES (4, 'check ok', -2);
+ ERROR: relation "check2_tbl" does not exist
INSERT INTO CHECK2_TBL VALUES (1, 'x check failed', -2);
! ERROR: relation "check2_tbl" does not exist
INSERT INTO CHECK2_TBL VALUES (5, 'z check failed', 10);
! ERROR: relation "check2_tbl" does not exist
INSERT INTO CHECK2_TBL VALUES (0, 'check failed', -2);
! ERROR: relation "check2_tbl" does not exist
INSERT INTO CHECK2_TBL VALUES (6, 'check failed', 11);
! ERROR: relation "check2_tbl" does not exist
INSERT INTO CHECK2_TBL VALUES (7, 'check ok', 7);
+ ERROR: relation "check2_tbl" does not exist
SELECT '' AS two, * from CHECK2_TBL;
! ERROR: relation "check2_tbl" does not exist
--
-- Check constraints on INSERT
--
======================================================================
*** ./expected/misc.out Mon Jul 28 13:50:13 2003
--- ./results/misc.out Mon Jul 28 18:33:04 2003
***************
*** 580,586 ****
c
c_star
char_tbl
- check2_tbl
check_seq
check_tbl
circle_tbl
--- 580,585 ----
***************
*** 660,666 ****
toyemp
varchar_tbl
xacttest
! (96 rows)
--SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer'))) AS equip_name;
SELECT hobbies_by_name('basketball');
--- 659,665 ----
toyemp
varchar_tbl
xacttest
! (95 rows)
--SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer'))) AS equip_name;
SELECT hobbies_by_name('basketball');
======================================================================
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Tom, is the attached regression diff considered normal? This was
generated by current CVS.
Well, this *looks* like it could be an example of the SI-overrun-
during-create behavior I was talking about. But if you weren't running
a verbose log to show whether a cache flush occurred just before the
error, there's no way to know for sure.
Right at the moment I am more interested in the other cases though
(cache lookup failure during DROP) since I have no plausible
explanation for them.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Tom, is the attached regression diff considered normal? This was
generated by current CVS.
Well, this *looks* like it could be an example of the SI-overrun-
during-create behavior I was talking about. But if you weren't running
a verbose log to show whether a cache flush occurred just before the
error, there's no way to know for sure.
OK.
Right at the moment I am more interested in the other cases though
(cache lookup failure during DROP) since I have no plausible
explanation for them.
Thanks. That's what I need to know.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
I said:
I have a theory about the failures that occur while creating tables.
If a relcache flush were to occur due to SI buffer overrun between
creation of the new rel's relcache entry by RelationBuildLocalRelation
and completion of the command, then you'd see an error exactly like the
above, because the relcache would try to rebuild the cache entry by
reading the pg_class and pg_attribute rows for the relation.
After further study, though, the above theory falls flat on its face:
the relcache does *not* attempt to rebuild new relcache entries after
an SI overrun (see the comments to RelationCacheInvalidate). So I'm
back to wondering what the heck is causing any of these messages.
I think we really need to see a stack trace from one of the failures.
Could you try running CVS tip with an "abort()" call replacing the
"relation %u deleted while still in use" elog? (It's line 1797
in src/backend/utils/cache/relcache.c in CVS tip.) Then when you
get the failure, get a stack trace with gdb from the core dump.
regards, tom lane
OK, on it now!
---------------------------------------------------------------------------
Tom Lane wrote:
I said:
I have a theory about the failures that occur while creating tables.
If a relcache flush were to occur due to SI buffer overrun between
creation of the new rel's relcache entry by RelationBuildLocalRelation
and completion of the command, then you'd see an error exactly like the
above, because the relcache would try to rebuild the cache entry by
reading the pg_class and pg_attribute rows for the relation.
After further study, though, the above theory falls flat on its face:
the relcache does *not* attempt to rebuild new relcache entries after
an SI overrun (see the comments to RelationCacheInvalidate). So I'm
back to wondering what the heck is causing any of these messages.
I think we really need to see a stack trace from one of the failures.
Could you try running CVS tip with an "abort()" call replacing the
"relation %u deleted while still in use" elog? (It's line 1797
in src/backend/utils/cache/relcache.c in CVS tip.) Then when you
get the failure, get a stack trace with gdb from the core dump.
regards, tom lane
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073