Something flaky in the "relfilenode mapping" infrastructure

Started by Tom Lane, almost 12 years ago, 17 messages
#1Tom Lane
tgl@sss.pgh.pa.us

Buildfarm member prairiedog thinks there's something unreliable about
commit f01d1ae3a104019d6d68aeff85c4816a275130b3:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2014-03-27%2008%3A12%3A11

================== pgsql.13462/src/test/regress/regression.diffs ===================
*** /Users/buildfarm/bf-data/HEAD/pgsql.13462/src/test/regress/expected/alter_table.out	Thu Mar 27 04:12:40 2014
--- /Users/buildfarm/bf-data/HEAD/pgsql.13462/src/test/regress/results/alter_table.out	Thu Mar 27 04:52:02 2014
***************
*** 2333,2339 ****
      ) mapped;
   incorrectly_mapped | have_mappings 
  --------------------+---------------
!                   0 | t
  (1 row)
  -- Checks on creating and manipulation of user defined relations in
--- 2333,2339 ----
      ) mapped;
   incorrectly_mapped | have_mappings 
  --------------------+---------------
!                   1 | t
  (1 row)

-- Checks on creating and manipulation of user defined relations in

======================================================================

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#1)
1 attachment(s)
Re: Something flaky in the "relfilenode mapping" infrastructure

Hi,

On 2014-03-27 08:02:35 -0400, Tom Lane wrote:

Buildfarm member prairiedog thinks there's something unreliable about
commit f01d1ae3a104019d6d68aeff85c4816a275130b3:

*** /Users/buildfarm/bf-data/HEAD/pgsql.13462/src/test/regress/expected/alter_table.out	Thu Mar 27 04:12:40 2014
--- /Users/buildfarm/bf-data/HEAD/pgsql.13462/src/test/regress/results/alter_table.out	Thu Mar 27 04:52:02 2014
***************
*** 2333,2339 ****
) mapped;
incorrectly_mapped | have_mappings 
--------------------+---------------
!                   0 | t
(1 row)

That's rather odd. It has survived for a couple of months on the other
buildfarm animals now... Could one of you apply the attached patch
adding more details to eventual failures?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-Add-more-details-to-eventual-relfilenodemap-regressi.patch (text/x-patch; charset=us-ascii)
From b3c2e062b433c866e29066196c4bf555d9c978d2 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 28 Mar 2014 21:33:26 +0100
Subject: [PATCH] Add more details to eventual relfilenodemap regression test
 failures.

---
 src/test/regress/expected/alter_table.out | 26 +++++++++++---------------
 src/test/regress/sql/alter_table.sql      | 19 ++++++++-----------
 2 files changed, 19 insertions(+), 26 deletions(-)

diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 0f0c638..283788f 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -2319,22 +2319,18 @@ Check constraints:
 DROP TABLE alter2.tt8;
 DROP SCHEMA alter2;
 -- Check that we map relation oids to filenodes and back correctly.
--- Don't display all the mappings so the test output doesn't change
--- all the time, but make sure we actually do test some values.
+-- Only display bad mappings so the test output doesn't change all the
+-- time.
 SELECT
-    SUM((mapped_oid != oid OR mapped_oid IS NULL)::int) incorrectly_mapped,
-    count(*) > 200 have_mappings
-FROM (
-    SELECT
-        oid, reltablespace, relfilenode, relname,
-        pg_filenode_relation(reltablespace, pg_relation_filenode(oid)) mapped_oid
-    FROM pg_class
-    WHERE relkind IN ('r', 'i', 'S', 't', 'm')
-    ) mapped;
- incorrectly_mapped | have_mappings 
---------------------+---------------
-                  0 | t
-(1 row)
+    oid, reltablespace, relfilenode, relname
+FROM pg_class,
+    pg_filenode_relation(reltablespace, pg_relation_filenode(oid)) mapped_oid
+WHERE relkind IN ('r', 'i', 'S', 't', 'm')
+    AND (mapped_oid != oid OR mapped_oid IS NULL)
+;
+ oid | reltablespace | relfilenode | relname 
+-----+---------------+-------------+---------
+(0 rows)
 
 -- Checks on creating and manipulation of user defined relations in
 -- pg_catalog.
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 87973c1..6103b87 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -1554,18 +1554,15 @@ DROP TABLE alter2.tt8;
 DROP SCHEMA alter2;
 
 -- Check that we map relation oids to filenodes and back correctly.
--- Don't display all the mappings so the test output doesn't change
--- all the time, but make sure we actually do test some values.
+-- Only display bad mappings so the test output doesn't change all the
+-- time.
 SELECT
-    SUM((mapped_oid != oid OR mapped_oid IS NULL)::int) incorrectly_mapped,
-    count(*) > 200 have_mappings
-FROM (
-    SELECT
-        oid, reltablespace, relfilenode, relname,
-        pg_filenode_relation(reltablespace, pg_relation_filenode(oid)) mapped_oid
-    FROM pg_class
-    WHERE relkind IN ('r', 'i', 'S', 't', 'm')
-    ) mapped;
+    oid, reltablespace, relfilenode, relname
+FROM pg_class,
+    pg_filenode_relation(reltablespace, pg_relation_filenode(oid)) mapped_oid
+WHERE relkind IN ('r', 'i', 'S', 't', 'm')
+    AND (mapped_oid != oid OR mapped_oid IS NULL)
+;
 
 -- Checks on creating and manipulation of user defined relations in
 -- pg_catalog.
-- 
1.8.3.251.g1462b67

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#2)
Re: Something flaky in the "relfilenode mapping" infrastructure

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-03-27 08:02:35 -0400, Tom Lane wrote:

Buildfarm member prairiedog thinks there's something unreliable about
commit f01d1ae3a104019d6d68aeff85c4816a275130b3:

That's rather odd. It has survived for a couple of months on the other
buildfarm animals now... Could one of you apply the attached patch
adding more details to eventual failures?

Any objection to separating out the have_mappings bit? It wasn't terribly
appropriate before, but it seems really out of place in this formulation.

regards, tom lane

#4Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#3)
Re: Something flaky in the "relfilenode mapping" infrastructure

On 2014-03-28 16:41:15 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-03-27 08:02:35 -0400, Tom Lane wrote:

Buildfarm member prairiedog thinks there's something unreliable about
commit f01d1ae3a104019d6d68aeff85c4816a275130b3:

That's rather odd. It has survived for a couple of months on the other
buildfarm animals now... Could one of you apply the attached patch
adding more details to eventual failures?

Any objection to separating out the have_mappings bit? It wasn't terribly
appropriate before, but it seems really out of place in this formulation.

The patch I sent removed the have_mapping thing entirely? Do you mean it
should be there, but as a separate query?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#4)
Re: Something flaky in the "relfilenode mapping" infrastructure

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-03-28 16:41:15 -0400, Tom Lane wrote:

Any objection to separating out the have_mappings bit? It wasn't terribly
appropriate before, but it seems really out of place in this formulation.

The patch I sent removed the have_mapping thing entirely? Do you mean it
should be there, but as a separate query?

Oh, so it did. Well, do you think we need a query checking that?
I hadn't questioned the need to do so, but if you feel it's unnecessary
I'm certainly willing to pull it.

regards, tom lane

#6Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#5)
Re: Something flaky in the "relfilenode mapping" infrastructure

On 2014-03-28 16:45:28 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-03-28 16:41:15 -0400, Tom Lane wrote:

Any objection to separating out the have_mappings bit? It wasn't terribly
appropriate before, but it seems really out of place in this formulation.

The patch I sent removed the have_mapping thing entirely? Do you mean it
should be there, but as a separate query?

Oh, so it did. Well, do you think we need a query checking that?
I hadn't questioned the need to do so, but if you feel it's unnecessary
I'm certainly willing to pull it.

I don't think it's necessary. As far as I understand LATERAL, a join to
a function returning NULL will still return the row. So, the test now
would only test whether there are rows in pg_class which seems a bit
pointless.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#6)
Re: Something flaky in the "relfilenode mapping" infrastructure

Andres Freund <andres@2ndquadrant.com> writes:

I don't think it's necessary. As far as I understand LATERAL, a join to
a function returning NULL will still return the row. So, the test now
would only test whether there are rows in pg_class which seems a bit
pointless.

Yeah, after looking closer I'd come to the same conclusion. If the
lateral function call could generate zero rows it'd perhaps be a risk,
but not in this formulation.
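
A minimal sketch of why this formulation is safe. The built-in lower() stands in for pg_filenode_relation() here, an assumption made only for illustration. A non-set-returning function in FROM yields exactly one row per outer row, even when its result is NULL:

```sql
-- lower() is strict, so it returns NULL for the NULL input; it stands in
-- for a mapping function that finds no relation.
SELECT v.x, f AS mapped
FROM (VALUES ('A'), (NULL)) AS v(x),
     lower(v.x) AS f;
-- Both input rows appear in the result; the second one has mapped = NULL.
```

Only a set-returning function that produced zero rows would drop the outer row from the join.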

Will commit in a moment.

regards, tom lane

#8Andres Freund
andres@2ndquadrant.com
In reply to: Andres Freund (#2)
Re: Something flaky in the "relfilenode mapping" infrastructure

On 2014-03-28 21:36:11 +0100, Andres Freund wrote:

Hi,

On 2014-03-27 08:02:35 -0400, Tom Lane wrote:

Buildfarm member prairiedog thinks there's something unreliable about
commit f01d1ae3a104019d6d68aeff85c4816a275130b3:

*** /Users/buildfarm/bf-data/HEAD/pgsql.13462/src/test/regress/expected/alter_table.out	Thu Mar 27 04:12:40 2014
--- /Users/buildfarm/bf-data/HEAD/pgsql.13462/src/test/regress/results/alter_table.out	Thu Mar 27 04:52:02 2014
***************
*** 2333,2339 ****
) mapped;
incorrectly_mapped | have_mappings 
--------------------+---------------
!                   0 | t
(1 row)

That's rather odd. It has survived for a couple of months on the other
buildfarm animals now... Could one of you apply the attached patch
adding more details to eventual failures?

So I had made a note to recheck on
this. http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prairiedog&br=HEAD
indicates there haven't been any further failures... So, for now I
assume this was caused by some problem fixed elsewhere.

Greetings,

Andres Freund

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#8)
Re: Something flaky in the "relfilenode mapping" infrastructure

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-03-27 08:02:35 -0400, Tom Lane wrote:

Buildfarm member prairiedog thinks there's something unreliable about
commit f01d1ae3a104019d6d68aeff85c4816a275130b3:

So I had made a note to recheck on
this. http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prairiedog&br=HEAD
indicates there haven't been any further failures... So, for now I
assume this was caused by some problem fixed elsewhere.

Hard to say. In any case, I agree we can't make any progress unless we
see it again.

regards, tom lane

#10Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#9)
Re: Something flaky in the "relfilenode mapping" infrastructure

On Tue, Apr 15, 2014 at 03:28:41PM -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-03-27 08:02:35 -0400, Tom Lane wrote:

Buildfarm member prairiedog thinks there's something unreliable about
commit f01d1ae3a104019d6d68aeff85c4816a275130b3:

So I had made a note to recheck on
this. http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prairiedog&br=HEAD
indicates there haven't been any further failures... So, for now I
assume this was caused by some problem fixed elsewhere.

Hard to say. In any case, I agree we can't make any progress unless we
see it again.

The improved test just tripped:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2014-06-12%2000%3A17%3A07

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#10)
Re: Something flaky in the "relfilenode mapping" infrastructure

Noah Misch <noah@leadboat.com> writes:

The improved test just tripped:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2014-06-12%2000%3A17%3A07

Ugh. If the MTBF is circa three months, how will we catch this before
we're dead?

A quick look around the machine's buildfarm directory says there's nothing
left behind that's not included in the buildfarm server report, so I can't
offer any immediate insight. But I can certainly load it up running some
additional tests if anyone has an idea what to look for.

regards, tom lane

#12Andres Freund
andres@2ndquadrant.com
In reply to: Noah Misch (#10)
Re: Something flaky in the "relfilenode mapping" infrastructure

On 2014-06-12 00:38:36 -0400, Noah Misch wrote:

On Tue, Apr 15, 2014 at 03:28:41PM -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-03-27 08:02:35 -0400, Tom Lane wrote:

Buildfarm member prairiedog thinks there's something unreliable about
commit f01d1ae3a104019d6d68aeff85c4816a275130b3:

So I had made a note to recheck on
this. http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prairiedog&br=HEAD
indicates there haven't been any further failures... So, for now I
assume this was caused by some problem fixed elsewhere.

Hard to say. In any case, I agree we can't make any progress unless we
see it again.

The improved test just tripped:

Hrmpf. Just one of these days I was happy thinking it was gone...

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2014-06-12%2000%3A17%3A07

Hm. My guess is that it's just a 'harmless' concurrency issue. The
test currently runs concurrently with others: I think what happens is
that the table gets dropped by a concurrent test after the query has
acquired the MVCC snapshot used for the pg_class scan.
But why is it triggering on such an 'unusual' system and not on others?
That's what worries me a bit.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#12)
Re: Something flaky in the "relfilenode mapping" infrastructure

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-06-12 00:38:36 -0400, Noah Misch wrote:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2014-06-12%2000%3A17%3A07

Hm. My guess is that it's just a 'harmless' concurrency issue. The
test currently runs concurrently with others: I think what happens is
that the table gets dropped by a concurrent test after the query has
acquired the MVCC snapshot used for the pg_class scan.
But why is it triggering on such an 'unusual' system and not on others?
That's what worries me a bit.

prairiedog is pretty damn slow by modern standards. OTOH, I think it
is not the slowest machine in the buildfarm; hamster for instance seems
to be at least a factor of 2 slower. So I'm not sure whether to believe
it's just a timing issue.

regards, tom lane

#14Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#13)
1 attachment(s)
Re: Something flaky in the "relfilenode mapping" infrastructure

On Thu, Jun 12, 2014 at 02:44:10AM -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-06-12 00:38:36 -0400, Noah Misch wrote:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2014-06-12%2000%3A17%3A07

Hm. My guess is that it's just a 'harmless' concurrency issue. The
test currently runs concurrently with others: I think what happens is
that the table gets dropped by a concurrent test after the query has
acquired the MVCC snapshot used for the pg_class scan.
But why is it triggering on such an 'unusual' system and not on others?
That's what worries me a bit.

I can reproduce a similar disturbance in the test query using gdb and a
concurrent table drop, and the table reported in the prairiedog failure is a
table dropped in a concurrent test group. That explanation may not be the
full story behind these particular failures, but it certainly could cause
similar failures in the future.

Let's prevent this by only reporting rows for relations that still exist after
the query is complete.

prairiedog is pretty damn slow by modern standards. OTOH, I think it
is not the slowest machine in the buildfarm; hamster for instance seems
to be at least a factor of 2 slower. So I'm not sure whether to believe
it's just a timing issue.

That kernel's process scheduler could be a factor.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

Attachments:

filenode_relation-test-race-v1.patch (text/plain; charset=us-ascii)
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index a182176..a274d82 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -2375,14 +2375,18 @@ Check constraints:
 
 DROP TABLE alter2.tt8;
 DROP SCHEMA alter2;
--- Check that we map relation oids to filenodes and back correctly.
--- Only display bad mappings so the test output doesn't change all the
--- time.
+-- Check that we map relation oids to filenodes and back correctly.  Only
+-- display bad mappings so the test output doesn't change all the time.  A
+-- filenode function call can return NULL for a relation dropped concurrently
+-- with the call's surrounding query, so check mappings only for relations
+-- that still exist after all calls finish.
+CREATE TEMP TABLE filenode_mapping AS
 SELECT
     oid, mapped_oid, reltablespace, relfilenode, relname
 FROM pg_class,
     pg_filenode_relation(reltablespace, pg_relation_filenode(oid)) AS mapped_oid
 WHERE relkind IN ('r', 'i', 'S', 't', 'm') AND mapped_oid IS DISTINCT FROM oid;
+SELECT m.* FROM filenode_mapping m JOIN pg_class c ON c.oid = m.oid;
  oid | mapped_oid | reltablespace | relfilenode | relname 
 -----+------------+---------------+-------------+---------
 (0 rows)
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 3f641f9..19e1229 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -1582,15 +1582,20 @@ ALTER TABLE IF EXISTS tt8 SET SCHEMA alter2;
 DROP TABLE alter2.tt8;
 DROP SCHEMA alter2;
 
--- Check that we map relation oids to filenodes and back correctly.
--- Only display bad mappings so the test output doesn't change all the
--- time.
+-- Check that we map relation oids to filenodes and back correctly.  Only
+-- display bad mappings so the test output doesn't change all the time.  A
+-- filenode function call can return NULL for a relation dropped concurrently
+-- with the call's surrounding query, so check mappings only for relations
+-- that still exist after all calls finish.
+CREATE TEMP TABLE filenode_mapping AS
 SELECT
     oid, mapped_oid, reltablespace, relfilenode, relname
 FROM pg_class,
     pg_filenode_relation(reltablespace, pg_relation_filenode(oid)) AS mapped_oid
 WHERE relkind IN ('r', 'i', 'S', 't', 'm') AND mapped_oid IS DISTINCT FROM oid;
 
+SELECT m.* FROM filenode_mapping m JOIN pg_class c ON c.oid = m.oid;
+
 -- Checks on creating and manipulation of user defined relations in
 -- pg_catalog.
 --
#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#14)
Re: Something flaky in the "relfilenode mapping" infrastructure

Noah Misch <noah@leadboat.com> writes:

On Thu, Jun 12, 2014 at 02:44:10AM -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2014-06-12 00:38:36 -0400, Noah Misch wrote:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2014-06-12%2000%3A17%3A07

Hm. My guess is that it's just a 'harmless' concurrency issue. The
test currently runs concurrently with others: I think what happens is
that the table gets dropped by a concurrent test after the query has
acquired the MVCC snapshot used for the pg_class scan.
But why is it triggering on such an 'unusual' system and not on others?
That's what worries me a bit.

I can reproduce a similar disturbance in the test query using gdb and a
concurrent table drop, and the table reported in the prairiedog failure is a
table dropped in a concurrent test group. That explanation may not be the
full story behind these particular failures, but it certainly could cause
similar failures in the future.

Yeah, that seems like a plausible explanation, since the table shown
in the failure report is one that would be getting dropped concurrently,
and the discrepancy is that we get NULL rather than the expected value
for the pg_filenode_relation result, which is expected if the table is
already dropped when the mapping function is called.

Let's prevent this by only reporting rows for relations that still exist after
the query is complete.

I think this is a bad solution though; it risks masking actual problems.

What seems like a better fix to me is to change the test

mapped_oid IS DISTINCT FROM oid

to

mapped_oid <> oid

pg_class.oid will certainly never read as NULL, so what this will do is
allow the single case where the function returns NULL. AFAIK there is
no reason to suppose that a NULL result would mean anything except "the
table's been dropped", so changing it this way will allow only that case
and not any others.
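
A small sketch of the difference between the two comparisons. The column names rel_oid and mapped_oid and the literal values are made up for illustration and do not come from the regression test:

```sql
-- Compare the two predicates on a match, a mismatch, and a NULL mapping.
SELECT rel_oid, mapped_oid,
       mapped_oid IS DISTINCT FROM rel_oid AS is_distinct,
       mapped_oid <> rel_oid               AS not_equal
FROM (VALUES (1, 1), (1, 2), (1, NULL)) AS t(rel_oid, mapped_oid);
-- Match:     is_distinct = false, not_equal = false
-- Mismatch:  is_distinct = true,  not_equal = true
-- NULL case: is_distinct = true,  not_equal = NULL
```

Because a WHERE clause keeps only rows whose condition is true, the `<>` form silently drops the NULL row, while IS DISTINCT FROM reports it as a bad mapping.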

Alternatively, we could do something like you suggest but adjust the
second join so that it suppresses only rows in which mapped_oid is null
*and* there's no longer a matching OID in pg_class. That would provide
additional confidence that the null result is a valid indicator of a
just-dropped table.
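
One way that adjusted second query might look, as a sketch only. It reuses the filenode_mapping temp table from the patch quoted above and is not necessarily the form that ends up committed:

```sql
-- First step, as in the proposed patch: record every suspicious mapping.
CREATE TEMP TABLE filenode_mapping AS
SELECT
    oid, mapped_oid, reltablespace, relfilenode, relname
FROM pg_class,
    pg_filenode_relation(reltablespace, pg_relation_filenode(oid)) AS mapped_oid
WHERE relkind IN ('r', 'i', 'S', 't', 'm') AND mapped_oid IS DISTINCT FROM oid;

-- Second step: suppress a row only when mapped_oid is NULL *and* the
-- relation no longer has a pg_class entry; report everything else.
SELECT m.*
FROM filenode_mapping m
    LEFT JOIN pg_class c ON c.oid = m.oid
WHERE c.oid IS NOT NULL OR m.mapped_oid IS NOT NULL;
```

The LEFT JOIN keeps rows for relations that have vanished, so a non-NULL but wrong mapped_oid is still reported even if the relation was dropped in the meantime.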

regards, tom lane

#16Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#15)
Re: Something flaky in the "relfilenode mapping" infrastructure

On Thu, Jun 12, 2014 at 10:50:44PM -0400, Tom Lane wrote:

Alternatively, we could do something like you suggest but adjust the
second join so that it suppresses only rows in which mapped_oid is null
*and* there's no longer a matching OID in pg_class. That would provide
additional confidence that the null result is a valid indicator of a
just-dropped table.

Can't hurt; I adjusted it along those lines and committed. For the record,
the failure on prairiedog has appeared a third time:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2014-06-13%2005%3A19%3A35

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#16)
Re: Something flaky in the "relfilenode mapping" infrastructure

Noah Misch <noah@leadboat.com> writes:

On Thu, Jun 12, 2014 at 10:50:44PM -0400, Tom Lane wrote:

Alternatively, we could do something like you suggest but adjust the
second join so that it suppresses only rows in which mapped_oid is null
*and* there's no longer a matching OID in pg_class. That would provide
additional confidence that the null result is a valid indicator of a
just-dropped table.

Can't hurt; I adjusted it along those lines and committed.

Looks good.

For the record,
the failure on prairiedog has appeared a third time:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2014-06-13%2005%3A19%3A35

Yeah, I saw that. I wonder if we recently changed something that improved
the odds of the timing being just so. Two failures in two days seems out
of line with the critter's previous history.

regards, tom lane
