pg_index updates and SI invalidation

Started by Pavan Deolaseealmost 19 years ago12 messages
#1Pavan Deolasee
pavan.deolasee@gmail.com

While experimenting with the proposed CREATE INDEX support with
HOT, I realized that SI invalidation are not sent properly for pg_index
updates.

I noticed the following comment in relcache.c

/*
* RelationReloadClassinfo - reload the pg_class row (only)
*
* This function is used only for indexes. We currently allow only the
* pg_class row of an existing index to change (to support changes of
* owner, tablespace, or relfilenode), not its pg_index row or other
* subsidiary index schema information. Therefore it's sufficient to do
* this when we get an SI invalidation. Furthermore, there are cases
* where it's necessary not to throw away the index information, especially
* for "nailed" indexes which we are unable to rebuild on-the-fly.
*
* We can't necessarily reread the pg_class row right away; we might be
* in a failed transaction when we receive the SI notification. If so,
* RelationClearRelation just marks the entry as invalid by setting
* rd_isvalid to false. This routine is called to fix the entry when it
* is next needed.
*/

From the comment, its clear that we don't expect SI invalidation
to work correctly for pg_index row updates. We are thinking of
adding a new attribute to pg_index row to control the usability of
the index in queries. Is it worth spending time to support SI
invalidation for pg_index updates or should we rather add the
attribute to pg_class though pg_index seems to the right place ?

A side-effect of this limitation is that REINDEX does not make
an index immediately available in the same transaction if REINDEX
is used to fix an earlier failed CREATE INDEX CONCURRENTLY.
Though we set "indisvalid" to 'true' at the end of REINDEX, the
effect is not seen until the transaction completes because of
lack of SI invalidation.

Any suggestions how should I proceed with this ? Should I add
a pg_class attribute or is it worth fixing pg_index SI invalidation ?

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavan Deolasee (#1)
Re: pg_index updates and SI invalidation

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

While experimenting with the proposed CREATE INDEX support with
HOT, I realized that SI invalidation are not sent properly for pg_index
updates.

Hmm ... actually, CREATE INDEX CONCURRENTLY gets this wrong already, no?
I suspect that sessions existing at the time C.I.C is done will never
see the new index as valid, unless something else happens to make them
drop and rebuild their relcache entries for it.

regards, tom lane

#3Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Tom Lane (#2)
Re: pg_index updates and SI invalidation

On 3/26/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmm ... actually, CREATE INDEX CONCURRENTLY gets this wrong already, no?
I suspect that sessions existing at the time C.I.C is done will never
see the new index as valid, unless something else happens to make them
drop and rebuild their relcache entries for it.

Yes, C.I.C gets it wrong. I confirmed that new index is seen as invalid
for existing sessions. Is it something we should fix ?

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavan Deolasee (#3)
Re: pg_index updates and SI invalidation

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

On 3/26/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmm ... actually, CREATE INDEX CONCURRENTLY gets this wrong already, no?

Yes, C.I.C gets it wrong. I confirmed that new index is seen as invalid
for existing sessions. Is it something we should fix ?

Certainly.

It might be feasible to have RelationReloadClassinfo re-read the
pg_index row and apply only the updates for specific known-changeable
columns. The stuff it's worried about is the subsidiary data such
as support function fmgr lookup records, but we don't need those to
change on the fly.

regards, tom lane

#5Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Tom Lane (#4)
1 attachment(s)
Re: pg_index updates and SI invalidation

On 3/26/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It might be feasible to have RelationReloadClassinfo re-read the
pg_index row and apply only the updates for specific known-changeable
columns. The stuff it's worried about is the subsidiary data such
as support function fmgr lookup records, but we don't need those to
change on the fly.

Here is a patch which fixes this. We re-read part of the pg_index
row and update rd_index with the new data. I tested REINDEX and CIC
and both seems to work fine with the patch applied.

Tom, does this look good ?

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

Attachments:

pg_index_SI_inval.patchapplication/octet-stream; name=pg_index_SI_inval.patchDownload
Index: src/backend/utils/cache/inval.c
===================================================================
RCS file: /repositories/postgreshome/cvs/pgsql/src/backend/utils/cache/inval.c,v
retrieving revision 1.79
diff -c -r1.79 inval.c
*** src/backend/utils/cache/inval.c	5 Jan 2007 22:19:43 -0000	1.79
--- src/backend/utils/cache/inval.c	27 Mar 2007 07:17:33 -0000
***************
*** 596,601 ****
--- 596,612 ----
  		 */
  		databaseId = MyDatabaseId;
  	}
+ 	else if (tupleRelId == IndexRelationId)
+ 	{
+ 		Form_pg_index indextup = (Form_pg_index) GETSTRUCT(tuple);
+ 		
+ 		/*
+ 		 * When a pg_index row is updated, we should send SI invaliation
+ 		 * to the index relation
+ 		 */
+ 		relationId = indextup->indexrelid;
+ 		databaseId = MyDatabaseId;
+ 	}
  	else
  		return;
  
Index: src/backend/utils/cache/relcache.c
===================================================================
RCS file: /repositories/postgreshome/cvs/pgsql/src/backend/utils/cache/relcache.c,v
retrieving revision 1.258
diff -c -r1.258 relcache.c
*** src/backend/utils/cache/relcache.c	19 Mar 2007 23:38:29 -0000	1.258
--- src/backend/utils/cache/relcache.c	27 Mar 2007 07:17:33 -0000
***************
*** 212,217 ****
--- 212,218 ----
  static OpClassCacheEnt *LookupOpclassInfo(Oid operatorClassOid,
  				  StrategyNumber numStrats,
  				  StrategyNumber numSupport);
+ static void RelationReloadIndexInfo(Relation relation);
  
  
  /*
***************
*** 1580,1592 ****
  /*
   * RelationReloadClassinfo - reload the pg_class row (only)
   *
!  *	This function is used only for indexes.  We currently allow only the
   *	pg_class row of an existing index to change (to support changes of
   *	owner, tablespace, or relfilenode), not its pg_index row or other
!  *	subsidiary index schema information.  Therefore it's sufficient to do
!  *	this when we get an SI invalidation.  Furthermore, there are cases
!  *	where it's necessary not to throw away the index information, especially
!  *	for "nailed" indexes which we are unable to rebuild on-the-fly.
   *
   *	We can't necessarily reread the pg_class row right away; we might be
   *	in a failed transaction when we receive the SI notification.  If so,
--- 1581,1597 ----
  /*
   * RelationReloadClassinfo - reload the pg_class row (only)
   *
!  *	This function is used only for indexes.  We used to allow only the
   *	pg_class row of an existing index to change (to support changes of
   *	owner, tablespace, or relfilenode), not its pg_index row or other
!  *	subsidiary index schema information. But for CREATE INDEX CONCURRENTLY
!  *	we need to change the pg_index attribute 'indisvalid' and update
!  *	pg_index row. In future, there might be similar requirements for
!  *	the existing and/or new attributes of pg_index to change. To
!  *	support that we now re-read the pg_index row and update one
!  *	or more fields selectively. We don't rebuild the entire the
!  *	entire information because that may not be even possible for
!  *	"nailed" indexes.
   *
   *	We can't necessarily reread the pg_class row right away; we might be
   *	in a failed transaction when we receive the SI notification.  If so,
***************
*** 1634,1639 ****
--- 1639,1651 ----
  	if (relation->rd_amcache)
  		pfree(relation->rd_amcache);
  	relation->rd_amcache = NULL;
+ 
+ 	/*
+ 	 * Before we mark the relation valid, re-read the pg_index row
+ 	 * and update some of the fields selectively. 
+ 	 */
+ 	RelationReloadIndexInfo(relation);
+ 
  	/* Okay, now it's valid again */
  	relation->rd_isvalid = true;
  }
***************
*** 3747,3749 ****
--- 3759,3817 ----
  	unlink(initfilename);
  	/* ignore any error, since it might not be there at all */
  }
+ 
+ /*
+  * Reload index-access-method support data for an index relation
+  */
+ static void
+ RelationReloadIndexInfo(Relation relation)
+ {
+ 	Form_pg_index	index;
+ 	HeapTuple	pg_index_tuple;
+ 	Relation	pg_index_desc;
+ 	SysScanDesc pg_index_scan;
+ 	ScanKeyData key[1];
+ 	bool		indexOK;
+ 
+ 	/*
+ 	 * form a scan key
+ 	 */
+ 	ScanKeyInit(&key[0],
+ 				Anum_pg_index_indexrelid,
+ 				BTEqualStrategyNumber, F_OIDEQ,
+ 				ObjectIdGetDatum(RelationGetRelid(relation)));
+ 	/*
+ 	 * Open pg_index and fetch a tuple.  Force heap scan if we haven't yet
+ 	 * built the critical relcache entries (this includes initdb and startup
+ 	 * without a pg_internal.init file).
+ 	 */
+ 	indexOK = (RelationGetRelid(relation) != IndexRelidIndexId);
+ 
+ 	pg_index_desc = heap_open(IndexRelationId, AccessShareLock);
+ 	pg_index_scan = systable_beginscan(pg_index_desc, IndexRelidIndexId,
+ 									   indexOK && criticalRelcachesBuilt,
+ 									   SnapshotNow,
+ 									   1, key);
+ 
+ 	pg_index_tuple = systable_getnext(pg_index_scan);
+ 
+ 	/*
+ 	 * Must copy tuple before releasing buffer.
+ 	 */
+ 	if (HeapTupleIsValid(pg_index_tuple))
+ 		pg_index_tuple = heap_copytuple(pg_index_tuple);
+ 
+ 	/* all done */
+ 	systable_endscan(pg_index_scan);
+ 	heap_close(pg_index_desc, AccessShareLock);
+ 
+ 	index = (Form_pg_index) GETSTRUCT(pg_index_tuple);
+ 
+ 	Assert(relation->rd_indextuple);
+ 	Assert(relation->rd_index);
+ 	
+ 	relation->rd_index->indisvalid = index->indisvalid;
+ 
+ 	heap_freetuple(pg_index_tuple);
+ 	return;
+ }
#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavan Deolasee (#5)
Re: pg_index updates and SI invalidation

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

Here is a patch which fixes this. We re-read part of the pg_index
row and update rd_index with the new data. I tested REINDEX and CIC
and both seems to work fine with the patch applied.

Tom, does this look good ?

It seems a bit brute-force. Why didn't you use SearchSysCache(INDEXRELID)
the same as RelationInitIndexAccessInfo does? And what's the point of
the extra tuple copy step, instead of assigning the values into the
cache entry immediately?

regards, tom lane

#7Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Tom Lane (#6)
Re: pg_index updates and SI invalidation

On 3/28/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It seems a bit brute-force. Why didn't you use SearchSysCache(INDEXRELID)
the same as RelationInitIndexAccessInfo does?

I tried that initially, but it gets into infinite recursion during initdb.

And what's the point of

the extra tuple copy step, instead of assigning the values into the
cache entry immediately?

Oops, sorry. Thats a copy-paste error. We certainly don't need
to copy the tuple.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavan Deolasee (#7)
Re: pg_index updates and SI invalidation

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

On 3/28/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It seems a bit brute-force. Why didn't you use SearchSysCache(INDEXRELID)
the same as RelationInitIndexAccessInfo does?

I tried that initially, but it gets into infinite recursion during initdb.

[squint...] How can that fail during a reload if it worked the first
time? Needs a closer look at what's happening.

regards, tom lane

#9Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#8)
Re: pg_index updates and SI invalidation

Where are we on this?

---------------------------------------------------------------------------

Tom Lane wrote:

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

On 3/28/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It seems a bit brute-force. Why didn't you use SearchSysCache(INDEXRELID)
the same as RelationInitIndexAccessInfo does?

I tried that initially, but it gets into infinite recursion during initdb.

[squint...] How can that fail during a reload if it worked the first
time? Needs a closer look at what's happening.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#10Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Bruce Momjian (#9)
1 attachment(s)
Re: pg_index updates and SI invalidation

On 4/3/07, Bruce Momjian <bruce@momjian.us> wrote:

Where are we on this?

---------------------------------------------------------------------------

Tom Lane wrote:

[squint...] How can that fail during a reload if it worked the first
time? Needs a closer look at what's happening.

Please see the attached updated patch, based on Tom's comments.

Attempt to reload index information for system indexes such as
pg_class_oid_index can cause infinite recursion. But I realized that
we don't need to reload system index information because we
neither allow CREATE INDEX or CIC on system relations. Only
REINDEX is allowed which does not need any reload. So we skip
index information reload for system relations.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

Attachments:

pg_index_SI_inval_v2.patchapplication/octet-stream; name=pg_index_SI_inval_v2.patchDownload
Index: src/backend/utils/cache/inval.c
===================================================================
RCS file: /home/cvs/hot/cvs/pgsql/src/backend/utils/cache/inval.c,v
retrieving revision 1.79
retrieving revision 1.79.2.1
diff -c -r1.79 -r1.79.2.1
*** src/backend/utils/cache/inval.c	5 Jan 2007 22:19:43 -0000	1.79
--- src/backend/utils/cache/inval.c	3 Apr 2007 05:53:46 -0000	1.79.2.1
***************
*** 596,601 ****
--- 596,612 ----
  		 */
  		databaseId = MyDatabaseId;
  	}
+ 	else if (tupleRelId == IndexRelationId)
+ 	{
+ 		Form_pg_index indextup = (Form_pg_index) GETSTRUCT(tuple);
+ 		
+ 		/*
+ 		 * When a pg_index row is updated, we should send SI invaliation
+ 		 * to the index relation
+ 		 */
+ 		relationId = indextup->indexrelid;
+ 		databaseId = MyDatabaseId;
+ 	}
  	else
  		return;
  
Index: src/backend/utils/cache/relcache.c
===================================================================
RCS file: /home/cvs/hot/cvs/pgsql/src/backend/utils/cache/relcache.c,v
retrieving revision 1.259
diff -c -r1.259 relcache.c
*** src/backend/utils/cache/relcache.c	29 Mar 2007 00:15:38 -0000	1.259
--- src/backend/utils/cache/relcache.c	5 Apr 2007 06:39:07 -0000
***************
*** 212,217 ****
--- 212,218 ----
  static OpClassCacheEnt *LookupOpclassInfo(Oid operatorClassOid,
  				  StrategyNumber numStrats,
  				  StrategyNumber numSupport);
+ static void RelationReloadIndexInfo(Relation relation);
  
  
  /*
***************
*** 1581,1593 ****
  /*
   * RelationReloadClassinfo - reload the pg_class row (only)
   *
!  *	This function is used only for indexes.  We currently allow only the
   *	pg_class row of an existing index to change (to support changes of
   *	owner, tablespace, or relfilenode), not its pg_index row or other
!  *	subsidiary index schema information.  Therefore it's sufficient to do
!  *	this when we get an SI invalidation.  Furthermore, there are cases
!  *	where it's necessary not to throw away the index information, especially
!  *	for "nailed" indexes which we are unable to rebuild on-the-fly.
   *
   *	We can't necessarily reread the pg_class row right away; we might be
   *	in a failed transaction when we receive the SI notification.  If so,
--- 1582,1598 ----
  /*
   * RelationReloadClassinfo - reload the pg_class row (only)
   *
!  *	This function is used only for indexes.  We used to allow only the
   *	pg_class row of an existing index to change (to support changes of
   *	owner, tablespace, or relfilenode), not its pg_index row or other
!  *	subsidiary index schema information. But for CREATE INDEX CONCURRENTLY
!  *	we need to change the pg_index attribute 'indisvalid' and update
!  *	pg_index row. In future, there might be similar requirements for
!  *	the existing and/or new attributes of pg_index to change. To
!  *	support that we now re-read the pg_index row and update one
!  *	or more fields selectively. We don't rebuild the entire the
!  *	entire information because that may not be even possible for
!  *	"nailed" indexes.
   *
   *	We can't necessarily reread the pg_class row right away; we might be
   *	in a failed transaction when we receive the SI notification.  If so,
***************
*** 1635,1640 ****
--- 1640,1653 ----
  	if (relation->rd_amcache)
  		pfree(relation->rd_amcache);
  	relation->rd_amcache = NULL;
+ 
+ 	/*
+ 	 * Before we mark the relation valid, re-read the pg_index row
+ 	 * and update some of the fields selectively. 
+ 	 */
+ 	if (!IsSystemRelation(relation))
+ 		RelationReloadIndexInfo(relation);
+ 
  	/* Okay, now it's valid again */
  	relation->rd_isvalid = true;
  }
***************
*** 3758,3760 ****
--- 3771,3802 ----
  	unlink(initfilename);
  	/* ignore any error, since it might not be there at all */
  }
+ 
+ /*
+  * Reload index-access-method support data for an index relation
+  */
+ static void
+ RelationReloadIndexInfo(Relation relation)
+ {
+ 	HeapTuple	tuple;
+ 	Form_pg_index	index;
+ 
+ 	tuple = SearchSysCache(INDEXRELID,
+ 						   ObjectIdGetDatum(RelationGetRelid(relation)),
+ 						   0, 0, 0);
+ 	if (!HeapTupleIsValid(tuple))
+ 		elog(ERROR, "cache lookup failed for index %u",
+ 			 RelationGetRelid(relation));
+ 
+ 	index = (Form_pg_index) GETSTRUCT(tuple);
+ 
+ 	Assert(relation->rd_indextuple);
+ 	Assert(relation->rd_index);
+ 	
+ 	relation->rd_index->indisvalid = index->indisvalid;
+ 	relation->rd_index->indcreatexid = index->indcreatexid;
+ 	relation->rd_index->indisrdonly = index->indisrdonly;
+ 
+ 	ReleaseSysCache(tuple);
+ 	return;
+ }
#11Bruce Momjian
bruce@momjian.us
In reply to: Pavan Deolasee (#10)
Re: pg_index updates and SI invalidation

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Pavan Deolasee wrote:

On 4/3/07, Bruce Momjian <bruce@momjian.us> wrote:

Where are we on this?

---------------------------------------------------------------------------

Tom Lane wrote:

[squint...] How can that fail during a reload if it worked the first
time? Needs a closer look at what's happening.

Please see the attached updated patch, based on Tom's comments.

Attempt to reload index information for system indexes such as
pg_class_oid_index can cause infinite recursion. But I realized that
we don't need to reload system index information because we
neither allow CREATE INDEX or CIC on system relations. Only
REINDEX is allowed which does not need any reload. So we skip
index information reload for system relations.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

[ Attachment, skipping... ]

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavan Deolasee (#10)
Re: pg_index updates and SI invalidation

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

Please see the attached updated patch, based on Tom's comments.

Attempt to reload index information for system indexes such as
pg_class_oid_index can cause infinite recursion. But I realized that
we don't need to reload system index information because we
neither allow CREATE INDEX or CIC on system relations. Only
REINDEX is allowed which does not need any reload. So we skip
index information reload for system relations.

Applied with revisions --- mostly, trying to keep the comments in sync
with the code. I also added a forced relcache inval on the index's
parent table at the end of CREATE INDEX CONCURRENTLY; this is to flush
cached plans and allow the newly valid index to be considered in
replanning. (The relcache inval on the index won't do it, since by
definition the index is not mentioned in any such plan...)

regards, tom lane