index-only scans versus serializable transactions

Started by Kevin Grittnerover 13 years ago4 messages
#1Kevin Grittner
Kevin.Grittner@wicourts.gov
1 attachment(s)

By not visiting the heap page for tuples, index-only scans fail to
acquire all of the necessary predicate locks for correct behavior at
the serializable transaction isolation level. The tag for the
tuple-level predicate locks includes the xmin, to avoid possible
problems with tid re-use. (This was not covered in initial
pre-release versions of SSI, and testing actually hit the problem.)
When an "index-only" scan does need to look at the heap because the
visibility map doesn't indicate that the tuple is visible to all
transactions, the tuple-level predicate lock is acquired. The best
we can do without visiting the heap is a page level lock on the heap
page, so that is what the attached patch does.

If there are no objections, I will apply to HEAD and 9.2.

-Kevin

Attachments:

index-only-serializable-v1.patchapplication/octet-stream; name=index-only-serializable-v1.patchDownload
*** a/src/backend/executor/nodeIndexonlyscan.c
--- b/src/backend/executor/nodeIndexonlyscan.c
***************
*** 30,35 ****
--- 30,36 ----
  #include "executor/nodeIndexonlyscan.h"
  #include "executor/nodeIndexscan.h"
  #include "storage/bufmgr.h"
+ #include "storage/predicate.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
  
***************
*** 52,58 **** IndexOnlyNext(IndexOnlyScanState *node)
  	ExprContext *econtext;
  	ScanDirection direction;
  	IndexScanDesc scandesc;
! 	HeapTuple	tuple;
  	TupleTableSlot *slot;
  	ItemPointer tid;
  
--- 53,59 ----
  	ExprContext *econtext;
  	ScanDirection direction;
  	IndexScanDesc scandesc;
! 	HeapTuple	tuple = NULL;
  	TupleTableSlot *slot;
  	ItemPointer tid;
  
***************
*** 147,152 **** IndexOnlyNext(IndexOnlyScanState *node)
--- 148,165 ----
  			}
  		}
  
+ 		/*
+ 		 * Predicate locks for index-only scans must be acquired at the page
+ 		 * level when the heap is not accessed, since tuple-level predicate
+ 		 * locks need the tuple's xmin value.  If we had to visit the tuple
+ 		 * anyway, then we already have the tuple-level lock and can skip the
+ 		 * page lock.
+ 		 */
+ 		if (tuple == NULL)
+ 			PredicateLockPage(scandesc->heapRelation,
+ 							  ItemPointerGetBlockNumber(tid),
+ 							  estate->es_snapshot);
+ 
  		return slot;
  	}
  
*** /dev/null
--- b/src/test/isolation/expected/index-only-scan.out
***************
*** 0 ****
--- 1,41 ----
+ Parsed test spec with 2 sessions
+ 
+ starting permutation: rxwy1 c1 rywx2 c2
+ step rxwy1: DELETE FROM taby WHERE id = (SELECT min(id) FROM tabx);
+ step c1: COMMIT;
+ step rywx2: DELETE FROM tabx WHERE id = (SELECT min(id) FROM taby);
+ step c2: COMMIT;
+ 
+ starting permutation: rxwy1 rywx2 c1 c2
+ step rxwy1: DELETE FROM taby WHERE id = (SELECT min(id) FROM tabx);
+ step rywx2: DELETE FROM tabx WHERE id = (SELECT min(id) FROM taby);
+ step c1: COMMIT;
+ step c2: COMMIT;
+ ERROR:  could not serialize access due to read/write dependencies among transactions
+ 
+ starting permutation: rxwy1 rywx2 c2 c1
+ step rxwy1: DELETE FROM taby WHERE id = (SELECT min(id) FROM tabx);
+ step rywx2: DELETE FROM tabx WHERE id = (SELECT min(id) FROM taby);
+ step c2: COMMIT;
+ step c1: COMMIT;
+ ERROR:  could not serialize access due to read/write dependencies among transactions
+ 
+ starting permutation: rywx2 rxwy1 c1 c2
+ step rywx2: DELETE FROM tabx WHERE id = (SELECT min(id) FROM taby);
+ step rxwy1: DELETE FROM taby WHERE id = (SELECT min(id) FROM tabx);
+ step c1: COMMIT;
+ step c2: COMMIT;
+ ERROR:  could not serialize access due to read/write dependencies among transactions
+ 
+ starting permutation: rywx2 rxwy1 c2 c1
+ step rywx2: DELETE FROM tabx WHERE id = (SELECT min(id) FROM taby);
+ step rxwy1: DELETE FROM taby WHERE id = (SELECT min(id) FROM tabx);
+ step c2: COMMIT;
+ step c1: COMMIT;
+ ERROR:  could not serialize access due to read/write dependencies among transactions
+ 
+ starting permutation: rywx2 c2 rxwy1 c1
+ step rywx2: DELETE FROM tabx WHERE id = (SELECT min(id) FROM taby);
+ step c2: COMMIT;
+ step rxwy1: DELETE FROM taby WHERE id = (SELECT min(id) FROM tabx);
+ step c1: COMMIT;
*** a/src/test/isolation/isolation_schedule
--- b/src/test/isolation/isolation_schedule
***************
*** 9,14 **** test: ri-trigger
--- 9,15 ----
  test: partial-index
  test: two-ids
  test: multiple-row-versions
+ test: index-only-scan
  test: fk-contention
  test: fk-deadlock
  test: fk-deadlock2
*** /dev/null
--- b/src/test/isolation/specs/index-only-scan.spec
***************
*** 0 ****
--- 1,46 ----
+ # index-only scan test
+ #
+ # This test tries to expose problems with the interaction between index-only
+ # scans and SSI.
+ #
+ # Any overlap between the transactions must cause a serialization failure.
+ 
+ setup
+ {
+   CREATE TABLE tabx (id int NOT NULL);
+   INSERT INTO tabx SELECT generate_series(1,10000);
+   ALTER TABLE tabx ADD PRIMARY KEY (id);
+   CREATE TABLE taby (id int NOT NULL);
+   INSERT INTO taby SELECT generate_series(1,10000);
+   ALTER TABLE taby ADD PRIMARY KEY (id);
+ }
+ setup { VACUUM FREEZE ANALYZE tabx; }
+ setup { VACUUM FREEZE ANALYZE taby; }
+ 
+ teardown
+ {
+   DROP TABLE tabx;
+   DROP TABLE taby;
+ }
+ 
+ session "s1"
+ setup
+ {
+   BEGIN ISOLATION LEVEL SERIALIZABLE;
+   SET LOCAL seq_page_cost = 0.1;
+   SET LOCAL random_page_cost = 0.1;
+   SET LOCAL cpu_tuple_cost = 0.03;
+ }
+ step "rxwy1" { DELETE FROM taby WHERE id = (SELECT min(id) FROM tabx); }
+ step "c1" { COMMIT; }
+ 
+ session "s2"
+ setup
+ {
+   BEGIN ISOLATION LEVEL SERIALIZABLE;
+   SET LOCAL seq_page_cost = 0.1;
+   SET LOCAL random_page_cost = 0.1;
+   SET LOCAL cpu_tuple_cost = 0.03;
+ }
+ step "rywx2" { DELETE FROM tabx WHERE id = (SELECT min(id) FROM taby); }
+ step "c2" { COMMIT; }
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#1)
Re: index-only scans versus serializable transactions

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

By not visiting the heap page for tuples, index-only scans fail to
acquire all of the necessary predicate locks for correct behavior at
the serializable transaction isolation level. The tag for the
tuple-level predicate locks includes the xmin, to avoid possible
problems with tid re-use. (This was not covered in initial
pre-release versions of SSI, and testing actually hit the problem.)
When an "index-only" scan does need to look at the heap because the
visibility map doesn't indicate that the tuple is visible to all
transactions, the tuple-level predicate lock is acquired. The best
we can do without visiting the heap is a page level lock on the heap
page, so that is what the attached patch does.

If there are no objections, I will apply to HEAD and 9.2.

This isn't right in detail: there are paths through the loop where
"tuple" is not NULL at the beginning of the next iteration
(specifically, consider failure of a lossy-qual recheck). I think
that only results in wasted work, but it's still not operating as
intended. I'd suggest moving the declaration/initialization of the
"tuple" variable to inside the while loop, since there's no desire
for its value to carry across loops.

regards, tom lane

#3Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#2)
1 attachment(s)
Re: index-only scans versus serializable transactions

Tom Lane wrote:

"Kevin Grittner" writes:

By not visiting the heap page for tuples, index-only scans fail to
acquire all of the necessary predicate locks for correct behavior
at the serializable transaction isolation level. The tag for the
tuple-level predicate locks includes the xmin, to avoid possible
problems with tid re-use. (This was not covered in initial
pre-release versions of SSI, and testing actually hit the
problem.) When an "index-only" scan does need to look at the heap
because the visibility map doesn't indicate that the tuple is
visible to all transactions, the tuple-level predicate lock is
acquired. The best we can do without visiting the heap is a page
level lock on the heap page, so that is what the attached patch
does.

If there are no objections, I will apply to HEAD and 9.2.

This isn't right in detail: there are paths through the loop where
"tuple" is not NULL at the beginning of the next iteration
(specifically, consider failure of a lossy-qual recheck). I think
that only results in wasted work, but it's still not operating as
intended. I'd suggest moving the declaration/initialization of the
"tuple" variable to inside the while loop, since there's no desire
for its value to carry across loops.

You're right. It looks to me like moving the declaration (and
initialization) to more local scape (just inside the loop) fixes it.

New version attached. Will apply if no further problems are found.

-Kevin

Attachments:

index-only-serializable-v2.patchapplication/octet-stream; name=index-only-serializable-v2.patchDownload
*** a/src/backend/executor/nodeIndexonlyscan.c
--- b/src/backend/executor/nodeIndexonlyscan.c
***************
*** 30,35 ****
--- 30,36 ----
  #include "executor/nodeIndexonlyscan.h"
  #include "executor/nodeIndexscan.h"
  #include "storage/bufmgr.h"
+ #include "storage/predicate.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
  
***************
*** 52,58 **** IndexOnlyNext(IndexOnlyScanState *node)
  	ExprContext *econtext;
  	ScanDirection direction;
  	IndexScanDesc scandesc;
- 	HeapTuple	tuple;
  	TupleTableSlot *slot;
  	ItemPointer tid;
  
--- 53,58 ----
***************
*** 78,83 **** IndexOnlyNext(IndexOnlyScanState *node)
--- 78,85 ----
  	 */
  	while ((tid = index_getnext_tid(scandesc, direction)) != NULL)
  	{
+ 		HeapTuple	tuple = NULL;
+ 
  		/*
  		 * We can skip the heap fetch if the TID references a heap page on
  		 * which all tuples are known visible to everybody.  In any case,
***************
*** 147,152 **** IndexOnlyNext(IndexOnlyScanState *node)
--- 149,166 ----
  			}
  		}
  
+ 		/*
+ 		 * Predicate locks for index-only scans must be acquired at the page
+ 		 * level when the heap is not accessed, since tuple-level predicate
+ 		 * locks need the tuple's xmin value.  If we had to visit the tuple
+ 		 * anyway, then we already have the tuple-level lock and can skip the
+ 		 * page lock.
+ 		 */
+ 		if (tuple == NULL)
+ 			PredicateLockPage(scandesc->heapRelation,
+ 							  ItemPointerGetBlockNumber(tid),
+ 							  estate->es_snapshot);
+ 
  		return slot;
  	}
  
*** /dev/null
--- b/src/test/isolation/expected/index-only-scan.out
***************
*** 0 ****
--- 1,41 ----
+ Parsed test spec with 2 sessions
+ 
+ starting permutation: rxwy1 c1 rywx2 c2
+ step rxwy1: DELETE FROM taby WHERE id = (SELECT min(id) FROM tabx);
+ step c1: COMMIT;
+ step rywx2: DELETE FROM tabx WHERE id = (SELECT min(id) FROM taby);
+ step c2: COMMIT;
+ 
+ starting permutation: rxwy1 rywx2 c1 c2
+ step rxwy1: DELETE FROM taby WHERE id = (SELECT min(id) FROM tabx);
+ step rywx2: DELETE FROM tabx WHERE id = (SELECT min(id) FROM taby);
+ step c1: COMMIT;
+ step c2: COMMIT;
+ ERROR:  could not serialize access due to read/write dependencies among transactions
+ 
+ starting permutation: rxwy1 rywx2 c2 c1
+ step rxwy1: DELETE FROM taby WHERE id = (SELECT min(id) FROM tabx);
+ step rywx2: DELETE FROM tabx WHERE id = (SELECT min(id) FROM taby);
+ step c2: COMMIT;
+ step c1: COMMIT;
+ ERROR:  could not serialize access due to read/write dependencies among transactions
+ 
+ starting permutation: rywx2 rxwy1 c1 c2
+ step rywx2: DELETE FROM tabx WHERE id = (SELECT min(id) FROM taby);
+ step rxwy1: DELETE FROM taby WHERE id = (SELECT min(id) FROM tabx);
+ step c1: COMMIT;
+ step c2: COMMIT;
+ ERROR:  could not serialize access due to read/write dependencies among transactions
+ 
+ starting permutation: rywx2 rxwy1 c2 c1
+ step rywx2: DELETE FROM tabx WHERE id = (SELECT min(id) FROM taby);
+ step rxwy1: DELETE FROM taby WHERE id = (SELECT min(id) FROM tabx);
+ step c2: COMMIT;
+ step c1: COMMIT;
+ ERROR:  could not serialize access due to read/write dependencies among transactions
+ 
+ starting permutation: rywx2 c2 rxwy1 c1
+ step rywx2: DELETE FROM tabx WHERE id = (SELECT min(id) FROM taby);
+ step c2: COMMIT;
+ step rxwy1: DELETE FROM taby WHERE id = (SELECT min(id) FROM tabx);
+ step c1: COMMIT;
*** a/src/test/isolation/isolation_schedule
--- b/src/test/isolation/isolation_schedule
***************
*** 9,14 **** test: ri-trigger
--- 9,15 ----
  test: partial-index
  test: two-ids
  test: multiple-row-versions
+ test: index-only-scan
  test: fk-contention
  test: fk-deadlock
  test: fk-deadlock2
*** /dev/null
--- b/src/test/isolation/specs/index-only-scan.spec
***************
*** 0 ****
--- 1,46 ----
+ # index-only scan test
+ #
+ # This test tries to expose problems with the interaction between index-only
+ # scans and SSI.
+ #
+ # Any overlap between the transactions must cause a serialization failure.
+ 
+ setup
+ {
+   CREATE TABLE tabx (id int NOT NULL);
+   INSERT INTO tabx SELECT generate_series(1,10000);
+   ALTER TABLE tabx ADD PRIMARY KEY (id);
+   CREATE TABLE taby (id int NOT NULL);
+   INSERT INTO taby SELECT generate_series(1,10000);
+   ALTER TABLE taby ADD PRIMARY KEY (id);
+ }
+ setup { VACUUM FREEZE ANALYZE tabx; }
+ setup { VACUUM FREEZE ANALYZE taby; }
+ 
+ teardown
+ {
+   DROP TABLE tabx;
+   DROP TABLE taby;
+ }
+ 
+ session "s1"
+ setup
+ {
+   BEGIN ISOLATION LEVEL SERIALIZABLE;
+   SET LOCAL seq_page_cost = 0.1;
+   SET LOCAL random_page_cost = 0.1;
+   SET LOCAL cpu_tuple_cost = 0.03;
+ }
+ step "rxwy1" { DELETE FROM taby WHERE id = (SELECT min(id) FROM tabx); }
+ step "c1" { COMMIT; }
+ 
+ session "s2"
+ setup
+ {
+   BEGIN ISOLATION LEVEL SERIALIZABLE;
+   SET LOCAL seq_page_cost = 0.1;
+   SET LOCAL random_page_cost = 0.1;
+   SET LOCAL cpu_tuple_cost = 0.03;
+ }
+ step "rywx2" { DELETE FROM tabx WHERE id = (SELECT min(id) FROM taby); }
+ step "c2" { COMMIT; }
#4Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#3)
Re: index-only scans versus serializable transactions

"Kevin Grittner" wrote:

New version attached. Will apply if no further problems are found.

Pushed to master and REL9_2_STABLE.

-Kevin