An out-of-date comment in nodeIndexonlyscan.c

Started by Thomas Munroalmost 8 years ago15 messageshackers
Jump to latest
#1Thomas Munro
thomas.munro@gmail.com

Hello,

Since commit cdf91edb (2012), nodeIndexonlyscan.c says:

/*
* Predicate locks for index-only scans must be
acquired at the page
* level when the heap is not accessed, since
tuple-level predicate
* locks need the tuple's xmin value. If we had to
visit the tuple
* anyway, then we already have the tuple-level lock
and can skip the
* page lock.
*/
if (tuple == NULL)
PredicateLockPage(scandesc->heapRelation,

ItemPointerGetBlockNumber(tid),
estate->es_snapshot);

The first sentence of that comment is no longer true as of commit
c01262a8 (2013). As for whether it's necessary to predicate-lock the
whole eheap page (rather than the heap tuple) anyway because of HOT
update chains, I don't know, so I'm not sure what wording to propose
instead.

--
Thomas Munro
http://www.enterprisedb.com

#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Thomas Munro (#1)
Re: An out-of-date comment in nodeIndexonlyscan.c

On 14/05/18 02:15, Thomas Munro wrote:

Hello,

Since commit cdf91edb (2012), nodeIndexonlyscan.c says:

/*
* Predicate locks for index-only scans must be
acquired at the page
* level when the heap is not accessed, since
tuple-level predicate
* locks need the tuple's xmin value. If we had to
visit the tuple
* anyway, then we already have the tuple-level lock
and can skip the
* page lock.
*/
if (tuple == NULL)
PredicateLockPage(scandesc->heapRelation,

ItemPointerGetBlockNumber(tid),
estate->es_snapshot);

The first sentence of that comment is no longer true as of commit
c01262a8 (2013). As for whether it's necessary to predicate-lock the
whole eheap page (rather than the heap tuple) anyway because of HOT
update chains, I don't know, so I'm not sure what wording to propose
instead.

Hmm. If there are any updated tuples, HOT or not, the visibility map bit
will not be set, and we won't reach this code. So I think we could
acquire the tuple lock here.

- Heikki

#3Thomas Munro
thomas.munro@gmail.com
In reply to: Heikki Linnakangas (#2)
Re: An out-of-date comment in nodeIndexonlyscan.c

On Thu, May 17, 2018 at 3:49 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 14/05/18 02:15, Thomas Munro wrote:

Since commit cdf91edb (2012), nodeIndexonlyscan.c says:

/*
* Predicate locks for index-only scans must be
acquired at the page
* level when the heap is not accessed, since
tuple-level predicate
* locks need the tuple's xmin value. If we had to
visit the tuple
* anyway, then we already have the tuple-level lock
and can skip the
* page lock.
*/
if (tuple == NULL)
PredicateLockPage(scandesc->heapRelation,

ItemPointerGetBlockNumber(tid),
estate->es_snapshot);

The first sentence of that comment is no longer true as of commit
c01262a8 (2013). As for whether it's necessary to predicate-lock the
whole eheap page (rather than the heap tuple) anyway because of HOT
update chains, I don't know, so I'm not sure what wording to propose
instead.

Hmm. If there are any updated tuples, HOT or not, the visibility map bit
will not be set, and we won't reach this code. So I think we could
acquire the tuple lock here.

Right. CC Kevin. I think we should at least fix the comment. As for
changing the lock level, PredicateLockTuple() wants a heap tuple that
we don't have, so we'd probably need to add a PredicateLockTid()
function.

--
Thomas Munro
http://www.enterprisedb.com

#4Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#3)
Re: An out-of-date comment in nodeIndexonlyscan.c

On Fri, Feb 8, 2019 at 4:58 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, May 17, 2018 at 3:49 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 14/05/18 02:15, Thomas Munro wrote:

The first sentence of that comment is no longer true as of commit
c01262a8 (2013). As for whether it's necessary to predicate-lock the
whole eheap page (rather than the heap tuple) anyway because of HOT
update chains, I don't know, so I'm not sure what wording to propose
instead.

Hmm. If there are any updated tuples, HOT or not, the visibility map bit
will not be set, and we won't reach this code. So I think we could
acquire the tuple lock here.

Right. CC Kevin. I think we should at least fix the comment. As for
changing the lock level, PredicateLockTuple() wants a heap tuple that
we don't have, so we'd probably need to add a PredicateLockTid()
function.

Here's a patch I'd like to commit to fix the comment. We could look
at improving the actual code after
https://commitfest.postgresql.org/23/2169/ is done.

I wonder if it might be possible to avoid page locks on both the heap
*and* index in some limited cases, as I mentioned over here (just an
idea, could be way off base):

/messages/by-id/CA+hUKGJGDVfhHmoUGzi-_J+N8FmRjmWTY0MCd1ZV5Fj9qdyb1w@mail.gmail.com

--
Thomas Munro
https://enterprisedb.com

Attachments:

0001-Fix-misleading-comment-in-nodeIndexonlyscan.c.patchapplication/octet-stream; name=0001-Fix-misleading-comment-in-nodeIndexonlyscan.c.patchDownload+2-6
#5Ashwin Agrawal
aagrawal@pivotal.io
In reply to: Thomas Munro (#4)
Re: An out-of-date comment in nodeIndexonlyscan.c

On Thu, Jun 27, 2019 at 4:33 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Here's a patch I'd like to commit to fix the comment. We could look
at improving the actual code after
https://commitfest.postgresql.org/23/2169/ is done.

Change looks good to me.

I wonder if it might be possible to avoid page locks on both the heap
*and* index in some limited cases, as I mentioned over here (just an
idea, could be way off base):

/messages/by-id/CA+hUKGJGDVfhHmoUGzi-_J+N8FmRjmWTY0MCd1ZV5Fj9qdyb1w@mail.gmail.com

I am in same boat as you. One can get serializable conflict error today
based on tuple gets HOT updated or not. HOT is logically internal code
optimization and not so much user functionality, so ideally feels shouldn't
affect serializable behavior. But it does currently, again, due to index
locking. Disable HOT update and 4 isolation tests fail due to "could not
serialize access due to read/write dependencies among transactions"
otherwise not. If the tuple happens to fit on same page user will not get
the error, if the tuple gets inserted to different page the error can
happen, due to index page locking. I had discussed this with Heikki and
thinking is we shouldn't need to take the lock on the index page, if the
index key was not changed, even if it was a non-HOT update. Not sure of
complications and implications, but just a thought.

#6Thomas Munro
thomas.munro@gmail.com
In reply to: Ashwin Agrawal (#5)
Re: An out-of-date comment in nodeIndexonlyscan.c

On Fri, Jun 28, 2019 at 12:55 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote:

Change looks good to me.

Pushed, thanks.

--
Thomas Munro
https://enterprisedb.com

#7Thomas Munro
thomas.munro@gmail.com
In reply to: Ashwin Agrawal (#5)
Re: An out-of-date comment in nodeIndexonlyscan.c

On Fri, Jun 28, 2019 at 12:55 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote:

On Thu, Jun 27, 2019 at 4:33 PM Thomas Munro <thomas.munro@gmail.com> wrote:

I wonder if it might be possible to avoid page locks on both the heap
*and* index in some limited cases, as I mentioned over here (just an
idea, could be way off base):

/messages/by-id/CA+hUKGJGDVfhHmoUGzi-_J+N8FmRjmWTY0MCd1ZV5Fj9qdyb1w@mail.gmail.com

I am in same boat as you. One can get serializable conflict error today based on tuple gets HOT updated or not. HOT is logically internal code optimization and not so much user functionality, so ideally feels shouldn't affect serializable behavior. But it does currently, again, due to index locking. Disable HOT update and 4 isolation tests fail due to "could not serialize access due to read/write dependencies among transactions" otherwise not. If the tuple happens to fit on same page user will not get the error, if the tuple gets inserted to different page the error can happen, due to index page locking. I had discussed this with Heikki and thinking is we shouldn't need to take the lock on the index page, if the index key was not changed, even if it was a non-HOT update. Not sure of complications and implications, but just a thought.

Oh, I think the idea I was suggesting might be the same as this item
from README-SSI (unrelated to HOT, but related to index uniqueness,
particularly in index-only-scan where you might be able to skip the
btree page lock):

"Several optimizations are possible, though not all are implemented yet:
...
* An index scan which is comparing for equality on the entire key
for a unique index need not acquire a predicate lock as long as a key
is found corresponding to a visible tuple which has not been modified
by another transaction -- there are no "between or around" gaps to
cover."

--
Thomas Munro
https://enterprisedb.com

#8Thomas Munro
thomas.munro@gmail.com
In reply to: Ashwin Agrawal (#5)
Re: An out-of-date comment in nodeIndexonlyscan.c

On Fri, Jun 28, 2019 at 12:55 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote:

On Thu, Jun 27, 2019 at 4:33 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Here's a patch I'd like to commit to fix the comment. We could look
at improving the actual code after
https://commitfest.postgresql.org/23/2169/ is done.

Change looks good to me.

... and here is the corresponding code change, with a test to
demonstrate the change.

I'm working on a proof-of-concept to skip the btree page lock
sometimes too, which seems promising, but it requires some planner
work which has temporarily pretzeled my brain.

Attachments:

0001-Use-tuple-level-SIREAD-locks-for-index-only-scans.patchtext/x-patch; charset=US-ASCII; name=0001-Use-tuple-level-SIREAD-locks-for-index-only-scans.patchDownload+113-5
#9Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#8)
Re: An out-of-date comment in nodeIndexonlyscan.c

On Sat, Jun 12, 2021 at 2:35 PM Thomas Munro <thomas.munro@gmail.com> wrote:

... and here is the corresponding code change, with a test to
demonstrate the change.

I'm working on a proof-of-concept to skip the btree page lock
sometimes too, which seems promising, but it requires some planner
work which has temporarily pretzeled my brain.

Here's a highly experimental patch I came up with that seems to
produce the right results in simple cases, without (yet) involving the
planner. The regression tests show single table queries, but it works
also for nest loop joins, which is where this optimisation should be
most interesting, I think. There are a few weird things about this
patch though, and there could well be much better ways to do it, as
noted in the commit message and comments. It's a start on the
problem...

Attachments:

v2-0001-Use-tuple-level-SIREAD-locks-for-index-only-scans.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Use-tuple-level-SIREAD-locks-for-index-only-scans.patchDownload+113-5
v2-0002-WIP-Skip-SIREAD-locks-on-btree-pages-when-possibl.patchtext/x-patch; charset=UTF-8; name=v2-0002-WIP-Skip-SIREAD-locks-on-btree-pages-when-possibl.patchDownload+532-25
#10David Rowley
dgrowleyml@gmail.com
In reply to: Thomas Munro (#9)
Re: An out-of-date comment in nodeIndexonlyscan.c

On Mon, 14 Jun 2021 at 00:02, Thomas Munro <thomas.munro@gmail.com> wrote:

Here's a highly experimental patch I came up with that seems to
produce the right results in simple cases, without (yet) involving the
planner.

+ /* Find all equality quals. */
+ for (int i = 0; i < n_scan_keys; ++i)
+ {
+ if (scan_keys[i].sk_strategy == BTEqualStrategyNumber)
+ attnos[nattnos++] = scan_keys[i].sk_attno;
+ }
+
+ /* Are all attributes covered? */
+ /* XXX is this check enough or do we need to work harder? */
+ qsort(attnos, nattnos, sizeof(AttrNumber), compare_int16);
+ nattnos = qunique(attnos, nattnos, sizeof(AttrNumber), compare_int16);
+ if (nattnos == index->rd_index->indnkeyatts)

I think a more optimal and nicer way of doing that would be setting
bits in a Bitmapset then checking bms_num_members is equal to
n_scan_keys.

David

#11Thomas Munro
thomas.munro@gmail.com
In reply to: David Rowley (#10)
Re: An out-of-date comment in nodeIndexonlyscan.c

On Mon, Jun 14, 2021 at 12:54 AM David Rowley <dgrowleyml@gmail.com> wrote:

I think a more optimal and nicer way of doing that would be setting
bits in a Bitmapset then checking bms_num_members is equal to
n_scan_keys.

Shouldn't it be compared with indnkeyatts? Yes, much nicer, thanks!

Attachments:

v3-0001-Use-tuple-level-SIREAD-locks-for-index-only-scans.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Use-tuple-level-SIREAD-locks-for-index-only-scans.patchDownload+113-5
v3-0002-WIP-Skip-SIREAD-locks-on-btree-pages-when-possibl.patchtext/x-patch; charset=UTF-8; name=v3-0002-WIP-Skip-SIREAD-locks-on-btree-pages-when-possibl.patchDownload+519-25
#12David Rowley
dgrowleyml@gmail.com
In reply to: Thomas Munro (#11)
Re: An out-of-date comment in nodeIndexonlyscan.c

On Mon, 14 Jun 2021 at 10:03, Thomas Munro <thomas.munro@gmail.com> wrote:

On Mon, Jun 14, 2021 at 12:54 AM David Rowley <dgrowleyml@gmail.com> wrote:

I think a more optimal and nicer way of doing that would be setting
bits in a Bitmapset then checking bms_num_members is equal to
n_scan_keys.

Shouldn't it be compared with indnkeyatts? Yes, much nicer, thanks!

Oh yeah, I did mean that. Thanks for the correction.

Have you also thought about deferrable unique / primary key constraints?

It's possible to the uniqueness temporarily violated during a
transaction when the unique constraint is deferred,

For example:
create table t (id int primary key deferrable initially deferred);
begin;
insert into t values(1),(1);
select * from t;
id
----
1
1
(2 rows)

I think you'd just need to add a check to ensure that indimmediate is
true around where you're checking the indisunique flag.

David

#13Thomas Munro
thomas.munro@gmail.com
In reply to: David Rowley (#12)
Re: An out-of-date comment in nodeIndexonlyscan.c

On Mon, Jun 14, 2021 at 2:29 PM David Rowley <dgrowleyml@gmail.com> wrote:

Have you also thought about deferrable unique / primary key constraints?

It's possible to the uniqueness temporarily violated during a
transaction when the unique constraint is deferred,

Oh, yeah, very good question. I think many scenarios involving
duplicates work correctly, but I think there is a danger like this:

create table t (i int primary key deferrable initially deferred, j int);
create unique index on t(j);
insert into t values (999, 999); -- avoid empty index
set enable_seqscan = off;

begin transaction isolation level serializable;
insert into t values (1, 2); -- create phantom row
select * from t where i = 1;
delete from t where j = 2; -- remove phantom row
SELECT locktype, relation::regclass, page, tuple FROM pg_locks WHERE
mode = 'SIReadLock';
commit;

master:

locktype | relation | page | tuple
----------+----------+------+-------
page | t_pkey | 1 |
page | t_j_idx | 1 |
(2 rows)

v3 patch:

locktype | relation | page | tuple
----------+----------+------+-------
(0 rows)

In fact the lock on t_pkey's page 1 was important: it represents the
search for a tuple with i = 1, other than our temporary phantom (only
allowed because constraint deferred). If someone else inserts a row
matching i = 1, the SSI system will not know that we tried to look for
it, because our own temporary phantom confused us.

I think you'd just need to add a check to ensure that indimmediate is
true around where you're checking the indisunique flag.

Yeah, that fixes the problem in this case at least. With v4:

locktype | relation | page | tuple
----------+----------+------+-------
page | t_pkey | 1 |
(1 row)

(It's expected that t_j_idx is not locked: the delete query benefits
from the optimisation when accessing the index on t(j)).

That test case is a little confusing, because at no point does it ever
actually create a duplicate, but I suspect the problem is avoided
without the deferred constraint because you'd either get a UCV on
insertion, or block anyone else from inserting until after you commit
(even after you delete the phantom), and I think that may avoid the
hazard. I could be confused about that. If I am wrong, then a
possible general solution may be to apply the optimisation only if we
find a match that wasn't written by this transaction, though I'm not
sure how given the current layering.

Attachments:

v4-0001-Use-tuple-level-SIREAD-locks-for-index-only-scans.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Use-tuple-level-SIREAD-locks-for-index-only-scans.patchDownload+113-5
v4-0002-WIP-Skip-SIREAD-locks-on-btree-pages-when-possibl.patchtext/x-patch; charset=UTF-8; name=v4-0002-WIP-Skip-SIREAD-locks-on-btree-pages-when-possibl.patchDownload+525-25
#14Daniel Gustafsson
daniel@yesql.se
In reply to: Thomas Munro (#13)
Re: An out-of-date comment in nodeIndexonlyscan.c

This patch now fails to apply, probably due to a mostly trivial collision with
fdd88571454e2b00dbe446e8609c6e4294ca89ae in the test files. Can you submit a
rebased version?

--
Daniel Gustafsson https://vmware.com/

#15Daniel Gustafsson
daniel@yesql.se
In reply to: Daniel Gustafsson (#14)
Re: An out-of-date comment in nodeIndexonlyscan.c

On 9 Nov 2021, at 15:28, Daniel Gustafsson <daniel@yesql.se> wrote:

This patch now fails to apply, probably due to a mostly trivial collision with
fdd88571454e2b00dbe446e8609c6e4294ca89ae in the test files. Can you submit a
rebased version?

I've marked this Returned with Feedback, please feel free to resubmit when a
new version of the patch is available.

--
Daniel Gustafsson https://vmware.com/