Performance-improvement idea: shortcircuit unique-index checks

Started by Tom Lanealmost 25 years ago6 messages
#1Tom Lane
tgl@sss.pgh.pa.us

Over the weekend I noticed that Tatsuo's pgbench benchmark seems to
spend an awful lot of its time down inside _bt_check_unique. This
happens because with table declarations like

create table accounts(aid int, primary key(aid), bid int,
abalance int, filler char(84))

and commands like

update accounts set abalance = abalance + x where aid = y

the "update" is inserting a new tuple and the index on aid wants to
make sure this insertion doesn't violate the uniqueness constraint.
To do that, it has to visit all active *and dead* tuples with the same
aid index value, to make sure they're all deleted or being deleted by
the current transaction. That's expensive if they're scattered all over
the table.

However, since we have not changed the aid column from its prior value,
it seems like this check is wasted effort. We should be able to deduce
that if the prior state of the row was OK then this one is too.

I'm not quite sure how to implement this, but I wanted to toss the idea
out for discussion. Probably we'd have to have some cooperation between
the heap_update level (where the fact that it's an update is known, and
where we'd have a chance to test for changes in particular columns) and
the index access level. Maybe it's wrong for the index access level to
have primary responsibility for uniqueness checks in the first place.

Obviously this isn't going to happen for 7.1, but it might make a nice
performance improvement for 7.2.

Comments?

regards, tom lane

#2Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#1)
Re: Performance-improvement idea: shortcircuit unique-index checks

I'm not quite sure how to implement this, but I wanted to toss the idea
out for discussion. Probably we'd have to have some cooperation between
the heap_update level (where the fact that it's an update is known, and
where we'd have a chance to test for changes in particular columns) and
the index access level. Maybe it's wrong for the index access level to
have primary responsibility for uniqueness checks in the first place.

Obviously this isn't going to happen for 7.1, but it might make a nice
performance improvement for 7.2.

Seems a better solution would be to put a 'deleted' bit in the index so
we would have to visit those heap tuples only once for a committed
status. Similar to what we do with heap tuples so we don't have to
visit pg_log repeatedly.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#2)
Re: Performance-improvement idea: shortcircuit unique-index checks

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Seems a better solution would be to put a 'deleted' bit in the index so
we would have to visit those heap tuples only once for a committed
status. Similar to what we do with heap tuples so we don't have to
visit pg_log repeatedly.

That's only a partial solution, since the index is still going to have
to visit the row's existing tuple (which is, by definition, not yet
committed dead). My point is that the index scanning done for
uniqueness checks can be eliminated *entirely* for one pretty-common
case.

A deleted bit in index entries might be useful too, but I think it
attacks a different set of cases.

regards, tom lane

#4Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Tom Lane (#1)
Re: Performance-improvement idea: shortcircuit unique-index checks

On Mon, 19 Feb 2001, Tom Lane wrote:

I'm not quite sure how to implement this, but I wanted to toss the idea
out for discussion. Probably we'd have to have some cooperation between
the heap_update level (where the fact that it's an update is known, and
where we'd have a chance to test for changes in particular columns) and
the index access level. Maybe it's wrong for the index access level to
have primary responsibility for uniqueness checks in the first place.

Obviously this isn't going to happen for 7.1, but it might make a nice
performance improvement for 7.2.

Comments?

This sounds like a win for alot of updates where keys don't change.

Also, if work is going to be done here, it might be nice to make the
unique constraint have the correct semantics for checking after statement
rather than per-row when multiple rows are changed in the same statement
since I'm pretty sure the standard semantics is that as long as the
rows are different at the end of the statement it's okay (which is
not what we do currently AFAICS). I'm really not sure what's involved in
that though.

#5Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#3)
Re: Performance-improvement idea: shortcircuit unique-index checks

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Seems a better solution would be to put a 'deleted' bit in the index so
we would have to visit those heap tuples only once for a committed
status. Similar to what we do with heap tuples so we don't have to
visit pg_log repeatedly.

That's only a partial solution, since the index is still going to have
to visit the row's existing tuple (which is, by definition, not yet
committed dead). My point is that the index scanning done for
uniqueness checks can be eliminated *entirely* for one pretty-common
case.

I see.

A deleted bit in index entries might be useful too, but I think it
attacks a different set of cases.

Yes. Let me add some TODO items:

* Add deleted bit to index tuples to reduce heap access
* Prevent index uniqueness checks when UPDATE does not modify column

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#6Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Bruce Momjian (#5)
Re: Performance-improvement idea: shortcircuit unique-indexchecks

Bruce Momjian wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Yes. Let me add some TODO items:

* Add deleted bit to index tuples to reduce heap access

ISTM this isn't a bad idea. However note that there remains only
1 bit unused in IndexTupleData.

Regards,
Hiroshi Inoue