HOT updates in index-less tables

Started by Tom Laneabout 15 years ago8 messages
#1Tom Lane
tgl@sss.pgh.pa.us

If a table has no indexes, we will always decide that any same-page
update operation is a HOT update, since obviously it isn't modifying
any indexed columns. But is there any benefit to doing so? I don't
see one offhand, and it has a downside: we're very likely to
encounter broken HOT chains if an index is created later. That leads
to the sort of unexpected behavior exhibited here:
http://archives.postgresql.org/pgsql-performance/2010-11/msg00216.php

I'm thinking maybe HeapSatisfiesHOTUpdate should be changed so that it
always returns false if the relation has no indexes, which could be
checked cheaply via relation->rd_rel->relhasindex.

regards, tom lane

#2Hannu Krosing
hannu@2ndQuadrant.com
In reply to: Tom Lane (#1)
Re: HOT updates in index-less tables

On Sat, 2010-11-13 at 10:51 -0500, Tom Lane wrote:

If a table has no indexes, we will always decide that any same-page
update operation is a HOT update, since obviously it isn't modifying
any indexed columns. But is there any benefit to doing so?

If we do the in-page "mini vacuum" even without HOT, then there should
be no benefit from index-less HOT updates.

If we don't try the mini vacuum in this case, then some current
behaviuors could become much worse, say a table with one row and load of
updates.

I don't
see one offhand, and it has a downside: we're very likely to
encounter broken HOT chains if an index is created later. That leads
to the sort of unexpected behavior exhibited here:
http://archives.postgresql.org/pgsql-performance/2010-11/msg00216.php

I'm thinking maybe HeapSatisfiesHOTUpdate should be changed so that it
always returns false if the relation has no indexes, which could be
checked cheaply via relation->rd_rel->relhasindex.

regards, tom lane

--
-------
Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#2)
Re: HOT updates in index-less tables

Hannu Krosing <hannu@2ndQuadrant.com> writes:

On Sat, 2010-11-13 at 10:51 -0500, Tom Lane wrote:

If a table has no indexes, we will always decide that any same-page
update operation is a HOT update, since obviously it isn't modifying
any indexed columns. But is there any benefit to doing so?

If we do the in-page "mini vacuum" even without HOT, then there should
be no benefit from index-less HOT updates.

AFAICS we do: heap_update marks the page as prunable whether it's a HOT
update or not. The only difference between treating the update as HOT vs
not-HOT is that if there was more than one HOT update, the intermediate
tuples could be completely reclaimed by page pruning (ie, their line
pointers go away too). With not-HOT updates, the intermediate line
pointers would have to remain in DEAD state until vacuum, since page
pruning wouldn't know if there were index entries pointing at them.
But that seems like a pretty tiny penalty.

regards, tom lane

#4Hannu Krosing
hannu@2ndQuadrant.com
In reply to: Tom Lane (#3)
Re: HOT updates in index-less tables

On Sat, 2010-11-13 at 12:13 -0500, Tom Lane wrote:

Hannu Krosing <hannu@2ndQuadrant.com> writes:

On Sat, 2010-11-13 at 10:51 -0500, Tom Lane wrote:

If a table has no indexes, we will always decide that any same-page
update operation is a HOT update, since obviously it isn't modifying
any indexed columns. But is there any benefit to doing so?

If we do the in-page "mini vacuum" even without HOT, then there should
be no benefit from index-less HOT updates.

AFAICS we do: heap_update marks the page as prunable whether it's a HOT
update or not. The only difference between treating the update as HOT vs
not-HOT is that if there was more than one HOT update, the intermediate
tuples could be completely reclaimed by page pruning (ie, their line
pointers go away too). With not-HOT updates, the intermediate line
pointers would have to remain in DEAD state until vacuum,

How hard would it be to make the pruning logic be aware of there being
no indexes and thus no possibility of index entries pointing at any
tuple ?

since page
pruning wouldn't know if there were index entries pointing at them.
But that seems like a pretty tiny penalty.

regards, tom lane

--
-------
Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#4)
Re: HOT updates in index-less tables

Hannu Krosing <hannu@2ndQuadrant.com> writes:

On Sat, 2010-11-13 at 12:13 -0500, Tom Lane wrote:

AFAICS we do: heap_update marks the page as prunable whether it's a HOT
update or not. The only difference between treating the update as HOT vs
not-HOT is that if there was more than one HOT update, the intermediate
tuples could be completely reclaimed by page pruning (ie, their line
pointers go away too). With not-HOT updates, the intermediate line
pointers would have to remain in DEAD state until vacuum,

How hard would it be to make the pruning logic be aware of there being
no indexes and thus no possibility of index entries pointing at any
tuple ?

I think it's problematic, because heap_page_prune can be executed with
only AccessShareLock on the table, which means there's a race condition
against concurrent CREATE INDEX. You could look at relhasindex easily
enough, but that doesn't prove there's not a CREATE INDEX in progress.

regards, tom lane

#6Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#3)
Re: HOT updates in index-less tables

On Sat, Nov 13, 2010 at 12:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hannu Krosing <hannu@2ndQuadrant.com> writes:

On Sat, 2010-11-13 at 10:51 -0500, Tom Lane wrote:

If a table has no indexes, we will always decide that any same-page
update operation is a HOT update, since obviously it isn't modifying
any indexed columns.  But is there any benefit to doing so?

If we do the in-page "mini vacuum" even without HOT, then there should
be no benefit from index-less HOT updates.

AFAICS we do: heap_update marks the page as prunable whether it's a HOT
update or not.  The only difference between treating the update as HOT vs
not-HOT is that if there was more than one HOT update, the intermediate
tuples could be completely reclaimed by page pruning (ie, their line
pointers go away too).  With not-HOT updates, the intermediate line
pointers would have to remain in DEAD state until vacuum, since page
pruning wouldn't know if there were index entries pointing at them.
But that seems like a pretty tiny penalty.

I'm not at all convinced that's a tiny penalty.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#6)
Re: HOT updates in index-less tables

On 14.11.2010 00:29, Robert Haas wrote:

On Sat, Nov 13, 2010 at 12:13 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Hannu Krosing<hannu@2ndQuadrant.com> writes:

On Sat, 2010-11-13 at 10:51 -0500, Tom Lane wrote:

If a table has no indexes, we will always decide that any same-page
update operation is a HOT update, since obviously it isn't modifying
any indexed columns. But is there any benefit to doing so?

If we do the in-page "mini vacuum" even without HOT, then there should
be no benefit from index-less HOT updates.

AFAICS we do: heap_update marks the page as prunable whether it's a HOT
update or not. The only difference between treating the update as HOT vs
not-HOT is that if there was more than one HOT update, the intermediate
tuples could be completely reclaimed by page pruning (ie, their line
pointers go away too). With not-HOT updates, the intermediate line
pointers would have to remain in DEAD state until vacuum, since page
pruning wouldn't know if there were index entries pointing at them.
But that seems like a pretty tiny penalty.

I'm not at all convinced that's a tiny penalty.

Me neither. It's a tiny penalty when you consider one update, but if you
repeatedly update the same tuple, you accumulate dead line pointers
until the next real vacuum runs. With HOT updates, you reach a steady
state where page pruning is all you need. Then again, if you're
repeatedly updating a row in a table with no indexes, presumably it's a
very small table or you would create an index on it. And frequently
autovacuuming a small index is quite cheap too.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#8Merlin Moncure
mmoncure@gmail.com
In reply to: Heikki Linnakangas (#7)
Re: HOT updates in index-less tables

On Sun, Nov 14, 2010 at 1:12 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 14.11.2010 00:29, Robert Haas wrote:

On Sat, Nov 13, 2010 at 12:13 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:

Hannu Krosing<hannu@2ndQuadrant.com>  writes:

On Sat, 2010-11-13 at 10:51 -0500, Tom Lane wrote:

If a table has no indexes, we will always decide that any same-page
update operation is a HOT update, since obviously it isn't modifying
any indexed columns.  But is there any benefit to doing so?

If we do the in-page "mini vacuum" even without HOT, then there should
be no benefit from index-less HOT updates.

AFAICS we do: heap_update marks the page as prunable whether it's a HOT
update or not.  The only difference between treating the update as HOT vs
not-HOT is that if there was more than one HOT update, the intermediate
tuples could be completely reclaimed by page pruning (ie, their line
pointers go away too).  With not-HOT updates, the intermediate line
pointers would have to remain in DEAD state until vacuum, since page
pruning wouldn't know if there were index entries pointing at them.
But that seems like a pretty tiny penalty.

I'm not at all convinced that's a tiny penalty.

Me neither. It's a tiny penalty when you consider one update, but if you
repeatedly update the same tuple, you accumulate dead line pointers until
the next real vacuum runs. With HOT updates, you reach a steady state where
page pruning is all you need. Then again, if you're repeatedly updating a
row in a table with no indexes, presumably it's a very small table or you
would create an index on it. And frequently autovacuuming a small index is
quite cheap too.

The case here is when you have say a control table that is managing a
gapless sequence, or a materialization table with a very small number
of records. These type of tables get updated very frequently, perhaps
in every transaction. People without detailed implementation
knowledge of postgresql might assume that leaving an index off the
table is faster in these situations.

The danger here is that if autovacuum is stalled for whatever reason,
you get exponentially bad behavior as the table gets stuffed with bad
records. index-less hot was put in intentionally. As autovacuum gets
smarter and smarter, the reasoning to do this get weaker.

merlin