HOT readme missing documentation on summarizing index handling

Started by Matthias van de Meent over 2 years ago, 6 messages
#1 Matthias van de Meent
boekewurm+postgres@gmail.com
1 attachment(s)

Hi,

With PG16's 19d8e230, we got rid of BRIN's blocking of HOT updates,
but I just realized that we failed to update the README.HOT document
with this new exception for summarizing indexes.

Attached a patch that updates that document, detailing the related rationale.

I'm not sure if such internal documentation is relevant for
backpatching, but I also don't think it would hurt to have this
included in the REL_16_STABLE branch.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech/)

Attachments:

v1-0001-Add-documentation-in-README.HOT-for-handling-summ.patch (application/octet-stream)
From 5744001117c42b7e1bf7546a8daa2661cc9a42e6 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Thu, 6 Jul 2023 13:32:09 +0200
Subject: [PATCH v1] Add documentation in README.HOT for handling summarizing
 indexes

---
 src/backend/access/heap/README.HOT | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index 6fd1767f70..983d8017cf 100644
--- a/src/backend/access/heap/README.HOT
+++ b/src/backend/access/heap/README.HOT
@@ -126,14 +126,27 @@ whether it was part of a HOT chain or not.  This allows space reclamation
 in advance of running VACUUM for plain DELETEs as well as HOT updates.
 
 The requirement for doing a HOT update is that none of the indexed
-columns are changed.  This is checked at execution time by comparing the
-binary representation of the old and new values.  We insist on bitwise
-equality rather than using datatype-specific equality routines.  The
-main reason to avoid the latter is that there might be multiple notions
-of equality for a datatype, and we don't know exactly which one is
-relevant for the indexes at hand.  We assume that bitwise equality
-guarantees equality for all purposes.
-
+columns are changed (with one exception detailed below).  This is checked
+at execution time by comparing the binary representation of the old and
+new values.  We insist on bitwise equality rather than using
+datatype-specific equality routines.  The main reason to avoid the
+latter is that there might be multiple notions of equality for a
+datatype, and we don't know exactly which one is relevant for the indexes
+at hand.  We assume that bitwise equality guarantees equality for all
+purposes.
+
+The exception to indexed columns is this: Because HOT is used to retain
+referential integrity of indexes across tuple updates, we can still
+apply HOT if the changed columns are only indexed by indexes that do not
+reference the tuple directly. Indexes which indicate that they summarize
+the indexed tuples (at block- or larger granularity, in the amsummarizing
+flag of the IndexAmRoutine struct) will therefore not block the HOT
+optimization for updates. Note that even if HOT is applied in this case
+of only summarized columns getting updated, the updated value still needs
+to be propagated to the indexes that contain the updated columns: The
+data which the summary is based on has been updated, so the summary must
+be updated. This is still more work than the lack of index updates in
+normal HOT, but it's much preferred over having to update all indexes.
 
 Abort Cases
 -----------
-- 
2.40.1
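
The bitwise-equality rule described in the patch text can be sketched in C as below. This is a minimal sketch, not PostgreSQL's implementation: `ColumnValue` and `column_changed()` are hypothetical names, and a column value is reduced to a length plus raw bytes (real Datums also involve by-value vs. by-reference storage, varlena headers and TOAST, all ignored here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified column value: a length plus its raw bytes. */
typedef struct ColumnValue
{
    size_t      len;
    const unsigned char *bytes;
} ColumnValue;

/*
 * Returns true if the column counts as "changed" for HOT purposes.
 * Equality is purely bitwise (same length, identical bytes); no
 * datatype-specific equality operator is consulted.  Values that a
 * type would consider equal but that are stored differently are
 * therefore reported as changed, which is the conservative direction.
 */
bool
column_changed(const ColumnValue *oldval, const ColumnValue *newval)
{
    assert(oldval != NULL && newval != NULL);

    if (oldval->len != newval->len)
        return true;
    return memcmp(oldval->bytes, newval->bytes, oldval->len) != 0;
}
```

For example, passing the byte strings "1.0" and "1.00" reports a change, even though, as numeric values, 1.0 and 1.00 compare equal. Erring in that direction only costs a missed HOT opportunity; it never leaves an index entry that disagrees with the heap.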

#2 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Matthias van de Meent (#1)
Re: HOT readme missing documentation on summarizing index handling

Yeah, README.HOT should have been updated, and I see no reason not to
backpatch this to v16. Barring objections, I'll do that tomorrow.

I have two suggestions regarding the README.HOT changes:

1) I'm not entirely sure it's very clear what "referential integrity of
indexes across tuple updates" actually means. I'm afraid "referential
integrity" may lead readers to think about foreign keys. Maybe it'd be
better to explain this is about having index pointers to the new tuple
version, etc.

2) Wouldn't it be good to make it a bit more explicit that we now have three
"levels" of HOT:

(a) no indexes need update
(b) update only summarizing indexes
(c) update all indexes

The original text was really about on/off, and I'm not quite sure the
part about "exception" makes this very clear.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
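
The three levels listed above can be sketched as a single classification step. This is a hedged sketch under stated assumptions: `HotOutcome`, `IndexInfoSketch` and `classify_update()` are hypothetical names, a set of columns is modeled as a 64-bit mask, and whether the new tuple version fits on the same heap page is passed in as a flag rather than computed. It also folds in the existing README.HOT rule that a HOT update needs room for the new version on the same page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of an UPDATE, one value per "level" of HOT. */
typedef enum HotOutcome
{
    HOT_NO_INDEX_UPDATES,       /* (a) HOT, no index needs an update */
    HOT_SUMMARIZING_ONLY,       /* (b) HOT, update summarizing indexes only */
    NON_HOT_ALL_INDEXES         /* (c) not HOT, insert into all indexes */
} HotOutcome;

/* Hypothetical index descriptor. */
typedef struct IndexInfoSketch
{
    uint64_t    columns;        /* bit i set => column i referenced anywhere
                                 * in the index definition, including a
                                 * partial-index predicate */
    bool        summarizing;    /* true for block-summarizing AMs (e.g. BRIN) */
} IndexInfoSketch;

HotOutcome
classify_update(const IndexInfoSketch *indexes, size_t nindexes,
                uint64_t modified_columns, bool fits_on_same_page)
{
    bool        touches_summarizing = false;

    assert(nindexes == 0 || indexes != NULL);

    for (size_t i = 0; i < nindexes; i++)
    {
        if ((indexes[i].columns & modified_columns) == 0)
            continue;           /* this index is unaffected by the update */
        if (!indexes[i].summarizing)
            return NON_HOT_ALL_INDEXES; /* a tuple-pointing index changed */
        touches_summarizing = true;
    }

    /* HOT also requires the new version to stay on the same heap page. */
    if (!fits_on_same_page)
        return NON_HOT_ALL_INDEXES;

    return touches_summarizing ? HOT_SUMMARIZING_ONLY : HOT_NO_INDEX_UPDATES;
}
```

Note that the ordering of checks does not matter for correctness: any change to a column of a non-summarizing index, or a lack of space on the page, forces level (c) regardless of what else changed.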

#3 Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Tomas Vondra (#2)
1 attachment(s)
Re: HOT readme missing documentation on summarizing index handling

On Fri, 7 Jul 2023 at 00:14, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

> Yeah, README.HOT should have been updated, and I see no reason not to
> backpatch this to v16. Barring objections, I'll do that tomorrow.
>
> I have two suggestions regarding the README.HOT changes:
>
> 1) I'm not entirely sure it's very clear what "referential integrity of
> indexes across tuple updates" actually means. I'm afraid "referential
> integrity" may lead readers to think about foreign keys. Maybe it'd be
> better to explain this is about having index pointers to the new tuple
> version, etc.
>
> 2) Wouldn't it be good to make it a bit more explicit that we now have three
> "levels" of HOT:
>
> (a) no indexes need update
> (b) update only summarizing indexes
> (c) update all indexes
>
> The original text was really about on/off, and I'm not quite sure the
> part about "exception" makes this very clear.

Agreed on both points. Attached an updated version which incorporates
your points.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Attachments:

v2-0001-Add-documentation-in-README.HOT-for-handling-summ.patch (application/octet-stream)
From 7c7579454e8ec3e0960c63a28e5267895a894a55 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Thu, 6 Jul 2023 13:32:09 +0200
Subject: [PATCH v2] Add documentation in README.HOT for handling summarizing
 indexes

---
 src/backend/access/heap/README.HOT | 63 ++++++++++++++++++++----------
 1 file changed, 42 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/heap/README.HOT b/src/backend/access/heap/README.HOT
index 6fd1767f70..8f1a52b8b8 100644
--- a/src/backend/access/heap/README.HOT
+++ b/src/backend/access/heap/README.HOT
@@ -6,7 +6,7 @@ Heap Only Tuples (HOT)
 The Heap Only Tuple (HOT) feature eliminates redundant index entries and
 allows the re-use of space taken by DELETEd or obsoleted UPDATEd tuples
 without performing a table-wide vacuum.  It does this by allowing
-single-page vacuuming, also called "defragmentation".
+single-page vacuuming, also called "defragmentation" or "pruning".
 
 Note: there is a Glossary at the end of this document that may be helpful
 for first-time readers.
@@ -31,12 +31,20 @@ corrupt index, in the form of entries pointing to tuple slots that by now
 contain some unrelated content.  In any case we would prefer to be able
 to do vacuuming without invoking any user-written code.
 
-HOT solves this problem for a restricted but useful special case:
-where a tuple is repeatedly updated in ways that do not change its
-indexed columns.  (Here, "indexed column" means any column referenced
+HOT solves this problem for two restricted but useful special cases:
+
+First, where a tuple is repeatedly updated in ways that do not change
+its indexed columns.  (Here, "indexed column" means any column referenced
 at all in an index definition, including for example columns that are
 tested in a partial-index predicate but are not stored in the index.)
 
+Second, where the modified columns are only used in indexes that do not
+contain tuple IDs, but maintain summaries of the indexed data by block.
+As these indexes don't contain references to identifiable tuples, they
+can't remove tuple references in VACUUM, and thus don't need to get a new
+and unique reference to a tuple.  These indexes still need to be notified
+of the new column data, but don't need a new HOT chain to be established.
+
 An additional property of HOT is that it reduces index size by avoiding
 the creation of identically-keyed index entries.  This improves search
 speeds.
@@ -102,16 +110,16 @@ This is safe because no index entry points to line pointer 2.  Subsequent
 insertions into the page can now recycle both line pointer 2 and the
 space formerly used by tuple 2.
 
-If an update changes any indexed column, or there is not room on the
-same page for the new tuple, then the HOT chain ends: the last member
-has a regular t_ctid link to the next version and is not marked
-HEAP_HOT_UPDATED.  (In principle we could continue a HOT chain across
-pages, but this would destroy the desired property of being able to
-reclaim space with just page-local manipulations.  Anyway, we don't
-want to have to chase through multiple heap pages to get from an index
-entry to the desired tuple, so it seems better to create a new index
-entry for the new tuple.)  If further updates occur, the next version
-could become the root of a new HOT chain.
+If an update changes any column indexed by a non-summarizing index, or
+if there is not room on the same page for the new tuple, then the HOT
+chain ends: the last member has a regular t_ctid link to the next version
+and is not marked HEAP_HOT_UPDATED.  (In principle we could continue a
+HOT chain across pages, but this would destroy the desired property of
+being able to reclaim space with just page-local manipulations.  Anyway,
+we don't want to have to chase through multiple heap pages to get from an
+index entry to the desired tuple, so it seems better to create a new
+index entry for the new tuple.)  If further updates occur, the next
+version could become the root of a new HOT chain.
 
 Line pointer 1 has to remain as long as there is any non-dead member of
 the chain on the page.  When there is not, it is marked "dead".
@@ -125,15 +133,28 @@ Note: we can use a "dead" line pointer for any DELETEd tuple,
 whether it was part of a HOT chain or not.  This allows space reclamation
 in advance of running VACUUM for plain DELETEs as well as HOT updates.
 
-The requirement for doing a HOT update is that none of the indexed
-columns are changed.  This is checked at execution time by comparing the
-binary representation of the old and new values.  We insist on bitwise
-equality rather than using datatype-specific equality routines.  The
-main reason to avoid the latter is that there might be multiple notions
-of equality for a datatype, and we don't know exactly which one is
-relevant for the indexes at hand.  We assume that bitwise equality
+The requirement for doing a HOT update is that indexes which point to
+the root line pointer (and thus need to be cleaned up by VACUUM when the
+tuple is dead) do not reference columns which are updated in that HOT
+chain.  Summarizing indexes (such as BRIN) are assumed to have no
+references to the tuples on a page and thus are ignored when checking
+HOT applicability.  The updated columns are checked at execution time by
+comparing the binary representation of the old and new values.  We insist
+on bitwise equality rather than using datatype-specific equality routines.
+The main reason to avoid the latter is that there might be multiple
+notions of equality for a datatype, and we don't know exactly which one
+is relevant for the indexes at hand.  We assume that bitwise equality
 guarantees equality for all purposes.
 
+If any columns that are included by non-summarizing indexes are updated,
+the HOT optimization is not applied, and the new tuple is inserted into
+all indexes of the table.  If none of the updated columns are included in
+the table's indexes, the HOT optimization is applied and no indexes are
+updated.  If instead the updated columns are only indexed by summarizing
+indexes, the HOT optimization is applied, but the update is propagated to
+all summarizing indexes.  (Realistically, we only need to propagate the
+update to the indexes that contain the updated values, but that is yet to
+be implemented.)
 
 Abort Cases
 -----------
-- 
2.40.1

#4 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Matthias van de Meent (#3)
Re: HOT readme missing documentation on summarizing index handling

On 7/7/23 18:34, Matthias van de Meent wrote:

> On Fri, 7 Jul 2023 at 00:14, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

> Agreed on both points. Attached an updated version which incorporates
> your points.

Thanks, pushed after correcting a couple typos.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5 Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Tomas Vondra (#4)
Re: HOT readme missing documentation on summarizing index handling

On Fri, 7 Jul 2023 at 19:06, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

> Thanks, pushed after correcting a couple typos.

Thanks!

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

#6 Aleksander Alekseev
aleksander@timescale.com
In reply to: Matthias van de Meent (#5)
Re: HOT readme missing documentation on summarizing index handling

Hi,

> > Thanks, pushed after correcting a couple typos.
>
> Thanks!

I noticed that ec99d6e9c87a introduced a slight typo:

s/if there is not room/if there is no room/

--
Best regards,
Aleksander Alekseev