Heap WARM Tuples - Design Draft
Write Amplification Reduction Method (WARM)
====================================
A few years back, we developed HOT to address the problem associated with
MVCC and frequent updates and it has served us very well. But in the light
of Uber's recent technical blog highlighting some of the problems that are
still remaining, especially for workloads where HOT does not help to the
full extent, Simon, myself and others at 2ndQuadrant discussed several
ideas and what I present below is an outcome of that. This is not to take
away credit from anybody else. Others may have thought about similar ideas,
but I haven’t seen any concrete proposal so far.
At present, WARM looks to be a better idea than LITE, though we have that
as a backstop also (says Simon).
Background
==========
As a background, HOT primarily helps in two ways.
1. It avoids additional index entries when a tuple is updated in a way such
that no index columns are changed. Before we implemented HOT, every update
would generate a new index entry in every index on the table. This not only
resulted in bigger and bloater indexes, but also made vacuuming difficult
because heap tuples can't be removed without first removing the
corresponding index entries. HOT made it possible to avoid additional
index entries whenever possible and ensured most dead heap tuples can be
removed at a page level.
2. HOT also helped other cases such as DELETE and non-HOT updates by
reducing such heap tuples to a DEAD line pointer stub, until a regular
vacuum happens. But the space consumed by the heap tuple is freed and
returned to the FSM for future utilization.
Given that HOT was going to change the very core of MVCC and how tuples are
stored and accessed, we made a decision to keep things simple and two
important conditions were spelled out for doing HOT update.
1. There must be enough space in the same page to store the new version of
the tuple.
2. None of the index columns must be updated.
While it was generally easy to satisfy the first condition by either
leaving some free space in heap pages or system would anyways come to a
stable state after initial few updates. The second condition is somewhat
harder to control. Those who are aware would carefully design their schema
and choose indexes so that HOT conditions are met. But that may not be
possible for every workload, as seen from the Uber example. But many others
may have faced similar situations.
This proposal is to relax the second condition, so it’s not quite HOT, just
WARM.
Basic Idea
========
The basic idea is to allow updates that change an index column, without
requiring every index on the table to cause a new index entry. We still
require the first HOT condition i.e. the heap page must have sufficient
free space to store the new version of the tuple.
Indexes whose values do not change do not require new index pointers. Only
the index whose key is being changed will need a new index entry. The new
index entry will be set to the CTID of the root line pointer.
This method succeeds in reducing the write amplification, but causes other
issues which also need to be solved. WARM breaks the invariant that all
tuples in a HOT chain have the same index values and so an IndexScan would
need to re-check the index scan conditions against the visible tuple
returned from heap_hot_search(). We must still check visibility, so we only
need to re-check scan conditions on that one tuple version.
We don’t want to force a recheck for every index fetch because that will
slow everything down. So we would like a simple and efficient way of
knowing about the existence of a WARM tuple in a chain and do a recheck in
only those cases, ideally O(1). Having a HOT chain contain a WARM tuple is
discussed below as being a “WARM chain”, implying it needs re-check.
We could solve this by either 1) marking both old and new index tuples as
"recheck-required" or 2) marking the HOT chain on the heap as
"recheck-required" such that any tuple returned from the chain requires a
recheck.
The advantage of doing 1) is that the recheck is required only for the
compromised index.
Note that in presence of multiple index keys pointing to the same root line
pointer, the chain needs re-check for both the old as well as the new index
key. The old index key must not fetch the new tuple(s) and the new index
key must not fetch the old tuple(s). So irrespective of whether an old or a
new tuple is being returned, the caller must know about the WARM tuples in
the chains. So marking the index would require us to mark both old and new
index tuples as requiring re-check. That requires an additional index scan
to locate the old row and then an additional write to force it to re-check,
which is algorithmically O(NlogN) on table size.
Marking the heap, (2) is O(1) per updated index, so seems better. But since
we don't know with respect to which index or indexes the chain is broken,
all index scans must do a recheck for a tuple returned from a WARM chain.
How do we mark a chain as WARM? How do we clear the flag? There are a few
possible approaches:
1. Mark the first WARM tuple which causes HOT chain to become WARM with a
special HEAP_RECHECK_REQUIRED flag (this flag must then be carried over to
any further additions to the chain. Then whenever a tuple if returned from
a chain, we must look forward into the chain for WARM tuple and let the
caller know about potential breakage. This seems simple but would require
us to follow HOT chain every time to check if it’s HOT or WARM.
2. Mark the root line pointer (or the root tuple) with a special
HEAP_RECHECK_REQUIRED flag to tell us about the presence of a WARM tuple in
the chain. Since all indexes point to the root line pointer, it should be
enough to just mark the root line pointer (or tuple) with this flag.
3. We could also adopt a hybrid approach and mark the new index tuple and
new heap tuple with HEAP_RECHECK_REQUIRED flag and presence of either of
the flag in either tuple forces recheck.
Approach 2 seems more reasonable and simple.
There are only 2 bits for lp_flags and all combinations are already used.
But for LP_REDIRECT root line pointer, we could use the lp_len field to
store this special flag, which is not used for LP_REDIRECT line pointers.
So we are able to mark the root line pointer.
Let me explain the idea in more detail using an example and technique 2)
CREATE TABLE test (col1 int, col2 int, col3 int, col4 text);
CREATE INDEX testindx_col1 ON test (col1);
CREATE INDEX testindx_col2 ON test (col2);
CREATE INDEX testindx_col3 ON test (col3);
INSERT INTO test VALUES (1, 11, 111, 'foo');
Let’s assume a single heap tuple and three indexes on three different
columns. Lets assume that the heap tuple has CTID (0,1) at this point. At
this point, each index has exactly one index entry pointing to CTID (0,1)
testindx_col1: (1) ===> (0,1)
testindx_col2: (11) ===> (0,1)
testindx_col3: (111) ===> (0,1)
Now if a transaction T1 UPDATEs the table such as, "UPDATE test SET col1 =
2". This does not satisfy the HOT property since index column 'col1' is
being updated. But as per WARM algorithm, since only testindx_col1's index
key has changed, we insert a new index tuple only in testindx_col1. Say the
new heap tuple has CTID (0,2). So testindx_col1 will have a new index tuple
with key (2), but still pointing to CTID (0,1)
testindx_col1: (1) ===> (0,1)
(2) ===> (0,1)
testindx_col2: (11) ===> (0,1)
testindx_col3: (111) ===> (0,1)
The heap at this point has two tuples, linked by CTID chain.
(0,1)* ---> (0,2)
Both these tuples are reachable via all indexes, starting at the root
(0,1). A transaction which started before T1 should see (0,1) and those
started after T1 should see (0,2). The MVCC visibility will ensure that the
correct tuple is returned. But if a new transaction T2 is looking up (col1
= 1), while MVCC will return (0,2), the tuple does not satisfy the index
condition. Similarly, for an old transaction T0 looking up (col1 = 2), MVCC
will return (0,1) but that tuple does not satisfy the index condition
either. IndexScan will ensure that tuples are rechecked in such instances.
If T3 later updates col2 and T4 further updates col3,
T3: UPDATE test SET col2 = 12;
T4: UPDATE test SET col3 = 112;
The heap will look like:
(0,1)* ---> (0,2) ---> (0,3) ---> (0,4)
And the index pointers will be:
testindx_col1: (1) ===> (0,1)
(2) ===> (0,1)
testindx_col2: (11) ===> (0,1)
(12) ===> (0,1)
testindx_col3: (111) ===> (0,1)
(112) ===> (0,1)
The (*) here indicates that there may exist a WARM tuple in the chain and
index recheck is required. Our algorithm to recheck heap tuple when it’s
returned from a WARM chain, should ensure that only valid and matching
tuples are returned by the index scans.
Marking index tuples DEAD
=====================
When a WARM chain becomes DEAD and replaced by just a LP_DEAD stub,
referencing index tuples can be marked as killed, and removed by VACUUM.
HOT Pruning
==========
No change is required for HOT pruning other than to copy the
HEAP_CHECK_REQUIRED flag to the line pointer when root heap tuple is pruned
and the root line pointer is turned into redirected line pointer. Except
that, HOT prune can do its job i.e. remove dead tuples from the HOT or WARM
chains and if necessary turn root line pointers into redirect or dead line
pointers.
Clearing the flags
==============
We can't clear the flag until stale index pointers go, even if the chain is
all HOT again. In other words, the flag must remain set until there exists
a wrong index key pointing to the root tuple.
One idea is to somehow do this as part of the vacuum where we collect root
line pointers of WARM chains during the first phase of heap scan, check
the indexes for all such tuples (may be combined with index vacuum) and
then clear the heap flags during the second pass, unless new tuples are
added to the WARM chain. We can detect that by checking that all tuples in
the WARM chain still have XID less than the OldestXmin that VACUUM is using.
We don’t really need to count all references to a root of WARM chain. What
we need to know is if there exists an index which has more than one pointer
to the root tuple. If yes, the flag cannot be cleared. ISTM that even two
bits per htid will be enough to track this state. Other ideas welcome.
WAL Logging
===========
It’s important to ensure that the flag is set when it is absolutely
necessary, while having false positives is not a problem. We might do a
little wasteful work if the flag is incorrectly set. Since flag will be set
only during heap_update() and the operation is already WAL logged, this can
be piggybacked with the heap_update WAL record. Similarly, when a root
tuple is pruned to a redirect line pointer, the operation is already WAL
logged and we can piggyback setting of line pointer flag with that WAL
record.
Flag clearing need not be WAL logged, unless we can piggyback that to some
existing WAL logging.
Vacuum
========
Vacuum should continue to do what it does today i.e. prune HOT chains and
collect LP_DEAD line pointers and remove associated index entries and then
mark line pointers as LP_UNUSED. It could be extended to do additional work
of clearing the flags to avoid rechecks when they are not any more required
as discussed above.
Testing
=====
It will be straightforward to have a developer-level index_recheck GUC
parameter.
Index_recheck = off (default) | on
For testing, compare results between two queries, one with index_recheck=on
and one with index_recheck=off. If this finds any difference we’ll know we
have a bug.
We can use this in testing the feature in development and also use it in
the field if we suspect problems.
Comments/suggestions?
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
On 2016-08-04 16:29:09 +0530, Pavan Deolasee wrote:
Indexes whose values do not change do not require new index pointers. Only
the index whose key is being changed will need a new index entry. The new
index entry will be set to the CTID of the root line pointer.
That seems to require tracing all hot-chains in a page, to actually
figure out what the root line pointer of a warm-updated HOT tuple is,
provided it's HOT_UPDATED itself. Or have you found a smart way to
figure that out?
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 4, 2016 at 09:31:18AM -0700, Andres Freund wrote:
Hi,
On 2016-08-04 16:29:09 +0530, Pavan Deolasee wrote:
Indexes whose values do not change do not require new index pointers. Only
the index whose key is being changed will need a new index entry. The new
index entry will be set to the CTID of the root line pointer.That seems to require tracing all hot-chains in a page, to actually
figure out what the root line pointer of a warm-updated HOT tuple is,
provided it's HOT_UPDATED itself. Or have you found a smart way to
figure that out?
The index points to the head of the HOT chain, and you just walk the
chain on the page --- on need to look at other chains on the page.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-08-04 12:55:25 -0400, Bruce Momjian wrote:
On Thu, Aug 4, 2016 at 09:31:18AM -0700, Andres Freund wrote:
Hi,
On 2016-08-04 16:29:09 +0530, Pavan Deolasee wrote:
Indexes whose values do not change do not require new index pointers. Only
the index whose key is being changed will need a new index entry. The new
index entry will be set to the CTID of the root line pointer.That seems to require tracing all hot-chains in a page, to actually
figure out what the root line pointer of a warm-updated HOT tuple is,
provided it's HOT_UPDATED itself. Or have you found a smart way to
figure that out?The index points to the head of the HOT chain, and you just walk the
chain on the page --- on need to look at other chains on the page.
When doing an update, you'll need to re-find the root tuple, to insert
the root ctid for the new index tuples.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 4, 2016 at 04:29:09PM +0530, Pavan Deolasee wrote:
Write Amplification Reduction Method (WARM)
====================================A few years back, we developed HOT to address the problem associated with MVCC
and frequent updates and it has served us very well. But in the light of Uber's
recent technical blog highlighting some of the problems that are still
remaining, especially for workloads where HOT does not help to the full extent,
Simon, myself and others at 2ndQuadrant discussed several ideas and what I
present below is an outcome of that. This is not to take away credit from
anybody else. Others may have thought about similar ideas, but I haven’t seen
any concrete proposal so far.
HOT was a huge win for Postgres and I am glad you are reviewing
improvements.
This method succeeds in reducing the write amplification, but causes other
issues which also need to be solved. WARM breaks the invariant that all tuples
in a HOT chain have the same index values and so an IndexScan would need to
re-check the index scan conditions against the visible tuple returned from
heap_hot_search(). We must still check visibility, so we only need to re-check
scan conditions on that one tuple version.
We don’t want to force a recheck for every index fetch because that will slow
everything down. So we would like a simple and efficient way of knowing about
the existence of a WARM tuple in a chain and do a recheck in only those cases,
ideally O(1). Having a HOT chain contain a WARM tuple is discussed below as
being a “WARM chain”, implying it needs re-check.
In summary, we are already doing visibility checks on the HOT chain, so
a recheck if the heap tuple matches the index value is only done at most
on the one visible tuple in the chain, not ever tuple in the chain.
2. Mark the root line pointer (or the root tuple) with a special
HEAP_RECHECK_REQUIRED flag to tell us about the presence of a WARM tuple in the
chain. Since all indexes point to the root line pointer, it should be enough to
just mark the root line pointer (or tuple) with this flag.
Yes, I think #2 is the easiest. Also, if we modify the index page, we
would have to WAL the change and possibly WAL log the full page write
of the index page. :-(
Approach 2 seems more reasonable and simple.
There are only 2 bits for lp_flags and all combinations are already used. But
for LP_REDIRECT root line pointer, we could use the lp_len field to store this
special flag, which is not used for LP_REDIRECT line pointers. So we are able
to mark the root line pointer.
Uh, as I understand it, we only use LP_REDIRECT when we have _removed_
the tuple that the ctid was pointing to, but it seems you would need to
set HEAP_RECHECK_REQUIRED earlier than that.
Also, what is currently in the lp_len field for LP_REDIRECT? Zeros, or
random data? I am asking for pg_upgrade purposes.
One idea is to somehow do this as part of the vacuum where we collect root line
pointers of WARM chains during the first phase of heap scan, check the indexes
for all such tuples (may be combined with index vacuum) and then clear the heap
flags during the second pass, unless new tuples are added to the WARM chain. We
can detect that by checking that all tuples in the WARM chain still have XID
less than the OldestXmin that VACUUM is using.
Yes, it seems natural to clear the ctid HEAP_RECHECK_REQUIRED flag where
you are adjusting the HOT chain anyway.
It’s important to ensure that the flag is set when it is absolutely necessary,
while having false positives is not a problem. We might do a little wasteful
work if the flag is incorrectly set. Since flag will be set only during
heap_update() and the operation is already WAL logged, this can be piggybacked
with the heap_update WAL record. Similarly, when a root tuple is pruned to a
redirect line pointer, the operation is already WAL logged and we can piggyback
setting of line pointer flag with that WAL record.Flag clearing need not be WAL logged, unless we can piggyback that to some
existing WAL logging.
Agreed, good point.
Very nice!
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 4 August 2016 at 17:31, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2016-08-04 16:29:09 +0530, Pavan Deolasee wrote:
Indexes whose values do not change do not require new index pointers. Only
the index whose key is being changed will need a new index entry. The new
index entry will be set to the CTID of the root line pointer.That seems to require tracing all hot-chains in a page, to actually
figure out what the root line pointer of a warm-updated HOT tuple is,
provided it's HOT_UPDATED itself. Or have you found a smart way to
figure that out?
Hmm, sorry, I did think of that point and I thought I had added it to the doc.
So, yes, I agree - re-locating the root is the biggest downside to
this idea. Perhaps Pavan has other thoughts?
I think its doable, but it will be fiddly.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 4, 2016 at 09:58:29AM -0700, Andres Freund wrote:
On 2016-08-04 12:55:25 -0400, Bruce Momjian wrote:
On Thu, Aug 4, 2016 at 09:31:18AM -0700, Andres Freund wrote:
Hi,
On 2016-08-04 16:29:09 +0530, Pavan Deolasee wrote:
Indexes whose values do not change do not require new index pointers. Only
the index whose key is being changed will need a new index entry. The new
index entry will be set to the CTID of the root line pointer.That seems to require tracing all hot-chains in a page, to actually
figure out what the root line pointer of a warm-updated HOT tuple is,
provided it's HOT_UPDATED itself. Or have you found a smart way to
figure that out?The index points to the head of the HOT chain, and you just walk the
chain on the page --- on need to look at other chains on the page.When doing an update, you'll need to re-find the root tuple, to insert
the root ctid for the new index tuples.
Ah, I had not thought of that --- good point.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 4, 2016 at 7:59 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
We don’t want to force a recheck for every index fetch because that will
slow everything down. So we would like a simple and efficient way of knowing
about the existence of a WARM tuple in a chain and do a recheck in only
those cases, ideally O(1). Having a HOT chain contain a WARM tuple is
discussed below as being a “WARM chain”, implying it needs re-check.We could solve this by either 1) marking both old and new index tuples as
"recheck-required" or 2) marking the HOT chain on the heap as
"recheck-required" such that any tuple returned from the chain requires a
recheck.The advantage of doing 1) is that the recheck is required only for the
compromised index.
Note that in presence of multiple index keys pointing to the same root line
pointer, the chain needs re-check for both the old as well as the new index
key. The old index key must not fetch the new tuple(s) and the new index key
must not fetch the old tuple(s). So irrespective of whether an old or a new
tuple is being returned, the caller must know about the WARM tuples in the
chains. So marking the index would require us to mark both old and new index
tuples as requiring re-check. That requires an additional index scan to
locate the old row and then an additional write to force it to re-check,
which is algorithmically O(NlogN) on table size.Marking the heap, (2) is O(1) per updated index, so seems better. But since
we don't know with respect to which index or indexes the chain is broken,
all index scans must do a recheck for a tuple returned from a WARM chain.How do we mark a chain as WARM? How do we clear the flag? There are a few
possible approaches:1. Mark the first WARM tuple which causes HOT chain to become WARM with a
special HEAP_RECHECK_REQUIRED flag (this flag must then be carried over to
any further additions to the chain. Then whenever a tuple if returned from a
chain, we must look forward into the chain for WARM tuple and let the caller
know about potential breakage. This seems simple but would require us to
follow HOT chain every time to check if it’s HOT or WARM.2. Mark the root line pointer (or the root tuple) with a special
HEAP_RECHECK_REQUIRED flag to tell us about the presence of a WARM tuple in
the chain. Since all indexes point to the root line pointer, it should be
enough to just mark the root line pointer (or tuple) with this flag.3. We could also adopt a hybrid approach and mark the new index tuple and
new heap tuple with HEAP_RECHECK_REQUIRED flag and presence of either of the
flag in either tuple forces recheck.
I've actually been thinking of another possibility in the context of
LITE, and what I was pondering looks eerily similar to WARM, so let me
summarize (I was going to write a lengthier POC but I thought I better
comment now since you're introducing the same concept):
Marks are troublesome becuase, as you mention, you would need one
"WARM" bit per index to be efficient. We also don't want a recheck on
every index access.
HOT is very close to WARM already, even the code. The only problem
with HOT is that it stops scanning the update chain when a tuple is
not HOT-updated. If heap_hot_search_buffer knew to keep on following
the update chain when there is a WARM update on the index in question,
this could work.
But it won't work if it's implemented naively, because if there was
another item pointer in another page of the index pointing to a heap
tuple that belongs to an already-visited chain, which is possible in
the below case, then the scan would produce duplicate rows:
N: Row versions:
->: update
* : causes a new index tuple on index I
1 -> 2 -> 3 -> 4 -> 5
* * *
a b a <- key
If the scan qual is "key in ('a','b')", then the index scan will visit
both versions of the row, 1, 3 and 4. Suppose only 5 is visible, and
suppose the whole chain lies in the same page.
There is one update chain, 1..5, but three WARM chains: 1..2, 3, 4..5
When visiting item pointers, 1 will be found, the chain will be
followed and eventually version 5 will pass the visibility check and
the tuple will pass the scan quals, it will even match the key in the
index item, so no key check will fix this. 5 will be yielded when
visiting version 1.
Next, version 3 will be visited. There's no easy way to know that we
already visited 3, so we'll follow the chain all the way to 5, and,
again, it will be yielded. Here, the key won't match, so maybe this
case could be filtered out with a key check.
Finally, version 4 will be visited, the chain will be followed to 5,
and as in the first case, it will be yielded.
So the scan yielded the same heap tuple thrice, which isn't an ideal situation.
It would be totally acceptable for a bitmap index scan, so the
expensive logic I will be outlining below may be skipped when building
a bitmap (lossy or not).
To fix this, what I was pondering, and I believe it would be viable,
is to teach heap_hot_search_buffer about the WARM invariant. It will
make correctness heavily depend on the invariant being true, which
means either flagging item pointers as WARM chains, or requiring a
reindex during upgrade to make sure all indexes verify the invariant -
as old indices won't.
When an item pointer is WARM (either implicit or explicit),
heap_hot_search_buffer will only stop following the update chain if it
reaches not the end of a HOT chain, but rather a key change. For this,
it will need to recheck the key of all but the first TID in the chain,
so singleton chains (ie: not-updated or frozen tuples) won't incurr
this extra cost. When walking the chain, it will need to form an index
tuple for both the old and new versions, and compare them, and follow
the chain *only* if the keys are equal.
As an optimization, if the tuple is marked HOT-updated, the key check
may be skipped.
Vacuum will need to be taught not to break the invariant, but I
believe this is viable and efficient.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 4 August 2016 at 18:05, Bruce Momjian <bruce@momjian.us> wrote:
Approach 2 seems more reasonable and simple.
There are only 2 bits for lp_flags and all combinations are already used. But
for LP_REDIRECT root line pointer, we could use the lp_len field to store this
special flag, which is not used for LP_REDIRECT line pointers. So we are able
to mark the root line pointer.Uh, as I understand it, we only use LP_REDIRECT when we have _removed_
the tuple that the ctid was pointing to, but it seems you would need to
set HEAP_RECHECK_REQUIRED earlier than that.
Hmm. Mostly there will be one, so this is just for the first update
after any VACUUM.
Adding a new linepointer just to hold this seems kludgy and could mean
we run out of linepointers.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 4, 2016 at 09:58:29AM -0700, Andres Freund wrote:
On 2016-08-04 12:55:25 -0400, Bruce Momjian wrote:
On Thu, Aug 4, 2016 at 09:31:18AM -0700, Andres Freund wrote:
Hi,
On 2016-08-04 16:29:09 +0530, Pavan Deolasee wrote:
Indexes whose values do not change do not require new index pointers. Only
the index whose key is being changed will need a new index entry. The new
index entry will be set to the CTID of the root line pointer.That seems to require tracing all hot-chains in a page, to actually
figure out what the root line pointer of a warm-updated HOT tuple is,
provided it's HOT_UPDATED itself. Or have you found a smart way to
figure that out?The index points to the head of the HOT chain, and you just walk the
chain on the page --- on need to look at other chains on the page.When doing an update, you'll need to re-find the root tuple, to insert
the root ctid for the new index tuples.
Also, I was thinking we could just point into the middle of the HOT
chain to limit the number of rows we have to check, but the problem
there is that HOT pruning could then not remove that middle ctid, which
is bad.
I wonder if walking all HOT chains on a single page to find the root is
cheap enough. The ctid tells you if it is part of a HOT chain, right?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-08-04 18:11:17 +0100, Simon Riggs wrote:
On 4 August 2016 at 17:31, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2016-08-04 16:29:09 +0530, Pavan Deolasee wrote:
Indexes whose values do not change do not require new index pointers. Only
the index whose key is being changed will need a new index entry. The new
index entry will be set to the CTID of the root line pointer.That seems to require tracing all hot-chains in a page, to actually
figure out what the root line pointer of a warm-updated HOT tuple is,
provided it's HOT_UPDATED itself. Or have you found a smart way to
figure that out?Hmm, sorry, I did think of that point and I thought I had added it to the doc.
So, yes, I agree - re-locating the root is the biggest downside to
this idea. Perhaps Pavan has other thoughts?I think its doable, but it will be fiddly.
I'm less afraid of the fiddlyness of finding the root tuple, than the
cost. It's not cheap to walk through, potentially, all hot chains to
find the root ctid.
Have you considered what it'd take to allow index pointers to allow to
point to "intermediate" root tuples? I.e. have some indexes point into
the middle of an update-chain, whereas others still point to the
beginning? There's certainly some complications involved with that, but
it'd also have the advantage in reducing the amount of rechecking that
needs to be done.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 4 August 2016 at 18:17, Andres Freund <andres@anarazel.de> wrote:
On 2016-08-04 18:11:17 +0100, Simon Riggs wrote:
On 4 August 2016 at 17:31, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2016-08-04 16:29:09 +0530, Pavan Deolasee wrote:
Indexes whose values do not change do not require new index pointers. Only
the index whose key is being changed will need a new index entry. The new
index entry will be set to the CTID of the root line pointer.That seems to require tracing all hot-chains in a page, to actually
figure out what the root line pointer of a warm-updated HOT tuple is,
provided it's HOT_UPDATED itself. Or have you found a smart way to
figure that out?Hmm, sorry, I did think of that point and I thought I had added it to the doc.
So, yes, I agree - re-locating the root is the biggest downside to
this idea. Perhaps Pavan has other thoughts?I think its doable, but it will be fiddly.
I'm less afraid of the fiddlyness of finding the root tuple, than the
cost. It's not cheap to walk through, potentially, all hot chains to
find the root ctid.Have you considered what it'd take to allow index pointers to allow to
point to "intermediate" root tuples? I.e. have some indexes point into
the middle of an update-chain, whereas others still point to the
beginning? There's certainly some complications involved with that, but
it'd also have the advantage in reducing the amount of rechecking that
needs to be done.
"Intermediate root tuples" was my first attempt at this and it didn't
work. I'll dig up the details, some problem in VACUUM, IIRC.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 4, 2016 at 06:16:02PM +0100, Simon Riggs wrote:
On 4 August 2016 at 18:05, Bruce Momjian <bruce@momjian.us> wrote:
Approach 2 seems more reasonable and simple.
There are only 2 bits for lp_flags and all combinations are already used. But
for LP_REDIRECT root line pointer, we could use the lp_len field to store this
special flag, which is not used for LP_REDIRECT line pointers. So we are able
to mark the root line pointer.Uh, as I understand it, we only use LP_REDIRECT when we have _removed_
the tuple that the ctid was pointing to, but it seems you would need to
set HEAP_RECHECK_REQUIRED earlier than that.Hmm. Mostly there will be one, so this is just for the first update
after any VACUUM.Adding a new linepointer just to hold this seems kludgy and could mean
we run out of linepointers.
Ah, so in cases where there isn't an existing LP_REDIRECT for the chain,
you create one and use the lp_len to identify it as a WARM chain? Hmm.
You can't update the indexes pointing to the existing ctid, so what you
would really have to do is to write over the existing ctid with
LP_REDIRECT plus WARM marker, and move the old ctid to a new ctid slot?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 4, 2016 at 06:18:47PM +0100, Simon Riggs wrote:
Have you considered what it'd take to allow index pointers to allow to
point to "intermediate" root tuples? I.e. have some indexes point into
the middle of an update-chain, whereas others still point to the
beginning? There's certainly some complications involved with that, but
it'd also have the advantage in reducing the amount of rechecking that
needs to be done."Intermediate root tuples" was my first attempt at this and it didn't
work. I'll dig up the details, some problem in VACUUM, IIRC.
I think the problem is that HOT chain pruning becomes almost impossible
except via VACUUM.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-08-04 18:18:47 +0100, Simon Riggs wrote:
On 4 August 2016 at 18:17, Andres Freund <andres@anarazel.de> wrote:
On 2016-08-04 18:11:17 +0100, Simon Riggs wrote:
I'm less afraid of the fiddlyness of finding the root tuple, than the
cost. It's not cheap to walk through, potentially, all hot chains to
find the root ctid.Have you considered what it'd take to allow index pointers to allow to
point to "intermediate" root tuples? I.e. have some indexes point into
the middle of an update-chain, whereas others still point to the
beginning? There's certainly some complications involved with that, but
it'd also have the advantage in reducing the amount of rechecking that
needs to be done."Intermediate root tuples" was my first attempt at this and it didn't
work. I'll dig up the details, some problem in VACUUM, IIRC.
I think it's doable, but the question is how to do cleanup. It's kind of
easy to end up with a page full of line pointers which are all reachable
from indexes, but otherwise useless. But I'm not sure that's actually
that inevitable:
a) Given that we fully scan indexes anyway, we could apply redirections
to the indexes themselves, during vacuum. Then redirections could be
removed afterwards.
b) Start fixing up indexes, by refinding entries in the index from the
heap tuple. That requires expressions to actually be immutable, but
personally I think it's stupid to cater for expressions that are
not. Then we can update indexes to point to the "final tuple" after
pruning. It'd also pave the way to avoid the full index scan at the
end of vacuum (in some cases).
Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 4, 2016 at 10:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 4 August 2016 at 17:31, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2016-08-04 16:29:09 +0530, Pavan Deolasee wrote:
Indexes whose values do not change do not require new index pointers.
Only
the index whose key is being changed will need a new index entry. The
new
index entry will be set to the CTID of the root line pointer.
That seems to require tracing all hot-chains in a page, to actually
figure out what the root line pointer of a warm-updated HOT tuple is,
provided it's HOT_UPDATED itself. Or have you found a smart way to
figure that out?Hmm, sorry, I did think of that point and I thought I had added it to the
doc.So, yes, I agree - re-locating the root is the biggest downside to
this idea. Perhaps Pavan has other thoughts?
I clearly missed this problem, so what I say now is not fully thought
through. I see couple of options:
1. Most often the tuple that's being updated will be fetched by recent
index scan. So we can potentially cache last (few) mappings of CTID to
their root line pointers. May be we can store that in RelationData on a per
relation basis.
2. I also think that most often the tuple that will be updated will have
its t_ctid pointing to itself. This sounds more invasive, but can we
somehow utilise the t_ctid field to store the OffsetNumber of the root line
pointer? The root line pointer only exists when the tuple under
consideration has HEAP_ONLY_TUPLE flag set and we know for such tuples,
BlockNumber in t_ctid must be same as current block (unless HEAP_UPDATED is
also set and the updating transaction eventually aborted).
We may have to fall back to a full page scan to find the root line pointer,
but we can limit that for corner cases.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Aug 4, 2016 at 10:49 PM, Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Aug 4, 2016 at 06:16:02PM +0100, Simon Riggs wrote:
On 4 August 2016 at 18:05, Bruce Momjian <bruce@momjian.us> wrote:
Approach 2 seems more reasonable and simple.
There are only 2 bits for lp_flags and all combinations are already
used. But
for LP_REDIRECT root line pointer, we could use the lp_len field to
store this
special flag, which is not used for LP_REDIRECT line pointers. So we
are able
to mark the root line pointer.
Uh, as I understand it, we only use LP_REDIRECT when we have _removed_
the tuple that the ctid was pointing to, but it seems you would need to
set HEAP_RECHECK_REQUIRED earlier than that.Hmm. Mostly there will be one, so this is just for the first update
after any VACUUM.Adding a new linepointer just to hold this seems kludgy and could mean
we run out of linepointers.Ah, so in cases where there isn't an existing LP_REDIRECT for the chain,
you create one and use the lp_len to identify it as a WARM chain? Hmm.
If the root tuple still exists, we store the WARM flag (or
HEAP_RECHECK_REQUIRED as used in the original post) in the tuple header
itself. When the root tuple becomes dead and HOT prune decides to replace
it with a LP_REDIRECT line pointer, the information is moved to lp_len
(which is currently set to 0 for LP_REDIRECT items). Does that answer your
question?
You can't update the indexes pointing to the existing ctid, so what you
would really have to do is to write over the existing ctid with
LP_REDIRECT plus WARM marker, and move the old ctid to a new ctid slot?
Not really. I hope the above answers this, but please let me know if you
mean something else.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Aug 4, 2016 at 2:14 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
To fix this, what I was pondering, and I believe it would be viable,
is to teach heap_hot_search_buffer about the WARM invariant. It will
make correctness heavily depend on the invariant being true, which
means either flagging item pointers as WARM chains, or requiring a
reindex during upgrade to make sure all indexes verify the invariant -
as old indices won't.When an item pointer is WARM (either implicit or explicit),
heap_hot_search_buffer will only stop following the update chain if it
reaches not the end of a HOT chain, but rather a key change. For this,
it will need to recheck the key of all but the first TID in the chain,
so singleton chains (ie: not-updated or frozen tuples) won't incurr
this extra cost. When walking the chain, it will need to form an index
tuple for both the old and new versions, and compare them, and follow
the chain *only* if the keys are equal.As an optimization, if the tuple is marked HOT-updated, the key check
may be skipped.Vacuum will need to be taught not to break the invariant, but I
believe this is viable and efficient.
And I didn't mention the invariant.
The invariant would be that all WARM chains:
* contain entire HOT chains (ie: no item pointers pointing to the
middle of a HOT chain)
* start at the item pointer, and end when the t_ctid of the heap
tuple either points to another page, or a tuple with a different index
key
* there is exactly one WARM chain containing any heap tuple,
including the singleton chain represented by a tuple whose WARM chain
contains only itself
This is maintained by:
* No item pointer will be created for row versions whose immediately
previous version is already on the index with the same key
Cleanup can only proceed by applying cleanup on HOT chains, by moving
the head of a WARM chain forward, or by removing it entirely. It may
be that VACUUM already works like that, but I'm unsure.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 4, 2016 at 11:04:49PM +0530, Pavan Deolasee wrote:
If the root tuple still exists, we store the WARM flag (or
HEAP_RECHECK_REQUIRED as used in the original post) in the tuple header itself.
When the root tuple becomes dead and HOT prune decides to replace it with a
LP_REDIRECT line pointer, the information is moved to lp_len (which is
currently set to 0 for LP_REDIRECT items). Does that answer your question?
Ah, brilliant!
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 4, 2016 at 01:05:53PM -0400, Bruce Momjian wrote:
Approach 2 seems more reasonable and simple.�
There are only 2 bits for lp_flags and all combinations are already used. But
for LP_REDIRECT root line pointer, we could use the lp_len field to store this
special flag, which is not used for LP_REDIRECT line pointers. So we are able
to mark the root line pointer.Uh, as I understand it, we only use LP_REDIRECT when we have _removed_
the tuple that the ctid was pointing to, but it seems you would need to
set HEAP_RECHECK_REQUIRED earlier than that.Also, what is currently in the lp_len field for LP_REDIRECT? Zeros, or
random data? I am asking for pg_upgrade purposes.
It seems LP_REDIRECT's lp_len is always zero: :-)
/*
* ItemIdSetRedirect
* Set the item identifier to be REDIRECT, with the specified link.
* Beware of multiple evaluations of itemId!
*/
#define ItemIdSetRedirect(itemId, link) \
( \
(itemId)->lp_flags = LP_REDIRECT, \
(itemId)->lp_off = (link), \
(itemId)->lp_len = 0 \
^^^^^^^^^^
)
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers