pgsql: Improve performance of get_actual_variable_range with recently-dead tuples.

Started by Tom Lane, over 8 years ago; 2 messages
#1 Tom Lane
tgl@sss.pgh.pa.us

Improve performance of get_actual_variable_range with recently-dead tuples.

In commit fccebe421, we hacked get_actual_variable_range() to scan the
index with SnapshotDirty, so that if there are many uncommitted tuples
at the end of the index range, it wouldn't laboriously scan through all
of them looking for a live value to return. However, that didn't fix it
for the case of many recently-dead tuples at the end of the index;
SnapshotDirty recognizes those as committed dead and so we're back to
the same problem.

To improve the situation, invent a "SnapshotNonVacuumable" snapshot type
and use that instead. The reason this helps is that, if the snapshot
rejects a given index entry, we know that the indexscan will mark that
index entry as killed. This means the next get_actual_variable_range()
scan will proceed past that entry without visiting the heap, making the
scan a lot faster. We may end up accepting a recently-dead tuple as
being the estimated extremal value, but that doesn't seem much worse than
the compromise we made before to accept not-yet-committed extremal values.

The cost of the scan is still proportional to the number of dead index
entries at the end of the range, so in the interval after a mass delete
but before VACUUM's cleaned up the mess, it's still possible for
get_actual_variable_range() to take a noticeable amount of time, if you've
got enough such dead entries. But the constant factor is much much better
than before, since all we need to do with each index entry is test its
"killed" bit.

We chose to back-patch commit fccebe421 at the time, but I'm hesitant to
do so here, because this form of the problem seems to affect many fewer
people. Also, even when it happens, it's less bad than the case fixed
by commit fccebe421 because we don't get the contention effects from
expensive TransactionIdIsInProgress tests.

Dmitriy Sarafannikov, reviewed by Andrey Borodin

Discussion: /messages/by-id/05C72CF7-B5F6-4DB9-8A09-5AC897653113@yandex.ru

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/3ca930fc39ccf987c1c22fd04a1e7463b5dd0dfd

Modified Files
--------------
src/backend/access/heap/heapam.c | 3 +++
src/backend/utils/adt/selfuncs.c | 40 +++++++++++++++++++++++++++-------------
src/backend/utils/time/tqual.c | 22 ++++++++++++++++++++++
src/include/utils/snapshot.h | 4 +++-
src/include/utils/tqual.h | 10 ++++++++++
5 files changed, 65 insertions(+), 14 deletions(-)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#2 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Tom Lane (#1)
1 attachment(s)
Re: [COMMITTERS] pgsql: Improve performance of get_actual_variable_range with recently-dead tuples.

On Fri, Sep 8, 2017 at 8:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

While reading the README in nbtree, I found a sentence about snapshots that says "There is one minor exception, which is that the optimizer
sometimes looks at the boundaries of value ranges using
SnapshotDirty". Since this commit changed the snapshot being used to
SnapshotNonVacuumable, shouldn't we fix this sentence?
A patch for fixing this is attached.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Attachments:

fix_README_in_nbtree.patch (application/octet-stream)
diff --git a/src/backend/access/nbtree/README b/src/backend/access/nbtree/README
index a3f11da..f25760f 100644
--- a/src/backend/access/nbtree/README
+++ b/src/backend/access/nbtree/README
@@ -533,14 +533,15 @@ toast rows will also be visible. So as long as we follow a toast
 pointer from a visible (live) tuple the corresponding toast rows
 will also be visible, so we do not need to recheck MVCC on them.
 There is one minor exception, which is that the optimizer sometimes
-looks at the boundaries of value ranges using SnapshotDirty, which
-could result in returning a newer value for query statistics; this
-would affect the query plan in rare cases, but not the correctness.
-The risk window is small since the stats look at the min and max values
-in the index, so the scan retrieves a tid then immediately uses it
-to look in the heap. It is unlikely that the tid could have been
-deleted, vacuumed and re-inserted in the time taken to look in the heap
-via direct tid access. So we ignore that scan type as a problem.
+looks at the boundaries of value ranges using SnapshotNonVacuumable
+which uses HeapTupleSatisfiesVacuum(), which could result in returning
+a newer value for query statistics; this would affect the query plan
+in rare cases, but not the correctness. The risk window is small since
+the stats look at the min and max values in the index, so the scan
+retrieves a tid then immediately uses it to look in the heap. It is
+unlikely that the tid could have been deleted, vacuumed and re-inserted
+in the time taken to look in the heap via direct tid access. So we ignore
+that scan type as a problem.
 
 Other Things That Are Handy to Know
 -----------------------------------