commit 487fbecb3411a9eb0559af856cc9779f242cdc3e Author: Simon Riggs Date: Mon Jun 21 13:57:27 2021 +0100 New doc chapter for Hash Indexes diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 45b701426b..459add557b 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -89,6 +89,7 @@ + diff --git a/doc/src/sgml/hash.sgml b/doc/src/sgml/hash.sgml new file mode 100644 index 0000000000..6830b384bf --- /dev/null +++ b/doc/src/sgml/hash.sgml @@ -0,0 +1,161 @@ + + + +Hash Indexes + + + index + Hash + + + + Overview + + + PostgreSQL includes an implementation of persistent on-disk hash indexes, + which are now fully crash recoverable. Any data type can be indexed by a + hash index, including data types that do not have a well-defined linear + ordering. Hash indexes store only the hash value of the data being + indexed, thus there are no restrictions on the size of the data column + being indexed. + + + + Hash indexes support only single-column indexes and do not allow + uniqueness checking. + + + + Hash indexes support only the = operator, so WHERE clauses that specify + range operations will not be able to take advantage of hash indexes. + + + + Each hash index tuple stores just the 4-byte hash value, not the actual + column value. As a result, hash indexes may be much smaller than B-trees + when indexing longer data items such as UUIDs or URLs. The absence of + the column value also makes all hash index scans lossy. Hash indexes may + take part in bitmap index scans and backward scans. + + + + Hash indexes are best suited to SELECT and UPDATE-heavy workloads that + use equality scans on larger tables. In a B-tree index, searches must descend + through the tree until the leaf page is found. In tables with millions + of rows this descent can increase access time to data. The equivalent + of a leaf page in a hash index is referred to as a bucket page.
In + contrast, a hash index allows accessing the bucket pages directly, + thereby potentially reducing index access time in larger tables. This + reduction in "logical I/O" becomes even more pronounced on indexes/data + larger than shared_buffers/RAM. + + + + Hash indexes have been designed to cope with uneven distributions of + hash values. Direct access to the bucket pages works well if the hash + values are evenly distributed. If a bucket page fills up as a result of + inserts, additional overflow pages are chained to that specific + bucket page, locally expanding the storage for index tuples that match + that hash value. When scanning a hash bucket during queries, we need to + scan through all of the overflow pages. Thus an unbalanced hash index + might actually be worse than a B-tree in terms of the number of block + accesses required, for some data. + + + + As a result of the overflow cases, hash indexes are most suitable for + unique data, nearly unique data, or data with a low number + of rows per hash bucket. One + possible way to avoid problems is to exclude highly non-unique values + from the index using a partial index condition, but this may not be + suitable in many cases. + + + + Like B-Trees, hash indexes perform simple index tuple deletion. This + is a deferred maintenance operation that deletes index tuples that are + known to be safe to delete (those whose item identifier's LP_DEAD bit + is already set). This is performed speculatively upon each insert, + though it may not succeed if the page is pinned by another backend. + Deletion of dead index pointers also occurs during VACUUM. If an + overflow page becomes empty, it can be recycled for reuse + in other buckets, though overflow pages are never returned to the operating + system. + + + + There is currently no provision to shrink a hash index, other than by + rebuilding it with REINDEX. + There is no provision for reducing the number of buckets, either.
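The equality-only usage and the partial-index workaround described above can be sketched in SQL. This is an illustrative sketch, not part of the patch; the table and column names are hypothetical:

```sql
-- A plain hash index: usable only for equality lookups (WHERE url = ...),
-- never for range predicates such as url > ...
CREATE INDEX links_url_hash ON links USING hash (url);

-- Hypothetical skewed column: exclude a highly non-unique value with a
-- partial index condition, so that one bucket does not accumulate a long
-- chain of overflow pages
CREATE INDEX events_kind_hash ON events USING hash (kind)
    WHERE kind <> 'heartbeat';
```

Note that the planner will consider the partial index only for queries whose WHERE clause implies the index's condition (here, `kind <> 'heartbeat'`).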
+ + + + Hash indexes may expand the number of bucket pages as the number of + rows indexed grows. + The hash key-to-bucket-number mapping is chosen so that the index can be + incrementally expanded. When a new bucket is to be added to the index, + exactly one existing bucket will need to be "split", with some of its + tuples being transferred to the new bucket according to the updated + key-to-bucket-number mapping. This is essentially the same hash table + management technique embodied in src/backend/utils/hash/dynahash.c for + in-memory hash tables used within PostgreSQL internals. + + + + The expansion occurs in the foreground, which could increase execution + time for user inserts. Thus, hash indexes may not be suitable for tables + with a rapidly increasing number of rows. + + + + + + Implementation + + + There are four kinds of pages in a hash index: the meta page (page zero), + which contains statically allocated control information; primary bucket + pages; overflow pages; and bitmap pages, which keep track of overflow + pages that have been freed and are available for re-use. For addressing + purposes, bitmap pages are regarded as a subset of the overflow pages. + + + + Both scanning the index and inserting tuples require locating the bucket + where a given tuple ought to be located. To do this, we need the bucket + count, highmask, and lowmask from the metapage; however, it's undesirable + for performance reasons to have to lock and pin the metapage for + every such operation. Instead, we retain a cached copy of the metapage + in each backend's relcache entry. This will produce the correct + bucket mapping as long as the target bucket hasn't been split since the + last cache refresh. + + + + Primary bucket pages and overflow pages are allocated independently since + any given index might need more or fewer overflow pages relative to its + number of buckets.
The hash code uses an interesting set of addressing + rules to support a variable number of overflow pages while not having to + move primary bucket pages around after they are created. + + + + Each row in the indexed table is represented by a single index tuple in + the hash index. Hash index tuples are stored in the bucket pages and, if + they exist, in the overflow pages. + We speed up searches by keeping the index entries in any + one index page sorted by hash code, thus allowing binary search to be used + within an index page. Note however that there is *no* assumption about the + relative ordering of hash codes across different index pages of a bucket. + + + + The bucket splitting algorithms used to expand the hash index are too + complex to describe here, though they are described in more detail in + src/backend/access/hash/README. + The split algorithm is crash safe and can be restarted if not completed + successfully. + + + + + diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml index d453be3909..dba9cf413f 100644 --- a/doc/src/sgml/postgres.sgml +++ b/doc/src/sgml/postgres.sgml @@ -266,6 +266,7 @@ break is not needed in a wider output rendering. &spgist; &gin; &brin; + &hash; &storage; &bki; &planstats;
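The bucket lookup described in the Implementation section, which uses the bucket count, highmask, and lowmask from the (possibly cached) metapage, can be sketched outside the backend. This Python rendering mirrors the mask-then-fall-back logic of the backend's bucket-mapping routine (`_hash_hashkey2bucket()` in src/backend/access/hash/hashutil.c); the function and variable names here are illustrative, not the C API:

```python
def hashkey2bucket(hash_value: int, maxbucket: int,
                   highmask: int, lowmask: int) -> int:
    """Map a 32-bit hash value to a bucket number.

    maxbucket is the highest bucket number currently in use.  highmask is
    the low-order bit mask for the next power of two above maxbucket, and
    lowmask is the next smaller mask.  A masked value that points past
    maxbucket falls back to the low-order mask, i.e. to the bucket that
    has not yet been split.
    """
    bucket = hash_value & highmask
    if bucket > maxbucket:
        bucket = bucket & lowmask
    return bucket

# With 6 buckets (numbers 0..5): highmask = 7, lowmask = 3.
# Masked values 6 and 7 exceed maxbucket, so those keys stay in
# buckets 2 and 3, which have not been split yet.
print(hashkey2bucket(6, maxbucket=5, highmask=7, lowmask=3))   # 2
print(hashkey2bucket(13, maxbucket=5, highmask=7, lowmask=3))  # 5
```

Because the mapping depends only on these three values, a backend can resolve buckets from its cached metapage copy, refreshing the cache only when it discovers the target bucket has since been split.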