Proposal: Improve bitmap costing for lossy pages
I would like to propose a patch that makes bitmap heap scan costing
sensitive to work_mem. Currently, bitmap costing does not consider
work_mem at all, so in cases where the bitmap produces a lot of lossy
pages a bitmap heap scan still gets selected, which eventually leads to
degraded performance.
While evaluating parallel bitmap heap scan on TPC-H we noticed that in
many queries a bitmap heap scan performs well at high work_mem, but at
low work_mem it slows the query down compared to a sequential scan or
index scan. This shows that bitmap heap scan is a better alternative
only when most of the heap pages fit into work_mem.
The attached POC patch fixes the problem by considering work_mem in
bitmap costing. Performance numbers with this patch at different values
of work_mem are as follows:
workload: TPCH scale factor 20
machine: POWER 8
work_mem = 4MB
Query Head(ms) Patch(ms) Improvement Change in plan
4 13759.632 14464.491 0.95x PBHS -> PSS
5 47581.558 41888.853 1.14x BHS -> SS
6 14051.553 13853.449 1.01x PBHS -> PSS
8 21529.98 11289.25 1.91x PBHS -> PSS
10 37844.51 34460.669 1.10x BHS -> SS
14 10131.49 15281.49 0.66x BHS -> SS
15 43579.833 34971.051 1.25x BHS -> SS
work_mem = 20MB
Query Head(ms) Patch(ms) Improvement Change in plan
6 14592 13521.06 1.08x PBHS -> PSS
8 20223.106 10716.062 1.89x PBHS -> PSS
15 40486.957 33687.706 1.20x BHS -> PSS
work_mem = 64MB
Query Head(ms) Patch(ms) Improvement Change in plan
15 40904.572 25750.873 1.59x BHS -> PBHS
work_mem = 1GB
No plan got changed
Most of the queries show decent improvement; however, Q14 shows a
regression at work_mem = 4MB. On analysing this case, I found that the
number of pages_fetched calculated by the "Mackert and Lohman formula"
is very high (1112817) compared to the actual number of unique heap
pages fetched (293314). Therefore, while costing the bitmap scan with
1112817 pages and 4MB of work_mem, we predicted that even after
lossifying all the pages the bitmap cannot fit into work_mem, and
hence the bitmap scan was not selected.
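To make the mechanism concrete, here is a small standalone check (not
part of the patch; the per-entry size and WORDS_PER_CHUNK value are
assumptions for illustration, since the real values depend on the
build):

#include <stdio.h>

int main(void)
{
    /*
     * Assumed sizes, for illustration only: ~64 bytes per hashtable entry
     * (PagetableEntry plus the two extra pointers) and WORDS_PER_CHUNK = 4,
     * roughly what an 8K-block, 64-bit-bitmapword build would give.
     */
    double maxbytes = 4.0 * 1024 * 1024;    /* work_mem = 4MB */
    double entry_size = 64.0;
    double words_per_chunk = 4.0;
    double pages = 1112817.0;               /* Mackert and Lohman estimate for Q14 */

    double nbuckets = maxbytes / entry_size;
    double exact = (words_per_chunk * nbuckets) / (words_per_chunk - 1) -
        pages / (words_per_chunk - 1);

    /*
     * exact comes out clearly negative for the overestimated page count, so
     * the patch adds disable_cost and the bitmap path is not chosen.
     */
    printf("nbuckets = %.0f, exact_pages = %.0f\n", nbuckets, exact);
    return 0;
}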
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
improve_bitmap_cost_v1.patch (application/octet-stream)
diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c
index c66019e..20604d6 100644
--- a/src/backend/nodes/tidbitmap.c
+++ b/src/backend/nodes/tidbitmap.c
@@ -1562,3 +1562,38 @@ pagetable_free(pagetable_hash *pagetable, void *pointer)
tbm->dsapagetableold = InvalidDsaPointer;
}
}
+
+/*
+ * tbm_calculate_exact_pages
+ *
+ * Calculate the number of exact pages which will fit into bitmap for given
+ * memory (maxbytes) and given pages.
+ */
+double
+tbm_calculate_exact_pages(double maxbytes, double pages)
+{
+ long nbuckets;
+ double exact = -1;
+
+ /*
+ * Estimate number of hashtable entries we can have within maxbytes. This
+ * estimates the hash cost as sizeof(PagetableEntry).
+ */
+ nbuckets = maxbytes /
+ (sizeof(PagetableEntry) + sizeof(Pointer) + sizeof(Pointer));
+ nbuckets = Min(nbuckets, INT_MAX - 1); /* safety limit */
+ nbuckets = Max(nbuckets, 16); /* sanity limit */
+
+ /*
+ * Compute the number of exact_buckets. The number of exact pages will be
+ * the same as exact_buckets, since each exact bucket holds only one page's
+ * information. We derive the formula below from these 2 equations.
+ *
+ * Eq1: nbuckets = exact_bucket + lossy_buckets
+ * Eq2: total_pages = exact_bucket + (lossy_buckets * WORDS_PER_CHUNK)
+ */
+ exact = (WORDS_PER_CHUNK * nbuckets) / (WORDS_PER_CHUNK-1) -
+ pages / (WORDS_PER_CHUNK-1);
+
+ return exact;
+}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 52643d0..1d18177 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -5125,6 +5125,9 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
double T;
double pages_fetched;
double tuples_fetched;
+ double heap_pages = 0;
+ double exact_pages = 0;
+ double lossy_pages = 0;
/*
* Fetch total cost of obtaining the bitmap, as well as its total
@@ -5169,8 +5172,45 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
else
pages_fetched = ceil(pages_fetched);
+ /*
+ * Calculate the number of pages fetched from the heap. Then based on
+ * current work_mem estimate how many pages are going to exactly fit
+ * into bitmap and how many pages will be lossyfied.
+ *
+ * FIXME: the Mackert and Lohman formula gives the number of heap page
+ * fetches, but for this calculation we need the number of unique heap
+ * pages which actually go into the bitmap.
+ */
+ heap_pages = Min(pages_fetched, baserel->pages);
+ exact_pages = tbm_calculate_exact_pages(work_mem * 1024L, heap_pages);
+
+ /*
+ * We have already got the exact pages so remaining pages will be lossy.
+ */
+ if (heap_pages > exact_pages)
+ lossy_pages = heap_pages - exact_pages;
+
+ /*
+ * If there are lossy pages then recompute the number of tuples processed
+ * by the bitmap heap node. For the exact_pages it's baserel->tuples *
+ * indexSelectivity and for lossy_pages we have to process all the tuples.
+ */
+ if (exact_pages > 0 && lossy_pages)
+ tuples_fetched = clamp_row_est (indexSelectivity *
+ (exact_pages / heap_pages) * baserel->tuples +
+ (lossy_pages / heap_pages) * baserel->tuples);
+
if (cost)
+ {
*cost = indexTotalCost;
+
+ /*
+ * If exact_pages is negative then even after lossifying we will not be
+ * able to fit the bitmap into work_mem. Hence, disable this path.
+ */
+ if (exact_pages < 0)
+ *cost += disable_cost;
+ }
if (tuple)
*tuple = tuples_fetched;
diff --git a/src/include/nodes/tidbitmap.h b/src/include/nodes/tidbitmap.h
index 87f4bb7..39c3da3 100644
--- a/src/include/nodes/tidbitmap.h
+++ b/src/include/nodes/tidbitmap.h
@@ -71,4 +71,5 @@ extern void tbm_end_shared_iterate(TBMSharedIterator *iterator);
extern TBMSharedIterator *tbm_attach_shared_iterate(dsa_area *dsa,
dsa_pointer dp);
+extern double tbm_calculate_exact_pages(double maxbytes, double pages);
#endif /* TIDBITMAP_H */
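For reference, the expression in tbm_calculate_exact_pages() above
follows from the two equations in its comment:

  Eq1: nbuckets = exact_buckets + lossy_buckets
  Eq2: total_pages = exact_buckets + (lossy_buckets * WORDS_PER_CHUNK)

Substituting lossy_buckets = nbuckets - exact_buckets from Eq1 into Eq2:

  total_pages = exact_buckets + (nbuckets - exact_buckets) * WORDS_PER_CHUNK
  exact_buckets * (WORDS_PER_CHUNK - 1) = WORDS_PER_CHUNK * nbuckets - total_pages
  exact_buckets = (WORDS_PER_CHUNK * nbuckets - total_pages) / (WORDS_PER_CHUNK - 1)

which is what the function returns. Whether WORDS_PER_CHUNK is the
right constant to model a lossy chunk with is a separate question.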
On Thu, May 18, 2017 at 2:52 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
Most of the queries show decent improvement; however, Q14 shows a
regression at work_mem = 4MB. On analysing this case, I found that the
number of pages_fetched calculated by the "Mackert and Lohman formula"
is very high (1112817) compared to the actual number of unique heap
pages fetched (293314). Therefore, while costing the bitmap scan with
1112817 pages and 4MB of work_mem, we predicted that even after
lossifying all the pages the bitmap cannot fit into work_mem, and
hence the bitmap scan was not selected.
You might need to adjust effective_cache_size. The Mackert and Lohman
formula isn't exactly counting unique pages fetched. It will count
the same page twice if it thinks the page will be evicted from the
cache after the first fetch and before the second one.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, May 18, 2017 at 8:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Thanks for the feedback and sorry for the delayed response.
You might need to adjust effective_cache_size.
You are right. However, effective_cache_size only affects the number
of pages_fetched when the path is used as a parameterized path (i.e.
on the inner side of a nested loop). In our case, where we see the
wrong number of pages being estimated (Q10), it was a
non-parameterized path. I have also tested with a high
effective_cache_size but did not observe any change.
The Mackert and Lohman
formula isn't exactly counting unique pages fetched.
Right.
It will count
the same page twice if it thinks the page will be evicted from the
cache after the first fetch and before the second one.
That too only when loop_count > 1. If loop_count = 1, then it seems the
formula does not consider effective_cache_size at all. But, in reality,
multiple tuples can fall on the same page.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Jun 8, 2017 at 10:44 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, May 18, 2017 at 8:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Thanks for the feedback and sorry for the delayed response.
You might need to adjust effective_cache_size.
You are right. However, effective_cache_size only affects the number
of pages_fetched when the path is used as a parameterized path (i.e.
on the inner side of a nested loop). In our case, where we see the
wrong number of pages being estimated (Q10), it was a
non-parameterized path.
Ah, oops. My mistake.
One thing to keep in mind is that this is just an estimate. It's not
going to be right 100% of the time no matter what you do. The goal is
to make the estimates better than they are today, and the patch can
succeed in being better overall even if there are some cases where
things get worse. Have you tried to analyze what is causing the bad
estimate in this one case?
The formula that compute_bitmap_pages is using here to compute the
number of page fetches is (2.0 * T * tuples_fetched) / (2.0 * T +
tuples_fetched), where T is the number of pages in the table. Now the
idea here is that when tuples_fetched is small, the number of pages
fetched is likely to be almost equal to the number of tuples fetched,
because probably all of the tuples will be on separate pages. As the
number of tuples grows larger, we assume it's likely that sometimes
two or more of them will be on the same page, so pages_fetched grows
more slowly. When tuples_fetched = T, that is, the number of tuples
equals the number of pages, we estimate that we're fetching 2/3 of the
table, because some pages will have no tuples to fetch at all, while
others have more than one. When tuples_fetched = 2 * T or greater, we
assume we'll fetch every page in the table.
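To make that concrete, here is that expression evaluated at a few
points in a tiny standalone program (T = 90000 is an arbitrary example;
the surrounding code caps the result at T, which is why 2*T and above
just means "every page"):

#include <stdio.h>

/* The expression described above from compute_bitmap_pages(). */
static double
pages_fetched(double T, double tuples_fetched)
{
    return (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
}

int main(void)
{
    double T = 90000;

    printf("tuples = T/100: %.0f pages\n", pages_fetched(T, T / 100)); /* ~= tuples */
    printf("tuples = T:     %.0f pages\n", pages_fetched(T, T));       /* = 2T/3 */
    printf("tuples = 2T:    %.0f pages\n", pages_fetched(T, 2 * T));   /* = T */
    return 0;
}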
But this could be wrong. If there are 100 tuples per page, we could
have tuples_fetched = 2 * T but actually fetch only T / 50 pages
rather than T pages, if all the rows we need to fetch are tightly
clustered. That would be a 50x estimation error; the one you're
seeing is about 3.8x. And my guess is that it's exactly this problem:
the TIDs being fetched are not spread out evenly through the whole
table, but are rather all clustered, but you could try to verify that
through some experimentation. I'm not sure we have the statistics to
solve that problem in a principled way. It seems loosely related to
the physical-to-logical correlation which we do store, but not so
closely that any way of using that information directly is obvious.
Instead of trying to immediately improve things on the optimizer side,
I'm wondering whether our first step should be to try to improve
things on the executor side - i.e. reduce the number of pages that
actually get lossified. tbm_lossify says:
* XXX Really stupid implementation: this just lossifies pages in
* essentially random order. We should be paying some attention to the
* number of bits set in each page, instead.
As the comment says, the current implementation is really stupid,
which means we're lossifying more pages than really necessary. There
is some previous discussion of this topic here:
/messages/by-id/20160923205843.zcs533sqdzlh4cpo@alap3.anarazel.de
There are two main considerations here. One, it's better to lossify
pages with many bits set than with few bits set, because the
additional work we thereby incur is less. Two, it's better to lossify
pages that are in the same chunk as other pages which we are also
going to lossify, because that's how we actually save memory. The
current code will cheerfully lossify a chunk that contains no other
pages, or will lossify one page from a chunk but not the others,
saving no memory but hurting performance.
Maybe the simplest change here would be to make it so that when we
decide to lossify a chunk, we lossify all pages in the chunk, but only
if there's more than one. In other words, given a proposed page P to
lossify, probe the hash table for all keys in the same chunk as P and
build up a words[] array for the proposed chunk. If that words[]
array will end up with only 1 bit set, then forget the whole thing;
otherwise, delete all of the entries for the individual pages and
insert the new chunk instead.
A further refinement would be to try to do a better job picking which
chunks to lossify in the first place. I don't have a clear idea of
how we could go about doing that. There's an unused padding byte
available inside PagetableEntry, and really it's more like 28 bits,
because status only needs 2 bits and ischunk and recheck only need 1
bit each. So without increasing the memory usage at all, we could use
those bits to store some kind of information that would give us a clue
as to whether a certain entry was likely to be a good candidate for
lossification. What to store there is a little less clear, but one
idea is to store the number of page table entries that could be saved
by lossifying the chunk. We could iterate through the hash table once
and work out the correct value for every chunk, storing 0 for any
pages that wouldn't be chunk headers. As we go, we could keep a
separate array that counts the number of times we found an opportunity
for lossification that would save N pages; that is, maintain an array
chances_to_save[PAGES_PER_CHUNK] such that after looping through the
whole page table, chances_to_save[i] is equal to the number of page
table entries for which lossifying all pages in the chunk would save i
entries. Then, based on how many entries we need to save, we could
cheaply compute a threshold value and lossify all chunks that save at
least that many pages. This isn't perfect because it ignores the
number of bits set for each individual page, and it adds some cost
because you have to iterate through the hash table twice, but it seems
pretty good -- you lossify the chunks where it saves the most storage
instead of picking randomly.
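A rough standalone sketch of that two-pass idea, just to show the shape
of it (the names are made up; the real thing would walk the pagetable
hash and could stash the per-chunk count in the spare PagetableEntry
bits instead of temporary arrays):

#include <stdio.h>
#include <stdlib.h>

/* Assumed chunk geometry: tidbitmap.c defines PAGES_PER_CHUNK as BLCKSZ / 32,
 * i.e. 256 with the default 8K block size. */
#define PAGES_PER_CHUNK 256

/*
 * Given the block numbers of the exact pages currently in the bitmap and the
 * number of entries we need to free, build the "how much would lossifying
 * this chunk save" histogram and pick the smallest per-chunk saving that is
 * still worth lossifying.
 */
static int
pick_savings_threshold(const unsigned *pages, int npages, int entries_to_free)
{
    unsigned    max_chunk = 0;
    int         i;

    for (i = 0; i < npages; i++)
        if (pages[i] / PAGES_PER_CHUNK > max_chunk)
            max_chunk = pages[i] / PAGES_PER_CHUNK;

    /* pages_in_chunk[c]: number of exact entries falling into chunk c */
    int *pages_in_chunk = calloc(max_chunk + 1, sizeof(int));

    for (i = 0; i < npages; i++)
        pages_in_chunk[pages[i] / PAGES_PER_CHUNK]++;

    /* chances_to_save[s]: number of chunks whose lossification saves s entries */
    int *chances_to_save = calloc(PAGES_PER_CHUNK, sizeof(int));

    for (i = 0; i <= (int) max_chunk; i++)
        if (pages_in_chunk[i] > 1)      /* n pages collapse into 1 chunk: save n-1 */
            chances_to_save[pages_in_chunk[i] - 1]++;

    /* Walk from the largest saving downwards until enough entries are freed. */
    int saved = 0;
    int threshold;

    for (threshold = PAGES_PER_CHUNK - 1; threshold > 1; threshold--)
    {
        saved += chances_to_save[threshold] * threshold;
        if (saved >= entries_to_free)
            break;
    }

    free(pages_in_chunk);
    free(chances_to_save);
    return threshold;   /* lossify every chunk that saves at least this much */
}

int main(void)
{
    unsigned    pages[2000];
    int         n = 0, i;

    /* Toy input: one dense clustered run plus scattered singleton pages. */
    for (i = 0; i < 1500; i++)
        pages[n++] = i;
    for (i = 0; i < 500; i++)
        pages[n++] = 100000 + i * 300;

    printf("lossify chunks saving >= %d entries\n",
           pick_savings_threshold(pages, n, 1000));
    return 0;
}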
As far as the patch itself is concerned, tbm_calculate_exact_pages()
is computing the number of "exact pages" which will fit into the
TIDBitmap, but I think that instead of tbm_calculate_exact_pages() you
should have something like tbm_calculate_entries() that just returns
nbuckets, and then let the caller work out how many entries are going
to be exact and how many are going to be inexact. An advantage of
that approach is that the new function could be used by tbm_create()
instead of duplicating the logic. I'm not sure that the way you are
doing the rest of the calculation is wrong, but I've got no confidence
that it's right, either: the way WORDS_PER_CHUNK is used looks pretty
random, and the comments aren't enough for me to figure it out.
It's unclear what assumptions we should make while trying to estimate
the number of lossy pages. The effectiveness of lossification depends
on the average number of pages that get folded into a chunk; but how
many will that be? If we made some of the improvements proposed
above, it would probably be higher than it is now, but either way it's
not clear what number to use. One possible assumption is that the
pages that get lossified are exactly average, so:
double entries_saved_per_lossy_page = Max(heap_pages_fetched /
tbm_max_entries - 1, 1.0);
lossy_pages = (heap_pages_fetched - tbm_max_entries) /
pages_saved_per_lossy_page;
exact_pages = heap_pages_fetched - lossy_pages;
If the TBM fits into work_mem, heap_pages_fetched / tbm_max_entries is
the average number of entries per chunk, so one less than that value
is the number of pages we expect to save by lossifying an average
chunk and all of its entries. This might even be too optimistic given
the way tbm_lossify() works today, since there's currently no
guarantee we'd save anything at all; we might lossify a bunch of extra
stuff just for fun.
Another possible assumption is that the pages that get lossified are
particularly good candidates for lossification -- they are, say, twice
as dense as the typical page. To reflect such an assumption, you'd
just make entries_saved_per_lossy_page bigger e.g. by inserting "2 *"
at the front of the formula.
There could be other ways of computing this, too -- you've got one! --
but I'm not sure that WORDS_PER_CHUNK should be involved at all. The
number of entries saved per lossy page will only be WORDS_PER_CHUNK -
1 in the really fortunate case where not only does the algorithm
always pick the chunk with the most pages as the next one to lossify,
but also that chunk always has the maximum number of possible pages in
it. That isn't likely on real data distributions.
Curious to hear more of your (or anyone's) thoughts on this. This is
a tricky problem and the performance gains you've gotten seem to show
this area is clearly worth some effort.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Jul 26, 2017 at 10:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jun 8, 2017 at 10:44 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, May 18, 2017 at 8:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Thanks for the feedback. I haven't yet worked on the optimizer-side
feedback, but I have done some work on improving the executor part;
please find my comments inline.
There are two main considerations here. One, it's better to lossify
pages with many bits set than with few bits set, because the
additional work we thereby incur is less. Two, it's better to lossify
pages that are in the same chunk as other pages which we are also
going to lossify, because that's how we actually save memory. The
current code will cheerfully lossify a chunk that contains no other
pages, or will lossify one page from a chunk but not the others,
saving no memory but hurting performance.
Maybe the simplest change here would be to make it so that when we
decide to lossify a chunk, we lossify all pages in the chunk, but only
if there's more than one. In other words, given a proposed page P to
lossify, probe the hash table for all keys in the same chunk as P and
build up a words[] array for the proposed chunk. If that words[]
array will end up with only 1 bit set, then forget the whole thing;
otherwise, delete all of the entries for the individual pages and
insert the new chunk instead.
I have attempted a very simple POC with this approach just to see how
many lossy pages we can save if we lossify all the pages falling in
the same chunk first, before moving to the next page. I have taken
some data on TPCH scale 20 with different work_mem settings (I only
counted lossy pages and did not measure performance). I did not see a
significant reduction in lossy pages. (The POC patch is attached in
case someone is interested in testing or experimenting further.)
work_mem = 64MB
TPCH Query   Head Lossy_pages   Patch Lossy_pages   Lossy_page_reduction
Q6           534984             529745              5239
Q15          535072             529785              5287
Q20          1586933            1584731             2202

work_mem = 40MB
TPCH Query   Head Lossy_pages   Patch Lossy_pages   Lossy_page_reduction
Q6           995223             993490              1733
Q14          337894             332890              5004
Q15          995417             993511              1906
Q20          1654016            1652982             1034

work_mem = 20MB
TPCH Query   Head Lossy_pages   Patch Lossy_pages   Lossy_page_reduction
Q4           166551             165280              1271
Q5           330679             330028              651
Q6           1160339            1159937             402
Q14          666897             666032              865
Q15          1160518            1160017             501
Q20          1982981            1982828             153
As far as the patch itself is concerned, tbm_calculate_exact_pages()
is computing the number of "exact pages" which will fit into the
TIDBitmap, but I think that instead of tbm_calculate_exact_pages() you
should have something like tbm_calculate_entries() that just returns
nbuckets, and then let the caller work out how many entries are going
to be exact and how many are going to be inexact. An advantage of
that approach is that the new function could be used by tbm_create()
instead of duplicating the logic.
Ok
I'm not sure that the way you are
doing the rest of the calculation is wrong, but I've got no confidence
that it's right, either: the way WORDS_PER_CHUNK is used looks pretty
random, and the comments aren't enough for me to figure it out.
+ * Eq1: nbuckets = exact_bucket + lossy_buckets
+ * Eq2: total_pages = exact_bucket + (lossy_buckets * WORDS_PER_CHUNK)
I derived my formula from these two equations. But it assumes that
every lossy bucket (chunk) will hold WORDS_PER_CHUNK pages, which seems
very optimistic.
It's unclear what assumptions we should make while trying to estimate
the number of lossy pages. The effectiveness of lossification depends
on the average number of pages that get folded into a chunk; but how
many will that be? If we made some of the improvements proposed
above, it would probably be higher than it is now, but either way it's
not clear what number to use. One possible assumption is that the
pages that get lossified are exactly average, so:
double entries_saved_per_lossy_page = Max(heap_pages_fetched /
tbm_max_entries - 1, 1.0);
lossy_pages = (heap_pages_fetched - tbm_max_entries) /
pages_saved_per_lossy_page;
exact_pages = heap_pages_fetched - lossy_pages;
This seems OK as long as entries_saved_per_lossy_page is at most 2, but
if it becomes more than 2 then this calculation seems problematic.
Consider the examples below:
heap_pages_fetched = 100 and tbm_max_entries = 25
then with the above formulae
lossy_pages = (100-25)/3 = 25, exact_pages=75
heap_pages_fetched = 100 and tbm_max_entries = 10
lossy_pages = (100-10)/9 = 10 and exact_pages=90
So by reducing tbm_max_entries I get more exact pages, which seems
wrong. It seems to me that if entries_saved_per_lossy_page is > 2, then
calculating exact_pages the same way we calculate lossy_pages would be
more accurate,
i.e.
exact_pages = (heap_pages_fetched - tbm_max_entries)
/pages_saved_per_lossy_page;
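Just to make the comparison concrete, here is the arithmetic for both
variants, standalone (nothing new, only the formulas above evaluated at
the two examples):

#include <stdio.h>

static void
estimate(double heap_pages_fetched, double tbm_max_entries)
{
    double saved = heap_pages_fetched / tbm_max_entries - 1;

    if (saved < 1.0)
        saved = 1.0;

    /* your formulation: estimate lossy pages, exact is the remainder */
    double lossy_a = (heap_pages_fetched - tbm_max_entries) / saved;
    double exact_a = heap_pages_fetched - lossy_a;

    /* my variant for saved > 2: estimate exact pages directly */
    double exact_b = (heap_pages_fetched - tbm_max_entries) / saved;
    double lossy_b = heap_pages_fetched - exact_b;

    printf("pages=%.0f maxentries=%.0f: A exact=%.0f lossy=%.0f | B exact=%.0f lossy=%.0f\n",
           heap_pages_fetched, tbm_max_entries, exact_a, lossy_a, exact_b, lossy_b);
}

int main(void)
{
    estimate(100, 25);          /* A: exact 75, lossy 25;  B: exact 25, lossy 75 */
    estimate(100, 10);          /* A: exact 90, lossy 10;  B: exact 10, lossy 90 */
    return 0;
}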
If the TBM fits into work_mem, heap_pages_fetched / tbm_max_entries is
the average number of entries per chunk, so one less than that value
is the number of pages we expect to save by lossifying an average
chunk and all of its entries. This might even be too optimistic given
the way tbm_lossify() works today, since there's currently no
guarantee we'd save anything at all; we might lossify a bunch of extra
stuff just for fun.
Another possible assumption is that the pages that get lossified are
particularly good candidates for lossification -- they are, say, twice
as dense as the typical page. To reflect such an assumption, you'd
just make entries_saved_per_lossy_page bigger e.g. by inserting "2 *"
at the front of the formula.
There could be other ways of computing this, too -- you've got one! --
but I'm not sure that WORDS_PER_CHUNK should be involved at all. The
number of entries saved per lossy page will only be WORDS_PER_CHUNK -
1 in the really fortunate case where not only does the algorithm
always pick the chunk with the most pages as the next one to lossify,
but also that chunk always has the maximum number of possible pages in
it. That isn't likely on real data distributions.
Curious to hear more of your (or anyone's) thoughts on this. This is
a tricky problem and the performance gains you've gotten seem to show
this area is clearly worth some effort.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
improve_lossify_approach2.patch (application/octet-stream)
diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c
index c4e53ad..6b543c6 100644
--- a/src/backend/nodes/tidbitmap.c
+++ b/src/backend/nodes/tidbitmap.c
@@ -1395,38 +1395,110 @@ tbm_lossify(TIDBitmap *tbm)
Assert(tbm->iterating == TBM_NOT_ITERATING);
Assert(tbm->status == TBM_HASH);
- pagetable_start_iterate_at(tbm->pagetable, &i, tbm->lossify_start);
+ pagetable_start_iterate_at(tbm->pagetable, &i, 0);
while ((page = pagetable_iterate(tbm->pagetable, &i)) != NULL)
{
+ PagetableEntry chunk = {0};
+ BlockNumber chunk_pageno;
+ bitmapword *words = NULL;
+ int bitno;
+ int bitnum;
+ int wordnum;
+ int npages;
+
if (page->ischunk)
continue; /* already a chunk header */
- /*
- * If the page would become a chunk header, we won't save anything by
- * converting it to lossy, so skip it.
- */
if ((page->blockno % PAGES_PER_CHUNK) == 0)
continue;
- /* This does the dirty work ... */
- tbm_mark_page_lossy(tbm, page->blockno);
+ /*
+ * Find the chunk in which this page falls. Find all the
+ * pages in the pagetable which fall in the same chunk. If the number
+ * of pages in the chunk is more than one, then lossify all the
+ * pages of this chunk.
+ */
+ bitno = page->blockno % PAGES_PER_CHUNK;
+ chunk_pageno = page->blockno - bitno;
+
+ page = pagetable_lookup(tbm->pagetable, chunk_pageno);
+ if (page && page->ischunk)
+ words = page->words;
- if (tbm->nentries <= tbm->maxentries / 2)
+ npages = 0;
+
+ for (bitno = 0; bitno < PAGES_PER_CHUNK; bitno++)
{
- /*
- * We have made enough room. Remember where to start lossifying
- * next round, so we evenly iterate over the hashtable.
- */
- tbm->lossify_start = i.cur;
- break;
+ page = pagetable_lookup(tbm->pagetable, chunk_pageno + bitno);
+ if (page != NULL && !page->ischunk)
+ {
+ wordnum = WORDNUM(bitno);
+ bitnum = BITNUM(bitno);
+ chunk.words[wordnum] |= ((bitmapword) 1 << bitnum);
+ if (words)
+ words[wordnum] |= ((bitmapword) 1 << bitnum);
+ npages++;
+ }
}
- /*
- * Note: tbm_mark_page_lossy may have inserted a lossy chunk into the
- * hashtable and may have deleted the non-lossy chunk. We can
- * continue the same hash table scan, since failure to visit one
- * element or visiting the newly inserted element, isn't fatal.
- */
+ if (npages > 1)
+ {
+ int schunkbit = 0;
+ bool found;
+
+ while (npages-- > 0)
+ {
+ tbm_advance_schunkbit(&chunk, &schunkbit);
+ pagetable_delete(tbm->pagetable, chunk_pageno + schunkbit);
+ schunkbit++;
+ tbm->nentries--;
+ tbm->npages--;
+ }
+
+ /* Look up or create entry for chunk-header page */
+ page = pagetable_insert(tbm->pagetable, chunk_pageno, &found);
+
+ /* Initialize it if not present before */
+ if (!found)
+ {
+ char oldstatus = page->status;
+
+ MemSet(page, 0, sizeof(PagetableEntry));
+
+ /* we assume it had some tuple bit(s) set, so mark it lossy */
+ memcpy(page->words, chunk.words, sizeof(page->words));
+
+ page->status = oldstatus;
+ page->blockno = chunk_pageno;
+ page->ischunk = true;
+
+ /* must count it too */
+ tbm->nentries++;
+ tbm->nchunks++;
+ }
+ else if (!page->ischunk)
+ {
+ char oldstatus = page->status;
+
+ /* chunk header page was formerly non-lossy, make it lossy */
+ MemSet(page, 0, sizeof(PagetableEntry));
+ page->status = oldstatus;
+ page->blockno = chunk_pageno;
+ page->ischunk = true;
+
+ /* we assume it had some tuple bit(s) set, so mark it lossy */
+ memcpy(page->words, chunk.words, sizeof(page->words));
+ /* adjust counts */
+ tbm->nchunks++;
+ tbm->npages--;
+ }
+
+ if (tbm->nentries <= tbm->maxentries / 2)
+ {
+ /* We have made enough room. */
+ break;
+ }
+ }
}
/*
On Thu, Aug 17, 2017 at 12:06 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have attempted a very simple POC with this approach just to see how
many lossy pages we can save if we lossify all the pages falling in
the same chunk first, before moving to the next page. I have taken
some data on TPCH scale 20 with different work_mem settings (I only
counted lossy pages and did not measure performance). I did not see a
significant reduction in lossy pages. (The POC patch is attached in
case someone is interested in testing or experimenting further.)
That's not an impressive savings. Maybe this approach is a dud, and
we should go back to just tackling the planner end of it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Dilip,
Recently I was thinking about this too, when working on the index-only
count(*) patch for indexes supporting amgetbitmap [1]. That patch
teaches bitmap heap scan node to skip heap fetches under certain
conditions. Exact tidbitmap pages are a prerequisite for this, so the
lossiness of the bitmap heavily influences the cost of a scan.
I used a very simple estimation: lossy_pages = max(0, total_pages -
maxentries / 2). The rationale is that after the initial lossification,
the number of lossy pages grows slower. It is good enough to reflect
this initial sharp increase in the lossy page number.
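For comparison, here is how this simple estimate differs from the
WORDS_PER_CHUNK-based formula in the first patch on one made-up input
(standalone arithmetic only; WORDS_PER_CHUNK = 4 is assumed):

#include <stdio.h>

int main(void)
{
    double pages = 200000;      /* heap pages headed for the bitmap (example) */
    double maxentries = 65536;  /* entries that fit in work_mem (example) */
    double words_per_chunk = 4; /* assumed, as in the earlier formula */

    /* the estimate suggested above: everything beyond maxentries / 2 is lossy */
    double lossy_simple = pages - maxentries / 2;

    if (lossy_simple < 0)
        lossy_simple = 0;

    /* the Eq1/Eq2-based estimate from the first patch */
    double exact_v1 = (words_per_chunk * maxentries - pages) / (words_per_chunk - 1);
    double lossy_v1 = pages - exact_v1;

    printf("simple: exact=%.0f lossy=%.0f\n", pages - lossy_simple, lossy_simple);
    printf("v1:     exact=%.0f lossy=%.0f\n", exact_v1, lossy_v1);
    return 0;
}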
The thing that seems more important to me here is that the tidbitmap is
very aggressive in lossifying the pages: it tries to keep the number of
entries under maxentries / 2, see tbm_lossify():
...
if (tbm->nentries <= tbm->maxentries / 2)
{
/*
* We have made enough room.
...
I think we could try a higher fill factor, say, 0.9. tbm_lossify basically
just continues iterating over the hashtable with not much overhead per
call, so calling it more frequently should not be a problem. On the other
hand, it would have to process fewer pages, and the bitmap would be less
lossy.
I didn't benchmark the index scan per se with the 0.9 fill factor, but
the reduction of lossy pages was significant.
Regards,
Alexander Kuzmenkov
[1]: /messages/by-id/251401bb-6f53-b957-4128-578ac22e8acf@postgrespro.ru
On Tue, Aug 22, 2017 at 8:40 PM, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:
Hi Dilip,
Recently I was thinking about this too, when working on the index-only
count(*) patch for indexes supporting amgetbitmap [1]. That patch teaches
bitmap heap scan node to skip heap fetches under certain conditions. Exact
tidbitmap pages are a prerequisite for this, so the lossiness of the bitmap
heavily influences the cost of a scan.
I used a very simple estimation: lossy_pages = max(0, total_pages -
maxentries / 2). The rationale is that after the initial lossification, the
number of lossy pages grows slower. It is good enough to reflect this
initial sharp increase in the lossy page number.
Makes sense to me.
The thing that seems more important to me here is that the tidbitmap is very
aggressive in lossifying the pages: it tries to keep the number of entries
under maxentries / 2, see tbm_lossify():
...
if (tbm->nentries <= tbm->maxentries / 2)
{
/*
* We have made enough room.
...
I think we could try a higher fill factor, say, 0.9. tbm_lossify basically
just continues iterating over the hashtable with not much overhead per
call, so calling it more frequently should not be a problem. On the other
hand, it would have to process fewer pages, and the bitmap would be less
lossy.
I didn't benchmark the index scan per se with the 0.9 fill factor, but the
reduction of lossy pages was significant.
I will try this and produce some performance number.
Thanks for the input.
[1]
/messages/by-id/251401bb-6f53-b957-4128-578ac22e8acf@postgrespro.ru
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Aug 23, 2017 at 9:45 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
...
if (tbm->nentries <= tbm->maxentries / 2)
{
/*
* We have made enough room.
...
I think we could try a higher fill factor, say, 0.9. tbm_lossify basically
just continues iterating over the hashtable with not much overhead per
call, so calling it more frequently should not be a problem. On the other
hand, it would have to process fewer pages, and the bitmap would be less
lossy.
I didn't benchmark the index scan per se with the 0.9 fill factor, but the
reduction of lossy pages was significant.
I will try this and produce some performance number.
I have done some performance testing as suggested by Alexander (patch attached).
Performance results: I can see a significant reduction in the
lossy_pages count in all the queries and also a noticeable reduction in
execution time in some of them. I have tested with 2 different work_mem
settings. Below are the test results (lossy pages count and execution
time).
TPCH benchmark: 20 scale factor
Machine: Power 4 socket
Tested with max_parallel_workers_per_gather=0
Work_mem: 20 MB
(Lossy Pages count:)
Query head patch
4 166551 35478
5 330679 35765
6 1160339 211357
14 666897 103275
15 1160518 211544
20 1982981 405903
(Time in ms:)
Query head patch
4 14849 14093
5 76790 74486
6 25816 14327
14 16011 11093
15 51381 35326
20 211115 195501
Work_mem: 40 MB
(Lossy Pages count)
Query head patch
6 995223 195681
14 337894 75744
15 995417 195873
20 1654016 199113
(Time in ms)
Query head patch
6 23819 14571
14 13514 11183
15 49980 32400
20 204441 188978
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
lossify_slow.patch (application/octet-stream)
diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c
index c4e53ad..a20505b 100644
--- a/src/backend/nodes/tidbitmap.c
+++ b/src/backend/nodes/tidbitmap.c
@@ -1411,7 +1411,7 @@ tbm_lossify(TIDBitmap *tbm)
/* This does the dirty work ... */
tbm_mark_page_lossy(tbm, page->blockno);
- if (tbm->nentries <= tbm->maxentries / 2)
+ if (tbm->nentries <= tbm->maxentries * 0.9)
{
/*
* We have made enough room. Remember where to start lossifying
On Tue, Aug 29, 2017 at 1:08 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
(Time in ms)
Query head patch
6 23819 14571
14 13514 11183
15 49980 32400
20 204441 188978
These are cool results, but this patch is obviously not ready for
prime time as-is, since there are various other references that will
need to be updated:
* Since we are called as soon as nentries exceeds maxentries, we should
* push nentries down to significantly less than maxentries, or else we'll
* just end up doing this again very soon. We shoot for maxentries/2.
/*
* With a big bitmap and small work_mem, it's possible that we cannot get
* under maxentries. Again, if that happens, we'd end up uselessly
* calling tbm_lossify over and over. To prevent this from becoming a
* performance sink, force maxentries up to at least double the current
* number of entries. (In essence, we're admitting inability to fit
* within work_mem when we do this.) Note that this test will not fire if
* we broke out of the loop early; and if we didn't, the current number of
* entries is simply not reducible any further.
*/
if (tbm->nentries > tbm->maxentries / 2)
tbm->maxentries = Min(tbm->nentries, (INT_MAX - 1) / 2) * 2;
I suggest defining a TBM_FILLFACTOR constant instead of repeating the
value 0.9 in a bunch of places. I think it would also be good to try
to find the sweet spot for that constant. Making it bigger reduces
the number of lossy entries created, but making it smaller reduces
the number of times we have to walk the bitmap. So if, for example,
0.75 is sufficient to produce almost all of the gain, then I think we
would want to prefer 0.75 to 0.9. But if 0.9 is better, then we can
stick with that.
Note that a value higher than 0.9375 wouldn't be sane without some
additional safety precautions because maxentries could be as low as
16.
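To spell out why that second check matters, here is a toy standalone
simulation of the interaction (not PostgreSQL code, just the two
thresholds described above: lossify down to 0.9 * maxentries while the
escape hatch still tests maxentries / 2):

#include <stdio.h>

#define TBM_FILLFACTOR 0.9

int main(void)
{
    long maxentries = 100000;
    long nentries;
    int  round;

    for (round = 1; round <= 3; round++)
    {
        /* pretend tbm_lossify() stopped just under the fill factor */
        nentries = (long) (maxentries * TBM_FILLFACTOR);

        /* the unchanged check at the end of tbm_lossify(): "couldn't get
         * under maxentries / 2, so give up and double maxentries" */
        if (nentries > maxentries / 2)
            maxentries = nentries * 2;

        printf("round %d: nentries=%ld maxentries=%ld\n",
               round, nentries, maxentries);
    }

    /* maxentries grows every round, i.e. the bitmap quietly exceeds the
     * intended work_mem budget unless this check uses the same factor. */
    return 0;
}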
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Aug 30, 2017 at 2:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
I suggest defining a TBM_FILLFACTOR constant instead of repeating the
value 0.9 in a bunch of places. I think it would also be good to try
to find the sweet spot for that constant. Making it bigger reduces
the number of lossy entries created, but making it smaller reduces
the number of times we have to walk the bitmap. So if, for example,
0.75 is sufficient to produce almost all of the gain, then I think we
would want to prefer 0.75 to 0.9. But if 0.9 is better, then we can
stick with that.
Note that a value higher than 0.9375 wouldn't be sane without some
additional safety precautions because maxentries could be as low as
16.
Thanks for the feedback. I will work on it.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Aug 31, 2017 at 2:26 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
Thanks for the feedback. I will work on it.
Another thought is that you probably want/need to test across a range
of work_mem values.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Aug 31, 2017 at 11:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I have repeated one of the tests after fixing the problems you pointed
out, but this time the results are not that impressive. It seems the
check below was the problem in the previous patch:
if (tbm->nentries > tbm->maxentries / 2)
tbm->maxentries = Min(tbm->nentries, (INT_MAX - 1) / 2) * 2;
Because we were lossifying only until tbm->nentries drops to 90% of
tbm->maxentries, this check was then always true, so tbm->maxentries
kept getting doubled. That was the main reason for the huge reduction
in lossy pages: basically, we started using more work_mem in all the
cases.
I have taken one reading just to see the impact after fixing the
problem with the patch.
Work_mem: 40 MB
(Lossy Pages count)
Query head patch
6 995223 733087
14 337894 206824
15 995417 798817
20 1654016 1588498
Still, we see a good reduction in the lossy pages count. I will perform
the test at different work_mem settings and for different values of
TBM_FILLFACTOR and share the numbers soon.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Sep 4, 2017 at 11:18 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Aug 31, 2017 at 11:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I have repeated one of the tests after fixing the problems you pointed
out, but this time the results are not that impressive. It seems the
check below was the problem in the previous patch:
if (tbm->nentries > tbm->maxentries / 2)
tbm->maxentries = Min(tbm->nentries, (INT_MAX - 1) / 2) * 2;
Because we were lossifying only until tbm->nentries drops to 90% of
tbm->maxentries, this check was then always true, so tbm->maxentries
kept getting doubled. That was the main reason for the huge reduction
in lossy pages: basically, we started using more work_mem in all the
cases.
I have taken one reading just to see the impact after fixing the
problem with the patch.
Work_mem: 40 MB
(Lossy Pages count)
Query head patch
6 995223 733087
14 337894 206824
15 995417 798817
20 1654016 1588498
Still, we see a good reduction in the lossy pages count. I will perform
the test at different work_mem settings and for different values of
TBM_FILLFACTOR and share the numbers soon.
I haven't yet completely measured the performance of the executor
lossification change. Meanwhile, I have worked on some of the comments
on the optimizer change and measured the performance again. I still
see a good improvement in performance (almost 2x for some of the
queries), and with the new method of lossy page calculation I don't
see the regression in Q14 (now Q14 does not change its plan).
I used lossy_pages = max(0, total_pages - maxentries / 2), as
suggested by Alexander.
Performance Results:
Machine: Intel 56-core machine (2 NUMA nodes)
work_mem: varies.
TPCH S.F: 20
Median of 3 runs.
work_mem = 4MB
Query Patch(ms) Head(ms) Change in plan
4 4686.186 5039.295 PBHS -> PSS
5 26772.192 27500.800 BHS -> SS
6 6615.916 7760.005 PBHS -> PSS
8 6370.611 12407.731 PBHS -> PSS
15 17493.564 24242.256 BHS -> SS
work_mem = 20MB
Query Patch(ms) Head(ms) Change in plan
6 6656.467 7469.961 PBHS -> PSS
8 6116.526 12300.784 PBHS -> PSS
15 17873.726 22913.421 BHS -> PSS
work_mem = 64MB
Query Patch(ms) Head(ms) Change in plan
15 14900.881 27460.093 BHS -> PBHS
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
improve_bitmap_cost_v2.patch (application/octet-stream)
diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c
index c4e53ad..6689c63 100644
--- a/src/backend/nodes/tidbitmap.c
+++ b/src/backend/nodes/tidbitmap.c
@@ -295,10 +295,7 @@ tbm_create(long maxbytes, dsa_area *dsa)
* for our purpose. Also count an extra Pointer per entry for the arrays
* created during iteration readout.
*/
- nbuckets = maxbytes /
- (sizeof(PagetableEntry) + sizeof(Pointer) + sizeof(Pointer));
- nbuckets = Min(nbuckets, INT_MAX - 1); /* safety limit */
- nbuckets = Max(nbuckets, 16); /* sanity limit */
+ nbuckets = tbm_calculate_entires(maxbytes);
tbm->maxentries = (int) nbuckets;
tbm->lossify_start = 0;
tbm->dsa = dsa;
@@ -1562,3 +1559,26 @@ pagetable_free(pagetable_hash *pagetable, void *pointer)
tbm->dsapagetableold = InvalidDsaPointer;
}
}
+
+/*
+ * tbm_calculate_entires
+ *
+ * Calculate the number of estimated entries in the bitmap for given memory
+ * in bytes.
+ */
+long
+tbm_calculate_entires(double maxbytes)
+{
+ long nbuckets;
+
+ /*
+ * Estimate number of hashtable entries we can have within maxbytes. This
+ * estimates the hash cost as sizeof(PagetableEntry).
+ */
+ nbuckets = maxbytes /
+ (sizeof(PagetableEntry) + sizeof(Pointer) + sizeof(Pointer));
+ nbuckets = Min(nbuckets, INT_MAX - 1); /* safety limit */
+ nbuckets = Max(nbuckets, 16); /* sanity limit */
+
+ return nbuckets;
+}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 051a854..5b15f54 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -5132,6 +5132,10 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
double T;
double pages_fetched;
double tuples_fetched;
+ double heap_pages = 0;
+ double exact_pages = 0;
+ double lossy_pages = 0;
+ long maxentries;
/*
* Fetch total cost of obtaining the bitmap, as well as its total
@@ -5176,8 +5180,42 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
else
pages_fetched = ceil(pages_fetched);
+ /*
+ * Calculate the number of pages fetched from the heap. Then based on
+ * current work_mem estimate get the estimated maxentries in the bitmap.
+ */
+ heap_pages = Min(pages_fetched, baserel->pages);
+ maxentries = tbm_calculate_entires(work_mem * 1024L);
+
+ /*
+ * After the initial lossification that is maxentries / 2, the number of
+ * lossy pages grows slower. It is good enough to reflect this initial
+ * sharp increase in the lossy page number.
+ */
+ lossy_pages = Max(0, heap_pages - maxentries / 2);
+ exact_pages = heap_pages - lossy_pages;
+
+ /*
+ * If there are lossy pages then recompute the number of tuples processed
+ * by the bitmap heap node. For the exact_pages it's baserel->tuples *
+ * indexSelectivity and for lossy_pages we have to process all the tuples.
+ */
+ if (exact_pages > 0 && lossy_pages)
+ tuples_fetched = clamp_row_est (indexSelectivity *
+ (exact_pages / heap_pages) * baserel->tuples +
+ (lossy_pages / heap_pages) * baserel->tuples);
+
if (cost)
+ {
*cost = indexTotalCost;
+
+ /*
+ * If exact_pages is negative then even after lossifying we will not be
+ * able to fit the bitmap into work_mem. Hence, disable this path.
+ */
+ if (exact_pages < 0)
+ *cost += disable_cost;
+ }
if (tuple)
*tuple = tuples_fetched;
diff --git a/src/include/nodes/tidbitmap.h b/src/include/nodes/tidbitmap.h
index f9a1902..1a8d9fc 100644
--- a/src/include/nodes/tidbitmap.h
+++ b/src/include/nodes/tidbitmap.h
@@ -70,5 +70,6 @@ extern void tbm_end_iterate(TBMIterator *iterator);
extern void tbm_end_shared_iterate(TBMSharedIterator *iterator);
extern TBMSharedIterator *tbm_attach_shared_iterate(dsa_area *dsa,
dsa_pointer dp);
+extern long tbm_calculate_entires(double maxbytes);
#endif /* TIDBITMAP_H */
On Sun, Sep 17, 2017 at 4:34 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have repeated one of the tests after fixing the problems you pointed
out, but this time the results are not that impressive. It seems the
check below was the problem in the previous patch:
if (tbm->nentries > tbm->maxentries / 2)
tbm->maxentries = Min(tbm->nentries, (INT_MAX - 1) / 2) * 2;
Because we were lossifying only until tbm->nentries drops to 90% of
tbm->maxentries, this check was then always true, so tbm->maxentries
kept getting doubled. That was the main reason for the huge reduction
in lossy pages: basically, we started using more work_mem in all the
cases.
I have taken one reading just to see the impact after fixing the
problem with the patch.
Work_mem: 40 MB
(Lossy Pages count)
Query head patch
6 995223 733087
14 337894 206824
15 995417 798817
20 1654016 1588498
Still, we see a good reduction in the lossy pages count. I will perform
the test at different work_mem settings and for different values of
TBM_FILLFACTOR and share the numbers soon.
I haven't yet completely measured the performance of the executor
lossification change. Meanwhile, I have worked on some of the comments
on the optimizer change and measured the performance again. I still
see a good improvement in performance (almost 2x for some of the
queries), and with the new method of lossy page calculation I don't
see the regression in Q14 (now Q14 does not change its plan).
I used lossy_pages = max(0, total_pages - maxentries / 2), as
suggested by Alexander.
Performance Results:
Machine: Intel 56-core machine (2 NUMA nodes)
work_mem: varies.
TPCH S.F: 20
Median of 3 runs.
work_mem = 4MB
Query Patch(ms) Head(ms) Change in plan
4 4686.186 5039.295 PBHS -> PSS
5 26772.192 27500.800 BHS -> SS
6 6615.916 7760.005 PBHS -> PSS
8 6370.611 12407.731 PBHS -> PSS
15 17493.564 24242.256 BHS -> SS
work_mem = 20MB
Query Patch(ms) Head(ms) Change in plan
6 6656.467 7469.961 PBHS -> PSS
8 6116.526 12300.784 PBHS -> PSS
15 17873.726 22913.421 BHS -> PSS
work_mem = 64MB
Query Patch(ms) Head(ms) Change in plan
15 14900.881 27460.093 BHS -> PBHS
There was a problem with the previous patch: even if the bitmap was
big enough to hold all the heap pages, I was still calculating lossy
pages. I have fixed that in the attached patch. I have also verified
the performance; it is the same as reported in the previous email.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
improve_bitmap_cost_v3.patch (application/octet-stream)
diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c
index c4e53ad..6689c63 100644
--- a/src/backend/nodes/tidbitmap.c
+++ b/src/backend/nodes/tidbitmap.c
@@ -295,10 +295,7 @@ tbm_create(long maxbytes, dsa_area *dsa)
* for our purpose. Also count an extra Pointer per entry for the arrays
* created during iteration readout.
*/
- nbuckets = maxbytes /
- (sizeof(PagetableEntry) + sizeof(Pointer) + sizeof(Pointer));
- nbuckets = Min(nbuckets, INT_MAX - 1); /* safety limit */
- nbuckets = Max(nbuckets, 16); /* sanity limit */
+ nbuckets = tbm_calculate_entires(maxbytes);
tbm->maxentries = (int) nbuckets;
tbm->lossify_start = 0;
tbm->dsa = dsa;
@@ -1562,3 +1559,26 @@ pagetable_free(pagetable_hash *pagetable, void *pointer)
tbm->dsapagetableold = InvalidDsaPointer;
}
}
+
+/*
+ * tbm_calculate_entires
+ *
+ * Calculate the number of estimated entries in the bitmap for given memory
+ * in bytes.
+ */
+long
+tbm_calculate_entires(double maxbytes)
+{
+ long nbuckets;
+
+ /*
+ * Estimate number of hashtable entries we can have within maxbytes. This
+ * estimates the hash cost as sizeof(PagetableEntry).
+ */
+ nbuckets = maxbytes /
+ (sizeof(PagetableEntry) + sizeof(Pointer) + sizeof(Pointer));
+ nbuckets = Min(nbuckets, INT_MAX - 1); /* safety limit */
+ nbuckets = Max(nbuckets, 16); /* sanity limit */
+
+ return nbuckets;
+}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 051a854..b178caf 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -5132,6 +5132,10 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
double T;
double pages_fetched;
double tuples_fetched;
+ double heap_pages = 0;
+ double exact_pages = 0;
+ double lossy_pages = 0;
+ long maxentries;
/*
* Fetch total cost of obtaining the bitmap, as well as its total
@@ -5176,6 +5180,33 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
else
pages_fetched = ceil(pages_fetched);
+ /*
+ * Calculate the number of pages fetched from the heap. Then based on
+ * current work_mem estimate get the estimated maxentries in the bitmap.
+ */
+ heap_pages = Min(pages_fetched, baserel->pages);
+ maxentries = tbm_calculate_entires(work_mem * 1024L);
+
+ /*
+ * After the initial lossification that is maxentries / 2, the number of
+ * lossy pages grows slower. It is good enough to reflect this initial
+ * sharp increase in the lossy page number.
+ */
+ if (maxentries < heap_pages)
+ lossy_pages = Max(0, heap_pages - maxentries / 2);
+
+ exact_pages = heap_pages - lossy_pages;
+
+ /*
+ * If there are lossy pages then recompute the number of tuples processed
+ * by the bitmap heap node. For the exact_pages it's baserel->tuples *
+ * indexSelectivity and for lossy_pages we have to process all the tuples.
+ */
+ if (lossy_pages > 0)
+ tuples_fetched = clamp_row_est (indexSelectivity *
+ (exact_pages / heap_pages) * baserel->tuples +
+ (lossy_pages / heap_pages) * baserel->tuples);
+
if (cost)
*cost = indexTotalCost;
if (tuple)
diff --git a/src/include/nodes/tidbitmap.h b/src/include/nodes/tidbitmap.h
index f9a1902..1a8d9fc 100644
--- a/src/include/nodes/tidbitmap.h
+++ b/src/include/nodes/tidbitmap.h
@@ -70,5 +70,6 @@ extern void tbm_end_iterate(TBMIterator *iterator);
extern void tbm_end_shared_iterate(TBMSharedIterator *iterator);
extern TBMSharedIterator *tbm_attach_shared_iterate(dsa_area *dsa,
dsa_pointer dp);
+extern long tbm_calculate_entires(double maxbytes);
#endif /* TIDBITMAP_H */
On Sun, Sep 17, 2017 at 7:04 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
I used lossy_pages = max(0, total_pages - maxentries / 2), as
suggested by Alexander.
Does that formula accurately estimate the number of lossy pages?
The performance results look good, but that's a slightly different
thing from whether the estimate is accurate.
+ nbuckets = tbm_calculate_entires(maxbytes);
entires?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Oct 5, 2017 at 8:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Sep 17, 2017 at 7:04 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
I used lossy_pages = max(0, total_pages - maxentries / 2), as
suggested by Alexander.
Does that formula accurately estimate the number of lossy pages?
I have printed total_pages, exact_pages and lossy_pages during
planning time, and for testing purposes I tweaked the code a bit so
that it doesn't consider lossy_pages in the cost calculation (same as
the base code).
I have tested TPCH scale factor 20 at different work_mem settings
(4MB, 20MB, 64MB) and noted down the estimated vs. actual pages.
Analysis: The estimated value of lossy_pages is way higher than its
actual value, and the reason is that the total_pages calculated by the
"Mackert and Lohman formula" is not correct.
work_mem=4 MB
query:4
estimated: total_pages=552472.000000 exact_pages=32768.000000
lossy_pages=519704.000000
actual: exact=18548 lossy=146141
query:6
estimated: total_pages=1541449.000000 exact_pages=32768.000000
lossy_pages=1508681.000000
actual: exact=13417 lossy=430385
query:8
estimated: total_pages=552472.000000 exact_pages=32768.000000
lossy_pages=519704.000000
actual: exact=56869 lossy=495603
query:14
estimated: total_pages=1149603.000000 exact_pages=32768.000000
lossy_pages=1116835.000000
actual: exact=17115 lossy=280949
work_mem: 20 MB
query:4
estimated: total_pages=552472.000000 exact_pages=163840.000000
lossy_pages=388632.000000
actual: exact=109856 lossy=57761
query:6
estimated: total_pages=1541449.000000 exact_pages=163840.000000
lossy_pages=1377609.000000
actual: exact=59771 lossy=397956
query:8
estimated: total_pages=552472.000000 exact_pages=163840.000000
lossy_pages=388632.000000
actual: Heap Blocks: exact=221777 lossy=330695
query:14
estimated: total_pages=1149603.000000 exact_pages=163840.000000
lossy_pages=985763.000000
actual: exact=63381 lossy=235513
work_mem:64 MB
query:4
estimated: total_pages=552472.000000 exact_pages=552472.000000
lossy_pages=0.000000
actual: exact=166005 lossy=0
query:6
estimated: total_pages=1541449.000000 exact_pages=524288.000000
lossy_pages=1017161.000000
actual: exact=277717 lossy=185919
query:8
estimated: total_pages=552472.000000 exact_pages=552472.000000
lossy_pages=0.000000
actual: exact=552472 lossy=0
query:14
estimated: total_pages=1149603.000000 exact_pages=524288.000000
lossy_pages=625315.000000
actual: exact=309091 lossy=0
The performance results look good, but that's a slightly different
thing from whether the estimate is accurate.
+ nbuckets = tbm_calculate_entires(maxbytes);
entires?
changed to
+ tbm->maxentries = (int) tbm_calculate_entires(maxbytes);
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
improve_bitmap_cost_v4.patch (application/octet-stream)
diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c
index 01d6bc5..170ed7f 100644
--- a/src/backend/nodes/tidbitmap.c
+++ b/src/backend/nodes/tidbitmap.c
@@ -265,7 +265,6 @@ TIDBitmap *
tbm_create(long maxbytes, dsa_area *dsa)
{
TIDBitmap *tbm;
- long nbuckets;
/* Create the TIDBitmap struct and zero all its fields */
tbm = makeNode(TIDBitmap);
@@ -279,11 +278,7 @@ tbm_create(long maxbytes, dsa_area *dsa)
* for our purpose. Also count an extra Pointer per entry for the arrays
* created during iteration readout.
*/
- nbuckets = maxbytes /
- (sizeof(PagetableEntry) + sizeof(Pointer) + sizeof(Pointer));
- nbuckets = Min(nbuckets, INT_MAX - 1); /* safety limit */
- nbuckets = Max(nbuckets, 16); /* sanity limit */
- tbm->maxentries = (int) nbuckets;
+ tbm->maxentries = (int) tbm_calculate_entires(maxbytes);
tbm->lossify_start = 0;
tbm->dsa = dsa;
tbm->dsapagetable = InvalidDsaPointer;
@@ -1546,3 +1541,26 @@ pagetable_free(pagetable_hash *pagetable, void *pointer)
tbm->dsapagetableold = InvalidDsaPointer;
}
}
+
+/*
+ * tbm_calculate_entires
+ *
+ * Calculate the number of estimated entries in the bitmap for given memory
+ * in bytes.
+ */
+long
+tbm_calculate_entires(double maxbytes)
+{
+ long nbuckets;
+
+ /*
+ * Estimate number of hashtable entries we can have within maxbytes. This
+ * estimates the hash cost as sizeof(PagetableEntry).
+ */
+ nbuckets = maxbytes /
+ (sizeof(PagetableEntry) + sizeof(Pointer) + sizeof(Pointer));
+ nbuckets = Min(nbuckets, INT_MAX - 1); /* safety limit */
+ nbuckets = Max(nbuckets, 16); /* sanity limit */
+
+ return nbuckets;
+}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index f76da49..485768f 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -5126,6 +5126,10 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
double T;
double pages_fetched;
double tuples_fetched;
+ double heap_pages = 0;
+ double exact_pages = 0;
+ double lossy_pages = 0;
+ long maxentries;
/*
* Fetch total cost of obtaining the bitmap, as well as its total
@@ -5170,6 +5174,33 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
else
pages_fetched = ceil(pages_fetched);
+ /*
+ * Calculate the number of pages fetched from the heap. Then based on
+ * current work_mem estimate get the estimated maxentries in the bitmap.
+ */
+ heap_pages = Min(pages_fetched, baserel->pages);
+ maxentries = tbm_calculate_entires(work_mem * 1024L);
+
+ /*
+ * After the initial lossification that is maxentries / 2, the number of
+ * lossy pages grows slower. It is good enough to reflect this initial
+ * sharp increase in the lossy page number.
+ */
+ if (maxentries < heap_pages)
+ lossy_pages = Max(0, heap_pages - maxentries / 2);
+
+ exact_pages = heap_pages - lossy_pages;
+
+ /*
+ * If there are lossy pages then recompute the number of tuples processed
+ * by the bitmap heap node. For the exact_pages it's baserel->tuples *
+ * indexSelectivity and for lossy_pages we have to process all the tuples.
+ */
+ if (lossy_pages > 0)
+ tuples_fetched = clamp_row_est (indexSelectivity *
+ (exact_pages / heap_pages) * baserel->tuples +
+ (lossy_pages / heap_pages) * baserel->tuples);
+
if (cost)
*cost = indexTotalCost;
if (tuple)
diff --git a/src/include/nodes/tidbitmap.h b/src/include/nodes/tidbitmap.h
index f9a1902..1a8d9fc 100644
--- a/src/include/nodes/tidbitmap.h
+++ b/src/include/nodes/tidbitmap.h
@@ -70,5 +70,6 @@ extern void tbm_end_iterate(TBMIterator *iterator);
extern void tbm_end_shared_iterate(TBMSharedIterator *iterator);
extern TBMSharedIterator *tbm_attach_shared_iterate(dsa_area *dsa,
dsa_pointer dp);
+extern long tbm_calculate_entires(double maxbytes);
#endif /* TIDBITMAP_H */
Analysis: The estimated value of lossy_pages is way higher than
its actual value, and the reason is that the total_pages calculated by the
"Mackert and Lohman formula" is not correct.
I think the problem might be that the total_pages includes cache effects
and rescans. For bitmap entries we should use something like relation
pages * selectivity.
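For concreteness, here is a minimal sketch of what that alternative
estimate could look like (my own illustration; the function name and
shape are not from any posted patch):

#include <math.h>

/*
 * Rough sketch (not from any posted patch): estimate the distinct heap
 * pages touched by a bitmap heap scan as relation pages * index
 * selectivity, i.e. assuming the matching TIDs are clustered together.
 */
static double
bitmap_heap_pages_alt(double rel_pages, double index_selectivity)
{
	double	pages = ceil(rel_pages * index_selectivity);

	/* clamp to the relation size and to at least one page */
	if (pages > rel_pages)
		pages = rel_pages;
	return (pages >= 1.0) ? pages : 1.0;
}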
Meanwhile, I ran TPC-H briefly with the v3 patch: scale factor 25, 2
workers, SSD storage.
It shows significant improvement on 4MB work_mem and no change on 2GB.
Here are the results (execution time in seconds, average of 5 runs):
work_mem        4MB              2GB
Query      master  patch    master  patch
4             179    174       168    167
5             190    168       155    156
6             280     87       227    229
8             197    114       179    172
10            269    227       190    192
14            110    108       106    105
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Fri, Oct 6, 2017 at 2:12 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
The performance results look good, but that's a slightly different
thing from whether the estimate is accurate.
+ nbuckets = tbm_calculate_entires(maxbytes);
entires?
changed to
+ tbm->maxentries = (int) tbm_calculate_entires(maxbytes);
My point was not that you should add a cast, but that you wrote
"entires" rather than "entries".
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Oct 6, 2017 at 6:36 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Oct 6, 2017 at 2:12 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
The performance results look good, but that's a slightly different
thing from whether the estimate is accurate.
+ nbuckets = tbm_calculate_entires(maxbytes);
entires?
changed to
+ tbm->maxentries = (int) tbm_calculate_entires(maxbytes);
My point was not that you should add a cast, but that you wrote
"entires" rather than "entries".
My bad, I thought you were suggesting renaming the variable to
"entries" instead of "nbuckets", so I removed the variable "nbuckets".
I will fix the typo in the function name and post the patch.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 6, 2017 at 6:08 PM, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:
Analysis: The estimated value of lossy_pages is way higher than
its actual value, and the reason is that the total_pages calculated by the
"Mackert and Lohman formula" is not correct.
I think the problem might be that the total_pages includes cache effects and
rescans. For bitmap entries we should use something like relation pages *
selectivity.
I have noticed that for the TPCH case if I use "pages * selectivity"
it gives me better results, but IMHO directly multiplying the pages
by selectivity may not be the correct way to calculate the number of
heap pages: it can only give the correct result when all the TIDs
being fetched are clustered. On the other hand, the "Mackert and
Lohman formula" assumes that all the TIDs are evenly distributed
across the heap pages, which can also give a wrong estimate, as we
are seeing in our TPCH case.
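To put rough numbers on that difference, here is a tiny standalone
comparison of the two assumptions (my own illustration; T, N and
tuples_per_page are invented values, not planner estimates):

#include <stdio.h>
#include <math.h>

/*
 * Illustration only, with invented numbers: compare the heap-page count
 * under the two extreme assumptions discussed above.
 *   clustered TIDs : tuples_fetched / tuples_per_page
 *   evenly spread  : Mackert-Lohman single-scan form 2*T*N / (2*T + N)
 */
int
main(void)
{
	double	T = 1000000.0;			/* relation pages (made up) */
	double	N = 500000.0;			/* index-matching tuples (made up) */
	double	tuples_per_page = 100.0;	/* made up */
	double	clustered = ceil(N / tuples_per_page);
	double	uniform = (2.0 * T * N) / (2.0 * T + N);

	if (uniform > T)
		uniform = T;

	printf("clustered: %.0f pages, evenly spread: %.0f pages\n",
		   clustered, ceil(uniform));	/* 5000 vs 400000 */
	return 0;
}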
Meanwhile, I ran TPC-H briefly with the v3 patch: scale factor 25, 2
workers, SSD storage.
It shows significant improvement on 4MB work_mem and no change on 2GB.
Here are the results (execution time in seconds, average of 5 runs):
work_mem        4MB              2GB
Query      master  patch    master  patch
4             179    174       168    167
5             190    168       155    156
6             280     87       227    229
8             197    114       179    172
10            269    227       190    192
14            110    108       106    105
Thanks for the testing, the numbers look good to me.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 6, 2017 at 7:04 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Oct 6, 2017 at 6:36 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Oct 6, 2017 at 2:12 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
The performance results look good, but that's a slightly different
thing from whether the estimate is accurate.
+ nbuckets = tbm_calculate_entires(maxbytes);
entires?
changed to
+ tbm->maxentries = (int) tbm_calculate_entires(maxbytes);
My point was not that you should add a cast, but that you wrote
"entires" rather than "entries".
My bad, I thought you were suggesting renaming the variable to
"entries" instead of "nbuckets", so I removed the variable "nbuckets".
I will fix the typo in the function name and post the patch.
Fixed in the attached version.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
improve_bitmap_cost_v5.patch (application/octet-stream)
diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c
index 01d6bc5..5554c51 100644
--- a/src/backend/nodes/tidbitmap.c
+++ b/src/backend/nodes/tidbitmap.c
@@ -265,7 +265,6 @@ TIDBitmap *
tbm_create(long maxbytes, dsa_area *dsa)
{
TIDBitmap *tbm;
- long nbuckets;
/* Create the TIDBitmap struct and zero all its fields */
tbm = makeNode(TIDBitmap);
@@ -279,11 +278,7 @@ tbm_create(long maxbytes, dsa_area *dsa)
* for our purpose. Also count an extra Pointer per entry for the arrays
* created during iteration readout.
*/
- nbuckets = maxbytes /
- (sizeof(PagetableEntry) + sizeof(Pointer) + sizeof(Pointer));
- nbuckets = Min(nbuckets, INT_MAX - 1); /* safety limit */
- nbuckets = Max(nbuckets, 16); /* sanity limit */
- tbm->maxentries = (int) nbuckets;
+ tbm->maxentries = (int) tbm_calculate_entries(maxbytes);
tbm->lossify_start = 0;
tbm->dsa = dsa;
tbm->dsapagetable = InvalidDsaPointer;
@@ -1546,3 +1541,26 @@ pagetable_free(pagetable_hash *pagetable, void *pointer)
tbm->dsapagetableold = InvalidDsaPointer;
}
}
+
+/*
+ * tbm_calculate_entries
+ *
+ * Calculate the number of estimated entries in the bitmap for given memory
+ * in bytes.
+ */
+long
+tbm_calculate_entries(double maxbytes)
+{
+ long nbuckets;
+
+ /*
+ * Estimate number of hashtable entries we can have within maxbytes. This
+ * estimates the hash cost as sizeof(PagetableEntry).
+ */
+ nbuckets = maxbytes /
+ (sizeof(PagetableEntry) + sizeof(Pointer) + sizeof(Pointer));
+ nbuckets = Min(nbuckets, INT_MAX - 1); /* safety limit */
+ nbuckets = Max(nbuckets, 16); /* sanity limit */
+
+ return nbuckets;
+}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index f76da49..ca88559 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -5126,6 +5126,10 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
double T;
double pages_fetched;
double tuples_fetched;
+ double heap_pages = 0;
+ double exact_pages = 0;
+ double lossy_pages = 0;
+ long maxentries;
/*
* Fetch total cost of obtaining the bitmap, as well as its total
@@ -5170,6 +5174,33 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
else
pages_fetched = ceil(pages_fetched);
+ /*
+ * Calculate the number of pages fetched from the heap. Then based on
+ * current work_mem estimate get the estimated maxentries in the bitmap.
+ */
+ heap_pages = Min(pages_fetched, baserel->pages);
+ maxentries = tbm_calculate_entries(work_mem * 1024L);
+
+ /*
+ * After the initial lossification that is maxentries / 2, the number of
+ * lossy pages grows slower. It is good enough to reflect this initial
+ * sharp increase in the lossy page number.
+ */
+ if (maxentries < heap_pages)
+ lossy_pages = Max(0, heap_pages - maxentries / 2);
+
+ exact_pages = heap_pages - lossy_pages;
+
+ /*
+ * If there are lossy pages then recompute the number of tuples processed
+ * by the bitmap heap node. For the exact_pages it's baserel->tuples *
+ * indexSelectivity and for lossy_pages we have to process all the tuples.
+ */
+ if (lossy_pages > 0)
+ tuples_fetched = clamp_row_est (indexSelectivity *
+ (exact_pages / heap_pages) * baserel->tuples +
+ (lossy_pages / heap_pages) * baserel->tuples);
+
if (cost)
*cost = indexTotalCost;
if (tuple)
diff --git a/src/include/nodes/tidbitmap.h b/src/include/nodes/tidbitmap.h
index f9a1902..d3ad0a5 100644
--- a/src/include/nodes/tidbitmap.h
+++ b/src/include/nodes/tidbitmap.h
@@ -70,5 +70,6 @@ extern void tbm_end_iterate(TBMIterator *iterator);
extern void tbm_end_shared_iterate(TBMSharedIterator *iterator);
extern TBMSharedIterator *tbm_attach_shared_iterate(dsa_area *dsa,
dsa_pointer dp);
+extern long tbm_calculate_entries(double maxbytes);
#endif /* TIDBITMAP_H */
On Fri, Oct 6, 2017 at 7:24 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Oct 6, 2017 at 6:08 PM, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:
Analysis: The estimated value of lossy_pages is way higher than
its actual value, and the reason is that the total_pages calculated by the
"Mackert and Lohman formula" is not correct.
I think the problem might be that the total_pages includes cache effects and
rescans. For bitmap entries we should use something like relation pages *
selectivity.
I have noticed that for the TPCH case if I use "pages * selectivity"
it gives me better results, but IMHO directly multiplying the pages
by selectivity may not be the correct way to calculate the number of
heap pages: it can only give the correct result when all the TIDs
being fetched are clustered. On the other hand, the "Mackert and
Lohman formula" assumes that all the TIDs are evenly distributed
across the heap pages, which can also give a wrong estimate, as we
are seeing in our TPCH case.
I agree with the point that the total_pages includes the cache effects
and rescans when loop_count > 1; that can be avoided if we always
calculate heap_pages as it is calculated in the else part
(loop_count = 0). Fortunately, in all the TPCH query plans that I
posted upthread, bitmap scan was never on the inner side of a NLJ, so
loop_count was always 0. I will fix this.
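For reference, the single-scan form used for this purpose looks roughly
like the following standalone sketch (the helper name is mine; the
computation mirrors the hunk in the v6 patch below):

#include <math.h>

/*
 * Sketch of the single-scan Mackert-Lohman form (T <= b, no re-reads)
 * used to size the bitmap itself, so that the loop_count/rescan
 * adjustment made later for I/O costing does not inflate the
 * lossy-page estimate.  T is the relation's page count and
 * tuples_fetched the expected index matches.
 */
static double
bitmap_single_scan_pages(double T, double tuples_fetched)
{
	double	pages_fetched =
		(2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);

	return (pages_fetched >= T) ? T : ceil(pages_fetched);
}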
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 6, 2017 at 9:21 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Oct 6, 2017 at 7:24 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Oct 6, 2017 at 6:08 PM, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:
Analysis: The estimated value of lossy_pages is way higher than
its actual value, and the reason is that the total_pages calculated by the
"Mackert and Lohman formula" is not correct.
I think the problem might be that the total_pages includes cache effects and
rescans. For bitmap entries we should use something like relation pages *
selectivity.
I have noticed that for the TPCH case if I use "pages * selectivity"
it gives me better results, but IMHO directly multiplying the pages
by selectivity may not be the correct way to calculate the number of
heap pages: it can only give the correct result when all the TIDs
being fetched are clustered. On the other hand, the "Mackert and
Lohman formula" assumes that all the TIDs are evenly distributed
across the heap pages, which can also give a wrong estimate, as we
are seeing in our TPCH case.
I agree with the point that the total_pages includes the cache effects
and rescans when loop_count > 1; that can be avoided if we always
calculate heap_pages as it is calculated in the else part
(loop_count = 0). Fortunately, in all the TPCH query plans that I
posted upthread, bitmap scan was never on the inner side of a NLJ, so
loop_count was always 0. I will fix this.
I have fixed the issue. Now, for calculating the lossy pages, it does
not consider the rescans. As mentioned above, this does not affect the
TPCH plans, so I haven't measured the performance again.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
improve_bitmap_cost_v6.patch (application/octet-stream)
diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c
index 01d6bc5..5554c51 100644
--- a/src/backend/nodes/tidbitmap.c
+++ b/src/backend/nodes/tidbitmap.c
@@ -265,7 +265,6 @@ TIDBitmap *
tbm_create(long maxbytes, dsa_area *dsa)
{
TIDBitmap *tbm;
- long nbuckets;
/* Create the TIDBitmap struct and zero all its fields */
tbm = makeNode(TIDBitmap);
@@ -279,11 +278,7 @@ tbm_create(long maxbytes, dsa_area *dsa)
* for our purpose. Also count an extra Pointer per entry for the arrays
* created during iteration readout.
*/
- nbuckets = maxbytes /
- (sizeof(PagetableEntry) + sizeof(Pointer) + sizeof(Pointer));
- nbuckets = Min(nbuckets, INT_MAX - 1); /* safety limit */
- nbuckets = Max(nbuckets, 16); /* sanity limit */
- tbm->maxentries = (int) nbuckets;
+ tbm->maxentries = (int) tbm_calculate_entries(maxbytes);
tbm->lossify_start = 0;
tbm->dsa = dsa;
tbm->dsapagetable = InvalidDsaPointer;
@@ -1546,3 +1541,26 @@ pagetable_free(pagetable_hash *pagetable, void *pointer)
tbm->dsapagetableold = InvalidDsaPointer;
}
}
+
+/*
+ * tbm_calculate_entries
+ *
+ * Calculate the number of estimated entries in the bitmap for given memory
+ * in bytes.
+ */
+long
+tbm_calculate_entries(double maxbytes)
+{
+ long nbuckets;
+
+ /*
+ * Estimate number of hashtable entries we can have within maxbytes. This
+ * estimates the hash cost as sizeof(PagetableEntry).
+ */
+ nbuckets = maxbytes /
+ (sizeof(PagetableEntry) + sizeof(Pointer) + sizeof(Pointer));
+ nbuckets = Min(nbuckets, INT_MAX - 1); /* safety limit */
+ nbuckets = Max(nbuckets, 16); /* sanity limit */
+
+ return nbuckets;
+}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ce32b8a4..97a69c8 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -5127,6 +5127,10 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
double T;
double pages_fetched;
double tuples_fetched;
+ double heap_pages = 0;
+ double exact_pages = 0;
+ double lossy_pages = 0;
+ long maxentries;
/*
* Fetch total cost of obtaining the bitmap, as well as its total
@@ -5141,6 +5145,20 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
T = (baserel->pages > 1) ? (double) baserel->pages : 1.0;
+ /*
+ * For a single scan, the number of heap pages that need to be fetched
+ * is the same as the Mackert and Lohman formula for the case T <= b
+ * (ie, no re-reads needed).
+ */
+ pages_fetched = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
+
+ /*
+ * Calculate the number of pages fetched from the heap. Then based on
+ * current work_mem estimate get the estimated maxentries in the bitmap.
+ */
+ heap_pages = Min(pages_fetched, baserel->pages);
+ maxentries = tbm_calculate_entries(work_mem * 1024L);
+
if (loop_count > 1)
{
/*
@@ -5155,22 +5173,32 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
root);
pages_fetched /= loop_count;
}
- else
- {
- /*
- * For a single scan, the number of heap pages that need to be fetched
- * is the same as the Mackert and Lohman formula for the case T <= b
- * (ie, no re-reads needed).
- */
- pages_fetched =
- (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
- }
if (pages_fetched >= T)
pages_fetched = T;
else
pages_fetched = ceil(pages_fetched);
+ /*
+ * After the initial lossification that is maxentries / 2, the number of
+ * lossy pages grows slower. It is good enough to reflect this initial
+ * sharp increase in the lossy page number.
+ */
+ if (maxentries < heap_pages)
+ lossy_pages = Max(0, heap_pages - maxentries / 2);
+
+ exact_pages = heap_pages - lossy_pages;
+
+ /*
+ * If there are lossy pages then recompute the number of tuples processed
+ * by the bitmap heap node. For the exact_pages it's baserel->tuples *
+ * indexSelectivity and for lossy_pages we have to process all the tuples.
+ */
+ if (lossy_pages > 0)
+ tuples_fetched = clamp_row_est (indexSelectivity *
+ (exact_pages / heap_pages) * baserel->tuples +
+ (lossy_pages / heap_pages) * baserel->tuples);
+
if (cost)
*cost = indexTotalCost;
if (tuple)
diff --git a/src/include/nodes/tidbitmap.h b/src/include/nodes/tidbitmap.h
index f9a1902..d3ad0a5 100644
--- a/src/include/nodes/tidbitmap.h
+++ b/src/include/nodes/tidbitmap.h
@@ -70,5 +70,6 @@ extern void tbm_end_iterate(TBMIterator *iterator);
extern void tbm_end_shared_iterate(TBMSharedIterator *iterator);
extern TBMSharedIterator *tbm_attach_shared_iterate(dsa_area *dsa,
dsa_pointer dp);
+extern long tbm_calculate_entries(double maxbytes);
#endif /* TIDBITMAP_H */
Hi Dilip,
v6 patch:
42 + /*
43 + * Estimate number of hashtable entries we can have within maxbytes. This
44 + * estimates the hash cost as sizeof(PagetableEntry).
45 + */
46 + nbuckets = maxbytes /
47 + (sizeof(PagetableEntry) + sizeof(Pointer) + sizeof(Pointer));
It took me a little while to understand this calculation. You have moved this
code from tbm_create(), but I think you should move the following
comment as well:
tidbitmap.c:
276 /*
277  * Estimate number of hashtable entries we can have within maxbytes. This
278  * estimates the hash cost as sizeof(PagetableEntry), which is good enough
279  * for our purpose. Also count an extra Pointer per entry for the arrays
280  * created during iteration readout.
281  */
Regards,
Amul
On Thu, Nov 9, 2017 at 3:55 AM, amul sul <sulamul@gmail.com> wrote:
It took me a little while to understand this calculation. You have moved this
code from tbm_create(), but I think you should move the following
comment as well:
I made an adjustment that I hope will address your concern here, made
a few other adjustments, and committed this.
One point of concern that wasn't entirely addressed in the above
discussion is the accuracy of this formula:
+ lossy_pages = Max(0, heap_pages - maxentries / 2);
When I first looked at Dilip's test results, I thought maybe this
formula was way off. But on closer study, the formula does a decent
(not fantastic) job of estimating the number of exact pages. The fact
that the number of lossy pages is off is just because the Mackert and
Lohman formula is overestimating how many pages are fetched. Now, in
Dilip's results, this formula more often than not - but not invariably
- predicted more exact pages than we actually got. So possibly
instead of maxentries / 2 we could subtract maxentries or some other
multiple of maxentries. I don't know what's actually best here, but I
think there's a strong argument that this is an improvement as it
stands, and we can adjust it later if it becomes clear what would be
better.
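For experimenting with that, one could parameterize the subtraction;
here is a small standalone illustration (my own sketch with made-up
numbers; 'slack' is hypothetical and not something the committed code
exposes):

#include <stdio.h>

/*
 * Illustration only: split an estimated heap page count into exact and
 * lossy pages, subtracting slack * maxentries instead of the hard-coded
 * maxentries / 2, to see how sensitive the estimate is to that choice.
 * All numbers below are invented; 'slack' is not a real planner knob.
 */
static void
split_pages(double heap_pages, double maxentries, double slack,
			double *exact_pages, double *lossy_pages)
{
	*lossy_pages = 0;
	if (maxentries < heap_pages)
	{
		*lossy_pages = heap_pages - slack * maxentries;
		if (*lossy_pages < 0)
			*lossy_pages = 0;
	}
	*exact_pages = heap_pages - *lossy_pages;
}

int
main(void)
{
	double	exact,
			lossy;

	split_pages(550000, 500000, 0.5, &exact, &lossy);
	printf("slack 0.5: exact=%.0f lossy=%.0f\n", exact, lossy); /* 250000 / 300000 */
	split_pages(550000, 500000, 1.0, &exact, &lossy);
	printf("slack 1.0: exact=%.0f lossy=%.0f\n", exact, lossy); /* 500000 / 50000 */
	return 0;
}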
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Nov 11, 2017 at 3:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Nov 9, 2017 at 3:55 AM, amul sul <sulamul@gmail.com> wrote:
It took me a little while to understand this calculation. You have moved this
code from tbm_create(), but I think you should move the following
comment as well:
I made an adjustment that I hope will address your concern here, made
a few other adjustments, and committed this.
Thanks, Robert.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com