optimizing vacuum truncation scans
After a large bulk load aborted near the end, I decided to vacuum the main
table, so as not to leave a huge chunk of free space in the middle of it
before re-running the bulk load. This vacuum took a frustratingly long
time, as the backward scan over the table to truncate the space did not
trigger read-ahead (or read-behind) in the file system. (The file system
was ext4.)
In hindsight, the best thing to do would have been to open a psql
session and take an access-share lock on the main table, so that it would
prevent the truncation from occurring, leaving cleaned, reusable space at
the end of the table to be refilled upon the next (successful) run of the
bulk load. But it would be nice if the system could do some optimization
for me, as hindsight never comes in time.
I did literally the simplest thing I could think of as a proof of concept
patch, to see if it would actually fix things. I just jumped back a
certain number of blocks occasionally and prefetched them forward, then
resumed the regular backward scan. The patch and driving script are
attached.
It was very successful in this test. I've reported the fastest time to do
the truncation for each setting of the prefetch size, out of about 10 runs
each, after removing a handful of cases where buffer pins prevented it from
doing a full-size truncation. (I used the minimum because the server has
other duties that would occasionally kick in and cause a spike in time for
a few runs.) A setting of 0 disables this feature.
JJFETCH   trunc_time(s)
      0   355.78
      1   350.5
      2   277.61
      4   213.9
      8   172.79
     16   138.87
     32   133.02
     64    76.25
    128    47.63
    256    35.28
    512    20.44
   1024    18.6
   2048    15.63
   4096    15.34
   8192    14.72
  16384    13.45
  32768    13.13
For context, the forward-scan part of this vacuum took about 80 sec (IQR 75
to 88).
Do we want something like this at all? If so, what would we have to do to
implement it for real? I thought that maybe the back-off amount could
start at one block and double each time until it hits a maximum, and that
the maximum would be either seg_size or 1/8 of effective_cache_size,
whichever is lower. (I don't think we would want to prefetch more than a
fraction of effective_cache_size, or else the prefetched blocks could
already have been displaced by the time the scan gets back to them.)
Also, what would be the best way to drill a hole through bufmgr.c into md.c
so that the prefetch could specify an entire range, rather than looping
over each individual block?
What would have to be done to detect people running on SSD and disable the
feature, if anything?
I'll add this to next commitfest as WIP patch.
Cheers,
Jeff
On 4/19/15 9:09 PM, Jeff Janes wrote:
I did literally the simplest thing I could think of as a proof of
concept patch, to see if it would actually fix things. I just jumped
back a certain number of blocks occasionally and prefetched them
forward, then resumed the regular backward scan. The patch and driving
script are attached.
Shouldn't completely empty pages be set as all-visible in the VM? If so,
can't we just find the largest not-all-visible page and move forward
from there, instead of moving backwards like we currently do?
For that matter, why do we scan backwards anyway? The comments don't
explain it, and we have nonempty_pages as a starting point, so why don't
we just scan forward? I suspect that eons ago we didn't have that and
just blindly reverse-scanned until we finally hit a non-empty buffer...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Apr 19, 2015 at 10:38 PM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:
On 4/19/15 9:09 PM, Jeff Janes wrote:
I did literally the simplest thing I could think of as a proof of
concept patch, to see if it would actually fix things. I just jumped
back a certain number of blocks occasionally and prefetched them
forward, then resumed the regular backward scan. The patch and driving
script are attached.

Shouldn't completely empty pages be set as all-visible in the VM? If so,
can't we just find the largest not-all-visible page and move forward from
there, instead of moving backwards like we currently do?
If the entire table is all-visible, we would be starting from the
beginning, even though the beginning of the table still has read-only
tuples present.
For that matter, why do we scan backwards anyway? The comments don't
explain it, and we have nonempty_pages as a starting point, so why don't we
just scan forward? I suspect that eons ago we didn't have that and just
blindly reverse-scanned until we finally hit a non-empty buffer...
nonempty_pages is not concurrency safe, as the pages could become used
after vacuum passed them over but before the access exclusive lock was
grabbed before the truncation scan. But maybe the combination of the two?
If it is above nonempty_pages, then anyone who wrote into the page after
vacuum passed it must have cleared the VM bit. And currently I think no one
but vacuum ever sets the VM bit back on, so once cleared it would stay cleared.
In any event nonempty_pages could be used to set the guess as to how many
pages (if any) might be worth prefetching, as that is not needed for
correctness.
Cheers,
Jeff
On 4/20/15 1:50 AM, Jeff Janes wrote:
Shouldn't completely empty pages be set as all-visible in the VM? If
so, can't we just find the largest not-all-visible page and move
forward from there, instead of moving backwards like we currently do?

If the entire table is all-visible, we would be starting from the
beginning, even though the beginning of the table still has read only
tuples present.
Except we'd use it in conjunction with nonempty_pages. IIRC, that's set
to the last page that vacuum saw data on. If any page after that got
written to after vacuum visited it, the VM bit would be cleared. So
after we acquire the exclusive lock, AFAICT it's safe to just scan the
VM starting with nonempty_pages.
For that matter, why do we scan backwards anyway? The comments don't
explain it, and we have nonempty_pages as a starting point, so why
don't we just scan forward? I suspect that eons ago we didn't have
that and just blindly reverse-scanned until we finally hit a
non-empty buffer...

nonempty_pages is not concurrency safe, as the pages could become used
after vacuum passed them over but before the access exclusive lock was
grabbed before the truncation scan. But maybe the combination of the
two? If it is above nonempty_pages, then anyone who wrote into the page
after vacuum passed it must have cleared the VM bit. And currently I
think no one but vacuum ever sets VM bit back on, so once cleared it
would stay cleared.
Right.
In any event nonempty_pages could be used to set the guess as to how
many pages (if any) might be worth prefetching, as that is not needed
for correctness.
Yeah, but I think we'd do a LOT better with the VM idea, because we
could immediately truncate without scanning anything.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On Sun, Apr 19, 2015 at 7:09 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
I did literally the simplest thing I could think of as a proof of concept patch, to see if it would actually fix things. I just jumped back a certain number of blocks occasionally and prefetched them forward, then resumed the regular backward scan.
IIRC, this patch introduces optional prefetch for the backward scan,
where file system prefetch may not work well. The result is promising!
Also, what would be the best way to drill a hole through bufmgr.c into md.c so that the prefetch could specify an entire range, rather than looping over each individual block?
Different from mdread(), which requires a buffer argument, extending
mdprefetch() looks natural and safe to me. This would also lay a
foundation for other scan-style optimizations.
What would have to be done to detect people running on SSD and disable the feature, if anything?
There are other call sites of prefetch, so this is a topic not just for
this optimization.
Regards,
Qingqing
On Mon, Apr 20, 2015 at 10:18 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:
On 4/20/15 1:50 AM, Jeff Janes wrote:
For that matter, why do we scan backwards anyway? The comments don't
explain it, and we have nonempty_pages as a starting point, so why
don't we just scan forward? I suspect that eons ago we didn't have
that and just blindly reverse-scanned until we finally hit a
non-empty buffer...

nonempty_pages is not concurrency safe, as the pages could become used
after vacuum passed them over but before the access exclusive lock was
grabbed before the truncation scan. But maybe the combination of the
two? If it is above nonempty_pages, then anyone who wrote into the page
after vacuum passed it must have cleared the VM bit. And currently I
think no one but vacuum ever sets VM bit back on, so once cleared it
would stay cleared.

Right.
In any event nonempty_pages could be used to set the guess as to how
many pages (if any) might be worth prefetching, as that is not needed
for correctness.

Yeah, but I think we'd do a LOT better with the VM idea, because we could
immediately truncate without scanning anything.
Right now all the interlocks to make this work seem to be in place (only
vacuum and startup can set visibility map bits, and only one vacuum can be
in a table at a time). But as far as I can tell, those assumptions are not
"baked in" and we have pondered loosening them before.
For example, letting HOT cleanup mark a page as all-visible if it finds it
to be such. Now in that specific case it would be OK, as HOT cleanup would
not cause a page to become empty (or could it? If an insert on a table
with no indexes was rolled back, and HOT cleanup found it and cleaned it
up, the page could conceptually become empty--unless we added special code
to prevent it), and so the page would have to be below nonempty_pages. But
there may be other cases.
And I know other people have mentioned making VACUUM concurrent (although I
don't see the value in that myself).
So doing it this way would be hard to beat (scanning a bitmap vs the table
itself), but it would also introduce a modularity violation that I am not
sure is worth it.
Of course this could always be reverted if its requirements became a
problem for a more important change (assuming of course that we detected
the problem).
Cheers,
Jeff
On Tue, May 26, 2015 at 12:37 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Apr 20, 2015 at 10:18 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:

On 4/20/15 1:50 AM, Jeff Janes wrote:

For that matter, why do we scan backwards anyway? The comments don't
explain it, and we have nonempty_pages as a starting point, so why
don't we just scan forward? I suspect that eons ago we didn't have
that and just blindly reverse-scanned until we finally hit a
non-empty buffer...

nonempty_pages is not concurrency safe, as the pages could become used
after vacuum passed them over but before the access exclusive lock was
grabbed before the truncation scan. But maybe the combination of the
two? If it is above nonempty_pages, then anyone who wrote into the page
after vacuum passed it must have cleared the VM bit. And currently I
think no one but vacuum ever sets the VM bit back on, so once cleared it
would stay cleared.

Right.

In any event nonempty_pages could be used to set the guess as to how
many pages (if any) might be worth prefetching, as that is not needed
for correctness.

Yeah, but I think we'd do a LOT better with the VM idea, because we could
immediately truncate without scanning anything.

Right now all the interlocks to make this work seem to be in place (only
vacuum and startup can set visibility map bits, and only one vacuum can be
in a table at a time). But as far as I can tell, those assumptions are not
"baked in" and we have pondered loosening them before.

For example, letting HOT cleanup mark a page as all-visible if it finds
it to be such. Now in that specific case it would be OK, as HOT cleanup
would not cause a page to become empty (or could it? If an insert on a
table with no indexes was rolled back, and HOT cleanup found it and
cleaned it up, the page could conceptually become empty--unless we added
special code to prevent it), and so the page would have to be below
nonempty_pages. But there may be other cases.

And I know other people have mentioned making VACUUM concurrent (although
I don't see the value in that myself).

So doing it this way would be hard to beat (scanning a bitmap vs the table
itself), but it would also introduce a modularity violation that I am not
sure is worth it.

Of course this could always be reverted if its requirements became a
problem for a more important change (assuming of course that we detected
the problem).
The fatal problem here is that nonempty_pages is unreliable. If vacuum
skips all-visible pages, it doesn't necessarily increment nonempty_pages
beyond the skipped ones. So if you just rely on nonempty_pages, you will
truncate away pages that were already all-visible but are not empty. If we
changed it so that it did increment nonempty_pages past the skipped ones,
then pages which were all empty and got marked as all-visible without being
truncated (say, because a lock could not be acquired, or because there was
a non-empty page after them which later became empty) would never get
truncated away.
As it is currently, it is not clear what purpose nonempty_pages serves. It
is a guardian value which doesn't seem to actually guard anything. At best
it prevents you from needing to inspect one page (the page of the guardian
value itself) to see if that page is actually empty, and finding that it is
not. That hardly seems worthwhile.
We could adopt two nonempty_pages counters, one that fails low on skipped
all-visible pages and one that fails high on them, and then fast-truncate
down to the high one and do the current page-by-page scan between the low
and the high. That seems rather grotesque.
Cheers,
Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
As it is currently, it is not clear what purpose nonempty_pages serves.
It's to allow an early exit at vacuumlazy.c:271 in case there is no chance
of deleting a useful number of pages.
regards, tom lane
On Tue, May 26, 2015 at 12:37 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Apr 20, 2015 at 10:18 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:

On 4/20/15 1:50 AM, Jeff Janes wrote:

For that matter, why do we scan backwards anyway? The comments don't
explain it, and we have nonempty_pages as a starting point, so why
don't we just scan forward? I suspect that eons ago we didn't have
that and just blindly reverse-scanned until we finally hit a
non-empty buffer...

nonempty_pages is not concurrency safe, as the pages could become used
after vacuum passed them over but before the access exclusive lock was
grabbed before the truncation scan. But maybe the combination of the
two? If it is above nonempty_pages, then anyone who wrote into the page
after vacuum passed it must have cleared the VM bit. And currently I
think no one but vacuum ever sets the VM bit back on, so once cleared it
would stay cleared.

Right.

In any event nonempty_pages could be used to set the guess as to how
many pages (if any) might be worth prefetching, as that is not needed
for correctness.

Yeah, but I think we'd do a LOT better with the VM idea, because we could
immediately truncate without scanning anything.

Right now all the interlocks to make this work seem to be in place (only
vacuum and startup can set visibility map bits, and only one vacuum can be
in a table at a time). But as far as I can tell, those assumptions are not
"baked in" and we have pondered loosening them before.

For example, letting HOT cleanup mark a page as all-visible if it finds
it to be such. Now in that specific case it would be OK, as HOT cleanup
would not cause a page to become empty (or could it? If an insert on a
table with no indexes was rolled back, and HOT cleanup found it and
cleaned it up, the page could conceptually become empty--unless we added
special code to prevent it), and so the page would have to be below
nonempty_pages. But there may be other cases.

And I know other people have mentioned making VACUUM concurrent (although
I don't see the value in that myself).

So doing it this way would be hard to beat (scanning a bitmap vs the table
itself), but it would also introduce a modularity violation that I am not
sure is worth it.

Of course this could always be reverted if its requirements became a
problem for a more important change (assuming of course that we detected
the problem).
Attached is a patch that implements the vm scan for truncation. It
introduces a variable to hold the last blkno which was skipped during the
forward portion. Any blocks after both this blkno and after the last
inspected nonempty page (which the code is already tracking) must have been
observed to be empty by the current vacuum. Any other process rendering
the page nonempty is required to clear the vm bit, and no other process
can set the bit again during the vacuum's lifetime. So if the bit is still
set, the page is still empty without needing to inspect it.
There is still the case of pages which had their visibility bit set by a
prior vacuum and then were not inspected by the current one. Once the
truncation scan runs into these pages, it falls back to the previous
behavior of reading block by block backwards. So there could still be
reason to optimize that fallback using forward-reading prefetch.
Using the previously shown test case, this patch reduces the truncation
part of the vacuum to 2 seconds.
Cheers,
Jeff
Attachment: vac_trunc_trust_vm.patch (+34 −0)
On Mon, Jun 29, 2015 at 3:54 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
Attached is a patch that implements the vm scan for truncation. It
introduces a variable to hold the last blkno which was skipped during the
forward portion. Any blocks after both this blkno and after the last
inspected nonempty page (which the code is already tracking) must have been
observed to be empty by the current vacuum. Any other process rendering the
page nonempty is required to clear the vm bit, and no other process can set
the bit again during the vacuum's lifetime. So if the bit is still set, the
page is still empty without needing to inspect it.
The patch looks good and it improves the vacuum truncation speed significantly.
There is still the case of pages which had their visibility bit set by a
prior vacuum and then were not inspected by the current one. Once the
truncation scan runs into these pages, it falls back to the previous
behavior of reading block by block backwards. So there could still be
reason to optimize that fallback using forward-reading prefetch.
The case I didn't understand is: how does the current vacuum miss a page
whose bit was set by the prior vacuum?

The page should be counted either in skipped_pages or in nonempty_pages.
Is it possible that a page doesn't come under either of these two
categories and is also not empty?

If not, then we can directly truncate the relation from the highest block
number of either nonempty_pages or skipped_pages to the end of the
relation.
Regards,
Hari Babu
Fujitsu Australia
On Wed, Jul 8, 2015 at 9:46 PM, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
On Mon, Jun 29, 2015 at 3:54 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
Attached is a patch that implements the vm scan for truncation. It
introduces a variable to hold the last blkno which was skipped during the
forward portion. Any blocks after both this blkno and after the last
inspected nonempty page (which the code is already tracking) must have been
observed to be empty by the current vacuum. Any other process rendering
the page nonempty is required to clear the vm bit, and no other process
can set the bit again during the vacuum's lifetime. So if the bit is still
set, the page is still empty without needing to inspect it.

The patch looks good and it improves the vacuum truncation speed
significantly.

There is still the case of pages which had their visibility bit set by a
prior vacuum and then were not inspected by the current one. Once the
truncation scan runs into these pages, it falls back to the previous
behavior of reading block by block backwards. So there could still be
reason to optimize that fallback using forward-reading prefetch.

The case I didn't understand is: how does the current vacuum miss a page
whose bit was set by the prior vacuum?
The prior vacuum set them to all-visible, but then didn't delete them,
either because it got interrupted or because there were still some pages
after them (at the time) that were not all empty.
The current vacuum skips them entirely on the forward scan because it
thinks it is a waste of time to vacuum all-visible pages. Since it skips
them, it doesn't know if they are empty as well as being all-visible.
There is no permanent indicator of the pages being all-empty; there is
only the inference based on the current vacuum's counters, protected by
the lock held on the table.
The page should be counted either in skipped_pages or in nonempty_pages.
Is it possible that a page doesn't come under either of these two
categories and is also not empty?

If not, then we can directly truncate the relation from the highest block
number of either nonempty_pages or skipped_pages to the end of the
relation.
Right, and that is what this does (provided the vm bit is still set, so it
does still have to loop over the vm to verify that it is still set, while
it holds the access exclusive lock).
The problem is that the pages between the two counters are not known to be
empty, and also not known to be nonempty. Someone has to be willing to go
check on those pages at some point, or else they will never be eligible for
truncation.
Cheers,
Jeff
On Thu, Jul 9, 2015 at 4:37 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Jul 8, 2015 at 9:46 PM, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

On Mon, Jun 29, 2015 at 3:54 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

There is still the case of pages which had their visibility bit set by a
prior vacuum and then were not inspected by the current one. Once the
truncation scan runs into these pages, it falls back to the previous
behavior of reading block by block backwards. So there could still be
reason to optimize that fallback using forward-reading prefetch.

The case I didn't understand is: how does the current vacuum miss a page
whose bit was set by the prior vacuum?

The prior vacuum set them to all-visible, but then didn't delete them,
either because it got interrupted or because there were still some pages
after them (at the time) that were not all empty.

The current vacuum skips them entirely on the forward scan because it
thinks it is a waste of time to vacuum all-visible pages. Since it skips
them, it doesn't know if they are empty as well as being all-visible.
There is no permanent indicator of the pages being all-empty; there is
only the inference based on the current vacuum's counters, protected by
the lock held on the table.

The page should be counted either in skipped_pages or in nonempty_pages.
Is it possible that a page doesn't come under either of these two
categories and is also not empty?

If not, then we can directly truncate the relation from the highest block
number of either nonempty_pages or skipped_pages to the end of the
relation.

Right, and that is what this does (provided the vm bit is still set, so it
does still have to loop over the vm to verify that it is still set, while
it holds the access exclusive lock).
Thanks, I got it. To re-verify the vm bit of a page after getting the
access exclusive lock, we have to do the backward scan.

I also feel that your prefetch patch is needed to improve the performance,
because of that backward scan to re-verify the vm bits.

I will do some performance tests and send you the results.
The problem is that the pages between the two counters are not known to be
empty, and also not known to be nonempty. Someone has to be willing to go
check on those pages at some point, or else they will never be eligible for
truncation.
Yes, there is no way to identify whether a page is empty or not without
reading it.
Regards,
Hari Babu
Fujitsu Australia
On Thu, Jul 9, 2015 at 5:36 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
I will do some performance tests and send you the results.
Here are the performance results tested on my machine.
                Head       vm patch    vm+prefetch patch
First vacuum    120 sec    <1 sec      <1 sec
Second vacuum   180 sec    180 sec     30 sec
I made some modifications in the code to make the first vacuum command
skip the truncation; this is how I collected the second vacuum's timing.

I combined your vm and prefetch patches into a single vm+prefetch patch
without a GUC. I kept the prefetch size at 32 and did the performance
test. I chose that value based on the current buffer access strategy,
which is presently 32 for vacuum, instead of making it a user option.

I've attached the modified patch with both the vm and prefetch logic.

I will do some tests on a machine with an SSD and let you know the
results. Based on those, we can decide whether we need a GUC or not,
depending on the impact of prefetch on SSD machines.
Regards,
Hari Babu
Fujitsu Australia
Attachment: vac_trunc_trust_vm_and_prefetch.patch (+48 −0)
On Mon, Jul 13, 2015 at 12:06 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
On Thu, Jul 9, 2015 at 5:36 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

I will do some performance tests and send you the results.

Here are the performance results tested on my machine.

                Head       vm patch    vm+prefetch patch
First vacuum    120 sec    <1 sec      <1 sec
Second vacuum   180 sec    180 sec     30 sec

I made some modifications in the code to make the first vacuum command
skip the truncation; this is how I collected the second vacuum's timing.

I combined your vm and prefetch patches into a single vm+prefetch patch
without a GUC. I kept the prefetch size at 32 and did the performance
test. I chose that value based on the current buffer access strategy,
which is presently 32 for vacuum, instead of making it a user option.

I've attached the modified patch with both the vm and prefetch logic.

I will do some tests on a machine with an SSD and let you know the
results. Based on those, we can decide whether we need a GUC or not,
depending on the impact of prefetch on SSD machines.
Following are the performance readings on a machine with an SSD.
I increased the pgbench scale factor from 500 to 1000 in this test to
show clearer performance numbers.

                Head        vm patch    vm+prefetch patch
First vacuum    6.24 sec    2.91 sec    2.91 sec
Second vacuum   6.66 sec    6.66 sec    7.19 sec

There is a small performance impact on SSD with prefetch.
Regards,
Hari Babu
Fujitsu Australia
On Mon, Jul 13, 2015 at 5:16 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
On Mon, Jul 13, 2015 at 12:06 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

On Thu, Jul 9, 2015 at 5:36 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

I will do some performance tests and send you the results.

Here are the performance results tested on my machine.

                Head       vm patch    vm+prefetch patch
First vacuum    120 sec    <1 sec      <1 sec
Second vacuum   180 sec    180 sec     30 sec

I made some modifications in the code to make the first vacuum command
skip the truncation; this is how I collected the second vacuum's timing.

I combined your vm and prefetch patches into a single vm+prefetch patch
without a GUC. I kept the prefetch size at 32 and did the performance
test. I chose that value based on the current buffer access strategy,
which is presently 32 for vacuum, instead of making it a user option.

I've attached the modified patch with both the vm and prefetch logic.

I will do some tests on a machine with an SSD and let you know the
results. Based on those, we can decide whether we need a GUC or not,
depending on the impact of prefetch on SSD machines.

Following are the performance readings on a machine with an SSD.
I increased the pgbench scale factor from 500 to 1000 in this test to
show clearer performance numbers.

                Head        vm patch    vm+prefetch patch
First vacuum    6.24 sec    2.91 sec    2.91 sec
Second vacuum   6.66 sec    6.66 sec    7.19 sec

There is a small performance impact on SSD with prefetch.
The above prefetch overhead is observed with prefetching of 1639345 pages.
I feel this overhead is small.

Hi Jeff,

If you are fine with the earlier attached patch, then I will mark it as
ready for committer, to get a committer's view on the patch.
Regards,
Hari Babu
Fujitsu Australia
On Mon, Jun 29, 2015 at 11:24 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
Attached is a patch that implements the vm scan for truncation. It
introduces a variable to hold the last blkno which was skipped during the
forward portion. Any blocks after both this blkno and after the last
inspected nonempty page (which the code is already tracking) must have been
observed to be empty by the current vacuum. Any other process rendering
the page nonempty is required to clear the vm bit, and no other process
can set the bit again during the vacuum's lifetime. So if the bit is still
set, the page is still empty without needing to inspect it.
One case where this patch can behave differently than the current code
is when, before taking the exclusive lock on the relation for truncation,
some backend clears the vm bit and then inserts, deletes, and prunes that
page. I think with the patch it will not consider truncating such pages,
whereas the current code will allow truncating them since it re-verifies
each page. I think such a case would be rare and we might not want to
bother about it, but it is still worth thinking about once.
Some minor points about patch:
count_nondeletable_pages()
{
..
if (PageIsNew(page) || PageIsEmpty(page))
{
/* PageIsNew probably shouldn't happen... */
UnlockReleaseBuffer(buf);
continue;
}
..
}
Why is vmbuffer not freed in the above loop?
count_nondeletable_pages()
{
..
+ /*
+ * Pages that were inspected and found to be empty by this vacuum run
+ * must still be empty if the vm bit is still set. Only vacuum sets
+ * vm bits, and only one vacuum can exist in a table at one time.
+ */
+ trust_vm=vacrelstats->nonempty_pages>vacrelstats->skipped_pages ?
+ vacrelstats->nonempty_pages : vacrelstats->skipped_pages;
..
}
I think it is better to have spaces in the above assignment statement
(near '=' and near '>').
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jul 15, 2015 at 11:19 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Jun 29, 2015 at 11:24 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
Attached is a patch that implements the vm scan for truncation. It
introduces a variable to hold the last blkno which was skipped during the
forward portion. Any blocks after both this blkno and after the last
inspected nonempty page (which the code is already tracking) must have been
observed to be empty by the current vacuum. Any other process rendering the
page nonempty is required to clear the vm bit, and no other process can set
the bit again during the vacuum's lifetime. So if the bit is still set, the
page is still empty without needing to inspect it.

One case where this patch can behave differently than the current code:
before taking the Exclusive lock on the relation for truncation, if some
backend clears the vm bit and then inserts, deletes, and prunes that page,
the patch will not consider those pages for truncation, whereas the current
code will truncate them because it re-verifies each page. Such a case would
be rare and we might not want to bother about it, but it is still worth
thinking about.
Thanks for your review.
Corrected the code: instead of returning the blkno after a visibility map
check failure, the code now just continues to verify the page contents, as
in the earlier approach.
Some minor points about patch:
count_nondeletable_pages()
{
..
if (PageIsNew(page) || PageIsEmpty(page))
{
/* PageIsNew probably shouldn't happen... */
UnlockReleaseBuffer(buf);
continue;
}
..
}

Why is vmbuffer not freed in the above loop?
In the above check, we just continue to the next blkno; the vmbuffer is
freed at the end of the loop.
count_nondeletable_pages()
{
..
+ /*
+ * Pages that were inspected and found to be empty by this vacuum run
+ * must still be empty if the vm bit is still set. Only vacuum sets
+ * vm bits, and only one vacuum can exist in a table at one time.
+ */
+ trust_vm=vacrelstats->nonempty_pages>vacrelstats->skipped_pages ?
+ vacrelstats->nonempty_pages : vacrelstats->skipped_pages;
..
}

I think it is better to have spaces in the above assignment statement
(near '=' and near '>')
corrected.
Here I attached updated patches,
1) without prefetch logic.
2) with combined vm and prefetch logic.
I marked the patch as ready for committer.
Regards,
Hari Babu
Fujitsu Australia
On Thu, Jul 16, 2015 at 11:21 AM, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:
On Wed, Jul 15, 2015 at 11:19 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
One case where this patch can behave differently than the current code:
before taking the Exclusive lock on the relation for truncation, if some
backend clears the vm bit and then inserts, deletes, and prunes that page,
the patch will not consider those pages for truncation, whereas the current
code will truncate them because it re-verifies each page. Such a case would
be rare and we might not want to bother about it, but it is still worth
thinking about.

Thanks for your review.
corrected the code as instead of returning the blkno after visibility map
check failure, the code just continue to verify the contents as
earlier approach.
Okay, I think this should work.
Here I attached updated patches,
1) without prefetch logic.
2) with combined vm and prefetch logic.
I think it is better to just get the first patch in, as that in itself is a
clear win, and then evaluate the second one (prefetching during truncation)
separately. After the first patch, the use case for prefetching seems less
beneficial, and it could hurt by loading unneeded pages into the OS buffer
cache (for example, when 32 pages have been prefetched but the loop quits
after inspecting the first one because it is nonempty, or when a page was
prefetched just because its visibility map bit was cleared while the bits
for the others are still set).
Having said that, I am not against prefetching in this scenario, as it
can help in more cases than it could hurt, but I think it will be better
to evaluate that separately.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
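If prefetching is evaluated separately later, the doubling back-off idea mentioned earlier in the thread could look roughly like this sketch. It is a hypothetical illustration, not either of the attached patches: prefetch_range() stands in for PrefetchBuffer()/posix_fadvise(), and the counters exist only to make the behavior observable.

```c
/*
 * Hypothetical sketch: a backward truncation scan that prefetches with a
 * window starting at one block and doubling up to MAX_PREFETCH blocks.
 */
#define MAX_PREFETCH 32

static int prefetch_batches;
static int blocks_prefetched;

static void prefetch_range(int start, int nblocks)
{
    (void) start;               /* a real version would issue the advice */
    prefetch_batches++;
    blocks_prefetched += nblocks;
}

static void backward_scan(int rel_nblocks)
{
    int window = 1;             /* start small: the scan may stop early */
    int blkno = rel_nblocks;

    while (blkno > 0)
    {
        int chunk = (window < blkno) ? window : blkno;

        /* prefetch the next chunk below the cursor, in forward order */
        prefetch_range(blkno - chunk, chunk);

        for (int i = 0; i < chunk; i++)
        {
            blkno--;
            /* ... lock and inspect page blkno here; may return early ... */
        }

        if (window < MAX_PREFETCH)
            window *= 2;        /* ramp up only while the scan keeps going */
    }
}
```

Starting at one block keeps the wasted I/O negligible when the scan stops almost immediately on a nonempty page, while the cap bounds how much can be thrown away once the window has ramped up.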
On Thu, Jul 16, 2015 at 10:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jul 16, 2015 at 11:21 AM, Haribabu Kommi <kommi.haribabu@gmail.com>
wrote:

Here I attached updated patches,
1) without prefetch logic.
2) with combined vm and prefetch logic.

I think it is better to just get the first patch in, as that in itself is a
clear win, and then evaluate the second one (prefetching during truncation)
separately. After the first patch, the use case for prefetching seems less
beneficial, and it could hurt by loading unneeded pages into the OS buffer
cache (for example, when 32 pages have been prefetched but the loop quits
after inspecting the first one because it is nonempty, or when a page was
prefetched just because its visibility map bit was cleared while the bits
for the others are still set).
Yes, in the above cases the prefetch is an overhead. I am not sure how
frequently such scenarios occur in practice.
Having said that, I am not against prefetching in this scenario, as it
can help in more cases than it could hurt, but I think it will be better
to evaluate that separately.
Yes, the prefetch patch works better in cases where the first vacuum mostly
skips the truncation because of lock waiters. If such cases are common in
practice, then the prefetch patch is needed.
Regards,
Hari Babu
Fujitsu Australia
On 16 July 2015 at 06:51, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
I marked the patch as ready for committer.
The most recent results seem to indicate that the prefetch isn't worth
pursuing, but the vm test is. Can someone repeat the perf tests on
something larger so we can see when/if there is a benefit?
Jeff, can you add detailed comments to explain the theory of operation of
these patches? The patches add the code, but don't say why. We will forget,
so let's put the detail in there now please. Thanks.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services