New gist vacuum.

Started by Костя Кузнецовover 10 years ago23 messageshackers
Jump to latest

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments:

seaq-access.patchtext/x-diff; name=seaq-access.patchDownload+910-26
#2Michael Paquier
michael@paquier.xyz
In reply to: Костя Кузнецов (#1)
Re: New gist vacuum.

On Fri, Sep 11, 2015 at 7:52 AM, Костя Кузнецов <chapaev28@ya.ru> wrote:

old version:

INFO: vacuuming "public.point_tbl"
INFO: scanned index "gpointind" to remove 11184520 row versions
DETAIL: CPU 84.70s/72.26u sec elapsed 27007.14 sec.
[...]

new vacuum is about
INFO: vacuuming "public.point_tbl"
INFO: scanned index "gpointind" to remove 11184520 row versions
DETAIL: CPU 13.00s/27.57u sec elapsed 1864.22 sec.
[...]
There is a big speed up + we can reuse some pages.

Indeed. Interesting. You should definitely add your patch to the next
commit fest:
https://commitfest.postgresql.org/7/
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Jeff Janes
jeff.janes@gmail.com
In reply to: Костя Кузнецов (#1)
Re: New gist vacuum.

On Thu, Sep 10, 2015 at 3:52 PM, Костя Кузнецов <chapaev28@ya.ru> wrote:

Hello. I am student from gsoc programm.
My project is sequantial access in vacuum of gist.

New vacuum has 2 big step:
physical order scan pages and cleaning after 1 step.

1 scan - scan all pages and create information map(hashmap) and add
information to rescan stack( stack of pages that needed to rescanning

This is interesting work. I think the patch needs a rebase to the git
HEAD. There is a minor conflict in gistRedoPageUpdateRecord. It is a
little confusing because your patch introduces new code and then
immediately comments it out (using //, which is not a comment style
allowed in our style guide) and that phantom code confuses the
conflict resolution process.

There are several other places throughout the patch that use //
comment style to comment out code which the patch itself added. Those
should be removed, and the real comments should be converted to /*
this */ style.

I also got a compiler warning, it looks like a missing #include:

gistutil.c: In function 'gistNewBuffer':
gistutil.c:788:4: warning: implicit declaration of function
'TransactionIdPrecedes' [-Wimplicit-function-declaration]
if (GistPageIsDeleted(page) &&
TransactionIdPrecedes(p->pd_prune_xid, RecentGlobalDataXmin)) {
^

Also, I didn't see a check on the size of the stack. Is there an
argument that this stack will not be able to grow to be large enough
to cause trouble?

Thanks,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Jeff Janes (#3)
Re: New gist vacuum.

<div>Thank you, Jeff.</div><div>I reworking patch now. All // warning will be deleted.</div><div>About memory consumption new version will control size of stack and will operate with map of little size because i want delete old style vacuum(now ifО©╫maintenance_work_mem less than needed to build info map we use old-style vacuum with logical order).</div>

#5Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Костя Кузнецов (#4)
Re: New gist vacuum.

Костя Кузнецов wrote:

<div>Thank you, Jeff.</div><div>I reworking patch now. All // warning will be deleted.</div><div>About memory consumption new version will control size of stack and will operate with map of little size because i want delete old style vacuum(now if maintenance_work_mem less than needed to build info map we use old-style vacuum with logical order).</div>

You never got around to submitting the updated version of this patch,
and it's been a long time now, so I'm marking it as returned with
feedback for now. Please do submit a new version once you have it,
since this seems to be very useful.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Andrey Borodin
amborodin@acm.org
In reply to: Alvaro Herrera (#5)
Re: New gist vacuum.

Hello!

31 янв. 2016 г., в 17:18, Alvaro Herrera <alvherre@2ndquadrant.com> написал(а):

Костя Кузнецов wrote:

<div>Thank you, Jeff.</div><div>I reworking patch now. All // warning will be deleted.</div><div>About memory consumption new version will control size of stack and will operate with map of little size because i want delete old style vacuum(now if maintenance_work_mem less than needed to build info map we use old-style vacuum with logical order).</div>

You never got around to submitting the updated version of this patch,
and it's been a long time now, so I'm marking it as returned with
feedback for now. Please do submit a new version once you have it,
since this seems to be very useful.

I've rebased patch (see attachment), it seems to work. It still requires a bit of styling, docs and tests, at least.
Also, I thinks that hash table is not very good option if we have all pages there: we should either use array or do not fill table for every page.

If author and community do not object, I want to continue work on Konstantin's patch.

Best regards, Andrey Borodin.

Attachments:

0001-GiST-VACUUM-rebase.patchapplication/octet-stream; name=0001-GiST-VACUUM-rebase.patch; x-unix-mode=0644Download+923-32
#7Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#6)
Re: New gist vacuum.

Hi hackers!

Here is the patch that deletes pages during GiST VACUUM.

12 нояб. 2017 г., в 23:20, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):

If author and community do not object, I want to continue work on Konstantin's patch.

==Purpose==
Long story short, some time ago Konstantin Kuznetsov hacked out a patch that added GiST scan with physical order of scan.
This scan is using a lot of memory to build map of whole GiST graph. If there is not enough maintenance memory, patch had the fallback to old GiST VACUUM.
New behavior was deleting pages while old (still used) was not.

I've rebased patch, fixed some bugs and decided that it solves too much in a single step.

Here is the patch, which adds functionality of GiST page deletes.
If this is committed, porting physical scan code will be much easier.

==What is changed==
When GiST VACUUM scans graph for removed tuples, it remembers internal pages that are referencing completely empty leaf pages.
Then in additional step, these pages are rescanned to delete references and mark leaf pages as free.

==Limitations==
At least one reference on each internal pages is left undeleted to preserve balancing of the tree.
Pages that has FOLLOW-RIGHT flag also are not deleted, even if empty.

Thank you for your attention, any thoughts are welcome.

Best regards, Andrey Borodin.

Attachments:

0001-Delete-pages-during-GiST-VACCUM.patchapplication/octet-stream; name=0001-Delete-pages-during-GiST-VACCUM.patch; x-unix-mode=0644Download+201-21
#8Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#7)
Re: New gist vacuum.

Hi hackers!

19 дек. 2017 г., в 15:58, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
Here is the patch that deletes pages during GiST VACUUM.

Here is new version of the patch for GiST VACUUM.
There are two main changes:
1. During rescan for page deletion only know to be recently empty pages are rescanned.
2. I've re-implemented physical scan with array instead of hash table.

Thanks!
Merry Christmas and happy New Year.

Best regards, Andrey Borodin.

Attachments:

0001-Delete-pages-during-GiST-VACCUM.patchapplication/octet-stream; name=0001-Delete-pages-during-GiST-VACCUM.patch; x-unix-mode=0644Download+203-21
0002-Physical-GiST-scan.patchapplication/octet-stream; name=0002-Physical-GiST-scan.patch; x-unix-mode=0644Download+278-43
#9Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#8)
Re: New gist vacuum.

Hi!

28 дек. 2017 г., в 16:37, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
Here is new version of the patch for GiST VACUUM.
There are two main changes:
1. During rescan for page deletion only know to be recently empty pages are rescanned.
2. I've re-implemented physical scan with array instead of hash table.

There is one more minor spot in GiST VACUUM. It takes heap tuples count for statistics for partial indexes, while it should not.

If gistvacuumcleanup() is not given a statistics gathered by gistbulkdelete() it returns incorrect tuples count for partial index.
Here's the micropatch, which fixes that corner case.
To reproduce this effect I used this query:
create table y as select cube(random()) c from generate_series(1,10000) y; create index on y using gist(c) where c~>1 > 0.5;
vacuum verbose y;
Before patch it will report 10000 tuples, with patch it will report different values around 5000.

I do not know, should I register separate commitfest entry? The code is very close to main GiST VACUUM patch, but solves a bit different problem.

Best regards, Andrey Borodin.

Attachments:

0001-Count-tuples-correctly-during-GiST-VACUUM-of-partial.patchapplication/octet-stream; name=0001-Count-tuples-correctly-during-GiST-VACUUM-of-partial.patch; x-unix-mode=0644Download+23-7
#10Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrey Borodin (#9)
Re: New gist vacuum.

Hi!

On Sat, Dec 30, 2017 at 12:18 PM, Andrey Borodin <x4mmm@yandex-team.ru>
wrote:

28 дек. 2017 г., в 16:37, Andrey Borodin <x4mmm@yandex-team.ru>

написал(а):

Here is new version of the patch for GiST VACUUM.
There are two main changes:
1. During rescan for page deletion only know to be recently empty pages

are rescanned.

2. I've re-implemented physical scan with array instead of hash table.

There is one more minor spot in GiST VACUUM. It takes heap tuples count
for statistics for partial indexes, while it should not.

If gistvacuumcleanup() is not given a statistics gathered by
gistbulkdelete() it returns incorrect tuples count for partial index.
Here's the micropatch, which fixes that corner case.
To reproduce this effect I used this query:
create table y as select cube(random()) c from generate_series(1,10000) y;
create index on y using gist(c) where c~>1 > 0.5;
vacuum verbose y;
Before patch it will report 10000 tuples, with patch it will report
different values around 5000.

It's very good that you've fixed that.

I do not know, should I register separate commitfest entry? The code is

very close to main GiST VACUUM patch, but solves a bit different problem.

Yes, I think it deserves separate commitfest entry. Despite it's related
to GiST VACUUM, it's a separate fix.
I've made small improvements to this patch: variable naming, formatting,
comments.
BTW, do we really need to set shouldCount depending on whether we receive
stats argument or not? What if we always set shouldCount as in the first
branch of "if"?

shouldCount = !heap_attisnull(rel->rd_indextuple, Anum_pg_index_indpred) ||
info->estimated_count;

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-Count-tuples-correctly-during-GiST-VACUUM-of-partial-2.patchapplication/octet-stream; name=0001-Count-tuples-correctly-during-GiST-VACUUM-of-partial-2.patchDownload+28-6
#11Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrey Borodin (#8)
Re: New gist vacuum.

Hi!

On Thu, Dec 28, 2017 at 2:37 PM, Andrey Borodin <x4mmm@yandex-team.ru>
wrote:

19 дек. 2017 г., в 15:58, Andrey Borodin <x4mmm@yandex-team.ru>

написал(а):

Here is the patch that deletes pages during GiST VACUUM.

Here is new version of the patch for GiST VACUUM.
There are two main changes:
1. During rescan for page deletion only know to be recently empty pages
are rescanned.
2. I've re-implemented physical scan with array instead of hash table.

I'm very glad this patch isn't forgotten. I've assigned to review this
patch.
My first note on this patchset is following. These patches touches
sensitive aspects or GiST and are related to complex concurrency issues.
Thus, I'm sure both of them deserve high-level description in
src/backend/access/gist/README. Given this description, it would be much
easier to review the patches.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#12Andrey Borodin
amborodin@acm.org
In reply to: Alexander Korotkov (#10)
Re: New gist vacuum.

Hi, Alexander!

Many thanks for looking into patches!
A little bit later I will provide answer in other branch of discussion.

15 янв. 2018 г., в 23:34, Alexander Korotkov <a.korotkov@postgrespro.ru> написал(а):

I do not know, should I register separate commitfest entry? The code is very close to main GiST VACUUM patch, but solves a bit different problem.

Yes, I think it deserves separate commitfest entry. Despite it's related to GiST VACUUM, it's a separate fix.

Ok, done. https://commitfest.postgresql.org/17/1483

I've made small improvements to this patch: variable naming, formatting, comments.

Great, thanks!

BTW, do we really need to set shouldCount depending on whether we receive stats argument or not? What if we always set shouldCount as in the first branch of "if"?

shouldCount = !heap_attisnull(rel->rd_indextuple, Anum_pg_index_indpred) ||
info->estimated_count;

We do not need to count if we have exact count from heap and this index is not partial. ITSM that it is quite common case.

Best regards, Andrey Borodin.

#13Andrey Borodin
amborodin@acm.org
In reply to: Alexander Korotkov (#11)
Re: New gist vacuum.

Hi!

15 янв. 2018 г., в 23:42, Alexander Korotkov <a.korotkov@postgrespro.ru> написал(а):

I'm very glad this patch isn't forgotten. I've assigned to review this patch.

Cool, thanks!

My first note on this patchset is following. These patches touches sensitive aspects or GiST and are related to complex concurrency issues.

Indeed! That is why I've divided patches: first one implements very simple scan algorithm with concurrency and recovery, second replaces simple algorithm with two more tricky algorithms.

Thus, I'm sure both of them deserve high-level description in src/backend/access/gist/README. Given this description, it would be much easier to review the patches.

Yes, that's definitely true. Please find README patch attached.

Best regards, Andrey Borodin.

Attachments:

0003-Update-README-with-info-on-new-GiST-VACUUM.patchapplication/octet-stream; name=0003-Update-README-with-info-on-new-GiST-VACUUM.patch; x-unix-mode=0644Download+35-1
#14Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#13)
Re: New gist vacuum.

Hello, Alexander!

16 янв. 2018 г., в 21:42, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
Please find README patch attached.

Here's v2 version. Same code, but x2 comments. Also fixed important typo in readme BFS->DFS. Feel free to ping me any time with questions.

Best regards, Andrey Borodin.

Attachments:

0002-Physical-GiSC-scan-during-VACUUM-v2.patchapplication/octet-stream; name=0002-Physical-GiSC-scan-during-VACUUM-v2.patch; x-unix-mode=0644Download+297-41
0003-Update-README-with-info-on-new-GiST-VACUUM-v2.patchapplication/octet-stream; name=0003-Update-README-with-info-on-new-GiST-VACUUM-v2.patch; x-unix-mode=0644Download+35-1
0001-Delete-pages-during-GiST-VACUUM-v2.patchapplication/octet-stream; name=0001-Delete-pages-during-GiST-VACUUM-v2.patch; x-unix-mode=0644Download+203-21
#15David Steele
david@pgmasters.net
In reply to: Andrey Borodin (#14)
Re: Re: New gist vacuum.

Hi Andrey,

On 1/21/18 5:34 AM, Andrey Borodin wrote:

Hello, Alexander!

16 янв. 2018 г., в 21:42, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
Please find README patch attached.

Here's v2 version. Same code, but x2 comments. Also fixed important typo in readme BFS->DFS. Feel free to ping me any time with questions.

This patch has not gotten review and does not seem like a good fit for
the last PG11 CF so I am marking it Returned with Feedback.

Regards,
--
-David
david@pgmasters.net

#16Andrey Borodin
amborodin@acm.org
In reply to: David Steele (#15)
Re: New gist vacuum.

Hi, David!

7 февр. 2018 г., в 18:39, David Steele <david@pgmasters.net> написал(а):

Hi Andrey,

On 1/21/18 5:34 AM, Andrey Borodin wrote:

Hello, Alexander!

16 янв. 2018 г., в 21:42, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
Please find README patch attached.

Here's v2 version. Same code, but x2 comments. Also fixed important typo in readme BFS->DFS. Feel free to ping me any time with questions.

This patch has not gotten review and does not seem like a good fit for
the last PG11 CF so I am marking it Returned with Feedback.

Why do you think this patch does not seem good fit for CF?

I've been talking with Alexander just yesterday at PgConf.Russia, and he was going to provide review.

Best regards, Andrey Borodin.

#17David Steele
david@pgmasters.net
In reply to: Andrey Borodin (#16)
Re: New gist vacuum.

Hi Andrey,

On 2/7/18 10:46 AM, Andrey Borodin wrote:

7 февр. 2018 г., в 18:39, David Steele <david@pgmasters.net> написал(а):

Hi Andrey,

On 1/21/18 5:34 AM, Andrey Borodin wrote:

Hello, Alexander!

16 янв. 2018 г., в 21:42, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
Please find README patch attached.

Here's v2 version. Same code, but x2 comments. Also fixed important typo in readme BFS->DFS. Feel free to ping me any time with questions.

This patch has not gotten review and does not seem like a good fit for
the last PG11 CF so I am marking it Returned with Feedback.

Why do you think this patch does not seem good fit for CF?

Apologies for the brevity. I had about 40 patches to go through yesterday.

The reason it does not seem a good fit is that it's a new, possibly
invasive patch that has not gotten any review in the last three CFs
since it was reintroduced. I'm not sure why that's the case and I have
no opinion about the patch itself, but there it is.

We try to avoid new patches in the last CF that could be destabilizing
and this patch appears to be in that category. I know it has been
around for a while, but the lack of review makes it "new" in the context
of the last CF for PG11.

I've been talking with Alexander just yesterday at PgConf.Russia, and he was going to provide review.

Great! I'd suggest you submit this patch for the CF after 2018-03.

However, that's completely up to you.

Regards,
--
-David
david@pgmasters.net

#18Michail Nikolaev
michail.nikolaev@gmail.com
In reply to: Alexander Korotkov (#10)
Re: New gist vacuum.

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Hello.

I have added small change to patch to allow it be compiled using msvc (uint64_t -> uint64).
Everything seems to work, check-world is passing.

Actually patch fixes two issues:
1) Partial GIST indexes now have corrent tuples count estimation.
2) Now subsequent calls to VACUUM on GIST index (like "VACCUM table_name") do not change tuples count to estimated number of tuples in table (which is changed even without any updates in table due current implementation).

I think it is fine to commit.

Patch is also availble on github:
https://github.com/michail-nikolaev/postgres/commit/ff5171b586e4eb60ea5d15a18055d8ea4e44d244?ts=4

I'll attach patch file next message.

The new status of this patch is: Ready for Committer

#19Michail Nikolaev
michail.nikolaev@gmail.com
In reply to: Michail Nikolaev (#18)
Re: New gist vacuum.

I'll attach patch file next message.

Updated patch is attached.

Attachments:

gist-vacuum-count.patchapplication/octet-stream; name=gist-vacuum-count.patchDownload+34-18
#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michail Nikolaev (#18)
Re: New gist vacuum.

Michail Nikolaev <michail.nikolaev@gmail.com> writes:

I have added small change to patch to allow it be compiled using msvc (uint64_t -> uint64).
Everything seems to work, check-world is passing.

Actually patch fixes two issues:
1) Partial GIST indexes now have corrent tuples count estimation.
2) Now subsequent calls to VACUUM on GIST index (like "VACCUM table_name") do not change tuples count to estimated number of tuples in table (which is changed even without any updates in table due current implementation).

I think it is fine to commit.

I took a quick look at this. I wonder what is the point of making
the counting conditional. Since the function is visiting every
index page anyway, why not just always count and unconditionally
provide an exact answer? The number of cycles saved by skipping
"tuplesCount += PageGetMaxOffsetNumber(page)" on each leaf page
is surely trivial.

regards, tom lane

#21Andrey Borodin
amborodin@acm.org
In reply to: Tom Lane (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrey Borodin (#21)
#23Andrey Borodin
amborodin@acm.org
In reply to: Tom Lane (#22)