GSoC 2015 proposal: Support for microvacuum for GiST

Started by Ilia Ivanicki, almost 11 years ago, 3 messages
#1 Ilia Ivanicki
ivanicki.ilia@gmail.com

Hi, hackers!

I'd like to present my proposal idea for GSoC 2015. I'm new to coding for
PostgreSQL.

http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/ivanitskiy_ilya/5629499534213120

***
*Abstract:*
Currently, microvacuum is supported only for the BTree index. But the GiST
index is very useful and is widely used instead of btree for user-defined
datatypes. During an index search, the index is read page by page. Each
tuple on the buffered page is marked as "dead" if it is no longer visible to
any transaction. Before fetching the next page, we check for "dead" items
and mark the current page as "has garbage"[1]. When the page gets full, all
the killed items are removed by calling microvacuum[2].

*Benefits to the PostgreSQL Community*

This improvement can reduce the time spent executing VACUUM. It will be
useful for heavily loaded systems that use PostgreSQL.

*Quantifiable results*

Reduced VACUUM and INSERT run times for GiST.

*Project details*

I'm going to implement microvacuum support for GiST in the same way it was
implemented for the BTree access method, taking the specifics of GiST into
account.

During an IndexScan, the selected index pages are read into a buffer one at
a time. Every item on the buffered page is checked for being "dead". In the
BTree code, if an item really is "dead", its address is recorded in the
BTScanOpaque structure
<http://doxygen.postgresql.org/structBTScanOpaqueData.html>
in btgettuple(). Before the next page is read into the buffer,
_bt_killitems()
<http://doxygen.postgresql.org/nbtutils_8c.html#a60f25ce314f5395e6f6ae44ccbae8859>
is called; it matches the "dead" tuples using ItemPointerEquals()
<http://doxygen.postgresql.org/itemptr_8c.html#ad919b8efe8c466023017a83955157d6b>.
If the page contains at least one "dead" tuple, it is marked:

    opaque->btpo_flags |= BTP_HAS_GARBAGE;
    MarkBufferDirtyHint(so->currPos.buf, true);

Doxygen links for the identifiers above:
BTP_HAS_GARBAGE
<http://doxygen.postgresql.org/nbtree_8h.html#a3b7c77849276ff8617edc1f84441c230>
MarkBufferDirtyHint
<http://doxygen.postgresql.org/bufmgr_8c.html#ac40bc4868e97a49a25dd8be7c98b6773>
currPos
<http://doxygen.postgresql.org/structBTScanOpaqueData.html#a70a715bd5c5db16b699f5449495b0f70>
buf
<http://doxygen.postgresql.org/structBTScanPosData.html#a26f8687a5a566266e4d4190a4c16a0ef>

_bt_killitems()
<http://doxygen.postgresql.org/nbtutils_8c.html#a60f25ce314f5395e6f6ae44ccbae8859>
is called when we are about to load the next page into the buffer, at the
end of the IndexScan, or on a ReScan.
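
The scan-side bookkeeping described above can be pictured with a small toy
model. This is only a sketch and not PostgreSQL code: every Toy* name, the
fixed page capacity, and the integer "heap_tid" are hypothetical stand-ins
for the real structures (BTScanOpaque, the page opaque flags, item
pointers), and the dirty_hint field merely imitates the effect of
MarkBufferDirtyHint().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ITEMS   8      /* hypothetical page capacity */
#define TOY_HAS_GARBAGE 0x01u  /* stand-in for BTP_HAS_GARBAGE */

typedef struct ToyItem {
    int  heap_tid;  /* stand-in for the heap tuple pointer */
    bool dead;      /* stand-in for the "dead" hint on the index item */
} ToyItem;

typedef struct ToyPage {
    ToyItem  items[TOY_MAX_ITEMS];
    size_t   nitems;
    unsigned flags;
    bool     dirty_hint;  /* imitates MarkBufferDirtyHint() */
} ToyPage;

typedef struct ToyScan {
    int    killed_tids[TOY_MAX_ITEMS];  /* items found dead during the scan */
    size_t nkilled;
} ToyScan;

/* Scan side: remember an item that turned out to be dead. */
void toy_remember_killed(ToyScan *scan, int heap_tid)
{
    assert(scan->nkilled < TOY_MAX_ITEMS);
    scan->killed_tids[scan->nkilled++] = heap_tid;
}

/*
 * Called before leaving the page: mark every remembered item that is
 * still on the page as dead, and flag the page if anything was marked.
 * Returns the number of items newly marked.
 */
size_t toy_kill_items(ToyScan *scan, ToyPage *page)
{
    size_t marked = 0;

    for (size_t k = 0; k < scan->nkilled; k++) {
        for (size_t i = 0; i < page->nitems; i++) {
            if (page->items[i].heap_tid == scan->killed_tids[k] &&
                !page->items[i].dead) {
                page->items[i].dead = true;
                marked++;
                break;
            }
        }
    }
    if (marked > 0) {
        page->flags |= TOY_HAS_GARBAGE;
        page->dirty_hint = true;
    }
    scan->nkilled = 0;  /* the list is consumed once the page is left */
    return marked;
}
```

In this model a scan would call toy_remember_killed() for each dead item it
meets and toy_kill_items() once before moving to the next page, so the page
is touched at most once per visit.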

Later, in the call chain btinsert() -> _bt_doinsert() ->
_bt_findinsertloc(), if the page that the insert should go into is marked
with the HAS_GARBAGE flag, _bt_vacuum_one_page() is called. It vacuums just
that one index page.
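
The insert-side half can be sketched the same way. Again this is a toy
model under stated assumptions, not the nbtree code: the Toy* types and
function names are hypothetical, the page holds a fixed number of integer
keys, and a failed insert simply returns false where a real index would go
on to split the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ITEMS   4      /* hypothetical page capacity */
#define TOY_HAS_GARBAGE 0x01u  /* stand-in for BTP_HAS_GARBAGE */

typedef struct ToyItem {
    int  key;
    bool dead;  /* set earlier by a scan that found the item dead */
} ToyItem;

typedef struct ToyPage {
    ToyItem  items[TOY_MAX_ITEMS];
    size_t   nitems;
    unsigned flags;
} ToyPage;

/*
 * Remove all dead items from this one page by compacting the live ones,
 * then clear the garbage flag. Returns the number of items removed.
 */
size_t toy_vacuum_one_page(ToyPage *page)
{
    size_t kept = 0;

    assert(page->nitems <= TOY_MAX_ITEMS);
    for (size_t i = 0; i < page->nitems; i++) {
        if (!page->items[i].dead)
            page->items[kept++] = page->items[i];
    }
    size_t removed = page->nitems - kept;
    page->nitems = kept;
    page->flags &= ~TOY_HAS_GARBAGE;
    return removed;
}

/*
 * Insert a key. If the page is full but flagged as having garbage, try a
 * one-page vacuum first. Returns false if the page is still full.
 */
bool toy_insert(ToyPage *page, int key)
{
    if (page->nitems == TOY_MAX_ITEMS && (page->flags & TOY_HAS_GARBAGE))
        toy_vacuum_one_page(page);

    if (page->nitems == TOY_MAX_ITEMS)
        return false;  /* a real index would split the page here */

    page->items[page->nitems].key = key;
    page->items[page->nitems].dead = false;
    page->nitems++;
    return true;
}
```

The point of the flag is visible in toy_insert(): the cleanup is attempted
only when the page is both full and known to contain dead items, so inserts
into pages without garbage pay nothing extra.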

I'm going to implement the same mechanism for the GiST index.

*Project Schedule*

until May 31

Resolve architecture questions with the help of the community.

1 June – 30 June

First, rough implementation of microvacuum support for GiST.

I will be finishing my bachelor's degree this month, so I won't have much
time to work on the project.

1 July – 31 July

Implementation of microvacuum support for GiST, and testing.

1 August – 15 August

Final refactoring, testing and committing.

*About myself*

I'm a final-year student in the "Cybernetics" department at the Moscow
Engineering Physics Institute.

*Links*

1. http://doxygen.postgresql.org/nbtutils_8c.html#a60f25ce314f5395e6f6ae44ccbae8859
2. http://doxygen.postgresql.org/nbtinsert_8c.html#a89450d93d20d3d5e2d1e68849b69ee32
3. https://wiki.postgresql.org/wiki/GSoC_2015#Core

_______________________________________________________________________________

Best wishes,

Ivanitskiy Ilya.

#2 Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Ilia Ivanicki (#1)
Re: GSoC 2015 proposal: Support for microvacuum for GiST

On 03/26/2015 11:37 PM, Ilia Ivanicki wrote:

> *Abstract:*
> Currently, microvacuum is supported only for the BTree index. But the GiST
> index is very useful and is widely used instead of btree for user-defined
> datatypes. During an index search, the index is read page by page. Each
> tuple on the buffered page is marked as "dead" if it is no longer visible to
> any transaction. Before fetching the next page, we check for "dead" items
> and mark the current page as "has garbage"[1]. When the page gets full, all
> the killed items are removed by calling microvacuum[2].

Seems reasonable. Should be pretty straightforward to implement.

> *Project Schedule*
>
> until May 31
>
> Resolve architecture questions with the help of the community.
>
> 1 June – 30 June
>
> First, rough implementation of microvacuum support for GiST.
>
> I will be finishing my bachelor's degree this month, so I won't have much
> time to work on the project.
>
> 1 July – 31 July
>
> Implementation of microvacuum support for GiST, and testing.
>
> 1 August – 15 August
>
> Final refactoring, testing and committing.

GSoC should be treated as a full-time job, that's how much time you're
expected to dedicate to it. Having bachelor's degree exams in June would
be a serious problem. You'll need to discuss with the potential mentors
on how to make up for that time.

Other than that, the schedule seems fairly relaxed. In fact, this
project seems a bit too small for a GSoC project. I'd suggest coming up
with some additional GiST-related work that you could do, in addition to
the microvacuum thing. Otherwise I think there's a risk that you finish
the patch in May, and have nothing to do for the rest of the summer.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Ilia Ivanicki
ivanicki.ilia@gmail.com
In reply to: Heikki Linnakangas (#2)
Re: GSoC 2015 proposal: Support for microvacuum for GiST

> GSoC should be treated as a full-time job, that's how much time you're
> expected to dedicate to it. Having bachelor's degree exams in June would be
> a serious problem. You'll need to discuss with the potential mentors on how
> to make up for that time.

My bachelor's diploma is almost done and I will have enough time for GSoC
work.

> Other than that, the schedule seems fairly relaxed. In fact, this project
> seems a bit too small for a GSoC project. I'd suggest coming up with some
> additional GiST-related work that you could do, in addition to the
> microvacuum thing. Otherwise I think there's a risk that you finish the
> patch in May, and have nothing to do for the rest of the summer.

I'd like to take on additional work items for GSoC 2015.

I don't know which TODO items have already been completed, but I've put
together a list of candidates:

1) Add microvacuum support for the GIN index, together with Anastasiya
Lubennikova (she will be implementing the amgettuple function for GIN), if
that feature is feasible.

2) The bug "Index on inet changes query result" (
/messages/by-id/201010112055.o9BKtZf7011251@wwwmaster.postgresql.org
)

3) Teach GIN cost estimation about "fast scans" (I know very little about
GIN, but the discussion on the mailing list was interesting to me)

4) pg_restore unusable for expensive matviews (
/messages/by-id/20140820021530.2534.43156@wrigleys.postgresql.org
)

5) Maybe the community can suggest something else related to GiST, or
something where microvacuum for GiST would be useful to everyone.

Best wishes,
Ivanitskiy Ilya.